Advanced Lesson 1
Regular Expressions
Chapter 4: Boundaries
End of sentence
Let’s say you are a "tea" person and want to search for phrases that mention only "tea". But the following regular expression matches the phrase "tea and coffee" (even though it only extracts the "tea" part).
>>> import re
>>> pattern = "tea"
>>> phrase = "tea and coffee"
>>> re.search(pattern, phrase)
<re.Match object; span=(0, 3), match='tea'>
But you hate coffee! And having coffee next to your tea will only pollute it! No, you want only pure tea!
So how do you force the regular expression to match only tea (and nothing else after it)? This is where the end of sentence marker ($) becomes useful! Like the start of sentence marker (^), $ marks the end of a sentence, that is, the end of the string being searched.
The following regular expression matches “tea followed by the end of sentence”. So you’ll get your pure tea!
>>> pattern = "tea$"
>>> phrase = "tea and coffee"
>>> re.search(pattern, phrase) # None
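To see that "tea$" cares only about what comes after the tea, not what comes before it, here is a small sketch that runs the same pattern on two phrases. The phrase "coffee and tea" and the variable names are illustrative assumptions, not part of the lesson.

```python
import re

pattern = "tea$"  # "tea" immediately followed by the end of the string

# "tea" is the last thing in this phrase, so the pattern matches.
ends_with_tea = re.search(pattern, "coffee and tea")
print(ends_with_tea.span())  # (11, 14)

# Something follows "tea" here, so there is no match.
tea_then_coffee = re.search(pattern, "tea and coffee")
print(tea_then_coffee)  # None
```

So $ alone keeps anything from following the tea, but it still allows text in front of it.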
You can obviously combine both start and end of sentence markers for the purest of all teas! So the only string you will match is pure "tea".
>>> pattern = "^tea$"
>>> phrase = "herbal tea and coffee"
>>> re.search(pattern, phrase) # None
>>> phrase = "tea"
>>> re.search(pattern, phrase)
<re.Match object; span=(0, 3), match='tea'>
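The "^tea$" check can be wrapped in a small helper to try it on several phrases at once. This is only a sketch: the function name is_pure_tea is hypothetical, and the last two lines illustrate one subtlety of Python's re module, namely that $ also matches just before a single trailing newline, whereas re.fullmatch does not allow for one.

```python
import re

def is_pure_tea(phrase):
    """Hypothetical helper: True if the phrase is "tea" from start to end."""
    return re.search("^tea$", phrase) is not None

print(is_pure_tea("tea"))             # True
print(is_pure_tea("herbal tea"))      # False: text before "tea"
print(is_pure_tea("tea and coffee"))  # False: text after "tea"

# In Python, $ also matches just before a trailing newline.
print(is_pure_tea("tea\n"))           # True

# re.fullmatch requires the whole string, newline included, to match.
print(re.fullmatch("tea", "tea\n"))   # None
```

If a stray trailing newline should count as pollution too, re.fullmatch is one way to rule it out.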