Advanced Lesson 1
Regular Expressions
Chapter 7: Groups
Backreferences
Here is a more advanced feature that is occasionally useful.
You might want to reuse the content of a group that you captured earlier in the same string. For example, you might require that a word matched earlier appears again later on.
Backreferences are useful for this. Use "\\1" (or r"\1") to refer to the first group that you captured earlier, "\\2" (or r"\2") to refer to the second, and so on.
>>> import re
>>> ### Note the r in front of the string - this is a Python raw string!
>>> ### The dots are escaped as \. so that they match a literal full stop only.
>>> pattern = r"([A-Za-z]+) went to ([A-Za-z]+)\. The cuisine of \2 fascinated \1\."
>>> string1 = "Josiah went to Japan. The cuisine of Japan fascinated Josiah."
>>> match = re.match(pattern, string1)
>>> match.group()
'Josiah went to Japan. The cuisine of Japan fascinated Josiah.'
>>> match.groups()
('Josiah', 'Japan')
>>> string2 = "Harry went to Greece. The cuisine of Greece fascinated William."
>>> match = re.match(pattern, string2)
>>> print(match)
None
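The two spellings of a backreference mentioned above can be checked directly. The following is a minimal sketch; the variable names (escaped, raw, plain) are illustrative only and not part of the lesson. It also shows what happens if both the doubled backslash and the r prefix are forgotten.

```python
import re

escaped = "\\1"   # backslash escaped by hand
raw = r"\1"       # raw string: the backslash is kept as written
plain = "\1"      # NOT a backreference: Python turns this into chr(1)

print(escaped == raw)         # True
print(len(raw), len(plain))   # 2 1
print(plain == chr(1))        # True

text = "Japan Japan"
# Both correct spellings build the same pattern and match.
print(re.match("([A-Za-z]+) \\1", text) is not None)  # True
print(re.match(r"([A-Za-z]+) \1", text) is not None)   # True
# The unescaped, non-raw version looks for a control character instead.
print(re.match("([A-Za-z]+) \1", text))                # None
```

Using raw strings for every pattern avoids having to remember which backslashes need doubling.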
In Python, you can also use named groups, written (?P<name>...), and refer to the content of a named group with (?P=name).
>>> pattern = r"(?P<person>[A-Za-z]+) went to (?P<place>[A-Za-z]+)\. The cuisine of (?P=place) fascinated (?P=person)\."
>>> string3 = "Luca went to Brazil. The cuisine of Brazil fascinated Luca."
>>> match = re.match(pattern, string3)
>>> match.groupdict()
{'person': 'Luca', 'place': 'Brazil'}
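Besides groupdict(), the content of a named group can be read by name or by position, since named groups are numbered like any other group. The sketch below assumes the same sentence as above; the second pattern (called mixed here, a name chosen only for illustration) shows that a numeric backreference also works on a named group.

```python
import re

pattern = (r"(?P<person>[A-Za-z]+) went to (?P<place>[A-Za-z]+)\. "
           r"The cuisine of (?P=place) fascinated (?P=person)\.")
match = re.match(pattern, "Luca went to Brazil. The cuisine of Brazil fascinated Luca.")

print(match.group("person"))            # Luca
print(match.group("place"))             # Brazil
print(match.group(1), match.group(2))   # Luca Brazil

# A named group can still be referenced by its number.
mixed = r"(?P<word>[a-z]+) \1"
print(re.match(mixed, "yeah yeah").group("word"))  # yeah
```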
Quick task
Write a regular expression that matches every lowercase string made of the same word repeated twice in a row.
- Valid: baby baby, no no, yeah yeah
- Invalid: my love, no baby no, sweet dreams
>>> pattern = r"([a-z]+) \1" # note that this is a raw string literal!
>>> re.match(pattern, "baby baby")
<re.Match object; span=(0, 9), match='baby baby'>
>>> re.match(pattern, "no no")
<re.Match object; span=(0, 5), match='no no'>
>>> re.match(pattern, "no baby no") # returns None, so the REPL prints nothing
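The solution can be checked against every example from the task in one go. This is a sketch rather than part of the original exercise: the lists valid and invalid simply restate the examples above, and re.fullmatch is used here as an assumption about the intended meaning (the whole string must be the repeated word). re.match only anchors at the start of the string, so it would also accept input with extra text after the repeated pair.

```python
import re

pattern = re.compile(r"([a-z]+) \1")

valid = ["baby baby", "no no", "yeah yeah"]
invalid = ["my love", "no baby no", "sweet dreams"]

# Every valid example must match in full; every invalid one must not.
for s in valid:
    assert pattern.fullmatch(s) is not None, s
for s in invalid:
    assert pattern.fullmatch(s) is None, s

# match() accepts trailing text, fullmatch() does not.
print(pattern.match("no no baby") is not None)      # True
print(pattern.fullmatch("no no baby") is not None)  # False
```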