Chapter 7: Groups

Backreferences

Josiah Wang

Here is a more advanced feature that can sometimes be useful.

You might want to reuse the content of a group that you have captured earlier in the same string. For example, you might want a word occurring earlier to be repeated.

Backreferences are useful for this. You use "\\1" (or r"\1") to refer to the first group that you captured earlier, "\\2" (or r"\2") for the second group, and so on.

>>> import re
>>> ### Note the r in front of the string - this is a Python raw string!
>>> ### The full stops are written as \. so that they match a literal "." only.
>>> pattern = r"([A-Za-z]+) went to ([A-Za-z]+)\. The cuisine of \2 fascinated \1\."
>>> string1 = "Josiah went to Japan. The cuisine of Japan fascinated Josiah."
>>> match = re.match(pattern, string1)
>>> match.group()
'Josiah went to Japan. The cuisine of Japan fascinated Josiah.'
>>> match.groups()
('Josiah', 'Japan')
>>> string2 = "Harry went to Greece. The cuisine of Greece fascinated William." 
>>> match = re.match(pattern, string2)
>>> print(match)
None
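
To see why the raw string matters, here is a small sketch comparing three spellings of a backreference pattern. The variable names and the test string "tick tick" are illustrative and not part of the original example.

```python
import re

# Three ways one might write a pattern with a backreference to group 1.
escaped = "([a-z]+) \\1"    # backslash escaped by hand
raw = r"([a-z]+) \1"        # raw string: the backslash is kept as-is
unescaped = "([a-z]+) \1"   # "\1" becomes the control character chr(1)

same_pattern = escaped == raw            # identical pattern text
has_control_char = "\x01" in unescaped   # the backreference was lost

hit = re.match(raw, "tick tick")
miss = re.match(unescaped, "tick tick")  # pattern now expects a literal chr(1)

print(same_pattern, has_control_char, hit is not None, miss is None)
# prints: True True True True
```

Forgetting both the r prefix and the doubled backslash does not raise an error. The pattern simply stops matching, which is why the raw string form is usually the safer habit.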

In Python, you can also use named groups, written (?P<name>...), and refer back to the content of a named group with (?P=name).

>>> pattern = r"(?P<person>[A-Za-z]+) went to (?P<place>[A-Za-z]+)\. The cuisine of (?P=place) fascinated (?P=person)\."
>>> string3 = "Luca went to Brazil. The cuisine of Brazil fascinated Luca."
>>> match = re.match(pattern, string3)
>>> match.groupdict()
{'person': 'Luca', 'place': 'Brazil'}
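
Named groups are still numbered in the order they appear, so names and numbers can be mixed. The sketch below is a variation on the example above (the variable names `person` and `place` are chosen for illustration). It uses (?P=place) for one backreference and \1 for the other, then reads the captured text back by name and by number.

```python
import re

pattern = re.compile(
    r"(?P<person>[A-Za-z]+) went to (?P<place>[A-Za-z]+)\. "
    r"The cuisine of (?P=place) fascinated \1\."  # \1 is the same group as 'person'
)

match = pattern.match("Luca went to Brazil. The cuisine of Brazil fascinated Luca.")

person = match.group("person")  # look up by name
place = match.group(2)          # look up by number

print(person, place)
# prints: Luca Brazil
```

Names tend to be easier to maintain in longer patterns, since adding a new group does not shift the numbering you rely on.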

Quick task

Write a regular expression that matches lowercase strings consisting of the same word written twice in a row, separated by a single space.

  • Valid: baby baby, no no, yeah yeah
  • Invalid: my love, no baby no, sweet dreams

>>> pattern = r"([a-z]+) \1"  # note that this is a raw string literal! 
>>> re.match(pattern, "baby baby")
<re.Match object; span=(0, 9), match='baby baby'>
>>> re.match(pattern, "no no")
<re.Match object; span=(0, 5), match='no no'>
>>> re.match(pattern, "no baby no")  # returns None, so the interpreter prints nothing
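
One caveat with this solution: re.match only anchors the start of the string, so extra text after the repeated word is still accepted. The sketch below shows two possible ways to tighten it, assuming you want either the whole string to be the repeated word or a repeated whole word somewhere inside longer text. The test strings and variable names are made up for illustration.

```python
import re

pattern = r"([a-z]+) \1"

# re.match only anchors the start, so trailing text is silently allowed.
prefix_only = re.match(pattern, "baby babysitter")    # matches 'baby baby'

# re.fullmatch requires the entire string to be "word word".
whole_ok = re.fullmatch(pattern, "baby baby")
whole_bad = re.fullmatch(pattern, "baby babysitter")  # None

# When searching inside longer text, \b keeps the match on word boundaries.
bounded = r"\b([a-z]+) \1\b"
found = re.search(bounded, "oh yeah yeah baby")       # matches 'yeah yeah'
not_found = re.search(bounded, "the theme")           # None: 'the' + 'the' is not two whole words
```

Which variant fits depends on whether the input is a single candidate string or free text to scan.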