This is an archived version of the course and is no longer updated. Please find the latest version of the course on the main webpage.

Backreferences and substitution

Here is something more advanced that may sometimes be useful.

You might want to reuse the content of a group that you have captured earlier in the same string. For example, you might want a word occurring earlier to be repeated in the same string. Backreferences are useful for this. You use "\\1" (or r"\1") to refer to the first group that you captured earlier.

### Note the r in front of the string - this is a raw string! 
>>> import re
>>> pattern = r"([A-Za-z]+) went to ([A-Za-z]+). The cuisine of \2 fascinated \1."
>>> string1 = "Josiah went to Japan. The cuisine of Japan fascinated Josiah."
>>> match = re.match(pattern, string1)
>>> match.group()
?????
>>> match.groups()
????? 
>>> string2 = "Harry went to Greece. The cuisine of Greece fascinated William." 
>>> match = re.match(pattern, string2)
>>> print(match)
None
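
As a further illustration of the same idea, here is a minimal sketch that uses a backreference to find accidentally doubled words. The sample sentence and the variable names (doubled, text, repeated) are made up for this example and are not part of the course material.

```python
import re

# \b marks a word boundary, (\w+) captures one word,
# and \1 requires exactly the same word to appear again after a space.
doubled = re.compile(r"\b(\w+) \1\b")

text = "I saw the the cat in in the garden."

# group(1) is the word that was captured and then repeated.
repeated = [m.group(1) for m in doubled.finditer(text)]
print(repeated)  # ['the', 'in']
```

Note that the backreference matches the text that the group captured, not the group's pattern: "the garden" is not reported, even though both words would match \w+.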

In Python, you can also use named groups, written (?P<name>...), and refer to the content of a named group later in the pattern with (?P=name).

>>> # Adjacent string literals are joined, so no stray whitespace ends up in the pattern
>>> pattern = (r"(?P<person>[A-Za-z]+) went to (?P<place>[A-Za-z]+). "
...            r"The cuisine of (?P=place) fascinated (?P=person).")
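
A short sketch of how the named groups can then be read back from the match object. The sample sentence is invented for illustration, and the periods are escaped here (\.) so that they only match a literal full stop, which is a small departure from the pattern above.

```python
import re

pattern = (
    r"(?P<person>[A-Za-z]+) went to (?P<place>[A-Za-z]+)\. "
    r"The cuisine of (?P=place) fascinated (?P=person)\."
)

# Hypothetical sample sentence, chosen only for this example.
sentence = "Maria went to Peru. The cuisine of Peru fascinated Maria."
match = re.match(pattern, sentence)

# Named groups can be fetched by name, or all at once as a dictionary.
person = match.group("person")
place = match.group("place")
by_name = match.groupdict()
print(person, place)  # Maria Peru
print(by_name)        # {'person': 'Maria', 'place': 'Peru'}
```

Named groups are still numbered, so match.group(1) and match.group("person") return the same text.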

Regular expression substitution

The true power of backreferencing can be seen when you need to find and replace a string.

Let’s say you want to turn the section headers in your LaTeX document into chapters (perhaps you are converting your paper into a book?).

You can replace every instance of section with chapter while keeping the original header titles by using backreferences. (We’re omitting the backslashes from LaTeX for simplicity.)

The function re.sub(pattern, replacement, string) or the method pattern.sub(replacement, string) of a Pattern object can be used for this. It is similar to the str.replace() method, except that you can also search for substrings using regular expressions.

>>> pattern = r"section{([^}]*)}"
>>> replacement = r"chapter{\1}"
>>> string = "section{Introduction} section{Literature review}"
>>> re.sub(pattern, replacement, string)
'chapter{Introduction} chapter{Literature review}'
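
The same substitution can be sketched with the sub() method of a compiled Pattern object. The variable names (section, all_replaced, first_only) are illustrative, and the braces are escaped here to make it explicit that they are literal characters. The optional count argument limits how many matches are replaced.

```python
import re

# Compile once, then reuse the Pattern object's sub() method.
section = re.compile(r"section\{([^}]*)\}")
text = "section{Introduction} section{Literature review}"

# Replace every match, keeping each title via the \1 backreference.
all_replaced = section.sub(r"chapter{\1}", text)
print(all_replaced)  # chapter{Introduction} chapter{Literature review}

# count=1 replaces only the first match and leaves the rest untouched.
first_only = section.sub(r"chapter{\1}", text, count=1)
print(first_only)    # chapter{Introduction} section{Literature review}
```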

And, as expected, you can also use named groups for this. You use \g<name> to refer to the named group in the original pattern. \g<1> works too (and is equivalent to \1).

>>> pattern = r"section{(?P<title>[^}]*)}"
>>> string = "section{Introduction} section{Literature review}"
>>> re.sub(pattern, r"chapter{\1}", string)
?????
>>> re.sub(pattern, r"chapter{\g<1>}", string)
?????
>>> re.sub(pattern, r"chapter{\g<title>}", string)
?????
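
One situation where \g<1> is needed rather than \1 is when the backreference is followed directly by a digit. The sketch below uses a made-up string to show the difference: \g<1>0 means "group 1, then a literal 0", whereas \10 is read as a reference to group 10, which does not exist in this pattern.

```python
import re

text = "width: 12px"

# \g<1>0 is group 1 followed by a literal "0".
scaled = re.sub(r"(\d+)px", r"\g<1>0px", text)
print(scaled)  # width: 120px

# \10 is interpreted as group 10; the pattern has only one group,
# so the re module raises an error instead of substituting.
try:
    re.sub(r"(\d+)px", r"\10px", text)
    ambiguous_error = None
except re.error as exc:
    ambiguous_error = exc
print(ambiguous_error is not None)  # True
```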

There is also a re.subn() function, which performs the same substitution but returns a tuple containing the new string and the number of substitutions made.
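
A minimal sketch of re.subn(), reusing the section-to-chapter example. The names new_text and count are chosen for this illustration.

```python
import re

text = "section{Introduction} section{Literature review}"

# subn() returns a (new_string, number_of_substitutions) tuple.
new_text, count = re.subn(r"section\{([^}]*)\}", r"chapter{\1}", text)
print(new_text)  # chapter{Introduction} chapter{Literature review}
print(count)     # 2
```

The count is handy for checking whether anything was replaced at all, for example to warn when a document contains no section headers.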