This is an archived version of the course and is no longer updated. Please find the latest version of the course on the main webpage.

Groups

As discussed in the live lectures, parentheses ( ) define a group. After a successful match, you can retrieve the text that each group matched. This is useful when you need to extract specific substrings rather than just check whether the pattern matches.

Here is an example use case. Try it out and make sure you understand what each output represents. If anything is unclear, consult the documentation for the re module.

>>> import re
>>> pattern = r"Name: ([A-Za-z ]+); Phone: (\d+); Position: (.+)"
>>> string = "Name: Josiah Wang; Phone: 012345678; Position: Senior Teaching Fellow"
>>> match = re.match(pattern, string)
>>> match.groups()
>>> match.group()
>>> match.group(1)
>>> match.group(2)
>>> match.group(3)
>>> match.group(1,3)
>>> match.start(), match.end()
>>> match.start(1), match.end(1)
>>> match.start(2), match.end(2)
>>> match.start(3), match.end(3)
>>> match.span(3)
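
Once you have tried the calls above, the following minimal sketch may help you check your understanding of how group numbers and positions relate. The pattern and string are made up for illustration and are not part of the course material. It assumes the match succeeds; re.match returns None otherwise.

```python
import re

# Hypothetical pattern and string, used only for illustration.
pattern = r"User: (\w+); Age: (\d+)"
string = "User: alice; Age: 30"

match = re.match(pattern, string)

# group(0), or group() with no argument, is the whole match;
# groups() holds only the captured parts.
whole = match.group(0)
captured = match.groups()

# start(n) and end(n) are indices into the original string,
# so slicing with them gives back the same text as group(n).
user_slice = string[match.start(1):match.end(1)]
age_span = match.span(2)

print(whole)       # User: alice; Age: 30
print(captured)    # ('alice', '30')
print(user_slice)  # alice
print(age_span)    # (18, 20)
```

Note that span(n) is simply the pair (start(n), end(n)), and the end index is exclusive, as with normal Python slicing.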

Non-capturing groups

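Sometimes you are not interested in the content of a group and only need the parentheses to group part of the pattern together, for example to apply alternation or a quantifier to it. You can mark such a group as non-capturing with (?: ). The content of a non-capturing group cannot be retrieved, and the group is not counted when groups are numbered.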

Here is an example. We do not need to know whether the class is “dog” or “puppy”, so we represent this as a non-capturing group. Examine the output, and you will find that the class has been skipped.

>>> pattern = r"Accuracy: (\d+\.\d+) Class: (?:dog|puppy) Precision: (\d+\.\d+)"
>>> string = "Accuracy: 0.35 Class: dog Precision: 0.55"
>>> match = re.match(pattern, string)
>>> match.groups()
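
To see the difference side by side, here is a small sketch that runs the same string through a capturing and a non-capturing version of the pattern. The variable names are only illustrative.

```python
import re

string = "Accuracy: 0.35 Class: puppy Precision: 0.55"

# The same pattern twice: once with a capturing group for the class,
# once with a non-capturing group.
capturing = r"Accuracy: (\d+\.\d+) Class: (dog|puppy) Precision: (\d+\.\d+)"
non_capturing = r"Accuracy: (\d+\.\d+) Class: (?:dog|puppy) Precision: (\d+\.\d+)"

with_class = re.match(capturing, string).groups()
without_class = re.match(non_capturing, string).groups()

print(with_class)     # ('0.35', 'puppy', '0.55')
print(without_class)  # ('0.35', '0.55')
```

Because the non-capturing group is not numbered, the precision is group 3 in the first pattern but group 2 in the second.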

Named groups

You can also assign a name to each of your groups and then access the groups by name instead of by number. This is done with (?P<name> ). Note that name must be a valid Python identifier.

Names make a pattern easier to read than numeric indices, and they stay valid when groups are added or removed, whereas the numbers shift.

As usual, try out the examples yourself and make sure you understand what each line is doing.

>>> pattern = r"Name: (?P<name>[A-Za-z ]+); Phone: (?P<phone>\d+)"
>>> string = "Name: Josiah Wang; Phone: 012345678"
>>> match = re.match(pattern, string)
>>> match.group("name")
>>> match.group("phone")
>>> match.group(1)
>>> match.group(2)
>>> match.groupdict()
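
As one possible way to use named groups in practice, the sketch below wraps the pattern in a small helper that returns the groups as a dictionary. The names CONTACT_PATTERN and parse_contact, and the sample strings, are hypothetical and chosen only for this illustration.

```python
import re

# Hypothetical helper, not part of the course material.
CONTACT_PATTERN = re.compile(r"Name: (?P<name>[A-Za-z ]+); Phone: (?P<phone>\d+)")


def parse_contact(line):
    """Return a dict of the named groups, or None if the line does not match."""
    match = CONTACT_PATTERN.match(line)
    if match is None:
        # re.match returns None when the pattern does not match at the start.
        return None
    return match.groupdict()


good = parse_contact("Name: Ada Lovelace; Phone: 5551234")
bad = parse_contact("Phone: 5551234")

print(good)  # {'name': 'Ada Lovelace', 'phone': '5551234'}
print(bad)   # None
```

Checking for None before calling group() or groupdict() avoids an AttributeError when the input does not match the pattern.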