Advanced Lesson 1
Regular Expressions
Chapter 7: Groups
Capturing groups
Now, let’s start talking about more advanced regular expression features so that you can really do things!
So far, we have used parentheses ( ) in regular expressions to group characters. But groups can do more than that: you can also retrieve the content of each group after matching. This is useful when you need to extract the specific substrings that match parts of your pattern.
Below is an example use case: you might want to retrieve some information about that crazy lecturer of yours. Try it out yourself, and make sure you understand what each output represents. If anything is unclear, consult the documentation.
>>> import re
>>> pattern = r"Name: ([A-Za-z ]+); Phone: (\d+); Position: (.+)"
>>> string = "Name: Josiah Wang; Phone: 012345678; Position: Senior Teaching Fellow"
>>> match = re.match(pattern, string)
>>> print(match)
<re.Match object; span=(0, 69), match='Name: Josiah Wang; Phone: 012345678; Position: Se>
>>> match.groups()
('Josiah Wang', '012345678', 'Senior Teaching Fellow')
>>> match.group()
'Name: Josiah Wang; Phone: 012345678; Position: Senior Teaching Fellow'
>>> match.group(1)
'Josiah Wang'
>>> match.group(2)
'012345678'
>>> match.group(3)
'Senior Teaching Fellow'
>>> match.group(1,3)
('Josiah Wang', 'Senior Teaching Fellow')
>>> match.start(), match.end()
(0, 69)
>>> match.start(1), match.end(1)
(6, 17)
>>> match.start(2), match.end(2)
(26, 35)
>>> match.start(3), match.end(3)
(47, 69)
>>> match.span(2) # another way of getting a tuple (start, end)
(26, 35)
>>> match.span(3)
(47, 69)
Hopefully this is clear enough that I won’t need to explain further!
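If the start and end numbers still feel abstract, here is a small check you can run. It is a sketch that reuses the same pattern and string as above; the loop is just one way to do it. Each (start, end) pair from match.span(n) is a half-open range, so slicing the original string with it gives back exactly what group n captured, including group 0 for the whole match.

```python
import re

# Same pattern and string as the REPL example; the raw string keeps \d intact
pattern = r"Name: ([A-Za-z ]+); Phone: (\d+); Position: (.+)"
string = "Name: Josiah Wang; Phone: 012345678; Position: Senior Teaching Fellow"
match = re.match(pattern, string)

# Group 0 is the whole match, groups 1..3 are the parenthesised parts
for n in range(len(match.groups()) + 1):
    start, end = match.span(n)
    # Slicing the original string with (start, end) recovers the captured text
    assert string[start:end] == match.group(n)
    print(n, (start, end), repr(string[start:end]))
```

The end index is one past the last captured character, which is why group 1 ends at 17 even though the last letter of "Wang" sits at index 16.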
match.group() is the same as match.group(0), which gives you the whole matched string. Any group number from 1 onwards gives you the content captured by the corresponding pair of parentheses, counting the opening parentheses from left to right.
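To see how the numbering works when parentheses are nested, try the sketch below. The phone-style pattern and the string "Phone: 555-0199" are made up purely for illustration. Groups are numbered by the order of their opening parentheses, so the outer group is 1 and the two groups inside it are 2 and 3, while group 0 is still the whole match.

```python
import re

# Illustrative pattern: an outer group wrapping two inner groups
pattern = r"Phone: ((\d{3})-(\d{4}))"
match = re.match(pattern, "Phone: 555-0199")

print(match.group() == match.group(0))  # True: both give the whole match
print(match.group(0))  # 'Phone: 555-0199'
print(match.group(1))  # '555-0199' (outer group, its "(" comes first)
print(match.group(2))  # '555'      (first inner group)
print(match.group(3))  # '0199'     (second inner group)
```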