Advanced Lesson 1
Regular Expressions
Chapter 3: Quantifiers
Bounded repetition
What if you want to specify that sheep must "baa" with exactly 10 "a"s? You can of course write "baaaaaaaaaa". You can also use the more convenient notation "ba{10}" to specify that "a" should be repeated exactly 10 times.
>>> import re
>>> pattern = "ba{10}"
>>> re.match(pattern, "baaaaaaaaaa")
<re.Match object; span=(0, 11), match='baaaaaaaaaa'>
>>> re.match(pattern, "baaaaaa") # None
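As a rough check of the claim above, the following sketch compares the counted form "ba{10}" with the spelled-out form on strings containing 8 to 12 "a"s. The names counted, spelled_out, and results are illustrative only and do not come from the lesson.

```python
import re

# Illustrative names (not from the lesson): two ways to write the same pattern.
counted = re.compile("ba{10}")
spelled_out = re.compile("b" + "a" * 10)  # same as "baaaaaaaaaa"

# Try inputs with 8 to 12 "a"s and record what each one matches.
results = {}
for n in range(8, 13):
    text = "b" + "a" * n
    m1 = counted.match(text)
    m2 = spelled_out.match(text)
    # Both patterns should agree on whether there is a match at all.
    assert (m1 is None) == (m2 is None)
    results[n] = m1.group() if m1 else None

for n, matched in results.items():
    print(n, matched)
```

Inputs with fewer than 10 "a"s produce None. Inputs with more than 10 still match, because re.match only anchors at the start of the string, so the match covers the first 11 characters and leaves the rest unconsumed.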
You can also specify a range. So "a{2,4}" will match "aa", "aaa", and "aaaa".
>>> pattern = "ba{2,4}"
>>> re.match(pattern, "baa")
<re.Match object; span=(0, 3), match='baa'>
>>> re.match(pattern, "baaaa")
<re.Match object; span=(0, 5), match='baaaa'>
>>> re.match(pattern, "baaaaaa") # Note that this only matches up to four 'a's
<re.Match object; span=(0, 5), match='baaaa'>
>>> re.match(pattern, "ba") # None
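To see the whole range at a glance, here is a small sketch that counts how many "a"s the pattern "ba{2,4}" consumes for inputs with one to six "a"s. The dictionary name consumed is an illustrative choice, not part of the lesson.

```python
import re

pattern = re.compile("ba{2,4}")

# Map: number of "a"s in the input -> number of "a"s in the match
# (None when the input has too few "a"s to match at all).
consumed = {}
for n in range(1, 7):
    m = pattern.match("b" + "a" * n)
    consumed[n] = len(m.group()) - 1 if m else None  # subtract 1 for the "b"

print(consumed)
```

The quantifier is greedy, so it takes as many "a"s as it can, but never more than the upper bound of four.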
Don’t have an upper limit? You can specify just the minimum: "a{2,}" must match at least two "a"s. Similarly, "a{,4}" matches at most four "a"s (and as few as zero).
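The open-ended forms can be tried out in the same way. This is a minimal sketch using Python's re module. The helper matched_text and the names at_least_two and at_most_four are hypothetical, introduced only for this illustration.

```python
import re

at_least_two = re.compile("ba{2,}")  # "b" followed by two or more "a"s
at_most_four = re.compile("ba{,4}")  # "b" followed by zero to four "a"s


def matched_text(pattern, text):
    """Hypothetical helper: return the matched text, or None if there is no match."""
    m = pattern.match(text)
    return m.group() if m else None


print(matched_text(at_least_two, "ba"))        # one "a" is too few
print(matched_text(at_least_two, "baaaaaaa"))  # no upper limit on the "a"s
print(matched_text(at_most_four, "b"))         # zero "a"s is allowed
print(matched_text(at_most_four, "baaaaaaa"))  # stops after four "a"s
```

Note that in Python an omitted lower bound means zero, so "ba{,4}" also matches a bare "b".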
Quick task
Write a regular expression that matches 3 to 7 digits. So "100", "0234", and "5394212" are all valid strings.
>>> pattern = "[0-9]{3,7}"
>>> re.match(pattern, "5394212")
<re.Match object; span=(0, 7), match='5394212'>
>>> re.match(pattern, "12345678") # Note only up to 7 digits are matched
<re.Match object; span=(0, 7), match='1234567'>
>>> re.match(pattern, "12") # None
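If the goal is to reject strings with more than 7 digits rather than match their first 7, one option is re.fullmatch, which requires the pattern to cover the entire string. The sketch below assumes that stricter reading of the task. The names DIGITS_3_TO_7 and is_three_to_seven_digits are hypothetical.

```python
import re

# Hypothetical name: the same pattern as in the quick task, compiled once.
DIGITS_3_TO_7 = re.compile("[0-9]{3,7}")


def is_three_to_seven_digits(text):
    """Return True only if the whole string is 3 to 7 digits (assumed stricter reading)."""
    return DIGITS_3_TO_7.fullmatch(text) is not None


for candidate in ["100", "0234", "5394212", "12345678", "12", "123a"]:
    print(candidate, is_three_to_seven_digits(candidate))
```

With re.match, "12345678" matches on its first 7 digits. With re.fullmatch it is rejected, because the eighth digit is left over.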