Advanced Lesson 1
Regular Expressions
Chapter 3: Quantifiers
Kleene star
So far we have only written regular expressions for a fixed number of characters.
What if I want a regular expression that accepts all of the following: b, ba, baa, baaa, baaaa, baaaaa, baaaaaa, baaaaaaa, baaaaaaaa, baaaaaaaaa, etc.?
This is easy. Just write "ba*", where * means “zero or more repetitions of the preceding element” (here, the letter a).
>>> import re
>>> pattern = "ba*"
>>> re.match(pattern, "b")
<re.Match object; span=(0, 1), match='b'>
>>> re.match(pattern, "ba")
<re.Match object; span=(0, 2), match='ba'>
>>> re.match(pattern, "baaaaaaaaaaaaa")
<re.Match object; span=(0, 14), match='baaaaaaaaaaaaa'>
The star * is called the Kleene star, named after the mathematician Stephen Kleene.
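One detail worth checking by hand: re.match only anchors the pattern at the start of the string, so it will also report a match for a string like "bab" (it matches the prefix "ba"). The sketch below, which uses a made-up list of candidate strings, contrasts that with re.fullmatch, which requires the whole string to fit "ba*".

```python
import re

pattern = re.compile(r"ba*")

# Hypothetical candidates chosen for illustration.
candidates = ["b", "ba", "baaa", "bab", "ab", ""]

# re.match succeeds on "bab" because the prefix "ba" fits the pattern.
prefix_hit = pattern.match("bab").group()

# fullmatch succeeds only if the entire string is a "b" followed by zero or more "a".
accepted = [s for s in candidates if pattern.fullmatch(s)]

print(prefix_hit)  # ba
print(accepted)    # ['b', 'ba', 'baaa']
```

If the goal is to validate a whole string rather than find a prefix, fullmatch (or an explicit "$" at the end of the pattern) is probably the safer choice.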
Quick task
Write a regular expression where the first letter is a "b", followed by zero or more characters (any character except "!"), and followed by a "!".
Example valid strings: b!, ba!, bdsf^123!
Example invalid strings: daaad!, ba1a?2@cd
>>> pattern = "b.*!"
>>> re.match(pattern, "b!")
<re.Match object; span=(0, 2), match='b!'>
>>> re.match(pattern, "b7$!")
<re.Match object; span=(0, 4), match='b7$!'>
>>> re.match(pattern, "ba1a?2@cd") # None
>>> re.match(pattern, "daaad!") # None
>>>
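You can also use "b[^!]*!" if you want to be more explicit. The two patterns are not strictly equivalent: the wildcard "." also matches "!", so "b.*!" runs to the last "!" in the string, while "b[^!]*!" stops at the first one. For the example strings above both give the same result, so the simple “any character” wildcard will work here.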
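To see where the two patterns diverge, here is a small sketch that runs both against the example strings from the task plus one extra string, "ba!!", which is an assumption added for illustration and is not part of the original exercise. It uses re.fullmatch so that the whole string has to fit.

```python
import re

loose = re.compile(r"b.*!")      # "." may also consume "!"
strict = re.compile(r"b[^!]*!")  # the middle part may not contain "!"

# Task examples, plus "ba!!" (hypothetical extra case).
samples = ["b!", "ba!", "bdsf^123!", "daaad!", "ba1a?2@cd", "ba!!"]

# Map each sample to (accepted by loose, accepted by strict).
results = {
    s: (bool(loose.fullmatch(s)), bool(strict.fullmatch(s)))
    for s in samples
}

for s, (l, st) in results.items():
    print(f"{s!r:14} loose={l} strict={st}")

# With re.match (prefix matching) both patterns match "ba!!",
# but they stop at different places.
print(loose.match("ba!!").group())   # ba!!
print(strict.match("ba!!").group())  # ba!
```

Only "ba!!" separates the two: the wildcard version accepts it in full, while the character-class version rejects it under fullmatch because a second "!" is left over.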