Advanced Lesson 1
Regular Expressions
Chapter 3: Quantifiers
Kleene plus
We previously accepted "b" (with no "a" after the "b"), in addition to "ba", "baa", "baaa", etc.
What if we want at least one "a" after the "b"? After all, a "b" without an "a" is not really a sound a sheep would make!
The solution: add one more "a" before the "a*" to force the first "a"!
>>> import re
>>> pattern = "baa*"
>>> re.match(pattern, "b") # None
>>> re.match(pattern, "ba")
<re.Match object; span=(0, 2), match='ba'>
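To see the effect of the extra "a" side by side, here is a small sketch comparing the two patterns on the same inputs. It assumes the earlier pattern was "ba*" (which accepts exactly the strings listed above); the list name `sounds` is just for illustration.

```python
import re

# "ba*": "b" followed by zero or more "a"s (assumed earlier pattern).
# "baa*": a literal "a" before "a*", so at least one "a" is required.
patterns = ["ba*", "baa*"]
sounds = ["b", "ba", "baa", "baaa"]

for pattern in patterns:
    # re.match returns a Match object (truthy) or None
    accepted = [s for s in sounds if re.match(pattern, s)]
    print(pattern, "accepts", accepted)
```

This should print `ba* accepts ['b', 'ba', 'baa', 'baaa']` followed by `baa* accepts ['ba', 'baa', 'baaa']`: only the lone "b" is dropped.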
You can also use a + (Kleene plus), which represents “one or more times”. This is a shorthand notation for the above! So a+ is the same as aa*.
>>> pattern = "ba+"
>>> re.match(pattern, "b") # None
>>> re.match(pattern, "ba")
<re.Match object; span=(0, 2), match='ba'>
>>> re.match(pattern, "baaaaa")
<re.Match object; span=(0, 6), match='baaaaa'>
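As a quick sanity check that "ba+" and "baa*" behave the same, the sketch below runs both on a few whole strings. It uses `re.fullmatch` (rather than `re.match`) so that the entire string must match; the input list is an arbitrary choice, so this is a spot check, not a proof of equivalence.

```python
import re

# Arbitrary test inputs, including some that should be rejected
inputs = ["", "b", "ba", "baa", "baaaaa", "bb", "aba"]

results = {}
for s in inputs:
    # True if the whole string s matches the pattern
    plus = re.fullmatch("ba+", s) is not None
    star = re.fullmatch("baa*", s) is not None
    results[s] = (plus, star)
    print(repr(s), plus, star)
```

Both columns should agree on every line: True for "ba", "baa" and "baaaaa", False for the rest.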
Quick task
Write a regular expression that matches one or more of the characters x, y or z.
Example valid strings: x, y, z, xx, xy, zxxyx, yxyz, xyzxyz
Example invalid strings: "" (the empty string), aaa, ax
>>> pattern = "[xyz]+"
>>> re.match(pattern, "y")
<re.Match object; span=(0, 1), match='y'>
>>> re.match(pattern, "zyxy")
<re.Match object; span=(0, 4), match='zyxy'>
>>> re.match(pattern, "") # None
>>> re.match(pattern, "ax") # None
>>> re.match(pattern, "baaaaa") # None
Other possible solutions: "[x-z]+", "[x-z][x-z]*", "[xyz][xyz]*"
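The sketch below checks all four solutions against the example strings from the task. It uses `re.fullmatch` because `re.match` only anchors at the start of the string: a string like "xa" (an extra test case, not from the task) would partially match with `re.match`.

```python
import re

solutions = ["[xyz]+", "[x-z]+", "[x-z][x-z]*", "[xyz][xyz]*"]
valid = ["x", "y", "z", "xx", "xy", "zxxyx", "yxyz", "xyzxyz"]
invalid = ["", "aaa", "ax", "xa"]  # "xa" is an extra case, not from the task

for pattern in solutions:
    # re.fullmatch also rejects strings with trailing characters outside x-z
    assert all(re.fullmatch(pattern, s) for s in valid)
    assert not any(re.fullmatch(pattern, s) for s in invalid)

# re.match only anchors at the start, so it accepts the "x" prefix of "xa"
print(re.match("[xyz]+", "xa"))      # <re.Match object; span=(0, 1), match='x'>
print(re.fullmatch("[xyz]+", "xa"))  # None
```

If the task is meant to validate whole strings, `re.fullmatch` (or anchoring the pattern with `$`) is the safer choice.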