Advanced Lesson 1
Regular Expressions
Chapter 2: Regular expression basics
Range
Now let’s say you want to match these six strings: lecture1, lecture2, lecture3, lecture4, lecture5, and lecture6.
You can define your set of valid digits as we discussed earlier: lecture[123456].
An easier way is to define a range of valid values: lecture[1-6].
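As a quick check that the two forms are equivalent, here is a minimal sketch. The list of test strings is made up for illustration, and re.fullmatch is used (rather than re.match) so that the whole string has to fit the pattern.

```python
import re

# Hypothetical sample names; lecture7 and lecture0 fall outside the range.
names = ["lecture1", "lecture3", "lecture6", "lecture7", "lecture0"]

explicit = re.compile(r"lecture[123456]")  # every digit listed by hand
ranged = re.compile(r"lecture[1-6]")       # the same set written as a range

# fullmatch succeeds only if the entire string matches the pattern.
explicit_hits = [n for n in names if explicit.fullmatch(n)]
ranged_hits = [n for n in names if ranged.fullmatch(n)]

print(explicit_hits)
print(ranged_hits)
```

Both lists should contain the same names, since [1-6] is just shorthand for [123456].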
You can also define multiple ranges in one set. For example, [A-Za-z0-9_] will match any single uppercase letter, lowercase letter, digit, or underscore.
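The sketch below tries the multi-range set against a few single characters. The sample characters are chosen only for illustration.

```python
import re

word_char = re.compile(r"[A-Za-z0-9_]")

# Hypothetical sample characters: four inside the set, three outside it.
samples = ["A", "z", "7", "_", "-", " ", "!"]

# Map each character to True if the set accepts it, False otherwise.
results = {s: bool(word_char.fullmatch(s)) for s in samples}

for char, ok in results.items():
    print(repr(char), ok)
```

Note that the set matches exactly one character; the ranges inside the brackets only widen which characters are allowed in that one position.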
As before, you can also match any character except those in a defined range by placing a caret ^ at the start of the set. For example, [^n-p]ot will NOT match not, oot, or pot, but it will match hot.
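To see the negated range in action, here is a small sketch. The word list is an assumption made for the example; Not is included to show that the range n-p is case-sensitive.

```python
import re

pattern = re.compile(r"[^n-p]ot")

# Hypothetical test words: the first three start with n, o, or p.
words = ["not", "oot", "pot", "hot", "mot", "qot", "Not"]

accepted = [w for w in words if pattern.fullmatch(w)]
rejected = [w for w in words if not pattern.fullmatch(w)]

print("accepted:", accepted)
print("rejected:", rejected)
```

The uppercase N is accepted because only the lowercase letters n, o, and p are excluded.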
You can combine ranges with individual characters too. For example, [^n-pbd]ot will not match not, oot, pot, bot, or dot.
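The same kind of check works for a negated set that mixes a range with individual characters. Again, the word list is only an illustration.

```python
import re

# Excludes the range n-p plus the individual letters b and d.
pattern = re.compile(r"[^n-pbd]ot")

# Hypothetical test words: five excluded first letters, then three allowed ones.
words = ["not", "oot", "pot", "bot", "dot", "cot", "got", "lot"]

accepted = [w for w in words if pattern.fullmatch(w)]
rejected = [w for w in words if not pattern.fullmatch(w)]

print("accepted:", accepted)
print("rejected:", rejected)
```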
>>> import re
>>> pattern = "[A-Z][a-z]n"
>>> re.match(pattern, "Can")
<re.Match object; span=(0, 3), match='Can'>
>>> re.match(pattern, "can") # None: first letter is not uppercase
>>> re.match(pattern, "Cap") # None: third letter is not n
>>> re.match(pattern, "CAn") # None: second letter is not lowercase
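One detail worth keeping in mind with these examples: re.match only requires the pattern to fit at the start of the string, so longer strings can still match. The sketch below compares it with re.fullmatch and re.search; the input strings are made up for illustration.

```python
import re

pattern = "[A-Z][a-z]n"

# re.match anchors at the start only, so the extra "ada" is ignored.
prefix = re.match(pattern, "Canada")
print(prefix.group())

# re.fullmatch requires the whole string to fit the pattern.
whole = re.fullmatch(pattern, "Canada")
print(whole)

# re.match does not look past position 0; re.search scans the string.
at_start = re.match(pattern, "I Can")
anywhere = re.search(pattern, "I Can")
print(at_start, anywhere.group())
```

If the three-character length matters, re.fullmatch is the safer choice for checking answers.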
Quick task
Write a regular expression where the first character must be a lowercase letter, the second character must be a lowercase vowel, and the third character can be anything except a digit.
Example valid strings: boy, hi!, caT
Example invalid strings: HEY, art, co2
>>> pattern = "[a-z][aeiou][^0-9]"
>>> re.match(pattern, "hi!")
<re.Match object; span=(0, 3), match='hi!'>
>>> re.match(pattern, "co2") # None
>>> re.match(pattern, "art") # None
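To check the answer against all six example strings at once, a short script like the following could be used. It is a sketch rather than part of the original solution, and the variable names are chosen for this example.

```python
import re

pattern = re.compile(r"[a-z][aeiou][^0-9]")

valid = ["boy", "hi!", "caT"]
invalid = ["HEY", "art", "co2"]

# Every valid string should match; every invalid string should not.
valid_results = [bool(pattern.fullmatch(s)) for s in valid]
invalid_results = [bool(pattern.fullmatch(s)) for s in invalid]

print(dict(zip(valid, valid_results)))
print(dict(zip(invalid, invalid_results)))
```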