Chapter 2: Regular expression basics

Range

Josiah Wang

Now let’s say you want to match these six strings: lecture1, lecture2, lecture3, lecture4, lecture5, and lecture6.

You can list the valid digits in a set, as we discussed earlier: lecture[123456].

An easier way is to define a range of valid values: lecture[1-6].
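
As a quick check, the sketch below compares the explicit set with the range on the candidate names lecture0 to lecture9. The names list and the use of re.fullmatch (which requires the whole string to match) are choices made for this illustration only.

```python
import re

# Hypothetical candidates: lecture0 ... lecture9
names = [f"lecture{i}" for i in range(10)]

# Explicit set of digits versus the equivalent range
explicit = [n for n in names if re.fullmatch("lecture[123456]", n)]
ranged = [n for n in names if re.fullmatch("lecture[1-6]", n)]

print(ranged)              # lecture1 through lecture6
print(explicit == ranged)  # True: both patterns accept the same names
```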

You can also define multiple ranges in one set. For example, [A-Za-z0-9_] will match any single uppercase letter, lowercase letter, digit, or underscore.
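
To see the multi-range set in action, here is a minimal sketch that tests a handful of single characters against [A-Za-z0-9_]. The sample characters are arbitrary and chosen only for illustration.

```python
import re

word_char = re.compile("[A-Za-z0-9_]")

# Map each sample character to whether the set accepts it
samples = ["Q", "q", "7", "_", "-", " ", "!"]
results = {ch: bool(word_char.fullmatch(ch)) for ch in samples}

for ch, ok in results.items():
    print(repr(ch), ok)
```

Note that the hyphens inside the brackets define ranges here, so a literal - is not part of this set.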

As before, you can also match any character except those in a defined range by placing a caret ^ at the start of the set. For example, [^n-p]ot will NOT match not, oot or pot, but it will match hot.

You can combine ranges with individual characters too. For example, [^n-pbd]ot will not match not, oot, pot, bot or dot, but it will still match hot.
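
The two negated sets can be compared side by side. This is a small sketch; the word list is made up for the example, and re.fullmatch is used so that each three-letter word is tested as a whole.

```python
import re

words = ["not", "oot", "pot", "bot", "dot", "hot", "lot", "Not"]

# Excludes only the range n-p as the first character
neg_range = [w for w in words if re.fullmatch("[^n-p]ot", w)]

# Excludes the range n-p plus the individual characters b and d
neg_combined = [w for w in words if re.fullmatch("[^n-pbd]ot", w)]

print(neg_range)     # ['bot', 'dot', 'hot', 'lot', 'Not']
print(neg_combined)  # ['hot', 'lot', 'Not']
```

"Not" passes both patterns because ranges are case-sensitive: uppercase N is not in n-p.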

>>> import re
>>> pattern = "[A-Z][a-z]n"
>>> re.match(pattern, "Can")
<re.Match object; span=(0, 3), match='Can'>
>>> re.match(pattern, "can") # None
>>> re.match(pattern, "Cap") # None
>>> re.match(pattern, "CAn") # None
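
One detail worth keeping in mind: re.match only requires the pattern to match at the start of the string, so longer strings can still match. The sketch below contrasts it with re.fullmatch; the extra test string "Canada" is an addition for illustration.

```python
import re

pattern = "[A-Z][a-z]n"

# For each string: (matches at the start, matches the whole string)
outcomes = {
    text: (bool(re.match(pattern, text)), bool(re.fullmatch(pattern, text)))
    for text in ["Can", "can", "Cap", "CAn", "Canada"]
}

for text, (at_start, whole) in outcomes.items():
    print(text, at_start, whole)
```

"Canada" matches with re.match because its first three characters are "Can", but it fails re.fullmatch.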

Quick task

Write a regular expression where the first character must be a lowercase letter, the second character must be a lowercase vowel, and the third character can be anything except a digit.

Example valid strings: boy, hi!, caT

Example invalid strings: HEY, art, co2

>>> pattern = "[a-z][aeiou][^0-9]"
>>> re.match(pattern, "hi!")
<re.Match object; span=(0, 3), match='hi!'>
>>> re.match(pattern, "co2") # None
>>> re.match(pattern, "art")  # None
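
To check this pattern against every example string from the task in one go, a short sketch like the following can help. The variable names valid and invalid are introduced here for illustration.

```python
import re

pattern = "[a-z][aeiou][^0-9]"

valid = ["boy", "hi!", "caT"]
invalid = ["HEY", "art", "co2"]

# Record whether each example matches the pattern
valid_results = [bool(re.match(pattern, s)) for s in valid]
invalid_results = [bool(re.match(pattern, s)) for s in invalid]

print(valid_results)    # [True, True, True]
print(invalid_results)  # [False, False, False]
```

"art" fails because its second character, r, is not a vowel, and "co2" fails because its third character is a digit.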