Matching exercises
What better way is there for you to test your understanding of regular expressions than to do it yourself?
Try these exercises (and test them in Python)!
Many thanks to the following people for contributing to these exercises: Josiah Wang
Task #1
Try to guess what the following regular expressions mean. Then verify it with re.match(), using both correct and incorrect examples. Note the effects of having ^/$ at the beginning/end of the regular expressions. Also check whether ^ makes any difference when using re.search() instead of re.match().
^a*b+$
a(ba)*b$
^(a|ba|bb).*
a{2}b{1,3}c?[d-f2-9]+
(red|green|blue)((,| and) (red|green|blue))* colou?rs?
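One possible way to run these checks is sketched below. The pattern x+y and the helper is_match are made up for illustration and are not part of the exercise; swap in the expressions above and your own test strings.

```python
import re

# Hypothetical pattern, deliberately not one of the exercise expressions.
PATTERN = r"x+y"

def is_match(func, pattern, text):
    """Return True if func (re.match or re.search) finds a match in text."""
    return func(pattern, text) is not None

# re.match only tries to match at the start of the string,
# but it does not require the match to reach the end.
print(is_match(re.match, PATTERN, "xxyz"))         # True: trailing "z" is ignored
print(is_match(re.match, PATTERN + "$", "xxyz"))   # False: $ requires the end of the string

# re.search scans the whole string unless ^ pins it to the start.
print(is_match(re.search, PATTERN, "axxy"))        # True: match found after the "a"
print(is_match(re.search, "^" + PATTERN, "axxy"))  # False: ^ forbids skipping the "a"
```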
Task #2
Write regular expressions for the following languages. Test your regular expressions using re.match().
[Credits: Q1-4 are adapted from Speech and Language Processing, Jurafsky & Martin]
- The set of all alphabetic strings
  - Valid: cake, HeLLo WorLd, Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch, empty string
  - Invalid: 123, backstreet_boys, mumboNumber5
- The set of all lower case alphabetic strings ending in a b
  - Valid: bob, climb, door knob
  - Invalid: Bob, britney spears, WANNABE
- The set of all strings with two consecutive repeated words
  - Valid: baby baby, no no, yeah yeah
  - Invalid: my love, no baby no, sweet dreams
- The set of all strings from the alphabet {a, b} such that each a is immediately preceded by and immediately followed by a b
  - Valid: bab, babbab, bababababab, bbbabbbbbbabb, empty string
  - Invalid: a, abba, bbbb, bbaba
- The set of strings of lowercase letters, where there must be an a at every odd position in the string.
  - Valid: a, aca, aaaba, acafa, aaaaaa
  - Invalid: b, baba, acda, aeaaf
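To check a candidate expression against the valid and invalid lists in one go, a small helper along the following lines may be handy. It is only a sketch: the function name misclassified and the toy language (non-empty strings of digits) are assumptions made for illustration and do not answer any of the questions above.

```python
import re

def misclassified(pattern, valid, invalid):
    """Return the test strings that re.match() classifies wrongly for this pattern."""
    wrong = []
    for text in valid:
        if re.match(pattern, text) is None:
            wrong.append(("should match", text))
    for text in invalid:
        if re.match(pattern, text) is not None:
            wrong.append(("should not match", text))
    return wrong

# Toy language (hypothetical): non-empty strings of digits.
valid = ["7", "2024"]
invalid = ["", "12a", "a12"]

# Without $, re.match() accepts "12a" because the prefix "12" matches.
print(misclassified(r"[0-9]+", valid, invalid))   # [('should not match', '12a')]

# With $, the whole string has to fit the pattern.
print(misclassified(r"[0-9]+$", valid, invalid))  # []
```

An empty list means the expression agrees with every example, which is necessary but not sufficient: add your own edge cases as well.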
Task #3
Write regular expressions for the following languages. Test your regular expressions using re.match(). You can make any further assumptions if something is not explicitly specified as allowed or disallowed.
- Python identifiers. Remember that Python identifiers can consist of uppercase and lowercase letters, digits, or underscores (_), and must start with a letter or an underscore. Python keywords are allowed for simplicity.
  - Valid: my_variable123, _12is_thisvalid, for, if
  - Invalid: 123abc, 1, empty string
- Python assignment statements, i.e. LHS = RHS. For simplicity, assume that the LHS can be any valid identifier or keyword (as above). The RHS can be an integer or a float (positive or negative). The spaces before and after = are optional.
  - Valid: my_variable123 = 5, if= 1.23, x=2., y =0.678, abc = -2.1
  - Invalid: 1 = 4, my_var = your_var, var, x + 5
- Python comparison expressions. For simplicity, assume that the valid operators are ==, <=, >=, <, >, and !=. Also assume that both LHS and RHS can take either a Python identifier or keyword, or a positive integer. Again, white spaces before and after the operator are optional.
  - Valid: 12 == 14, x >=2, for< if, 15!=age
  - Invalid: x = 2, av<= -5, 123abc > abc123, hello, 5+1
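For expressions of the form LHS operator RHS, it can help to build the full pattern from smaller named pieces. The sketch below does this for a toy grammar, NUMBER + NUMBER, which is an assumption chosen for illustration and is not one of the languages above. Note that \s also matches tabs and newlines, not only spaces.

```python
import re

# Toy grammar (hypothetical): NUMBER + NUMBER, with optional whitespace
# around the operator. NUMBER and SUM are illustrative names.
NUMBER = r"[0-9]+"
SUM = rf"^{NUMBER}\s*\+\s*{NUMBER}$"

for text in ["5+1", "12 + 7", "3 +", "+4", "1 - 2"]:
    # Prints each test string with True (accepted) or False (rejected).
    print(repr(text), bool(re.match(SUM, text)))
```

Because the pieces are interpolated with an f-string, any literal { or } needed inside the pattern (for example in a quantifier such as {1,3}) has to be doubled.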
Task #4
(This has/will have been discussed in the live lecture, but try it again yourself as a refresher!)
Develop a regular expression to validate a given email address.
For simplicity, we will use these rules:
- An email address is in the form USER@HOST.EXT
- USER can be made up of one or more uppercase or lowercase letters, digits, underscores, dots (.), +, - and ?.
- HOST can be made up of one or more uppercase or lowercase letters, digits, or hyphens (-). It must not end with a hyphen (-), and must not be made up of only digits.
- EXT can be one of co, com, or org.
Stress test your regular expression with good and bad cases.
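One way to organise a stress test is to label each case with the rule it probes, as in the sketch below. The pattern NAIVE is deliberately too permissive and is not a solution; it and the example addresses are assumptions made up for illustration. Running the cases against it shows which rules it fails to enforce.

```python
import re

# Deliberately naive pattern (hypothetical, NOT a solution to the task).
NAIVE = r"^.+@.+\.(co|com|org)$"

# Hypothetical test cases, each labelled with the rule it probes.
GOOD = [
    ("plain address", "alice@example.com"),
    ("punctuation allowed in USER", "a.b+c-d_e?@host.org"),
    ("digits mixed with letters in HOST", "bob@abc123.co"),
    ("hyphen inside HOST", "bob@my-host.com"),
]
BAD = [
    ("HOST ends with a hyphen", "bob@host-.com"),
    ("HOST is only digits", "bob@12345.com"),
    ("unknown EXT", "bob@host.net"),
    ("missing USER", "@host.com"),
    ("missing @", "bobhost.com"),
]

def failures(pattern):
    """Return (rule, address, problem) for every case the pattern gets wrong."""
    wrong = []
    for rule, address in GOOD:
        if re.match(pattern, address) is None:
            wrong.append((rule, address, "rejected but should be accepted"))
    for rule, address in BAD:
        if re.match(pattern, address) is not None:
            wrong.append((rule, address, "accepted but should be rejected"))
    return wrong

for rule, address, problem in failures(NAIVE):
    print(f"{address}: {problem} ({rule})")
```

With the naive pattern, the two HOST cases are reported; your own expression should produce no output here, and ideally survive further cases that you add yourself.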