This is an archived version of the course. Please find the latest version of the course on the main webpage.

Chapter 5: Exercises

Building parsers

Josiah Wang

Task 3: Building parsers

One practical application of regular expressions is checking for syntax errors in a programming language. For example, you can check whether a Python variable name is valid.

Let’s try this for a bit!

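Write regular expressions for each of the following. Test your regular expressions using re.match() (or an online regular expression tester). Note that re.match() only anchors the pattern at the start of the string, so either end your pattern with $ or use re.fullmatch() to reject strings with trailing invalid characters. You may make further assumptions wherever something is not explicitly specified as allowed or disallowed.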
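A small harness makes it easier to run one pattern against a whole list of test strings. The sketch below is one way to set this up; the helper name check and the toy "exactly three digits" pattern are illustrative assumptions and not part of the exercise. It uses re.fullmatch() so that the entire string has to match, and also shows how re.match() accepts a string that merely starts with a match.

```python
import re


def check(pattern, candidates):
    """Map each candidate string to True if the WHOLE string matches pattern."""
    compiled = re.compile(pattern)
    return {text: compiled.fullmatch(text) is not None for text in candidates}


# Hypothetical toy pattern (not one of the questions): exactly three digits.
results = check(r"\d{3}", ["123", "1234", "12a", ""])
for text, ok in results.items():
    print(repr(text), "valid" if ok else "invalid")

# re.match() only anchors at the start, so "1234" is accepted here
# even though it has four digits.
prefix_only = re.match(r"\d{3}", "1234") is not None
print("re.match accepts '1234':", prefix_only)
```

The same check() helper can be reused for each question by swapping in your own pattern and the valid and invalid examples listed.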

Question 1: Python identifiers

Write a regular expression representing valid Python identifiers. Remember that Python identifiers can consist of uppercase and lowercase letters, digits, or underscores (_), and must start with a letter or an underscore. For simplicity, Python keywords like if are allowed.

  • Valid: my_variable123, _12is_thisvalid, for, if
  • Invalid: 123abc, 1, haha!, empty string
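If you want to compare your attempt against something, here is one possible answer rather than the definitive one. It assumes that "letters" means ASCII letters only (real Python 3 also accepts many Unicode letters), and the names IDENTIFIER and is_identifier are chosen just for this sketch.

```python
import re

# First character: a letter or underscore; then any number of
# letters, digits, or underscores. ASCII-only is an assumption.
IDENTIFIER = r"[A-Za-z_][A-Za-z0-9_]*"


def is_identifier(text):
    """Return True if the whole string is a valid identifier under the rules above."""
    return re.fullmatch(IDENTIFIER, text) is not None


for text in ["my_variable123", "_12is_thisvalid", "for", "if",
             "123abc", "1", "haha!", ""]:
    print(repr(text), is_identifier(text))
```

The character class is spelled out instead of using \w because \w also matches non-ASCII letters and digits on Python 3 strings, which goes beyond what the question describes.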

Question 2: Python assignment statements

Write a regular expression representing valid Python assignment statements, i.e. LHS = RHS. For simplicity, assume that the LHS can be any valid identifier or keyword (as above). The RHS can be either an integer or a float (positive or negative). The spaces before and after = are optional.

  • Valid: my_variable123 = 5, if= 1.23, x=2., y =0.678, abc = -2.1
  • Invalid: 1 = 4, my_var = your_var, var, x + 5

Hint: It can get a bit messy and hacky! It’s fine - that’s how regular expressions are!
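As a point of comparison for your own attempt, the sketch below builds the pattern from smaller pieces to keep the messiness manageable. It is one possible answer under some extra assumptions that the question leaves open: only plain space characters are allowed around =, a leading + sign is rejected, and a float may omit the digits on either side of the decimal point (2. and .5) but not both. The names IDENT, NUMBER, ASSIGNMENT, and is_assignment are hypothetical.

```python
import re

IDENT = r"[A-Za-z_][A-Za-z0-9_]*"
# Optional minus sign, then either digits with an optional fractional part
# ("5", "2.", "1.23") or a bare fractional part (".5").
NUMBER = r"-?(?:\d+\.?\d*|\.\d+)"
# Zero or more spaces on each side of the equals sign.
ASSIGNMENT = rf"{IDENT} *= *{NUMBER}"


def is_assignment(text):
    """Return True if the whole string is LHS = RHS under the assumptions above."""
    return re.fullmatch(ASSIGNMENT, text) is not None


for text in ["my_variable123 = 5", "if= 1.23", "x=2.", "y =0.678", "abc = -2.1",
             "1 = 4", "my_var = your_var", "var", "x + 5"]:
    print(repr(text), is_assignment(text))
```

Composing the pattern from named parts means the identifier rule from Question 1 is written once and reused, which is easier to debug than one long expression.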

Question 3: Python comparison expressions

Write a regular expression representing valid Python comparison expressions, e.g. LHS == RHS. For simplicity, assume that the valid operators are ==, <=, >=, <, >, and !=. Also assume that both LHS and RHS can be either a Python identifier or keyword (as above) or a positive integer. Again, the spaces before and after the operator are optional.

  • Valid: 12 == 14, x >=2, for< if, 15!=age
  • Invalid: x = 2, av<= -5, 123abc > abc123, hello, 5+1
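For checking your own solution, here is one possible way to assemble the comparison pattern. It is a sketch under assumptions: only plain spaces are allowed around the operator, and a "positive integer" is any run of one or more digits (so 0 and 007 are accepted). The names IDENT, OPERAND, OPERATOR, COMPARISON, and is_comparison are made up for this example.

```python
import re

IDENT = r"[A-Za-z_][A-Za-z0-9_]*"
# An operand is either an identifier/keyword or an unsigned run of digits.
OPERAND = rf"(?:{IDENT}|\d+)"
# Two-character operators are listed before < and > for readability.
OPERATOR = r"(?:==|<=|>=|!=|<|>)"
COMPARISON = rf"{OPERAND} *{OPERATOR} *{OPERAND}"


def is_comparison(text):
    """Return True if the whole string is LHS <op> RHS under the assumptions above."""
    return re.fullmatch(COMPARISON, text) is not None


for text in ["12 == 14", "x >=2", "for< if", "15!=age",
             "x = 2", "av<= -5", "123abc > abc123", "hello", "5+1"]:
    print(repr(text), is_comparison(text))
```

Because a single = is not in the operator list and operands cannot carry a sign, x = 2 and av<= -5 are rejected without any special-case handling.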