This is an archived version of the course and is no longer updated. Please find the latest version of the course on the main webpage.

Metacharacters and special characters

As a reminder, here are the special metacharacters for regular expressions in Python:

  • .: Match any single character except a newline.
  • ^: Match the start of a string, or “not” if used as the first character inside [ ].
  • $: Match the end of a string.
  • ?: Match zero or one repetition of the preceding element.
  • *: Match zero or more repetitions of the preceding element.
  • +: Match one or more repetitions of the preceding element.
  • [ ]: Match a set of characters
  • ( ): Grouping
  • |: Or
  • {m}: Match exactly m repetitions
  • {m,n}: Match m to n repetitions. Omitting n (i.e. {m,}) removes the upper bound, so it matches m or more repetitions.
  • \: Escape characters. Use this to represent any of the metacharacters above (e.g. “\?” if you want to match a question mark)
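
The metacharacters above can be tried out with a small sketch like the following. The patterns and sample strings are made up for illustration; each tuple holds a pattern, a text to search, and the list that re.findall is expected to return.

```python
import re

# (pattern, text, expected re.findall result); all examples are illustrative
examples = [
    (r"c.t", "cat cot c\nt", ["cat", "cot"]),            # . does not match the newline
    (r"^ab", "abab", ["ab"]),                            # ^ anchors at the start
    (r"ab$", "abab", ["ab"]),                            # $ anchors at the end
    (r"[^0-9]+", "abc123def", ["abc", "def"]),           # ^ inside [ ] negates the set
    (r"colou?r", "color colour", ["color", "colour"]),   # ? zero or one "u"
    (r"ab*", "a ab abbb", ["a", "ab", "abbb"]),          # * zero or more "b"
    (r"ab+", "a ab abbb", ["ab", "abbb"]),               # + one or more "b"
    (r"[aeiou]", "regex", ["e", "e"]),                   # [ ] set of characters
    (r"cat|dog", "cat dog bird", ["cat", "dog"]),        # | alternation
    (r"a{2}", "a aa aaa", ["aa", "aa"]),                 # {m} exactly m
    (r"a{2,3}", "a aa aaa aaaa", ["aa", "aaa", "aaa"]),  # {m,n} between m and n
    (r"a{2,}", "a aa aaaa", ["aa", "aaaa"]),             # {m,} m or more
    (r"\?", "why? how?", ["?", "?"]),                    # \ escapes a metacharacter
]

results = [re.findall(pattern, text) for pattern, text, _ in examples]
for (pattern, text, expected), found in zip(examples, results):
    print(f"{pattern!r} on {text!r} -> {found}")

# ( ) groups a sub-pattern so that + applies to "ab" as a unit
m = re.search(r"(ab)+", "ababab")
whole, last_group = m.group(0), m.group(1)
```

Note that re.findall returns the contents of the group rather than the whole match when the pattern contains a single group, which is why the grouping example uses re.search instead.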

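Here are also some special sequences that can be used as shorthand for the corresponding character sets. The equivalences shown hold for ASCII text; with Unicode (str) patterns, \d, \s and \w also match non-ASCII digits, whitespace and word characters unless the re.ASCII flag is set: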

  • \d == [0-9] (“digits”)
  • \D == [^0-9] (“non-digits”)
  • \s == [ \t\n\r\f\v] (“whitespace”)
  • \S == [^ \t\n\r\f\v] (“non-whitespace”)
  • \w == [a-zA-Z0-9_] (“word”)
  • \W == [^a-zA-Z0-9_] (“non-word”)
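
As a quick check of these shorthands, here is a minimal sketch that runs each of them over one sample string. The sample text and variable names are invented for illustration.

```python
import re

text = "Room 42,\tfloor_3\n"  # illustrative sample containing a space, a tab and a newline

digits = re.findall(r"\d+", text)      # runs of digits
non_digits = re.findall(r"\D+", text)  # runs of anything that is not a digit
spaces = re.findall(r"\s", text)       # individual whitespace characters
non_spaces = re.findall(r"\S+", text)  # runs of non-whitespace
words = re.findall(r"\w+", text)       # runs of letters, digits and underscore
non_words = re.findall(r"\W+", text)   # runs of everything else

# With str patterns, \d also matches non-ASCII digits unless re.ASCII is given.
arabic_three = "\u0663"  # ARABIC-INDIC DIGIT THREE
unicode_digits = re.findall(r"\d", "3 " + arabic_three)
ascii_digits = re.findall(r"\d", "3 " + arabic_three, flags=re.ASCII)

print(digits, words, non_spaces)
```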

Also remember the curse of the backslash \. In an ordinary Python string literal, a single backslash must be escaped with another backslash: "\\". For example, to pass the regex escape sequence \n (a backslash followed by n, which the regex engine reads as a newline) to the re module, write "\\n". To match one literal backslash, the regex itself needs two backslashes \\, so the string literal needs four: "\\\\". Alternatively, you can use a Python raw string literal: r"\n" is equivalent to "\\n", and r"\\" is equivalent to "\\\\".
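
The escaping rules can be verified directly in Python. The sketch below uses a made-up Windows-style path as sample data and compares ordinary and raw string literals.

```python
import re

# An ordinary literal "\\" is one backslash character; "\\n" is backslash + n.
single_backslash_len = len("\\")
newline_escape_same = ("\\n" == r"\n")     # both are the two characters \ and n
double_backslash_same = ("\\\\" == r"\\")  # both are two backslash characters

path = "C:\\temp\\file"  # illustrative text containing two literal backslashes

# The regex \\ matches one literal backslash; written raw or with four backslashes.
found_raw = re.findall(r"\\", path)
found_escaped = re.findall("\\\\", path)
parts = re.split(r"\\", path)

# The regex escape \n matches an actual newline character in the text.
newline_span = re.search(r"\n", "a\nb").span()

print(parts, newline_span)
```

Raw strings are usually the easier choice for regular expressions, since the pattern then appears in the source exactly as the regex engine will see it.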