Metacharacters and special characters
As a reminder, here are the special metacharacters for regular expressions in Python:
. : Match any character except a newline.
^ : Match the start of a string, or “not” if used as the first character inside [ ].
$ : Match the end of a string.
? : Match zero or one repetitions.
* : Match zero or more repetitions.
+ : Match one or more repetitions.
[ ] : Match a set of characters.
( ) : Grouping.
| : Or.
{m} : Match exactly m repetitions.
{m,n} : Match m to n repetitions. Omitting n (as in {m,}) means there is no upper limit.
\ : Escape character. Use it to match any of the metacharacters above literally (e.g. “\?” if you want to match a question mark).
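A quick sketch of each metacharacter in action, using Python's standard re module. The sample strings are made up for illustration, and the patterns are written as raw strings (r"...") so that backslashes reach the regex engine unchanged.

```python
import re

# "." matches any single character except a newline
assert re.findall(r"c.t", "cat cot c\nt") == ["cat", "cot"]

# "^" anchors to the start of the string, "$" to the end
assert re.search(r"^abc$", "abc") is not None
assert re.search(r"^abc$", "xabc") is None

# "^" as the first character inside [ ] negates the set
assert re.findall(r"[^aeiou]", "banana") == ["b", "n", "n"]

# Quantifiers: ? (zero or one), * (zero or more), + (one or more)
assert re.fullmatch(r"colou?r", "color") is not None
assert re.fullmatch(r"ab*", "a") is not None
assert re.fullmatch(r"ab+", "a") is None

# {m} exactly m, {m,n} between m and n, {m,} at least m
assert re.fullmatch(r"a{3}", "aaa") is not None
assert re.fullmatch(r"a{2,3}", "aaaa") is None
assert re.fullmatch(r"a{2,}", "aaaaaaa") is not None

# ( ) groups a subpattern, | chooses between alternatives
m = re.fullmatch(r"(cat|dog)s?", "dogs")
assert m.group(1) == "dog"

# "\" escapes a metacharacter so it matches literally
assert re.findall(r"\?", "Why? How?") == ["?", "?"]
```

Using assertions rather than print calls keeps each line self-checking; re.fullmatch is used for the quantifier checks so that a partial match cannot hide a failure.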
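Here are also some special sequences that can be used as shorthand for the corresponding regular expressions:
\d == [0-9] (“digits”)
\D == [^0-9] (“non-digits”)
\s == [ \t\n\r\f\v] (“whitespace”)
\S == [^ \t\n\r\f\v] (“non-whitespace”)
\w == [a-zA-Z0-9_] (“word”)
\W == [^a-zA-Z0-9_] (“non-word”)
These equivalences are exact for bytes patterns or when the re.ASCII flag is set. For ordinary str patterns, Python 3 also lets \d, \s and \w match Unicode digits, whitespace and word characters (for example, \w matches “é”), and \D, \S and \W exclude them accordingly.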
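A small sketch of the shorthand sequences on a made-up sample string. The last few checks compare default (Unicode) matching with the re.ASCII flag; the accented letter and the Arabic-Indic digit are written as \u escapes only to keep the source plain ASCII.

```python
import re

# "\u00e9" is the accented letter "é"
text = "Order 42 shipped\tto caf\u00e9_7"

# \d matches digits, \D matches everything else
assert re.findall(r"\d+", text) == ["42", "7"]
assert re.findall(r"\D+", text) == ["Order ", " shipped\tto caf\u00e9_"]

# \s matches whitespace (here: spaces and a tab), \S matches everything else
assert re.findall(r"\s", text) == [" ", " ", "\t", " "]
assert re.findall(r"\S+", text) == text.split()

# \w matches word characters; in str patterns this includes letters like "é"
assert re.findall(r"\w+", text) == ["Order", "42", "shipped", "to", "caf\u00e9_7"]

# With re.ASCII, \w behaves exactly like [a-zA-Z0-9_], so "é" acts as a separator
assert re.findall(r"\w+", text, re.ASCII) == ["Order", "42", "shipped", "to", "caf", "_7"]
assert re.findall(r"[a-zA-Z0-9_]+", text) == re.findall(r"\w+", text, re.ASCII)

# Likewise, \d matches non-ASCII decimal digits unless re.ASCII is set
arabic_three = "\u0663"  # ARABIC-INDIC DIGIT THREE
assert re.fullmatch(r"\d", arabic_three) is not None
assert re.fullmatch(r"\d", arabic_three, re.ASCII) is None
assert re.fullmatch(r"[0-9]", arabic_three) is None
```

If you need the strict bracket-expression behaviour listed above, passing re.ASCII (or writing the bracket expression directly) is the safer choice.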
Also remember the curse of the backslash \. In an ordinary Python string literal, a single backslash must be escaped with another backslash: "\\". For example, to pass the regex escape sequence \n (a backslash followed by n) to the regex engine, write "\\n". If the pattern needs two backslashes \\ (which is how a regex matches one literal backslash), you need four backslashes in the literal: "\\\\". Alternatively, you can use a Python raw string literal: r"\n" is equivalent to "\\n", and r"\\" is equivalent to "\\\\".
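A short sketch of the backslash rules, checking string lengths and equalities directly so the number of backslashes that actually reaches re is visible. The Windows-style path is a made-up example.

```python
import re

# "\\n" in an ordinary literal is two characters: a backslash and "n"
assert len("\\n") == 2
assert "\\n" == r"\n"
# "\n" is a single newline character
assert len("\n") == 1

# Both forms match a newline: the first passes a literal newline to re,
# the second passes the regex escape \n, which re interprets as a newline
assert re.search("\n", "a\nb") is not None
assert re.search("\\n", "a\nb") is not None

# Four backslashes in an ordinary literal give the two-character pattern \\
assert "\\\\" == r"\\"
assert len(r"\\") == 2

# The pattern \\ matches one literal backslash
path = "C:\\Users\\me"  # contains two single backslashes
assert re.findall("\\\\", path) == ["\\", "\\"]
assert re.findall(r"\\", path) == ["\\", "\\"]

# Other regex escapes follow the same rule, e.g. \d
assert re.findall("\\d+", "a1b22") == re.findall(r"\d+", "a1b22") == ["1", "22"]
```

Raw strings are usually the more readable choice for patterns, since what you type is exactly what the regex engine sees.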