re.split()
The re module also provides a split() function (or a .split() method on a Pattern object) that splits a string wherever a given regular expression matches. This is similar to str.split(), except that the delimiter can now be a regular expression.
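As a rough illustration of the three forms mentioned above, the sketch below splits on runs of digits. The sample strings and variable names are made up for this example.

```python
import re

text = "a1b22c333d"  # hypothetical sample input

# str.split() only accepts a fixed delimiter string.
fixed_parts = "a,b,c".split(",")

# re.split() accepts a pattern: here, one or more digits.
func_parts = re.split(r"\d+", text)

# The same split using the .split() method of a compiled Pattern object.
digits = re.compile(r"\d+")
method_parts = digits.split(text)

print(fixed_parts)   # ['a', 'b', 'c']
print(func_parts)    # ['a', 'b', 'c', 'd']
print(method_parts)  # ['a', 'b', 'c', 'd']
```

Compiling the pattern first is mainly worthwhile when the same pattern is reused many times.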
Let’s say we want to split our string wherever there is a run of non-word characters, that is, anything other than letters, digits, and the underscore.
>>> pattern = r"\W+"
>>> string = "doe, a deer, a female deer."
>>> re.split(pattern, string)
['doe', 'a', 'deer', 'a', 'female', 'deer', '']
In the code above, the string is split on runs of non-word characters such as commas, spaces, and the full stop. Note the empty string at the end of the resulting list: the string ends with a delimiter (the full stop), so the piece after it is empty.
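If the empty strings are not wanted, one common approach is to filter them out after splitting. The sketch below assumes the same sample sentence, plus a second made-up string that starts with a delimiter to show that an empty string can appear at the front as well.

```python
import re

string = "doe, a deer, a female deer."

# A delimiter at the very end leaves an empty string as the last piece.
parts = re.split(r"\W+", string)

# Keep only the non-empty pieces.
words = [p for p in parts if p]

# A delimiter at the very start leaves an empty string as the first piece.
leading = re.split(r"\W+", "...hello, world")

print(parts)    # ['doe', 'a', 'deer', 'a', 'female', 'deer', '']
print(words)    # ['doe', 'a', 'deer', 'a', 'female', 'deer']
print(leading)  # ['', 'hello', 'world']
```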
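You can also limit the maximum number of splits with the maxsplit argument. Let’s say we only want to split the string at the first 3 points. Pass maxsplit by keyword, since recent Python versions deprecate passing it positionally.
>>> re.split(pattern, string, maxsplit=3)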
['doe', 'a', 'deer', 'a female deer.']
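A small sketch of how the split limit behaves, using the same sample sentence: a limit of 1 separates the first word from the rest, and a limit of 0 (the default) means no limit at all. The variable names are chosen for this example only.

```python
import re

string = "doe, a deer, a female deer."

# One split: the first word, then the untouched remainder.
first, rest = re.split(r"\W+", string, maxsplit=1)

# maxsplit=0 is the default and means "no limit".
unlimited = re.split(r"\W+", string, maxsplit=0)

print(first)      # doe
print(rest)       # a deer, a female deer.
print(unlimited)  # ['doe', 'a', 'deer', 'a', 'female', 'deer', '']
```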
If you need to keep the delimiters as well, wrap the pattern in a capturing group:
>>> pattern2 = r"(\W+)"
>>> re.split(pattern2, string)
['doe', ', ', 'a', ' ', 'deer', ', ', 'a', ' ', 'female', ' ', 'deer', '.', '']
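Because the captured delimiters are kept in the list, joining the pieces back together should reproduce the original string. With a single capturing group, pieces and delimiters alternate, so slicing can separate them again. This is a sketch under that single-group assumption; a pattern with several groups would interleave more items per split.

```python
import re

string = "doe, a deer, a female deer."
pieces = re.split(r"(\W+)", string)

# Nothing is discarded, so joining restores the input.
rebuilt = "".join(pieces)

# Even positions hold the text between delimiters, odd positions the delimiters.
tokens = pieces[::2]
delimiters = pieces[1::2]

print(rebuilt == string)  # True
print(tokens)             # ['doe', 'a', 'deer', 'a', 'female', 'deer', '']
print(delimiters)         # [', ', ' ', ', ', ' ', ' ', '.']
```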