Regular Expressions in Python
Regular expressions let us find content inside strings matching a particular format.
By formulating a regular expression with a special syntax, you can
- search for text in a string
- replace substrings in a string
- extract information from a string
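Here is a quick preview of those three tasks, as a minimal sketch. It uses re.sub() and re.findall(), two functions of the re module that this article doesn't otherwise cover, and the sample text is made up for illustration.

```python
import re

# Made-up sample text for illustration
text = 'Call 555-1234 or 555-5678'
pattern = r'\d{3}-\d{4}'  # three digits, a dash, four digits

# Search: is there a phone-like number anywhere in the text?
found = re.search(pattern, text) is not None

# Replace: mask every number with a placeholder (re.sub)
masked = re.sub(pattern, 'XXX-XXXX', text)

# Extract: collect every number into a list (re.findall)
numbers = re.findall(pattern, text)

print(found)    # True
print(masked)   # Call XXX-XXXX or XXX-XXXX
print(numbers)  # ['555-1234', '555-5678']
```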
The re module in the Python standard library gives us a set of tools to work with regular expressions.
Among others, it offers the following functions:
- re.match() checks for a match at the beginning of the string
- re.search() checks for a match anywhere in the string
Both take 3 parameters: the pattern, the string to search, and optional flags.
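To see the difference between the two functions, here is a small sketch that reuses the sentence from the examples later in this article. The pattern Roger is present in the string but not at its beginning, so only re.search() finds it.

```python
import re

text = 'My dog name is Roger'

# re.match() only looks at the beginning of the string
at_start = re.match(r'Roger', text)
# re.search() scans the whole string and returns the first match
anywhere = re.search(r'Roger', text)

print(at_start)   # None
print(anywhere)   # <re.Match object; span=(15, 20), match='Roger'>
```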
Before talking about how to use them, let’s introduce the basics of a regular expression pattern.
The pattern is a string, usually written as a raw string literal, r'', so that Python passes backslashes through to the regex engine unchanged. Inside it, we can use special combinations of characters to capture the values we want.
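Why the raw string matters: a minimal sketch. It uses \b, the regex “word boundary” anchor, which isn’t covered in this article. It was picked because in a normal Python string '\b' becomes a backspace character before re ever sees it.

```python
import re

# With the r prefix, the backslash is kept as-is for the regex engine
raw_pattern = r'\d+'
# Without it, the backslash itself has to be escaped to get the same pattern
escaped_pattern = '\\d+'
print(raw_pattern == escaped_pattern)  # True: both are the 3 characters \ d +

# In a normal string, Python turns '\b' into a backspace character,
# so the regex looks for backspaces instead of word boundaries
no_raw = re.search('\bcat\b', 'a cat sat')
with_raw = re.search(r'\bcat\b', 'a cat sat')
print(no_raw)    # None
print(with_raw)  # <re.Match object; span=(2, 5), match='cat'>
```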
For example:
- . matches any single character (except the newline character)
- \w matches any word character: a letter, a digit or the underscore ([a-zA-Z0-9_] for ASCII text)
- \W matches any character that is not a word character
- \d matches any digit
- \D matches anything that’s not a digit
- \s matches whitespace
- \S matches anything that’s not whitespace
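A quick way to see what each class matches is re.findall(). This function of the re module isn't covered here; it returns every non-overlapping match as a list. The sketch below runs a few of the classes above against a made-up string.

```python
import re

# Made-up sample text for illustration
text = 'Room 42, floor_3!'

digits = re.findall(r'\d', text)        # ['4', '2', '3']
non_word = re.findall(r'\W', text)      # [' ', ',', ' ', '!']
spaces = re.findall(r'\s', text)        # [' ', ' ']
# \w includes the underscore, so "_" is kept here
word_chars = ''.join(re.findall(r'\w', text))  # 'Room42floor_3'
# . matches any character except the newline
any_chars = re.findall(r'.', 'a\nb')    # ['a', 'b']
```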
Square brackets define a set of characters, any one of which can match: [\d\sa] matches a digit, a whitespace character, or the letter a. [a-z] matches any character from a to z.
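Both character sets from this paragraph are applied below to a made-up string. The sketch uses re.findall(), which isn't covered in this article, to list every single-character match.

```python
import re

text = 'a1 b2 c'

# [\d\sa]: a digit, a whitespace character, or the letter a
mixed = re.findall(r'[\d\sa]', text)    # ['a', '1', ' ', '2', ' ']
# [a-z]: any lowercase letter from a to z
letters = re.findall(r'[a-z]', text)    # ['a', 'b', 'c']
```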
\ escapes a special character: for example, to match a literal dot ., use \. in your pattern.
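Forgetting to escape the dot makes the pattern too permissive, as this sketch shows. It also uses re.escape(), a re function not covered in this article, which adds the needed backslashes when the text to match comes from a variable.

```python
import re

# Unescaped, . matches any character, so "3x14" is accepted too
loose = re.findall(r'3.14', '3.14 3x14')    # ['3.14', '3x14']
# Escaped, \. matches only a literal dot
strict = re.findall(r'3\.14', '3.14 3x14')  # ['3.14']
# re.escape() backslash-escapes special characters such as the dot
escaped = re.escape('3.14')                 # '3\\.14', i.e. 3\.14
```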
| means “or”: it matches either the expression on its left or the one on its right.
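A small sketch of alternation on made-up strings. The second case illustrates something easy to trip over: | splits the whole pattern, so ^cat|dog means “^cat” or “dog”, not “cat or dog at the start”.

```python
import re

pets = re.findall(r'cat|dog', 'I have a cat and a dog')  # ['cat', 'dog']

# The alternatives are "^cat" and "dog", so "dog" matches mid-string
m = re.search(r'^cat|dog', 'hotdog')
print(m)  # <re.Match object; span=(3, 6), match='dog'>
```

Wrapping the alternatives in parentheses, as in ^(cat|dog), limits the scope of | to the group.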
Then we have anchors:
- ^ matches the beginning of the string (or of each line, with the re.M flag)
- $ matches the end of the string (or of each line, with the re.M flag)
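By default, ^ and $ match only at the start and end of the whole string. The re.M (re.MULTILINE) flag makes them match at every line as well. This sketch shows the difference on a made-up two-line string, using re.findall(), which isn't covered in this article.

```python
import re

text = 'first line\nsecond line'

# Without re.M, ^ only matches at the very start of the string
first_default = re.findall(r'^\w+', text)          # ['first']
# With re.M, ^ also matches right after each newline
first_multi = re.findall(r'^\w+', text, re.M)      # ['first', 'second']
# With re.M, $ matches right before each newline and at the end
last_multi = re.findall(r'\w+$', text, re.M)       # ['line', 'line']
```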
Then we have quantity modifiers:
- ? means “zero or one” occurrences
- * means “zero or more” occurrences
- + means “one or more” occurrences
- {n} means “exactly n” occurrences
- {n,} means “at least n” occurrences
- {n,m} means “at least n and at most m” occurrences
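The sketch below checks each modifier against whole strings. It relies on re.fullmatch(), a re function not covered in this article, which succeeds only when the pattern matches the entire string. The helper full_match is a hypothetical name introduced just for this example.

```python
import re

def full_match(pattern, text):
    # Hypothetical helper: True only if the pattern matches the whole text
    return re.fullmatch(pattern, text) is not None

print(full_match(r'colou?r', 'color'), full_match(r'colou?r', 'colour'))  # True True
print(full_match(r'ab*', 'a'), full_match(r'ab*', 'abbb'))                # True True
print(full_match(r'ab+', 'a'))                                            # False
print(full_match(r'\d{3}', '123'), full_match(r'\d{3}', '12'))            # True False
print(full_match(r'\d{2,}', '12345'))                                     # True
print(full_match(r'\d{2,4}', '12345'))                                    # False
```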
Parentheses, (<expression>), create a group. Groups are useful because we can retrieve the content each group matched.
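With several groups, each one is numbered from left to right, starting at 1, while group 0 is the whole match. A minimal sketch with a made-up date string; groups(), a Match method not covered in this article, returns all groups as a tuple.

```python
import re

result = re.search(r'(\d{4})-(\d{2})-(\d{2})', 'Released on 2024-03-15')

print(result.group(0))   # 2024-03-15 (the whole match)
print(result.group(1))   # 2024
print(result.groups())   # ('2024', '03', '15')
```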
These two examples match the whole string:
import re
re.match('^.*Roger', 'My dog name is Roger')
re.match('.*', 'My dog name is Roger')
Printing the result of either call shows a Match object like this:
<re.Match object; span=(0, 20), match='My dog name is Roger'>
If you assign the result to a result variable and call group() on it, you will see the match:
result = re.match('^.*Roger', 'My dog name is Roger')
print(result.group())
# My dog name is Roger
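The span shown in the printed Match object is also available programmatically, through span(), start() and end(), Match methods this article doesn't cover. Also keep in mind that re.match() returns None when there is no match, so calling group() without checking first would raise an AttributeError. A small sketch:

```python
import re

text = 'My dog name is Roger'

result = re.match(r'^.*Roger', text)
print(result.span())                 # (0, 20)
print(result.start(), result.end())  # 0 20

# "Roger" is not at the beginning, so re.match() returns None
missing = re.match(r'Roger', text)
if missing is None:
    print('no match at the start of the string')
else:
    print(missing.group())
```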
Now let’s get the dog’s name. If you don’t know in advance what the name will be, you can look for “name is ” and then add a group after it, like this:
result = re.search('name is (.*)', 'My dog name is Roger')
result.group() returns the whole match, “name is Roger”, and result.group(1) returns the content of the first group, “Roger”:
print(result.group()) # name is Roger
print(result.group(1)) # Roger
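One caveat with (.*): the * modifier is greedy, so the group grabs everything up to the end of the line. In a longer made-up sentence, a narrower group such as (\w+) stops at the end of the name. The last line uses re.findall(), not covered in this article, which returns the group content of every match.

```python
import re

text = 'My dog name is Roger and my cat name is Tom'

greedy = re.search(r'name is (.*)', text)
print(greedy.group(1))   # Roger and my cat name is Tom

word = re.search(r'name is (\w+)', text)
print(word.group(1))     # Roger

names = re.findall(r'name is (\w+)', text)
print(names)             # ['Roger', 'Tom']
```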
As mentioned, re.search() and re.match() take flags as their third parameter. There are a few possible flags; one of the most commonly used is re.I (re.IGNORECASE), which performs a case-insensitive match.
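Here is re.I in action, as a minimal sketch. The last line combines two flags with |, which the re module supports; re.M is the multiline flag that makes ^ and $ match at every line.

```python
import re

text = 'My dog name is Roger'

plain = re.search(r'roger', text)                # None: case differs
insensitive = re.search(r'roger', text, re.I)    # matches 'Roger'
print(insensitive)  # <re.Match object; span=(15, 20), match='Roger'>

# Flags can be combined with |
combined = re.search(r'^roger', 'dog\nROGER', re.I | re.M)
print(combined)     # <re.Match object; span=(4, 9), match='ROGER'>
```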
This is just an introduction to regular expressions; from here, there are plenty of rabbit holes you can go down.
I recommend testing your regular expressions on https://regex101.com to check that they are correct. Make sure you choose the Python flavor in the sidebar.