Sample Regular Expressions
Below, you will find many example patterns that you can use as they are or adapt to your own purposes. The key techniques used in crafting each regex are explained, with links to the tutorial pages that cover these concepts in detail.
Oh, and you definitely do not need to be a programmer to take advantage of regular expressions!
Grabbing HTML Tags
<TAG\b[^>]*>(.*?)</TAG>
matches the opening and closing pair of a specific HTML tag. Anything between the tags is captured into the first backreference. The question mark in the regex makes the star lazy, so the match stops at the first closing tag rather than running on to the last one, as a greedy star would. This regex will not properly match tags nested inside themselves, as in <TAG>one<TAG>two</TAG>one</TAG>.
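The difference between the lazy and the greedy star can be sketched in Python as follows. This is only an illustration: the tag name b stands in for TAG, and the sample strings are made up.

```python
import re

# Hypothetical sample input; "b" stands in for TAG.
html = "<b>first</b> and <b class='x'>second</b>"

# Lazy star: each match ends at the nearest closing tag.
lazy = re.findall(r"<b\b[^>]*>(.*?)</b>", html)

# Greedy star: the single match runs on to the last closing tag.
greedy = re.findall(r"<b\b[^>]*>(.*)</b>", html)

# Nested tags are not handled: the match ends at the inner closing tag.
nested = re.findall(r"<b\b[^>]*>(.*?)</b>", "<b>one<b>two</b>one</b>")

print(lazy)    # ['first', 'second']
print(greedy)  # ["first</b> and <b class='x'>second"]
print(nested)  # ['one<b>two']
```

By default the dot does not match line breaks in Python, so content that spans several lines would additionally need the re.DOTALL flag.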
<([A-Z][A-Z0-9]*)\b[^>]*>(.*?)</\1>
will match the opening and closing pair of any HTML tag. Be sure to turn off case sensitivity. The key in this solution is the use of the backreference \1 in the regex: it forces the closing tag to carry the same name that the first capturing group matched in the opening tag. Anything between the tags is captured into the second backreference. This solution will also not match tags nested in themselves.
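A minimal Python sketch of the backreference approach, assuming a made-up HTML snippet. The re.IGNORECASE flag is how Python turns off case sensitivity.

```python
import re

# \1 must repeat whatever tag name group 1 captured in the opening tag.
any_tag = re.compile(r"<([A-Z][A-Z0-9]*)\b[^>]*>(.*?)</\1>", re.IGNORECASE)

# Hypothetical sample input.
html = '<h1>Title</h1><p class="intro">Some <em>text</em></p>'

# Group 1 is the tag name, group 2 is the content between the tags.
pairs = [(m.group(1), m.group(2)) for m in any_tag.finditer(html)]

print(pairs)  # [('h1', 'Title'), ('p', 'Some <em>text</em>')]
```

The em element is not reported separately here because it lies inside the match for p, and finditer returns non-overlapping matches only.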
Trimming Whitespace
You can easily trim unnecessary whitespace from the start and the end of a string or the lines in a text file by doing a regex search-and-replace. Search for ^[ \t]+ and replace with nothing to delete leading whitespace (spaces and tabs). Search for [ \t]+$ to trim trailing whitespace. Do both by combining the regular expressions into ^[ \t]+|[ \t]+$. When trimming each line of a file rather than one string, make sure your tool lets ^ and $ match at line breaks. Instead of [ \t], which matches a space or a tab, you can expand the character class into [ \t\r\n] if you also want to strip line breaks. Or you can use the shorthand \s instead.
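As a rough Python sketch of both variants (the sample strings are invented for illustration):

```python
import re

# Hypothetical sample input with padded lines.
text = "  \tfirst line  \n\tsecond line\t \n"

# re.MULTILINE lets ^ and $ match at every line break, so each line is trimmed.
trimmed_lines = re.sub(r"^[ \t]+|[ \t]+$", "", text, flags=re.MULTILINE)

# Without the flag, \s strips all whitespace, including line breaks,
# from the two ends of the whole string only.
trimmed_string = re.sub(r"^\s+|\s+$", "", "\n  padded value \r\n")

print(repr(trimmed_lines))   # 'first line\nsecond line\n'
print(repr(trimmed_string))  # 'padded value'
```

For a single string, Python's built-in str.strip() does the same job without a regex; the regex form is mainly useful in editors and search-and-replace tools.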
More Detailed Examples
Numeric Ranges. Since regular expressions work with text rather than numbers, matching specific numeric ranges requires a bit of extra care.
Matching a Floating Point Number. Also illustrates the common mistake of making everything in a regular expression optional.
Matching an Email Address. There’s a lot of controversy about what is a proper regex to match email addresses. It’s a perfect example showing that you need to know exactly what you’re trying to match (and what not), and that there’s always a trade-off between regex complexity and accuracy.
Matching an IP Address.
Matching Valid Dates. A regular expression that matches 31-12-1999 but not 31-13-1999.
Finding or Verifying Credit Card Numbers. Validate credit card numbers entered on your order form. Find credit card numbers in documents for a security audit.
Matching Complete Lines. Shows how to match complete lines in a text file rather than just the part of the line that satisfies a certain requirement. Also shows how to match lines in which a particular regex does not match.
Removing Duplicate Lines or Items. Illustrates simple yet clever use of capturing parentheses or backreferences.
Regex Examples for Processing Source Code. How to match common programming language syntax such as comments, strings, numbers, etc.
Two Words Near Each Other. Shows how to use a regular expression to emulate the “near” operator that some tools have.
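To give a flavor of the first item in this list, here is a small Python sketch of matching the numeric range 0 to 255. Since a regex sees digits as characters, the range has to be spelled out by digit pattern. The helper name is_byte is made up for this example.

```python
import re

# 250-255 | 200-249 | 100-199 | 0-99 (no leading zeros)
byte_value = re.compile(r"25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9]")

def is_byte(text: str) -> bool:
    """Return True if text is a decimal number from 0 to 255."""
    return byte_value.fullmatch(text) is not None

# The tempting shortcut [0-255] is just a character class for 0, 1, 2 or 5.
naive = re.compile(r"[0-255]")

print(is_byte("0"), is_byte("199"), is_byte("255"))  # True True True
print(is_byte("256"), is_byte("07"))                 # False False
print(naive.fullmatch("37") is None)                 # True
```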
Common Pitfalls
Catastrophic Backtracking. If your regular expression seems to take forever, or simply crashes your application, it has likely contracted a case of catastrophic backtracking. The solution is usually to be more specific about what you want to match, so the number of matches the engine has to try doesn’t rise exponentially.
Making Everything Optional. If all the parts in your regex are optional, it will match a zero-length string anywhere. Your regex needs to express the fact that different parts are optional depending on which other parts are present.
Repeating a Capturing Group vs. Capturing a Repeated Group. Repeating a capturing group will capture only the last iteration of the group. Capture a repeated group if you want to capture all iterations.
Mixing Unicode and 8-bit Character Codes. Using 8-bit character codes like \x80 with a Unicode engine and subject string may give unexpected results.
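Two of the pitfalls above, making everything optional and repeating a capturing group, are easy to reproduce. The following Python sketch uses invented sample strings and a simple floating point pattern chosen for illustration.

```python
import re

# Pitfall: every part is optional, so the regex matches nothing at all.
all_optional = re.search(r"[-+]?[0-9]*\.?[0-9]*", "abc")
print(repr(all_optional.group()))  # ''

# Requiring at least one digit avoids the zero-length match.
one_digit_required = re.compile(r"[-+]?[0-9]*\.?[0-9]+")
print(one_digit_required.search("abc"))                 # None
print(one_digit_required.search("x = -3.14").group())   # -3.14

# Pitfall: a repeated capturing group keeps only its last iteration.
repeated_group = re.fullmatch(r"([a-z0-9])+", "abc123")
print(repeated_group.group(1))  # 3

# Capturing the repeated group keeps all iterations.
captured_repeat = re.fullmatch(r"((?:[a-z0-9])+)", "abc123")
print(captured_repeat.group(1))  # abc123
```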