Using Regular Expressions with Ruby
Ruby supports regular expressions as a language feature. In Ruby, a regular expression is written in the form of /pattern/modifiers
where “pattern” is the regular expression itself, and “modifiers” are a series of characters indicating various options. The “modifiers” part is optional. This syntax is borrowed from Perl. Ruby supports the following modifiers:
/i makes the regex match case insensitive.
/m makes the dot match newlines. Ruby indeed uses /m, whereas Perl and many other programming languages use /s for “dot matches newlines”.
/x tells Ruby to ignore whitespace between regex tokens.
/o causes any #{…} substitutions in a particular regex literal to be performed just once, the first time it is evaluated. Otherwise, the substitutions will be performed every time the literal generates a Regexp object.
You can combine multiple modifiers by stringing them together as in /regex/im.
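The modifiers can be tried out with a short sketch like the one below. The strings and variable names are made up for illustration, and match?() assumes Ruby 2.4 or later.

```ruby
# /i: case-insensitive matching
case_insensitive = /ruby/i.match?("I like RUBY")

# /m: the dot also matches newline characters
without_m = /a.b/.match?("a\nb")
with_m    = /a.b/m.match?("a\nb")

# /x: whitespace and comments inside the pattern are ignored
date = /
  (\d{4}) -   # year
  (\d{2}) -   # month
  (\d{2})     # day
/x
free_spacing = date.match?("2024-01-31")

# Modifiers can be combined
combined = /hello.world/im.match?("HELLO\nWORLD")

# /o: the interpolation happens only the first time the literal is evaluated
def cached_pattern(word)
  /#{word}/o
end
first_call  = cached_pattern("cat")
second_call = cached_pattern("dog")   # still built from "cat"
```

The last two lines show why /o should be used with care: the second call ignores its argument, because the substitution was already performed on the first call.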
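In Ruby, the caret and dollar always match before and after newlines. Ruby does not have a modifier to change this. Use \A and \Z to match at the start or the end of the string. Note that \Z also matches before a line break that ends the string. Use \z to match only at the very end of the string.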
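A small sketch of the difference between the line anchors and the string anchors, using an invented two-line string. The lowercase \z anchor is included for comparison: it matches only at the very end of the string, while \Z also matches before a final line break.

```ruby
text = "first line\nsecond line\n"

# ^ and $ match at the start and end of every line
caret_hits  = text.scan(/^\w+/)    # ["first", "second"]
dollar_hits = text.scan(/\w+$/)    # ["line", "line"]

# \A matches only at the start of the string
start_only = text.scan(/\A\w+/)    # ["first"]

# \Z matches before the trailing newline, \z does not
upper_z = text =~ /line\Z/         # 18
lower_z = text =~ /line\z/         # nil
```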
Since forward slashes delimit the regular expression, any forward slashes that appear in the regex need to be escaped. E.g. the regex 1/2
is written as /1\/2/
in Ruby.
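As an illustration, the escaped literal can be compared with Ruby's %r{...} literal syntax. The %r form is not discussed above; it is shown here only as an alternative that avoids escaping the slashes.

```ruby
escaped   = /1\/2/     # forward slash escaped inside /.../
alternate = %r{1/2}    # %r{...} literal: the slash needs no escaping

found_escaped   = "ratio 1/2" =~ escaped     # 6
found_alternate = "ratio 1/2" =~ alternate   # 6
```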
How To Use The Regexp Object
/regex/
creates a new object of the class Regexp. You can assign it to a variable to repeatedly use the same regular expression, or use the literal regex directly. Ruby provides several different ways to test whether a particular regexp matches (part of) a string.
The ===
method allows you to compare a regexp to a string. It returns true if the regexp matches (part of) the string or false if it does not. This allows regular expressions to be used in case statements. Do not confuse ===
(3 equals signs) with ==
(2 equals signs). ==
allows you to compare one regexp to another regexp to see if the two regexes are identical and use the same matching modes.
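The following sketch shows === at work in a case statement, next to == comparing two regexes. The classify method and its patterns are hypothetical, chosen only for illustration.

```ruby
# case/when calls pattern === input for each branch
def classify(input)
  case input
  when /\A\d+\z/      then :integer
  when /\A\d+\.\d+\z/ then :decimal
  else :other
  end
end

kinds = ["42", "3.14", "abc"].map { |s| classify(s) }   # [:integer, :decimal, :other]

matches   = /\d+/ === "order 66"   # true: the regexp matches part of the string
same      = /abc/i == /abc/i       # true: same pattern, same modes
different = /abc/i == /abc/        # false: the modes differ
```

The regexp has to be the left-hand operand of ===. With a string on the left, String#=== is called instead, which only tests string equality.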
The =~
method returns the character position in the string of the start of the match or nil if no match was found. In a boolean test, the character position evaluates to true and nil evaluates to false. So you can use =~
instead of ===
to make your code a little easier to read, as =~
is more obviously a regex matching operator. Ruby borrowed the =~
syntax from Perl. print(/\w+/ =~ "test")
prints “0”. The first character in the string has index zero. Switching the order of the =~
operator’s operands makes no difference.
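A brief sketch of =~ with either operand order, using made-up strings. Because only nil and false count as false in Ruby, a match at position 0 still passes a boolean test.

```ruby
first = /\w+/ =~ "test"          # 0
later = "say hello" =~ /hello/   # 4
none  = "say hello" =~ /bye/     # nil

# Position 0 is truthy, so this test succeeds
zero_is_truthy = (/\w+/ =~ "test") ? true : false

message = ("say hello" =~ /hello/) ? "found" : "missing"
```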
The match()
method returns a MatchData object when a match is found, or nil if no match was found. In a boolean context, the MatchData object evaluates to true. In a string context, the MatchData object evaluates to the text that was matched. So print(/\w+/.match("test"))
prints “test”.
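Beyond printing it, the MatchData object can be queried for details. A minimal sketch with an invented subject string:

```ruby
m = /(\w+)@(\w+)\.com/.match("mail bob@example.com now")

whole     = m[0]         # "bob@example.com"
user      = m[1]         # "bob"
host      = m[2]         # "example"
as_string = m.to_s       # same text as m[0]
start     = m.begin(0)   # 5, the position where the match starts

miss = /\d+/.match("no digits")   # nil
```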
Ruby 2.4 adds the match?()
method. It returns true or false like the ===
method. The difference is that match?()
does not set $~
(see below) and thus doesn’t need to create a MatchData object. If you don’t need any match details you should use match?()
to improve performance.
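One way to observe that match?() leaves $~ alone is to run an ordinary match first and inspect $~ afterwards. This is only a sketch with invented strings, and it requires Ruby 2.4 or later.

```ruby
/a/ =~ "cat"                       # an ordinary match sets $~

quick = /\d+/.match?("abc 123")   # true
last_match_text = $~[0]            # still "a": match?() did not replace $~

no_hit = /\d+/.match?("abc")      # false
```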
Special Variables
The ===
, =~
, and match()
methods create a MatchData object and assign it to the special variable $~
. Regexp.match()
also returns this object. The variable $~
is thread-local and method-local. That means you can use this variable until your method exits, or until the next time you use the =~
operator in your method, without worrying that another thread or another method in your thread will overwrite it.
A number of other special variables are derived from the $~
variable. All of these are read-only. If you assign a new MatchData instance to $~
, all of these variables will change too. $&
holds the text matched by the whole regular expression. $1
, $2
, etc. hold the text matched by the first, second, and following capturing groups. $+
holds the text matched by the highest-numbered capturing group that actually participated in the match. $`
and $'
hold the text in the subject string to the left and to the right of the regex match.
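The special variables can be read right after a match, as in this sketch. The subject string and the local variable names are made up for illustration.

```ruby
"John Smith <js@example.com>" =~ /(\w+)@(\w+)\.com/

whole  = $&      # "js@example.com"
user   = $1      # "js"
host   = $2      # "example"
last   = $+      # "example", the highest-numbered group that participated
before = $`      # "John Smith <"
after  = $'      # ">"
via_match_data = $~[1]   # "js", the same text as $1
```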
Search And Replace
Use the sub()
and gsub()
methods of the String class to search-and-replace the first regex match, or all regex matches, respectively, in the string. Specify the regular expression you want to search for as the first parameter, and the replacement string as the second parameter, e.g.: result = subject.gsub(/before/, "after")
.
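A short sketch contrasting sub() and gsub() on an invented subject string. Both return a new string and leave the original untouched.

```ruby
subject = "before and before"

first_only = subject.sub(/before/, "after")    # "after and before"
all        = subject.gsub(/before/, "after")   # "after and after"

# subject itself is unchanged; sub! and gsub! modify the string in place
```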
To re-insert the regex match, use \0
in the replacement string. You can use the contents of capturing groups in the replacement string with backreferences \1
, \2
, \3
, etc. Note that numbers escaped with a backslash are treated as octal escapes in double-quoted strings. Octal escapes are processed at the language level, before the sub() function sees the parameter. To prevent this, you need to escape the backslashes in double-quoted strings. So to use the first backreference as the replacement string, either pass '\1'
or "\\1"
. '\\1'
also works.
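The quoting rules can be checked with a sketch like this one, which swaps two words using backreferences. The name string is invented, and the last line deliberately shows the octal-escape mistake.

```ruby
name = "Smith, John"

swapped_single = name.sub(/(\w+), (\w+)/, '\2 \1')     # "John Smith"
swapped_double = name.sub(/(\w+), (\w+)/, "\\2 \\1")   # "John Smith"
wrapped        = name.gsub(/\w+/, '<\0>')              # "<Smith>, <John>"

# "\1" in double quotes is an octal escape (character code 1), not a backreference
octal_mistake  = name.sub(/(\w+), (\w+)/, "\1")
```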
Splitting Strings and Collecting Matches
To collect all regex matches in a string into an array, pass the regexp object to the string’s scan()
method, e.g.: myarray = mystring.scan(/regex/)
. Sometimes, it is easier to create a regex to match the delimiters rather than the text you are interested in. In that case, use the split()
method instead, e.g.: myarray = mystring.split(/delimiter/)
. The split()
method discards all regex matches, returning the text between the matches. The scan()
method does the opposite.
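Both approaches can produce the same array, as this sketch with an invented string shows: scan() matches the words, while split() matches the delimiters between them.

```ruby
mystring = "apple, banana;cherry"

words  = mystring.scan(/\w+/)        # ["apple", "banana", "cherry"]
pieces = mystring.split(/[,;]\s*/)   # ["apple", "banana", "cherry"]
```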
If your regular expression contains capturing groups, scan() returns an array of arrays. Each element in the overall array will contain an array consisting of the text matched by each capturing group, in order. The overall regex match is not included. If you need it, wrap the whole regex in an extra capturing group.
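The shape of the result is easiest to see by running scan() with and without groups. In this sketch, with an invented string, each inner array holds only the text of the capturing groups. An extra outer group is one way to get the overall match as well.

```ruby
subject = "a=1, b=22"

flat       = subject.scan(/\w+=\d+/)         # ["a=1", "b=22"]
pairs      = subject.scan(/(\w+)=(\d+)/)     # [["a", "1"], ["b", "22"]]
with_whole = subject.scan(/((\w+)=(\d+))/)   # [["a=1", "a", "1"], ["b=22", "b", "22"]]
```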