Posted by admin on March 5, 2024

Category

Regular Expressions

Using Regular Expressions with Ruby

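Ruby supports regular expressions as a language feature. In Ruby, a regular expression is written in the form /pattern/modifiers, where “pattern” is the regular expression itself and “modifiers” is a series of characters indicating various options. The “modifiers” part is optional. This syntax is borrowed from Perl. Ruby supports the following modifiers:

/i makes the regex match case insensitive.
/m makes the dot match newlines. Ruby uses /m for this, whereas Perl and many other programming languages use /s for “dot matches newlines”.
/x tells Ruby to ignore whitespace between regex tokens.
/o causes any #{…} substitutions in a particular regex literal to be performed just once, the first time the literal is evaluated. Otherwise, the substitutions are performed every time the literal generates a Regexp object.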

You can combine multiple modifiers by stringing them together, as in /regex/im.
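The modifiers described above can be tried out with a small sketch like the following. The method names modifier_demo and interpolated_once and the sample strings are made up for illustration; match?() requires Ruby 2.4 or later.

```ruby
# Hypothetical helper: reports how each modifier changes the match result.
def modifier_demo
  {
    case_insensitive: "RUBY".match?(/ruby/i),              # /i ignores case
    dot_without_m:    "a\nb".match?(/a.b/),                # the dot stops at "\n"
    dot_with_m:       "a\nb".match?(/a.b/m),               # /m lets the dot match "\n"
    free_spacing:     "2024-03".match?(/\d{4} - \d{2}/x),  # /x ignores the spaces
    combined:         "A\nB".match?(/a.b/im)               # modifiers strung together
  }
end

# Hypothetical helper: with /o the #{...} interpolation happens only the
# first time this literal is evaluated; later calls reuse that Regexp.
def interpolated_once(word)
  /#{word}/o
end

p modifier_demo
p interpolated_once("cat").source  # "cat"
p interpolated_once("dog").source  # still "cat", because of /o
```

The second call to interpolated_once shows why /o should be used with care: the argument is silently ignored after the first evaluation.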

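In Ruby, the caret and dollar always match at the start and end of each line, that is, after and before every newline as well as at the start and end of the string. Ruby does not have a modifier to change this. Use \A and \Z to match at the start or the end of the string. Note that \Z also matches before a line break that ends the string; use \z to match only at the very end.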
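A minimal sketch of the difference between the line anchors and the string anchors. The method name anchor_demo and the sample text are assumptions made for this example.

```ruby
# Hypothetical helper: collects what each anchor style matches in the text.
def anchor_demo(text)
  {
    caret:        text.scan(/^\w+/),   # ^ matches at the start of every line
    dollar:       text.scan(/\w+$/),   # $ matches at the end of every line
    string_start: text.scan(/\A\w+/),  # \A matches only at the start of the string
    string_end:   text.scan(/\w+\Z/)   # \Z matches at the end, or before a final "\n"
  }
end

result = anchor_demo("first line\nsecond line\n")
p result[:caret]         # ["first", "second"]
p result[:dollar]        # ["line", "line"]
p result[:string_start]  # ["first"]
p result[:string_end]    # ["line"]
```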

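Since forward slashes delimit the regular expression, any forward slash that appears in the regex needs to be escaped. For example, the regex 1/2 is written as /1\/2/ in Ruby.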
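As a small illustration, the escaped literal can be compared with Ruby's %r{…} literal syntax, which is not covered in the text above but avoids the escaping because the slash is no longer the delimiter. The method names are hypothetical.

```ruby
# Hypothetical helpers: both test whether the text contains "1/2".
def half_escaped?(text)
  text.match?(/1\/2/)    # the slash must be escaped inside /.../
end

def half_percent_r?(text)
  text.match?(%r{1/2})   # with %r{...} the slash needs no escape
end

p half_escaped?("add 1/2 cup")    # true
p half_percent_r?("add 1/2 cup")  # true
p half_escaped?("add 12 cups")    # false
```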

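How To Use The Regexp Object

/regex/ creates a new object of the class Regexp. You can assign it to a variable to use the same regular expression repeatedly, or use the literal regex directly. Ruby provides several different ways to test whether a particular regexp matches (part of) a string.

The === method allows you to compare a regexp to a string. It returns true if the regexp matches (part of) the string, or false if it does not. This allows regular expressions to be used in case statements. Do not confuse === (3 equals signs) with == (2 equals signs). == allows you to compare one regexp to another regexp to see whether the two regexes are identical and use the same matching modes.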
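A short sketch of === at work, both directly and through a case statement, next to == for comparing two regexes. The method name classify and its categories are invented for this example.

```ruby
# Hypothetical helper: case/when calls === on each regex with the input.
def classify(input)
  case input
  when /\A\d+\z/     then :integer
  when /\A[a-z]+\z/i then :word
  else                    :other
  end
end

p classify("12345")      # :integer
p classify("Ruby")       # :word
p classify("a-b")        # :other

p(/\d+/ === "abc 42")    # true: the regex matches part of the string
p(/ruby/i == /ruby/i)    # true: same pattern and same modifiers
p(/ruby/i == /ruby/)     # false: the modifiers differ
```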

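The =~ method returns the character position in the string of the start of the match, or nil if no match was found. In a boolean test, the character position evaluates to true and nil evaluates to false. So you can use =~ instead of === to make your code a little easier to read, as =~ is more obviously a regex matching operator. Ruby borrowed the =~ syntax from Perl. print(/\w+/ =~ "test") prints “0”. The first character in the string has index zero. Switching the order of the =~ operator’s operands makes no difference.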
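The return values of =~ can be checked with a sketch like this one. The method name match_position is hypothetical.

```ruby
# Hypothetical helper: returns the index where the first word starts, or nil.
def match_position(text)
  /\w+/ =~ text
end

p match_position("test")    # 0
p match_position("  test")  # 2
p match_position("!!!")     # nil
p("  test" =~ /\w+/)        # 2: the operands can be swapped

# Position 0 is truthy in Ruby, so =~ works directly in a condition.
puts "found a word" if "test" =~ /\w+/
```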

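The match() method returns a MatchData object when a match is found, or nil if no match was found. In a boolean context, the MatchData object evaluates to true. In a string context, the MatchData object evaluates to the text that was matched. So print(/\w+/.match("test")) prints “test”.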
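A minimal sketch of working with the MatchData object. The method name describe_match and the keys of the returned hash are assumptions for this example; to_s, begin, and end are MatchData methods.

```ruby
# Hypothetical helper: summarizes the first word found in the text.
def describe_match(text)
  m = /\w+/.match(text)
  return nil unless m            # match() returns nil when nothing matches
  {
    text:   m.to_s,              # the matched text, as in a string context
    start:  m.begin(0),          # index of the first matched character
    finish: m.end(0)             # index just past the last matched character
  }
end

info = describe_match("  test!")
puts info[:text]    # test
puts info[:start]   # 2
puts info[:finish]  # 6
p describe_match("!!!")  # nil
```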

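Ruby 2.4 adds the match?() method. It returns true or false like the === method. The difference is that match?() does not set $~ (see below) and thus doesn’t need to create a MatchData object. If you don’t need any match details, you should use match?() to improve performance.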
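The following sketch, which assumes Ruby 2.4 or later, illustrates that match?() leaves $~ alone. The method names are made up for this example.

```ruby
# Hypothetical helper: a plain yes/no test with no MatchData created.
def contains_digit?(text)
  /\d/.match?(text)
end

# Hypothetical helper: shows that match?() does not overwrite $~.
def last_match_untouched
  "abc" =~ /b/           # sets $~ to the match of "b"
  before = $~[0]
  /c/.match?("abc")      # matches "c", but $~ still describes "b"
  [before, $~[0]]
end

p contains_digit?("room 101")  # true
p contains_digit?("lobby")     # false
p last_match_untouched         # ["b", "b"]
```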

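Special Variables

The ===, =~, and match() methods create a MatchData object and assign it to the special variable $~. Regexp.match() also returns this object. The variable $~ is thread-local and method-local. That means you can use this variable until your method exits, or until the next time you use the =~ operator in your method, without worrying that another thread or another method in your thread will overwrite it.

A number of other special variables are derived from the $~ variable. All of these are read-only. If you assign a new MatchData instance to $~, all of these variables will change too. $& holds the text matched by the whole regular expression. $1, $2, etc. hold the text matched by the first, second, and following capturing groups. $+ holds the text matched by the highest-numbered capturing group that actually participated in the match. $` and $' hold the text in the subject string to the left and to the right of the regex match.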
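As an illustration, the variables derived from $~ can be read right after a successful =~ test. The method name special_variables, the hash keys, and the sample text are hypothetical.

```ruby
# Hypothetical helper: collects the special variables set by a match.
def special_variables(text)
  return nil unless /(\d+)-(\d+)/ =~ text
  {
    whole:      $&,   # text matched by the whole regex
    first:      $1,   # first capturing group
    second:     $2,   # second capturing group
    last_group: $+,   # highest-numbered group that participated
    before:     $`,   # subject text to the left of the match
    after:      $'    # subject text to the right of the match
  }
end

vars = special_variables("range 10-20 ok")
puts vars[:whole]       # 10-20
puts vars[:first]       # 10
puts vars[:last_group]  # 20
p vars[:before]         # "range "
p vars[:after]          # " ok"
```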

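Search And Replace

Use the sub() and gsub() methods of the String class to search-and-replace the first regex match, or all regex matches, respectively, in the string. Specify the regular expression you want to search for as the first parameter, and the replacement string as the second parameter, e.g.: result = subject.gsub(/before/, "after").

To re-insert the regex match, use \0 in the replacement string. You can use the contents of capturing groups in the replacement string with backreferences \1, \2, \3, etc. Note that numbers escaped with a backslash are treated as octal escapes in double-quoted strings. Octal escapes are processed at the language level, before the sub() function sees the parameter. To prevent this, you need to escape the backslashes in double-quoted strings. So to use the first backreference as the replacement string, either pass '\1' or "\\1". '\\1' also works.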
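A brief sketch of sub(), gsub(), and the quoting rules for backreferences. The method names and sample strings are assumptions for this example.

```ruby
# Hypothetical helper: swaps two words using backreferences in single quotes.
def swap_words(text)
  text.gsub(/(\w+) (\w+)/, '\2 \1')
end

# Hypothetical helper: wraps every match in brackets by re-inserting \0.
def bracket_vowels(text)
  text.gsub(/[aeiou]/, '[\0]')
end

# Hypothetical helper: "\1" in double quotes is an octal escape (character
# code 1), so the replacement is a control character, not a backreference.
def octal_pitfall(text)
  text.gsub(/(a)/, "\1")
end

p "a-b-c".sub(/-/, "+")           # "a+b-c": first match only
p "a-b-c".gsub(/-/, "+")          # "a+b+c": all matches
p swap_words("John Smith")        # "Smith John"
p bracket_vowels("cat")           # "c[a]t"
p "cat".gsub(/(a)/, "\\1\\1")     # "caat": escaped backslashes work
p octal_pitfall("cat") == "cat"   # false
```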

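Splitting Strings and Collecting Matches

To collect all regex matches in a string into an array, pass the regexp object to the string’s scan() method, e.g.: myarray = mystring.scan(/regex/). Sometimes it is easier to create a regex that matches the delimiters rather than the text you are interested in. In that case, use the split() method instead, e.g.: myarray = mystring.split(/delimiter/). The split() method discards all regex matches, returning the text between the matches. The scan() method does the opposite.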

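If your regular expression contains capturing groups, scan() returns an array of arrays. Each element in the overall array is an array holding the text matched by each capturing group; the overall regex match is not included.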
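The behavior of scan() and split() can be sketched as follows. The method names and sample strings are invented for illustration. With capturing groups, each inner array returned by scan() holds the groups' text.

```ruby
# Hypothetical helper: collects every run of digits.
def numbers_in(text)
  text.scan(/\d+/)
end

# Hypothetical helper: splits on commas with optional surrounding spaces.
def fields_of(line)
  line.split(/\s*,\s*/)
end

# Hypothetical helper: with capturing groups, scan() returns one inner
# array per match, holding the text of each group.
def key_value_pairs(text)
  text.scan(/(\w+)=(\d+)/)
end

p numbers_in("a1 b22 c333")      # ["1", "22", "333"]
p fields_of("red , green,blue")  # ["red", "green", "blue"]
p key_value_pairs("x=1 y=22")    # [["x", "1"], ["y", "22"]]
```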

