Continuing at The End of The Previous Match

The anchor \G matches at the position where the previous match ended. During the first match attempt, \G matches at the start of the string in the way \A does.

Applying \G\w to the string test string matches t. Applying it again matches e. The third attempt yields s, and the fourth attempt matches the second t in the string. The fifth attempt fails: the only position in the string where \G can match is right after the second t, and that position is followed by a space rather than a word character.
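The walk-through above can be reproduced with a short Java sketch. The class and method names are made up for this illustration, and the backslashes are doubled only because the regex sits inside a Java string literal. The second call drops \G to show what the anchor prevents: without it, the search skips the space and carries on into the second word.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ContinueAnchorDemo {
    // Collects every match that successive find() calls produce (hypothetical helper).
    static List<String> matchesOf(String regex, String subject) {
        Matcher m = Pattern.compile(regex).matcher(subject);
        List<String> found = new ArrayList<>();
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // With \G each match must start where the previous one ended,
        // so the space after "test" stops the sequence.
        System.out.println(matchesOf("\\G\\w", "test string")); // prints [t, e, s, t]

        // Without \G the engine is free to skip ahead past the space.
        System.out.println(matchesOf("\\w", "test string"));    // prints [t, e, s, t, s, t, r, i, n, g]
    }
}
```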

End of The Previous Match vs. Start of The Match Attempt

With some regex flavors or tools, \G matches at the start of the match attempt, rather than at the end of the previous match. This is the case with Ruby. A text editor that supports \G can work the same way: \G matches at the position of the text cursor, and when a match is found, the editor selects the match and moves the text cursor to the end of the match. The result is that \G matches at the end of the previous match only when you do not move the text cursor between two searches. All in all, this makes a lot of sense in the context of a text editor.

The distinction between the end of the previous match and the start of the match attempt is also important if your regular expression can find zero-length matches. Most regex engines advance through the string after a zero-length match. In that case, the start of the match attempt is one character further in the string than the end of the previous match attempt. .NET, Java, and Boost advance this way and also match \G at the end of the previous match attempt. Thus \G fails to match when .NET, Java, and Boost have advanced after a zero-length match.
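A small Java sketch can make the zero-length case concrete. It assumes the regex \Ga* and the subject aab, both chosen only for illustration, and records each match with its starting offset so that empty matches stay visible. The unanchored a* is included for comparison.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ZeroLengthDemo {
    // Returns each match as "offset:text" (hypothetical helper for this sketch).
    static List<String> matchesWithOffsets(String regex, String subject) {
        Matcher m = Pattern.compile(regex).matcher(subject);
        List<String> found = new ArrayList<>();
        while (m.find()) {
            found.add(m.start() + ":" + m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // The empty match at offset 2 makes the Matcher advance to offset 3,
        // but \G still points at offset 2, so no further match is found.
        System.out.println(matchesWithOffsets("\\Ga*", "aab")); // prints [0:aa, 2:]

        // Without \G the attempt that starts at offset 3 finds one more empty match.
        System.out.println(matchesWithOffsets("a*", "aab"));    // prints [0:aa, 2:, 3:]
    }
}
```

An engine that anchors \G at the start of the match attempt would be expected to report the extra empty match at offset 3 for the first regex as well.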

\G Magic with Perl

In Perl, the position where the last match ended is a “magical” value that is remembered separately for each string variable. The position is not associated with any regular expression. This means that you can use \G to make a regex continue in a subject string where another regex left off.

If a match attempt fails, the stored position for \G is reset to the start of the string. To avoid this, specify the continuation modifier /c. Perl honors /c only in combination with /g, as in m/.../gc.

All this is very useful to make several regular expressions work together. E.g. you could parse an HTML file in the following fashion:

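# Assumes $string holds the HTML text to scan.
while ($string =~ m/</g) {        # find the next "<"; /g records the end position
  if ($string =~ m/\GB>/gc) {     # \G anchors at that position; /c keeps it on failure
    # Bold
  } elsif ($string =~ m/\GI>/gc) {
    # Italics
  } else {
    # ...etc...
  }
}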

The regex in the while loop searches for the tag’s opening bracket, and the regexes inside the loop check which tag we found. This way you can parse the tags in the file in the order they appear in the file, without having to write a single big regex that matches all tags you are interested in.

\G in Other Programming Languages

This flexibility is not available with most other programming languages. E.g. in Java, the position for \G is remembered by the Matcher object. The Matcher is strictly associated with a single regular expression and a single subject string. What you can do though is to add a line of code to make the match attempt of the second Matcher start where the match of the first Matcher ended. Then \G will match at this position.
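One way to write that extra line is Matcher.region(), which restarts a Matcher on a sub-range of the subject; the first \G then matches at the start of that range. The sketch below assumes this is the kind of line meant above. The class name, method name, and the three tag labels are invented for the illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagHandOffDemo {
    // Labels every "<" in the input by the tag that follows it (hypothetical helper).
    static List<String> classifyTags(String html) {
        Matcher open = Pattern.compile("<").matcher(html);
        Matcher bold = Pattern.compile("\\GB>").matcher(html);
        Matcher italic = Pattern.compile("\\GI>").matcher(html);
        List<String> kinds = new ArrayList<>();
        while (open.find()) {
            int resumeAt = open.end();
            // The extra line: region() makes the next attempt, and therefore \G,
            // start where the first Matcher's match ended.
            if (bold.region(resumeAt, html.length()).find()) {
                kinds.add("bold");
            } else if (italic.region(resumeAt, html.length()).find()) {
                kinds.add("italic");
            } else {
                kinds.add("other");
            }
        }
        return kinds;
    }

    public static void main(String[] args) {
        System.out.println(classifyTags("<B>x</B><I>y</I>")); // prints [bold, other, italic, other]
    }
}
```

Because each inner regex starts with \G, find() cannot skip ahead to a later B> or I>; the closing tags therefore fall through to the last branch.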

Start of Match Attempt

Normally, \A is a start-of-string anchor. But in Tcl, the anchor \A matches at the start of the match attempt rather than at the start of the string. With the GNU flavors, \` (backslash backtick) does the same. This makes no difference if you’re only making one call to regexp in Tcl or regexec() in the GNU library. It can make a difference if you make a second call to find another match in the remainder of the string after the first match. \A or \` then matches at the end of the first match, instead of failing to match as start-of-string anchors normally do. Strangely enough, the caret does not have this issue in either Tcl or GNU’s library.