本网站的更多内容

在正则表达式中指定模式

通常，比对模式会在正则表达式外指定。在编程语言中，您可以将它们作为标记传递给正则表达式构造函数，或附加到正则表达式文本。在应用程序中，您会切换适当的按钮或核取方块。您可以在本网站的工具和语言区段中找到详细信息。

有时，工具或语言不提供指定比对选项的功能。在Java中方便的String.matches()方法不采用比对选项参数，例如Pattern.compile()。或者，正则表达式风格可能支持未显示为外部标记的比对模式。在R中的正则表达式函数具有ignore.case作为其唯一选项，即使基础PCRE函数库具有比本教程中讨论的任何其他函数库更多的比对模式。

在这些情况下，您可以将下列模式修改器添加到正则表达式的开头。若要指定多个模式，只需将它们放在一起，如(?ismx)。

(?i)使正则表达式不区分大小写。
(?c)使正则表达式区分大小写。仅受Tcl支持。
(?x)打开自由间距模式。
(?t)关闭自由间距模式。仅受Tcl支持。
(?xx)打开自由间距模式，也出现在字符类别中。由Perl 5.26和PCRE2 10.30支持。
(?s)用于「单行模式」，使点比对所有字符，包括换行符号。不受Ruby支持。在Tcl中，(?s)也使插入符号和美元符号仅在字符串的开头和结尾比对。
(?m) 用于「多行模式」，会让插入符号和美元符号与主旨字符串中每一行的开头和结尾相符。在 Ruby 中，(?m) 会让点号与所有字符相符，但不会影响插入符号和美元符号，它们在 Ruby 中总是与每一行的开头和结尾相符。在 Tcl 中，(?m) 也会防止点号与换行符号相符。
(?p) 在 Tcl 中会让插入符号和美元符号与每一行的开头和结尾相符，并让点号与换行符号相符。
(?w) 在 Tcl 中会让插入符号和美元符号只与主旨字符串的开头和结尾相符，并防止点号与换行符号相符。
(?n) 会将所有未命名组转换为非捕获组。仅受 .NET 和 XRegExp 支持。在 Tcl 中，(?n) 与 (?m) 相同。
(?J) 允许重复群组名称。仅受 PCRE 和使用它的语言支持，例如 Delphi、PHP 和 R。
(?U) 会打开「非贪婪模式」，这会切换贪婪和非贪婪量词的语法。因此 (?U)a* 是非贪婪的，而 (?U)a*? 是贪婪的。仅受 PCRE 和使用它的语言支持。强烈建议不要使用，因为它会混淆标准量词语法的意义。
(?d) 对应 Java 中的 UNIX_LINES，使句点、插入符号和美元符号仅将换行字符 \n 视为换行，而不是辨识 Unicode 标准中的所有换行字符。是否与（at）换行相符取决于 (?s) 和 (?m)。
(?b) 使 Tcl 将正则表达式解译为 POSIX BRE。
(?e) 使 Tcl 将正则表达式解译为 POSIX ERE。
(?q) 使 Tcl 将正则表达式解译为字符串常数（减去 (?q) 字符）。
(?X) 使使用反斜线转义字母会变成错误，如果该组合不是有效的正则表达式代码。仅受 PCRE 和使用它的语言支持。

仅对正则表达式的一部分打开和关闭模式

现代正则表达式风格允许您仅对正则表达式的一部分套用修改器。如果您在正则表达式的中间插入修改器 (?ism)，则修改器仅套用于修改器右方的正则表达式部分。使用这些风格，您可以通过在减号前面加上减号来关闭模式。减号后的全部模式都会关闭。例如，(?i-sm) 会打开不区分大小写，并关闭单行模式和多行模式。

如果一种风格无法对正则表达式的一部分套用修改器，则它会将正则表达式中间的修改器视为错误。 Python 是此规则的例外。在 Python 中，将修改器放在正则表达式中间会影响整个正则表达式。因此，在 Python 中，(?i)不区分大小写 和 不区分大小写(?i) 都不区分大小写。在所有其他风格中，尾随模式修改器不具任何效果或会变成错误。

你可以快速测试你正在使用的正则表达式风格如何处理模式修饰符。正则表达式 (?i)te(?-i)st 应该会符合 test 和 TEst，但不会符合 teST 或 TEST。

修饰符范围

与其使用两个修饰符，一个用来打开选项，一个用来关闭选项，你可以使用修饰符范围。 (?i)caseless(?-i)cased(?i)caseless 等于 (?i)caseless(?-i:cased)caseless。这个语法类似于非捕获组 (?:group)。你可以将非捕获组视为不会改变任何修饰符的修饰符范围。但是，有些风格，例如 JavaScript、Python 和 Tcl 支持非捕获组，即使它们不支持修饰符范围。与非捕获组一样，修饰符范围不会创建反向引用。

修饰符范围受到所有允许你在正规表达式中间使用模式修饰符的正则表达式风格支持，而且仅受到这些风格支持。这些风格包括 .NET、Java、Perl 和 PCRE、PHP、Delphi 和 R。

關於正規表示式 » 正規表示式教學 » 在正規表示式中指定模式

本網站的更多內容

在正規表示式中指定模式

通常，比對模式會在正規表示法外指定。在程式語言中，您可以將它們作為標記傳遞給正規表示法建構函式，或附加到正規表示法文字。在應用程式中，您會切換適當的按鈕或核取方塊。您可以在本網站的工具和語言區段中找到詳細資訊。

有時，工具或語言不提供指定比對選項的功能。在Java中方便的String.matches()方法不採用比對選項參數，例如Pattern.compile()。或者，正規表示法風格可能支援未顯示為外部標記的比對模式。在R中的正規表示法函式具有ignore.case作為其唯一選項，即使基礎PCRE函式庫具有比本教學課程中討論的任何其他函式庫更多的比對模式。

在這些情況下，您可以將下列模式修改器新增到正規表示法的開頭。若要指定多個模式，只需將它們放在一起，如(?ismx)。

(?i)使正規表示法不區分大小寫。
(?c)使正規表示法區分大小寫。僅受Tcl支援。
(?x)開啟自由間距模式。
(?t)關閉自由間距模式。僅受Tcl支援。
(?xx)開啟自由間距模式，也出現在字元類別中。由Perl 5.26和PCRE2 10.30支援。
(?s)用於「單行模式」，使點比對所有字元，包括換行符號。不受Ruby支援。在Tcl中，(?s)也使插入符號和美元符號僅在字串的開頭和結尾比對。
(?m) 用於「多行模式」，會讓插入符號和美元符號與主旨字串中每一行的開頭和結尾相符。在 Ruby 中，(?m) 會讓點號與所有字元相符，但不會影響插入符號和美元符號，它們在 Ruby 中總是與每一行的開頭和結尾相符。在 Tcl 中，(?m) 也會防止點號與換行符號相符。
(?p) 在 Tcl 中會讓插入符號和美元符號與每一行的開頭和結尾相符，並讓點號與換行符號相符。
(?w) 在 Tcl 中會讓插入符號和美元符號只與主旨字串的開頭和結尾相符，並防止點號與換行符號相符。
(?n) 會將所有未命名群組轉換為非擷取群組。僅受 .NET 和 XRegExp 支援。在 Tcl 中，(?n) 與 (?m) 相同。
(?J) 允許重複群組名稱。僅受 PCRE 和使用它的語言支援，例如 Delphi、PHP 和 R。
(?U) 會開啟「非貪婪模式」，這會切換貪婪和非貪婪量詞的語法。因此 (?U)a* 是非貪婪的，而 (?U)a*? 是貪婪的。僅受 PCRE 和使用它的語言支援。強烈建議不要使用，因為它會混淆標準量詞語法的意義。
(?d) 對應 Java 中的 UNIX_LINES，使句點、插入符號和美元符號僅將換行字元 \n 視為換行，而不是辨識 Unicode 標準中的所有換行字元。是否與（at）換行相符取決於 (?s) 和 (?m)。
(?b) 使 Tcl 將正規表示式解譯為 POSIX BRE。
(?e) 使 Tcl 將正規表示式解譯為 POSIX ERE。
(?q) 使 Tcl 將正規表示式解譯為字串常數（減去 (?q) 字元）。
(?X) 使使用反斜線跳脫字母會變成錯誤，如果該組合不是有效的正規表示式代碼。僅受 PCRE 和使用它的語言支援。

僅對正規表示式的一部分開啟和關閉模式

現代正規表示式風格允許您僅對正規表示式的一部分套用修改器。如果您在正規表示式的中間插入修改器 (?ism)，則修改器僅套用於修改器右方的正規表示式部分。使用這些風格，您可以透過在減號前面加上減號來關閉模式。減號後的全部模式都會關閉。例如，(?i-sm) 會開啟不區分大小寫，並關閉單行模式和多行模式。

如果一種風格無法對正規表示式的一部分套用修改器，則它會將正規表示式中間的修改器視為錯誤。 Python 是此規則的例外。在 Python 中，將修改器放在正規表示式中間會影響整個正規表示式。因此，在 Python 中，(?i)不區分大小寫 和 不區分大小寫(?i) 都不區分大小寫。在所有其他風格中，尾隨模式修改器不具任何效果或會變成錯誤。

你可以快速測試你正在使用的正則表達式風味如何處理模式修飾符。正則表達式 (?i)te(?-i)st 應該會符合 test 和 TEst，但不會符合 teST 或 TEST。

修飾符範圍

與其使用兩個修飾符，一個用來開啟選項，一個用來關閉選項，你可以使用修飾符範圍。 (?i)caseless(?-i)cased(?i)caseless 等於 (?i)caseless(?-i:cased)caseless。這個語法類似於非擷取群組 (?:group)。你可以將非擷取群組視為不會改變任何修飾符的修飾符範圍。但是，有些風味，例如 JavaScript、Python 和 Tcl 支援非擷取群組，即使它們不支援修飾符範圍。與非擷取群組一樣，修飾符範圍不會建立反向參照。

修飾符範圍受到所有允許你在正規表達式中間使用模式修飾符的正則表達式風味支援，而且僅受到這些風味支援。這些風味包括 .NET、Java、Perl 和 PCRE、PHP、Delphi 和 R。

About Regular Expressions » Regular Expressions Tutorial » Specifying Modes Inside The Regular Expression

Specifying Modes Inside The Regular Expression

Normally, matching modes are specified outside the regular expression. In a programming language, you pass them as a flag to the regex constructor or append them to the regex literal. In an application, you’d toggle the appropriate buttons or checkboxes. You can find the specifics in the tools and languages section of this website.

Sometimes, the tool or language does not provide the ability to specify matching options. The handy String.matches() method in Java does not take a parameter for matching options like Pattern.compile() does. Or, the regex flavor may support matching modes that aren’t exposed as external flags. The regex functions in R have ignore.case as their only option, even though the underlying PCRE library has more matching modes than any other discussed in this tutorial.

In those situation, you can add the following mode modifiers to the start of the regex. To specify multiple modes, simply put them together as in (?ismx).

(?i) makes the regex case insensitive.
(?c) makes the regex case sensitive. Only supported by Tcl.
(?x) turn on free-spacing mode.
(?t) turn off free-spacing mode. Only supported by Tcl.
(?xx) turn on free-spacing mode, also in character classes. Supported by Perl 5.26 and PCRE2 10.30.
(?s) for “single line mode” makes the dot match all characters, including line breaks. Not supported by Ruby. In Tcl, (?s) also makes the caret and dollar match at the start and end of the string only.
(?m) for “multi-line mode” makes the caret and dollar match at the start and end of each line in the subject string. In Ruby, (?m) makes the dot match all characters, without affecting the caret and dollar which always match at the start and end of each line in Ruby. In Tcl, (?m) also prevents the dot from matching line breaks.
(?p) in Tcl makes the caret and dollar match at the start and the end of each line, and makes the dot match line breaks.
(?w) in Tcl makes the caret and dollar match only at the start and the end of the subject string, and prevents the dot from matching line breaks.
(?n) turns all unnamed groups into non-capturing groups. Only supported by .NET, and XRegExp. In Tcl, (?n) is the same as (?m).
(?J) allows duplicate group names. Only supported by PCRE and languages that use it such as Delphi, PHP and R.
(?U) turns on “ungreedy mode”, which switches the syntax for greedy and lazy quantifiers. So (?U)a* is lazy and (?U)a*? is greedy. Only supported by PCRE and languages that use it. Its use is strongly discouraged because it confuses the meaning of the standard quantifier syntax.
(?d) corresponds with UNIX_LINES in Java, which makes the dot, caret, and dollar treat only the newline character \n as a line break, instead of recognizing all line break characters from the Unicode standard. Whether they match or don’t match (at) line breaks depends on (?s) and (?m).
(?b) makes Tcl interpret the regex as a POSIX BRE.
(?e) makes Tcl interpret the regex as a POSIX ERE.
(?q) makes Tcl interpret the regex as a literal string (minus the (?q) characters).
(?X) makes escaping letters with a backslash an error if that combination is not a valid regex token. Only supported by PCRE and languages that use it.

Turning Modes On and Off for Only Part of The Regular Expression

Modern regex flavors allow you to apply modifiers to only part of the regular expression. If you insert the modifier (?ism) in the middle of the regex then the modifier only applies to the part of the regex to the right of the modifier. With these flavors, you can turn off modes by preceding them with a minus sign. All modes after the minus sign will be turned off. E.g. (?i-sm) turns on case insensitivity, and turns off both single-line mode and multi-line mode.

If a flavor can’t apply modifiers to only part of the regex then it treats modifiers in the middle of the regex as an error. Python is an exception to this. In Python, putting a modifier in the middle of the regex affects the whole regex. So in Python, (?i)caseless and caseless(?i) are both case insensitive. In all other flavors, the trailing mode modifier either has no effect or is an error.

You can quickly test how the regex flavor you’re using handles mode modifiers. The regex (?i)te(?-i)st should match test and TEst, but not teST or TEST.

Modifier Spans

Instead of using two modifiers, one to turn an option on, and one to turn it off, you use a modifier span. (?i)caseless(?-i)cased(?i)caseless is equivalent to (?i)caseless(?-i:cased)caseless. This syntax resembles that of the non-capturing group (?:group). You could think of a non-capturing group as a modifier span that does not change any modifiers. But there are flavors, like JavaScript, Python, and Tcl that support non-capturing groups even though they do not support modifier spans. Like a non-capturing group, the modifier span does not create a backreference.

Modifier spans are supported by all regex flavors that allow you to use mode modifiers in the middle of the regular expression, and by those flavors only. These include .NET, Java, Perl and PCRE, PHP, Delphi, and R.