Posted by admin on March 5, 2024

Oracle Database Regular Expressions

Starting with version 10g Release 1, Oracle Database offers four regexp functions that you can use in SQL and PL/SQL statements. These functions implement the POSIX Extended Regular Expressions (ERE) standard. Oracle fully supports collating sequences and equivalence classes in bracket expressions. The NLS_SORT setting determines the POSIX locale used, which in turn determines the available collating sequences and equivalence classes.

Oracle does not implement the POSIX ERE standard exactly, however. It deviates in three areas. First, Oracle supports the backreferences \1 through \9 in the regular expression. The POSIX ERE standard does not support these, even though POSIX BRE does. In a fully compliant engine, \1 through \9 would be illegal. Second, the POSIX standard states that it is illegal to use a backslash to escape a character that is not a metacharacter. Oracle allows this, and simply ignores the backslash. E.g. \q is identical to q in Oracle. As a result, every POSIX ERE regular expression can be used with Oracle, but some regular expressions that work in Oracle may cause an error in a fully POSIX-compliant engine. Obviously, if you only work with Oracle, these differences are irrelevant.
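As a minimal sketch of these first two deviations, the queries below pass a backreference and a needlessly escaped letter to Oracle's functions. The sample strings are made up for illustration, and DUAL is Oracle's built-in one-row table rather than anything from the text above.

```sql
-- Backreference inside the regex: ([[:alpha:]])\1 matches a doubled letter.
SELECT REGEXP_SUBSTR('hello world', '([[:alpha:]])\1') AS doubled
FROM dual;  -- 'll'

-- Escaped non-metacharacter: per the text above, Oracle reads \q as q,
-- whereas a strict POSIX ERE engine would reject the pattern.
SELECT CASE WHEN REGEXP_LIKE('q', '^\q$') THEN 'match' ELSE 'no match' END AS escaped_q
FROM dual;
```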

The third difference is more subtle. It won’t cause any errors, but may result in different matches. As I explained in the topic about the POSIX standard, it requires the regex engine to return the longest match in case of alternation. Oracle’s engine does not do this. It is a traditional NFA engine, like all non-POSIX regex flavors discussed on this website.
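The alternation difference can be probed with a query like the one below. This is only a sketch: the subject string and the alternatives are invented for illustration, and no result is claimed here beyond what the paragraph above states.

```sql
-- 'Set' is listed first and already matches at position 1.
-- A leftmost-longest POSIX engine would have to return 'Setting';
-- a traditional NFA engine stops at the first alternative that succeeds.
SELECT REGEXP_SUBSTR('Setting', 'Set|Setting') AS first_alternative
FROM dual;
```

Listing the longer alternative first, as in 'Setting|Set', gives the same match under either rule, which is a simple way to keep a pattern portable.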

If you’ve worked with regular expressions in other programming languages, be aware that POSIX does not support non-printable character escapes like \t for a tab or \n for a newline. You can use these with a POSIX engine in a programming language like C++, because the C++ compiler will interpret the \t and \n in string constants. In SQL statements, you’ll need to type an actual tab or line break in the string with your regular expression to make it match a tab or line break. Oracle’s regex engine will interpret the string '\t' as the regex t when passed as the regexp parameter.
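One way to avoid typing a literal tab into the SQL text is to concatenate it into the pattern. The sketch below assumes Oracle's built-in CHR function and || operator, neither of which is mentioned in the text above; the sample strings are illustrative.

```sql
-- CHR(9) is a tab. Concatenating it into the pattern yields a regex
-- that contains an actual tab character.
SELECT REGEXP_INSTR('a' || CHR(9) || 'b', CHR(9)) AS tab_position
FROM dual;  -- 2

-- Per the text above, '\t' is read as the regex t, so this pattern
-- looks for the letter t rather than for the tab.
SELECT REGEXP_INSTR('a' || CHR(9) || 'bt', '\t') AS t_position
FROM dual;
```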

Oracle 10g R2 further extends the regex syntax by adding a free-spacing mode (without support for comments), shorthand character classes, lazy quantifiers, and the anchors \A, \Z, and \z. Oracle 11g and 12c use the same regex flavor as 10g R2.
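The following queries sketch two of the 10g R2 additions, a shorthand character class and a lazy quantifier. They assume Oracle 10g R2 or later, and the sample strings are illustrative.

```sql
-- Shorthand character class: \d matches a digit.
SELECT REGEXP_SUBSTR('Order 12345', '\d+') AS digits
FROM dual;  -- '12345'

-- Lazy quantifier: .+? stops at the first closing bracket.
SELECT REGEXP_SUBSTR('<b>bold</b>', '<.+?>') AS first_tag
FROM dual;  -- '<b>'

-- Greedy version for comparison: .+ runs to the last closing bracket.
SELECT REGEXP_SUBSTR('<b>bold</b>', '<.+>') AS whole_string
FROM dual;  -- '<b>bold</b>'
```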

Oracle’s REGEXP Functions

Oracle Database 10g offers four regular expression functions, and Oracle 11g adds a fifth, REGEXP_COUNT. You can use these equally in your SQL and PL/SQL statements.

REGEXP_LIKE(source, regexp, modes) is probably the one you’ll use most. You can use it in the WHERE and HAVING clauses of a SELECT statement. In a PL/SQL script, it returns a Boolean value. You can also use it in a CHECK constraint. The source parameter is the string or column the regex should be matched against. The regexp parameter is a string with your regular expression. The modes parameter is optional. It sets the matching modes.

SELECT * FROM mytable WHERE REGEXP_LIKE(mycolumn, 'regexp', 'i');
IF REGEXP_LIKE('subject', 'regexp') THEN /* Match */ ELSE /* No match */ END IF;
ALTER TABLE mytable ADD (CONSTRAINT mycolumn_regexp CHECK (REGEXP_LIKE(mycolumn, '^regexp$')));
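To make the CHECK constraint usage concrete, here is a small sketch with a hypothetical customers table. The table, column, and constraint names are assumptions chosen for illustration only.

```sql
-- Hypothetical table; all names are illustrative.
CREATE TABLE customers (
  id       NUMBER PRIMARY KEY,
  zip_code VARCHAR2(5),
  -- Anchored with ^ and $ so the whole value must be five digits.
  CONSTRAINT zip_code_format CHECK (REGEXP_LIKE(zip_code, '^[0-9]{5}$'))
);

INSERT INTO customers (id, zip_code) VALUES (1, '90210');  -- accepted
INSERT INTO customers (id, zip_code) VALUES (2, '9021x');  -- rejected by the constraint

-- The same function filters rows in a WHERE clause.
SELECT id FROM customers WHERE REGEXP_LIKE(zip_code, '^9');
```

Without the anchors, the constraint would accept any value that merely contains a match, which is rarely what a validation rule intends.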

REGEXP_SUBSTR(source, regexp, position, occurrence, modes) returns a string with the part of source matched by the regular expression. If the match attempt fails, NULL is returned. You can use REGEXP_SUBSTR with a single string or with a column. You can use it in SELECT clauses to retrieve only a certain part of a column. The position parameter specifies the character position in the source string at which the match attempt should start. The first character has position 1. The occurrence parameter specifies which match to get. Set it to 1 to get the first match. If you specify a higher number, Oracle continues to attempt to match the regex starting at the end of the previous match, until it has found as many matches as you specified. The last match is then returned. If there are fewer matches, NULL is returned. Do not confuse this parameter with backreferences. Oracle 10g does not provide a function to return the part of the string matched by a capturing group; Oracle 11g adds an optional sixth parameter to REGEXP_SUBSTR that selects a capturing group. The last three parameters are optional.

SELECT REGEXP_SUBSTR(mycolumn, 'regexp') FROM mytable;
match := REGEXP_SUBSTR('subject', 'regexp', 1, 1, 'i');
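The position and occurrence parameters are easier to follow with concrete values. The queries below are a sketch using a made-up subject string.

```sql
-- occurrence = 2: skip the first match and return the second one.
SELECT REGEXP_SUBSTR('one two three', '[[:alpha:]]+', 1, 2) AS second_word
FROM dual;  -- 'two'

-- occurrence = 4: there are only three matches, so the result is NULL.
SELECT REGEXP_SUBSTR('one two three', '[[:alpha:]]+', 1, 4) AS fourth_word
FROM dual;  -- NULL

-- position = 5: matching starts at the 't' of 'two'.
SELECT REGEXP_SUBSTR('one two three', '[[:alpha:]]+', 5, 1) AS from_position_5
FROM dual;  -- 'two'
```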

REGEXP_REPLACE(source, regexp, replacement, position, occurrence, modes) returns the source string with one or all regex matches replaced. If no matches can be found, the original string is returned unchanged. If you specify a positive number for occurrence (see the above paragraph), only that match is replaced. If you specify zero or omit the parameter, all matches are replaced. The last three parameters are optional. The replacement parameter is a string that each regex match will be replaced with. You can use the backreferences \1 through \9 in the replacement text to re-insert text matched by a capturing group. You can reference the same group more than once. There’s no replacement text token to re-insert the whole regex match. To do that, put parentheses around the whole regexp, and use \1 in the replacement. If you want to insert \1 literally, use the string '\\1'. Backslashes only need to be escaped if they’re followed by a digit or another backslash. To insert \\ literally, use the string '\\\\'. While SQL does not require backslashes to be escaped in strings, the REGEXP_REPLACE function does.

SELECT REGEXP_REPLACE(mycolumn, 'regexp', 'replacement') FROM mytable;
result := REGEXP_REPLACE('subject', 'regexp', 'replacement', 1, 0, 'i');
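The replacement rules described above can be sketched with a few queries. The subject strings are invented for illustration, and the last query states no result beyond what the text above says.

```sql
-- Swap two words with backreferences in the replacement text.
SELECT REGEXP_REPLACE('John Smith', '([[:alpha:]]+) ([[:alpha:]]+)', '\2, \1') AS swapped
FROM dual;  -- 'Smith, John'

-- Re-insert the whole match by wrapping the entire regex in parentheses.
SELECT REGEXP_REPLACE('abc 123', '([0-9]+)', '[\1]') AS bracketed
FROM dual;  -- 'abc [123]'

-- occurrence = 2 replaces only the second match.
SELECT REGEXP_REPLACE('a-b-c', '-', '+', 1, 2) AS second_only
FROM dual;  -- 'a-b+c'

-- occurrence = 0 replaces every match.
SELECT REGEXP_REPLACE('a-b-c', '-', '+', 1, 0) AS every_match
FROM dual;  -- 'a+b+c'

-- Escaped backslash: per the text above, '\\1' inserts the two
-- characters \1 instead of the first capturing group.
SELECT REGEXP_REPLACE('x', '(x)', '\\1') AS literal_backslash_one
FROM dual;
```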

REGEXP_INSTR(source, regexp, position, occurrence, return_option, modes) returns the beginning or ending position of a regex match in the source string. This function takes the same parameters as REGEXP_SUBSTR, plus one more. Set return_option to zero or omit the parameter to get the position of the first character in the match. Set it to one to get the position of the first character after the match. The first character in the string has position 1. REGEXP_INSTR returns zero if the match cannot be found. The last four parameters are optional.

SELECT REGEXP_INSTR(mycolumn, 'regexp', 1, 1, 0, 'i') FROM mytable;
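A short sketch of the two return_option values, using an illustrative subject string:

```sql
-- return_option = 0: position of the first character of the match.
SELECT REGEXP_INSTR('abc123def', '[0-9]+', 1, 1, 0) AS match_start
FROM dual;  -- 4

-- return_option = 1: position of the first character after the match.
SELECT REGEXP_INSTR('abc123def', '[0-9]+', 1, 1, 1) AS after_match
FROM dual;  -- 7

-- No match at all: the function returns zero.
SELECT REGEXP_INSTR('abcdef', '[0-9]+') AS no_match
FROM dual;  -- 0
```

Subtracting the first result from the second gives the length of the match, 3 in this case.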

REGEXP_COUNT(source, regexp, position, modes) returns the number of times the regex can be matched in the source string. It returns zero if the regex finds no matches at all. This function is only available in Oracle 11g and later.

SELECT REGEXP_COUNT(mycolumn, 'regexp', 1, 'i') FROM mytable;
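For a concrete picture of how matches are counted, consider this sketch. It assumes Oracle 11g or later, and the subject strings are illustrative.

```sql
-- Three runs of digits: '1', '22', and '333'.
SELECT REGEXP_COUNT('a1b22c333', '[0-9]+') AS digit_runs
FROM dual;  -- 3

-- Starting at position 4 skips the first run.
SELECT REGEXP_COUNT('a1b22c333', '[0-9]+', 4) AS runs_from_position_4
FROM dual;  -- 2

-- No digits at all: the count is zero.
SELECT REGEXP_COUNT('abc', '[0-9]+') AS no_runs
FROM dual;  -- 0
```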

Oracle’s Matching Modes

The modes parameter that each of the regexp functions accepts should be a string made up of the characters listed below. E.g. 'i' turns on case insensitive matching, while 'inm' turns on those three options. 'i' and 'c' are mutually exclusive, so at most three of the four characters available in 10g Release 1 can be combined, or four of the five characters once 'x' is available in 10g R2. If you omit this parameter or pass an empty string, the default matching modes are used.

'i': Turn on case insensitive matching. The default depends on the NLS_SORT setting.
'c': Turn on case sensitive matching. The default depends on the NLS_SORT setting.
'n': Make the dot match any character, including newlines. By default, the dot matches any character except newlines.
'm': Make the caret and dollar match at the start and end of each line (i.e. after and before line breaks embedded in the source string). By default, these only match at the very start and the very end of the string.
'x': Turn on free-spacing mode, which ignores any unescaped whitespace outside character classes (10g R2 and later).
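The effect of each mode can be sketched with a pair of queries that differ only in the modes argument. CHR(10) is used here as the newline character, which is an assumption about the data rather than something the text above specifies; the 'x' query assumes 10g R2 or later.

```sql
-- 'i': case insensitive matching.
SELECT CASE WHEN REGEXP_LIKE('ORACLE', 'oracle', 'i') THEN 'match' ELSE 'no match' END AS mode_i
FROM dual;  -- 'match'

-- 'n': the dot also matches the newline between a and b.
SELECT REGEXP_INSTR('a' || CHR(10) || 'b', 'a.b', 1, 1, 0, 'n') AS with_n
FROM dual;  -- 1
SELECT REGEXP_INSTR('a' || CHR(10) || 'b', 'a.b') AS without_n
FROM dual;  -- 0

-- 'm': the caret also matches after the embedded newline.
SELECT REGEXP_INSTR('a' || CHR(10) || 'b', '^b', 1, 1, 0, 'm') AS with_m
FROM dual;  -- 3
SELECT REGEXP_INSTR('a' || CHR(10) || 'b', '^b') AS without_m
FROM dual;  -- 0

-- 'x': unescaped spaces in the pattern are ignored.
SELECT REGEXP_SUBSTR('abc', 'a b c', 1, 1, 'x') AS mode_x
FROM dual;  -- 'abc'
```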