Posted by admin on March 5, 2024

PostgreSQL Has Three Regular Expression Flavors

PostgreSQL 7.4 and later use the exact same regular expression engine that Henry Spencer developed for Tcl 8.2. This means that PostgreSQL supports the same three regular expression flavors: Tcl Advanced Regular Expressions (AREs), POSIX Extended Regular Expressions (EREs) and POSIX Basic Regular Expressions (BREs). Just like in Tcl, AREs are the default. All my comments on Tcl’s regular expression flavor, like the unusual mode modifiers and word boundary tokens, fully apply to PostgreSQL. You should definitely review them if you’re not familiar with Tcl’s AREs. Unfortunately, PostgreSQL’s regexp_replace function does not use the same replacement text syntax as Tcl’s regsub command.
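As a small illustration of the word boundary tokens mentioned above, the following sketch uses \m (start of word), \M (end of word) and \y (either edge of a word). The sample strings are made up for this example. The patterns are written as E'...' escape strings, in which every regex backslash is doubled.

```sql
-- \m and \M anchor the match to the start and the end of a word.
SELECT 'a cat sat' ~ E'\\mcat\\M' AS whole_word;    -- true

-- Here "cat" is only the tail of a longer word, so \m cannot match before it.
SELECT 'concat' ~ E'\\mcat\\M' AS inside_word;      -- false

-- \y matches at either edge of a word.
SELECT 'concat' ~ E'\\ycon' AS at_word_start;       -- true
```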

PostgreSQL versions prior to 7.4 supported POSIX Extended Regular Expressions only. If you are migrating old database code to a newer version of PostgreSQL, you can set PostgreSQL’s “regex_flavor” run-time parameter to “extended” instead of the default “advanced” to make EREs the default. Note that this parameter only exists in older releases; it was removed in PostgreSQL 9.0.

PostgreSQL also supports the traditional SQL LIKE operator, and the SQL:1999 SIMILAR TO operator. These use their own pattern languages, which are not discussed here. AREs are far more powerful, and no more complicated as long as you stick to the functionality that LIKE and SIMILAR TO also offer.
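To make the comparison concrete, here is a minimal sketch that expresses the same test with all three operators. The sample string and patterns are chosen only for illustration.

```sql
-- LIKE and SIMILAR TO must match the whole string; ~ matches any part of it,
-- so the regex needs explicit anchors to express the same test.
SELECT 'subject' LIKE 'sub%'                AS like_match,     -- true
       'subject' SIMILAR TO '(sub|ob)ject'  AS similar_match,  -- true
       'subject' ~ '^(sub|ob)ject$'         AS regex_match;    -- true
```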

The Tilde Operator

The tilde infix operator returns true or false depending on whether a regular expression can match part of a string. E.g. 'subject' ~ 'regexp' returns false, while 'subject' ~ '\\w' returns true. If the regex must match the whole string, you’ll need to use anchors. E.g. 'subject' ~ '^\\w$' returns false, while 'subject' ~ '^\\w+$' returns true. There are 4 variations of this operator:

~ attempts a case sensitive match
~* attempts a case insensitive match
!~ attempts a case sensitive match, and returns true if the regex does not match any part of the subject string
!~* attempts a case insensitive match, and returns true if the regex does not match any part of the subject string
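
The four variations can be compared side by side on a single string. This is only a sketch with a made-up subject string that differs from the pattern in case alone.

```sql
SELECT 'Subject' ~   'subject' AS cs_match,     -- false: the case differs
       'Subject' ~*  'subject' AS ci_match,     -- true
       'Subject' !~  'subject' AS cs_no_match,  -- true
       'Subject' !~* 'subject' AS ci_no_match;  -- false
```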

While only case sensitivity can be toggled by the operator, all other options can be set using mode modifiers at the start of the regular expression. Mode modifiers override the operator type. E.g. ‘(?c)regex’ forces the regex to be case sensitive.
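
The override can be seen by pairing each operator with the opposite modifier. The sketch below uses (?c) as described above, together with (?i), its case insensitive counterpart. The sample string is made up.

```sql
-- The mode modifier wins over the case sensitivity implied by the operator.
SELECT 'Subject' ~* '(?c)subject' AS forced_sensitive,    -- false
       'Subject' ~  '(?i)subject' AS forced_insensitive;  -- true
```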

The most common use of this operator is to select rows based on whether a column matches a regular expression, e.g.:

select * from mytable where mycolumn ~* 'regexp'
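
To try this without creating a table, the query can be run against an inline row set. The rows below are invented for the example; mytable and mycolumn stand in for your own table and column.

```sql
WITH mytable(mycolumn) AS (
    VALUES ('RegExp engine'), ('plain text'), ('a regexp here')
)
SELECT mycolumn
FROM mytable
WHERE mycolumn ~* 'regexp';
-- returns the rows 'RegExp engine' and 'a regexp here'
```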

Regular Expressions as Literal PostgreSQL Strings

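The backslash is used to escape characters in PostgreSQL escape string constants (E'...'), and in ordinary '...' literals when the standard_conforming_strings setting is off, which was the default before PostgreSQL 9.1. In such strings, a regular expression like \w that contains a backslash becomes '\\w' when written as a literal string in a PostgreSQL statement. To match a single literal backslash, you’ll need the regex \\ which becomes '\\\\' in PostgreSQL. With standard_conforming_strings on, ordinary literals pass backslashes through unchanged, so '\w' and '\\' are enough there.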
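
The doubling can be checked directly. This sketch uses E'...' escape string constants, which always process backslash escapes whatever the server's string settings are. The sample strings are made up.

```sql
SELECT length(E'\\w')         AS regex_chars,    -- 2: the string holds the regex \w
       'subject' ~ E'\\w'     AS has_word_char,  -- true
       E'C:\\temp' ~ E'\\\\'  AS has_backslash,  -- true: the regex \\ matches one backslash
       'subject' ~ E'\\\\'    AS no_backslash;   -- false
```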

PostgreSQL Regexp Functions

With the substring(string from pattern) function, you can extract part of a string or column. It takes two parameters: the string you want to extract the text from, and the pattern the extracted text should match. If there is no match, substring() returns null. E.g. substring('subject' from 'regexp') returns null. If there is a match, and the regex has one or more capturing groups, the text matched by the first capturing group is returned. E.g. substring('subject' from 's(\\w)') returns ‘u’. If there is a match, but the regex has no capturing groups, the whole regex match is returned. E.g. substring('subject' from 's\\w') returns ‘su’. If the regex matches the string more than once, only the first match is returned. Since the substring() function doesn’t take a “flags” parameter, you’ll need to toggle any matching options using mode modifiers.
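
The cases described above can be run together as a quick check. This is a sketch only; the last column adds a (?i) mode modifier on a made-up uppercase string to show how an option is set without a flags parameter. Patterns are written as E'...' escape strings with doubled backslashes.

```sql
SELECT substring('subject' from 'regexp') IS NULL  AS no_match,       -- true
       substring('subject' from E's(\\w)')         AS first_group,    -- u
       substring('subject' from E's\\w')           AS whole_match,    -- su
       substring('SUBJECT' from E'(?i)s(\\w)')     AS with_modifier;  -- U
```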

This function is particularly useful to extract information from columns. E.g. to extract the first number from the column mycolumn for each row, use:

select substring(mycolumn from '\\d+') from mytable
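
A self-contained variant of this query, with invented rows standing in for mytable, shows what comes back when a row contains several numbers or none. The pattern is written as an E'...' escape string with the backslash doubled.

```sql
WITH mytable(mycolumn) AS (
    VALUES ('order 66 of 100'), ('no digits'), ('7 items')
)
SELECT mycolumn,
       substring(mycolumn from E'\\d+') AS first_number
FROM mytable;
-- first_number is '66', NULL and '7' respectively
```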

With regexp_replace(subject, pattern, replacement [, flags]) you can replace regex matches in a string. If you omit the flags parameter, the regex is applied case sensitively, and only the first match is replaced. If you set the flags to 'i', the regex is applied case insensitively. The 'g' flag (for “global”) causes all regex matches in the string to be replaced. You can combine both flags as 'gi'.
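
The effect of each flag combination is easiest to see on one short string. The sketch below uses a made-up subject with several possible matches.

```sql
SELECT regexp_replace('Banana', 'a', 'o')        AS first_only,   -- Bonana
       regexp_replace('Banana', 'a', 'o', 'g')   AS all_matches,  -- Bonono
       regexp_replace('Banana', 'A', 'o')        AS no_match,     -- Banana
       regexp_replace('Banana', 'A', 'o', 'i')   AS ignore_case,  -- Bonana
       regexp_replace('Banana', 'A', 'o', 'gi')  AS combined;     -- Bonono
```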

You can use the backreferences \1 through \9 in the replacement text to re-insert the text matched by a capturing group into the result. \& re-inserts the whole regex match. Remember to double up the backslashes in literal strings.

E.g. regexp_replace('subject', '(\\w)\\w', '\\&\\1', 'g') returns 'susbjbecet'.
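
The same call can be written with E'...' escape strings, which process the doubled backslashes regardless of the server's string settings. The second column is an additional made-up example that swaps two captured words.

```sql
SELECT regexp_replace('subject', E'(\\w)\\w', E'\\&\\1', 'g')       AS doubled,  -- susbjbecet
       regexp_replace('John Smith', E'(\\w+) (\\w+)', E'\\2, \\1')  AS swapped;  -- Smith, John
```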

PostgreSQL 8.3 and later have two new functions to split a string along its regex matches. regexp_split_to_table(subject, pattern[, flags]) returns the split string as a set of rows, one row per piece. regexp_split_to_array(subject, pattern[, flags]) returns the split string as an array of text. If the regex finds no matches, both functions return the whole subject string, as a single row or a one-element array respectively.
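
A short sketch of both functions, splitting a made-up comma-separated string on a comma followed by optional whitespace. The pattern is written as an E'...' escape string with the backslash doubled.

```sql
-- Array form: one text[] value holding every piece.
SELECT regexp_split_to_array('a, b,c', E',\\s*') AS pieces;   -- {a,b,c}

-- No match: the whole subject comes back as the only element.
SELECT regexp_split_to_array('abc', ',') AS unsplit;          -- {abc}

-- Table form: one row per piece (three rows here: a, b, c).
SELECT piece
FROM regexp_split_to_table('a, b,c', E',\\s*') AS piece;
```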