Posted by admin on March 5, 2024

Category: Regular Expressions

PHP Provides Three Sets of Regular Expression Functions

PHP is an open source language for producing dynamic web pages. PHP has three sets of functions that allow you to work with regular expressions.

The most important set of regex functions starts with preg. These functions are a PHP wrapper around the PCRE library (Perl-Compatible Regular Expressions). Anything said about the PCRE regex flavor in the regular expressions tutorial on this website applies to PHP’s preg functions. When the tutorial talks about PHP specifically, it assumes you’re using the preg functions. You should use the preg functions for all new PHP code that uses regular expressions. PHP includes PCRE by default as of PHP 4.2.0 (April 2002).

The oldest set of regex functions are those that start with ereg. They implement POSIX Extended Regular Expressions, like the traditional UNIX egrep command. These functions are mainly for backward compatibility with PHP 3. They are officially deprecated as of PHP 5.3.0 and were removed in PHP 7.0.0. Many of the more modern regex features such as lazy quantifiers, lookaround and Unicode are not supported by the ereg functions. Don’t let the “extended” moniker fool you. The POSIX standard was defined in 1986, and regular expressions have come a long way since then.

The last set is a variant of the ereg set, prefixing mb_ for “multibyte” to the function names. While ereg treats the regex and subject string as a series of 8-bit characters, mb_ereg can work with multi-byte characters from various code pages. If you want your regex to treat Far East characters as individual characters, you’ll either need to use the mb_ereg functions, or the preg functions with the /u modifier. mb_ereg is available in PHP 4.2.0 and later. It uses the same POSIX ERE flavor.

The preg Function Set

All of the preg functions require you to specify the regular expression as a string using Perl syntax. In Perl, /regex/ defines a regular expression. In PHP, this becomes preg_match('/regex/', $subject). When forward slashes are used as the regex delimiter, any forward slashes in the regular expression have to be escaped with a backslash. So https://www\.domain\.com/ becomes '/https:\/\/www\.domain\.com\//'. Just like Perl, the preg functions allow any non-alphanumeric character other than a backslash or whitespace as the regex delimiter. The URL regex would be more readable as '%https://www\.domain\.com/%' using percentage signs as the regex delimiters, since then you don’t need to escape the forward slashes. You would have to escape percentage signs if the regex contained any.
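As a minimal sketch of the delimiter rules described above, the snippet below runs the same URL regex with slash and with percent delimiters. The subject strings and variable names are made up for illustration.

```php
<?php
// Hypothetical subject string, used only for illustration.
$subject = 'Visit https://www.domain.com/ for details';

// Slash delimiters: every literal slash inside the regex must be escaped.
$withSlashes = preg_match('/https:\/\/www\.domain\.com\//', $subject);

// Percent delimiters: the slashes can stay as they are.
$withPercent = preg_match('%https://www\.domain\.com/%', $subject);

// With percent delimiters, a literal percent sign in the regex needs a backslash.
$percentLiteral = preg_match('%100\%%', 'rated 100% safe');
```

All three calls report a match; only the amount of escaping differs.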

Unlike programming languages like C# or Java, PHP does not require all backslashes in strings to be escaped. If you want to include a backslash as a literal character in a PHP string, you only need to escape it if it is followed by another character that needs to be escaped. In single-quoted strings, only the single quote and the backslash itself need to be escaped. That is why in the above regex, I didn’t have to double the backslashes in front of the literal dots. The regex \\ to match a single backslash would become '/\\\\/' as a PHP preg string. Unless you want to use variable interpolation in your regular expression, you should always use single-quoted strings for regular expressions in PHP, to avoid messy duplication of backslashes.
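The backslash rules for single-quoted strings can be checked with a short sketch like the following. The variable names and the sample path are illustrative only.

```php
<?php
// In a single-quoted string, \. reaches PCRE unchanged, so no doubling is needed.
$dotPattern = '/www\.domain\.com/';

// Four backslashes in the PHP source become two in the string: the regex \\,
// which matches one literal backslash in the subject.
$backslashPattern = '/\\\\/';

// Single-quoted, so \t here is a backslash followed by the letter t, not a tab.
$path = 'C:\temp';

$dotIsLiteral = preg_match($dotPattern, 'wwwXdomain.com');   // dot does not match X
$hasBackslash = preg_match($backslashPattern, $path);
```

The pattern string holds four characters in total: the two delimiters and the two backslashes that PCRE reads as one escaped backslash.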

Regex matching options such as case insensitivity are specified in the same way as in Perl. '/regex/i' applies the regex case insensitively. '/regex/s' makes the dot match all characters, including line breaks. '/regex/m' makes the start and end of line anchors match at embedded newlines in the subject string. '/regex/x' turns on free-spacing mode. You can specify multiple letters to turn on several options. '/regex/misx' turns on all four options.
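The effect of each option is easiest to see side by side. The following is a small sketch with an invented two-line subject string; the variable names are not from the original text.

```php
<?php
$text = "First line\nsecond LINE";

// /i: LINE at the end of the subject matches the lowercase regex.
$caseInsensitive = preg_match('/line$/i', $text);

// Without /s the dot stops at the line break; with /s it crosses it.
$dotDefault = preg_match('/First.*second/', $text);
$dotAll     = preg_match('/First.*second/s', $text);

// /m: ^ also matches right after the embedded newline.
$lineStarts = preg_match_all('/^\w+/m', $text, $words);

// /x: whitespace and # comments inside the regex are ignored.
$rangePattern = '/
    (\d+)   # start of the range
    \s*-\s*
    (\d+)   # end of the range
/x';
preg_match($rangePattern, '10 - 20', $range);
```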

A special option is /u, which turns on the Unicode matching mode instead of the default 8-bit matching mode. You should specify /u for regular expressions that use \x{FFFF}, \X or \p{L} to match Unicode characters, graphemes, properties or scripts. PHP will interpret '/regex/u' as a UTF-8 string rather than as an ASCII string.
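To illustrate the difference between the two matching modes, here is a minimal sketch. It builds the UTF-8 subject with PHP 7's \u{...} escapes so that it does not depend on how the source file is saved; the variable names are illustrative.

```php
<?php
// The word 中国 encoded as UTF-8 (two characters, three bytes each).
$word = "\u{4E2D}\u{56FD}";

// Default 8-bit mode: the dot matches a single byte.
preg_match('/./', $word, $byteMode);

// Unicode mode: the dot matches a whole code point.
preg_match('/./u', $word, $unicodeMode);

// Unicode scripts such as \p{Han} should be used together with /u.
$allHan = preg_match('/^\p{Han}+$/u', $word);
```

In 8-bit mode the match is one byte long, which is only a fragment of the first character. With /u the match is the complete first character.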

Like the ereg function, int preg_match (string pattern, string subject [, array groups]) tests whether the regular expression pattern matches the subject string or part of the subject string. It returns 1 on a match and 0 otherwise (or FALSE if an error occurs), so you can use the result directly in an if statement. If you specify the third parameter, preg_match will store the substring matched by the first capturing group in $groups[1]. $groups[2] will contain the text matched by the second group, and so on. If the regex pattern uses named capture, you can access the groups by name with $groups['name']. $groups[0] will hold the overall match.
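A short sketch of how the groups array is filled, assuming a made-up date string and a regex with one named and two numbered groups:

```php
<?php
$subject = 'Released on 2024-03-05';

// One named group (year) followed by two plain capturing groups.
$found = preg_match('/(?<year>\d{4})-(\d{2})-(\d{2})/', $subject, $groups);

if ($found) {
    $overall = $groups[0];       // the whole match
    $year    = $groups['year'];  // named access; the same text is also in $groups[1]
    $month   = $groups[2];
    $day     = $groups[3];
}
```

A named group keeps its number as well, so the year is available under both keys.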

int preg_match_all (string pattern, string subject, array matches, int flags) fills the array “matches” with all the matches of the regular expression pattern in the subject string. If you specify PREG_SET_ORDER as the flag, then $matches[0] is an array containing the match and backreferences of the first match, just like the $groups array filled by preg_match. $matches[1] holds the results for the second match, and so on. If you specify PREG_PATTERN_ORDER, then $matches[0] is an array with full consecutive regex matches, $matches[1] an array with the first backreference of all matches, $matches[2] an array with the second backreference of each match, etc.
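The two orderings hold the same data arranged along different axes. The sketch below uses an invented subject string to show both layouts.

```php
<?php
$subject = 'a=1, b=2';

// One sub-array per match: [whole match, group 1, group 2].
preg_match_all('/(\w)=(\d)/', $subject, $bySet, PREG_SET_ORDER);

// One sub-array per group: [all whole matches], [all group 1], [all group 2].
preg_match_all('/(\w)=(\d)/', $subject, $byPattern, PREG_PATTERN_ORDER);
```

PREG_SET_ORDER is usually the more convenient choice when looping over matches one at a time, while PREG_PATTERN_ORDER suits cases where only one group's values are needed as a list.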

array preg_grep (string pattern, array subjects) returns an array that contains all the strings in the array “subjects” that can be matched by the regular expression pattern.
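For example, filtering a list of file names might look like the sketch below. The file names are invented for illustration.

```php
<?php
$subjects = ['apple.txt', 'notes.md', 'report.txt'];

// Keep only the entries that the regex can match.
$textFiles = preg_grep('/\.txt$/', $subjects);

// The original array keys are preserved; reindex if a plain list is needed.
$reindexed = array_values($textFiles);
```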

mixed preg_replace (mixed pattern, mixed replacement, mixed subject [, int limit]) returns a string with all matches of the regex pattern in the subject string replaced with the replacement string. At most limit replacements are made. One key difference from ereg_replace is that all parameters, except limit, can be arrays instead of strings. In that case, preg_replace does its job multiple times, iterating over the elements in the arrays simultaneously. You can also use strings for some parameters, and arrays for others. Then the function will iterate over the arrays, and use the same strings for each iteration. Using arrays for the pattern and replacement allows you to perform a sequence of search-and-replace operations on a single subject string. Using an array for the subject allows you to perform the same search-and-replace operation on many subject strings, in which case an array of results is returned.
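The array forms can be sketched as follows. The patterns and subject strings are made up for illustration.

```php
<?php
// A sequence of replacements on one subject: collapse whitespace runs into
// hyphens, then trim a leading or trailing hyphen.
$patterns     = ['/\s+/', '/^-|-$/'];
$replacements = ['-', ''];
$slug = preg_replace($patterns, $replacements, '  Hello   World ');

// The same replacement applied to several subjects; the result is an array.
$masked = preg_replace('/\d/', '#', ['a1', 'b22']);

// The limit caps the number of replacements per subject string.
$firstOnly = preg_replace('/\d/', '#', 'b22', 1);
```

The pattern array is processed in order, so the second pattern sees the output of the first.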

preg_replace_callback (mixed pattern, callback replacement, mixed subject [, int limit]) works just like preg_replace, except that the second parameter takes a callback instead of a string or an array of strings. The callback function will be called for each match. The callback should accept a single parameter. This parameter will be an array of strings, with element 0 holding the overall regex match, and the other elements the text matched by capturing groups. This is the same array you’d get from preg_match. The callback function should return the text that the match should be replaced with. Return an empty string to delete the match. Return $groups[0] to skip this match.

Callbacks allow you to do powerful search-and-replace operations that you cannot do with regular expressions alone. E.g. if you search for the regex (\d+)\+(\d+), you can replace 2+3 with 5 using the callback:

function regexadd($groups) {
  return $groups[1] + $groups[2];
}
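Put together with a call to preg_replace_callback, this might look like the sketch below. The function is repeated so the snippet runs on its own, and the subject string is invented.

```php
<?php
// Callback: receives the match array and returns the replacement text.
function regexadd($groups) {
  return $groups[1] + $groups[2];
}

// Each a+b in the subject is replaced with its sum.
$result = preg_replace_callback('/(\d+)\+(\d+)/', 'regexadd', '2+3 and 10+20');
```

The callback is passed by name here; an anonymous function would work equally well as the second argument.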

array preg_split (string pattern, string subject [, int limit]) works just like split, except that it uses the Perl syntax for the regex pattern.
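A brief sketch of preg_split with and without a limit, using an invented comma-separated string:

```php
<?php
$list = 'red , green,blue';

// Split on commas, swallowing any whitespace around them.
$parts = preg_split('/\s*,\s*/', $list);

// With a limit of 2, the string is split once and the rest is left intact.
$limited = preg_split('/\s*,\s*/', $list, 2);
```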

See the PHP manual for more information on the preg function set

The ereg Function Set

The ereg functions require you to specify the regular expression as a string, as you would expect. ereg('regex', "subject") checks if regex matches subject. You should use single quotes when passing a regular expression as a literal string. Several special characters like the dollar and backslash are also special characters in double-quoted PHP strings, but not in single-quoted PHP strings.

int ereg (string pattern, string subject [, array groups]) returns the length of the match if the regular expression pattern matches the subject string or part of the subject string, or FALSE otherwise. Since a successful match returns a non-zero value, which evaluates to TRUE, you can use ereg in an if statement to test for a match. If you specify the third parameter, ereg will store the substring matched by the part of the regular expression between the first pair of parentheses in $groups[1]. $groups[2] will contain the second pair, and so on. Note that grouping-only parentheses are not supported by ereg. ereg is case sensitive. eregi is the case insensitive equivalent.

string ereg_replace (string pattern, string replacement, string subject) replaces all matches of the regex pattern in the subject string with the replacement string. You can use backreferences in the replacement string. \\0 is the entire regex match, \\1 is the first backreference, \\2 the second, etc. The highest possible backreference is \\9. ereg_replace is case sensitive. eregi_replace is the case insensitive equivalent.

array split (string pattern, string subject [, int limit]) splits the subject string into an array of strings using the regular expression pattern. The array will contain the substrings between the regular expression matches. The text actually matched is discarded. If you specify a limit, the resulting array will contain at most that many substrings. The subject string will be split at most limit-1 times, and the last item in the array will contain the unsplit remainder of the subject string. split is case sensitive. spliti is the case insensitive equivalent.
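Because the ereg functions no longer exist in PHP 7 and later, code that still uses them has to be ported to preg. The following is a rough sketch of the usual correspondences, assuming simple patterns that mean the same thing in POSIX ERE and PCRE; the old calls appear only in comments, and the patterns and subjects are invented.

```php
<?php
// eregi('^hello', $s): add delimiters and the /i modifier.
$matched = preg_match('/^hello/i', 'Hello world');

// ereg_replace('([a-z]+) ([a-z]+)', '\\2 \\1', $s): the \\1 style of
// backreference in the replacement string also works with preg_replace.
$swapped = preg_replace('/([a-z]+) ([a-z]+)/', '\\2 \\1', 'hello world');

// split('[,;]', $s, 2): preg_split takes the limit in the same position.
$pieces = preg_split('/[,;]/', 'a,b;c', 2);
```

Patterns that contain the chosen delimiter, or that rely on POSIX-specific behavior, need more care than this sketch shows.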

See the PHP manual for more information on the ereg function set

The mb_ereg Function Set

The mb_ereg functions work exactly the same as the ereg functions, with one key difference: while ereg treats the regex and subject string as a series of 8-bit characters, mb_ereg can work with multi-byte characters from various code pages. E.g. encoded with Windows code page 936 (Simplified Chinese), the word 中国 (“China”) consists of four bytes: D6D0B9FA. Using the ereg function with the regular expression . on this string would yield the first byte D6 as the result. The dot matched exactly one byte, as the ereg functions are byte-oriented. Using the mb_ereg function after calling mb_regex_encoding("CP936") would yield the bytes D6D0 or the first character 中 as the result.

To make sure your regular expression uses the correct code page, call mb_regex_encoding() to set the code page. If you don’t, the code page returned by or set by mb_internal_encoding() is used instead.
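As a minimal sketch of setting the encoding before matching, the snippet below uses UTF-8 rather than code page 936, since UTF-8 is the encoding most likely to be available everywhere. It assumes the mbstring extension may or may not be installed, so the call is guarded; the variable names are illustrative.

```php
<?php
// The word 中国 as UTF-8 bytes (an assumption; the text above uses CP936).
$word = "\u{4E2D}\u{56FD}";
$firstChar = null;

// mb_ereg only exists when the mbstring extension is loaded.
if (function_exists('mb_ereg')) {
    mb_regex_encoding('UTF-8');          // tell mb_ereg how to read the bytes
    if (mb_ereg('.', $word, $groups)) {
        $firstChar = $groups[0];         // the first whole character, not one byte
    }
}
```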

If your PHP script uses UTF-8, you can use the preg functions with the /u modifier to match multi-byte UTF-8 characters instead of individual bytes. The preg functions do not support any other code pages.

See the PHP manual for more information on the mb_ereg function set