C++ Regular Expressions with std::regex

The C++ standard library as defined in the C++11 standard provides support for regular expressions in the <regex> header. Prior to C++11, <regex> was part of the TR1 extension to the C++ standard library. When this website mentions std::regex, this refers to the Dinkumware implementation of the C++ standard library that is included with Visual C++ 2008 and later. It is also supported by C++Builder XE3 and later when targeting Win64. In Visual C++ 2008, the namespace is std::tr1::regex rather than std::regex.

C++Builder 10 and later support the Dinkumware implementation of std::regex when targeting Win32 if you disable the option to use the classic Borland compiler. When using the classic Borland compiler in C++Builder XE3 and later, you can use boost::regex instead of std::regex. While std::regex as defined in TR1 and C++11 defines pretty much the same operations and classes as boost::regex, there are a number of important differences in the actual regex flavor. Most importantly, the ECMAScript regex syntax in Boost adds a number of features borrowed from Perl that aren’t part of the ECMAScript standard and that aren’t implemented in the Dinkumware library.

Six Regular Expression Flavors

Six different regular expression flavors or grammars are defined in std::regex_constants:

ECMAScript: Similar to JavaScript.
basic: Similar to POSIX BRE.
extended: Similar to POSIX ERE.
grep: Same as basic, with the addition of treating line feeds as alternation operators.
egrep: Same as extended, with the addition of treating line feeds as alternation operators.
awk: Same as extended, with the addition of supporting common escapes for non-printable characters.
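As a rough illustration of selecting one of these grammars, the sketch below passes a flavor constant as the second constructor argument. The helper name full_match_with is made up for this example, and the behavior described afterwards assumes an implementation that follows POSIX in treating | as a literal character in the basic grammar.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not part of the standard library): compile
// `pattern` with the given grammar and report whether it matches the
// whole of `subject`.
bool full_match_with(const std::string& pattern,
                     const std::string& subject,
                     std::regex_constants::syntax_option_type grammar) {
  std::regex re(pattern, grammar);
  return std::regex_match(subject, re);
}
```

With this helper, full_match_with("a|b", "a", std::regex_constants::extended) should return true because POSIX ERE supports alternation, while the same pattern compiled with std::regex_constants::basic is expected to match only the literal text a|b.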

Most C++ references talk as if C++11 implements regular expressions as defined in the ECMA-262v3 and POSIX standards. But in reality the C++ implementation is very loosely based on these standards. The syntax is quite close. The only significant differences are that std::regex supports POSIX classes even in ECMAScript mode, and that it is a bit peculiar about which characters must be escaped (like curly braces and closing square brackets) and which must not be escaped (like letters).
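To make the point about POSIX classes concrete, here is a minimal sketch that uses [[:alpha:]] and [[:digit:]] with the default ECMAScript grammar. The function name is_letters_then_digits is invented for illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if `s` consists of one or more letters
// followed by one or more digits. The POSIX bracket classes are used
// here even though the regex is compiled with the default ECMAScript
// grammar.
bool is_letters_then_digits(const std::string& s) {
  static const std::regex re("[[:alpha:]]+[[:digit:]]+");
  return std::regex_match(s, re);
}
```

In JavaScript itself the same pattern would not be read as a pair of POSIX classes, so this is one place where regexes are not portable between the two.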

But there are important differences in the actual behavior of this syntax. The caret and dollar always match at embedded line breaks in std::regex, while in JavaScript and POSIX this is an option. Backreferences to non-participating groups fail to match as in most regex flavors, while in JavaScript they find a zero-length match. In JavaScript, \d and \w are ASCII-only while \s matches all Unicode whitespace. This is odd, but all modern browsers follow the spec. In std::regex all the shorthands are ASCII-only when using strings of char. In Visual C++, but not in C++Builder, they support Unicode when using strings of wchar_t. The POSIX classes also match non-ASCII characters when using wchar_t in Visual C++, but do not consistently include all the Unicode characters that one would expect.

In practice, you’ll mostly use the ECMAScript grammar. It’s the default grammar and offers far more features than the other grammars. Whenever the tutorial on this website mentions std::regex without naming a grammar, what is written applies to the ECMAScript grammar and may or may not apply to any of the other grammars. You’ll really only use the other grammars if you want to reuse existing regular expressions from old POSIX code or UNIX scripts.

Creating a Regular Expression Object

Before you can use a regular expression, you have to create an object of the template class std::basic_regex. You can easily do this with the std::regex instantiation of this template class if your subject is an array of char or an std::string object. Use the std::wregex instantiation if your subject is an array of wchar_t or an std::wstring object.

Pass your regex as a string as the first parameter to the constructor. If you want to use a regex flavor other than ECMAScript, pass the appropriate constant as a second parameter. You can “or” this constant with std::regex_constants::icase to make the regex case insensitive. You can also “or” it with std::regex_constants::nosubs to turn all capturing groups into non-capturing groups, which makes your regex more efficient if you only care about the overall regex match and don’t want to extract text matched by any of the capturing groups.
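A short sketch of combining these constants follows. The function names are hypothetical, and the zero group count with nosubs reflects how the flag is specified rather than something verified on every compiler.

```cpp
#include <regex>
#include <string>

// Case-insensitive search: icase is "or"-ed with the grammar constant.
bool contains_hello(const std::string& subject) {
  std::regex re("hello",
                std::regex_constants::ECMAScript | std::regex_constants::icase);
  return std::regex_search(subject, re);
}

// With nosubs the parentheses no longer capture, so the regex object
// is expected to report zero marked subexpressions.
unsigned count_groups_with_nosubs() {
  std::regex re("(\\d+)-(\\d+)",
                std::regex_constants::ECMAScript | std::regex_constants::nosubs);
  return re.mark_count();
}
```

Spelling out std::regex_constants::ECMAScript is optional when it is the only grammar you use, but it keeps the intent clear once other flags are added.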

Finding a Regex Match

Call std::regex_search() with your subject string as the first parameter and the regex object as the second parameter to check whether your regex can match any part of the string. Call std::regex_match() with the same parameters if you want to check whether your regex can match the entire subject string. Since std::regex lacks anchors that exclusively match at the start and end of the string, you have to call regex_match() when using a regex to validate user input.
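The difference between the two calls can be sketched with a simple digits check. Both helper names below are made up for this example.

```cpp
#include <regex>
#include <string>

// True if the input contains at least one run of digits anywhere.
bool contains_digits(const std::string& input) {
  static const std::regex re("\\d+");
  return std::regex_search(input, re);
}

// True only if the entire input is digits, which is what you
// normally want when validating user input.
bool is_all_digits(const std::string& input) {
  static const std::regex re("\\d+");
  return std::regex_match(input, re);
}
```

For the input abc123, contains_digits() returns true while is_all_digits() returns false, even though both use the same regex.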

Both regex_search() and regex_match() return just true or false. To get the part of the string matched by regex_search(), or to get the parts of the string matched by capturing groups when using either function, you need to pass an object of the template class std::match_results as the second parameter. The regex object then becomes the third parameter. Create this object using the default constructor of one of these four template instantiations:

std::cmatch when your subject is an array of char
std::smatch when your subject is an std::string object
std::wcmatch when your subject is an array of wchar_t
std::wsmatch when your subject is an std::wstring object

When the function call returns true, you can call the str(), position(), and length() member functions of the match_results object to get the text that was matched, or the starting position and length of the match relative to the subject string. Call these member functions without a parameter or with 0 as the parameter to get the overall regex match. Call them passing 1 or greater to get the match of a particular capturing group. The size() member function indicates the number of capturing groups plus one for the overall match. Thus you can pass a value up to size()-1 to the other three member functions.

Putting it all together, we can get the text matched by the first capturing group like this:

// Requires <regex> and <string>
std::string subject("Name: John Doe");
std::string result;
try {
  std::regex re("Name: (.*)");
  std::smatch match;
  // size() > 1 confirms the regex has at least one capturing group
  if (std::regex_search(subject, match, re) && match.size() > 1) {
    result = match.str(1);  // text matched by the first capturing group
  } else {
    result.clear();
  }
} catch (const std::regex_error& e) {
  // Syntax error in the regular expression
}
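The position(), length(), and size() member functions described earlier can be sketched in the same way. The NameParts struct and the find_name function are hypothetical names introduced only for this example.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical result type for this sketch.
struct NameParts {
  std::string first;             // text of capturing group 1
  std::string last;              // text of capturing group 2
  std::ptrdiff_t match_pos = -1; // start of the overall match
  std::ptrdiff_t match_len = 0;  // length of the overall match
  std::size_t entries = 0;       // match.size(): groups plus one
};

// Finds the first pair of space-separated words in `subject`.
NameParts find_name(const std::string& subject) {
  NameParts parts;
  static const std::regex re("(\\w+) (\\w+)");
  std::smatch match;
  if (std::regex_search(subject, match, re)) {
    parts.first = match.str(1);
    parts.last = match.str(2);
    parts.match_pos = match.position(0);
    parts.match_len = match.length(0);
    parts.entries = match.size();
  }
  return parts;
}
```

For the subject Name: John Doe, the overall match is John Doe, which starts at offset 6 and is 8 characters long, and size() is 3 because the regex has two capturing groups.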

Finding All Regex Matches

To find all regex matches in a string, you need to use an iterator. Construct an object of the template class std::regex_iterator using one of these four template instantiations:

std::cregex_iterator when your subject is an array of char
std::sregex_iterator when your subject is an std::string object
std::wcregex_iterator when your subject is an array of wchar_t
std::wsregex_iterator when your subject is an std::wstring object

Construct one object by calling the constructor with three parameters: a string iterator indicating the starting position of the search, a string iterator indicating the ending position of the search, and the regex object. If there are any matches to be found, the object will hold the first match when constructed. Construct another iterator object using the default constructor to get an end-of-sequence iterator. You can compare the first object to the second to determine whether there are any further matches. As long as the first object is not equal to the second, you can dereference the first object to get a match_results object.

// Requires <iostream>, <regex>, and <string>
std::string subject("This is a test");
try {
  std::regex re("\\w+");
  std::sregex_iterator next(subject.begin(), subject.end(), re);
  std::sregex_iterator end;  // default-constructed end-of-sequence iterator
  while (next != end) {
    std::smatch match = *next;
    std::cout << match.str() << "\n";
    ++next;  // advance to the next match
  }
} catch (const std::regex_error& e) {
  // Syntax error in the regular expression
}
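When the subject is an array of char rather than an std::string, the same loop can be written with std::cregex_iterator. The sketch below collects the matches into a vector instead of printing them; find_words is a made-up name.

```cpp
#include <cstring>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns every run of word characters found in
// a null-terminated char array.
std::vector<std::string> find_words(const char* subject) {
  std::vector<std::string> words;
  const std::regex re("\\w+");
  // For char arrays, the begin and end "iterators" are plain pointers.
  std::cregex_iterator next(subject, subject + std::strlen(subject), re);
  const std::cregex_iterator end;  // end-of-sequence iterator
  for (; next != end; ++next) {
    words.push_back(next->str());
  }
  return words;
}
```

Note that the regex object has to stay alive for as long as the iterator is in use, which is why it is a named local variable here rather than a temporary.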

Replacing All Matches

To replace all matches in a string, call std::regex_replace() with your subject string as the first parameter, the regex object as the second parameter, and the string with the replacement text as the third parameter. The function returns a new string with the replacements applied.

The replacement string syntax is similar but not identical to that of JavaScript. The same replacement string syntax is used regardless of which regex syntax or grammar you are using. You can use $& or $0 to insert the whole regex match and $1 through $9 to insert the text matched by the first nine capturing groups. There is no way to insert the text matched by groups 10 or higher. $10 and higher are always replaced with nothing, and $9 and lower are replaced with nothing if there are fewer capturing groups in the regex than the requested number. $` (dollar backtick) is the part of the string to the left of the match, and $' (dollar quote) is the part of the string to the right of the match.
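As a small sketch of the replacement syntax, the two helpers below use $1, $2, and $&. The function names are invented for this example.

```cpp
#include <regex>
#include <string>

// Swaps two space-separated words using group references $1 and $2.
std::string swap_names(const std::string& subject) {
  static const std::regex re("(\\w+) (\\w+)");
  return std::regex_replace(subject, re, "$2, $1");
}

// Wraps every word in square brackets using $& (the whole match).
std::string bracket_words(const std::string& subject) {
  static const std::regex re("\\w+");
  return std::regex_replace(subject, re, "[$&]");
}
```

Here swap_names("John Doe") yields Doe, John and bracket_words("a test") yields [a] [test]. Text outside the matches is copied to the result unchanged.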