Feature | Syntax | Description | Example | .NET | Java | Perl | PCRE | PCRE2 | PHP | Delphi | R | JavaScript | VBScript | XRegExp | Python | Ruby | std::regex | Boost | Tcl ARE | POSIX BRE | POSIX ERE | GNU BRE | GNU ERE | Oracle | XML | XPath |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Grapheme | \X | Matches a single Unicode grapheme, whether encoded as a single code point or multiple code points using combining marks. A grapheme most closely resembles the everyday concept of a “character”. | \X matches à encoded as U+0061 U+0300, à encoded as U+00E0, ©, etc. | no | 9 | YES | 5.0 | YES | 5.0.5 | YES | YES | no | no | no | no | 2.0 | no | ECMA extended egrep awk | no | no | no | no | no | no | no | no |
Code point | \uFFFF where FFFF are 4 hexadecimal digits | Matches a specific Unicode code point. | \u00E0 matches à encoded as U+00E0 only. \u00A9 matches © | YES | YES | no | no | no | no | no | no | YES | YES | YES | 3.3 2.4 string | 1.9 | ECMA | no | YES | no | no | no | no | no | no | no |
Code point | \u{FFFF} where FFFF are 1 to 4 hexadecimal digits | Matches a specific Unicode code point. | \u{E0} matches à encoded as U+00E0 only. \u{A9} matches © | no | no | no | no | no | 7.0.0 string | no | no | no | no | 3 | no | 1.9 | no | no | no | no | no | no | no | no | no | no |
Code point | \xFFFF where FFFF are 4 hexadecimal digits | Matches a specific Unicode code point. | \x00E0 matches à encoded as U+00E0 only. \x00A9 matches © | no | no | no | no | no | no | no | no | no | no | no | no | no | string | no | 8.4–8.5 | no | no | no | no | no | no | no |
Code point | \x{FFFF} where FFFF are 1 to 4 hexadecimal digits | Matches a specific Unicode code point. | \x{E0} matches à encoded as U+00E0 only. \x{A9} matches © | no | 7 | YES | YES | YES | YES | YES | YES | no | no | no | no | no | no | ECMA extended egrep awk | no | no | no | no | no | no | no | no |
Unicode category | \pL where L is a Unicode category | Matches a single Unicode code point in the specified Unicode category. | \pL matches à encoded as U+00E0; \pS matches © | no | YES | YES | 5.0 | YES | 5.0.5 | YES | YES | no | no | 3 | no | no | no | no | no | no | no | no | no | no | no | no |
Unicode category | \PL where L is a Unicode category | Matches a single Unicode code point that is not in the specified Unicode category. | \PS matches à encoded as U+00E0; \PL matches © | no | YES | YES | 5.0 | YES | 5.0.5 | YES | YES | no | no | 3 | no | no | no | no | no | no | no | no | no | no | no | no |
Unicode category | \p{L} where L is a Unicode category | Matches a single Unicode code point in the specified Unicode category. | \p{L} matches à encoded as U+00E0; \p{S} matches © | YES | YES | YES | 5.0 | YES | 5.0.5 | YES | YES | no | no | YES | no | 1.9 | no | no | no | no | no | no | no | no | YES | YES |
Unicode category | \p{IsL} where L is a Unicode category | Matches a single Unicode code point in the specified Unicode category. | \p{IsL} matches à encoded as U+00E0; \p{IsS} matches © | no | YES | YES | no | no | no | no | no | no | no | no | no | no | no | no | no | no | no | no | no | no | no | no |
Unicode category | \p{Category} | Matches a single Unicode code point in the specified Unicode category. | \p{Letter} matches à encoded as U+00E0; \p{Symbol} matches © | no | no | YES | no | no | no | no | no | no | no | YES | no | 1.9 | no | no | no | no | no | no | no | no | no | no |
Unicode category | \p{IsCategory} | Matches a single Unicode code point in the specified Unicode category. | \p{IsLetter} matches à encoded as U+00E0; \p{IsSymbol} matches © | no | no | YES | no | no | no | no | no | no | no | no | no | no | no | no | no | no | no | no | no | no | no | no |
Unicode script | \p{Script} | Matches a single Unicode code point that is part of the specified Unicode script. Each Unicode code point is part of exactly one script. Scripts never contain unassigned code points. | \p{Greek} matches Ω | no | no | YES | 6.5 | YES | 5.1.3 | YES | YES | no | no | YES | no | 1.9 | no | no | no | no | no | no | no | no | no | no |
Unicode script | \p{IsScript} | Matches a single Unicode code point that is part of the specified Unicode script. Each Unicode code point is part of exactly one script. Scripts never contain unassigned code points. | \p{IsGreek} matches Ω | no | 7 | YES | no | no | no | no | no | no | no | no | no | no | no | no | no | no | no | no | no | no | no | no |
Unicode block | \p{Block} | Matches a single Unicode code point that is part of the specified Unicode block. Each Unicode code point is part of exactly one block. Blocks may contain unassigned code points. | \p{Arrows} matches any of the code points from U+2190 until U+21FF (← until ⇿) | no | no | YES | no | no | no | no | no | no | no | no | no | no | no | no | no | no | no | no | no | no | no | no |
Unicode block | \p{InBlock} | Matches a single Unicode code point that is part of the specified Unicode block. Each Unicode code point is part of exactly one block. Blocks may contain unassigned code points. | \p{InArrows} matches any of the code points from U+2190 until U+21FF (← until ⇿) | no | YES | YES | no | no | no | no | no | no | no | 2–4 | no | 2.0 | no | no | no | no | no | no | no | no | no | no |
Unicode block | \p{IsBlock} | Matches a single Unicode code point that is part of the specified Unicode block. Each Unicode code point is part of exactly one block. Blocks may contain unassigned code points. | \p{IsArrows} matches any of the code points from U+2190 until U+21FF (← until ⇿) | YES | no | YES | no | no | no | no | no | no | no | no | no | no | no | no | no | no | no | no | no | no | YES | YES |
Negated Unicode property | \P{Property} | Matches a single Unicode code point that does not have the specified property (category, script, or block). | \P{L} matches © | YES | YES | YES | 5.0 | YES | 5.0.5 | YES | YES | no | no | YES | no | 1.9 | no | ECMA extended egrep awk | no | no | no | no | no | no | YES | YES |
Negated Unicode property | \p{^Property} | Matches a single Unicode code point that does not have the specified property (category, script, or block). | \p{^L} matches © | no | no | YES | 5.0 | YES | 5.0.5 | YES | YES | no | no | YES | no | 1.9 | no | no | no | no | no | no | no | no | no | no |
Unicode property | \P{^Property} | Matches a single Unicode code point that does have the specified property (category, script, or block). Double negative is taken as positive. | \P{^L} matches q | no | no | YES | 5.0 | YES | 5.0.5 | YES | YES | no | no | no | no | 1.9 | no | no | no | no | no | no | no | no | no | no |
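
As a rough illustration of the Code point and Unicode category rows, the sketch below uses Python, whose re module the table lists as supporting only the \uFFFF syntax. The helper in_category is a hypothetical stand-in for \p{L} and \p{S}, built on the standard unicodedata module rather than on any regex syntax. The NFC normalization step is one possible workaround for the missing \X, not an equivalent of it.

```python
import re
import unicodedata

PRECOMPOSED = "\u00E0"   # à as the single code point U+00E0
DECOMPOSED = "a\u0300"   # à as U+0061 followed by the combining grave accent U+0300

# The raw string keeps the backslash, so the regex engine itself parses \u00E0.
code_point = re.compile(r"\u00E0")


def in_category(ch, prefix):
    # Hypothetical helper, not part of re: approximates a category test such as
    # \p{L} by checking the general category reported by unicodedata.
    return unicodedata.category(ch).startswith(prefix)


# \u00E0 matches only the precomposed encoding.
print(code_point.fullmatch(PRECOMPOSED) is not None)   # True
print(code_point.search(DECOMPOSED) is not None)       # False

# Category checks that mirror the \p{L} and \p{S} examples in the table.
print(in_category(PRECOMPOSED, "L"))   # True: à is a letter
print(in_category("\u00A9", "S"))      # True: © is a symbol

# Assumed workaround: normalize to NFC so both encodings become U+00E0.
print(unicodedata.normalize("NFC", DECOMPOSED) == PRECOMPOSED)   # True
```

Normalizing before matching only helps when a precomposed form exists. A flavor with \X, such as those marked YES in the Grapheme row, handles arbitrary base-plus-combining-mark sequences directly.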