XML Schema Regular Expressions

The W3C XML Schema standard defines its own regular expression flavor. You can use it in the pattern facet of simple type definitions in your XML schemas. For example, the following defines the simple type “SSN”, using a regular expression to require the element's content to be formatted like a US social security number: three digits, two digits, and four digits, separated by hyphens.

<xsd:simpleType name="SSN">
    <xsd:restriction base="xsd:token">
        <xsd:pattern value="[0-9]{3}-[0-9]{2}-[0-9]{4}"/>
    </xsd:restriction>
</xsd:simpleType>

Compared with other regular expression flavors, the XML schema flavor is quite limited in features. Since it’s only used to validate whether an entire element matches a pattern or not, rather than for extracting matches from large blocks of data, you won’t really miss the features often found in other flavors. The limitations allow schema validators to be implemented with efficient text-directed engines.

Particularly noteworthy is the complete absence of anchors like the caret and dollar, as well as word boundaries and lookaround. XML Schema always implicitly anchors the entire regular expression: the regex must match the whole element for the element to be considered valid. If you have the pattern regexp, the XML Schema validator applies it in the same way as, say, Perl, Java or .NET would apply the pattern ^regexp$ (more precisely ^(?:regexp)$, so that any alternation stays inside the anchors). If you want to accept all elements with regex somewhere in the middle of their contents, you’ll need to use the regular expression .*regex.*. The two .* expand the match to cover the whole element, assuming it doesn’t contain line breaks. If you want to allow line breaks, you can use something like [\s\S]*regex[\s\S]*. Combining a shorthand character class with its negated version results in a character class that matches any character.
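As a minimal sketch of the implicit anchoring, the two types below accept content that has the literal text regex anywhere inside it. The type names ContainsRegex and ContainsRegexMultiline are hypothetical, and the xsd prefix is assumed to be bound to the XML Schema namespace, as in the SSN example.

```xml
<!-- Accepts single-line content with "regex" anywhere inside it -->
<xsd:simpleType name="ContainsRegex">
    <xsd:restriction base="xsd:string">
        <xsd:pattern value=".*regex.*"/>
    </xsd:restriction>
</xsd:simpleType>

<!-- Same idea, but the surrounding content may include line breaks -->
<xsd:simpleType name="ContainsRegexMultiline">
    <xsd:restriction base="xsd:string">
        <xsd:pattern value="[\s\S]*regex[\s\S]*"/>
    </xsd:restriction>
</xsd:simpleType>
```

Had the pattern been just regex, only an element whose entire content is the five characters "regex" would validate.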

XML schemas do not provide a way to specify matching modes. The dot never matches line breaks, and patterns are always applied case sensitively. If you want to match the word literal case insensitively, you’ll need to rewrite the pattern as [lL][iI][tT][eE][rR][aA][lL].
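The same rewriting works for any fixed word. The following sketch, with the hypothetical type name YesNoAnyCase, accepts "yes" or "no" in any mix of upper and lower case by spelling out both cases of every letter.

```xml
<!-- Case-insensitive "yes" or "no", written out letter by letter -->
<xsd:simpleType name="YesNoAnyCase">
    <xsd:restriction base="xsd:token">
        <xsd:pattern value="[yY][eE][sS]|[nN][oO]"/>
    </xsd:restriction>
</xsd:simpleType>
```

Because the whole pattern is implicitly anchored, both alternatives must match the complete content; "yes please" would be rejected.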

XML regular expressions don’t have any tokens like \xFF or \uFFFF to match particular (non-printable) characters. You have to add them as literal characters to your regex. If you are entering the regex into an XML file using a plain text editor, you can use XML's numeric character reference syntax, such as &#xFF;. Otherwise, you’ll need to paste in the characters from a character map.
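For illustration, the sketch below uses a numeric character reference to put a no-break space (U+00A0) into a pattern. The type name DistanceKm is hypothetical. The XML parser resolves the reference before the regex is compiled, so the regex engine only ever sees the literal character.

```xml
<!-- One or more digits, a no-break space, then "km" -->
<xsd:simpleType name="DistanceKm">
    <xsd:restriction base="xsd:string">
        <xsd:pattern value="[0-9]+&#xA0;km"/>
    </xsd:restriction>
</xsd:simpleType>
```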

Lazy quantifiers are not available. Since the pattern is anchored at the start and the end of the subject string anyway, and only a success/failure result is returned, the only potential difference between a greedy and lazy quantifier would be performance. You can never make a fully anchored pattern match or fail by changing a greedy quantifier into a lazy one or vice versa.

XML Schema regular expressions support the following:

Character classes, including shorthands, ranges and negated classes.
Character class subtraction.
The dot, which matches any character except line breaks.
Alternation and groups.
Greedy quantifiers ?, *, + and {n,m}.
Unicode properties and blocks.
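A single pattern can combine several of these features. The sketch below is a made-up example (the type name ProductCode and the code format are assumptions for illustration only): it uses a Unicode property, alternation, groups, ranges, and greedy quantifiers.

```xml
<!-- Two uppercase letters or two digits, a hyphen, three to five digits,
     and an optional suffix such as ".rev" -->
<xsd:simpleType name="ProductCode">
    <xsd:restriction base="xsd:token">
        <xsd:pattern value="(\p{Lu}{2}|[0-9]{2})-[0-9]{3,5}(\.[a-z]+)?"/>
    </xsd:restriction>
</xsd:simpleType>
```

This would accept values such as "AB-123" and "42-12345.rev", and reject "ab-123" because \p{Lu} only matches uppercase letters.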

Note that the regular expression functions available in XQuery and XPath use a different regular expression flavor. This flavor is a superset of the XML Schema flavor described here. It adds some of the features that are available in many modern regex flavors, but not in the XML Schema flavor.

XML Character Classes

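Despite their limitations, XML Schema regular expressions introduce two handy features. The special shorthand character classes \i and \c make it easy to match XML names: \i matches any character that may start an XML name, and \c matches any character that may occur after the first character. Apart from the XQuery and XPath flavor, which is a superset of this one, no other regex flavor supports these.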
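Put together, the two shorthands describe a complete XML name in four characters of regex. A minimal sketch, using the hypothetical type name AnyXmlName:

```xml
<!-- One name-start character followed by any number of name characters -->
<xsd:simpleType name="AnyXmlName">
    <xsd:restriction base="xsd:token">
        <xsd:pattern value="\i\c*"/>
    </xsd:restriction>
</xsd:simpleType>
```

Note that both \i and \c include the colon, so this pattern also accepts prefixed names such as xsd:pattern.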

Character class subtraction makes it easy to match a character that is in a certain list, but not in another list. For example, [a-z-[aeiou]] matches a lowercase English consonant. This feature is now also available in the .NET regex engine. It is particularly handy when working with Unicode properties. For example, [\p{L}-[\p{IsBasicLatin}]] matches any letter outside the Basic Latin block, that is, any letter that is not an unaccented English letter.
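The two sketches below show subtraction inside pattern facets. The type names ConsonantsOnly and NameWithoutColon are hypothetical. The second pattern subtracts the colon from \i and \c, which mirrors the pattern the XML Schema datatypes specification gives for its built-in NCName type.

```xml
<!-- One or more lowercase English consonants -->
<xsd:simpleType name="ConsonantsOnly">
    <xsd:restriction base="xsd:token">
        <xsd:pattern value="[a-z-[aeiou]]+"/>
    </xsd:restriction>
</xsd:simpleType>

<!-- An XML name that contains no colon -->
<xsd:simpleType name="NameWithoutColon">
    <xsd:restriction base="xsd:token">
        <xsd:pattern value="[\i-[:]][\c-[:]]*"/>
    </xsd:restriction>
</xsd:simpleType>
```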