Non-Printable Characters

Most applications and programming languages do not support any special syntax in the replacement text to make it easier to enter non-printable characters. If you are the end user of an application, that means you’ll have to use an application such as the Windows Character Map to help you enter characters that you cannot type on your keyboard. If you are programming, you can specify the replacement text as a string constant in your source code. Then you can use the syntax for string constants in your programming language to specify non-printable characters.
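As a minimal sketch of the string-constant approach, the Python snippet below writes non-printable and hard-to-type characters with the language's own string syntax. The sample input strings are made up for illustration. Because Python converts the escapes while reading the source code, re.sub() receives the actual characters and needs no special replacement syntax.

```python
import re

# "\t" is converted to a real tab by Python's string-constant syntax,
# so re.sub() receives a one-character replacement string.
tab_separated = re.sub(r", ", "\t", "a, b, c")

# "\u20ac" is the euro sign (U+20AC), written without typing it directly.
with_euro = re.sub(r"EUR", "\u20ac", "EUR 10")

print(repr(tab_separated))
print(with_euro)
```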

Python supports escape sequences such as \t, \r, and \n in replacement text, in addition to supporting them in string constants. Python and Boost also support these more exotic non-printables: \a (bell, 0x07), \f (form feed, 0x0C) and \v (vertical tab, 0x0B).
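The following sketch illustrates that Python's re module interprets these escapes itself. Raw strings are used so that the backslashes reach re.sub() untouched, just as they would if the replacement text came from a file or from user input. The variable names and sample text are illustrative only.

```python
import re

# A raw string keeps the backslash and the letter as two separate characters,
# so it is re.sub(), not the string-constant syntax, that converts them.
replacement = r"\a\f\v"
assert len(replacement) == 6

result = re.sub("X", replacement, "X")

# Bell (0x07), form feed (0x0C), vertical tab (0x0B)
codes = [ord(ch) for ch in result]
print(codes)
```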

Boost also supports hexadecimal escapes. You can use \x{FFFF} to insert a Unicode character. The euro currency sign occupies Unicode code point U+20AC. If you cannot type it on your keyboard, you can insert it into the replacement text with \x{20AC}. For the 128 ASCII characters, you can use \x00 through \x7F. If you are using Boost with 8-bit character strings, you can also use \x80 through \xFF to insert characters from those 8-bit code pages.

Python does not support hexadecimal escapes in the replacement text syntax, even though it supports \xFF and \uFFFF in string constants.
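The difference can be sketched as follows. This assumes Python 3.7 or later, where re.sub() raises re.error for an unrecognized letter escape in the replacement text. Older versions may behave differently.

```python
import re

# In a string constant, Python itself converts \x41 to "A" before re.sub() runs.
from_literal = re.sub("X", "\x41", "X")

# A raw string keeps the backslash, as if the text came from a file or user input.
# Assumption: Python 3.7 or later, where re.sub() rejects unknown letter escapes.
try:
    re.sub("X", r"\x41", "X")
    rejected = False
except re.error:
    rejected = True

print(from_literal, rejected)
```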

Regex Syntax versus String Syntax

Many programming languages support escapes for non-printable characters in their syntax for literal strings in source code. Such escapes are translated by the compiler into their actual characters before the string is passed to the search-and-replace function. If the search-and-replace function does not support the same escapes, this can cause an apparent difference in behavior when a regex is specified as a literal string in source code compared with a regex that is read from a file or received from user input.

For example, JavaScript's string.replace() function does not support any of these escapes. But the JavaScript language does support escapes like \n, \x0A, and \u000A in string literals. So when developing an application in JavaScript, \n is only interpreted as a newline when you add the replacement text as a string literal to your source code. The JavaScript interpreter then translates \n and the string.replace() function sees an actual newline character. If your code reads the same replacement text from a file, then the string.replace() function sees \n, which it treats as a literal backslash and a literal n.
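The same two-layer effect can be sketched in Python as an analogy. This is not JavaScript code: Python's str.replace() method stands in for a replace function that interprets no escapes, and io.StringIO stands in for a file. The names from_source and from_file are illustrative.

```python
import io

# Layer 1: the string-literal syntax converts \n to a real newline (1 character).
from_source = "\n"

# Simulated file holding a backslash followed by the letter n (2 characters).
# No string-literal processing happens to text that is read at run time.
from_file = io.StringIO(r"\n").read()

# str.replace() performs no escape processing on the replacement text.
with_newline = "a,b".replace(",", from_source)
with_backslash_n = "a,b".replace(",", from_file)

print(repr(with_newline))
print(repr(with_backslash_n))
```

The first result contains an actual newline. The second contains a literal backslash and a literal n, which mirrors what the text describes for replacement text read from a file.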