Non-Printable Characters
Most applications and programming languages do not support any special syntax in the replacement text to make it easier to enter non-printable characters. If you are the end user of an application, that means you’ll have to use an application such as the Windows Character Map to help you enter characters that you cannot type on your keyboard. If you are programming, you can specify the replacement text as a string constant in your source code. Then you can use the syntax for string constants in your programming language to specify non-printable characters.
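As a minimal sketch of the string-constant approach, the Python snippet below writes a tab into the replacement text using the language's own string syntax. The sample text and variable names are made up for illustration.

```python
import re

# The string constant "\t" is compiled to one real tab character,
# so re.sub receives the tab itself rather than a backslash sequence.
replacement = "\t"

# Replace each comma (and any whitespace after it) with the tab.
result = re.sub(r",\s*", replacement, "alpha, beta,gamma")
print(repr(result))
```

Here the regex function never sees an escape sequence in the replacement text. The translation happened when the source code was compiled.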
Python also supports the above escape sequences in replacement text, in addition to supporting them in string constants. Python and Boost also support these more exotic non-printables: \a (bell, 0x07), \f (form feed, 0x0C) and \v (vertical tab, 0x0B).
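The following sketch shows Python's re.sub interpreting these escapes itself. Raw strings are used so that the backslashes reach the regex module unchanged. The placeholder markers such as <bell> are invented for the example.

```python
import re

# Raw strings keep the backslashes, so re.sub itself interprets the escapes
# as part of its replacement text syntax.
bell = re.sub(r"<bell>", r"\a", "x<bell>y")
form_feed = re.sub(r"<ff>", r"\f", "x<ff>y")
vertical_tab = re.sub(r"<vt>", r"\v", "x<vt>y")

# Each result contains the actual control character.
print(repr(bell), repr(form_feed), repr(vertical_tab))
```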
Boost also supports hexadecimal escapes. You can use \x{FFFF} to insert a Unicode character. The euro currency sign occupies Unicode code point U+20AC. If you cannot type it on your keyboard, you can insert it into the replacement text with \x{20AC}. For the 128 ASCII characters, you can use \x00 through \x7F. If you are using Boost with 8-bit character strings, you can also use \x80 through \xFF to insert characters from those 8-bit code pages.
Python does not support hexadecimal escapes in the replacement text syntax, even though it supports \xFF and \uFFFF in string constants.
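A short sketch of that difference in Python follows. The sample text is invented. The behavior of the failing case is an assumption about recent Python versions, which reject unknown letter escapes in the replacement text with re.error; older versions may instead leave the sequence in the output as literal text.

```python
import re

text = "price: EUR 5"

# Works: the Python compiler turns \u20ac into the euro sign
# before re.sub ever sees the replacement text.
from_literal = re.sub(r"EUR ", "\u20ac", text)

# Does not work: the raw string hands the backslash sequence to re.sub,
# whose replacement text syntax has no hexadecimal escape.
try:
    from_raw = re.sub(r"EUR ", r"\x{20AC}", text)
except re.error as exc:
    from_raw = None
    print("replacement rejected:", exc)

print(from_literal)
```

Either way, the raw-string attempt does not produce the euro sign. Only the string constant does.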
Regex Syntax versus String Syntax
Many programming languages support escapes for non-printable characters in their syntax for literal strings in source code. Such escapes are translated by the compiler into their actual characters before the string is passed to the search-and-replace function. If the search-and-replace function does not support the same escapes, this can cause an apparent difference in behavior when a regex is specified as a literal string in source code compared with a regex that is read from a file or received from user input. For example, JavaScript’s string.replace() function does not support any of these escapes. But the JavaScript language does support escapes like \n, \x0A, and \u000A in string literals. So when developing an application in JavaScript, \n is only interpreted as a newline when you add the replacement text as a string literal to your source code. The JavaScript interpreter then translates \n and the string.replace() function sees an actual newline character. If your code reads the same replacement text from a file, then the string.replace() function sees \n, which it treats as a literal backslash and a literal n.
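The same distinction can be reproduced in Python, which is used here only to keep the examples in one language. This is an analogy rather than JavaScript code: Python's str.replace, like JavaScript's string.replace(), does not interpret backslash escapes in the replacement. An io.StringIO object stands in for a real file, which is an assumption made for the sketch.

```python
import io

text = "a,b"

# Written as a string literal: the compiler turns \n into a real newline,
# so the replacement is a single character.
literal_replacement = "\n"

# As it would arrive from a file or user input: a backslash followed by n.
# io.StringIO is a stand-in for the file (hypothetical content).
file_like = io.StringIO("\\n")
file_replacement = file_like.read()

# str.replace inserts whatever characters it is given, without interpreting them.
from_literal = text.replace(",", literal_replacement)
from_file = text.replace(",", file_replacement)

print(repr(from_literal))
print(repr(from_file))
```

The first result contains a line break. The second contains a literal backslash and a literal n, because nothing translated the escape after the text was read.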