Replacement String Conditionals
Replacement string conditionals allow you to use one replacement when a particular capturing group participated in the match and another replacement when that capturing group did not participate in the match. They are supported by Boost and PCRE2. Boost and PCRE2 each invented their own syntax.
For conditionals to work in Boost, you need to pass regex_constants::format_all to regex_replace. For them to work in PCRE2, you need to pass PCRE2_SUBSTITUTE_EXTENDED to pcre2_substitute.
Boost Replacement String Conditionals
Boost’s syntax is (?1matched:unmatched) where 1 is a number between 1 and 99 referencing a numbered capturing group. matched is used as the replacement for matches in which the capturing group participated. unmatched is used for matches in which the group did not participate. The colon : delimits the two parts. If you want a literal colon in the matched part, then you need to escape it with a backslash. If you want a literal closing parenthesis anywhere in the conditional, then you need to escape that with a backslash too.
The parentheses delimit the conditional from the remainder of the replacement string. start(?1matched:unmatched)finish replaces with startmatchedfinish when the group participates and with startunmatchedfinish when it doesn’t. Boost allows you to omit the parentheses if nothing comes after the conditional in the replacement. So ?1matched:unmatched is the same as (?1matched:unmatched).
The matched and unmatched parts can be blank. You can omit the colon if the unmatched part is blank. So (?1matched:) and (?1matched) replace with matched when the group participates. They replace the match with nothing when the group does not participate.
You can use the full replacement string syntax in matched and unmatched. This means you can nest conditionals inside other conditionals. So (?1one(?2two):(?2two:none)) replaces with onetwo when both groups participate, with one or two when group 1 or 2 participates and the other doesn’t, and with none when neither group participates. With Boost ?1one(?2two):?2two:none does exactly the same but omits parentheses that aren’t needed.
If there are two digits after the question mark but not enough capturing groups for a two-digit conditional to be valid, then only the first digit is used for the conditional and the second digit is a literal. So when there are fewer than 12 capturing groups in the regex, (?12matched) replaces with 2matched when capturing group 1 participates in the match.
Boost treats conditionals that reference a non-existing group number as conditionals to a group that never participates in the match. So (?12twelve:not twelve) always replaces with not twelve when there are fewer than 12 capturing groups in the regex.
You can avoid the ambiguity between single-digit and double-digit conditionals by placing curly braces around the number. (?{1}1:0) replaces with 1 when group 1 participates and with 0 when it doesn’t, even if there are 11 or more capturing groups in the regex. (?{12}twelve:not twelve) is always a conditional that references group 12, even if there are fewer than 12 groups in the regex (which may make the conditional invalid).
The syntax with curly braces also allows you to reference named capturing groups by their names. (?{name}matched:unmatched) replaces with matched when the group “name” participates in the match and with unmatched when it doesn’t. Boost treats conditionals that reference a non-existing group name as literals. So (?{nonexisting}matched:unmatched) uses ?{nonexisting}matched:unmatched as a literal replacement.
PCRE2 Replacement String Conditionals
PCRE2’s syntax is ${1:+matched:unmatched} where 1 is a number between 1 and 99 referencing a numbered capturing group. If your regex contains named capturing groups then you can reference them in a conditional by their name: ${name:+matched:unmatched}.
matched is used as the replacement for matches in which the capturing group participated. unmatched is used for matches in which the group did not participate. :+ delimits the group number or name from the first part of the conditional. The second colon delimits the two parts. If you want a literal colon in the matched part, then you need to escape it with a backslash. If you want a literal closing curly brace anywhere in the conditional, then you need to escape that with a backslash too. Plus signs have no special meaning beyond the :+ that starts the conditional, so they don’t need to be escaped.
You can use the full replacement string syntax in matched and unmatched. This means you can nest conditionals inside other conditionals. So ${1:+one${2:+two}:${2:+two:none}} replaces with onetwo when both groups participate, with one or two when group 1 or 2 participates and the other doesn’t, and with none when neither group participates.
${1:-unmatched} and ${name:-unmatched} are shorthands for ${1:+${1}:unmatched} and ${name:+${name}:unmatched}. They insert the text captured by the group if it participated in the match. They insert unmatched if the group did not participate. When using this syntax, :- delimits the group number or name from the contents of the conditional. The conditional has only one part in which colons and minus signs have no special meaning.
PCRE2 treats conditionals that reference non-existing capturing groups as an error.
Escaping Question Marks, Colons, Parentheses, and Curly Braces
As explained above, you need to escape a colon with a backslash if you want to use it as a literal in the matched part of the conditional. You also need to escape literal closing parentheses (Boost) or curly braces (PCRE2) with backslashes inside conditionals.
In replacement string flavors that support conditionals, you can escape colons, parentheses, curly braces, and even question marks with backslashes to make sure they are interpreted as literals anywhere in the replacement string. But generally there is no need to.
The colon does not have any special meaning in the unmatched part or outside conditionals. So you don’t need to escape it there. The question mark does not have any special meaning if it is not followed by a digit or a curly brace. In PCRE2 it never has a special meaning. So you only need to escape question marks with backslashes if you want to use a literal question mark followed by a literal digit or curly brace as the replacement in Boost.
Boost always uses parentheses for grouping. An unescaped opening parenthesis always opens a group. Groups can be nested. An unescaped closing parenthesis always closes a group. An unescaped closing parenthesis that does not have a matching opening parenthesis effectively truncates the replacement string. So Boost requires you to always escape literal parentheses with backslashes.