Matching Whole Lines of Text

Often, you want to match complete lines in a text file rather than just the part of the line that satisfies a certain requirement. This is useful if you want to delete entire lines in a search-and-replace in a text editor, or collect entire lines in an information retrieval tool.

To keep this example simple, let’s say we want to match lines containing the word “John”. The regex John makes it easy enough to locate those lines. But the software will only indicate John as the match, not the entire line containing the word.

The solution is fairly simple. To specify that we need an entire line, we will use the caret and dollar sign and turn on the option to make them match at embedded newlines. In software aimed at working with text files, the anchors always match at embedded newlines. To match the parts of the line before and after the match of our original regular expression John, we simply use the dot and the star. Be sure to turn off the option for the dot to match newlines; otherwise .* can run across line breaks and the match will span several lines.

The resulting regex is: ^.*John.*$. You can use the same method to expand the match of any regular expression to an entire line, or a block of complete lines. In some cases, such as when using alternation, you will need to group the original regex together using parentheses.
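As a minimal sketch, assuming Python 3.9+ and its standard `re` module: `re.MULTILINE` plays the role of the "anchors match at embedded newlines" option, and because `re.DOTALL` is not set, the dot does not match newlines. The helper `lines_containing` and the sample text are hypothetical, made up for illustration. The helper wraps the original regex in a non-capturing group so that alternation inside it stays contained.

```python
import re

def lines_containing(pattern: str, text: str) -> list[str]:
    """Hypothetical helper: return every complete line that contains a match of pattern."""
    # (?:...) groups the original regex so alternation inside it cannot split ^.* from .*$.
    # re.MULTILINE lets ^ and $ match at embedded newlines; re.DOTALL is NOT set,
    # so .* stops at the end of the current line.
    whole_line = re.compile(r"^.*(?:" + pattern + r").*$", re.MULTILINE)
    return [m.group(0) for m in whole_line.finditer(text)]

sample = """Hello there
My name is John Smith
Nothing to see here
Ask John or Jane"""

print(re.findall(r"John", sample))        # ['John', 'John']: only the word itself
print(lines_containing(r"John", sample))  # ['My name is John Smith', 'Ask John or Jane']
```

Without the group, `^.*Jane|Nothing.*$` would be read as either `^.*Jane` or `Nothing.*$`, neither of which is guaranteed to span the whole line.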

Finding Lines Containing or Not Containing Certain Words

If a line can meet any one of a series of requirements, simply use alternation in the regular expression. ^.*\b(one|two|three)\b.*$ matches a complete line of text that contains any of the words “one”, “two” or “three”. The first backreference will contain the word the line actually contains. If it contains more than one of the words, then the last (rightmost) word will be captured into the first backreference. This is because the star is greedy: the first .* initially runs to the end of the line and only gives back characters until a word can be matched. If we make the first star lazy, like in ^.*?\b(one|two|three)\b.*$, then the backreference will contain the first (leftmost) word.
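A small sketch with Python's `re` module (an assumption; any backtracking engine behaves the same way here) that compares what the first capturing group holds for the greedy and the lazy versions. The sample lines are made up.

```python
import re

# The two regexes from the text; re.MULTILINE makes ^ and $ match at every line.
greedy = re.compile(r"^.*\b(one|two|three)\b.*$", re.MULTILINE)
lazy = re.compile(r"^.*?\b(one|two|three)\b.*$", re.MULTILINE)

line = "two plus one is three"

print(greedy.search(line).group(1))      # three: .* backtracks from the end, rightmost word wins
print(lazy.search(line).group(1))        # two: .*? expands from the start, leftmost word wins
print(greedy.search("someone arrived"))  # None: \b stops "one" matching inside "someone"
```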

If a line must satisfy all of multiple requirements, we need to use lookahead. ^(?=.*?\bone\b)(?=.*?\btwo\b)(?=.*?\bthree\b).*$ matches a complete line of text that contains all of the words “one”, “two” and “three”. Again, the anchors must match at the start and end of a line and the dot must not match line breaks. Because of the caret, and the fact that lookahead is zero-length, all three lookaheads are attempted at the start of each line. Each lookahead will match any piece of text on a single line (.*?) followed by one of the words. All three must match successfully for the entire regex to match. Note that instead of words like \bword\b, you can put any regular expression, no matter how complex, inside the lookahead. Finally, .*$ causes the regex to actually match the line, after the lookaheads have determined it meets the requirements.
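A minimal sketch in Python's `re` module, assuming made-up sample lines. Because every lookahead restarts from the beginning of the line, the order in which the words appear does not matter.

```python
import re

all_three = re.compile(
    r"^(?=.*?\bone\b)(?=.*?\btwo\b)(?=.*?\bthree\b).*$",
    re.MULTILINE,  # ^ and $ match at every line; the dot still stops at "\n"
)

text = """three two one
one and two only
two, three, and one more
one two threesome"""

# The pattern has no capturing groups, so findall returns whole matched lines.
print(all_three.findall(text))
# ['three two one', 'two, three, and one more']
```

The last line is rejected because `\bthree\b` cannot match inside "threesome".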

If your condition is that a line should not contain something, use negative lookahead. ^((?!regexp).)*$ matches a complete line that does not contain a match of regexp anywhere. Notice that unlike the positive lookaheads earlier, which each appear only once, I repeated the negative lookahead and the dot together as a group. A positive lookahead only needs to find one location where it can match. But the negative lookahead must be tested at each and every character position in the line. We must test that regexp fails everywhere, not just somewhere.
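A sketch in Python's `re` module, with the word "error" standing in for the text's placeholder regexp and a made-up log as input. It also shows why testing the negative lookahead only once at the start of the line is not enough.

```python
import re

# "error" stands in for the placeholder regexp from the text.
without_error = re.compile(r"^((?!error).)*$", re.MULTILINE)
# Negative lookahead tested only once, at the start of the line: not enough.
start_only = re.compile(r"^(?!error).*$", re.MULTILINE)

log = """all good
an error occurred
still fine
terror alert"""

# Use group(0): findall would return only the capturing group's last character.
print([m.group(0) for m in without_error.finditer(log)])
# ['all good', 'still fine']
print([m.group(0) for m in start_only.finditer(log)])
# all four lines, because no line starts with "error"
```

"terror alert" is rejected too, because the lookahead tests for the character sequence "error" without word boundaries.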

Finally, you can combine multiple positive and negative requirements as follows: ^(?=.*?\bmust-have\b)(?=.*?\bmandatory\b)((?!avoid|illegal).)*$. In the regex that checked only positive requirements, everything before the final .*$ was a zero-length assertion, so that .* was needed to make sure we actually matched the line. Since the negative requirement must be tested across the entire line anyway, it can simply take the place of that .*.
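Here is the combined regex applied with Python's `re` module to a few made-up lines. The two positive lookaheads run once at the start of each line. The repeated negative group then consumes the line character by character.

```python
import re

combined = re.compile(
    r"^(?=.*?\bmust-have\b)(?=.*?\bmandatory\b)((?!avoid|illegal).)*$",
    re.MULTILINE,
)

text = """must-have and mandatory
mandatory, must-have, but avoid this
mandatory only
the mandatory must-have list
must-have plus mandatory, nothing illegal"""

# group(0) is the whole line; the capturing group only holds its last character.
print([m.group(0) for m in combined.finditer(text)])
# ['must-have and mandatory', 'the mandatory must-have list']
```

As written, the negative test has no word boundaries, so a line containing "avoidance" would also be rejected. Writing `(?!\b(?:avoid|illegal)\b)` instead would exclude only the whole words.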