Posted by admin on February 28, 2025

Using Grep and Regular Expressions to Search for Text Patterns in Linux

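Introduction

The grep command is one of the most useful commands in a Linux terminal environment. The name grep stands for "global regular expression print". This means that you can use grep to check whether the input it receives matches a specified pattern. This seemingly trivial program is extremely powerful; its ability to filter input based on complex rules makes it a popular link in many command chains.

In this tutorial, you will explore the grep command's options, and then you'll dive into using regular expressions to do more advanced searching.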

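Prerequisites

To follow along with this guide, you will need access to a computer running a Linux-based operating system. This can either be a virtual private server which you've connected to with SSH or your local machine. Note that this tutorial was validated using a Linux server running Ubuntu 20.04, but the examples given should work on a computer running any version of any Linux distribution.

If you plan to use a remote server to follow this guide, we encourage you to first complete our Initial Server Setup guide. Doing so will set you up with a secure server environment (including a non-root user with sudo privileges and a firewall configured with UFW) which you can use to build your Linux skills.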

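Basic Usage

In this tutorial, you'll use grep to search the GNU General Public License version 3 for various words and phrases.

If you're on an Ubuntu system, you can find the file in the /usr/share/common-licenses folder. Copy it to your home directory (the command copies into the current directory, so run it from there):

    cp /usr/share/common-licenses/GPL-3 .

If you're on another system, use the curl command to download a copy:

    curl -o GPL-3 https://www.gnu.org/licenses/gpl-3.0.txt

You'll also use the BSD license file in this tutorial. On Ubuntu, you can copy it to your home directory with the following command:

    cp /usr/share/common-licenses/BSD .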

If you're on another system, create the file with the following command:

    cat << 'EOF' > BSD
    Copyright (c) The Regents of the University of California.
    All rights reserved.

    Redistribution and use in source and binary forms, with or without
    modification, are permitted provided that the following conditions
    are met:
    1. Redistributions of source code must retain the above copyright
       notice, this list of conditions and the following disclaimer.
    2. Redistributions in binary form must reproduce the above copyright
       notice, this list of conditions and the following disclaimer in the
       documentation and/or other materials provided with the distribution.
    3. Neither the name of the University nor the names of its contributors
       may be used to endorse or promote products derived from this software
       without specific prior written permission.

    THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
    ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
    IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
    ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
    FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
    DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
    OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
    HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
    LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
    OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
    SUCH DAMAGE.
    EOF

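Now that you have the files, you can start working with grep.

In its most basic form, you use grep to match literal patterns within a text file. This means that if you pass grep a word to search for, it will print out every line in the file containing that word.

Execute the following command to use grep to search for every line that contains the word GNU:

    grep "GNU" GPL-3

The first argument, GNU, is the pattern you're searching for, while the second argument, GPL-3, is the input file you wish to search.

The resulting output will be every line containing the pattern text:

Output
    GNU GENERAL PUBLIC LICENSE
    The GNU General Public License is a free, copyleft license for
    the GNU General Public License is intended to guarantee your freedom to
    GNU General Public License for most of our software; it applies also to
    Developers that use the GNU GPL protect your rights with two steps:
    "This License" refers to version 3 of the GNU General Public License.
    13. Use with the GNU Affero General Public License.
    under version 3 of the GNU Affero General Public License into a single
    ...
    ...

On some systems, the pattern you searched for will be highlighted in the output.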
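If you would like to try a literal search without the license file, here is a minimal sketch. The temporary file and its three lines are invented purely for illustration.

```shell
# Build a small throwaway file; its contents are made up for this example.
sample=$(mktemp)
printf '%s\n' 'GNU is not Unix' 'Linux is a kernel' 'The GNU project' > "$sample"

# Print every line that contains the literal string GNU.
matches=$(grep "GNU" "$sample")
printf '%s\n' "$matches"

rm -f "$sample"
```

Only the first and third lines contain GNU, so only those two are printed.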

Common Options

By default, grep will search for the exact specified pattern within the input file and return the lines it finds. You can make this behavior more useful by adding some optional flags to grep.

If you want grep to ignore the "case" of your search parameter and search for both upper- and lower-case variations, you can specify the -i or --ignore-case option.

Search for each instance of the word license (with upper, lower, or mixed cases) in the same file as before with the following command:

    grep -i "license" GPL-3

The results contain LICENSE, license, and License:

Output
    GNU GENERAL PUBLIC LICENSE
    of this license document, but changing it is not allowed.
    The GNU General Public License is a free, copyleft license for
    The licenses for most software and other practical works are designed
    the GNU General Public License is intended to guarantee your freedom to
    GNU General Public License for most of our software; it applies also to
    price. Our General Public Licenses are designed to make sure that you
    (1) assert copyright on the software, and (2) offer you this License
    "This License" refers to version 3 of the GNU General Public License.
    "The Program" refers to any copyrightable work licensed under this
    ...
    ...

If there was an instance with LiCeNsE, that would have been returned as well.
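The effect of -i can be sketched on a small invented file. This example also uses the -c option, which this tutorial does not otherwise cover: it prints the number of matching lines instead of the lines themselves.

```shell
# Sample lines are invented for illustration.
sample=$(mktemp)
printf '%s\n' 'LICENSE' 'license' 'License' 'LiCeNsE' 'permit' > "$sample"

# -c counts matching lines rather than printing them.
exact=$(grep -c "license" "$sample")      # case-sensitive search
any_case=$(grep -ci "license" "$sample")  # case-insensitive search
echo "case-sensitive: $exact, case-insensitive: $any_case"

rm -f "$sample"
```

The case-sensitive search finds one line, while the -i search finds all four spellings.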

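If you want to find all lines that do not contain a specified pattern, you can use the -v or --invert-match option.

Search for every line that does not contain the word the in the BSD license with the following command:

    grep -v "the" BSD

You'll receive this output:

Output
    All rights reserved.

    Redistribution and use in source and binary forms, with or without
    are met:
       may be used to endorse or promote products derived from this software
       without specific prior written permission.

    THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
    ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
    ...
    ...

Since you did not specify the ignore-case option, the last two lines shown were returned even though they contain THE: the search is case-sensitive, so the uppercase THE does not count as a match for the lowercase pattern the.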

It is often useful to know the line number that the matches occur on. You can do this by using the -n or --line-number option. Re-run the previous example with this flag added:

    grep -vn "the" BSD

This will return the following text:

Output
    2:All rights reserved.
    3:
    4:Redistribution and use in source and binary forms, with or without
    6:are met:
    13:   may be used to endorse or promote products derived from this software
    14:   without specific prior written permission.
    15:
    16:THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
    17:ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
    ...
    ...

Now you can reference the line number if you want to make changes to every line that does not contain the. This is especially handy when working with source code.
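As a rough sketch of how -v, -n, and -i combine, the following example runs an inverted, numbered search on four invented lines, first case-sensitively and then with -i added.

```shell
# Sample lines are invented for illustration.
sample=$(mktemp)
printf '%s\n' 'read the manual' 'All rights reserved.' 'THE END' 'see other notes' > "$sample"

# -v inverts the match; -n prefixes each printed line with its line number.
without_the=$(grep -vn "the" "$sample")
# Adding -i makes the uppercase THE count as a match, so that line is excluded too.
without_the_any_case=$(grep -vin "the" "$sample")

printf '%s\n' "$without_the"
printf '%s\n' "$without_the_any_case"

rm -f "$sample"
```

Line 4 is excluded in both runs because other contains the string the. Line 3 is excluded only once -i is added.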

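Regular Expressions

In the introduction, you learned that grep stands for "global regular expression print". A "regular expression" is a text string that describes a particular search pattern.

Different applications and programming languages implement regular expressions slightly differently. In this tutorial you will only be exploring a small subset of the way that grep describes its patterns.

Literal Matches

In the previous examples in this tutorial, when you searched for the words GNU and the, you were actually searching for basic regular expressions which matched the exact strings of characters GNU and the. Patterns that exactly specify the characters to be matched are called "literals" because they match the pattern literally, character for character.

It is helpful to think of these as matching a string of characters rather than matching a word. This will become a more important distinction as you learn more complex patterns.

All alphabetical and numerical characters (as well as certain other characters) are matched literally unless modified by other expression mechanisms.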
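The difference between matching a string and matching a word can be sketched as follows. The sample lines are invented, and the -w option (match whole words only) is a standard grep flag that this tutorial does not otherwise cover.

```shell
# Sample lines are invented for illustration.
sample=$(mktemp)
printf '%s\n' 'the cat' 'another cat' 'bathe daily' > "$sample"

# A literal pattern matches the string wherever it occurs, even inside longer words.
as_string=$(grep -c "the" "$sample")
# -w restricts matches to whole words.
as_word=$(grep -cw "the" "$sample")
echo "string matches: $as_string, whole-word matches: $as_word"

rm -f "$sample"
```

All three lines contain the string the (inside another and bathe as well), but only the first line contains it as a separate word.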

Anchor Matches

Anchors are special characters that specify where in the line a match must occur to be valid.

For instance, using anchors, you can specify that you only want to see the lines where GNU appears at the very beginning of the line. To do this, you use the ^ anchor before the literal string.

Run the following command to search the GPL-3 file and find lines where GNU occurs at the very beginning of a line:

    grep "^GNU" GPL-3

This command will return the following two lines:

Output
    GNU General Public License for most of our software; it applies also to
    GNU General Public License, you may choose any version ever published

Similarly, you use the $ anchor at the end of a pattern to indicate that the match will only be valid if it occurs at the very end of a line.

This command will match every line ending with the word and in the GPL-3 file:

    grep "and$" GPL-3

You'll receive this output:

Output
    that there is no warranty for this free software. For both users' and
    The precise terms and conditions for copying, distribution and
    License. Each licensee is addressed as "you". "Licensees" and
    receive it, in any medium, provided that you conspicuously and
    alternative is allowed only occasionally and noncommercially, and
    network may be denied when the modification itself materially and
    adversely affects the operation of the network or violates the rules and
    provisionally, unless and until the copyright holder explicitly and
    receives a license from the original licensors, to run, modify and
    make, use, sell, offer for sale, import and otherwise run, modify and
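Both anchors can be tried on a few invented lines. This sketch uses single quotes around the patterns so the shell never attempts to interpret the $ character.

```shell
# Sample lines are invented for illustration.
sample=$(mktemp)
printf '%s\n' 'GNU tools' 'the GNU project' 'cats and' 'and dogs' > "$sample"

# ^ anchors the match to the start of the line.
at_start=$(grep '^GNU' "$sample")
# $ anchors the match to the end of the line.
at_end=$(grep 'and$' "$sample")

echo "starts with GNU: $at_start"
echo "ends with and:   $at_end"

rm -f "$sample"
```

Lines that contain GNU or and somewhere else (the GNU project, and dogs) are not returned.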

Matching Any Character

The period character (.) is used in regular expressions to mean that any single character can exist at the specified location.

For example, to match anything in the GPL-3 file that has two characters and then the string cept, you would use the following pattern:

    grep "..cept" GPL-3

This command returns the following output:

Output
    use, which is precisely where it is most unacceptable. Therefore, we
    infringement under applicable copyright law, except executing it on a
    tells the user that there is no warranty for the work (except to the
    License by making exceptions from one or more of its conditions.
    form of a separately written license, or stated as exceptions;
    You may not propagate or modify a covered work except as expressly
    9. Acceptance Not Required for Having Copies.
    ...
    ...

This output has instances of both accept and except and variations of the two words. The pattern would also have matched z2cept if that string had been in the file.
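A small sketch on invented words shows that each period must consume exactly one character, whatever that character is.

```shell
# Sample words are invented for illustration.
sample=$(mktemp)
printf '%s\n' 'accept' 'except' 'z2cept' 'concept' 'cept' 'xcept' > "$sample"

# Two arbitrary characters followed by the literal string cept.
matches=$(grep '..cept' "$sample")
printf '%s\n' "$matches"

rm -f "$sample"
```

The words cept and xcept are not returned because fewer than two characters precede cept.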

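Bracket Expressions

By placing a group of characters within brackets ([ and ]), you can specify that the character at that position can be any one character found within the bracket group.

For example, to find the lines that contain too or two, you would specify those variations succinctly by using the following pattern:

    grep "t[wo]o" GPL-3

The output shows that both variations exist in the file:

Output
    your programs, too.
    freedoms that you received. You must make sure that they, too, receive
    Developers that use the GNU GPL protect your rights with two steps:
    a computer network, with no transfer of a copy, is not conveying.
    System Libraries, or general-purpose tools or generally available free
    Corresponding Source from a network server at no charge.
    ...
    ...

Bracket notation gives you some interesting options. You can have the pattern match anything except the characters within a bracket by beginning the list of characters within the brackets with a ^ character.

This example is like the pattern .ode, but will not match the pattern code:

    grep "[^c]ode" GPL-3

Here's the output you'll receive:

Output
    1. Source Code.
    model, to give anyone who possesses the object code either (1) a
    the only significant mode of use of the product.
    notice like this when it starts in an interactive mode:

Notice that the second line returned does, in fact, contain the word code. This is not a failure of the regular expression or grep. Rather, this line was returned because earlier in the line the pattern matched mode, found within the word model. The line was returned because it contains at least one instance that matches the pattern.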

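Another helpful feature of brackets is that you can specify a range of characters instead of individually typing every available character.

This means that if you want to find every line that begins with a capital letter, you can use the following pattern:

    grep "^[A-Z]" GPL-3

Here's the output this expression returns:

Output
    GNU General Public License for most of our software; it applies also to
    States should not allow patents to restrict development and use of
    License. Each licensee is addressed as "you". "Licensees" and
    Component, and (b) serves only to enable use of the work with that
    Major Component, or to implement a Standard Interface for which an
    System Libraries, or general-purpose tools or generally available free
    Source.
    User Product is transferred to the recipient in perpetuity or for a
    ...
    ...

Due to some legacy sorting issues, it is often more accurate to use POSIX character classes instead of character ranges like the one you just used. In some locales, a range such as [A-Z] can also match lowercase letters.

Discussing every POSIX character class is beyond the scope of this guide, but an example that accomplishes the same thing as the previous example uses the [:upper:] character class within a bracket selector:

    grep "^[[:upper:]]" GPL-3

The output will be the same as before.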
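The three bracket forms covered in this section (a character set, a negated set, and a POSIX class) can be compared side by side. This is only a sketch, and the sample words are invented for illustration.

```shell
# Sample words are invented for illustration.
sample=$(mktemp)
printf '%s\n' 'too' 'two' 'tao' 'code' 'mode' 'Node' > "$sample"

# A set: the middle character must be w or o.
either=$(grep 't[wo]o' "$sample")
# A negated set: any character except c, followed by ode.
not_c=$(grep '[^c]ode' "$sample")
# A POSIX class inside brackets: the line must start with an uppercase letter.
upper=$(grep '^[[:upper:]]' "$sample")

printf '%s\n' "$either" "$not_c" "$upper"

rm -f "$sample"
```

Note the doubled brackets in the last pattern: the outer pair is the bracket expression and the inner [:upper:] is the class name.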

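Repeat Pattern Zero or More Times

Finally, one of the most commonly used metacharacters is the asterisk, or *, which means "repeat the previous character or expression zero or more times".

To find each line in the GPL-3 file that contains an opening and a closing parenthesis with only letters and spaces (or nothing at all) in between, use the following expression:

    grep "([A-Za-z ]*)" GPL-3

You'll get the following output:

Output
    Copyright (C) 2007 Free Software Foundation, Inc.
    distribution (with or without modification), making available to the
    than the work as a whole, that (a) is included in the normal form of
    Component, and (b) serves only to enable use of the work with that
    (if any) on which the executable work runs, or a compiler used to
    (including a physical distribution medium), accompanied by the
    (including a physical distribution medium), accompanied by a
    place (gratis or for a charge), and offer equivalent access to the
    ...
    ...

So far you've used periods, asterisks, and other characters in your expressions, but sometimes you need to search for those characters specifically.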
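To see what "zero or more" means in practice, here is a minimal sketch that runs the same parenthesis pattern on a few invented lines.

```shell
# Sample lines are invented for illustration.
sample=$(mktemp)
printf '%s\n' 'call f()' 'see (a b) here' 'value (x2) set' 'no parens' > "$sample"

# In a basic regular expression, ( and ) are literal characters.
# [A-Za-z ]* allows any number of letters or spaces between them, including none.
matches=$(grep '([A-Za-z ]*)' "$sample")
printf '%s\n' "$matches"

rm -f "$sample"
```

The empty pair () matches because * permits zero repetitions, while (x2) does not match because the digit 2 is outside the bracket set.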

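Escaping Meta-Characters

There are times when you'll need to search for a literal period or a literal opening bracket, especially when working with source code or configuration files. Because these characters have special meaning in regular expressions, you need to "escape" them to tell grep that you do not wish to use their special meaning in this case.

You escape characters by using the backslash character (\) in front of the character that would normally have a special meaning.

For instance, to find any line that begins with a capital letter and ends with a period, use the following expression, which escapes the ending period so that it represents a literal period instead of the usual "any character" meaning:

    grep "^[A-Z].*\.$" GPL-3

This is the output you'll see:

Output
    Source.
    License by making exceptions from one or more of its conditions.
    License would be to refrain entirely from conveying the Program.
    ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
    SUCH DAMAGES.
    Also add information on how to contact you by electronic and paper mail.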
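The following sketch contrasts an unescaped and an escaped period on two invented lines. It also shows the -F option (treat the pattern as a fixed string), a standard grep flag that this tutorial does not otherwise cover and that can serve as an alternative to escaping.

```shell
# Sample lines are invented for illustration.
sample=$(mktemp)
printf '%s\n' 'version 1.5' 'version 125' > "$sample"

# Unescaped, the period matches any single character, so 125 matches too.
unescaped=$(grep -c '1.5' "$sample")
# Escaped, the period matches only a literal period.
escaped=$(grep -c '1\.5' "$sample")
# -F disables regular expression syntax entirely for this pattern.
fixed=$(grep -cF '1.5' "$sample")

echo "unescaped: $unescaped, escaped: $escaped, fixed string: $fixed"

rm -f "$sample"
```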

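Now let's look at other regular expression options.

Extended Regular Expressions

The grep command supports a more extensive regular expression language by using the -E flag or by calling the egrep command instead of grep. Recent GNU grep releases treat egrep as obsolescent and print a warning when it is used, so grep -E is the safer choice.

These options open up the capabilities of "extended regular expressions". Extended regular expressions include all of the basic meta-characters, along with additional meta-characters to express more complex matches.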

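Grouping

One of the most useful abilities that extended regular expressions open up is the ability to group expressions together to manipulate or reference as one unit.

To group expressions together, wrap them in parentheses. If you would like to use parentheses for grouping without using extended regular expressions, you can escape them with the backslash to enable this functionality. This means that the following three expressions are functionally equivalent:

    grep "\(grouping\)" file.txt
    grep -E "(grouping)" file.txt
    egrep "(grouping)" file.txt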
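As a quick check of that equivalence, this sketch runs the basic and extended forms against the same invented lines and compares the results. It also runs the unescaped parentheses without -E to show that they are then treated as literal characters.

```shell
# Sample lines are invented for illustration.
sample=$(mktemp)
printf '%s\n' 'grouping works' 'no match here' '(grouping)' > "$sample"

# Basic syntax: escaped parentheses form a group.
bre=$(grep '\(grouping\)' "$sample")
# Extended syntax: plain parentheses form a group.
ere=$(grep -E '(grouping)' "$sample")
# Basic syntax with unescaped parentheses: they must appear literally in the line.
literal_parens=$(grep -c '(grouping)' "$sample")

[ "$bre" = "$ere" ] && echo "basic and extended forms returned the same lines"
echo "lines with literal parentheses: $literal_parens"

rm -f "$sample"
```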

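Alternation

Similar to how bracket expressions can specify different possible choices for single character matches, alternation allows you to specify alternative matches for strings or expression sets.

To indicate alternation, use the pipe character |. These are often used within parenthetical grouping to specify that one of two or more possibilities should be considered a match.

The following will find either GPL or General Public License in the text:

    grep -E "(GPL|General Public License)" GPL-3

The output looks like this:

Output
    The GNU General Public License is a free, copyleft license for
    the GNU General Public License is intended to guarantee your freedom to
    GNU General Public License for most of our software; it applies also to
    price. Our General Public Licenses are designed to make sure that you
    Developers that use the GNU GPL protect your rights with two steps:
    For the developers' and authors' protection, the GPL clearly explains
    authors' sake, the GPL requires that modified versions be marked as
    have designed this version of the GPL to prohibit the practice for those
    ...
    ...

Alternation can select between more than two choices by adding additional choices within the selection group, separated by additional pipe (|) characters.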
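Here is a minimal sketch of a three-way alternation. The third alternative (MIT) and the sample lines are invented for illustration.

```shell
# Sample lines are invented for illustration.
sample=$(mktemp)
printf '%s\n' 'the GPL' 'General Public License' 'MIT License' 'BSD License' > "$sample"

# A line matches if it contains any one of the three alternatives.
matches=$(grep -E '(GPL|General Public License|MIT)' "$sample")
printf '%s\n' "$matches"

rm -f "$sample"
```

The BSD License line is the only one that contains none of the alternatives, so it is not printed.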

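Quantifiers

Like the * meta-character, which matches the previous character or character set zero or more times, there are other meta-characters available in extended regular expressions that specify the number of occurrences.

To match a character zero or one times, you can use the ? character. This makes the character or character set that came before it optional, in essence.

The following matches copyright and right by putting copy in an optional group:

    grep -E "(copy)?right" GPL-3

You'll receive this output:

Output
    Copyright (C) 2007 Free Software Foundation, Inc.
    To protect your rights, we need to prevent others from denying you
    these rights or asking you to surrender the rights. Therefore, you have
    know their rights.
    Developers that use the GNU GPL protect your rights with two steps:
    (1) assert copyright on the software, and (2) offer you this License
    "Copyright" also means copyright-like laws that apply to other kinds of
    ...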
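To see exactly which text the optional group matches, the sketch below uses the -o option, which prints only the matched part of each line. This flag is supported by GNU and BSD grep but is not covered elsewhere in this tutorial, and the sample words are invented.

```shell
# Sample words are invented for illustration.
sample=$(mktemp)
printf '%s\n' 'copyright' 'right' 'rights' 'copy' 'bright' > "$sample"

# -o prints only the part of each line that matched the pattern.
only_matched=$(grep -oE '(copy)?right' "$sample")
printf '%s\n' "$only_matched"

rm -f "$sample"
```

The optional copy is included when it is present, and the line containing just copy produces no match because right is still required.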

The + character matches an expression one or more times. This is almost like the * meta-character, but with the + character, the expression must match at least once.

The following expression matches the string free plus one or more characters that are not white space characters:

    grep -E "free[^[:space:]]+" GPL-3

You'll see this output:

Output
    The GNU General Public License is a free, copyleft license for
    to take away your freedom to share and change the works. By contrast,
    the GNU General Public License is intended to guarantee your freedom to
    When we speak of free software, we are referring to freedom, not
    have the freedom to distribute copies of free software (and charge for
    you modify it: responsibilities to respect the freedom of others.
    freedoms that you received. You must make sure that they, too, receive
    protecting users' freedom to change the software. The systematic
    of the GPL, as needed to protect the freedom of users.
    patents cannot be used to render the program non-free.
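The difference between + and * can be sketched by counting matching lines for each quantifier on the same invented input.

```shell
# Sample lines are invented for illustration.
sample=$(mktemp)
printf '%s\n' 'free' 'freedom' 'free software' 'non-free.' > "$sample"

# + requires at least one non-space character directly after free.
one_or_more=$(grep -cE 'free[^[:space:]]+' "$sample")
# * also accepts zero such characters, so every line containing free matches.
zero_or_more=$(grep -cE 'free[^[:space:]]*' "$sample")

echo "with +: $one_or_more, with *: $zero_or_more"

rm -f "$sample"
```

With +, only freedom and non-free. match, because in the other two lines free is followed by the end of the line or by a space.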

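Specifying Match Repetition

To specify the number of times that a match is repeated, use the brace characters ({ and }). These characters let you specify an exact number, a range, or an upper or lower bound on the number of times that an expression can match.

Use the following expression to find all of the lines in the GPL-3 file that contain triple vowels:

    grep -E "[AEIOUaeiou]{3}" GPL-3

Each line returned has a word with three consecutive vowels:

Output
    changed, so that their problems will not be attributed erroneously to
    authors of previous versions.
    receive it, in any medium, provided that you conspicuously and
    give under the previous paragraph, plus a right to possession of the
    covered work so as to satisfy simultaneously your obligations under this

To match any words that have between 16 and 20 characters, use the following expression:

    grep -E "[[:alpha:]]{16,20}" GPL-3

Here's this command's output:

Output
    certain responsibilities if you distribute copies of the software, or if
    you modify it: responsibilities to respect the freedom of others.
    c) Prohibiting misrepresentation of the origin of that material, or

Only lines containing words within that length are displayed. Because the pattern has no word boundaries, a word longer than 20 letters would also match, since it contains a run of 16 letters.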
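The upper bound in {16,20} only limits how much of a run is consumed, not how long the surrounding word may be. The sketch below illustrates this on invented input and uses the -w option (whole-word matching, a standard grep flag not covered in this tutorial) as one possible way to enforce the upper limit.

```shell
# Sample lines are invented for illustration: words of 16, 17, and 28 letters,
# plus a line of short words.
sample=$(mktemp)
printf '%s\n' 'responsibilities' 'misrepresentation' 'short words only' \
  'antidisestablishmentarianism' > "$sample"

# Without word boundaries, the 28-letter word matches too,
# because it contains a run of 16 letters.
unbounded=$(grep -cE '[[:alpha:]]{16,20}' "$sample")
# With -w, the match must be a whole word, so the 28-letter word is rejected.
bounded=$(grep -cwE '[[:alpha:]]{16,20}' "$sample")

echo "without -w: $unbounded, with -w: $bounded"

rm -f "$sample"
```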

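Conclusion

grep is useful for finding patterns within files or within the file system hierarchy, so it's worth spending time getting comfortable with its options and syntax.

Regular expressions are even more versatile, and can be used with many popular programs. For instance, many text editors implement regular expressions for searching and replacing text.

Furthermore, most modern programming languages use regular expressions to perform procedures on specific pieces of data. Once you understand regular expressions, you'll be able to transfer that knowledge to many common computer-related tasks, from performing advanced searches in your text editor to validating user input.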

©2015-2025 艾丽卡 support@alaica.com