What Is grep?
Grep is a tool that originated in the UNIX world in the 1970s. It can search through files and folders (directories in UNIX) and check which lines in those files match a given regular expression. Grep will output the filenames and the line numbers, or the actual lines, that matched the regular expression. All in all, it is a very useful tool for locating information stored anywhere on your computer, even (or especially) if you do not really know where to look.
Using grep
If you type grep regex *.txt, grep will search through every file in the current folder whose name ends in .txt. It will apply the regex to each line in those files, and print (i.e. display) each line on which a match was found. This means that grep is inherently line-based: a regex match cannot span multiple lines.
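As a minimal sketch of the line-based behavior described above, the following creates two throwaway files and searches them. The file names notes.txt and todo.txt and their contents are invented for illustration. The -n, -l and -q options (print line numbers, print only file names, print nothing and just report success) are standard grep options.

```shell
# Work in a scratch directory so no real files are touched.
dir=$(mktemp -d)

# Two hypothetical text files, invented for this example.
printf 'first line\nerror: disk full\nlast line\n' > "$dir/notes.txt"
printf 'buy milk\nfix error handling\n' > "$dir/todo.txt"

# Print each matching line, prefixed with its file name and line number.
matches=$(cd "$dir" && grep -n 'error' *.txt)
printf '%s\n' "$matches"

# Print only the names of the files that contain at least one match.
files=$(cd "$dir" && grep -l 'error' *.txt)
printf '%s\n' "$files"

# A match cannot span a line break: "line" and "error" sit on different
# lines of notes.txt, so this regex finds nothing.
if grep -q 'line.*error' "$dir/notes.txt"; then
  span_found=yes
else
  span_found=no
fi
printf 'match across lines: %s\n' "$span_found"
```

When more than one file is searched, grep prefixes each printed line with the name of the file it came from, which is why the first command shows the file name before the line number.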
If you like to work on the command line, the traditional grep tool will make a lot of tasks easier. All Linux distributions (except tiny floppy-based ones) install a version of grep by default, usually GNU grep. If you are using Microsoft Windows, you will need to download and install it separately. If you use Borland development tools, you already have Borland’s Turbo GREP installed.
grep works not only with files named on the command line, but also with anything you supply on standard input. When used with standard input, grep will print all lines it reads from standard input that match the regex. For example, the find command walks the current directory and its subdirectories and prints every file name it finds, so find | grep regex will print only the file names that match regex.
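The pipeline above can be tried out safely in a scratch directory. This is only a sketch: the directory layout and file names (src/main.c, src/util.c, docs/readme.txt) are made up for the example, and sort is added purely to make the output order predictable.

```shell
# Build a small, hypothetical directory tree in a scratch location.
dir=$(mktemp -d)
mkdir -p "$dir/src" "$dir/docs"
touch "$dir/src/main.c" "$dir/src/util.c" "$dir/docs/readme.txt"

# find prints one path per line on standard output; grep reads those
# lines from standard input and keeps only the paths ending in ".c".
c_files=$(cd "$dir" && find . | grep '\.c$' | sort)
printf '%s\n' "$c_files"
```

Note that the regex is applied to the whole path that find prints, not just to the file name, so a pattern can also match on directory names.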
Grep’s Regex Engine
Most versions of grep use a regex-directed engine, like the regex flavors discussed in the regex tutorial on this website. However, grep’s regex flavor is very limited. On POSIX systems, it uses POSIX Basic Regular Expressions.
An enhanced version of grep is called egrep, which is equivalent to running grep -E. It uses a text-directed engine. Neither grep nor egrep supports any of the special features such as lazy repetition or lookaround, and both only indicate whether a match was found on a particular line or not. For these reasons the distinction between the engines does not matter, except that the text-directed engine is faster. On POSIX systems, egrep uses POSIX Extended Regular Expressions. Despite the name “extended”, egrep is almost the same as grep. It just uses a slightly different regex syntax and adds support for alternation, but loses support for backreferences.
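The syntax difference between the two flavors can be sketched as follows. The sample words are invented for illustration, and the example uses grep -E, the option that POSIX specifies for extended syntax, in place of the egrep command.

```shell
# Hypothetical sample input, one word per line.
words=$(printf 'cat\ncaat\ndog\ncaaat\n')

# Basic syntax (grep): the braces of an interval must be escaped.
bre=$(printf '%s\n' "$words" | grep 'ca\{2\}t')
printf 'basic interval:    %s\n' "$bre"

# Extended syntax (grep -E): the same interval with unescaped braces.
ere=$(printf '%s\n' "$words" | grep -E 'ca{2}t')
printf 'extended interval: %s\n' "$ere"

# Extended syntax also offers alternation with an unescaped |.
alt=$(printf '%s\n' "$words" | grep -E 'cat|dog')
printf '%s\n' "$alt"
```

Both interval searches select the same single line, which illustrates that the two flavors mostly differ in which characters need a backslash.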
GNU grep, the most popular version of grep on Linux, uses both a text-directed and a regex-directed engine. If you use backreferences, it uses the regex-directed engine. Otherwise, it uses the faster text-directed engine. Again, for the tasks that grep is designed for, this does not matter to you, the user. If you type the “grep” command, you’ll use the GNU Basic Regular Expressions syntax. If you type the “egrep” command, you’ll use the GNU Extended Regular Expressions syntax. The GNU versions of grep and egrep have exactly the same capabilities, including alternation for grep and backreferences for egrep. They differ only in syntax: operators such as alternation and grouping are written with a backslash in the basic syntax and without one in the extended syntax.
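A short sketch of that equivalence, assuming GNU grep is the grep on your path. The \| alternation in basic syntax and the backreference in extended syntax are GNU extensions, so other grep implementations may reject them. The sample words are invented, and grep -E stands in for egrep.

```shell
# Hypothetical sample input, one word per line.
words=$(printf 'cat\ndog\nbird\nhaha\nhoho\nhahi\n')

# GNU basic syntax: alternation is written with a backslash.
bre_alt=$(printf '%s\n' "$words" | grep 'cat\|dog')
printf '%s\n' "$bre_alt"

# GNU extended syntax: a backreference to an unescaped group.
# (h[ao])\1 matches "ha" or "ho" immediately repeated.
ere_backref=$(printf '%s\n' "$words" | grep -E '(h[ao])\1')
printf '%s\n' "$ere_backref"

# The same backreference in basic syntax, with escaped parentheses.
bre_backref=$(printf '%s\n' "$words" | grep '\(h[ao]\)\1')
printf '%s\n' "$bre_backref"
```

The line "hahi" is not selected by either backreference search, because \1 must repeat the exact text the group matched ("ha"), not merely match the group's pattern again.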
Beyond The Command Line
If you like to work on the command line, then the traditional grep tool is for you. But if you prefer a graphical user interface, there are many grep-like tools available for Windows and other platforms. Simply search for “grep” on your favorite software download site. Unfortunately, many grep tools come with poor documentation, leaving it up to you to figure out exactly which regex flavor they use. A tool that claims to be Perl-compatible is not necessarily so. Some are almost perfectly compatible, though never identical, while others fail miserably when you want to use advanced and very useful constructs like lookaround.