The grep command is one of the most useful commands in a Linux terminal environment. The name grep stands for “global regular expression print”. This means that you can use grep to check whether the input it receives matches a specified pattern. This seemingly trivial program is extremely powerful; its ability to filter input based on complex rules makes it a popular link in many command chains.
In this tutorial, you will explore the grep command’s options, and then you’ll dive into using regular expressions to do more advanced searching.
To follow along with this guide, you will need access to a computer running a Linux-based operating system. This can either be a virtual private server which you’ve connected to with SSH or your local machine. Note that this tutorial was validated using a Linux server running Ubuntu 20.04, but the examples given should work on a computer running any version of any Linux distribution.
If you plan to use a remote server to follow this guide, we encourage you to first complete our Initial Server Setup guide. Doing so will set you up with a secure server environment (including a non-root user with sudo privileges and a firewall configured with UFW) which you can use to build your Linux skills.
In this tutorial, you’ll use grep to search the GNU General Public License version 3 for various words and phrases.
If you’re on an Ubuntu system, you can find the file in the /usr/share/common-licenses folder. Copy it to your home directory:
If you’re on another system, use the curl command to download a copy:
You’ll also use the BSD license file in this tutorial. On Ubuntu, you can copy that to your home directory with the following command:
If you’re on another system, create the file with the following command:
Now that you have the files, you can start working with grep.
In the most basic form, you use grep to match literal patterns within a text file. This means that if you pass grep a word to search for, it will print out every line in the file containing that word.
Execute the following command to use grep to search for every line that contains the word GNU:
The first argument, GNU, is the pattern you’re searching for, while the second argument, GPL-3, is the input file you wish to search.
The resulting output will be every line containing the pattern text:
On some systems, the pattern you searched for will be highlighted in the output.
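As a self-contained sketch of this kind of literal search, the example below assumes a short stand-in file named gpl-sample.txt (a hypothetical name with made-up contents) in place of the full license text:

```shell
# Stand-in text (hypothetical, far shorter than the real license file).
printf '%s\n' \
  'GNU GENERAL PUBLIC LICENSE' \
  'Version 3, 29 June 2007' \
  'The GNU General Public License is a free, copyleft license' \
  'for software and other kinds of works.' > gpl-sample.txt

# First argument: the pattern. Second argument: the file to search.
grep "GNU" gpl-sample.txt
# prints the first and third lines, the only ones containing "GNU"
```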
By default, grep will search for the exact specified pattern within the input file and return the lines it finds. You can make this behavior more useful, though, by adding some optional flags to grep.
If you want grep to ignore the “case” of your search parameter and search for both upper- and lower-case variations, you can specify the -i or --ignore-case option.
Search for each instance of the word license (with upper, lower, or mixed cases) in the same file as before with the following command:
The results contain LICENSE, license, and License:
If there were an instance with LiCeNsE, that would have been returned as well.
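To see the effect of -i in isolation, here is a minimal sketch that assumes a four-line stand-in file called license-sample.txt (hypothetical name and contents):

```shell
# Stand-in text with three different capitalizations of the word.
printf '%s\n' \
  'GNU GENERAL PUBLIC LICENSE' \
  'a free, copyleft license' \
  'See the License text' \
  'no match here' > license-sample.txt

# Case-sensitive search: only the all-lowercase spelling matches.
grep "license" license-sample.txt

# Case-insensitive search: LICENSE, license, and License all match.
grep -i "license" license-sample.txt
```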
If you want to find all lines that do not contain a specified pattern, you can use the -v or --invert-match option.
Search for every line that does not contain the word the in the BSD license with the following command:
You’ll receive this output:
Since you did not specify the “ignore case” option, the last two items were returned as not having the word the.
It is often useful to know the line number that the matches occur on. You can do this by using the -n or --line-number option. Re-run the previous example with this flag added:
This will return the following text:
Now you can reference the line number if you want to make changes to every line that does not contain the. This is especially handy when working with source code.
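The two flags combine naturally. The sketch below assumes a small stand-in file, bsd-sample.txt, whose contents are invented for illustration and are not the real BSD license:

```shell
# Stand-in text: line 2 contains lowercase "the"; line 4 only has "The".
printf '%s\n' \
  'Redistribution and use in source and binary forms are permitted' \
  'provided that the above copyright notice is retained.' \
  'THIS SOFTWARE IS PROVIDED AS IS.' \
  'The University may not be named.' > bsd-sample.txt

# -v inverts the match, -n prefixes each printed line with its number.
grep -vn "the" bsd-sample.txt
# prints lines 1, 3, and 4; "The" on line 4 does not count as "the"
```

Adding -i to the same command would also exclude line 4, because "The" would then count as a match.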
In the introduction, you learned that grep stands for “global regular expression print”. A “regular expression” is a text string that describes a particular search pattern.
Different applications and programming languages implement regular expressions slightly differently. In this tutorial you will only be exploring a small subset of the way that grep describes its patterns.
In the previous examples in this tutorial, when you searched for the words GNU and the, you were actually searching for basic regular expressions which matched the exact string of characters GNU and the. Patterns that exactly specify the characters to be matched are called “literals” because they match the pattern literally, character-for-character.
It is helpful to think of these as matching a string of characters rather than matching a word. This will become a more important distinction as you learn more complex patterns.
All alphabetical and numerical characters (as well as certain other characters) are matched literally unless modified by other expression mechanisms.
Anchors are special characters that specify where in the line a match must occur to be valid.
For instance, using anchors, you can specify that you only want to know about the lines that match GNU at the very beginning of the line. To do this, you could use the ^ anchor before the literal string.
Run the following command to search the GPL-3 file and find lines where GNU occurs at the very beginning of a line:
This command will return the following two lines:
Similarly, you use the $ anchor at the end of a pattern to indicate that the match will only be valid if it occurs at the very end of a line.
This command will match every line ending with the word and in the GPL-3 file:
You’ll receive this output:
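As a quick check of both anchors, this sketch assumes a three-line stand-in file named anchor-sample.txt (hypothetical contents) rather than the real license text:

```shell
# Stand-in text: "GNU" starts line 1 only; "and" ends line 2 only.
printf '%s\n' \
  'GNU General Public License for most of our software' \
  'The GNU project protects your rights and' \
  'and also applies to other works' > anchor-sample.txt

# ^ anchors the match to the start of the line.
grep '^GNU' anchor-sample.txt

# $ anchors the match to the end of the line.
grep 'and$' anchor-sample.txt
```

Single quotes keep the shell from trying to interpret the $ character before grep sees it.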
The period character (.) is used in regular expressions to mean that any single character can exist at the specified location.
For example, to match anything in the GPL-3 file that has two characters and then the string cept, you would use the following pattern:
This command returns the following output:
This output has instances of both accept and except and variations of the two words. The pattern would also have matched z2cept if that was found as well.
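A minimal sketch of the period meta-character, assuming a stand-in file named cept-sample.txt with invented contents:

```shell
# Stand-in text: the last line has only one character before "cept".
printf '%s\n' \
  'you must accept the terms' \
  'except as expressly provided' \
  'a new concept' \
  'xcept alone' > cept-sample.txt

# Each period stands for exactly one character of any kind.
grep '..cept' cept-sample.txt
# prints the first three lines; "xcept" lacks a second leading character
```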
By placing a group of characters within brackets ([ and ]), you can specify that the character at that position can be any one character found within the bracket group.
For example, to find the lines that contain too or two, you would specify those variations succinctly by using the following pattern:
The output shows that both variations exist in the file:
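One way to try a bracket expression, assuming a stand-in file named too-sample.txt (hypothetical contents):

```shell
# Stand-in text: "tools" contains the string "too"; "top" does not fit.
printf '%s\n' \
  'these two options' \
  'too many copies' \
  'a list of tools' \
  'the top of the page' > too-sample.txt

# t, then either w or o, then o.
grep 't[wo]o' too-sample.txt
# prints the first three lines
```

The third line matches because grep looks for a string of characters, not a whole word, so the "too" inside "tools" counts.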
Bracket notation gives you some interesting options. You can have the pattern match anything except the characters within a bracket by beginning the list of characters within the brackets with a ^ character.
This example is like the pattern .ode, but will not match the pattern code:
Here’s the output you’ll receive:
Notice that in the second line returned, there is, in fact, the word code. This is not a failure of the regular expression or grep. Rather, this line was returned because earlier in the line, the pattern mode, found within the word model, was found. The line was returned because there was an instance that matched the pattern.
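The same effect can be reproduced on a small scale. This sketch assumes a stand-in file named ode-sample.txt with invented contents:

```shell
# Stand-in text: line 2 contains both "model" and "code".
printf '%s\n' \
  'the source code' \
  'a model for the code' \
  'mode of operation' \
  'ode' > ode-sample.txt

# Any single character except c, followed by "ode".
grep '[^c]ode' ode-sample.txt
# prints lines 2 and 3; line 2 matches through "mode" inside "model"
```

The last line is skipped too: a negated bracket still has to match one character, and nothing precedes "ode" there.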
Another helpful feature of brackets is that you can specify a range of characters instead of individually typing every available character.
This means that if you want to find every line that begins with a capital letter, you can use the following pattern:
Here’s the output this expression returns:
Because the meaning of a character range depends on the locale’s collation order, it is often more accurate to use POSIX character classes instead of character ranges like the one you just used.
To discuss every POSIX character class would be beyond the scope of this guide, but an example that would accomplish the same procedure as the previous example uses the [:upper:] character class within a bracket selector:
The output will be the same as before.
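The range and the character class can be compared side by side. This sketch assumes a stand-in file named upper-sample.txt (hypothetical contents) and sets LC_ALL=C on the range version so that A-Z means plain ASCII order:

```shell
# Stand-in text: lines 1 and 3 begin with a capital letter.
printf '%s\n' \
  'GNU General Public License' \
  'the terms and conditions' \
  'States should not allow patents' \
  '  indented line' > upper-sample.txt

# Range form, pinned to the C locale for predictable ordering.
LC_ALL=C grep '^[A-Z]' upper-sample.txt

# POSIX class form: note the class sits inside a second pair of brackets.
grep '^[[:upper:]]' upper-sample.txt
```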
Finally, one of the most commonly used meta-characters is the asterisk, or *, which means “repeat the previous character or expression zero or more times”.
To find each line in the GPL-3 file that contains an opening and closing parenthesis, with only letters and single spaces in between, use the following expression:
You’ll get the following output:
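A minimal sketch of the asterisk applied to a bracket expression, assuming a stand-in file named paren-sample.txt with invented contents. In basic regular expressions the parentheses are ordinary characters, so they need no escaping here:

```shell
# Stand-in text: line 2 has a digit inside the parentheses.
printf '%s\n' \
  'distribution (with or without modification)' \
  'a work (see section 5)' \
  'empty parentheses ()' \
  'no parentheses here' > paren-sample.txt

# "(", then zero or more letters or spaces, then ")".
grep '([A-Za-z ]*)' paren-sample.txt
# prints lines 1 and 3; "zero or more" lets the empty pair match
```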
So far you’ve used periods, asterisks, and other characters in your expressions, but sometimes you need to search for those characters specifically.
There are times when you’ll need to search for a literal period or a literal opening bracket, especially when working with source code or configuration files. Because these characters have special meaning in regular expressions, you need to “escape” these characters to tell grep that you do not wish to use their special meaning in this case.
You escape characters by using the backslash character (\) in front of the character that would normally have a special meaning.
For instance, to find any line that begins with a capital letter and ends with a period, use the following expression which escapes the ending period so that it represents a literal period instead of the usual “any character” meaning:
This is the output you’ll see:
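Here is a small sketch of an escaped period, assuming a stand-in file named period-sample.txt (hypothetical contents). It uses the [:upper:] class for the capital letter, which is one possible way to write that part:

```shell
# Stand-in text: only lines 1 and 2 start with a capital and end with ".".
printf '%s\n' \
  'Source.' \
  'License by making exceptions from one or more of its conditions.' \
  'lowercase start.' \
  'No final period' > period-sample.txt

# \. is a literal period; the unescaped . in .* still means "any character".
grep '^[[:upper:]].*\.$' period-sample.txt
```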
Now let’s look at other regular expression options.
The grep command supports a more extensive regular expression language by using the -E flag or by calling the egrep command instead of grep.
These options open up the capabilities of “extended regular expressions”. Extended regular expressions include all of the basic meta-characters, along with additional meta-characters to express more complex matches.
One of the most useful abilities that extended regular expressions open up is the ability to group expressions together to manipulate or reference as one unit.
To group expressions together, wrap them in parentheses. If you would like to use parentheses without using extended regular expressions, you can escape them with the backslash to enable this functionality. This means that the following three expressions are functionally equivalent:
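The equivalence can be checked directly. This sketch assumes a stand-in file named group-sample.txt and shows the escaped basic form next to the -E form; calling egrep with the same unescaped pattern would be the third variant on systems that still provide it:

```shell
# Stand-in text with one matching line.
printf '%s\n' \
  'grouping expressions together' \
  'nothing to see' > group-sample.txt

# Basic syntax: parentheses must be escaped to act as a group.
grep '\(grouping\)' group-sample.txt

# Extended syntax: plain parentheses form a group.
grep -E '(grouping)' group-sample.txt
```

Both commands print the same single line.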
Similar to how bracket expressions can specify different possible choices for single character matches, alternation allows you to specify alternative matches for strings or expression sets.
To indicate alternation, use the pipe character |. These are often used within parenthetical grouping to specify that one of two or more possibilities should be considered a match.
The following will find either GPL or General Public License in the text:
The output looks like this:
Alternation can select between more than two choices by adding additional choices within the selection group separated by additional pipe (|) characters.
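A minimal sketch of alternation inside a group, assuming a stand-in file named alt-sample.txt with invented contents:

```shell
# Stand-in text: line 1 has the long form, line 2 the abbreviation.
printf '%s\n' \
  'The GNU General Public License is free' \
  'covered by the GPL' \
  'Lesser terms apply' > alt-sample.txt

# -E enables extended syntax, where | separates the alternatives.
grep -E '(GPL|General Public License)' alt-sample.txt
# prints lines 1 and 2
```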
Like the * meta-character that matched the previous character or character set zero or more times, there are other meta-characters available in extended regular expressions that specify the number of occurrences.
To match a character zero or one times, you can use the ? character. In essence, this makes the preceding character or group optional.
The following matches copyright and right by putting copy in an optional group:
You’ll receive this output:
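To make the optional group visible, this sketch adds the -o flag, which prints only the matched text instead of the whole line. The file name optional-sample.txt and its contents are assumptions for illustration:

```shell
# Stand-in text: "Copyright" starts with a capital C.
printf '%s\n' \
  'Copyright (C) 2007' \
  'copyright holders' \
  'your rights' \
  'left alone' > optional-sample.txt

# "copy" may or may not be present before "right".
grep -oE '(copy)?right' optional-sample.txt
# prints: right, copyright, right (one per line)
```

On the first line only "right" is reported, because the capital C keeps the optional "copy" group from matching.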
The + character matches an expression one or more times. This is almost like the * meta-character, but with the + character, the expression must match at least once.
The following expression matches the string free plus one or more characters that are not white space characters:
You’ll see this output:
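A short sketch of the + quantifier, assuming a stand-in file named free-sample.txt (hypothetical contents) and again using -o to show exactly what matched:

```shell
# Stand-in text: "free" is followed by a space on line 1 and by nothing on line 3.
printf '%s\n' \
  'free software' \
  'your freedom to share' \
  'feel free' \
  'free-standing' > free-sample.txt

# "free" followed by at least one non-whitespace character.
grep -oE 'free[^[:space:]]+' free-sample.txt
# prints: freedom, free-standing (one per line)
```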
To specify the number of times that a match is repeated, use the brace characters ({ and }). These characters let you specify an exact number, a range, or an upper or lower bound on the number of times an expression can match.
Use the following expression to find all of the lines in the GPL-3 file that contain triple-vowels:
Each line returned has a word with three consecutive vowels:
To match any words that have between 16 and 20 characters, use the following expression:
Here’s this command’s output:
Only lines containing words within that length are displayed.
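Both uses of braces can be sketched on a small scale. The file name brace-sample.txt and its contents are assumptions; the vowel list and the [:alpha:] class are one reasonable way to express "vowels" and "word characters":

```shell
# Stand-in text: "previous" has three vowels in a row,
# "internationalization" is 20 letters long.
printf '%s\n' \
  'a previous version' \
  'plain words only' \
  'an internationalization effort' > brace-sample.txt

# Exactly three consecutive vowels.
grep -E '[AEIOUaeiou]{3}' brace-sample.txt

# A run of 16 to 20 letters.
grep -E '[[:alpha:]]{16,20}' brace-sample.txt
```

Because the second pattern is not anchored to word boundaries, a word longer than 20 letters would also match through its first 20 letters.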
To validate CSV fields, you can use grep with regular expressions to check for specific patterns or formats. For example, to check if all lines in a CSV file have exactly 5 comma-separated fields, you can use the following command:
This command will print only the lines that match the specified pattern, indicating valid CSV fields.
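One possible pattern for this check is sketched below. It assumes a hypothetical file named data.csv and a simple CSV dialect in which fields never contain quoted commas:

```shell
# Stand-in data: the third line has only four fields.
printf '%s\n' \
  'id,name,email,city,zip' \
  '1,Ann,ann@example.com,Oslo,0150' \
  '2,Bob,bob@example.com,Rome' \
  '3,Cy,,Lima,15001' > data.csv

# One field, then exactly four more ",field" groups, anchored at both ends.
grep -E '^[^,]*(,[^,]*){4}$' data.csv

# Adding -v lists the lines that fail the check instead.
grep -vE '^[^,]*(,[^,]*){4}$' data.csv
```

An empty field, as on the last line, still counts as a field because [^,]* can match zero characters.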
Filtering logs by error level is a common use case for grep. To filter logs for lines containing the word “ERROR”, you can use the following command:
This command will print all lines from logs.txt that contain the word “ERROR”, allowing you to focus on error messages.
When searching through source code, grep can be used to find specific functions or patterns. For example, to find all occurrences of a function named calculateTotal in a directory of source code files, you can use the following command:
This command will recursively search through all files in the specified directory and print the lines that contain the function name.
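A minimal sketch of a recursive search, assuming a hypothetical directory named src-sample with two invented JavaScript files:

```shell
# Stand-in source tree (hypothetical layout and contents).
mkdir -p src-sample/lib
printf '%s\n' 'function calculateTotal(items) {' '  return 0;' '}' > src-sample/lib/cart.js
printf '%s\n' 'const total = calculateTotal(cart);' > src-sample/app.js

# -r descends into subdirectories, -n adds line numbers.
grep -rn "calculateTotal" src-sample
```

Each result is prefixed with the file path and line number, so a match in a nested file is as easy to locate as one at the top level.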
Regular expressions can be used to match URLs or email addresses in text. For example, to find all lines in a file that contain a URL, you can use the following command:
This command will print all lines that contain a URL starting with “http://” or “https://”.
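A simple URL pattern could look like the sketch below. It assumes a stand-in file named url-sample.txt and treats everything up to the next whitespace as part of the URL, which is a deliberate simplification:

```shell
# Stand-in text: two web URLs, one ftp URL, one line without links.
printf '%s\n' \
  'Docs: https://www.gnu.org/licenses/' \
  'Mirror at http://example.com/gpl.txt' \
  'ftp://example.com is not matched' \
  'no link here' > url-sample.txt

# "http", an optional "s", "://", then one or more non-whitespace characters.
grep -oE 'https?://[^[:space:]]+' url-sample.txt
```

With -o only the URLs themselves are printed; dropping it prints the full matching lines.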
Stopwords are common words like “the”, “and”, etc., that do not carry much meaning in a sentence. To filter out lines containing stopwords, you can use grep with a list of stopwords. For example, to filter out lines containing the stopwords “the”, “and”, or “a”, you can use the following command:
This command will print all lines that do not contain any of the specified stopwords.
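One way to write such a filter is sketched here, assuming a stand-in file named stop-sample.txt. The -w and -i flags are additions chosen for this sketch so that only whole words are treated as stopwords, regardless of case:

```shell
# Stand-in text: lines 1 and 2 contain stopwords as whole words.
printf '%s\n' \
  'The quick fox' \
  'Cats and dogs' \
  'Apples taste great' \
  'Run fast' > stop-sample.txt

# -v inverts, -i ignores case, -w requires whole-word matches.
grep -viwE 'the|and|a' stop-sample.txt
# prints lines 3 and 4
```

Without -w, the single letter "a" would match inside almost every word and the filter would discard nearly everything.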
Regular expressions can be used to flag potential near-duplicate entries or misspellings by matching similar patterns. For example, to find lines that contain a word in which a single character is repeated, you can use the following command:
This command will print all lines that contain words with a single character repeated, indicating potential near-duplicates or misspellings.
Regular expressions can be used to match named entities or common phrases in text. For example, to find all lines that contain a specific phrase like “named entity recognition”, you can use the following command:
This command will print all lines that contain the specified phrase, allowing you to focus on relevant information.
Operators like *, +, and ? have special meanings, so you need to escape them when you want to match the characters themselves. To match a literal *, escape it with a backslash (\*). The + and ? characters depend on the mode: in basic regular expressions they are already literal, and in GNU grep the escaped forms \+ and \? act as the repetition operators. With the -E option the roles are reversed, so + and ? are operators and \+ and \? match the literal characters.
Example command to match a literal * character:
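A brief sketch of matching these characters literally, assuming a stand-in file named math-sample.txt with invented contents:

```shell
# Stand-in text: one line with "*", one with "+".
printf '%s\n' \
  'total = 2 * hours' \
  'total = 2 + hours' > math-sample.txt

# Escaped asterisk: matches the character itself.
grep '\*' math-sample.txt

# -F treats the whole pattern as a fixed string, so no escaping is needed.
grep -F '*' math-sample.txt

# With -E, + is an operator, so the literal needs a backslash.
grep -E '\+' math-sample.txt
```

The first two commands print the line with the asterisk and the third prints the line with the plus sign.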
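To match empty lines or lines with only whitespace, you can use the following regex pattern:
This pattern matches lines that contain nothing but zero or more whitespace characters (\s*) between the start (^) and the end ($) of the line. In GNU grep, \s matches any whitespace character; the POSIX class [[:space:]] is the portable equivalent.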
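A minimal sketch using the portable [[:space:]] form, assuming a stand-in file named blank-sample.txt:

```shell
# Stand-in text: line 2 is empty, line 3 holds three spaces,
# line 4 starts with a tab but also has text.
printf 'first\n\n   \n\tlast\n' > blank-sample.txt

# Start of line, any amount of whitespace, end of line.
grep -n '^[[:space:]]*$' blank-sample.txt
# reports lines 2 and 3

# Inverting the match keeps only lines with visible content.
grep -v '^[[:space:]]*$' blank-sample.txt
```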
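To match tabs (\t) or carriage returns (\r) in your text, you can use the following commands:
Note that grep’s basic and extended syntaxes do not interpret \t or \r as escape sequences, so you need to either pass the literal character from the shell or, where GNU grep supports it, use the -P option for Perl-compatible patterns. The -E option enables extended regex patterns, which allow you to use more advanced regex features.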
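One portable way to hand grep a real tab or carriage return is to let printf produce the character and store it in a shell variable. This sketch assumes two stand-in files, tab-sample.txt and cr-sample.txt, with invented contents:

```shell
# Stand-in files: one tab-separated line, one line with a DOS line ending.
printf 'name\tvalue\nplain line\n' > tab-sample.txt
printf 'dos line\r\nunix line\n' > cr-sample.txt

# printf turns the escape sequences into the actual characters.
tab=$(printf '\t')
cr=$(printf '\r')

# Print the line that contains a tab.
grep "$tab" tab-sample.txt

# Count the lines that contain a carriage return.
grep -c "$cr" cr-sample.txt
```

Counting with -c is used for the carriage return because printing such a line directly can garble the terminal display.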
Command | Description | Features | Use Cases | Sample Command |
---|---|---|---|---|
grep | Basic pattern matching | Supports basic regex | General pattern matching | grep "pattern" file.txt |
egrep | Extended pattern matching | Supports extended regex | Complex pattern matching | egrep "pattern" file.txt |
fgrep | Fixed pattern matching | No regex support | Matching fixed strings | fgrep "pattern" file.txt |
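<$>[note]
Note: The main difference between these commands is the type of pattern matching they support. grep supports basic regex, egrep supports extended regex, and fgrep does not support regex at all. Recent versions of GNU grep treat egrep and fgrep as obsolescent and recommend grep -E and grep -F instead.
<$>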
grep is not well-suited for handling multiline patterns due to its line-oriented nature. However, there are alternative tools that can effectively handle such patterns. awk and perl are two popular options that can be used to search for patterns spanning multiple lines.
awk is a powerful text processing tool that can be used to match patterns across multiple lines. It allows you to define a pattern to match and then perform actions on the matched lines. For example, to find lines that contain a pattern across multiple lines, you can use the following command:
This command will print all lines that match the specified pattern. Note that awk can also be used to perform more complex operations on the matched lines, such as printing the entire block of text that matches the pattern.
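One common awk technique for multi-line blocks is a range pattern, which selects everything from a line matching a start pattern through the next line matching an end pattern. The sketch below assumes a stand-in file named block-sample.txt and the hypothetical markers BEGIN and END:

```shell
# Stand-in text with a three-line block between two markers.
printf '%s\n' \
  'header' \
  'BEGIN block' \
  'first inner line' \
  'END block' \
  'footer' > block-sample.txt

# start,end range: with no action given, awk prints each selected line.
awk '/BEGIN/,/END/' block-sample.txt
# prints the three lines from "BEGIN block" through "END block"
```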
perl is another powerful tool that can be used to handle multiline patterns. It offers a more flexible and expressive way of matching patterns using its built-in regular expression engine. For example, you can use the following command to find lines that contain a pattern across multiple lines:
This command tells perl to read the file in “slurp” mode (-0777), which reads the entire file into memory as a single record. The -n option loops over the input records, which in slurp mode means the script runs once on the whole file, and -e supplies the script on the command line. In the print if /pattern/s statement, the s modifier lets the . character match newlines so the pattern can span multiple lines, and print outputs the whole record, which here is the entire file, when the pattern matches.
Both awk and perl offer more advanced features and flexibility than grep when it comes to handling multiline patterns, making them ideal alternatives for such tasks.
grep and egrep are both used for pattern matching, but they differ in the type of patterns they support. grep supports basic regular expressions, while egrep supports extended regular expressions, which allow for more advanced pattern matching.
Yes, you can use grep to search across multiple files by specifying multiple file names or using wildcards. For example:
or
To grep for lines that do not match a pattern, use the -v option. For example:
This will print all lines that do not contain the specified pattern.
To include line numbers in grep output, use the -n option. For example:
This will print the line numbers along with the lines that match the specified pattern.
There could be several reasons why your grep regex is not working as expected. Here are a few common issues to check:
Check that you are passing the right options to the grep command (e.g., -E for extended regex).
To search for a pattern that includes whitespace or special characters, you need to quote the pattern and properly escape the special characters in it. For example, to search for a pattern that includes whitespace, use the following command:
Similarly, to search for a pattern that includes special characters, escape them using a backslash (\). For example:
grep is a powerful tool for finding patterns within files or within the file system hierarchy. Mastering its options and syntax will greatly enhance your ability to work with text data.
Regular expressions are a fundamental concept in computing, and understanding them will open up a wide range of possibilities. From advanced text searching and replacement in text editors to data validation in programming languages, regular expressions are an essential skill to have.
To further improve your skills, we recommend checking out the following tutorials:
By exploring these tutorials, you’ll gain a deeper understanding of grep, regular expressions, and other essential Linux tools, enabling you to tackle a wide range of tasks with confidence.