When searching for text in files or in a directory structure, Linux and Unix-like systems have many tools that can assist you. One of the most common of these is grep, which stands for global regular expression print.
Using grep, you can easily search any set of textual input for any pattern that can be expressed with a regular expression. However, it was created as a general-purpose tool, so it is not optimized for searching source code trees.
For searching source code specifically, a tool inspired by grep called ack was invented. It leverages Perl regular expressions to efficiently search source code for patterns, while taking care to not include the results you don’t care about.
In this guide, we’ll discuss how to use ack as a super-powered grep replacement for picking out patterns from your source code. The ack tool is available on any platform that can use Perl, but we’ll be demonstrating the utility on an Ubuntu 14.04 server.
To get started, the first step is to install the ack tool on your machine.
On an Ubuntu or Debian machine, this is as simple as installing the utility from the default repositories. The package is called ack-grep:
sudo apt-get update
sudo apt-get install ack-grep
Since the executable is also installed as ack-grep, we can tell our system to shorten this to ack for command-line use by typing this command:
sudo dpkg-divert --local --divert /usr/bin/ack --rename --add /usr/bin/ack-grep
Now, the tool will respond to the name ack instead of ack-grep.
If you are planning on using ack on other systems, the installation method may vary. The Perl module in CPAN is called App::Ack. On other Linux distributions, the package names in the repositories may be different.
Before we get into the actual usage of ack, let’s discuss for a moment how it differs from grep and what files are within the realm of ack.
The ack tool was created specifically for finding text within the source code of programs. Because of this, the tool has been optimized to search certain files and ignore others.
For instance, if you are searching your project’s directory structure, you will almost never want to search the version control system’s repository hierarchy. This contains information about older versions of files, and would likely result in many duplicates. Ack realizes that this is not where you want to search, so it ignores these directories. This leads to more focused results, as well as fewer false positives.
In a similar vein, it will ignore common backup files created by certain text editors. It will also not attempt to search non-coding files commonly found in source directories, such as “minified” versions of web files, image and PDF files, etc. All of these things lead to better results for almost all searches. You can always override these settings during execution.
Another feature of ack is that it knows about the source files of different languages. You can ask it to find all Python files in the directory structure. It will return all files that end with .py, but it will also return any file whose first line calls the Python interpreter:
#!/path/to/python
In general, ack matches files identified by their extension and also files that name an interpreter on the common first-line “magic number” call:
#!/path/to/interpreter/to/run
This creates a powerful way to categorize very different kinds of files as being related. You can also add or modify the groupings to your liking.
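The idea of grouping files by extension or by interpreter line can be illustrated with a few lines of shell. This is only a rough sketch of the concept, not how ack itself is implemented; the helper name is_python_file and the sample files are made up for the illustration.

```shell
# Rough illustration only: treat a file as "Python" if its name ends in .py
# or if its first line is a shebang that mentions python.
# is_python_file is a hypothetical helper, not part of ack.
is_python_file() {
    case "$1" in
        *.py) return 0 ;;
    esac
    # Look only at the first line for "#!...python".
    head -n 1 "$1" 2>/dev/null | grep -q '^#!.*python'
}

# Throwaway sample files: one by extension, one by shebang, one shell script.
demo_dir=$(mktemp -d)
printf 'print("hi")\n' > "$demo_dir/script.py"
printf '#!/usr/bin/env python\nprint("hi")\n' > "$demo_dir/tool"
printf '#!/bin/sh\necho hi\n' > "$demo_dir/run"

result=$(
    for f in "$demo_dir"/*; do
        if is_python_file "$f"; then
            echo "python: ${f##*/}"
        else
            echo "other: ${f##*/}"
        fi
    done
)
printf '%s\n' "$result"
rm -rf "$demo_dir"
```

Both script.py and the extensionless tool file end up in the same group, which is the behavior described above.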
The best way to demonstrate the power of ack is to use it on a source code directory.
Luckily, we can easily pull down a source tree from a public site like GitHub. Install git so that we can pull down a repository:
sudo apt-get install git
Now, we need to grab a project. The neovim project is a good example because it contains many different kinds of files. Let’s clone that repository to our home directory:
cd ~
git clone https://github.com/neovim/neovim.git
Now, let’s move into that directory to get started:
cd neovim
Check out the different files to get an idea of the variety we have:
ls
BACKERS.md CMakeLists.txt Doxyfile scripts uncrustify.cfg clint-files.txt config Makefile src vim-license.txt clint.py contrib neovim.rb test cmake CONTRIBUTING.md README.md third-party
Just in that top-level directory, we see Markdown files, plain text files, a Ruby file, and a Python file, while the main portion of the project is written in C.
We also want to set a few things to make our lives easier.
We want to pipe the output directly into less if the results are larger than our terminal window. This will prevent the output from scrolling uncontrollably off the screen.
Do that by typing:
echo '--pager=less -RFX' >> ~/.ackrc
This will create our ack configuration file and add its first non-default option. We tell it to pipe output to less with options that display colored output (-R), exit immediately if the results fit on one screen (-F), and leave the results on the terminal after less exits (-X).
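Because the echo command appends to the file, running it more than once leaves duplicate lines behind. A minimal sketch of an append that is safe to repeat is shown here; it writes to a scratch file (the variable name ackrc_file and the helper add_option are made up for this example) so that it does not touch your real configuration.

```shell
# Append an option to an ack configuration file only if it is not already there.
# ackrc_file points at a scratch file for this demo; use "$HOME/.ackrc" for real.
ackrc_file=$(mktemp)
option='--pager=less -RFX'

add_option() {
    # -x: match the whole line, -F: fixed string, --: the option starts with a dash
    grep -qxF -- "$option" "$ackrc_file" || printf '%s\n' "$option" >> "$ackrc_file"
}

add_option
add_option   # the second call changes nothing

line_count=$(grep -cxF -- "$option" "$ackrc_file")
echo "matching lines in config: $line_count"
rm -f "$ackrc_file"
```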
Let’s get started. To begin, let me demonstrate the difference between what grep would search and what ack searches.
Grep searches every file in the directory structure for matches. We can see the total number of files in this project by typing:
find . | wc -l
At the time of this writing, there are 566 total files in the neovim project. To find out how many of those files ack cares about, we can type:
ack -f | wc -l
As you can see, we’ve already eliminated around 12% of the files to be searched without even doing anything.
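To see the same effect on a small scale, here is a hedged sketch that builds a throwaway tree and compares the two counts. The file names are invented for the demo. Note that find without -type f also counts directories, so this sketch counts regular files only; the ack part runs only if an ack command is installed.

```shell
# Build a tiny throwaway tree: one source file, one editor backup, one VCS file.
tree=$(mktemp -d)
mkdir -p "$tree/.git"
printf 'int main(void) { return 0; }\n' > "$tree/main.c"
printf 'old copy\n' > "$tree/main.c~"
printf '[core]\n' > "$tree/.git/config"

# find sees every regular file, including the backup and the .git contents.
all_files=$(find "$tree" -type f | wc -l | tr -d ' ')
echo "find: $all_files files"

# ack -f lists only the files it would search. --noenv skips any ackrc files,
# and stdin is redirected so ack never waits on piped input.
if command -v ack >/dev/null 2>&1; then
    ack_files=$(ack --noenv -f "$tree" < /dev/null | wc -l | tr -d ' ')
    echo "ack:  $ack_files files"
fi
rm -rf "$tree"
```

With ack's default ignore rules, the backup file and everything under .git should be left out of the second count.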
Let’s say we want to find all of the instances where the pattern “restrict” is found in this project. We can type:
ack restrict
Doxyfile
1851:# that the size of a graph can be further restricted by MAX_DOT_GRAPH_DEPTH.
1860:# code bases. Also note that the size of a graph can be further restricted by
1861:# DOT_GRAPH_MAX_NODES. Using a depth of 0 means no depth restriction.

vim-license.txt
3:I) There are no restrictions on distributing unmodified copies of Vim except
5: unmodified parts of Vim, likewise unrestricted except that they must
. . .
As you can see, ack divides up the instances of “restrict” by the file where the matches were found. Furthermore, it gives the exact line number.
However, some of the instances (all of the sample portion shown above) match variations of “restrict” like “restricted” and “restriction”. What if we only want the word “restrict”?
We can use the -w flag to tell it to search for instances of our pattern surrounded by word boundaries. This will eliminate the other forms of the word:
ack -w restrict
vim-license.txt
37: add. The changes and their license must not restrict others from

clint.py
107: Specify a number 0-5 to restrict errors to certain verbosity levels.

src/nvim/fileio.c
6846: * Allow nesting of autocommands, but restrict the depth, because it's
. . .
As you can see, our results now only show “restrict” without the variations we saw before. The output is much more focused.
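The effect of word boundaries is easy to check on a three-line sample. The sketch below is an illustration with made-up sample text: it first counts matching lines with grep, whose -w flag behaves the same way for a simple pattern like this one, and then repeats the count with ack if it is installed, including an explicit \b...\b pattern for comparison.

```shell
# Three sample lines: only the first contains "restrict" as a whole word.
sample=$(mktemp)
printf 'must not restrict others\nfurther restricted by depth\nno restrictions apply\n' > "$sample"

# grep cross-check: -c counts matching lines, -w requires whole-word matches.
grep_plain=$(grep -c restrict "$sample")
grep_word=$(grep -cw restrict "$sample")
printf 'grep: %s substring matches, %s whole-word matches\n' "$grep_plain" "$grep_word"

if command -v ack >/dev/null 2>&1; then
    # --noenv ignores any ackrc; -ch prints a single count.
    ack_plain=$(ack --noenv -ch restrict "$sample" < /dev/null)
    ack_word=$(ack --noenv -ch -w restrict "$sample" < /dev/null)
    ack_explicit=$(ack --noenv -ch '\brestrict\b' "$sample" < /dev/null)
    printf 'ack: %s substring, %s with -w, %s with explicit boundaries\n' \
        "$ack_plain" "$ack_word" "$ack_explicit"
fi
rm -f "$sample"
```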
You may have noticed above that the results come from different types of files. One is in a plain text file, one is in a Python file, and there are multiple matches in C source files.
If we want to tell ack to only show us the results found in Python files, we can do this painlessly by typing:
ack -w --python restrict
clint.py
107: Specify a number 0-5 to restrict errors to certain verbosity levels.
We haven’t had to specify the file patterns that we were looking for. We haven’t had to craft special regular expressions to catch the type of files we want without matching others. Ack simply knows the files of many common languages and you can refer to them by name.
We’ll go over how you can modify the files ack returns for each language and how to define your own language groups later.
We have gone from a very broad set of results to a single match by adding a few simple flags. Let’s see exactly how much we’ve narrowed down our results.
We can use the flags -ch, which we can think of as a simple idiom meaning “how many matches were returned?”. By itself, the -c flag tells ack to return only the count of matching lines in each file, like this:
ack -c restrict
Doxyfile:3
Makefile:0
uncrustify.cfg:0
.travis.yml:0
neovim.rb:0
vim-license.txt:5
This will return a line for every file, even those with no matches.
The -h flag by itself suppresses the filename prefix in the output. Combined with -c, ack prints a single number representing the total number of lines where the search matched:
ack -ch restrict
We started with 101 results. When we told it to pay attention to word boundaries, we cut a large chunk of these out:
ack -ch -w restrict
And of course, when we specified that we only wanted to see Python files, we narrowed our results to a single match:
ack -ch -w --python restrict
Not only have we narrowed our search, but by adding the language restriction, we’ve actually sped up the search. Ack does not simply filter the results by language after the fact; it selects the files before searching, which saves it from having to read irrelevant files.
We can see this by timing the searches with the time command:
time ack -ch restrict
101
real 0m0.407s
user 0m0.363s
sys 0m0.041s
Now let’s try the language-specific subset search:
time ack -ch -w --python restrict
1
real 0m0.204s
user 0m0.175s
sys 0m0.028s
The second search is significantly faster: it takes about half the wall-clock time (0.204s versus 0.407s).
We’ve already talked about modifying the search output a little bit when we went over the -c and -h flags. There are other helpful flags that can help us shape the output that we want.
For instance, as you saw before, the -c flag prints out the number of lines where a match pattern was found in each file. We modified it with -h before, but we could also modify it with -l instead. This will only return counts for files where a match was found:
ack -cl restrict
Doxyfile:3
vim-license.txt:5
clint.py:1
test/unit/formatc.lua:1
src/nvim/main.c:4
src/nvim/ex_cmds.c:5
src/nvim/misc1.c:1
. . .
As you can see, all of the files with a count of 0 have been pruned from the output.
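The three counting variants can be compared side by side on a small set of files. This is a hedged sketch using invented file names and contents; it only runs the searches if an ack command is available, and it passes --noenv so that any personal ackrc settings do not change the result.

```shell
# Three small files: two mention "restrict", one does not.
dir=$(mktemp -d)
printf 'restrict a\nrestrict b\n' > "$dir/one.txt"
printf 'restrict c\n' > "$dir/two.txt"
printf 'nothing here\n' > "$dir/three.txt"

if command -v ack >/dev/null 2>&1; then
    # -c: one "file:count" line per searched file, zero counts included
    per_file=$(ack --noenv -c restrict "$dir" < /dev/null)
    # -cl: only the files that have at least one matching line
    matching=$(ack --noenv -cl restrict "$dir" < /dev/null)
    # -ch: a single total across all files
    total=$(ack --noenv -ch restrict "$dir" < /dev/null)
    printf '%s\n--\n%s\n--\n%s\n' "$per_file" "$matching" "$total"
fi
rm -rf "$dir"
```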
If you want to see the column where a match is found within a line, you can tell ack to print that information as well with the --column option:
ack -w --column --python restrict
clint.py
107:31: Specify a number 0-5 to restrict errors to certain verbosity levels.
The second number that is given is the column number where the match’s first character occurs. Some editors let you go to a specific line and column, which makes this very helpful.
For instance, if you open the clint.py file with the vim text editor, you could go to the exact position of the match by typing 107G to get to the line, and then 31| to get to the column position. This kind of precise positioning can be really helpful, especially if you are searching for a common substring within larger words.
If you need more context for the results, you can tell ack to print out lines before or after the match occurrence. For instance, to print out 5 lines before the “restrict” match in the Python file, we can use the -B flag like this:
ack -w --python -B 5 restrict
102- output=vs7
103- By default, the output is formatted to ease emacs parsing. Visual Studio
104- compatible output (vs7) may also be used. Other formats are unsupported.
105-
106- verbose=#
107: Specify a number 0-5 to restrict errors to certain verbosity levels.
You can specify the number of context lines after the match with the -A flag:
ack -w --python -A 2 restrict
107: Specify a number 0-5 to restrict errors to certain verbosity levels.
108-
109- filter=-x,+y,…
You can specify a general purpose context specification that will print a number of lines above and below the matches with the -C flag. For instance, to get 3 lines of context in either direction, type:
ack -w --python -C 3 restrict
104- compatible output (vs7) may also be used. Other formats are unsupported.
105-
106- verbose=#
107: Specify a number 0-5 to restrict errors to certain verbosity levels.
108-
109- filter=-x,+y,…
110- Specify a comma-separated list of category-filters to apply: only
To just print the files that ack would search for a given type, without searching them for a pattern, you can use the -f flag:
ack -f --python
We can do the same thing, but also specify a pattern for the file/directory structure by using the -g flag. For instance, we can search for all of the C language files that have the pattern “log” somewhere in their path by typing:
ack -g log --cc
We’ve seen the basics of how to filter by file type. We can tell ack to only show us the C language files by typing:
ack -f --cc
test/includes/pre/sys/stat.h
src/nvim/log.h
src/nvim/farsi.h
src/nvim/main.c
src/nvim/ex_cmds.c
src/nvim/os/channel.c
src/nvim/os/server.c
. . .
You can see all of the languages that ack knows about, and which extensions and file properties it associates with each category, by typing:
ack --help-types
Usage: ack-grep [OPTION]... PATTERN [FILES OR DIRECTORIES]

The following is the list of filetypes supported by ack-grep. You can specify a file type with the --type=TYPE format, or the --TYPE format. For example, both --type=perl and --perl work.

Note that some extensions may appear in multiple types. For example, .pod files are both Perl and Parrot.

--[no]actionscript .as .mxml
--[no]ada .ada .adb .ads
--[no]asm .asm .s
--[no]asp .asp
. . .
As you can see, this gives you the matching parameters for each file type. You can also tell ack to exclude files of a certain category by preceding a type with “no”.
So we could see the number of C language files we have by typing:
ack -f --cc | wc -l
And we can do the reverse to see the number of non-C language files by typing:
ack -f --nocc | wc -l
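As a quick sanity check, the two counts should add up to the total number of files ack considers. The sketch below tries this on a throwaway directory with made-up file names, assuming ack's default type definitions (where the cc type covers .c and .h files); it is skipped if ack is not installed.

```shell
# Two C-family files and one Python file in a scratch directory.
dir=$(mktemp -d)
printf 'int x;\n' > "$dir/a.c"
printf 'int y;\n' > "$dir/b.h"
printf 'print(1)\n' > "$dir/c.py"

if command -v ack >/dev/null 2>&1; then
    # --noenv ignores ackrc files; tr strips padding some wc versions add.
    all=$(ack --noenv -f "$dir" < /dev/null | wc -l | tr -d ' ')
    c_only=$(ack --noenv -f --cc "$dir" < /dev/null | wc -l | tr -d ' ')
    not_c=$(ack --noenv -f --nocc "$dir" < /dev/null | wc -l | tr -d ' ')
    echo "all: $all, C: $c_only, non-C: $not_c"
fi
rm -rf "$dir"
```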
What if we want to modify a type categorization? For instance, what if we want to match .less files when we are looking for CSS files? We can see that these are already matched within the type “sass” and type “less” categories, but we can also add them to the CSS category if we would like.
To do this, we can use this general syntax:
ack --type-add=TYPE:FILTER:ARGS
The --type-add option appends additional match rules for a specified TYPE. The FILTER in this case is ext, which means match by file extension. We can then tell it that we want to add those additional extensions.
On its own, this option only applies to the current command and does not do any searching. We can add the search by typing:
ack --type-add=css:ext:sass,scss,less -f --css
This would return any files that end in .less, in addition to the extensions the CSS type already covers. There don’t happen to be any of these files in our project. Either way, this command is not very useful because the new rule only exists for the current command. You can make this permanent by adding it to your ~/.ackrc file:
echo "--type-add=css:ext:sass,less" >> ~/.ackrc
If we want to create an entirely new type, we would use the --type-set option instead. The syntax is entirely the same, the only difference being that it is used to define a non-existent type.
As you have probably gathered, the TYPE from our initial syntax specification is just the category name. The FILTER we saw was the file extension, but we can use other filters as well.
We can match the file name directly by using the is filter. To create a type called example that matches a file called example.txt, we could add this to our ~/.ackrc file:
--type-set=example:is:example.txt
We can also define matches with normal regular expressions by using the match filter. For instance, if we wanted to create a type called “bashcnf” that matches “.bashrc” and “.bash_profile” files, we could escape the leading dot, anchor the expression to the whole file name, and type:
echo '--type-set=bashcnf:match:/^\.bash(rc|_profile)$/' >> ~/.ackrc
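Before making a new type permanent, it can be tried out for a single command. The following is a minimal sketch, assuming an ack version that uses the TYPE:FILTER:ARGS syntax described in this guide; the directory and the notes.txt file are invented for the demo, and the type is selected with the --type=TYPE form mentioned in the help output.

```shell
# Scratch directory with one file the new type should match and one it should not.
dir=$(mktemp -d)
printf 'sample\n' > "$dir/example.txt"
printf 'other\n' > "$dir/notes.txt"

if command -v ack >/dev/null 2>&1; then
    # Define the "example" type for this command only, then list its files.
    listed=$(ack --noenv --type-set=example:is:example.txt -f --type=example "$dir" < /dev/null)
    printf '%s\n' "$listed"
fi
rm -rf "$dir"
```

If the listing shows only example.txt, the same --type-set line can be added to the configuration file.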
As you can see, ack is a very flexible tool for working with programming source code. Even if you are just using it to find files within your Linux environment, most of the time, the increased power of ack will be useful.
By Justin Ellingwood