Unbounded Above

Alex Teichman

Quickly Navigating Large Repositories From the Terminal

The tools grep, ack, and ag are great for searching repositories, but often we don’t just want to find a match; we want to navigate to it and make changes. What follows are some Bash functions for rapidly searching a repository and opening selected matches.

This is my primary method for navigating source repositories. I’ve written enough emails with this content that I decided it was time to write it up properly once and for all. I’ll discuss C++ in particular here, but the functions are easy to adapt to whatever language you work in.

cgrep and yank

Here’s a list of the most frequently used commands in my Bash history, with a count in the left column.

Most-used commands
17959 l
11922 cd
10079 make
4586 git
3986 hg
3405 cgrep      <------------
2874 cat

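So what is this cgrep thing, and why is it my 6th-most-used command? Given a pattern such as imread, it greps through all C++ source files and prints each match as a numbered line containing the file name, the line number, and the matching text.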

This is similar to the output of ack --nogroup --cpp imread, but with the addition that you can quickly open one of the matches and navigate to the appropriate line number using yank.

$ cgrep imread | yank 6

Running this will open ./src/program/image_cut.cpp in a running Emacs window and move to the imread call on line 253. I use emacsclient to make this possible, and presumably you can do something similar with Vim or whatever editor you use.
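Stripped of the editor call, the core of yank is small. It picks the entry with the requested number, takes its file:line: field, and reorders that into the +LINE FILE form that emacsclient (and Vim) accept on the command line. Here is a minimal sketch of just that parsing step. The name yank_args is hypothetical, and the sample input is illustrative, modeled on the imread example above rather than copied from real output:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not one of the original functions): read numbered
# cgrep output on stdin and print "+LINE FILE" for entry number $1.
yank_args() {
    grep "^\s*$1\s" |                      # keep the line whose nl number is $1
        awk '{print $2}' |                 # field 2: "file:LINE:..." (assumes no spaces in paths)
        awk -F: '{print "+" $2 " " $1}'    # reorder as "+LINE file"
}

# Illustrative input in the shape cgrep produces: number, tab, grep -n match.
printf '     5\t./src/a.cpp:10:  imread(p);\n     6\t./src/program/image_cut.cpp:253:  img = imread(path);\n' \
    | yank_args 6
# prints: +253 ./src/program/image_cut.cpp
```

In the real yank, that output is handed to xargs with $EDITOR, so with EDITOR='emacsclient -n' the command that actually runs is emacsclient -n +253 ./src/program/image_cut.cpp.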

It looks like this.


More examples from my Bash history

Because this is all happening on the command line, you can apply whatever Unix filters might be useful. Here I wanted to see all usages of isinf except those in the PCL library.

$ cgrep isinf | grep -v './pcl_trunk/'
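Filtering like this plays well with yank because cgrep numbers its output before you filter it. The surviving lines keep their original numbers, so the number you read off the screen is still the one to pass to yank. A small illustration on made-up lines (the paths are hypothetical):

```shell
#!/usr/bin/env bash
# Made-up cgrep-style output (nl-style number, tab, file:line:text);
# the paths are illustrative, not from a real checkout.
numbered=$(printf '%6d\t%s\n' \
    1 './pcl_trunk/common/math.h:40:  return isinf(x);' \
    2 './src/geometry.cpp:88:  if (isinf(d)) continue;' \
    3 './pcl_trunk/io/pcd.cpp:12:  assert(!isinf(v));')

# Drop the PCL matches. The numbers were assigned before filtering,
# so the surviving line is still entry 2.
filtered=$(printf '%s\n' "$numbered" | grep -v './pcl_trunk/')
printf '%s\n' "$filtered"

# Selecting entry 2 the way yank does still finds the geometry.cpp match.
printf '%s\n' "$filtered" | grep "^\s*2\s"
```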

Because cgrep uses grep under the hood, all the usual options you know and love still apply. Here I wanted to see usages of the Eigen library’s InnerIterator, with 10 lines of trailing context after each match. This makes it easy to scan for errors, then jump to the location with yank to fix them.

$ cgrep InnerIterator -A10
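Context options are also why yank contains a sed step. With -A, -B, or -C, grep -n marks match lines as file:LINE: but context lines as file-LINE-, and yank rewrites the dashes into colons so either kind of line can be opened. A minimal sketch of that normalization on illustrative lines; to_editor_args is a hypothetical name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: normalize one grep -n line (match or context) to "+LINE file".
# yank itself uses field $2 because its input carries the nl number first.
to_editor_args() {
    awk '{print $1}' |                  # keep "file:LINE:" or "file-LINE-"
        sed 's/-\([0-9]*\)-/:\1:/g' |   # context "file-LINE-" -> "file:LINE:"
        awk -F: '{print "+" $2 " " $1}' # reorder as "+LINE file"
}

echo './src/sparse.cpp:41:  for (InnerIterator it(m, k); it; ++it)' | to_editor_args
# prints: +41 ./src/sparse.cpp
echo './src/sparse.cpp-42-    sum += it.value();' | to_editor_args
# prints: +42 ./src/sparse.cpp
```

The substitution assumes no file path itself contains a run like -2- (a dash, digits, and another dash); such a path would be split in the wrong place.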

We can also define a variation on cgrep, hgrep, that searches only header files. Here I was looking for compound assignment operators in class declarations.

$ hgrep 'operator[^=!]='
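To see what that pattern catches: [^=!] requires exactly one character other than = or ! between operator and =, which admits +=, *=, and friends while rejecting operator==, operator!=, and plain operator=. A quick check on some made-up declarations:

```shell
#!/usr/bin/env bash
# Made-up declarations to exercise the hgrep pattern.
decls=$(printf '%s\n' \
    'Vec& operator+=(const Vec& o);' \
    'Vec& operator*=(double s);' \
    'bool operator==(const Vec& o) const;' \
    'bool operator!=(const Vec& o) const;' \
    'Vec& operator=(const Vec& o);' \
    'bool operator<=(const Vec& o) const;')

printf '%s\n' "$decls" | grep 'operator[^=!]='
# prints:
# Vec& operator+=(const Vec& o);
# Vec& operator*=(double s);
# bool operator<=(const Vec& o) const;
```

Note that the comparison operators <= and >= slip through as well, while <<= and >>= are missed because they put two characters before the =.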

Details

Here’s how cgrep and friends are implemented; you can paste this into your .bashrc. To hook yank up to emacsclient, you’ll probably also want export EDITOR='emacsclient -n' in your .bashrc and (server-start) in your Emacs init file.

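# List C/C++ sources and headers below the current directory.
# Note: -regextype is a GNU find option.
function cfiles {
    find . -regextype posix-egrep -regex '.*\.h$|.*\.hpp$|.*\.c$|.*\.cpp$|.*\.cc$'
}

# grep every C/C++ file (-n: line numbers, -H: always print the file name,
# even if only one file is searched), number each result with nl so yank can
# select it, then filter the numbered lines through grep again with the same
# arguments. Errors are discarded; </dev/null keeps grep from waiting on the
# terminal when no files are found.
function cgrep {
    2>/dev/null grep -Hn "$@" $(cfiles) </dev/null | nl | grep "$@"
}

# Header files only.
function hfiles {
    find . -regextype posix-egrep -regex '.*\.h$|.*\.hpp$'
}

function hgrep {
    2>/dev/null grep -Hn "$@" $(hfiles) </dev/null | nl | grep "$@"
}

# Python files, skipping any path containing '#' (e.g. Emacs lock files like .#foo.py).
function pyfiles {
    find . -regextype posix-egrep -regex '.*\.py$' | grep -v '#'
}

function pygrep {
    2>/dev/null grep -Hn "$@" $(pyfiles) </dev/null | nl | grep "$@"
}

# Usage: cgrep PATTERN | yank N
# Picks entry N, turns "file:LINE:" (or "file-LINE-" for context lines
# from -A/-B/-C) into "+LINE file", and opens it with $EDITOR.
function yank {
    grep "^\s*$1\s" | awk '{print $2}' | sed 's/-\([0-9]*\)-/:\1:/g' | awk -F: '{print "+" $2 " " $1}' | xargs ${EDITOR:?EDITOR must be set.}
}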
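One portability caveat: -regextype is a GNU findutils option, so cfiles as written won’t run with the BSD find that ships with macOS. Here is a sketch of an equivalent that sticks to POSIX find primaries. The name cfiles_posix is hypothetical, and unlike the original it also restricts results to regular files with -type f:

```shell
#!/usr/bin/env bash
# Hypothetical portable variant of cfiles using only POSIX find primaries.
cfiles_posix() {
    find . -type f \( -name '*.h' -o -name '*.hpp' -o -name '*.c' \
        -o -name '*.cpp' -o -name '*.cc' \)
}

# Try it in a scratch directory (in a subshell, so the caller's cwd is untouched).
demo_dir=$(mktemp -d)
(
    cd "$demo_dir" || exit 1
    mkdir src
    touch src/a.cpp src/b.h src/c.py src/d.cc
    cfiles_posix | sort
)
rm -rf "$demo_dir"
# prints:
# ./src/a.cpp
# ./src/b.h
# ./src/d.cc
```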

Relatedly, cfiles is also handy for large search-and-replace operations.

$ sed -i 's/NameMapping2/NameMapping/g' $(cfiles)
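One thing to watch with this: GNU sed -i writes a fresh copy of every file it is given, even when nothing is substituted. Running it over everything cfiles lists therefore updates every file’s modification time and can make the next make rebuild far more than necessary. A sketch that edits only the files that actually contain the old name, using grep -l and GNU xargs -r, demonstrated on two scratch files:

```shell
#!/usr/bin/env bash
# Scratch files standing in for a source tree; only a.cpp mentions the old name.
demo_dir=$(mktemp -d)
printf 'NameMapping2 m;\n' > "$demo_dir/a.cpp"
printf 'int x;\n' > "$demo_dir/b.cpp"

# grep -l prints only the files that contain a match; xargs -r (a GNU extension)
# skips running sed entirely when grep finds nothing. In a real checkout the
# file list would be $(cfiles) instead of the two files named here.
grep -l 'NameMapping2' "$demo_dir/a.cpp" "$demo_dir/b.cpp" \
    | xargs -r sed -i 's/NameMapping2/NameMapping/g'

cat "$demo_dir/a.cpp"   # prints: NameMapping m;
```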

If you want a list of your own most frequently used commands, something like the following will do. The awk field number probably won’t match on your machine, because I use a custom HISTTIMEFORMAT, but you get the idea.

$ history | awk '{print $6}' | sort | uniq -c | sort -n
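With HISTTIMEFORMAT unset, Bash’s history prints just an entry number followed by the command line, so the command name is field 2. Below is a sketch of that variant. It runs on sample lines because the history builtin is empty in a non-interactive script; in your own shell you would pipe history into it instead. The function name top_commands is hypothetical, and sort -rn puts the most-used commands first, as in the list at the top:

```shell
#!/usr/bin/env bash
# Hypothetical helper: count commands by name from default-format history
# lines (entry number, then the command line), most frequent first.
top_commands() {
    awk '{print $2}' | sort | uniq -c | sort -rn
}

# Sample lines standing in for `history` output; interactively: history | top_commands
printf '%s\n' '    1  ls' '    2  make' '    3  ls -l' '    4  cd src' '    5  ls' \
    | top_commands
# the first line of output is ls, with a count of 3
```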

Changelog

  • 2013-12-10: Added screencast example, clarified that the point is fast navigation rather than just search.
  • 2013-12-13: Made video scale automatically.