Introduction
These days I spend a lot of time in the bash shell. I use it for ad-hoc scripting or driving several Linux boxes. In my current project we set up a continuous delivery environment and migrate code onto it. I lift code from CVS to SVN, mavenize Ant builds and funnel artifacts into Nexus. One script I wrote determines if a jar that was checked into a CVS source tree exists in Nexus or not. This check can be done via the Nexus REST API. More on this script at the end of the blog. But first let’s have a look at a few bash commands that I use all the time in day-to-day bash usage, in no particular order.
1. find
2. for
3. tr
4. awk
5. sed
6. xargs
7. grep
8. sort
9. Reverse search (CTRL-R)
10. !!
Find searches for files recursively, by default starting in the current directory.
$ find . -name '*.jar'
This command lists all jars in the current directory, recursively. We use this command to figure out if a source tree has jars. If this is the case we add them to Nexus and to the pom as part of the migration from Ant to Maven.
$ find . -name '*.jar' -exec sha1sum {} \;
Find combined with -exec is very powerful. This command lists the jars and computes the SHA-1 checksum of each of them. The sha1sum command is put directly after the -exec flag. The {} is replaced with the path of each jar that is found. The \; is an escaped semicolon that tells find where the command ends.
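A sketch of the same command run against a scratch directory, so it can be tried without touching a real source tree. The file names are made up for illustration. It also swaps `\;` for `+`, which hands many files to a single sha1sum call instead of starting sha1sum once per file.

```shell
# Build a scratch tree with two (empty) jars and one unrelated file.
# All file names here are hypothetical.
workdir=$(mktemp -d)
mkdir -p "$workdir/lib/nested"
touch "$workdir/lib/a.jar" "$workdir/lib/nested/b.jar" "$workdir/readme.txt"

# Quote the pattern so the shell passes *.jar to find untouched.
# '{} +' batches the found files into one sha1sum invocation.
checksums=$(find "$workdir" -name '*.jar' -exec sha1sum {} +)
echo "$checksums"

# Count the lines that end in "jar": one per file found.
jar_count=$(printf '%s\n' "$checksums" | grep -c 'jar$')
echo "jars found: $jar_count"

rm -rf "$workdir"
```

Because the pattern is quoted, the command behaves the same whether or not the directory you start it from happens to contain jar files.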
For loops are often the basis of my shell scripts. I start with a for loop that just echoes some values to the terminal so I can check if it works and then go from there.
$ for i in $(cat items.txt); do echo $i; done;
The word list and each command in the loop body must be terminated by either a newline or a ‘;’. Once the loop works I add more commands between do and done. Note that I could also have used find -exec, but for a script that is more than a one-liner I prefer a for loop for readability.
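One caveat worth a small sketch: a for loop over $(cat ...) splits its input on every whitespace character, not on lines. Assuming the input may contain spaces, a while loop with read keeps each line intact. The file contents below are made up.

```shell
# Sample input; note the space in the second line.
items=$(mktemp)
printf '%s\n' 'first.jar' 'second file.jar' > "$items"

# for + $(cat ...) splits on whitespace: three iterations for two lines.
for_count=0
for i in $(cat "$items"); do
  for_count=$((for_count + 1))
done

# while + read handles one line per iteration: two iterations.
while_count=0
while IFS= read -r line; do
  echo "item: $line"
  while_count=$((while_count + 1))
done < "$items"

echo "for: $for_count, while: $while_count"
rm -f "$items"
```

For simple lists without spaces the for loop is fine; the while form is the safer choice once file names come from users or from find.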
Tr transliterates. You can use it to get rid of certain characters or to replace them one for one.
$ echo 'Com_Acme_Library' | tr '_A-Z' '.a-z'
This lowercases the text and replaces underscores with dots, printing com.acme.library.
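A short sketch of the two uses mentioned above, replacing and deleting, plus the squeeze option. The input strings are just sample data.

```shell
# Replace character by character: '_' becomes '.', 'A'-'Z' become 'a'-'z'.
dotted=$(echo 'Com_Acme_Library' | tr '_A-Z' '.a-z')
echo "$dotted"

# -d deletes every character in the set instead of replacing it.
no_underscores=$(echo 'Com_Acme_Library' | tr -d '_')
echo "$no_underscores"

# -s squeezes runs of the same character into a single one.
squeezed=$(echo 'a   b    c' | tr -s ' ')
echo "$squeezed"
```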
$ echo 'one two three' | awk '{ print $2, $3 }'
Prints the second and third column of the input. Awk is of course a full-blown programming language, but I use snippets like this a lot for selecting columns from the output of another command.
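Two variations on column selection that come up often, shown as a sketch. The colon-separated line only mimics the /etc/passwd layout; its contents are made up.

```shell
# -F sets the field separator; here a colon instead of whitespace.
fields=$(echo 'alice:x:1000:1000:/home/alice' | awk -F: '{ print $1, $3 }')
echo "$fields"

# NF holds the number of fields, so $NF is always the last column.
last=$(echo 'one two three' | awk '{ print $NF }')
echo "$last"
```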
Sed is the Stream EDitor. It is a complete tool in its own right, yet I use it mostly for small substitutions.
$ echo 'foo bar baz' | sed -e 's/foo/quux/'
Replaces the first foo on each line with quux.
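A sketch of the substitution variants I would expect to need most: first match only, every match, and an alternative delimiter for text that contains slashes. The input strings are sample data.

```shell
# Without a flag, only the first match on each line is replaced.
first_only=$(echo 'foo bar foo' | sed -e 's/foo/quux/')
echo "$first_only"

# The g flag replaces every match on the line.
all=$(echo 'foo bar foo' | sed -e 's/foo/quux/g')
echo "$all"

# Any character can act as the delimiter; '|' avoids escaping slashes.
path=$(echo '/usr/local/lib' | sed -e 's|/usr/local|/opt|')
echo "$path"
```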
Xargs runs a command on the items it reads from standard input.
$ cat jars.txt | xargs -n1 sha1sum
Run sha1sum on every line in the file. This is another for loop or find -exec alternative. I use this when I have a long pipeline of commands in a one-liner and want to process every line in the end result.
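Xargs splits its input on whitespace, so a path that contains a space arrives as two arguments. A sketch of the usual workaround, assuming a find that supports -print0 and an xargs that supports -0 (the GNU and BSD versions both do). The file names are made up, and basename stands in for sha1sum to keep the output short.

```shell
workdir=$(mktemp -d)
touch "$workdir/plain.jar" "$workdir/with space.jar"

# -print0 and -0 separate names with NUL bytes, so spaces survive.
# -n1 still runs the command once per file.
listing=$(find "$workdir" -name '*.jar' -print0 | xargs -0 -n1 basename | sort)
echo "$listing"

rm -rf "$workdir"
```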
Here are some grep features you might not know:
$ grep -A3 -B3 keyword data.txt
This will list the match of the keyword in data.txt including 3 lines after (-A3) and 3 lines before (-B3) the match.
$ grep -v keyword data.txt
Inverse match. Prints every line that does not contain keyword.
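To see what these options do, here is a sketch against a five-line sample file with one line of context instead of three. The file contents are made up; -c is one more option, which counts matching lines instead of printing them.

```shell
data=$(mktemp)
printf '%s\n' 'line1' 'line2' 'keyword here' 'line4' 'line5' > "$data"

# One line of context before and after the match.
context=$(grep -A1 -B1 keyword "$data")
echo "$context"

# -v keeps only the lines that do NOT match.
others=$(grep -v keyword "$data")
echo "$others"

# -c prints the number of matching lines.
matches=$(grep -c keyword "$data")
echo "$matches"

rm -f "$data"
```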
Sort is another command often used at the end of a pipeline. For numerical sorting use
$ sort -n
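The difference is easiest to see side by side. A small sketch with made-up numbers:

```shell
# Default sort compares text, so "10" and "100" come before "9".
lexical=$(printf '%s\n' 10 9 100 | sort)
echo "$lexical"

# -n compares by numeric value.
numeric=$(printf '%s\n' 10 9 100 | sort -n)
echo "$numeric"
```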
Reverse search isn’t a real command but it’s really useful. Instead of typing history and looking up a previous command, press CTRL-R, start typing and have bash complete the command from your history. Press CTRL-R again to step back to older matches, and use escape to quit reverse search mode. When you press CTRL-R your prompt will look like this:
(reverse-i-search)`':
!! is pronounced ‘bang-bang’. It repeats the previous command. Here is the cool thing:
$ !!:s/foo/bar
This repeats the previous command, but with the first occurrence of foo replaced by bar. Useful if you entered a long command with a typo: instead of retyping the command, fix the argument this way.
Bash script – checking artifacts in Nexus
Below is the script I talked about. It loops over every jar and dll file below the current directory, calls Nexus via curl and, when Nexus knows the file, writes a pom dependency snippet. It also adds a status column at the end of the output, either an OK or a KO, which makes the output easy to grep for further processing.
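#!/bin/bash
# Check for every jar/dll below the current directory whether Nexus knows its SHA-1.
ok=0
jars=0
# Word splitting on the find output means paths with whitespace are not supported.
for jar in $(find "$(pwd)" -name '*.jar' -o -name '*.dll' 2>/dev/null)
do
  ((jars+=1))
  output=$(basename "$jar")-pom.xml
  sha1=$(sha1sum "$jar" | awk '{print $1}')
  response=$(curl -s "http://oss.sonatype.org/service/local/data_index?sha1=$sha1")
  if [[ $response =~ groupId ]]; then
    # Known to Nexus: report OK and append the coordinates to the snippet file.
    ((ok+=1))
    echo "findjars $jar OK"
    echo "" >> "$output"
    echo "$response" | grep groupId -A3 -m1 >> "$output"
    echo "" >> "$output"
  else
    echo "findjars $jar KO"
  fi
done
# Numeric comparison; [[ $jars > 0 ]] would compare strings.
if (( jars > 0 )); then
  echo "findjars Found $ok/$jars jars/dlls. See -pom.xml file for XML snippet"
  exit 1
fi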
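The OK/KO decision and the snippet extraction can be tried offline. The sketch below feeds the same grep a canned response instead of calling Nexus. The XML is a hypothetical, trimmed-down stand-in: the real response format is an assumption here and may well differ. It uses grep -q in place of the [[ =~ ]] test, which checks the same thing.

```shell
# Hypothetical response body; not copied from a real Nexus server.
response='<artifact>
  <groupId>commons-lang</groupId>
  <artifactId>commons-lang</artifactId>
  <version>2.6</version>
  <packaging>jar</packaging>
</artifact>'

status=KO
snippet=
# A response that mentions groupId counts as a hit.
if echo "$response" | grep -q groupId; then
  status=OK
  # -m1 stops at the first matching line, -A3 adds the three lines after it.
  snippet=$(echo "$response" | grep groupId -A3 -m1)
fi

echo "status: $status"
echo "$snippet"
```

Keeping the check separate from the network call like this makes it easier to adjust the grep when the response layout turns out to be different.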
Conclusions
It is amazing how much scripting you can do when you combine just these commands via pipes and redirection! It’s like Pareto’s law applied to shell scripting: 20% of the features of bash and related tools provide 80% of the results. The basis of most scripts can be a for loop. Inside the for loop the data can be transliterated with tr, filtered with grep, rewritten with sed and finally run through another program via xargs.
References
The Bash Cookbook is a great overview of how to solve common problems using bash. It also teaches good bash coding style.
Nice list – always good to have a set of bash tools in the toolbox.
I can add a combination of the above mentioned tools I find useful when debugging classloading issues in java:
$ for i in `find . -name *.jar`; do echo $i; jar tvf $i | grep [a name of a class]; done
It searches for a given classname in all the jars found in the directory or below.
If the class is found it gets printed together with the name of the jarfile it was located in.
Very handy if, e.g., a different version of a class is loaded than the one you compiled against.
Really neat tricks. I especially cannot believe I never knew #7. I will now dive into the Bash Cookbook for more inspiration.
Hi Søren! I know what you mean, I also discovered some of the commands only recently 🙂 Btw, these are also good resources on bash: http://www.tldp.org/LDP/Bash-Beginners-Guide/html/Bash-Beginners-Guide.html and http://tldp.org/LDP/abs/html/
Hi Jacob!
Nice, I am stealing that one! 😉
Nice tricks. I never knew #10. Thanks!
As for #9 and #10, I’ve learned the hard way to be very careful with history tricks.
Was one of your typos a leading space before the long command? Then it’s not in your history, and the command before it gets repeated instead. This is a trivial example. (In case the formatting gets borked, line 2 is supposed to have a spurious leading space)
1$ rm -rf foo/*
2$ my_script_that_logs_to_foo()
3$ !!:s/foo/bar
Uhm.. Ooops!
Why not use arrow up instead? You’ll very likely never use bash with a keyboard that doesn’t have arrow keys.
The only history command I regularly use is !$, which is substituted with the last argument of the previous command.
Yeah you gotta be careful indeed. !$ is useful also.
I’ve actually done that, and worse. The worst of all was near the end of a week long running number crunch job in college. A bug in a shell script caused the stdout log to be written to a giant file named “*” (without the quotes) in the same directory as the calculation results. Sometimes the fingers start typing before the brain can intervene…
@Jacob – It seems you have to put quotes around the *.jar pattern in the find command to prevent the shell from expanding the wildcard. Like this:
#!/bin/bash
for i in $(find -name "*.jar");
do
jar tvf $i | grep $1
done
Otherwise I get:
find: paths must precede expression: abc.jar
Usage: find [-H] [-L] [-P] [-Olevel] [-D help|tree|search|stat|rates|opt|exec] [path…] [expression]