Wednesday, July 19, 2006

Poor grammar isn't always a bad thing

If you write enough shell scripts, you will eventually fall prey to your own comments. Unless you read my blog, of course, in which case you will have saved hours of frustration!

Let's take a fictitious problem... You need to print the first and third columns of the /etc/passwd file so that a report can be generated correlating user IDs to user names. Being the UNIX monk that you are, you assure your management that a shell script can meet their every need, and there is really no reason to have an ODBC link from Microsoft Access to the passwd file.

You throw together some code, and it looks like this:

#!/usr/bin/ksh
nawk 'BEGIN { FS=":" }
# We don't want to print anything but
# the first and third column
{print $1,$4}' /etc/passwd
exit 0


Looks like a nice tight algorithm, well commented, and generally a job well done. You pat yourself on the back and refill your coffee, ready for the next challenge. Not so fast... First you decide to test that script, and you see the following:

testbox{cgh}$ ./comtst.ksh
./comtst.ksh[6]: syntax error at line 6 : `'' unmatched
testbox{cgh}$


But how can this be? It's a simple script, and the logic is flawless! Let's test it to be sure...

testbox{cgh}$ nawk 'BEGIN { FS=":" } {print $1,$4}' /etc/passwd
root 1
daemon 1
bin 2
sys 3
adm 4
lp 8
uucp 5
nuucp 9
ftp 60001
smmsp 25
listen 4
nobody 60001
noaccess 60002
nobody4 65534
cgh 1000


It works... What is the problem here?

It turns out that the comments in the embedded nawk code are the problem. The apostrophe in "don't" closes the apostrophe that opened the nawk program, so as far as the shell is concerned, the quoted program ends right here:

#!/usr/bin/ksh
nawk 'BEGIN { FS=":" }
# We don'


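So rather than passing nawk a complete program, we hand the shell a stray apostrophe: the one that was meant to close the nawk program, just before /etc/passwd, now opens a new quoted string that never closes. That is the unmatched quote ksh complained about at line 6. Having figured it out, we rewrite the code as follows, and while we are at it, print $3 (the user ID) instead of $4 (the group ID) so the code finally does what the comment says:

#!/usr/bin/ksh
nawk 'BEGIN { FS=":" }
# We do not want to print anything but
# the first and third column
{print $1,$3}' /etc/passwd
exit 0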
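For the record, an apostrophe can survive inside a single-quoted nawk program if you end the quote, add a backslash-escaped apostrophe, and start a new quote ('\''). The shell glues the three pieces back into one argument. This is a minimal sketch assuming a POSIX sh; it calls plain awk, which on most non-Solaris systems is a POSIX awk that accepts the same program as nawk. The function name first_and_third and the sample input line are hypothetical, made up for illustration.

```shell
#!/bin/sh
# Sketch: keep an apostrophe inside an embedded awk comment.
# Inside single quotes every character is literal until the next quote,
# so '\'' ends the quote, adds a backslash-escaped apostrophe, and
# reopens the quote. The shell joins the pieces into one awk argument.
first_and_third() {   # hypothetical helper name
  awk 'BEGIN { FS = ":" }
# We don'\''t want to print anything but
# the first and third column
{ print $1, $3 }' "$@"
}

# Hypothetical passwd-style line, made up for this example.
printf 'demo:x:1001:100:Demo User:/home/demo:/bin/sh\n' | first_and_third
```

It works, but "do not" is a lot easier on the eyes than '\''t.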


There are two morals to this story. First, at the risk of repeating myself like a broken record, don't mix languages in one script (here, ksh and nawk) unless it's absolutely necessary, because you run the risk of obscure interpretation problems. In this case, we could solve the problem by writing the whole thing in Perl, where there's no need to embed a second language.
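Short of switching to Perl, another way to keep the two languages apart is to put the awk program in its own file and load it with awk -f, so the shell never has to parse its comments. A minimal sketch, assuming a POSIX sh; the file name passwd_cols.awk is hypothetical, and in a real script you would ship that file alongside the shell script rather than generate it, as done here only to keep the example self-contained.

```shell
#!/bin/sh
# Sketch: keep the awk program out of the shell's quoting entirely.
prog=${TMPDIR:-/tmp}/passwd_cols.awk   # hypothetical file name
trap 'rm -f "$prog"' EXIT              # remove the file when the shell exits

# A quoted delimiter ('EOF') disables expansion in the here-document,
# so $1, $3 and the apostrophe in "don't" reach the file untouched.
cat > "$prog" <<'EOF'
BEGIN { FS = ":" }
# We don't want to print anything but
# the first and third column
{ print $1, $3 }
EOF

# On Solaris, use nawk here, as the original script does.
awk -f "$prog" /etc/passwd
```

Because awk reads the program from a file, the comments can say whatever they like; only awk's own syntax rules apply.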

The second moral is to always avoid contractions and metacharacters in your comments. It makes for slightly longer comments, but if you strictly avoid the temptation, it is one less thing to worry about. This example is simple enough that the bug is easy to find, but in a complex nawk script with its own functions buried inside a complex shell script, tracking it down can be very frustrating.
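One cheap way to catch this class of bug before it bites is to ask the shell to parse a script without running it. The -n option (ksh -n, or sh -n on any POSIX shell) reads every command, executes none of them, reports the unmatched quote, and exits with a non-zero status. A sketch, assuming a POSIX sh; the script is recreated from a here-document only so the example is self-contained.

```shell
#!/bin/sh
# Sketch: syntax-check a script with -n (parse only, execute nothing).
script=${TMPDIR:-/tmp}/comtst.ksh
trap 'rm -f "$script"' EXIT

# Recreate the broken script from this post. The quoted delimiter ('EOF')
# stops the shell from expanding anything inside the here-document.
cat > "$script" <<'EOF'
#!/usr/bin/ksh
nawk 'BEGIN { FS=":" }
# We don't want to print anything but
# the first and third column
{print $1,$4}' /etc/passwd
exit 0
EOF

if sh -n "$script"; then
  echo "syntax OK: $script"
else
  echo "syntax error in $script"
fi
```

Running the same check on the corrected script, with "do not" in the comment, comes back clean.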

The dark side will tempt you with contractions, but now your Jedi training has equipped you to calm your mind and type out those extra few characters. Until next time, may the code be with you.
