Friday, January 18, 2008

Queuing Efficiency

On my way through the office door this morning I had an experience that needs to be recorded, so that in the future I can simply point to this post instead of ranting about it all over again. That should break the vicious cycle of rehashing.

In addition to my career in systems engineering and my small photography business, I take martial arts classes. A few days ago I managed to pull a hamstring and sprain a toe on the opposite leg. Don't get me wrong - it's worth it. But these injuries matter to the story because they explain why I can't move quickly. I'm not limping, I'm just moving cautiously.

The weather was cold, and the sidewalks were covered in treacherous ice. A bitter wind cut through me as I approached the building. Moving at a determined but non-rapid pace, I noticed someone about a mile (or three) in front of me, presumably intending to enter the same doors. And then it happened.

They decided to take it upon themselves to hold the door open. At that distance they would practically need a telescope or radar to know I was even heading for the same door, yet they stood there holding it open while the arctic air flooded the entry of our building.

Here's the deal, people. If someone behind you stands a chance of having the door slam in their face, then you hold it open. If they are carrying a heavy load and don't have an arm free, you can wait until they get close and open the door for them. If they are more than five steps away, you move along. Let's stop the self-gratifying good deed of holding a door open just to see whether its recipient will start jogging out of guilt over all the cold air you're letting into the building. If they have enough room to jog, they probably don't need you to hold the door.

Wednesday, January 16, 2008

Got dependencies?

As anyone who follows my blog has learned, I'm a packaging junkie. I write packages for everything I deploy, and happily plunk them onto my JumpStart server, where they perform their duty in a predictable and maintainable way. Does it get any better than this? Well, actually... it got pretty rough this week, and my troubleshooting skills got a solid workout.

We are in the process of moving our standard deployment from Solaris 9 with SRM project containers to Solaris 10 Zones. Zones are surprisingly simple considering the rich set of benefits they provide. In a basic Solaris environment it takes very little time to set up a test box with multiple zones. But what if you don't have a basic environment?

We have packaged everything. Most of our /etc files that aren't stock are edited by class action scripts or postinstall routines. We've deployed hundreds of servers with this configuration, and it worked flawlessly until we started deploying Zones. Then suddenly our zones couldn't talk to the Directory, were missing resolv.conf, and suffered all kinds of other cascading problems. Ugh. Where to start?

The first thing I did was check the error messages. Pretty clever, no?

The file /export/zones/testzone01/root/var/sadm/system/logs/install_log contains a log of the zone installation.

Ok, let's see what's in the log...

WARNING: setting mode of /var/spool/cron/crontabs/root to default mode (644)
ERROR: attribute verification of /export/zones/testzone01/root/var/spool/cron/crontabs/root failed
pathname does not exist

Installation of CGHtest on zone testzone01 partially failed.

Well that's strange. Let's see what the registry says about root's crontab:

$ grep "/var/spool/cron/crontabs/root" /var/sadm/install/contents
/var/spool/cron/crontabs/root e cron 0600 root sys 48 3760 1200431631 SUNWcsr:cronroot

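The root crontab is owned by SUNWcsr. It turns out that when the global zone created the sparse root zone, it tried to install CGHtest before SUNWcsr was installed. CGHtest inherits the crontab's attributes, but the file didn't exist yet, so attribute verification failed with "pathname does not exist". The root of my problem was a lapse in a habit that I'm normally quite rigorous about enforcing.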

Any time I use question marks in my prototype file to inherit attributes from an already-installed package, I always, always, always stop immediately to look up that inherited object in the registry and add its parent package to the depend file. Here's an example prototype file displaying inheritance:

$ cat prototype
i pkginfo
i depend
i copyright
i i.cron
i r.cron
d none var ? ? ?
d none var/spool ? ? ?
d none var/spool/cron ? ? ?
d none var/spool/cron/crontabs ? ? ?
e cron var/spool/cron/crontabs/root ? ? ?
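Since that lookup is exactly the step I skipped, here's a minimal sketch of how it could be scripted. `inherited_owners` is a hypothetical helper, not a Solaris tool: it picks out every prototype entry with a `?` in its mode, owner, or group column, finds the matching absolute path in the package registry, and prints the packages that own it. It assumes the prototype paths are relative to a base directory of `/`, that the registry follows the contents(4) field layout, and (another assumption, based on the `SUNWcsr:cronroot` entry above) that anything after a colon in a package token isn't part of the package name. Treat the output as candidates for the depend file, not a finished one.

```shell
# inherited_owners: HYPOTHETICAL helper, not part of Solaris.
# Prints the packages that own every object a prototype(4) file inherits
# with "?" attributes, as candidates for the depend(4) file.
# Assumptions: prototype paths are relative to BASEDIR=/, and the registry
# uses the contents(4) field layout.
# Usage: inherited_owners PROTOTYPE [CONTENTS]
inherited_owners() {
    proto=$1
    contents=${2:-/var/sadm/install/contents}

    awk '
        # Skip blank lines and comments in either file.
        NF == 0 || $1 ~ /^#/ { next }

        # First file: the prototype. Remember every path with a "?" attribute.
        FILENAME == ARGV[1] {
            o = ($1 ~ /^[0-9]+$/) ? 1 : 0   # optional part number shifts columns
            t = $(1 + o)
            if (t == "i" || t ~ /^!/) next    # info files and ! commands
            if ($(4 + o) == "?" || $(5 + o) == "?" || $(6 + o) == "?") {
                p = $(3 + o)
                sub(/=.*/, "", p)             # path=source -> path
                if (p !~ /^\//) p = "/" p     # assumption: BASEDIR=/
                want[p] = 1
            }
            next
        }

        # Second file: the registry. Print the owners of remembered paths.
        {
            path = $1
            sub(/=.*/, "", path)              # link entries are path=target
            if (!(path in want)) next
            t = $2
            # Package names start after the type-specific attribute fields.
            if (t == "f" || t == "e" || t == "v")      first = 10
            else if (t == "d" || t == "x" || t == "p") first = 7
            else if (t == "b" || t == "c")             first = 9
            else if (t == "l" || t == "s")             first = 4
            else next
            for (i = first; i <= NF; i++) {
                pkg = $i
                sub(/:.*/, "", pkg)           # assumption: ":suffix" is not the name
                print pkg
            }
        }
    ' "$proto" "$contents" | sort -u
}
```

Run it as `inherited_owners prototype` on the build host, or pass a second argument to point it at a copy of the contents file. Shared directories such as /var are usually claimed by several packages, so you'd still pick the one that actually matters (SUNWcsr here) and write its full P line, package name included, into depend by hand.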

This was never a problem in our traditional JumpStart environment because the JET custom packages are installed after the core Solaris packages, SUNWcsr and SUNWcsu included. Fortunately, it's easy to control the order of package installation by declaring dependencies in the depend(4) file.

$ cat depend
P SUNWcsr Core Solaris, (Root)
$ grep depend ./prototype
i depend

Problem solved! Of course, the problem never would have happened if I'd remembered my Jedi training. It's good to be humbled on a semi-regular basis.