Friday, May 18, 2007

Apache in Solaris 10: 3 Simple Things I Would Change

The Apache legacy run control script in Solaris 10 (/etc/init.d/apache) provides an excellent example of a few practices to avoid when writing init scripts.

Take a look at the code snippet below:

if [ ! -f ${CONF_FILE} ]; then
    exit 0
fi


Are you kidding me? Of course this is easy to debug, but let's look at what it does anyway: if you ask to start Apache and the configuration file (/etc/apache/httpd.conf) is missing, the script prints nothing and exits with a code of zero. In case you didn't catch the first four words of this paragraph, I'll repeat them. Are you kidding me?

Here's a simple improvement...

if [ ! -f ${CONF_FILE} ]; then
    echo "ERROR: ${CONF_FILE} not found. Exiting."
    exit 1
fi


The first change was to exit with a non-zero status. Zero is the UNIX standard exit code representing successful completion. If the configuration file is missing and you request a startup, it should NOT exit with a zero status.

The second change is to provide a concise error message indicating why the exit code is going to be non-zero. There is no benefit to bolstering the cryptic nature of UNIX. In my mind the best systems are designed such that a tired SA at 4AM has a reasonable chance of accurate debugging and corrective action.
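To see why the exit status matters, here is a minimal sketch of how a caller might act on it. The function name check_conf and the path used in the demonstration are illustrative assumptions, not part of the Solaris script, and sending the message to stderr is my own choice rather than something the original does.

```shell
#!/bin/sh
# Hypothetical helper: the same check as above, wrapped in a function so a
# caller can branch on the result. Not taken from /etc/init.d/apache.
check_conf() {
    if [ ! -f "$1" ]; then
        # Assumption: errors go to stderr so they don't mix with normal output.
        echo "ERROR: $1 not found. Exiting." >&2
        return 1
    fi
    return 0
}

# A wrapper script or monitoring job can now tell failure from success
# instead of seeing a zero status in both cases.
if check_conf /nonexistent/httpd.conf 2>/dev/null; then
    echo "config present, safe to start"
else
    echo "start refused, status was non-zero"
fi
```

With the original "exit 0" behavior the first branch would be taken even though nothing was started.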

Having said all this, the reason the code is necessarily convoluted is that the not-yet-configured service has an active set of init scripts in the run control directories.

cgh@testbox{etc}$ ls -i /etc/init.d/apache
21813 /etc/init.d/apache*
cgh@testbox{etc}$ find /etc/rc?.d -inum 2813
/etc/rc0.d/K16apache
/etc/rc1.d/K16apache
/etc/rc2.d/K16apache
/etc/rc3.d/S50apache
/etc/rcS.d/K16apache
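The ls -i / find -inum pairing above works because the rc entries are hard links to the init.d script, so they all share one inode. Here is a small sketch of the same lookup run against a scratch directory. The function name list_hard_links and the directory layout are made up for illustration, and it assumes everything lives on one filesystem, since inode numbers are only unique per filesystem.

```shell
#!/bin/sh
# list_hard_links FILE DIR: print every path under DIR that shares FILE's inode.
# Hypothetical helper, not part of Solaris.
list_hard_links() {
    inode=`ls -i "$1" | awk '{print $1}'`
    find "$2" -inum "$inode" -print | sort
}

# Demonstration in a throwaway directory that imitates the rc layout.
demo=`mktemp -d`
mkdir "$demo/init.d" "$demo/rc3.d"
echo '#!/bin/sh' > "$demo/init.d/apache"
ln "$demo/init.d/apache" "$demo/rc3.d/S50apache"   # hard link, same inode

# Prints both paths: the script itself and its rc3.d link.
list_hard_links "$demo/init.d/apache" "$demo"
rm -rf "$demo"
```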


So the root cause of our problem is that someone decided to make it easy for someone who doesn't understand the Solaris Run Control facility to start Apache by simply creating the httpd.conf file. Is that really a good idea? I would argue that for many reasons it's a bad practice. If a service is not configured to run, it should not be active in any run level.

The third detail I would change is Solaris' default behavior of installing active links in the legacy rc directories. I would instead deliver the service with an SMF manifest that adheres to standards.
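Under SMF, the missing-configuration case has a dedicated exit status, so the two fixes above carry over naturally. What follows is only a sketch of what the check could look like in a start method, not the method Sun ships. The function name check_config is hypothetical; the SMF_EXIT_* constants come from /lib/svc/share/smf_include.sh on Solaris, and the fallback values are there only so the sketch runs on a system without that file.

```shell
#!/bin/sh
# Sketch of a configuration check for an SMF start method.
# check_config is a hypothetical name; only the SMF_EXIT_* constants are real.

# On Solaris this file defines the SMF_EXIT_* constants.
if [ -r /lib/svc/share/smf_include.sh ]; then
    . /lib/svc/share/smf_include.sh
fi
# Fallbacks (same values as smf_include.sh) for systems without SMF.
: "${SMF_EXIT_OK:=0}"
: "${SMF_EXIT_ERR_CONFIG:=96}"

check_config() {
    if [ ! -f "$1" ]; then
        echo "ERROR: $1 not found." >&2
        # Tells the restarter this is a configuration problem.
        return "$SMF_EXIT_ERR_CONFIG"
    fi
    return "$SMF_EXIT_OK"
}
```

A start method would call check_config with its configuration path and exit with the returned status before trying to launch the daemon.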

None of this impacts the otherwise excellent web server that Sun has integrated into their OS, and I'm grateful that Sun has provided it in their standard OS rather than leaving it to the semi-integrated Companion CD. I would, however, like to see that integration brought up to Jedi standards.

5/21/07 Postscript: I probably should have made it clear that the Apache2 server is implemented nicely using SMF, and is probably what you ought to be using on Solaris 10 if you've decided to forgo the JES Web Server. I don't think that excuses the older Apache server from maintaining Jedi discipline, but it does move the issue a bit toward the background.
