Thursday, August 31, 2006

Initology 101: A lesson in proper use of Solaris run control scripts

Starting and stopping applications through init scripts ought to be a simple thing that doesn't cause much debate, but in fact it's just the opposite. I routinely see servers with functional but non-standard artifacts nested in the rc directories. I also hear many justifications for these configurations; some reasonable, others somewhat less so. But in the end, I believe that a systems engineering approach to using init scripts will filter the options, and this article intends to do just that.

There are three specific conventions that I want to address:

1. Which run levels should be used for starting and stopping typical applications?
2. Should a symbolic link (sym-link) or hard link be used?
3. How should a link be disabled?

Let's begin by identifying the correct run levels to start and stop a common application. By common application I mean something that is not a core part of the operating system, but rather sits in the application layer and depends on the operating environment's core features. Oracle and web servers are typical examples of what I consider common applications. Knowing that the Solaris Operating Environment has well-defined run level states, the first step is to consult the web site for your particular Solaris version and refer to those definitions. Let's take the case of Solaris 9 (9/05), which is the last release in the Solaris 9 series. I am not going to address Solaris 10 in this context because it uses the new Service Management Facility, part of the new Predictive Self Healing feature, to replace init scripts.

According to the Solaris 9 (9/04) System Administration Guide: Basic Administration, Section 8: Run Levels and Boot Files, we have the following run levels and explanations:

Run Level  Description
0          Shut down all processes and power down to ok> prompt (sparc).
S          Run as a single user with some file systems mounted and accessible.
1          Administrative state with access to all file systems, but no user logins permitted.
2          Multi-user state. For normal operations. Multiple users can access the system and all file systems. All daemons are running except for the NFS server daemons.
3          Multi-user state. For normal operations with NFS resources shared. This is the default run level for the Solaris environment.
4          Alternative multi-user state. This is not used by Solaris, but is available for site customization if needed. I recommend NOT using it.
5          Power down after shutting down all processes.
6          Reboot the system.

In theory, we need to consider that a system may transition from any run level to any other run level. This means that when the system enters run level S, if our application is running, we need to ensure it is stopped. The same thing goes for 0, 1, and 2. Run level 3 is the conventional system state associated with end user applications being loaded. Putting this into practice, we will need to install the following links to fully integrate with Solaris' run levels: a start (S) link in /etc/rc3.d, and kill (K) links in /etc/rc0.d, /etc/rcS.d, /etc/rc1.d, and /etc/rc2.d.
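
As a rough sketch of that link layout, a small shell function could create all five links in one pass. The function name, the script name "fooapp", and the sequence numbers 95 and 05 are placeholders of mine, not anything Solaris prescribes; pick sequence numbers that fit your own start and stop ordering. The first argument is the directory that holds init.d and the rc?.d directories, which would be /etc on a real server.

```shell
# install_rc_links RC_ROOT NAME START_SEQ KILL_SEQ
# Hypothetical helper: hard-link RC_ROOT/init.d/NAME into the run control
# directories so the application starts in run level 3 and is stopped on
# entry to run levels 0, S, 1 and 2.
install_rc_links() {
    rc_root=$1      # directory holding init.d and rc?.d (e.g. /etc)
    name=$2         # name of the init script under init.d
    start_seq=$3    # two-digit start order (assumed value, e.g. 95)
    kill_seq=$4     # two-digit kill order (assumed value, e.g. 05)

    # Start link: the application comes up only in run level 3.
    ln "$rc_root/init.d/$name" "$rc_root/rc3.d/S${start_seq}${name}" || return 1

    # Kill links: one for every lower run level the system may enter.
    for level in 0 S 1 2; do
        ln "$rc_root/init.d/$name" "$rc_root/rc${level}.d/K${kill_seq}${name}" || return 1
    done
}
```

A call such as `install_rc_links /etc fooapp 95 05` would leave the script with six names in total: the original in init.d plus the five links.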


These will ensure that our application is started in run level 3, and stopped in any other run level. This contrasts with what I see in most data centers, where rc scripts are installed to run level 2 or 3 for start up, and 0 for shut down. While this approach can work for reboots, it has a downfall. How many times have you been told that before patching you need to reboot a server into single user mode? This is because kill scripts are not installed for all applications for all run level transitions. I still advocate rebooting into single user mode to be safe, but in a perfect world this would not be necessary.

Having selected the run control directories, you are now ready to put the links in place. But wait! You have another decision to make. Should you use a symbolic link or a hard link? There are all kinds of reasons for and against either method if you approach the question from an emotional standpoint. However, as a Solaris Jedi, you do not allow your emotions to control you. You look for standards.

Referring again to the web site, we return to the Solaris 9 (9/04) System Administration Guide: Basic Administration. This time, to Section 8, How to Add a Run Control Script. The examples on the page clearly show how to use the ln command to create a hard link. This is where the discussion should end. You didn't write Solaris, and you didn't do the integration testing. You are disciplined, and you follow standards. This is the way of the Jedi.

I have heard numerous arguments for using sym-links in place of hard links, and I believe each of them stems from not fully understanding how UNIX file system inodes work, and how Solaris commands can be used to understand them. Using the "ls -i" command you can prove that the files reference the same inode, and are thus the same.

cgh@soleil{/etc/rc0.d}# ls -li /etc/rc3.d/S90samba
9731 -rwxr--r-- 6 root sys 324 Jan 14 2006 /etc/rc3.d/S90samba*
cgh@soleil{/etc/rc0.d}# ls -li /etc/init.d/samba
9731 -rwxr--r-- 6 root sys 324 Jan 14 2006 /etc/init.d/samba*

Notice the first field in each record shows the integer, 9731? That is the inode number. The next field to attend to is the third. In this case, a "6" for each record. This refers to the link count, or number of links that point to the same piece of data.
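
If you would rather not compare the fields by eye, the same check can be wrapped in a few small shell functions. This is only a sketch; the function names are my own, and it assumes both paths live on the same file system, which is always true for hard links.

```shell
# Print the inode number of a file (first field of "ls -i").
inode_of() {
    ls -id "$1" | awk '{print $1}'
}

# Print the hard link count of a file (second field of "ls -l").
link_count_of() {
    ls -ld "$1" | awk '{print $2}'
}

# Succeed only if both paths are names for the same inode.
# Assumes both paths are on the same file system.
same_file() {
    a=`inode_of "$1"`
    b=`inode_of "$2"`
    [ -n "$a" ] && [ "$a" = "$b" ]
}
```

For example, `same_file /etc/init.d/samba /etc/rc3.d/S90samba && echo "hard linked"` prints its message only when the two names share an inode.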

Another approach to observing all rc links associated with an init script is to use the find command to search a branch of the file system for the inode number matching the init script. Let's look at the standard Samba service included with Solaris 10. We know from the prior example that inode #9731 references the samba script. The following command will seek out all of the hard links:

cgh@soleil{/etc/rc0.d}# find /etc/rc?.d -inum 9731

If these links were symbolic, the task would not be as simple, and we would not have the benefit of a link counter to ensure the integrity of our boots.
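
One way to put the link counter to work is to compare it against the number of names that find actually turns up. The sketch below is my own illustration, not a Solaris utility: the function name is made up, and it assumes every link to the script is supposed to live in one of the rc?.d directories next to init.d.

```shell
# check_rc_links RC_ROOT NAME
# Hypothetical integrity check: compare the link count of RC_ROOT/init.d/NAME
# with the number of names found for its inode under RC_ROOT/rc?.d.
check_rc_links() {
    rc_root=$1
    script=$rc_root/init.d/$2

    inum=`ls -id "$script" | awk '{print $1}'`    # inode number
    nlink=`ls -ld "$script" | awk '{print $2}'`   # hard link count

    # Count the rc links, then add one for the init.d entry itself.
    found=`find "$rc_root"/rc?.d -inum "$inum" -print | wc -l`
    found=`expr $found + 1`

    if [ "$found" -eq "$nlink" ]; then
        echo "ok: $nlink links, all accounted for"
    else
        echo "mismatch: link count is $nlink, but $found names were found"
        return 1
    fi
}
```

A mismatch means some name for the script lives outside the run control directories, which is worth tracking down before the next reboot.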

The last facet of initology I want to discuss is proper convention for disabling an init script on a Solaris server. As with the above examples, the correct process comes right out of the Basic Administration Guide, Section 8. The init scripts only process files that begin with an "S" or a "K". I most often see the upper-case letter replaced with lower case. The number two method I've observed is to remove the links altogether, leaving (hopefully) the init script in place.
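
To see why renaming a link takes it out of play, it helps to picture how a run control pass picks its files. The function below is a deliberately simplified dry run of my own, not Sun's actual rc code: it only prints which entries in a directory would be selected by the S* and K* patterns and never executes anything.

```shell
# list_rc_actions DIR
# Simplified dry run (an illustration, not the real rc scripts): print the
# entries in DIR that match the K* and S* patterns, kill scripts first.
list_rc_actions() {
    dir=$1

    for f in "$dir"/K*; do
        if [ -f "$f" ]; then
            name=`basename "$f"`
            echo "stop $name"
        fi
    done

    for f in "$dir"/S*; do
        if [ -f "$f" ]; then
            name=`basename "$f"`
            echo "start $name"
        fi
    done
}
```

Anything that does not begin with an upper-case "S" or "K", whether it is a lower-case s90samba, a README, or a name with some other leading character, simply never matches and is skipped.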

The correct process for disabling an init script is almost always to prepend an underscore. The underscore stands out clearly in the list, while lower-case characters tend to have less contrast next to the upper-case entries. It sounds trivial, but how good is your eyesight at 3am after your pager goes off? Another benefit is the grouping of all disabled scripts in the directory listing, so you can tell at a glance what is turned off. Finally, by not removing the link altogether we preserve the ordering of the scripts, which in some cases is critical. Take a look at the example below, and hopefully my suggestions will be apparent:

cgh@soleil{/etc/rc3.d}# ls -l
total 44
-rw-r--r-- 1 root sys 1285 Jan 21 2005 README
-rwxr--r-- 6 root sys 474 Jan 21 2005 S16boot.server*
-rwxr--r-- 6 root sys 1649 Jan 8 2005 S50apache*
-rwxr--r-- 6 root sys 5840 Jan 29 2004 S52imq*
-rwxr-xr-x 1 root sys 491 Apr 10 12:49 S75seaport*
-rwxr--r-- 6 root sys 685 Jan 21 2005 S76snmpdx*
-rwxr--r-- 6 root sys 1125 Jan 21 2005 S77dmi*
-rwxr--r-- 6 root sys 344 Jan 21 2005 S80mipagent*
-rwxr--r-- 6 root sys 513 May 15 19:21 S81volmgt*
-rwxr-xr-x 5 root sys 2225 Apr 10 12:49 S82initsma*
-rwxr--r-- 5 root sys 824 May 26 2004 S84appserv*
-rwxr--r-- 6 root sys 324 Jan 14 2006 S90samba*
-rw-r--r-- 1 root root 0 Aug 31 21:31 _S92foodb
-rw-r--r-- 1 root root 0 Aug 31 21:31 _S95fooapp
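
The renaming itself is a one-line mv, but wrapping it in a pair of functions guards against typos on a production box. The function names here are hypothetical, and the sketch assumes you pass the directory and the link name separately. Because mv within a directory only renames the entry, the inode and the hard link count are left untouched.

```shell
# disable_rc_link DIR LINK   e.g. disable_rc_link /etc/rc3.d S95fooapp
# Hypothetical helper: disable an active rc link by prepending an underscore.
disable_rc_link() {
    dir=$1
    link=$2
    case $link in
        [SK]*) mv "$dir/$link" "$dir/_$link" ;;
        *)     echo "not an active rc link: $link" >&2
               return 1 ;;
    esac
}

# enable_rc_link DIR LINK    e.g. enable_rc_link /etc/rc3.d _S95fooapp
# Hypothetical helper: re-enable a link by stripping the leading underscore.
enable_rc_link() {
    dir=$1
    link=$2
    case $link in
        _[SK]*) new=`echo "$link" | sed 's/^_//'`
                mv "$dir/$link" "$dir/$new" ;;
        *)      echo "not a disabled rc link: $link" >&2
                return 1 ;;
    esac
}
```

The case patterns refuse anything that is not already an S or K entry, so an accidental `disable_rc_link /etc/rc3.d README` leaves the file alone and returns an error.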

Henceforth, you will properly integrate your scripts with the entire run level facility using hard links. When those magical links need to be disabled you will prepend underscores to them. You are now a master of the Solaris init scripts, and ready to carry this knowledge to others. You are also ready to explore the Solaris 10 SMF and enjoy all that it has to offer.
