Thursday, August 31, 2006

Initology 101: A lesson in proper use of Solaris run control scripts

Starting and stopping applications through init scripts ought to be a simple thing that doesn't cause much debate, but in fact it's just the opposite. I routinely see servers with functional but non-standard artifacts nested in the rc directories. I also hear many justifications for these configurations; some reasonable, others somewhat less so. But in the end, I believe that a systems engineering approach to using init scripts will filter the options, and this article intends to do just that.

There are three specific conventions that I want to address:

1. Which run levels should be used for starting and stopping typical applications?
2. Should a symbolic link (sym-link) or a hard link be used?
3. How should a link be disabled?

Let's begin with identifying the correct run levels to start and stop a common application. By common application I mean something that is not a core part of the operating system, but rather in the application layer that depends on the operating environment's core features. Oracle and web servers are common examples of what I consider common applications. Knowing that the Solaris Operating Environment has well-defined run level states, the first step is to consult the web site for your particular Solaris version and refer to those definitions. Let's take the case of Solaris 9 (9/05) which is the last release in the Solaris 9 series. I am not going to address Solaris 10 in this context because it uses the new Service Management Facility as part of the new Predictive Self Healing feature to replace init scripts.

According to the Solaris 9 (9/04) System Administration Guide: Basic Administration, Section 8: Run Levels and Boot Files, we have the following run levels and explanations:

Run Level   Description
0           Shut down all processes and power down to ok> prompt (SPARC).
S           Run as a single user with some file systems mounted and accessible.
1           Administrative state with access to all file systems, but no user logins permitted.
2           Multi-user state. For normal operations. Multiple users can access the system and all file systems. All daemons are running except for the NFS server daemons.
3           Multi-user state. For normal operations with NFS resources shared. This is the default run level for the Solaris environment.
4           Alternative multi-user state. This is not used by Solaris, but is available for site customization if needed. I recommend NOT using it.
5           Power down after shutting down all processes.
6           Reboot the system.

In theory, we need to consider that a system may transition from any run level to any other run level. This means that when the system enters run level S, if our application is running, we need to ensure it is stopped. The same thing goes for 0, 1, and 2. Run level three is the conventional system state associated with end user applications being loaded. Putting this into practice, we will need to install the following links to fully integrate with Solaris' run levels:
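
As a rough sketch of that layout, the script below hard-links one init script into a start link for run level 3 and kill links for run levels 0, 1, 2, and S. Everything specific in it is an assumption for illustration: the application name fooapp, the sequence numbers 95 and 05, and the scratch directory standing in for /etc so it can be tried without touching a real server.

```shell
#!/bin/sh
# Sketch: hard-link one init script into every relevant rc directory.
# DEMO is a scratch directory standing in for "/"; on a real system ETC
# would be /etc. "fooapp" and the numbers 95/05 are hypothetical.
DEMO=$(mktemp -d) || exit 1
ETC=$DEMO/etc
APP=fooapp

mkdir -p "$ETC/init.d"
for level in 0 1 2 3 S; do
    mkdir -p "$ETC/rc$level.d"
done

# Stand-in init script; a real one would act on "start" and "stop".
printf '#!/bin/sh\necho "$0 $1"\n' > "$ETC/init.d/$APP"
chmod 744 "$ETC/init.d/$APP"

# Start the application in run level 3 only.
ln -f "$ETC/init.d/$APP" "$ETC/rc3.d/S95$APP"

# Stop it on entry to every other run level named above.
for level in 0 1 2 S; do
    ln -f "$ETC/init.d/$APP" "$ETC/rc$level.d/K05$APP"
done

# Show inode numbers and link counts for the script and all of its links.
ls -li "$ETC/init.d/$APP" "$ETC"/rc?.d/*"$APP"
```

One start link plus four kill links, plus the script itself in init.d, means every name in the listing should report a link count of 6.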


These will ensure that our application is started in run level 3, and stopped in any other run level. This contrasts with what I see in most data centers where rc scripts are installed to run level 2 or 3 for start up, and 0 for shut down. While this approach can work for reboots, it has a downfall. How many times have you been told that before patching you need to reboot a server into single user mode? This is because kill scripts are not installed for all applications for all run level transitions. I still advocate rebooting into single user mode to be safe, but in a perfect world this would not be necessary.

Having selected the run control directories, you are now ready to put the links in place. But wait! You have another decision to make. Should you use a symbolic link or a hard link? There are all kinds of reasons for and against either method if you approach the question from an emotional standpoint. However, as a Solaris Jedi, you do not allow your emotions to control you. You look for standards.

Referring again to the web site, we return to the Solaris 9 (9/04) System Administration Guide: Basic Administration. This time, to Section 8, How to Add a Run Control Script. The examples on the page clearly show how to use the ln command to create a hard link. This is where the discussion should end. You didn't write Solaris, and you didn't do the integration testing. You are disciplined, and you follow standards. This is the way of the Jedi.

I have heard numerous arguments for using sym-links in place of hard links, and I believe each of them stems from not fully understanding how UNIX file system inodes work, and how Solaris commands can be used to understand them. Using the "ls -i" command you can prove that the files reference the same inode, and are thus the same.

cgh@soleil{/etc/rc0.d}# ls -li /etc/rc3.d/S90samba
9731 -rwxr--r-- 6 root sys 324 Jan 14 2006 /etc/rc3.d/S90samba*
cgh@soleil{/etc/rc0.d}# ls -li /etc/init.d/samba
9731 -rwxr--r-- 6 root sys 324 Jan 14 2006 /etc/init.d/samba*

Notice the first field in each record shows the integer, 9731? That is the inode number. The next field to attend to is the third. In this case, a "6" for each record. This refers to the link count, or number of links that point to the same piece of data.
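
If you want to see the difference for yourself without touching /etc, here is a small sketch that builds one hard link and one sym-link to a stand-in script in a scratch directory and prints the same two fields discussed above. The file names and the inode_of and links_of helpers are made up for this illustration.

```shell
#!/bin/sh
# Sketch: compare a hard link and a symbolic link to the same script.
# Runs in a scratch directory; the file names are illustrative only.
DEMO=$(mktemp -d) || exit 1
cd "$DEMO" || exit 1

echo 'echo "samba stand-in"' > samba
ln samba S90samba            # hard link: a second name for the same inode
ln -s samba S90samba.sym     # symbolic link: a separate file holding a path

# First field of "ls -i" is the inode; second field of "ls -l" is the link count.
inode_of() { ls -di "$1" | awk '{print $1}'; }
links_of() { ls -ld "$1" | awk '{print $2}'; }

echo "script:    inode $(inode_of samba), links $(links_of samba)"
echo "hard link: inode $(inode_of S90samba), links $(links_of S90samba)"
echo "sym-link:  inode $(inode_of S90samba.sym), links $(links_of S90samba.sym)"
```

The script and the hard link should report the same inode and a link count of 2, while the sym-link gets an inode of its own and does nothing to the script's link count.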

Another approach to observing all rc links associated with an init script is to use the find command to search a branch of the file system for the inode number matching the init script. Let's look at the standard Samba service included with Solaris 10. We know from the prior example that inode #9731 references the samba script. The following command will seek out all of the hard links:

cgh@soleil{/etc/rc0.d}# find /etc/rc?.d -inum 9731

If these links were symbolic the task would not be as simple, and we would not have the benefit of a link counter to ensure the integrity of our boots.
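
One way to put that link counter to work is to compare it against what find actually turns up. The sketch below is my own illustration rather than anything from the Sun documentation: check_links is a hypothetical helper, and the demo runs against a scratch directory rather than /etc. Keep in mind that inode numbers are only unique within a single file system.

```shell
#!/bin/sh
# check_links SCRIPT DIR...: compare the script's link count with the number
# of names find(1) locates for its inode in the script itself plus DIR...
# Succeeds only when every hard link is accounted for.
check_links() {
    script=$1
    shift
    inode=$(ls -di "$script" | awk '{print $1}')
    expected=$(ls -ld "$script" | awk '{print $2}')
    found=$(find "$script" "$@" -inum "$inode" -print | wc -l | tr -d ' ')
    echo "$script: link count $expected, names found $found"
    [ "$expected" -eq "$found" ]
}

# Scratch layout standing in for /etc/init.d and /etc/rc?.d.
DEMO=$(mktemp -d) || exit 1
mkdir "$DEMO/init.d" "$DEMO/rc0.d" "$DEMO/rc3.d"
echo 'echo "samba stand-in"' > "$DEMO/init.d/samba"
ln "$DEMO/init.d/samba" "$DEMO/rc3.d/S90samba"
ln "$DEMO/init.d/samba" "$DEMO/rc0.d/K10samba"

check_links "$DEMO/init.d/samba" "$DEMO"/rc?.d && echo "all links accounted for"

# A link outside the searched directories shows up as a mismatch.
ln "$DEMO/init.d/samba" "$DEMO/stray"
check_links "$DEMO/init.d/samba" "$DEMO"/rc?.d || echo "mismatch: a link lives elsewhere"
```

When the two numbers disagree, either a link has gone missing from an rc directory or one exists somewhere you did not search, and both are worth knowing before the next boot.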

The last facet of initology I want to discuss is proper convention for disabling an init script on a Solaris server. As with the above examples, the correct process comes right out of the Basic Administration Guide, Section 8. The init scripts only process files that begin with an "S" or a "K". I most often see the upper-case letter replaced with lower case. The number two method I've observed is to remove the links altogether, leaving (hopefully) the init script in place.

The correct process for disabling an init script is almost always to prepend an underscore. The underscore stands out clearly in the list, while lower case characters tend to have less contrast next to the upper case entries. It sounds trivial, but how good is your eyesight at 3am after your pager goes off? Another benefit is the grouping of all disabled scripts in the directory listing so you can tell at a glance what is turned off. Finally, by not removing it altogether we can preserve the ordering of the scripts, which in some cases is critical. Take a look at the example below, and hopefully my suggestions will be apparent:

cgh@soleil{/etc/rc3.d}# ls -l
total 44
-rw-r--r-- 1 root sys 1285 Jan 21 2005 README
-rwxr--r-- 6 root sys 474 Jan 21 2005 S16boot.server*
-rwxr--r-- 6 root sys 1649 Jan 8 2005 S50apache*
-rwxr--r-- 6 root sys 5840 Jan 29 2004 S52imq*
-rwxr-xr-x 1 root sys 491 Apr 10 12:49 S75seaport*
-rwxr--r-- 6 root sys 685 Jan 21 2005 S76snmpdx*
-rwxr--r-- 6 root sys 1125 Jan 21 2005 S77dmi*
-rwxr--r-- 6 root sys 344 Jan 21 2005 S80mipagent*
-rwxr--r-- 6 root sys 513 May 15 19:21 S81volmgt*
-rwxr-xr-x 5 root sys 2225 Apr 10 12:49 S82initsma*
-rwxr--r-- 5 root sys 824 May 26 2004 S84appserv*
-rwxr--r-- 6 root sys 324 Jan 14 2006 S90samba*
-rw-r--r-- 1 root root 0 Aug 31 21:31 _S92foodb
-rw-r--r-- 1 root root 0 Aug 31 21:31 _S95fooapp
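
The renaming itself is a one-line mv, but a pair of small helpers keeps the convention consistent. This is only a sketch: disable_rc and enable_rc are names I made up, and the demo works in a scratch directory rather than a real /etc/rc3.d. Because mv within a directory is a rename, the inode and link count are untouched, so a disabled entry is still a hard link to its init script.

```shell
#!/bin/sh
# disable_rc LINK: rename e.g. S92foodb to _S92foodb so init skips it.
disable_rc() {
    dir=$(dirname "$1")
    name=$(basename "$1")
    case $name in
        [SK][0-9][0-9]*) mv "$dir/$name" "$dir/_$name" ;;
        *) echo "not an active rc link: $1" >&2; return 1 ;;
    esac
}

# enable_rc LINK: strip the leading underscore again.
enable_rc() {
    dir=$(dirname "$1")
    name=$(basename "$1")
    case $name in
        _[SK][0-9][0-9]*) mv "$dir/$name" "$dir/${name#_}" ;;
        *) echo "not a disabled rc link: $1" >&2; return 1 ;;
    esac
}

# Scratch directory standing in for /etc/rc3.d.
RC=$(mktemp -d) || exit 1
touch "$RC/S90samba" "$RC/S92foodb" "$RC/S95fooapp"

disable_rc "$RC/S92foodb"
disable_rc "$RC/S95fooapp"
ls "$RC"

enable_rc "$RC/_S95fooapp"
ls "$RC"
```

The sequence number survives the round trip, so re-enabling a script puts it back in exactly the slot it had before.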

Henceforth, you will properly integrate your scripts with the entire run level facility using hard links. When those magical links need to be disabled you will prepend underscores to them. You are now a master of the Solaris init scripts, and ready to carry this knowledge to others. You are also ready to explore the Solaris 10 SMF and enjoy all that it has to offer.

Wednesday, August 23, 2006

Spotlight on Richard McDougall

If you haven't yet visited Richard McDougall's Blog you should fire up your browser and head over. I had the pleasure of meeting Richard at a SunUP Network Conference in Singapore where we were both giving presentations. We met up again at later conferences in Sydney, Australia and Boston in the same scenario, and he hit it out of the park every time he got in front of customers. It's very rare to meet someone as brilliant as Richard who is also so down to earth and generous with his knowledge; he is a true Jedi Master in the land of Solaris.

One characteristic you observe early in Richard's presentations is his enthusiasm for Solaris and its potential. This article on Chip Multi-Threading is a classic example of his style, and was what inspired me to write this entry. I remember him speaking about some of Solaris Volume Manager's (SVM) new (at the time) features which were specifically designed to address the reasons customers had chosen Veritas Volume Manager. Rather than attacking the message with the technical nuts and bolts, he hit on a few topics and delivered the message that Sun was listening. I am certain that more people re-examined SVM after his delivery than any speeds and feeds preso would have motivated. A true Jedi Master understands the intended recipients and delivers important messages without patronizing them.

The other item I want to draw your attention to is his new set of books: Solaris Internals and Solaris(TM) Performance and Tools: DTrace and MDB Techniques for Solaris 10 and OpenSolaris, which were just delivered to me from Amazon. First of all, I hate poorly bound books. I buy books to use as reference manuals - tools of my trade. These books feel like professional tools that you will appreciate returning to. Remember buying that Calculus book in college? The one that weighs 25 lbs? This is that book. I love it! They are expensive, but good books aren't cheap, and the investment the author made in sharing his skills isn't cheap either. The first edition of Solaris Internals is well known as the authoritative reference on Solaris plumbing, and with all of the exciting changes Solaris 10 brings, this book is timely. I'm anxious to dig into it and post a review, but in the meantime please check the books out. This page has more information, and sample content.

I'd like to start paying tribute to some of the Jedi Masters I've benefited from, and this post serves as the first. Please take a moment to read Richard's Blog. Check back frequently - if he posts it, you should know about it. And if you need a diversion from Solaris, he's also a great photographer.

Sunday, August 20, 2006

The verdict: Ubuntu Linux is a keeper

I'm writing now from the keyboard of my reborn laptop, having just completed the installation and configuration of Ubuntu Linux on it and happily retired its installation of Windows XP. Since this blog is really about Sun Solaris and Systems Engineering I don't want to spend too much time talking about this from a technical standpoint. It does have relevance to the theme as we all need a portable means of working on Solaris systems. If you use UNIX as your primary operating environment, you know how awkward it is to depend on Windows as your interface to the systems you support.

So far, Ubuntu "just works" with no headaches at all. It auto-detects and configures my Netgear WG511 "G" Network Card, and can successfully enter and exit both hibernate and suspend modes. These were the two big headaches for me under Fedora Core. I am really impressed that the special volume and mute keys worked as well. Those used to require installing a separate Thinkpad buttons package called tpb. The boot screens look slick, the theme is very clean and coherent, and the desktop is clean and EMPTY. I love that! I give it two thumbs up. It looks like I'm finally going to learn the Debian flavor of Linux after years of being a die-hard Red Hat camper.

Now let me clarify this position; I'm not changing my opinion about Mac OS being the ultimate desktop. But, I can obtain an old IBM Thinkpad T20 for a fraction of the cost of a PowerBook or MacBook. I wouldn't want to process my photographs on it, but for a tool that lets me perform systems work, and use typical Office Software, I'm very happy.

Saturday, August 19, 2006

Microsoft's Genuine Advantage

I have an older IBM Thinkpad T23 laptop which I purchased after sending most of my scrapyard through eBay. I bought it with the intention of running 90% Linux, and occasionally using Windows when I need some odd utility, or have to connect to something that only speaks Windows. The T23 is a rock solid machine, and being far from the bleeding edge, also has pretty decent hardware compatibility. With 1GB of RAM and a 1GHz CPU, the machine is plenty fast for its intended role as a terminal, web browser, email client, and occasional OpenOffice platform. I bet the most used application on it was gnome-terminal if I really analyzed the accounting records. Nothing stressful.

When I first loaded Fedora Linux it was a simple process to get the machine usable. Usable and optimal turned out to be divided by a full-strength, adult size, bang-a-roo of a headache. The little things like getting it to play MP3s didn't faze me too much. In fact, taken one by one the entire list isn't anything that can't be handled. The problem is that I'm tired of having to handle things. I just want my computers to let me do what I want without HAVING to hack. I'd rather hack by choice than for base survival.

I was able to get wireless Ethernet working after some digging, but what really sent me over the edge was ACPI. My power consumption was awful, and getting it to be even close to Windows proved as complex as tuning 100 Oracle instances fighting for the resources of a SparcStation 5. Not fun at all. Eventually, I decided that despite my inability to mentally mesh with Windows' gears I would dump Linux and stick to the mainstream.

I bought a copy of Windows XP Pro off eBay, complete with hologram media, funky sticker, and all of those gimmicky little things they do to make you think you're getting something official and important. I downloaded all the updates, I filled out the registration, I did all the things someone would do when they are an IT professional who wants to be legitimate. I used it for about a year with no issues, including from the "Windows Genuine Advantage" thingy, which used to think I had a legitimate copy.

Today, after a long hiatus, my laptop was booted and it informed me that Windows Genuine Advantage had changed its mind. Warning boxes were popping up left and right, and graciously giving me the opportunity to "purchase genuine Windows". You know what? I already did. It was shrink wrapped, and had so many gimmicky little security things that it was gaudy. And now you want to give me an opportunity to do it again? No thanks. From a quick Google search it looks like I'm not the first person to be annoyed.

I did notice that my system clock was really goofed up, and I've heard that the validation process involves hardware checks, so maybe something in my configuration triggered it. I don't know, but frankly I don't care. I don't want to know why it happened. I'm going back to a world that doesn't include helpful paper clips and other ridiculous instantiations of a help system.

Since my battery is nearly dead, I've decided not to worry about ACPI. Windows is being scrapped tonight and I'm going to run either Ubuntu or Fedora Linux. I'm not crazy about Solaris on the desktop because it has less standard productivity software, and the updates for non-Solaris software are not convenient. I'm the #1 fan for servers, but on the desktop I'm a Linux guy until I can afford a PowerBook or MacBook.

I'll post more about the final choice I make, but I felt it necessary to document this eve of liberation for all to see. And now I must end this post as I have a date with fdisk to catch...

Thursday, August 03, 2006

IM Ruining Grammar?

This one is a bit off theme, but I couldn't resist. Apparently it has been found that the prolonged use of IM does not truly impair one's grammatical ability. Thank the University of Toronto for this piece of knowledge...

Are you serious? It's not grammar that gets hurt when IM is abused. It's one's social skills. The article I mention above is really talking about kids who get carried away, but living in a cube farm, I've seen the adult version as well. At some point, we've all been guilty of IM'ing someone close enough to hit with a paper airplane.

Between email, voicemail, wikis, and IM, not to mention remote work, just about everything gets in the way of developing the personal relationships that foster good working environments. When I've met someone in person I immediately feel more at ease trusting them for the role they will play in a project.

Body language plays a HUGE role in our ability to communicate effectively, and to dismiss it for the "efficiency" of electronic communication is naive at best. The next time you need to talk to someone, walk to the other side of the building, or schedule the time to drive to their site. You'll be glad you did, and they probably will too.

The demise of Linux

I read a quick editorial which suggests that Ubuntu Linux is going to be the downfall of Red Hat. The premise being that as Red Hat grew more commercial, and abandoned its community in favor of its stockholders, the sys-admins who used Red Hat as their desktop OS drifted away from the Red Hat camp. As Ubuntu came into being, and did so in a very strong way, those SA-types who were driven away from Red Hat will now want to put what they are more familiar with (Ubuntu) on their servers when they have the choice.

This whole discussion brought me right back down memory lane. I started using Linux before Red Hat existed with a few early Slackware distributions. I remember writing all those 3" diskettes - somewhere around 80 of them by the time I burned the X-windows distribution as well. Shortly after came Red Hat, and at that time I was helping to set up the first Linux environment at SUNY Plattsburgh. We switched over from Slackware to Red Hat, and loved it.

I ended up sticking with Red Hat for the next 10 years. It was the OS of choice when I led a project to build servers for our local Boy Scouts of America council, and remained there until Red Hat went totally commercial, burning the bridges out from under us. Make no mistake, I was extremely disappointed with their decision. We switched over to Fedora Core, and for the most part it has been a smooth transition. Despite its big red "DEVELOPMENT" stamp, Fedora has been very good to our availability. In fact, at the moment we've got an impressive uptime on an FC2 system:

# uptime
21:41:15 up 412 days, 2:37, 2 users, load average: 0.00, 0.00, 0.00

But thinking back on the experience I have an entirely different memory of what direction I was forced in, and where I wanted to be. The places I've worked would not consider using Linux for their production environments. Linux has made some great inroads into the corporate world, but there aren't a lot of Fortune 500 companies running their SAP central instance on Linux. I'm sorry, it's just not happening. HP-UX, AIX, and Solaris are king in the land of mission-critical, highly scalable UNIX servers.

Although Solaris never made a particularly compelling desktop, it's what I've wanted to use on every server I've ever built. It's rock-solid, well documented, and very cohesive. When you use Solaris and Sun products, you rarely get the impression that 10,000 individual developers all tried to do it "their way" when the final build was cast in stone. What always stopped me was cost. Solaris x86 had pathetic support and commitment in the past. It's so incredibly painful to migrate between operating environments that I never wanted to risk Solaris x86 being yanked - which it was.

The second big barrier was cost. If you went with Sparc, you had to have money. Lots of money. Oodles of money! What I had was a basement full of x86 architecture hardware, and the not-for-profits I volunteer at had the same. There was simply no funding for shiny Sun hardware no matter how badly we wanted it.

And then the sleeping giant awoke. After being pummelled by the dot-com crash, Sun figured out what went wrong, and executed one of the most amazing feats of corporate inertia change I've ever seen. In a very short time frame, support for Solaris x86 was restored at a full commitment level. And it was made free. Then they continued to make their Java Enterprise System free to download and use as well.

So, the making of a fantastic Linux in Ubuntu may hurt Red Hat, but it's not what will deliver the killing blow. Red Hat has an opportunity right now to try to pull off a corporate inertia swing of Sun's magnitude. They need to restore faith in the community, restore the religion they destroyed, and find some kind of innovation to draw people back in. Solaris has done all of this and created an affordable support model that doesn't intimidate the small businesses who were once driven to Linux.

The first blood has been drawn by Solaris, but the second wound is far deeper. This second wound is bleeding internally and missing a lot of coverage. Mac OS-X is the killer desktop. If you have a reason to be using UNIX on a desktop, then using anything other than Mac OS-X is a tough sell in my book. Hardware is a bit more expensive, sure. But it's the best of every world, and solid as a rock. It doesn't hurt that it looks great either.

A recent seminar I attended talked about business models and knowing when to have the guts to drop a design. The idea was that you need to look at things you're developing and ask whether or not they give you a long-term sustainable advantage. I have to use that same litmus test to examine Linux. In the server world I can get free and open Solaris which is out-innovating Linux in my observation. And on the desktop, while Linux continues to improve, it's not even close to Mac OS-X.

In the end, these observations mean little to the tech-hobbyist who loves Linux for its religion. But in the business world, religion doesn't make IT choices. Competitive advantage does.