Friday, October 20, 2006

Packaging delinquency in 3rd party software

We're working hard at the moment on migrating unmanaged scripts and solutions into Solaris pkgadd format, a.k.a. "packages". This has a number of benefits. First, it avoids the need for complicated manual installation routines; by including preinstall / postinstall scripts, installation can be automated. Second, it is easy to know exactly which revision of a software environment is installed, right down to the installation process itself. Third, not only can I ensure software is installed with the right attributes and in the right places, but I can also validate at a later time that things are still as intended using pkgchk. I refer to this as managed files vs. unmanaged files.
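As a concrete (and hypothetical) illustration, assuming a site package named SITEscripts and a script installed under /opt/site, the verification side looks something like this:

    # What revision is installed, and when was it installed?
    pkginfo -l SITEscripts

    # Do the installed files still match the attributes and checksums
    # recorded at packaging time?
    pkgchk SITEscripts

    # Which package owns a given managed file?
    pkgchk -l -p /opt/site/bin/somescript

Anything dropped onto the box by hand is simply invisible to these tools, which is exactly the managed vs. unmanaged distinction.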

The easy part was packaging our custom site scripts. We standardized on a hierarchy under /opt containing bin, man, etc, lib, and sbin subdirectories. We then inventoried the unmanaged scripts and determined which were still valid and which could be discarded. The "keepers" were then incorporated into packages, which were fairly simple overall.
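For the curious, here is a minimal sketch of how one of these packages can be built with the stock tools; the package name, version, and staging paths below are invented for the example. First, a pkginfo file (saved here as /tmp/pkginfo) carries the metadata:

    PKG=SITEscripts
    NAME=Site administration scripts
    VERSION=1.0
    ARCH=sparc
    CATEGORY=application
    BASEDIR=/opt/site

Then a prototype is generated from a staging copy of the hierarchy, and the package is built, spooled, and installed:

    cd /staging/opt/site
    echo "i pkginfo=/tmp/pkginfo" > /tmp/prototype
    find . -print | pkgproto >> /tmp/prototype
    pkgmk -o -r /staging/opt/site -d /var/spool/pkg -f /tmp/prototype
    pkgadd -d /var/spool/pkg SITEscripts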

Phase two has been an assessment of which unmanaged files third party applications are deploying. We found a lot of opportunity here and decided to start with the utility software: monitoring, security, and other non-revenue-generating tools. Here is where we have been uncovering nothing less than a mess.

For some reason, third party software providers in the UNIX space seem determined to make it impossible to manage their files. We've seen many interesting perversions of best practices that I thought would be worth collecting in one place.

One product chose to adopt a package management solution called LSM, which I believe stands for Linux Software Manager. Note that this solution is for Solaris, which has a perfectly good vendor-provided and supported standard for software management. It turns out to be quite a technical feat to reverse engineer the LSM format and convert it directly to packages.

Another product did not use any software management at all, and even went so far as to encrypt its pre-installation bundles, making it impossible to install via a standards-based management system. This really blew my mind. What could be of such critical intellectual property in an installation routine that it justified encryption? And wouldn't the real IP be available once the software was installed anyway?

We routinely encounter software with highly interactive installation processes that are cumbersome to integrate into packages, because on each new release the routines have to be re-ported to pre/post install scripts. The idea of managing software is to reduce work - not increase maintenance.
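Most of what those interactive installers ask about can be pushed into a small postinstall script instead, so the payload itself goes on silently. A hedged sketch, with an invented config file name, of the kind of thing ours end up doing:

    #!/bin/sh
    # postinstall -- run automatically by pkgadd after the files land.
    # Seed a default configuration only if one is not already present,
    # so no questions need to be asked at install time.
    CFG=${BASEDIR}/etc/agent.conf
    if [ ! -f "$CFG" ]; then
        cp "${BASEDIR}/etc/agent.conf.default" "$CFG" || exit 1
    fi
    exit 0

The pain is that when a vendor hides its logic inside an interactive wizard, we have to re-derive a script like this for every release.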

The biggest thorn in our side is Oracle. It's deployed everywhere in our environment and is installed manually each time because we're told that's just how it's done, and from observation it seems to be firmly in the PITA bucket as far as automation goes. Contrast this with PostgreSQL, which as of Solaris 10 (6/06) is integrated into the Operating Environment in clean packages.

So here's a message to all you third party software developers who provide Solaris solutions: Sun publishes an excellent guide to software packaging that any reasonably technical person could use to master the process in a few hours. Let me summarize a few key points in advance:

(1) You don't need to have a conversation to install software. Just copy the files, then configure it later.

(2) Sometimes I don't want to have a conversation at all. Let me put the answers in a file and feed that in instead (see the sketch after this list).

(3) Some of your customers have too many systems to install manually.

(4) Don't use a non-standard solution when the OS vendor provides a perfectly usable solution.
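Points (1) and (2) aren't even asking for new machinery; pkgadd already supports a fully hands-off install today. A sketch, using an invented package name and media path: first, the answers to pkgadd's standard policy questions go into an admin file, say /tmp/noask.admin:

    mail=
    instance=overwrite
    partial=nocheck
    runlevel=nocheck
    idepend=nocheck
    rdepend=nocheck
    space=nocheck
    setuid=nocheck
    conflict=nocheck
    action=nocheck
    basedir=default

Then any request-script questions are captured once with pkgask and replayed on every install:

    # Record the answers once...
    pkgask -r /tmp/response -d /media/vendor THIRDpkg

    # ...and install with no conversation at all, on as many systems as needed:
    pkgadd -n -a /tmp/noask.admin -r /tmp/response -d /media/vendor THIRDpkg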

Thursday, October 19, 2006

To err is human

I am writing this post as a catharsis and purification. A centering of my spiritual engineering energy that may otherwise be out of balance. Three days ago I made a typo which eliminated the /etc directory on a fairly important server. It was amazing how long that server continued to plug away after being lobotomized. Let me take you through the story as I relive the moment, and ensure that I learn from it.

Like many of the tasks I juggle, this was to be a short time-slice effort. I needed a distraction from a longer-term project and wanted to bite off a small piece of something that didn't require significant thought. Part of our Jumpstart environment deploys a tar archive to the client which is later unpacked and massaged by a custom script. My task was to eliminate the usr/local/etc directory from that archive and then recreate it. As my fingers systematically hit the keys, one extraneous finger made its imprint on the keyboard.

"r" "m" "-" "r" "." "/" "e" "t" "c".

The world slowed down as my finger hit enter, and I felt my heart stop beating. I believe I actually flat-lined that morning. Could it be? Had I really deleted /etc? Yes. I had. The command I entered was: "rm -r . /etc". I removed the current working directory and the server's /etc directory.

Why was I using elevated privileges for mundane work? The tarball had root-owned files in it. This is a downfall of our approach at the moment. With pkgadd format, the staged files can be owned by anyone; their final attributes are applied at installation time. This makes day-to-day maintenance much safer. Ironically, I was editing the archive because I had just created a package to replace the files I was deleting. It was almost as if the prior bad practice were vomiting on me as I exorcised it from the server.
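The reason the package doesn't care who owns the staged copy is that ownership and mode live in the prototype file, not on disk. A single made-up entry shows the idea; pkgadd applies 0755 root:bin when it installs the file, no matter which account built the package:

    f none /opt/site/bin/cleanup 0755 root bin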

Fortunately we had an excellent SA on hand to boot from CD and restore the missing file system, and the server was back in business a relatively short while later. Engineering and operations duties are segregated at my current site, so I was unable to clean up my own mess. A very humbling experience indeed, and this is what it taught me:

(1) Mirrored operating system disks are a good thing, but they don't protect you from human error propagating mistakes across both disks. While I've been a bit critical of maintaining a third contingency disk, there are similar solutions for which I now have a heightened respect.

(2) Whenever executing commands using RBAC, sudo, or the root account, count to three before hitting enter. No matter how much longer it takes to get your work done, no matter how good you are with UNIX, and no matter how long it has been since you made a mistake, counting to three will always be quicker than restoring a file system from tape.

Wednesday, October 18, 2006

Sun loves Oracle, Sun loves PostgreSQL

Sun and Oracle have announced they will work together for another ten years. Not only that, but there's a new bundle in town that includes Oracle Enterprise with Sun servers. I haven't exactly figured out what it means to have software included for free that still requires a support contract; is that still free? But there were words indicating that processor count may not be relevant, and that's probably where the savings lie. Maybe it's just saving you download time?

I don't really care about the pricing because big companies don't seem to hesitate to throw down dollars for Oracle licensing. What made this interesting to me is that Solaris, as of the 6/06 update, now includes PostgreSQL natively - and there are no catches there. If you perform a "full distribution" install you already have an RDBMS. What's more, if you want to take that database into the critical waters of the production pool, Sun will offer their world-class software support, which means the company that knows its own operating system better than anyone else will also know the RDBMS sitting on top of it. Très chic, n'est-ce pas?
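If you want to see the managed bits for yourself on a 6/06 box, a couple of quick checks will do it; the grep is deliberately loose because I won't swear to the exact package and service names across updates:

    # Which PostgreSQL packages did the full-distribution install lay down?
    pkginfo | grep -i postgres

    # Is it wired into SMF as well?
    svcs -a | grep -i postgres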

I'd imagine with Oracle's market share Sun has to play nice in the short term, but I give them a lot of credit for including PostgreSQL and picking a side. Right or wrong, in the age of mediocrity they made a decision. PostgreSQL is a phenomenal database that competes aggressively with Oracle in many venues.

I'm fascinated by what the future holds for relational databases on Solaris now that Sun has picked a side. This isn't just another open source database running on Linux farms - this battle will take place in the big data centers that Linux is only starting to scratch the surface of. I love Linux as much as the next guy, but how many sites do you know of running systems as large as an Enterprise 25K with Linux under the hood? Not too many - it's not in the heritage of the Linux kernel, at least not yet.

So, where will this take Postgres? Methinks Oracle had better keep close tabs on Postgres over the next five years or so.