Wednesday, February 28, 2007

JET: Controlling custom_files with a custom extension

Any site running Sun hardware with more than one system should be looking at JumpStart to ensure that systems can be rebuilt consistently. The corollary is that any site running a JumpStart environment should be using Sun's JumpStart Enterprise Toolkit (JET). JET provides a consistent framework for accomplishing most common tasks, and a consistent framework to write extensions within. Standards and discipline are good.

One of the modules which comes with JET is called, simply enough, custom. The custom module allows you to specify either packages or files which should be added to a server during any of N predetermined reboots. This allows you to ensure that a change which requires a reboot can be made prior to a dependent process being started. Sounds good so far.

Following a recent Solaris 9 server build I was perusing the system for problems by auditing log files. In the messages file I discovered some lines indicating that a Kerberos problem was rearing its ugly head:
Kerberos mechanism library initialization error: No profile file open.
Our site does not use Kerberos, so it had to be a recent configuration change - not surprising considering we had just updated the patch set. After some research I arrived at BugID 5020096. This bug indicates that the issue can be resolved by removing some offending lines from /etc/krb5/krb5.conf.

This should be easy enough to fix in future builds. Just add the modified krb5.conf to the JET template's custom_files variable, and we'll be in good shape. Ahh, not so fast. How will we know what the file originally contained? A true Solaris Jedi will always manage an audit trail of his activities. If I were making the change manually I would copy the file to file.orig, or file.datestamp. Automation is not an excuse for abandoning discipline.

The trouble with JET is that its custom module's functionality for installing files is limited to two operations: overwrite or append. Overwrite simply clobbers any file which may exist. For example, to install the /etc/motd file I would place my custom file in the configured JET file location, then add a line like this to the JET template:
custom_files_1="motd:o:/etc/motd"
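
Each entry is a colon-separated triple: source file, mode, destination. As a rough illustration of that layout, here is a minimal sketch that splits one entry into its fields. parse_custom_entry is a hypothetical helper written for this post, not part of JET, and fn1 is an assumed variable name chosen to sit alongside the mode and fn2 names used by the module.

```shell
#!/bin/sh
# Hypothetical helper (not part of JET): split a custom_files entry of the
# form "source:mode:destination" into three shell variables.
parse_custom_entry() {
    entry=$1
    old_ifs=$IFS
    IFS=:
    # Unquoted expansion is deliberate so that IFS splits on the colons.
    set -- $entry
    IFS=$old_ifs
    fn1=$1      # file name in the JET file area (assumed variable name)
    mode=$2     # "o" for overwrite, "a" for append
    fn2=$3      # destination path on the client
}

parse_custom_entry "motd:o:/etc/motd"
echo "source=${fn1} mode=${mode} dest=${fn2}"
```

Run as written, this prints source=motd mode=o dest=/etc/motd.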

motd is a fairly harmless little file, but knowing little about Kerberos, I didn't want to blindly whack the original file. The right solution to this problem lies in creating a simple extension to JET. I began by examining the code from the custom module. Two files within it are relevant to this project: install and postinstall. Within them is a simple case statement which handles the "o" or "a" functionality:

case ${mode} in
    a)  case ${fn2} in
            /etc/hosts) JS_merge_hosts ${filefound};;
            *)          JS_cat ${filefound} ${ROOTDIR}${fn2};;
        esac;;
    o)  JS_cp ${filefound} ${ROOTDIR}${fn2};;
esac


So, when I use an "o" in a custom_files entry, it calls JS_cp. I now needed to find the library which contains these core functions. Eventually, a colleague and I traced it back to /opt/SUNWjet/utils/lib. Looking at the JS_cp function revealed exactly what I expected: a simple copy routine wrapped in some voodoo.

Feeling a bit optimistic, I copied JS_cp to JS_cp_preserve and modified the code so it would first check whether the destination file exists and, if so, back it up with a datestamp. Once the backup was in place, the original copy operation was performed. This was trivial shell scripting. Here's what I ended up with:

if [ "$#" != "2" ]; then
    JS_error "`basename $0`: Illegal Arguments. Usage: JS_cp_preserve source destination"
fi

JS_FROM=$1
JS_TO=$2

JS_display "Copying file `echo ${JS_FROM} | sed -e \"s?^${SI_CONFIG_DIR}/??\"` to ${JS_TO}"

# Preserve any existing destination file under a datestamped name.
if [ -f "${JS_TO}" ] ; then
    datestamp="`/usr/bin/date +%Y%m%d`"
    /bin/cp -p "${JS_TO}" "${JS_TO}.jet.${datestamp}"
    case $? in
    0)  # Success
        JS_display "Successfully preserved ${JS_TO}.jet.${datestamp}"
        ;;
    *)  # Any non-zero exit status is a failure
        JS_display "WARNING: Failed to preserve original file ${JS_TO}"
        ;;
    esac
fi

# Install the new file over the destination.
/bin/cp -p "${JS_FROM}" "${JS_TO}"

if [ "$?" != "0" ]; then
    JS_error "JS_cp_preserve:\t\tError occurred while copying ${JS_FROM} to ${JS_TO}"
fi
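
One thing to watch with a date-only stamp is that a second run on the same day writes to the same backup name, which would replace the true original with the already-modified copy. A possible variation is sketched here outside JET. The function name preserve_copy is hypothetical and plain echo stands in for JS_display. It skips the backup when one already exists for that day:

```shell
#!/bin/sh
# Hypothetical stand-alone variant of the backup-then-copy idea.
# preserve_copy SRC DEST: keep the first backup made on a given day.
preserve_copy() {
    src=$1
    dest=$2
    if [ -f "$dest" ]; then
        backup="${dest}.jet.`date +%Y%m%d`"
        if [ -f "$backup" ]; then
            echo "Backup ${backup} already exists, leaving it alone"
        else
            cp -p "$dest" "$backup" || echo "WARNING: Failed to preserve ${dest}"
        fi
    fi
    cp -p "$src" "$dest"
}

# Demonstration in a scratch directory.
work=/tmp/preserve_demo.$$
mkdir "$work" || exit 1
echo "original" > "$work/krb5.conf"
echo "workaround" > "$work/krb5.workaround"
preserve_copy "$work/krb5.workaround" "$work/krb5.conf"
preserve_copy "$work/krb5.workaround" "$work/krb5.conf"
cat "$work"/krb5.conf.jet.*     # the backup still holds the original contents
rm -rf "$work"
```

Whether this matters during a fresh Jump is debatable, since the client is rebuilt from scratch, but it is cheap insurance if the same routine is ever reused on a live system.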


Next, I returned to the install and postinstall code, and modified the case statements to accept a "b" operation (b for backup). I then executed a test Jump and was very pleased to see my JET extension had worked! I can now have custom_files install the workaround krb5.conf, and maintain a backup of the original. Here's the modified code:

case ${mode} in
    a)  case ${fn2} in
            /etc/hosts) JS_merge_hosts ${filefound};;
            *)          JS_cat ${filefound} ${ROOTDIR}${fn2};;
        esac;;
    o)  JS_cp ${filefound} ${ROOTDIR}${fn2};;
    b)  JS_cp_preserve ${filefound} ${ROOTDIR}${fn2};;
esac


Note that you need to make this modification in both /opt/SUNWjet/Products/custom/install and postinstall.

Now all I need to do is specify something like this in the template's custom_files variable:
custom_files_N="krb5.workaround:b:/etc/krb5/krb5.conf"

And I will get a clean backup of the original file. Such a simple tweak - I hope the Sun folks who maintain JET will add something similar. While some limitations of JET can be frustrating, its intuitive layout and ease of extension make it something I grow more fond of each time I use it.

Thursday, February 01, 2007

Frustration with Solaris Packages

I have a love / hate relationship with Solaris packaging. When you need to crank out a simple package I find it much easier to deal with than RPM. I also like the simple file-system or stream-based structure vs. the binary format of RPM. All things considered, it gets the job done, and has been a tremendous help in standardizing our provisioning system. There are, however, a few things that I'm not crazy about.

In Sun's model, the package is used to release functionality while the patch is a vehicle to fix existing functionality. If I want to add feature X to my software, I need to release a new package version. In contrast, if feature X is broken in a package then I need to release a patch. Seems simple on the surface.

One place this model gets sketchy is if I have the following situation: Package FOO needs to be updated to a new revision, but package BAR depends on it, and is required for system operations. In this case I need to first remove package BAR, then update package FOO, and finally, reinstall package BAR. In my mind this causes an unjustified level of system disruption. An RPM or dpkg based system would use an update option to perform this in-place. I'm told that there's an "in place upgrade" capability in the Solaris packaging system, but I haven't yet discovered it or found it documented. I will be looking though.

I have also noticed documentation gaps in the use of patches. Sun does provide instructions on how to produce a patch-package, but they omit naming conventions. Clearly, it would be a bad thing to produce patch 123456-01 and then have Sun release one with the same ID. This conflict could be very disruptive to a patch process. It seems that by selecting an upper range (e.g. 90001-01) you can have safety similar to selecting a 10.0.0.0 network address. I'd feel quite a bit better if Sun would explicitly define this range so we'd know it was safe. In the interim, I've been fixing bugs by creating minor revisions of packages rather than using patches.

The last point I wanted to touch on in this article is the use of package prototypes. In packaging nomenclature, a prototype file is the list of files included in the package, and their ownership and permission attributes. Here's an example of a prototype I'm currently working on for a custom sendmail solution:

d none etc 0755 root sys
d none etc/mail 0755 root mail
f none etc/mail/foo-client-v10sun.cf 0644 root bin
f none etc/mail/foo-server-v10sun.cf 0644 root bin
d none usr 0755 root sys
d none usr/lib 0755 root bin
d none usr/lib/mail 0755 root mail
d none usr/lib/mail/cf 0755 root mail
f none usr/lib/mail/cf/proto.m4 0444 root mail
f none usr/lib/mail/cf/foo.m4 0644 root mail
f none usr/lib/mail/cf/foo-client-v10sun.mc 0644 root mail
f none usr/lib/mail/cf/foo-server-v10sun.mc 0644 root mail


Pay particular attention to what I call placeholder lines. Those are lines in the prototype referring to directories which this package depends on, but which really belong to another package by virtue of already being registered. Of course, a directory like /usr is chock-full of nested package dependencies:

# grep ^/usr /var/sadm/install/contents | head
/usr d none 0755 root sys FJSVvplu SUNWctlu SUNWcsr TSBWvplu SUNWocfd SUNWncft SUNWGlib SUNWgcmn SUNWGtku SUNWctpls SUNWxwdv SUNWpl5u SUNWcpp FJSVcpc SUNWopl5p FJSVcpcx FJSVmdb FJSVmdbx IPLTadman SUNWowbcp SUNWpamsc SUNWpamsx SUNWpcmcu IPLTdsman SUNWadmj SUNWmcdev SUNWjsnmp SUNWtftp SUNWbsu SUNWpd SUNWsckmu SUNWpdx SUNWpiclh SUNWuxflu SUNWuxfl1 SUNWeurf SUNW1251f SUNWuxfl2 SUNWuxfl4 SUNWuxfle SUNWmgapp SUNWrmui SUNWpiclx SUNWpl5p SUNWTcl SUNWjpg SUNWTiff SUNWTk SUNWaccu SUNWaclg SUNWadmap SUNWpng SUNWpool SUNWpoolx SUNWant SUNWrcmdc SUNWpppd SUNWpppdu SUNWpppdt SUNWpppdx SUNWpppg SUNWfns SUNWsadml SUNWapct SUNWascmn SUNWasac SUNWqosu SUNWjaf SUNWjmail SUNWxsrt SUNWxrgrt SUNWxrpcrt SUNWiqfs SUNWiqjx SUNWiqu SUNWiquc SUNWiqum SUNWjaxp SUNWasu SUNWasdem SUNWrmodu SUNWrmwbx SUNWrpm SUNWrsg SUNWfnsx SUNWrsgx SUNWdfbh SUNWsadmi SUNWi15cs SUNWsadmx SUNWi1cs ... (lines omitted)


That was a tiny fraction of the list...

I don't think there is anything wrong with declaring a package as being dependent on a pre-existing directory, but I have a problem with how easy it is for a new package to overwrite the intended attributes of that directory. Note that in my custom package's prototype I need to declare the attributes for /usr. This typically means that I need to look at a clean operating system on the platform I intend to deploy on (i.e. a consistent Solaris revision) and pick the attributes from there.
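
Picking those attributes by hand is error-prone, so here is a rough sketch of pulling them from the contents database instead. It assumes directory entries follow the layout seen in the /usr line above (path, type, class, mode, owner, group, then the owning packages). proto_line_for_dir is a hypothetical helper, and the demonstration runs against a one-line sample file rather than a real system.

```shell
#!/bin/sh
# Sketch: turn a directory entry from the package contents database into a
# prototype-style line. Note: the stock Solaris /usr/bin/awk does not support
# -v, so nawk or /usr/xpg4/bin/awk would be needed there.
proto_line_for_dir() {
    # $1: absolute directory path, e.g. /usr/lib
    # $2: contents file (optional; defaults to the Solaris location)
    contents=${2:-/var/sadm/install/contents}
    awk -v dir="$1" '
        $1 == dir && $2 == "d" {
            # contents fields:  path type class mode owner group packages...
            # prototype fields: type class relative-path mode owner group
            print $2, $3, substr($1, 2), $4, $5, $6
            exit
        }' "$contents"
}

# Demonstration against a one-line sample in the contents format.
sample=/tmp/contents.sample.$$
echo "/usr d none 0755 root sys SUNWcsr SUNWctlu" > "$sample"
proto_line_for_dir /usr "$sample"
rm -f "$sample"
```

For the sample line this prints d none usr 0755 root sys, which matches the placeholder entry in the prototype above. Pointing the second argument at a contents file copied from a clean reference build keeps the attributes tied to the intended Solaris revision.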

I'd like to see the packaging facility accept a prototype entry that has no attributes, and instead inherit the attributes from the package which initially registered the directory. This would minimize the chances of stray patches and packages conflicting with intended system permissions.

Having spent all this time complaining, let me end on a positive note by reinforcing how much efficiency we have gained by moving from tarballs and custom scripts to version controlled packages. I'd do it again in a heartbeat. I'm hoping Jedi discipline will eventually reverse the chaos inherent to the current packaging architecture.