Tuesday, May 30, 2006

Testing for correct usage in shell functions

Here's a simple touch you can apply to your shell scripts to aid in debugging when they grow monstrous and you can't remember the syntax of all your subroutines any better than you can remember the 10th digit of Pi (which happens to be 3, for those who care about such things).

Although not strictly required to take advantage of this tweak, I recommend you begin by using a good header for each subroutine. I won't go into every entry, but a specific one I always make is usage. For example, if a subroutine do_foo takes arguments arg_one and arg_two, the header would look like this:

# ------
# do_foo
# ------
# USE: do_foo ARG_ONE ARG_TWO
# DESC: Execute foo functionality
# PRE: na
# POST: na
# ERR: na
do_foo () {
    ...
    ...
} #end do_foo


The line I want you to pay attention to in the above code begins with "USE:" (4th line). This line specifies the interface that a user of your code should be aware of: you are telling them that this code expects TWO arguments. You can get fancy and use EBNF-like syntax to identify optional arguments (see the one-liner below), but let's keep it simple for this example and just recognize that we have established an interface.
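
For instance, a hypothetical optional second argument is often documented with square brackets, like this:

# USE: do_foo ARG_ONE [ARG_TWO]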

What can we do as developers to make sure that when someone calls our code, they do not get something unexpected? We can check to make sure they follow our instructions. It's simple enough, although you can certainly take it to greater depths. Let's go back to our do_foo example and put a check in place...

do_foo () {
    test $# -eq 2 || exit 1
    ...
    ...
} #end do_foo


Let's break down the line I just added... test lives in /usr/bin and should be a fluent part of your shell vocabulary. We are "testing" to see if the number of arguments ($#) is equal to the integer 2. If not (the || means "or else"), we exit with a non-zero status, which is the UNIX convention for anything other than success. The next level of effort would be writing a shell equivalent of Perl's die subroutine, which would allow an error message to accompany the exit. We'll save the full treatment for another article.
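
In the meantime, here's a minimal sketch of what such a helper might look like; the name die and its calling convention are my own, not a standard utility:

# ---
# die
# ---
# USE: die EXIT_CODE MESSAGE...
# DESC: Print MESSAGE to stderr, then exit with EXIT_CODE
die () {
    _code=$1
    shift
    echo "ERROR: $*" >&2
    exit $_code
}

do_foo () {
    test $# -eq 2 || die 1 "do_foo expects 2 arguments, got $#"
    ...
} #end do_foo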

So, what's the benefit of adding this code-bloat to our subroutine? It's common to have a function that takes optional arguments and acts differently depending on which arguments it receives. If the function expects ARG_ONE and ARG_TWO, and you call it with only ARG_ONE, it may assume that ARG_TWO is equal to "". In that case, the output may be "object not found" rather than "Whoa! You made a mistake calling me!". If you were depending on a specific output, this could cause later code blocks to break.

Here's a more specific example. If we are using the ldaplist command to check on project information, we get two totally different sets of output if we omit the second argument. Pay particular attention to the command and arguments in the examples below:

testbox# ldaplist project
dn: solarisprojectname=srs,ou=projects,dc=mydomain,dc=com
dn: solarisprojectname=bar,ou=projects,dc=mydomain,dc=com
dn: solarisprojectname=foo,ou=projects,dc=mydomain,dc=com
dn: solarisprojectname=group.staff,ou=projects,dc=mydomain,dc=com
dn: solarisprojectname=default,ou=projects,dc=mydomain,dc=com
dn: solarisprojectname=noproject,ou=projects,dc=mydomain,dc=com
dn: solarisprojectname=user.root,ou=projects,dc=mydomain,dc=com


In contrast, what we REALLY wanted was only one line that matches our criteria, not the whole set of data.

testbox# ldaplist project solarisprojectname=user.root
dn: solarisprojectname=user.root,ou=projects,dc=mydomain,dc=com


If we use an argument checker, the error would be caught immediately rather than passing a long list of irrelevant data on to whatever we do next. In this case it's particularly ugly because both outputs are identically formatted. Maybe you'd find the problem quickly, maybe you wouldn't.

When your code grows to be hundreds of lines long and you need to start debugging obscure behavior, it can save you a lot of time to write self-policing code. Chances are that if you make a simple mistake calling that subroutine, it will fail immediately rather than doing the wrong thing in a hard-to-find way. A line of prevention is worth an hour of debugging!

Thursday, May 25, 2006

Using syslog with Perl

I recently had occasion to write a fairly simple Perl script that checks for rhosts files in any home directory configured on a system. Nothing fancy, but very useful. After getting through the file-detection logic I was left with the question: what now? Should I write a custom log file? Should I call /usr/bin/logger?

As always, I looked for precedents and standard facilities. The first thing that came to mind was syslog. And of course, the fact that I was using Perl led me to believe that I wasn't going to need to execute an external process (the "duct tape hack" as I call it). I view the shell as another language, and something never really feels right when I need to embed one language within another. Don't even get me started about embedding big awk scripts inside shell scripts... That's going to be a future topic.

The duct tape method is bad for a number of reasons. There is overhead associated with forking and executing a new child process from your main script. If you are running awk, sed, or other tools thousands or millions of times against a file, you are forcing Solaris to execute far more system calls than necessary. By keeping it all inside Perl and using modules, you let the interpreter do the work and realize a good part of the efficiency that C system programming gives you. I'll save the specifics for another time - we need to dig into the syslog example.

In this case I quickly found the standard Sys::Syslog module. This little gem makes it a snap to log output. I won't go into the Solaris syslog facility here, but suffice it to say that you'll need to settle on your intended Facility and Priority before going further. For my purposes I went with user and LOG_NOTICE.

To begin with, we need to include some libraries...

use Sys::Syslog;


When we want to set up the connection with syslog we do the following:

openlog($progname, 'pid', 'user');


The above line specifies that we will use the 'user' facility, which is typically what you should use if you don't have a specific reason to go with one of the other options. It also specifies that we want to log the pid of the logging process with each entry. Logging the pid is a convention that isn't always necessary, but I like it. The first argument, $progname, is a variable that stores the name of the script. This deserves a little extra attention.

Since I'm known to change the names of my scripts on occasion, I don't like to hard-code the name. In shell scripts I usually set a progname variable using /usr/bin/basename with the $0 argument. $0 always contains the name the script was invoked as - element zero of the command line. So, if I called a script named foo with the arguments one, two, three, the command would look something like this:

# /home/me/foo one two three

The resulting command-line vector would be ($0 holds element 0, while the positional parameters $1 through $3 hold the arguments - note that $* expands to only the arguments, not the command name):

[0:/home/me/foo][1:one][2:two][3:three]

To identify our program name we want that first element. However, we don't want all the extra garbage of the path - it makes for a messy syslog. The basename UNIX utility helps us prune the entry. Here's an example in shell:

$ basename /home/me/foo
foo
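
In a shell script, setting the variable is the one-liner I mentioned, placed near the top of the script:

progname=`/usr/bin/basename "$0"`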

If we want to do the equivalent in Perl without spawning an external process we can use the File::Basename module. Again, with a simple include at the top of our script this function becomes available to us:

use File::Basename;

Now we can put it all together and create an easily referenced identity check:

my $progname = basename($0);


Why don't we just hard-code the script name? After all, not everyone likes to refactor their code for fun. Besides the idea that we want our code to be maintenance-free, there are times when one body of code may be called through links which have different names than the primary script. For example, let's assume that the script foo performs three functions: geta, getb, and getc. To make these functions easier to call, we want to be able to invoke them directly without duplicating code. Here's how we could do that:

# ls -l ~/bin
-r-xr-xr-x 1 root root 5256 Jun 8 2004 foo
# ln ~/bin/foo ~/bin/geta
# ln ~/bin/foo ~/bin/getb
# ln ~/bin/foo ~/bin/getc

We can now call any of geta, getb, or getc and actually invoke foo. With some simple logic blocks based on what $progname evaluates to, we can create a convenient interface to a multi-function program with centralized code (see the sketch below). Nice! But I digress - let's get back to looking at syslog...
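
Here's a minimal sketch of what that dispatch logic might look like in Perl; the do_geta, do_getb, do_getc, and usage subroutines are hypothetical placeholders:

use File::Basename;

my $progname = basename($0);

# Branch on the name we were invoked by: foo, geta, getb, or getc
if    ($progname eq 'geta') { do_geta(); }
elsif ($progname eq 'getb') { do_getb(); }
elsif ($progname eq 'getc') { do_getc(); }
else                        { usage(); }   # called as foo, or an unknown link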

We have opened a connection to the syslog, and now is the moment of truth. Let's write a syslog entry...

syslog($priority, $msg);


Let's recap... I used a facility of user and a priority of notice. I want to record the pid and write a message. What does this look like when it's executed?

May 25 11:01:25 testbox rhostck[833]: rhosts file found at /u01/home/cgh


That was really easy, and it's much cleaner than executing the external logger utility because it all stays inside Perl.
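
For reference, here's the whole logging skeleton in one place - a minimal sketch with the rhosts-detection logic omitted and the message hard-coded for illustration:

#!/usr/bin/perl
use strict;
use warnings;
use File::Basename;
use Sys::Syslog;

# Identify ourselves by whatever name we were invoked as
my $progname = basename($0);

# Facility 'user'; log the pid with each entry
openlog($progname, 'pid', 'user');

# Priority 'notice'; the message would normally come from the detection logic
syslog('notice', 'rhosts file found at %s', '/u01/home/cgh');

closelog();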

Tuesday, May 23, 2006

A plethora of ldapsearches...

If you're going to deploy a directory service for Solaris systems, and you are really lucky, your server and clients will all be using a Solaris version greater than 9. LDAP works nicely in 9, but it's a bit of a transition release; only in Solaris 10 is Sun's commitment to LDAP clear. Let's take a look at one of the more frustrating examples of Solaris 9's transitional status: the ldapsearch command.

ldapsearch comes in many different flavors. First is the native Solaris version which lives in /usr/bin. On Solaris 9 this version does not support SSL (-Z option). In Solaris 10 SSL is nicely supported through this client. Next we have the iPlanet flavor which lives in a dark and gloomy path: /usr/iplanet/ds5/shared/bin. This is installed by default with Solaris 9 and happily supports SSL despite its gloomy path. But wait, there's still one more! After installing the JES Directory Server you will find one more flavor of ldapsearch living in /usr/sadm/mps/admin/v5.2/shared/bin. Now that's an intuitive path. This last flavor will only be on your server, but I'd hate to leave it out of the fun.

As if having too many to choose from isn't enough, two of the ldapsearch flavors require the LD_LIBRARY_PATH variable to be set properly. When a dynamically linked binary requires a library that lives somewhere other than the system default locations (usually /usr/lib and its variants), it needs LD_LIBRARY_PATH to tell the runtime linker where to look.

Here's an example of a binary that needs the extra help from LD_LIBRARY_PATH:

testbox$ /usr/sadm/mps/admin/v5.2/shared/bin/ldapsearch
ld.so.1: ldapsearch: fatal: libldap50.so: open failed: No such file or directory
Killed

So what happened? Let's take a closer look...

testbox$ truss /usr/sadm/mps/admin/v5.2/shared/bin/ldapsearch
execve("/usr/sadm/mps/admin/v5.2/shared/bin/ldapsearch", 0xFFBFFB54, 0xFFBFFB5C) argc = 1
resolvepath("/usr/lib/ld.so.1", "/usr/lib/ld.so.1", 1023) = 16
resolvepath("/usr/sadm/mps/admin/v5.2/shared/bin/ldapsearch", "/usr/sadm/mps/admin/v5.2/shared/bin/ldapsearch", 1023) = 46
stat("/usr/sadm/mps/admin/v5.2/shared/bin/ldapsearch", 0xFFBFF928) = 0
open("/var/ld/ld.config", O_RDONLY) Err#2 ENOENT
stat("../libldap50.so", 0xFFBFF430) Err#2 ENOENT
stat("../lib/libldap50.so", 0xFFBFF430) Err#2 ENOENT
stat("../../lib/libldap50.so", 0xFFBFF430) Err#2 ENOENT
stat("../../../lib/libldap50.so", 0xFFBFF430) Err#2 ENOENT
stat("../../../../lib/libldap50.so", 0xFFBFF430) Err#2 ENOENT
stat("../lib-private/libldap50.so", 0xFFBFF430) Err#2 ENOENT
stat("/usr/lib/libldap50.so", 0xFFBFF430) Err#2 ENOENT
ld.so.1: ldapsearch: fatal: libldap50.so: open failed: No such file or directory
write(2, " l d . s o . 1 : l d a".., 81) = 81
lwp_self() = 1

Here we can see Solaris trying to find the required dynamically linked library, libldap50.so. It checks seven locations, each time getting back ENOENT - the errno that, despite its cryptic name, simply means "No such file or directory". So, job #1 is finding that library and acquainting it with the binary that's lost its way...
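
(Incidentally, ldd gives you the same answer with less noise: it lists each dependency and whether the runtime linker can resolve it, with unresolvable ones flagged as "file not found". The output below is abridged and illustrative:)

testbox$ ldd /usr/sadm/mps/admin/v5.2/shared/bin/ldapsearch
        libldap50.so =>  (file not found)
        ...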

testbox$ grep libldap50.so /var/sadm/install/contents
/usr/appserver/lib/libldap50.so f none 0755 root bin 380348 45505 1052289104 SUNWasu
/usr/dt/appconfig/SUNWns/libldap50.so f none 0755 root sys 450716 23095 1032825102 SUNWnsb
/usr/iplanet/ds5/lib/libldap50.so f none 0755 root bin 361976 55632 1013353620 IPLTdsu
/usr/lib/mps/libldap50.so f none 0755 root bin 392416 44988 1100692806 SUNWldk
/usr/lib/mps/sparcv9/libldap50.so f none 0755 root bin 433976 29179 1100692807 SUNWldkx

In this case, we know that the needed library is going to be used with the JES ldapsearch, so we'll guess that appserver's offering isn't quite what we want. /usr/iplanet looks tempting, and will probably work, but what we want is the /usr/lib/mps directory which is distributed with the Sun LDAP C SDK.

So now that we've found the missing library, let's plug its directory into LD_LIBRARY_PATH and see what happens. I'm using the Korn shell, so if you're a C-shell type you'll just have to translate on the fly.

testbox$ export LD_LIBRARY_PATH=/usr/lib/mps:/usr/lib/mps/sasl2
testbox$ sudo /usr/sadm/mps/admin/v5.2/shared/bin/ldapsearch [...]
version: 1
dn: dc=foo,dc=com
objectClass: top
objectClass: domain
objectClass: nisDomainObject
dc: apps
nisDomain: foo.com

It worked! (You didn't doubt me, did you?) You may have noticed that I actually added two paths. After fixing the first missing library you would have discovered a second missing one, which was identified and fixed the same way. I love a problem with multiple layers... especially when layer #2 has the same solution I needed to peel layer #1. The other thing to note is that I abridged the command line for ldapsearch. Executing an LDAP query with SSL can be like writing a book, so I cut it short.

So, not only do you need to pick the right ldapsearch flavor, you also need to set LD_LIBRARY_PATH accordingly. If you are using the Solaris native versions you don't need to do anything, but for the JES and iPlanet versions, here's what you need:

  • iPlanet: LD_LIBRARY_PATH=/usr/lib/mps

  • JES: LD_LIBRARY_PATH=/usr/lib/mps:/usr/lib/mps/sasl2
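
If you get tired of setting that by hand, a trivial wrapper script does the trick. Here's a sketch for the JES flavor; the wrapper name is my own invention:

#!/bin/ksh
# jes-ldapsearch: run the JES ldapsearch with its libraries in place
LD_LIBRARY_PATH=/usr/lib/mps:/usr/lib/mps/sasl2
export LD_LIBRARY_PATH
exec /usr/sadm/mps/admin/v5.2/shared/bin/ldapsearch "$@"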


So which one should you use? Here's a quick flow for making that decision. If you are using Solaris 10, just go with /usr/bin/ldapsearch; it does everything without any hassle. If you are on 9, a decision emerges: if you have an SSL-secured directory server you cannot use /usr/bin/ldapsearch. Typically you will use the iPlanet version on Solaris 9, and if you are on the server itself, go with the JES version.
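
That flow is simple enough to encode in a few lines of shell. Here's a hypothetical helper along those lines; the script name and the JES-if-installed preference on Solaris 9 are my own choices:

#!/bin/ksh
# pick-ldapsearch: print the ldapsearch flavor to use on this host
case `uname -r` in
    5.10) echo "/usr/bin/ldapsearch" ;;
    5.9)  if [ -x /usr/sadm/mps/admin/v5.2/shared/bin/ldapsearch ]; then
              # We're on the Directory Server itself; use the JES flavor
              echo "/usr/sadm/mps/admin/v5.2/shared/bin/ldapsearch"
          else
              echo "/usr/iplanet/ds5/shared/bin/ldapsearch"
          fi ;;
    *)    echo "unsupported release" >&2; exit 1 ;;
esac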

So there you have it: a lot of hassle can be saved by deploying on Solaris 10 rather than 9. Most of what you'll need the Directory for will be handled by Solaris internals, so you won't need ldapsearch to, for example, authenticate users against the Directory Server. Where you will need ldapsearch is when you are storing custom entries in the directory or executing a special query against it.

Monday, May 22, 2006

Solaris Naming Services: The good, the bad, and the ugly

One of the first things that needs to be done in streamlining an Enterprise is establishing a centralized naming service. Of course there are some fringe reasons not to, but the bottom line is that the IT world is continuously being asked to do more with less. If you need to manually update the passwords for 100 users on 100 systems once each month you are either a great candidate for life in a Skinner box or you are failing to do more with less.

My experience has always been that the risk of someone propagating a catastrophic change across a data center is much lower than that of someone making a subtle manual error that goes undetected for months until it rears its ugly head. When you know that your finger hovers over the big red button that controls the fate of your buddy's on-call pager, you get religion fast and become very careful about what you propagate. Centralization and automation are good; manual effort is bad. We've established that we want to do this, so what is a Solaris site to do next?

There are a few naming services to choose from. In our case, common choices include file propagation, NIS, NIS+, and LDAP. Within LDAP there are a few different implementations now that the industry has begun to move in that direction: Microsoft's Active Directory, OpenLDAP, and Sun's Java Enterprise System Directory Server. Again, going by experience, the best path is usually the one which aligns with your primary vendor's offerings. If you're running a Windows shop, use AD. If you're running a Solaris shop, use Sun's Directory Server. If you run both, it's a different question altogether, but I would be inclined to create a layered solution involving both AD and Sun DS. But I digress. Let's take a look at the options...

File propagation isn't as bad as it sounds, unless you are doing it manually. There are many bolt-on solutions for implementing the functionality of rdist. It can be done over SSH, and with a surprising degree of control. You can also use SSH with some script-fu to push scripts to be executed, or use a product like CFEngine. While not a bad solution, it tends to be heavier in maintenance and integration time. It's also somewhat limited: you can push, but you aren't really centralizing. Systems inevitably get left out or miss pushes, which results in more custom coding to build queues, and pretty soon you're on your way to reinventing a management framework. Ugh. File pushing is a good solution within its scope, but I'm not a fan when it comes to a general centralized administration scheme.

NIS is a grand old favorite. The environment I cut my teeth on included a Sparc 10 workstation hooked up to a QIC tape drive, acting as a NIS server for a building full of Sparc IPX workstations. I found it fairly easy to manage, rock solid, and generally a great technology. Unfortunately, it also included a whole new world of security gaps that, while acceptable for a non-critical workstation environment, were totally inappropriate for a world-class data center housing mission-critical data. Even if it weren't inherently insecure, Sun has announced the EOL of NIS, so the end is near. Given that, I suggest that NIS is probably not the best technology to invest in these days, although Linux has a good implementation that will allow it to continue for some time.

Our next option is NIS+. Have you actually supported a NIS+ environment? Yuck. After the joy of NIS I had expected a simple evolution, but it felt more like assuming that if you speak English you can speak Russian. While it fixed the gaps in NIS security, it was always annoying trying to figure out which key was broken and how to get it initialized again. Although Sun has not EOL'd NIS+ yet, it's clear that its momentum is not being maintained. The writing is on the wall: NIS+ again proves to be a non-starter.

Finally we arrive at LDAP, and for the reasons above, we look at Sun's Directory Server. I'm currently working on a Directory infrastructure for a large site, and have been both impressed and frustrated with the product. I'll be explaining this dichotomy in more depth in future installments, but a few months into the project, I'm confident that I made the right recommendation.

Why? First of all, the updates for this product come through the same channel as all the other Sun updates. Keeping track of multiple product update sites is a non-value-add activity. Second, LDAP is insanely flexible. You can store just about anything in a directory, and access it via shell script, OS client, Perl script, and more. Why is this cool? It's much easier than storing everything in an Oracle database, because you won't need a new certification in relational databases to take advantage of this piece of your infrastructure; a systems engineer or advanced admin can do everything they need. Third, DS is part of the Java Enterprise System. JES is a stack of middleware products which are all nicely integrated with the OS and support. By giving us a stack of common, world-class components, Sun has given systems engineers the opportunity to bring deeper application awareness into their scope with an easy product to standardize on.

And the final reason to go with DS and JES: it's free to download and use. Before you put it into production, USE it. Put it on your x86 machines, put it on your Sparc machines. Hack it, tweak it, learn it, know it, master it before you go live. Sun has just leveled the playing field: now you can learn a world-class product without capital investment, and Sun's on-line documentation is second to none. One last note: after you've enjoyed Sun's generous new distribution model, buy a production support contract or JES subscription. Don't go into production with unsupported software. You know better, and your customers expect that you'll advise them to do the right thing. Besides, I'm sure you didn't go into production with an unsupported Linux server, right? After all, you were too smart to fall for that whole "Linux is free" story, weren't you? Nothing is free.

And for my final act, I'd like to mention the paradox that continues to twist my mind into little pretzels. NIS and NIS+ were OS-integrated components, free parts of a great Operating Environment. Sun has always been about large-scale network infrastructure, and the integrated NIS servers echoed that intent. With the JES Directory Server, Sun has gone beyond site-scale naming services with a product that can store just about anything and scale to your wildest dreams. Think consolidation of eBay and Amazon identities, and DS isn't breaking a sweat - it's that good. But in the meantime, there are a lot of sites that need a centralized naming service for tightly secured data centers that will never share identities with the outside world, and never cross 1000 users. That's like having a Ferrari that never gets out of first gear.

What fills the gap between 100 users and infinity? Fortunately, from a technical perspective, JES DS scales in nice increments and can be deployed for a single site's centralized naming services on reasonably sized hardware. But there's a catch: support will cost you. Even if all you are doing is supporting native Solaris clients, DS requires a paid software support contract. Sun, what are you thinking here? Kill NIS and NIS+, tell everyone to go to LDAP, and then neglect to include it in Operating System support? Crazy! Native OS client support should be included with Solaris, just as Active Directory is included with Windows Server support. Charge us more money when we ask for support beyond native clients, but don't make us pay for what you provided for free with NIS and NIS+.

JES is a great product, and I'm glad we're using it in our project, but I was really disappointed when I discovered this Dilbertian marketing twist. There's still hope: SRM used to be an expensive add-on, and it's now part of the OS. Given enough time, Sun usually does the right thing.

In the beginning there was SunOS...

I'm enjoying a renewal of enthusiasm for Solaris, and this posting marks the beginning of a blog tracking the things I'm working on and exploring. I've used Solaris professionally for more than ten years, from both Systems Administration and Systems Engineering perspectives. Six of those years I spent in various roles at Sun Microsystems, a journey I'm grateful for, as I learned more in those years than I would have elsewhere.

For a significant portion of those six years I was increasingly frustrated by how much ground Linux gained on Solaris' lead. Linux has its place, and I'm certainly not anti-Linux or anti-OSS; in fact, quite the opposite. I've been tinkering with Linux since 1992 and use it regularly at a not-for-profit site where I volunteer. For many years it was my primary workstation OS (Fedora / Red Hat), so I've spent years observing Linux in both server and workstation roles.

I'm no longer using the Linux workstation which served me the past few years. I've fallen in love with Mac OS-X and never looked back. I don't expect to write much about Mac OS because in itself it doesn't stand out. That's the whole point - after using my Mac I talk more about what I've done than how I did it. Interesting paradigm shift.

With the advent of Solaris 10, I firmly believe Solaris has taken another evolutionary leap, one which seems to leverage the best of both the open and closed source development models. I think this evolution is important, as it shows what can be done when extremism is curbed for practicality. Solaris 10 found a sweet spot which I hope to write more about.

I'm now working as a consultant for a large UNIX shop in my area and working on some exciting technologies. My primary interest is in large scale UNIX infrastructure, so expect to see a lot of content focused on things like Directory Server, Consolidation, Automation, Resource Management, RBAC, and much more.

So there you have it... A brief introduction to me, and the intended content of this blog. I've been notoriously lousy at keeping blogs up to date, but not for lack of good intent.