Thursday, November 20, 2008

Kerberos and the SCSECA Curriculum

I remember when I first took the Sun Certified Network Administrator (SCNA) exam back in the Solaris 7 days, and I was frustrated by the depth of NIS / NIS+ content. NIS was widely used back in the day, and fairly intuitive. NIS+, however, was a bit of a niche, and its use dropped off like a rock in the Solaris 7 era. I think people really failed to enjoy all those key exchanges and the inherent troubleshooting.

Long after NIS and NIS+ services were deprecated by the coming promise of LDAP, their place in the curriculum was maintained. But of course, I learned it and passed the exam. Having recently passed the SCNA again for Solaris 10, I was pleased with its content. I was convinced that Sun had brought the canon into the modern era. Good stuff. But just when I thought it was safe...

I'm now finishing up my prep for the Sun Certified Security Administrator (SCSECA) and am finding myself frustrated by the presence of Kerberos on the SCSECA test curriculum.

Will the number of sites using Kerberos please raise their hands? Ah ha! We now know the answer to the question, "What is the sound of one hand clapping?". Ok, it's more than one, I know. It's not very many though... I'm really hoping that when I sit down to the test the questions are written to a depth proportional to the installed base.

I think there's a lot of great content that could be included on a Solaris security exam in place of esoteric solutions like Kerberos. I'd like to see the bulk of the SCSECA content focus on an SA's ability to implement and evaluate the impact of the various checks in the CIS Solaris 10 Benchmark. The key, of course, is "evaluate" more than "implement." I'm amazed at how many people flip through checklists without understanding the implications of these reconfigurations, and I think the SCSECA content is a great opportunity to fix that problem.

But that's ok. I'll brush up on my Kerberos and maintain my historical acumen.

Tuesday, September 23, 2008

Capturing output from format

Ever need to capture the output of the format command for further processing in a shell or Perl script? It's fairly simple to do, but the command's behavior is a bit counter-intuitive and makes for an interesting case.

When you run the format command it lists the disks, then issues a prompt asking you to select one of the enumerated devices. It does not provide an option for exiting the command at that point. So, we need to appease this interface oddity by passing a "0" to the command, which will arbitrarily select the first disk from the list. This should work in any case except a diskless client.

The format command reads its input from a file descriptor known as STDIN, or standard input. The way we queue up entries on STDIN is with the good old echo command. Altogether it looks like this:

root@testbox# /usr/bin/echo 0 | /usr/sbin/format
Searching for disks...

0. c1t0d0
1. c1t1d0

Specify disk (enter its number): selecting c1t0d0
[disk formatted]
/dev/dsk/c1t0d0s0 is part of SVM volume stripe:d10. Please see metaclear(1M).
/dev/dsk/c1t0d0s1 is part of SVM volume stripe:d11. Please see metaclear(1M).
/dev/dsk/c1t0d0s5 is part of SVM volume stripe:d15. Please see metaclear(1M).
/dev/dsk/c1t0d0s7 contains an SVM mdb. Please see metadb(1M).

disk - select a disk
type - select (define) a disk type
partition - select (define) a partition table
current - describe the current disk
format - format and analyze the disk
repair - repair a defective sector
label - write label to the disk
analyze - surface analysis
defect - defect list management
backup - search for backup labels
verify - read and display labels
save - save new disk/partition definitions
inquiry - show vendor, product and revision
volname - set 8-character volume name
!<cmd> - execute <cmd>, then return

The problem with this is that it captures more than we want in the output. We don't need a menu, and we don't need to know about selecting c1t0d0 since that's already enumerated in the first disk list. To edit this stream of text, we'll need a stream editor... Can you guess what it's called? Sed. Let's modify the command to squelch out some of the noise.

root@testbox# /usr/bin/echo 0 | /usr/sbin/format 2>&1 | sed -e '/^Specify disk/,$d'

0. c1t0d0
1. c1t1d0


That's better!
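If you want to see the sed range-delete at work without a Solaris box handy, here's a small stand-in sketch: the printf merely simulates format's output (the disk names are invented), so only the sed expression is doing anything interesting.

```shell
# everything from the "Specify disk" prompt to end-of-stream gets deleted;
# the lines before it (the disk enumeration) pass through untouched
disklist=$(printf 'Searching for disks...\n0. c1t0d0\n1. c1t1d0\nSpecify disk (enter its number): selecting c1t0d0\nFORMAT MENU\n' \
    | sed -e '/^Specify disk/,$d')
echo "$disklist"
```

The `/^Specify disk/,$d` address range means "from the first line matching ^Specify disk through the last line, delete," which is exactly the chunk of noise we don't want.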

Google Blog Search - Algorithm Insanity?

This isn't really Solaris related as much as computer science related, but today I experienced a very strange behavior from the great Google. One of my hobbies is archery, and I live in the Rochester, NY area. So, I was searching blogs on Google with the following string: "rochester NY archery". Seems pretty benign, right? Apparently, it's more like looking for Dick's Sporting Goods at

The number one hit for Rochester, NY archery is: "Club Intoxicated Girls". Actually, almost all of them were blog spam hits. That's incredibly frustrating. It's also a bit surprising because in my experience the SPAM heuristics in Gmail are second to none. Interesting times we live in.

Thursday, September 04, 2008

A quick way to check UDP ports on Solaris

Ever need a quick way to check what UDP connections are active on your Solaris server? I recently had to validate a scanner's report that we had an unnecessary service running on UDP port 177. Unfortunately, Solaris does not yet ship with lsof as a standard tool, so it requires the use of netstat(1M).

root# netstat -an -P udp

   Local Address        Remote Address      State
-------------------- -------------------- ----------
      *.123                                Idle
      *.111                                Idle
      *.*                                  Unbound
      *.32771                              Idle
Active UNIX domain sockets
Address Type Vnode Conn Local Addr Remote Addr
6001f6c18f8 dgram 6001fa6eb40 00000000 /var/vx/isis/vea_portal
6001f6c1c88 stream-ord 6001f6a4180 00000000 /var/run/.inetd.uds

Not too painful at all. Turns out that scan must have caught an intermittent service, or produced a false positive, because I didn't turn up any trace of it. But it did give me a chance to reacquaint myself with a useful incantation of netstat.
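If you need to script this check, the filter wraps easily. Since netstat's -P option is Solaris-specific, the sketch below runs the filter against canned output; on a real server you would pipe in netstat -an -P udp instead. The function name and sample lines are my own invention:

```shell
# exit 0 only if some local address in the input ends with ".<port>"
check_udp_port() {
    awk -v port="$1" '$1 ~ ("\\." port "$") { found = 1 } END { exit !found }'
}

# canned sample standing in for `netstat -an -P udp` output
printf '*.123 Idle\n*.111 Idle\n*.32771 Idle\n' | check_udp_port 177 \
    && echo "udp/177 is bound" \
    || echo "udp/177 not found"
```

Anchoring the pattern with `$` matters: without it, checking for port 177 would also match port 1770.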

Thursday, August 28, 2008

Solaris available on Dell Servers

I have to admit I was surprised, albeit pleasantly, when I saw a post on indicating that Solaris 10 is now an order option on certain Dell servers.

Of course Solaris has been available on best-in-class Sun x64 hardware for some time now, but the mainstream world doesn't follow Sun's products the way it does Dell's. In a sense, I think this is going to be a better channel for advertising than revenue, although I really hope it's beneficial for both.

There will now be a lot of Dell customers who see Solaris on their order options, and I believe this will make a larger group of consumers think about Solaris where previously they had no occasion to.

Regardless of the outcome, it feels good to see Sun opening up a new channel and I sincerely wish both Dell and Sun success with it.

Tuesday, August 05, 2008

Repairing file permissions: pkgchk -f

I was recently testing a process for repartitioning root disks which requires booting on an alternate disk, then copying and restoring data to the primary disk. I used ufsdump for this because of its excellent handling of some of UFS' nuances. The downside is that if you don't use ufsrestore frequently, you will be asked a nonintuitive question at the end of the operation. Yes, yes, a quick trip to the man pages would have helped. Unfortunately, I was being a bit cavalier at the time, and since it was a lab machine I thought little of it.

Turns out I should have thought a little harder. I ended up restoring data wonderfully, but pretty much toasted the system because all files were owned by root, with group other. Good in some places, not so good in others. Prognosis: rejump the server? Naah.

Sun published a Blueprint way back in 1999 that I think all system administrators should read: Repairing File Ownership and Mode, by Richard Elling. Someday this information will save your butt.

I had forgotten about the "-f" option to pkgchk, which is described in this document. This option will attempt to correct file system attributes so that they align with the package registry's entries. It won't help things outside the OS, but it will restore sanity to an OS full of toasted attributes. The recommendation is to boot from CD-ROM or network, mount the root file system on /a, and run pkgchk -R /a -f. I found that simply booting single-user and running pkgchk -f did the trick. Your mileage may vary.
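For reference, the invocations described above boil down to the following. These are standard pkgchk options, but treat this as a sketch of the procedure, not a recipe; know what the registry will rewrite before turning it loose:

```shell
# audit only: without -f, pkgchk just reports attribute discrepancies
pkgchk

# from a CD-ROM or network boot, with the damaged root mounted on /a
pkgchk -R /a -f

# or, booted single-user on the system itself (what worked for me)
pkgchk -f
```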

I don't think there would have been any other practical approach short of re-jumping the box to restore all of the lost attributes, so it is with great enthusiasm that I recommend keeping "pkgchk -f" in your tool bag.

Wednesday, July 30, 2008

nslookup: Rumors of my death have been greatly exaggerated

I'm currently working through the Sun Learning Connection (on-line / web-based training) to review the curriculum for my Sun Certified Network Administrator (SCNA) update examination. I'm a big fan of Sun's web based training as a study tool because it has always done a great job of preparing me for my certifications.

One of the interesting pieces of content I passed through indicated that in Solaris 10 the nslookup command has been deprecated. Dig is now included in Solaris and, according to the WS-3002-S10 course, is the preferred tool for querying DNS information. I remember when this same fascination with dig swept through the Linux distributions I used to use as well. I would type "nslookup _____" and the OS would dutifully reply that I really ought to be using dig, but here's my answer anyway. You know what? I don't need my OS to tell me what I want. I just need it to do what I ask.

Fortunately, despite the menacing overtone of this training curriculum's message, I have yet to see such a warning message come out of my Solaris servers. Dig is indeed included in Solaris, which is a great thing. It is certainly a more detailed tool for diagnosing DNS queries, and I'm thrilled to see Solaris' inclusion of industry-standard DNS tools.

But let's return once again to that hint about deprecating nslookup... Let's say I just want to see what the name service is returning for a given lookup. I'm just looking for right or wrong, not a detailed and cryptic report to gaze through. Here's the dig command and output for a reverse-lookup:

testbox# dig @ -x

; <<>> DiG 9.2.4 <<>> @ -x
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 1174
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0

; IN A

;; AUTHORITY SECTION: 10800 IN SOA 2005010101 3600 1800 6048000 86400

;; Query time: 11 msec
;; WHEN: Wed Jan 12 08:07:30 2005
;; MSG SIZE rcvd: 72

;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR,
id: 1982
;; flags: qr rd ra; QUERY: 1, ANSWER: 1,




;; Query time: 6 msec
;; WHEN: Wed Jan 12 08:07:30 2005
;; MSG SIZE rcvd: 109

Whoa. That was a lot to digest. Now, REALLY QUICK... Go find out what the hostname is for the queried IP. Yeah, sorry, you took too long tracing through all that. Now let's look at the nslookup approach:

testbox# nslookup
Address: name =

Yep, that's a bit more efficient.

The moral of the story is that UNIX includes many tools, each of which serves a specific purpose it is (usually) optimized for. I'd hate to think that my future basic DNS queries would be serviced by unwieldy dig output. I'm thrilled that if I run into a more serious DNS issue I can call on dig to help me, but replacing nslookup completely with dig would be like replacing gEdit with OpenOffice Writer: the completely wrong philosophy.

To borrow from Mark Twain, "The rumours of nslookup's death are greatly exaggerated!"

Monday, July 07, 2008

Setting Terminal Title

Now that I'm spending a lot of time working on zones I've found myself needing to keep my desktop better organized so I can quickly find the zone and host I need amongst a slew of terminals. I like to keep things simple, so I went with a little shell script that sets the title of a window on demand. Here's what I ended up with:

if [ -x /bin/zonename ]; then
    # if we are on a box that supports zones, include zone info in title
    /bin/echo "\033]0;`/bin/hostname` [`/bin/zonename`]\007\c"
else
    # handle non-zone platforms by omitting the zone name
    /bin/echo "\033]0;`/bin/hostname`\007\c"
fi

This will update the gnome-terminal or xterm title bar with "hostname [zonename]" on a platform that supports zones (as determined by the presence of an executable /bin/zonename). If a host does not have that executable (such as a pre-Solaris 10 system), it will simply print the hostname.

True to traditional UNIX's abbreviated nature, I named the script stt, short for "set terminal title," and placed it in my $HOME/bin directory for convenience. Now when I log in to a host, if I'll be in there for a while I just type 'stt' and my window is properly adorned.

A simple extension of this script would be to include the function in a shell's profile and inject it into the PS1 variable so that it executes at each prompt. This would allow the title to update dynamically with each command. I haven't messed with that approach yet, as this has scratched my itch quite well.
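Here's a rough, untested sketch of what that extension might look like. I've swapped the SysV echo escapes for portable printf; the PS1 line is a hypothetical ksh/bash hook, and whether your terminal honors xterm title escapes is an assumption to verify:

```shell
# emit the xterm/gnome-terminal title escape sequence for a given string
# (falls back to the hostname when called with no argument)
stt() {
    printf '\033]0;%s\007' "${1:-$(hostname)}"
}

# hypothetical PS1 hook: refresh the title before every prompt
# PS1='$(stt "$(hostname) [$(zonename 2>/dev/null)]")$ '
```

The escape sequence is the same one the script uses: ESC ] 0 ; title BEL sets both the window and icon title.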

Tuesday, June 17, 2008

No space left on device? (metainit)

Here comes another rant about error messages. I was rebuilding a server today that uses SVM to manage some SAN storage, which gives a home to four very nice Solaris zones. I began by issuing a metainit command to build a concat/stripe device from these two SAN devices...

testbox{lvm}$ sudo metainit -f d100
metainit: testbox: /etc/lvm/ line 72: c4t6006048000018775125753594D433742d0s7: No space left on device

What?!?! I took a quick look at partitioning...

Part Tag Flag Cylinders Size Blocks
0 unassigned wm 0 0 (0/0/0) 0
1 unassigned wm 0 0 (0/0/0) 0
2 backup wu 0 - 56653 25.93GB (56654/0/0) 54387840
3 unassigned wm 1 - 3 1.41MB (3/0/0) 2880
4 unassigned wm 4 - 56653 25.93GB (56650/0/0) 54384000
5 unassigned wm 0 0 (0/0/0) 0
6 unassigned wm 0 0 (0/0/0) 0
7 - wu 0 - 56653 25.93GB (56654/0/0) 54387840

Ok, so the partition exists. What the heck is wrong?

In my absent-minded hurry to get this trivial task completed, I made an undisciplined assumption that both devices which were to comprise d100 had the same underlying VTOC. It turns out they did not. One of them was set up to use slice 4, and the other slice 7.

So, I issued a quick command to synchronize them using the traditional prtvtoc | fmthard tango, then edited the /etc/lvm/ file to accommodate the s4 slice when defining d100. This time it worked nicely.
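For anyone who hasn't danced that particular tango, it goes roughly like this. The device names below are placeholders, and fmthard rewrites the target's label, so triple-check which disk is which before running it:

```shell
# read the VTOC from the good device's backup slice (s2) and
# stamp the same layout onto the other device
prtvtoc /dev/rdsk/cXtYdZs2 | fmthard -s - /dev/rdsk/cAtBdCs2
```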

But come on, "No space left on device"? What kind of an error message is that? How about something more like "Specified slice does not exist"? Technically, a storage device of size zero would have no space available, but there sure are more direct ways to express that concept.

Thursday, June 05, 2008

The Evolution of Email

Have you ever stopped to ask yourself what benefits have been derived by the evolution of email from the days of ASCII text to our modern world where Microsoft Word can act as the email editor?

Fortunately I don't need to ponder this question any longer. Today I received an email which simply would not have had the same impact back in the old days of low-tech correspondence.

The email started out with the following, which is a direct quote:

Starting IMMEDIATELY - ZERO TOLERANCE for any and all non compliance of the following process!

It looks pretty menacing in ASCII text, but thanks to Microsoft Exchange and its mind-blowing capabilities to allow more effective self-expression I was able to receive that motivational phrase in a 24-point underlined red font.

I have to admit, it's difficult to fully realize the gravity of the phrase without gratuitous aesthetic enhancement. Let's face it, it would take a PowerPoint attachment to more effectively intimidate me.

Wednesday, June 04, 2008

The Unconventional Explorer

The habit of Sun's explorer dumping output to /opt/SUNWexplo/output makes me wince a bit. In all fairness, I think the documentation could be seen as technically inconclusive, but in spirit I believe a more correct solution is not difficult to derive.

Consulting the Solaris 10 System Administration Guide: Devices and File Systems, we find a concise chart of default Solaris file systems and their raison d'être. Three specific entries jump out at me as being relevant to this topic:

  • /opt: Optional mount point for third-party software. On some systems, the /opt directory might be a UFS file system on a local disk slice.

  • /var: System files and directories that are likely to change or grow over the life of the local system. These include system logs, vi and ex backup files, and uucp files.

  • root(/): The top of the hierarchical file tree. The root (/) directory contains the directories and files that are critical for system operation, such as the kernel, the device drivers, and the programs used to boot the system. The root (/) directory also contains the mount point directories where local and remote file systems can be attached to the file tree.

Considering these practices, it makes perfect sense that explorer is installed in /opt/SUNWexplo. So far, so good. On the systems we deploy at my current place of employment, the /opt file system is part of the root file system, which means that Explorer is dumping output at ~5 MB per shot onto the root file system.

All things considered, it's pretty benign given that we use either 72 or 146 GB boot drives. But as Solaris Jedi, we look to the harmony and availability of the system, and Explorer is definitely creating a disturbance in the force by dumping volatile files into a subdirectory within /opt. What if someone wrote a script to manage the contents of that output directory and made a little error in their code? What file system would you want it compartmentalized within? Would you want the potential of filling root, or filling a less critical file system? Methinks there must be a better way.

As in most dilemmas, I tend to look for precedents. Where would we find a traditional location in the standard Solaris file system that might be used to spool (hint, hint) volatile files which might grow over time? I would immediately look to /var. There are two immediate paths I see as being preferential to /opt/SUNWexplo/output.

The first option would be /var/spool/explo. This would follow a convention that aligns with our use of a local explorer agent. The servers here produce an explorer on a regular basis, which is immediately shipped to a central (on-site) repository. The most recent explorer is typically left on the system and the history is managed at the repository. This makes the output directory a traditional spool directory, and as such a perfect fit for /var/spool/explo.

Where this may not be as intuitive is the case of an environment where explorers are retained on the host rather than collected and managed centrally. In that case, the explorers are better described as log files than spools. Intuition brings me to the use of /var/opt/SUNWexplo/output for this case. It's close to the legacy directory structure of the tool, which makes the solution marginally more intuitive than using a spool directory. It also follows the rarely observed SYSV standard of pairing optional software installed in /opt with a directory in /etc/opt, /usr/opt, and /var/opt. I'm not a fan of this specific model when taken to its literal implementation, but it's worth noting.

So, which one is best? As noted earlier, it depends. If I were a member of Sun's Explorer engineering team and needed to pick one consistent location with the intent of minimizing discontent I would select /var/opt/SUNWexplo/output. It is intuitive in the largest set of configurations, and doesn't break any rules. My secondary recommendation would be to create a symbolic link to redirect /opt/SUNWexplo/output for backwards compatibility over the next few years until it could be phased out.

Now I'm left wondering what interesting problems I might create in the data center if I put together a change package that implemented this very model... Nothing is ever as simple or benign as it appears on the surface.

Thursday, May 01, 2008

Complex commands with sudo

I've heard all the excuses for why someone issued a "sudo su -" command, and instantiated a shell that no longer tracked their actions. Of course we can argue about how to configure sudo so that problem goes away, but what if you have a lenient sudoers configuration?

The problem usually occurs when you need to redirect output. For example:

# tar cvf - /etc/ | gzip -c > /protected_dir/etc_backup.tgz

Or, the one which I just used, and reminded me that this deserves a quick posting:

# m4 somefile.m4 >

Both of these will fail if the target directory is one that your user ID does not have permission to write to. In many cases, the frustrated SA will simply use sudo to "su" to the root user and perform the command there. But we Solaris Jedi know that this is simply a temptation of the dark side pulling at a time when you need to get work done.

The right thing to do is create a subshell that executes the command. Returning to the above examples, the right instantiation would be:

# sudo sh -c "m4 somefile.m4 >"
# sudo sh -c "tar cvf - /etc | gzip -c > /protected_dir/etc_backup.tgz"

Works like a charm. That being said, I'm much more an advocate for using RBAC on Solaris, but I'm going to fight the power of scope creep on this posting and stick with sudo.

Tuesday, March 11, 2008

Open Engineering: snmpXdmid follow-up

Just for kicks I did a search on Google for the same problem I encountered only a few days ago: How to disable snmpXdmid in Solaris 10. The first time I searched for this information I found a wealth of Solaris 8 and 9 information, but very little about Solaris 10, and nothing about the alleged bug in SMF.

After finding the solution, I posted to this blog documenting the answer. Having given the Googlebots a little time to work their magic, I returned to the scene of the crime and entered the following search query: "disable snmpXdmid Solaris 10". SolarisJedi shows up in the #3 position for that query with all the information necessary for remediation. It feels pretty good knowing that someone else might get to complete a job in five minutes rather than five hours.

While I was riding the warm fuzzy, I started thinking about how many large corporations with legions of skilled SAs and engineers maintain private knowledge bases rather than using public resources. I'm not talking about internal problems and proprietary issues - I'm talking about solving problems related to the generic off-the-shelf products they leverage. Let's face it, there isn't much proprietary about sendmail and DNS other than perhaps some parameters that are easily scraped clean.

Companies like Sun Microsystems have really paved the way of the future by encouraging their employees to blog, and trusting that proper standards of professionalism will be maintained. I believe Sun recognized that many of the problems they generate revenue from solving are based on the Internet's ability to act as a research assistant. I'd like to see more IT professionals invest back in the community.

One of the points in the System Administrator's Code of Ethics, a joint statement by LOPSA, USENIX, and SAGE, is the following:

RESPONSIBILITY TO THE COMPUTING COMMUNITY: I will cooperate with the larger computing community to maintain the integrity of network and computing resources.

Sometimes the definition of "network and computing resources" is one of hardware and software, but I suspect that other times it ought to apply to the operators of those resources since you cannot have one without the other.

Monday, March 03, 2008

Die Hard: disabling snmpXdmid on Solaris 10 (dmi)

On a recent server build project we ran into a security scan that surprised us with a mandate that snmpXdmid be disabled. The alleged vulnerability is based on a buffer overflow that originated in the days of Solaris 8, as documented in CIAC Information Bulletin L-065 and SunSolve Security Bulletin #00207. The details aren't important to this story, other than finding it entertaining to respond to a Solaris 8 vulnerability on a Solaris 10 build. I'll save my thoughts on the corporate world's implementation of automated scanning for another post.

Our normal JumpStart image was configured about a year and a half ago, and in it we addressed the problem. Of course, none of us could remember what we did, and it turns out not to be the easiest thing to extract from Google. The process is pretty straightforward once you find it...

# svcadm disable dmi

Next, I dug up a quick test case to make sure the fix worked. It's easy enough to check registered RPC services using rpcinfo.

# rpcinfo -p | grep 100249
100249 1 udp 43483
100249 1 tcp 42683

Wait a minute... I thought I turned that off! The normal behavior of the service management facility, or SMF, is to immediately change the state of a service after a disable command, so I made the (false) assumption that I had disabled the wrong service. After some additional research and testing I found that I wasn't wrong. The SMF was wrong. I did a quick zone reboot, and sure enough, the service was no longer responding. This led me to conclude the DMI service was not removing its registration with the portmapper (svc:/network/rpc/bind:default).

The next step on the path is to look at the service method that stops and starts the DMI service. All method scripts are stored in /lib/svc/method, and this one is easy to find: svc-dmi. So now we need to take a look at how it goes about stopping the service:

/usr/bin/pkill -9 -x -u 0 -z ${_INIT_ZONENAME:=`/sbin/zonename`} \
    '(snmpXdmid|dmispd)'

And so we come to the flaw in this method. In order to be consistent with predominant SMF behavior, this method should stop the service completely. There are two ways we can address this: we can either restart the entire portmapper, or we can be more surgical and remove the snmpXdmid registration from the portmapper. I preferred the latter, since restarting the portmapper could temporarily impact other services. The code change is pretty simple:

/usr/bin/pkill -9 -x -u 0 -z ${_INIT_ZONENAME:=`/sbin/zonename`} \
'(snmpXdmid|dmispd)' && /usr/bin/rpcinfo -d 100249 1

The problem, of course, is that to implement this fix I need to modify a script which is managed by the pkgadd facility and, subsequently, checksummed. I won't get into addressing that issue right now, as the goal of this post is simply to provide breadcrumbs to other Jedi working to improve security with as little impact as possible.

Tuesday, February 12, 2008

The DaVinci Zone: Automating Zone Installation

I love a good mystery as much as the next guy, and this one took a bit of piecing together. It's all documented, and with the proper grasp of, man pages, and Google query syntax anyone can automate their zone installation. Since it took me a while to piece it together I thought I'd leave a few notes in the Jedi archives.

I'm going to leave my breadcrumbs in Perl, and focus on the workflow more than the syntax, so you won't be able to copy and paste code. If you know some Perl you should be able to fill in the blanks pretty easily.

The first thing we need to do is create the input stream for zonecfg. This is essentially the same thing you would type if you were doing it interactively, which is exactly how I derived the text I'm using.

# Open the file for writing
open(ZONECFGTMP, ">$zonecfgfile") or die "ERROR: Could not open $zonecfgfile for writing";
# Write the contents
print ZONECFGTMP "create\n";
print ZONECFGTMP "set zonepath=$zonepath/$zonename\n";
print ZONECFGTMP "add net\n";
print ZONECFGTMP "set physical=$zoneif\n";
print ZONECFGTMP "set address=$zoneip\n";
print ZONECFGTMP "end\n";
print ZONECFGTMP "exit\n";

Next we need to create a sysidcfg file using a similar strategy... It gets a bit funky in the middle, when I base some logic on whether or not the zone is entered into DNS. Solaris has what I consider a nuisance behavior during installation: if you want to configure DNS at install time, the hostname must already be in DNS. If not, the install will revert to an interactive prompt asking if you really want to do this. To get around this, we need to FIRST determine if the zone name is in DNS. If it is, then install a sysidcfg that reflects DNS. If not, then we need to use "none" for the sysidcfg naming service, and then install a resolv.conf file. It's kludgy, but it works.

# Make sysidcfg file (either NONE or DNS dep. on earlier check)
# File name should be mkzone.sysidcfg.ppid
open(SYSIDCFGTMP, ">$sysidcfg") or die "ERROR: Could not open $sysidcfg for writing";
print SYSIDCFGTMP "root_password=\n";
print SYSIDCFGTMP "system_locale=en_US\n";
print SYSIDCFGTMP "timeserver=localhost\n";
print SYSIDCFGTMP "timezone=US/Eastern\n";
print SYSIDCFGTMP "terminal=vt100\n";
print SYSIDCFGTMP "security_policy=NONE\n";
print SYSIDCFGTMP "nfs4_domain=$mydomain\n";

# if host not in DNS, use NONE, else use DNS.
if ( $zonenotindns ) {
print SYSIDCFGTMP "name_service=NONE\n";
} else {
print SYSIDCFGTMP "name_service=DNS {\n";
print SYSIDCFGTMP " domain_name=$mydomain\n";
print SYSIDCFGTMP " name_server=,,\n";
print SYSIDCFGTMP " search=search.domain1, search.domain2\n";
print SYSIDCFGTMP "}\n";
} #end if

print SYSIDCFGTMP "network_interface=PRIMARY {\n";
print SYSIDCFGTMP " hostname=$zonename\n";
print SYSIDCFGTMP " ip_address=$zoneip\n";
print SYSIDCFGTMP " netmask=\n";
print SYSIDCFGTMP " protocol_ipv6=no\n";
print SYSIDCFGTMP " default_route=$zonedefroute\n";
print SYSIDCFGTMP "}\n";

If we needed to create a resolv.conf, it would look something like the example below. Note that I entered the DNS server list into an array called @dnsserverlist. The $zonenotindns variable is set earlier in the execution when we perform an nslookup. I do this with a call to system() rather than using a separate module because it makes the code easier to distribute.

if ( $zonenotindns ) {
# (assumed) the RESOLVDOTCONF handle is opened the same way as the others
open(RESOLVDOTCONF, ">$resolvconf") or die "ERROR: Could not open $resolvconf for writing";
print RESOLVDOTCONF "domain $mydomain\n";
foreach ( @dnsserverlist ) {
print RESOLVDOTCONF "nameserver $_\n";
} #end foreach
print RESOLVDOTCONF "search $mydomain\n";
} #end if

The piece that threw me for a loop was getting rid of the NFSv4 prompt. It turns out to be as simple as putting this command into the code right before the zone is booted, but after the zone is installed. Kudos to the OpenSolaris Zones and Containers FAQ for documenting it!

system("/usr/bin/touch $zonepath/$zonename/root/etc/.NFS4inst_state.domain");

Using the above files is covered well in other posts, so I won't duplicate content. Using these details, you should be able to get your site's zone installation automated without too much trouble.

Tuesday, February 05, 2008

Zonecfg: removing a resource

I just noticed that there aren't a whole lot of examples of removing a resource from a zone to be had in the vast caches of Google at the moment. It's pretty simple once you understand the zonecfg syntax. Of course, just about everything in UNIX is simple once you know how to do it!

First, we need to fire up zonecfg and look at the specifics of how our zone is configured:

cgh@testbox$ pfexec zonecfg -z testzone
zonecfg:testzone> info
zonename: testzone
zonepath: /export/zones/testzone
autoboot: true
inherit-pkg-dir:
        dir: /lib
inherit-pkg-dir:
        dir: /platform
inherit-pkg-dir:
        dir: /sbin
inherit-pkg-dir:
        dir: /usr
fs:
        dir: /myapp/u01
        special: /dev/dsk/c2t5006048ACC36D646d138s0
        raw: /dev/rdsk/c2t5006048ACC36D646d138s0
        type: ufs
        options: []
net:
        physical: e1000g0

In this case, the file system /myapp/u01 has a problem and is preventing the zone from rebooting. In order to remove it we need to use the remove syntax, which requires enough parameters to uniquely identify the resource we want removed. In this case, the dir setting of /myapp/u01 should be sufficient.

zonecfg:testzone> remove fs dir=/myapp/u01

A quick repeat of the info command should now display that the file system is not part of this configuration, and indeed it does.

zonecfg:testzone> info
zonename: testzone
zonepath: /export/zones/testzone
autoboot: true
dir: /lib
dir: /platform
dir: /sbin
dir: /usr
physical: e1000g0

And finally, we commit the changes using the commit command. A quick call to zoneadm issues a reboot, and our zone successfully comes back up.
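Spelled out, that last step is just two commands. This is a sketch of what I ran, not a captured transcript, using the same zone name as above:

```shell
# Commit the pending zonecfg change, then reboot the zone so it takes effect
pfexec zonecfg -z testzone commit
pfexec zoneadm -z testzone reboot
```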

Monday, February 04, 2008

RBAC, Zone Management, and the mortal user

I'm doing a lot of work with automating zone configuration at the moment, and have been using the zlogin command frequently. Having never been a big fan of Sudo, I really wanted an excuse to dabble in RBAC and see if I could get it to work for me. Turns out to be a very trivial thing. In this case I wanted to be able to perform zone administration as conveniently as possible without spending a lot of time whittling down a command set - just give me quick and easy.

I started out looking for any execution attributes which may have been preconfigured for my convenience...

cgh@testbox$ grep -i zone /etc/security/exec_attr
Zone Management:solaris:cmd:::/usr/sbin/zlogin:uid=0
Zone Management:solaris:cmd:::/usr/sbin/zoneadm:uid=0
Zone Management:solaris:cmd:::/usr/sbin/zonecfg:uid=0

So now I needed to get them plugged into my user ID (I didn't want to fiddle with su-ing to a role, just wanted them in my ID). I loaded up the /etc/user_attr file into my favorite editor (for those who are curious, I'm a vi guy) and added my name, and the profile:

adm::::profiles=Log Management
lp::::profiles=Printer Management
root::::auths=solaris.*,solaris.grant;profiles=Web Console Management,All;lock_after_retries=no
cgh::::profiles=Zone Management

A quick test verifies that all is well with the world:

cgh@testbox$ profiles
Zone Management
Basic Solaris User

And finally we give it a shot:

cgh@testbox$ zlogin testzone
zlogin: You lack sufficient privilege to run this command (all privs required)

But of course! To use RBAC-granted commands seamlessly, you need an RBAC-aware shell such as pfcsh, pfsh, or pfksh. But at the moment my shell is a standard ksh. The easy way around this is the pfexec command.

cgh@testbox$ pfexec zlogin testzone
[Connected to zone 'testzone' pts/4]
Last login: Mon Feb 4 13:45:24 on pts/4
You have new mail.

And there you have it. With RBAC, it's easy to attach administrative commands to a general user ID. Of course, this demonstration was a hack, and isn't a best practice. Why? Administrative commands are separated from user commands for a reason. You don't want a general user doing things that can impact the entire system.

The best way to do this in most situations would be to embrace the R in RBAC and create a role for Zone Management that a user could assume to perform this work. In my case it's a lab machine, not many people are using it, and I wanted an excuse to play with RBAC.
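For reference, the role-based version is only a few commands. A sketch, run as root; the role name zadmin is made up for the example:

```shell
# Create a role carrying the Zone Management profile, then grant it to a user
roleadd -m -d /export/home/zadmin -P "Zone Management" zadmin
passwd zadmin              # a role needs a password before it can be assumed
usermod -R zadmin cgh      # allow user cgh to assume the role
```

From there, cgh runs `su zadmin` and gets a profile shell with the zone commands available, without those commands being attached to the everyday login.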

Friday, February 01, 2008

Basename saves the day...

One of the things I like to do when setting up a Perl script is to set a variable called "thisscript". It's essentially the $0 special variable, but with a subtle twist. The inspiration for this article comes from forgetting the twist, and true to my mission, I am documenting my detours from the Jedi path.

I'm working on a fun script at the moment which simplifies and automates the process of deploying and configuring a zone. Sort of a JET-lite if you will. The script creates numerous temp files, and I prefer the following naming convention: "tmpdir"/"name of parent script"."functional identifier"."process pid". So, a file name may look like this: /tmp/mysite-mkzone.sysidcfg.343224. And here begins the oddity I ignored, and eventually fixed.

Although the script worked well, I noted the following output:

Cleaning up temp files...
/tmp/./mysite-mkzone.sysidcfg.343224

Fortunately, the way UNIX interprets a pathname, this is a perfectly legitimate, albeit circuitous, path value. The "./" evaluates to the current directory, and path resolution continues on its merry way. To be more explicit, the following examples all evaluate to the same value:

  • /tmp/mysite-mkzone.zonecfg.2012

  • /tmp/./mysite-mkzone.zonecfg.2012

  • /tmp/././mysite-mkzone.zonecfg.2012

  • /tmp/./././mysite-mkzone.zonecfg.2012
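A quick shell check confirms the equivalence; test's -ef operator compares the underlying file, not the spelling of the path:

```shell
# Every spelling with extra "./" components names the same file
f=$(mktemp)
d=$(dirname "$f"); b=$(basename "$f")
[ "$f" -ef "$d/./$b" ] && [ "$f" -ef "$d/././$b" ] && echo "same file"
rm -f "$f"
```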

But, my heightened Jedi awareness felt this extraneous path element to be disturbing the balance of the force. Once you journey down the path of the dark side, it is difficult to return to the light. But where was this coming from? My first suspicion was a syntax error somewhere in a Perl string catenation.

In Perl, strings are catenated with the dot operator ("."). For example, we could build a string using catenation as follows:

$ perl -e 'my $a="The quick brown fox"; my $b="jumped over the lazy dog"; print "$a" . " $b." . "\n";'
The quick brown fox jumped over the lazy dog.

So, if I were to misplace a quote, it's possible that I might have included an errant period somewhere in the code. After a scan of each use of the variable I quickly determined that my hypothesis was unlikely to have manifested itself. I then returned to the code which set the initial variable:

my $thisscript=$0;

It was then that I remembered what I had omitted. I don't typically have "." in my current path. Just a habit, the result of which is another habit. I always qualify the path to whatever I'm running. So, if I'm executing a script called mysite-mkzone in the current directory, I execute the following on the command line:

$ ./mysite-mkzone

Now, consider the preceding wetware behavior in concert with the following software behavior:

my $sysidcfg="/tmp/$thisscript.sysidcfg.$$";
Herein lies the problem. Evaluating $sysidcfg we get the following: "/tmp + ./mysite-mkzone + sysidcfg + 12345", which explains where the extraneous "./" is coming from. So how did I fix it? I used File::Basename, which is the Perl equivalent of the shell basename(1) command. It deletes any path prefix ending in "/" from a string. In other words, it yanks off the directory part, leaving just the command name. To use it, I made the following trivial modification to my code:

use File::Basename;
my $thisscript=basename($0);

And the output dutifully responded as follows:

Cleaning up temp files...
/tmp/mysite-mkzone.sysidcfg.343224
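The fix is easy to sanity-check from the shell, since basename(1) behaves the same way File::Basename's basename() does:

```shell
# Strip the leading "./" the way File::Basename does
script="./mysite-mkzone"
basename "$script"    # prints: mysite-mkzone
```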

This wouldn't have happened if I'd used my normal starter template, which has this variable pre-configured, but I'd made a careless decision to go from scratch on this one. Yet another misstep on the path, but a good lesson.

Friday, January 18, 2008

Queuing Efficiency

On my way through the office door this morning I had an experience that I realized needs to be recorded so that I don't go on and rant about it in the future. This post will allow me instead to reference it, thus saving a vicious cycle of rehashing.

In addition to my career in systems engineering, and my small photography business I take martial arts classes. A few days ago I managed to pull a hamstring and sprain a toe on the opposite leg. Don't get me wrong - it's worth it. But these injuries are relevant to the story because they allow you to appreciate my inability to move quickly. I'm not limping, I'm just moving cautiously.

The weather was cold, and the sidewalks covered in treacherous ice. A bitter wind cut through me as I approached the building. Moving at a determined but non-rapid pace, I noticed someone about a mile (or three) in front of me, presumably intending to enter the same doors. And then it happened.

They decided to take it upon themselves to hold the door open. At this point they'd almost need a telescope or radar to even know I'm planning to go through the same door, yet they stood there holding it open while the arctic air flooded the entry to our building.

Here's the deal people... If someone behind you stands a chance of having the door slam in their face, then you hold it open. If they are carrying a heavy load and don't have arms free, you can wait until they get close to open the door for them. If it's more than five steps, you move along. Let's stop the self-gratifying good deed of holding a door open just to see if the recipient of your good will will start to jog because they feel guilty you are letting all that cold air into the building. If they have enough room to jog they probably don't need you to hold the door.

Wednesday, January 16, 2008

Got dependencies?

As anyone who follows my blog has learned, I'm a packaging junkie. I write packages for everything I deploy, and happily plunk them into my JumpStart server where they perform their duty in a predictable and maintainable way. Does it get any better than this? Well, actually... It got pretty rough this week as my troubleshooting skills enjoyed a solid workout.

We are in the process of moving our standard deployment from Solaris 9 with SRM project containers to Solaris 10 Zones. Now Zones are really surprisingly simple for the rich myriad of benefits they provide. It takes very little time to get to the point where you can set up a test box with multiple zones in a basic Solaris environment. But what if you don't have a basic environment?

We have packaged everything. Most of our /etc files that aren't stock are edited by class action scripts or postinstall routines. We've deployed hundreds of servers with this configuration and it worked flawlessly until we started deploying Zones. Then suddenly, our zones couldn't talk to the Directory, were missing resolv.conf, and all kinds of other cascading problems. Ugh. Where to start?

The first thing I did was check the error messages. Pretty clever, no?

The file /export/zones/testzone01/root/var/sadm/system/logs/install_log contains a log of the zone installation.

Ok, let's see what's in the log...

WARNING: setting mode of /var/spool/cron/crontabs/root to default mode (644)
ERROR: attribute verification of /export/zones/testzone01/root/var/spool/cron/crontabs/root failed
pathname does not exist

Installation of CGHtest on zone testzone01 partially failed.

Well that's strange. Let's see what the registry says about root's crontab:

$ grep "/var/spool/cron/crontabs/root" /var/sadm/install/contents
/var/spool/cron/crontabs/root e cron 0600 root sys 48 3760 1200431631 SUNWcsr:cronroot

The root crontab is owned by SUNWcsr. So it turns out that when the global zone created the sparse root zone, it tried to install CGHtest before SUNWcsr was installed. The root of my problem was a slip in a habit I'm normally quite vigorous about enforcing.

Any time I use question marks in my prototype file to inherit attributes from an already-installed package I always, always, always stop immediately to look up that inherited object in the registry and add its parent package to a depend file. Here's an example prototype file displaying inheritance:

$ cat prototype
i pkginfo
i depend
i copyright
i i.cron
i r.cron
d none var ? ? ?
d none var/spool ? ? ?
d none var/spool/cron ? ? ?
d none var/spool/cron/crontabs ? ? ?
e cron var/spool/cron/crontabs/root ? ? ?

This was never a problem in our traditional JumpStart environment because the JET custom packages are installed after SUNWcsr. Fortunately, it's easy to control the order (dependency) of package installations using the depend(4) file.

$ cat depend
P SUNWcsr Core Solaris, (Root)
$ grep depend ./prototype
i depend
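For completeness, the package then just needs a rebuild so the depend file ships with it. A sketch of the usual build step, assuming the conventional spool directory:

```shell
# Rebuild the package; pkgmk picks up the depend file via the prototype
pkgmk -o -d /var/spool/pkg -f prototype
```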

Problem solved! Of course, the problem never would have happened if I'd remembered my Jedi training. It's good to be humbled on a semi-regular basis.