
Present Perfect


Filed under: General — Thomas @ 12:16

2003-12-31
12:16

My boss is on holiday. He had set up my schedule so that I could program on one thing for three days. Ah, the cruel hand of fate!

I arrived this morning to find one of our servers greeting me with a cryptic "Unable to load interpreter /lib/ld-linux.so.2" error. And you thought stuff like this only happened on Windows (which I refuse to spell incorrectly, btw).

Now, I've seen this error before, and mostly it happened when too many processes were running at the same time and memory was exhausted - which shouldn't be happening anyway, but it does sometimes, mostly when I try out a new monitoring tool which goes into a frenzy because of an NFS or Samba mount error. Until now, it had always happened while I was still logged in on a terminal somewhere, so it was, with some effort, fixable. No such luck in this case. Since I've made an effort to log out any root terminals left open on the servers, I now find myself stuck.

The server is in heavy continuous use: all of our journalists write their pieces in their Explorers, and news is read every half hour during the day. So I can't afford much downtime. But a quick reboot might do the trick, right? I can't hack into my own system, so it's my last option. I will not learn what caused the problem, but at least things will move on.

Well, no, actually. After a fresh reboot, I couldn't log in either. This was getting worrisome. I quickly tried booting in single user mode and that worked. I downloaded and installed newer glibc packages (which include this problematic library) and rebooted, but got the same error. I was getting nervous.

What's a guy to do in a situation like this? Well, either you panic and start fixing it like mad, or you look and see how any possible solution can be fitted into the bigger plan. Let's see. First of all, the server still works. We just cannot log in, which is problematic for the import of playlists, as well as for the newsletter I have already spent enough time on lately (about which you can also read in this diary ;) ). So people are sure to complain.

Well, the long-term plan was to upgrade a few servers here anyway. I installed them when I was young and fresh at the job, and I might have dropped the ball here and there. I know for sure I installed a tarball too many and a few alien RPMs as well. People sometimes complain about rpm, but here too, it is a case of poor workmen blaming their tools. I used to blame it as well, but now I know that if you heed the warnings rpm gives you, and if you stick to RPMs on an rpm-based system, you're safe.

The thing is, I'll never be able to shut it down for the hour or so I'll need to upgrade or reinstall it, so how do I fix that? Well, the answer should be easy. Upgrade one of the other, less critical servers (that dual-processor machine with RH70 and a 2.2 kernel that keeps crashing anyway, for example) and let that take over for a day or so. So I started upgrading that one. And being the reckless idiot that I am, I thought it was a good idea to finally find a solution for the open cases problem.

This problem being the one where both the server case and the external storage case need to be left open because I could only get a connection to work between them using an internal twisted SCSI cable. Yes, you are allowed to laugh. Solving it would actually be a good idea, since last winter it started snowing inside the building through a hole in the roof, and the next morning I arrived to find a small puddle of water on the carpet ten centimeters from the drive array. The open drive array.

Anyways, trying it today wasn't such a good idea: I tested various SCSI cable connections, but my terminator still lit up green, while my hunch was that it was supposed to light up red, judging from the other two terminators on the tape streamers.

I spent three hours working on that - my boss will love that when he returns ;) - and gave up. I did run the internal cable through holes at the back of both the server and the drive case.

So I considered whether I should install fresh or upgrade. Someone convinced me to upgrade, even though I wasn't too happy with the drive layout. So I did, and of course the new kernel didn't boot. Probably because we can't expect Red Hat to compile support for each and every RAID controller into the kernel. So here I am, booting from another, older kernel (in that respect, GRUB is great), downloading the new kernel updates and recompiling a kernel to match my system.

Let's see if this time I at least get my kernel configuration right the first time around. That's what experience should do for a person, right? *SIGH*
