Present Perfect



using xargs on a list of paths with spaces in a file

Filed under: Hacking,sysadmin — Thomas @ 19:18


Every few weeks I have to spend an hour figuring out exactly the same non-googleable thing I've already needed to figure out. So this time it's going on my blog.

The problem is simple: given an input file listing paths, one per line, which probably contain spaces - how do I run a shell command so that each line is passed as a single shell argument?

Today, my particular case was a file /tmp/dirs on my NAS which lists all directories in one of my dirvish vaults that contain files bigger than a GB. For some reason not everything is properly hardlinked, but running hardlink on the whole vault blows up because there are so many files in there.

Let's see if wordpress manages to not mangle the following shell line.

perl -p -e 's@\n@\000@g' /tmp/dirs | xargs -0 /root/hardlink.py -f -p -t -c --dry-run
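The same idea can be tried out safely on throwaway data. This is only a sketch: the sample paths are made up, tr stands in for the perl one-liner, and a small sh -c helper stands in for hardlink.py so you can see exactly how the arguments arrive.

```shell
# Build a throwaway list of paths containing spaces (hypothetical sample data).
list=$(mktemp)
printf '%s\n' '/tmp/dir one' '/tmp/dir two/sub dir' '/tmp/plain' > "$list"

# Turn every newline into a NUL byte so that xargs -0 treats each line
# as exactly one argument. The sh -c helper prints one bracketed
# argument per output line; "_" only fills the $0 slot.
result=$(tr '\n' '\0' < "$list" |
    xargs -0 sh -c 'for arg in "$@"; do printf "[%s]\n" "$arg"; done' _)

printf '%s\n' "$result"
rm -f "$list"
```

Each path should come out inside a single pair of brackets, spaces included. With GNU xargs, xargs -d '\n' should give the same result without the conversion step, but that option is not available everywhere.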

ssh friction

Filed under: Hacking,sysadmin — Thomas @ 12:57


I haven't been too good this year at removing friction from my workflow. Today I wanted to change that. And the random friction thrown my way today has to do with ssh.

You see, somewhere along the line I read that it is a good idea to create separate keys for separate identities. So I have an identity for all work-related stuff (which I consider 'ring 1': it's unlikely to change but everyone can get fired or change jobs), one for personal stuff on machines I actually control ('ring 0': they'd have to pry it out of my dead hands), another for my 'public online default' identity ('ring 2': I can always pull a whytheluckystiff and pull myself off the net and reinvent myself), and then per-project identities ('ring 3': I may lose interest in being a fedora or gstreamer contributor without massive changes in my personality).

I started splitting ring 3 per project when it made sense - for example, Fedora recently enforced a key change even if your account wasn't compromised and even if you already have a strong passphrase on your key (like I had), and of course a massive flamefest ensued. I shrugged and decided to split off a new key and set that on all my machines.

But the problem is, this whole tower of ssh doesn't really work well in practice. I chose a long passphrase for the new fedora keys, but obviously I do not want to type that every time I clone a package or commit changes. So I use ssh-agent. In theory, ssh-agent adds your keys and asks you for the passphrase once, and is then able to offer those identities to the other side.

The problem is a lot of ssh servers out there only give you a few tries. So your ssh agent will offer identity by identity until it gets refused. If my fedora identity was added as the fourth identity I lose - I can't clone a package.

Specifying IdentityFile in the ssh config is useless. It is poorly documented, but IdentityFile files actually come after your ssh-agent identities. So your agent blasts all the wrong keys at the host first, and you get denied.

So you can specify IdentitiesOnly to make sure that only the identity file you want is being used. Sadly in that case it will not use the ssh-agent at all, so it will ask you for the passphrase to your key file - and not having to type that is the whole reason you want agents to be used in the first place.

Now obviously ssh has all the pieces it needs to Do The Right Thing. If my config says to use this identity and this identity only, ssh should be able to request ssh-agent to present that identity, and that identity only, and make the login happen without any password.
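For reference, the kind of configuration being described looks roughly like the sketch below. The host name and key path are hypothetical placeholders, and the file is written to a scratch location rather than ~/.ssh/config. Note that the ssh_config option is spelled IdentitiesOnly. ssh_config(5) describes it as limiting ssh to the configured identity files even when an agent offers more; whether that avoids the passphrase prompt described above may depend on the OpenSSH version and on the key being loaded in the agent, which this post does not settle.

```shell
# Write a sample ssh client config to a scratch file (not ~/.ssh/config).
# Host name and key path are hypothetical placeholders.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
Host pkgs.example.org
    # Offer only this key to this host ...
    IdentityFile ~/.ssh/id_rsa_fedora
    # ... and do not try any other identities the agent may hold.
    IdentitiesOnly yes
EOF

cat "$cfg"
# To experiment without touching the real config:
#   ssh -F "$cfg" pkgs.example.org
```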

Surely I must be missing something obvious. Surely one of you uberhackers out there has set up the same thing as me. Why don't you comment about it here and help the rest of us?

kslowd000 and friends

Filed under: Question,sysadmin — Thomas @ 20:43


Ever since upgrading to Fedora 14 my desktop felt sluggish. It was more than the typical boiling frog kind of sluggishness, where you get the feeling everything's snappy just after you bought a new fancy computer and install it freshly with a recent OS, and over time performance slowly degrades until you wonder why computers are always so slow. Sure, it looks like Evolution, after a round of improvements in memory management, has gone back to being a memory hog. But this time, it was more. It would go through short phases of unresponsiveness and then come back. Load would be consistently around 1 or more, but for no apparent reason at all.

After a while watching top, I noticed a process called kslowd[xxx] jumping up and down in the top output regularly. The k says it's a kernel process. No idea what it is. Googling isn't very helpful to learn what it actually is, but it did put me on the trail because there are huge amounts of posts on sites and mailing lists about this process eating CPU time and slowing down the computer.

After a bunch of reading, some post suggested the fix might be this kernel patch by Dave Airlie, a name I recognize. I took the Fedora kernel src.rpm, spent a few minutes getting acquainted with Fedora's kernel spec layout de l'année, integrated the patch, rebooted, and voila. No more kslowd000 eating all my CPU.

I recently found this workaround which I'll try next time the kernel gets upgraded.

That still doesn't tell me what that kernel process is supposed to be doing (anyone up for a mandatory rule of having man pages for kernel processes too ?), so feel free to comment!

MySQL InnoDB table corruption

Filed under: Hacking,sysadmin — Thomas @ 21:31


One of our customers mailed us to say that they had some database corruption in one of their main tables, and obviously our friend Murphy had forgotten to take backups, and could we take a look at it ?

Obviously it's always way more fun to poke at someone else's train wreck than your own, and you might learn something that could save your bacon in the future, so I decided to give it a go in my spare time. I asked for a full tar of the system so I could chroot into it and do a post-mortem.

It became quite a time-consuming endeavour - not in actual time spent doing stuff, but waiting for things to happen. First waiting to get ftp details, get the password, figure out their iptables rules so I could actually log in, finding a machine with 150 GB free to transfer the image to, actually transferring the image (took the best part of a week), realizing they only gave me the db partition and not the actual system, when I prefer to have the real system to make my debugging easier, making a backup of all the data over our internal network to the new file server, ... Lots of little steps all taking one minute of work and various hours to complete.

But finally I had their system on one of ours, and I was able to chroot into it, start mysql, and run the query that brought down their server.

The table probably has millions of rows, and I was able to query about 20000 before it crashed. I tried various things; REPAIR doesn't work on InnoDB anyway.

Since the table was InnoDB, I found a utility called innochecksum and tried it. It spat out:

page 535 invalid (fails new style checksum)

So, one of the early pages is invalid, and MySQL just gives up after that.
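To get a feel for where that page sits in the file, the page number can be turned into a byte offset. This is a sketch that assumes the default InnoDB page size of 16 KiB, which this post does not confirm for the customer's server; the file path in the trailing comment is a placeholder.

```shell
# Sample line copied from the checksum tool output above.
line='page 535 invalid (fails new style checksum)'

# Assumption: default InnoDB page size of 16 KiB. Adjust if the server
# was configured differently.
page_size=16384

# Pull out the page number (second field) and compute where that page
# starts in the data file.
page=$(printf '%s\n' "$line" | awk '$1 == "page" && $3 == "invalid" { print $2 }')
offset=$((page * page_size))

printf 'page %s starts at byte offset %s\n' "$page" "$offset"

# The page could then be inspected read-only on a copy of the file, e.g.:
#   dd if=/path/to/datafile bs="$page_size" skip="$page" count=1 | hexdump -C | head
```

With these numbers the offset works out to 8765440 bytes, so the damaged page sits less than 9 MB into the file.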

I found this presentation that explained a bunch of things about the InnoDB database file format, and looked around for file format parsers.

Inspecting the innochecksum binary, it seems there are "old style" and "new style" checksums on pages. Reading the code, it seems the old-style checksum matched, but the new-style one didn't. So, one of the random ideas that popped into my head was to change the new-style checksum on that page. I mean, one of the two seems fine, no? Haven't tried that yet, saving it for later.

I tried a bunch of methods I found all over, including the interesting-looking innodb_force_recovery option which you can dial from 1 to 6, but that didn't help much either.
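For anyone who hasn't used it: innodb_force_recovery is a server option that normally goes in the [mysqld] section of my.cnf. The sketch below only writes an example fragment to a scratch file; where the real my.cnf lives on a given system is not something this post covers.

```shell
# Write an example my.cnf fragment to a scratch file for illustration.
cnf=$(mktemp)
cat > "$cnf" <<'EOF'
[mysqld]
# Start at 1 and raise it one step at a time (up to 6) only if the
# server still refuses to start; higher values are more drastic.
innodb_force_recovery = 1
EOF

cat "$cnf"
```

The usual approach is to restart mysqld with the lowest value that lets it come up, dump whatever can still be read, and then remove the setting again.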

I checked if hachoir maybe had a tool to parse innodb files, because I've been looking for a good excuse to play with hachoir since forever, but no luck, although it sounds like a good match.

Surely someone must have already written some tool to look at corrupt innodb database files and recover the 99.99% of good pages out of it ?

Activating my network of MySQL-related contacts however brought me to this very interesting post that I hadn't found through Google. Excited, I followed the instructions. I noticed the instructions were for 0.3, and the latest version had moved and now was 0.4.

The tool basically worked, although something has changed, and I couldn't leave a comment on Chris' blog, so I hope trackback works for the people coming after me... Instead of doing

cd innodb-recovery-tool-0.3/
./create_defs.pl --user=root --password=mysql --db=test --table=t1 > table_defs.h

in step 3, I had to do:

cd percona-innodb-recovery-tool
./create_defs.pl --user=root --password=mysql --db=test --table=t1 > include/table_defs.h

If you don't write to the include dir, you end up with recovery results for a table called 'reptest' which is in the default include/table_defs.h, and not what you want.

Now the tool is taking satisfyingly long to complete, the output data looks like it's mostly intact, and hopefully I can make a customer happy for Christmas with a non-standard service that I hope they'll enjoy.

Dell R815 power draw

Filed under: sysadmin — Thomas @ 17:43


Dear intarweb,

I've been searching all over for data on what these fancy Dell R815 servers (which can house four 12-core Opteron CPUs for a total of 48 cores in 2U) draw in terms of power.

This handy capacity planner from Dell doesn't have this machine listed yet.

Anyone out there know where I can find this info, or have one of these babies actually running ?
