thomas.apestaart.org

399 days without hard drive failures

Filed under: Fedora,sysadmin — Thomas @ 20:36

2012-12-01
20:36

Well, it's been a record 399 days, but they have come to an end. Last weekend, a drive in my home desktop started failing. I had noticed some spurious SATA errors in dmesg before, and load times were rising (although lately in the 3.4/5/6 kernels I've been running I've actually seen that happen more and more, so it wasn't a clear clue).

Then things really started slowing down, and a little later I noticed the telltale clicking sound a drive can make when it's about to give up.

Luckily life has taught me many valuable lessons when it comes to dealing with hard drives. The failing drive was a 1TB drive in a RAID-1 software raid setup, so fixing it would be simple - buy a new 1TB drive and put it in the RAID, and just wait for hours on end (or, go to sleep) as the RAID rebuilds.

A few years ago I started keeping track of my drives in a spreadsheet, labeling each drive with a simple four-digit code - the first two digits are the year I bought the drive in, and the last two digits are just a sequence number (and before you ask, the highest those two digits have reached so far is 07 - both in '11 and '12). The particular drive failing was 0906, so the drive was about 3 years old - reasonable when it comes to failure (given that it has been running pretty much 24/7), but possibly still under warranty. I've never had the opportunity to try and get a disk replaced under warranty, although this particular one was bought in Belgium.

But I digress.

Of course, I seldom take the simple route. When buying hard drives, I basically only follow one rule - buy the biggest drive with the cheapest unit price. And at last, Barcelona stores have gotten to the 3TB drives being at that sweet spot. So, why buy a comparatively expensive 1 TB drive and not get to have any fun with complicated drive migration?

So I settled on a 3TB WD Red drive (this is a new range specifically for NAS systems, although I'm not convinced they're worth the 6% extra cost, but let's give it a try) so I could replace the penultimate 2TB drive in my ReadyNAS, get 1TB of extra capacity on that, and then just use the newly freed 2TB drive in my desktop computer.

Of course, that's when I ended up with two problems.

Problem 1 was the NAS. The ReadyNAS was at 10TB already, having 4 3TB drives and 2 2TB drives with dual redundancy. I took out a tray, replaced the drive, put it back in, and then waited a good 18 hours for the array to rebuild. (The ReadyNAS has something they call XRaid2 which really is just a fancy way of creating software raids and grouping them with LVM, but in practice it usually works really well - figuring out a number of raid devices it should create using the mix of physical drives).
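The 10TB figure can be reproduced with some quick arithmetic. This is only a sketch, assuming the drives are sliced into layers of equal height and every layer becomes one array that gives up two drives' worth of space to redundancy. That layering model is an assumption made for illustration, not documented ReadyNAS behaviour, and layered_capacity_tb is a made-up name:

```python
def layered_capacity_tb(drive_sizes_tb, parity_drives=2):
    """Usable capacity when drives are sliced into equal-height layers.

    Assumption: each layer spanning n drives forms one array that loses
    parity_drives drives' worth of space (RAID-6 style when it is 2).
    """
    usable = 0
    previous_level = 0
    for level in sorted(set(drive_sizes_tb)):
        height = level - previous_level
        # Only drives at least this large take part in this layer.
        members = sum(1 for size in drive_sizes_tb if size >= level)
        if members > parity_drives:
            usable += (members - parity_drives) * height
        previous_level = level
    return usable


print(layered_capacity_tb([3, 3, 3, 3, 2, 2]))  # before swapping a 2TB drive
print(layered_capacity_tb([3, 3, 3, 3, 3, 2]))  # after swapping in a 3TB drive
```

Under that assumption the first call gives the 10TB the NAS reported, and the second gives one extra terabyte.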

This time, it had correctly done the raid shuffle, but then gave me an error message saying it couldn't actually grow the ext4 filesystem on it because it ran out of free inodes. Ouch. A lot of googling told me that I should try to do an offline resize, so I stopped all services using the file system, killed all apple servers that somehow don't shut down, and did the offline resize. And then I rebooted.

The ReadyNAS seemed to be happy with that at first, saying it now had more space (although depending on the tool you use, it still says 10TB, because the differences between counting in powers of 2^10 and powers of 10^3 add up). But soon after that it gave me ext4 errors. Uh oh.
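As an aside on why a tool can still say 10TB after the array has grown: a rough sketch of the unit conversion, assuming the grown array is about 11 decimal terabytes (10TB plus the extra 1TB mentioned earlier) and that the tool in question counts in binary units. The function name is made up for the example:

```python
TB = 10 ** 12   # decimal terabyte, as printed on the drive box
TIB = 2 ** 40   # binary tebibyte, as reported by many tools


def tb_to_tib(terabytes):
    """Convert decimal terabytes to binary tebibytes."""
    return terabytes * TB / TIB


print(round(tb_to_tib(10), 2))  # the array before growing
print(round(tb_to_tib(11), 2))  # the array after growing
```

The gap is about 9% at this scale, which is enough to swallow a whole extra terabyte in the rounded figure.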

With sweaty palms, I stopped all services again, unmounted the file system, and fsck'd it. And almost immediately it gave me a bunch of warnings about wrong superblocks, wrong inodes, all in the first 2048 sectors. Sure I have backups, but I wasn't looking forward to figuring out how up-to-date they were and restoring up to 10 TB from them.

I gasped for air and soldiered on, answering yes to all questions, until it churned away, and I went to sleep. The next morning, a few more yeses, and the file system seemed to have been checked. Another reboot, and everything seemed to be there... Phew, bullet 1 dodged.

On to problem number 2 - the desktop. The first bit was easy enough - although I've never been able to use gdisk to copy over partition tables like I used to with fdisk - it seems to say it did it, but it never actually updates the partition table. Anyway, I created it by hand copying the exact numbers, then added the partitions to the software raid one by one, and again waited a good 6 hours.

And looking at my drive spreadsheet, I noticed I had a spare 2TB drive lying around that I was keeping in case one of the NAS drives would fail - but given that most of them are 3TB right now, that wouldn't be very useful. So, after the software raid rebuilt in my desktop, I switched out the working 1 TB drive as well, and repeated the whole dance.

So now I had 2 2TB drives, 1 TB of which was correctly used. At this point I would normally figure out how to grow the partitions, then the md device, then the LVM on it, and then finally grow my ext4 /home partition. But since it's using LVM and I never played with it much, this time I wanted to experiment.

I still had the working 1TB drive which I could use as a backup in case everything would fail, so I was safe as houses.

At first I was hoping to do this with gparted live, but it seems gparted doesn't understand either software raid or lvm natively, so it's back to the command line.
Create two linux raid partitions on the two 2TB drives, assemble a new md device, and spend a lot of time reading the LVM howto.

In the end it was pretty simple: step 1 was to use vgextend to add the new md to the volume group, and step 2 was lvextend -l 100%FREE -r to grow the logical volume and resize the file system all at once. That automatically runs fsck (whose progress you can follow by sending it USR1) and then resize2fs (whose progress you can't really check once it's started).
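The two steps can be sketched as a dry run that only prints the commands, so nothing touches a real volume group. The names vg_desktop, /dev/md3 and lv_home are hypothetical placeholders, not the ones from this machine, and the size argument is copied as written above (the lvextend man page also documents a + prefix for growing relative to the current size):

```python
import shlex


def lvm_grow_commands(volume_group, new_md_device, logical_volume):
    """Return the two LVM commands as argv lists; nothing is executed."""
    return [
        # Step 1: add the new md device to the existing volume group.
        ["vgextend", volume_group, new_md_device],
        # Step 2: grow the logical volume; -r also resizes the file system.
        ["lvextend", "-l", "100%FREE", "-r",
         f"/dev/{volume_group}/{logical_volume}"],
    ]


for argv in lvm_grow_commands("vg_desktop", "/dev/md3", "lv_home"):
    print(shlex.join(argv))
```

Printing first and running by hand keeps a typo in a device name from becoming another week of disk dancing.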

(By now, we're over a week into the whole disk dance, in case you were wondering - doing anything with TB-sized disks takes a good night for each operation).

Except that now rebooting for some reason didn't work - grub complained that it didn't know the filesystems it needed - /boot is on a software raid too, and even though I don't recall running anything grub-related in this whole process, I had swapped out a few disks and may have botched something up when transferring boot records.

At the same time, I was also experimenting with Matthias's excellent new GLIM boot usb project (where you finally just drop in .iso files if you want to have multiple bootable systems on your usb key, without too much fidgeting), so I tried doing this in system rescue cd.

Boot into that, manually mount the right partitions, chroot into that, and then grub2-install /dev/sda.

Except that grub complained saying
Path `/boot/grub2' is not readable by GRUB on boot. Installation is impossible. Aborting.

Most likely this was due to it being on a software raid. Lots of people seemed to run into that, but no clear solutions, so I went the dirty way. I stopped the raid device, mounted one half of it as a normal ext file system (tried read-only first, but grub2-install actually needs to write to it), ran grub2-install, unmounted again. Then I recreated the software raid device for /boot again by reassembling, and that somehow seemed to work.

Reboot again, and this time past GRUB, but dropped in a rescue shell. My mdadm.conf didn't list the new raid device, so the whole volume group failed to start. Use blkid to identify the UUID, add that to /etc/mdadm.conf (changing the way it's formatted, those pesky dashes and colons in different places), verify that it can start it, and reboot.
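The reformatting of the UUID can be sketched in a few lines. This assumes the hex digits map straight across and only the separators move, from the dashed 8-4-4-4-12 grouping that blkid prints to four colon-separated groups of eight. The sample UUID and device name are made up, so compare the result against what mdadm itself reports before trusting it:

```python
import re


def blkid_uuid_to_mdadm(uuid):
    """Regroup a dashed UUID into four colon-separated groups of 8 hex digits."""
    hex_digits = uuid.replace("-", "").replace(":", "").lower()
    if not re.fullmatch(r"[0-9a-f]{32}", hex_digits):
        raise ValueError(f"not a 128-bit hex UUID: {uuid!r}")
    return ":".join(hex_digits[i:i + 8] for i in range(0, 32, 8))


def mdadm_array_line(md_device, blkid_uuid):
    """Build an ARRAY line suitable for pasting into /etc/mdadm.conf."""
    return f"ARRAY {md_device} UUID={blkid_uuid_to_mdadm(blkid_uuid)}"


# Hypothetical values, for illustration only.
print(mdadm_array_line("/dev/md3", "0f1e2d3c-4b5a-6978-8796-a5b4c3d2e1f0"))
```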

And finally, the reboot seems to work. Except, it needs to do an SELinux relabel for some reason! And in the time it took me to write this way-too-long blogpost, it only managed to get up to 54%.

And I was hoping to write some code tonight...

Oh well, it looks like I will have 1TB free again on my NAS, and 1TB of free space on my home desktop.

There is never enough space for all of the internet to go on your drives...

UPDATE: SELinux relabeling is now at 124%. I have no idea what to expect.


morituri 0.1.3 “cranes” released

Filed under: morituri,Releases — Thomas @ 19:53

2012-11-23
19:53

It was long overdue, but I finally got around to releasing a new version of morituri, my cd ripper.

Originally I planned to do a quick release so I could be the first cd ripper that supported MusicBrainz NGS, which I quickly implemented when they released it. But then I had to figure out how to properly do multi-cd rips (which worked fine before MusicBrainz NGS but stopped working in its early days).

Anyway, I finally made some time this week to fix a few dangling issues and clean up for a release.

See the trac page for more info and download links. You can also download it from my package repository for Fedora 16 if that's your distro.

For the curious, here's some more info:

Coverage in 0.1.3: 60 % (1716 / 2825), 85 python tests


Features added in 0.1.3:

- shorten really long file names if needed
- support multi-disc ripping
- add %y for release year in templates
- added rip cd rip --release-id option to select the exact release
- allow track and disc templates to create files in different directories
- work out relative paths from cue/m3u files to audio files

Bugs fixed in 0.1.3:

- 77: Unable to find solution to UTF-8 problem
- 93: Unable to choose if there are more than one matching CD
- 67: unable to rip multi-cd-sets correctly
- 73: rip image breaks with "query failed"
- 78: Could not create encoded file
- 84: Error when checksumming extremely short tracks
- 91: --release-id does not work for Pink Floyd - The Wall (Experience Edition) (Disc 1)
- 94: mp3vbr uses quality=0 instead of vbr-quality=0
- 95: Discs with multiple media not correctly identified.
- 99: rip offset find fails with "UnboundLocalError: local variable 'archecksum' referenced before assignment"
- 102: Unable to run without -d option
- 98: Year of release in templates


tidy for HTML5

Filed under: Fedora — Thomas @ 23:05

2012-11-11
23:05

For a website for work, I wanted to make sure the web guy writes valid HTML code.

I found a validation middleware for Django, which uses tidy.

But the website is in HTML5, so the normal tidy doesn't validate it properly.

Luckily, there's tidy-html5, a fork of tidy for html5, and it's a drop-in replacement for tidy - to the point where it even works with python-tidy.

So I packaged it up for Fedora 16/17 and put it in my package repository.

The package conflicts with the tidy packages; I'm not sure if I should set it to Obsoletes: the tidy package instead, and I don't know if it validates non-HTML5 the same way tidy does.

If anyone uses tidy regularly, give me some feedback. If anyone wants to take this into Fedora proper, let me know.


Released mach 1.0.1 “Concussion”

Filed under: Fedora,mach,Releases — Thomas @ 19:52

2012-11-10
19:52

In the middle of my CouchDB Security Series I made an unlucky fall during basketball, falling on my back, and feeling my head continue its downward trajectory until it was halted painfully by the cement floor. As I tried to get up the world turned, and as I tried to walk to the bathroom five minutes later I involuntarily kept veering off to the left.

It took me a few weeks to recover from that, and I managed to go back to playing basketball after a month. But my amount of off-work hacking was zero.

That's now been over two months, so I've finally reserved part of the weekend for some hacking again. And now I'm busy tying up loose ends, of which this is one - a new mach release for Fedora 17. Nothing very special, just warming up the muscles again.

I'm comically amused by what is still my first ever python program, but I have no desire to redo it, clean it up, or continue on the half-done mach3 version (which uses novelty programming techniques like, you know, 'more than one file').


Getting Things Done with CouchDB, part 3: Security in mushin

Filed under: couchdb,General,Hacking,Python — Thomas @ 23:26

2012-09-16
23:26

After piecing together the security story of CouchDB as it applies to mushin, I secured the mushin database on various machines. This serves as a quick setup guide for security for mushin, but I think it's useful for other people using CouchDB.

Stop using Admin Party

This is easy to do in Futon (link only works if you run couchdb locally on port 5984, as per default). Jan's blog post explains it perfectly, including screenshots.

Under the hood, couchdb will actually rewrite your local.ini file to add this user - all admin users are stored in the config files. (I'm sure there's an obvious reason for that)

Given that you most likely will use this password in Futon, make sure you pick a unique password - as far as I can tell this password goes over the wire.

Create a user object for your user

The CouchDB wiki explains the basics. You need to create or update the _users database, which is a special couchdb database. You can get to it in Futon. If, like most people, you're still on a couchdb before 1.2.0, you have to fiddle yourself to calculate the password_sha field, but at least the page explains how to do it. Not the most user-friendly thing to do in the world, so I'm considering adding commands for this to a different application I'm working on.
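For a couchdb older than 1.2.0, the fiddling amounts to hashing the password together with a random salt. A minimal sketch, assuming the pre-1.2 scheme where password_sha is the SHA-1 hex digest of the password followed by the salt. The function name make_user_doc and the example user are made up here, so check the wiki page before relying on it:

```python
import hashlib
import json
import os


def make_user_doc(name, password, salt=None):
    """Build a _users document with a salted SHA-1 password hash."""
    if salt is None:
        salt = os.urandom(16).hex()  # fresh random salt for each user
    digest = hashlib.sha1((password + salt).encode("utf-8")).hexdigest()
    return {
        "_id": "org.couchdb.user:" + name,
        "type": "user",
        "name": name,
        "roles": [],
        "password_sha": digest,
        "salt": salt,
    }


# Hypothetical user, for illustration only.
print(json.dumps(make_user_doc("thomas", "correct horse"), indent=2))
```

The printed document is what would be saved into the _users database.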

Allow this user to read and write to the mushin database

Again, the best reference is the CouchDB wiki, but the information is easy to miss.
Every database has a _security object under the database name; in the case of mushin, you can get to it in Futon. _security is a special document that does not get versioned, and doesn't show up in listings either. In fact, it is so special that Futon doesn't let you change it; when you click save it just resets. So your only option is to PUT the document, for example:

curl -X PUT -d @security.json http://admin:sup3rs3kr3t@localhost:5984/mushin/_security

Oops, see what I did there? I had to specify my admin password on the command line, and now it's in my shell history. I did tell you to choose a unique one because it's going to be all over the place, didn't I?
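One way to keep the password out of the shell history is to prompt for it instead. A sketch using only the Python standard library, with hypothetical helper names (build_security_put, put_security). It still sends the password over the wire as HTTP basic auth, so the advice about a unique password stands:

```python
import base64
import getpass
import urllib.request


def build_security_put(db_url, user, password, body):
    """Build (but do not send) a PUT request for <db_url>/_security."""
    credentials = f"{user}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return urllib.request.Request(
        db_url.rstrip("/") + "/_security",
        data=body,
        method="PUT",
        headers={
            "Authorization": "Basic " + token,
            "Content-Type": "application/json",
        },
    )


def put_security(db_url, user, json_path):
    """Prompt for the password, then PUT the JSON file to the database."""
    password = getpass.getpass(f"Password for {user}: ")
    with open(json_path, "rb") as handle:
        request = build_security_put(db_url, user, password, handle.read())
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

Calling put_security("http://localhost:5984/mushin", "admin", "security.json") would then do the same job as the curl line, minus the password in the history.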

security.json is just the contents of the _security document; just adapt the example on the wiki, and put your user under readers, and leave the role empty for now.
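As a sketch of what that file can look like, assuming the admins/readers layout with names and roles lists from the wiki example. The user name thomas is a placeholder and security_document is a made-up helper:

```python
import json


def security_document(reader_names, reader_roles=()):
    """Build a _security document that lets only the named users read."""
    return {
        "admins": {"names": [], "roles": []},
        "readers": {
            "names": list(reader_names),
            "roles": list(reader_roles),  # left empty for now
        },
    }


# Redirect this output to security.json before running the PUT.
print(json.dumps(security_document(["thomas"]), indent=2))
```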

Test denial

This one is simple; just try to GET the database:

$ curl http://localhost:5984/mushin
{"error":"unauthorized","reason":"You are not authorized to access this db."}

If you did it right, you should see the same error. If you're brave, you can retry the same curl command, but add your username and password. But you know how we feel about that.

