Present Perfect


Picture Gallery
Present Perfect

399 days without hard drive failures

Filed under: Fedora,sysadmin — Thomas @ 20:36


Well, it's been a record 399 days, but they have come to an end. Last weekend, a drive in my home desktop started failing. I had noticed some spurious SATA errors in dmesg before, and load times were rising (although lately in the 3.4/5/6 kernels I've been running I've actually seen that happen more and more, so it wasn't a clear clue).

Then things really started slowing down, and a little later I noticed the telltale clicking sound a drive can make when it's about to give up.

Luckily life has taught me many valuable lessons when it comes to dealing with hard drives. The failing drive was a 1TB drive in a RAID-1 software raid setup, so fixing it would be simple - buy a new 1TB drive and put it in the RAID, and just wait for hours on end (or, go to sleep) as the RAID rebuilds.

A few years ago I started keeping track of my drives in a spreadsheet, labeling each drive with a simple four digit code - the first two digits the year I bought the drive in, and the second two digits just a sequence (and before you ask, the highest those two digits got so far is 07 - both in '11 and '12). The particular drive failing was 0906, so the drive was about 3 years old - reasonable when it comes to failure (given that it has been running pretty much 24/7), but possibly still under warranty, and I've never had the opportunity to try and get a disk back under warranty, although this particular one was bought in Belgium.

But I digress.

Of course, I seldom take the simple route. When buying hard drives, I basically only follow one rule - buy the biggest drive with the cheapest unit price. And at last, Barcelona stores have gotten to the 3TB drives being at that sweet spot. So, why buy a comparatively expensive 1 TB drive and not get to have any fun with complicated drive migration?

So I settled on a 3TB Seagate Red drive (this is a new range specifically for NAS systems, although I'm not convinced they're worth the 6% extra cost, but let's give it a try) so I could replace the penultimate 2TB drive in my ReadyNAS, get 1TB of extra capacity on that, and then just use the newly freed 2TB drive in my desktop computer.

Of course, that's when I ended up with two problems.

Problem 1 was the NAS. The ReadyNAS was at 10TB already, having 4 3TB drives and 2 2TB drives with dual redundancy. I took out a tray, replaced the drive, put it back in, and then waited a good 18 hours for the array to rebuild. (The ReadyNAS has something they call XRaid2 which really is just a fancy way of creating software raids and grouping them with LVM, but in practice it usually works really well - figuring out a number of raid devices it should create using the mix of physical drives).

This time, it had correctly done the raid shuffle, but then gave me an error message saying it couldn't actually grow the ext4 filesystem on it because it ran out of free inodes. Ouch. A lot of googling told me that I should try to do an offline resize, so I stopped all services using the file system, killed all apple servers that somehow don't shut down, and did the offline resize. And then I rebooted.

The ReadyNAS seemed to be happy with that at first, saying it now had more space (although depending on the tool you use, it still says 10TB, because of the 2 to the 10/10 to the 3rd number differences adding up). But soon after that it gave me ext3 errrors. Uh oh.

With sweaty palms, I stopped all services again, unmounted the file system, and fsck'd it. And almost immediately it gave me a bunch of warnings about wrong superblocks, wrong inodes, all in the first 2048 sectors. Sure I have backups, but I wasn't looking forward to figuring out how up-to-date they were and restoring up to 10 TB from them.

I gasped for air and soldiered on, answering yes to all questions, until it churned away, and I went to sleep. The next morning, a few more yeses, and the file system seemed to have been checked. Another reboot, and everything seemed to be there... Phew, bullet 1 dodged.

On to problem number 2 - the desktop. The first bit was easy enough - although I've never been able to use gdisk to copy over partition tables like I used to with fdisk - it seems to say it did it, but it never actually updates the partition table. Anyway, I created it by hand copying the exact numbers, then added the partitions to the software raid one by one, and again waited a good 6 hours.

And looking at my drive spreadsheet, I noticed I had a spare 2TB drive lying around that I was keeping in case one of the NAS drives would fail - but given that most of them are 3TB right now, that wouldn't be very useful. So, after the software raid rebuilt in my desktop, I switched out the working 1 TB drive as well, and repeated the whole dance.

So now I had 2 2TB drives, 1 TB of which was correctly used. At this point I would normally figure out how to grow the partitions, then the md device, then the LVM on it, and then finally grow my ext4 /home partition. But since it's using LVM and I never played with it much, this time I wanted to experiment.

I still had the working 1TB drive which I could use as a backup in case everything would fail, so I was safe as houses.

At first I was hoping to do this with gparted live, but it seems gparted doesn't understand either software raid or lvm natively, so it's back to the command line.
Create two linux raid partitions on the two 2TB drives, assemble a new md device, and spend a lot of time reading the LVM howto.

In the end it was pretty simple; step 1 was to use vgextend to add the new md to the volume group, and then lvextend -l 100%FREE -r to grow the logical volume and resize the file system all at once. That automatically fsck's (which you can follow progress of by sending USR1) and then resize2fs (which you can't really check progress of once it's started)

(By now, we're over a week into the whole disk dance, in case you were wondering - doing anything with TB-sized disks takes a good night for each operation).

Except that now rebooting for some reason didn't work - grub complained that it didn't know the filesystems it needed - /boot is on a software raid too, and even though I don't recall running anything grub-related in this whole process, I had swapped out a few disks and may have botched something up when transferring boot records.

At the same time, I was also experimenting with Matthias's excellent new GLIM boot usb project (where you finally just drop in .iso files if you want to have multiple bootable systems on your usb key, without too much fidgeting), so I tried doing this in system rescue cd.

Boot into that, manually mount the right partitions, chroot into that, and then grub2-install /dev/sda.

Except that grub complained saying

Path `/boot/grub2' is not readable by GRUB on boot. Installation is impossible. Aborting.

Most likely this was due to it being on a software raid. Lots of people seemed to run into that, but no clear solutions, so I went the dirty way. I stopped the raid device, mounted one half of it as a normal ext file system (tried read-only first, but grub2-install actually needs to write to it), ran grub2-install, unmounted again. Then I recreated the software raid device for /boot again by reassembling, and that somehow seemed to work.

Reboot again, and this time past GRUB, but dropped in a rescue shell. My mdadm.conf didn't list the new raid device, so the whole volume group failed to start. Use blkid to identify the UUID, add that to /etc/mdadm.conf (changing the way it's formatted, those pesky dashes and colons in different places), verify that it can start it, and reboot.

And finally, the reboot seems to work. Except, it needs to do an SELinux relabel for some reason! And in the time it took me to write this way-too-long blogpost, it only managed to get up to 54%.

And I was hoping to write some code tonight...

Oh well, it looks like I will have 1TB free again on my NAS, and 1TB of free space on my home desktop.

There is never enough space for all of the internet to go on your drives...

UPDATE: SELinux relabeling is now at 124%. I have no idea what to expect.


  1. Yeah, SELinux is an amazingly beautiful IDEA, but in real life, it’s eye-wateringly difficult to maintain over time (and this is assuming you EVER get it to bless all your services).

    I figure half the security of it is that it’s so damn hard to configure–and that’s why it’s so hard to get past. Security through obscurity rides again!

    Comment by Bucky — 2012-12-01 @ 22:18

  2. The hardest hard drive lesson I ever learned was the importance of getting a good UPS. The cheap UPSes won’t save your drives from a brownout.

    Then again, I have a terrible power company and live in a place with wiring from the 1940’s.

    Comment by MrEricSir — 2012-12-02 @ 02:37

  3. I see your blog software is still broken in the presence of trailing slashes, and your RSS feed still generates such links.

    Comment by Marius Gedminas — 2012-12-02 @ 14:05

  4. That’s a little vague. Can you tell me exactly what is going wrong for you? Are you using Google Reader perhaps?

    Comment by Thomas — 2012-12-02 @ 15:22

  5. Okay, it’s not trailing slashes that make all the text hidden. I’ve no idea what causes it to come up with no text on the first load…

    Comment by Marius Gedminas — 2012-12-02 @ 14:07

  6. You look ready to try btrfs. Before you say it’s scary, think about how scary your use of software RAID is.

    Comment by Bernie — 2012-12-02 @ 21:27

  7. Heh, that UPDATE! bit was unexpected and hilarious. Good luck mate! ;)

    Comment by jordi — 2012-12-03 @ 18:59

  8. Thats it – buying cheap drives with high capacities leads you to such a high number of failed drives and waste of time later in rebuilds. The only HDD that ever let down for me was the hell bad IBM 30GB series. Having a HDD running for 9 years 24/7, buying used ones, but not the cheap class. I think this is worth it.

    Comment by comodet — 2012-12-06 @ 22:25

RSS feed for comments on this post. TrackBack URL

Leave a comment