Backdrop for the story – after a weekend or two fiddling with all my power supplies to try and revive my home server, I finally got it back online. Well, sort of. Disk 1 of the software RAID for my vault drives (that have backups of all my stuff) was making strange ticking noises at bootup and the BIOS did not find it.
Well, that’s why I have RAID in the first place, right? Drives fail (they seem to do so more often now that I have too many of them), and the solution is simple: get a new drive, swap out the broken one, do a hot add, and there you go.
I’ve been so good the last year about backups. This drive has dirvish backups of my laptop and my work machine, and it was doing great! I was sure I was never going to lose data again. Now I wasn’t so sure anymore.
So I turn off the server, disconnect the vault drives (to make sure that the other one has no chance to start failing), and fly back to Barcelona, where I go drive shopping during the week. No more 400 GB drives to be found, so I get a 500 GB one. (And while I’m at it, I buy the two 1 TB drives I’ve been trying not to buy for a long time).
Come back home to Brussels, put the drive in, boot, and try to mount the slave from the vault software RAID. Because that’s the beauty of software RAID-1 on Linux, isn’t it? Either drive can be mounted on its own, just as if it were a normal drive.
Well, except that there was no partition table, and I couldn’t mount the drive. Now I was really worried :/
So, data recovery. Let’s start with a dd from the old slave to the new 500 GB master drive, so I can try funky partitioning stuff on there. Three hours in, I knock a motherboard box off the shelf, and of course it lands right on the slave vault drive, knocking out the power. Oops.
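For the record, the copy itself is nothing fancier than a straight dd between the two drives. Here’s a scaled-down sketch with plain files standing in for the real drives (the .img names are made up for illustration; on the real machine the if= and of= would be the device nodes):

```shell
# Scaled-down stand-ins for the real drives; the .img names are made up.
dd if=/dev/zero of=old-slave.img bs=1M count=4 2>/dev/null   # pretend this is the old vault drive
dd if=old-slave.img of=new-master.img bs=1M 2>/dev/null      # the actual copy step, byte for byte
cmp old-slave.img new-master.img && echo 'copies are identical'   # prints: copies are identical
```

The point of copying first is that every experiment afterwards happens on the copy, so the original drive only gets read once.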
Start over. Meanwhile, I learn about gpart, a tool that scans your drive for partition information. I let that run for three hours; it finds absolutely nothing. Nothing. (The smart ones among you have already figured out how stupid I am by now.) I’m starting to get really worried.
With half the dd done, I run fdisk on the master (which has the copy) and partition the drive, then try to mount the partition. Still no go; mount suggests trying an alternate superblock. I try some of the usual numbers, but nothing works. I’m getting really worried now.
Google some more, and find a tool called scandrive. I compile it and run it, limiting the scan to the first 100 sectors:
[root@onzenbak ~]# ./scandrive -v /dev/hdd -C 100
scandrive v1.00 (2002/02/01) – firstname.lastname@example.org
I/O buffer: 256 sectors of 512 bytes
Device /dev/hdd is open (capacity = 100 sectors)
Loop 0: scanning sector 0 ( 0.00%)…
Found ext2 magic at sector 2 (size 97677824, #0)
What the? There’s an ext2 magic string right at the beginning of the disk.
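The sector number is actually the big clue here: the ext2/ext3 superblock lives at byte offset 1024 into the filesystem (sector 2 with 512-byte sectors), and the 0xEF53 magic number sits 56 bytes into it, at byte 1080. So a magic string at sector 2 of the raw disk means the filesystem starts at sector 0, with nothing in front of it. A little sketch that fakes this with a plain file (fake-fs.img is a made-up name) and checks it scandrive-style:

```shell
# The ext2/ext3 superblock starts 1024 bytes into the filesystem
# (= sector 2 with 512-byte sectors); its 0xEF53 magic number sits at
# offset 56 inside the superblock, i.e. byte 1080. Fake it in a file:
dd if=/dev/zero of=fake-fs.img bs=512 count=4 2>/dev/null
# \123\357 is octal for the bytes 0x53 0xEF:
printf '\123\357' | dd of=fake-fs.img bs=1 seek=1080 conv=notrunc 2>/dev/null
# scandrive-style check: the magic is stored little-endian, so the raw
# bytes on disk read 53 ef:
od -An -tx1 -j 1080 -N 2 fake-fs.img     # prints: 53 ef
```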
And then it hit me. Doh, was I really that stupid? Wait, shouldn’t I have a config file for the software RAID in the first place?
[root@onzenbak ~]# cat /etc/mdadm.conf
# these devices can be used as part of RAID arrays
DEVICE /dev/hdc /dev/hdd
# md0 is our mirrored RAID partition made from our two drives
ARRAY /dev/md0 devices=/dev/hdc,/dev/hdd
Yeah, now I remember that time in my life when I thought partitioning was a waste of space, and I might just as well use the drive directly.
So now I know why I should not do that: I have a 400 GB drive and a 500 GB drive in the machine, and I should create a second partition on the bigger drive so I get some 100 GB of extra useful disk space.
So, here’s the question to you smart people: how can I fix up my /dev/hdd (the 400 GB drive) to have its ext3 file system inside a partition? Can I then simply change the DEVICE and ARRAY lines to use hdc1 and hdd1? And how do I get this right without losing data?