[lang]

Present Perfect

Personal
Projects
Packages
Patches
Presents
Linux

Picture Gallery
Present Perfect

399 days without hard drive failures

Filed under: Fedora,sysadmin — Thomas @ 8:36 pm

2012-12-1
8:36 pm

Well, it’s been a record 399 days, but they have come to an end. Last weekend, a drive in my home desktop started failing. I had noticed some spurious SATA errors in dmesg before, and load times were rising (although lately in the 3.4/5/6 kernels I’ve been running I’ve actually seen that happen more and more, so it wasn’t a clear clue).

Then things really started slowing down, and a little later I noticed the telltale clicking sound a drive can make when it’s about to give up.

Luckily life has taught me many valuable lessons when it comes to dealing with hard drives. The failing drive was a 1TB drive in a RAID-1 software raid setup, so fixing it would be simple – buy a new 1TB drive and put it in the RAID, and just wait for hours on end (or, go to sleep) as the RAID rebuilds.

A few years ago I started keeping track of my drives in a spreadsheet, labeling each drive with a simple four digit code – the first two digits the year I bought the drive in, and the second two digits just a sequence (and before you ask, the highest those two digits got so far is 07 – both in ’11 and ’12). The particular drive failing was 0906, so the drive was about 3 years old – reasonable when it comes to failure (given that it has been running pretty much 24/7), but possibly still under warranty, and I’ve never had the opportunity to try and get a disk back under warranty, although this particular one was bought in Belgium.

But I digress.

Of course, I seldom take the simple route. When buying hard drives, I basically only follow one rule – buy the biggest drive with the cheapest unit price. And at last, Barcelona stores have gotten to the 3TB drives being at that sweet spot. So, why buy a comparatively expensive 1 TB drive and not get to have any fun with complicated drive migration?

So I settled on a 3TB Seagate Red drive (this is a new range specifically for NAS systems, although I’m not convinced they’re worth the 6% extra cost, but let’s give it a try) so I could replace the penultimate 2TB drive in my ReadyNAS, get 1TB of extra capacity on that, and then just use the newly freed 2TB drive in my desktop computer.

Of course, that’s when I ended up with two problems.

Problem 1 was the NAS. The ReadyNAS was at 10TB already, having 4 3TB drives and 2 2TB drives with dual redundancy. I took out a tray, replaced the drive, put it back in, and then waited a good 18 hours for the array to rebuild. (The ReadyNAS has something they call XRaid2 which really is just a fancy way of creating software raids and grouping them with LVM, but in practice it usually works really well – figuring out a number of raid devices it should create using the mix of physical drives).

This time, it had correctly done the raid shuffle, but then gave me an error message saying it couldn’t actually grow the ext4 filesystem on it because it ran out of free inodes. Ouch. A lot of googling told me that I should try to do an offline resize, so I stopped all services using the file system, killed all apple servers that somehow don’t shut down, and did the offline resize. And then I rebooted.

The ReadyNAS seemed to be happy with that at first, saying it now had more space (although depending on the tool you use, it still says 10TB, because of the 2 to the 10/10 to the 3rd number differences adding up). But soon after that it gave me ext3 errrors. Uh oh.

With sweaty palms, I stopped all services again, unmounted the file system, and fsck’d it. And almost immediately it gave me a bunch of warnings about wrong superblocks, wrong inodes, all in the first 2048 sectors. Sure I have backups, but I wasn’t looking forward to figuring out how up-to-date they were and restoring up to 10 TB from them.

I gasped for air and soldiered on, answering yes to all questions, until it churned away, and I went to sleep. The next morning, a few more yeses, and the file system seemed to have been checked. Another reboot, and everything seemed to be there… Phew, bullet 1 dodged.

On to problem number 2 – the desktop. The first bit was easy enough – although I’ve never been able to use gdisk to copy over partition tables like I used to with fdisk – it seems to say it did it, but it never actually updates the partition table. Anyway, I created it by hand copying the exact numbers, then added the partitions to the software raid one by one, and again waited a good 6 hours.

And looking at my drive spreadsheet, I noticed I had a spare 2TB drive lying around that I was keeping in case one of the NAS drives would fail – but given that most of them are 3TB right now, that wouldn’t be very useful. So, after the software raid rebuilt in my desktop, I switched out the working 1 TB drive as well, and repeated the whole dance.

So now I had 2 2TB drives, 1 TB of which was correctly used. At this point I would normally figure out how to grow the partitions, then the md device, then the LVM on it, and then finally grow my ext4 /home partition. But since it’s using LVM and I never played with it much, this time I wanted to experiment.

I still had the working 1TB drive which I could use as a backup in case everything would fail, so I was safe as houses.

At first I was hoping to do this with gparted live, but it seems gparted doesn’t understand either software raid or lvm natively, so it’s back to the command line.
Create two linux raid partitions on the two 2TB drives, assemble a new md device, and spend a lot of time reading the LVM howto.

In the end it was pretty simple; step 1 was to use vgextend to add the new md to the volume group, and then lvextend -l 100%FREE -r to grow the logical volume and resize the file system all at once. That automatically fsck’s (which you can follow progress of by sending USR1) and then resize2fs (which you can’t really check progress of once it’s started)

(By now, we’re over a week into the whole disk dance, in case you were wondering – doing anything with TB-sized disks takes a good night for each operation).

Except that now rebooting for some reason didn’t work – grub complained that it didn’t know the filesystems it needed – /boot is on a software raid too, and even though I don’t recall running anything grub-related in this whole process, I had swapped out a few disks and may have botched something up when transferring boot records.

At the same time, I was also experimenting with Matthias’s excellent new GLIM boot usb project (where you finally just drop in .iso files if you want to have multiple bootable systems on your usb key, without too much fidgeting), so I tried doing this in system rescue cd.

Boot into that, manually mount the right partitions, chroot into that, and then grub2-install /dev/sda.

Except that grub complained saying
Path `/boot/grub2' is not readable by GRUB on boot. Installation is impossible. Aborting.

Most likely this was due to it being on a software raid. Lots of people seemed to run into that, but no clear solutions, so I went the dirty way. I stopped the raid device, mounted one half of it as a normal ext file system (tried read-only first, but grub2-install actually needs to write to it), ran grub2-install, unmounted again. Then I recreated the software raid device for /boot again by reassembling, and that somehow seemed to work.

Reboot again, and this time past GRUB, but dropped in a rescue shell. My mdadm.conf didn’t list the new raid device, so the whole volume group failed to start. Use blkid to identify the UUID, add that to /etc/mdadm.conf (changing the way it’s formatted, those pesky dashes and colons in different places), verify that it can start it, and reboot.

And finally, the reboot seems to work. Except, it needs to do an SELinux relabel for some reason! And in the time it took me to write this way-too-long blogpost, it only managed to get up to 54%.

And I was hoping to write some code tonight…

Oh well, it looks like I will have 1TB free again on my NAS, and 1TB of free space on my home desktop.

There is never enough space for all of the internet to go on your drives…

UPDATE: SELinux relabeling is now at 124%. I have no idea what to expect.

Puppet pains

Filed under: sysadmin — Thomas @ 2:53 pm

2012-3-27
2:53 pm

The jury is still out on puppet as far as I’m concerned.

On the one hand, of course I relish that feeling of ultimate power you are promised over all those machines… I appreciate the incremental improvements it lets you make, and have it give you the feeling that anything will be possible.

But sometimes, it is just so painful to deal with. Agent runs are incredibly slow. It really shouldn’t take over a minute for a simple configuration with four machines. Also, does it really need to be eating 400 MB of RAM while it does so ? And when running with the default included web server (is that webrick ?), I have to restart my puppetmaster for every single run because there is this one multiple definition that I can’t figure out that simply goes away when you restart, but comes back after an agent run:
err: Could not retrieve catalog from remote server: Error 400 on SERVER: Duplicate definition: Class[Firewall::Drop] is already defined; cannot redefine at /etc/puppet/environments/testing/modules/manifests/firewall/drop.pp:19 on node esp

And sometimes it’s just painfully silly. I just spent two hours trying to figure out why my production machine couldn’t complete its puppet run.

All it was telling me was
Could not evaluate: 'test' is not executable

After a lot of googling, I stumbled on this ticket. And indeed, I had a file called ‘test’ in my /root directory.

I couldn’t agree with the reporter more:

I find it incredibly un-pragmatic to have policies fail to run whenever someone creates a file in root which matches the name of an executable I am running.

More adventures in puppet

Filed under: General,Hacking,sysadmin — Thomas @ 11:32 pm

2012-3-4
11:32 pm

After last week’s Linode incident I was getting a bit more worried about security than usual. That coincided with the fact that I found I couldn’t run puppet on one of my linodes, and some digging turned up that it was because /tmp was owned by uid:gid 1000:1000. Since I didn’t know the details of the breakin (and I hadn’t slept more than 4 hours for two nights, one of which involving a Flumotion DVB problem), I had no choice but to be paranoid about it. And it took me a good half hour to realize that I had inflicted this problem on myself – a botched rsync command (rsync arv . root@somehost:/tmp).

So I wasn’t hacked, but I still felt I needed to tighten security a bit. So I thought I’d go with something simple to deploy using puppet – port knocking.

Now, that would be pretty easy to do if I just deployed firewall rules in a single set. But I started deploying firewall rules using the puppetlabs firewall module, which allows me to group rules per service. So that’s the direction I wanted to head off into.

On saturday, I worked on remembering enough iptables to actually understand how port knocking works in a firewall. Among other things, I realized that our current port knocking is not ideal – it uses only two ports. They’re in descending order, so usually they would not be triggered by a normal port scan, but they would be triggered by one in reverse order. That is probably why most sources recommend using three ports, where the third port is between the first two, so they’re out of order.

So I wanted to start by getting the rules right, and understanding them. I started with this post, and found a few problems in it that I managed to work out. The fixed version is this:
UPLINK="p21p1"
#
# Comma seperated list of ports to protect with no spaces.
SERVICES="22,3306"
#
# Location of iptables command
IPTABLES='/sbin/iptables'

# in stage1, connects on 3456 get added to knock2 list
${IPTABLES} -N stage1
${IPTABLES} -A stage1 -m recent --remove --name knock
${IPTABLES} -A stage1 -p tcp --dport 3456 -m recent --set --name knock2

# in stage2, connects on 2345 get added to heaven list
${IPTABLES} -N stage2
${IPTABLES} -A stage2 -m recent --remove --name knock2
${IPTABLES} -A stage2 -p tcp --dport 2345 -m recent --set --name heaven

# at the door:
# - jump to stage2 with a shot at heaven if you're on list knock2
# - jump to stage1 with a shot at knock2 if you're on list knock
# - get on knock list if connecting t0 1234
${IPTABLES} -N door
${IPTABLES} -A door -m recent --rcheck --seconds 5 --name knock2 -j stage2
${IPTABLES} -A door -m recent --rcheck --seconds 5 --name knock -j stage1
${IPTABLES} -A door -p tcp --dport 1234 -m recent --set --name knock

${IPTABLES} -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
${IPTABLES} -A INPUT -p tcp --match multiport --dport ${SERVICES} -i ${UPLINK} -m recent --rcheck --seconds 5 --name heaven -j ACCEPT
${IPTABLES} -A INPUT -p tcp --syn -j door

# close everything else
${IPTABLES} -A INPUT -j REJECT --reject-with icmp-port-unreachable

And it gives me this iptables state:

knock

So the next step was to reproduce these rules using puppet firewall rules.

Immediately I ran into the first problem – we need to add new chains, and there doesn’t seem to be a way to do that in the firewall resource. At the same time, it uses the recent iptables module, and none of that is implemented either. I spent a bunch of hours trying to add this, but since I don’t really know Ruby and I’ve only started using Puppet for real in the last two weeks, that wasn’t working out well. So then I thought, why not look in the bug tracker and see if anyone else tried to do this ? I ask my chains question on IRC, while I find a ticket about recent support. A minute later danblack replies on IRC with a link to a branch that supports creating chains – the same person that made the recent branch.

This must be a sign – the same person helping me with my problem in two different ways, with two branches? Today will be a git-merging to-the-death hacking session, fueled by the leftovers of yesterday’s mexicaganza leftovers.

I start with the branch that lets you create chains, which works well enough, bar some documentation issues. I create a new branch and merge this one on, ending up in a clean rebase.

Next is the recent branch. I merge that one on. I choose to merge in this case, because I hope it will be easier to make the fixes needed in both branches, but still pull everything together on my portknock branch, and merge in updates every time.

This branch has more issues – rake test doesn’t even pass. So I start digging through the failing testcases, adding print debugs and learning just enough ruby to be dangerous.

I slowly get better at fixing bugs. I create minimal .pp files in my /etc/puppet/manifests so I can test just one rule with e.g. puppet apply manifests/recent.pp

The firewall module hinges around being able to convert a rule to a hash as expressed in puppet, and back again, so that puppet can know that a rule is already present and does not need to be executed. I add a conversion unit test for each of the features that tests these basic operations, but I end up actually fixing the bugs by sprinkling print’s and testing with a single apply.

I learn to do service iptables restart; service iptables stop to reset my firewall and start cleanly. It takes me a while to realize when I botched the firewall so that I can’t even google (in my case, forgetting to have -A INPUT -m state –state ESTABLISHED,RELATED -j ACCEPT
) – not helped by the fact that for the last two weeks the network on my home desktop is really flaky, and simply stops working after some activity, forcing me to restart NetworkManager and reload network modules.

I start getting an intuition for how puppet’s basic resource model works. For example, if a second puppet run produces output, something’s wrong. I end up fixing lots of parsing bugs because of that – once I notice that a run tells me something like
notice: /Firewall[999 drop all other requests]/chain: chain changed '-p' to 'INPUT'
notice: Firewall[999 drop all other requests](provider=iptables): Properties changed - updating rule

I know that, even though the result seems to work, I have some parsing bug, and I can attack that bug by adding another unit test and adding more prints for a simple rule.

I learn that, even though the run may seem clean, if the module didn’t figure out that it already had a rule (again, because of bogus parsing), it just adds the same rule again – another thing we don’t want. That gets fixed on a few branches too.

And then I get to the point where my puppet apply brings all the rules together – except it still does not work. And I notice one little missing rule: ${IPTABLES} -A INPUT -p tcp –syn -j door

And I learn about –syn, and –tcp-flags, and to my dismay, there is no support for tcp-flags anywhere. There is a ticket for TCP flags matching support, but nobody worked on it.

So I think, how hard can it be, with everything I’ve learned today? And I get onto it. It turns out it’s harder than expected. Before today, all firewall resource properties swallowed exactly one argument – for example, -p (proto). In the recent module, some properties are flags, and don’t have an argument, so I had to support that with some hacks.

The rule_to_hash function works by taking an iptables rule line, and stripping off the parameters from the back in reverse order one by one, but leaving the arguments there. At the end, it has a list of keys it saw, and hopefully, a string of arguments that match the keys, but in reverse order. (I would have done this by stripping the line of both parameter and argument(s) and putting those on a list, but that’s just me)

But the –tcp-flags parameter takes two arguments – a mask of flags, and a list of flags that needs to be set. So I hack it in by adding double quotes around it, so it looks the same way a –comment does (except –comment is always quoted in iptables –list-rules output), and handle it specially. But after some fidgeting, that works too!

And my final screenshot for the day:

knock-puppet

So, today’s result:

Now, I have a working node that implements port knocking:
node 'ana' {

$port1 = '1234'
$port2 = '3456'
$port3 = '2345'

$dports = [22, 3306]

$seconds = 5

firewall { "000 accept all icmp requests":
proto => "icmp",
action => "accept",
}

firewall { "001 accept all established connections":
proto => "all",
state => ["RELATED", "ESTABLISHED"],
action => "accept",
}

firewall { "999 drop all other requests":
chain => "INPUT",
proto => "tcp",
action => "reject",
}

firewallchain { [':stage1:', ':stage2:', ':door:']:
}

# door
firewall { "098 knock2 goes to stage2":
chain => "door",
recent_command => "rcheck",
recent_name => "knock2",
recent_seconds => $seconds,
jump => "stage2",
require => [
Firewallchain[':door:'],
Firewallchain[':stage2:'],
]
}

firewall { "099 knock goes to stage1":
chain => "door",
recent_command => "rcheck",
recent_name => "knock",
recent_seconds => $seconds,
jump => "stage1",
require => [
Firewallchain[':door:'],
Firewallchain[':stage1:'],
]
}

firewall { "100 knock on port $port1 sets knock":
chain => "door",
proto => 'tcp',
recent_name => 'knock',
recent_command => 'set',
dport => $port1,
require => [
Firewallchain[':door:'],
]
}

# stage 1
firewall { "101 stage1 remove knock":
chain => "stage1",
recent_name => "knock",
recent_command => "remove",
require => Firewallchain[':stage1:'],
}

firewall { "102 stage1 set knock2 on $port2":
chain => "stage1",
recent_name => "knock2",
recent_command => "set",
proto => "tcp",
dport => $port2,
require => Firewallchain[':stage1:'],
}

# stage 2
firewall { "103 stage2 remove knock":
chain => "stage2",
recent_name => "knock",
recent_command => "remove",
require => Firewallchain[':stage2:'],
}

firewall { "104 stage2 set heaven on $port3":
chain => "stage2",
recent_name => "heaven",
recent_command => "set",
proto => "tcp",
dport => $port3,
require => Firewallchain[':stage2:'],
}

# let people in heaven
firewall { "105 heaven let connections through":
chain => "INPUT",
proto => "tcp",
recent_command => "rcheck",
recent_name => "heaven",
recent_seconds => $seconds,
dport => $dports,
action => accept,
require => Firewallchain[':stage2:'],
}

firewall { "106 connection initiation to door":
# FIXME: specifying chain explicitly breaks insert_order !
chain => "INPUT",
proto => "tcp",
tcp_flags => "FIN,SYN,RST,ACK SYN",
jump => "door",
require => [
Firewallchain[':door:'],
]
}
}

and I can log in with
nc -w 1 ana 1234; nc -w 1 ana 3456; nc -w 1 ana 2345; ssh -A ana

Lessons learned today:

  • watch iptables -nvL is an absolutely excellent way of learning more about your firewall – you see your rules and the traffic on them in real time. It made it really easy to see for example the first nc command triggering the knock.
  • Puppet is reasonably hackable – I was learning quickly as I progressed through test and bug after test and bug.
  • I still don’t like ruby, and we may never be friends, but at least it’s something I’m capable of learning. Puppet might just end up being the trigger.

Tomorrow, I need to clean up the firewall rules into something reusable, and deploy it on the platform.

How do you manage mailing lists?

Filed under: Question,sysadmin — Thomas @ 5:00 pm

2012-1-2
5:00 pm

Every new year is a time of cleaning. After getting back to Inbox 0, my next target is my mailing list subscriptions.

It must be something psychological, but I cannot bring myself to unsubscribe from some of these mailing lists. I don’t check on them daily, but once in a while it’s darn useful to search through my local copy of mails on, say, selinux, and find solutions for a problem I’m having.

However, all this mailing list mail brings me a lot of headache. My email client is slow, and I would want it to be fast for the real mail I’m getting (from actual people, needing actual work). It’s hard to track the mails that matter – all my list mail gets put into folders automatically with some procmail magic, but it also means that some of the things I should be paying more attention to are just another bold folder in Evolution somewhere down the mail tree. And lastly, the server where I host my mail shared with friends gets too much traffic, and syncing 3 different evolutions over IMAP with it is a big part of the burden.

I vastly prefered the newsreader model of old, and I think the de facto standard of mailing lists really is a mistake. But I’m not sure what to replace it with.

What I want:

  1. have selected mailing list archives be available on my machines, locally
  2. have them synced/updated automatically
  3. have them out of the way of my normal mail usage unless when I need them

I’ve been considering getting a separate email account just for email lists for this purpose, although I don’t look forward much to having to change all my subscriptions, and would first like to hear from other people how this approach works out for them.

There used to be a push towards web-based mailing list subscriptions, but I don’t know if anyone is really seriously using that, and I would like to have the option of reading these mailing list archives offline.

How do you separate your ‘real’ mail from your mailing list mail? How do you handle them?

using xargs on a list of paths with spaces in a file

Filed under: Hacking,sysadmin — Thomas @ 7:18 pm

2011-12-30
7:18 pm

Every few weeks I have to spend an hour figuring out exactly the same non-googleable thing I’ve already needed to figure out. So this time it’s going on my blog.

The problem is simple: given an input file listing paths, one per line, which probably contain spaces – how do I run a shell command that converts each line to a single shell argument ?

Today, my particular case was a file /tmp/dirs on my NAS which lists all directories in one of my dirvish vaults that contains files bigger than a GB. For some reason not everything is properly hardlinked, but running hardlink on the vault blows up because there are so many files in there.

Let’s see if wordpress manages to not mangle the following shell line.

perl -p -e 's@\n@\000@g' /tmp/dirs | xargs -0 /root/hardlink.py -f -p -t -c --dry-run

Next Page »
picture