Present Perfect

Low cost per core

Filed under: Question, sysadmin, Work — Thomas @ 2010-06-21 12:16

For work, I'm re-reviewing servers and systems with the simple-but-not-easy goal of lowering the basic monthly cost per core. The world of racks, servers, CPUs and cores is a more complicated place than it was a few years ago, since in a few U you can put anything from a bunch of small cheap servers up to monster boards with four CPU sockets and 12-core CPUs, for a total of 48 cores in a 2U space. And a look at a Blade system still makes me drool, although I'm still not sure in what case a Blade really makes sense.

In any case, I tried to do a little comparison (which is hard, because you end up comparing apples and oranges) using Dell's online configurator.

On the one hand, filling racks with Poweredge R810 machines (4 x 8-core 1.86 GHz Intel Xeon L7555) gets the price down to 26 euro per core per month. Doing the same with Opterons, which surely aren't as powerful as the Intel ones, I can get a Poweredge R815 with four 2.2 GHz 12-core Opteron 6174 CPUs, for 48 cores total, at 9.53 euro per core per month.
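
(For the curious: the arithmetic behind a "cost per core per month" figure is nothing more than something like the sketch below; the purchase price, amortization period and hosting share are made-up placeholders, not Dell quotes.)

    # back-of-the-envelope cost per core per month; every figure is a placeholder
    PRICE=20000      # server purchase price in euro (made up)
    MONTHS=36        # amortization period in months (made up)
    HOSTING=250      # monthly share of rack space, power and bandwidth in euro (made up)
    CORES=48
    echo "scale=2; ($PRICE / $MONTHS + $HOSTING) / $CORES" | bc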

And then I thought a Blade would be an even better deal, but it turns out that it isn't really. The cost per core, with similar CPUs, really did come out pretty much the same as the R810-based solution. Probably not that surprising in the end, since once you fill a machine with cores, the CPU cost starts to dominate. But somehow I thought that Blades would end up being cheaper for maximum core power.

Maybe I'm approaching this the wrong way? If the main concern is cost per core in a datacenter, how would you go about selecting systems?

DEFCON 17

Filed under: Fluendo, sysadmin — Thomas @ 2009-08-11 10:49

Defcon 17 wrapped up last week. Apparently, we had three of our people in the contest, and they scored very well!

Sergi from Fluendo, an old-timer going for the third time, ended up in 5th position with his 'Sexy Pwndas' team.

Javier and Guillem, from the Flumotion support team, went for the first time, and landed in 7th position with their 'Sapheads' team, a particularly impressive feat for first-timers.

Check ddtek for the final ranking.

Congratulations, guys! Maybe we should have you do an audit of our platform?

Recovering from a lost /var on Fedora/Red Hat/CentOS

Filed under: Fedora, sysadmin — Thomas @ 2009-06-24 20:28

Last week, after upgrading my home desktop to F11, palimpsest told me that one of the disks in that machine was broken. The desktop runs on two 250 GB drives in software RAID. It was time to get new drives.

After a weekend of fiddling with new 1 TB disks for my home desktop, trying failure scenarios, making sure the system can boot from each of the two drives, and waiting for the 4-hour resync of the software RAID in between each step, I finally closed up the desktop machine and cleaned up under my desk again, thinking I was done with my half-yearly messing about with broken disks.

I guess I was tempting fate anyway. While doing a routine operation on my home server, after all the configuration work I'd done to set up asterisk last week, suddenly an rsync aborted, a journal errored out, a partition switched to being mounted read-only, and the log filled up with scary drive errors. Ouch.

Well, that's why I keep around a big box of old drives - for when some drive fails and I want to tempt fate even more by reusing an old drive that's probably going to fail real soon too. And anyway, I had just spent my hard drive piggybank on the new desktop drives.

Luckily, I had a 400 GB SATA drive lying around that used to belong to my media center. I don't remember why I swapped it out, given that the media center has a 160 GB drive for the OS (and two 1.5 TB RAID drives for the data, of course), but this was a lucky break. I booted with a rescue cd and tried copying the root filesystem of my CentOS 5.2 home server partition to this new drive. That worked fine, except that /var was where I triggered an Input/Output error and some more drive errors in the kernel log.
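
(For the curious: the copy itself doesn't need anything fancier than the sketch below, run from the rescue environment; the device names and mount points here are made up for illustration, and on my box the source was actually the software RAID device.)

    # make a filesystem on the replacement drive and copy the old root across
    mkfs.ext3 /dev/sdb1
    mkdir -p /mnt/old /mnt/new
    mount /dev/sda1 /mnt/old
    mount /dev/sdb1 /mnt/new
    cp -ax /mnt/old/. /mnt/new/    # -a preserves ownership and permissions, -x stays on one filesystem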

So, powered off, took out the broken drive, and put it in a USB chassis. The advantage of a USB chassis is that you can easily just replug the drive to try again, instead of locking up your system terribly and having to reboot. Sadly, /var was broken beyond repair. I ran an e2fsck hoping to recover the contents, and that partly worked, but some of the important stuff is missing even from lost+found (apart from the annoying situation where you have to reconstruct file names, which I usually end up not bothering with).

But really, how important can /var be? Turns out, rather important. As in, you need it to boot in the first place. And also, it holds your rpm database. Crap.

Some googling turned up posts on how to reconstruct your rpm database from log files (using --justdb --noscripts --notriggers). But to use those, you actually need those log files. Where are they? On /var as well. Crap. And they're not in lost+found either.
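
(For reference, here's roughly what that log-based reconstruction looks like when /var/log/yum.log does survive; it didn't in my case, so treat this as an untested sketch. yumdownloader comes from the yum-utils package, and the exact yum.log format varies between yum versions.)

    # pull the installed package list out of yum.log, fetch the matching packages,
    # and re-register them in the rpm database without touching the files on disk
    sed -n 's/.*Installed: //p' /var/log/yum.log | awk '{ print $1 }' | sort -u > /tmp/pkgs
    mkdir -p /tmp/rpms && cd /tmp/rpms
    yumdownloader `cat /tmp/pkgs`
    rpm -Uvh --justdb --noscripts --notriggers *.rpm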

Ok, so time to get creative. Here's what I ended up doing:

  • Create /var/lib/rpm, and run rpm --rebuilddb to end up with an empty rpm database
  • Based on the contents of /etc, figure out what packages ought to be installed:

    rpm -qf /etc/* | grep 'not owned' | cut -f2 -d' ' > /tmp/unowned
    yum --enablerepo=c5-media --disablerepo=base --disablerepo=updates --disablerepo=addons --disablerepo=extras whatprovides `cat /tmp/unowned` | cut -f1 -d' ' | sort | uniq > /tmp/missing
    yum --enablerepo=c5-media --disablerepo=base --disablerepo=updates --disablerepo=addons --disablerepo=extras install `cat /tmp/missing`

    This works by first listing all files that are not owned by rpm (on the first run, that's all of them), figuring out what packages can provide those files, and then installing those packages.

  • Repeat the process for other important directories, like /bin, /sbin, /usr/sbin, /usr/lib, /usr/include, ... (a consolidated version of this loop is sketched right after this list)
  • Clean up .rpmnew files that don't actually contain differences:

    find / -name '*.rpmnew' | sed 's/\.rpmnew$//' > /tmp/rpmnew
    for c in `cat /tmp/rpmnew`; do echo $c; diff $c $c.rpmnew && mv -f $c.rpmnew $c; done

  • Same for *.rpmorig:

    find / -name '*.rpmorig' | sed 's/\.rpmorig$//' > /tmp/rpmorig
    for c in `cat /tmp/rpmorig`; do echo $c; diff $c $c.rpmorig && mv -f $c.rpmorig $c; done
  • Inspect the remaining ones, and merge changes.
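
Put together, the whole dance boils down to something like the following; this is a rough, untested consolidation of the steps above (c5-media is the local CentOS DVD repo I happened to use, and -y just keeps the loop from stopping for confirmation):

    # loop the "find unowned files -> whatprovides -> install" steps over the directories that matter
    REPOS="--enablerepo=c5-media --disablerepo=base --disablerepo=updates --disablerepo=addons --disablerepo=extras"
    for dir in /etc /bin /sbin /usr/bin /usr/sbin /usr/lib /usr/include; do
        rpm -qf $dir/* | grep 'not owned' | cut -f2 -d' ' > /tmp/unowned
        yum $REPOS whatprovides `cat /tmp/unowned` | cut -f1 -d' ' | sort | uniq > /tmp/missing
        yum -y $REPOS install `cat /tmp/missing`
    done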

While it's not an experience I hope to repeat any time soon, it worked out surprisingly well!

Home test, and tftp bits

Filed under: sysadmin — Thomas @ 13:13

2009-06-19
13:13

After some situations at work this week where I lost time I really shouldn't have, combined with the observation that I get more useful strategic work done at home in Belgium, and because going to Barcelona next week would be silly in practice (I could only leave on Monday, and Wednesday is a day off for San Joan, the most dangerous night in Barcelona, which I loathe), I decided to stay home next week and compensate by fixing my phone setup.

You see, the only really annoying thing is that any conference call I end up in is terrible because I have a really hard time hearing the other side through either my mobile or my fixed phone, as the audio cuts out several times a second.

So, I spent a few hours yesterday first setting up the VPN, which aside from some minor issues seems to be working fine now. This was apparently a prerequisite for setting up asterisk, because asterisk needs a fixed IP address, or so I've been told.

After that, I started setting up Asterisk so that I could use the same THOMSON phone we have at work from home and call people in the office over it.

All of that is not what this post is about though.

This post is about the TFTP tricks and things I always need to re-learn any time I meddle with tftp. I'm putting them here because Google usually doesn't find the problems and solutions I come up with, so maybe they're of use to you if you play with TFTP. They will definitely be of use to me next time I mess with tftp.

  • TFTP runs over UDP on port 69
  • on Linux TFTP typically runs from xinetd. Do yourself a favour: edit /etc/xinetd.d/tftpboot and add -v -v -v to the server_args line. The resulting log lines end up in /var/log/messages (see the example stanza after this list)
  • For some reason xinetd is fidgety with tftp. It doesn't restart in.tftpd properly when you reload or restart xinetd, and so your verbose changes might not happen. Check with ps aux. You can kill it, but then xinetd doesn't seem to start up in.tftpd properly for a while either. Strange stuff - please tell me if you know what's going on here
  • Keep a tcpdump running on your tftp server to see whether requests actually make it in: tcpdump -l | grep tftp (the -l keeps tcpdump's output line-buffered so grep shows matches immediately)
  • Start by trying a tftp transfer on the server to localhost: tftp localhost -c get test. Ideally, you should get Error code 1: File not found back immediately.
  • Now try an nmap from another machine: nmap -sU -p 69 server which should come back with 69/udp open|filtered tftp. If it doesn't, you probably didn't open 69/UDP on your server's firewall. You can confirm by just turning off your firewall on the server for a quick test.
  • If it shows as open|filtered, try tftp server -c get test. This should error out immediately as well. If it doesn't, it's probably because your test machine does not allow the tftp reply in. Confirm simply by turning off your firewall. The simplest way to fix this is to load the tftp connection tracking module: modprobe nf_conntrack_tftp. This makes sure that your machine knows to accept the tftp reply coming back in on a random port. On Fedora/RedHat systems you can make this permanent by adding it to the IPTABLES_MODULES variable in /etc/sysconfig/iptables-config (see the example after this list). This is the number one thing I keep forgetting when debugging tftp troubles.
  • After that, try with actually existing files. Make sure you have the SELinux context correct; you can run restorecon -vR /tftpboot on the server for that. You can always confirm or deny whether SELinux is giving you trouble by temporarily turning it off. My auditd (the process that logs SELinux violations to /var/log/audit/audit.log) sometimes stops logging properly to the log file, and I need to restart it in that case. It's easy to spot when auditd is misbehaving, because by default it even logs calls like setenforce 0.
  • Be careful with symlinks in /tftpboot if you use them. From the host's point of view they should actually be broken, because the tftp server serves from /tftpboot and treats that as its root, as if it were chroot'd. So, if you have a file /tftpboot/phone/phone.inf and you want a symlink to that file to exist and work in /tftpboot, you actually need to create a broken symlink like this: ln -sf /phone/phone.inf /tftpboot so that the symlink will work for in.tftpd. This is one of those steps that I completely forget every time too.
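
For reference, here's roughly what the two config bits mentioned above look like on a stock Fedora/CentOS tftp-server install; treat it as a sketch, since exact file names, paths and defaults may differ on your setup.

    # /etc/xinetd.d/tftpboot (or tftp, depending on packaging) -- note the extra
    # -v flags on server_args for verbose logging to /var/log/messages
    service tftp
    {
            socket_type     = dgram
            protocol        = udp
            wait            = yes
            user            = root
            server          = /usr/sbin/in.tftpd
            server_args     = -s /tftpboot -v -v -v
            disable         = no
    }

    # /etc/sysconfig/iptables-config -- load the tftp connection tracking helper
    # at firewall start so replies from random ports get through
    IPTABLES_MODULES="nf_conntrack_tftp"

After editing the xinetd file, remember the restart quirk above: check with ps that in.tftpd actually picked up the new arguments.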

Well, that should be it for the next time I have tftp troubles!

.pid files after reboot

Filed under: sysadmin — Thomas @ 2009-05-04 16:30

Hey all you Unix heads and sysadminny/developer types,

should .pid files be cleaned up on a reboot, because the processes definitely went away? If so, which part of the system should take care of this? Each and every service script on its own somehow? If not, why?
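
(To make the "each and every service script on its own" option concrete, this is the kind of stale-pidfile guard I mean; a minimal sketch only, with a hypothetical daemon name, and not a recommendation either way.)

    # sketch: a service script could clear a stale pid file before launching
    PIDFILE=/var/run/mydaemon.pid    # hypothetical daemon
    if [ -f "$PIDFILE" ]; then
        PID=`cat "$PIDFILE"`
        if ! kill -0 "$PID" 2>/dev/null; then
            # the process is gone (for instance after a reboot), so the file is stale
            rm -f "$PIDFILE"
        fi
    fi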

Answers on a postcard or in the comments!
