thomas.apestaart.org » 2009

This weekend’s yak shave

Filed under: Hacking,Python — Thomas @ 13:17

2009-06-27

The yak shave started yesterday evening. The yak stack is actually a forked one this time, both of the forks involving pychecker.

I might not remember everything in order, but in a nutshell the stack is something like this:

The original goal for this weekend, set at the beginning of the week, was to release my CD ripper, morituri.
Something in its pychecker run did not work with pychecker 0.8.17 (the version still in Fedora, from 2006), and worked with pychecker 0.8.18 (my own build). Fork point 1.
While releasing moap this week, I realized that Freshmeat changed their remote API, which I should fix before I do another release of anything. Fork point 2.

Fork point 1 continues here:

I mailed Fedora's pychecker maintainer, offering my help, and sending a patch to update to 0.8.18, which I built and installed locally. He mailed back informing me about this pychecker bug with anaconda which was blocking the upgrade. Looking at that bug, it looked suspiciously like a bug triggered by code I added to pychecker last year.
However, I'd like to confirm that in the easiest way. git's got this great feature, git bisect, and wouldn't it be nice if I could do that on pychecker now ? Hey, why not add bisection to moap ?
CVS is actually not a very manageable VCS if you want to do fancy stuff. It cost me a few hours to figure out how I should get a more-or-less usable date from a CVS checkout. The final solution is similar to the reply to my stackoverflow question.
moap vcs bisect run is now implemented and finds the commit that broke anaconda's pychecking. STACK POINTER IS CURRENTLY HERE.
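For readers who haven't used bisection: the idea can be sketched in a few lines of Python. This is only an illustration of the technique, not moap's actual code; the function name `first_bad_revision` and the `is_good` callback are made up for the example, and it assumes the revisions (for CVS, probably checkout dates) are ordered oldest to newest, with the first one known good and the last one known bad.

```python
def first_bad_revision(revisions, is_good):
    """Return the first revision for which is_good() fails.

    Hypothetical helper for illustration. Assumes revisions[0] is good,
    revisions[-1] is bad, and that once a revision is bad all later
    ones are bad too.
    """
    if len(revisions) < 2:
        raise ValueError("need at least one good and one bad revision")
    lo, hi = 0, len(revisions) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(revisions[mid]):
            lo = mid  # breakage happened after mid
        else:
            hi = mid  # breakage happened at or before mid
    return revisions[hi]
```

In a real run, `is_good` would check out the given revision and run the failing command, which is the expensive part. Halving the range each time keeps the number of checkouts small: twelve candidate revisions need at most four test runs.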

Fork point 2 continues here:

moap's make check didn't work because pychecker complained about the following code:
    def func():
        d = {'a': 1, 'b': 2}
        print d.keys()

which triggers, in python 2.6, the following warning:
Object (d) has no attribute (keys)
I can't let myself change code in moap without a working make check, so on to figuring out what's wrong in pychecker
After lots of debugging and print statements, I figure out that pychecker dispatches Python opcodes, and it silently drops the ones it doesn't know about. Python 2.6 added a new opcode, STORE_MAP, and so pychecker doesn't properly handle the stack since it ignores the opcode. I should error out on those opcodes. STACK POINTER IS CURRENTLY HERE
Before I can fix that though, I decide I should make pychecker's test suite error out on unknown opcodes.
Of course, this will error out differently for different python versions, so I need different python versions on this machine.
I could do that by hand, but I'd also like Twisted's trial to run the testsuite, which also needs zope-interface in recent versions, which needs setuptools. So hey, why not set up jhbuild stuff to build all these python versions ?
Python 2.3 on my 64-bit machine doesn't work with setuptools, so the newest Twisted without it is 1.3.0, and it takes a while to figure out why trial doesn't run my testcases (it ends up being because 1.3.0 trial expects subclasses of twisted.trial.unittest.TestCase, not unittest.TestCase).
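The "error out on unknown opcodes" idea from fork 2 can be sketched with the standard `dis` module. This is a toy in modern Python 3, not pychecker's real dispatcher: `dispatch`, `UnknownOpcodeError` and the `handlers` table are names invented for the example, and the handlers do nothing useful.

```python
import dis


class UnknownOpcodeError(Exception):
    """Raised when the dispatcher meets an opcode it has no handler for."""


def dispatch(func, handlers):
    """Walk func's bytecode and call the handler registered per opcode name.

    Hypothetical helper: instead of silently skipping an opcode without
    a handler, it raises, so a new opcode in a new Python version shows
    up as a loud failure rather than a corrupted stack model.
    """
    seen = []
    for instr in dis.get_instructions(func):
        handler = handlers.get(instr.opname)
        if handler is None:
            raise UnknownOpcodeError(
                "no handler for opcode %s at offset %d"
                % (instr.opname, instr.offset))
        handler(instr)
        seen.append(instr.opname)
    return seen


def sample():
    # Same shape as the snippet pychecker complained about.
    d = {'a': 1, 'b': 2}
    return d.keys()
```

The exact opcode names differ between Python versions, which is precisely why a test suite built on this needs to run under each interpreter it claims to support.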

I'll blog about the useful products of my yak shave separately, for those who don't enjoy descriptions of yak shavings, only outcomes.

In general, I actually enjoy yak shaves. They're massive treasure hunts, you learn a lot, and you end up fixing a nice bunch of things all over the stack if you persevere. But it's probably more a mentality thing than anything else, and I really only indulge myself in these in my spare time.


moap 0.2.7 released

Filed under: Hacking,moap,Python,Releases — Thomas @ 21:41

2009-06-24

moap is a swiss army knife for maintainers and developers.

This is MOAP 0.2.7, "MMM...".

Coverage in 0.2.7: 1424 / 1899 (74 %), 109 python tests, 2 bash tests

Features added since 0.2.6:
- Added moap vcs backup, a command to back up a checkout to a tarball that
can be used later to reconstruct the checkout. Implemented for svn.
- Fixes for git-svn, git, svn and darcs.
- Fixes for Python 2.3 and Python 2.6

I've been fixing things left and right for python 2.6, and in the process I noticed that moap hasn't had a release for over a year. This release contains mostly bug fixes collected over the year, and a new feature that isn't implemented yet for all VCS's. Basically it's an automatic replacement for something I was doing manually every time I removed an old GNOME cvs/svn/git checkout: figure out what's in that tree that's not in the repository (diffs, unversioned files, ...), so I can delete everything else and free some disk space.

The only problem with this release is that, after doing the release, I noticed that Freshmeat removed their XML-RPC interface. Apparently they have some new kind of interface they want people to use. Sigh. But that means 0.2.8 is right around the corner!


Recovering from a lost /var on Fedora/Red Hat/CentOS

Filed under: Fedora,sysadmin — Thomas @ 20:28


Last week, after upgrading my home desktop to F11, palimpsest told me one of its disks was broken. The desktop is running on two 250 GB drives in software RAID. It was time to get new drives.

After a weekend of fiddling with new 1 TB disks for my home desktop, trying failure scenarios, making sure the system can boot from each of the two drives, and waiting for the 4 hour resync of the software RAID in between each step, I finally closed up the desktop machine and cleaned up under my desk again, thinking I was done with my half-yearly messing about with broken disks.

I guess I was tempting fate anyway. Doing a routine operation on my home server after all the configuration stuff I'd done to set up asterisk last week, suddenly an rsync aborted, a journal errored out, a partition changed to being mounted read-only, and the log was full of scary drive errors. Ouch.

Well, that's why I keep around a big box of old drives - for when some drive fails and I want to tempt fate even more by reusing an old drive that's probably going to fail real soon too. And anyway, I had just spent my hard drive piggybank on the new desktop drives.

Luckily, I seemed to have a 400 GB SATA drive lying around that used to belong to my media center. I don't remember why I swapped it out, given that the media center has a 160 GB drive for the OS (and two 1.5 TB RAID drives for the data, of course), but this was a lucky break. I booted with a rescue CD, and tried copying the root filesystem of my CentOS 5.2 home server partition to this new drive. Which worked fine, except that /var was where I triggered an Input/Output error and some more drive errors in the kernel log.

So, powered off, took out the broken drive, and put it in a USB chassis. The advantage of a USB chassis is that you can easily just replug the drive to try again, instead of locking up your system terribly and having to reboot. Sadly, /var was broken beyond repair. I ran an e2fsck hoping to recover the contents, and that partly worked, but some of the important stuff is missing even from lost+found (apart from the annoying situation where you have to reconstruct file names, which I usually end up not bothering with).

But really, how important can /var be ? Turns out, rather important. As in, you need it to boot in the first place. And also, it holds your rpm database. Crap.

Some Googling gave me some posts on how to reconstruct your rpm database from log files (using --justdb --noscripts --notriggers). But to use those, you actually need those log files. Where are those ? On /var as well. Crap. And they're not in lost+found either.

Ok, so time to get creative. Here's what I ended up doing:

create /var/lib/rpm, and run rpm --rebuilddb to end up with an empty rpm database
Based on the contents of a directory such as /usr/bin (the commands below use /etc), figure out what packages ought to be installed:
    rpm -qf /etc/* | grep 'not owned' | cut -f2 -d' ' > /tmp/unowned
    yum --enablerepo=c5-media --disablerepo=base --disablerepo=updates --disablerepo=addons --disablerepo=extras whatprovides `cat /tmp/unowned` | cut -f1 -d' ' | sort | uniq > /tmp/missing
    yum --enablerepo=c5-media --disablerepo=base --disablerepo=updates --disablerepo=addons --disablerepo=extras install `cat /tmp/missing`

This works by first listing all files that are not owned by rpm (on the first run, that's all of them), then figuring out what packages can provide these files, and finally installing those packages.
Repeat the process for other important directories, like /bin, /sbin, /usr/sbin, /usr/lib, /usr/include, ...
Clean up .rpmnew files that don't actually contain differences:
    find / -name '*.rpmnew' | sed 's/\.rpmnew$//' > /tmp/rpmnew
    for c in `cat /tmp/rpmnew`; do echo $c; diff $c $c.rpmnew && mv -f $c.rpmnew $c; done
Same for *.rpmorig:
    find / -name '*.rpmorig' | sed 's/\.rpmorig$//' > /tmp/rpmorig
    for c in `cat /tmp/rpmorig`; do echo $c; diff $c $c.rpmorig && mv -f $c.rpmorig $c; done
Inspect the remaining ones, and merge changes.
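The two mechanical parts of the recipe above (picking unowned paths out of the `rpm -qf` output, and folding identical .rpmnew/.rpmorig files back in) can also be sketched in Python. This is a rough equivalent of the shell one-liners, not a tested recovery tool; the function names `unowned_paths` and `merge_identical` are invented for the example, and it assumes `rpm -qf` reports unowned files as "file PATH is not owned by any package".

```python
import filecmp
import os


def unowned_paths(rpm_qf_lines):
    """Extract paths from `rpm -qf` output lines that report no owner."""
    paths = []
    for line in rpm_qf_lines:
        if 'not owned' in line:
            # Second space-separated field, like cut -f2 -d' '.
            # Like the shell version, this breaks on paths with spaces.
            paths.append(line.split(' ')[1])
    return paths


def merge_identical(root, suffix='.rpmnew'):
    """Move each <file><suffix> over <file> when the two are byte-identical.

    Returns the sorted list of files whose copies differ (or whose
    original is missing) and therefore still need a manual look.
    """
    differing = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(suffix):
                continue
            candidate = os.path.join(dirpath, name)
            original = candidate[:-len(suffix)]
            if os.path.isfile(original) and filecmp.cmp(
                    original, candidate, shallow=False):
                os.replace(candidate, original)  # same effect as mv -f
            else:
                differing.append(original)
    return sorted(differing)
```

Calling `merge_identical('/etc')` and then `merge_identical('/etc', '.rpmorig')` would leave only the files that really need merging by hand. Try it on a copy of the tree first.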

While it's not an experience I hope to repeat any time soon, it worked out surprisingly well!


Upgrading to F11

Filed under: Fedora,Hacking — Thomas @ 15:44


I managed to completely skip updating to F10. All my machines (work desktop, home desktop, laptop, media center) were running F9 without any real problems I worried about.

But of course I was curious. And, especially with the move to python 2.6, things I care about were bound to break.

So, last weekend I took the plunge, and after a little over a week here are my first impressions:

Overall F11 looks slick. Nice work on the artwork! I particularly liked the GDM background, looking like an ancient brushed metal object, reminding me of how I used to love playing Gods by the Bitmap Brothers.
Apparently anaconda now has bugzilla integration, allowing you to file a bug directly from inside anaconda. Luckily for Jeremy (who I assume still maintains it) it has some code in there to look for existing bug entries with the same backtrace. Very nice!
Of course, I wouldn't have found out if I hadn't run into exceptions in anaconda. I ran into them while setting up two completely new hard disks with two software RAID partitions and LVM on the second one.
I first installed my work desktop, as usual putting the new installation on a separate partition, keeping my old one around in case the install goes wrong or F11 just isn't stable enough for me. For me, that involves having a /mnt/alpha and a /mnt/omega partition between which I alternate. At some point I should figure out if other people do this too and if it makes sense for anaconda to support something like this and at least allow me to keep my GRUB configuration for the older installation. For now I do this manually, using a huge upgrade text file I follow each time I upgrade, accumulating more and more steps each time I perform the procedure.
On my home machine, when I booted into F11, as usual my second monitor didn't work (I have a Radeon GeCube Pro 2400). My own fault really - I should have tried to get a patch upstream into the default radeon driver the same way I sent a patch for the radeonhd driver that I still use. A rebuild later, I at least had the old radeonhd driver rebuilt to get my second screen working again.
Having the second screen now made me change my mind completely about the GDM wallpaper. That lion on the right hand side that I didn't see before completely ruins the style for me. Sorry!
Upon logging in to the work desktop, I had no network. Completely puzzled as to why, until I figured out that I had to actually right-click on NetworkManager's tray icon, choose to configure, and activate eth0 by default. After some browsing it seems that this was a deliberate choice to increase security. While I can possibly sympathize with the motivation for doing so, it really is terrible to change this by default and not provide *any* indication during or after installation. At the very least, the following things could have been done:
1. provide a clear notice during installation, and allow a user to choose to enable it anyway, assuming the security risk
2. the same, but during firstboot
3. after logging in, having the network manager tooltip say 'the network is disabled by default in this new release, here's how you enable it'
I am not entirely sure what the security problems are with enabling the network after installation. The default firewall is pretty locked down, SELinux is enabled by default, and there's no way I can install updates without the network anyway. But I'm sure that I could find huge bikeshedding threads on fedora-devel about this if I really cared why this was decided.
Upon logging in to the home desktop, I was greeted with a tooltip saying that one of my drives was going bad. That was a nice touch! Really good idea to have something like that be monitored by default. This prompted me to consider finally replacing my desktop's 250 GB PATA drives with real SATA drives - a story for another post.
Various deprecation warnings pop up running various Python programs, including my own. Flumotion needed a patch for running against 2.6 (I rebuilt and pushed to F11). So I have some cleanup ahead, and I should revisit pychecker soon.
The first piece of functionality I checked was Evolution's Google Calendar integration. It still seems a bit shaky, given that I had to restart Evolution a few times as it froze doing stuff with the net, but it does seem to work. That means I will finally be able to accept work invitations done through Outlook and get them on my Google Calendar! Awesome. Now if only I didn't have to manually configure each of the ten calendars I'm interested in...
At work, when I played a video using XVideo, my machine instantly froze. Seems to be a known bug. The intel drivers are being rewritten. I've never quite understood why rewriting is an excuse for breaking stuff that worked (I should check if Firewire video finally works reliably now when I have the chance, for example), but all in the name of progress I guess.
I don't know why it's happening, but once in a while my screens blank. Even in the middle of doing stuff. If I were a gamer I'd be hugely annoyed as my character would be shot through the head in that split instant. The closest bug I can find is this one, where I commented. Hugely annoying bug because I don't even know how to begin debugging a bug like this that I can't catch in the act.
PulseAudio integration in GDM seems a bit fragile. I have my pulseaudio configured to send audio to my media center pulseaudio server. Sometimes, after choosing a username in GDM, it doesn't manage to play the audio sample related to that action, and then GDM is stuck there not showing me the password entry dialog. Pretty sure it's due to blocking on pulseaudio, because when I kill it the password dialog appears. Pretty painful bug for new users though.

All in all, not a bad first week experience, and seems like a solid release. Now, off to rebuild bits and pieces, and clean up Python 2.6 deprecation warnings...


Home test, and tftp bits

Filed under: sysadmin — Thomas @ 13:13

2009-06-19

This week I lost time at work in situations where I really shouldn't have had to, and I've observed that I get more useful strategic work done at home in Belgium. Going to Barcelona next week would also be silly in practice, given that I can only leave on Monday and Wednesday is a day off (which I loathe - Sant Joan, the most dangerous night in Barcelona). So I decided to stay home next week and compensate by fixing my phone setup.

You see, the only really annoying thing is that any conference call I end up in is terrible because I have a really hard time hearing the other side through either my mobile or my fixed phone, as the audio cuts out several times a second.

So, I spent a few hours yesterday first setting up the VPN, which aside from some minor issues seems to be working fine now. This was apparently a prerequisite for setting up asterisk, because asterisk needs a fixed IP address or something, I've been told.

After that, I started setting up Asterisk so that I could use the same THOMSON phone we have at work from home and call people in the office over it.

All of that is not what this post is about though.

This post is about the TFTP tricks and things I always need to re-learn any time I meddle with tftp. I'm putting them here because Google usually doesn't find the problems and solutions I come up with, so maybe they're of use to you if you play with TFTP. They will definitely be of use to me next time I mess with tftp.

TFTP runs over UDP on port 69
On Linux TFTP typically runs from xinetd. Do yourself a favour, edit /etc/xinetd.d/tftpboot and add -v -v -v to the server_args line. The resulting verbose log lines should end up in your /var/log/messages.
For some reason xinetd is fidgety with tftp. It doesn't restart in.tftpd properly when you reload or restart xinetd, and so your verbose changes might not happen. Check with ps aux. You can kill it, but then xinetd doesn't seem to start up in.tftpd properly for a while either. Strange stuff - please tell me if you know what's going on here
Keep a tcpdump running on your tftp server to see requests actually make it in: tcpdump | grep tftp
Start by trying a tftp transfer on the server to localhost: tftp localhost -c get test. Ideally, you should get Error code 1: File not found back immediately.
Now try an nmap from another machine: nmap -sU -p 69 server which should come back with 69/udp open|filtered tftp. If it doesn't, you probably didn't open 69/UDP on your server's firewall. You can confirm by just turning off your firewall on the server for a quick test.
If it shows as open|filtered, try tftp server -c get test. This should error out immediately as well. If it doesn't, it's probably because your test machine does not allow tftp in. Confirm simply by turning off your firewall. The simplest way to fix this is to load the tftp connection tracking module: modprobe nf_conntrack_tftp. This makes sure that your machine knows to accept the tftp reply coming in from a random port. On Fedora/RedHat systems you can make this permanent by adding it in /etc/sysconfig/iptables-config to the IPTABLES_MODULES variable. This is the number one thing I keep forgetting when debugging tftp troubles.
After that, try with actually existing files. Make sure you have the SELinux context correct; you can run restorecon -vR /tftpboot on the server for that. You can always confirm or deny whether SELinux is giving you trouble by temporarily turning it off. My auditd (the process that logs SELinux violations to /var/log/audit/audit.log) sometimes stops logging properly to the log file, and I need to restart it in that case. It's easy to spot when auditd is misbehaving because by default it even logs replies to calls like setenforce 0.
Be careful with symlinks in /tftpboot if you use them. On your system they should actually be broken, because the tftp server will serve from /tftpboot and treat that as its root, as if it were chroot'd. So, if you have a file /tftpboot/phone/phone.inf, and you want a symlink to that file to exist and work in /tftpboot, you actually need to create a broken symlink like this: ln -sf /phone/phone.inf /tftpboot so that the symlink will work for in.tftpd. This is one of those steps that I completely forget every time too.
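If no tftp client is at hand, the "get a file that doesn't exist and expect an immediate error" check from the list above can be sketched in Python. The packet layout follows RFC 1350 (read request is opcode 1, error is opcode 5, error code 1 means file not found); the function names `build_rrq`, `parse_error` and `probe` are invented for this example, and `probe` has not been run against a real server here.

```python
import socket
import struct

OP_RRQ = 1    # read request
OP_ERROR = 5  # error reply


def build_rrq(filename, mode='octet'):
    """Build a TFTP read request: opcode, filename, NUL, mode, NUL."""
    return (struct.pack('!H', OP_RRQ)
            + filename.encode('ascii') + b'\0'
            + mode.encode('ascii') + b'\0')


def parse_error(packet):
    """Return (code, message) for an ERROR packet, or None otherwise."""
    if len(packet) < 4:
        return None
    opcode, code = struct.unpack('!HH', packet[:4])
    if opcode != OP_ERROR:
        return None
    message = packet[4:].split(b'\0', 1)[0].decode('ascii', 'replace')
    return code, message


def probe(host, filename='test', timeout=3.0):
    """Ask host for filename; return the parsed error, or None.

    Raises socket.timeout when nothing comes back, which is the
    symptom of a firewall dropping the request or the reply.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.settimeout(timeout)
        sock.sendto(build_rrq(filename), (host, 69))
        # The server answers from a fresh port, not from port 69.
        packet, _addr = sock.recvfrom(516)
        return parse_error(packet)
    finally:
        sock.close()
```

The comment in `probe` is the reason the conntrack module matters: the reply arrives from a port the client never sent to, so a stateful firewall without the tftp helper treats it as unrelated traffic.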

Well, that should be it for the next time I have tftp troubles!
