Present Perfect



Adventures in Maemo

Filed under: Hacking,maemo — Thomas @ 22:04


August is a great month for clearing the decks in Spain. Half the people at work are on holiday, and by the time they get back the other half goes away. It's a great month for making progress on all those things that've been lying around for months not getting done.

Same in my spare time. After lots of changes in my life in the last year, I've been settling down again, in a new apartment, in a new life, and I am finally in a place where I can do some hacking again. And it's a great feeling, to be back in a flow state and fixing problems.

Tonight I wanted to direct my hacking attention to my venerable N900 phone. First of all, I wanted to figure out why mushin, my couchdb-based GTD application, was not showing the right results for my shopping list. Turns out I had a broken svn snapshot running on the phone, and that was easy to fix. In the process I set up scratchbox again on my new laptop.

Then I directed my attention to erminig-ng, an application that syncs between the maemo calendar and Google calendar. It had stopped working a few months ago, giving a traceback during sync. I first tried reproducing the problem in scratchbox, but I couldn't even hit the bug after copying the sqlite databases for both erminig and calendar.

I ended up adding some well-placed prints and

import code; code.interact(local=locals())

statements on the phone's file system so I could inspect the objects provoking the crash. It turns out there was a simple call somewhere that overwrites an event's alarm time setting, but it does so with an int when the code in gdata expects a string.
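The shape of that kind of fix can be sketched roughly as below. This is not erminig's or gdata's actual code: `normalize_alarm_minutes`, `FakeEvent` and `set_alarm` are hypothetical names, and the only assumption is that some library wants a string where the calling code holds an int.

```python
def normalize_alarm_minutes(value):
    """Return the alarm offset as a string, for an API that expects strings.

    Hypothetical helper: accepts an int or a numeric string and leaves
    None alone, so that "no alarm" stays distinguishable.
    """
    if value is None:
        return None
    return str(int(value))


class FakeEvent(object):
    """Stand-in for a calendar event object; not a real gdata class."""

    def __init__(self):
        self.alarm_minutes = None


def set_alarm(event, minutes):
    # Coerce at the boundary, so later string handling never sees an int.
    event.alarm_minutes = normalize_alarm_minutes(minutes)
    return event
```

Coercing in one place, right where the value crosses into the library, also gives a test case something small to aim at.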

Now, to submit the fix, I have the same problem as last time I submitted a fix. erminig-ng is a fork of erminig. erminig has a website that has had no updates since 2008. The version number on that site is higher than the version number of erminig-ng, but erminig-ng is packaged as erminig, which is confusing for people. The bug tracker on maemo garage that the package for erminig-ng links to is only for erminig, and people get confused, so instead they're reporting all their problems in a long maemo talk conversation. It looks like the maintainer for erminig-ng is the same as for erminig, so I'm not sure why this confusion is there. It looks like it would be easy to clear up. The code is not in any public repository as far as I can tell, and people are fixing things here and there with patch files in the maemo talk thread giving instructions on how to patch files on the root file system. A mess, really. It should probably just go on github somewhere, but I want to hear from the maintainer first.

So, I slurped the latest 0.2.12 release source into my svn repository, forked it to a trunk directory, and started integrating my patch. All .py files are in the root. I had to fight the urge to clean it up properly and put everything nicely in separate directories, because that would make merging patches back harder. I added a test case for the bug I was hitting as well - there were no tests so far in the package.

After my bug was fixed, and the test worked, I rebuilt a package for my phone. That led me to updating my maemo repository, adding gpg signing to it, adding metadata info to it, and doing everything needed to make it easy to update and not trigger any errors on the phone when upgrading. Now it looks like a nice proper maemo repository (and even apt-file should work) - but if any debhead wants to give it a looksee to see if anything's wrong feel free!

So, now erminig is available from my repository, works on my phone, and is now again properly syncing my calendar information. I dropped a line in the forum thread for a person who was probably running into the same bug as me.

Next on my radar - looking at barriosquare to see if an update to the foursquare v2 API is doable.

I felt too guilty about not doing enough maemo stuff over the last year to try and get an N950. Now I regret not applying - it's going to be the last phone in the series (yes, I'm one of those who think Nokia is making a huge mistake), the focus of what's left of Maemo will be on that phone, and I won't be able to update the programs I use on Maemo, and my phone will slowly become obsolete. At which point I will probably have no choice but to switch to Android, the non-Linux Linux system. I don't know many people anymore who work for Nokia, but if any of them are listening and still want to help me get an N950, feel free to let me know.

And the kicker of my patchwork? It turns out that, when I fixed a bug in the previous version, I had already forked off 0.2.11 into my repository at a different location. And in that checkout, I also added a unit test, a similar-but-different hack for not being able to load hildon as a module, and a HACKING and TODO file with very similar notes to today's.

Hey, at least I'm consistent. And maybe I should throw this stuff on github after all, so I can use git's branch powers to track the upstream releases, and the work I did that did not get taken upstream...

Adventures in fingerprinting

Filed under: DAD,Fedora,GStreamer — Thomas @ 20:55


One of the key concepts in my rewrite of DAD is that it should be possible to relate the same track across different files and computers. I have copies of files, and different encodings of the same track, spread across machines. The various applications I use for playback seem to exist in isolation on each machine, and so I tend to rate only occasionally, knowing that my ratings aren't centralized. And I get annoyed when banshee detects three copies of an album, and then orders them by track number, playing each track three times before moving on to the next one.

The logical way to do this is through acoustic fingerprinting. These are algorithms that extract certain features from an audio file and calculate an algorithm-specific 'fingerprint' for it. Usually, these fingerprints are not identical across different encodings of the same file, so you can't look up twins in a list; but the fingerprints can be compared to each other and a 'difference' within a certain confidence interval calculated.
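As a rough sketch of what 'comparing' can mean: assume a fingerprint is a list of unsigned 32-bit integers and the two fingerprints are already aligned in time. Counting the differing bits then gives a difference between 0 and 1. The function names and the threshold are made up for illustration and do not come from any particular library.

```python
def bit_error_rate(fp_a, fp_b):
    """Fraction of differing bits over the overlapping part of two fingerprints.

    Assumes each fingerprint is a list of unsigned 32-bit integers that are
    already aligned in time; a real matcher would also search for the best
    offset between the two.
    """
    length = min(len(fp_a), len(fp_b))
    if length == 0:
        raise ValueError("need two non-empty fingerprints")
    differing = sum(bin((a ^ b) & 0xFFFFFFFF).count("1")
                    for a, b in zip(fp_a[:length], fp_b[:length]))
    return differing / (32.0 * length)


def probably_same_track(fp_a, fp_b, threshold=0.15):
    # The threshold is an illustrative guess, not a value from any library.
    return bit_error_rate(fp_a, fp_b) <= threshold
```

Two encodings of the same track should land near 0, and unrelated audio near 0.5, which is why a threshold rather than an equality test is needed.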

Most fingerprinting algorithms have a library that calculates a fingerprint and then submits it to a complementary web service, which can quickly compare it against known fingerprints to find twins.

In the past, either the client library/application or the web service (or both) was not open enough to be of interest for most Free Software people.

But recently, someone in the #morituri channel mentioned acoustid which only consists of open components. So, that seemed interesting enough to try out!

The chromaprint client-side library consists of a library, a sample application (linked against FFMPEG), and a python module with some sample scripts.

There is also a gst-chromaprint GStreamer plug-in on github. (As a side note, it's amazing to see that GStreamer plug-ins these days come for free! I recall the days when we had to do the work ourselves to write GStreamer plug-ins for libraries.)

So, after giving them a quick test run, I packaged up the whole set and it's now available for Fedora 14 and 15 in my package repositories.

The chromaprint-tools package contains fpcalc and you need to enable rpmfusion-nonfree to get its ffmpeg dependency.

And after that, I created a Task in DAD for chromaprint, and now I have:

$ dad analyze chromaprint /opt/davedina/audio/albums/Afghan\ Whigs\ -\ Gentlemen/Afghan\ Whigs\ -\ Debonair.ogg
** Message: pygobject_register_sinkfunc is deprecated (GstObject)
/opt/davedina/audio/albums/Afghan Whigs - Gentlemen/Afghan Whigs - Debonair.ogg:
Found 1 results
- Found 4 recordings.
- musicbrainz id: 62b2952a-4605-4793-8b79-9f9745ea5da5
- artist: The Afghan Whigs
- title: Debonair
- musicbrainz id: 8ff78e73-f8bd-4d78-b562-c3e939fb93fb
- artist: The Afghan Whigs
- title: Debonair
- musicbrainz id: a0d5ced6-43e8-450a-bf11-94f1f4520b92
- artist: The Afghan Whigs
- title: Debonair
- musicbrainz id: d01ac720-874c-48d6-95c6-a2cb66f9d5d0
- artist: The Afghan Whigs
- title: Debonair


Now it's time to dump that in the couchdb database backend, and start identifying duplicate tracks.
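A first pass at that duplicate detection could look something like the sketch below. It assumes each file has already been looked up and reduced to a list of recording ids. `find_duplicates` and the shape of its input are hypothetical, not DAD's actual code.

```python
from collections import defaultdict


def find_duplicates(lookups):
    """Group file paths that share at least one recording id.

    lookups maps a file path to the recording ids returned for it.
    Returns a sorted list of sorted path lists, one per group of two or
    more files. Groups are not merged transitively.
    """
    by_id = defaultdict(set)
    for path, recording_ids in lookups.items():
        for recording_id in recording_ids:
            by_id[recording_id].add(path)
    # Several ids can point at the same set of files; keep each set once.
    groups = set(frozenset(paths) for paths in by_id.values()
                 if len(paths) > 1)
    return sorted(sorted(group) for group in groups)
```

Because one file can come back with several recording ids, as in the output above, grouping by id alone would report the same pair of files more than once; collecting the groups in a set avoids that.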

Acoustid seems to be a relatively young project, but its maintainer is very active on the mailing list and it's filling a hole in the open world that I'm happy to see filled! Thank you Lukas.

Step 1

Filed under: GStreamer,Hacking — Thomas @ 13:24


[root@ana ~]# rpm -Uhv /home/thomas/rpm/RPMS/x86_64/gstreamer011-*
Preparing... ########################################### [100%]
1:gstreamer011 ########################################### [ 33%]
2:gstreamer011-devel ########################################### [ 67%]
3:gstreamer011-debuginfo ########################################### [100%]


Digital Audio Database

Filed under: DAD — Thomas @ 01:07


Over the past few years I've been quietly exploring ideas for my ideal music application. When we lived together in that great house in Gent, we had a set of PHP code that let us import music, rate it, and play it back. It worked for our purposes, but it was a collection of hacky PHP code and hacky Perl code.

Now I'm not saying I got that much better at coding, but I'm sure I improved a little bit. I've always put off actually writing the damn code to replace it, and hence I have a bunch of separate music collections - the music I was listening to in that house (properly rated, but very outdated), random collections of downloads, and now the collection of CDs I've bought since leaving that house, which never quite made it into my computer and are now being imported by the Lego robot.

Over a year ago, I re-implemented the mixing backend on top of GNonLin, which for the most part works as long as I don't actually dereference tracks played - something to figure out at some point. I have ideas about a pure web-based mixing backend as well, but I need to learn modern stuff like jQuery first.

But the missing key really was something that handles the database part well enough, because my application should work distributed - it should manage my tracks on all my devices, including all my computers, and be able to figure out that some crappy mp3 of a song on my laptop is the same song as the flac version at home on my NAS. So if I rate that crappy mp3 on my laptop, I want that taken into account when my home machine creates a mix.

And for me, CouchDB promised to fill that niche. Except of course that I spent the last year figuring out how I can marry CouchDB's approach to replication with my natural desire to normalize. It turns out that's possible with CouchDB, but it involves doing a lot of client-side caching (and invalidating/changing on change notifications) and is already pretty slow when I do it for my 14000 test tracks.

So, I've decided to experiment in a world where normalization is not needed, and I'm just going to pick one central concept (The 'track'), store as much related data into that document as possible (on each computer, the fragments of audio files that represent that track; its ratings; what album it's on; which artists made the track), treat some of those values as caches for the last known value from parent documents, and just go for speed first and see how that goes.
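To make that concrete, a track document along those lines might be built like this. Every field name here is made up for illustration; the point is only that fragments, ratings, album and artists all live inside the one document instead of being joined in from elsewhere.

```python
def make_track_doc(title, artists, album):
    """Build one self-contained 'track' document (field names are made up)."""
    return {
        "type": "track",
        "title": title,
        "artists": list(artists),  # cached copies, not references to artist docs
        "album": album,            # cached last known album title
        "fragments": [],           # where the audio for this track lives
        "ratings": [],             # one entry per rating event
    }


def add_fragment(doc, host, path, start, end):
    """Record that seconds start..end of a file on a host hold this track."""
    doc["fragments"].append(
        {"host": host, "path": path, "start": start, "end": end})


def add_rating(doc, user, score):
    """Append a rating; the 0..1 score scale is an assumption."""
    doc["ratings"].append({"user": user, "score": score})


def mean_score(doc):
    """Average of all ratings, or None if the track is unrated."""
    scores = [rating["score"] for rating in doc["ratings"]]
    return sum(scores) / float(len(scores)) if scores else None
```

With this shape, rating the mp3 on the laptop and the flac on the NAS both append to the same document, so a mix can be built from a single read per track.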

Yes, I am going to relax about not having everything perfect on the inside, so I can move on and write some more code that I can actually use.

I really enjoyed trying to shoehorn CouchDB into my relational worldview, but I want to see what life is like on the other side.

Before, I was also very focused on migrating my old data (from the music I had when I was in the house in Gent) and its ratings. That's still important to me, but I think right now I'd enjoy more having something that lets me listen to and rate new music. When I originally wrote DAD I didn't expect to be getting so much music that wasn't from CDs. That's obviously not the case anymore, and I'm probably one of the last maniacs still buying CDs and worrying about getting them sample-perfect onto my NAS. In today's reality I need to deal with having the same track fifteen times, in various qualities, and I wish my computer handled that for me.

As part of this shift in approach, both in how I use CouchDB and what music I now want to listen to, I'm going to build the code from the opposite side to the one I've been starting from, focusing on smaller building blocks and getting the experience right. Step one will be collecting the right data about audio files, splitting them into individual fragments, and loading music in two passes into the databases. I'll focus on having small tools that show that the application can add tracks quickly and start playing them, filling in the more costly information later, and show that the GUI frontend can update these in realtime in the database view.

And, as usual, I like to shoehorn in a use for my python command class, so I'll be using that as a collection point for these little tools as I work my way up.

After plugging in the right plumbing, in twenty minutes I had this on top of my old code:

$ dad analyze level /mnt/nas/media/davedina/audio/albums/Nirvana\ -\ In\ Utero/Nirvana\ -\ All\ Apologies.ogg
** Message: pygobject_register_sinkfunc is deprecated (GstObject)
Successfully analyzed file /mnt/nas/media/davedina/audio/albums/Nirvana - In Utero/Nirvana - All Apologies.ogg.
2 fragment(s)
- fragment 0: 0:00:00.000000000 - 0:03:50.230204081
  - peak 0.240 dB (105.672 %)
  - rms -14.199868248342282 dB
  - peak rms -8.913940439528652 dB
  - 95 percentile rms -12.001385041642244 dB
  - weighted rms -14.202287606952533 dB
  - weighted from 0:00:01.205986394 to 0:03:39.612879818
- fragment 1: 0:23:59.107482993 - 0:31:32.227482993
  - peak 0.526 dB (112.876 %)
  - rms -14.742109190444983 dB
  - peak rms -8.729096757819718 dB
  - 95 percentile rms -11.56951163744373 dB
  - weighted rms -14.742603253857133 dB
  - weighted from 0:23:59.223582765 to 0:31:18.498684807

In case you were wondering, this shows the code correctly determining that the 'All Apologies' track on the In Utero CD contains in fact two songs. It always annoys the hell out of me when any of the music players I use doesn't play anything for 20 minutes just because Kurdt thought that would be amusing all those years ago.
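The idea behind that split can be sketched in a few lines, assuming the file has already been reduced to one RMS level in dB per fixed-size block. The function name, the -60 dB silence threshold and the 10 second minimum gap are illustrative guesses, not what the analyze task actually uses.

```python
def find_fragments(levels_db, block_seconds=1.0, silence_db=-60.0,
                   min_gap_seconds=10.0):
    """Return (start, end) times in seconds of the non-silent stretches.

    levels_db holds one RMS level in dB per block of block_seconds.
    Loud stretches separated by less than min_gap_seconds of silence are
    merged, so a quiet passage does not split a song in two.
    """
    fragments = []
    start = None
    last_loud = None
    for index, level in enumerate(levels_db):
        if level <= silence_db:
            continue
        t = index * block_seconds
        if start is None:
            start = t
        elif t - (last_loud + block_seconds) >= min_gap_seconds:
            # Long enough silence: close the previous fragment, open a new one.
            fragments.append((start, last_loud + block_seconds))
            start = t
        last_loud = t
    if start is not None:
        fragments.append((start, last_loud + block_seconds))
    return fragments
```

A player that knows about fragments can then skip straight from the end of one to the start of the next instead of sitting through the silence.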

(In case you were really astute, you may have noticed that this code claims that the peak of these fragments is over unity, which would be weird and wrong you would think. Monty could give you a long and interesting explanation on how that is in fact natural and every time I read it I still don't get it, even with my audio engineering background, and I still don't know if this apparent peak level is a bad thing, but in practice my playback code auto-levels anyway and consistently reduces volume on tracks, so I don't think it matters anyway...)
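For what it's worth, the percentages printed next to the peak values are consistent with treating the dB figure as a power ratio, 10^(dB/10), rather than an amplitude ratio, 10^(dB/20). That is inferred from the numbers in the output, not from the code that produced them, and `db_to_percent` is a made-up name.

```python
def db_to_percent(db):
    """Convert a level in dB to a percentage of unity, as a power ratio."""
    return 100.0 * 10 ** (db / 10.0)
```

Feeding in 0.526 dB gives roughly 112.88, in line with the 112.876 % shown for the second fragment. The first fragment's 0.240 dB is a rounded figure, so it only reproduces the printed 105.672 % to within about 0.01.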

Removing objects from running Python processes using GDB

Filed under: Flumotion,Python — Thomas @ 11:47


This week at work we ran into a problem where one of our Python processes was consuming close to 3 GB of memory because it was not properly cleaning up a list. Because of other bugs this process could not be easily restarted without triggering other problems, so our core team asked for some suggestions and I told them "Why don't you try cleaning up the Python list using GDB and the Python C API?" I had a vague recollection of someone on our team doing something like this a few years ago.

I also asked them to blog about it, because there aren't that many resources readily findable on the subject.

So here is Andoni's take on the problem.

If any Pythonista can suggest how he could have avoided the segfault during garbage collection, please let us know!