
Present Perfect


Reverse Engineering

Filed under: General — Thomas @ 00:45

2005-08-30
00:45

I've been mentally building up a backlog of stuff to write about and the moment where I sit down and just lay it all out isn't happening for some reason. So I've decided to do some reverse engineering and work my way backwards instead. Starting with tonight.

Went to see The Island with part of the gang and our honorary employee of the week, the Welsh Frenchie. It was actually quite enjoyable, and while I'm sure lots of people will slag it for the obvious plot holes and the overblown action ... Well, I've never understood why some people can't judge a film in its own category, anyway. And hey - what a happy ending the movie leaves us with - only $5m between you and a clone of Scarlett Johansson! Finally something worth saving for.

Spent way too big a chunk of my weekend covering up for someone who shall remain nameless, who completely dropped the ball on his responsibilities after managing to not prepare at all for the commitment he made; he made up for it by going to the beach instead of actually trying to show a morsel of remorse and a whiff of something that would vaguely smell of duty on a good day.

The only redeeming feature of my weekend was the fact that I had visitors over again - Els and Wiebe passed by on their way back from Malaga. We went on a wild tapas hunt looking for Can Tomas, which is rumoured to have the best bravas in town. But two taxis and a closed door later, we let the cab driver decide where to take us. He took us to La Cerveceria Catalana, which he said was one of the best places in town. Entering was like coming home after a long trip - a feeling of having been there before but stuff is out of place. The menu had some stuff that I knew, the waiters looked sort-of-familiarly-dressed, and the decoration reminded me of something. After a few moments it hit me that this place looked an awful lot like our favourite tapas place, Ciudad Condal. It was spooky. The downside was that I didn't find out about a great new tapas place. The upside is that I had great food and we really do know one of the best tapas places around.

Jukebox

Boy, does it take long to close circles. Some five years ago I did a thesis project in my last year at university. The goal of it was to do real-time mixing of MP3 files on a Pentium 200MMX. I don't know if people remember the good old days, but a machine like that had a hard time already just decoding an MP3. I found a really nice way of doing mixing in the subband sample domain, bypassing the most expensive steps of the decode and encode processes, and was able to mix up to four tracks simultaneously in real-time, outputting to MPEG Layer 2. The jukebox driven by that ran for a very long time before it finally got retired.

Now, this was the only good thing about the program I wrote. The bad thing was that it was a gross hack evolved from the reference encoder/decoder code; that it only worked for MP3 files; that all files were required to be of the same sample rate (otherwise the subband sample mixing is wrong); and that as a project it wasn't very maintainable at all. Now this last point was back then still a bit unknown to me - I was just getting started in Free Software stuff. But regardless - I started looking for stuff that would allow me to do a nicer jukebox program for the radio station, and while googling (or was it still called altavistaing back then ?) I learnt about GStreamer. That was around 0.1.1 I think.

I read the docs, was impressed with the ideas (little did I know back then :)), started poking at it, found some things that didn't work, got involved more and more, all with the idea of doing my mixer as quickly as possible. But pretty soon all things I wanted to do with something like GStreamer were moved to the back of my mind and GStreamer became a goal on its own. Sure, I wrote a bunch of programs with it, and once in a while I took a small stab at some ideas for the mixer, but I never pushed through.

Until a few weeks ago, when we needed a decent radio demo to be online 24/7 and I just didn't want to put in another hack. So I wrote some stuff in Python, found some bugs in the current versions of some of the first GStreamer elements I wrote (why did people let me commit DSP stuff in the first place ?), and now I have something that does a remarkable job at mixing music of *any* type GStreamer can handle, automatically, picking decent mix points and a per-track volume level that makes the average loudness over songs constant.
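
To give a rough idea of what a per-track volume level can look like: the sketch below is not the jukebox code, just one simple way of doing it. It measures the RMS level of each track and derives a gain that brings it to a common target. The function names and the target of 0.1 are made up for the illustration, and RMS is only a crude stand-in for perceived loudness.

```python
import math

def rms(samples):
    """Root-mean-square level of a sequence of float samples in [-1.0, 1.0]."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def gain_for(samples, target_rms=0.1):
    """Linear gain that brings a track's RMS to target_rms (hypothetical target)."""
    level = rms(samples)
    if level == 0.0:
        return 1.0  # pure silence: nothing sensible to scale
    return target_rms / level

# Two toy "tracks": one quiet, one loud.
quiet = [0.05, -0.05] * 100
loud = [0.5, -0.5] * 100

# After applying its own gain, each track ends up at the same RMS level.
leveled_quiet = [s * gain_for(quiet) for s in quiet]
leveled_loud = [s * gain_for(loud) for s in loud]
```

A single figure per track like this says nothing about how the level changes inside the track, so it is easy to see how a song with very wide dynamics could end up badly leveled.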

It's definitely not perfect yet (Bohemian Rhapsody's dynamics trip it up for example), but it's a good first stab that I'll work on refining over the next few months. If you want to give it a listen and tell me what you think, the URL is http://stream.fluendo.com:8831 for a HQ Vorbis stream. (There's an MP3 stream at 8833 and a LQ Vorbis stream at 8832.)
You can also run it for yourself; get GStreamer core, plugins and Python bindings for the 0.8 branch from CVS, go into gst-python/gst/extend and run:

python jukebox.py playlist.m3u queue max-size-bytes=0 max-size-time=3000000000 ! { osssink }

The program will eat a *lot* of CPU while it's scanning all the files from your playlist for good mixing points, but once that's done the values are cached.

I'm making a much bigger deal out of having finally done this than it is, but it is weird how these things take on a life of their own and run away with you. It was nice to finally just get something done that I was planning to do when learning about GStreamer.

Memory hunting

The only problem I had was that the jukebox component I wrote for Flumotion was eating too much memory. After a week, it was consuming 50% of the 512MB memory on the machine it was running on. I spent some free time looking into what was going on, fixing various memleaks in the elements (I also finally learnt how to properly valgrind python applications), learning some more about pygtk's refcounting and cycle problems, and doing some more explicit cleanup in my python code. I got to the point where I was sure my program wasn't leaking in the sense that it was dropping references to blocks of memory.

But it was still leaking memory in the sense that memory use was slowly increasing over time. The problem is that that sort of thing is pretty hard to work on in Python. I started adding some debugging that made the program garbage-collect at specific points in the program so I could bring some determinism to the freeing patterns and cross-reference with the GStreamer object refcounting debugging. That allowed me to fix a bunch of elements that were kept around, but not used anymore.
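
For the curious, the idea of collecting at fixed points and comparing looks roughly like this. It is a minimal sketch using only Python's standard gc module, not the debugging code I actually added; the Track class and the two helper functions are hypothetical stand-ins.

```python
import gc
from collections import Counter

def live_object_counts():
    """Force a full collection, then count live gc-tracked objects by type name."""
    gc.collect()
    return Counter(type(o).__name__ for o in gc.get_objects())

def growth(before, after):
    """Types whose live count grew between two checkpoints."""
    return {name: after[name] - before[name]
            for name in after if after[name] > before[name]}

class Track:
    """Hypothetical stand-in for whatever gets created per song."""
    pass

kept = []                      # simulates references that are never dropped
before = live_object_counts()
for _ in range(3):
    kept.append(Track())       # one object survives each "song"
after = live_object_counts()

# Shows how many Track instances appeared between the two checkpoints.
print(growth(before, after).get("Track"))
```

Because the collection is forced right before each count, two checkpoints taken at the same point in the cycle (say, right after a song ends) are comparable, and anything that keeps growing is a candidate for being held onto by mistake.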

To get a better sense of how bad the problem was, I wanted to run the program outputting to a fakesink, basically making it mix as fast as it could. The only problem is that my home desktop heats up fairly quickly, and starts flooding the console with CPU thermal info around 70C, making the system unusable. This happens after about five minutes of 100% CPU use.

So, because I really wanted to run these tests over the weekend, I finally dove into the wonderful world of case modding. I took Wiebe to some of the computer stores and we bought a whole set of fans - plus a KVM switch that I had wanted for some time so I could stop diving under the table whenever the server machine had a problem. I took the opportunity to completely rework the engine room layout, separate the two machines (they were standing right next to each other), install the fans, work out airflows, and re-paste the CPUs. And now the machine is running at 40C idle, and no more than 54C at 100% CPU. With that, I could run the jukebox at 100% CPU, and at this point it's mixed about 12000 songs - which is over a month of output. The data size is around 44 MB, or a good 10% of my memory. So it's still increasing slowly, taking about 3.7K per song.

Not bad, but could be better. At this point I'm stuck. If anyone has ideas on how I could further figure out what could be causing this, let me know. I've tried massif, but both in 2.4.0 and 3.0.0 it failed on the first song. I've tried memprof, but I can't even get that to run anymore. My next hack will be something that gets loaded through LD_PRELOAD to track allocation and freeing, and has a Python hook so I can ask for the allocated memory at specific points (for example, each time a song has played completely) that are comparable to each other, to figure out what I'm not getting rid of each cycle. Not sure if it's possible to do a Python binding for a lib you load with LD_PRELOAD, but it's worth a shot...
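
The "ask for the allocated memory at comparable points" part can be sketched in pure Python with the standard tracemalloc module. This is an analogy, not the LD_PRELOAD hack described above: tracemalloc only sees allocations made through Python's own allocators, so memory allocated directly by C libraries would not show up. The checkpoint and top_growth helpers and the simulated leak are assumptions for the illustration.

```python
import tracemalloc

def checkpoint():
    """Snapshot of the Python-level allocations that are currently live."""
    return tracemalloc.take_snapshot()

def top_growth(old, new, limit=5):
    """Largest allocation increases between two snapshots, grouped by source line."""
    stats = new.compare_to(old, "lineno")
    return [s for s in stats if s.size_diff > 0][:limit]

tracemalloc.start()

retained = []                            # hypothetical leak: data kept after each "song"
first = checkpoint()
for song in range(3):
    retained.append(bytearray(100_000))  # pretend each song leaves 100 kB behind
second = checkpoint()

# Each entry names a source line and how much more memory it holds than before.
for stat in top_growth(first, second):
    print(stat)

tracemalloc.stop()
```

Taking the snapshots at the same point of every cycle is what makes the differences meaningful: whatever shows up with a positive size difference again and again is what isn't being released.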

Anyway, upwards and onwards. I can't imagine why I went into such detail on so little given that I was planning to flush out a whole backlog, but folks back home were getting worried about what I was up to...
