thomas.apestaart.org

validate me

Filed under: Hacking,Python — Thomas @ 23:08

2009-04-21
23:08

I'm writing some code that does something that seems straightforward but I'm trying to cover all my bases.

The code in question should perform a bunch of file/directory renames, and update some files that contain references to those changed filenames, such that the complete set stays consistent. (As a purely theoretical example, think of a directory of sound files and the .m3u/.cue/.log files in it.)

I used to have something like this in DAD, so that you could fix the spelling of tracks after ripping. Except that it was an ugly Perl script and just gave up if anything went wrong in the middle (permissions, bad naming, machine crashing, ...)

This time around I want to take this conceptually simple but by design non-atomic operation, and turn it into a set of resumable operations, such that if for whatever reason one of the atomic operations fails, or the machine crashes in the middle, there is enough information to resume the complete operation.

My current approach goes something like this:
- each Operation (renaming a file, replacing references in a file, ...) is an object
- the object has serialize/deserialize methods
- an Operator object gets loaded with a list of these Operations
- each Operator maintains the state of all tasks to be done, and which tasks are done already
- the Operator starts by serializing the todo list to disk in a given state directory
- it goes through each operation, performing the operation (which should end in an atomic step, for example moving the temporary file over the original file in the case of replacing references in a file)
- after performing the operation, the operator serializes done to the state directory.
- the Operator continues performing tasks and saving done to the state directory.
- at the end of all operations, the Operator first removes the todo state, then the done state.
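
A minimal sketch of the steps above, not the actual code. The names `Rename`, `write_json_atomically`, `todo.json` and `done.json` are assumptions made for illustration, and the done state is reduced to a simple counter:

```python
import json
import os
import tempfile


class Rename:
    """Hypothetical operation: rename one file; os.replace is the atomic final step."""

    def __init__(self, src, dst):
        self.src, self.dst = src, dst

    def serialize(self):
        return {"type": "rename", "src": self.src, "dst": self.dst}

    @classmethod
    def deserialize(cls, data):
        return cls(data["src"], data["dst"])

    def perform(self):
        os.replace(self.src, self.dst)


def write_json_atomically(path, payload):
    """Write to a temporary file, then move it over the target in one step."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as handle:
        json.dump(payload, handle)
        handle.flush()
        os.fsync(handle.fileno())
    os.replace(tmp, path)


class Operator:
    """Runs operations in order and journals progress in a state directory."""

    def __init__(self, state_dir, operations):
        # File names are assumptions; any stable names in the state dir would do.
        self.todo_path = os.path.join(state_dir, "todo.json")
        self.done_path = os.path.join(state_dir, "done.json")
        self.operations = list(operations)
        self.done = 0  # number of operations known to be complete

    def run(self):
        # Serialize the todo list first, so a crash always leaves a plan behind.
        write_json_atomically(
            self.todo_path, [op.serialize() for op in self.operations])
        write_json_atomically(self.done_path, self.done)
        for op in self.operations[self.done:]:
            op.perform()
            self.done += 1
            write_json_atomically(self.done_path, self.done)
        # Remove the todo state first, then the done state.
        os.remove(self.todo_path)
        os.remove(self.done_path)


def demo():
    """Rename one file in a scratch directory and report what is left behind."""
    with tempfile.TemporaryDirectory() as work:
        state = os.path.join(work, "state")
        os.mkdir(state)
        old = os.path.join(work, "trakc.flac")
        new = os.path.join(work, "track.flac")
        open(old, "w").close()
        Operator(state, [Rename(old, new)]).run()
        return os.path.exists(new), os.listdir(state)
```

Writing the state through a temporary file and `os.replace` means the state files themselves are never seen half-written, which is the same trick the operations are supposed to end with.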

If, at any point, an operation errors out, then the problem can be resolved manually, and the Operator can be resumed (since it knows what the last operation was that it performed successfully).

If, at any point, the application or machine crashes, the todo and done state can be found in the state directory, so the Operator can be loaded from disk. The Operator knows which tasks have definitely been performed, and knows what the next task is. The only unknown is whether the next task actually got performed successfully or not. Since each operation has to end in a final atomic call, it suffices to check whether that final call was made correctly, and update the done state. From there it can continue.
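
The check after a crash could look roughly like this for plain renames. This is only a sketch: the `(src, dst)` pair format and the function names are hypothetical, and it assumes no other operation recreates the source path in the meantime.

```python
import os
import tempfile


def rename_completed(src, dst):
    """True if the final atomic rename already happened: dst exists, src is gone."""
    return os.path.exists(dst) and not os.path.exists(src)


def resume_index(operations, done_count):
    """Return the index of the first operation that still has to run.

    operations: ordered list of (src, dst) rename pairs (hypothetical format).
    done_count: number of operations recorded as done in the state directory.
    """
    if done_count < len(operations) and rename_completed(*operations[done_count]):
        # The crash hit after the rename but before the done state was saved.
        return done_count + 1
    return done_count


def demo():
    """Show the resume point before and after an unrecorded rename."""
    with tempfile.TemporaryDirectory() as work:
        old = os.path.join(work, "old.flac")
        new = os.path.join(work, "new.flac")
        open(old, "w").close()
        before = resume_index([(old, new)], 0)
        os.replace(old, new)  # simulate a crash right after the rename
        after = resume_index([(old, new)], 0)
        return before, after
```

Only the single operation following the last recorded one needs this check, because everything before it is recorded as done and everything after it has not been started.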

This is the basic design I'm planning on doing. But I'm pretty tired today, lacking some sleep, so please punch holes in this approach, or tell me how overengineered this whole approach is and how it could be much better. Basically, the only even remotely similar thing I could find was this Perl module. I probably got the idea for using a state file from discussions with Jan and Arek at work about their RRD synchronization framework and its journal files, but I probably butchered their approach for this particular problem.

I could use a good reality check!

Comments (2)

EAT

Filed under: Flumotion,Python,Releases,Twisted — Thomas @ 19:10

19:10

Today the team released another development version of Flumotion! Strangely enough I still made it to the contributors list. Maybe I should look up what I am guilty of.

Here's what the guys say:

Yet another step in the long march towards a stable release. We made
sure we close more bugs than we create, hence the scarce features and
numerous fixes.

The bulk of the improvements is centered around the administration
interface. The configuration assistant gained in functionality,
stability and consistency.

More information here

Not sure why the guys decided to break with tradition and name this release after a restaurant (they probably assume it doubles as a bar). According to Flumotion tradition, micro releases are named after bars where we celebrate the release, and major/minor releases are named after the restaurants where we celebrate the release.

Comments (0)

90 minutes of hacking

Filed under: GStreamer,Hacking,Python — Thomas @ 23:39

2009-04-20
23:39

Today gave me, as a follow-up to yesterday's task post:

- an implementation of MultiTask that tracks progress across all tasks combined
- a new task that calculates the MusicBrainz TRM id/fingerprint of a track
- an example that uses the new MultiTask with the new TRMTask to calculate the fingerprints of a given playlist
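
To illustrate what tracking progress across all tasks combined can mean, here is a small sketch. It is not the real MultiTask: the listener mechanism and the unweighted average are assumptions made for the example.

```python
class Task:
    """Hypothetical minimal task: progress is a float between 0.0 and 1.0."""

    def __init__(self, name):
        self.name = name
        self.progress = 0.0
        self.listeners = []  # callables invoked with the task on every update

    def set_progress(self, value):
        self.progress = min(max(value, 0.0), 1.0)
        for listener in self.listeners:
            listener(self)


class MultiTask(Task):
    """Combines subtasks; overall progress is the unweighted mean of theirs."""

    def __init__(self, name, tasks):
        super().__init__(name)
        self.tasks = list(tasks)
        for task in self.tasks:
            task.listeners.append(self._child_progressed)

    def _child_progressed(self, _task):
        if self.tasks:
            total = sum(task.progress for task in self.tasks)
            self.set_progress(total / len(self.tasks))


first, second = Task("track 1"), Task("track 2")
combined = MultiTask("playlist", [first, second])
first.set_progress(1.0)
second.set_progress(0.5)
print(combined.progress)
```

Because a MultiTask is itself a Task here, a progress bar attached to it does not need to know whether it is watching one task or many.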

Not bad for 90 minutes of hacking. I really like my expressiveness in Python. And all of this done while listening to beautifully mixed music with my current jukebox script. I'm actually enjoying hacking again!

The example doesn't actually save the fingerprints yet. My mini-goal with this for DAD is to fingerprint all audio on all my devices, as a basis to uniquely identify audio tracks, and then layer the rating of tracks on various machines on top of that information.

Comments (1)

asynchronous task interface

Filed under: Hacking,Python — Thomas @ 23:13

2009-04-19
23:13

In a previous post, I mentioned looking for some code implementing asynchronous operations that you can then attach a progress bar to.

I didn't end up finding anything, so as part of the ripping code I'm writing I prototyped a simple approach to it.

It's working nicely already; I have the basic class for a Task, a class for MultiTasks, and two runners - a command-line blocking one and a GtkProgressBar-using widget.
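
As a rough idea of the shape of this (a sketch only, using nothing but the standard library instead of a GObject main loop, with hypothetical names and a dummy task that counts steps rather than seconds):

```python
import sys


class DummyTask:
    """Hypothetical task that does its work in small steps."""

    def __init__(self, steps=10):
        self.steps = steps
        self.completed = 0
        self.listeners = []  # callables invoked with the fraction done

    def step(self):
        """Do one unit of work; return True while more work remains."""
        self.completed += 1
        for listener in self.listeners:
            listener(self.completed / self.steps)
        return self.completed < self.steps


def run_blocking(task, out=sys.stdout):
    """Command-line runner: drive the task to completion, printing progress."""
    task.listeners.append(
        lambda fraction: out.write("\r%3d%%" % round(fraction * 100)))
    while task.step():
        pass
    out.write("\n")
```

Calling `run_blocking(DummyTask())` prints a percentage that counts up on a single line. A GUI runner would attach a different listener and let the toolkit's main loop call `step()` instead of the `while` loop.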

Here's the code if you're interested. You can run that file to see a simple DummyTask (one that takes 10 seconds to complete) progressing on the command line. Another example you can run (for which you need to check out the whole code) is examples/gtkchecksum.py which checksums decoded audio data using GStreamer and shows you a GTK+ progress bar. An even better example (but you'll need a complete CD rip with matching .cue file) is the ARcue.py program, which calculates AccurateRip checksums and compares them against the online database, and you can choose whether you want to run it in cli or gtk mode.

The cli one uses a GObject main loop to make the asynchronous code seem blocking again. I could easily add a Twisted reactor-using runner, but haven't needed it yet. Also, the one big annoyance I have with the otherwise excellent Twisted is the fact that you can't do much at all without the ugly 'global variable' that is the reactor.

I don't think I'm completely happy with these classes yet. Last week I noticed that I was unconsciously moving towards an MVC approach again, and that I'm not quite there yet. If you consider the Task the Model, then the Runner is currently both the View and the Controller. I will probably need to split out the View part again, so I can attach multiple Views to the Model that is the task.

Also, I need to add a way to report errors while executing the task. I'm considering adding timing statistics and finish estimation, and maybe something that allows showing both global progress and subtask progress, the way ripping programs do (current track percentage and complete disc percentage). Not sure how much I want to overengineer it though.
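
Finish estimation could start out as simple as a linear extrapolation. This is a sketch of one possible approach, with a hypothetical function name, and it assumes the task progresses at a roughly constant rate:

```python
def estimate_remaining(elapsed_seconds, fraction_done):
    """Linear estimate of the seconds left, or None when nothing is done yet."""
    if fraction_done <= 0.0:
        return None
    return elapsed_seconds * (1.0 - fraction_done) / fraction_done


# A quarter of the work took 30 seconds, so the rest should take about 90.
print(estimate_remaining(30.0, 0.25))
```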

Feel free to comment, suggest improvements, or show me similar concepts and implementations!

Comments (5)

Skype hack session

Filed under: GStreamer,Hacking,Python — Thomas @ 23:18

2009-04-18
23:18

Today, Jan and I did some hacking over Skype! It was fun, worked out well. Jaime was off shopping and Kristien had her radio show and some interview with Flowrida.

Jan mostly helped me with stuff though; he didn't seem to need any help from me. I started ripping CDs to flac this week, and was very disappointed yesterday when my jukebox program didn't seem to work at all with .flac files. With Jan's help I narrowed down the problem, checked with Edward, then I fixed it on a git branch.

I also set up http publishing of my personal git branches, and a cgit installation. cgit was a little more difficult to configure than I'd like, and the documentation isn't much help. I'll keep that for a separate post though.

Anyway, my jukebox is happily playing again with all the freshly and accurately ripped .flac files I have. My life just improved by 20%!

Although, I don't know whether it was switching to uridecodebin for decoding, or some bug in .flac, but sometimes the composition just seems to jump a few seconds.

I'll worry about that later; first I want to fix some more bugs and make the jukebox example a bit more featureful. I'm wondering what to do on 'next' and 'previous': rearrange the composition on the fly to match? I hope gnonlin will be able to keep up...

Tip from our hacking session: add persistent history to your gdb session by putting the following in .gdbinit:
set history filename ~/.gdbhistory
set history save on

Comments (3)
