thomas.apestaart.org

The Art of the Rip

Filed under: DAD,Hacking,Python — Thomas @ 10:40

2009-05-03
10:40

While I'm working on the ripping software, I find myself going back and forth between various references to figure out the small details and the pieces that subtly get interpreted differently between them. As is the case with other projects, I can easily see myself forgetting about these details soon, and cursing myself a year from now for not having written down my clear understanding of today.

So, in an effort to appease my future self, I've started writing down a condensed form of the important information I've come across.

On that page, I'm also comparing various ripping programs and how they handle the various details I consider important for correct ripping. I'll use that information and that chart as the basis for the features of my ripping program.

I'm trying to stay as objective as possible on that page, so feel free to tell me about mistakes, omissions, software I should be adding, ...

By now, I have a good set of goals for my ripping program:

- lossless ripping
- accuracy is the number one goal
- speed is always second to accuracy
- hands-off one-click/command ripping
- separate ripping from metadata fixing
- rip hidden track one audio automatically

With this in mind, I thought yesterday about how I could figure out the drive's read offset the way EAC does it. I've come up with a simple program that:

- checks if the current CD is in the AccurateRip database
- if it is, rip the first track with various offsets
- if any of the AccurateRip checksums match, that is most likely the offset for your drive
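The steps above boil down to a small search loop. Here is a minimal sketch of it, not the actual ARcalibrate.py: `find_read_offset`, `rip_track` and `checksum` are hypothetical names, and the real ripping and AccurateRip checksum code is passed in as callables rather than shown.

```python
from typing import Callable, Iterable, Optional, Set


def find_read_offset(
    rip_track: Callable[[int], bytes],   # hypothetical: rips track 1 at a given offset
    checksum: Callable[[bytes], int],    # hypothetical: AccurateRip-style checksum
    known_checksums: Set[int],           # checksums fetched from the database
    offsets: Iterable[int],              # candidate read offsets to try
) -> Optional[int]:
    """Return the first offset whose rip matches a known checksum, else None."""
    for offset in offsets:
        audio = rip_track(offset)
        if checksum(audio) in known_checksums:
            return offset
    return None
```

Unlike the run shown below, this sketch stops at the first match instead of trying the remaining offsets.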

It took longer to test the program than to write it, since my AccurateRip checksum calculation is currently done purely in Python and thus rather slow.

In any case, using Bat For Lashes' "Fur and Gold":
[gst-git] [thomas@ana trunk]$ PYTHONPATH=$PYTHONPATH:`pwd` python examples/ARcalibrate.py
CDDB disc id 8a0aa10b
AccurateRip URL http://www.accuraterip.com/accuraterip/9/f/f/dBAR-011-00112ff9-00976269-8a0aa10b.bin
4 AccurateRip reponses found
ripping track 1 with offset 46
AR checksum calculated: b880421e
ripping track 1 with offset 47
AR checksum calculated: 4a29a173
ripping track 1 with offset 48
AR checksum calculated: 903b390e
MATCHED against response 3
offset of device is 48
ripping track 1 with offset 49
AR checksum calculated: e7c008f1
[gst-git] [thomas@ana trunk]$

I made the program scan from 46 to 49, knowing that my drive has a +48 read offset. Now I'm going to add an option to choose the range and an option to start with the most common offsets, and I'm thinking about using online databases of drive features to start with the offset most likely to be correct for your drive.
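One way the "most common offsets first" option could work is to reorder the candidates before scanning. This is only a sketch: `candidate_offsets` is a hypothetical helper, and the list of common offsets is left to the caller since I have no real statistics here.

```python
from typing import Iterable, Iterator


def candidate_offsets(low: int, high: int, common: Iterable[int] = ()) -> Iterator[int]:
    """Yield common offsets that fall inside [low, high] first, then the rest ascending."""
    seen = set()
    for offset in common:
        if low <= offset <= high and offset not in seen:
            seen.add(offset)
            yield offset
    for offset in range(low, high + 1):
        if offset not in seen:
            yield offset


# 48 is my drive's offset from the run above; 6 is only a placeholder value.
print(list(candidate_offsets(46, 49, common=[48, 6])))  # [48, 46, 47, 49]
```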


read order

Filed under: Belgium,General — Thomas @ 16:02

2009-05-02
16:02

A few weeks ago a few neurons tragically misfired, causing me to click a buy button on a 20 item shopping cart at Amazon.

Really, I just wanted to buy Overqualified, after reading how Joey sold out his first run already and realizing I did not want to wait for a second print. But somehow I thought I'd optimize my costs if I added some more stuff from my wishlist.

Anyway, the post office had trouble finding the books I had ordered. Probably because instead of a normal box, the books were packaged in this:
[photo 71068]

Inside was the traditional Amazon box:
[photo 71071]
Sadly, things that look like a bag are thrown around like a bag. The soft carton inside the bag was not strong enough to protect the books inside from being damaged.

If your spine reading is good, you can see what I ended up ordering:
[photo 71077]

The problem is I already had stuff I still have to read; I have a special hole in my bookshelf just for that:
[photo 71074]

Maybe it's time for another week of stay-at-home-holiday...

(This post brought to you by the Gallery2 for WordPress plugin)


Hidden Track One Audio: check

Filed under: Hacking,Music,Python — Thomas @ 21:01

2009-05-01
21:01

After another day of on-and-off hacking, wrapping the cdrdao and cdparanoia binaries in the task interfaces I mentioned before, I inserted a CD by Bloc Party called 'Silent Alarm', ran a command, and saw the following output on my screen:
[gst-git] [thomas@ana trunk]$ PYTHONPATH=$PYTHONPATH:`pwd` python examples/readhtoa.py
Found Hidden Track One Audio from frame 0 to 15220
runner done
Checksums match
[gst-git] [thomas@ana trunk]$ ls -l track00.wav
-rw------- 1 thomas thomas 35797484 2009-05-01 21:56 track00.wav
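As a sanity check, the file size in that listing fits the reported frame range. A small sketch of the arithmetic, assuming 15220 frames were ripped and that the WAV file has the canonical 44-byte header (the helper names are made up for this example):

```python
BYTES_PER_FRAME = 2352   # 588 samples per channel * 2 channels * 2 bytes per sample
FRAMES_PER_SECOND = 75   # CD audio frames (sectors) per second
WAV_HEADER_BYTES = 44    # assumption: canonical PCM WAV header


def expected_wav_size(frames: int) -> int:
    """Size in bytes of a WAV file holding the given number of CD frames."""
    return WAV_HEADER_BYTES + frames * BYTES_PER_FRAME


def duration_seconds(frames: int) -> float:
    """Playing time of the given number of CD frames."""
    return frames / FRAMES_PER_SECOND


print(expected_wav_size(15220))  # 35797484
```

At 75 frames per second, that is about 203 seconds of hidden audio.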

I'm going to guess this is the first piece of Linux code that is able to automatically find and rip the hidden track at the start of a CD. (Feel free to correct me using your choice of alliterative insult if I am wrong!)

It's time to start collecting all my new-found wisdom in something more permanently written down, but that will be for tomorrow.


CentOS debug fail

Filed under: sysadmin — Thomas @ 20:41

2009-04-30
20:41

Recently we got a new server for apestaart, our little hosting project between friends.

This time we decided to install CentOS, since people I trust have been saying good things about it, we use RHEL at work and it's pretty much the same, and Wiebe, the other admin, also was all for it.

This week I ran into a segfault and I wanted to debug it. Turns out CentOS ships without any debug repositories installed by default. Fedora installs the repo files but cleverly leaves them disabled, as it should. On top of that, yum-utils contains debuginfo-install, a handy script that allows you to install all the dependencies of a -debuginfo package, and gdb spits out useful hints like which packages to install when you're looking at a stack trace with missing symbols.

So, this doesn't work out of the box on CentOS. Fail #1.
I went on IRC, mentioned this there, and I was told that 'most CentOS users don't need debuginfo packages'. Fail #2 - neither do Fedora users, yet somehow Fedora managed to figure out how to work both for the ones that don't need them and the ones that do. So goodbye unhelpful IRC channel.
I had to manually create a .repo file based on some guy's post complaining about the same thing; something much like this:
[root@betsy ~]# cat /etc/yum.repos.d/debuggery.repo
[debuggery]
name=CentOS-$releasever - DebugInfo
baseurl=http://debuginfo.centos.org/$releasever/$basearch/
priority=1

Then I actually tried to do debuginfo-install python, and what happened?

First of all, the python-debuginfo package for the installed version of python isn't even in that repo (though 3 others are, go figure). FAIL #3

Of the other packages that it did find, one was unsigned:
yum.Errors.YumBaseError: Package libtermcap-debuginfo-2.0.8-46.1.x86_64.rpm is not signed

FAIL #4.

CentOS, hostile to developers. Back to old-school 90's style rpm hunting and pecking on the web.

UPDATE:
As for the python debuginfo package, there is a RHEL5 package with exactly the right version and release tag, but that one was built for RHEL5, and once installed it fails the CRC check against the installed python package. CentOS simply does not provide the debuginfo package for its shipped version of python. FAIL #5


validate me

Filed under: Hacking,Python — Thomas @ 23:08

2009-04-21
23:08

I'm writing some code that does something that seems straightforward but I'm trying to cover all my bases.

The code in question should perform a bunch of file/directory renames, and update some files that contain references to those changed filenames, such that the complete set stays consistent. (As a purely theoretical example, think 'a directory of sound files, and the .m3u/.cue/.log file in it'.)

I used to have something like this in DAD, so that you could fix the spelling of tracks after ripping. Except that it was an ugly Perl script and just gave up if anything went wrong in the middle (permissions, bad naming, machine crashing, ...)

This time around I want to take this conceptually simple but by design non-atomic operation, and turn it into a set of resumable operations, such that if for whatever reason one of the atomic operations fails, or the machine crashes in the middle, there is enough information to resume the complete operation.

My current approach goes something like this:
- each Operation (renaming a file, replacing references in a file, ...) is an object
- the object has serialize/deserialize methods
- an Operator object gets loaded with a list of these Operations
- each Operator maintains state of all tasks to be done, and which tasks are done already
- the Operator starts by serializing the todo list to disk in a given state directory
- it goes through each operation, performing the operation (which should end in an atomic step, for example moving the temporary file over the original file in the case of replacing references in a file)
- after performing the operation, the operator serializes done to the state directory.
- the Operator continues performing tasks and saving done to the state directory.
- at the end of all operations, the Operator first removes the todo state, then the done state.
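The "end in an atomic step" idea for replacing references in a file can be sketched like this. It is an illustration, not code from my tree: `replace_in_file` is a made-up name, it assumes UTF-8 text files, and it uses `os.replace` from current Python 3.

```python
import os
import tempfile


def replace_in_file(path: str, old: str, new: str) -> None:
    """Rewrite path with old replaced by new, finishing with one atomic rename."""
    with open(path, "r", encoding="utf-8") as handle:
        content = handle.read()
    # The temporary file must live on the same filesystem for the rename to be atomic.
    directory = os.path.dirname(os.path.abspath(path))
    fd, temp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(content.replace(old, new))
            handle.flush()
            os.fsync(handle.fileno())
        # Final atomic step: the file has either the old content or the new content.
        os.replace(temp_path, path)
    except BaseException:
        if os.path.exists(temp_path):
            os.unlink(temp_path)
        raise
```

Note that the temporary file from `mkstemp` is only readable by its owner, so a real version would probably want to copy the permissions of the original file as well.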

If, at any point, an operation errors out, then the problem can be resolved manually, and the Operator can be resumed (since it knows what the last operation was that it performed successfully).

If, at any point, the application or machine crashes, the todo and done state can be found in the state directory, so the Operator can be loaded from disk. The Operator knows which tasks have definitely been performed, and knows what the next task is. The only unknown is if the next task actually got performed successfully or not. Since each operation has to end in a final atomic call, it suffices to check if that final call was made correctly, and update the done state. From there it can continue.
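To make the design more concrete, here is a rough sketch of how the todo/done state and the resume check could fit together. All of it is hypothetical: it only knows one kind of operation (`Rename`), stores state as JSON in `todo.json` and `done.json`, and keeps the done state as a simple counter.

```python
import json
import os
from typing import Dict, List


class Rename:
    """One operation: rename source to destination in a single atomic call."""

    def __init__(self, source: str, destination: str) -> None:
        self.source, self.destination = source, destination

    def serialize(self) -> Dict[str, str]:
        return {"source": self.source, "destination": self.destination}

    @classmethod
    def deserialize(cls, data: Dict[str, str]) -> "Rename":
        return cls(data["source"], data["destination"])

    def perform(self) -> None:
        os.rename(self.source, self.destination)

    def was_performed(self) -> bool:
        # Did the final atomic call already happen?
        return os.path.exists(self.destination) and not os.path.exists(self.source)


class Operator:
    def __init__(self, state_dir: str, operations: List[Rename]) -> None:
        self.todo_path = os.path.join(state_dir, "todo.json")
        self.done_path = os.path.join(state_dir, "done.json")
        self.operations = operations
        self.done = 0  # number of operations known to be complete

    @staticmethod
    def _save(path: str, data) -> None:
        # Write to a temporary file first so the state file itself changes atomically.
        with open(path + ".tmp", "w", encoding="utf-8") as handle:
            json.dump(data, handle)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(path + ".tmp", path)

    def start(self) -> None:
        self._save(self.todo_path, [op.serialize() for op in self.operations])
        self._save(self.done_path, 0)
        self.run()

    @classmethod
    def resume(cls, state_dir: str) -> "Operator":
        operations: List[Rename] = []
        todo_path = os.path.join(state_dir, "todo.json")
        if os.path.exists(todo_path):  # missing todo means everything was finished
            with open(todo_path, encoding="utf-8") as handle:
                operations = [Rename.deserialize(item) for item in json.load(handle)]
        operator = cls(state_dir, operations)
        if operations and os.path.exists(operator.done_path):
            with open(operator.done_path, encoding="utf-8") as handle:
                operator.done = json.load(handle)
        # The only unknown: did the next operation finish just before the crash?
        if operator.done < len(operations) and operations[operator.done].was_performed():
            operator.done += 1
            cls._save(operator.done_path, operator.done)
        return operator

    def run(self) -> None:
        while self.done < len(self.operations):
            self.operations[self.done].perform()
            self.done += 1
            self._save(self.done_path, self.done)
        # First remove the todo state, then the done state.
        for path in (self.todo_path, self.done_path):
            if os.path.exists(path):
                os.remove(path)
```

After a crash, `Operator.resume(state_dir).run()` would pick up where things left off. The weak spot is `was_performed`: for a plain rename the check is easy, but other operations may need something smarter to tell whether their final call went through.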

This is the basic design I'm planning on doing. But I'm pretty tired today, lacking some sleep, so please punch holes in this approach, or tell me how overengineered this whole approach is and how it could be much better. Basically, the only even remotely similar thing I could find was this Perl module. I probably got the idea for using a state file from discussions with Jan and Arek at work about their RRD synchronization framework and its journal files, but I probably butchered their approach for this particular problem.

I could use a good reality check!

