I have tomorrow (Saturday) blocked out for a whole day of morituri hacking as I will be home alone.
One of the things a lot of morituri users are puzzled by is its relentless drive to extract every single sample of audio from the CD. Currently it does this even for a really short pre-gap that is most likely just the result of an inaccurate master or burn, with no useful audio in it.
For me, that was a design goal of morituri – I want to be able to exactly reproduce a CD as is. That is to say, ripping a CD should extract *all* audio from the CD, and it should be possible to make a copy of that CD and then rip that copy, and end up with exactly the same result as from the original CD. (I’m sure there’s a fancy scientific term for that that I can’t remember right now)
To a lot of other people, it seems to be annoying and they don’t like having those small almost empty files lying around.
So I thought I’d do something about that, and that it might be useful as well to analyze my current collection of tracks and figure out what’s in there. Maybe I can find some hidden gems that I hadn’t noticed before?
So I added a quick task to morituri that calculates the maximum sample value (I didn’t want to use my own level element in GStreamer for this as I wanted to make sure it was actual digital zero; this should be done in an element instead though, but I preferred the five minute hack for this one).
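As a rough idea of what such a check computes, here is a minimal sketch. It is not morituri's actual task: it assumes the audio has already been decoded to raw little-endian 16-bit PCM, and the function name `max_sample` is made up for illustration.

```python
import array
import sys


def max_sample(pcm_bytes):
    """Return the largest absolute sample value in signed 16-bit PCM data.

    Assumption: pcm_bytes holds little-endian 16-bit samples, as in CD audio
    decoded to raw PCM. A result of 0 means pure digital silence.
    """
    samples = array.array('h')      # 'h' = signed 16-bit integers
    samples.frombytes(pcm_bytes)
    if sys.byteorder == 'big':
        samples.byteswap()          # input is little-endian by assumption
    return max((abs(s) for s in samples), default=0)
```

Checking for exactly 0, rather than for a low level in dB, is what separates true digital silence from a very quiet track.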
And then I ran:
rip debug maxsample /mnt/nas/media/audio/rip/morituri/own/album/*/00*flac
Sadly, that turned up 0 as the biggest sample for all these tracks!
Wait, what? I spent all that time on getting those secret tracks ripped just to get none? That’s not possible! I know some of those tracks!
Maybe the algorithm is wrong. Nope, it works fine on all the regular tracks.
Oh, crap. Maybe morituri has been ripping silence all this time because my CD drive can’t get that data off. Yikes, that would be a bit of egg on my face.
No, it works if I check that Bloc Party track I know about.
Ten minutes of staring at the screen to realize that, while I was outputting names from a variable from the for loop over my arguments, the track I was actually passing to the task was always the first one. Duh. Problem solved.
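The bug is easy to reproduce in a few lines. This is a hypothetical reconstruction, not the code from morituri; `task` stands in for whatever computes the maximum sample of one file.

```python
def check_tracks_buggy(paths, task):
    results = {}
    for path in paths:
        # Bug: the label comes from the loop variable, but the task
        # is always given the first argument.
        results[path] = task(paths[0])
    return results


def check_tracks_fixed(paths, task):
    # Each track is both labelled and processed using the loop variable.
    return {path: task(path) for path in paths}
```

With the buggy version every track reports the result of the first one, which is exactly what a column of zeroes looks like when the first file happens to be silent.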
As for what I found in my collection:
- a cute radio jingle that brought back memories from a live bootleg I had made myself of Bloem. That’s from over ten years ago, but that must have been around the time I learned about the existence of HTOA and wanted to get one in
- found unknown HTOA tracks on Art Brut’s Bang Bang Rock & Roll, Mew’s Half the world is watching me; not their best stuff
- soundscapey or stage-setting tracks on QOTSA’s Songs for the Deaf, Motorpsycho’s Angels and Daemons at Play and Blissard; not that worth it (the Blissard track was ok, but really quiet)
- Pulp hid a single piano chord in a 2 second pre-gap on This is Hardcore; very curious. It’s not an intro to the first track, because it doesn’t fit with the sound at all.
- Damien Rice hid a demo version of 9 Crimes (the first track) in the pregap; instead of piano and female vocals, he plays guitar and sings all the parts.
- Got reacquainted with my favourite HTOA tracks: the orchestral quasi-wordless medley on the Luke Haines/Das Capital disc; the first Bloc Party album with a beautiful instrumental (up there with the hidden track at the end of Placebo’s first album; both bands delivering an atypical but stunning moodscape); the beautiful cover of Ben Kenobi’s Theme by Arab Strap on the Cherubs EP (no idea why that landed in my album dir, that needs to be fixed); the silly Soulwax skit for their second album.
Of course, Wikipedia has the last word on everything
I note that they think Pulp recorded a cymbal, not a piano. And now that I see the title of the QOTSA hidden track, I get the joke I think.
In total, on my album collection of 1564 full CDs, I have 171 HTOAs ripped, 138 of which are pure digital silence, and only about 11 of which are actually useful tracks.
I expected to find more gems in my collection. I’ll go through EPs, singles and compilations next just to be sure.
But with this code in hand, maybe it’s time to add something to morituri to save the silent HTOA tracks as pure .cue information.
I’ve never been a fan of voting for talks, because it tends to be poorly implemented under the guise of democracy. Of course it’s easy for me to talk, I’ve never organized anything at that scale.
I’ll give two examples of why I feel this way, one of which triggered today’s blog post.
First off, my colleague Marek submitted a talk to DjangoCon. The talk was about how to use feat (a toolkit we wrote for live transcoding) to serve Django pages, but in such a way that they can use Deferreds to remove the concurrency bottleneck of “1 request at a time” per process running Django.
Personally, to me, this is one of the most irritating design choices of Django – from the ground up it was built synchronously (which could have been fine in most places). But the fact that, when you get a request, you have to always synchronously respond to it (and block every other request for that process in the meantime) is a design choice that could have easily been avoided.
In our particular use case, it was really painful. If our website has to do an API request to some other service we don’t control that can easily take 30 seconds, our process throughput suddenly becomes 2 pages per minute. All the while, the server is sitting there waiting.
Yes, you can throw RAM at the problem and start 30 times more processes; or thread out API requests; or farm it out to Celery, and do some back-and-forthing to see when the call’s done. Or do any other number of workarounds for a fundamental design choice.
Since we like Twisted, we preferred to throw Twisted at the problem, and ended up with something that worked.
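To make the bottleneck concrete: a process that spends 30 seconds blocked on each request can finish 60 / 30 = 2 requests per minute, however idle the CPU is. The sketch below shows the alternative of waiting on several slow calls at once. It uses asyncio from the Python standard library as a stand-in, not Twisted Deferreds or feat, and `slow_api_call` is a made-up placeholder for the external service.

```python
import asyncio
import time


async def slow_api_call(delay):
    # Stand-in for a request to a slow external service we don't control.
    await asyncio.sleep(delay)
    return delay


async def handle_sequentially(n, delay):
    # One request at a time: total time is roughly n * delay.
    return [await slow_api_call(delay) for _ in range(n)]


async def handle_concurrently(n, delay):
    # All requests wait at the same time: total time is roughly one delay.
    return await asyncio.gather(*(slow_api_call(delay) for _ in range(n)))


def timed(coro):
    # Run a coroutine to completion and return the elapsed wall-clock time.
    start = time.monotonic()
    asyncio.run(coro)
    return time.monotonic() - start
```

Comparing `timed(handle_sequentially(5, 0.05))` with `timed(handle_concurrently(5, 0.05))` shows the same five waits overlapping instead of queueing up.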
Anyway, that’s a lot of setup to explain what the talk was about. Marek submitted the talk to DjangoCon, and honestly I didn’t expect it to get much traction because, when you’re inside Django, you think like Django, and you don’t really realize that this is a real problem. Most people who do realize it switch away to something else.
But to my surprise, Marek’s talk was the most-voted talk! I wish I could link to the results, but of course that vote site is no longer online.
I guess I expected that would mean he’d be presenting at DjangoCon this year. So I asked him today when his talk was, and he said “Oh that’s right. I did not get accepted.”
Well, that was a surprise. Of course, the organising committee reserves the right to decide on their own – maybe they just didn’t like the talk. But if you ask your potential visitors to vote, you’d expect the most-voted talk to make it onto the schedule, no?
The feedback Marek got from them was surprising too, though. Their first response was that this talk was too similar to another talk, titled “How to combine JavaScript & Django in a smart way”. Now, I’m not a JavaScript expert, but from the title alone I can already tell that it’s very unlikely that these two talks have many similarities beyond the word ‘Django’.
After Marek refuted that point, their second reason was that they wanted more experienced speakers (but they didn’t ask Marek about his experience), and their third reason was that the talk had already been given at previous editions of DjangoCon US/EU. It’s unclear whether they meant his talk or the JavaScript one, but Marek’s definitely wasn’t, and we couldn’t find any mention of the other talk at previous conferences. I’m also not sure why that even matters one way or the other. (This email thread was in Polish, so I have to rely on Marek’s interpretation of it.)
Personally, my reaction would have been to complain to the organizers or Django maintainers. Marek’s phlegmatic attitude was much better though – after such an exchange, he simply doesn’t want to have anything to do with the conference.
He’s probably right – it’s hard to argue with someone who doesn’t want to invite you and is lying about the reasons.
The second example is BCNDevCon, a great conference here in Barcelona, organized by a guy who used to work for Flumotion and for whom I have enormous respect. I’ve never seen anyone create such a big conference in so little time.
He believes strongly in the democratic aspect, and as far as I can tell constructs the schedule solely based on the votes.
Sadly I didn’t go to the last one, and the reason is simply because I felt that the talks that made it were too obviously corporate. A lot of talks were about Microsoft products, and you could tell that they won votes because people’s coworkers voted on talks. I’m not saying that’s necessarily wrong – given that he worked at our company and has friends here, I’m sure people working here presenting at his conference have also done vote tending. It’s natural to do so. But there should be a way to balance that out.
I think the idea of voting is good, but implementation matters too. Ideally, you would only want people who are actually going to show up to vote. I have no idea how you can ensure that, though. Do you ask people to pre-pay? Do you ask them to commit to pay if at least 50% of their votes make it into the final schedule, kickstarter-style?
These two examples are on opposite extremes of voting. One conference simply disregards completely what people vote on. If I had voted or bought a ticket, I would feel lied to. Why waste the time of so many people? The other conference puts so much stock in the vote, that I feel the final result was strongly affected. I seriously doubt all those Windows 8 voters actually showed up.
Does anyone have good experiences with conference voting that did work? Feel free to share!
After piecing together the security story of CouchDB as it applies to mushin, I secured the mushin database on various machines. This serves as a quick setup guide for security for mushin, but I think it’s useful for other people using CouchDB.
Stop using Admin Party
This is easy to do in Futon (link only works if you run couchdb locally on port 5984, as per default). Jan’s blog post explains it perfectly, including screenshots.
Under the hood, couchdb will actually rewrite your local.ini file to add this user – all admin users are stored in the config files. (I’m sure there’s an obvious reason for that)
Given that you most likely will use this password in Futon, make sure you pick a unique password – as far as I can tell this password goes over the wire.
Create a user object for your user
The CouchDB wiki explains the basics. You need to create or update the _users database, which is a special couchdb database. You can get to it in Futon. If, like most people, you’re still on a couchdb before 1.2.0, you have to fiddle yourself to calculate the password_sha field, but at least the page explains how to do it. Not the most user-friendly thing to do in the world, so I’m considering adding commands for this to a different application I’m working on.
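A minimal sketch of that fiddling, assuming the pre-1.2 scheme where password_sha is the hex SHA-1 digest of the password concatenated with a random salt. The helper name `make_user_doc` and the user name in the usage note are made up; check the field names against the reference page before relying on them.

```python
import hashlib
import os


def make_user_doc(name, password, salt=None):
    """Build a document for the _users database (CouchDB before 1.2.0).

    Assumption: password_sha = hex(sha1(password + salt)).
    """
    if salt is None:
        salt = os.urandom(16).hex()   # random 32-character hex salt
    digest = hashlib.sha1((password + salt).encode('utf-8')).hexdigest()
    return {
        '_id': 'org.couchdb.user:' + name,
        'type': 'user',
        'name': name,
        'roles': [],
        'salt': salt,
        'password_sha': digest,
    }
```

Something like `make_user_doc('alice', getpass.getpass())` keeps the password off the command line; the resulting dictionary still has to be serialized as JSON and saved to the _users database.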
Allow this user to read and write to the mushin database
Again, the best reference is the CouchDB wiki, but the information is easy to miss.
Every database has a _security object under the database name; in the case of mushin, you can get to it in Futon. _security is a special document that does not get versioned, and doesn’t show up in listings either. In fact, it is so special that Futon doesn’t let you change it; when you click save it just resets. So your only option is to PUT the document, for example:
curl -X PUT -d @security.json http://admin:sup3rs3kr3t@localhost:5984/mushin/_security
Oops, see what I did there? I had to specify my admin password on the command line, and now it’s in my shell history. I did tell you to choose a unique one because it’s going to be all over the place, didn’t I?
security.json is just the contents of the _security document; just adapt the example on the wiki, and put your user under readers, and leave the role empty for now.
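One way to keep the admin password out of the shell history is to prompt for it instead of putting it in the URL. The sketch below builds the _security document and the PUT request with the Python standard library only. The user names `alice` and `admin` are placeholders, and the `admins`/`readers` layout follows the pre-1.2 format as I understand it, so compare it with the wiki example before using it.

```python
import base64
import getpass
import json
import urllib.request


def make_security_doc(reader_names):
    # Roles are left empty, as suggested above.
    return {
        'admins': {'names': [], 'roles': []},
        'readers': {'names': list(reader_names), 'roles': []},
    }


def build_put_request(url, doc, user, password):
    # HTTP basic authentication: base64 of "user:password" in a header.
    token = base64.b64encode(f'{user}:{password}'.encode('utf-8'))
    return urllib.request.Request(
        url,
        data=json.dumps(doc).encode('utf-8'),
        method='PUT',
        headers={
            'Content-Type': 'application/json',
            'Authorization': 'Basic ' + token.decode('ascii'),
        },
    )


def main():
    # Prompting keeps the password out of shell history and the process list.
    password = getpass.getpass('CouchDB admin password: ')
    request = build_put_request(
        'http://localhost:5984/mushin/_security',
        make_security_doc(['alice']), 'admin', password)
    with urllib.request.urlopen(request) as response:
        print(response.read().decode('utf-8'))
```

Calling `main()` sends the request. The password still travels over plain HTTP, so this only fixes the shell history problem, not the sniffing one.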
Test denial
This one is simple; just try to GET the database:
$ curl http://localhost:5984/mushin
{"error":"unauthorized","reason":"You are not authorized to access this db."}
If you did it right, you should see the same error. If you’re brave, you can retry the same curl command, but add your username and password. But you know how we feel about that.
(… wherein I disappoint the readers who were planning to drop my database)
So, I assume you now know what mushin is.
The goal for my nine/eleven hack day was simply to add authentication everywhere to mushin, making it as user-friendly as possible, and as secure as couchdb was going to let me.
Now, CouchDB’s security story has always been a little confusing to me. It’s gotten better over the years, though, so it was time to revisit it and see how far we could get.
By default, CouchDB listens only on localhost, uses plaintext HTTP, and is in Admin Party mode. Which means, anyone is an admin, and anyone who can do a request on localhost can create and delete databases or documents. This is really useful for playing around with CouchDB, learning its REST API using curl. So easy in fact that it’s hard to go away from that simplicity (I would not be surprised to find that there are companies out there running couchdb on localhost and unprotected).
Authorization
What can users do? What types of users are there?
In a nutshell, couchdb has three levels of permissions:
- server admin: can do anything; create, delete databases, replicate, … Is server wide. Think of it as root for couchdb.
- database admin: can do anything to a database; including changing design documents
- database reader: can read documents from the database, and (confusingly) write normal documents, but not design documents
CouchDB: The Definitive Guide sadly only mentions the server admin and admin party, and is not as definitive as its title suggests. A slightly better reference is (although I still haven’t cleared up what roles are to be used for, beside the internal _admin role).
By far the clearest explanation of security-related concepts in CouchDB is in Jan Lehnardt’s CouchBase blog post.
I’ll come back to how these different objects/permissions get configured in a later post.
Authentication
How do you tell CouchDB who you are, so it can decide what it lets you do?
By default, CouchDB has the following authentication handlers:
- OAuth
- cookie authentication
- HTTP basic authentication, RFC 2617
But wait a minute… To tell the database who I am, I have a choice between OAuth (which isn’t documented anywhere, and there doesn’t seem to be an actual working example of it, but I assume this was contributed by desktopcouch), cookie authentication (which creates a session and a cookie for later use, but to create the session you need to use a different authentication mechanism in the first place), or basic authentication (which is easy to sniff).
So, in practice, at some point the password is going to be sent over plaintext, and your choice basically is between once, a few times (every time you let your cookie time out, which happens after ten minutes, although you get a new cookie on every request), or every single time. I’m not a security expert, but that doesn’t sound good enough.
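To see why basic authentication over plain HTTP is easy to sniff: the Authorization header is only base64-encoded, not encrypted. A small sketch with made-up helper names shows that anyone who sees the header can recover the password.

```python
import base64


def basic_auth_header(user, password):
    # What the client sends: "Basic " + base64("user:password").
    token = base64.b64encode(f'{user}:{password}'.encode('utf-8'))
    return 'Basic ' + token.decode('ascii')


def sniff_basic_auth(header_value):
    # What an eavesdropper does: decode the token, no key required.
    _scheme, token = header_value.split(' ', 1)
    decoded = base64.b64decode(token).decode('utf-8')
    user, _, password = decoded.partition(':')
    return user, password
```

Round-tripping a header through `sniff_basic_auth` gives back the original user name and password, which is why the number of times the password crosses the wire matters.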
And typically, my solution would be to switch to https:// and SSL to solve that part. Since CouchDB 1.1 this is included, although I haven’t tried it yet (since I’m still on couchdb 1.0.3 because that’s what I got working on my phone).
Now, my use case is a command-line application. This adds some additional security and usability concerns:
- When invoking gtd (the mushin command-line client) to add a task, it would be great if I didn’t have to specify my username and password every single time. Luckily, gtd can be started as a command line interpreter, so that helps.
- It would be great if I didn’t have to specify a password on the command line, either as part of a URL (for example, when replicating) or as an option. I really hate to see passwords either in the process list or in shell history or in a config file, and typically I will use my lowest-quality passwords for apps that force me to do this, and want to avoid writing software that has no other option.
The Plan
In the end, the plan of attack started to clear up:
I made it pretty far down the list, stopping short at upgrading to couchdb 1.1
But at least tomorrow at work, people will not be able to get at my tasks on their first attempt. (Not that there’s anything secret in there anyway, but I digress)
After ripping over 1000 CDs perfectly, and having problems on a few (bad discs, weird audio, a few small niggles to fix), I ran into a fun failure.
Apparently, the file name in u’morituri/Sufjan Stevens – Illinois/02. Sufjan Stevens – The Black Hawk War, or, How to Demolish an Entire Civilization and Still Feel Good About Yourself in the Morning, or, We Apologize for the Inconvenience but You\’re Going to Have to Leave Now, or, “I Have Fought the Big Knives and Will Continue to Fight Them Until They Are Off Our Lands!”.flac’ is too long for my NAS.
Thank you Mister Sufjan. In your honour, I added a function to morituri to shrink the filename to a power of two minus one, below either the given length or 128 characters, whichever is less. For now the algorithm splits on spaces and changes the file name to morituri/Sufjan Stevens – Illinois/02. Sufjan Stevens – The Black Hawk War, or, How to Demolish an Entire Civilization and Still Feel Good About Yourself.flac
That is good enough for me… I was worried I had to teach this one tiny function about keeping quoted pieces together, or how commas work, or how ‘or, ‘ works, and so on, just to satisfy my crazy sense of aesthetics.
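The idea can be sketched roughly as below. This is an illustration under my own assumptions, not the function that went into morituri: the name `shrink_filename` is made up, lengths are counted in characters rather than bytes, and it may cut at a slightly different word than the result shown above.

```python
import os


def shrink_filename(path, max_length=128):
    """Shorten the last path component by dropping whole trailing words."""
    directory, filename = os.path.split(path)
    if not filename:
        return path
    limit = min(len(filename), max_length)
    # Largest power of two not above the limit, minus one (128 -> 127).
    target = (1 << (limit.bit_length() - 1)) - 1
    stem, ext = os.path.splitext(filename)
    budget = target - len(ext)      # room left for the name itself
    kept, length = [], 0
    for word in stem.split(' '):
        extra = len(word) + (1 if kept else 0)   # +1 for the joining space
        if length + extra > budget:
            break
        kept.append(word)
        length += extra
    if not kept:
        # Even the first word is too long: fall back to a hard cut.
        kept = [stem[:max(budget, 0)]]
    return os.path.join(directory, ' '.join(kept) + ext)
```

Splitting on spaces keeps whole words and the extension, at the cost of sometimes ending on a dangling comma or conjunction.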