Present Perfect



Getting Things Done with CouchDB, part 3: Security in mushin

Filed under: couchdb,General,Hacking,Python — Thomas @ 11:26 pm


After piecing together the security story of CouchDB as it applies to mushin, I secured the mushin database on various machines. This serves as a quick setup guide for security for mushin, but I think it’s useful for other people using CouchDB.

Stop using Admin Party

This is easy to do in Futon (the link only works if you run couchdb locally on the default port 5984). Jan’s blog post explains it perfectly, including screenshots.

Under the hood, couchdb will actually rewrite your local.ini file to add this user – all admin users are stored in the config files. (I’m sure there’s an obvious reason for that)

Given that you most likely will use this password in Futon, make sure you pick a unique password – as far as I can tell it goes over the wire in plaintext.

Create a user object for your user

The CouchDB wiki explains the basics. You need to create or update the _users database, which is a special couchdb database; you can get to it in Futon. If, like most people, you’re still on a couchdb before 1.2.0, you have to calculate the password_sha field yourself, but at least the wiki page explains how to do it. Not the most user-friendly thing in the world, so I’m considering adding commands for this to a different application I’m working on.
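For the impatient: the pre-1.2 scheme boils down to storing a random salt plus the SHA-1 hex digest of the password concatenated with that salt. A quick Python sketch (the helper name is my own, not part of any CouchDB tooling):

```python
import hashlib
import uuid

def couch_password_fields(password):
    """Compute the salt and password_sha fields the way pre-1.2 CouchDB
    expects them: password_sha = SHA-1(password + salt), hex-encoded."""
    salt = uuid.uuid4().hex
    sha = hashlib.sha1((password + salt).encode('utf-8')).hexdigest()
    return {'salt': salt, 'password_sha': sha}
```

Paste the resulting salt and password_sha values into your user document.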

Allow this user to read and write to the mushin database

Again, the best reference is the CouchDB wiki, but the information is easy to miss.
Every database has a _security object under the database name; in the case of mushin, you can get to it in Futon. _security is a special document that does not get versioned, and doesn’t show up in listings either. In fact, it is so special that Futon doesn’t let you change it; when you click save it just resets. So your only option is to PUT the document, for example:

curl -X PUT -d @security.json http://admin:sup3rs3kr3t@localhost:5984/mushin/_security

Oops, see what I did there ? I had to specify my admin password on the command line, and now it’s in my shell history. I did tell you to choose a unique one because it’s going to be all over the place, didn’t I ?

security.json is just the contents of the _security document; just adapt the example on the wiki, put your user under readers, and leave the roles list empty for now.
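For illustration, a security.json along those lines could look like this – ‘thomas’ being a placeholder for your own username:

```json
{
  "admins": {
    "names": [],
    "roles": []
  },
  "readers": {
    "names": ["thomas"],
    "roles": []
  }
}
```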

Test denial

This one is simple; just try to GET the database:

$ curl http://localhost:5984/mushin
{"error":"unauthorized","reason":"You are not authorized to access this db."}

If you did it right, you should see the same error. If you’re brave, you can retry the same curl command, but add your username and password. But you know how we feel about that.

Getting Things Done with CouchDB, part 2: Security

Filed under: couchdb,Hacking,mushin,Python — Thomas @ 4:26 pm


(… wherein I disappoint the readers who were planning to drop my database)

So, I assume you now know what mushin is.

The goal for my nine/eleven hack day was simply to add authentication everywhere to mushin, making it as user-friendly as possible, and as secure as couchdb was going to let me.

Now, CouchDB’s security story has always been a little confusing to me. It’s gotten better over the years, though, so it was time to revisit it and see how far we could get.

By default, CouchDB listens only on localhost, uses plaintext HTTP, and is in Admin Party mode. Which means, anyone is an admin, and anyone who can do a request on localhost can create and delete databases or documents. This is really useful for playing around with CouchDB, learning its REST API using curl. So easy in fact that it’s hard to go away from that simplicity (I would not be surprised to find that there are companies out there running couchdb on localhost and unprotected).


What can users do ? What type of users are there ?

In a nutshell, couchdb has three levels of permissions:

  • server admin: can do anything; create, delete databases, replicate, … Is server wide. Think of it as root for couchdb.
  • database admin: can do anything to a database; including changing design documents
  • database reader: can read documents from the database, and (confusingly) write normal documents, but not design documents

CouchDB: The Definitive Guide sadly only mentions the server admin and admin party, and is not as definitive as its title suggests. A slightly better reference is the CouchDB wiki (although I still haven’t cleared up what roles are to be used for, beside the internal _admin role). By far the clearest explanation of security-related concepts in CouchDB is in Jan Lehnardt’s Couchbase blog post.

I’ll come back to how these different objects/permissions get configured in a later post.


How do you tell CouchDB who you are, so it can decide what it lets you do ?

By default, CouchDB has the following authentication handlers:

  • OAuth
  • cookie authentication
  • HTTP basic authentication, RFC 2617

But wait a minute… To tell the database who I am, I have a choice between OAuth (which isn’t documented anywhere, and there doesn’t seem to be an actual working example of it, but I assume this was contributed by desktopcouch), cookie authentication (which creates a session and a cookie for later use, but to create the session you need to use a different authentication mechanism in the first place), or basic authentication (which is easy to sniff).

So, in practice, at some point the password is going to be sent over plaintext, and your choice basically is between once, a few times (every time you let your cookie time out, which happens after ten minutes, although you get a new cookie on every request), or every single time. I’m not a security expert, but that doesn’t sound good enough.

And typically, my solution would be to switch to https:// and SSL to solve that part. Since CouchDB 1.1 this is included, although I haven’t tried it yet (since I’m still on couchdb 1.0.3 because that’s what I got working on my phone)

Now, my use case is a command-line application. This adds some additional security and usability concerns:

  • When invoking gtd (the mushin command-line client) to add a task, it would be great if I didn’t have to specify my username and password every single time. Luckily, gtd can be started as a command line interpreter, so that helps.
  • It would be great if I didn’t have to specify a password on the command line, either as part of a URL (for example, when replicating) or as an option. I really hate to see passwords either in the process list or in shell history or in a config file, and typically I will use my lowest-quality passwords for apps that force me to do this, and want to avoid writing software that has no other option.

The Plan

In the end, the plan of attack started to clear up:

  • Get away from Admin Party in CouchDB, add an admin user
  • Create a new user in the _users database in CouchDB, with the same name as my system username
  • Create a _security object on the mushin database, and allow my username as a reader.
  • To connect to couchdb from gtd, use the current OS user, and ask for the password on the terminal
  • Use Paisley’s username/password and basic auth support. This means auth details still go over the network in plaintext. Add an Authenticator class that integrates with Paisley such that, when CouchDB refuses the operation, the authenticator can be asked to provide username and password to repeat the same request with. Together with a simple implementation that asks for the password on the terminal, this handles the security problem of passing the password to the application.
  • Use the cookie mechanism to avoid sending username/password every time. Create a session using the name and password, then store the cookie, and use that instead for the next request. Anytime you get a new cookie, use that from now on. This was relatively easy to do since paisley has changed to use the new twisted.web Agent-based client, and so it was easy to add a cookie-handling Agent together with the cookielib module.

  • A tricky bit was replication. When you replicate, you tell one CouchDB server to replicate to or from a database on another CouchDB server – from the point of view of the first one. On the one hand, CouchDB sometimes gives confusing response codes; for example, a 404 in the case where the remote database refuses access to the db, but a 401 in the case where the local database refuses access. On the other hand, we have to give our couchdb database the authentication information for the other one – again, we have to pass username and password, in plaintext, as part of the POST body for replication. I doubt there is a way to do this with cookies or oauth, although I don’t know. And in any case, you’re not even guaranteed that you can get an oauth token or cookie from the other database, since that database might not even be reachable by you (although this wouldn’t be a common case). The best I could do here is, again, ask for the password on the terminal if username is given but password is not.
  • Don’t log the password anywhere visibly; replace it with asterisks wherever it makes sense (Incidentally, later on I found out that couchdb does exactly the same on its console logging. Excellent.)
  • Upgrade to use CouchDB 1.1 everywhere, and do everything over SSL
  • Figure out OAuth, possibly stealing techniques from desktopcouch. For a command-line client, it would make sense that my os user is allowed to authenticate to a local couchdb instance only once per, say, X session, and a simple ‘gtd add U:5 @home p:mushin add oauth support’ would not ask for a password.
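The Authenticator from the plan can be sketched in a few lines of Python (class name, method names and prompt are my own invention, not mushin’s actual API):

```python
import getpass

class TerminalAuthenticator:
    """Supply credentials when CouchDB answers 401 Unauthorized.

    Defaults to the current OS user and asks for the password on the
    terminal, caching it so a long-running gtd session only asks once.
    """

    def __init__(self, username=None, ask=getpass.getpass):
        self.username = username or getpass.getuser()
        self._ask = ask          # injectable, so tests don't need a tty
        self._password = None

    def credentials(self):
        if self._password is None:
            self._password = self._ask('Password for %s: ' % self.username)
        return (self.username, self._password)
```

The calling code retries the failed request with whatever credentials() returns, and the password never appears on the command line or in a config file.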

I made it pretty far down the list, stopping short at upgrading to couchdb 1.1.
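As an example, the password-masking item boils down to a helper along these lines (my own sketch, not mushin’s actual code):

```python
from urllib.parse import urlsplit, urlunsplit

def mask_password(url):
    """Return the URL with any password replaced by asterisks,
    safe for logging and console output."""
    parts = urlsplit(url)
    if parts.password:
        netloc = parts.netloc.replace(':%s@' % parts.password, ':****@')
        parts = parts._replace(netloc=netloc)
    return urlunsplit(parts)
```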

But at least tomorrow at work, people will not be able to get at my tasks on their first attempt. (Not that there’s anything secret in there anyway, but I digress)

Getting Things Done with CouchDB, part 1: how did I get here?

Filed under: couchdb,Hacking,mushin — Thomas @ 8:47 pm


(… where I spend a whole post setting the scene)

Today is a day off in Barcelona – to some people it’s a national holiday. To me, it’s a big old opportunity to spend a whole day hacking on something I want to work on for myself.

And today, I wanted to go back to hacking on mushin. What is mushin, you ask? I thought you’d never ask. It’s going to take all of this post to explain, and I won’t even get to today’s hack part.

mushin is an application to implement the Getting Things Done approach. I follow this approach with varying degrees of success, but it’s what has worked best for me so far.

I was never really happy with any of the tools available that claimed to implement it. For me, the basic requirements are:

  • available off-line (no online-only solutions)
  • data available across devices. I use my laptop, home desktop, and work desktop regularly; and when I’m in the store and only have my phone with me, I want to be able to see what I was supposed to buy whenever I’m in the @shopping context
  • easy to add new tasks; preferably a command-line. I add tasks during meetings, when I’m on the phone, when I’m talking to someone, or simply typing them on my phone, so it has to be easy and quick.

This excluded many solutions at the time I first started looking for one. I recall RememberTheMilk was popular at the time, but since I was spending at least four hours a week on plane trips back then, and planes were excellent places to do GTD reviewing, it was simply not an option for me.

I don’t know if Getting Things GNOME! already existed back then. When I first looked at it, it was basically a local-only application, but since then it’s evolved to let you synchronize tasks online, although that still looks like an added-on feature instead of an integral design choice. I should try it again someday.

Anyway, I ended up using yagtd, which is a command-line application operating on a text file. I put the text file in subversion, and then proceeded to use it across three computers (back then I did not have a real smartphone yet), and cursing every time I forgot to update from subversion or commit to subversion. At least the conflicts were usually easy to manage since yagtd basically stores one line per ‘thing’.

And then I discovered CouchDB and I did what they told me to – I relaxed. I created a personal project called ‘things’ that took most of yagtd’s functionality but put all the data in CouchDB. CouchDB solved two of the three requirements of my list above – it promised to make it possible to have my data available locally, even when offline, and to be able to synchronize it across devices. (Of course, I later figured out that it’s nice in theory but not that simple in practice – but the basics are there)

I really liked the name ‘things’ – because, you know, if you’re writing a GTD application, the things you are doing are, actually, ‘things’. But I realized it was a stupid, ungoogleable name, so I ended up going for something Japanesey that was close enough to ‘mind like water’, and stumbled on the name ‘mushin’. (Bonus: it starts with an m, and for some reason I’m accumulating personal projects that start with m.)

So I happily hacked away on mushin, making it have the same featureset as yagtd, but with couchdb as the backend. Originally I used python-couchdb, since for a command-line application it’s not strictly necessary to write non-blocking code. This was almost three years ago, and I’ve been using this application pretty much every day since then. (In fact, I have 2153 open things to do, and a well-rested mind that typically isn’t too concerned about forgetting about stuff, because everything I need to worry about is *somewhere* in those 2153 open things. And on some days that statement is truer than on others!)

I wonder how many people by now think I’m a classic case of NIH – surely lots of people are happily using tools for GTD already. The way I convinced myself that it made sense to create this tool is because I was incredibly excited about the promise of CouchDB (and I still am, although I’m confused about what’s going on in CouchDB land, but more on that in another post).

Maybe I was a rarity back then, with my work desktop in Barcelona, my laptop, and my home desktop in Belgium, and wanting my data in all three, all the time. In the back of my mind I was still planning to write the Ultimate Music Application, with automatic synchronization of tracks and ratings synchronized across devices, and I thought that a simple GTD application would be an excellent testing ground to see if CouchDB really could deliver on the promises it was making to my future self.

Over time, I adapted mushin. At some point, I got an N900, and I wanted mushin to work on it, and a command-line client didn’t make that much sense. So I wrote a GUI frontend for it, and it was time to port mushin over to use Twisted, so that all calls could be done asynchronously and integrate nicely with the GUI. I switched from python-couchdb to Paisley, a CouchDB client for Twisted. (At one time I was even tricked into thinking I was the maintainer, appointed by the previous maintainer according to the last-touched rule, but then someone else happily forked Paisley from under me, and it now lives on github.) I copied over the python-couchdb Document/Schema/Mapping code, because I liked how it worked.

And there I had a Maemo client, using most of the same code, and with access to the same data as all my desktops and laptop. I had reached my requirements, in a fashion.

It wasn’t ideal: replication had to be triggered manually. I had a command to do so, but any restart of a couchdb server, or being offline too long (a few hours of my laptop not being online or even on, for example), would break the replication. In practice, you still somehow need to initiate the replication, or write code to do that for you. Especially with my phone, which I usually don’t have online, it’s easy to forget, and to find yourself at the store without having synced first, not remembering exactly what it was you were supposed to buy. But it was good enough.

But I never really mentioned anything about this project, and the reason was simple. I was using CouchDB in Admin Party mode (anyone can do anything), and to be able to replicate (without fiddling with tunnels and the like) I had couchdb listening on all interfaces. So if anyone had known I was using this tool, it would have been very easy to take a look at all my tasks (bad), assign me some new tasks (even worse), or, you know, drop my whole database of things (I used to think that was bad, but today I’m not so convinced anymore?)

So I decided to rely on security by obscurity, and never write about my GTD application.

But now I did, so if you’re still with me, you may be excited about the prospect of getting onto a network of mine and dropping my database, to release me from the mental pressure of 2153 things to be done?

Ah, well… That’s what part 2 is going to be about. Stay tuned.

In the meantime, I converted my SVN repository to git and threw it on github. I only run it uninstalled at the moment, not ready yet to be packaged, but hey, if you’re brave… have a go.

More adventures in puppet

Filed under: General,Hacking,sysadmin — Thomas @ 11:32 pm


After last week’s Linode incident I was getting a bit more worried about security than usual. That coincided with the fact that I found I couldn’t run puppet on one of my linodes, and some digging turned up that it was because /tmp was owned by uid:gid 1000:1000. Since I didn’t know the details of the breakin (and I hadn’t slept more than 4 hours for two nights, one of which involved a Flumotion DVB problem), I had no choice but to be paranoid about it. And it took me a good half hour to realize that I had inflicted this problem on myself – a botched rsync command (rsync -arv . root@somehost:/tmp).

So I wasn’t hacked, but I still felt I needed to tighten security a bit. So I thought I’d go with something simple to deploy using puppet – port knocking.

Now, that would be pretty easy to do if I just deployed firewall rules in a single set. But I started deploying firewall rules using the puppetlabs firewall module, which allows me to group rules per service. So that’s the direction I wanted to head off in.

On Saturday, I worked on remembering enough iptables to actually understand how port knocking works in a firewall. Among other things, I realized that our current port knocking is not ideal – it uses only two ports. They’re in descending order, so they would usually not be triggered by a normal port scan, but they would be triggered by one in reverse order. That is probably why most sources recommend using three ports, with the third port between the first two, so the sequence is out of order.
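To convince myself that the out-of-order sequence actually defends against straight port scans, here’s a toy simulation of the knock state machine in Python (my own sketch – it ignores the time window the real iptables rules enforce):

```python
# Toy model of the three-port knock sequence, to sanity-check the ordering
# argument. It ignores the 5-second window of the real iptables rules.
PORTS = (1234, 3456, 2345)  # first, second, third knock

def knocked(attempts, ports=PORTS):
    """Return True if the list of contacted ports completes the knock."""
    stage = 0
    for port in attempts:
        if port == ports[stage]:
            stage += 1
        else:
            # a wrong knock clears the lists, but may itself count as a
            # fresh first knock
            stage = 1 if port == ports[0] else 0
        if stage == len(ports):
            return True
    return False
```

With only two descending ports, a descending scan would walk straight in; with the third port between the first two, neither an ascending nor a descending scan completes the sequence.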

So I wanted to start by getting the rules right, and understanding them. I started with this post, and found a few problems in it that I managed to work out. The fixed version is this:

# Comma separated list of ports to protect with no spaces.
SERVICES=22,3306
# Location of iptables command
IPTABLES=/sbin/iptables
# Interface the uplink rule matches on (the original snippet assumed
# these three variables were set elsewhere; values here are examples)
UPLINK=eth0

# in stage1, connects on 3456 get added to knock2 list
${IPTABLES} -N stage1
${IPTABLES} -A stage1 -m recent --remove --name knock
${IPTABLES} -A stage1 -p tcp --dport 3456 -m recent --set --name knock2

# in stage2, connects on 2345 get added to heaven list
${IPTABLES} -N stage2
${IPTABLES} -A stage2 -m recent --remove --name knock2
${IPTABLES} -A stage2 -p tcp --dport 2345 -m recent --set --name heaven

# at the door:
# - jump to stage2 with a shot at heaven if you're on list knock2
# - jump to stage1 with a shot at knock2 if you're on list knock
# - get on knock list if connecting to 1234
${IPTABLES} -N door
${IPTABLES} -A door -m recent --rcheck --seconds 5 --name knock2 -j stage2
${IPTABLES} -A door -m recent --rcheck --seconds 5 --name knock -j stage1
${IPTABLES} -A door -p tcp --dport 1234 -m recent --set --name knock

${IPTABLES} -A INPUT -p tcp --match multiport --dport ${SERVICES} -i ${UPLINK} -m recent --rcheck --seconds 5 --name heaven -j ACCEPT
${IPTABLES} -A INPUT -p tcp --syn -j door

# close everything else
${IPTABLES} -A INPUT -j REJECT --reject-with icmp-port-unreachable

And it gives me this iptables state:

So the next step was to reproduce these rules using puppet firewall rules.

Immediately I ran into the first problem – we need to add new chains, and there doesn’t seem to be a way to do that in the firewall resource. At the same time, it uses the recent iptables module, and none of that is implemented either. I spent a bunch of hours trying to add this, but since I don’t really know Ruby and I’ve only started using Puppet for real in the last two weeks, that wasn’t working out well. So then I thought, why not look in the bug tracker and see if anyone else tried to do this ? I ask my chains question on IRC, while I find a ticket about recent support. A minute later danblack replies on IRC with a link to a branch that supports creating chains – the same person that made the recent branch.

This must be a sign – the same person helping me with my problem in two different ways, with two branches? Today will be a git-merging to-the-death hacking session, fueled by the leftovers of yesterday’s mexicaganza.

I start with the branch that lets you create chains, which works well enough, bar some documentation issues. I create a new branch and merge this one on, ending up in a clean rebase.

Next is the recent branch. I merge that one on. I choose to merge in this case, because I hope it will be easier to make the fixes needed in both branches, but still pull everything together on my portknock branch, and merge in updates every time.

This branch has more issues – rake test doesn’t even pass. So I start digging through the failing testcases, adding print debugs and learning just enough ruby to be dangerous.

I slowly get better at fixing bugs. I create minimal .pp files in my /etc/puppet/manifests so I can test just one rule with e.g. puppet apply manifests/recent.pp

The firewall module hinges around being able to convert a rule to a hash as expressed in puppet, and back again, so that puppet can know that a rule is already present and does not need to be executed. I add a conversion unit test for each of the features that tests these basic operations, but I end up actually fixing the bugs by sprinkling print’s and testing with a single apply.

I learn to do service iptables restart; service iptables stop to reset my firewall and start cleanly. It takes me a while to realize when I’ve botched the firewall so badly that I can’t even google (in my case, forgetting to have -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT) – not helped by the fact that for the last two weeks the network on my home desktop has been really flaky, and simply stops working after some activity, forcing me to restart NetworkManager and reload network modules.

I start getting an intuition for how puppet’s basic resource model works. For example, if a second puppet run produces output, something’s wrong. I end up fixing lots of parsing bugs because of that – once I notice that a run tells me something like

notice: /Firewall[999 drop all other requests]/chain: chain changed '-p' to 'INPUT'
notice: Firewall[999 drop all other requests](provider=iptables): Properties changed - updating rule

I know that, even though the result seems to work, I have some parsing bug, and I can attack that bug by adding another unit test and adding more prints for a simple rule.

I learn that, even though the run may seem clean, if the module didn’t figure out that it already had a rule (again, because of bogus parsing), it just adds the same rule again – another thing we don’t want. That gets fixed on a few branches too.

And then I get to the point where my puppet apply brings all the rules together – except it still does not work. And I notice one little missing rule: ${IPTABLES} -A INPUT -p tcp --syn -j door

And I learn about --syn, and --tcp-flags, and to my dismay, there is no support for tcp-flags anywhere. There is a ticket for TCP flags matching support, but nobody worked on it.

So I think, how hard can it be, with everything I’ve learned today? And I get onto it. It turns out it’s harder than expected. Before today, all firewall resource properties swallowed exactly one argument – for example, -p (proto). In the recent module, some properties are flags, and don’t have an argument, so I had to support that with some hacks.

The rule_to_hash function works by taking an iptables rule line, and stripping off the parameters from the back in reverse order one by one, but leaving the arguments there. At the end, it has a list of keys it saw, and hopefully, a string of arguments that match the keys, but in reverse order. (I would have done this by stripping the line of both parameter and argument(s) and putting those on a list, but that’s just me)
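For what it’s worth, the approach from that parenthetical could look roughly like this (a toy Python sketch with my own parameter table, not the module’s actual Ruby rule_to_hash):

```python
# Map each iptables parameter to a (property name, argument count) pair.
# The table and property names are my own, not the firewall module's.
KNOWN_PARAMS = {
    '-A':          ('chain', 1),
    '-p':          ('proto', 1),
    '-m':          ('module', 1),
    '-j':          ('jump', 1),
    '--dport':     ('dport', 1),
    '--name':      ('recent_name', 1),
    '--set':       ('recent_set', 0),   # flag: takes no argument
    '--tcp-flags': ('tcp_flags', 2),    # two arguments: mask, comparison
}

def rule_to_hash(line):
    """Parse an iptables rule line into a property hash, consuming each
    parameter together with its argument(s)."""
    tokens = line.split()
    result = {}
    i = 0
    while i < len(tokens):
        param = tokens[i]
        if param in KNOWN_PARAMS:
            name, nargs = KNOWN_PARAMS[param]
            if nargs == 0:
                result[name] = True
            elif nargs == 1:
                result[name] = tokens[i + 1]
            else:
                result[name] = ' '.join(tokens[i + 1:i + 1 + nargs])
            i += 1 + nargs
        else:
            i += 1  # unknown token, skip it
    return result
```

In this scheme a parameter that takes two arguments, like --tcp-flags, is just another entry in the table rather than a special case.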

But the --tcp-flags parameter takes two arguments – a mask of flags, and a list of flags that need to be set. So I hack it in by adding double quotes around it, so it looks the same way a --comment does (except --comment is always quoted in iptables --list-rules output), and handle it specially. But after some fidgeting, that works too!

And my final screenshot for the day:

So, today’s result:

Now, I have a working node that implements port knocking:

node 'ana' {

  $port1 = '1234'
  $port2 = '3456'
  $port3 = '2345'

  $dports = [22, 3306]

  $seconds = 5

  firewall { "000 accept all icmp requests":
    proto  => "icmp",
    action => "accept",
  }

  firewall { "001 accept all established connections":
    proto  => "all",
    state  => ["RELATED", "ESTABLISHED"],
    action => "accept",
  }

  firewall { "999 drop all other requests":
    chain  => "INPUT",
    proto  => "tcp",
    action => "reject",
  }

  firewallchain { [':stage1:', ':stage2:', ':door:']:
    ensure => present,
  }

  # door
  firewall { "098 knock2 goes to stage2":
    chain          => "door",
    recent_command => "rcheck",
    recent_name    => "knock2",
    recent_seconds => $seconds,
    jump           => "stage2",
    require        => [
      Firewallchain[':door:'],
      Firewallchain[':stage2:'],
    ],
  }

  firewall { "099 knock goes to stage1":
    chain          => "door",
    recent_command => "rcheck",
    recent_name    => "knock",
    recent_seconds => $seconds,
    jump           => "stage1",
    require        => [
      Firewallchain[':door:'],
      Firewallchain[':stage1:'],
    ],
  }

  firewall { "100 knock on port $port1 sets knock":
    chain          => "door",
    proto          => 'tcp',
    recent_name    => 'knock',
    recent_command => 'set',
    dport          => $port1,
    require        => [
      Firewallchain[':door:'],
    ],
  }

  # stage 1
  firewall { "101 stage1 remove knock":
    chain          => "stage1",
    recent_name    => "knock",
    recent_command => "remove",
    require        => Firewallchain[':stage1:'],
  }

  firewall { "102 stage1 set knock2 on $port2":
    chain          => "stage1",
    recent_name    => "knock2",
    recent_command => "set",
    proto          => "tcp",
    dport          => $port2,
    require        => Firewallchain[':stage1:'],
  }

  # stage 2
  firewall { "103 stage2 remove knock":
    chain          => "stage2",
    recent_name    => "knock",
    recent_command => "remove",
    require        => Firewallchain[':stage2:'],
  }

  firewall { "104 stage2 set heaven on $port3":
    chain          => "stage2",
    recent_name    => "heaven",
    recent_command => "set",
    proto          => "tcp",
    dport          => $port3,
    require        => Firewallchain[':stage2:'],
  }

  # let people in heaven
  firewall { "105 heaven let connections through":
    chain          => "INPUT",
    proto          => "tcp",
    recent_command => "rcheck",
    recent_name    => "heaven",
    recent_seconds => $seconds,
    dport          => $dports,
    action         => accept,
    require        => Firewallchain[':stage2:'],
  }

  firewall { "106 connection initiation to door":
    # FIXME: specifying chain explicitly breaks insert_order !
    chain     => "INPUT",
    proto     => "tcp",
    tcp_flags => "FIN,SYN,RST,ACK SYN",
    jump      => "door",
    require   => [
      Firewallchain[':door:'],
    ],
  }
}

and I can log in with

nc -w 1 ana 1234; nc -w 1 ana 3456; nc -w 1 ana 2345; ssh -A ana

Lessons learned today:

  • watch iptables -nvL is an absolutely excellent way of learning more about your firewall – you see your rules and the traffic on them in real time. It made it really easy to see for example the first nc command triggering the knock.
  • Puppet is reasonably hackable – I was learning quickly as I progressed through test and bug after test and bug.
  • I still don’t like ruby, and we may never be friends, but at least it’s something I’m capable of learning. Puppet might just end up being the trigger.

Tomorrow, I need to clean up the firewall rules into something reusable, and deploy it on the platform.

Collabora and Fluendo collaborate fluently!

Filed under: Fluendo,GStreamer — Thomas @ 1:01 pm


Well, this sure has been a long time in the making.

Fluendo and Collabora have a checkered past which I won’t get into, but on paper it has always made sense for these two companies to collaborate on making GStreamer work commercially. One company specializes in products, the other in consulting (I’m sure you can figure out which is which), and they complement each other perfectly to make GStreamer more successful commercially.

I personally have always believed that we need to get GStreamer onto other platforms and make it as easy to use as possible. Windows was an obvious target in the past, and now Android is another. There is a big difference between a successful open source project and a commercially successful one. Flumotion’s Andoni Morales, who came with me to the GStreamer 0.11 hackfest in Malaga, is going to be working on this one SDK to rule them all.

Christian beat me to it in the blogosphere, but the word is now officially out! Feel free to read Fluendo’s press release.
