|
|
After piecing together the security story of CouchDB as it applies to mushin, I secured the mushin database on various machines. This serves as a quick setup guide for security for mushin, but I think it's useful for other people using CouchDB.
Stop using Admin Party
This is easy to do in Futon (link only works if you run couchdb locally on port 5984, as per default). Jan's blog post explains it perfectly, including screenshots.
Under the hood, couchdb will actually rewrite your local.ini file to add this user - all admin users are stored in the config files. (I'm sure there's an obvious reason for that)
Given that you most likely will use this password in Futon, make sure you pick a unique password - as far as I can tell this password goes over the wire.
Create a user object for your user
explains the basics. You need to create or update the _users database, which is a special couchdb database. You can get to it in Futon. If, like most people, you're still on a couchdb before 1.2.0, you have to fiddle yourself to calculate the password_sha field, but at least the page explains how to do it. Not the most user-friendly thing to do in the world, so I'm considering adding commands for this to a different application I'm working on.
Allow this user to read and write to the mushin database
Again, the best reference is the CouchDB wiki, but the information is easy to miss.
Every database has a _security object under the database name; in the case of mushin, you can get to it in Futon. _security is a special document that does not get versioned, and doesn't show up in listings either. In fact, it is so special that Futon doesn't let you change it; when you click save it just resets. So your only option is to PUT the document, for example:
curl -X PUT -d @security.json http://admin:sup3rs3kr3t@localhost:5984/mushin/_security
Oops, see what I did there ? I had to specify my admin password on the command line, and now it's in my shell history. I did tell you to choose a unique one because it's going to be all over the place, didn't I ?
security.json is just the contents of the _security document; just adapt the example on the wiki, and put your user under readers, and leave the role empty for now.
test denial
This one is simple; just try to GET the database:
$ curl http://localhost:5984/mushin
{"error":"unauthorized","reason":"You are not authorized to access this db."}
If you did it right, you should see the same error. If you're brave, you can retry the same curl command, but add your username and password. But you know how we feel about that.
After piecing together the security story of CouchDB as it applies to mushin, I secured the mushin database on various machines. This serves as a quick setup guide for security...
(... wherein I disappoint the readers who were planning to drop my database)
So, I assume you now know what mushin is.
The goal for my nine/eleven hack day was simply to add authentication everywhere to mushin, making it as user-friendly as possible, and as secure as couchdb was going to let me.
Now, CouchDB's security story has always been a little confusing to me. It's gotten better over the years, though, so it was time to revisit it and see how far we could get.
By default, CouchDB listens only on localhost, uses plaintext HTTP, and is in Admin Party mode. Which means, anyone is an admin, and anyone who can do a request on localhost can create and delete databases or documents. This is really useful for playing around with CouchDB, learning its REST API using curl. So easy in fact that it's hard to go away from that simplicity (I would not be surprised to find that there are companies out there running couchdb on localhost and unprotected).
Authorization
What can users do ? What type of users are there ?
In a nutshell, couchdb has three levels of permissions:
- server admin: can do anything; create, delete databases, replicate, ... Is server wide. Think of it as root for couchdb.
- database admin: can do anything to a database; including changing design documents
- database reader: can read documents from the database, and (confusingly) write normal documents, but not design documents
CouchDB: The Definitive Guide sadly only mentions the server admin and admin party, and is not as definitive as its title suggests. A slightly better reference is (although I still haven't cleared up what roles are to be used for, beside the internal _admin role).
By far the clearest explanation of security-related concepts in CouchDB is in Jan Lernhardt's CouchBase blog post.
I'll come back to how these different objects/permissions get configured in a later post.
Authentication
How do you tell CouchDB who you are, so it can decide what it lets you do ?
By default, CouchDB has the following authentication handlers:
- OAuth
- cookie authentication
- HTTP basic authentication, RFC 2617
But wait a minute... To tell the database who I am, I have a choice between OAuth (which isn't documented anywhere, and there doesn't seem to be an actual working example of it, but I assume this was contributed by desktopcouch), cookie authentication (which creates a session and a cookie for later use, but to create the session you need to use a different authentication mechanism in the first place), or basic authentication (which is easy to sniff).
So, in practice, at some point the password is going to be sent over plaintext, and your choice basically is between once, a few times (every time you let your cookie time out, which happens after ten minutes, although you get a new cookie on every request), or every single time. I'm not a security expert, but that doesn't sound good enough.
And typically, my solution would be to switch to https:// and SSL to solve that part. Since CouchDB 1.1 this is included, although I haven't tried it yet (since I'm still on couchdb 1.0.3 because that's what I got working on my phone)
Now, my use case is a command-line application. This adds some additional security and usability concerns:
- When invoking gtd (the mushin command-line client) to add a task, it would be great if I didn't have to specify my username and password every single time. Luckily, gtd can be started as a command line interpreter, so that helps.
- It would be great if I didn't have to specify a password on the command line, either as part of a URL (for example, when replicating) or as an option. I really hate to see passwords either in the process list or in shell history or in a config file, and typically I will use my lowest-quality passwords for apps that force me to do this, and want to avoid writing software that has no other option.
The Plan
In the end, the plan of attack started to clear up:
I made it pretty far down the list, stopping short at upgrading to couchdb 1.1
But at least tomorrow at work, people will not be able to get at my tasks on their first attempt. (Not that there's anything secret in there anyway, but I digress)
(... wherein I disappoint the readers who were planning to drop my database) So, I assume you now know what mushin is. The goal for my nine/eleven hack day was...
After ripping over a 1000 CD's perfectly, and having problems on a few (bad discs, weird audio, a few small niggles to fix), I ran into a fun failure.
Apparently, the file name in u'morituri/Sufjan Stevens - Illinois/02. Sufjan Stevens - The Black Hawk War, or, How to Demolish an Entire Civilization and Still Feel Good About Yourself in the Morning, or, We Apologize for the Inconvenience but You\'re Going to Have to Leave Now, or, "I Have Fought the Big Knives and Will Continue to Fight Them Until They Are Off Our Lands!".flac' is too long for my NAS.
Thank you Mister Sufjan. In your honour, I added a function to morituri to shrink the filename to a power of two minus one, below either the given length or 128 characters, whichever is less. For now the algorithm splits on spaces and changed the file name to morituri/Sufjan Stevens - Illinois/02. Sufjan Stevens - The Black Hawk War, or, How to Demolish an Entire Civilization and Still Feel Good About Yourself.flac
That is good enough for me... I was worried I had to teach this one tiny function about keeping quoted pieces together, or how comma's work, or how 'or, ' works, and so on, just to satisfy my crazy sense of aesthetics.
After ripping over a 1000 CD's perfectly, and having problems on a few (bad discs, weird audio, a few small niggles to fix), I ran into a fun failure. Apparently,...
This week at work we ran into a problem where one of our Python processes was consuming close to 3 GB of memory because it's not properly cleaning up a list. Because of other bugs this process could not be easily restarted without triggering other problems, so our core team asked for some suggestions and I told them "Why don't you try cleaning up the Python list using GDB and the Python C API ?" I had a vague recollection of someone on our team doing something like this a few years ago.
I also asked them to blog about it, because there aren't that many resources readily findable on the subject.
So here is Andoni's take on the problem.
If any Pythonista can suggest how he could have avoided the segfault during garbage collection, please let us know!
This week at work we ran into a problem where one of our Python processes was consuming close to 3 GB of memory because it's not properly cleaning up a...
I have been hacking on Paisley more recently. I actually got to hack during work time on Paisley, because the guys needed a feature I had developed - change notification. But more on that later.
I eked out a four day hacking session over this easter weekend, and my primary goal is to make some good advances on my music database system using CouchDB. The code is still in prototype stage, and I wanted to start removing hacks, tightening things down, and adding tests. But suddenly I found myself having to add a bunch of unicode() calls on data coming back from paisley just because I was being stricter on the input to paisley functions.
I didn't want to have to deal with unicode, again, as it detracted me of the core of my application. But I didn't like paying the technical debt either of not understanding what was going wrong under the hood.
From my limited understanding, JSON is an object notation format used for exchange of information between processes and applications. A JSON string is in unicode, which is great. It would be pretty useless otherwise in today's world. So I should be able to send in unicode to JSON libraries and get unicode back out.
A recent change in Paisley by one of the maintainers prefers simplejson over the stdlib-json. I first thought this change was to blame for my problems.
And yes, when decoding a JSON object, text was returned as str instead of unicode objects. Now, this is only when the text is in fact ASCII and hence works fine both as str and as unicode. And I'm sure opinions will differ here - but I think that a JSON library should *always* deserialize text to the same type of object by default - ie, unicode.
Clearly, simplejson disagrees with me. But I didn't have this problem a few weeks ago, so something changed! What gives? And changing back to json over simplejson didn't fix it either!
After some googling, I stumbled upon this bug report. Apparently, in 2.7, the C-based implementation deserializes ASCII text as str instead of unicode. The Python-based one always returns unicode for text. And in previous Pythons, both always returned unicode for text.
In essence, my problem boiled down to this:
[thomas@ana ~]$ ipython
Python 2.7 (r27:82500, Sep 16 2010, 18:02:00)
Type "copyright", "credits" or "license" for more information.
IPython 0.10.2 -- An enhanced Interactive Python.
? -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help -> Python's own help system.
object? -> Details about 'object'. ?object also works, ?? prints more.
In [1]: from json import decoder
In [2]: decoder.py_scanstring('"str"', 1)
Out[2]: (u'str', 5)
In [3]: decoder.c_scanstring('"str"', 1)
Out[3]: ('str', 5)
versus
[py-2.6] [thomas@ana ~]$ python
Python 2.6.2 (r262:71600, Sep 29 2009, 21:49:07)
[GCC 4.4.1 20090725 (Red Hat 4.4.1-2)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from json import decoder
>>> decoder.py_scanstring('"str"', 1)
(u'str', 5)
>>> decoder.c_scanstring('"str"', 1)
(u'str', 5)
Note how the last command returned a normal str object.
And yes, in the past few weeks since I last tested this, I did indeed upgrade my machine to Fedora 14, pulling in Python 2.7
simplejson seems to always deserialize to str when it can. I would consider that a bug - ie, 'be strict in what you produce'.
As for Paisley, I made a feature-unicode branch on github, and this commit introduces a compatibility pjson module. By default it is STRICT, ie it wants unicode back always, and tests for the buggy behaviour, and does an alternative loads implementation that falls back to the python one. I'm sure some Paisley devs will prefer simplejson still, so you can change STRICTness and prefer simplejson.
Now, back to the hack.
I have been hacking on Paisley more recently. I actually got to hack during work time on Paisley, because the guys needed a feature I had developed - change notification....
« Previous Page — Next Page »
|