thomas.apestaart.org » 2011

Python 2.7, JSON, and unicode

Filed under: couchdb,DAD,General,Python,Twisted — Thomas @ 17:28

2011-04-23
17:28

I have been hacking on Paisley more recently. I even got to hack on it during work time, because the guys needed a feature I had developed: change notification. But more on that later.

I eked out a four-day hacking session over this Easter weekend, and my primary goal is to make some good progress on my music database system using CouchDB. The code is still at the prototype stage, and I wanted to start removing hacks, tightening things down, and adding tests. But suddenly I found myself having to add a bunch of unicode() calls on data coming back from Paisley, just because I was being stricter about the input to Paisley functions.

I didn't want to have to deal with unicode again, as it distracted me from the core of my application. But I didn't want to take on the technical debt of not understanding what was going wrong under the hood either.

From my limited understanding, JSON is an object notation format used for exchanging information between processes and applications. JSON text is Unicode, which is great; it would be pretty useless otherwise in today's world. So I should be able to send unicode into JSON libraries and get unicode back out.

A recent change in Paisley by one of the maintainers made it prefer simplejson over the stdlib json module. At first I thought this change was to blame for my problems.

And yes, when decoding a JSON object, text was returned as str instead of unicode objects. This only happens when the text is in fact ASCII, and hence works fine both as str and as unicode. I'm sure opinions will differ here, but I think a JSON library should *always* deserialize text to the same type of object by default, i.e. unicode.

Clearly, simplejson disagrees with me. But I didn't have this problem a few weeks ago, so something changed! What gives? And switching back from simplejson to json didn't fix it either!

After some googling, I stumbled upon this bug report. Apparently, in 2.7, the C-based implementation deserializes ASCII text as str instead of unicode, while the Python-based one always returns unicode for text. In previous Python versions, both always returned unicode for text.

In essence, my problem boiled down to this:

[thomas@ana ~]$ ipython
Python 2.7 (r27:82500, Sep 16 2010, 18:02:00)
Type "copyright", "credits" or "license" for more information.
IPython 0.10.2 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object'. ?object also works, ?? prints more.
In [1]: from json import decoder
In [2]: decoder.py_scanstring('"str"', 1)
Out[2]: (u'str', 5)
In [3]: decoder.c_scanstring('"str"', 1)
Out[3]: ('str', 5)

versus

[py-2.6] [thomas@ana ~]$ python
Python 2.6.2 (r262:71600, Sep 29 2009, 21:49:07)
[GCC 4.4.1 20090725 (Red Hat 4.4.1-2)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from json import decoder
>>> decoder.py_scanstring('"str"', 1)
(u'str', 5)
>>> decoder.c_scanstring('"str"', 1)
(u'str', 5)

Note how, under Python 2.7, the last command (the C scanner) returned a plain str object, while under Python 2.6 both scanners returned unicode.
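The comparison in those two sessions can be turned into a small probe. This is a minimal sketch, not code from the post: `scanner_types` is a hypothetical helper that calls whichever of `json.decoder.py_scanstring` and `json.decoder.c_scanstring` exist (the latter is `None` when the C accelerator isn't built) and reports the type name each one returns.

```python
from json import decoder


def scanner_types(doc='"str"'):
    """Map each available JSON string scanner to the type name it returns.

    json.decoder always has py_scanstring (pure Python); c_scanstring is the
    C accelerator, or None when that extension is not available.
    """
    results = {}
    for name in ('py_scanstring', 'c_scanstring'):
        scan = getattr(decoder, name, None)
        if scan is None:
            continue  # no C accelerator in this build
        # Index 1 is just past the opening quote, as in the sessions above.
        value, _end = scan(doc, 1)
        results[name] = type(value).__name__
    return results


print(scanner_types())
```

Going by the sessions above, Python 2.7 should report `unicode` for the Python scanner and `str` for the C one. On Python 3 both report `str`, since all decoded text is `str` there.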

And yes, in the few weeks since I last tested this, I did indeed upgrade my machine to Fedora 14, which pulled in Python 2.7.

simplejson seems to always deserialize to str when it can. I would consider that a bug, i.e. 'be strict in what you produce'.

As for Paisley, I made a feature-unicode branch on github, and this commit introduces a compatibility pjson module. By default it is STRICT, i.e. it always wants unicode back: it tests for the buggy behaviour and, if it finds it, uses an alternative loads implementation that falls back to the Python one. I'm sure some Paisley devs will still prefer simplejson, so you can turn STRICTness off and prefer simplejson.

Now, back to the hack.


Mac userfriendliness

Filed under: General — Thomas @ 10:53

2011-04-17
10:53

I never really got why the Mac is thought to be so userfriendly. This weekend we planned to backup and upgrade an older 2007 iMac running 10.4 Tiger to 10.5 Leopard.

I first wanted to make sure we could make a bootable copy of the hard drive. We got a WD MyBook Studio, supposedly the drive you'd get for a Mac, with a fancy e-ink display showing the name and space left.

When attached over FireWire, it was not recognized at all. Over USB it worked, and it recommended we do a firmware upgrade. After doing the firmware upgrade and rebooting, the drive wouldn't light up anymore. A heavy paperweight, essentially. Trickery with dmesg showed that something does get recognized on the USB port, but that's it. After an hour and a half of trying out various firmware uploading tools, we gave up, sent it back to the store, and settled for a standard no-additional-firmware USB drive.

We let Superduper run overnight, backing up 120 GB of drive in 4 hours, and we were good to go (incidentally, I created two boot partitions, so after I named the first one 'Bootie', the second one named itself: A for the system drive, B for the first boot drive, and C for the second.)

Then comes the reboot. You're supposed to hold the Option key during boot. With Macs, this always gives me anxiety: do you start holding down a key before or after you turn it on? Can you hold it down while you are rebooting? When can you let go? There are simply no clues. Between the sound, the grey screen, and the apple, you have no idea what is going on. At least with the usual PC boot screens you can check for common problems like 'is the keyboard even working'. I get Apple Anxiety all the time.

In this particular case, it turns out I misremembered which key Option was in the first place. I was holding down that four-lobed rotated clover key, but Option is apparently the Alt/railroad join key.

How is not labeling a key with the same name your software uses considered userfriendly by anyone?

After holding down that railroad join key from before the reboot until a boot menu popped up, we could choose the Bootie drive and boot from it. At least that bit was easy to use, and worked.

Make another backup just in case, then reboot with an official install CD of 10.5 from work.

This time you have to hold down the C key. I still don't understand why having to search the net for random information JUST so you can boot from a CD is so much better than a simple boot menu and a prompt to get into it.

And after a lot of whirring and booting into the installer, it simply pops up a message saying 'Mac OS X cannot be installed on the computer. This software cannot be installed on this computer.'

No further explanation. How hard could it be to tell me ?

Some googling revealed that grey install discs are tied to a specific model, and this disc came from a MacMini.

And again, after much googling, it looks like the 30 euro retail version of Snow Leopard can be installed over Tiger on Intel Macs, so maybe we should just wait until Monday to upgrade.

Now, if only we could actually get the CD out of the drive. The installer runs from the CD, so it doesn't let you eject it, and when you reboot I don't know the magic key combination to eject... and I just *know* I've had to do this before on a MacMini, and all I remember is that it was some stupid combination of tricks...


Debugging sshd on N900 (after restore from backup)

Filed under: maemo — Thomas @ 20:52

2011-04-14
20:52

I got my N900 back. The micro-USB port had broken off cleanly; apparently that's a known thing. I brought it to a Nokia Support Centre and they fixed it in two weeks. So much for the good news.

The bad news is that, even though they only reattached a connector, they still wiped the phone itself. I had a backup from the backup application and a dirvish backup of /home/user.

But obviously that's not enough. First of all, restoring from the backup takes a while: it reinstalled 110 apps, 20 of which stopped the installation because they want me to click OK to install. Sigh.

Transferring 30 GB of old data is slow no matter which way you cut it.

I lost my Angry Birds levels and the Sygic Mobile Maps app, and I'm still trying to figure out if I can get those back easily.

But at the moment I'm mostly annoyed at how hard it was to figure out why I can't ssh as user@ to the N900 anymore.

So, here are the steps I took:

$ sudo gainroot
# apt-get install sysklogd
# vi /etc/syslog.conf
(uncomment the line that says: auth,authpriv.* /var/log/auth.log)
# vi /etc/ssh/sshd_config
change LogLevel to DEBUG3
# killall sshd
(doing /etc/init.d/ssh restart does not actually get the job done; neither does initctl stop sshd)
# tail -f /var/log/auth.log
(finally the log is there)
try and log in

Now the log tells me:
Apr 7 12:51:31 Nokia-N900 sshd[2266]: User user not allowed because account is locked

And a quick look at /etc/passwd shows ! as the password, meaning the account is locked out.

So...

# passwd user
(pick a password)

then try and log in again using ssh keys, and now it works.

I had no idea an account needed a usable password before sshd would let you log in with ssh keys (which don't use the password at all).
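The log message and the `!` in the password field fit together: as far as I know, sshd on Linux treats an account whose password field starts with `!` as locked and refuses it even for key-based logins, at least when PAM isn't doing the account checks. Below is a minimal sketch of that check on a passwd-format line. The helper names are hypothetical and the entries are made up, not copied from an N900. On systems using shadow passwords, the field in /etc/passwd is just `x`, and the `!` would live in /etc/shadow instead.

```python
def password_field(passwd_line):
    """Return the password field (second column) of a passwd-format line."""
    fields = passwd_line.rstrip('\n').split(':')
    if len(fields) != 7:
        raise ValueError('not a passwd entry: %r' % passwd_line)
    return fields[1]


def looks_locked(passwd_line):
    """Heuristic: a password field starting with '!' marks a locked account."""
    return password_field(passwd_line).startswith('!')


# Made-up example entries.
print(looks_locked('user:!:1000:1000::/home/user:/bin/sh'))             # True
print(looks_locked('user:$6$salt$hash:1000:1000::/home/user:/bin/sh'))  # False
```

Running `passwd user` replaces that `!` with a real hash, which is why the key-based login started working afterwards.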

After this, don't forget to set logging back to INFO lest you fill up your limited disk space with useless debug info.
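Flipping LogLevel between DEBUG3 and INFO in /etc/ssh/sshd_config is easy to script. Here is a minimal sketch, assuming you read the file, pass its text through the hypothetical `set_log_level` helper, write it back as root, and restart sshd as in the steps above. It rewrites the first LogLevel line (commented out or not) and appends one if none exists.

```python
import re

# Matches a LogLevel line, optionally commented out, without crossing lines.
_LOGLEVEL_RE = re.compile(r'^[ \t]*#?[ \t]*LogLevel\b.*$', re.MULTILINE)


def set_log_level(config_text, level='INFO'):
    """Return sshd_config text with LogLevel set to `level`.

    Replaces the first LogLevel line (commented or not); appends one if
    there is none.
    """
    new_line = 'LogLevel %s' % level
    if _LOGLEVEL_RE.search(config_text):
        return _LOGLEVEL_RE.sub(new_line, config_text, count=1)
    if config_text and not config_text.endswith('\n'):
        config_text += '\n'
    return config_text + new_line + '\n'
```

Only the first occurrence is replaced because sshd uses the first value it finds for a keyword. As a consequence, a prose comment that happens to start with "# LogLevel" would also be rewritten, so check the result before writing it back.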


kslowd000 and friends

Filed under: Question,sysadmin — Thomas @ 20:43

2011-04-07
20:43

Ever since upgrading to Fedora 14, my desktop has felt sluggish. It was more than the typical boiling-frog kind of sluggishness, where everything feels snappy just after you buy a fancy new computer and freshly install a recent OS on it, and then performance slowly degrades over time until you wonder why computers are always so slow. Sure, it looks like Evolution, after a round of memory management improvements, has gone back to being a memory hog. But this time it was more than that. The desktop would go through short phases of unresponsiveness and then come back. Load was consistently around 1 or more, for no apparent reason at all.

After watching top for a while, I noticed a process called kslowd[xxx] regularly jumping up and down in the output. The k says it's a kernel process; beyond that, I had no idea what it was. Googling wasn't very helpful for learning what it actually does, but it did put me on the trail, because there are huge numbers of posts on sites and mailing lists about this process eating CPU time and slowing down the computer.

After a bunch of reading, I found a post suggesting the fix might be this kernel patch by Dave Airlie, a name I recognize. I took the Fedora kernel src.rpm, spent a few minutes getting acquainted with Fedora's kernel spec layout of the year, integrated the patch, rebooted, and voila. No more kslowd000 eating all my CPU.

I recently found this workaround which I'll try next time the kernel gets upgraded.

That still doesn't tell me what that kernel process is supposed to be doing (anyone up for a mandatory rule that kernel processes get man pages too?), so feel free to comment!


CouchDB python unittest setUp/tearDown

Filed under: couchdb,Hacking,Python,Twisted — Thomas @ 23:44

2011-04-03
23:44

I've been hacking on Paisley again recently, since I found out I am not the only current maintainer. There is a branch on github from which a 0.3 release was recently made.

That's good news, because I didn't really need a new project to maintain. But I still have code I want to see land there, so I'm working on merging branches between launchpad, github, and some of my experimental svn branches here and there.

I had just implemented a cache for the object view mapping using couchdb-python's mapping.py and it turns out someone else was interested in adding memcache support to cache document lookups.

Some discussion started on a possible API, and I took a stab at a first draft over the past week.

Separately from that, I also took a CouchDB training course for work (together with Marek, one of our developers), run by the Couchbase people (Couchbase being the company merger of CouchOne, formerly CouchIO (?), and Membase). It was a good course, but I digress.

That night, Marek told me they have some 300 lines of code, which sadly reuse some classes from the current work codebase, to set up and tear down test cases that run against an actual couchdb instance. He didn't feel like rewriting all that code to drop its dependency on work's code just so it could be contributed to, for example, Paisley. I felt I could do it in less than 100 lines, but he didn't seem to believe me.

So here I am after a magnificent Jose Gonzalez concert at the Palau de la Musica which is right around the corner from me, trying to write the caching code, and realizing I can't properly test it together with the change notification listener I wrote.

So while I was watching an episode of Breaking Bad, I wrote the setUp and tearDown code to do just that - start a couchdb instance on a random port, get the port, and connect to it.

It's probably not perfect yet (I busy-loop while waiting for the log file to be created and filled, so I can read the port from it), but it worked for my simple test case. And it's 74 lines of code, including docstrings (which Marek for some reason does not believe in) and comments (which Marek doesn't believe in either).
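Waiting for the port can also be written as a polling helper with a short sleep and a timeout instead of a pure busy loop. This is a minimal sketch, not the code in the branch. `wait_for_port` and `PORT_RE` are hypothetical names, and the pattern assumes the port shows up in the file as part of a URI like `http://127.0.0.1:5984/`, so check it against what your CouchDB actually writes.

```python
import re
import time

# Assumption: the port appears as ":<digits>/" inside a URI in the file.
PORT_RE = re.compile(r':(\d+)/')


def wait_for_port(path, timeout=10.0, interval=0.05, pattern=PORT_RE):
    """Poll `path` until `pattern` matches, then return the port as an int.

    Sleeps `interval` seconds between reads; raises RuntimeError if no
    port appears within `timeout` seconds.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            with open(path) as f:
                match = pattern.search(f.read())
        except IOError:  # file not created yet (IOError is OSError on Py3)
            match = None
        if match:
            return int(match.group(1))
        time.sleep(interval)
    raise RuntimeError('no port found in %s after %.1fs' % (path, timeout))
```

In `setUp` you would start couchdb on a random port (for example with `subprocess`), call `wait_for_port` on its log file, and connect, and `tearDown` would terminate the process. The sleep keeps the wait from burning a CPU core, and the timeout turns a CouchDB that never comes up into a test error instead of a hang.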

It's being worked on in this branch and I hope to land that in the paisley tree soon.

