[lang]

Present Perfect

Personal
Projects
Packages
Patches
Presents
Linux

Picture Gallery
Present Perfect

Python 2.7, JSON, and unicode

Filed under: couchdb,DAD,General,Python,Twisted — Thomas @ 5:28 pm

2011-4-23
5:28 pm

I have been hacking on Paisley more recently. I actually got to hack during work time on Paisley, because the guys needed a feature I had developed – change notification. But more on that later.

I eked out a four day hacking session over this easter weekend, and my primary goal is to make some good advances on my music database system using CouchDB. The code is still in prototype stage, and I wanted to start removing hacks, tightening things down, and adding tests. But suddenly I found myself having to add a bunch of unicode() calls on data coming back from paisley just because I was being stricter on the input to paisley functions.

I didn’t want to have to deal with unicode, again, as it detracted me of the core of my application. But I didn’t like paying the technical debt either of not understanding what was going wrong under the hood.

From my limited understanding, JSON is an object notation format used for exchange of information between processes and applications. A JSON string is in unicode, which is great. It would be pretty useless otherwise in today’s world. So I should be able to send in unicode to JSON libraries and get unicode back out.

A recent change in Paisley by one of the maintainers prefers simplejson over the stdlib-json. I first thought this change was to blame for my problems.

And yes, when decoding a JSON object, text was returned as str instead of unicode objects. Now, this is only when the text is in fact ASCII and hence works fine both as str and as unicode. And I’m sure opinions will differ here – but I think that a JSON library should *always* deserialize text to the same type of object by default – ie, unicode.

Clearly, simplejson disagrees with me. But I didn’t have this problem a few weeks ago, so something changed! What gives? And changing back to json over simplejson didn’t fix it either!

After some googling, I stumbled upon this bug report. Apparently, in 2.7, the C-based implementation deserializes ASCII text as str instead of unicode. The Python-based one always returns unicode for text. And in previous Pythons, both always returned unicode for text.

In essence, my problem boiled down to this:


[thomas@ana ~]$ ipython
Python 2.7 (r27:82500, Sep 16 2010, 18:02:00)
Type "copyright", "credits" or "license" for more information.
IPython 0.10.2 -- An enhanced Interactive Python.
? -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help -> Python's own help system.
object? -> Details about 'object'. ?object also works, ?? prints more.
In [1]: from json import decoder
In [2]: decoder.py_scanstring('"str"', 1)
Out[2]: (u'str', 5)
In [3]: decoder.c_scanstring('"str"', 1)
Out[3]: ('str', 5)

versus


[py-2.6] [thomas@ana ~]$ python
Python 2.6.2 (r262:71600, Sep 29 2009, 21:49:07)
[GCC 4.4.1 20090725 (Red Hat 4.4.1-2)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from json import decoder
>>> decoder.py_scanstring('"str"', 1)
(u'str', 5)
>>> decoder.c_scanstring('"str"', 1)
(u'str', 5)

Note how the last command returned a normal str object.

And yes, in the past few weeks since I last tested this, I did indeed upgrade my machine to Fedora 14, pulling in Python 2.7

simplejson seems to always deserialize to str when it can. I would consider that a bug – ie, ‘be strict in what you produce’.

As for Paisley, I made a feature-unicode branch on github, and this commit introduces a compatibility pjson module. By default it is STRICT, ie it wants unicode back always, and tests for the buggy behaviour, and does an alternative loads implementation that falls back to the python one. I’m sure some Paisley devs will prefer simplejson still, so you can change STRICTness and prefer simplejson.

Now, back to the hack.

Mac userfriendliness

Filed under: General — Thomas @ 10:53 am

2011-4-17
10:53 am

I never really got why the Mac is thought to be so userfriendly. This weekend we planned to backup and upgrade an older 2007 iMac running 10.4 Tiger to 10.5 Leopard.

I first wanted to make sure we could make a bootable copy of the hard drive. We got a WD MyBook Studio which is supposedly what you’d get for a Mac, with a fancy e-ink display for name and space left.

When attached to Firewire it first of all was not recognized at all. Over USB it worked, and recommended we do a firmware upgrade. After doing the firmware upgrade and rebooting, the drive wouldn’t light up anymore. A heavy paperweight, essentially. Trickery with dmesg showed that something does get recognized on the USB port, but that’s it. After an hour and a half of trying out various firmware uploading tools, we gave up and sent it back to the store, and settled for a standard no-additional-firmware USB drive.

Let Superduper run for a night backing up 120 GB of drive over 4 hours, and we were good to go (incidentally, I created two boot partitions, so after naming the first one ‘Bootie’, the second one named itself. A for the system drive, B for the first boot drive, and C for the second.)

Then comes the reboot. You’re supposed to hold the Option key during boot. With Macs, this always gives me anxiety – do you start holding down a key before or after you turn it on ? Can you hold it down while you are rebooting ? When can you let go ? There are simply no clues. Between the sound, the grey screen, and the apple, you have no idea what is going on. At least, with the usual PC boot screens, you can check for common problems like ‘is the keyboard even working’. I get Apple Anxiety all the time.

In this particular case, apparently I misremembered what the Option key was in the first place. I was holding down that four-lobed rotated clover key. But apparently Option is the Alt/railroad join key.

How is not labeling a key with the same name your software uses considered userfriendly by anyone ?

After holding down that railroad join key before rebooting until a boot menu pops up, we could choose the Bootie drive and boot from it. At least that bit was easy to use, and worked.

Make another backup just in case, then reboot with an official install CD of 10.5 from work.

This time you have to hold down the C key. I still don’t understand why having to search the net for random information JUST so you can boot from a CD is so much better than a simple boot menu and a prompt to get into it.

And after a lot of whirring and booting into the installer, it simply pops up a message saying ‘Mac OS X cannot be installed on the computer.’ This software cannot be installed on this computer.

No further explanation. How hard could it be to tell me ?

Googling, it turns out that grey install discs are tied to a specific model. The disc came from a MacMini.

And again, after much Googling, it looks like the 30 euro retail version of Snow Leopard can be installed over Tiger on intel Macs, so maybe we should just wait until Monday to upgrade.

Now, if only we could actually get the CD out of the drive when rebooting, as the installer runs from CD it doesn’t let you eject, and when you reboot I don’t know the magic key combination to eject… and I just *know* I’ve had to do this before on a MacMini and all I remember is that it was some stupid combination of tricks…

Download or Downloads

Filed under: friction,Open Source — Thomas @ 6:42 pm

2011-3-14
6:42 pm

Having various machines, some with homedirs passed on across distro versions, I somehow ended up with both Download and Downloads directories. Not to mention that my Firefox and Chromium instances were downloading everywhere, and the silly mental friction of this one letter difference is pissing me off. I was renaming one to the other on one machine, and probably doing the opposite on another.

So, no more. Instead of figuring out what the spec says, I just created a fresh user on my Fedora 14 laptop and did:

$ ls
Desktop Documents Downloads Music Pictures Public Templates Videos

So, that’s what it’s going to be. Never mind that I hate with a passion something ugly like Music vs Videos (where do I put audio podcasts then ?), I’ll just follow like rank and file.

Since I did get a little curious though where the two folders came from, I took a quick look at the FDO page (which as recent blogs have shown is the One Standard Body) and sure enough – version 0.11 renamed Download to Downloads.

This post is just a note to self so that any time I find a computer with both, I fix it properly, instead of the Browsian motion I’m on right now.

Evolution mailing iPhone

Filed under: General — Thomas @ 11:20 pm

2011-1-26
11:20 pm

I must be missing something blindingly obvious. I’m sure at least one of the Evolution hackers has an iPhone. But as far as I can tell, Evolution cannot send mails with attachments to iPhones such that the iPhone can show the attachment.

It looks like Evolution can only send multipart/mixed, and Mail.app on iPhone only understands multipart/alternative.

Now, far be it from me to be surprised at Apple not understanding established standards. But really – am I the only one having this problem ?

This weekend’s yak shave…

Filed under: Hacking,Open Source,pychecker,Python — Thomas @ 10:09 pm

2010-12-19
10:09 pm

… went a little something like this.

  • home desktop upgraded to F14, still a bunch of packages missing.
  • Want to do a release of moap because it’s been too long, but can’t because make check fails because pychecker fails because F14 is the first one with Python 2.7
  • Look into pychecker failures. Figure out that there are new opcodes. Worse – for the first time I can tell opcode numbers have been shifted around. Find this bug where renumbering started; commented to ask if this was expected. Apparently it is, although no good reason for doing so has been offered. Shrug, not my fight, although it’s going to make for ugly if’s in Pychecker code.
  • Realize I don’t actually have Python 2.7 on the Pychecker buildbot page. Remember I have a script to build Python versions from source which now builds 2.3-2.7 and 3.0 (just check out and run make, then py-x.y to start a shell with that version of python).
  • Add it to buildbot master config, restart, see buildbot fail spectacularly. Apparently buildbot was upgraded since I last started it, and code has been shuffled around. Spend a few hours figuring out how to upgrade buildbot and my code with custom steps without losing the history. Now have a buildbot 0.7.12 building on 2.7 as well.
  • Add some ugly if’s for python versions to handle opcode reshuffling. Brings down the test failures. Add some handlers for new things like POP_JUMP_IF_TRUE/FALSE and JUMP_IF_TRUE/FALSE_OR_POP. End up with only one test failure difference with 2.6
  • Look into the remaining failure, realize that it is checking for constness of return results, and JUMP_IF_TRUE/POP_TOP pairs are now POP_JUMP_IF_TRUE, so peeking ahead to see LOAD_CONST should move by one opcode.

So, yay! That means CVS (yes…) HEAD of pychecker now works just as good for 2.7 as 2.6 and it’s time to start releasing somewhere this week. And maybe I should push on and fix some of the older failures, or shelve them for now, while I’m at it, and have nicely green builds!

« Previous PageNext Page »
picture