
Present Perfect


Python ugliness

Filed under: Python — Thomas @ 17:02

2009-11-28
17:02

I usually tend to think of Python as the discerning gentleman's programming language: well-behaved, well-documented, people take care of the code written. I like the batteries-included approach and assume that the battery code in the standard library is well-written. "import this" is a vision statement directly included in the language - it's hard to get more stylish than that.

I got an eye-opener this weekend, however. I was still on my quest to get desktopcouch and ubuntuone working on Fedora. While wrestling with this bug and doing things that I usually consider a hanging offense (changing /usr-installed code by adding prints to figure out where the craziness was coming from), I finally drilled down to the reason the exception was being raised. It all boiled down to a single line of code in httplib.py:

def __init__(self, host, port=None, key_file=None, cert_file=None,
             strict=None, timeout=socket._GLOBAL_DEFAULT_TIMEOUT):

where socket.py contains

_GLOBAL_DEFAULT_TIMEOUT = object()

So, in a nutshell, httplib2.py tracebacks because of this new object, which isn't a valid argument to sock.settimeout().
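The failure mode can be reproduced in isolation. The following is only a rough sketch, assuming a current Python 3 interpreter (where socket._GLOBAL_DEFAULT_TIMEOUT still exists) rather than the 2.6 setup described here; the variable names are made up for illustration. It shows that settimeout() accepts None or a number, but rejects the bare marker object if it ever leaks through:

```python
import socket

# Private stdlib name, referenced here only to illustrate the problem.
marker = socket._GLOBAL_DEFAULT_TIMEOUT

sock = socket.socket()  # created but never connected
try:
    try:
        sock.settimeout(marker)  # the marker is not a number or None
        error = None
    except TypeError as exc:
        error = exc

    # Values settimeout() is designed to accept:
    sock.settimeout(None)   # blocking mode
    sock.settimeout(2.5)    # timeout in seconds
    accepted = sock.gettimeout()
finally:
    sock.close()

print(type(error).__name__, accepted)
```

Code that receives the marker is therefore expected to compare against it with "is" before calling settimeout(), which is exactly the step that goes missing when modules from different Python versions are mixed.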

Now, I'm pretty sure I'm running into this problem because I'm doing "bad" things to some of the stdlib, pulling in bits I need to make ubuntuone (coded on a 2.6.3 python where my Fedora 11 comes with 2.6) work.

But pulling the cover off like this did point out this one object that:

  • seems to be intended to be private, but it gets referenced from other stdlib modules
  • comes with no documentation at all
  • comes with not a single comment explaining *why* it's there, or *why* it's ok to just create a completely empty and useless "object" that you can't even trace the origin of (I had to override __setattr__ on some class to figure out what the anonymous object was, and where it was being set from, to find it)
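For what it's worth, there is a less invasive way to trace such an anonymous object than overriding __setattr__. This is a sketch of an alternative, not what was done above: scan the namespaces of all loaded modules for a global bound to the very same object. The helper name find_module_names is hypothetical.

```python
import socket
import sys

def find_module_names(obj):
    """Return (module, attribute) pairs whose module-level value is obj."""
    hits = []
    for mod_name, mod in list(sys.modules.items()):
        namespace = getattr(mod, "__dict__", None)
        if not isinstance(namespace, dict):
            continue
        for attr, value in list(namespace.items()):
            if value is obj:  # identity, not equality
                hits.append((mod_name, attr))
    return sorted(hits)

marker = socket._GLOBAL_DEFAULT_TIMEOUT
anonymous_repr = repr(marker)          # '<object object at 0x...>': no hint of origin
origins = find_module_names(marker)    # includes ('socket', '_GLOBAL_DEFAULT_TIMEOUT')

print(anonymous_repr)
print(origins)
```

This only finds objects that are bound to a module-level name, so it would not help with a marker that lives solely in a default argument or an instance attribute.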

Maybe I'm old-fashioned, but this leaves me disappointed. This one line breaks the beauty, explicitness, and readability that the Zen of Python calls for.

The only attempt at explaining I found is this.

Any Python guru want to set me straight on why this isn't the incredibly ugly wart on stdlib that I consider it to be?

Meanwhile I'll go digging in svn to see when it was added to 2.6 and why.

9 Comments »

  1. the reason probably is that it is intended to allow lazy behavior, eg like this (not knowing if that is actually done in httplib, but it could be):

    _DEFAULT_ENVIRONMENT = SomeEnvironmentObject()
    _GLOBAL_DEFAULT_TIMEOUT = object()
    _GLOBAL_DEFAULT_TIMEOUT_FOR_PATIENT_PEOPLE = object()

    def do_something(*args, session=_DEFAULT_ENVIRONMENT, timeout=_GLOBAL_DEFAULT_TIMEOUT):
        if timeout is _GLOBAL_DEFAULT_TIMEOUT:
            timeout = session.timeout
        elif timeout is _GLOBAL_DEFAULT_TIMEOUT_FOR_PATIENT_PEOPLE:
            timeout = session.timeout_for_patient_people
        # etc.

    thus, _GLOBAL_DEFAULT_TIMEOUT can be passed in from “outside” but does not need to have a numeric value. it’s like a constant that is never resolved.

    this is arguably not the most beautiful way to do it; sometimes, you see things like this:

    class Token(str):
        pass

    _GLOBAL_DEFAULT_TIMEOUT = Token("socket._GLOBAL_DEFAULT_TIMEOUT")

    or similar – works the same way, but is easier to debug.

    anyway, this variable has exactly one underscore at the start, usually a sign that it is somehow private (but not enforced to be so, so whoever uses this from the outside has to know what he is doing).

    Comment by me — 2009-11-28 @ 17:32

  2. You may want to read this discussion: http://bugs.python.org/issue2451

    Comment by marcin — 2009-11-28 @ 18:27

  3. Not all parts of the standard library are examples of how to write good modules. I got over that shock a long time ago.

    Comment by Floris Bruynooghe — 2009-11-28 @ 22:36

  4. Creating an empty object is a not-so-uncommon way of getting hold of a unique sentinel value.

    If a module has “foo = object()”, the rest of the code can let “foo” be a default value for function arguments. Inside the function one can check if “arg is foo” to see if “arg” was left to its default value (foo). This is necessary if you want to differentiate between people passing no argument and passing the normal value for “no argument” namely None.

    Comment by Martin Geisler — 2009-11-29 @ 02:23

  5. I use python at work quite a bit. It’s a very good language in many ways. But as far as library handling goes the language is rather the ugly duckling. There’s just too many issues around library versioning, installing and general handling. Far too often an app will fail to work because libraries aren’t installed where the app can find them or there’s some incompatibility between minor python or library versions, or python fails to adapt gracefully to being installed in different ways by different distributions.

    It doesn’t need to be this way, and it can be done better. Ruby and Perl are both much better behaved about this and fail far less. I don’t know if it’s a deficiency in python system’s own ability to adapt properly to changes, or if it’s just not giving library and app writers the tools to effortlessly do the right thing. But this is a visible and unnecessary blemish on an otherwise quite beautiful language.

    Comment by Janne — 2009-11-29 @ 02:54

  6. Creating just an object like that and then using it as a default parameter is a common and perfectly acceptable way to track if a parameter was passed or not, when None is an acceptable value to pass in.

    It’s used in this code:
    if timeout is not _GLOBAL_DEFAULT_TIMEOUT:
        sock.settimeout(timeout)

    And since None is an acceptable parameter to settimeout, that means None is an acceptable parameter to create_connection. Therefore the default needs to be something else. So a marker object is created and used.

    Why is it referenced from other parts? Well, because they also want a timeout parameter. So:

    class HTTPConnection:
        def __init__(self, host, port=None, strict=None,
                     timeout=socket._GLOBAL_DEFAULT_TIMEOUT):

    That way they can just pass their timeout value into socket.create_connection(), and if no timeout value is passed, the default is used.

    I do agree that a small comment above the definition in the standard library would have been good.

    Comment by Lennart Regebro — 2009-11-29 @ 08:23

  7. @Lennart: I understand the goal, but the problem is with the style. First, creating a general object() makes it impossible to trace what object it is, where it is created, and what purpose it serves. Second, if it starts with an underscore it’s clearly intended to be private. So there is a mismatch between purpose and intent on the author’s part: a ‘constant’ meant to be used by various modules should not start with an underscore and be left without documentation. It’s quite simply laziness on the author’s part, almost suggesting that he put in an underscore because he couldn’t be bothered to write a comment or docs, and could get away with it since he can later handwave and say ‘this was private’.

    Consenting adults works fine when people accept their responsibility. I doubt the author did in this case.

    Comment by Thomas — 2009-11-29 @ 12:27

  8. […] whom I have never met but works in some of the areas of Linux that I used to be involved with, writes about some ugly code he found in the Python standard libraries, and says I usually tend to think of […]

    Pingback by Peter’s Work Journal » Blog Archive » The Python Standard Libraries: Not Very Good — 2009-11-29 @ 18:02

  9. Using a marker object is not that uncommon, although it can be done better. In Storm, we use an object called “Undef” for this purpose that has a repr() of “Undef”. That would seem to cover the debugging aspects of your complaint.

    Comment by James Henstridge — 2009-11-30 @ 01:47
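
Putting the suggestions from the comments together, here is a minimal sketch of a marker object that says what it is when printed, used to tell "no argument passed" apart from an explicit None. All names (_Sentinel, DEFAULT_TIMEOUT, effective_timeout, open_connection) are hypothetical and chosen for illustration; this is neither Storm's Undef nor the stdlib's actual code.

```python
class _Sentinel:
    """Marker object whose repr names it, so it is recognisable when debugging."""

    def __init__(self, name):
        self._name = name

    def __repr__(self):
        return self._name


# Public, documented marker: "the caller did not pass a timeout".
DEFAULT_TIMEOUT = _Sentinel("DEFAULT_TIMEOUT")

# Assumed process-wide default, in seconds (illustrative value).
_global_default = 30.0


def effective_timeout(timeout=DEFAULT_TIMEOUT):
    """Resolve the timeout: omitted means the global default, None means block."""
    if timeout is DEFAULT_TIMEOUT:
        return _global_default
    return timeout


def open_connection(host, timeout=DEFAULT_TIMEOUT):
    """Outer layer: forwards the caller's value untouched to the inner layer."""
    return host, effective_timeout(timeout)


print(repr(DEFAULT_TIMEOUT))           # DEFAULT_TIMEOUT
print(open_connection("example.org"))  # ('example.org', 30.0)
print(effective_timeout(None))         # None
```

Because the outer function reuses the same marker as its own default, it never needs to know the real default value; only the innermost layer resolves it.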
