strongly typed
2007-10-29
First of all, congratulations to Joe Shaw on getting hitched! I can't believe that, only a few months ago, I was having drinks with him, and when I asked him how he was, he didn't even mention getting married! I guess it was already a plain, simple fact of life for him by that time.
But to keep it technical, I read through his reply to some Beagle comments, which resonated with me:
And lastly, having worked with Trow on a reasonably big desktop Python app, we wanted a strongly typed language. Writing real applications in Python requires a discipline that unfortunately most people (including myself) are unwilling to adhere to, and this easily leads to buggy and hard to maintain programs. You have to be very diligent about unit tests and code coverage for every line of code, because you can’t rely on the compiler to catch errors for you. We had been burned by this a bit, and wanted to get back to a strongly typed, but still easy to use language that integrated well with the desktop.
The core of this message is 100% true - you need discipline to write real apps in Python; you have to be very diligent about unit tests and code coverage.
The crux of the matter though is that this is true for *any* real application in *any* language. However old and boring it makes me feel to say so, doing these things is part of good software engineering practice, and for good reason. Since realizing this, all the projects I've started working on, as soon as they leave the single-day-hack-prototype phase, first get converted into a simple bare-bones project, and the first things I add are unit test, code coverage and API documentation infrastructure. This is very hard to retrofit on an existing codebase; you teach yourself the habit of doing so at the start so you can reap the rewards later.
Unit tests and code coverage support two separate things. The first is that they give you confidence, if done right, that overall your project code is working well. It allows you to decide to release because you have some way of measuring that the quality of your project is at least as high as on the previous release.
The second is that you can refactor correctly. Some of our hackers in the office incorrectly use the word refactor when they really mean "I don't like or understand this big section of code and it's buggy and I am going to rewrite it completely and see what happens." Refactoring is much more well-defined than that. It is the act of rewriting the internal implementation of a section of code, without changing the external behaviour at all. (A logical consequence is that refactoring code does not fix bugs - you need to produce the same bugs in your refactored code!)
How do you know if you changed the external behaviour? You can only know if you have unit tests which cover the code you're refactoring.
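To make that concrete, here is a minimal sketch of what I mean. The two word_count functions are made up for illustration and are not taken from any real project. The original has a quirk (an empty string counts as one word, and a double space counts an extra empty word), and the tests pin that quirk down so that the refactored version has to reproduce it.

```python
import unittest


def word_count_original(text):
    """Original implementation: loops over the pieces and counts them."""
    count = 0
    for _piece in text.split(" "):
        count += 1
    return count


def word_count_refactored(text):
    """Refactored implementation: different internals, same external behaviour."""
    # Splitting on a single space always yields one more piece than
    # there are spaces, so counting spaces gives the same answer.
    return text.count(" ") + 1


class WordCountBehaviourTest(unittest.TestCase):
    # Expected results describe what the original actually does,
    # including its quirks, not what we wish it did.
    CASES = [
        ("hello world", 2),
        ("one", 1),
        ("", 1),      # quirk: an empty string counts as one word
        ("a  b", 3),  # quirk: a double space adds an empty "word"
    ]

    def check(self, func):
        for text, expected in self.CASES:
            self.assertEqual(func(text), expected)

    def test_original(self):
        self.check(word_count_original)

    def test_refactored(self):
        self.check(word_count_refactored)
```

Saved as a module, this runs with python -m unittest. If you then decide the quirks are bugs, fixing them is a separate change with its own updated tests, not part of the refactoring.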
As a side note, a distinct bonus of your project having unit tests is that it makes it easier for others to hack on. Just this weekend, I found a simple bug in pychecker. Now, pychecker has a unit test setup, so that means I can add the unit test that is failing, figure out where in the code it's failing, fix the code, verify that it works with my unit test, then verify that I did not break any other functionality by running the complete unit test suite. The result is 15 minutes of patching work, and it's a joy to go bug fixing this way.
Getting back to Joe's post. I'm not sure Joe intended it to sound that way, but I felt the conclusion he drew from his experience with Python was "I don't use Python anymore because it forces me to think about unit tests and code coverage."
I don't know what kind of tests Beagle has as part of its codebase and release procedure. I wrote about my experience with Beagle at some point, and Joe commented to the effect that the kernel gives you 8192 inotify watches and when you run out, it acts funny.
Well, no shit. Of course I have over 8192 directories under my home directory. If Beagle is, for example (I have no idea if it is), really watching every .svn and every CVS directory, no wonder it is running out. Two questions spring to mind: how did Joe get away with not having over 8192 .svn and CVS directories, and why does the software not deal with this gracefully? If I learned about this limit, I would write an integration test (probably not a unit test, in this case) and test graceful handling of this boundary condition. (It's entirely possible that a test like this exists in the Beagle code, I haven't checked. I don't intend this as an attack on Beagle or Joe, for those who assume the worst in me :))
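For what it's worth, here is roughly the shape of boundary test I have in mind. This is only a sketch under loud assumptions: FakeWatcher, Indexer and WatchLimitReached are names I made up, none of this is Beagle code or the real inotify API, and falling back to polling is just one possible definition of "graceful".

```python
import unittest


class WatchLimitReached(Exception):
    """Raised by the fake watcher when no more watches are available."""


class FakeWatcher:
    """Hypothetical stand-in for a kernel watch facility with a hard limit."""

    def __init__(self, limit):
        self.limit = limit
        self.watched = []

    def add_watch(self, path):
        if len(self.watched) >= self.limit:
            raise WatchLimitReached(path)
        self.watched.append(path)


class Indexer:
    """Hypothetical indexer that degrades to polling when watches run out."""

    def __init__(self, watcher):
        self.watcher = watcher
        self.polled = []

    def watch_directories(self, paths):
        for path in paths:
            try:
                self.watcher.add_watch(path)
            except WatchLimitReached:
                # Degrade gracefully: remember the directory for polling
                # instead of crashing or silently ignoring it.
                self.polled.append(path)


class WatchLimitTest(unittest.TestCase):
    def test_more_directories_than_watches(self):
        limit = 8192
        paths = ["/home/user/dir%d" % i for i in range(limit + 10)]
        indexer = Indexer(FakeWatcher(limit))

        indexer.watch_directories(paths)

        # The first `limit` directories get a watch, the rest are polled,
        # and no directory is lost along the way.
        self.assertEqual(len(indexer.watcher.watched), limit)
        self.assertEqual(len(indexer.polled), 10)
        self.assertEqual(indexer.watcher.watched + indexer.polled, paths)
```

A real integration test would exercise the actual watch mechanism instead of a fake, but even a fake like this forces you to decide what should happen at watch number 8193.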
To cast a stone at myself, GStreamer for a long time only had a bunch of small ad-hoc test applications that developers would write as they hacked on some feature; at best these would be integrated in some make check command, and at worst they would be forgotten about and bitrot. I think one of my biggest contributions to GStreamer was to reorganize the test suite into a coherent whole, and to make it easy for developers to integrate into it. It's obviously not the only reason 0.10 is so much better than 0.8, but it has definitely helped a lot.
I'm not saying a program is automatically better for having unit tests and code coverage; that would be silly. A perfect programmer can knock out a perfect program with zero coverage and tests. A crap programmer can knock out a crap program with 100% test coverage. But all other things being equal, having tests and coverage makes your program better. These days, I definitely trust programs more, and prefer to use them more, if I've seen they come with tests, and the developers care about their quality.
The first project that really convinced me of these simple principles was the amazing Twisted project. In spite of the sometimes vicious social commentary those guys receive from their peers (bloat, overdesigned, badly documented, downright crazy?), which makes them sound jaded sometimes, they have, over several years, created a consistently excellent framework. They care about their quality and their users, and have integrated so many of these good engineering practices in their daily workflow. I'm not sure they themselves realize how rare this was for a FOSS project back then, but for me, Twisted as a project was an eye opener.
Wow, this post is a lot longer than I intended it to be, after waking up way too early and still fighting the aftermath of a stomach flu. I have to get going.
Anyway, here's what I want you to take away from today's post.
I program in Python precisely because it forces me to write unit tests and think about coverage. You should be doing this anyway, regardless of the language you do it in. By choosing a language like Python, you will realize why much earlier on in your project, or your project will fail spectacularly :)
Of course you should write tests (unit tests or random tests) for any bigger software even if you get to enjoy the advantages of strong (and IMO preferably static) typing. It is, as you say, a very very valuable tool for refactoring and for catching programming errors.
Having to constantly check back with your testsuite to catch type errors that the compiler won’t complain about, however, isn’t my idea of fun.
Comment by Mattias Bengtsson — 2007-10-29 @ 12:16
Amen.
(/me needs to learn more about python and unit tests)
Comment by Lukas — 2007-10-29 @ 12:16
You can come and test my two units any time baby.
Comment by Lien — 2007-10-29 @ 12:19
Hey Thomas,
First of all, I think when we were talking in England we hadn’t actually decided to get married at that point. :) We had talked about it, but it wasn’t a sure thing at that point. We’re not big on “planning” so we pulled this thing together in about two months. :)
You won’t get any disagreement from me about the fact that one *should* write unit tests and measure code coverage for every piece of software one writes. The static vs. dynamic typed languages debate has raged on forever and is one that I struggle with myself, going back and forth between the two. In rereading my original message, I made it sound like using a statically typed language makes this need magically go away. I think it helps, and enforces greater structure — which has its tradeoffs — but will not lead to magically bug free code. In the end I always advocate using whatever tools the programmer thinks are best for the job.
For the Beagle bug you mentioned, it’s funny that you bring it up because it actually wasn’t a bug. That was the intended behavior, but the side effects of that behavior and the index-harder-on-screensaver behavior did not mix well. In the end we decided to make inotify (and a higher watch limit) mandatory. An integration test might have brought the severity to our attention, but I doubt it. And FWIW, Beagle doesn’t watch any dot-directories (like .svn) or certain others (like CVS). You just have a lot of files. :)
Hope you feel better!
Joe
Comment by Joe Shaw — 2007-10-29 @ 14:46
One needs to understand (and love) Duck Typing before programming in Python. I’m not saying that you don’t understand it, Joe, but I can see that you don’t love it :)
Comment by Eduardo Padoan — 2007-10-29 @ 19:06
I have to say that the “compiler catches errors for me” idea is really great in a statically typed language like Haskell. Unfortunately, that kind of comment is usually made in the context of languages like C++ or Java, where the type system and compiler support is a pale shadow of that found in languages with a more advanced type system; in those cases, I find the compiler is often just complaining about problems caused by things the language should be smart enough to deal with for me, and the benefits simply just don’t seem worth it over a dynamic language like Python and friends.
Comment by Tristan Seligmann — 2007-11-01 @ 21:58
Doctest (http://en.wikipedia.org/wiki/Doctest) can give great tests for little effort in Python.
Comment by Paddy3118 — 2007-12-13 @ 11:15
I totally agree with your point about software engineering discipline. However, it *is* possible to use “manifest” typing in Python by using Traits (http://code.enthought.com/traits/). That is, the types are explicit (“manifest”) in the code, and are validated at runtime, so if you run tests, you’re more likely to catch type-related errors.
Disclaimer: I work for Enthought, which publishes Traits as open source (and therefore does not derive direct revenue from it).
Comment by Janet Swisher — 2007-12-14 @ 17:05