thomas.apestaart.org » 2004

Life

Filed under: Life — Thomas @ 00:11

2004-08-12
00:11

Woke up late-ish on Sunday but not really. Johan, Wim and I had a movie afternoon planned at Wim's place, since he has the big projector and the empty wall. King Kong, Monster and Kiki's Delivery Service. I think I like them in that order, with the last one being the best.

Last night we went out to dinner with Lotte, Lotte's mum and uncle, and Herbert Flack, a Famous Fleming :) Some absolutely excellent food at a restaurant called Limbo - we didn't hold back, and the bill still only came to 23 euro a person. Excellent, can't wait to go back...

Bought some ground wiring yesterday, need to find a free moment to install it.

Got a flat tire on my bike today (I ride the bike to work these days since it's slightly less sweat-inducing than skating, but only marginally so). After some cursing I took the subway home. For some reason we have a spare wheel lying around, and it also happened to be a rear one, yay. So I swapped it out. Good to know my bike repair skills are still buried in my brain somehow. Now I need to buy some tools so I can repair the broken tire.

Sometimes life seems so simple here. If it doesn't work today, there's tomorrow.


Server

Filed under: Fluendo,Hacking — Thomas @ 22:47

2004-08-09
22:47

So, our server has been tested in production a bunch of times. Each time, it runs fine for fifteen minutes, and clients connect to it all the time. It serves about 500 streams without any issues, only about 1% CPU usage total. At some random point in time however it drops clients and hangs; it looks like it's hanging in a read, but the stack trace seems corrupted.

So the hard thing about this kind of problem is that we cannot trigger it in a local setup where we simulate clients in a dumb way (1000 wget processes, for example), and that on the server it's hard to get usable debug info. The log file with only DEBUG logging from one element on the server is about 600 MB by the time this problem happens. And really, 1000 wgets are no good simulation of 1000 real users, each with their own network speed, and each with their own "Reload" push frequency.

I've searched the net for network stress test tools, but haven't found anything I can use yet. All of the web stress test tools use the complete request as a testing primitive. Meaning, a successful request is one where you get a complete reply, a full page, and a normal end. Of course our streams are "infinite" so we cannot use these test apps.

Other network testing tools work more low-level, which would mean we'd have to write some TCP/HTTP handling code as well. Really, what we'd need is some tool that allows us to get a URL, specify the client's bandwidth and possibly bandwidth profile, and keep connections alive for a random amount of time. If you know of anything, let me know.
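A rough sketch of what such a simulated client could look like, using only the Python standard library. This is an illustration under assumptions, not an existing tool: the names `throttled_read` and `simulate_client` and all the numbers are made up, and throttling on the reading side is only approximate because the kernel's socket buffers absorb some data.

```python
import io
import random
import time
import urllib.request


def throttled_read(stream, bytes_per_second, max_seconds, chunk_size=1024):
    """Read from stream at roughly bytes_per_second for at most max_seconds.

    Returns the number of bytes read. Stops early if the stream ends.
    """
    start = time.monotonic()
    total = 0
    while time.monotonic() - start < max_seconds:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)
        now = time.monotonic() - start
        # Sleep until the average rate drops back to the target rate,
        # but never past the end of this client's lifetime.
        delay = total / bytes_per_second - now
        remaining = max_seconds - now
        if delay > 0 and remaining > 0:
            time.sleep(min(delay, remaining))
    return total


def simulate_client(url, bytes_per_second, min_seconds, max_seconds):
    """Fetch url slowly and hang up after a random lifetime (hypothetical helper)."""
    lifetime = random.uniform(min_seconds, max_seconds)
    with urllib.request.urlopen(url) as response:
        return throttled_read(response, bytes_per_second, lifetime)


if __name__ == "__main__":
    # Offline demo: an in-memory "stream" instead of a real server.
    fake_stream = io.BytesIO(b"x" * 1_000_000)
    print(throttled_read(fake_stream, bytes_per_second=10_000, max_seconds=0.2))
```

To approximate many users, one would presumably run `simulate_client` in a few hundred threads or processes, each with its own bandwidth and lifetime range.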

Anyway, I started reading about possible limits for file descriptors and so on, and learned a bunch of useful new stuff. Then I started theorizing about possible failure scenarios from what I had learnt, and then I went through our plugin code again to see if these cases could be triggered. I also thought about how I could test each of these test cases.
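For reference, the per-process file descriptor limit can be inspected from Python on Unix-like systems with the standard `resource` module. The sketch below is only an illustration: `can_serve`, `fds_per_client` and `reserved` are hypothetical names and guesses, not anything taken from the server code.

```python
import resource


def fd_limits():
    """Return the (soft, hard) limit on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def can_serve(clients, fds_per_client=1, reserved=16):
    """Rough check: do that many clients fit under the soft limit?

    fds_per_client and reserved (descriptors for logs, listening
    sockets and so on) are assumptions made for this example.
    """
    soft, _hard = fd_limits()
    if soft == resource.RLIM_INFINITY:
        return True
    return clients * fds_per_client + reserved <= soft


if __name__ == "__main__":
    print(fd_limits())
    print(can_serve(1000))
```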

The actual bug seems to be a really silly oversight in handling some error cases, but the good thing is I got about ten different points to watch out for and how I could reproduce, test and fix. I can hardly wait to get to work tomorrow to start doing all these tests, because something tells me this will fix our problem and give us a rock solid server. Or, at least, one that runs for more than 15 minutes when faced with a lot of clients :)


home network

Filed under: General — Thomas @ 22:38

22:38

So, on Saturday I decided to spend some time figuring out my network trouble at home. I finally figured out that the real symptom was my ADSL router rebooting. At some points the lights lock up, it keeps at it for about a minute, and then it reboots itself. No clue why, but at least I finally have something I can measure to see how my network is doing: the uptime of my router.

Armed with that piece of information, I unplugged everything from the network. I only used my laptop over wireless. Sure enough, ADSL router works fine for over an hour. I add one hub and one machine. Things keep working for more than an hour. And so I go on until I connect the small QBIC box that has recently been acting as the home server. And things fall apart.

To cut a long story short, this machine has an on-board network adapter that doesn't work, and with two different PCI NICs it messes up the network badly enough to make the router crash.

I should also say that my apartment isn't grounded anywhere - which is typical in Spain. I had started thinking about adding a ground cable myself.

I was considering giving up and moving the home server back to the crappy AMD that overheats as soon as it runs 100% busy for a minute and starts beeping like crazy, and can only take 32 GB disks. I even got so far as tracking down and installing the latest (unstable beta) BIOS for it, which made it not halt on big drives, but it just reboots as soon as it loads the kernel.

Luckily at that point I remembered I still have some USB network adapters lying around. I hoped the USB adapter, given the lower signals it works with, would not cause problems on the network wrt. ground issues. And yes, using that my router is a lot more stable. It still reboots sometimes, but it's more in the six hour range than the ten minutes. Enough to be bearable until I either add grounding myself or get someone to do it.

Only one drawback left in the meantime - even though the network adapter claims it can do 100 Mbit, I cannot get it above 5 Mbit of actual throughput...

Now let's hope I'm clear of network/harddisk/server issues for some time...


nano version numbers

Filed under: Hacking — Thomas @ 21:32

2004-08-05
21:32

Rich Burridge rants about nano version numbers.

I don't know what project Rich encountered having this, but I'll share my take on it.
In GStreamer, and these days in a lot of projects I work on, we use nano numbers, but in a very specific way.
First of all, any tarball with a nano number is never an "official" release.
Second, a nano of 1 indicates that "this project is in free-for-all CVS mode, between two releases". As soon as a release is made, the nano number gets bumped to 1.
Third, a nano bigger than 1 (2 .. infinity) indicates a prerelease. This is a tarball specifically meant to be tested as an alpha/beta/rc/whatever for the final release.
The final release will have a major.minor.micro version. Internally, the nano is 0, and externally, it's never visible.
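The rules above can be sketched as a small classifier. This is a minimal illustration that assumes versions are plain dotted integers; the function names and the sample version numbers are made up and are not GStreamer code.

```python
def parse_version(text):
    """Split a dotted version such as '0.8.3.1' into a tuple of integers."""
    return tuple(int(part) for part in text.split("."))


def classify(text):
    """Classify a version string according to its nano number.

    major.minor.micro        -> official release (nano is implicitly 0)
    major.minor.micro.1      -> CVS between two releases
    major.minor.micro.N, N>1 -> prerelease of the next release
    """
    parts = parse_version(text)
    if len(parts) not in (3, 4):
        raise ValueError("expected major.minor.micro[.nano], got %r" % text)
    nano = parts[3] if len(parts) == 4 else 0
    if nano == 0:
        return "release"
    if nano == 1:
        return "cvs"
    return "prerelease"


if __name__ == "__main__":
    for sample in ("0.8.3", "0.8.3.1", "0.8.3.2"):
        print(sample, classify(sample))
```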

This sounds elaborate but works really well in practice.

There are various reasons why I do this. An important one is that I feel that "official" tarballs should come from one source only, and be used as such. I don't want someone taking our current CVS, rolling a tarball of it, and because I haven't updated the versioning of my project the tarball ends up with the same name as my last release. And then it gets packaged, and put out in the wild, and people file bugs, and I can't for the life of me figure out why their x.y.z version asserts in some piece of code that isn't even in our official x.y.z release. Worse, if I had bumped the micro number after releasing, people would be having tarballs in the wild of what would look like our next release!

A second reason is that some bugs or problems are version-related or release-related. Increasing the version by bumping the nano brings out bugs that would otherwise have been brought out only when you did your final release. Some bugs are directly tied to your versioning. Others are just simply bugs that you don't get around to fixing until you mentally get into release mode. For example, the first thing I do after a prerelease is drag it through a complete package build cycle.

This is usually the time when I notice silly stuff like headers not being installed correctly, plugins having the wrong name, duplicates, ... Our GStreamer Debian maintainer somehow manages each time to start building packages only a week after release, hence he gets himself into trouble with bugs and issues that he could just as well have fixed properly for the release by testing the prerelease (why he repeatedly chooses to shoot himself in the foot I don't know :) Lord knows I've tried to psycho-analyse this behaviour in the past...)

Using this scheme avoids the notorious brown-paper-bag-release syndrome. I'm not saying we don't run into that anymore, but the frequency is a hell of a lot less.

At one point someone released one of my modules because people felt it had a brown paper bag and they couldn't reach me right away, so they felt it was wise to release because they were impatient, and they chose (horrors) to add a .1 nano. Oh, the humiliation :)

A third reason is that I absolutely abhor having letters in versions. alpha, beta, gold, pl, rc, cvs, try2, pre, ... I'll never use them. They screw up ordering (order these modifiers by implied age), introduce bugs in code (compare handling in different versions of rpm), are ambiguous, and are a pain in the ass for packagers (take a look at the huge volume of threads on fedora.us packaging guidelines). Versions are meant to imply a clear order, and letters get in the way.
(Even worse is when they make you look stupid - one project we use has had an rc5 tarball as their latest release for over two years. Another project you might know has released a tarball with version 1.0pre3try2. And don't get me started on avifile).
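To make the ordering point concrete, here is a small comparison. It uses naive string sorting as a stand-in for "whatever a tool does when it meets letters" (real packaging tools each have their own rules, which is exactly the problem), and the version strings are invented for the example.

```python
def version_key(text):
    """Turn a purely numeric dotted version into a tuple that sorts correctly."""
    return tuple(int(part) for part in text.split("."))


# Lettered versions, sorted as plain strings: the final 1.0 release
# ends up *before* its own beta, pre and rc tarballs.
lettered = ["1.0rc1", "1.0", "1.0pre3", "1.0beta2"]
naive_order = sorted(lettered)

# Purely numeric versions with a nano number sort the way they were made:
# release, CVS, prerelease, next release.
numeric = ["0.8.4", "0.8.3.2", "0.8.3", "0.8.3.1"]
numeric_order = sorted(numeric, key=version_key)

if __name__ == "__main__":
    print(naive_order)    # ['1.0', '1.0beta2', '1.0pre3', '1.0rc1']
    print(numeric_order)  # ['0.8.3', '0.8.3.1', '0.8.3.2', '0.8.4']
```

Comparing integers rather than strings also avoids the classic trap where "1.10" sorts before "1.9".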

So anyway, far too long of an explanation, but here are my reasons. Are there drawbacks to this system?

Sadly, yes. There's one thing I am unhappy with. If you're about to increase your major.minor pair because you're going to release an x.y.0 release, you do a prerelease of the x.y-1.z series. This doesn't help you shake out bugs related to the minor being y and not y-1. But I've learned to live with it :)

Anyway, Rich, please point me to those other projects, I'm curious.

Please move along now.


Xiph

Filed under: Hacking — Thomas @ 16:37

16:37

Have committed some of my build patches to some of the Xiph modules now. I broke one small thing but luckily I was keeping a close eye on the mailing lists.

I've gained enough trust to do release engineering on the imminent libvorbis 1.1.0 release. I first fixed distcheck, which was broken due to the doc build setup (a common problem), then fixed some smaller issues, and now I'm going through the differences to check what to do about the library versioning.

Attended the Xiph IRC meeting again last night, but given how my ADSL resets every fifteen minutes and that the meeting is at 2AM, it was a waste of my time. I really should get that fixed.

