Present Perfect

Votes for talks at open source conferences

Filed under: Conference, Python — Thomas @ 12:53 pm

2013-5-7

I’ve never been a fan of voting for talks, because it tends to be poorly implemented under the guise of democracy. Of course, it’s easy for me to talk; I’ve never organized anything at that scale.

I’ll give two examples of why I feel this way, one of which triggered today’s blog post.

First off, my colleague Marek submitted a talk to DjangoCon. The talk was about how to use feat (a toolkit we wrote for live transcoding) to serve Django pages, but in such a way that they can use Deferreds, removing the concurrency bottleneck of one request at a time per process running Django.

To me, this is one of the most irritating design choices in Django: it was built synchronously from the ground up (which would have been fine in most places). But the fact that, when you get a request, you always have to respond to it synchronously (and block every other request to that process in the meantime) is a design choice that could easily have been avoided.

In our particular use case, it was really painful. If our website has to make an API request to some service we don’t control, and that request can easily take 30 seconds, each process’s throughput suddenly drops to two pages per minute. All the while, the server is just sitting there waiting.

Yes, you can throw RAM at the problem and start 30 times more processes; or thread out the API requests; or farm them out to Celery and do some back-and-forthing to see when the call is done; or apply any number of other workarounds for a fundamental design choice.

Since we like Twisted, we preferred to throw Twisted at the problem, and ended up with something that worked.
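
For those who haven’t seen the pattern, here is a minimal sketch of the asynchronous style in plain twisted.web. This is not the feat/Django integration the talk was about, and the upstream URL is made up; it just shows how a single slow API call doesn’t block the process from serving other requests:

from twisted.internet import reactor
from twisted.web import server, resource
from twisted.web.client import getPage


class SlowProxy(resource.Resource):
    isLeaf = True

    def render_GET(self, request):
        # Kick off the (possibly 30-second) upstream call without blocking.
        d = getPage("http://upstream.example.com/api")
        d.addCallback(self._succeeded, request)
        d.addErrback(self._failed, request)
        # Tell twisted.web we'll finish this response asynchronously.
        return server.NOT_DONE_YET

    def _succeeded(self, body, request):
        request.write(body)
        request.finish()

    def _failed(self, failure, request):
        request.setResponseCode(502)
        request.write("upstream error\n")
        request.finish()


reactor.listenTCP(8080, server.Site(SlowProxy()))
reactor.run()

While one request sits waiting for the upstream response, the reactor keeps accepting and answering other requests in the same process – exactly the bottleneck described above.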

Anyway, that’s a lot of setup to explain what the talk was about. Marek submitted the talk to DjangoCon, and honestly I didn’t expect it to get much traction because, when you’re inside Django, you think like Django, and you don’t really realize that this is a real problem. Most people who do realize it switch away to something else.

But to my surprise, Marek’s talk was the most-voted talk! I wish I could link to the results, but of course that vote site is no longer online.

I guess I expected that to mean he’d be presenting at DjangoCon this year. So I asked him today when his talk was, and he said “Oh, that’s right. I did not get accepted.”

Well, that was a surprise. Of course, the organizing committee reserves the right to make its own decisions – maybe they just didn’t like the talk. But if you ask your potential visitors to vote, you’d expect the most-voted talk to make it onto the schedule, no?

The feedback Marek got from them was surprising too, though. Their first response was that his talk was too similar to another talk, titled “How to combine JavaScript & Django in a smart way”. Now, I’m not a JavaScript expert, but from the title alone I can already tell that it’s very unlikely these two talks have much in common beyond the word ‘Django’.

After Marek refuted that point, their second reason was that they wanted more experienced speakers (though they never asked Marek about his experience). Their third reason was that the talk had already appeared at previous editions of DjangoCon US/EU (it’s unclear whether they meant his talk or the JavaScript one, but Marek’s definitely hadn’t, and we couldn’t find any mention of the other talk at previous conferences either; I’m also not sure why that would matter one way or the other. The email thread was in Polish, so I have to rely on Marek’s interpretation of it.)

Personally, my reaction would have been to complain to the organizers or the Django maintainers. Marek’s phlegmatic attitude was much better, though – after such an exchange, he simply doesn’t want to have anything to do with the conference.

He’s probably right – it’s hard to argue with someone who doesn’t want to invite you and is lying about the reasons.

The second example is BCNDevCon, a great conference here in Barcelona, organized by a guy who used to work for Flumotion and for whom I have enormous respect. I’ve never seen anyone build such a big conference in so little time.

He believes strongly in the democratic aspect, and as far as I can tell constructs the schedule solely based on the votes.

Sadly, I didn’t go to the last one, simply because I felt the talks that made it in were too obviously corporate. A lot of talks were about Microsoft products, and you could tell they won votes because people’s coworkers voted for them. I’m not saying that’s necessarily wrong – given that the organizer worked at our company and has friends here, I’m sure people from here presenting at his conference have also drummed up votes. It’s natural to do so. But there should be a way to balance that out.

I think the idea of voting is good, but the implementation matters too. Ideally, you would only want people who are actually going to show up to vote. I have no idea how you can ensure that, though. Do you ask people to pre-pay? Do you ask them to commit to paying if at least 50% of the talks they voted for make it into the final schedule, Kickstarter-style?

These two examples are at opposite extremes of voting. One conference completely disregards what people vote for; if I had voted or bought a ticket, I would feel lied to. Why waste the time of so many people? The other conference puts so much stock in the vote that I feel the final result was strongly skewed. I seriously doubt all those Windows 8 voters actually showed up.

Does anyone have good experiences with conference voting that did work? Feel free to share!

If I was 16 years younger…

Filed under: General — Thomas @ 10:30 pm

2013-5-3

I’d totally try and be the intern for pinboard.

The money is great for a summer job, but that’s not the important part. pinboard seems interesting: it’s a real service, and it’s (I assume) small enough to understand from top to bottom. Unlike, say, a Google Summer of Code project, you get to touch a real, existing service – and from what I can tell from the blog, you’d get to do it with a smart and funny guy.

You’ve got five weeks left; even if you’re in the middle of exams right now, apply!

(And if you do, why not add the features to merge and rename tags while you’re at it?)

measuring puppet

Filed under: puppet — Thomas @ 8:58 pm

2013-1-24

For one of our projects at work, we’ll soon be scaling the platform further, which will require deploying a bunch more machines. For this project we basically have four platforms: local dev, online dev, preproduction, and production.

Right now, each of these platforms has exactly one host. There are two puppetmasters: one for the dev platforms and one for the pre/pro platforms.

Since deploying a bunch more machines is going to mean a lot more puppet runs, I want to remove as much friction as I can from my puppet work. I do the runs manually as we upgrade platforms during deployment, and a run typically takes well over a minute. For me, that’s too long – it causes me to waste time, lose focus, task-switch, and forget I should be following up on puppet runs. It makes fine-tuning puppet modules a chore as I hack on them.

So I wanted to start by trimming some of the obvious fat before I segment my puppet config into separately testable pieces. I would have expected puppet apply to actually have something to help with that, but it doesn’t.

After thinking it through, I realized I wanted some kind of tool that would timestamp the output of puppet apply --debug, so I could see which steps take more time than others.

I wasn’t sure what to google for, but “timestamp stdout” brought up some results, and I hit on http://joeyh.name/code/moreutils/ which includes ‘ts’, a simple pipe filter that timestamps lines going to stdout.

That was almost good enough. What I really wanted, though, was to know how much time had elapsed since the previous line was printed. My perl is rusty, but I managed to quickly cook up a patch that makes it print incremental timestamps.
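
The actual patch is a handful of lines in ts’s perl; as a rough illustration of what incremental mode does, here’s an equivalent pipe filter sketched in Python (timestamp each line with the time elapsed since the previous one):

#!/usr/bin/env python
# Rough Python equivalent of 'ts -i': prefix each line read from stdin
# with the time elapsed since the previous line, as HH:MM:SS.microseconds.
import sys
import time

previous = time.time()
# iter/readline instead of 'for line in sys.stdin' avoids stdin's
# read-ahead buffering, which would make the timestamps lie.
for line in iter(sys.stdin.readline, ''):
    now = time.time()
    elapsed = now - previous
    previous = now
    hours, rest = divmod(elapsed, 3600)
    minutes, seconds = divmod(rest, 60)
    sys.stdout.write('%02d:%02d:%09.6f %s' % (hours, minutes, seconds, line))
    sys.stdout.flush()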

Now I can do a puppet run like this:

puppet apply --modulepath=`pwd`/modules:`pwd`/dev/modules manifests/site.pp --debug | ts -i "%H:%M:%.S" | egrep -B 1 "^00:00:0[^0]"
00:00:00.001066 debug: Executing 'test -e ../commit && ( test xorigin/master == `cat ../commit` || ( git fetch -a; test `git rev-parse --verify origin/master^0 | head -n 1` == `cat ../commit` ) )'
00:00:02.646908 debug: Service[postfix](provider=redhat): Executing '/sbin/service postfix status'
--
00:00:00.000987 debug: Executing 'test -e ../commit && ( test xorigin/release-1.4.x == `cat ../commit` || ( git fetch -a; test `git rev-parse --verify origin/release-1.4.x^0 | head -n 1` == `cat ../commit` ) )'
00:00:01.871258 debug: Exec[git-checkout-/var/www/partner-test](provider=posix): Executing check 'test -e ../commit && ( test xorigin/release-1.4.x == `cat ../commit` || ( git fetch -a; test `git rev-parse --verify origin/release-1.4.x^0 | head -n 1` == `cat ../commit` ) )'
00:00:00.000942 debug: Executing 'test -e ../commit && ( test xorigin/release-1.4.x == `cat ../commit` || ( git fetch -a; test `git rev-parse --verify origin/release-1.4.x^0 | head -n 1` == `cat ../commit` ) )'
00:00:01.886606 debug: Prefetching aliases resources for mailalias
--
00:00:00.000957 debug: Executing '/usr/sbin/semanage fcontext -l | grep -q '^/home/git/dev(/.*)?''
00:00:01.750281 debug: /Stage[main]/Dexter::Apache/Selinux::Set_fcontext[home-httpd]/Exec[semanage-/home/git/dev-httpd_sys_content_t]/unless: /usr/sbin/semanage: Broken pipe
--
00:00:00.000855 debug: Executing 'test -e ../commit && ( test xorigin/release-1.4.x == `cat ../commit` || ( git fetch -a; test `git rev-parse --verify origin/release-1.4.x^0 | head -n 1` == `cat ../commit` ) )'
00:00:02.064475 debug: /Schedule[puppet]: Skipping device resources because running on a host
--
00:00:00.001048 debug: Executing '/usr/sbin/semanage fcontext -l | grep -q '^/srv/merchant(/.*)?''
00:00:01.750129 debug: /Stage[main]/Partner::Install/Selinux::Set_fcontext[srv-merchant-httpd]/Exec[semanage-/srv/merchant-httpd_sys_content_t]/unless: /usr/sbin/semanage: Broken pipe
--
00:00:00.000861 debug: Executing 'test -e ../commit && ( test xmaster == `cat ../commit` || ( git fetch -a; test `git rev-parse --verify master^0 | head -n 1` == `cat ../commit` ) )'
00:00:01.841316 debug: Exec[git-checkout-/var/www/merchant](provider=posix): Executing check 'test -e ../commit && ( test xorigin/release-1.4.x == `cat ../commit` || ( git fetch -a; test `git rev-parse --verify origin/release-1.4.x^0 | head -n 1` == `cat ../commit` ) )'
00:00:00.000955 debug: Executing 'test -e ../commit && ( test xorigin/release-1.4.x == `cat ../commit` || ( git fetch -a; test `git rev-parse --verify origin/release-1.4.x^0 | head -n 1` == `cat ../commit` ) )'
00:00:01.858206 debug: Service[httpd](provider=redhat): Executing '/sbin/service httpd status'

Some explanation is in order.

The puppet apply is straightforward if you know puppet a little – it will apply a manifest and spit out a lot of debug info.

The output gets piped into ts, which will do incremental timestamping (with -i, which is what my patch adds) according to the specified format (ts by default uses seconds precision, but can do microsecond precision if you use %.S in the format).

Then I grep for all lines that took at least one second to appear, displaying the preceding line as well (puppet generates output either before or after a possibly long-running task, so the culprit is either the matched line or the one just before it).

In the first section, I doubt service postfix status is to blame, so it’s probably my convoluted git updating that takes too long. I need to rework that module so it doesn’t fetch on every run.

In the third section, semanage is to blame. Hm, maybe I need to find a different way to look up whether the particular fcontext rule I want to add is already there. I’ve considered converting it to facts, although that sounds like it would be stretching facts a little – that’s a lot of info to store in a fact.
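
For what it’s worth, if I did go the facts route despite the size concern, newer facter versions can load external facts: any executable dropped into facts.d that prints key=value lines. A hypothetical sketch (the fact name and the parsing are made up, not a tested module):

#!/usr/bin/env python
# Hypothetical external fact for facter's facts.d: run semanage once per
# facter invocation and expose every configured fcontext path as a single
# comma-separated fact, so manifests can test membership instead of
# shelling out to semanage for each resource.
import subprocess

proc = subprocess.Popen(['/usr/sbin/semanage', 'fcontext', '-l'],
                        stdout=subprocess.PIPE)
output, _ = proc.communicate()
# Skip the header line; the first column is the fcontext path spec.
paths = [line.split()[0] for line in output.splitlines()[1:] if line.strip()]
print('selinux_fcontexts=%s' % ','.join(sorted(set(paths))))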

The others are repeats of both, so I know where to start trimming the fat now!

And when all the items over one second are gone, it’ll be time to shave off more below that.

If you want to try out ts with incremental timestamping, it’s available in the rebuilt moreutils rpm in my package repositories for CentOS 6 and F16/17/18.

If any puppetmaster (hah!) has good tips on how to debug and measure the catalog generation step (the one on the master), let me know!

mach 1.0.2 “ears” released

Filed under: mach, Releases — Thomas @ 10:32 pm

2013-1-22

Another Fedora, another mach release. This release fixes a minor bug and adds support for Fedora 18.

Get the source, update from my repository, or wait until updates hit the Fedora repository.

Happy packaging!

morituri 0.2.0 “ears” released

Filed under: morituri, Releases — Thomas @ 11:45 pm

2013-1-20

A new year, a new morituri release.

I was told that some people wanted to use morituri with a different log output, so I made the logger pluggable.
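
I won’t document the plugin API in this post, but the general shape of entry-point based plugins looks something like the sketch below; the group name and the log() signature here are illustrative assumptions, not necessarily morituri’s actual interface:

# Sketch of entry-point based logger plugins; the 'morituri.logger' group
# name and the log() method are assumptions for illustration only.
import pkg_resources


class FlatLogger(object):
    """An example pluggable logger that writes one line per track."""

    def log(self, ripResult):
        # ripResult is assumed to expose the ripped tracks.
        return ''.join('ripped %s\n' % track for track in ripResult.tracks)


def get_logger(name):
    # A plugin package registers its logger in setup.py, e.g.:
    #   entry_points={'morituri.logger': ['flat = myplugin:FlatLogger']}
    for entry in pkg_resources.iter_entry_points('morituri.logger', name):
        return entry.load()()
    raise KeyError('no logger named %r' % name)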

For my personal use, I have now gotten around to ripping all my singles and EPs, so instead of having a single with the same name as an album overwrite the album, I added template variables for the release type. I’ve also changed the default templates to use them, so if you were relying on the default template for your collection, you may want to either move those files or switch back to the previous default template.

morituri now has a config file, so once you’ve run rip offset find to find your drive’s offset, it will save the offset and automatically use it for ripping. The same goes for checking whether cdparanoia can defeat the drive’s audio cache. morituri stores these settings keyed on the drive’s identifying information rather than its device node, so they follow a USB drive around no matter where it gets plugged in.
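
The exact file format doesn’t matter much, but the idea of keying on drive identity instead of device node can be sketched like this (section and option names, and the example drive, are made up, not morituri’s real config layout):

# Hypothetical sketch: store the read offset per drive identity, so the
# same USB drive keeps its offset regardless of which /dev node it gets.
import os
import ConfigParser

path = os.path.expanduser('~/.config/morituri/morituri.conf')
config = ConfigParser.RawConfigParser()
config.read(path)  # silently skips a file that doesn't exist yet

def section_for(vendor, model, release):
    # Key on what the drive reports about itself, not on the device node.
    return 'drive:%s %s %s' % (vendor, model, release)

section = section_for('Slimtype', 'eSAU108', 'SL01')  # example identity
if not config.has_section(section):
    config.add_section(section)
    config.set(section, 'read_offset', '6')  # as found by 'rip offset find'
    with open(path, 'w') as handle:
        config.write(handle)

print(config.getint(section, 'read_offset'))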

See the trac page for more info and download links. You can also download it from my package repository for Fedora 17 and 18 if that’s your distro.

For the curious, here’s some more info:


This is morituri 0.2.0, "ears"

Coverage in 0.2.0: 67 % (1890 / 2807), 95 python tests

Features added in 0.2.0:

- added plugins system for logger
- added rip cd rip --logger to specify logger
- added reading speed, cdparanoia and cdrdao version to logger
- added rip drive analyze to detect whether we can defeat audio cache behaviour
- store drive offsets and cache defeating in config file
- rip drive list shows configured offset and audio cache defeating
- added rip image retag --release-id to specify the release id to tag with
- added %r/%R for release type to use in track/disc template
- added %x for extension to release template

Bugs fixed in 0.2.0:

- 89: Fails to rip track with \ in its name
- 105: Backslash in track names causes "Cannot find file" during rip
- 108: Unable to find offset / rip
- 109: KeyError when running "rip offset find"
- 111: Python traceback when config has no read offset for CD
- 76: morituri should allow for a configuration file
- 96: rip image retag: allow specification of release ID
- 107: Backslash in track name confuses AR step
- 112: add MusicBrainz lookup URL to generated logfile
