Present Perfect

Flumotion release

Filed under: Fluendo, Releases — Thomas @ 2004-12-17 18:45

Rolling both a GStreamer prerelease and a Flumotion release.

I'm pretty damn happy with the new Flumotion release - it has some very nice improvements. The most important one is the bundling of UI code that gets sent over the network, so that the admin client really is just a light shell that works everywhere.

We already had a basic concept of sending over UI, but now it's been cleaned up and it works a lot better.

Basically, the manager on a machine has a registry that tells it how a bunch of files in the local tree are to be "partitioned" into a set of "bundles". Each bundle is just a group of files that belong together (like, say, icons, glade files, and PyGTK code for changing colorbalance). Bundles can depend on each other (since, say, the tv card UI code depends on the colorbalance ui code). The union of all bundles represents the whole set of files that can be sent over the network.
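To make the partitioning concrete, here's a minimal sketch of the idea in Python - the names and layout are purely illustrative, not Flumotion's actual registry format:

# Hypothetical sketch of a bundle registry; illustrative names only.
BUNDLES = {
    "colorbalance-ui": {
        "files": ["flumotion/ui/colorbalance.py", "colorbalance.glade"],
        "depends": [],
    },
    "bttv-ui": {
        "files": ["flumotion/component/producers/bttv/admin_gtk.py"],
        "depends": ["colorbalance-ui"],  # tv card UI needs colorbalance UI
    },
}

def bundles_needed(name, seen=None):
    """Return a bundle and all of its dependencies, transitively."""
    if seen is None:
        seen = set()
    if name not in seen:
        seen.add(name)
        for dep in BUNDLES[name]["depends"]:
            bundles_needed(dep, seen)
    return seen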

The interesting part is that when the UI wants to show a page for a component, it asks the manager "I want to do this import. Give me everything I need". The manager replies with the list of bundle names needed, as well as md5sums for those bundles' .zip files. The admin checks locally which zip files it doesn't have yet (or which are outdated - hence the md5sums), and then requests all the bundles it needs.
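One way the admin-side check could work - a sketch, with made-up paths, using the unique-per-checksum cache dir described below:

import os

def bundles_to_fetch(needed, cache_dir):
    """needed: list of (bundle_name, md5sum) pairs from the manager."""
    missing = []
    for name, md5sum in needed:
        # the cache dir is unique per (name, md5sum), so an outdated
        # zip simply never matches and gets re-fetched
        path = os.path.join(cache_dir, "%s-%s" % (name, md5sum))
        if not os.path.isdir(path):
            missing.append(name)
    return missing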

When it has them, it extracts them locally into a cache directory, *in a unique dir based on bundle name and md5sum*. Then some Python magic is done so that you can import from the bundle namespace directly. So even if a file is extracted locally as "bttv-ui/a589fec...f3f/flumotion/component/producers/bttv/admin_gtk.py", you can just run "import flumotion.component.producers.bttv.admin_gtk" and it works.
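The extract-and-import step could look roughly like this - a sketch only, since Flumotion's real code uses custom import hooks to merge the partial package trees from several bundles, which plain sys.path manipulation doesn't do:

import os
import sys
import zipfile

def unpack_bundle(cache_dir, name, md5sum, zip_path):
    """Extract a bundle zip into a directory unique to (name, md5sum)."""
    target = os.path.join(cache_dir, "%s-%s" % (name, md5sum))
    if not os.path.isdir(target):
        zf = zipfile.ZipFile(zip_path)
        zf.extractall(target)
        zf.close()
    # Putting the bundle dir on sys.path is the simple-minded version
    # of the magic: a plain
    #   import flumotion.component.producers.bttv.admin_gtk
    # can then find the file inside the extracted tree.
    if target not in sys.path:
        sys.path.insert(0, target)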

Now, why all this caching ? Because you can run the same admin client against different versions of managers. So instead of having to download all the code each time, you just cache all downloaded instances, and regularly clean up old ones.

One of the nice things here is that just clicking on the component UI again automatically runs the newest code. Very handy for testing.

Anyway, the new Flumotion came out today. Give it a spin.

Flumotion hacking spree

Filed under: Fluendo — Thomas @ 2004-11-10 01:47

So we had some silly bugs in our 0.1.2 release that should have been easy fixes. And to some extent they were. But fixing the easy bits revealed a whole slew of other things that left me unsatisfied. So I dug deeper and deeper, at each step of the way thinking "It's just five more minutes and then it will be perfect". I ended up staying at work until half past midnight. Luckily the big guy stayed with me all the time offering moral support, or I would have gone home earlier.

I added a ton of debugging and error handling, making sure we handle a whole bunch of potential problems correctly. Now you should get a dialog for pretty much everything that can go wrong.

I need to rethink how we test the server, because obviously just trying random things is not going to work. Especially if it's the developer doing the random testing, because everyone knows as a developer you only test the things you know will work.

Anyway, after five hours of spare time hacking I'm pretty happy with the result, so tomorrow we do a release after some extensive testing. And then I think long and hard on how to fix some of the process since that's what I'm supposed to be doing.

2 months in

Filed under: General, Life, Work — Thomas @ 2015-01-03 23:54

Today is my two month monthiversary at my new job. I haven't had time so far to sit back and reflect and let people know, but now, while packing boxes for our upcoming move downtown, I welcome the distraction.

I dove into the black hole. I joined the borg collective. I'm now working for the little search engine that could.

I sure had my reservations while contemplating this choice. This is the first job I've had that I had to interview for - and quite a bit, I might add (though I have to admit that curiosity about the interviewing process is what made me go for the interviews in the first place - I wasn't even considering a different job at that time). My first job, a four month high school math teaching stint right after I graduated, was suggested to me by an ex-girlfriend, and I was immediately accepted after talking to the headmaster (that job is still a fond memory for many reasons). For my first real job, I informally chatted over dinner with one of the four founders, and then I started working for them without knowing if they were going to pay me. They ended up doing so by the end of the month, and that was that. The next job was offered to me over IRC, and from that Fluendo and Flumotion were born. None of these were through a standard job interview, and when I interviewed at Google I had much more experience on the other side of the interviewing table.

From a bunch of small startups to a company the scale of Google is a big step up, so that was my main reservation. Am I going to be able to adapt to a big company's way of working? On the other hand, I reasoned, I don't really know what it's like to work for a big company, and clearly Google is one of the best of those to work for. I'd rather try out working for a big company while I'm still considered relatively young job-market-wise, so I can rack up some experience with both sides of this coin during my professionally mobile years.

But I'm not going to lie either - seeing that giant curious machine from the inside, learning how they do things, being allowed to pierce the veil and peek behind the curtain - there was a curiosity here waiting to be satisfied. Does a company like this have all the big problems solved already? How do they handle things I've had to learn on the fly without anyone else to learn from? I was hiring and leading a small group of engineers - how does a company that big handle that on an industrial scale? How does a search query really work? How many machines are involved?

And Google is delivering in spades on that front. From the very first day, there's an openness and a sharing of information that I did not expect. (This also explains why I've always felt that people who joined Google basically disappeared into a black hole - in return for this openness, you are encouraged to swear yourself to secrecy towards the outside world. I'm surprised that that can work as an approach, but it seems to.) By day two we did our first commit (obviously nothing that goes to production, but still). In my first week I found the way to the elusive (to me, at least) rooftop terrace by searching through internal documentation. The view was totally worth it.

So far, in my first two months, I've only had good surprises. I think that's normal - even the noogler training itself tells you about the happiness curve, and how positive and excited you feel the first few months. It was easy to make fun of some of the perks from an outside perspective, but what you couldn't tell from that outside perspective is how these perks are just manifestations of common engineering sense on a company level. You get excellent free lunches so that you go eat with your teammates or run into colleagues and discuss things, without losing brain power on deciding where to go eat (I remember the spreadsheet we had in Barcelona for a while for bike lunch once a week) or losing too much time doing so (in Barcelona, all of the options in the office building were totally shit. If you cared about food it was not uncommon to be out of the office area for ninety minutes or more). You get snacks and drinks so that you know that's taken care of for you, and you don't have to leave your workplace to get them. There are hammocks and nap pods so you can take a nap and be refreshed in the afternoon. You get massage points for massages because a healthy body makes for a healthy mind. You get a health plan where the good options get subsidized because Google takes that same data-driven approach to HR and figured out how much they save by not having sick employees. None of these perks are altruistic as such, but there is also no pretense of them being so. They are just good business sense - keep your employees healthy, productive, focused on their work, and provide the best possible environment to do their best work in. I don't think I will ever make fun of free food perks again given that the food is this good, and possibly my favorite part of the day is the smoothie I pick up from the cafe on the way in every morning. It's silly, it's small, and they probably only do it so that I get enough vitamins to not get the flu in winter and miss work, but it works wonders on me and my morning mood.

I think the bottom line here is that you get treated as a responsible adult by default in this company. I remember silly discussions we had at Flumotion about developer productivity. Of course, that was just the breakdown of a conversation that inevitably stooped to the level of measuring hours worked as a proxy for developer productivity, simply because that's where any conversation on the subject ends up once it spirals out of control. Counting hours worked was the only thing that both sides of that conversation understood as a concept, and paying for hours worked was the only thing that both sides agreed on as a basic rule. But I still considered it a major personal fault to have let the conversation back then get to that point; it was simply too late by then to steer it back in the right direction. At Google? There is no discussion about hours worked, work schedule, expected productivity in terms of hours, or any of that. People get treated like responsible adults, are involved in their short-, mid- and long-term planning, feel responsible for their objectives, and allocate their time accordingly. I've come in really early and I've come in late (by some personal definition of "on time" that, ever since my second job 15 years ago, I was lucky enough to define as '10 AM'). I've left early on some days and stayed late on more days. I've seen people go home early, and I've seen people stay late on a Friday night so they could launch a benchmark that was going to run all weekend, so there'd be useful data on Monday. I asked my manager one time if I should let him know when I come in later because of a doctor's visit, and he told me he didn't need to know, but that it helps to put it on my calendar in case people want to schedule a meeting with me at that hour.

And you know what? It works. Getting this amount of respect by default, and seeing a standard to live up to set all around you - it just makes me want to work even harder to be worthy of that respect. I never had any trouble motivating myself to do work, but now I feel an additional external motivation, one this company has managed to create and maintain over the fifteen+ years they've been in business. I think that's an amazing achievement.

So far, so good, fingers crossed, touch wood and all that. It's quite a change from what came before, and it's going to be quite the ride. But I'm ready for it.

(On a side note - the only time my habit of wearing two different shoes was ever considered a no-no for a job was for my previous job - the dysfunctional one where they still owe me money, among other stunts they pulled. I think I can now empirically elevate my shoe habit to a litmus test for a decent job, and I should have listened to my gut on the last one. Live and learn!)

Votes for talks at open source conferences

Filed under: Conference, Python — Thomas @ 2013-05-07 12:53

I've never been a fan of voting for talks, because it tends to be poorly implemented under the guise of democracy. Of course it's easy for me to talk, I've never organized anything at that scale.

I'll give two examples of why I feel this way, one of which triggered today's blog post.

First off, my colleague Marek submitted a talk to DjangoCon. The talk was about how to use feat (a toolkit we wrote for live transcoding) to serve Django pages, but in such a way that they can use Deferreds to remove the concurrency bottleneck of "1 request at a time" per process running Django.

Personally, this is one of the most irritating design choices of Django - from the ground up it was built synchronously (which would have been fine in most places). But the fact that, when you get a request, you always have to respond to it synchronously (and block every other request for that process in the meantime) is a design choice that could easily have been avoided.

In our particular use case, it was really painful. If our website has to do an API request to some other service we don't control that can easily take 30 seconds, our process throughput suddenly becomes 2 pages per minute. All the while, the server is sitting there waiting.

Yes, you can throw RAM at the problem and start 30 times more processes; or thread out API requests; or farm it out to Celery, and do some back-and-forthing to see when the call's done. Or do any other number of workarounds for a fundamental design choice.

Since we like Twisted, we preferred to throw Twisted at the problem, and ended up with something that worked.
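For the curious, the shape of the solution with plain twisted.web looks something like this sketch (feat's actual Django integration is more involved; the API URL here is made up):

from twisted.internet import reactor
from twisted.web.client import Agent, readBody
from twisted.web.resource import Resource
from twisted.web.server import Site, NOT_DONE_YET

class SlowApiPage(Resource):
    isLeaf = True

    def render_GET(self, request):
        # Kick off the slow external call; while this Deferred is
        # pending, the process is free to serve other requests.
        d = Agent(reactor).request(b"GET", b"http://api.example.com/slow")
        d.addCallback(readBody)

        def done(body):
            request.write(body)
            request.finish()

        def failed(failure):
            request.setResponseCode(502)
            request.finish()

        d.addCallbacks(done, failed)
        return NOT_DONE_YET

reactor.listenTCP(8080, Site(SlowApiPage()))
reactor.run()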

Anyway, that's a lot of setup to explain what the talk was about. Marek submitted the talk to DjangoCon, and honestly I didn't expect it to get much traction because, when you're inside Django, you think like Django, and you don't really realize that this is a real problem. Most people who do realize it switch away to something else.

But to my surprise, Marek's talk was the most-voted talk! I wish I could link to the results, but of course that vote site is no longer online.

I guess I expected that would mean he'd be presenting at DjangoCon this year. So I asked him today when his talk was, and he said "Oh that's right. I did not get accepted."

Well, that was a surprise. Of course, the organising committee reserves the right to decide on its own - maybe they just didn't like the talk. But if you ask your potential visitors to vote, you'd expect the most-voted talk to make it onto the schedule, no ?

The feedback Marek got from them was surprising too, though. Their first response was that this talk was too similar to another talk, titled "How to combine JavaScript & Django in a smart way". Now, I'm not a JavaScript expert, but from the title alone I can already tell that it's very unlikely that these two talks have many similarities beyond the word 'Django'.

After refuting that point, their second reason was that they wanted more experienced speakers (though they never asked Marek about his experience), and their third reason was that the talk had been given at previous editions of DjangoCon US/EU (it's unclear whether they meant his talk or the JavaScript one, but Marek's definitely wasn't, and we couldn't find any mention of the other talk at previous conferences. I'm also not sure why that would even matter one way or the other. This email thread was in Polish, so I have to rely on Marek's interpretation of it).

Personally, my reaction would have been to complain to the organizers or the Django maintainers. Marek's phlegmatic attitude was much better though - after such an exchange, he simply doesn't want to have anything to do with the conference.

He's probably right - it's hard to argue with someone who doesn't want to invite you and is lying about the reasons.

The second example is BCNDevCon, a great conference here in Barcelona, organized by a guy who used to work at Flumotion and for whom I have enormous respect. I've never seen anyone create such a big conference in so little time.

He believes strongly in the democratic aspect, and as far as I can tell constructs the schedule solely based on the votes.

Sadly I didn't go to the last one, and the reason is simply that I felt the talks that made it were too obviously corporate. A lot of talks were about Microsoft products, and you could tell they won votes because people's coworkers voted for them. I'm not saying that's necessarily wrong - given that he worked at our company and has friends here, I'm sure people from here presenting at his conference have also drummed up votes among coworkers. It's natural to do so. But there should be a way to balance that out.

I think the idea of voting is good, but implementation matters too. Ideally, you would only want people who are actually going to show up to vote. I have no idea how you can ensure that, though. Do you ask people to pre-pay ? Do you ask them to commit to pay if at least 50% of their votes make it into the final schedule, Kickstarter-style ?

These two examples are at opposite extremes of voting. One conference simply disregards completely what people vote on. If I had voted or bought a ticket, I would feel lied to. Why waste the time of so many people? The other conference puts so much stock in the vote that I feel the final result was strongly affected. I seriously doubt all those Windows 8 voters actually showed up.

Does anyone have good experiences with conference voting that did work? Feel free to share!

More adventures in puppet

Filed under: General, Hacking, sysadmin — Thomas @ 2012-03-04 23:32

After last week's Linode incident I was getting a bit more worried about security than usual. That coincided with the fact that I found I couldn't run puppet on one of my linodes, and some digging turned up that it was because /tmp was owned by uid:gid 1000:1000. Since I didn't know the details of the break-in (and I hadn't slept more than 4 hours for two nights, one of which involved a Flumotion DVB problem), I had no choice but to be paranoid about it. And it took me a good half hour to realize that I had inflicted this problem on myself - a botched rsync command (rsync -arv . root@somehost:/tmp).

So I wasn't hacked, but I still felt I needed to tighten security a bit. So I thought I'd go with something simple to deploy using puppet - port knocking.

Now, that would be pretty easy to do if I just deployed firewall rules in a single set. But I started deploying firewall rules using the puppetlabs firewall module, which allows me to group rules per service. So that's the direction I wanted to head in.

On Saturday, I worked on remembering enough iptables to actually understand how port knocking works in a firewall. Among other things, I realized that our current port knocking is not ideal - it uses only two ports, in descending order, so a normal ascending port scan would not trigger it, but a scan in reverse order would. That is probably why most sources recommend using three ports, with the third port between the first two, so they're out of order.

So I wanted to start by getting the rules right, and understanding them. I started with this post, and found a few problems in it that I managed to work out. The fixed version is this:

UPLINK="p21p1"
#
# Comma-separated list of ports to protect, with no spaces.
SERVICES="22,3306"
#
# Location of iptables command
IPTABLES='/sbin/iptables'

# in stage1, connections on 3456 get added to the knock2 list
${IPTABLES} -N stage1
${IPTABLES} -A stage1 -m recent --remove --name knock
${IPTABLES} -A stage1 -p tcp --dport 3456 -m recent --set --name knock2

# in stage2, connections on 2345 get added to the heaven list
${IPTABLES} -N stage2
${IPTABLES} -A stage2 -m recent --remove --name knock2
${IPTABLES} -A stage2 -p tcp --dport 2345 -m recent --set --name heaven

# at the door:
# - jump to stage2 with a shot at heaven if you're on list knock2
# - jump to stage1 with a shot at knock2 if you're on list knock
# - get on the knock list by connecting to 1234
${IPTABLES} -N door
${IPTABLES} -A door -m recent --rcheck --seconds 5 --name knock2 -j stage2
${IPTABLES} -A door -m recent --rcheck --seconds 5 --name knock -j stage1
${IPTABLES} -A door -p tcp --dport 1234 -m recent --set --name knock

${IPTABLES} -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
${IPTABLES} -A INPUT -p tcp --match multiport --dport ${SERVICES} -i ${UPLINK} -m recent --rcheck --seconds 5 --name heaven -j ACCEPT
${IPTABLES} -A INPUT -p tcp --syn -j door

# close everything else
${IPTABLES} -A INPUT -j REJECT --reject-with icmp-port-unreachable

And it gives me this iptables state:
[screenshot: the resulting iptables ruleset]

So the next step was to reproduce these rules using puppet firewall rules.

Immediately I ran into the first problem - we need to add new chains, and there doesn't seem to be a way to do that with the firewall resource. At the same time, port knocking relies on the recent iptables module, and none of that is implemented either. I spent a bunch of hours trying to add this, but since I don't really know Ruby and I've only started using Puppet for real in the last two weeks, that wasn't working out well. So then I thought, why not look in the bug tracker and see if anyone else tried to do this ? I ask my chains question on IRC while I find a ticket about recent support. A minute later danblack replies on IRC with a link to a branch that supports creating chains - from the same person that made the recent branch.

This must be a sign - the same person helping me with my problem in two different ways, with two branches? Today will be a git-merging to-the-death hacking session, fueled by the leftovers of yesterday's mexicaganza.

I start with the branch that lets you create chains, which works well enough, bar some documentation issues. I create a new branch and merge this one on, ending up in a clean rebase.

Next is the recent branch. I merge that one on. I choose to merge in this case, because I hope it will be easier to make the fixes needed in both branches, but still pull everything together on my portknock branch, and merge in updates every time.

This branch has more issues - rake test doesn't even pass. So I start digging through the failing testcases, adding print debugs and learning just enough ruby to be dangerous.

I slowly get better at fixing bugs. I create minimal .pp files in my /etc/puppet/manifests so I can test just one rule with e.g. puppet apply manifests/recent.pp.

The firewall module hinges on being able to convert a rule to a hash as expressed in puppet, and back again, so that puppet can know that a rule is already present and does not need to be executed. I add a conversion unit test for each of the features that tests these basic operations, but I end up actually fixing the bugs by sprinkling prints and testing with a single apply.

I learn to do service iptables restart; service iptables stop to reset my firewall and start cleanly. It takes me a while to realize when I've botched the firewall so badly that I can't even google (in my case, forgetting to have -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT) - not helped by the fact that for the last two weeks the network on my home desktop has been really flaky, and simply stops working after some activity, forcing me to restart NetworkManager and reload network modules.

I start getting an intuition for how puppet's basic resource model works. For example, if a second puppet run produces output, something's wrong. I end up fixing lots of parsing bugs because of that - once I notice that a run tells me something like

notice: /Firewall[999 drop all other requests]/chain: chain changed '-p' to 'INPUT'
notice: Firewall[999 drop all other requests](provider=iptables): Properties changed - updating rule

I know that, even though the result seems to work, I have some parsing bug, and I can attack that bug by adding another unit test and adding more prints for a simple rule.

I learn that, even though the run may seem clean, if the module didn't figure out that it already had a rule (again, because of bogus parsing), it just adds the same rule again - another thing we don't want. That gets fixed on a few branches too.

And then I get to the point where my puppet apply brings all the rules together - except it still does not work. And I notice one little missing rule: ${IPTABLES} -A INPUT -p tcp --syn -j door

And I learn about --syn, and --tcp-flags, and to my dismay, there is no support for tcp-flags anywhere. There is a ticket for TCP flags matching support, but nobody worked on it.

So I think, how hard can it be, with everything I've learned today? And I get onto it. It turns out it's harder than expected. Before today, all firewall resource properties swallowed exactly one argument - for example, -p (proto). In the recent module, some properties are flags, and don't have an argument, so I had to support that with some hacks.

The rule_to_hash function works by taking an iptables rule line, and stripping off the parameters from the back in reverse order one by one, but leaving the arguments there. At the end, it has a list of keys it saw, and hopefully, a string of arguments that match the keys, but in reverse order. (I would have done this by stripping the line of both parameter and argument(s) and putting those on a list, but that's just me)
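Sketched in Python for brevity (the module itself is Ruby), and assuming every parameter takes exactly one argument, the idea is:

# Illustrative sketch of the rule_to_hash approach; KNOWN is a small
# subset of the parameters the real module understands.
KNOWN = ["-A", "-p", "--dport", "-m", "--name", "-j"]

def rule_to_hash(line):
    tokens = line.split()
    keys = []
    # walk the rule back to front, pulling out parameter flags but
    # leaving their arguments in place
    for i in range(len(tokens) - 1, -1, -1):
        if tokens[i] in KNOWN:
            keys.append(tokens.pop(i))
    # keys were collected in reverse order; the leftover tokens are the
    # arguments, still in forward order, so pair them up accordingly
    return dict(zip(reversed(keys), tokens))

print(rule_to_hash("-A INPUT -p tcp --dport 22 -j ACCEPT"))
# {'-A': 'INPUT', '-p': 'tcp', '--dport': '22', '-j': 'ACCEPT'}

The one-argument assumption is exactly what breaks down next.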

But the --tcp-flags parameter takes two arguments - a mask of flags, and a list of flags that need to be set. So I hack it in by adding double quotes around it, so it looks the same way a --comment does (except --comment is always quoted in iptables --list-rules output), and handle it specially. But after some fiddling, that works too!

And my final screenshot for the day:
[screenshot: the final iptables ruleset, port knocking in place]

So, today's result: a working node definition that implements port knocking:

node 'ana' {

  $port1 = '1234'
  $port2 = '3456'
  $port3 = '2345'

  $dports = [22, 3306]

  $seconds = 5

  firewall { "000 accept all icmp requests":
    proto  => "icmp",
    action => "accept",
  }

  firewall { "001 accept all established connections":
    proto  => "all",
    state  => ["RELATED", "ESTABLISHED"],
    action => "accept",
  }

  firewall { "999 drop all other requests":
    chain  => "INPUT",
    proto  => "tcp",
    action => "reject",
  }

  firewallchain { [':stage1:', ':stage2:', ':door:']:
  }

  # door
  firewall { "098 knock2 goes to stage2":
    chain          => "door",
    recent_command => "rcheck",
    recent_name    => "knock2",
    recent_seconds => $seconds,
    jump           => "stage2",
    require        => [
      Firewallchain[':door:'],
      Firewallchain[':stage2:'],
    ],
  }

  firewall { "099 knock goes to stage1":
    chain          => "door",
    recent_command => "rcheck",
    recent_name    => "knock",
    recent_seconds => $seconds,
    jump           => "stage1",
    require        => [
      Firewallchain[':door:'],
      Firewallchain[':stage1:'],
    ],
  }

  firewall { "100 knock on port $port1 sets knock":
    chain          => "door",
    proto          => 'tcp',
    recent_name    => 'knock',
    recent_command => 'set',
    dport          => $port1,
    require        => Firewallchain[':door:'],
  }

  # stage 1
  firewall { "101 stage1 remove knock":
    chain          => "stage1",
    recent_name    => "knock",
    recent_command => "remove",
    require        => Firewallchain[':stage1:'],
  }

  firewall { "102 stage1 set knock2 on $port2":
    chain          => "stage1",
    recent_name    => "knock2",
    recent_command => "set",
    proto          => "tcp",
    dport          => $port2,
    require        => Firewallchain[':stage1:'],
  }

  # stage 2
  firewall { "103 stage2 remove knock":
    chain          => "stage2",
    recent_name    => "knock",
    recent_command => "remove",
    require        => Firewallchain[':stage2:'],
  }

  firewall { "104 stage2 set heaven on $port3":
    chain          => "stage2",
    recent_name    => "heaven",
    recent_command => "set",
    proto          => "tcp",
    dport          => $port3,
    require        => Firewallchain[':stage2:'],
  }

  # let people in heaven
  firewall { "105 heaven let connections through":
    chain          => "INPUT",
    proto          => "tcp",
    recent_command => "rcheck",
    recent_name    => "heaven",
    recent_seconds => $seconds,
    dport          => $dports,
    action         => accept,
    require        => Firewallchain[':stage2:'],
  }

  firewall { "106 connection initiation to door":
    # FIXME: specifying chain explicitly breaks insert_order !
    chain     => "INPUT",
    proto     => "tcp",
    tcp_flags => "FIN,SYN,RST,ACK SYN",
    jump      => "door",
    require   => Firewallchain[':door:'],
  }
}

and I can log in with

nc -w 1 ana 1234; nc -w 1 ana 3456; nc -w 1 ana 2345; ssh -A ana
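The same knock can also be scripted - a minimal sketch in Python, with host and ports matching the node definition above:

import socket

HOST = "ana"
KNOCKS = [1234, 3456, 2345]  # deliberately out of order

for port in KNOCKS:
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(1)
    try:
        s.connect((HOST, port))  # the SYN is all the firewall needs to see
    except socket.error:
        pass  # the port is closed; the attempt itself is the knock
    finally:
        s.close()
# then ssh within the five-second window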

Lessons learned today:

  • watch iptables -nvL is an absolutely excellent way of learning more about your firewall - you see your rules and the traffic on them in real time. It made it really easy to see for example the first nc command triggering the knock.
  • Puppet is reasonably hackable - I was learning quickly as I progressed through test after test and bug after bug.
  • I still don't like ruby, and we may never be friends, but at least it's something I'm capable of learning. Puppet might just end up being the trigger.

Tomorrow, I need to clean up the firewall rules into something reusable, and deploy it on the platform.
