Present Perfect

More adventures in puppet

Filed under: General,Hacking,sysadmin — Thomas @ 23:32

2012-03-04

After last week's Linode incident I was getting a bit more worried about security than usual. That coincided with discovering I couldn't run puppet on one of my linodes, and some digging turned up that it was because /tmp was owned by uid:gid 1000:1000. Since I didn't know the details of the break-in (and I hadn't slept more than 4 hours for two nights, one of which involved a Flumotion DVB problem), I had no choice but to be paranoid about it. And it took me a good half hour to realize that I had inflicted this problem on myself - a botched rsync command (rsync -arv . root@somehost:/tmp).

So I wasn't hacked, but I still felt I needed to tighten security a bit. I thought I'd start with something simple to deploy using puppet - port knocking.

Now, that would be pretty easy to do if I just deployed the firewall rules as a single set. But I had started deploying firewall rules using the puppetlabs firewall module, which allows me to group rules per service, so that's the direction I wanted to head in.

On Saturday, I worked on remembering enough iptables to actually understand how port knocking works in a firewall. Among other things, I realized that our current port knocking is not ideal - it uses only two ports. They're in descending order, so a normal port scan would not trigger them, but a scan in reverse order would. That is probably why most sources recommend using three ports, with the third port between the first two, so the sequence is out of order either way.

So I wanted to start by getting the rules right, and understanding them. I started with this post, and found a few problems in it that I managed to work out. The fixed version is this:

UPLINK="p21p1"
#
# Comma separated list of ports to protect, with no spaces.
SERVICES="22,3306"
#
# Location of iptables command
IPTABLES='/sbin/iptables'

# in stage1, connects on 3456 get added to knock2 list
${IPTABLES} -N stage1
${IPTABLES} -A stage1 -m recent --remove --name knock
${IPTABLES} -A stage1 -p tcp --dport 3456 -m recent --set --name knock2

# in stage2, connects on 2345 get added to heaven list
${IPTABLES} -N stage2
${IPTABLES} -A stage2 -m recent --remove --name knock2
${IPTABLES} -A stage2 -p tcp --dport 2345 -m recent --set --name heaven

# at the door:
# - jump to stage2 with a shot at heaven if you're on list knock2
# - jump to stage1 with a shot at knock2 if you're on list knock
# - get on knock list if connecting to 1234
${IPTABLES} -N door
${IPTABLES} -A door -m recent --rcheck --seconds 5 --name knock2 -j stage2
${IPTABLES} -A door -m recent --rcheck --seconds 5 --name knock -j stage1
${IPTABLES} -A door -p tcp --dport 1234 -m recent --set --name knock

${IPTABLES} -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
${IPTABLES} -A INPUT -p tcp --match multiport --dport ${SERVICES} -i ${UPLINK} -m recent --rcheck --seconds 5 --name heaven -j ACCEPT
${IPTABLES} -A INPUT -p tcp --syn -j door

# close everything else
${IPTABLES} -A INPUT -j REJECT --reject-with icmp-port-unreachable

And it gives me this iptables state:
[screenshot: resulting iptables state]
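
With those rules loaded, the knock progress can be watched by peeking at the recent match's state files (a quick check, not part of the setup; this assumes the xt_recent module, which exposes its lists under /proc/net/xt_recent on current kernels - older ipt_recent kernels used /proc/net/ipt_recent - and somehost stands in for the protected machine):

# on the client: send the three knocks in order, using the ports from the script above
nc -w 1 somehost 1234
nc -w 1 somehost 3456
nc -w 1 somehost 2345

# on the server: the client should now be on the heaven list
# (its knock/knock2 entries get cleaned up by the --remove rules as it advances)
cat /proc/net/xt_recent/heaven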

So the next step was to reproduce these rules using puppet firewall rules.

Immediately I ran into the first problem - we need to add new chains, and there doesn't seem to be a way to do that in the firewall resource. At the same time, it uses the recent iptables module, and none of that is implemented either. I spent a bunch of hours trying to add this, but since I don't really know Ruby and I've only started using Puppet for real in the last two weeks, that wasn't working out well. So then I thought, why not look in the bug tracker and see if anyone else tried to do this ? I ask my chains question on IRC, while I find a ticket about recent support. A minute later danblack replies on IRC with a link to a branch that supports creating chains - the same person that made the recent branch.

This must be a sign - the same person helping me with my problem in two different ways, with two branches? Today will be a git-merging to-the-death hacking session, fueled by the leftovers of yesterday's mexicaganza.

I start with the branch that lets you create chains, which works well enough, bar some documentation issues. I create a new branch and merge this one on, ending up in a clean rebase.

Next is the recent branch. I merge that one on. I choose to merge in this case, because I hope it will be easier to make the fixes needed in both branches, but still pull everything together on my portknock branch, and merge in updates every time.

This branch has more issues - rake test doesn't even pass. So I start digging through the failing test cases, adding print debugs and learning just enough Ruby to be dangerous.

I slowly get better at fixing bugs. I create minimal .pp files in my /etc/puppet/manifests so I can test just one rule with e.g. puppet apply manifests/recent.pp
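
Such a minimal manifest looked more or less like this (a reconstructed sketch; the recent_* parameter names are the ones from the branch I'm testing, not from any released version of the module):

# write a one-rule manifest and apply it twice; the second run should be silent
cat > /etc/puppet/manifests/recent.pp <<'EOF'
firewall { "100 recent test rule":
  chain          => "INPUT",
  proto          => "tcp",
  dport          => "1234",
  recent_command => "set",
  recent_name    => "knock",
}
EOF

puppet apply /etc/puppet/manifests/recent.pp
puppet apply /etc/puppet/manifests/recent.pp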

The firewall module hinges on being able to convert a rule to a hash as expressed in puppet, and back again, so that puppet can know that a rule is already present and does not need to be executed. I add a conversion unit test for each of the features, testing these basic operations, but I end up actually fixing the bugs by sprinkling prints and testing with a single apply.

I learn to do service iptables restart; service iptables stop to reset my firewall and start cleanly. It takes me a while to realize when I've botched the firewall so badly that I can't even google (in my case, by forgetting -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT) - not helped by the fact that for the last two weeks the network on my home desktop has been really flaky, simply stopping after some activity and forcing me to restart NetworkManager and reload network modules.

I start getting an intuition for how puppet's basic resource model works. For example, if a second puppet run produces output, something's wrong. I end up fixing lots of parsing bugs because of that - once I notice that a run tells me something like

notice: /Firewall[999 drop all other requests]/chain: chain changed '-p' to 'INPUT'
notice: Firewall[999 drop all other requests](provider=iptables): Properties changed - updating rule

I know that, even though the result seems to work, I have some parsing bug, and I can attack that bug by adding another unit test and adding more prints for a simple rule.

I learn that, even though the run may seem clean, if the module didn't figure out that it already had a rule (again, because of bogus parsing), it just adds the same rule again - another thing we don't want. That gets fixed on a few branches too.
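
A cheap way to spot those silently duplicated rules (just a shell habit of mine, not something the module provides) is to look for identical lines in the saved ruleset:

# print any rule that appears more than once; after a correct puppet run this is empty
iptables-save | grep '^-A' | sort | uniq -d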

And then I get to the point where my puppet apply brings all the rules together - except it still does not work. And I notice one little missing rule: ${IPTABLES} -A INPUT -p tcp --syn -j door

And I learn about --syn, and --tcp-flags, and to my dismay, there is no support for tcp-flags anywhere. There is a ticket for TCP flags matching support, but nobody worked on it.
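
For reference, --syn is simply shorthand for a tcp-flags match that inspects FIN, SYN, RST and ACK and requires only SYN to be set, so these two rules match the same packets - new connection attempts:

# shorthand form
iptables -A INPUT -p tcp --syn -j door
# explicit form: of the flags FIN,SYN,RST,ACK, only SYN may be set
iptables -A INPUT -p tcp --tcp-flags FIN,SYN,RST,ACK SYN -j door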

So I think, how hard can it be, with everything I've learned today? And I get onto it. It turns out it's harder than expected. Before today, all firewall resource properties swallowed exactly one argument - for example, -p (proto). In the recent module, some properties are flags, and don't have an argument, so I had to support that with some hacks.

The rule_to_hash function works by taking an iptables rule line, and stripping off the parameters from the back in reverse order one by one, but leaving the arguments there. At the end, it has a list of keys it saw, and hopefully, a string of arguments that match the keys, but in reverse order. (I would have done this by stripping the line of both parameter and argument(s) and putting those on a list, but that's just me)

But the --tcp-flags parameter takes two arguments - a mask of flags, and a list of flags that need to be set. So I hack it in by adding double quotes around it, so it looks the same way a --comment does (except --comment is always quoted in iptables --list-rules output), and handle it specially. But after some fiddling, that works too!

And my final screenshot for the day:
[screenshot]

So, today's result: a working node definition that implements port knocking:

node 'ana' {

  $port1 = '1234'
  $port2 = '3456'
  $port3 = '2345'

  $dports = [22, 3306]

  $seconds = 5

  firewall { "000 accept all icmp requests":
    proto  => "icmp",
    action => "accept",
  }

  firewall { "001 accept all established connections":
    proto  => "all",
    state  => ["RELATED", "ESTABLISHED"],
    action => "accept",
  }

  firewall { "999 drop all other requests":
    chain  => "INPUT",
    proto  => "tcp",
    action => "reject",
  }

  firewallchain { [':stage1:', ':stage2:', ':door:']:
  }

  # door
  firewall { "098 knock2 goes to stage2":
    chain          => "door",
    recent_command => "rcheck",
    recent_name    => "knock2",
    recent_seconds => $seconds,
    jump           => "stage2",
    require        => [
      Firewallchain[':door:'],
      Firewallchain[':stage2:'],
    ]
  }

  firewall { "099 knock goes to stage1":
    chain          => "door",
    recent_command => "rcheck",
    recent_name    => "knock",
    recent_seconds => $seconds,
    jump           => "stage1",
    require        => [
      Firewallchain[':door:'],
      Firewallchain[':stage1:'],
    ]
  }

  firewall { "100 knock on port $port1 sets knock":
    chain          => "door",
    proto          => 'tcp',
    recent_name    => 'knock',
    recent_command => 'set',
    dport          => $port1,
    require        => [
      Firewallchain[':door:'],
    ]
  }

  # stage 1
  firewall { "101 stage1 remove knock":
    chain          => "stage1",
    recent_name    => "knock",
    recent_command => "remove",
    require        => Firewallchain[':stage1:'],
  }

  firewall { "102 stage1 set knock2 on $port2":
    chain          => "stage1",
    recent_name    => "knock2",
    recent_command => "set",
    proto          => "tcp",
    dport          => $port2,
    require        => Firewallchain[':stage1:'],
  }

  # stage 2
  firewall { "103 stage2 remove knock":
    chain          => "stage2",
    recent_name    => "knock",
    recent_command => "remove",
    require        => Firewallchain[':stage2:'],
  }

  firewall { "104 stage2 set heaven on $port3":
    chain          => "stage2",
    recent_name    => "heaven",
    recent_command => "set",
    proto          => "tcp",
    dport          => $port3,
    require        => Firewallchain[':stage2:'],
  }

  # let people in heaven
  firewall { "105 heaven let connections through":
    chain          => "INPUT",
    proto          => "tcp",
    recent_command => "rcheck",
    recent_name    => "heaven",
    recent_seconds => $seconds,
    dport          => $dports,
    action         => accept,
    require        => Firewallchain[':stage2:'],
  }

  firewall { "106 connection initiation to door":
    # FIXME: specifying chain explicitly breaks insert_order !
    chain     => "INPUT",
    proto     => "tcp",
    tcp_flags => "FIN,SYN,RST,ACK SYN",
    jump      => "door",
    require   => [
      Firewallchain[':door:'],
    ]
  }
}

and I can log in with

nc -w 1 ana 1234; nc -w 1 ana 3456; nc -w 1 ana 2345; ssh -A ana

Lessons learned today:

  • watch iptables -nvL is an absolutely excellent way of learning more about your firewall - you see your rules and the traffic on them in real time. It made it really easy to see for example the first nc command triggering the knock.
  • Puppet is reasonably hackable - I was learning quickly as I progressed through test after test and bug after bug.
  • I still don't like Ruby, and we may never be friends, but at least it's something I'm capable of learning. Puppet might just end up being the trigger.

Tomorrow, I need to clean up the firewall rules into something reusable, and deploy it on the platform.

GStreamer 0.11 Application Porting Hackfest

Filed under: Conference,Flumotion,GStreamer,Hacking,Open Source — Thomas @ 11:16

2012-01-26

I'm in the quiet town of Malaga these three days to attend the GStreamer hackfest. The goal is to port applications over to the 0.11 API, which will eventually become 1.0. There are about 18 people here, which is a good number for a hackfest.

The goal for me is to figure out everything that needs to be done to have Flumotion working with GStreamer 0.11. It looks like there is more work than expected, since some of the things we rely on haven't been ported successfully.

Luckily, back in the day we spent quite a bit of time layering the parts as cleanly as possible so they don't depend too much on each other. Essentially, Flumotion adds a layer on top of GStreamer where GStreamer pipelines can be run in different processes and on different machines, and be connected to each other over the network. To that end, the essential communication between elements is abstracted and wrapped inside a data protocol, so that raw bytes can be transferred from one process to another, and the other end ends up receiving those same GStreamer buffers and events.

First up, there is the GStreamer Data Protocol (GDP). Its job is to serialize buffers and events into a byte stream.

Second, there is the concept of streamheaders (which is related to the DELTA_UNIT flag in GStreamer). These are buffers that always need to be sent at the beginning of a new stream so that the buffers coming after them can be interpreted. In 0.10, that meant that at least a GDP version of the caps needed to be in the streamheader (because the other side cannot interpret a running stream without its caps), and in more recent versions a new-segment event as well. These streamheaders are analogous to the new sticky event concept in 0.11 - some events, like CAPS and TAG and SEGMENT, are now sticky to the pad, which means that a new element connected to that pad will always see those events and can make sense of the new data it's getting.

Third, the actual network communication is done using the multifdsink element (and an fdsrc element on the other side). This element just receives incoming buffers, keeps them on a global buffer list, and sends all of them to the various clients added to it by file descriptor. It understands about streamheaders, and makes sure clients get the right ones for wherever they end up in the buffer list. It manages the buffers, the speed of clients, the bursting behaviour, ... It doesn't require GDP at all to work - Flumotion uses this element to stream Ogg, mp3, asf, flv, webm, ... to the outside world. But to send GStreamer buffers, it's as simple as adding a gdppay before multifdsink, and a gdpdepay after fdsrc. Also, at the same level, there are tcpserversink/tcpclientsrc and tcpclientsink/tcpserversrc elements that do the same thing over a simple TCP connection.
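
As a rough illustration of that layer (just a sketch, not how Flumotion itself wires things up), GDP over TCP can be tried with two gst-launch pipelines in 0.10:

# sender: serialize raw buffers, caps and events into GDP and serve them over TCP
gst-launch-0.10 -v audiotestsrc ! gdppay ! tcpserversink host=0.0.0.0 port=9000

# receiver: depayload GDP back into the original buffers and events and play them
gst-launch-0.10 tcpclientsrc host=127.0.0.1 port=9000 ! gdpdepay ! audioconvert ! autoaudiosink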

Fourth, there is an interface between multifdsink/fdsrc and Python. We let Twisted set up the connections, and then steal the file descriptors and hand those off to multifdsink and fdsrc. This makes it very easy to set up all sorts of connections (like, say, in SSL, or just pipes) and do things to them before streaming (like, for example, authentication). But by passing the actual file descriptor, we don't lose any performance - the low-level streaming is still done completely in C. This is a general design principle of Flumotion: use Python and Twisted for setup, teardown, and changes to the system, and where we need a lot of functionality and can sacrifice performance; but use C and GStreamer for the lower-level processor-intensive stuff, the things that happen in steady state, processing the signal.

So, there is work to do in GStreamer 0.11:

  • The GStreamer data protocol has not really been ported. gdppay/depay are still there, but don't entirely work.
  • streamheaders in those elements will need adapting to handle sticky events.
  • multifdsink was moved to -bad and left with broken unit tests. There is now multisocketsink. But sadly it looks like GSocket isn't meant to handle pure file descriptors (which we use in our component that records streams to disk, for example).
  • 0.11 doesn't have the traditional Python bindings. It uses gobject-introspection instead. That will need a lot of work on the Flumotion side, and ideally we would want to keep the codebase working against both 0.10 and 0.11, as we did for the 0.8->0.10 move. Apparently these days you cannot mix gi-style bindings with old-style bindings anymore, because they create separate class trees. I assume this also means we need to port the glib2/gtk2 reactors in Twisted to gobject-introspection.

So, there is a lot of work to be done it looks like. Luckily Andoni arrived today too, so we can share some work.

After discussing with Wim, Tim, and Sebastien, my plan is:

  1. create a common base class for multihandlesink, and refactor multisocketsink and multifdsink as subclasses of it
  2. create g_value_transform functions to bytestreams for basic objects like Buffers and Events
  3. use these transform functions as the basis for a new version of GDP, which we'll make typefindable this time around
  4. support sticky events
  5. ignore metadata for now, as it is not mandatory; although in the future we could let gdppay decide which metadata it wants to serialize, so the application can request to do so
  6. try multisocketsink as a transport for inside Flumotion and/or for the streaming components.
  7. In the latter case, do some stress testing - on our platform, we have pipelines with multifdsink running for months on end without crashing or leaking, sometimes going up to 10000 connections open.
  8. make the Twisted glib2/gtk2 reactors work with gobject-introspection
  9. prototype flumotion-launch with 0.11 code by using gir

That's probably not going to be finished over this week, but it's a good start. Last night I started by fixing the unit tests for multifdsink, and now I've started refactoring multisocketsink and multifdsink on top of that. I'll first try to write unit tests for multisocketsink though, to verify that I'm refactoring properly.

About an anime

Filed under: General — Thomas @ 20:42

2011-09-26

We had a problem with one of the encoders producing artifacts under certain conditions. It was hard to reproduce, but it usually happened on cartoons, so some of the web developers helped the core team out and spent half an hour watching anime, looking for where the artifacts were triggered.

The boss walked past when one of them was watching the cartoon. A week later, he informed the development manager that his guys were watching cartoons on the job. It wasn't his business, of course, but the boss thought he should know.

So the development manager, in his next sitdown with the developer, said: "Don't get upset, but I wanted to let you know that the boss has caught you watching anime at work..."

Needless to say the developer was rightfully upset, wondering how the boss could possibly think he was stupid enough to be watching cartoons for fun in plain sight at work...

About an intern

Filed under: General — Thomas @ 19:39

2011-09-19

Our company has a history of working with interns, thanks to our marketing manager. One day, our Operations Manager got an intern. He's easy-going and gets along with everyone in the company. The intern came for her first day and joined him in a bunch of meetings as he took the time to explain what sort of things the Operations department actually does.

At the end of the day, he spoke the now-famous words "Espero verte mañana" - I hope to see you tomorrow!

He didn't. She never came back!

Launching our new baby

Filed under: Conference,Flumotion,Open Source,Work — Thomas @ 11:01

2011-05-05

Well, the cat has been out of the bag for a few days and I have been too busy to blog about it.

But today as I wait for my team to do a final deploy fixing a bug with too-long URL names for Flash Media Encoder, I have some spare time to mention what's going on and make some people an offer they cannot refuse.

So, for the past half year or so we've been hacking away at a new service to solve a very specific problem in streaming. From 2005 to 2010 the streaming world mostly settled on Flash as a common platform, which was an unstable equilibrium for everyone involved, but it seemed to work. However, with the number of codecs, devices and platforms there are today, this equilibrium has been falling apart. The introduction of the iPhone, Microsoft's heavy pushing of Silverlight (paying companies to stream in it - and funnily enough those companies usually stop using Silverlight when the money faucet closes), GoogleTV, the introduction of WebM, the arrival of HTML5 (ironically pushed by Apple - yay - even though their HTML5 sites usually only work in Safari - boo)... all these movements served to upset the status quo once again.

To the casual observer, it would seem that all streaming has standardized on H264, and so transmuxing technologies are popping up - taking the same video encoding and just remuxing it for different technologies. However, in practice, H264 is a collection of many techniques and profiles with different levels of complexity, and not all devices support the same profiles and techniques. If you want to stream to all H264 devices with just one encoding, you'll have to settle for the least common denominator in terms of quality, and you'll have to pick a resolution that is subpar for all of them.

Now, content producers hate this sort of situation. They just want to get the signal out there, because that's what matters. The codec and the streaming is just the technological means to get it across the internet. And now the market is asking them to put a bunch of machines in their facilities, learn a lot of technologies they'd rather not worry about, consume heaps of bandwidth to send each version online, and then have to do it all over again each time something changes out there - a new codec, a new device, a new favorite resolution, ...

Our answer to this problem is simple: send us one encoding, we will do the rest. Our service will take your live stream, transcode it to as many different encodings as you want, and hand them off to a CDN. That's basically it. Want full HTML5 coverage ? We'll do it for you - H264 single and multibitrate, Theora, WebM, and a Flash fallback. Want Silverlight, Flash RTMP, Windows Media MMS ? All there.

Services like this already exist for on-demand - see Zencoder, encoding.com and Panda. Live is just inherently more difficult - you don't get to work with nice single finished files, and it has to happen right now. But this is exactly the sort of thing a framework like GStreamer is good for.

In reality we aren't doing anything new here - Flumotion runs a CDN that already provides this service to customers. The difference is that this time, you will be able to set it up yourself online. A standard integration time with any CDN is around two weeks. This service will cut that time down to five minutes. We're not quite there yet, but we're close.

What's that you say ? Something about an offer ? Oh, right. It's always pained me to see that, when we wanted to stream a conference for free, it was still quite a bit of work in the setup stage for our support team, and hence we didn't stream as many conferences as I would have liked to. Similarly, it pains me to see a lot of customers not even considering free formats.

So the offer is simple. If you are running an event or a conference that flies under a Free/Open banner, and you're willing to stream only in free formats (meaning, Theora and WebM), and you're willing to ride the rough wave of innovation as we shake out our last bugs, we want to help you out. Send us the signal, we'll do the rest. Drop me a line and let's see how we can set it up. Offer limited, standard handwavy disclaimers apply, you'll have to take my word for it, etc...

If you're in the streaming industry, I will be demoing this new service next week on Wednesday around 2.00 pm local time in New York City, at Streaming Media East. And after that our Beta program starts.

Feel free to follow our twitter feed and find us on Facebook somewhere, as the kids these days say...

Happy streaming!
