thomas.apestaart.org » Open Source

Evolution backup recovery

Filed under: Open Source — Thomas @ 15:36

2012-04-01

I pretty much never drink and hack, and last Friday evening is a good example of why. I was having a rare beer and managed to spill part of it on my keyboard and desk. So I turned the keyboard around and started cleaning it as fast as I could, forgetting to actually unplug it. I called it a night because nothing good was going to come from that night anymore.

And on Saturday morning I noticed that my INBOX was gone. Hm, is it really gone? Yep, gone from my laptop too. Crap, must have deleted it on the server by accident while cleaning my keyboard...

And because my NAS is a little full lately, I haven't been as diligent with backups as I normally have been. Hm, and the modest cache on my N900 isn't very useful either...

Luckily, evolution on my work machine was shut down for some reason, so yay, it has a reasonably fresh cache of my INBOX!

Except that it's not all that straightforward to actually get this cache back into Evolution. Just copying its contents to an existing or new folder doesn't do anything. The files themselves are split-up versions of the actual email, presumably because the evo guys thought it would be faster to search header and body by splitting them off from the attachments and saving them separately, inventing their own caching format. Which is fine, but makes it impossible to actually restore a backup with...

After lots of Googling, I stumbled upon this tool that did the trick for me. A lot of hours wasted over a bunch of emails... But what would happen if I really lost my IMAP server mail ? Run this script by hand on all the folders ? Shudder...

GStreamer 0.11 Application Porting Hackfest

Filed under: Conference,Flumotion,GStreamer,Hacking,Open Source — Thomas @ 11:16

2012-01-26

I'm in the quiet town of Malaga these three days to attend the GStreamer hackfest. The goal is to port applications over to the 0.11 API, which will eventually be 1.0. There are about 18 people here, which is a good number for a hackfest.

The goal for me is to figure out everything that needs to be done to have Flumotion working with GStreamer 0.11. It looks like there is more work than expected, since some of the things we rely on haven't been ported successfully.

Luckily back in the day we spent quite a bit of time to layer parts as best as possible so they don't depend too much on each other. Essentially, Flumotion adds a layer on top of GStreamer where GStreamer pipelines can be run in different processes and on different machines, and be connected to each other over the network. To that end, the essential communication between elements is abstracted and wrapped inside a data protocol, so that raw bytes can be transferred from one process to another, and the other end ends up receiving those same GStreamer buffers and events.

First up, there is the GStreamer Data protocol. Its job is to serialize buffers and events into a byte stream.
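To make "serialize buffers and events into a byte stream" concrete, here is a minimal sketch of that kind of framing. The frame layout, the constants and the function names are all made up for illustration; this is not the real GDP wire format.

```python
import struct

# Hypothetical frame layout, NOT the real GDP wire format:
# 1 byte kind, 8 bytes timestamp (ns), 4 bytes payload length, then payload.
HEADER = struct.Struct("!BQI")
KIND_BUFFER = 1
KIND_EVENT = 2


def pack_frame(kind, timestamp_ns, payload):
    """Serialize one buffer or event into bytes."""
    return HEADER.pack(kind, timestamp_ns, len(payload)) + payload


def unpack_frames(stream):
    """Yield (kind, timestamp_ns, payload) tuples from a byte string."""
    offset = 0
    while offset < len(stream):
        kind, timestamp_ns, length = HEADER.unpack_from(stream, offset)
        offset += HEADER.size
        yield kind, timestamp_ns, stream[offset:offset + length]
        offset += length


# One event followed by one buffer, concatenated as they would be on the wire.
wire = (pack_frame(KIND_EVENT, 0, b"new-segment")
        + pack_frame(KIND_BUFFER, 40000000, b"\x00\x01\x02"))
frames = list(unpack_frames(wire))
```

The receiving side only needs the byte stream to get back the same sequence of events and buffers, which is the property the rest of the design relies on.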

Second, there is the concept of streamheaders (which is related to the DELTA_UNIT flag in GStreamer). These are buffers that always need to be sent at the beginning of a new stream to be able to interpret the buffers coming after them. In 0.10, that meant that at least a GDP version of the caps needed to be in the streamheader (because the other side cannot interpret a running stream without its caps), and in more recent versions a new-segment event. These streamheaders are analogous to the new sticky event concept in 0.11 - some events, like CAPS and TAG and SEGMENT are now sticky to the pad, which means that a new element connected to that pad will always see those events to make sense of the new data it's getting.

Third, the actual network communication is done using the multifdsink element (and an fdsrc element on the other side). This element just receives incoming buffers, keeps them on a global buffer list, and sends all of them to the various clients added to it by file descriptor. It understands about streamheaders, and makes sure clients get the right ones for wherever they end up in the buffer list. It manages the buffers, the speed of clients, the bursting behaviour, ... It doesn't require GDP at all to work - Flumotion uses this element to stream Ogg, mp3, asf, flv, webm, ... to the outside world. But to send GStreamer buffers, it's as simple as adding a gdppay before multifdsink, and a gdpdepay after fdsrc. Also, at the same level, there are tcpserversink/tcpclientsrc and tcpclientsink/tcpserversrc elements that do the same thing over a simple TCP connection.
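The way streamheaders and the shared buffer list interact can be sketched with a toy model. This is not multifdsink's code or API; the class, its method names and the `max_buffers` parameter are hypothetical, and "sending" is modelled as appending to a per-client list instead of writing to a file descriptor.

```python
from collections import deque


class FanOutSink:
    """Toy model of a sink serving many clients from one shared buffer list."""

    def __init__(self, streamheaders, max_buffers=3):
        self.streamheaders = list(streamheaders)
        self.recent = deque(maxlen=max_buffers)  # shared list of recent buffers
        self.clients = {}  # client id -> list of bytes "sent" to it

    def add(self, client_id):
        # A new client first gets the streamheaders, then a burst of what
        # is still on the buffer list, so it can decode from there on.
        self.clients[client_id] = self.streamheaders + list(self.recent)

    def render(self, buf):
        # Every incoming buffer goes on the shared list and to all clients.
        self.recent.append(buf)
        for sent in self.clients.values():
            sent.append(buf)


sink = FanOutSink(streamheaders=[b"HDR"], max_buffers=2)
sink.add("early")
for buf in (b"b1", b"b2", b"b3"):
    sink.render(buf)
sink.add("late")
sink.render(b"b4")
```

The client that joins late never sees the first buffer, but it still starts with the streamheaders, which is what lets it make sense of the stream from the middle.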

Fourth, there is an interface between multifdsink/fdsrc and Python. We let Twisted set up the connections, and then steal the file descriptors and hand them off to multifdsink and fdsrc. This makes it very easy to set up all sorts of connections (like, say, in SSL, or just pipes) and do things to them before streaming (like, for example, authentication). But by passing the actual file descriptor, we don't lose any performance - the low-level streaming is still done completely in C. This is a general design principle of Flumotion: use Python and Twisted for setup, teardown, and changes to the system, and where we need a lot of functionality and can sacrifice performance; but use C and GStreamer for the lower-level processor-intensive stuff, the things that happen in steady state, processing the signal.
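The hand-off itself can be illustrated with nothing but the standard library. This is a sketch of the principle, not Flumotion code: a local socket pair stands in for the connection Twisted would set up, and `os.write()` stands in for the C code in multifdsink. It assumes a Unix-like system, where a socket's file descriptor can be written to directly.

```python
import os
import socket

# High-level setup: in Flumotion this is where Twisted would connect,
# authenticate, and so on. A local socket pair stands in for that here.
setup_side, peer = socket.socketpair()

# "Steal" the descriptor: detach() returns the raw fd and stops the
# Python socket object from managing or closing it.
fd = setup_side.detach()

# Low-level streaming: only the integer fd is needed from here on.
# os.write() is just a stand-in for the C streaming code.
os.write(fd, b"stream data")
received = peer.recv(1024)

# Whoever took the fd is now responsible for closing it.
os.close(fd)
peer.close()
```

After the `detach()` call the Python side is out of the data path entirely, which is why the setup can be as elaborate as needed without costing anything in steady state.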

So, there is work to do in GStreamer 0.11:

The GStreamer data protocol has not really been ported. gdppay/depay are still there, but don't entirely work.
streamheaders in those elements will need adapting to handle sticky events.
multifdsink was moved to -bad and left with broken unit tests. There is now multisocketsink. But sadly it looks like GSocket isn't meant to handle pure file descriptors (which we use in our component that records streams to disk, for example).
0.11 doesn't have the traditional Python bindings. It uses gobject-introspection instead. That will need a lot of work on the Flumotion side, and ideally we would want to keep the codebase working against both 0.10 and 0.11 as we did for the 0.8->0.10 move. Apparently these days you cannot mix gi-style binding with old-style binding anymore, because they create separate class trees. I assume this also means we need to port the glib2/gtk2 reactors in Twisted to using gobject-introspection.

So, there is a lot of work to be done it looks like. Luckily Andoni arrived today too, so we can share some work.

After discussing with Wim, Tim, and Sebastien, my plan is:

create a common base class for multihandlesink, and refactor multisocketsink and multifdsink as subclasses of it
create g_value_transform functions to bytestreams for basic objects like Buffers and Events
use these transform functions as the basis for a new version of GDP, which we'll make typefindable this time around
support sticky events
ignore metadata for now, as it is not mandatory; although in the future we could let gdppay decide which metadata it wants to serialize, so the application can request to do so
try multisocketsink as a transport for inside Flumotion and/or for the streaming components.
In the latter case, do some stress testing - on our platform, we have pipelines with multifdsink running for months on end without crashing or leaking, sometimes going up to 10000 connections open.
Make twisted reactors
prototype flumotion-launch with 0.11 code by using gir

That's probably not going to be finished over this week, but it's a good start. Last night I started by fixing the unit tests for multifdsink, and now I started refactoring multisocketsink and multifdsink with that. I'll first try and make unit tests for multisocketsink though, to verify that I'm refactoring properly.

Launching our new baby

Filed under: Conference,Flumotion,Open Source,Work — Thomas @ 11:01

2011-05-05

Well, the cat has been out of the bag for a few days and I have been too busy to blog about it.

But today as I wait for my team to do a final deploy fixing a bug with too-long URL names for Flash Media Encoder, I have some spare time to mention what's going on and make some people an offer they cannot refuse.

So, for the past half year or so we've been hacking away at a new service to solve a very specific problem in streaming. From 2005-2010 the streaming world mostly settled on Flash as a common platform, which was an unstable equilibrium for everyone involved, but it seemed to work. However, with the number of codecs, devices and platforms there are today, this equilibrium has been falling apart. The introduction of iPhone, Microsoft's heavy pushing of Silverlight (paying companies to stream in it - and funnily enough those companies usually stop using Silverlight when the money faucet closes), GoogleTV, the introduction of WebM, the arrival of HTML5 (ironically pushed by Apple - yay - even though their HTML5 sites usually only work in Safari - boo)... all these movements served to upset the status quo once again.

To the eye of the casual observer, it would seem that all streaming has standardized on H264, and so transmuxing technologies are popping up - taking the same video encoding and just remuxing it for different technologies. However, in practice, H264 is a collection of many techniques and profiles, different levels of complexity, and not all devices support the same profiles and techniques. If you want to stream to all H264 devices with just one encoding, you'll have to settle for the least common denominator in terms of quality, and you'll have to pick a resolution that works subpar for all of them.

Now, content producers hate this sort of situation. They just want to get the signal out there, because that's what matters. The codec and the streaming is just the technological means to get it across the internet. And now the market is asking them to put a bunch of machines in their facilities, learn a lot of technologies they'd rather not worry about, consume heaps of bandwidth to send each version online, and then have to do it all over again each time something changes out there - a new codec, a new device, a new favorite resolution, ...

Our answer to this problem is simple: send us one encoding, we will do the rest. Our service will take your live stream, transcode it to as many different encodings as you want, and hand them off to a CDN. That's basically it. Want full HTML5 coverage ? We'll do it for you - H264 single and multibitrate, Theora, WebM, and a Flash fallback. Want Silverlight, Flash RTMP, Windows Media MMS ? All there.

Services like this already exist for ondemand - see zencoder and encoding.com and Panda. Live is just inherently more difficult - you don't get to work with nice single finished files, and it has to happen right now. But this is exactly the sort of thing a framework like GStreamer is good for.

In reality we aren't doing anything new here - Flumotion runs a CDN that already provides this service to customers. The difference is that this time, you will be able to set it up yourself online. A standard integration time with any CDN is around two weeks. This service will cut that time down to five minutes. We're not quite there yet, but we're close.

What's that you say ? Something about an offer ? Oh, right. It's always pained me to see that, when we wanted to stream a conference for free, it was still quite a bit of work in the setup stage for our support team, and hence we didn't stream as many conferences as I would have liked to. Similarly, it pains me to see a lot of customers not even considering free formats.

So the offer is simple. If you are running an event or a conference that flies under a Free/Open banner, and you're willing to stream only in free formats (meaning, Theora and WebM), and you're willing to ride the rough wave of innovation as we shake out our last bugs, we want to help you out. Send us the signal, we'll do the rest. Drop me a line and let's see how we can set it up. Offer limited, standard handwavy disclaimers apply, you'll have to take my word for it, etc...

If you're in the streaming industry, I will be demoing this new service next week on Wednesday around 2.00 pm local time in New York City, at Streaming Media East. And after that our Beta program starts.

Feel free to follow our twitter feed and find us on Facebook somewhere, as the kids these days say...

Happy streaming!

Download or Downloads

Filed under: friction,Open Source — Thomas @ 18:42

2011-03-14

Having various machines, some with homedirs passed on across distro versions, I somehow ended up with both Download and Downloads directories. Not to mention that my Firefox and Chromium instances were downloading everywhere, and the silly mental friction of this one letter difference is pissing me off. I was renaming one to the other on one machine, and probably doing the opposite on another.

So, no more. Instead of figuring out what the spec says, I just created a fresh user on my Fedora 14 laptop and did:
$ ls
Desktop Documents Downloads Music Pictures Public Templates Videos

So, that's what it's going to be. Never mind that I hate with a passion something ugly like Music vs Videos (where do I put audio podcasts then ?), I'll just follow like rank and file.

Since I did get a little curious though where the two folders came from, I took a quick look at the FDO page (which as recent blogs have shown is the One Standard Body) and sure enough - version 0.11 renamed Download to Downloads.

This post is just a note to self so that any time I find a computer with both, I fix it properly, instead of the Brownian motion I'm on right now.
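For the record, fixing it properly could look something like the sketch below. It is an assumption about what a sensible merge is, not something taken from the spec: the function name is made up, and anything that exists under the same name in both directories is left alone for a human to sort out.

```python
import os
import shutil


def merge_download_dirs(home):
    """Move everything from home/Download into home/Downloads.

    Returns the names that were skipped because Downloads already had
    an entry with the same name; those stay where they are.
    """
    old = os.path.join(home, "Download")
    new = os.path.join(home, "Downloads")
    if not os.path.isdir(old):
        return []
    os.makedirs(new, exist_ok=True)
    skipped = []
    for name in sorted(os.listdir(old)):
        target = os.path.join(new, name)
        if os.path.lexists(target):
            skipped.append(name)
            continue
        shutil.move(os.path.join(old, name), target)
    if not skipped:
        os.rmdir(old)  # only remove the old directory once it is empty
    return skipped
```

Calling `merge_download_dirs(os.path.expanduser("~"))` would apply it to the current user's home directory.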

This weekend’s yak shave…

Filed under: Hacking,Open Source,pychecker,Python — Thomas @ 22:09

2010-12-19

... went a little something like this.

home desktop upgraded to F14, still a bunch of packages missing.
Want to do a release of moap because it's been too long, but can't because make check fails because pychecker fails because F14 is the first one with Python 2.7
Look into pychecker failures. Figure out that there are new opcodes. Worse - for the first time I can tell opcode numbers have been shifted around. Find this bug where renumbering started; commented to ask if this was expected. Apparently it is, although no good reason for doing so has been offered. Shrug, not my fight, although it's going to make for ugly if's in Pychecker code.
Realize I don't actually have Python 2.7 on the Pychecker buildbot page. Remember I have a script to build Python versions from source which now builds 2.3-2.7 and 3.0 (just check out and run make, then py-x.y to start a shell with that version of python).
Add it to buildbot master config, restart, see buildbot fail spectacularly. Apparently buildbot was upgraded since I last started it, and code has been shuffled around. Spend a few hours figuring out how to upgrade buildbot and my code with custom steps without losing the history. Now have a buildbot 0.7.12 building on 2.7 as well.
Add some ugly if's for python versions to handle opcode reshuffling. Brings down the test failures. Add some handlers for new things like POP_JUMP_IF_TRUE/FALSE and JUMP_IF_TRUE/FALSE_OR_POP. End up with only one test failure difference with 2.6
Look into the remaining failure, realize that it is checking for constness of return results, and JUMP_IF_TRUE/POP_TOP pairs are now POP_JUMP_IF_TRUE, so peeking ahead to see LOAD_CONST should move by one opcode.
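The peek-ahead adjustment from the last step can be sketched as follows. This is not pychecker's actual code: the function names are made up, and the opcode sequences are simplified hand-written lists of names, not real disassembly.

```python
# Jumps that pop the tested value themselves (new in Python 2.7).
POPPING_JUMPS = {"POP_JUMP_IF_TRUE", "POP_JUMP_IF_FALSE"}
# Jumps that leave the value on the stack, so a POP_TOP follows (2.6 and older).
PLAIN_JUMPS = {"JUMP_IF_TRUE", "JUMP_IF_FALSE"}


def index_after_jump(ops, i):
    """Return the index of the first opcode after the conditional jump at ops[i],
    skipping the POP_TOP that accompanies an old-style jump."""
    name = ops[i]
    if name in POPPING_JUMPS:
        return i + 1
    if name in PLAIN_JUMPS:
        has_pop = i + 1 < len(ops) and ops[i + 1] == "POP_TOP"
        return i + 2 if has_pop else i + 1
    raise ValueError("not a conditional jump: %s" % name)


def jump_followed_by_const(ops, i):
    """True if the opcode right after the jump (and its POP_TOP) is LOAD_CONST."""
    j = index_after_jump(ops, i)
    return j < len(ops) and ops[j] == "LOAD_CONST"


# The same source construct as it might look before and after 2.7.
ops_26 = ["LOAD_FAST", "JUMP_IF_TRUE", "POP_TOP", "LOAD_CONST", "RETURN_VALUE"]
ops_27 = ["LOAD_FAST", "POP_JUMP_IF_TRUE", "LOAD_CONST", "RETURN_VALUE"]
```

Working with opcode names instead of numbers also sidesteps the renumbering problem, at the cost of a lookup per instruction.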

So, yay! That means CVS (yes...) HEAD of pychecker now works just as well for 2.7 as for 2.6, and it's time to start releasing somewhere this week. And maybe I should push on and fix some of the older failures, or shelve them for now, while I'm at it, and have nicely green builds!
