Scheduling

By now, you've seen several example applications. All of them set up a pipeline and call gst_bin_iterate () to start media processing. You might have started wondering what happens during pipeline iteration. This whole process of media processing is called scheduling. Scheduling is considered one of the most complex parts of GStreamer. Here, we give only a global overview of scheduling, most of which is purely informative. It might help in understanding the underlying parts of GStreamer.

The scheduler is responsible for managing the plugins at runtime. Its main responsibilities are:

The scheduler is a pluggable component; this means that alternative schedulers can be written and plugged into GStreamer. There is usually no need for interaction in the process of choosing the scheduler, though. The default scheduler in GStreamer is called "opt". Some of the concepts discussed here are specific to opt.

Managing elements and data throughput

To understand some specifics of scheduling, it is important to know how elements work internally. Broadly, there are four types of elements: _chain ()-based elements, _loop ()-based elements, _get ()-based elements and decoupled elements. Each of these has a set of features and limitations that are important for how it is scheduled.

The types of elements that are linked together have implications for how those elements will be scheduled. If a get-based element is linked to a loop-based element and the loop-based element requests data from its sinkpad, the scheduler can just call the get-function and be done with it. However, if two loop-based elements are linked to each other, it's a lot more complicated. Similarly, a loop-based element linked to a chain-based element is a lot easier to schedule than two loop-based elements linked to each other.

The default GStreamer scheduler, "opt", uses a concept of chains and groups. A group is a series of elements that do not require any context switches or intermediate data stores to be executed. In practice, this implies zero or one loop-based elements, one get-based element (at the beginning) and any number of chain-based elements. If there is a loop-based element, then the scheduler will simply call this element's loop-function to iterate. If there is no loop-based element, then data will be pulled from the get-based element and pushed over the chain-based elements.
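The iteration rule for a single group can be sketched in a few lines of C. This is a toy model only: every name in it (toy_group, toy_group_iterate and the sample elements) is hypothetical and is not part of the GStreamer API or of the "opt" implementation. It also assumes, for brevity, that a chain-function returns the processed buffer instead of pushing it to the next element itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model; none of these names exist in GStreamer. */
typedef int  (*toy_get_fn)(void);          /* produces one "buffer" (an int here) */
typedef int  (*toy_chain_fn)(int buf);     /* processes a buffer, returns the result */
typedef void (*toy_group_loop_fn)(void *self);

typedef struct {
    toy_group_loop_fn loop;      /* non-NULL if the group has a loop-based element */
    void             *loop_data; /* argument handed to the loop-function */
    toy_get_fn        get;       /* get-based element at the start of the group */
    toy_chain_fn      chain[8];  /* chain-based elements, in link order */
    size_t            n_chain;   /* number of entries used in chain[] */
    int               last_out;  /* buffer that left the end of the group */
} toy_group;

/* One iteration of a group, following the rule described above. */
static void toy_group_iterate(toy_group *g)
{
    if (g->loop != NULL) {
        /* A loop-based element drives the group itself. */
        g->loop(g->loop_data);
        return;
    }

    /* No loop-based element: pull from the get-based element ... */
    assert(g->get != NULL);
    int buf = g->get();

    /* ... and hand the buffer to each chain-based element in turn. */
    for (size_t i = 0; i < g->n_chain; i++)
        buf = g->chain[i](buf);

    g->last_out = buf;
}

/* Sample elements, for illustration only. */
static int toy_counter = 0;
static int toy_counter_get(void)      { return ++toy_counter; }
static int toy_double_chain(int buf)  { return buf * 2; }
static int toy_add_one_chain(int buf) { return buf + 1; }
```

A group built from toy_counter_get followed by toy_double_chain and toy_add_one_chain needs no stored data and no context switch: each call to toy_group_iterate carries one buffer through the whole group on the caller's stack.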

A chain is a series of groups that depend on each other for data. For example, two linked loop-based elements would end up in different groups, but in the same chain. Whenever the first loop-based element pushes data over its source pad, the data will be temporarily stored inside the scheduler until the loop-function returns. When it's done, the loop-function of the second element will be called to process this data. If it pulls data from its sinkpad while no data is available, the scheduler will "emulate" a get-function and, in this function, iterate the first group until data is available.
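The interaction between two groups in one chain can be illustrated with another toy model in C. As before, all names (toy_sched, toy_push, toy_pull and so on) are hypothetical, and the single-slot store is an assumption made to keep the sketch short. It does not describe how "opt" stores data internally.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model; none of these names exist in GStreamer. */
typedef struct toy_sched toy_sched;
typedef void (*toy_loop_fn)(toy_sched *s);

struct toy_sched {
    toy_loop_fn first_loop;   /* loop-function of the upstream group */
    toy_loop_fn second_loop;  /* loop-function of the downstream group */
    int  stored;              /* one-slot store between the two groups */
    bool has_data;            /* true while 'stored' holds an unread buffer */
    int  produced;            /* state of the upstream element */
    int  consumed_sum;        /* state of the downstream element */
    int  upstream_runs;       /* how often the upstream loop-function ran */
};

/* Upstream element pushes over its source pad: the scheduler keeps the data. */
static void toy_push(toy_sched *s, int buf)
{
    assert(!s->has_data);     /* the toy store holds a single buffer */
    s->stored = buf;
    s->has_data = true;
}

/* Downstream element pulls from its sink pad: the "emulated" get-function. */
static int toy_pull(toy_sched *s)
{
    while (!s->has_data)
        s->first_loop(s);     /* iterate the first group until data is available */
    s->has_data = false;
    return s->stored;
}

/* Sample upstream loop-function: produces one buffer per call. */
static void toy_producer_loop(toy_sched *s)
{
    s->upstream_runs++;
    toy_push(s, ++s->produced);
}

/* Sample downstream loop-function: needs two buffers per call, so its
 * second pull finds the store empty and triggers the emulated get. */
static void toy_consumer_loop(toy_sched *s)
{
    int a = toy_pull(s);
    int b = toy_pull(s);
    s->consumed_sum += a + b;
}

/* One iteration of the chain: first group, then second group. */
static void toy_chain_iterate(toy_sched *s)
{
    s->first_loop(s);         /* output is parked in the scheduler */
    s->second_loop(s);        /* processed once the first loop-function returned */
}
```

In this sketch, one call to toy_chain_iterate runs the upstream loop-function twice: once directly, and once from inside toy_pull when the downstream element asks for a second buffer that is not yet available.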

The above is roughly how scheduling works in GStreamer. This has some implications for ideal pipeline design. A pipeline would ideally contain at most one loop-based element, so that all data processing is immediate and no data is stored inside the scheduler during group switches. You might expect this to reduce overhead significantly. In practice, however, the overhead of a pipeline with several loop-based elements is not that bad. It's something to keep in the back of your mind, nothing more.