Gapless and instant URI switching in playback elements

This document explains the various changes and improvements to the playback elements in order to support gapless playback and instantaneous URI switching.

Last Update: November 23rd 2022

Background

The new playbin3 element and its components (uridecodebin3, decodebin3 and urisourcebin) are replacements for the legacy playbin2 and decodebin2 elements.

The goals of these new elements are to both allow new use-cases and improve performance (lower memory/cpu/io usage, lower latency). One of the key principles is also to re-use elements as much as possible. For example, when switching audio tracks the decoder can be re-used (if compatible).

Roles are also more clearly separated across the various new elements (from lowest-level to highest-level):

  • urisourcebin handles choosing the right source elements for the given URI, and handles buffering (via queue2) if needed (for network sources for example).

  • parsebin takes an input stream and figures out which demuxer, parsers and/or payloaders are needed to provide timed elementary streams.

  • decodebin3 internally uses parsebin to handle any input stream and will handle the decoding, inter-stream muxing interleave, stream selection and switching. It can also handle multiple inputs (such as an audio/video file and a separate subtitle file).

  • uridecodebin3 wraps urisourcebins and decodebin3 for any use-cases where one wishes to have decoded streams from given URIs.

  • Finally playbin3 combines uridecodebin3 and playsink for providing a high-level convenience pipeline for playing back content.

This design has received many improvements over time:

  • decodebin3 is now able to detect input changes (caps changes) and reconfigure the associated parsebin if incompatible. This allows use-cases where upstream is an HLS/DASH stream where codecs are different across bitrates. The playback remains seamless if the decoders are compatible.

  • decodebin3 is now able to bypass the usage of parsebin altogether if the incoming stream is push-based, provides a GstStreamCollection and is compatible with the decoders or output caps.

  • urisourcebin can handle sources that handle buffering internally, avoiding dual-buffering.

  • A new core query GST_QUERY_SELECTABLE was added so that (source) elements could notify decodebin3 that they can handle stream selection and switching themselves.

  • Several improvements were made to playbin3 to allow complete stream type changes (such as going from playing audio+video to just audio or just video, and back). This allows temporarily disabling whole chains of elements when not needed.

Limitation/Issue

Two limitations existed though, which are both related:

  • Changing URI required bringing playbin3 (and all contained elements) down to GST_STATE_READY, setting the uri, and then bringing all elements back to GST_STATE_PAUSED.

    • This meant that all elements contained within were either discarded (decoders, demuxers, parsers, sources, ...) or reset (sinks)... despite potentially being 100% compatible (ex: going from h264/aac to h264/aac).
  • Gapless playback (i.e. automatically switching from one source to another, and removing any potential gap in the data arriving to the sinks) was implemented by pre-rolling a full uridecodebin3 for the next item to play and switching the inputs to playsink when the original uridecodebin3 was EOS.

    • This meant that none of the existing elements (demuxers, parsers, decoders, ..) contained in the original uridecodebin3 were re-used.

Those two use-cases are the same thing: We want to change the URI (i.e. urisourcebin) but re-use as much as possible of existing elements (i.e. decodebin3 and playsink). The only difference between the two use-cases is that changing URI should happen instantaneously in the first case, whereas in the second case it happens when the initial source is done (EOS).

Fixing this will allow:

  • Reducing memory and cpu usage (no duplicate elements)

  • Lowering latency (elements are no longer re-instantiated or reconfigured, and compatible ones are re-used as fast as possible).

Another, related issue is figuring out the optimal time at which the next item should be prepared so that it has enough data to play back immediately:

  • This shouldn't be too early: some URIs expire after a given time, or the user might change their mind in between.
  • This shouldn't be too late, otherwise we risk not having enough data to playback seamlessly.

Changes

parsebin in urisourcebin

The optimal time at which a switch should happen (i.e. a given amount of "time" before the end of the previous play entry) can only be determined on "timed" data (i.e. parsed elementary streams).

There is therefore a new property on urisourcebin, parse-streams, which if set to TRUE (not the default) adds a parsebin (if and where needed) so that urisourcebin only outputs elementary streams. A multiqueue is also added to handle any interleave present (i.e. it only queues up what is needed to offer coherent streams downstream).

If buffering is activated on urisourcebin, the multiqueue present after the parsebin will be configured in order to handle it (and post the appropriate buffering messages).

This offers the following benefits:

  • about-to-finish can be emitted by urisourcebin as soon as EOS enters that multiqueue, which is more precise than the previous approach (emitting it before queue2, on non-timed data)

  • buffering is much closer to the actual buffering amount (in time) which is specified on the properties.

  • ALL scheduling downstream of urisourcebin is push-based, removing a lot of issues when trying to change scheduling modes (push vs pull) dynamically.

The parse-streams property is set to TRUE when urisourcebin is used in uridecodebin3.

Only use a single uridecodebin3 in playbin3

Only a single uridecodebin3 is in use in playbin3 and the source pads it provides are directly linked to playsink.

There can only be at most one stream of each stream type (audio, video, text) on the output side of uridecodebin3. The exception to this is if the user/application configured a specific multi-sinkpad combiner element for a given stream type, in which case all streams of that given stream type are linked to that.

All uri-related properties are forwarded directly to uridecodebin3, which will handle switching the sources to the single decodebin3 it contains.

uridecodebin3 URI and source handling

The URIs for a given entry are handled in a GstPlayItem structure which controls (via intermediary structures):

  • The urisourcebin associated with the specified URI (and optional subtitle URI)

  • The pads provided by those sources, and which states they are in (eos, blocked, ...) and the associated GstStream (if present)

  • The buffering messages posted by those sources.

At any given point there is:

  • An input_play_item, which is the play item currently feeding data into decodebin3

  • An output_play_item, which is the play item currently being outputted by decodebin3

Most of the time those two will be the same. But when switching play items (going from one URI to another, whether gapless or not) this switch will happen asynchronously.
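
The relationship between the two roles can be pictured with a small model. This is only a Python sketch under stated assumptions, not the uridecodebin3 implementation: the names SourcePad, PlayItem, Player and is_switching are hypothetical and merely mirror the roles described above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SourcePad:
    """Hypothetical stand-in for a urisourcebin source pad."""
    stream_type: str          # "audio", "video" or "text"
    eos: bool = False
    blocked: bool = False


@dataclass
class PlayItem:
    """Hypothetical stand-in for the play item structure."""
    uri: str
    suburi: Optional[str] = None
    pads: List[SourcePad] = field(default_factory=list)


@dataclass
class Player:
    input_item: Optional[PlayItem] = None    # currently feeding decodebin3
    output_item: Optional[PlayItem] = None   # currently output by decodebin3

    def is_switching(self) -> bool:
        # The two roles differ only while a switch is in flight.
        return self.input_item is not self.output_item


first = PlayItem("file:///a.mkv")
second = PlayItem("file:///b.mkv")
player = Player(input_item=first, output_item=first)
steady = player.is_switching()

player.input_item = second        # the new item starts feeding decodebin3
during = player.is_switching()

player.output_item = second       # decodebin3 starts outputting the new item
after = player.is_switching()
```

In this model the two fields only diverge between the moment the inputs are replaced and the moment the new data reaches the output side, which is the asynchronous window mentioned above.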

Switching inputs to decodebin3

The high-level goal is to give uridecodebin3 the ability to change GstPlayItem while keeping the same decodebin3, either:

  • When the previous GstPlayItem has finished and there is a pending next GstPlayItem. This is the "gapless" scenario.

  • Or immediately switch to the given GstPlayItem without having to change state. This is the "instantaneous URI switch" scenario.

For this, the following points need to be solved:

  1. both scenarios: Add a way for "next" GstPlayItem to be pre-rolled
  2. gapless: Determining when the switch can happen
  3. instant-uri: pre-roll next GstPlayItem and flush downstream (to make the switch as quick as possible)
  4. both scenarios: Do the actual switch

pre-rolling play items

In order to be able to re-use the same decoders (within decodebin3) as much as possible from the outside, we need to ensure that we feed the ideal "replacement" stream to the same decodebin3 sink pad.

For example, if we are switching from an audio+video HLS source to another audio+video DASH source, we want to make sure we link the new urisourcebin source pad providing video to the decodebin3 pad that was previously consuming the old video stream.

In order to do this, the urisourcebin we wish to switch to needs to be pre-rolled (set to PAUSED, new pads are set to be blocked, and we wait for a buffer/GAP to arrive on at least one of the pads).

At that point we will know the streams which are present in the new and old urisourcebins and can unlink/relink compatible pads. If new sink pads are required they will be requested, and if old pads are no longer needed (for example switching from two streams to a single one) they will be removed.
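
The unlink/relink decision can be illustrated with a toy planner. This is a minimal Python sketch, assuming that "compatible" simply means "same stream type"; the real compatibility check may consider more than that. The function plan_relink and the pad names are hypothetical.

```python
from typing import Dict, List, Tuple


def plan_relink(old_sinks: Dict[str, str], new_streams: List[Tuple[str, str]]):
    """Plan how new source pads map onto existing decodebin3 sink pads.

    old_sinks maps a sink pad name to the stream type it was consuming.
    new_streams lists (source pad name, stream type) for the new source.
    """
    free = dict(old_sinks)          # sink pads not yet claimed
    links, to_request = [], []
    for src_pad, stream_type in new_streams:
        # Prefer a sink pad that previously consumed the same stream type.
        match = next((s for s, t in free.items() if t == stream_type), None)
        if match is None:
            to_request.append(src_pad)      # a new sink pad must be requested
        else:
            links.append((src_pad, match))
            del free[match]
    to_release = sorted(free)               # sink pads that are no longer needed
    return links, to_request, to_release


old = {"sink_0": "video", "sink_1": "audio"}
av_to_av = plan_relink(old, [("src_0", "audio"), ("src_1", "video")])
av_to_audio = plan_relink(old, [("src_0", "audio")])
```

In the first call both existing sink pads are re-used even though the new source exposes its streams in a different order. In the second call the video sink pad ends up in the list of pads to remove.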

Note: Doing this also has the benefit that "replacing" the inputs to decodebin3 is done from a new streaming thread, and not from the old urisourcebin streaming thread, which could cause deadlocks.

Note: This "waiting" is only done when "switching", i.e. on sources which aren't in the current input play item. If the pads are from the current play entry they are linked/unlinked as soon as they are added/removed.

The next play item is pre-rolled:

  • When the current play item has posted about-to-finish and the user/application has set a new play item.

  • When a new play item has been set and the instant-uri property has been set to TRUE.

When a play item is pre-rolled, it is marked as "active". There can only be one "active" play item in addition to the input play item.

gapless: determining when the switch can happen

For gapless use-cases, we want to know the earliest time we can switch from one play item to another.

Since all streams coming from a urisourcebin with parse-streams set to TRUE are push-based, this is when the last EOS has been pushed through all pads of the source.

Instantaneous URI switching

In order to be able to switch URI as soon as possible while re-using as many existing elements as possible, there is a new instant-uri boolean property on uridecodebin3/playbin3. The default value is FALSE.

If it is set to TRUE, the following happens whenever the uri property is set:

  • On all pads of the current input play item:

    • FLUSH_START is sent to the downstream peer pads
    • The pad is made blocking
    • The pad is marked as EOS (i.e. as if EOS had been seen)
  • And then again on all pads:

    • FLUSH_STOP is sent to the downstream peer pads
  • Finally the new play item for the new URI is activated (pre-rolled).

    • Once it is pre-rolled it will switch over

This ensures all downstream elements are kept and are ready to receive the new data.
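
The ordering of the steps above matters: every pad is flushed, blocked and marked EOS before any FLUSH_STOP is sent. The following Python sketch only models that ordering with an event log; the Pad class and instant_switch function are hypothetical and no real events are sent.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Pad:
    """Hypothetical stand-in for a source pad of the current input play item."""
    name: str
    blocked: bool = False
    eos: bool = False


def instant_switch(pads: List[Pad], log: List[str]) -> None:
    # First pass: start flushing downstream, block the pad, mark it EOS.
    for pad in pads:
        log.append(f"{pad.name}: FLUSH_START")
        pad.blocked = True
        pad.eos = True
    # Second pass: end the flush on every pad.
    for pad in pads:
        log.append(f"{pad.name}: FLUSH_STOP")
    # Finally the play item for the new URI is activated (pre-rolled).
    log.append("activate next play item")


pads = [Pad("video"), Pad("audio")]
log: List[str] = []
instant_switch(pads, log)
```

After the call every pad of the old item is blocked and considered EOS, so the old play item counts as completely drained by the time the new one is activated.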

Switching play items

Switching play items requires special attention since it needs to be done "atomically". We need to ensure it is done by a single thread. This is done by having a lock (play_items_lock) which is taken whenever we need to modify the list of play items and which play item is the current input/output.

We need to ensure the streaming thread(s) that were previously used are stopped. Since we are only dealing with push-based sources this is simple: we wait for the moment EOS is pushed on the last pad of the play item.

Another important consideration is that we need to ensure the thread that does the switch is not the previous streaming thread (it needs to be stopped).

In order to solve those issues, the actual replacement of the inputs will always happen from the streaming thread of the new play item, i.e. the one we wish to make the current input. This is done in a pad block probe on the new item source pad. Whenever a buffer (or GAP event) is received, we check whether we can switch:

  • If the current input play item is completely EOS, the switch can happen immediately. This will always be the case in the instant-uri scenario and if the current input play item is pull-based.

  • If the current input play item is not completely EOS, the probe waits on the GCond input_source_drained. This is the case that will commonly happen in gapless push-based scenarios, since we are waiting for the current input play item to be finished.

Once the switch can happen, we unlink all pads from decodebin3 and attempt to match compatible new source pads from urisourcebin to decodebin3. If new sink pads are required they are requested, and if some sink pads are no longer needed or do not match they are released.

Once all pads are linked, the new play item is set as the current play item.
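
The wait-then-switch behaviour can be modelled with a lock and a condition variable. This is a Python sketch using the threading module rather than GMutex/GCond, and it is not the actual implementation: the SwitchGate class and its methods are hypothetical, with the lock standing in for play_items_lock and the condition for input_source_drained.

```python
import threading


class SwitchGate:
    """Toy model of waiting for the current input play item to drain."""

    def __init__(self, pad_count: int):
        self._lock = threading.Lock()                     # role of play_items_lock
        self._drained = threading.Condition(self._lock)   # role of input_source_drained
        self._pending_eos = pad_count

    def pad_got_eos(self) -> None:
        # Called from the old item's streaming thread, once per pad.
        with self._lock:
            self._pending_eos -= 1
            if self._pending_eos == 0:
                self._drained.notify_all()

    def wait_until_switchable(self, timeout: float = 5.0) -> bool:
        # Called from the new item's streaming thread (the pad block probe).
        with self._lock:
            return self._drained.wait_for(lambda: self._pending_eos == 0, timeout)


gate = SwitchGate(pad_count=2)
events = []


def new_item_probe():
    ok = gate.wait_until_switchable()
    events.append("switch" if ok else "timeout")


probe_thread = threading.Thread(target=new_item_probe)
probe_thread.start()
for _ in range(2):
    events.append("eos")
    gate.pad_got_eos()
probe_thread.join()
```

If the old item is already completely EOS when the probe fires, the predicate is true on the first check and the wait returns immediately, which corresponds to the first case in the list above.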

uridecodebin3 handles about-to-finish signalling

With regard to gapless playback, the API does not change. Users are still expected to listen to about-to-finish and set the next URI to play back.

One thing that needs to be taken care of is making sure we don't emit about-to-finish for play items which aren't currently used. Otherwise about-to-finish would snowball: each pending play item emitting it would cause a further entry to be created and pre-rolled, which would in turn emit it again.

For that reason, if a play item emits that signal but isn't the input or output play item, then it is just stored and not propagated upstream. When that play entry becomes the new input entry it will be propagated.
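
The deferral rule can be sketched as follows. This is a toy Python model, not the uridecodebin3 code: the class AboutToFinishFilter and its methods are hypothetical, and play items are represented by plain strings.

```python
from typing import List, Set


class AboutToFinishFilter:
    """Toy model: only a play item that is in use may emit about-to-finish."""

    def __init__(self, current: str):
        self.input_item = current
        self.output_item = current
        self.pending: Set[str] = set()
        self.emitted: List[str] = []

    def on_about_to_finish(self, item: str) -> None:
        if item in (self.input_item, self.output_item):
            self.emitted.append(item)      # propagate the signal
        else:
            self.pending.add(item)         # store it for later

    def set_input(self, item: str) -> None:
        self.input_item = item
        if item in self.pending:           # the stored signal is now due
            self.pending.discard(item)
            self.emitted.append(item)


flt = AboutToFinishFilter("a")
flt.on_about_to_finish("b")        # "b" is pre-rolled but not in use yet
before_switch = list(flt.emitted)
flt.on_about_to_finish("a")
flt.set_input("b")
```

Because the signal from "b" is held back until "b" becomes the input, the application is never asked for a third URI while "a" is still playing.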
