Gapless and instant URI switching in playback elements
This document explains the various changes and improvements to the playback elements in order to support gapless playback and instantaneous URI switching.
Last Update: November 23rd 2022
The playbin3 element and its components (including urisourcebin) are replacements for the legacy playback elements.
The goals of these new elements are both to allow new use-cases and to improve performance (lower memory/CPU/IO usage, lower latency). Another key principle is to re-use elements as much as possible. For example, when switching audio tracks the decoder can be re-used (if compatible).
The roles were also split more clearly across several new elements (from lowest-level to highest-level):
- urisourcebin handles choosing the right source elements for the given URI, and handles buffering (via queue2) if needed (for network sources, for example).
- parsebin takes an input stream and figures out which demuxer, parsers and/or payloaders are needed to provide timed elementary streams.
- parsebin to handle any input stream and will handle the decoding, inter-stream muxing interleave, stream selection and switching. It can also handle multiple inputs (such as an audio/video file and a separate subtitle file).
- decodebin3 for any use-cases where one wishes to have decoded streams from given URIs.
- playsink for providing a high-level convenience pipeline for playing back content.
This design has received many improvements over time:
- decodebin3 was able to detect input changes (caps changes) and reconfigure the associated parsebin if incompatible. This allows use-cases where upstream is an HLS/DASH stream whose codecs differ across bitrates. Playback remains seamless if the decoders are compatible.
- decodebin3 was able to bypass the usage of parsebin altogether if the incoming stream is pull-based, provides a GstStreamCollection and is compatible with the decoders or output caps.
- urisourcebin can handle sources that do their own buffering internally, avoiding double buffering.
- A new core query, GST_QUERY_SELECTABLE, was added so that (source) elements can notify decodebin3 that they handle stream selection and switching themselves.
- Several improvements were made to playbin3 to allow complete stream type changes (such as going from playing audio+video to just audio or just video, and back). This allows temporarily disabling whole chains of elements when they are not needed.
Two related limitations remained, however:
- Changing URI required bringing playbin3 (and all contained elements) down to GST_STATE_READY, setting the URI, and then bringing all elements back up again.
  - This meant that all elements contained within were either discarded (decoders, demuxers, parsers, sources, ...) or reset (sinks), despite potentially being 100% compatible (e.g. going from h264/aac to h264/aac).
- Gapless playback (i.e. automatically switching from one source to another, and removing any potential gap in the data arriving at the sinks) was implemented by pre-rolling a full uridecodebin3 for the next item to play and switching the inputs to playsink when the original one was done.
  - This meant that none of the existing elements (demuxers, parsers, decoders, ...) contained in the original one were re-used.
Those two use-cases are essentially the same: we want to change the URI (urisourcebin) but re-use as much as possible of the existing elements (including playsink). The only difference between the two use-cases is that in the first case the URI change should happen instantaneously, whereas in the second case it happens when the initial source is done (EOS).
Fixing this will allow:
- Reducing memory and CPU usage (no duplicate elements).
- Lowering latency (elements are no longer re-instantiated/reconfigured, and compatible ones are re-used as fast as possible).
Another related issue is figuring out the optimal time at which the next item should be prepared so that it has enough data to play back immediately:
- It shouldn't be too early: some URIs expire after a given time, or the user might change their mind in between.
- It shouldn't be too late, otherwise we risk not having enough data to play back seamlessly.
parsebin in urisourcebin
The optimal time at which a switch should happen (i.e. a given amount of "time" before the end of the previous play entry) can only be determined on "timed" data (i.e. parsed elementary streams).
There is therefore a new parse-streams option which, if TRUE (non-default), will add a parsebin (if and where needed) so that urisourcebin only outputs elementary streams. A multiqueue will also be present to handle any interleave (i.e. only queue up what is needed to offer coherent streams downstream).
If buffering is activated, the multiqueue present after the parsebin will be configured to handle it (and post the appropriate buffering messages).
This offers the following benefits:
- about-to-finish can be emitted by urisourcebin as soon as EOS reaches the multiqueue, which is more precise than the previous approach (which relied on queue2 operating on non-timed data).
- Buffering is much closer to the actual buffering amount (in time) specified in the properties.
- ALL scheduling downstream of urisourcebin is push-based, removing a lot of issues when trying to change scheduling modes (push vs pull) dynamically.
The parse-streams property is set to TRUE when used in
Only use a single uridecodebin3 in playbin3
Only a single uridecodebin3 is in use in playbin3, and the source pads it provides are directly linked to
There can be at most one stream of each stream type (audio, video, text) on the output side of uridecodebin3. The exception is when the user/application has configured a specific multi-sinkpad combiner element for a given stream type, in which case all streams of that type are linked to it.
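The output rule above can be modelled as a small mapping function. This is a sketch, not playbin3 code: plan_output_links, the "<type>-sink" target names and the combiners dictionary are hypothetical stand-ins, and raising an error is just this model's way of flagging a violation of the one-stream-per-type rule.

```python
from collections import defaultdict


def plan_output_links(streams, combiners=None):
    """Map each output stream of the (modelled) uridecodebin3 to a target.

    streams:   list of (stream_id, stream_type) tuples, where stream_type is
               "audio", "video" or "text".
    combiners: optional dict stream_type -> combiner name; a hypothetical
               stand-in for a user-configured multi-sinkpad combiner.
    Returns a dict stream_id -> target name.
    """
    combiners = combiners or {}
    by_type = defaultdict(list)
    for stream_id, stream_type in streams:
        by_type[stream_type].append(stream_id)

    links = {}
    for stream_type, ids in by_type.items():
        if stream_type in combiners:
            # Every stream of this type goes to the same combiner element.
            for stream_id in ids:
                links[stream_id] = combiners[stream_type]
        elif len(ids) > 1:
            # Without a combiner only one stream per type may be output.
            raise ValueError(f"more than one {stream_type} stream without a combiner")
        else:
            links[ids[0]] = f"{stream_type}-sink"
    return links
```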
All URI-related properties are forwarded directly to uridecodebin3, which handles switching the sources feeding the single decodebin3 it contains.
uridecodebin3 URI and source handling
The URI for a given entry is handled in a GstPlayItem structure which controls (via intermediary structures):
- The urisourcebin associated with the specified URI (and optional subtitle URI).
- The pads provided by those sources, which state they are in (EOS, blocked, ...) and the associated GstStream (if present).
- The buffering messages posted by those sources.
At any given point there is:
- input_play_item, which is the play item currently feeding data into
- output_play_item, which is the play item currently being outputted by
Most of the time those two will be the same, but when switching play items (going from one URI to another, whether gapless or not) the switch happens asynchronously.
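A simplified data model may make the relationship between play items, their pads and the input/output pointers easier to follow. All class and field names below (PlayItem, SourcePad, PadState, UriDecodeBinModel, buffering_percent) are hypothetical; the real GstPlayItem is a C structure whose layout is not described here.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Optional


class PadState(Enum):
    FLOWING = auto()
    BLOCKED = auto()
    EOS = auto()


@dataclass
class SourcePad:
    name: str
    stream_type: str
    state: PadState = PadState.FLOWING


@dataclass
class PlayItem:
    uri: str
    suburi: Optional[str] = None          # optional subtitle URI
    pads: list = field(default_factory=list)
    buffering_percent: int = 100          # last buffering level reported

    def is_eos(self) -> bool:
        # "Completely EOS" means every pad is EOS; an item without pads
        # yet is not considered drained.
        return bool(self.pads) and all(p.state is PadState.EOS for p in self.pads)


@dataclass
class UriDecodeBinModel:
    input_play_item: Optional[PlayItem] = None
    output_play_item: Optional[PlayItem] = None

    def switching(self) -> bool:
        # The input moves to the new item first; the output follows later.
        return self.input_play_item is not self.output_play_item
```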
Switching inputs to decodebin3
The high-level goal is to add to uridecodebin3 the capability of switching to another GstPlayItem while re-using the same downstream elements, either:
- When the previous GstPlayItem has finished and there is a pending next GstPlayItem. This is the "gapless" scenario.
- Or immediately, switching to the given GstPlayItem without having to change state. This is the "instantaneous URI switch" scenario.
For this, the following points need to be solved:
- Both scenarios: add a way for the "next" GstPlayItem to be pre-rolled.
- Gapless: determine when the switch can happen.
- Instant URI: pre-roll the next GstPlayItem and flush downstream (to make the switch as quick as possible).
- Both scenarios: perform the actual switch.
pre-rolling play items
In order to re-use the same decoders (within decodebin3) as much as possible, we need to ensure from the outside that the ideal "replacement" stream is fed to the same decodebin3 sink pad.
For example, if we are switching from an audio+video HLS source to another
audio+video DASH source, we want to make sure we link the new
source pad providing video to the
decodebin3 pad that was previously consuming
the old video stream.
In order to do this, the
urisourcebin we wish to switch to needs to be
pre-rolled (set to PAUSED, new pads are set to be blocked, and we wait for a
buffer/GAP to arrive on at least one of the pads).
At that point we will know the streams which are present in the new and old
urisourcebins and can unlink/relink compatible pads. If new sink pads are
required they will be requested, and if old pads are no longer needed (for
example switching from two streams to a single one) they will be removed.
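The unlink/relink step can be sketched as a matching problem between the new source pads and the existing decodebin3 sink pads. This is a simplified illustration: plan_relink is a hypothetical helper, and it matches on stream type only, whereas the real implementation may take caps or other stream properties into account.

```python
def plan_relink(old_links, new_pads):
    """Plan how new source pads replace old ones on decodebin3.

    old_links: dict existing sink pad name -> stream type it was consuming.
    new_pads:  dict new source pad name -> stream type.
    Returns (relink, request, release):
      relink:  dict new source pad -> existing sink pad to re-use
      request: new source pads that need a freshly requested sink pad
      release: existing sink pads that are no longer needed
    """
    free_sinks = dict(old_links)  # sink pads not re-used yet
    relink, request = {}, []
    for src, stream_type in new_pads.items():
        match = next((sink for sink, t in free_sinks.items() if t == stream_type), None)
        if match is None:
            request.append(src)
        else:
            relink[src] = match
            del free_sinks[match]
    release = list(free_sinks)
    return relink, request, release
```

For example, going from audio+video to video only re-uses the video sink pad and releases the audio one.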
Note: Doing this also has the benefit that "replacing" the inputs to decodebin3 is done from a new streaming thread, and not from the old urisourcebin streaming thread, which could cause deadlocks.
Note: This "waiting" is only done when "switching", i.e. on sources which aren't in the current input play item. If the pads are from the current play entry they are linked/unlinked as soon as they are added/removed.
The next play item is pre-rolled:
- When the current play item has posted about-to-finish and the user/application has set a new play item.
- When a new play item has been set and the instant-uri property has been set to TRUE.
When a play item is pre-rolled, it is marked as "active". There can only be one "active" play item in addition to the input play item.
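The two pre-roll triggers and the "only one active item" constraint can be expressed as a single decision function. The PrerollScheduler class and its fields are hypothetical; items are represented by their URI strings for brevity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PrerollScheduler:
    instant_uri: bool = False
    about_to_finish_posted: bool = False
    pending_item: Optional[str] = None  # set by the application, not active yet
    active_item: Optional[str] = None   # the single pre-rolled item besides the input

    def maybe_preroll(self) -> Optional[str]:
        """Return the item to pre-roll now, or None if nothing should happen."""
        if self.pending_item is None or self.active_item is not None:
            # Nothing pending, or an item is already active (only one allowed).
            return None
        if self.instant_uri or self.about_to_finish_posted:
            self.active_item, self.pending_item = self.pending_item, None
            return self.active_item
        return None
```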
gapless: determining when the switch can happen
For gapless use-cases, we want to know the earliest time we can switch from one play item to another. Since all streams coming from urisourcebin with parse-streams=TRUE are push-based, this is when the last EOS has been pushed through all pads of the source.
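Detecting "the last EOS on all pads" amounts to tracking which pads are still pending and reporting the moment the set becomes empty. DrainTracker is a hypothetical helper illustrating that bookkeeping; duplicate or unknown EOS events are ignored.

```python
class DrainTracker:
    """Detects when the last EOS has been pushed through all pads of a source."""

    def __init__(self, pad_names):
        self.pending = set(pad_names)

    def on_eos(self, pad_name) -> bool:
        """Record EOS on one pad; return True only when it was the last one."""
        if pad_name not in self.pending:
            return False  # unknown pad or duplicate EOS
        self.pending.discard(pad_name)
        return not self.pending
```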
Instantaneous URI switching
In order to be able to switch URI as soon as possible while re-using as many
existing elements as possible, there is a new
instant-uri boolean property on
playbin3. The default value is FALSE.
If it is set to TRUE, the following happens whenever the uri property is set:
- On all pads of the current input play item:
  - FLUSH_START is sent to the downstream peer pads.
  - The pad is made blocking.
  - The pad is marked as EOS (i.e. as if EOS had been seen).
- And then again on all pads:
  - FLUSH_STOP is sent to the downstream peer pads.
- Finally the new play item for the new URI is activated (pre-rolled).
  - Once it is pre-rolled, it will switch over.
This ensures all downstream elements are kept and are ready to receive the new data.
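The ordering of the instant-uri sequence (all FLUSH_START first, then all FLUSH_STOP, then activation) can be captured in a short model that records the actions it takes. The pad dictionaries, the log format and the "next-play-item" label are hypothetical. From an application using PyGObject, this mode would be enabled with playbin.set_property("instant-uri", True) before setting uri.

```python
def instant_switch(pads, log):
    """Model the instant-uri sequence on the current input play item's pads.

    pads: list of dicts with "name", "blocked" and "eos" keys.
    log:  list receiving (action, target) tuples in the order they happen.
    """
    # First pass: flush downstream, block the pad and mark it as EOS.
    for pad in pads:
        log.append(("FLUSH_START", pad["name"]))
        pad["blocked"] = True
        pad["eos"] = True
    # Second pass: end the flush so downstream accepts data again.
    for pad in pads:
        log.append(("FLUSH_STOP", pad["name"]))
    # Finally the new play item is activated (pre-rolled).
    log.append(("ACTIVATE", "next-play-item"))
```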
Switching play items
Switching play items requires special attention since it needs to be done "atomically", i.e. by a single thread. This is ensured by a lock (play_items_lock) which is taken whenever we need to modify the list of play items or change which play item is the current input/output.
We need to ensure the streaming thread(s) that were previously used are stopped. Since we are only dealing with push-based sources this is simple: we wait for the moment EOS is pushed on the last pad of the play item.
Another important consideration is that the thread performing the switch must not be the previous streaming thread (that thread needs to be stopped).
In order to solve those issues, the actual replacement of the inputs always happens from the streaming thread of the new play item, i.e. the one we wish to make the current input. This is done in a pad block probe on the new item's source pad. Whenever a buffer (or GAP event) is received, we check whether we can switch:
- If the current input play item is completely EOS, the switch can happen immediately. This will always be the case in the instant-uri scenario, and when the current input play item is pull-based.
- If the current input play item is not completely EOS, the probe waits on the input_source_drained GCond. This is what commonly happens in gapless push-based scenarios, since we are waiting for the current input play item to finish.
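The lock and condition interplay can be mimicked with Python's threading module, where threading.Condition plays the role of the GCond and shares the lock standing in for play_items_lock. SwitchModel and its methods are hypothetical; the point is that the new item's thread blocks until the old item's thread reports its last EOS, and only then performs the switch.

```python
import threading


class SwitchModel:
    """Models play_items_lock / input_source_drained with Python threading."""

    def __init__(self, current_input):
        self.play_items_lock = threading.Lock()
        self.input_source_drained = threading.Condition(self.play_items_lock)
        self.input_play_item = current_input
        self.input_is_eos = False

    def push_last_eos(self):
        # Called from the OLD item's streaming thread after its last EOS.
        with self.play_items_lock:
            self.input_is_eos = True
            self.input_source_drained.notify_all()

    def on_new_item_buffer(self, new_item):
        # Called from the NEW item's streaming thread (pad block probe).
        with self.play_items_lock:
            while not self.input_is_eos:
                self.input_source_drained.wait()
            # The old streaming thread is done: relink pads, then switch.
            self.input_play_item = new_item
            self.input_is_eos = False
```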
Once the switch can happen, we unlink all pads of the old play item from decodebin3 and attempt to match the new source pads to compatible decodebin3 sink pads. If new sink pads are required they are requested, and if some sink pads are no longer needed or do not match they are released.
Once all pads are linked, the new play item is set as the current play item.
Regarding gapless playback, the API does not change. Users are still expected to listen to about-to-finish and set the next URI to play back.
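On the application side this typically looks like the handler below, assuming PyGObject bindings, where the signal handler receives the emitting playbin3 as its first argument. make_about_to_finish_handler and the playlist list are hypothetical; set_property("uri", ...) is the standard GObject way of setting the uri property.

```python
def make_about_to_finish_handler(playlist):
    """Return a callback for playbin3's "about-to-finish" signal.

    playlist: list of URIs still to play, consumed from the front.
    """
    def on_about_to_finish(playbin):
        # Emitted from a streaming thread: only queue the next URI here.
        if playlist:
            playbin.set_property("uri", playlist.pop(0))
    return on_about_to_finish
```

It would be connected with playbin.connect("about-to-finish", make_about_to_finish_handler(uris)).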
One thing that needs to be taken care of is making sure we don't emit about-to-finish for play items which aren't currently in use. Otherwise about-to-finish would have a snowball effect: pending play items would emit it, which would cause a future entry to be created, pre-rolled and emitting it again.
For that reason, if a play item emits that signal but isn't the input or output play item, the signal is just stored and not propagated upstream. When that play entry becomes the new input entry, it is propagated.
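The hold-back rule can be sketched as a small gate sitting between play items and the upstream signal. AboutToFinishGate and its methods are hypothetical names; emit stands for whatever propagates the signal upstream.

```python
class AboutToFinishGate:
    """Hold back about-to-finish from play items that aren't input or output."""

    def __init__(self, emit):
        self.emit = emit       # callback propagating the signal upstream
        self.input_item = None
        self.output_item = None
        self.stored = set()    # items whose signal is being held back

    def on_item_about_to_finish(self, item):
        if item in (self.input_item, self.output_item):
            self.emit(item)
        else:
            self.stored.add(item)

    def set_input_item(self, item):
        self.input_item = item
        if item in self.stored:
            # Propagate the held-back signal once the item becomes the input.
            self.stored.discard(item)
            self.emit(item)
```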