speechmaticstranscriber
speechmaticstranscriber is an element that uses the speechmatics realtime API to transcribe speech audio into text.
The element can work with both live and non-live data.
With non-live data, the speechmatics endpoint can process audio much faster than real time. This makes the element useful for quickly transcribing large audio files, but it also consumes credit at that same accelerated rate, so it is advisable to exercise restraint when running the element on non-live data solely for testing purposes.
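As a sketch, a minimal non-live transcription pipeline could look like the following Python snippet; the input file name, output file name and server URL are placeholders, and the caps filter simply matches the sink pad template documented below:

```python
#!/usr/bin/env python3
# Sketch: transcribe a file faster than real time and write the raw
# transcript buffers to a text file. File names and URL are placeholders.
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)

pipeline = Gst.parse_launch(
    "filesrc location=input.wav ! decodebin ! audioconvert ! audioresample "
    "! audio/x-raw,format=S16LE,channels=1,rate=16000 "
    "! speechmaticstranscriber url=ws://127.0.0.1:9000 "
    "! filesink location=transcript.txt"
)

pipeline.set_state(Gst.State.PLAYING)

# Block until the stream finishes or an error is posted, then shut down.
bus = pipeline.get_bus()
bus.timed_pop_filtered(
    Gst.CLOCK_TIME_NONE, Gst.MessageType.EOS | Gst.MessageType.ERROR
)
pipeline.set_state(Gst.State.NULL)
```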
With live data, it is up to the user to determine appropriate values for the latency property (the latency the element will advertise to the pipeline) and, optionally, for the max-delay property, a value that is passed to the speechmatics API to request a maximum latency from it.
In practice, the speechmatics API seems to take this value into account, but the actual observed delay from the sending of an audio sample to the reception of the matching text items may often exceed it.
It is important for that observed delay to remain less than the selected latency. For this reason the element tracks a max-observed-delay, exposed as a read-only property that emits change notifications. The user can monitor that property to tune the latency property as desired.
In addition, the element posts a warning message whenever the maximum observed delay exceeds the selected latency.
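As a sketch of how that monitoring could be wired up from an application (pipeline and transcriber are assumed to come from application code, for instance the non-live sketch above; a running GLib main loop is needed for the bus signal watch):

```python
# Sketch: watch the read-only max-observed-delay property and the element's
# warning messages in order to tune the latency property.
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst


def watch_transcription_delay(pipeline, transcriber):
    def on_delay_notify(element, _pspec):
        # Both values are assumed to be expressed in milliseconds,
        # as the latency property is.
        observed = element.get_property("max-observed-delay")
        configured = element.get_property("latency")
        print(f"max observed delay: {observed} ms "
              f"(configured latency: {configured} ms)")

    transcriber.connect("notify::max-observed-delay", on_delay_notify)

    # The element also posts a warning on the bus whenever the observed
    # delay exceeds the selected latency.
    def on_warning(_bus, message):
        err, _debug = message.parse_warning()
        print(f"warning from {message.src.get_name()}: {err.message}")

    bus = pipeline.get_bus()
    bus.add_signal_watch()
    bus.connect("message::warning", on_warning)
```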
Note that users can also use the lateness property, either in combination with or instead of the latency property. Setting this property shifts the running times of the output items forward, desynchronizing them from the audio by a fixed offset. This is useful when a loss of synchronization is an acceptable tradeoff for a decreased latency.
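For instance, accepting a fixed offset in exchange for a smaller advertised latency might look like this (the values are arbitrary examples, and transcriber is assumed to be the element instance from the earlier sketches):

```python
# Sketch: trade synchronization for latency (arbitrary example values).
transcriber.set_property("latency", 2000)   # advertise a modest 2 s latency
transcriber.set_property("lateness", 2000)  # shift output 2 s forward instead
```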
Finally, the element pushes gaps at regular intervals as empty transcripts are received from the speechmatics API. In practice the average duration of such gaps has been observed to be roughly two seconds; this can serve as a basis for sizing queues on parallel branches where applicable.
Hierarchy
GObject ╰──GInitiallyUnowned ╰──GstObject ╰──GstElement ╰──speechmaticstranscriber
Implemented interfaces
Factory details
Authors: – Mathieu Duponchelle
Classification: – Audio/Text/Filter
Rank – none
Plugin – speechmatics
Package – gst-plugin-speechmatics
Pad Templates
sink
audio/x-raw:
rate: [ 8000, 48000 ]
channels: 1
layout: { (string)interleaved, (string)non-interleaved }
format: S16LE
src
text/x-raw:
format: utf8
translate_src_%u
text/x-raw:
format: utf8
unsynced_translate_src_%u
application/x-json:
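Assuming the translate_src_%u template is a request pad (the %u placeholder and the per-pad language-code property documented below suggest so), a translation pad could be requested and configured along these lines; the pad name and target language are example values, and request_pad_simple requires GStreamer 1.20 (use get_request_pad on older versions):

```python
# Sketch: request a synchronized translation pad and pick its target language.
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)

transcriber = Gst.ElementFactory.make("speechmaticstranscriber", None)
assert transcriber is not None, "speechmatics plugin not found"

# Instantiate the translate_src_%u template and select the target language
# through the pad's language-code property (see GstSpeechmaticsTranscriberSrcPad).
translate_pad = transcriber.request_pad_simple("translate_src_0")
translate_pad.set_property("language-code", "fr")

# The pad can then be linked like any other source pad, for instance to a
# queue feeding a subtitle muxer or a text sink.
```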
Properties
additional-vocabulary
“additional-vocabulary” GstValueArray *
Additional vocabulary speechmatics should use
Flags : Read / Write
diarization
“diarization” GstSpeechmaticsTranscriberDiarization *
Defines how to separate speakers in the audio
Flags : Read / Write
Default value : none (0)
enable-late-punctuation-hack
“enable-late-punctuation-hack” gboolean
deprecated: speechmatics now appears to group punctuation reliably
Flags : Read / Write
Default value : true
join-punctuation
“join-punctuation” gboolean
Whether punctuation should be joined with the preceding word
Flags : Read / Write
Default value : true
language-code
“language-code” gchararray
The Language of the Stream, ISO code
Flags : Read / Write
Default value : en
latency
“latency” guint
Amount of milliseconds to allow for transcription
Flags : Read / Write
Default value : 8000
lateness
“lateness” guint
Amount of milliseconds to introduce as lateness
Flags : Read / Write
Default value : 0
mask-profanities
“mask-profanities” gboolean
Mask profanities with * of the same length as the word
Flags : Read / Write
Default value : false
max-delay
“max-delay” guint
Max delay to pass to the speechmatics API (0 = use latency)
Flags : Read / Write
Default value : 0
max-observed-delay
“max-observed-delay” guint
Maximum delay observed between the sending of an audio sample and the reception of an item
Flags : Read
Default value : 0
max-speakers
“max-speakers” guint
The maximum number of speakers that may be detected with diarization=speaker
Flags : Read / Write
Default value : 50
url
“url” gchararray
URL of the transcription server
Flags : Read / Write
Default value : ws://0.0.0.0:9000
Named constants
GstSpeechmaticsTranscriberDiarization
Members
none (0) – None: no diarization
speaker (1) – Speaker: identify speakers by their voices
GstSpeechmaticsTranscriberSrcPad
GObject ╰──GInitiallyUnowned ╰──GstObject ╰──GstPad ╰──GstSpeechmaticsTranscriberSrcPad
Properties
language-code
“language-code” gchararray
The Language the Stream must be translated to
Flags : Read / Write
Default value : NULL