speechmaticstranscriber

speechmaticstranscriber is an element that uses the speechmatics realtime API to transcribe audio speech into text.

The element can work with both live and non-live data.

With non-live data, the speechmatics endpoint can process audio data much faster than real time. This makes the element useful for quickly transcribing large audio samples, but can also use up credit at a similar rate. It is thus advisable to exercise restraint when using the element with non-live data for the sole purpose of testing it.

With live data, it is up to the user to determine appropriate values for the latency property (the latency the element advertises to the pipeline) and, optionally, for the max-delay property, a value that can be passed to the speechmatics API to request a latency from it.
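As a rough illustration of how the latency and max-delay properties fit into a pipeline, the following sketch builds a gst-launch-1.0 style description string. The helper name build_description and all argument values are hypothetical; uridecodebin, audioconvert, audioresample and fakesink are stock GStreamer elements chosen here only to satisfy the sink pad caps (S16LE, mono), not something this page prescribes.

```python
def build_description(uri, api_key, url, latency_ms=8000, max_delay_ms=0):
    """Return a gst-launch-1.0 style description (hypothetical helper).

    latency_ms   -> the element's "latency" property (advertised to the pipeline)
    max_delay_ms -> the element's "max-delay" property (0 = use latency)
    """
    return (
        f'uridecodebin uri="{uri}" '
        "! audioconvert ! audioresample "
        # The caps filter matches the element's sink pad template.
        "! audio/x-raw,format=S16LE,channels=1 "
        f'! speechmaticstranscriber api-key="{api_key}" url="{url}" '
        f"latency={latency_ms} max-delay={max_delay_ms} "
        "! fakesink"
    )


if __name__ == "__main__":
    # Placeholder values: substitute a real media URI, API key and server URL.
    print(build_description("file:///tmp/sample.wav", "MY_KEY",
                            "ws://0.0.0.0:9000", latency_ms=4000,
                            max_delay_ms=2000))
```

The resulting string can be passed to gst-launch-1.0 or to a pipeline parser; the values above are starting points to be tuned, not recommendations.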

In practice, the speechmatics API seems to take this value into account, but the actual observed delay from the sending of an audio sample to the reception of the matching text items may often exceed it.

It is important for that observed delay to remain lower than the latency that was selected.

For this reason the element tracks the maximum observed delay and exposes it as max-observed-delay, a read-only property that emits change notifications. The user can monitor this property to tune the latency property as desired.

In addition, a warning message will be posted by the element whenever the maximum observed delay is greater than the latency that was selected.
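The monitoring described above could be sketched as follows. The callback has the (object, pspec) shape of a GObject notify handler and only calls get_property, so it would be attached with transcriber.connect("notify::max-observed-delay", on_max_observed_delay) in a PyGObject application. The 500 ms headroom and both function names are assumptions made for illustration, not values taken from the element.

```python
HEADROOM_MS = 500  # assumed safety margin, not documented by the element


def suggested_latency(current_latency_ms, max_observed_delay_ms,
                      headroom_ms=HEADROOM_MS):
    """Return a latency (ms) covering the observed delay plus headroom.

    The current latency is never lowered by this helper.
    """
    return max(current_latency_ms, max_observed_delay_ms + headroom_ms)


def on_max_observed_delay(transcriber, _pspec):
    """Handler for notify::max-observed-delay (hypothetical name)."""
    latency = transcriber.get_property("latency")
    observed = transcriber.get_property("max-observed-delay")
    suggestion = suggested_latency(latency, observed)
    if observed >= latency:
        # The observed delay should stay below the selected latency.
        print(f"observed delay {observed} ms is not below latency "
              f"{latency} ms; consider latency={suggestion}")
    return suggestion
```

The helper only reports a suggestion; whether a new latency value can be applied to an already running pipeline is not stated on this page, so the sketch leaves that decision to the application.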

Note that application users can also use the lateness property in combination with, or instead of, the latency property. Setting this property shifts the running times of the output items forward, thus desynchronizing them from the audio by a fixed offset. This is useful when loss of synchronization is an acceptable tradeoff for decreased latency.
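The arithmetic behind the lateness shift can be illustrated with a small sketch. It assumes that the fixed offset equals the lateness value itself (in milliseconds) and that running times are expressed in nanoseconds, as is usual in GStreamer; the function name is hypothetical.

```python
NS_PER_MS = 1_000_000  # nanoseconds per millisecond


def shifted_running_time_ns(running_time_ns, lateness_ms):
    """Running time of an output item after the lateness shift.

    Assumption: the item is moved forward by exactly lateness_ms.
    """
    return running_time_ns + lateness_ms * NS_PER_MS


if __name__ == "__main__":
    # A word spoken at 1.5 s with lateness=200 would be stamped at 1.7 s.
    print(shifted_running_time_ns(1_500_000_000, 200))
```

Under that assumption, text items trail the matching audio by the lateness value, which is the desynchronization the paragraph above refers to.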

Finally, the element will push gaps at regular intervals as empty transcripts are received from the speechmatics API. In practice, the average duration of such gaps has been observed to be roughly two seconds; this can serve as a basis for sizing queues on parallel branches when applicable.
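One plausible way to turn the two-second figure into a queue size is sketched below. It assumes that a queue on a parallel branch should hold at least the selected latency plus one gap interval; that sizing rule and the function name are assumptions, while max-size-time (in nanoseconds), max-size-buffers and max-size-bytes are properties of the stock queue element.

```python
NS_PER_MS = 1_000_000  # nanoseconds per millisecond


def queue_max_size_time_ns(latency_ms, gap_ms=2000, margin_ms=0):
    """Suggested max-size-time (ns) for a queue on a parallel branch.

    Assumed rule: selected latency + average gap duration + optional margin.
    """
    return (latency_ms + gap_ms + margin_ms) * NS_PER_MS


if __name__ == "__main__":
    size = queue_max_size_time_ns(8000)
    # Disable the other limits so that only the time limit applies.
    print(f"queue max-size-time={size} max-size-buffers=0 max-size-bytes=0")
```

With the default latency of 8000 ms this yields a ten-second queue; the figure should be checked against the gaps actually observed in a given deployment.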

Hierarchy

GObject
    ╰──GInitiallyUnowned
        ╰──GstObject
            ╰──GstElement
                ╰──speechmaticstranscriber

Implemented interfaces

Factory details

Authors – Mathieu Duponchelle

Classification – Audio/Text/Filter

Rank – none

Plugin – speechmatics

Package – gst-plugin-speechmatics

Pad Templates

sink

audio/x-raw:
           rate: [ 8000, 48000 ]
       channels: 1
         layout: { (string)interleaved, (string)non-interleaved }
         format: S16LE

Presence – always

Direction – sink

Object type – GstPad


src

text/x-raw:
         format: utf8

Presence – always

Direction – src

Object type – GstSpeechmaticsTranscriberSrcPad


translate_src_%u

text/x-raw:
         format: utf8

Presence – request

Direction – src

Object type – GstSpeechmaticsTranscriberSrcPad


unsynced_src

application/x-json:

Presence – always

Direction – src

Object type – GstPad


unsynced_translate_src_%u

application/x-json:

Presence – sometimes

Direction – src

Object type – GstPad


Properties

additional-vocabulary

“additional-vocabulary” GstValueArray *

Additional vocabulary speechmatics should use

Flags : Read / Write


api-key

“api-key” gchararray

Speechmatics API Key

Flags : Read / Write

Default value : NULL


diarization

“diarization” GstSpeechmaticsTranscriberDiarization *

Defines how to separate speakers in the audio

Flags : Read / Write

Default value : none (0)


enable-late-punctuation-hack

“enable-late-punctuation-hack” gboolean

deprecated: speechmatics now appears to group punctuation reliably

Flags : Read / Write

Default value : true


join-punctuation

“join-punctuation” gboolean

Whether punctuation should be joined with the preceding word

Flags : Read / Write

Default value : true


language-code

“language-code” gchararray

The Language of the Stream, ISO code

Flags : Read / Write

Default value : en


latency

“latency” guint

Amount of milliseconds to allow for transcription

Flags : Read / Write

Default value : 8000


lateness

“lateness” guint

Amount of milliseconds to introduce as lateness

Flags : Read / Write

Default value : 0


mask-profanities

“mask-profanities” gboolean

Mask profanities with * of the same length as the word

Flags : Read / Write

Default value : false


max-delay

“max-delay” guint

Max delay to pass to the speechmatics API (0 = use latency)

Flags : Read / Write

Default value : 0


max-observed-delay

“max-observed-delay” guint

Maximum delay observed between the sending of an audio sample and the reception of an item

Flags : Read

Default value : 0


max-speakers

“max-speakers” guint

The maximum number of speakers that may be detected with diarization=speaker

Flags : Read / Write

Default value : 50


url

“url” gchararray

URL of the transcription server

Flags : Read / Write

Default value : ws://0.0.0.0:9000


Named constants

GstSpeechmaticsTranscriberDiarization

Members

none (0) – None: no diarization
speaker (1) – Speaker: identify speakers by their voices

GstSpeechmaticsTranscriberSrcPad

GObject
    ╰──GInitiallyUnowned
        ╰──GstObject
            ╰──GstPad
                ╰──GstSpeechmaticsTranscriberSrcPad

Properties

language-code

“language-code” gchararray

The Language the Stream must be translated to

Flags : Read / Write

Default value : NULL
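Requesting a translation pad and setting its target language might look like the sketch below. It relies on request_pad_simple, which exists on GstElement from GStreamer 1.20 onwards (older versions would need a different call), and on the pad-level language-code property listed above; the helper name and the "fr" value are hypothetical.

```python
def request_translation_pad(transcriber, language_code):
    """Request a translate_src_%u pad and set its target language.

    `transcriber` is expected to be a speechmaticstranscriber element
    obtained through PyGObject (Gst.ElementFactory.make).
    """
    pad = transcriber.request_pad_simple("translate_src_%u")
    if pad is None:
        raise RuntimeError("could not request a translate_src_%u pad")
    # language-code on the pad is the language to translate to.
    pad.set_property("language-code", language_code)
    return pad
```

A typical call would be request_translation_pad(transcriber, "fr"), after which the returned pad can be linked downstream like any other source pad.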

