speechmaticstranscriber
speechmaticstranscriber is an element that uses the speechmatics realtime API to transcribe speech audio into text.
The element can work with both live and non-live data.
With non-live data, the speechmatics endpoint can process audio much faster than real time. This makes the element useful for quickly transcribing large audio files, but it also consumes credit at that same accelerated rate, so it is advisable to exercise restraint when running the element on non-live data solely for testing purposes.
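As a sketch, a minimal non-live transcription pipeline could look like the following Python snippet; the input file name, output file name and server URL are placeholders, and the caps filter simply matches the sink pad template documented below:

```python
#!/usr/bin/env python3
# Sketch: transcribe a file faster than real time and write the raw
# transcript buffers to a text file. File names and URL are placeholders.
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)

pipeline = Gst.parse_launch(
    "filesrc location=input.wav ! decodebin ! audioconvert ! audioresample "
    "! audio/x-raw,format=S16LE,channels=1,rate=16000 "
    "! speechmaticstranscriber url=ws://127.0.0.1:9000 "
    "! filesink location=transcript.txt"
)

pipeline.set_state(Gst.State.PLAYING)

# Block until the stream finishes or an error is posted, then shut down.
bus = pipeline.get_bus()
bus.timed_pop_filtered(
    Gst.CLOCK_TIME_NONE, Gst.MessageType.EOS | Gst.MessageType.ERROR
)
pipeline.set_state(Gst.State.NULL)
```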
With live data, it is up to the user to determine appropriate values for the latency property (the latency the element will advertise to the pipeline) and, optionally, for the max-delay property, a value that is passed to the speechmatics API to request a maximum latency from it.
In practice, the speechmatics API seems to take this value into account, but the actual observed delay from the sending of an audio sample to the reception of the matching text items may often exceed it.
It is important for that observed delay to remain less than the selected latency. For this reason the element tracks a max-observed-delay, exposed as a read-only property that emits change notifications. The user can monitor that property to tune the latency property as desired.
In addition, the element posts a warning message whenever the maximum observed delay exceeds the selected latency.
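As a sketch of how that monitoring could be wired up from an application (pipeline and transcriber are assumed to come from application code, for instance the non-live sketch above; a running GLib main loop is needed for the bus signal watch):

```python
# Sketch: watch the read-only max-observed-delay property and the element's
# warning messages in order to tune the latency property.
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst


def watch_transcription_delay(pipeline, transcriber):
    def on_delay_notify(element, _pspec):
        # Both values are assumed to be expressed in milliseconds,
        # as the latency property is.
        observed = element.get_property("max-observed-delay")
        configured = element.get_property("latency")
        print(f"max observed delay: {observed} ms "
              f"(configured latency: {configured} ms)")

    transcriber.connect("notify::max-observed-delay", on_delay_notify)

    # The element also posts a warning on the bus whenever the observed
    # delay exceeds the selected latency.
    def on_warning(_bus, message):
        err, _debug = message.parse_warning()
        print(f"warning from {message.src.get_name()}: {err.message}")

    bus = pipeline.get_bus()
    bus.add_signal_watch()
    bus.connect("message::warning", on_warning)
```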
Note that users can also use the lateness property, either in combination with or instead of the latency property. Setting this property shifts the running times of the output items forward, desynchronizing them from the audio by a fixed offset. This is useful when a loss of synchronization is an acceptable tradeoff for a decreased latency.
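For instance, accepting a fixed offset in exchange for a smaller advertised latency might look like this (the values are arbitrary examples, and transcriber is assumed to be the element instance from the earlier sketches):

```python
# Sketch: trade synchronization for latency (arbitrary example values).
transcriber.set_property("latency", 2000)   # advertise a modest 2 s latency
transcriber.set_property("lateness", 2000)  # shift output 2 s forward instead
```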
Finally, the element pushes gaps at regular intervals as empty transcripts are received from the speechmatics API. In practice the average duration of such gaps has been observed to be roughly two seconds; this can serve as a basis for sizing queues on parallel branches where applicable.
Hierarchy
GObject ╰──GInitiallyUnowned ╰──GstObject ╰──GstElement ╰──speechmaticstranscriber
Implemented interfaces
Factory details
Authors: – Mathieu Duponchelle
Classification: – Audio/Text/Filter
Rank – none
Plugin – speechmatics
Package – gst-plugin-speechmatics
Pad Templates
sink
audio/x-raw:
rate: [ 8000, 48000 ]
channels: 1
layout: { (string)interleaved, (string)non-interleaved }
format: S16LE
src
text/x-raw:
format: utf8
translate_src_%u
text/x-raw:
format: utf8
unsynced_translate_src_%u
application/x-json:
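Assuming the translate_src_%u template is a request pad (the %u placeholder and the per-pad language-code property documented below suggest so), a translation pad could be requested and configured along these lines; the pad name and target language are example values, and request_pad_simple requires GStreamer 1.20 (use get_request_pad on older versions):

```python
# Sketch: request a synchronized translation pad and pick its target language.
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)

transcriber = Gst.ElementFactory.make("speechmaticstranscriber", None)
assert transcriber is not None, "speechmatics plugin not found"

# Instantiate the translate_src_%u template and select the target language
# through the pad's language-code property (see GstSpeechmaticsTranscriberSrcPad).
translate_pad = transcriber.request_pad_simple("translate_src_0")
translate_pad.set_property("language-code", "fr")

# The pad can then be linked like any other source pad, for instance to a
# queue feeding a subtitle muxer or a text sink.
```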
Properties
additional-vocabulary
“additional-vocabulary” GstValueArray *
Additional vocabulary speechmatics should use
Flags : Read / Write
diarization
“diarization” GstSpeechmaticsTranscriberDiarization *
Defines how to separate speakers in the audio
Flags : Read / Write
Default value : none (0)
enable-late-punctuation-hack
“enable-late-punctuation-hack” gboolean
deprecated: speechmatics now appears to group punctuation reliably
Flags : Read / Write
Default value : true
join-punctuation
“join-punctuation” gboolean
Whether punctuation should be joined with the preceding word
Flags : Read / Write
Default value : true
language-code
“language-code” gchararray
The Language of the Stream, ISO code
Flags : Read / Write
Default value : en
latency
“latency” guint
Amount of milliseconds to allow for transcription
Flags : Read / Write
Default value : 8000
lateness
“lateness” guint
Amount of milliseconds to introduce as lateness
Flags : Read / Write
Default value : 0
mask-profanities
“mask-profanities” gboolean
Mask profanities with * of the same length as the word
Flags : Read / Write
Default value : false
max-delay
“max-delay” guint
Max delay to pass to the speechmatics API (0 = use latency)
Flags : Read / Write
Default value : 0
max-observed-delay
“max-observed-delay” guint
Maximum delay observed between the sending of an audio sample and the reception of an item
Flags : Read
Default value : 0
max-speakers
“max-speakers” guint
The maximum number of speakers that may be detected with diarization=speaker
Flags : Read / Write
Default value : 50
url
“url” gchararray
URL of the transcription server
Flags : Read / Write
Default value : ws://0.0.0.0:9000
Named constants
GstSpeechmaticsTranscriberDiarization
Members
none (0) – None: no diarization
speaker (1) – Speaker: identify speakers by their voices
GstSpeechmaticsTranscriberSrcPad
GObject ╰──GInitiallyUnowned ╰──GstObject ╰──GstPad ╰──GstSpeechmaticsTranscriberSrcPad
Properties
language-code
“language-code” gchararray
The Language the Stream must be translated to
Flags : Read / Write
Default value : NULL