platypush.plugins.stt

class platypush.plugins.stt.SttPlugin(input_device: Union[int, str, None] = None, hotword: Optional[str] = None, hotwords: Optional[List[str]] = None, conversation_timeout: Optional[float] = 10.0, block_duration: float = 1.0)[source]

Abstract class for speech-to-text plugins.

Triggers:

__init__(input_device: Union[int, str, None] = None, hotword: Optional[str] = None, hotwords: Optional[List[str]] = None, conversation_timeout: Optional[float] = 10.0, block_duration: float = 1.0)[source]
Parameters:
  • input_device – PortAudio device index or name that will be used for recording speech (default: default system audio input device).
  • hotword – When this word is detected, the plugin will trigger a platypush.message.event.stt.HotwordDetectedEvent instead of a platypush.message.event.stt.SpeechDetectedEvent event. You can use these events for hooking other assistants.
  • hotwords – Use a list of hotwords instead of a single one.
  • conversation_timeout – If hotword or hotwords are set and conversation_timeout is set, the next speech detected event will trigger a platypush.message.event.stt.ConversationDetectedEvent instead of a platypush.message.event.stt.SpeechDetectedEvent event. You can attach custom hooks to this event to run any logic that depends on the detected speech. This can emulate a kind of “OK, Google. Turn on the lights” interaction without using an external assistant (default: 10 seconds).
  • block_duration – Duration of the acquired audio blocks (default: 1 second).
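
The interplay between hotword, hotwords and conversation_timeout can be sketched as a small state machine. This is only an illustration of the behaviour described above, not the plugin's code: the EventChooser class, its choose method and the exact, case-insensitive hotword match are all assumptions, and event names are returned as plain strings.

```python
import time
from typing import List, Optional


class EventChooser:
    """Hypothetical stand-in that mimics the documented event selection."""

    def __init__(self, hotwords: Optional[List[str]] = None,
                 conversation_timeout: Optional[float] = 10.0) -> None:
        self.hotwords = {w.lower() for w in (hotwords or [])}
        self.conversation_timeout = conversation_timeout
        self._hotword_time: Optional[float] = None  # when a hotword was last heard

    def choose(self, speech: str, now: Optional[float] = None) -> str:
        """Return the name of the event that the detected speech would trigger."""
        now = time.monotonic() if now is None else now
        text = speech.strip().lower()

        # Assumption: a hotword matches only when it is the whole utterance.
        if text in self.hotwords:
            self._hotword_time = now
            return "HotwordDetectedEvent"

        in_conversation = (
            self.conversation_timeout is not None
            and self._hotword_time is not None
            and now - self._hotword_time <= self.conversation_timeout
        )
        # Only the next utterance after a hotword counts as a conversation.
        self._hotword_time = None
        return "ConversationDetectedEvent" if in_conversation else "SpeechDetectedEvent"
```

With hotwords=["computer"] and the default timeout, saying "computer" followed within ten seconds by "turn on the lights" would yield a HotwordDetectedEvent and then a ConversationDetectedEvent; the same phrase without a preceding hotword yields a SpeechDetectedEvent.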

before_recording() → None[source]

Method called when the recording_thread starts. Override it to run any logic that must execute before recording begins.

static convert_frames(frames: bytes) → bytes[source]

Conversion method for raw audio frames. It just returns the input frames as bytes. Override it if required by your logic.

Parameters: frames – Input audio frames, as bytes.
Returns: The audio frames as passed in the input. Override if required.
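
As an illustration of overriding convert_frames, the sketch below converts float32 samples to 16-bit PCM, which some speech engines expect. BaseStt is a hypothetical stand-in for the documented base class, and the input format (native-endian float32 in the range [-1.0, 1.0]) is an assumption; neither comes from platypush.

```python
import array


class BaseStt:
    """Hypothetical stand-in for the documented base class."""

    @staticmethod
    def convert_frames(frames: bytes) -> bytes:
        # Default behaviour described above: return the frames unchanged.
        return frames


class Int16Stt(BaseStt):
    @staticmethod
    def convert_frames(frames: bytes) -> bytes:
        # Assumption: input is native-endian float32 samples in [-1.0, 1.0].
        samples = array.array("f", frames)
        # Clamp each sample, then scale it to the signed 16-bit range.
        pcm = array.array(
            "h", (int(max(-1.0, min(1.0, s)) * 32767) for s in samples)
        )
        return pcm.tobytes()
```

The override keeps the bytes-in, bytes-out contract, so the result can be handed to detect_speech without further changes.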

detect(audio_file: str) → platypush.message.response.stt.SpeechDetectedResponse[source]

Perform speech-to-text analysis on an audio file. Must be implemented by the derived classes.

Parameters: audio_file – Path to the audio file.

detect_speech(frames) → str[source]

Method called within the detection_thread when new audio frames have been captured. Must be implemented by the derived classes.

Parameters: frames – Audio frames, as returned by convert_frames.
Returns: Detected text, as a string. Returns an empty string if no text has been detected.
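
The detect_speech contract (bytes in, text out, empty string when nothing is detected) can be shown with a toy implementation. ToyStt is hypothetical and does no real recognition: it assumes 16-bit PCM input and returns a placeholder string when the peak amplitude crosses a threshold. A real plugin would call its speech engine here instead.

```python
import array


class ToyStt:
    """Hypothetical stand-in; a real plugin would derive from SttPlugin."""

    def __init__(self, threshold: int = 500) -> None:
        self.threshold = threshold

    def detect_speech(self, frames: bytes) -> str:
        # Assumption: frames hold native-endian signed 16-bit PCM samples.
        samples = array.array("h", frames)
        if not samples:
            return ""  # nothing captured, so nothing detected
        peak = max(abs(s) for s in samples)
        # Placeholder text instead of a real transcription.
        return "<speech>" if peak >= self.threshold else ""
```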

detection_thread() → None[source]

This thread reads frames from _audio_queue, performs the speech-to-text detection and calls on_speech_detected when text has been detected.

on_detection_ended() → None[source]

Method called when the detection_thread stops. Clean up your context variables and models here.

on_detection_started() → None[source]

Method called when the detection_thread starts. Initialize your context variables and models here if required.

on_recording_ended() → None[source]

Method called when the recording_thread stops. Put here any logic that you want to run after the audio device is closed.

on_recording_started() → None[source]

Method called after the recording_thread opens the audio device. Put here any logic that you may want to run after the recording starts.
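
The order in which the recording hooks fire, as described in this section, can be sketched as follows. RecorderSketch is a hypothetical stand-in that records call names instead of touching an audio device; the "open device", "read block" and "close device" entries are placeholders for the real I/O.

```python
from typing import Iterable, List


class RecorderSketch:
    """Hypothetical stand-in for the recording side of an STT plugin."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def before_recording(self) -> None:
        self.calls.append("before_recording")

    def on_recording_started(self) -> None:
        self.calls.append("on_recording_started")

    def on_recording_ended(self) -> None:
        self.calls.append("on_recording_ended")

    def recording_thread(self, blocks: Iterable[bytes]) -> None:
        self.before_recording()
        self.calls.append("open device")  # placeholder for opening the input stream
        try:
            self.on_recording_started()
            for _ in blocks:
                self.calls.append("read block")  # placeholder for queueing frames
        finally:
            # The device is closed before on_recording_ended runs.
            self.calls.append("close device")
            self.on_recording_ended()
```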

on_speech_detected(speech: str) → None[source]

Hook called when speech is detected. Triggers the right event depending on the current context.

Parameters: speech – Detected speech.

recording_thread(block_duration: Optional[float] = None, block_size: Optional[int] = None, input_device: Optional[str] = None) → None[source]

Recording thread. It reads raw frames from the audio device and dispatches them to detection_thread.

Parameters:
  • block_duration – Audio blocks duration. Specify either block_duration or block_size.
  • block_size – Size of the audio blocks. Specify either block_duration or block_size.
  • input_device – Input device
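
The hand-off between recording_thread and detection_thread is a producer/consumer pattern over a queue. The sketch below shows the idea using only the standard library. The run_pipeline function, the None sentinel and the in-memory blocks are assumptions made for illustration; the plugin reads from an audio device and triggers events rather than collecting text in a list.

```python
import queue
import threading
from typing import Callable, List, Optional


def run_pipeline(blocks: List[bytes],
                 detect_speech: Callable[[bytes], str]) -> List[str]:
    """Feed audio blocks through a queue and collect the non-empty detections."""
    audio_queue: "queue.Queue[Optional[bytes]]" = queue.Queue()
    detected: List[str] = []

    def recording_thread() -> None:
        for block in blocks:  # stands in for reading from the audio device
            audio_queue.put(block)
        audio_queue.put(None)  # sentinel (assumption): no more audio

    def detection_thread() -> None:
        while True:
            frames = audio_queue.get()
            if frames is None:
                break
            text = detect_speech(frames)
            if text:  # an empty string means nothing was detected
                detected.append(text)  # stands in for on_speech_detected

    threads = [
        threading.Thread(target=recording_thread),
        threading.Thread(target=detection_thread),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return detected
```

Decoupling the two threads this way lets a slow detection step fall behind without blocking audio capture, at the cost of a growing queue.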

start_detection(input_device: Optional[str] = None, seconds: Optional[float] = None, block_duration: Optional[float] = None) → None[source]

Start the speech detection engine.

Parameters:
  • input_device – Audio input device name/index override
  • seconds – If set, the detection engine will stop after this many seconds; otherwise it will keep running until stop_detection is called or the application stops.
  • block_duration – block_duration override.

stop_detection() → None[source]

Stop the speech detection engine.
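
One way to obtain the seconds behaviour of start_detection is a timer that calls stop_detection when it expires. The sketch below is an assumption about how this could be done, not the plugin's implementation; DetectionControl, running and wait_stopped are hypothetical names.

```python
import threading
from typing import Optional


class DetectionControl:
    """Hypothetical stand-in for the start/stop logic of an STT plugin."""

    def __init__(self) -> None:
        self._stopped = threading.Event()
        self._stopped.set()  # detection is not running initially
        self._timer: Optional[threading.Timer] = None

    def start_detection(self, seconds: Optional[float] = None) -> None:
        self._cancel_timer()
        self._stopped.clear()
        if seconds is not None:
            # Stop automatically once the requested time has elapsed.
            self._timer = threading.Timer(seconds, self.stop_detection)
            self._timer.daemon = True
            self._timer.start()

    def stop_detection(self) -> None:
        self._cancel_timer()
        self._stopped.set()

    def running(self) -> bool:
        return not self._stopped.is_set()

    def wait_stopped(self, timeout: Optional[float] = None) -> bool:
        """Block until detection stops; return False if the timeout expires first."""
        return self._stopped.wait(timeout)

    def _cancel_timer(self) -> None:
        if self._timer is not None:
            self._timer.cancel()
            self._timer = None
```

Without seconds, detection stays active until stop_detection is called explicitly.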