org.llm4s.speech.stt
Members list
Type members
Classlikes
Errors that can occur during speech-to-text processing.
Attributes
Options for speech-to-text transcription.
Value parameters
- confidenceThreshold: Minimum confidence (0.0-1.0) to include words
- diarization: Whether to detect and separate speakers
- enableTimestamps: Whether to include word-level timestamps
- language: BCP 47 language tag (e.g., "en-US", "fr-FR")
- prompt: Optional context or dictionary to guide transcription
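The parameters above can be illustrated with a minimal sketch. `SttOptionsSketch` is a hypothetical stand-in, not the library's `STTOptions`: the field names follow the list above, but the types, the default values, and the `keepWord` helper are assumptions made for illustration.

```scala
// Hypothetical stand-in for STTOptions. Field names follow the documented
// parameters; types and defaults are assumptions.
final case class SttOptionsSketch(
  language: Option[String] = None,   // BCP 47 tag such as "en-US"
  prompt: Option[String] = None,     // context or dictionary hint
  enableTimestamps: Boolean = false, // word-level timestamps
  diarization: Boolean = false,      // speaker separation
  confidenceThreshold: Double = 0.0  // minimum confidence to keep a word
) {
  require(
    confidenceThreshold >= 0.0 && confidenceThreshold <= 1.0,
    "confidenceThreshold must be in [0.0, 1.0]"
  )
}

object SttOptionsExample {
  // French transcription with timestamps, dropping low-confidence words.
  val frenchWithTimestamps: SttOptionsSketch =
    SttOptionsSketch(
      language = Some("fr-FR"),
      enableTimestamps = true,
      confidenceThreshold = 0.6
    )

  // Assumed interpretation of the threshold: keep a word whose confidence
  // is at least the configured minimum.
  def keepWord(confidence: Double, options: SttOptionsSketch): Boolean =
    confidence >= options.confidenceThreshold
}
```

Whether the real implementation compares with `>=` or `>` is not stated in this page, so the boundary behavior here is a guess.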
Attributes
- Companion: object
- Supertypes: trait Serializable, trait Product, trait Equals, class Object, trait Matchable, class Any
Attributes
- Companion: class
- Supertypes: trait Product, trait Mirror, class Object, trait Matchable, class Any
- Self type: STTOptions.type
Abstraction for speech-to-text conversion providers.
Implementations should handle various audio formats and provide optional features like word-level timestamps and speaker diarization.
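As a rough sketch of what such a provider abstraction can look like, the trait below models a single transcription call that returns either an error or a result. All names here (`SpeechToTextSketch`, `transcribe`, `SketchError`, `SketchResult`, `EchoProvider`) are hypothetical; the actual trait's methods and types are not shown on this page and may differ.

```scala
// Hypothetical shapes for illustration only.
final case class SketchError(message: String)
final case class SketchResult(text: String, language: Option[String])

trait SpeechToTextSketch {
  def name: String

  // Assumed signature: raw audio bytes in, error or result out.
  def transcribe(
    audio: Array[Byte],
    language: Option[String]
  ): Either[SketchError, SketchResult]
}

// A stub provider that never decodes audio. It only reports the input size,
// which makes it usable as a test double for code written against the trait.
object EchoProvider extends SpeechToTextSketch {
  val name: String = "echo"

  def transcribe(
    audio: Array[Byte],
    language: Option[String]
  ): Either[SketchError, SketchResult] =
    if (audio.isEmpty) Left(SketchError("empty audio input"))
    else Right(SketchResult(s"${audio.length} bytes received", language))
}
```

Coding against a trait like this keeps callers independent of whether Vosk or Whisper does the work.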
Attributes
- Supertypes: class Object, trait Matchable, class Any
- Known subtypes: class VoskSpeechToText, class WhisperSpeechToText
Complete transcription result from speech-to-text processing.
Value parameters
- confidence: Overall confidence of the transcription
- language: Detected or specified language
- meta: Source audio metadata
- processingTimeMs: Time taken to process (for metrics/monitoring)
- text: Full transcription text
- timestamps: Word-level timing information (only if enabled)
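A minimal sketch of consuming such a result for monitoring follows. `TranscriptionSketch` and `WordSketch` are hypothetical stand-ins: the field names mirror the list above, the types are assumptions, and `meta` is left out because its shape is not described here.

```scala
// Hypothetical stand-ins; optionality of each field is an assumption.
final case class WordSketch(word: String, startSec: Double, endSec: Double)

final case class TranscriptionSketch(
  text: String,
  language: Option[String],
  confidence: Option[Double],
  timestamps: List[WordSketch], // assumed empty when timestamps are disabled
  processingTimeMs: Option[Long]
)

object TranscriptionSummary {
  // Builds a one-line summary suitable for a log or metrics label.
  def summarize(t: TranscriptionSketch): String = {
    val lang = t.language.getOrElse("unknown")
    val conf = t.confidence.map(_.toString).getOrElse("n/a")
    val time = t.processingTimeMs.map(ms => s"$ms ms").getOrElse("n/a")
    s"lang=$lang confidence=$conf words=${t.timestamps.size} time=$time"
  }
}
```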
Attributes
- Supertypes: trait Serializable, trait Product, trait Equals, class Object, trait Matchable, class Any
Vosk-based speech-to-text implementation. Replaces Sphinx4 because Vosk is more actively maintained and performs better.
Value parameters
- bufferSize: Buffer size for audio processing (bytes). Larger sizes may improve throughput.
- modelPath: Path to the Vosk model directory. Defaults to standard Vosk model location.
- targetSampleRate: Target sample rate for audio preprocessing (Hz). Vosk standard is 16000.
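To get a feel for how `bufferSize` and `targetSampleRate` interact, the arithmetic below estimates how much audio one buffer holds. It assumes 16-bit mono PCM (2 bytes per sample), which this page does not confirm; `VoskBufferMath` and its methods are hypothetical helpers, not part of the library.

```scala
object VoskBufferMath {
  // Assumption: 16-bit PCM, single channel.
  val BytesPerSample: Int = 2

  // Milliseconds of audio that fit into one buffer.
  def bufferMillis(bufferSize: Int, sampleRateHz: Int): Double =
    bufferSize.toDouble * 1000.0 / (sampleRateHz.toDouble * BytesPerSample)

  // Number of buffer reads needed to consume totalBytes (rounded up).
  def chunkCount(totalBytes: Long, bufferSize: Int): Long =
    (totalBytes + bufferSize - 1) / bufferSize
}
```

Under these assumptions a 4096-byte buffer at 16000 Hz holds 128 ms of audio, and one second of audio (32000 bytes) takes 8 reads. A larger buffer means fewer reads per second of audio, which is consistent with the throughput note above.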
Attributes
- Companion: object
- Supertypes
Attributes
- Companion: class
- Supertypes: class Object, trait Matchable, class Any
- Self type: VoskSpeechToText.type
Enhanced Whisper integration via CLI (whisper.cpp or openai-whisper). Supports various Whisper models and output formats.
Enhanced Whisper integration via CLI (whisper.cpp or openai-whisper). Supports various Whisper models and output formats.
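For orientation, a CLI-based integration typically assembles an argument list and hands it to a process runner. The sketch below only builds such a list for the openai-whisper command, using its `--model`, `--output_format`, `--output_dir`, and `--language` options. `WhisperCommandSketch` and its methods are hypothetical, and this is not necessarily how this class constructs its command. Reducing a BCP 47 tag to its primary subtag is also an assumption about what the CLI expects.

```scala
object WhisperCommandSketch {
  // "en-US" -> "en". Assumption: the CLI wants a bare language code.
  def primarySubtag(bcp47: String): String =
    bcp47.takeWhile(_ != '-').toLowerCase

  // Builds (but does not run) an openai-whisper command line.
  def openAiWhisperArgs(
    audioPath: String,
    model: String,
    language: Option[String],
    outputFormat: String,
    outputDir: String
  ): Seq[String] =
    Seq(
      "whisper", audioPath,
      "--model", model,
      "--output_format", outputFormat,
      "--output_dir", outputDir
    ) ++ language.toSeq.flatMap(l => Seq("--language", primarySubtag(l)))
}
```

Keeping the arguments as a `Seq[String]` rather than one concatenated string avoids shell quoting problems with paths that contain spaces; the sequence can be passed to `scala.sys.process` as is.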
Attributes
- Supertypes
Word-level timestamp information from transcription with optional speaker identification.
Value parameters
- confidence: Optional confidence score (0.0-1.0)
- endSec: End time in seconds
- speakerId: Optional speaker identifier for diarized content (int-based for efficiency)
- startSec: Start time in seconds (relative to audio start)
- word: The word text
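One common use of these fields is to merge consecutive words from the same speaker into turns. The sketch below does that with a hypothetical `WordTimestampSketch` whose field names follow the list above; the types, the `Turn` class, and the `turns` helper are assumptions for illustration.

```scala
// Hypothetical stand-in for WordTimestamp; types are assumptions.
final case class WordTimestampSketch(
  word: String,
  startSec: Double,
  endSec: Double,
  speakerId: Option[Int] = None,
  confidence: Option[Double] = None
)

object SpeakerTurns {
  final case class Turn(
    speakerId: Option[Int],
    startSec: Double,
    endSec: Double,
    text: String
  )

  // Merges adjacent words that share a speakerId into a single turn.
  // Input is assumed to be sorted by start time.
  def turns(words: List[WordTimestampSketch]): List[Turn] =
    words
      .foldLeft(List.empty[Turn]) {
        case (last :: rest, w) if last.speakerId == w.speakerId =>
          // Same speaker: extend the current turn.
          last.copy(endSec = w.endSec, text = last.text + " " + w.word) :: rest
        case (acc, w) =>
          // New speaker (or first word): start a new turn.
          Turn(w.speakerId, w.startSec, w.endSec, w.word) :: acc
      }
      .reverse
}
```

Words without a `speakerId` are grouped together as `None`, so the helper also works on output produced without diarization.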
Attributes
- Companion: object
- Supertypes: trait Serializable, trait Product, trait Equals, class Object, trait Matchable, class Any
Attributes
- Companion: class
- Supertypes: trait Product, trait Mirror, class Object, trait Matchable, class Any
- Self type: WordTimestamp.type