org.llm4s.speech.stt

Members list

Type members

Classlikes

sealed trait STTError extends LLMError

Errors that can occur during speech-to-text processing.

Attributes

Companion
object
Supertypes
trait LLMError
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any
Known subtypes
object STTError

Attributes

Companion
trait
Supertypes
trait Sum
trait Mirror
class Object
trait Matchable
class Any
Self type
STTError.type
final case class STTOptions(language: Option[String], prompt: Option[String], enableTimestamps: Boolean, diarization: Boolean, confidenceThreshold: Double)

Options for speech-to-text transcription.

Value parameters

confidenceThreshold

Minimum confidence (0.0-1.0) to include words

diarization

Whether to detect and separate speakers

enableTimestamps

Whether to include word-level timestamps

language

BCP 47 language tag (e.g., "en-US", "fr-FR")

prompt

Optional context or dictionary to guide transcription
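
As a minimal sketch of how these parameters fit together, the snippet below builds an options value with every field spelled out. The case class here is a local stand-in that mirrors the documented signature so the example compiles on its own; `validated` is a hypothetical helper (not part of the library) that enforces the documented 0.0-1.0 range, and the prompt text is made up for illustration.

```scala
// Local stand-in mirroring the documented signature; the real class
// lives in org.llm4s.speech.stt.
final case class STTOptions(
  language: Option[String],
  prompt: Option[String],
  enableTimestamps: Boolean,
  diarization: Boolean,
  confidenceThreshold: Double
)

// Hypothetical helper (assumption, not a library function):
// rejects thresholds outside the documented 0.0-1.0 range.
def validated(opts: STTOptions): Either[String, STTOptions] =
  if (opts.confidenceThreshold >= 0.0 && opts.confidenceThreshold <= 1.0) Right(opts)
  else Left(s"confidenceThreshold must be within 0.0-1.0, got ${opts.confidenceThreshold}")

// French audio, word-level timestamps on, no speaker separation.
val frenchWithTimestamps = STTOptions(
  language = Some("fr-FR"),                 // BCP 47 tag
  prompt = Some("llm4s, Vosk, Whisper"),    // illustrative vocabulary hint
  enableTimestamps = true,
  diarization = false,
  confidenceThreshold = 0.6
)
```

Returning an `Either` keeps the range check explicit at the call site; whether the library itself validates the threshold is not stated in this section.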

Attributes

Companion
object
Supertypes
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any
object STTOptions

Attributes

Companion
class
Supertypes
trait Product
trait Mirror
class Object
trait Matchable
class Any
Self type
STTOptions.type
trait SpeechToText

Abstraction for speech-to-text conversion providers.

Implementations should handle various audio formats and provide optional features like word-level timestamps and speaker diarization.
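
The members of this trait are not listed in this section, so the following is only a sketch of the provider pattern the description implies. Every name in it (`SpeechToTextSketch`, `transcribe`, `SketchOptions`, `SketchWord`, `SketchResult`, `CannedSpeechToText`) is hypothetical and deliberately different from the real API. It shows one way an implementation might reject an unsupported audio format and honor the optional timestamp and diarization features.

```scala
import java.nio.file.{Path, Paths}

// Hypothetical, simplified shapes for illustration only.
final case class SketchOptions(enableTimestamps: Boolean, diarization: Boolean)
final case class SketchWord(word: String, startSec: Double, endSec: Double, speakerId: Option[Int])
final case class SketchResult(text: String, words: List[SketchWord])

// Hypothetical provider abstraction; the real trait's methods may differ.
trait SpeechToTextSketch {
  def transcribe(audioPath: Path, options: SketchOptions): Either[String, SketchResult]
}

// A canned provider standing in for a real engine: it returns fixed words
// and only accepts files whose name ends in ".wav".
final class CannedSpeechToText(words: List[SketchWord]) extends SpeechToTextSketch {
  def transcribe(audioPath: Path, options: SketchOptions): Either[String, SketchResult] =
    if (!audioPath.toString.toLowerCase.endsWith(".wav"))
      Left(s"unsupported audio format: $audioPath")
    else {
      val kept =
        if (!options.enableTimestamps) Nil                 // timestamps not requested
        else if (options.diarization) words                // keep speaker ids
        else words.map(_.copy(speakerId = None))           // drop speaker ids
      Right(SketchResult(words.map(_.word).mkString(" "), kept))
    }
}
```

Coding against the trait rather than a concrete engine lets callers swap providers without changing how results are consumed.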

Attributes

Supertypes
class Object
trait Matchable
class Any
Known subtypes
class VoskSpeechToText
class WhisperSpeechToText
final case class Transcription(text: String, language: Option[String], confidence: Option[Double], timestamps: List[WordTimestamp], meta: Option[AudioMeta], processingTimeMs: Option[Long])

Complete transcription result from speech-to-text processing.

Value parameters

confidence

Overall confidence of the transcription

language

Detected or specified language

meta

Source audio metadata

processingTimeMs

Time taken to process (for metrics/monitoring)

text

Full transcription text

timestamps

Word-level timing information (only if enabled)
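
Because several fields are optional and `timestamps` is only populated when enabled, consumers usually need to handle the empty case. The sketch below uses local stand-ins that mirror the documented signatures; `AudioMeta` is reduced to an empty placeholder because its fields are not shown in this section, and `spokenSpanSec` is a hypothetical helper, not a library function.

```scala
// Local stand-ins mirroring the documented signatures.
final case class WordTimestamp(
  word: String,
  startSec: Double,
  endSec: Double,
  speakerId: Option[Int],
  confidence: Option[Double]
)

// Placeholder only: the real AudioMeta is defined elsewhere.
final class AudioMeta

final case class Transcription(
  text: String,
  language: Option[String],
  confidence: Option[Double],
  timestamps: List[WordTimestamp],
  meta: Option[AudioMeta],
  processingTimeMs: Option[Long]
)

// Hypothetical helper: seconds between the start of the first word and
// the end of the last one, or None when no timestamps were produced.
def spokenSpanSec(t: Transcription): Option[Double] =
  for {
    first <- t.timestamps.headOption
    last  <- t.timestamps.lastOption
  } yield last.endSec - first.startSec

val sample = Transcription(
  text = "hello world",
  language = Some("en-US"),
  confidence = Some(0.9),
  timestamps = List(
    WordTimestamp("hello", 0.0, 0.5, None, Some(0.95)),
    WordTimestamp("world", 0.5, 1.25, None, Some(0.85))
  ),
  meta = None,
  processingTimeMs = Some(120L)
)
```

The helper assumes `timestamps` is ordered by time, which this section does not state explicitly.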

Attributes

Supertypes
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any
final class VoskSpeechToText(modelPath: Option[String], targetSampleRate: Int, bufferSize: Int) extends SpeechToText

Vosk-based speech-to-text implementation. Replaces Sphinx4 because Vosk is more actively maintained and performs better.

Value parameters

bufferSize

Buffer size for audio processing (bytes). Larger sizes may improve throughput.

modelPath

Path to the Vosk model directory. Defaults to standard Vosk model location.

targetSampleRate

Target sample rate for audio preprocessing (Hz). Vosk standard is 16000.
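
To get a feel for how `bufferSize` and `targetSampleRate` interact, the following sketch computes how much audio a single buffer holds. It assumes 16-bit mono PCM input (2 bytes per sample, 1 channel), which this section does not state; `bufferDurationMs` is a hypothetical helper, not part of the library, and the 4096-byte figure is only an example value.

```scala
// Hypothetical helper: milliseconds of audio contained in one buffer.
// Defaults assume 16-bit mono PCM; adjust for other sample formats.
def bufferDurationMs(
  bufferSizeBytes: Int,
  sampleRateHz: Int,
  bytesPerSample: Int = 2,
  channels: Int = 1
): Double =
  bufferSizeBytes * 1000.0 / (sampleRateHz.toDouble * bytesPerSample * channels)

// At 16000 Hz, 16-bit mono audio is 32000 bytes per second,
// so a 4096-byte buffer covers 128 ms of audio.
val exampleMs = bufferDurationMs(bufferSizeBytes = 4096, sampleRateHz = 16000)
```

Under these assumptions, doubling the buffer size doubles the audio handled per read, which is the throughput trade-off the parameter description refers to.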

Attributes

Companion
object
Supertypes
trait SpeechToText
class Object
trait Matchable
class Any
object VoskSpeechToText

Attributes

Companion
class
Supertypes
class Object
trait Matchable
class Any
Self type
VoskSpeechToText.type
final class WhisperSpeechToText(command: Seq[String], model: String, outputFormat: String) extends SpeechToText

Enhanced Whisper integration via CLI (whisper.cpp or openai-whisper). Supports various Whisper models and output formats.
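
How this class turns its constructor arguments into a command line is not documented here. As a rough illustration only, the sketch below assembles an argument list for the openai-whisper CLI, whose `--model` and `--output_format` options take values such as `base` and `json`. `openAiWhisperArgs` is a hypothetical helper, and whisper.cpp uses different flags, so this should not be read as the library's actual behavior.

```scala
// Hypothetical helper (assumption, not part of llm4s): builds an argument
// vector for the openai-whisper CLI from the three constructor-style values
// plus the audio file to transcribe.
def openAiWhisperArgs(
  command: Seq[String],   // e.g. Seq("whisper")
  model: String,          // e.g. "base"
  outputFormat: String,   // e.g. "json"
  audioFile: String
): Seq[String] =
  command ++ Seq(audioFile, "--model", model, "--output_format", outputFormat)

val exampleArgs = openAiWhisperArgs(Seq("whisper"), "base", "json", "clip.wav")
```

Keeping `command` as a `Seq[String]` avoids shell quoting issues, since each element is passed to the process as a separate argument.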

Attributes

Supertypes
trait SpeechToText
class Object
trait Matchable
class Any
final case class WordTimestamp(word: String, startSec: Double, endSec: Double, speakerId: Option[Int], confidence: Option[Double])

Word-level timestamp information from transcription with optional speaker identification.

Value parameters

confidence

Optional confidence score (0.0-1.0)

endSec

End time in seconds

speakerId

Optional speaker identifier for diarized content (int-based for efficiency)

startSec

Start time in seconds (relative to audio start)

word

The word text
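
When diarization is enabled, consecutive words that share a `speakerId` can be collapsed into speaker turns. The sketch below uses a local stand-in that mirrors the documented signature; `SpeakerTurn` and `speakerTurns` are hypothetical names introduced for illustration, and the input is assumed to be ordered by start time.

```scala
// Local stand-in mirroring the documented signature.
final case class WordTimestamp(
  word: String,
  startSec: Double,
  endSec: Double,
  speakerId: Option[Int],
  confidence: Option[Double]
)

// Hypothetical result type: one uninterrupted stretch by a single speaker.
final case class SpeakerTurn(speakerId: Option[Int], startSec: Double, endSec: Double, text: String)

// Hypothetical helper: merges adjacent words with the same speakerId.
// The accumulator is kept in reverse order so the latest turn is at the head.
def speakerTurns(words: List[WordTimestamp]): List[SpeakerTurn] =
  words.foldLeft(List.empty[SpeakerTurn]) {
    case (current :: done, w) if current.speakerId == w.speakerId =>
      current.copy(endSec = w.endSec, text = current.text + " " + w.word) :: done
    case (acc, w) =>
      SpeakerTurn(w.speakerId, w.startSec, w.endSec, w.word) :: acc
  }.reverse

val dialogue = List(
  WordTimestamp("hi", 0.0, 0.5, Some(0), Some(0.9)),
  WordTimestamp("there", 0.5, 1.0, Some(0), Some(0.8)),
  WordTimestamp("hello", 1.0, 1.5, Some(1), None)
)
```

Words without a speaker (`speakerId = None`) are grouped together as well, since `None == None`; a different policy may suit audio that is only partly diarized.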

Attributes

Companion
object
Supertypes
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any
object WordTimestamp

Attributes

Companion
class
Supertypes
trait Product
trait Mirror
class Object
trait Matchable
class Any
Self type
WordTimestamp.type