ChunkingUtils

org.llm4s.llmconnect.utils.ChunkingUtils
object ChunkingUtils

Utilities for splitting text, audio, and video data into fixed-size overlapping chunks.

Each chunking method produces a sequence of windows with configurable size and overlap, suitable for embedding pipelines and multimodal processing.

Attributes

Supertypes
class Object
trait Matchable
class Any

Members list

Value members

Concrete methods

def chunkAudio(samples: Array[Float], sampleRate: Int, windowSeconds: Int, overlapRatio: Double, padToWindow: Boolean): Seq[Array[Float]]

Window an audio signal into fixed-length segments with overlap. Optionally right-pad the final window with zeros so all windows have equal length.

Value parameters

overlapRatio

Overlap ratio in [0, 1). For example, 0.25 = 25% overlap.

padToWindow

If true, pad the last segment with zeros to full window length.

sampleRate

Samples per second (> 0).

samples

Mono PCM samples in [-1, 1].

windowSeconds

Window length in seconds (> 0).

Attributes

Returns

Sequence of audio windows. Each Array[Float] has length windowSamples (presumably sampleRate * windowSeconds) when padToWindow is true; otherwise the final window may be shorter.
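
The windowing described above can be sketched as follows. This is an illustrative re-implementation, not the library's source: the names AudioChunkSketch and chunkAudioSketch are hypothetical, and the step computation (window length times 1 - overlapRatio, rounded, with a minimum of 1) is an assumption about how the overlap ratio maps to a hop size.

```scala
object AudioChunkSketch {
  // Hypothetical sketch; the real ChunkingUtils.chunkAudio may differ in rounding
  // and in how it treats trailing samples.
  def chunkAudioSketch(
      samples: Array[Float],
      sampleRate: Int,
      windowSeconds: Int,
      overlapRatio: Double,
      padToWindow: Boolean
  ): Seq[Array[Float]] = {
    require(sampleRate > 0, "sampleRate must be > 0")
    require(windowSeconds > 0, "windowSeconds must be > 0")
    require(overlapRatio >= 0.0 && overlapRatio < 1.0, "overlapRatio must be in [0, 1)")

    // Window length in samples, and the hop between consecutive window starts (assumed formula).
    val windowSamples = sampleRate * windowSeconds
    val step = math.max(1, math.round(windowSamples * (1.0 - overlapRatio)).toInt)

    (0 until samples.length by step).map { start =>
      val end = math.min(start + windowSamples, samples.length)
      val window = samples.slice(start, end)
      // Right-pad a short window with zeros only when requested.
      if (padToWindow && window.length < windowSamples) window.padTo(windowSamples, 0.0f)
      else window
    }
  }
}
```

With 10 samples, sampleRate = 4, windowSeconds = 1 and overlapRatio = 0.5, this sketch uses a 4-sample window and a 2-sample hop, so it yields 5 windows; with padToWindow = true the last one is zero-padded to 4 samples.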

def chunkText(text: String, size: Int, overlap: Int): Seq[String]

Splits a long text into chunks with specified size and overlap.

Value parameters

overlap

Number of overlapping characters between chunks (0 <= overlap < size).

size

Maximum characters per chunk (> 0).

text

Input string.

Attributes

Returns

Sequence of text chunks.
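
A minimal sketch of character-based chunking with overlap, assuming each chunk starts size - overlap characters after the previous one. The names TextChunkSketch and chunkTextSketch are hypothetical, and the actual ChunkingUtils.chunkText may handle the tail of the input differently.

```scala
object TextChunkSketch {
  // Hypothetical sketch, not the library's source.
  def chunkTextSketch(text: String, size: Int, overlap: Int): Seq[String] = {
    require(size > 0, "size must be > 0")
    require(overlap >= 0 && overlap < size, "overlap must satisfy 0 <= overlap < size")

    // Distance between the start offsets of consecutive chunks (assumed).
    val step = size - overlap

    Iterator
      .iterate(0)(_ + step)          // candidate start offsets: 0, step, 2 * step, ...
      .takeWhile(_ < text.length)    // stop once the offset runs past the input
      .map(start => text.substring(start, math.min(start + size, text.length)))
      .toVector
  }
}
```

For example, chunkTextSketch("abcdefghij", 4, 1) produces "abcd", "defg", "ghij", "j" under these assumptions. Note that this sketch keeps a short trailing chunk even when its characters already appear in the previous chunk; an implementation could reasonably drop it.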

def chunkVideo[T](frames: Seq[T], fps: Int, clipSeconds: Int, overlapRatio: Double): Seq[Seq[T]]

Chunk a sequence of frames into clips of fixed duration with overlap. Generic over frame type T (e.g., BufferedImage).

Value parameters

clipSeconds

Clip duration in seconds (> 0).

fps

Frames per second (> 0).

frames

Sequence of frames.

overlapRatio

Overlap ratio in [0, 1).

Attributes

Returns

Sequence of frame clips (each is a Seq[T]).
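
Frame clipping with overlap can be sketched with the standard library's sliding. The names VideoChunkSketch and chunkVideoSketch are hypothetical, and the hop size (clip length times 1 - overlapRatio, rounded, with a minimum of 1) is an assumption rather than the library's documented behavior.

```scala
object VideoChunkSketch {
  // Hypothetical sketch, not the library's source.
  def chunkVideoSketch[T](
      frames: Seq[T],
      fps: Int,
      clipSeconds: Int,
      overlapRatio: Double
  ): Seq[Seq[T]] = {
    require(fps > 0, "fps must be > 0")
    require(clipSeconds > 0, "clipSeconds must be > 0")
    require(overlapRatio >= 0.0 && overlapRatio < 1.0, "overlapRatio must be in [0, 1)")

    // Clip length in frames, and the hop between consecutive clip starts (assumed formula).
    val clipFrames = fps * clipSeconds
    val step = math.max(1, math.round(clipFrames * (1.0 - overlapRatio)).toInt)

    // sliding yields full clips and, if frames remain, one shorter final clip.
    frames.sliding(clipFrames, step).toVector
  }
}
```

With 10 frames, fps = 2, clipSeconds = 2 and overlapRatio = 0.5, the sketch produces 4 clips of 4 frames each, with consecutive clips sharing 2 frames.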