SentenceChunker

org.llm4s.chunking.SentenceChunker
See the SentenceChunker companion object

Sentence-aware document chunker.

Splits text at sentence boundaries to preserve semantic coherence. Uses pattern matching for sentence detection (periods, question marks, etc.) while handling edge cases like abbreviations and decimal numbers.

This chunker produces higher-quality chunks than simple character-based splitting because it never breaks in the middle of a sentence.
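To make the boundary detection described above concrete, here is a minimal, self-contained sketch. It is not the library's implementation: the object name `SentenceSplitSketch`, the abbreviation list, and the regular expression are all assumptions chosen for illustration. It treats `.`, `!` or `?` followed by whitespace as a candidate boundary, skips periods that end a known abbreviation, and leaves decimals such as `3.50` alone because their period is not followed by whitespace.

```scala
object SentenceSplitSketch {
  // Hypothetical abbreviation list; the real chunker's list is not documented on this page.
  private val abbreviations = Set("mr", "mrs", "dr", "prof", "e.g", "i.e", "etc", "vs")

  // Candidate boundary: sentence-ending punctuation followed by whitespace.
  // A decimal such as "3.50" never matches, since a digit follows the period.
  private val boundary = """[.!?]\s+""".r

  def split(text: String): Seq[String] = {
    val sentences = Seq.newBuilder[String]
    var start = 0
    for (m <- boundary.findAllMatchIn(text)) {
      // Last whitespace-separated word before the punctuation mark, lowercased.
      val lastWord =
        text.substring(start, m.start).split("\\s+").lastOption.getOrElse("").toLowerCase
      val isAbbreviation = text.charAt(m.start) == '.' && abbreviations.contains(lastWord)
      if (!isAbbreviation) {
        // Keep the punctuation mark with the sentence it ends.
        sentences += text.substring(start, m.start + 1).trim
        start = m.end
      }
    }
    // Whatever follows the last boundary is the final sentence.
    val tail = text.substring(start).trim
    if (tail.nonEmpty) sentences += tail
    sentences.result()
  }
}
```

A call such as `SentenceSplitSketch.split("Dr. Smith paid 3.50 dollars. Was it fair? Yes!")` splits after "dollars." and "fair?" but not after "Dr." or inside "3.50".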

Usage:

val chunker = SentenceChunker()
val chunks = chunker.chunk(text, ChunkingConfig(targetSize = 800))

// Sentences are kept intact
chunks.foreach { c =>
  println(s"[${c.index}] ${c.content}")
}
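The `targetSize` setting in the usage above suggests that whole sentences are grouped until a size budget is reached. The following is a rough sketch of one way to do that, a greedy packer over already-split sentences. The name `SentencePackingSketch`, the character-based size measure, and the single-space join are assumptions; the actual chunker may measure size differently or handle oversized sentences another way.

```scala
object SentencePackingSketch {
  // Greedily packs whole sentences into chunks of at most targetSize characters.
  // Assumption: a single sentence longer than targetSize becomes its own oversized
  // chunk, so that no sentence is ever split.
  def pack(sentences: Seq[String], targetSize: Int): Seq[String] = {
    val chunks = Seq.newBuilder[String]
    val current = new StringBuilder
    for (s <- sentences) {
      // Flush the current chunk if adding this sentence (plus a space) would exceed the target.
      if (current.nonEmpty && current.length + 1 + s.length > targetSize) {
        chunks += current.toString
        current.clear()
      }
      if (current.nonEmpty) current.append(' ')
      current.append(s)
    }
    // Emit the final partially filled chunk, if any.
    if (current.nonEmpty) chunks += current.toString
    chunks.result()
  }
}
```

With a target of 20 characters, `pack(Seq("One two.", "Three four.", "Five."), 20)` keeps the first two sentences together and starts a new chunk for the third.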

Attributes

Companion
object
Supertypes
class Object
trait Matchable
class Any

Members list

Value members

Concrete methods

override def chunk(text: String, config: ChunkingConfig): Seq[DocumentChunk]

Split text into chunks.

Value parameters

config

Chunking configuration

text

Input text to chunk

Attributes

Returns

Sequence of document chunks

Definition Classes

Inherited methods

def chunkWithSource(text: String, sourceFile: String, config: ChunkingConfig): Seq[DocumentChunk]

Split text into chunks with source file metadata.

Value parameters

config

Chunking configuration

sourceFile

Source file name for metadata

text

Input text to chunk

Attributes

Returns

Sequence of document chunks with source metadata

Inherited from:
DocumentChunker