org.llm4s.chunking.SentenceChunker
See the SentenceChunker companion object
class SentenceChunker extends DocumentChunker
Sentence-aware document chunker.
Splits text at sentence boundaries to preserve semantic coherence. Uses pattern matching for sentence detection (periods, question marks, etc.) while handling edge cases like abbreviations and decimal numbers.
This chunker produces higher-quality chunks than simple character-based splitting because it never breaks in the middle of a sentence.
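The idea described above can be illustrated with a minimal, standalone sketch. It is not the llm4s implementation: the object name `SentenceSplitSketch`, the abbreviation list, and both helper methods are hypothetical and exist only for illustration. A terminator (`.`, `?`, `!`) counts as a sentence boundary only when it is followed by whitespace or the end of the text, which leaves decimals such as `3.14` intact, and only when the preceding token is not a known abbreviation. Whole sentences are then packed greedily into chunks of roughly `targetSize` characters.

```scala
object SentenceSplitSketch {
  // Hypothetical abbreviation list for illustration; not taken from llm4s.
  private val abbreviations = Set("mr.", "mrs.", "dr.", "e.g.", "i.e.", "etc.", "vs.")

  /** Splits text at '.', '?' or '!' when followed by whitespace or end of text. */
  def splitSentences(text: String): Vector[String] = {
    val sentences = Vector.newBuilder[String]
    val current   = new StringBuilder
    var i = 0
    while (i < text.length) {
      val ch = text.charAt(i)
      current.append(ch)
      val isTerminator    = ch == '.' || ch == '?' || ch == '!'
      val atEnd           = i == text.length - 1
      val followedBySpace = !atEnd && text.charAt(i + 1).isWhitespace
      // "3.14" is skipped here because its '.' is followed by a digit.
      if (isTerminator && (atEnd || followedBySpace)) {
        val lastWord = current.toString.split("\\s+").lastOption.getOrElse("").toLowerCase
        // "Dr." and similar tokens do not end a sentence.
        if (!abbreviations.contains(lastWord)) {
          sentences += current.toString.trim
          current.clear()
        }
      }
      i += 1
    }
    // Keep any trailing text that lacks a closing terminator.
    val rest = current.toString.trim
    if (rest.nonEmpty) sentences += rest
    sentences.result()
  }

  /** Greedily packs whole sentences into chunks of at most targetSize characters.
    * A single sentence longer than targetSize is kept whole in its own chunk.
    */
  def pack(sentences: Seq[String], targetSize: Int): Vector[String] = {
    val chunks  = Vector.newBuilder[String]
    val current = new StringBuilder
    sentences.foreach { s =>
      // Start a new chunk if adding this sentence would exceed the target.
      if (current.nonEmpty && current.length + 1 + s.length > targetSize) {
        chunks += current.toString
        current.clear()
      }
      if (current.nonEmpty) current.append(' ')
      current.append(s)
    }
    if (current.nonEmpty) chunks += current.toString
    chunks.result()
  }
}
```

In this sketch, `splitSentences("Dr. Smith measured 3.14 metres. Was it right? Yes!")` yields three sentences, with neither `Dr.` nor `3.14` treated as a boundary. The real SentenceChunker may detect boundaries and handle oversized sentences differently.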
Usage:
val chunker = SentenceChunker()
val chunks = chunker.chunk(text, ChunkingConfig(targetSize = 800))
// Sentences are kept intact
chunks.foreach { c =>
  println(s"[${c.index}] ${c.content}")
}
Attributes
- Companion: object