DocumentChunker
org.llm4s.chunking.DocumentChunker
trait DocumentChunker
Document chunking strategy.
Implementations split text into manageable chunks for embedding and retrieval. Different strategies optimize for different content types:
- SimpleChunker: Basic character-based splitting
- SentenceChunker: Respects sentence boundaries
- MarkdownChunker: Preserves markdown structure
- SemanticChunker: Splits at topic boundaries using embeddings
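To make the difference between character-based and sentence-aware splitting concrete, here is a minimal sketch of greedy sentence packing. It is not the llm4s SentenceChunker implementation: the object name SentencePackingSketch, its chunk signature, and the punctuation-based sentence regex are all assumptions made for illustration.

```scala
// Illustrative sketch only; SentencePackingSketch is a hypothetical name,
// not part of org.llm4s.chunking.
object SentencePackingSketch {

  /** Packs whole sentences into chunks of at most `targetSize` characters.
    * A single sentence longer than `targetSize` is kept whole rather than cut.
    */
  def chunk(text: String, targetSize: Int): Vector[String] = {
    require(targetSize > 0, "targetSize must be positive")

    // Naive sentence detection: split on whitespace that follows '.', '!' or '?'.
    val sentences = text.trim.split("(?<=[.!?])\\s+").toVector.filter(_.nonEmpty)

    sentences.foldLeft(Vector.empty[String]) { (acc, sentence) =>
      acc.lastOption match {
        // The sentence still fits in the current chunk (plus one joining space).
        case Some(last) if last.length + 1 + sentence.length <= targetSize =>
          acc.init :+ (last + " " + sentence)
        // Otherwise start a new chunk at the sentence boundary.
        case _ =>
          acc :+ sentence
      }
    }
  }
}
```

Unlike a fixed-width character split, this never ends a chunk mid-sentence, at the cost of uneven chunk lengths.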
Usage:
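// Pick a strategy; here, one that respects sentence boundaries.
val chunker = ChunkerFactory.sentence()
// Target chunk size and the overlap shared by neighbouring chunks.
val config = ChunkingConfig(targetSize = 800, overlap = 150)
// documentText is the String to split, defined elsewhere.
val chunks = chunker.chunk(documentText, config)
chunks.foreach { chunk =>
  println(s"Chunk ${chunk.index}: ${chunk.content.take(50)}...")
}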
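The interaction between targetSize and overlap in the usage above can be pictured as a sliding window. The sketch below is a simplified illustration, not the library's implementation: SketchChunk and OverlapSketch are hypothetical names, and it assumes both values are measured in characters.

```scala
// Hypothetical stand-in for the library's chunk type; not an llm4s class.
final case class SketchChunk(index: Int, content: String)

// Illustrative sketch only; OverlapSketch is not part of org.llm4s.chunking.
object OverlapSketch {

  /** Cuts `text` into windows of at most `targetSize` characters. Each window
    * starts `targetSize - overlap` characters after the previous one, so
    * neighbouring chunks share `overlap` characters.
    */
  def chunk(text: String, targetSize: Int, overlap: Int): Vector[SketchChunk] = {
    require(targetSize > 0, "targetSize must be positive")
    require(overlap >= 0 && overlap < targetSize, "overlap must be in [0, targetSize)")

    val step = targetSize - overlap
    if (text.isEmpty) Vector.empty
    else
      Iterator
        .iterate(0)(_ + step)
        // Stop once the previous window has already reached the end of the text.
        .takeWhile(start => start == 0 || start + overlap < text.length)
        .zipWithIndex
        .map { case (start, i) => SketchChunk(i, text.slice(start, start + targetSize)) }
        .toVector
  }
}
```

For example, OverlapSketch.chunk("abcdefghij", targetSize = 4, overlap = 1) produces the contents "abcd", "defg", "ghij": each chunk repeats the last character of the one before it, which helps preserve context across chunk boundaries during retrieval.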
Attributes
- Supertypes: class Object, trait Matchable, class Any