
SentenceChunker

org.llm4s.chunking.SentenceChunker

See the SentenceChunker companion object.

class SentenceChunker extends DocumentChunker

Sentence-aware document chunker.

Splits text at sentence boundaries to preserve semantic coherence. Uses pattern matching for sentence detection (periods, question marks, etc.) while handling edge cases like abbreviations and decimal numbers.
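The boundary rules described above can be sketched in a few lines. This is a minimal, self-contained illustration and not SentenceChunker's actual implementation. The object name `SentenceSplitSketch`, the abbreviation list, and the rule that `.`, `?` or `!` ends a sentence only when followed by whitespace or the end of the text are all assumptions made for the example.

```scala
import scala.collection.mutable.ListBuffer

// Illustrative sentence splitter (hypothetical; not the llm4s implementation).
object SentenceSplitSketch {
  // Assumed abbreviation list; a real splitter would need a much larger one.
  private val abbreviations = Set("dr", "mr", "mrs", "ms", "prof", "e.g", "i.e", "etc", "vs")

  def split(text: String): List[String] = {
    val sentences = ListBuffer.empty[String]
    val current = new StringBuilder
    var i = 0
    while (i < text.length) {
      val ch = text.charAt(i)
      current.append(ch)
      if (ch == '.' || ch == '?' || ch == '!') {
        val next = if (i + 1 < text.length) Some(text.charAt(i + 1)) else None
        // A period between two digits (e.g. "3.50") is a decimal point, not a boundary.
        val isDecimal = ch == '.' && i > 0 && text.charAt(i - 1).isDigit && next.exists(_.isDigit)
        // The word just before the period, lower-cased (e.g. "dr" for "Dr.").
        val lastWord = current.toString.dropRight(1).split("\\s+").lastOption.getOrElse("").toLowerCase
        val isAbbreviation = ch == '.' && abbreviations.contains(lastWord)
        // Punctuation only ends a sentence when followed by whitespace or end of text.
        val atBoundary = next.forall(_.isWhitespace)
        if (!isDecimal && !isAbbreviation && atBoundary) {
          val sentence = current.toString.trim
          if (sentence.nonEmpty) sentences += sentence
          current.clear()
        }
      }
      i += 1
    }
    // Whatever is left after the last terminator is the final sentence.
    val rest = current.toString.trim
    if (rest.nonEmpty) sentences += rest
    sentences.toList
  }

  def main(args: Array[String]): Unit =
    // prints three sentences: "Dr. Smith paid 3.50 dollars.", "Was it enough?", "Yes!"
    split("Dr. Smith paid 3.50 dollars. Was it enough? Yes!").foreach(println)
}
```

A fixed word list has a known blind spot. In this sketch, a sentence that really does end with a listed abbreviation such as "etc." is not split. Resolving that case requires more context than the abbreviation alone.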

This chunker produces higher-quality chunks than simple character-based splitting because it never breaks a chunk in the middle of a sentence.
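One plausible way to guarantee that no chunk ends mid-sentence is greedy packing: append whole sentences to the current chunk until the next one would exceed the size budget, then start a new chunk. The sketch below assumes sizes are counted in characters and that a single sentence longer than `targetSize` becomes its own oversized chunk. The page does not say how SentenceChunker handles over-long sentences, overlap, or size units. `SentencePackingSketch` and `pack` are hypothetical names, and only `targetSize` mirrors the `ChunkingConfig` field in the usage example.

```scala
// Hypothetical greedy packer: whole sentences only, character-based budget (assumption).
object SentencePackingSketch {
  def pack(sentences: Seq[String], targetSize: Int): Vector[String] = {
    require(targetSize > 0, "targetSize must be positive")
    val (done, last) = sentences.foldLeft((Vector.empty[String], "")) {
      case ((chunks, current), sentence) =>
        if (current.isEmpty) (chunks, sentence) // first sentence of a chunk, even if oversized
        else if (current.length + 1 + sentence.length <= targetSize)
          (chunks, current + " " + sentence) // still fits: join with a space
        else (chunks :+ current, sentence) // close the chunk; the sentence is never cut
    }
    if (last.isEmpty) done else done :+ last
  }

  def main(args: Array[String]): Unit =
    pack(Seq("Chunking matters.", "Keep sentences whole.", "Short one."), targetSize = 35)
      .foreach(println)
}
```

Greedy packing fills each chunk as close to the budget as possible in a single pass. The trade-off is that the final chunk can end up much shorter than the others.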

Usage:

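import org.llm4s.chunking._ // assumes ChunkingConfig is also in this package

val text = "First sentence. Second sentence? Third sentence!"
val chunker = SentenceChunker()
val chunks = chunker.chunk(text, ChunkingConfig(targetSize = 800))

// Sentences are kept intact
chunks.foreach { c =>
  println(s"[${c.index}] ${c.content}")
}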

Attributes

Companion: object
Supertypes: trait DocumentChunker, class Object, trait Matchable, class Any

Members list

Value members

Concrete methods

Split text into chunks.

Value parameters

config: Chunking configuration
text: Input text to chunk

Attributes

Returns: Sequence of document chunks
Definition Classes: DocumentChunker

Inherited methods

Split text into chunks with source file metadata.

Value parameters

config: Chunking configuration
sourceFile: Source file name for metadata
text: Input text to chunk

Attributes

Returns: Sequence of document chunks with source metadata
Inherited from: DocumentChunker
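The concrete `chunk` method (defined in `DocumentChunker`) and the inherited source-file variant follow a common trait pattern. The trait declares the splitting method, and the metadata variant is implemented once on top of it, so every chunker inherits it. The sketch below illustrates that pattern using hypothetical stand-in types: `ChunkSketch`, `ConfigSketch`, `ChunkerSketch`, `chunkWithSource`, `ParagraphChunkerSketch`, and the `"sourceFile"` metadata key. The real llm4s names, fields, and parameter order may differ.

```scala
// Hypothetical stand-ins; not the llm4s types.
final case class ChunkSketch(index: Int, content: String, metadata: Map[String, String] = Map.empty)
final case class ConfigSketch(targetSize: Int)

trait ChunkerSketch {
  // Each concrete chunker supplies its own splitting strategy.
  def chunk(text: String, config: ConfigSketch): Seq[ChunkSketch]

  // Implemented once in the trait, so every chunker inherits it:
  // chunk as usual, then tag each chunk with the source file name.
  def chunkWithSource(text: String, sourceFile: String, config: ConfigSketch): Seq[ChunkSketch] =
    chunk(text, config).map(c => c.copy(metadata = c.metadata + ("sourceFile" -> sourceFile)))
}

// Toy implementation that splits on blank lines and ignores `config`,
// included only to exercise the inherited method.
object ParagraphChunkerSketch extends ChunkerSketch {
  def chunk(text: String, config: ConfigSketch): Seq[ChunkSketch] =
    text.split("\n\\s*\n").toSeq.map(_.trim).filter(_.nonEmpty).zipWithIndex.map {
      case (content, i) => ChunkSketch(i, content)
    }

  def main(args: Array[String]): Unit =
    chunkWithSource("First para.\n\nSecond para.", "notes.md", ConfigSketch(targetSize = 800))
      .foreach(println)
}
```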
