ChunkerFactory
Factory for creating document chunkers.
Provides convenient factory methods for creating different chunking strategies. Each strategy has different trade-offs between quality and performance.
Usage:
// Simple character-based chunking (fastest)
val simple = ChunkerFactory.simple()
// Sentence-aware chunking (recommended for most use cases)
val sentence = ChunkerFactory.sentence()
// Markdown-aware chunking (preserves structure)
val markdown = ChunkerFactory.markdown()
// Semantic chunking (highest quality, requires embeddings)
val modelConfig = EmbeddingModelConfig("text-embedding-3-small", 1536)
val semantic = ChunkerFactory.semantic(embeddingClient, modelConfig)
// Auto-detect based on content
val auto = ChunkerFactory.auto(text)
Attributes
- Supertypes
-
class Object, trait Matchable, class Any
- Self type
-
ChunkerFactory.type
Members list
Type members
Classlikes
Value members
Concrete methods
Auto-detect the best chunker based on content.
Analyzes the text to determine if it's markdown or plain text, then returns an appropriate chunker.
Value parameters
- text
-
Content to analyze
Attributes
- Returns
-
Appropriate DocumentChunker
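The page does not document how the markdown check works. As a rough illustration only, one plausible heuristic counts lines that look like markdown syntax. The object name `MarkdownDetection`, the method `looksLikeMarkdown`, and the `minHits` parameter are hypothetical and are not part of ChunkerFactory.

```scala
// Hypothetical heuristic; not the library's actual detection logic.
object MarkdownDetection {
  // Line-level patterns that commonly indicate markdown.
  private val heading  = "^#{1,6}\\s+\\S.*".r
  private val fence    = "^(`{3}|~{3}).*".r
  private val listItem = "^\\s*([-*+]|\\d+\\.)\\s+\\S.*".r

  /** Returns true when at least `minHits` lines look like markdown syntax. */
  def looksLikeMarkdown(text: String, minHits: Int = 2): Boolean = {
    val hits = text.linesIterator.count { line =>
      heading.matches(line) || fence.matches(line) || listItem.matches(line)
    }
    hits >= minHits
  }
}
```

A caller could use such a check to choose between a markdown-aware chunker and a sentence-aware one. Requiring more than one hit reduces false positives on plain text that happens to contain a single dash or hash line.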
Create a chunker by strategy name.
Value parameters
- strategy
-
Strategy name: "simple", "sentence", "markdown", "semantic"
Attributes
- Returns
-
DocumentChunker, or None if the strategy is unknown.
Note: the "semantic" strategy requires an EmbeddingProvider, which this method does not receive, so it returns a SentenceChunker as a fallback. Use the semantic() method for proper semantic chunking.
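The lookup behaviour described above, including the "semantic" fallback, can be sketched with stand-in types. `StrategyLookup`, `Chunker`, and the case objects below are hypothetical placeholders rather than the library's classes, and exact-match name comparison is an assumption.

```scala
// Hypothetical stand-ins for the library's chunker types.
object StrategyLookup {
  sealed trait Chunker
  case object Simple   extends Chunker
  case object Sentence extends Chunker
  case object Markdown extends Chunker

  /** Resolves a strategy name; unknown names yield None. */
  def byName(strategy: String): Option[Chunker] =
    strategy match {
      case "simple"   => Some(Simple)
      case "sentence" => Some(Sentence)
      case "markdown" => Some(Markdown)
      // No embedding client is available here, so fall back to sentence chunking.
      case "semantic" => Some(Sentence)
      case _          => None
    }
}
```

Returning an Option makes the caller handle unknown names explicitly, for example with `getOrElse` to supply a default chunker.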
Create a chunker based on strategy enum.
Value parameters
- strategy
-
Chunking strategy
Attributes
- Returns
-
DocumentChunker
Create a markdown-aware chunker.
Preserves markdown structure including:
- Heading boundaries and hierarchy
- Code blocks (keeps them intact)
- List structure
Best for markdown documentation and README files.
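To illustrate what preserving heading boundaries and keeping code blocks intact might involve, here is a minimal sketch that splits at heading lines while skipping `#` lines inside fenced code. `HeadingSplit` and `sections` are hypothetical names. The real chunker presumably also handles heading hierarchy, lists, and size limits, which this sketch omits.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical sketch; not the library's markdown chunker.
object HeadingSplit {
  /** Splits markdown into sections at heading lines, ignoring '#' lines inside fenced code. */
  def sections(markdown: String): List[String] = {
    val fence   = "`" * 3 // three backticks, the code fence marker
    val done    = ListBuffer.empty[String]
    val current = ListBuffer.empty[String]
    var inFence = false
    for (line <- markdown.linesIterator) {
      val isFence   = line.trim.startsWith(fence)
      val isHeading = !inFence && !isFence && line.matches("#{1,6}\\s+.*")
      // A heading outside a code block closes the section collected so far.
      if (isHeading && current.nonEmpty) {
        done += current.mkString("\n")
        current.clear()
      }
      current += line
      if (isFence) inFence = !inFence
    }
    if (current.nonEmpty) done += current.mkString("\n")
    done.toList
  }
}
```

Tracking the fence state is what keeps a shell comment or a Python comment inside a code block from being mistaken for a heading.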
Create a semantic chunker using embeddings.
Splits text at topic boundaries by analyzing semantic similarity between consecutive sentences. Produces the highest quality chunks but requires an embedding client.
Value parameters
- batchSize
-
Number of sentences to embed at once (default: 50)
- embeddingClient
-
Client for generating embeddings
- modelConfig
-
Model configuration for embeddings
- similarityThreshold
-
Minimum similarity for consecutive sentences to stay in the same chunk (0.0-1.0, default: 0.5)
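The role of similarityThreshold can be sketched as follows, assuming cosine similarity between the embeddings of consecutive sentences (the page does not name the similarity measure). `SimilaritySplit`, `cosine`, and `split` are hypothetical names, and the embeddings are passed in precomputed rather than fetched from an embedding client.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical sketch; cosine similarity is an assumed measure.
object SimilaritySplit {
  def cosine(a: Array[Double], b: Array[Double]): Double = {
    val dot  = a.zip(b).map { case (x, y) => x * y }.sum
    val norm = math.sqrt(a.map(x => x * x).sum) * math.sqrt(b.map(x => x * x).sum)
    if (norm == 0.0) 0.0 else dot / norm
  }

  /** Starts a new chunk whenever similarity to the previous sentence drops below the threshold. */
  def split(
      sentences: Seq[String],
      embeddings: Seq[Array[Double]],
      threshold: Double = 0.5
  ): List[List[String]] = {
    require(sentences.length == embeddings.length, "one embedding per sentence")
    if (sentences.isEmpty) Nil
    else {
      val chunks = ListBuffer(ListBuffer(sentences.head))
      for (i <- 1 until sentences.length) {
        if (cosine(embeddings(i - 1), embeddings(i)) < threshold)
          chunks += ListBuffer(sentences(i)) // topic boundary: open a new chunk
        else
          chunks.last += sentences(i) // same topic: extend the current chunk
      }
      chunks.map(_.toList).toList
    }
  }
}
```

Under this reading, a higher threshold produces more, smaller chunks, while a lower one merges loosely related sentences. A batchSize setting would only affect how many sentences are sent to the embedding client per request, not where the boundaries fall.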
Create a sentence-aware chunker.
Respects sentence boundaries for better quality chunks. Recommended for most text content.
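As an illustration of respecting sentence boundaries, the sketch below splits on sentence-ending punctuation and packs whole sentences into chunks up to a size limit. `SentencePacking`, `chunk`, and the `maxChars` parameter are hypothetical. The library's actual sentence detection and size configuration are not described on this page.

```scala
// Hypothetical sketch; not the library's SentenceChunker.
object SentencePacking {
  /** Packs whole sentences into chunks of at most `maxChars` where possible. */
  def chunk(text: String, maxChars: Int): List[String] = {
    // Split after '.', '!' or '?' followed by whitespace, keeping the punctuation.
    val sentences = text.split("(?<=[.!?])\\s+").toList.map(_.trim).filter(_.nonEmpty)
    sentences
      .foldLeft(List.empty[String]) {
        // The newest chunk is at the head; extend it while the sentence still fits.
        case (last :: rest, s) if last.length + 1 + s.length <= maxChars =>
          (last + " " + s) :: rest
        case (acc, s) => s :: acc
      }
      .reverse
  }
}
```

In this sketch a sentence longer than `maxChars` is kept whole rather than cut, which trades strict size limits for intact sentences.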
Create a simple character-based chunker.
Fast but doesn't respect semantic boundaries. Use for content without clear sentence structure.
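A minimal sketch of character-based chunking, assuming fixed-size windows with no overlap, shows why it is fast and why it can cut through words. `FixedSizeChunks` and its `size` parameter are hypothetical and do not reflect the library's configuration.

```scala
// Hypothetical sketch; not the library's simple chunker.
object FixedSizeChunks {
  /** Cuts text into consecutive pieces of `size` characters; the last piece may be shorter. */
  def chunk(text: String, size: Int): List[String] = {
    require(size > 0, "size must be positive")
    text.grouped(size).toList
  }
}
```

Because the cut points depend only on character counts, a call such as `FixedSizeChunks.chunk("Hello world. Bye.", 8)` splits "world" across two chunks.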
Concrete fields
Get the default chunker (sentence-aware).