ChunkerFactory
Factory for creating document chunkers.
Provides factory methods for the available chunking strategies. Each strategy trades chunk quality against speed: character-based chunking is the fastest, while semantic chunking gives the highest quality but requires embeddings.
Usage:
// Simple character-based chunking (fastest)
val simple = ChunkerFactory.simple()
// Sentence-aware chunking (recommended for most use cases)
val sentence = ChunkerFactory.sentence()
// Markdown-aware chunking (preserves structure)
val markdown = ChunkerFactory.markdown()
// Semantic chunking (highest quality, requires embeddings)
val modelConfig = EmbeddingModelConfig("text-embedding-3-small", 1536)
val semantic = ChunkerFactory.semantic(embeddingClient, modelConfig)
// Auto-detect based on content
val auto = ChunkerFactory.auto(text)
Attributes
- Supertypes
-
class Object, trait Matchable, class Any
- Self type
-
ChunkerFactory.type
Members list
Type members
Classlikes
Value members
Concrete methods
Auto-detect the best chunker based on content.
Analyzes the text to determine if it's markdown or plain text, then returns an appropriate chunker.
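The detection rules are not documented here. As a rough sketch of what such an analysis could look like, the hypothetical helper below (`looksLikeMarkdown` is an assumed name, not part of ChunkerFactory) counts lines that carry common markdown signals and treats the text as markdown when at least two are found.

```scala
// Hypothetical heuristic for illustration only; the signals and the
// threshold of two are assumptions, not ChunkerFactory's actual logic.
def looksLikeMarkdown(text: String): Boolean =
  val fence = "`" * 3 // code-fence delimiter, built here to keep the sketch readable
  val signals = text.linesIterator.map(_.trim).count { line =>
    line.matches("#{1,6} .+") || // ATX heading such as "## Title"
    line.startsWith(fence) ||    // start or end of a fenced code block
    line.matches("[-*+] .+")     // bullet list item
  }
  signals >= 2
```

A caller could use a check like this to choose between `ChunkerFactory.markdown()` and `ChunkerFactory.sentence()` by hand when it needs the decision to be explicit.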
Value parameters
- text
-
Content to analyze
Attributes
- Returns
-
Appropriate DocumentChunker
Create a chunker by strategy name.
Value parameters
- strategy
-
Strategy name: "simple", "sentence", "markdown", "semantic"
Attributes
- Returns
-
DocumentChunker, or None if the strategy is unknown
Note: The "semantic" strategy has special fallback behavior:
- When requested via create("semantic"), a SentenceChunker is returned as a fallback.
- Semantic chunking requires an embedding client, which is not available through this factory method.
- To use true semantic chunking with embeddings, use semantic(embeddingClient, modelConfig).
- The fallback ensures graceful degradation instead of failure when embeddings aren't configured.
This design allows applications to specify "semantic" as a strategy preference without requiring embedding setup at construction time, while still providing a usable result.
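The contract above (None for unknown names, a sentence chunker for "semantic") can be sketched with stand-in types. `StubChunker`, `StubFactory`, and `chunkerFor` are hypothetical names introduced so the example compiles on its own; the real DocumentChunker and ChunkerFactory come from the library.

```scala
// Stand-in types, assumed for illustration only.
trait DocumentChunker { def name: String }
case class StubChunker(name: String) extends DocumentChunker

object StubFactory:
  // Mirrors the documented contract: unknown names give None,
  // and "semantic" degrades to a sentence chunker.
  def create(strategy: String): Option[DocumentChunker] =
    strategy match
      case "simple"   => Some(StubChunker("simple"))
      case "sentence" => Some(StubChunker("sentence"))
      case "markdown" => Some(StubChunker("markdown"))
      case "semantic" => Some(StubChunker("sentence")) // documented fallback
      case _          => None

// Caller side: choose a default when the configured name is unknown.
def chunkerFor(configured: String): DocumentChunker =
  StubFactory.create(configured).getOrElse(StubChunker("sentence"))
```

Because the result is an Option, callers that read the strategy name from configuration have to decide explicitly what happens on a typo, for example by falling back to a default as `chunkerFor` does.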
Create a chunker based on strategy enum.
Value parameters
- strategy
-
Chunking strategy
Attributes
- Returns
-
DocumentChunker
Note on Semantic Strategy: When Strategy.Semantic is passed, this method returns a SentenceChunker as a fallback. This is intentional and provides several benefits:
- Graceful Degradation: Applications can safely request semantic chunking without requiring embedding configuration, falling back to sentence-based chunking.
- Type Safety: Unlike create(String), which returns None for an unknown name, this method always returns a DocumentChunker, so callers never have to handle a missing chunker.
- Ease of Testing: Tests can specify Strategy.Semantic without needing to mock embedding clients, while still verifying that a chunker is returned.
For actual semantic chunking with embeddings, call semantic(embeddingClient, modelConfig) directly.
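One way an application might combine the fallback with true semantic chunking is sketched below. Only `Strategy.Semantic` is named in this documentation; the other enum cases, `StubChunker`, `StubEmbeddingClient`, and `resolve` are assumptions made so the example is self-contained.

```scala
// Stand-ins for illustration only; names and shapes are assumptions.
enum Strategy:
  case Simple, Sentence, Markdown, Semantic

trait DocumentChunker { def name: String }
case class StubChunker(name: String) extends DocumentChunker
case class StubEmbeddingClient(id: String)

// Total function: every Strategy maps to a chunker, so no Option is needed.
def create(strategy: Strategy): DocumentChunker = strategy match
  case Strategy.Simple   => StubChunker("simple")
  case Strategy.Sentence => StubChunker("sentence")
  case Strategy.Markdown => StubChunker("markdown")
  case Strategy.Semantic => StubChunker("sentence") // documented fallback

// Upgrade to embedding-based chunking only when a client is configured.
def resolve(strategy: Strategy, client: Option[StubEmbeddingClient]): DocumentChunker =
  (strategy, client) match
    case (Strategy.Semantic, Some(_)) => StubChunker("semantic")
    case _                            => create(strategy)
```

In real code the `Some(_)` branch would call `ChunkerFactory.semantic(embeddingClient, modelConfig)` instead of building a stub.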
Create a markdown-aware chunker.
Preserves markdown structure including:
- Heading boundaries and hierarchy
- Code blocks (keeps them intact)
- List structure
Best for markdown documentation and README files.
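To illustrate what respecting heading boundaries while keeping code blocks intact can mean, here is a minimal sketch. `splitAtHeadings` is a hypothetical helper, not the library's implementation: it starts a new section at each heading line, except when the line sits inside a fenced code block.

```scala
// Illustrative only: splits at ATX headings, never inside a code fence.
def splitAtHeadings(markdown: String): Vector[String] =
  val fence = "`" * 3 // code-fence delimiter
  // State: (finished sections, lines of the current section, inside a fence?)
  val (done, current, _) =
    markdown.linesIterator.foldLeft((Vector.empty[String], Vector.empty[String], false)) {
      case ((sections, lines, inFence), line) =>
        val togglesFence = line.trim.startsWith(fence)
        val isHeading    = !inFence && !togglesFence && line.matches("#{1,6} .*")
        if isHeading && lines.nonEmpty then
          (sections :+ lines.mkString("\n"), Vector(line), inFence)
        else
          (sections, lines :+ line, if togglesFence then !inFence else inFence)
    }
  if current.nonEmpty then done :+ current.mkString("\n") else done
```

A line such as `# comment` inside a fenced block stays with its surrounding section, which is the property that keeps code samples from being cut in half.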
Create a semantic chunker using embeddings.
Splits text at topic boundaries by analyzing semantic similarity between consecutive sentences. Produces the highest quality chunks but requires an embedding client.
Value parameters
- batchSize
-
Number of sentences to embed at once (default: 50)
- embeddingClient
-
Client for generating embeddings
- modelConfig
-
Model configuration for embeddings
- similarityThreshold
-
Minimum similarity to stay in same chunk (0.0-1.0, default: 0.5)
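The role of similarityThreshold can be sketched as follows, assuming cosine similarity as the measure (this documentation does not say which measure is used) and assuming the sentence embeddings have already been computed. `cosine` and `splitByTopic` are hypothetical helpers, not library functions.

```scala
// Cosine similarity between two equal-length vectors.
def cosine(a: Vector[Double], b: Vector[Double]): Double =
  val dot  = a.zip(b).map { case (x, y) => x * y }.sum
  val norm = math.sqrt(a.map(x => x * x).sum) * math.sqrt(b.map(x => x * x).sum)
  if norm == 0.0 then 0.0 else dot / norm

// Group consecutive sentences; start a new chunk when the similarity to
// the previous sentence drops below the threshold.
def splitByTopic(
    sentences: Vector[String],
    embeddings: Vector[Vector[Double]],
    similarityThreshold: Double = 0.5
): Vector[Vector[String]] =
  require(sentences.length == embeddings.length, "one embedding per sentence")
  if sentences.isEmpty then Vector.empty
  else
    val start = Vector(Vector(sentences.head))
    (1 until sentences.length).foldLeft(start) { (chunks, i) =>
      if cosine(embeddings(i - 1), embeddings(i)) >= similarityThreshold then
        chunks.init :+ (chunks.last :+ sentences(i)) // same topic: extend chunk
      else
        chunks :+ Vector(sentences(i))               // topic shift: new chunk
    }
```

Under this reading, raising the threshold produces more, smaller chunks, and lowering it merges more sentences together.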
Create a sentence-aware chunker.
Respects sentence boundaries for better quality chunks. Recommended for most text content.
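As a rough illustration of respecting sentence boundaries, the sketch below packs whole sentences into chunks up to a size limit. The sentence splitter is deliberately naive, and `splitSentences`, `chunkBySentence`, and `maxChars` are hypothetical names rather than the library's API.

```scala
// Naive split after ., !, or ? followed by whitespace (an assumption;
// real sentence detection also handles abbreviations and similar cases).
def splitSentences(text: String): Vector[String] =
  text.split("(?<=[.!?])\\s+").toVector.map(_.trim).filter(_.nonEmpty)

// Pack whole sentences into chunks of at most maxChars characters.
// A single sentence longer than maxChars becomes its own chunk.
def chunkBySentence(text: String, maxChars: Int): Vector[String] =
  splitSentences(text).foldLeft(Vector.empty[String]) { (chunks, sentence) =>
    chunks.lastOption match
      case Some(last) if last.length + 1 + sentence.length <= maxChars =>
        chunks.init :+ s"$last $sentence" // still fits: extend the last chunk
      case _ =>
        chunks :+ sentence                // would overflow: start a new chunk
  }
```

No chunk ever ends in the middle of a sentence, at the cost of chunk sizes that vary with sentence length.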
Create a simple character-based chunker.
Fast but doesn't respect semantic boundaries. Use for content without clear sentence structure.
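For comparison, fixed-size character chunking can be sketched in a couple of lines. `chunkByChars` and `chunkSize` are hypothetical names, and the sketch omits features a real chunker may have, such as overlap between chunks.

```scala
// Cut the text every chunkSize characters, regardless of word or
// sentence boundaries. The last chunk may be shorter.
def chunkByChars(text: String, chunkSize: Int): Vector[String] =
  require(chunkSize > 0, "chunkSize must be positive")
  text.grouped(chunkSize).toVector
```

The cut points ignore the content entirely, which is why this approach is fast but can split words and sentences.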
Concrete fields
Get the default chunker (sentence-aware).