ChunkerFactory

org.llm4s.chunking.ChunkerFactory

Factory for creating document chunkers.

Provides convenient factory methods for creating different chunking strategies. Each strategy has different trade-offs between quality and performance.

Usage:

// Simple character-based chunking (fastest)
val simple = ChunkerFactory.simple()

// Sentence-aware chunking (recommended for most use cases)
val sentence = ChunkerFactory.sentence()

// Markdown-aware chunking (preserves structure)
val markdown = ChunkerFactory.markdown()

// Semantic chunking (highest quality, requires embeddings)
val modelConfig = EmbeddingModelConfig("text-embedding-3-small", 1536)
val semantic = ChunkerFactory.semantic(embeddingClient, modelConfig)

// Auto-detect based on content
val auto = ChunkerFactory.auto(text)

Attributes

Supertypes
class Object
trait Matchable
class Any

Members list

Type members

Classlikes

object Strategy

Attributes

Companion
trait
Supertypes
trait Sum
trait Mirror
class Object
trait Matchable
class Any
Self type
Strategy.type
sealed trait Strategy

Chunking strategy type

Attributes

Companion
object
Supertypes
class Object
trait Matchable
class Any
Known subtypes
object Markdown
object Semantic
object Sentence
object Simple

Value members

Concrete methods

def auto(text: String): DocumentChunker

Auto-detect the best chunker based on content.

Analyzes the text to determine if it's markdown or plain text, then returns an appropriate chunker.

Value parameters

text

Content to analyze

Attributes

Returns

Appropriate DocumentChunker
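The exact detection rules are internal to the library, but a heuristic along these lines (a sketch, not the actual implementation) illustrates the idea:

```scala
// Hypothetical markdown detector -- the real auto() may use different rules.
def looksLikeMarkdown(text: String): Boolean =
  text.linesIterator.exists { line =>
    line.startsWith("#") ||       // headings
    line.startsWith("```") ||     // fenced code blocks
    line.matches("^\\s*[-*+] .*") // bullet lists
  }
```

A caller would then select a markdown-aware or sentence-aware chunker based on the result.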

def create(strategy: String): Option[DocumentChunker]

Create a chunker by strategy name.

Value parameters

strategy

Strategy name: "simple", "sentence", "markdown", "semantic"

Attributes

Returns

The chunker wrapped in Some, or None if the strategy name is unknown.

Note: the "semantic" strategy has special fallback behavior:

  • When requested via create("semantic"), a SentenceChunker is returned as a fallback
  • Semantic chunking requires an embedding client, which is not available through this factory method
  • To use true semantic chunking with embeddings, call semantic(embeddingClient, modelConfig)
  • The fallback ensures graceful degradation instead of failure when embeddings aren't configured

This design lets applications specify "semantic" as a strategy preference without requiring embedding setup at construction time, while still receiving a usable chunker.

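The name-based dispatch with its fallback can be sketched with stand-in types (the case objects below are placeholders for illustration, not the library's actual chunker classes):

```scala
// Placeholder chunker types for illustration only.
sealed trait Chunker
case object SimpleChunker   extends Chunker
case object SentenceChunker extends Chunker
case object MarkdownChunker extends Chunker

def createByName(name: String): Option[Chunker] = name.toLowerCase match {
  case "simple"   => Some(SimpleChunker)
  case "sentence" => Some(SentenceChunker)
  case "markdown" => Some(MarkdownChunker)
  case "semantic" => Some(SentenceChunker) // fallback: no embedding client here
  case _          => None
}
```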
def create(strategy: Strategy): DocumentChunker

Create a chunker based on strategy enum.

Value parameters

strategy

Chunking strategy

Attributes

Returns

A DocumentChunker.

Note on the Semantic strategy: when Strategy.Semantic is passed, this method returns a SentenceChunker as a fallback. This is intentional and provides several benefits:

  1. Graceful degradation: applications can safely request semantic chunking without embedding configuration, falling back to sentence-based chunking.
  2. Type safety: unlike create(String), this method always returns a DocumentChunker and never fails, which matters for code paths that must always produce a chunker.
  3. Ease of testing: tests can specify Strategy.Semantic without mocking embedding clients, while still verifying that a chunker is returned.

For actual semantic chunking with embeddings, use semantic(embeddingClient, modelConfig) directly, which provides full semantic chunking capabilities.
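Because the match on the sealed Strategy trait is exhaustive, this overload is total. A sketch with placeholder types (returning a strategy name rather than a real DocumentChunker) shows the shape:

```scala
sealed trait Strategy
object Strategy {
  case object Simple   extends Strategy
  case object Sentence extends Strategy
  case object Markdown extends Strategy
  case object Semantic extends Strategy
}

// Illustrative only: the real method returns a DocumentChunker, not a String.
def createByStrategy(s: Strategy): String = s match {
  case Strategy.Simple   => "simple"
  case Strategy.Sentence => "sentence"
  case Strategy.Markdown => "markdown"
  case Strategy.Semantic => "sentence" // intentional fallback, see note above
}
```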

Create a markdown-aware chunker.

Preserves markdown structure including:

  • Heading boundaries and hierarchy
  • Code blocks (keeps them intact)
  • List structure

Best for markdown documentation and README files.
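A minimal sketch of heading-boundary splitting, the first of the rules above (assumed behavior; the real chunker also protects code blocks and list structure):

```scala
// Split markdown so each chunk starts at a heading line.
// Simplified: does not skip '#' lines inside fenced code blocks.
def splitAtHeadings(md: String): Vector[String] =
  md.linesIterator.toVector
    .foldLeft(Vector(Vector.empty[String])) { (acc, line) =>
      if (line.startsWith("#") && acc.last.nonEmpty) acc :+ Vector(line)
      else acc.init :+ (acc.last :+ line)
    }
    .map(_.mkString("\n"))
    .filter(_.nonEmpty)
```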

Attributes

def semantic(embeddingClient: EmbeddingClient, modelConfig: EmbeddingModelConfig, similarityThreshold: Double, batchSize: Int): DocumentChunker

Create a semantic chunker using embeddings.

Splits text at topic boundaries by analyzing semantic similarity between consecutive sentences. Produces the highest quality chunks but requires an embedding client.

Value parameters

batchSize

Number of sentences to embed at once (default: 50)

embeddingClient

Client for generating embeddings

modelConfig

Model configuration for embeddings

similarityThreshold

Minimum similarity to stay in same chunk (0.0-1.0, default: 0.5)

Attributes
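The role of similarityThreshold can be illustrated with a small sketch: embed each sentence, then start a new chunk whenever the cosine similarity of consecutive sentence embeddings drops below the threshold. This is illustrative only; the library's actual algorithm may differ.

```scala
def cosine(a: Array[Double], b: Array[Double]): Double = {
  val dot  = a.zip(b).map { case (x, y) => x * y }.sum
  val norm = (v: Array[Double]) => math.sqrt(v.map(x => x * x).sum)
  dot / (norm(a) * norm(b))
}

// Group sentence indices: a new group starts when similarity drops below threshold.
def groupByThreshold(embs: Vector[Array[Double]], threshold: Double): Vector[Vector[Int]] =
  embs.indices.foldLeft(Vector(Vector.empty[Int])) { (acc, i) =>
    if (i == 0 || cosine(embs(i - 1), embs(i)) >= threshold)
      acc.init :+ (acc.last :+ i)
    else
      acc :+ Vector(i)
  }
```

With a higher threshold, more boundaries are introduced and chunks become smaller and more topically uniform.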

Create a sentence-aware chunker.

Respects sentence boundaries for better quality chunks. Recommended for most text content.
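Sentence-boundary detection can be approximated with a lookbehind split (a simplification; real sentence segmentation also handles abbreviations and other edge cases):

```scala
// Split after '.', '!' or '?' followed by whitespace.
def splitSentences(text: String): Vector[String] =
  text.split("(?<=[.!?])\\s+").toVector
```

Sentences are then packed into chunks up to a size limit without breaking a sentence in the middle.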

Attributes

Create a simple character-based chunker.

Fast but doesn't respect semantic boundaries. Use for content without clear sentence structure.
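Character-based chunking is essentially a sliding window over the text. A sketch with hypothetical size and overlap parameters (the real chunker's parameters may differ):

```scala
// Fixed-size windows with overlap; parameter names are illustrative.
def chunkChars(text: String, size: Int, overlap: Int): Vector[String] = {
  require(size > overlap && overlap >= 0)
  val step = size - overlap
  (0 until text.length by step).toVector
    .map(i => text.substring(i, math.min(i + size, text.length)))
}
```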

Attributes

Concrete fields

Get the default chunker (sentence-aware).

Attributes