SimpleChunker

org.llm4s.chunking.SimpleChunker
See the SimpleChunker companion object

Simple character-based chunker wrapping legacy ChunkingUtils.

Provides compatibility with existing code while conforming to the new DocumentChunker interface. Splits text into fixed-size chunks without semantic awareness.
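As a rough illustration of what fixed-size splitting without semantic awareness means, here is a minimal standalone sketch in plain Scala. It is not the library's implementation: `FixedSizeChunkingSketch` and `fixedSizeChunks` are hypothetical names, and the way overlap is applied (each window starts `targetSize - overlap` characters after the previous one) is an assumption, since this page does not say how ChunkingUtils handles it.

```scala
// Hypothetical sketch; not part of org.llm4s and not the ChunkingUtils algorithm.
object FixedSizeChunkingSketch {

  /** Cut `text` into windows of at most `targetSize` characters.
    * Consecutive windows share `overlap` characters (assumed behavior).
    * Boundaries fall wherever the character count says, even mid-word.
    */
  def fixedSizeChunks(text: String, targetSize: Int, overlap: Int): Seq[String] = {
    require(targetSize > 0, "targetSize must be positive")
    require(overlap >= 0 && overlap < targetSize, "overlap must be in [0, targetSize)")

    val step    = targetSize - overlap
    val builder = Vector.newBuilder[String]
    var start   = 0
    while (start < text.length) {
      val end = math.min(start + targetSize, text.length)
      builder += text.substring(start, end)
      // Stop once a window reaches the end of the text, so that no
      // trailing chunk consists only of already-covered overlap.
      start = if (end == text.length) text.length else start + step
    }
    builder.result()
  }
}
```

With a target size of 4 and an overlap of 2, the string "abcdefghij" yields "abcd", "cdef", "efgh", "ghij". Nothing in the loop looks at words or sentences, which is why the sentence-aware and structure-aware chunkers listed further down tend to produce cleaner boundaries.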

Use this chunker when:

  • You need maximum compatibility with existing code
  • Content has no clear sentence structure
  • Speed matters more than chunk quality

For better quality chunks, consider:

  • SentenceChunker: Respects sentence boundaries
  • MarkdownChunker: Preserves markdown structure
  • SemanticChunker: Uses embedding similarity

Usage:

val chunker = SimpleChunker()
val chunks = chunker.chunk(text, ChunkingConfig(targetSize = 800, overlap = 150))

Attributes

Companion: object
Supertypes: class Object, trait Matchable, class Any

Members list

Value members

Concrete methods

override def chunk(text: String, config: ChunkingConfig): Seq[DocumentChunk]

Split text into chunks.

Value parameters

config: Chunking configuration
text: Input text to chunk

Attributes

Returns: Sequence of document chunks

Definition Classes

Inherited methods

def chunkWithSource(text: String, sourceFile: String, config: ChunkingConfig): Seq[DocumentChunk]

Split text into chunks with source file metadata.

Value parameters

config: Chunking configuration
sourceFile: Source file name for metadata
text: Input text to chunk

Attributes

Returns: Sequence of document chunks with source metadata

Inherited from: DocumentChunker
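
The relationship between chunk and an inherited chunkWithSource can be pictured with the sketch below. It uses hypothetical stand-in types (`Config`, `Chunk`, `Chunker`, `FixedChunker`), not the real ChunkingConfig, DocumentChunk, or DocumentChunker, whose fields are not shown on this page. It also assumes the inherited method delegates to chunk and then attaches the source file name, which is one plausible design rather than a confirmed one.

```scala
// Hypothetical stand-ins for illustration only; not the org.llm4s types.
object ChunkWithSourceSketch {

  final case class Config(targetSize: Int)
  final case class Chunk(content: String, index: Int, source: Option[String] = None)

  trait Chunker {
    // Each concrete chunker supplies its own splitting strategy.
    def chunk(text: String, config: Config): Seq[Chunk]

    // Assumed default: reuse `chunk`, then tag every result with the source file.
    def chunkWithSource(text: String, sourceFile: String, config: Config): Seq[Chunk] =
      chunk(text, config).map(_.copy(source = Some(sourceFile)))
  }

  // A fixed-size chunker (no overlap) that only overrides `chunk`.
  object FixedChunker extends Chunker {
    override def chunk(text: String, config: Config): Seq[Chunk] = {
      require(config.targetSize > 0, "targetSize must be positive")
      text
        .grouped(config.targetSize)
        .zipWithIndex
        .map { case (content, index) => Chunk(content, index) }
        .toSeq
    }
  }
}
```

Under this design a concrete chunker overrides only chunk and gets source tagging for free, which would match chunkWithSource appearing under inherited methods here.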