MarkdownChunker

org.llm4s.chunking.MarkdownChunker
See the MarkdownChunker companion object

Markdown-aware document chunker.

Preserves markdown structure by:

  • Respecting heading boundaries (# through ######)
  • Keeping code blocks intact when possible
  • Tracking heading hierarchy in chunk metadata
  • Preserving list structure

Because it understands document structure, this chunker produces higher-quality chunks for markdown content than structure-unaware splitting.
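To make the structure-aware behaviour above concrete, here is a minimal, self-contained sketch of heading-boundary splitting with heading-path tracking. It is not the MarkdownChunker implementation: `SketchChunk`, `HeadingPathSketch` and `split` are hypothetical names invented for illustration, size limits are ignored, and only ATX headings and backtick fences are recognised.

```scala
// Hypothetical stand-in type for illustration; not the llm4s DocumentChunk.
final case class SketchChunk(headings: Vector[String], content: String)

object HeadingPathSketch {
  // ATX heading: 1 to 6 '#' characters, whitespace, then the title.
  private val Heading = """^(#{1,6})\s+(.*?)\s*$""".r
  // Three backticks, built programmatically so this example nests cleanly.
  private val Fence = "`" * 3

  def split(markdown: String): Vector[SketchChunk] = {
    val chunks  = Vector.newBuilder[SketchChunk]
    var path    = Vector.empty[(Int, String)] // (level, title), outermost first
    var body    = Vector.empty[String]        // lines collected since the last heading
    var inFence = false                       // true while inside a fenced code block

    // Emit the buffered body, if any, under the current heading path.
    def flush(): Unit = {
      val text = body.mkString("\n").trim
      if (text.nonEmpty) chunks += SketchChunk(path.map(_._2), text)
      body = Vector.empty
    }

    markdown.linesIterator.foreach { line =>
      if (line.trim.startsWith(Fence)) {
        inFence = !inFence // opening or closing fence stays with the body
        body :+= line
      } else if (inFence) {
        body :+= line // '#' lines inside a code block are not headings
      } else line match {
        case Heading(hashes, title) =>
          flush()
          // Drop headings at the same or deeper level, then push the new one.
          path = path.filter(_._1 < hashes.length) :+ (hashes.length -> title)
        case _ => body :+= line
      }
    }
    flush()
    chunks.result()
  }
}
```

Calling `HeadingPathSketch.split` on a document with a `# Guide` heading followed by `## Install` and `## Usage` sections yields chunks whose heading paths are `Guide`, then `Guide, Install`, then `Guide, Usage`. A `#` line inside a fenced block stays in its chunk's content instead of starting a new chunk.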

Usage:

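import org.llm4s.chunking.{ ChunkingConfig, MarkdownChunker } // ChunkingConfig's package is assumed

val chunker = MarkdownChunker()
val chunks = chunker.chunk(markdownText, ChunkingConfig(targetSize = 800))

// Print each chunk's heading path and a 50-character preview
chunks.foreach { c =>
  val headingPath = c.metadata.headings.mkString(" > ")
  println(s"[$headingPath] ${c.content.take(50)}...")
}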

Attributes

Companion
object
Supertypes
class Object
trait Matchable
class Any

Members list

Value members

Concrete methods

override def chunk(text: String, config: ChunkingConfig): Seq[DocumentChunk]

Split text into chunks.

Value parameters

config: Chunking configuration
text: Input text to chunk

Attributes

Returns

Sequence of document chunks

Definition Classes

Inherited methods

def chunkWithSource(text: String, sourceFile: String, config: ChunkingConfig): Seq[DocumentChunk]

Split text into chunks with source file metadata.

Value parameters

config: Chunking configuration
sourceFile: Source file name for metadata
text: Input text to chunk

Attributes

Returns

Sequence of document chunks with source metadata

Inherited from:
DocumentChunker
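
Since chunkWithSource is inherited rather than overridden, one plausible reading is that the base trait layers it on top of chunk. The sketch below illustrates that pattern under stated assumptions: `SketchConfig`, `SketchMetadata`, `SketchDocChunk`, `SketchChunker` and `FixedSizeChunker` are hypothetical stand-ins, and the real DocumentChunker may attach the source file differently.

```scala
// Hypothetical stand-ins for illustration; the real llm4s types may differ.
final case class SketchConfig(targetSize: Int)
final case class SketchMetadata(sourceFile: Option[String] = None)
final case class SketchDocChunk(content: String, metadata: SketchMetadata = SketchMetadata())

trait SketchChunker {
  def chunk(text: String, config: SketchConfig): Seq[SketchDocChunk]

  // Assumed default: chunk as usual, then stamp every chunk with the source file name.
  def chunkWithSource(text: String, sourceFile: String, config: SketchConfig): Seq[SketchDocChunk] =
    chunk(text, config).map { c =>
      c.copy(metadata = c.metadata.copy(sourceFile = Some(sourceFile)))
    }
}

// Toy chunker that cuts the text into pieces of at most targetSize characters.
object FixedSizeChunker extends SketchChunker {
  def chunk(text: String, config: SketchConfig): Seq[SketchDocChunk] =
    text.grouped(config.targetSize).map(piece => SketchDocChunk(piece)).toSeq
}
```

With this shape, any concrete chunker only has to implement chunk, and every chunk it returns carries the file name when the caller goes through chunkWithSource.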