MultiDocumentGraphBuilder

org.llm4s.knowledgegraph.extraction.MultiDocumentGraphBuilder

class MultiDocumentGraphBuilder(llmClient: LLMClient, config: ExtractionConfig)

Orchestrates multi-document knowledge graph extraction.

Composes the extraction pipeline:

Per document: coreference resolution → schema-guided (or free-form) extraction → schema validation
After all documents: merge graphs → entity linking → final SourceTrackedGraph
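The stage ordering above can be sketched as plain function composition. This is an illustration only: `PipelineSketch`, `Stages`, `build`, and `toyStages` are hypothetical names invented for this sketch, the graph is modelled as a bare set of triples rather than a SourceTrackedGraph, and the toy stages use string matching where the real builder presumably calls the LLM.

```scala
object PipelineSketch {
  // Hypothetical stand-ins, NOT the llm4s types.
  type Triple = (String, String, String) // (subject, relation, object)
  type Graph  = Set[Triple]

  // One function per pipeline stage named in the description above.
  final case class Stages(
    resolveCoreferences: String => String,
    extract: String => Graph,
    validate: Graph => Graph,
    merge: Seq[Graph] => Graph,
    linkEntities: Graph => Graph
  )

  def build(texts: Seq[String], s: Stages): Graph = {
    // Per document: coreference resolution, then extraction, then validation.
    val perDocument = s.resolveCoreferences.andThen(s.extract).andThen(s.validate)
    // After all documents: merge the per-document graphs, then link entities.
    s.linkEntities(s.merge(texts.map(perDocument)))
  }

  private val Fact = """(\w+) (works at|lives in) (\w+)\.""".r

  // Toy stages so the sketch runs without an LLM (assumption: illustration only).
  val toyStages: Stages = Stages(
    resolveCoreferences = _.replace("She ", "Alice "),
    extract = text => Fact.findAllMatchIn(text).map { m =>
      val relation = if (m.group(2) == "works at") "WORKS_FOR" else "LIVES_IN"
      (m.group(1), relation, m.group(3))
    }.toSet,
    // "Schema" here allows only the WORKS_FOR relation.
    validate = _.filter { case (_, relation, _) => relation == "WORKS_FOR" },
    merge = _.flatten.toSet,
    // Collapse the alias "ACME" onto the canonical name "Acme".
    linkEntities = _.map { case (s, r, o) => (s, r, if (o == "ACME") "Acme" else o) }
  )
}
```

Calling `PipelineSketch.build(Seq("Alice works at Acme. She lives in Paris.", "Bob works at ACME."), PipelineSketch.toyStages)` runs the three per-document stages on each text before any cross-document step, which is why entity linking sees both spellings of the organisation at once.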

Supports incremental building: pass an existing SourceTrackedGraph to add new documents without re-extracting from previously processed ones.

Value parameters

config: Configuration controlling which pipeline stages are enabled
llmClient: The LLM client used by all pipeline components

Attributes

Example

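// Restrict extraction to two entity types and one relationship type
val builder = new MultiDocumentGraphBuilder(llmClient, ExtractionConfig(
  schema = Some(ExtractionSchema.simple(Seq("Person", "Org"), Seq("WORKS_FOR")))
))
// Each document is a (text, source) pair
val docs = Seq(
  ("Alice works at Acme.", DocumentSource("doc1", "Report 1")),
  ("Bob works at Acme.", DocumentSource("doc2", "Report 2"))
)
// Yields a SourceTrackedGraph combining both documents, or the first extraction error
val result = builder.extractDocuments(docs)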

Supertypes

class Object

trait Matchable

class Any

Members list

Value members

Concrete methods

Extracts a knowledge graph from a single document with source provenance.

Runs the full per-document pipeline: coreference → extraction → validation.

Value parameters

source: The document source metadata
text: The document text to extract from

Attributes

Returns: A SourceTrackedGraph containing entities from this document

Extracts knowledge graphs from multiple documents and merges them into a single graph.

Runs the full pipeline per document, merges all results, then applies entity linking. Supports incremental building via the existingGraph parameter.

Fails fast: if any document's extraction fails, the method returns immediately with that error and subsequent documents are not processed.

Value parameters

documents: Sequence of (text, source) pairs to extract from
existingGraph: Optional existing graph to build upon incrementally

Attributes

Returns: A SourceTrackedGraph combining all documents, or the first extraction error
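The fail-fast and incremental behaviour described above can be illustrated with a small self-contained sketch. It assumes the error is surfaced as the `Left` side of an `Either`, which this page does not confirm. `FailFastSketch`, `Doc`, `Graph`, `extractAll`, and `toyExtract` are hypothetical stand-ins, not the llm4s API.

```scala
object FailFastSketch {
  // Hypothetical stand-ins, NOT the llm4s types.
  final case class Doc(id: String, text: String)
  final case class Graph(entities: Set[String]) {
    def merge(other: Graph): Graph = Graph(entities ++ other.entities)
  }

  // Folds over the documents, starting from the existing graph if one is given.
  // Once the accumulator is a Left, flatMap never runs its function, so
  // extractOne is not called for any later document.
  def extractAll(
      docs: Seq[Doc],
      existing: Option[Graph] = None
  )(extractOne: Doc => Either[String, Graph]): Either[String, Graph] = {
    val seed: Either[String, Graph] = Right(existing.getOrElse(Graph(Set.empty)))
    docs.foldLeft(seed) { (acc, doc) =>
      acc.flatMap(graph => extractOne(doc).map(graph.merge(_)))
    }
  }

  // Toy extractor (assumption): blank text is an error, and capitalised
  // words count as entities.
  def toyExtract(doc: Doc): Either[String, Graph] =
    if (doc.text.trim.isEmpty) Left(s"empty document: ${doc.id}")
    else Right(Graph(doc.text.split("\\W+").filter(_.headOption.exists(_.isUpper)).toSet))
}
```

Passing `Some(graph)` as `existing` seeds the fold with the earlier result, so only the new documents go through extraction. That mirrors the incremental use of existingGraph, under the same assumptions.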
