MultiDocumentGraphBuilder

org.llm4s.knowledgegraph.extraction.MultiDocumentGraphBuilder

Orchestrates multi-document knowledge graph extraction.

Composes the extraction pipeline:

  1. Per document: coreference resolution → schema-guided (or free-form) extraction → schema validation
  2. After all documents: merge graphs → entity linking → final SourceTrackedGraph

Supports incremental building: pass an existing SourceTrackedGraph to add new documents without re-extracting from previously processed ones.
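The two phases above can be sketched with stand-in types. Everything in this block is an assumption for illustration: `Doc`, `TrackedGraph`, the stage functions, and the case-insensitive linking rule are hypothetical and are not the llm4s definitions. Only the shape follows the description: chain the per-document stages, merge into an optional existing graph, then link entities.

```scala
// Stand-in types for illustration only; none of these are the llm4s definitions.
object PipelineSketch {
  type Result[A] = Either[String, A]

  final case class Doc(id: String, text: String)

  // Maps each entity name to the ids of the documents it was seen in.
  final case class TrackedGraph(entities: Map[String, Set[String]]) {
    def merge(other: TrackedGraph): TrackedGraph =
      TrackedGraph(other.entities.foldLeft(entities) { case (acc, (name, srcs)) =>
        acc.updated(name, acc.getOrElse(name, Set.empty[String]) ++ srcs)
      })
  }

  // Hypothetical per-document stages; each one may fail.
  def resolveCoreferences(text: String): Result[String] = Right(text)

  def extract(doc: Doc, text: String): Result[TrackedGraph] =
    if (text.trim.isEmpty) Left(s"empty document: ${doc.id}")
    else Right(TrackedGraph(text.split("\\s+").map(w => w -> Set(doc.id)).toMap))

  def validate(graph: TrackedGraph): Result[TrackedGraph] = Right(graph)

  // Phase 1: coreference -> extraction -> validation for one document.
  def extractDocument(doc: Doc): Result[TrackedGraph] =
    for {
      resolved  <- resolveCoreferences(doc.text)
      raw       <- extract(doc, resolved)
      validated <- validate(raw)
    } yield validated

  // Hypothetical linking rule: unify names that differ only by case.
  def linkEntities(graph: TrackedGraph): TrackedGraph =
    TrackedGraph(graph.entities.groupBy(_._1.toLowerCase).map { case (name, group) =>
      name -> group.values.foldLeft(Set.empty[String])(_ ++ _)
    })

  // Phase 2: merge per-document graphs, seeded with any existing graph, then link.
  def extractDocuments(docs: Seq[Doc], existing: Option[TrackedGraph]): Result[TrackedGraph] = {
    val seed: Result[TrackedGraph] = Right(existing.getOrElse(TrackedGraph(Map.empty)))
    docs.foldLeft(seed) { (acc, doc) =>
      for { graph <- acc; next <- extractDocument(doc) } yield graph.merge(next)
    }.map(linkEntities)
  }
}
```

In this sketch, passing the graph from a previous run as `existing` adds the new documents' entities to it without touching the earlier documents again, which is the idea behind incremental building.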

Value parameters

config

Configuration controlling which pipeline stages are enabled

llmClient

The LLM client used by all pipeline components

Attributes

Supertypes
class Object
trait Matchable
class Any

Value members

Concrete methods

Extracts a knowledge graph from a single document with source provenance.

Runs the full per-document pipeline: coreference → extraction → validation.

Value parameters

source

The document source metadata

text

The document text to extract from

Attributes

Returns

A SourceTrackedGraph containing entities from this document

def extractDocuments(documents: Seq[(String, DocumentSource)], existingGraph: Option[SourceTrackedGraph]): Result[SourceTrackedGraph]

Extracts knowledge graphs from multiple documents and merges them into a single graph.

Runs the full pipeline per document, merges all results, then applies entity linking. Supports incremental building via the existingGraph parameter.

Fails fast: if any document's extraction fails, the method returns immediately with that error and subsequent documents are not processed.
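The fail-fast behaviour can be illustrated with a small stand-alone sketch. The names here (`CountingExtractor`, `extractAll`) are hypothetical and the graph is reduced to a set of strings; the sketch only shows one way to stop at the first error so that later documents are never handed to the extractor. It is not the library's implementation.

```scala
// Illustrative stand-ins only; not the llm4s types or implementation.
object FailFastSketch {
  type Result[A] = Either[String, A]

  // Hypothetical extractor: fails on blank text and records every call it receives.
  final class CountingExtractor {
    var calls: Vector[String] = Vector.empty

    def extract(id: String, text: String): Result[Set[String]] = {
      calls = calls :+ id
      if (text.trim.isEmpty) Left(s"extraction failed for $id")
      else Right(text.split("\\s+").toSet)
    }
  }

  // Stops at the first Left; documents after the failing one are never extracted.
  def extractAll(docs: Seq[(String, String)], extractor: CountingExtractor): Result[Set[String]] = {
    val remaining = docs.iterator
    var acc: Result[Set[String]] = Right(Set.empty)
    while (acc.isRight && remaining.hasNext) {
      val (id, text) = remaining.next()
      acc = acc.flatMap(graph => extractor.extract(id, text).map(graph ++ _))
    }
    acc
  }
}
```

A caller that needs partial results despite individual failures would have to extract documents one at a time and collect the errors itself, since a fail-fast call returns only the first error.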

Value parameters

documents

Sequence of (text, source) pairs to extract from

existingGraph

Optional existing graph to build upon incrementally

Attributes

Returns

A SourceTrackedGraph combining all documents, or the first extraction error