org.llm4s.knowledgegraph.extraction

Resolves coreferences within a document before knowledge graph extraction.

Uses an LLM to replace pronouns and indirect references (e.g., "he", "the company", "its founder") with explicit entity names. This pre-processing step reduces duplicate nodes that would otherwise be created when the extractor encounters unresolved references.

Value parameters

llmClient: The LLM client to use for coreference resolution

Attributes

Example

val resolver = new CoreferenceResolver(llmClient)
val resolved = resolver.resolve("Alice works at Acme. She is the CEO.")
// resolved: Right("Alice works at Acme. Alice is the CEO.")

Supertypes

class Object

trait Matchable

class Any

Represents a source document from which knowledge graph entities were extracted.

Value parameters

id: Unique identifier for the document
metadata: Additional metadata about the document (e.g., author, date, URL)
title: Human-readable title or filename

Attributes

Supertypes: trait Serializable

trait Product

trait Equals

class Object

trait Matchable

class Any

Links and deduplicates entities across a knowledge graph.

Performs two passes:

1. Deterministic same-name merging: nodes with the same label and normalized name property are merged, with edges rewritten to point at the surviving node.
2. LLM-assisted disambiguation (optional): ambiguous clusters (e.g., "Jobs" vs "Steve Jobs") are sent to the LLM to confirm or reject merges.
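
The deterministic pass can be pictured with a minimal, self-contained sketch. The `Node`, `Edge`, and `Graph` types below are hypothetical stand-ins rather than the library's own, and both the normalization rule (trim, collapse whitespace, lower-case) and the choice of the smallest ID as survivor are assumptions made for illustration only.

```scala
object SameNameMergeSketch {
  // Hypothetical minimal graph types; the real llm4s Graph/Node/Edge may differ.
  final case class Node(id: String, label: String, properties: Map[String, String])
  final case class Edge(source: String, target: String, relationship: String)
  final case class Graph(nodes: Map[String, Node], edges: List[Edge])

  // Assumed normalization: trim, collapse runs of whitespace, lower-case.
  def normalize(name: String): String =
    name.trim.replaceAll("\\s+", " ").toLowerCase

  def mergeSameName(graph: Graph): Graph = {
    // Sorting by ID makes the choice of survivor deterministic.
    val sorted = graph.nodes.values.toList.sortBy(_.id)

    // Map every node ID to the ID of the node that survives the merge.
    val survivorOf: Map[String, String] =
      sorted
        .groupBy(n => n.properties.get("name").map(name => (n.label, normalize(name))))
        .toList
        .flatMap {
          case (None, unnamed)  => unnamed.map(n => n.id -> n.id)        // no name: never merged
          case (Some(_), group) => group.map(n => n.id -> group.head.id) // smallest ID survives
        }
        .toMap

    // Combine properties within each group; the survivor's values win on conflict.
    val mergedNodes: Map[String, Node] =
      sorted.groupBy(n => survivorOf(n.id)).map { case (id, group) =>
        val props = group.reverse.foldLeft(Map.empty[String, String])(_ ++ _.properties)
        id -> group.head.copy(properties = props)
      }

    // Rewrite edge endpoints to the survivors and drop duplicates created by the merge.
    val rewrittenEdges =
      graph.edges
        .map(e => e.copy(source = survivorOf.getOrElse(e.source, e.source),
                         target = survivorOf.getOrElse(e.target, e.target)))
        .distinct

    Graph(mergedNodes, rewrittenEdges)
  }
}
```

In this sketch, two Person nodes named "Steve Jobs" and " steve  jobs " collapse into one, while "Jobs" would stay separate; that remaining ambiguity is what the optional second pass is described as handling.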

Value parameters

llmClient: Optional LLM client for disambiguation. If None, only deterministic merging is performed.

Attributes

Example

// Deterministic linking only
val linker = new EntityLinker(None)
val deduped = linker.link(graphWithDuplicates)

// With LLM-assisted disambiguation
val smartLinker = new EntityLinker(Some(llmClient))
val result = smartLinker.link(graphWithAmbiguousEntities)

Supertypes

class Object

trait Matchable

class Any

Defines an entity type expected in the extraction schema.

Value parameters

description: Optional description used as a hint for the LLM
name: The entity type name (e.g., "Person", "Organization")
properties: Properties expected on entities of this type

Attributes

Supertypes: trait Serializable

trait Product

trait Equals

class Object

trait Matchable

class Any

Configuration for multi-document graph extraction.

Value parameters

enableCoreference: Whether to run coreference resolution on each document before extraction
enableEntityLinking: Whether to run entity linking (deduplication) after merging documents
llmDisambiguation: Whether to use LLM-assisted disambiguation during entity linking
schema: Optional schema to constrain extraction. When absent, free-form extraction is used.

Attributes

Example

val config = ExtractionConfig(
  schema = Some(ExtractionSchema.simple(Seq("Person", "Org"), Seq("WORKS_FOR"))),
  enableCoreference = true,
  enableEntityLinking = true
)

Supertypes

trait Serializable

trait Product

trait Equals

class Object

trait Matchable

class Any


Schema for guided knowledge graph extraction.

When provided to an extractor, the LLM prompt is constrained to these types and extracted results are validated against the schema. If allowOutOfSchema is true, entities and relationships outside the schema are preserved but flagged.

Value parameters

allowOutOfSchema: If true, out-of-schema entities/relationships are kept; if false, they are dropped
entityTypes: Expected entity type definitions
relationshipTypes: Expected relationship type definitions

Attributes

Companion: object
Supertypes: trait Serializable

trait Product

trait Equals

class Object

trait Matchable

class Any

Attributes

Companion: class
Supertypes: trait Product

trait Mirror

class Object

trait Matchable

class Any
Self type: ExtractionSchema.type

Generates a knowledge graph from unstructured text using an LLM and writes it to a GraphStore.

Value parameters

graphStore: The graph store to write extracted entities and relationships to
llmClient: The LLM client to use for extraction

Attributes

Example

val generator = new KnowledgeGraphGenerator(llmClient, graphStore)
val result = generator.extract(
  text = "Alice works at Acme Corp in San Francisco.",
  entityTypes = List("Person", "Organization", "Location")
)
// result: Right(Graph) with nodes for Alice, Acme Corp, San Francisco

Supertypes

class Object

trait Matchable

class Any

Orchestrates multi-document knowledge graph extraction.

Composes the extraction pipeline:

1. Per document: coreference resolution → schema-guided (or free-form) extraction → schema validation
2. After all documents: merge graphs → entity linking → final SourceTrackedGraph

Supports incremental building: pass an existing SourceTrackedGraph to add new documents without re-extracting from previously processed ones.
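
As a rough illustration of how such a pipeline could be composed, the sketch below chains the stages with `Either` and folds over the documents, starting from an existing graph to mimic incremental building. Every name here (`Stages`, `Config`, `build`, the simplified `Graph`) is hypothetical and the stages are plain functions; this is not the builder's actual implementation.

```scala
object PipelineSketch {
  // Hypothetical stand-ins for the real components; all names are assumptions.
  type Result[A] = Either[String, A]

  final case class Config(enableCoreference: Boolean, enableEntityLinking: Boolean)

  final case class Graph(nodes: Set[String], edges: Set[(String, String, String)]) {
    def merge(other: Graph): Graph = Graph(nodes ++ other.nodes, edges ++ other.edges)
  }
  object Graph { val empty: Graph = Graph(Set.empty, Set.empty) }

  // Each pipeline stage is modelled as a plain function.
  final case class Stages(
    resolveCoreferences: String => Result[String],
    extract: String => Result[Graph],
    validate: Graph => Graph,
    linkEntities: Graph => Result[Graph]
  )

  def build(
    docs: Seq[String],
    config: Config,
    stages: Stages,
    existing: Graph = Graph.empty
  ): Result[Graph] = {
    // Per document: coreference resolution -> extraction -> schema validation.
    def perDocument(text: String): Result[Graph] =
      for {
        resolved  <- if (config.enableCoreference) stages.resolveCoreferences(text) else Right(text)
        extracted <- stages.extract(resolved)
      } yield stages.validate(extracted)

    // Merge document graphs into `existing`; after the first failure,
    // later documents are skipped and the error is returned.
    val merged = docs.foldLeft[Result[Graph]](Right(existing)) { (acc, text) =>
      for { g <- acc; next <- perDocument(text) } yield g.merge(next)
    }

    // After all documents: optional entity linking.
    merged.flatMap(g => if (config.enableEntityLinking) stages.linkEntities(g) else Right(g))
  }
}
```

Passing a previously built graph as `existing` adds new documents without touching the text of earlier ones, which is the idea behind the incremental mode described above.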

Value parameters

config: Configuration controlling which pipeline stages are enabled
llmClient: The LLM client used by all pipeline components

Attributes

Example

val builder = new MultiDocumentGraphBuilder(
  llmClient,
  ExtractionConfig(
    schema = Some(ExtractionSchema.simple(Seq("Person", "Org"), Seq("WORKS_FOR")))
  )
)
val docs = Seq(
  ("Alice works at Acme.", DocumentSource("doc1", "Report 1")),
  ("Bob works at Acme.", DocumentSource("doc2", "Report 2"))
)
val result = builder.extractDocuments(docs)

Supertypes

class Object

trait Matchable

class Any

Defines a property expected on an entity type.

Value parameters

description: Optional description used as a hint for the LLM
name: The property name (e.g., "role", "department")
required: Whether this property must be present on extracted entities of the parent type

Attributes

Supertypes: trait Serializable

trait Product

trait Equals

class Object

trait Matchable

class Any

Defines a relationship type expected in the extraction schema.

Value parameters

description: Optional description used as a hint for the LLM
name: The relationship type name (e.g., "WORKS_FOR", "LOCATED_IN")
sourceTypes: Valid source entity types for this relationship (empty means any)
targetTypes: Valid target entity types for this relationship (empty means any)
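
The "empty means any" rule for sourceTypes and targetTypes can be expressed as a small check. The `RelationshipRule` type and `endpointsValid` function below are hypothetical and only mirror the parameters documented above; the case-sensitive label comparison is likewise an assumption.

```scala
object EndpointCheckSketch {
  // Hypothetical mirror of the documented parameters, not the library's class.
  final case class RelationshipRule(
    name: String,
    sourceTypes: Seq[String] = Seq.empty,
    targetTypes: Seq[String] = Seq.empty
  )

  // An empty list places no restriction on that endpoint.
  private def allows(allowed: Seq[String], label: String): Boolean =
    allowed.isEmpty || allowed.contains(label)

  // True when both endpoint labels are permitted by the rule.
  def endpointsValid(rule: RelationshipRule, sourceLabel: String, targetLabel: String): Boolean =
    allows(rule.sourceTypes, sourceLabel) && allows(rule.targetTypes, targetLabel)
}
```

With a rule restricted to Person sources and Organization targets, an edge from an Organization to a Person fails the check, whereas a rule with both lists empty accepts any pair of labels.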

Attributes

Supertypes: trait Serializable

trait Product

trait Equals

class Object

trait Matchable

class Any

Extracts a knowledge graph from text using schema-constrained LLM prompts.

Unlike the free-form KnowledgeGraphGenerator, this extractor guides the LLM by listing the allowed entity types, relationship types, and expected properties from an ExtractionSchema. The LLM output is still parsed as JSON, but the prompt strongly constrains which types are produced.

Value parameters

llmClient: The LLM client to use for extraction

Attributes

Example

val schema = ExtractionSchema.simple(
  entityTypes = Seq("Person", "Organization"),
  relationshipTypes = Seq("WORKS_FOR", "MANAGES")
)
val extractor = new SchemaGuidedExtractor(llmClient)
val result = extractor.extract("Alice manages Bob at Acme Corp.", schema)

Supertypes

class Object

trait Matchable

class Any

Validates an extracted graph against an ExtractionSchema.

Separates nodes and edges into conforming (valid) and non-conforming (out-of-schema) sets, and reports specific constraint violations such as relationship endpoint type mismatches.
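
A minimal sketch of the conforming / non-conforming split follows. The types and the `validate` signature are hypothetical simplifications: nodes are checked by label and edges by relationship name only, and endpoint type checks are left out of this sketch. The violation type string is taken from the examples given elsewhere on this page.

```scala
object ValidationSketch {
  // Hypothetical simplified types; the library's own classes may differ.
  final case class Node(id: String, label: String)
  final case class Edge(source: String, target: String, relationship: String)
  final case class Violation(violationType: String, description: String, entityId: Option[String])

  final case class Outcome(
    validNodes: List[Node],
    validEdges: List[Edge],
    outOfSchemaNodes: List[Node],
    outOfSchemaEdges: List[Edge],
    violations: List[Violation]
  ) {
    def isFullyValid: Boolean =
      outOfSchemaNodes.isEmpty && outOfSchemaEdges.isEmpty && violations.isEmpty
  }

  def validate(
    nodes: List[Node],
    edges: List[Edge],
    entityTypes: Set[String],
    relationshipTypes: Set[String]
  ): Outcome = {
    // Split nodes by label and edges by relationship name.
    val (validNodes, badNodes) = nodes.partition(n => entityTypes.contains(n.label))
    val (validEdges, badEdges) = edges.partition(e => relationshipTypes.contains(e.relationship))

    // Report one violation per out-of-schema node.
    val violations = badNodes.map { n =>
      Violation("out_of_schema_entity", s"Label '${n.label}' is not in the schema", Some(n.id))
    }

    Outcome(validNodes, validEdges, badNodes, badEdges, violations)
  }
}
```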

Value parameters

schema: The extraction schema to validate against

Attributes

Example

val schema = ExtractionSchema.simple(
  entityTypes = Seq("Person", "Organization"),
  relationshipTypes = Seq("WORKS_FOR")
)
val validator = new SchemaValidator(schema)
val result = validator.validate(extractedGraph)
if (result.isFullyValid) println("Graph conforms to schema")

Supertypes

class Object

trait Matchable

class Any

Represents a violation found during schema validation.

Value parameters

description: Human-readable description of the violation
entityId: Optional ID of the entity involved
violationType: Category of violation (e.g., "out_of_schema_entity", "invalid_relationship_endpoint")

Attributes

Supertypes: trait Serializable

trait Product

trait Equals

class Object

trait Matchable

class Any

A graph with source provenance tracking.

Wraps a standard Graph with metadata about which documents contributed each node and edge. The underlying Graph remains a pure data structure; provenance is tracked externally.
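
To illustrate what externally tracked provenance might look like, the sketch below keeps two maps alongside the graph and unions the document-ID sets when two tracked graphs are combined. `Tracked`, `fromDocument`, and `merge` are hypothetical names chosen for this example and are not taken from the library.

```scala
object ProvenanceSketch {
  // (source node ID, target node ID, relationship)
  type EdgeKey = (String, String, String)

  // Hypothetical provenance record kept next to the graph, not inside it.
  final case class Tracked(
    nodeSources: Map[String, Set[String]],
    edgeSources: Map[EdgeKey, Set[String]]
  ) {
    // Combine provenance from two graphs; shared keys get the union of document IDs.
    def merge(other: Tracked): Tracked =
      Tracked(union(nodeSources, other.nodeSources), union(edgeSources, other.edgeSources))
  }

  private def union[K](a: Map[K, Set[String]], b: Map[K, Set[String]]): Map[K, Set[String]] =
    b.foldLeft(a) { case (acc, (key, docs)) =>
      acc.updated(key, acc.getOrElse(key, Set.empty[String]) ++ docs)
    }

  // Everything extracted from a single document is attributed to that document.
  def fromDocument(docId: String, nodeIds: Set[String], edges: Set[EdgeKey]): Tracked =
    Tracked(nodeIds.map(_ -> Set(docId)).toMap, edges.map(_ -> Set(docId)).toMap)
}
```

Because the maps are keyed by node ID and by edge triple, the graph type itself needs no provenance fields, which matches the "pure data structure" design described above.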

Value parameters

edgeSources: Mapping of each edge key (source node ID, target node ID, relationship) to the set of IDs of the documents that contributed it
graph: The underlying knowledge graph
nodeSources: Mapping of node ID to the set of source document IDs that contributed it
sources: All document sources that contributed to this graph

Attributes

Example

val doc = DocumentSource("doc1", "Annual Report")
val graph = Graph.empty
  .addNode(Node("alice", "Person"))
  .addNode(Node("acme", "Organization")) // the edge below needs both endpoints
  .addEdge(Edge("alice", "acme", "WORKS_FOR"))
val tracked = SourceTrackedGraph.fromGraph(graph, doc)
tracked.getNodeSources("alice") // Set("doc1"): IDs of the contributing documents

Companion

object

Supertypes

trait Serializable

trait Product

trait Equals

class Object

trait Matchable

class Any


Attributes

Companion: class
Supertypes: trait Product

trait Mirror

class Object

trait Matchable

class Any
Self type: SourceTrackedGraph.type

Result of validating an extracted graph against an ExtractionSchema.

Value parameters

outOfSchemaEdges: Edges whose relationship does not match any schema relationship type
outOfSchemaNodes: Nodes whose label does not match any schema entity type
validGraph: Graph containing only nodes/edges that conform to the schema
violations: Specific constraint violations found during validation

Attributes

Supertypes: trait Serializable

trait Product

trait Equals

class Object

trait Matchable

class Any
