org.llm4s.knowledgegraph.extraction

Type members

Classlikes

class CoreferenceResolver(llmClient: LLMClient)

Resolves coreferences within a document before knowledge graph extraction.

Uses an LLM to replace pronouns and indirect references (e.g., "he", "the company", "its founder") with explicit entity names. This pre-processing step reduces duplicate nodes that would otherwise be created when the extractor encounters unresolved references.

Value parameters

llmClient

The LLM client to use for coreference resolution

Attributes

Supertypes
class Object
trait Matchable
class Any
case class DocumentSource(id: String, title: String, metadata: Map[String, String])

Represents a source document from which knowledge graph entities were extracted.

Value parameters

id

Unique identifier for the document

metadata

Additional metadata about the document (e.g., author, date, URL)

title

Human-readable title or filename
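
A minimal sketch of building a DocumentSource. The case class is redeclared locally, mirroring the signature documented above, so the snippet compiles on its own; the field values are invented for illustration.

```scala
// Local stand-in mirroring the documented signature.
final case class DocumentSource(id: String, title: String, metadata: Map[String, String])

// Hypothetical values, for illustration only.
val report: DocumentSource = DocumentSource(
  id = "doc-001",
  title = "annual-report.txt",
  metadata = Map("author" -> "J. Doe", "date" -> "2024-01-15")
)

// Metadata is a plain Map, so a missing key yields None rather than an error.
val url: Option[String] = report.metadata.get("url")
```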

Attributes

Supertypes
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any
class EntityLinker(llmClient: Option[LLMClient])

Links and deduplicates entities across a knowledge graph.

Performs two passes:

  1. Deterministic same-name merging: nodes with the same label and normalized name property are merged, with edges rewritten to point at the surviving node.
  2. LLM-assisted disambiguation (optional): ambiguous clusters (e.g., "Jobs" vs "Steve Jobs") are sent to the LLM to confirm or reject merges.
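
The deterministic first pass can be sketched roughly as follows. This is an illustration under stated assumptions, not the library's implementation: `Node` and `Edge` are simplified stand-ins, the normalization rule (trim and lower-case the `name` property) is assumed, and the survivor is simply the first node of each group with its own properties kept unchanged.

```scala
// Simplified stand-ins for graph nodes and edges; the library's own types may differ.
final case class Node(id: String, label: String, properties: Map[String, String])
final case class Edge(source: String, target: String, relationship: String)

// Assumed normalization: trim and lower-case the "name" property, falling back to the node ID.
def mergeKey(n: Node): (String, String) =
  (n.label, n.properties.getOrElse("name", n.id).trim.toLowerCase)

def mergeSameName(nodes: Seq[Node], edges: Seq[Edge]): (Seq[Node], Seq[Edge]) = {
  // Group nodes sharing a label and normalized name; the first node in each group survives.
  val groups = nodes.groupBy(mergeKey).values.toSeq
  val redirect: Map[String, String] =
    groups.flatMap(g => g.map(n => n.id -> g.head.id)).toMap
  val survivors = nodes.filter(n => redirect(n.id) == n.id)
  // Rewrite edge endpoints to the surviving node and drop edges that became duplicates.
  val rewritten = edges
    .map(e => e.copy(
      source = redirect.getOrElse(e.source, e.source),
      target = redirect.getOrElse(e.target, e.target)
    ))
    .distinct
  (survivors, rewritten)
}
```

Nodes whose names differ after normalization (such as "Jobs" and "Steve Jobs") are left untouched by this pass; those are the cases the optional second pass is described as handling.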

Value parameters

llmClient

Optional LLM client for disambiguation. If None, only deterministic merging is performed.

Attributes

Supertypes
class Object
trait Matchable
class Any
case class EntityTypeDefinition(name: String, description: String, properties: Seq[PropertyDefinition])

Defines an entity type expected in the extraction schema.

Value parameters

description

Optional description used as a hint for the LLM

name

The entity type name (e.g., "Person", "Organization")

properties

Properties expected on entities of this type

Attributes

Supertypes
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any
case class ExtractionConfig(schema: Option[ExtractionSchema], enableCoreference: Boolean, enableEntityLinking: Boolean, llmDisambiguation: Boolean)

Configuration for multi-document graph extraction.

Value parameters

enableCoreference

Whether to run coreference resolution on each document before extraction

enableEntityLinking

Whether to run entity linking (deduplication) after merging documents

llmDisambiguation

Whether to use LLM-assisted disambiguation during entity linking

schema

Optional schema to constrain extraction. When absent, free-form extraction is used.
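
A minimal sketch of a configuration for free-form extraction with deterministic linking only. The case classes are local stand-ins mirroring the documented signature (with ExtractionSchema reduced to an empty placeholder), and all four fields are passed explicitly because this page does not show whether the real class has defaults. The helper `usesLlmDisambiguation` is hypothetical and encodes an assumption: the LLM flag only matters when entity linking is enabled.

```scala
// Local stand-ins; ExtractionSchema is an empty placeholder in this sketch.
final case class ExtractionSchema()
final case class ExtractionConfig(
  schema: Option[ExtractionSchema],
  enableCoreference: Boolean,
  enableEntityLinking: Boolean,
  llmDisambiguation: Boolean
)

// No schema (free-form extraction), coreference and linking on, LLM disambiguation off.
val freeForm: ExtractionConfig = ExtractionConfig(
  schema = None,
  enableCoreference = true,
  enableEntityLinking = true,
  llmDisambiguation = false
)

// Assumption: disambiguation is part of entity linking, so it is inert when linking is off.
def usesLlmDisambiguation(c: ExtractionConfig): Boolean =
  c.enableEntityLinking && c.llmDisambiguation
```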

Attributes

Supertypes
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any
case class ExtractionSchema(entityTypes: Seq[EntityTypeDefinition], relationshipTypes: Seq[RelationshipTypeDefinition], allowOutOfSchema: Boolean)

Schema for guided knowledge graph extraction.

When a schema is provided to an extractor, the LLM prompt is constrained to these types and the extracted results are validated against the schema. If allowOutOfSchema is true, entities and relationships outside the schema are preserved but flagged; if false, they are dropped.

Value parameters

allowOutOfSchema

If true, out-of-schema entities/relationships are kept; if false, they are dropped

entityTypes

Expected entity type definitions

relationshipTypes

Expected relationship type definitions
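
A small schema might be assembled as sketched below. The four case classes are local stand-ins that mirror the signatures documented on this page; the type names, descriptions, and properties are invented for illustration.

```scala
// Local stand-ins mirroring the documented signatures.
final case class PropertyDefinition(name: String, description: String, required: Boolean)
final case class EntityTypeDefinition(name: String, description: String, properties: Seq[PropertyDefinition])
final case class RelationshipTypeDefinition(
  name: String,
  description: String,
  sourceTypes: Seq[String],
  targetTypes: Seq[String]
)
final case class ExtractionSchema(
  entityTypes: Seq[EntityTypeDefinition],
  relationshipTypes: Seq[RelationshipTypeDefinition],
  allowOutOfSchema: Boolean
)

// Hypothetical entity and relationship types.
val person = EntityTypeDefinition(
  "Person",
  "A named individual",
  Seq(PropertyDefinition("role", "Job title or function", required = true))
)
val organization = EntityTypeDefinition("Organization", "A company or institution", Seq.empty)
val worksFor = RelationshipTypeDefinition("WORKS_FOR", "Employment relationship", Seq("Person"), Seq("Organization"))

// allowOutOfSchema = false: anything outside these types is dropped rather than kept.
val schema = ExtractionSchema(Seq(person, organization), Seq(worksFor), allowOutOfSchema = false)
```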

Attributes

Companion
object
Supertypes
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any
object ExtractionSchema

Attributes

Companion
class
Supertypes
trait Product
trait Mirror
class Object
trait Matchable
class Any
Self type
class KnowledgeGraphGenerator(llmClient: LLMClient, graphStore: GraphStore)

Generates a Knowledge Graph from unstructured text using an LLM and writes it to a GraphStore.

Value parameters

graphStore

The graph store to write extracted entities and relationships to

llmClient

The LLM client to use for extraction

Attributes

Supertypes
class Object
trait Matchable
class Any

Orchestrates multi-document knowledge graph extraction.

Composes the extraction pipeline:

  1. Per document: coreference resolution → schema-guided (or free-form) extraction → schema validation
  2. After all documents: merge graphs → entity linking → final SourceTrackedGraph

Supports incremental building: pass an existing SourceTrackedGraph to add new documents without re-extracting from previously processed ones.
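
The two-step composition and the incremental mode can be sketched with plain functions. Everything here is a hypothetical stand-in: a graph is reduced to a set of node names, each stage is a function value, and `build` is not the library's method name.

```scala
// Hypothetical stand-ins: a document is raw text, a graph is just a set of node names.
final case class Doc(id: String, text: String)

final case class Stages(
  resolveCoreferences: String => String,
  extract: String => Set[String],
  validate: Set[String] => Set[String],
  link: Set[String] => Set[String]
)

def build(docs: Seq[Doc], stages: Stages, existing: Set[String] = Set.empty): Set[String] = {
  // Step 1, per document: coreference resolution, then extraction, then validation.
  val perDocument = docs.map { d =>
    stages.validate(stages.extract(stages.resolveCoreferences(d.text)))
  }
  // Step 2, after all documents: merge into the existing graph, then link entities.
  // Only the new documents are processed; `existing` is reused as-is.
  stages.link(perDocument.foldLeft(existing)(_ ++ _))
}
```

Passing a previously built graph as `existing` models the incremental case: earlier documents contribute through the merged result without being extracted again.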

Value parameters

config

Configuration controlling which pipeline stages are enabled

llmClient

The LLM client used by all pipeline components

Attributes

Supertypes
class Object
trait Matchable
class Any
case class PropertyDefinition(name: String, description: String, required: Boolean)

Defines a property expected on an entity type.

Value parameters

description

Optional description used as a hint for the LLM

name

The property name (e.g., "role", "department")

required

Whether this property must be present on extracted entities of the parent type
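
One way the `required` flag could be checked against an extracted entity is sketched below. The case class is a local stand-in mirroring the documented signature; `missingRequired` is a hypothetical helper, and entity properties are simplified to a `Map[String, String]`.

```scala
// Local stand-in mirroring the documented signature.
final case class PropertyDefinition(name: String, description: String, required: Boolean)

// Returns the names of required properties that are absent from an entity's property map.
def missingRequired(defs: Seq[PropertyDefinition], actual: Map[String, String]): Seq[String] =
  defs.filter(d => d.required && !actual.contains(d.name)).map(_.name)
```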

Attributes

Supertypes
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any
case class RelationshipTypeDefinition(name: String, description: String, sourceTypes: Seq[String], targetTypes: Seq[String])

Defines a relationship type expected in the extraction schema.

Value parameters

description

Optional description used as a hint for the LLM

name

The relationship type name (e.g., "WORKS_FOR", "LOCATED_IN")

sourceTypes

Valid source entity types for this relationship (empty means any)

targetTypes

Valid target entity types for this relationship (empty means any)
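
The "empty means any" rule for sourceTypes and targetTypes can be expressed as a short check. The case class is a local stand-in mirroring the documented signature; `endpointAllowed` and `endpointsValid` are hypothetical helper names.

```scala
// Local stand-in mirroring the documented signature.
final case class RelationshipTypeDefinition(
  name: String,
  description: String,
  sourceTypes: Seq[String],
  targetTypes: Seq[String]
)

// An empty list places no constraint on that endpoint ("empty means any").
def endpointAllowed(allowed: Seq[String], label: String): Boolean =
  allowed.isEmpty || allowed.contains(label)

def endpointsValid(rel: RelationshipTypeDefinition, sourceLabel: String, targetLabel: String): Boolean =
  endpointAllowed(rel.sourceTypes, sourceLabel) && endpointAllowed(rel.targetTypes, targetLabel)

// Hypothetical definition: the source must be a Person, the target may be anything.
val worksFor = RelationshipTypeDefinition("WORKS_FOR", "Employment relationship", Seq("Person"), Seq.empty)
```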

Attributes

Supertypes
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any
class SchemaGuidedExtractor(llmClient: LLMClient)

Extracts a Knowledge Graph from text using schema-constrained LLM prompts.

Unlike the free-form KnowledgeGraphGenerator, this extractor guides the LLM by listing the allowed entity types, relationship types, and expected properties from an ExtractionSchema. The LLM output is still parsed as JSON but the prompt strongly constrains what types are produced.
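
To illustrate what "listing the allowed types" in a prompt could look like, here is a hypothetical sketch. The actual prompt text used by the extractor is not shown on this page; `EntityType`, `schemaPromptSection`, and the wording of the output are all assumptions.

```scala
// Hypothetical, reduced view of an entity type: a name plus its expected property names.
final case class EntityType(name: String, properties: Seq[String])

// Renders the allowed types as a plain-text section that could be embedded in a prompt.
def schemaPromptSection(entityTypes: Seq[EntityType], relationshipTypes: Seq[String]): String = {
  val entityLines = entityTypes.map { t =>
    val props =
      if (t.properties.isEmpty) ""
      else t.properties.mkString(" (properties: ", ", ", ")")
    s"- ${t.name}$props"
  }
  val relationshipLines = relationshipTypes.map(r => s"- $r")
  (Seq("Allowed entity types:") ++ entityLines ++
    Seq("Allowed relationship types:") ++ relationshipLines).mkString("\n")
}
```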

Value parameters

llmClient

The LLM client to use for extraction

Attributes

Supertypes
class Object
trait Matchable
class Any

Validates an extracted graph against an ExtractionSchema.

Separates nodes and edges into conforming (valid) and non-conforming (out-of-schema) sets, and reports specific constraint violations such as relationship endpoint type mismatches.

Value parameters

schema

The extraction schema to validate against
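
The split into conforming and non-conforming sets might look roughly like the sketch below. `Node`, `Edge`, `Partitioned`, and `partitionBySchema` are simplified, hypothetical stand-ins, and the schema is reduced to two sets of allowed names. One assumption is made explicit in the code: an edge is treated as non-conforming if either endpoint node was itself rejected.

```scala
// Simplified stand-ins; the library's graph types may differ.
final case class Node(id: String, label: String)
final case class Edge(source: String, target: String, relationship: String)

final case class Partitioned(
  validNodes: Seq[Node],
  outOfSchemaNodes: Seq[Node],
  validEdges: Seq[Edge],
  outOfSchemaEdges: Seq[Edge]
)

def partitionBySchema(
  nodes: Seq[Node],
  edges: Seq[Edge],
  entityTypes: Set[String],
  relationshipTypes: Set[String]
): Partitioned = {
  // A node conforms when its label names a schema entity type.
  val (validNodes, badNodes) = nodes.partition(n => entityTypes.contains(n.label))
  val validIds = validNodes.map(_.id).toSet
  // Assumption: an edge conforms only if its relationship is in the schema
  // and both of its endpoints are conforming nodes.
  val (validEdges, badEdges) = edges.partition { e =>
    relationshipTypes.contains(e.relationship) && validIds(e.source) && validIds(e.target)
  }
  Partitioned(validNodes, badNodes, validEdges, badEdges)
}
```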

Attributes

Supertypes
class Object
trait Matchable
class Any
case class SchemaViolation(description: String, entityId: Option[String], violationType: String)

Represents a violation found during schema validation.

Value parameters

description

Human-readable description of the violation

entityId

Optional ID of the entity involved

violationType

Category of violation (e.g., "out_of_schema_entity", "invalid_relationship_endpoint")
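
A short sketch of how violations might be collected and summarized. The case class is a local stand-in mirroring the documented signature; the descriptions and entity IDs are invented, while the two violationType strings are the examples given above.

```scala
// Local stand-in mirroring the documented signature.
final case class SchemaViolation(description: String, entityId: Option[String], violationType: String)

// Hypothetical violations, for illustration only.
val violations: Seq[SchemaViolation] = Seq(
  SchemaViolation("Label 'Planet' is not in the schema", Some("node-7"), "out_of_schema_entity"),
  SchemaViolation("WORKS_FOR target must be an Organization", None, "invalid_relationship_endpoint"),
  SchemaViolation("Label 'Comet' is not in the schema", Some("node-9"), "out_of_schema_entity")
)

// Count violations per category, e.g. for a summary report.
val countsByType: Map[String, Int] =
  violations.groupBy(_.violationType).map { case (kind, vs) => kind -> vs.size }
```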

Attributes

Supertypes
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any
case class SourceTrackedGraph(graph: Graph, sources: Seq[DocumentSource], nodeSources: Map[String, Set[String]], edgeSources: Map[(String, String, String), Set[String]])

A graph with source provenance tracking.

Wraps a standard Graph with metadata about which documents contributed each node and edge. The underlying Graph remains a pure data structure; provenance is tracked externally.

Value parameters

edgeSources

Mapping of each edge, keyed by (source, target, relationship), to the set of source document IDs that contributed it

graph

The underlying knowledge graph

nodeSources

Mapping of node ID to the set of source document IDs that contributed it

sources

All document sources that contributed to this graph
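
Because provenance is kept in plain maps outside the graph, combining two graphs means taking the union of document IDs per key. The helper below is a hypothetical sketch (`mergeSources` is not a documented method); it is generic in the key, so it applies to both nodeSources (`String` keys) and edgeSources (`(String, String, String)` keys).

```scala
// Union two provenance maps: a key present in both keeps the document IDs from each side.
def mergeSources[K](a: Map[K, Set[String]], b: Map[K, Set[String]]): Map[K, Set[String]] =
  b.foldLeft(a) { case (acc, (key, docIds)) =>
    acc.updated(key, acc.getOrElse(key, Set.empty[String]) ++ docIds)
  }
```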

Attributes

Companion
object
Supertypes
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any
object SourceTrackedGraph

Attributes

Companion
class
Supertypes
trait Product
trait Mirror
class Object
trait Matchable
class Any
Self type
case class ValidationResult(validGraph: Graph, outOfSchemaNodes: List[Node], outOfSchemaEdges: List[Edge], violations: List[SchemaViolation])

Result of validating an extracted graph against an ExtractionSchema.

Value parameters

outOfSchemaEdges

Edges whose relationship does not match any schema relationship type

outOfSchemaNodes

Nodes whose label does not match any schema entity type

validGraph

Graph containing only nodes/edges that conform to the schema

violations

Specific constraint violations found during validation

Attributes

Supertypes
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any