org.llm4s.knowledgegraph.extraction
Members list
Type members
Classlikes
Resolves coreferences within a document before knowledge graph extraction.
Uses an LLM to replace pronouns and indirect references (e.g., "he", "the company", "its founder") with explicit entity names. This pre-processing step reduces duplicate nodes that would otherwise be created when the extractor encounters unresolved references.
Value parameters
- llmClient: The LLM client to use for coreference resolution
Attributes
- Supertypes: class Object, trait Matchable, class Any
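Below is a minimal Scala sketch of the coreference step. It assumes the LLM can be modelled as a plain function from prompt to completion (`Complete`). `CorefSketch`, `buildPrompt`, and the prompt wording are hypothetical illustrations, not the llm4s API. The real class takes an `llmClient` whose interface is not shown on this page.

```scala
object CorefSketch {
  // Hypothetical stand-in for an LLM call: prompt in, completion (or error message) out.
  type Complete = String => Either[String, String]

  // Hypothetical instructions; the wording llm4s actually uses is not documented here.
  private val instructions: String =
    """Rewrite the text below, replacing every pronoun and indirect reference
      |(for example "he", "the company", "its founder") with the explicit name
      |of the entity it refers to. Do not add, remove, or reorder facts.
      |Return only the rewritten text.""".stripMargin

  // The document is appended after stripMargin so its own lines are never altered.
  def buildPrompt(text: String): String = s"$instructions\n\nText:\n$text"

  // Returns the rewritten document, or passes the LLM error through unchanged.
  def resolve(text: String, complete: Complete): Either[String, String] =
    complete(buildPrompt(text)).map(_.trim)
}
```

Passing a stub function instead of a live client keeps the step testable. The rewritten text would then be handed to the extractor in place of the original document.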
Represents a source document from which knowledge graph entities were extracted.
Value parameters
- id: Unique identifier for the document
- metadata: Additional metadata about the document (e.g., author, date, URL)
- title: Human-readable title or filename
Attributes
- Supertypes: trait Serializable, trait Product, trait Equals, class Object, trait Matchable, class Any
Links and deduplicates entities across a knowledge graph.
Performs two passes:
- Deterministic same-name merging: nodes with the same label and normalized name property are merged, with edges rewritten to point at the surviving node.
- LLM-assisted disambiguation (optional): ambiguous clusters (e.g., "Jobs" vs "Steve Jobs") are sent to the LLM to confirm or reject merges.
Value parameters
- llmClient: Optional LLM client for disambiguation. If None, only deterministic merging is performed.
Attributes
- Supertypes: class Object, trait Matchable, class Any
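The deterministic first pass can be sketched as follows. Several details here are assumptions for illustration: the graph model, reading the name from a `name` property, the normalization rule (trim, lower-case, collapse whitespace), keeping the first node as the survivor, and the property-merge policy. The LLM-assisted pass for clusters such as "Jobs" vs "Steve Jobs" is not shown.

```scala
object SameNameMergeSketch {
  // Hypothetical minimal graph model; llm4s has its own Graph type.
  final case class Node(id: String, label: String, properties: Map[String, String])
  final case class Edge(source: String, target: String, relationship: String)
  final case class Graph(nodes: Vector[Node], edges: Vector[Edge])

  // Assumed normalization: trim, lower-case, collapse runs of whitespace.
  def normalize(name: String): String =
    name.trim.toLowerCase.split("\\s+").mkString(" ")

  def merge(graph: Graph): Graph = {
    // Merge key is (label, normalized "name"); nodes without a name are never merged.
    def key(n: Node): Option[(String, String)] =
      n.properties.get("name").map(nm => (n.label, normalize(nm)))

    // The first node seen for each key survives (an assumption).
    val survivorId: Map[(String, String), String] =
      graph.nodes.foldLeft(Map.empty[(String, String), String]) { (acc, n) =>
        key(n) match {
          case Some(k) if !acc.contains(k) => acc.updated(k, n.id)
          case _                           => acc
        }
      }

    // Every node ID mapped to the ID of the node it is merged into.
    val redirect: Map[String, String] =
      graph.nodes.map(n => n.id -> key(n).map(survivorId).getOrElse(n.id)).toMap

    val mergedNodes = graph.nodes.filter(n => redirect(n.id) == n.id).map { survivor =>
      val duplicates = graph.nodes.filter(d => d.id != survivor.id && redirect(d.id) == survivor.id)
      // Survivor values win; keys present only on duplicates are filled in.
      val props = duplicates.foldLeft(survivor.properties)((acc, d) => d.properties ++ acc)
      survivor.copy(properties = props)
    }

    // Point edges at surviving nodes and drop edges that became identical.
    val rewrittenEdges = graph.edges.map { e =>
      e.copy(
        source = redirect.getOrElse(e.source, e.source),
        target = redirect.getOrElse(e.target, e.target)
      )
    }.distinct

    Graph(mergedNodes, rewrittenEdges)
  }
}
```

Rewriting edges through a single `redirect` map keeps the pass linear in the number of edges and guarantees that no edge points at a removed node.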
Defines an entity type expected in the extraction schema.
Value parameters
- description: Optional description used as a hint for the LLM
- name: The entity type name (e.g., "Person", "Organization")
- properties: Properties expected on entities of this type
Attributes
- Supertypes: trait Serializable, trait Product, trait Equals, class Object, trait Matchable, class Any
Configuration for multi-document graph extraction.
Value parameters
- enableCoreference: Whether to run coreference resolution on each document before extraction
- enableEntityLinking: Whether to run entity linking (deduplication) after merging documents
- llmDisambiguation: Whether to use LLM-assisted disambiguation during entity linking
- schema: Optional schema to constrain extraction. When absent, free-form extraction is used.
Attributes
- Supertypes: trait Serializable, trait Product, trait Equals, class Object, trait Matchable, class Any
Schema for guided knowledge graph extraction.
When provided to an extractor, the LLM prompt is constrained to these types and extracted results are validated against the schema. If allowOutOfSchema is true, entities and relationships outside the schema are preserved but flagged.
Value parameters
- allowOutOfSchema: If true, out-of-schema entities/relationships are kept; if false, they are dropped
- entityTypes: Expected entity type definitions
- relationshipTypes: Expected relationship type definitions
Attributes
- Companion: object
- Supertypes: trait Serializable, trait Product, trait Equals, class Object, trait Matchable, class Any
ExtractionSchema companion object.
Attributes
- Companion: class
- Supertypes: trait Product, trait Mirror, class Object, trait Matchable, class Any
- Self type: ExtractionSchema.type
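To make the schema parameters concrete, here is a hypothetical Scala model that mirrors the documented fields of `ExtractionSchema` and its entity, property, and relationship type definitions. The class names, field types, and default values are assumptions, not the llm4s signatures. `endpointAllowed` encodes the documented rule that an empty `sourceTypes` or `targetTypes` list accepts any entity type.

```scala
object SchemaSketch {
  // Hypothetical case classes mirroring the documented value parameters (defaults are assumed).
  final case class PropertyDef(name: String, description: Option[String] = None, required: Boolean = false)
  final case class EntityTypeDef(name: String, description: Option[String] = None, properties: Seq[PropertyDef] = Nil)
  final case class RelationshipTypeDef(
      name: String,
      description: Option[String] = None,
      sourceTypes: Seq[String] = Nil,
      targetTypes: Seq[String] = Nil
  )
  final case class Schema(
      entityTypes: Seq[EntityTypeDef],
      relationshipTypes: Seq[RelationshipTypeDef],
      allowOutOfSchema: Boolean = false
  )

  // "Empty means any": an empty sourceTypes/targetTypes list accepts every entity type.
  def endpointAllowed(rel: RelationshipTypeDef, sourceType: String, targetType: String): Boolean =
    (rel.sourceTypes.isEmpty || rel.sourceTypes.contains(sourceType)) &&
      (rel.targetTypes.isEmpty || rel.targetTypes.contains(targetType))

  // Example schema built from the type names used in this page's parameter descriptions.
  val example: Schema = Schema(
    entityTypes = Seq(
      EntityTypeDef("Person", Some("A human being"), Seq(PropertyDef("role"), PropertyDef("department"))),
      EntityTypeDef("Organization")
    ),
    relationshipTypes = Seq(
      RelationshipTypeDef("WORKS_FOR", sourceTypes = Seq("Person"), targetTypes = Seq("Organization")),
      RelationshipTypeDef("LOCATED_IN") // no endpoint constraints
    )
  )
}
```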
Generates a Knowledge Graph from unstructured text using an LLM and writes it to a GraphStore.
Value parameters
- graphStore: The graph store to write extracted entities and relationships to
- llmClient: The LLM client to use for extraction
Attributes
- Supertypes: class Object, trait Matchable, class Any
Orchestrates multi-document knowledge graph extraction.
Composes the extraction pipeline:
- Per document: coreference resolution → schema-guided (or free-form) extraction → schema validation
- After all documents: merge graphs → entity linking → final SourceTrackedGraph
Supports incremental building: pass an existing SourceTrackedGraph to add new documents without re-extracting from previously processed ones.
Value parameters
- config: Configuration controlling which pipeline stages are enabled
- llmClient: The LLM client used by all pipeline components
Attributes
- Supertypes: class Object, trait Matchable, class Any
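The stage order of the pipeline can be sketched as a higher-order function. Each stage is passed in as a plain function. `Doc`, `Config`, and `MiniGraph` are hypothetical simplifications that omit provenance tracking and the `llmDisambiguation` flag. Validation runs only when a validator is supplied, mirroring the optional `schema`. This page does not say whether llm4s re-links entities that were already in an existing graph, so the sketch simply links the whole merged graph.

```scala
object PipelineSketch {
  // Hypothetical simplifications; llm4s uses its own document, config, and graph types.
  final case class Doc(id: String, text: String)
  final case class Config(enableCoreference: Boolean, enableEntityLinking: Boolean)
  final case class MiniGraph(nodes: Set[String])

  def run(
      docs: Seq[Doc],
      config: Config,
      existing: MiniGraph,                      // graph from earlier runs (incremental building)
      resolve: String => String,                // coreference resolution
      extract: String => MiniGraph,             // schema-guided or free-form extraction
      validate: Option[MiniGraph => MiniGraph], // present only when a schema is configured
      link: MiniGraph => MiniGraph              // entity linking / deduplication
  ): MiniGraph = {
    // Per document: coreference -> extraction -> schema validation.
    val perDoc = docs.map { d =>
      val text = if (config.enableCoreference) resolve(d.text) else d.text
      val extracted = extract(text)
      validate.fold(extracted)(v => v(extracted))
    }
    // After all documents: merge into the existing graph, then link entities.
    val merged = perDoc.foldLeft(existing)((acc, g) => MiniGraph(acc.nodes ++ g.nodes))
    if (config.enableEntityLinking) link(merged) else merged
  }
}
```

Starting the fold from `existing` is what makes incremental building cheap in this sketch: only the new documents pass through the per-document stages.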
Defines a property expected on an entity type.
Value parameters
- description: Optional description used as a hint for the LLM
- name: The property name (e.g., "role", "department")
- required: Whether this property must be present on extracted entities of the parent type
Attributes
- Supertypes: trait Serializable, trait Product, trait Equals, class Object, trait Matchable, class Any
Defines a relationship type expected in the extraction schema.
Value parameters
- description: Optional description used as a hint for the LLM
- name: The relationship type name (e.g., "WORKS_FOR", "LOCATED_IN")
- sourceTypes: Valid source entity types for this relationship (empty means any)
- targetTypes: Valid target entity types for this relationship (empty means any)
Attributes
- Supertypes: trait Serializable, trait Product, trait Equals, class Object, trait Matchable, class Any
Extracts a Knowledge Graph from text using schema-constrained LLM prompts.
Unlike the free-form KnowledgeGraphGenerator, this extractor guides the LLM by listing the allowed entity types, relationship types, and expected properties from an ExtractionSchema. The LLM output is still parsed as JSON, but the prompt strongly constrains which types are produced.
Value parameters
- llmClient: The LLM client to use for extraction
Attributes
- Supertypes: class Object, trait Matchable, class Any
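One plausible way to render a schema into the extraction prompt is shown below. The wording and layout llm4s actually uses are not documented on this page, so `GuidedPromptSketch.render` and its output format are hypothetical. What matters is that every allowed entity type, property, relationship type, and endpoint constraint appears explicitly in the prompt.

```scala
object GuidedPromptSketch {
  // Hypothetical schema types mirroring the documented value parameters.
  final case class PropertyDef(name: String, required: Boolean)
  final case class EntityTypeDef(name: String, description: Option[String], properties: Seq[PropertyDef])
  final case class RelationshipTypeDef(name: String, sourceTypes: Seq[String], targetTypes: Seq[String])

  // Empty endpoint lists mean "any", per the relationship type documentation.
  private def orAny(types: Seq[String]): String = if (types.isEmpty) "any" else types.mkString("|")

  def render(entities: Seq[EntityTypeDef], relationships: Seq[RelationshipTypeDef]): String = {
    val entityLines = entities.map { e =>
      val props = e.properties
        .map(p => if (p.required) s"${p.name} (required)" else p.name)
        .mkString(", ")
      val desc = e.description.fold("")(d => s": $d")
      s"- ${e.name}$desc [properties: $props]"
    }
    val relLines = relationships.map(r => s"- ${r.name}: ${orAny(r.sourceTypes)} -> ${orAny(r.targetTypes)}")
    (Seq("Allowed entity types:") ++ entityLines ++
      Seq("Allowed relationship types:") ++ relLines ++
      Seq("Use only these types. Respond with JSON.")).mkString("\n")
  }
}
```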
Validates an extracted graph against an ExtractionSchema.
Separates nodes and edges into conforming (valid) and non-conforming (out-of-schema) sets, and reports specific constraint violations such as relationship endpoint type mismatches.
Value parameters
- schema: The extraction schema to validate against
Attributes
- Supertypes: class Object, trait Matchable, class Any
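Here is a simplified sketch of the validation logic, using hypothetical types. It makes three assumptions. A node conforms when its label matches a schema entity type. An edge is out of schema when its relationship name is unknown. An edge with a known relationship but disallowed endpoint labels lands in neither the valid nor the out-of-schema set and is reported only as an `invalid_relationship_endpoint` violation. Pruning edges whose endpoints were classified as out of schema is left out for brevity.

```scala
object ValidatorSketch {
  // Hypothetical types loosely mirroring SchemaViolation and the validation result fields.
  final case class Node(id: String, label: String)
  final case class Edge(source: String, target: String, relationship: String)
  final case class RelType(name: String, sourceTypes: Set[String], targetTypes: Set[String])
  final case class Violation(violationType: String, description: String, entityId: Option[String])
  final case class Result(
      validNodes: Vector[Node],
      validEdges: Vector[Edge],
      outOfSchemaNodes: Vector[Node],
      outOfSchemaEdges: Vector[Edge],
      violations: Vector[Violation]
  )

  def validate(nodes: Vector[Node], edges: Vector[Edge], entityTypes: Set[String], relTypes: Seq[RelType]): Result = {
    val (validNodes, oosNodes) = nodes.partition(n => entityTypes.contains(n.label))
    val labelOf = nodes.map(n => n.id -> n.label).toMap
    val relByName = relTypes.map(r => r.name -> r).toMap
    val (knownRel, oosEdges) = edges.partition(e => relByName.contains(e.relationship))

    // Empty sourceTypes/targetTypes means any endpoint type is accepted.
    def endpointOk(e: Edge): Boolean = {
      val r = relByName(e.relationship)
      def ok(allowed: Set[String], id: String): Boolean =
        allowed.isEmpty || labelOf.get(id).exists(allowed.contains)
      ok(r.sourceTypes, e.source) && ok(r.targetTypes, e.target)
    }
    val (validEdges, badEndpoints) = knownRel.partition(endpointOk)

    val violations =
      oosNodes.map(n => Violation("out_of_schema_entity", s"Label '${n.label}' is not in the schema", Some(n.id))) ++
        badEndpoints.map { e =>
          val desc = s"${e.relationship} does not allow ${labelOf.getOrElse(e.source, "?")} -> ${labelOf.getOrElse(e.target, "?")}"
          Violation("invalid_relationship_endpoint", desc, Some(e.source))
        }
    Result(validNodes, validEdges, oosNodes, oosEdges, violations)
  }
}
```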
Represents a violation found during schema validation.
Value parameters
- description: Human-readable description of the violation
- entityId: Optional ID of the entity involved
- violationType: Category of violation (e.g., "out_of_schema_entity", "invalid_relationship_endpoint")
Attributes
- Supertypes: trait Serializable, trait Product, trait Equals, class Object, trait Matchable, class Any
A graph with source provenance tracking.
Wraps a standard Graph with metadata about which documents contributed each node and edge. The underlying Graph remains a pure data structure; provenance is tracked externally.
Value parameters
- edgeSources: Mapping of (source, target, relationship) to the set of source document IDs
- graph: The underlying knowledge graph
- nodeSources: Mapping of node ID to the set of source document IDs that contributed it
- sources: All document sources that contributed to this graph
Attributes
- Companion: object
- Supertypes: trait Serializable, trait Product, trait Equals, class Object, trait Matchable, class Any
SourceTrackedGraph companion object.
Attributes
- Companion: class
- Supertypes: trait Product, trait Mirror, class Object, trait Matchable, class Any
- Self type: SourceTrackedGraph.type
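Provenance bookkeeping for incremental building can be sketched as a union of per-key document-ID sets. `ProvenanceSketch` is hypothetical. It stores document IDs rather than full document sources and leaves out the underlying `Graph`, keeping only the `nodeSources` and `edgeSources` mappings, keyed as documented above.

```scala
object ProvenanceSketch {
  // Edge key as documented: (source, target, relationship).
  type EdgeKey = (String, String, String)

  // Hypothetical simplification of SourceTrackedGraph: provenance maps only, no Graph.
  final case class Tracked(
      sources: Set[String], // document IDs (the real class keeps full document sources)
      nodeSources: Map[String, Set[String]],
      edgeSources: Map[EdgeKey, Set[String]]
  )

  // Union two maps of sets, combining the sets on shared keys.
  private def unionAll[K](a: Map[K, Set[String]], b: Map[K, Set[String]]): Map[K, Set[String]] =
    b.foldLeft(a) { case (acc, (k, docs)) => acc.updated(k, acc.getOrElse(k, Set.empty[String]) ++ docs) }

  def merge(a: Tracked, b: Tracked): Tracked =
    Tracked(a.sources ++ b.sources, unionAll(a.nodeSources, b.nodeSources), unionAll(a.edgeSources, b.edgeSources))

  // Adds one document's extraction result to an existing tracked graph.
  def addDocument(g: Tracked, docId: String, nodeIds: Set[String], edges: Set[EdgeKey]): Tracked =
    merge(g, Tracked(Set(docId), nodeIds.map(_ -> Set(docId)).toMap, edges.map(_ -> Set(docId)).toMap))
}
```

Keeping provenance in side maps, rather than as node properties, matches the description above: the underlying graph stays a pure data structure.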
Result of validating an extracted graph against an ExtractionSchema.
Value parameters
- outOfSchemaEdges: Edges whose relationship does not match any schema relationship type
- outOfSchemaNodes: Nodes whose label does not match any schema entity type
- validGraph: Graph containing only nodes/edges that conform to the schema
- violations: Specific constraint violations found during validation
Attributes
- Supertypes: trait Serializable, trait Product, trait Equals, class Object, trait Matchable, class Any