org.llm4s.rag.evaluation

Members list

Type members

Classlikes

final case class ClaimVerification(claim: String, supported: Boolean, evidence: Option[String])

Verification result for a single claim in the Faithfulness metric.

Value parameters

claim

The extracted claim from the answer

evidence

Optional evidence from context that supports/refutes the claim

supported

Whether the claim is supported by the context

Attributes
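
Example

An illustrative construction (the claim and evidence strings are made up):

val verification = ClaimVerification(
  claim = "Paris is the capital of France.",
  supported = true,
  evidence = Some("Paris is the capital and largest city of France.")
)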

Supertypes
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any
final case class EvalResult(sample: EvalSample, metrics: Seq[MetricResult], ragasScore: Double, evaluatedAt: Long)

Complete evaluation result for a single sample.

Value parameters

evaluatedAt

Timestamp of evaluation

metrics

Results from each evaluated metric

ragasScore

Composite RAGAS score (mean of all metric scores)

sample

The evaluated sample

Attributes
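
Example

An illustrative sketch of building and reading an EvalResult; the metric values are made up, and ragasScore is shown as the documented mean of the metric scores:

val result = EvalResult(
  sample = sample, // some previously constructed EvalSample
  metrics = Seq(
    MetricResult("faithfulness", 0.9, Map.empty),
    MetricResult("answer_relevancy", 0.7, Map.empty)
  ),
  ragasScore = 0.8, // mean of 0.9 and 0.7
  evaluatedAt = System.currentTimeMillis()
)
result.metrics.foreach(m => println(s"${m.metricName}: ${m.score}"))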

Supertypes
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any
final case class EvalSample(question: String, answer: String, contexts: Seq[String], groundTruth: Option[String], metadata: Map[String, String])

A single evaluation sample containing all inputs needed for RAGAS metrics.

Value parameters

answer

The generated answer from the RAG system

contexts

The retrieved context documents used to generate the answer

groundTruth

Optional ground truth answer (required for precision/recall metrics)

metadata

Additional metadata for tracking/filtering

question

The user's query

Attributes
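
Example

An illustrative construction; all fields are passed explicitly here, and the values are made up:

val sample = EvalSample(
  question = "What is the capital of France?",
  answer = "Paris is the capital of France.",
  contexts = Seq("Paris is the capital and largest city of France."),
  groundTruth = Some("The capital of France is Paris."), // needed for precision/recall metrics
  metadata = Map("source" -> "smoke-test")
)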

Supertypes
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any
final case class EvalSummary(results: Seq[EvalResult], averages: Map[String, Double], overallRagasScore: Double, sampleCount: Int)

Summary of batch evaluation across multiple samples.

Value parameters

averages

Average score per metric across all samples

overallRagasScore

Average RAGAS score across all samples

results

Individual results for each sample

sampleCount

Number of samples evaluated

Attributes
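
Example

A sketch of reporting an EvalSummary, using only the fields documented above (the summary value is assumed to come from a batch evaluation):

def report(summary: EvalSummary): Unit = {
  println(s"Evaluated ${summary.sampleCount} samples")
  println(s"Overall RAGAS score: ${summary.overallRagasScore}")
  summary.averages.foreach { case (metric, avg) =>
    println(s"  $metric: $avg")
  }
}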

Supertypes
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any
final case class EvaluationError(code: Option[String], message: String) extends LLMError

Error type for evaluation failures.

Attributes

Companion
object
Supertypes
trait LLMError
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any
object EvaluationError

Attributes

Companion
class
Supertypes
trait Product
trait Mirror
class Object
trait Matchable
class Any
Self type
final case class EvaluatorOptions(parallelEvaluation: Boolean, maxConcurrency: Int, timeoutMs: Long)

Configuration options for the RAGAS evaluator.

Value parameters

maxConcurrency

Maximum concurrent metric evaluations

parallelEvaluation

Whether to evaluate metrics in parallel

timeoutMs

Timeout per metric evaluation in milliseconds

Attributes
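
Example

An illustrative configuration; the specific values (and whether the constructor provides defaults) are assumptions:

val options = EvaluatorOptions(
  parallelEvaluation = true, // run metrics concurrently
  maxConcurrency = 4,        // at most four metric evaluations in flight
  timeoutMs = 60000L         // one minute per metric
)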

Supertypes
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any
final case class MetricResult(metricName: String, score: Double, details: Map[String, Any])

Result of evaluating a single metric.

Value parameters

details

Metric-specific breakdown (e.g., individual claim scores)

metricName

Unique identifier of the metric (e.g., "faithfulness")

score

Score between 0.0 (worst) and 1.0 (best)

Attributes
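
Example

An illustrative result; the keys in details are metric-specific, and the ones shown here are hypothetical:

val result = MetricResult(
  metricName = "faithfulness",
  score = 0.75,
  details = Map("totalClaims" -> 4, "supportedClaims" -> 3) // hypothetical breakdown keys
)
println(s"${result.metricName}: ${result.score}")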

Supertypes
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any
class RAGASEvaluator(llmClient: LLMClient, embeddingClient: EmbeddingClient, embeddingModelConfig: EmbeddingModelConfig, metrics: Seq[RAGASMetric], options: EvaluatorOptions, tracer: Option[Tracing])

Main RAGAS evaluator that orchestrates all metrics.

RAGAS (Retrieval Augmented Generation Assessment) evaluates RAG pipelines across four dimensions:

  • Faithfulness: Are claims in the answer supported by context?
  • Answer Relevancy: Does the answer address the question?
  • Context Precision: Are relevant docs ranked at the top?
  • Context Recall: Were all relevant docs retrieved?

The composite RAGAS score is the mean of all evaluated metric scores.

Value parameters

embeddingClient

Embedding client for similarity calculations

embeddingModelConfig

Configuration for the embedding model

llmClient

LLM client for semantic evaluation (claim verification, relevance)

metrics

Custom metrics to use (defaults to all four RAGAS metrics)

options

Evaluation options (parallelism, timeouts)

tracer

Optional tracer for cost tracking

Attributes

Example
val evaluator = RAGASEvaluator(llmClient, embeddingClient, embeddingConfig)
val sample = EvalSample(
  question = "What is the capital of France?",
  answer = "Paris is the capital of France.",
  contexts = Seq("Paris is the capital and largest city of France."),
  groundTruth = Some("The capital of France is Paris.")
)
val result = evaluator.evaluate(sample)
result match {
  case Right(eval) =>
    println(s"RAGAS Score: ${eval.ragasScore}")
    eval.metrics.foreach { m =>
      println(s"  ${m.metricName}: ${m.score}")
    }
  case Left(error) =>
    println(s"Evaluation failed: ${error.message}")
}

Companion
object
Supertypes
class Object
trait Matchable
class Any
object RAGASEvaluator

Attributes

Companion
class
Supertypes
class Object
trait Matchable
class Any
Self type
object RAGASFactory

Factory for creating RAGAS evaluators and individual metrics.

Provides convenient methods to create evaluators from environment configuration or with specific settings.

Attributes

Example
// Create from environment
val evaluator = RAGASFactory.fromConfigs(providerCfg, embeddingCfg)

// Create with specific metrics
val basicEvaluator = RAGASFactory.withMetrics(
  llmClient, embeddingClient, embeddingConfig,
  Set("faithfulness", "answer_relevancy")
)

// Create individual metrics
val faithfulness = RAGASFactory.faithfulness(llmClient)

Supertypes
class Object
trait Matchable
class Any
Self type
class RAGASLangfuseObserver(langfuseUrl: String, publicKey: String, secretKey: String, environment: String, release: String, version: String, batchSender: LangfuseBatchSender)

Observer that logs RAGAS evaluation results to Langfuse.

Integrates with existing Langfuse tracing infrastructure to log:

  • Individual metric scores
  • Composite RAGAS scores
  • Evaluation details and metadata

Value parameters

environment

Environment name (e.g., "production", "development")

langfuseUrl

The Langfuse API URL

publicKey

Langfuse public key

release

Release version

secretKey

Langfuse secret key

version

API version

Attributes

Example
val observer = RAGASLangfuseObserver.fromTracingSettings(tracingSettings)
val result = evaluator.evaluate(sample)
result.foreach { evalResult =>
  observer.logEvaluation(evalResult)
}

Companion
object
Supertypes
class Object
trait Matchable
class Any
object RAGASLangfuseObserver

Attributes

Companion
class
Supertypes
class Object
trait Matchable
class Any
Self type
trait RAGASMetric

Base trait for RAGAS evaluation metrics.

Each metric evaluates a specific aspect of RAG quality and returns a score between 0.0 (worst) and 1.0 (best).

Implementations should:

  • Use LLM calls for semantic evaluation (faithfulness, relevancy)
  • Use embeddings for similarity calculations (answer relevancy)
  • Return detailed breakdowns in the MetricResult.details map

Attributes

Example
val faithfulness = new Faithfulness(llmClient)
val result = faithfulness.evaluate(sample)
result match {
  case Right(r) => println(s"Faithfulness: ${r.score}")
  case Left(e)  => println(s"Error: ${e.message}")
}
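
The trait's abstract members are not listed on this page, so the sketch below is only a guess at a custom metric: the member names and the evaluate signature are assumptions inferred from the usage above and from MetricResult/EvaluationError, and RAGASMetric may declare additional members (for example, required inputs) not shown here.

// Hypothetical sketch: assumes RAGASMetric exposes a metric name and an
// evaluate method returning Either[EvaluationError, MetricResult].
class AnswerLength extends RAGASMetric {
  val name: String = "answer_length"

  def evaluate(sample: EvalSample): Either[EvaluationError, MetricResult] = {
    // Score 1.0 for concise answers, falling linearly to 0.0 at 500 characters.
    val score = math.max(0.0, 1.0 - sample.answer.length / 500.0)
    Right(MetricResult(name, score, Map("answerLength" -> sample.answer.length)))
  }
}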
Supertypes
class Object
trait Matchable
class Any
Known subtypes
sealed trait RequiredInput

Enumeration of possible required inputs for RAGAS metrics.

Attributes
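
Example

An illustrative check that a sample provides a given input; this assumes the case objects are nested in the RequiredInput companion, and the mapping simply follows the EvalSample fields:

def isProvided(input: RequiredInput, sample: EvalSample): Boolean = input match {
  case RequiredInput.Question    => sample.question.nonEmpty
  case RequiredInput.Answer      => sample.answer.nonEmpty
  case RequiredInput.Contexts    => sample.contexts.nonEmpty
  case RequiredInput.GroundTruth => sample.groundTruth.isDefined
}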

Companion
object
Supertypes
class Object
trait Matchable
class Any
Known subtypes
object Answer
object Contexts
object GroundTruth
object Question
object RequiredInput

Attributes

Companion
trait
Supertypes
trait Sum
trait Mirror
class Object
trait Matchable
class Any
Self type
final case class TestDataset(name: String, samples: Seq[EvalSample], metadata: Map[String, String])

Test dataset for RAG evaluation.

Supports loading from JSON files and generating synthetic test cases from documents using an LLM.

Value parameters

metadata

Additional metadata for tracking/filtering

name

Name identifier for this dataset

samples

The evaluation samples

Attributes

Example
// Load from file
val dataset = TestDataset.fromJsonFile("test_cases.json")

// Generate synthetic test cases
val generated = TestDataset.generateFromDocuments(
  documents = Seq("Paris is the capital of France...", "Tokyo is the capital of Japan..."),
  llmClient = client,
  samplesPerDoc = 3
)

// Save to file
TestDataset.save(dataset, "output.json")

Companion
object
Supertypes
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any
object TestDataset

Attributes

Companion
class
Supertypes
trait Product
trait Mirror
class Object
trait Matchable
class Any
Self type