RAGASEvaluator

org.llm4s.rag.evaluation.RAGASEvaluator
See the RAGASEvaluator companion object
class RAGASEvaluator(llmClient: LLMClient, embeddingClient: EmbeddingClient, embeddingModelConfig: EmbeddingModelConfig, metrics: Seq[RAGASMetric], options: EvaluatorOptions, tracer: Option[Tracing])

Main RAGAS evaluator that orchestrates all metrics.

RAGAS (Retrieval Augmented Generation Assessment) evaluates RAG pipelines across four dimensions:

  • Faithfulness: Are claims in the answer supported by context?
  • Answer Relevancy: Does the answer address the question?
  • Context Precision: Are relevant docs ranked at the top?
  • Context Recall: Were all relevant docs retrieved?

The composite RAGAS score is the mean of all evaluated metric scores.
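For instance, a minimal sketch of how the composite is derived; the metric names and scores below are illustrative placeholders, not output from a real run:

// The composite is the arithmetic mean of the metrics that were actually
// evaluated for the sample (skipped metrics do not count toward it).
val scores = Map(
  "faithfulness"      -> 0.92,
  "answer_relevancy"  -> 0.88,
  "context_precision" -> 0.75,
  "context_recall"    -> 0.81
)
val ragasScore = scores.values.sum / scores.size // 0.84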

Value parameters

embeddingClient

Embedding client for similarity calculations

embeddingModelConfig

Configuration for the embedding model

llmClient

LLM client for semantic evaluation (claim verification, relevance)

metrics

Custom metrics to use (defaults to all four RAGAS metrics)

options

Evaluation options (parallelism, timeouts)

tracer

Optional tracer for cost tracking

Attributes

Example
val evaluator = RAGASEvaluator(llmClient, embeddingClient, embeddingConfig)
val sample = EvalSample(
  question = "What is the capital of France?",
  answer = "Paris is the capital of France.",
  contexts = Seq("Paris is the capital and largest city of France."),
  groundTruth = Some("The capital of France is Paris.")
)
val result = evaluator.evaluate(sample)
result match {
  case Right(eval) =>
    println(s"RAGAS Score: ${eval.ragasScore}")
    eval.metrics.foreach { m =>
      println(s"  ${m.metricName}: ${m.score}")
    }
  case Left(error) =>
    println(s"Evaluation failed: ${error.message}")
}

Companion: object
Supertypes: class Object, trait Matchable, class Any

Members list

Value members

Concrete methods

Evaluate a single sample against all applicable metrics.

Only metrics whose required inputs are present in the sample will be evaluated. For example, Context Precision requires ground_truth, so it will be skipped if the sample doesn't have ground_truth.

Value parameters

sample

The evaluation sample

Attributes

Returns

Evaluation result with all metric scores and composite RAGAS score
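A hedged sketch of the skipping behaviour, reusing the evaluator and field names from the class-level example; the exact set of surviving metrics depends on which inputs each metric requires:

val sampleNoTruth = EvalSample(
  question = "What is the capital of France?",
  answer = "Paris is the capital of France.",
  contexts = Seq("Paris is the capital and largest city of France."),
  groundTruth = None // no ground truth supplied
)
evaluator.evaluate(sampleNoTruth) match {
  case Right(eval) =>
    // Metrics whose required inputs are missing (e.g. Context Precision
    // without ground_truth) are simply absent from the result; the composite
    // score is the mean of whatever was evaluated.
    eval.metrics.foreach(m => println(s"${m.metricName}: ${m.score}"))
  case Left(error) =>
    println(s"Evaluation failed: ${error.message}")
}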

Evaluate multiple samples.

Value parameters

samples

The evaluation samples

Attributes

Returns

Summary with individual results and aggregate statistics
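A brief usage sketch, assuming this overload accepts a Seq[EvalSample]; the fields available on the returned summary are not shown here, so the example only prints it:

val samples: Seq[EvalSample] = Seq(sample /* , further samples... */)
evaluator.evaluate(samples) match {
  case Right(summary) => println(s"Evaluated ${samples.size} samples: $summary")
  case Left(error)    => println(s"Batch evaluation failed: ${error.message}")
}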

Evaluate from a test dataset.

Value parameters

dataset

The test dataset containing samples

Attributes

Returns

Summary with individual results and aggregate statistics

def evaluateMetric(sample: EvalSample, metricName: String): Result[MetricResult]

Evaluate a single metric on a sample.

Useful for debugging or when only one metric is needed.

Value parameters

metricName

The name of the metric to evaluate

sample

The evaluation sample

Attributes

Returns

The metric result or an error
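For example, checking just one metric while debugging; the metric name string "faithfulness" is an assumption about how the built-in metrics are registered:

// Evaluate only one metric for the sample from the class-level example.
evaluator.evaluateMetric(sample, "faithfulness") match {
  case Right(metric) => println(s"${metric.metricName}: ${metric.score}")
  case Left(error)   => println(s"Metric evaluation failed: ${error.message}")
}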

Get the list of active metrics.

Attributes

def withMetrics(metricNames: Set[String]): RAGASEvaluator

Create a new evaluator with only specific metrics enabled.

Value parameters

metricNames

The names of metrics to enable

Attributes

Returns

A new evaluator with only the specified metrics
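A short sketch; the metric name strings are assumptions about how the built-in metrics are registered:

// Restrict evaluation to the answer-focused metrics only.
val answerOnly = evaluator.withMetrics(Set("faithfulness", "answer_relevancy"))
val answerResult = answerOnly.evaluate(sample)

Because a new evaluator is returned, the original instance keeps its full metric set.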

Create a new evaluator with different options.

Value parameters

newOptions

The new evaluation options

Attributes

Returns

A new evaluator with the specified options

Create a new evaluator with tracing enabled.

Value parameters

newTracer

The tracer to use for cost tracking

Attributes

Returns

A new evaluator with tracing enabled