RAGASEvaluator

org.llm4s.rag.evaluation.RAGASEvaluator
See the RAGASEvaluator companion object
class RAGASEvaluator(llmClient: LLMClient, embeddingClient: EmbeddingClient, embeddingModelConfig: EmbeddingModelConfig, metrics: Seq[RAGASMetric], options: EvaluatorOptions, tracer: Option[Tracing])

Main RAGAS evaluator that orchestrates all metrics.

RAGAS (Retrieval Augmented Generation Assessment) evaluates RAG pipelines across four dimensions:

  • Faithfulness: Are claims in the answer supported by context?
  • Answer Relevancy: Does the answer address the question?
  • Context Precision: Are relevant docs ranked at the top?
  • Context Recall: Were all relevant docs retrieved?

The composite RAGAS score is the mean of all evaluated metric scores.
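For instance, a minimal sketch of how the composite is derived; the metric names and scores below are illustrative placeholders, not output from a real run:

// The composite is the arithmetic mean of the metrics that were actually
// evaluated for the sample (skipped metrics do not count toward it).
val scores = Map(
  "faithfulness"      -> 0.92,
  "answer_relevancy"  -> 0.88,
  "context_precision" -> 0.75,
  "context_recall"    -> 0.81
)
val ragasScore = scores.values.sum / scores.size // 0.84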

Value parameters

embeddingClient

Embedding client for similarity calculations

embeddingModelConfig

Configuration for the embedding model

llmClient

LLM client for semantic evaluation (claim verification, relevance)

metrics

Custom metrics to use (defaults to all four RAGAS metrics)

options

Evaluation options (parallelism, timeouts)

tracer

Optional tracer for cost tracking

Attributes

Example
val evaluator = RAGASEvaluator(llmClient, embeddingClient, embeddingConfig)
val sample = EvalSample(
  question = "What is the capital of France?",
  answer = "Paris is the capital of France.",
  contexts = Seq("Paris is the capital and largest city of France."),
  groundTruth = Some("The capital of France is Paris.")
)
val result = evaluator.evaluate(sample)
result match {
  case Right(eval) =>
    println(s"RAGAS Score: ${eval.ragasScore}")
    eval.metrics.foreach { m =>
      println(s"  ${m.metricName}: ${m.score}")
    }
  case Left(error) =>
    println(s"Evaluation failed: ${error.message}")
}

Companion: object
Supertypes: class Object, trait Matchable, class Any

Members list

Value members

Concrete methods

Evaluate a single sample against all applicable metrics.

Only metrics whose required inputs are present in the sample will be evaluated. For example, Context Precision requires ground_truth, so it will be skipped if the sample doesn't have ground_truth.

Value parameters

sample

The evaluation sample

Attributes

Returns

Evaluation result with all metric scores and composite RAGAS score
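A hedged sketch of the skipping behaviour, reusing the evaluator and field names from the class-level example; the exact set of surviving metrics depends on which inputs each metric requires:

val sampleNoTruth = EvalSample(
  question = "What is the capital of France?",
  answer = "Paris is the capital of France.",
  contexts = Seq("Paris is the capital and largest city of France."),
  groundTruth = None // no ground truth supplied
)
evaluator.evaluate(sampleNoTruth) match {
  case Right(eval) =>
    // Metrics whose required inputs are missing (e.g. Context Precision
    // without ground_truth) are simply absent from the result; the composite
    // score is the mean of whatever was evaluated.
    eval.metrics.foreach(m => println(s"${m.metricName}: ${m.score}"))
  case Left(error) =>
    println(s"Evaluation failed: ${error.message}")
}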

Evaluate multiple samples.

Value parameters

samples

The evaluation samples

Attributes

Returns

Summary with individual results and aggregate statistics
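A brief usage sketch, assuming this overload accepts a Seq[EvalSample]; the fields available on the returned summary are not shown here, so the example only prints it:

val samples: Seq[EvalSample] = Seq(sample /* , further samples... */)
evaluator.evaluate(samples) match {
  case Right(summary) => println(s"Evaluated ${samples.size} samples: $summary")
  case Left(error)    => println(s"Batch evaluation failed: ${error.message}")
}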

Evaluate from a test dataset.

Value parameters

dataset

The test dataset containing samples

Attributes

Returns

Summary with individual results and aggregate statistics

def evaluateMetric(sample: EvalSample, metricName: String): Result[MetricResult]

Evaluate a single metric on a sample.

Useful for debugging or when only one metric is needed.

Value parameters

metricName

The name of the metric to evaluate

sample

The evaluation sample

Attributes

Returns

The metric result or an error
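For example, checking just one metric while debugging; the metric name string "faithfulness" is an assumption about how the built-in metrics are registered:

// Evaluate only one metric for the sample from the class-level example.
evaluator.evaluateMetric(sample, "faithfulness") match {
  case Right(metric) => println(s"${metric.metricName}: ${metric.score}")
  case Left(error)   => println(s"Metric evaluation failed: ${error.message}")
}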

Get the list of active metrics.

Attributes

def withMetrics(metricNames: Set[String]): RAGASEvaluator

Create a new evaluator with only specific metrics enabled.

Value parameters

metricNames

The names of metrics to enable

Attributes

Returns

A new evaluator with only the specified metrics
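A short sketch; the metric name strings are assumptions about how the built-in metrics are registered:

// Restrict evaluation to the answer-focused metrics only.
val answerOnly = evaluator.withMetrics(Set("faithfulness", "answer_relevancy"))
val answerResult = answerOnly.evaluate(sample)

Because a new evaluator is returned, the original instance keeps its full metric set.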

Create a new evaluator with different options.

Value parameters

newOptions

The new evaluation options

Attributes

Returns

A new evaluator with the specified options

Create a new evaluator with tracing enabled.

Value parameters

newTracer

The tracer to use for cost tracking

Attributes

Returns

A new evaluator with tracing enabled