RAGASMetric

org.llm4s.rag.evaluation.RAGASMetric
trait RAGASMetric

Base trait for RAGAS evaluation metrics.

Each metric evaluates a specific aspect of RAG quality and returns a score between 0.0 (worst) and 1.0 (best).

Implementations should:

  • Use LLM calls for semantic evaluation (faithfulness, relevancy)
  • Use embeddings for similarity calculations (answer relevancy)
  • Return detailed breakdowns in the MetricResult.details map
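
The contract above (a score in the 0.0 to 1.0 range plus a details map) can be illustrated with a toy metric. This is a minimal sketch, not the library's code: `SketchMetric`, `ContextOverlap`, `EvalError`, and the field names on `EvalSample` and `MetricResult` are hypothetical stand-ins so the example compiles on its own, and it uses plain token overlap instead of an LLM or embeddings purely to stay runnable.

```scala
// Hypothetical stand-ins for the library types (assumed shapes, not the real API).
final case class EvalSample(question: String, answer: String, contexts: Seq[String])
final case class MetricResult(metricName: String, score: Double, details: Map[String, Any])
final case class EvalError(message: String)

// Simplified, assumed version of the metric contract.
trait SketchMetric {
  def name: String
  def description: String
  def evaluate(sample: EvalSample): Either[EvalError, MetricResult]
}

// Toy metric: fraction of distinct answer tokens that also occur in the contexts.
object ContextOverlap extends SketchMetric {
  val name = "context_overlap"
  val description = "Fraction of distinct answer tokens found in the retrieved contexts"

  // Lowercase the text and split on runs of non-word characters.
  private def tokens(text: String): Set[String] =
    text.toLowerCase.split("\\W+").filter(_.nonEmpty).toSet

  def evaluate(sample: EvalSample): Either[EvalError, MetricResult] = {
    val answerTokens = tokens(sample.answer)
    if (answerTokens.isEmpty) Left(EvalError("answer is empty"))
    else {
      val contextTokens = sample.contexts.flatMap(c => tokens(c)).toSet
      val matched = answerTokens.intersect(contextTokens)
      // matched is a subset of answerTokens, so the score stays within [0.0, 1.0].
      val score = matched.size.toDouble / answerTokens.size
      Right(MetricResult(name, score, Map("matched" -> matched.size, "total" -> answerTokens.size)))
    }
  }
}
```

Returning the matched and total counts in `details` lets a caller see how the score was derived, not just its value.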

Example
val faithfulness = new Faithfulness(llmClient)
val result = faithfulness.evaluate(sample)
// evaluate returns either an error (Left) or a MetricResult (Right)
result match {
  case Right(r) => println(s"Faithfulness: ${r.score}")
  case Left(e)  => println(s"Error: ${e.message}")
}
Supertypes
class Object
trait Matchable
class Any
Members list

Value members

Abstract methods

def description: String

Human-readable description of what this metric measures.

Evaluate a single sample.

Value parameters

sample

The evaluation sample, containing the question, answer, and contexts

Returns

Score between 0.0 and 1.0, with optional details

def name: String

Unique name of this metric (e.g., "faithfulness", "answer_relevancy"). Used as an identifier in results and configuration.

Which inputs this metric requires from an EvalSample. Used to skip metrics when required inputs are missing.

Concrete methods

def canEvaluate(sample: EvalSample): Boolean

Check if this metric can be evaluated for a given sample.
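
One plausible way such a check could work is to compare the inputs a metric requires against the inputs actually populated on the sample. The sketch below is an assumption, not the library's implementation: the `Input` tags, the `InputCheck` helper, and the `groundTruth` field on `EvalSample` are all hypothetical names introduced for illustration.

```scala
// Hypothetical input tags; the library's actual required-inputs type is not shown in this page.
sealed trait Input
object Input {
  case object Question extends Input
  case object Answer extends Input
  case object Contexts extends Input
  case object GroundTruth extends Input
}

// Hypothetical sample shape; groundTruth is an assumed optional field.
final case class EvalSample(
  question: String,
  answer: String,
  contexts: Seq[String],
  groundTruth: Option[String] = None
)

object InputCheck {
  // Collect the inputs that are actually populated on this sample.
  def available(sample: EvalSample): Set[Input] = {
    val present: Seq[(Input, Boolean)] = Seq(
      Input.Question    -> sample.question.trim.nonEmpty,
      Input.Answer      -> sample.answer.trim.nonEmpty,
      Input.Contexts    -> sample.contexts.nonEmpty,
      Input.GroundTruth -> sample.groundTruth.exists(_.trim.nonEmpty)
    )
    present.collect { case (input, true) => input }.toSet
  }

  // A metric is evaluable when every required input is available.
  def canEvaluate(required: Set[Input], sample: EvalSample): Boolean =
    required.subsetOf(available(sample))
}
```

With a check like this, a runner can skip a metric for a sample that lacks, say, a ground-truth answer, instead of failing the whole evaluation.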

def evaluateBatch(samples: Seq[EvalSample]): Result[Seq[MetricResult]]

Evaluate multiple samples.

Default implementation evaluates sequentially. Override for batch optimizations (e.g., batched LLM calls).

Value parameters

samples

The evaluation samples

Returns

Results for each sample in order
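
Sequential evaluation over an `Either`-based result can be sketched as a left fold that preserves input order. This is an illustration under stated assumptions, not the trait's actual default: `SequentialBatch`, `EvalError`, and the two-field `MetricResult` are hypothetical, and stopping at the first failure is an assumed behavior that the documentation above does not specify.

```scala
// Hypothetical stand-ins (assumed shapes, not the real API).
final case class EvalError(message: String)
final case class MetricResult(metricName: String, score: Double)

object SequentialBatch {
  // Evaluate samples one at a time, left to right, keeping results in input order.
  // Once a sample fails, the accumulator is a Left and `evaluate` is no longer called.
  def evaluateAll[A](samples: Seq[A])(
    evaluate: A => Either[EvalError, MetricResult]
  ): Either[EvalError, Seq[MetricResult]] =
    samples.foldLeft[Either[EvalError, Vector[MetricResult]]](Right(Vector.empty)) {
      (acc, sample) =>
        for {
          done <- acc              // short-circuits if an earlier sample failed
          next <- evaluate(sample) // only runs while acc is still a Right
        } yield done :+ next
    }
}
```

An override aimed at batch optimization would replace the per-sample `evaluate` call with a single batched request, while still returning results in the same order as the input.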