org.llm4s.rag.evaluation

Type members
Verification result for a single claim in the Faithfulness metric.

Value parameters
- claim: The extracted claim from the answer
- evidence: Optional evidence from context that supports or refutes the claim
- supported: Whether the claim is supported by the context

Attributes
- Supertypes: trait Serializable, trait Product, trait Equals, class Object, trait Matchable, class Any
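A sketch of the shape this entry describes. The generated listing omits the class name, so the name below is a placeholder, and the field types are inferred from the parameter descriptions:

    // Placeholder name and assumed field types; only the parameter names
    // and meanings come from the listing above.
    final case class ClaimVerification(
      claim: String,           // the extracted claim from the answer
      supported: Boolean,      // whether the context supports the claim
      evidence: Option[String] // evidence that supports or refutes the claim
    )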
Complete evaluation result for a single sample.

Value parameters
- evaluatedAt: Timestamp of evaluation
- metrics: Results from each evaluated metric
- ragasScore: Composite RAGAS score (mean of all metric scores)
- sample: The evaluated sample

Attributes
- Supertypes: trait Serializable, trait Product, trait Equals, class Object, trait Matchable, class Any
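A sketch of the inferred shape (placeholder name; types guessed from the descriptions). Per the documentation, the composite score is the mean of the per-metric scores:

    // Placeholder name and assumed types; ragasScore is documented as the
    // mean of all metric scores, i.e. metrics.map(_.score).sum / metrics.size.
    final case class EvalResult(
      sample: EvalSample,            // the evaluated sample
      metrics: Seq[MetricResult],    // result from each evaluated metric
      ragasScore: Double,            // mean of all metric scores
      evaluatedAt: java.time.Instant // timestamp of evaluation
    )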
A single evaluation sample containing all inputs needed for RAGAS metrics.

Value parameters
- answer: The generated answer from the RAG system
- contexts: The retrieved context documents used to generate the answer
- groundTruth: Optional ground truth answer (required for precision/recall metrics)
- metadata: Additional metadata for tracking/filtering
- question: The user's query

Attributes
- Supertypes: trait Serializable, trait Product, trait Equals, class Object, trait Matchable, class Any
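A minimal construction sketch. EvalSample appears by name in the RAGASEvaluator example further down; the metadata field's Map[String, String] type is an assumption:

    // groundTruth is optional but required by the precision/recall metrics;
    // the Map[String, String] type for metadata is an assumption.
    val sample = EvalSample(
      question    = "What is the capital of France?",
      answer      = "Paris is the capital of France.",
      contexts    = Seq("Paris is the capital and largest city of France."),
      groundTruth = Some("The capital of France is Paris."),
      metadata    = Map("source" -> "smoke-test")
    )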
Summary of batch evaluation across multiple samples.

Value parameters
- averages: Average score per metric across all samples
- overallRagasScore: Average RAGAS score across all samples
- results: Individual results for each sample
- sampleCount: Number of samples evaluated

Attributes
- Supertypes: trait Serializable, trait Product, trait Equals, class Object, trait Matchable, class Any
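A usage sketch for the batch summary. The class name is a placeholder and the field types are assumptions based on the documented parameters:

    // Placeholder name; averages is assumed to be a Map from metric name
    // to its mean score across all samples.
    def report(batch: BatchEvalResult): Unit = {
      println(s"Samples evaluated: ${batch.sampleCount}")
      println(s"Overall RAGAS score: ${batch.overallRagasScore}")
      batch.averages.foreach { case (metric, avg) =>
        println(s"  $metric: $avg")
      }
    }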
Companion object for EvaluationError.

Attributes
- Companion: class
- Supertypes: trait Product, trait Mirror, class Object, trait Matchable, class Any
- Self type: EvaluationError.type
Configuration options for the RAGAS evaluator.

Value parameters
- maxConcurrency: Maximum concurrent metric evaluations
- parallelEvaluation: Whether to evaluate metrics in parallel
- timeoutMs: Timeout per metric evaluation in milliseconds

Attributes
- Supertypes: trait Serializable, trait Product, trait Equals, class Object, trait Matchable, class Any
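A construction sketch. The class name is a placeholder and the default values are unknown; parameter names follow the documented value parameters:

    // Placeholder name; defaults are unknown, so all fields are set explicitly.
    val options = EvaluatorOptions(
      parallelEvaluation = true,  // evaluate metrics concurrently
      maxConcurrency     = 4,     // at most four metric evaluations in flight
      timeoutMs          = 30000L // 30-second budget per metric
    )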
Result of evaluating a single metric.

Value parameters
- details: Metric-specific breakdown (e.g., individual claim scores)
- metricName: Unique identifier of the metric (e.g., "faithfulness")
- score: Score between 0.0 (worst) and 1.0 (best)

Attributes
- Supertypes: trait Serializable, trait Product, trait Equals, class Object, trait Matchable, class Any
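A sketch of the inferred shape. MetricResult is named in the metric trait's description below, and metricName/score appear in the examples; the value type of details is an assumption:

    // Field names follow the documented parameters; the details value
    // type is an assumption.
    final case class MetricResult(
      metricName: String,          // e.g. "faithfulness"
      score: Double,               // 0.0 (worst) to 1.0 (best)
      details: Map[String, String] // metric-specific breakdown
    )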
Main RAGAS evaluator that orchestrates all metrics.

RAGAS (Retrieval Augmented Generation Assessment) evaluates RAG pipelines across four dimensions:
- Faithfulness: Are claims in the answer supported by context?
- Answer Relevancy: Does the answer address the question?
- Context Precision: Are relevant docs ranked at the top?
- Context Recall: Were all relevant docs retrieved?

The composite RAGAS score is the mean of all evaluated metric scores.

Value parameters
- embeddingClient: Embedding client for similarity calculations
- embeddingModelConfig: Configuration for the embedding model
- llmClient: LLM client for semantic evaluation (claim verification, relevance)
- metrics: Custom metrics to use (defaults to all four RAGAS metrics)
- options: Evaluation options (parallelism, timeouts)
- tracer: Optional tracer for cost tracking

Attributes
- Example:
    val evaluator = RAGASEvaluator(llmClient, embeddingClient, embeddingConfig)
    val sample = EvalSample(
      question = "What is the capital of France?",
      answer = "Paris is the capital of France.",
      contexts = Seq("Paris is the capital and largest city of France."),
      groundTruth = Some("The capital of France is Paris.")
    )
    val result = evaluator.evaluate(sample)
    result match {
      case Right(eval) =>
        println(s"RAGAS Score: ${eval.ragasScore}")
        eval.metrics.foreach { m =>
          println(s"  ${m.metricName}: ${m.score}")
        }
      case Left(error) =>
        println(s"Evaluation failed: ${error.message}")
    }
- Companion: object
- Supertypes: class Object, trait Matchable, class Any
Companion object for RAGASEvaluator.

Attributes
- Companion: class
- Supertypes: class Object, trait Matchable, class Any
- Self type: RAGASEvaluator.type
Factory for creating RAGAS evaluators and individual metrics.

Provides convenient methods to create evaluators from environment configuration or with specific settings.

Attributes
- Example:
    // Create from environment
    val evaluator = RAGASFactory.fromConfigs(providerCfg, embeddingCfg)

    // Create with specific metrics
    val basicEvaluator = RAGASFactory.withMetrics(
      llmClient,
      embeddingClient,
      embeddingConfig,
      Set("faithfulness", "answer_relevancy")
    )

    // Create individual metrics
    val faithfulness = RAGASFactory.faithfulness(llmClient)
- Supertypes: class Object, trait Matchable, class Any
- Self type: RAGASFactory.type
Observer that logs RAGAS evaluation results to Langfuse.

Integrates with existing Langfuse tracing infrastructure to log:
- Individual metric scores
- Composite RAGAS scores
- Evaluation details and metadata

Value parameters
- environment: Environment name (e.g., "production", "development")
- langfuseUrl: The Langfuse API URL
- publicKey: Langfuse public key
- release: Release version
- secretKey: Langfuse secret key
- version: API version

Attributes
- Example:
    val observer = RAGASLangfuseObserver.fromTracingSettings(tracingSettings)
    val result = evaluator.evaluate(sample)
    result.foreach { evalResult =>
      observer.logEvaluation(evalResult)
    }
- Companion: object
- Supertypes: class Object, trait Matchable, class Any
Companion object for RAGASLangfuseObserver.

Attributes
- Companion: class
- Supertypes: class Object, trait Matchable, class Any
- Self type: RAGASLangfuseObserver.type
Base trait for RAGAS evaluation metrics.

Each metric evaluates a specific aspect of RAG quality and returns a score between 0.0 (worst) and 1.0 (best).

Implementations should:
- Use LLM calls for semantic evaluation (faithfulness, relevancy)
- Use embeddings for similarity calculations (answer relevancy)
- Return detailed breakdowns in the MetricResult.details map

A sketch of a custom metric conforming to this contract follows the attribute list.

Attributes
- Example:
    val faithfulness = new Faithfulness(llmClient)
    val result = faithfulness.evaluate(sample)
    result match {
      case Right(r) => println(s"Faithfulness: ${r.score}")
      case Left(e)  => println(s"Error: ${e.message}")
    }
- Supertypes: class Object, trait Matchable, class Any
- Known subtypes
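A hedged sketch of a custom metric against this trait. The trait name, evaluate signature, and error type are assumptions based on the usage shown above (evaluate returning an Either of an error and a MetricResult):

    // Assumed trait name and signature. Real metrics would use LLM calls or
    // embeddings for semantic evaluation; this toy heuristic only rewards
    // concise answers, to illustrate the contract.
    class AnswerConciseness extends RAGASMetric {
      def evaluate(sample: EvalSample): Either[EvaluationError, MetricResult] = {
        val words = sample.answer.split("\\s+").count(_.nonEmpty)
        val score = math.min(1.0, 50.0 / math.max(words, 1)) // 1.0 up to 50 words
        Right(MetricResult("answer_conciseness", score,
          Map("words" -> words.toString)))
      }
    }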
Enumeration of possible required inputs for RAGAS metrics.

Attributes
- Companion: object
- Supertypes: class Object, trait Matchable, class Any
- Known subtypes
Companion object for RequiredInput.

Attributes
- Companion: trait
- Supertypes: trait Sum, trait Mirror, class Object, trait Matchable, class Any
- Self type: RequiredInput.type
Test dataset for RAG evaluation.

Supports loading from JSON files and generating synthetic test cases from documents using an LLM.

Value parameters
- metadata: Additional metadata for tracking/filtering
- name: Name identifier for this dataset
- samples: The evaluation samples

Attributes
- Example:
    // Load from file
    val dataset = TestDataset.fromJsonFile("test_cases.json")

    // Generate synthetic test cases
    val generated = TestDataset.generateFromDocuments(
      documents = Seq("Paris is the capital of France...", "Tokyo is the capital of Japan..."),
      llmClient = client,
      samplesPerDoc = 3
    )

    // Save to file
    TestDataset.save(dataset, "output.json")
- Companion: object
- Supertypes: trait Serializable, trait Product, trait Equals, class Object, trait Matchable, class Any
Companion object for TestDataset.

Attributes
- Companion: class
- Supertypes: trait Product, trait Mirror, class Object, trait Matchable, class Any
- Self type: TestDataset.type