org.llm4s.rag.benchmark

Members list

Type members

Classlikes

object BenchmarkReport

Report generator for benchmark results.

Supports multiple output formats:

  • Console: Formatted text for terminal display
  • JSON: Machine-readable format for processing
  • Markdown: Documentation-ready format
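
A brief usage sketch: only BenchmarkReport.console appears on this page (in the BenchmarkRunner example below), so the json and markdown calls here are assumed analogues rather than confirmed API.

// Hedged sketch: render one BenchmarkResults value in each format.
// `console` is taken from the BenchmarkRunner example on this page;
// `json` and `markdown` are assumed analogues and may differ in the real API.
val results: BenchmarkResults = runner.runSuite(suite)
println(BenchmarkReport.console(results))          // formatted text for the terminal
val asJson     = BenchmarkReport.json(results)     // assumption: machine-readable output
val asMarkdown = BenchmarkReport.markdown(results) // assumption: documentation-ready output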

Attributes

Supertypes
class Object
trait Matchable
class Any
Self type
BenchmarkReport.type
final case class BenchmarkResults(suite: BenchmarkSuite, results: Seq[ExperimentResult], startTime: Long, endTime: Long)

Results from running a complete benchmark suite.

Value parameters

endTime

When the benchmark completed

results

Results for each experiment

startTime

When the benchmark started

suite

The benchmark suite that was run
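
A minimal sketch using only the fields above, plus ExperimentResult.error documented later on this page; it assumes startTime and endTime are epoch milliseconds.

// Sketch: derive a one-line summary from the constructor fields.
def summarize(r: BenchmarkResults): String = {
  val wallClockMs = r.endTime - r.startTime                        // assumes epoch millis
  val (failed, succeeded) = r.results.partition(_.error.isDefined) // ExperimentResult.error
  s"${r.suite.name}: ${succeeded.size} succeeded, ${failed.size} failed in ${wallClockMs} ms"
}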

Attributes

Companion
object
Supertypes
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any
object BenchmarkResults

Attributes

Companion
class
Supertypes
trait Product
trait Mirror
class Object
trait Matchable
class Any
Self type
BenchmarkResults.type
class BenchmarkRunner(llmClient: LLMClient, embeddingClient: EmbeddingClient, resolveEmbeddingProvider: String => Result[EmbeddingProviderConfig], datasetManager: DatasetManager, val options: BenchmarkRunnerOptions)

Main execution engine for RAG benchmarks.

Orchestrates the full benchmark workflow:

  1. Load dataset
  2. For each experiment configuration:
     a. Create RAG pipeline with config
     b. Index documents
     c. Run queries and generate answers
     d. Evaluate with RAGAS metrics
     e. Collect timing and results
  3. Aggregate results and generate reports

Value parameters

datasetManager

Dataset loading manager

embeddingClient

Default embedding client

llmClient

LLM client for answer generation and evaluation

options

Runner configuration options

Attributes

Example
val runner = BenchmarkRunner(llmClient, embeddingClient, resolveEmbeddingProvider)
val suite = BenchmarkSuite.chunkingSuite("data/datasets/ragbench/test.jsonl")
val results = runner.runSuite(suite)
println(BenchmarkReport.console(results))
Companion
object
Supertypes
class Object
trait Matchable
class Any

object BenchmarkRunner

Attributes

Companion
class
Supertypes
class Object
trait Matchable
class Any
Self type
BenchmarkRunner.type
final case class BenchmarkRunnerOptions(verbose: Boolean, parallelExperiments: Boolean, saveIntermediateResults: Boolean, outputDir: String)

Configuration options for the benchmark runner.

Value parameters

outputDir

Directory for saving results

parallelExperiments

Run experiments in parallel (not yet implemented)

saveIntermediateResults

Save results after each experiment

verbose

Enable verbose logging
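
A construction sketch from the signature above; the values are illustrative and any real defaults are not documented here.

// Sketch: explicit options for a verbose, sequential run.
val opts = BenchmarkRunnerOptions(
  verbose = true,
  parallelExperiments = false,    // parallel execution is not yet implemented
  saveIntermediateResults = true, // write results after each experiment
  outputDir = "benchmark-output"  // illustrative path
)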

Attributes

Supertypes
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any
final case class BenchmarkSuite(name: String, description: String, experiments: Seq[RAGExperimentConfig], datasetPath: String, subsetSize: Option[Int], seed: Long)

A suite of benchmark experiments to run together.

Groups related experiments for systematic comparison. Provides pre-built suites for common comparison scenarios.
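
Besides the pre-built suites (see the chunkingSuite example below), a suite can be assembled directly from the constructor. In this sketch, configA and configB stand for RAGExperimentConfig values defined elsewhere.

// Sketch: a hand-assembled suite limited to a quick 50-sample subset.
val customSuite = BenchmarkSuite(
  name = "chunking-quick-check",
  description = "Compare two chunking strategies on a small subset",
  experiments = Seq(configA, configB), // RAGExperimentConfig values defined elsewhere
  datasetPath = "data/datasets/ragbench/test.jsonl",
  subsetSize = Some(50),
  seed = 42L
)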

Value parameters

datasetPath

Path to the evaluation dataset JSON file

description

What this suite tests

experiments

The experiments in this suite

name

Suite identifier

seed

Random seed for reproducible sample selection

subsetSize

Optional limit on samples to evaluate (for quick tests)

Attributes

Example
val suite = BenchmarkSuite.chunkingSuite("data/datasets/ragbench/test.json")
val results = runner.runSuite(suite)
Companion
object
Supertypes
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any
object BenchmarkSuite

Attributes

Companion
class
Supertypes
trait Product
trait Mirror
class Object
trait Matchable
class Any
Self type
BenchmarkSuite.type
sealed trait DatasetFormat

Supported dataset formats.

Attributes

Companion
object
Supertypes
class Object
trait Matchable
class Any
Known subtypes
object MultiHopRAG
object RAGBench
object TestDataset
object Unknown
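
A sketch matching exhaustively on the known subtypes above, assuming they are nested in the DatasetFormat companion object.

// Sketch: map each known DatasetFormat subtype to a short label.
// Assumes the case objects live in the DatasetFormat companion.
def label(format: DatasetFormat): String = format match {
  case DatasetFormat.RAGBench    => "RAGBench (JSONL)"
  case DatasetFormat.MultiHopRAG => "MultiHop-RAG (JSON)"
  case DatasetFormat.TestDataset => "Custom TestDataset (JSON)"
  case DatasetFormat.Unknown     => "Unknown format"
}
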
object DatasetFormat

Attributes

Companion
trait
Supertypes
trait Sum
trait Mirror
class Object
trait Matchable
class Any
Self type
DatasetFormat.type

class DatasetManager

Manages loading and processing of benchmark datasets.

Supports multiple dataset formats:

  • RAGBench (Hugging Face JSONL format)
  • MultiHop-RAG (JSON format)
  • Custom JSON format (TestDataset format)

Attributes

Example
val manager = DatasetManager()
// Load RAGBench dataset
val dataset = manager.loadRAGBench("data/datasets/ragbench/test.jsonl")
// Load with subset
val subset = manager.loadRAGBench("data/datasets/ragbench/test.jsonl", Some(100))
Companion
object
Supertypes
class Object
trait Matchable
class Any

object DatasetManager

Attributes

Companion
class
Supertypes
class Object
trait Matchable
class Any
Self type
DatasetManager.type
sealed trait EmbeddingConfig

Configuration for embedding provider in benchmark experiments.

Supports multiple embedding providers for comparison testing:

  • OpenAI (text-embedding-3-small, text-embedding-3-large)
  • Voyage AI (voyage-3, voyage-code-3)
  • Ollama (nomic-embed-text, mxbai-embed-large)
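
The constructor parameters of the OpenAI, Voyage, and Ollama subtypes are not shown on this page, so this sketch only matches on the subtype types (assumed to be nested in the EmbeddingConfig companion) to pick a display label.

// Sketch: branch on the provider type; constructor details are intentionally not assumed.
def providerName(config: EmbeddingConfig): String = config match {
  case _: EmbeddingConfig.OpenAI => "OpenAI"
  case _: EmbeddingConfig.Voyage => "Voyage AI"
  case _: EmbeddingConfig.Ollama => "Ollama"
}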

Attributes

Companion
object
Supertypes
class Object
trait Matchable
class Any
Known subtypes
class Ollama
class OpenAI
class Voyage

object EmbeddingConfig

Attributes

Companion
trait
Supertypes
trait Sum
trait Mirror
class Object
trait Matchable
class Any
Self type
EmbeddingConfig.type
final case class ExperimentComparison(baseline: ExperimentResult, comparison: ExperimentResult)

Comparison between two experiment results.

Value parameters

baseline

The baseline experiment

comparison

The experiment being compared

Attributes

Supertypes
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any
final case class ExperimentResult(config: RAGExperimentConfig, evalSummary: Option[EvalSummary], timings: Seq[TimingInfo], documentCount: Int, chunkCount: Int, queryCount: Int, metadata: Map[String, String], error: Option[String])

Result of a single experiment run.

Value parameters

chunkCount

Number of chunks created

config

The experiment configuration used

documentCount

Number of documents indexed

error

Optional error if experiment failed

evalSummary

RAGAS evaluation summary with all metric scores

metadata

Additional experiment metadata

queryCount

Number of queries evaluated

timings

Timing breakdown by phase
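
A sketch that inspects a finished result using only the fields above; TimingInfo.durationMs is documented later on this page.

// Sketch: report either the failure or a compact success summary.
def describe(r: ExperimentResult): String = r.error match {
  case Some(msg) => s"${r.config.name} failed: $msg"
  case None =>
    val totalMs = r.timings.map(_.durationMs).sum
    s"${r.config.name}: ${r.documentCount} docs, ${r.chunkCount} chunks, " +
      s"${r.queryCount} queries in ${totalMs} ms"
}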

Attributes

Companion
object
Supertypes
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any
object ExperimentResult

Attributes

Companion
class
Supertypes
trait Product
trait Mirror
class Object
trait Matchable
class Any
Self type
ExperimentResult.type
final case class GeneratorOptions(temperature: Double, maxTokens: Int)

Options for ground truth generation.

Value parameters

maxTokens

Max tokens for generation

temperature

LLM temperature (lower = more deterministic)
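
A minimal construction sketch; the values are illustrative.

// Sketch: near-deterministic generation with a modest token budget.
val genOptions = GeneratorOptions(
  temperature = 0.1, // low temperature for more deterministic question-answer pairs
  maxTokens = 512    // illustrative cap on generated tokens
)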

Attributes

Supertypes
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any
class GroundTruthGenerator(llmClient: LLMClient, options: GeneratorOptions)

Generates ground truth evaluation datasets from documents using LLM.

Creates question-answer pairs with context that can be used for RAGAS evaluation. Supports multiple generation strategies for different testing scenarios.

Value parameters

llmClient

LLM client for generation

options

Generation options

Attributes

Example
val generator = GroundTruthGenerator(llmClient)
// Generate from documents
val dataset = generator.generateFromDocuments(
 documents = Seq(doc1, doc2, doc3),
 questionsPerDoc = 5,
 datasetName = "my-test-set"
)
// Save for later use
TestDataset.save(dataset, "data/generated/my-test-set.json")
Companion
object
Supertypes
class Object
trait Matchable
class Any

object GroundTruthGenerator

Attributes

Companion
class
Supertypes
class Object
trait Matchable
class Any
Self type
GroundTruthGenerator.type
final case class RAGAnswer(question: String, answer: String, contexts: Seq[String], searchResults: Seq[HybridSearchResult])

Result from RAG pipeline answer generation.

Value parameters

answer

The generated answer

contexts

Retrieved context chunks

question

The original question

searchResults

Full search results with scores
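
A sketch that prints an answer for inspection using only the fields above; the number of contexts shown is arbitrary.

// Sketch: print the question, the answer, and the first few retrieved contexts.
def debugPrint(a: RAGAnswer): Unit = {
  println(s"Q: ${a.question}")
  println(s"A: ${a.answer}")
  a.contexts.take(3).zipWithIndex.foreach { case (ctx, i) =>
    println(s"  context ${i + 1}: ${ctx.take(120)}")
  }
  println(s"  (${a.searchResults.size} search results in total)")
}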

Attributes

Supertypes
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any
final case class RAGExperimentConfig(name: String, description: String, chunkingStrategy: Strategy, chunkingConfig: ChunkingConfig, embeddingConfig: EmbeddingConfig, fusionStrategy: FusionStrategy, topK: Int, useReranker: Boolean, rerankTopK: Int)

Configuration for a single RAG experiment.

Defines all parameters that can vary between experiments:

  • Chunking strategy and parameters
  • Embedding provider and model
  • Search fusion strategy
  • Retrieval settings

Value parameters

chunkingConfig

Parameters for chunking (size, overlap, etc.)

chunkingStrategy

Which chunker to use (Simple, Sentence, Markdown, Semantic)

description

Human-readable description

embeddingConfig

Embedding provider configuration

fusionStrategy

How to combine vector and keyword search results

name

Unique identifier for this experiment

rerankTopK

Number of candidates for reranking (if enabled)

topK

Number of chunks to retrieve

useReranker

Whether to apply cross-encoder reranking

Attributes

Example
val config = RAGExperimentConfig(
 name = "sentence-rrf60",
 description = "Sentence chunking with RRF fusion",
 chunkingStrategy = ChunkerFactory.Strategy.Sentence,
 fusionStrategy = FusionStrategy.RRF(60),
 topK = 5
)
Companion
object
Supertypes
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any
object RAGExperimentConfig

Attributes

Companion
class
Supertypes
trait Product
trait Mirror
class Object
trait Matchable
class Any
Self type
RAGExperimentConfig.type
final class RAGPipeline

A configurable RAG pipeline for benchmark experiments.

Wraps all RAG components (chunker, embeddings, vector store, keyword index) and provides a unified interface for indexing documents and answering queries.
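
The indexing and querying methods are not listed on this page, so the sketch below uses hypothetical indexDocuments and answer calls purely to illustrate the intended flow; the real method names and return types may differ.

// Hedged sketch; `indexDocuments` and `answer` are hypothetical method names,
// `pipeline` is assumed to have been built via the companion object, and the
// return value of `answer` may be wrapped (e.g. in a Result) in the real API.
pipeline.indexDocuments(documents)   // chunk, embed, and index the corpus
val ragAnswer = pipeline.answer("What does this benchmark measure?")
println(ragAnswer)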

Value parameters

chunker

Document chunker based on config strategy

config

Experiment configuration

embeddingClient

Embedding client for vectorization

embeddingModelConfig

Model config for embedding requests

hybridSearcher

Hybrid search with vector + keyword fusion

llmClient

LLM client for answer generation

tracer

Optional tracer for cost tracking

Attributes

Companion
object
Supertypes
class Object
trait Matchable
class Any
object RAGPipeline

Attributes

Companion
class
Supertypes
class Object
trait Matchable
class Any
Self type
RAGPipeline.type
sealed trait ReportFormat

Report output format.

Attributes

Companion
object
Supertypes
class Object
trait Matchable
class Any
Known subtypes
object Console
object Json
object Markdown
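
A sketch mapping each known subtype to a file extension, assuming the case objects are nested in the ReportFormat companion object.

// Sketch: pick a file extension per report format.
def extension(format: ReportFormat): String = format match {
  case ReportFormat.Console  => "txt"
  case ReportFormat.Json     => "json"
  case ReportFormat.Markdown => "md"
}
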
object ReportFormat

Attributes

Companion
trait
Supertypes
trait Sum
trait Mirror
class Object
trait Matchable
class Any
Self type
ReportFormat.type
final case class TimingInfo(phase: String, durationMs: Long, itemCount: Int)

Timing information for a benchmark phase.

Value parameters

durationMs

Duration in milliseconds

itemCount

Number of items processed

phase

Name of the phase (e.g., "indexing", "search", "evaluation")
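
A small worked example using only the fields above: throughput is itemCount divided by the duration in seconds, guarding against a zero duration.

// Sketch: items processed per second for one phase.
def itemsPerSecond(t: TimingInfo): Double =
  if (t.durationMs <= 0) 0.0
  else t.itemCount * 1000.0 / t.durationMs

// e.g. TimingInfo("indexing", durationMs = 2000, itemCount = 150) gives 75.0 items/s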

Attributes

Companion
object
Supertypes
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any
object TimingInfo

Attributes

Companion
class
Supertypes
trait Product
trait Mirror
class Object
trait Matchable
class Any
Self type
TimingInfo.type