BenchmarkRunner

org.llm4s.rag.benchmark.BenchmarkRunner
See the BenchmarkRunner companion object
class BenchmarkRunner(llmClient: LLMClient, embeddingClient: EmbeddingClient, resolveEmbeddingProvider: String => Result[EmbeddingProviderConfig], datasetManager: DatasetManager, val options: BenchmarkRunnerOptions)

Main execution engine for RAG benchmarks.

Orchestrates the full benchmark workflow:

  1. Load dataset
  2. For each experiment configuration:
     a. Create a RAG pipeline with the config
     b. Index documents
     c. Run queries and generate answers
     d. Evaluate with RAGAS metrics
     e. Collect timing and results
  3. Aggregate results and generate reports
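
A minimal construction sketch using the primary constructor shown above; the in-scope client values and the no-argument BenchmarkRunnerOptions() default are assumptions, not guaranteed by this API:

val runner = new BenchmarkRunner(
  llmClient,                 // LLMClient for answer generation and evaluation
  embeddingClient,           // default EmbeddingClient
  resolveEmbeddingProvider,  // String => Result[EmbeddingProviderConfig]
  datasetManager,            // DatasetManager for dataset loading
  BenchmarkRunnerOptions()   // assumed default options; adjust fields as needed
)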

Value parameters

datasetManager

Dataset loading manager

embeddingClient

Default embedding client

llmClient

LLM client for answer generation and evaluation

options

Runner configuration options

Attributes

Example
val runner = BenchmarkRunner(llmClient, embeddingClient, resolveEmbeddingProvider)
val suite = BenchmarkSuite.chunkingSuite("data/datasets/ragbench/test.jsonl")
val results = runner.runSuite(suite)
println(BenchmarkReport.console(results))
Companion
object
Supertypes
class Object
trait Matchable
class Any

Members list

Value members

Concrete methods

def compareConfigs(config1: RAGExperimentConfig, config2: RAGExperimentConfig, datasetPath: String, sampleCount: Option[Int]): Result[ExperimentComparison]

Compare two configurations head-to-head.

Value parameters

config1

First configuration

config2

Second configuration

datasetPath

Path to dataset

sampleCount

Number of samples (optional)

Attributes

Returns

Comparison result
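
A hedged usage sketch against the signature above; the two RAGExperimentConfig values are illustrative and assumed to be defined elsewhere:

val comparison: Result[ExperimentComparison] =
  runner.compareConfigs(
    config1 = baselineConfig,    // first RAGExperimentConfig (assumed to exist)
    config2 = candidateConfig,   // second RAGExperimentConfig to compare against
    datasetPath = "data/datasets/ragbench/test.jsonl",
    sampleCount = Some(20)       // optional sample limit; None presumably uses the full dataset
  )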

def quickTest(config: RAGExperimentConfig, datasetPath: String, sampleCount: Int): Result[ExperimentResult]

Run a quick validation with minimal samples.

Value parameters

config

Experiment configuration

datasetPath

Path to dataset

sampleCount

Number of samples to test

Attributes

Returns

Experiment result
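
A usage sketch for the signature above; the config value is an assumption and not part of this API reference:

val quick: Result[ExperimentResult] =
  runner.quickTest(
    config = someConfig,   // an existing RAGExperimentConfig (assumed)
    datasetPath = "data/datasets/ragbench/test.jsonl",
    sampleCount = 5        // small sample count for a fast sanity check
  )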

Run a single experiment.

Value parameters

config

Experiment configuration

dataset

Evaluation dataset

Attributes

Returns

Experiment result
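
The signature for this member is not shown on this page; the sketch below assumes a method named runExperiment and an already loaded evaluation dataset, both of which are hypothetical where not documented:

// Hypothetical member name: the signature line is missing from this page.
// Assumes the dataset was loaded beforehand via the DatasetManager.
val result: Result[ExperimentResult] = runner.runExperiment(someConfig, dataset)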

Run a complete benchmark suite.

Value parameters

suite

The benchmark suite to run

Attributes

Returns

Aggregated results for all experiments
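
As in the class-level Example above, a suite run produces aggregated results that can be rendered with BenchmarkReport.console; a minimal sketch (error handling omitted, dataset path illustrative):

val suite = BenchmarkSuite.chunkingSuite("data/datasets/ragbench/test.jsonl")
val results = runner.runSuite(suite)
println(BenchmarkReport.console(results))   // plain-text summary of all experiments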

Concrete fields

val options: BenchmarkRunnerOptions

Runner configuration options