org.llm4s.rag.benchmark

Members list

Type members

Classlikes

object BenchmarkReport

Report generator for benchmark results.

Supports multiple output formats:

  • Console: Formatted text for terminal display
  • JSON: Machine-readable format for processing
  • Markdown: Documentation-ready format
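
A brief usage sketch: only BenchmarkReport.console appears on this page (in the BenchmarkRunner example below), so the json and markdown calls here are assumed analogues rather than confirmed API.

// Hedged sketch: render one BenchmarkResults value in each format.
// `console` is taken from the BenchmarkRunner example on this page;
// `json` and `markdown` are assumed analogues and may differ in the real API.
val results: BenchmarkResults = runner.runSuite(suite)
println(BenchmarkReport.console(results))          // formatted text for the terminal
val asJson     = BenchmarkReport.json(results)     // assumption: machine-readable output
val asMarkdown = BenchmarkReport.markdown(results) // assumption: documentation-ready output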

Attributes

Supertypes
class Object
trait Matchable
class Any
Self type
BenchmarkReport.type
final case class BenchmarkResults(suite: BenchmarkSuite, results: Seq[ExperimentResult], startTime: Long, endTime: Long)

Results from running a complete benchmark suite.

Value parameters

endTime

When the benchmark completed

results

Results for each experiment

startTime

When the benchmark started

suite

The benchmark suite that was run
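
A minimal sketch using only the fields above, plus ExperimentResult.error documented later on this page; it assumes startTime and endTime are epoch milliseconds.

// Sketch: derive a one-line summary from the constructor fields.
def summarize(r: BenchmarkResults): String = {
  val wallClockMs = r.endTime - r.startTime                        // assumes epoch millis
  val (failed, succeeded) = r.results.partition(_.error.isDefined) // ExperimentResult.error
  s"${r.suite.name}: ${succeeded.size} succeeded, ${failed.size} failed in ${wallClockMs} ms"
}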

Attributes

Companion
object
Supertypes
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any
object BenchmarkResults

Attributes

Companion
class
Supertypes
trait Product
trait Mirror
class Object
trait Matchable
class Any
Self type
BenchmarkResults.type
class BenchmarkRunner(llmClient: LLMClient, embeddingClient: EmbeddingClient, resolveEmbeddingProvider: String => Result[EmbeddingProviderConfig], datasetManager: DatasetManager, val options: BenchmarkRunnerOptions)

Main execution engine for RAG benchmarks.

Orchestrates the full benchmark workflow:

  1. Load dataset
  2. For each experiment configuration:
     a. Create RAG pipeline with config
     b. Index documents
     c. Run queries and generate answers
     d. Evaluate with RAGAS metrics
     e. Collect timing and results
  3. Aggregate results and generate reports

Value parameters

datasetManager

Dataset loading manager

embeddingClient

Default embedding client

llmClient

LLM client for answer generation and evaluation

options

Runner configuration options

Attributes

Example
val runner = BenchmarkRunner(llmClient, embeddingClient, resolveEmbeddingProvider)
val suite = BenchmarkSuite.chunkingSuite("data/datasets/ragbench/test.jsonl")
val results = runner.runSuite(suite)
println(BenchmarkReport.console(results))
Companion
object
Supertypes
class Object
trait Matchable
class Any

object BenchmarkRunner

Attributes

Companion
class
Supertypes
class Object
trait Matchable
class Any
Self type
BenchmarkRunner.type
final case class BenchmarkRunnerOptions(verbose: Boolean, parallelExperiments: Boolean, saveIntermediateResults: Boolean, outputDir: String)

Configuration options for the benchmark runner.

Value parameters

outputDir

Directory for saving results

parallelExperiments

Run experiments in parallel (not yet implemented)

saveIntermediateResults

Save results after each experiment

verbose

Enable verbose logging
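
A construction sketch from the signature above; the values are illustrative and any real defaults are not documented here.

// Sketch: explicit options for a verbose, sequential run.
val opts = BenchmarkRunnerOptions(
  verbose = true,
  parallelExperiments = false,    // parallel execution is not yet implemented
  saveIntermediateResults = true, // write results after each experiment
  outputDir = "benchmark-output"  // illustrative path
)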

Attributes

Supertypes
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any
final case class BenchmarkSuite(name: String, description: String, experiments: Seq[RAGExperimentConfig], datasetPath: String, subsetSize: Option[Int], seed: Long)

A suite of benchmark experiments to run together.

Groups related experiments for systematic comparison. Provides pre-built suites for common comparison scenarios.
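
Besides the pre-built suites (see the chunkingSuite example below), a suite can be assembled directly from the constructor. In this sketch, configA and configB stand for RAGExperimentConfig values defined elsewhere.

// Sketch: a hand-assembled suite limited to a quick 50-sample subset.
val customSuite = BenchmarkSuite(
  name = "chunking-quick-check",
  description = "Compare two chunking strategies on a small subset",
  experiments = Seq(configA, configB), // RAGExperimentConfig values defined elsewhere
  datasetPath = "data/datasets/ragbench/test.jsonl",
  subsetSize = Some(50),
  seed = 42L
)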

Value parameters

datasetPath

Path to the evaluation dataset JSON file

description

What this suite tests

experiments

The experiments in this suite

name

Suite identifier

seed

Random seed for reproducible sample selection

subsetSize

Optional limit on samples to evaluate (for quick tests)

Attributes

Example
val suite = BenchmarkSuite.chunkingSuite("data/datasets/ragbench/test.json")
val results = runner.runSuite(suite)
Companion
object
Supertypes
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any
object BenchmarkSuite

Attributes

Companion
class
Supertypes
trait Product
trait Mirror
class Object
trait Matchable
class Any
Self type
BenchmarkSuite.type
sealed trait DatasetFormat

Supported dataset formats.

Attributes

Companion
object
Supertypes
class Object
trait Matchable
class Any
Known subtypes
object MultiHopRAG
object RAGBench
object TestDataset
object Unknown
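
A sketch matching exhaustively on the known subtypes above, assuming they are nested in the DatasetFormat companion object.

// Sketch: map each known DatasetFormat subtype to a short label.
// Assumes the case objects live in the DatasetFormat companion.
def label(format: DatasetFormat): String = format match {
  case DatasetFormat.RAGBench    => "RAGBench (JSONL)"
  case DatasetFormat.MultiHopRAG => "MultiHop-RAG (JSON)"
  case DatasetFormat.TestDataset => "Custom TestDataset (JSON)"
  case DatasetFormat.Unknown     => "Unknown format"
}
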
object DatasetFormat

Attributes

Companion
trait
Supertypes
trait Sum
trait Mirror
class Object
trait Matchable
class Any
Self type
DatasetFormat.type

class DatasetManager

Manages loading and processing of benchmark datasets.

Supports multiple dataset formats:

  • RAGBench (Hugging Face JSONL format)
  • MultiHop-RAG (JSON format)
  • Custom JSON format (TestDataset format)

Attributes

Example
val manager = DatasetManager()
// Load RAGBench dataset
val dataset = manager.loadRAGBench("data/datasets/ragbench/test.jsonl")
// Load with subset
val subset = manager.loadRAGBench("data/datasets/ragbench/test.jsonl", Some(100))
Companion
object
Supertypes
class Object
trait Matchable
class Any

object DatasetManager

Attributes

Companion
class
Supertypes
class Object
trait Matchable
class Any
Self type
DatasetManager.type
sealed trait EmbeddingConfig

Configuration for embedding provider in benchmark experiments.

Supports multiple embedding providers for comparison testing:

  • OpenAI (text-embedding-3-small, text-embedding-3-large)
  • Voyage AI (voyage-3, voyage-code-3)
  • Ollama (nomic-embed-text, mxbai-embed-large)
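
The constructor parameters of the OpenAI, Voyage, and Ollama subtypes are not shown on this page, so this sketch only matches on the subtype types (assumed to be nested in the EmbeddingConfig companion) to pick a display label.

// Sketch: branch on the provider type; constructor details are intentionally not assumed.
def providerName(config: EmbeddingConfig): String = config match {
  case _: EmbeddingConfig.OpenAI => "OpenAI"
  case _: EmbeddingConfig.Voyage => "Voyage AI"
  case _: EmbeddingConfig.Ollama => "Ollama"
}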

Attributes

Companion
object
Supertypes
class Object
trait Matchable
class Any
Known subtypes
class Ollama
class OpenAI
class Voyage

object EmbeddingConfig

Attributes

Companion
trait
Supertypes
trait Sum
trait Mirror
class Object
trait Matchable
class Any
Self type
EmbeddingConfig.type
final case class ExperimentComparison(baseline: ExperimentResult, comparison: ExperimentResult)

Comparison between two experiment results.

Value parameters

baseline

The baseline experiment

comparison

The experiment being compared

Attributes

Supertypes
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any
final case class ExperimentResult(config: RAGExperimentConfig, evalSummary: Option[EvalSummary], timings: Seq[TimingInfo], documentCount: Int, chunkCount: Int, queryCount: Int, metadata: Map[String, String], error: Option[String])

Result of a single experiment run.

Value parameters

chunkCount

Number of chunks created

config

The experiment configuration used

documentCount

Number of documents indexed

error

Optional error if experiment failed

evalSummary

RAGAS evaluation summary with all metric scores

metadata

Additional experiment metadata

queryCount

Number of queries evaluated

timings

Timing breakdown by phase
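
A sketch that inspects a finished result using only the fields above; TimingInfo.durationMs is documented later on this page.

// Sketch: report either the failure or a compact success summary.
def describe(r: ExperimentResult): String = r.error match {
  case Some(msg) => s"${r.config.name} failed: $msg"
  case None =>
    val totalMs = r.timings.map(_.durationMs).sum
    s"${r.config.name}: ${r.documentCount} docs, ${r.chunkCount} chunks, " +
      s"${r.queryCount} queries in ${totalMs} ms"
}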

Attributes

Companion
object
Supertypes
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any
object ExperimentResult

Attributes

Companion
class
Supertypes
trait Product
trait Mirror
class Object
trait Matchable
class Any
Self type
ExperimentResult.type
final case class GeneratorOptions(temperature: Double, maxTokens: Int)

Options for ground truth generation.

Value parameters

maxTokens

Max tokens for generation

temperature

LLM temperature (lower = more deterministic)
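
A minimal construction sketch; the values are illustrative.

// Sketch: near-deterministic generation with a modest token budget.
val genOptions = GeneratorOptions(
  temperature = 0.1, // low temperature for more deterministic question-answer pairs
  maxTokens = 512    // illustrative cap on generated tokens
)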

Attributes

Supertypes
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any
class GroundTruthGenerator(llmClient: LLMClient, options: GeneratorOptions)

Generates ground truth evaluation datasets from documents using LLM.

Creates question-answer pairs with context that can be used for RAGAS evaluation. Supports multiple generation strategies for different testing scenarios.

Value parameters

llmClient

LLM client for generation

options

Generation options

Attributes

Example
val generator = GroundTruthGenerator(llmClient)
// Generate from documents
val dataset = generator.generateFromDocuments(
 documents = Seq(doc1, doc2, doc3),
 questionsPerDoc = 5,
 datasetName = "my-test-set"
)
// Save for later use
TestDataset.save(dataset, "data/generated/my-test-set.json")
Companion
object
Supertypes
class Object
trait Matchable
class Any

object GroundTruthGenerator

Attributes

Companion
class
Supertypes
class Object
trait Matchable
class Any
Self type
GroundTruthGenerator.type
final case class RAGAnswer(question: String, answer: String, contexts: Seq[String], searchResults: Seq[HybridSearchResult])

Result from RAG pipeline answer generation.

Value parameters

answer

The generated answer

contexts

Retrieved context chunks

question

The original question

searchResults

Full search results with scores
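
A sketch that prints an answer for inspection using only the fields above; the number of contexts shown is arbitrary.

// Sketch: print the question, the answer, and the first few retrieved contexts.
def debugPrint(a: RAGAnswer): Unit = {
  println(s"Q: ${a.question}")
  println(s"A: ${a.answer}")
  a.contexts.take(3).zipWithIndex.foreach { case (ctx, i) =>
    println(s"  context ${i + 1}: ${ctx.take(120)}")
  }
  println(s"  (${a.searchResults.size} search results in total)")
}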

Attributes

Supertypes
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any
final case class RAGExperimentConfig(name: String, description: String, chunkingStrategy: Strategy, chunkingConfig: ChunkingConfig, embeddingConfig: EmbeddingConfig, fusionStrategy: FusionStrategy, topK: Int, useReranker: Boolean, rerankTopK: Int)

Configuration for a single RAG experiment.

Defines all parameters that can vary between experiments:

  • Chunking strategy and parameters
  • Embedding provider and model
  • Search fusion strategy
  • Retrieval settings

Value parameters

chunkingConfig

Parameters for chunking (size, overlap, etc.)

chunkingStrategy

Which chunker to use (Simple, Sentence, Markdown, Semantic)

description

Human-readable description

embeddingConfig

Embedding provider configuration

fusionStrategy

How to combine vector and keyword search results

name

Unique identifier for this experiment

rerankTopK

Number of candidates for reranking (if enabled)

topK

Number of chunks to retrieve

useReranker

Whether to apply cross-encoder reranking

Attributes

Example
val config = RAGExperimentConfig(
 name = "sentence-rrf60",
 description = "Sentence chunking with RRF fusion",
 chunkingStrategy = ChunkerFactory.Strategy.Sentence,
 fusionStrategy = FusionStrategy.RRF(60),
 topK = 5
)
Companion
object
Supertypes
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any
object RAGExperimentConfig

Attributes

Companion
class
Supertypes
trait Product
trait Mirror
class Object
trait Matchable
class Any
Self type
RAGExperimentConfig.type
final class RAGPipeline

A configurable RAG pipeline for benchmark experiments.

Wraps all RAG components (chunker, embeddings, vector store, keyword index) and provides a unified interface for indexing documents and answering queries.
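
The indexing and querying methods are not listed on this page, so the sketch below uses hypothetical indexDocuments and answer calls purely to illustrate the intended flow; the real method names and return types may differ.

// Hedged sketch; `indexDocuments` and `answer` are hypothetical method names,
// `pipeline` is assumed to have been built via the companion object, and the
// return value of `answer` may be wrapped (e.g. in a Result) in the real API.
pipeline.indexDocuments(documents)   // chunk, embed, and index the corpus
val ragAnswer = pipeline.answer("What does this benchmark measure?")
println(ragAnswer)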

Value parameters

chunker

Document chunker based on config strategy

config

Experiment configuration

embeddingClient

Embedding client for vectorization

embeddingModelConfig

Model config for embedding requests

hybridSearcher

Hybrid search with vector + keyword fusion

llmClient

LLM client for answer generation

tracer

Optional tracer for cost tracking

Attributes

Companion
object
Supertypes
class Object
trait Matchable
class Any
object RAGPipeline

Attributes

Companion
class
Supertypes
class Object
trait Matchable
class Any
Self type
RAGPipeline.type
sealed trait ReportFormat

Report output format.

Attributes

Companion
object
Supertypes
class Object
trait Matchable
class Any
Known subtypes
object Console
object Json
object Markdown
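
A sketch mapping each known subtype to a file extension, assuming the case objects are nested in the ReportFormat companion object.

// Sketch: pick a file extension per report format.
def extension(format: ReportFormat): String = format match {
  case ReportFormat.Console  => "txt"
  case ReportFormat.Json     => "json"
  case ReportFormat.Markdown => "md"
}
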
object ReportFormat

Attributes

Companion
trait
Supertypes
trait Sum
trait Mirror
class Object
trait Matchable
class Any
Self type
ReportFormat.type
final case class TimingInfo(phase: String, durationMs: Long, itemCount: Int)

Timing information for a benchmark phase.

Value parameters

durationMs

Duration in milliseconds

itemCount

Number of items processed

phase

Name of the phase (e.g., "indexing", "search", "evaluation")
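
A small worked example using only the fields above: throughput is itemCount divided by the duration in seconds, guarding against a zero duration.

// Sketch: items processed per second for one phase.
def itemsPerSecond(t: TimingInfo): Double =
  if (t.durationMs <= 0) 0.0
  else t.itemCount * 1000.0 / t.durationMs

// e.g. TimingInfo("indexing", durationMs = 2000, itemCount = 150) gives 75.0 items/s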

Attributes

Companion
object
Supertypes
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any
object TimingInfo

Attributes

Companion
class
Supertypes
trait Product
trait Mirror
class Object
trait Matchable
class Any
Self type
TimingInfo.type