org.llm4s.rag.benchmark
Members list
Type members
Classlikes
Report generator for benchmark results.
Supports multiple output formats:
- Console: Formatted text for terminal display
- JSON: Machine-readable format for processing
- Markdown: Documentation-ready format
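As a rough illustration of how one set of results can be rendered in the three formats listed above, here is a minimal, dependency-free sketch. Everything in it (`Format`, `Row`, `fmt`, `render`) is a hypothetical stand-in invented for this example and is not the llm4s API; the only call this page itself shows is `BenchmarkReport.console(results)`.

```scala
// Hypothetical stand-ins; none of these names are taken from llm4s.
enum Format:
  case Console, Json, Markdown

final case class Row(experiment: String, score: Double)

// Locale-independent formatting with three decimal places
def fmt(score: Double): String = "%.3f".formatLocal(java.util.Locale.ROOT, score)

def render(rows: Seq[Row], format: Format): String =
  format match
    case Format.Console =>
      // Pad the name to a fixed width so scores line up in a terminal
      rows.map(r => s"${r.experiment.padTo(20, ' ')} ${fmt(r.score)}").mkString("\n")
    case Format.Json =>
      // Hand-rolled JSON keeps the sketch dependency-free; real code would
      // use a JSON library and escape strings properly
      rows
        .map(r => s"""{"experiment":"${r.experiment}","score":${r.score}}""")
        .mkString("[", ",", "]")
    case Format.Markdown =>
      val header = "| Experiment | Score |\n|---|---|"
      val body = rows.map(r => s"| ${r.experiment} | ${fmt(r.score)} |")
      (header +: body).mkString("\n")
```

Keeping the format as a value rather than three separate methods makes it easy to pass the choice through from a command-line flag, though the real object may well expose one method per format.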
Attributes
- Supertypes: class Object, trait Matchable, class Any
- Self type: BenchmarkReport.type
Results from running a complete benchmark suite.
Value parameters
- endTime: When the benchmark completed
- results: Results for each experiment
- startTime: When the benchmark started
- suite: The benchmark suite that was run
Attributes
- Companion: object
- Supertypes: trait Serializable, trait Product, trait Equals, class Object, trait Matchable, class Any
Attributes
- Companion: class
- Supertypes: trait Product, trait Mirror, class Object, trait Matchable, class Any
- Self type: BenchmarkResults.type
Main execution engine for RAG benchmarks.
Orchestrates the full benchmark workflow:
- Load dataset
- For each experiment configuration:
  a. Create RAG pipeline with config
  b. Index documents
  c. Run queries and generate answers
  d. Evaluate with RAGAS metrics
  e. Collect timing and results
- Aggregate results and generate reports
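The per-experiment loop described above can be sketched roughly as follows. This is only an outline of the control flow under stated assumptions: `Phase`, `Outcome`, `timed` and `runAll` are hypothetical names, lower-casing stands in for indexing, and the `answer` and `evaluate` functions stand in for the RAG pipeline and the RAGAS evaluation.

```scala
// Hypothetical stand-ins for the real llm4s types; only the control flow matters here.
final case class Phase(name: String, durationMs: Long)
final case class Outcome(experiment: String, score: Double, phases: Seq[Phase])

// Run a block and record how long it took under the given phase name.
def timed[A](name: String)(block: => A): (A, Phase) =
  val start = System.nanoTime()
  val value = block
  (value, Phase(name, (System.nanoTime() - start) / 1000000L))

def runAll(
    experiments: Seq[String],
    documents: Seq[String],
    queries: Seq[String],
    answer: (String, Seq[String]) => String, // (query, indexed documents) => answer
    evaluate: Seq[String] => Double          // answers => aggregate metric
): Seq[Outcome] =
  experiments.map { name =>
    // Lower-casing is a placeholder for the real indexing step
    val (index, indexing) = timed("indexing")(documents.map(_.toLowerCase))
    val (answers, search) = timed("search")(queries.map(q => answer(q, index)))
    val (score, evaluation) = timed("evaluation")(evaluate(answers))
    Outcome(name, score, Seq(indexing, search, evaluation))
  }
```

Each experiment rebuilds its own index, which mirrors the description: chunking and embedding settings differ between configurations, so an index cannot be shared across them.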
Value parameters
- datasetManager: Dataset loading manager
- embeddingClient: Default embedding client
- llmClient: LLM client for answer generation and evaluation
- options: Runner configuration options
Attributes
- Example:
    val runner = BenchmarkRunner(llmClient, embeddingClient, resolveEmbeddingProvider)
    val suite = BenchmarkSuite.chunkingSuite("data/datasets/ragbench/test.jsonl")
    val results = runner.runSuite(suite)
    println(BenchmarkReport.console(results))
- Companion: object
- Supertypes: class Object, trait Matchable, class Any
Attributes
- Companion: class
- Supertypes: class Object, trait Matchable, class Any
- Self type: BenchmarkRunner.type
Configuration options for the benchmark runner.
Value parameters
- outputDir: Directory for saving results
- parallelExperiments: Run experiments in parallel (not yet implemented)
- saveIntermediateResults: Save results after each experiment
- verbose: Enable verbose logging
Attributes
- Supertypes: trait Serializable, trait Product, trait Equals, class Object, trait Matchable, class Any
A suite of benchmark experiments to run together.
Groups related experiments for systematic comparison. Provides pre-built suites for common comparison scenarios.
Value parameters
- datasetPath: Path to the evaluation dataset JSON file
- description: What this suite tests
- experiments: The experiments in this suite
- name: Suite identifier
- seed: Random seed for reproducible sample selection
- subsetSize: Optional limit on samples to evaluate (for quick tests)
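To show how `seed` and `subsetSize` can work together, here is a minimal sketch of reproducible sample selection. The helper `selectSubset` is hypothetical, and the `Long` seed and `Option[Int]` size are assumptions about the parameter types; the library may select samples differently.

```scala
import scala.util.Random

// Hypothetical helper, not part of llm4s: pick a reproducible subset of samples.
def selectSubset[A](samples: Seq[A], subsetSize: Option[Int], seed: Long): Seq[A] =
  subsetSize match
    case Some(n) if n < samples.size =>
      // A Random seeded with the same value shuffles identically on every run
      new Random(seed).shuffle(samples).take(n)
    case _ =>
      // No limit, or a limit at least as large as the dataset: keep everything
      samples
```

Because the generator is created from the seed on each call, two runs with the same seed evaluate the same samples, which keeps scores comparable across experiments.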
Attributes
- Example:
    val suite = BenchmarkSuite.chunkingSuite("data/datasets/ragbench/test.json")
    val results = runner.runSuite(suite)
- Companion: object
- Supertypes: trait Serializable, trait Product, trait Equals, class Object, trait Matchable, class Any
Attributes
- Companion: class
- Supertypes: trait Product, trait Mirror, class Object, trait Matchable, class Any
- Self type: BenchmarkSuite.type
Supported dataset formats.
Attributes
- Companion: object
- Supertypes: class Object, trait Matchable, class Any
- Known subtypes
Attributes
- Companion: trait
- Supertypes: trait Sum, trait Mirror, class Object, trait Matchable, class Any
- Self type: DatasetFormat.type
Manages loading and processing of benchmark datasets.
Supports multiple dataset formats:
- RAGBench (Hugging Face JSONL format)
- MultiHop-RAG (JSON format)
- Custom JSON format (TestDataset format)
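The practical difference between the JSONL and JSON formats listed above is that JSONL stores one JSON object per line, so records can be split and limited without parsing the whole file. The sketch below illustrates that idea only; `Format`, `detectFormat` and `jsonlRecords` are hypothetical names, and detecting the format from the file extension is an assumption rather than documented llm4s behaviour.

```scala
// Hypothetical names; the real DatasetFormat subtypes are not shown on this page.
enum Format:
  case Jsonl, Json

// Guess the format from the file extension.
def detectFormat(path: String): Option[Format] =
  if path.endsWith(".jsonl") then Some(Format.Jsonl)
  else if path.endsWith(".json") then Some(Format.Json)
  else None

// Split JSONL content into raw records, skipping blank lines,
// and optionally keep only the first subsetSize records.
def jsonlRecords(content: String, subsetSize: Option[Int] = None): Seq[String] =
  val records = content.linesIterator.map(_.trim).filter(_.nonEmpty).toSeq
  subsetSize.fold(records)(n => records.take(n))
```

Each returned string would still need to be parsed with a JSON library before it can be turned into an evaluation sample.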
Attributes
- Example:
    val manager = DatasetManager()
    // Load RAGBench dataset
    val dataset = manager.loadRAGBench("data/datasets/ragbench/test.jsonl")
    // Load with subset
    val subset = manager.loadRAGBench("data/datasets/ragbench/test.jsonl", Some(100))
- Companion: object
- Supertypes: class Object, trait Matchable, class Any
Attributes
- Companion: class
- Supertypes: class Object, trait Matchable, class Any
- Self type: DatasetManager.type
Configuration for the embedding provider used in benchmark experiments.
Supports multiple embedding providers for comparison testing:
- OpenAI (text-embedding-3-small, text-embedding-3-large)
- Voyage AI (voyage-3, voyage-code-3)
- Ollama (nomic-embed-text, mxbai-embed-large)
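One plausible way to model a closed set of providers like the one above is a sealed trait with one case class per provider. The sketch below is an assumption about the shape, not the actual `EmbeddingConfig` definition: `EmbeddingChoice`, its fields, the default models and `label` are all hypothetical, and only the provider and model names come from this page.

```scala
// Hypothetical shape; the real EmbeddingConfig subtypes and fields are not shown on this page.
sealed trait EmbeddingChoice:
  def provider: String
  def model: String

object EmbeddingChoice:
  final case class OpenAI(model: String = "text-embedding-3-small") extends EmbeddingChoice:
    val provider = "openai"
  final case class Voyage(model: String = "voyage-3") extends EmbeddingChoice:
    val provider = "voyage"
  final case class Ollama(model: String = "nomic-embed-text") extends EmbeddingChoice:
    val provider = "ollama"

// One entry per provider/model pair to compare in a benchmark run.
val candidates: Seq[EmbeddingChoice] = Seq(
  EmbeddingChoice.OpenAI(),
  EmbeddingChoice.OpenAI("text-embedding-3-large"),
  EmbeddingChoice.Voyage("voyage-code-3"),
  EmbeddingChoice.Ollama("mxbai-embed-large")
)

// Short label suitable for experiment names and report rows.
def label(c: EmbeddingChoice): String = s"${c.provider}/${c.model}"
```

A sealed hierarchy lets the compiler check that code matching on the provider handles every case, which is useful when a new provider is added.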
Attributes
- Companion: object
- Supertypes: class Object, trait Matchable, class Any
- Known subtypes
Attributes
- Companion: trait
- Supertypes: trait Sum, trait Mirror, class Object, trait Matchable, class Any
- Self type: EmbeddingConfig.type
Comparison between two experiment results.
Value parameters
- baseline: The baseline experiment
- comparison: The experiment being compared
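A comparison of this kind usually boils down to per-metric differences between the two results. The following is a minimal sketch under that assumption; `Summary` and `deltas` are hypothetical names, and the real class may expose different fields or derived values.

```scala
// Hypothetical stand-in: an experiment summarized as metric name -> score.
final case class Summary(name: String, metrics: Map[String, Double])

// Per-metric difference (comparison minus baseline) for metrics present in both.
def deltas(baseline: Summary, comparison: Summary): Map[String, Double] =
  baseline.metrics.collect {
    case (metric, base) if comparison.metrics.contains(metric) =>
      metric -> (comparison.metrics(metric) - base)
  }
```

A positive delta means the compared experiment scored higher than the baseline on that metric; metrics missing from either side are dropped rather than treated as zero.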
Attributes
- Supertypes: trait Serializable, trait Product, trait Equals, class Object, trait Matchable, class Any
Result of a single experiment run.
Value parameters
- chunkCount: Number of chunks created
- config: The experiment configuration used
- documentCount: Number of documents indexed
- error: Optional error if experiment failed
- evalSummary: RAGAS evaluation summary with all metric scores
- metadata: Additional experiment metadata
- queryCount: Number of queries evaluated
- timings: Timing breakdown by phase
Attributes
- Companion: object
- Supertypes: trait Serializable, trait Product, trait Equals, class Object, trait Matchable, class Any
Attributes
- Companion: class
- Supertypes: trait Product, trait Mirror, class Object, trait Matchable, class Any
- Self type: ExperimentResult.type
Options for ground truth generation.
Value parameters
- maxTokens: Max tokens for generation
- temperature: LLM temperature (lower = more deterministic)
Attributes
- Supertypes: trait Serializable, trait Product, trait Equals, class Object, trait Matchable, class Any
Generates ground truth evaluation datasets from documents using an LLM.
Creates question-answer pairs with context that can be used for RAGAS evaluation. Supports multiple generation strategies for different testing scenarios.
Value parameters
- llmClient: LLM client for generation
- options: Generation options
Attributes
- Example:
    val generator = GroundTruthGenerator(llmClient)
    // Generate from documents
    val dataset = generator.generateFromDocuments(
      documents = Seq(doc1, doc2, doc3),
      questionsPerDoc = 5,
      datasetName = "my-test-set"
    )
    // Save for later use
    TestDataset.save(dataset, "data/generated/my-test-set.json")
- Companion: object
- Supertypes: class Object, trait Matchable, class Any
Attributes
- Companion: class
- Supertypes: class Object, trait Matchable, class Any
- Self type: GroundTruthGenerator.type
Result from RAG pipeline answer generation.
Value parameters
- answer: The generated answer
- contexts: Retrieved context chunks
- question: The original question
- searchResults: Full search results with scores
Attributes
- Supertypes: trait Serializable, trait Product, trait Equals, class Object, trait Matchable, class Any
Configuration for a single RAG experiment.
Defines all parameters that can vary between experiments:
- Chunking strategy and parameters
- Embedding provider and model
- Search fusion strategy
- Retrieval settings
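To make "chunking strategy and parameters" concrete, here is a minimal sketch of fixed-size chunking with overlap, the simplest of the varied dimensions. The function `chunk` is hypothetical and works on characters; the llm4s chunkers (Simple, Sentence, Markdown, Semantic) are not reproduced here and may count tokens or respect sentence boundaries instead.

```scala
// Hypothetical fixed-size chunker; llm4s's own chunkers are not reproduced here.
def chunk(text: String, size: Int, overlap: Int): Seq[String] =
  require(size > 0 && overlap >= 0 && overlap < size, "need 0 <= overlap < size")
  // Each window starts (size - overlap) characters after the previous one,
  // so consecutive chunks share `overlap` characters
  text.sliding(size, size - overlap).toSeq
```

For example, `chunk("abcdefghij", 4, 1)` yields `abcd`, `defg` and `ghij`. Larger overlaps reduce the chance of splitting a relevant passage across chunk boundaries at the cost of indexing more text.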
Value parameters
- chunkingConfig: Parameters for chunking (size, overlap, etc.)
- chunkingStrategy: Which chunker to use (Simple, Sentence, Markdown, Semantic)
- description: Human-readable description
- embeddingConfig: Embedding provider configuration
- fusionStrategy: How to combine vector and keyword search results
- name: Unique identifier for this experiment
- rerankTopK: Number of candidates for reranking (if enabled)
- topK: Number of chunks to retrieve
- useReranker: Whether to apply cross-encoder reranking
Attributes
- Example:
    val config = RAGExperimentConfig(
      name = "sentence-rrf60",
      description = "Sentence chunking with RRF fusion",
      chunkingStrategy = ChunkerFactory.Strategy.Sentence,
      fusionStrategy = FusionStrategy.RRF(60),
      topK = 5
    )
- Companion: object
- Supertypes: trait Serializable, trait Product, trait Equals, class Object, trait Matchable, class Any
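The example uses `FusionStrategy.RRF(60)`. Assuming RRF here means standard Reciprocal Rank Fusion with constant k = 60, each document scores the sum of 1 / (k + rank) over the rankings it appears in, with ranks starting at 1. The sketch below shows that formula in isolation; `rrf` is a hypothetical function over document ids and is not the llm4s implementation.

```scala
// Reciprocal Rank Fusion: score(d) = sum over rankings of 1 / (k + rank(d)),
// with rank starting at 1. Illustrative only; not the llm4s implementation.
def rrf(rankings: Seq[Seq[String]], k: Int = 60, topK: Int = 5): Seq[(String, Double)] =
  val contributions =
    for
      ranking <- rankings
      (docId, index) <- ranking.zipWithIndex
    yield docId -> (1.0 / (k + index + 1))
  contributions
    .groupMapReduce(_._1)(_._2)(_ + _) // sum the contributions per document
    .toSeq
    .sortBy { case (id, score) => (-score, id) } // highest score first, id breaks ties
    .take(topK)
```

With a vector ranking of `a, b` and a keyword ranking of `b, c`, document `b` comes first because it appears in both lists, followed by `a` and then `c`. Because only ranks are used, the vector and keyword scores do not need to be on the same scale.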
Attributes
- Companion: class
- Supertypes: trait Product, trait Mirror, class Object, trait Matchable, class Any
- Self type: RAGExperimentConfig.type
A configurable RAG pipeline for benchmark experiments.
Wraps all RAG components (chunker, embeddings, vector store, keyword index) and provides a unified interface for indexing documents and answering queries.
Value parameters
- chunker: Document chunker based on config strategy
- config: Experiment configuration
- embeddingClient: Embedding client for vectorization
- embeddingModelConfig: Model config for embedding requests
- hybridSearcher: Hybrid search with vector + keyword fusion
- llmClient: LLM client for answer generation
- tracer: Optional tracer for cost tracking
Attributes
- Companion: object
- Supertypes: class Object, trait Matchable, class Any
Attributes
- Companion: class
- Supertypes: class Object, trait Matchable, class Any
- Self type: RAGPipeline.type
Attributes
- Companion: trait
- Supertypes: trait Sum, trait Mirror, class Object, trait Matchable, class Any
- Self type: ReportFormat.type
Timing information for a benchmark phase.
Value parameters
- durationMs: Duration in milliseconds
- itemCount: Number of items processed
- phase: Name of the phase (e.g., "indexing", "search", "evaluation")
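Given a duration and an item count, throughput and per-item latency follow directly. The sketch below mirrors the three documented fields in a hypothetical `Timing` class; the field types (`Long`, `Int`) and both derived methods are assumptions and may not match what `TimingInfo` actually provides.

```scala
// Hypothetical mirror of the three documented fields; types are assumed.
final case class Timing(phase: String, durationMs: Long, itemCount: Int):
  // Items processed per second; 0.0 when no time was recorded
  def itemsPerSecond: Double =
    if durationMs <= 0 then 0.0 else itemCount * 1000.0 / durationMs
  // Average milliseconds per item; 0.0 when nothing was processed
  def avgMsPerItem: Double =
    if itemCount <= 0 then 0.0 else durationMs.toDouble / itemCount
```

Guarding against zero durations and zero counts avoids infinities and NaN values in reports when a phase is skipped or finishes in under a millisecond.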
Attributes
- Companion: object
- Supertypes: trait Serializable, trait Product, trait Equals, class Object, trait Matchable, class Any
Attributes
- Companion: class
- Supertypes: trait Product, trait Mirror, class Object, trait Matchable, class Any
- Self type: TimingInfo.type