org.llm4s.rag.benchmark.DatasetManager
See the DatasetManager companion object
Manages loading and processing of benchmark datasets.
Supports multiple dataset formats:
- RAGBench (Hugging Face JSONL format)
- MultiHop-RAG (JSON format)
- Custom JSON format (TestDataset format)
Example
val manager = DatasetManager()
// Load RAGBench dataset
val dataset = manager.loadRAGBench("data/datasets/ragbench/test.jsonl")
// Load with subset
val subset = manager.loadRAGBench("data/datasets/ragbench/test.jsonl", Some(100))
Companion
object
Supertypes
class Object
trait Matchable
class Any
Members list
Load a dataset from a JSON/JSONL file, auto-detecting format.
Value parameters
- path: Path to the dataset file
- seed: Random seed for subset selection
- subsetSize: Optional limit on samples to load
Returns
Loaded dataset or error
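The seed and subsetSize parameters suggest that a subset is drawn at random but reproducibly. The page does not say how the selection is implemented, so the following is only a minimal sketch of one common approach: shuffle with a seeded generator and keep the first subsetSize samples. The helper name selectSubset, its generic signature, and the Long seed type are assumptions made for illustration and are not part of DatasetManager.

```scala
import scala.util.Random

// Hypothetical helper (not part of DatasetManager): picks a reproducible
// random subset of samples. The Long seed type is an assumption.
def selectSubset[A](samples: Seq[A], subsetSize: Option[Int], seed: Long): Seq[A] =
  subsetSize match {
    case Some(n) if n < samples.length =>
      // A Random built from a fixed seed shuffles the same way on every run,
      // so the same seed always yields the same subset.
      new Random(seed).shuffle(samples).take(math.max(n, 0))
    case _ =>
      // No limit requested, or the limit covers every sample:
      // keep all samples in their original order.
      samples
  }

@main def subsetDemo(): Unit = {
  val samples = (1 to 10).map(i => s"sample-$i")
  val first  = selectSubset(samples, Some(3), seed = 42L)
  val second = selectSubset(samples, Some(3), seed = 42L)
  println(first == second)                                // true
  println(selectSubset(samples, None, seed = 42L).length) // 10
}
```

Fixing the seed matters for benchmarking: two runs that evaluate different random subsets are not directly comparable.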
Load documents from a directory for indexing.
Value parameters
- extensions: File extensions to include (default: .txt, .md)
- path: Directory path
Returns
Sequence of (filename, content) pairs or error
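To make the shape of the result concrete, here is a rough sketch of a directory loader that returns (filename, content) pairs or an error. It is not the library's implementation: the name loadDocumentsSketch, the Set[String] parameter type, the String error type, the non-recursive listing, and the sort by filename are all assumptions. It uses only the Java and Scala standard libraries (Java 11 or later for Files.readString).

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths}
import scala.jdk.CollectionConverters._
import scala.util.Using

// Hypothetical stand-in for the documented method. The name, parameter
// types and the String error type are assumptions for illustration.
def loadDocumentsSketch(
    path: String,
    extensions: Set[String] = Set(".txt", ".md")
): Either[String, Seq[(String, String)]] = {
  val dir = Paths.get(path)
  if (!Files.isDirectory(dir)) {
    Left(s"Not a directory: $path")
  } else {
    // Using closes the directory stream even if reading a file fails.
    Using(Files.list(dir)) { stream =>
      stream.iterator().asScala
        .filter(p => Files.isRegularFile(p))
        .filter(p => extensions.exists(ext => p.getFileName.toString.endsWith(ext)))
        .toSeq
        .sortBy(_.getFileName.toString) // stable order across platforms
        .map(p => (p.getFileName.toString, Files.readString(p, StandardCharsets.UTF_8)))
    }.toEither.left.map(e => s"Failed to read $path: ${e.getMessage}")
  }
}

@main def loadDocumentsDemo(): Unit = {
  val dir = Files.createTempDirectory("docs")
  Files.writeString(dir.resolve("a.txt"), "alpha")
  Files.writeString(dir.resolve("b.md"), "beta")
  Files.writeString(dir.resolve("c.json"), "{}") // filtered out by extension
  println(loadDocumentsSketch(dir.toString).map(_.map(_._1)))
}
```

Returning an Either keeps I/O failures as values, which matches the "or error" wording used for the return values on this page.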
Load MultiHop-RAG dataset.
MultiHop-RAG format:
{
  "data": [
    { "question": "...", "answer": "...", "supporting_facts": [...] }
  ]
}
Value parameters
- path: Path to JSON file
Returns
TestDataset or error
Load RAGBench dataset (Hugging Face JSONL format).
RAGBench format:
{
  "question": "...",
  "response": "...",
  "documents": ["...", "..."],
  "answer": "..."  // ground truth
}
Value parameters
- path: Path to JSONL file
Returns
TestDataset or error