DatasetManager

org.llm4s.rag.benchmark.DatasetManager
See the DatasetManager companion object.

Manages loading and processing of benchmark datasets.

Supports multiple dataset formats:

  • RAGBench (Hugging Face JSONL format)
  • MultiHop-RAG (JSON format)
  • Custom JSON format (TestDataset format)

Attributes

Example
val manager = DatasetManager()
// Load a RAGBench dataset
val dataset = manager.loadRAGBench("data/datasets/ragbench/test.jsonl")
// Load a reproducible 100-sample subset (loadRAGBench takes only a path, so use load)
val subset = manager.load("data/datasets/ragbench/test.jsonl", subsetSize = Some(100), seed = 42L)
Companion
object
Supertypes
class Object
trait Matchable
class Any

Members list

Value members

Concrete methods

def load(path: String, subsetSize: Option[Int], seed: Long): Result[TestDataset]

Load a dataset from a JSON/JSONL file, auto-detecting format.
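
One plausible way to auto-detect the format is sketched below. It is a hypothetical heuristic, not DatasetManager's actual logic. The DatasetFormat type and the detectFormat function are illustrative names. The rules are assumptions based on the supported formats listed above: a .jsonl extension means RAGBench, a top-level "data" key means MultiHop-RAG, and anything else is treated as custom TestDataset JSON.

```scala
// Hypothetical sketch: a heuristic for guessing a dataset file's format.
// Not DatasetManager's real implementation; all names are illustrative.
sealed trait DatasetFormat
case object RagBenchJsonl extends DatasetFormat         // RAGBench: one JSON object per line
case object MultiHopRagJson extends DatasetFormat       // MultiHop-RAG: { "data": [ ... ] }
case object CustomTestDatasetJson extends DatasetFormat // custom TestDataset JSON

// Matches content whose first top-level key is "data" (surrounding whitespace allowed).
val DataKeyFirst = """(?s)\s*\{\s*"data"\s*:.*""".r

def detectFormat(path: String, content: String): DatasetFormat =
  if (path.toLowerCase.endsWith(".jsonl")) RagBenchJsonl
  else if (DataKeyFirst.pattern.matcher(content).matches()) MultiHopRagJson
  else CustomTestDatasetJson
```

The heuristic is deliberately crude. For example, a custom file whose first key happens to be "data" would be misclassified as MultiHop-RAG.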

Value parameters

path

Path to the dataset file

subsetSize

Optional limit on samples to load

seed

Random seed for subset selection

Attributes

Returns

Loaded dataset or error
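
The sketch below shows seeded subset selection, assuming the loader shuffles the samples with scala.util.Random and keeps the first n. selectSubset is a hypothetical helper, and the library may pick samples differently.

```scala
import scala.util.Random

// Hypothetical sketch of seeded subset selection (not the library's code).
// With the same seed, the same subset is returned on every run.
def selectSubset[A](samples: Seq[A], subsetSize: Option[Int], seed: Long): Seq[A] =
  subsetSize match {
    case Some(n) if n < samples.size =>
      new Random(seed).shuffle(samples).take(n) // shuffle deterministically, keep first n
    case _ =>
      samples // no limit, or limit >= dataset size: keep everything
  }
```

A seeded Random produces the same sequence every time. Fixing the seed therefore keeps the subset, and any benchmark numbers computed on it, reproducible across runs.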

def loadDocumentsFromDirectory(path: String, extensions: Set[String]): Result[Seq[(String, String)]]

Load documents from a directory for indexing.

Value parameters

extensions

File extensions to include (default: .txt, .md)

path

Directory path

Attributes

Returns

Sequence of (filename, content) pairs or error
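
The following is a minimal java.nio sketch of the behavior described above. It rests on several assumptions: the scan is non-recursive, extensions are matched as filename suffixes including the dot, files are read as UTF-8, and errors become a String message, whereas the real method returns Result. loadDocs is a hypothetical name.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths}
import scala.jdk.CollectionConverters._
import scala.util.Try

// Hypothetical sketch: read every regular file in `path` (non-recursive) whose
// name ends with one of `extensions`, returning (filename, content) pairs.
def loadDocs(path: String, extensions: Set[String] = Set(".txt", ".md")): Either[String, Seq[(String, String)]] = {
  val dir = Paths.get(path)
  if (!Files.isDirectory(dir)) Left(s"Not a directory: $path")
  else
    Try {
      val stream = Files.list(dir)
      try {
        stream.iterator().asScala
          .filter(p => Files.isRegularFile(p))
          .filter(p => extensions.exists(ext => p.getFileName.toString.endsWith(ext)))
          .toSeq
          .sortBy(_.getFileName.toString) // deterministic order
          .map(p => p.getFileName.toString -> new String(Files.readAllBytes(p), StandardCharsets.UTF_8))
      } finally stream.close() // Files.list holds an open directory handle
    }.toEither.left.map(e => s"Failed to read $path: ${e.getMessage}")
}
```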

def loadMultiHopRAG(path: String): Result[TestDataset]

Load MultiHop-RAG dataset.

MultiHop-RAG format:

{
  "data": [
    { "question": "...", "answer": "...", "supporting_facts": [...] }
  ]
}
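
Once the JSON is decoded, each entry in "data" must become an evaluation sample. The sketch below shows that mapping under two assumptions. MultiHopRecord mirrors the fields above, with supporting facts treated as plain strings. EvalSample is an illustrative stand-in for the library's sample type, whose real fields are not documented here.

```scala
// Hypothetical sketch: mapping decoded MultiHop-RAG records onto evaluation samples.
// MultiHopRecord and EvalSample are illustrative types, not the library's.
final case class MultiHopRecord(question: String, answer: String, supportingFacts: Seq[String])
final case class EvalSample(question: String, groundTruth: String, contexts: Seq[String])

def fromMultiHop(data: Seq[MultiHopRecord]): Either[String, Seq[EvalSample]] =
  // Reject the first record lacking a question or answer instead of silently skipping it.
  data.zipWithIndex.collectFirst {
    case (r, i) if r.question.trim.isEmpty || r.answer.trim.isEmpty => i
  } match {
    case Some(i) => Left(s"Record $i is missing a question or answer")
    case None    => Right(data.map(r => EvalSample(r.question, r.answer, r.supportingFacts)))
  }
```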

Value parameters

path

Path to JSON file

Attributes

Returns

TestDataset or error

def loadRAGBench(path: String): Result[TestDataset]

Load RAGBench dataset (Hugging Face JSONL format).

RAGBench format (shown expanded; in the JSONL file each record occupies a single line):

{
  "question": "...",
  "response": "...",
  "documents": ["...", "..."],
  "answer": "..."   // ground truth
}
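
The main question for this format is which field serves as the reference answer. The sketch below assumes "answer" is the ground truth, as annotated above, and that "documents" supply the contexts. It does not use "response" as the reference. RagBenchRow and EvalSample are illustrative types, not the library's.

```scala
// Hypothetical sketch: mapping one decoded RAGBench row onto an evaluation sample.
// Assumption: "answer" is the reference; "response" is not used as ground truth.
final case class RagBenchRow(question: String, response: String, documents: Seq[String], answer: String)
final case class EvalSample(question: String, groundTruth: String, contexts: Seq[String])

def fromRagBench(row: RagBenchRow): EvalSample =
  EvalSample(question = row.question, groundTruth = row.answer, contexts = row.documents)
```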

Value parameters

path

Path to JSONL file

Attributes

Returns

TestDataset or error
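
Because the file is JSONL, each non-blank line can be decoded on its own. The sketch below shows that loop, with a caller-supplied parseLine function standing in for a real JSON decoder, since the library's parser is not shown here. readJsonl is a hypothetical name. In this sketch, the first malformed line stops loading and is reported with its 1-based line number.

```scala
import scala.io.Source
import scala.util.Using

// Hypothetical sketch of JSONL loading: every non-blank line is one record.
// `parseLine` stands in for a real JSON decoder (assumption). The first
// malformed line aborts loading, reported with its 1-based line number.
def readJsonl[A](path: String)(parseLine: String => Either[String, A]): Either[String, Vector[A]] =
  Using(Source.fromFile(path, "UTF-8"))(_.getLines().toVector) // file is closed afterwards
    .toEither
    .left.map(e => s"Cannot read $path: ${e.getMessage}")
    .flatMap { lines =>
      lines.zipWithIndex
        .filter { case (line, _) => line.trim.nonEmpty } // skip blank lines
        .foldLeft[Either[String, Vector[A]]](Right(Vector.empty)) {
          case (Left(err), _) => Left(err) // keep the first error
          case (Right(acc), (line, idx)) =>
            parseLine(line).map(acc :+ _).left.map(msg => s"Line ${idx + 1}: $msg")
        }
    }
```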