
DatasetManager

org.llm4s.rag.benchmark.DatasetManager

See the DatasetManager companion object

class DatasetManager

Manages loading and processing of benchmark datasets.

Supports multiple dataset formats:

RAGBench (Hugging Face JSONL format)
MultiHop-RAG (JSON format)
Custom JSON format (TestDataset format)

Attributes

Example

val manager = DatasetManager()
// Load RAGBench dataset
val dataset = manager.loadRAGBench("data/datasets/ragbench/test.jsonl")
// Load with subset
val subset = manager.loadRAGBench("data/datasets/ragbench/test.jsonl", Some(100))

Companion

object

Supertypes

class Object

trait Matchable

class Any

Members list

Value members

Concrete methods

Load a dataset from a JSON/JSONL file, auto-detecting format.

Value parameters

path: Path to the dataset file
seed: Random seed for subset selection
subsetSize: Optional limit on samples to load

Attributes

Returns: Loaded dataset or error
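The seed and subsetSize parameters suggest that subset selection is reproducible. A minimal sketch of how such a selection could work, assuming a seeded shuffle followed by a truncation; the helper selectSubset, its signature, and the Long seed type are hypothetical and not part of the documented API:

```scala
import scala.util.Random

// Hypothetical helper, not part of DatasetManager: picks a reproducible subset.
def selectSubset[A](samples: Seq[A], subsetSize: Option[Int], seed: Long): Seq[A] =
  subsetSize match {
    case Some(n) if n < samples.size =>
      // A Random created with the same seed shuffles identically on every run.
      new Random(seed).shuffle(samples).take(n)
    case _ =>
      // No limit given, or the limit is at least the dataset size: keep everything.
      samples
  }
```

Calling selectSubset twice with the same seed yields the same samples, which keeps benchmark runs comparable across executions.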

Load documents from a directory for indexing.

Value parameters

extensions: File extensions to include (default: .txt, .md)
path: Directory path

Attributes

Returns: Sequence of (filename, content) pairs or error
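The behaviour described above can be approximated with the standard library alone. The following is a sketch under stated assumptions, not the library's implementation: the name readDocuments, the String error type, the non-recursive directory listing, and the sorting by filename are all hypothetical choices.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths}
import scala.jdk.CollectionConverters._
import scala.util.{Try, Using}

// Hypothetical helper: reads every matching file in a directory (non-recursive).
def readDocuments(
  dir: String,
  extensions: Set[String] = Set(".txt", ".md")
): Either[String, Seq[(String, String)]] =
  Try {
    // Files.list returns a stream that must be closed; Using.resource handles that.
    Using.resource(Files.list(Paths.get(dir))) { stream =>
      stream.iterator().asScala
        .filter(p => Files.isRegularFile(p))
        .filter(p => extensions.exists(ext => p.getFileName.toString.endsWith(ext)))
        .map(p => p.getFileName.toString -> new String(Files.readAllBytes(p), StandardCharsets.UTF_8))
        .toList           // force evaluation before the stream is closed
        .sortBy(_._1)     // deterministic order by filename
    }
  }.toEither.left.map(e => s"Failed to load documents from $dir: ${e.getMessage}")
```

Returning an Either keeps I/O failures, such as a missing directory, as values instead of exceptions, in line with the "or error" wording of the return description.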

Load MultiHop-RAG dataset.

MultiHop-RAG format:

{
  "data": [
    {
      "question": "...",
      "answer": "...",
      "supporting_facts": [...]
    }
  ]
}

Value parameters

path: Path to JSON file

Attributes

Returns: TestDataset or error

Load RAGBench dataset (Hugging Face JSONL format).

RAGBench format (one JSON object per line in the actual JSONL file; expanded here for readability):

{
  "question": "...",
  "response": "...",
  "documents": ["...", "..."],
  "answer": "..."  // ground truth
}

Value parameters

path: Path to JSONL file

Attributes

Returns: TestDataset or error
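Because JSONL stores one record per line, a loader can stop reading as soon as a sample limit is reached. The sketch below illustrates that idea only; readJsonl, its parameters, and the String error type are hypothetical, and JSON decoding is left to a caller-supplied function rather than tied to a specific JSON library.

```scala
import scala.io.Source
import scala.util.{Try, Using}

// Hypothetical helper: reads a JSONL file line by line, skipping blank lines.
// parseLine is supplied by the caller (for example, a JSON decoder of their choice).
def readJsonl[A](path: String, limit: Option[Int] = None)(
  parseLine: String => A
): Either[String, Vector[A]] =
  Try {
    Using.resource(Source.fromFile(path, "UTF-8")) { src =>
      val lines = src.getLines().map(_.trim).filter(_.nonEmpty)
      // Apply the optional limit lazily, so lines past the limit are never parsed.
      limit.fold(lines)(lines.take).map(parseLine).toVector
    }
  }.toEither.left.map(e => s"Failed to read $path: ${e.getMessage}")
```

This variant takes the first records in file order; it does not shuffle, so it is not a substitute for seeded subset selection.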
