DatasetStore

org.llm4s.eval.dataset.DatasetStore
trait DatasetStore[F[_]]

Algebra for managing labelled evaluation datasets.

The effect type F[_] is left unconstrained so that implementations can range from the trivial cats.Id (synchronous, in-memory) to any async effect (e.g. Future, IO) without requiring cats-effect as a core dependency.

All methods use ujson.Value for both input and output, making the store format-agnostic; callers handle (de)serialisation at their own boundary.

Type parameters

F

effect wrapper (e.g. cats.Id, Future, IO)
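
The effect-polymorphic design described above can be sketched with a cut-down algebra. This is an illustration only: MiniStore, InMemoryStore and FutureStore are hypothetical names, the local Id alias merely mirrors cats.Id, and plain String identifiers stand in for DatasetId.

```scala
import scala.collection.mutable
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object EffectSketch {
  // Local stand-in mirroring cats.Id: the "effect" is just the value itself.
  type Id[A] = A

  // Hypothetical cut-down algebra; the real DatasetStore has many more methods.
  trait MiniStore[F[_]] {
    def create(name: String): F[String]
  }

  // Synchronous in-memory interpreter (not thread-safe; illustration only).
  final class InMemoryStore extends MiniStore[Id] {
    private val names = mutable.Map.empty[String, String]
    def create(name: String): Id[String] = {
      val id = s"ds-${names.size + 1}"
      names(id) = name
      id
    }
  }

  // Asynchronous interpreter that runs the synchronous one inside a Future.
  final class FutureStore(underlying: InMemoryStore)(implicit ec: ExecutionContext)
      extends MiniStore[Future] {
    def create(name: String): Future[String] = Future(underlying.create(name))
  }

  def main(args: Array[String]): Unit = {
    import ExecutionContext.Implicits.global
    val sync: String = new InMemoryStore().create("qa-pairs")
    val async: String =
      Await.result(new FutureStore(new InMemoryStore()).create("qa-pairs"), 5.seconds)
    println(s"$sync $async")
  }
}
```

With Id the caller receives a plain String, while with Future the same method signature yields a Future[String]; neither interpreter needs cats-effect.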

Attributes

Supertypes
class Object
trait Matchable
class Any

Value members

Abstract methods

def addExample(datasetId: DatasetId, input: Value, referenceOutput: Option[Value], tags: Set[String], metadata: Map[String, String]): F[ExampleId]

Appends a new Example to the given dataset and returns its ExampleId.

Value parameters

datasetId

target dataset

input

model input value

metadata

arbitrary string annotations

referenceOutput

optional ground-truth output

tags

example-level labels for filtering

def create(name: String, description: String, inputSchema: Option[Value], outputSchema: Option[Value], tags: Set[String]): F[DatasetId]

Creates a new dataset and returns its generated DatasetId.

Value parameters

description

purpose of the dataset

inputSchema

optional JSON Schema for input validation

name

human-readable name

outputSchema

optional JSON Schema for output validation

tags

dataset-level labels

def createSnapshot(datasetId: DatasetId): F[SnapshotId]

Creates an immutable snapshot of the dataset's current examples.

The snapshot is unaffected by subsequent calls to addExample or delete.

Attributes

Returns

the SnapshotId of the newly created snapshot
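
One way to obtain this isolation from later addExample calls is to keep examples in an immutable collection and capture the current reference when the snapshot is taken. The sketch below assumes exactly that; TinyStore, its Int snapshot IDs and its String examples are hypothetical stand-ins, not the library's implementation.

```scala
object SnapshotSketch {
  // Hypothetical stand-in; the real store keeps Example[Value, Value] records.
  final class TinyStore {
    private var examples: Vector[String] = Vector.empty
    private var snapshots: Map[Int, Vector[String]] = Map.empty

    def addExample(input: String): Unit = examples = examples :+ input

    // Capturing the immutable Vector is enough: later appends build a new
    // Vector and leave the captured one untouched.
    def createSnapshot(): Int = {
      val id = snapshots.size + 1
      snapshots = snapshots.updated(id, examples)
      id
    }

    def getSnapshot(id: Int): Option[Vector[String]] = snapshots.get(id)
    def current: Vector[String] = examples
  }

  def main(args: Array[String]): Unit = {
    val store = new TinyStore
    store.addExample("q1")
    val snap = store.createSnapshot()
    store.addExample("q2") // added after the snapshot was taken
    println(store.getSnapshot(snap)) // the snapshot still holds only "q1"
    println(store.current)           // the live dataset holds "q1" and "q2"
  }
}
```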

def delete(datasetId: DatasetId): F[Boolean]

Deletes the dataset along with all its examples and snapshots.

Attributes

Returns

true if the dataset existed and was removed, false if it was not found

def exportJsonl(datasetId: DatasetId): F[Iterator[String]]

Exports all examples in the dataset as a JSONL iterator.

Each element is a compact JSON string produced by JsonlCodec.encode. The iterator reflects the dataset state at the moment this method is called; concurrent mutations are not reflected in the returned iterator.
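
On the consumer side, the returned iterator can be streamed straight to a .jsonl file, one element per line. The following is a minimal sketch: writeJsonl is a hypothetical helper, and the hand-written iterator (including its "input" field name) stands in for what exportJsonl would return under cats.Id.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

object ExportSketch {
  // Writes one JSON document per line and returns the number of lines written.
  def writeJsonl(lines: Iterator[String], target: Path): Int = {
    val writer = Files.newBufferedWriter(target, StandardCharsets.UTF_8)
    try {
      var count = 0
      lines.foreach { line =>
        writer.write(line)
        writer.newLine()
        count += 1
      }
      count
    } finally writer.close()
  }

  def main(args: Array[String]): Unit = {
    // Stand-in for the iterator returned by exportJsonl.
    val exported = Iterator("""{"input":"2+2"}""", """{"input":"3+3"}""")
    val target = Files.createTempFile("dataset", ".jsonl")
    val written = writeJsonl(exported, target)
    println(s"wrote $written lines to $target")
  }
}
```

Because an Iterator can be traversed only once, the helper consumes it in a single pass and never materialises the full export in memory.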

def getDataset(datasetId: DatasetId): F[Option[Dataset[Value, Value]]]

Returns the full Dataset record for the given ID, or None if it does not exist. The returned dataset's examples field reflects the current state of all examples.

def getExamples(datasetId: DatasetId, selector: ExampleSelector): F[List[Example[Value, Value]]]

Retrieves examples from the dataset according to the given ExampleSelector.

Value parameters

datasetId

target dataset

selector

one of All, ByTags, or ByIds (see ExampleSelector)
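
The three selector cases can be pictured as a small sealed hierarchy that is pattern-matched when filtering. The sketch below is hypothetical: Selector, Ex and their fields are stand-ins rather than the real ExampleSelector and Example definitions, and the ByTags rule (match on at least one shared tag) is an assumption that this page does not confirm.

```scala
object SelectorSketch {
  // Hypothetical stand-ins; the real field names and types may differ.
  sealed trait Selector
  case object All extends Selector
  final case class ByTags(tags: Set[String]) extends Selector
  final case class ByIds(ids: Set[String]) extends Selector

  final case class Ex(id: String, tags: Set[String])

  def select(examples: List[Ex], selector: Selector): List[Ex] = selector match {
    case All => examples
    // Assumption: an example matches when it carries at least one requested tag.
    case ByTags(wanted) => examples.filter(e => e.tags.exists(wanted.contains))
    case ByIds(ids) => examples.filter(e => ids.contains(e.id))
  }

  def main(args: Array[String]): Unit = {
    val examples = List(
      Ex("e1", Set("math")),
      Ex("e2", Set("code", "hard")),
      Ex("e3", Set.empty)
    )
    println(select(examples, All).map(_.id))
    println(select(examples, ByTags(Set("hard", "math"))).map(_.id))
    println(select(examples, ByIds(Set("e3"))).map(_.id))
  }
}
```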

def getSnapshot(snapshotId: SnapshotId): F[Option[DatasetSnapshot[Value, Value]]]

Retrieves a previously created DatasetSnapshot, or None if the ID is unknown (e.g. because the originating dataset was deleted).

def importJsonl(datasetId: DatasetId, lines: Iterator[String]): F[(Int, Int)]

Imports examples from a JSONL stream into the dataset.

Each line is decoded with JsonlCodec.decode; lines that fail to parse are silently skipped.

Value parameters

datasetId

target dataset

lines

iterator of raw JSONL strings (one JSON object per element)

Attributes

Returns

(imported, skipped) counts
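
Because malformed lines are skipped silently, the returned pair is the only signal that input was dropped. The sketch below shows how such counts could be accumulated in one pass; the decode function is a toy stand-in for JsonlCodec.decode (it only checks that a line looks like a JSON object), and importLines is a hypothetical helper.

```scala
object ImportSketch {
  // Toy stand-in for JsonlCodec.decode: accepts only lines shaped like a JSON object.
  def decode(line: String): Option[String] = {
    val trimmed = line.trim
    if (trimmed.startsWith("{") && trimmed.endsWith("}")) Some(trimmed) else None
  }

  // Returns the decoded lines together with the (imported, skipped) counts.
  def importLines(lines: Iterator[String]): (List[String], (Int, Int)) = {
    val (kept, imported, skipped) =
      lines.foldLeft((List.empty[String], 0, 0)) { case ((acc, ok, bad), line) =>
        decode(line) match {
          case Some(example) => (example :: acc, ok + 1, bad)
          case None          => (acc, ok, bad + 1)
        }
      }
    (kept.reverse, (imported, skipped))
  }

  def main(args: Array[String]): Unit = {
    val input = Iterator("""{"input":"2+2"}""", "not json", """{"input":"3+3"}""")
    val (_, (imported, skipped)) = importLines(input)
    // Callers may want to treat skipped > 0 as a warning, since no error is raised.
    println(s"imported=$imported skipped=$skipped")
  }
}
```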

def listDatasets(): F[List[Dataset[Value, Value]]]

Returns all datasets currently held in the store (examples field included).
