Minimal algebra for collecting metrics about LLM operations.
The core metrics model tracks request latency, request outcome, token usage, estimated cost, retry attempts, circuit-breaker transitions, generic errors, and image generation usage. Concrete implementations decide where these observations are stored or exported.
Implementations should be safe: failures must not propagate to callers. All methods should catch and log errors internally without throwing.
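The safety requirement above can be met with a thin wrapper around any implementation. The following is a minimal sketch, not the library's actual code: `CostMetrics` is a hypothetical stand-in for the real trait that exposes only `recordCost`, and `SafeCostMetrics` and `logError` are names invented for illustration.

```scala
import scala.util.control.NonFatal

// Hypothetical stand-in for the metrics trait; only the recordCost
// signature is taken from the documentation.
trait CostMetrics {
  def recordCost(provider: String, model: String, costUsd: Double): Unit
}

// Hypothetical wrapper: delegates to `underlying`, logging and swallowing
// any non-fatal failure so that it never reaches the caller.
final class SafeCostMetrics(underlying: CostMetrics, logError: Throwable => Unit)
    extends CostMetrics {

  private def safely(op: => Unit): Unit =
    try op
    catch {
      case NonFatal(e) =>
        // A failing logger must not escape either.
        try logError(e)
        catch { case NonFatal(_) => () }
    }

  override def recordCost(provider: String, model: String, costUsd: Double): Unit =
    safely(underlying.recordCost(provider, model, costUsd))
}
```

Fatal errors such as `OutOfMemoryError` are deliberately not matched by `NonFatal` and still propagate. Whether the real implementations make the same choice is an assumption.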
Example usage:
import scala.concurrent.duration.{ FiniteDuration, NANOSECONDS }

val startNanos = System.nanoTime()
client.complete(conversation) match {
  case Right(completion) =>
    val duration = FiniteDuration(System.nanoTime() - startNanos, NANOSECONDS)
    metrics.observeRequest(provider, model, Outcome.Success, duration)
    // Token usage is optional; record it only when it is present.
    completion.usage.foreach { u =>
      metrics.addTokens(provider, model, u.promptTokens, u.completionTokens)
    }
  case Left(error) =>
    val duration = FiniteDuration(System.nanoTime() - startNanos, NANOSECONDS)
    // Map the error to a stable kind before recording the outcome.
    val errorKind = ErrorKind.fromLLMError(error)
    metrics.observeRequest(provider, model, Outcome.Error(errorKind), duration)
}
model
Model name (e.g., "gpt-4o", "claude-3-5-sonnet-latest")
outcome
Success or Error with stable error kind
provider
Provider name (e.g., "openai", "anthropic", "ollama")
def recordCost(provider: String, model: String, costUsd: Double): Unit
Record estimated cost in USD.
Use this after pricing metadata is available for a request or image generation operation. Implementations should treat the value as an additive counter.
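As an illustration of the additive-counter semantics described above, here is a minimal in-memory sketch. `InMemoryCostCounter` and `totalCostUsd` are hypothetical names, not part of the documented API; only the `recordCost` signature comes from the documentation.

```scala
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.atomic.DoubleAdder

// Hypothetical in-memory collector that accumulates cost per (provider, model).
final class InMemoryCostCounter {
  private val totals = new ConcurrentHashMap[(String, String), DoubleAdder]()

  // Each call adds to the running total; it never overwrites it.
  def recordCost(provider: String, model: String, costUsd: Double): Unit =
    totals.computeIfAbsent((provider, model), _ => new DoubleAdder()).add(costUsd)

  // Hypothetical read accessor, added only so the total can be inspected.
  def totalCostUsd(provider: String, model: String): Double =
    Option(totals.get((provider, model))).map(_.sum()).getOrElse(0.0)
}
```

With this sketch, recording 0.25 and then 0.5 for the same provider and model leaves a total of 0.75, and a pair that was never recorded reads as 0.0.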