LLMClient implementation for the OpenRouter unified model gateway.
Sends requests to the OpenRouter REST API through its OpenAI-compatible /chat/completions endpoint. The client accepts OpenAIConfig; there is no separate OpenRouterConfig. LLMConnect detects OpenRouter by checking whether baseUrl contains "openrouter.ai" and routes accordingly.
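The detection rule described above can be sketched as a plain substring check. This is only an illustration: the object and method names are hypothetical, and the case-sensitive comparison is an assumption rather than confirmed LLMConnect behaviour.

```scala
object OpenRouterDetection {
  // Hypothetical helper (not part of llm4s): mirrors the documented rule that
  // a base URL containing "openrouter.ai" selects the OpenRouter client.
  // Assumption: the comparison is case-sensitive.
  def isOpenRouter(baseUrl: String): Boolean =
    baseUrl.contains("openrouter.ai")
}
```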
== Required headers ==
OpenRouter's usage policy requires two additional headers on every request. This client sends them automatically:
HTTP-Referer: https://github.com/llm4s/llm4s
X-Title: LLM4S
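As a rough sketch of what attaching these headers could look like, the example below builds a request with the JDK's java.net.http API. The object name, the build method, and the use of a Bearer Authorization header are assumptions made for illustration; the actual client may use a different HTTP library.

```scala
import java.net.URI
import java.net.http.HttpRequest

object OpenRouterHeaders {
  // The two headers named in the documentation above.
  val required: Map[String, String] = Map(
    "HTTP-Referer" -> "https://github.com/llm4s/llm4s",
    "X-Title"      -> "LLM4S"
  )

  // Hypothetical builder: adds the required headers to a JSON POST request.
  def build(url: String, apiKey: String, jsonBody: String): HttpRequest = {
    val base = HttpRequest
      .newBuilder(URI.create(url))
      .header("Authorization", s"Bearer $apiKey") // assumed auth scheme
      .header("Content-Type", "application/json")
    required
      .foldLeft(base) { case (builder, (name, value)) => builder.header(name, value) }
      .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
      .build()
  }
}
```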
== Reasoning / extended thinking ==
Model type is detected by substring matching on the lower-cased model name:
All other models → reasoning configuration is silently omitted.
The thinking budget is clamped to [1024, maxTokens - 1] for Anthropic models, matching the Anthropic API constraint.
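The clamping rule can be illustrated with a small helper. The names here are hypothetical, and the behaviour when maxTokens is 1024 or lower (the upper bound wins) is an assumption; the documentation does not say how that case is handled.

```scala
object ThinkingBudget {
  // Lower bound stated in the documentation above.
  val MinBudget = 1024

  // Clamp the requested budget into [1024, maxTokens - 1].
  // Assumption: if maxTokens - 1 is below 1024, the upper bound takes priority.
  def clamp(requested: Int, maxTokens: Int): Int =
    math.min(math.max(requested, MinBudget), maxTokens - 1)
}
```

For example, with maxTokens = 4096 a request for 100 tokens becomes 1024, a request for 10000 becomes 4095, and a request for 2000 is left unchanged.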
== Thinking content ==
Extended thinking text is extracted from whichever field the model populates: message.thinking, message.reasoning, or choice.thinking (checked in that order).
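The lookup order can be expressed as a chain of Option fallbacks. The case classes below are simplified stand-ins for the parsed response, not the client's real types, and treating an absent field as None is an assumption.

```scala
object ThinkingExtraction {
  // Simplified stand-ins for the parsed response (hypothetical shapes).
  final case class Message(thinking: Option[String], reasoning: Option[String])
  final case class Choice(message: Message, thinking: Option[String])

  // Returns the first populated field, checked in the documented order:
  // message.thinking, then message.reasoning, then choice.thinking.
  def extract(choice: Choice): Option[String] =
    choice.message.thinking
      .orElse(choice.message.reasoning)
      .orElse(choice.thinking)
}
```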
Value parameters
config
OpenAIConfig whose baseUrl must contain "openrouter.ai"; carries the API key and model name.
metrics
Receives per-call latency and token-usage events. Defaults to MetricsCollector.noop.
Executes a blocking completion request and returns the full response.
Sends the conversation to the LLM and waits for the complete response. Use when you need the entire response at once or when streaming is not required.
Value parameters
conversation
conversation history including system, user, assistant, and tool messages
options
configuration including temperature, max tokens, tools, etc. (default: CompletionOptions())
Attributes
Returns
Right(Completion) with the model's response, or Left(LLMError) on failure
Returns the maximum context window size supported by this model in tokens.
The context window is the total tokens (prompt + completion) the model can process in a single request, including all conversation messages and the generated response.
Attributes
Returns
total context window size in tokens (e.g., 4096, 8192, 128000)
Returns the number of tokens reserved for the model's completion response.
This value is subtracted from the context window when calculating available tokens for prompts. Corresponds to the max_tokens or completion token limit configured for the model.
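The subtraction described above amounts to the following arithmetic. The helper name is hypothetical, and flooring the result at zero is an assumption added to keep the sketch safe for misconfigured values.

```scala
object PromptBudget {
  // Tokens left for the prompt once the completion reserve is set aside.
  // Assumption: the result is floored at zero rather than going negative.
  def availableForPrompt(contextWindow: Int, reserveCompletion: Int): Int =
    math.max(0, contextWindow - reserveCompletion)
}
```

With a 128000-token context window and 4096 tokens reserved for the completion, 123904 tokens remain for the prompt.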
Executes a streaming completion request, invoking a callback for each chunk as it arrives.
Streams the response incrementally, calling onChunk for each token/chunk received. Enables real-time display of responses. Returns the final accumulated completion on success.
Value parameters
conversation
conversation history including system, user, assistant, and tool messages
onChunk
callback invoked for each chunk; it is called synchronously, so avoid blocking operations inside it
options
configuration including temperature, max tokens, tools, etc. (default: CompletionOptions())
Attributes
Returns
Right(Completion) with the complete accumulated response, or Left(LLMError) on failure
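To illustrate the callback contract, the sketch below simulates a streaming call with stand-in types. None of these definitions are the llm4s types; Chunk, Completion, LLMError, and the streamComplete signature are hypothetical and exist only to show that onChunk runs once per chunk, in order, before the accumulated result is returned.

```scala
object StreamingSketch {
  // Stand-in types for illustration only (not the llm4s definitions).
  final case class Chunk(content: String)
  final case class Completion(content: String)
  final case class LLMError(message: String)

  // Simulated streaming call: hands each chunk to onChunk synchronously,
  // then returns the accumulated text as the final completion.
  def streamComplete(
      chunks: Seq[Chunk],
      onChunk: Chunk => Unit
  ): Either[LLMError, Completion] = {
    val accumulated = new StringBuilder
    chunks.foreach { chunk =>
      onChunk(chunk) // runs on the caller's thread; keep it fast
      accumulated.append(chunk.content)
    }
    Right(Completion(accumulated.toString))
  }
}
```

Because the callback runs synchronously, a slow onChunk delays delivery of every later chunk; handing work to a queue is one way to keep it cheap.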
Releases resources and closes connections to the LLM provider.
Call when the client is no longer needed. After calling close(), the client should not be used. Default implementation is a no-op; override if managing resources like connections or thread pools.
Validates client configuration and connectivity to the LLM provider.
May perform checks such as verifying API credentials, testing connectivity, and validating configuration. Default implementation returns success; override for provider-specific validation.
Attributes
Returns
Right(()) if validation succeeds, Left(LLMError) with details on failure
protected def withMetrics[A](provider: String, model: String, operation: => Result[A], extractUsage: A => Option[TokenUsage], extractCost: A => Option[Double]): Result[A]
Executes operation and records metrics for the call.
Latency and outcome (success or classified error) are recorded for every call, regardless of result. Token counts and cost are recorded only on success. A Left result instead emits an org.llm4s.metrics.Outcome.Error event whose kind is derived from the org.llm4s.error.LLMError subtype via ErrorKind.fromLLMError.
Value parameters
extractCost
Extracts the pre-computed cost (USD) from a successful result; return None to skip cost recording.
extractUsage
Extracts prompt/completion token counts from a successful result; return None to skip token recording.
model
Model identifier forwarded to the collector.
operation
The LLM call to time and observe.
provider
Provider label forwarded to the collector (e.g. "openai").
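The general shape of such a wrapper might look like the sketch below. It is not the library's implementation: the Collector trait and its methods, the simplified LLMError and TokenUsage types, and the explicit collector parameter are all assumptions introduced to keep the example self-contained. It shows only the documented behaviour that latency and outcome are always recorded, while tokens and cost are recorded on success.

```scala
object MetricsSketch {
  // Simplified stand-ins (hypothetical, not the llm4s definitions).
  final case class TokenUsage(promptTokens: Int, completionTokens: Int)
  final case class LLMError(message: String)
  type Result[A] = Either[LLMError, A]

  // Hypothetical collector interface used only by this sketch.
  trait Collector {
    def recordCall(provider: String, model: String, success: Boolean, latencyNanos: Long): Unit
    def recordTokens(provider: String, model: String, usage: TokenUsage): Unit
    def recordCost(provider: String, model: String, usd: Double): Unit
  }

  def withMetrics[A](collector: Collector, provider: String, model: String)(
      operation: => Result[A],
      extractUsage: A => Option[TokenUsage],
      extractCost: A => Option[Double]
  ): Result[A] = {
    val start   = System.nanoTime()
    val result  = operation // by-name parameter: evaluated exactly once, here
    val elapsed = System.nanoTime() - start

    // Latency and outcome are recorded for every call.
    collector.recordCall(provider, model, result.isRight, elapsed)

    // Tokens and cost are recorded only when the call succeeded.
    result.foreach { value =>
      extractUsage(value).foreach(collector.recordTokens(provider, model, _))
      extractCost(value).foreach(collector.recordCost(provider, model, _))
    }
    result
  }
}
```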