org.llm4s.llmconnect.provider
Members list
Type members
Classlikes
LLMClient implementation for Anthropic Claude models.
Uses the official Anthropic Java SDK (AnthropicOkHttpClient) for all API calls. SDK exceptions are mapped to the appropriate org.llm4s.error.LLMError subtypes before being returned.
== Message format adaptations ==
The Anthropic Messages API differs from the OpenAI convention in several ways that this client handles transparently:
- Default system prompt: if the conversation contains no SystemMessage, the client injects "You are Claude, a helpful AI assistant." automatically. Supply an explicit SystemMessage to override this.
- Tool results as user messages: the Anthropic API does not accept native tool-result messages in the same turn structure as OpenAI. ToolMessage values are therefore forwarded as user messages with the prefix "[Tool result for <toolCallId>]: ".
- Assistant messages with tool calls are skipped: when an AssistantMessage carries pending tool calls, it is not forwarded; Anthropic infers the assistant turn from the subsequent tool-result user messages.
- Schema sanitisation: OpenAI-specific fields (strict, additionalProperties) are stripped from tool schemas before sending, because Anthropic's API rejects them.
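A minimal sketch of the first two adaptations, using simplified stand-in types (the real llm4s message ADT and client internals differ):

```scala
// Stand-in message types for illustration only.
sealed trait Message
final case class SystemMessage(content: String)                   extends Message
final case class UserMessage(content: String)                     extends Message
final case class ToolMessage(toolCallId: String, content: String) extends Message

val defaultSystem = SystemMessage("You are Claude, a helpful AI assistant.")

// ToolMessage -> user message with the documented prefix.
def adapt(m: Message): Message = m match {
  case ToolMessage(id, content) => UserMessage(s"[Tool result for $id]: " + content)
  case other                    => other
}

// Inject the default system prompt only when none is present.
def withSystem(msgs: List[Message]): List[Message] =
  if (msgs.exists(_.isInstanceOf[SystemMessage])) msgs
  else defaultSystem :: msgs
```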
== Extended thinking ==
When CompletionOptions.reasoning is set, a thinking block is added to the request. The token budget is clamped to [1024, maxTokens - 1] to satisfy the Anthropic API constraint; the effective budget may therefore differ from what was requested.
maxTokens defaults to 2048 when not set in CompletionOptions because the Anthropic API requires the field.
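The clamping rule above, expressed as a pure function. The constants are the ones stated in this documentation; where the real client applies them is an implementation detail not shown here:

```scala
val DefaultMaxTokens  = 2048 // applied when CompletionOptions sets no maxTokens
val MinThinkingBudget = 1024

// Clamp the requested thinking budget to [1024, maxTokens - 1].
def effectiveBudget(requested: Int, maxTokens: Option[Int]): Int = {
  val max = maxTokens.getOrElse(DefaultMaxTokens)
  math.max(MinThinkingBudget, math.min(requested, max - 1))
}
```

So a request for a 500-token budget is raised to 1024, and a 5000-token budget with maxTokens = 3000 is lowered to 2999.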
Value parameters
- config: AnthropicConfig carrying the API key, model name, and base URL.
- metrics: receives per-call latency and token-usage events. Defaults to MetricsCollector.noop.
Attributes
- Companion: object
- Supertypes: trait BaseLifecycleLLMClient, trait MetricsRecording, trait LLMClient, trait AutoCloseable, class Object, trait Matchable, class Any

Attributes
- Companion: class
- Supertypes: class Object, trait Matchable, class Any
- Self type: AnthropicClient.type
Minimal Cohere provider client (v2 scope).
Supported:
- Non-streaming chat completion via the Cohere v2 /chat API.
Intentionally not supported in v2:
- Streaming
- Tool calling
- Embeddings
- Multimodal inputs
Attributes
- Companion: object
- Supertypes: trait BaseLifecycleLLMClient, trait MetricsRecording, trait LLMClient, trait AutoCloseable, class Object, trait Matchable, class Any

Attributes
- Companion: class
- Supertypes: class Object, trait Matchable, class Any
- Self type: CohereClient.type
Centralized cost estimation for LLM completions.
This provides a single source of truth for estimating completion costs based on token usage and model pricing information. It integrates with the ModelRegistry to look up pricing data and applies it to usage statistics.
The estimator:
- Uses existing ModelPricing logic (no duplication)
- Returns None if pricing is unavailable
- Preserves precision of micro-cost values
- Works uniformly across all providers
Example usage:
val usage = TokenUsage(promptTokens = 100, completionTokens = 50, totalTokens = 150)
val cost = CostEstimator.estimate("gpt-4o", usage)
// cost: Some(0.0015) for gpt-4o pricing
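The estimate is the usual per-token arithmetic: prompt tokens times the input price plus completion tokens times the output price. A sketch with stand-in types and placeholder prices (not real gpt-4o rates, which come from the ModelRegistry):

```scala
// Stand-in types; the real llm4s definitions live elsewhere.
final case class TokenUsage(promptTokens: Int, completionTokens: Int, totalTokens: Int)
final case class Pricing(inputPerMTok: Double, outputPerMTok: Double) // USD per million tokens

// Returns None when pricing is unavailable, mirroring the estimator's contract.
def estimate(usage: TokenUsage, pricing: Option[Pricing]): Option[Double] =
  pricing.map { p =>
    usage.promptTokens * p.inputPerMTok / 1e6 +
      usage.completionTokens * p.outputPerMTok / 1e6
  }
```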
Attributes
- Supertypes: class Object, trait Matchable, class Any
- Self type: CostEstimator.type
DeepSeek LLM client implementation using the OpenAI-compatible API.
Provides access to DeepSeek models including DeepSeek-Chat (V3) with 64K context and DeepSeek-Reasoner (R1) with 128K context for advanced reasoning tasks.
Uses the same request/response format as OpenAI, making it compatible with standard OpenAI tooling and client code patterns.
Value parameters
- config: DeepSeek configuration containing API key, model, base URL, and context settings.
- metrics: MetricsCollector for recording request metrics.
Attributes
- Companion: object
- Supertypes: trait BaseLifecycleLLMClient, trait MetricsRecording, trait LLMClient, trait AutoCloseable, class Object, trait Matchable, class Any

Attributes
- Companion: class
- Supertypes: class Object, trait Matchable, class Any
- Self type: DeepSeekClient.type
Text embedding provider interface for generating vector representations.
Provides a unified interface for different embedding services (OpenAI, VoyageAI, Ollama). Each implementation handles provider-specific API calls and response formats.
Text content is the primary input; multimedia content (images, audio) should be processed through the UniversalEncoder façade which handles content extraction before embedding.
== Usage Example ==
val provider: EmbeddingProvider = OpenAIEmbeddingProvider.fromConfig(config)
val request = EmbeddingRequest(
input = Seq("Hello world", "How are you?"),
model = EmbeddingModelName("text-embedding-3-small")
)
val result: Result[EmbeddingResponse] = provider.embed(request)
Attributes
- See also: OpenAIEmbeddingProvider for OpenAI text-embedding models; VoyageAIEmbeddingProvider for VoyageAI embedding models; OllamaEmbeddingProvider for local Ollama embedding models
- Supertypes: class Object, trait Matchable, class Any
LLMClient implementation for Google Gemini models.
Calls the Google Generative AI REST API directly using org.llm4s.http.Llm4sHttpClient.
== Message format ==
Gemini uses a different conversation structure from OpenAI:
- Roles are "user" and "model" (not "user" and "assistant").
- SystemMessage values are sent as a separate systemInstruction field, not inside the contents array.
- Tool results (ToolMessage) are sent as functionResponse parts inside a "user" turn, keyed by function name (not tool-call ID). The function name is resolved from an in-request map built while processing the preceding AssistantMessage.
== Tool call IDs ==
The Gemini API does not return an ID with function-call responses. This client generates a random UUID for each tool call so that the llm4s ToolCall / ToolMessage pairing convention is preserved. These IDs are synthetic and are not round-tripped to Gemini.
== Authentication ==
The API key is appended as a ?key= query parameter on every request (Google's API requires this; it is not sent as a header). The full URL is not logged; only the base URL and model are emitted at DEBUG level.
== Schema sanitisation ==
OpenAI-specific fields (strict, additionalProperties) are stripped from tool schemas before sending, because Gemini's API rejects them.
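The synthetic-ID and name-resolution conventions can be sketched as follows, with simplified stand-in types (the real client builds its map while walking the conversation):

```scala
import java.util.UUID

// Stand-in for the llm4s tool-call pairing; the real types differ.
final case class ToolCall(id: String, name: String)

// Gemini returns no ID with a function call, so one is generated locally.
// These IDs are synthetic and never sent back to Gemini.
def toToolCall(functionName: String): ToolCall =
  ToolCall(UUID.randomUUID().toString, functionName)

// A ToolMessage carries only the synthetic ID; look up the function name so
// the result can be sent as a functionResponse keyed by name.
def nameFor(calls: Seq[ToolCall], toolCallId: String): Option[String] =
  calls.find(_.id == toolCallId).map(_.name)
```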
Value parameters
- config: GeminiConfig with API key, model, and base URL.
- metrics: receives per-call latency and token-usage events. Defaults to MetricsCollector.noop.
Attributes
- Companion: object
- Supertypes: trait BaseLifecycleLLMClient, trait MetricsRecording, trait LLMClient, trait AutoCloseable, class Object, trait Matchable, class Any

Attributes
- Companion: class
- Supertypes: class Object, trait Matchable, class Any
- Self type: GeminiClient.type
Shared HTTP status-code to org.llm4s.error.LLMError mapping used by all HTTP-based LLM provider clients.
Centralises the duplicated pattern of converting non-2xx responses into typed Result errors. Provider-specific error details are extracted from the JSON response body when possible and truncated to a safe length.
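The general shape of such a mapping, sketched with hypothetical error-type names (the actual org.llm4s.error.LLMError subtypes and branch structure are not shown in this page and may differ):

```scala
// Hypothetical stand-ins for org.llm4s.error.LLMError subtypes.
sealed trait LLMError
final case class AuthenticationError(detail: String)        extends LLMError
final case class RateLimitError(detail: String)             extends LLMError
final case class ServiceError(detail: String, status: Int)  extends LLMError
final case class InvalidRequestError(detail: String)        extends LLMError

// Map a non-2xx status plus an extracted (already truncated) detail string
// to a typed error.
def mapStatus(status: Int, detail: String): LLMError = status match {
  case 401 | 403     => AuthenticationError(detail)
  case 429           => RateLimitError(detail)
  case s if s >= 500 => ServiceError(detail, s)
  case _             => InvalidRequestError(detail)
}
```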
Attributes
- Supertypes: class Object, trait Matchable, class Any
- Self type: HttpErrorMapper.type
Enumeration of supported LLM providers.
Defines the available language model service providers that can be used with llm4s. Each provider has specific configuration requirements and API characteristics.
Attributes
- See also: org.llm4s.llmconnect.config.ProviderConfig for provider-specific configuration
- Companion: object
- Supertypes: class Object, trait Matchable, class Any
- Known subtypes
Companion object providing LLM provider instances and utilities.
Attributes
- Companion: trait
- Supertypes: trait Sum, trait Mirror, class Object, trait Matchable, class Any
- Self type: LLMProvider.type
Helper trait for recording metrics consistently across all provider clients.
Extracts the common pattern of timing requests, observing outcomes, recording tokens, and reading costs from completion results.
Attributes
- Supertypes: class Object, trait Matchable, class Any
- Known subtypes: trait BaseLifecycleLLMClient, class AnthropicClient, class CohereClient, class DeepSeekClient, class GeminiClient, class MistralClient, class OllamaClient, class OpenAIClient, class OpenRouterClient, class ZaiClient
Mistral AI provider client using the OpenAI-compatible chat completions API.
Supported:
- Non-streaming chat completion via the Mistral /v1/chat/completions API.
Intentionally not supported in v1:
- Streaming
- Tool calling
- Embeddings
- Multimodal inputs
Attributes
- Companion: object
- Supertypes: trait BaseLifecycleLLMClient, trait MetricsRecording, trait LLMClient, trait AutoCloseable, class Object, trait Matchable, class Any

Attributes
- Companion: class
- Supertypes: class Object, trait Matchable, class Any
- Self type: MistralClient.type
LLMClient implementation for locally-hosted Ollama models.
Connects to an Ollama server via its HTTP chat API (/api/chat). All Ollama-specific protocol details (JSON-lines streaming, token-count field names) are handled internally.
== Tool calling limitation ==
The Ollama chat API does not support tool results in multi-turn conversations in the same way as cloud providers. As a result, ToolMessage values are silently dropped when building the request — only SystemMessage, UserMessage, and AssistantMessage entries are forwarded to the model. Conversations that rely on tool call round-trips should use a different provider.
== Streaming ==
Token counts (prompt_eval_count, eval_count) are only present in the final JSON-lines chunk (done: true). The accumulator updates its count at that point; chunks before the final one report zero tokens.
== Timeouts ==
Non-streaming requests time out after 120 seconds; streaming requests after 600 seconds.
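The tool-message drop rule can be sketched as a simple filter, with stand-in message types (the real llm4s ADT differs):

```scala
// Stand-in message types for illustration only.
sealed trait Message
final case class SystemMessage(content: String)                   extends Message
final case class UserMessage(content: String)                     extends Message
final case class AssistantMessage(content: String)                extends Message
final case class ToolMessage(toolCallId: String, content: String) extends Message

// Only system, user, and assistant entries are forwarded to Ollama;
// tool results are silently dropped.
def forOllama(msgs: List[Message]): List[Message] =
  msgs.filter {
    case _: ToolMessage => false
    case _              => true
  }
```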
Value parameters
- config: Ollama configuration containing the model name and base URL.
- metrics: receives per-call latency and token-usage events. Defaults to MetricsCollector.noop.
Attributes
- Companion: object
- Supertypes: trait BaseLifecycleLLMClient, trait MetricsRecording, trait LLMClient, trait AutoCloseable, class Object, trait Matchable, class Any

Attributes
- Companion: class
- Supertypes: class Object, trait Matchable, class Any
- Self type: OllamaClient.type
Embedding provider implementation for Ollama, a local model inference server.
Generates text embeddings by calling the Ollama /api/embeddings HTTP endpoint. Each input text is embedded individually (one HTTP request per text) because the Ollama embedding API accepts a single prompt per call. Results are collected and returned as an org.llm4s.llmconnect.model.EmbeddingResponse.
No API key is required when Ollama runs locally, though one can be supplied for remote or authenticated deployments.
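The one-request-per-text loop can be sketched as a fold that short-circuits on the first failure, mirroring a Result-style return; `embedOne` here is a hypothetical stand-in for the HTTP POST to /api/embeddings:

```scala
// One call per input text (the Ollama embedding API takes a single prompt
// per request); the first failure aborts the whole batch.
def embedAll(
    texts: Seq[String],
    embedOne: String => Either[String, Vector[Double]]
): Either[String, Seq[Vector[Double]]] =
  texts.foldLeft[Either[String, Vector[Vector[Double]]]](Right(Vector.empty)) {
    (acc, text) => acc.flatMap(vs => embedOne(text).map(vs :+ _))
  }
```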
Attributes
- Supertypes: class Object, trait Matchable, class Any
- Self type
LLMClient implementation supporting both OpenAI and Azure OpenAI services.
Provides a unified interface for interacting with OpenAI's API and Azure's OpenAI service. Handles message conversion between llm4s format and OpenAI format, completion requests, streaming responses, and tool calling (function calling) capabilities.
Uses Azure's OpenAI client library internally, which supports both direct OpenAI and Azure-hosted OpenAI endpoints.
== Extended Thinking / Reasoning Support ==
For OpenAI o1/o3/o4 models with reasoning capabilities, use OpenRouterClient instead, which fully supports the reasoning_effort parameter. The Azure SDK used by this client does not yet expose the reasoning_effort API parameter.
For Anthropic Claude models with extended thinking, use AnthropicClient which has full support for the thinking parameter with budget_tokens.
Value parameters
- client: configured Azure OpenAI client instance.
- config: provider configuration containing context window and reserve completion settings.
- metrics: metrics collector for observability (default: noop).
- model: the model identifier (e.g., "gpt-4", "gpt-3.5-turbo").
Attributes
- Companion: object
- Supertypes: trait BaseLifecycleLLMClient, trait MetricsRecording, trait LLMClient, trait AutoCloseable, class Object, trait Matchable, class Any
Factory methods for creating OpenAIClient instances.
Provides safe construction of OpenAI clients with error handling via Result type.
Attributes
- Companion: class
- Supertypes: class Object, trait Matchable, class Any
- Self type: OpenAIClient.type
OpenAI embedding provider implementation.
Provides text embeddings using OpenAI's embedding API (text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002). Supports batch embedding of multiple texts in a single request.
== Supported Models ==
- text-embedding-3-small: efficient, lower cost (recommended)
- text-embedding-3-large: higher quality, higher cost
- text-embedding-ada-002: legacy model
== Token Usage ==
The response includes token usage information when available from the API.
Attributes
- See also: EmbeddingProvider for the provider interface; org.llm4s.llmconnect.config.EmbeddingProviderConfig for configuration
- Supertypes: class Object, trait Matchable, class Any
- Self type
LLMClient implementation for the OpenRouter unified model gateway.
Sends requests to the OpenRouter REST API using the OpenAI-compatible /chat/completions endpoint. Accepts OpenAIConfig — there is no separate OpenRouterConfig; LLMConnect detects OpenRouter by checking whether baseUrl contains "openrouter.ai" and routes accordingly.
== Required headers ==
OpenRouter's usage policy requires two additional headers on every request. This client sends them automatically:
- HTTP-Referer: https://github.com/llm4s/llm4s
- X-Title: LLM4S
== Reasoning / extended thinking ==
Model type is detected by substring matching on the lower-cased model name:
- Names containing "claude" or "anthropic": Anthropic-style thinking object (type: "enabled", budget_tokens).
- Names containing "o1", "o3", or "o4": OpenAI-style reasoning_effort string parameter.
- All other models: reasoning configuration is silently omitted.
The thinking budget is clamped to [1024, maxTokens - 1] for Anthropic models, matching the Anthropic API constraint.
== Thinking content ==
Extended thinking text is extracted from whichever field the model populates: message.thinking, message.reasoning, or choice.thinking (checked in that order).
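The substring dispatch above can be sketched as follows; the result types are simplified stand-ins for the JSON fragments actually sent:

```scala
// Simplified stand-ins for the two reasoning payload shapes.
sealed trait ReasoningConfig
final case class AnthropicThinking(budgetTokens: Int) extends ReasoningConfig
final case class OpenAIEffort(effort: String)         extends ReasoningConfig

// Substring matching on the lower-cased model name, as described above.
def reasoningFor(model: String, budget: Int, effort: String): Option[ReasoningConfig] = {
  val m = model.toLowerCase
  if (m.contains("claude") || m.contains("anthropic"))
    Some(AnthropicThinking(budgetTokens = budget))
  else if (Seq("o1", "o3", "o4").exists(s => m.contains(s)))
    Some(OpenAIEffort(effort))
  else
    None // reasoning configuration silently omitted
}
```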
Value parameters
- config: OpenAIConfig whose baseUrl must contain "openrouter.ai"; carries the API key and model name.
- metrics: receives per-call latency and token-usage events. Defaults to MetricsCollector.noop.
Attributes
- Companion: object
- Supertypes: trait BaseLifecycleLLMClient, trait MetricsRecording, trait LLMClient, trait AutoCloseable, class Object, trait Matchable, class Any

Attributes
- Companion: class
- Supertypes: class Object, trait Matchable, class Any
- Self type: OpenRouterClient.type
Embedding provider implementation for the Voyage AI embedding API.
Generates text embeddings by posting batched input to the Voyage AI /v1/embeddings endpoint. Unlike Ollama, Voyage accepts multiple inputs in a single request, so all texts are sent in one HTTP call.
Requires a valid Voyage AI API key in the provider configuration.
Attributes
- See also: EmbeddingProvider for the common embedding interface
- Supertypes: class Object, trait Matchable, class Any
- Self type
LLM client for the Z.ai API.
Z.ai uses an OpenAI-compatible /chat/completions endpoint with one important difference: message content is always an array of typed objects ([{"type":"text","text":"..."}]) rather than a plain string. This applies to user, system, assistant, and tool messages alike. Sending a plain string causes a rejection from the Z.ai API.
Both non-streaming (complete) and streaming (streamComplete) are supported. Tool calling follows the standard OpenAI function-calling format.
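The content-array requirement can be illustrated with plain string JSON (the real client builds this with its JSON library rather than by hand):

```scala
// Wrap a plain string into the typed content array Z.ai expects:
// [{"type":"text","text":"..."}]. Minimal escaping only, for illustration.
def zaiContent(text: String): String = {
  val escaped = text.replace("\\", "\\\\").replace("\"", "\\\"")
  s"""[{"type":"text","text":"$escaped"}]"""
}
```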
Value parameters
- config: Z.ai connection configuration (API key, model, base URL, context window).
- metrics: records per-call latency and token-usage events; use org.llm4s.metrics.MetricsCollector.noop when metrics are not needed.
Attributes
- Companion: object
- Supertypes: trait BaseLifecycleLLMClient, trait MetricsRecording, trait LLMClient, trait AutoCloseable, class Object, trait Matchable, class Any