org.llm4s.llmconnect.provider
Members list
Type members
Classlikes
LLMClient implementation for Anthropic Claude models.
Uses the official Anthropic Java SDK (AnthropicOkHttpClient) for all API calls. SDK exceptions are mapped to the appropriate org.llm4s.error.LLMError subtypes before being returned.
== Message format adaptations ==
The Anthropic Messages API differs from the OpenAI convention in several ways that this client handles transparently:
- Default system prompt: if the conversation contains no SystemMessage, the client injects "You are Claude, a helpful AI assistant." automatically. Supply an explicit SystemMessage to override this.
- Tool results as user messages: the Anthropic API does not accept native tool-result messages in the same turn structure as OpenAI. ToolMessage values are therefore forwarded as user messages with the prefix "[Tool result for <toolCallId>]: ".
- Assistant messages with tool calls are skipped: when an AssistantMessage carries pending tool calls, it is not forwarded; Anthropic infers the assistant turn from the subsequent tool-result user messages.
- Schema sanitisation: OpenAI-specific fields (strict, additionalProperties) are stripped from tool schemas before sending, because Anthropic's API rejects them.
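A minimal sketch of the first two adaptations, using simplified stand-in types (the real llm4s message ADT and client internals differ):

```scala
// Stand-in message types for illustration only.
sealed trait Message
final case class SystemMessage(content: String)                   extends Message
final case class UserMessage(content: String)                     extends Message
final case class ToolMessage(toolCallId: String, content: String) extends Message

val defaultSystem = SystemMessage("You are Claude, a helpful AI assistant.")

// ToolMessage -> user message with the documented prefix.
def adapt(m: Message): Message = m match {
  case ToolMessage(id, content) => UserMessage(s"[Tool result for $id]: " + content)
  case other                    => other
}

// Inject the default system prompt only when none is present.
def withSystem(msgs: List[Message]): List[Message] =
  if (msgs.exists(_.isInstanceOf[SystemMessage])) msgs
  else defaultSystem :: msgs
```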
== Extended thinking ==
When CompletionOptions.reasoning is set, a thinking block is added to the request. The token budget is clamped to [1024, maxTokens - 1] to satisfy the Anthropic API constraint; the effective budget may therefore differ from what was requested.
maxTokens defaults to 2048 when not set in CompletionOptions because the Anthropic API requires the field.
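The clamping rule above, expressed as a pure function. The constants are the ones stated in this documentation; where the real client applies them is an implementation detail not shown here:

```scala
val DefaultMaxTokens  = 2048 // applied when CompletionOptions sets no maxTokens
val MinThinkingBudget = 1024

// Clamp the requested thinking budget to [1024, maxTokens - 1].
def effectiveBudget(requested: Int, maxTokens: Option[Int]): Int = {
  val max = maxTokens.getOrElse(DefaultMaxTokens)
  math.max(MinThinkingBudget, math.min(requested, max - 1))
}
```

So a request for a 500-token budget is raised to 1024, and a 5000-token budget with maxTokens = 3000 is lowered to 2999.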
Value parameters
- config: AnthropicConfig carrying the API key, model name, and base URL.
- metrics: receives per-call latency and token-usage events. Defaults to MetricsCollector.noop.
Attributes
- Companion: object
- Supertypes: trait BaseLifecycleLLMClient, trait MetricsRecording, trait LLMClient, trait AutoCloseable, class Object, trait Matchable, class Any

Attributes
- Companion: class
- Supertypes: class Object, trait Matchable, class Any
- Self type: AnthropicClient.type
Minimal Cohere provider client (v2 scope).
Supported:
- Non-streaming chat completion via the Cohere v2 /chat API.
Intentionally not supported in v2:
- Streaming
- Tool calling
- Embeddings
- Multimodal inputs
Attributes
- Companion: object
- Supertypes: trait BaseLifecycleLLMClient, trait MetricsRecording, trait LLMClient, trait AutoCloseable, class Object, trait Matchable, class Any

Attributes
- Companion: class
- Supertypes: class Object, trait Matchable, class Any
- Self type: CohereClient.type
Centralized cost estimation for LLM completions.
This provides a single source of truth for estimating completion costs based on token usage and model pricing information. It integrates with the ModelRegistry to look up pricing data and applies it to usage statistics.
The estimator:
- Uses existing ModelPricing logic (no duplication)
- Returns None if pricing is unavailable
- Preserves precision of micro-cost values
- Works uniformly across all providers
Example usage:
val usage = TokenUsage(promptTokens = 100, completionTokens = 50, totalTokens = 150)
val cost = CostEstimator.estimate("gpt-4o", usage)
// cost: Some(0.0015) for gpt-4o pricing
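The estimate is the usual per-token arithmetic: prompt tokens times the input price plus completion tokens times the output price. A sketch with stand-in types and placeholder prices (not real gpt-4o rates, which come from the ModelRegistry):

```scala
// Stand-in types; the real llm4s definitions live elsewhere.
final case class TokenUsage(promptTokens: Int, completionTokens: Int, totalTokens: Int)
final case class Pricing(inputPerMTok: Double, outputPerMTok: Double) // USD per million tokens

// Returns None when pricing is unavailable, mirroring the estimator's contract.
def estimate(usage: TokenUsage, pricing: Option[Pricing]): Option[Double] =
  pricing.map { p =>
    usage.promptTokens * p.inputPerMTok / 1e6 +
      usage.completionTokens * p.outputPerMTok / 1e6
  }
```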
Attributes
- Supertypes: class Object, trait Matchable, class Any
- Self type: CostEstimator.type
DeepSeek LLM client implementation using the OpenAI-compatible API.
Provides access to DeepSeek models including DeepSeek-Chat (V3) with 64K context and DeepSeek-Reasoner (R1) with 128K context for advanced reasoning tasks.
Uses the same request/response format as OpenAI, making it compatible with standard OpenAI tooling and client code patterns.
Value parameters
- config: DeepSeek configuration containing API key, model, base URL, and context settings.
- metrics: MetricsCollector for recording request metrics.
Attributes
- Companion: object
- Supertypes: trait BaseLifecycleLLMClient, trait MetricsRecording, trait LLMClient, trait AutoCloseable, class Object, trait Matchable, class Any

Attributes
- Companion: class
- Supertypes: class Object, trait Matchable, class Any
- Self type: DeepSeekClient.type
Text embedding provider interface for generating vector representations.
Provides a unified interface for different embedding services (OpenAI, VoyageAI, Ollama). Each implementation handles provider-specific API calls and response formats.
Text content is the primary input; multimedia content (images, audio) should be processed through the UniversalEncoder façade which handles content extraction before embedding.
== Usage Example ==
val provider: EmbeddingProvider = OpenAIEmbeddingProvider.fromConfig(config)
val request = EmbeddingRequest(
input = Seq("Hello world", "How are you?"),
model = EmbeddingModelName("text-embedding-3-small")
)
val result: Result[EmbeddingResponse] = provider.embed(request)
Attributes
- See also: OpenAIEmbeddingProvider for OpenAI text-embedding models; VoyageAIEmbeddingProvider for VoyageAI embedding models; OllamaEmbeddingProvider for local Ollama embedding models
- Supertypes: class Object, trait Matchable, class Any
LLMClient implementation for Google Gemini models.
Calls the Google Generative AI REST API directly using org.llm4s.http.Llm4sHttpClient.
== Message format ==
Gemini uses a different conversation structure from OpenAI:
- Roles are "user" and "model" (not "user" and "assistant").
- SystemMessage values are sent as a separate systemInstruction field, not inside the contents array.
- Tool results (ToolMessage) are sent as functionResponse parts inside a "user" turn, keyed by function name (not tool-call ID). The function name is resolved from an in-request map built while processing the preceding AssistantMessage.
== Tool call IDs ==
The Gemini API does not return an ID with function-call responses. This client generates a random UUID for each tool call so that the llm4s ToolCall / ToolMessage pairing convention is preserved. These IDs are synthetic and are not round-tripped to Gemini.
== Authentication ==
The API key is appended as a ?key= query parameter on every request (Google's API requires this; it is not sent as a header). The full URL is not logged; only the base URL and model are emitted at DEBUG level.
== Schema sanitisation ==
OpenAI-specific fields (strict, additionalProperties) are stripped from tool schemas before sending, because Gemini's API rejects them.
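The synthetic-ID and name-resolution conventions can be sketched as follows, with simplified stand-in types (the real client builds its map while walking the conversation):

```scala
import java.util.UUID

// Stand-in for the llm4s tool-call pairing; the real types differ.
final case class ToolCall(id: String, name: String)

// Gemini returns no ID with a function call, so one is generated locally.
// These IDs are synthetic and never sent back to Gemini.
def toToolCall(functionName: String): ToolCall =
  ToolCall(UUID.randomUUID().toString, functionName)

// A ToolMessage carries only the synthetic ID; look up the function name so
// the result can be sent as a functionResponse keyed by name.
def nameFor(calls: Seq[ToolCall], toolCallId: String): Option[String] =
  calls.find(_.id == toolCallId).map(_.name)
```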
Value parameters
- config: GeminiConfig with API key, model, and base URL.
- metrics: receives per-call latency and token-usage events. Defaults to MetricsCollector.noop.
Attributes
- Companion: object
- Supertypes: trait BaseLifecycleLLMClient, trait MetricsRecording, trait LLMClient, trait AutoCloseable, class Object, trait Matchable, class Any

Attributes
- Companion: class
- Supertypes: class Object, trait Matchable, class Any
- Self type: GeminiClient.type
Shared HTTP status-code to org.llm4s.error.LLMError mapping used by all HTTP-based LLM provider clients.
Centralises the duplicated pattern of converting non-2xx responses into typed Result errors. Provider-specific error details are extracted from the JSON response body when possible and truncated to a safe length.
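The general shape of such a mapping, sketched with hypothetical error-type names (the actual org.llm4s.error.LLMError subtypes and branch structure are not shown in this page and may differ):

```scala
// Hypothetical stand-ins for org.llm4s.error.LLMError subtypes.
sealed trait LLMError
final case class AuthenticationError(detail: String)        extends LLMError
final case class RateLimitError(detail: String)             extends LLMError
final case class ServiceError(detail: String, status: Int)  extends LLMError
final case class InvalidRequestError(detail: String)        extends LLMError

// Map a non-2xx status plus an extracted (already truncated) detail string
// to a typed error.
def mapStatus(status: Int, detail: String): LLMError = status match {
  case 401 | 403     => AuthenticationError(detail)
  case 429           => RateLimitError(detail)
  case s if s >= 500 => ServiceError(detail, s)
  case _             => InvalidRequestError(detail)
}
```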
Attributes
- Supertypes: class Object, trait Matchable, class Any
- Self type: HttpErrorMapper.type
Enumeration of supported LLM providers.
Defines the available language model service providers that can be used with llm4s. Each provider has specific configuration requirements and API characteristics.
Attributes
- See also: org.llm4s.llmconnect.config.ProviderConfig for provider-specific configuration
- Companion: object
- Supertypes: class Object, trait Matchable, class Any
- Known subtypes
Companion object providing LLM provider instances and utilities.
Attributes
- Companion: trait
- Supertypes: trait Sum, trait Mirror, class Object, trait Matchable, class Any
- Self type: LLMProvider.type
Helper trait for recording metrics consistently across all provider clients.
Extracts the common pattern of timing requests, observing outcomes, recording tokens, and reading costs from completion results.
Attributes
- Supertypes: class Object, trait Matchable, class Any
- Known subtypes: trait BaseLifecycleLLMClient, class AnthropicClient, class CohereClient, class DeepSeekClient, class GeminiClient, class MistralClient, class OllamaClient, class OpenAIClient, class OpenRouterClient, class ZaiClient
Mistral AI provider client using the OpenAI-compatible chat completions API.
Supported:
- Non-streaming chat completion via the Mistral /v1/chat/completions API.
Intentionally not supported in v1:
- Streaming
- Tool calling
- Embeddings
- Multimodal inputs
Attributes
- Companion: object
- Supertypes: trait BaseLifecycleLLMClient, trait MetricsRecording, trait LLMClient, trait AutoCloseable, class Object, trait Matchable, class Any

Attributes
- Companion: class
- Supertypes: class Object, trait Matchable, class Any
- Self type: MistralClient.type
LLMClient implementation for locally-hosted Ollama models.
Connects to an Ollama server via its HTTP chat API (/api/chat). All Ollama-specific protocol details (JSON-lines streaming, token-count field names) are handled internally.
== Tool calling limitation ==
The Ollama chat API does not support tool results in multi-turn conversations in the same way as cloud providers. As a result, ToolMessage values are silently dropped when building the request — only SystemMessage, UserMessage, and AssistantMessage entries are forwarded to the model. Conversations that rely on tool call round-trips should use a different provider.
== Streaming ==
Token counts (prompt_eval_count, eval_count) are only present in the final JSON-lines chunk (done: true). The accumulator updates its count at that point; chunks before the final one report zero tokens.
== Timeouts ==
Non-streaming requests time out after 120 seconds; streaming requests after 600 seconds.
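The tool-message drop rule can be sketched as a simple filter, with stand-in message types (the real llm4s ADT differs):

```scala
// Stand-in message types for illustration only.
sealed trait Message
final case class SystemMessage(content: String)                   extends Message
final case class UserMessage(content: String)                     extends Message
final case class AssistantMessage(content: String)                extends Message
final case class ToolMessage(toolCallId: String, content: String) extends Message

// Only system, user, and assistant entries are forwarded to Ollama;
// tool results are silently dropped.
def forOllama(msgs: List[Message]): List[Message] =
  msgs.filter {
    case _: ToolMessage => false
    case _              => true
  }
```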
Value parameters
- config: Ollama configuration containing the model name and base URL.
- metrics: receives per-call latency and token-usage events. Defaults to MetricsCollector.noop.
Attributes
- Companion: object
- Supertypes: trait BaseLifecycleLLMClient, trait MetricsRecording, trait LLMClient, trait AutoCloseable, class Object, trait Matchable, class Any

Attributes
- Companion: class
- Supertypes: class Object, trait Matchable, class Any
- Self type: OllamaClient.type
Embedding provider implementation for Ollama, a local model inference server.
Generates text embeddings by calling the Ollama /api/embeddings HTTP endpoint. Each input text is embedded individually (one HTTP request per text) because the Ollama embedding API accepts a single prompt per call. Results are collected and returned as an org.llm4s.llmconnect.model.EmbeddingResponse.
No API key is required when Ollama runs locally, though one can be supplied for remote or authenticated deployments.
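The one-request-per-text loop can be sketched as a fold that short-circuits on the first failure, mirroring a Result-style return; `embedOne` here is a hypothetical stand-in for the HTTP POST to /api/embeddings:

```scala
// One call per input text (the Ollama embedding API takes a single prompt
// per request); the first failure aborts the whole batch.
def embedAll(
    texts: Seq[String],
    embedOne: String => Either[String, Vector[Double]]
): Either[String, Seq[Vector[Double]]] =
  texts.foldLeft[Either[String, Vector[Vector[Double]]]](Right(Vector.empty)) {
    (acc, text) => acc.flatMap(vs => embedOne(text).map(vs :+ _))
  }
```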
Attributes
- Supertypes: class Object, trait Matchable, class Any
- Self type
LLMClient implementation supporting both OpenAI and Azure OpenAI services.
Provides a unified interface for interacting with OpenAI's API and Azure's OpenAI service. Handles message conversion between llm4s format and OpenAI format, completion requests, streaming responses, and tool calling (function calling) capabilities.
Uses Azure's OpenAI client library internally, which supports both direct OpenAI and Azure-hosted OpenAI endpoints.
== Extended Thinking / Reasoning Support ==
For OpenAI o1/o3/o4 models with reasoning capabilities, use OpenRouterClient instead, which fully supports the reasoning_effort parameter. The Azure SDK used by this client does not yet expose the reasoning_effort API parameter.
For Anthropic Claude models with extended thinking, use AnthropicClient which has full support for the thinking parameter with budget_tokens.
Value parameters
- client: configured Azure OpenAI client instance.
- config: provider configuration containing context window and reserve completion settings.
- metrics: metrics collector for observability (default: noop).
- model: the model identifier (e.g., "gpt-4", "gpt-3.5-turbo").
Attributes
- Companion: object
- Supertypes: trait BaseLifecycleLLMClient, trait MetricsRecording, trait LLMClient, trait AutoCloseable, class Object, trait Matchable, class Any
Factory methods for creating OpenAIClient instances.
Provides safe construction of OpenAI clients with error handling via Result type.
Attributes
- Companion: class
- Supertypes: class Object, trait Matchable, class Any
- Self type: OpenAIClient.type
OpenAI embedding provider implementation.
Provides text embeddings using OpenAI's embedding API (text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002). Supports batch embedding of multiple texts in a single request.
== Supported Models ==
- text-embedding-3-small: efficient, lower cost (recommended)
- text-embedding-3-large: higher quality, higher cost
- text-embedding-ada-002: legacy model
== Token Usage ==
The response includes token usage information when available from the API.
Attributes
- See also: EmbeddingProvider for the provider interface; org.llm4s.llmconnect.config.EmbeddingProviderConfig for configuration
- Supertypes: class Object, trait Matchable, class Any
- Self type
LLMClient implementation for the OpenRouter unified model gateway.
Sends requests to the OpenRouter REST API using the OpenAI-compatible /chat/completions endpoint. Accepts OpenAIConfig — there is no separate OpenRouterConfig; LLMConnect detects OpenRouter by checking whether baseUrl contains "openrouter.ai" and routes accordingly.
== Required headers ==
OpenRouter's usage policy requires two additional headers on every request. This client sends them automatically:
- HTTP-Referer: https://github.com/llm4s/llm4s
- X-Title: LLM4S
== Reasoning / extended thinking ==
Model type is detected by substring matching on the lower-cased model name:
- Names containing "claude" or "anthropic": Anthropic-style thinking object (type: "enabled", budget_tokens).
- Names containing "o1", "o3", or "o4": OpenAI-style reasoning_effort string parameter.
- All other models: reasoning configuration is silently omitted.
The thinking budget is clamped to [1024, maxTokens - 1] for Anthropic models, matching the Anthropic API constraint.
== Thinking content ==
Extended thinking text is extracted from whichever field the model populates: message.thinking, message.reasoning, or choice.thinking (checked in that order).
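The substring dispatch above can be sketched as follows; the result types are simplified stand-ins for the JSON fragments actually sent:

```scala
// Simplified stand-ins for the two reasoning payload shapes.
sealed trait ReasoningConfig
final case class AnthropicThinking(budgetTokens: Int) extends ReasoningConfig
final case class OpenAIEffort(effort: String)         extends ReasoningConfig

// Substring matching on the lower-cased model name, as described above.
def reasoningFor(model: String, budget: Int, effort: String): Option[ReasoningConfig] = {
  val m = model.toLowerCase
  if (m.contains("claude") || m.contains("anthropic"))
    Some(AnthropicThinking(budgetTokens = budget))
  else if (Seq("o1", "o3", "o4").exists(s => m.contains(s)))
    Some(OpenAIEffort(effort))
  else
    None // reasoning configuration silently omitted
}
```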
Value parameters
- config: OpenAIConfig whose baseUrl must contain "openrouter.ai"; carries the API key and model name.
- metrics: receives per-call latency and token-usage events. Defaults to MetricsCollector.noop.
Attributes
- Companion: object
- Supertypes: trait BaseLifecycleLLMClient, trait MetricsRecording, trait LLMClient, trait AutoCloseable, class Object, trait Matchable, class Any

Attributes
- Companion: class
- Supertypes: class Object, trait Matchable, class Any
- Self type: OpenRouterClient.type
Embedding provider implementation for the Voyage AI embedding API.
Generates text embeddings by posting batched input to the Voyage AI /v1/embeddings endpoint. Unlike Ollama, Voyage accepts multiple inputs in a single request, so all texts are sent in one HTTP call.
Requires a valid Voyage AI API key in the provider configuration.
Attributes
- See also: EmbeddingProvider for the common embedding interface
- Supertypes: class Object, trait Matchable, class Any
- Self type
LLM client for the Z.ai API.
Z.ai uses an OpenAI-compatible /chat/completions endpoint with one important difference: message content is always an array of typed objects ([{"type":"text","text":"..."}]) rather than a plain string. This applies to user, system, assistant, and tool messages alike. Sending a plain string causes a rejection from the Z.ai API.
Both non-streaming (complete) and streaming (streamComplete) are supported. Tool calling follows the standard OpenAI function-calling format.
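The content-array requirement can be illustrated with plain string JSON (the real client builds this with its JSON library rather than by hand):

```scala
// Wrap a plain string into the typed content array Z.ai expects:
// [{"type":"text","text":"..."}]. Minimal escaping only, for illustration.
def zaiContent(text: String): String = {
  val escaped = text.replace("\\", "\\\\").replace("\"", "\\\"")
  s"""[{"type":"text","text":"$escaped"}]"""
}
```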
Value parameters
- config: Z.ai connection configuration (API key, model, base URL, context window).
- metrics: records per-call latency and token-usage events; use org.llm4s.metrics.MetricsCollector.noop when metrics are not needed.
Attributes
- Companion: object
- Supertypes: trait BaseLifecycleLLMClient, trait MetricsRecording, trait LLMClient, trait AutoCloseable, class Object, trait Matchable, class Any