org.llm4s.llmconnect.provider

Members list

Type members

Classlikes

class AnthropicClient(config: AnthropicConfig, val metrics: MetricsCollector, exchangeLogging: ProviderExchangeLogging) extends BaseLifecycleLLMClient

LLMClient implementation for Anthropic Claude models.

Uses the official Anthropic Java SDK (AnthropicOkHttpClient) for all API calls. SDK exceptions are mapped to the appropriate org.llm4s.error.LLMError subtypes before being returned.

== Message format adaptations ==

The Anthropic Messages API differs from the OpenAI convention in several ways that this client handles transparently:

  • Default system prompt: if the conversation contains no SystemMessage, the client injects "You are Claude, a helpful AI assistant." automatically. Supply an explicit SystemMessage to override this.

  • Tool results as user messages: the Anthropic API does not accept native tool-result messages in the same turn structure as OpenAI. ToolMessage values are therefore forwarded as user messages with the prefix "[Tool result for <toolCallId>]: ".

  • Assistant messages with tool calls are skipped: when an AssistantMessage carries pending tool calls, it is not forwarded — Anthropic infers the assistant turn from the subsequent tool-result user messages.

  • Schema sanitisation: OpenAI-specific fields (strict, additionalProperties) are stripped from tool schemas before sending, because Anthropic's API rejects them.
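
The sanitisation step can be sketched as follows. This is a minimal illustration using nested Maps in place of the client's actual JSON representation; `SchemaSanitiser` is a hypothetical name, not the client's internal API:

```scala
object SchemaSanitiser {
  // OpenAI-specific keys that the Anthropic API rejects.
  private val dropped = Set("strict", "additionalProperties")

  // Recursively remove the dropped keys at every nesting level.
  def sanitise(schema: Map[String, Any]): Map[String, Any] =
    schema.collect {
      case (k, v: Map[_, _]) if !dropped(k) =>
        k -> sanitise(v.asInstanceOf[Map[String, Any]])
      case (k, v) if !dropped(k) => k -> v
    }
}
```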

== Extended thinking ==

When CompletionOptions.reasoning is set, a thinking block is added to the request. The token budget is clamped to [1024, maxTokens - 1] to satisfy the Anthropic API constraint; the effective budget may therefore differ from what was requested.

maxTokens defaults to 2048 when not set in CompletionOptions because the Anthropic API requires the field.
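
The clamping rule can be expressed directly. This is a sketch of the documented constraint, assuming maxTokens is large enough that the interval [1024, maxTokens - 1] is non-empty (maxTokens >= 1025):

```scala
// Effective thinking budget is clamped to [1024, maxTokens - 1], so it may
// differ from what was requested.
def clampThinkingBudget(requested: Int, maxTokens: Int): Int =
  math.max(1024, math.min(requested, maxTokens - 1))
```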

Value parameters

config

AnthropicConfig carrying the API key, model name, and base URL.

metrics

Receives per-call latency and token-usage events. Defaults to MetricsCollector.noop.

Attributes

Companion
object
Supertypes
trait LLMClient
trait AutoCloseable
class Object
trait Matchable
class Any

Attributes

Companion
class
Supertypes
class Object
trait Matchable
class Any
Self type
class CohereClient(config: CohereConfig, val metrics: MetricsCollector, exchangeLogging: ProviderExchangeLogging) extends BaseLifecycleLLMClient

Minimal Cohere provider client (v2 scope).

Supported:

  • Non-streaming chat completion via Cohere v2 /chat API.

Intentionally not supported in v2:

  • Streaming
  • Tool calling
  • Embeddings
  • Multimodal inputs

Attributes

Companion
object
Supertypes
trait LLMClient
trait AutoCloseable
class Object
trait Matchable
class Any
object CohereClient

Attributes

Companion
class
Supertypes
class Object
trait Matchable
class Any
Self type
object CostEstimator

Centralized cost estimation for LLM completions.

This provides a single source of truth for estimating completion costs based on token usage and model pricing information. It integrates with the ModelRegistry to look up pricing data and applies it to usage statistics.

The estimator:

  • Uses existing ModelPricing logic (no duplication)
  • Returns None if pricing is unavailable
  • Preserves precision of micro-cost values
  • Works uniformly across all providers

Example usage:

 val usage = TokenUsage(promptTokens = 100, completionTokens = 50, totalTokens = 150)
 val cost = CostEstimator.estimate("gpt-4o", usage)
 // cost: Some(0.0015) for gpt-4o pricing

Attributes

Supertypes
class Object
trait Matchable
class Any
Self type
class DeepSeekClient(config: DeepSeekConfig, val metrics: MetricsCollector, exchangeLogging: ProviderExchangeLogging) extends BaseLifecycleLLMClient

DeepSeek LLM client implementation using the OpenAI-compatible API.

Provides access to DeepSeek models including DeepSeek-Chat (V3) with 64K context and DeepSeek-Reasoner (R1) with 128K context for advanced reasoning tasks.

Uses the same request/response format as OpenAI, making it compatible with standard OpenAI tooling and client code patterns.

Value parameters

config

DeepSeek configuration containing API key, model, base URL, and context settings

metrics

MetricsCollector for recording request metrics

Attributes

Companion
object
Supertypes
trait LLMClient
trait AutoCloseable
class Object
trait Matchable
class Any

Attributes

Companion
class
Supertypes
class Object
trait Matchable
class Any
Self type

Text embedding provider interface for generating vector representations.

Provides a unified interface for different embedding services (OpenAI, VoyageAI, Ollama). Each implementation handles provider-specific API calls and response formats.

Text content is the primary input; multimedia content (images, audio) should be processed through the UniversalEncoder façade which handles content extraction before embedding.

== Usage Example ==

val provider: EmbeddingProvider = OpenAIEmbeddingProvider.fromConfig(config)
val request = EmbeddingRequest(
 input = Seq("Hello world", "How are you?"),
 model = EmbeddingModelName("text-embedding-3-small")
)
val result: Result[EmbeddingResponse] = provider.embed(request)

Attributes

See also

OpenAIEmbeddingProvider for OpenAI text-embedding models

VoyageAIEmbeddingProvider for VoyageAI embedding models

OllamaEmbeddingProvider for local Ollama embedding models

Supertypes
class Object
trait Matchable
class Any
class GeminiClient(config: GeminiConfig, val metrics: MetricsCollector, exchangeLogging: ProviderExchangeLogging, val httpClient: Llm4sHttpClient) extends BaseLifecycleLLMClient

LLMClient implementation for Google Gemini models.

Calls the Google Generative AI REST API directly using org.llm4s.http.Llm4sHttpClient.

== Message format ==

Gemini uses a different conversation structure from OpenAI:

  • Roles are "user" and "model" (not "user" and "assistant").
  • SystemMessage values are sent as a separate systemInstruction field, not inside the contents array.
  • Tool results (ToolMessage) are sent as functionResponse parts inside a "user" turn, keyed by function name (not tool-call ID). The function name is resolved from an in-request map built while processing the preceding AssistantMessage.

== Tool call IDs ==

The Gemini API does not return an ID with function-call responses. This client generates a random UUID for each tool call so that the llm4s ToolCall / ToolMessage pairing convention is preserved. These IDs are synthetic and are not round-tripped to Gemini.
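
The synthetic ID generation amounts to something like the following sketch; the exact ID format used internally is an assumption, only the UUID-per-call idea comes from the description above:

```scala
import java.util.UUID

// Gemini returns no ID with function calls, so one is minted locally to keep
// the llm4s ToolCall / ToolMessage pairing intact. IDs are never sent back.
def syntheticToolCallId(): String = UUID.randomUUID().toString
```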

== Authentication ==

The API key is appended as a ?key= query parameter on every request (Google's API requires this; it is not sent as a header). The full URL is not logged; only the base URL and model are emitted at DEBUG level.

== Schema sanitisation ==

OpenAI-specific fields (strict, additionalProperties) are stripped from tool schemas before sending, because Gemini's API rejects them.

Value parameters

config

GeminiConfig with API key, model, and base URL.

metrics

Receives per-call latency and token-usage events. Defaults to MetricsCollector.noop.

Attributes

Companion
object
Supertypes
trait LLMClient
trait AutoCloseable
class Object
trait Matchable
class Any
object GeminiClient

Attributes

Companion
class
Supertypes
class Object
trait Matchable
class Any
Self type

Shared HTTP status-code to org.llm4s.error.LLMError mapping used by all HTTP-based LLM provider clients.

Centralises the duplicated pattern of converting non-2xx responses into typed Result errors. Provider-specific error details are extracted from the JSON response body when possible and truncated to a safe length.

Attributes

Supertypes
class Object
trait Matchable
class Any
Self type
sealed trait LLMProvider

Enumeration of supported LLM providers.

Defines the available language model service providers that can be used with llm4s. Each provider has specific configuration requirements and API characteristics.

Attributes

See also

org.llm4s.llmconnect.config.ProviderConfig for provider-specific configuration

Companion
object
Supertypes
class Object
trait Matchable
class Any
Known subtypes
object Anthropic
object Azure
object Cohere
object DeepSeek
object Gemini
object Mistral
object Ollama
object OpenAI
object OpenRouter
object Zai
object LLMProvider

Companion object providing LLM provider instances and utilities.

Attributes

Companion
trait
Supertypes
trait Sum
trait Mirror
class Object
trait Matchable
class Any
Self type

Helper trait for recording metrics consistently across all provider clients.

Extracts the common pattern of timing requests, observing outcomes, recording tokens, and reading costs from completion results.

Attributes

Supertypes
class Object
trait Matchable
class Any
Known subtypes
class MistralClient(config: MistralConfig, val metrics: MetricsCollector, exchangeLogging: ProviderExchangeLogging) extends BaseLifecycleLLMClient

Mistral AI provider client using the OpenAI-compatible chat completions API.

Supported:

  • Non-streaming chat completion via Mistral /v1/chat/completions API.

Intentionally not supported in v1:

  • Streaming
  • Tool calling
  • Embeddings
  • Multimodal inputs

Attributes

Companion
object
Supertypes
trait LLMClient
trait AutoCloseable
class Object
trait Matchable
class Any
object MistralClient

Attributes

Companion
class
Supertypes
class Object
trait Matchable
class Any
Self type
class OllamaClient(config: OllamaConfig, val metrics: MetricsCollector, exchangeLogging: ProviderExchangeLogging, val httpClient: Llm4sHttpClient) extends BaseLifecycleLLMClient

LLMClient implementation for locally-hosted Ollama models.

Connects to an Ollama server via its HTTP chat API (/api/chat). All Ollama-specific protocol details (JSON-lines streaming, token-count field names) are handled internally.

== Tool calling limitation ==

The Ollama chat API does not support tool results in multi-turn conversations in the same way as cloud providers. As a result, ToolMessage values are silently dropped when building the request — only SystemMessage, UserMessage, and AssistantMessage entries are forwarded to the model. Conversations that rely on tool call round-trips should use a different provider.

== Streaming ==

Token counts (prompt_eval_count, eval_count) are only present in the final JSON-lines chunk (done: true). The accumulator updates its count at that point; chunks before the final one report zero tokens.
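
The accumulation rule can be sketched as follows. JSON handling is reduced to regex extraction purely for illustration; the real client uses a proper JSON parser, and the names here are hypothetical:

```scala
final case class ChunkCounts(promptTokens: Int, completionTokens: Int)

// Token counts are reported only by the final JSON-lines chunk (done: true);
// every earlier chunk contributes zero.
def countsFrom(chunk: String): ChunkCounts = {
  def field(name: String): Int =
    s"\"$name\"\\s*:\\s*(\\d+)".r
      .findFirstMatchIn(chunk)
      .map(_.group(1).toInt)
      .getOrElse(0)
  val isFinal = chunk.contains("\"done\":true") || chunk.contains("\"done\": true")
  if (isFinal) ChunkCounts(field("prompt_eval_count"), field("eval_count"))
  else ChunkCounts(0, 0)
}
```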

== Timeouts ==

Non-streaming requests time out after 120 seconds; streaming requests after 600 seconds.

Value parameters

config

Ollama configuration containing the model name and base URL.

metrics

Receives per-call latency and token-usage events. Defaults to MetricsCollector.noop.

Attributes

Companion
object
Supertypes
trait LLMClient
trait AutoCloseable
class Object
trait Matchable
class Any
object OllamaClient

Attributes

Companion
class
Supertypes
class Object
trait Matchable
class Any
Self type

Embedding provider implementation for Ollama, a local model inference server.

Generates text embeddings by calling the Ollama /api/embeddings HTTP endpoint. Each input text is embedded individually (one HTTP request per text) because the Ollama embedding API accepts a single prompt per call. Results are collected and returned as an org.llm4s.llmconnect.model.EmbeddingResponse.
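
The per-text fan-out can be sketched like this, where `embedOne` stands in for the HTTP call to /api/embeddings and all names are illustrative:

```scala
// One embedding request per input text; once a call fails, no further calls
// are made and the first error becomes the result.
def embedAll[E](
    texts: Seq[String],
    embedOne: String => Either[E, Vector[Double]]
): Either[E, Vector[Vector[Double]]] =
  texts.foldLeft[Either[E, Vector[Vector[Double]]]](Right(Vector.empty)) { (acc, text) =>
    acc.flatMap(vs => embedOne(text).map(vs :+ _))
  }
```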

No API key is required when Ollama runs locally, though one can be supplied for remote or authenticated deployments.

Attributes

Supertypes
class Object
trait Matchable
class Any
Self type

LLMClient implementation supporting both OpenAI and Azure OpenAI services.

Provides a unified interface for interacting with OpenAI's API and Azure's OpenAI service. Handles message conversion between llm4s format and OpenAI format, completion requests, streaming responses, and tool calling (function calling) capabilities.

Uses Azure's OpenAI client library internally, which supports both direct OpenAI and Azure-hosted OpenAI endpoints.

== Extended Thinking / Reasoning Support ==

For OpenAI o1/o3/o4 models with reasoning capabilities, use OpenRouterClient instead, which fully supports the reasoning_effort parameter. The Azure SDK used by this client does not yet expose the reasoning_effort API parameter.

For Anthropic Claude models with extended thinking, use AnthropicClient which has full support for the thinking parameter with budget_tokens.

Value parameters

client

configured Azure OpenAI client instance

config

provider configuration containing context window and reserve completion settings

metrics

metrics collector for observability (default: noop)

model

the model identifier (e.g., "gpt-4", "gpt-3.5-turbo")

Attributes

Companion
object
Supertypes
trait LLMClient
trait AutoCloseable
class Object
trait Matchable
class Any
object OpenAIClient

Factory methods for creating OpenAIClient instances.

Provides safe construction of OpenAI clients with error handling via Result type.

Attributes

Companion
class
Supertypes
class Object
trait Matchable
class Any
Self type

OpenAI embedding provider implementation.

Provides text embeddings using OpenAI's embedding API (text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002). Supports batch embedding of multiple texts in a single request.

== Supported Models ==

  • text-embedding-3-small - Efficient, lower cost (recommended)
  • text-embedding-3-large - Higher quality, higher cost
  • text-embedding-ada-002 - Legacy model

== Token Usage == The response includes token usage information when available from the API.

Attributes

See also

EmbeddingProvider for the provider interface

Supertypes
class Object
trait Matchable
class Any
Self type
class OpenRouterClient(config: OpenAIConfig, val metrics: MetricsCollector, exchangeLogging: ProviderExchangeLogging) extends BaseLifecycleLLMClient

LLMClient implementation for the OpenRouter unified model gateway.

Sends requests to the OpenRouter REST API using the OpenAI-compatible /chat/completions endpoint. Accepts OpenAIConfig — there is no separate OpenRouterConfig; LLMConnect detects OpenRouter by checking whether baseUrl contains "openrouter.ai" and routes accordingly.

== Required headers ==

OpenRouter's usage policy requires two additional headers on every request. This client sends them automatically:

  • HTTP-Referer: https://github.com/llm4s/llm4s
  • X-Title: LLM4S

== Reasoning / extended thinking ==

Model type is detected by substring matching on the lower-cased model name:

  • Names containing "claude" or "anthropic" → Anthropic-style thinking object (type: "enabled", budget_tokens).
  • Names containing "o1", "o3", or "o4" → OpenAI-style reasoning_effort string parameter.
  • All other models → reasoning configuration is silently omitted.

The thinking budget is clamped to [1024, maxTokens - 1] for Anthropic models, matching the Anthropic API constraint.
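
The detection logic described above amounts to the following sketch; the type names are illustrative, not the client's internal API:

```scala
sealed trait ReasoningStyle
case object AnthropicThinking     extends ReasoningStyle // thinking object with budget_tokens
case object OpenAIReasoningEffort extends ReasoningStyle // reasoning_effort string parameter
case object NoReasoning           extends ReasoningStyle // reasoning config omitted

// Substring matching on the lower-cased model name; Anthropic names are
// checked first, then OpenAI reasoning-model names.
def reasoningStyle(model: String): ReasoningStyle = {
  val m = model.toLowerCase
  if (m.contains("claude") || m.contains("anthropic")) AnthropicThinking
  else if (m.contains("o1") || m.contains("o3") || m.contains("o4")) OpenAIReasoningEffort
  else NoReasoning
}
```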

== Thinking content ==

Extended thinking text is extracted from whichever field the model populates: message.thinking, message.reasoning, or choice.thinking (checked in that order).

Value parameters

config

OpenAIConfig whose baseUrl must contain "openrouter.ai"; carries the API key and model name.

metrics

Receives per-call latency and token-usage events. Defaults to MetricsCollector.noop.

Attributes

Companion
object
Supertypes
trait LLMClient
trait AutoCloseable
class Object
trait Matchable
class Any

Attributes

Companion
class
Supertypes
class Object
trait Matchable
class Any
Self type

Embedding provider implementation for the Voyage AI embedding API.

Generates text embeddings by posting batched input to the Voyage AI /v1/embeddings endpoint. Unlike Ollama, Voyage accepts multiple inputs in a single request, so all texts are sent in one HTTP call.

Requires a valid Voyage AI API key in the provider configuration.

Attributes

See also

EmbeddingProvider for the common embedding interface

Supertypes
class Object
trait Matchable
class Any
Self type
class ZaiClient(config: ZaiConfig, val metrics: MetricsCollector, exchangeLogging: ProviderExchangeLogging) extends BaseLifecycleLLMClient

LLM client for the Z.ai API.

Z.ai uses an OpenAI-compatible /chat/completions endpoint with one important difference: message content is always an array of typed objects ([{"type":"text","text":"..."}]) rather than a plain string. This applies to user, system, assistant, and tool messages alike. Sending a plain string causes a rejection from the Z.ai API.
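
The required content shape can be sketched as follows; raw string construction is used for illustration only, the real client builds this with its JSON library:

```scala
// Wrap plain-string content in Z.ai's typed-object array form:
// [{"type":"text","text":"..."}]. Z.ai rejects a bare string here.
def zaiContent(text: String): String = {
  val escaped = text.replace("\\", "\\\\").replace("\"", "\\\"")
  s"""[{"type":"text","text":"$escaped"}]"""
}
```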

Both non-streaming (complete) and streaming (streamComplete) are supported. Tool calling follows the standard OpenAI function-calling format.

Value parameters

config

Z.ai connection configuration (API key, model, base URL, context window)

metrics

records per-call latency and token-usage events; use org.llm4s.metrics.MetricsCollector.noop when metrics are not needed

Attributes

Companion
object
Supertypes
trait LLMClient
trait AutoCloseable
class Object
trait Matchable
class Any
object ZaiClient

Attributes

Companion
class
Supertypes
class Object
trait Matchable
class Any
Self type
ZaiClient.type