LLMClient implementation supporting both OpenAI and Azure OpenAI services.
Provides a unified interface for interacting with OpenAI's API and Azure's OpenAI service. Handles message conversion between llm4s format and OpenAI format, completion requests, streaming responses, and tool calling (function calling) capabilities.
Uses Azure's OpenAI client library internally, which supports both direct OpenAI and Azure-hosted OpenAI endpoints.
== Extended Thinking / Reasoning Support ==
For OpenAI o1/o3/o4 models with reasoning capabilities, use OpenRouterClient instead, which fully supports the reasoning_effort parameter. The Azure SDK used by this client does not yet expose the reasoning_effort API parameter.
For Anthropic Claude models with extended thinking, use AnthropicClient which has full support for the thinking parameter with budget_tokens.
Value parameters
- client: configured Azure OpenAI client instance
- config: provider configuration containing context window and reserve completion settings
- model: the model identifier (e.g., "gpt-4", "gpt-3.5-turbo")
Attributes
- Companion: object
- Supertypes: LLMClient
Members list
Value members
Constructors
Creates an OpenAI client for direct OpenAI API access.
Value parameters
- config: OpenAI configuration with API key and base URL
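A minimal construction sketch for the direct OpenAI case. The package paths, the `OpenAIConfig` field names, and the companion `apply` are assumptions inferred from the parameter description above, not confirmed signatures.

```scala
import org.llm4s.llmconnect.config.OpenAIConfig   // assumed package path
import org.llm4s.llmconnect.provider.OpenAIClient // assumed package path

// Field names below (apiKey, model, baseUrl) mirror the documented config
// contents but are assumed, not verified against the actual case class.
val openAIConfig = OpenAIConfig(
  apiKey  = sys.env("OPENAI_API_KEY"),
  model   = "gpt-4o",
  baseUrl = "https://api.openai.com/v1"
)
val client = OpenAIClient(openAIConfig)
```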
Creates an OpenAI client for Azure OpenAI service.
Value parameters
- config: Azure configuration with API key, endpoint, and API version
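The Azure variant, under the same caveat: `AzureConfig` and its field names are assumptions that mirror the documented configuration contents (API key, endpoint, API version).

```scala
// Assumed config shape for Azure-hosted OpenAI; the model/deployment identifier
// is passed the same way as in the direct OpenAI sketch above.
val azureConfig = AzureConfig(
  apiKey     = sys.env("AZURE_OPENAI_API_KEY"),
  endpoint   = "https://my-resource.openai.azure.com",
  apiVersion = "2024-02-15-preview",
  model      = "gpt-4o"
)
val client = OpenAIClient(azureConfig)
```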
Concrete methods
Executes a blocking completion request and returns the full response.
Sends the conversation to the LLM and waits for the complete response. Use when you need the entire response at once or when streaming is not required.
Value parameters
- conversation: conversation history including system, user, assistant, and tool messages
- options: configuration including temperature, max tokens, tools, etc. (default: CompletionOptions())
Attributes
- Returns: Right(Completion) with the model's response, or Left(LLMError) on failure
- Definition Classes: LLMClient
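A hedged usage sketch of the blocking call. The method name `complete`, the message constructors, and the `Completion` accessor are assumed; the Right/Left result shape follows the return description above.

```scala
// Build a conversation (message class names assumed) and block for the full response.
val conversation = Conversation(Seq(
  SystemMessage("You are a concise assistant."),
  UserMessage("Summarise the plot of Hamlet in one sentence.")
))

client.complete(conversation, CompletionOptions()) match {
  case Right(completion) => println(completion.message.content) // accessor assumed
  case Left(error)       => println(s"Completion failed: $error")
}
```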
Returns the maximum context window size supported by this model in tokens.
The context window is the total tokens (prompt + completion) the model can process in a single request, including all conversation messages and the generated response.
Attributes
- Returns: total context window size in tokens (e.g., 4096, 8192, 128000)
- Definition Classes: LLMClient
Returns the number of tokens reserved for the model's completion response.
This value is subtracted from the context window when calculating available tokens for prompts. Corresponds to the max_tokens or completion token limit configured for the model.
Attributes
- Returns: number of tokens reserved for completion
- Definition Classes: LLMClient
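A small sketch combining the two accessors above (names assumed) to see how much of the window is left for prompt content before any safety headroom is applied.

```scala
val window  = client.getContextWindow()      // e.g. 128000
val reserve = client.getReserveCompletion()  // e.g. 4096
val rawPromptSpace = window - reserve        // budget before headroom is subtracted
```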
Executes a streaming completion request, invoking a callback for each chunk as it arrives.
Streams the response incrementally, calling onChunk for each token/chunk received. Enables real-time display of responses. Returns the final accumulated completion on success.
Value parameters
- conversation: conversation history including system, user, assistant, and tool messages
- onChunk: callback invoked for each chunk; called synchronously, avoid blocking operations
- options: configuration including temperature, max tokens, tools, etc. (default: CompletionOptions())
Attributes
- Returns: Right(Completion) with the complete accumulated response, or Left(LLMError) on failure
- Definition Classes: LLMClient
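A streaming sketch under the same assumptions as the blocking example. The chunk type's text accessor is a guess, and the callback only prints so it stays non-blocking.

```scala
val streamed = client.streamComplete(
  conversation,
  onChunk = chunk => print(chunk.content.getOrElse("")), // chunk shape assumed
  options = CompletionOptions()
)

streamed match {
  case Right(completion) => println(s"\n[done] ${completion.message.content}")
  case Left(error)       => println(s"Streaming failed: $error")
}
```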
Inherited methods
Releases resources and closes connections to the LLM provider.
Call when the client is no longer needed. After calling close(), the client should not be used. Default implementation is a no-op; override if managing resources like connections or thread pools.
Attributes
- Inherited from: LLMClient
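A typical shutdown pattern, with method names as assumed in the earlier sketches: release the client once it is no longer needed.

```scala
try {
  client.complete(conversation) // use the client as needed
} finally {
  client.close() // safe even if the default implementation is a no-op
}
```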
Calculates available token budget for prompts after accounting for completion reserve and headroom.
Formula: (contextWindow - reserveCompletion) * (1 - headroom)
Headroom provides a safety margin for tokenization variations and message formatting overhead.
Value parameters
- headroom: safety margin as percentage of prompt budget (default: HeadroomPercent.Standard, ~10%)
Attributes
- Returns: maximum tokens available for prompt content
- Inherited from: LLMClient
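A worked instance of the documented formula, with the method name assumed and illustrative numbers: for a 128,000-token window that reserves 4,096 completion tokens at the ~10% default headroom, (128000 - 4096) * 0.9 leaves roughly 111,513 tokens for the prompt.

```scala
val budget = client.promptTokenBudget() // method name assumed; uses the default headroom
// With contextWindow = 128000 and reserveCompletion = 4096:
// (128000 - 4096) * (1 - 0.10) = 123904 * 0.9 = 111513 (rounded down)
```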
Validates client configuration and connectivity to the LLM provider.
May perform checks such as verifying API credentials, testing connectivity, and validating configuration. Default implementation returns success; override for provider-specific validation.
Attributes
- Returns: Right(()) if validation succeeds, Left(LLMError) with details on failure
- Inherited from: LLMClient
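A fail-fast startup check, again with the method name assumed; the Right(())/Left(LLMError) shape follows the return description above.

```scala
client.validate() match {
  case Right(())   => println("LLM client configuration looks good")
  case Left(error) => sys.error(s"LLM client validation failed: $error")
}
```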