LLMClient implementation for Google Cloud Vertex AI.
Calls the Vertex AI REST API (aiplatform.googleapis.com) using the same Gemini-compatible JSON format as GeminiClient, but with:
- A region-scoped endpoint: https://{location}-aiplatform.googleapis.com/v1/projects/{project}/...
- OAuth2 bearer-token auth via VertexAIAuthProvider (ADC) instead of an API-key query param.
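As a rough illustration of the two differences listed above, the following sketch builds the documented URL prefix and a bearer-token header. The object and method names are hypothetical and not part of the library, and the remainder of the path (elided as "..." above) is deliberately left out.

```scala
object VertexEndpointSketch {
  // Builds only the documented region-scoped prefix; the rest of the path
  // is elided in the description above and is not reconstructed here.
  def baseUrl(project: String, location: String): String =
    s"https://${location}-aiplatform.googleapis.com/v1/projects/${project}"

  // OAuth2 bearer tokens travel in a request header rather than in an
  // API-key query parameter.
  def authHeader(accessToken: String): (String, String) =
    "Authorization" -> s"Bearer ${accessToken}"
}
```

For example, `VertexEndpointSketch.baseUrl("my-project", "us-central1")` yields `https://us-central1-aiplatform.googleapis.com/v1/projects/my-project`.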
== Authentication ==
Tokens are obtained and cached by VertexAIAuthProvider, which supports authorized_user refresh-token credentials (from gcloud auth application-default login), service_account JWT credentials, and the GCE/GKE metadata server (Workload Identity). Set GOOGLE_ACCESS_TOKEN as an escape hatch for testing.
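The GOOGLE_ACCESS_TOKEN escape hatch could work roughly as sketched below. This is an assumption about the lookup order, not the actual VertexAIAuthProvider logic: the sketch assumes the environment variable wins when it is set and non-empty, and that the normal cached ADC lookup (represented by a hypothetical `fallback` function) is used otherwise.

```scala
object TokenOverrideSketch {
  // Returns the override token when GOOGLE_ACCESS_TOKEN is set and non-empty;
  // otherwise defers to the supplied fallback, a stand-in for the cached ADC lookup.
  def resolveToken(
      env: Map[String, String],
      fallback: () => Either[String, String]
  ): Either[String, String] =
    env.get("GOOGLE_ACCESS_TOKEN").map(_.trim).filter(_.nonEmpty) match {
      case Some(token) => Right(token)
      case None        => fallback()
    }
}
```

In a test, this might be called as `TokenOverrideSketch.resolveToken(sys.env, () => Left("no ADC credentials configured"))`.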
== Message format ==
Identical to GeminiClient:
- Roles are "user" and "model".
- System messages go into systemInstruction.
- Tool results are sent as functionResponse parts keyed by function name.
- Synthetic UUIDs are generated for tool-call IDs (Vertex AI does not return them).
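The mapping rules above can be sketched as follows. The message types here are hypothetical stand-ins for the library's own conversation model, and joining multiple system messages with newlines is an assumption of this sketch.

```scala
import java.util.UUID

object GeminiFormatSketch {
  // Hypothetical stand-ins for the library's message types.
  sealed trait Msg
  final case class SystemMsg(text: String) extends Msg
  final case class UserMsg(text: String) extends Msg
  final case class AssistantMsg(text: String) extends Msg

  // Assistant turns use the "model" role on the wire; user turns stay "user".
  def wireRole(msg: Msg): Option[String] = msg match {
    case _: UserMsg      => Some("user")
    case _: AssistantMsg => Some("model")
    case _: SystemMsg    => None // carried in systemInstruction instead
  }

  // Collects system messages into a single instruction string
  // (joining with newlines is an assumption).
  def systemInstruction(conversation: Seq[Msg]): Option[String] = {
    val parts = conversation.collect { case SystemMsg(t) => t }
    if (parts.isEmpty) None else Some(parts.mkString("\n"))
  }

  // The API returns no tool-call IDs, so a random UUID is minted per call.
  def newToolCallId(): String = UUID.randomUUID().toString
}
```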
Value parameters
- config
-
VertexAIConfig with project, location, model, and credential path.
- exchangeLogging
-
Controls whether raw request/response bodies are recorded.
- httpClient
-
HTTP client (injectable for testing).
- metrics
-
Receives per-call latency and token-usage events.
Attributes
- Companion
- object
- Supertypes
-
trait BaseLifecycleLLMClient, trait MetricsRecording, trait LLMClient, trait AutoCloseable, class Object, trait Matchable, class Any
Members list
Value members
Concrete methods
Executes a blocking completion request and returns the full response.
Sends the conversation to the LLM and waits for the complete response. Use when you need the entire response at once or when streaming is not required.
Value parameters
- conversation
-
conversation history including system, user, assistant, and tool messages
- options
-
configuration including temperature, max tokens, tools, etc. (default: CompletionOptions())
Attributes
- Returns
-
Right(Completion) with the model's response, or Left(LLMError) on failure
- Definition Classes
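Because the result is an Either, callers typically branch on success and failure. A minimal sketch, using hypothetical stand-in types (the real Completion and LLMError carry more fields than shown here):

```scala
object CompleteResultSketch {
  // Hypothetical stand-ins for the library's result and error types.
  final case class Completion(content: String)
  final case class LLMError(message: String)

  // Turns the Either returned by a blocking completion into display text.
  def render(result: Either[LLMError, Completion]): String =
    result.fold(err => s"request failed: ${err.message}", c => c.content)
}
```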
Returns the maximum context window size supported by this model in tokens.
The context window is the total tokens (prompt + completion) the model can process in a single request, including all conversation messages and the generated response.
Attributes
- Returns
-
total context window size in tokens (e.g., 4096, 8192, 128000)
- Definition Classes
Returns the number of tokens reserved for the model's completion response.
This value is subtracted from the context window when calculating available tokens for prompts. Corresponds to the max_tokens or completion token limit configured for the model.
Attributes
- Returns
-
number of tokens reserved for completion
- Definition Classes
Executes a streaming completion request, invoking a callback for each chunk as it arrives.
Streams the response incrementally, calling onChunk for each token/chunk received. Enables real-time display of responses. Returns the final accumulated completion on success.
Value parameters
- conversation
-
conversation history including system, user, assistant, and tool messages
- onChunk
-
callback invoked for each chunk; it is called synchronously, so avoid blocking operations inside it
- options
-
configuration including temperature, max tokens, tools, etc. (default: CompletionOptions())
Attributes
- Returns
-
Right(Completion) with the complete accumulated response, or Left(LLMError) on failure
- Definition Classes
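The streaming contract described above (a synchronous callback per chunk, then the accumulated completion) can be illustrated with a stub. Everything here is hypothetical: the types are stand-ins and the stub replays a fixed list of chunks instead of calling Vertex AI.

```scala
object StreamingSketch {
  // Hypothetical stand-ins for the library's chunk, result, and error types.
  final case class Chunk(text: String)
  final case class Completion(content: String)
  final case class LLMError(message: String)

  // Calls onChunk synchronously for every chunk, in order, then returns
  // the accumulated text as the final completion.
  def streamStub(chunks: Seq[String], onChunk: Chunk => Unit): Either[LLMError, Completion] = {
    val acc = new StringBuilder
    chunks.foreach { text =>
      onChunk(Chunk(text))
      acc.append(text)
    }
    Right(Completion(acc.toString))
  }
}
```

Since the callback runs on the calling thread in this sketch, a slow callback delays every later chunk, which is why blocking work is best handed off elsewhere.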
Inherited methods
Releases resources and closes connections to the LLM provider.
Call when the client is no longer needed. After calling close(), the client should not be used. Default implementation is a no-op; override if managing resources like connections or thread pools.
Attributes
- Definition Classes
- Inherited from:
- BaseLifecycleLLMClient
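Since the client is an AutoCloseable, one way to guarantee that close() runs is scala.util.Using (Scala 2.13 and later). The sketch below uses a hypothetical stub client rather than the real one.

```scala
import scala.util.Using

object CloseSketch {
  // Hypothetical stand-in for a client; only the AutoCloseable part matters here.
  final class StubClient extends AutoCloseable {
    @volatile private var closed = false
    def isClosed: Boolean = closed
    def ask(prompt: String): String = s"echo: $prompt"
    override def close(): Unit = { closed = true }
  }

  // Using.resource closes the client after the body runs, even if the body throws.
  def askOnce(client: StubClient, prompt: String): String =
    Using.resource(client)(_.ask(prompt))
}
```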
Validates that the client is open, executes the operation, and records standard completion metrics (latency, token usage, estimated cost).
Use this in complete and streamComplete implementations to avoid repeating the lifecycle-check + metrics-wrapping boilerplate.
Value parameters
- operation
-
The provider-specific completion logic to execute. Called only when the client is open.
Attributes
- Returns
-
The completion result with metrics recorded as a side-effect.
- Inherited from:
- BaseLifecycleLLMClient
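The lifecycle check described above might look roughly like the following. This is a sketch under stated assumptions: the `Guard` class, the `whenOpen` name, and the error returned for a closed client are all hypothetical, and the metrics-recording half of the wrapper is omitted.

```scala
import java.util.concurrent.atomic.AtomicBoolean

object LifecycleSketch {
  // Hypothetical stand-in for the library's error type.
  final case class LLMError(message: String)

  final class Guard {
    private val closed = new AtomicBoolean(false)

    def close(): Unit = closed.set(true)

    // Runs the operation only while open. The parameter is by-name, so a
    // closed guard never evaluates the provider-specific logic.
    def whenOpen[A](operation: => Either[LLMError, A]): Either[LLMError, A] =
      if (closed.get()) Left(LLMError("client is closed")) else operation
  }
}
```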
Calculates available token budget for prompts after accounting for completion reserve and headroom.
Formula: (contextWindow - reserveCompletion) * (1 - headroom)
Headroom provides a safety margin for tokenization variations and message formatting overhead.
Value parameters
- headroom
-
safety margin as a percentage of the prompt budget (default: HeadroomPercent.Standard, ~10%)
Attributes
- Returns
-
maximum tokens available for prompt content
- Inherited from:
- LLMClient
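A worked instance of the formula above, as a standalone sketch. It assumes headroom is passed as a fraction (0.10 for 10%) and that the result is truncated to whole tokens; the library's actual rounding and its HeadroomPercent type are not reproduced here.

```scala
object PromptBudgetSketch {
  // (contextWindow - reserveCompletion) * (1 - headroom), truncated to whole tokens.
  // Truncation and the fractional headroom representation are assumptions.
  def promptBudget(contextWindow: Int, reserveCompletion: Int, headroom: Double): Int =
    ((contextWindow - reserveCompletion) * (1.0 - headroom)).toInt
}
```

With a 128000-token context window, 4096 tokens reserved for completion, and 10% headroom, this gives (128000 - 4096) * 0.9 = 111513.6, which truncates to 111513 prompt tokens.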
Validates client configuration and connectivity to the LLM provider.
May perform checks such as verifying API credentials, testing connectivity, and validating configuration. Default implementation returns success; override for provider-specific validation.
Attributes
- Returns
-
Right(()) if validation succeeds, Left(LLMError) with details on failure
- Inherited from:
- LLMClient
Attributes
- Inherited from:
- BaseLifecycleLLMClient
Executes operation and records metrics for the call.
Latency and outcome (success or classified error) are recorded for every call regardless of result. Token counts and cost are recorded only on success; a Left result instead emits an org.llm4s.metrics.Outcome.Error event whose kind is derived from the org.llm4s.error.LLMError subtype via ErrorKind.fromLLMError.
Value parameters
- extractCost
-
Extracts the pre-computed cost (USD) from a successful result; return None to skip cost recording.
- extractUsage
-
Extracts prompt/completion token counts from a successful result; return None to skip token recording.
- model
-
Model identifier forwarded to the collector.
- operation
-
The LLM call to time and observe.
- provider
-
Provider label forwarded to the collector (e.g. "openai").
Attributes
- Returns
-
The result of operation, unchanged.
- Inherited from:
- MetricsRecording
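The recording behaviour described for this method can be approximated with the sketch below. The `Collector` trait, `Usage` class, and `withMetrics` signature are hypothetical simplifications: cost extraction and error-kind classification are left out, and the real collector interface may differ.

```scala
object MetricsSketch {
  // Hypothetical stand-ins for the library's error and usage types.
  final case class LLMError(message: String)
  final case class Usage(promptTokens: Int, completionTokens: Int)

  // Hypothetical collector interface; the real one is provided by the library.
  trait Collector {
    def latency(provider: String, model: String, nanos: Long, success: Boolean): Unit
    def tokens(provider: String, model: String, usage: Usage): Unit
  }

  def withMetrics[A](provider: String, model: String, collector: Collector)(
      operation: => Either[LLMError, A]
  )(extractUsage: A => Option[Usage]): Either[LLMError, A] = {
    val start  = System.nanoTime()
    val result = operation
    // Latency and outcome are recorded for every call, success or failure.
    collector.latency(provider, model, System.nanoTime() - start, result.isRight)
    // Token usage is recorded only on success, and only when the extractor yields a value.
    result.foreach(a => extractUsage(a).foreach(u => collector.tokens(provider, model, u)))
    // The operation's result is passed through unchanged.
    result
  }
}
```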