org.llm4s.llmconnect.caching
Type members
Classlikes
Configuration for the CachingLLMClient semantic cache.
Uses the sealed-abstract-case-class pattern to prevent direct construction and disable the generated copy method; always construct via CacheConfig.create, which validates all fields and returns a typed error on invalid input.
Value parameters
- maxSize
-
Maximum number of entries in the in-memory cache. When the limit is reached the least-recently-used entry is evicted automatically.
- similarityThreshold
-
Minimum cosine similarity, in the range [0.0, 1.0], for a cache hit. A value of 1.0 requires near-identical queries; lower values allow semantically similar but textually different queries to share a cached response.
- ttl
-
Maximum age of a CacheEntry before it is considered expired and the cache is bypassed. Must be positive.
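The sealed-abstract-case-class pattern described above can be sketched as follows. This is a minimal illustration, not the library's source: `SketchCacheConfig` is a hypothetical name, and `Either[String, ...]` stands in for whatever typed error `CacheConfig.create` actually returns.

```scala
import scala.concurrent.duration._

// Hypothetical stand-in for CacheConfig. Being abstract, the case class gets no
// generated apply or copy; being sealed, it can only be subclassed in this file.
sealed abstract case class SketchCacheConfig(
  similarityThreshold: Double,
  ttl: FiniteDuration,
  maxSize: Int
)

object SketchCacheConfig {
  // Assumption: errors are plain strings here; the real API returns its own typed error.
  def create(
    similarityThreshold: Double,
    ttl: FiniteDuration,
    maxSize: Int
  ): Either[String, SketchCacheConfig] =
    if (similarityThreshold.isNaN || similarityThreshold < 0.0 || similarityThreshold > 1.0)
      Left(s"similarityThreshold must be in [0.0, 1.0], got $similarityThreshold")
    else if (ttl <= Duration.Zero)
      Left(s"ttl must be positive, got $ttl")
    else if (maxSize <= 0)
      Left(s"maxSize must be positive, got $maxSize")
    else
      // The anonymous subclass is the only way to obtain an instance.
      Right(new SketchCacheConfig(similarityThreshold, ttl, maxSize) {})
}
```

Because every instance passes through `create`, code that receives a config value can rely on its fields having been validated.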
Attributes
- Companion
- object
- Supertypes
-
trait Serializable, trait Product, trait Equals, class Object, trait Matchable, class Any
Attributes
- Companion
- class
- Supertypes
-
class Object, trait Matchable, class Any
- Self type
-
CacheConfig.type
An entry in the CachingLLMClient semantic cache.
Stores the embedding vector of the original query alongside the cached org.llm4s.llmconnect.model.Completion, so that later queries can be matched by cosine similarity rather than exact string equality.
Value parameters
- embedding
-
L2-normalised embedding vector of the query that produced this entry. Used for cosine-similarity lookup against new queries. Dimensionality matches the configured embedding model.
- options
-
The org.llm4s.llmconnect.model.CompletionOptions used to produce response. A cache hit requires an exact match on options; mismatched options (e.g. different temperature or tool set) result in an OptionsMismatch miss and bypass the cache.
- response
-
The org.llm4s.llmconnect.model.Completion returned by the LLM for the original query. This is the value returned to the caller on a cache hit.
- timestamp
-
Wall-clock time when this entry was inserted. Compared against CacheConfig.ttl to determine whether the entry has expired.
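Storing L2-normalised embeddings means cosine similarity reduces to a plain dot product. The sketch below illustrates that relationship; `CosineSketch` and its helpers are hypothetical names and are not part of the documented API.

```scala
object CosineSketch {
  // Scale a vector to unit length; a zero vector is returned unchanged.
  def l2Normalise(v: Vector[Double]): Vector[Double] = {
    val norm = math.sqrt(v.map(x => x * x).sum)
    if (norm == 0.0) v else v.map(_ / norm)
  }

  // Dot product of two vectors of equal length.
  def dot(a: Vector[Double], b: Vector[Double]): Double = {
    require(a.length == b.length, "vectors must have the same dimensionality")
    a.zip(b).map { case (x, y) => x * y }.sum
  }

  // For unit-length inputs this equals the cosine similarity.
  def cosineOfNormalised(a: Vector[Double], b: Vector[Double]): Double = dot(a, b)
}
```

The `require` mirrors the point that dimensionality must match the configured embedding model: vectors from different models cannot be compared.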
Attributes
- Supertypes
-
trait Serializable, trait Product, trait Equals, class Object, trait Matchable, class Any
Utility object for generating cache keys using secure hashing. Keys should be deterministic (same inputs always produce same key) and collision-resistant (different inputs shouldn't produce same key).
Attributes
- Supertypes
-
class Object, trait Matchable, class Any
- Self type
-
CacheKeyGenerator.type
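A deterministic, collision-resistant key of the kind described above could be derived with SHA-256 from the JDK's `MessageDigest`. This is a sketch under assumptions: `KeySketch.sha256Key`, its argument order, and the zero-byte separator are illustrative choices, not the actual methods of `CacheKeyGenerator`.

```scala
import java.nio.charset.StandardCharsets
import java.security.MessageDigest

object KeySketch {
  // Hypothetical helper: hashes the model name and the text into a hex string.
  def sha256Key(text: String, modelName: String): String = {
    val digest = MessageDigest.getInstance("SHA-256")
    digest.update(modelName.getBytes(StandardCharsets.UTF_8))
    // Separator byte so that ("ab", "c") and ("a", "bc") do not hash identically.
    digest.update(0.toByte)
    digest.update(text.getBytes(StandardCharsets.UTF_8))
    // Render the 32 digest bytes as 64 lowercase hex characters.
    digest.digest().map(b => "%02x".format(b & 0xff)).mkString
  }
}
```

Including the model name in the hash keeps embeddings produced by different models from sharing a key.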
Value parameters
- hitRatePercent
-
Percentage of requests served from cache (0.0 to 100.0).
- hits
-
Total number of successful cache lookups.
- misses
-
Total number of lookups that required a new embedding.
- size
-
Current number of entries in the cache.
- totalRequests
-
Combined sum of hits and misses.
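The relationship between these fields can be sketched as below. `StatsSketch` is a hypothetical helper, and returning 0.0 when there have been no requests is an assumption rather than documented behaviour.

```scala
object StatsSketch {
  // totalRequests is the sum of hits and misses.
  def totalRequests(hits: Long, misses: Long): Long = hits + misses

  // Percentage of requests served from cache, in the range 0.0 to 100.0.
  // Assumption: an empty history reports 0.0 rather than NaN.
  def hitRatePercent(hits: Long, misses: Long): Double = {
    val total = totalRequests(hits, misses)
    if (total == 0L) 0.0 else hits.toDouble / total.toDouble * 100.0
  }
}
```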
Attributes
- Supertypes
-
trait Serializable, trait Product, trait Equals, class Object, trait Matchable, class Any
A caching decorator for EmbeddingClient that avoids redundant provider calls.
On each embed call, input texts are checked against the cache first. Only texts that are not cached are forwarded to the base client — in a single batched request. Results are then stored in the cache and merged with cached hits before being returned, preserving the original input order.
Errors returned by the base client are never cached, so transient failures are retried on the next call.
Note: EmbeddingClient is a concrete class rather than a trait, so this wrapper cannot be used as a drop-in substitute for EmbeddingClient in APIs that require that type. A follow-up issue should extract an embedding service interface to allow proper decorator substitution.
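The embed flow described above (check the cache, forward only the misses in one batch, merge in input order, never cache errors) can be sketched as follows. All names here are hypothetical: the cache is a plain mutable map keyed by raw text rather than by a generated key, and the base client is modelled as a function returning `Either[String, ...]`.

```scala
import scala.collection.mutable

object BatchCacheSketch {
  type Embedding = Vector[Double]

  def embedWithCache(
    texts: Seq[String],
    cache: mutable.Map[String, Embedding],
    embedBatch: Seq[String] => Either[String, Seq[Embedding]] // stand-in for the base client
  ): Either[String, Seq[Embedding]] = {
    // Only texts that are not cached are sent on, each at most once.
    val missing = texts.distinct.filterNot(cache.contains)

    val fetched: Either[String, Unit] =
      if (missing.isEmpty) Right(())
      else
        embedBatch(missing).flatMap { vectors =>
          if (vectors.length != missing.length)
            Left("provider returned an unexpected number of embeddings")
          else {
            // Store results only on success, so errors are never cached.
            missing.zip(vectors).foreach { case (t, v) => cache.update(t, v) }
            Right(())
          }
        }

    // Merge cached and freshly fetched vectors in the original input order.
    fetched.map(_ => texts.map(t => cache(t)))
  }
}
```

A failed batch leaves the cache untouched, so the same texts are simply requested again on the next call.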
Value parameters
- baseClient
-
The underlying client used to generate embeddings on cache misses.
- cache
-
The storage backend for the embedding vectors.
- keyGenerator
-
Function that maps (text, modelName) to a cache key (defaults to SHA-256).
Attributes
- Supertypes
-
class Object, trait Matchable, class Any
Semantic caching wrapper for LLMClient.
Caches LLM completions based on the semantic similarity of the prompt request. Useful for reducing costs and latency for repetitive or similar queries.
== Usage ==
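import scala.concurrent.duration._ // provides the `1.hour` syntax

val cachingClient = new CachingLLMClient(
  baseClient = openAIClient,
  embeddingClient = embeddingClient,
  embeddingModel = EmbeddingModelConfig("text-embedding-3-small", 1536),
  // The CacheConfig documentation states that instances must be built via
  // CacheConfig.create, which validates the fields; substitute its unwrapped
  // result here if the direct constructor call is unavailable.
  config = CacheConfig(
    similarityThreshold = 0.95,
    ttl = 1.hour,
    maxSize = 1000
  ),
  tracing = tracing
)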
== Behavior ==
- Computes embedding for the user/system prompt.
- Searches cache for entries within similarityThreshold.
- Validates additional constraints:
  - Entry must be within TTL.
  - Entry CompletionOptions must strictly match the request options.
- On Hit: Returns cached Completion and updates LRU order. Emits cache_hit trace event.
- On Miss: Delegates to baseClient, caches the result, and emits cache_miss trace event.
== Limitations ==
- streamComplete requests bypass the cache entirely.
- Cache is in-memory and lost on restart.
- Cache lookup involves a linear scan (O(n)) of all entries to calculate cosine similarity. Performance may degrade with very large maxSize.
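The lookup steps under Behavior, including the linear scan noted under Limitations, can be sketched roughly as below. This is an illustration under assumptions, not the library's implementation: `LookupSketch` and its types are hypothetical, options and responses are modelled as strings, embeddings are assumed to be L2-normalised, and picking the most similar valid entry is a guess at the tie-breaking rule.

```scala
import java.time.{Duration, Instant}

object LookupSketch {
  // Simplified stand-in for CacheEntry.
  final case class Entry(
    embedding: Vector[Double],
    options: String,
    response: String,
    timestamp: Instant
  )

  // Dot product; equals cosine similarity for unit-length vectors.
  private def dot(a: Vector[Double], b: Vector[Double]): Double =
    a.zip(b).map { case (x, y) => x * y }.sum

  def lookup(
    query: Vector[Double],
    options: String,
    entries: Seq[Entry],
    threshold: Double,
    ttl: Duration,
    now: Instant
  ): Option[String] =
    entries
      .map(e => (e, dot(query, e.embedding)))            // O(n) scan over every entry
      .filter { case (_, sim) => sim >= threshold }      // similarity threshold
      .filter { case (e, _) =>
        Duration.between(e.timestamp, now).compareTo(ttl) <= 0 // still within TTL
      }
      .filter { case (e, _) => e.options == options }    // options must match exactly
      .maxByOption { case (_, sim) => sim }              // assumption: best match wins
      .map { case (e, _) => e.response }
}
```

A `None` result corresponds to a miss, at which point the wrapper would delegate to the base client and cache the new completion.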
Value parameters
- baseClient
-
The underlying LLM client to delegate to on cache miss.
- clock
-
Clock for TTL verification (defaults to UTC).
- config
-
Cache configuration (threshold, TTL, max size).
- embeddingClient
-
Client to generate embeddings for prompts.
- embeddingModel
-
Configuration for the embedding model used.
- tracing
-
Tracing instance for observability.
Attributes
- Supertypes
Generic trait for embedding storage backends.
Type parameters
- Embedding
-
The type of the embedding representation (usually Seq[Double]).
Attributes
- Supertypes
-
class Object, trait Matchable, class Any
- Known subtypes
-
class InMemoryEmbeddingCache[Embedding]
Thread-safe in-memory implementation of EmbeddingCache with LRU eviction.
Type parameters
- Embedding
-
The embedding type (usually Seq[Double]).
Value parameters
- maxSize
-
The maximum number of embeddings to store before evicting the oldest.
- ttl
-
Optional Time-To-Live for cache entries. Expired entries are lazily evicted on access.
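A thread-safe LRU store of this kind can be sketched on top of `java.util.LinkedHashMap` in access-order mode. This is one possible approach and not necessarily how InMemoryEmbeddingCache is built; `LruSketch` is a hypothetical name and TTL handling is left out for brevity.

```scala
import java.util.{LinkedHashMap => JLinkedHashMap, Map => JMap}

final class LruSketch[K, V](maxSize: Int) {
  // accessOrder = true moves an entry to the end on every get or put,
  // so the first entry is always the least recently used one.
  private val underlying = new JLinkedHashMap[K, V](16, 0.75f, true) {
    override def removeEldestEntry(eldest: JMap.Entry[K, V]): Boolean =
      size() > maxSize
  }

  // All access is serialised on this instance for thread safety.
  def get(key: K): Option[V] = synchronized(Option(underlying.get(key)))

  def put(key: K, value: V): Unit = synchronized {
    underlying.put(key, value)
    ()
  }

  def size: Int = synchronized(underlying.size())
}
```

Coarse-grained `synchronized` is the simplest correct choice here, since even a read mutates the access order.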
Attributes
- Supertypes