package org.llm4s.llmconnect.caching
Members list
Type members
Classlikes
Attributes
- Companion: object
- Supertypes: trait Serializable, trait Product, trait Equals, class Object, trait Matchable, class Any
object CacheConfig
Attributes
- Companion: class
- Supertypes: class Object, trait Matchable, class Any
- Self type: CacheConfig.type
case class CacheEntry(embedding: Seq[Double], response: Completion, timestamp: Instant, options: CompletionOptions)
Attributes
- Supertypes: trait Serializable, trait Product, trait Equals, class Object, trait Matchable, class Any
class CachingLLMClient(baseClient: LLMClient, embeddingClient: EmbeddingClient, embeddingModel: EmbeddingModelConfig, config: CacheConfig, tracing: Tracing, clock: Clock) extends LLMClient
Semantic caching wrapper for LLMClient.
Caches LLM completions based on the semantic similarity of the prompt request. Useful for reducing costs and latency for repetitive or similar queries.
== Usage ==
import scala.concurrent.duration._ // provides the 1.hour syntax

val cachingClient = new CachingLLMClient(
  baseClient = openAIClient,
  embeddingClient = embeddingClient,
  embeddingModel = EmbeddingModelConfig("text-embedding-3-small", 1536),
  config = CacheConfig(
    similarityThreshold = 0.95,
    ttl = 1.hour,
    maxSize = 1000
  ),
  tracing = tracing
)
== Behavior ==
- Computes an embedding for the user/system prompt.
- Searches the cache for entries within similarityThreshold.
- Validates additional constraints:
  - Entry must be within TTL.
  - Entry CompletionOptions must strictly match the request options.
- On Hit: Returns the cached Completion and updates LRU order. Emits a cache_hit trace event.
- On Miss: Delegates to baseClient, caches the result, and emits a cache_miss trace event.
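The lookup steps listed under Behavior can be sketched roughly as follows. This is a minimal illustration, not the library's implementation: `Options` and `Entry` are hypothetical stand-ins for `CompletionOptions` and `CacheEntry`, a hit is assumed to mean "cosine similarity is at least the threshold", and returning the highest-scoring candidate (rather than the first match) is also an assumption.

```scala
import java.time.{Duration, Instant}

object SemanticLookupSketch {
  // Hypothetical stand-ins for the library's CompletionOptions and CacheEntry.
  final case class Options(temperature: Double, maxTokens: Int)
  final case class Entry(embedding: Seq[Double], response: String, timestamp: Instant, options: Options)

  /** Cosine similarity of two equal-length vectors; 0.0 if either vector has zero norm. */
  def cosine(a: Seq[Double], b: Seq[Double]): Double = {
    require(a.length == b.length, "embeddings must have the same dimension")
    val dot   = a.zip(b).map { case (x, y) => x * y }.sum
    val normA = math.sqrt(a.map(x => x * x).sum)
    val normB = math.sqrt(b.map(x => x * x).sum)
    if (normA == 0.0 || normB == 0.0) 0.0 else dot / (normA * normB)
  }

  /** Linear scan over all entries: options must match exactly, the entry must be
    * within TTL, and its similarity to the query must reach the threshold. */
  def lookup(
      entries: Seq[Entry],
      query: Seq[Double],
      options: Options,
      threshold: Double,
      ttl: Duration,
      now: Instant
  ): Option[Entry] = {
    val candidates = entries
      .filter(e => e.options == options)
      .filter(e => Duration.between(e.timestamp, now).compareTo(ttl) <= 0)
      .map(e => (e, cosine(e.embedding, query)))
      .filter { case (_, score) => score >= threshold }
    // Assumption: prefer the most similar surviving entry.
    if (candidates.isEmpty) None else Some(candidates.maxBy(_._2)._1)
  }
}
```

A `None` result corresponds to the On Miss path, where the wrapper would call baseClient and store the new result.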
== Limitations ==
- streamComplete requests bypass the cache entirely.
- Cache is in-memory and lost on restart.
- Cache lookup involves a linear scan (O(n)) of all entries to calculate cosine similarity. Performance may degrade with very large maxSize.
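The documentation does not say how the in-memory store enforces maxSize or tracks LRU order. One common JVM approach, shown here purely as an illustrative sketch, is an access-ordered `java.util.LinkedHashMap` that evicts its eldest entry once the bound is exceeded. `LruMap` is a hypothetical name, and the class is not thread-safe as written.

```scala
import java.util.{LinkedHashMap => JLinkedHashMap, Map => JMap}

object LruSketch {
  /** Size-bounded map that evicts the least recently accessed entry.
    * The third constructor argument (accessOrder = true) makes get() and put()
    * move the touched key to the most-recently-used position. */
  final class LruMap[K, V](maxSize: Int) extends JLinkedHashMap[K, V](16, 0.75f, true) {
    // Called by LinkedHashMap after each insertion; returning true drops the eldest entry.
    override protected def removeEldestEntry(eldest: JMap.Entry[K, V]): Boolean =
      size() > maxSize
  }
}
```

Because such a map lives only in process memory, its contents disappear on restart, which matches the limitation noted above.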
Value parameters
- baseClient: The underlying LLM client to delegate to on cache miss.
- clock: Clock for TTL verification (defaults to UTC).
- config: Cache configuration (threshold, TTL, max size).
- embeddingClient: Client to generate embeddings for prompts.
- embeddingModel: Configuration for the embedding model used.
- tracing: Tracing instance for observability.
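Passing the clock as a parameter makes TTL behavior testable without waiting for real time to pass. The sketch below assumes the clock is a `java.time.Clock` and that an entry whose age exactly equals the TTL still counts as fresh; `isFresh` is a hypothetical helper, not part of the library.

```scala
import java.time.{Clock, Duration, Instant}

object TtlClockSketch {
  /** True when an entry written at `timestamp` is still within `ttl`,
    * measured against the injected clock rather than the system time. */
  def isFresh(timestamp: Instant, ttl: Duration, clock: Clock): Boolean =
    Duration.between(timestamp, clock.instant()).compareTo(ttl) <= 0
}
```

In a test, `Clock.fixed(instant, ZoneOffset.UTC)` pins "now" to a chosen instant, while production code would typically rely on `Clock.systemUTC()`.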
Attributes
- Supertypes