org.llm4s.llmconnect.caching

Members list

Type members

Classlikes

sealed abstract case class CacheConfig

Configuration for the CachingLLMClient semantic cache.

Uses the sealed-abstract-case-class pattern to prevent direct construction and disable the generated copy method; always construct via CacheConfig.create, which validates all fields and returns a typed error on invalid input.
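A minimal, self-contained sketch of the sealed-abstract-case-class pattern described above. The names here (`Config`, its fields, the `String` error type) are illustrative stand-ins, not the actual llm4s definitions: the abstract modifier suppresses the generated `apply` and `copy`, so the only way to obtain an instance is the validating smart constructor.

```scala
// Illustrative sketch of the sealed-abstract-case-class pattern.
// Names and the String error type are hypothetical, not llm4s's own.
sealed abstract case class Config(threshold: Double, maxSize: Int)

object Config {
  // Smart constructor: validates fields and returns a typed error on bad input.
  def create(threshold: Double, maxSize: Int): Either[String, Config] =
    if (threshold < 0.0 || threshold > 1.0) Left(s"threshold out of range: $threshold")
    else if (maxSize <= 0) Left(s"maxSize must be positive: $maxSize")
    else Right(new Config(threshold, maxSize) {}) // anonymous subclass: only construction path
}
```

Because the case class is abstract, `Config(0.95, 100)` and `cfg.copy(...)` do not compile, which is what forces all construction through `create`.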

Value parameters

maxSize

Maximum number of entries in the in-memory cache. When the limit is reached the least-recently-used entry is evicted automatically.

similarityThreshold

Minimum cosine similarity [0.0, 1.0] for a cache hit. A value of 1.0 requires near-identical queries; lower values allow semantically similar but textually different queries to share a cached response.

ttl

Maximum age of a CacheEntry before it is considered expired and the cache is bypassed. Must be positive.

Attributes

Companion
object
Supertypes
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any
object CacheConfig

Attributes

Companion
class
Supertypes
class Object
trait Matchable
class Any
Self type
case class CacheEntry(embedding: Seq[Double], response: Completion, timestamp: Instant, options: CompletionOptions)

An entry in the CachingLLMClient semantic cache.

Stores the embedding vector of the original query alongside the cached org.llm4s.llmconnect.model.Completion, so that later queries can be matched by cosine similarity rather than exact string equality.

Value parameters

embedding

L2-normalised embedding vector of the query that produced this entry. Used for cosine-similarity lookup against new queries. Dimensionality matches the configured embedding model.

options

The org.llm4s.llmconnect.model.CompletionOptions used to produce response. A cache hit requires an exact match on options; mismatched options (e.g. a different temperature or tool set) result in an OptionsMismatch miss and bypass the cache.

response

The org.llm4s.llmconnect.model.Completion returned by the LLM for the original query. This is the value returned to the caller on a cache hit.

timestamp

Wall-clock time when this entry was inserted. Compared against CacheConfig.ttl to determine whether the entry has expired.
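Since stored embeddings are L2-normalised, cosine similarity against a (likewise normalised) query vector reduces to a plain dot product. A minimal sketch of that arithmetic, independent of any llm4s types:

```scala
// Dot product of two equal-length vectors.
def dot(a: Seq[Double], b: Seq[Double]): Double =
  a.zip(b).map { case (x, y) => x * y }.sum

// Scale a vector to unit L2 norm (zero vectors are returned unchanged).
def l2Normalise(v: Seq[Double]): Seq[Double] = {
  val norm = math.sqrt(v.map(x => x * x).sum)
  if (norm == 0.0) v else v.map(_ / norm)
}

// Cosine similarity: for pre-normalised vectors this is just dot(a, b).
def cosine(a: Seq[Double], b: Seq[Double]): Double =
  dot(l2Normalise(a), l2Normalise(b))
```

Normalising once at insertion time (as CacheEntry does) means each lookup pays only for the dot product, not the square roots.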

Attributes

Supertypes
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any

Utility object for generating cache keys using secure hashing. Keys should be deterministic (same inputs always produce same key) and collision-resistant (different inputs shouldn't produce same key).

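A sketch of what such a deterministic, collision-resistant key function can look like using java.security.MessageDigest. The object's actual name and signature are not shown in this listing, so `cacheKey` here is a hypothetical stand-in:

```scala
import java.security.MessageDigest

// Deterministic, collision-resistant key: SHA-256 over model name and text.
// The NUL separator prevents ambiguous concatenations ("ab"+"c" vs "a"+"bc").
def cacheKey(text: String, modelName: String): String = {
  val digest = MessageDigest.getInstance("SHA-256")
  val bytes  = digest.digest(s"$modelName\u0000$text".getBytes("UTF-8"))
  bytes.map("%02x".format(_)).mkString // 64-char lowercase hex string
}
```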
Attributes

Supertypes
class Object
trait Matchable
class Any
Self type
case class CacheStats(size: Int, hits: Long, misses: Long, totalRequests: Long, hitRatePercent: Double)

Value parameters

hitRatePercent

Percentage of requests served from cache (0.0 to 100.0).

hits

Total number of successful cache lookups.

misses

Total number of lookups that found no cached entry.

size

Current number of entries in the cache.

totalRequests

Combined sum of hits and misses.
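The derived fields follow directly from hits and misses; a minimal sketch of the arithmetic (guarding the empty-cache case):

```scala
// hitRatePercent = hits / (hits + misses) * 100, defined as 0.0 when
// no requests have been made yet.
def hitRatePercent(hits: Long, misses: Long): Double = {
  val total = hits + misses
  if (total == 0L) 0.0 else hits.toDouble / total * 100.0
}
```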

Attributes

Supertypes
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any
class CachedEmbeddingClient(baseClient: EmbeddingClient, cache: EmbeddingCache[Seq[Double]], keyGenerator: (String, String) => String)

A caching decorator for EmbeddingClient that avoids redundant provider calls.

On each embed call, input texts are checked against the cache first. Only texts that are not cached are forwarded to the base client — in a single batched request. Results are then stored in the cache and merged with cached hits before being returned, preserving the original input order.

Errors returned by the base client are never cached, so transient failures are retried on the next call.

Note: EmbeddingClient is a concrete class rather than a trait, so this wrapper cannot be used as a drop-in substitute for EmbeddingClient in APIs that require that type. A follow-up issue should extract an embedding service interface to allow proper decorator substitution.

Value parameters

baseClient

The underlying client used to generate embeddings on cache misses.

cache

The storage backend for the embedding vectors.

keyGenerator

Function that maps (text, modelName) to a cache key (defaults to SHA-256).

Attributes

Supertypes
class Object
trait Matchable
class Any
class CachingLLMClient(baseClient: LLMClient, embeddingClient: EmbeddingClient, embeddingModel: EmbeddingModelConfig, config: CacheConfig, tracing: Tracing, clock: Clock) extends LLMClient

Semantic caching wrapper for LLMClient.

Caches LLM completions based on the semantic similarity of the prompt request. Useful for reducing costs and latency for repetitive or similar queries.

== Usage ==

 val cachingClient = CacheConfig.create(
   similarityThreshold = 0.95,
   ttl = 1.hour,
   maxSize = 1000
 ).map { config =>
   new CachingLLMClient(
     baseClient = openAIClient,
     embeddingClient = embeddingClient,
     embeddingModel = EmbeddingModelConfig("text-embedding-3-small", 1536),
     config = config,
     tracing = tracing
   )
 }

== Behavior ==

  • Computes an embedding for the user/system prompt.
  • Searches the cache for entries within similarityThreshold.
  • Validates additional constraints:
    • The entry must be within TTL.
    • The entry's CompletionOptions must strictly match the request options.
  • On Hit: Returns the cached Completion and updates the LRU order. Emits a cache_hit trace event.
  • On Miss: Delegates to baseClient, caches the result, and emits a cache_miss trace event.

== Limitations ==

  • streamComplete requests bypass the cache entirely.
  • Cache is in-memory and lost on restart.
  • Cache lookup involves a linear scan (O(n)) of all entries to calculate cosine similarity. Performance may degrade with very large maxSize.
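The O(n) lookup described above can be sketched as follows. The types are simplified stand-ins (not the actual llm4s internals): `Entry` collapses CacheEntry, and options matching is modelled as string equality. Embeddings are assumed L2-normalised, so cosine similarity is a plain dot product.

```scala
import java.time.{Duration, Instant}

// Simplified stand-in for CacheEntry; `options` models CompletionOptions.
final case class Entry(embedding: Seq[Double], response: String,
                       timestamp: Instant, options: String)

// Linear scan: keep entries with matching options and within TTL, score each
// by dot product against the query, and return the best score >= threshold.
def lookup(entries: Seq[Entry], query: Seq[Double], options: String,
           threshold: Double, ttl: Duration, now: Instant): Option[Entry] =
  entries
    .filter(e => e.options == options &&
                 Duration.between(e.timestamp, now).compareTo(ttl) <= 0)
    .map(e => e -> e.embedding.zip(query).map { case (x, y) => x * y }.sum)
    .filter { case (_, score) => score >= threshold }
    .sortBy { case (_, score) => -score }
    .headOption
    .map { case (entry, _) => entry }
```

Every entry is scored on every lookup, which is why performance degrades as maxSize grows; an approximate-nearest-neighbour index would avoid the full scan at the cost of extra complexity.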

Value parameters

baseClient

The underlying LLM client to delegate to on cache miss.

clock

Clock for TTL verification (defaults to UTC).

config

Cache configuration (threshold, TTL, max size).

embeddingClient

Client to generate embeddings for prompts.

embeddingModel

Configuration for the embedding model used.

tracing

Tracing instance for observability.

Attributes

Supertypes
trait LLMClient
trait AutoCloseable
class Object
trait Matchable
class Any
trait EmbeddingCache[Embedding]

Generic trait for embedding storage backends.

Type parameters

Embedding

The type of the embedding representation (usually Seq[Double]).

Attributes

Supertypes
class Object
trait Matchable
class Any
Known subtypes
class InMemoryEmbeddingCache[Embedding]
class InMemoryEmbeddingCache[Embedding](maxSize: Int, ttl: Option[FiniteDuration], clock: () => Long) extends EmbeddingCache[Embedding]

Thread-safe in-memory implementation of EmbeddingCache with LRU eviction.

Type parameters

Embedding

The embedding type (usually Seq[Double]).

Value parameters

maxSize

The maximum number of embeddings to store before evicting the least-recently-used entry.

ttl

Optional Time-To-Live for cache entries. Expired entries are lazily evicted on access.
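A minimal sketch of the LRU-eviction mechanic using java.util.LinkedHashMap in access order. This is illustrative only (the actual implementation is not shown here, and synchronization for thread safety is omitted):

```scala
import java.util.{LinkedHashMap => JLinkedHashMap, Map => JMap}

// accessOrder = true makes iteration order least-recently-accessed first;
// removeEldestEntry evicts that LRU entry once the map exceeds maxSize.
def lruCache[K, V](maxSize: Int): JMap[K, V] =
  new JLinkedHashMap[K, V](16, 0.75f, true) {
    override def removeEldestEntry(eldest: JMap.Entry[K, V]): Boolean =
      size() > maxSize
  }
```

Reads count as "use" in access order, so a `get` refreshes an entry's position just as a `put` does.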

Attributes

Supertypes
trait EmbeddingCache[Embedding]
class Object
trait Matchable
class Any