org.llm4s.llmconnect.caching

Members list

Type members

Classlikes

sealed abstract case class CacheConfig

Attributes

Companion
object
Supertypes
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any
object CacheConfig

Attributes

Companion
class
Supertypes
class Object
trait Matchable
class Any
Self type
case class CacheEntry(embedding: Seq[Double], response: Completion, timestamp: Instant, options: CompletionOptions)

Attributes

Supertypes
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any
class CachingLLMClient(baseClient: LLMClient, embeddingClient: EmbeddingClient, embeddingModel: EmbeddingModelConfig, config: CacheConfig, tracing: Tracing, clock: Clock) extends LLMClient

Semantic caching wrapper for LLMClient.

Caches LLM completions based on the semantic similarity of the prompt request. Useful for reducing costs and latency for repetitive or similar queries.

== Usage ==

val cachingClient = new CachingLLMClient(
  baseClient = openAIClient,
  embeddingClient = embeddingClient,
  embeddingModel = EmbeddingModelConfig("text-embedding-3-small", 1536),
  config = CacheConfig(
    similarityThreshold = 0.95,
    ttl = 1.hour,
    maxSize = 1000
  ),
  tracing = tracing
)

== Behavior ==

  • Computes an embedding for the user/system prompt.
  • Searches the cache for entries within similarityThreshold.
  • Validates additional constraints:
    • The entry must be within its TTL.
    • The entry's CompletionOptions must strictly match the request options.
  • On Hit: Returns the cached Completion and updates the LRU order. Emits a cache_hit trace event.
  • On Miss: Delegates to baseClient, caches the result, and emits a cache_miss trace event.
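The lookup step above can be sketched as a cosine-similarity scan over cached entries. This is a minimal illustration with simplified, assumed types (`Entry`, `lookup`); it is not the actual CachingLLMClient internals.

```scala
// Minimal sketch of a semantic cache lookup. Types and names here are
// assumptions for illustration, not the real CachingLLMClient internals.
object SemanticLookupSketch {
  // Simplified stand-in for a cache entry (the real CacheEntry also
  // carries a timestamp and CompletionOptions).
  final case class Entry(embedding: Seq[Double], response: String)

  // Cosine similarity between two embedding vectors.
  def cosine(a: Seq[Double], b: Seq[Double]): Double = {
    val dot   = a.lazyZip(b).map(_ * _).sum
    val normA = math.sqrt(a.map(x => x * x).sum)
    val normB = math.sqrt(b.map(x => x * x).sum)
    if (normA == 0.0 || normB == 0.0) 0.0 else dot / (normA * normB)
  }

  // Linear O(n) scan: return the most similar entry at or above the
  // threshold, if any.
  def lookup(query: Seq[Double], entries: Seq[Entry], threshold: Double): Option[Entry] =
    entries
      .map(e => e -> cosine(query, e.embedding))
      .filter { case (_, sim) => sim >= threshold }
      .sortBy { case (_, sim) => -sim }
      .headOption
      .map(_._1)
}
```

In the real client, a candidate found this way must additionally pass the TTL and CompletionOptions checks listed above before it counts as a hit.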

== Limitations ==

  • streamComplete requests bypass the cache entirely.
  • Cache is in-memory and lost on restart.
  • Cache lookup involves a linear scan (O(n)) of all entries to calculate cosine similarity. Performance may degrade with very large maxSize.
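The injected Clock exists so the TTL check can be made deterministic in tests. A sketch of such a check, using java.time directly (the helper name and simplified signature are assumptions, not the actual implementation):

```scala
import java.time.{Clock, Duration, Instant}

// Hypothetical sketch of a TTL freshness check. Injecting a Clock (rather
// than calling Instant.now() directly) lets tests pin "now" to a fixed point.
object TtlSketch {
  // An entry is fresh if its age, measured against the injected clock,
  // does not exceed the TTL.
  def isFresh(timestamp: Instant, ttl: Duration, clock: Clock): Boolean =
    Duration.between(timestamp, Instant.now(clock)).compareTo(ttl) <= 0
}
```

With `Clock.fixed(...)` an expired entry can be exercised without sleeping, which is why the constructor takes a Clock parameter instead of reading system time internally.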

Value parameters

baseClient

The underlying LLM client to delegate to on cache miss.

clock

Clock for TTL verification (defaults to UTC).

config

Cache configuration (threshold, TTL, max size).

embeddingClient

Client to generate embeddings for prompts.

embeddingModel

Configuration for the embedding model used.

tracing

Tracing instance for observability.

Attributes

Supertypes
trait LLMClient
class Object
trait Matchable
class Any