LLMCompressor

org.llm4s.context.LLMCompressor
object LLMCompressor

Applies LLM-powered compression to [HISTORY_SUMMARY] digest messages.

This is Step 3 in the 4-stage context management pipeline. It targets only the structured digest messages produced by HistoryCompressor — it never touches user messages, assistant messages, or tool outputs directly.
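To make this scope concrete, here is a minimal sketch of how digest messages might be picked out of a conversation while every other message is ignored. It assumes that a digest is recognised only by a leading [HISTORY_SUMMARY] marker. It also uses a hypothetical `Msg` case class in place of the llm4s `Message` type. The real selection logic may differ.

```scala
// Hypothetical stand-in for the llm4s Message type: only role and content matter here.
object DigestSelectionSketch {
  final case class Msg(role: String, content: String)

  // Assumption: digests are identified by this literal prefix.
  val DigestMarker: String = "[HISTORY_SUMMARY]"

  def isDigest(m: Msg): Boolean = m.content.startsWith(DigestMarker)

  // Positions of digest messages in the conversation; every other message is ignored.
  def digestIndices(messages: Seq[Msg]): Seq[Int] =
    messages.zipWithIndex.collect { case (m, i) if isDigest(m) => i }
}
```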

==When to Use==

Use LLMCompressor (via ContextManager with enableLLMCompression = true) when:

  • The conversation is long-running and history digests have grown too large to fit within the remaining token budget after HistoryCompressor has run.
  • Preserving semantic fidelity in the digest is important — you cannot afford to simply truncate or drop structured information (IDs, decisions, error codes).
  • The added latency and cost of extra LLM API calls (one per digest message whenever compression runs) are acceptable.

==When NOT to Use==

Prefer DeterministicCompressor or HistoryCompressor alone when:

  • Latency is critical (e.g., interactive chatbots where each round-trip matters).
  • Cost per token must be minimised — an extra inference call adds to compression cost.
  • The conversation fits within budget after deterministic steps; ContextManager skips this step automatically in that case.
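The rules of thumb in the two lists above can be condensed into a predicate. This is a hypothetical helper: `Situation`, `shouldEnableLLMCompression` and `stepWouldRun` are illustrative names, not llm4s API. In practice the decision is expressed through the `enableLLMCompression` flag on ContextManager.

```scala
// Hypothetical helper that encodes the "When to Use" / "When NOT to Use" guidance.
object CompressionChoiceSketch {
  final case class Situation(
      digestTokens: Int,        // combined tokens of all [HISTORY_SUMMARY] messages after Step 2
      capTokens: Int,           // combined budget for those digests
      needsFidelity: Boolean,   // IDs, decisions and error codes must survive
      latencyCritical: Boolean, // e.g. an interactive chatbot
      costSensitive: Boolean    // extra inference calls are not affordable
  )

  // Whether it is worth turning LLM compression on at all.
  def shouldEnableLLMCompression(s: Situation): Boolean =
    s.needsFidelity && !s.latencyCritical && !s.costSensitive

  // Even when enabled, the step does no work while the digests fit within the cap.
  def stepWouldRun(s: Situation): Boolean =
    shouldEnableLLMCompression(s) && s.digestTokens > s.capTokens
}
```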

==Cost==

When the combined token count of all [HISTORY_SUMMARY] messages exceeds capTokens, squeezeDigest makes one LLM API call per [HISTORY_SUMMARY] message. The cap is a combined budget, not a per-message threshold. Once it is exceeded, every digest is sent for compression, including digests that are small on their own. Budget for this accordingly.
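The cost rule reduces to simple arithmetic. The sketch below is a hypothetical helper, not part of llm4s. It takes the token counts of the individual digests and returns how many LLM calls a single squeezeDigest invocation would make under the rule just stated.

```scala
// Hypothetical helper: expected LLM calls for one squeezeDigest invocation.
object CompressionCostSketch {
  // Zero if there are no digests or their combined size fits the cap;
  // otherwise one call per digest, regardless of each digest's own size.
  def expectedLlmCalls(digestTokenCounts: Seq[Int], capTokens: Int): Int =
    if (digestTokenCounts.isEmpty || digestTokenCounts.sum <= capTokens) 0
    else digestTokenCounts.size
}
```

For example, three digests of 400, 300 and 500 tokens against a cap of 1000 cost three calls. Each digest is individually under the cap, but the check is made on the 1200-token total.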

==Pipeline Position==

Step 1: DeterministicCompressor — free, fast, tool-output focused
Step 2: HistoryCompressor       — free, fast, deterministic digest
Step 3: LLMCompressor           — 1 LLM call per digest, slower, high quality  ← this object
Step 4: TokenWindow.trimToBudget — free, last resort
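The ordering above can be modelled as a budget-gated chain. This is a simplification under stated assumptions: messages are plain strings, tokens are approximated by word count, and each stage is an arbitrary function. The sketch shows only the idea that later, costlier stages run while the conversation is still over budget. It does not show how ContextManager is actually implemented.

```scala
// Hypothetical model of the four-step pipeline; not the ContextManager implementation.
object PipelineSketch {
  type Conv = Vector[String]

  // Crude token estimate: whitespace-separated words.
  def tokens(c: Conv): Int = c.map(_.split("\\s+").count(_.nonEmpty)).sum

  final case class Stage(name: String, run: Conv => Conv)

  // Run stages in order, skipping any stage once the conversation fits the budget.
  // Returns the final conversation and the names of the stages that actually ran.
  def runPipeline(c: Conv, budget: Int, stages: Seq[Stage]): (Conv, Seq[String]) =
    stages.foldLeft((c, Seq.empty[String])) { case ((cur, ran), stage) =>
      if (tokens(cur) <= budget) (cur, ran)
      else (stage.run(cur), ran :+ stage.name)
    }
}
```

Consider stages that each drop one message, applied to an 8-word conversation with a budget of 4. Only the first two stages run. The LLM and trimming stages are skipped because the conversation already fits.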

Attributes

See also

HistoryCompressor for digest generation that this compressor further shrinks

DeterministicCompressor for the cheaper alternative with no API calls

ContextManager for the orchestrator that chooses when to invoke each step

Supertypes
class Object
trait Matchable
class Any

Members list

Value members

Concrete methods

def squeezeDigest(messages: Seq[Message], tokenCounter: ConversationTokenCounter, llmClient: LLMClient, capTokens: Int): Result[Seq[Message]]

Compresses [HISTORY_SUMMARY] digest messages using an LLM, leaving all other message types (user, assistant, tool) completely untouched.

Compresses [HISTORY_SUMMARY] digest messages using an LLM, leaving all other message types (user, assistant, tool) completely untouched.

If no digest messages are found, or if their combined token count already fits within capTokens, the original messages are returned unchanged with no API call.
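The behaviour described above can be sketched as follows: select the digests, check their combined count against capTokens, and only then call the LLM once per digest. `Msg`, `countTokens` and `summarise` are hypothetical stand-ins for Message, ConversationTokenCounter and LLMClient. Errors are plain strings rather than ContextError. In this sketch the first failed call aborts the whole operation. Whether llm4s behaves identically is an assumption.

```scala
// Hypothetical stand-ins: Msg, countTokens and summarise replace the llm4s
// Message, ConversationTokenCounter and LLMClient types for illustration only.
object SqueezeSketch {
  final case class Msg(role: String, content: String)
  val Marker: String = "[HISTORY_SUMMARY]"

  def squeeze(
      messages: Seq[Msg],
      countTokens: String => Int,                  // tokenizer-calibrated counter
      summarise: String => Either[String, String], // one LLM call; returns replacement content
      capTokens: Int                               // combined budget for all digests
  ): Either[String, Seq[Msg]] = {
    val digestIdx = messages.indices.filter(i => messages(i).content.startsWith(Marker))
    val combined  = digestIdx.map(i => countTokens(messages(i).content)).sum

    if (digestIdx.isEmpty || combined <= capTokens) Right(messages) // unchanged, no LLM call
    else
      // One summarise call per digest, replaced in place; non-digest messages are
      // never touched. After the first Left, flatMap skips the remaining calls.
      digestIdx.foldLeft[Either[String, Seq[Msg]]](Right(messages)) { (acc, i) =>
        acc.flatMap(ms => summarise(ms(i).content).map(s => ms.updated(i, ms(i).copy(content = s))))
      }
  }
}
```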

Value parameters

capTokens

Maximum allowed tokens for all digest messages combined

llmClient

LLM client used to perform the compression inference call

messages

Full conversation message sequence (digests interleaved with others)

tokenCounter

Token counter calibrated to the target model's tokenizer

Attributes

Returns

Compressed messages on success, or an org.llm4s.error.ContextError if the LLM call fails
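Because squeezeDigest returns a Result rather than throwing, the caller decides what a failed compression means. The sketch below assumes Result behaves like an Either with the error on the left. It uses a stand-in error class instead of org.llm4s.error.ContextError. Falling back to the uncompressed messages is one reasonable policy, since Step 4 can still trim to budget, but it is a design choice rather than documented behaviour.

```scala
// Stand-ins: assumes Result[A] behaves like Either[Error, A]; the real error
// hierarchy (including ContextError) lives in org.llm4s.error.
object ResultHandlingSketch {
  final case class StandInContextError(message: String)
  type Result[A] = Either[StandInContextError, A]

  // Use the compressed value if available; otherwise keep the original and
  // surface the error message so the caller can log it.
  def orOriginal[A](original: A, compressed: Result[A]): (A, Option[String]) =
    compressed match {
      case Right(value) => (value, None)
      case Left(err)    => (original, Some(err.message))
    }
}
```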

Deprecated methods

def compress(conversation: Conversation, tokenCounter: ConversationTokenCounter, llmClient: LLMClient, targetBudget: TokenBudget, customPrompt: Option[String]): Result[LLMCompressedConversation]

Attributes

Deprecated
[Since version 0.9.0] Use squeezeDigest for new context management pipeline