LLMCompressor
Applies LLM-powered compression to [HISTORY_SUMMARY] digest messages.
This is Step 3 in the 4-stage context management pipeline. It targets only the structured digest messages produced by HistoryCompressor — it never touches user messages, assistant messages, or tool outputs directly.
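The targeting rule above can be sketched as a filter over the message list. This is a minimal illustration, not the library's implementation: the `Message` case class, the `isDigest` check (marker at the start of the content), and `compressDigestsOnly` are all hypothetical names introduced here, and `compress` stands in for whatever LLM-backed function is actually used.

```scala
object DigestTargeting {
  // Hypothetical message model; the library's real types may differ.
  final case class Message(role: String, content: String)

  // Assumption: a digest is recognised by this marker at the start of its content.
  val DigestMarker = "[HISTORY_SUMMARY]"

  def isDigest(m: Message): Boolean = m.content.startsWith(DigestMarker)

  // Apply `compress` to digest messages only; every other message
  // (user, assistant, tool output) passes through unchanged.
  def compressDigestsOnly(messages: List[Message])(compress: String => String): List[Message] =
    messages.map(m => if (isDigest(m)) m.copy(content = compress(m.content)) else m)
}
```

Passing the compression function as a parameter keeps the sketch runnable without an API client and makes the "digests only" rule easy to test in isolation.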
==When to Use==
Use LLMCompressor (via ContextManager with enableLLMCompression = true) when:
- The conversation is long-running and history digests have grown too large to fit within the remaining token budget after HistoryCompressor has run.
- Preserving semantic fidelity in the digest is important — you cannot afford to simply truncate or drop structured information (IDs, decisions, error codes).
- Extra LLM API calls on each compression event (one per digest message) are acceptable in terms of latency and cost.
==When NOT to Use==
Prefer DeterministicCompressor or HistoryCompressor alone when:
- Latency is critical (e.g., interactive chatbots where each round-trip matters).
- Cost must be minimised: every compression event adds one inference call per digest message.
- The conversation fits within budget after the deterministic steps; ContextManager skips this step automatically in that case.
==Cost==
When the combined size of all [HISTORY_SUMMARY] messages exceeds the cap, squeezeDigest makes one LLM API call per [HISTORY_SUMMARY] message (the cap is a combined budget, not a per-message threshold). Budget for this accordingly.
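As a rough budgeting aid, the rule above can be written as a small function. This is a sketch under stated assumptions: `estimateTokens` is a hypothetical 4-characters-per-token heuristic rather than the library's tokenizer, and `expectedLlmCalls` is an illustrative helper, not part of the documented API.

```scala
object CostSketch {
  // Hypothetical estimator: roughly 4 characters per token, rounded up.
  // The real pipeline may count tokens differently.
  def estimateTokens(text: String): Int = (text.length + 3) / 4

  // Expected LLM calls for one compression event:
  // zero when the combined digest size is within the cap,
  // otherwise one call per [HISTORY_SUMMARY] message.
  def expectedLlmCalls(digests: Seq[String], capTokens: Int): Int = {
    val combined = digests.map(estimateTokens).sum
    if (combined > capTokens) digests.size else 0
  }
}
```

Note that a single oversized digest can push the combined total over the cap, in which case every digest is compressed, including the small ones.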
==Pipeline Position==
Step 1: DeterministicCompressor — free, fast, tool-output focused
Step 2: HistoryCompressor — free, fast, deterministic digest
Step 3: LLMCompressor — 1 LLM call per digest, slower, high quality ← this object
Step 4: TokenWindow.trimToBudget — free, last resort
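The ordering above can be illustrated with a toy model in which each stage shrinks a token count. This is only a sketch: `Stage` and `runPipeline` are hypothetical names, and it assumes every stage is skipped once the context fits the budget, which the documentation states explicitly only for the LLM step.

```scala
object PipelineSketch {
  // Hypothetical stage model: a name and a function from current size to new size (in tokens).
  final case class Stage(name: String, run: Int => Int)

  // Run the stages in order, skipping any stage once the context already fits.
  // Returns the final size and the names of the stages that actually ran.
  def runPipeline(tokens: Int, budget: Int, stages: List[Stage]): (Int, List[String]) =
    stages.foldLeft((tokens, List.empty[String])) { case ((size, ran), stage) =>
      if (size <= budget) (size, ran)
      else (stage.run(size), ran :+ stage.name)
    }
}
```

In this model the expensive LLM stage runs only when the two free stages before it have not brought the context under budget, which matches the intent of placing it third.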
Attributes
- See also
  - HistoryCompressor for digest generation that this compressor further shrinks
  - DeterministicCompressor for the cheaper alternative with no API calls
  - ContextManager for the orchestrator that chooses when to invoke each step
- Supertypes
  - class Object, trait Matchable, class Any
- Self type
  - LLMCompressor.type