Orchestrates the 4-step context management pipeline for llm4s conversations.
ContextManager is the primary entry point for keeping a conversation within a model's token limit. Each step is applied in order of increasing cost; the pipeline exits early as soon as the conversation fits the requested budget.
==Compressor Comparison==
Strategy                | Cost       | Quality | Latency | What it touches
------------------------|------------|---------|---------|------------------------
DeterministicCompressor | Free       | Lower   | Fast    | Tool outputs only
HistoryCompressor       | Free       | Medium  | Fast    | Older history → digest
LLMCompressor           | 1 LLM call | High    | Slow    | Digest messages only
==4-Step Pipeline==
The steps run in the order listed; the pipeline returns as soon as the conversation fits the budget:
- '''ToolDeterministicCompaction''' (DeterministicCompressor): Shrinks and caps tool outputs (JSON, logs, binary content) without modifying user or assistant messages. No API calls; always runs first.
- '''HistoryCompression''' (HistoryCompressor): Keeps the last config.maxSemanticBlocks semantic blocks verbatim and replaces older blocks with compact [HISTORY_SUMMARY] digests, capped to config.summaryTokenTarget tokens. No API calls.
- '''LLMHistorySqueeze''' (LLMCompressor): If still over budget and config.enableLLMCompression is true, compresses only the digest messages further via one LLM inference call per digest.
- '''FinalTokenTrim''' (TokenWindow): Hard-trims to budget tokens (with config.headroomPercent), always pinning [HISTORY_SUMMARY] messages so they are never dropped.
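The early-exit ordering described above can be illustrated with a small, self-contained sketch. Everything in it is a hypothetical stand-in rather than the llm4s implementation: Msg, the word-count token counter, and the three simplified steps are invented for illustration, and the headroom formula (reserving a fraction of the budget) is an assumption.

```scala
// Hypothetical, simplified model of the pipeline; not the llm4s types.
object PipelineSketch {
  final case class Msg(role: String, content: String)
  type Step = Seq[Msg] => Seq[Msg]

  val SummaryTag = "[HISTORY_SUMMARY]"

  // Stand-in token counter: one token per whitespace-separated word.
  def tokens(msgs: Seq[Msg]): Int =
    msgs.map(_.content.split("\\s+").count(_.nonEmpty)).sum

  // Step 1 stand-in: cap tool outputs, leave user/assistant messages untouched.
  def capToolOutputs(maxWords: Int): Step = _.map { m =>
    if (m.role == "tool") m.copy(content = m.content.split("\\s+").take(maxWords).mkString(" "))
    else m
  }

  // Step 2 stand-in: keep the last `keepLast` messages, fold the rest into one digest.
  def digestOlder(keepLast: Int, summaryWords: Int): Step = { msgs =>
    val (older, recent) = msgs.splitAt(math.max(0, msgs.length - keepLast))
    if (older.isEmpty) msgs
    else {
      val words = older.flatMap(_.content.split("\\s+")).filter(_.nonEmpty).take(summaryWords)
      Msg("assistant", (SummaryTag +: words).mkString(" ")) +: recent
    }
  }

  // Step 4 stand-in: drop the oldest unpinned messages until the target is met.
  // ASSUMPTION: headroom reserves a fraction of the budget; the real formula may differ.
  def finalTrim(budget: Int, headroom: Double): Step = { msgs =>
    val target = (budget * (1.0 - headroom)).toInt
    def pinned(m: Msg): Boolean = m.content.startsWith(SummaryTag)
    @annotation.tailrec
    def loop(cur: Seq[Msg]): Seq[Msg] =
      if (tokens(cur) <= target) cur
      else cur.indexWhere(m => !pinned(m)) match {
        case -1 => cur // only pinned summaries remain
        case i  => loop(cur.patch(i, Nil, 1))
      }
    loop(msgs)
  }

  def defaultSteps(budget: Int): Vector[(String, Step)] = Vector(
    "cap"    -> capToolOutputs(4),
    "digest" -> digestOlder(keepLast = 1, summaryWords = 2),
    "trim"   -> finalTrim(budget, headroom = 0.0)
  )

  // Runs steps in order; once the budget is met, every remaining step is skipped.
  // Returns the managed messages and the names of the steps that actually ran.
  def run(msgs: Seq[Msg], budget: Int): (Seq[Msg], Vector[String]) =
    defaultSteps(budget).foldLeft((msgs, Vector.empty[String])) {
      case ((current, applied), (name, step)) =>
        if (tokens(current) <= budget) (current, applied)
        else (step(current), applied :+ name)
    }
}
```

The second element of the result makes the early exit visible: a generous budget yields an empty list of applied steps, while a tight one runs all of them and leaves only the pinned summary.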
==Usage==
// Quick setup with defaults:
val manager = ContextManager.withDefaults(tokenCounter).getOrElse(???)
val result = manager.manageContext(conversation, budget = 8000)
result.foreach(managed => println(managed.summary))
// With an LLM client for Step 3:
val llmManager = ContextManager.create(tokenCounter, ContextConfig.default, Some(llmClient))
  .getOrElse(???)
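The ??? placeholders in the usage above throw scala.NotImplementedError if construction fails. A minimal sketch of handling the failure explicitly follows, assuming the factory returns an Either-like result (the getOrElse and foreach calls suggest this, but the exact type is not shown on this page). FakeManager and build are hypothetical stand-ins, not llm4s names.

```scala
object ResultHandlingSketch {
  // Hypothetical stand-ins; not the llm4s types.
  final case class FakeManager(name: String)

  // Pretend factory: succeeds or reports why it could not build a manager.
  def build(ok: Boolean): Either[String, FakeManager] =
    if (ok) Right(FakeManager("default")) else Left("invalid configuration")

  // Turn both outcomes into a message instead of throwing.
  def describe(result: Either[String, FakeManager]): String =
    result.fold(
      err => s"could not create manager: $err",
      m   => s"created manager ${m.name}"
    )
}
```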
Value parameters
- artifactStore: Optional store for externalized binary/large content from Step 1; defaults to an in-memory store if None
- config: Pipeline configuration; controls headroom, semantic block count, and which steps are enabled
- llmClient: Optional LLM client; required for Step 3 (LLMHistorySqueeze); Step 3 is skipped if None
- tokenCounter: Token counter calibrated to the target model's tokenizer
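How the two optional parameters behave when they are None can be sketched as follows. The traits, the Wiring class, and the method names are all hypothetical, chosen only to illustrate the defaulting and skipping rules stated above; the real constructor and interfaces may look different.

```scala
object OptionalDepsSketch {
  // Hypothetical stand-ins for the real store and client interfaces.
  trait ArtifactStore {
    def put(key: String, value: String): Unit
    def get(key: String): Option[String]
  }

  final class InMemoryStore extends ArtifactStore {
    private val data = scala.collection.mutable.Map.empty[String, String]
    def put(key: String, value: String): Unit = data.update(key, value)
    def get(key: String): Option[String] = data.get(key)
  }

  trait LLMClient { def complete(prompt: String): String }

  final class Wiring(
      artifactStore: Option[ArtifactStore],
      llmClient: Option[LLMClient],
      enableLLMCompression: Boolean
  ) {
    // None falls back to an in-memory store.
    val store: ArtifactStore = artifactStore.getOrElse(new InMemoryStore)

    // Step 3 stand-in: runs only when a client is present and the flag is on;
    // otherwise the digest passes through unchanged.
    def squeezeDigest(digest: String): String =
      llmClient match {
        case Some(client) if enableLLMCompression => client.complete(digest)
        case _                                    => digest
      }
  }
}
```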
Attributes
- See also
  - DeterministicCompressor for Step 1 implementation
  - HistoryCompressor for Step 2 implementation
  - LLMCompressor for Step 3 implementation
  - TokenWindow for Step 4 implementation
  - ContextConfig for all configuration options
- Companion: object
- Supertypes: class Object, trait Matchable, class Any