org.llm4s.context
Type members
Classlikes
Storage interface for externalized content.
When tool outputs exceed the externalization threshold, they are stored in an ArtifactStore and replaced with content pointers. This allows the conversation to reference the content without including it inline.
==Content-Addressed Storage==
Content is stored using content-addressed keys (based on content hash), which enables:
- Deduplication of identical outputs
- Efficient retrieval without scanning
- Immutable content guarantees
==Implementations==
Use ArtifactStore.inMemory for testing and short-lived sessions. For production, implement with persistent storage (database, S3, etc.).
Attributes
- Companion: object
- Supertypes: class Object, trait Matchable, class Any
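The description above can be sketched in code. This is a minimal illustration only: `put` and `get` are assumed method names, not verified ArtifactStore signatures; only `ArtifactStore.inMemory` is named in this documentation.

```scala
// Hypothetical usage sketch; method names `put` and `get` are assumptions.
val store = ArtifactStore.inMemory()

// Storing returns a content-addressed key (a hash of the content),
// so storing an identical output twice deduplicates to one entry.
val key = store.put(largeToolOutput)

// The conversation keeps only a pointer; content is fetched on demand.
val restored = store.get(key)
```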
Factory methods for ArtifactStore implementations.
Attributes
- Companion: trait
- Supertypes: class Object, trait Matchable, class Any
- Self type: ArtifactStore.type
Represents a named compression rule that transforms a sequence of messages.
Rules are applied sequentially by DeterministicCompressor until the token budget is met. Each rule is designed to be:
- '''Idempotent''': Applying twice produces the same result
- '''Safe''': Never removes critical semantic information
- '''Measurable''': Token reduction can be calculated
Value parameters
- apply: The transformation function
- name: Identifier for logging and debugging
Attributes
- Companion: object
- Supertypes: trait Serializable, trait Product, trait Equals, class Object, trait Matchable, class Any
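Given the `name` and `apply` parameters above, a rule might be defined roughly as follows. The `Message` accessors used here (`content`, `withContent`) are placeholders for illustration, not confirmed llm4s API.

```scala
// Sketch of defining a CompressionRule; the exact Message API is an assumption.
val removeRedundantPhrases = CompressionRule(
  name = "removeRedundantPhrases",
  apply = (messages: Seq[Message]) =>
    messages.map { m =>
      // Idempotent: re-running the same regex removal changes nothing.
      m.withContent(m.content.replaceAll("(?i)\\b(as I mentioned before|like I said),?\\s*", ""))
    }
)
```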
Built-in compression rules for common scenarios.
==Available Rules==
- '''removeRedundantPhrases''': Removes "as I mentioned before", "like I said"
- '''compressRepetitiveContent''': Deduplicates repeated sentences with "×N" markers
- '''truncateVerboseResponses''': Summarizes very long assistant responses
- '''removeFillerWords''': Cleans "um", "well", "you know" from transcripts
- '''consolidateExamples''': Merges multiple example messages
- '''compressToolOutputs''': Compresses large tool outputs (see ToolOutputCompressor)
Attributes
- Companion: class
- Supertypes: trait Product, trait Mirror, class Object, trait Matchable, class Any
- Self type: CompressionRule.type
Configuration for context management pipeline
Attributes
- Companion: object
- Supertypes: trait Serializable, trait Product, trait Equals, class Object, trait Matchable, class Any
Attributes
- Companion: class
- Supertypes: trait Product, trait Mirror, class Object, trait Matchable, class Any
- Self type: ContextConfig.type
Orchestrates a 4-step context management pipeline (early-exit if budget is met at any step):
- '''ToolDeterministicCompaction''': Run DeterministicCompressor.compressToCap() with compressToolOutputs first (subjective edits OFF). Goal: shrink/cap tool outputs (JSON/logs/binary) without touching user/assistant text.
- '''HistoryCompression''': Run HistoryCompressor.compressToDigest(...): keep the last K semantic blocks as-is (K = config.maxSemanticBlocks); replace older blocks with a deterministic [HISTORY_SUMMARY] digest capped to config.summaryTokenTarget.
- '''LLMHistorySqueeze''': If still over budget AND the LLM is enabled, compress the digest string only, down to summaryTokenTarget, via LLMCompressor.squeezeDigest(). No whole-conversation LLM compression.
- '''FinalTokenTrim''': TokenWindow.trimToBudget() with headroom. Pin [HISTORY_SUMMARY] so it's never dropped; pack remaining messages newest-first.
Attributes
- Companion: object
- Supertypes: class Object, trait Matchable, class Any
Attributes
- Companion: class
- Supertypes: class Object, trait Matchable, class Any
- Self type: ContextManager.type
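The four steps and their early-exit behaviour can be paraphrased as the sketch below. The step function names come from the step descriptions above, but the parameter lists and the `withinBudget` helper are assumptions for illustration, not the actual ContextManager source.

```scala
// Illustrative early-exit pipeline; signatures are assumptions.
def manageContext(messages: Seq[Message]): Seq[Message] = {
  def withinBudget(ms: Seq[Message]): Boolean = // hypothetical helper
    counter.countConversation(Conversation(ms)) <= budget

  // Step 1: cap tool outputs only (subjective edits OFF)
  val step1 = DeterministicCompressor.compressToCap(
    messages, counter, budget, enableSubjectiveEdits = false)
  if (withinBudget(step1)) step1
  else {
    // Step 2: digest older blocks, keep last K verbatim
    val step2 = HistoryCompressor.compressToDigest(
      step1, counter, config.summaryTokenTarget, config.maxSemanticBlocks)
    if (withinBudget(step2)) step2
    // Steps 3-4: LLM digest squeeze (if enabled), then final token trim
    else TokenWindow.trimToBudget(step2, counter, budget)
  }
}
```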
Represents a single step in the context management pipeline
Attributes
- Companion: object
- Supertypes: trait Serializable, trait Product, trait Equals, class Object, trait Matchable, class Any
Attributes
- Companion: class
- Supertypes: trait Product, trait Mirror, class Object, trait Matchable, class Any
- Self type: ContextStep.type
Counts tokens in conversations and messages using configurable tokenizers. Provides accurate token counting for context management and budget planning.
Token counting is essential for:
- Ensuring conversations fit within model context windows
- Budget planning for API costs (many providers charge per token)
- Context compression decisions in ContextManager
The counter applies fixed overheads to account for special tokens:
- Message overhead: 4 tokens per message (role markers, delimiters)
- Tool call overhead: 10 tokens per tool call (function markers)
- Conversation overhead: 10 tokens (conversation framing)
Attributes
- See also
  - ConversationTokenCounter.forModel for model-aware counter creation
  - TokenBreakdown for detailed per-message token analysis
- Example
  {{{
  val counter = ConversationTokenCounter.forModel("gpt-4o").getOrElse(???)
  val tokens  = counter.countConversation(conversation)
  println(s"Conversation uses $tokens tokens")
  }}}
- Companion: object
- Supertypes: class Object, trait Matchable, class Any
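With the fixed overheads listed above, a rough budget estimate can be computed by hand. This is illustrative arithmetic only, not the library's internal formula.

```scala
// Assume the tokenizer counted 1200 tokens of message content across
// 10 messages, 2 of which carry a tool call.
val contentTokens        = 1200
val messageOverhead      = 10 * 4  // 4 tokens per message
val toolCallOverhead     = 2 * 10  // 10 tokens per tool call
val conversationOverhead = 10      // fixed conversation framing cost
val estimate = contentTokens + messageOverhead + toolCallOverhead + conversationOverhead
// 1200 + 40 + 20 + 10 = 1270 tokens
```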
Factory methods for creating ConversationTokenCounter instances.
Provides model-aware counter creation that automatically selects the appropriate tokenizer based on the model name. Supports OpenAI, Anthropic, Azure, and Ollama models.
==Tokenizer Selection==
Different models use different tokenization schemes:
- '''GPT-4o, o1''': Uses the o200k_base tokenizer
- '''GPT-4, GPT-3.5''': Uses the cl100k_base tokenizer
- '''Claude models''': Uses a cl100k_base approximation (may differ 20-30%)
- '''Ollama models''': Uses a cl100k_base approximation
Attributes
- Example
  {{{
  // Model-aware creation (recommended)
  val counter = ConversationTokenCounter.forModel("openai/gpt-4o")

  // Direct tokenizer selection
  val openAICounter = ConversationTokenCounter.openAI()
  val gpt4oCounter  = ConversationTokenCounter.openAI_o200k()
  }}}
- Companion: class
- Supertypes: class Object, trait Matchable, class Any
- Self type: ConversationTokenCounter.type
Result of token window processing
Attributes
- Companion: object
- Supertypes: trait Serializable, trait Product, trait Equals, class Object, trait Matchable, class Any
Attributes
- Companion: class
- Supertypes: trait Product, trait Mirror, class Object, trait Matchable, class Any
- Self type: ConversationWindow.type
Implements rule-based deterministic compression for conversation context.
This compressor applies predictable, reproducible transformations to reduce token usage while preserving semantic meaning. Unlike LLM-based compression, the output is deterministic and doesn't require API calls.
==Compression Pipeline==
Compression occurs in two phases:
- '''Tool Compaction''' (always applied):
  - Compresses large JSON/YAML tool outputs
  - Externalizes binary content
  - Truncates verbose logs and error traces
- '''Subjective Edits''' (optional, requires enableSubjectiveEdits):
  - Removes filler words from transcript-like content
  - Deduplicates repetitive sentences
  - Truncates overly verbose assistant responses
==Safety Guarantees==
The compressor is designed to be '''safe''' and '''conservative''':
- User messages are '''never''' modified
- Code blocks and JSON are preserved verbatim
- Filler word removal only applies to "transcript-like" content
- Truncation preserves first and last sentences
Attributes
- See also
  - ToolOutputCompressor for the tool compaction implementation
  - CompressionRule for individual compression rules
- Example
  {{{
  val compressed = DeterministicCompressor.compressToCap(
    messages = conversation.messages,
    tokenCounter = counter,
    capTokens = 4000,
    enableSubjectiveEdits = true
  )
  }}}
- Supertypes: class Object, trait Matchable, class Any
- Self type: DeterministicCompressor.type
Deterministic history compression using structured digest extraction.
This compressor creates compact [HISTORY_SUMMARY] digests from older conversation blocks, preserving recent context verbatim while summarizing history. The digests extract key structured information that's likely to be referenced later.
==Compression Strategy==
The compressor uses a "keep last K" strategy:
- '''Group''' messages into semantic blocks (user-assistant pairs)
- '''Keep''' the last K blocks verbatim (recent context)
- '''Digest''' older blocks into [HISTORY_SUMMARY] messages
- '''Consolidate''' if total digest size exceeds the cap
==Information Extraction==
Each digest extracts structured information using regex patterns:
- '''Identifiers''': IDs, UUIDs, keys, references
- '''URLs''': HTTP/HTTPS links
- '''Constraints''': Must/should/cannot requirements
- '''Status Codes''': HTTP status codes, error codes
- '''Errors''': Error messages and exceptions
- '''Decisions''': "decided", "chosen", "selected" statements
- '''Tool Usage''': Function/API call mentions
- '''Outcomes''': Results, conclusions, completions
==Idempotency==
The compressor is '''idempotent''': if messages already contain [HISTORY_SUMMARY] markers, they are returned unchanged. This allows safe re-application.
Attributes
- See also
  - SemanticBlocks for the block grouping algorithm
  - StructuredInfo for the extracted information types
- Example
  {{{
  val compressed = HistoryCompressor.compressToDigest(
    messages = conversation.messages,
    tokenCounter = counter,
    capTokens = 400, // Max tokens for digests
    keepLastK = 3    // Keep last 3 blocks verbatim
  )
  }}}
- Supertypes: class Object, trait Matchable, class Any
- Self type: HistoryCompressor.type
Compressed digest representation of a history block.
Contains the formatted summary text and metadata about the original block.
Value parameters
- blockId: Original semantic block ID
- blockType: Type of the block (UserAssistantPair, StandaloneTool, etc.)
- content: Formatted digest text for inclusion in conversation
- originalTokens: Estimated token count of original content
Attributes
- Supertypes: trait Serializable, trait Product, trait Equals, class Object, trait Matchable, class Any
Result of LLM-powered compression process
Attributes
- Supertypes: trait Serializable, trait Product, trait Equals, class Object, trait Matchable, class Any
Implements digest-only compression for [HISTORY_SUMMARY] messages. Replaces full LLM-powered compression with targeted digest compression.
This is step 3 in the new 4-stage context management pipeline:
- Tool deterministic compaction
- History compression
- LLM digest squeeze (this module)
- Final token trim
Attributes
- Supertypes: class Object, trait Matchable, class Any
- Self type: LLMCompressor.type
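Since only the digest is sent to the LLM, a call might look roughly like the sketch below. `squeezeDigest` is named in the ContextManager pipeline description, but the parameter names here are assumptions.

```scala
// Hypothetical invocation of the digest-only squeeze (step 3).
// Only the [HISTORY_SUMMARY] text is compressed; the rest of the
// conversation is never sent to the LLM at this stage.
val squeezed = LLMCompressor.squeezeDigest(
  digest = historySummaryText,             // the [HISTORY_SUMMARY] content
  targetTokens = config.summaryTokenTarget // assumed parameter name
)
```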
Final result of context management pipeline
Attributes
- Supertypes: trait Serializable, trait Product, trait Equals, class Object, trait Matchable, class Any
Token information for a single message
Attributes
- Supertypes: trait Serializable, trait Product, trait Equals, class Object, trait Matchable, class Any
Represents a single step in the new 4-stage context management pipeline
Attributes
- Supertypes: trait Serializable, trait Product, trait Equals, class Object, trait Matchable, class Any
Represents a semantic block of related messages in a conversation.
A semantic block groups logically related messages together, typically a user-assistant exchange with any associated tool calls.
Value parameters
- blockType: The classification of this block (pair, standalone, etc.)
- expectingAssistantResponse: True if the block is incomplete (awaiting response)
- id: Unique identifier for this block
- messages: The messages contained in this block
Attributes
- Companion: object
- Supertypes: trait Serializable, trait Product, trait Equals, class Object, trait Matchable, class Any
Attributes
- Companion: class
- Supertypes: trait Product, trait Mirror, class Object, trait Matchable, class Any
- Self type: SemanticBlock.type
Classification of semantic block types.
Block types help compression algorithms make decisions about how to handle different conversation patterns:
- '''UserAssistantPair''': Complete conversation turn, can be summarized
- '''StandaloneAssistant''': Isolated response, preserve carefully
- '''StandaloneTool''': Tool output without context, may need special handling
- '''Other''': Unclassified, treat conservatively
Attributes
- Companion: object
- Supertypes: class Object, trait Matchable, class Any
- Known subtypes: UserAssistantPair, StandaloneAssistant, StandaloneTool, Other
Attributes
- Companion: trait
- Supertypes: trait Sum, trait Mirror, class Object, trait Matchable, class Any
- Self type: SemanticBlockType.type
Groups messages into semantic blocks for context compression and history management.
==Semantic Block Concept==
A semantic block represents a logically related group of messages in a conversation. The primary patterns are:
- '''User-Assistant Pairs''': A user question followed by an assistant response. These form the natural "turns" of a conversation.
- '''Tool Interactions''': Tool calls and their results, often associated with an assistant message that triggered them.
- '''Standalone Messages''': Messages that don't fit the pair pattern (e.g., system messages, isolated assistant responses).
==Algorithm==
The grouping algorithm uses a tail-recursive state machine:
- '''UserMessage''': Starts a new block expecting an assistant response
- '''AssistantMessage''': Completes a user block, or becomes standalone
- '''ToolMessage''': Attaches to the current block or becomes standalone
- '''SystemMessage''': Treated similarly to assistant (can complete blocks)
Attributes
- See also
  - HistoryCompressor which uses semantic blocks for history compression
  - SemanticBlockType for the classification of block types
- Example
  {{{
  val messages = Seq(
    UserMessage("What's the weather?"),
    AssistantMessage("I'll check for you..."),
    ToolMessage("""{"temp": 72}""", "call_1"),
    AssistantMessage("It's 72 degrees.")
  )
  val blocks = SemanticBlocks.groupIntoSemanticBlocks(messages)
  // Result: One UserAssistantPair block containing all 4 messages
  }}}
- Supertypes: class Object, trait Matchable, class Any
- Self type: SemanticBlocks.type
Structured information extracted from a message block.
Each field contains strings matched by the corresponding regex pattern in HistoryCompressor. Matches are limited to prevent digest bloat.
Value parameters
- constraints: Requirement statements (must, should, cannot)
- decisions: Decision statements
- errors: Error messages and exception info
- identifiers: IDs, UUIDs, keys found in the content
- outcomes: Result and conclusion statements
- statusCodes: HTTP status codes, error codes
- toolUsage: Tool/function/API call mentions
- urls: HTTP/HTTPS URLs
Attributes
- Supertypes: trait Serializable, trait Product, trait Equals, class Object, trait Matchable, class Any
Detailed breakdown of token usage in a conversation
Attributes
- Supertypes: trait Serializable, trait Product, trait Equals, class Object, trait Matchable, class Any
Token usage information for monitoring and debugging
Attributes
- Supertypes: trait Serializable, trait Product, trait Equals, class Object, trait Matchable, class Any
Manages conversation token windows by trimming conversations to fit within token budgets. Always preserves system messages and applies configurable headroom for safety.
Attributes
- Supertypes: class Object, trait Matchable, class Any
- Self type: TokenWindow.type
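Based on the pipeline description ("TokenWindow.trimToBudget() with headroom"), usage might look like the sketch below; the parameter names are assumptions for illustration.

```scala
// Hypothetical call; trimToBudget is named in the ContextManager docs,
// but this signature is an assumption.
val window = TokenWindow.trimToBudget(
  messages = conversation.messages,
  tokenCounter = counter,
  budget = 8000 // system messages are always preserved; headroom applied
)
```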
Handles intelligent compression and externalization of tool outputs.
Tool outputs (from function calls, API responses, file reads) can be very large and quickly consume the context window. This compressor applies content-aware strategies to reduce their size while preserving essential information.
==Content Type Detection==
The compressor automatically detects content types:
- '''JSON/YAML''': Removes null values, empty strings, truncates large arrays
- '''Logs''': Keeps head/tail, collapses repeated lines
- '''Errors''': Preserves error message and top stack frames
- '''Binary''': Replaces with placeholder (always externalized)
- '''Text''': Generic word-based truncation
==Size Thresholds==
Content is processed based on size:
- '''< 2KB''': Kept as-is (no compression)
- '''2KB - 8KB''': Inline compression (type-specific)
- '''> 8KB''': Externalized to ArtifactStore with content pointer
==Externalization==
Large content is stored in an ArtifactStore and replaced with a pointer:
[EXTERNALIZED: abc123... | JSON | JSON object with 42 fields, 15234 bytes]
The original content can be retrieved using the artifact key.
Attributes
- See also
  - ArtifactStore for content storage interface
  - DeterministicCompressor which uses this for the tool compaction phase
- Supertypes: class Object, trait Matchable, class Any
- Self type: ToolOutputCompressor.type
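The size thresholds above amount to a three-way dispatch, restated here as a standalone sketch. The real compressor also branches on the detected content type; this function only mirrors the documented byte cut-offs.

```scala
// Re-statement of the documented 2KB/8KB thresholds, for illustration only.
def sizeStrategy(bytes: Int): String =
  if (bytes < 2 * 1024) "keep as-is (no compression)"
  else if (bytes <= 8 * 1024) "inline, type-specific compression"
  else "externalize to ArtifactStore, leave an [EXTERNALIZED: ...] pointer"
```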