org.llm4s.context

Storage interface for externalized content.

When tool outputs exceed the externalization threshold, they are stored in an ArtifactStore and replaced with content pointers. This allows the conversation to reference the content without including it inline.

==Content-Addressed Storage==

Content is stored using content-addressed keys (based on content hash), which enables:

Deduplication of identical outputs
Efficient retrieval without scanning
Immutable content guarantees
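
The properties listed above can be illustrated with a minimal sketch. The class name SketchContentStore, its methods, and the choice of SHA-256 are assumptions made for illustration; they are not the ArtifactStore API.

```scala
import java.nio.charset.StandardCharsets
import java.security.MessageDigest
import scala.collection.mutable

// Hypothetical in-memory content-addressed store (not the llm4s ArtifactStore API).
final class SketchContentStore {
  private val blobs = mutable.Map.empty[String, String]

  // The key is the hex-encoded SHA-256 of the content (hash choice is an assumption).
  private def keyFor(content: String): String =
    MessageDigest
      .getInstance("SHA-256")
      .digest(content.getBytes(StandardCharsets.UTF_8))
      .map(b => "%02x".format(b))
      .mkString

  // Storing identical content twice yields the same key and a single entry.
  def put(content: String): String = {
    val key = keyFor(content)
    blobs.getOrElseUpdate(key, content)
    key
  }

  // Lookup is a direct map access; stored content is never scanned.
  def get(key: String): Option[String] = blobs.get(key)

  def size: Int = blobs.size
}
```

Because the key is derived from the content, a key can never point at different content later, which is where the immutability guarantee comes from.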

==Implementations==

Use ArtifactStore.inMemory for testing and short-lived sessions. For production, implement with persistent storage (database, S3, etc.).

Attributes

Companion: object
Supertypes: class Object

trait Matchable

class Any

Factory methods for ArtifactStore implementations.

Attributes

Companion: trait
Supertypes: class Object

trait Matchable

class Any
Self type: ArtifactStore.type

Represents a named compression rule that transforms a sequence of messages.

Rules are applied sequentially by DeterministicCompressor until the token budget is met. Each rule is designed to be:

'''Idempotent''': Applying twice produces the same result
'''Safe''': Never removes critical semantic information
'''Measurable''': Token reduction can be calculated
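
A simplified sketch of these properties, assuming messages are plain strings and tokens are approximated by word count. SketchRule, its transform field, and roughTokens are hypothetical names, not the CompressionRule API.

```scala
// Hypothetical simplified rule over plain strings; the real CompressionRule
// transforms a sequence of messages.
final case class SketchRule(name: String, transform: Seq[String] => Seq[String])

object SketchRules {
  // Drops two redundant lead-in phrases. Once removed there is nothing left
  // to match, so a second application returns the same result.
  val dropRedundantPhrases: SketchRule = SketchRule(
    "dropRedundantPhrases",
    _.map(_.replaceAll("(?i)\\b(as I mentioned before|like I said),?\\s*", "").trim)
  )

  // Crude stand-in for a token counter: whitespace-separated words.
  def roughTokens(msgs: Seq[String]): Int =
    msgs.map(_.split("\\s+").count(_.nonEmpty)).sum

  // "Measurable": the reduction is the difference in counts before and after.
  def reduction(rule: SketchRule, msgs: Seq[String]): Int =
    roughTokens(msgs) - roughTokens(rule.transform(msgs))
}
```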

Value parameters

apply: The transformation function
name: Identifier for logging and debugging

Attributes

Companion: object
Supertypes: trait Serializable

trait Product

trait Equals

class Object

trait Matchable

class Any

Built-in compression rules for common scenarios.

==Available Rules==

'''removeRedundantPhrases''': Removes "as I mentioned before", "like I said"
'''compressRepetitiveContent''': Deduplicates repeated sentences with "×N" markers
'''truncateVerboseResponses''': Summarizes very long assistant responses
'''removeFillerWords''': Cleans "um", "well", "you know" from transcripts
'''consolidateExamples''': Merges multiple example messages
'''compressToolOutputs''': Compresses large tool outputs (see ToolOutputCompressor)

Attributes

Companion: class
Supertypes: trait Product

trait Mirror

class Object

trait Matchable

class Any
Self type: CompressionRule.type

Configuration for context management pipeline

Attributes

Companion: object
Supertypes: trait Serializable

trait Product

trait Equals

class Object

trait Matchable

class Any

Attributes

Companion: class
Supertypes: trait Product

trait Mirror

class Object

trait Matchable

class Any
Self type: ContextConfig.type

Orchestrates a 4-step context management pipeline (early-exit if budget is met at any step):

ToolDeterministicCompaction

Run DeterministicCompressor.compressToCap() with compressToolOutputs first (subjective edits OFF).
Goal: shrink/cap tool outputs (JSON/logs/binary) without touching user/assistant text.

HistoryCompression

Run HistoryCompressor.compressToDigest(...):
• Keep the last K semantic blocks as is (K = config.maxSemanticBlocks).
• Replace older blocks with a deterministic [HISTORY_SUMMARY] digest capped to config.summaryTokenTarget.

LLMHistorySqueeze

If still over budget AND LLM enabled:
• Compress only the digest string to summaryTokenTarget via LLMCompressor.squeezeDigest().
• No whole-conversation LLM compression.

FinalTokenTrim

TokenWindow.trimToBudget() with headroom.
Pin [HISTORY_SUMMARY] so it’s never dropped; pack remaining messages newest-first.
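
The early-exit behaviour can be sketched as a fold over the steps. This is a minimal illustration under stated assumptions: messages are plain strings, tokens are approximated by word count, and Step, roughTokens, and runPipeline are hypothetical names rather than ContextManager members.

```scala
object PipelineSketch {
  // Hypothetical step: a name plus a transformation over plain-string messages.
  final case class Step(name: String, run: Seq[String] => Seq[String])

  // Crude word-count stand-in for a real token counter.
  def roughTokens(msgs: Seq[String]): Int =
    msgs.map(_.split("\\s+").count(_.nonEmpty)).sum

  // Runs steps in order; once the budget is met, the remaining steps are skipped.
  // Returns the final messages and the names of the steps that actually ran.
  def runPipeline(steps: Seq[Step], msgs: Seq[String], budget: Int): (Seq[String], Seq[String]) =
    steps.foldLeft((msgs, Seq.empty[String])) {
      case ((current, ran), step) =>
        if (roughTokens(current) <= budget) (current, ran)
        else (step.run(current), ran :+ step.name)
    }
}
```

Returning the names of the steps that ran makes it easy to log which stage brought the conversation under budget.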

Attributes

Companion: object
Supertypes: class Object

trait Matchable

class Any

Attributes

Companion: class
Supertypes: class Object

trait Matchable

class Any
Self type: ContextManager.type

Represents a single step in the context management pipeline

Attributes

Companion: object
Supertypes: trait Serializable

trait Product

trait Equals

class Object

trait Matchable

class Any

Attributes

Companion: class
Supertypes: trait Product

trait Mirror

class Object

trait Matchable

class Any
Self type: ContextStep.type

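Counts tokens in conversations and messages using configurable tokenizers. Provides token counts for context management and budget planning; counts are approximate when the tokenizer does not match the model (see Tokenizer Selection on the companion object).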

Token counting is essential for:

Ensuring conversations fit within model context windows
Budget planning for API costs (many providers charge per token)
Context compression decisions in ContextManager

The counter applies fixed overheads to account for special tokens:

Message overhead: 4 tokens per message (role markers, delimiters)
Tool call overhead: 10 tokens per tool call (function markers)
Conversation overhead: 10 tokens (conversation framing)
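
As a worked example of the arithmetic, the sketch below combines the three overheads additively. The additive combination, the Msg shape, and the function names are assumptions; only the constants come from the list above.

```scala
object OverheadSketch {
  // Overhead constants as listed above.
  val MessageOverhead = 4
  val ToolCallOverhead = 10
  val ConversationOverhead = 10

  // Hypothetical per-message summary: content tokens as reported by a
  // tokenizer, plus the number of tool calls attached to the message.
  final case class Msg(contentTokens: Int, toolCalls: Int = 0)

  def messageTokens(m: Msg): Int =
    m.contentTokens + MessageOverhead + m.toolCalls * ToolCallOverhead

  def conversationTokens(msgs: Seq[Msg]): Int =
    ConversationOverhead + msgs.map(messageTokens).sum
}
```

For three messages with 12, 30 (one tool call), and 8 content tokens, this gives (12 + 4) + (30 + 4 + 10) + (8 + 4) + 10 = 82 tokens.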

Attributes

See also

ConversationTokenCounter.forModel for model-aware counter creation

TokenBreakdown for detailed per-message token analysis

Example

// ??? is a placeholder: supply a fallback counter or handle the failure explicitly
val counter = ConversationTokenCounter.forModel("gpt-4o").getOrElse(???)
val tokens = counter.countConversation(conversation)
println(s"Conversation uses $tokens tokens")

Companion

object

Supertypes

class Object

trait Matchable

class Any

Factory methods for creating ConversationTokenCounter instances.

Provides model-aware counter creation that automatically selects the appropriate tokenizer based on the model name. Supports OpenAI, Anthropic, Azure, and Ollama models.

==Tokenizer Selection==

Different models use different tokenization schemes:

'''GPT-4o, o1''': Uses o200k_base tokenizer
'''GPT-4, GPT-3.5''': Uses cl100k_base tokenizer
'''Claude models''': Uses cl100k_base approximation (counts may differ from the model's actual token usage by 20-30%)
'''Ollama models''': Uses cl100k_base approximation
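
The mapping above can be sketched as a simple lookup. The prefix matching and the handling of a "provider/" prefix are assumptions; the actual selection logic in forModel may differ.

```scala
object TokenizerChoiceSketch {
  // Picks an encoding name from a model name, following the list above.
  // Hypothetical helper, not part of ConversationTokenCounter.
  def encodingFor(model: String): String = {
    // Drop an optional "provider/" prefix such as "openai/".
    val name = model.toLowerCase.split('/').last
    if (name.startsWith("gpt-4o") || name.startsWith("o1")) "o200k_base"
    else "cl100k_base" // GPT-4, GPT-3.5, and the Claude/Ollama approximation
  }
}
```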

Attributes

Example

// Model-aware creation (recommended)
val counter = ConversationTokenCounter.forModel("openai/gpt-4o")
// Direct tokenizer selection
val openAICounter = ConversationTokenCounter.openAI()
val gpt4oCounter = ConversationTokenCounter.openAI_o200k()

Companion

class

Supertypes

class Object

trait Matchable

class Any

Self type

ConversationTokenCounter.type

Result of token window processing

Attributes

Companion: object
Supertypes: trait Serializable

trait Product

trait Equals

class Object

trait Matchable

class Any

Attributes

Companion: class
Supertypes: trait Product

trait Mirror

class Object

trait Matchable

class Any
Self type: ConversationWindow.type

Implements rule-based deterministic compression for conversation context.

This compressor applies predictable, reproducible transformations to reduce token usage while preserving semantic meaning. Unlike LLM-based compression, the output is deterministic and doesn't require API calls.

==Compression Pipeline==

Compression occurs in two phases:

'''Tool Compaction''' (always applied):

Compresses large JSON/YAML tool outputs
Externalizes binary content
Truncates verbose logs and error traces

'''Subjective Edits''' (optional, requires enableSubjectiveEdits):

Removes filler words from transcript-like content
Deduplicates repetitive sentences
Truncates overly verbose assistant responses

==Safety Guarantees==

The compressor is designed to be '''safe''' and '''conservative''':

User messages are '''never''' modified
Code blocks and JSON are preserved verbatim
Filler word removal only applies to "transcript-like" content
Truncation preserves first and last sentences
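
The last guarantee can be sketched as follows. The naive sentence splitter, the "[...]" marker, and the maxSentences threshold are assumptions chosen for illustration, not the compressor's actual behaviour.

```scala
object TruncateSketch {
  // Splits after sentence-ending punctuation followed by whitespace (naive).
  def sentences(text: String): Vector[String] =
    text.trim.split("(?<=[.!?])\\s+").toVector.filter(_.nonEmpty)

  // Keeps the first and last sentence and marks the omitted middle.
  // Text with at most maxSentences sentences is returned unchanged.
  def truncateKeepingEnds(text: String, maxSentences: Int = 2): String = {
    val parts = sentences(text)
    if (parts.length <= maxSentences) text
    else s"${parts.head} [...] ${parts.last}"
  }
}
```

With this marker the function is also idempotent: the truncated text splits into two parts, so a second call leaves it untouched.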

Attributes

See also

ToolOutputCompressor for the tool compaction implementation

CompressionRule for individual compression rules

Example

val compressed = DeterministicCompressor.compressToCap(
 messages = conversation.messages,
 tokenCounter = counter,
 capTokens = 4000,
 enableSubjectiveEdits = true
)

Supertypes

class Object

trait Matchable

class Any

Self type

DeterministicCompressor.type

Deterministic history compression using structured digest extraction.

This compressor creates compact [HISTORY_SUMMARY] digests from older conversation blocks, preserving recent context verbatim while summarizing history. The digests extract key structured information that's likely to be referenced later.

==Compression Strategy==

The compressor uses a "keep last K" strategy:

'''Group''' messages into semantic blocks (user-assistant pairs)
'''Keep''' the last K blocks verbatim (recent context)
'''Digest''' older blocks into [HISTORY_SUMMARY] messages
'''Consolidate''' if total digest size exceeds the cap
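
A minimal sketch of the "keep last K" split, assuming blocks are plain strings and the digest is a placeholder that keeps the first few words. The guard at the top mirrors the idempotency property documented for this compressor; all names here are hypothetical.

```scala
object KeepLastKSketch {
  val Marker = "[HISTORY_SUMMARY]"

  // Placeholder digest: the marker plus the first maxWords words of the block.
  def digest(block: String, maxWords: Int): String =
    s"$Marker " + block.split("\\s+").take(maxWords).mkString(" ")

  def compress(blocks: Vector[String], keepLastK: Int, maxWords: Int = 5): Vector[String] =
    // Idempotency guard: input that already carries a digest is returned unchanged.
    if (blocks.exists(_.startsWith(Marker))) blocks
    else {
      // Everything before the last keepLastK blocks is digested.
      val (older, recent) = blocks.splitAt(math.max(0, blocks.length - keepLastK))
      older.map(digest(_, maxWords)) ++ recent
    }
}
```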

==Information Extraction==

Each digest extracts structured information using regex patterns:

'''Identifiers''': IDs, UUIDs, keys, references
'''URLs''': HTTP/HTTPS links
'''Constraints''': Must/should/cannot requirements
'''Status Codes''': HTTP status codes, error codes
'''Errors''': Error messages and exceptions
'''Decisions''': "decided", "chosen", "selected" statements
'''Tool Usage''': Function/API call mentions
'''Outcomes''': Results, conclusions, completions
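
To make the regex-based extraction concrete, here is a sketch covering three of the categories. The patterns, the Extracted type, and the per-field cap are illustrative assumptions; the patterns actually used by HistoryCompressor are not shown on this page.

```scala
object ExtractSketch {
  // Illustrative patterns only.
  private val UrlPattern = """https?://[^\s)\]]+""".r
  private val StatusPattern = """\b[1-5]\d{2}\b""".r
  private val UuidPattern =
    """\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\b""".r

  final case class Extracted(
    urls: Seq[String],
    statusCodes: Seq[String],
    identifiers: Seq[String]
  )

  // Caps each category at maxPerField matches to keep the digest small.
  def extract(text: String, maxPerField: Int = 3): Extracted =
    Extracted(
      urls = UrlPattern.findAllIn(text).take(maxPerField).toVector,
      statusCodes = StatusPattern.findAllIn(text).take(maxPerField).toVector,
      identifiers = UuidPattern.findAllIn(text).take(maxPerField).toVector
    )
}
```

A bare three-digit pattern will also match numbers that are not status codes, so a real implementation would likely need more context around each match.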

==Idempotency==

The compressor is '''idempotent''': if messages already contain [HISTORY_SUMMARY] markers, they are returned unchanged. This allows safe re-application.

Attributes

See also

SemanticBlocks for the block grouping algorithm

StructuredInfo for the extracted information types

Example

val compressed = HistoryCompressor.compressToDigest(
 messages = conversation.messages,
 tokenCounter = counter,
 capTokens = 400,  // Max tokens for digests
 keepLastK = 3     // Keep last 3 blocks verbatim
)

Supertypes

class Object

trait Matchable

class Any

Self type

HistoryCompressor.type

Compressed digest representation of a history block.

Contains the formatted summary text and metadata about the original block.

Value parameters

blockId: Original semantic block ID
blockType: Type of the block (UserAssistantPair, StandaloneTool, etc.)
content: Formatted digest text for inclusion in conversation
originalTokens: Estimated token count of original content

Attributes

Supertypes: trait Serializable

trait Product

trait Equals

class Object

trait Matchable

class Any

Result of LLM-powered compression process

Attributes

Supertypes: trait Serializable

trait Product

trait Equals

class Object

trait Matchable

class Any

Implements digest-only compression for [HISTORY_SUMMARY] messages. Replaces full LLM-powered compression with targeted digest compression.

This is step 3 in the new 4-stage context management pipeline:

Tool deterministic compaction
History compression
LLM digest squeeze (this module)
Final token trim

Attributes

Supertypes: class Object

trait Matchable

class Any
Self type: LLMCompressor.type

Final result of context management pipeline

Attributes

Supertypes: trait Serializable

trait Product

trait Equals

class Object

trait Matchable

class Any

Token information for a single message

Attributes

Supertypes: trait Serializable

trait Product

trait Equals

class Object

trait Matchable

class Any

Represents a single step in the new 4-stage context management pipeline

Attributes

Supertypes: trait Serializable

trait Product

trait Equals

class Object

trait Matchable

class Any

Represents a semantic block of related messages in a conversation.

A semantic block groups logically related messages together, typically a user-assistant exchange with any associated tool calls.

Value parameters

blockType: The classification of this block (pair, standalone, etc.)
expectingAssistantResponse: True if the block is incomplete (awaiting response)
id: Unique identifier for this block
messages: The messages contained in this block

Attributes

Companion: object
Supertypes: trait Serializable

trait Product

trait Equals

class Object

trait Matchable

class Any

Attributes

Companion: class
Supertypes: trait Product

trait Mirror

class Object

trait Matchable

class Any
Self type: SemanticBlock.type

Classification of semantic block types.

Block types help compression algorithms make decisions about how to handle different conversation patterns:

'''UserAssistantPair''': Complete conversation turn, can be summarized
'''StandaloneAssistant''': Isolated response, preserve carefully
'''StandaloneTool''': Tool output without context, may need special handling
'''Other''': Unclassified, treat conservatively

Attributes

Companion: object
Supertypes: class Object

trait Matchable

class Any
Known subtypes: object Other

object StandaloneAssistant

object StandaloneTool

object UserAssistantPair

Attributes

Companion: trait
Supertypes: trait Sum

trait Mirror

class Object

trait Matchable

class Any
Self type: SemanticBlockType.type

Groups messages into semantic blocks for context compression and history management.

==Semantic Block Concept==

A semantic block represents a logically related group of messages in a conversation. The primary patterns are:

'''User-Assistant Pairs''': A user question followed by an assistant response. These form the natural "turns" of a conversation.
'''Tool Interactions''': Tool calls and their results, often associated with an assistant message that triggered them.
'''Standalone Messages''': Messages that don't fit the pair pattern (e.g., system messages, isolated assistant responses).

==Algorithm==

The grouping algorithm uses a tail-recursive state machine:

'''UserMessage''': Starts a new block expecting an assistant response
'''AssistantMessage''': Completes a user block, or becomes standalone
'''ToolMessage''': Attaches to the current block or becomes standalone
'''SystemMessage''': Treated similarly to assistant (can complete blocks)
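
A deliberately simplified sketch of such a state machine follows. It uses a hypothetical three-case message ADT, omits system messages, and keeps a user block open until the next user message; the real algorithm tracks more state (for example, whether an assistant response is still expected).

```scala
import scala.annotation.tailrec

object GroupingSketch {
  // Hypothetical minimal message ADT; the real message types carry more fields.
  sealed trait Msg
  final case class User(text: String) extends Msg
  final case class Assistant(text: String) extends Msg
  final case class Tool(text: String) extends Msg

  final case class Block(messages: Vector[Msg], startsWithUser: Boolean)

  def group(msgs: List[Msg]): Vector[Block] = {
    @tailrec
    def loop(rest: List[Msg], open: Option[Block], done: Vector[Block]): Vector[Block] =
      rest match {
        case Nil => done ++ open.toList
        case (u: User) :: tail =>
          // A user message closes any open block and starts a new one.
          loop(tail, Some(Block(Vector(u), startsWithUser = true)), done ++ open.toList)
        case m :: tail =>
          open match {
            // Assistant and tool messages attach to the open user block.
            case Some(b) => loop(tail, Some(b.copy(messages = b.messages :+ m)), done)
            // With no open block, the message becomes a standalone block.
            case None => loop(tail, None, done :+ Block(Vector(m), startsWithUser = false))
          }
      }
    loop(msgs, None, Vector.empty)
  }
}
```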

Attributes

See also

HistoryCompressor which uses semantic blocks for history compression

SemanticBlockType for the classification of block types

Example

val messages = Seq(
 UserMessage("What's the weather?"),
 AssistantMessage("I'll check for you..."),
 ToolMessage("""{"temp": 72}""", "call_1"),
 AssistantMessage("It's 72 degrees.")
)
val blocks = SemanticBlocks.groupIntoSemanticBlocks(messages)
// Result: One UserAssistantPair block containing all 4 messages

Supertypes

class Object

trait Matchable

class Any

Self type

SemanticBlocks.type

Structured information extracted from a message block.

Each field contains strings matched by the corresponding regex pattern in HistoryCompressor. Matches are limited to prevent digest bloat.

Value parameters

constraints: Requirement statements (must, should, cannot)
decisions: Decision statements
errors: Error messages and exception info
identifiers: IDs, UUIDs, keys found in the content
outcomes: Result and conclusion statements
statusCodes: HTTP status codes, error codes
toolUsage: Tool/function/API call mentions
urls: HTTP/HTTPS URLs

Attributes

Supertypes: trait Serializable

trait Product

trait Equals

class Object

trait Matchable

class Any

Detailed breakdown of token usage in a conversation

Attributes

Supertypes: trait Serializable

trait Product

trait Equals

class Object

trait Matchable

class Any

Token usage information for monitoring and debugging

Attributes

Supertypes: trait Serializable

trait Product

trait Equals

class Object

trait Matchable

class Any

Manages conversation token windows by trimming conversations to fit within token budgets. Always preserves system messages and applies configurable headroom for safety.
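
The trimming behaviour can be sketched as follows, assuming each message carries a precomputed token count and headroom is a fraction of the budget. The Msg shape, the stop-at-first-miss packing, and the function signature are assumptions, not the TokenWindow API.

```scala
object TrimSketch {
  // Hypothetical message shape: a role, text, and a precomputed token count.
  final case class Msg(role: String, text: String, tokens: Int)

  // Keeps all system messages, then packs the remaining messages newest-first
  // until the headroom-adjusted budget would be exceeded. Original order is kept.
  def trimToBudget(msgs: Vector[Msg], budget: Int, headroom: Double = 0.1): Vector[Msg] = {
    val effective = (budget * (1.0 - headroom)).toInt
    val pinnedTokens = msgs.filter(_.role == "system").map(_.tokens).sum
    val indexed = msgs.zipWithIndex
    val newestFirst = indexed.filter(_._1.role != "system").reverse
    // Running totals after adding each candidate, starting from the pinned cost.
    val running = newestFirst.scanLeft(pinnedTokens)((acc, p) => acc + p._1.tokens).tail
    val keptIdx = newestFirst.zip(running).takeWhile(_._2 <= effective).map(_._1._2).toSet
    indexed.collect { case (m, i) if m.role == "system" || keptIdx(i) => m }
  }
}
```

Stopping at the first message that does not fit keeps the retained history contiguous, at the cost of sometimes leaving a little budget unused.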

Attributes

Supertypes: class Object

trait Matchable

class Any
Self type: TokenWindow.type

Handles intelligent compression and externalization of tool outputs.

Tool outputs (from function calls, API responses, file reads) can be very large and quickly consume the context window. This compressor applies content-aware strategies to reduce their size while preserving essential information.

==Content Type Detection==

The compressor automatically detects content types:

'''JSON/YAML''': Removes null values, empty strings, truncates large arrays
'''Logs''': Keeps head/tail, collapses repeated lines
'''Errors''': Preserves error message and top stack frames
'''Binary''': Replaces with placeholder (always externalized)
'''Text''': Generic word-based truncation
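
As an illustration of the log strategy, the sketch below collapses consecutive duplicate lines and keeps only the head and tail. The marker formats and function names are assumptions; the output format used by ToolOutputCompressor is not shown on this page.

```scala
object LogSketch {
  // Collapses runs of identical consecutive lines into one line with a count.
  def collapseRepeats(lines: Vector[String]): Vector[String] =
    lines
      .foldLeft(Vector.empty[(String, Int)]) { (acc, line) =>
        acc.lastOption match {
          case Some((prev, n)) if prev == line => acc.init :+ ((prev, n + 1))
          case _ => acc :+ ((line, 1))
        }
      }
      .map { case (l, n) => if (n > 1) s"$l (repeated $n times)" else l }

  // Keeps the first `head` and last `tail` lines, noting how many were dropped.
  def headTail(lines: Vector[String], head: Int, tail: Int): Vector[String] =
    if (lines.length <= head + tail) lines
    else {
      val omitted = lines.length - head - tail
      (lines.take(head) :+ s"... $omitted lines omitted ...") ++ lines.takeRight(tail)
    }
}
```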

==Size Thresholds==

Content is processed based on size:

'''< 2KB''': Kept as-is (no compression)
'''2KB - 8KB''': Inline compression (type-specific)
'''> 8KB''': Externalized to ArtifactStore with content pointer
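
The thresholds can be sketched as a three-way decision on byte size. How the exact 2KB and 8KB boundaries are treated, and the names used here, are assumptions; binary content is not modelled, since it is always externalized regardless of size.

```scala
import java.nio.charset.StandardCharsets

object ThresholdSketch {
  sealed trait Action
  case object KeepAsIs extends Action
  case object CompressInline extends Action
  case object Externalize extends Action

  val InlineThresholdBytes: Int = 2 * 1024
  val ExternalizeThresholdBytes: Int = 8 * 1024

  // Chooses a strategy from the UTF-8 byte size of the content.
  def actionFor(content: String): Action = {
    val size = content.getBytes(StandardCharsets.UTF_8).length
    if (size < InlineThresholdBytes) KeepAsIs
    else if (size <= ExternalizeThresholdBytes) CompressInline
    else Externalize
  }
}
```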

==Externalization==

Large content is stored in an ArtifactStore and replaced with a pointer:

[EXTERNALIZED: abc123... | JSON | JSON object with 42 fields, 15234 bytes]

The original content can be retrieved using the artifact key.

Attributes

See also: ArtifactStore for content storage interface

DeterministicCompressor which uses this for the tool compaction phase
Supertypes: class Object

trait Matchable

class Any
Self type: ToolOutputCompressor.type
