org.llm4s.context

Type members

Classlikes

Storage interface for externalized content.

When tool outputs exceed the externalization threshold, they are stored in an ArtifactStore and replaced with content pointers. This allows the conversation to reference the content without including it inline.

==Content-Addressed Storage==

Content is stored using content-addressed keys (based on content hash), which enables:

  • Deduplication of identical outputs
  • Efficient retrieval without scanning
  • Immutable content guarantees

==Implementations==

Use ArtifactStore.inMemory for testing and short-lived sessions. For production, implement with persistent storage (database, S3, etc.).
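The content-addressing idea can be sketched in a few lines. This is an illustrative stand-in, not the store's actual key derivation: it assumes the key is simply a SHA-256 hex digest of the content bytes, which is enough to show why identical outputs deduplicate and why stored content is effectively immutable.

```scala
import java.security.MessageDigest

// Hypothetical key derivation: the key is the SHA-256 hash of the bytes,
// so identical outputs always map to the same key (deduplication) and a
// key can never silently point at changed content (immutability).
object ContentKey {
  def forContent(content: Array[Byte]): String =
    MessageDigest.getInstance("SHA-256")
      .digest(content)
      .map(b => f"$b%02x")
      .mkString
}

val key1 = ContentKey.forContent("tool output".getBytes("UTF-8"))
val key2 = ContentKey.forContent("tool output".getBytes("UTF-8"))
assert(key1 == key2) // identical content -> identical key, stored once
```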

Attributes

Companion
object
Supertypes
class Object
trait Matchable
class Any
object ArtifactStore

Factory methods for ArtifactStore implementations.

Attributes

Companion
trait
Supertypes
class Object
trait Matchable
class Any
Self type
case class CompressionRule(name: String, apply: Seq[Message] => Seq[Message])

Represents a named compression rule that transforms a sequence of messages.

Rules are applied sequentially by DeterministicCompressor until the token budget is met. Each rule is designed to be:

  • '''Idempotent''': Applying twice produces the same result
  • '''Safe''': Never removes critical semantic information
  • '''Measurable''': Token reduction can be calculated

Value parameters

apply

The transformation function

name

Identifier for logging and debugging
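A minimal stand-in sketch of the rule shape and its idempotency property. Here `Message` is simplified to a plain `String`; the real llm4s `Message` type is richer, and this dedupe rule is a hypothetical example, not one of the built-in rules.

```scala
// Simplified stand-in: messages are plain strings rather than llm4s Messages.
final case class CompressionRule(name: String, apply: Seq[String] => Seq[String])

// Hypothetical rule: drop a message that exactly repeats its predecessor.
val dedupe = CompressionRule(
  "dedupe-adjacent",
  msgs => msgs.foldLeft(Vector.empty[String]) {
    case (acc, m) if acc.lastOption.contains(m) => acc // skip adjacent repeat
    case (acc, m)                               => acc :+ m
  }
)

val once  = dedupe.apply(Seq("a", "a", "b"))
val twice = dedupe.apply(once)
assert(once == twice) // idempotent: applying the rule again changes nothing
```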

Attributes

Companion
object
Supertypes
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any

Built-in compression rules for common scenarios.

==Available Rules==

  • '''removeRedundantPhrases''': Removes "as I mentioned before", "like I said"
  • '''compressRepetitiveContent''': Deduplicates repeated sentences with "×N" markers
  • '''truncateVerboseResponses''': Summarizes very long assistant responses
  • '''removeFillerWords''': Cleans "um", "well", "you know" from transcripts
  • '''consolidateExamples''': Merges multiple example messages
  • '''compressToolOutputs''': Compresses large tool outputs (see ToolOutputCompressor)

Attributes

Companion
class
Supertypes
trait Product
trait Mirror
class Object
trait Matchable
class Any
Self type
case class ContextConfig(headroomPercent: HeadroomPercent, maxSemanticBlocks: ContextWindowSize, enableRollingSummary: Boolean, enableDeterministicCompression: Boolean, enableLLMCompression: Boolean, summaryTokenTarget: Int, enableSubjectiveEdits: Boolean)

Configuration for context management pipeline

Attributes

Companion
object
Supertypes
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any
object ContextConfig

Attributes

Companion
class
Supertypes
trait Product
trait Mirror
class Object
trait Matchable
class Any
Self type
class ContextManager(tokenCounter: ConversationTokenCounter, config: ContextConfig, llmClient: Option[LLMClient], artifactStore: Option[ArtifactStore])

Orchestrates a 4-step context management pipeline (early-exit if budget is met at any step):

  1. '''ToolDeterministicCompaction'''
     • Run DeterministicCompressor.compressToCap() with compressToolOutputs first (subjective edits OFF).
     • Goal: shrink/cap tool outputs (JSON/logs/binary) without touching user/assistant text.
  2. '''HistoryCompression'''
     • Run HistoryCompressor.compressToDigest(...):
       • Keep the last K semantic blocks as-is (K = config.maxSemanticBlocks).
       • Replace older blocks with a deterministic [HISTORY_SUMMARY] digest capped to config.summaryTokenTarget.
  3. '''LLMHistorySqueeze'''
     • If still over budget AND LLM enabled:
       • Compress only the digest string to summaryTokenTarget via LLMCompressor.squeezeDigest().
       • No whole-conversation LLM compression.
  4. '''FinalTokenTrim'''
     • TokenWindow.trimToBudget() with headroom.
     • Pin [HISTORY_SUMMARY] so it’s never dropped; pack remaining messages newest-first.
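The early-exit orchestration can be sketched with stages stubbed as token-count transforms. This is a conceptual model only: the real pipeline transforms the conversation itself, and the per-stage reductions below are invented numbers.

```scala
// Each stage runs in order; the pipeline stops as soon as the current
// token count fits the budget. Stages are stubbed as Int => Int here.
def runPipeline(tokens: Int, budget: Int, stages: Seq[(String, Int => Int)]): Int =
  stages.foldLeft(tokens) { case (current, (_, stage)) =>
    if (current <= budget) current // early exit: budget already met
    else stage(current)
  }

val stages = Seq[(String, Int => Int)](
  "ToolDeterministicCompaction" -> (t => t - 2000), // hypothetical reductions
  "HistoryCompression"          -> (t => t / 2),
  "LLMHistorySqueeze"           -> (t => t - 500),
  "FinalTokenTrim"              -> (t => t min 4000)
)

assert(runPipeline(10000, 4000, stages) == 4000)
// 10000 -> 8000 -> 4000; the remaining stages are skipped once the budget fits
```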

Attributes

Companion
object
Supertypes
class Object
trait Matchable
class Any

Attributes

Companion
class
Supertypes
class Object
trait Matchable
class Any
Self type
case class ContextStep(name: String, conversation: Conversation, tokensBefore: Int, tokensAfter: Int, applied: Boolean)

Represents a single step in the context management pipeline

Attributes

Companion
object
Supertypes
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any
object ContextStep

Attributes

Companion
class
Supertypes
trait Product
trait Mirror
class Object
trait Matchable
class Any
Self type

Counts tokens in conversations and messages using configurable tokenizers. Provides accurate token counting for context management and budget planning.

Token counting is essential for:

  • Ensuring conversations fit within model context windows
  • Budget planning for API costs (many providers charge per token)
  • Context compression decisions in ContextManager

The counter applies fixed overheads to account for special tokens:

  • Message overhead: 4 tokens per message (role markers, delimiters)
  • Tool call overhead: 10 tokens per tool call (function markers)
  • Conversation overhead: 10 tokens (conversation framing)
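The overhead accounting above reduces to simple arithmetic. A sketch, using the constants stated on this page; the counter's actual internals may structure this differently.

```scala
// total = sum over messages of (content tokens + 4)
//       + 10 per tool call + 10 for conversation framing
def totalTokens(messageTokens: Seq[Int], toolCalls: Int): Int =
  messageTokens.map(_ + 4).sum + toolCalls * 10 + 10

// Three messages of 100, 50, and 25 content tokens, plus one tool call:
// 175 content + 12 message overhead + 10 tool + 10 framing = 207
assert(totalTokens(Seq(100, 50, 25), toolCalls = 1) == 207)
```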

Attributes

See also

ConversationTokenCounter.forModel for model-aware counter creation

TokenBreakdown for detailed per-message token analysis

Example
val counter = ConversationTokenCounter.forModel("gpt-4o").getOrElse(???)
val tokens = counter.countConversation(conversation)
println(s"Conversation uses $tokens tokens")
Companion
object
Supertypes
class Object
trait Matchable
class Any

Factory methods for creating ConversationTokenCounter instances.

Provides model-aware counter creation that automatically selects the appropriate tokenizer based on the model name. Supports OpenAI, Anthropic, Azure, and Ollama models.

==Tokenizer Selection==

Different models use different tokenization schemes:

  • '''GPT-4o, o1''': Uses o200k_base tokenizer
  • '''GPT-4, GPT-3.5''': Uses cl100k_base tokenizer
  • '''Claude models''': Uses cl100k_base approximation (counts may differ by 20-30%)
  • '''Ollama models''': Uses cl100k_base approximation

Attributes

Example
// Model-aware creation (recommended)
val counter = ConversationTokenCounter.forModel("openai/gpt-4o")
// Direct tokenizer selection
val openAICounter = ConversationTokenCounter.openAI()
val gpt4oCounter = ConversationTokenCounter.openAI_o200k()
Companion
class
Supertypes
class Object
trait Matchable
class Any
Self type
case class ConversationWindow(conversation: Conversation, usage: TokenUsageInfo, wasTrimmed: Boolean, removedMessageCount: Int)

Result of token window processing

Attributes

Companion
object
Supertypes
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any

Attributes

Companion
class
Supertypes
trait Product
trait Mirror
class Object
trait Matchable
class Any
Self type

Implements rule-based deterministic compression for conversation context.

This compressor applies predictable, reproducible transformations to reduce token usage while preserving semantic meaning. Unlike LLM-based compression, the output is deterministic and doesn't require API calls.

==Compression Pipeline==

Compression occurs in two phases:

  1. '''Tool Compaction''' (always applied):
     • Compresses large JSON/YAML tool outputs
     • Externalizes binary content
     • Truncates verbose logs and error traces
  2. '''Subjective Edits''' (optional, requires enableSubjectiveEdits):
     • Removes filler words from transcript-like content
     • Deduplicates repetitive sentences
     • Truncates overly verbose assistant responses

==Safety Guarantees==

The compressor is designed to be '''safe''' and '''conservative''':

  • User messages are '''never''' modified
  • Code blocks and JSON are preserved verbatim
  • Filler word removal only applies to "transcript-like" content
  • Truncation preserves first and last sentences

Attributes

See also

ToolOutputCompressor for the tool compaction implementation

CompressionRule for individual compression rules

Example
val compressed = DeterministicCompressor.compressToCap(
 messages = conversation.messages,
 tokenCounter = counter,
 capTokens = 4000,
 enableSubjectiveEdits = true
)
Supertypes
class Object
trait Matchable
class Any
Self type

Deterministic history compression using structured digest extraction.

This compressor creates compact [HISTORY_SUMMARY] digests from older conversation blocks, preserving recent context verbatim while summarizing history. The digests extract key structured information that's likely to be referenced later.

==Compression Strategy==

The compressor uses a "keep last K" strategy:

  1. '''Group''' messages into semantic blocks (user-assistant pairs)
  2. '''Keep''' the last K blocks verbatim (recent context)
  3. '''Digest''' older blocks into [HISTORY_SUMMARY] messages
  4. '''Consolidate''' if total digest size exceeds the cap

==Information Extraction==

Each digest extracts structured information using regex patterns:

  • '''Identifiers''': IDs, UUIDs, keys, references
  • '''URLs''': HTTP/HTTPS links
  • '''Constraints''': Must/should/cannot requirements
  • '''Status Codes''': HTTP status codes, error codes
  • '''Errors''': Error messages and exceptions
  • '''Decisions''': "decided", "chosen", "selected" statements
  • '''Tool Usage''': Function/API call mentions
  • '''Outcomes''': Results, conclusions, completions

==Idempotency==

The compressor is '''idempotent''': if messages already contain [HISTORY_SUMMARY] markers, they are returned unchanged. This allows safe re-application.

Attributes

See also

SemanticBlocks for the block grouping algorithm

StructuredInfo for the extracted information types

Example
val compressed = HistoryCompressor.compressToDigest(
 messages = conversation.messages,
 tokenCounter = counter,
 capTokens = 400,  // Max tokens for digests
 keepLastK = 3     // Keep last 3 blocks verbatim
)
Supertypes
class Object
trait Matchable
class Any
Self type
case class HistoryDigest(blockId: String, blockType: String, content: String, originalTokens: Int)

Compressed digest representation of a history block.

Contains the formatted summary text and metadata about the original block.

Value parameters

blockId

Original semantic block ID

blockType

Type of the block (UserAssistantPair, StandaloneTool, etc.)

content

Formatted digest text for inclusion in conversation

originalTokens

Estimated token count of original content

Attributes

Supertypes
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any
case class LLMCompressedConversation(conversation: Conversation, originalTokens: Int, compressedTokens: Int, compressionRatio: CompressionRatio, targetBudget: TokenBudget, budgetAchieved: Boolean)

Result of LLM-powered compression process

Attributes

Supertypes
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any
object LLMCompressor

Implements digest-only compression for [HISTORY_SUMMARY] messages. Replaces full LLM-powered compression with targeted digest compression.

This is step 3 in the new 4-stage context management pipeline:

  1. Tool deterministic compaction
  2. History compression
  3. LLM digest squeeze (this module)
  4. Final token trim

Attributes

Supertypes
class Object
trait Matchable
class Any
Self type
case class ManagedConversation(conversation: Conversation, originalTokens: Int, finalTokens: Int, steps: Seq[ContextStep])

Final result of context management pipeline

Attributes

Supertypes
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any
case class MessageTokenInfo(role: String, tokens: Int, preview: String)

Token information for a single message

Attributes

Supertypes
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any
case class PipelineStep(name: String, messages: Seq[Message], tokensBefore: Int, tokensAfter: Int, applied: Boolean)

Represents a single step in the new 4-stage context management pipeline

Attributes

Supertypes
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any
case class SemanticBlock(id: SemanticBlockId, messages: Seq[Message], blockType: SemanticBlockType, expectingAssistantResponse: Boolean)

Represents a semantic block of related messages in a conversation.

A semantic block groups logically related messages together, typically a user-assistant exchange with any associated tool calls.

Value parameters

blockType

The classification of this block (pair, standalone, etc.)

expectingAssistantResponse

True if the block is incomplete (awaiting response)

id

Unique identifier for this block

messages

The messages contained in this block

Attributes

Companion
object
Supertypes
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any
object SemanticBlock

Attributes

Companion
class
Supertypes
trait Product
trait Mirror
class Object
trait Matchable
class Any
Self type
sealed trait SemanticBlockType

Classification of semantic block types.

Block types help compression algorithms make decisions about how to handle different conversation patterns:

  • '''UserAssistantPair''': Complete conversation turn, can be summarized
  • '''StandaloneAssistant''': Isolated response, preserve carefully
  • '''StandaloneTool''': Tool output without context, may need special handling
  • '''Other''': Unclassified, treat conservatively
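A stand-in sealed hierarchy mirroring the subtypes listed above shows how a compressor might branch on block type. The subtype names come from this page, but `mayDigest` is a hypothetical decision function, not part of the library.

```scala
// Mirrored ADT for illustration; the real one lives in org.llm4s.context.
sealed trait SemanticBlockType
case object UserAssistantPair   extends SemanticBlockType
case object StandaloneAssistant extends SemanticBlockType
case object StandaloneTool      extends SemanticBlockType
case object Other               extends SemanticBlockType

// Only complete turns are safe to summarize; everything else is preserved
// carefully or treated conservatively, per the classification above.
def mayDigest(blockType: SemanticBlockType): Boolean = blockType match {
  case UserAssistantPair => true
  case _                 => false
}

assert(mayDigest(UserAssistantPair))
assert(!mayDigest(StandaloneTool))
```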

Attributes

Companion
object
Supertypes
class Object
trait Matchable
class Any
Known subtypes

Attributes

Companion
trait
Supertypes
trait Sum
trait Mirror
class Object
trait Matchable
class Any
Self type

Groups messages into semantic blocks for context compression and history management.

==Semantic Block Concept==

A semantic block represents a logically related group of messages in a conversation. The primary patterns are:

  • '''User-Assistant Pairs''': A user question followed by an assistant response. These form the natural "turns" of a conversation.

  • '''Tool Interactions''': Tool calls and their results, often associated with an assistant message that triggered them.

  • '''Standalone Messages''': Messages that don't fit the pair pattern (e.g., system messages, isolated assistant responses).

==Algorithm==

The grouping algorithm uses a tail-recursive state machine:

  1. '''UserMessage''': Starts a new block expecting an assistant response
  2. '''AssistantMessage''': Completes a user block, or becomes standalone
  3. '''ToolMessage''': Attaches to the current block or becomes standalone
  4. '''SystemMessage''': Treated similarly to assistant (can complete blocks)

Attributes

See also

HistoryCompressor which uses semantic blocks for history compression

SemanticBlockType for the classification of block types

Example
val messages = Seq(
 UserMessage("What's the weather?"),
 AssistantMessage("I'll check for you..."),
 ToolMessage("""{"temp": 72}""", "call_1"),
 AssistantMessage("It's 72 degrees.")
)
val blocks = SemanticBlocks.groupIntoSemanticBlocks(messages)
// Result: One UserAssistantPair block containing all 4 messages
Supertypes
class Object
trait Matchable
class Any
Self type
case class StructuredInfo(identifiers: Seq[String], urls: Seq[String], constraints: Seq[String], statusCodes: Seq[String], errors: Seq[String], decisions: Seq[String], toolUsage: Seq[String], outcomes: Seq[String])

Structured information extracted from a message block.

Each field contains strings matched by the corresponding regex pattern in HistoryCompressor. Matches are limited to prevent digest bloat.

Value parameters

constraints

Requirement statements (must, should, cannot)

decisions

Decision statements

errors

Error messages and exception info

identifiers

IDs, UUIDs, keys found in the content

outcomes

Result and conclusion statements

statusCodes

HTTP status codes, error codes

toolUsage

Tool/function/API call mentions

urls

HTTP/HTTPS URLs
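Extraction in this spirit can be sketched with two fields. These regexes are simplified stand-ins, not HistoryCompressor's actual patterns, and the cap of 5 matches is an invented number illustrating the "matches are limited" rule.

```scala
// Illustrative patterns: real extraction in HistoryCompressor may differ.
val urlPattern    = raw"https?://\S+".r
val statusPattern = raw"\b[1-5]\d{2}\b".r

def extractUrls(text: String): Seq[String] =
  urlPattern.findAllIn(text).toSeq.take(5) // capped to prevent digest bloat

def extractStatusCodes(text: String): Seq[String] =
  statusPattern.findAllIn(text).toSeq.take(5)

val text = "GET https://api.example.com/users returned 404, retried and got 200"
assert(extractUrls(text) == Seq("https://api.example.com/users"))
assert(extractStatusCodes(text) == Seq("404", "200"))
```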

Attributes

Supertypes
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any
case class TokenBreakdown(totalTokens: Int, messages: Seq[MessageTokenInfo], overhead: Int)

Detailed breakdown of token usage in a conversation

Attributes

Supertypes
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any
case class TokenUsageInfo(currentTokens: Int, budgetLimit: TokenBudget, withinBudget: Boolean, utilizationPercentage: Int)

Token usage information for monitoring and debugging

Attributes

Supertypes
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any
object TokenWindow

Manages conversation token windows by trimming conversations to fit within token budgets. Always preserves system messages and applies configurable headroom for safety.

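The trimming behavior can be sketched as follows. `Message` is simplified to a `(role, tokenCount)` pair and the headroom formula is an assumption; the real TokenWindow operates on a `Conversation` with a `ConversationTokenCounter`.

```scala
// System messages are always kept; remaining messages are packed
// newest-first until the effective budget (budget minus headroom) is spent.
def trimToBudget(
    messages: Seq[(String, Int)], // (role, token count)
    budget: Int,
    headroomPercent: Int
): Seq[(String, Int)] = {
  val effective      = budget * (100 - headroomPercent) / 100
  val (system, rest) = messages.partition(_._1 == "system")
  var used           = system.map(_._2).sum
  val kept = rest.reverse.takeWhile { case (_, tokens) =>
    val fits = used + tokens <= effective
    if (fits) used += tokens
    fits
  }
  system ++ kept.reverse // oldest messages are the ones dropped
}

val convo = Seq(("system", 20), ("user", 50), ("assistant", 60), ("user", 30))
// Budget 120 with 10% headroom -> effective 108: keep system (20) plus the
// newest message (30); the older user/assistant messages no longer fit.
assert(trimToBudget(convo, 120, 10) == Seq(("system", 20), ("user", 30)))
```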
Attributes

Supertypes
class Object
trait Matchable
class Any
Self type

Handles intelligent compression and externalization of tool outputs.

Tool outputs (from function calls, API responses, file reads) can be very large and quickly consume the context window. This compressor applies content-aware strategies to reduce their size while preserving essential information.

==Content Type Detection==

The compressor automatically detects content types:

  • '''JSON/YAML''': Removes null values, empty strings, truncates large arrays
  • '''Logs''': Keeps head/tail, collapses repeated lines
  • '''Errors''': Preserves error message and top stack frames
  • '''Binary''': Replaces with placeholder (always externalized)
  • '''Text''': Generic word-based truncation

==Size Thresholds==

Content is processed based on size:

  • '''< 2KB''': Kept as-is (no compression)
  • '''2KB - 8KB''': Inline compression (type-specific)
  • '''> 8KB''': Externalized to ArtifactStore with content pointer

==Externalization==

Large content is stored in an ArtifactStore and replaced with a pointer:

[EXTERNALIZED: abc123... | JSON | JSON object with 42 fields, 15234 bytes]

The original content can be retrieved using the artifact key.
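The size-based dispatch above can be sketched directly. The 2KB/8KB thresholds come from this page; the compression and externalization bodies themselves are stubbed out.

```scala
sealed trait Handling
case object KeepAsIs       extends Handling
case object CompressInline extends Handling
case object Externalize    extends Handling

def handlingFor(contentBytes: Int): Handling =
  if (contentBytes < 2 * 1024) KeepAsIs             // < 2KB: no compression
  else if (contentBytes <= 8 * 1024) CompressInline // 2KB-8KB: type-specific
  else Externalize                                  // > 8KB: ArtifactStore pointer

assert(handlingFor(500)   == KeepAsIs)
assert(handlingFor(4096)  == CompressInline)
assert(handlingFor(20000) == Externalize)
```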

Attributes

See also

ArtifactStore for content storage interface

DeterministicCompressor which uses this for the tool compaction phase

Supertypes
class Object
trait Matchable
class Any
Self type