org.llm4s.context

Members list

Packages

Type members

Classlikes

trait ArtifactStore

Storage interface for externalized content.

When tool outputs exceed the externalization threshold, they are stored in an ArtifactStore and replaced with content pointers. This allows the conversation to reference the content without including it inline.

==Content-Addressed Storage==

Content is stored using content-addressed keys (based on content hash), which enables:

  • Deduplication of identical outputs
  • Efficient retrieval without scanning
  • Immutable content guarantees

==Implementations==

Use ArtifactStore.inMemory for testing and short-lived sessions. For production, implement with persistent storage (database, S3, etc.).
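
The content-addressing idea above can be sketched as follows. This is a minimal illustration, not the ArtifactStore API: `ContentAddressedSketch`, `keyFor`, and `InMemoryStore` are hypothetical names, and SHA-256 is an assumed hash function, since the documentation does not state which hash the library uses.

```scala
import java.nio.charset.StandardCharsets
import java.security.MessageDigest
import scala.collection.mutable

object ContentAddressedSketch {
  // Assumed key derivation: hex-encoded SHA-256 of the UTF-8 bytes.
  def keyFor(content: String): String =
    MessageDigest
      .getInstance("SHA-256")
      .digest(content.getBytes(StandardCharsets.UTF_8))
      .map(b => "%02x".format(b))
      .mkString

  // Hypothetical in-memory store keyed by content hash.
  final class InMemoryStore {
    private val entries = mutable.Map.empty[String, String]

    // Identical content always maps to the same key, so it is stored once.
    def put(content: String): String = {
      val key = keyFor(content)
      entries.getOrElseUpdate(key, content)
      key
    }

    // Direct lookup by key; no scanning of stored content is needed.
    def get(key: String): Option[String] = entries.get(key)

    def size: Int = entries.size
  }
}
```

Storing the same output twice yields the same key and a single entry, which is the deduplication property listed above.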

Attributes

Companion
object
Supertypes
class Object
trait Matchable
class Any
object ArtifactStore

Factory methods for ArtifactStore implementations.

Attributes

Companion
trait
Supertypes
class Object
trait Matchable
class Any
Self type
case class CompressionRule(name: String, apply: Seq[Message] => Seq[Message])

Represents a named compression rule that transforms a sequence of messages.

Rules are applied sequentially by DeterministicCompressor until the token budget is met. Each rule is designed to be:

  • '''Idempotent''': Applying twice produces the same result
  • '''Safe''': Never removes critical semantic information
  • '''Measurable''': Token reduction can be calculated
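
The idempotence and measurability properties can be checked mechanically. The sketch below is an illustration under simplifying assumptions: messages are plain strings rather than the library's Message type, tokens are approximated by whitespace-separated words, and `Rule`, `dedupe`, `isIdempotentOn`, and `tokensSaved` are hypothetical names.

```scala
object RuleSketch {
  // Hypothetical stand-in for CompressionRule, using plain strings as messages.
  final case class Rule(name: String, transform: Seq[String] => Seq[String])

  // Crude token estimate: whitespace-separated words (an assumption for this sketch).
  def tokens(msgs: Seq[String]): Int =
    msgs.map(_.split("\\s+").count(_.nonEmpty)).sum

  // Example rule: drop messages that repeat an earlier message verbatim.
  val dedupe: Rule = Rule("dedupe", _.distinct)

  // Idempotent on this input: applying the rule twice equals applying it once.
  def isIdempotentOn(rule: Rule, msgs: Seq[String]): Boolean = {
    val once = rule.transform(msgs)
    rule.transform(once) == once
  }

  // Measurable: token reduction achieved by one application of the rule.
  def tokensSaved(rule: Rule, msgs: Seq[String]): Int =
    tokens(msgs) - tokens(rule.transform(msgs))
}
```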

Value parameters

apply

The transformation function

name

Identifier for logging and debugging

Attributes

Companion
object
Supertypes
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any

object CompressionRule

Built-in compression rules for common scenarios.

==Available Rules==

  • '''removeRedundantPhrases''': Removes "as I mentioned before", "like I said"
  • '''compressRepetitiveContent''': Deduplicates repeated sentences with "×N" markers
  • '''truncateVerboseResponses''': Summarizes very long assistant responses
  • '''removeFillerWords''': Cleans "um", "well", "you know" from transcripts
  • '''consolidateExamples''': Merges multiple example messages
  • '''compressToolOutputs''': Compresses large tool outputs (see ToolOutputCompressor)

Attributes

Companion
class
Supertypes
trait Product
trait Mirror
class Object
trait Matchable
class Any
Self type
case class ContextConfig(headroomPercent: HeadroomPercent, maxSemanticBlocks: ContextWindowSize, enableRollingSummary: Boolean, enableDeterministicCompression: Boolean, enableLLMCompression: Boolean, summaryTokenTarget: Int, enableSubjectiveEdits: Boolean)

Configuration for context management pipeline

Attributes

Companion
object
Supertypes
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any
object ContextConfig

Attributes

Companion
class
Supertypes
trait Product
trait Mirror
class Object
trait Matchable
class Any
Self type
class ContextManager(tokenCounter: ConversationTokenCounter, config: ContextConfig, llmClient: Option[LLMClient], artifactStore: Option[ArtifactStore])

Orchestrates the 4-step context management pipeline for llm4s conversations.

ContextManager is the primary entry point for keeping a conversation within a model's token limit. Each step is applied in order of increasing cost; the pipeline exits early as soon as the conversation fits the requested budget.

==Compressor Comparison==

Strategy                | Cost                  | Quality | Latency | What it touches
------------------------|-----------------------|---------|---------|-------------------------
DeterministicCompressor | Free                  | Lower   | Fast    | Tool outputs only
HistoryCompressor       | Free                  | Medium  | Fast    | Older history → digest
LLMCompressor           | 1 LLM call per digest | High    | Slow    | Digest messages only

==4-Step Pipeline==

Each step exits immediately if the budget is already met:

  1. '''ToolDeterministicCompaction''' (DeterministicCompressor): Shrinks and caps tool outputs (JSON, logs, binary content) without modifying user or assistant messages. No API calls; always runs first.

  2. '''HistoryCompression''' (HistoryCompressor): Keeps the last config.maxSemanticBlocks semantic blocks verbatim and replaces older blocks with compact [HISTORY_SUMMARY] digests, capped to config.summaryTokenTarget tokens. No API calls.

  3. '''LLMHistorySqueeze''' (LLMCompressor): If still over budget and config.enableLLMCompression is true, compresses only the digest messages further via one LLM inference call per digest.

  4. '''FinalTokenTrim''' (TokenWindow): Hard-trims the conversation to the token budget, reserving config.headroomPercent as headroom, and always pins [HISTORY_SUMMARY] messages so they are never dropped.
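
The early-exit behaviour of the steps above can be sketched as a fold over an ordered list of steps. This is an illustration only, not the ContextManager implementation: messages are plain strings, the token counter is passed in as a function, and `PipelineSketch`, `StepRecord`, and `run` are hypothetical names. Recording skipped steps with `applied = false` is an assumption modelled on the `applied` field of ContextStep.

```scala
object PipelineSketch {
  // Hypothetical per-step record, loosely modelled on ContextStep.
  final case class StepRecord(name: String, tokensBefore: Int, tokensAfter: Int, applied: Boolean)

  // Runs steps in order; a step is skipped once the budget is already met.
  def run(
      messages: Seq[String],
      budget: Int,
      count: Seq[String] => Int,
      steps: Seq[(String, Seq[String] => Seq[String])]
  ): (Seq[String], Seq[StepRecord]) =
    steps.foldLeft((messages, Vector.empty[StepRecord])) {
      case ((current, log), (name, step)) =>
        val before = count(current)
        if (before <= budget) {
          // Budget met: leave the conversation untouched and record the skip.
          (current, log :+ StepRecord(name, before, before, applied = false))
        } else {
          val next = step(current)
          (next, log :+ StepRecord(name, before, count(next), applied = true))
        }
    }
}
```

Ordering the steps from cheapest to most expensive means the costly ones only run when the cheaper ones were not enough.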

==Usage==

// Quick setup with defaults (??? is a placeholder for your own error handling):
val manager = ContextManager.withDefaults(tokenCounter).getOrElse(???)
val result  = manager.manageContext(conversation, budget = 8000)
result.foreach(managed => println(managed.summary))

// With an LLM client for Step 3 (named differently so both examples can share a scope):
val llmManager = ContextManager.create(tokenCounter, ContextConfig.default, Some(llmClient))
  .getOrElse(???)

Value parameters

artifactStore

Optional store for externalized binary/large content from Step 1; defaults to an in-memory store if None

config

Pipeline configuration — controls headroom, semantic block count, and which steps are enabled

llmClient

Optional LLM client; required for Step 3 (LLMHistorySqueeze); Step 3 is skipped if None

tokenCounter

Token counter calibrated to the target model's tokenizer

Attributes

See also

DeterministicCompressor for Step 1 implementation

HistoryCompressor for Step 2 implementation

LLMCompressor for Step 3 implementation

TokenWindow for Step 4 implementation

ContextConfig for all configuration options

Companion
object
Supertypes
class Object
trait Matchable
class Any
object ContextManager

Attributes

Companion
class
Supertypes
class Object
trait Matchable
class Any
Self type
case class ContextStep(name: String, conversation: Conversation, tokensBefore: Int, tokensAfter: Int, applied: Boolean)

Represents a single step in the context management pipeline

Attributes

Companion
object
Supertypes
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any
object ContextStep

Attributes

Companion
class
Supertypes
trait Product
trait Mirror
class Object
trait Matchable
class Any
Self type

class ConversationTokenCounter

Counts tokens in conversations and messages using configurable tokenizers. Provides accurate token counting for context management and budget planning.

Token counting is essential for:

  • Ensuring conversations fit within model context windows
  • Budget planning for API costs (many providers charge per token)
  • Context compression decisions in ContextManager

The counter applies fixed overheads to account for special tokens:

  • Message overhead: 4 tokens per message (role markers, delimiters)
  • Tool call overhead: 10 tokens per tool call (function markers)
  • Conversation overhead: 10 tokens (conversation framing)
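
As a worked example of how these overheads add up, the sketch below combines them additively. The constants come from the list above; the additive formula and the names `OverheadSketch` and `estimate` are assumptions for illustration, not the library's implementation.

```scala
object OverheadSketch {
  val MessageOverhead      = 4  // per message
  val ToolCallOverhead     = 10 // per tool call
  val ConversationOverhead = 10 // once per conversation

  // contentTokens: tokenizer count for each message's text (hypothetical input).
  // toolCalls: total number of tool calls in the conversation.
  def estimate(contentTokens: Seq[Int], toolCalls: Int): Int =
    ConversationOverhead +
      contentTokens.map(_ + MessageOverhead).sum +
      toolCalls * ToolCallOverhead
}
```

For three messages of 12, 30, and 8 content tokens and one tool call, this gives 10 + (16 + 34 + 12) + 10 = 82 tokens.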

Attributes

See also

ConversationTokenCounter.forModel for model-aware counter creation

TokenBreakdown for detailed per-message token analysis

Example
val result = for {
  counter <- ConversationTokenCounter.forModel("gpt-4o")
} yield {
  counter.countConversation(conversation)
}
result match {
  case Right(tokens) =>
    println(s"Conversation uses $tokens tokens")
  case Left(error) =>
    println(s"Error: ${error.message}")
}
Companion
object
Supertypes
class Object
trait Matchable
class Any

object ConversationTokenCounter

Factory methods for creating ConversationTokenCounter instances.

Provides model-aware counter creation that automatically selects the appropriate tokenizer based on the model name. Supports OpenAI, Anthropic, Azure, and Ollama models.

==Tokenizer Selection==

Different models use different tokenization schemes:

  • '''GPT-4o, o1''': Uses o200k_base tokenizer
  • '''GPT-4, GPT-3.5''': Uses cl100k_base tokenizer
  • '''Claude models''': Uses cl100k_base approximation (may differ 20-30%)
  • '''Ollama models''': Uses cl100k_base approximation

Attributes

Example
// Model-aware creation (recommended)
val counter = ConversationTokenCounter.forModel("openai/gpt-4o")
// Direct tokenizer selection
val openAICounter = ConversationTokenCounter.openAI()
val gpt4oCounter = ConversationTokenCounter.openAI_o200k()
Companion
class
Supertypes
class Object
trait Matchable
class Any
Self type
case class ConversationWindow(conversation: Conversation, usage: TokenUsageInfo, wasTrimmed: Boolean, removedMessageCount: Int)

Result of token window processing

Attributes

Companion
object
Supertypes
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any
object ConversationWindow

Attributes

Companion
class
Supertypes
trait Product
trait Mirror
class Object
trait Matchable
class Any
Self type

object DeterministicCompressor

Implements rule-based deterministic compression for conversation context.

This compressor applies predictable, reproducible transformations to reduce token usage while preserving semantic meaning. Unlike LLM-based compression, the output is deterministic and doesn't require API calls.

==Compression Pipeline==

Compression occurs in two phases:

  1. '''Tool Compaction''' (always applied):
     • Compresses large JSON/YAML tool outputs
     • Externalizes binary content
     • Truncates verbose logs and error traces
  2. '''Subjective Edits''' (optional, requires enableSubjectiveEdits):
     • Removes filler words from transcript-like content
     • Deduplicates repetitive sentences
     • Truncates overly verbose assistant responses

==Safety Guarantees==

The compressor is designed to be '''safe''' and '''conservative''':

  • User messages are '''never''' modified
  • Code blocks and JSON are preserved verbatim
  • Filler word removal only applies to "transcript-like" content
  • Truncation preserves first and last sentences
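
The last guarantee, truncation that preserves the first and last sentences, might look like the sketch below. The sentence splitter (punctuation followed by whitespace), the `[...]` marker, and the names `TruncateSketch` and `keepFirstAndLast` are assumptions; the library's actual splitting and marker text are not documented here.

```scala
object TruncateSketch {
  // Splits after '.', '!' or '?' followed by whitespace (a simplification).
  private def sentences(text: String): Seq[String] =
    text.split("(?<=[.!?])\\s+").toSeq.filter(_.nonEmpty)

  // Keeps the first and last sentence and marks the elided middle.
  def keepFirstAndLast(text: String): String = {
    val parts = sentences(text)
    if (parts.size <= 2) text
    else s"${parts.head} [...] ${parts.last}"
  }
}
```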

Attributes

See also

ToolOutputCompressor for the tool compaction implementation

CompressionRule for individual compression rules

Example
val compressed = DeterministicCompressor.compressToCap(
 messages = conversation.messages,
 tokenCounter = counter,
 capTokens = 4000,
 enableSubjectiveEdits = true
)
Supertypes
class Object
trait Matchable
class Any
Self type

object HistoryCompressor

Deterministic history compression using structured digest extraction.

This compressor creates compact [HISTORY_SUMMARY] digests from older conversation blocks, preserving recent context verbatim while summarizing history. The digests extract key structured information that's likely to be referenced later.

==Compression Strategy==

The compressor uses a "keep last K" strategy:

  1. '''Group''' messages into semantic blocks (user-assistant pairs)
  2. '''Keep''' the last K blocks verbatim (recent context)
  3. '''Digest''' older blocks into [HISTORY_SUMMARY] messages
  4. '''Consolidate''' if total digest size exceeds the cap
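
The "keep last K" split can be sketched as below, assuming blocks have already been grouped. `Block`, `compress`, and the digest function are hypothetical stand-ins; the consolidation step (4) is omitted for brevity.

```scala
object KeepLastKSketch {
  // Hypothetical simplified block: an id plus its verbatim text.
  final case class Block(id: Int, text: String)

  // Older blocks are replaced by digests; the last K blocks pass through verbatim.
  def compress(blocks: Seq[Block], keepLastK: Int)(digest: Block => String): Seq[String] = {
    val cut             = math.max(0, blocks.size - keepLastK)
    val (older, recent) = blocks.splitAt(cut)
    older.map(b => s"[HISTORY_SUMMARY] ${digest(b)}") ++ recent.map(_.text)
  }
}
```

With four blocks and `keepLastK = 2`, the first two blocks become digests and the last two are kept unchanged.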

==Information Extraction==

Each digest extracts structured information using regex patterns:

  • '''Identifiers''': IDs, UUIDs, keys, references
  • '''URLs''': HTTP/HTTPS links
  • '''Constraints''': Must/should/cannot requirements
  • '''Status Codes''': HTTP status codes, error codes
  • '''Errors''': Error messages and exceptions
  • '''Decisions''': "decided", "chosen", "selected" statements
  • '''Tool Usage''': Function/API call mentions
  • '''Outcomes''': Results, conclusions, completions
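
Regex-based extraction of a few of these categories can be sketched as follows. The patterns below are illustrative guesses, not the patterns used by HistoryCompressor, and `ExtractSketch`, `Extracted`, and `extract` are hypothetical names. The `limit` parameter mirrors the idea of capping matches to keep digests small.

```scala
object ExtractSketch {
  // Hypothetical subset of the structured fields.
  final case class Extracted(urls: Seq[String], identifiers: Seq[String], statusCodes: Seq[String])

  // Assumed patterns: HTTP(S) links, canonical UUIDs, and three-digit status codes.
  private val Url    = """https?://\S+""".r
  private val Uuid   = """\b[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}\b""".r
  private val Status = """\b[1-5]\d{2}\b""".r

  // Collects distinct matches per category, capped at `limit` entries each.
  def extract(text: String, limit: Int = 5): Extracted =
    Extracted(
      urls = Url.findAllIn(text).toList.distinct.take(limit),
      identifiers = Uuid.findAllIn(text).toList.distinct.take(limit),
      statusCodes = Status.findAllIn(text).toList.distinct.take(limit)
    )
}
```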

==Idempotency==

The compressor is '''idempotent''': if messages already contain [HISTORY_SUMMARY] markers, they are returned unchanged. This allows safe re-application.

Attributes

See also

SemanticBlocks for the block grouping algorithm

StructuredInfo for the extracted information types

Example
val compressed = HistoryCompressor.compressToDigest(
 messages = conversation.messages,
 tokenCounter = counter,
 capTokens = 400,  // Max tokens for digests
 keepLastK = 3     // Keep last 3 blocks verbatim
)
Supertypes
class Object
trait Matchable
class Any
Self type
case class HistoryDigest(blockId: String, blockType: String, content: String, originalTokens: Int)

Compressed digest representation of a history block.

Contains the formatted summary text and metadata about the original block.

Value parameters

blockId

Original semantic block ID

blockType

Type of the block (UserAssistantPair, StandaloneTool, etc.)

content

Formatted digest text for inclusion in conversation

originalTokens

Estimated token count of original content

Attributes

Supertypes
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any
case class LLMCompressedConversation(conversation: Conversation, originalTokens: Int, compressedTokens: Int, compressionRatio: CompressionRatio, targetBudget: TokenBudget, budgetAchieved: Boolean)

Result of a legacy full-conversation LLM compression operation.

Produced by the deprecated LLMCompressor.compress method. Prefer LLMCompressor.squeezeDigest for the current pipeline.

Value parameters

budgetAchieved

true if compressedTokens <= targetBudget after compression

compressedTokens

Token count after compression

compressionRatio

compressedTokens / originalTokens (lower = more compressed)

conversation

The compressed conversation

originalTokens

Token count before compression

targetBudget

The token budget this compression was targeting

Attributes

Supertypes
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any
object LLMCompressor

Applies LLM-powered compression to [HISTORY_SUMMARY] digest messages.

This is Step 3 of the 4-step context management pipeline. It targets only the structured digest messages produced by HistoryCompressor and never touches user messages, assistant messages, or tool outputs directly.

==When to Use==

Use LLMCompressor (via ContextManager with enableLLMCompression = true) when:

  • The conversation is long-running and history digests have grown too large to fit within the remaining token budget after HistoryCompressor has run.
  • Preserving semantic fidelity in the digest is important — you cannot afford to simply truncate or drop structured information (IDs, decisions, error codes).
  • An extra LLM API call per compression event is acceptable (latency and cost).

==When NOT to Use==

Prefer DeterministicCompressor or HistoryCompressor alone when:

  • Latency is critical (e.g., interactive chatbots where each round-trip matters).
  • Cost per token must be minimised — an extra inference call adds to compression cost.
  • The conversation fits within budget after deterministic steps; ContextManager skips this step automatically in that case.

==Cost==

squeezeDigest runs only when the combined size of all [HISTORY_SUMMARY] messages exceeds the cap; the cap is a combined budget, not a per-message threshold. When it runs, it makes one LLM API call per [HISTORY_SUMMARY] message, so budget for the added latency and cost accordingly.

==Pipeline Position==

Step 1: DeterministicCompressor — free, fast, tool-output focused
Step 2: HistoryCompressor       — free, fast, deterministic digest
Step 3: LLMCompressor           — 1 LLM call per digest, slower, high quality  ← this object
Step 4: TokenWindow.trimToBudget — free, last resort

Attributes

See also

HistoryCompressor for digest generation that this compressor further shrinks

DeterministicCompressor for the cheaper alternative with no API calls

ContextManager for the orchestrator that chooses when to invoke each step

Supertypes
class Object
trait Matchable
class Any
Self type
case class ManagedConversation(conversation: Conversation, originalTokens: Int, finalTokens: Int, steps: Seq[ContextStep])

Final result of context management pipeline

Attributes

Supertypes
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any
case class MessageTokenInfo(role: String, tokens: Int, preview: String)

Token information for a single message

Attributes

Supertypes
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any
case class PipelineStep(name: String, messages: Seq[Message], tokensBefore: Int, tokensAfter: Int, applied: Boolean)

Represents a single step in the new 4-stage context management pipeline

Attributes

Supertypes
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any
case class SemanticBlock(id: SemanticBlockId, messages: Seq[Message], blockType: SemanticBlockType, expectingAssistantResponse: Boolean)

Represents a semantic block of related messages in a conversation.

A semantic block groups logically related messages together, typically a user-assistant exchange with any associated tool calls.

Value parameters

blockType

The classification of this block (pair, standalone, etc.)

expectingAssistantResponse

True if the block is incomplete (awaiting response)

id

Unique identifier for this block

messages

The messages contained in this block

Attributes

Companion
object
Supertypes
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any
object SemanticBlock

Attributes

Companion
class
Supertypes
trait Product
trait Mirror
class Object
trait Matchable
class Any
Self type
sealed trait SemanticBlockType

Classification of semantic block types.

Block types help compression algorithms make decisions about how to handle different conversation patterns:

  • '''UserAssistantPair''': Complete conversation turn, can be summarized
  • '''StandaloneAssistant''': Isolated response, preserve carefully
  • '''StandaloneTool''': Tool output without context, may need special handling
  • '''Other''': Unclassified, treat conservatively

Attributes

Companion
object
Supertypes
class Object
trait Matchable
class Any
Known subtypes

Attributes

Companion
trait
Supertypes
trait Sum
trait Mirror
class Object
trait Matchable
class Any
Self type

object SemanticBlocks

Groups messages into semantic blocks for context compression and history management.

==Semantic Block Concept==

A semantic block represents a logically related group of messages in a conversation. The primary patterns are:

  • '''User-Assistant Pairs''': A user question followed by an assistant response. These form the natural "turns" of a conversation.

  • '''Tool Interactions''': Tool calls and their results, often associated with an assistant message that triggered them.

  • '''Standalone Messages''': Messages that don't fit the pair pattern (e.g., system messages, isolated assistant responses).

==Algorithm==

The grouping algorithm uses a tail-recursive state machine:

  1. '''UserMessage''': Starts a new block expecting an assistant response
  2. '''AssistantMessage''': Completes a user block, or becomes standalone
  3. '''ToolMessage''': Attaches to the current block or becomes standalone
  4. '''SystemMessage''': Treated similarly to assistant (can complete blocks)
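
A tail-recursive grouping loop in this spirit can be sketched with a simplified message type. This is not the SemanticBlocks implementation: `Msg`, `Block`, and `group` are hypothetical, system messages are left out, and the exact attachment rules (anything following a user message joins that block until the next user message) are an assumption.

```scala
import scala.annotation.tailrec

object GroupingSketch {
  // Simplified, hypothetical message types.
  sealed trait Msg
  final case class User(text: String)      extends Msg
  final case class Assistant(text: String) extends Msg
  final case class Tool(text: String)      extends Msg

  final case class Block(kind: String, messages: Vector[Msg])

  def group(messages: List[Msg]): Vector[Block] = {
    // `open` holds the block started by the most recent user message, if any.
    @tailrec
    def loop(rest: List[Msg], open: Option[Vector[Msg]], done: Vector[Block]): Vector[Block] =
      rest match {
        case Nil =>
          done ++ open.map(Block("UserAssistantPair", _))
        case (u: User) :: tail =>
          // A user message closes any open block and starts a new one.
          loop(tail, Some(Vector(u)), done ++ open.map(Block("UserAssistantPair", _)))
        case m :: tail =>
          open match {
            case Some(ms) =>
              // Assistant and tool messages attach to the open block.
              loop(tail, Some(ms :+ m), done)
            case None =>
              // No open block: the message stands alone.
              val kind = m match {
                case _: Tool => "StandaloneTool"
                case _       => "StandaloneAssistant"
              }
              loop(tail, None, done :+ Block(kind, Vector(m)))
          }
      }
    loop(messages, None, Vector.empty)
  }
}
```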

Attributes

See also

HistoryCompressor which uses semantic blocks for history compression

SemanticBlockType for the classification of block types

Example
val messages = Seq(
 UserMessage("What's the weather?"),
 AssistantMessage("I'll check for you..."),
 ToolMessage("""{"temp": 72}""", "call_1"),
 AssistantMessage("It's 72 degrees.")
)
val blocks = SemanticBlocks.groupIntoSemanticBlocks(messages)
// Result: One UserAssistantPair block containing all 4 messages
Supertypes
class Object
trait Matchable
class Any
Self type
case class StructuredInfo(identifiers: Seq[String], urls: Seq[String], constraints: Seq[String], statusCodes: Seq[String], errors: Seq[String], decisions: Seq[String], toolUsage: Seq[String], outcomes: Seq[String])

Structured information extracted from a message block.

Each field contains strings matched by the corresponding regex pattern in HistoryCompressor. Matches are limited to prevent digest bloat.

Value parameters

constraints

Requirement statements (must, should, cannot)

decisions

Decision statements

errors

Error messages and exception info

identifiers

IDs, UUIDs, keys found in the content

outcomes

Result and conclusion statements

statusCodes

HTTP status codes, error codes

toolUsage

Tool/function/API call mentions

urls

HTTP/HTTPS URLs

Attributes

Supertypes
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any
case class TokenBreakdown(totalTokens: Int, messages: Seq[MessageTokenInfo], overhead: Int)

Detailed breakdown of token usage in a conversation

Attributes

Supertypes
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any
case class TokenUsageInfo(currentTokens: Int, budgetLimit: TokenBudget, withinBudget: Boolean, utilizationPercentage: Int)

Token usage information for monitoring and debugging

Attributes

Supertypes
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any
object TokenWindow

Manages conversation token windows by trimming conversations to fit within token budgets. Always preserves system messages and applies configurable headroom for safety.
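
Trimming with headroom and pinned messages can be sketched as follows. This is an illustration, not the TokenWindow implementation: `Msg`, `effectiveBudget`, and `trim` are hypothetical, token counts are supplied with each message, and dropping the oldest unpinned message first is an assumed policy. The `pinned` flag stands in for system messages and [HISTORY_SUMMARY] messages.

```scala
import scala.annotation.tailrec

object TrimSketch {
  // Hypothetical message with a precomputed token count.
  final case class Msg(text: String, tokens: Int, pinned: Boolean)

  // Headroom reserves a percentage of the budget as a safety margin.
  def effectiveBudget(budget: Int, headroomPercent: Int): Int =
    budget * (100 - headroomPercent) / 100

  // Drops the oldest unpinned message until the conversation fits.
  def trim(messages: Vector[Msg], budget: Int, headroomPercent: Int): Vector[Msg] = {
    val limit = effectiveBudget(budget, headroomPercent)

    @tailrec
    def loop(current: Vector[Msg]): Vector[Msg] =
      if (current.map(_.tokens).sum <= limit) current
      else
        current.indexWhere(m => !m.pinned) match {
          case -1 => current // only pinned messages remain; nothing more to drop
          case i  => loop(current.patch(i, Nil, 1))
        }

    loop(messages)
  }
}
```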

Attributes

Supertypes
class Object
trait Matchable
class Any
Self type

object ToolOutputCompressor

Handles intelligent compression and externalization of tool outputs.

Tool outputs (from function calls, API responses, file reads) can be very large and quickly consume the context window. This compressor applies content-aware strategies to reduce their size while preserving essential information.

==Content Type Detection==

The compressor automatically detects content types:

  • '''JSON/YAML''': Removes null values, empty strings, truncates large arrays
  • '''Logs''': Keeps head/tail, collapses repeated lines
  • '''Errors''': Preserves error message and top stack frames
  • '''Binary''': Replaces with placeholder (always externalized)
  • '''Text''': Generic word-based truncation
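
As one example of a type-specific strategy, collapsing repeated log lines can be sketched as below. The `(×N)` marker format and the names `LogCollapseSketch` and `collapseRepeats` are assumptions for illustration; head/tail retention is not shown.

```scala
object LogCollapseSketch {
  // Collapses runs of identical consecutive lines into one line with a count.
  def collapseRepeats(lines: Seq[String]): Seq[String] =
    lines
      .foldLeft(Vector.empty[(String, Int)]) { (acc, line) =>
        acc.lastOption match {
          case Some((prev, n)) if prev == line => acc.init :+ ((prev, n + 1))
          case _                               => acc :+ ((line, 1))
        }
      }
      .map { case (line, n) => if (n == 1) line else s"$line (×$n)" }
}
```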

==Size Thresholds==

Content is processed based on size:

  • '''< 2KB''': Kept as-is (no compression)
  • '''2KB - 8KB''': Inline compression (type-specific)
  • '''> 8KB''': Externalized to ArtifactStore with content pointer
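
The size-based routing can be sketched as a simple classifier. The thresholds come from the list above; treating 1 KB as 1024 bytes and placing exactly 8 KB in the inline range are assumptions, and `ThresholdSketch`, `Action`, and `classify` are hypothetical names.

```scala
object ThresholdSketch {
  sealed trait Action
  case object KeepAsIs       extends Action
  case object CompressInline extends Action
  case object Externalize    extends Action

  private val InlineThreshold      = 2 * 1024 // 2 KB, assuming 1 KB = 1024 bytes
  private val ExternalizeThreshold = 8 * 1024 // 8 KB

  // Routes content by size; boundary handling is an assumption of this sketch.
  def classify(sizeBytes: Int): Action =
    if (sizeBytes < InlineThreshold) KeepAsIs
    else if (sizeBytes <= ExternalizeThreshold) CompressInline
    else Externalize
}
```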

==Externalization==

Large content is stored in an ArtifactStore and replaced with a pointer:

[EXTERNALIZED: abc123... | JSON | JSON object with 42 fields, 15234 bytes]

The original content can be retrieved using the artifact key.

Attributes

See also

ArtifactStore for content storage interface

DeterministicCompressor which uses this for the tool compaction phase

Supertypes
class Object
trait Matchable
class Any
Self type