Phase 1.1: Functional Conversation Management

Project: llm4s Agent Framework Enhancement
Phase: 1.1 - Core Usability: Conversation State Management
Date: 2025-11-16
Status: Design Proposal
Author: AI Assistant (Claude)

Overview
Problem Statement
Design Principles
Current State Analysis
Proposed Design
API Examples
Implementation Plan
Testing Strategy
Migration Guide
Future Considerations

Overview

Goals

Design a functional, immutable, and ergonomic API for multi-turn agent conversations that:

Maintains llm4s’s functional programming principles (pure functions, immutable data)
Makes multi-turn conversations easy to write and understand
Provides automatic context window management
Supports conversation persistence without sacrificing purity
Eliminates the need for var and mutable state in user code

Non-Goals

Mutable session objects (like OpenAI SDK’s Session) - violates functional principles
Implicit global state - all state must be explicit
Breaking existing API - should be additive enhancements

Problem Statement

Current Pain Points

1. Imperative State Threading in Samples

Current sample code uses var to thread state:

// From SingleStepAgentExample.scala
var stat = state
while ((stat.status == AgentStatus.InProgress || stat.status == AgentStatus.WaitingForTools) && stepCount < 5) {
  agent.runStep(stat) match {
    case Right(newState) =>
      stat = newState  // ❌ Mutation!
    case Left(error) =>
      stat = stat.withStatus(AgentStatus.Failed(error.toString))
  }
  stepCount += 1
}

Problem: This is not functional code: it relies on mutable variables and an imperative loop.

2. No Multi-Turn Conversation API

There’s no clean way to continue a conversation from a previous state:

// Current: must manually create new state
val state1 = agent.initialize("What's the weather?", tools)
val result1 = agent.run(state1, ...)

// How to continue? No built-in method
val state2 = result1.map { s =>
  s.copy(
    conversation = s.conversation.addMessage(UserMessage("And tomorrow?")),
    userQuery = "And tomorrow?",  // Have to update this too?
    status = AgentStatus.InProgress
  )
}
val result2 = state2.flatMap(agent.run(_, ...))

Problem: Verbose and error-prone, and it is unclear which fields need updating.

3. No Context Window Management

Conversations grow unbounded - no automatic pruning:

// After many turns, conversation history becomes huge
// No built-in way to prune old messages while preserving important context

Problem: Long conversations will eventually hit token limits, slow down API calls, and increase costs.

4. `userQuery` Field is Ambiguous

case class AgentState(
  conversation: Conversation,
  tools: ToolRegistry,
  userQuery: String,  // ❌ What does this mean in multi-turn context?
  ...
)

Problem: In a multi-turn conversation, there’s no single “user query” - there are multiple user messages.

Design Principles

Functional Programming First

Pure Functions
- All agent operations return new immutable states
- No side effects in core logic (I/O at boundaries only)
- Referential transparency
Explicit State Flow
- State threading is visible in the code
- No hidden mutable session objects
- Clear data flow through for-comprehensions
Immutable Data Structures
- All state is immutable (already true)
- State updates via copy() or builder patterns
- Functional collections only
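
The explicit state flow principle can be illustrated without any llm4s types. The sketch below is only an illustration under stated assumptions: Counter and step are hypothetical names invented here, and plain Either stands in for Result. Each intermediate state is a named value, and the first failure short-circuits the remaining steps.

```scala
object ExplicitStateFlowSketch {
  // Hypothetical stand-in state; not an llm4s type.
  final case class Counter(value: Int, history: Vector[String])

  // A pure step: returns a new state or an error and never mutates its input.
  def step(state: Counter, delta: Int): Either[String, Counter] =
    if (state.value + delta < 0) Left(s"value would drop below zero (delta = $delta)")
    else Right(state.copy(value = state.value + delta, history = state.history :+ s"add $delta"))

  val start: Counter = Counter(0, Vector.empty)

  // Every intermediate state is named, so the data flow is visible.
  val ok: Either[String, Counter] = for {
    s1 <- step(start, 5)
    s2 <- step(s1, -2)
    s3 <- step(s2, 4)
  } yield s3

  // The second step fails, so the third step is never evaluated.
  val failed: Either[String, Counter] = for {
    s1 <- step(start, 1)
    s2 <- step(s1, -3)
    s3 <- step(s2, 10)
  } yield s3
}
```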

Ergonomics Second

Reduce Boilerplate
- Helper methods for common patterns
- Smart defaults
- Method chaining where appropriate
Type Safety
- Leverage Scala’s type system
- Compile-time guarantees
- Avoid stringly-typed APIs
Discoverability
- Self-documenting method names
- Consistent naming conventions
- Clear documentation

Current State Analysis

What Works Well

✅ Immutable Core Data Structures

AgentState is immutable
Conversation is immutable with good methods
Message types are immutable

✅ Pure Agent Methods

agent.run(state, ...) returns Result[AgentState] (pure)
agent.runStep(state) returns Result[AgentState] (pure)
agent.initialize(...) returns AgentState (pure)

✅ Result-based Error Handling

Consistent use of Result[A]
Composable with for-comprehensions
Clear error types

What Needs Improvement

❌ Multi-Turn Conversation Support

No continuation API
Manual state threading required
Unclear how to add a new turn

❌ Context Window Management

No automatic pruning
No token counting integration
No configurable strategies

❌ State Persistence

Partial serialization support
No persistence helpers
Unclear how to save/load conversations

❌ AgentState.userQuery Semantics

Ambiguous in multi-turn context
Not clear if it should be updated
Seems like a display-only field

Proposed Design

1. Conversation Continuation API

1.1 Core Continuation Method

Add to Agent class:

/**
 * Continue an agent conversation with a new user message.
 * This is the functional way to handle multi-turn conversations.
 *
 * @param previousState The previous agent state (must be Complete or Failed)
 * @param newUserMessage The new user message to process
 * @param maxSteps Optional limit on reasoning steps
 * @param traceLogPath Optional path for trace logging
 * @param debug Enable debug logging
 * @return Result containing the new agent state
 */
def continueConversation(
  previousState: AgentState,
  newUserMessage: String,
  maxSteps: Option[Int] = None,
  traceLogPath: Option[String] = None,
  debug: Boolean = false
): Result[AgentState] = {
  // Validate previous state
  previousState.status match {
    case AgentStatus.Complete | AgentStatus.Failed(_) =>
      // Prepare new state by adding user message and resetting status
      val newState = previousState.copy(
        conversation = previousState.conversation.addMessage(UserMessage(newUserMessage)),
        status = AgentStatus.InProgress,
        logs = Seq.empty  // Reset logs for new turn
      )
      // Run from the new state
      run(newState, maxSteps, traceLogPath, debug)

    case AgentStatus.InProgress | AgentStatus.WaitingForTools =>
      Left(ValidationError(
        "Cannot continue from an incomplete conversation. " +
        "Previous state must be Complete or Failed."
      ))
  }
}

Rationale:

Pure function - takes state, returns new state
Explicit validation of previous state
Clear semantics: only continue from completed turns
Preserves all agent configuration (tools, system message, completion options)
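
To make the validation rule concrete, here is a minimal, self-contained sketch of the same shape. It is not the llm4s implementation: Status and State are simplified stand-ins for AgentStatus and AgentState, Either[String, _] stands in for Result, and the run parameter is a hypothetical substitute for Agent.run so that no LLM client is needed.

```scala
object ContinuationSketch {
  // Simplified stand-ins for AgentStatus and AgentState; not the llm4s definitions.
  sealed trait Status
  object Status {
    case object InProgress extends Status
    case object WaitingForTools extends Status
    case object Complete extends Status
    final case class Failed(reason: String) extends Status
  }

  final case class State(messages: Vector[String], status: Status)

  // `run` stands in for Agent.run; passing it in keeps the sketch free of any client.
  def continueConversation(
    previous: State,
    newUserMessage: String,
    run: State => Either[String, State]
  ): Either[String, State] =
    previous.status match {
      case Status.Complete | Status.Failed(_) =>
        // Append the user message and reset the status before running the next turn.
        run(previous.copy(messages = previous.messages :+ s"user: $newUserMessage", status = Status.InProgress))
      case Status.InProgress | Status.WaitingForTools =>
        Left("Cannot continue from an incomplete conversation")
    }

  // A fake runner that appends a canned reply and marks the turn complete.
  val echoRun: State => Either[String, State] = s =>
    Right(s.copy(messages = s.messages :+ "assistant: ok", status = Status.Complete))
}
```

Because the previous state is never modified, a caller can keep it around, for example to retry the same turn with a different message.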

1.2 Multi-Turn Helper

For running multiple turns in sequence:

/**
 * Run multiple conversation turns sequentially.
 * Each turn waits for the previous to complete before starting.
 *
 * @param initialQuery The first user message
 * @param followUpQueries Additional user messages
 * @param tools Tool registry for the conversation
 * @param maxStepsPerTurn Optional step limit per turn
 * @param systemPromptAddition Optional system prompt addition
 * @param completionOptions Completion options
 * @param debug Enable debug logging
 * @return Result containing the final agent state after all turns
 */
def runMultiTurn(
  initialQuery: String,
  followUpQueries: Seq[String],
  tools: ToolRegistry,
  maxStepsPerTurn: Option[Int] = None,
  systemPromptAddition: Option[String] = None,
  completionOptions: CompletionOptions = CompletionOptions(),
  debug: Boolean = false
): Result[AgentState] = {
  // Run first turn
  val firstTurn = run(
    initialQuery,
    tools,
    maxStepsPerTurn,
    None,
    systemPromptAddition,
    completionOptions,
    debug
  )

  // Fold over follow-up queries, threading state through
  followUpQueries.foldLeft(firstTurn) { (stateResult, query) =>
    stateResult.flatMap { state =>
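      // Named arguments keep this call valid if optional parameters are added later
      continueConversation(state, query, maxStepsPerTurn, traceLogPath = None, debug = debug)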
    }
  }
}

Rationale:

Functional fold instead of imperative loop
Composes well with Result
No mutable variables needed
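
The fold-based threading can be tried in isolation. In the sketch below, which is an illustration rather than llm4s code, a Transcript (a vector of strings) stands in for AgentState, the hypothetical runTurn stands in for continueConversation, and Either[String, _] stands in for Result.

```scala
object MultiTurnFoldSketch {
  // A plain transcript stands in for AgentState.
  type Transcript = Vector[String]

  // Stand-in for one conversation turn: fails on a blank query, otherwise appends two messages.
  def runTurn(history: Transcript, query: String): Either[String, Transcript] =
    if (query.trim.isEmpty) Left(s"empty query after ${history.length} messages")
    else Right(history :+ s"user: $query" :+ s"assistant: reply to '$query'")

  // Thread the transcript through every follow-up query.
  def runMultiTurn(first: String, followUps: Seq[String]): Either[String, Transcript] =
    followUps.foldLeft(runTurn(Vector.empty, first)) { (acc, query) =>
      acc.flatMap(runTurn(_, query)) // a Left passes through untouched
    }
}
```

Note that foldLeft still visits every remaining query after a failure, but flatMap on a Left does no work, so no further turns are executed.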

2. Context Window Management

2.1 Configuration

package org.llm4s.agent

/**
 * Configuration for automatic context window management
 */
case class ContextWindowConfig(
  /**
   * Maximum number of tokens to keep in conversation history.
   * When exceeded, pruning will occur based on the strategy.
   */
  maxTokens: Option[Int] = None,

  /**
   * Maximum number of messages to keep (alternative to token-based).
   * When exceeded, oldest messages will be removed.
   */
  maxMessages: Option[Int] = None,

  /**
   * Always keep the system message (recommended)
   */
  preserveSystemMessage: Boolean = true,

  /**
   * Minimum number of recent turns to keep (user + assistant pairs)
   * Even if token limit exceeded, these will be preserved.
   */
  minRecentTurns: Int = 3,

  /**
   * Strategy for pruning messages
   */
  pruningStrategy: PruningStrategy = PruningStrategy.OldestFirst
)

/**
 * Strategies for pruning conversation history
 */
sealed trait PruningStrategy

object PruningStrategy {
  /**
   * Remove oldest messages first (FIFO)
   */
  case object OldestFirst extends PruningStrategy

  /**
   * Remove messages from the middle, keeping start and end
   */
  case object MiddleOut extends PruningStrategy

  /**
   * Keep only the most recent N turns (user+assistant pairs)
   */
  case class RecentTurnsOnly(turns: Int) extends PruningStrategy

  /**
   * Custom pruning function
   */
  case class Custom(fn: Seq[Message] => Seq[Message]) extends PruningStrategy
}

2.2 Pruning Implementation

Add to AgentState companion object:

object AgentState {
  /**
   * Prune conversation history based on configuration.
   * Returns a new AgentState with pruned conversation.
   *
   * @param state The current agent state
   * @param config Context window configuration
   * @param tokenCounter Function to count tokens in messages
   * @return New state with pruned conversation
   */
  def pruneConversation(
    state: AgentState,
    config: ContextWindowConfig,
    tokenCounter: Message => Int = defaultTokenCounter
  ): AgentState = {
    val messages = state.conversation.messages

    // Check if pruning is needed: exceeding either configured limit triggers it
    val overTokenLimit   = config.maxTokens.exists(limit => messages.map(tokenCounter).sum > limit)
    val overMessageLimit = config.maxMessages.exists(limit => messages.length > limit)
    val needsPruning     = overTokenLimit || overMessageLimit

    if (!needsPruning) {
      state
    } else {
      val prunedMessages = config.pruningStrategy match {
        case PruningStrategy.OldestFirst =>
          pruneOldestFirst(messages, config, tokenCounter)
        case PruningStrategy.MiddleOut =>
          pruneMiddleOut(messages, config, tokenCounter)
        case PruningStrategy.RecentTurnsOnly(turns) =>
          pruneRecentTurnsOnly(messages, turns, config)
        case PruningStrategy.Custom(fn) =>
          fn(messages)
      }

      state.copy(conversation = Conversation(prunedMessages))
    }
  }

  /**
   * Default token counter (rough estimate: words * 1.3)
   */
  private def defaultTokenCounter(message: Message): Int = {
    val words = message.content.split("\\s+").length
    (words * 1.3).toInt
  }

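  private def pruneOldestFirst(
    messages: Seq[Message],
    config: ContextWindowConfig,
    tokenCounter: Message => Int
  ): Seq[Message] = {
    // Separate system messages so they can be preserved if configured
    val (systemMsgs, otherMsgs) = messages.partition(_.role == MessageRole.System)
    val (kept, candidates) =
      if (config.preserveSystemMessage) (systemMsgs, otherMsgs) else (Seq.empty[Message], messages)

    // How many of the newest candidates fit the message limit (if one is set)
    val byCount = config.maxMessages.fold(candidates.length)(limit => math.max(0, limit - kept.length))

    // How many of the newest candidates fit the token limit (if one is set)
    val byTokens = config.maxTokens.fold(candidates.length) { limit =>
      val budget = limit - kept.map(tokenCounter).sum
      candidates.reverse.scanLeft(0)((acc, m) => acc + tokenCounter(m)).tail.takeWhile(_ <= budget).length
    }

    // Keep preserved messages plus the newest candidates allowed by both limits
    kept ++ candidates.takeRight(math.min(byCount, byTokens))
  }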

  private def pruneMiddleOut(
    messages: Seq[Message],
    config: ContextWindowConfig,
    tokenCounter: Message => Int
  ): Seq[Message] = {
    val (systemMsgs, otherMsgs) = messages.partition(_.role == MessageRole.System)
    val (kept, candidates) =
      if (config.preserveSystemMessage) (systemMsgs, otherMsgs) else (Seq.empty[Message], messages)

    // Message-count based only; tokenCounter is not consulted by this strategy yet
    val targetCount = config.maxMessages.getOrElse(messages.length / 2)
    val budget      = math.max(0, targetCount - kept.length)

    if (candidates.length <= budget) {
      kept ++ candidates // nothing to cut; also avoids overlapping take/takeRight
    } else {
      val keepStart = budget / 2
      val keepEnd   = budget - keepStart
      kept ++ candidates.take(keepStart) ++ candidates.takeRight(keepEnd)
    }
  }

  private def pruneRecentTurnsOnly(
    messages: Seq[Message],
    turns: Int,
    config: ContextWindowConfig
  ): Seq[Message] = {
    // A turn is a user message + assistant response (+ optional tool messages)
    // Keep the last N complete turns
    val (systemMsgs, otherMsgs) = messages.partition(_.role == MessageRole.System)

    // Group messages into turns (simplified: every user message starts a turn)
    val turnStarts = otherMsgs.zipWithIndex
      .filter(_._1.role == MessageRole.User)
      .map(_._2)

    val keepFromIndex = if (turnStarts.length > turns) {
      turnStarts(turnStarts.length - turns)
    } else {
      0
    }

    if (config.preserveSystemMessage) {
      systemMsgs ++ otherMsgs.drop(keepFromIndex)
    } else {
      otherMsgs.drop(keepFromIndex)
    }
  }
}
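
As a standalone illustration of token-based pruning, the sketch below keeps the newest messages that fit a token budget while protecting a minimum number of recent messages. It is a sketch under assumptions, not the proposed implementation: Msg, estimateTokens and pruneToBudget are hypothetical names, roles are plain strings, and minRecentMessages counts individual messages, which is a simplification of minRecentTurns.

```scala
object TokenBudgetPruningSketch {
  // Minimal stand-in for a chat message; the llm4s Message type is richer.
  final case class Msg(role: String, content: String)

  // Rough estimate matching the default counter in this design: word count * 1.3.
  def estimateTokens(m: Msg): Int =
    (m.content.split("\\s+").count(_.nonEmpty) * 1.3).toInt

  /** Keep all system messages, then the longest run of newest messages that fits
    * the remaining token budget, but never fewer than `minRecentMessages`.
    */
  def pruneToBudget(messages: Vector[Msg], maxTokens: Int, minRecentMessages: Int): Vector[Msg] = {
    val (system, others) = messages.partition(_.role == "system")
    val budget = maxTokens - system.map(estimateTokens).sum

    // cumulative(i) = estimated tokens of the newest i + 1 non-system messages
    val cumulative = others.reverse.scanLeft(0)((acc, m) => acc + estimateTokens(m)).tail
    val fitting    = cumulative.takeWhile(_ <= budget).length
    val keep       = math.max(fitting, math.min(minRecentMessages, others.length))

    system ++ others.takeRight(keep)
  }
}
```

Protecting recent messages means the result can exceed the budget when the protected messages alone are too large; callers that need a hard limit would have to handle that case separately.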

2.3 Automatic Pruning in Agent

Add optional pruning to continuation:

def continueConversation(
  previousState: AgentState,
  newUserMessage: String,
  maxSteps: Option[Int] = None,
  traceLogPath: Option[String] = None,
  contextWindowConfig: Option[ContextWindowConfig] = None,  // NEW
  debug: Boolean = false
): Result[AgentState] = {
  previousState.status match {
    case AgentStatus.Complete | AgentStatus.Failed(_) =>
      // Prepare new state
      val stateWithNewMessage = previousState.copy(
        conversation = previousState.conversation.addMessage(UserMessage(newUserMessage)),
        status = AgentStatus.InProgress,
        logs = Seq.empty
      )

      // Optionally prune before running
      val stateToRun = contextWindowConfig match {
        case Some(config) =>
          AgentState.pruneConversation(stateWithNewMessage, config)
        case None =>
          stateWithNewMessage
      }

      run(stateToRun, maxSteps, traceLogPath, debug)

    case _ =>
      Left(ValidationError("Cannot continue from incomplete conversation"))
  }
}

3. Fix `userQuery` Semantics

Option A: Make it Optional

case class AgentState(
  conversation: Conversation,
  tools: ToolRegistry,
  initialQuery: Option[String] = None,  // Renamed and made optional
  status: AgentStatus = AgentStatus.InProgress,
  logs: Seq[String] = Seq.empty,
  systemMessage: Option[SystemMessage] = None,
  completionOptions: CompletionOptions = CompletionOptions()
)

Rationale:

Rename to initialQuery makes purpose clear
Optional because multi-turn conversations don’t have a single query
Keep for backward compatibility and tracing

Option B: Remove it Entirely

case class AgentState(
  conversation: Conversation,
  tools: ToolRegistry,
  // userQuery removed - can be derived from conversation.messages if needed
  status: AgentStatus = AgentStatus.InProgress,
  logs: Seq[String] = Seq.empty,
  systemMessage: Option[SystemMessage] = None,
  completionOptions: CompletionOptions = CompletionOptions()
) {
  /**
   * Get the initial user query (first user message)
   */
  def initialQuery: Option[String] =
    conversation.messages
      .find(_.role == MessageRole.User)
      .map(_.content)
}

Rationale:

Cleaner - no redundant data
Can be derived from conversation
Breaking change but more correct

Recommendation: Use Option A for backward compatibility, consider Option B for v2.0.
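
One possible way to ease callers toward the derived form is a deprecated accessor that forwards to initialQuery. The sketch below is an assumption for illustration only: Msg and State are simplified stand-ins, roles are plain strings, and the deprecated userQuery shim is hypothetical rather than part of this proposal.

```scala
object InitialQuerySketch {
  // Stand-ins for Message and AgentState; types are simplified for the sketch.
  final case class Msg(role: String, content: String)

  final case class State(messages: Vector[Msg]) {
    /** First user message, if any (the derivation described under Option B). */
    def initialQuery: Option[String] =
      messages.collectFirst { case Msg("user", content) => content }

    /** Hypothetical compatibility shim for code that still reads userQuery. */
    @deprecated("Use initialQuery", "hypothetical-next-release")
    def userQuery: String = initialQuery.getOrElse("")
  }
}
```

With a shim like this, existing readers of userQuery keep compiling (with a deprecation warning) while new code uses the Option-returning accessor.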

4. Conversation Persistence Helpers

4.1 Serialization Support

object AgentState {
  /**
   * Serialize agent state to JSON.
   * Note: ToolRegistry is not serialized (contains function references).
   * Tools must be re-registered when loading state.
   */
  def toJson(state: AgentState): ujson.Value = {
    ujson.Obj(
      "conversation" -> writeJs(state.conversation),
      "initialQuery" -> state.initialQuery.map(ujson.Str).getOrElse(ujson.Null),
      "status" -> writeJs(state.status),
      "logs" -> ujson.Arr(state.logs.map(ujson.Str): _*),
      "systemMessage" -> state.systemMessage.map(msg => ujson.Str(msg.content)).getOrElse(ujson.Null),
      "completionOptions" -> writeJs(state.completionOptions)
      // Note: tools are NOT serialized
    )
  }

  /**
   * Deserialize agent state from JSON.
   * Tools must be provided separately as they cannot be serialized.
   */
  def fromJson(
    json: ujson.Value,
    tools: ToolRegistry
  ): Result[AgentState] = {
    Try {
      AgentState(
        conversation = read[Conversation](json("conversation")),
        tools = tools,  // Provided by caller
        initialQuery = json("initialQuery") match {
          case ujson.Str(q) => Some(q)
          case _ => None
        },
        status = read[AgentStatus](json("status")),
        logs = json("logs").arr.map(_.str).toSeq,
        systemMessage = json("systemMessage") match {
          case ujson.Str(content) => Some(SystemMessage(content))
          case _ => None
        },
        completionOptions = read[CompletionOptions](json("completionOptions"))
      )
    }.toResult
  }

  /**
   * Save state to file (convenience method)
   */
  def saveToFile(state: AgentState, path: String): Result[Unit] = {
    import java.nio.file.{Files, Paths}
    import java.nio.charset.StandardCharsets

    Try {
      val json = toJson(state)
      val jsonStr = write(json, indent = 2)
      Files.write(Paths.get(path), jsonStr.getBytes(StandardCharsets.UTF_8))
    }.toResult
  }

  /**
   * Load state from file (convenience method)
   */
  def loadFromFile(path: String, tools: ToolRegistry): Result[AgentState] = {
    import java.nio.file.{Files, Paths}
    import java.nio.charset.StandardCharsets

    for {
      jsonStr <- Try(new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8)).toResult
      json <- Try(ujson.read(jsonStr)).toResult
      state <- fromJson(json, tools)
    } yield state
  }
}

Rationale:

Pure functions - no side effects in core logic
saveToFile/loadFromFile are I/O helpers, clearly separate
Tools cannot be serialized (contain function references) - must be provided on load
Uses existing Result pattern
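
The separation between pure encoding and I/O helpers can be sketched with the standard library alone. The following is an illustration under assumptions: Result is modelled as Either[Throwable, A] (llm4s uses its own error type), a one-message-per-line text format replaces ujson to stay dependency-free, and messages are assumed to be non-empty and free of newlines.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths}
import scala.util.Try

object PersistenceBoundarySketch {
  // Assumption: Result[A] behaves like Either[Throwable, A] for this sketch.
  type Result[A] = Either[Throwable, A]

  // Pure core: no file access. Assumes non-empty messages without newlines.
  def encode(messages: Vector[String]): String = messages.mkString("\n")
  def decode(text: String): Vector[String] =
    if (text.isEmpty) Vector.empty else text.split("\n", -1).toVector

  // I/O boundary: exceptions are captured and returned as Left values.
  def saveToFile(messages: Vector[String], path: String): Result[Unit] =
    Try {
      Files.write(Paths.get(path), encode(messages).getBytes(StandardCharsets.UTF_8))
      ()
    }.toEither

  def loadFromFile(path: String): Result[Vector[String]] =
    Try(new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8)).toEither.map(decode)
}
```

Keeping encode and decode free of I/O means they can be tested without touching the file system, while the two file helpers stay thin wrappers.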

API Examples

Example 1: Multi-Turn Conversation (Functional Style)

import org.llm4s.agent.Agent
import org.llm4s.llmconnect.LLMConnect
import org.llm4s.toolapi.ToolRegistry
import org.llm4s.toolapi.tools.WeatherTool

val result = for {
  client <- LLMConnect.fromEnv()
  tools = new ToolRegistry(Seq(WeatherTool.tool))
  agent = new Agent(client)

  // Turn 1
  state1 <- agent.run("What's the weather in Paris?", tools)
  _ = println(s"Turn 1: ${state1.conversation.lastMessage.map(_.content)}")

  // Turn 2 - continue from state1
  state2 <- agent.continueConversation(state1, "And what about London?")
  _ = println(s"Turn 2: ${state2.conversation.lastMessage.map(_.content)}")

  // Turn 3 - continue from state2
  state3 <- agent.continueConversation(state2, "Which is warmer?")
  _ = println(s"Turn 3: ${state3.conversation.lastMessage.map(_.content)}")

} yield state3

result.fold(
  error => println(s"Error: $error"),
  finalState => println(s"Conversation completed with ${finalState.conversation.messageCount} messages")
)

No var, no mutation, pure functional style!

Example 2: Multi-Turn with Helper Method

val result = for {
  client <- LLMConnect.fromEnv()
  tools = new ToolRegistry(Seq(WeatherTool.tool))
  agent = new Agent(client)

  finalState <- agent.runMultiTurn(
    initialQuery = "What's the weather in Paris?",
    followUpQueries = Seq(
      "And what about London?",
      "Which is warmer?"
    ),
    tools = tools
  )

} yield finalState

result.fold(
  error => println(s"Error: $error"),
  state => {
    println(s"Completed ${state.conversation.messageCount} messages")
    state.conversation.filterByRole(MessageRole.Assistant).foreach { msg =>
      println(s"Assistant: ${msg.content}")
    }
  }
)

Even cleaner - single method call!

Example 3: Context Window Management

import org.llm4s.agent.{ContextWindowConfig, PruningStrategy}

val config = ContextWindowConfig(
  maxMessages = Some(20),  // Keep max 20 messages
  preserveSystemMessage = true,
  minRecentTurns = 3,
  pruningStrategy = PruningStrategy.OldestFirst
)

val result = for {
  client <- LLMConnect.fromEnv()
  tools = new ToolRegistry(Seq(...))
  agent = new Agent(client)

  state1 <- agent.run("First question?", tools)
  state2 <- agent.continueConversation(state1, "Second question?", contextWindowConfig = Some(config))
  state3 <- agent.continueConversation(state2, "Third question?", contextWindowConfig = Some(config))
  // ... many more turns ...

} yield state3

// Conversation automatically pruned to stay within limits

Example 4: Conversation Persistence

import org.llm4s.agent.AgentState

val result = for {
  client <- LLMConnect.fromEnv()
  tools = new ToolRegistry(Seq(...))
  agent = new Agent(client)

  // Start conversation
  state1 <- agent.run("Complex multi-step question?", tools)

  // Save state to file
  _ <- AgentState.saveToFile(state1, "/tmp/conversation-state.json")

  // ... later, in a different session ...

  // Load state from file
  loadedState <- AgentState.loadFromFile("/tmp/conversation-state.json", tools)

  // Continue from loaded state
  state2 <- agent.continueConversation(loadedState, "Follow-up question?")

} yield state2

Pure, no side effects in core logic, I/O clearly separated.

Example 5: Manual Step Execution (Functional Style)

Replacing the imperative while-loop:

import scala.annotation.tailrec

@tailrec
def runStepsUntilComplete(
  agent: Agent,
  state: AgentState,
  maxSteps: Int,
  stepCount: Int = 0
): Result[AgentState] = {
  if (stepCount >= maxSteps) {
    Right(state)  // Reached limit
  } else {
    state.status match {
      case AgentStatus.Complete | AgentStatus.Failed(_) =>
        Right(state)  // Finished

      case AgentStatus.InProgress | AgentStatus.WaitingForTools =>
        agent.runStep(state) match {
          case Right(newState) =>
            println(s"Step ${stepCount + 1}: ${newState.status}")
            runStepsUntilComplete(agent, newState, maxSteps, stepCount + 1)

          case Left(error) =>
            Left(error)
        }
    }
  }
}

// Usage
val result = for {
  client <- LLMConnect.fromEnv()
  tools = new ToolRegistry(Seq(...))
  agent = new Agent(client)
  initialState = agent.initialize("Question?", tools)
  finalState <- runStepsUntilComplete(agent, initialState, maxSteps = 10)
} yield finalState

Tail-recursive, no var, pure functional!

Implementation Plan

Phase 1: Core Continuation API (Week 1)

Tasks:

Add continueConversation method to Agent
Add runMultiTurn method to Agent
Update AgentState to rename userQuery to initialQuery (make optional)
Write unit tests for continuation logic
Update samples to use functional style

Deliverables:

Updated Agent.scala
Updated AgentState.scala
Unit tests in AgentSpec.scala
Updated SingleStepAgentExample.scala and MultiStepAgentExample.scala

Phase 2: Context Window Management (Week 2)

Tasks:

Define ContextWindowConfig and PruningStrategy
Implement AgentState.pruneConversation
Integrate pruning into continueConversation
Add integration with token counting (from context.tokens module)
Write comprehensive tests for pruning strategies
Add example demonstrating long conversations

Deliverables:

ContextWindowConfig.scala
Updated AgentState.scala with pruning logic
PruningSpec.scala with tests
LongConversationExample.scala sample

Phase 3: Persistence Helpers (Week 3)

Tasks:

Implement AgentState.toJson / fromJson
Implement saveToFile / loadFromFile helpers
Add comprehensive serialization tests
Document limitations (tools not serialized)
Add persistence example

Deliverables:

Updated AgentState.scala with serialization
AgentStatePersistenceSpec.scala
ConversationPersistenceExample.scala

Phase 4: Documentation & Migration (Week 4)

Tasks:

Update CLAUDE.md with new APIs
Write migration guide from old imperative style
Update README with multi-turn examples
Add ScalaDoc to all new methods
Review and merge

Deliverables:

Updated documentation
Migration guide
PR ready for review

Testing Strategy

Unit Tests

class AgentContinuationSpec extends AnyFlatSpec with Matchers {

  "Agent.continueConversation" should "add user message to previous state" in {
    val mockClient = mock[LLMClient]
    val agent = new Agent(mockClient)
    val tools = new ToolRegistry(Seq.empty)

    val state1 = agent.initialize("First query", tools)
    val completedState = state1.copy(status = AgentStatus.Complete)

    // Mock the run to return immediately
    val result = agent.continueConversation(completedState, "Second query")

    result shouldBe Right(...)
    // Verify conversation has both messages
  }

  it should "fail if previous state is not complete" in {
    val mockClient = mock[LLMClient]
    val agent = new Agent(mockClient)
    val tools = new ToolRegistry(Seq.empty)

    val state = agent.initialize("Query", tools)
    // State is InProgress

    val result = agent.continueConversation(state, "Next query")

    result.isLeft shouldBe true
  }

  "Agent.runMultiTurn" should "execute all turns sequentially" in {
    // Test with mocked client
    // Verify each turn completes before next starts
  }
}

class ContextWindowPruningSpec extends AnyFlatSpec with Matchers {

  "AgentState.pruneConversation" should "keep messages under limit" in {
    val config = ContextWindowConfig(maxMessages = Some(10))
    val messages = (1 to 20).map(i => UserMessage(s"Message $i"))
    val state = AgentState(
      conversation = Conversation(messages),
      tools = new ToolRegistry(Seq.empty)
    )

    val pruned = AgentState.pruneConversation(state, config)

    pruned.conversation.messageCount shouldBe 10
  }

  it should "preserve system message when configured" in {
    val config = ContextWindowConfig(
      maxMessages = Some(5),
      preserveSystemMessage = true
    )
    val messages = Seq(SystemMessage("System")) ++ (1 to 10).map(i => UserMessage(s"Msg $i"))
    val state = AgentState(Conversation(messages), new ToolRegistry(Seq.empty))

    val pruned = AgentState.pruneConversation(state, config)

    pruned.conversation.messages.head.role shouldBe MessageRole.System
    pruned.conversation.messageCount shouldBe 5
  }

  it should "use custom pruning strategy" in {
    val customStrategy = PruningStrategy.Custom { messages =>
      messages.filter(_.role != MessageRole.Tool)  // Remove all tool messages
    }
    val config = ContextWindowConfig(pruningStrategy = customStrategy)

    // Test custom pruning
  }
}

class AgentStatePersistenceSpec extends AnyFlatSpec with Matchers {

  "AgentState.toJson/fromJson" should "round-trip correctly" in {
    val tools = new ToolRegistry(Seq(...))
    val state = AgentState(...)

    val json = AgentState.toJson(state)
    val loaded = AgentState.fromJson(json, tools)

    loaded shouldBe Right(state.copy(tools = tools))  // Tools are not serialized
  }

  "AgentState.saveToFile/loadFromFile" should "persist to disk" in {
    val tempFile = Files.createTempFile("agent-state", ".json")
    val tools = new ToolRegistry(Seq(...))
    val state = AgentState(...)

    AgentState.saveToFile(state, tempFile.toString) shouldBe Right(())
    val loaded = AgentState.loadFromFile(tempFile.toString, tools)

    loaded shouldBe Right(state.copy(tools = tools))
  }
}

Integration Tests

class AgentMultiTurnIntegrationSpec extends AnyFlatSpec with Matchers {

  "Multi-turn conversation" should "work end-to-end with real LLM" in {
    // This test requires API key - mark as integration test
    val result = for {
      client <- LLMConnect.fromEnv()
      tools = new ToolRegistry(Seq(...))
      agent = new Agent(client)

      state1 <- agent.run("What's 2+2?", tools)
      state2 <- agent.continueConversation(state1, "Now multiply that by 3")

    } yield state2

    result.isRight shouldBe true
    result.foreach { state =>
      // Two turns produce at least two user and two assistant messages
      state.conversation.messageCount should be >= 4
      state.status shouldBe AgentStatus.Complete
    }
  }
}

Migration Guide

From Imperative to Functional Style

Old Style (with `var`):

var state = agent.initialize(query, tools)
var stepCount = 0

while (state.status == AgentStatus.InProgress && stepCount < 10) {
  agent.runStep(state) match {
    case Right(newState) =>
      state = newState
      stepCount += 1
    case Left(error) =>
      state = state.withStatus(AgentStatus.Failed(error.toString))
  }
}

New Style (functional):

import scala.annotation.tailrec

@tailrec
def runSteps(state: AgentState, remaining: Int): Result[AgentState] = {
  // Stop on any terminal status, not only Complete
  val finished = state.status match {
    case AgentStatus.Complete | AgentStatus.Failed(_) => true
    case _                                            => false
  }
  if (remaining <= 0 || finished) {
    Right(state)
  } else {
    agent.runStep(state) match {
      case Right(newState) => runSteps(newState, remaining - 1)
      case Left(error) => Left(error)
    }
  }
}

val finalState = runSteps(agent.initialize(query, tools), remaining = 10)

Or simply use the built-in agent.run() which already does this!

Old Style (manual multi-turn):

val state1 = agent.initialize("First query", tools)
val result1 = agent.run(state1, None, None, false)

val state2 = result1.map { s =>
  s.copy(
    conversation = s.conversation.addMessage(UserMessage("Second query")),
    userQuery = "Second query",
    status = AgentStatus.InProgress
  )
}

val result2 = state2.flatMap(s => agent.run(s, None, None, false))

New Style (continuation API):

val result = for {
  state1 <- agent.run("First query", tools)
  state2 <- agent.continueConversation(state1, "Second query")
} yield state2

Breaking Changes

AgentState.userQuery renamed to initialQuery and made optional
- Impact: Low - mostly internal field
- Migration: Update code that reads state.userQuery to state.initialQuery.getOrElse("")
All other changes are additive.

Future Considerations

Potential Enhancements

Conversation Branching

// Fork a conversation to explore alternative paths
def fork(state: AgentState): AgentState
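
Because states are immutable values, branching may need little more than reusing the same state twice. The sketch below illustrates that idea with hypothetical stand-ins: State is a simplified transcript and continueWith replaces continueConversation with a canned reply.

```scala
object BranchingSketch {
  // Simplified stand-in for AgentState.
  final case class State(messages: Vector[String])

  // Stand-in for continueConversation that appends a canned reply.
  def continueWith(state: State, query: String): State =
    State(state.messages :+ s"user: $query" :+ s"assistant: reply to '$query'")

  // Two branches continue from the same shared state without affecting each other.
  val shared: State  = continueWith(State(Vector.empty), "Plan a trip")
  val branchA: State = continueWith(shared, "Make it cheap")
  val branchB: State = continueWith(shared, "Make it fast")
}
```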

Conversation Merging

// Merge two conversation branches (complex!)
def merge(state1: AgentState, state2: AgentState): Result[AgentState]

Conversation Replay

// Replay a conversation with different tools or prompts
def replay(state: AgentState, newTools: ToolRegistry): Result[AgentState]

Token Counting Integration

// Integration with modules/core/src/main/scala/org/llm4s/context/tokens/
def estimateTokens(state: AgentState): Int

Automatic Summarization

// Use LLM to summarize old parts of conversation before pruning
def summarizeAndPrune(state: AgentState, config: ContextWindowConfig): AsyncResult[AgentState]

Conversation Templates

// Pre-defined conversation flows
object ConversationTemplates {
  def questionAnswer: ConversationTemplate
  def debuggingSession: ConversationTemplate
  def researchTask: ConversationTemplate
}

Conclusion

This design maintains llm4s’s functional programming principles while providing a clean, ergonomic API for multi-turn conversations:

✅ Pure Functions - No mutable state, all operations return new states
✅ Explicit State Flow - State threading visible in code
✅ Immutable Data - All structures immutable
✅ Composable - Works well with Result and for-comprehensions
✅ Type-Safe - Leverages Scala’s type system
✅ Easy to Use - Helper methods reduce boilerplate
✅ Production-Ready - Context window management, persistence support

The API is simpler than OpenAI’s (no mutable session objects) while being more correct (functional, explicit) and more powerful (compile-time safety, composability).

This is the llm4s way - functional purity with practical ergonomics.

Next Steps:

Review and approve design
Begin Phase 1 implementation
Gather feedback from initial users
Iterate based on real-world usage
