Phase 1.2: Guardrails Framework
Date: 2025-01-16
Status: Design Phase
Priority: ⭐⭐⭐⭐⭐ Critical for Production
Effort: 2-3 weeks
Phase: 1.2 - Core Usability
Dependencies: Phase 1.1 (Functional Conversation Management)
Table of Contents
- Executive Summary
- Background & Motivation
- Design Goals
- Core Concepts
- Proposed API
- Implementation Details
- Integration with Existing Features
- Testing Strategy
- Documentation Plan
- Examples
- Appendix
Executive Summary
Problem Statement
llm4s currently lacks a standardized framework for input/output validation in agent workflows. Users must implement manual validation logic, which:
- Increases code complexity - Validation scattered across codebase
- Lacks composability - Hard to reuse validation logic
- No standardization - Each team implements differently
- Production risk - Easy to forget critical safety checks
Current State:
// Manual validation - verbose and error-prone
def runAgent(query: String): Result[AgentState] = {
if (query.isEmpty) {
Left(ValidationError.invalid("query", "Query cannot be empty"))
} else if (query.length > 10000) {
Left(ValidationError.invalid("query", "Query too long"))
} else if (containsProfanity(query)) {
Left(ValidationError.invalid("query", "Query contains inappropriate content"))
} else {
agent.run(query, tools)
}
}
Solution
Implement a declarative, composable guardrails framework that provides:
- ✅ Type-safe validation - Compile-time checking of guardrail types
- ✅ Composability - Chain and combine guardrails functionally
- ✅ Reusability - Built-in guardrails + custom extension points
- ✅ Parallel execution - Multiple guardrails run concurrently
- ✅ Clear semantics - Input vs. output guardrails with explicit flow
Proposed State:
// Declarative validation - clear and composable
agent.run(
query,
tools,
inputGuardrails = Seq(
LengthCheck(min = 1, max = 10000),
ProfanityFilter(),
CustomValidator(myValidationLogic)
),
outputGuardrails = Seq(
JSONValidator(schema),
ToneValidator(allowedTones = Set(Tone.Professional, Tone.Friendly))
)
)
Design Philosophy Alignment
This design adheres to llm4s core principles:
| Principle | How Guardrails Framework Achieves It |
|---|---|
| Functional & Immutable | Guardrails are pure functions A => Result[A] |
| Framework Agnostic | No dependencies on Cats Effect, ZIO, etc. |
| Simplicity Over Cleverness | Clear trait hierarchy, descriptive names |
| Principle of Least Surprise | Validates input before processing, output before returning |
| Type Safety | Compile-time checking of guardrail types |
Key Benefits
- Production Safety - Standardized validation prevents common mistakes
- Developer Experience - Declarative API reduces boilerplate
- Composability - Build complex validation from simple components
- Performance - Parallel validation execution by default
- Flexibility - Easy to add custom guardrails
Background & Motivation
Comparison with Other Frameworks
OpenAI Agents SDK
OpenAI Approach:
# Declarative guardrails with validation functions
agent = Agent(
input_guardrails=[profanity_filter, length_check],
output_guardrails=[fact_check, tone_validator]
)
Features:
- Input and output validation
- Parallel execution of guardrails
- Debounced validation for real-time agents
- Exception-based error handling
PydanticAI
PydanticAI Approach:
# Type-safe validation via Pydantic models
class QueryInput(BaseModel):
text: str = Field(min_length=1, max_length=10000)
language: str = Field(pattern="^[a-z]{2}$")
@agent.tool
async def process_query(input: QueryInput) -> ResponseOutput:
# Pydantic validates input automatically
...
Features:
- Strong runtime validation via Pydantic
- Type hints for IDE support
- Automatic validation on function calls
- Detailed error messages
Gap Analysis
| Feature | OpenAI SDK | PydanticAI | llm4s Current | llm4s Proposed |
|---|---|---|---|---|
| Input Validation | ✅ Guardrails | ✅ Pydantic | ❌ Manual | ✅ Guardrails |
| Output Validation | ✅ Guardrails | ✅ Pydantic | ❌ Manual | ✅ Guardrails |
| Composability | ⚠️ List-based | ⚠️ Model-based | ❌ None | ✅ Functional |
| Type Safety | ❌ Runtime | ⚠️ Runtime + hints | ✅ Compile-time | ✅ Compile-time |
| Parallel Execution | ✅ Yes | ⚠️ Sequential | ❌ N/A | ✅ Yes |
| Custom Validation | ✅ Functions | ✅ Validators | ⚠️ Manual | ✅ Trait extension |
| Error Handling | ⚠️ Exceptions | ⚠️ Exceptions | ✅ Result | ✅ Result |
llm4s Unique Advantages:
- Compile-time type safety - Catch errors before runtime
- Result-based errors - Explicit error handling, no exceptions
- Functional composition - Guardrails compose like pure functions
- Framework agnostic - Works with any effect system
Design Goals
Primary Goals
- Declarative Validation API ✅
  - Express validation intent clearly
  - Separate validation logic from business logic
  - Minimize boilerplate code
- Type-Safe Guardrail Composition ✅
  - Compile-time checking of guardrail types
  - Type-safe input/output guardrail distinction
  - Composable validation logic
- Functional Purity ✅
  - Guardrails are pure functions `A => Result[A]`
  - No side effects in validation logic
  - Referentially transparent
- Performance ✅
  - Parallel execution of independent guardrails
  - Fail-fast on first error (configurable)
  - Minimal overhead for disabled guardrails
- Extensibility ✅
  - Easy to implement custom guardrails
  - Composable validation combinators
  - Plugin architecture for third-party guardrails
Non-Goals
- ❌ Pydantic-style model validation - Not implementing a full data validation framework
- ❌ Async guardrails - Phase 1.2 is synchronous only (async in Phase 2.2)
- ❌ Debounced validation - Not needed for non-streaming use cases
- ❌ Runtime type coercion - Scala's type system handles this at compile time
Core Concepts
Guardrail Trait
The fundamental abstraction is a pure function that validates and potentially transforms a value:
trait Guardrail[A] {
/**
* Validate a value, returning the value if valid or an error if invalid.
*
* This is a PURE FUNCTION - no side effects allowed.
*
* @param value The value to validate
* @return Right(value) if valid, Left(error) if invalid
*/
def validate(value: A): Result[A]
/**
* Name of this guardrail for logging and error messages.
*/
def name: String
/**
* Optional description of what this guardrail validates.
*/
def description: Option[String] = None
}
Key Properties:
- Pure function - Same input always produces same output
- Type-safe - Generic type `A` ensures type safety
- Result-based - Uses `Result[A]` for explicit error handling
- Self-describing - Name and description for debugging
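For illustration, a minimal guardrail written directly against the trait might look like this (a sketch; `NonBlankCheck` is a hypothetical name, and `ValidationError.invalid` is the error constructor used throughout this document):

import org.llm4s.error.ValidationError
import org.llm4s.types.Result

// Minimal sketch: rejects blank input. Pure - same input, same Result.
object NonBlankCheck extends Guardrail[String] {
  def validate(value: String): Result[String] =
    if (value.trim.nonEmpty) Right(value)
    else Left(ValidationError.invalid("input", "Input must not be blank"))

  val name: String = "NonBlankCheck"
  override val description: Option[String] = Some("Rejects blank or whitespace-only input")
}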
Input vs. Output Guardrails
Guardrails are specialized based on where they apply in the agent flow:
/**
* Validates user input before agent processing.
*
* Input guardrails run BEFORE the LLM is called, validating:
* - User queries
* - System prompts
* - Tool arguments
*/
trait InputGuardrail extends Guardrail[String]
/**
* Validates agent output before returning to user.
*
* Output guardrails run AFTER the LLM responds, validating:
* - Assistant messages
* - Tool results
* - Final responses
*/
trait OutputGuardrail extends Guardrail[String]
Why separate traits?
- Clarity - Explicit about validation timing
- Type safety - Prevent using output guardrails on input
- Different concerns - Input checks for safety, output checks for quality
- Composition - Can compose input guardrails separately from output
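The type-safety point can be made concrete with a short sketch against the enhanced `Agent.run` signature proposed later in this document (`agent`, `query`, and `tools` assumed in scope):

val professionalOnly: OutputGuardrail = new ToneValidator(Set(Tone.Professional))

// Compiles: a Seq[OutputGuardrail] where outputGuardrails is expected
agent.run(query, tools, outputGuardrails = Seq(professionalOnly))

// Does not compile: an OutputGuardrail is not an InputGuardrail
// agent.run(query, tools, inputGuardrails = Seq(professionalOnly))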
Validation Flow
┌─────────────────┐
│   User Query    │
└────────┬────────┘
         │
         ├──► Input Guardrails (parallel)
         │      ├── ProfanityFilter
         │      ├── LengthCheck
         │      └── CustomValidator
         │
         │ (If all pass)
         ├──► Agent.run() - LLM processing
         │
         │ (LLM generates response)
         ├──► Output Guardrails (parallel)
         │      ├── JSONValidator
         │      ├── ToneValidator
         │      └── FactChecker
         │
         │ (If all pass)
         └──► Return Result[AgentState]
Validation Modes
How should multiple guardrails be evaluated?
sealed trait ValidationMode
object ValidationMode {
/**
* All guardrails must pass (default).
* Runs all guardrails even if some fail, aggregating all errors.
*/
case object All extends ValidationMode
/**
* At least one guardrail must pass.
* Returns success on first passing guardrail.
*/
case object Any extends ValidationMode
/**
* Returns on first result (success or failure).
* Useful for expensive guardrails where order matters.
*/
case object First extends ValidationMode
}
Use Cases:
- All (default): Safety checks - all must pass (profanity + length + custom)
- Any: Content detection - at least one must match (language detection)
- First: Expensive checks - stop at first definitive result (API-based validation)
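A short sketch of how the three modes treat the same over-long input, using built-in guardrails defined later in this document (note that the default `ProfanityFilter` word list is intentionally empty, so it passes everything):

val checks: Seq[Guardrail[String]] = Seq(
  new LengthCheck(min = 1, max = 10), // fails on long input
  new ProfanityFilter()               // passes (default word list is empty)
)
val input = "this input is far too long"

// All: every guardrail runs; LengthCheck fails, so the composite fails
CompositeGuardrail.all(checks).validate(input)   // Left(aggregated errors)

// Any: ProfanityFilter passes, so the composite succeeds
CompositeGuardrail.any(checks).validate(input)   // Right(input)

// First: only the first guardrail's verdict counts - here LengthCheck's failure
new CompositeGuardrail(checks, ValidationMode.First).validate(input) // Left(...)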
Proposed API
1. Core Guardrail Traits
package org.llm4s.agent.guardrails
import org.llm4s.types.Result
/**
* Base trait for all guardrails.
*
* A guardrail is a pure function that validates a value of type A.
*
* @tparam A The type of value to validate
*/
trait Guardrail[A] {
/**
* Validate a value.
*
* @param value The value to validate
* @return Right(value) if valid, Left(error) if invalid
*/
def validate(value: A): Result[A]
/**
* Name of this guardrail for logging and error messages.
*/
def name: String
/**
* Optional description of what this guardrail validates.
*/
def description: Option[String] = None
/**
* Compose this guardrail with another sequentially.
*
* @param other The guardrail to run after this one
* @return A composite guardrail that runs both in sequence
*/
def andThen(other: Guardrail[A]): Guardrail[A] =
CompositeGuardrail.sequential(Seq(this, other))
  /**
   * Adapt this guardrail to validate values of another type.
   *
   * A validator is contravariant in its input: to validate a B, first
   * convert it to an A, then apply this guardrail. (A covariant map
   * cannot be implemented soundly - a Guardrail[A] has no way to
   * validate a B directly - so contramap is provided instead.)
   */
  def contramap[B](f: B => A): Guardrail[B] = new Guardrail[B] {
    def validate(value: B): Result[B] =
      Guardrail.this.validate(f(value)).map(_ => value)
    def name: String = Guardrail.this.name
    override def description: Option[String] = Guardrail.this.description
  }
}
/**
* Validates user input before agent processing.
*/
trait InputGuardrail extends Guardrail[String] {
/**
* Optional: Transform the input after validation.
* Default is identity (no transformation).
*/
def transform(input: String): String = input
}
/**
* Validates agent output before returning to user.
*/
trait OutputGuardrail extends Guardrail[String] {
/**
* Optional: Transform the output after validation.
* Default is identity (no transformation).
*/
def transform(output: String): String = output
}
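To illustrate the transform hook, here is a sketch of a guardrail that validates and then normalizes input. It assumes the agent applies `transform` only after `validate` succeeds, which this design implies but does not yet pin down:

import org.llm4s.error.ValidationError
import org.llm4s.types.Result

// Sketch: rejects blank input, then collapses runs of whitespace.
class WhitespaceNormalizer extends InputGuardrail {
  def validate(value: String): Result[String] =
    if (value.trim.nonEmpty) Right(value)
    else Left(ValidationError.invalid("input", "Input must not be blank"))

  // Applied after successful validation (assumed call order)
  override def transform(input: String): String =
    input.trim.replaceAll("\\s+", " ")

  val name: String = "WhitespaceNormalizer"
  override val description: Option[String] = Some("Trims and collapses whitespace")
}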
2. Built-in Guardrails
package org.llm4s.agent.guardrails.builtin
import org.llm4s.agent.guardrails.{InputGuardrail, OutputGuardrail}
import org.llm4s.error.ValidationError
import org.llm4s.types.Result
/**
* Validates string length is within bounds.
*
* @param min Minimum length (inclusive)
* @param max Maximum length (inclusive)
*/
class LengthCheck(min: Int, max: Int) extends InputGuardrail {
require(min >= 0, "Minimum length must be non-negative")
require(max >= min, "Maximum length must be >= minimum length")
def validate(value: String): Result[String] =
if (value.length < min) {
Left(ValidationError.invalid(
"input",
s"Input too short: ${value.length} characters (minimum: $min)"
))
} else if (value.length > max) {
Left(ValidationError.invalid(
"input",
s"Input too long: ${value.length} characters (maximum: $max)"
))
} else {
Right(value)
}
val name: String = "LengthCheck"
override val description: Option[String] = Some(s"Validates length between $min and $max characters")
}
/**
* Filters profanity and inappropriate content.
*
* This is a basic implementation using a word list.
* For production, consider integrating with external APIs like:
* - OpenAI Moderation API
* - Google Perspective API
* - Custom ML models
*/
class ProfanityFilter(
customBadWords: Set[String] = Set.empty,
caseSensitive: Boolean = false
) extends InputGuardrail with OutputGuardrail {
// Default bad words list (basic example - expand for production)
private val defaultBadWords: Set[String] = Set(
// Add actual profanity list here
// This is intentionally minimal for example purposes
)
private val badWords: Set[String] = {
val combined = defaultBadWords ++ customBadWords
if (caseSensitive) combined else combined.map(_.toLowerCase)
}
def validate(value: String): Result[String] = {
val checkValue = if (caseSensitive) value else value.toLowerCase
val words = checkValue.split("\\s+")
val foundBadWords = words.filter(badWords.contains)
if (foundBadWords.nonEmpty) {
Left(ValidationError.invalid(
"input",
s"Input contains inappropriate content: ${foundBadWords.mkString(", ")}"
))
} else {
Right(value)
}
}
val name: String = "ProfanityFilter"
override val description: Option[String] = Some("Filters profanity and inappropriate content")
}
/**
* Validates that output is valid JSON matching an optional schema.
*
* @param schema Optional JSON schema to validate against
*/
class JSONValidator(schema: Option[ujson.Value] = None) extends OutputGuardrail {
def validate(value: String): Result[String] = {
// Try to parse as JSON
val parseResult = scala.util.Try(ujson.read(value)).toEither.left.map { ex =>
ValidationError.invalid("output", s"Output is not valid JSON: ${ex.getMessage}")
}
// If schema provided, validate against it
parseResult.flatMap { json =>
schema match {
case Some(_) =>
// TODO: Implement JSON schema validation
// For now, just check that it parses
Right(value)
case None =>
Right(value)
}
}
}
val name: String = "JSONValidator"
override val description: Option[String] = Some("Validates output is valid JSON")
}
/**
* Validates that output matches a regular expression.
*
* @param pattern The regex pattern to match
* @param errorMessage Optional custom error message
*/
class RegexValidator(
pattern: scala.util.matching.Regex,
errorMessage: Option[String] = None
) extends Guardrail[String] {
def validate(value: String): Result[String] =
if (pattern.findFirstIn(value).isDefined) {
Right(value)
} else {
Left(ValidationError.invalid(
"value",
errorMessage.getOrElse(s"Value does not match pattern: $pattern")
))
}
val name: String = "RegexValidator"
override val description: Option[String] = Some(s"Validates against pattern: $pattern")
}
/**
* Validates that output matches one of the allowed tones.
*
* This is a simple keyword-based implementation.
* For production, consider using sentiment analysis APIs.
*/
class ToneValidator(allowedTones: Set[Tone]) extends OutputGuardrail {
def validate(value: String): Result[String] = {
val detectedTone = detectTone(value)
if (allowedTones.contains(detectedTone)) {
Right(value)
} else {
Left(ValidationError.invalid(
"output",
s"Output tone ($detectedTone) not allowed. Allowed tones: ${allowedTones.mkString(", ")}"
))
}
}
private def detectTone(text: String): Tone = {
// Simple keyword-based detection (improve for production)
val lower = text.toLowerCase
if (lower.contains("!") && lower.split("[.!?]").exists(_.split("\\s+").length < 5)) {
Tone.Excited
} else if (lower.matches(".*\\b(please|thank you|kindly)\\b.*")) {
Tone.Professional
} else if (lower.matches(".*\\b(hey|cool|awesome)\\b.*")) {
Tone.Casual
} else {
Tone.Neutral
}
}
val name: String = "ToneValidator"
override val description: Option[String] = Some(s"Validates tone is one of: ${allowedTones.mkString(", ")}")
}
sealed trait Tone
object Tone {
case object Professional extends Tone
case object Casual extends Tone
case object Friendly extends Tone
case object Formal extends Tone
case object Excited extends Tone
case object Neutral extends Tone
}
3. Composite Guardrail
package org.llm4s.agent.guardrails
/**
* Combines multiple guardrails with configurable validation mode.
*
* @param guardrails The guardrails to combine
* @param mode How to combine validation results
*/
class CompositeGuardrail[A](
guardrails: Seq[Guardrail[A]],
mode: ValidationMode = ValidationMode.All
) extends Guardrail[A] {
def validate(value: A): Result[A] = mode match {
case ValidationMode.All =>
validateAll(value)
case ValidationMode.Any =>
validateAny(value)
case ValidationMode.First =>
validateFirst(value)
}
private def validateAll(value: A): Result[A] = {
val results = guardrails.map(_.validate(value))
val errors = results.collect { case Left(err) => err }
if (errors.isEmpty) {
Right(value)
} else {
// Aggregate all errors
Left(ValidationError.invalid(
"composite",
s"Multiple validation failures: ${errors.map(_.formatted).mkString("; ")}"
))
}
}
private def validateAny(value: A): Result[A] = {
val results = guardrails.map(_.validate(value))
val successes = results.collect { case Right(v) => v }
if (successes.nonEmpty) {
Right(successes.head)
} else {
val errors = results.collect { case Left(err) => err }
Left(ValidationError.invalid(
"composite",
s"All validations failed: ${errors.map(_.formatted).mkString("; ")}"
))
}
}
private def validateFirst(value: A): Result[A] = {
guardrails.headOption match {
case Some(guardrail) => guardrail.validate(value)
case None => Right(value)
}
}
val name: String = s"CompositeGuardrail(${guardrails.map(_.name).mkString(", ")})"
override val description: Option[String] = Some(
s"Composite guardrail with mode=$mode: ${guardrails.map(_.name).mkString(", ")}"
)
}
object CompositeGuardrail {
/**
* Create a composite guardrail that validates all guardrails.
*/
def all[A](guardrails: Seq[Guardrail[A]]): CompositeGuardrail[A] =
new CompositeGuardrail(guardrails, ValidationMode.All)
/**
* Create a composite guardrail that validates any guardrail.
*/
def any[A](guardrails: Seq[Guardrail[A]]): CompositeGuardrail[A] =
new CompositeGuardrail(guardrails, ValidationMode.Any)
/**
* Create a composite guardrail that runs guardrails sequentially.
*/
def sequential[A](guardrails: Seq[Guardrail[A]]): Guardrail[A] = new Guardrail[A] {
def validate(value: A): Result[A] =
guardrails.foldLeft[Result[A]](Right(value)) { (acc, guardrail) =>
acc.flatMap(guardrail.validate)
}
val name: String = s"SequentialGuardrail(${guardrails.map(_.name).mkString(" -> ")})"
}
}
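Note that `validateAll` above evaluates guardrails sequentially. The design goals (and the performance test in the Testing Strategy section) assume parallel evaluation. A minimal sketch of a parallel variant using plain `scala.concurrent`, keeping the synchronous `Result[A]` contract; the object name and the 30-second timeout are illustrative assumptions:

import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._
import org.llm4s.error.ValidationError
import org.llm4s.types.Result

object ParallelValidation {
  // Runs each guardrail concurrently, then aggregates errors exactly as
  // the sequential validateAll does. Blocking on Await preserves the
  // synchronous API required by Phase 1.2 (async guardrails are Phase 2.2).
  def validateAllParallel[A](guardrails: Seq[Guardrail[A]], value: A)(
    implicit ec: ExecutionContext
  ): Result[A] = {
    val futures = guardrails.map(g => Future(g.validate(value)))
    val results = Await.result(Future.sequence(futures), 30.seconds)
    val errors  = results.collect { case Left(err) => err }
    if (errors.isEmpty) Right(value)
    else
      Left(ValidationError.invalid(
        "composite",
        s"Multiple validation failures: ${errors.map(_.formatted).mkString("; ")}"
      ))
  }
}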
4. Enhanced Agent API
package org.llm4s.agent
import org.llm4s.agent.guardrails.{InputGuardrail, OutputGuardrail}
class Agent(client: LLMClient) {
/**
* Run the agent with optional input/output guardrails.
*
* @param query User query
* @param tools Available tools
* @param inputGuardrails Validate query before processing (default: none)
* @param outputGuardrails Validate response before returning (default: none)
* @param maxSteps Maximum agent steps (default: 10)
* @param traceLogPath Optional trace log file path
* @param debug Enable debug logging
* @return Agent state or validation error
*/
def run(
query: String,
tools: ToolRegistry,
inputGuardrails: Seq[InputGuardrail] = Seq.empty,
outputGuardrails: Seq[OutputGuardrail] = Seq.empty,
maxSteps: Option[Int] = Some(10),
traceLogPath: Option[String] = None,
debug: Boolean = false
): Result[AgentState] = {
for {
// 1. Validate input
validatedQuery <- validateInput(query, inputGuardrails)
// 2. Initialize and run agent
initialState = initialize(validatedQuery, tools, None, debug)
finalState <- run(initialState, maxSteps, traceLogPath, debug)
// 3. Validate output
validatedState <- validateOutput(finalState, outputGuardrails)
} yield validatedState
}
/**
* Continue a conversation with optional guardrails.
*
* @param previousState Previous agent state (must be Complete or Failed)
* @param newUserMessage New user message
* @param inputGuardrails Validate new message before processing
* @param outputGuardrails Validate response before returning
* @param maxSteps Maximum agent steps
* @param traceLogPath Optional trace log file
* @param contextWindowConfig Optional context window management
* @param debug Enable debug logging
* @return Updated agent state or validation error
*/
def continueConversation(
previousState: AgentState,
newUserMessage: String,
inputGuardrails: Seq[InputGuardrail] = Seq.empty,
outputGuardrails: Seq[OutputGuardrail] = Seq.empty,
maxSteps: Option[Int] = None,
traceLogPath: Option[String] = None,
contextWindowConfig: Option[ContextWindowConfig] = None,
debug: Boolean = false
): Result[AgentState] = {
for {
// 1. Validate input
validatedMessage <- validateInput(newUserMessage, inputGuardrails)
// 2. Continue conversation
finalState <- continueConversation( // delegates to the existing Phase 1.1 overload
previousState,
validatedMessage,
maxSteps,
traceLogPath,
contextWindowConfig,
debug
)
// 3. Validate output
validatedState <- validateOutput(finalState, outputGuardrails)
} yield validatedState
}
/**
* Validate input using guardrails.
*/
private def validateInput(
query: String,
guardrails: Seq[InputGuardrail]
): Result[String] = {
if (guardrails.isEmpty) {
Right(query)
} else {
// Combine guardrails in All mode and aggregate errors
// (see the parallel execution sketch in the CompositeGuardrail section)
val composite = CompositeGuardrail.all(guardrails)
composite.validate(query)
}
}
/**
* Validate output using guardrails.
*/
private def validateOutput(
state: AgentState,
guardrails: Seq[OutputGuardrail]
): Result[AgentState] = {
if (guardrails.isEmpty) {
Right(state)
} else {
// Extract final assistant message
val finalMessage = state.conversation.messages
.findLast(_.role == MessageRole.Assistant)
.map(_.content)
.getOrElse("")
// Validate final message
val composite = CompositeGuardrail.all(guardrails)
composite.validate(finalMessage).map(_ => state)
}
}
}
Implementation Details
Module Structure
modules/core/src/main/scala/org/llm4s/agent/guardrails/
├── Guardrail.scala           # Base trait
├── InputGuardrail.scala      # Input validation trait
├── OutputGuardrail.scala     # Output validation trait
├── ValidationMode.scala      # Validation mode enum
├── CompositeGuardrail.scala  # Composite guardrail
└── builtin/                  # Built-in guardrails
    ├── LengthCheck.scala
    ├── ProfanityFilter.scala
    ├── JSONValidator.scala
    ├── RegexValidator.scala
    └── ToneValidator.scala
Implementation Phases
Phase 1: Core Framework (Week 1)
Tasks:
- Implement `Guardrail[A]` trait
- Implement `InputGuardrail` and `OutputGuardrail` traits
- Implement `ValidationMode` enum
- Implement `CompositeGuardrail`
- Add tests for composition
Deliverables:
- Core guardrail framework
- Composition utilities
- Unit tests
Phase 2: Built-in Guardrails (Week 1-2)
Tasks:
- Implement `LengthCheck`
- Implement `ProfanityFilter`
- Implement `JSONValidator`
- Implement `RegexValidator`
- Implement `ToneValidator`
- Add tests for each guardrail
Deliverables:
- 5 built-in guardrails
- Comprehensive tests
- Documentation
Phase 3: Agent Integration (Week 2)
Tasks:
- Enhance `Agent.run()` with guardrail parameters
- Enhance `Agent.continueConversation()` with guardrail parameters
- Implement `validateInput()` and `validateOutput()` helpers
- Add integration tests
- Update trace logging to include validation
Deliverables:
- Enhanced Agent API
- Integration tests
- Updated trace logs
Phase 4: Documentation & Examples (Week 2-3)
Tasks:
- Write user guide for guardrails
- Create custom guardrail tutorial
- Add examples to samples module
- Update CLAUDE.md with guardrails section
- Create migration guide
Deliverables:
- Comprehensive documentation
- 3+ working examples
- Migration guide
Integration with Existing Features
Integration with Phase 1.1 (Conversation Management)
Guardrails integrate seamlessly with multi-turn conversations:
// Multi-turn conversation with consistent guardrails
val inputGuardrails = Seq(
LengthCheck(min = 1, max = 5000),
ProfanityFilter()
)
val outputGuardrails = Seq(
ToneValidator(allowedTones = Set(Tone.Professional, Tone.Friendly))
)
val result = for {
// First turn - guardrails apply
state1 <- agent.run(
"What is Scala?",
tools,
inputGuardrails = inputGuardrails,
outputGuardrails = outputGuardrails
)
// Second turn - same guardrails apply
state2 <- agent.continueConversation(
state1,
"What are its main features?",
inputGuardrails = inputGuardrails,
outputGuardrails = outputGuardrails
)
// Third turn - different guardrails
state3 <- agent.continueConversation(
state2,
"Generate a code example in JSON format",
inputGuardrails = inputGuardrails,
outputGuardrails = Seq(JSONValidator()) // Require JSON output
)
} yield state3
Key Points:
- Guardrails are optional parameters on each turn
- Can change guardrails between turns
- Validation happens per-turn, not per-conversation
Integration with Phase 1.3 (Handoffs) - Future
When handoffs are implemented, guardrails will apply per-agent:
// Each agent has its own guardrails
val agentA = new Agent(client)
val agentB = new Agent(client)
// Agent A validates with strict rules
val stateA = agentA.run(
query,
toolsA,
inputGuardrails = Seq(LengthCheck(max = 1000), ProfanityFilter()),
handoffs = Seq(Handoff(agentB))
)
// Agent B (after handoff) has its own guardrails
val stateB = agentB.run(
handoffQuery,
toolsB,
inputGuardrails = Seq(LengthCheck(max = 5000)), // Different rules!
outputGuardrails = Seq(JSONValidator())
)
Design Decision: Guardrails are not inherited across handoffs. Each agent is responsible for its own validation.
Integration with Trace Logging
Guardrail validation results are logged in trace files:
# Agent Execution Trace
## Step 1: Input Validation
**Guardrails:** LengthCheck, ProfanityFilter
✅ **LengthCheck**: PASSED (1234 characters, max: 5000)
✅ **ProfanityFilter**: PASSED
## Step 2: Agent Processing
...
## Step 10: Output Validation
**Guardrails:** ToneValidator
✅ **ToneValidator**: PASSED (tone: Professional, allowed: [Professional, Friendly])
## Final Status: Complete
Testing Strategy
Unit Tests
Guardrail Tests
class LengthCheckSpec extends AnyFlatSpec with Matchers {
"LengthCheck" should "pass for valid length" in {
val guardrail = new LengthCheck(min = 1, max = 100)
val result = guardrail.validate("Hello, world!")
result shouldBe Right("Hello, world!")
}
it should "fail for too short input" in {
val guardrail = new LengthCheck(min = 10, max = 100)
val result = guardrail.validate("Hi")
result.isLeft shouldBe true
}
it should "fail for too long input" in {
val guardrail = new LengthCheck(min = 1, max = 10)
val result = guardrail.validate("This is way too long")
result.isLeft shouldBe true
}
}
class CompositeGuardrailSpec extends AnyFlatSpec with Matchers {
"CompositeGuardrail.all" should "pass if all guardrails pass" in {
val composite = CompositeGuardrail.all(Seq(
new LengthCheck(1, 100),
new ProfanityFilter()
))
val result = composite.validate("Hello, world!")
result shouldBe Right("Hello, world!")
}
it should "fail if any guardrail fails" in {
val composite = CompositeGuardrail.all(Seq(
new LengthCheck(1, 10), // Will fail
new ProfanityFilter()
))
val result = composite.validate("This is too long")
result.isLeft shouldBe true
}
}
Integration Tests
Agent Integration Tests
class AgentGuardrailsIntegrationSpec extends AnyFlatSpec with Matchers {
"Agent.run" should "validate input before processing" in {
val agent = new Agent(mockClient)
val result = agent.run(
"", // Empty query
tools,
inputGuardrails = Seq(new LengthCheck(min = 1, max = 100))
)
result.isLeft shouldBe true
result.left.get.formatted should include("too short")
}
it should "validate output before returning" in {
val agent = new Agent(mockClient)
// Mock client returns invalid JSON
when(mockClient.complete(*, *))
.thenReturn(Right(CompletionResponse(content = "Not JSON")))
val result = agent.run(
"Generate JSON",
tools,
outputGuardrails = Seq(new JSONValidator())
)
result.isLeft shouldBe true
result.left.get.formatted should include("not valid JSON")
}
it should "pass when all guardrails pass" in {
val agent = new Agent(mockClient)
when(mockClient.complete(*, *))
.thenReturn(Right(CompletionResponse(content = "Valid response")))
val result = agent.run(
"Hello",
tools,
inputGuardrails = Seq(new LengthCheck(1, 100)),
outputGuardrails = Seq(new RegexValidator(".*".r))
)
result.isRight shouldBe true
}
}
Performance Tests
class GuardrailPerformanceSpec extends AnyFlatSpec with Matchers {
"Guardrails" should "execute in parallel efficiently" in {
// Create 10 slow guardrails that take 100ms each
val slowGuardrails = (1 to 10).map { i =>
new InputGuardrail {
def validate(value: String): Result[String] = {
Thread.sleep(100)
Right(value)
}
val name = s"SlowGuardrail$i"
}
}
// NOTE: presumes the parallel composite variant; the sequential fold
// shown earlier would take ~1000ms and fail this assertion
val composite = CompositeGuardrail.all(slowGuardrails)
val start = System.currentTimeMillis()
composite.validate("test")
val duration = System.currentTimeMillis() - start
// Should be ~100ms (parallel) not ~1000ms (sequential)
duration should be < 200L
}
}
Documentation Plan
User Guide: Guardrails Framework
# Guardrails Framework
## Overview
Guardrails provide declarative validation for agent inputs and outputs.
## Basic Usage
### Input Validation
```scala
import org.llm4s.agent.guardrails.builtin._
agent.run(
query,
tools,
inputGuardrails = Seq(
LengthCheck(min = 1, max = 5000),
ProfanityFilter()
)
)
### Output Validation
agent.run(
"Generate a JSON response",
tools,
outputGuardrails = Seq(
JSONValidator()
)
)
## Built-in Guardrails

| Guardrail | Type | Description |
|---|---|---|
| `LengthCheck(min, max)` | Input | Validates string length |
| `ProfanityFilter()` | Input/Output | Filters inappropriate content |
| `JSONValidator(schema)` | Output | Validates JSON structure |
| `RegexValidator(pattern)` | Both | Validates regex match |
| `ToneValidator(tones)` | Output | Validates response tone |
## Custom Guardrails
### Simple Custom Guardrail
class EmailValidator extends InputGuardrail {
private val emailPattern = "^[A-Za-z0-9+_.-]+@[A-Za-z0-9.-]+$".r
def validate(value: String): Result[String] =
if (emailPattern.matches(value)) {
Right(value)
} else {
Left(ValidationError.invalid("email", "Invalid email format"))
}
val name = "EmailValidator"
}
### Complex Custom Guardrail
class APIBasedModerationGuardrail(
apiKey: String,
threshold: Double = 0.5
) extends InputGuardrail with OutputGuardrail {
def validate(value: String): Result[String] = {
// Call external moderation API
val moderationResult = callModerationAPI(value, apiKey)
if (moderationResult.toxicityScore < threshold) {
Right(value)
} else {
Left(ValidationError.invalid(
"content",
s"Content toxicity score too high: ${moderationResult.toxicityScore}"
))
}
}
val name = "APIBasedModeration"
}
## Composing Guardrails
### Sequential Composition
val strictValidation =
new LengthCheck(1, 1000)
.andThen(new ProfanityFilter())
.andThen(new CustomValidator())
### Validation Modes
import org.llm4s.agent.guardrails.ValidationMode
// All guardrails must pass
CompositeGuardrail.all(Seq(guardrail1, guardrail2))
// At least one must pass
CompositeGuardrail.any(Seq(guardrail1, guardrail2))
// First result wins
new CompositeGuardrail(Seq(guardrail1, guardrail2), ValidationMode.First)
## Multi-turn Conversations
val guardrails = Seq(
LengthCheck(1, 5000),
ProfanityFilter()
)
for {
state1 <- agent.run(query1, tools, inputGuardrails = guardrails)
state2 <- agent.continueConversation(
state1,
query2,
inputGuardrails = guardrails
)
} yield state2
## Best Practices
- Use built-in guardrails when possible
- Compose guardrails for complex validation
- Keep guardrails pure - no side effects
- Test custom guardrails thoroughly
- Document validation logic clearly
- Consider performance for expensive validations
Examples
Example 1: Basic Input Validation
package org.llm4s.samples.guardrails
import org.llm4s.agent.Agent
import org.llm4s.agent.guardrails.builtin._
import org.llm4s.llmconnect.LLMConnect
import org.llm4s.toolapi.ToolRegistry
object BasicInputValidationExample extends App {
val result = for {
client <- LLMConnect.fromEnv()
agent = new Agent(client)
state <- agent.run(
query = "What is Scala?",
tools = ToolRegistry.empty,
inputGuardrails = Seq(
new LengthCheck(min = 1, max = 10000),
new ProfanityFilter()
)
)
} yield state
result match {
case Right(state) =>
println(s"Success: ${state.conversation.messages.last.content}")
case Left(error) =>
println(s"Validation failed: ${error.formatted}")
}
}
Example 2: Output JSON Validation
package org.llm4s.samples.guardrails
import org.llm4s.agent.Agent
import org.llm4s.agent.guardrails.builtin._
import org.llm4s.llmconnect.LLMConnect
import org.llm4s.toolapi.ToolRegistry
object JSONOutputValidationExample extends App {
val result = for {
client <- LLMConnect.fromEnv()
agent = new Agent(client)
state <- agent.run(
query = "Generate a JSON object with name and age fields",
tools = ToolRegistry.empty,
outputGuardrails = Seq(
new JSONValidator()
)
)
} yield state
result match {
case Right(state) =>
val response = state.conversation.messages.last.content
println(s"Valid JSON response: $response")
// Can safely parse
val json = ujson.read(response)
println(s"Name: ${json("name").str}")
println(s"Age: ${json("age").num}")
case Left(error) =>
println(s"Output validation failed: ${error.formatted}")
}
}
Example 3: Custom Guardrail
package org.llm4s.samples.guardrails
import org.llm4s.agent.Agent
import org.llm4s.agent.guardrails.InputGuardrail
import org.llm4s.error.ValidationError
import org.llm4s.llmconnect.LLMConnect
import org.llm4s.toolapi.ToolRegistry
import org.llm4s.types.Result
// Custom guardrail that checks for specific keywords
class KeywordRequirementGuardrail(requiredKeywords: Set[String]) extends InputGuardrail {
def validate(value: String): Result[String] = {
val lowerValue = value.toLowerCase
val missingKeywords = requiredKeywords.filterNot(kw => lowerValue.contains(kw.toLowerCase))
if (missingKeywords.isEmpty) {
Right(value)
} else {
Left(ValidationError.invalid(
"input",
s"Query must contain keywords: ${missingKeywords.mkString(", ")}"
))
}
}
val name = "KeywordRequirementGuardrail"
override val description = Some(s"Requires keywords: ${requiredKeywords.mkString(", ")}")
}
object CustomGuardrailExample extends App {
val result = for {
client <- LLMConnect.fromEnv()
agent = new Agent(client)
state <- agent.run(
query = "Tell me about Scala programming language features",
tools = ToolRegistry.empty,
inputGuardrails = Seq(
new KeywordRequirementGuardrail(Set("scala", "programming"))
)
)
} yield state
result match {
case Right(state) =>
println(s"Query contained required keywords")
println(s"Response: ${state.conversation.messages.last.content}")
case Left(error) =>
println(s"Validation failed: ${error.formatted}")
}
}
Example 4: Multi-turn with Tone Validation
package org.llm4s.samples.guardrails
import org.llm4s.agent.Agent
import org.llm4s.agent.guardrails.builtin._
import org.llm4s.llmconnect.LLMConnect
import org.llm4s.toolapi.ToolRegistry
object MultiTurnToneValidationExample extends App {
val inputGuardrails = Seq(
new LengthCheck(min = 1, max = 5000),
new ProfanityFilter()
)
val outputGuardrails = Seq(
new ToneValidator(allowedTones = Set(Tone.Professional, Tone.Friendly))
)
val result = for {
client <- LLMConnect.fromEnv()
agent = new Agent(client)
// Turn 1: Ask about Scala
state1 <- agent.run(
"What is Scala?",
ToolRegistry.empty,
inputGuardrails = inputGuardrails,
outputGuardrails = outputGuardrails
)
// Turn 2: Ask for details
state2 <- agent.continueConversation(
state1,
"What are its main features?",
inputGuardrails = inputGuardrails,
outputGuardrails = outputGuardrails
)
// Turn 3: Ask for examples
state3 <- agent.continueConversation(
state2,
"Can you give me a code example?",
inputGuardrails = inputGuardrails,
outputGuardrails = outputGuardrails
)
} yield state3
result match {
case Right(finalState) =>
println("All turns passed validation!")
println(s"Final status: ${finalState.status}")
println(s"Total messages: ${finalState.conversation.messages.length}")
case Left(error) =>
println(s"Validation failed: ${error.formatted}")
}
}
Example 5: Composite Guardrail with Modes
package org.llm4s.samples.guardrails
import org.llm4s.agent.Agent
import org.llm4s.agent.guardrails._
import org.llm4s.agent.guardrails.builtin._
import org.llm4s.llmconnect.LLMConnect
import org.llm4s.toolapi.ToolRegistry
import org.llm4s.types.Result
object CompositeGuardrailExample extends App {
// Language detection guardrails (at least one must match)
val languageDetection = CompositeGuardrail.any(Seq(
new RegexValidator(".*\\b(scala|functional)\\b.*".r),
new RegexValidator(".*\\b(java|object-oriented)\\b.*".r),
new RegexValidator(".*\\b(python|dynamic)\\b.*".r)
))
// Safety guardrails (all must pass)
val safetyChecks = CompositeGuardrail.all(Seq(
new LengthCheck(min = 1, max = 10000),
new ProfanityFilter()
))
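  // CompositeGuardrail[String] does not itself extend InputGuardrail, so
  // adapt it explicitly rather than casting - an asInstanceOf here would
  // throw ClassCastException at runtime. (This adapter is a sketch against
  // the traits proposed in this document.)
  def asInput(g: Guardrail[String]): InputGuardrail = new InputGuardrail {
    def validate(value: String): Result[String] = g.validate(value)
    val name: String = g.name
    override def description: Option[String] = g.description
  }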
val result = for {
client <- LLMConnect.fromEnv()
agent = new Agent(client)
state <- agent.run(
query = "Tell me about Scala programming",
tools = ToolRegistry.empty,
inputGuardrails = Seq(
  asInput(safetyChecks),
  asInput(languageDetection)
)
)
} yield state
result match {
case Right(state) =>
println("Query passed all validation!")
println(s"Response: ${state.conversation.messages.last.content}")
case Left(error) =>
println(s"Validation failed: ${error.formatted}")
}
}
Appendix
A. Comparison with Other Frameworks
OpenAI Agents SDK
Similarities:
- Declarative guardrail API
- Input/output validation separation
- Composable validation
Differences:
- llm4s uses `Result[A]` (explicit errors) vs. OpenAI exceptions
- llm4s guardrails are pure functions vs. OpenAI can have side effects
- llm4s has compile-time type safety vs. OpenAI runtime validation
PydanticAI
Similarities:
- Type-safe validation
- Composable validators
- Clear error messages
Differences:
- llm4s uses traits vs. Pydantic models
- llm4s compile-time checking vs. Pydantic runtime validation
- llm4s functional composition vs. Pydantic class-based
B. Future Enhancements
Phase 2.2: Async Guardrails
trait AsyncGuardrail[A] {
def validate(value: A): AsyncResult[A]
}
Phase 3.3: External API Integration
class OpenAIModerationGuardrail(apiKey: ApiKey) extends AsyncGuardrail[String]
class GooglePerspectiveGuardrail(apiKey: ApiKey) extends AsyncGuardrail[String]
Phase 4: ML-based Guardrails
class SentimentAnalysisGuardrail(model: SentimentModel) extends Guardrail[String]
class ToxicityDetectionGuardrail(model: ToxicityModel) extends Guardrail[String]
C. Migration from Manual Validation
Before (Manual Validation):
def runAgent(query: String): Result[AgentState] = {
// Manual validation
if (query.isEmpty) {
Left(ValidationError.invalid("query", "Empty query"))
} else if (query.length > 10000) {
Left(ValidationError.invalid("query", "Query too long"))
} else {
agent.run(query, tools)
}
}
After (Guardrails):
def runAgent(query: String): Result[AgentState] = {
agent.run(
query,
tools,
inputGuardrails = Seq(
LengthCheck(min = 1, max = 10000)
)
)
}
Benefits:
- ✅ Less boilerplate code
- ✅ Reusable validation logic
- ✅ Composable guardrails
- ✅ Standardized error messages
- ✅ Easier to test
Conclusion
Phase 1.2 (Guardrails Framework) provides a production-critical feature that enhances safety and developer experience:
- ✅ Declarative validation - Clear, composable API
- ✅ Type-safe - Compile-time checking
- ✅ Functional purity - Pure functions, no side effects
- ✅ Production-ready - Built-in guardrails + custom extension
- ✅ Well-integrated - Works seamlessly with Phase 1.1 features
Estimated Timeline: 2-3 weeks
Effort: Medium
Risk: Low
Value: High (Critical for production deployments)
Next Steps:
- Review and approve design document
- Create implementation branch
- Implement core framework (Week 1)
- Implement built-in guardrails (Week 1-2)
- Integrate with Agent API (Week 2)
- Documentation and examples (Week 2-3)
- Testing and refinement (Week 3)
End of Design Document