Guardrails

Validate agent inputs and outputs for safety, quality, and compliance.

Table of contents

  1. Overview
  2. Built-in Guardrails
    1. Simple Validators
    2. LLM-as-Judge Guardrails
    3. RAG-Specific Guardrails
  3. Basic Usage
    1. Input Validation
    2. Output Validation
    3. Combined Input/Output
  4. LLM-as-Judge Examples
    1. Safety Check
    2. Factuality Check
    3. Tone Validation
  5. Composite Guardrails
    1. All Must Pass (AND)
    2. Any Must Pass (OR)
    3. Sequential (Short-Circuit)
  6. Custom Guardrails
    1. Basic Custom Guardrail
    2. Custom Output Guardrail
    3. Custom LLM-Based Guardrail
  7. RAG Guardrails
    1. Basic RAG Setup
    2. Preset Configurations
    3. Individual RAG Guardrails
  8. PII Detection and Masking
    1. Detect PII
    2. Mask PII in Output
    3. Supported PII Types
  9. Prompt Injection Protection
  10. Error Handling
    1. Guardrail Errors
    2. Validation Mode
  11. Best Practices
    1. 1. Layer Your Guardrails
    2. 2. Use Appropriate Guardrails for Each Use Case
    3. 3. Test Your Guardrails
  12. Examples
  13. Next Steps

Overview

Guardrails are validation functions that run before (input) and after (output) agent processing. They help ensure:

  • Safety - Block harmful or inappropriate content
  • Quality - Enforce response standards
  • Compliance - Meet business requirements
  • Security - Detect prompt injection and PII
agent.run(
  query = "User input",
  tools = tools,
  inputGuardrails = Seq(...),   // Validate before LLM call
  outputGuardrails = Seq(...)   // Validate after LLM response
)

Built-in Guardrails

Simple Validators

These guardrails run locally without LLM calls:

Guardrail | Purpose | Example
LengthCheck | Enforce min/max length | new LengthCheck(1, 10000)
ProfanityFilter | Block profane content | new ProfanityFilter()
JSONValidator | Ensure valid JSON output | new JSONValidator()
RegexValidator | Pattern matching | new RegexValidator("\\d{3}-\\d{4}")
ToneValidator | Simple tone detection | new ToneValidator(Tone.Professional)
PIIDetector | Detect PII (email, SSN, etc.) | new PIIDetector()
PIIMasker | Mask detected PII | new PIIMasker()
PromptInjectionDetector | Detect injection attempts | new PromptInjectionDetector()
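
The simple validators share the same validate method used by the custom guardrails and test examples later in this page, so they can also be exercised directly, outside agent.run. A minimal sketch, assuming the built-ins return the input unchanged on success:

import org.llm4s.agent.guardrails.builtin.LengthCheck

val lengthCheck = new LengthCheck(min = 1, max = 10000)

lengthCheck.validate("Hello") match {
  case Right(value) => println(s"Accepted: $value")  // input passes through unchanged
  case Left(error)  => println(s"Rejected: $error")  // e.g. empty or oversized input
}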

LLM-as-Judge Guardrails

These use an LLM to evaluate subjective qualities:

Guardrail | Purpose | Example
LLMSafetyGuardrail | Content safety check | new LLMSafetyGuardrail(client)
LLMFactualityGuardrail | Verify factual accuracy | new LLMFactualityGuardrail(client)
LLMQualityGuardrail | Assess response quality | new LLMQualityGuardrail(client)
LLMToneGuardrail | Validate tone compliance | new LLMToneGuardrail(client, "professional")

RAG-Specific Guardrails

For retrieval-augmented generation:

Guardrail | Purpose
GroundingGuardrail | Verify answers are grounded in retrieved context
ContextRelevanceGuardrail | Check context relevance to query
SourceAttributionGuardrail | Ensure sources are cited
TopicBoundaryGuardrail | Prevent off-topic responses

Basic Usage

Input Validation

import org.llm4s.agent.guardrails.builtin._

val result = agent.run(
  query = userInput,
  tools = tools,
  inputGuardrails = Seq(
    new LengthCheck(min = 1, max = 10000),
    new ProfanityFilter(),
    new PromptInjectionDetector()
  )
)

result match {
  case Left(GuardrailError(name, message)) =>
    println(s"Input rejected by $name: $message")
  case Right(state) =>
    println(state.lastAssistantMessage)
}

Output Validation

val result = agent.run(
  query = "Generate a JSON response with user data",
  tools = tools,
  outputGuardrails = Seq(
    new JSONValidator(),
    new PIIMasker()
  )
)

Combined Input/Output

val result = agent.run(
  query = userInput,
  tools = tools,
  inputGuardrails = Seq(
    new LengthCheck(1, 5000),
    new ProfanityFilter()
  ),
  outputGuardrails = Seq(
    new LLMSafetyGuardrail(client),
    new ToneValidator(Tone.Professional)
  )
)

LLM-as-Judge Examples

Safety Check

import org.llm4s.agent.guardrails.builtin.LLMSafetyGuardrail

val safetyGuardrail = new LLMSafetyGuardrail(client)

agent.run(
  query = "Write a story",
  tools = tools,
  outputGuardrails = Seq(safetyGuardrail)
)

Factuality Check

Verify responses are grounded in source documents:

import org.llm4s.agent.guardrails.builtin.LLMFactualityGuardrail

val factualityGuardrail = LLMFactualityGuardrail.strict(
  client = client,
  sourceDocuments = Seq(
    "The capital of France is Paris.",
    "Paris has a population of 2.1 million."
  )
)

agent.run(
  query = "What is the capital of France?",
  tools = tools,
  outputGuardrails = Seq(factualityGuardrail)
)

Tone Validation

import org.llm4s.agent.guardrails.builtin.LLMToneGuardrail

val toneGuardrail = new LLMToneGuardrail(
  client = client,
  targetTone = "professional and helpful"
)

agent.run(
  query = "Help with customer complaint",
  tools = tools,
  outputGuardrails = Seq(toneGuardrail)
)

Composite Guardrails

Combine multiple guardrails with different strategies:

All Must Pass (AND)

import org.llm4s.agent.guardrails.CompositeGuardrail

val strictValidation = CompositeGuardrail.all(Seq(
  new LengthCheck(1, 5000),
  new ProfanityFilter(),
  new PIIDetector()
))

// All guardrails must pass for input to be accepted

Any Must Pass (OR)

val flexibleValidation = CompositeGuardrail.any(Seq(
  new RegexValidator("^[A-Z].*"),  // Starts with capital
  new RegexValidator("^\\d.*")     // Starts with digit
))

// At least one guardrail must pass

Sequential (Short-Circuit)

val sequentialValidation = CompositeGuardrail.sequential(Seq(
  new LengthCheck(1, 10000),  // Check length first
  new ProfanityFilter(),      // Then profanity
  new PIIDetector()           // Then PII
))

// Stops at the first failure, so later checks are skipped (more efficient)
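
Assuming CompositeGuardrail implements the same guardrail interface (so a composite can be passed wherever a single guardrail is expected), a composite plugs into agent.run like any other validator. A short sketch reusing strictValidation from the "All Must Pass" example above:

agent.run(
  query = userInput,
  tools = tools,
  inputGuardrails = Seq(strictValidation)  // all three checks must pass
)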

Custom Guardrails

Basic Custom Guardrail

import org.llm4s.agent.guardrails.InputGuardrail
import org.llm4s.types.Result

class KeywordRequirementGuardrail(requiredKeywords: Set[String]) extends InputGuardrail {
  val name: String = "keyword-requirement"

  def validate(value: String): Result[String] = {
    val found = requiredKeywords.filter(kw => value.toLowerCase.contains(kw.toLowerCase))
    if (found.nonEmpty) {
      Right(value)
    } else {
      Left(LLMError.validation(
        s"Input must contain at least one of: ${requiredKeywords.mkString(", ")}"
      ))
    }
  }
}

// Usage
val guardrail = new KeywordRequirementGuardrail(Set("scala", "java", "kotlin"))

Custom Output Guardrail

import org.llm4s.agent.guardrails.OutputGuardrail

class MaxSentenceCountGuardrail(maxSentences: Int) extends OutputGuardrail {
  val name: String = "max-sentence-count"

  def validate(value: String): Result[String] = {
    val sentenceCount = value.split("[.!?]+").length
    if (sentenceCount <= maxSentences) {
      Right(value)
    } else {
      Left(LLMError.validation(
        s"Response has $sentenceCount sentences, max allowed is $maxSentences"
      ))
    }
  }
}
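
A custom output guardrail is attached the same way as the built-ins; for example, limiting responses to five sentences:

agent.run(
  query = "Summarize the quarterly report",
  tools = tools,
  outputGuardrails = Seq(new MaxSentenceCountGuardrail(maxSentences = 5))
)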

Custom LLM-Based Guardrail

import org.llm4s.agent.guardrails.LLMGuardrail

class CustomLLMGuardrail(client: LLMClient) extends LLMGuardrail(client) {
  val name: String = "custom-llm-check"

  override def buildPrompt(content: String): String = {
    s"""Evaluate if the following content is appropriate for a children's website.
       |Respond with only "PASS" or "FAIL" followed by a brief explanation.
       |
       |Content: $content""".stripMargin
  }

  override def parseResponse(response: String): Result[Boolean] = {
    if (response.trim.startsWith("PASS")) Right(true)
    else if (response.trim.startsWith("FAIL")) Right(false)
    else Left(LLMError.parsing("Unexpected response format"))
  }
}
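
Like the built-in LLM-as-Judge guardrails, this one costs an extra LLM call per check, so it is usually applied to output only. A usage sketch:

val childSafeCheck = new CustomLLMGuardrail(client)

agent.run(
  query = "Write a short bedtime story",
  tools = tools,
  outputGuardrails = Seq(childSafeCheck)
)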

RAG Guardrails

Basic RAG Setup

import org.llm4s.agent.guardrails.rag._

val ragGuardrails = RAGGuardrails.standard(client)

agent.run(
  query = question,
  tools = tools,
  outputGuardrails = ragGuardrails
)

Preset Configurations

// Minimal - basic safety only
val minimal = RAGGuardrails.minimal()

// Standard - balanced for production
val standard = RAGGuardrails.standard(client)

// Strict - maximum safety
val strict = RAGGuardrails.strict(client)

// Monitoring - warn mode, doesn't block
val monitoring = RAGGuardrails.monitoring(client)

Individual RAG Guardrails

// Verify answer is grounded in retrieved context
val grounding = new GroundingGuardrail(
  client = client,
  retrievedContext = retrievedDocuments
)

// Check retrieved context is relevant to query
val relevance = new ContextRelevanceGuardrail(
  client = client,
  query = userQuery
)

// Ensure sources are properly cited
val attribution = new SourceAttributionGuardrail(
  client = client,
  sourceDocuments = sources
)

// Prevent off-topic responses
val topicBoundary = new TopicBoundaryGuardrail(
  client = client,
  allowedTopics = Set("programming", "software engineering")
)
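
The individual RAG guardrails compose like any others; a sketch passing all four from above as output guardrails:

agent.run(
  query = userQuery,
  tools = tools,
  outputGuardrails = Seq(grounding, relevance, attribution, topicBoundary)
)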

PII Detection and Masking

Detect PII

import org.llm4s.agent.guardrails.builtin.PIIDetector

val piiDetector = new PIIDetector()

// Detects: emails, SSNs, credit cards, phone numbers, etc.
agent.run(
  query = userInput,
  tools = tools,
  inputGuardrails = Seq(piiDetector)
)

Mask PII in Output

import org.llm4s.agent.guardrails.builtin.PIIMasker

val piiMasker = new PIIMasker()

// Replaces PII with [REDACTED_EMAIL], [REDACTED_SSN], etc.
agent.run(
  query = "Get user details",
  tools = tools,
  outputGuardrails = Seq(piiMasker)
)

Supported PII Types

Type | Pattern | Masked As
Email | user@domain.com | [REDACTED_EMAIL]
SSN | 123-45-6789 | [REDACTED_SSN]
Credit Card | 4111-1111-1111-1111 | [REDACTED_CC]
Phone | (555) 123-4567 | [REDACTED_PHONE]
IP Address | 192.168.1.1 | [REDACTED_IP]
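
To see the masking behaviour in isolation, the masker can also be applied to a string directly. A minimal sketch, assuming PIIMasker returns the masked text through the same validate method used by the other guardrails:

import org.llm4s.agent.guardrails.builtin.PIIMasker

val masker = new PIIMasker()

// Expected to produce something like:
// Right("Contact me at [REDACTED_EMAIL] or [REDACTED_PHONE]")
val masked = masker.validate("Contact me at jane@example.com or (555) 123-4567")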

Prompt Injection Protection

import org.llm4s.agent.guardrails.builtin.PromptInjectionDetector

val injectionDetector = new PromptInjectionDetector()

agent.run(
  query = userInput,
  tools = tools,
  inputGuardrails = Seq(injectionDetector)
)

// Detects patterns like:
// - "Ignore previous instructions..."
// - "System: You are now..."
// - "---\nNew instructions:"
// - Base64 encoded payloads

Error Handling

Guardrail Errors

result match {
  case Left(error: GuardrailError) =>
    println(s"Guardrail '${error.guardrailName}' failed: ${error.message}")
    // Take appropriate action (retry, notify user, log)

  case Left(error) =>
    println(s"Other error: $error")

  case Right(state) =>
    println("Success!")
}

Validation Mode

Control how guardrail results are handled:

import org.llm4s.agent.guardrails.ValidationMode

// Block on failure (default)
val blocking = ValidationMode.Block

// Warn only, continue processing
val warn = ValidationMode.Warn

// Log and continue
val log = ValidationMode.Log
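
How a mode attaches to a specific guardrail is not shown here. One pattern that stays within the documented guardrail interface is a small warn-only wrapper written in user code (the WarnOnly class below is a hypothetical sketch, not part of the library):

import org.llm4s.agent.guardrails.InputGuardrail
import org.llm4s.types.Result

// Hypothetical wrapper: logs a failed check but lets the input through (Warn semantics)
class WarnOnly(inner: InputGuardrail) extends InputGuardrail {
  val name: String = s"warn-only(${inner.name})"

  def validate(value: String): Result[String] =
    inner.validate(value) match {
      case Left(error) =>
        println(s"[warn] ${inner.name} failed: $error")
        Right(value)   // continue processing instead of blocking
      case ok => ok
    }
}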

Best Practices

1. Layer Your Guardrails

// Fast, local checks first
val inputGuardrails = Seq(
  new LengthCheck(1, 10000),        // Cheapest first
  new ProfanityFilter(),             // Still fast
  new PromptInjectionDetector(),     // Pattern matching
  new PIIDetector()                  // More complex but local
)

// LLM checks for output only (expensive)
val outputGuardrails = Seq(
  new JSONValidator(),               // Fast, local
  new LLMSafetyGuardrail(client)    // Expensive, last
)

2. Use Appropriate Guardrails for Each Use Case

Use Case | Recommended Guardrails
Customer support | ProfanityFilter, ToneValidator, LLMSafetyGuardrail
Code generation | LengthCheck, JSONValidator (for structured output)
RAG application | GroundingGuardrail, SourceAttributionGuardrail
Content moderation | PIIDetector, ProfanityFilter, LLMSafetyGuardrail
Form processing | RegexValidator, LengthCheck
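
As an illustration of the customer-support row, a guardrail set can be assembled entirely from pieces shown earlier on this page (the query and tools values are placeholders):

val supportResult = agent.run(
  query = customerMessage,
  tools = tools,
  inputGuardrails = Seq(
    new ProfanityFilter(),
    new PromptInjectionDetector()
  ),
  outputGuardrails = Seq(
    new ToneValidator(Tone.Professional),   // fast local check first
    new LLMSafetyGuardrail(client)          // expensive LLM check last
  )
)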

3. Test Your Guardrails

import org.scalatest.flatspec.AnyFlatSpec
import org.scalatest.matchers.should.Matchers

class GuardrailSpec extends AnyFlatSpec with Matchers {
  "LengthCheck" should "reject empty input" in {
    val guardrail = new LengthCheck(min = 1, max = 100)
    guardrail.validate("") shouldBe a[Left[_, _]]
  }

  it should "accept valid input" in {
    val guardrail = new LengthCheck(min = 1, max = 100)
    guardrail.validate("Hello") shouldBe Right("Hello")
  }
}
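
Custom guardrails can be tested the same way; for example, the KeywordRequirementGuardrail defined earlier:

import org.scalatest.flatspec.AnyFlatSpec
import org.scalatest.matchers.should.Matchers

class CustomGuardrailSpec extends AnyFlatSpec with Matchers {
  "KeywordRequirementGuardrail" should "require at least one keyword" in {
    val guardrail = new KeywordRequirementGuardrail(Set("scala", "java", "kotlin"))
    guardrail.validate("I love Scala") shouldBe Right("I love Scala")
    guardrail.validate("I love Python") shouldBe a[Left[_, _]]
  }
}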

Examples

Example | Description
BasicInputValidationExample | Length and profanity checks
JSONOutputValidationExample | JSON output validation
LLMJudgeGuardrailExample | LLM-as-Judge patterns
CompositeGuardrailExample | Combining guardrails
CustomGuardrailExample | Building custom validators
FactualityGuardrailExample | RAG factuality checking

Browse all examples →


Next Steps