Guardrails

Validate agent inputs and outputs for safety, quality, and compliance.

Table of contents

  1. Overview
  2. Built-in Guardrails
    1. Simple Validators
    2. LLM-as-Judge Guardrails
    3. RAG-Specific Guardrails
  3. Basic Usage
    1. Input Validation
    2. Output Validation
    3. Combined Input/Output
  4. LLM-as-Judge Examples
    1. Safety Check
    2. Factuality Check
    3. Tone Validation
  5. Composite Guardrails
    1. All Must Pass (AND)
    2. Any Must Pass (OR)
    3. Sequential (Short-Circuit)
  6. Custom Guardrails
    1. Basic Custom Guardrail
    2. Custom Output Guardrail
    3. Custom LLM-Based Guardrail
  7. RAG Guardrails
    1. Basic RAG Setup
    2. Preset Configurations
    3. Individual RAG Guardrails
  8. PII Detection and Masking
    1. Detect PII
    2. Mask PII in Output
    3. Supported PII Types
  9. Prompt Injection Protection
  10. Error Handling
    1. Guardrail Errors
    2. Validation Mode
  11. Best Practices
    1. 1. Layer Your Guardrails
    2. 2. Use Appropriate Guardrails for Each Use Case
    3. 3. Test Your Guardrails
  12. Examples
  13. Next Steps

Overview

Guardrails are validation functions that run before (input) and after (output) agent processing. They help ensure:

  • Safety - Block harmful or inappropriate content
  • Quality - Enforce response standards
  • Compliance - Meet business requirements
  • Security - Detect prompt injection and PII
agent.run(
  query = "User input",
  tools = tools,
  inputGuardrails = Seq(...),   // Validate before LLM call
  outputGuardrails = Seq(...)   // Validate after LLM response
)

Built-in Guardrails

Simple Validators

These guardrails run locally without LLM calls:

Guardrail | Purpose | Example
LengthCheck | Enforce min/max length | new LengthCheck(1, 10000)
ProfanityFilter | Block profane content | new ProfanityFilter()
JSONValidator | Ensure valid JSON output | new JSONValidator()
RegexValidator | Pattern matching | new RegexValidator("\\d{3}-\\d{4}")
ToneValidator | Simple tone detection | new ToneValidator(Tone.Professional)
PIIDetector | Detect PII (email, SSN, etc.) | new PIIDetector()
PIIMasker | Mask detected PII | new PIIMasker()
PromptInjectionDetector | Detect injection attempts | new PromptInjectionDetector()
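
The simple validators share the same validate method used by the custom guardrails and test examples later in this page, so they can also be exercised directly, outside agent.run. A minimal sketch, assuming the built-ins return the input unchanged on success:

import org.llm4s.agent.guardrails.builtin.LengthCheck

val lengthCheck = new LengthCheck(min = 1, max = 10000)

lengthCheck.validate("Hello") match {
  case Right(value) => println(s"Accepted: $value")  // input passes through unchanged
  case Left(error)  => println(s"Rejected: $error")  // e.g. empty or oversized input
}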

LLM-as-Judge Guardrails

These use an LLM to evaluate subjective qualities:

Guardrail | Purpose | Example
LLMSafetyGuardrail | Content safety check | new LLMSafetyGuardrail(client)
LLMFactualityGuardrail | Verify factual accuracy | new LLMFactualityGuardrail(client)
LLMQualityGuardrail | Assess response quality | new LLMQualityGuardrail(client)
LLMToneGuardrail | Validate tone compliance | new LLMToneGuardrail(client, "professional")

RAG-Specific Guardrails

For retrieval-augmented generation:

Guardrail | Purpose
GroundingGuardrail | Verify answers are grounded in retrieved context
ContextRelevanceGuardrail | Check context relevance to query
SourceAttributionGuardrail | Ensure sources are cited
TopicBoundaryGuardrail | Prevent off-topic responses

Basic Usage

Input Validation

import org.llm4s.agent.guardrails.builtin._

val result = agent.run(
  query = userInput,
  tools = tools,
  inputGuardrails = Seq(
    new LengthCheck(min = 1, max = 10000),
    new ProfanityFilter(),
    new PromptInjectionDetector()
  )
)

result match {
  case Left(GuardrailError(name, message)) =>
    println(s"Input rejected by $name: $message")
  case Right(state) =>
    println(state.lastAssistantMessage)
}

Output Validation

val result = agent.run(
  query = "Generate a JSON response with user data",
  tools = tools,
  outputGuardrails = Seq(
    new JSONValidator(),
    new PIIMasker()
  )
)

Combined Input/Output

val result = agent.run(
  query = userInput,
  tools = tools,
  inputGuardrails = Seq(
    new LengthCheck(1, 5000),
    new ProfanityFilter()
  ),
  outputGuardrails = Seq(
    new LLMSafetyGuardrail(client),
    new ToneValidator(Tone.Professional)
  )
)

LLM-as-Judge Examples

Safety Check

import org.llm4s.agent.guardrails.builtin.LLMSafetyGuardrail

val safetyGuardrail = new LLMSafetyGuardrail(client)

agent.run(
  query = "Write a story",
  tools = tools,
  outputGuardrails = Seq(safetyGuardrail)
)

Factuality Check

Verify responses are grounded in source documents:

import org.llm4s.agent.guardrails.builtin.LLMFactualityGuardrail

val factualityGuardrail = LLMFactualityGuardrail.strict(
  client = client,
  sourceDocuments = Seq(
    "The capital of France is Paris.",
    "Paris has a population of 2.1 million."
  )
)

agent.run(
  query = "What is the capital of France?",
  tools = tools,
  outputGuardrails = Seq(factualityGuardrail)
)

Tone Validation

import org.llm4s.agent.guardrails.builtin.LLMToneGuardrail

val toneGuardrail = new LLMToneGuardrail(
  client = client,
  targetTone = "professional and helpful"
)

agent.run(
  query = "Help with customer complaint",
  tools = tools,
  outputGuardrails = Seq(toneGuardrail)
)

Composite Guardrails

Combine multiple guardrails with different strategies:

All Must Pass (AND)

import org.llm4s.agent.guardrails.CompositeGuardrail

val strictValidation = CompositeGuardrail.all(Seq(
  new LengthCheck(1, 5000),
  new ProfanityFilter(),
  new PIIDetector()
))

// All guardrails must pass for input to be accepted

Any Must Pass (OR)

val flexibleValidation = CompositeGuardrail.any(Seq(
  new RegexValidator("^[A-Z].*"),  // Starts with capital
  new RegexValidator("^\\d.*")     // Starts with digit
))

// At least one guardrail must pass

Sequential (Short-Circuit)

val sequentialValidation = CompositeGuardrail.sequential(Seq(
  new LengthCheck(1, 10000),  // Check length first
  new ProfanityFilter(),      // Then profanity
  new PIIDetector()           // Then PII
))

// Stops at the first failure, so later checks are skipped (more efficient)
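
Assuming CompositeGuardrail implements the same guardrail interface (so a composite can be passed wherever a single guardrail is expected), a composite plugs into agent.run like any other validator. A short sketch reusing strictValidation from the "All Must Pass" example above:

agent.run(
  query = userInput,
  tools = tools,
  inputGuardrails = Seq(strictValidation)  // all three checks must pass
)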

Custom Guardrails

Basic Custom Guardrail

import org.llm4s.agent.guardrails.InputGuardrail
import org.llm4s.types.Result

class KeywordRequirementGuardrail(requiredKeywords: Set[String]) extends InputGuardrail {
  val name: String = "keyword-requirement"

  def validate(value: String): Result[String] = {
    val found = requiredKeywords.filter(kw => value.toLowerCase.contains(kw.toLowerCase))
    if (found.nonEmpty) {
      Right(value)
    } else {
      Left(LLMError.validation(
        s"Input must contain at least one of: ${requiredKeywords.mkString(", ")}"
      ))
    }
  }
}

// Usage
val guardrail = new KeywordRequirementGuardrail(Set("scala", "java", "kotlin"))

Custom Output Guardrail

import org.llm4s.agent.guardrails.OutputGuardrail

class MaxSentenceCountGuardrail(maxSentences: Int) extends OutputGuardrail {
  val name: String = "max-sentence-count"

  def validate(value: String): Result[String] = {
    val sentenceCount = value.split("[.!?]+").length
    if (sentenceCount <= maxSentences) {
      Right(value)
    } else {
      Left(LLMError.validation(
        s"Response has $sentenceCount sentences, max allowed is $maxSentences"
      ))
    }
  }
}
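
A custom output guardrail is attached the same way as the built-ins; for example, limiting responses to five sentences:

agent.run(
  query = "Summarize the quarterly report",
  tools = tools,
  outputGuardrails = Seq(new MaxSentenceCountGuardrail(maxSentences = 5))
)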

Custom LLM-Based Guardrail

import org.llm4s.agent.guardrails.LLMGuardrail

class CustomLLMGuardrail(client: LLMClient) extends LLMGuardrail(client) {
  val name: String = "custom-llm-check"

  override def buildPrompt(content: String): String = {
    s"""Evaluate if the following content is appropriate for a children's website.
       |Respond with only "PASS" or "FAIL" followed by a brief explanation.
       |
       |Content: $content""".stripMargin
  }

  override def parseResponse(response: String): Result[Boolean] = {
    if (response.trim.startsWith("PASS")) Right(true)
    else if (response.trim.startsWith("FAIL")) Right(false)
    else Left(LLMError.parsing("Unexpected response format"))
  }
}
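
Like the built-in LLM-as-Judge guardrails, this one costs an extra LLM call per check, so it is usually applied to output only. A usage sketch:

val childSafeCheck = new CustomLLMGuardrail(client)

agent.run(
  query = "Write a short bedtime story",
  tools = tools,
  outputGuardrails = Seq(childSafeCheck)
)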

RAG Guardrails

Basic RAG Setup

import org.llm4s.agent.guardrails.rag._

val ragGuardrails = RAGGuardrails.standard(client)

agent.run(
  query = question,
  tools = tools,
  outputGuardrails = ragGuardrails
)

Preset Configurations

// Minimal - basic safety only
val minimal = RAGGuardrails.minimal()

// Standard - balanced for production
val standard = RAGGuardrails.standard(client)

// Strict - maximum safety
val strict = RAGGuardrails.strict(client)

// Monitoring - warn mode, doesn't block
val monitoring = RAGGuardrails.monitoring(client)

Individual RAG Guardrails

// Verify answer is grounded in retrieved context
val grounding = new GroundingGuardrail(
  client = client,
  retrievedContext = retrievedDocuments
)

// Check retrieved context is relevant to query
val relevance = new ContextRelevanceGuardrail(
  client = client,
  query = userQuery
)

// Ensure sources are properly cited
val attribution = new SourceAttributionGuardrail(
  client = client,
  sourceDocuments = sources
)

// Prevent off-topic responses
val topicBoundary = new TopicBoundaryGuardrail(
  client = client,
  allowedTopics = Set("programming", "software engineering")
)
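
The individual RAG guardrails compose like any others; a sketch passing all four from above as output guardrails:

agent.run(
  query = userQuery,
  tools = tools,
  outputGuardrails = Seq(grounding, relevance, attribution, topicBoundary)
)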

PII Detection and Masking

Detect PII

import org.llm4s.agent.guardrails.builtin.PIIDetector

val piiDetector = new PIIDetector()

// Detects: emails, SSNs, credit cards, phone numbers, etc.
agent.run(
  query = userInput,
  tools = tools,
  inputGuardrails = Seq(piiDetector)
)

Mask PII in Output

import org.llm4s.agent.guardrails.builtin.PIIMasker

val piiMasker = new PIIMasker()

// Replaces PII with [REDACTED_EMAIL], [REDACTED_SSN], etc.
agent.run(
  query = "Get user details",
  tools = tools,
  outputGuardrails = Seq(piiMasker)
)

Supported PII Types

Type | Pattern | Masked As
Email | user@domain.com | [REDACTED_EMAIL]
SSN | 123-45-6789 | [REDACTED_SSN]
Credit Card | 4111-1111-1111-1111 | [REDACTED_CC]
Phone | (555) 123-4567 | [REDACTED_PHONE]
IP Address | 192.168.1.1 | [REDACTED_IP]
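
To see the masking behaviour in isolation, the masker can also be applied to a string directly. A minimal sketch, assuming PIIMasker returns the masked text through the same validate method used by the other guardrails:

import org.llm4s.agent.guardrails.builtin.PIIMasker

val masker = new PIIMasker()

// Expected to produce something like:
// Right("Contact me at [REDACTED_EMAIL] or [REDACTED_PHONE]")
val masked = masker.validate("Contact me at jane@example.com or (555) 123-4567")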

Prompt Injection Protection

import org.llm4s.agent.guardrails.builtin.PromptInjectionDetector

val injectionDetector = new PromptInjectionDetector()

agent.run(
  query = userInput,
  tools = tools,
  inputGuardrails = Seq(injectionDetector)
)

// Detects patterns like:
// - "Ignore previous instructions..."
// - "System: You are now..."
// - "---\nNew instructions:"
// - Base64 encoded payloads

Error Handling

Guardrail Errors

result match {
  case Left(error: GuardrailError) =>
    println(s"Guardrail '${error.guardrailName}' failed: ${error.message}")
    // Take appropriate action (retry, notify user, log)

  case Left(error) =>
    println(s"Other error: $error")

  case Right(state) =>
    println("Success!")
}

Validation Mode

Control how guardrail results are handled:

import org.llm4s.agent.guardrails.ValidationMode

// Block on failure (default)
val blocking = ValidationMode.Block

// Warn only, continue processing
val warn = ValidationMode.Warn

// Log and continue
val log = ValidationMode.Log
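
How a mode attaches to a specific guardrail is not shown here. One pattern that stays within the documented guardrail interface is a small warn-only wrapper written in user code (the WarnOnly class below is a hypothetical sketch, not part of the library):

import org.llm4s.agent.guardrails.InputGuardrail
import org.llm4s.types.Result

// Hypothetical wrapper: logs a failed check but lets the input through (Warn semantics)
class WarnOnly(inner: InputGuardrail) extends InputGuardrail {
  val name: String = s"warn-only(${inner.name})"

  def validate(value: String): Result[String] =
    inner.validate(value) match {
      case Left(error) =>
        println(s"[warn] ${inner.name} failed: $error")
        Right(value)   // continue processing instead of blocking
      case ok => ok
    }
}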

Best Practices

1. Layer Your Guardrails

// Fast, local checks first
val inputGuardrails = Seq(
  new LengthCheck(1, 10000),        // Cheapest first
  new ProfanityFilter(),             // Still fast
  new PromptInjectionDetector(),     // Pattern matching
  new PIIDetector()                  // More complex but local
)

// LLM checks for output only (expensive)
val outputGuardrails = Seq(
  new JSONValidator(),               // Fast, local
  new LLMSafetyGuardrail(client)    // Expensive, last
)

2. Use Appropriate Guardrails for Each Use Case

Use Case | Recommended Guardrails
Customer support | ProfanityFilter, ToneValidator, LLMSafetyGuardrail
Code generation | LengthCheck, JSONValidator (for structured output)
RAG application | GroundingGuardrail, SourceAttributionGuardrail
Content moderation | PIIDetector, ProfanityFilter, LLMSafetyGuardrail
Form processing | RegexValidator, LengthCheck
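
As an illustration of the customer-support row, a guardrail set can be assembled entirely from pieces shown earlier on this page (the query and tools values are placeholders):

val supportResult = agent.run(
  query = customerMessage,
  tools = tools,
  inputGuardrails = Seq(
    new ProfanityFilter(),
    new PromptInjectionDetector()
  ),
  outputGuardrails = Seq(
    new ToneValidator(Tone.Professional),   // fast local check first
    new LLMSafetyGuardrail(client)          // expensive LLM check last
  )
)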

3. Test Your Guardrails

import org.scalatest.flatspec.AnyFlatSpec
import org.scalatest.matchers.should.Matchers

class GuardrailSpec extends AnyFlatSpec with Matchers {
  "LengthCheck" should "reject empty input" in {
    val guardrail = new LengthCheck(min = 1, max = 100)
    guardrail.validate("") shouldBe a[Left[_, _]]
  }

  it should "accept valid input" in {
    val guardrail = new LengthCheck(min = 1, max = 100)
    guardrail.validate("Hello") shouldBe Right("Hello")
  }
}
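
Custom guardrails can be tested the same way; for example, the KeywordRequirementGuardrail defined earlier:

import org.scalatest.flatspec.AnyFlatSpec
import org.scalatest.matchers.should.Matchers

class CustomGuardrailSpec extends AnyFlatSpec with Matchers {
  "KeywordRequirementGuardrail" should "require at least one keyword" in {
    val guardrail = new KeywordRequirementGuardrail(Set("scala", "java", "kotlin"))
    guardrail.validate("I love Scala") shouldBe Right("I love Scala")
    guardrail.validate("I love Python") shouldBe a[Left[_, _]]
  }
}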

Examples

Example | Description
BasicInputValidationExample | Length and profanity checks
JSONOutputValidationExample | JSON output validation
LLMJudgeGuardrailExample | LLM-as-Judge patterns
CompositeGuardrailExample | Combining guardrails
CustomGuardrailExample | Building custom validators
FactualityGuardrailExample | RAG factuality checking

Browse all examples →


Next Steps