org.llm4s.agent.guardrails.builtin

Members list

Type members

Classlikes

sealed trait InjectionCategory

Categories of prompt injection attacks.

Categories of prompt injection attacks.

Attributes

Companion
object
Supertypes
class Object
trait Matchable
class Any
Known subtypes

Attributes

Companion
trait
Supertypes
trait Sum
trait Mirror
class Object
trait Matchable
class Any
Self type
final case class InjectionMatch(name: String, category: InjectionCategory, severity: Int)

Match result from injection detection.

Match result from injection detection.

Attributes

Supertypes
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any
Show all
final case class InjectionPattern(name: String, regex: Regex, category: InjectionCategory, severity: Int)

Pattern for detecting a specific type of injection.

Pattern for detecting a specific type of injection.

Value parameters

category

Type of injection attack

name

Human-readable name for the pattern

regex

Regular expression to match

severity

Severity level (1=low, 2=medium, 3=high)

Attributes

Companion
object
Supertypes
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any
Show all

Attributes

Companion
class
Supertypes
trait Product
trait Mirror
class Object
trait Matchable
class Any
Self type
sealed trait InjectionSensitivity

Sensitivity levels for injection detection.

Sensitivity levels for injection detection.

Higher sensitivity catches more attacks but may have more false positives.

Attributes

Companion
object
Supertypes
class Object
trait Matchable
class Any
Known subtypes
object High
object Low
object Medium

Attributes

Companion
trait
Supertypes
trait Sum
trait Mirror
class Object
trait Matchable
class Any
Self type
class JSONValidator(schema: Option[Value]) extends OutputGuardrail

Validates that output is valid JSON matching an optional schema.

Validates that output is valid JSON matching an optional schema.

This guardrail ensures that LLM output is properly formatted JSON, which is useful when requesting structured data from the agent.

Value parameters

schema

Optional JSON schema to validate against (minimal subset)

Attributes

Companion
object
Supertypes
trait Guardrail[String]
class Object
trait Matchable
class Any
object JSONValidator

Attributes

Companion
class
Supertypes
class Object
trait Matchable
class Any
Self type
class LLMFactualityGuardrail(val llmClient: LLMClient, referenceContext: String, val threshold: Double) extends LLMGuardrail

LLM-based factual accuracy validation guardrail.

LLM-based factual accuracy validation guardrail.

Uses an LLM to evaluate whether content is factually accurate given a reference context. Useful for RAG applications where you want to ensure the model's response aligns with retrieved documents.

Value parameters

llmClient

The LLM client to use for evaluation

referenceContext

The reference text to fact-check against

threshold

Minimum score to pass (default: 0.7)

Attributes

Example
val context = "Paris is the capital of France. It has a population of 2.1 million."
val guardrail = LLMFactualityGuardrail(client, context, threshold = 0.8)
agent.run(query, tools, outputGuardrails = Seq(guardrail))
Companion
object
Supertypes
trait LLMGuardrail
trait Guardrail[String]
class Object
trait Matchable
class Any
Show all

Attributes

Companion
class
Supertypes
class Object
trait Matchable
class Any
Self type
class LLMQualityGuardrail(val llmClient: LLMClient, originalQuery: String, val threshold: Double) extends LLMGuardrail

LLM-based response quality validation guardrail.

LLM-based response quality validation guardrail.

Uses an LLM to evaluate the overall quality of a response including helpfulness, completeness, clarity, and relevance.

Value parameters

llmClient

The LLM client to use for evaluation

originalQuery

The original user query (for relevance checking)

threshold

Minimum score to pass (default: 0.7)

Attributes

Example
val guardrail = LLMQualityGuardrail(client, "What is Scala?")
agent.run(query, tools, outputGuardrails = Seq(guardrail))
Companion
object
Supertypes
trait LLMGuardrail
trait Guardrail[String]
class Object
trait Matchable
class Any
Show all

Attributes

Companion
class
Supertypes
class Object
trait Matchable
class Any
Self type
class LLMSafetyGuardrail(val llmClient: LLMClient, val threshold: Double, customCriteria: Option[String]) extends LLMGuardrail

LLM-based content safety validation guardrail.

LLM-based content safety validation guardrail.

Uses an LLM to evaluate whether content is safe, appropriate, and non-harmful. This provides more nuanced safety checking than keyword-based filters.

Safety categories evaluated:

  • Harmful or dangerous content
  • Inappropriate or offensive language
  • Misinformation or misleading claims
  • Privacy violations
  • Illegal activity promotion

Value parameters

customCriteria

Optional additional safety criteria to check

llmClient

The LLM client to use for evaluation

threshold

Minimum score to pass (default: 0.8 - higher for safety)

Attributes

Example
val guardrail = LLMSafetyGuardrail(client)
agent.run(query, tools, outputGuardrails = Seq(guardrail))
Companion
object
Supertypes
trait LLMGuardrail
trait Guardrail[String]
class Object
trait Matchable
class Any
Show all

Attributes

Companion
class
Supertypes
class Object
trait Matchable
class Any
Self type
class LLMToneGuardrail(val llmClient: LLMClient, allowedTones: Set[String], val threshold: Double) extends LLMGuardrail

LLM-based tone validation guardrail.

LLM-based tone validation guardrail.

Uses an LLM to evaluate whether content matches the specified tone(s). This is more accurate than the keyword-based ToneValidator for nuanced tone detection, but has higher latency due to the LLM API call.

Value parameters

allowedTones

Set of acceptable tones (e.g., "professional", "friendly")

llmClient

The LLM client to use for evaluation

threshold

Minimum score to pass (default: 0.7)

Attributes

Example
val guardrail = LLMToneGuardrail(
 client,
 Set("professional", "friendly"),
 threshold = 0.8
)
agent.run(query, tools, outputGuardrails = Seq(guardrail))
Companion
object
Supertypes
trait LLMGuardrail
trait Guardrail[String]
class Object
trait Matchable
class Any
Show all

Attributes

Companion
class
Supertypes
class Object
trait Matchable
class Any
Self type
class LengthCheck(min: Int, max: Int) extends InputGuardrail, OutputGuardrail

Validates string length is within bounds.

Validates string length is within bounds.

Can be used for both input and output validation. Ensures content is neither too short nor too long.

Value parameters

max

Maximum length (inclusive)

min

Minimum length (inclusive)

Attributes

Companion
object
Supertypes
trait Guardrail[String]
class Object
trait Matchable
class Any
Show all
object LengthCheck

Attributes

Companion
class
Supertypes
class Object
trait Matchable
class Any
Self type
class PIIDetector(val piiTypes: Seq[PIIType], val onFail: GuardrailAction) extends InputGuardrail, OutputGuardrail

Detects Personally Identifiable Information (PII) in text.

Detects Personally Identifiable Information (PII) in text.

Uses regex patterns to detect common PII types including:

  • Social Security Numbers (SSN)
  • Credit Card Numbers
  • Email Addresses
  • Phone Numbers
  • IP Addresses
  • Passport Numbers
  • Dates of Birth

Can be configured to:

  • Block: Return error when PII is detected (default)
  • Fix: Automatically mask PII and continue
  • Warn: Log warning and allow processing to continue

Example usage:

// Block on PII detection
val strictDetector = PIIDetector()

// Mask PII automatically
val maskingDetector = PIIDetector(onFail = GuardrailAction.Fix)

// Detect only credit cards and SSNs
val financialDetector = PIIDetector(
 piiTypes = Seq(PIIType.CreditCard, PIIType.SSN)
)

Value parameters

onFail

Action to take when PII is detected (default: Block)

piiTypes

The types of PII to detect (default: SSN, CreditCard, Email, Phone)

Attributes

Companion
object
Supertypes
trait Guardrail[String]
class Object
trait Matchable
class Any
Show all
object PIIDetector

Attributes

Companion
class
Supertypes
class Object
trait Matchable
class Any
Self type
class PIIMasker(val piiTypes: Seq[PIIType]) extends InputGuardrail, OutputGuardrail

Automatically masks Personally Identifiable Information (PII) in text.

Automatically masks Personally Identifiable Information (PII) in text.

Unlike PIIDetector (which can block or warn), PIIMasker always transforms the text by replacing detected PII with redaction placeholders.

This guardrail never blocks - it always allows processing to continue with sanitized text. Use this when you want to:

  • Sanitize user input before sending to LLM
  • Redact sensitive information from LLM outputs
  • Preserve privacy while allowing queries to proceed

Masked text uses placeholders like [REDACTED_EMAIL], [REDACTED_SSN], etc.

Example usage:

// Mask all default PII types
val masker = PIIMasker()

// Mask only specific types
val emailMasker = PIIMasker(Seq(PIIType.Email, PIIType.Phone))

// Use with agent
agent.run(
 query = userInput,
 tools = tools,
 inputGuardrails = Seq(PIIMasker())
)

Value parameters

piiTypes

The types of PII to mask (default: SSN, CreditCard, Email, Phone)

Attributes

Companion
object
Supertypes
trait Guardrail[String]
class Object
trait Matchable
class Any
Show all
object PIIMasker

Attributes

Companion
class
Supertypes
class Object
trait Matchable
class Any
Self type
PIIMasker.type
class ProfanityFilter(customBadWords: Set[String], caseSensitive: Boolean) extends InputGuardrail, OutputGuardrail

Filters profanity and inappropriate content.

Filters profanity and inappropriate content.

This is a basic implementation using a word list. For production, consider integrating with external APIs like:

  • OpenAI Moderation API
  • Google Perspective API
  • Custom ML models

Can be used for both input and output validation.

Value parameters

caseSensitive

Whether matching should be case-sensitive

customBadWords

Additional words to filter beyond the default list

Attributes

Companion
object
Supertypes
trait Guardrail[String]
class Object
trait Matchable
class Any
Show all

Attributes

Companion
class
Supertypes
class Object
trait Matchable
class Any
Self type
class PromptInjectionDetector(val patterns: Seq[InjectionPattern], val sensitivity: InjectionSensitivity, val onFail: GuardrailAction) extends InputGuardrail

Detects prompt injection attempts in user input.

Detects prompt injection attempts in user input.

Prompt injection attacks attempt to override system instructions, manipulate the AI's behavior, or extract sensitive information.

Detection categories:

  • Instruction Override: "Ignore previous instructions", "forget your rules"
  • Role Manipulation: "You are now DAN", "Act as a different AI"
  • System Prompt Extraction: "What is your system prompt?", "Show your instructions"
  • Jailbreak Phrases: Common patterns used in known jailbreaks
  • Code/SQL Injection: Attempts to inject executable code

Example usage:

// Default: Block on injection detection
val detector = PromptInjectionDetector()

// Custom sensitivity (fewer false positives)
val relaxed = PromptInjectionDetector(
 sensitivity = InjectionSensitivity.Medium
)

// Use as input guardrail
agent.run(
 query = userInput,
 tools = tools,
 inputGuardrails = Seq(PromptInjectionDetector())
)

Value parameters

onFail

Action to take when injection is detected (default: Block)

patterns

Custom injection patterns to detect (in addition to defaults)

sensitivity

Detection sensitivity level

Attributes

Companion
object
Supertypes
trait Guardrail[String]
class Object
trait Matchable
class Any

Attributes

Companion
class
Supertypes
class Object
trait Matchable
class Any
Self type
class RegexValidator(pattern: Regex, errorMessage: Option[String]) extends InputGuardrail, OutputGuardrail

Validates that content matches a regular expression.

Validates that content matches a regular expression.

Can be used for both input and output validation. Useful for enforcing format requirements like email addresses, phone numbers, or custom patterns.

Value parameters

errorMessage

Optional custom error message

pattern

The regex pattern to match

Attributes

Companion
object
Supertypes
trait Guardrail[String]
class Object
trait Matchable
class Any
Show all

Attributes

Companion
class
Supertypes
class Object
trait Matchable
class Any
Self type
sealed trait Tone

Tone categories for content validation.

Tone categories for content validation.

Attributes

Companion
object
Supertypes
class Object
trait Matchable
class Any
Known subtypes
object Casual
object Excited
object Formal
object Friendly
object Neutral
object Professional
Show all
object Tone

Attributes

Companion
trait
Supertypes
trait Sum
trait Mirror
class Object
trait Matchable
class Any
Self type
Tone.type
class ToneValidator(allowedTones: Set[Tone]) extends OutputGuardrail

Validates that output matches one of the allowed tones.

Validates that output matches one of the allowed tones.

This is a simple keyword-based implementation. For production, consider using sentiment analysis APIs or ML models.

Value parameters

allowedTones

The set of acceptable tones

Attributes

Companion
object
Supertypes
trait Guardrail[String]
class Object
trait Matchable
class Any
object ToneValidator

Attributes

Companion
class
Supertypes
class Object
trait Matchable
class Any
Self type