org.llm4s.agent.guardrails.builtin
Members list
Type members
Classlikes
InjectionCategory
Categories of prompt injection attacks.
Attributes
- Companion
- object
- Supertypes
-
class Object, trait Matchable, class Any
- Known subtypes
-
object CodeInjection, object DataExfiltration, object InstructionOverride, object Jailbreak, object RoleManipulation, object SystemPromptExtraction
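Example usage (a hedged sketch, not taken from the library; it assumes the subtype objects listed above are members of the companion object):
// Map a detected category to a short label for logging (illustrative only)
def describe(category: InjectionCategory): String = category match {
  case InjectionCategory.InstructionOverride    => "attempt to override system instructions"
  case InjectionCategory.SystemPromptExtraction => "attempt to extract the system prompt"
  case InjectionCategory.Jailbreak              => "known jailbreak phrasing"
  case other                                    => other.toString
}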
Attributes
- Companion
- trait
- Supertypes
-
trait Sum, trait Mirror, class Object, trait Matchable, class Any
- Self type
-
InjectionCategory.type
Match result from injection detection.
Attributes
- Supertypes
-
trait Serializable, trait Product, trait Equals, class Object, trait Matchable, class Any
InjectionPattern
Pattern for detecting a specific type of injection.
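Example usage (a hedged sketch; the parameter names follow the value parameters below, but whether regex is a scala.util.matching.Regex or a plain String, and whether the category objects are nested in the InjectionCategory companion, are assumptions):
// A custom pattern that could be supplied to PromptInjectionDetector via its patterns parameter
val base64Payload = InjectionPattern(
  name = "base64-payload",
  regex = "(?i)decode\\s+the\\s+following\\s+base64".r,
  category = InjectionCategory.CodeInjection,
  severity = 2
)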
Value parameters
- category
-
Type of injection attack
- name
-
Human-readable name for the pattern
- regex
-
Regular expression to match
- severity
-
Severity level (1=low, 2=medium, 3=high)
Attributes
- Companion
- object
- Supertypes
-
trait Serializable, trait Product, trait Equals, class Object, trait Matchable, class Any
Attributes
- Companion
- class
- Supertypes
-
trait Product, trait Mirror, class Object, trait Matchable, class Any
- Self type
-
InjectionPattern.type
Attributes
- Companion
- trait
- Supertypes
-
trait Sum, trait Mirror, class Object, trait Matchable, class Any
- Self type
-
InjectionSensitivity.type
JSONValidator
Validates that output is valid JSON matching an optional schema.
This guardrail ensures that LLM output is properly formatted JSON, which is useful when requesting structured data from the agent.
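Example usage (a hedged sketch; the no-argument constructor is an assumption):
// Require any syntactically valid JSON output; a schema can also be supplied
// via the optional schema parameter described below
val jsonCheck = JSONValidator()
agent.run(query, tools, outputGuardrails = Seq(jsonCheck))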
Value parameters
- schema
-
Optional JSON schema to validate against (minimal subset)
Attributes
- Companion
- object
- Supertypes
Attributes
- Companion
- class
- Supertypes
-
class Object, trait Matchable, class Any
- Self type
-
JSONValidator.type
LLMFactualityGuardrail
LLM-based factual accuracy validation guardrail.
Uses an LLM to evaluate whether content is factually accurate given a reference context. Useful for RAG applications where you want to ensure the model's response aligns with retrieved documents.
Value parameters
- llmClient
-
The LLM client to use for evaluation
- referenceContext
-
The reference text to fact-check against
- threshold
-
Minimum score to pass (default: 0.7)
Attributes
- Example
-
val context = "Paris is the capital of France. It has a population of 2.1 million."
val guardrail = LLMFactualityGuardrail(client, context, threshold = 0.8)
agent.run(query, tools, outputGuardrails = Seq(guardrail))
- Companion
- object
- Supertypes
-
trait LLMGuardrail, trait OutputGuardrail, trait Guardrail[String], class Object, trait Matchable, class Any
Attributes
- Companion
- class
- Supertypes
-
class Object, trait Matchable, class Any
- Self type
-
LLMFactualityGuardrail.type
LLMQualityGuardrail
LLM-based response quality validation guardrail.
Uses an LLM to evaluate the overall quality of a response including helpfulness, completeness, clarity, and relevance.
Value parameters
- llmClient
-
The LLM client to use for evaluation
- originalQuery
-
The original user query (for relevance checking)
- threshold
-
Minimum score to pass (default: 0.7)
Attributes
- Example
-
val guardrail = LLMQualityGuardrail(client, "What is Scala?")
agent.run(query, tools, outputGuardrails = Seq(guardrail))
- Companion
- object
- Supertypes
-
trait LLMGuardrail, trait OutputGuardrail, trait Guardrail[String], class Object, trait Matchable, class Any
Attributes
- Companion
- class
- Supertypes
-
class Object, trait Matchable, class Any
- Self type
-
LLMQualityGuardrail.type
LLMSafetyGuardrail
LLM-based content safety validation guardrail.
Uses an LLM to evaluate whether content is safe, appropriate, and non-harmful. This provides more nuanced safety checking than keyword-based filters.
Safety categories evaluated:
- Harmful or dangerous content
- Inappropriate or offensive language
- Misinformation or misleading claims
- Privacy violations
- Illegal activity promotion
Value parameters
- customCriteria
-
Optional additional safety criteria to check
- llmClient
-
The LLM client to use for evaluation
- threshold
-
Minimum score to pass (default: 0.8 - higher for safety)
Attributes
- Example
-
val guardrail = LLMSafetyGuardrail(client)
agent.run(query, tools, outputGuardrails = Seq(guardrail))
- Companion
- object
- Supertypes
-
trait LLMGuardrail, trait OutputGuardrail, trait Guardrail[String], class Object, trait Matchable, class Any
Attributes
- Companion
- class
- Supertypes
-
class Object, trait Matchable, class Any
- Self type
-
LLMSafetyGuardrail.type
LLMToneGuardrail
LLM-based tone validation guardrail.
Uses an LLM to evaluate whether content matches the specified tone(s). This is more accurate than the keyword-based ToneValidator for nuanced tone detection, but has higher latency due to the LLM API call.
Value parameters
- allowedTones
-
Set of acceptable tones (e.g., "professional", "friendly")
- llmClient
-
The LLM client to use for evaluation
- threshold
-
Minimum score to pass (default: 0.7)
Attributes
- Example
-
val guardrail = LLMToneGuardrail(
  client,
  Set("professional", "friendly"),
  threshold = 0.8
)
agent.run(query, tools, outputGuardrails = Seq(guardrail))
- Companion
- object
- Supertypes
-
trait LLMGuardrail, trait OutputGuardrail, trait Guardrail[String], class Object, trait Matchable, class Any
Attributes
- Companion
- class
- Supertypes
-
class Object, trait Matchable, class Any
- Self type
-
LLMToneGuardrail.type
LengthCheck
Validates string length is within bounds.
Can be used for both input and output validation. Ensures content is neither too short nor too long.
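Example usage (a hedged sketch; the parameter names follow the value parameters below):
// Reject responses shorter than 20 or longer than 2000 characters
val lengthCheck = LengthCheck(min = 20, max = 2000)
agent.run(query, tools, outputGuardrails = Seq(lengthCheck))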
Value parameters
- max
-
Maximum length (inclusive)
- min
-
Minimum length (inclusive)
Attributes
- Companion
- object
- Supertypes
-
trait OutputGuardrail, trait InputGuardrail, trait Guardrail[String], class Object, trait Matchable, class Any
Attributes
- Companion
- class
- Supertypes
-
class Object, trait Matchable, class Any
- Self type
-
LengthCheck.type
PIIDetector
Detects Personally Identifiable Information (PII) in text.
Uses regex patterns to detect common PII types including:
- Social Security Numbers (SSN)
- Credit Card Numbers
- Email Addresses
- Phone Numbers
- IP Addresses
- Passport Numbers
- Dates of Birth
Can be configured to:
- Block: Return error when PII is detected (default)
- Fix: Automatically mask PII and continue
- Warn: Log warning and allow processing to continue
Example usage:
// Block on PII detection
val strictDetector = PIIDetector()
// Mask PII automatically
val maskingDetector = PIIDetector(onFail = GuardrailAction.Fix)
// Detect only credit cards and SSNs
val financialDetector = PIIDetector(
  piiTypes = Seq(PIIType.CreditCard, PIIType.SSN)
)
Value parameters
- onFail
-
Action to take when PII is detected (default: Block)
- piiTypes
-
The types of PII to detect (default: SSN, CreditCard, Email, Phone)
Attributes
- Companion
- object
- Supertypes
-
trait OutputGuardrail, trait InputGuardrail, trait Guardrail[String], class Object, trait Matchable, class Any
Attributes
- Companion
- class
- Supertypes
-
class Object, trait Matchable, class Any
- Self type
-
PIIDetector.type
PIIMasker
Automatically masks Personally Identifiable Information (PII) in text.
Unlike PIIDetector (which can block or warn), PIIMasker always transforms the text by replacing detected PII with redaction placeholders.
This guardrail never blocks; it always allows processing to continue with sanitized text. Use this when you want to:
- Sanitize user input before sending to LLM
- Redact sensitive information from LLM outputs
- Preserve privacy while allowing queries to proceed
Masked text uses placeholders like [REDACTED_EMAIL], [REDACTED_SSN], etc.
Example usage:
// Mask all default PII types
val masker = PIIMasker()
// Mask only specific types
val emailMasker = PIIMasker(Seq(PIIType.Email, PIIType.Phone))
// Use with agent
agent.run(
  query = userInput,
  tools = tools,
  inputGuardrails = Seq(PIIMasker())
)
Value parameters
- piiTypes
-
The types of PII to mask (default: SSN, CreditCard, Email, Phone)
Attributes
- Companion
- object
- Supertypes
-
trait OutputGuardrail, trait InputGuardrail, trait Guardrail[String], class Object, trait Matchable, class Any
ProfanityFilter
Filters profanity and inappropriate content.
This is a basic implementation using a word list. For production, consider integrating with external APIs like:
- OpenAI Moderation API
- Google Perspective API
- Custom ML models
Can be used for both input and output validation.
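Example usage (a hedged sketch; the no-argument constructor and the collection type of customBadWords are assumptions):
// Default word list
val filter = ProfanityFilter()
// Extend the default list with domain-specific terms
val strict = ProfanityFilter(customBadWords = Set("badword1", "badword2"))
agent.run(query, tools, inputGuardrails = Seq(strict), outputGuardrails = Seq(strict))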
Value parameters
- caseSensitive
-
Whether matching should be case-sensitive
- customBadWords
-
Additional words to filter beyond the default list
Attributes
- Companion
- object
- Supertypes
-
trait OutputGuardrail, trait InputGuardrail, trait Guardrail[String], class Object, trait Matchable, class Any
Attributes
- Companion
- class
- Supertypes
-
class Object, trait Matchable, class Any
- Self type
-
ProfanityFilter.type
PromptInjectionDetector
Detects prompt injection attempts in user input.
Prompt injection attacks attempt to override system instructions, manipulate the AI's behavior, or extract sensitive information.
Detection categories:
- Instruction Override: "Ignore previous instructions", "forget your rules"
- Role Manipulation: "You are now DAN", "Act as a different AI"
- System Prompt Extraction: "What is your system prompt?", "Show your instructions"
- Jailbreak Phrases: Common patterns used in known jailbreaks
- Code/SQL Injection: Attempts to inject executable code
Example usage:
// Default: Block on injection detection
val detector = PromptInjectionDetector()
// Custom sensitivity (fewer false positives)
val relaxed = PromptInjectionDetector(
  sensitivity = InjectionSensitivity.Medium
)
// Use as input guardrail
agent.run(
  query = userInput,
  tools = tools,
  inputGuardrails = Seq(PromptInjectionDetector())
)
Value parameters
- onFail
-
Action to take when injection is detected (default: Block)
- patterns
-
Custom injection patterns to detect (in addition to defaults)
- sensitivity
-
Detection sensitivity level
Attributes
- Companion
- object
- Supertypes
Attributes
- Companion
- class
- Supertypes
-
class Object, trait Matchable, class Any
- Self type
-
PromptInjectionDetector.type
RegexValidator
Validates that content matches a regular expression.
Can be used for both input and output validation. Useful for enforcing format requirements like email addresses, phone numbers, or custom patterns.
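Example usage (a hedged sketch; whether pattern is a plain String or a compiled Regex, and whether errorMessage is an Option, are assumptions):
// Require the output to be a single ISO-8601 date
val dateFormat = RegexValidator(
  pattern = "\\d{4}-\\d{2}-\\d{2}",
  errorMessage = Some("Expected a date in YYYY-MM-DD format")
)
agent.run(query, tools, outputGuardrails = Seq(dateFormat))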
Value parameters
- errorMessage
-
Optional custom error message
- pattern
-
The regex pattern to match
Attributes
- Companion
- object
- Supertypes
-
trait OutputGuardrail, trait InputGuardrail, trait Guardrail[String], class Object, trait Matchable, class Any
Attributes
- Companion
- class
- Supertypes
-
class Object, trait Matchable, class Any
- Self type
-
RegexValidator.type
ToneValidator
Validates that output matches one of the allowed tones.
This is a simple keyword-based implementation. For production, consider using sentiment analysis APIs or ML models.
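Example usage (a hedged sketch; allowedTones is described below as a set of acceptable tones):
// Accept only professional or friendly responses
val toneCheck = ToneValidator(allowedTones = Set("professional", "friendly"))
agent.run(query, tools, outputGuardrails = Seq(toneCheck))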
Value parameters
- allowedTones
-
The set of acceptable tones
Attributes
- Companion
- object
- Supertypes
Attributes
- Companion
- class
- Supertypes
-
class Object, trait Matchable, class Any
- Self type
-
ToneValidator.type