org.llm4s.agent.guardrails.builtin
Members list
Type members
Classlikes
InjectionCategory
Categories of prompt injection attacks.
Attributes
- Companion
- object
- Supertypes
-
class Object, trait Matchable, class Any
- Known subtypes
-
object CodeInjection, object DataExfiltration, object InstructionOverride, object Jailbreak, object RoleManipulation, object SystemPromptExtraction
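Example usage (a hedged sketch, not taken from the library; it assumes the subtype objects listed above are members of the companion object):
// Map a detected category to a short label for logging (illustrative only)
def describe(category: InjectionCategory): String = category match {
  case InjectionCategory.InstructionOverride    => "attempt to override system instructions"
  case InjectionCategory.SystemPromptExtraction => "attempt to extract the system prompt"
  case InjectionCategory.Jailbreak              => "known jailbreak phrasing"
  case other                                    => other.toString
}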
Attributes
- Companion
- trait
- Supertypes
-
trait Sum, trait Mirror, class Object, trait Matchable, class Any
- Self type
-
InjectionCategory.type
Match result from injection detection.
Attributes
- Supertypes
-
trait Serializable, trait Product, trait Equals, class Object, trait Matchable, class Any
InjectionPattern
Pattern for detecting a specific type of injection.
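Example usage (a hedged sketch; the parameter names follow the value parameters below, but whether regex is a scala.util.matching.Regex or a plain String, and whether the category objects are nested in the InjectionCategory companion, are assumptions):
// A custom pattern that could be supplied to PromptInjectionDetector via its patterns parameter
val base64Payload = InjectionPattern(
  name = "base64-payload",
  regex = "(?i)decode\\s+the\\s+following\\s+base64".r,
  category = InjectionCategory.CodeInjection,
  severity = 2
)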
Value parameters
- category
-
Type of injection attack
- name
-
Human-readable name for the pattern
- regex
-
Regular expression to match
- severity
-
Severity level (1=low, 2=medium, 3=high)
Attributes
- Companion
- object
- Supertypes
-
trait Serializable, trait Product, trait Equals, class Object, trait Matchable, class Any
Attributes
- Companion
- class
- Supertypes
-
trait Product, trait Mirror, class Object, trait Matchable, class Any
- Self type
-
InjectionPattern.type
Attributes
- Companion
- trait
- Supertypes
-
trait Sum, trait Mirror, class Object, trait Matchable, class Any
- Self type
-
InjectionSensitivity.type
JSONValidator
Validates that output is valid JSON matching an optional schema.
This guardrail ensures that LLM output is properly formatted JSON, which is useful when requesting structured data from the agent.
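Example usage (a hedged sketch; the no-argument constructor is an assumption):
// Require any syntactically valid JSON output; a schema can also be supplied
// via the optional schema parameter described below
val jsonCheck = JSONValidator()
agent.run(query, tools, outputGuardrails = Seq(jsonCheck))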
Value parameters
- schema
-
Optional JSON schema to validate against (minimal subset)
Attributes
- Companion
- object
- Supertypes
Attributes
- Companion
- class
- Supertypes
-
class Object, trait Matchable, class Any
- Self type
-
JSONValidator.type
LLMFactualityGuardrail
LLM-based factual accuracy validation guardrail.
Uses an LLM to evaluate whether content is factually accurate given a reference context. Useful for RAG applications where you want to ensure the model's response aligns with retrieved documents.
Value parameters
- llmClient
-
The LLM client to use for evaluation
- referenceContext
-
The reference text to fact-check against
- threshold
-
Minimum score to pass (default: 0.7)
Attributes
- Example
-
val context = "Paris is the capital of France. It has a population of 2.1 million."
val guardrail = LLMFactualityGuardrail(client, context, threshold = 0.8)
agent.run(query, tools, outputGuardrails = Seq(guardrail))
- Companion
- object
- Supertypes
-
trait LLMGuardrail, trait OutputGuardrail, trait Guardrail[String], class Object, trait Matchable, class Any
Attributes
- Companion
- class
- Supertypes
-
class Object, trait Matchable, class Any
- Self type
-
LLMFactualityGuardrail.type
LLMQualityGuardrail
LLM-based response quality validation guardrail.
Uses an LLM to evaluate the overall quality of a response including helpfulness, completeness, clarity, and relevance.
Value parameters
- llmClient
-
The LLM client to use for evaluation
- originalQuery
-
The original user query (for relevance checking)
- threshold
-
Minimum score to pass (default: 0.7)
Attributes
- Example
-
val guardrail = LLMQualityGuardrail(client, "What is Scala?")
agent.run(query, tools, outputGuardrails = Seq(guardrail))
- Companion
- object
- Supertypes
-
trait LLMGuardrail, trait OutputGuardrail, trait Guardrail[String], class Object, trait Matchable, class Any
Attributes
- Companion
- class
- Supertypes
-
class Object, trait Matchable, class Any
- Self type
-
LLMQualityGuardrail.type
LLMSafetyGuardrail
LLM-based content safety validation guardrail.
Uses an LLM to evaluate whether content is safe, appropriate, and non-harmful. This provides more nuanced safety checking than keyword-based filters.
Safety categories evaluated:
- Harmful or dangerous content
- Inappropriate or offensive language
- Misinformation or misleading claims
- Privacy violations
- Illegal activity promotion
Value parameters
- customCriteria
-
Optional additional safety criteria to check
- llmClient
-
The LLM client to use for evaluation
- threshold
-
Minimum score to pass (default: 0.8 - higher for safety)
Attributes
- Example
-
val guardrail = LLMSafetyGuardrail(client)
agent.run(query, tools, outputGuardrails = Seq(guardrail))
- Companion
- object
- Supertypes
-
trait LLMGuardrail, trait OutputGuardrail, trait Guardrail[String], class Object, trait Matchable, class Any
Attributes
- Companion
- class
- Supertypes
-
class Object, trait Matchable, class Any
- Self type
-
LLMSafetyGuardrail.type
LLMToneGuardrail
LLM-based tone validation guardrail.
Uses an LLM to evaluate whether content matches the specified tone(s). This is more accurate than the keyword-based ToneValidator for nuanced tone detection, but has higher latency due to the LLM API call.
Value parameters
- allowedTones
-
Set of acceptable tones (e.g., "professional", "friendly")
- llmClient
-
The LLM client to use for evaluation
- threshold
-
Minimum score to pass (default: 0.7)
Attributes
- Example
-
val guardrail = LLMToneGuardrail(
  client,
  Set("professional", "friendly"),
  threshold = 0.8
)
agent.run(query, tools, outputGuardrails = Seq(guardrail))
- Companion
- object
- Supertypes
-
trait LLMGuardrail, trait OutputGuardrail, trait Guardrail[String], class Object, trait Matchable, class Any
Attributes
- Companion
- class
- Supertypes
-
class Object, trait Matchable, class Any
- Self type
-
LLMToneGuardrail.type
LengthCheck
Validates string length is within bounds.
Can be used for both input and output validation. Ensures content is neither too short nor too long.
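Example usage (a hedged sketch; the parameter names follow the value parameters below):
// Reject responses shorter than 20 or longer than 2000 characters
val lengthCheck = LengthCheck(min = 20, max = 2000)
agent.run(query, tools, outputGuardrails = Seq(lengthCheck))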
Value parameters
- max
-
Maximum length (inclusive)
- min
-
Minimum length (inclusive)
Attributes
- Companion
- object
- Supertypes
-
trait OutputGuardrail, trait InputGuardrail, trait Guardrail[String], class Object, trait Matchable, class Any
Attributes
- Companion
- class
- Supertypes
-
class Object, trait Matchable, class Any
- Self type
-
LengthCheck.type
PIIDetector
Detects Personally Identifiable Information (PII) in text.
Uses regex patterns to detect common PII types including:
- Social Security Numbers (SSN)
- Credit Card Numbers
- Email Addresses
- Phone Numbers
- IP Addresses
- Passport Numbers
- Dates of Birth
Can be configured to:
- Block: Return error when PII is detected (default)
- Fix: Automatically mask PII and continue
- Warn: Log warning and allow processing to continue
Example usage:
// Block on PII detection
val strictDetector = PIIDetector()
// Mask PII automatically
val maskingDetector = PIIDetector(onFail = GuardrailAction.Fix)
// Detect only credit cards and SSNs
val financialDetector = PIIDetector(
  piiTypes = Seq(PIIType.CreditCard, PIIType.SSN)
)
Value parameters
- onFail
-
Action to take when PII is detected (default: Block)
- piiTypes
-
The types of PII to detect (default: SSN, CreditCard, Email, Phone)
Attributes
- Companion
- object
- Supertypes
-
trait OutputGuardrail, trait InputGuardrail, trait Guardrail[String], class Object, trait Matchable, class Any
Attributes
- Companion
- class
- Supertypes
-
class Object, trait Matchable, class Any
- Self type
-
PIIDetector.type
PIIMasker
Automatically masks Personally Identifiable Information (PII) in text.
Unlike PIIDetector (which can block or warn), PIIMasker always transforms the text by replacing detected PII with redaction placeholders.
This guardrail never blocks; it always allows processing to continue with sanitized text. Use this when you want to:
- Sanitize user input before sending to LLM
- Redact sensitive information from LLM outputs
- Preserve privacy while allowing queries to proceed
Masked text uses placeholders like [REDACTED_EMAIL], [REDACTED_SSN], etc.
Example usage:
// Mask all default PII types
val masker = PIIMasker()
// Mask only specific types
val emailMasker = PIIMasker(Seq(PIIType.Email, PIIType.Phone))
// Use with agent
agent.run(
  query = userInput,
  tools = tools,
  inputGuardrails = Seq(PIIMasker())
)
Value parameters
- piiTypes
-
The types of PII to mask (default: SSN, CreditCard, Email, Phone)
Attributes
- Companion
- object
- Supertypes
-
trait OutputGuardrail, trait InputGuardrail, trait Guardrail[String], class Object, trait Matchable, class Any
ProfanityFilter
Filters profanity and inappropriate content.
This is a basic implementation using a word list. For production, consider integrating with external APIs like:
- OpenAI Moderation API
- Google Perspective API
- Custom ML models
Can be used for both input and output validation.
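Example usage (a hedged sketch; the no-argument constructor and the collection type of customBadWords are assumptions):
// Default word list
val filter = ProfanityFilter()
// Extend the default list with domain-specific terms
val strict = ProfanityFilter(customBadWords = Set("badword1", "badword2"))
agent.run(query, tools, inputGuardrails = Seq(strict), outputGuardrails = Seq(strict))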
Value parameters
- caseSensitive
-
Whether matching should be case-sensitive
- customBadWords
-
Additional words to filter beyond the default list
Attributes
- Companion
- object
- Supertypes
-
trait OutputGuardrail, trait InputGuardrail, trait Guardrail[String], class Object, trait Matchable, class Any
Attributes
- Companion
- class
- Supertypes
-
class Object, trait Matchable, class Any
- Self type
-
ProfanityFilter.type
PromptInjectionDetector
Detects prompt injection attempts in user input.
Prompt injection attacks attempt to override system instructions, manipulate the AI's behavior, or extract sensitive information.
Detection categories:
- Instruction Override: "Ignore previous instructions", "forget your rules"
- Role Manipulation: "You are now DAN", "Act as a different AI"
- System Prompt Extraction: "What is your system prompt?", "Show your instructions"
- Jailbreak Phrases: Common patterns used in known jailbreaks
- Code/SQL Injection: Attempts to inject executable code
Example usage:
// Default: Block on injection detection
val detector = PromptInjectionDetector()
// Custom sensitivity (fewer false positives)
val relaxed = PromptInjectionDetector(
  sensitivity = InjectionSensitivity.Medium
)
// Use as input guardrail
agent.run(
  query = userInput,
  tools = tools,
  inputGuardrails = Seq(PromptInjectionDetector())
)
Value parameters
- onFail
-
Action to take when injection is detected (default: Block)
- patterns
-
Custom injection patterns to detect (in addition to defaults)
- sensitivity
-
Detection sensitivity level
Attributes
- Companion
- object
- Supertypes
Attributes
- Companion
- class
- Supertypes
-
class Object, trait Matchable, class Any
- Self type
-
PromptInjectionDetector.type
RegexValidator
Validates that content matches a regular expression.
Can be used for both input and output validation. Useful for enforcing format requirements like email addresses, phone numbers, or custom patterns.
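Example usage (a hedged sketch; whether pattern is a plain String or a compiled Regex, and whether errorMessage is an Option, are assumptions):
// Require the output to be a single ISO-8601 date
val dateFormat = RegexValidator(
  pattern = "\\d{4}-\\d{2}-\\d{2}",
  errorMessage = Some("Expected a date in YYYY-MM-DD format")
)
agent.run(query, tools, outputGuardrails = Seq(dateFormat))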
Value parameters
- errorMessage
-
Optional custom error message
- pattern
-
The regex pattern to match
Attributes
- Companion
- object
- Supertypes
-
trait OutputGuardrail, trait InputGuardrail, trait Guardrail[String], class Object, trait Matchable, class Any
Attributes
- Companion
- class
- Supertypes
-
class Object, trait Matchable, class Any
- Self type
-
RegexValidator.type
ToneValidator
Validates that output matches one of the allowed tones.
This is a simple keyword-based implementation. For production, consider using sentiment analysis APIs or ML models.
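Example usage (a hedged sketch; allowedTones is described below as a set of acceptable tones):
// Accept only professional or friendly responses
val toneCheck = ToneValidator(allowedTones = Set("professional", "friendly"))
agent.run(query, tools, outputGuardrails = Seq(toneCheck))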
Value parameters
- allowedTones
-
The set of acceptable tones
Attributes
- Companion
- object
- Supertypes
Attributes
- Companion
- class
- Supertypes
-
class Object, trait Matchable, class Any
- Self type
-
ToneValidator.type