org.llm4s.agent.guardrails.builtin.PromptInjectionDetector
See the PromptInjectionDetector companion object
class PromptInjectionDetector(val patterns: Seq[InjectionPattern], val sensitivity: InjectionSensitivity, val onFail: GuardrailAction) extends InputGuardrail
Detects prompt injection attempts in user input.
Prompt injection attacks try to override system instructions, manipulate the AI's behavior, or extract sensitive information.
Detection categories (a custom-pattern sketch follows this list):
- Instruction Override: "Ignore previous instructions", "forget your rules"
- Role Manipulation: "You are now DAN", "Act as a different AI"
- System Prompt Extraction: "What is your system prompt?", "Show your instructions"
- Jailbreak Phrases: Common patterns used in known jailbreaks
- Code/SQL Injection: Attempts to inject executable code
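The built-in patterns cover the categories above; the patterns parameter lets an application flag additional phrase families on top of the defaults. The sketch below is hedged: it assumes InjectionPattern can be constructed from a category label and a regular-expression string, which this page does not confirm, so check the InjectionPattern documentation for the actual factory before copying it.
// Hedged sketch: the InjectionPattern(label, regex) shape is an assumption,
// not the confirmed constructor; only the patterns parameter itself is documented.
val encodedPayload = InjectionPattern(
  "encoded-payload",                          // hypothetical category label
  """(?i)decode\s+the\s+following\s+base64""" // phrase family to flag
)
// Added on top of the default pattern set (per the patterns parameter docs)
val detector = PromptInjectionDetector(patterns = Seq(encodedPayload))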
Example usage:
// Default: Block on injection detection
val detector = PromptInjectionDetector()
// Custom sensitivity (fewer false positives)
val relaxed = PromptInjectionDetector(
  sensitivity = InjectionSensitivity.Medium
)
// Use as input guardrail
agent.run(
  query = userInput,
  tools = tools,
  inputGuardrails = Seq(PromptInjectionDetector())
)
Value parameters (combined in the sketch after this list):
- onFail: Action to take when injection is detected (default: Block)
- patterns: Custom injection patterns to detect (in addition to the defaults)
- sensitivity: Detection sensitivity level
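All three parameters can be combined in one construction. The sketch below uses only values visible on this page: InjectionSensitivity.Medium from the example above and GuardrailAction.Block, which matches the documented default for onFail; other enum members are not confirmed here and should be checked against their own documentation.
// Combining the documented parameters; values other than Medium and Block
// are not shown on this page, so treat this as a hedged configuration sketch.
val guardrail = PromptInjectionDetector(
  patterns    = Seq.empty,                   // no custom additions; built-in patterns still apply
  sensitivity = InjectionSensitivity.Medium, // fewer false positives
  onFail      = GuardrailAction.Block        // the documented default action
)
agent.run(
  query = userInput,
  tools = tools,
  inputGuardrails = Seq(guardrail)
)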