PromptInjectionDetector

org.llm4s.agent.guardrails.builtin.PromptInjectionDetector
See the PromptInjectionDetector companion object
class PromptInjectionDetector(val patterns: Seq[InjectionPattern], val sensitivity: InjectionSensitivity, val onFail: GuardrailAction) extends InputGuardrail

Detects prompt injection attempts in user input.

Prompt injection attacks attempt to override system instructions, manipulate the AI's behavior, or extract sensitive information.

Detection categories:

  • Instruction Override: "Ignore previous instructions", "forget your rules"
  • Role Manipulation: "You are now DAN", "Act as a different AI"
  • System Prompt Extraction: "What is your system prompt?", "Show your instructions"
  • Jailbreak Phrases: Common patterns used in known jailbreaks
  • Code/SQL Injection: Attempts to inject executable code
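The categories above can be pictured as a set of labelled regular expressions that are checked against the input. The following is only a minimal sketch under stated assumptions: SketchPattern, sketchPatterns and matchedCategories are hypothetical names, and the regexes are illustrative, not the library's built-in patterns or the real InjectionPattern type.

```scala
import scala.util.matching.Regex

// Hypothetical stand-in for InjectionPattern; the real type's fields are not shown on this page.
final case class SketchPattern(category: String, regex: Regex)

// Illustrative patterns for three of the categories listed above (assumed, not the library defaults).
val sketchPatterns: Seq[SketchPattern] = Seq(
  SketchPattern("Instruction Override", "(?i)\\b(ignore|forget)\\b.{0,40}\\b(instructions|rules)\\b".r),
  SketchPattern("Role Manipulation", "(?i)\\b(you are now|act as)\\b".r),
  SketchPattern("System Prompt Extraction", "(?i)\\b(system prompt|show your instructions)\\b".r)
)

// Returns the categories whose pattern occurs anywhere in the input.
def matchedCategories(input: String): Seq[String] =
  sketchPatterns.collect {
    case p if p.regex.findFirstIn(input).isDefined => p.category
  }
```

An input with no matching category yields an empty sequence, which a detector could treat as "no injection found".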

Example usage:

import org.llm4s.agent.guardrails.builtin.PromptInjectionDetector

// InjectionSensitivity, agent, userInput and tools are assumed to be in scope.

// Default: Block on injection detection
val detector = PromptInjectionDetector()

// Custom sensitivity (fewer false positives)
val relaxed = PromptInjectionDetector(
  sensitivity = InjectionSensitivity.Medium
)

// Use as input guardrail
agent.run(
  query = userInput,
  tools = tools,
  inputGuardrails = Seq(PromptInjectionDetector())
)

Value parameters

onFail

Action to take when injection is detected (default: Block)

patterns

Custom injection patterns to detect (in addition to defaults)

sensitivity

Detection sensitivity level

Attributes

Companion
object
Supertypes
trait Guardrail[String]
class Object
trait Matchable
class Any

Members list

Value members

Concrete methods

def validate(value: String): Result[String]

Validate a value.

This is a pure function: it must have no side effects, and the same input always produces the same output.

Value parameters

value

The value to validate

Attributes

Returns

Right(value) if valid, Left(error) if invalid
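As a rough illustration of the contract described here, the sketch below returns Right(value) for accepted input and Left(error) otherwise, and depends only on its argument. SketchResult and sketchValidate are hypothetical names; the library's Result type is assumed to behave like Either, and a plain String stands in for its error type.

```scala
// Stand-in for the library's Result type: this page only states that validate
// returns Right(value) or Left(error), so a String error is assumed here.
type SketchResult[A] = Either[String, A]

// Pure: the outcome depends only on `value`, with no logging, I/O, or mutable state.
def sketchValidate(value: String): SketchResult[String] =
  if (value.toLowerCase.contains("ignore previous instructions"))
    Left("possible prompt injection: instruction override")
  else
    Right(value)
```

Because the function is pure, calling it twice with the same input gives equal results, which makes a guardrail easy to test in isolation.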

Inherited methods

def andThen(other: Guardrail[String]): Guardrail[String]

Compose this guardrail with another sequentially.

The second guardrail runs only if this one passes.

Value parameters

other

The guardrail to run after this one

Attributes

Returns

A composite guardrail that runs both in sequence

Inherited from:
Guardrail

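Sequential composition of this kind can be sketched with Either.flatMap, which skips the second step when the first returns Left. SketchGuardrail, nonEmpty, maxLength and combined are hypothetical names; the real Guardrail trait has more members, and its andThen may differ in detail.

```scala
// Minimal stand-in trait; the real Guardrail[A] has more members (name, description, ...).
trait SketchGuardrail[A] { self =>
  def validate(value: A): Either[String, A]

  // Run `other` only when this guardrail returns Right.
  def andThen(other: SketchGuardrail[A]): SketchGuardrail[A] =
    new SketchGuardrail[A] {
      def validate(value: A): Either[String, A] =
        self.validate(value).flatMap(other.validate)
    }
}

// Rejects blank input.
val nonEmpty: SketchGuardrail[String] = new SketchGuardrail[String] {
  def validate(value: String): Either[String, String] =
    if (value.trim.isEmpty) Left("input is empty") else Right(value)
}

// Rejects input longer than 20 characters.
val maxLength: SketchGuardrail[String] = new SketchGuardrail[String] {
  def validate(value: String): Either[String, String] =
    if (value.length > 20) Left("input is too long") else Right(value)
}

val combined: SketchGuardrail[String] = nonEmpty.andThen(maxLength)
```

In this sketch the error reported is always the one from the first guardrail that fails, since later guardrails never run.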
def transform(input: String): String

Optional: Transform the input after validation. Default is identity (no transformation).

Value parameters

input

The validated input

Attributes

Returns

The transformed input

Inherited from:
InputGuardrail
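One way to picture transform is as a post-validation hook that defaults to the identity function. The sketch below assumes a caller that validates first and applies transform only to accepted input; SketchInputGuardrail, TrimmingGuardrail and validateThenTransform are hypothetical names, and how the library actually invokes transform is not stated on this page.

```scala
// Minimal stand-in for InputGuardrail, with a String error type assumed.
trait SketchInputGuardrail {
  def validate(value: String): Either[String, String]

  // Default: return the validated input unchanged.
  def transform(input: String): String = input
}

// Example guardrail that overrides transform to trim surrounding whitespace.
object TrimmingGuardrail extends SketchInputGuardrail {
  def validate(value: String): Either[String, String] =
    if (value.trim.nonEmpty) Right(value) else Left("input is empty")

  override def transform(input: String): String = input.trim
}

// Hypothetical helper showing one way a caller might combine the two steps.
def validateThenTransform(g: SketchInputGuardrail, input: String): Either[String, String] =
  g.validate(input).map(g.transform)
```

Keeping transform separate from validate leaves validate free of any rewriting logic, so it stays a simple accept-or-reject check.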

Concrete fields

override val description: Option[String]

Optional description of what this guardrail validates.

val name: String

Name of this guardrail for logging and error messages.
