SecretLeakGuardrail

org.llm4s.agent.guardrails.builtin.SecretLeakGuardrail
See the SecretLeakGuardrail companion object
class SecretLeakGuardrail(val secretTypes: Seq[SecretType], val onFail: GuardrailAction) extends InputGuardrail, OutputGuardrail

Detects and redacts secrets and credentials in LLM input and output.

Prevents two classes of leak:

  • Input leak: a user accidentally pastes an API key into a prompt that is then logged, cached, or forwarded to third-party LLM providers.
  • Output leak: the LLM echoes a secret back in its response (e.g. when asked to summarise a config file that contains credentials).

Detected credential types (defaults):

  • OpenAI API keys (sk-... / sk-proj-...)
  • Anthropic API keys (sk-ant-...)
  • Google API keys (AIza...)
  • Voyage API keys (pa-...)
  • Langfuse keys (pk-lf-... / sk-lf-...)
  • AWS Access Key IDs (AKIA...)
  • JWT tokens (eyJ...eyJ...sig)
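As a rough illustration of how prefix-based detection like this can work, the sketch below matches a few of the listed key formats with regular expressions. It does not use the library: the object name, the type labels (OPENAI_KEY and so on), and every pattern body and length bound are assumptions made for the example, not the rules SecretLeakGuardrail actually applies.

```scala
import scala.util.matching.Regex

// Hypothetical sketch: patterns are illustrative guesses keyed on the prefixes
// listed above, NOT the patterns SecretLeakGuardrail actually uses.
object SecretPatternSketch {
  // The broad sk- rule excludes the more specific sk-ant- and sk-lf- prefixes
  // so that each key is counted under one type only.
  val patterns: Seq[(String, Regex)] = Seq(
    "ANTHROPIC_KEY"     -> """sk-ant-[A-Za-z0-9_-]{20,}""".r,
    "LANGFUSE_KEY"      -> """[ps]k-lf-[A-Za-z0-9-]{20,}""".r,
    "OPENAI_KEY"        -> """sk-(?!ant-|lf-)[A-Za-z0-9_-]{20,}""".r,
    "GOOGLE_KEY"        -> """AIza[A-Za-z0-9_-]{35}""".r,
    "AWS_ACCESS_KEY_ID" -> """AKIA[A-Z0-9]{16}""".r
  )

  // True if any pattern matches at least once.
  def containsSecret(text: String): Boolean =
    patterns.exists { case (_, re) => re.findFirstIn(text).isDefined }

  // Count matches per type, dropping types with no matches.
  def summarize(text: String): Map[String, Int] =
    patterns
      .map { case (name, re) => name -> re.findAllIn(text).length }
      .filter { case (_, count) => count > 0 }
      .toMap
}
```

Real detectors usually need tighter patterns than these (word boundaries, checksum or structure checks for JWTs) to keep false positives down.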

Behaviour is controlled by GuardrailAction:

  • Block (default) – reject the text and return a Left error.
  • Fix – replace secrets with typed placeholders and continue (e.g. [REDACTED_OPENAI_KEY]).
  • Warn – allow the text through unchanged; the caller may inspect the Right and decide what to log.

Example usage:

// Block any input that contains a credential
agent.run(
 query          = userInput,
 tools          = tools,
 inputGuardrails = Seq(SecretLeakGuardrail())
)

// Mask secrets automatically and let the query proceed
agent.run(
 query          = userInput,
 tools          = tools,
 inputGuardrails = Seq(SecretLeakGuardrail.masking)
)

// Also scrub LLM responses
agent.run(
 query           = userInput,
 tools           = tools,
 outputGuardrails = Seq(SecretLeakGuardrail.masking)
)

Value parameters

onFail

Action to take when a secret is detected (default: Block)

secretTypes

Secret types to detect (default: all the credential types listed above)

Attributes

Companion
object
Supertypes
trait Guardrail[String]
class Object
trait Matchable
class Any

Members list

Value members

Concrete methods

def containsSecret(text: String): Boolean

True if the text contains at least one detectable secret.

def summarize(text: String): Map[String, Int]

Returns a map of secret-type name → count for the given text.

override def transform(input: String): String

Replace every detected secret with its type-specific placeholder.

e.g. sk-abc123... → [REDACTED_OPENAI_KEY]

This is called automatically when onFail == Fix. It can also be called directly for unconditional sanitisation regardless of the guardrail mode.
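The masking step can be pictured as a chain of regex replacements, one per secret type. The following is a minimal sketch under stated assumptions, not the library's implementation: MaskingSketch, its two patterns, and the [REDACTED_AWS_ACCESS_KEY_ID] placeholder are hypothetical; only [REDACTED_OPENAI_KEY] appears in the documentation above.

```scala
import scala.util.matching.Regex

// Hypothetical sketch of typed-placeholder masking. The patterns and the AWS
// placeholder name are assumptions made for this example.
object MaskingSketch {
  private val rules: Seq[(Regex, String)] = Seq(
    """sk-[A-Za-z0-9_-]{20,}""".r -> "[REDACTED_OPENAI_KEY]",
    """AKIA[A-Z0-9]{16}""".r      -> "[REDACTED_AWS_ACCESS_KEY_ID]"
  )

  // Apply each rule in turn; quoteReplacement keeps the placeholder literal
  // even if it ever contains '$' or '\'.
  def mask(text: String): String =
    rules.foldLeft(text) { case (acc, (re, placeholder)) =>
      re.replaceAllIn(acc, Regex.quoteReplacement(placeholder))
    }
}
```

Because the placeholders do not themselves match any rule, masking already-masked text leaves it unchanged in this sketch.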

def validate(value: String): Result[String]

Validate the text for secrets.

Returns:

  • Right(original) if no secrets are found, or if secrets are found and onFail == Warn
  • Right(masked) if secrets are found and onFail == Fix
  • Left(error) if secrets are found and onFail == Block
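The three outcomes can be modelled with a plain Either, as in the sketch below. Everything in it is a stand-in: the Action hierarchy only mirrors the Block / Fix / Warn modes, Either[String, String] replaces Result[String], and the single AWS-style pattern and the [REDACTED] placeholder are assumptions made for the example.

```scala
// Hypothetical stand-ins: Action mirrors the Block/Fix/Warn modes, and
// Either[String, String] stands in for Result[String]. Neither is the
// library's real type.
object ValidateSketch {
  sealed trait Action
  case object Block extends Action
  case object Fix   extends Action
  case object Warn  extends Action

  // Single illustrative pattern (assumed, not taken from the library).
  private val secret = """AKIA[A-Z0-9]{16}""".r

  def validate(text: String, onFail: Action): Either[String, String] =
    if (secret.findFirstIn(text).isEmpty) Right(text) // clean text always passes
    else
      onFail match {
        case Block => Left("secret detected")                      // reject
        case Fix   => Right(secret.replaceAllIn(text, "[REDACTED]")) // mask and continue
        case Warn  => Right(text)                                  // pass through unchanged
      }
}
```

Note that in this model a Warn result is indistinguishable from a clean result, so a caller that wants to log the event would need to run its own check on the returned text.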

Inherited methods

def andThen(other: Guardrail[String]): Guardrail[String]

Compose this guardrail with another sequentially.

The second guardrail runs only if this one passes.

Value parameters

other

The guardrail to run after this one

Returns

A composite guardrail that runs both in sequence

Inherited from:
Guardrail
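The "second runs only if the first passes" rule is the short-circuiting behaviour of Either's flatMap. The sketch below models it with a hypothetical Check trait; it is not the library's Guardrail trait, and the two example checks (noAwsKey, maxLength) are invented for illustration.

```scala
// Hypothetical mini-trait, not the library's Guardrail: it only models
// sequential composition via Either.flatMap.
object ComposeSketch {
  trait Check { self =>
    def validate(value: String): Either[String, String]

    // The second check sees the first check's output, and only on success.
    def andThen(other: Check): Check = new Check {
      def validate(value: String): Either[String, String] =
        self.validate(value).flatMap(v => other.validate(v))
    }
  }

  // Small helper to build a Check from a function.
  def check(f: String => Either[String, String]): Check = new Check {
    def validate(value: String): Either[String, String] = f(value)
  }

  val noAwsKey: Check  = check(v => if (v.contains("AKIA")) Left("secret detected") else Right(v))
  val maxLength: Check = check(v => if (v.length > 20) Left("too long") else Right(v))

  val combined: Check = noAwsKey.andThen(maxLength)
}
```

With this shape, an input that fails the first check reports the first error and never reaches the second check.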

Concrete fields

override val description: Option[String]

Optional description of what this guardrail validates.

val name: String

Name of this guardrail for logging and error messages.
