LLMGuardrail

org.llm4s.agent.guardrails.LLMGuardrail
See the LLMGuardrail companion object

Base trait for LLM-based guardrails (LLM-as-Judge pattern).

LLM guardrails use a language model to evaluate content against natural language criteria. This enables validation of subjective qualities like tone, factual accuracy, and safety that cannot be easily validated with deterministic rules.

Unlike function-based guardrails, LLM guardrails:

  • Use natural language evaluation prompts
  • Return a score between 0.0 and 1.0
  • Pass if score >= threshold
  • Can use a separate model for judging (to avoid self-evaluation bias)

Attributes

Note

LLM guardrails have higher latency than function-based guardrails due to the LLM API call. Consider using them only when deterministic validation is insufficient.

Example
class MyCustomLLMGuardrail(client: LLMClient) extends LLMGuardrail {
  val llmClient = client
  val evaluationPrompt = "Rate if this response is helpful (0-1)"
  override val threshold = 0.7 // threshold is concrete, so overriding requires `override`
  val name = "HelpfulnessGuardrail"
}
Companion
object
Supertypes
trait Guardrail[String]
class Object
trait Matchable
class Any

Members list

Value members

Abstract methods

def evaluationPrompt: String

Natural language prompt describing the evaluation criteria.

The prompt should instruct the model to return a score between 0 and 1. The content being evaluated will be provided separately.

Attributes

Example

"Rate if this response is professional in tone. Return only a number between 0 and 1."

def llmClient: LLMClient

The LLM client to use for evaluation. Can be the same client used by the agent or a different one.
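For instance, to reduce self-evaluation bias, the judge can be backed by a client other than the one driving the agent. A minimal sketch (the guardrail class is hypothetical; the prompt follows the example above):

// Judge with a model different from the agent's to avoid self-evaluation bias.
class ToneGuardrail(judgeClient: LLMClient) extends LLMGuardrail {
  val llmClient = judgeClient
  val evaluationPrompt =
    "Rate if this response is professional in tone. Return only a number between 0 and 1."
  val name = "ToneGuardrail"
}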

Concrete methods

Optional completion options for the judge LLM call. Override to customize temperature, max tokens, etc.

def threshold: Double

Minimum score required to pass validation (0.0 to 1.0). Default is 0.7 (70% confidence).
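A stricter guardrail can simply raise the threshold. A sketch (the subclass is hypothetical, shown only to illustrate the override):

// Hypothetical strict variant: requires 90% judge confidence to pass.
class StrictSafetyGuardrail(val llmClient: LLMClient) extends LLMGuardrail {
  val evaluationPrompt =
    "Rate if this response is safe for a general audience. Return only a number between 0 and 1."
  override val threshold = 0.9 // stricter than the 0.7 default
  val name = "StrictSafetyGuardrail"
}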

override def validate(value: String): Result[String]

Validate content using the LLM as a judge.

The implementation:

  1. Constructs a prompt with evaluation criteria and content
  2. Calls the LLM to get a score
  3. Parses the score and compares to threshold
  4. Returns success if score >= threshold, error otherwise

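Conceptually, the flow resembles the following self-contained sketch. This is not the library's implementation: askJudge stands in for the real LLMClient call, and error handling is simplified to Either[String, _].

// Conceptual sketch of the LLM-as-Judge validation flow.
def validateSketch(
    value: String,
    evaluationPrompt: String,
    threshold: Double,
    askJudge: String => Either[String, String] // prompt => raw model reply (stand-in)
): Either[String, String] = {
  // 1. Construct a prompt combining the criteria and the content under review
  val prompt =
    s"""$evaluationPrompt
       |
       |Content to evaluate:
       |$value
       |
       |Return only a number between 0 and 1.""".stripMargin

  for {
    // 2. Call the judge model
    reply <- askJudge(prompt)
    // 3. Parse the numeric score from the reply
    score <- reply.trim.toDoubleOption.toRight(s"Unparseable judge score: $reply")
    // 4. Pass the original value through only if the score meets the threshold
    result <-
      if (score >= threshold) Right(value)
      else Left(f"Judge score $score%.2f below threshold $threshold%.2f")
  } yield result
}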

Inherited methods

def andThen(other: Guardrail[String]): Guardrail[String]

Compose this guardrail with another sequentially.

The second guardrail runs only if this one passes.

Value parameters

other

The guardrail to run after this one

Attributes

Returns

A composite guardrail that runs both in sequence

Inherited from:
Guardrail
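For example, a cheap deterministic check can gate an expensive LLM judge. In this sketch, lengthCheck and helpfulnessJudge are hypothetical guardrail values assumed to be defined elsewhere:

// helpfulnessJudge only runs if lengthCheck passes,
// saving an LLM call on obviously bad output.
val guarded: Guardrail[String] = lengthCheck.andThen(helpfulnessJudge)
val checked = guarded.validate(response) // Result[String]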
def description: Option[String]

Optional description of what this guardrail validates.

Attributes

Inherited from:
Guardrail
def transform(output: String): String

Optional: Transform the output after validation. Default is identity (no transformation).

Value parameters

output

The validated output

Attributes

Returns

The transformed output

Inherited from:
OutputGuardrail
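A subclass might override this to normalize output once it has passed, for instance (illustrative sketch):

// Illustrative: trim surrounding whitespace once validation has succeeded.
override def transform(output: String): String = output.trim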

Inherited and Abstract methods

def name: String

Name of this guardrail for logging and error messages.

Attributes

Inherited from:
Guardrail