Base trait for LLM-based guardrails (LLM-as-Judge pattern).
LLM guardrails use a language model to evaluate content against natural language criteria. This enables checking subjective qualities such as tone, factual accuracy, and safety that cannot easily be captured by deterministic rules.
Unlike function-based guardrails, LLM guardrails:
- Use natural language evaluation prompts
- Return a score between 0.0 and 1.0
- Pass if score >= threshold
- Can use a separate model for judging (to avoid self-evaluation bias)

A minimal sketch of how these members fit together is shown after this list.
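The sketch below is illustrative only, not the library's actual definition: it assumes a hypothetical LLMClient.complete method and a simplified evaluate that parses the judge's reply as a number.

trait LLMClient {
  // Assumed judge call: returns the raw text completion for a prompt.
  def complete(prompt: String): String
}

trait LLMGuardrail {
  // Human-readable name of the guardrail.
  def name: String
  // Judge client; may be a different model than the one that generated the content.
  def llmClient: LLMClient
  // Natural language criteria the judge scores the content against.
  def evaluationPrompt: String
  // Minimum score (0.0 to 1.0) required to pass.
  def threshold: Double

  // Ask the judge to score the content, then compare against the threshold.
  def evaluate(content: String): Boolean = {
    val reply = llmClient.complete(s"$evaluationPrompt\n\nContent:\n$content\n\nScore (0-1):")
    val score = reply.trim.toDoubleOption.getOrElse(0.0)
    score >= threshold
  }
}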
Attributes
llmClient: The LLM client used to call the judge model.
evaluationPrompt: Natural language criteria the judge scores content against.
threshold: Minimum score (0.0 to 1.0) required for the guardrail to pass.
name: Human-readable name of the guardrail.
Note
LLM guardrails have higher latency than function-based guardrails due to the LLM API call. Consider using them only when deterministic validation is insufficient.
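Following that advice, a hedged sketch of gating the judge call behind a cheap deterministic pre-check; the helper name and the length rule are illustrative assumptions, and evaluate is the sketched method above.

// Hypothetical helper: run a fast deterministic rule before paying for the LLM call.
def checkWithCheapRuleFirst(guardrail: LLMGuardrail, content: String): Boolean = {
  // Deterministic pre-check (illustrative): trivially short content fails outright.
  if (content.trim.length < 20) false
  // Only invoke the slower LLM-as-Judge evaluation when the cheap rule passes.
  else guardrail.evaluate(content)
}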
Example
class MyCustomLLMGuardrail(client: LLMClient) extends LLMGuardrail {
  // Judge client used to score responses (may differ from the generating model).
  val llmClient = client
  // Natural language criteria the judge evaluates against.
  val evaluationPrompt = "Rate if this response is helpful (0-1)"
  // Minimum score required to pass.
  val threshold = 0.7
  val name = "HelpfulnessGuardrail"
}
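Possible usage, assuming the sketched evaluate method above and a hypothetical OpenAIClient implementation of LLMClient:

val guardrail = new MyCustomLLMGuardrail(new OpenAIClient())
// True when the judge's score meets or exceeds the 0.7 threshold.
val passed = guardrail.evaluate("Photosynthesis converts light energy into chemical energy stored in glucose.")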