Guardrails Configuration
Configuring safety checks and filtration policies
The Filtration Gateway enforces safety policies through a pipeline of guardrails. These checks happen before the prompt reaches the LLM (Input Guardrails) and after the LLM generates a response (Output Guardrails).
Available Guardrails
1. PII Redaction
Automatically detects and sanitizes Personally Identifiable Information (PII) such as emails, phone numbers, and addresses.
- Provider: Microsoft Presidio (Local)
- Config: Enabled by default via `GUARDRAIL_PII_DETECTION_ENABLED=true` (see the sketch below)
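For context, local Presidio-based redaction works roughly as follows. This is a minimal sketch using the presidio-analyzer and presidio-anonymizer packages, not the gateway's actual internals:

```python
from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine

analyzer = AnalyzerEngine()
anonymizer = AnonymizerEngine()

text = "Contact me at jane.doe@example.com or +1-555-010-9999."

# Detect PII spans, then replace each one with a typed placeholder
# such as <EMAIL_ADDRESS> or <PHONE_NUMBER> (Presidio's default behavior).
results = analyzer.analyze(text=text, language="en")
redacted = anonymizer.anonymize(text=text, analyzer_results=results)

print(redacted.text)
```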
2. Prompt Injection Detection
Detects attempts to bypass system instructions or "jailbreak" the model.
- Provider: Configurable (Lakera Guard, LLM Guard, etc.)
- Config: Managed via Dashboard > Settings > Guardrails
3. Toxicity Detection
Evaluates content for hate speech, harassment, and explicit material.
- Provider: Configurable (Llama Guard via Groq, LLM Guard, etc.)
- Config: Managed via Dashboard > Settings > Guardrails
Note: Provider API keys (Lakera, Groq, etc.) are now configured through the Dashboard, not environment variables.
Performance Optimization
The Filtration Gateway runs safety checks in parallel to minimize latency.
- Concurrent Scanning: PII redaction and Input Guardrails (e.g., Prompt Injection) run in parallel using `asyncio.gather`, as sketched below.
- Latency Impact: This reduces the overhead of safety checks by approximately 40%, ensuring that robust security doesn't compromise user experience.
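A minimal sketch of this pattern, with stub scanners standing in for real providers (all names and timings here are illustrative):

```python
import asyncio
import time

# Stub scanners simulating provider latency; in the gateway these would be
# real GuardrailProvider.scan_input calls.
async def scan_pii(text: str) -> str:
    await asyncio.sleep(0.05)
    return "pii: ok"

async def scan_injection(text: str) -> str:
    await asyncio.sleep(0.05)
    return "injection: ok"

async def main() -> None:
    start = time.perf_counter()
    # Both checks run concurrently, so total latency is roughly the
    # slowest single check rather than the sum of all checks.
    results = await asyncio.gather(scan_pii("hi"), scan_injection("hi"))
    print(results, f"elapsed: {time.perf_counter() - start:.3f}s")

asyncio.run(main())
```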
Configuration
Guardrails are configured via the policy engine. A typical policy definition looks like this:
```json
{
  "policy_id": "strict_safety",
  "input_guardrails": ["pii", "jailbreak", "toxicity"],
  "output_guardrails": ["toxicity"],
  "threshold": 0.8
}
```

Policies can be assigned per-API key in the Dashboard.
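The exact threshold semantics aren't spelled out here; a plausible reading is that a request is blocked when any violation scores at or above the policy threshold. A purely illustrative check:

```python
# Illustrative only: assumes a request is blocked when any violation's
# score meets or exceeds the policy threshold. The gateway's real
# evaluation logic may differ.
def is_blocked(violations: list[dict], threshold: float) -> bool:
    return any(v["score"] >= threshold for v in violations)

violations = [{"type": "toxicity", "score": 0.91}]
print(is_blocked(violations, threshold=0.8))  # True -> blocked
```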
Adding Custom Guardrails
To add a custom guardrail, you must implement the `GuardrailProvider` interface and register it with the engine.
1. Create the Provider
Create a new file in `package/src/inferia/services/filtration/guardrail/providers/` (e.g., `regex_provider.py`).
```python
from typing import Any, Dict, Optional

from ..models import GuardrailResult, Violation
from .base import GuardrailProvider


class RegexProvider(GuardrailProvider):
    @property
    def name(self) -> str:
        return "regex-guard"

    async def scan_input(
        self,
        text: str,
        user_id: str,
        config: Dict[str, Any],
        metadata: Optional[Dict[str, Any]] = None,
    ) -> GuardrailResult:
        # Example: block any prompt containing a specific pattern.
        if "forbidden_param" in text:
            return GuardrailResult(
                is_valid=False,
                violations=[
                    Violation(
                        type="custom",
                        score=1.0,
                        message="Forbidden pattern detected",
                    )
                ],
            )
        return GuardrailResult(is_valid=True, sanitized_text=text)
    async def scan_output(
        self,
        text: str,
        output: str,
        user_id: str,
        config: Dict[str, Any],
        metadata: Optional[Dict[str, Any]] = None,
    ) -> GuardrailResult:
        # No output filtering in this example; pass the response through.
        return GuardrailResult(is_valid=True, sanitized_text=output)
```
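Before wiring the provider into the engine, you can exercise it directly. A quick smoke test, assuming the inferia package is importable from your environment:

```python
import asyncio

from inferia.services.filtration.guardrail.providers.regex_provider import (
    RegexProvider,
)

async def demo() -> None:
    provider = RegexProvider()
    result = await provider.scan_input(
        "please include forbidden_param in the query",
        user_id="demo-user",
        config={},
    )
    print(result.is_valid)  # False: the blocked pattern was detected

asyncio.run(demo())
```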
2. Register the Provider
Update `package/src/inferia/services/filtration/guardrail/engine.py` to initialize your new provider.
```python
from .providers.regex_provider import RegexProvider


class GuardrailEngine:
    def _load_providers(self):
        # ... existing providers ...

        # Register the new provider under its reported name.
        regex = RegexProvider()
        self.providers[regex.name] = regex
```
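Once registered, the provider should be addressable by its `name` in policy definitions, e.g. `"input_guardrails": ["regex-guard"]` (assuming the policy engine resolves guardrails by provider name).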
Environment Variables

| Variable | Description | Default |
|---|---|---|
| `GUARDRAIL_PII_DETECTION_ENABLED` | Enable PII detection | `true` |
| `GUARDRAIL_ENABLE_GUARDRAILS` | Master switch for guardrails | `true` |
| `GUARDRAIL_DEFAULT_GUARDRAIL_ENGINE` | Default safety provider | `llm-guard` |
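For local development, these can presumably be set in a `.env` file or exported in the shell; the values below mirror the defaults from the table:

```bash
# .env (defaults shown)
GUARDRAIL_ENABLE_GUARDRAILS=true
GUARDRAIL_PII_DETECTION_ENABLED=true
GUARDRAIL_DEFAULT_GUARDRAIL_ENGINE=llm-guard
```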
