# Core Features

Core capabilities of InferiaLLM
InferiaLLM provides a comprehensive set of features for managing LLM infrastructure in production.
## Inference

### OpenAI-Compatible API

Drop-in replacement for the OpenAI API with full streaming support.
```python
from openai import OpenAI

# Point the client at the InferiaLLM gateway instead of api.openai.com
client = OpenAI(
    base_url="http://localhost:8001/v1",
    api_key="sk-inferia-..."
)

response = client.chat.completions.create(
    model="llama-3-8b",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True
)

# With stream=True, the response is an iterator of chunks
for chunk in response:
    print(chunk.choices[0].delta.content or "", end="")
```

### Automatic Features
- Prompt Templates: System prompts injected automatically
- RAG Context: Retrieved documents from Knowledge Base
- Guardrails: Safety scanning per deployment config
## Compute Orchestration

### Multi-Provider Support
| Provider | Type | Description |
|---|---|---|
| Kubernetes | On-Prem/Cloud | Standard GPU clusters |
| SkyPilot | Multi-Cloud | AWS, GCP, Azure VMs |
| Nosana | DePIN | Decentralized GPU network |
| Akash | DePIN | Decentralized compute |
### Deployment Management
- Create, start, stop, terminate deployments
- Monitor replica status and health
- View inference and terminal logs
- Configure rate limits per deployment
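Per-deployment rate limiting is commonly implemented as a token bucket; the sketch below is purely illustrative (the class name and parameters are not InferiaLLM's actual implementation):

```python
import time

class TokenBucket:
    """Illustrative per-deployment rate limiter (not InferiaLLM's actual code)."""

    def __init__(self, rate_per_sec, burst):
        self.rate = rate_per_sec       # tokens refilled per second
        self.capacity = burst          # maximum burst size
        self.tokens = float(burst)
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A bucket with `burst=2` admits two back-to-back requests, then rejects until tokens refill.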
## Security & Access Control

### Authentication
- JWT-based authentication
- TOTP two-factor authentication (2FA)
- Invitation-based onboarding
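TOTP 2FA codes follow RFC 6238: an HMAC-SHA1 over a 30-second time counter, dynamically truncated to a short digit code. A minimal stdlib sketch (not InferiaLLM's implementation):

```python
import base64
import hmac
import struct
import time

def totp(secret_b32, at=None, digits=6, step=30):
    """RFC 6238 TOTP: HMAC-SHA1 over the current 30-second time counter."""
    key = base64.b32decode(secret_b32, casefold=True)
    counter = int((time.time() if at is None else at) // step)
    mac = hmac.new(key, struct.pack(">Q", counter), "sha1").digest()
    offset = mac[-1] & 0x0F  # dynamic truncation (RFC 4226)
    code = (struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF) % 10 ** digits
    return str(code).zfill(digits)
```

The server verifies the submitted code against the same computation, typically accepting the adjacent time steps to tolerate clock drift.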
### RBAC (Role-Based Access Control)
| Role | Capabilities |
|---|---|
| Admin | Full access, user management |
| Developer | Deployments, API keys, configs |
| User | Read access, limited API keys |
| Guest | View only |
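An RBAC check reduces to a role-to-permission lookup. The mapping below mirrors the table above, but the permission strings are illustrative, not InferiaLLM's actual schema:

```python
# Illustrative role-capability mapping based on the table above; the actual
# permission identifiers in InferiaLLM may differ.
ROLE_PERMISSIONS = {
    "guest": {"read"},
    "user": {"read", "api_keys:limited"},
    "developer": {"read", "api_keys:limited", "api_keys:full",
                  "deployments", "configs"},
    "admin": {"read", "api_keys:limited", "api_keys:full",
              "deployments", "configs", "users"},
}

def can(role, permission):
    """Return True if the given role grants the permission."""
    return permission in ROLE_PERMISSIONS.get(role, set())
```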
### API Keys
- Generate scoped API keys per deployment
- Automatic rotation support
- Usage tracking and quotas
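A scoped key typically pairs a random token with a server-side record binding it to a deployment and scope list, storing only a hash of the token. A hedged sketch (the `sk-inferia-` prefix comes from the example above; the record layout is an assumption, not InferiaLLM's schema):

```python
import hashlib
import secrets

def generate_api_key(deployment_id, scopes):
    """Generate a scoped API key; only the hash is stored server-side.

    Illustrative sketch -- the record fields are assumptions, not
    InferiaLLM's actual storage schema.
    """
    token = "sk-inferia-" + secrets.token_urlsafe(32)
    record = {
        "deployment_id": deployment_id,
        "scopes": scopes,
        # Store a hash, never the plaintext key
        "key_hash": hashlib.sha256(token.encode()).hexdigest(),
    }
    return token, record
```

The plaintext token is shown to the caller once; later requests are verified by hashing the presented key and comparing against the stored digest.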
## Guardrails

### Safety Scanners
- PII Detection: Redact emails, phone numbers, SSNs
- Toxicity Filter: Block harmful content
- Prompt Injection: Detect jailbreak attempts
- Secret Detection: Prevent API key leakage
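At its simplest, PII redaction substitutes placeholder labels for matched patterns. The regexes below are deliberately naive stand-ins; production scanners such as LLM Guard use far more robust detectors:

```python
import re

# Illustrative redaction patterns -- real scanners handle many more formats.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def redact(text):
    """Replace each detected PII span with its category label."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text
```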
### Providers
- LLM Guard (Local)
- Llama Guard (Groq API)
- Lakera Guard (API)
## Knowledge Base (RAG)

### Document Management
- Upload PDF, DOCX, TXT files
- Automatic chunking and embedding
- ChromaDB vector storage
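Automatic chunking usually splits documents into fixed-size windows with some overlap so context isn't lost at chunk boundaries. A minimal sketch with illustrative defaults (not InferiaLLM's actual chunking strategy):

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into overlapping fixed-size chunks.

    The sizes are illustrative defaults; each chunk would then be
    embedded and stored in the vector database.
    """
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Step forward by less than chunk_size so chunks overlap
        start += chunk_size - overlap
    return chunks
```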
### Deployment Integration
- Link collections to deployments
- Automatic context retrieval
- Configurable chunk count
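Under the hood, context retrieval boils down to ranking stored chunk embeddings by similarity to the query embedding and taking the top k (ChromaDB performs this for InferiaLLM; the toy two-dimensional vectors below are purely illustrative):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k_chunks(query_vec, chunks, k=3):
    """Rank (embedding, text) pairs by similarity to the query embedding."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[0]), reverse=True)
    return [text for _, text in ranked[:k]]
```

The selected chunk texts are then injected into the prompt as the retrieved context.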
## Prompt Templates

### Features
- Jinja2 templating engine
- Variable injection
- Version management
- Per-deployment assignment
### Variables

- `{{user_message}}` - Current user input
- `{{context}}` - RAG-retrieved content
- Custom variables via API
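InferiaLLM renders these with Jinja2; the stand-in below only demonstrates the substitution behavior with a stdlib regex, not the actual engine:

```python
import re

def render(template, variables):
    """Minimal {{var}} substitution; InferiaLLM itself uses Jinja2.

    Unknown variables are left in place rather than raising.
    """
    return re.sub(
        r"\{\{\s*(\w+)\s*\}\}",
        lambda m: str(variables.get(m.group(1), m.group(0))),
        template,
    )
```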
## Observability

### Inference Logs
- Request/response pairs
- Token usage and latency
- Model and deployment metadata
### Audit Logs
- All administrative actions
- Security events
- Immutable history
### Metrics
- Prometheus-compatible export
- Request latency (p50, p95, p99)
- Token throughput
- Error rates
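The p50/p95/p99 latency figures are order statistics over a window of request latencies; a stdlib sketch of the computation (illustrative, not the actual Prometheus exporter):

```python
import statistics

def latency_percentiles(samples_ms):
    """Compute p50/p95/p99 over a list of request latencies in milliseconds."""
    qs = statistics.quantiles(samples_ms, n=100)  # 99 cut points
    return {"p50": qs[49], "p95": qs[94], "p99": qs[98]}
```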
