InferiaLLM
Developer Guide

System Architecture

A deep dive into the InferiaLLM architecture

InferiaLLM uses a microservices architecture split into two planes:

  • Data Plane – Handles inference traffic (North-South via REST/HTTP)
  • Control Plane – Manages execution policy and routing (East-West via gRPC)

High-Level Diagram

Component Details

Inference Gateway (Port 8001)

The Inference Gateway is a stateless, high-performance proxy that implements the OpenAI API specification.

Responsibilities:

  • Request validation and normalization
  • Context resolution via Filtration Gateway
  • Rate limiting and quota checks
  • Input/output guardrail enforcement
  • Prompt processing (RAG, templates)
  • Upstream LLM routing and streaming

Endpoint: POST /v1/chat/completions
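
Because the gateway implements the OpenAI specification, the stock OpenAI Python client can target it directly. A minimal sketch, assuming the gateway runs locally; the host, API key, and model name below are placeholders, not documented values:

  from openai import OpenAI

  # Point the standard OpenAI SDK at the Inference Gateway (port 8001).
  client = OpenAI(
      base_url="http://localhost:8001/v1",
      api_key="YOUR_INFERIA_API_KEY",
  )

  # Stream a chat completion; the model name is a hypothetical deployment.
  stream = client.chat.completions.create(
      model="llama-3-8b-instruct",
      messages=[{"role": "user", "content": "Hello!"}],
      stream=True,
  )
  for chunk in stream:
      delta = chunk.choices[0].delta.content
      if delta:
          print(delta, end="", flush=True)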

Request Flow:

  1. Auth → Context Resolution → Rate Limiting → Quota Check
  2. Input Guardrails → Prompt Processing (RAG/Templates)
  3. Upstream LLM Call → Output Guardrails → Logging
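
Conceptually, each step is a middleware stage that can enrich the request or reject it outright. The sketch below is purely illustrative of that composition, not the gateway's actual code:

  from typing import Awaitable, Callable

  # A stage receives the request context, may mutate or enrich it, and
  # raises to short-circuit the chain (e.g. on a failed quota check).
  Stage = Callable[[dict], Awaitable[dict]]

  async def run_pipeline(request: dict, stages: list[Stage]) -> dict:
      for stage in stages:
          request = await stage(request)
      return request

  # Hypothetical stage names mirroring the flow above:
  # [authenticate, resolve_context, check_rate_limit, check_quota,
  #  apply_input_guardrails, process_prompt, call_upstream,
  #  apply_output_guardrails, log_request]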

Filtration Gateway (Port 8000)

The Filtration Gateway is the security and policy enforcement layer.

Responsibilities:

  • Authentication (JWT + TOTP 2FA)
  • Role-Based Access Control (RBAC)
  • Guardrail scanning (PII, toxicity, prompt injection)
  • Quota and rate limit enforcement
  • Audit logging
  • Provider configuration management

Key Endpoints:

Path             Description
/auth/*          Authentication (login, register, 2FA)
/management/*    Deployments, API keys, configs
/internal/*      Service-to-service APIs
/admin/*         RBAC management
/audit/*         Audit logs
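
For example, a login against /auth/* followed by a management call might look like the sketch below. The request and response field names, and the /management/deployments subpath, are assumptions rather than the documented contract:

  import requests

  BASE = "http://localhost:8000"  # Filtration Gateway (port 8000)

  # Hypothetical login payload; the real field names may differ.
  resp = requests.post(f"{BASE}/auth/login", json={
      "email": "admin@example.com",
      "password": "s3cret",
      "totp_code": "123456",  # TOTP 2FA, per the responsibilities above
  })
  resp.raise_for_status()
  token = resp.json()["access_token"]  # assumed JWT field name

  # Subsequent management calls carry the JWT as a bearer token.
  deployments = requests.get(
      f"{BASE}/management/deployments",  # hypothetical subpath of /management/*
      headers={"Authorization": f"Bearer {token}"},
  )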

Orchestration Gateway (Port 8080 HTTP, 50051 gRPC)

The Orchestration Gateway manages compute infrastructure and model deployments.

Responsibilities:

  • Compute pool management
  • Model deployment lifecycle
  • Inventory tracking and heartbeats
  • Multi-provider abstraction (K8s, SkyPilot, Nosana, Akash)

Key Endpoints:

Path                               Description
/deployment/deploy                 Create a deployment
/deployment/deployments            List deployments
/deployment/listPools/{owner_id}   List compute pools
/inventory/heartbeat               Node health reporting
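
A sketch of driving these endpoints over HTTP; the deployment body fields are illustrative guesses rather than a documented schema, and any required auth headers are omitted:

  import requests

  BASE = "http://localhost:8080"  # Orchestration Gateway HTTP port

  # Hypothetical request body; the real schema may differ.
  resp = requests.post(f"{BASE}/deployment/deploy", json={
      "model": "llama-3-8b-instruct",
      "pool_id": "pool-123",
      "replicas": 1,
  })
  resp.raise_for_status()

  # The list endpoints are plain GETs against the documented paths.
  deployments = requests.get(f"{BASE}/deployment/deployments").json()
  pools = requests.get(f"{BASE}/deployment/listPools/org-1").json()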

gRPC Services:

  • ComputePoolManagerService – Pool CRUD
  • ModelRegistryService – Model registry
  • ModelDeploymentService – Deployment operations
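
These services are consumed service-to-service on the gRPC port. A client sketch using grpcio; the generated module, stub, and message names are hypothetical because the proto definitions are not reproduced in this guide:

  import grpc

  # Hypothetical generated stubs from the (not shown) proto files:
  # from inferia_protos import pools_pb2, pools_pb2_grpc

  channel = grpc.insecure_channel("localhost:50051")  # gRPC port from above
  # stub = pools_pb2_grpc.ComputePoolManagerServiceStub(channel)
  # pools = stub.ListPools(pools_pb2.ListPoolsRequest(owner_id="org-1"))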

Admin Dashboard (Port 3001)

A React-based admin interface for managing the entire platform.

Features:

  • Deployment management with logs viewer
  • Compute pool and instance management
  • API key management
  • Knowledge base (RAG) management
  • User and role management
  • Provider configuration
  • Audit log viewer

Data Flow

  1. Client Request: POST /v1/chat/completions to Inference Gateway
  2. Context Resolution: the Inference Gateway queries the Filtration Gateway with the caller's API key
  3. Policy Checks: Rate limits, quotas, input guardrails
  4. Prompt Processing: Apply templates, retrieve RAG context
  5. Upstream Call: Route to provider (vLLM, OpenAI, Nosana worker)
  6. Response Streaming: Stream tokens back to client
  7. Logging: Async log to Filtration Gateway
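
Assuming the stream follows the OpenAI server-sent-events format the gateway implements, each chunk in step 6 arrives as a data: line and the stream terminates with data: [DONE]. A sketch of consuming it without an SDK (URL, key, and model name are placeholders):

  import json
  import requests

  resp = requests.post(
      "http://localhost:8001/v1/chat/completions",
      headers={"Authorization": "Bearer YOUR_INFERIA_API_KEY"},
      json={
          "model": "llama-3-8b-instruct",
          "messages": [{"role": "user", "content": "Hi"}],
          "stream": True,
      },
      stream=True,
  )
  for line in resp.iter_lines():
      if not line.startswith(b"data: "):
          continue  # skip blank keep-alive lines
      payload = line[len(b"data: "):]
      if payload == b"[DONE]":  # OpenAI-style stream terminator
          break
      chunk = json.loads(payload)
      print(chunk["choices"][0]["delta"].get("content", ""), end="")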

Compute Provider Abstraction

The Orchestration Gateway uses the Adapter pattern to abstract over infrastructure providers:

Adapter      Provider Type   Description
kubernetes   On-Prem/Cloud   Standard K8s GPU clusters
skypilot     Multi-Cloud     AWS, GCP, Azure VMs
nosana       DePIN           Decentralized GPU network (Solana)
akash        DePIN           Decentralized compute marketplace
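
In code, the pattern reduces to one interface with a per-provider implementation behind a registry. The sketch below is illustrative only, not the gateway's actual code:

  from typing import Protocol

  class ComputeAdapter(Protocol):
      """Interface every provider adapter implements (illustrative)."""
      def deploy(self, model: str, pool_id: str) -> str: ...
      def teardown(self, deployment_id: str) -> None: ...

  class KubernetesAdapter:
      def deploy(self, model: str, pool_id: str) -> str:
          # e.g. create a Deployment + Service in the target GPU cluster
          raise NotImplementedError

      def teardown(self, deployment_id: str) -> None:
          raise NotImplementedError

  # The adapter name from the table above selects the implementation.
  ADAPTERS: dict[str, ComputeAdapter] = {
      "kubernetes": KubernetesAdapter(),
      # "skypilot": ..., "nosana": ..., "akash": ...
  }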

Database Schema

PostgreSQL stores:

  • Organizations, Users, Roles
  • API Keys, Deployments
  • Policies, Usage quotas
  • Inference logs, Audit logs
  • Compute pools, Inventory

Redis provides:

  • Rate limiting counters
  • Hot state caching
  • Task queue (Redis Streams)
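
These Redis roles map onto standard patterns. A minimal sketch of a fixed-window rate-limit counter and a Redis Streams enqueue; the key and stream names are assumptions:

  import redis

  r = redis.Redis()

  def allow_request(api_key: str, limit: int = 60, window_s: int = 60) -> bool:
      """Fixed-window counter: INCR a per-key bucket, EXPIRE it on first hit."""
      bucket = f"ratelimit:{api_key}"  # assumed key naming
      count = r.incr(bucket)
      if count == 1:
          r.expire(bucket, window_s)
      return count <= limit

  # Task queue via Redis Streams: producers XADD, workers read with XREADGROUP.
  r.xadd("tasks:inference_logs", {"deployment": "dep-123", "tokens": "42"})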
