
Orchestration Gateway

Compute and Deployment Management

The Orchestration Gateway manages compute infrastructure, model deployments, and the model registry.

Base URL

  • HTTP: http://localhost:8080
  • gRPC: localhost:50051

Deployment Endpoints

POST /deployment/deploy

Create a new model deployment.

Request:

{
  "model_name": "my-llama",
  "model_version": "1.0",
  "replicas": 1,
  "gpu_per_replica": 1,
  "pool_id": "pool-uuid",
  "workload_type": "inference",
  "engine": "vllm",
  "configuration": {},
  "owner_id": "user-uuid",
  "org_id": "org-uuid",
  "inference_model": "meta-llama/Llama-3-8B-Instruct"
}

Response:

{
  "deployment_id": "uuid",
  "status": "DEPLOYING"
}
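
As a rough sketch, the deploy call can be made from Python with the requests library; the base URL and field values below simply mirror the example above and are illustrative.

import requests

BASE_URL = "http://localhost:8080"  # Orchestration Gateway HTTP base URL

deploy_request = {
    "model_name": "my-llama",
    "model_version": "1.0",
    "replicas": 1,
    "gpu_per_replica": 1,
    "pool_id": "pool-uuid",
    "workload_type": "inference",
    "engine": "vllm",
    "configuration": {},
    "owner_id": "user-uuid",
    "org_id": "org-uuid",
    "inference_model": "meta-llama/Llama-3-8B-Instruct",
}

resp = requests.post(f"{BASE_URL}/deployment/deploy", json=deploy_request, timeout=30)
resp.raise_for_status()
deployment = resp.json()
print(deployment["deployment_id"], deployment["status"])  # e.g. uuid DEPLOYING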

GET /deployment/deployments

List all deployments.

Query Parameters:

  • org_id (string) - Filter by organization

GET /deployment/status/{deployment_id}

Get deployment status and details.

POST /deployment/start

Start a stopped deployment.

POST /deployment/terminate

Terminate a running deployment.

DELETE /deployment/delete/{deployment_id}

Permanently delete a stopped deployment.
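
The lifecycle endpoints above can be exercised with plain HTTP calls. The sketch below assumes the requests library and, for start and terminate, that the deployment is identified by a deployment_id field in the JSON body, which the reference above does not spell out.

import requests

BASE_URL = "http://localhost:8080"
deployment_id = "uuid"  # returned by POST /deployment/deploy

# List deployments, optionally filtered by organization
deployments = requests.get(
    f"{BASE_URL}/deployment/deployments", params={"org_id": "org-uuid"}
).json()

# Check status and details of one deployment
status = requests.get(f"{BASE_URL}/deployment/status/{deployment_id}").json()

# Start a stopped deployment / terminate a running one.
# NOTE: the body shape here is an assumption.
requests.post(f"{BASE_URL}/deployment/start", json={"deployment_id": deployment_id})
requests.post(f"{BASE_URL}/deployment/terminate", json={"deployment_id": deployment_id})

# Permanently remove the deployment once it is stopped
requests.delete(f"{BASE_URL}/deployment/delete/{deployment_id}")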

Compute Pool Endpoints

POST /deployment/createpool

Create a new compute pool.

Request:

{
  "pool_name": "gpu-pool-1",
  "owner_type": "org",
  "owner_id": "org-uuid",
  "provider": "nosana",
  "allowed_gpu_types": ["RTX 4090", "A100"],
  "max_cost_per_hour": 10.0,
  "is_dedicated": false,
  "provider_pool_id": "external-id",
  "scheduling_policy_json": "{}"
}
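
A minimal sketch of the pool-creation call with the requests library; the payload mirrors the example above and the values are placeholders.

import requests

BASE_URL = "http://localhost:8080"

pool_request = {
    "pool_name": "gpu-pool-1",
    "owner_type": "org",
    "owner_id": "org-uuid",
    "provider": "nosana",
    "allowed_gpu_types": ["RTX 4090", "A100"],
    "max_cost_per_hour": 10.0,
    "is_dedicated": False,
    "provider_pool_id": "external-id",
    "scheduling_policy_json": "{}",
}

resp = requests.post(f"{BASE_URL}/deployment/createpool", json=pool_request)
resp.raise_for_status()
print(resp.json())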

GET /deployment/listPools/{owner_id}

List compute pools for an owner.

GET /deployment/list/pool/{pool_id}/inventory

List nodes/instances in a pool.

POST /deployment/deletepool/{pool_id}

Delete a compute pool.
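
The remaining pool operations are simple path-parameter calls. In the sketch below the shape of the listPools response (and the pool_id field in it) is assumed, not documented above.

import requests

BASE_URL = "http://localhost:8080"
owner_id = "org-uuid"

# List compute pools owned by this org or user
pools = requests.get(f"{BASE_URL}/deployment/listPools/{owner_id}").json()
pool_id = pools[0]["pool_id"]  # assumed response field; adjust to the actual payload

# Inspect the nodes/instances currently in the pool
inventory = requests.get(f"{BASE_URL}/deployment/list/pool/{pool_id}/inventory").json()
print(inventory)

# Delete the pool when it is no longer needed
requests.post(f"{BASE_URL}/deployment/deletepool/{pool_id}")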

Log Endpoints

GET /deployment/logs/{deployment_id}

Retrieve logs for a deployment, including logs stored on DePIN/IPFS-backed providers.

GET /deployment/logs/{deployment_id}/stream

Get the WebSocket connection details used to stream deployment logs.
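
A short sketch of fetching logs and the streaming handle with the requests library; the shape of the stream-info response is not specified above, so it is only printed.

import requests

BASE_URL = "http://localhost:8080"
deployment_id = "uuid"

# Fetch collected logs (including DePIN/IPFS-backed logs where applicable)
logs = requests.get(f"{BASE_URL}/deployment/logs/{deployment_id}")
print(logs.text)

# Fetch the WebSocket details used for live log streaming
stream_info = requests.get(f"{BASE_URL}/deployment/logs/{deployment_id}/stream").json()
print(stream_info)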

Model Registry Endpoints

POST /deployment/registerModel

Register a model configuration.

Request:

{
  "model_name": "llama-3-8b",
  "model_version": "1.0",
  "backend": "vllm",
  "artifact_uri": "meta-llama/Llama-3-8B-Instruct",
  "config_json": {}
}
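
Registering a model is a single POST; the sketch below assumes the requests library and reuses the example payload above.

import requests

BASE_URL = "http://localhost:8080"

model = {
    "model_name": "llama-3-8b",
    "model_version": "1.0",
    "backend": "vllm",
    "artifact_uri": "meta-llama/Llama-3-8B-Instruct",
    "config_json": {},
}

resp = requests.post(f"{BASE_URL}/deployment/registerModel", json=model)
resp.raise_for_status()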

GET /deployment/getModel/{name}/{version}

Get model details.

GET /deployment/listModels/{model_name}

List model versions.

DELETE /deployment/deleteModel

Delete a model registration.
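
Lookup and deletion might look like the sketch below. The DELETE body is an assumption: the reference above does not show how the model to delete is identified, so model_name and model_version are used here as placeholders.

import requests

BASE_URL = "http://localhost:8080"

# Fetch one registered model version
detail = requests.get(f"{BASE_URL}/deployment/getModel/llama-3-8b/1.0").json()

# List all registered versions of a model
versions = requests.get(f"{BASE_URL}/deployment/listModels/llama-3-8b").json()

# Remove a registration (body shape assumed, see note above)
requests.delete(
    f"{BASE_URL}/deployment/deleteModel",
    json={"model_name": "llama-3-8b", "model_version": "1.0"},
)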

Inventory Endpoints

POST /inventory/heartbeat

Report node health and current resource allocation from a worker node.

Request:

{
  "provider": "nosana",
  "provider_instance_id": "job-id",
  "gpu_allocated": 1,
  "vcpu_allocated": 8,
  "ram_gb_allocated": 32,
  "health_score": 100,
  "state": "READY",
  "expose_url": "http://worker:8000"
}
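
A worker-side sketch of the heartbeat loop, assuming the requests library and the fields shown above; the 30-second interval is illustrative, since the required cadence is not documented here.

import time

import requests

BASE_URL = "http://localhost:8080"

heartbeat = {
    "provider": "nosana",
    "provider_instance_id": "job-id",
    "gpu_allocated": 1,
    "vcpu_allocated": 8,
    "ram_gb_allocated": 32,
    "health_score": 100,
    "state": "READY",
    "expose_url": "http://worker:8000",
}

while True:
    # Report health and current allocation so the gateway keeps the node marked READY
    requests.post(f"{BASE_URL}/inventory/heartbeat", json=heartbeat, timeout=10)
    time.sleep(30)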

Provider Resources

GET /deployment/provider/resources

Discover available resources from a provider.

Query Parameters:

  • provider (string) - Provider name (default: nosana)
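
A one-call sketch of resource discovery with the requests library; omitting the provider parameter falls back to nosana.

import requests

BASE_URL = "http://localhost:8080"

# Discover what capacity the provider can currently offer
resources = requests.get(
    f"{BASE_URL}/deployment/provider/resources",
    params={"provider": "nosana"},
).json()
print(resources)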

gRPC Services

ComputePoolManagerService

  • RegisterPool - Create pool
  • ListPools - List pools
  • DeletePool - Delete pool
  • ListPoolInventory - List pool nodes

ModelRegistryService

  • RegisterModel - Register model
  • GetModel - Get model details
  • ListModels - List models
  • DeleteModel - Delete model

ModelDeploymentService

  • DeployModel - Create deployment
  • GetDeployment - Get status
  • ListDeployments - List deployments
  • StartDeployment - Start deployment
  • DeleteDeployment - Terminate deployment
