The inference platform · pre-release
infervisor — runtime v0.x · pre-release
infervisor:~$ ./run --modality all --target any
One inference runtime for LLMs, diffusion,
and voice — across any accelerator.
One engine for every modality, on any chip — on-prem or cloud. Built for production inference.
   LLMs ──────┐                       ┌─────  NVIDIA
   diffusion ─┤    ┌─────────────┐    ├─────  AMD
   voice ─────┼────┤ INFERVISOR  ├────┼─────  DPU
   RAG ───────┘    └─────────────┘    └─────  future
modalities: LLM, diffusion, voice · targets: NVIDIA, AMD, DPU · deploy: on-prem│cloud · ready
WHAT IT IS

One runtime. Every modality. Any accelerator.

The inference layer the category is missing — a single engine that replaces a stack of vendor-locked runtimes.

01

One runtime, every modality

LLMs, diffusion, and voice on a single dataflow engine — one stack instead of three.

02

Any accelerator

One model graph compiles to NVIDIA, AMD, or future chips. No vendor lock-in.

03

On-prem or cloud

The same engine on your own hardware or hosted. Start in the cloud, ship to your racks.

COMPANY

Building the inference layer
the category is missing.

As AI workloads shift from training to serving, one runtime across every modality and vendor is the platform layer nobody owns yet. That's what we're building.

Building production AI inference? Let's talk.

Design-partner and early-access conversations are open.

request access · talk to us →