# FlexInfer user docs
Concepts and workflows for deploying and operating models with FlexInfer.
FlexInfer is a Kubernetes-native set of controllers and agents for running AI workloads (LLMs and image generation) on GPU nodes with sane defaults for a homelab.
## Start here
- `docs/user/quickstart.md` for install + first model.
- `docs/user/models-v1alpha2.md` for the recommended CRD (`Model`).
- `docs/user/proxy.md` for how requests are routed (OpenAI-style payloads supported; see the request sketch below).
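
Because the proxy accepts OpenAI-style payloads, a chat completion request can be sent with nothing but the standard library. This is a minimal sketch: the proxy URL, port, and model name are assumptions, so check `docs/user/proxy.md` for the real service address and routing rules.

```python
# Hypothetical sketch: send an OpenAI-style chat completion request through the
# FlexInfer proxy. The service URL and model name below are assumptions.
import json
import urllib.request

PROXY_URL = "http://flexinfer-proxy.flexinfer.svc:8000/v1/chat/completions"  # assumed address

payload = {
    "model": "llama-3-8b",  # assumed: name of a deployed Model resource
    "messages": [{"role": "user", "content": "Hello from the homelab!"}],
}

req = urllib.request.Request(
    PROXY_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    body = json.loads(resp.read())
    print(body["choices"][0]["message"]["content"])
```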
## API versions
- Recommended: `ai.flexinfer/v1alpha2` `Model` (single resource; minimal example below).
- Legacy: `ai.flexinfer/v1alpha1` `ModelDeployment` + `ModelCache` + `GPUGroup` (more knobs, more moving parts).
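
For the recommended path, a `Model` is an ordinary custom resource, so it can be created with the official Kubernetes Python client. The sketch below is illustrative only: the `spec` fields, namespace, and resource plural are assumptions, and the real schema lives in `docs/user/models-v1alpha2.md`.

```python
# Hypothetical sketch: create a v1alpha2 Model via the Kubernetes Python client.
# The spec fields, namespace, and plural name are assumptions, not the real schema.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() when running in-cluster

model = {
    "apiVersion": "ai.flexinfer/v1alpha2",
    "kind": "Model",
    "metadata": {"name": "llama-3-8b", "namespace": "flexinfer"},  # assumed namespace
    "spec": {
        # Assumed fields, for illustration only.
        "engine": "vllm",
        "source": "hf://meta-llama/Meta-Llama-3-8B-Instruct",
        "gpus": 1,
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="ai.flexinfer",
    version="v1alpha2",
    namespace="flexinfer",
    plural="models",  # assumed plural for the Model kind
    body=model,
)
```

The single-resource shape is the point of v1alpha2: one object describes the model, where it comes from, and what it needs, instead of coordinating `ModelDeployment`, `ModelCache`, and `GPUGroup` separately.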