FlexInfer docs
Labels & annotations
Node labels, pod labels, and discovery/routing annotations.
Node labels (agent)
The node agent applies GPU capability labels. Common keys:
- `flexinfer.ai/gpu.vendor` → `AMD` / `NVIDIA`
- `flexinfer.ai/gpu.arch` → `gfx1100` / `sm_90` / etc.
- `flexinfer.ai/gpu.vram` → `24Gi`
- `flexinfer.ai/gpu.count` → `1`
- `flexinfer.ai/gpu.int4` → `true` | `false`
See services/flexinfer/AGENTS.md for the full list.
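As an illustration, a node labeled by the agent might carry metadata like the following (the node name and label values are hypothetical; only the label keys come from the list above):

```yaml
apiVersion: v1
kind: Node
metadata:
  name: gpu-node-1                   # hypothetical node name
  labels:
    flexinfer.ai/gpu.vendor: "AMD"
    flexinfer.ai/gpu.arch: "gfx1100"
    flexinfer.ai/gpu.vram: "24Gi"
    flexinfer.ai/gpu.count: "1"
    flexinfer.ai/gpu.int4: "true"
```

Because these are ordinary node labels, a pod could target capable nodes with a standard `nodeSelector` (e.g. `flexinfer.ai/gpu.int4: "true"`).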
Workload labels (controller)
The controller labels managed pods/services for discovery:
- `flexinfer.ai/model`: model name
- `flexinfer.ai/backend`: backend name
- `flexinfer.ai/gpu-group`: shared group name (v1alpha2 `spec.gpu.shared`)
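A minimal sketch of what the controller's labeling might look like on a managed pod (pod name and label values are illustrative, not taken from a real deployment):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: llama3-vllm-0                # hypothetical pod name
  labels:
    flexinfer.ai/model: "llama3"     # model name (illustrative value)
    flexinfer.ai/backend: "vllm"     # backend name (illustrative value)
    flexinfer.ai/gpu-group: "shared-a"  # shared group name (illustrative value)
```

These labels let other components discover managed workloads with a label selector such as `kubectl get pods -l flexinfer.ai/model=llama3`.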
LiteLLM discovery annotations
When enabled (spec.litellm.enabled: true), the controller adds:
- `litellm.flexinfer.ai/served-model`
- `litellm.flexinfer.ai/aliases` (comma-separated)
- `litellm.flexinfer.ai/copilot-model` (optional)
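Sketched as annotations on a discovered object, this might look like the fragment below (all values are hypothetical; only the annotation keys come from the list above):

```yaml
metadata:
  annotations:
    litellm.flexinfer.ai/served-model: "llama3"        # illustrative model name
    litellm.flexinfer.ai/aliases: "llama-3,llama3-8b"  # comma-separated, illustrative
    litellm.flexinfer.ai/copilot-model: "llama3"       # optional, illustrative
```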
Proxy ↔ GPUGroup annotations (v1alpha1)
The proxy writes queue state onto a GPUGroup so the controller can decide swaps:
- `flexinfer.ai/queue.<modelName>: "<depth>"`
- `flexinfer.ai/queue-since.<modelName>: "<rfc3339>"`
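For example, a GPUGroup with queued requests for a model named `llama3` might carry annotations like the following (the `apiVersion` group, resource name, and values are assumptions for illustration; only the annotation key pattern comes from the text above):

```yaml
apiVersion: flexinfer.ai/v1alpha1    # API group assumed from the label prefixes used elsewhere
kind: GPUGroup
metadata:
  name: shared-a                     # hypothetical group name
  annotations:
    flexinfer.ai/queue.llama3: "4"                         # queue depth for model "llama3"
    flexinfer.ai/queue-since.llama3: "2024-05-01T12:00:00Z"  # RFC 3339 timestamp
```

The depth tells the controller how much demand is waiting; the `queue-since` timestamp lets it judge how long that demand has been unserved when deciding whether to swap models.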
Service label routing
Service labels can be attached to a model and used for routing. Relevant fields/annotations:
- v1alpha2: `Model.spec.serviceLabels`
- v1alpha1: `ModelDeployment.spec.serviceLabels`
- the proxy uses `ai.flexinfer/active-services` to cache active service labels during GPUGroup swaps
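A minimal sketch of attaching service labels to a v1alpha2 Model (the `apiVersion` group, resource name, and label values are assumptions for illustration; only the `spec.serviceLabels` field comes from the text above):

```yaml
apiVersion: flexinfer.ai/v1alpha2    # API group assumed from the label prefixes used elsewhere
kind: Model
metadata:
  name: llama3                       # hypothetical model name
spec:
  serviceLabels:
    tier: "premium"                  # illustrative routing label
    region: "us-east"                # illustrative routing label
```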