# Operations

Common day-2 workflows: inspect, debug, and clean up.
## Inspect what's running

```bash
kubectl -n flexinfer-system get deploy,ds,svc
kubectl -n flexinfer-system get pods -o wide
```
## Watch lifecycle state

### v1alpha2

```bash
kubectl -n flexinfer-system get models -w
kubectl -n flexinfer-system describe model <name>
```

### v1alpha1

```bash
kubectl -n flexinfer-system get modeldeployments -w
kubectl -n flexinfer-system describe modeldeployment <name>
```
## Debug a model that won't become ready

- Check events:

  ```bash
  kubectl -n flexinfer-system describe pod <pod>
  ```

- Check backend logs:

  ```bash
  kubectl -n flexinfer-system logs <pod> -c model --tail=200
  ```

- Confirm GPU resources:

  ```bash
  kubectl get nodes -o wide
  kubectl describe node <node> | rg -n "nvidia.com/gpu|amd.com/gpu"
  ```
## NVIDIA GPU requirements

### Why `runtimeClassName: nvidia` is required

NVIDIA GPU workloads require the NVIDIA container runtime to function. FlexInfer automatically sets `runtimeClassName: nvidia` on pods requesting `nvidia.com/gpu` resources.

Without this runtime class:

- The pod may schedule successfully (it requests `nvidia.com/gpu` and a node has capacity)
- But `/dev/nvidia*` device nodes won't be mounted into the container
- CUDA will report no devices available
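To see the failure mode in isolation, a minimal hand-written pod such as the sketch below can be used. FlexInfer normally injects the runtime class for you; the pod name and CUDA image tag here are illustrative, not required values:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  runtimeClassName: nvidia      # omit this line to reproduce the failure above
  restartPolicy: Never
  containers:
    - name: cuda
      image: nvcr.io/nvidia/cuda:12.4.1-base-ubuntu22.04  # illustrative tag
      command: ["nvidia-smi"]   # fails if /dev/nvidia* is not mounted
      resources:
        limits:
          nvidia.com/gpu: 1
```

With the `runtimeClassName` line removed, the pod still schedules, but `nvidia-smi` cannot find any devices.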
Verifying NVIDIA runtime is working
-
Check if the runtime class exists:
kubectl get runtimeclass nvidia -
Verify devices are visible inside the pod:
kubectl -n flexinfer-system exec <pod> -- ls /dev/nvidia* # Should show: /dev/nvidia0 /dev/nvidiactl /dev/nvidia-uvm ... -
Check CUDA availability:
kubectl -n flexinfer-system exec <pod> -- python -c "import torch; print(torch.cuda.is_available())" # Should print: True
### Common failure symptoms

| Symptom | Likely Cause |
|---|---|
| `torch.cuda.is_available()` returns `False` | Missing `runtimeClassName: nvidia`, or NVIDIA driver not installed |
| Pod stuck in `ContainerCreating` | RuntimeClass `nvidia` doesn't exist on the cluster |
| `CUDA error: no CUDA-capable device is detected` | Device nodes not mounted; check runtime class |
| Pod runs but inference is slow | Fell back to CPU; check device availability |
### Cluster prerequisites

- NVIDIA device plugin must be deployed (creates `nvidia.com/gpu` resources)
- NVIDIA container runtime must be installed on GPU nodes
- RuntimeClass `nvidia` must exist:

  ```yaml
  apiVersion: node.k8s.io/v1
  kind: RuntimeClass
  metadata:
    name: nvidia
  handler: nvidia
  ```
## Clean up (important: delete parents)

FlexInfer resources are hierarchical. Delete the parent, not the children.

- v1alpha2: delete the `Model`:

  ```bash
  kubectl -n flexinfer-system delete model <name>
  ```

- v1alpha1: delete the `ModelDeployment` (not the Deployment):

  ```bash
  kubectl -n flexinfer-system delete modeldeployment <name>
  ```

For detailed cleanup guidance (including RAM caches and stuck Jobs), see the "Resource Cleanup Procedures" section in services/flexinfer/AGENTS.md.
## AMD ROCm GPU requirements

### Container setup

AMD GPUs require ROCm-compatible container images. FlexInfer uses ROCm variants automatically when AMD GPUs are detected.

```bash
# Verify ROCm device visibility
kubectl -n flexinfer-system exec <pod> -- ls /dev/dri/
# Should show: card0 renderD128 (or similar)

kubectl -n flexinfer-system exec <pod> -- rocm-smi
# Should show GPU(s) with temperature, utilization, etc.
```
### Common AMD issues

| Symptom | Likely Cause |
|---|---|
| No GPU detected | Missing ROCm container toolkit or device plugin |
| `HSA_STATUS_ERROR_OUT_OF_RESOURCES` | Insufficient GPU memory; reduce batch size |
| Slow inference | Using CPU fallback; check `/dev/kfd` visibility |
### Cluster prerequisites for AMD

- AMD device plugin deployed (creates `amd.com/gpu` resources)
- ROCm drivers installed on GPU nodes (6.0+ recommended)
- Container runtime configured for AMD GPUs
## Backend-specific quirks

### Ollama

- Model naming: Ollama uses `model:tag` format (e.g., `llama3.2:1b`)
- Pull on first use: the model downloads on the first request if not already cached
- Memory: Ollama manages its own memory; set `OLLAMA_NUM_PARALLEL` to control concurrency
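As a sketch, `OLLAMA_NUM_PARALLEL` could be passed through the container environment. The `spec.env` field name below is an assumption about the Model schema, not something this page confirms:

```yaml
spec:
  env:
    - name: OLLAMA_NUM_PARALLEL  # how many requests Ollama serves concurrently
      value: "4"
```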
### vLLM

- Memory configuration: vLLM pre-allocates GPU memory

  ```yaml
  spec:
    config:
      gpu-memory-utilization: "0.9"  # Use 90% of GPU memory
  ```

- Tensor parallelism: for multi-GPU, set `tensor-parallel-size`
- Known issue: vLLM 0.4+ requires specific CUDA versions; check compatibility
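Combining the two keys above, a config sketch for a two-GPU node might look like this (the values are illustrative, not recommendations):

```yaml
spec:
  config:
    gpu-memory-utilization: "0.9"  # fraction of each GPU vLLM pre-allocates
    tensor-parallel-size: "2"      # shard the model across two GPUs
```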
### MLC-LLM

- Model format: requires MLC-compiled models (`.mlc` format)
- Source URI: use `HF://mlc-ai/<model>-MLC` for pre-compiled models
- Maxwell GPUs (sm_52): use the Maxwell-specific image variant

  ```yaml
  spec:
    image: registry.harbor.lan/flexinfer/mlc-llm:cuda-maxwell-v7
  ```
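Putting the two bullets together, a spec sketch for a Maxwell node could combine the source URI and image override. The `spec.source` field name is an assumption about the Model schema, and `<model>` is left as the page's placeholder:

```yaml
spec:
  source: HF://mlc-ai/<model>-MLC                                # pre-compiled MLC model
  image: registry.harbor.lan/flexinfer/mlc-llm:cuda-maxwell-v7   # sm_52 image variant
```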
### llama.cpp

- Model format: requires GGUF-format models
- CPU fallback: works without a GPU; useful for testing
- Memory mapping: uses mmap by default, which can reduce memory usage

```yaml
spec:
  config:
    n-gpu-layers: "35"  # Number of layers to offload to GPU
```
### ComfyUI / Diffusers

- Image generation: these backends serve image models, not LLMs
- VRAM requirements: image models typically need 8GB+ VRAM
- Workflow files: ComfyUI requires a workflow JSON in the request
## Troubleshooting decision tree

```
Model not becoming Ready?
├── Check phase: kubectl describe model <name>
│   ├── Pending → No matching nodes (check GPU labels, node selector)
│   ├── Downloading → Network issue or invalid source URI
│   ├── Creating → Check pod events and logs
│   └── Error → Check conditions for specific reason
│
├── Pod not starting?
│   ├── ImagePullBackOff → Check image name/registry access
│   ├── ContainerCreating → Check RuntimeClass, volume mounts
│   └── CrashLoopBackOff → Check container logs
│
└── Pod running but model not responding?
    ├── Check model container logs
    ├── Verify port-forward to pod directly
    └── Check health endpoint: /health or /v1/models
```
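The first branch of the tree can be captured in a small local helper for scripting. This is a sketch that simply mirrors the phase names above, not part of FlexInfer itself:

```bash
# Map a Model phase to the next debugging step (mirrors the decision tree).
diagnose_model_phase() {
  case "$1" in
    Pending)     echo "No matching nodes (check GPU labels, node selector)" ;;
    Downloading) echo "Network issue or invalid source URI" ;;
    Creating)    echo "Check pod events and logs" ;;
    Error)       echo "Check conditions for specific reason" ;;
    *)           echo "Unknown phase: $1" ;;
  esac
}

diagnose_model_phase Pending
# -> No matching nodes (check GPU labels, node selector)
```

Feed it the phase reported by `kubectl describe model <name>` to get the matching next step.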
## Metrics and monitoring

FlexInfer exposes Prometheus metrics:

```bash
# Scrape metrics from the controller
kubectl -n flexinfer-system port-forward deploy/flexinfer-controller 8080:8080
curl localhost:8080/metrics

# Scrape metrics from the proxy
kubectl -n flexinfer-system port-forward svc/flexinfer-proxy 8080:8080
curl localhost:8080/metrics
```
Key metrics:

- `flexinfer_models_total{phase}` - Models by phase
- `flexinfer_proxy_requests_total{model,status}` - Request counts
- `flexinfer_proxy_queue_depth{model}` - Pending requests per model
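As a quick sketch of working with the scrape output, the snippet below sums `flexinfer_models_total` across all non-Ready phases. The sample values are invented for illustration; real numbers come from `/metrics`:

```bash
# Invented sample scrape output (illustrative values only)
cat > /tmp/flexinfer-metrics.txt <<'EOF'
flexinfer_models_total{phase="Ready"} 3
flexinfer_models_total{phase="Error"} 1
flexinfer_models_total{phase="Downloading"} 2
EOF

# Sum models that are not Ready yet
awk '/^flexinfer_models_total/ && !/phase="Ready"/ {s += $2} END {print s}' /tmp/flexinfer-metrics.txt
# -> 3
```

The same one-liner works against a live scrape piped in from `curl localhost:8080/metrics`.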