FlexInfer docs
Scheduler extender
Kube scheduler extender endpoints and scoring knobs.
Binary: flexinfer-sched (services/flexinfer/cmd/flexinfer-sched)
FlexInfer uses the Kubernetes scheduler extender mechanism to:
- filter nodes that can run GPU workloads
- score nodes using benchmark and heuristic signals
Endpoints
The extender exposes:
- POST /filter
- POST /score
- GET /healthz
The request/response types follow the kube-scheduler extender v1 API:
- request: k8s.io/kube-scheduler/extender/v1.ExtenderArgs
- response:
  - filter: ExtenderFilterResult
  - score: []HostPriority
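To make the request/response shapes concrete, here is a minimal sketch of a server exposing the three routes above. It is not the flexinfer-sched implementation. The lowercase types are hypothetical local stand-ins for the extender v1 types, trimmed to node names so the example builds with the standard library alone, and filterNodes and scoreNodes are placeholder policies invented for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// Hypothetical stand-ins for the extender v1 types, trimmed to node names.
type extenderArgs struct {
	NodeNames *[]string
}

type extenderFilterResult struct {
	NodeNames   *[]string
	FailedNodes map[string]string
	Error       string
}

type hostPriority struct {
	Host  string
	Score int64
}

// filterNodes is a placeholder policy that keeps every candidate node.
func filterNodes(names []string) extenderFilterResult {
	kept := append([]string{}, names...)
	return extenderFilterResult{NodeNames: &kept, FailedNodes: map[string]string{}}
}

// scoreNodes is a placeholder policy that gives every node a score of 0.
func scoreNodes(names []string) []hostPriority {
	out := make([]hostPriority, 0, len(names))
	for _, name := range names {
		out = append(out, hostPriority{Host: name, Score: 0})
	}
	return out
}

// postJSON decodes the request into extenderArgs, applies fn to the node
// names, and writes the result back as JSON. Non-POST requests get a 405.
func postJSON(fn func([]string) interface{}) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost {
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
			return
		}
		var args extenderArgs
		if err := json.NewDecoder(r.Body).Decode(&args); err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		var names []string
		if args.NodeNames != nil {
			names = *args.NodeNames
		}
		w.Header().Set("Content-Type", "application/json")
		_ = json.NewEncoder(w).Encode(fn(names))
	}
}

// newMux wires up the three routes.
func newMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.Handle("/filter", postJSON(func(n []string) interface{} { return filterNodes(n) }))
	mux.Handle("/score", postJSON(func(n []string) interface{} { return scoreNodes(n) }))
	mux.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	return mux
}

func main() {
	// Exercise POST /score in-process instead of binding a real port.
	body := strings.NewReader(`{"NodeNames": ["node-a", "node-b"]}`)
	rec := httptest.NewRecorder()
	newMux().ServeHTTP(rec, httptest.NewRequest(http.MethodPost, "/score", body))
	fmt.Print(rec.Body.String())
}
```

A real extender would import the types from k8s.io/kube-scheduler/extender/v1 instead of redefining them, and would also handle the full node-list form of the request.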
Environment variables
These influence scoring (see services/flexinfer/scheduler/scheduler.go):
- BENCHMARK_RESULTS_CONFIGMAP (default flexinfer-benchmark-results)
- SCHED_TPS_WEIGHT (default 0.7)
- SCHED_UTIL_WEIGHT (default 0.2)
- SCHED_COST_WEIGHT (default 0.1)
- SCHED_CACHE_WEIGHT (default 0.3)
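As an illustration of how these variables and their defaults could be resolved, the sketch below reads each one and falls back to the documented default when it is unset. The scoringConfig type and the loadScoringConfig and envFloat helpers are hypothetical names, not taken from scheduler.go. Falling back to the default on an unparsable value is also an assumption; the real binary may reject such input instead.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// scoringConfig is a hypothetical container for the documented settings.
type scoringConfig struct {
	ResultsConfigMap string
	TPSWeight        float64
	UtilWeight       float64
	CostWeight       float64
	CacheWeight      float64
}

// lookupFunc matches the signature of os.LookupEnv so tests can inject a map.
type lookupFunc func(string) (string, bool)

// envFloat returns the variable parsed as a float64, or def when the variable
// is unset, empty, or not a valid number (assumed behavior).
func envFloat(lookup lookupFunc, name string, def float64) float64 {
	raw, ok := lookup(name)
	if !ok || raw == "" {
		return def
	}
	v, err := strconv.ParseFloat(raw, 64)
	if err != nil {
		return def
	}
	return v
}

// envString returns the variable's value, or def when it is unset or empty.
func envString(lookup lookupFunc, name, def string) string {
	if v, ok := lookup(name); ok && v != "" {
		return v
	}
	return def
}

// loadScoringConfig resolves every documented variable against its default.
func loadScoringConfig(lookup lookupFunc) scoringConfig {
	return scoringConfig{
		ResultsConfigMap: envString(lookup, "BENCHMARK_RESULTS_CONFIGMAP", "flexinfer-benchmark-results"),
		TPSWeight:        envFloat(lookup, "SCHED_TPS_WEIGHT", 0.7),
		UtilWeight:       envFloat(lookup, "SCHED_UTIL_WEIGHT", 0.2),
		CostWeight:       envFloat(lookup, "SCHED_COST_WEIGHT", 0.1),
		CacheWeight:      envFloat(lookup, "SCHED_CACHE_WEIGHT", 0.3),
	}
}

func main() {
	// Read from the real process environment.
	fmt.Printf("%+v\n", loadScoringConfig(os.LookupEnv))
}
```

Passing the lookup function in, rather than calling os.LookupEnv directly, keeps the defaults testable without mutating the process environment.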
Inputs used for scoring
- Pod annotations:
  - flexinfer.ai/model
  - flexinfer.ai/backend
- Benchmark results (ConfigMap data)
- Optional node annotations (if present):
  - flexinfer.ai/gpu.util
  - flexinfer.ai/cost
  - flexinfer.ai/kv-cache-usage
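Because the node annotations are optional, scoring code has to distinguish "absent" from "zero". The sketch below shows one way to read them from an annotation map. It assumes the values are numeric strings, which this page does not state, and the signal and nodeSignals types and the readSignal and readNodeSignals helpers are hypothetical names. How the scheduler combines these signals with the weights is defined in scheduler.go and is not reproduced here.

```go
package main

import (
	"fmt"
	"strconv"
)

// Annotation keys as documented above.
const (
	annGPUUtil      = "flexinfer.ai/gpu.util"
	annCost         = "flexinfer.ai/cost"
	annKVCacheUsage = "flexinfer.ai/kv-cache-usage"
)

// signal is a hypothetical optional value: Present is false when the
// annotation is missing or cannot be parsed as a number.
type signal struct {
	Value   float64
	Present bool
}

// nodeSignals groups the three optional node-level inputs.
type nodeSignals struct {
	GPUUtil      signal
	Cost         signal
	KVCacheUsage signal
}

// readSignal parses one annotation, assuming a numeric string value.
// Indexing a nil map is safe in Go and simply reports the key as missing.
func readSignal(annotations map[string]string, key string) signal {
	raw, ok := annotations[key]
	if !ok {
		return signal{}
	}
	v, err := strconv.ParseFloat(raw, 64)
	if err != nil {
		return signal{}
	}
	return signal{Value: v, Present: true}
}

// readNodeSignals collects every optional annotation from a node.
func readNodeSignals(annotations map[string]string) nodeSignals {
	return nodeSignals{
		GPUUtil:      readSignal(annotations, annGPUUtil),
		Cost:         readSignal(annotations, annCost),
		KVCacheUsage: readSignal(annotations, annKVCacheUsage),
	}
}

func main() {
	// Illustrative values only; real nodes may carry different data.
	node := map[string]string{annGPUUtil: "0.42"}
	fmt.Printf("%+v\n", readNodeSignals(node))
}
```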