
Metrics

Prometheus metrics exposed by FlexInfer components.

FlexInfer components expose Prometheus metrics. The exact set depends on which binaries you deploy. In the lists below, the names in braces after each metric are its labels.

Shared library metrics (pkg/metrics)

Defined in services/flexinfer/pkg/metrics/exporter.go:

  • flexinfer_tokens_per_second{model,backend,node}
  • flexinfer_model_load_seconds{model,node}
  • flexinfer_gpu_temperature_celsius{gpu,node}
  • flexinfer_modelcache_resident_seconds{cache,node,strategy}
  • flexinfer_dev_shm_utilization_percent{node}
  • flexinfer_modelcache_evictions_total{cache,node,policy}
  • flexinfer_modelcache_hit_rate{cache,node}
  • flexinfer_modelcache_size_bytes{cache,node,strategy}
  • flexinfer_modelcache_access_count{cache,node}
  • flexinfer_modelcache_phase{cache,namespace,phase}
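To illustrate how the label sets above appear on the wire, here is a minimal sketch that parses one line of Prometheus text exposition output into a name, labels, and value. It uses only the Go standard library. The `Sample` type, the `parseSample` helper, and the label values (`example-model`, `example-backend`, `node-a`) are hypothetical and are not part of FlexInfer; only the metric name and label names come from the list above. The parser handles the simple `name{k="v",...} value` form only, so treat it as a reading aid rather than a replacement for a real exposition-format parser.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Sample is one parsed line of Prometheus text exposition output.
// (Hypothetical type for illustration; not part of FlexInfer.)
type Sample struct {
	Name   string
	Labels map[string]string
	Value  float64
}

// parseSample handles the simple form `name{k="v",...} value`.
// It does not handle timestamps, escaped quotes, or commas inside label values.
func parseSample(line string) (Sample, error) {
	s := Sample{Labels: map[string]string{}}
	line = strings.TrimSpace(line)

	// The sample value is the last space-separated field.
	sp := strings.LastIndex(line, " ")
	if sp < 0 {
		return s, fmt.Errorf("no value in %q", line)
	}
	v, err := strconv.ParseFloat(line[sp+1:], 64)
	if err != nil {
		return s, fmt.Errorf("bad value in %q: %w", line, err)
	}
	s.Value = v

	head := strings.TrimSpace(line[:sp])
	open := strings.Index(head, "{")
	if open < 0 {
		// No label set at all.
		s.Name = head
		return s, nil
	}
	if !strings.HasSuffix(head, "}") {
		return s, fmt.Errorf("unterminated label set in %q", line)
	}
	s.Name = head[:open]

	body := head[open+1 : len(head)-1]
	if body == "" {
		return s, nil
	}
	for _, pair := range strings.Split(body, ",") {
		k, val, ok := strings.Cut(pair, "=")
		if !ok {
			return s, fmt.Errorf("bad label %q", pair)
		}
		s.Labels[strings.TrimSpace(k)] = strings.Trim(strings.TrimSpace(val), `"`)
	}
	return s, nil
}

func main() {
	// Label values here are made up; the metric and label names are from the list above.
	line := `flexinfer_tokens_per_second{model="example-model",backend="example-backend",node="node-a"} 42.5`
	s, err := parseSample(line)
	if err != nil {
		panic(err)
	}
	fmt.Println(s.Name, s.Labels["model"], s.Labels["backend"], s.Labels["node"], s.Value)
}
```

Grouping parsed samples by the `node` or `cache` label is one way to check that a scrape returns the series you expect for a given deployment.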

Proxy metrics (flexinfer-proxy)

Defined in services/flexinfer/cmd/flexinfer-proxy/main.go:

  • proxy_requests_total{model,status}
  • proxy_scale_ups_total{model}
  • proxy_request_duration_seconds{model}
  • proxy_queued_requests_total{model}
  • proxy_queue_rejected_total{model}
  • proxy_queue_wait_seconds{model}
  • proxy_active_connections{model}
  • proxy_queue_depth{model}
  • proxy_gpugroup_swap_signals_total{gpugroup,model}
  • proxy_gpugroup_queued_requests_total{gpugroup,model}
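The proxy metrics above can be queried with PromQL. The sketch below builds query strings in Go from a metric name and a label set. It assumes that the metrics ending in `_total` are Prometheus counters (the usual naming convention, not confirmed by this page), which is what makes wrapping them in `rate()` meaningful. The `selector` and `ratePerSecond` helpers, the `example-model` label value, and the `5m` window are hypothetical choices for illustration.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// selector renders a PromQL vector selector such as
// proxy_requests_total{model="m"} with labels in sorted order
// so the output is deterministic.
func selector(metric string, labels map[string]string) string {
	if len(labels) == 0 {
		return metric
	}
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	parts := make([]string, 0, len(keys))
	for _, k := range keys {
		// %q produces a Go-style double-quoted string; PromQL string
		// literals follow the same escaping rules.
		parts = append(parts, fmt.Sprintf("%s=%q", k, labels[k]))
	}
	return metric + "{" + strings.Join(parts, ",") + "}"
}

// ratePerSecond wraps a counter selector in rate() over the given window.
// Only meaningful if the metric is a counter (assumed here for *_total names).
func ratePerSecond(metric string, labels map[string]string, window string) string {
	return fmt.Sprintf("rate(%s[%s])", selector(metric, labels), window)
}

func main() {
	labels := map[string]string{"model": "example-model"}

	// Per-second request rate for one model over a 5-minute window.
	fmt.Println(ratePerSecond("proxy_requests_total", labels, "5m"))

	// Current queue depth for the same model, read as an instant value.
	fmt.Println(selector("proxy_queue_depth", labels))
}
```

Running this prints `rate(proxy_requests_total{model="example-model"}[5m])` followed by `proxy_queue_depth{model="example-model"}`, which can be pasted into the Prometheus expression browser.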