FlexInfer docs
flexinfer-config (Playground)
The YAML configuration schema validated by flexinfer-site’s playground.
flexinfer-config.yaml is a human-friendly configuration format intended for:
- quickly describing a model + runtime needs
- validating config in the flexinfer-site playground
- acting as a stable input to future tooling (CLI → CRDs)
This format is separate from the Kubernetes CRDs (Model, ModelDeployment, etc.).
Schema overview
Top-level keys:
- `version` (required): semver-ish string, e.g. `1.0`
- `model` (required): which model + backend to run
- `resources` (optional): GPU/CPU/memory and replica count
- `serving` (optional): port, concurrency, timeouts
- `logging` (optional): level and format
Example
```yaml
version: '1.0'
model:
  name: meta-llama/Llama-3.1-8B-Instruct
  backend: vllm
  quantization: awq
  context_length: 8192
resources:
  gpu:
    type: nvidia
    count: 1
    memory: 24Gi
  replicas: 1
  cpu: '4000m'
  memory: 32Gi
serving:
  port: 8080
  max_concurrent: 50
  timeout: 60s
logging:
  level: info
  format: json
```
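To illustrate the required/optional split, here is a minimal Python sketch of the required-field checks for a parsed config. `validate_config` is a hypothetical helper written for this page; the actual validator is the TypeScript schema in flexinfer-site, and its error messages will differ.

```python
# Sketch of the documented required-field rules (version, model.name,
# model.backend). Hypothetical helper -- not the real flexinfer-site validator.

BACKENDS = {"vllm", "llamacpp", "tgi", "ollama"}

def validate_config(cfg: dict) -> list[str]:
    """Return a list of validation errors; an empty list means valid."""
    errors = []
    if "version" not in cfg:
        errors.append("version is required")
    model = cfg.get("model")
    if not isinstance(model, dict):
        errors.append("model is required")
    else:
        if "name" not in model:
            errors.append("model.name is required")
        backend = model.get("backend")
        if backend is None:
            errors.append("model.backend is required")
        elif backend not in BACKENDS:
            errors.append(f"model.backend must be one of {sorted(BACKENDS)}")
    return errors

# The example config above, reduced to its required fields as a parsed dict:
cfg = {
    "version": "1.0",
    "model": {"name": "meta-llama/Llama-3.1-8B-Instruct", "backend": "vllm"},
}
print(validate_config(cfg))  # → []
```

Optional sections (`resources`, `serving`, `logging`) are simply absent from the checks: a config with only `version` and `model` passes.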
Field reference
model
- `name` (required): model name or HuggingFace model ID
- `backend` (required): `vllm` | `llamacpp` | `tgi` | `ollama`
- `quantization` (optional): `awq` | `gptq` | `fp16` | `fp32` | `int8` | `int4`
- `context_length` (optional): integer
- `max_batch_size` (optional): integer
resources
- `gpu.type` (optional): `amd` | `nvidia` | `cpu`
- `gpu.count` (optional): integer
- `gpu.memory` (optional): string like `24Gi`
- `replicas` (optional): integer
- `cpu` (optional): string like `4000m` or `4`
- `memory` (optional): string like `32Gi`
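The quantity strings follow Kubernetes conventions (`Gi` = 2^30 bytes, `m` = millicores). A sketch of normalizing them; `parse_memory` and `parse_cpu` are hypothetical helpers for illustration, not part of FlexInfer:

```python
# Normalize Kubernetes-style quantity strings as used in `resources`.

def parse_memory(s: str) -> int:
    """Parse strings like '24Gi' or '512Mi' into bytes."""
    units = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}
    for suffix, factor in units.items():
        if s.endswith(suffix):
            return int(s[: -len(suffix)]) * factor
    return int(s)  # plain byte count

def parse_cpu(s: str) -> float:
    """Parse '4000m' (millicores) or '4' (whole cores) into a core count."""
    if s.endswith("m"):
        return int(s[:-1]) / 1000
    return float(s)

print(parse_memory("24Gi"))  # 25769803776
print(parse_cpu("4000m"))    # 4.0
```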
serving
- `port` (optional): integer
- `max_concurrent` (optional): integer
- `timeout` (optional): string like `60s`, `5m`
- `health_check` (optional): boolean
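Duration strings like `60s` and `5m` can be normalized to seconds. A sketch under the assumption that only simple `<integer><unit>` durations are allowed; `parse_timeout` is a hypothetical helper, and `h` support is a guess beyond the documented examples:

```python
import re

def parse_timeout(s: str) -> int:
    """Parse duration strings like '60s' or '5m' into seconds.

    Hour ('h') support is an assumption; the docs only show 's' and 'm'.
    """
    m = re.fullmatch(r"(\d+)([smh])", s)
    if not m:
        raise ValueError(f"bad timeout: {s!r}")
    value, unit = int(m.group(1)), m.group(2)
    return value * {"s": 1, "m": 60, "h": 3600}[unit]

print(parse_timeout("60s"))  # 60
print(parse_timeout("5m"))   # 300
```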
logging
- `level` (optional): `debug` | `info` | `warn` | `error`
- `format` (optional): `json` | `text`
Source of truth (today)
The validator for this schema currently lives in:
services/flexinfer-site/lib/schemas/flexinfer-config.ts
The corresponding JSON Schema artifact lives in:
services/flexinfer/specs/jsonschema/flexinfer-config.schema.json
As the FlexInfer CLI evolves, this spec should become a shared artifact (generated schema + docs) so the playground and CLI don’t drift.