

FlexInfer Tools

AI inference deployment toolkit. Build deployment configurations with live validation, generate CLI commands, and explore the full configuration schema.

About FlexInfer

FlexInfer is an AI inference orchestration platform for Kubernetes. It simplifies deploying and scaling large language models with support for multiple backends (vLLM, TGI, Ollama), automatic GPU allocation, and production-ready configurations.
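As a concrete illustration, a deployment configuration of the kind the builder produces might look like the following. This is a hypothetical sketch, not the documented schema: the `flexinfer.io/v1` API group, the `InferenceDeployment` kind, and the field names (`backend`, `model`, `gpu`) are all assumptions for illustration.

```yaml
# Hypothetical FlexInfer deployment manifest.
# API group, kind, and field names are assumptions, not the documented schema.
apiVersion: flexinfer.io/v1
kind: InferenceDeployment
metadata:
  name: llama3-demo
spec:
  backend: vllm          # supported backends per the docs: vllm, tgi, ollama
  model: meta-llama/Meta-Llama-3-8B-Instruct
  replicas: 2
  resources:
    gpu: 1               # GPUs allocated per replica
```

In the builder, a configuration like this would be checked by the live validation as you edit, with the corresponding CLI command generated alongside it.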

Tags: Kubernetes, vLLM, TGI, Ollama, GPU