

FlexInfer Tools

AI inference deployment toolkit. Build deployment configurations with live validation, generate CLI commands, and explore the full configuration schema.

About FlexInfer

FlexInfer is an AI inference orchestration platform for Kubernetes. It simplifies deploying and scaling large language models with support for multiple backends (vLLM, TGI, Ollama), automatic GPU allocation, and production-ready configurations.
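As a concrete illustration, a deployment configuration of the kind the builder produces might look like the following. This is a hypothetical sketch, not the documented schema: the `flexinfer.io/v1` API group, the `InferenceDeployment` kind, and the field names (`backend`, `model`, `gpu`) are all assumptions for illustration.

```yaml
# Hypothetical FlexInfer deployment manifest.
# API group, kind, and field names are assumptions, not the documented schema.
apiVersion: flexinfer.io/v1
kind: InferenceDeployment
metadata:
  name: llama3-demo
spec:
  backend: vllm          # supported backends per the docs: vllm, tgi, ollama
  model: meta-llama/Meta-Llama-3-8B-Instruct
  replicas: 2
  resources:
    gpu: 1               # GPUs allocated per replica
```

In the builder, a configuration like this would be checked by the live validation as you edit, with the corresponding CLI command generated alongside it.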

Tags: Kubernetes, vLLM, TGI, Ollama, GPU