# FlexInfer Documentation

This directory is the canonical documentation for `services/flexinfer`.

## Getting Started

| Guide | Description |
|---|---|
| Installation | Install FlexInfer on your cluster |
| Quickstart | Deploy your first model in 5 minutes |
| Configuration | Environment variables and settings |

## User Guides

| Guide | Description |
|---|---|
| Models (v1alpha2) | Creating and managing Model resources |
| Proxy & Requests | Sending inference requests |
| API Compatibility | OpenAI API compatibility |
| Routing | Session affinity, prefix routing, load balancing |
| GPU Sharing | Time-sharing GPUs between models |
| Caching | Model weight caching strategies |
| Quantization Pipelines | GGUF/AWQ/GPTQ ModelCache quantization workflows |
| Operations | Day-2 operations and troubleshooting |

## Developer Guides

| Guide | Description |
|---|---|
| Local Development | Setting up a dev environment |
| Architecture | System design and components |
| Backends | Supported inference backends |
| Testing | Running tests |
| Release & Images | Building and releasing |

## Specifications

| Spec | Description |
|---|---|
| CRDs | Custom Resource Definitions |
| Proxy API | Proxy HTTP endpoints |
| Scheduler Extender | Kubernetes scheduler integration |
| Labels & Annotations | Resource metadata conventions |
| Metrics | Prometheus metrics reference |

## Planning

| Document | Description |
|---|---|
| Feature Inventory | Current feature status |
| Next Roadmap | Upcoming work |
| Multi-Tenancy Design | M1 namespace isolation foundation |
| Phase 1 | Controller & API hardening |
| Phase 2 | Serverless hardening |
| Phase 3 | Routing & performance |
| Phase 4 | Operational polish |
| Phase 5 | Multi-cluster (future) |

## Quick Links

- Need help? Start with Quickstart, then Operations.
- Debugging an issue? See the troubleshooting section in Operations.
- Looking for the API reference? See CRDs and Proxy API.

## Site Integration

These docs are intentionally written to be "site-syncable" (plain Markdown with optional YAML frontmatter), so `services/flexinfer-site` can copy and render them as part of the playground/docs experience.

Navigation is defined in `docs/nav.yaml`.
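
Because frontmatter is optional, any tool that syncs these files has to handle pages both with and without it. The sketch below shows one way to separate a leading YAML frontmatter block from the Markdown body. It assumes the common convention of `---` delimiter lines at the very top of the file; it is not taken from `services/flexinfer-site`, and the `title` key in the usage example is hypothetical.

```python
from typing import Optional, Tuple

# Assumed delimiter convention: a line containing only '---' opens and closes frontmatter.
FENCE = "---"


def split_frontmatter(text: str) -> Tuple[Optional[str], str]:
    """Split an optional leading YAML frontmatter block from a Markdown body.

    Returns (frontmatter_or_None, body). The YAML is returned as raw text;
    parsing it is left to the caller.
    """
    lines = text.splitlines(keepends=True)
    # No opening fence on the first line: the whole file is plain Markdown.
    if not lines or lines[0].rstrip("\r\n") != FENCE:
        return None, text
    # Look for the closing fence after the opening one.
    for i in range(1, len(lines)):
        if lines[i].rstrip("\r\n") == FENCE:
            frontmatter = "".join(lines[1:i])
            body = "".join(lines[i + 1:])
            return frontmatter, body
    # Opening fence without a closing one: treat the file as plain Markdown.
    return None, text


if __name__ == "__main__":
    # Hypothetical page with frontmatter (the 'title' key is illustrative only).
    page = "---\ntitle: Quickstart\n---\n# Quickstart\nDeploy a model.\n"
    meta, body = split_frontmatter(page)
    print(repr(meta))
    print(repr(body))
```

A page with no frontmatter comes back unchanged with `None` as its metadata, so plain Markdown files need no special handling.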