Welcome to My Homelab
November 27, 2025 · 5 min read
I’m Cody Blevins. This site is where I publish the systems I actually build and operate: a small, production-like homelab, the products running on top of it, and the operational lessons that come out of keeping the whole thing alive.
What started as “can I run LLMs locally?” turned into a more durable theme: make operations boring enough that the interesting work can happen on top of them. If a thing is useful, I want it to be deployable, observable, reversible, and documented well enough that I can come back to it three months later without re-learning it from scratch.
TL;DR
- The homelab is not just for model hosting anymore. It now backs public docs, playgrounds, demos, and product surfaces on this site.
- The current core stack is Harvester -> K3s -> GitLab CI -> Flux, with AMD-first GPU lanes for text, vision, and media workloads.
- The projects you’ll see most often here are FlexInfer, Loom / Loom Core / MentatLab, fi-fhir, and the public FlexDeck-powered demos.
- The current focus is less “can it run?” and more “can it ship repeatedly without surprises?”
The Philosophy
My day job is integration architecture and production systems. The homelab is where I can try ideas without asking for a budget or a change window, but still keep myself honest:
- build it like a service, not a one-off,
- measure it (even if the numbers are “directional”),
- and keep the failure modes visible.
The Hardware Zoo
It’s a mix of enterprise hardware and repurposed PCs. I optimize for “easy to repair” and “good enough,” not perfection.
The Backbone: Dell R730xd
The heart of the operation is a Dell PowerEdge R730xd running Harvester HCI. It handles virtualization and gives me a clean substrate for K3s control-plane and worker VMs, plus Longhorn-backed storage that is good enough to be operationally boring.
From this single server, I run:
- 3 K3s control plane nodes (HA Kubernetes, because I’ve been burned before)
- Multiple worker VMs for general compute
- Storage pools that back the entire cluster
The Power Duo: AMD Radeon RX 7900 XTX (x2)
I run two RX 7900 XTX cards and split them by workload so contention is predictable instead of mysterious:
- Quality lane (cblevins-7900xtx): always-on text inference plus larger on-demand models when I want more quality or reasoning headroom.
- Vision + fast lane (cblevins-5930k): the vision workload stays warm, and a faster text model can take over when that lane is idle.
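The lane split above can be sketched as a simple routing rule. This is a hypothetical illustration of the idea, not FlexInfer's actual API; the lane names come from the host names above, everything else is made up:

```python
from dataclasses import dataclass

@dataclass
class Lane:
    name: str
    busy: bool  # whether the lane's primary workload is currently active

def pick_lane(task: str, quality: Lane, vision_fast: Lane) -> str:
    """Route a request to a GPU lane so contention is predictable, not mysterious."""
    if task == "vision":
        return vision_fast.name   # vision stays warm on its own card
    if task == "text-fast" and not vision_fast.busy:
        return vision_fast.name   # the fast text model borrows the lane when it is idle
    return quality.name           # everything else goes to the quality lane

quality = Lane("cblevins-7900xtx", busy=True)
fast = Lane("cblevins-5930k", busy=False)
print(pick_lane("text-fast", quality, fast))  # -> cblevins-5930k
```

The point is that the policy is a few lines of explicit logic rather than two schedulers fighting over the same VRAM.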
The Legacy Branch
NVIDIA GTX 980 Ti - The old guard (cblevins-gtx980ti). It is no longer the star of the show, but it still earns its keep as a small media lane and a useful reminder that “legacy” hardware often sticks around longer than the architecture diagram wants to admit.
The Edge
Raspberry Pi 5 - Handling lightweight DNS and monitoring-adjacent work. It is the quiet counterpoint to the much louder basement hardware.
The Gaming PC Graveyard
Several machines in the cluster are former gaming rigs. I like hardware with a known failure history and parts I can replace.
The Software Stack
All of this hardware is orchestrated through GitOps. Everything that matters lives in version control, gets built in CI, and is reconciled automatically. If it cannot be rebuilt from Git, I do not count it as infrastructure.
Kubernetes (K3s)
The cluster runs K3s, with the control plane spread across three VMs and workers split between VM nodes and the dedicated GPU hosts. The exact node list changes over time. The important part is the operating model:
- GitLab CI builds artifacts,
- Harbor stores images,
- Flux reconciles desired state,
- and the cluster is treated like a projection of Git, not a hand-edited appliance.
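"A projection of Git" boils down to a reconcile loop. A minimal sketch of what Flux does on every pass (toy dictionaries standing in for manifests and live cluster objects):

```python
def reconcile(desired: dict, actual: dict) -> dict:
    """One pass of a Flux-style loop: converge the cluster toward what Git declares."""
    actions = {}
    for name, spec in desired.items():
        if actual.get(name) != spec:
            actions[name] = "apply"   # create or update anything that drifted
    for name in actual:
        if name not in desired:
            actions[name] = "prune"   # delete what Git no longer declares
    return actions

desired = {"litellm": "v1.2", "comfyui": "v0.3"}
actual = {"litellm": "v1.1", "harbor-proxy": "v2"}
print(reconcile(desired, actual))
# -> {'litellm': 'apply', 'comfyui': 'apply', 'harbor-proxy': 'prune'}
```

Hand-edits to the cluster show up as drift and get overwritten on the next pass, which is exactly the behavior I want.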
All configuration lives in the platform/gitops repo. Here's a peek at the structure:
platform/gitops/
├── clusters/
│ └── home-cluster/ # Flux definitions
├── k3s/
│ ├── ai/
│ │ ├── litellm/ # Model routing
│ │ ├── llamacpp/ # Text inference
│ │ └── comfyui/ # Image/Video gen
│ ├── apps/ # Web services
│ └── infra/ # Cert-manager, monitoring
└── Dockerfiles/ # Custom images
This "everything as code" approach means I can rebuild the entire cluster from scratch just by bootstrapping Flux.
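The bootstrap itself is one command. Roughly (the group and repository names here are illustrative, not my exact invocation; the path matches the tree above):

```
# Point Flux at the GitOps repo; it installs its own controllers into the
# cluster and starts reconciling everything under clusters/home-cluster.
flux bootstrap gitlab \
  --owner=my-gitlab-group \
  --repository=gitops \
  --branch=main \
  --path=clusters/home-cluster
```

After that, recovery is mostly a matter of waiting for reconciliation to catch up.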
What the cluster is backing now
The biggest change since I first wrote this post is that the homelab is now backing real public surfaces, not just private experiments:
- FlexInfer: the inference control plane and routing work behind my GPU posts, config contracts, and playgrounds.
- Loom / Loom Core / MentatLab: multi-assistant MCP config sync, agent lifecycle tooling, HUD/ops surfaces, and operator DAG workflows.
- fi-fhir: healthcare integration docs, runnable examples, and playground tools for source profiles and mappings.
- flexinfer-site itself: multi-project docs, case studies, and a growing set of public demos driven by the same stack I use internally.
- FlexDeck public visualizations: sanitized operational views for cluster topology, model status, CI pipelines, and metrics summaries.
AI / runtime stack
The runtime mix is intentionally pragmatic:
- FlexInfer for model routing, GPU scheduling, and deployment control
- llama.cpp and MLC-LLM where they make the most sense operationally
- ComfyUI for media workloads
- FlexDeck as the operations surface for live cluster and pipeline views
I still experiment with backends. The difference now is that experiments happen inside a clearer platform boundary instead of in a pile of one-off containers.
Storage
Longhorn provides distributed block storage across the cluster. It is not the fastest option, but it is Kubernetes-native, understandable, and good enough for the kinds of failures I actually want to debug.
Important data gets replicated. Model artifacts and experiments get more pragmatic treatment based on how expensive they are to rebuild.
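"Important data gets replicated" is ultimately one knob in a StorageClass. A sketch, assuming Longhorn's standard `numberOfReplicas` parameter; the class name is a hypothetical tier of mine:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-replicated    # hypothetical "important data" tier
provisioner: driver.longhorn.io
parameters:
  numberOfReplicas: "3"        # replicate across nodes; scratch tiers can run "1"
  staleReplicaTimeout: "2880"
```

Cheap-to-rebuild model artifacts go on a single-replica class; anything I would actually miss gets three copies.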
What I’m tightening next
The next wave of work is less about standing the system up and more about hardening the public edges:
- Observability rollout on this site: /metrics and app-level instrumentation are live; the remaining work is wiring the scrape and alerting path cleanly with ServiceMonitor and PrometheusRule.
- Public visualization hardening: the FlexDeck-backed demos are live, and the remaining cleanup is the boring security/operability work like rate limiting and narrower RBAC.
- Documentation + playground parity: keep the docs, examples, and interactive tools aligned so the public site reflects the real product state instead of lagging behind it.
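The scrape wiring mentioned above is a small amount of YAML. A sketch of the ServiceMonitor side, assuming the standard Prometheus Operator API; the name, namespace, and labels are placeholders for whatever the site's Service actually uses:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: flexinfer-site     # placeholder; must match the Service exposing /metrics
  namespace: apps
spec:
  selector:
    matchLabels:
      app: flexinfer-site
  endpoints:
    - port: http
      path: /metrics
      interval: 30s
```

A matching PrometheusRule for alerting is the same idea: a small declarative object that Flux reconciles like everything else.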
That’s the through-line for this whole setup now. The homelab is still where I experiment, but the bar is higher than “it works on my LAN.” The useful question is whether the thing can survive contact with repeated deploys, public demos, and future me.