Building a Real-Time Kubernetes Operations Dashboard
How I built FlexDeck, a full-stack operations dashboard with real-time K8s monitoring, GitLab CI/CD visualization, and AI model management using Go and SolidJS.
Overview
Managing a homelab Kubernetes cluster with multiple workloads requires visibility into cluster state, CI/CD pipelines, and AI model deployments. I built FlexDeck to consolidate these views into a single real-time dashboard with interactive visualizations.
The Challenge
Running a production-like homelab environment means dealing with:
- Multiple clusters: K3s for applications, Harvester for infrastructure
- GitLab CI/CD: Dozens of pipelines across 40+ repositories
- AI workloads: GPU scheduling, model health, inference metrics
- No commercial tooling budget: Datadog and similar tools are expensive for personal use
I needed a dashboard that would:
- Show real-time cluster state without manual refresh
- Visualize CI/CD pipelines with enough detail to debug failures
- Monitor GPU utilization and AI model health
- Work on both desktop and mobile for on-the-go checks
The Approach
Technology Choices
Backend: Go
- Excellent Kubernetes client libraries
- Low memory footprint for always-on service
- Strong concurrency primitives for real-time streaming
Frontend: SolidJS
- Fine-grained reactivity without Virtual DOM overhead
- Excellent performance for frequent updates
- Smaller bundle than React for faster mobile loads
Data streaming: WebSocket + Server-Sent Events
- WebSocket for bidirectional communication (kubectl exec, logs)
- SSE for unidirectional updates (cluster state, metrics)
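To make the SSE half of that split concrete, here is a minimal sketch of a Go handler that streams messages from a channel as Server-Sent Events. The names `formatSSE` and `sseHandler`, the `/events` path, the `update` event name, and the sample payload are assumptions for illustration, not FlexDeck's actual code.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// formatSSE renders one SSE frame: an event name, one "data:" line per
// line of payload, and the blank line that terminates the frame.
func formatSSE(event, data string) string {
	var b strings.Builder
	fmt.Fprintf(&b, "event: %s\n", event)
	for _, line := range strings.Split(data, "\n") {
		fmt.Fprintf(&b, "data: %s\n", line)
	}
	b.WriteString("\n")
	return b.String()
}

// sseHandler (hypothetical) streams every message from updates to one client
// until the client disconnects or the channel is closed.
func sseHandler(updates <-chan string) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		flusher, ok := w.(http.Flusher)
		if !ok {
			http.Error(w, "streaming unsupported", http.StatusInternalServerError)
			return
		}
		w.Header().Set("Content-Type", "text/event-stream")
		w.Header().Set("Cache-Control", "no-cache")
		for {
			select {
			case <-r.Context().Done(): // client went away
				return
			case msg, ok := <-updates:
				if !ok {
					return
				}
				fmt.Fprint(w, formatSSE("update", msg))
				flusher.Flush() // push the frame now instead of buffering
			}
		}
	}
}

func main() {
	// Exercise the handler in-process with one sample message.
	updates := make(chan string, 1)
	updates <- `{"type":"pod","action":"updated"}`
	close(updates)

	rec := httptest.NewRecorder()
	sseHandler(updates)(rec, httptest.NewRequest(http.MethodGet, "/events", nil))
	fmt.Print(rec.Body.String())
}
```

In a real server the handler would be registered with `http.Handle("/events", sseHandler(ch))`, and the browser would consume it with `EventSource`, which reconnects on its own after a dropped connection.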
Architecture
Implementation Details
Kubernetes Watch Streams
The Go backend uses informers to watch cluster resources efficiently:
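import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
)

// ClusterWatcher forwards pod changes from shared informers to the dashboard.
type ClusterWatcher struct {
	clientset *kubernetes.Clientset
	informers informers.SharedInformerFactory
	updates   chan ResourceUpdate
}

// WatchPods registers handlers that push pod events onto w.updates.
// The informer factory still has to be started separately.
func (w *ClusterWatcher) WatchPods(ctx context.Context) {
	// send gives up when ctx is cancelled so a handler never blocks forever.
	send := func(action string, pod *corev1.Pod) {
		select {
		case w.updates <- ResourceUpdate{Type: "pod", Action: action, Resource: podToDTO(pod)}:
		case <-ctx.Done():
		}
	}

	informer := w.informers.Core().V1().Pods().Informer()
	informer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) {
			send("added", obj.(*corev1.Pod))
		},
		UpdateFunc: func(old, new interface{}) {
			send("updated", new.(*corev1.Pod))
		},
		DeleteFunc: func(obj interface{}) {
			// A missed delete arrives wrapped in a tombstone, not as a *corev1.Pod.
			if tombstone, ok := obj.(cache.DeletedFinalStateUnknown); ok {
				obj = tombstone.Obj
			}
			if pod, ok := obj.(*corev1.Pod); ok {
				send("deleted", pod)
			}
		},
	})
}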
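The `updates` channel above has a single reader, while a dashboard usually has several browsers connected at once. One way to bridge that gap is a small fan-out hub. This is a sketch under assumptions: `Hub`, `Subscribe`, and `Publish` are hypothetical names, updates are plain strings for brevity, and slow subscribers are dropped rather than allowed to stall the informer handlers.

```go
package main

import (
	"fmt"
	"sync"
)

// Hub fans one stream of updates out to many subscribers.
type Hub struct {
	mu   sync.Mutex
	subs map[chan string]struct{}
}

func NewHub() *Hub {
	return &Hub{subs: make(map[chan string]struct{})}
}

// Subscribe registers a buffered channel and returns it with a cancel func.
func (h *Hub) Subscribe(buffer int) (<-chan string, func()) {
	ch := make(chan string, buffer)
	h.mu.Lock()
	h.subs[ch] = struct{}{}
	h.mu.Unlock()

	cancel := func() {
		h.mu.Lock()
		defer h.mu.Unlock()
		if _, ok := h.subs[ch]; ok { // safe to call more than once
			delete(h.subs, ch)
			close(ch)
		}
	}
	return ch, cancel
}

// Publish offers msg to every subscriber without blocking and returns
// how many accepted it. A subscriber with a full buffer misses the message.
func (h *Hub) Publish(msg string) int {
	h.mu.Lock()
	defer h.mu.Unlock()
	delivered := 0
	for ch := range h.subs {
		select {
		case ch <- msg:
			delivered++
		default: // slow client: drop instead of blocking the producer
		}
	}
	return delivered
}

func main() {
	hub := NewHub()
	fast, cancelFast := hub.Subscribe(2)
	_, cancelSlow := hub.Subscribe(0) // never read, so every message is dropped
	defer cancelFast()
	defer cancelSlow()

	fmt.Println("delivered:", hub.Publish("pod/web-1 updated"))
	fmt.Println("received:", <-fast)
}
```

Dropping on a full buffer trades completeness for liveness. A client that misses an update can be resynchronized with a fresh snapshot, which is usually preferable to one stuck browser tab delaying everyone else.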
Fine-Grained Reactivity with SolidJS
SolidJS signals update only the specific DOM elements that need to change:
import { Show } from 'solid-js';

// Minimal shape inferred from the fields used below.
interface Pod {
  name: string;
  status: string;
  restarts: number;
}

function PodCard(props: { pod: Pod }) {
  // Derived value: re-evaluated when the reactive read inside it
  // (props.pod.status) changes. The component function itself runs once.
  const statusColor = () => {
    switch (props.pod.status) {
      case 'Running':
        return 'text-green-400';
      case 'Pending':
        return 'text-yellow-400';
      case 'Failed':
        return 'text-red-400';
      default:
        return 'text-gray-400';
    }
  };

  return (
    <div class="pod-card">
      <span class="pod-name">{props.pod.name}</span>
      <span class={`pod-status ${statusColor()}`}>{props.pod.status}</span>
      <Show when={props.pod.restarts > 0}>
        <span class="restart-count">Restarts: {props.pod.restarts}</span>
      </Show>
    </div>
  );
}
GitLab CI/CD Visualization
For CI/CD pipelines, I built a custom visualization using D3.js with particle effects showing job flow:
- Pipelines displayed as directed graphs
- Jobs animate between stages as they progress
- Failed jobs pulse red for visibility
- Click-to-expand for job logs
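Before anything can be drawn, the flat job list has to become graph-shaped data. The sketch below groups jobs into ordered stages and derives a per-stage status. It is illustrative only: the `Job` and `Stage` types, the status strings, and the assumption that jobs arrive in stage order are mine, not taken from the GitLab client used in FlexDeck.

```go
package main

import "fmt"

// Job holds the subset of fields the graph needs (field names are assumptions).
type Job struct {
	Name   string
	Stage  string
	Status string // e.g. "success", "failed", "running", "pending"
}

// Stage is one column of the pipeline graph.
type Stage struct {
	Name   string
	Jobs   []Job
	Status string
}

// stageStatus summarizes a stage: failed wins, then running, then pending.
// Any other combination is reported as "success" in this sketch.
func stageStatus(jobs []Job) string {
	seen := map[string]bool{}
	for _, j := range jobs {
		seen[j.Status] = true
	}
	for _, s := range []string{"failed", "running", "pending"} {
		if seen[s] {
			return s
		}
	}
	return "success"
}

// groupByStage builds stages in the order in which each stage first appears.
func groupByStage(jobs []Job) []Stage {
	index := map[string]int{}
	var stages []Stage
	for _, j := range jobs {
		i, ok := index[j.Stage]
		if !ok {
			i = len(stages)
			index[j.Stage] = i
			stages = append(stages, Stage{Name: j.Stage})
		}
		stages[i].Jobs = append(stages[i].Jobs, j)
	}
	for i := range stages {
		stages[i].Status = stageStatus(stages[i].Jobs)
	}
	return stages
}

func main() {
	jobs := []Job{
		{Name: "lint", Stage: "test", Status: "success"},
		{Name: "unit", Stage: "test", Status: "failed"},
		{Name: "image", Stage: "build", Status: "pending"},
	}
	for _, s := range groupByStage(jobs) {
		fmt.Printf("%s [%s]: %d job(s)\n", s.Name, s.Status, len(s.Jobs))
	}
}
```

Each `Stage` maps naturally to a column of nodes in the graph, and its `Status` can drive the column-level styling such as the red pulse for failures.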
GPU Monitoring
The dashboard integrates with DCGM (NVIDIA) and ROCm metrics for GPU health:
type GPUMetrics struct {
	DeviceID    string  `json:"device_id"`
	Utilization float64 `json:"utilization"`
	MemoryUsed  int64   `json:"memory_used"`
	MemoryTotal int64   `json:"memory_total"`
	Temperature int     `json:"temperature"`
	PowerDraw   float64 `json:"power_draw"`
	ActiveModel string  `json:"active_model,omitempty"`
}
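As a usage sketch, the struct can be serialized for the frontend and extended with small derived values. The `MemoryPercent` helper and the sample numbers are hypothetical, and the units (bytes for memory) are an assumption, since the struct itself does not state them.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
)

// GPUMetrics repeats the struct from the article so this example is self-contained.
type GPUMetrics struct {
	DeviceID    string  `json:"device_id"`
	Utilization float64 `json:"utilization"`
	MemoryUsed  int64   `json:"memory_used"`
	MemoryTotal int64   `json:"memory_total"`
	Temperature int     `json:"temperature"`
	PowerDraw   float64 `json:"power_draw"`
	ActiveModel string  `json:"active_model,omitempty"`
}

// MemoryPercent returns used memory as a percentage of total,
// or 0 when the total is unknown (hypothetical helper).
func (m GPUMetrics) MemoryPercent() float64 {
	if m.MemoryTotal <= 0 {
		return 0
	}
	return float64(m.MemoryUsed) / float64(m.MemoryTotal) * 100
}

func main() {
	// Sample values only; memory is assumed to be in bytes.
	m := GPUMetrics{
		DeviceID:    "gpu-0",
		Utilization: 87.5,
		MemoryUsed:  6 << 30,
		MemoryTotal: 8 << 30,
		Temperature: 71,
		PowerDraw:   210,
	}

	out, err := json.Marshal(m)
	if err != nil {
		log.Fatal(err)
	}
	// ActiveModel is empty, so omitempty leaves "active_model" out of the JSON.
	fmt.Println(string(out))
	fmt.Printf("memory used: %.1f%%\n", m.MemoryPercent())
}
```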
Results
Performance Metrics
| Metric | Target | Achieved |
|---|---|---|
| Initial load time | <2s | 1.2s |
| Update latency | <200ms | <100ms |
| Memory usage (backend) | <100MB | 65MB |
| Bundle size (frontend) | <500KB | 380KB |
Operational Impact
Before FlexDeck:
- Switching between kubectl, Grafana, and GitLab UI constantly
- Missing pipeline failures until builds broke
- No mobile access to cluster state
After FlexDeck:
- Single pane of glass for all operations
- Real-time notifications for failures
- Check cluster health from phone during incidents
Visualizations Built
- Cluster topology: Interactive node and pod layout
- Pipeline DAG: Directed graph with job status animations
- Resource utilization: Real-time charts for CPU/memory/GPU
- Namespace overview: Grid view with health indicators
- Model registry: AI model versions and deployment status
Lessons Learned
SolidJS for Real-Time UIs
The choice of SolidJS over React paid off significantly:
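- Synchronous updates: A signal write updates the bound DOM nodes directly instead of scheduling a component re-render
- No re-render cascades: Components run once, so a parent update doesn't re-run its children
- Smaller runtime: Roughly 7KB vs. 40KB+ for React with ReactDOM (minified and gzipped)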
The mental model is different (signals vs state), but for dashboards with many independent updating components, it's worth learning.
Kubernetes Informers vs Polling
Initially I polled the API server every 5 seconds. Switching to informers:
- Reduced API server load by 95%
- Eliminated "stale data" UX issues
- Enabled instant status updates
WebSocket Reconnection
Real-time connections fail. Robust reconnection is essential:
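import (
	"context"
	"log"
	"time"
)

// maintainConnection reconnects with exponential backoff until ctx is cancelled.
func (c *Client) maintainConnection(ctx context.Context) {
	backoff := time.Second
	maxBackoff := time.Minute
	for {
		// Stop before (re)connecting if the caller has cancelled.
		select {
		case <-ctx.Done():
			return
		default:
		}
		if err := c.connect(); err != nil {
			log.Printf("Connection failed: %v, retrying in %v", err, backoff)
			// Wait out the backoff, but wake immediately on cancellation.
			select {
			case <-ctx.Done():
				return
			case <-time.After(backoff):
			}
			backoff = min(backoff*2, maxBackoff) // built-in min needs Go 1.21+
			continue
		}
		backoff = time.Second // Reset on success
		c.handleMessages(ctx)
	}
}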
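One refinement worth considering for the loop above is jitter. If the server restarts, every client backs off on the same schedule and retries at the same instant. The sketch below separates the backoff arithmetic into two pure functions. `nextBackoff` and `withJitter` are hypothetical helpers, not part of the code above, and the "half to full delay" jitter range is just one common choice.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// nextBackoff doubles the delay and caps it at limit.
func nextBackoff(current, limit time.Duration) time.Duration {
	next := current * 2
	if next > limit || next <= 0 { // next <= 0 guards against overflow
		return limit
	}
	return next
}

// withJitter returns a random delay in [d/2, d] so that clients that were
// disconnected together do not all retry at the same moment.
func withJitter(d time.Duration, rng *rand.Rand) time.Duration {
	half := d / 2
	return half + time.Duration(rng.Int63n(int64(d-half)+1))
}

func main() {
	rng := rand.New(rand.NewSource(time.Now().UnixNano()))
	delay := time.Second
	for attempt := 1; attempt <= 8; attempt++ {
		fmt.Printf("attempt %d: base %v, jittered %v\n", attempt, delay, withJitter(delay, rng))
		delay = nextBackoff(delay, time.Minute)
	}
}
```

In the reconnection loop, the wait would use `withJitter(backoff, rng)` while `backoff` itself keeps following `nextBackoff`, so the cap and the reset-on-success behavior stay unchanged.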
Future Improvements
- Multi-tenant support: Share dashboard with team members
- Alert integration: PagerDuty/Slack notifications from dashboard
- Cost tracking: Integrate with cloud billing APIs
- Capacity planning: Predictive scaling recommendations
Conclusion
Building a custom operations dashboard was more work than using off-the-shelf tools, but the result is exactly what I need: fast, focused, and free. The combination of Go's efficiency and SolidJS's reactivity creates a dashboard that feels instant, even when monitoring multiple clusters simultaneously.
For homelabbers and small teams, this approach provides commercial-grade visibility without commercial-grade costs.