Production Hardening Guide
This guide covers security hardening for fi-fhir deployments in healthcare environments requiring HIPAA compliance.
Table of Contents
- Security Overview
- Container Security
- Kubernetes Security
- Network Security
- Secrets Management
- Encryption
- Audit Logging
- Access Control
- Monitoring & Alerting
- Disaster Recovery
Security Overview
HIPAA Technical Safeguards
fi-fhir deployments handling PHI must implement:
| Safeguard | Implementation |
|---|---|
| Access Control | RBAC, service accounts, network policies |
| Audit Controls | Structured logging, trace correlation, event recording |
| Integrity Controls | Image signing, checksum verification, immutable infrastructure |
| Transmission Security | TLS 1.3, mTLS between services |
| Encryption | At-rest (database, secrets), in-transit (TLS) |
Security Checklist
[ ] Container runs as non-root user
[ ] Read-only root filesystem
[ ] No privileged containers
[ ] Resource limits configured
[ ] Network policies applied
[ ] Secrets encrypted at rest
[ ] TLS enabled for all endpoints
[ ] Audit logging enabled
[ ] Health checks configured
[ ] Pod disruption budget set
[ ] Vulnerability scanning in CI
[ ] Image signatures verified
Container Security
Dockerfile Best Practices
The fi-fhir Dockerfile uses a multi-stage build, a distroless base image, and a non-root user:
# Multi-stage build minimizes attack surface
FROM golang:1.22-alpine AS builder
# ... build stage ...
# Distroless base - no shell, no package manager
FROM gcr.io/distroless/static-debian12:nonroot
# Run as non-root user (UID 65532)
USER nonroot:nonroot
# Binary only - minimal attack surface
COPY --from=builder --chown=nonroot:nonroot /fi-fhir /fi-fhir
Image Scanning
Scan images before deployment:
# Trivy scan (--exit-code 1 fails the step when CRITICAL/HIGH findings exist)
trivy image --severity CRITICAL,HIGH --exit-code 1 fi-fhir:latest
# Grype scan
grype fi-fhir:latest
# Snyk scan
snyk container test fi-fhir:latest
Image Signing (Cosign)
Sign images for supply chain security:
# Generate key pair
cosign generate-key-pair
# Sign image
cosign sign --key cosign.key registry.gitlab.flexinfer.ai/libs/fi-fhir:v1.0.0
# Verify signature
cosign verify --key cosign.pub registry.gitlab.flexinfer.ai/libs/fi-fhir:v1.0.0
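The scan and verify steps above can be chained into one gate that blocks a deployment when either step fails. The following is a minimal sketch, assuming `trivy` and `cosign` are on the PATH. The function name `predeploy_gate` and its arguments are hypothetical and not part of fi-fhir.

```shell
# Hypothetical pre-deployment gate: scan the image, then verify its signature.
# Usage: predeploy_gate <image-ref> <cosign-public-key>
predeploy_gate() {
  image=$1
  pubkey=$2

  # --exit-code 1 makes trivy return non-zero when findings at the
  # listed severities exist, so the gate can stop here.
  if ! trivy image --severity CRITICAL,HIGH --exit-code 1 "$image"; then
    echo "gate: CRITICAL/HIGH vulnerabilities found in $image" >&2
    return 1
  fi

  # cosign verify returns non-zero when no valid signature matches the key.
  if ! cosign verify --key "$pubkey" "$image" >/dev/null; then
    echo "gate: signature verification failed for $image" >&2
    return 1
  fi

  echo "gate: $image passed scan and signature checks"
}
```

A CI job could call `predeploy_gate registry.gitlab.flexinfer.ai/libs/fi-fhir:v1.0.0 cosign.pub` and deploy only when it returns zero.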
Kubernetes Security
Pod Security Standards
Apply restrictive pod security:
# namespace-security.yaml
apiVersion: v1
kind: Namespace
metadata:
name: fi-fhir
labels:
pod-security.kubernetes.io/enforce: restricted
pod-security.kubernetes.io/audit: restricted
pod-security.kubernetes.io/warn: restricted
Security Context
The Helm chart applies these security contexts by default:
# Pod-level security
podSecurityContext:
runAsNonRoot: true
runAsUser: 65532
runAsGroup: 65532
fsGroup: 65532
seccompProfile:
type: RuntimeDefault
# Container-level security
securityContext:
allowPrivilegeEscalation: false
readOnlyRootFilesystem: true
capabilities:
drop:
- ALL
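Because values overrides can silently relax these defaults, it may help to check the rendered manifest before applying it. The sketch below is a plain text match, not a YAML parser, and the function name `check_security_context` is hypothetical.

```shell
# Hypothetical check: confirm a rendered manifest still contains the
# hardened security-context settings. Matches fixed strings only.
check_security_context() {
  manifest=$1
  missing=0
  for setting in \
    "runAsNonRoot: true" \
    "allowPrivilegeEscalation: false" \
    "readOnlyRootFilesystem: true"
  do
    if ! grep -qF -- "$setting" "$manifest"; then
      echo "missing: $setting" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

One way to use it is to render the chart with `helm template fi-fhir deploy/helm/fi-fhir/ -f production-values.yaml > rendered.yaml` and then run `check_security_context rendered.yaml`.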
Resource Limits
Always set resource limits to prevent resource exhaustion:
resources:
limits:
cpu: 500m
memory: 512Mi
ephemeral-storage: 100Mi
requests:
cpu: 100m
memory: 128Mi
ephemeral-storage: 50Mi
Pod Disruption Budget
Ensure availability during cluster operations:
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
name: fi-fhir
spec:
minAvailable: 1
selector:
matchLabels:
app.kubernetes.io/name: fi-fhir
Network Security
Network Policies
Default-deny with explicit allow rules:
# network-policy.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: fi-fhir-default-deny
namespace: fi-fhir
spec:
podSelector: {}
policyTypes:
- Ingress
- Egress
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: fi-fhir-allow-ingress
namespace: fi-fhir
spec:
podSelector:
matchLabels:
app.kubernetes.io/name: fi-fhir
policyTypes:
- Ingress
ingress:
# Allow from ingress controller
- from:
- namespaceSelector:
matchLabels:
kubernetes.io/metadata.name: ingress-nginx
ports:
- protocol: TCP
port: 8080
# Allow Prometheus scraping
- from:
- namespaceSelector:
matchLabels:
kubernetes.io/metadata.name: monitoring
ports:
- protocol: TCP
port: 9090
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: fi-fhir-allow-egress
namespace: fi-fhir
spec:
podSelector:
matchLabels:
app.kubernetes.io/name: fi-fhir
policyTypes:
- Egress
egress:
# Allow DNS
- to:
- namespaceSelector: {}
podSelector:
matchLabels:
k8s-app: kube-dns
ports:
- protocol: UDP
port: 53
- protocol: TCP # DNS falls back to TCP for large responses
port: 53
# Allow FHIR server
- to:
- ipBlock:
cidr: 10.0.0.0/8 # Internal network; narrow to the FHIR server's subnet in production
ports:
- protocol: TCP
port: 443
# Allow database
- to:
- namespaceSelector:
matchLabels:
kubernetes.io/metadata.name: database
ports:
- protocol: TCP
port: 5432
Service Mesh (Istio)
For mTLS between services:
# peer-authentication.yaml
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
name: fi-fhir-mtls
namespace: fi-fhir
spec:
selector:
matchLabels:
app.kubernetes.io/name: fi-fhir
mtls:
mode: STRICT
---
# authorization-policy.yaml
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
name: fi-fhir-authz
namespace: fi-fhir
spec:
selector:
matchLabels:
app.kubernetes.io/name: fi-fhir
action: ALLOW
rules:
- from:
- source:
principals:
- cluster.local/ns/ingress-nginx/sa/ingress-nginx
to:
- operation:
methods: ['GET', 'POST']
paths: ['/api/*', '/health', '/ready']
Secrets Management
Kubernetes Secrets (Encrypted)
Enable encryption at rest for secrets:
# encryption-config.yaml (for kube-apiserver)
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
- resources:
- secrets
providers:
- aescbc: # Kubernetes docs flag aescbc as weak; prefer secretbox or a KMS v2 provider where available
keys:
- name: key1
secret: <base64-encoded-32-byte-key>
- identity: {}
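The `<base64-encoded-32-byte-key>` placeholder needs a random key. A minimal sketch for producing one, assuming `/dev/urandom` and the `base64` utility are available; the function name is illustrative only.

```shell
# Generate a random 32-byte key and print it base64-encoded on one line.
# The function name is illustrative, not part of any tool.
generate_encryption_key() {
  head -c 32 /dev/urandom | base64 | tr -d '\n'
  echo
}
```

Treat the output as a secret: paste it into the encryption configuration on the control-plane host and keep it out of version control.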
External Secrets Operator
For production, use External Secrets with HashiCorp Vault:
# secret-store.yaml
apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
name: vault-backend
namespace: fi-fhir
spec:
provider:
vault:
server: https://vault.example.com
path: secret
version: v2
auth:
kubernetes:
mountPath: kubernetes
role: fi-fhir
serviceAccountRef:
name: fi-fhir
---
# external-secret.yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
name: fi-fhir-secrets
namespace: fi-fhir
spec:
refreshInterval: 1h
secretStoreRef:
name: vault-backend
kind: SecretStore
target:
name: fi-fhir
creationPolicy: Owner
data:
- secretKey: database-password
remoteRef:
key: fi-fhir/database
property: password
- secretKey: fhir-bearer-token
remoteRef:
key: fi-fhir/fhir
property: bearer_token
Sealed Secrets
Alternative for GitOps workflows:
# Install kubeseal
brew install kubeseal
# Seal a secret
kubectl create secret generic fi-fhir-secrets \
--from-literal=database-password=secret \
--dry-run=client -o yaml | \
kubeseal --format yaml > sealed-secret.yaml
# Apply sealed secret
kubectl apply -f sealed-secret.yaml
Encryption
TLS Configuration
Ingress TLS with cert-manager:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: fi-fhir
annotations:
cert-manager.io/cluster-issuer: letsencrypt-prod
nginx.ingress.kubernetes.io/ssl-redirect: 'true'
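# proxy-ssl-protocols applies to connections from NGINX to an HTTPS backend;
# client-facing protocols are set with ssl-protocols in the ingress-nginx ConfigMap
nginx.ingress.kubernetes.io/proxy-ssl-protocols: 'TLSv1.3'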
spec:
ingressClassName: nginx
tls:
- hosts:
- fi-fhir.example.com
secretName: fi-fhir-tls
rules:
- host: fi-fhir.example.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: fi-fhir
port:
name: http
Database TLS:
# In values.yaml
config:
database:
enabled: true
sslMode: verify-full # Require TLS with certificate verification
Data at Rest
For database encryption:
-- PostgreSQL: Enable pgcrypto
CREATE EXTENSION IF NOT EXISTS pgcrypto;
-- Encrypt sensitive columns
ALTER TABLE workflow_events
ALTER COLUMN payload
SET DATA TYPE bytea
USING pgp_sym_encrypt(payload::text, current_setting('app.encryption_key'))::bytea;
Audit Logging
Structured Logging Configuration
# In values.yaml
config:
observability:
logLevel: info
logFormat: json # Structured JSON for log aggregation
tracingEnabled: true
Log Fields for Compliance
fi-fhir logs include:
{
"timestamp": "2024-01-15T10:30:00.123Z",
"level": "info",
"message": "Event processed",
"trace_id": "abc123",
"span_id": "def456",
"event_type": "patient_admit",
"source": "epic_adt",
"action": "fhir",
"duration_ms": 45,
"status": "success"
}
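A log pipeline can reject or flag lines that lack the correlation fields shown above. The sketch below checks a single JSON log line with a plain text match on `"key":`, so it needs no JSON tooling but does not validate values or nesting. The function name `has_audit_fields` and the choice of required fields are assumptions.

```shell
# Hypothetical check that one JSON log line carries the fields shown above.
# Succeeds only when every listed key appears in the line.
has_audit_fields() {
  line=$1
  for field in timestamp level trace_id span_id event_type source action status; do
    case $line in
      *"\"$field\":"*) ;;
      *) echo "missing field: $field" >&2; return 1 ;;
    esac
  done
  return 0
}
```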
Kubernetes Audit Policy
# audit-policy.yaml
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
# Log all access to secrets
- level: Metadata
resources:
- group: ''
resources: ['secrets']
namespaces: ['fi-fhir']
# Log all changes to fi-fhir resources
- level: RequestResponse
verbs: ['create', 'update', 'patch', 'delete']
namespaces: ['fi-fhir']
Access Control
RBAC Configuration
# rbac.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
name: fi-fhir-operator
namespace: fi-fhir
rules:
- apiGroups: ['']
resources: ['pods', 'services', 'configmaps']
verbs: ['get', 'list', 'watch']
- apiGroups: ['apps']
resources: ['deployments']
verbs: ['get', 'list', 'watch', 'update', 'patch']
- apiGroups: ['']
resources: ['pods/log']
verbs: ['get', 'list']
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: fi-fhir-operators
namespace: fi-fhir
subjects:
- kind: Group
name: fi-fhir-operators
apiGroup: rbac.authorization.k8s.io
roleRef:
kind: Role
name: fi-fhir-operator
apiGroup: rbac.authorization.k8s.io
Service Account
apiVersion: v1
kind: ServiceAccount
metadata:
name: fi-fhir
namespace: fi-fhir
automountServiceAccountToken: false # Don't mount unless needed
Monitoring & Alerting
Critical Alerts
See dashboards/alerting/workflow-alerts-k8s.yaml for full alert rules:
# Key alerts for production
groups:
- name: fi-fhir-critical
rules:
- alert: FiFhirHighErrorRate
expr: |
rate(workflow_action_errors_total[5m])
/ rate(workflow_events_processed_total[5m]) > 0.05
for: 5m
labels:
severity: critical
annotations:
summary: High error rate in fi-fhir workflow
- alert: FiFhirDLQBacklog
expr: workflow_dlq_size > 100
for: 10m
labels:
severity: warning
annotations:
summary: Dead letter queue growing
- alert: FiFhirCircuitBreakerOpen
expr: workflow_circuit_breaker_state == 2
for: 1m
labels:
severity: critical
annotations:
summary: Circuit breaker open - external service failing
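The FiFhirHighErrorRate expression divides the error rate by the processed-event rate and fires above 0.05. The sketch below reproduces that arithmetic for two counter increases taken over the same window, which can be handy when testing thresholds by hand. The function name is hypothetical, and treating zero traffic as "not exceeding" is an assumption.

```shell
# Succeed (exit 0) when errors/processed exceeds the threshold.
# Arguments: <errors> <processed> <threshold>, all over the same window.
error_ratio_exceeds() {
  awk -v e="$1" -v p="$2" -v t="$3" 'BEGIN {
    if (p <= 0) exit 1              # no traffic: treated as not exceeding
    exit ((e / p > t) ? 0 : 1)
  }'
}
```

For example, `error_ratio_exceeds 6 100 0.05` succeeds (6% of events failed), while `error_ratio_exceeds 4 100 0.05` does not.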
SLO Targets
| Metric | Target | Alert Threshold |
|---|---|---|
| Availability | 99.9% | < 99.5% |
| Latency (p99) | < 500ms | > 1s |
| Error Rate | < 0.1% | > 1% |
| DLQ Size | 0 | > 100 |
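An availability target translates into a downtime budget. The sketch below converts a percentage into allowed minutes of downtime. The 30-day window is an assumption, since the table does not state a measurement period, and the function name is hypothetical.

```shell
# Allowed downtime in minutes for an availability target over a window.
# Arguments: <availability-percent> <window-days>
error_budget_minutes() {
  awk -v a="$1" -v d="$2" 'BEGIN { printf "%.1f\n", (100 - a) / 100 * d * 24 * 60 }'
}

error_budget_minutes 99.9 30   # target from the table
error_budget_minutes 99.5 30   # alert threshold from the table
```

Over 30 days, the 99.9% target allows 43.2 minutes of downtime, and the 99.5% alert threshold corresponds to 216.0 minutes.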
Disaster Recovery
Backup Strategy
# Database backup (PostgreSQL)
pg_dump -h $DB_HOST -U $DB_USER -d fi_fhir | \
gzip | \
aws s3 cp - s3://backups/fi-fhir/$(date +%Y%m%d).sql.gz
# Workflow configuration backup
kubectl get configmap fi-fhir -n fi-fhir -o yaml > workflow-config-backup.yaml
# Secrets backup (encrypted)
kubectl get secret fi-fhir -n fi-fhir -o yaml | \
kubeseal --format yaml > sealed-secret-backup.yaml
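The database backup above writes a date-stamped object, while the recovery procedure reads `latest.sql.gz`. One way to keep both in step is sketched below, assuming `pg_dump`, `gzip`, and the AWS CLI are installed. The function name `backup_db` and its prefix argument are hypothetical. Dumping to a temporary file first lets a failed `pg_dump` abort the run before anything is uploaded.

```shell
# Hypothetical backup wrapper: upload a dated dump, then refresh latest.sql.gz.
# Usage: backup_db <s3-prefix>   e.g. backup_db s3://backups/fi-fhir
# The body runs in a subshell so the trap and variables stay local.
backup_db() (
  prefix=$1
  stamp=$(date +%Y%m%d)
  tmp=$(mktemp) || exit 1
  trap 'rm -f "$tmp" "$tmp.gz"' EXIT

  # Stop before uploading if the dump itself fails.
  pg_dump -h "$DB_HOST" -U "$DB_USER" -d fi_fhir > "$tmp" || exit 1
  gzip "$tmp" || exit 1            # writes $tmp.gz and removes $tmp

  aws s3 cp "$tmp.gz" "$prefix/$stamp.sql.gz" || exit 1
  # Server-side copy so the recovery command can always read latest.sql.gz.
  aws s3 cp "$prefix/$stamp.sql.gz" "$prefix/latest.sql.gz"
)
```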
Recovery Procedures
- Database Recovery:
aws s3 cp s3://backups/fi-fhir/latest.sql.gz - | \
gunzip | \
psql -h $DB_HOST -U $DB_USER -d fi_fhir
- Application Recovery:
# Redeploy from Helm
helm upgrade fi-fhir deploy/helm/fi-fhir/ \
-f production-values.yaml \
--namespace fi-fhir
- DLQ Replay (after recovery):
# Replay failed events from dead letter queue
./fi-fhir workflow replay --dlq --since 24h
RTO/RPO Targets
| Scenario | RTO | RPO |
|---|---|---|
| Pod failure | 30s | 0 |
| Node failure | 5m | 0 |
| Database failure | 15m | 5m |
| Full cluster failure | 1h | 15m |
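An RPO is only met if the newest backup is younger than the stated window. The sketch below compares a backup's age against an RPO in minutes. The function name is hypothetical, and the caller is assumed to supply both timestamps as epoch seconds.

```shell
# Succeed when the newest backup is no older than the RPO.
# Arguments: <last-backup-epoch-seconds> <now-epoch-seconds> <rpo-minutes>
within_rpo() {
  age_seconds=$(( $2 - $1 ))
  [ "$age_seconds" -le $(( $3 * 60 )) ]
}
```

For the full cluster failure row, a monitoring job might run `within_rpo "$last_backup" "$(date +%s)" 15` and alert when it fails.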
Security Hardening Checklist
Pre-Deployment
- Vulnerability scan passed (no CRITICAL/HIGH)
- Image signed with cosign
- Secrets stored in Vault/External Secrets
- Network policies applied
- RBAC configured (principle of least privilege)
- TLS certificates provisioned
- Audit logging enabled
Post-Deployment
- Health checks passing (/health, /ready)
- Metrics being scraped
- Alerts configured and tested
- Backup procedures tested
- Runbook reviewed by operations team
- Incident response plan documented
Periodic Review
- Quarterly: Rotate secrets and certificates
- Monthly: Review audit logs for anomalies
- Weekly: Check vulnerability scan results
- Daily: Monitor alert dashboards