Healthcare · 4 months · Completed

fi-fhir: Format-Agnostic Healthcare Integration

A case study in healthcare integration: how Source Profiles, a three-phase parsing pipeline, and a workflow DSL turn messy legacy formats into semantic events.

January 13, 2026 · 6 min read · Client: Internal Library

Formats Supported: 4+ (HL7v2, CSV, EDI X12, CDA)
Parser Coverage: 80%+ (core library test coverage)
Integration Time: -60% (new feed onboarding)

Tech Stack

Backend: Go
Infrastructure: YAML, CEL, FHIR R4, HL7v2
Monitoring: Prometheus, OpenTelemetry

Overview

I've spent years debugging healthcare integrations: parsing HL7v2 messages, mapping local codes to LOINC, and explaining to teams why their "standards-compliant" feed is still breaking production. The pattern is always the same: the spec says one thing, the real world says another.

fi-fhir is my attempt to encode that experience into software. It's a format-agnostic healthcare integration library that transforms legacy formats (HL7v2, CSV, EDI X12, CDA) into semantic events and routes them through configurable workflows. But the real insight isn't the parsing. It's the abstraction layer.

Key Insights:

  • Source Profiles are the unit of scalability, not "HL7v2 support." Each interface/feed gets its own config for tolerance, mapping, and event classification.
  • Three-phase parsing pipeline: byte normalization → syntactic parse → semantic extraction. Each phase is governed by the profile.
  • "Warnings over errors" because healthcare data is messy. Don't fail on recoverable issues.
  • Workflow DSL abstracts format-specific parsing from business logic. CEL expressions enable complex routing without code.
  • Production reliability by default: retry with exponential backoff, circuit breakers, dead letter queues, rate limiting.

The Challenge

Industry Context: Legacy Systems Meet Modern Mandates

Healthcare integration is changing fast. By 2025, an estimated 90% of health systems had adopted FHIR APIs. The healthcare IT integration market is projected to reach $5.8 billion this year. Regulations like the 21st Century Cures Act and TEFCA are making interoperability mandatory, not optional.

But here's the catch: most healthcare organizations still run on HL7v2. It's battle-tested, deeply embedded in EHR workflows, and isn't going anywhere soon. The challenge is bridging legacy HL7v2 feeds with modern FHIR-based systems while handling the messy reality that every interface is different.

Common Integration Challenges

  • Legacy system compatibility: Older EHRs use outdated data formats and protocols
  • Version management: Juggling v2.3, v2.4, v2.5.1 across different feeds
  • Customization complexity: Extensive custom coding for each interface
  • Semantic inconsistency: The same message type means different things to different systems
  • Technical debt: Maintaining traditional interfaces diverts resources from innovation

The Problem: Every Feed Is Different

When teams say "we support HL7v2," they usually mean "we can parse a well-formed 2.5.1 ADT^A01 message." That's necessary but insufficient.

In practice, every interface has quirks:

Reality | Example
Version drift | Feed claims 2.5.1 but sends v2.3 data types
Missing segments | PV1 is "required" but a clinic omits it
Z-segments | Every Epic feed has ZPD, ZVN, ZIN (none documented the same way)
Line endings | Spec says \r; you'll receive \r\n, \n, or mixed
Delimiters | MSH-2 is usually ^~\&, until it's !~\$
Event semantics | A01 means "admit"… or "register outpatient"… depending on PV1-2

Building parsers that handle all these variations in code is possible, but it doesn't scale. Every new integration means new special-case logic.

The Approach

The Core Insight: Source Profiles

The shift that made everything click: moving the unit of abstraction from format to feed.

A Source Profile is a YAML configuration that owns:

  • HL7 version expectations and tolerated drift
  • Parsing tolerance (missing segments, extra components, non-standard delimiters)
  • Z-segment extraction and mapping rules
  • Identifier normalization and validation rules
  • Terminology mapping (LOCAL → LOINC, SNOMED, ICD-10)
  • Event classification heuristics (A01 → inpatient vs outpatient)

Here's what that looks like:

source_profile:
  id: epic_adt_hosp_a
  name: 'Epic ADT Feed - Hospital A'
  version: '1.0.0'

  hl7v2:
    default_version: '2.5.1'
    timezone: 'America/New_York'
    tolerate:
      missing_segments: ['PV1', 'PD1']
      nte_anywhere: true
      extra_components: true
      non_standard_delimiters: true

    event_classification:
      adt_a01:
        rules:
          - condition: "PV1.2 == 'I'"
            event: 'inpatient_admit'
          - condition: "PV1.2 == 'O'"
            event: 'outpatient_registration'

  z_segments:
    preserve_raw: true
    mappings:
      ZPD:
        - field: 1
          target: patient.extensions.vip_flag
          type: boolean

  identifiers:
    assigning_authority_map:
      'HOSP_A': 'urn:oid:1.2.3.4.5.6.7'
      'SSA': 'urn:oid:2.16.840.1.113883.4.1'
    validation:
      npi: { enabled: true, on_invalid: 'warn' }
      mbi: { enabled: true, on_invalid: 'warn' }

  terminology:
    mappings:
      - source_system: 'LOCAL_LAB'
        target_system: 'http://loinc.org'
        file: './mappings/hosp_a_local_to_loinc.csv'

When a new hospital comes online, I create a new profile, not new code. The parsing logic is stable; only the configuration changes.
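On the Go side, the tolerate block of a profile maps naturally onto a small struct. This is an illustrative sketch; the actual fi-fhir type names and fields are assumptions, not the library's API:

```go
package main

import "fmt"

// Tolerance mirrors the hl7v2.tolerate block of a Source Profile.
type Tolerance struct {
	MissingSegments       []string `yaml:"missing_segments"`
	NTEAnywhere           bool     `yaml:"nte_anywhere"`
	ExtraComponents       bool     `yaml:"extra_components"`
	NonStandardDelimiters bool     `yaml:"non_standard_delimiters"`
}

// SourceProfile is a trimmed view of the YAML shown above.
type SourceProfile struct {
	ID       string    `yaml:"id"`
	Tolerate Tolerance `yaml:"tolerate"`
}

// ToleratesMissing reports whether a missing segment is acceptable for
// this feed, i.e. should produce a warning instead of a hard error.
func (p SourceProfile) ToleratesMissing(segment string) bool {
	for _, s := range p.Tolerate.MissingSegments {
		if s == segment {
			return true
		}
	}
	return false
}

func main() {
	p := SourceProfile{
		ID:       "epic_adt_hosp_a",
		Tolerate: Tolerance{MissingSegments: []string{"PV1", "PD1"}},
	}
	fmt.Println(p.ToleratesMissing("PV1")) // tolerated: warn and continue
	fmt.Println(p.ToleratesMissing("MSH")) // not tolerated: hard error
}
```

The parser consults the profile at every decision point, so the same binary serves a strict feed and a tolerant one without a code change.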

Three-Phase Parsing Pipeline

The profile governs a three-phase pipeline:

Phase 1: Byte Normalization

  • Normalize line endings (\r\n, \n, or mixed → \r)
  • Detect character set (BOM, MSH-18) and decode to UTF-8
  • Preserve original bytes for audit/replay
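The line-ending step is the simplest of the three; a minimal sketch (illustrative, not the library's actual code):

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeLineEndings rewrites \r\n and bare \n to the \r segment
// terminator that the HL7v2 spec mandates, so later phases only ever
// see one terminator regardless of what the sender emitted.
func normalizeLineEndings(raw string) string {
	raw = strings.ReplaceAll(raw, "\r\n", "\r")
	return strings.ReplaceAll(raw, "\n", "\r")
}

func main() {
	msg := "MSH|^~\\&|SENDER\nPID|1\r\nPV1|1" // mixed terminators
	fmt.Printf("%q\n", normalizeLineEndings(msg))
}
```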

Phase 2: Syntactic Parse

  • Detect field separator and encoding characters from MSH-1/MSH-2
  • Handle non-standard delimiters if profile allows
  • Parse repetitions (~), components (^), subcomponents (&)
  • Process escape sequences (\F\, \S\, \T\, \R\, \E\, \X..\)
  • Preserve unknown segments (Z-segments, vendor extensions)
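Delimiter detection is what makes the non-standard cases survivable: read the separators from the wire bytes rather than assuming |^~\&. A sketch of the idea, with names of my own choosing:

```go
package main

import (
	"errors"
	"fmt"
)

// Delimiters holds the separators declared in MSH-1 and MSH-2.
type Delimiters struct {
	Field, Component, Repetition, Escape, Subcomponent byte
}

// detectDelimiters reads the field separator (byte 3) and the four
// encoding characters (bytes 4-7) directly from the message, which is
// what lets a profile tolerate delimiters like !~\$.
func detectDelimiters(msg []byte) (Delimiters, error) {
	if len(msg) < 8 || string(msg[:3]) != "MSH" {
		return Delimiters{}, errors.New("message does not start with MSH")
	}
	return Delimiters{
		Field:        msg[3],
		Component:    msg[4],
		Repetition:   msg[5],
		Escape:       msg[6],
		Subcomponent: msg[7],
	}, nil
}

func main() {
	d, _ := detectDelimiters([]byte(`MSH|^~\&|EPIC|HOSP_A`))
	fmt.Printf("field=%c component=%c\n", d.Field, d.Component)
}
```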

Phase 3: Semantic Extraction

  • Classify event type using profile rules (A01 → inpatient_admit)
  • Extract identifiers and apply normalization/validation
  • Map terminology using profile mappings
  • Emit canonical semantic events with full provenance

Each phase is governed by the Source Profile. A profile can be strict (fail on missing PV1) or tolerant (emit a warning and continue). The parser doesn't decide; the profile does.
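To make that concrete, here is a simplified version of profile-driven event classification. The real conditions are CEL expressions; this sketch reduces them to literal field matches so it stays self-contained:

```go
package main

import "fmt"

// Rule is a simplified event_classification entry from a profile:
// match a field path against a literal value, yield a semantic event.
type Rule struct {
	Field string // e.g. "PV1.2"
	Value string // e.g. "I"
	Event string // e.g. "inpatient_admit"
}

// classify walks the profile's rules in order and returns the first
// matching event; the fallback keeps unclassified messages flowing
// (warnings over errors) instead of failing the pipeline.
func classify(fields map[string]string, rules []Rule, fallback string) string {
	for _, r := range rules {
		if fields[r.Field] == r.Value {
			return r.Event
		}
	}
	return fallback
}

func main() {
	rules := []Rule{
		{Field: "PV1.2", Value: "I", Event: "inpatient_admit"},
		{Field: "PV1.2", Value: "O", Event: "outpatient_registration"},
	}
	msg := map[string]string{"PV1.2": "I"}
	fmt.Println(classify(msg, rules, "adt_a01_unclassified"))
}
```

Swapping the rule set swaps the semantics, which is exactly the point: the same A01 parser serves feeds that disagree about what A01 means.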

Three-phase parsing pipeline diagram showing byte normalization, syntactic parsing, and semantic extraction, all governed by Source Profiles.
Figure 1. Each parsing phase is governed by the Source Profile. Parse what you can; warn on the rest.

Workflow DSL: Routing Without Code

Once messages become semantic events, routing becomes configuration:

workflow:
  name: adt_routing
  version: '1.0'
  routes:
    - name: admits_to_fhir
      filter:
        event_type: [patient_admit, inpatient_admit]
        condition: event.patient.age >= 65
      transforms:
        - redact: patient.ssn
        - map_terminology: patient.race
      actions:
        - type: fhir
          endpoint: https://fhir.hospital.org/r4
          resource: Patient
          auth:
            type: oauth2
            tokenUrl: https://auth.hospital.org/token
            clientId: ${CLIENT_ID}
            clientSecret: ${CLIENT_SECRET}

    - name: critical_labs_to_alert
      filter:
        event_type: lab_result
        condition: event.observation.interpretation in ["critical", "HH", "LL"]
      actions:
        - type: webhook
          url: https://alerts.hospital.org/critical
          method: POST
        - type: log
          level: warn
          message: 'Critical lab: {{.Observation.Code}} for {{.Patient.MRN}}'

The workflow engine supports:

  • Filters: event type, source system, CEL expressions for complex conditions
  • Transforms: set_field, map_terminology, redact (PHI masking)
  • Actions: FHIR (with OAuth2), webhook, database (PostgreSQL/MySQL/SQLite), message queue (Kafka), logging

CEL (Common Expression Language) makes this work. Instead of writing routing code, you write expressions like event.patient.age >= 65 && event.encounter.class == "inpatient". The engine evaluates them, caches compiled expressions, and routes efficiently.
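The caching pattern itself can be sketched with the stdlib alone. Here compile is a stand-in for the real CEL compiler (it only understands one equality form), but the memoization structure is the interesting part:

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// Event is a flattened view of a semantic event's fields.
type Event map[string]any

// Predicate is a compiled filter condition.
type Predicate func(Event) bool

// cache memoizes compiled conditions so each route's expression is
// compiled once and reused for every message that flows through.
var cache sync.Map // expression string -> Predicate

// compile stands in for a real CEL compiler: it understands only
// `key == "value"` equality so this sketch stays stdlib-only.
func compile(expr string) Predicate {
	parts := strings.SplitN(expr, " == ", 2)
	if len(parts) != 2 {
		return func(Event) bool { return false }
	}
	key, want := parts[0], strings.Trim(parts[1], `"`)
	return func(e Event) bool { return fmt.Sprint(e[key]) == want }
}

// eval fetches (or compiles and stores) the predicate, then applies it.
func eval(expr string, e Event) bool {
	p, ok := cache.Load(expr)
	if !ok {
		p, _ = cache.LoadOrStore(expr, compile(expr))
	}
	return p.(Predicate)(e)
}

func main() {
	e := Event{"encounter.class": "inpatient"}
	fmt.Println(eval(`encounter.class == "inpatient"`, e)) // true
	fmt.Println(eval(`encounter.class == "emergency"`, e)) // false
}
```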

Implementation Details

Production Reliability

Healthcare integrations can't drop messages. fi-fhir builds reliability into the workflow engine:

Pattern | What It Does
Retry | Exponential backoff with configurable max attempts
Circuit Breaker | Stop hammering a failing downstream service
Dead Letter Queue | Park failed events for investigation and replay
Rate Limiting | Token bucket to avoid overwhelming receivers
OAuth Token Refresh | Automatic refresh with 401 retry

Observability is integrated from the start:

  • Prometheus metrics (workflow_events_processed_total, workflow_action_duration_seconds)
  • OpenTelemetry distributed tracing
  • Structured logging with trace ID correlation
  • Grafana dashboard templates

Key Design Decisions

"Warnings over errors." Healthcare data is messy. A missing PV1 segment shouldn't crash your pipeline if the profile says it's tolerable. The parser emits ParseWarning objects that can be logged, alerted on, or fed into quality metrics, but processing continues.

Identifier-first design. PID-3 (patient identifiers) almost always repeats: MRN, SSN, MBI, insurance ID. I made IdentifierSet a first-class type with validation (NPI/MBI/SSN checksums), normalization (strip dashes, uppercase), and priority selection (which ID is "primary"?).
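The NPI checksum mentioned above is a Luhn check over the 10-digit identifier with the industry prefix 80840 prepended. A self-contained sketch of that validation (my own implementation, not fi-fhir's code):

```go
package main

import "fmt"

// ValidNPI checks a 10-digit NPI by running the Luhn algorithm over
// the identifier prefixed with the constant 80840 (the card-issuer
// prefix assigned to NPIs); the last digit is the check digit.
func ValidNPI(npi string) bool {
	if len(npi) != 10 {
		return false
	}
	full := "80840" + npi // 15 digits including the check digit
	sum := 0
	double := false // rightmost (check) digit is not doubled
	for i := len(full) - 1; i >= 0; i-- {
		d := int(full[i] - '0')
		if d < 0 || d > 9 {
			return false
		}
		if double {
			d *= 2
			if d > 9 {
				d -= 9
			}
		}
		sum += d
		double = !double
	}
	return sum%10 == 0
}

func main() {
	fmt.Println(ValidNPI("1234567893")) // true: check digit 3 is correct
	fmt.Println(ValidNPI("1234567890")) // false: wrong check digit
}
```

Per the profile config above, an invalid NPI yields a warning (`on_invalid: 'warn'`) rather than a rejected message.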

Profile-driven, not hardcoded. Event classification (is A01 an inpatient admit or outpatient registration?) depends on the source system. Profile rules like condition: "PV1.2 == 'I'" make this configurable per feed.

Go for the core. Performance matters for high-volume feeds. Single binary deployment simplifies operations. Strong typing catches mistakes at compile time. Minimal external dependencies (stdlib + YAML + CEL).

Results

Before: Integration Pain Points

Issue | Impact
Per-feed custom code | 2-3 weeks per new integration
Format-specific parsers | Duplicated logic across feeds
Hardcoded routing | Code changes for workflow updates
Missing observability | Blind spots in production pipelines

After: Measurable Improvements

Metric | Result
New feed onboarding | 60% faster (config only)
Parser test coverage | 80%+ core library
Workflow changes | No code deployment needed
Production visibility | Full tracing + metrics

Lessons Learned

What I'd Do Differently

Profile inference. Currently, you write profiles by hand based on sample messages and interface documentation. A fi-fhir profiles infer ./samples/*.hl7 command that generates a draft profile from message samples would accelerate onboarding.

Vendor profile templates. Epic, Cerner, and Meditech all have semi-predictable patterns for Z-segments and event semantics. Shipping default profiles for common EHRs would reduce boilerplate.

Earlier CLI test coverage. The core parser and workflow engine have solid coverage (80%+). The CLI has gaps because it touches external services (FHIR servers, databases). I should have stubbed external dependencies earlier.

Conclusion

  1. Think in feeds, not formats. "HL7v2 support" is necessary but not sufficient. The real abstraction is the Source Profile.

  2. Configuration over code. Every integration decision that varies per feed belongs in a profile, not in parsing logic.

  3. Build tolerance into the system. Healthcare data is messy. Design for warnings, not failures. Quarantine bad data; don't crash.

  4. Decouple format from workflow. Semantic events (patient_admit, lab_result) let you route messages without caring whether they came from HL7v2, EDI, or FHIR.

  5. Reliability is a feature, not an afterthought. Retry, circuit breaker, DLQ, and observability should be in the architecture from day one.

The full library is at libs/fi-fhir with documentation covering the workflow DSL, FHIR output mappings, and production hardening.
