
Core Concepts

This document explains the fundamental concepts behind fi-fhir's design and architecture.

Philosophy: Profile-Driven Normalization

The key insight behind fi-fhir is that the unit of scalability is the Source Profile, not "HL7v2 support".

In traditional healthcare integration:

  • You build a monolithic "HL7v2 parser"
  • Every feed requires code changes for edge cases
  • Tolerance rules are scattered across the codebase

In fi-fhir:

  • Each interface/feed gets its own Source Profile
  • Parsing behavior is driven by configuration
  • Adding a new feed means creating a new profile, not writing code

Parsing Pipeline

fi-fhir processes messages through a three-phase pipeline:

┌─────────────────────┐    ┌─────────────────────┐    ┌─────────────────────┐
│ Phase 1             │    │ Phase 2             │    │ Phase 3             │
│ Byte Normalization  │───>│ Syntactic Parsing   │───>│ Semantic Extraction │
└─────────────────────┘    └─────────────────────┘    └─────────────────────┘
        │                          │                          │
        v                          v                          v
  Raw bytes               Parsed segments           Canonical events
  (UTF-8, line           (fields, components,      (patient_admit,
   endings, BOM)          escape sequences)         lab_result, etc.)

Phase 1: Byte Normalization

Input: Raw bytes from the source system
Output: Normalized UTF-8 string

Operations:

  • BOM (Byte Order Mark) detection and handling
  • Character encoding conversion (ISO-8859-1 → UTF-8)
  • Line ending normalization (CRLF/CR → LF)
  • Trailing whitespace handling

Configuration (in Source Profile):

encoding:
  charset: UTF-8
  lineEnding: auto
  bomHandling: strip
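The Phase 1 operations can be sketched with the Go standard library alone. This is a minimal illustration, assuming a hypothetical `EncodingConfig` struct that mirrors the YAML keys; `lineEnding: auto` is treated as "always normalize to LF", and trailing-whitespace handling is not shown. ISO-8859-1 conversion is trivial because every Latin-1 byte value equals its Unicode code point.

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
	"unicode/utf8"
)

// EncodingConfig is a hypothetical Go form of the profile's `encoding:` block.
// lineEnding is fixed to "auto" behavior in this sketch.
type EncodingConfig struct {
	Charset     string // "UTF-8" or "ISO-8859-1" in this sketch
	BOMHandling string // "strip" removes a leading UTF-8 BOM
}

var utf8BOM = []byte{0xEF, 0xBB, 0xBF}

// latin1ToUTF8 converts ISO-8859-1 bytes to a UTF-8 string. Each Latin-1
// byte value equals its Unicode code point, so bytes map 1:1 to runes.
func latin1ToUTF8(b []byte) string {
	var sb strings.Builder
	for _, c := range b {
		sb.WriteRune(rune(c))
	}
	return sb.String()
}

// NormalizeBytes applies the Phase 1 steps: BOM handling, charset
// conversion to UTF-8, and line-ending normalization to LF.
func NormalizeBytes(raw []byte, cfg EncodingConfig) (string, error) {
	if cfg.BOMHandling == "strip" {
		raw = bytes.TrimPrefix(raw, utf8BOM)
	}
	var text string
	switch strings.ToUpper(cfg.Charset) {
	case "ISO-8859-1":
		text = latin1ToUTF8(raw)
	case "UTF-8", "":
		if !utf8.Valid(raw) {
			return "", fmt.Errorf("input is not valid UTF-8")
		}
		text = string(raw)
	default:
		return "", fmt.Errorf("unsupported charset %q", cfg.Charset)
	}
	// Replace CRLF before lone CR so a CRLF pair yields one LF, not two.
	text = strings.ReplaceAll(text, "\r\n", "\n")
	text = strings.ReplaceAll(text, "\r", "\n")
	return text, nil
}

func main() {
	raw := []byte("\xEF\xBB\xBFMSH|^~\\&|EPIC\rPID|1||12345\r\n")
	text, err := NormalizeBytes(raw, EncodingConfig{Charset: "UTF-8", BOMHandling: "strip"})
	fmt.Printf("%q %v\n", text, err)
}
```

Rejecting invalid UTF-8 here is one possible policy; a tolerant profile might instead replace bad bytes and record a warning.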

Phase 2: Syntactic Parsing

Input: Normalized string
Output: Parsed message structure

Operations:

  • Field separator extraction from MSH.1
  • Encoding characters from MSH.2
  • Segment splitting
  • Field/component/subcomponent splitting
  • Escape sequence handling (\H\, \N\, \.br\)

Configuration:

syntax:
  hl7Version: '2.5'
  fieldSeparator: '|'
  encodingChars: "^~\\&"
  strictMode: false
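The Phase 2 steps can be illustrated as follows. This is a minimal Go sketch, not fi-fhir's parser: `Delimiters`, `ReadDelimiters`, `Unescape`, and `Parse` are hypothetical names. It reads MSH-1 and MSH-2, splits on the declared separators, and only then unescapes each value, so an escaped delimiter such as `\S\` is never mistaken for a real one. Repetition and subcomponent splitting are omitted for brevity.

```go
package main

import (
	"fmt"
	"strings"
)

// Delimiters holds the separators declared in MSH-1 and MSH-2.
type Delimiters struct {
	Field, Component, Repetition, Escape, Subcomponent byte
}

// ReadDelimiters takes MSH-1 from the 4th character and MSH-2 from the
// next four characters of a message starting "MSH|^~\&".
func ReadDelimiters(msg string) (Delimiters, error) {
	if len(msg) < 8 || !strings.HasPrefix(msg, "MSH") {
		return Delimiters{}, fmt.Errorf("message does not start with an MSH header")
	}
	enc := msg[4:8]
	return Delimiters{Field: msg[3], Component: enc[0], Repetition: enc[1],
		Escape: enc[2], Subcomponent: enc[3]}, nil
}

// Unescape resolves HL7 escape sequences in one value: delimiter escapes
// (\F\ \S\ \R\ \T\ \E\), \.br\ as a newline, and \H\ / \N\ dropped.
func Unescape(s string, d Delimiters) string {
	e := string(d.Escape)
	return strings.NewReplacer(
		e+"F"+e, string(d.Field),
		e+"S"+e, string(d.Component),
		e+"R"+e, string(d.Repetition),
		e+"T"+e, string(d.Subcomponent),
		e+"E"+e, e,
		e+".br"+e, "\n",
		e+"H"+e, "",
		e+"N"+e, "",
	).Replace(s)
}

// Parse splits LF-separated text into segments -> fields -> components.
// Indexing: for MSH, fields[n-1] is MSH-n; for other segments, fields[n] is field n.
func Parse(text string) ([][][]string, error) {
	d, err := ReadDelimiters(text)
	if err != nil {
		return nil, err
	}
	var segments [][][]string
	for _, line := range strings.Split(text, "\n") {
		if line == "" {
			continue
		}
		var fields [][]string
		for i, f := range strings.Split(line, string(d.Field)) {
			if i == 1 && strings.HasPrefix(line, "MSH") {
				fields = append(fields, []string{f}) // MSH-2 is taken literally
				continue
			}
			comps := strings.Split(f, string(d.Component))
			for j := range comps {
				comps[j] = Unescape(comps[j], d)
			}
			fields = append(fields, comps)
		}
		segments = append(segments, fields)
	}
	return segments, nil
}

func main() {
	msg := "MSH|^~\\&|EPIC|FAC|||20240115120000||ADT^A01|MSG12345|P|2.5\nPID|1||12345^^^EPIC^MR||DOE^JANE\\S\\Q"
	segs, err := Parse(msg)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(segs[0][8], segs[1][3][0], segs[1][5][1])
}
```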

Phase 3: Semantic Extraction

Input: Parsed message structure
Output: Canonical semantic events

Operations:

  • Message type classification (ADT^A01 → patient_admit)
  • Identifier extraction (MRN, SSN, NPI)
  • Field mapping to canonical model
  • Terminology normalization

Configuration:

semantics:
  messageTypes: [ADT, ORU]
  patientIdentifiers:
    - source_field: PID.3.1
      assigning_authority: EPIC
      identifier_type: MRN
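Message classification and identifier extraction can both be table-driven. The Go sketch below is illustrative only: `Classify` and `ExtractIdentifier` are hypothetical names, default encoding characters are assumed, and only the `ADT^A01 → patient_admit` row comes from this page. The other rows are plausible guesses based on standard HL7 v2 trigger events (A02 transfer, A03 discharge, A08 update, A40 merge, R01 observation result).

```go
package main

import (
	"fmt"
	"strings"
)

// eventTypes is a hypothetical classification table keyed by the first two
// MSH-9 components (message code ^ trigger event). Only ADT^A01 ->
// patient_admit comes from these docs; the other rows are illustrative.
var eventTypes = map[string]string{
	"ADT^A01": "patient_admit",
	"ADT^A02": "patient_transfer",
	"ADT^A03": "patient_discharge",
	"ADT^A08": "patient_update",
	"ADT^A40": "patient_merge",
	"ORU^R01": "lab_result",
}

// Classify maps an MSH-9 value such as "ADT^A01^ADT_A01" to an event type,
// ignoring the optional third component (message structure).
func Classify(msh9 string) (string, error) {
	parts := strings.SplitN(msh9, "^", 3)
	if len(parts) < 2 {
		return "", fmt.Errorf("MSH-9 %q has no trigger event", msh9)
	}
	key := parts[0] + "^" + parts[1]
	if t, ok := eventTypes[key]; ok {
		return t, nil
	}
	return "", fmt.Errorf("unsupported message type %s", key)
}

// ExtractIdentifier scans PID-3 repetitions (CX: ID ^ check digit ^
// check-digit scheme ^ assigning authority ^ identifier type code) and
// returns CX.1 from the first repetition whose authority matches.
func ExtractIdentifier(pid3, authority string) (string, bool) {
	for _, rep := range strings.Split(pid3, "~") {
		cx := strings.Split(rep, "^")
		if len(cx) < 4 || cx[0] == "" {
			continue
		}
		// CX.4 is itself composite (namespace & universal ID & type); compare its namespace.
		if strings.SplitN(cx[3], "&", 2)[0] == authority {
			return cx[0], true
		}
	}
	return "", false
}

func main() {
	t, err := Classify("ADT^A01^ADT_A01")
	fmt.Println(t, err)
	mrn, ok := ExtractIdentifier("999999999^^^SSA^SS~12345^^^EPIC^MR", "EPIC")
	fmt.Println(mrn, ok)
}
```

Matching on the assigning authority rather than the repetition position is what lets one profile pick the EPIC MRN even when a feed orders PID-3 repetitions differently.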

Canonical Event Model

All input formats map to a common set of semantic events. This decouples:

  • Input parsing from business logic
  • Workflow routing from format specifics
  • FHIR generation from source systems

Event Types

| Category | Event Types |
|---|---|
| Patient | patient_admit, patient_discharge, patient_transfer, patient_update, patient_merge |
| Scheduling | appointment_scheduled, appointment_cancelled, appointment_rescheduled, appointment_noshow, appointment_checked_in |
| Lab/Clinical | lab_result, lab_ordered, lab_cancelled, vital_sign, condition, procedure, immunization |
| Claims | claim_submitted, claim_adjudicated, prior_auth_request, prior_auth_response |
| Documents | document, document_addendum, document_replacement, document_status_change |
| Financial | financial_transaction |

Event Structure

Every event includes:

{
  "meta": {
    "id": "evt_unique_id",
    "type": "patient_admit",
    "source": "epic_adt",
    "format": "HL7v2",
    "timestamp": "2024-01-15T12:00:00Z",
    "source_message_id": "MSG12345",
    "parse_warnings": []
  },
  // Event-specific fields...
}

Source Profiles

A Source Profile defines parsing behavior for a specific data feed.

Why Profiles?

Consider two hospital systems sending ADT messages:

Hospital A (Epic):

  • Uses ^ as component separator
  • MRN in PID.3.1 with assigning authority "EPIC"
  • Missing NK1 (next of kin) segments are normal

Hospital B (Cerner):

  • Uses ^ as component separator (same)
  • MRN in PID.3.1 with assigning authority "CERN"
  • NK1 segments are always present

Same message type, different behaviors. Profiles let you configure each independently.

Profile Structure

id: epic_adt
name: Epic ADT Interface
version: '1.0'

# Phase 1 config
encoding:
  charset: UTF-8
  lineEnding: auto

# Phase 2 config
hl7v2:
  default_version: '2.5'
  tolerate:
    missing_segments: [NK1, NTE]
    extra_components: true

# Phase 3 config
identifiers:
  mrn:
    assigning_authority: EPIC
    validation: required

# Event classification
event_classification:
  adt_a01:
    patient_class_values:
      I: inpatient
      O: outpatient
      E: emergency
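In memory, a profile like this might reduce to a small struct whose methods answer the questions the parser asks. The Go sketch below assumes a hypothetical `SourceProfile` type, omits YAML loading, and borrows the `IsMissingSegmentTolerated` name from the warnings example on this page. The `cerner_adt` profile is likewise hypothetical; it shows how the Epic and Cerner feeds described earlier would answer the same NK1 question differently.

```go
package main

import "fmt"

// SourceProfile is a hypothetical in-memory form of the YAML profile above.
// Only the fields used here are modeled, and YAML loading is omitted.
type SourceProfile struct {
	ID                      string
	TolerateMissingSegments []string          // hl7v2.tolerate.missing_segments
	PatientClassValues      map[string]string // PV1-2 code -> encounter class
}

// IsMissingSegmentTolerated reports whether a message lacking segmentID
// should still be accepted for this feed.
func (p SourceProfile) IsMissingSegmentTolerated(segmentID string) bool {
	for _, s := range p.TolerateMissingSegments {
		if s == segmentID {
			return true
		}
	}
	return false
}

// EncounterClass translates a PV1-2 patient class code via the profile.
func (p SourceProfile) EncounterClass(code string) (string, bool) {
	class, ok := p.PatientClassValues[code]
	return class, ok
}

func main() {
	epic := SourceProfile{
		ID:                      "epic_adt",
		TolerateMissingSegments: []string{"NK1", "NTE"},
		PatientClassValues:      map[string]string{"I": "inpatient", "O": "outpatient", "E": "emergency"},
	}
	// Hypothetical Cerner profile: NK1 is always expected, so it is not tolerated.
	cerner := SourceProfile{ID: "cerner_adt"}

	for _, p := range []SourceProfile{epic, cerner} {
		fmt.Printf("%s tolerates missing NK1: %v\n", p.ID, p.IsMissingSegmentTolerated("NK1"))
	}
	class, _ := epic.EncounterClass("I")
	fmt.Println("PV1-2 'I' ->", class)
}
```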

Workflow Engine

The workflow engine routes events through configurable pipelines.

Components

Event ─── Filter ─── Transform ─── Actions
         (match?)    (modify)      (destinations)

Filters determine which events match a route:

  • Event type matching
  • Source system matching
  • CEL (Common Expression Language) expressions for complex conditions

Transforms modify events before routing:

  • Set/update fields
  • Map terminology codes
  • Redact sensitive data

Actions send events to destinations:

  • FHIR servers
  • Webhooks
  • Databases
  • Message queues
  • Logging
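The filter, transform, and action stages can be modeled as a route that holds functions. This is a minimal Go sketch with hypothetical `Route`, `Event`, and `Dispatch` names. A plain Go predicate stands in for a CEL expression, since compiling CEL would require an external library. Each matching route works on its own copy of the event's fields, so a redaction in one route cannot leak into another.

```go
package main

import (
	"errors"
	"fmt"
)

// Event is a hypothetical flattened canonical event used for routing.
type Event struct {
	Type   string
	Source string
	Fields map[string]string
}

// Route holds one filter -> transform -> actions chain. Condition is a Go
// predicate standing in for a CEL expression.
type Route struct {
	Name       string
	EventType  string // empty matches any type
	Source     string // empty matches any source
	Condition  func(Event) bool
	Transforms []func(*Event)
	Actions    []func(Event) error
}

func (r Route) matches(e Event) bool {
	if r.EventType != "" && r.EventType != e.Type {
		return false
	}
	if r.Source != "" && r.Source != e.Source {
		return false
	}
	return r.Condition == nil || r.Condition(e)
}

// Dispatch runs every matching route on its own copy of the event and
// collects action errors instead of stopping at the first one.
func Dispatch(e Event, routes []Route) ([]string, error) {
	var matched []string
	var errs []error
	for _, r := range routes {
		if !r.matches(e) {
			continue
		}
		matched = append(matched, r.Name)
		ev := e
		ev.Fields = make(map[string]string, len(e.Fields))
		for k, v := range e.Fields {
			ev.Fields[k] = v
		}
		for _, t := range r.Transforms {
			t(&ev)
		}
		for _, a := range r.Actions {
			if err := a(ev); err != nil {
				errs = append(errs, fmt.Errorf("route %s: %w", r.Name, err))
			}
		}
	}
	return matched, errors.Join(errs...)
}

func main() {
	routes := []Route{{
		Name:       "critical_admits",
		EventType:  "patient_admit",
		Condition:  func(e Event) bool { return e.Fields["encounter.class"] == "inpatient" },
		Transforms: []func(*Event){func(e *Event) { e.Fields["ssn"] = "REDACTED" }},
		Actions: []func(Event) error{
			func(e Event) error { fmt.Println("log:", e.Type, e.Fields["ssn"]); return nil },
		},
	}}
	e := Event{Type: "patient_admit", Source: "epic_adt",
		Fields: map[string]string{"encounter.class": "inpatient", "ssn": "123-45-6789"}}
	matched, err := Dispatch(e, routes)
	fmt.Println(matched, err, e.Fields["ssn"])
}
```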

Example Workflow

workflow:
  name: adt_routing
  routes:
    - name: critical_admits
      filter:
        event_type: patient_admit
        condition: event.encounter.class == "inpatient"
      transform:
        - set_field: processed_at = now()
      actions:
        - type: fhir
          endpoint: https://fhir.hospital.com/r4
        - type: log
          message: 'Inpatient admit: {{.Patient.MRN}}'
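The `{{.Patient.MRN}}` placeholder in the log action follows Go `text/template` syntax. Assuming the message is rendered that way (an inference from the syntax, not something this page states), a minimal sketch looks like this. `admitView` and `RenderMessage` are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// admitView is a hypothetical data shape exposing .Patient.MRN to templates.
type admitView struct {
	Patient struct {
		MRN string
	}
}

// RenderMessage executes a log-action template against event data.
// Referencing a field the data does not have makes Execute return an error.
func RenderMessage(tmpl string, data any) (string, error) {
	t, err := template.New("log").Parse(tmpl)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := t.Execute(&sb, data); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	var v admitView
	v.Patient.MRN = "12345"
	msg, err := RenderMessage("Inpatient admit: {{.Patient.MRN}}", v)
	fmt.Println(msg, err)
}
```

Parsing the template once per route at load time, rather than per event, would avoid repeated parse cost. The sketch parses on every call for simplicity.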

Warnings Over Errors

Healthcare data is inherently messy. fi-fhir uses a "warnings over errors" philosophy:

  • Recoverable issues generate warnings, not failures
  • Tolerance rules determine what's acceptable
  • Warnings are recorded in event metadata for auditing

// Instead of failing on missing data, consult the profile first:
if segment == nil {
    if profile.IsMissingSegmentTolerated(segmentID) {
        addWarning("MISSING_SEGMENT", segmentID)
        return defaultValue, nil // continue processing
    }
    // Fail only if the profile does not tolerate the missing segment.
    return nil, fmt.Errorf("missing required segment %s", segmentID)
}
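A self-contained version of that logic shows warnings accumulating instead of aborting the parse. This is a sketch under stated assumptions: `Parser`, `Warning`, and `RequireSegment` are hypothetical names, segments are looked up in a simple map keyed by segment ID (which ignores repeated segments), and the `Warning` shape for `parse_warnings` entries is a guess.

```go
package main

import "fmt"

// Warning is a hypothetical shape for one meta.parse_warnings entry.
type Warning struct {
	Code   string `json:"code"`
	Detail string `json:"detail"`
}

// Parser accumulates warnings instead of aborting on recoverable issues.
type Parser struct {
	Tolerated map[string]bool // segment IDs the profile allows to be missing
	Warnings  []Warning
}

// RequireSegment returns the segment with the given ID. A missing segment
// becomes a warning when tolerated and an error otherwise.
func (p *Parser) RequireSegment(segs map[string]string, id string) (string, error) {
	if s, ok := segs[id]; ok {
		return s, nil
	}
	if p.Tolerated[id] {
		p.Warnings = append(p.Warnings, Warning{Code: "MISSING_SEGMENT", Detail: id})
		return "", nil // continue with an empty default
	}
	return "", fmt.Errorf("required segment %s is missing", id)
}

func main() {
	p := &Parser{Tolerated: map[string]bool{"NK1": true, "NTE": true}}
	segs := map[string]string{"MSH": "MSH|^~\\&|EPIC", "PID": "PID|1||12345"}
	for _, id := range []string{"PID", "NK1", "PV1"} {
		if _, err := p.RequireSegment(segs, id); err != nil {
			fmt.Println("error:", err)
		}
	}
	fmt.Printf("warnings: %+v\n", p.Warnings)
}
```

After parsing, `p.Warnings` would be copied into the event's `meta.parse_warnings` so the tolerated gap stays auditable.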

Identifier-First Design

Patient identifiers are a first-class concept:

  • IdentifierSet handles multiple identifiers (PID-3 repetitions)
  • Validators for NPI, MBI, SSN, DEA numbers
  • Assigning authority mapping
  • Original value preservation for audit

{
  "identifiers": {
    "mrn": {
      "value": "12345",
      "assigning_authority": "EPIC",
      "type": "MR"
    },
    "ssn": {
      "value": "XXX-XX-6789",
      "original": "123-45-6789",
      "redacted": true
    }
  }
}
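Two of the identifier operations are easy to show concretely. NPI validation uses the Luhn check-digit algorithm computed as if the prefix `80840` preceded the nine base digits; adding a constant 24 to the Luhn sum accounts for that prefix. The Go sketch below also masks an SSN into the `XXX-XX-6789` form from the example. `ValidNPI` and `RedactSSN` are hypothetical names, not fi-fhir's API.

```go
package main

import (
	"fmt"
	"strings"
)

// ValidNPI checks a 10-digit NPI with the Luhn algorithm. The check digit
// is defined over the prefix "80840" plus the first nine digits; adding
// the constant 24 to the sum accounts for that prefix.
func ValidNPI(npi string) bool {
	if len(npi) != 10 {
		return false
	}
	sum := 24
	for i := 0; i < 10; i++ {
		c := npi[i]
		if c < '0' || c > '9' {
			return false
		}
		d := int(c - '0')
		// Double every second digit counting from the right, starting with
		// the digit just left of the check digit (indices 8, 6, 4, 2, 0).
		if i%2 == 0 {
			d *= 2
			if d > 9 {
				d -= 9 // same as summing the two digits of d
			}
		}
		sum += d
	}
	return sum%10 == 0
}

// RedactSSN masks all but the last four digits, e.g. 123-45-6789 -> XXX-XX-6789.
func RedactSSN(ssn string) (string, error) {
	digits := strings.ReplaceAll(ssn, "-", "")
	if len(digits) != 9 || strings.Trim(digits, "0123456789") != "" {
		return "", fmt.Errorf("not a 9-digit SSN")
	}
	return "XXX-XX-" + digits[5:], nil
}

func main() {
	fmt.Println(ValidNPI("1234567893"), ValidNPI("1234567890"))
	masked, err := RedactSSN("123-45-6789")
	fmt.Println(masked, err)
}
```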

FHIR Mapping

Canonical events map to FHIR R4 resources following US Core profiles:

| Event Type | FHIR Resource(s) |
|---|---|
| patient_admit | Patient, Encounter |
| lab_result | Observation, DiagnosticReport |
| claim_submitted | Claim (Da Vinci PAS) |
| vital_sign | Observation (US Core Vital Signs) |
| document | DocumentReference |

See FHIR Output for complete mapping details.
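As a rough illustration of the `patient_admit` row, the sketch below builds minimal Patient and Encounter resources as generic JSON maps. Several assumptions apply. `BuildAdmitResources` is a hypothetical name, and the identifier `system` URI is a caller-supplied placeholder. The canonical classes (inpatient, outpatient, emergency) are mapped to the FHIR R4 `v3-ActCode` codes IMP, AMB, and EMER. The output is not validated against US Core, and the Encounter's `subject` reference to the Patient is omitted.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// actCodes maps canonical encounter classes to FHIR R4 v3-ActCode codes
// used in Encounter.class.
var actCodes = map[string]string{
	"inpatient":  "IMP",
	"outpatient": "AMB",
	"emergency":  "EMER",
}

// BuildAdmitResources produces minimal Patient and Encounter resources as
// generic JSON maps. mrnSystem is a placeholder identifier-system URI.
func BuildAdmitResources(mrn, mrnSystem, encounterClass string) (patient, encounter map[string]any, err error) {
	code, ok := actCodes[encounterClass]
	if !ok {
		return nil, nil, fmt.Errorf("no ActCode mapping for class %q", encounterClass)
	}
	patient = map[string]any{
		"resourceType": "Patient",
		"identifier": []map[string]any{{
			// v2-0203 "MR" marks the identifier as a medical record number.
			"type": map[string]any{"coding": []map[string]any{{
				"system": "http://terminology.hl7.org/CodeSystem/v2-0203",
				"code":   "MR",
			}}},
			"system": mrnSystem,
			"value":  mrn,
		}},
	}
	encounter = map[string]any{
		"resourceType": "Encounter",
		"status":       "in-progress",
		"class": map[string]any{
			"system": "http://terminology.hl7.org/CodeSystem/v3-ActCode",
			"code":   code,
		},
	}
	return patient, encounter, nil
}

func main() {
	p, e, err := BuildAdmitResources("12345", "urn:example:epic-mrn", "inpatient")
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, r := range []map[string]any{p, e} {
		b, _ := json.Marshal(r)
		fmt.Println(string(b))
	}
}
```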

Next Steps