Source Profiles Specification
This document defines the Source Profile system - the unit of scalability for fi-fhir integration adapters.
Quick Reference
| Concept | Description |
|---|---|
| Source Profile | YAML config for a single data feed (e.g., epic_adt_hosp_a.yaml) |
| Why per-feed? | Each interface has unique quirks, tolerances, and mappings |
| 3-Phase Pipeline | Byte normalization → Syntactic parse → Semantic extraction |
| Key config sections | hl7v2, z_segments, identifiers, terminology, quality |
| Implementation | pkg/profile/profile.go - Registry, loader, and config types |
Starting From a Template
When onboarding a new HL7v2 interface, start from a vendor template and then tighten it based on samples:
- Templates: `profiles/templates/hl7v2/`
- Example fixtures: `testdata/hl7v2/vendors/`
Typical loop:
- Copy a template into `profiles/<feed>.yaml` and set `source_profile.id` + `source_profile.name`.
- Run `fi-fhir profile lint profiles/<feed>.yaml --samples <sample-dir>` and iterate.
- Reduce tolerances over time (prefer fewer `missing_segments`, disable `non_standard_delimiters` when possible).
Core Technical Thesis
The unit of scalability is a Source Profile (per interface / per feed), not "HL7v2 support" in general.
A Source Profile owns:
- HL7 version expectations and tolerated drift
- Delimiter + encoding handling
- Z-segment extraction + mapping rules
- Identifier normalization and prioritization rules
- Terminology mapping rules (LOCAL → standard system)
- Event disambiguation heuristics (e.g., A01 used for outpatient registration)
This matches real-world variability: delimiter weirdness, missing segments, optionality drift, datatype evolution across versions.
Source Profile Schema
# Source Profile Definition
# Each interface/feed gets its own profile
source_profile:
id: epic_adt_feed_hosp_a
name: "Epic ADT Feed - Hospital A"
version: "1.0.0"
# ─────────────────────────────────────────────────────────────────
# HL7v2 Configuration
# ─────────────────────────────────────────────────────────────────
hl7v2:
default_version: "2.5.1"
timezone: "America/New_York"
# Encoding/Transport Layer
encoding:
charset_default: "UTF-8" # Fallback if MSH-18 missing
charset_detection: true # Attempt auto-detection
line_ending_mode: "tolerant" # "strict" (CR only) or "tolerant" (CR/LF/CRLF)
# Parsing Tolerance
tolerate:
missing_segments: ["PV1", "PD1", "OBR"] # Don't fail if absent
nte_anywhere: true # NTE can appear outside spec locations
extra_components: true # Don't fail on XCN with 23 vs 14 components
unknown_segments: true # Preserve but don't require mapping
non_standard_delimiters: true # Handle MSH-2 variations
# Datatype Version Handling
datatypes:
xcn_component_count: "flexible" # "strict" = fail if wrong count, "flexible" = parse available
cx_component_count: "flexible"
xpn_component_count: "flexible"
# Event Disambiguation
event_classification:
# A01 behavior depends on PV1-2 (Patient Class)
adt_a01:
default: "patient_admit"
rules:
- condition: "PV1.2 == 'I'"
event: "inpatient_admit"
- condition: "PV1.2 == 'O'"
event: "outpatient_registration"
- condition: "PV1.2 == 'E'"
event: "emergency_registration"
- condition: "PV1.2 == 'P'"
event: "preadmit"
- condition: "PV1.2 == 'R'"
event: "recurring_patient"
# A04 can be registration or admit depending on source
adt_a04:
default: "outpatient_registration"
# ─────────────────────────────────────────────────────────────────
# EDI/X12 Configuration (Companion Guides)
# ─────────────────────────────────────────────────────────────────
edi:
# Enable payer-specific companion guide validation.
# Values:
# - "auto" (auto-detect from ISA/GS/ST + payer loop)
# - "<guide-id>" (built-in guide ID)
# - "<path>" (guide YAML/JSON file)
companion_guide: "auto"
# Optionally load additional guide files from a directory.
# Files with .yaml/.yml/.json extensions are loaded.
companion_guide_dir: "./guides"
# ─────────────────────────────────────────────────────────────────
# Z-Segment Mappings
# ─────────────────────────────────────────────────────────────────
z_segments:
# Always extract raw Z-segments as structured data
preserve_raw: true
# Optionally map to canonical extensions
mappings:
ZPD:
- field: 1
target: patient.extensions.vip_flag
type: boolean
- field: 2
target: patient.extensions.department_code
type: string
ZVN:
- field: 1
target: encounter.extensions.visit_type_detail
type: string
- field: 2
target: encounter.extensions.expected_los_days
type: integer
ZIN:
- field: 1
target: coverage.extensions.auth_number
type: string
# ─────────────────────────────────────────────────────────────────
# Identifier Normalization
# ─────────────────────────────────────────────────────────────────
identifiers:
# Map assigning authority strings to canonical systems
assigning_authority_map:
"HOSP_A": "urn:oid:1.2.3.4.5.6.7"
"HOSP_A_MRN": "urn:oid:1.2.3.4.5.6.7"
"ENTERPRISE": "urn:oid:1.2.3.4.5"
"SSA": "urn:oid:2.16.840.1.113883.4.1"
"CMS": "urn:oid:2.16.840.1.113883.4.927"
"BCBS_VA": "urn:oid:2.16.840.1.113883.3.123"
# Priority order for selecting "primary" identifier
# First match wins
primary_id_preference:
- { type: "MR", assigner_contains: "HOSP_A" }
- { type: "MR", assigner_contains: "ENTERPRISE" }
- { type: "PI" }
- { type: "SS" }
# Validation rules (quarantine if invalid)
validation:
npi:
enabled: true
on_invalid: "warn" # "error" | "warn" | "pass"
dea:
enabled: false
mbi:
enabled: true
on_invalid: "warn"
ssn:
enabled: true
on_invalid: "warn"
# Normalization rules
normalization:
ssn:
strip_dashes: true
reject_patterns: ["000000000", "123456789", "111111111"]
phone:
strip_country_code: true # Remove leading 1 for US
normalize_to_digits: true
mrn:
strip_leading_zeros: false # Some systems use them semantically
uppercase: true
# ─────────────────────────────────────────────────────────────────
# Terminology Mapping
# ─────────────────────────────────────────────────────────────────
terminology:
# Global settings
strict_validation: false
unknown_code_behavior: "warn" # "error" | "warn" | "pass"
# Version expectations per code system
versions:
loinc: "2.76"
snomed: "2023-09-01" # US Edition
icd10cm: "2024" # Fiscal year
cpt: "2024" # Calendar year
# LOCAL → Standard mappings
mappings:
- source_system: "LOCAL_LAB"
target_system: "http://loinc.org"
file: "./mappings/hosp_a_local_to_loinc.csv"
- source_system: "LOCAL_DX"
target_system: "http://hl7.org/fhir/sid/icd-10-cm"
file: "./mappings/hosp_a_dx_to_icd10.csv"
- source_system: "LOCAL_PROC"
target_system: "http://www.ama-assn.org/go/cpt"
file: "./mappings/hosp_a_proc_to_cpt.csv"
# Panel expansion (optional)
panels:
expand_cbc: true # CBC panel → individual LOINC components
expand_bmp: true
expand_cmp: true
# ─────────────────────────────────────────────────────────────────
# Data Quality & Metrics
# ─────────────────────────────────────────────────────────────────
quality:
# Track these metrics per-feed
metrics:
- message_types_distribution
- segment_presence_rates
- encoding_anomalies
- identifier_validity_rates
- terminology_coverage_rates
- z_segment_frequency
# Alerting thresholds
alerts:
invalid_npi_rate:
threshold: 0.05 # Alert if >5% invalid
severity: "warning"
unknown_loinc_rate:
threshold: 0.20 # Alert if >20% unmapped
severity: "info"
missing_pv1_rate:
threshold: 0.10
severity: "warning"
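The `event_classification` rules above can be evaluated with a simple first-match scan. The sketch below is a minimal, hypothetical evaluator: it assumes conditions are limited to `<path> == '<value>'` equality checks against already-extracted field values, and the names `EventRule` and `classifyEvent` are illustrative, not part of the actual `pkg/profile` API.

```go
package main

import (
	"fmt"
	"strings"
)

// EventRule pairs a condition string with the event it selects.
type EventRule struct {
	Condition string // e.g. "PV1.2 == 'I'"
	Event     string
}

// classifyEvent walks rules in order and returns the first match,
// falling back to def. Only simple "<path> == '<value>'" equality
// conditions are supported in this sketch.
func classifyEvent(rules []EventRule, def string, fields map[string]string) string {
	for _, r := range rules {
		parts := strings.SplitN(r.Condition, "==", 2)
		if len(parts) != 2 {
			continue // unsupported condition shape; skip it
		}
		path := strings.TrimSpace(parts[0])
		want := strings.Trim(strings.TrimSpace(parts[1]), "'")
		if fields[path] == want {
			return r.Event
		}
	}
	return def
}

func main() {
	rules := []EventRule{
		{Condition: "PV1.2 == 'I'", Event: "inpatient_admit"},
		{Condition: "PV1.2 == 'O'", Event: "outpatient_registration"},
	}
	fmt.Println(classifyEvent(rules, "patient_admit", map[string]string{"PV1.2": "I"}))
	fmt.Println(classifyEvent(rules, "patient_admit", map[string]string{"PV1.2": "X"}))
}
```

Unknown PV1-2 values fall through to the configured `default`, which keeps odd feeds flowing rather than failing classification.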
3-Phase Parsing Pipeline
Source Profiles govern a 3-phase parsing pipeline:
┌─────────────────────────────────────────────────────────────────┐
│ 3-Phase Parsing Pipeline │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Phase 1: BYTE NORMALIZATION (Transport Layer) │
│ ──────────────────────────────────────────────────────────────│
│ • Normalize line endings (\r\n and \n → \r) │
│ • Detect character set (BOM + MSH-18) and decode safely │
│ • Keep original bytes for audit/replay │
│ • Output: normalized UTF-8 string + original_bytes │
│ │
│ ↓ │
│ │
│ Phase 2: SYNTACTIC PARSE (Format Layer) │
│ ──────────────────────────────────────────────────────────────│
│ • Detect field separator and encoding chars from MSH-1/MSH-2 │
│ • Handle non-standard delimiter characters │
│ • Parse repetitions (~), components (^), subcomponents (&) │
│ • Process escape sequences (\F\, \S\, \T\, \R\, \E\, \X..\) │
│ • Preserve unknown segments and out-of-order NTEs │
│ • Store raw component arrays for fidelity │
│ • Output: Message struct with segments[], raw_fields[] │
│ │
│ ↓ │
│ │
│ Phase 3: SEMANTIC EXTRACTION (Meaning Layer) │
│ ──────────────────────────────────────────────────────────────│
│ • Map message → canonical event(s) using Source Profile │
│ • Apply event classification rules (A01 → inpatient_admit) │
│ • Gracefully handle missing PV1, missing OBR, OBX-only flows │
│ • Extract Z-segments generically + apply field mappings │
│ • Normalize identifiers per profile rules │
│ • Map terminology per profile mappings │
│ • Generate parse_warnings[] for anomalies │
│ • Output: Canonical event(s) with full provenance │
│ │
└─────────────────────────────────────────────────────────────────┘
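Phase 1's "tolerant" line-ending handling can be sketched in a few lines. The function name `normalizeLineEndings` is hypothetical; the point is the order of operations: CRLF must be collapsed before bare LF, or the LF half of a CRLF pair would be converted twice.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeLineEndings implements "tolerant" mode: CRLF and bare LF
// are both folded to the CR segment terminator that HL7v2 expects.
// "strict" mode would instead reject anything but CR.
func normalizeLineEndings(raw []byte) string {
	s := string(raw)
	s = strings.ReplaceAll(s, "\r\n", "\r") // CRLF → CR first, so the LF isn't handled twice
	s = strings.ReplaceAll(s, "\n", "\r")   // bare LF → CR
	return s
}

func main() {
	msg := []byte("MSH|^~\\&|APP\r\nPID|1\nPV1|1\r")
	fmt.Printf("%q\n", normalizeLineEndings(msg))
}
```

The original bytes are kept alongside the normalized string (per the diagram above) so audit and replay always see exactly what arrived on the wire.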
Implementation Types
// SourceProfile represents configuration for a single data source/interface
type SourceProfile struct {
ID string `yaml:"id" json:"id"`
Name string `yaml:"name" json:"name"`
Version string `yaml:"version" json:"version"`
HL7v2 *HL7v2Config `yaml:"hl7v2,omitempty" json:"hl7v2,omitempty"`
ZSegments *ZSegmentConfig `yaml:"z_segments,omitempty" json:"z_segments,omitempty"`
Identifiers *IdentifierConfig `yaml:"identifiers,omitempty" json:"identifiers,omitempty"`
Terminology *TerminologyConfig `yaml:"terminology,omitempty" json:"terminology,omitempty"`
Quality *QualityConfig `yaml:"quality,omitempty" json:"quality,omitempty"`
}
// HL7v2Config governs HL7v2 parsing behavior
type HL7v2Config struct {
DefaultVersion string `yaml:"default_version" json:"default_version"`
Timezone string `yaml:"timezone" json:"timezone"`
Encoding *EncodingConfig `yaml:"encoding,omitempty" json:"encoding,omitempty"`
Tolerate *ToleranceConfig `yaml:"tolerate,omitempty" json:"tolerate,omitempty"`
Datatypes *DatatypeConfig `yaml:"datatypes,omitempty" json:"datatypes,omitempty"`
EventRules *EventRulesConfig `yaml:"event_classification,omitempty" json:"event_classification,omitempty"`
}
// ToleranceConfig defines what the parser accepts vs rejects
type ToleranceConfig struct {
MissingSegments []string `yaml:"missing_segments" json:"missing_segments"`
NTEAnywhere bool `yaml:"nte_anywhere" json:"nte_anywhere"`
ExtraComponents bool `yaml:"extra_components" json:"extra_components"`
UnknownSegments bool `yaml:"unknown_segments" json:"unknown_segments"`
NonStandardDelimiters bool `yaml:"non_standard_delimiters" json:"non_standard_delimiters"`
}
// IdentifierConfig governs ID normalization and validation
type IdentifierConfig struct {
AssigningAuthorityMap map[string]string `yaml:"assigning_authority_map" json:"assigning_authority_map"`
PrimaryIDPreference []IDPreferenceRule `yaml:"primary_id_preference" json:"primary_id_preference"`
Validation map[string]*ValidatorConfig `yaml:"validation" json:"validation"`
Normalization *NormalizationConfig `yaml:"normalization" json:"normalization"`
}
// IDPreferenceRule determines primary identifier selection
type IDPreferenceRule struct {
Type string `yaml:"type" json:"type"`
AssignerContains string `yaml:"assigner_contains,omitempty" json:"assigner_contains,omitempty"`
AssignerEquals string `yaml:"assigner_equals,omitempty" json:"assigner_equals,omitempty"`
}
// ValidatorConfig for individual identifier validators
type ValidatorConfig struct {
Enabled bool `yaml:"enabled" json:"enabled"`
OnInvalid string `yaml:"on_invalid" json:"on_invalid"` // "error", "warn", "pass"
}
// TerminologyConfig governs code system mappings
type TerminologyConfig struct {
StrictValidation bool `yaml:"strict_validation" json:"strict_validation"`
UnknownCodeBehavior string `yaml:"unknown_code_behavior" json:"unknown_code_behavior"`
Versions map[string]string `yaml:"versions" json:"versions"`
Mappings []TerminologyMapping `yaml:"mappings" json:"mappings"`
}
// TerminologyMapping defines LOCAL → Standard code mappings
type TerminologyMapping struct {
SourceSystem string `yaml:"source_system" json:"source_system"`
TargetSystem string `yaml:"target_system" json:"target_system"`
File string `yaml:"file" json:"file"`
}
// ZSegmentConfig defines Z-segment handling
type ZSegmentConfig struct {
PreserveRaw bool `yaml:"preserve_raw" json:"preserve_raw"`
Mappings map[string][]ZFieldMapping `yaml:"mappings" json:"mappings"`
}
// ZFieldMapping maps a Z-segment field to canonical extension
type ZFieldMapping struct {
Field int `yaml:"field" json:"field"`
Target string `yaml:"target" json:"target"`
Type string `yaml:"type" json:"type"` // string, boolean, integer, etc.
}
Profile Loader
// ProfileRegistry manages source profiles
type ProfileRegistry struct {
profiles map[string]*SourceProfile
mu sync.RWMutex
}
// Load loads a profile from YAML file
func (r *ProfileRegistry) Load(path string) (*SourceProfile, error) {
data, err := os.ReadFile(path)
if err != nil {
return nil, fmt.Errorf("failed to read profile: %w", err)
}
var wrapper struct {
SourceProfile *SourceProfile `yaml:"source_profile"`
}
if err := yaml.Unmarshal(data, &wrapper); err != nil {
return nil, fmt.Errorf("failed to parse profile: %w", err)
}
profile := wrapper.SourceProfile
if profile == nil {
return nil, fmt.Errorf("profile missing required top-level source_profile key")
}
if err := r.validate(profile); err != nil {
return nil, fmt.Errorf("invalid profile: %w", err)
}
r.mu.Lock()
r.profiles[profile.ID] = profile
r.mu.Unlock()
return profile, nil
}
// Get retrieves a profile by ID
func (r *ProfileRegistry) Get(id string) (*SourceProfile, bool) {
r.mu.RLock()
defer r.mu.RUnlock()
p, ok := r.profiles[id]
return p, ok
}
// Default returns a sensible default profile for unknown sources
func (r *ProfileRegistry) Default() *SourceProfile {
return &SourceProfile{
ID: "default",
Name: "Default Profile",
HL7v2: &HL7v2Config{
DefaultVersion: "2.5.1",
Tolerate: &ToleranceConfig{
MissingSegments: []string{"PV1", "PD1"},
NTEAnywhere: true,
ExtraComponents: true,
UnknownSegments: true,
NonStandardDelimiters: true,
},
},
Identifiers: &IdentifierConfig{
Validation: map[string]*ValidatorConfig{
"npi": {Enabled: true, OnInvalid: "warn"},
"mbi": {Enabled: true, OnInvalid: "warn"},
},
},
Terminology: &TerminologyConfig{
StrictValidation: false,
UnknownCodeBehavior: "warn",
},
}
}
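Once a profile is loaded, its `primary_id_preference` rules drive "first match wins" identifier selection. This is a hedged sketch: `Identifier` and `pickPrimary` are illustrative names, and only the `type` and `assigner_contains` predicates from the schema are modeled.

```go
package main

import (
	"fmt"
	"strings"
)

// Identifier is a simplified view of a parsed CX identifier.
type Identifier struct {
	Type     string // e.g. "MR", "PI", "SS"
	Assigner string // assigning authority string
	Value    string
}

// IDPreferenceRule mirrors the config type defined above.
type IDPreferenceRule struct {
	Type             string
	AssignerContains string
}

// pickPrimary returns the first identifier matched by the first
// applicable preference rule ("first match wins"), or nil.
func pickPrimary(rules []IDPreferenceRule, ids []Identifier) *Identifier {
	for _, r := range rules {
		for i := range ids {
			if ids[i].Type != r.Type {
				continue
			}
			if r.AssignerContains != "" && !strings.Contains(ids[i].Assigner, r.AssignerContains) {
				continue
			}
			return &ids[i]
		}
	}
	return nil
}

func main() {
	rules := []IDPreferenceRule{
		{Type: "MR", AssignerContains: "HOSP_A"},
		{Type: "PI"},
	}
	ids := []Identifier{
		{Type: "PI", Assigner: "ENTERPRISE", Value: "E123"},
		{Type: "MR", Assigner: "HOSP_A_MRN", Value: "M456"},
	}
	fmt.Println(pickPrimary(rules, ids).Value) // the HOSP_A MR outranks the PI
}
```

Note the outer loop is over rules, not identifiers: rule order encodes preference, so a later-listed MR still beats an earlier-listed PI.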
Profile-Aware Parser
// Parser uses a SourceProfile to govern parsing behavior
type Parser struct {
profile *SourceProfile
logger *slog.Logger
}
// NewParser creates a parser with the given profile
func NewParser(profile *SourceProfile) *Parser {
if profile == nil {
profile = DefaultProfile()
}
return &Parser{
profile: profile,
logger: slog.Default(),
}
}
// Parse executes the 3-phase pipeline
func (p *Parser) Parse(raw []byte) (*ParseResult, error) {
result := &ParseResult{
SourceProfileID: p.profile.ID,
ParseWarnings: make([]ParseWarning, 0),
OriginalBytes: raw,
}
// Phase 1: Byte Normalization
normalized, warnings := p.normalizeBytes(raw)
result.ParseWarnings = append(result.ParseWarnings, warnings...)
// Phase 2: Syntactic Parse
msg, warnings, err := p.syntacticParse(normalized)
if err != nil {
return nil, fmt.Errorf("syntactic parse failed: %w", err)
}
result.ParseWarnings = append(result.ParseWarnings, warnings...)
result.RawMessage = msg
// Phase 3: Semantic Extraction
event, warnings, err := p.semanticExtract(msg)
if err != nil {
return nil, fmt.Errorf("semantic extraction failed: %w", err)
}
result.ParseWarnings = append(result.ParseWarnings, warnings...)
result.Event = event
return result, nil
}
// ParseResult contains all outputs from the 3-phase pipeline
type ParseResult struct {
SourceProfileID string `json:"source_profile_id"`
Event interface{} `json:"event"`
RawMessage *RawMessage `json:"raw_message,omitempty"`
ParseWarnings []ParseWarning `json:"parse_warnings,omitempty"`
OriginalBytes []byte `json:"original_bytes,omitempty"`
}
// ParseWarning captures non-fatal issues during parsing
type ParseWarning struct {
Phase string `json:"phase"` // "byte", "syntactic", "semantic"
Code string `json:"code"` // e.g., "MISSING_PV1", "INVALID_NPI"
Message string `json:"message"`
Path string `json:"path,omitempty"` // e.g., "PID.3.1"
}
CLI Integration
# Parse using a specific source profile
fi-fhir parse --profile ./profiles/epic_adt.yaml message.hl7
# Parse with inline profile override
fi-fhir parse --profile ./profiles/epic_adt.yaml \
--set 'hl7v2.tolerate.missing_segments=["PV1","PV2"]' \
message.hl7
# Lint a profile (with optional sample messages)
fi-fhir profile lint ./profiles/epic_adt.yaml --samples ./samples/*.hl7
# Generate profile from sample messages
fi-fhir profile infer ./samples/*.hl7 --out ./profiles/inferred.yaml
Example Profiles
Minimal Profile (Unknown Source)
source_profile:
id: unknown_default
name: "Unknown Source - Tolerant Defaults"
version: "1.0.0"
hl7v2:
default_version: "2.5.1"
tolerate:
missing_segments: ["PV1", "PD1", "OBR"]
nte_anywhere: true
extra_components: true
unknown_segments: true
Epic ADT Feed
source_profile:
id: epic_adt_prod
name: "Epic ADT Production Feed"
version: "2.1.0"
hl7v2:
default_version: "2.5.1"
timezone: "America/New_York"
encoding:
charset_default: "UTF-8"
tolerate:
missing_segments: ["PD1"]
nte_anywhere: true
event_classification:
adt_a01:
rules:
- condition: "PV1.2 == 'I'"
event: "inpatient_admit"
- condition: "PV1.2 == 'O'"
event: "outpatient_registration"
z_segments:
preserve_raw: true
mappings:
ZPD:
- field: 1
target: patient.extensions.vip_flag
type: boolean
- field: 3
target: patient.extensions.epic_mrn_checksum
type: string
identifiers:
assigning_authority_map:
"EPIC_HOSP": "urn:oid:1.2.840.114350.1.13.123.3.7.2.696570"
"SSA": "urn:oid:2.16.840.1.113883.4.1"
primary_id_preference:
- { type: "MR", assigner_contains: "EPIC" }
- { type: "SS" }
validation:
npi: { enabled: true, on_invalid: "warn" }
terminology:
strict_validation: false
unknown_code_behavior: "warn"
Cerner Lab Interface
source_profile:
id: cerner_lab_results
name: "Cerner Lab Results Feed"
version: "1.0.0"
hl7v2:
default_version: "2.4"
timezone: "America/Chicago"
tolerate:
missing_segments: ["OBR"] # Some Cerner configs send OBX-only
terminology:
mappings:
- source_system: "CERNER_LAB"
target_system: "http://loinc.org"
file: "./mappings/cerner_lab_to_loinc.csv"
Roadmap Impact
Moving Source Profiles to MVP requires:
Weeks 1-4 (MVP-Hard Requirements) ✅
- Profile YAML schema and loader - see `pkg/profile/profile.go:Registry`
- All config types (HL7v2, ZSegments, Identifiers, Terminology, Quality)
- Default() profile with sensible tolerances
- Profile validation on load
- Delimiter + encoding handling (Phase 1)
- Tolerant missing segment handling - see `IsMissingSegmentTolerated()`
- Event classification rules - see `GetEventClassification()`
- Assigning authority mapping - see `GetAssigningAuthoritySystem()`
Weeks 5-8 (Post-Stability) ⚠️
- Terminology mapping tables - see `pkg/terminology/mapper.go`
- NPI/MBI validators wired to quality checks - see `pkg/validate/identifiers.go`
- Z-segment field mapping beyond raw capture
- Profile inference from samples (`fi-fhir profile infer`) - see `cmd/fi-fhir/main.go`
Testing Strategy
func TestSourceProfileLoading(t *testing.T) {
tests := []struct {
name string
yaml string
wantErr bool
}{
{
name: "minimal valid",
yaml: `source_profile:
id: test
name: Test
version: "1.0"`,
wantErr: false,
},
{
name: "missing id",
yaml: `source_profile:
name: Test`,
wantErr: true,
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
reg := NewProfileRegistry()
_, err := reg.LoadFromBytes([]byte(tt.yaml))
if (err != nil) != tt.wantErr {
t.Errorf("LoadFromBytes() error = %v, wantErr %v", err, tt.wantErr)
}
})
}
}
See Also
- HL7V2-QUIRKS.md - Version differences and Z-segment details
- IDENTIFIERS.md - Identifier validation rules referenced by profiles
- TERMINOLOGY.md - Code mapping files referenced by profiles
- WORKFLOW-DSL.md - Routes use events produced by profile-driven parsing