Pipeline

Phase 1, Byte Normalization: Normalize encoding and line endings.

  • Input: raw bytes (UTF-8 with CRLF)
  • Output: normalized UTF-8 string with LF

Phase 2, Syntactic Parsing: Parse the HL7v2 structure.

  • Input: normalized string
  • Output: parsed segments and fields

Phase 3, Semantic Extraction: Extract meaningful data.

  • Input: parsed message structure
  • Output: extracted identifiers and data

Phase 1: Byte Normalization

  • Detect BOM markers (if present)
  • Convert charset to UTF-8
  • Normalize line endings to LF
  • Strip trailing whitespace
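
The four steps above can be sketched in Python as follows. This is a minimal illustration, not the pipeline's actual implementation: the function name `normalize_bytes`, the `fallback_charset` parameter, and the choice to strip trailing whitespace per line as well as at the end of the message are all assumptions. Python strings are Unicode, so "convert to UTF-8" is modeled here as decoding to `str`; call `.encode("utf-8")` on the result if UTF-8 bytes are needed.

```python
import codecs

# BOM signatures, checked longest-first so a UTF-32 LE BOM is not
# mistaken for a UTF-16 LE BOM (they share the same first two bytes).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def normalize_bytes(raw: bytes, fallback_charset: str = "utf-8") -> str:
    """Hypothetical helper: return a BOM-free string with LF line endings."""
    charset = fallback_charset
    # Step 1: detect a BOM (if present) and let it decide the charset.
    for bom, name in _BOMS:
        if raw.startswith(bom):
            raw = raw[len(bom):]
            charset = name
            break
    # Step 2: decode from the source charset into Unicode text.
    text = raw.decode(charset)
    # Step 3: normalize CRLF first, then any remaining lone CR, to LF.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Step 4 (assumed scope): strip trailing whitespace on every line
    # and drop trailing empty lines.
    lines = [line.rstrip() for line in text.split("\n")]
    return "\n".join(lines).rstrip("\n")
```

Handling lone CR as well as CRLF is a deliberate choice in this sketch, since HL7v2 messages are often transmitted with CR as the segment terminator.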

Sample Message

MSH|^~\&|ADT|HOSPITAL|RECEIVER|LAB|202401151023||ADT^A01|MSG001|P|2.5.1
PID|1||MRN12345^^^HOSPITAL^MR||Smith^John^A||19850315|M
PV1|1|I|ICU^101^A|||||||||||||||VN98765^^^HOSPITAL^VN

This ADT^A01 message flows through all three phases, which transform it from raw bytes into structured FHIR resources.
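
To make Phases 2 and 3 concrete, here is a rough Python sketch that parses the sample message into segments and fields and then pulls out a few identifiers. The names `parse_segments`, `find_identifier`, and `extract` are hypothetical. The sketch assumes the input has already been normalized to LF and that each segment type occurs once, and it ignores field repetition (`~`), subcomponents (`&`), and escape sequences. Identifiers are located by their type code (the fifth component, such as `MR` or `VN`) rather than by a fixed field position.

```python
from typing import Optional

SAMPLE = (
    "MSH|^~\\&|ADT|HOSPITAL|RECEIVER|LAB|202401151023||ADT^A01|MSG001|P|2.5.1\n"
    "PID|1||MRN12345^^^HOSPITAL^MR||Smith^John^A||19850315|M\n"
    "PV1|1|I|ICU^101^A|||||||||||||||VN98765^^^HOSPITAL^VN"
)


def parse_segments(message: str) -> dict[str, list[str]]:
    """Phase 2 sketch: split an LF-normalized message into segments and fields."""
    segments: dict[str, list[str]] = {}
    for line in message.split("\n"):
        if not line:
            continue
        fields = line.split("|")
        if fields[0] == "MSH":
            # MSH-1 is the field separator itself; insert it so that
            # fields[n] lines up with MSH-n like every other segment.
            fields.insert(1, "|")
        segments[fields[0]] = fields
    return segments


def find_identifier(fields: list[str], type_code: str) -> Optional[str]:
    """Return the ID of the first field whose fifth component equals type_code."""
    for field in fields[1:]:
        components = field.split("^")
        if len(components) >= 5 and components[4] == type_code:
            return components[0]
    return None


def extract(segments: dict[str, list[str]]) -> dict[str, Optional[str]]:
    """Phase 3 sketch: pull out a few identifiers and demographic values."""
    msh, pid, pv1 = segments["MSH"], segments["PID"], segments["PV1"]
    family, given = pid[5].split("^")[:2]  # PID-5: patient name
    return {
        "message_type": msh[9],            # MSH-9
        "control_id": msh[10],             # MSH-10
        "mrn": find_identifier(pid, "MR"),
        "visit_number": find_identifier(pv1, "VN"),
        "family_name": family,
        "given_name": given,
        "birth_date": pid[7],              # PID-7
    }
```

Calling `extract(parse_segments(SAMPLE))` yields a dictionary with the message type `ADT^A01`, the MRN `MRN12345`, and the visit number `VN98765`. Mapping those values onto FHIR resources is out of scope for this sketch.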

Configuration Fields

  • encoding.charset
  • encoding.lineEnding
  • encoding.bomHandling

These fields in your Source Profile control byte normalization behavior.
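
As an illustration of how these three fields might be read and validated before Phase 1 runs, consider the Python sketch below. Only the field names come from this page. The dictionary layout of the Source Profile, the default values, and the allowed values for `lineEnding` and `bomHandling` are assumptions made for the example, so check your actual Source Profile schema before relying on them.

```python
import codecs
from dataclasses import dataclass

# Hypothetical allowed values; the real Source Profile schema may differ.
ALLOWED_LINE_ENDINGS = {"LF", "CRLF", "CR"}
ALLOWED_BOM_HANDLING = {"strip", "keep"}


@dataclass(frozen=True)
class EncodingSettings:
    charset: str        # from encoding.charset
    line_ending: str    # from encoding.lineEnding
    bom_handling: str   # from encoding.bomHandling


def load_encoding_settings(source_profile: dict) -> EncodingSettings:
    """Read the encoding.* fields from a profile dict, applying assumed defaults."""
    enc = source_profile.get("encoding", {})
    settings = EncodingSettings(
        charset=enc.get("charset", "utf-8"),
        line_ending=enc.get("lineEnding", "LF"),
        bom_handling=enc.get("bomHandling", "strip"),
    )
    # codecs.lookup raises LookupError if the charset name is unknown.
    codecs.lookup(settings.charset)
    if settings.line_ending not in ALLOWED_LINE_ENDINGS:
        raise ValueError(f"unsupported lineEnding: {settings.line_ending!r}")
    if settings.bom_handling not in ALLOWED_BOM_HANDLING:
        raise ValueError(f"unsupported bomHandling: {settings.bom_handling!r}")
    return settings
```

Validating the profile up front means a misspelled charset or an unsupported option fails at load time rather than partway through a message.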