What It Actually Takes to Turn a Patient Chart into a Structured Clinical Summary

Every structured clinical deliverable starts the same way: a clinician in the relevant specialty manually reads through a set of unstructured records. Pre-visit summaries, referral packets, patient-uploaded documents, and case reviews all require expert synthesis of fragmented data before a single word of output can be written. This process is expensive, slow, and difficult to scale.

Fourier Health automates this process end-to-end. This article describes the full pipeline, from raw document ingestion to the delivery of a structured clinical output, to illustrate the depth of automation required in practice.

Handling the Full Range of Clinical Data Formats

Fourier ingests more than 25 file types: faxed PDFs, DICOM files, raw images, transcribed audio, HL7 v2 messages, FHIR bundles, and CCDs pulled from Health Information Exchanges (HIEs). These may be faxed or sent directly to Fourier via SFTP, direct API, or manual upload. Structured data is supplemented with HIE queries triggered by patient demographics extracted via optical character recognition (OCR).
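As a rough illustration of the first routing decision an ingestion layer has to make, the sketch below guesses a coarse format from a file's leading bytes. It is not Fourier's implementation: the function name `sniff_format` and the returned labels are hypothetical, and it covers only five of the formats listed above. The signatures themselves are standard (the `%PDF-` header, the `DICM` marker after a 128-byte DICOM preamble, the `MSH|` segment that opens an HL7 v2 message, a FHIR JSON `resourceType` of `Bundle`, and the `ClinicalDocument` root element of a CCD).

```python
import json


def sniff_format(data: bytes) -> str:
    """Guess a coarse format label from the leading bytes of a file."""
    if data.startswith(b"%PDF-"):
        return "pdf"
    # DICOM Part 10 files carry a 128-byte preamble followed by b"DICM".
    if len(data) >= 132 and data[128:132] == b"DICM":
        return "dicom"
    # HL7 v2 messages begin with the MSH segment header.
    if data.startswith(b"MSH|"):
        return "hl7v2"
    text = data.lstrip()
    if text.startswith(b"{"):
        try:
            doc = json.loads(text)
        except ValueError:  # covers JSON and Unicode decoding errors
            return "unknown"
        if isinstance(doc, dict) and doc.get("resourceType") == "Bundle":
            return "fhir_bundle"
        return "unknown"
    # CCDs are XML documents whose root element is ClinicalDocument.
    if text.startswith(b"<") and b"<ClinicalDocument" in text[:4096]:
        return "ccd"
    return "unknown"
```

A production system would also need to handle transport framing, scanned images, and malformed files, none of which this sketch attempts.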

The diversity of formats for these assets is itself a nontrivial problem, and the content poses further challenges. A single incoming fax may contain records from multiple organizations for the same patient, out of chronological order, commingled with clinically irrelevant material, and sometimes even containing information from other patients with a similar name. These sound like edge cases, yet they are common enough to be routine. Fourier resolves them before any downstream processing begins.

Ingestion Edge Cases Fourier Handles Automatically

Multi-patient faxes with commingled records
Documents arriving out of chronological order
Patient name mismatches: misspellings, nicknames, maiden names
Clinically irrelevant documents embedded in record sets
Hierarchical labels valid only in the absence of higher-precedence labels
Labels spanning tens of pages, identifiable only from first and last page
Patient-uploaded files with no standard format constraints
Care-relevant documents not pertinent to the reason for visit

Classifying Every Page Before It Can Be Used

Accurate downstream retrieval and summarization depend on knowing what each page contains. Fourier's labeling system achieves F1 scores of 0.94–0.98 per label class, compared to approximately 0.70 for a standard single-prompt approach. This is accomplished through a two-stage pipeline: a segmentation pass that identifies care-setting boundaries across 100-page chunks with overlapping context windows, combined with per-page OCR classification that optionally includes context from surrounding pages. The merged output produces final per-page labels that correctly handle both simple per-page labels and complex span labels covering many pages.
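The two mechanical pieces of this design, overlapping chunking and the merge of span labels with per-page labels, can be sketched as follows. This is a minimal illustration under stated assumptions rather than Fourier's code: the function names, the 10-page overlap, the label strings, and the rule that a span label takes precedence over a per-page label are all hypothetical choices made for the example.

```python
from typing import Dict, List, Tuple


def overlapping_chunks(n_pages: int, size: int = 100,
                       overlap: int = 10) -> List[Tuple[int, int]]:
    """Return half-open (start, end) page ranges that cover the document."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    chunks, start = [], 0
    while start < n_pages:
        end = min(start + size, n_pages)
        chunks.append((start, end))
        if end == n_pages:
            break
        start = end - overlap  # re-read the tail of the previous chunk
    return chunks


def merge_labels(n_pages: int,
                 spans: List[Tuple[int, int, str]],
                 page_labels: Dict[int, str]) -> List[str]:
    """Combine span labels (first, last, label; inclusive) with per-page labels.

    Assumption for this sketch: a span label overrides the per-page label.
    """
    merged = [page_labels.get(p, "unlabeled") for p in range(n_pages)]
    for first, last, label in spans:
        for p in range(max(first, 0), min(last, n_pages - 1) + 1):
            merged[p] = label
    return merged
```

For example, `overlapping_chunks(250)` yields the ranges `(0, 100)`, `(90, 190)`, and `(180, 250)`, so a care-setting boundary that falls near a chunk edge is seen with context on both sides.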

Contrast this with the alternative of supplying full document context to every page classification. There, the entire document is re-analyzed for each page, so cost grows with the square of document length and the hallucination risk increases substantially. The two-stage approach processes each token only a constant number of times, regardless of document size. The result is a classification scheme that outperforms classification using frontier lab models:

Fig. 01 · Classification F1 Score Comparison: Fourier vs. Frontier Labs (per-page label F1, higher is better)

OpenAI GPT 5: 0.68
Claude Opus 4.6: 0.72
Fourier AI: 0.96
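To make the scaling argument for the two-stage design concrete, the sketch below counts how many tokens each strategy reads. The numbers are illustrative only: the function names, the 500 tokens per page, the 10-page overlap, and the single neighboring page of context on each side are assumptions for the example, not published Fourier parameters.

```python
def full_context_tokens(n_pages: int, tokens_per_page: int) -> int:
    """Tokens read when every page classification re-reads the whole document."""
    return n_pages * n_pages * tokens_per_page


def two_stage_tokens(n_pages: int, tokens_per_page: int, chunk: int = 100,
                     overlap: int = 10, neighbors: int = 1) -> int:
    """Tokens read by a segmentation pass plus a per-page pass with local context."""
    if not 0 <= overlap < chunk:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk")
    # Segmentation pass: sum the pages in each overlapping chunk.
    seg_pages, start = 0, 0
    while start < n_pages:
        end = min(start + chunk, n_pages)
        seg_pages += end - start
        if end == n_pages:
            break
        start = end - overlap
    # Per-page pass: each page plus up to `neighbors` pages on either side.
    local_pages = sum(
        min(n_pages - 1, p + neighbors) - max(0, p - neighbors) + 1
        for p in range(n_pages)
    )
    return (seg_pages + local_pages) * tokens_per_page
```

Under these assumptions a 250-page record costs 31,250,000 tokens with full context per page and 509,000 with the two-stage layout. Doubling the record to 500 pages quadruples the first figure but only roughly doubles the second.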

Linking Every Page to the Correct Patient Record

Fourier first extracts unique patient mentions from each page of a document, then uses an LLM to merge mentions that likely represent the same individual. A follow-up prompt evaluates the risk of each proposed merge given available context. Cases deemed potentially risky are routed to a human review queue before any downstream processing proceeds.
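The control flow of this step (propose merges, assess risk, merge or escalate) can be sketched as below. This is not Fourier's implementation: the article describes LLM prompts for both the merge proposal and the risk assessment, whereas this sketch substitutes a plain string-similarity score from Python's `difflib` and a date-of-birth check so that it runs on its own. The function names, thresholds, and record fields are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations
from typing import Dict, List, Optional, Tuple


def similarity(a: str, b: str) -> float:
    """Stand-in for the LLM merge proposal: character-level similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def link_mentions(mentions: List[Dict[str, Optional[str]]],
                  merge_at: float = 0.75,
                  safe_at: float = 0.90) -> Tuple[List[List[int]], List[Tuple[int, int]]]:
    """Group mention indices by patient; return (groups, human_review_queue)."""
    parent = list(range(len(mentions)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    review: List[Tuple[int, int]] = []
    for i, j in combinations(range(len(mentions)), 2):
        score = similarity(mentions[i]["name"], mentions[j]["name"])
        if score < merge_at:
            continue  # no merge proposed
        dob = mentions[i]["dob"]
        dob_match = dob is not None and dob == mentions[j]["dob"]
        if score >= safe_at and dob_match:
            parent[find(i)] = find(j)   # low-risk merge: apply it
        else:
            review.append((i, j))       # risky merge: escalate to a human
    groups: Dict[int, List[int]] = {}
    for i in range(len(mentions)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values()), review
```

The design point is that a proposed merge is never applied silently when the supporting context is weak. Two mentions with the same name but different dates of birth stay separate and land in the review queue.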

Locating Relevant Information Across the Full Record Set

Information retrieval accounts for approximately 85% of total manual effort in clinical summarization. Fourier automates this using a retrieval pipeline built around a clear priority: minimize false negatives first, then filter false positives. Missing clinically relevant information is a more serious failure mode than surfacing an extra result.

The pipeline runs once per output field (medications, diagnoses, surgical history, radiology, and so on) and proceeds through four stages:

Chunking and Embedding: OCR text from each document is dynamically split into chunks, with chunk size and boundaries optimized for the task, and the chunks are then embedded for search.
Hybrid Multi-Query Search: A semantic query captures general matching language; multiple lexical queries target specialty-specific terminology. Query sets are generated through iterative offline optimization using human-derived retrieval data. Page and document labels pre-filter the search space, reducing the risk of false positives.
Reranking and Prompt-Based Filtering: Retrieved chunks are scored via a reranking model. A final LLM call assesses each remaining candidate's relevance to the original query before it advances.
Bounding Box Reconstruction: Retrieved text is mapped back to precise locations in the source PDF via OCR block reconstruction, generating visual bounding boxes that clinicians can verify directly. As a result, every claim in the output is traceable to its source.
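The recall-first shape of stages two and three can be sketched as follows: pre-filter by label, take the union of semantic and lexical hits so that either signal alone is enough to keep a chunk, and only then score and filter. This is a toy under loud assumptions. Word-set overlap stands in for embedding similarity, a weighted sum stands in for the reranking model and the LLM relevance check, and every name, label, weight, and threshold is hypothetical.

```python
from typing import Dict, List, Set


def _words(text: str) -> Set[str]:
    return set(text.lower().split())


def semantic_score(query: str, text: str) -> float:
    """Stand-in for embedding similarity: Jaccard overlap of word sets."""
    q, t = _words(query), _words(text)
    return len(q & t) / len(q | t) if (q | t) else 0.0


def retrieve(chunks: List[Dict[str, str]], semantic_query: str,
             lexical_queries: List[str], allowed_labels: Set[str],
             sem_min: float = 0.1, rerank_min: float = 0.2) -> List[str]:
    """Return chunk ids ordered by descending score."""
    # 1. Page/document labels shrink the search space before any scoring.
    pool = [c for c in chunks if c["label"] in allowed_labels]
    # 2. Recall first: keep a chunk if EITHER the semantic or a lexical query hits.
    candidates = []
    for c in pool:
        sem = semantic_score(semantic_query, c["text"])
        lex = sum(term.lower() in c["text"].lower() for term in lexical_queries)
        if sem >= sem_min or lex > 0:
            candidates.append((sem + 0.5 * lex, c))
    # 3. Precision second: score, drop weak candidates, order the rest.
    kept = [(score, c) for score, c in candidates if score >= rerank_min]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [c["id"] for _, c in kept]
```

In this layout a medication line that shares no wording with the general query still survives on the strength of a drug-name lexical query, which is the false-negative-first priority described above.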

The result is a 95% reduction in human time spent on information retrieval, with 0.995 recall at the page level and 0.998 token-level recall within retrieved pages, representing a 90% reduction in false negatives compared to a standard semantic search baseline.
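The relationship between the recall figure and the false-negative reduction can be worked through with a few lines of arithmetic. The sketch below assumes the 90% reduction is measured at the page level, which the text does not state explicitly, so the implied baseline is an inference rather than a reported result. The function names are hypothetical.

```python
def miss_rate(recall: float) -> float:
    """False negative rate is the complement of recall."""
    return 1.0 - recall


def implied_baseline_recall(recall: float, fn_reduction: float) -> float:
    """Baseline recall implied by a given recall and a relative cut in misses."""
    if not 0.0 <= fn_reduction < 1.0:
        raise ValueError("fn_reduction must be in [0, 1)")
    return 1.0 - miss_rate(recall) / (1.0 - fn_reduction)


page_level = implied_baseline_recall(0.995, 0.90)
print(f"implied baseline page-level recall: {page_level:.3f}")
```

Under that assumption, a page-level recall of 0.995 corresponds to a miss rate of 0.5%, and a 90% reduction in misses implies a baseline that misses about 5% of relevant pages, a recall of roughly 0.95.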

Fig. 02 · Information retrieval false positive rate, lower is better (series: OpenAI GPT 5, Claude Opus 4.6, Fourier AI)

Fig. 03 · Information retrieval false negative rate (miss rate), lower is better (series: OpenAI GPT 5, Claude Opus 4.6, Fourier AI)

Grounding Clinical Events in the Correct Point in Time

Patient records contain many dates: fax headers, document dates, visit dates, addendum dates, and relative references like "two weeks ago." Determining which date applies to a specific clinical event is non-trivial, particularly when an event can be mentioned many pages before or after the visit date that contextualizes it.

Fourier's date extraction system first filters out irrelevant dates, then parses the relevant ones. The parsed dates are used downstream to filter retrieved information by recency and to attribute the correct date to each event in the generated output.
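One small part of this problem, turning a relative reference such as "two weeks ago" into a calendar date, can be sketched as follows. The sketch assumes the contextualizing visit date has already been identified, which is the harder task described above. The function name, the supported vocabulary, and the restriction to days and weeks are simplifications for illustration. Months and years are omitted because they do not map to a fixed number of days.

```python
import re
from datetime import date, timedelta
from typing import Optional

_NUMBER_WORDS = {"a": 1, "an": 1, "one": 1, "two": 2, "three": 3, "four": 4,
                 "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10}
_UNIT_DAYS = {"day": 1, "week": 7}
_RELATIVE = re.compile(r"\b(\d+|[a-z]+)\s+(day|week)s?\s+ago\b", re.IGNORECASE)


def resolve_relative_date(text: str, anchor: date) -> Optional[date]:
    """Resolve the first 'N days/weeks ago' phrase against an anchor date.

    Returns None when no phrase is found or the quantity is not understood.
    """
    match = _RELATIVE.search(text)
    if match is None:
        return None
    raw, unit = match.group(1).lower(), match.group(2).lower()
    count = int(raw) if raw.isdigit() else _NUMBER_WORDS.get(raw)
    if count is None:
        return None  # e.g. "several weeks ago": too vague to resolve
    return anchor - timedelta(days=count * _UNIT_DAYS[unit])
```

With a visit date of 2024-03-15, the phrase "fell two weeks ago" resolves to 2024-03-01. The same phrase anchored to a fax header date instead of the visit date would resolve to a different day, which is why choosing the anchor matters more than the parsing itself.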

Urgency Determination and Diagnosis Extraction

Fourier's clinical inference layer handles urgency determination and ICD-10 diagnosis extraction. Urgency uses a rules-based approach to identify candidates, for example lab values that exceed defined thresholds for specific SNOMED-coded entities. Diagnosis extraction similarly combines entity extraction of ICD-10 codes and natural-language descriptions with recency- and relevance-based filtering.
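A rules-based urgency check of this kind can be sketched as a lookup from coded entity to threshold. Everything specific in the example is hypothetical: the concept identifier is a placeholder rather than a real SNOMED CT code, the threshold values are illustrative numbers and not clinical guidance, and the class and function names are invented for the sketch.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ThresholdRule:
    code: str                     # placeholder id, NOT a real SNOMED CT code
    name: str
    low: Optional[float] = None   # flag values below this, if set
    high: Optional[float] = None  # flag values above this, if set


@dataclass(frozen=True)
class LabResult:
    code: str
    value: float


# Illustrative thresholds only; real rules would be clinically reviewed.
RULES: Dict[str, ThresholdRule] = {
    "EXAMPLE-POTASSIUM": ThresholdRule("EXAMPLE-POTASSIUM", "serum potassium",
                                       low=3.0, high=6.0),
}


def urgency_candidates(results: List[LabResult],
                       rules: Dict[str, ThresholdRule]) -> List[str]:
    """Return a human-readable reason for each result that breaks a rule."""
    flagged = []
    for result in results:
        rule = rules.get(result.code)
        if rule is None:
            continue  # no rule for this entity: not an urgency candidate
        if rule.high is not None and result.value > rule.high:
            flagged.append(f"{rule.name} above {rule.high}")
        elif rule.low is not None and result.value < rule.low:
            flagged.append(f"{rule.name} below {rule.low}")
    return flagged
```

Keeping the rules as data rather than code makes each urgency flag explainable: the output names the entity and the threshold that triggered it.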

Agentic Approach for Generating Structured Outputs

Fourier relies on its proprietary agentic approach to transform upstream information into structured, actionable output. Derived information, including diagnoses, critical dates, and retrieved supporting evidence, is assembled by the agent to match a custom schema defined for the clinical specialty and customer.

Fourier's output schemas are built from a library of reusable clinical components, such as Demographics, Diagnoses, Radiology Reports, Surgical History, and Timeline. Customers choose the components they need, and these are assembled into specialty-specific outputs. Because training data, evaluation sets, and retrieval configurations are shared across customers in the same specialty, each new customer in an existing specialty requires less seed data before automation can begin. We encode edge cases and other known difficulties in our broader instruction set so that what is learned for one organization carries over to others. This compounding data advantage is not replicable by a point solution or an internal build.
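The component-library idea can be sketched as a registry of named building blocks from which a specialty-specific schema is assembled. The component names come from the text above. The field names inside each component, the `build_schema` function, and the shape of the result are hypothetical and chosen only to keep the example short.

```python
from typing import Any, Callable, Dict, List

# Each factory returns a fresh, empty instance of one reusable component.
COMPONENT_LIBRARY: Dict[str, Callable[[], Dict[str, Any]]] = {
    "Demographics": lambda: {"name": None, "date_of_birth": None},
    "Diagnoses": lambda: {"items": []},
    "Radiology Reports": lambda: {"items": []},
    "Surgical History": lambda: {"items": []},
    "Timeline": lambda: {"events": []},
}


def build_schema(specialty: str, components: List[str]) -> Dict[str, Any]:
    """Assemble an empty output schema from the chosen components, in order."""
    unknown = [name for name in components if name not in COMPONENT_LIBRARY]
    if unknown:
        raise KeyError(f"unknown components: {unknown}")
    return {
        "specialty": specialty,
        "sections": {name: COMPONENT_LIBRARY[name]() for name in components},
    }
```

A call such as `build_schema("orthopedics", ["Demographics", "Surgical History", "Timeline"])` returns a schema with exactly those three sections. Because the components are shared, anything tuned for one component (retrieval configuration, evaluation data) can in principle be reused by every schema that includes it.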

The result is structured output that scores higher than the human benchmark: an F1 of 0.94 versus 0.82.

Fig. 04 · Clinical Summary Structured Output F1 Score Comparison (structured-output F1, higher is better)

Human Benchmark: 0.82
Fourier AI: 0.94

A Calibrated Human-in-the-Loop System

In healthcare, the output must be both accurate and trusted. To that end, Fourier AI's structured outputs are graded by human QA reviewers against a rubric. The QA sampling probability is designed to direct human review toward cases where it adds the most value: we generally err on the side of more QA for new specialties, new customers, and output components that have historically shown low QA scores. Fourier's AI-assisted QA layer, optimized using human QA feedback pairs, flags likely errors before a human reviewer opens a case. This Clinician-in-the-Loop system keeps output quality as high as possible.
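A risk-weighted sampling rule of this kind can be sketched as a base review rate that is raised for each of the risk factors named above. The base rate, the size of each increase, the 0.90 score target, and the function names are all hypothetical. The sketch illustrates the shape of such a rule, not Fourier's calibration.

```python
import random


def qa_sampling_probability(new_specialty: bool, new_customer: bool,
                            historical_score: float, base: float = 0.05) -> float:
    """Probability that a case is routed to human QA (illustrative weights)."""
    p = base
    if new_specialty:
        p += 0.30
    if new_customer:
        p += 0.20
    # The further the historical QA score falls below 0.90, the more we sample.
    p += max(0.0, 0.90 - historical_score)
    return min(1.0, max(0.0, p))


def should_review(probability: float, rng: random.Random) -> bool:
    """Draw once from the sampling distribution."""
    return rng.random() < probability
```

An established customer in an established specialty with strong historical scores stays at the base rate, while a new customer in a new specialty with a historical score of 0.50 is sampled at close to 95% under these example weights.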

