How Meridian Health Cut Triage Time by 40% with a Fine-Tuned Clinical Model

healthcare · Published on March 18, 2026

How Meridian Health Cut Triage Time by 40% with a Fine-Tuned Clinical Model

Company: Meridian Health Systems
Industry: Healthcare
Model: Llama 3.1 70B fine-tuned on internal clinical notes
Result: 40% reduction in average triage completion time, 91% clinician satisfaction score


Background

Meridian Health Systems operates 14 regional hospitals and over 80 outpatient clinics across the Northeast. Their clinical documentation team was spending an average of 22 minutes per patient encounter just on structured intake summaries — pulling together chief complaints, vitals context, medication history, and acuity scores into a format that downstream teams could act on.

The problem wasn't a lack of data. Meridian had 11 years of de-identified clinical notes, discharge summaries, and triage logs sitting in their on-prem data warehouse. The problem was turning that data into a model that fit their workflow, their terminology, and their liability requirements — without shipping patient data to a third-party API.

They came to Clado in January 2026.

The Challenge

General-purpose LLMs underperformed on Meridian's internal terminology. Abbreviations like "SOB on exertion," "hx of CABG," and "c/o dysuria x3d" were being expanded incorrectly or ignored. Structured output was inconsistent — the model would sometimes produce narrative prose where a discrete acuity score was needed.

Beyond accuracy, the team had strict constraints:

  • All inference had to run on Meridian's private cloud (no data egress)
  • The model needed to produce structured JSON output conforming to their HL7 FHIR schema
  • Responses had to cite the source section of the input note (for auditing)
  • Latency budget was under 4 seconds per note at P95

A generic RAG pipeline over their existing notes helped recall but didn't solve the formatting or latency problems. They needed a fine-tuned model.

What They Built with Clado

Dataset Preparation

Meridian's ML team used Clado's dataset tooling to curate a fine-tuning set from their 11-year corpus. They filtered for notes with a corresponding structured discharge summary (their ground truth), deduplicated across patient IDs, and split by year to avoid data leakage in evaluation.

The final dataset: 48,000 training pairs, 6,000 validation, 2,000 held-out test.

Clado's eval harness let them define custom metrics against their FHIR schema — field-level precision and recall on the structured output, plus a human-in-the-loop review queue for cases where the model disagreed with the gold label by more than a threshold.

Fine-Tuning

They started with Llama 3.1 70B as the base. Clado's run management tracked every experiment: learning rate, LoRA rank, batch size, number of epochs. The team ran 14 experiments over three weeks before landing on a configuration that hit their accuracy targets without overfitting to the training distribution's date range.

Key decisions logged in Clado:

  • LoRA rank 32 outperformed rank 64 on held-out clinical note types from 2019 (older terminology)
  • A 10% mix of synthetic negative examples (deliberately malformatted inputs) improved robustness on edge cases from the ER
  • Instruction-tuning the model to always emit a source_span field with character offsets into the input improved auditor trust significantly

Evaluation

Before any human review, Clado's automated eval pipeline ran each candidate model checkpoint against the 2,000 held-out test cases. Metrics tracked per run:

  • Field-level F1 on 12 structured FHIR fields
  • JSON validity rate
  • Source citation accuracy (span overlap with gold label)
  • Latency distribution (P50, P90, P95) on Meridian's target hardware

The final model hit 94.3% field-level F1, 99.8% JSON validity, and 3.1s P95 latency on Meridian's A100 cluster.

Deployment

Meridian deployed the model behind their existing clinical portal. Clado's versioned model registry made it straightforward to pin the production endpoint to a specific checkpoint and roll back if a regression was caught in the live monitoring queue.

Results

After 60 days in production across three pilot hospitals:

MetricBeforeAfter
Average triage documentation time22 min13 min
Clinician satisfaction (survey)91%
Structured output validity99.6%
Escalations requiring manual correction4.2%

The 40% reduction in triage time translated to roughly 3.5 additional patient encounters per clinician per shift at the pilot sites — capacity that had previously been absorbed by documentation overhead.

What the Team Said

"We'd been sitting on years of clinical data and didn't have a clean way to turn it into something we could actually deploy. Clado gave us the eval infrastructure to know when the model was good enough — and the confidence to ship it."

— Director of Clinical Informatics, Meridian Health Systems

What's Next

Meridian is now running a second fine-tuning cycle targeting specialist consult notes (cardiology, nephrology, oncology), which have different terminology distributions than general triage. They're also evaluating whether a smaller fine-tuned 8B model can hit acceptable accuracy thresholds for lower-acuity outpatient settings, where latency and compute cost matter more.

Both workstreams are tracked in Clado.

Build your advantage for production AI.


Footer
Freesolo

Copyright 2026 Linkd Inc. All rights reserved.

Freesolo