A Single Sentence from a Family Member Shifted an AI Diagnosis 12x. That Anchoring Bias Is in Your Agents Right Now.

Original article: Read on Nate’s Substack

Summary

Main Thesis

OpenAI’s ChatGPT Health — built with input from 260+ physicians and 600,000 rounds of clinician feedback — failed its first independent evaluation in alarming ways. But the real story is not that one product failed: the same four structural failure modes exist in every AI agent currently being deployed in enterprise settings.

Key Data Points

Among cases unanimously classified as emergencies by three independent physicians, ChatGPT Health directed patients away from the ER 52% of the time
Suicide-crisis safeguards fired more on vague emotional distress than on patients describing specific plans to harm themselves
A single dismissive sentence from a family member shifted triage recommendations away from emergency care with an odds ratio of 11.7 (12x shift)
The system’s own reasoning trace correctly identified “early respiratory failure” — then output “wait and schedule an appointment”
40 million people use this tool daily

The 4 Structural Failure Modes

Anchoring bias — the model latches onto early contextual signals and deprioritizes its own correct analysis
Confidence misalignment — outputs appear authoritative even when internal traces are uncertain
Evaluation gap — safety evaluations weren’t designed to detect these failure patterns
Context override — a single external sentence overrides multi-step clinical reasoning

Practical Takeaways

These failure modes are not medical — they’re properties of how LLMs behave in production, and they apply to agents handling claims, compliance, customer service, and procurement
A factorial evaluation methodology (accidentally pioneered by the medical team) is the most rigorous agent eval approach yet published — and it scales
A four-layer eval architecture addresses each failure mode: confidence routing, deterministic validation, stress testing, and adversarial input injection
The cost model is front-loaded: month 6 costs a fraction of month 1

Frameworks & Prompts

Factorial evaluation methodology (borrowed from the medical study)
4-layer eval architecture mapped to specific failure modes
Prompts for auditing your own agent deployments for these blind spots

Processed: 2026-03-19

Infographics

Landscape Infographic

Portrait Infographic