Study Reveals Large Language Models Vulnerable to Misleading Medical Information
Researchers found that large language models (LLMs) that perform at expert levels on medical licensing exams can be manipulated into abandoning correct answers when presented with misleading context, a weakness in what the authors term "epistemic resilience." The study introduced MedMisBench, a benchmark with nearly 11,000 medical questions and 48,889 misleading context variations, and found that average accuracy dropped from 71.1% to 38.0% under adversarial conditions. The finding matters because millions of patients increasingly rely on LLMs for health advice, yet existing medical benchmarks do not measure whether models maintain correct judgment when confronted with real-world misinformation.
In a preprint posted to arXiv, researchers report that large language models achieving expert-level performance on medical licensing exams are surprisingly vulnerable to misleading information. The study introduced MedMisBench, a benchmark containing 10,932 medical questions paired with 48,889 misleading context variations, designed to test what the researchers call "epistemic resilience": the ability to maintain correct medical judgment when presented with false or contradictory information. Across 11 model configurations, average accuracy fell from 71.1% on the original questions to 38.0% when misleading context was injected, with an overall attack success rate of 51.5%. The most effective attacks used formal, rule-like fabrications and authority-framed falsehoods, reaching attack success rates of up to 69.5%. A clinical panel of 14 experts from 7 countries reviewed cases and judged that 38.2% of them carried potential for serious harm, highlighting a critical gap in how medical AI systems are currently evaluated.
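To make the reported metrics concrete, here is a minimal sketch of how accuracy with and without misleading context, and an attack success rate, might be computed for a benchmark that pairs each question with several misleading variants. The `Item` schema, the function names, and the toy data are hypothetical illustrations, not MedMisBench's actual code or format. The exact definition of attack success rate used in the paper is not given here, so `attack_success_rate` assumes one common definition (the share of variants that flip an originally correct answer to a wrong one); it should not be expected to reproduce the reported 51.5%.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Item:
    """One question and a model's answers (hypothetical schema, for illustration only)."""
    gold: str                     # correct answer label
    original_answer: str          # model's answer to the unmodified question
    perturbed_answers: List[str]  # model's answers under each misleading-context variant


def accuracy(items: List[Item]) -> float:
    """Share of original questions answered correctly."""
    return sum(it.original_answer == it.gold for it in items) / len(items)


def perturbed_accuracy(items: List[Item]) -> float:
    """Share of (question, misleading variant) pairs answered correctly."""
    pairs = [(it.gold, a) for it in items for a in it.perturbed_answers]
    return sum(g == a for g, a in pairs) / len(pairs)


def attack_success_rate(items: List[Item]) -> float:
    """ASSUMED definition: among variants of originally correct questions,
    the fraction whose answer is now wrong."""
    flips = [
        a != it.gold
        for it in items
        if it.original_answer == it.gold  # only attack answers that started out correct
        for a in it.perturbed_answers
    ]
    return sum(flips) / len(flips) if flips else 0.0


def relative_drop(before: float, after: float) -> float:
    """Relative decline, e.g. accuracy falling from `before` to `after`."""
    return (before - after) / before


# Toy data (made up): two questions start out correct, one starts out wrong.
items = [
    Item("A", "A", ["A", "B"]),       # correct originally; one of two variants flips it
    Item("B", "B", ["C", "C", "B"]),  # correct originally; two of three variants flip it
    Item("C", "D", ["D"]),            # wrong originally; excluded from the ASR
]

if __name__ == "__main__":
    print(f"original accuracy:   {accuracy(items):.3f}")
    print(f"perturbed accuracy:  {perturbed_accuracy(items):.3f}")
    print(f"attack success rate: {attack_success_rate(items):.3f}")
    # Reported averages: 71.1% -> 38.0% is a 33.1-point absolute drop.
    print(f"relative drop: {relative_drop(71.1, 38.0):.1%}")  # prints "relative drop: 46.6%"
```

On the article's reported averages, the fall from 71.1% to 38.0% is a 33.1-percentage-point absolute drop, or roughly a 46.6% relative decline in accuracy.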
What's missing
The study does not discuss potential mitigation strategies or how LLMs might be improved to resist such adversarial medical misinformation. Additionally, the paper does not address whether findings generalize to other specialized domains beyond medicine or whether different model architectures show varying resilience patterns.
What different sources said
- arXiv cs.CL
Measuring Epistemic Resilience of LLMs Under Misleading Medical Context