In a study from Marburg University, thirteen AI language models answered an average of 90 percent of questions on a clinical knowledge test correctly. The 123 human participants, including internal medicine specialists attending a major congress, scored 49 percent. This is not proof that AI can replace doctors. It does demonstrate that AI can match human recall of medical knowledge, and it raises a question that hospitals around the world have not yet answered: how do they want to use that capability?
The study: 15 questions, 2 patient cases, 13 models
The study was conducted by Dr. Philipp Russ and Prof. Dr. Ivica Grgic from the University Hospital of Giessen and Marburg and was published in 2026 in the journal Scientific Reports. The test format consisted of two real patient cases involving acute kidney injury and 15 multiple-choice questions in German. On the human side were 123 participants, including medical students and internal medicine physicians from the 131st Annual Congress of the German Society of Internal Medicine in Wiesbaden.
The AI models answered an average of 90 percent of questions correctly; several models solved all 15 correctly. Human participants averaged 49 percent. The AI also required only a fraction of the time.
What the study does not measure: clinical judgment. A doctor takes a medical history, observes the patient, reads physical signs and communicates under pressure. Knowledge tests do not capture any of this. The study shows that AI has medical knowledge on demand. It says nothing about what happens between doctor and patient when the case on the table is not a textbook one.
The gap between capability and deployment
A knowledge test is one thing; the clinical environment is another. Adoption of AI in healthcare is accelerating globally but remains uneven. A 2025 survey by the German digital association Bitkom of 616 physicians found that 78 percent consider AI a major opportunity for medicine, and 60 percent expect AI to eventually deliver better diagnoses than humans. At the same time, 77 percent felt unprepared for AI in practice and 76 percent demanded strict regulation.
That tension is widespread: clinicians recognise the technology's potential but doubt their own readiness and want political guardrails before committing.
A regulatory first: AI made mandatory
In April 2026, Germany became the first country to legally require AI assistance in a national cancer screening programme. Only radiology practices and clinics that evaluate lung scans with AI support are permitted to participate in the national lung cancer early detection programme.
Until now, the use of AI in medicine has everywhere been voluntary and supplementary. For the first time, a legislature has determined that an AI-assisted procedure is more reliable for a specific task than a purely manual one. That decision is likely to serve as a precedent for regulators in other countries.
At the same time, the EU AI Act classifies medical AI systems as high-risk applications, requiring extensive approval procedures and audits by certification bodies. What protects patients also significantly lengthens the path from the laboratory to the clinic, sometimes by years.
What the Marburg study actually tells us
The result, 90 percent versus 49 percent, is striking but not surprising to researchers who follow the field. Language models have been trained on vast quantities of medical literature. What they have is recall and pattern-matching over that corpus at scale. What they lack is the ability to integrate ambiguous real-world signals: the patient who downplays symptoms, the atypical presentation, the social context that changes everything about a diagnosis.
The more useful question the study prompts is not whether AI is smarter than a doctor, but how healthcare systems can embed AI where it adds the most value at the least risk: in diagnostic support for imaging, in flagging missed findings, in reducing administrative burden. Human judgment stays central where it matters most.