ChatGPT Generates Differential Diagnoses With Similar Accuracy to Emergency Doctors

In a recent study, cases featured a single primary complaint and diagnosis
Medically Reviewed By:
Meeta Shah, M.D.

THURSDAY, Sept. 21, 2023 (HealthDay News) -- ChatGPT's performance in generating differential diagnoses appears to be similar to that of emergency department physicians, according to a research letter published online Sept. 9 in the Annals of Emergency Medicine to coincide with the annual European Emergency Medicine Congress, held from Sept. 17 to 20 in Barcelona, Spain.

Hidde ten Berg, from Jeroen Bosch Hospital in 's-Hertogenbosch, Netherlands, and colleagues investigated whether ChatGPT could generate accurate differential diagnoses from the physician notes recorded at patients' initial emergency department presentation. The study was a retrospective analysis of 30 undifferentiated patients, each with a single proven diagnosis, who presented to a nonacademic teaching hospital in March 2022. ChatGPT's results were compared with the clinical teams' first formulated differential diagnoses and leading diagnoses, both with and without laboratory test results.

The researchers found that physicians correctly included the diagnosis among their top five differential diagnoses in 83 percent of cases, similar to ChatGPT v3.5 (77 percent) and v4.0 (87 percent). When laboratory data were included, physicians' accuracy increased to 87 percent and ChatGPT v3.5's accuracy to 97 percent, while v4.0's accuracy remained at 87 percent. Physicians outperformed ChatGPT in choosing the correct leading diagnosis (60 percent versus 37 percent for v3.5 and 53 percent for v4.0). With laboratory data, the corresponding figures were 53 percent for physicians, 60 percent for v3.5, and 53 percent for v4.0. The differential diagnoses of physicians and ChatGPT overlapped by 60 percent. However, the researchers noted that ChatGPT can generate different responses to the same query.

"This observed inconsistency in ChatGPT's outputs emphasizes the inherent unpredictability in large language models and underscores the fact that these are merely tools that can aid, but not replace physicians' judgment," the authors write.

