Version 3.5 of ChatGPT could not formulate a correct diagnosis in 83 of 100 pediatric cases, according to recent research published in JAMA Pediatrics.
According to the study's authors, 72 of those diagnoses were outright wrong, and 11 were clinically related to the correct diagnosis but too broad to be considered correct.
One caveat is that the large language model tested was an older version of ChatGPT. Even so, what do these results mean for healthcare and the use of AI?
The study underscores the importance of physician oversight when AI tools and large language models are deployed in clinical medicine. These tools are still in the early stages of development, and much more research is needed before they become mainstream in healthcare. Physicians are, and should remain, the final arbiters and stewards of patient care, particularly when the stakes are a matter of life and death.
Medical interpretation is often nuanced and requires contextual understanding of many factors. For example, when radiologists interpret a CT scan of the legs, they may encounter subcutaneous edema in the calf. This finding is nonspecific and can appear in many conditions, including cellulitis, contusion from trauma, and vascular disease from heart failure. Physicians integrate the imaging finding with the patient's history to reach a final diagnosis: in this scenario, a fever would point toward cellulitis, whereas a recent motor vehicle accident would suggest the edema is from a contusion.
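To make the idea concrete, here is a minimal sketch of how the same imaging finding can yield different leading diagnoses once history is factored in. This is an illustrative toy example, not a clinical algorithm; the diagnosis names, history flags, and scores are assumptions chosen only to mirror the scenario above.

```python
# Illustrative sketch: the same nonspecific finding ranks differently
# depending on contextual history. All rules and weights are hypothetical.

def rank_differential(finding: str, history: dict) -> list[tuple[str, int]]:
    """Return candidate diagnoses for a finding, scored by history clues."""
    if finding != "subcutaneous edema (calf)":
        return []

    # Every candidate starts neutral; history clues adjust the scores.
    scores = {"cellulitis": 0, "contusion": 0, "heart failure-related edema": 0}

    if history.get("fever"):
        scores["cellulitis"] += 2            # fever favors infection
    if history.get("recent_trauma"):
        scores["contusion"] += 2             # trauma favors contusion
    if history.get("known_heart_failure"):
        scores["heart failure-related edema"] += 2

    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Same finding, two different histories, two different leading diagnoses.
    print(rank_differential("subcutaneous edema (calf)", {"fever": True}))
    print(rank_differential("subcutaneous edema (calf)", {"recent_trauma": True}))
```

The point of the sketch is simply that the finding alone does not determine the answer; the surrounding history does.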
It is precisely this contextual reasoning that AI still needs to develop, as the JAMA Pediatrics study illustrates. Reaching the proper diagnosis in the pediatric cases requires not only pattern recognition of symptoms but also consideration of the patient's age and other contextual information. AI certainly excels at pattern recognition, but it likely struggles with more complex scenarios in which symptoms overlap across many diagnoses. This limitation is precisely why physicians must review and oversee decisions and diagnoses made by large language models.