ChatGPT has helped me pass several exams of the United States Medical Licensing Examination (USMLE). study It is expected to be completed in 2022. This year, a team of Canadian medical experts tested the treatment in real life to see if it worked. The results showed that it didn’t.
ChatGPT vs Medscape
“Our source of medical questions was Medscape’s question bank,” says Amrit Kirpalani, a medical educator at Western University in Ontario, Canada, who led the new study on ChatGPT’s performance as a diagnostic tool. While USMLE test questions were mostly multiple-choice, Medscape has complete medical cases based on real patients, complete with physical exam findings and lab test results.
The idea behind it is that these cases are challenging for healthcare professionals due to complications such as multiple comorbidities (the presence of two or more diseases at the same time) and various diagnostic dilemmas that make the right answer unclear. Kirpalani’s team translated 150 of these Medscape cases into prompts that ChatGPT could understand and process.
This was a bit of a challenge: OpenAI, the company that developed ChatGPT, prohibits its use for medical advice, so prompting it to directly diagnose the case didn’t work. However, this was easily circumvented by telling the AI that a diagnosis was needed for an academic research paper the team was writing. The team then gave the AI various possible answers, copied/pasted all the case information available on Medscape, and asked ChatGPT to provide justification for its selected answer.
It turns out ChatGPT was wrong in 76 out of 150 cases. But chatbots are supposed to be good at diagnosing, right?
Special Purpose Tools
In early 2024, Google Articulate Medical Intelligence Explorer (AMIE) is a large-scale language model built specifically to diagnose diseases based on conversations with patients. AMIE diagnosed 303 cases based on conversations with patients and performed better than human doctors. New England Journal of Medicine and Clinical Pathology ConferenceAnd AMIE is no exception: Barely a week went by last year without a study being published showing how AI can perform remarkably well in diagnosing cancer or diabetes, or even predicting male infertility based on blood test results.
But the difference between these specialized medical AIs and ChatGPT is the data they use to train them. “These AIs may have been trained on large volumes of medical literature and may have even been trained on similar complex cases,” Kirpalani explains. “They may be customized to understand medical terminology, interpret diagnostic tests, and recognize patterns in medical data related to specific diseases or symptoms. In contrast, general-purpose LLMs like ChatGPT are trained on broad topics and lack the deep expertise required for medical diagnosis.”