ChatGPT crushes real doctors in answering patient questions
New JAMA IM paper shows the potential of ChatGPT
Here is what Dr. John Ayers and colleagues did. They went to a subreddit called AskDocs, where random people ask questions of doctors; to answer, you have to prove to the moderators that you are a physician. The team collected 195 questions and answers. For each question, they gave the exact same prompt to ChatGPT (GPT-3.5, v1).
Then three health care providers, blinded to whether each answer came from a physician or the chatbot, scored the responses. A picture is worth a thousand words: ChatGPT crushed the doctors, in both quality and empathy. It wasn't close.
On the majority of questions, 100% of judges favored the chatbot. Here are some examples.
I called up Dr. John Ayers to ask him why he thought ChatGPT did so well.
Ayers said, “Just go on ChatGPT and say, ‘I have a headache, can you help me?’ and you will see why. ChatGPT has infinite time. It is not constrained. It won’t say anything the doctor doesn’t know, but it will take the time to explain things, while the doctor will focus on the central point.”
The other issue he notes is that patient messages are sometimes long: they might contain six paragraphs and multiple questions. John added, “Sometimes doctors don’t see all the questions. It might be the 4th question the patient really cares about.”
I confronted John with what I thought was a weakness of the paper: “Isn’t it a limitation that the docs on reddit aren’t the best doctors? After all, who would answer questions on reddit?”
John disagreed with me. He told me to keep in mind that doctors seeing patients in clinic “are not getting paid to message either.” Doctors on reddit are doing this voluntarily and publicly, which creates a game of reputation, he pointed out. Ayers suspects the reddit answers might actually be better than the day-to-day answers in clinic. I conceded it was possible.
John added that one objection some reporters have raised is that your regular doctor knows you and might answer questions better than the average reddit doctor. John fired back, “Do you know your doctor?” adding, “I don’t know my doctor. People are disconnected from health care.”
Few of us are lucky enough to have an Adam Cifu to call, whom we have known for years.
Finally, I asked John if the questions on reddit were representative of actual questions people have.
John: “Our evaluators said the questions here look similar to questions they get in their inbox.”
I told John that even before I was aware of his paper, I had become interested in and impressed by ChatGPT, and had made my thoughts known on my personal Substack and in a video, which I will link to below.
John agreed, adding that he felt ChatGPT’s performance is “unbelievable. I never in my life imagined I would see this.” He is shocked that software could generate such satisfying answers.
Last question: “Did ChatGPT confabulate?”
“We didn’t directly evaluate for that, but doctors judged these messages, and would be unlikely to rate highly something that was wrong.”
I will read the paper again more critically, but I agree with John that fielding written questions will likely be one early success of ChatGPT.
If you like this, check out John’s brilliant COVID op-ed (one of my favorites!) and my video on ChatGPT.