ChatGPT crushes real doctors in answering patient questions

New JAMA IM paper shows the potential of ChatGPT

Apr 28, 2023

My colleague John Ayers at UCSD performed a clever study that is out right now in JAMA IM.

Here is what he and colleagues did. He went to a subreddit called Ask Docs, where random people ask questions of doctors. To answer, you have to prove to the moderator that you are a physician. He collected 195 questions and answers. For each question, he gave the exact same prompt to ChatGPT (GPT 3.5v1).

Then 3 health care providers, blinded to who had written each response, scored the responses. A picture is worth a thousand words. ChatGPT crushed doctors, both in quality and empathy! It wasn’t close.

On the majority of questions, 100% of judges favored the chatbot. Here are some examples.

I called up Dr. John Ayers to ask him why he thought ChatGPT did so well.

Ayers said, “Just go on ChatGPT and say, ‘I have a headache, can you help me?’ and you will see why. ChatGPT has infinite time. It is not constrained. It won’t say anything the doctor doesn’t know, but it will take the time to explain things, while the doctor will focus on the central point.”

The other issue he notes is that sometimes patient messages are long: they might have 6 paragraphs and multiple questions. John added, “Sometimes doctors don’t see all the questions. It might be the 4th question the patient really cares about.”

I confronted John with what I thought were the weaknesses of the paper: “Isn’t it a limitation that the docs on reddit aren’t the best doctors? After all, who would answer questions on reddit?”

John disagreed with me. He told me to keep in mind that doctors seeing patients in clinic “are not getting paid to message either.” Doctors on reddit are actually doing this voluntarily and publicly, which creates a game of reputation, he pointed out. Ayers suspects that the reddit answers might be better than the day-to-day answers in clinic. I conceded it was possible.

John added that one objection some reporters have raised is that your regular doctor knows you, and might answer questions better than the average reddit doctor. John fired back, “Do you know your doctor?” He added, “I don’t know my doctor. People are disconnected from health care.”

Few of us are lucky enough to have an Adam Cifu to call, whom we have known for years.

Finally, I asked John if the questions on reddit were representative of actual questions people have.

John: “Our evaluators said the questions here look similar to questions they get in their inbox.”

I told John that even before I was aware of his paper, I had become interested in and impressed by ChatGPT, and had made my thoughts known on my personal Substack and in a video, which I will link to below.

John agreed, adding that he felt ChatGPT’s performance is “unbelievable. I never in my life imagined I would see this.” He is shocked that software could generate such satisfying answers.

Last question: “Did ChatGPT confabulate?”

“We didn’t directly evaluate for that, but doctors judged these messages, and would be unlikely to rate highly something that was wrong.”

I will read the paper again more critically, but I agree with John that fielding written questions likely will be one early success of ChatGPT.

If you like this, check out John’s brilliant COVID op-ed (one of my favorites!) and my video on ChatGPT.

Dr. K

Apr 28, 2023 (edited)

This article both makes and misses the important point. People have loved Eliza for years...and still do. No AI there -- just a sympathetic, psychoanalyst-like set of interactive prompts with which people will positively engage for hours. (This was recently in the news because Eliza did not tell a patient to seek professional help and he committed suicide. Eliza was developed as a "toy" demonstration project decades ago -- it was not meant to DO anything.)

For the kinds of questions that are asked on reddit, I am certain that ChatGPT (4 is even better than 3.5 by a fair amount) will do a better job. It has access to an almost infinite amount of "data" and an almost infinite amount of time (and electricity) to aggregate it. If health care were about answering questions, it would have a definite place -- and likely will get that.

But I always tell my patients two things: 1) You are your own science experiment -- I know lots about lots of patients but which part of what I know applies to you is something we will discover together and 2) Our interaction is based 20% on what I know after medical school, residency, fellowship and decades of practice and 80% on how well I can take that information and correctly make it be about YOU.

ChatGPT and its peers may know more than I do about my areas of medicine (although Larry Weed's research with Problem-Knowledge Couplers says otherwise) but they know nothing about you, and because of the broken structure of health informatics they likely will not for the foreseeable future. What you tell them about you is subject to nuances that you are unlikely to understand and that an LLM will not understand (since LLMs actually "understand" nothing).

Irrespective of the reddit study, which is interesting but in many ways irrelevant to medical practice, it is as easy to make ChatGPT confabulate/hallucinate in health as it is in anything else. And as with the Eliza patient's suicide, the effects can and will be devastating. But even more so (and we have extensive experience with Watson underscoring this), the information one gets, while great for a general answer on a reddit blog, is only coincidentally of value to YOU -- and sometimes can be inimical.

This is about the 10th time during my life (going back to Ted Shortliffe) that AI was going to radically change medicine. Every one of these cycles has failed for the same reason -- knowledge about medical facts has almost nothing to do with appropriately and optimally caring for patients where the N is always ONE. In that regard, we have not yet seen artificial stupidity -- which, based on numbers, will have to precede any kind of actual care-managing artificial intelligence.

Mary S. LaMoreaux

It makes you wonder whether ChatGPT is so good, or our health system has gotten so impersonal. My cataract surgeon has no idea who I am, but my regular eye doctor has known me for years. That's why I go to her. The cataract surgeon just lines everyone up like cattle and makes a fortune.

Sensible Medicine
