A few months ago, I wrote about the news coverage of a study that purported to show that an older shingles vaccine (the predecessor of the current Shingrix vaccine) prevented dementia. I argued that although people should get the current vaccine, I did not think the study proved that vaccinating against shingles prevents dementia. As is often the case -- which is what I love about this site -- I got some feedback. Some people congratulated me on my wisdom, while others explained why I was wrong. When I discussed this with a friend, he forwarded this short masterpiece, The Subjective Interpretation of the Medical Evidence.
I’ve been waiting for an opportunity to share the article. The recent article “Structured Exercise after Adjuvant Chemotherapy for Colon Cancer” reported the results of the CHALLENGE trial. Its lay-press coverage and Dr. Mandrola’s recent critical appraisal present the perfect opportunity.
In The Subjective Interpretation of the Medical Evidence, Drs. Bauchner and Ioannidis point out that experts can disagree about what course of action the evidence suggests. They cite divergent recommendations on screening for colon cancer, breast cancer, and depression, as well as on the use of coronary calcium scores. They note that even when financial conflict of interest is managed, “allegiance bias and ideology may also shape recommendations.” They go on to discuss how these disagreements get more “airtime” than ever before because:
Besides organized guideline committees, individual experts have been increasingly influential and vocal in the public space. Many medical experts have become powerful influencers in media and on social media, interpreting evidence subjectively. Journalists and other nonprofessional influencers who may lack an understanding of how evidence is developed and evaluated nevertheless join the ensuing discussions and debates.
I’m telling you, it’s a great article. Give it a read.
John Ioannidis’ classic Why Most Published Research Findings Are False helped me understand the subjectivity of critical appraisal. In this article, he points out that given the α (type I) and β (type II) errors built into our statistics, the likelihood of study results being “correct”—true positive or true negative vs. false positive or false negative—depends on our pre-study probability, which is subjective.
I often present these calculations to students: if you read a study that shows a new treatment is effective, and your pre-study probability that the treatment works is 50%, then the likelihood that this treatment is effective is 94%. However, if the pre-study probability is only 10%, the likelihood that the results represent a true positive is only 64%. And this is for a perfect study.1
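If you want to check those numbers, here is a minimal sketch of the calculation. It assumes the conventional α of 0.05 and power of 0.80 -- my assumption, since the values are not stated above, though they reproduce the figures exactly -- and applies Bayes’ rule:

```python
# Post-study probability that a positive result is a true positive,
# assuming alpha = 0.05 (type I error) and power = 0.80 (1 - beta).
# These values are assumed; they reproduce the 94% and 64% above.

def post_study_probability(prior, alpha=0.05, power=0.80):
    """P(treatment works | positive study), via Bayes' rule."""
    true_positive = power * prior          # works, and the study detects it
    false_positive = alpha * (1 - prior)   # doesn't work, yet the study is "positive"
    return true_positive / (true_positive + false_positive)

print(f"50% prior: {post_study_probability(0.50):.0%}")  # 94%
print(f"10% prior: {post_study_probability(0.10):.0%}")  # 64%
```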
So let’s get back to colon cancer and exercise. The widely covered article showed that, in patients with resected colon cancer who had completed adjuvant chemotherapy, an intensive exercise regimen increased 5-year disease-free survival from 73.9% to 80.3% and 8-year overall survival from 83.2% to 90.3%.
John’s interpretation (which, unsurprisingly, I agree with) comes in part from a low pre-study probability that these results are valid -- one he brought to the article and that developed further as he read it.2
He points out that “the 37% reduction in all-cause mortality is implausible and rivals many proven cancer therapies” and that the impact of the exercise regimen on cancer outcomes occurs remarkably (unbelievably?) quickly. John’s most important point, though, is that none of this matters. Even if the effect of this intervention is truly as immense as this trial shows, and even if it were demonstrated in a second trial in a different population, generalizing the results of the CHALLENGE trial would be, well, challenging.3
Those who celebrated the results as an astounding breakthrough had a higher pre-study probability that the trial would demonstrate a large benefit. While John (and I) feel it is unlikely that the positive results of this trial represent a true positive, others are sure they do.
It might not be that those who disagreed with John had a higher pre-study probability; they may simply not have been thinking about priors at all, looking at the results in a vacuum. John and I read the article as Bayesians, interpreting the results in light of our priors (which are skeptical, or maybe just conservative). Others took the numbers at face value, without considering priors. This is a frequentist approach.
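To put the Bayesian reading in concrete terms: in odds form, a positive result from a study with the α and power assumed above multiplies whatever prior odds you hold by the same likelihood ratio,

$$
\text{posterior odds} = \text{prior odds} \times \frac{1-\beta}{\alpha} = \text{prior odds} \times \frac{0.80}{0.05} = \text{prior odds} \times 16.
$$

Even odds (a 50% prior) become 16:1, or 94%; odds of 1:9 (a 10% prior) become 16:9, or 64%. The data contribute the same factor of 16 either way; the disagreement lives entirely in the priors.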
This is why critical appraisal is endlessly fascinating and why “following the science” is a lot more difficult than it sounds. We must read articles carefully and consider our priors as we interpret the results. When we read other people’s interpretations of studies, we also need to consider their priors. It is not that there is no true interpretation of an article. Some articles contain fatal flaws and residual confounding. Others are well-planned and solidly conducted. The results are clear, but the interpretation depends on what you bring to them.
1. I usually end the discussion by saying, “And that is why, if you hear on NPR that a study proves something seemingly impossible, the study is probably a false positive.”
2. I should note that I did not talk to John about his article, so I am interpreting his interpretation. When he and I talk, we spend most of the time talking about our own obsessive exercising.
3. I do not agree with all of John’s points -- specifically, his #2. There are process measures in this study that show that the exercise was effective. Anyone who has ever tried to lose weight with exercise alone knows that it doesn’t work very well.
Photo Credit: Adam Cifu