The Uncertainty of Clinical Trial Results
For the Study of the Week, I will show examples of results that I hope make you more cautious about drawing conclusions.
Clinical trials are supposed to give doctors answers to major questions. You test treatment A against treatment B and count outcomes. You then learn which treatment delivers better results.
Grin. It rarely works that cleanly. It is rare for treatment A to be so dominant over B that everyone agrees. Such scenarios are uncommon because trials are mostly applied to medical decisions in which doctors cannot agree, a state known as equipoise. (The debate about when there is equipoise is complicated and a matter for another post.)
Frank Harrell is an eminent professor of statistics at Vanderbilt. Thankfully, he stays active on Twitter, so many of us can learn from him.
Harrell recently criticized Vinay Prasad for publishing a paper that defended the authors of the Cochrane review of physical interventions to interrupt or reduce the spread of respiratory viruses.
Let me do a brief summary and then make my point about drawing conclusions from randomized trials.
The Cochrane review of studies of medical/surgical masks for preventing community spread of respiratory viruses found a relative risk of 1.01. We call that the point estimate. A relative risk of 1.0 indicates no effect.
Point estimates come with 95% confidence intervals, which convey the precision of the experiment. In this case the 95% confidence interval ranged from 0.72 to 1.42. That means the data are consistent with masks reducing viral spread by as much as 28% or increasing it by as much as 42%.
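For readers who want to see the arithmetic behind those numbers, here is a minimal Python sketch. It assumes, as is conventional for relative risks, that the confidence interval was constructed on the log scale and back-transformed; the implied standard error is my rough reconstruction, not a figure reported in the review.

```python
import math

# Figures quoted above from the Cochrane mask review
rr_point = 1.01                 # point estimate of the relative risk
ci_low, ci_high = 0.72, 1.42    # 95% confidence interval

# Relative-risk CIs are conventionally built on the log scale:
# log(RR) +/- 1.96 * SE, then exponentiated back.
# So the implied standard error of log(RR) is roughly:
se_log_rr = (math.log(ci_high) - math.log(ci_low)) / (2 * 1.96)
print(f"Implied SE of log(RR): {se_log_rr:.3f}")

# Expressing the interval's bounds as percent changes in viral spread:
print(f"Lower bound: {(1 - ci_low) * 100:.0f}% reduction")    # ~28% reduction
print(f"Upper bound: {(ci_high - 1) * 100:.0f}% increase")    # ~42% increase
```

An interval stretching from a 28% reduction to a 42% increase is, by any reasonable standard, wide.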
The authors of the Cochrane review wrote:
“Wearing masks in the community probably makes little or no difference to the outcome of laboratory-confirmed influenza/SARS-CoV-2 compared to not wearing masks.”
Professor Harrell’s point was that such wide confidence intervals indicate too much uncertainty; the data are simply not informative. The Cochrane editor-in-chief agreed, and she felt a more appropriate conclusion would have been that the results were inconclusive.
Vinay then led a study showing that 20 previous Cochrane reviews with similar point estimates and confidence intervals had concluded no effect. So there seemed to be selective enforcement of uncertainty in experiments.
This brings me to the larger point of how clinicians should interpret the uncertainty in clinical trials.
Let me show you two examples of results and conclusions from the same month in the same journal.
DANCAVAS compared an invitation to have a slew of cardiac screening tests vs no invitation. The outcome was death. (A strong endpoint.) The final results: HR 0.95 with 95% CI 0.90-1.00. Translation: (roughly) the screening program reduced the chance of dying by 5%, and, if the experiment were repeated many times and a 95% confidence interval constructed each time, 95% of such confidence intervals would contain the true unknown value. (I put that in italics because the original wording was wrong. Thanks to Professor Harrell for the correction.)
Conclusion in the NEJM: (emphasis mine)
After more than 5 years, the invitation to undergo comprehensive cardiovascular screening did not significantly reduce the incidence of death…
My comments: Really? The vast majority of the CI lies below 1.0. If this experiment were repeated many times, most of the resulting estimates would have shown a reduction in death. Death is an important endpoint. Yet, because the CI reached no effect (1.0), editors of the journal required the authors to conclude the intervention was unsuccessful. This seems extreme to me.
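To put a rough number on that intuition, here is a back-of-the-envelope Python sketch. It treats the sampling distribution of the log hazard ratio as approximately normal, reconstructs the standard error from the reported confidence interval, and asks what fraction of that distribution lies below 1.0. This is an informal, flat-prior shortcut, not the trial's own analysis, but it illustrates how far from a coin flip these results are.

```python
import math

def norm_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# DANCAVAS figures quoted above
hr = 0.95
ci_low, ci_high = 0.90, 1.00

# Approximate the sampling distribution of log(HR) as normal,
# recovering the standard error from the reported 95% CI.
log_hr = math.log(hr)
se = (math.log(ci_high) - math.log(ci_low)) / (2 * 1.96)

# Fraction of that distribution lying below HR = 1 (i.e., benefit).
p_benefit = norm_cdf((0.0 - log_hr) / se)
print(f"Approximate probability that screening reduced death: {p_benefit:.2f}")  # ~0.97
```

Under those assumptions, roughly 97% of the probability lies on the side of benefit. Yet the trial is summarized as a negative result.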
PROTECTED-TAVR compared use of a device placed in the carotid arteries to capture debris during replacement of a diseased aortic valve. The idea is that if you stop debris from going north to the brain, then you reduce stroke.
Researchers randomized one group to receive the device and one group to no device. The outcome was stroke. The authors presented the results in the form of absolute differences: stroke rates of 2.3% vs 2.9%, device vs no device, respectively. The point estimate of the absolute risk reduction was -0.6 percentage points. The 95% confidence interval ranged from a 1.7-percentage-point lower risk of stroke to a 0.5-percentage-point increase.
Unlike in DANCAVAS, the upper bound of the confidence interval included the possibility that strokes were more frequent in the device arm.
Conclusion in the NEJM: (my emphasis on the last phrase)
Among patients with aortic stenosis undergoing transfemoral TAVR, the use of CEP did not have a significant effect on the incidence of periprocedural stroke, but on the basis of the 95% confidence interval around this outcome, the results may not rule out a benefit of CEP during TAVR.
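The same back-of-the-envelope arithmetic can be applied to PROTECTED-TAVR, except the difference sits on the absolute (percentage-point) scale, so no log transform is needed. Again, this is my rough sketch under a normal approximation, not the investigators' analysis.

```python
import math

def norm_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# PROTECTED-TAVR figures quoted above (in percentage points)
diff = -0.6                   # absolute difference in stroke rate, device minus no device
ci_low, ci_high = -1.7, 0.5   # 95% confidence interval

# Recover the implied standard error from the CI (normal approximation).
se = (ci_high - ci_low) / (2 * 1.96)

# Fraction of the approximate distribution favoring the device (difference < 0)
# versus suggesting harm (difference > 0).
p_benefit = norm_cdf((0.0 - diff) / se)
print(f"Approximate probability of fewer strokes with the device: {p_benefit:.2f}")      # ~0.86
print(f"Approximate probability of more strokes with the device:  {1 - p_benefit:.2f}")  # ~0.14
```

Under those assumptions the data suggest roughly an 86% chance of fewer strokes with the device and a 14% chance of more. That is the kind of nuance a simple significant/not-significant verdict hides.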
We discussed this issue on the Sensible Medicine podcast this week.
The first problem here is the uncertainty inherent in trial results. When I was young, I erred in focusing too much on the point estimate and on whether the confidence interval reached statistical significance (an upper bound below 1.0 when looking for a reduction in an outcome).
But it is clear from the Cochrane review, Vinay’s re-analysis of other Cochrane papers, and these two examples from NEJM, that there is a lot more uncertainty in results and flex in making conclusions than I had appreciated.
Regulators have it easy because they can simply say something doesn’t meet a statistical threshold.
But doctors have it harder, for many reasons:
A) Because we have to assess the width of the confidence intervals: too wide and we can't draw conclusions due to imprecision. But…no one tells us how wide is too wide. Whatever this threshold is, it has to be consistent.
B) Because we have to look at the confidence intervals and judge the probability that an intervention has a benefit. The screening program in DANCAVAS may not have met its statistical threshold, but it is clearly more likely than not that it reduced death.
C) In the stroke prevention trial, the larger part of the confidence interval trends toward benefit, but there is a chance of harm. What are we to think? Why were those investigators allowed to conclude possible benefit whereas the DANCAVAS authors were not? I am asking.
If you are more confused and now have lower confidence in medical evidence, I have succeeded.
It’s why I am cautious and medically conservative. Please feel free to educate us in the comments.
It's pretty obvious to me that you need to look at the "funding and disclosures" section of any research publication to be aware of the potential bias and influence behind the conclusions any paper has arrived at. And of course, as I understand it, the majority of clinical trial data from studies are not available without a FOIA request. We must have transparency and accountability in research, journal publications, and the pharmaceutical industry to accurately interpret data.
John, excellent piece -- thanks. The most important point here is that the incentive/pressure to support The Narrative(TM) is profound. Masking has long been known to be worthless...there are more studies than one can count if one aggregates them all. Some are better than others, but the only couple (excluding a piece of cloth in a clean room) that seem to show any kind of positive effect are the weakest of all. As Cochrane properly concluded, there is no evidence that masks have value from the studies they analyzed. The scandalous piece was their non-scientific "retraction" of marvelous results and then the repeated echo chamber of wrongthink such as you illustrated here in support of the bad science. Wish you had a bullier pulpit from which to shout this. Most people just emerge confused which is the point of those trying to call decent results into question, but that is bad for science, bad for scientists, bad for physicians and, worst of all, bad for patients. So keep these kinds of pieces coming, please.