The Uncertainty of Clinical Trial Results
For the Study of the Week, I will show examples of results that I hope make you more cautious about drawing conclusions.
Clinical trials are supposed to give doctors answers to major questions. You test treatment A against treatment B and count outcomes. You then learn which treatment delivers better results.
Grin. It rarely works that cleanly. It is rare for treatment A to be so dominant over B that everyone agrees. Such scenarios are rare because trials are mostly applied to medical decisions on which doctors cannot agree, a state called equipoise. (The debate about when equipoise exists is complicated and a matter for another post.)
Professor Frank Harrell is an eminent professor of statistics at Vanderbilt. Thankfully he stays active on Twitter, so that many of us can learn from him.
Let me do a brief summary and then make my point about drawing conclusions from randomized trials.
The Cochrane review of studies of medical/surgical masks for preventing community spread of respiratory viruses found a relative risk of 1.01. We call that the point estimate. A relative risk of 1.0 indicates no effect.
Point estimates come with 95% confidence intervals, which convey the precision of the experiment. In this case the 95% confidence interval ranged from 0.72 to 1.42. That means the data are consistent with masks reducing viral spread by as much as 28% or increasing it by as much as 42%.
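For readers who want to see where such an interval comes from, here is a minimal sketch of the standard log-transform calculation for a relative risk. The event counts below are hypothetical, chosen only to illustrate the arithmetic; they are not the Cochrane review's data.

```python
import math

def rr_confidence_interval(a, n1, b, n2, z=1.96):
    """95% CI for a relative risk, computed on the log scale.

    a/n1 = events/total in one arm, b/n2 = events/total in the other.
    """
    rr = (a / n1) / (b / n2)
    # Standard error of log(RR) from the usual delta-method formula
    se = math.sqrt(1/a - 1/n1 + 1/b - 1/n2)
    lo = math.exp(math.log(rr) - z * se)
    hi = math.exp(math.log(rr) + z * se)
    return rr, lo, hi

# Hypothetical counts: 50/1000 events vs 49/1000 events
rr, lo, hi = rr_confidence_interval(50, 1000, 49, 1000)
print(f"RR = {rr:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
```

Note how a point estimate sitting almost exactly at 1.0 can still carry a wide interval: with modest event counts, the interval easily spans both meaningful benefit and meaningful harm.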
The authors of the Cochrane review wrote:
“Wearing masks in the community probably makes little or no difference to the outcome of laboratory-confirmed influenza/SARS-CoV-2 compared to not wearing masks.”
Professor Harrell’s point was that wide confidence intervals indicate too much uncertainty; the data are simply not informative. The Cochrane editor-in-chief agreed, and she felt a more appropriate conclusion would have been that the results were inconclusive.
Vinay then led a study showing that 20 previous Cochrane reviews with similar point estimates and confidence intervals had concluded no effect. The uncertainty standard, in other words, seemed to be selectively enforced.
This brings me to the larger point of how clinicians should interpret the uncertainty in clinical trials.
Let me show you two examples of results and conclusions from the same month in the same journal.
DANCAVAS compared an invitation to have a slew of cardiac screening tests vs no invitation. The outcome was death. (A strong endpoint.) The final results: HR 0.95 with 95% CI 0.90-1.00. Translation: (roughly) the screening program reduced the chance of dying by 5%, and, if the experiment were repeated many times and a 95% confidence interval constructed each time, 95% of such confidence intervals would contain the true unknown value. (I put that in italics because the original wording was wrong. Thanks to Professor Harrell for the correction.)
Conclusion in the NEJM: (emphasis mine)
After more than 5 years, the invitation to undergo comprehensive cardiovascular screening did not significantly reduce the incidence of death…
My comments: Really? The vast majority of the CI lies below 1.0. If this experiment were repeated many times, most of the resulting estimates would show a reduction in death. Death is an important endpoint. Yet, because the CI reached no effect (1.0), the journal’s editors required the authors to conclude the intervention was unsuccessful. This seems extreme to me.
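The “repeated experiments” reading of a confidence interval can be checked numerically. The sketch below simulates many null trials (true relative risk exactly 1.0, with arbitrary assumed arm sizes and event rates) and counts how often the 95% interval contains the truth; by construction, roughly 95% of the intervals should.

```python
import math
import random

random.seed(0)
n, p = 1000, 0.10            # assumed arm size and true event rate (no true effect)
trials, covered = 1000, 0

for _ in range(trials):
    # Draw event counts for two identical arms
    a = sum(random.random() < p for _ in range(n))
    b = sum(random.random() < p for _ in range(n))
    rr = (a / n) / (b / n)
    se = math.sqrt(1/a - 1/n + 1/b - 1/n)
    lo = math.exp(math.log(rr) - 1.96 * se)
    hi = math.exp(math.log(rr) + 1.96 * se)
    covered += lo <= 1.0 <= hi       # true RR is exactly 1.0 here

print(f"{covered / trials:.1%} of 95% CIs contained the true relative risk")
```

The 95% figure is a property of the procedure across repetitions, not a statement about any single interval, which is exactly the distinction Professor Harrell insists on.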
PROTECTED-TAVR compared use of a device placed in the carotid arteries to capture debris during replacement of a diseased aortic valve vs no device. The idea is that if you stop debris from going north to the brain, then you reduce stroke.
Researchers randomized patients either to receive the device or not. The outcome was stroke. The authors presented the results as absolute differences: stroke rates of 2.3% vs 2.9%, device vs no device, respectively. The point estimate of the absolute risk reduction was -0.6%. The 95% confidence interval ranged from a 1.7% lower risk of stroke to a 0.5% higher risk.
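A quick sketch of the standard Wald calculation for an absolute risk difference produces numbers of this shape. The arm sizes and event counts below are assumptions picked to approximate the published rates; they are not the trial’s actual data.

```python
import math

def risk_difference_ci(a, n1, b, n2, z=1.96):
    """Absolute risk difference (arm 1 minus arm 2) with a Wald 95% CI."""
    p1, p2 = a / n1, b / n2
    diff = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return diff, diff - z * se, diff + z * se

# Assumed counts chosen to mimic the published rates (2.3% vs 2.9%)
diff, lo, hi = risk_difference_ci(35, 1500, 43, 1500)
print(f"ARD = {diff:.1%}, 95% CI {lo:.1%} to {hi:.1%}")
```

Running this with the assumed counts gives an interval close to the published one, straddling zero but lying mostly on the benefit side.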
Unlike in DANCAVAS, the upper bound of the confidence interval included the possibility that strokes were more common in the device arm.
Conclusion in the NEJM: (my emphasis on the last phrase)
Among patients with aortic stenosis undergoing transfemoral TAVR, the use of CEP did not have a significant effect on the incidence of periprocedural stroke, but on the basis of the 95% confidence interval around this outcome, the results may not rule out a benefit of CEP during TAVR.
We discussed this issue on the Sensible Medicine podcast this week.
The first problem here is the uncertainty inherent in trial results. When I was young, I erred in focusing too much on the point estimate and on whether the confidence interval reached significance (an upper bound below 1.0 when looking for a reduction in an outcome).
But it is clear from the Cochrane review, Vinay’s re-analysis of other Cochrane papers, and these two examples from NEJM that there is a lot more uncertainty in results, and flex in drawing conclusions, than I had appreciated.
Regulators have it easy because they can simply say something doesn’t meet a statistical threshold.
But doctors have it harder, for many reasons:
A) Because we have to assess the width of the confidence intervals: too wide, and we cannot draw conclusions due to imprecision. But…no one tells us how wide is too wide. Whatever the threshold is, it has to be applied consistently.
B) Because we have to look at the confidence intervals and judge the probability that an intervention has a benefit. The screening program in DANCAVAS may not have met its statistical threshold, but it is clearly more likely than not that it reduced death.
C) In the stroke prevention trial, the larger part of the confidence interval trends toward benefit, but there is a chance of harm. What are we to think? And why were those investigators allowed to conclude possible benefit while the DANCAVAS authors were not? I am asking.
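Point B can be made concrete. If you are willing to treat the reported interval as a normal distribution on the log scale (roughly a flat-prior Bayesian reading of the data, which is an assumption on my part, not anything the trial reported), you can back out an approximate probability that the true hazard ratio lies below 1.0.

```python
import math

def prob_hr_below_1(hr, lo, hi):
    """Approximate P(true HR < 1), treating the reported 95% CI as a
    normal distribution on the log scale. This is an assumption
    (roughly a flat-prior Bayesian reading), not a trial-reported quantity.
    """
    mu = math.log(hr)
    se = (math.log(hi) - math.log(lo)) / (2 * 1.96)
    z = (0 - mu) / se
    # Standard normal CDF via the error function
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# DANCAVAS: HR 0.95, 95% CI 0.90 to 1.00
print(f"P(HR < 1) is roughly {prob_hr_below_1(0.95, 0.90, 1.00):.0%}")
```

Under that (debatable) assumption, an interval whose upper bound just touches 1.0 corresponds to something like a 97% chance of benefit, which is a long way from "did not reduce death."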
If you are more confused and now have lower confidence in medical evidence, I have succeeded.
It’s why I am cautious and medically conservative. Please feel free to educate us in the comments.