The Frustration of Not Being Able to Sort Signal From Noise
The Study of the Week describes why I often struggle to find certainty in clinical trials.
A treatment to reduce stroke is tested in a clinical trial. In the treatment group, 2.3% of patients had a stroke vs 2.9% in the control arm. The question everyone wants answered is whether this -0.6% difference is signal or noise.
For this, we look to the 95% confidence interval. In the PROTECTED TAVR trial, the confidence interval around this difference ranged from -1.7% (a lower stroke rate with treatment) to +0.5% (a higher rate).
I will avoid the controversy over defining confidence intervals, but suffice to say, an interval this wide allows for the treatment to be either better or worse than the control arm.
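To see where such an interval comes from, here is a back-of-envelope sketch using the standard normal-approximation (Wald) formula for a difference in proportions. The event rates come from the trial; the arm sizes of roughly 1,500 patients each are my approximation, not figures from the trial report.

```python
import math

# Rough reconstruction of the PROTECTED TAVR stroke-rate interval.
# Arm sizes (~1,500 each) are my approximation of a roughly
# 3,000-patient trial; the event rates come from the post above.
n1 = n2 = 1500
p1, p2 = 0.023, 0.029   # stroke rate: treatment vs control

diff = p1 - p2          # -0.006, i.e. -0.6 percentage points
se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
lo, hi = diff - 1.96 * se, diff + 1.96 * se   # Wald 95% interval

print(f"difference {diff:+.1%}, 95% CI {lo:+.1%} to {hi:+.1%}")
```

With these assumptions the interval works out to about -1.7% to +0.5%, in line with the published result: wide enough to include both benefit and harm.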
NEJM allowed the authors to hedge in the conclusion. (Emphasis mine.)
Among patients with aortic stenosis undergoing TAVR, the use of the *device* did not have a significant effect on the incidence of periprocedural stroke, but on the basis of the 95% confidence interval around this outcome, the results may not rule out a benefit of the *device* during TAVR.
Wide confidence intervals are a problem because trials are supposed to give answers. Doctors use trials to guide recommendations. Trials are the foundation of knowledge in medicine. But. A trial that has this much uncertainty doesn’t help.
It seems that most weeks I am writing to you about such a trial.
And even when I don’t see that much uncertainty in a result, Vanderbilt statistics professor Frank Harrell points out that, indeed, John, there was uncertainty. I chose cerebral protection as an example, but I could have picked many different examples.
Why does this happen? How often does it happen?
A group of investigators, mostly from Germany, has published a sobering analysis of trials from a ten-year period (in JAMA, NEJM, and the Lancet) that shows why we struggle to sort signal from noise in standard cardiology trials.
JAMA Network Open published the review of 344 cardiology trials. The authors compared the investigators' pre-trial estimates of event rates and effect sizes with the event rates and effect sizes actually observed.
Pause there. Don’t glaze over that sentence. Estimated event rates and effect sizes are how investigators decide how many patients to enroll in a trial.
If your event rate is rare or effect size small, you need lots of patients. (The PCSK9i trials had 18K patients.) If your event rate is high or effect size is large, you need fewer patients. (Cardiogenic shock kills more than half its patients, and the DanGer-Shock trial of the Impella device had 360 patients.)
Here’s the problem: if you are too optimistic, either about how many events you will see (event rate) or about how good the therapy is (effect size), then you don’t enroll enough patients. This leads to wide confidence intervals.
We then say the trial was underpowered. Again, that sentence doesn’t sound awful. Let me rephrase: if you have wide confidence intervals, which can be consistent with benefit or harm, you have experimented on humans for nothing.
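To make "underpowered" concrete, here is a sketch using the standard normal-approximation sample-size formula for two proportions. The 10% control event rate is my illustrative assumption; the 28% and 9% relative risk reductions are the median estimated and observed effects reported in the analysis.

```python
import math

def n_per_arm(p_control, rrr, alpha_z=1.96, power_z=0.84):
    """Patients per arm for 80% power at two-sided alpha = 0.05
    (normal approximation for a difference in proportions)."""
    p_treat = p_control * (1 - rrr)
    variance = p_control * (1 - p_control) + p_treat * (1 - p_treat)
    return math.ceil((alpha_z + power_z) ** 2 * variance
                     / (p_control - p_treat) ** 2)

# Assumed 10% control event rate (illustrative, not from the paper)
optimistic = n_per_arm(0.10, 0.28)  # planned around RR 0.72
realistic  = n_per_arm(0.10, 0.09)  # what RR 0.91 actually requires
print(optimistic, realistic)
```

Under these assumptions, the trial planned around the optimistic estimate enrolls roughly 1,600 patients per arm, while detecting the effect actually observed would take more than ten times as many.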
The German-led study found that cardiac investigators are quite poor in their pre-trial estimates.
The observed event rates were substantially lower than expected rates. More than half the trials overestimated event rates.
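The cost of an event-rate shortfall can be sketched the same way: plan a trial around assumed rates, then ask what power remains if events come in at half the assumed rate. All the planning numbers here (10% control event rate, 28% relative risk reduction, 80% power) are illustrative assumptions, not figures from any specific trial.

```python
import math
from statistics import NormalDist

z = NormalDist()

# Planned sample size: normal approximation, alpha 0.05, 80% power,
# assuming a 10% control event rate and a 28% relative risk reduction.
p_c, p_t = 0.10, 0.072
var_plan = p_c * (1 - p_c) + p_t * (1 - p_t)
n = math.ceil((1.96 + 0.84) ** 2 * var_plan / (p_c - p_t) ** 2)

# Achieved power if the true rates are half the planning assumption
# (same relative effect, fewer events).
q_c, q_t = 0.05, 0.036
se = math.sqrt((q_c * (1 - q_c) + q_t * (1 - q_t)) / n)
power = z.cdf((q_c - q_t) / se - 1.96)
print(f"n per arm = {n}, achieved power = {power:.0%}")
```

Under these assumptions the achieved power falls from the planned 80% to roughly 49%: closer to a coin flip than an answer.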
It was the same with effect-size estimates. The median effect size observed in the actual trials was only a 9% relative risk reduction (RR 0.91), but the average estimated effect size was a 28% reduction (RR 0.72).
Estimates in device trials were less accurate than those in drug trials.
Comments
This is bad. An experiment on humans should have enough power to provide an answer.
Having been involved with an underpowered trial, I understand the issues. The problem is that trials don’t do themselves. Trials cost money; they take huge effort. Extra patients mean extra cost and effort.
But it’s not just about money and time. Since a trial is an experiment on humans, you want to enroll the minimum number of patients needed. Enrolling too many patients is also a problem.
Yet the current paper describes a widespread problem of not having enough power to get answers from experiments. It’s an important paper because it exposes a common and serious problem.
As a user of evidence (a doctor), I don’t have the answers. My knee-jerk solution would be to advise more pessimism before the trial. Cardiologists are not so good at pessimism.
Another solution would be to figure out ways to glean more data from the experiment. I suspect Professor Harrell has some ideas.
The take-home message for readers of clinical trials is to understand that uncertainty around trial results stems largely from overly optimistic pre-trial estimates of how many events there will be or how effective the treatment will be.
Users of evidence might be part of the solution by publicly calling out the problem of inaccurate pre-trial estimates.
Instead of being sad about a trial not finding a positive result, we could be sad about wide confidence intervals. JMM
Come on, John; a 0.6% difference; only a pharmaceutical company could claim that as something.
Surely we are in statin territory here: specious claims of efficacy.
Dion says it well; the absolute risk reduction is so small (over 200 need to spend the cash so that even one might benefit ...).
Dion comments, "since there are always also adverse effects to consider," and do we all believe a pharma company is going to really tell us the incidence of problems; ... please ...
I commend this paper to all; please read it and get your head around its implications: https://bmjopen.bmj.com/content/12/12/e060172
It showed that, in one study, 40% of heart diagnoses made by cardiologists looking after ill patients were OVERTURNED months later by a committee run by the researchers and paid by the company, a committee that obviously never saw the patients. Come on, John; so many of us are so over all of this stuff.
Raw data were vigorously and systematically re-worked. John, why don't you review that paper sometime? If we saw daylight in one paper, how much more is going on? We admire your honesty and thoroughness; you do a great job.
You do a good job of explaining subtleties in research. I really appreciate your making this kind of analysis available without the new Substack teaser feature. I am not a medical researcher but have always wondered about the small differences so often reported in many research studies. Now I have a better understanding of those numbers. Thank you.