Small Trials vs Large Trials
I have a super fun post today about another case when cardiologists were fooled.
I got the idea of this story from our project over at Cardiology Trials, where we are cataloging the seminal trials in cardiology. Gosh, I am learning a ton about medical evidence. Please do head over there and join the learning.
When I rounded in the coronary care units back at Indiana University in the early 1990s, there was momentum to use IV-magnesium (Mg) in patients after myocardial infarction.
The reason for Mg enthusiasm was a series of studies that showed Mg actually worked. Patients who received the drug died less often. A lot less often.
When these small studies were taken together (we call this a meta-analysis), the mortality reduction was a whopping 50%. Here is what it looks like in a picture.
Doctors speculated that Mg might produce this amazing effect by coronary dilation, reduction of arrhythmias, favorable platelet effects, and a host of other means.
I hope regular readers notice two issues at this point, in addition to the mystery of how Mg would actually deliver such a massive benefit. One is the small number of events. Even with 9 trials combined, there were only 42 and 86 events. That’s not a lot of events. The second issue is the huge effect size. We should always be skeptical of interventions that reduce death by 50%.
The next chapter of the Mg story involves a team in Leicester, England. They ran the LIMIT-2 trial, in which they randomized more than 2300 patients with MI to IV Mg or placebo infusion. That was substantially more patients than the 9 earlier trials combined.
The results: Mortality from all causes was 7·8% in the magnesium group and 10·3% in the placebo group, a relative reduction of 24% (95% confidence interval 1-43%). (P = 0.04)
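For readers who want to check the arithmetic, here is a minimal sketch in Python. It uses only the two mortality percentages quoted above; the function and variable names are my own, chosen for illustration.

```python
# Mortality figures as quoted for LIMIT-2 in the text above.
risk_mg = 0.078       # all-cause mortality, magnesium group
risk_placebo = 0.103  # all-cause mortality, placebo group


def relative_risk_reduction(risk_treated, risk_control):
    """Fraction of the control-group risk that treatment removes."""
    return 1 - risk_treated / risk_control


def absolute_risk_reduction(risk_treated, risk_control):
    """Difference in risk between the control and treated groups."""
    return risk_control - risk_treated


rrr = relative_risk_reduction(risk_mg, risk_placebo)
arr = absolute_risk_reduction(risk_mg, risk_placebo)

print(f"Relative reduction: {rrr:.0%}")
print(f"Absolute reduction: {arr * 100:.1f} percentage points")
```

The relative figure is the one that makes headlines; the absolute figure is the one that tells you how many patients out of 100 actually fared differently.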
Can you see what’s happening here? The larger trial, with more events, is still positive. But the effect size is not 50% but 24%. The authors write in the manuscript that adding LIMIT-2 to the smaller trials moves the mortality signal from 50% to something smaller than that—around 35-40%.
The next and final chapter changed everything. Buckle up.
The ISIS-4 trial randomized more than 58,000 patients after MI. And, no, that is not a typo. I will come back to the size in the conclusion.
As we write on CardiologyTrials, ISIS-4 was a 2x2x2 factorial trial that studied captopril, nitrates and IV-Mg. (It was one of the first trials establishing early ACE-I as beneficial in MI). But let’s stay focused on the IV-Mg story.
The Mg results: 2216 patients (7.64%) died in the Mg arm vs 2103 (7.24%) in the control arm. The increase in death (about 4 per 1000) almost reached statistical significance (P = 0.07). The confidence interval ran from a 0% reduction in death to a 12% increase. So it essentially excluded any benefit from IV-Mg.
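The same kind of check works for ISIS-4. The sketch below is an approximation under stated assumptions: the arm sizes are back-calculated from the death counts and percentages quoted above (they are not taken from the paper), and the interval is a simple Wald interval on the log risk ratio, which may not be the method the ISIS-4 investigators used.

```python
import math

# Death counts and percentages as quoted in the text above.
deaths_mg, pct_mg = 2216, 0.0764
deaths_control, pct_control = 2103, 0.0724

# ASSUMPTION: arm sizes estimated from deaths / percentage, not the reported denominators.
n_mg = round(deaths_mg / pct_mg)
n_control = round(deaths_control / pct_control)


def risk_ratio_ci(events_a, n_a, events_b, n_b, z=1.96):
    """Risk ratio of group A vs group B with an approximate 95% Wald interval."""
    rr = (events_a / n_a) / (events_b / n_b)
    se_log = math.sqrt(1 / events_a - 1 / n_a + 1 / events_b - 1 / n_b)
    low = math.exp(math.log(rr) - z * se_log)
    high = math.exp(math.log(rr) + z * se_log)
    return rr, low, high


rr, low, high = risk_ratio_ci(deaths_mg, n_mg, deaths_control, n_control)
excess_per_1000 = (deaths_mg / n_mg - deaths_control / n_control) * 1000

print(f"Excess deaths per 1000 treated: {excess_per_1000:.1f}")
print(f"Risk ratio {rr:.2f}, approximate 95% CI {low:.2f} to {high:.2f}")
```

With more than 2000 deaths in each arm, the interval is narrow. That narrowness is what a large trial buys you.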
Here is the new picture: (from the ISIS-4 paper).
Comments:
What a story! The early Mg studies randomized patients in the late 1980s. The meta-analyses came out in 1991-1992. LIMIT-2 was published in 1992.
For about half a decade, doctors felt that IV-Mg reduced death after MI. But then. Boom.
ISIS-4, with its nearly 60,000 patients and more than 2000 events in each treatment group, definitively shows that IV-Mg had no effect on mortality. Even a trend toward increased death.
There are multiple teaching points.
One is that we should always be skeptical of trials with large treatment effects. Such is not normal in biomedicine. In fact, a Stanford group has systematically shown this. They reviewed 85K forest plots from more than 3000 systematic reviews from the Cochrane database of studies, and found that “most large treatment effects stem from small studies, and when additional trials are done, the effect sizes typically become much smaller.”
You might wonder why the small studies found such large treatment effects for Mg infusions. I believe it is a) noise (like flipping 4 heads in a row once in a while); and b) publication bias. Publication bias occurs when positive studies are more likely to be written up and published.
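Noise plus publication bias can be illustrated with a toy simulation. Everything in this sketch is hypothetical: the trial size (100 per arm), the 10% death rate, the number of trials, and the crude rule that only trials showing at least a 30% drop in deaths get "published". The simulated drug does nothing at all.

```python
import random


def simulate_trial(rng, n_per_arm, death_rate):
    """One trial of a drug with NO true effect: both arms share the same death rate."""
    treated_deaths = sum(rng.random() < death_rate for _ in range(n_per_arm))
    control_deaths = sum(rng.random() < death_rate for _ in range(n_per_arm))
    return treated_deaths, control_deaths


def pooled_risk_ratio(trials):
    """Crude pooled risk ratio; valid here because every arm has the same size."""
    treated = sum(t for t, _ in trials)
    control = sum(c for _, c in trials)
    return treated / control


rng = random.Random(4)  # fixed seed so the run is repeatable
all_trials = [simulate_trial(rng, 100, 0.10) for _ in range(200)]

# Hypothetical publication filter: only impressive-looking trials see print.
published = [(t, c) for t, c in all_trials if t < 0.7 * c]

rr_all = pooled_risk_ratio(all_trials)
rr_published = pooled_risk_ratio(published)

print(f"Trials run: {len(all_trials)}, trials 'published': {len(published)}")
print(f"Pooled risk ratio, all trials: {rr_all:.2f}")
print(f"Pooled risk ratio, published only: {rr_published:.2f}")
```

Pooling every trial lands close to a risk ratio of 1, as it should for a useless drug. Pooling only the "published" ones shows a large apparent benefit, which is the pattern a meta-analysis of small positive trials can inherit.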
The second lesson here is the value of large trials. ISIS-4, like many of the studies of the era, randomized huge numbers of patients. This allows for sorting out signals, such as death. We don’t seem to do this anymore.
Now, we randomize fewer patients but record composite outcomes, which may include things like death and MI and stroke and coronary revascularization. When you choose an endpoint with multiple components, you record more events and don’t need as many patients.
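To see why a composite endpoint shrinks a trial, here is a rough sketch using the standard normal-approximation sample size formula for comparing two proportions. The event rates (8% for death alone, 20% for a composite) and the 20% relative reduction are hypothetical numbers picked for illustration, not figures from any trial discussed here.

```python
import math
from statistics import NormalDist


def patients_per_arm(control_rate, relative_reduction, alpha=0.05, power=0.80):
    """Approximate patients needed per arm to detect the given relative reduction."""
    treated_rate = control_rate * (1 - relative_reduction)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = NormalDist().inv_cdf(power)
    variance = control_rate * (1 - control_rate) + treated_rate * (1 - treated_rate)
    difference = control_rate - treated_rate
    return math.ceil((z_alpha + z_beta) ** 2 * variance / difference ** 2)


# HYPOTHETICAL rates: death alone is rarer than the composite of several outcomes.
n_death = patients_per_arm(control_rate=0.08, relative_reduction=0.20)
n_composite = patients_per_arm(control_rate=0.20, relative_reduction=0.20)

print(f"Per arm, death alone: {n_death}")
print(f"Per arm, composite endpoint: {n_composite}")
```

Under these assumptions the composite trial needs well under half as many patients, because more events accrue for the same relative effect. The saving is real; the cost is an answer that is harder to interpret.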
The problem with modern cardiac trials is that when a therapy reduces such a composite endpoint, it’s harder to speak with patients about the treatment. Contrast that to IV-Mg. Due to trials like ISIS-4, we now can say that IV-Mg does not reduce death. Period.
Doctors who do trials will dismiss as naive my call for larger trials powered for important endpoints, such as being dead or alive, rather than whether or not a doctor chooses to do another stent (coronary revascularization). Naive because larger trials are harder to do and more expensive. And I understand.
But if we want to know things, really know things, the ISIS-4 trial teaches us a lot—about being humble, skeptical and wise about trials with small numbers of events.
Sensible Medicine remains an advertiser-free endeavor. Thank you for your support. Most of our posts remain free to all. Sometimes we limit comments to paid subscribers. We are open to posting other opinions. Submit your work via email. JMM.
There is another lesson to be learned. David Spiegelhalter showed that the standard methods for doing meta-analysis are flawed. These methods (DerSimonian and Laird) pretend that the variance of random effects is estimated without error. Spiegelhalter re-ran the meta-analysis using a Bayesian random effects model and showed that, even then, evidence for Mg efficacy was lacking once the uncertainty interval was properly widened. Statistical methods matter.
All I can say is “Bravo and Wow”. Your post today is one of the most important and instructive things I have seen lately (retired surgeon here, “amateur” epidemiologist, and PhD chemist as well). I was very glad to see your terse, tidy, and lucid comments, which included serious questioning of the rather popular practice these days of using Composite Outcomes in clinical trials. Cardiologists are certainly not the only people enamored by this kind of “shortcut”.