We often forget this, but clinical trials are experiments on humans. To minimize risks to participants, trials have independent oversight boards that make recommendations on stopping trials early.
A typical example occurs when one group in the trial shows clearly better results. The board of experts judges whether it would be ethical to continue the trial, because if the effect is so much better in one arm, the other arm is getting inferior care.
The tension, of course, is that trials stopped early have fewer events, and fewer events increase the risk of seeing noise instead of signal. A fair coin is far more likely to show a lopsided run of tails when tossed only 10 times rather than 100 times.
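If you want to see that intuition in numbers, here is a quick simulation sketch (my own toy example, not trial data) asking how often a fair coin shows a lopsided split, say at least 70% tails, in samples of different sizes.

```python
# Toy sketch (not trial data): how often does a fair coin show a lopsided
# result -- at least 70% tails -- in small versus large samples?
import random

random.seed(0)

def share_of_lopsided_runs(n_tosses, n_runs=5_000, threshold=0.7):
    """Fraction of simulated runs in which tails make up >= threshold of tosses."""
    lopsided = 0
    for _ in range(n_runs):
        tails = sum(random.random() < 0.5 for _ in range(n_tosses))
        if tails / n_tosses >= threshold:
            lopsided += 1
    return lopsided / n_runs

for n in (10, 100, 1000):
    print(f"{n:>4} tosses: {share_of_lopsided_runs(n):.1%} of runs show >= 70% tails")
```

With only 10 tosses, roughly one run in six looks lopsided; with 100 or 1,000 tosses, it essentially never happens. Small samples are where chance does its best impression of a real effect.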
Today I want to highlight a classic paper on the risks of stopping trials early for benefit. I will then add an example involving a common medicine.
The stimulus for this post came from a review article by Sanjay Kaul and Javed Butler in which they discussed insights from the early stopping of four pivotal trials in chronic kidney disease. (I had not remembered that the three SGLT2i trials and the one GLP1a trial had been stopped early for benefit.)
The Classic Paper by Bassler et al in JAMA
This group found 91 trials truncated for benefit that addressed 63 different questions. They matched these trials against 424 similar but nontruncated trials, which they identified from systematic reviews.
They then compared the effect sizes of the two types of trials and found that, overall, effect sizes were 29% more favorable in truncated trials than in nontruncated trials.
Basically, they divided the relative risk in the truncated trials by the relative risk in the matched nontruncated trials; a ratio below 1 means the truncated trials showed the larger effect. This difference easily met statistical significance.
They also found that the largest difference between truncated and nontruncated trials occurred when the truncated trials had fewer than 500 outcome events. (This makes sense because the fewer the events, the greater the chance of finding noise.)
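To make that point concrete, here is a minimal simulation sketch. The numbers and the stopping rule are my own toy assumptions, not Bassler's method or data: many identical trials of a treatment with a true relative risk of 0.85 are simulated, and any trial whose interim look crosses a naive z > 1.96 boundary is "stopped early for benefit," while the rest run to completion.

```python
# Toy sketch (my own assumptions, not Bassler's analysis): simulate many two-arm
# trials with a true relative risk of 0.85, peek at an interim look, and "stop
# early for benefit" whenever a naive z-statistic crosses 1.96.
import math
import random

random.seed(1)

TRUE_RR = 0.85          # true effect: treatment cuts event risk by 15%
P_CONTROL = 0.10        # control-arm event rate
N_PER_ARM = 2000        # planned sample size per arm
INTERIM_FRACTION = 0.25 # interim look after 25% of planned enrollment

def count_events(n, p):
    """Number of events among n participants with per-person event risk p."""
    return sum(random.random() < p for _ in range(n))

def z_stat(e_t, n_t, e_c, n_c):
    """Naive z-statistic for the difference in event proportions."""
    p_t, p_c = e_t / n_t, e_c / n_c
    pooled = (e_t + e_c) / (n_t + n_c)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_t + 1 / n_c))
    return (p_c - p_t) / se if se > 0 else 0.0

stopped, completed = [], []
for _ in range(2000):
    n1 = int(N_PER_ARM * INTERIM_FRACTION)
    e_t1 = count_events(n1, P_CONTROL * TRUE_RR)
    e_c1 = count_events(n1, P_CONTROL)
    if e_c1 > 0 and z_stat(e_t1, n1, e_c1, n1) > 1.96:
        # "stopped early for benefit" -- effect estimated from interim data only
        stopped.append((e_t1 / n1) / (e_c1 / n1))
    else:
        # trial runs to completion -- effect estimated from the full data
        e_t = e_t1 + count_events(N_PER_ARM - n1, P_CONTROL * TRUE_RR)
        e_c = e_c1 + count_events(N_PER_ARM - n1, P_CONTROL)
        completed.append((e_t / N_PER_ARM) / (e_c / N_PER_ARM))

print(f"true relative risk: {TRUE_RR}")
print(f"mean RR, trials stopped early for benefit: {sum(stopped)/len(stopped):.2f} (n={len(stopped)})")
print(f"mean RR, trials run to completion:         {sum(completed)/len(completed):.2f} (n={len(completed)})")
```

The trials that stop early report, on average, a noticeably more favorable relative risk than the true 0.85, simply because they were selected on a lucky-looking interim result; the completed trials land close to the truth. That, in miniature, is the Bassler finding.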
The final finding was that in more than half of the comparisons (62%), the effect size in the nontruncated trials did not reach statistical significance. That is a big deal because every one of the 91 included trials was stopped early for benefit.
The authors concluded:
The Story of ASA for Primary Prevention of Cardiac Events
One of the trials included in Bassler’s analysis was the Physicians’ Health Study, a large randomized trial comparing aspirin with placebo for the prevention of cardiac events in people without heart disease (i.e., primary prevention).
The trial included about 22,000 participants. An early look at the data revealed a reduction in total myocardial infarction: the hazard ratio was 0.53, with a 95% confidence interval of 0.42 to 0.67 and a p-value of less than 0.00001. This prompted the study to be stopped early.
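As an aside, those numbers hang together. A back-of-the-envelope check (my own arithmetic using a normal approximation on the log hazard ratio scale, not anything from the trial report) recovers a p-value of that size from the confidence interval alone:

```python
# Back-of-the-envelope check: recover an approximate z-score and p-value from
# the reported hazard ratio and its 95% confidence interval (normal
# approximation on the log-HR scale).
import math

hr, lo, hi = 0.53, 0.42, 0.67                      # reported HR and 95% CI
se = (math.log(hi) - math.log(lo)) / (2 * 1.96)    # standard error on log scale
z = math.log(hr) / se
p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))  # two-sided p-value
print(f"z approx {z:.1f}, two-sided p approx {p:.0e}")      # about z = -5.3, p = 1e-07
```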
This was one of the trials that established aspirin as a preventive medicine, though, in the final report, the reduction in MI did not translate into a reduction in total cardiovascular death.
In 2018, three modern trials comparing aspirin vs placebo for primary prevention were published. All were nonsignificant. Let me show you the one that most closely resembles the Physicians’ Health Study. It was called ARRIVE.
In this study, more than 12,500 patients without heart disease were randomized to aspirin 100 mg per day or placebo. The roughly six-year trial was not stopped early.
The rate of the composite primary outcome of CV death, MI, unstable angina, stroke, or transient ischemic attack was 4.29% with aspirin vs 4.48% with placebo. The tiny difference did not come close to reaching statistical significance.
Since the Physicians’ Health Study was stopped because of lower MI rates, I will show you the MI results in ARRIVE:
The hazard ratio was 0.85, or 15% better with aspirin, but it did not reach statistical significance.
So… if we did a Bassler-et-al type analysis for MI, we would find an HR of 0.53 in the truncated Physicians’ Health Study vs an HR of 0.85 in the nontruncated ARRIVE study.
The effect size for aspirin in the truncated trial was 38% greater than in the nontruncated trial (HR 0.53 / HR 0.85 ≈ 0.62).
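In code form, the same Bassler-style comparison looks like this (just restating the numbers above, not a formal reanalysis):

```python
# Bassler-style ratio of hazard ratios, using the two MI results quoted above.
hr_truncated = 0.53      # Physicians' Health Study (stopped early)
hr_nontruncated = 0.85   # ARRIVE (run to completion)

ratio = hr_truncated / hr_nontruncated
print(f"ratio of hazard ratios: {ratio:.2f}")                            # about 0.62
print(f"apparent effect {1 - ratio:.0%} larger in the truncated trial")  # about 38%
```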
Caveats with the ASA example
I do not mean to imply that all of the difference between the early positive aspirin trials and the newer neutral aspirin trials was due to early termination.
Another factor that could explain the difference is that roughly 20 years separated the trials. The ARRIVE authors attributed the lack of aspirin benefit to very low event rates; indeed, the event rates for MI were lower in the modern trial.
There have been many other changes (removal of trans fats from the food supply, less smoking, more effective blood pressure drugs, etc.) that could have reduced the effect of aspirin in the modern era.
But early termination for benefit in the Physicians’ Health Study could have contributed to its large effect size for MI.
Teaching Points and Things to Think About
The main teaching point is that when you read that a trial is stopped early for efficacy, think about Bassler et al.
It doesn’t mean the trialists have done anything wrong. Ethics hold that you don’t want to expose patients to inferior treatment. But a trial stopped early has a decent chance of reporting an effect size larger than that of a similar trial that was not stopped early.
The question at hand is how much information we wish to obtain from trials. When we stop early, there are fewer events. In addition to accentuating effect sizes, fewer events also mean less chance of finding signals in subgroups of patients.
I don’t pretend to have answers on the perfect way to conduct trials. My point is to remind users of medical evidence to be mindful of the implications of early termination of trials.
Kaul and Butler’s paper that I linked to earlier is also an excellent resource.
Update on Sensible Medicine: Vinay, Adam and I remain shocked at the success of this newsletter. Well more than a million views per month far exceeds our expectations. Thank you.
We remain an advertisement-free site. We think subscriber-based models allow for independent thinking. As Emerson said in his 1837 American Scholar lecture, academics should think clearly, not influenced by tradition or historic views. We can do that because of your support. Thank you. JMM
Kind of like how the original trial for AZT (a failed chemo drug used to treat AIDS) was stopped early, or how more recently Remdesivir (failed Ebola drug) was stopped early for COVID.
From a 1989 article about AZT
"The most serious problem with the original study, however, is that it was never completed. Seventeen weeks into the study, when more patients had died in the placebo group, the study was stopped, five months prematurely, for “ethical” reasons: It was considered unethical to keep giving people a placebo when the drug might keep them alive longer. Because the study was stopped short, and all subjects were put on AZT, no scientific study can ever be conducted to prove unequivocally whether AZT does prolong life."
"In the study that won FDA approval for AZT, the one fact that swayed the panel of judges was that the AZT group outlived the placebo group by what appeared to be a landslide. The ace card of the study, the one that canceled out the issue of the drug’s enormous toxicity, was that 19 persons had died in the placebo group and only one in the AZT group. The AZT recipients were also showing a lower incidence of opportunistic infections.
While this data staggered the panel that approved the drug, other scientists insisted that it meant nothing — because it was so shabbily gathered, and because of the unblinding. Shortly after the study was stopped, the death rate accelerated in the AZT group. “There was no great difference after a while,” says Dr. Brook, “between the treated and the untreated group.”"
Sure seems like the treatment group was being handled differently to keep them alive for the study. I remember reading somewhere that they were receiving weekly blood transfusions to keep them healthier to counteract AZT effects...
https://www.spin.com/2015/10/aids-and-the-azt-scandal-spin-1989-feature-sins-of-omission/
OK. Let's create a process ("science") to determine if a "treatment" is safe and effective. Then, let's break that process because a treatment MIGHT be beneficial for a few people, thus preventing us from any confidence about that treatment for the larger population. What science giveth, it can taketh away.
Waiting for the final results of a RCT is hard. The scientists conducting it start acting like kids on a long car trip: "Mommy, are we there yet?"!