When Studies Don’t Answer Their Question
The Study of the Week delves into the frustrating matter of inconclusive trials
As always, we at Sensible Medicine appreciate the support from our readers. We are surprised and grateful. JMM
Let’s start by leaving out the disease and treatment.
This randomized controlled trial was simple and elegant. One group received an active drug (an inexpensive generic); the other group received placebo.
The primary endpoint required no judging—alive or dead at 60 days.
The Results
At 60 days, 17.3% of those in the active arm died vs 21.3% in the placebo arm. That is an absolute risk reduction of 4 percentage points. The relative risk reduction in death was about 19%.
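If you want to check that arithmetic yourself, here is a minimal sketch in Python. The two mortality percentages are the only numbers taken from the trial; everything else is simple subtraction and division.

```python
# Checking the effect sizes quoted above. The two mortality percentages are
# the only inputs taken from the trial; the rest is arithmetic.
death_active = 0.173    # 60-day mortality, active arm
death_placebo = 0.213   # 60-day mortality, placebo arm

arr = death_placebo - death_active    # absolute risk reduction
rrr = arr / death_placebo             # relative risk reduction

print(f"Absolute risk reduction: {arr:.1%}")   # prints 4.0% (4 percentage points)
print(f"Relative risk reduction: {rrr:.1%}")   # prints 18.8%, i.e., about 19%
```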
There were no important safety signals.
This sounds pretty amazing: a generic drug reduced the risk of dying by roughly 19% in relative terms and a whopping 4 percentage points in absolute terms. These are impressive effect sizes.
Yet something is missing. I haven’t told you all that you need to know.
No, it’s not what the disease or treatment was. I will get to that.
When you look at a study's results, you need some measure of confidence, a way to separate signal from noise.
Were the results a chance finding? So far, all that I have told you is the effect size. Now you need to know whether the findings are statistically robust.
Confidence Intervals
That absolute risk reduction in death of 4 percentage points came with a 95% confidence interval that ranged from 14 points lower to 4.7 points higher.
We can also express the relative effect with a hazard ratio, which accounts for the timing of deaths. The HR in this trial was 0.77 (roughly a 23% reduction in the hazard of death), and the 95% confidence interval went from 0.45 (a 55% reduction) to 1.31 (a 31% increase).
The p-value, which quantifies the surprise value of a result under the assumption that there was no difference between the two treatment arms, was high at 0.33.
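To get a feel for why that interval is so wide, here is a rough back-of-the-envelope calculation. The arm sizes below are assumed for illustration, and this crude normal approximation will not exactly reproduce the published interval or p-value, which came from the trial's own time-to-event analysis. It does show the shape of the problem.

```python
# Why the interval is so wide. The arm sizes here (140 per group) are an
# assumption for illustration; the actual arm sizes are not given above.
# This simple normal-approximation calculation will not exactly match the
# published interval or p-value, which came from the trial's own analysis.
from math import sqrt

n_active, n_placebo = 140, 140        # assumed arm sizes (hypothetical)
p_active, p_placebo = 0.173, 0.213    # 60-day mortality, from the results above

diff = p_active - p_placebo           # risk difference (negative = fewer deaths)
se = sqrt(p_active * (1 - p_active) / n_active +
          p_placebo * (1 - p_placebo) / n_placebo)
lo, hi = diff - 1.96 * se, diff + 1.96 * se

print(f"Risk difference: {diff:.1%}, 95% CI {lo:.1%} to {hi:.1%}")
# Roughly -13% to +5%: compatible with a large benefit in death and with
# a modest harm. That is the whole problem.
```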
JAMA published the AntibioCor trial. The disease in question was severe alcoholic hepatitis. The treatment in the active arm was the super-common antibiotic amoxicillin-clavulanate.
The idea is that patients who present with severe liver injury and inflammation are usually treated with oral corticosteroids. Both liver injury and steroid therapy increase the susceptibility to bacterial infection—which is very bad.
That is why the authors studied the use of long-term antibiotics in a preventive role. Of course, if there were signs of infection, antibiotics would be used. The question in this trial was about prevention.
It is an important question because a) alcoholic hepatitis is common, b) it is severe (note the high mortality rates) and c) amoxicillin-clavulanate is inexpensive.
The reason I highlight the AntibioCor Trial is that it is a shining example of an inconclusive trial.
The authors and editorialists conclude that the antibiotic did not improve survival, and they do not support preventive antibiotics in this disease.
The current clinical trial by Louvet et al suggests no role for prophylactic prescription of antibiotics when treating all patients who have alcohol-related hepatitis with corticosteroids.
While this may be a technically correct conclusion, I don’t think it is accurate.
The more accurate conclusion is that the wide confidence intervals do not preclude a substantial benefit or harm.
For instance, the lower bound (best-case scenario) of the primary outcome allows for a massive 14-point reduction in death. DEATH. Yet the upper bound (worst-case scenario) allows for a 4.7-point higher rate of death.
This data cannot give an answer.
Why did this happen?
The simple answer is that there were too few patients enrolled and too few events. It’s like determining whether a coin is fair with only 10 flips.
Had the investigators enrolled more patients, there would have been more events, and the confidence intervals would have been tighter.
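To make the coin-flip analogy concrete, here is a small sketch, using a simple normal-approximation confidence interval rather than anything from the trial itself. With only 10 flips, the interval around the observed share of heads is so wide that a fair coin and a loaded one look alike; with more flips, the interval tightens, just as more patients and more deaths would have tightened this trial's intervals.

```python
# The coin-flip analogy, made concrete. The interval narrows roughly with
# the square root of the number of flips.
from math import sqrt

def heads_ci(heads: int, flips: int) -> tuple[float, float]:
    """95% normal-approximation CI for the true probability of heads."""
    p = heads / flips
    se = sqrt(p * (1 - p) / flips)
    return p - 1.96 * se, p + 1.96 * se

for heads, flips in [(6, 10), (60, 100), (600, 1000)]:
    lo, hi = heads_ci(heads, flips)
    print(f"{heads}/{flips} heads: 95% CI {lo:.2f} to {hi:.2f}")
# 6/10 heads:    CI ~0.30 to 0.90  (fair? loaded? no idea)
# 600/1000 heads: CI ~0.57 to 0.63 (clearly not a fair coin)
```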
You might now ask: why were there too few patients? How do trialists decide on the number of patients?
Well, this gets a bit complicated. And it is not an exact science.
There are two main considerations, and each contains its own tension. One is ethical, the other pragmatic. Ethically speaking, RCTs are experiments on humans, so you want to enroll enough patients to answer the question while exposing as few people as possible to the experiment. Pragmatically, trials require effort and money, so, again, you want to enroll the Goldilocks number of patients: enough to get an answer, but no more than necessary.
Most studies include a few sentences on how the authors determined the sample size. It’s usually in the Statistics paragraph. In JAMA, Sample Size Calculation gets its own section. Understanding this calculation usually requires some content expertise.
In this case, the authors estimated that 27% of placebo-treated patients would die. They then powered their study to detect a 14-percentage-point absolute reduction in death (from 27% to roughly 13%) with the addition of amoxicillin-clavulanate. Given these estimates, they arrived at a sample size of 280 patients.
The problem was that only 21% of patients in the placebo arm died. The placebo event rate was lower than assumed, which meant fewer deaths, and less statistical power, than the trial was designed around.
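Here is a rough reconstruction of that calculation using the standard two-proportion sample-size formula. The 80% power and two-sided alpha of 0.05 are my assumptions; the post supplies only the 27% placebo estimate, the 14-point target, and the 280-patient answer. The second calculation shows how badly the numbers move when the placebo event rate comes in lower than expected and the true effect is smaller.

```python
# A rough reconstruction of the sample-size reasoning, using the standard
# two-proportion formula. The 80% power and two-sided alpha of 0.05 are
# assumptions; only the 27% estimate, the 14-point target, and the
# 280-patient answer appear in the post.
from math import sqrt, ceil

def n_per_arm(p_control: float, p_treat: float,
              z_alpha: float = 1.96, z_power: float = 0.84) -> int:
    """Patients per arm to detect p_control -> p_treat (normal approximation;
    defaults correspond to two-sided alpha = 0.05 and 80% power)."""
    p_bar = (p_control + p_treat) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar)) +
                 z_power * sqrt(p_control * (1 - p_control) +
                                p_treat * (1 - p_treat))) ** 2
    return ceil(numerator / (p_control - p_treat) ** 2)

# The planned scenario: 27% placebo mortality cut by 14 points, to ~13%.
print(2 * n_per_arm(0.27, 0.13))    # ~254 total, in the ballpark of 280

# What the trial actually saw: ~21% placebo mortality and a ~4-point difference.
# Detecting an effect that small reliably would take a far larger trial.
print(2 * n_per_arm(0.213, 0.173))  # ~3,000 total
```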
I, and likely you, don’t have the content expertise to criticize the choice of sample size. Though it does seem strange to think that an everyday oral antibiotic used in a preventive way would reduce death by that much.
The message of this Study of the Week is that the point estimates from this trial suggest a meaningful reduction in death. But because the sample size turned out to be too small, and the confidence intervals therefore wide, we don't really know whether the intervention works.
Results like this have changed my mind about trials.
I used to think we need to do more trials. Randomize, randomize and randomize some more.
But now I have come to learn that trialists need to be super-careful about design. Because perhaps the worst outcome is doing an experiment that cannot answer a question—like this one.
(Let us know what you think in the comments)
Dr. Mandrola writes in this post:
"The Results
At 60 days, 17.3% of those in the active arm died vs 21.3% in the placebo arm. That is an absolute risk reduction of 4%. The relative risk reduction in death equaled 33%."
I believe there is an error here. Based on the facts as stated, the relative risk reduction in death equaled 19%, not 33%.
I think it is more important to know the makeup of the two groups. Oftentimes, these participants are cherry-picked or negative results are weeded out over time. Also, I think that since this study might have shown good results, there should be 3 or 4 more done.
We have to face the fact that drug companies are more interested in selling drugs, especially high-priced newer ones, than they are in curing any diseases or helping people live without drugs. That means most of these studies are going to be massaged and manipulated to extract the best outcomes with the happiest smiley faces on them.
It still does not matter because we no longer have any agencies that are honest in their assessments of new drugs. The FDA ain't it no more....not by a long, long shot. They have given up any mission that includes protecting the public from harmful drugs...witness the recent mRNA substance fiasco.