Study of the Week – The Matter of Confidence
The POST-PCI trial was reported as “negative.” The better interpretation was that it was inconclusive. These are not the same.
Imagine an experiment to determine if a coin is fair.
If you flipped it 10 times and got 7 heads and 3 tails, you would not be sure whether it was fair or biased.
If you flipped it 100 times and got 70 heads and 30 tails, you’d be worried that it was biased towards heads.
If you flipped it 1000 times and got 700 heads and 300 tails, you would be highly confident that this coin was biased to turn up heads.
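For readers who like to tinker, here is a minimal Python sketch of that intuition. It uses scipy's exact binomial test; the only inputs are the three coin-flip scenarios above, so treat it as illustration, not analysis.

```python
from scipy.stats import binomtest

# Same 70/30 split at three sample sizes. The p-value asks: how surprising
# is this split if the coin is actually fair (50/50)?
for flips, heads in [(10, 7), (100, 70), (1000, 700)]:
    result = binomtest(heads, n=flips, p=0.5)
    ci = result.proportion_ci(confidence_level=0.95)
    print(f"{heads}/{flips} heads: p = {result.pvalue:.3g}, "
          f"95% CI for the true heads rate: {ci.low:.2f} to {ci.high:.2f}")
```

With 10 flips, the interval for the true heads rate easily includes 0.50 (a fair coin). With 1,000 flips, it doesn't come close.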
Medical experiments are similar. Scientists who compare one intervention to another have to have enough outcomes to sort out signal from noise.
Scientists make educated guesses before the experiment starts about the rate of outcomes and the expected differences in the two interventions. They try to avoid the situation in the first coin-flip experiment where you can’t tell if the 7 heads/3 tails is signal or noise.
One way to avoid an inconclusive result is to recruit oodles of participants. The two problems with that are a) trials are expensive and funds are limited, and b) trials are actually human experiments, so you want to expose the minimum number of people to experimentation.
The POST-PCI trial set out to determine if the strategy of doing surveillance stress tests after a person gets a cardiac stent is beneficial.
The backstory: In the old days, cardiologists did stress tests after a patient had a coronary blockage fixed. The idea was that the fixes (balloon angioplasty, stents, or even bypass surgery) were not 100% reliable, and it was good to know if there were still residual obstructions. It didn’t hurt that cardiologists were generously paid for these stress tests.
Over time, we learned that this lucrative strategy was mostly unnecessary, for two reasons: stents and bypass became very reliable, and numerous studies showed that medical therapy was highly effective at preventing future events even when arteries had severe blockages.
Now, however, technology has advanced so much that cardiologists can put stents in super-high-risk situations. Situations that, in previous years, would have required bypass surgery.
One example is placing a stent in the left main coronary artery. This is high-risk because if that stent were to clot off or become re-stenosed (re-blocked), a huge amount of heart muscle would be in jeopardy.
The POST-PCI trialists randomized about 1,700 patients who had undergone a high-risk coronary intervention (roughly 850 per arm) to either a stress test afterwards or standard care (stress tests only when indicated by symptoms).
They measured a really important endpoint—a composite of death, heart attack, or hospital admission for unstable angina.
The results: After 2 years, a bad outcome occurred in 5.5% of patients in the stress-testing arm and 6.0% of those in the standard-care arm. That is a relative difference of about 10%, which trialists usually express as a hazard ratio: here, the HR was 0.90.
The authors also reported a p-value, which measures how surprising the data are under the assumption that there was no difference between the treatment arms. We call a result statistically significant if the p-value is below 0.05 (very surprising). In POST-PCI, the p-value was 0.62: not at all surprising under the assumption of no difference between the two strategies.
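To make that concrete, here is a rough simulation. It is not the trial's actual time-to-event analysis; it assumes, purely for illustration, two arms of about 850 patients sharing one true event rate, and asks how often chance alone produces a gap at least as large as the observed half a percentage point (6.0% vs 5.5%).

```python
import numpy as np

rng = np.random.default_rng(0)

n_per_arm = 850      # approximate arm size, for illustration only
true_rate = 0.0575   # midpoint of 5.5% and 6.0%; assumes no real difference
sims = 100_000

# Simulate both arms under the "no difference" assumption...
arm_a = rng.binomial(n_per_arm, true_rate, sims) / n_per_arm
arm_b = rng.binomial(n_per_arm, true_rate, sims) / n_per_arm

# ...and count how often chance alone yields a gap of 0.5 points or more.
print(np.mean(np.abs(arm_a - arm_b) >= 0.005))
```

The answer should land in the neighborhood of 0.6: a gap like the one observed is utterly routine when the two strategies are truly identical, which is exactly what the trial's p-value of 0.62 is telling us.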
I realize that paragraph may read like gibberish. Try this:
What you want to know about this (or any) experiment is whether the result was like the coin-flip experiment with 7 heads/3 tails or the one with 700 heads/300 tails. IOW, you want to know how confident to be in that 10% lower rate of events in the stress-testing arm.
Boom—that is why we have confidence intervals!
In POST-PCI, the 95% confidence interval surrounding that hazard ratio of 0.90 was 0.61 to 1.35. What does that mean?
Sadly, this is not a good result. It means, roughly, that the intervention of doing surveillance stress tests after high-risk coronary intervention could have lowered the chance of a bad outcome by as much as 39% or increased it by as much as 35%.
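Those percentages come straight from the interval's endpoints. Here is the arithmetic as a throwaway sketch:

```python
# Translate the hazard ratio and its 95% CI into plain-language percent changes.
hr, lower, upper = 0.90, 0.61, 1.35
print(f"Point estimate: {(1 - hr) * 100:.0f}% fewer events")
print(f"Best case:      {(1 - lower) * 100:.0f}% fewer events")
print(f"Worst case:     {(upper - 1) * 100:.0f}% more events")
```

That prints 10% fewer, 39% fewer, and 35% more. An interval that wide is compatible with a big win, a big loss, and everything in between.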
The editors of the New England Journal of Medicine are old-school. If an experiment does not meet that p-value threshold of 0.05, they make the authors conclude that there was no difference. Hence, the conclusion of this experiment reads:
“Among high-risk patients who had undergone PCI, a follow-up strategy of routine functional testing, as compared with standard care alone, did not improve clinical outcomes at 2 years.”
I don’t think that is right.
The message is that when the confidence intervals include a substantial benefit and substantial harm, the result is inconclusive. POST-PCI did not include enough patients to answer the question. We simply don’t know.
It is nobody’s fault. The authors assumed high-risk patients would have more bad outcomes than they did. With more bad outcomes (more coin flips, in effect), we’d have had more confidence, and the 95% confidence interval would have been tighter.
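You can see the effect of more outcomes with a standard back-of-the-envelope approximation: the standard error of a log hazard ratio is roughly sqrt(1/eventsA + 1/eventsB). The event counts below are hypothetical, chosen only to show how the interval tightens as outcomes accumulate.

```python
import math

hr = 0.90  # same point estimate throughout; only the event counts change

for events_per_arm in (25, 50, 200, 800):
    se = math.sqrt(2 / events_per_arm)          # approx. SE of log(HR)
    lower = math.exp(math.log(hr) - 1.96 * se)  # 95% CI back on the HR scale
    upper = math.exp(math.log(hr) + 1.96 * se)
    print(f"{events_per_arm:>4} events per arm: 95% CI {lower:.2f} to {upper:.2f}")
```

At around 50 events per arm (roughly what the trial's size and event rates imply), the approximation lands close to the reported 0.61 to 1.35. At 800 events per arm, the very same HR of 0.90 would have been statistically significant.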
The message for readers of medical studies: always look at the confidence intervals.
These give you a clue to how confident to be in the trial’s results.
JMM
(BTW, a good contrast is last week’s study of the week, DANCAVAS. This trial found a 5% reduction in death with screening. The confidence interval ranged from 0.90 to 1.00 (or a 10% reduction to a 0% reduction). This means that there was a very good chance that the screening program in that trial had a positive effect. It’s one of the reasons why my take differed from Dr. Foy’s.)