Study of the Week - Professor Sanjay Kaul Reviews the POST-PCI Trial
Professor Kaul explains why the POST-PCI trial of stress testing vs no stress testing after high-risk coronary stenting failed to answer its question. It's a tour de force in critical appraisal.
Sensible Medicine is delighted to offer this review of the POST-PCI trial. Dr. Sanjay Kaul is a professor of cardiology at Cedars-Sinai Medical Center in Los Angeles. He has published extensively in the academic literature, served in regulatory roles at the FDA, and is a sought-after voice of reason within health journalism. I consider him a chief justice of medical evidence.
The POST-PCI trial generated quite a stir in the cardiology community. I wrote about it here on Sensible Medicine on September 12. There is much to learn from dissecting this trial. Kaul’s analysis is masterful—and short. John Mandrola
By Sanjay Kaul
POST-PCI Trial: Inconclusive by Design
What is the POST-PCI trial?
The POST-PCI trial is an investigator-initiated, multicenter, open-label, registry-based, pragmatic randomized controlled trial conducted in South Korea. It compared two post-percutaneous coronary intervention (PCI) management strategies, routine stress testing versus no routine stress testing, in high-risk PCI patients with complex anatomic or clinical characteristics that are associated with an increased risk of ischemic or thrombotic events during follow-up.
The study was powered (90% power) to detect a 30% relative reduction (the so-called ‘delta’) in the primary composite outcome of death, MI, or hospitalization for unstable angina in the routine stress testing arm compared with the standard care arm without routine testing, assuming an event rate of 15% at 2 years based on prior evidence.
(Editor’s note: All trials have to estimate how many patients they will need to sort out signal from noise. This depends on the expected event rate and the expected difference between the two arms. If you have a lot of events, or if you expect a huge effect, you don’t need as many patients. If event rates are low, or if the effect size is small, you need more patients. But more patients mean a more expensive trial. Keep reading; Kaul explains this tension.)
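To make the editor's point concrete, here is a minimal sketch using a standard two-proportion approximation. It is illustrative only, not the POST-PCI investigators' own calculation (which I have not reproduced here), but it shows how the assumed event rate and effect size drive the required enrollment.

```python
# Illustrative sketch only -- a crude two-proportion approximation, not the
# POST-PCI investigators' actual sample-size calculation.
from scipy.stats import norm

def n_per_arm(control_rate, relative_reduction, alpha=0.05, power=0.90):
    """Approximate patients per arm needed to detect a relative risk reduction."""
    p1 = control_rate                              # expected event rate, standard care
    p2 = control_rate * (1 - relative_reduction)   # expected event rate, testing arm
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return z**2 * variance / (p1 - p2) ** 2

# Design assumptions: 15% two-year event rate, 30% relative reduction
print(round(n_per_arm(0.15, 0.30)))   # ~1,150 per arm

# Same 30% reduction, but at the ~6% event rate actually observed
print(round(n_per_arm(0.06, 0.30)))   # ~3,100 per arm
```

Lowering the assumed event rate or the assumed effect size rapidly inflates the required enrollment, which is exactly the tension described above.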
Why was this study conducted?
There is limited high-quality evidence to support routine stress testing for asymptomatic patients after coronary revascularization. Accordingly, current guidelines provide only tepid support (a Class IIb recommendation, level of evidence C), while the Choosing Wisely campaign recommends against routine testing.
Yet approximately 40-60% of patients undergo cardiac stress testing within 2 years of PCI, highlighting the disconnect between policy and practice. This trial, therefore, addresses an unmet clinical need.
What did the study find?
At 2 years, a primary-outcome event had occurred in 46 of 849 patients (5.5%) in the functional-testing group and in 51 of 857 (6.0%) in the standard-care group (hazard ratio, 0.90; 95% confidence interval [CI], 0.61 to 1.35; P=0.62). There were no between-group differences with respect to the components of the primary outcome or key secondary outcome of invasive coronary angiography or repeat revascularization. Thus, among high-risk patients who had undergone PCI, a follow-up strategy of routine functional testing, as compared with standard care alone, did not improve clinical outcomes at 2 years.
Why did the study fail?
Did it really? Should this study be characterized as a truly negative trial (well-designed, well-conducted trial with adequate enrollment, but no benefits in any end point) or an underpowered trial (well-conducted trial, but underpowered to determine whether there was any benefit)?
The 95% CI is consistent with a treatment effect ranging from a 39% risk reduction to a 35% risk increase. Such trials are at best ‘inconclusive’ or ‘uninterpretable’, not necessarily ‘negative’, and are more accurately described as ‘underpowered’.
The expected treatment effect that the study was powered to detect (HR 0.68) is contained within the wide CI, indicating the study was indeed underpowered.
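The arithmetic behind those two statements is simple but worth making explicit; the snippet below merely re-expresses the reported CI bounds and checks that the design effect lies inside them.

```python
# Translating the reported hazard-ratio CI (0.61 to 1.35) into percent changes.
lower, upper = 0.61, 1.35
print(f"{1 - lower:.0%} relative risk reduction")   # lower bound: 39% reduction
print(f"{upper - 1:.0%} relative risk increase")    # upper bound: 35% increase

# The effect the trial was powered to detect (HR 0.68) sits inside the interval,
# so the data cannot rule it out -- the signature of an underpowered result.
planned_hr = 0.68
print(lower <= planned_hr <= upper)                 # True
```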
Why did this happen? Two design elements contributed. The first is the so-called ‘optimism bias’, i.e., an unwarranted belief in the efficacy of interventions; the second is overestimation of the expected event rate in the standard care arm.
The trial was designed to detect an unrealistically large effect size, which the investigators justified by pointing to the effect sizes two previous trials, FACTOR-64 and PROMISE, were designed to detect (40% and 20% relative reductions in the primary outcome, respectively). However, the effect size actually observed in both of these trials was null. It is disconcerting that these null results were reported in 2014 and 2015, respectively, well before enrollment in POST-PCI began in November 2017. Thus, the ambitious effect size contributed significantly to the inconclusiveness of the trial.
Investigators are frequently incentivized to tinker with the hypothesized effect to engineer a desired sample size, in a process referred to as the ‘sample size samba’. While some degree of ‘samba’ may be necessary given budgetary constraints, effect sizes should also be realistic and supported by prior evidence; biological plausibility alone is not sufficient. Indeed, as acknowledged by the trial investigators themselves ‘…an extremely large study sample (>90,000 patients) would be required to detect a clinically relevant difference in the primary outcome’. Had the investigators not ignored the results of FACTOR-64 and PROMISE, would this trial have even been done?
Another factor that contributed to low power was the observed event rate in the standard care arm (6%), which was 2.5-fold lower than the expected rate (15%) drawn from decade-old evidence. Secular trends in improved stent technology, increased adjunctive use of IVUS and FFR, more effective periprocedural and adjunctive pharmacologic treatment after stenting, and high levels of adherence to recommended medical therapy could all have contributed. The potential impact of crossover (7.5% of the testing group did not undergo testing; 9% of the standard care group underwent testing) cannot be ruled out. It is unlikely that missing data (ascertainment was 98% for the primary outcome and 100% for vital status) account for the inconclusive results.
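For a time-to-event comparison, it is the number of events, not the number of patients, that buys power. A minimal sketch using Schoenfeld's approximation (my assumption, with equal allocation and a two-sided test; not the investigators' published calculation) shows why the combination of few events and a modest plausible effect makes a definitive trial so large.

```python
# Illustrative sketch: Schoenfeld's approximation for the number of primary-outcome
# events needed to detect a given hazard ratio (equal allocation, two-sided test).
import math
from scipy.stats import norm

def events_needed(hr, alpha=0.05, power=0.90):
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return 4 * z**2 / math.log(hr) ** 2

print(round(events_needed(0.68)))   # ~283 events for the effect POST-PCI targeted
print(round(events_needed(0.85)))   # ~1,590 events for a more modest 15% reduction

# POST-PCI accrued 46 + 51 = 97 primary-outcome events, far short of either figure.
# At a ~6% two-year event rate, even the 15%-reduction scenario implies enrollment
# in the tens of thousands; smaller but arguably still relevant differences push the
# number toward the investigators' own ">90,000 patients" estimate.
```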
The Problem with Underpowered Trials: In addition to squandering precious resources, underpowered clinical trials are notorious for amplifying the risk of type 2, or ‘false-negative’, error, i.e., a potentially effective intervention is perceived as ineffective even though the evidence for that conclusion is weak or dubious. As the saying goes, ‘absence of evidence is not evidence of absence’.
Should clinical practice guidelines downgrade recommendations?
Does this trial provide reliable evidence regarding the prognostic role of active surveillance with routine functional testing post-PCI? No. Should guidelines be changed to a Class III recommendation (‘don’t do it’), as suggested by the editorialist? No. Even though there is precious little evidence to support routine surveillance, the trial, in my opinion, does not deliver the ‘knockout punch’ necessary for a definitive answer. The trial, unfortunately, is inconclusive by design!
Editor’s Note: For more of this wisdom, follow Sanjay Kaul @kaulcsmc
Pragmatic trials should be preceded by an EHR query, and not rely on priors from the literature alone. An argument could be made that a retrospective in silico simulated trial should precede a prospective (more expensive) trial. However, I am not sure funders are enthusiastic about "delaying" a prospective study in this way. And then the prior should be skeptical (as per Spiegelhalter). Of course, a more explicit Bayesian design is not a bad idea.
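To illustrate the commenter's Spiegelhalter-style suggestion, here is a minimal sketch of a skeptical-prior reanalysis of the reported result on the log hazard-ratio scale. The prior scale and the normal approximation are assumptions for illustration, not anything from the POST-PCI analysis.

```python
# Sketch of a skeptical-prior (Spiegelhalter-style) reanalysis; the prior choice
# and normal approximation are illustrative assumptions, not the trial's analysis.
import math
from scipy.stats import norm

# Observed result on the log hazard-ratio scale (HR 0.90, 95% CI 0.61 to 1.35)
est = math.log(0.90)
se = (math.log(1.35) - math.log(0.61)) / (2 * 1.96)

# Skeptical prior: centered on no effect, with only a 5% prior probability that
# the true effect is at least as large as the HR 0.68 the trial was powered for.
prior_mean = 0.0
prior_sd = abs(math.log(0.68)) / norm.ppf(0.95)

# Conjugate normal update (precision-weighted average)
post_var = 1 / (1 / se**2 + 1 / prior_sd**2)
post_mean = post_var * (est / se**2 + prior_mean / prior_sd**2)

print(f"Posterior median HR: {math.exp(post_mean):.2f}")
print(f"P(any benefit, HR < 1): {norm.cdf(-post_mean / math.sqrt(post_var)):.2f}")
```

Under these assumptions the posterior probability of any benefit is only modest, consistent with reading the trial as inconclusive rather than definitively negative.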
Thanks for this, well done. In a previous article of this thread the issue was "bias." I think it is important to consider how bias, in either direction, can play into sample size calculations based on statistical power. One's bias is sure to play a part in the effect-size assumptions that must be made in calculating (or being able to support) an anticipated effect size. The sample size samba is a dance most researchers know well. As a researcher, author, and reviewer, I am puzzled by how little attention is paid to these issues. Thanks again, Dr. Kaul.