How Authors' Choices Affect How Evidence Can Be Used at the Bedside
The Study of the Week explores two reports of the FAME-3 trial
This is a story about a common practice that bothers me.
The example I will use involves the treatment of patients who have multiple coronary lesions. The two choices are placing multiple stents or having coronary bypass surgery. More on the specifics in a moment.
The general concept is how trials measure, report and interpret outcomes.
Before any experiment, choices have to be made about which outcome(s) to measure. People will argue about the choices, but a steadfast rule should be that reporting outcomes should be consistent over time.
Standard practice holds that a trial first report its overall results in the entire population. Yet, when thousands of patients have been randomized (experimented on), it is reasonable to look further into the data for other important information.
But this process requires caution. For two reasons: one is that the more you look at data, the greater the risk of chance findings. The second reason for caution is the normal human temptation to highlight findings you like and discount findings you don’t like.
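To put a rough number on the first reason, here is a minimal sketch. It assumes every extra look at the data is an independent test at the usual 0.05 threshold, which is a simplification (real subgroup and secondary analyses are correlated). The function name is mine, for illustration only.

```python
def chance_of_false_positive(num_looks: int, alpha: float = 0.05) -> float:
    """Probability of at least one chance 'significant' finding.

    Assumes num_looks independent tests, each run at significance level
    alpha, when no true effect exists in any of them.
    """
    return 1.0 - (1.0 - alpha) ** num_looks


# Show how the risk of a chance finding grows with the number of looks.
for looks in (1, 5, 10, 20):
    print(f"{looks:>2} looks: {chance_of_false_positive(looks):.0%}")
```

Under these assumptions, ten looks carry about a 40% chance of at least one spurious finding.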
FAME-3
The FAME-3 trial randomized 1500 patients who had multi-vessel coronary artery disease to a multi-stent procedure called percutaneous coronary intervention (PCI) or bypass surgery. Such studies have been done before, and surgery performed better.
But FAME-3 had a twist that the authors thought would boost PCI: stents would be placed using a technique called FFR or fractional flow reserve. (FFR is another story that needs to be told, but for now, just think of it as a possible way to make stenting better.)
The authors chose to measure a primary outcome of death, myocardial infarction, stroke or repeat revascularization (either more stent procedures or surgeries). FAME-3 stipulated that these measurements were to be made at one year.
The results of FAME-3 were clear.
After one year, the rate of the primary outcome was 10.6% in the PCI arm vs 6.9% in the surgery arm.
The hazard ratio, a measure of relative risk, was 1.5. The 95% confidence interval ranged from 1.1 to 2.2. Translation: the absolute difference of 3.7 percentage points corresponded to a 50% higher rate of the primary outcome in the PCI arm.
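The arithmetic linking the absolute and relative numbers can be sketched in a few lines. Note the assumption: the crude ratio of the two one-year rates only approximates the published hazard ratio, which comes from a time-to-event model. The variable names are illustrative.

```python
# One-year primary outcome rates reported in FAME-3.
pci_rate = 0.106
cabg_rate = 0.069

# Absolute difference, as a proportion (0.037 = 3.7 percentage points).
absolute_difference = pci_rate - cabg_rate

# Crude risk ratio; an approximation to the reported hazard ratio of 1.5.
crude_risk_ratio = pci_rate / cabg_rate

# Roughly how many patients treated with PCI instead of CABG
# correspond to one extra primary outcome event at one year.
patients_per_extra_event = 1 / absolute_difference

print(f"Absolute difference: {absolute_difference * 100:.1f} percentage points")
print(f"Crude risk ratio: {crude_risk_ratio:.2f}")
print(f"Patients per extra event: {patients_per_extra_event:.0f}")
```

Seen this way, a 50% relative increase and a 3.7-point absolute increase are the same result described on two different scales.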
The statistics were also clear. FAME-3 was analyzed as a non-inferiority trial with a non-inferiority (NI) margin of 1.65, meaning that if the worst-case scenario (the upper bound of the 95% confidence interval) was less than 65% worse, then PCI would be declared non-inferior to surgery.
The upper bound in this case was 2.2, which is far greater than 1.65. So… PCI was not non-inferior to surgery. If that sounds like tortured language, it sort of is, but it’s necessary because you can have a treatment that misses non-inferiority but is not necessarily inferior.
In the case of FAME-3, however, since the best-case scenario (the lower bound of the confidence interval) was 1.1, or a 10% higher rate, PCI can be declared inferior to surgery. (Non-inferiority analysis is also another chapter.)
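The decision logic described above can be written as a small function. This is a simplified sketch, not the trial's statistical analysis plan: the function name and the labels are mine, and the second and third intervals in the usage lines are hypothetical, included only to show the other two branches.

```python
def classify(lower: float, upper: float, margin: float) -> str:
    """Classify a hazard-ratio confidence interval against an NI margin.

    lower, upper: bounds of the 95% confidence interval for the hazard
    ratio (new treatment vs. standard). margin: non-inferiority margin.
    Simplification: an interval that is both below the margin and
    entirely above 1.0 is reported here only as non-inferior.
    """
    if upper < margin:
        return "non-inferior"
    if lower > 1.0:
        return "inferior"
    return "not non-inferior, but not shown to be inferior"


# FAME-3 at one year: interval 1.1 to 2.2, margin 1.65.
print(classify(1.1, 2.2, 1.65))

# Hypothetical intervals illustrating the other two outcomes.
print(classify(0.9, 1.8, 1.65))
print(classify(0.8, 1.4, 1.65))
```

The middle case is the reason for the tortured language: missing the margin and being shown inferior are two separate findings.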
FAME-3 Three Year Results
Now for the curious part of the story. About 7 weeks ago, the FAME-3 authors reported three-year follow up of the trial in the journal Circulation.
Assessing results at longer time frames is reasonable because these were 65-year-old patients, and the treatment of severe coronary disease is for many years, not 12 months.
But. But. The authors wrote the report in a curious way. Curious for two reasons: First, instead of focusing on what happened with the primary outcome of death, MI, stroke and repeat procedures, they reported a slightly different outcome. They dropped the repeat procedure outcome and reported only death, MI, stroke. The second curious thing was how they interpreted this result.
If you were a Neutral Martian, you’d probably think that an endpoint felt to be most valuable at 1 year would be similarly important at 3 years. Consider this case from a patient perspective. If going back for a repeat procedure was felt to be a bad outcome at, say, 11 months, why not at 28 months?
The results at three years were that 12.0% of the PCI group experienced a death, MI or stroke vs 9.2% of the CABG arm. The HR was 1.3, or 30% higher for PCI. The confidence interval ranged from 0.98 to 1.83, and the p-value was just barely over the threshold at 0.07.
This 30% higher rate of bad outcomes did not reach significance and allowed the authors to write:
“At 3-year follow-up, there was no difference in the incidence of the composite of death, MI, or stroke after FFR-guided PCI with current-generation drug-eluting stents compared with CABG.
These results provide contemporary data to allow improved shared decision-making between physicians and patients with 3-vessel coronary artery disease.”
Summary
See what happens. While the authors did pre-specify that they would look at this endpoint, the fact remains that what was once a decidedly inferior therapy at 12 months was transformed into an equally good therapy.
But is this really true? I don’t think so. Two reasons.
First is that deeper in the results you find the results using the original primary endpoint, including the repeat procedures outcome. It was 11.1% PCI vs 5.9% CABG. HR 1.5 (1.2-2.0) and a p-value of 0.002. Using the same endpoint at 3 years yields the same inferior result for PCI.
Second. Even with the three-component endpoint (death, MI, stroke), there is still a strong signal of worse outcomes with PCI. The hazard ratio was 1.3, or 30% higher for PCI, and the p-value was nearly significant at 0.07.
The authors call this “no difference.” But that is a stretch. Yes, it did not reach statistical significance, but look at the confidence intervals: the lower bound is 0.98 and upper bound is 1.83. Most of that is well above 1.0, indicating a high probability of PCI being worse than CABG.
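One rough way to quantify "most of that is above 1.0" is to back out the standard error from the reported interval. This sketch rests on assumptions that are mine, not the paper's: the log hazard ratio is treated as approximately normal, and the "probability PCI is worse" is read under a flat prior, which is a Bayesian gloss on a frequentist interval. The function name is illustrative.

```python
import math
from statistics import NormalDist


def summarize_hr_interval(lower: float, upper: float, z_crit: float = 1.96) -> dict:
    """Approximate summary of a hazard-ratio 95% confidence interval.

    Assumes the log hazard ratio is normally distributed, so the point
    estimate sits at the midpoint of the interval on the log scale.
    """
    log_lower, log_upper = math.log(lower), math.log(upper)
    log_hr = (log_lower + log_upper) / 2
    std_error = (log_upper - log_lower) / (2 * z_crit)
    z = log_hr / std_error
    std_normal = NormalDist()
    return {
        "point_estimate": math.exp(log_hr),
        "two_sided_p": 2 * (1 - std_normal.cdf(abs(z))),
        # Flat-prior reading: chance the true hazard ratio exceeds 1.0.
        "prob_hr_above_1": std_normal.cdf(z),
    }


# Three-year death, MI or stroke interval reported for FAME-3.
summary = summarize_hr_interval(0.98, 1.83)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

Under these assumptions the interval reproduces the reported p-value of about 0.07 and puts the chance that PCI is truly worse at roughly 97%, which is hard to square with "no difference."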
My Conclusion
The teaching point here is that, without doing anything nefarious, the authors’ choices of what to measure and then amplify affect how evidence is used at the bedside.
A cardiologist finds three-vessel CAD in a patient. The main results of FAME-3 would have argued for referral to surgery.
But now, as the three-year paper stands, PCI looks closer to equivalent. You relay this to the patient, who is happy to avoid surgery.
A Neutral Martian looks at the three-year data and concludes that PCI continues to be the inferior option.
These posts have been generating great discussions. Thank you for that. And your continued support. We have some great things planned. Stay tuned. JMM
Great comment and subsequent discussion
The problem is that, despite this manipulation, which is similar to the one used in EXCEL (repeat revascularization was removed for the first time in CABG vs PCI trials), and despite the use of a custom-made MI definition (not detailed in the trial plan submitted to ClinicalTrials.gov), this paper will certainly be used in future guideline recommendations.
I think Dr. Mandrola is being too generous with his upper bounds for declaring “nefarious behaviour”. Changing the primary endpoint, even if prespecified, is an absolute no-no. If this was initially planned as a 36 month endpoint study and they did that, there would be an outcry.
This is yet another in a long line of studies by people who push catheters, wanting to justify their reasons for pushing more catheters. There is ample prior data to suggest that revasc (more than other endpoints) is particularly poor for PCI (vs CABG) as time goes by. It would be entirely predictable (physiology guided PCI or not) to see that drive the primary composite more and more with progressive follow up. And that is in fact what was shown. That they have obfuscated this fact by changing the endpoint (and that CIRC allowed them to do so without more overt indication of this in the abstract and text) is an egregious abuse of the scientific process by both the authors and the journal.
And of course, the other elephant in the room is that med Rx alone was not studied. This part is at least excusable, since this study came before ISCHEMIA. But it doesn’t eliminate or resolve the dilemma of “just because we can (revascularize in some way for stable CAD) doesn’t mean we should”.