CHAMPION AF Breaks Almost Every Rule of Noninferiority Trial Design

Millions of people have AF and take oral anticoagulation. It's one of Medicine's most evidence-based treatments. It would take strong data to upend this treatment.

Mar 30, 2026

I was in New Orleans when the CHAMPION AF trial was presented as a late-breaking clinical trial. NEJM published the manuscript.

It’s both one of the most biased and most consequential studies I have reviewed in years.

The question: for the millions of people with atrial fibrillation (AF) who take oral anticoagulation (OAC), does the Watchman Flx left atrial appendage closure (LAAC) device provide an alternative to OAC for stroke prevention?

The stakes for society, patients, the medical profession and of course the company are massive, since millions of people take OAC for stroke prevention.

It’s also a reasonable question because if mechanical occlusion of the left atrial appendage worked then a one-time procedure might be able to provide lifelong protection from stroke without taking tablets for anticoagulation.

You would think such a consequential question would be studied in a rigorous flawless trial. Alas, that was definitely not the case.

CHAMPION Trial

The authors randomized 3000 patients into either a Watchman arm or OAC arm. Modern direct acting oral anticoagulants (DOAC) were used for the medical arm.

Patients were at fairly low stroke risk, at age 71 and with a CHA2DS2-VASc score of 3.5. They also had a low HAS-BLED score of 1.3, indicating a very low bleeding risk.

You find most of the trouble in this trial design in the choice of endpoints and their analyses.

Two Problems with the PRIMARY EFFICACY ENDPOINT

The primary efficacy endpoint is a composite of stroke, systemic embolism and cardiovascular death, tested for noninferiority. Recall that in a noninferiority design, no difference in outcomes is considered a positive.

The issue here is adding CV death to the endpoint; everyone accepts the fact that neither Watchman nor OAC will affect CV death. Therefore… adding an outcome to a composite endpoint that will not be affected by either treatment simply adds outcomes (noise) in both arms making noninferiority easier to reach. (Thing is that all LAAC trials do this, so CHAMPION AF authors aren’t deviating from bad practice.)
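To see why an unaffected outcome makes noninferiority easier to reach, here is a minimal sketch. The event rates are hypothetical numbers chosen for illustration, not CHAMPION AF data, and the function name is my own. It also assumes, as a simplification, that the added events simply stack on top of the others in both arms.

```python
def composite_risk_ratio(device_rate, drug_rate, shared_rate=0.0):
    """Risk ratio of a composite endpoint after adding an outcome that
    occurs at the same rate in both arms (simplification: rates just add)."""
    return (device_rate + shared_rate) / (drug_rate + shared_rate)


# Hypothetical rates, NOT taken from the trial.
stroke_only = composite_risk_ratio(device_rate=0.045, drug_rate=0.03)
with_cv_death = composite_risk_ratio(device_rate=0.045, drug_rate=0.03,
                                     shared_rate=0.05)

print(f"Stroke/embolism only:        risk ratio {stroke_only:.2f}")
print(f"After adding neutral events: risk ratio {with_cv_death:.2f}")
```

In this made-up example, the same absolute excess of events looks like a 50% relative increase on its own but less than a 20% increase once the neutral events are added. A margin defined as a fraction of the expected composite rate also grows with every neutral event added.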

The far more problematic issue was the choice of noninferiority margin. In a noninferiority trial, the authors, often in discussion with regulators, choose a margin that the new therapy should be no worse than. But to do this, you first have to estimate an event rate in the control arm. Here, they expected a 12% rate of primary outcome events in 3 years. They then chose 4.8% as the NI margin. IOW: if the upper bound of the 95% confidence interval for the absolute risk difference was less than 4.8%, the Watchman would be declared noninferior for efficacy.

Now we tip-toe into fractions. Sorry. A margin of 4.8 percentage points on top of an expected 12% event rate is a 40% relative increase, or a relative risk of 1.4. Translation: the relative risk margin is 1.4. In most if not all drug noninferiority trials, the authors set out NI margins in both absolute and relative terms. CHAMPION AF authors did not do this. Their noninferiority margin was just 4.8% in absolute terms.

Can you guess the problem?

In CHAMPION AF, the event rates came in much lower than 12%. They were 5.7% in the Watchman arm and 4.8% in the OAC arm. The risk difference was 0.9 percentage points higher in the Watchman arm; the 95% confidence interval was -0.8 to 2.6. Since 2.6 < 4.8, the authors declared noninferiority. And there it sits in the NEJM, on the scoreboard if you will, as a win.

The problem though is that when event rates come in that much lower, the 4.8% margin is much too lenient. In normal trials, the authors would have also tested NI with a rate ratio or relative risk.
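A rough sketch of how lenient a fixed absolute margin becomes when the control event rate falls. The function name is mine, and the "implied relative margin" is my own back-of-the-envelope conversion, not something the trial reports. It is simply the highest event rate the margin tolerates in the device arm, divided by the control rate.

```python
def implied_relative_margin(control_rate_pct, absolute_margin_pct):
    """Relative risk tolerated by a fixed absolute margin at a given
    control-arm event rate (both arguments in percent)."""
    return (control_rate_pct + absolute_margin_pct) / control_rate_pct


ABSOLUTE_MARGIN = 4.8  # percentage points, as prespecified in the trial

planned = implied_relative_margin(12.0, ABSOLUTE_MARGIN)   # expected OAC rate
observed = implied_relative_margin(4.8, ABSOLUTE_MARGIN)   # observed OAC rate

print(f"At the expected 12% rate:  relative margin {planned:.1f}")
print(f"At the observed 4.8% rate: relative margin {observed:.1f}")
```

By this arithmetic, the same 4.8-point margin that allowed a 40% relative increase at the planned event rate allows up to a doubling of events at the rate actually seen in the OAC arm.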

As I said above, the margin in relative terms is 1.4. In relative terms, the hazard ratio comes out at 1.20 (20% higher), with a 95% confidence interval of 0.87 to 1.66. Since 1.66 > 1.4, the device does not meet NI in relative terms. Why the editors or regulators let the trialists get away with not using both rate ratio and risk difference is hard to explain.
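The two tests can be put side by side in a few lines. This is only a sketch of the decision rule. Note that the relative margin of 1.4 is the one I derived from the planning assumptions, not a margin the trial prespecified. The function name is hypothetical.

```python
def meets_noninferiority(ci_upper, margin):
    """Noninferiority is declared when the upper bound of the
    confidence interval falls below the margin."""
    return ci_upper < margin


# Absolute scale: risk difference in percentage points (trial's own test).
absolute_test = meets_noninferiority(ci_upper=2.6, margin=4.8)

# Relative scale: hazard ratio against the derived (not prespecified) 1.4.
relative_test = meets_noninferiority(ci_upper=1.66, margin=1.4)

print(f"Absolute risk difference: noninferior = {absolute_test}")
print(f"Hazard ratio:             noninferior = {relative_test}")
```

Same data, same rule, opposite verdicts. The only thing that changed is the scale on which the margin is expressed.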

Two Problems with the PRIMARY SAFETY ENDPOINT

The authors chose nonprocedural clinically relevant nonmajor bleeding as the primary safety endpoint. The results favored Watchman (10.9% vs 19.0%) and easily met superiority. Again, another win on the scoreboard.

The first problem is that exactly zero patients can exclude bleeding from the invasive procedure, which could be bleeding from venous access or pericardial tamponade. Both are bad. The authors just exclude that.

The second problem is that CHAMPION AF is an unblinded trial wherein patients know the treatment assignment. Patients on oral anticoagulants are far more likely to complain of nonmajor bleeding like bruising, nose bleeds or gum bleeds. This was well demonstrated in the OPTION trial, wherein the authors listed all nonmajor bleeds in the supplement. CHAMPION AF did not list these bleeds.

The Secondary Safety Endpoint is Actually the Proper Safety Endpoint but its Statistical Test was Incorrect

The secondary safety endpoint counted all major bleeds including the procedure. There were 83 in the Watchman arm and 87 in the OAC arm. The HR was 0.92 (95% CI 0.68-1.24). This easily met noninferiority but it was not tested for superiority. So, there it sits, another win on the scoreboard.

The problem is that the standard for noninferiority trials is to test the safety endpoint with superiority. The reason for this is that if you are giving up some efficacy, the new treatment should offer something important, such as safety. In this case, when bleeding is counted properly, the device is clearly not superior.

Looking at the Actual Numbers is Even More Sobering

The authors provide a “net benefit” analysis where strokes are placed against bleeds. Their calculation is positive but that’s because they use nonprocedural bleeding.

Let me show you what I feel is a fairer way to assess net benefit: Stroke rates were higher in the Watchman arm: 50 vs 33. Ischemic strokes were 45 vs 27. Hemorrhagic stroke was 5 in each arm. Major bleeds were lower in the Watchman arm but there was a difference of only 4 (83 vs 87).

A sober patient looking at these numbers would see that there are 17 more strokes and only 4 fewer bleeds. Since strokes are literally one of the worst outcomes a person can have (because of disability), this is not a good trade.
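For readers who like to check the tally themselves, here is a small sketch using only the counts quoted above. It compares raw event counts and makes no adjustment for arm size or follow-up time. The variable names are mine.

```python
# (Watchman arm, OAC arm) event counts as quoted in this post.
counts = {
    "all strokes": (50, 33),
    "ischemic strokes": (45, 27),
    "hemorrhagic strokes": (5, 5),
    "major bleeds (incl. procedural)": (83, 87),
}

# Positive difference = more events in the Watchman arm.
differences = {name: watchman - oac for name, (watchman, oac) in counts.items()}

for name, diff in differences.items():
    print(f"{name}: {diff:+d} in the Watchman arm")
```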

Placing CHAMPION AF in Context with Previous Watchman vs OAC Trials

The original trials of Watchman vs Warfarin were essentially negative for Watchman. PROTECT did not pass FDA muster; PREVAIL found Watchman not noninferior to warfarin (due to increased strokes) in the co-primary endpoint of stroke, systemic embolism and CV death. In November, I reported on the German CLOSURE AF trial, which compared left atrial appendage closure with Watchman (and other devices) vs best medical therapy. CLOSURE AF found that LAAC was not only not noninferior but inferior to best medical therapy.

In Bayesian terms, the prior evidence for LAAC is extremely pessimistic. If CHAMPION AF were to change practice, it would have had to be an utter grand slam. I hope to have shown you that it was not. In fact, when you look at the actual data, and the way it should have been analyzed, it also was a negative trial.

Theme at ACC

Proponents of Watchman said in the main presentation and on social media, that CHAMPION AF results can be discussed with patients who may choose to have left atrial appendage closure. They use the term shared decision making.

I would argue that if we use shared decision making with patients for this decision then shared decision making is dead.

Why? Because the prior data clearly show that left atrial appendage closure is worse than anticoagulation. CHAMPION AF does nothing to change that, and I don’t think we should ever offer inferior procedures. Despite what’s on the scoreboard, Watchman is clearly worse than direct acting oral anticoagulation.

Possible Explanation of Higher Stroke Rates and Similar Bleeding Rates

In Table s12 of the supplement, the authors show that at 4 months post procedure, 21% of patients in the Watchman arm had a peri-device leak. Any leak is bad because it increases the risk of clot forming and increases stroke rate.

Worse, though, is that device-related thrombus was seen in 4.8% of patients. That, too, is bad because it increases the risk of stroke, and warrants taking oral anticoagulation—which then negates the purpose of the device.

CHAMPION AF authors were critical of the German teams who did CLOSURE AF because that trial reported a much higher procedural complication rate. Yet, I don’t find these numbers on peri-device leaks reassuring at all.

Spin is Coming

Boston Scientific reps brought lunch to our office last week. They had flyers made about CHAMPION AF. They did not tell the results but it was obvious the trial was positive from the enthusiasm. One of the comments on the flyer was “Is your practice ready?”

There will be an extreme push to use these results to say Watchman can be an alternative to oral anticoagulation. Proponents will point to the scorecard they have in the NEJM. But as I have shown you, CHAMPION AF is clearly not a win for Watchman and does not come close to changing practice.

Other Links to Read:

I wrote a more detailed piece on Medscape for a doctor audience.
We also have coverage at Cardiology Trials Substack
My Twitter thread with 83K views.
When a win is not a win by Sanjay Kaul, MD
Perhaps the most enthusiastic Watchman proponent, Christopher Ellis, MD with this sobering comment.

James H. Stein, MD

Mar 30

PS. I have a post that will publish in a few weeks about shared decision-making and the information asymmetry between patients and doctors. The main point is that the phrase “shared decision-making” often is a more palatable way of telling patients what the physician has already decided. I don't say that cynically. It's just what happens when the phrase gets tossed around and put in guidelines without thinking carefully about what it actually requires. I then discuss the SHARE approach and what sharing decisions really requires. Thanks, John, for this post.

As our friend and brilliant methodologist Sanjay Kaul commented elsewhere: “Good trial design balances feasibility with rigor; when it’s designed to deliver a win, it signals more about clever protocol writing and trial design than breakthrough efficacy.” I am ashamed of the NEJM for publishing a paper on a study as flawed as this one - all the points you made, but especially excluding protocol related bleeding, the open label design, the excess of people with PAF who statistically did much better with OAC, and not requiring the authors to discuss the excess of strokes and the details of the non-inferiority bounds, as you laid out. Gregory M. Marcus's editorial was an insightful breath of fresh air and I highly recommend it.


Sensible Medicine
