Discussion about this post

Frank Harrell:

Absence of evidence is not evidence of absence strikes again. There is a reason that so many cardiovascular trials using a lowest-information binary endpoint require 6,000-10,000 patients: they need about 600 events to nail down the treatment benefit. A study with 80 events on the lowest-power binary endpoint of recurrent stroke, which ignores stroke severity, hospitalization required, re-recurrence of stroke, etc., is at a tremendous disadvantage from the start.
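The ~600-events figure can be motivated with Schoenfeld's event-count approximation for two-arm survival trials. A minimal sketch, assuming 1:1 allocation, 0.05 two-sided alpha, and 80% power (illustrative values of mine, not the trial's design parameters):

```python
from math import log

# Schoenfeld's approximation for the number of events needed to detect
# a given hazard ratio in a two-arm survival trial with 1:1 allocation:
#   events = (z_alpha + z_beta)^2 / (0.25 * ln(HR)^2)
# Default alpha/power values are illustrative assumptions, not the trial's.
def events_needed(hr, z_alpha=1.96, z_beta=0.8416):
    """Events for 0.05 two-sided alpha and 80% power by default."""
    return (z_alpha + z_beta) ** 2 / (0.25 * log(hr) ** 2)

print(round(events_needed(0.8)))  # ~630 events to detect a 20% reduction
print(round(events_needed(0.6)))  # ~120 events to detect a 40% reduction
```

Note how steeply the requirement grows as the target effect shrinks: detecting a modest 20% reduction in hazard takes roughly five times as many events as detecting a 40% reduction.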

The [0.64, 1.55] 0.95 confidence interval for the hazard ratio means that the data are consistent with anything from a 36% reduction to a 55% increase in the instantaneous rate of recurrent stroke with anticoagulant. The information is likely sufficient for deciding to stop for futility, i.e., for being fairly certain that going to planned completion at the planned inadequate sample size would not lead to success, if success were defined as a demonstration of efficacy through statistical "significance". That is a far cry from concluding that there is evidence for lack of efficacy.

The study had a pre-specified efficacy margin of 0.6 for the hazard ratio. In other words, the investigators in their collective wisdom are saying that even a 39% reduction in the instantaneous risk of recurrent stroke would be clinically irrelevant. If 0.6 were widely accepted by disinterested experts and patients, then perhaps we would know enough. But we need a more direct analysis to quantify the evidence: the Bayesian posterior probability that the hazard ratio is greater than x, where x is the minimum clinically important treatment effect as specified by external experts and patients.
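As a rough illustration of what such a posterior probability looks like here, one can treat the reported 0.95 CI of [0.64, 1.55] as defining a normal likelihood for log(HR) and combine it with a flat prior. Both are simplifying assumptions; the full Bayesian analysis the comment calls for would model the data directly.

```python
from math import log, sqrt, erf

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

# Back out the approximate normal likelihood for log(HR) from the
# reported 0.95 CI [0.64, 1.55] (assumption: normality on the log scale).
lo, hi = log(0.64), log(1.55)
mean = (lo + hi) / 2          # ~ -0.004, i.e., point estimate HR ~ 1.0
se = (hi - lo) / (2 * 1.96)   # ~ 0.226

# Under a flat prior, the posterior for log(HR) is normal(mean, se).
def pr_hr_greater(x):
    """Posterior Pr(HR > x): probability the effect is weaker than x."""
    return 1 - phi((log(x) - mean) / se)

print(round(pr_hr_greater(0.8), 2))  # Pr(effect weaker than a 20% reduction) ~ 0.83
```

Under these assumptions there is still a 17% posterior probability that the drug reduces the hazard by 20% or more — hardly a demonstration of no effect.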

This study is one of countless examples where keeping the study going until sufficient evidence is collected for a conclusion about the effect size, i.e., running the trial as a Bayesian sequential design with no planned sample size, would have resulted in far more useful information. One could even do a Bayesian futility analysis to stop earlier than a traditional frequentist futility analysis would, e.g., stopping when Pr(HR > 0.9) > 0.95, where 0.9 is replaced by whatever clinical threshold is relevant. Such a design would allow one to conclude that the treatment didn't work, unlike stopping for futility with respect to H0: HR = 1 with a wide confidence interval.
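To see why a rule like Pr(HR > 0.9) > 0.95 is so demanding at 80 events, here is a hypothetical calculation, assuming a flat prior, the usual SE(log HR) ≈ 2/√events approximation, and a point estimate held at exactly HR = 1.0 at every look (none of this is the trial's actual monitoring plan):

```python
from math import log, sqrt, erf

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

# Posterior Pr(HR > hr_threshold | data) under a flat prior and the
# normal approximation SE(log HR) ~ 2/sqrt(events), with the observed
# log-HR estimate held fixed (default: exactly no effect).
def pr_futile(events, est_log_hr=0.0, hr_threshold=0.9):
    se = 2 / sqrt(events)
    return 1 - phi((log(hr_threshold) - est_log_hr) / se)

for d in (80, 200, 400, 800, 1600):
    print(d, round(pr_futile(d), 3))
# Even with a point estimate of exactly 1.0, 80 events give a futility
# probability of only ~0.68; it takes on the order of 1,600 events for
# Pr(HR > 0.9) to clear the 0.95 bar.
```

The point: a rule that can actually *conclude* ineffectiveness relative to a clinical threshold needs far more information than this trial accrued, which is exactly why the wide confidence interval cannot support a "no benefit" claim.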

Lois Lassiter:

'Negative' data is STILL data... when did we forget this?
