Part 2 - The ELAN Trial Forces Doctors to Be Mature About Using Medical Evidence
And I love it!
If you have not read part 1 of this series, do that now. It is short. We will wait.
NEJM published the ELAN trial in May of this year. Early vs late initiation of oral anticoagulation after an ischemic stroke due to atrial fibrillation. The primary outcome was a composite of lots of bad things—stroke, systemic embolism, intracranial bleeding, extracranial bleeding, and death due to cardiovascular causes.
The Results
A primary outcome occurred in 2.9% of the patients in the early arm vs 4.1% in the later treatment arm.
The absolute risk reduction was 1.2 percentage points. The relative reduction, expressed as an odds ratio, was 0.70, roughly a 30% reduction.
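To make the arithmetic concrete, here is a minimal sketch (my own, not from the paper) that reproduces these headline numbers from the reported event rates:

```python
# Reproducing the headline arithmetic from the reported event rates.
p_early, p_late = 0.029, 0.041   # primary-outcome rates reported in ELAN

arr = p_late - p_early                                            # absolute risk reduction
rrr = arr / p_late                                                # relative risk reduction
odds_ratio = (p_early / (1 - p_early)) / (p_late / (1 - p_late))  # odds ratio

print(f"ARR = {arr:.1%}")         # ~1.2 percentage points
print(f"RRR = {rrr:.0%}")         # ~29%
print(f"OR  = {odds_ratio:.2f}")  # ~0.70
```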
The Normal Interpretation
The normal procedure would be to test whether this lower rate in the early arm reached statistical significance.
Statistical significance requires a hypothesis. Most medical studies use a frequentist approach of testing a null hypothesis (H0). In this case, the H0 would be that there was no difference between the two strategies in the rate of the primary outcome.
Statistical tests would then measure the surprise value of the observed data given this assumption of no difference. Simplified, that is the p-value.
If the calculated p-value were low (less than 0.05, meaning the data would be very surprising under the null), then the strategy of early initiation would be declared better than later initiation.
It would likely be written into guidelines and established as practice.
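As an illustration of that procedure (not the authors' actual analysis), here is a sketch of the usual two-by-two test. The event counts are approximations back-calculated from the reported 2.9% and 4.1% with roughly 1,000 patients per arm:

```python
# A sketch of the conventional null-hypothesis test for two proportions.
# Counts are approximate, reconstructed from the reported event rates.
from scipy.stats import chi2_contingency

table = [[29, 1006 - 29],   # early arm: events, non-events (approximate)
         [41, 1007 - 41]]   # later arm: events, non-events (approximate)

chi2, p_value, dof, expected = chi2_contingency(table)
print(f"p = {p_value:.2f}")  # comfortably above 0.05, so "not significant"
```

Run that way, the trial would simply be filed under "negative," which is exactly the dichotomy the ELAN authors avoid.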
Another way of quantifying the difference between two treatment arms is to provide a point estimate of the absolute or relative risk difference with a 95% confidence interval.
And if the upper bound of the confidence interval around a relative risk or odds ratio fell below 1.0 (the value of no difference), then we would say the reduction was significant and that arm was superior.
In non-inferiority trials, if the upper bound of the 95% confidence interval fell below the accepted non-inferiority margin, then we would say that treatment was non-inferior to the established treatment.
In ELAN,
The absolute risk difference was −1.2 percentage points (favoring early initiation), and the 95% confidence interval [CI] ranged from −2.8 to 0.5 percentage points.
The relative difference, expressed as an odds ratio, was 0.70, and the 95% confidence interval ranged from 0.44 to 1.14.
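For the curious, a minimal sketch (again my own reconstruction, using the same approximate counts as above) recovers intervals close to those quoted here:

```python
# Wald-style 95% intervals for the risk difference and odds ratio,
# from approximate event counts (29/1006 early vs 41/1007 later).
import math

e1, n1 = 29, 1006   # early arm (approximate)
e2, n2 = 41, 1007   # later arm (approximate)
p1, p2 = e1 / n1, e2 / n2

# Absolute risk difference with a normal-approximation 95% CI
diff = p1 - p2
se_rd = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
print(f"risk difference {diff:+.1%}, "
      f"95% CI {diff - 1.96 * se_rd:+.1%} to {diff + 1.96 * se_rd:+.1%}")

# Odds ratio with a 95% CI from the log-odds-ratio standard error
odds_ratio = (e1 / (n1 - e1)) / (e2 / (n2 - e2))
se_log_or = math.sqrt(1 / e1 + 1 / (n1 - e1) + 1 / e2 + 1 / (n2 - e2))
log_or = math.log(odds_ratio)
print(f"odds ratio {odds_ratio:.2f}, "
      f"95% CI {math.exp(log_or - 1.96 * se_log_or):.2f} "
      f"to {math.exp(log_or + 1.96 * se_log_or):.2f}")
```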
The authors' conclusion:
In this trial, the incidence of recurrent ischemic stroke, systemic embolism, major extracranial bleeding, symptomatic intracranial hemorrhage, or vascular death at 30 days was estimated to range from 2.8 percentage points lower to 0.5 percentage points higher (based on the 95% confidence interval) with early than with later use of DOACs.
This stands out, doesn’t it?
They did not declare “no difference.” They did not do a statistical test.
They simply said the effect of early vs later starting of oral anticoagulation ranges from very good (nearly 3% lower) to a little worse (0.5% higher).
My thoughts as a consumer of medical evidence
I love this approach. There is no faux certainty of declaring a winner or loser based on a statistical test. If ELAN were presented in the normal way, we would be told there was “no difference” in early vs later use of anticoagulation.
But that would be unfair. Because the point estimate of 0.70 indicates a 30% reduction of bad outcomes with early anticoagulation. The confidence interval around that risk reduction goes as low as 0.44, so it might be as large as a 56% reduction. But the confidence interval also goes to 1.14, indicating that early initiation could increase the rate of bad outcomes by 14%.
There is uncertainty. Let me repeat: there is uncertainty.
Nearly all medical studies have uncertainty. But the normal procedure has been to dichotomize them into “positive” or “negative” studies based on reaching a statistical threshold. Doing so belies the inherent uncertainty.
The authors defend their approach in their rationale/protocol paper. These four statements made me tingle with delight.
First, when we designed the trial, there was a lack of high-quality data on event rates in this setting, making it difficult to identify an appropriate non-inferiority margin.
Second, the assumed low event rate would require a very large trial to assess either superiority or non-inferiority and this would not necessarily provide greater clarity concerning patient management.
Although we propose a different analytic approach to that often seen in clinical trials, this should not hinder interpretation of trial data or their clinical utility.
We also believe that the complexity of managing patients with AF early on after [stroke] precludes simplified dichotomous decision-making and necessitates some leeway for individual decision-making.
ELAN authors ask us to be mature rational decision makers. They want us to be doctors. To use judgement.
They have randomized 2000 patients to two strategies. They have provided the results with confidence intervals and not proclaimed a winner. They force us to embrace probability.
Guideline writers cannot put ELAN results into those oversimplified colored boxes—that then become quality measures—aka, bludgeons with which to turn doctors into robots.
Imagine a scenario where you had a young patient after a stroke from AF. He had no significant bleeding risk factors. You would use ELAN results to start early anticoagulation.
Conversely, if you had an older patient with multiple risk factors for bleeding (say, falls, kidney disease, or the need to take aspirin), you might defer to later anticoagulation.
But in the world of dichotomous trial results, which soon get transformed into “guideline-directed-therapy,” doctors are nudged into robotic mode. We are told to use all these medicines because that is what “evidence-based-medicine” says to do.
But most trials are actually similar to ELAN. Similar in that—no matter the faux certainty of declaring a winner—most trials come with uncertainty. And it’s not just uncertainty about the signal-to-noise ratio or effect sizes reported in the trial, but uncertainty regarding how the results of the trial apply to the patient in front of you.
ELAN embraces the uncertainty. And that is why I love this approach. Medicine exudes uncertainty. Medicine cannot be put neatly into colored boxes in guidelines.
Before I close, I want to highlight one potential negative of the no-statistical-hypothesis approach to trials.
We want to be careful to avoid unethical trials.
A trial is an experiment on humans. You want to enroll enough patients to provide a reasonable estimate. I would not want ELAN to give rise to under-powered trials.
Trial planners still need to think carefully about enrolling enough patients to sort out signal from noise.
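As a back-of-the-envelope illustration of what "enough patients" means here (my own calculation, assuming the event rates observed in ELAN), a conventional superiority trial powered at 80% for a 2.9% vs 4.1% difference would need substantially more patients than ELAN enrolled:

```python
# Rough sample-size sketch for a conventional superiority comparison of
# 2.9% vs 4.1% event rates (assumed), 80% power, two-sided alpha = 0.05.
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

effect = abs(proportion_effectsize(0.029, 0.041))  # Cohen's h for the two rates
n_per_arm = NormalIndPower().solve_power(effect_size=effect, alpha=0.05,
                                         power=0.80, alternative='two-sided')
print(f"~{n_per_arm:.0f} patients per arm")  # on the order of 1,800 per arm
```

Which is roughly the authors' second point above: at these low event rates, powering a formal superiority or non-inferiority test demands a much larger trial.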
JMM
Post-Script Notes:
I hope those who have statistical and trial planning background leave comments and share this column. I (and our readers) learn a lot from these comments.
This week’s Sensible Medicine podcast features Andrew Foy from Penn State University. Andrew takes a slightly different approach to ELAN. Stay tuned for that coming tomorrow.
Finally, thank you for your tremendous support of Sensible Medicine.
"Judgment" easily turns into "voodoo".
And voodoo draped in "numbers" is worse than proclaiming "expertise" by putting on a white coat and hanging a stethoscope around your neck.
There is uncertainty.
Point estimates are misleading. I don't think they should ever be given. It should always be a range, aka confidence or credibility interval.
Abandon 95% (~ 2 sigma) and replace it with 99.7% (3 sigma). This will force medicine to come to grips with its lack of substantive knowledge.
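As a quick illustration of what that swap would mean for ELAN's risk difference (a sketch using the same approximate counts as above, not anything the commenter computed):

```python
# How the (approximate) ELAN risk-difference interval widens when 3-sigma
# (~99.7%) coverage is demanded instead of the usual ~2-sigma (95%).
import math
from scipy.stats import norm

e1, n1, e2, n2 = 29, 1006, 41, 1007   # approximate events and arm sizes
p1, p2 = e1 / n1, e2 / n2
diff = p1 - p2
se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

for coverage in (0.95, 0.997):
    z = norm.ppf(0.5 + coverage / 2)
    print(f"{coverage:.1%} interval: {diff - z * se:+.1%} to {diff + z * se:+.1%}")
```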
There is a difference between the returns of 100 people going to a casino with the same system that works 95% of the time, and one person going for 100 consecutive days (unless he goes bankrupt before he makes it to 100 days). The former is like a population study; the latter is like treating an individual patient.
Human beings, without training and experience, have a hard time distinguishing 1 in a million from 1 in 1,000 from 1 in 100. We manage to simultaneously overestimate and underestimate.
Of course, you have to use judgment.
I will repeat my comment from part 1 that a picture is worth a thousand numbers:
https://postimg.cc/jCFv3C9K
The x-axis is the actual rate of a bad outcome, and the curves show the probability of that value, given the results of the study, with a flat input prior: blue is early administration, orange is late.
If I was shown these curves as a patient, I would unhesitatingly ask for the blue protocol, unless my physician could clearly articulate a reason for choosing the orange protocol.
Wouldn't you?
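For readers who want to reproduce curves like the ones linked above, here is a minimal sketch under the assumptions the commenter describes: a flat prior on each arm's true event rate, with event counts approximated from the reported 2.9% and 4.1% and roughly 1,000 patients per arm (the exact counts behind the linked figure are not stated):

```python
# Posterior curves for each arm's true primary-outcome rate, assuming a flat
# Beta(1, 1) prior and approximate event counts reconstructed from the
# reported rates. With matplotlib's default color cycle, the first curve
# (early) comes out blue and the second (later) orange, as in the comment.
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import beta

arms = {"early": (29, 1006), "later": (41, 1007)}   # (events, n), approximate

x = np.linspace(0, 0.08, 500)
for label, (k, n) in arms.items():
    posterior = beta(1 + k, 1 + n - k)   # Beta posterior under a uniform prior
    plt.plot(x, posterior.pdf(x), label=f"{label} ({k}/{n})")

plt.xlabel("true rate of the primary outcome")
plt.ylabel("posterior density")
plt.legend()
plt.show()
```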