# Part 2 - The ELAN Trial Forces Doctors to Be Mature About Using Medical Evidence

### And I love it!

**If you have not read part 1 of this series, do that now. It is short. We will wait.**

NEJM published the ELAN trial in May of this year. Early vs late initiation of oral anticoagulation after an ischemic stroke due to atrial fibrillation. The primary outcome was a composite of lots of bad things—stroke, systemic embolism, intracranial bleeding, extracranial bleeding, and death due to cardiovascular causes.

**The Results**

A primary outcome occurred in 2.9% of the patients in the early arm vs 4.1% in the later treatment arm.

The absolute risk reduction was 1.2 percentage points. The relative reduction, expressed as an odds ratio, was 0.70, roughly a 30% reduction in the odds of the primary outcome.
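For readers who want the arithmetic, here is a minimal sketch (Python) that recomputes those headline numbers from the reported event rates alone; the trial's published estimates come from its prespecified analysis, so this back-of-the-envelope version only approximates them.

```python
# Recompute ARR, RRR, and the odds ratio from the reported event rates.
p_early, p_late = 0.029, 0.041   # primary-outcome rates in the two arms

arr = p_late - p_early                                            # absolute risk reduction
rrr = arr / p_late                                                # relative risk reduction
odds_ratio = (p_early / (1 - p_early)) / (p_late / (1 - p_late))  # odds ratio

print(f"ARR: {arr:.1%}")         # 1.2%
print(f"RRR: {rrr:.0%}")         # ~29%
print(f"OR:  {odds_ratio:.2f}")  # ~0.70
```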

**The Normal Interpretation**

The normal procedure would tell us whether this lower rate in the early arm met *statistical significance*.

Statistical significance requires there to be a hypothesis. Most medical studies use a frequentist approach of testing a null hypothesis (H0). In this case, the H0 would be that there was no difference in the two strategies in the rate of the primary outcome.

Statistical tests then measure how surprising the observed data would be under this assumption of no difference. That is a simplification, but this is the p-value.

If the calculated p-value were low (less than 0.05, meaning the data would be very surprising under the null), then the strategy of early initiation would be declared better than later starting.

It would likely be written into guidelines and established as practice.
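To make that procedure concrete, here is a sketch of such a test. The 2×2 counts (29 of 1006 early vs 41 of 1007 late) are my assumptions, chosen to be consistent with the reported percentages, not numbers lifted from the paper's analysis.

```python
# Two-sided Fisher exact test of H0: no difference between the strategies.
from scipy.stats import fisher_exact

table = [[29, 1006 - 29],   # early arm: events, non-events (assumed counts)
         [41, 1007 - 41]]   # late arm:  events, non-events (assumed counts)

odds_ratio, p_value = fisher_exact(table, alternative="two-sided")
print(f"OR ~ {odds_ratio:.2f}, p ~ {p_value:.2f}")  # p lands comfortably above 0.05
```

As it happens, the p-value in this sketch comes out well above 0.05, so the normal procedure would have declared "no difference" and moved on.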

Another way of expressing differences between two treatment arms is to provide point estimates of an absolute or relative risk difference with 95% confidence intervals.

And if the point estimate of a relative effect favored one arm and the upper bound of its confidence interval fell below 1.0 (the value indicating no difference), then we would say the reduction was significant and that arm was superior.

In non-inferiority trials, if the upper bound of the 95% confidence interval fell below the accepted non-inferiority margin, then we would say that treatment was non-inferior to the established treatment.
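That dichotomous reading fits in a few lines of code. Below is a sketch applying it to an odds-ratio confidence interval; the non-inferiority margin of 1.3 is purely illustrative, since ELAN prespecified no margin at all.

```python
# Dichotomous verdict from a relative-effect (odds ratio) confidence interval.
def verdict(ci_lower: float, ci_upper: float, margin: float = 1.3) -> str:
    if ci_upper < 1.0:        # entire CI below 1.0, the "no difference" value
        return "superior"
    if ci_upper < margin:     # upper bound below the accepted margin
        return "non-inferior"
    return "inconclusive"

print(verdict(0.44, 1.14))  # "non-inferior" -- but only under the made-up margin
```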

In ELAN,

The absolute risk difference was 1.2 percentage points in favor of early initiation, and the 95% confidence interval [CI] ranged from −2.8 to 0.5 percentage points.

The relative difference, expressed as an odds ratio, was 0.70, and the 95% confidence interval ranged from 0.44 to 1.14.
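Those intervals can be roughly reproduced with a simple Wald calculation. The counts below (29/1006 vs 41/1007) are again my assumptions consistent with the reported rates; the published CI comes from the trial's own analysis, so this sketch only lands close.

```python
# Wald 95% CI for the absolute risk difference (early minus late).
import math

x1, n1 = 29, 1006   # early arm: events, patients (assumed)
x2, n2 = 41, 1007   # late arm:  events, patients (assumed)
p1, p2 = x1 / n1, x2 / n2

rd = p1 - p2        # negative values favor early initiation
se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
lo, hi = rd - 1.96 * se, rd + 1.96 * se
print(f"RD {rd:+.2%}, 95% CI {lo:+.2%} to {hi:+.2%}")  # about -2.8% to +0.4%
```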

The authors' conclusion:

*In this trial, the incidence of recurrent ischemic stroke, systemic embolism, major extracranial bleeding, symptomatic intracranial hemorrhage, or vascular death at 30 days was estimated to range from 2.8 percentage points lower to 0.5 percentage points higher (based on the 95% confidence interval) with early than with later use of DOACs.*

This stands out, doesn’t it?

They did not declare “no difference.” They did not do a statistical test.

They simply said the effect of early vs later starting of oral anticoagulation ranges from very good (nearly 3% lower) to a little worse (0.5% higher).

**My thoughts as a consumer of medical evidence**

I love this approach. There is no faux certainty of declaring a winner or loser based on a statistical test. If ELAN were presented in the normal way, we would be told there was “no difference” in early vs later use of anticoagulation.

But that would be unfair. Because the point estimate of 0.70 indicates a 30% reduction of bad outcomes with early anticoagulation. The confidence interval around that risk reduction goes as low as 0.44, so the benefit might be as large as a 56% reduction. But the confidence interval also extends to 1.14, indicating that early initiation could increase the rate of bad outcomes by 14%.

There is uncertainty. Let me repeat: there is uncertainty.

Nearly all medical studies have uncertainty. But the normal procedure has been to dichotomize them into “positive” or “negative” studies based on reaching a statistical threshold. Doing so belies the inherent uncertainty.

The authors defend their approach in their rationale/protocol paper. These four statements made me tingle with delight.

*First, when we designed the trial, there was a lack of high-quality data on event rates in this setting, making it difficult to identify an appropriate non-inferiority margin.*

*Second, the assumed low event rate would require a very large trial to assess either superiority or non-inferiority and this would not necessarily provide greater clarity concerning patient management.*

*Although we propose a different analytic approach to that often seen in clinical trials, this should not hinder interpretation of trial data or their clinical utility.*

*We also believe that the complexity of managing patients with AF early on after [stroke] precludes simplified dichotomous decision-making and necessitates some leeway for individual decision-making.*

The ELAN authors ask us to be mature, rational decision makers. They want us to be doctors. To use judgement.

They have randomized 2000 patients to two strategies. They have provided the results with confidence intervals and not proclaimed a winner. They force us to embrace probability.

Guideline writers cannot put ELAN results into those oversimplified colored boxes—that then become quality measures—aka, bludgeons with which to turn doctors into robots.

Imagine a scenario where you had a young patient after a stroke from AF. He had no significant bleeding risk factors. You would use ELAN results to start early anticoagulation.

Conversely, if you had an older patient with multiple risk factors for bleeding (say, falls, kidney disease, or the need to take aspirin), you might defer to later anticoagulation.

But in the world of dichotomous trial results, which soon get transformed into “guideline-directed-therapy,” doctors are nudged into robotic mode. We are told to use all these medicines because that is what “evidence-based-medicine” says to do.

But most trials are actually similar to ELAN. Similar in that—no matter the faux certainty of declaring a winner—most trials come with uncertainty. And it’s not just uncertainty about the signal-to-noise ratio or effect sizes reported in the trial, but uncertainty regarding how the results of the trial apply to the patient in front of you.

ELAN embraces the uncertainty. And that is why I love this approach. Medicine exudes uncertainty. Medicine cannot be put neatly into colored boxes in guidelines.

Before I close, I want to highlight one potential negative of the no-statistical-hypothesis approach to trials.

We want to be careful to avoid unethical trials.

A trial is an experiment on humans. You want to enroll enough patients to provide a reasonable estimate. I would not want ELAN to give rise to under-powered trials.

Trial planners still need to think carefully about enrolling enough patients to sort out signal from noise.
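To give a sense of the arithmetic involved, here is a back-of-the-envelope sample-size sketch for a conventional superiority test at ELAN-like event rates. The 0.05 alpha and 0.80 power are the usual conventions, my assumptions rather than ELAN design parameters.

```python
# Standard two-proportion sample-size formula for a superiority test.
from scipy.stats import norm

p1, p2 = 0.029, 0.041                          # ELAN-like event rates
z = norm.ppf(1 - 0.05 / 2) + norm.ppf(0.80)    # z-values for alpha/2 and power

n_per_arm = z**2 * (p1 * (1 - p1) + p2 * (1 - p2)) / (p1 - p2) ** 2
print(round(n_per_arm))  # ~3,700 per arm -- several times what ELAN enrolled
```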

JMM

**Post-Script Notes:**

*I hope those who have a statistical and trial-planning background leave comments and share this column. I (and our readers) learn a lot from these comments.*

*This week's Sensible Medicine podcast features Andrew Foy from Penn State University. Andrew takes a slightly different approach to ELAN. Stay tuned for that coming tomorrow.*

*Finally, thank you for your tremendous support of Sensible Medicine.*

I think the approach taken in the ELAN trial is indeed an interesting one and showcases a less rigid approach to clinical trial interpretation. By not imposing a binary classification of "significant" or "not significant" based on p-values, the study acknowledges the underlying uncertainty that is inherent in any statistical analysis.

However, there are still considerations to keep in mind when interpreting the results of this study. A confidence interval should not be misread as the full range of possible outcomes, or as a 95% probability that the true effect lies inside this particular interval. What the procedure guarantees, under the model's assumptions, is that intervals constructed this way contain the true effect 95% of the time across repeated trials. This is a reminder that statistical models are simplifications of complex realities.
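A quick simulation makes that guarantee concrete; this is a sketch with hypothetical rates and arm sizes, not ELAN data.

```python
# Simulate repeated trials and check the coverage of the 95% Wald interval.
import numpy as np

rng = np.random.default_rng(0)
p1, p2, n, reps = 0.029, 0.041, 1000, 10_000   # hypothetical parameters
true_rd = p1 - p2

covered = 0
for _ in range(reps):
    q1 = rng.binomial(n, p1) / n
    q2 = rng.binomial(n, p2) / n
    rd = q1 - q2
    se = np.sqrt(q1 * (1 - q1) / n + q2 * (1 - q2) / n)
    covered += (rd - 1.96 * se <= true_rd <= rd + 1.96 * se)

print(covered / reps)  # close to 0.95, the coverage the procedure promises
```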

Furthermore, while embracing the uncertainty in trial results is laudable, it is crucial to ensure that trials are adequately powered to detect a clinically meaningful effect. A small sample size, while often necessitated by practical constraints, can increase the risk of a Type II error: failing to detect an effect when one truly exists. This could potentially limit the interpretability of the results and their applicability to wider patient populations.

Lastly, while p-values are often criticized for their misuse (p-hacking) and overreliance, they do play an important role in hypothesis testing and controlling the rate of false positive findings. It would be important to maintain a balanced perspective that incorporates both effect estimates and their precision (confidence intervals), as well as hypothesis tests (p-values) to inform clinical decisions.

I have less and less faith in guideline writers in recent years. And I am less charitable than Dr. Mandrola tends to be. I have a feeling things are not often on the up-and-up, and that some of it is in fact nefarious. The conflicts of interest these days are so large as to block out the sun.

The latest on HFpEF (i.e., SGLT2 inhibitors for everyone), for example, is nauseating.

I would much prefer if guidelines highlighted those things that are clearly “settled” (if I may use that loaded word), and things that are clearly harmful, and leave everything else as “considerations”, rather than the current completely arbitrary and recipe-centric “classes”.