Editor's note: The writer (me) inverted the results of one of the trials used as an example. This post now contains the correct results. Thanks to an astute reader.
The NEJM published this month a study on an important medical question that may upend the way we use medical evidence.
This has nothing to do with the specific medical question. It has everything to do with how we interpret the results.
First some background.
Medical studies have been like running races. A drug or device beats the standard of care. Or it does not.
But. Unlike a foot race, wherein you can (nearly always) see who wins, the judge of medical studies is statistics.
Did the new treatment reduce the bad outcome so much so that the difference meets a statistical threshold? Was the effect true signal and not noise?
This way of judging and declaring a winner creates a challenge for using these medical studies to treat patients.
Two examples explain the challenge of using statistics to judge science
In large studies, a tiny difference in outcomes—one that is not “clinically” significant—can easily reach statistical significance.
For instance, in the FOURIER trial, the super-expensive PCSK9 inhibitor evolocumab reduced a primary composite outcome (CV death, MI, stroke, unstable angina, coronary revascularization) vs placebo.
The absolute risk reduction was just 1.5%, and it was driven by non-fatal outcomes. There was no difference in cardiovascular death or all-cause death. Yet, because trialists enrolled more than 27,000 patients, this tiny difference yielded a highly significant statistical test.
And the conclusions were that patients with heart disease benefit from this drug.
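To see how sample size drives this, here is a minimal sketch of a pooled two-proportion z-test. The 9.8% vs 11.3% event rates are roughly the published FOURIER figures; the split of about 13,750 patients per arm is an assumption for illustration:

```python
from statistics import NormalDist

def two_prop_p(p1, n1, p2, n2):
    """Two-sided p-value from a pooled two-proportion z-test."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = (pooled * (1 - pooled) * (1 / n1 + 1 / n2)) ** 0.5
    z = (p1 - p2) / se
    return 2 * NormalDist().cdf(-abs(z))

# ~13,750 per arm is an assumed even split of FOURIER's 27,000+ patients
p = two_prop_p(0.098, 13750, 0.113, 13750)
# p lands far below 0.05: at this scale, a 1.5% absolute
# difference easily clears the significance threshold
```

With tens of thousands of patients, the standard error shrinks enough that even a clinically modest gap produces a vanishingly small p-value.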
Contrast this with the THAPCA trial of cooling pediatric survivors of cardiac arrest.
In smaller studies, a large and potentially clinically significant difference may not reach statistical significance.
In THAPCA, trialists reported that survival with a good functional outcome at one year occurred in 20% of patients cooled to 33 degrees vs 12% of those kept at a normal body temperature of 36.8 degrees.
That 8% better survival in absolute terms did not reach statistical significance. The conclusion therefore read that therapeutic hypothermia “did not confer a significant benefit in survival with a good functional outcome at 1 year.”
This massive improvement in survival was declared not statistically different because too few patients were randomized.
The statistical test suggested that these results were not surprising enough given the null-hypothesis assumption of no real difference. (I know; that is a wacky way of saying that we cannot exclude the possibility of noise.)
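The same test shows the flip side. With roughly 130 patients per arm (an assumed figure for illustration, not the exact THAPCA count), an 8-percentage-point difference does not clear the conventional 0.05 threshold:

```python
from statistics import NormalDist

def two_prop_p(p1, n1, p2, n2):
    """Two-sided p-value from a pooled two-proportion z-test."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = (pooled * (1 - pooled) * (1 / n1 + 1 / n2)) ** 0.5
    z = (p1 - p2) / se
    return 2 * NormalDist().cdf(-abs(z))

# 20% vs 12% survival, with an assumed ~130 patients per arm
p = two_prop_p(0.20, 130, 0.12, 130)
# p stays above 0.05, so this large difference is
# declared "not significant" despite its size
```

Same arithmetic, same test; only the denominator changed. That is the whole asymmetry between FOURIER and THAPCA.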
The trial that breaks this dogmatic way of judging science is called the ELAN trial.
Swiss-led investigators asked the question of when to start oral anticoagulation drugs after a patient has a stroke due to a blocked blood vessel in the brain. We call this acute ischemic stroke or AIS.
The two choices are early (within days to a week) or later (about 2 weeks). Current practice is to start later, so early initiation is the active arm.
The primary outcome is reasonable—recurrent stroke, blood clot elsewhere in the body (systemic embolism), bleeding outside the brain, bleeding in the brain or death due to blood vessel disease in the first 30 days. In other words—a composite of bad things that can occur from not treating (more clots) or treating (bleeding).
In multiple centers, slightly more than 2000 patients were randomized to either early or later starting of the anticoagulant drugs.
A primary outcome occurred in 2.9% of the patients in the early arm vs 4.1% in the later treatment arm.
The absolute risk reduction was 1.2%. The relative risk reduction was about 30%, reported as an odds ratio of 0.70.
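For the curious, the arithmetic connecting those three numbers can be checked in a few lines, using the event rates reported above:

```python
early, late = 0.029, 0.041   # primary-outcome rates from the two arms

arr = late - early           # absolute risk reduction: 1.2 percentage points
rrr = arr / late             # relative risk reduction: ~29-30%

def odds(p):
    return p / (1 - p)

odds_ratio = odds(early) / odds(late)   # works out to ~0.70
```

At event rates this low, the odds ratio and the relative risk nearly coincide, which is why 0.70 and a ~30% relative reduction tell the same story.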
The question for you is what to make of these results.
What I love about this study is that we don’t have to worry about dualities of interest. No one is making money from the results. It’s a matter of starting treatment early or later. It’s a pure scientific question.
The ELAN authors provide an extremely provocative way to use these results.
Next Monday, on the Study-of-the-Week, I will write about this new approach. We will also discuss it on our podcast next weekend. Stay tuned—and think.
A brief word of thanks. We are shocked at the response this newsletter has achieved. We now measure views in the millions. Thank you x 1000.