A Note On Criticism Of Yesterday's Study of the Week
Experts in trial analysis and statistics objected to my review of the ARCADIA Trial. This is a great teaching point.
Yesterday, someone made an error on the Internet. And I think it was me.
Here is a link to my take on the ARCADIA trial. In brief, the trial compared apixaban vs aspirin in patients who had had a stroke of unknown source and evidence of atrial cardiopathy.
The trial was stopped early for futility. Recurrent stroke (the primary endpoint) occurred in 40 patients in each group. The hazard ratio was 1.00.
I interpreted this in a similar way as the authors: as a negative study. Apixaban did not reduce the rate of recurrent stroke over aspirin in these patients.
My mistake, most clearly identified by Dr. Frank Harrell, a statistical expert from Vanderbilt, was that I wrongly equated the absence of evidence with evidence of absence. His full criticism is in the comments of yesterday’s post.
My understanding of Professor Harrell’s point is that the trial was underpowered, and we cannot be sure that there was no difference in stroke rates. IOW: we cannot say that ARCADIA provided evidence of absence of benefit from apixaban.
The first clue—one that I noticed but dismissed—was the confidence interval around the hazard ratio of 1.00, which ran from 0.64 to 1.55. I hesitate to write the next sentence because the probability that I get the wording exactly correct is very low. But suffice it to say that the confidence interval allowed for up to a 36% reduction in stroke with apixaban and up to a 55% increase in stroke with apixaban. Translation: these are wide and imprecise estimates.
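The arithmetic behind those two percentages is simply the distance of each confidence bound from a hazard ratio of 1. A minimal sketch:

```python
# 95% confidence interval around the observed hazard ratio of 1.00:
lower, upper = 0.64, 1.55

# A hazard ratio below 1 is a relative reduction in risk; above 1, an increase.
print(f"up to a {(1 - lower):.0%} reduction in stroke")  # 36% reduction
print(f"up to a {(upper - 1):.0%} increase in stroke")   # 55% increase
```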
The next question is why the estimates were so wide. The answer is that there were not enough primary outcome events because too few patients were enrolled.
Trialists have to make estimates before a trial as to how many patients to enroll. The main variables that go into this estimate are the expected event rates in the respective arms of the study, as well as the minimal clinically important difference (MCID) you would like to detect.
The authors estimated a 7% annual stroke rate in patients with atrial cardiopathy who had just had a stroke. They used a previous trial of apixaban vs aspirin (AVERROES), which found a 55% reduction in stroke with apixaban, to justify declaring a 40% reduction as their minimal clinically important difference. More translation: they “powered” ARCADIA to find a very large effect size.
You can use a rough sample size calculator to estimate how many patients this would require. They got to about 1100 patients.
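A rough version of that calculation can be sketched with the standard approximation for event-driven trials, in which the required number of outcome events depends only on the target hazard ratio, the significance level, and the power. This is my back-of-the-envelope sketch, not the authors’ actual method, and the 1.8-year average follow-up figure is my assumption:

```python
import math

def events_needed(hr, z_alpha=1.96, z_beta=0.8416):
    """Approximate events required for a log-rank test with 1:1
    randomization, two-sided alpha = 0.05, 80% power."""
    return 4 * (z_alpha + z_beta) ** 2 / math.log(hr) ** 2

# MCID of a 40% reduction in stroke (hazard ratio 0.60):
d = events_needed(0.60)
print(round(d))  # ~120 primary outcome events

# Convert events to patients, assuming a 7% annual stroke rate in the
# aspirin arm and roughly 1.8 years of average follow-up (my assumption):
p_aspirin = 1 - math.exp(-0.07 * 1.8)          # chance of stroke on aspirin
p_apixaban = 1 - math.exp(-0.07 * 0.60 * 1.8)  # chance of stroke under HR 0.60
n = d / ((p_aspirin + p_apixaban) / 2)
print(round(n))  # on the order of 1,200 patients
```

Under these assumptions, the sketch lands in the same ballpark as the roughly 1100 patients the trial planned for.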
But. But. There is another factor in these calculations that does not appear in any calculator. And that is pragmatism. Before I started reviewing trials, I did not know about pragmatism in the design of trials.
What I mean by pragmatism is that trials cost money and effort. So, if you try to detect smaller differences in the two groups, you need many more patients. Precision, like money, does not grow on trees.
Dr. Sanjay Kaul calls this the sample size samba: a dance (or tension) between enrolling enough patients to detect differences and doing so within pragmatic budget constraints.
Many cardiology trials are funded by industry. Industry has plenty of money. These tend to be mega trials that can detect small differences between groups. (Look at the FOURIER trial of evolocumab vs placebo…N ≈ 27,000)
I played around with the online sample size calculator and entered a 25% reduction in stroke with apixaban in ARCADIA. This would have required triple the number of patients. Well, that would have required a lot more funding. ARCADIA was not funded primarily by industry.
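Plugging the smaller effect into the same relationship shows where the “triple” comes from: required events scale with 1/ln(HR)², so the ratio of trial sizes for two target effect sizes falls out directly (a rough sketch, holding event rates and power fixed):

```python
import math

# Ratio of events (and, with fixed event rates, patients) needed to
# detect a 25% reduction (HR 0.75) vs a 40% reduction (HR 0.60):
ratio = math.log(0.60) ** 2 / math.log(0.75) ** 2
print(round(ratio, 2))  # ~3.15, i.e., roughly triple the patients
```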
Professor Harrell noted that the problem with ARCADIA, and really with many trials in cardiology, is that if you declare a minimal clinically important difference of a 40% reduction, you implicitly treat a 39% reduction in stroke as not worth detecting.
Testing Atrial Cardiopathy: I want to offer one rebuttal regarding the specifics of ARCADIA. This trial was not only a comparison of two drugs in reducing a binary outcome of stroke. ARCADIA was also about testing the concept of atrial cardiopathy as a screening tool for high stroke risk.
The authors estimated that their novel idea of atrial structural disease would significantly influence stroke rates. It did not. That may have been because the criteria for atrial disease (ECG, biomarker, echo) were not stringent enough. Or it might be that atrial cardiopathy is just not that strong a causal factor in recurrent stroke.
There is more to learn about AF and stroke.
I write this post to alert readers to an important teaching point about uncertainty in trial estimates. I noted the confidence intervals, but perhaps I was distracted by the hazard ratio of 1.00 and the exact same number of events in each group.
Being wrong is a great teaching tool. I am grateful for the interaction that Sensible Medicine brings out from real experts. I learned a lot.
One fear I have, though, is that if we apply Professor Harrell’s strict criteria, we will find that many trials have high degrees of uncertainty. Confidence intervals in our trials are often wide.
That’s ok. Because rare are the things that we should be certain about in Medicine. It’s one of the reasons that I do not love those colored boxes in guideline statements.
In his comment, Dr. Harrell offers some solutions to resolving the tension between not having a trial or having an imperfect trial. Please do read his entire comment.