Posting this article was a no-brainer. It is a critical appraisal deep dive by “friend of the stack” David Rind that combines the “letter to the editor” feel that we like so much with some nice EBM history. Enjoy.
Adam Cifu
Imagine performing a clinical trial of a drug for heart disease. You study the drug versus placebo, and it reduces mortality by 35%. In your studied population, 60% of the people have brown eyes and 40% have blue eyes. You perform a subgroup analysis and find that in the people with brown eyes, the drug reduces mortality by 34%; there is no pre-planned subgroup analysis for patients with blue eyes. What do you conclude about prescribing your drug for people with heart disease and blue eyes?
When I was responsible for evidence grading at an electronic medical resource, I got a first-hand look at the different levels of evidence for practices in various medical specialties as well as at the differing expectations for proof among specialists. I think most would agree that Internal Medicine specialties have been more focused on understanding clinical epidemiology while other areas of medicine have lagged behind. A dermatology editor looking at an unblinded trial of duct tape for warts in 51 children might say, “But we almost never have RCTs in dermatology – these are amazing data!” Emergency physicians looking at giving systemic steroids to children with sore throat might conclude the practice is safe based on a small trial that lacked the power to detect harms occurring in as many as 10% of the patients.
At the other end of the evidence spectrum, far removed from even other internal medicine subspecialists, are the cardiologists. Responsible for treating the number one cause of death in the US, cardiologists were frequently burned by inadequate clinical trials up through the 1980s and were not going to make that error again. A cardiology editor might object to trusting a clinical trial showing a mortality benefit on the grounds that it “only” had 500 patients in the treatment arm and “only” three years of follow-up, things a neurologist might be willing to kill for.
One of the lessons supposedly burned into the brains of cardiologists is that of ISIS-2. ISIS-2, published in 1988 in the Lancet, was an enormous trial of aspirin and streptokinase, alone or in combination, in patients having a heart attack. The reviewers and editors requested a large number of subgroup analyses, and these showed larger than average effects in some subgroups and smaller or harmful effects in others. The authors insisted that, if these results were to be published, they be allowed to add an additional subgroup analysis by astrological sign. This analysis showed that, although beneficial across all patients, aspirin was harmful to patients born under Gemini or Libra.
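The astrological-sign result is what chance alone tends to produce once a trial is sliced into many subgroups. Here is a minimal simulation sketch of that idea. Every number in it (12 equal subgroups, 250 patients per arm per subgroup, 12% control mortality, a relative risk of 0.77) is a made-up assumption for illustration and is not the ISIS-2 data; the function names are hypothetical too. The true benefit is identical in every subgroup by construction, so any subgroup that appears to show harm is pure noise.

```python
import random


def count_deaths(rng, n_patients, risk):
    """Number of deaths among n_patients who each die with probability `risk`."""
    return sum(rng.random() < risk for _ in range(n_patients))


def simulate_trial(rng, n_groups=12, n_per_group_arm=250,
                   control_risk=0.12, relative_risk=0.77):
    """Return a list of (treated_deaths, control_deaths), one pair per subgroup.

    All parameter values are illustrative assumptions. The true relative
    risk is the same in every subgroup.
    """
    treated_risk = control_risk * relative_risk
    results = []
    for _ in range(n_groups):
        treated = count_deaths(rng, n_per_group_arm, treated_risk)
        control = count_deaths(rng, n_per_group_arm, control_risk)
        results.append((treated, control))
    return results


def fraction_with_apparent_harm(n_trials=200, seed=1):
    """Fraction of simulated trials in which at least one subgroup has more
    deaths on treatment than on control, purely by chance."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        trial = simulate_trial(rng)
        if any(treated > control for treated, control in trial):
            hits += 1
    return hits / n_trials


if __name__ == "__main__":
    share = fraction_with_apparent_harm()
    print(f"Trials with at least one 'harmful' subgroup: {share:.0%}")
```

Under these assumed settings, most simulated trials should contain at least one subgroup that looks harmed even though the drug helps everyone equally, which is the trap the ISIS-2 authors were illustrating.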
Having trained in a specialty that demands high-quality clinical evidence is not the same thing as having trained as a specialist in clinical evidence – a clinical epidemiologist. When my co-authors and I concluded in a public report that a cardiology trial demonstrated a convincing benefit for a new treatment across all subgroups, truly expert and thoughtful cardiologists -- subspecialists that treat the condition in question, transthyretin amyloid cardiomyopathy (ATTR-CM) -- were unconvinced. Because I deal so well with my judgment being doubted, I feel compelled to justify the report’s conclusions here on Sensible Medicine.
ATTR-CM is a disease where amyloid deposition in the heart can lead to heart failure. A decade ago, cardiologists generally did not try to diagnose ATTR-CM since there was no disease-specific treatment.[i] People were just treated for their heart failure.
This all changed in 2019 when a TTR stabilizing drug, tafamidis, was approved after demonstrating convincing clinical benefits, including a 30% reduction in all-cause mortality. Non-invasive methods were developed for diagnosing ATTR-CM and patients were diagnosed and treated much earlier in the course of the disease.
In 2024, another TTR stabilizer, acoramidis, showed clinical benefits in ATTR-CM, but was unable to demonstrate a mortality benefit. The population was much healthier than that in the earlier trial of tafamidis, since patients were now being diagnosed much sooner, making comparisons across the trials difficult.
Another way of treating ATTR-CM is to reduce TTR rather than stabilizing it. The RNA interference drug vutrisiran does just that and was studied in the HELIOS-B trial, published in August of this year. The trial included 655 patients and, overall, the population was even a bit healthier than the one studied for acoramidis. Additionally, 40% of the patients studied were already taking tafamidis.
The preplanned analyses looked at all patients randomized to vutrisiran or placebo and at the subgroup of patients not taking tafamidis. Although the timing and methods of the analyses are a little confusing, vutrisiran reduced mortality in all patients by 35%, and by 34% in the 60% of patients on vutrisiran monotherapy. Do those numbers remind you of anything?
In a post-hoc analysis, the reduction in mortality in those already receiving tafamidis was about 41%, though not statistically significant versus placebo in this subpopulation (that included fewer than half the total patients). The absolute reduction in all-cause mortality over about three years was more than 6%, and remember this is a fairly healthy population of people with ATTR-CM. How convinced are you that vutrisiran reduces mortality in people with blue ey… oops, people who are receiving tafamidis?
The lesson of ISIS-2 was supposed to be that, unless you have a very good reason to look at a subgroup analysis, you should apply the population estimate of effect to all subgroups. In clinical epidemiology we would describe this by saying, the a priori hypothesis is that the effect in the subgroup is the same as in the overall population. In biostatistics, this would be the null hypothesis. You have to prove a subgroup effect, not prove no subgroup effect.
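One common way to put "you have to prove a subgroup effect" into practice is a test of interaction: compare the two subgroup estimates with each other, instead of asking whether each one separately reaches significance against placebo. The sketch below is a rough illustration, not the analysis from our report or from the trial. It treats the 34% and 41% reductions quoted above as ratio estimates of 0.66 and 0.59, and the two standard errors (0.20 and 0.30 on the log scale) are hypothetical placeholders, since the real confidence intervals are not given here.

```python
import math


def interaction_test(ratio_a, se_a, ratio_b, se_b):
    """z statistic and two-sided p-value for the difference between two
    log ratio estimates from independent subgroups.

    se_a and se_b are standard errors on the log scale.
    """
    diff = math.log(ratio_a) - math.log(ratio_b)
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    z = diff / se_diff
    # Two-sided p-value under a standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


if __name__ == "__main__":
    # Ratios come from the percentages in the text; the standard errors
    # are HYPOTHETICAL placeholders, not values from HELIOS-B.
    z, p = interaction_test(ratio_a=0.66, se_a=0.20,   # not on tafamidis
                            ratio_b=0.59, se_b=0.30)   # on tafamidis
    print(f"z = {z:.2f}, p for interaction = {p:.2f}")
```

With placeholder standard errors of this size, the difference between the two estimates is well within what chance would produce, so nothing here would argue against applying the overall estimate to both subgroups. Real inputs could of course change the numbers.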
Tafamidis is extremely expensive and vutrisiran will be even more expensive, such that the cost of combination therapy would be outrageous. These drugs are not a good value alone or in combination. But that should not affect our judgment of the evidence. The conclusion we came to was that we have high certainty that adding vutrisiran to the regimen of patients with ATTR-CM who are receiving tafamidis is substantially beneficial.
As mentioned above, the expert cardiology discussants and the panel at our meeting disagreed and felt a dedicated randomized trial is indicated. I think they are wrong, that we know the answer, and that patients should not be randomized to a treatment arm that is clearly inferior.
Not surprisingly, I’ve been asked about our conclusions by interested parties and by the press. A common misunderstanding is that we concluded that combination therapy is superior to monotherapy. We did not, though that may well turn out to be true. The randomized trial was of vutrisiran and some people in the population were on tafamidis. We can clearly see that there is no reason to view the tafamidis population as an important subgroup where vutrisiran has differential effects; there is no “effect modification” by tafamidis. The same is not true in the other direction. The quality of evidence for concluding that adding tafamidis to patients receiving vutrisiran monotherapy is beneficial is much weaker, though, as mentioned, this could well turn out to be beneficial.
David Rind is an academic primary care physician at Beth Israel Deaconess Medical Center and the Chief Medical Officer for the Institute for Clinical and Economic Review. Prior to his work at ICER, he was Vice President of Editorial and Evidence-Based Medicine at UpToDate.
[i] In a small number of younger patients, heart transplant was an option, and so cardiac biopsy would sometimes be performed to make the diagnosis.
I have three questions. 1, would a Bayesian analysis say more about the subgroups? Particularly if one had reason to think that 2 drugs are better than one. 2, To what extent does Simpson's paradox enter into objections to this sort of subgroup analysis and to what extent would creating a hierarchical model help? 3, are t-tests relevant to subgroup analysis?
These are sincere questions. I'm trying to learn. References to read and learn would be great too!
Ignorant layperson's opinion: I suspect that this result would not hold up with the RCT that the cardiology discussants wanted. The price of the drugs would justify further study.
I am an engineer not a doctor though. We don't have behemoth insurance companies paying for everything.