Recently, Sensible Medicine contributor, cardiologist, and researcher Dr. John Mandrola stepped on a beehive. He recounted the case of hormone replacement therapy (HRT), which was widely recommended for post-menopausal women on the basis of pathophysiology and observational studies. Later, the Women’s Health Initiative (WHI) randomized trial found net harm from this strategy, including increases in cardiovascular events. The adoption and withdrawal of HRT became a seminal example of a widespread medical practice contradicted by a randomized trial, or so Dr. Mandrola’s story went.
Readers were stimulated by Dr. Mandrola’s post, and two have written thoughtful responses and rejoinders. First, Dr. Chandrasekar Gopalakrishnan from Harvard further develops the concept of target trials. He argues that a target trial emulation that asks the same causal question as the WHI does, in fact, closely reproduce the results of the randomized trial. The question is: among post-menopausal women asked to start or continue hormone therapy vs. those asked to refrain from or stop hormone therapy, what happens? The two methods, one observational and one randomized, appear to be in alignment; the disputes arose because the original studies were asking different questions.
Dave Allely from Mt Sinai develops this argument from a different vantage point. He argues that the Women’s Health Initiative asks a question that does not apply to the vast majority of women who might wish to take hormone therapy in the perimenopausal period, because the WHI focused on older post-menopausal women. He further argues against several conclusions of the WHI, and while he agrees with Dr. Mandrola’s general message, he laments Dr. Mandrola’s use of this example.
Here is my take. I remain optimistic about target trials, but to me the proof is in the pudding. Take 100 ongoing RCTs, predict their results with target trial emulations using observational data, and let us see the concordance. Recently, the FDA commissioned a pilot project, RCT-DUPLICATE, to do just this. Sadly, the result was suboptimal: fewer than half of the RCTs could be reliably emulated or replicated with this design. More work is needed.
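The concordance test proposed above can be made concrete. The following is a minimal sketch that assumes one simple agreement criterion: an emulation "agrees" if its hazard ratio falls inside the RCT's 95% confidence interval. That criterion, the class and function names, and all the numbers are hypothetical. They are not RCT-DUPLICATE data, and the project's own agreement metrics may differ.

```python
from dataclasses import dataclass


@dataclass
class TrialPair:
    """One RCT and its observational emulation (hypothetical values)."""
    name: str
    rct_hr: float        # hazard ratio reported by the RCT
    rct_ci_low: float    # lower bound of the RCT's 95% CI
    rct_ci_high: float   # upper bound of the RCT's 95% CI
    emulated_hr: float   # hazard ratio from the target trial emulation


def estimate_agrees(pair: TrialPair) -> bool:
    """Assumed criterion: the emulated HR lies inside the RCT's 95% CI."""
    return pair.rct_ci_low <= pair.emulated_hr <= pair.rct_ci_high


def concordance(pairs: list[TrialPair]) -> float:
    """Fraction of RCT/emulation pairs that agree under the criterion above."""
    if not pairs:
        raise ValueError("need at least one trial pair")
    return sum(estimate_agrees(p) for p in pairs) / len(pairs)


# Hypothetical pairs for illustration only; these are not real trial results.
pairs = [
    TrialPair("trial A", 0.85, 0.75, 0.96, 0.90),  # inside CI: agrees
    TrialPair("trial B", 1.10, 0.95, 1.27, 0.70),  # outside CI: disagrees
    TrialPair("trial C", 1.00, 0.88, 1.14, 1.05),  # inside CI: agrees
    TrialPair("trial D", 0.70, 0.55, 0.89, 1.02),  # outside CI: disagrees
]

print(f"concordance: {concordance(pairs):.0%}")  # concordance: 50%
```

A stricter criterion, such as requiring the same direction and statistical significance, would be an equally reasonable choice. Which one to use is itself a judgment call.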
Mr. Allely’s points are well taken. My two cents is only that no RCT can show a therapy does not work under all possible circumstances, and indeed, HRT might benefit women at a different stage of menopause, at younger ages, or if given for a shorter duration. But what I want to see is randomized data showing that any of these claims are true. Pick younger women, randomize them, set a fixed duration, and show improvements in survival, quality of life, or both. The Danish study Mr. Allely discusses comes close to this, though I would want to see a prespecified power calculation, a registered protocol, and replication efforts. For trials assessing quality of life, we have to think carefully about our control arm to minimize inadvertently capturing a placebo effect. A mildly active placebo strategy might be optimal (beyond the scope of this article).
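For readers unfamiliar with what a prespecified power calculation involves, here is a minimal sketch of the textbook normal-approximation formula for comparing two event proportions. The 10% and 5% event rates are hypothetical placeholders. They are not estimates from the Danish study or any HRT trial, and a real protocol would typically also inflate the result for dropout.

```python
import math
from statistics import NormalDist


def n_per_arm(p_control: float, p_treated: float,
              alpha: float = 0.05, power: float = 0.80) -> int:
    """Sample size per arm for a two-sided test of two proportions.

    Unpooled normal approximation:
        n = (z_{1-alpha/2} + z_{power})^2 * (p1(1-p1) + p2(1-p2)) / (p1 - p2)^2
    No continuity correction or dropout inflation is applied.
    """
    if p_control == p_treated:
        raise ValueError("event rates must differ")
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    n = (z_alpha + z_beta) ** 2 * variance / (p_control - p_treated) ** 2
    return math.ceil(n)  # round up to whole participants


# Hypothetical example: 10% vs 5% event rate over the trial period.
print(n_per_arm(0.10, 0.05))  # 432
```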
Finally, the post concludes with comments from the man who started all this, Dr. John Mandrola. All of this is in the spirit of Sensible Medicine: a small place on earth where we can discuss and debate issues.
Vinay Prasad MD MPH
Causal inference using observational data and the target trial framework
Chandrasekar Gopalakrishnan, MD, MPH
Dr. Mandrola rightly highlights the perils of inferring causation from observational data. The example he picks is also damning because it may be the biggest embarrassment to the observational data enterprise. It was a bit of a coincidence that when I read Dr. Mandrola’s piece on Monday morning, I was preparing to give a lecture to students at Boston University on causal inference from observational data and the target trial framework. No prizes for guessing which example I highlighted as bad observational research.
I would like to add a bit more detail to Dr. Mandrola’s piece on why so many studies got the answer so horribly wrong. Dr. Mandrola’s major criticism of observational research is that the lack of random assignment of treatment leads to groups that are not similar. In causal inference language, we would call this confounding by indication, and it means we cannot assume the treatment groups are “exchangeable”.
Of course, this is a major concern for all observational studies. However, with expert knowledge of the possible confounders and high-quality data, one can reasonably assume conditional exchangeability (conditional on adjusting for the confounders), i.e., that the treatment groups are comparable after adjusting for confounders. But the stunning “medical reversal” in the hormone therapy example, repeated across many studies, happened not because they failed to control for confounding, but because so many did not even ask the right causal question!
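Conditional exchangeability can be shown with a toy example. Here is a minimal sketch using made-up counts in which one measured confounder drives both hormone use and lower baseline risk. The confounder is labeled health consciousness, a hypothetical label. Within each stratum the treatment does nothing, so the crude comparison shows a spurious benefit that disappears after standardizing over the confounder. Standardization is only one of several adjustment methods. The sketch also assumes the confounder is measured perfectly, which real studies cannot guarantee.

```python
from dataclasses import dataclass


@dataclass
class Stratum:
    """Counts within one level of a confounder (hypothetical numbers)."""
    treated_n: int
    treated_events: int
    untreated_n: int
    untreated_events: int


def crude_rr(strata: list[Stratum]) -> float:
    """Risk ratio that ignores the confounder entirely."""
    t_events = sum(s.treated_events for s in strata)
    t_n = sum(s.treated_n for s in strata)
    u_events = sum(s.untreated_events for s in strata)
    u_n = sum(s.untreated_n for s in strata)
    return (t_events / t_n) / (u_events / u_n)


def standardized_rr(strata: list[Stratum]) -> float:
    """Risk ratio standardized to the confounder distribution of the whole sample.

    Each stratum-specific risk is weighted by that stratum's share of everyone.
    This estimates the causal RR only if exchangeability holds within strata.
    """
    total = sum(s.treated_n + s.untreated_n for s in strata)
    risk_if_treated = 0.0
    risk_if_untreated = 0.0
    for s in strata:
        weight = (s.treated_n + s.untreated_n) / total
        risk_if_treated += weight * (s.treated_events / s.treated_n)
        risk_if_untreated += weight * (s.untreated_events / s.untreated_n)
    return risk_if_treated / risk_if_untreated


# Hypothetical data. Stratum 1: health-conscious women (5% risk, mostly treated).
# Stratum 2: everyone else (10% risk, mostly untreated). No true treatment effect.
strata = [
    Stratum(treated_n=800, treated_events=40, untreated_n=200, untreated_events=10),
    Stratum(treated_n=200, treated_events=20, untreated_n=800, untreated_events=80),
]
print(f"crude RR: {crude_rr(strata):.2f}")                # crude RR: 0.67
print(f"standardized RR: {standardized_rr(strata):.2f}")  # standardized RR: 1.00
```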
Many made the fundamental but fatal errors of not defining time zero (which, by the way, is guaranteed in a randomized trial) and not conceptualizing their study question as one that could hypothetically be answered in a randomized trial (one of the key ideas of the target trial framework). Below is a slide I presented to students on defining time zero.
Perhaps the most high-profile observational analysis on this question was published by Grodstein et al. in the New England Journal of Medicine using the Nurses’ Health Study (see below). Their effect estimates shown below (RR = 0.4, suggesting a 60% reduction in CVD) are, like those of so many other observational studies, stunningly wrong. But focus on how they defined their exposure groups.
They compared current users to never users. Can you think of how you could randomize someone to “currently” use a medication? YOU CAN’T!! The Women’s Health Initiative (WHI) RCT, published a few years later, naturally compared women who “initiated” HRT to those who didn’t (with random assignment, of course). It found a diametrically opposed hazard ratio of 1.29, a 29% increased risk of coronary heart disease among the hormone therapy users.
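The difference between "current users vs. never users" and "initiators vs. non-initiators" comes down to how time zero is assigned. Here is a minimal sketch of a single emulated enrollment under heavily simplified, hypothetical assumptions. One calendar year serves as time zero, and eligibility rests only on HRT history. Real emulations also exclude women with prior outcomes and often use grace periods. The record type, field names, and years are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Woman:
    """Hypothetical cohort record; field names are illustrative only."""
    id: int
    hrt_start_year: Optional[int]  # year HRT was started, None if never


def emulate_enrollment(cohort: list[Woman], time_zero: int) -> dict[int, str]:
    """Assign arms for one emulated trial that starts at `time_zero`.

    - Women who started HRT before time zero are prevalent ("current") users.
      They are excluded, because no trial can randomize someone to already
      be taking a drug.
    - Women who start HRT at time zero are "initiators".
    - Every other eligible woman is a "non-initiator", even if she starts
      later (an intention-to-treat style analogue).
    Follow-up for both arms would begin at time zero.
    """
    arms: dict[int, str] = {}
    for w in cohort:
        if w.hrt_start_year is not None and w.hrt_start_year < time_zero:
            continue  # prevalent user: not eligible for this emulated trial
        if w.hrt_start_year == time_zero:
            arms[w.id] = "initiator"
        else:
            arms[w.id] = "non-initiator"
    return arms


cohort = [
    Woman(1, 1980),  # long-term current user: excluded
    Woman(2, 1990),  # starts HRT at time zero: initiator
    Woman(3, None),  # never uses HRT: non-initiator
    Woman(4, 1994),  # starts later: still a non-initiator at time zero
]
print(emulate_enrollment(cohort, time_zero=1990))
# {2: 'initiator', 3: 'non-initiator', 4: 'non-initiator'}
```

A "current vs. never" comparison would instead keep woman 1 and drop woman 4. It would then compare long-term survivors of therapy with never users, with no common starting point.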
A reversal of this kind, repeated across so many studies, cannot be attributed to residual confounding. The observational studies were not even in the same ballpark because they did not ask the same causal question. Hernan and colleagues reanalyzed the Nurses’ Health Study data to mimic the question asked by the trial. They found results very similar to the trial’s and were able to replicate the increased risk seen soon after randomization in the WHI trial (see below).
Explicitly designing and analyzing our observational study to emulate the hypothetical target (randomized) trial we would have conducted can go a long way toward avoiding serious errors when trying to infer causation from observational data. This is the key idea of the target trial framework: it clarifies the causal question and the target estimand.
The target trial framework has made a significant impact on the way observational studies are conducted, and I think it will improve the quality of observational research over time. However, it is important to recognize that simply using this framework is not a get-out-of-jail-free card for the age-old problems of confounding, selection bias, or information bias.
Dr. Mandrola’s advice to beware of non-randomized comparisons is well taken and sound advice for consumers of the vast amount of observational research published. Because of the perverse publishing incentives in medicine and public health, the average observational study is likely to be pretty awful. But just as we critically read published RCTs to search for possible biases (yes, they exist there too!), many of us who conduct observational analyses for a living also believe we can judge an observational study on its merits and separate the wheat from the chaff.
Hormone replacement therapy has been widely misinterpreted
David Allely, 2nd Year Medical Student Mt Sinai
I didn’t think I would be writing anything more about hormone replacement therapy (HRT) for quite some time, but here we are. In a recent post on Sensible Medicine’s Substack, Dr. John Mandrola used the topic of HRT to make an important point. His basic message was that we need to be careful how we interpret data, especially in the setting of observational studies. Even consistent trends across many observational studies can be an artifact of confounding. I couldn’t agree more with this, but I think the choice of example was unfortunate.
The conclusion of Dr. Mandrola’s post leans on the Women’s Health Initiative (WHI), a study I am decidedly critical of (see my previous post about this study). Numbers pulled from the study were used to claim that, per 10,000 person-years, the use of HRT in post-menopausal women was associated with 8 additional coronary heart disease (CHD) events, 8 more pulmonary embolisms (PEs), 8 more strokes, and 8 more cases of invasive breast cancer.
These numbers were then added together to get 32 events per 10,000 person-years. The author of the post used this total to estimate the harm writ large across a population in which 15 million post-menopausal women were taking HRT. The conclusion: “In one year alone, HRT led to nearly 50k women being harmed” (emphasis mine). I think that this is a difficult conclusion to defend. In my mind, there are two major problems with the assumptions needed to reach it:
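The arithmetic behind the "nearly 50k" figure is simple to reproduce, and writing it out makes its hidden assumptions visible. In this minimal sketch (names are illustrative), the summed rate is applied to all 15 million users. That presumes the WHI population represents them, which is the point disputed below. It also counts events rather than women, so a woman with two events would be counted twice.

```python
# Excess events per 10,000 person-years, as quoted in the original post.
WHI_EXCESS_PER_10K = {
    "coronary heart disease": 8,
    "pulmonary embolism": 8,
    "stroke": 8,
    "invasive breast cancer": 8,
}


def excess_events(rates_per_10k: dict[str, int], users: int, years: float = 1.0) -> float:
    """Projected excess events if the trial's absolute rate differences applied
    unchanged to every user. Counts events, not distinct women."""
    total_per_10k = sum(rates_per_10k.values())  # 32 in the post's example
    return total_per_10k * users * years / 10_000


print(f"{excess_events(WHI_EXCESS_PER_10K, 15_000_000):,.0f}")  # 48,000
```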
1) The population studied by the WHI was far from representative of the general population of women taking HRT
2) The findings of the WHI linking harm to the use of HRT have since been shown to be ambiguous and insufficient to justify the national panic that led to approximately a 50% reduction in its use.
Was the WHI population representative of women taking HRT?
First and foremost, even if we accept these 32 events at face value, extrapolating the WHI results to the population of women taking HRT is a dubious prospect. The average age of the women in this study was 63 years. That is over a decade beyond the average age of onset of menopause.
The average BMI of women in the study was 28.5 (overweight), and just over a third of the women were obese. Half of the women were past or current smokers. Over a third were either treated for hypertension or had a blood pressure greater than 140/90 mmHg. Simply put, this was not a healthy population of women. Women are typically prescribed HRT to alleviate vasomotor symptoms, which occur around the onset of menopause; a representative average age would have been 51 years. Just under two-thirds of the women in this study were more than 10 years post-menopausal, and about a quarter were more than 20 years post-menopausal.
What does the WHI and other research tell us about risks of HRT?
Now let’s get back to that number 32. There were four types of events added to get there, each with 8 per 10,000 person years. I will group pulmonary embolism (PE) with venous thromboembolism (VTE), which includes both PE and DVT (deep vein thrombosis).
Coronary heart disease
As was pointed out in the Sensible Medicine post, decades of observational data suggest a benefit from HRT on cardiovascular health.
The increased risk of cardiac events in the WHI treatment group was largely attributable to women who were 20+ years post-menopause when they enrolled in the study. (Check out the figures in the study linked above if you want to appreciate how large the impact of advanced age was on the original WHI conclusions.)
In 2006, a meta-analysis of twenty-three RCTs with a total of just over 39,000 women found a 30% decrease in the incidence of heart attacks and cardiac deaths among young post-menopausal women taking estrogen alone or HRT. Young women were defined as those less than 10 years post-menopausal.
Note: In older women (more than 10 years post-menopause), events increased during the first year and then decreased by two years of HRT. This is consistent with the hypothesis that initiating HRT in patients with established atherosclerotic disease (read: older women) transiently increases the risk of events, after which the cardioprotective effect dominates. This pathophysiology is its own rabbit hole, but the basic idea is that the vasodilatory effect of estrogen can disrupt existing plaques in women with established lesions.
An RCT of 1,006 Danish women randomized participants to either no treatment or HRT (or ERT if s/p hysterectomy). In 2002, after 10 years of follow-up, the treatment group had a 50% reduction in the incidence of acute cardiac events, with no increased risk of breast cancer or stroke. Of note, these women were on average about 7 months post-menopause, a much more representative population by age.
The second arm of the WHI trial (ERT for women s/p hysterectomy) never showed any increased risk of CHD.
The statistically significant increase in risk seen in the 2002 analysis went away in all subsequent years.
Stroke/VTE
The WHI group found no increase in any kind of serious stroke that led to incapacitation or death. The definition that was used for stroke included transient, “subtle neurological deficits” that resolved in a day or two without sequelae.
Results from a 2015 Cochrane review: “Those who started hormone therapy less than 10 years after the menopause had lower mortality and coronary heart disease, though they were still at increased risk of venous thromboembolism compared to placebo or no treatment. There was no strong evidence of effect on risk of stroke in this group. In those who started treatment more than 10 years after the menopause there was high quality evidence that it had little effect on death or coronary heart disease between groups but there was an increased risk of stroke and venous thromboembolism.” Note: This review includes the WHI data.
Breast CA
I will refer to my previous post on this topic. The data linking HRT to an increased risk of breast cancer are weak, and ERT alone carried no such risk. The WHI was halted in 2002 after investigators found a non-statistically significant increase in risk: for every 238 women given HRT for five years, one would develop invasive breast cancer who otherwise wouldn’t have.
A 2003 update of the cohort found that the RR had decreased to 1.24 (from 1.26), but this time it was marginally significant. By the 2006 update of the cohort, the increased risk was gone.
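The "1 in 238 women over five years" figure and the "8 per 10,000 person-years" figure quoted earlier describe roughly the same absolute risk. A quick conversion shows how they line up. It assumes the excess risk accrues evenly over the five years and ignores loss to follow-up, both of which are simplifications.

```python
def nnh_to_rate_per_10k_person_years(nnh: float, years: float) -> float:
    """Convert a number needed to harm over a period into excess events per
    10,000 person-years, assuming the excess risk accrues evenly over time."""
    absolute_risk_increase = 1 / nnh  # excess risk over the whole period
    return absolute_risk_increase / years * 10_000


rate = nnh_to_rate_per_10k_person_years(nnh=238, years=5)
print(f"{rate:.1f} extra invasive breast cancers per 10,000 person-years")
# 8.4 extra invasive breast cancers per 10,000 person-years
```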
Final Thoughts
I completely agree with Dr. Mandrola’s point regarding the importance of conservative interpretation of data. RCTs provide a unique tool to increase our confidence when we infer causal links. Those inferences, like all others, should be limited by the quality of the data and their applicability to relevant populations. Such is the legacy of the WHI’s alarmism that the original 2002 warnings remain the most salient, even though the post-mortem analysis has found them toothless. HRT is a useful therapy for many women, and, as is true of any drug, it carries risk. Two decades later, it seems that the benefits of HRT are best realized within 10 years of the onset of menopause, perhaps ideally soon after onset. The WHI was a bad study, and I hope we are now reaching a point where the fear it created for countless women and clinicians has begun to lose its grip on the practice of medicine.
Hi everyone, it is John.
This column is already 2400 words, so I will be brief.
I first want to say how over-the-moon happy I am to have these cogent, wise critiques. I learned from both of them. Damn it… This is our goal here at Sensible Medicine: education via free and civil debate.
Thank you Dr Gopalakrishnan and (soon-to-be) Dr. Allely.
Dr. Gopalakrishnan is an expert in causal inference and he rightly points out that observational studies come in different varieties—some better than others. His lesson also reminded me that confounding isn’t the only problem with observational research. We also need to be aware of defining time zero. Thanks Dr. Gopalakrishnan.
The two remaining problems, of course, are (a) the vast majority of observational research is not done by people like Dr. G, and (b) as Vinay points out in the lead, the FDA-commissioned pilot project RCT-DUPLICATE found that only about half of the RCTs could be reliably emulated with observational techniques. And that is the existential issue: it’s hard to know which emulations are right and which are wrong.
I remain open-minded, and I often read Miguel Hernan’s free textbook. I consider myself a beginner student in causal inference. And yes, of course, RCTs have oodles of flaws too.
Dave the Knave’s comments are equally educational. He highlights the notion of external validity of a trial—for whom do these results apply? Translation of trial results is one of the most important jobs of a clinician.
He is absolutely correct that the WHI study included older women well past menopause, and that these results likely do not apply to younger women.
My friends at Penn State, Matt Nudy and Andrew Foy, have published a systematic review finding that:
Younger initiation of HRT may be effective in reducing death and cardiac events. However, younger HRT initiators remained at an increased risk of stroke, TIA and systemic embolism and this risk increased as average age increased. Younger menopausal women using HRT to treat vasomotor symptoms do not appear to be at an increased risk of dying or experiencing CHD events.
Dave is a medical student, so I assume he was not around to witness practice patterns in the 1990s. I was. It was common to use HRT not to treat symptoms but to prevent CV disease in older women. Sure, we now know that there are heterogeneous treatment effects based on age, but we did not know that then.
My point in highlighting WHI is not that it is the perfect or definitive study. In fact, nearly all studies should have expiration dates.
My point was merely to show that randomization was what was required to overturn a potentially harmful therapeutic fashion built on plausibility and weak observational data.