We Need Research that Answers Important Questions
Empagliflozin and Semaglutide and Oseltamivir, oh my!
I frequently get asked whether learning critical appraisal is still important. The argument is that, given the complex ways that industry (pharma and device makers) manipulates trials, appraising trials is hopeless unless you have at least an MPH. I totally disagree with this sentiment. A little bit of common sense, an understanding of study design, and some basic stats can carry you a very long way.
Discussed here are three studies — two RCTs and one observational study — that are interesting critical appraisal challenges. Each study was perfectly done and easily passes traditional “critical appraisal muster.” The problem is that they answer the wrong questions and miss opportunities to truly move medical care forward.
EMPEROR-Preserved was a perfect study, a double-blind RCT of 5988 patients with class II–IV HF and EF > 40% (HFpEF). Patients were assigned to empagliflozin (10 mg QD) or placebo. The primary outcome was cardiovascular death or hospitalization for heart failure. The study was funded by Boehringer Ingelheim and Eli Lilly.
Empagliflozin is a selective sodium-glucose transporter-2 (SGLT2) inhibitor. SGLT2 is responsible for most of the reabsorption of glucose in the kidneys. Thus, SGLT2 inhibitors reduce glucose reabsorption and increase urinary glucose excretion. They also reduce sodium reabsorption.
SGLT2-Is are impressive drugs. They’ve proven effective for diabetes and preventing the progression of chronic kidney disease. Because they lead to an osmotic diuresis – we pee out water with the glucose – they improve edema and lower blood pressure. The main adverse effect is what you’d expect: if you’re loading up your urine with sugar, you are prone to UTIs. Some people are also troubled by the diuresis.
EMPEROR-Preserved was a positive study. [Figure: trial results]
So, what is the problem? It is really a question of study design. The empagliflozin group had equal mortality but less hospitalization for HF. What else produces this outcome? Diuretics. Cheap, old-fashioned diuretics. Why was there no third arm of patients randomized to a diuretic? Said another way, why was this not a comparative efficacy trial? (Hint: recall the funding source.)
This was another perfect trial that the NEJM must have been thrilled to publish.[1] STEP TEENS was a double-blind, placebo-controlled trial that randomized adolescents with a BMI at or above the 95th percentile (or at or above the 85th percentile with at least one weight-related condition) to either semaglutide or placebo for 68 weeks, plus lifestyle intervention. The primary end point was the percentage change in BMI from baseline to week 68. The study was funded by, wait for it, Novo Nordisk.
I’m not even going to tell you the results of this study. Why? Because we knew what they would be before the trial was run. Semaglutide (Ozempic, Rybelsus, Wegovy), like any of the GLP1-RAs, is incredibly effective at producing weight loss. There is no reason to think it wouldn’t work in teens. So what question should this study have addressed?
We could ask, do we even want to go down this road? Do we want to treat teens with a drug that helps them to lose weight only while they are on the drug? Adolescence is the time when we establish habits that last a lifetime. Maybe this is the time we should be doubling down on creating childhoods that lead to healthy adulthood.
If we want to use drugs, then the real question relates to adverse effects. I am not interested in side effects over 68 weeks. I am interested in side effects in kids who start these drugs at 13 and then, because we never addressed their unhealthy lifestyle, need to be on the drugs at 22, 32, 42... Thankfully, these drugs have seemed safe so far (I have written a lot of prescriptions for them), but what happens when the cumulative dose approaches the dose that was associated with medullary thyroid cancer in animal studies?
Imagine consenting kids and parents to a 10- to 20-year study designed specifically to detect adverse effects. You’d probably have trouble enrolling patients. Maybe that would tell us something.
OK, one more excellent study published in an excellent journal.
First, a little background about oseltamivir (also known as Tamiflu). The drug is a neuraminidase inhibitor that decreases viral uptake and replication. It was first approved in the US in 1999.[2] Subsequently, quite a bit was revealed about pharma malfeasance during the early data analysis.[3] What we know now is that oseltamivir is modestly effective. Early clinical trials mostly studied outpatients presenting with an influenza-like illness. In these patients, the drug shortens the symptoms of flu by about a day, decreases the rate of lower respiratory tract complications requiring antibiotics, and leads to fewer hospital admissions. It is also associated with nausea and vomiting.
This recent study sought to assess if treatment with oseltamivir, given as soon as possible in patients hospitalized with confirmed or suspected influenza, is effective. This is a completely reasonable question. It extends what we know about how the drug works with mild illness to a sicker population. Observational data had suggested that it was effective in this population.
So what was the problem? Well, this is a therapy question. Any first-year medical student knows that therapy questions are answered with RCTs. Observational studies can be incredibly helpful for defining prognosis or generating hypotheses. The problem with using them for therapy questions is that their results agree with those of RCTs only about 75% of the time, and we don’t know which 25% are wrong. They also don’t really define the absolute benefit of interventions, making it hard to weigh harms and benefits and honestly counsel patients.
Instead of doing an RCT, the researchers did another observational study. This was a really good multicenter, prospective study. I spent way too much time looking at all the previous observational studies of this question, and I am pretty sure this is the best.
The results? As with previous studies, the early use of oseltamivir was associated with better outcomes. Those treated early had lower odds of intensive care unit admission (aOR: 0.24, 95% CI: 0.13–0.47) and in-hospital death (aOR: 0.36, 95% CI: 0.18–0.72).
These results are not surprising, and they do not change our management. We knew that observational data supported this intervention. Guidelines recommend this treatment. After this study, we are a little more confident in this practice, though I am still at a loss as to the actual magnitude of the benefit. It would have been enormously more useful if these 56 authors, at 24 hospitals, who worked on a 10-month prospective cohort study had done a 10-month RCT.
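To make the point about magnitude concrete, here is a minimal sketch of the arithmetic, not anything taken from the study. It converts an odds ratio into an absolute risk reduction for a given baseline risk. The baseline risks in the loop are hypothetical values chosen purely for illustration, and the calculation assumes (generously, for observational data) that the adjusted odds ratio reflects a causal effect that applies equally at every baseline risk.

```python
def risk_from_odds_ratio(baseline_risk: float, odds_ratio: float) -> float:
    """Convert an untreated (baseline) risk and an odds ratio into a treated risk."""
    if not 0.0 < baseline_risk < 1.0:
        raise ValueError("baseline_risk must be strictly between 0 and 1")
    baseline_odds = baseline_risk / (1.0 - baseline_risk)
    treated_odds = baseline_odds * odds_ratio
    return treated_odds / (1.0 + treated_odds)


def absolute_risk_reduction(baseline_risk: float, odds_ratio: float) -> float:
    """Absolute risk reduction implied by an odds ratio at a given baseline risk."""
    return baseline_risk - risk_from_odds_ratio(baseline_risk, odds_ratio)


if __name__ == "__main__":
    AOR_DEATH = 0.36  # point estimate for in-hospital death quoted in the post
    # Hypothetical baseline risks of in-hospital death, NOT taken from the study.
    for baseline in (0.02, 0.05, 0.10):
        arr = absolute_risk_reduction(baseline, AOR_DEATH)
        print(f"baseline {baseline:.0%}: ARR {arr:.1%}, NNT about {1 / arr:.0f}")
```

The same odds ratio implies very different absolute benefits depending on the baseline risk you assume, which is exactly the number an RCT would have measured directly.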
These three studies are well done. They took time, money, and expertise, and they were published in two of our best journals. They also do not move medicine forward in important ways. We need to incentivize our gifted researchers to go after the most important questions. This will mean publishing fewer studies (probably in fewer journals). It will mean shorter CVs. It will mean fewer projects for medical students (and college students, and high school students). We have published ideas about changing incentives to achieve this goal (see Part IV of Ending Medical Reversal).
One near-term fix might be to hire people from outside the research team to weigh in on the proposed research. These “consultants” (I hate to use that word) need to be free of any COI. They need not comment on the ethics or the statistics, but simply on whether we need to do this study and, if not, what study could be done that would be important and clinically meaningful. Maybe you couldn’t submit your research for publication if it had not been vetted in this way. A “Sensible Research Review Board”?
[1] Yes, journals care about clicks too…
[2] As a personal aside, it was the stimulus for one of my first articles in a major journal.
[3] We wrote quite a bit about this in Ending Medical Reversal.
Dear Adam,
I think this is a really interesting article, and thank you. I agree with the principle. As a heart failure physician, I am going to comment on your points about the EMPEROR-Preserved study. I was all ready to refute you by saying that most of the patients in both groups were already on a loop diuretic, but, somewhat amazingly, the investigators do not report these data anywhere in the paper or supplementary appendix (unless I have missed it).
However, the previous DELIVER trial of dapagliflozin in the same patient population did report this (UK authors win here!): 75% of both groups were on a loop diuretic. So it does appear that the benefit of dapagliflozin was in addition to the loop. https://www.nejm.org/doi/full/10.1056/NEJMoa2206286
I suspect there is a data-torturing paper showing that the benefit of SGLT2 inhibitors was the same with and without loop diuretics in the preserved ejection fraction population, but I have not found it yet. There is one showing that the presence or absence of diuretics did not seem to affect the efficacy of dapagliflozin in the reduced ejection fraction population: https://www.ahajournals.org/doi/10.1161/CIRCULATIONAHA.120.047077
So I think it looks like SGLT2 inhibitors probably do work in addition to the loop diuretics. I agree, however, that particularly in the preserved EF group a trial of a higher dose of loop diuretic versus an SGLT2 inhibitor would have been interesting, if less likely to get funded.
Finally, as someone who is handing out a lot of these medications (cost is much lower here in the UK, and funded by NHS), the clinical effect appears to be pretty clear and my patients do seem to feel better even when they have already been on a loop diuretic.
Best wishes,
James Gamble, cardiologist with an interest in heart failure
Oxford
Agree: Many principles can be learned without being a genius. I guess the challenge with appraisal is that people are either content or methodology “experts,” and to some extent you need both. As someone trained in methodology and working in health technology assessment for 15 years, I find dissecting the clinical aspects is generally the bigger challenge, while clinicians often lack training in the methods. Unfortunately, the discourse between these groups is not always constructive and is sometimes too heavily influenced by power struggles, vested interests, preconceptions, different perspectives (say, population vs individual), or a failure to distinguish between data and what they might mean for practice. Science and its application are tricky, and there should always be humility. I think of appraisal as a skill that is best honed lifelong. That’s why I enjoy following Sensible Medicine.