Another "Positive" Heart Failure Trial that Falls Short

The SUMMIT trial delivered positive results and glowing headlines but gosh its design was curiously weak.

Dec 09, 2024

At last month’s AHA meeting, Dr. Milton Packer presented positive results of the placebo-controlled SUMMIT trial of tirzepatide (Zepbound) in patients with heart failure and a preserved ejection fraction (HFpEF) and obesity.

The positive results made news. At least 57 news organizations covered the study. Experts called the study “practice-changing.” If this were true, it would be a big deal as obesity and HFpEF are so common.

Yet the trial falls way short of practice-changing. And it’s quite surprising that experts would say these things about it.

Some Brief Background

In the old days, most heart failure was due to a weak pump. We call it heart failure with a reduced ejection fraction (HFrEF). Now, most heart failure is due to a stiff heart that pumps fine but does not relax well. This is HFpEF. Patients with HFpEF often have other conditions, such as aging, diabetes, high blood pressure and obesity.

Treating HFpEF is a lot harder than treating HFrEF. The latter patient usually has one problem (the weak heart) and we have medicines that help. The patient with a stiff heart often has multiple non-cardiac issues that complicate treatment. Plus, there isn’t a heart muscle “relaxing” medicine.

The Trial

A total of 731 patients were randomly assigned to receive tirzepatide or placebo.

Investigators chose two co-primary endpoints. The first was a composite of cardiovascular (CV) death and worsening heart failure events; the latter could be a hospitalization for heart failure, a visit for intravenous diuretics, or intensification of oral diuretics. The second co-primary endpoint was the change in a patient-reported quality of life questionnaire score at 1 year.

Patients had an average age of 65 years, 55% were female, and the average body mass index was 38.

Results

The primary outcome of CV death or a first heart failure event occurred in 36 patients (9.9%) in the tirzepatide group and 56 patients (15.3%) in the placebo group, for a hazard ratio of 0.62 (95% CI, 0.41-0.95; P = .026).

The 5.4% absolute risk reduction in the primary endpoint was completely driven by lower rates of heart failure events (8% vs 14.2%). CV death was actually higher in the tirzepatide arm, but the number of deaths was low in both arms (8 vs 5).
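For readers who want to check the arithmetic, here is a minimal sketch using only the percentages quoted above. The function names are my own, and the number needed to treat is a back-of-envelope figure over the trial's follow-up, not something the trial reported here.

```python
# Event rates quoted in the text, as percent of patients in each arm.
PRIMARY_TIRZEPATIDE = 9.9
PRIMARY_PLACEBO = 15.3
HF_EVENTS_TIRZEPATIDE = 8.0
HF_EVENTS_PLACEBO = 14.2


def absolute_risk_reduction(control_pct, treated_pct):
    """Difference in event rates, in percentage points."""
    return control_pct - treated_pct


def number_needed_to_treat(arr_pct):
    """Reciprocal of the absolute risk reduction expressed as a proportion."""
    return 100.0 / arr_pct


arr_primary = absolute_risk_reduction(PRIMARY_PLACEBO, PRIMARY_TIRZEPATIDE)
arr_hf = absolute_risk_reduction(HF_EVENTS_PLACEBO, HF_EVENTS_TIRZEPATIDE)
nnt_primary = number_needed_to_treat(arr_primary)

print(f"Composite primary endpoint ARR: {arr_primary:.1f} percentage points")
print(f"Heart failure events ARR:       {arr_hf:.1f} percentage points")
print(f"Rough NNT for the composite:    {nnt_primary:.1f}")
```

The reduction in the heart failure component is at least as large as the reduction in the composite, which is what you would expect if the benefit comes from heart failure events and not from fewer CV deaths.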

The rate of hospitalizations due to heart failure was lower with tirzepatide (3.3% vs 7.1%), as was intensification of oral diuretics (4.7% vs 5.7%).

The second co-primary endpoint of change from baseline quality of life favored tirzepatide.

Other secondary endpoints also favored tirzepatide: longer 6-minute walk distance, greater change in body weight (-11.6%), and lower high-sensitivity C-reactive protein levels and systolic blood pressure (-4.7 mm Hg).

Comments:

It looks positive, doesn’t it? P-values show statistical significance. HF events are lower. QOL better.

The lead investigator, Dr. Packer, said that tirzepatide “changed the clinical trajectory of the disease.” Another expert, Jennifer Ho, associate professor of medicine at Harvard Medical School, Boston, Massachusetts, said, “This really is a practice-changing trial and cements this type of therapy as one of the cornerstones of obesity and HFpEF treatment.”

I have many concerns:

Problem One: When you say something is disease-modifying, that means it reduces hard, bias-free outcomes, such as death, CV death, myocardial infarction, or stroke. I argued last year that the GLP1a drug semaglutide was disease-modifying in patients with established heart disease and obesity because the SELECT trial showed reductions in hard clinical endpoints.

SUMMIT did not show that. Of the two components of its primary endpoint, CV death was not different in the two groups. The other component, HF events, is a problematic endpoint. As I wrote last week, hospitalization for heart failure is often a poor surrogate for total hospitalization.

Yet, that’s not the only problem with HF events as an endpoint. SUMMIT had a placebo arm, but surely, patients would learn their treatment assignment. GLP1a drugs cause GI symptoms, such as decreased appetite and bloating, as well as weight loss. Placebo does not. Since the decision to seek care for swelling or fluid gain is partially subjective, unblinding of treatment can bias the endpoint. In other words, an HF event requires a patient and clinician to make a decision.

Problem Two: Uncertainty. In 2018, Dr Packer wrote an editorial titled “Building Castles in the Sky.” It was a strong critique of an AF ablation trial called CASTLE-AF. One of his main criticisms was the small number of outcome events. He wrote that such trials are “notoriously nonreproducible.” This is an important criticism because small numbers of events may not sort signal from noise.

Dr Packer’s SUMMIT had extremely small numbers of events. CV deaths were 8 vs 5 in the tirzepatide and placebo groups. And the number of hospitalizations for heart failure — the more standard endpoint — was low, at only 12 and 26, respectively. Contrast this with two other HFpEF trials: the DELIVER trial of the sodium-glucose cotransporter 2 inhibitor dapagliflozin where there were nearly 750 hospitalizations for heart failure and PARAGON-HF of sacubitril-valsartan vs valsartan where there were nearly 1500.
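To see why event counts matter so much, here is a rough sketch. It uses a common textbook approximation, in which the standard error of the log hazard ratio is about sqrt(1/d1 + 1/d2) for d1 and d2 events in the two arms. This is my illustration, not the Cox model analysis the investigators ran. The second function assumes events split evenly between arms, and it applies the DELIVER and PARAGON-HF hospitalization counts purely as illustrative event totals.

```python
import math


def approx_hr_confidence_interval(hr, events_treated, events_control, z=1.96):
    """Approximate 95% CI for a hazard ratio from event counts alone.

    Uses SE(log HR) ~ sqrt(1/d1 + 1/d2), a standard approximation,
    not the trial's actual model.
    """
    se = math.sqrt(1.0 / events_treated + 1.0 / events_control)
    log_hr = math.log(hr)
    return math.exp(log_hr - z * se), math.exp(log_hr + z * se)


def ci_width_ratio(total_events, z=1.96):
    """Upper CI bound divided by lower bound, assuming an even split of events."""
    se = math.sqrt(4.0 / total_events)
    return math.exp(2.0 * z * se)


# SUMMIT primary endpoint: 36 vs 56 first events, reported HR 0.62.
low, high = approx_hr_confidence_interval(0.62, 36, 56)
print(f"Approximate 95% CI for SUMMIT: {low:.2f} to {high:.2f}")

# How the interval tightens as events accumulate (illustrative totals).
for label, total in [("SUMMIT (92)", 92), ("~750 events", 750), ("~1500 events", 1500)]:
    print(f"{label}: upper/lower bound ratio = {ci_width_ratio(total):.2f}")
```

The approximation lands close to the reported interval of 0.41 to 0.95, which tells you the width of that interval is almost entirely a function of having only 92 primary events. With several hundred events, the same point estimate would come with a far tighter interval.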

Problem Three: The SUMMIT authors do not tell us the number of all-cause hospitalizations. Without that number, it is hard to assess the importance of the HF event endpoint. For instance, what if tirzepatide caused an increase in non-HF hospitalizations?

Summary:

It’s curious to me that Eli Lilly and the academic authors designed such a small, underpowered trial. HFpEF and obesity are two of the most common medical conditions. Previous trials in HFpEF enrolled many thousands of patients. The sponsor clearly has the funds.

Instead, we get a trial with 13 total CV deaths, fewer than 50 hospitalizations for heart failure, and biased quality of life endpoints (because of unblinding).

GLP1 drugs may be disease-modifying in HFpEF, but it would take a far more robust trial to show it. Why this was not done is a mystery.

Thanks for reading Sensible Medicine. Your support keeps this site free of industry influence. Please feel free to share and comment. And of course consider becoming a subscriber. JMM

Frank Harrell

Dec 9, 2024

When are trialists going to figure out that counting all endpoints as equally bad is silly? The time-to-first-event endpoint considers all outcomes equally bad. The study is crying out for an ordinal outcome variable constructed from clinical consensus about severities of various outcomes. A treatment should get more credit for reducing mortality and less for other things.

Michael Kelberman

Really inexcusable for such a common condition, with very easy enrollment criteria. One would hope such an experienced investigator would be holding these companies to a higher standard, in fact, demanding it. This contributes to the slow (drip, drip, drip), inexorable loss of confidence that physicians and the public feel regarding “experts” and the system.
