How Not To Look at Data

The Study of the Week considers a major trial of a pricy device that breaks a basic rule of science. Yet a major journal and regulators allowed it. What should one think?

Nov 28, 2022

My early foray into science taught me two lessons. Last week I discussed the issue that no one checked our source data.

The other lesson turned on how not to look at data. An influential trial published this year failed that lesson.

First the story, then the big trial.

Many years ago, we set out to study an association of a variable with a procedural outcome in our hospital. We put our data on a spreadsheet and then coerced a student to run statistical tests to sort out signal from noise.

To our (mostly my) dismay, the association we thought would be obvious did not meet the threshold for significance.

Darn it, I thought. I am a doctor and I know what I see.

I said to Sean, our impossibly young and earnest statistical volunteer, let’s look at this subgroup of older patients. That will surely show what I would expect.

Sean stared at me with a frightened look on his face.

“Sean, what’s the matter?” I asked.

He spoke softly. “Dr. Mandrola, you can’t do that.”

I’m like, “Sean, first of all, I am a cardiologist, and second of all, it makes perfect sense to look at the data this way.”

Sean explained that you can’t change the way you look at data after the fact. Faint memories of a statistics class early in medical school started to come back. Let’s call Dr. Mann, our senior colleague, who, before private practice, was a legit academic.

David Mann heard only the first few minutes and explained that Sean was correct. We had to stick with the overall analysis.

The take-home: I was biased and wanted the data to confirm my belief. I didn’t like the results, so I wanted to look at the data another way.
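Why was Sean right? Here is a minimal sketch of the arithmetic, not anything we ran at the time. It assumes each extra look at the data is an independent test at the usual 0.05 threshold and that there is no true effect. Real subgroups of one dataset are not fully independent, so treat the numbers as illustrative. The function name is my own.

```python
def chance_of_false_positive(num_looks, alpha=0.05):
    """Probability that at least one of `num_looks` independent tests
    crosses the significance threshold when there is no real effect."""
    return 1 - (1 - alpha) ** num_looks


# One planned analysis versus several after-the-fact slices.
for num_looks in (1, 2, 5, 10):
    print(num_looks, round(chance_of_false_positive(num_looks), 3))
```

With one pre-planned analysis, the chance of a false alarm stays at 5%. Under these assumptions, ten looks push it to about 40%. Keep slicing and something will eventually look significant.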

In every break room in our clinic and hospital sit booklets for a new tool to measure pressures within the hearts of patients who have congestive heart failure.

CARDIOMEMS is an ingenious (and pricy) paperclip-like device that is inserted into the pulmonary artery, and then wirelessly transmits data to the clinician.

The idea is that having internal pressure data has to be better than the current technique of managing patients with heart failure—which includes daily weights and simple physical exam signs, such as swelling of the abdomen or lower legs.

FDA had approved the device in 2014 for a very narrow group of patients. But uptake had been slow.

The company wanted to expand use of the device. And if it helped people, there would be a confluence of interests: the company makes more profits and people would get better outcomes. That is what we want.

Regular readers know what comes next—a randomized controlled trial.

In the GUIDE HF trial, about 1000 patients with heart failure had the CARDIOMEMS device implanted and then were randomly assigned to care guided by the device vs standard care.

The authors chose a primary outcome that had three components: death, urgent heart failure visits, and heart failure hospitalizations.

During follow-up, there were 253 primary outcome events in the treatment arm versus 289 primary outcome events in the control arm. So, it was lower in the treatment arm. But we have to determine if this difference was statistically robust.

The relative risk reduction was 12%, which we express as a hazard ratio of 0.88. The 95% confidence interval surrounding that estimate ranged from 0.74 (a 26% reduction) to 1.05 (a 5% increase). The p-value, which measures how surprising these data would be if you assume there was no difference, was 0.16; the accepted threshold for significance is lower than 0.05.
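You can sanity-check how these three numbers hang together. This is a rough sketch, not the trial's actual statistical method. It assumes the confidence interval is symmetric on the log scale and uses a normal approximation. The function name is hypothetical.

```python
import math


def two_sided_p_from_hr(hazard_ratio, ci_low, ci_high, z_crit=1.96):
    """Approximate z-score and two-sided p-value from a hazard ratio
    and its 95% confidence interval (normal approximation on the log scale)."""
    log_hr = math.log(hazard_ratio)
    # The 95% interval spans about 2 * 1.96 standard errors on the log scale.
    std_err = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = log_hr / std_err
    # Two-sided tail probability of a standard normal: erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


z, p_value = two_sided_p_from_hr(0.88, 0.74, 1.05)
print(f"z = {z:.2f}, p = {p_value:.2f}")
```

The approximation lands at roughly 0.15, close to the reported 0.16 given that the published figures are rounded. Either way, it is nowhere near 0.05.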

The upshot is that because the confidence interval included a risk increase and because the p-value was well above the threshold, GUIDE HF is considered a non-significant trial.

This had to be a huge disappointment for proponents and makers of the device. Had this trial passed the standard, it would have expanded the device's use to an additional 1.2 million patients in the US, at approximately $20,000 per device. (You do the math.)

Had Sean been their statistical expert, the conclusion would have read that the device did not result in a lower composite endpoint rate of mortality and total heart failure events compared with the control group. Period. End of story.

The authors would have moved on to other research projects and the company would focus on some of their other products.

But that is not what happened.

Instead, the authors re-analyzed a subset of the data.

It turns out that three-fourths of the patients had finished follow-up before the pandemic and when they looked at this group, there were fewer primary events in the CARDIOMEMS arm, and now, the difference barely made the p-value threshold at 0.049.

In the one-quarter of patients followed during the pandemic there were no differences in events. This is weird because a wireless device ought to have over-performed during a time when in-person follow-up was curtailed. There were many other problems with the trial and its interpretation. I wrote about these in detail here.

Just like I tried to explain to Sean many years ago, the authors said the pandemic was a good excuse for after-the-fact slicing of data. The Lancet, an eminent journal, let the authors write this conclusion:

The authors and proponents will rebut my point by saying that the pre-COVID analysis was done in consultation with regulators at FDA. Which is true, but in no way makes it correct.

FDA has approved the device for this expanded indication. This has led to intense marketing. Doctors feel like they have to offer the device to patients. Why? Because the hospital across the street is doing it. Plus, doctors love extra data.

My colleagues and I tried publishing our criticisms in a peer-reviewed journal. The paper was rejected numerous times.

Conclusion:

I try. I really do try to avoid cynicism. I have zero doubt that collaboration with industry can improve healthcare.

But when we allow authors of an industry-sponsored study to break basic rules of statistics to push an expensive device through regulatory approval, what choice does one have?

One final screenshot. From the Lancet. (Abbott is the maker of CARDIOMEMS).

This post is open to all subscribers. If you like our work, and support independent appraisal, please consider becoming a paid subscriber. Sensible Medicine aims to remain free of the influence inherent in advertising models.

Amos

This reminds me of an assignment I was given as an undergrad psychology student. We had to find examples of bad research. A member of my team found the Journal of Parapsychology in the library. What a goldmine! We found a study that tested whether mice could receive "thought messages" from humans that could help them through a maze. The researchers timed how long it took mice to get through a maze, both with a human "thought" guide, and without. There was no significant difference between the two groups. However, the researchers sliced apart their data until they found some that would get them the p-value they needed. Their conclusion? Mice are able to receive thought messages from humans, but only before 10am!

Joseph Marine, MD

Another very informative post. Thank you.
