Study of the Week: Choices, oh my, the choices
I used to think *the* data yielded *the* result. This paper challenged that notion.
Summer tends to be slow for medical evidence, so this week I will dip into my slide deck of favorite studies. I will tell you about one of the most shocking studies I have seen in my career. It forever changed my view of evidence interpretation. Buckle up.
Medical studies aim to answer a question. This may be a trial comparing a drug to a placebo; it may be an observational study looking for a correlation between two things, such as fitness and longevity.
Researchers collect data. They put the data in a spreadsheet. Then they analyze it.
It turns out there are many ways to analyze data—not only in the statistical tests to sort out signal from noise but also in the choices of how to handle variables.
When you read a medical study, the researchers tell you how they analyzed the data. In the best-quality studies, this is set out before the experiment begins.
But…they analyze it in only one way.
Yet there are many ways they could have analyzed it.
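To make this concrete, here is a toy sketch (entirely made-up data, not from any study discussed here) of what "many ways to analyze the same data" looks like in practice: three defensible analyses of one simulated data set can return three different p-values.

```python
# Toy illustration: one simulated data set, three defensible analyses.
# The data are hypothetical; nothing here comes from the studies discussed below.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# A skewed outcome (think: a biomarker) in two groups, with a modest true difference
control = rng.lognormal(mean=0.0, sigma=1.0, size=80)
treated = rng.lognormal(mean=0.25, sigma=1.0, size=80)

analyses = {
    "t-test on raw values":         stats.ttest_ind(treated, control).pvalue,
    "t-test on log-transformed":    stats.ttest_ind(np.log(treated), np.log(control)).pvalue,
    "Mann-Whitney (nonparametric)": stats.mannwhitneyu(treated, control).pvalue,
}

for name, p in analyses.items():
    print(f"{name:30s} p = {p:.3f}")
```

None of these choices is wrong; they are all analyses a reasonable statistician might run, and depending on the data they can land on opposite sides of the 0.05 threshold.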
In 2018, the scientist Brian Nosek, from the University of Virginia, tested the idea that results of studies may hinge on subjective decisions made during the process of analyzing the data.
Pause there. How can that be?
You would think that the creativity in science would come in the question asked or the design of the experiment—not something as banal-seeming as analyzing results.
I called Nosek’s paper the most important study of the year.
Here is a quote:
“For years, when I read a scientific paper, I’ve thought that the data yielded the published result.”
Nosek and colleagues recruited 29 teams of expert researchers to analyze one data set to answer one simple question: were professional soccer referees more likely to give red cards to dark-skin-toned players than to light-skin-toned players?
The 29 teams of researchers analyzed the same data in 29 different ways. The picture shows the different approaches. The names are super-complicated. The point is that they are different.
Now look what happens:
Two-thirds of the expert teams of data scientists detected a statistically significant result and one-third found no significant difference. Two teams of experts found results that were highly suggestive of implicit bias among the referees.
This paper rocked me to my core. No paper I’ve read in medicine has ever said “we will analyze the data in multiple ways.” They analyze the data in one way.
But you can see how much difference it might make.
It turns out that this finding has been replicated, and Sensible Medicine co-founder Vinay Prasad was a co-author.
This group selected clinical scenarios in cancer medicine, such as survival in prostate, breast, or lung cancer. Many non-randomized studies have compared outcomes in patients given different treatment regimens in these scenarios.
They then ran more than 300,000 regression models, each one representing a potentially publishable study. Across these models they varied the methods used to reduce selection bias.
And…like Nosek’s group, they found that varying the analytic choices generated different results. In fact, their findings suggested that more rigorous efforts to reduce bias lowered the likelihood of positive findings.
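To get a feel for how a team arrives at hundreds of thousands of models, here is a minimal counting sketch. The choice dimensions below are hypothetical, not the ones Prasad’s group actually used; the point is only that a handful of reasonable options multiply quickly.

```python
# Counting a hypothetical "multiverse" of analyses: each combination of choices
# below is a distinct regression model that could, on its own, anchor a paper.
from itertools import product

covariates = ["age", "stage", "comorbidity", "performance_status", "diagnosis_year"]
covariate_subsets = 2 ** len(covariates)  # adjust for any subset of these: 32 options

bias_methods   = ["none", "propensity_matching", "propensity_weighting", "multivariable_adjustment"]
missing_data   = ["complete_case", "multiple_imputation"]
landmark_times = ["none", "3_months", "6_months"]

specs = list(product(range(covariate_subsets), bias_methods, missing_data, landmark_times))
print(len(specs))  # 32 * 4 * 2 * 3 = 768 specifications from just four choice dimensions
```

Add a few more dimensions (outcome definition, follow-up window, eligibility criteria, how each covariate is coded) and the count climbs into the hundreds of thousands.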
Here is the point.
Let’s say you read a study that finds a statistically significant association. Let’s then say the odds ratio is 1.20, meaning that exposure to something may increase the odds of an outcome by 20%.
The 95% confidence interval is 1.02 to 1.40. Roughly speaking, if the experiment were repeated many times, about 95% of intervals constructed this way would contain the true value; here, the plausible range runs from a 2% to a 40% increase in the odds.
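You can also work backward from those two reported numbers. A short sketch, assuming the interval was built the usual way, symmetric on the log-odds scale:

```python
# Recover the standard error, z-statistic, and approximate p-value
# from a reported odds ratio of 1.20 with a 95% CI of 1.02 to 1.40.
import math
from scipy.stats import norm

or_point, ci_low, ci_high = 1.20, 1.02, 1.40

log_or = math.log(or_point)                                   # 0.182
se     = (math.log(ci_high) - math.log(ci_low)) / (2 * 1.96)  # SE implied by the CI width: 0.081
z      = log_or / se                                          # about 2.26
p      = 2 * (1 - norm.cdf(abs(z)))                           # about 0.024

print(f"log(OR) = {log_or:.3f}, SE = {se:.3f}, z = {z:.2f}, p = {p:.3f}")
```

That single p-value, a bit under 0.05, is what earns the study its “positive” label, and it comes from one analytic path.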
This is a statistically significant result. The researchers will draw positive conclusions. Media may cover it as a positive study, potentially a breakthrough. Public or medical opinion may change.
Yet this result came from the researchers’ chosen analytic method. What if they had chosen a different way to approach the analysis, like the 29 research teams in Nosek’s study?
There might be different results…and different conclusions…and different policies.
My conclusions are simple:
There are obvious things in medicine. Fixing a hip fracture, antibiotics for infection, insulin for Type 1 diabetes.
But the vast majority of our knowledge is far less certain. This is okay. Normal even.
Yet I wonder…what would be the effect of open data? What if experimental data were available to independent scientists who could then analyze it in ways they see fit?
Truly important findings would surely pass muster.
But if a result can be affected by the choice of how to analyze the data, well, then, perhaps we, as in everyone, ought to be a lot less rapturous about any single positive finding.
Or a lot less despondent, if the results from a given analysis aren’t what one was hoping for.
Is it possible that some of the methods were simply poor to begin with, or poor choices for a particular use case, and shouldn’t have been used at all? I’d imagine that a poor model, compared with a good one, would end up skewing the results.