Choosing a Control Group
The Study of the Week considers one of the most curious choices of a control arm that I have seen
You had two choices in the lottery for senior science class.
Mr. Flexner taught science in the old way. As a reductionist, he had his students learn basic physiology, because that would explain human disease. If you did the work, which was hard, you got a good grade.
Mrs. Onderdonk focused on the mechanics of science. Her students actually did science. They thought about questions, designed experiments to answer them, and assessed the results. Grades depended on how students carried out the scientific method.
Let’s consider a scientific question in medicine.
Patients may develop a disease that requires admission to the hospital, mostly because of lung problems, such as the need for oxygen. The disease might affect other organs—either directly or indirectly.
Indirect organ damage is common in medical conditions that are severe enough to warrant admission to the hospital. For instance, you could be admitted for inflammation of the pancreas and develop a heart arrhythmia just from being sick. You could be admitted for pneumonia and develop brain issues from the low oxygen.
The question that a group of investigators from Leicester in the UK asked is how patients recover from infection with SARS-CoV-2.
This would be a great project for the class of Mrs. Onderdonk. I will tell you about the study's results, and then we can consider how a science teacher would grade the experiment.
The C-MORE/PHOSP-COVID collaborative group published their results in an important journal, The Lancet Respiratory Medicine.
They screened nearly 3000 patients and found about 250 who had been admitted to the hospital with COVID-19 and then discharged. They then did outpatient MRI scans of multiple organs: lung, heart, brain, liver, and kidneys.
Now. Stop there.
These were patients ill enough to be admitted for a viral infection. Median length of stay was 6 days.
When you do imaging on people who have been that sick, you are going to need a control arm.
Mrs. Onderdonk spent months teaching her students that the best control arm is one chosen at random before any intervention. Randomization balances known and unknown factors that could affect the primary outcome.
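To see what that balancing looks like, here is a minimal sketch in Python, my own illustration rather than anything from the study: it assigns a hypothetical population to two arms at random and checks a covariate such as age.

```python
import random

# Toy illustration (not from the paper): assign people to two arms at random
# and check whether a covariate such as age ends up balanced.
random.seed(0)

ages = [random.gauss(58, 12) for _ in range(1000)]  # hypothetical population

active, control = [], []
for age in ages:
    (active if random.random() < 0.5 else control).append(age)

mean = lambda xs: sum(xs) / len(xs)
print(f"active arm:  n={len(active):4d}, mean age = {mean(active):.1f}")
print(f"control arm: n={len(control):4d}, mean age = {mean(control):.1f}")
# The two means land close together: random assignment balances this
# covariate (and unmeasured ones) without the investigator choosing anything.
```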
This study, however, is not an RCT; it is an observational comparison, looking back at results in two groups.
The authors chose to compare the MRI findings from acutely ill patients hospitalized for an infection to normal people from the community (n = 52) who had no recent illness.
The primary outcome of interest was excess burden of multiorgan abnormalities (two or more organs) relative to controls.
Results:
Multiorgan abnormalities on MRI were far more frequent in the COVID-19 group (61% vs 27%). COVID-19 was a strong predictor of having an MRI abnormality (odds ratio 2.9).
Compared with controls, patients were more likely to have MRI evidence of lung abnormalities (p=0.0001), brain abnormalities (p<0.0001), and kidney abnormalities (p=0.014), whereas cardiac and liver MRI abnormalities were similar between patients and controls. (Nice to know that cardiac abnormalities were similar, but that's a post for another day.)
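Where does a number like that odds ratio come from? Here is a back-of-the-envelope sketch using counts I approximated from the published percentages and group sizes (259 and 52). The crude ratio comes out higher than the reported 2.9, which I take to mean the published figure is a modeled or adjusted estimate; that is an assumption on my part.

```python
# Back-of-the-envelope crude odds ratio from the published percentages.
# The counts below are my approximations (259 patients, 52 controls),
# not exact figures taken from the paper.
covid_abn, covid_total = round(0.61 * 259), 259   # ~158 with multiorgan findings
ctrl_abn, ctrl_total = round(0.27 * 52), 52       # ~14 with multiorgan findings

odds_covid = covid_abn / (covid_total - covid_abn)
odds_ctrl = ctrl_abn / (ctrl_total - ctrl_abn)
print(f"crude odds ratio ~ {odds_covid / odds_ctrl:.1f}")  # roughly 4.2
# The paper reports 2.9, which I assume reflects a modeled or adjusted estimate.
```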
Authors' conclusion:
After hospitalisation for COVID-19, people are at risk of multiorgan abnormalities in the medium term. Our findings emphasize the need for proactive multidisciplinary care pathways, with the potential for imaging to guide surveillance frequency and therapeutic stratification.
Media Attention:
This paper received a high Altmetric score of 2252. That places it in the top 5% of all research outputs scored by Altmetric. It was covered by 194 news outlets and tweeted by nearly 2000 accounts, including some very high-profile doctors.
Comments:
Mrs. Onderdonk has long taught her students to choose a control arm with care. In the absence of randomization, you want the comparator arm to be as similar as possible to the active arm.
That is not what happened in this study. Not at all. For a reader of a study, the place to assess the control arm relative to the active arm is Table 1:
There we learn that patients in the control arm were far healthier. Control patients were younger (by 8 years on average) and had less obesity, less smoking, less high blood pressure, less pre-existing lung disease, and less cardiac disease.
What do you think will happen comparing MRI images from sicker older patients to those who are younger and healthier? The paper is open access so you can see it yourself.
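To make the point concrete, here is a minimal simulation, with numbers I made up rather than data from the paper: even when the exposure has zero effect, comparing an older, sicker group to a younger, healthier control group produces a large gap in "abnormal MRI" rates, because age and comorbidity drive the findings on their own.

```python
import random

# Toy simulation with made-up numbers (not the paper's data): MRI abnormality
# risk here depends only on age and comorbidity -- the "exposure" does nothing.
random.seed(1)

def abnormal(age, comorbid):
    # Hypothetical risk model: older and sicker people have more findings.
    p = 0.05 + 0.008 * (age - 40) + (0.30 if comorbid else 0.0)
    return random.random() < min(max(p, 0.0), 0.95)

# "Active" arm: older, more comorbidity (like the hospitalized patients).
active = [abnormal(random.gauss(58, 10), random.random() < 0.5) for _ in range(259)]
# "Control" arm: younger, healthier (like the community volunteers).
control = [abnormal(random.gauss(50, 10), random.random() < 0.2) for _ in range(52)]

rate = lambda xs: 100 * sum(xs) / len(xs)
print(f"abnormality rate, active:  {rate(active):.0f}%")
print(f"abnormality rate, control: {rate(control):.0f}%")
# The active arm shows a markedly higher rate even though the exposure had no
# effect at all in this model -- the gap is pure confounding by age and health.
```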
Summation
In this science experiment, the authors did imaging of patients who had just been discharged from a near-week-long hospital stay for an acute infectious disease.
They then compared the imaging abnormalities in this group NOT to a group of patients admitted to the hospital for some other week-long illness, but to a much younger, healthier control arm that had no illness at all.
The authors attempted statistical adjustments, but no adjustment can account for differences of this magnitude. Notable also is the fact that there were 259 patients in the active arm and only 52 in the control arm.
Anyone want to guess the grade Mrs. Onderdonk would give this experiment?
And what would she say about the fact that it was published in a big-name journal and garnered this much attention from medical doctors online?
A note to readers. Sensible Medicine continues to grow. We appreciate your support. Our aim is to provide ad-free content not available in traditional places. This requires reader support.
We welcome guest posts of less than 1200 words.
Andrew Foy and I recorded a great podcast yesterday. Stay tuned this week for our conversation.
That this got published in Lancet should tell you a LOT.
Why the fear mongering do you think?
Whoa. Or LOL. This sounds like a typical in vitro diagnostic validation study from 20 years ago (although some are still performed today). Modern IVD practitioners know never to compare a diseased group with normal controls (even matched). The performance of tests investigated that way can approach 100% specificity and 100% sensitivity (c statistic of 1.0), but end up a coin toss (c statistic of 0.5) in a prospective all-comers study.
This should not have passed the first round of peer review, perhaps triaged to not even be sent to reviewers. It is just not science. It's wishful thinking.