Let's Do the Cochrane Review of Physical Measures to Reduce the Spread of Viruses
The Study of the Week delves into the recent Cochrane Review because it offers a trove of lessons in critical appraisal
Professor Tom Jefferson of Oxford led a large group of authors for this fifth update of the evidence review of physical measures to interrupt the spread of respiratory viruses. Their previous review, published in 2020, did not include SARS-CoV-2.
Background Regarding Meta-analyses, Systematic Reviews and Cochrane Reviews
In the pyramid of evidence, a systematic review and meta-analysis sits at the highest level. You can see from the picture that the lowest levels are expert opinion, case reports, case series, and observational studies.
The pandemic underscored two (of many) problems with observational studies of non-randomized groups. One is dissimilar groups and the other is analytic flexibility.
When groups aren’t chosen at random, one group may do better because it has healthier characteristics—not because of the intervention. And when you study the effect of an intervention over one time period, the choice of a different time period may yield different results.
Randomized trials eliminate these biases. They set a time zero and randomization (mostly) balances known and unknown characteristics.
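To make that concrete, here is a tiny simulation (illustrative only; the "healthy habits" score and all numbers are made up) showing that random allocation tends to balance a baseline characteristic across two arms, even one nobody thought to measure:

```python
import random

# Illustrative simulation (not from the review): random allocation tends to
# balance a baseline characteristic across arms, even one nobody measured.
random.seed(42)

n = 10_000
healthy_habits = [random.gauss(0, 1) for _ in range(n)]  # hypothetical baseline score

arm_a, arm_b = [], []
for score in healthy_habits:
    (arm_a if random.random() < 0.5 else arm_b).append(score)

mean = lambda xs: sum(xs) / len(xs)
print(f"Arm A: n={len(arm_a)}, mean baseline score {mean(arm_a):+.3f}")
print(f"Arm B: n={len(arm_b)}, mean baseline score {mean(arm_b):+.3f}")
# The two means come out nearly identical, so any difference in outcomes is
# more plausibly due to the intervention than to baseline imbalance.
```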
The problem with RCTs is that they often study selected populations and answer narrow questions. That is where systematic reviews and meta-analyses come in. These combine the trials to estimate an overall effect.
Sadly, though, the medical literature is over-populated with poor meta-analyses. That’s because computer software allows anyone to enter trials and get an overall effect.
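For readers curious what that software is actually doing, here is a minimal sketch of the core arithmetic: fixed-effect, inverse-variance pooling of relative risks on the log scale. The three "trials" below are invented for illustration and have nothing to do with the Cochrane review.

```python
import math

# Minimal fixed-effect (inverse-variance) pooling of relative risks.
# The three "trials" below are invented for illustration only.
trials = [
    # (events_intervention, n_intervention, events_control, n_control)
    (30, 1000, 35, 1000),
    (12,  400, 10,  400),
    (80, 5000, 90, 5000),
]

weighted_sum, total_weight = 0.0, 0.0
for a, n1, c, n2 in trials:
    log_rr = math.log((a / n1) / (c / n2))  # log relative risk in this trial
    var = 1/a - 1/n1 + 1/c - 1/n2           # approximate variance of log(RR)
    weight = 1 / var                        # inverse-variance weight
    weighted_sum += weight * log_rr
    total_weight += weight

pooled_log_rr = weighted_sum / total_weight
se = math.sqrt(1 / total_weight)
low, high = math.exp(pooled_log_rr - 1.96 * se), math.exp(pooled_log_rr + 1.96 * se)
print(f"Pooled RR: {math.exp(pooled_log_rr):.2f} (95% CI {low:.2f} to {high:.2f})")
```

The arithmetic is the easy part. The judgment about which trials to enter and how to grade their quality is what separates a rigorous review from a poor one.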
Cochrane reviews are different. These are considered the gold standard. It’s beyond the scope of this column to explain why this is, but in short, Cochrane reviews are known for their rigor, strict adherence to methodology, pre-registration and transparency.
Changes from Previous Reviews
The authors made one major change in their methods from 2020. For this review, Jefferson and colleagues found sufficient randomized trials and therefore excluded observational studies.
This change enabled more robust evidence summaries from high-quality studies, which are much less prone to the multiple biases associated with observational studies.
This was a massive change, because during the pandemic people who chose to use physical interventions (such as masking) were likely to do multiple other things to stop the spread of a virus. That’s why you need randomization.
The Current Study
The authors included 11 new RCTs and cluster RCTs with more than 610,000 participants. Six of the new trials were conducted during the pandemic—two from Mexico, and one each from Denmark, Bangladesh, England, and Norway.
The authors pre-registered three main questions: 1) medical/surgical masks compared to no masks; 2) N95/P2 respirators compared to medical/surgical masks; and 3) hand hygiene compared to control.
The Results: (I will skip hand hygiene due to time constraints and because clean hands also prevent bacterial infections.)
For the first question – medical/surgical masks vs no masks, the results are in the picture.
Nine trials considered the risk of any viral illness. The point estimate of the relative risk was 0.95 with 95% CI ranging from 0.84 (a 16% lower rate with masks) to 1.09 (a 9% higher rate with masks). We consider this a non-significant difference.
Six trials considered the risk of SARS-CoV-2 infection. The point estimate was 1.01 with 95% CI ranging from 0.72 (a 28% lower rate) to 1.42 (a 42% higher rate). We also consider this a non-significant difference. (Notice the width of the confidence intervals.)
For the second question – N95/P2 respirators compared to medical/surgical masks, the results are in the picture.
Three trials conducted in hospital settings with healthcare workers considered the risk of any viral illness. The point estimate of the relative risk was 0.70 with 95% CI ranging from 0.45 (a 55% lower rate with the stronger masks) to 1.10 (a 10% higher rate with the stronger masks). We also consider this a non-significant difference.
Five trials compared the two types of masks using the outcome of laboratory-confirmed influenza. The point estimate was 1.10 with 95% CI ranging from 0.90 (a 10% lower rate) to 1.34 (a 34% higher rate). We consider this a non-significant result.
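An aside on how those percentages are read: the "% lower" and "% higher" figures are simply the relative risks and their confidence-interval bounds re-expressed relative to 1.0, and a confidence interval that spans 1.0 is what makes a result non-significant. A minimal sketch using the numbers quoted above:

```python
# Re-expressing a relative risk (RR) and its 95% CI bounds as percent
# differences, matching the phrasing used above. RR below 1.0 means a lower
# rate with the intervention; above 1.0 means a higher rate.
def describe(label, rr, ci_low, ci_high):
    def pct(x):
        return f"{abs(x - 1) * 100:.0f}% {'lower' if x < 1 else 'higher'}"
    nonsignificant = ci_low <= 1.0 <= ci_high  # CI spans 1.0 -> not significant
    print(f"{label}: RR {rr:.2f} ({pct(rr)}), "
          f"95% CI {ci_low:.2f} to {ci_high:.2f} ({pct(ci_low)} to {pct(ci_high)}); "
          f"{'non-significant' if nonsignificant else 'significant'}")

describe("Masks vs no masks, any viral illness",  0.95, 0.84, 1.09)
describe("Masks vs no masks, SARS-CoV-2",         1.01, 0.72, 1.42)
describe("N95/P2 vs surgical, any viral illness", 0.70, 0.45, 1.10)
describe("N95/P2 vs surgical, lab-confirmed flu", 1.10, 0.90, 1.34)
```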
The Conclusions
The authors wrote:
“The pooled results of RCTs did not show a clear reduction in respiratory viral infection with the use of medical/surgical masks.”
“There were no clear differences between the use of medical/surgical masks compared with N95/P2 respirators in healthcare workers when used in routine care to reduce respiratory viral infection.”
But they added an important caveat: there was low to moderate certainty of the evidence. That reduces confidence in the estimate.
Interpretation
I will start with a Tweet from one of the wiser voices during the pandemic, pediatrician Alasdair Munro:
I like this comment because it exposes two extremes that we should avoid.
The first is easy, right? Of course, a mask may work in a physics lab on a robot. But that is not how masks are used in the real world. People take them off to eat or drink; people don’t wear them properly, and, in the case of N95s vs medical masks in the hospital, workers don’t stay in the hospital the entire day and night.
This message transcends masks. It’s the same with procedures and drugs. A drug may exert a clear effect in moving one surrogate marker. But when used in a larger group of humans, it may be ineffective in reducing an important outcome.
Dr. Munro’s other extreme, that this proves that masks do nothing, is also not exactly what this review said.
Some would argue that the systematic review shows an absence of evidence of benefit. Which is true.
But.
The confidence intervals around the estimates allow for both increased infection and decreased infection. That would lead others to argue that absence of evidence of benefit does not equate to evidence of absence of benefit. The authors lend some credence to this idea when they wrote that there was low-to-moderate certainty of the evidence.
They called for a large well-designed RCT that would address these questions—especially the impact of adherence on effect sizes.
My Final Comments
I think we can find a middle ground between Dr. Munro’s two extremes. This involves common sense and consideration of prior probabilities.
First the priors. When I look at a trial, I like to think of it as akin to a medical test.
Medical tests rarely give definitive answers; instead, they update our prior beliefs. For instance, a negative stress test in a person with super-low cardiac risk further strengthens my confidence that this person does not have heart disease.
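For those who like to see that updating written out, here is a back-of-the-envelope sketch. The pretest probability (5%) and the negative likelihood ratio (0.2) are made-up, illustrative numbers, not figures from any particular study:

```python
# Back-of-the-envelope Bayesian updating, the way a test result shifts a prior.
# The 5% pretest probability and 0.2 likelihood ratio are illustrative only.
def update(pretest_prob, likelihood_ratio):
    """Convert probability to odds, apply the likelihood ratio, convert back."""
    pretest_odds = pretest_prob / (1 - pretest_prob)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1 + posttest_odds)

# A person with a low (say 5%) pretest probability of coronary disease gets a
# negative stress test (assume a negative likelihood ratio of roughly 0.2).
post = update(0.05, 0.2)
print(f"Post-test probability: {post:.1%}")  # about 1%: the prior moves, but never to zero
```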
It’s the same here. Going into these new RCTs, there was no compelling evidence that masks did much to halt the spread of respiratory viruses. The new trials, with their null results, further strengthened that belief, though we should keep our minds open to change if a large well-conducted trial upends that belief.
But, as always, the onus of proof is on the proponents to show us a setting in which an intervention works.
Now to common sense: We in the health field have had three years to observe the use of masks. You walk through a hospital ward and see half the workers with their masks pulled down to take a sip of coffee. And there is no mask use in the cafeteria or break rooms.
Even when talking with a person wearing a regular medical mask, you can see the gaps on the sides that a respiratory virus can easily flow through.
You can love evidence, as I do, but this does not mean one needs to abandon common sense.
Editors' Note: This post will allow comments from paying subscribers. If you like our work, and want to support the Sensible Medicine project of independent ad-free evidence review and the allowance of nuanced argument, please consider becoming a paid supporter.
Editors’ Note #2 — I made one edit. I changed the reason for not reviewing the data on hand hygiene. I originally said it was because hand hygiene seemed obvious. That’s a problematic argument because some might say mask use is obvious. Better reasons to exclude it were time constraints and the fact that clean hands also prevent the spread of bacterial pathogens.
Comments

I'm still fascinated by the psychology and group dynamics that allowed the entire world to do a complete 180 on masks, to completely abandon the scientific method and seek out the weakest of evidence to support the superstitions we embraced out of nowhere. It really was like a new Dark Age swept over us in 2020. I'm still in awe of it. There's a certain magic in being able to witness this first hand, as I ignorantly assumed we were "better than that". The superstition phase of medicine, whether it was bloodletting, electroshock, lobotomies, or mercury elixirs - all of that quackery could never arise again now that we had "science" and EBM to shield us from such irrationality.
It really is remarkable. I truly hope my grandchildren will think I am bullshitting them when I tell them about 2020 someday.
Very well written and clear - thank you! What I love about this one is the description of how a medical study is like a test: it doesn't provide certainty, but it updates our priors. What is useful about this is that it allows us to see where the biases are, and, in this case, they're in the 'common sense' section, which pulls current anecdata about mask compliance to state the priors.
And what is clear is that Mandrola's prior that "masks don't work" comes from a place of selective evidence. But this framing is helpful to show that we don't disagree about this study or its conclusions. The bias is in our priors and what we choose to cite for 'common sense.' My "common sense" is the fact that I worked as a hospitalist during the first wave in NYC, wearing only surgical masks for the first week, spending hours in rooms with patients, then N95s until that summer. Mask adherence was high among co-workers. The only healthcare workers I heard of who got sick were those who ate in communal settings in breakrooms. Cochrane's methodology of excluding observational studies seems a big mistake here, because those studies should at least inform our priors, even if we take that evidence with a grain of salt.
The challenge with Cochrane reviews, as pointed out by Trish Greenhalgh, is that they're insufficient for real-world questions and challenges where policymakers may need to make decisions under high uncertainty and may lean toward precaution. Despite my various levels of immunity, in the absence of concrete evidence it seems reasonable to err on the side of caution and wear masks in certain settings. I still do, because the risk of harm (to myself or others) is low while the possibility of benefit (to myself and others) seems to outweigh that risk.
What is worse is that this debate is playing out publicly. I appreciate Mandrola's nuanced take here, and the nuance of the study authors. But the problems arise in the press. Bloomberg, for example, dedicated their health newsletter to the issue this morning, and their takeaway was "masks don't work," framing a looming controversy over how long Hong Kong's mask policy can hold out since masks aren't shown to work.
The nuance, uncertainty and priors are lost in the translation from dissecting the study on its merits to the reporting on it in the press. What follows is the propagation of misinformation: overstating the conclusions of a study when even the study authors were quite circumspect and nuanced in how they reported it.
I struggle with this because this is exactly what we should be doing as scientists: debating the science, being transparent about our priors and how much any new study should update them. But channels are getting crossed as we debate this publicly, and nuance is lost in translation, making the issue even more political. Which raises the question: is this the proper place for this debate? Or is it better suited for a medical journal?