I am reminded of a great article from The Lancet. We often fixate on significance and ignore effect size.
https://www.thelancet.com/journals/lancet/article/PIIS0140673610611749/fulltext
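A quick way to see that point with made-up numbers (an illustrative sketch only, not drawn from the Lancet piece): with a large enough sample, a clinically meaningless difference still comes out highly "significant."

```python
# Illustrative sketch with hypothetical numbers: a 0.3 mmHg difference in
# systolic blood pressure is clinically negligible, yet with 200,000 patients
# per arm it is "statistically significant" by a wide margin.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 200_000
control = rng.normal(140.0, 15.0, n)
treated = rng.normal(139.7, 15.0, n)

t_stat, p_value = stats.ttest_ind(treated, control)
print(f"mean difference = {treated.mean() - control.mean():.2f} mmHg, p = {p_value:.1e}")
# Typically p << 0.001 here, even though no clinician would care about 0.3 mmHg.
```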
I tend to align with the critical attitude with which Dr. Ioannidis, as well as yourself and others, approach the examination of scientific evidence. The only thing I’d push back on is the idea that the frequentist “takes numbers at face value.” Even RA Fisher himself understood that the results of an experiment can only begin to be accepted when repeated trials “rarely fail[] to give this level of significance.”
I understand the broader point though, and it’s one I agree with — common interpretations of frequentist measures often lack that necessary provisionality.
Thank you for the great, thought-provoking article. The idea of bringing Bayesian thinking into the appraisal of studies evaluating medical interventions seems helpful to me, particularly for putting the study findings in context with the prior literature. This may be more a matter of interpretation, which I would say comes in at the "third/final part" of the appraisal process, when considering how the results apply to a particular patient. I would be hesitant to use Bayesian "priors" when assessing threats to validity and the clinical importance of the study results.
well done
This should be required reading for the general public. As a layperson, I’ve learned so much from Sensible Medicine, articles and discussions.
Great article and some great comments.
Thanks for the links to those other papers on CA. I will need to check those out.
I think of CA as a process that allows each individual clinician to translate evidence to their clinical encounter with Mrs. Smith (among many others).
The nuts and bolts of evaluating a study are obviously crucial. When I was an IM resident we went through a series in the Annals of Internal Medicine. It was step by step, precise, and extremely dry. Though it was useful as a foundation, there was no art there.
While I am not as much of a Bayesian as you and Dr. JMM, I ultimately evaluate how “good” a study is based on how well I can apply the knowledge learned (whether the trial was “positive” or “negative”) to the Mrs. Smiths of my world. So I think for me, all else being equal, external validity is where it’s at.
I am not a statistician, nor do I have a deep understanding of what it means to be Bayesian vs frequentist beyond simply taking into account pre-test probabilities. I am a clinician though (ID Fellow) and trying to learn how to interpret medical evidence is clearly relevant to me.
Here endeth the caveats.
Now the question: what goes into the generation of someone’s pre-test probability (PTP for short)? Is it prior study results? Physiologic reasoning? Vibes? I ask this because shouldn’t the PTP be subject to critical appraisal, just like any other number? But then it seems that we approach an infinite regress. Prior study results are subject to the same Bayesian concerns as the ones we are currently interpreting. Physiologic rationale, as Vinay Prasad has repeatedly taught me, is unreliable at best. It seems like we might be left simply with “Vibes”, or if we want to sound sophisticated, JH Newman’s “illative sense”.
Is there a Bayesian answer to this? Or is the Bayesian framework better understood simply as a *description* of how our cognition works rather than a *prescription* to aid in attaining objectivity?
PTP comes from everything you note. Bioplausibility, effect of similar interventions, clinical experience... It is subjective but not meaningless.
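One way to make “subjective but not meaningless” concrete (a minimal sketch with hypothetical numbers, not a claim about how anyone here actually computes it): however the pre-test probability is arrived at, the updating step itself is mechanical, via the odds form of Bayes’ rule.

```python
# Odds form of Bayes' rule. The pre-test probability is the subjective input;
# the update is just arithmetic. All numbers below are hypothetical.
def post_test_probability(pre_test_prob: float, likelihood_ratio: float) -> float:
    prior_odds = pre_test_prob / (1.0 - pre_test_prob)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)

# A skeptical 10% prior that the intervention truly works, updated by a
# "positive" trial treated crudely as evidence with LR = power / alpha
# (power 0.8, alpha 0.05 -> LR = 16).
print(post_test_probability(0.10, 0.80 / 0.05))   # about 0.64
```

That doesn’t dissolve the regress about where the 10% comes from, but it does separate the subjective input from the objective update.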
Thanks for the reply. Curious as to your thoughts about the potential for infinite regress here in terms of epistemological justification. Do you worry about this?
Relatedly, do you feel Bayesian thinking is descriptive or prescriptive?
Dr. Cifu, in case this forum no longer offers another opportunity to discuss the manuscript, I’d like to suggest that we approach the article from a broader and more complex perspective, beyond the VÍA framework I mentioned in my previous comment on this post.
Regardless of the study itself—its methodological flaws, which are identifiable and meaningful—and its results, the fact that 8-year survival ranged from 70% to 80% among patients who “simply” received “education about exercise” implies that, overall, the trial cohorts are survivor cohorts with a much better prognosis than most colon cancer patients worldwide.
But the real question we should ask is: beyond exercise education—even if not adhered to—or participation in a rigorous training program (which only “healthy,” health-conscious individuals inclined toward physical activity are likely to follow), what other truly impactful intervention could this study have considered? Perhaps in a third experimental arm?
If anything has a significant attributable fraction in all chronic diseases—and is certainly fundamental in cancer—it’s nutrition. I reviewed all supplementary materials of the study, and nutrition appears merely as a sentence noting that the education group—but not the structured exercise group—received recommendations about a “healthy diet.”
My question is: what do the researchers—clearly from the world of physical exercise—consider a healthy diet? All the research we’ve been involved in for the past five years strongly suggests that it is currently quite possible to tell a person, individually, what a healthy diet means for them.
As a hypothesis (which, I admit, borders on strong certainty—and possibly prejudice), I would propose a third intervention group, in which each patient is assigned the diet most appropriate for their biotype, body composition, culture, habits, and naturally, one that excludes carcinogenic elements. I would even dare to estimate that such an approach could yield 8-year survival rates exceeding 95%.
That is to say: between a so-called healthy diet and a healthy exercise routine—both of which remain widely misunderstood—I would prioritize, in colon cancer and especially in all gastrointestinal cancers, a careful dietary approach and lifestyle modifications specifically related to eating habits associated with cancer risk.
By the way, the fundamental problem with this study lies in the selection process, which allowed the inclusion of post-adolescents and young adults—individuals who naturally have a different relationship with exercise, are more likely to engage in it, and therefore met the inclusion criteria more easily.
This selection bias is almost entirely responsible—albeit artificially—for the marginal effects observed in the study, which are comparable in size to those seen with medical treatment for colon cancer.
Truly excellent post, thank you. The Ioannidis paper is great. I need to re-read it several times. The concept of R is challenging but it rings true.
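For anyone else wrestling with R: in Ioannidis’s framework it is the pre-study odds that a tested relationship is true, and the probability that a “positive” finding is actually true (the positive predictive value) follows from R, the significance threshold α, and power 1 − β. As I read the 2005 paper, the no-bias version reduces to:

```latex
\mathrm{PPV} = \frac{(1-\beta)\,R}{(1-\beta)\,R + \alpha}
```

With α = 0.05, power 0.8, and long-shot pre-study odds of R = 0.1, that works out to 0.08 / 0.13 ≈ 0.62, so low pre-study odds alone can undercut a nominally significant result.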
The corollaries are an excellent guide.
Thoughts:
Replication is a pillar of the scientific method; what happens if this pillar crumbles?
Underpowered experiments coupled with excessive numbers of measurements (“because we can”) can be very misleading, in my experience; a toy simulation after this list illustrates the point.
“Non-significant” findings are still significant (as my Ph.D. advisor said).
Bias is becoming a more pervasive problem. There has always been bias, especially when the pet theory of an eminent, tenured professor at a major university is challenged by upstarts, but it seems more widespread and insidious today.
Meta-analysis is a great advance, but its potential pitfalls are serious. We are basically treating individual studies as independent experimental units. These studies are often opaque to scrutiny, or would take many hours to vet. Much faith is put in the authors and peer review.
Call me old, but I am skeptical of papers with more than 5-6 authors and more than 15 pages. There are certainly exceptions, but can’t anyone write up a study in 5-7 pages anymore? It seems a lot of weeding is needed to ferret out the hypothesis, design, and key results.
On the colon cancer study, the findings are not surprising, are they? If the treatment were colloidal silver or something off the track like that, it would be a different matter. Yes, the magnitude of the treatment effect may be suspiciously large, but the downside risk of a patient getting some supervised exercise seems low. I’m not an M.D. or medical professional though.
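On the underpowered-plus-many-measurements point above, here is a toy simulation with entirely hypothetical numbers (10,000 tests, 10% true effects, 20% power) showing how a majority of “significant” hits can be false:

```python
# Toy simulation, hypothetical numbers: many outcomes tested, few true effects,
# low power. What fraction of "significant" results are false positives?
import numpy as np

rng = np.random.default_rng(1)
n_tests = 10_000
frac_true = 0.10            # only 10% of tested effects are real
power, alpha = 0.20, 0.05   # badly underpowered

is_true = rng.random(n_tests) < frac_true
p_significant = np.where(is_true, power, alpha)   # chance each test is "significant"
significant = rng.random(n_tests) < p_significant

false_hits = np.sum(significant & ~is_true)
all_hits = np.sum(significant)
print(f"significant results: {all_hits}, false-positive share: {false_hits / all_hits:.0%}")
# Expected share of false positives: 0.9*0.05 / (0.1*0.2 + 0.9*0.05), roughly 69%.
```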
Thank you again for this thought provoking post.
Dr. Cifu, as always, you raise crucial points regarding critical reading. To be consistent, this is precisely what I teach in my course.
1. No study is perfect; at best, it is dressed up to appear so in order to pass peer review, including the publication process.
2. Critically reading an article means making an objective diagnosis: understanding the research and judging it objectively in terms of its methods, based on how the study should have been done. This is why—and here I respectfully disagree a bit—one should never begin with the magnitude of the outcomes. Only after assessing the study’s validity should we consider the intensity and significance of the results, and finally, their applicability to our patients. Pragmatically, this means adjusting the study’s NNT for efficacy to match our own patient’s baseline probability for the most clinically relevant outcome (a small numeric sketch follows this list). I’ve used the Spanish acronym VÍA (which conveniently translates as “way” in English) to summarize this process: Validity, Intensity of results, and Applicability.
3. I understand the view that the low pre-test probability of the results being valid may be seen as a political stance toward studies. I would call it distrust. Given that two-thirds of published studies are essentially waste (as Ioannidis argues), and two-thirds of randomized controlled trials are commercially sponsored or influenced by commercial interests, it is reasonable to adopt the mindset that “if you assume the worst, you’ll probably be right.” Unfortunately, this is no longer prejudice—it’s reality.
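To make point 2 concrete, here is a small numeric sketch; the function name and the figures are hypothetical, not taken from any particular study. The idea is to carry the trial’s relative risk reduction over to your own patient’s baseline risk and recompute the NNT.

```python
# Sketch of the adjustment in point 2, hypothetical numbers throughout:
# apply the trial's relative risk reduction to this patient's baseline risk.
def patient_nnt(control_event_rate: float, treated_event_rate: float,
                patient_baseline_risk: float) -> float:
    rrr = 1.0 - treated_event_rate / control_event_rate   # relative risk reduction
    arr_for_patient = rrr * patient_baseline_risk          # absolute risk reduction for this patient
    return 1.0 / arr_for_patient

# Trial: 20% events in control vs 15% with treatment (trial NNT = 20).
# Our patient's baseline risk for the same outcome is only 5%.
print(round(patient_nnt(0.20, 0.15, 0.05)))   # 80: far less impressive for this patient
```

The assumption doing the work is that the relative effect transfers to lower-risk patients, which is itself a judgment call.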
Skepticism keeps you healthier, and usually, wealthier.
Thanks for another thoughtful, insightful post. Great way to start thinking in the morning.
1. I didn't realize that you and I overlapped in training at Beth Israel Hospital; it looks like you were a medical resident there during the tail end of my cardiology fellowship and my first couple of years on faculty.
2. I find your (and John's) conflation of the interpretation of a research study with "critical appraisal" to be problematic. IMHO, if you want to criticize a study design or point out its limitations, that's critical appraisal in my book. It's reasonably objective and focuses on the practical tradeoffs in conducting research that investigators face every day. OTOH, if you want to offer subjective opinions on things like generalizability or the likelihood of a study's results being replicable (by Bayesian inference), that's interpretation. I think those are two completely different objectives and should not be conflated with each other. The first is an objective exercise. The second is largely a subjective judgment and is far from high science (although it still provides a useful perspective).
Write us a post on this. Is it semantics? Is critical appraisal really objective? I’m not sure.
I second that.
Great comment. That’s exactly the issue I’m unsure about. But, since I think CA is not for its own sake but for making clinical decisions, it must include everything.
Thanks. Adam
You have the harder job of picking a therapy for an individual. It is the inverse of generalizability. Maybe research should focus more on rigorous patient selection as the goal of a study.
Well said.