A Brilliant Comment Makes the Study of the Week
On yesterday’s podcast, I talked with Bobby Yeh, an academic cardiologist who made a compelling case for enhancing credibility of observational research. Please do listen. Bobby is one of the smartest people in cardiology today.
After closely reviewing the cardiac literature for the past decade or so, I have become increasingly hopeless that we could glean any useful information from non-random retrospective comparisons, mainly because of systemic bias.
Bobby infused me with some hope.
But then came this email. It’s from Paul Dorian, MD. Paul is a cardiologist and Professor of Medicine at the University of Toronto. With permission, I will publish his email. It is a trove of educational material.
John, thank you for your recent outstanding podcasts, and this recent reference to two important and thoughtful papers on observational studies. (Editor’s note—links are in the show notes from the podcast yesterday.)
While I agree on the general points made by both authors and the confusion in the community between the usual (unarticulated) purpose of observational studies and their reporting (either not acknowledging the “causal intent” or using weak terms such as association), I think a general comment is summarized by Yogi Berra:
“In theory there is no difference between theory and practice - in practice there is.”
I am very concerned that, even as we strive as a community to do better observational studies, taking into account the important concepts and methods discussed in these papers, there will be an implicit belief that striving for causation means that epidemiological studies will actually get us to the Holy Grail. This is most often a fool’s errand in the real world.
The reasons are multiple, but I will articulate a few:
Most epidemiological studies are done from large data sets that are not collected for the purpose of research. This means that the data on which the observations are based, independently of how they are analyzed, are not reliable.
Administrative data sets are often collected for the purpose of billing, or costing, or administrative organization, and thus are extremely vulnerable to upcoding or intentional or accidental omissions. In universal healthcare systems, where these factors are less at play, administrative data sets are completed and compiled by professional coders, who rely on the medical record. I think we can all agree that the medical record is unreliably reflective of what the caregivers were actually thinking, and what their ultimate diagnosis was. If medical practitioners cannot decide if a small troponin rise in a patient with myocarditis and non-critical coronary disease was due to an infarct or not, how can coders possibly make an informed decision about this?
More pertinent to your recent This Week in Cardiology podcast, patients with monomorphic ventricular tachycardia and coronary disease often have a small troponin rise and ST segment changes on their initial post-cardioversion ECG (and have established coronary disease). How is a professional coder to decide if this event was due to an MI or not? I have not investigated this question, but I would wager that administrative medical records often indicate an “MI” in this setting, although most of us would not consider this type of event to be a true myocardial infarction, or at least not the type that would require urgent angiography and coronary revascularization.
A second major problem with administrative data is that it fails to account for the severity of underlying diseases or morbidities, as opposed to their presence or absence. This is a common, probably insurmountable problem with morbidities such as hypertension, diabetes, COPD, etc.
In these cases, not only is the diagnosis completely arbitrary on the part of the clinician, but the severity of the condition has an extremely strong influence on its contribution to the usually measured outcome.
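(Editor's aside: the severity problem can be made concrete with a toy calculation. This is only a sketch with invented counts, not data from any study. It assumes a hypothetical treatment with no effect at all, given more often to severely ill patients, in a data set that records only whether the disease is present.)

```python
# Hypothetical counts, invented purely for illustration.
# Each stratum: (treated_n, treated_deaths, untreated_n, untreated_deaths)
strata = {
    "mild":   (200, 10, 800, 40),    # 5% mortality in both arms
    "severe": (800, 160, 200, 40),   # 20% mortality in both arms
}

def risk_ratio(treated_n, treated_deaths, untreated_n, untreated_deaths):
    """Risk of death in treated patients divided by risk in untreated patients."""
    return (treated_deaths / treated_n) / (untreated_deaths / untreated_n)

# What an analyst sees if severity is recorded: no effect in either stratum.
stratum_rr = {name: risk_ratio(*counts) for name, counts in strata.items()}

# What an analyst sees if only "disease present" is recorded: strata are pooled.
pooled = [sum(counts[i] for counts in strata.values()) for i in range(4)]
crude_rr = risk_ratio(*pooled)

print("Risk ratio within each severity stratum:", stratum_rr)
print("Risk ratio when severity is not recorded:", round(crude_rr, 3))
```

Within each severity stratum the treatment looks neutral, yet the pooled comparison makes it appear harmful, simply because sicker patients were treated more often. No adjustment can recover this if severity was never recorded.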
Another problem, which is possibly surmountable, is the conflation of what one can call “patient outcomes” versus “doctor outcomes.” For example, mortality is a patient outcome, provided it can be reliably ascertained. This is not always obvious in clinical data sets, since out-of-hospital mortality may be recorded in vital status databases but not in hospital or healthcare system databases, especially if patients move, etc.
On the other hand, outcomes such as rehospitalization or revascularization are doctor outcomes, very heavily influenced by physician biases, economic factors, cultural factors, insurance status, etc.
I would argue that these biases are even more at play in observational studies than in clinical trial settings.
Well-intentioned and sophisticated methods of correcting for bias by indication, such as propensity analysis, are seriously hampered by the assumption that physicians are consistent (within and between doctors) in their treatment decisions. We know this not to be the case.
Another commonly used source of data is patient-reported data, such as alcohol consumption, exercise habits, dietary habits, etc.
These are not only weak approximations of the actual habits individuals have over time, but in many cases the measures used are known to be systematically biased. The alcohol consumption and exercise habit studies are particularly relevant. There are credible observations that alcohol consumption is systematically underestimated in observational studies, and the extent of underestimation is asymmetric, with more underestimation at moderate or high alcohol intake. As a consequence, the dose-response relationship of alcohol to health outcomes, even if the relationship is “causal,” will be systematically incorrect. Studies comparing self-reported activity or exercise with movement measured objectively by accelerometers show similarly systematic overestimation, which is greater in certain specific populations.
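(Editor's aside: a small sketch of how asymmetric under-reporting can distort a dose-response estimate. Everything here is an assumption made for illustration: the linear "true" risk, the `reported_intake` model in which the under-reported fraction grows by 1% per weekly drink, and the range of 0 to 30 drinks. None of it comes from a real study.)

```python
def reported_intake(true_drinks):
    """Hypothetical reporting model: the fraction left unreported
    grows by 1% for every true drink per week."""
    return true_drinks * (1 - 0.01 * true_drinks)

def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Assumed truth: risk rises by exactly 1 unit per true weekly drink.
true_drinks = list(range(0, 31))
risk = [10 + 1.0 * d for d in true_drinks]

# What the questionnaire records instead of the true intake.
reported = [reported_intake(d) for d in true_drinks]

slope_true = ols_slope(true_drinks, risk)
slope_reported = ols_slope(reported, risk)

print("Risk per true drink:    ", round(slope_true, 2))
print("Risk per reported drink:", round(slope_reported, 2))
```

Under these assumptions the risk per reported drink comes out steeper than the true risk per drink, because heavy drinkers are compressed toward the middle of the reported scale. The relationship remains causal in the simulation, but its estimated size and shape are wrong.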
Unless the data on which the analyses and inferences are to be made are collected for the purpose of research, rather than for another reason, and subjectively established data is validated for accuracy, precision, and reliability, no analytic legerdemain can compensate for these major limitations.
The long and unfortunate history of observational data on dietary constituents and supplements (vitamins, fish oils, chocolate, etc.), for which persuasive epidemiologic research suggested causal connections that were eventually thoroughly and completely refuted by randomized blinded clinical trials, serves as a warning about the risks of introducing discussion of causal effects when associations are reported.
The arguments advanced in favor of forthrightness in articulating the purposes of epidemiological studies, and the need for rigor and transparency in their methods are extremely well supported. However, I worry about “interpretation creep,” whereby careful discussions about strengths and limitations and the design of observational studies would be spun to causal inferences, whether articulated or not.
By all means, let us continue doing observational studies, since they give us important insights into what is actually happening, and since they allow the formation of important and testable hypotheses.
Unfortunately, in most observational studies, hypotheses are all we’re going to get.
We at Sensible Medicine are very excited to get this kind of interaction regarding medical science. It is our goal. Thanks for your support. It has been amazing.
Next week I am going to show you what a well-done medical study looks like. It will be a positive, upbeat view of medical science. JMM.