Cohort studies are observational studies: natural experiments that can be conducted prospectively or retrospectively. Classically, cohort studies are used to assess the prognosis of a group of patients, and this is (mostly) how the Users’ Guides article examines them. A cohort study tells you about the risk of a certain group of people or patients developing an outcome. Cohort studies can also compare the prognosis of two groups. For example, what are the rates of stroke in patients with chronic atrial fibrillation and in similar people without it?
Cohort studies find some interesting uses in clinical medicine. Besides providing prognosis data, cohort studies can be used to assess the effectiveness of a medical practice that is accepted but lacks a strong evidence base. If the benefit of a practice has come into question, but an RCT cannot be done because the practice is the accepted standard of care, a cohort study might provide data on patients who received the treatment and those who did not.
Cohort studies can also be used to test novel hypotheses, ones that do not yet have the data foundation to permit an RCT. If an exposure (or the lack of one) is associated with an improved outcome in a cohort study, an RCT might then be designed to test that intervention experimentally.
Design
A group of people (a cohort) is assembled, classified according to exposure, and then observed over time to see who develops the outcome.
Prospective cohort studies have some real advantages over RCTs. They can accurately establish incidence, and because they are natural experiments, they can study exposures that could never ethically be assigned in an experimental study. Cohort studies are also less expensive to run than equally large RCTs.
On the other hand, a well-done cohort study often takes years to run and is expensive. Cohort studies are poorly suited to rare outcomes. Most importantly, cohort studies, except in rare circumstances, are compromised by confounding. This is the most important drawback of observational studies and the main reason that they demonstrate association, not causation. When considering therapy, medical research is meant to determine causal relationships, not just co-occurrence, because only a causal relationship suggests an intervention. This point calls for the best XKCD cartoon ever.
Oh hell, let’s throw in one more picture.
Accepting that observational studies show association, not causation, several criteria can hint at a causal relationship:
1. Temporality: cause precedes effect
2. Strength of association: large odds ratio
3. Dose-response: larger exposures result in higher rates of disease
4. Reversibility: reduction in exposure leads to lower rates of disease
5. Consistency: repeatedly observed in different studies
6. Biologic plausibility: makes sense, according to current biologic knowledge
Confounding
The primary reason observational studies cannot prove causation is selection bias leading to confounding. Confounding is present when the association between exposure and outcome is distorted because the study groups differ with respect to factors other than the one of interest. To be a confounder, a variable must be related to both the exposure and the outcome and must distort the measured association between them.1
When I was first taught about confounding, my professor used an example that remains a powerful one. He confidently stated that Volvos are safer than Corvettes. He said that we know this because fewer Volvo drivers die in motor vehicle accidents each year than Corvette drivers. He then showed us pictures of the cars from the era, and the confounding was obvious:
Volvos were driven by responsible parents, and Corvettes were driven by testosterone-addled young men. If you put Corvette drivers into Volvos and Volvo drivers into Corvettes, you can bet the data would change.2
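For readers who like to see this numerically, here is a minimal simulation in Python. The numbers are entirely made up, not from any real study: the car has no effect on crash risk at all; driver age drives both car choice and crashes. The crude comparison makes the Corvette look dangerous, and stratifying by the confounder (age) makes the distortion disappear.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical confounder: young drivers both prefer Corvettes AND crash more.
young = rng.random(n) < 0.5
corvette = rng.random(n) < np.where(young, 0.8, 0.2)
# Crash risk depends only on driver age, not at all on the car.
crash = rng.random(n) < np.where(young, 0.10, 0.02)

df = pd.DataFrame({"young": young, "corvette": corvette, "crash": crash})

# Crude comparison: Corvette drivers crash more than twice as often.
print(df.groupby("corvette")["crash"].mean().round(3))

# Stratified by the confounder: within each age group, the cars are identical.
print(df.groupby(["young", "corvette"])["crash"].mean().round(3))
```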
There is, pretty much, always selection bias in cohort studies. If two non-random groups are being compared, it is unlikely they will be identical with regard to risk factors (other than the one of interest). In the most famous example, post-menopausal nurses who did and did not take estrogen replacement therapy differed in their exposure to ERT, but they also differed in their family history of coronary disease and DM, their smoking history, their multivitamin and aspirin use, their age, their BMI, and their alcohol use (in addition to many unmeasured variables).
There are many tools that researchers use to deal with confounding. The most common are:
1. Restriction: Limit the range of characteristics of patients in a study. This reduces generalizability.
2. Matching: Select a few characteristics that will be intentionally kept the same in both groups. This is practical for only a couple of variables.
3. Adjustment: Use statistical techniques to account for measured differences between the groups.
It is way beyond the scope of this series (and, if I am honest, beyond my expertise) to go into all the ways that researchers try to adjust for confounding. That said, two topics are particularly important.
Propensity Matching
When propensity matching is used to control for confounding, each patient is assigned a propensity score. This score is a summary of characteristics that predict whether or not the patient will experience the exposure of interest. Each patient in the exposed group is then matched with a patient in the non-exposed group with the same propensity score. This technique provides an approximation of a randomized trial because baseline differences are less likely to confound the relationship of the predictor with the outcomes. This is true, of course, only to the extent that relevant predictors are included in the propensity score.
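As a concrete (if toy) illustration, here is a minimal sketch of propensity matching in Python. The covariates, the simulated data, and the greedy 1:1 nearest-neighbor matching are all illustrative assumptions on my part; real analyses use dedicated packages, caliper widths, and balance diagnostics.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2_000

# Hypothetical cohort with two baseline covariates.
df = pd.DataFrame({
    "age": rng.normal(60, 10, n),
    "smoker": rng.integers(0, 2, n),
})

# Exposure is more likely in older patients and smokers (the confounding).
logit = -1.0 + 0.05 * (df["age"] - 60) + 0.8 * df["smoker"]
df["exposed"] = rng.random(n) < 1 / (1 + np.exp(-logit))

# Step 1: the propensity score is the modeled probability of exposure,
# given the baseline characteristics.
model = LogisticRegression().fit(df[["age", "smoker"]], df["exposed"])
df["ps"] = model.predict_proba(df[["age", "smoker"]])[:, 1]

# Step 2: greedy 1:1 matching, pairing each exposed patient with the
# unexposed patient whose propensity score is closest, without replacement.
pool = df[~df["exposed"]].copy()
matched_ids = []
for _, row in df[df["exposed"]].iterrows():
    j = (pool["ps"] - row["ps"]).abs().idxmin()
    matched_ids.append(j)
    pool = pool.drop(j)

controls = df.loc[matched_ids]
# Outcomes would then be compared between the exposed patients and their
# matched controls, approximating a randomized comparison to the extent
# that the relevant predictors made it into the propensity model.
```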
Natural Experiments
Some of the most interesting cohort studies make use of “natural experiments.” In these studies, life randomizes people to the presence or absence of an exposure. Anupam Jena has published some of the best of these studies. In one of my favorites, emergency physicians were categorized as high-intensity or low-intensity opioid prescribers. The authors identified opioid-naive patients who went to the emergency departments in which these physicians worked. The authors then compared rates of long-term opioid use (defined as 6 months of days supplied) in the 12 months after a visit to the emergency department among patients treated by high-intensity or low-intensity prescribers. In this observational study, patients were essentially randomized to high- or low-intensity opioid prescribers. The researchers found that patients exposed to high-intensity prescribers were more likely to be using opioids months later.3
Despite these steps to reduce confounding, there always remains some uncertainty in the interpretation of the results of observational studies. Here are a few biases and problems that can undermine the accuracy of the results of observational studies.
Survival Bias
Survival bias is present when patients’ entry into a cohort is affected by their condition. The traditional survival bias is dictated by aspects of a disease. For fatal diseases, cohorts might include patients with better prognoses, those who haven’t died. On the other hand, for diseases that sometimes remit, cohorts might include patients with worse prognoses, those who have persistent disease and are therefore seeking medical attention.
Immortal time bias is a type of survival bias. Immortal time is a period of follow-up during which, by design, death (or the study outcome) cannot occur in one group but can in the other. An example is when the determination of an individual’s treatment status involves a wait period during which follow-up time accrues. In this case, one group might begin accruing time at discharge from the hospital, while the other group is “immortal” until they pick up a medication after discharge, the criterion for entering the “exposed” group in the study. A recent JAMA Guide to Statistics and Methods did an excellent job discussing this bias.
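A small simulation makes the mechanics vivid. In this Python sketch (hypothetical numbers, my own construction), the drug is completely inert, yet the “exposed” group, defined as those who survived long enough to fill a prescription, appears protected when follow-up is counted from discharge.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Everyone shares the same true hazard: the drug does nothing here.
survival_days = rng.exponential(scale=365, size=n)

# Half of patients are prescribed the drug at discharge, but they only
# count as "exposed" if they live long enough to pick it up.
prescribed = rng.random(n) < 0.5
fill_delay = rng.uniform(0, 90, size=n)
exposed = prescribed & (survival_days > fill_delay)

# Naive analysis: one-year mortality counted from discharge in both groups.
dead_1y = survival_days < 365
print(f"exposed:   {dead_1y[exposed].mean():.1%}")
print(f"unexposed: {dead_1y[~exposed].mean():.1%}")
# The exposed group looks protected because, by construction, it cannot
# contain anyone who died before the fill date. Starting follow-up at the
# fill date (or modeling exposure as time-varying) removes the bias.
```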
Vibration of effects
Lastly, the results of observational studies can be greatly affected by the researchers’ decisions about how to address confounders. In a brilliant 2015 article, Chirag Patel, Belinda Burford, and John Ioannidis showed the diverse results that can be obtained depending on decisions about which variables to control for.
How reliable is a well-done cohort study?
There are plenty of questions that can’t be addressed by RCTs. There are even more questions that could be addressed but haven’t been, leaving us to rely on observational data for guidance. So, how reliable are the conclusions of observational studies? Studies have attempted to answer this question (two of my favorites here and here). In the end, a reasonable answer to the question comes from the second article: “Currently available evidence suggests that inferences about comparative effectiveness from observational data sometimes disagree with the results of RCTs, even when contemporary methods of confounding control are used.”
The Numbers
Cohort studies report results in a variety of ways, depending on how the cohorts were assembled and how much attention is paid to defining different levels of risk factors. Sometimes results are reported in a way similar to that of therapy articles:
Relative risk (RR) = incidence of disease in exposed patients/incidence of disease in nonexposed patients
Risk difference (attributable risk) = incidence in exposed patients - incidence in nonexposed patients
Etiologic fraction = (rate in exposed - rate in unexposed)/overall rate
Example:
Incidence (per 1000) of stroke in smokers = 29.7
Incidence (per 1000) in nonsmokers = 7.4
RR = 29.7/7.4 = 4.0
Risk difference = 29.7 - 7.4 = 22.3 per 1000
Thus, one could say that if people stopped smoking, the risk of stroke would decrease by about 22 per 1000.
If you had a population with an overall stroke rate of 3%, you could write: etiologic fraction = (0.0297 - 0.0074)/0.03 ≈ 74%. That is, about 74% of strokes in this population are attributable to smoking.
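A few lines of Python reproduce the arithmetic above (the 3% overall rate is the assumed figure from the example):

```python
# Stroke incidence from the example, expressed as proportions.
incidence_smokers = 29.7 / 1000
incidence_nonsmokers = 7.4 / 1000
overall_rate = 0.03  # assumed overall stroke rate of 3%

relative_risk = incidence_smokers / incidence_nonsmokers
risk_difference = incidence_smokers - incidence_nonsmokers
etiologic_fraction = risk_difference / overall_rate

print(f"Relative risk:      {relative_risk:.1f}")                    # 4.0
print(f"Risk difference:    {risk_difference * 1000:.1f} per 1000")  # 22.3
print(f"Etiologic fraction: {etiologic_fraction:.0%}")               # 74%
```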
The course that led to the first drafts of these posts was all about reading classic articles. Here are some wonderful examples of cohort studies. N.B., a few of the conclusions reached in these studies are dead wrong. I’ll leave it to you to pick those out.
(For those of you who already read the RCT article, I have added articles after that post as well).
Hirayama T. Non-smoking wives of heavy smokers have a higher risk of lung cancer: a study from Japan. BMJ 1981;282:940-941.
Grodstein F, et al. Postmenopausal estrogen and progestin use and the risk of cardiovascular disease. NEJM 1996;335:453-61.
Connors AF, et al. The Effectiveness of Right Heart Catheterization in the Initial Care of Critically Ill Patients. JAMA 1996;276:889-897.
Haffner SM, et al. Mortality from Coronary Heart Disease in Subjects with Type 2 Diabetes and in Nondiabetic Subjects with and without Prior Myocardial Infarction. NEJM 1998;339:229-234.
Andersson RE, et al. Appendectomy and Protection Against Ulcerative Colitis. NEJM 2001;344:808-814.
Christakis DA, et al. Early television exposure and subsequent attentional problems in children. Pediatrics 2004;113:708-13.
Sjöström L, et al. Effects of Bariatric Surgery on Mortality in Swedish Obese Subjects. NEJM 2007;357:741-52.
Huang ES, et al. Impact of nasogastric lavage on outcomes in acute GI bleeding. Gastrointest Endosc 2011;74:971-80.
Nielsen SF, et al. Statin Use and Reduced Cancer-Related Mortality. NEJM 2012;367:1792-802.
Zammit S, et al. Self reported cannabis use as a risk factor for schizophrenia in Swedish conscripts of 1969: historical cohort study. BMJ 2002;325:1199.
Muzaale AD, et al. Risk of End-Stage Renal Disease Following Live Kidney Donation. JAMA 2014;311:579-586.
Gomm W, et al. Association of Proton Pump Inhibitors With Risk of Dementia: A Pharmacoepidemiological Claims Data Analysis. JAMA Neurol 2016;73:410-416.
Chapman S, et al. Association Between Gun Law Reforms and Intentional Firearm Deaths in Australia, 1979-2013. JAMA 2016;316:291-299.
Barnett ML, et al. Opioid-Prescribing Patterns of Emergency Physicians and Risk of Long-Term Use. NEJM 2017;376:663-673.
Pincus D, et al. Association Between Wait Time and 30-Day Mortality in Adults Undergoing Hip Fracture Surgery. JAMA 2017;318:1994-2003.
Zeng W, et al. Association of Daily Wear of Eyeglasses With Susceptibility to Coronavirus Disease 2019 Infection. JAMA Ophthalmol 2020;138:1196-1199.
Users’ Guides questions for articles about prognosis
1) Was there a representative and well-defined sample of patients at a similar point in the course of the disease?
2) Was follow-up sufficiently long and complete?
3) Were objective and unbiased outcome criteria used?
4) Was there adjustment for important prognostic factors?
5) How large is the likelihood of the outcome event(s) in a specified period of time?
6) How precise are the estimates of likelihood?
7) Were the study patients similar to my own?
8) Will the results lead directly to selecting or avoiding therapy?
9) Are the results useful for reassuring or counseling patients?
If there were going to be a test, a multiple-choice question would come from this sentence.
How amazing would it be to do this study?
The Rolling Stones knew of this relationship in 1972 when they recorded “Torn and Frayed”:
Joe's got a cough, sounds kind a rough
Yeah, and the codeine to fix it
Doctor prescribes, drug store supplies
Who's gonna help him to kick it
A cohort study may be of some interest. But, as the article points out, there are a lot of snags along the way. In addition to the impossibility of knowing or taking into account all the possible confounding factors, much of the data that is analyzed can be very unreliable. For example, a lot of disease incidence numbers may be taken from death certificates, where the accuracy is often highly questionable. In my field of cardiovascular disease, these studies have generated endless lists of risk factors and, as a result, research has been largely mired in blind alleys for decades. Most risk factors are unmeasurable (diet, exercise or lack thereof, stress, etc.) or cannot be changed (sex, family history). As long as cardiovascular research is stuck in the risk factor paradigm, we will continue to get nowhere. But this is unlikely to change, as these studies provide endless opportunities for getting research grants and furthering the careers of the researchers.
I am really enjoying this: some re-education and new ways of looking at cohorts! And to this day, I can’t hear “Cold Turkey” by John Lennon without thinking of high opioid prescribers.