The goal of having educated doctors is without controversy. Everyone wants their doctor to be an expert in medical practice.

The controversy comes in how best to achieve this obvious societal good.

For US physicians, the American Board of Internal Medicine (ABIM) holds a monopoly on the certification of doctors. The idea is simple: ABIM gets experts to write test questions, and doctors who pass the exam receive “board certification,” which is accepted by hospitals and insurance companies. Board certification was once a one-time accomplishment, but the ABIM now mandates a test every 10 years.

Most recently, the ABIM has allowed a substitute for the high-stakes test: the longitudinal knowledge assessment (LKA), in which doctors answer 30 open-book questions every quarter.

I know of few organizations more despised than ABIM. They have consistently increased fees and requirements over the years. They disallow other forms of continuing medical education.

My colleague Wes Fisher has led the charge against ABIM. His blog has a trove of material regarding the fight to bring competition to medical education.

In my opinion, the strongest argument against the ABIM monopoly is the lack of empirical evidence that their brand of education improves actual outcomes.

And this brings me to a study published in JAMA Internal Medicine that attempted to correlate ABIM-branded education with patient outcomes. The study offers a view into many of the major flaws of observational research as well as the hubris of biased researchers.

The Study

The study group included slightly more than 4,000 hospitalists who had enrolled in the LKA in 2022-23. Patient outcomes were derived from a pool of 260,000 hospital admissions from these doctors.

The primary endpoints were a composite of mortality or readmissions at 7 days. Secondary outcomes included 30-day mortality and 30-day readmissions, length of stay and consultations.

The key timing feature was that patient outcomes were measured two years before physicians took the LKA.

The authors split the test scores into quartiles and compared the top vs bottom quartiles.

The Results

Per 1000 hospitalizations, the 7-day adjusted mortality difference for physicians in the top vs bottom quartile was −4.1 (95% CI, −7.7 to −0.5; P = .03), a 7.8% difference. This difference was similar when comparing physicians in the second (−3.8 [95% CI, −7.1 to −0.6]; P = .02) or third quartile (−3.9 [95% CI, −7.1 to −0.6]; P = .02) with those in the bottom quartile.

The authors found similar differences in 7-day readmissions. At 30 days, the association was noted for mortality but not for readmissions, length of stay, or consultations.
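To put the 7-day mortality figures quoted above on a more familiar scale, here is a minimal arithmetic sketch. It converts the per-1000 difference and its confidence interval into percentage points and backs out the reference rate implied by the 7.8% figure. The function names are mine, and the calculation assumes the 7.8% is a relative difference measured against the bottom quartile; that is my reading of the numbers, not something the paper's text above states.

```python
def per_1000_to_percentage_points(diff_per_1000: float) -> float:
    """Convert a difference per 1000 hospitalizations to percentage points."""
    return diff_per_1000 / 10.0


def implied_reference_rate_per_1000(abs_diff_per_1000: float,
                                    rel_diff_pct: float) -> float:
    """Reference rate implied by an absolute and a relative difference.

    ASSUMPTION: rel_diff_pct is the absolute difference divided by the
    reference (bottom-quartile) rate, expressed as a percentage.
    """
    return abs(abs_diff_per_1000) / (rel_diff_pct / 100.0)


# Figures quoted in the results (per 1000 hospitalizations).
abs_diff = -4.1
ci_low, ci_high = -7.7, -0.5
rel_diff_pct = 7.8

diff_pp = per_1000_to_percentage_points(abs_diff)
ci_pp = (per_1000_to_percentage_points(ci_low),
         per_1000_to_percentage_points(ci_high))
reference = implied_reference_rate_per_1000(abs_diff, rel_diff_pct)

print(f"Absolute difference: {diff_pp:.2f} percentage points")
print(f"95% CI: {ci_pp[0]:.2f} to {ci_pp[1]:.2f} percentage points")
print(f"Implied reference rate: {reference:.1f} per 1000")
```

Under that assumption, the point estimate works out to about 0.4 percentage points, and the upper end of the interval sits at 0.05 percentage points, which is very close to zero.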

The authors concluded:

“Seven-day mortality and readmissions were lower for physicians in the top vs bottom quartile of LKA scores, consistent with prior research that found similar associations with initial certification scores.”

My Comments:

There are at least 5 problems with this attempt to assess ABIM-branded medical education, some of which the authors acknowledged.

The most glaring problem is the timing issue. The study measured patient outcomes 2 years before the physicians enrolled in the LKA. This assumes that performance on a future test reflects past clinical competence, which may or may not be true. Why not assess performance during or after enrollment? (I will come back to this.)

There is also clear selection bias, as only physicians who voluntarily enrolled in the LKA were included. The authors excluded many more physicians than they included. And while they made adjustments for many factors, there were likely differences in baseline characteristics between the lower and upper quartile groups that drove the different outcomes.

Which brings me to the matter of analytic flexibility. I covered this topic in April of 2024 here on Sensible Medicine. The team at McMaster University, led by Dr Dena Zeraatkar, elegantly showed that a typical observational study can include a massive amount of analytic flexibility that bears on results, confirming the work of Brian Nosek’s Many Analysts, One Data Set paper. I also wrote about Nosek’s work.

Zeraatkar’s team studied meat consumption and mortality and found a quadrillion different ways to analyze the data. That’s not a typo. They then narrowed the choices down using 10 random unique combinations of covariates, which yielded about 1400 ways to analyze the data. Ultimately, most analytic methods showed no effect, but there were both positive and negative associations.

The ABIM study used one method of analyzing the data. But as Zeraatkar’s and Nosek’s teams have shown, observational research offers many different analytic choices. I would bet any of you an espresso that other research teams making different but reasonable analytic choices would find null results.
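To get a feel for how quickly analytic choices multiply, here is a toy counting sketch. It assumes every analytic decision is an independent yes/no choice (include a covariate or not, apply an exclusion or not), which is a simplification of my own and not how either research team actually enumerated their specifications. The function names are hypothetical.

```python
def count_specifications(n_binary_choices: int) -> int:
    """Number of distinct analyses when each choice is an independent yes/no."""
    return 2 ** n_binary_choices


def min_choices_to_reach(target: int) -> int:
    """Smallest number of yes/no choices that yields at least `target` analyses."""
    n = 0
    while count_specifications(n) < target:
        n += 1
    return n


QUADRILLION = 10 ** 15

n_needed = min_choices_to_reach(QUADRILLION)
print(n_needed, count_specifications(n_needed))
# prints: 50 1125899906842624
```

In this simplified model, just 50 defensible yes/no decisions are enough to pass a quadrillion possible analyses. A single published specification is one draw from that space.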

Perhaps the largest problem with this study is plausibility. If you know anything about hospital medicine, you know it would be foolish to believe that scoring in the lowest quartile of an open-book test 2 years in the future “caused” 7-day mortality to be 0.4 percentage points higher.

Why? Because a zillion things are on the causal pathway to 7-day mortality of a hospitalized patient. Hospitalist care represents a tiny fraction of these causal pathways. For instance, the vast majority of patients sick enough to die a week after being hospitalized see specialists, who, arguably, have a larger effect on outcomes. Hospitalists have little to no control over the quality of specialty care. I don’t care what the p-value is; a 0.4% delta is pure noise.

The final matter is the glaring conflicts of interest. Three of the authors of this study are employees of ABIM. The senior author, the only physician author, reports a conflict, but none of the other authors consider their employment at ABIM as a potential conflict. Neither did the editors of JAMA-IM. Imagine a drug study with three of four authors from the drug company. This would never be published.

Final Conclusion:

I see this as one of the year’s best examples of marketing disguised as science. It is embarrassing for the ABIM to promote such weak efforts at empirical study. But it’s even more embarrassing for the editors of JAMA-IM to publish such a dubious study, especially without demanding proper disclosure of conflicts.

The thing is that ABIM’s value is entirely testable. It would simply require a randomized controlled trial in which ABIM testing is compared with standard CME. Why this has been neither done nor required is surely related to the fact that ABIM has a monopoly. It does not need to run one.