Marketing Disguised as Science Embarrasses Everyone Involved
A prominent medical journal embarrasses itself by publishing a flawed study from the American Board of Internal Medicine. Critical appraisal lessons are obvious.
The goal of having educated doctors is without controversy. Everyone wants their doctor to be an expert in medical practice.
The controversy comes in how best to achieve this obvious societal good.
For US physicians, the American Board of Internal Medicine holds a monopoly in the certification of doctors. The idea is simple: ABIM gets experts to write test questions. Doctors must pass an exam every 10 years. They then receive “board certification,” which is accepted by hospitals and insurance companies. Board certification was once a one-time accomplishment, but the ABIM has now mandated a test every 10 years.
Most recently, the ABIM has allowed a substitute for the high-stakes test, called the longitudinal knowledge assessment (LKA), in which doctors answer 30 open-book questions every quarter.
I know of few organizations more despised than ABIM. They have consistently increased fees and requirements over the years. They disallow other forms of continuing medical education.
My colleague Wes Fisher has led the charge against ABIM. His blog has a trove of material regarding the fight to bring competition to medical education.
In my opinion, the strongest argument against the ABIM monopoly is the lack of empirical evidence that their brand of education improves actual outcomes.
And this brings me to a study published in JAMA-Internal Medicine that attempted to correlate ABIM-branded education and patient outcomes. The study offers a view into many of the major flaws of observational research as well as the hubris of biased researchers.
The Study
The study group included slightly more than 4,000 hospitalists who had enrolled in the LKA in 2022-23. Patient outcomes were derived from a pool of 260,000 hospital admissions from these doctors.
The primary endpoints were a composite of mortality or readmissions at 7 days. Secondary outcomes included 30-day mortality and 30-day readmissions, length of stay and consultations.
The key timing feature was that patient outcomes were measured two years before physicians took the LKA.
The authors split the test scores into quartiles and compared the top vs bottom quartiles.
The Results
Per 1000 hospitalizations, the 7-day adjusted mortality difference for physicians in the top vs bottom quartile was −4.1 (95% CI, −7.7 to −0.5; P = .03), a 7.8% difference. This difference was similar when comparing physicians in the second (−3.8 [95% CI, −7.1 to −0.6]; P = .02) or third quartile (−3.9 [95% CI, −7.1 to −0.6]; P = .02) with those in the bottom quartile.
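The arithmetic behind these numbers can be checked with a short sketch. It uses only the figures quoted above and makes two assumptions that the quoted results do not confirm: that the 7.8% figure is relative to the bottom-quartile rate, and that the confidence interval is a symmetric normal-approximation interval. The variable names are mine.

```python
import math

# Figures quoted above (per 1,000 hospitalizations, top vs bottom quartile).
diff_per_1000 = -4.1
ci_low, ci_high = -7.7, -0.5
relative_diff = 0.078  # the "7.8% difference"

# Absolute difference expressed in percentage points.
abs_diff_pct_points = diff_per_1000 / 1000 * 100

# ASSUMPTION: the 7.8% is relative to the bottom-quartile event rate.
implied_baseline_per_1000 = abs(diff_per_1000) / relative_diff

# ASSUMPTION: symmetric 95% CI from a normal approximation.
standard_error = (ci_high - ci_low) / (2 * 1.96)
z = diff_per_1000 / standard_error
p_two_sided = math.erfc(abs(z) / math.sqrt(2))

print(f"absolute difference: {abs_diff_pct_points:.2f} percentage points")
print(f"implied baseline: {implied_baseline_per_1000:.1f} per 1,000")
print(f"z = {z:.2f}, two-sided p = {p_two_sided:.3f}")
```

Under those assumptions, the absolute difference is about 0.41 percentage points, the implied bottom-quartile rate is roughly 53 per 1,000, and the recomputed p-value is about 0.026, which rounds to the reported .03. The upper confidence limit sits very close to zero.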
Here is what it looks like: [figure omitted]
The authors also found similar differences in 7-day readmissions. At 30 days, the association was noted for mortality but not for readmissions, length of stay, or consultations.
The authors concluded:
Seven-day mortality and readmissions were lower for physicians in the top vs bottom quartile of LKA scores, consistent with prior research that found similar associations with initial certification scores.
My Comments:
There are at least 5 problems with this attempt to assess ABIM-branded medical education, some of which were acknowledged by the authors.
The most glaring problem is the timing issue. The study measures outcomes 2 years before enrolling in the LKA. This assumes that performance on a future test reflects past clinical competence—which may or may not be true. Why not assess performance during or after enrolling? (I will come back to this.)
There is also clear selection bias, as only physicians who voluntarily enrolled in the LKA were included. The authors excluded many more physicians than they included. And while they made adjustments for many factors, there were likely differences in baseline characteristics between the lower and upper quartile groups that drove the different outcomes.
Which brings me to the matter of analytic flexibility. I covered this topic in April of 2024 here on Sensible Medicine. The team at McMaster University, led by Dr Dena Zeraatkar, elegantly showed that a typical observational study can include massive amounts of analytic flexibility that bear on results, confirming the work of Brian Nosek’s Many Analysts, One Data Set paper. I also wrote about Nosek’s work.
Zeraatkar’s team studied meat consumption and mortality and found a quadrillion different ways to analyze the data. That’s not a typo. They then narrowed the choices down using 10 random unique combinations of covariates, which yielded about 1400 ways to analyze the data. Ultimately, most analytic methods showed no effect, but there were both positive and negative associations.
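To get a feel for how a count like a quadrillion can arise, here is a minimal sketch that multiplies independent analytic choices together. Every number in it is hypothetical and chosen only for illustration. None of them is taken from the meat-consumption analysis or from the ABIM study.

```python
from math import prod

# HYPOTHETICAL menu of analytic choices; none of these counts come from
# the studies discussed in this post.
optional_covariates = 40  # each covariate can be included or left out
covariate_subsets = 2 ** optional_covariates

other_choices = {
    "exposure_definition": 4,  # e.g. quartiles, tertiles, continuous, threshold
    "outcome_window": 3,       # e.g. 7-day, 30-day, in-hospital
    "model_family": 4,         # e.g. logistic, linear probability, Cox, mixed
    "exclusion_rules": 2 ** 5, # five optional exclusions, each on or off
}

# Independent choices multiply, so the total grows very quickly.
total_specifications = covariate_subsets * prod(other_choices.values())

print(f"covariate subsets alone: {covariate_subsets:,}")
print(f"total specifications:    {total_specifications:,}")
```

With these made-up inputs the total already exceeds a quadrillion (10^15). The point is only that a published analysis is one path through a very large space of defensible alternatives.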
The ABIM study used one method of analyzing the data. But as Zeraatkar’s and Nosek’s teams have shown, observational research offers many different analytic choices. I would bet any of you an espresso that other research teams making different but reasonable analytic choices would find null results.
Perhaps the largest problem with this study is plausibility. If you know anything about hospital medicine, it would be foolish to believe that being in the lowest quartile of an open-book test 2 years in the future “caused” a 0.4 percentage point higher 7-day mortality.
Why? Because a zillion things are on the causal pathway to 7-day mortality of a hospitalized patient. Hospitalist care represents a tiny fraction of these causal pathways. For instance, the vast majority of patients sick enough to die a week after being hospitalized see specialists, who, arguably, have a larger effect on outcomes. Hospitalists have little to no control over the quality of specialty care. I don’t care what the p-value is, a 0.4% delta is pure noise.
The final matter is the glaring conflicts of interest. Three of the authors of this study are employees of ABIM. The senior author, the only physician author, reports a conflict, but none of the other authors consider their employment at ABIM as a potential conflict. Neither did the editors of JAMA-IM. Imagine a drug study with three of four authors from the drug company. This would never be published.
Final Conclusion:
I see this as one of the year’s best examples of marketing disguised as science. It is embarrassing for the ABIM to promote such weak efforts at empirical study. But it’s even more embarrassing for the editors of JAMA-IM to publish such a dubious study, especially without demanding proper disclosure of conflicts.
The thing is that ABIM’s value is entirely testable. It would simply require a randomized controlled trial where testing is compared to standard CME. Why this has not been done or required is surely related to the fact that ABIM has a monopoly. It does not need to prove its value.
I want to see the data on how JCAHO is saving lives by not allowing drinks at the nurses' station. Really ready to go to the research on that one.
The purpose of ABIM, if you put money aside, is to assure public safety. Ironically, I was sitting in a MOC review course in DC when I received an urgent phone call. When I answered, a paramedic told me that my wife, while visiting family, had developed a severe headache, collapsed with a grand mal seizure, and was now unresponsive. I was distraught but somehow made it back to the hotel and eventually reached an emergency room physician. His exact words were "she's awake, I'll call you". When I got through security at the airport, the phone rang and he said "it's a subdural, the surgeon's on the way". I thought for a second and said "I don't know you, but you seem to know what you're doing. As a colleague, please: do I need to transfer her, or can this person take care of it?" He thought for a second and said "yes, this person can; the one tomorrow, no".

And there you have it: all of the MOC outcomes distilled into what physicians know on a daily basis. There are people who clearly are not performing up to quality, but if you try to do something about it, especially in a competitive hospital with multiple practices, it's impossible. One of the reasons the heart hospital we owned met every quality metric and then some was that it was a single practice with a single group. Four physicians on a board could do what needed to be done without fear of restraint-of-trade claims or other significant legal action. If you want to deal with the burden of the ABIM, unfortunately, as a group we need to deal with the reality we all know. Sort of like the Godfather, we know where the bodies are buried. In every city in America there is a small percentage of bad actors, as in every profession. Until there's a way for us to deal with that in a compassionate and rational way, this will continue.
If you want to do an interesting study on outcomes, compare cities in terms of desirability to live versus outcomes. We had 100 applicants one year for a position in cardiology in a suburb of Austin.
It's the competition for jobs in nice cities that allows practices in hospitals to choose, and while that too is subjective, the statistics of having a greater applicant pool probably hold.
As the practice grew and got to the critical mass of close to 50, we were also able to evaluate abilities a lot more easily thanks to the "network" of doctors. It seems graduation from fellowship does not mean quality either. All of our physicians were on a two-year probation with written evaluations every three months. This is post fellowship and post being signed off on.
It was necessary.
No patient cares what your test score is, but they want to know that if their family is in the ER with a life-threatening problem, the person driving in is able to handle it.
Thankfully, airplanes have the technology to help. The instant access to information and maybe even AI can help Medicine in the same way except perhaps in the OR.
Unfortunately, now I've been on all sides. Physician, physician president of a group, patient and family member. It sure is a lot easier when the physicians involved are high-quality and not lunatics. That's another post.
Figuring that out before you hire them, work with them or before they see you or your family is the challenge. Once they pass the basic board certification, the test scores are not helpful.