When Studies Can't Answer an Important Question (but are still published)
Study of the Week readers may be shocked to read this. I was.
Let’s do a thought experiment about the tricuspid valve (TCV), which controls blood flow from the right atrium to the right ventricle.
Background: A common TCV problem occurs when the leaflets don’t close properly during systole and too much blood regurgitates back into the right atrium. We call this tricuspid regurgitation, or TR.
TR creates problems for patients: it can lead to terrible swelling in the abdomen or legs, and it causes dilation of the RV that can progress to RV pump failure.
Surgeons can operate on TR to either repair or replace the valve. But it’s a tough operation with high risk to the patient.
Design of the Experiment: An important question in cardiology is whether and when to operate on patients with TR. You are tasked with designing such an experiment.
First you would find patients with severe TR (with or without symptoms) and then randomize one group to surgery and one group to medical therapy. Since it’s impossible to blind such an experiment, you would definitely need an unbiased endpoint, say death. Randomization would be critical because the only way to judge the effect of the treatment is to compare groups that are matched on both known and unknown factors.
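If code helps, here is a toy simulation of why randomization is so powerful (a sketch with made-up numbers, not data from any trial): even a risk factor that nobody measures ends up roughly balanced between the arms.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical cohort of 400 patients with an unmeasured "frailty" score
# that strongly predicts death but appears on no case report form.
n = 400
frailty = rng.normal(0.0, 1.0, n)

# Randomize 1:1 to surgery vs medical therapy.
arm = rng.permutation(np.repeat(["surgery", "medical"], n // 2))

# Mean frailty per arm comes out nearly equal, despite never measuring it,
# so an outcome difference between arms can be credited to the treatment.
for a in ("surgery", "medical"):
    print(a, round(float(frailty[arm == a].mean()), 3))
```

Observational comparisons carry no such guarantee: whatever pushed one patient toward surgery and another toward pills also travels with the outcome.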
Perhaps a better way to study this would be a trial with a sham arm: one group gets the real repair, one group undergoes an operation (anesthesia and incision) but no actual repair, and one group gets standard medical care. Sham operations are tough to do because of the ethical challenges. (Sham operations would be a great topic for future posts.)
The Actual Study published in an Actual Journal: A group of surgeons from the Cleveland Clinic reported their study in The Journal of Thoracic and Cardiovascular Surgery, which has an impact factor of 6.
The title gives away the results.
Here is what they did: they took 159 patients who had severe TR and were operated on between 2004 and 2018. These were operations solely for the tricuspid valve; most tricuspid surgeries are done as add-ons to other surgeries.
After the fact (retrospectively), they divided the patients into two groups.
One group (n = 115) had surgery based on the guideline recommendations, which require severe symptoms (a class I indication). The other group (n = 44), called the early group, comprised patients who had severe TR and some RV dilation on echo, but no symptoms.
They then looked at a primary outcome of death.
The early group had much better results. Patients in the class I group also had higher composite morbidity than those in the early-surgery group (35.7% vs 18.2%; P = .036).
Here is the graph. Notice that most of the difference occurs in the first few months. (See my later comments.)
The surgeons concluded (emphasis mine):
Patients with class I indication for isolated TV surgery had worse survival compared with those undergoing earlier surgery before reaching class I indication. Earlier surgery may improve outcomes in these high-risk patients.
They then spent, and I kid you not, 1400 words in the discussion section promoting the finding that operating earlier on these patients is better. The message was that we wait too long to operate on these patients.
But then, after these 1400 mostly glowing words, they wrote one sentence that exposes the fatal flaw of this exercise.
Differences in baseline characteristics between the 2 groups likely largely explain the differences in their outcomes as discussed above, and propensity matching was not performed.
Comments:
I show you this study because it shocks me. I am shocked that academic researchers would publish such an exercise.
There was no randomization, so the baseline characteristics of the two groups had major differences. I counted nine factors on which the early-surgery group was healthier: for instance, they were younger and leaner and had less heart failure and less atrial fibrillation.
The comparison, therefore, was between a healthier group vs a sicker group. That is surely why the survival curves separate in the first few months.
Not randomizing and then comparing healthier patients with sicker ones is bad enough, but it gets worse.
A normal practice when comparing two non-randomized groups is to attempt some sort of statistical matching. The idea is to select, from each group, patients who at least look like each other on measured baseline factors, and to compare outcomes only within those matched sets. The authors did not do this.
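For the curious, here is what such matching looks like in miniature. This is a generic Python sketch on simulated data, not the authors’ analysis and not the study’s actual variables: model each patient’s probability of getting early surgery from baseline factors, then pair early-surgery patients with class I patients who have similar probabilities.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Simulated baseline factors for 159 patients: age, BMI, heart-failure score.
# (Hypothetical numbers; the real study had about nine imbalanced factors.)
n = 159
X = np.column_stack([
    rng.normal(65, 10, n),  # age
    rng.normal(28, 4, n),   # body mass index
    rng.normal(2, 1, n),    # heart-failure severity score
])
# Build in the confounding: younger patients are more likely to get
# early surgery (treated = 1), just as in the study.
treated = (rng.random(n) < 1 / (1 + np.exp(0.08 * (X[:, 0] - 65)))).astype(int)

# Step 1: model the probability of treatment (the propensity score).
ps = LogisticRegression().fit(X, treated).predict_proba(X)[:, 1]

# Step 2: greedy 1:1 nearest-neighbour matching within a caliper
# (the maximum allowed difference in propensity score).
caliper = 0.05
controls = {i: ps[i] for i in np.flatnonzero(treated == 0)}
pairs = []
for i in np.flatnonzero(treated == 1):
    if not controls:
        break
    j = min(controls, key=lambda k: abs(controls[k] - ps[i]))
    if abs(controls[j] - ps[i]) <= caliper:
        pairs.append((i, j))
        del controls[j]  # each control is used at most once

print(f"Matched {len(pairs)} of {treated.sum()} treated patients")
# Outcomes would then be compared only within the matched pairs.
```

Even done well, matching balances only the factors you measured. Unknown confounders stay unbalanced, which is why the randomized design in the thought experiment above remains the stronger approach.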
What gets me is that their own disclaimer says the differences between the patients explain the results. The translation: our experiment cannot answer the question.
But they still publish the study and conclude that we should be operating earlier on patients with tricuspid valve disease.
These are prominent surgeons from a famous center. This is an academic paper with strong conclusions. It may have influence over other surgeons. It may change what surgeons do, or when cardiologists refer patients for consideration of surgery.
But. But.
The study’s methods are utterly unable to answer the question. And the authors know this; they said as much.
But there the study sits—in an academic journal.
What should we think about this? Why does this happen? What does it say about medical science? How does it not induce cynicism? I am asking.
As always, we at Sensible Medicine remain grateful and surprised at your support. Thank you. The Study of the Week has remained open to all subscribers. JMM
I disagree, John. If the editors recognized the controversial element of the discussion, they could and should have required a revision of the conclusions. Alternatively, they could have solicited a contrary editorial to bring out the deficiencies of the study. We in the medical profession need widespread education and exposure regarding faulty studies. Journal readers EXPECT careful and critical reviews. Subscribers need to complain to the editors and publisher that this flawed study was published without reviewers demanding a fuller discussion of limitations and an opposing view in an editorial. Physicians need to seriously consider their responsibility to be critical, especially in light of their continued general respect for the deeply flawed Tony Fauci.
Dear John
Shocking example!
If you are collecting articles on what I call 'quasi-experimental studies', i.e. non-randomised prospective studies whose authors can't be bothered to do a proper analysis, here is another example:
Pan H, Zhou X, Shen L, Li Y, Dong W, Wang S, et al. Efficacy of apatinib + radiotherapy vs radiotherapy alone in patients with advanced multiline therapy failure for non-small cell lung cancer with brain metastasis. Br J Radiol (2023). doi:10.1259/bjr.20220550
Propensity score matching is not a panacea, but it can be useful. I wrote a discussion of propensity scores here:
Campbell MJ (2017) What is propensity score modelling? Emergency Medicine Journal. doi:10.1136/emermed-2016-206542