25 Comments

I read the news late and just got to this.

My problem with this is that the study is unblinded, and the oodles of bias show in this little stat:

> Here is another endpoint: Death due to cardiac disease was 74% higher in the restrictive arm. The CI went from 1.26 to 2.4.

This is an insane stat, and it makes me think the adjudicators of the cause of death were biased away from blaming cardiac disease. When I declare death in patients, if I have no direct cause in mind, I just put 'cardiovascular arrest'. If I were part of this unblinded study, I'd think three times before putting that down for the person I just spent a couple of days filling with blood.

This then transitions over to 'adverse effects'. I am very wary of increasing blood viscosity and volume in cardiac patients; my rule of thumb is that, absent defects in blood production proper, the body mounts anemia of chronic disease as a compensatory response for a reason.


As a non-physician, I don't understand why we can't create a set of X reasonable scenarios (20? 50? 200?) for each pathology/patient and, based on the data we have on those, deliver one or more "approved" therapies.

Whenever we think that there are reasons to NOT follow these guidelines we have to follow a certain process which includes an experimental framework by which the non-standard intervention and its outcomes get tracked for scientific purposes.

Isn't the risk of arbitrary decisions based on individual prejudices - worsened by the lack of precise tracking of scenario vs. intervention vs. outcome - way bigger than the risk of being forced to deliver a bad therapy by bureaucratic-algorithmic processes?

I understand that a physician is a professional with a sophisticated knowledge base and great responsibilities, but I don't get how what you do can't be more standardized.

In this case I'd conclude that the standard of care should be transfusion, but that we need more (precise) data to understand what's going on: therefore physicians can still deliver the restrictive therapy but they have to follow a stringent experimental framework.


I like to remind myself about what a P-value means. It is not the probability that the result is wrong. It is the probability that the difference found in the study was due to random chance. A P-value of 0.05 means a 5% probability that the result was random. My probability/statistics teacher said you have to decide on a cut-off somewhere, and 5% seems to be an acceptably low enough chance by consensus. But it is only an arbitrary consensus. That same teacher said it's subjective, and that he himself would feel comfortable in some cases with a P-value of 0.1, or a 10% chance the result was random. As a clinician, a 7% chance feels low enough for me to tilt toward a more liberal transfusion cut-off. For most studies, a P-value should not be the primary criterion for the validity of a study. More important is whether the study design accounts for bias and confounding, which are both different from randomness. P-values and confidence intervals only address random chance.


I'm very sorry but no, the P-value is NOT "the probability that the difference found in the study was due to random chance", nor does "a P-value of 0.05 mean a 5% probability that the result was random." Those are among the many common misinterpretations of P-values, although they can be found in some statistics primers; see for example the 2016 article I linked for a discussion of why those are incorrect. Also, no, you don't have to decide on a cut-off; you can and often should suspend judgment and just report whatever P-values you got, delaying decisions until you see more results rather than committing to a possible mistake because of inappropriate pressure from reviewers, editors, etc. And then a P-value is never a criterion for validity of a study; instead a P-value assumes the study is valid (it assumes the study is showing the truth apart from random "noise"). Finally, I repeat again that it can be very important to show P-values for alternatives as well as for the null. I do agree however that bias (including confounding) is crucial to consider, because P-values and CI only address random errors left after adjustments for controllable biases.
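For readers wanting to see what a P-value actually is, here is a toy simulation with made-up counts (nothing from the trial): assume the null hypothesis and the sampling model are true, then count how often data at least as extreme as the observed difference arise. That tail frequency is the P-value; note the calculation never yields "the probability the result was random."

```python
import random

# Toy illustration: the P-value is computed *assuming* the null and the model
# are true. It is the frequency, under that assumption, of data at least as
# extreme as observed. All numbers below are hypothetical, not the trial's.
random.seed(1)

n = 200             # patients per arm (hypothetical)
observed_diff = 13  # observed difference in event counts (hypothetical)
p_null = 0.115      # common event risk in both arms under the null (hypothetical)

def simulated_diff():
    """Absolute difference in event counts between two arms when the null is true."""
    a = sum(random.random() < p_null for _ in range(n))
    b = sum(random.random() < p_null for _ in range(n))
    return abs(a - b)

sims = 5000
extreme = sum(simulated_diff() >= observed_diff for _ in range(sims))
p_value = extreme / sims
print(f"two-sided simulated P-value: {p_value:.3f}")
```

Everything here is conditioned on the null being true, which is why the result cannot be read back as the probability that chance produced the observed difference.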


I come from a statistics and research methodology background rather than a clinical background, although I've been involved in a hundred clinical studies. As such, I cannot fathom why and how researchers still obsess about whether p is above or below 0.05 as if the latter is some natural constant, or equivalently whether the 95% CI contains the null value. The founders of modern statistics including Fisher himself advised flexibility with the cutoff depending on the circumstances, while Neyman & Pearson and their successors emphasized that the cutoff should be chosen based on the costs of false positive (Type 1) vs false negative (Type 2) errors. And that is leaving aside the many (including me) who advise that such cutoffs should not be the basis for decisions, and should serve only as convenient reference points much like labeled points on a graphical axis. Conclusions about treatment effects need other information, including P-values for alternatives of clinical importance such as a minimal clinically important difference. The CI shows quickly where those alternative P-values are relative to 0.05, but better still is to look at them directly.
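The "look at P-values for alternatives directly" idea can be sketched in a few lines. With a hypothetical log hazard ratio estimate and standard error (placeholders, not the trial's numbers), one can compute the two-sided P-value for a grid of hypothesized HRs, i.e., points on the P-value (compatibility) function:

```python
import math

# Sketch of a P-value (compatibility) function under a normal approximation.
# The estimate and standard error below are hypothetical placeholders.

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def p_value_for(hr_hypothesized, log_hr_est, se):
    """Two-sided P-value testing HR = hr_hypothesized under a normal model."""
    z = (log_hr_est - math.log(hr_hypothesized)) / se
    return 2.0 * (1.0 - normal_cdf(abs(z)))

log_hr_est = math.log(1.15)  # hypothetical point estimate, HR = 1.15
se = 0.08                    # hypothetical standard error on the log scale

for hr in (1.00, 1.10, 1.15, 1.25, 1.35):
    print(f"HR = {hr:.2f}: P = {p_value_for(hr, log_hr_est, se):.3f}")
```

The P-value peaks at 1.0 at the point estimate and falls off for values further from it; the null (HR = 1.00) is just one point on this curve, with no special status.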

Here are a few of the many open-access articles my colleagues and I have written recently trying to stem this unhealthy compulsion to treat 0.05 as some magic number or universal constant of science; the first lists common mistakes traceable to that compulsion, the others detail how to reorient one's thinking to get a valid picture of the statistical information in a trial, including information about possible effect sizes other than the null:

Greenland, S., Senn, S.J., Rothman, K.J., Carlin, J.C., Poole, C., Goodman, S.N., Altman, D.G. (2016). Statistical tests, confidence intervals, and power: A guide to misinterpretations. The American Statistician, 70, online supplement 1 at https://amstat.tandfonline.com/doi/suppl/10.1080/00031305.2016.1154108/suppl_file/utas_a_1154108_sm5368.pdf, https://www.jstor.org/stable/44851769

Rafi, Z., Greenland, S. (2020). Semantic and cognitive tools to aid statistical science: Replace confidence and significance by compatibility and surprise. BMC Medical Research Methodology, 20, 244. doi: 10.1186/s12874-020-01105-9, https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/s12874-020-01105-9, updates at http://arxiv.org/abs/1909.08579

Greenland, S., Mansournia, M., Joffe, M. (2022). To curb research misreporting, replace significance and confidence by compatibility. Preventive Medicine, 164, https://www.sciencedirect.com/science/article/pii/S0091743522001761


Thank you for the links to your articles. They should be required reading for everyone involved in conducting or reviewing scientific and medical research.


My hospital prohibits blood transfusion until the Hgb is <7 (unless the patient is actively, briskly bleeding and you can convince the blood bank that the patient’s Hgb will soon be 0 if you sit around rechecking and waiting for it to fall below 7). I wonder if they’d designed the study so that “restrictive” was less than 7 whether it might have risen to significance.


Nice article, John. Once again, trial designers refused to use a full-information endpoint (an ordinal longitudinal endpoint) that would have (1) had greater power and (2) been able to tell us on which treatment patients fared better.


To clarify, from the article: "The primary outcome was a composite of myocardial infarction or death from any cause up to 30 days after randomization." With the ordinal longitudinal endpoint, are MI and death treated as two potential independent endpoints instead of as a composite?


The composite outcome you mentioned takes, as usual, 'composite' to mean 'union', ignoring the different impacts/consequences of the different events. Ordinal endpoints break the ties: death is counted as worse than MI. The different kinds of events are not considered independent, but when using a hierarchical structure, if death occurs on the same day as MI, the MI is ignored. The outcome is the worst outcome in a given day. More at https://hbiostat.org/rmsc/markov - see the links at the top of that chapter. Ordinal analysis approximates a true patient utility-based outcome analysis.
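To make the hierarchy concrete, here is a minimal sketch (with illustrative state names and severity ranks, not the trial's actual coding) of deriving a daily ordinal outcome in which death dominates MI on the same day and is treated as an absorbing state:

```python
# Hypothetical severity ranking: higher is worse. These labels are
# illustrative only, not the coding used in the trial.
SEVERITY = {"event-free": 0, "MI": 1, "death": 2}

def daily_ordinal_outcome(events_by_day, n_days):
    """Worst recorded state per day; days after a death stay 'death' (absorbing)."""
    outcome, dead = [], False
    for day in range(1, n_days + 1):
        todays = events_by_day.get(day, [])
        if dead or "death" in todays:
            dead = True
            outcome.append("death")
        else:
            # Tie-breaking: keep only the worst state recorded that day.
            worst = max(todays, key=SEVERITY.get, default="event-free")
            outcome.append(worst)
    return outcome

# Example: MI on day 2; MI and death both on day 5, so day 5 counts as death.
print(daily_ordinal_outcome({2: ["MI"], 5: ["MI", "death"]}, n_days=7))
# -> ['event-free', 'MI', 'event-free', 'event-free', 'death', 'death', 'death']
```

A longitudinal ordinal model (e.g., the Markov approach linked above) would then analyze these daily states rather than collapsing everything into a single yes/no composite.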


The overarching point, that this result would not lend itself to a cookie-cutter algorithmic “guideline”, is fortuitous and highly welcome.

It was unfortunate this study included both type 1 and type 2 MI (as well as types 4b and 4c, although these are arguably less problematic). I’m most interested in how to manage a plaque rupture event, and not the quagmire of myriad pathologies involved in any “type 2” diagnosis.

And given the overall “negative” result, it is difficult with a “purist” approach to draw too strong a conclusion about the subgroup of type 1 MI, who did derive a statistically significant outcome benefit (also, they didn’t account for multiplicity). But this adds to the regret that they didn’t just study type 1 MI to begin with.

I think my response to this data is to not trip over myself and transfuse for a Hb of 99… but I may not wait until 70-80 (as the “hard” outcomes are indeed difficult to completely disregard). I will likely apply this non-black-and-white result in a non-black-and-white fashion.


And there I was, curiously expecting a cardiologist's opinion on trials of joint denervation. I think a BMJ Christmas piece on trial acronyms is in order. :)

Yet - interesting as always.


A Bayesian approach may inform. The Cochrane analysis included two trials of liberal vs restrictive transfusion in acute coronary syndrome, 45 and 110 patients. The larger trial trended to benefit, and the other found harm (p=0.046). The prior probability (credence) generated by these two studies suggests that this latest trial merely maintains the equipoise.
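For readers unfamiliar with the mechanics, here is a rough normal-approximation sketch of that kind of Bayesian update. Every number below is a hypothetical placeholder, not an estimate from the Cochrane trials: a precision-weighted prior built from two small, conflicting trials is updated with a larger trial centered near no effect, and the posterior stays close to RR 1, i.e., equipoise is maintained.

```python
import math

# Normal-normal conjugate update on the log relative-risk scale.
# All means and variances below are hypothetical placeholders.

def combine(mean1, var1, mean2, var2):
    """Precision-weighted combination of two normal summaries."""
    w1, w2 = 1.0 / var1, 1.0 / var2
    mean = (w1 * mean1 + w2 * mean2) / (w1 + w2)
    return mean, 1.0 / (w1 + w2)

# Prior from two small, conflicting trials (one toward benefit, one toward harm):
prior_mean, prior_var = combine(math.log(0.8), 0.30, math.log(1.6), 0.35)

# Update with a larger new trial centered near no effect:
post_mean, post_var = combine(prior_mean, prior_var, math.log(1.0), 0.05)

print(f"posterior RR ~= {math.exp(post_mean):.2f} "
      f"(log-scale SD {math.sqrt(post_var):.2f})")
```

With inputs like these, the posterior sits near RR 1 with substantial remaining uncertainty, which is one quantitative way of saying the new trial leaves the question roughly where it found it.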


A very useful study. But no quandary; the differences are obviously insignificant. In a book I wrote I included an Appendix on how to interpret a medical study. My advice was to always look at the raw data and ignore the odds ratio, confidence intervals, regression equations, and p values. Just look at the numbers and decide for yourself whether the differences are of any practical significance. Any study that doesn't give the raw data and the basic percentages of real risk is hiding something and probably engaging in deception.

From a pathophysiologic angle it makes sense that transfusion would probably do very little to alter the outcome. Myocardial infarction involves the complete obstruction of an artery or a branch. It seems unlikely that improving the oxygen level in blood that can't reach the area of damage would be of much use. I know that many have pointed out that the surrounding peri-infarction tissue may be spared with better oxygenation, but this is speculative.

I also thought the choice of terms for the two options was interesting and possibly reflects some bias. Liberal of course is derived from the word liberty and connotes fairness, generosity, and a bunch of other favorable terms. Restrictive is a word that makes one think of more negative attributes such as meanness, penury, censorious behavior, etc. I am not saying that this was done with deliberate intent but it is interesting.


Well, maybe the lawn signs I see in my neighborhood are right: science is real. Because this is what I think of when someone says “science” (I am surprised it didn't make the front page of our national news outlets).

Using a well-designed study to help us make clinical decisions about balancing the benefit of an intervention with its inherent risk and cost (transfusion), and challenging a long-held belief that is rooted in common sense and bench research (more oxygen delivery = good). Its conclusion of noninferiority is honest, but clearly, in the details, there is a suggestion that some people may be harmed by a restrictive strategy. It is up to the clinician to use judgment until another trial is designed to bring into better focus who may benefit.

And it is consistent with other studies of transfusion after heart surgery. DOI: 10.1056/NEJMoa1711818

Our tax money well spent by the NHLBI.

Thanks Dr Mandrola. It helps my morale to see the system work as it should.


I agree, especially with the last sentence! Doctoring is not just about applying evidence but also about applying clinical judgement. Statistics is a useful tool in our profession, but one of many.


I agree with your principle, and I like your point that sometimes we need science AND common sense (with the nuance that both can lead us astray). But how about the use of IV iron in these folks instead of transfusion? That might strike a middle ground. Any hematologists want to chime in?

I'm in primary care, so I do a lot of pre-op clearance. Surgeons always want a CBC, so I've started adding ferritin and iron studies to pre-op clearance on the premise that if you're going to lose a lot of blood, I want to know what your iron stores look like. Lots of folks are iron deficient without anemia (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8002799/), and treating this is particularly important with operative blood loss, menstruation, and maybe frailty. We won't see the boost quite as quickly, but we would see the H/H going up within a month.


Excellent suggestion. I never thought of that when I was in practice and, if I weren't retired, I would definitely include those tests in any pre-op evaluation.


Excellent 2nd paragraph! I was an internist/fellowship-trained hematologist before going to a dermatology residency, for which I apologize even though I only practiced medical dermatology.

Most doctors do not know or appreciate the iron-mitochondria relationship. The former is needed for the latter to function normally, not to mention oxygen-carrying capacity. Mitochondria are the engines of the body!

Even though I am long retired, I continue to harp on this because iron deficiency, even before anemia, can cause problems, as alluded to in Dr. Nielsen's comment.

Nov 27, 2023 · Liked by John Mandrola, Adam Cifu, MD

The problem here is relying on the dichotomous yes/no that comes from hypothesis-testing P-values. Although the P value was not significant because the CI crossed 1.0, there was a 95% chance that the true HR was somewhere between 1.0 and 1.35, favoring liberal blood transfusion. Almost all of the effect embodied in the CI favored liberal transfusion. To me, the conclusion of this study is that there is reasonable evidence that liberal transfusion is beneficial in this population and that more evidence is needed to use the results of this trial in clinical care. If the HRQOL outcomes (yet to be reported) favor blood transfusion, I would conclude that when the totality of effects are accounted for, a liberal transfusion should be considered for these patients.


Sorry, but no, there was not a 95% chance the true HR was between 1.0 and 1.35. That's mistaking a frequentist confidence interval for a Bayesian credibility interval. It's a type of inversion fallacy, related to mistaking the null P-value as the probability chance alone produced the association.

Now, 95% might be what you'd bet for 1.0 to 1.35 if you had no information about the effect size other than the data and the fact that the treatment was randomized. But is it really the case that all you know are the reported numbers and that they came from a well-conducted randomized trial? Not if you also know how the compared treatments work and about other studies of their effects.

One argument Bayesian statisticians level against conventional (frequentist) P-values and CI is that in practice almost no one seems to interpret them correctly, and many instructors and tutorials don't get them right either. That's a fair criticism, but I think it reflects an entrenched cultural problem, much like failure to follow antiseptic procedures in 19th-century medicine. My colleagues and I think it is possible for most anyone to interpret P-values and CI correctly if one follows good mental hygiene about what those statistics are and (perhaps more importantly) what those statistics are not. You can judge that thesis for yourself by reading the citations I posted earlier today.
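One way to internalize the correct frequentist reading is a coverage simulation: across many repeated studies, roughly 95% of the computed intervals contain the true parameter, but any single realized interval simply does or does not contain it. A minimal sketch with hypothetical normal data (not trial data):

```python
import math
import random

# Simulate many studies estimating a known true mean; check how often the
# 95% CI procedure captures it. The "95%" describes the procedure's long-run
# behavior, not the probability for any one realized interval.
random.seed(7)
true_mean, sigma, n, z = 0.10, 1.0, 400, 1.96  # hypothetical setup
half_width = z * sigma / math.sqrt(n)

trials = 5000
covered = 0
for _ in range(trials):
    sample_mean = true_mean + random.gauss(0, sigma / math.sqrt(n))
    covered += (sample_mean - half_width) <= true_mean <= (sample_mean + half_width)

print(f"coverage over {trials} simulated studies: {covered / trials:.3f}")
```

The printed coverage lands near 0.95, which is exactly what the confidence level promises; nothing in the simulation assigns a probability to one particular interval containing the truth.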

author

Thanks for reading and commenting EL. I look forward to a guest post from you someday ;)


All of these drug trials are executed for the express purpose of creating more and more drugs because the drugs we have now are virtually useless or inefficient. Big pharma needs a constant flow of "new" drugs (often nothing more than an altered older drug) so that they can charge outrageous prices while pretending the newer drugs are much more efficient. This goes on and on as the population gets sicker and sicker.

The answers to a healthier life and living a healthier life for longer are to be found well outside any drug, vaccine or mRNA product. Healthy living can be obtained virtually free in many respects.

Study results, test results and statistics can all be manipulated or driven to produce the desired results. Often, they are.


But to be fair, this study wasn't of any drug other than human blood. No pharma angle.
