41 Comments
David AuBuchon:

What is an "ideal" metric anyway? (is "metric" the right vocabulary?) My initial beliefs are that it would reflect cognizance of all of the following in its design:

1) No patient is average and a variety of effect sizes could happen on an individual level.

2) No patient is average with respect to adverse effects either.

3) Individual patients could hence have realized favorable or unfavorable risk-benefit tradeoffs within a trial regardless of the overall results.

4) There is uncertainty around effect sizes

5) There is uncertainty around adverse effects

6) Arbitrary cutoffs are arbitrary

7) An assessment of risk-benefit tradeoffs on an individual (not overall) level is what patients and clinicians either want or will make on their own, by definition.

8) Metrics that report overall or average outcomes are in essence obscuring or reducing the information that exists at the patient level.

9) The metric should be amenable to both interventional and observational studies of different types of outcomes.

10) The metric should be amenable to meta-analysis, to enriching the interpretation of past and future results, and to having its interpretation enriched by past and future results.

11) An “ideal” metric assumes the existence of excellent data transparency and adverse event reporting.

12) Relative and standardized measures are not intuitive to read.

13) Clinically significant is not the same as statistically significant.

14) Even in the presence of complete knowledge, risk-benefit tradeoffs are subjective (e.g. palliative chemo that will give you 2 more months but lots of side effects).

15) Potential adverse effects are of three types: those recorded during the study, the potential for a negative effect size, and the potential for long-term effects not captured.

- Various metric ideas are bouncing around in this head, but none of them coherent enough to write down.

- One thought: since we want to get away from arbitrary thresholds, maybe we can make a metric that dynamically bakes in its own decision about a confidence level and then standardizes itself against something. Or maybe present a continuum of metric values that are a function of readers' personal subjective risk-benefit values (a rough sketch of this idea appears at the end of this comment). (On a side note, the word risk implies the possibility of harm, whereas the word benefit implies certainty of benefit. I always found this inconsistent. It perhaps ought to be a "benefit-harm" tradeoff.)

- Another thought: in cosmetic trials, dermatologists subjectively rate things, and the investigators look at inter- and intra-rater reliability. At the end of the day, the subjective interpretations of humans are what determine the value of a study. Should we be routinely polling people?

Thanks for lasting through this rant.
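Here is the rough sketch promised above, with entirely hypothetical numbers and a simplifying assumption that the effect estimate is normal on the log scale. Instead of one verdict at one arbitrary cutoff, it reports a continuum: the probability that the true hazard ratio beats each of several thresholds, which readers can then weigh against their own subjective benefit-harm values.

```python
import math
from statistics import NormalDist

def effect_continuum(log_hr, se, thresholds):
    """P(true HR < t) for each threshold t, under a normal
    approximation on the log scale (a simplifying assumption)."""
    z = NormalDist()
    return {t: z.cdf((math.log(t) - log_hr) / se) for t in thresholds}

# Hypothetical trial: estimated HR 0.95 with SE 0.026 on the log scale
probs = effect_continuum(math.log(0.95), 0.026, [0.90, 0.95, 1.00, 1.05])
for t, p in probs.items():
    print(f"P(HR < {t:.2f}) = {p:.2f}")
```

No single threshold is privileged here; the reader's own values decide which row of the continuum matters.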

Sigdrifr:

Not a rant, but a great list of limitations.

The big one, IMO, is that RCTs (and most clinicians) are looking at the intervention (drug, device, etc.) and not the patient. The obvious (I hope, but often ignored) biological differences among patients create statistical noise, but they should also alert all of us to the hard fact that one size does not fit all! The glib conclusion that a drug or procedure "works" flies in the face of biological reality. It works in certain patients (sometimes in a vast majority), but not in others. At the clinical decision level the question is, "will this work for this patient?", but RCTs cannot provide this guidance.
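A tiny simulation (hypothetical numbers) makes the point concrete: suppose individual treatment effects vary around a mean that favors the drug. The trial sees only the average, yet a large minority of patients can still be made worse off.

```python
import random

random.seed(1)

# Hypothetical: individual treatment effects (negative = benefit) drawn
# from a distribution whose mean favors the drug but has real spread.
effects = [random.gauss(-0.5, 1.0) for _ in range(100_000)]

mean_effect = sum(effects) / len(effects)
harmed = sum(e > 0 for e in effects) / len(effects)

print(f"average effect: {mean_effect:+.2f} (the drug 'works' on average)")
print(f"fraction of individuals made worse off: {harmed:.0%}")
```

A standard parallel-group RCT cannot distinguish this from "everyone benefits a little," which is exactly why "will this work for this patient?" goes unanswered.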

Franklin Carroll:

Why aren't they doing better power analysis? It seems like a lot of these problems could be avoided with better pre-study design.

Michael Campbell:

This is a very interesting issue, and it is wonderful that Frank Harrell has contributed to this discussion. Like him, I have promulgated confidence intervals and their correct interpretation for years, but I appreciate that confusion still occurs. I have three comments to make.

1) When discussing relative risks or hazard ratios, one should always consider the absolute risk difference as well. In DANCAVAS, the result was that the intervention reduced the risk of dying by 5% relative to the control (my emphasis). If the risk of dying on the control was low, then the risk difference could be very small and not important to those at risk.

2) I think it is a false dichotomy to say a treatment 'works' or 'does not work'. Treatments may have very small effects but still be said to 'work' if a very large trial was conducted and a statistically significant effect found. What is needed before the trial is conducted is a robust discussion as to what clinically important differences are worth considering. (Note: not a 'significant' difference.) There has been much work on determining a meaningful difference. As I have stated many times, one of the most useful aspects of a sample size calculation is that it requires interested parties to come up with some effect size that would be considered useful. In the PROTECTED-TAVR trial, what was considered a useful effect size? It should be available in the authors' reported sample size calculation. This would help interpret a confidence interval. (A small numeric sketch of points 1 and 2 follows below my sign-off.)

3) It is easy to forget that the point estimate is our best guess as to the true effect. Thus I would not promote reporting where only the confidence interval is given.

Mike Campbell

University of Sheffield
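The numeric sketch promised above, with hypothetical numbers throughout. The first function illustrates point 1: the same 5% relative reduction means very different absolute differences at different baseline risks. The second illustrates point 2 with a standard two-proportion sample-size formula, where the chosen clinically important difference drives everything.

```python
import math
from statistics import NormalDist

def abs_vs_rel(control_risk, relative_reduction):
    """Point 1: the same relative reduction means very different
    absolute differences depending on baseline risk."""
    arr = control_risk * relative_reduction     # absolute risk reduction
    nnt = math.inf if arr == 0 else 1 / arr     # number needed to treat
    return arr, nnt

def n_per_group(p_control, p_treated, alpha=0.05, power=0.80):
    """Point 2: standard two-proportion sample-size formula; the chosen
    clinically important difference (p_control - p_treated) drives everything."""
    z = NormalDist()
    za, zb = z.inv_cdf(1 - alpha / 2), z.inv_cdf(power)
    var = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    return math.ceil((za + zb) ** 2 * var / (p_control - p_treated) ** 2)

# A 5% relative reduction (the DANCAVAS figure) at two hypothetical baseline risks:
for base in (0.20, 0.02):
    arr, nnt = abs_vs_rel(base, 0.05)
    print(f"baseline {base:.0%}: ARR = {arr:.3%}, NNT = {nnt:,.0f}")

# Patients per group needed to detect that difference at the higher baseline:
print("n per group:", n_per_group(0.20, 0.19))
```

Note how the NNT balloons as baseline risk falls, and how many patients per group are needed to detect even a 1-percentage-point absolute difference.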

JDK:

Points 1 and 2 are well taken.

But re 3): a point estimate is NOT "the" "best" guess as to the true effect. It is a guess. It is not a random guess (that would only occur if the CI were 0 to ♾️), but to say it is the "best" guess is, I think, not a good claim or a good way to think about it.

Suppose there is a point estimate of 1.01 with a 95% CI from 0.72 to 1.42, and suppose that somehow there were a way to magically know the "true" effect.

You get to bet on 1.01 as the true effect (what you call your "best" guess). I get to bet on NOT 1.01 as the true effect. I will take my bet every day (with a properly sized bet; see Kelly). I think if you thought about it you'd take my bet too.

Where human beings are involved we should minimize our enthusiasm for likelihood of benefit and maximize our caution for risk of harm. The focus should be on the tails. Point estimates distract from that focus. And to say they are our "best" guess also distracts and overstates our knowledge.
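A quick simulation of that bet, with hypothetical numbers (true HR 0.9, log-scale SE 0.17): across repeated trials, the point estimate is almost never the true effect even with a generous tolerance, while the 95% CI covers the truth about 95% of the time. The last line sizes the even-money bet with the Kelly formula f* = 2p - 1.

```python
import math
import random

random.seed(42)

TRUE_LOG_HR, SE, TOL = math.log(0.9), 0.17, 0.005
N = 100_000
hits = covered = 0

for _ in range(N):
    est = random.gauss(TRUE_LOG_HR, SE)               # simulated trial estimate
    hr = math.exp(est)
    hits += abs(hr - math.exp(TRUE_LOG_HR)) < TOL     # "point estimate is right"
    lo, hi = math.exp(est - 1.96 * SE), math.exp(est + 1.96 * SE)
    covered += lo < math.exp(TRUE_LOG_HR) < hi        # 95% CI contains the truth

p_win = 1 - hits / N                                  # P(point estimate is wrong)
print(f"point estimate within ±{TOL} of truth: {hits / N:.1%}")
print(f"95% CI covers the truth: {covered / N:.1%}")
print(f"Kelly stake, even-money bet on 'not exactly right': {2 * p_win - 1:.0%}")
```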

Allan katz:

I prefer the success rates doctors in the trenches are having with their treatments - real medicine - to RCTs, which are problematic and far from how real medicine is being practised.

MSB:

On one hand, I'm glad that this type of conventional western medicine model is not the only option. On the other hand, I am sad that too often the pharma industry colludes with medical doctors and agencies to diminish other types of treatment; in particular, under the influence purchased by the pharma industry, agencies such as the FDA work to ban genuinely safe and effective alternatives which are also inexpensive.

DG:

My saying about medicine: I will use it when I need MAID; it's the only procedure doctors have perfected.

Crixcyon:

You can take any numbers and statistics and manipulate them to arrive at a desired result. All you need do is look at the global warming claims. You always have to look at who is sanctioning a study, who is funding the study, who is performing the study and who will gain from positive results. There will always be biases.

I tend to ignore most studies since the game of science has become political, big-pharma oriented, and highly questionable. Although I am not an expert in much of anything, my trust level is nearing zero, especially with anything connected to the medical establishment.

Medicine is a life-and-death equation. There is a 100% chance that my doctor and the related medical community will make money on my diseases, illnesses, and death, whereas my chances of surviving are all over the map. Having to depend on my doctor's interpretation of, and beliefs about, a trial or study is not something I want to subscribe to. Even if the "statistics" are in my favor, it's still only a guess.

So, they guessed wrong and I died. Or, I guessed wrong and I died. You can never prove with 100% certainty that making the other choice would have ended in a better outcome.

JDK:

This is the other extreme: nihilism. It is just as problematic as the scientistic one.

LeftTheLeft. AntiDemsAntiTrump:

This is a great post on an important issue. Along with uncertainty related to statistical considerations, what about uncertainty due to the trial not adequately carrying out the intervention it is supposed to be testing? Like if a clinical trial gives only 40% of the necessary dose of a drug, or if only 40% of the intervention group actually gets the intervention? Can't we all agree such trials are ridiculously flawed?

Where I am going with this relates to the following direct quotes in the Cochrane review pertaining to masking: "Adherence with interventions was low in many studies. The risk of bias for the RCTs and cluster‐RCTs was mostly high or unclear." "For example, in the most heavily-weighted trial of interventions to promote community mask wearing, 42.3% of people in the intervention arm wore masks compared to 13.3% of those in the control arm."

The Cochrane review clearly tells us that mask RECOMMENDATIONS lack effectiveness; I agree 100% with that. But with such low adherence, how can we HONESTLY claim that the Cochrane review tells us that masks themselves are not effective? BTW: I am not "pro-mask", I quit masking over a year ago!

Steve Cheung:

I agree with the general tenor of this post. Dr. Mandrola has motivated me over the years to adopt medical conservatism.

I feel part of the issue is the “Twitter-fication” of our lives, and of our minds. We don’t just need the “top line” result, and shouldn’t shy away from gray zones. But neither should we ignore or “spin” the actual result, insofar as what the result of that particular experiment says about the specific question under study.

So if the P value is above our accepted threshold for significance, and the HR 95% CI crosses 1, then we should say that we accept the null, period. But then we should comment on the point estimate, and the bounds of the CI, to say that a particular degree of benefit (or harm) isn't entirely excluded. Which actually should go without saying, since the entire point of P<0.05 is to say there is less than 5% likelihood of the observed result being a random or chance finding; since P does not (and can't ever) equal 0, the possibility of the result being erroneous is never excluded with certainty, by definition.

So I'd say DANCAVAS can conclude no significant benefit, but there is quite likely a benefit, since the CI includes but does not exceed 1. I'd agree with the PROTECTED-TAVR conclusion of no conclusive benefit, but the possibility of some remains. And I'd agree with Prasad on the masks meta-analysis: there is no conclusive benefit, but the possibility of benefit or harm is not excluded.

I’d also add that a scientific publication using a lay term like “inconclusive” is a dereliction of duty. It’s not inconclusive. It failed to compel us to reject the null. It failed to show a benefit. There remains a possibility of some benefit (and/or harm)….as is true of literally every scientific result that fails to reject the null. Saying “inconclusive” is to suggest that nothing has been learned, when in fact something has.
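Leaving interpretation disputes aside, the arithmetic tying the two summaries together is mechanical. A minimal sketch with hypothetical numbers: for a Wald test of HR = 1 on the log scale, the two-sided P-value drops below 0.05 exactly when the 95% CI excludes 1.

```python
import math
from statistics import NormalDist

def wald_summary(hr, se_log_hr):
    """Two-sided Wald P-value for HR = 1, plus the matching 95% CI.
    The CI excludes 1 exactly when p < 0.05 (same z, same 1.96)."""
    z = math.log(hr) / se_log_hr
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    lo = math.exp(math.log(hr) - 1.96 * se_log_hr)
    hi = math.exp(math.log(hr) + 1.96 * se_log_hr)
    return p, lo, hi

# Hypothetical: the same HR of 0.95 estimated with two different precisions
for se in (0.027, 0.020):
    p, lo, hi = wald_summary(0.95, se)
    print(f"SE {se}: p = {p:.3f}, 95% CI {lo:.2f}-{hi:.2f}")
```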

Frank Harrell:

Your interpretation of a P-value is highly problematic.

SEF:

What is the totality of gold-standard randomized clinical trial evidence on the mortality impact of mRNA vaccines and ivermectin?

mRNA vaccines (Pfizer and Moderna clinical trials totaling >70000 adults, conducted in late 2020 - early 2021 at peak vaccine effectiveness, by the companies themselves):

COVID deaths: 2 vaccine vs. 5 placebo

non-COVID deaths: 29 vaccine vs. 25 placebo

Overall mortality: 3% INCREASED with vaccine (even at the height of the pandemic)

Pfizer: https://www.nejm.org/doi/suppl/10.1056/NEJMoa2110345/suppl_file/nejmoa2110345_appendix.pdf – Table S4

Moderna: https://www.nejm.org/doi/suppl/10.1056/NEJMoa2113017/suppl_file/nejmoa2113017_appendix.pdf – Table S26

Note: if anyone says "it's not statistically significant," they are obviously sidestepping the entire point. The clinical trial mortality results for mRNA vaccines are indisputably ABYSMAL. Pretend those results were for ivermectin: would the establishment say ivermectin is safe, effective, and lifesaving?

Ivermectin (18 published randomized clinical trials totaling >7000 COVID patients, conducted from 2020-2022):

Overall mortality: 29% DECREASED with ivermectin (raw totals were 67 deaths in ivermectin groups vs. 102 deaths in control groups, out of roughly 3,600 in each)

https://c19ivm.org/meta.html#fig_fprd

Note on the ivermectin trials: almost all large trials delayed ivermectin until 5-7 days after symptoms started, most gave ivermectin for only 3 days, and gave ivermectin in isolation, outside of the protocols that almost all ivermectin proponents use. In contrast, the extremely successful clinical trial of Paxlovid (0 deaths Paxlovid vs. 13 deaths placebo) began Paxlovid an average of 3 days after symptoms started, gave Paxlovid twice a day for 5 days, and Paxlovid itself is a COMBINATION drug.
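Whatever one makes of these comparisons (see the exchange below), the raw-count arithmetic is easy to check. A minimal sketch using the standard normal approximation for the log risk ratio; note that a single ratio of pooled raw totals ignores between-trial structure, so it need not match a meta-analysis's pooled estimate.

```python
import math
from statistics import NormalDist

def risk_ratio_ci(a, n1, b, n2, level=0.95):
    """Risk ratio of group 1 vs group 2 with a log-scale
    normal-approximation confidence interval."""
    rr = (a / n1) / (b / n2)
    se = math.sqrt(1 / a - 1 / n1 + 1 / b - 1 / n2)
    z = NormalDist().inv_cdf((1 + level) / 2)
    lo, hi = math.exp(math.log(rr) - z * se), math.exp(math.log(rr) + z * se)
    return rr, lo, hi

# Raw totals quoted above (ivermectin deaths vs. control deaths)
rr, lo, hi = risk_ratio_ci(67, 3610, 102, 3629)
print(f"RR = {rr:.2f}, 95% CI {lo:.2f}-{hi:.2f}")
```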

Frank Harrell:

This thinking represents really bad science.

SEF:

Please share your "correct" clinical trial mortality results with links to the sources. If you cannot do so, then it is obvious that these are indeed the correct clinical trial mortality results and your intention is just to obfuscate these results that are devastating to the medical establishment. Being an "expert" does not give one a license to deceive the public on life and death matters by hiding devastating clinical trial mortality results. "Bad science" has become a code-phrase for suppressing any true scientific findings that undermine one's position.

Frank Harrell:

No intent-to-treat or modified-intent-to-treat randomized trials have demonstrated a benefit of ivermectin, and several well-done randomized studies have shown evidence for no benefit or trivial benefit.

SEF:

Your claim that "No intent-to-treat or modified-intent-to-treat randomized trials have demonstrated a benefit of Ivermectin" is not true. Please tell us how many of the 18 randomized clinical trials on ivermectin (which yielded overall mortality results of 67/3610 in ivermectin groups versus 102/3629 in control groups, ~30% reduction) did NOT report an intention-to-treat analysis? Publications available here: https://c19ivm.org/meta.html#fig_fprd. In fact, if we consider broad "benefit" as you stated (which goes beyond mortality), we have 45 randomized clinical trials and an even larger effect size https://c19ivm.org/meta.html#fig_fpr. Again, please tell us how many of these 45 trials did NOT report an intention-to-treat analysis?

Regarding your second point, the term "well-done" is completely subjective and prone to bias. Those of us who are UNBIASED will say that trials which delayed treatment for 5-7 days, and/or stopped treatment after 3 days, and/or avoided enrollment of high-risk patients are the polar opposite of well-done. (As I am committed to remaining unbiased, even though this is an extremely unpopular opinion here and will not win me "likes", I will say that the mask trials in the Cochrane review that prompted the original discussion were NOT well-done, due to the abysmal adherence and the failure to focus on higher-risk settings where masking COULD actually be beneficial IF it indeed works.) Most commonly, the term "well-done" is simply a euphemism for "I like the result". A much better criterion for a "well-done" trial is that the proponents of the intervention should broadly agree with the trial's design (of course BEFORE the trial is conducted).

Frank Harrell:

You make some fair points. Since I haven't taken the time to review all those trials I'll defer to your judgement. I would be interested in a summary of the intent-to-treat double-blind ones.

Dr. K:

John, excellent piece -- thanks. The most important point here is that the incentive/pressure to support The Narrative(TM) is profound. Masking has long been known to be worthless... there are more studies than one can count if one aggregates them all. Some are better than others, but the only couple (excluding a piece of cloth in a clean room) that seem to show any kind of positive effect are the weakest of all. As Cochrane properly concluded, there is no evidence from the studies they analyzed that masks have value. The scandalous piece was their non-scientific "retraction" of marvelous results, and then the repeated echo chamber of wrongthink, such as you illustrated here, in support of the bad science. Wish you had a bullier pulpit from which to shout this. Most people just emerge confused, which is the point of those trying to call decent results into question; but that is bad for science, bad for scientists, bad for physicians and, worst of all, bad for patients. So keep these kinds of pieces coming, please.

HardeeHo:

"Masking has long been known to be worthless" - Not to beat a dead horse, but Cochrane only finds that proof is lacking. So there might be some small benefit to masking, not worthless. However, in the DANMASK study there was a small effect signalling harm from masking! We can imagine that mask handling might increase the risk of infection, aside from the typical lack of fit along with the impression of being invulnerable while masked.

Departing from Cochrane, we can examine population-level data as Ian Miller has done in his book and at https://ianmsc.substack.com/ and https://twitter.com/ianmSC. If masking did anything, such data might show an effect. OTOH, critics point out that few wear masks properly, so population studies are meaningless; which raises the question that if people can't/won't wear masks "properly," then mask requirements don't matter.

No matter what we say or think about masks, our leaders will continue the assertion of effectiveness. Maryanne Demasi reports that Walensky was badly misinformed in her testimony recently https://maryannedemasi.substack.com/p/cdc-director-gives-misleading-testimony. If our leaders refuse to use data, we can only howl.

LeftTheLeft. AntiDemsAntiTrump:

Has anyone done an analysis of COMPLETE general population data on masking, without cherry-picking places and times? I'd really like to see such an analysis but haven't really seen any good studies on masking in EITHER direction, everything is so narrative-driven one way or the other. Sad.

James McCormack:

Another great example of how confidence intervals are misinterpreted is the meta-analyses that looked at the impact statins have on mortality in primary prevention. See our paper here for the ridiculousness of it all: https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/1471-2288-13-134

Five different meta-analyses all looked at pretty much the same data - and all got pretty much the same results - but 3 of the authors said statins do reduce mortality and 2 said no. This misinterpretation of CIs is a huge problem if one only reads conclusions. When you only have 1 confidence interval, I think the best one can say is that the true result is likely somewhere in the interval, but we're not really sure where. That's pretty much all you can say. Thoughts?
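To see how that happens, here is a minimal fixed-effect inverse-variance pooling sketch with made-up numbers (not the actual statin data): five individually inconclusive trials pool to a CI that barely straddles 1, an interval one author team can headline as "reduces mortality" and another as "no benefit."

```python
import math

def pool_fixed_effect(log_rrs, ses):
    """Fixed-effect inverse-variance pooling of log risk ratios."""
    weights = [1 / s ** 2 for s in ses]
    pooled = sum(w * x for w, x in zip(weights, log_rrs)) / sum(weights)
    se = 1 / math.sqrt(sum(weights))
    return (math.exp(pooled),
            math.exp(pooled - 1.96 * se),
            math.exp(pooled + 1.96 * se))

# Five hypothetical trials, each inconclusive on its own
log_rrs = [math.log(rr) for rr in (0.88, 0.95, 0.90, 1.02, 0.93)]
ses = [0.10, 0.08, 0.12, 0.09, 0.11]
rr, lo, hi = pool_fixed_effect(log_rrs, ses)
print(f"pooled RR = {rr:.2f}, 95% CI {lo:.2f}-{hi:.2f}")  # straddles 1, barely
```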

JDK:

It is not clear to me that point estimates should ever be published. It fools one into thinking that the point estimate is the "most likely" or purportedly "true" value. But this is wrong.

Consider a bin filled with many, many marbles/beads. Who knows how many? There are white beads and red beads. (There might be purple or orange beads.) We do not know the proportion. We dip a paddle in and take a sample (technically a mechanical sample, but we may pretend it is random). We have done _a_ trial. We count and find a proportion. (We hope we are good counters and have not made a mistake in our count. But mistakes will happen.)

Is the proportion of red beads in the paddle the true value? No.

Does it make sense to think of the counted proportion in the paddle as the most likely value? No.

Can we say that the counted proportion is "close" to the true value? Well, that depends on how big your paddle is.

But you are not a professional bead counter. You want to know the proportion for a specific reason. So you will really want to know where these beads are coming from, how they will be used, and what happens if there are the wrong color beads in the next phase of action. You require substantive knowledge of the underlying process.

You are also men and women of action. You want to _do_ something. But sometimes doing something does not actually make things better.

I think 95% intervals are too small; 99.7% at least. We are not making Guinness beer. Medicine deals with humans. But I will stop here because the topic is too large to work out in a Substack comment.
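The bead bin is easy to simulate (all numbers hypothetical; normal-approximation intervals). The paddle's proportion is rarely exactly the truth, the 95% interval misses the truth about 1 time in 20, and the wider 99.7% interval almost never does:

```python
import math
import random

random.seed(7)

TRUE_P, PADDLE, N = 0.30, 200, 20_000   # unknown-to-us truth; beads per paddle

def interval(p_hat, n, z):
    half = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half

exact = cover95 = cover997 = 0
for _ in range(N):
    reds = sum(random.random() < TRUE_P for _ in range(PADDLE))
    p_hat = reds / PADDLE
    exact += (p_hat == TRUE_P)           # point estimate exactly right?
    lo, hi = interval(p_hat, PADDLE, 1.96)
    cover95 += lo <= TRUE_P <= hi
    lo, hi = interval(p_hat, PADDLE, 3.0)
    cover997 += lo <= TRUE_P <= hi

print(f"paddle proportion exactly equals the truth: {exact / N:.1%}")
print(f"95% interval covers the truth: {cover95 / N:.1%}")
print(f"99.7% interval covers the truth: {cover997 / N:.1%}")
```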

KP:

It's pretty obvious to me that you need to look at the "funding and disclosures" section of any research publication to be aware of the potential bias and influence behind the conclusions any paper has arrived at. And of course, as I understand it, the majority of clinical trial data from studies are not available without a FOIA request. We must have transparency and accountability in research, journal publications, and the pharmaceutical industry to accurately interpret data.

Dr. K:

And even those are now turning out to be incomplete/wrong.

CKW:

Great article, thank you! I do indeed have lower confidence in medical evidence now - or at least in the way that medical evidence is reported.

Wayne Neville:

Why not talk to the experts rather than running clinical trials on things that can't possibly work? Many doctors assume they know everything about everything.

Stephen Petty - On the effectiveness of masks

https://www.youtube.com/watch?v=J3dnkbKoj4A

Petty Podcasts

https://www.youtube.com/channel/UCwPHqgMiWwjpqd5dA-Og_Ag
