There is much, much good in this article. The authors started out with great pains to interpret a confidence interval exactly correctly. Then they made a mistake:
"So, with this single poll, all we can say is the true result is likely somewhere between 37%
and 43% but we will be wrong with that statement 5% of the time."
No. Both parts of this sentence are incorrect. In frequentist statistics the true value is either in or outside the interval; there is no probability attached to this. The probability statement does not apply to 0.37 and 0.43 but to the process that generated this interval.
The extreme difficulty in interpreting confidence intervals should drive more people to Bayes, as described in my Bayesian journey at https://fharrell.com/post/journey.
Later the authors say
"Inferential statistics actually do NOT help us test a research hypothesis about whether an intervention worked or not. They assume the observed difference was solely due to chance and simply give us an estimate of the probability of such an occurrence over many potential repetitions of the study."
This is incorrect, as the statement applies only to classical frequentist inferential statistics. Any article on statistics that doesn't acknowledge the existence of Bayes is problematic.
Now take a look at
"No statistics can tell us if the medication worked or if the differences seen were clinically important. These decisions are clinical judgments--not statistical judgements. The ONLY reason we do inferential statistics is to singularly deal with the issue of chance. This concept is key to understanding inferential statistics."
That is false as again it applies only to classical frequentist statistics. With Bayesian posterior probabilities you are not needing to deal with "chance" in the sense above, and you obtain direct evidence measures such as the probability the treatment has any effectiveness and the probability of clinically meaningful effectiveness. And Bayesian uncertainty intervals are so much easier to interpret than confidence intervals.
An article about statistics should be exactly correct so as not to mislead readers, and researchers should stop pretending that the p-value/confidence limit form of inference is the only form that exists. Otherwise, new confusions will arise.
Frank - thanks for your response – very much appreciated. If I had been asked, before I posted this article, what the probability was that someone would bring up the issue of a Bayesian approach to statistics, I would have guessed that probability to be 110% 🤣
You are correct we were just commenting from a frequentist’s perspective – we should have acknowledged that and we certainly didn’t intend to suggest a Bayes approach did not exist. Maybe we could collaborate in the future on a post to help people simply understand and contextualize a Bayesian approach to statistics – I think that could be very valuable.
It appears the main issue you had was with our phrase “So, with this single poll, all we can say is the true result is likely somewhere between 37% and 43% but we will be wrong with that statement 5% of the time”. We did not intend the word “likely” to suggest a specific probability – although I understand why you might think that. What we wanted to get across was simply a way a reader could possibly start to think about what information a single point estimate and a single confidence interval might provide. So, we were trying to make it “as simply as possible but not simpler”. There is always a fine balance when one does that.
The issue here is a common one. We are using probability in our article in a frequentist sense which is still the most common approach used in clinical research. We actually agree with the key elements of your comments. Clinicians normally have to make an estimate of the probability of a patient's disease before deciding on diagnosis and treatment. Frequentist probabilities do not apply here as there is no such thing as a single case frequentist probability. Our article is actually about trying to explain this with regard to p values and CI's. The only way to deal with this issue would be to explain in detail to people the difference between frequentist and Bayesian probabilities, which is beyond the scope of this article.
Nonetheless, if 95% of the generated CIs will contain the true result, then 5% will not contain the true result. So, is it not reasonable to say it is likely that the one CI we have contains the true result given that 95% of them do BUT as we said we will be wrong with that statement with 5% of the CIs we see?
In essence, saying that 95% CI's will contain the true value 95% of the time, is another way of saying you can have reasonable confidence in this range. Frequentist statisticians use the term confidence rather than a degree of belief because single case probabilities do not have meaning to them.
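The long-run coverage claim above can be checked with a quick simulation. This is a minimal sketch (not from the original article), assuming a true proportion of 40% and simple normal-approximation intervals: generate many polls, build a 95% CI from each, and count how often the interval contains the truth.

```python
import math
import random

random.seed(1)
TRUE_P, N, POLLS = 0.40, 1000, 5000

covered = 0
for _ in range(POLLS):
    # simulate one poll of N respondents
    x = sum(random.random() < TRUE_P for _ in range(N))
    p_hat = x / N
    # normal-approximation 95% confidence interval
    half = 1.96 * math.sqrt(p_hat * (1 - p_hat) / N)
    if p_hat - half <= TRUE_P <= p_hat + half:
        covered += 1

print(covered / POLLS)  # close to 0.95
```

The probability attaches to the procedure: about 95% of the intervals generated this way cover the true value, but any single interval either does or does not.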
We do state quite clearly elsewhere in our post that “Inferential statistics don’t give us a probability” and that “Knowing this nuance is key to understanding statistics”
A REQUEST
To help our readers understand the difference between a frequentist approach and a Bayesian approach, I have a question about what you think might be the best way to simply interpret the result of a single trial.
THE SCENARIO
1) Let’s say there was only a single placebo-controlled trial of a new medication to see if it would reduce the risk of a heart attack. A single trial is not infrequently all we have in medicine when it comes to answering a clinical question.
2) Let’s assume the trial was well done and the findings were reported as a point estimate of 0.8 with a 95% confidence interval of 0.7-0.9 with the absolute risks of a heart attack in the placebo group being 10% and the risk in the new medication group being 8%.
How would you interpret this scenario? Thanks again for your response and interest.
Thanks very much for the nice reply James. I understand. Yes, we should do something in the future about Bayes. I think this goes a long way towards both approaches: https://discourse.datamethods.org/t/language-for-communicating-frequentist-results-about-treatment-effects https://www.fharrell.com/post/bayes-freq-stmts/ https://discourse.datamethods.org/t/bayesian-vs-frequentist-statements-about-treatment-efficacy . The first link has my proposed interpretations of trials like the one you just described. In terms of confidence intervals, Sander Greenland has published much related work and shows the advantages of using the term 'compatibility interval'. For now the important points are (1) the frequentist approach tries to be simple but it does so by not giving you what you want, and (2) the probability to attach to compatibility intervals is the long-run probability that the PROCESS used to generate the interval covers the true unknown treatment effect. The probability is not attached to a single realization of that process.
We need the actual numbers of patients in each group to do a proper Bayesian analysis. It cannot be backed out with just the info you have given.
"So, is it not reasonable to say it is likely that the one CI we have contains the true result given that 95% of them do BUT as we said we will be wrong with that statement with 5% of the CIs we see?"
No, this is an illegal inference by frequentist rules. Frequentism gives no metric for what is "reasonable". Frequentism only and ever gives P(E|H): the probability that the evidence E would be obtained in the experiment if the hypothesis H is true.
But no one cares about this. What we care about is P(H|E), the probability that the hypothesis H is true given the evidence E that has been obtained. But, according to frequentism, P(H|E) does not exist, as a matter of principle.
This is an insurmountable barrier for frequentist statistics. But since the barrier must nevertheless be surmounted, all sorts of dodges are made (all illegal by frequentism's own rules), and this is why the subject is so damn hard to understand. Because, at a fundamental level, it does not make sense.
This would be of mere academic curiosity if lives were not at stake.
Interesting stuff Mark.
1) I imagine you are "correct" if you strictly follow the "rules" etc - however, the entire purpose of our post was to try to give people who are exposed to results and statistics presented in a frequentist way on a regular basis a way to more "correctly" interpret them. If we get rid of the words reasonable and likely - do you at least agree that "95% of the confidence intervals will include the true result"?
2) The example I gave was simply theoretical. Would you be willing to use the following real-life example and explain what you would say about the results in a way that clinicians might be able to use? The numbers come from the EMPA-REG trial https://pubmed.ncbi.nlm.nih.gov/26378978/. The abstract states "The primary outcome occurred in 490 of 4687 patients (10.5%) in the pooled empagliflozin group and in 282 of 2333 patients (12.1%) in the placebo group (hazard ratio in the empagliflozin group, 0.86; 95.02% confidence interval, 0.74 to 0.99; P=0.04)". Really look forward to hearing what you have to say. Thanks.
Here is a plot of the posterior distribution for the rate of the primary outcome (which I understand is BAD) in the drug group (blue) and the placebo group (orange), assuming a flat prior (the choice of prior makes very little difference because there is a lot of data):
https://i.postimg.cc/LXhDnkQm/trial.jpg
Now you could do some more fancy math and compute confidence intervals ("credible intervals" in Bayesian lingo) or whatever, but I think it's enough to just look at the picture. I sure would want the drug, wouldn't you?
But if a number is needed, I think the most relevant one is the probability that the true value of the primary outcome rate is lower with the drug than with the placebo. That probability is 0.978. Very convincing, IMO.
Of course the drug may have other possible bad consequences (including cost) that would complicate the decision, but that's not part of the trial as I understand it.
The formula for each curve is simple:
P(x) = C x^np (1-x)^(n-np)
where n is the total number in the group (blue 4687, orange 2333) and np is the number with the primary outcome (blue 490, orange 282). The constant C is chosen so that the total probability is one; C =(n+1)! / (np! (n-np)!) where the exclamation point denotes the factorial function.
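The 0.978 figure can be reproduced by Monte Carlo. With a flat prior, the formula above is a Beta(np+1, n-np+1) density for each group's event rate, so we can draw from both posteriors and count how often the drug-group rate falls below the placebo-group rate. A sketch using only the Python standard library:

```python
import random

random.seed(0)
DRAWS = 200_000

def posterior_draw(np_events, n):
    # flat prior + binomial likelihood -> Beta(np+1, n-np+1) posterior
    return random.betavariate(np_events + 1, n - np_events + 1)

# count draws where the drug group's event rate is below the placebo group's
wins = sum(
    posterior_draw(490, 4687) < posterior_draw(282, 2333)
    for _ in range(DRAWS)
)
print(wins / DRAWS)  # approximately 0.978
```

The fraction of draws in which the drug rate is lower is the posterior probability that the drug reduces the primary outcome rate, matching the 0.978 quoted above.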
Note that, if it's not already obvious, I am NOT a medical person: my expertise is in a different hard science. I can barely get through medical jargon at all.
Thanks so much for doing this.
Just so I understand, are you saying that the probability that the drug has an effect is 97.8%? If so that is great - but I think I already pretty much know that by using a frequentist approach, because we have ruled out chance - the p value is <0.05. However, what I really need to know is whether the benefit is large enough to take the drug every day for the next three years. Here is what I would do by looking at the confidence interval. I believe the relative benefit is somewhere between a 26% relative benefit (0.74) and a 1% relative benefit (0.99), and the observed relative benefit was 14% (0.86). So the absolute benefit seen in this trial was 12.1% minus 10.5% = 1.6% - so a 1.6% benefit and therefore 98.4% get no benefit - or approximately 60 people need to take this drug for three years for 1 to benefit. However, because we don't know the true effect, all I can say is that the effect is likely - sorry, I know Bayesians don't really like that word - somewhere between a 26% relative benefit and a 1% relative benefit. So the absolute benefit might be as large as ~3% or close to no benefit at all. Then I have to add in that the cost of the medication is about CA $1000 a year and 5-10% of people will get a genital infection because of the drug. Then I have to somehow explain this to a patient using percentages to help them make a shared decision.
So my main question is now: what additional clinically useful information could I get, and use in the decision-making process, by taking a Bayesian approach instead of a frequentist approach? And is it something I could easily do by looking at the results presented in the paper? Hope my approach and questions make sense. Thanks again.
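The back-of-envelope arithmetic in the comment above can be laid out explicitly. The risks come from the EMPA-REG abstract; the conversion of the hazard-ratio CI into a range of absolute benefits is the commenter's rough approximation, not a trial result.

```python
placebo_risk, drug_risk = 0.121, 0.105

arr = placebo_risk - drug_risk     # absolute risk reduction: 1.6%
nnt = 1 / arr                      # number needed to treat: ~62, i.e. roughly 60
print(round(arr, 3), round(nnt))

# crude range implied by applying the 95% CI on the hazard ratio
# (0.74 to 0.99) to the placebo group's absolute risk
arr_best = placebo_risk * (1 - 0.74)   # ~3.1% absolute benefit
arr_worst = placebo_risk * (1 - 0.99)  # ~0.1%, i.e. close to no benefit
print(round(arr_best, 3), round(arr_worst, 3))
```

This reproduces the "as large as ~3% or close to no benefit at all" range, with the caveat (raised by Frank below) that the 98.4%-get-no-benefit reading does not follow from these numbers.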
Watch out. "98.4% get no benefit" has nothing to do with the probabilities being considered here. To interpret things that way you'd need a 6-period randomized crossover study, which allows one to estimate benefit on a patient-by-patient basis. You can't get the fraction benefitting from a probability about a group effect.
Interesting. Unfortunately, the entire point is that in this case we have a single RCT that is suggesting, albeit using a frequentist approach, a benefit, and we will likely never have a 6-period randomized crossover study. And possibly not even another RCT. So we have to use the available data as best as possible to make a ballpark estimate of the benefit and then be able to present it in a way that makes sense. So if you had a person similar to the people studied in the trial, given the results, what ballpark absolute benefit would you tell them? The entire purpose of using the medication would be to reduce their risk of a bad outcome. The best answer I can give is roughly a 1-2% lower chance of having one of the primary outcomes over 3 years. Is there anything from a Bayesian perspective that can help make a better estimate of benefit? Thanks.
I mentioned the crossover study only in the context of estimating the proportion of patients who benefitted. You can ignore that for other purposes. Your use of the confidence interval in your previous reply doesn't cut it. The clinical question is: given an interval [a,b] what is the probability that the true treatment effect is in that interval. With a CI you give the probability and it derives the interval. Also, frequentist inference was developed by Fisher as a sequence of experiments with continually refined evidence against H0. What you have enunciated as the need to know what to do now is incompatible with the frequentist approach to some extent, and calls for Bayes. Bayes is about uncovering the data generating mechanism behind THIS study.
I appreciate all that. So if my use of the CI doesn't cut it, then can you, using the specific example of the trial I showed, tell me how a clinician and/or patient should interpret the specific findings from this trial? The only way for any individual to make a decision is to have an idea of the benefits and harms. So in this case, if the person was similar to the people enrolled in the trial, what could we tell them about the benefit of this medication on their risk of a CVD event? You say that Bayes can uncover the answer, so could you tell me the answer that is better than what I have done with the CI? Thanks.
By "not cutting it" I wasn't referring to you but to the general problem with CIs, besides the near impossibility of defining them. Clinicians have specific interests, e.g., what's the evidence that the effect is > 0? > 15%? The intervals for those are [0, infinity] and [15%, infinity]. To get the evidence for the unknown being in the interval you must use Bayes. The frequentist approach takes control of the interval endpoints after you define the compatibility probability. This is very non-clinical.
"what I really need to know is the benefit large enough to take the drug every day for the next three years."
That requires first quantifying the downside in some way that allows it to be meaningfully compared to the upside, eg, by assigning a dollar value to every potential outcome, good or bad. I don't see how your "relative benefit" and "absolute benefit" numbers are meaningful without that sort of quantification first.
As for the more basic point, is frequentist p<0.05 a good criterion? In high data situations, as we have here, yes, it will mimic the (fundamentally more sound) Bayesian posterior probability of there being an effect well enough not to matter. In situations with less data, I would not trust this to be the case.
Glad to hear you think in this case a simpler frequentist approach is giving us a reasonable answer. The person who has to make the decision about taking the medication is the individual person. Their risks are the inconvenience, costs and side effects to them. And they will never be able to know if they benefit, because we aren't making them feel better - we are just reducing their risk. I'm not sure how assigning a dollar value to each outcome is useful to an individual patient, especially as in Canada we have pretty good health insurance. As I asked Frank, can a Bayesian approach give me a better number to use than saying we can likely reduce your chance from ~12% down to 10%?
"can a Bayesian approach give me a better number to use than saying we can likely reduce your chance from ~12% down to 10%?"
No. All Bayes does for you here (a high-statistics study) is give you a more meaningful quantification of "likely" than you can get from frequentism. The probability that the drug reduces the chance of the primary outcome (by some amount) is 97.8%. The most likely reduction is, as you say, from ~12% to ~10%.
I agree that the patient should make the decision.
All medical professionals have been trained in frequentist methods, and almost none in Bayesian methods, and this is not going to change any time soon. This is a real shame, because Bayesian methods, once learned, are so much more intuitive. But for now all of you have to learn frequentist methods, because that's what's used in every paper you read.
I became a Bayesian 40 years ago when a standard frequentist analysis of some low-quality data was giving me a nonsensical result, that some signal that could not possibly be negative was negative with some decent confidence. But I actually had 100% confidence that it was not negative! How could I put that into the analysis? The answer is a Bayesian prior. This is the sort of situation where Bayesian methods give better results. I would think that medicine has a lot of situations where there are no high-statistics studies at all, and yet doctors have patients who need advice. Bayesian methods would result in better advice in these cases, so I hope they eventually become more common.
Excellent - so it seems that a frequentist approach to looking at clinical trials is at least a reasonable approach when it comes to using clinical trial data and making decisions in patient care. Thanks.
As in "the evidence is compatible with the hypothesis".
E can be compatible with multiple hypotheses simultaneously, including hypotheses that would explain some or all of the result by bias/confounding, reverse causation, fraud, or randomness.
P(H|E) is calculable if one first grants that it is meaningful; that is whole idea behind Bayesian analysis. Then we use Bayes' Theorem (which has a one-line derivation from the axioms of probability):
P(H|E) = P(E|H)P(H)/P(E)
Here P(E) is an irrelevant normalization constant. The rub is P(H), the "prior". We have to start with a notion of how likely each of our potential hypotheses is BEFORE we get the evidence.
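A tiny numerical illustration of the theorem, with hypothetical numbers mirroring the diagnostic-test setting discussed later in this thread: a 10% pre-test probability of disease, a test with 90% sensitivity and 95% specificity, and a positive result.

```python
prior = 0.10   # P(H): pre-test probability of disease (hypothetical)
sens = 0.90    # P(E|H): probability of a positive test if diseased
fpr = 0.05     # P(E|not H): false positive rate, 1 - specificity

p_e = sens * prior + fpr * (1 - prior)  # P(E), the normalization constant
posterior = sens * prior / p_e          # P(H|E) by Bayes' theorem
print(round(posterior, 3))              # 0.667
```

The positive test moves the probability of disease from 10% to about 67% — and the answer is only as good as the prior that went in, which is exactly the point about priors being up front where they can be examined.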
Frequentists HATE this. They really really don't want to assign priors to their hypotheses.
But they end up doing it anyway, stealthily. As soon as you say anything about a "true value" being "likely", you have snuck in an illegal (for a frequentist) prior. The main virtue of Bayesianism is that the priors are not concealed. Rather, they are up front where they can be examined (and varied, to see what the effects on the posterior are).
Hi Dr. Harrell. I know too little about Bayes (and stats in general) to ask this effectively.
I use Bayes implicitly when ordering any diagnostic test, as I must have a pre-test likelihood of disease in order for any test result to inform my post-test likelihood, and hopefully affect my downstream management decisions. But my pre-test seems entirely subjective, informed by formal teaching (“textbooks”) as well as clinical experience. Another clinician may have a different prior for the exact same patient. How does one deal with such differences in prior probabilities?
And for trials of therapeutics (esp “new” agents or first in class therapies), how does one even arrive at an informed prior probability? Thanks.
There are many good answers to that question, which I've dealt with at https://hbiostat.org/bayes/bet . Briefly, we always know something, and classical statistics does not even make use of such minimal knowledge, e.g., that a treatment is incremental rather than curative. An incremental therapy may entail using a prior for an odds ratio, for example, such that the probability that the odds ratio is >4 or <1/4 is 0.05. In some cases we have actual trustworthy data on which to base a prior. In a majority of cases a reasonable sample size makes the prior much less relevant. Having a prior is the price of being able to make probability statements about the unknown of true interest. Just as with medical diagnosis.
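The skeptical prior described above can be made concrete. Assuming (my reading, not stated explicitly) a normal prior on the log odds ratio centered at 0, with 0.025 probability in each tail beyond OR = 4 and OR = 1/4, the prior SD follows directly:

```python
import math

# Normal prior on the log odds ratio, centered at 0 (no effect).
# Requirement: P(OR > 4 or OR < 1/4) = 0.05, i.e. 0.025 in each tail.
z_975 = 1.959964            # 97.5th percentile of the standard normal
sd = math.log(4) / z_975
print(round(sd, 3))         # about 0.707

# sanity check: two-tailed probability beyond log(4) under Normal(0, sd)
tail = 0.5 * math.erfc(math.log(4) / (sd * math.sqrt(2)))
print(round(2 * tail, 3))   # 0.05
```

This kind of prior concedes that large effects are possible but unlikely, which encodes the "incremental rather than curative" knowledge without needing any trial-specific data.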
There is much, much good in this article. The authors started out with great pains to interpret a confidence interval exactly correctly. Then they made a mistake:
"So, with this single poll, all we can say is the true result is likely somewhere between 37%
and 43% but we will be wrong with that statement 5% of the time."
No. Both parts of this sentence are incorrect. In frequentist statistics the true value is either in or outside the interval; there is no probability attached to this. The probability statement does not apply to 0.37 and 0.43 but to the process that generated this interval.
The extreme difficulty in interpreting confidence intervals should drive more people to Bayes, as described in my Bayesian journey at https://fharrell.com/post/journey.
Later the authors say
"Inferential statistics actually do NOT help us test a research hypothesis about whether an intervention worked or not. They assume the observed difference was solely due to chance and simply give us an estimate of the probability of such an occurrence over many potential repetitions of the study."
This is incorrect, as the statement applies only to classical frequentist inferential statistics. Any article on statistics that doesn't acknowledge the existence of Bayes is problematic.
Now take a look at
"No statistics can tell us if the medication worked or if the differences seen were clinically important. These decisions are clinical judgments--not statistical judgements. The ONLY reason we do inferential statistics is to singularly deal with the issue of chance. This concept is key to understanding inferential statistics."
That is false as again it applies only to classical frequentist statistics. With Bayesian posterior probabilities you are not needing to deal with "chance" in the sense above, and you obtain direct evidence measures such as the probability the treatment has any effectiveness and the probability of clinically meaningful effectiveness. And Bayesian uncertainty intervals are so much easier to interpret than confidence intervals.
An article about statistics should be exactly correct to not mislead readers, and researchers should stop pretending that the p-value/confidence limit form of interence is the only form that exists. Otherwise, new confusions will arise.
Frank - thanks for your response – very much appreciated. If I was asked before I posted this article, what is the probability someone will bring up the issue of a Bayesian approach to statistics, I would have guessed that probability to be 110% 🤣
You are correct we were just commenting from a frequentist’s perspective – we should have acknowledged that and we certainly didn’t intend to suggest a Bayes approach did not exist. Maybe we could collaborate in the future on a post to help people simply understand and contextualize a Bayesian approach to statistics – I think that could be very valuable.
It appears the main issue you had was with our phrase “So, with this single poll, all we can say is the true result is likely somewhere between 37% and 43% but we will be wrong with that statement 5% of the time”. We did not intend the word “likely” to suggest a specific probability – although I understand why you might think that. What we wanted to get across was simply a way a reader could possibly start to think about what information a single point estimate and a single confidence interval might provide. So, we were trying to make it “as simply as possible but not simpler”. There is always a fine balance when one does that.
The issue here is a common one. We are using probability in our article in a frequentist sense which is still the most common approach used in clinical research. We actually agree with the key elements of your comments. Clinicians normally have to make an estimate of the probability of a patient's disease before deciding on diagnosis and treatment. Frequentist probabilities do not apply here as there is no such thing as a single case frequentist probability. Our article is actually about trying to explain this with regard to p values and CI's. The only way to deal with this issue would be to explain in detail to people the difference between frequentist and Bayesian probabilities, which is beyond the scope of this article.
Nonetheless, if 95% of the generated CIs will contain the true result, then 5% will not contain the true result. So, is it not reasonable to say it is likely that the one CI we have contains the true result given that 95% of them do BUT as we said we will be wrong with that statement with 5% of the CIs we see?
In essence, saying that 95% CI's will contain the true value 95% of the time, is another way of saying you can have reasonable confidence in this range. Frequentist statisticians use the term confidence rather than a degree of belief because single case probabilities do not have meaning to them.
We do state quite clearly elsewhere in our post that “Inferential statistics don’t give us a probability” and that “Knowing this nuance is key to understanding statistics”
A REQUEST
To help our readers understand the difference between a frequentist approach and a Bayesian approach I have a question as to how you think might be the best way to simply interpret the result of a single trial.
THE SCENARIO
1) Let’s say there was only a single placebo-controlled trial of a new medication to see if it would reduce the risk of a heart attack. A single trial is not infrequently all we have in medicine when it comes to answering a clinical question.
2) Let’s assume the trial was well done and the findings were reported as a point estimate of 0.8 with a 95% confidence interval of 0.7-0.9 with the absolute risks of a heart attack in the placebo group being 10% and the risk in the new medication group being 8%.
How would you interpret this scenario? Thanks again for your response and interest.
Thanks very much for the nice reply James. I understand. Yes we should do something in the future about Bayes. I think this goes a long way towards both approaches: https://discourse.datamethods.org/t/language-for-communicating-frequentist-results-about-treatment-effects https://www.fharrell.com/post/bayes-freq-stmts/ https://discourse.datamethods.org/t/bayesian-vs-frequentist-statements-about-treatment-efficacy . The first link has my proposed interpretations of trials like the one you described just now . In terms of confidence intervals, Sander Greenland has published much related work and shows advantages of using the term 'compatibility interval'. For now the important points are (1) the frequentist approach ties to be simple but it does so by not giving you what you want and (2) the probability to attach to compatibility intervals is the long-term probability that the PROCESS used to generate the interval covers the true unknown treatment effect. The probability is not attached to a single realization of that process.
We need the actual numbers of patients in each group to do a proper Bayesian analysis. It cannot be backed out with just the info you have given.
"So, is it not reasonable to say it is likely that the one CI we have contains the true result given that 95% of them do BUT as we said we will be wrong with that statement with 5% of the CIs we see?"
No, this is an illegal inference by frequentist rules. Frequentism gives no metric for what is "reasonable". Frequentism only and ever gives P(E|H): the probability that the evidence E would be obtained in the experiment if the hypothesis H is true.
But no one cares about this. What we care about is P(H|E), the probability that the hypothesis H is true given the evidence E that has been obtained. But, according to frequentism, P(H|E) does not exist, as a matter of principle.
This is an insurmountable barrier for frequentist statistics. But since the barrier must nevertheless be surmounted, all sorts of dodges are made (all illegal by frequentism's own rules), and this is why the subject is so damn hard to understand. Because, at a fundamental level, it does not make sense.
This would be of mere academic curiosity if lives were not at stake.
Interesting stuff Mark.
1) I imagine you are "correct" if you strictly follow the "rules" etc - however, the entire purpose of our post was to try to give people who get exposed to results and statistics presented in a frequentist way on a regular basis a way to more "correctly" interpret them. If we get rid of the words reasonable and likely - do you at least agree that "95% of the confidence intervals will include the true result"?
2) The example I gave was simply theoretical. Would you be willing to use the following real-life example and explain what you say about the results in a way that clinicians might be able to use? The numbers come from the EMPA-REG trial https://pubmed.ncbi.nlm.nih.gov/26378978/. The abstract states "The primary outcome occurred in 490 of 4687 patients (10.5%) in the pooled empagliflozin group and in 282 of 2333 patients (12.1%) in the placebo group (hazard ratio in the empagliflozin group, 0.86; 95.02% confidence interval, 0.74 to 0.99; P=0.04" . Really look forward to hearing what you have to say. Thanks.
Here is a plot of the posterior distribution for the rate of the primary outcome (which I understand is BAD) in the drug group (blue) and the placebo group (orange), assuming a flat prior (choice of prior makes very little difference because there is a lot of data):
https://i.postimg.cc/LXhDnkQm/trial.jpg
Now you could do some more fancy math and compute confidence intervals ("credible intervals" in Bayesian lingo) or whatever, but I think it's enough to just look at the picture. I sure would want the drug, wouldn't you?
But if a number is needed, I think the most relevant one is the probability that the true value of the primary outcome rate is lower with the drug than with the placebo. That probability is 0.978. Very convincing, IMO.
Of course the drug may have other possible bad consequences (including cost) that would complicate the decision, but that's not part of the trial as I understand it.
The formula for each curve is simple:
P(x) = C x^np (1-x)^(n-np)
where n is the total number in the group (blue 4687, orange 2333) and np is the number with the primary outcome (blue 490, orange 282). The constant C is chosen so that the total probability is one; C =(n+1)! / (np! (n-np)!) where the exclamation point denotes the factorial function.
Note that, if it's not already obvious, I am NOT a medical person: my expertise is in a different hard science. I can barely get through medical jargon at all.
Thanks so much for doing this.
Just so I understand are you saying that the probability that the drug has an effect is 97.8%. If so that is great - but I think I already pretty much know that by using a frequentists approach, because we have ruled out chance - the p value is <0.05. However what I really need to know is the benefit large enough to take the drug every day for the next three years. Here is what I would do by looking at the confidence interval. I believe the relative benefit is somewhere between a 26% relative benefit (0.74) and a 1% relative benefit (0.99) and the observed relative benefit was 14% (0.86). So the absolute benefit seen in this trial was 12.1% minus 10.5% = 1.6% - so a 1.6% benefit and therefore 98.4% get no benefit - or approximately 60 people need to take this drug for three years for 1 to benefit. However, because we don't know the true effect all I can say is that the effect is likely - sorry I know Bayesians don't really like that word - somewhere as large as a 26% relative benefit or as small as 1%. So the absolute benefit might be as large as ~3% or close to no benefit at all. Then I have to add in that the cost of the medication is about CA $1000 a year and 5-10% of people will get a genital infection because of the drug. Then I have to somehow explain this to a patient using percentages to help them make a shared decision.
So my main question is now: what additional clinically useful information could I get, to use in the decision-making process, by using a Bayesian approach instead of a frequentist approach? And is it something I could easily do by looking at the results presented in the paper?
Hope my approach and questions make sense.
Thanks again.
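[Editor's note: the absolute-benefit arithmetic in the comment above can be checked in a few lines; the event counts are the trial figures quoted earlier in the thread, and the labels are mine.]

```python
# Event rates from the quoted trial counts
control_rate = 282 / 2333    # ≈ 12.1%
treatment_rate = 490 / 4687  # ≈ 10.5%

# Absolute risk reduction and number needed to treat over the trial period
arr = control_rate - treatment_rate  # ≈ 1.6%
nnt = 1 / arr                        # ≈ 61, close to the "approximately 60" above

print(f"ARR = {arr:.1%}, NNT ≈ {nnt:.0f}")
```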
Watch out: "98.4% get no benefit" has nothing to do with the probabilities being considered here. To interpret things that way you'd need a 6-period randomized crossover study, which allows one to estimate benefit on a patient-by-patient basis. You can't get the fraction benefitting from a probability about a group effect.
Interesting. Unfortunately, the entire point is that in this case we have a single RCT that is suggesting a benefit, albeit using a frequentist approach, and we will likely never have a 6-period randomized crossover study, possibly not even another RCT. So we have to use the available data as best we can to make a ballpark estimate of the benefit, and then be able to present it in a way that makes sense. So if you had a person similar to the people studied in the trial, given the results, what ballpark absolute benefit would you tell them? The entire purpose of using the medication would be to reduce their risk of a bad outcome. The best answer I can give is roughly a 1-2% lower chance of having one of the primary outcomes over 3 years. Is there anything from a Bayesian perspective that can help make a better estimate of benefit? Thanks.
I mentioned the crossover study only in the context of estimating the proportion of patients who benefitted; you can ignore that for other purposes. Your use of the confidence interval in your previous reply doesn't cut it. The clinical question is: given an interval [a,b], what is the probability that the true treatment effect is in that interval? With a CI you give the probability and it derives the interval. Also, frequentist inference was developed by Fisher as a sequence of experiments with continually refined evidence against H0. The need you have enunciated, to know what to do now, is to some extent incompatible with the frequentist approach, and calls for Bayes. Bayes is about uncovering the data generating mechanism behind THIS study.
I appreciate all that. So if my use of the CI doesn't cut it, can you, using the specific example of the trial I showed, tell me how a clinician and/or patient should interpret its findings? The only way for any individual to make a decision is to have an idea of the benefits and harms. So in this case, if the person were similar to the people enrolled in the trial, what could we tell them about the benefit of this medication on their risk of a CVD event? You say that Bayes can uncover the answer, so could you tell me the answer that is better than what I have done with the CI? Thanks.
By "not cutting it" I wasn't referring to you but to the general problem with CIs, besides the near impossibility of defining them. Clinicians have specific interests, e.g., what's the evidence that the effect is > 0? > 15%? The intervals for those are [0, infinity] and [15%, infinity]. To get the evidence for the unknown being in such an interval you must use Bayes. The frequentist approach takes control of the interval endpoints after you define the compatibility probability. This is very non-clinical.
"what I really need to know is the benefit large enough to take the drug every day for the next three years."
That requires first quantifying the downside in some way that allows it to be meaningfully compared to the upside, e.g., by assigning a dollar value to every potential outcome, good or bad. I don't see how your "relative benefit" and "absolute benefit" numbers are meaningful without that sort of quantification first.
As for the more basic point: is frequentist p<0.05 a good criterion? In high-data situations, as we have here, yes; it will mimic the (fundamentally more sound) Bayesian posterior probability of there being an effect well enough not to matter. In situations with less data, I would not trust this to be the case.
Glad to hear you think that in this case a simpler frequentist approach is giving us a reasonable answer. The person who has to make the decision about taking the medication is the individual. Their risks are the inconvenience, costs, and side effects to them. And they will never be able to know if they benefit, because we aren't making them feel better; we are just reducing their risk. I'm not sure how assigning a dollar value to each outcome is useful to an individual patient, especially as in Canada we have pretty good health insurance. As I asked Frank: can a Bayesian approach give me a better number to use than saying we can likely reduce your chance from ~12% down to 10%?
"can a Bayesian approach give me a better number to use than saying we can likely reduce your chance from ~12% down to 10%?"
No. All Bayes does for you here (a high-statistics study) is give you a more meaningful quantification of "likely" than you can get from frequentism. The probability that the drug reduces the chance of the primary outcome (by some amount) is 97.8%. The most likely reduction is, as you say, from ~12% to ~10%.
I agree that the patient should make the decision.
All medical professionals have been trained in frequentist methods, and almost none in Bayesian methods, and this is not going to change any time soon. This is a real shame, because Bayesian methods, once learned, are so much more intuitive. But for now all of you have to learn frequentist methods, because that's what's used in every paper you read.
I became a Bayesian 40 years ago when a standard frequentist analysis of some low-quality data was giving me a nonsensical result, that some signal that could not possibly be negative was negative with some decent confidence. But I actually had 100% confidence that it was not negative! How could I put that into the analysis? The answer is a Bayesian prior. This is the sort of situation where Bayesian methods give better results. I would think that medicine has a lot of situations where there are no high-statistics studies at all, and yet doctors have patients who need advice. Bayesian methods would result in better advice in these cases, so I hope they eventually become more common.
Excellent - so it seems that a frequentist approach to looking at clinical trials is at least a reasonable approach when it comes to using clinical trial data and making decisions in patient care. Thanks.
Following this thread with interest. As Mark described:
P(E|H) = "the probability that the evidence E would be obtained in the experiment if the hypothesis H is true."
P(H|E) = "the probability that the hypothesis H is true given the evidence E that has been obtained."
The latter term is said to be incalculable. The former term brings to mind the phrase "compatibility interval":
https://www.bmj.com/content/366/bmj.l5381
As in "the evidence is compatible with the hypothesis".
E can be compatible with multiple hypotheses simultaneously, including hypotheses that would explain some or all of the result by bias/confounding, reverse causation, fraud, or randomness.
Such is my pleb understanding.
P(H|E) is calculable if one first grants that it is meaningful; that is the whole idea behind Bayesian analysis. Then we use Bayes' Theorem (which has a one-line derivation from the axioms of probability):
P(H|E) = P(E|H)P(H)/P(E)
Here P(E) is an irrelevant normalization constant. The rub is P(H), the "prior". We have to start with a notion of how likely each of our potential hypotheses is BEFORE we get the evidence.
Frequentists HATE this. They really really don't want to assign priors to their hypotheses.
But they end up doing it anyway, stealthily. As soon as you say anything about a "true value" being "likely", you have snuck in an illegal (for a frequentist) prior. The main virtue of Bayesianism is that the priors are not concealed. Rather, they are up front where they can be examined (and varied, to see what the effects on the posterior are).
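[Editor's note: the Bayes' Theorem formula above can be illustrated with a toy two-hypothesis calculation; all the numbers below are made up for illustration.]

```python
# Two competing hypotheses with prior probabilities (made-up numbers)
prior_H = 0.5     # P(H): hypothesis true before seeing evidence
prior_notH = 0.5  # P(not H)

# Likelihood of the observed evidence under each hypothesis (made up)
p_E_given_H = 0.8     # P(E | H)
p_E_given_notH = 0.2  # P(E | not H)

# P(E), the normalization constant from the formula above
p_E = p_E_given_H * prior_H + p_E_given_notH * prior_notH

# Bayes' Theorem: P(H|E) = P(E|H) P(H) / P(E)
posterior_H = p_E_given_H * prior_H / p_E
print(posterior_H)  # 0.8
```

Varying `prior_H` and rerunning shows exactly how the (explicit, examinable) prior moves the posterior.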
Hi Dr. Harrell. I know too little about Bayes (and stats in general) to ask this effectively.
I use Bayes implicitly when ordering any diagnostic test, as I must have a pre-test likelihood of disease in order for any test result to inform my post-test likelihood, and hopefully affect my downstream management decisions. But my pre-test seems entirely subjective, informed by formal teaching (“textbooks”) as well as clinical experience. Another clinician may have a different prior for the exact same patient. How does one deal with such differences in prior probabilities?
And for trials of therapeutics (esp “new” agents or first in class therapies), how does one even arrive at an informed prior probability? Thanks.
There are many good answers to that question, which I've dealt with at https://hbiostat.org/bayes/bet . Briefly, we always know something, and classical statistics does not make use of even such minimal knowledge, e.g., that a treatment is incremental rather than curative. An incremental therapy may entail using a prior for an odds ratio, for example, such that the probability that the odds ratio is >4 or <1/4 is 0.05. In some cases we have actual trustworthy data on which to base a prior. In a majority of cases a reasonable sample size makes the prior much less relevant. Having a prior is the price of being able to make probability statements about the unknown of true interest, just as with medical diagnosis.
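[Editor's note: the diagnostic-test analogy raised in the question above is exactly a Bayes' Theorem calculation. A sketch with hypothetical test characteristics; the sensitivity, specificity, and the two pre-test probabilities are all assumed values, chosen only to show how two clinicians' different priors yield different post-test probabilities for the same positive result.]

```python
def post_test_probability(pre_test: float, sensitivity: float,
                          specificity: float) -> float:
    """Probability of disease given a positive test, via Bayes' Theorem."""
    p_pos_given_disease = sensitivity
    p_pos_given_healthy = 1 - specificity
    # Total probability of a positive test (the normalization constant)
    p_pos = (p_pos_given_disease * pre_test
             + p_pos_given_healthy * (1 - pre_test))
    return p_pos_given_disease * pre_test / p_pos

# Two clinicians with different priors for the same patient (assumed values)
for pre_test in (0.10, 0.30):
    post = post_test_probability(pre_test, sensitivity=0.90, specificity=0.85)
    print(f"pre-test {pre_test:.0%} -> post-test {post:.0%}")
```

The same positive test moves a 10% prior to 40% and a 30% prior to 72%, which is why differing priors matter and why examining them openly, rather than leaving them implicit, is the point.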