There is much, much good in this article. The authors started out with great pains to interpret a confidence interval exactly correctly. Then they made a mistake:
"So, with this single poll, all we can say is the true result is likely somewhere between 37%
and 43% but we will be wrong with that statement 5% of the time."
No. Both parts of this sentence are incorrect. In frequentist statistics the true value is either in or outside the interval; there is no probability attached to this. The probability statement does not apply to 0.37 and 0.43 but to the process that generated this interval.
The extreme difficulty in interpreting confidence intervals should drive more people to Bayes, as described in my Bayesian journey at https://fharrell.com/post/journey.
Later the authors say
"Inferential statistics actually do NOT help us test a research hypothesis about whether an intervention worked or not. They assume the observed difference was solely due to chance and simply give us an estimate of the probability of such an occurrence over many potential repetitions of the study."
This is incorrect, as the statement applies only to classical frequentist inferential statistics. Any article on statistics that doesn't acknowledge the existence of Bayes is problematic.
Now take a look at
"No statistics can tell us if the medication worked or if the differences seen were clinically important. These decisions are clinical judgments--not statistical judgements. The ONLY reason we do inferential statistics is to singularly deal with the issue of chance. This concept is key to understanding inferential statistics."
That is false as again it applies only to classical frequentist statistics. With Bayesian posterior probabilities you are not needing to deal with "chance" in the sense above, and you obtain direct evidence measures such as the probability the treatment has any effectiveness and the probability of clinically meaningful effectiveness. And Bayesian uncertainty intervals are so much easier to interpret than confidence intervals.
An article about statistics should be exactly correct so as not to mislead readers, and researchers should stop pretending that the p-value/confidence limit form of inference is the only form that exists. Otherwise, new confusions will arise.
Frank - thanks for your response – very much appreciated. If I had been asked, before I posted this article, what the probability was that someone would bring up the issue of a Bayesian approach to statistics, I would have guessed that probability to be 110% 🤣
You are correct we were just commenting from a frequentist’s perspective – we should have acknowledged that and we certainly didn’t intend to suggest a Bayes approach did not exist. Maybe we could collaborate in the future on a post to help people simply understand and contextualize a Bayesian approach to statistics – I think that could be very valuable.
It appears the main issue you had was with our phrase “So, with this single poll, all we can say is the true result is likely somewhere between 37% and 43% but we will be wrong with that statement 5% of the time”. We did not intend the word “likely” to suggest a specific probability – although I understand why you might think that. What we wanted to get across was simply a way a reader could possibly start to think about what information a single point estimate and a single confidence interval might provide. So, we were trying to make it “as simply as possible but not simpler”. There is always a fine balance when one does that.
The issue here is a common one. We are using probability in our article in a frequentist sense which is still the most common approach used in clinical research. We actually agree with the key elements of your comments. Clinicians normally have to make an estimate of the probability of a patient's disease before deciding on diagnosis and treatment. Frequentist probabilities do not apply here as there is no such thing as a single case frequentist probability. Our article is actually about trying to explain this with regard to p values and CI's. The only way to deal with this issue would be to explain in detail to people the difference between frequentist and Bayesian probabilities, which is beyond the scope of this article.
Nonetheless, if 95% of the generated CIs will contain the true result, then 5% will not contain the true result. So, is it not reasonable to say it is likely that the one CI we have contains the true result given that 95% of them do BUT as we said we will be wrong with that statement with 5% of the CIs we see?
In essence, saying that 95% CI's will contain the true value 95% of the time, is another way of saying you can have reasonable confidence in this range. Frequentist statisticians use the term confidence rather than a degree of belief because single case probabilities do not have meaning to them.
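The long-run coverage claim above can be checked with a quick simulation. This is a minimal sketch (not from the original article), assuming a true proportion of 40% and simple normal-approximation intervals: generate many polls, build a 95% CI from each, and count how often the interval contains the truth.

```python
import math
import random

random.seed(1)
TRUE_P, N, POLLS = 0.40, 1000, 5000

covered = 0
for _ in range(POLLS):
    # simulate one poll of N respondents
    x = sum(random.random() < TRUE_P for _ in range(N))
    p_hat = x / N
    # normal-approximation 95% confidence interval
    half = 1.96 * math.sqrt(p_hat * (1 - p_hat) / N)
    if p_hat - half <= TRUE_P <= p_hat + half:
        covered += 1

print(covered / POLLS)  # close to 0.95
```

The probability attaches to the procedure: about 95% of the intervals generated this way cover the true value, but any single interval either does or does not.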
We do state quite clearly elsewhere in our post that “Inferential statistics don’t give us a probability” and that “Knowing this nuance is key to understanding statistics”
A REQUEST
To help our readers understand the difference between a frequentist approach and a Bayesian approach, I have a question about what you think might be the best way to simply interpret the result of a single trial.
THE SCENARIO
1) Let’s say there was only a single placebo-controlled trial of a new medication to see if it would reduce the risk of a heart attack. A single trial is not infrequently all we have in medicine when it comes to answering a clinical question.
2) Let’s assume the trial was well done and the findings were reported as a point estimate of 0.8 with a 95% confidence interval of 0.7-0.9 with the absolute risks of a heart attack in the placebo group being 10% and the risk in the new medication group being 8%.
How would you interpret this scenario? Thanks again for your response and interest.
Thanks very much for the nice reply James. I understand. Yes, we should do something in the future about Bayes. I think this goes a long way towards both approaches: https://discourse.datamethods.org/t/language-for-communicating-frequentist-results-about-treatment-effects https://www.fharrell.com/post/bayes-freq-stmts/ https://discourse.datamethods.org/t/bayesian-vs-frequentist-statements-about-treatment-efficacy . The first link has my proposed interpretations of trials like the one you just described. In terms of confidence intervals, Sander Greenland has published much related work and shows the advantages of using the term 'compatibility interval'. For now the important points are (1) the frequentist approach tries to be simple but it does so by not giving you what you want, and (2) the probability to attach to compatibility intervals is the long-run probability that the PROCESS used to generate the interval covers the true unknown treatment effect. The probability is not attached to a single realization of that process.
We need the actual numbers of patients in each group to do a proper Bayesian analysis. It cannot be backed out with just the info you have given.
"So, is it not reasonable to say it is likely that the one CI we have contains the true result given that 95% of them do BUT as we said we will be wrong with that statement with 5% of the CIs we see?"
No, this is an illegal inference by frequentist rules. Frequentism gives no metric for what is "reasonable". Frequentism only and ever gives P(E|H): the probability that the evidence E would be obtained in the experiment if the hypothesis H is true.
But no one cares about this. What we care about is P(H|E), the probability that the hypothesis H is true given the evidence E that has been obtained. But, according to frequentism, P(H|E) does not exist, as a matter of principle.
This is an insurmountable barrier for frequentist statistics. But since the barrier must nevertheless be surmounted, all sorts of dodges are made (all illegal by frequentism's own rules), and this is why the subject is so damn hard to understand. Because, at a fundamental level, it does not make sense.
This would be of mere academic curiosity if lives were not at stake.
Interesting stuff Mark.
1) I imagine you are "correct" if you strictly follow the "rules" etc - however, the entire purpose of our post was to try to give people who are exposed to results and statistics presented in a frequentist way on a regular basis a way to more "correctly" interpret them. If we get rid of the words reasonable and likely - do you at least agree that "95% of the confidence intervals will include the true result"?
2) The example I gave was simply theoretical. Would you be willing to use the following real-life example and explain what you would say about the results in a way that clinicians might be able to use? The numbers come from the EMPA-REG trial https://pubmed.ncbi.nlm.nih.gov/26378978/. The abstract states "The primary outcome occurred in 490 of 4687 patients (10.5%) in the pooled empagliflozin group and in 282 of 2333 patients (12.1%) in the placebo group (hazard ratio in the empagliflozin group, 0.86; 95.02% confidence interval, 0.74 to 0.99; P=0.04)". Really look forward to hearing what you have to say. Thanks.
Here is a plot of the posterior distribution for the rate of the primary outcome (which I understand is BAD) in the drug group (blue) and the placebo group (orange), assuming a flat prior (the choice of prior makes very little difference because there is a lot of data):
https://i.postimg.cc/LXhDnkQm/trial.jpg
Now you could do some more fancy math and compute confidence intervals ("credible intervals" in Bayesian lingo) or whatever, but I think it's enough to just look at the picture. I sure would want the drug, wouldn't you?
But if a number is needed, I think the most relevant one is the probability that the true value of the primary outcome rate is lower with the drug than with the placebo. That probability is 0.978. Very convincing, IMO.
Of course the drug may have other possible bad consequences (including cost) that would complicate the decision, but that's not part of the trial as I understand it.
The formula for each curve is simple:
P(x) = C x^np (1-x)^(n-np)
where n is the total number in the group (blue 4687, orange 2333) and np is the number with the primary outcome (blue 490, orange 282). The constant C is chosen so that the total probability is one; C =(n+1)! / (np! (n-np)!) where the exclamation point denotes the factorial function.
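The 0.978 figure can be reproduced by Monte Carlo. With a flat prior, the formula above is a Beta(np+1, n-np+1) density for each group's event rate, so we can draw from both posteriors and count how often the drug-group rate falls below the placebo-group rate. A sketch using only the Python standard library:

```python
import random

random.seed(0)
DRAWS = 200_000

def posterior_draw(np_events, n):
    # flat prior + binomial likelihood -> Beta(np+1, n-np+1) posterior
    return random.betavariate(np_events + 1, n - np_events + 1)

# count draws where the drug group's event rate is below the placebo group's
wins = sum(
    posterior_draw(490, 4687) < posterior_draw(282, 2333)
    for _ in range(DRAWS)
)
print(wins / DRAWS)  # approximately 0.978
```

The fraction of draws in which the drug rate is lower is the posterior probability that the drug reduces the primary outcome rate, matching the 0.978 quoted above.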
Note that, if it's not already obvious, I am NOT a medical person: my expertise is in a different hard science. I can barely get through medical jargon at all.
Thanks so much for doing this.
Just so I understand, are you saying that the probability that the drug has an effect is 97.8%? If so that is great - but I think I already pretty much know that by using a frequentist approach, because we have ruled out chance - the p value is <0.05. However, what I really need to know is whether the benefit is large enough to take the drug every day for the next three years. Here is what I would do by looking at the confidence interval. I believe the relative benefit is somewhere between a 26% relative benefit (0.74) and a 1% relative benefit (0.99), and the observed relative benefit was 14% (0.86). So the absolute benefit seen in this trial was 12.1% minus 10.5% = 1.6% - so a 1.6% benefit and therefore 98.4% get no benefit - or approximately 60 people need to take this drug for three years for 1 to benefit. However, because we don't know the true effect, all I can say is that the effect is likely - sorry, I know Bayesians don't really like that word - somewhere between a 26% relative benefit and a 1% relative benefit. So the absolute benefit might be as large as ~3% or close to no benefit at all. Then I have to add in that the cost of the medication is about CA $1000 a year and 5-10% of people will get a genital infection because of the drug. Then I have to somehow explain this to a patient using percentages to help them make a shared decision.
So my main question is now: what additional clinically useful information could I get, and use in the decision-making process, by taking a Bayesian approach instead of a frequentist approach? And is it something I could easily do by looking at the results presented in the paper? Hope my approach and questions make sense. Thanks again.
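The back-of-envelope arithmetic in the comment above can be laid out explicitly. The risks come from the EMPA-REG abstract; the conversion of the hazard-ratio CI into a range of absolute benefits is the commenter's rough approximation, not a trial result.

```python
placebo_risk, drug_risk = 0.121, 0.105

arr = placebo_risk - drug_risk     # absolute risk reduction: 1.6%
nnt = 1 / arr                      # number needed to treat: ~62, i.e. roughly 60
print(round(arr, 3), round(nnt))

# crude range implied by applying the 95% CI on the hazard ratio
# (0.74 to 0.99) to the placebo group's absolute risk
arr_best = placebo_risk * (1 - 0.74)   # ~3.1% absolute benefit
arr_worst = placebo_risk * (1 - 0.99)  # ~0.1%, i.e. close to no benefit
print(round(arr_best, 3), round(arr_worst, 3))
```

This reproduces the "as large as ~3% or close to no benefit at all" range, with the caveat (raised by Frank below) that the 98.4%-get-no-benefit reading does not follow from these numbers.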
Watch out. "98.4% get no benefit" has nothing to do with the probabilities being considered here. To interpret things that way you'd need a 6-period randomized crossover study, which allows one to estimate benefit on a patient-by-patient basis. You can't get the fraction benefitting from a probability about a group effect.
Interesting. Unfortunately, the entire point is that in this case we have a single RCT that is suggesting, albeit using a frequentist approach, a benefit, and we will likely never have a 6-period randomized crossover study. And possibly not even another RCT. So we have to use the available data as best as possible to make a ballpark estimate of the benefit and then be able to present it in a way that makes sense. So if you had a person similar to the people studied in the trial, given the results, what ballpark absolute benefit would you tell them? The entire purpose of using the medication would be to reduce their risk of a bad outcome. The best answer I can give is roughly a 1-2% lower chance of having one of the primary outcomes over 3 years. Is there anything from a Bayesian perspective that can help make a better estimate of benefit? Thanks.
I mentioned the crossover study only in the context of estimating the proportion of patients who benefitted. You can ignore that for other purposes. Your use of the confidence interval in your previous reply doesn't cut it. The clinical question is: given an interval [a,b] what is the probability that the true treatment effect is in that interval. With a CI you give the probability and it derives the interval. Also, frequentist inference was developed by Fisher as a sequence of experiments with continually refined evidence against H0. What you have enunciated as the need to know what to do now is incompatible with the frequentist approach to some extent, and calls for Bayes. Bayes is about uncovering the data generating mechanism behind THIS study.
I appreciate all that. So if my use of the CI doesn't cut it, then can you, using the specific example of the trial I showed, tell me how a clinician and/or patient should interpret the specific findings from this trial? The only way for any individual to make a decision is to have an idea of the benefits and harms. So in this case, if the person was similar to the people enrolled in the trial, what could we tell them about the benefit of this medication on their risk of a CVD event? You say that Bayes can uncover the answer, so could you tell me the answer that is better than what I have done with the CI? Thanks.
By "not cutting it" I wasn't referring to you but to the general problem with CIs, besides the near impossibility of defining them. Clinicians have specific interests, e.g., what's the evidence that the effect is > 0? > 15%? The intervals for those are [0, infinity] and [15%, infinity]. To get the evidence for the unknown being in the interval you must use Bayes. The frequentist approach takes control of the interval endpoints after you define the compatibility probability. This is very non-clinical.
"what I really need to know is the benefit large enough to take the drug every day for the next three years."
That requires first quantifying the downside in some way that allows it to be meaningfully compared to the upside, eg, by assigning a dollar value to every potential outcome, good or bad. I don't see how your "relative benefit" and "absolute benefit" numbers are meaningful without that sort of quantification first.
As for the more basic point, is frequentist p<0.05 a good criterion? In high data situations, as we have here, yes, it will mimic the (fundamentally more sound) Bayesian posterior probability of there being an effect well enough not to matter. In situations with less data, I would not trust this to be the case.
Glad to hear you think in this case a simpler frequentist approach is giving us a reasonable answer. The person who has to make the decision about taking the medication is the individual person. Their risks are the inconvenience, costs and side effects to them. And they will never be able to know if they benefit, because we aren't making them feel better - we are just reducing their risk. I'm not sure how assigning a dollar value to each outcome is useful to an individual patient, especially as in Canada we have pretty good health insurance. As I asked Frank, can a Bayesian approach give me a better number to use than saying we can likely reduce your chance from ~12% down to 10%?
"can a Bayesian approach give me a better number to use than saying we can likely reduce your chance from ~12% down to 10%?"
No. All Bayes does for you here (a high-statistics study) is give you a more meaningful quantification of "likely" than you can get from frequentism. The probability that the drug reduces the chance of the primary outcome (by some amount) is 97.8%. The most likely reduction is, as you say, from ~12% to ~10%.
I agree that the patient should make the decision.
All medical professionals have been trained in frequentist methods, and almost none in Bayesian methods, and this is not going to change any time soon. This is a real shame, because Bayesian methods, once learned, are so much more intuitive. But for now all of you have to learn frequentist methods, because that's what's used in every paper you read.
I became a Bayesian 40 years ago when a standard frequentist analysis of some low-quality data was giving me a nonsensical result, that some signal that could not possibly be negative was negative with some decent confidence. But I actually had 100% confidence that it was not negative! How could I put that into the analysis? The answer is a Bayesian prior. This is the sort of situation where Bayesian methods give better results. I would think that medicine has a lot of situations where there are no high-statistics studies at all, and yet doctors have patients who need advice. Bayesian methods would result in better advice in these cases, so I hope they eventually become more common.
Excellent - so it seems that a frequentist approach to looking at clinical trials is at least a reasonable approach when it comes to using clinical trial data and making decisions in patient care. Thanks.
As in "the evidence is compatible with the hypothesis".
E can be compatible with multiple hypotheses simultaneously, including hypotheses that would explain some or all of the result by bias/confounding, reverse causation, fraud, or randomness.
P(H|E) is calculable if one first grants that it is meaningful; that is whole idea behind Bayesian analysis. Then we use Bayes' Theorem (which has a one-line derivation from the axioms of probability):
P(H|E) = P(E|H)P(H)/P(E)
Here P(E) is an irrelevant normalization constant. The rub is P(H), the "prior". We have to start with a notion of how likely each of our potential hypotheses is BEFORE we get the evidence.
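A tiny numerical illustration of the theorem, with hypothetical numbers mirroring the diagnostic-test setting discussed later in this thread: a 10% pre-test probability of disease, a test with 90% sensitivity and 95% specificity, and a positive result.

```python
prior = 0.10   # P(H): pre-test probability of disease (hypothetical)
sens = 0.90    # P(E|H): probability of a positive test if diseased
fpr = 0.05     # P(E|not H): false positive rate, 1 - specificity

p_e = sens * prior + fpr * (1 - prior)  # P(E), the normalization constant
posterior = sens * prior / p_e          # P(H|E) by Bayes' theorem
print(round(posterior, 3))              # 0.667
```

The positive test moves the probability of disease from 10% to about 67% — and the answer is only as good as the prior that went in, which is exactly the point about priors being up front where they can be examined.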
Frequentists HATE this. They really really don't want to assign priors to their hypotheses.
But they end up doing it anyway, stealthily. As soon as you say anything about a "true value" being "likely", you have snuck in an illegal (for a frequentist) prior. The main virtue of Bayesianism is that the priors are not concealed. Rather, they are up front where they can be examined (and varied, to see what the effects on the posterior are).
Hi Dr. Harrell. I know too little about Bayes (and stats in general) to ask this effectively.
I use Bayes implicitly when ordering any diagnostic test, as I must have a pre-test likelihood of disease in order for any test result to inform my post-test likelihood, and hopefully affect my downstream management decisions. But my pre-test seems entirely subjective, informed by formal teaching (“textbooks”) as well as clinical experience. Another clinician may have a different prior for the exact same patient. How does one deal with such differences in prior probabilities?
And for trials of therapeutics (esp “new” agents or first in class therapies), how does one even arrive at an informed prior probability? Thanks.
There are many good answers to that question, which I've dealt with at https://hbiostat.org/bayes/bet . Briefly, we always know something, and classical statistics does not even make use of such minimal knowledge, e.g., that a treatment is incremental rather than curative. An incremental therapy may entail using a prior for an odds ratio, for example, such that the probability that the odds ratio is >4 or <1/4 is 0.05. In some cases we have actual trustworthy data on which to base a prior. In a majority of cases a reasonable sample size makes the prior much less relevant. Having a prior is the price of being able to make probability statements about the unknown of true interest. Just as with medical diagnosis.
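The skeptical prior described above can be made concrete. Assuming (my reading, not stated explicitly) a normal prior on the log odds ratio centered at 0, with 0.025 probability in each tail beyond OR = 4 and OR = 1/4, the prior SD follows directly:

```python
import math

# Normal prior on the log odds ratio, centered at 0 (no effect).
# Requirement: P(OR > 4 or OR < 1/4) = 0.05, i.e. 0.025 in each tail.
z_975 = 1.959964            # 97.5th percentile of the standard normal
sd = math.log(4) / z_975
print(round(sd, 3))         # about 0.707

# sanity check: two-tailed probability beyond log(4) under Normal(0, sd)
tail = 0.5 * math.erfc(math.log(4) / (sd * math.sqrt(2)))
print(round(2 * tail, 3))   # 0.05
```

This kind of prior concedes that large effects are possible but unlikely, which encodes the "incremental rather than curative" knowledge without needing any trial-specific data.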
There is much, much good in this article. The authors started out with great pains to interpret a confidence interval exactly correctly. Then they made a mistake:
"So, with this single poll, all we can say is the true result is likely somewhere between 37%
and 43% but we will be wrong with that statement 5% of the time."
No. Both parts of this sentence are incorrect. In frequentist statistics the true value is either in or outside the interval; there is no probability attached to this. The probability statement does not apply to 0.37 and 0.43 but to the process that generated this interval.
The extreme difficulty in interpreting confidence intervals should drive more people to Bayes, as described in my Bayesian journey at https://fharrell.com/post/journey.
Later the authors say
"Inferential statistics actually do NOT help us test a research hypothesis about whether an intervention worked or not. They assume the observed difference was solely due to chance and simply give us an estimate of the probability of such an occurrence over many potential repetitions of the study."
This is incorrect, as the statement applies only to classical frequentist inferential statistics. Any article on statistics that doesn't acknowledge the existence of Bayes is problematic.
Now take a look at
"No statistics can tell us if the medication worked or if the differences seen were clinically important. These decisions are clinical judgments--not statistical judgements. The ONLY reason we do inferential statistics is to singularly deal with the issue of chance. This concept is key to understanding inferential statistics."
That is false as again it applies only to classical frequentist statistics. With Bayesian posterior probabilities you are not needing to deal with "chance" in the sense above, and you obtain direct evidence measures such as the probability the treatment has any effectiveness and the probability of clinically meaningful effectiveness. And Bayesian uncertainty intervals are so much easier to interpret than confidence intervals.
An article about statistics should be exactly correct to not mislead readers, and researchers should stop pretending that the p-value/confidence limit form of interence is the only form that exists. Otherwise, new confusions will arise.
Frank - thanks for your response – very much appreciated. If I was asked before I posted this article, what is the probability someone will bring up the issue of a Bayesian approach to statistics, I would have guessed that probability to be 110% 🤣
You are correct we were just commenting from a frequentist’s perspective – we should have acknowledged that and we certainly didn’t intend to suggest a Bayes approach did not exist. Maybe we could collaborate in the future on a post to help people simply understand and contextualize a Bayesian approach to statistics – I think that could be very valuable.
It appears the main issue you had was with our phrase “So, with this single poll, all we can say is the true result is likely somewhere between 37% and 43% but we will be wrong with that statement 5% of the time”. We did not intend the word “likely” to suggest a specific probability – although I understand why you might think that. What we wanted to get across was simply a way a reader could possibly start to think about what information a single point estimate and a single confidence interval might provide. So, we were trying to make it “as simply as possible but not simpler”. There is always a fine balance when one does that.
The issue here is a common one. We are using probability in our article in a frequentist sense which is still the most common approach used in clinical research. We actually agree with the key elements of your comments. Clinicians normally have to make an estimate of the probability of a patient's disease before deciding on diagnosis and treatment. Frequentist probabilities do not apply here as there is no such thing as a single case frequentist probability. Our article is actually about trying to explain this with regard to p values and CI's. The only way to deal with this issue would be to explain in detail to people the difference between frequentist and Bayesian probabilities, which is beyond the scope of this article.
Nonetheless, if 95% of the generated CIs will contain the true result, then 5% will not contain the true result. So, is it not reasonable to say it is likely that the one CI we have contains the true result given that 95% of them do BUT as we said we will be wrong with that statement with 5% of the CIs we see?
In essence, saying that 95% CI's will contain the true value 95% of the time, is another way of saying you can have reasonable confidence in this range. Frequentist statisticians use the term confidence rather than a degree of belief because single case probabilities do not have meaning to them.
We do state quite clearly elsewhere in our post that “Inferential statistics don’t give us a probability” and that “Knowing this nuance is key to understanding statistics”
A REQUEST
To help our readers understand the difference between a frequentist approach and a Bayesian approach I have a question as to how you think might be the best way to simply interpret the result of a single trial.
THE SCENARIO
1) Let’s say there was only a single placebo-controlled trial of a new medication to see if it would reduce the risk of a heart attack. A single trial is not infrequently all we have in medicine when it comes to answering a clinical question.
2) Let’s assume the trial was well done and the findings were reported as a point estimate of 0.8 with a 95% confidence interval of 0.7-0.9 with the absolute risks of a heart attack in the placebo group being 10% and the risk in the new medication group being 8%.
How would you interpret this scenario? Thanks again for your response and interest.
Thanks very much for the nice reply James. I understand. Yes we should do something in the future about Bayes. I think this goes a long way towards both approaches: https://discourse.datamethods.org/t/language-for-communicating-frequentist-results-about-treatment-effects https://www.fharrell.com/post/bayes-freq-stmts/ https://discourse.datamethods.org/t/bayesian-vs-frequentist-statements-about-treatment-efficacy . The first link has my proposed interpretations of trials like the one you described just now . In terms of confidence intervals, Sander Greenland has published much related work and shows advantages of using the term 'compatibility interval'. For now the important points are (1) the frequentist approach ties to be simple but it does so by not giving you what you want and (2) the probability to attach to compatibility intervals is the long-term probability that the PROCESS used to generate the interval covers the true unknown treatment effect. The probability is not attached to a single realization of that process.
We need the actual numbers of patients in each group to do a proper Bayesian analysis. It cannot be backed out with just the info you have given.
"So, is it not reasonable to say it is likely that the one CI we have contains the true result given that 95% of them do BUT as we said we will be wrong with that statement with 5% of the CIs we see?"
No, this is an illegal inference by frequentist rules. Frequentism gives no metric for what is "reasonable". Frequentism only and ever gives P(E|H): the probability that the evidence E would be obtained in the experiment if the hypothesis H is true.
But no one cares about this. What we care about is P(H|E), the probability that the hypothesis H is true given the evidence E that has been obtained. But, according to frequentism, P(H|E) does not exist, as a matter of principle.
This is an insurmountable barrier for frequentist statistics. But since the barrier must nevertheless be surmounted, all sorts of dodges are made (all illegal by frequentism's own rules), and this is why the subject is so damn hard to understand. Because, at a fundamental level, it does not make sense.
This would be of mere academic curiosity if lives were not at stake.
Interesting stuff Mark.
1) I imagine you are "correct" if you strictly follow the "rules" etc - however, the entire purpose of our post was to try to give people who get exposed to results and statistics presented in a frequentist way on a regular basis a way to more "correctly" interpret them. If we get rid of the words reasonable and likely - do you at least agree that "95% of the confidence intervals will include the true result"?
2) The example I gave was simply theoretical. Would you be willing to use the following real-life example and explain what you say about the results in a way that clinicians might be able to use? The numbers come from the EMPA-REG trial https://pubmed.ncbi.nlm.nih.gov/26378978/. The abstract states "The primary outcome occurred in 490 of 4687 patients (10.5%) in the pooled empagliflozin group and in 282 of 2333 patients (12.1%) in the placebo group (hazard ratio in the empagliflozin group, 0.86; 95.02% confidence interval, 0.74 to 0.99; P=0.04" . Really look forward to hearing what you have to say. Thanks.
Here is a plot of the posterior distribution for the rate of the primary outcome (which I understand is BAD) in the drug group (blue) and the placebo group (orange), assuming a flat prior (choice of prior makes very little difference because there is a lot of data):
https://i.postimg.cc/LXhDnkQm/trial.jpg
Now you could do some more fancy math and compute confidence intervals ("credible intervals" in Bayesian lingo) or whatever, but I think it's enough to just look at the picture. I sure would want the drug, wouldn't you?
But if a number is needed, I think the most relevant one is the probability that the true value of the primary outcome rate is lower with the drug than with the placebo. That probability is 0.978. Very convincing, IMO.
Of course the drug may have other possible bad consequences (including cost) that would complicate the decision, but that's not part of the trial as I understand it.
The formula for each curve is simple:
P(x) = C x^np (1-x)^(n-np)
where n is the total number in the group (blue 4687, orange 2333) and np is the number with the primary outcome (blue 490, orange 282). The constant C is chosen so that the total probability is one; C =(n+1)! / (np! (n-np)!) where the exclamation point denotes the factorial function.
Note that, if it's not already obvious, I am NOT a medical person: my expertise is in a different hard science. I can barely get through medical jargon at all.
Thanks so much for doing this.
Just so I understand are you saying that the probability that the drug has an effect is 97.8%. If so that is great - but I think I already pretty much know that by using a frequentists approach, because we have ruled out chance - the p value is <0.05. However what I really need to know is the benefit large enough to take the drug every day for the next three years. Here is what I would do by looking at the confidence interval. I believe the relative benefit is somewhere between a 26% relative benefit (0.74) and a 1% relative benefit (0.99) and the observed relative benefit was 14% (0.86). So the absolute benefit seen in this trial was 12.1% minus 10.5% = 1.6% - so a 1.6% benefit and therefore 98.4% get no benefit - or approximately 60 people need to take this drug for three years for 1 to benefit. However, because we don't know the true effect all I can say is that the effect is likely - sorry I know Bayesians don't really like that word - somewhere as large as a 26% relative benefit or as small as 1%. So the absolute benefit might be as large as ~3% or close to no benefit at all. Then I have to add in that the cost of the medication is about CA $1000 a year and 5-10% of people will get a genital infection because of the drug. Then I have to somehow explain this to a patient using percentages to help them make a shared decision.
So my main question is now: what additional clinically useful information could I get, to use in the decision-making process, by using a Bayesian approach instead of a frequentist approach? And is it something I could easily do by looking at the results presented in the paper?
Hope my approach and questions make sense.
Thanks again.
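[Editor's note: the absolute-benefit arithmetic in the comment above can be checked in a few lines; the event counts are the trial figures quoted earlier in the thread, and the labels are mine.]

```python
# Event rates from the quoted trial counts
control_rate = 282 / 2333    # ≈ 12.1%
treatment_rate = 490 / 4687  # ≈ 10.5%

# Absolute risk reduction and number needed to treat over the trial period
arr = control_rate - treatment_rate  # ≈ 1.6%
nnt = 1 / arr                        # ≈ 61, close to the "approximately 60" above

print(f"ARR = {arr:.1%}, NNT ≈ {nnt:.0f}")
```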
Watch out: "98.4% get no benefit" has nothing to do with the probabilities being considered here. To interpret things that way you'd need a 6-period randomized crossover study, which allows one to estimate benefit on a patient-by-patient basis. You can't get the fraction benefitting from a probability about a group effect.
Interesting. Unfortunately, the entire point is that in this case we have a single RCT that is suggesting a benefit, albeit using a frequentist approach, and we will likely never have a 6-period randomized crossover study, possibly not even another RCT. So we have to use the available data as best we can to make a ballpark estimate of the benefit, and then be able to present it in a way that makes sense. So if you had a person similar to the people studied in the trial, given the results, what ballpark absolute benefit would you tell them? The entire purpose of using the medication would be to reduce their risk of a bad outcome. The best answer I can give is roughly a 1-2% lower chance of having one of the primary outcomes over 3 years. Is there anything from a Bayesian perspective that can help make a better estimate of benefit? Thanks.
I mentioned the crossover study only in the context of estimating the proportion of patients who benefitted; you can ignore that for other purposes. Your use of the confidence interval in your previous reply doesn't cut it. The clinical question is: given an interval [a,b], what is the probability that the true treatment effect is in that interval? With a CI you give the probability and it derives the interval. Also, frequentist inference was developed by Fisher as a sequence of experiments with continually refined evidence against H0. The need you have enunciated, to know what to do now, is to some extent incompatible with the frequentist approach, and calls for Bayes. Bayes is about uncovering the data generating mechanism behind THIS study.
I appreciate all that. So if my use of the CI doesn't cut it, can you, using the specific example of the trial I showed, tell me how a clinician and/or patient should interpret its findings? The only way for any individual to make a decision is to have an idea of the benefits and harms. So in this case, if the person were similar to the people enrolled in the trial, what could we tell them about the benefit of this medication on their risk of a CVD event? You say that Bayes can uncover the answer, so could you tell me the answer that is better than what I have done with the CI? Thanks.
By "not cutting it" I wasn't referring to you but to the general problem with CIs, besides the near impossibility of defining them. Clinicians have specific interests, e.g., what's the evidence that the effect is > 0? > 15%? The intervals for those are [0, infinity] and [15%, infinity]. To get the evidence for the unknown being in such an interval you must use Bayes. The frequentist approach takes control of the interval endpoints after you define the compatibility probability. This is very non-clinical.
"what I really need to know is the benefit large enough to take the drug every day for the next three years."
That requires first quantifying the downside in some way that allows it to be meaningfully compared to the upside, e.g., by assigning a dollar value to every potential outcome, good or bad. I don't see how your "relative benefit" and "absolute benefit" numbers are meaningful without that sort of quantification first.
As for the more basic point: is frequentist p<0.05 a good criterion? In high-data situations, as we have here, yes; it will mimic the (fundamentally more sound) Bayesian posterior probability of there being an effect well enough not to matter. In situations with less data, I would not trust this to be the case.
Glad to hear you think that in this case a simpler frequentist approach is giving us a reasonable answer. The person who has to make the decision about taking the medication is the individual. Their risks are the inconvenience, costs, and side effects to them. And they will never be able to know if they benefit, because we aren't making them feel better; we are just reducing their risk. I'm not sure how assigning a dollar value to each outcome is useful to an individual patient, especially as in Canada we have pretty good health insurance. As I asked Frank: can a Bayesian approach give me a better number to use than saying we can likely reduce your chance from ~12% down to 10%?
"can a Bayesian approach give me a better number to use than saying we can likely reduce your chance from ~12% down to 10%?"
No. All Bayes does for you here (a high-statistics study) is give you a more meaningful quantification of "likely" than you can get from frequentism. The probability that the drug reduces the chance of the primary outcome (by some amount) is 97.8%. The most likely reduction is, as you say, from ~12% to ~10%.
I agree that the patient should make the decision.
All medical professionals have been trained in frequentist methods, and almost none in Bayesian methods, and this is not going to change any time soon. This is a real shame, because Bayesian methods, once learned, are so much more intuitive. But for now all of you have to learn frequentist methods, because that's what's used in every paper you read.
I became a Bayesian 40 years ago when a standard frequentist analysis of some low-quality data was giving me a nonsensical result, that some signal that could not possibly be negative was negative with some decent confidence. But I actually had 100% confidence that it was not negative! How could I put that into the analysis? The answer is a Bayesian prior. This is the sort of situation where Bayesian methods give better results. I would think that medicine has a lot of situations where there are no high-statistics studies at all, and yet doctors have patients who need advice. Bayesian methods would result in better advice in these cases, so I hope they eventually become more common.
Excellent - so it seems that a frequentist approach to looking at clinical trials is at least a reasonable approach when it comes to using clinical trial data and making decisions in patient care. Thanks.
Following this thread with interest. As Mark described:
P(E|H) = "the probability that the evidence E would be obtained in the experiment if the hypothesis H is true."
P(H|E) = "the probability that the hypothesis H is true given the evidence E that has been obtained."
The latter term is said to be incalculable. The former term brings to mind the phrase "compatibility interval":
https://www.bmj.com/content/366/bmj.l5381
As in "the evidence is compatible with the hypothesis".
E can be compatible with multiple hypotheses simultaneously, including hypotheses that would explain some or all of the result by bias/confounding, reverse causation, fraud, or randomness.
Such is my pleb understanding.
P(H|E) is calculable if one first grants that it is meaningful; that is the whole idea behind Bayesian analysis. Then we use Bayes' Theorem (which has a one-line derivation from the axioms of probability):
P(H|E) = P(E|H)P(H)/P(E)
Here P(E) is an irrelevant normalization constant. The rub is P(H), the "prior". We have to start with a notion of how likely each of our potential hypotheses is BEFORE we get the evidence.
Frequentists HATE this. They really really don't want to assign priors to their hypotheses.
But they end up doing it anyway, stealthily. As soon as you say anything about a "true value" being "likely", you have snuck in an illegal (for a frequentist) prior. The main virtue of Bayesianism is that the priors are not concealed. Rather, they are up front where they can be examined (and varied, to see what the effects on the posterior are).
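[Editor's note: the Bayes' Theorem formula above can be illustrated with a toy two-hypothesis calculation; all the numbers below are made up for illustration.]

```python
# Two competing hypotheses with prior probabilities (made-up numbers)
prior_H = 0.5     # P(H): hypothesis true before seeing evidence
prior_notH = 0.5  # P(not H)

# Likelihood of the observed evidence under each hypothesis (made up)
p_E_given_H = 0.8     # P(E | H)
p_E_given_notH = 0.2  # P(E | not H)

# P(E), the normalization constant from the formula above
p_E = p_E_given_H * prior_H + p_E_given_notH * prior_notH

# Bayes' Theorem: P(H|E) = P(E|H) P(H) / P(E)
posterior_H = p_E_given_H * prior_H / p_E
print(posterior_H)  # 0.8
```

Varying `prior_H` and rerunning shows exactly how the (explicit, examinable) prior moves the posterior.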
Hi Dr. Harrell. I know too little about Bayes (and stats in general) to ask this effectively.
I use Bayes implicitly when ordering any diagnostic test, as I must have a pre-test likelihood of disease in order for any test result to inform my post-test likelihood, and hopefully affect my downstream management decisions. But my pre-test seems entirely subjective, informed by formal teaching (“textbooks”) as well as clinical experience. Another clinician may have a different prior for the exact same patient. How does one deal with such differences in prior probabilities?
And for trials of therapeutics (esp “new” agents or first in class therapies), how does one even arrive at an informed prior probability? Thanks.
There are many good answers to that question, which I've dealt with at https://hbiostat.org/bayes/bet . Briefly, we always know something, and classical statistics does not make use of even such minimal knowledge, e.g., that a treatment is incremental rather than curative. An incremental therapy may entail using a prior for an odds ratio, for example, such that the probability that the odds ratio is >4 or <1/4 is 0.05. In some cases we have actual trustworthy data on which to base a prior. In a majority of cases a reasonable sample size makes the prior much less relevant. Having a prior is the price of being able to make probability statements about the unknown of true interest, just as with medical diagnosis.
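[Editor's note: the diagnostic-test analogy raised in the question above is exactly a Bayes' Theorem calculation. A sketch with hypothetical test characteristics; the sensitivity, specificity, and the two pre-test probabilities are all assumed values, chosen only to show how two clinicians' different priors yield different post-test probabilities for the same positive result.]

```python
def post_test_probability(pre_test: float, sensitivity: float,
                          specificity: float) -> float:
    """Probability of disease given a positive test, via Bayes' Theorem."""
    p_pos_given_disease = sensitivity
    p_pos_given_healthy = 1 - specificity
    # Total probability of a positive test (the normalization constant)
    p_pos = (p_pos_given_disease * pre_test
             + p_pos_given_healthy * (1 - pre_test))
    return p_pos_given_disease * pre_test / p_pos

# Two clinicians with different priors for the same patient (assumed values)
for pre_test in (0.10, 0.30):
    post = post_test_probability(pre_test, sensitivity=0.90, specificity=0.85)
    print(f"pre-test {pre_test:.0%} -> post-test {post:.0%}")
```

The same positive test moves a 10% prior to 40% and a 30% prior to 72%, which is why differing priors matter and why examining them openly, rather than leaving them implicit, is the point.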