Antibiotic Management of Nonperforated Appendicitis is Inferior to Appendectomy
Don’t Believe It
For decades, the treatment of acute appendicitis was appendectomy. Nobody even questioned this approach. The appendix was acutely inflamed and at risk of perforation, so it needed to be removed. The idea that there could be a simple, nonsurgical management could not have been imagined. Eventually, though, evidence accumulated that cast doubt on this paradigm. Submariners submerged under the sea for months at a time developed appendicitis that could not be treated surgically, and they did fine. Emergency appendectomy was performed for fear that perforation was inevitable, but epidemiological evidence suggested that acute and perforated appendicitis were different diseases. The nail in appendectomy’s coffin came from Finland, where an RCT showed that appendicitis could be treated with antibiotics with no complications attributable to trying this approach.
A recent Lancet article, reporting the results of a multicenter randomized clinical trial, stated, “Based on cumulative failure rates and a 20% non-inferiority margin, antibiotic management of nonperforated appendicitis was inferior to appendicectomy.” This conclusion should be doubted because:
1) The experimental and control groups had different endpoints. This is not how RCTs work.
2) The control group’s endpoint, a normal appendix based on pathology, is not on the causal pathway for complications of appendicitis.
3) Setting the noninferiority margin at 20% was arbitrary and ensured a negative result.
4) Language usage biased reporting of the study findings.
The experimental and control groups had different endpoints. This is not how RCTs work.
The design of this study was problematic. Children with acute, uncomplicated appendicitis were randomly allocated to antibiotics (n=477) or appendectomy (n=459). Treatment failure, the primary outcome, was defined differently for each group: a need for subsequent appendectomy in the antibiotic group and a negative appendectomy in the control group. The study hypothesis was that avoiding surgery through antibiotic treatment would be almost as good as (i.e., noninferior to, within a 20-percentage-point margin) an initial appendectomy, provided it was not a negative appendectomy. This approach does not make sense. An RCT measures an intervention’s effect on a single outcome. In this study, a different outcome was assessed for each group.
The control group’s endpoint, a normal appendix based on pathology, is not on the causal pathway for complications of appendicitis.
Randomized clinical trials are performed to “provide evidence for relative treatment effectiveness over an adequate time horizon for assessing target patient outcomes.” At least in theory, appendectomy is performed to avoid the more serious intraabdominal infection that may follow if appendicitis is left untreated. Consequently, trials comparing antibiotics to surgical treatment of appendicitis should have serious infection as the main outcome. Antibiotics treat serious infection, and appendectomy avoids it. However, the endpoint for the control group in the Lancet study was the negative appendectomy rate. True, negative appendectomy is an outcome of interest in the care of patients with appendicitis, but it has nothing to do with the clinically important outcome of avoiding serious infection. To riff on Jerry Seinfeld, this was an RCT about nothing.
Because the outcomes were different, the study groups cannot be compared. What can be gleaned from the study are the outcomes of antibiotic treatment of pediatric appendicitis: 66% of children with acute, uncomplicated appendicitis were successfully treated with antibiotics alone. This is an important outcome. It means that 2/3 of children presenting with uncomplicated appendicitis could avoid surgery.
Setting the noninferiority margin at 20% was arbitrary and ensured a negative result.
Noninferiority trials are used when a new treatment may be as effective as an older therapy but has other advantages, such as being less expensive, requiring fewer doses, or having fewer complications. Noninferiority studies examine whether the new and old therapies are equivalent or, if they are not, how much of the older treatment’s effectiveness is sacrificed in exchange for the benefits of the new treatment. The largest sacrifice considered acceptable is the noninferiority margin. If the new treatment’s effectiveness falls outside the margin, the new treatment is considered inferior to the old. If it falls within the margin, it is considered almost as good as the established treatment, i.e. noninferior.
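The decision rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses a simple Wald 95% confidence interval for the difference in failure proportions, which is not necessarily the method used in the Lancet trial, and the counts in the usage lines are hypothetical, not data from any study. The function name `noninferiority_check` is made up for this example.

```python
import math

def noninferiority_check(fail_new, n_new, fail_old, n_old, margin, z=1.96):
    """Compare failure proportions of a new and an old treatment.

    Returns (difference, upper_bound, noninferior), where difference is
    new minus old failure proportion and upper_bound is the upper limit
    of a Wald confidence interval (an assumed, simplified method).
    """
    p_new = fail_new / n_new
    p_old = fail_old / n_old
    diff = p_new - p_old
    # Standard error of the difference between two independent proportions.
    se = math.sqrt(p_new * (1 - p_new) / n_new + p_old * (1 - p_old) / n_old)
    upper = diff + z * se
    # Noninferior only if even the upper bound stays inside the margin.
    return diff, upper, upper < margin

# Hypothetical counts, chosen only to show both possible verdicts.
within = noninferiority_check(30, 200, 20, 200, margin=0.20)
outside = noninferiority_check(80, 200, 20, 200, margin=0.20)
print("within margin:", within[2])
print("outside margin:", outside[2])
```

The verdict flips purely on where the margin is drawn: the same pair of failure proportions can be "noninferior" under one margin and fail under another, which is why the choice of margin matters so much.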
When performing a clinical trial that will be analyzed by frequentist (null hypothesis testing) methods, the minimal clinically important difference (MCID) between groups must be specified. For noninferiority designs, the equivalent is called the noninferiority margin. These concepts distinguish between statistical and clinical significance. Statistical significance is a difficult concept to understand and is frequently misunderstood. What medical literature readers should know is that they should pay little attention to study conclusions that demonstrate statistical significance based on some P value. They should pay attention to clinically significant findings. Clinically significant differences are based on prespecified margins defined as the difference between groups needed to be clinically important as established by the study’s investigators.
How the clinically important margins are established should be explained in detail in a research article. For the Lancet appendicitis report, St. Peter et al stated:
“The sample size was calculated to test the null hypothesis that antibiotic treatment alone is inferior to appendicectomy by more than 20 percentage points, implying that surgeons and patients would be content with failure being within 20%. The non-inferiority margin was determined by trial investigators as a compromise between a margin that would be acceptable to patients and their families (who might find a margin wider than 20% acceptable) and one that might be acceptable to surgeons treating children (who would probably prefer a narrow margin) and is consistent with opinion within the literature.22”
Reference 22 was a Cochrane review of antibiotic treatment of appendicitis published in 2011. This was published before there were successful randomized clinical trials showing that appendicitis can be treated with antibiotics. The authors of the Lancet article never stated why they thought that patients and surgeons would accept a 20% difference between groups.
A prospective trial of antibiotic treatment of appendicitis was published after this trial began. In selecting a treatment margin, its authors stated:
“The threshold success rate was set at 70% to accommodate the opinions of the surgeons at each site and to obtain complete surgical group participation from each site at the beginning of the study. In contrast, the team members (patients and their families, primary care physicians, nurses, emergency department physicians, and payors) favored a threshold success rate of 50%.”
When asked, surgeons were willing to accept a 30% failure rate, but patients and other caregivers were willing to accept a 50% failure rate to avoid surgery. Nevertheless, after the patients received a scripted description of the study arms, only 1/3 chose antibiotic therapy.
Selection of the noninferiority margin is highly subjective. One major problem with null hypothesis analytic approaches, such as the one used in the Lancet study, is that conclusions are reduced to a yes/no answer based on a subjectively determined margin for differences between the groups. Margin selection risks gaming the system by specifying a margin the investigator knows will succeed or fail. Before the authors of this study enrolled patients, Salminen showed that antibiotic treatment of appendicitis resulted in a 27% need for subsequent appendectomy. While this trial was underway, other reports showed a 33% to 40% need for appendectomy within 1 year of appendicitis treated with antibiotics. Thus, the authors’ choice of a 20% noninferiority margin all but guaranteed that this study would conclude that antibiotic therapy was inferior. This is because the authors selected a noninferiority margin that was known to be lower than the actual rate at which antibiotic treatment for appendicitis fails.
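The arithmetic behind this argument can be laid out directly. The sketch below takes the antibiotic failure rates quoted in this article (27%, 33% to 40%, and the 34% seen in this trial) and asks how high the control arm’s failure rate would have to be for the simple difference to reach the 20-point margin. The control arm’s negative appendectomy rate is not given here, so it is treated as an unknown rather than filled in; the labels and variable names are illustrative only.

```python
# Antibiotic-arm failure rates quoted in this article, in percentage points.
reported_failure_rates = {
    "Salminen (before enrollment)": 27,
    "later reports, low end": 33,
    "later reports, high end": 40,
    "this trial": 34,
}

MARGIN = 20  # noninferiority margin, in percentage points

# Smallest control-arm failure rate at which the point difference
# (antibiotic minus control) equals the margin. Reaching the margin is
# necessary but not sufficient: the confidence interval would also
# have to lie inside it.
required_control_rate = {
    label: max(rate - MARGIN, 0)
    for label, rate in reported_failure_rates.items()
}

for label, needed in required_control_rate.items():
    print(f"{label}: control failure rate would need to be at least {needed}%")
```

In other words, unless negative appendectomies were quite common in the surgical arm, the difference between arms was likely to exceed 20 points before the first patient was enrolled.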
Language usage biased reporting of the study findings.
This trial’s authors refer to the 34% appendectomy rate in the antibiotic treated patients as a treatment failure. However, 66% of patients initially treated with antibiotics never required surgery, a success for those who prefer to avoid surgery. Prior studies have shown that patients who have appendicitis treated by either surgery or with antibiotics perceive their treatment as successful. The word ‘failure’ appears 40 times in the article when referring to antibiotic treatment and the word ‘successful’ only twice. The bias in favor of surgery was obvious in this article.
Conclusion
Readers of medical literature cannot rely on an author’s conclusion in any journal, no matter what its reputation. Most clinicians have little training in how to interpret clinical research. However, with ready dissemination of ideas via the internet, opinions about an article’s interpretability are readily available. Consumers of medical information need to identify trusted sources and read them before blindly accepting what appears in the medical literature.
Dr. Livingston is a Professor of Surgery at UCLA and former Deputy Editor at JAMA. He is a frequent contributor to Sensible Medicine.
Great analysis
This is a disease where it would be best to let the dust settle on both treatment arms. Both surgery and antibiotics have late side effects. One year follow up, total hospital days, morbidity and mortality.
Imagine two F1 teams in a race, one using one stop, the other two stops. This is like assessing the strategies based on who is leading at lap 40.
Great post.
What is with journal editors these days? If a “non-inferiority” trial fails to reject the null, the conclusion statement should read “was not non-inferior”, rather than “was inferior”. It is a small and technical point, but it speaks to the precision of language and concepts, and to the attention to detail. Reading that abstract makes me think it was amateur day at Lancet.
Thanks also for drawing attention to the “endpoint” in the antibiotic arm. A “failure” of antibiotics was the need to get an appendectomy….which is the starting point of the control arm. Ie. a failed active arm pt hasn’t lost anything (except some lead time). Esp (as pointed out by the Skeptician below) when there were no differences in severe clinical outcomes like death.
After reading this study, my clinical take would be “why wouldn’t everybody try a course of antibiotics first?”.