Staying True to Trial Analyses -- The Tale of Two Screening Trials
I contrast two population screening trials--DANCAVAS and NordICC. One stayed true to the principle of randomization. The other broke traditional intention-to-treat principles.
When you randomize patients to two therapy groups, you must count outcomes based on the group that the patient was randomized to. It does not matter if the patient did not take the drug or have the procedure. The name we give this principle is intention to treat.
There is a temptation to count only the people who actually get the drug or procedure. This is called the as-treated group. The problem is that patients who get the procedure or drug may be different from those who do not. Using as-treated outcomes breaks the balancing benefits of randomization.
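To see why, here is a minimal simulation sketch in Python. The numbers are entirely made up, not data from either trial, and the "treatment" has no effect whatsoever; yet the as-treated comparison manufactures a benefit, because healthier patients are more likely to adhere:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical setup: the treatment has NO true effect on death.
baseline_risk = rng.uniform(0.05, 0.25, size=n)   # each person's risk of death
assigned = rng.integers(0, 2, size=n)             # 1 = randomized to treatment
# Healthier people (lower baseline risk) are more likely to adhere.
adheres = rng.random(n) < (0.9 - 2.0 * baseline_risk)
treated = (assigned == 1) & adheres
died = rng.random(n) < baseline_risk              # outcome ignores treatment

# Intention-to-treat: compare by randomized assignment.
itt_rr = died[assigned == 1].mean() / died[assigned == 0].mean()
# As-treated: compare those who actually got treated vs everyone else.
at_rr = died[treated].mean() / died[~treated].mean()

print(f"ITT risk ratio:        {itt_rr:.2f}")  # ~1.00 (correct: no effect)
print(f"As-treated risk ratio: {at_rr:.2f}")   # <1.00 (pure selection bias)
```

Nothing about the treatment differs between the two comparisons; the only thing that changes is who gets counted in which column.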
Two screening trials exemplify these concepts. Both attempted to sort out a population-level benefit of screening tests.
In the DANCAVAS trial, the intervention was an invitation to an efficient, pragmatic cardiac screening program that included blood tests, ECG, and imaging. If that screening program identified an issue, action was recommended through the normal Danish health system. The control arm comprised people not invited to screening who received usual Danish healthcare. The trial endpoint was mortality. Alive or dead.
In the NordICC trial, the intervention was also an invitation, in this case to undergo screening colonoscopy (the invited group); the comparator was no invitation and no screening (the usual-care group). The primary endpoints were the incidence of colorectal cancer and related death, and the secondary endpoint was death from any cause.
Trial Results and Interpretation
In DANCAVAS, screening was offered to ≈ 16,700 people; the other ≈ 30,000 received usual care in the Danish healthcare system. Randomization occurred at the level of the invitation letter.
Of the ≈ 16,700 invited, only 10,471 individuals attended screening, which means about 6300 did not.
The main result was that 12.6% of the invited group and 13.1% of the control group died (hazard ratio, 0.95; 95% confidence interval [CI], 0.90 to 1.00; P=0.06).
I quibble with a strictly negative conclusion, because the 95% confidence interval allows for as much as a 10% relative risk reduction in death; statistically, however, the trial did not meet significance.
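For readers who want to check the arithmetic, the P value can be reconstructed from the published CI using a standard back-calculation, under the conventional assumption that the interval was built on the log hazard ratio scale (this is my own check, not anything reported by the investigators):

```python
import math

# Reported DANCAVAS result: HR 0.95, 95% CI 0.90 to 1.00
hr, lo, hi = 0.95, 0.90, 1.00

se = (math.log(hi) - math.log(lo)) / (2 * 1.96)  # standard error of log(HR)
z = math.log(hr) / se                            # z statistic
p = math.erfc(abs(z) / math.sqrt(2))             # two-sided p-value

print(f"z = {z:.2f}, p = {p:.2f}")  # p ≈ 0.06, matching the published P value
```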
My key point is that, to this day, three years on, the DANCAVAS investigators have never reported the results of the people who actually got screened (N = 10,471) vs those who did not attend (N ≈ 6300).
When the trial was presented in 2022, I asked senior investigator Axel Diederichsen why he did not present the “as screened” analysis, and he looked askance and said that would not be proper.
I understand it as not proper for two reasons. The first is that DANCAVAS is a population-based test of an invitation to the screening program. The second is that patients who decided to attend vs not attend likely had different baseline characteristics, which would preclude causal inference about the intervention.
In the NordICC trial, ≈ 28,000 people were assigned to an invitation to colonoscopy and ≈ 56,000 were assigned to no screening.
Similar to DANCAVAS, a large fraction of individuals (58%) assigned to the colonoscopy group did NOT have the procedure.
Applying the DANCAVAS approach (intention-to-treat) to NordICC yielded a death rate of 11.03% in the invited group and 11.04% in the usual-care group (HR, 0.99; 95% CI, 0.96 to 1.04). Solidly negative.
The risk of death from colorectal cancer (0.28% vs 0.31%) was also not significantly different in the invited vs usual-care arms (HR, 0.90; 95% CI, 0.64 to 1.16).
So that’s it, right? Same results and discussion as DANCAVAS.
No. That is not what happened.
The NordICC investigators did not resist the urge to break with randomization. They presented an adjusted per-protocol analysis “to estimate the effect of screening if all the participants who were randomly assigned to screening had actually undergone screening.”
This analysis found a statistically significant 31% relative risk reduction in colorectal cancer and a 50% reduction in the rate of death from colorectal cancer (0.15% vs 0.30%).
When I mentioned NordICC to a GI colleague, his eyes brightened, and he told me that you have to get the colonoscopy to benefit.
This message, of course, came from the American Gastroenterological Association (AGA) Talking Points memo pictured below.
Comments:
The point I hope to make is not about the merits of cardiac or colonoscopy screening.
My main point is to illustrate how one set of authors stayed true to their original analysis and did not even dare to present the as-screened analysis, while the other group of authors broke the rules of intention-to-treat.
Both trials set out to study the effects of population-based screening. Does an invitation to cardiac screening or colonoscopy improve outcomes?
The intervention was not the screening tests but the invitation to screening.
This, I believe, is a relevant public health question, because people who are able and willing to take the time to have these tests are different from people who cannot. Those differences could surely cause differences in outcomes.
I can only speculate on the NordICC authors’ reasons for presenting the as-screened numbers. Perhaps the peer reviewers forced them.
But I, and surely you too, can speculate on why the AGA was ready with its talking points about the biased as-treated analysis.
Notice that the AGA did not take the opportunity to educate the public about causal inference and randomization. They did not say that better outcomes were noted in the as-screened group because these patients were healthier, richer, or more health conscious than the non-screened group.
They attributed all the benefit to the procedure, rather than the potential differences in patients. Of course they did.
There are really four populations: (1) those invited who screened; (2) those invited who did not screen; (3) those not invited who screened anyway; and (4) those not invited who did not screen. Population (3) is really the fly in the ointment. Then there are the sub-populations of (1) and (3) in whom screening detected illness (1/3 A) and those who screened but had no illness detected (1/3 B). It would be interesting to know the false positives and false negatives in these two groups and the associated outcomes, including adverse side effects.

Why? You might discover, for example, that sending an invitation has no impact on screening rates, or maybe it does. You might also find that screening is ineffective at accurately diagnosing disease and that the side effects resulting from false positives overwhelm the advantage of finding the disease early, or the opposite.

If you were actually trying to figure out what policy to adopt, would this information be valuable? I think it would. Why invite people to an ineffective screening? Why invite people to a screening that is more likely to cause harm than to advance health? Of course, once you start the study, you probably have little ability to modify it on the fly without injecting bias. So it is best practice to really think things through before you start and to challenge your assumptions. Maybe screening is ineffective. Maybe screening is effective. I suspect the study designers started from the assumption that screening is effective. But that assumption may not be true. Whatever the truth is, it shapes the results of the study. A toy tabulation of the four populations follows below.
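To make the four groups concrete, here is a toy tabulation in Python. The 42% uptake in the invited arm mirrors NordICC's reported participation; the 10% background screening rate among the uninvited is purely an assumption for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

invited = rng.integers(0, 2, size=n)   # 1 = received an invitation
# Hypothetical uptake: 42% mirrors NordICC's invited-arm participation;
# the 10% background rate among the uninvited is an assumption.
p_screen = np.where(invited == 1, 0.42, 0.10)
screened = rng.random(n) < p_screen

groups = [
    (1, True,  "(1) invited, screened"),
    (1, False, "(2) invited, not screened"),
    (0, True,  "(3) not invited, screened anyway"),
    (0, False, "(4) not invited, not screened"),
]
for inv, scr, name in groups:
    count = int(((invited == inv) & (screened == scr)).sum())
    print(f"{name:<34} n = {count:>6}")
```

Even a modest population (3) means the usual-care arm is not a no-screening arm, which dilutes any intention-to-treat comparison of invitation vs no invitation.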
A simple way of seeing the problem: viewed from the top, we have an RCT. Embedded inside the ‘assigned to colonoscopy’ arm is the equivalent of a classical observational study. We know that observational studies show that colonoscopies appear to have a benefit, but the point of this study is to learn whether an RCT will show a benefit. So the people who claim a benefit by comparing the two groups inside the ‘assigned to colonoscopy’ arm are arguing, in effect, that observational studies prove causality and that RCTs are not necessary. They may not realize they are doing this.