I Am Afraid of Early Cancer Detection
Part 1: A critical appraisal of a study of Grail's Galleri test
It’s been a few years since I first heard about the Galleri test produced by the Menlo Park startup Grail. Galleri is a blood-based screening test for cancer. I think I first heard about it from doctors in concierge practices; next I heard whispers from my own better-off patients who brought me results or asked what I thought of the test.
For a couple of reasons, I didn’t work very hard to learn about the Galleri test.
First, I’ve been doing clinical and evidence-based medicine for long enough that I knew I could convincingly argue against the value of this test without actual knowledge.1 I would argue that there is no data showing that Galleri saves lives. The purpose of a cancer screening test is not to find cancer, I would say, but to save lives. I would add that, given the low prevalence of disease in a screened population, the test would have a shockingly low positive predictive value. This would guarantee false positives, overdiagnosis, and overtreatment with subsequent financial and physical harm (to patients and to society, but not to the Menlo Park startup).
Second, I was worried that this test might have real promise. An effective blood-based cancer screening test would be hugely beneficial for patients but, if I am completely honest, would test (and likely defeat) the minimalism and parsimony that I so appreciate in medicine. Galleri could do more harm to the idea of medical conservatism than any hate rising from Twitter or blogs.
So I buried my head in the sand until an article in the February 2nd WSJ -- Who’s Afraid of Early Cancer Detection? -- made me commit to learning about the test. Today and next week I will post articles about Galleri. This week will be a straight-up critical appraisal (with a bit of a critique of Wall Street Journalism Churnalism thrown in). Next week will be some soul searching about accepting progress in medicine even if it runs counter to my values.
I think that the most important article for learning about Galleri was published in Annals of Oncology in 2021: Clinical validation of a targeted methylation-based multi-cancer early detection test using an independent validation set. This was the validation article in a series of diagnostic test articles.2 It was referenced in the WSJ article.
Background
Grail’s founding idea was to develop a single blood test that can screen for multiple cancers. This would be done by assaying for cell-free DNA, or other circulating substances, shed by tumors. The studies leading up to the Annals of Oncology study demonstrated that the most effective assay targeted specific methylated, cell-free DNA. This study was a case-control, diagnostic test study.
Design
The “multi-cancer early detection test” (Galleri) was run on a total cohort of 4077 participants: 2823 patients with known cancers and 1254 people without cancer. To be certain that the controls were disease-free, they were followed and confirmed not to have developed cancer in the year after the test. There were a number of exclusion criteria (see Table S1), so this is not a test that can be used on any person at any time. The results that come from such a study design are sensitivity (the probability of a positive test in patients with cancer) and specificity (the probability of a negative test in people without cancer). Given the large number of cancer patients included, sensitivity could be determined for individual cancers at various stages.
Results
The study included an admirable diversity of patients. People without cancer were younger than those with, but there was a range of ages, and the sample was diverse with regard to race/ethnicity. It is surprising that the cases and controls were not more closely matched. The headline results are best expressed in the article’s figure 3. (I respect any authors who give this amount of journal real estate to a 2x2 table.)
Before we get into the nitty gritty of the data, let’s cover the numbers that might make you optimistic about this test. The high specificity should mean that few people without cancer will be classified as having cancer. The sensitivity is disappointing. If you have cancer, it is basically a coin flip as to whether the test will be positive. But, given that this test is the only screening test for most of the diseases we are interested in -- pancreatic cancer, ovarian cancer, gastric cancer -- at least one in two might benefit. (This is about the depth of consideration the WSJ piece gave to these issues.)
Application
Looking a bit more closely you see why Grail’s test is actually useless, or dangerous, or both. Let’s start with the sensitivity of the test. For a cancer screening test to work, it must find disease before it has caused symptoms -- when it is in an early or premalignant stage. Say what you want about lung cancer screening, mammography, PSA, and colonoscopy (I’m talking to you Drs. M and P) but at least they look for, and succeed at finding, early stage/premalignant disease. Here is the sensitivity of the Galleri test by stage: stage 1, 16.8%; stage 2, 40.4%; stage 3, 77%; stage 4, 90.1%.
The test is nearly worthless at finding stage 1 disease, the stage we would like to find with screening. The type of disease that is usually cured with surgery alone.
How about specificity? Let’s consider a fictional, 64-year-old male patient who presents to his internist worried about pancreatic cancer. I pick pancreatic cancer not only because it is a scary cancer: we can’t screen for it, our treatments stink, and it seems to kill half the people in the NYT obituary section. I also pick it because it is the anecdotal disease in the WSJ article.
The WSJ article tells the story of a patient lucky enough to have the Galleri test. The test showed a cancer signal referable to the pancreas, gallbladder, stomach, or esophagus. An MRI demonstrated a pancreatic lesion and a biopsy confirmed cancer. The patient had three months of chemotherapy, surgery (I assume a Whipple procedure), followed by another three months of chemotherapy. He now gets CT and MRI scans every three months.
I’ll try to explain how unusual this situation would be. What is the pretest probability of pancreatic cancer in a 64-year-old patient sitting with me in the office? The lifetime risk of pancreatic cancer is about 1%. About 90% of these cancers occur in patients over 55. If we are screening for pancreatic cancer, we need to find it early. The 5-year survival rate is 44% for localized pancreatic cancer, 16% for regional disease, and 3% for metastatic disease. So, being generous, let’s say we have a year to find early pancreatic cancer in our 64-year-old man.3 The sensitivity of the Galleri test for stage 1 pancreatic cancer is 61.9%. If we assume that the 90% of pancreatic cancers presenting after 55 are spread evenly over the 30 years between ages 55 and 85, that is 0.9% of lifetime risk over 30 years, so our one-year “pretest probability” is 0.9%/30, or 0.03%.4
Working through the math (prevalence 0.03%, sensitivity 61.9%, specificity 99.5%), this means our patient’s likelihood of having pancreatic cancer after a positive test is only 3.58%. For our patient, we have caused anxiety and the need for an MRI. You almost hope to find pancreatic cancer at this point to be able to say, “Well, it was all worth it.” If the MRI or ERCP is negative, the patient will live with fear and constant monitoring. (You will have to wait until next week to consider with me the impact of this test if we were to deploy it widely.)
If the evaluation is positive, and you have managed to diagnose asymptomatic pancreatic cancer, the likelihood of survival is probably, at best, 50%.
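For readers who want to check the arithmetic, here is a minimal Python sketch of the post-test probability worked out above, using Bayes' theorem with the prevalence, sensitivity, and specificity quoted in the text. The function and variable names are my own, chosen for illustration; nothing here comes from Grail or from the paper.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Probability of disease given a positive test (Bayes' theorem)."""
    # Fraction of the screened population that is diseased AND tests positive.
    true_positive_rate = prevalence * sensitivity
    # Fraction that is disease-free but tests positive anyway.
    false_positive_rate = (1.0 - prevalence) * (1.0 - specificity)
    return true_positive_rate / (true_positive_rate + false_positive_rate)


# Figures quoted in the text (the prevalence is the rounded 0.9% / 30 years).
pretest_probability = 0.0003  # 0.03% one-year pretest probability
sensitivity = 0.619           # stage 1 pancreatic cancer
specificity = 0.995

ppv = positive_predictive_value(pretest_probability, sensitivity, specificity)
print(f"Probability of pancreatic cancer after a positive test: {ppv:.2%}")
```

Changing `pretest_probability` shows how strongly the answer depends on prevalence: with a disease this rare, even a 99.5% specific test produces mostly false positives.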
Let’s end this week with two thoughts. First, the data for the Galleri test is not good, yet. The test characteristics are certainly not those we would like to see for a screening test. Even more importantly, good test characteristics are just the start. To know that a test is worthwhile, you would like to know that it does more good than harm. This has not even been tested. The WSJ article scoffs at the idea that we would want this data.5
Second, I’ve been talking about this test at the level of one doctor and one patient. Consider the societal cost of this test. What if we spend the $1000 this test costs on every person, every year, between the ages of 50 and 80? We then work up every positive test. If pancreatic cancer is a guide, about 96% of those workups will cause anxiety and the need for further monitoring without benefit.
It is a good thing we don’t have anything else to spend our healthcare dollars on.
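The societal arithmetic above can be sketched the same way. This is a rough illustration only: the cohort of 100,000 people screened for one year is my assumption, as are the function and variable names, and it counts only pancreatic cancer using the figures quoted earlier (prevalence 0.03%, sensitivity 61.9%, specificity 99.5%, $1000 per test). Workup costs are ignored entirely.

```python
def screening_summary(cohort_size, cost_per_test, prevalence, sensitivity, specificity):
    """Expected test cost and positive results for one round of screening."""
    with_disease = cohort_size * prevalence
    without_disease = cohort_size - with_disease
    true_positives = with_disease * sensitivity
    false_positives = without_disease * (1.0 - specificity)
    return {
        "testing_cost": cohort_size * cost_per_test,
        "true_positives": true_positives,
        "false_positives": false_positives,
        # Share of positive tests whose workup will find no pancreatic cancer.
        "false_positive_share": false_positives / (true_positives + false_positives),
    }


summary = screening_summary(
    cohort_size=100_000,  # hypothetical cohort, not a figure from the study
    cost_per_test=1000,   # dollars, as quoted in the text
    prevalence=0.0003,
    sensitivity=0.619,
    specificity=0.995,
)
print(f"Spent on tests alone: ${summary['testing_cost']:,}")
print(f"True positives: {summary['true_positives']:.1f}")
print(f"False positives: {summary['false_positives']:.1f}")
print(f"Positive tests that are false alarms: {summary['false_positive_share']:.1%}")
```

Under these assumptions, $100 million of testing turns up roughly 19 true positives against roughly 500 false positives, which is where the "about 96%" figure comes from.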
Next week: How promising is this test, and what would it mean for medicine if it becomes a cost-effective screening test?
1. This is one of the risks of gaining expertise.
2. There are more recent articles but they do not make Galleri seem any more promising.
3. There is a similarity between cancer screening and the possibility of intelligent life in the universe defined by the Drake equation.
4. If we are thinking of screening for all cancers, rather than just pancreatic cancer, this number will be higher but stick with me for a bit.
5. Ooops, I might have let my irritation show.
Just attended a dinner party where the topic of early cancer screening came up and I was the only one at the table who wasn’t getting invasive “annual” screenings of my colon, breast, and cervix. The look of horror on everyone’s face was as if I had spat in the soup. I brought up the lack of impact on mortality and the horrors of false positives but it all fell on deaf ears. I appreciate you, Dr. Cifu, and your rational and measured approach to the analysis of the potential benefits AND harms for those of us who still have and use critical thinking skills and apply them to our healthcare decisions.
The design section should read “run on total cohort of 4077 patients; 2823 patients had known cancers…”.
A screening test with a sensitivity barely better than a coin flip is patently useless. This test will be perfect for the worried well and those who cater to executive health care. Or maybe as an add-on to chelation for those who are into that sort of thing.
It seems, very roughly speaking and with exceptions that almost break any plausible rule, that the sensitivity might be better for detecting adenocarcinomas? As a DNA-based test, is there any biologic plausibility for this?
Looking forward to next week’s entry to see what the number needed to screen will be in order to save one life. I’m guessing the answer will be along the order of “sh*t-tons”.
And as a cynic, I’d be curious about the financial backers of this outfit making this test. Any makers of MRI scanners bankrolling this thing?