12 Comments
Oct 1, 2023 · Liked by Nicole White

Validation on a holdout sample is often not what it seems. It requires very large samples to have sufficient precision, and it lowers the sample size available for model development. It hides instability due to variable selection and makes it difficult to incorporate geographical and temporal trends into model building. In the majority of cases, strong internal validation through an honest resampling procedure is more accurate and more insightful. “Honest” here means that all supervised learning steps are repeated afresh within each resampling loop. 100 repeats of 10-fold cross-validation or several hundred bootstrap resamples are recommended. For elaboration on these issues see https://hbiostat.org/bbr/reg#internal-vs.-external-model-validation
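
[Editor's note: a minimal sketch of the "honest" resampling idea described above, assuming scikit-learn; the synthetic dataset, the SelectKBest step and the logistic model are illustrative assumptions, not taken from the comment. The point is that variable selection sits inside the pipeline, so it is repeated afresh within every fold of the repeated cross-validation.]

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline

# Placeholder data standing in for a development cohort.
X, y = make_classification(n_samples=500, n_features=30, random_state=0)

# All supervised steps (variable selection + model fitting) live inside the
# pipeline, so they are re-run from scratch within each resampling loop.
model = Pipeline([
    ("select", SelectKBest(f_classif, k=10)),
    ("logit", LogisticRegression(max_iter=1000)),
])

# 100 repeats of 10-fold cross-validation, as suggested in the comment.
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=100, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
print(f"Mean AUC: {scores.mean():.3f} (SD {scores.std():.3f})")
```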

author

Thank you, Frank, for sharing this resource. I agree that there are different approaches to internal validation and that an honest approach to internal validation can be informative when external validation is not feasible.

I think there is a larger issue in the prediction modelling literature about what internal validation means and how it is reported. In many cases, references to "internal validation" may instead be describing split-sample validation or cross-validation without optimism correction. Greater adherence to TRIPOD reporting guidelines would definitely improve how models are appraised by others, but progress has been slow. The more we draw attention to these issues the better (I hope).
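
[Editor's note: for readers unfamiliar with the term, a minimal sketch of what bootstrap optimism correction of an apparent performance estimate can look like, assuming scikit-learn; the synthetic data, the logistic model and the choice of 200 resamples are illustrative assumptions rather than anything prescribed in the post.]

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

def fit_auc(X_fit, y_fit, X_eval, y_eval):
    """Fit a model on one dataset and report its AUC on another."""
    m = LogisticRegression(max_iter=1000).fit(X_fit, y_fit)
    return roc_auc_score(y_eval, m.predict_proba(X_eval)[:, 1])

apparent = fit_auc(X, y, X, y)               # model evaluated on its own data

optimism = []
for _ in range(200):                          # 200 bootstrap resamples
    idx = rng.integers(0, len(y), len(y))     # sample rows with replacement
    Xb, yb = X[idx], y[idx]
    auc_boot = fit_auc(Xb, yb, Xb, yb)        # apparent AUC in bootstrap sample
    auc_orig = fit_auc(Xb, yb, X, y)          # same model tested on original data
    optimism.append(auc_boot - auc_orig)

corrected = apparent - np.mean(optimism)      # subtract average optimism
print(f"Apparent AUC {apparent:.3f}, optimism-corrected AUC {corrected:.3f}")
```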


Extremely good points, Nicole. One of the biggest self-created problems with holdout sample “validation” is that what is called a failure to validate in time or space is merely a lost opportunity to combine all the data so as to be able to model secular and geographical trends.

Sep 30, 2023 · Liked by Nicole White

Thanks for this. A very informative primer.

I’m amazed so many models get published without being vetted with a prospective validation cohort. If you’re only testing your model on the retrospective derivation cohort, you’re asking for self-fulfilling prophecies. Such models would be worse than useless, as they instil unwarranted confidence in a model’s prospective utility.


.

Someday We Will Laugh About All Of This.

I Just Got A Head Start.

.


In other words, it is all pretty useless.


Good discussion of an important subject. I fell out of my chair and broke my left wrist when I read that more than 4,000 new publications in this area are spewed out every year. One must wonder how many of those new articles represent churning by assistant professors eager to "climb the ladder" and some day reach Professor. I *predict* that more than 68.34 percent of the articles fit in that category.

author

I am confident that publication for the sake of publication is a major driver! The sheer number of publications referencing clinical prediction modelling likely reflects a combination of high- and low-quality models, review papers, and "hype" papers discussing the potential of clinical prediction models to transform healthcare. Even so, what concerns me is that finding high-quality models among all this noise is incredibly difficult. Unlike RCTs, which (mostly) use more standard language, descriptions of clinical prediction models are highly variable.


That reminds me of roughly a century of psychological tests/models.

Needless to say, they are almost always bad: unable to capture what they aim to measure, built on vague/unmeasurable KPIs, and prone to bias and alpha/beta (Type I/Type II) errors.

Yet they are employed everywhere: from clinical practice, to HR depts, to marketing and communication, to influencers and entertainment.

Given the quantity of scientific literature produced each day, the number of scandals around it, and its utmost importance for our society, I think we may need a further level of checks covering what current meta-analyses and systematic reviews miss.

That is, scientific entities (perhaps institutionalised) that, based on the highest standards of research, check whether research published in scientific journals is worth any consideration or should be retracted.

This would affect the reputation of both the journals and the authors, while also addressing various other problems (conflicts of interest, publication bias, non-standardised reviews, the h-index, predatory journals, political bias, etc.).

Sep 29, 2023 · Liked by Nicole White

I love that we humans are SO obsessed with prediction! There's no more uncomfortable state for most people than uncertainty. But while high confidence in a poor-quality prediction model might temporarily assuage our anxieties, it ultimately leads to less credibility for those who make the prediction. One suggestion I would make, if possible, is to run your own external validation study before actually using a model that has not been externally validated. Find a grad student in need of a study!
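
[Editor's note: as a rough illustration of that suggestion, a minimal sketch of checking a published model against local data before adopting it; the intercept, coefficients and placeholder cohort below are entirely hypothetical, standing in for values reported elsewhere.]

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)

# Local validation cohort (placeholder data: two predictors, binary outcome).
X_local = rng.normal(size=(300, 2))
y_local = rng.binomial(1, 1 / (1 + np.exp(-(0.5 * X_local[:, 0] - 0.2))))

# "Published" logistic model with fixed, previously reported values (hypothetical).
intercept, coefs = -0.3, np.array([0.6, 0.1])
lp = intercept + X_local @ coefs              # linear predictor
p_hat = 1 / (1 + np.exp(-lp))                 # predicted risk

# Discrimination in the local cohort.
print(f"External AUC: {roc_auc_score(y_local, p_hat):.3f}")

# Crude calibration check: mean predicted vs observed risk.
print(f"Mean predicted {p_hat.mean():.3f} vs observed {y_local.mean():.3f}")
```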

Sep 29, 2023 · Liked by Nicole White

Comment deleted
author

I'd be keen to see some examples of low-quality published climate models!

Sep 29, 2023 · Liked by Nicole White

And we learn that clouds and cloud dynamics are excluded because they are too hard to model, yet clouds are a major factor in climate.
