12 Comments
David AuBuchon:

An excellent and very ambitious idea. I've had a related idea, but by starting a mega-publisher with raw-data reporting requirements, all held on the backend. I think AI is more hypothesis-generating, with human checking as its companion at every step, though someday I imagine it could outperform human researchers. It could also propose the most impactful hypotheses on its own, without prompting, and keep a running list of them. These hypotheses would also be posted to a kind of research job board on the platform. Proposals get developed and escalated by researchers and the crowd until someone actually claims one and does it. The estimated impactfulness (or "high-yield-ness") of each hypothesis gets updated in real time.

medstudent:

Having a database for all to query is of course a very sensible idea... amazing it has not been done; serious incompetence over there.

Ernest N. Curtis:

When does Sensible Medicine become unsensible? When it publishes articles promoting that fictional entity called "public health". Advocates for "public health" like to claim at least partial credit for the steady improvement in longevity and for helping to eliminate infectious disease epidemics. The reality is that improvements in the standard of living account for practically all of the advances in longevity, and the so-called infectious epidemics mostly tailed off to very low levels well before the introduction of vaccines; antibiotics also played a significant role. While many accepted the presence of the public health bureaucracy as an inevitable government boondoggle that was inconvenient but probably did no major harm, this rosy view is no longer possible. The covid episode revealed the very real harm that these collectivist bureaucrats can cause for a major portion of the populace. Departments of public health should be abolished at all levels: local, state, and federal.

Stan W:

AI is a tool, not a solution to the current deficiencies in our public health and research programs, which are manifold. One of the largest problems, particularly in the context of applying AI, is the quality of the available data, much of which is poor and unreliable. Combine this with the tendency of AI to confabulate when posed difficult questions or tasks, and you may well create a massive GIGO (garbage in, garbage out) problem absent first addressing the underlying issues of data quality.

AM Schimberg:

Even this might be an avenue of discovery for a properly trained and programmed AI. AI could assess the quality of the studies and rank the quality of the data before using it in its evaluations.
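As a hypothetical illustration of that ranking idea (every name and weight below is invented for the sketch, not taken from any real system), an AI pipeline might down-weight evidence from weaker study designs before pooling results:

```python
# Minimal sketch, assuming an evidence hierarchy where stronger study
# designs get higher weights. The designs and weights here are
# illustrative assumptions, not an established grading scheme.
DESIGN_WEIGHTS = {
    "rct": 1.0,          # randomized controlled trial
    "cohort": 0.5,       # observational cohort study
    "case_control": 0.3,
    "case_series": 0.1,
}

def weighted_effect(studies):
    """Pool reported effect sizes, down-weighting weaker designs.

    `studies` is a list of (design, effect_size) tuples. Returns the
    design-weighted mean effect, or None if no usable input remains.
    """
    total_weight = 0.0
    weighted_sum = 0.0
    for design, effect in studies:
        w = DESIGN_WEIGHTS.get(design, 0.0)  # unknown designs get weight 0
        total_weight += w
        weighted_sum += w * effect
    return weighted_sum / total_weight if total_weight else None

# A single RCT pulls the pooled estimate toward its result,
# even when weaker studies report much larger effects.
studies = [("rct", 0.10), ("cohort", 0.40), ("case_series", 0.90)]
print(round(weighted_effect(studies), 3))  # prints 0.244
```

The design choice this illustrates: rather than discarding weak studies outright, a graded weighting lets them contribute a little while strong designs dominate, which is one plausible way to encode "rank the quality of the data before using it".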

Stan W:

And who is going to “properly train and program” the AI in question, and on what basis? The available data are a complete mess. Just one of many examples: in-vogue, but incorrect, results being endlessly “replicated” in the literature while contrary, correct findings are either never submitted for publication or rejected. This ‘crowd’ bias, combined with the perverse incentives that dominate both the funding of basic and clinical medical science and publication, is a major source of the pervasive unreliability of the existing data. How might AI be used to address these issues before the available data are actually used to guide future research and medical care?

Jim Ryser:

After the opiate debacle I will never trust public health again. “Public health” is the HR of medicine. You would THINK HR is for the employee…you would THINK public health is for the public! Somebody is benefitting from public health but it sure ain’t the public!

JohnS:

One of the challenges of AI is to make it honest. If the AI engine is given access to all the data and trained properly to weight the quality of the data – ignore observational studies when RCTs are available – the AI is going to expose some uncomfortable truths. We already see this with Cochrane reviews, but they can be brushed aside more easily than a publicly funded AI. Many powerful interests will object and claim that the AI is hallucinating. To pull this off will require strong and honest leaders. AI can be easily biased with supervised training.

I have a bit more confidence in the private sector; I assume everything public health officials do is compromised. I think it’s a matter of time before the AI engines get access to all the data, including data behind paywalls. I love Perplexity but get annoyed with some of its biases. I think the biases will decrease with more high-quality data, so long as there are no interventions when someone gets upset with the answers.

Laura Coleman:

Since this is such a knowledgeable forum, can you speak to the availability of data from clinical trials, public universities, medical schools, and public hospitals like MD Anderson? In addition, I would think health insurance and pharmacy data should be available as long as they are masked (no names) but all other demographic, diagnosis, treatment, and outcome information is left intact.

I would also like to see NIH/FDA/CDC research study standards established so that data across all sources are comparable and useful. I’m old enough to know the computer programming adage “garbage in, garbage out,” which also applies to research and AI. We definitely need to move away from a for-profit-only model and be more inclusive of natural/alternative healthcare solutions and treatments. This is why MD Anderson will never make cancer history, and why we and our children have been for-profit lab rats for vaccines, toxic metal adjuvants, toxic pesticides, and the epidemic rise in autism and Alzheimer’s. Too many government leaders, researchers, and physicians are on the drug manufacturers’ payroll.

Dr. K:

John, interesting piece, most of which I heartily endorse. [HIPAA (about which I testified to Congress and was ignored) has to be one of the worst pieces of legislation ever. But no one cares, even though it has done an enormous disservice to most axes of health care.]

But there is a further issue in what you propose -- the data are infinitely "bad": dirty, duplicative in non-intuitive ways, out of sequence, and written to an infinitude of quasi-standards that are not. To provide useful analyses, the AI used would have to do two things: 1) make all information be about individuals (not groups/populations), because you cannot care for a population and because any conclusion MUST at some point be applied to an individual to be useful (and, as has been demonstrated many times, conflating individuals is risky and sometimes literally fatal); and 2) UNDERSTAND the data, so that the answers to the questions you posed as examples have a chance at being correct. Neither of these criteria can be met by generative AI, much as many folks wish they could be. Agents are just the latest sop (after RAG) to try to paint lipstick on a pig not suited for purpose.

This is the foundational deception of generative AI -- it is meant to look intelligent, but it is actually solely a correlation engine, NOT an understanding engine. The correlations are deceptively good (and I use many of these engines every day) but entirely meaningless in terms of understanding. It is trivial to invoke nonsense hallucinations from the best engines because: 1) they are foundationally probabilistic (and this cannot be fixed), and 2) they rely on traversing training sets which are themselves filled with misguided rhetoric that the engine "copies" because, on some traverses through the network, that is the path it takes. As a great example of the nonsense that ensues, here is a "coding" AI that refused to code, telling the requestor that "they would not learn anything unless they coded it themselves": https://arstechnica.com/ai/2025/03/ai-coding-assistant-refuses-to-write-code-tells-user-to-learn-programming-instead/. These kinds of situations are very, very common and are virtually impossible to reduce below 10% (and are usually far worse).

DARPA (love them or hate them, they have a mountain of expertise) notes that there are three generations of AI -- generative is the second generation, characterized by being "statistically impressive but individually unreliable". (https://machinelearning.technicacuriosa.com/2017/03/19/a-darpa-perspective-on-artificial-intelligence/). The often forgotten issue in health care is that it is 100% about individuals -- one cannot care for a population...only people.

The reason this is true is because correlations are impressive until they are not -- and any application of a particular correlation to a particular case is likely to become wrong (thus hallucinations and their ilk). This is why it is unsafe to use any generative AI UNLESS YOU ALREADY KNOW THE ANSWER to the question you are asking. (Ask any of the lawyers who wrote and submitted briefs that way.)

So getting answers from almost unusable data using probability-responsive tensors that cannot be "run in reverse" to check what they did is likely to give bad results -- and in any case, results not applicable to any patient. I expect the results would be interesting, but with so many potential errors and holes that their use for anything other than non-scholarly articles would be worrisome.

DARPA posits a third wave of AI -- one that is mostly still not extant. (This is in many ways the foundation of the original Web 3.0 which is why we have not seen that, either.) This is context adaptation, often referred to as "Cognitive AI" -- that is, AI that UNDERSTANDS. This is particularly important in fields like health care where the unit of measure is an n-of-1...you cannot treat a population (statistics) -- only an individual (as vonEye pointed out years ago).

What you are proposing could have important results -- if a different framework (not the current deep learning/LLM framework) is adopted. One needs a framework that is deterministic and based on empiric truths -- not probabilities tweaked up front with RAG and in the back by agents. Some of us have been working in this Cognitive AI space and the results are more than interesting. And completely different from just dropping the "health data" into a generative AI model and hoping for the best. (We did some of the original research with IBM using Watson for health... fascinating results, utterly consonant with the above. We predicted exactly what happened at MD Anderson.)

This is a decent general-reader article on why the current generative AI tools cannot deliver what needs to be delivered at the level one would wish: (https://medium.com/data-science/the-rise-of-cognitive-ai-a29d2b724ccc). The author has also published another piece on why agents cannot do it either for other unexpected reasons...lol. (https://towardsdatascience.com/the-urgent-need-for-intrinsic-alignment-technologies-for-responsible-agentic-ai/.) Every look at the current solution stack cries out for a deterministic, description logics (or other equivalent), curated approach to actually making real sense of the mountains of data out there.

So loved your piece, but we already see people dumping their stuff into various engines where the results will be specious at best. We can do better and this is the time to start out right.

Jim Ryser:

The only people I have EVER seen benefit from HIPAA are the frequent flyer drug addicts who bounce from hospital to hospital.

KTonCapeCod:

I hope your dreams come true. To think we are asked to collect data on "gender" and how much money I make and my education level, and we can't even use our required EMRs for useful data and health-related outcomes gathering. But at least you will know I am a white woman aged 50-60 who has a doctoral degree and makes 100k to 150k. Yet you have no idea how to use my data to find out why my back pain takes 8 visits of care at clinic X versus a similar patient who needs 20 visits. Ugh. So. Absolutely. Dumb. And that isn't even life-saving information! We don't have those questions answered either!
