Discussion about this post

Dr. K

John, interesting piece, most of which I heartily endorse. [HIPAA (about which I testified to Congress and was ignored) has to be one of the worst pieces of legislation ever. But no one cares, even though it has done an enormous disservice to nearly every part of health care.]

But there is a further issue in what you propose -- the data is almost infinitely "bad": dirty, duplicative in non-intuitive ways, out of sequence, and written to an infinitude of quasi-standards that are not standard at all. To provide useful analyses, the AI would have to do two things: 1) make all information about individuals (not groups/populations), because you cannot care for a population, and because any conclusion must at some point be applied to an individual to be useful (and, as has been demonstrated many times, conflating individuals is risky and sometimes literally fatal); and 2) UNDERSTAND the data, so that the answers to the questions you posed as examples have a chance at being correct. Neither of these criteria can be met by generative AI, much as many folks wish they could be. Agents are just the latest sop (after RAG) to try to paint lipstick on a pig not suited for purpose.

This is the foundational deception of generative AI -- it is meant to look intelligent, but it is actually a correlation engine, NOT an understanding engine. The correlations are deceptively good (and I use many of these engines every day) but entirely meaningless in terms of understanding. It is trivial to invoke nonsense hallucinations from the best engines because: 1) they are foundationally probabilistic (and this cannot be fixed) and 2) they rely on traversing training sets that are themselves filled with misguided rhetoric, which the engine "copies" because on some traverses through the network that is the path it takes. As a great example of the nonsense that ensues, here is a "coding" AI that refused to code, telling the requester that they "would not learn anything" unless they coded it themselves: https://arstechnica.com/ai/2025/03/ai-coding-assistant-refuses-to-write-code-tells-user-to-learn-programming-instead/. These kinds of failures are very, very common, and their rates are virtually impossible to reduce below 10% (and are usually far worse).
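To make the "foundationally probabilistic" point concrete, here is a toy sketch in plain Python -- the candidate tokens and scores are made up, standing in for no particular model -- of how temperature-based next-token sampling works. The same input produces a distribution, not an answer:

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Convert raw scores into a probability distribution."""
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample(probs, rng):
    """Draw one index proportionally to its probability."""
    r, cum = rng.random(), 0.0
    for i, p in enumerate(probs):
        cum += p
        if r < cum:
            return i
    return len(probs) - 1

# Hypothetical next-token candidates after some clinical prompt --
# these scores are invented for illustration only.
candidates = ["diabetes", "hypertension", "anemia", "unremarkable"]
logits = [2.0, 1.5, 1.0, 0.2]

probs = softmax(logits)
rng = random.Random()
draws = [candidates[sample(probs, rng)] for _ in range(10)]

# Identical input, yet the drawn continuations typically vary from
# run to run: the most probable token is merely likely, never guaranteed.
```

The point of the sketch: even the best-scored continuation is only a weighted coin flip away from being replaced by a worse one.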

DARPA (love them or hate them, they have a mountain of expertise) notes that there are three generations of AI -- generative AI is the second generation, characterized as "statistically impressive but individually unreliable" (https://machinelearning.technicacuriosa.com/2017/03/19/a-darpa-perspective-on-artificial-intelligence/). The often-forgotten issue in health care is that it is 100% about individuals -- one cannot care for a population, only people.

The reason this is true is that correlations are impressive until they are not -- and the application of a particular correlation to a particular case can easily be wrong (hence hallucinations and their ilk). This is why it is unsafe to use any generative AI UNLESS YOU ALREADY KNOW THE ANSWER to the question you are asking. (Ask any of the lawyers who wrote and submitted briefs that way.)

So getting answers from almost-unusable data using probability-responsive tensors that cannot be "run in reverse" to check what they did is likely to give bad results -- and in any case, results not applicable to any individual patient. I expect the results would be interesting, but with so many potential errors and holes that their use for anything other than non-scholarly articles would be worrisome.
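On "cannot be run in reverse": even a single toy layer illustrates the problem. This minimal sketch (one made-up weight matrix, not any real network) shows distinct inputs collapsing to identical outputs, so nothing downstream can recover which input produced the result:

```python
import numpy as np

# A single ReLU layer: y = max(0, W @ x).
# The weights below are invented purely for illustration.
W = np.array([[1.0, -1.0]])

def layer(x):
    """One feedforward layer with a ReLU activation."""
    return np.maximum(0.0, W @ x)

a = np.array([0.0, 1.0])   # pre-activation -1 -> output 0
b = np.array([0.0, 5.0])   # pre-activation -5 -> output 0

# Both inputs yield the same output, [0.]: the mapping is many-to-one,
# so the computation cannot be inverted to see "what the network did".
```

Stack millions of such layers and the audit trail from output back to cause is gone for good.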

DARPA posits a third wave of AI -- one that mostly does not yet exist. (This is in many ways the foundation of the original Web 3.0, which is why we have not seen that, either.) This is context adaptation, often referred to as "Cognitive AI" -- that is, AI that UNDERSTANDS. This is particularly important in fields like health care, where the unit of measure is an n-of-1: you cannot treat a population (statistics) -- only an individual (as von Eye pointed out years ago).

What you are proposing could have important results -- if a different framework (not the current deep learning/LLM framework) is adopted. One needs a framework that is deterministic and based on empiric truths -- not probabilities tweaked up front with RAG and at the back end by agents. Some of us have been working in this Cognitive AI space, and the results are more than interesting -- and completely different from just dropping the "health data" into a generative AI model and hoping for the best. (We did some of the original research with IBM using Watson for health...fascinating results, utterly consonant with the above. We predicted exactly what happened at MD Anderson.)

This is a decent general-reader article on why the current generative AI tools cannot deliver what needs to be delivered at the level one would wish: https://medium.com/data-science/the-rise-of-cognitive-ai-a29d2b724ccc. The author has also published another piece on why agents cannot do it either, for other unexpected reasons: https://towardsdatascience.com/the-urgent-need-for-intrinsic-alignment-technologies-for-responsible-agentic-ai/. Every look at the current solution stack cries out for a deterministic, description-logics (or other equivalent), curated approach to actually making real sense of the mountains of data out there.
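For contrast with probabilistic generation, here is a toy sketch of what a deterministic, curated approach looks like -- the facts and the drug-interaction rule below are entirely hypothetical, not a real knowledge base -- where the same inputs always yield the same conclusions:

```python
# Curated facts as (subject, predicate, object) triples -- invented for
# illustration; a real system would use a vetted clinical knowledge base.
facts = {
    ("patient_001", "takes", "warfarin"),
    ("patient_001", "takes", "aspirin"),
}

def apply_rules(facts):
    """Forward-chain one hypothetical rule:
    warfarin + aspirin together -> flag a bleeding risk."""
    derived = set(facts)
    for subj in {s for s, _, _ in facts}:
        meds = {o for s, p, o in facts if s == subj and p == "takes"}
        if {"warfarin", "aspirin"} <= meds:
            derived.add((subj, "flag", "bleeding_risk"))
    return derived

result = apply_rules(facts)
# The flag is derived every single run -- no sampling, no temperature,
# and every conclusion traces back to explicit facts and an explicit rule.
```

The design point: each inference is auditable -- you can point at the exact rule and the exact facts that produced it, which is precisely what a tensor traversal cannot offer.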

So loved your piece, but we already see people dumping their stuff into various engines where the results will be specious at best. We can do better and this is the time to start out right.

KTonCapeCod

I hope your dreams come true. To think we are asked to collect data on "gender," how much money I make, and my education level, and yet we can't even use our required EMRs for useful data and health-related outcome gathering. But at least you will know I am a white woman aged 50-60 who has a doctoral degree and makes $100k to $150k. Yet you have no idea how to use my data to find out why my back pain takes 8 visits of care at clinic X versus a similar patient who needs 20 visits. Ugh. So. Absolutely. Dumb. And that isn't even life-saving information! We don't have those questions answered either!
