Stadiums packed with working-class Americans cheering for public health. Political leaders calling for war on our chronic disease crisis. An incoming NIH Director, Jay Bhattacharya, M.D., Ph.D., committed to reform. Had someone painted this picture when I was a graduate student at the Johns Hopkins Bloomberg School of Public Health, where the motto is “saving lives, millions at a time,” it would have sounded like a dream come true. But today, many public health professionals don’t see cause for celebration. Instead, they see uncertainty, with science and public health systems struggling to regain trust.
I remain optimistic because change means rejecting the status quo and accepting the potential for progress. The status quo has failed us. For the first time, U.S. life expectancy is declining—a glaring indictment of our public health effectiveness. So how will Dr. Bhattacharya respond? While calls for change have historically garnered rhetorical attention, now there is an unprecedented opportunity for real transformation.
NIH holds a powerful advantage—resources, data, and artificial intelligence (AI)—with the potential not just to reimagine but genuinely revolutionize scientific research. Leveraged properly, AI-driven data may significantly enhance NIH’s effectiveness, accelerate discoveries, and drive innovative breakthroughs. The critical question is clear: how exactly should NIH harness AI to achieve these ambitious goals?
Unlock Our Public Health Data
NIH has supported enormous data resources – decades of clinical trials, long-term population studies, genomic databases, and more. Yet this gold mine is mostly fool’s gold. An estimated 97% of health data is unused—a staggering waste of potential. The little data utilized for research rarely reaches its full potential: a systematic review of medical journals found that only 0.6% of studies made their data public. Even large, permanent public datasets are underused. The Youth Risk Behavior Surveillance System, for instance, provides essential, nationally representative insights into the behavioral and lifestyle health of children but is at risk of being discontinued because the data are underutilized. (The problem extends to other agencies, like the FDA, with its treasure trove of data, including millions of clinical trial encounters and clinical research forms that have almost never been reanalyzed.)
When data are used, most analyses are not reproducible. In 2024, the NIH Office of Strategic Coordination launched the Replication to Enhance Research Impact Initiative, a pilot program that aims to support the replication of NIH-funded studies. But despite reproducibility being a scientific necessity, the initiative will receive only a minuscule share of NIH funding and relies on a costly, contrived method for replication: the original investigators hire clinical research organizations to replicate their study under their direct supervision. The goal of replication is for any scientist, anywhere, to be able to understand and implement a study design using public data—a task I took on myself as a student, when replicating studies was one of my passion projects. In one case, I found that not only did the original assertions of an American Journal of Public Health study fail to hold, but the opposite conclusion was true!
One doesn’t need a long memory to see the harms of data inaccessibility and poor reproducibility. During COVID-19, billions of dollars went toward tracking limited outcomes like mortality rates, while crucial issues such as mental health, addiction, and chronic diseases were overlooked because fragmented data systems made simply assembling and analyzing data a herculean effort. The result was an incomplete response rather than a holistic, data-driven strategy. Dr. Bhattacharya’s appointment was made in part to prevent a repeat scenario.
AI Agents: The Key to Secure, Scalable Data Analytics
While it is tempting to blame researchers for hoarding data to protect their career interests—after all, data-sharing advocates were not long ago dismissed as “data parasites”—the larger obstacle is logistical. Sharing data that include sensitive patient health information exposes researchers and institutions to significant legal and financial risks. For example, the Feinstein Institute for Medical Research paid $3.9 million to settle HIPAA violations after being accused of mishandling a laptop containing protected health information. Recent breakthroughs in semi-autonomous AI agents—systems that mimic human behavior by observing their environment, planning, acting, and re-evaluating their actions based on initial instructions, like planning and booking a trip—provide new opportunities for NIH research. What if AI agents specialized in the technical aspects of data storage, data access, and data analytics could be deployed across the nation’s entire ecosystem of health data?
Instead of granting direct data access, AI-driven secure data enclaves would allow researchers to run complex analyses on sensitive datasets without exposing raw patient information. Here is how they work: the AI system is composed of semi-autonomous agents, each specialized in one piece of the research process. In a typical agentic workflow, a user (who could be a scientist, clinician, or even a patient advocate) poses a research question in plain English—for example, “Are women more likely than men to develop Long COVID?” The AI agents start to work: one agent, iterating with the user, refines the research question and designs a study; another queries the relevant data (public health records, clinical data, etc.); another cleans and organizes the data; a modeling agent runs the statistical analysis; yet another agent interprets the results and generates a readable report. In the end, the user gets an answer – with graphs and explanations – as if a whole team of researchers had toiled for months (yes, the average time to complete a research report, even in emergency situations, is 11 months), when in fact an AI orchestra performed the work in the background within minutes. The result: participating in research would be as easy as asking a question.
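To make the workflow concrete, here is a minimal sketch in Python. Every agent name, the toy dataset, and the statistics are hypothetical illustrations, not a real NIH system: each “agent” is a function handling one step, and an orchestrator chains them together just as the paragraph above describes.

```python
# Hypothetical sketch of an agentic research workflow: query -> clean ->
# model -> report. All names and data are illustrative, not a real system.
from dataclasses import dataclass

@dataclass
class Record:
    sex: str
    long_covid: bool

# Toy de-identified records the query agent would fetch in practice.
TOY_DATA = [
    Record("F", True), Record("F", True), Record("F", False),
    Record("M", True), Record("M", False), Record("M", False),
]

def query_agent(question: str) -> list:
    """Fetch the records relevant to the question (toy version)."""
    return TOY_DATA

def cleaning_agent(records):
    """Drop malformed records before analysis."""
    return [r for r in records if r.sex in ("F", "M")]

def modeling_agent(records):
    """Compare Long COVID rates by sex using simple proportions."""
    def rate(sex):
        group = [r for r in records if r.sex == sex]
        return sum(r.long_covid for r in group) / len(group)
    return {"F": rate("F"), "M": rate("M")}

def reporting_agent(results) -> str:
    """Turn model output into a readable answer."""
    higher = max(results, key=results.get)
    who = "women" if higher == "F" else "men"
    return (f"Long COVID rate: women {results['F']:.0%}, "
            f"men {results['M']:.0%}; higher among {who}.")

def orchestrate(question: str) -> str:
    """Chain the agents end to end, as the orchestrator would."""
    records = cleaning_agent(query_agent(question))
    return reporting_agent(modeling_agent(records))

print(orchestrate("Are women more likely than men to develop Long COVID?"))
```

In a production system, each function would be a large-language-model-driven agent with access to real data enclaves; the point of the sketch is only the pipeline shape: a plain-English question in, a readable report out.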
Past failures to integrate cutting-edge technology offer a cautionary tale for NIH. Take electronic health records (EHRs): a decade ago, they were touted as the 21st-century upgrade to medical practice, and billions were spent rolling out EHR systems nationwide. Yet ask any doctor today about their EHR, and you’re more likely to get an eye-roll than praise. We transitioned from paper charts to digital, but we failed to translate that digital data into smarter care. NIH risks squandering the potential of the AI age of abundance, just as EHRs did in clinical practice. It’s time to use AI to truly transform NIH systems, not just install it on top of old habits.
While I am often skeptical of AI, AI agents have already been deployed on health data to empower research, demonstrating their real-world feasibility. How, then, might NIH deploy such technology?
NIH should invest in a centralized public data warehouse.
This warehouse can then be accessed by AI agents to run analyses, ensuring that no personally identifiable health information is ever disclosed.
Researchers can query these agents to answer research questions.
Researchers receive full, unrestricted access to analytical outputs, gaining actionable insights without compromising the privacy of research subjects.
The entire analytical process is carried out in public and archived. The results of NIH research can be replicated with a single click and extended or generalized across diverse populations with written prompts, no coding required.
NIH gains greater insight into the types of questions being investigated and can therefore provide greater stewardship.
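The access pattern behind these steps can be sketched in a few lines. This is a hypothetical toy, assuming a made-up `SecureEnclave` class and fake records: researchers submit queries and receive only aggregate outputs, raw rows never leave the enclave, and every query is archived for replication.

```python
# Hypothetical sketch of a secure data enclave: queries return aggregates
# only, and every analysis is logged. Class, data, and labels are invented.
from typing import Callable

RAW_RECORDS = [  # toy de-identified rows held inside the enclave
    {"age": 34, "long_covid": True},
    {"age": 51, "long_covid": False},
    {"age": 29, "long_covid": True},
    {"age": 62, "long_covid": False},
    {"age": 45, "long_covid": False},
]

class SecureEnclave:
    """Answers research queries with aggregates; raw rows never leave."""

    def __init__(self, records):
        self._records = records   # private to the enclave
        self.audit_log = []       # public archive of every analysis run

    def aggregate(self, label: str, predicate: Callable[[dict], bool]):
        """Return only a count and proportion, never raw records."""
        n = len(self._records)
        k = sum(1 for r in self._records if predicate(r))
        self.audit_log.append(label)  # archived so anyone can replicate
        return {"n": n, "count": k, "proportion": round(k / n, 2)}

enclave = SecureEnclave(RAW_RECORDS)
result = enclave.aggregate("long_covid_rate", lambda r: r["long_covid"])
print(result)   # aggregate output only: {'n': 5, 'count': 2, 'proportion': 0.4}
```

A real deployment would add safeguards this sketch omits, such as small-cell suppression and formal disclosure review, but the contract is the same: questions in, privacy-preserving aggregates out, with a public audit trail.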
From Reactive to Proactive
The implications for NIH (and potentially the FDA and CDC) are profound. An NIH that embraces AI-driven, agentic workflows could open its doors to a much broader range of contributors and stakeholders. Instead of research being the domain of a select few, we’d see more clinicians, nurses, patients, and citizen scientists collaborating in discoveries. Such democratization not only accelerates innovation (more minds, more ideas), but also rebuilds public trust in science. At a time when mistrust and misinformation are rampant, giving communities the tools to analyze data for themselves makes research participatory. People become partners in knowledge-generation, not just passive subjects of study.
Perhaps the starkest change AI can bring to NIH is a shift from a reactive research model to a proactive one. Traditionally, NIH (and science in general) identifies priorities by looking at historical burdens of disease or by following the leads of existing research. This often means we’re chasing yesterday’s problems, perpetually one step behind, as evidenced by a laundry list of public health crises that were detected and responded to only years after they got out of control (e.g., opioid addiction, the rise of vaping cannabis or nicotine, worsening mental health and increased suicide). NIH, working with other government agencies, could stand up a real-time data mission control center covering a range of lifestyle and chronic disease measures and use these data to implement a scientific evaluation of the Make America Healthy Again agenda. Researchers have in rare cases demonstrated the potential of this proactive approach, as I have done in tracking demand for HIV testing, increases in suicidal ideation, and, recently, rising concerns about gambling addiction, but such work relies on limited data and still detects these important events after significant delays. A robust real-time data infrastructure powered by AI agents could make powerful proactive research the norm.
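The core of such a mission control center is simple anomaly detection over streaming health signals. The sketch below, with invented weekly counts and an arbitrary threshold, flags any week that jumps well above its recent baseline; a real system would run this continuously across many measures.

```python
# Hypothetical sketch of proactive surveillance: flag weeks where a
# health signal spikes above its rolling baseline. Data are invented.
from statistics import mean, stdev

def flag_spikes(series, window=4, z_threshold=2.0):
    """Return indices whose value exceeds the mean of the preceding
    `window` points by more than `z_threshold` standard deviations."""
    alerts = []
    for i in range(window, len(series)):
        baseline = series[i - window:i]
        mu, sigma = mean(baseline), stdev(baseline)
        if sigma > 0 and (series[i] - mu) / sigma > z_threshold:
            alerts.append(i)
    return alerts

# Toy weekly counts for a signal such as searches for "HIV test":
weekly_counts = [100, 98, 103, 101, 99, 102, 100, 180, 104]
print(flag_spikes(weekly_counts))  # the jump to 180 at index 7 is flagged
```

Production surveillance systems use far more sophisticated methods (seasonal adjustment, change-point models), but the payoff is the same: the anomaly surfaces in the week it happens, not years later.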
Ultimately, this isn't just about harnessing technology; it's about harnessing the full potential of NIH research. By investing in AI-driven data initiatives, NIH can empower millions of scientists, clinicians, and communities to innovate more rapidly and collaborate more effectively. Broader, secure data access through AI agents will ensure more reproducible and generalizable research, the gold standard of science. If we act now, the future won't just see improvements—it will mark the dawn of a new era in public health, one where data-driven insights and proactive interventions mean our collective goal of "saving lives" with public health isn't just a motto, but a reality.
John W. Ayers, PhD, MA, is a Johns Hopkins- and Harvard-trained computational epidemiologist known for his work on emerging technologies and public health, including the most-read medical research study in the world during 2023, which described using AI for healthcare. He is the Vice Chief of Innovation (Division of Infectious Disease) and Head of AI (Altman Clinical and Translational Research Institute) at UC San Diego Medicine. He is also the Head of Strategy for Medeloop.
John, Interesting piece, most of which I heartily endorse. [HIPAA (about which I testified to Congress and was ignored) has to be one of the worst pieces of legislation ever. But no one cares even though it has done enormous disservice to most axes in health care.]
But there is a further issue in what you propose -- the data is infinitely "bad" -- dirty, duplicative in non-intuitive ways, out of sequence, and written to an infinitude of quasi-standards that are not. To provide useful analyses, the AI used would have to be able to do two things: 1) make all information be about individuals (not groups/populations), because you cannot care for a population and because any conclusion at some point MUST be applied to an individual to be useful (and as has been demonstrated many times, conflating individuals is risky and sometimes, literally, fatal), and 2) UNDERSTAND the data so that the answers to questions you posed as examples have a chance at being correct. Neither of these criteria can be met by generative AI, much as many folks wish they could be. Agents are just the latest sop (after RAG) to try to paint lipstick on a pig not suited for purpose.
This is the foundational deception of generative AI -- it is meant to look intelligent, but actually is solely a correlation engine, NOT an understanding engine. The correlations are deceptively good (and I use many of these engines every day) but entirely meaningless in terms of understanding. It is trivial to invoke nonsense hallucinations from the best engines because: 1) they are foundationally probabilistic (and this cannot be fixed) and 2) they rely on traversing training sets which themselves are filled with misguided rhetoric that the engine "copies" because on some traverses through the network that is the path it takes. As a great example of the nonsense that ensues, here is a "coding" AI that refused to code, telling the requestor that "they would not learn anything unless they coded it themselves". https://arstechnica.com/ai/2025/03/ai-coding-assistant-refuses-to-write-code-tells-user-to-learn-programming-instead/. These kinds of situations are very, very common and are virtually impossible to reduce below 10% (and are usually far worse).
DARPA (love them or hate them, they have a mountain of expertise) notes that there are three generations of AI -- generative is the second generation, characterized by being "statistically impressive but individually unreliable". (https://machinelearning.technicacuriosa.com/2017/03/19/a-darpa-perspective-on-artificial-intelligence/). The often forgotten issue in health care is that it is 100% about individuals -- one cannot care for a population...only people.
The reason this is true is because correlations are impressive until they are not -- and any application of a particular correlation to a particular case is likely to become wrong (thus hallucinations and their ilk). This is why it is unsafe to use any generative AI UNLESS YOU ALREADY KNOW THE ANSWER to the question you are asking. (Ask any of the lawyers who wrote and submitted briefs that way.)
So getting answers from almost unusable data using probability-responsive tensors that cannot be "run in reverse" to check what they did is likely to give bad results -- and in any case, results not applicable to any patient. I expect the results would be interesting, but with so many potential errors and holes that their use for anything other than non-scholarly articles would be worrisome.
DARPA posits a third wave of AI -- one that is mostly still not extant. (This is in many ways the foundation of the original Web 3.0, which is why we have not seen that, either.) This is context adaptation, often referred to as "Cognitive AI" -- that is, AI that UNDERSTANDS. This is particularly important in fields like health care where the unit of measure is an n-of-1...you cannot treat a population (statistics) -- only an individual (as von Eye pointed out years ago).
What you are proposing could have important results -- if a different framework (not the current deep learning/LLM framework) is adopted. One needs a framework that is deterministic and based on empiric truths -- not probabilities tweaked up front with RAGs and in the back by agents. Some of us have been working in this Cognitive AI space and the results are more-than-interesting. And completely different from just dropping the "health data" into a generative AI model and hoping for the best. (We did some of the original research with IBM using Watson for health...fascinating results, utterly consonant with the above. We predicted exactly what happened at MD Anderson.)
This is a decent general-reader article on why the current generative AI tools cannot deliver what needs to be delivered at the level one would wish: (https://medium.com/data-science/the-rise-of-cognitive-ai-a29d2b724ccc). The author has also published another piece on why agents cannot do it either for other unexpected reasons...lol. (https://towardsdatascience.com/the-urgent-need-for-intrinsic-alignment-technologies-for-responsible-agentic-ai/.) Every look at the current solution stack cries out for a deterministic, description logics (or other equivalent), curated approach to actually making real sense of the mountains of data out there.
So loved your piece, but we already see people dumping their stuff into various engines where the results will be specious at best. We can do better and this is the time to start out right.
I hope your dreams come true. To think we are asked to collect data on "gender" and how much money I make and my education level, and we can't even use our required EMRs for useful data and health related outcomes gathering. But at least you will know I am a white woman aged 50-60 who has a Doctoral degree and makes 100k to 150k. But you have no idea how to use my data to find out why my back pain takes 8 visits of care at clinic X versus a similar patient who needs 20 visits. Ugh. So. Absolutely. Dumb. And that isn't life saving information! We don't have those questions answered either!