Autism Has Been Neglected
Only 543 studies or commentaries published in JAMA, among the world’s most widely circulated medical journals, even mention autism, and just 4 research studies published in the last five years mention autism in their title. In contrast, cancer has been the subject of nearly 19,000 JAMA publications. One might reasonably argue that cancer research merits more attention, but even topics such as climate change have eclipsed autism, garnering 707 JAMA publications. Autism also trails other mental health concerns: anxiety has 3,682 JAMA publications, Alzheimer’s 1,540, and alcohol 7,514, or about 6.8x, 2.8x, and 13.8x the attention given to autism. These disparities aren’t unique to JAMA; they are reflected in other prominent journals, such as the New England Journal of Medicine, and in the entire medical literature as archived on PubMed.
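For readers who want to check the arithmetic, the ratios quoted above can be reproduced in a few lines. This is only a sketch: the counts are the JAMA figures given in the text, and the variable names are my own.

```python
# JAMA publication counts as quoted in the text above.
counts = {
    "autism": 543,
    "anxiety": 3682,
    "Alzheimer's": 1540,
    "alcohol": 7514,
}

# Attention relative to autism, rounded to one decimal place.
ratios = {
    topic: round(n / counts["autism"], 1)
    for topic, n in counts.items()
    if topic != "autism"
}

for topic, ratio in ratios.items():
    print(f"{topic}: {ratio}x the publications mentioning autism")
```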
Why does autism get so little attention? A key factor is data scarcity. The largest government-supported autism database, the National Database for Autism Research, has collected data on about 90,000 participants with autism. The Centers for Disease Control and Prevention’s Autism and Developmental Disabilities Monitoring Network included 9,000 participants diagnosed with autism in its latest report. Other studies focused on predictors are smaller. The Environmental influences on Child Health Outcomes Program includes fewer than 400 persons with autism. Without larger datasets, insights into autism's causes, trajectories, and disease-modifying treatments remain elusive and breakthroughs are delayed.
A Bold Vision—and a Practical Path Forward
A new NIH initiative promises to change that by pooling real-world health data on a national scale to decode the condition’s origins, progression, and optimal treatments. The immediate goal: to create the largest dataset dedicated to autism ever assembled.
Essays at moments like this usually pivot to an idealized wish list, weighed down by “if only” caveats about the complexities of public‑health science. This is not that kind of essay. Inspired by the NIH’s charge, my collaborators and I have launched the largest autism-related database in history. Our early findings show that the NIH’s initiative can deliver tangible public‑health benefits now, not sometime in the distant future.
Building the World’s Largest Autism Data Resource
HealthVerity, a clearinghouse for federated health data, manages vast repositories of privacy-compliant de-identified medical claims, electronic health records, registries, genomics, and death records. When I asked if we could close autism’s data gap, their CEO immediately said “yes.”
However, data alone will not galvanize autism research; trust is equally critical. The field is fraught with political and social tensions. Researchers challenging polarized beliefs often face ad hominem attacks, and the public is fed “black‑box” science via press releases, not open workflows.
As I have noted in Sensible Medicine, secure, AI‑driven analysis engines may help rebuild trust. Among many possible applications, one can enable semi-autonomous agents to:
Refine research questions expressed in plain language.
Query data securely, never exposing individual‑level records.
Run reproducible analyses in minutes, not months.
Summarize findings in transparent reports that anyone—from government officials to families—can audit and replicate.
Concerns about gate‑keeping or reproducibility melt away when every step is open for public interrogation and refinement. With that in mind, my collaborators and I at Medeloop layered specialized AI research agents (experts in data processing, analytics, and study design) across 121 million patient records from HealthVerity, covering nearly a decade of observations, to empower autism research.
What comes next? Asking the first question.
The image shows the first set of steps in which Medeloop’s AI agents operate on HealthVerity data to study autism. Here the first set of agents works with the user to refine their initial research question. An interactive view of all the steps is available here: https://demo.medeloop.ai/analytics/
Exploring Our AI-Powered Autism Data Resource
I asked: “How many patients had a new diagnosis of autism? What is the breakdown of diagnoses by male/female, race/ethnicity, and age (kids, pre-adolescents, etc.)? Show me trends for new diagnoses and break those down by age as well. Patients should have been followed a minimal amount of time before a diagnosis to be counted as a new diagnosis.” Entering that query into the shared resource triggered a series of agents to resolve the answer.
Medeloop’s AI agents processed this query on HealthVerity’s data through secure interactive steps that you can follow here, creating a cohort of 799,560 people with a new diagnosis of autism [which the AI system defined as a first diagnosis of ICD-10 code F84.0 (autistic disorder) after a minimum of 2 years of prior encounters without a diagnosis] between January 2015 and December 2024. This is not only the largest dataset on autism; it is also focused on new diagnoses, the cases that are suitable for studying what preceded or followed the diagnosis.
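To make the "new diagnosis" rule concrete, here is a minimal sketch of how such a rule could be applied to one patient's encounter history. It is not Medeloop's implementation. The function name, the tuple layout, and the choice to approximate 2 years as 730 days are my own assumptions for illustration; only the F84.0 code, the 2-year lookback, and the study window come from the text.

```python
from datetime import date

AUTISM_CODE = "F84.0"              # ICD-10 code named in the text
LOOKBACK_DAYS = 2 * 365            # assumption: "2 years" treated as 730 days
STUDY_START = date(2015, 1, 1)     # study window stated in the text
STUDY_END = date(2024, 12, 31)


def first_new_diagnosis(encounters):
    """Return the date of a patient's new autism diagnosis, or None.

    `encounters` is a list of (date, icd10_code) tuples for one patient
    (a hypothetical layout). A diagnosis counts as new only if the
    patient's earliest recorded encounter falls at least LOOKBACK_DAYS
    before the first F84.0 encounter, and that first F84.0 encounter
    falls inside the study window.
    """
    if not encounters:
        return None
    ordered = sorted(encounters)
    dx_dates = [d for d, code in ordered if code == AUTISM_CODE]
    if not dx_dates:
        return None
    first_dx = dx_dates[0]
    first_seen = ordered[0][0]
    has_lookback = (first_dx - first_seen).days >= LOOKBACK_DAYS
    in_window = STUDY_START <= first_dx <= STUDY_END
    return first_dx if has_lookback and in_window else None


# Made-up patients, for illustration only.
followed = [(date(2016, 1, 10), "J06.9"), (date(2019, 3, 2), "F84.0")]
no_history = [(date(2020, 5, 1), "F84.0")]

print(first_new_diagnosis(followed))    # counted as a new diagnosis
print(first_new_diagnosis(no_history))  # excluded: no prior follow-up
```

The lookback is what separates a plausibly new diagnosis from a patient who simply appears in the data for the first time with an existing one.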
The AI agents then described how new diagnoses varied by key demographics:
Sex: 69.7% of newly diagnosed persons were male, 30.3% female.
Age: Most diagnoses occurred among 5 to 11-year-olds (35.3%), followed by 12 to 17-year-olds (20.3%), and 3 to 5-year-olds (18.7%). A relatively large share of diagnoses were also among adults, including 31+ year-olds (9.4%), 22 to 30-year-olds (8.5%), and 18 to 21-year-olds (7.5%).
Race/Ethnicity: Non-Hispanic White individuals comprised 43.6% of new diagnoses, followed by Hispanic (23.7%), Black (14.3%), and Asian (2.6%) individuals.
The AI agents then produced a visualization of monthly trends in new autism diagnoses. It revealed that diagnoses increased among all age groups both before and after a decline corresponding with the Covid-19 pandemic, peaking at 13,170 new cases in March 2023. Moreover, since the low in April 2020 there have been 15 months with a new record high in diagnoses. The cause of these increases remains unclear, but possible explanations include new case identification, shifts in diagnostic criteria, or changes in healthcare-seeking behavior.
The image shows trends in new diagnoses (defined as a first diagnosis of ICD-10 code F84.0 “autistic disorder” after a minimum of 2 years of prior encounters without a diagnosis) among 799,560 persons in the United States as compiled by Medeloop’s AI agents operating on HealthVerity data. An interactive view of the entire research process is available here: https://demo.medeloop.ai/analytics
Note: A deceleration in growth during the last quarter of 2024 may reflect medical claims that had not yet been processed, as these data are typically delayed.
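The count of record-high months since the April 2020 low can be tallied from a monthly series along the following lines. This is a sketch under one reading of "record high," namely a month whose count exceeds every earlier month in the series. The function name and the toy numbers are made up for illustration and are not the actual HealthVerity counts.

```python
def record_high_months(monthly_counts, since):
    """Return labels of months after `since` that set a new all-time high.

    `monthly_counts` is a chronological list of (label, count) pairs.
    A month is a record high if its count exceeds every earlier month
    in the series, including months before `since`.
    """
    records = []
    running_max = float("-inf")
    past_since = False
    for label, count in monthly_counts:
        if past_since and count > running_max:
            records.append(label)
        running_max = max(running_max, count)
        if label == since:
            past_since = True
    return records


# Toy series (not real data): a dip in 2020-04 followed by a recovery.
toy_series = [
    ("2020-01", 100), ("2020-02", 110), ("2020-03", 90), ("2020-04", 40),
    ("2020-05", 70), ("2020-06", 115), ("2020-07", 112), ("2020-08", 120),
]

print(record_high_months(toy_series, since="2020-04"))
```

In the toy series, only the months that beat the pre-dip peak of 110 are counted, so a partial rebound such as 2020-05 is not a record.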
Monthly trends grouped by age suggest the increase in diagnoses is prominent in all ages but more so among school-aged children (5 to 11-year-olds), followed by preschool-aged children (3 to 5-year-olds) and adolescents (12 to 17-year-olds).
(Note: A slowdown in late 2024 may reflect unprocessed claims, as diagnosis claims can be delayed.)
What Would You Ask?
The resulting database of 799,560 people with a new diagnosis of autism also has more than 650,000,000 variable observations available to study the condition’s origins, progression, and optimal treatments, going well beyond the claims analysis shown here. AI agents can be extended to other integrated real-world datasets, including electronic health records, genomics, clinical trials, patient surveys, disease registries, wearable and sensor data, social determinants of health, pharmacy dispensing records, lab and imaging results, public health archives, and linked multi-source longitudinal cohorts. If you could ask anything, what would you ask?
Why have new autism diagnoses increased?
Are certain providers more likely to make a new autism diagnosis?
Which biological or social factors predict a diagnosis?
How have diagnosis trends changed with changes to autism’s clinical definitions?
This vision can be accomplished without favoring any single data source or technology; companies like HealthVerity and Medeloop are simply proof that the technology is mature. I look forward to inviting policymakers, scientists, clinicians, and, importantly, families who have a loved one affected by autism to do just that: ask a question in everyday English and watch a team of AI agents trace the evidence in real time. That is how we turn vision into answers, and how NIH can launch its autism research initiative immediately.
-------
John W. Ayers, Ph.D., MA, is a Johns Hopkins and Harvard-trained computational epidemiologist specializing in emerging technologies in public health. He serves as Vice Chief of Innovation (Division of Infectious Disease) and Head of AI (Altman Clinical and Translational Research Institute) at UC San Diego Medicine and is the Head of Strategy for Medeloop. His summary of the evidence is his own and is not intended to reflect an official position of his employer, HealthVerity, or Medeloop.
The problem is that autism is not a medical condition; it’s a medicalization of problem behavior and, in general, of the situation of being a “weird” kid. You cannot just throw a ton of data at the situation and get a real understanding of it when the underlying conceptual and diagnostic categories are so confused.
A big problem is that the diagnosis of autism has been applied to a number of neurological disorders that may or may not be related to one another. It may be that this has contributed to the increasing numbers over the last few decades. Data sets are only useful when the data is accurate. AI cannot deal with matters requiring judgement or nuance, but it can mislead people into fruitless blind alleys of research such as we have seen in the "preventive" measures recommended for cardiovascular disease.