ChatCPR: AI’s Answer to the Question ‘When Will It Save Lives?’
In the US, fewer than 1 in 10 people who have cardiac arrest outside the hospital survive. Read this summary of a research team's attempt to use AI to save lives.
Earlier this year, I appeared on a cable news broadcast to comment on yet another AI study. At the end of the segment, the host asked a question that caught me off guard: When will AI start saving lives? I smiled for the camera and tried my best, but I couldn’t shake the question.
Since ChatGPT’s launch, PubMed has indexed over 178,000 articles mentioning AI in their title or abstract. Many are interesting; some are promising. But when will this AI wave truly save lives?
That question reframed my thinking: AI’s biggest impact isn’t in documentation, prediction, or optimization, but in high-stakes moments when seconds count and existing human systems can fail.
The answer became clear: out-of-hospital cardiac arrest.
I’m not a clinician; I’ve never performed or trained in CPR. That puts me in the supermajority of Americans: only 2% have been certified to perform CPR within the past year. When someone collapses in front of us, we call 911 and wait.
Dispatchers take about 75 seconds just to recognize a cardiac arrest, and chest compressions start only nearly 3 minutes into the call. With more than 350,000 out-of-hospital cardiac arrests each year in the United States and survival at roughly 9%, that delay is measured in lives.
Closing That Deadly Gap with AI
What if, the moment I needed help rescuing a loved one in arrest, I immediately had an expert CPR coach with me, telling me exactly what to do?
Our Study
Working alongside UC San Diego's Altman Clinical and Translational Research Institute — the very institution that pioneered CPR — my co-leads and I took on this challenge. Our team brought together Nimit Desai, a medical student and longtime collaborator, and Christopher Horvatt, a critical care physician who also happens to be a former high school classmate. Our findings are published in JAMA Internal Medicine.
We benchmarked popular AI models, including ChatGPT, Claude, Grok, Gemini, Llama, and Mixtral, on CPR coaching in simulated emergency scenarios against a checklist of criteria for delivering guideline-concordant CPR in out-of-hospital cardiac arrest. “Minimally viable” criteria captured the non-negotiable basics, e.g., coaching on where to compress and at what rate. “Maximally effective” criteria represented everything needed to optimize survival, e.g., instructing rescuers to allow full chest recoil between compressions. The benchmark results were encouraging but also sobering.
Across scenarios varying by cause (e.g., drowning vs. collapse during a jog) and patient (e.g., toddlers vs. seniors), AI models averaged 90% on minimally viable criteria, ranging from 79% (Gemini) to 97% (Grok, Claude), and 70% on maximally effective criteria, ranging from 61% (Llama) to 75% (ChatGPT). But in cardiac arrest, good is not good enough. Missing 10-30% of steps can be the difference between life and death.
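To make the scoring concrete, here is a minimal sketch of how a two-tier checklist could be tallied for a single coaching response. It is an illustration, not the study's evaluation code: the `Criterion` class, the `tier_score` function, the criterion names, and the sample `met` set are all hypothetical, and the sketch assumes the two tiers are scored as separate lists.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    name: str
    tier: str  # "minimally_viable" or "maximally_effective"


def tier_score(criteria, met, tier):
    """Return the percentage of criteria in `tier` that a response met."""
    in_tier = [c for c in criteria if c.tier == tier]
    if not in_tier:
        raise ValueError(f"no criteria defined for tier {tier!r}")
    hits = sum(1 for c in in_tier if c.name in met)
    return 100.0 * hits / len(in_tier)


# Hypothetical four-item checklist built from the examples named in the text.
CHECKLIST = [
    Criterion("compression_location", "minimally_viable"),
    Criterion("compression_rate", "minimally_viable"),
    Criterion("full_chest_recoil", "maximally_effective"),
    Criterion("retrieve_aed", "maximally_effective"),
]

# Illustrative input only: the criteria a reviewer marked as met in one response.
met = {"compression_location", "compression_rate", "full_chest_recoil"}

viable = tier_score(CHECKLIST, met, "minimally_viable")
effective = tier_score(CHECKLIST, met, "maximally_effective")
print(f"minimally viable: {viable:.0f}%, maximally effective: {effective:.0f}%")
```

In this made-up case the response covers both basics but never mentions an AED, so it clears the first tier while falling short on the second.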
Building ChatCPR for Real Emergencies
Those gaps informed how we developed ChatCPR, an open-source AI agent for CPR coaching. We grounded the system in 911 dispatcher training materials and decades of evidence on CPR best practices, then iteratively engineered it to fix the specific failures we had observed in the benchmark. In these same simulations, ChatCPR scored 100% on both checklists.
But does ChatCPR work in reality? We obtained a set of de-identified 911 calls in which dispatchers had provided CPR instruction and, in a blinded evaluation, compared the dispatchers’ instructions with ChatCPR’s responses. This is precisely the setting where ChatCPR could be deployed: a real emergency, a panicked bystander, help-seeking in progress.
ChatCPR won every head-to-head comparison with human dispatchers: +15 percentage points on minimally viable criteria (85% of guideline criteria met by dispatchers vs. 100% for ChatCPR) and +36 on maximally effective criteria (63% vs. 99%).
For minimally viable criteria, performance differences were most pronounced in assessing whether the patient was awake or responding, providing initial chest compression instructions, and instructing compression quality (depth and rate). For maximally effective criteria, the largest gaps were in directing the caller to retrieve and use an AED if available, instructing full chest recoil between compressions, and ensuring proper continual CPR positioning.
This wasn’t about style, as it was in our widely cited study on AI for patient messaging; it was about strict adherence to CPR guidelines, where precision matters most. The only unmet checklist criterion occurred in one call and involved a maximally effective item requiring assessment of patient responsiveness and breathing. Although ChatCPR addressed both elements, it did not ask the questions in the guideline-recommended order. ChatCPR excelled in patient assessment, depth/rate instructions, and recoil, areas where stressed, multitasking dispatchers faltered.
A Free AI that Can Save Lives—With Safeguards
ChatCPR proves AI can deliver precise, guideline-based CPR coaching on demand. The remaining challenge is implementation.
We have open-sourced ChatCPR, publishing the complete system, evaluation framework, and all supporting materials so that any developer or organization can freely use, adapt, and deploy it.
This open-source approach is what makes practical deployment realistic. Because the full system is publicly available, technology companies can integrate it directly into the devices and services people already use, such as smartphones, search engines, and voice assistants, without building CPR coaching from scratch. Researchers, meanwhile, can test, refine, and improve it, especially as multimodal AI tools rapidly advance. The goal is simple: make expert CPR coaching available instantly, anywhere, to anyone who needs it.
Importantly, systems like ChatCPR could add value across the cardiac arrest response continuum. They could help bystanders start CPR sooner (especially in difficult-to-reach or remote locations), support dispatchers with standardized guidance (a kind of coach’s coach), and assist clinicians and first responders with complex or scenario-specific instructions during training, thereby making familiarity with the tool part of initial training. The real promise is closing the deadly gap between a person collapsing and lifesaving care beginning.
Yet rigorous real-world trials are essential to establish safety, usability in chaos, and an understanding of how people interact with AI guidance. Safeguards such as integration with existing 911 systems and human oversight will be necessary first steps. Moreover, clear regulatory frameworks will be essential as our tool moves from research to real-world use.
For example, Good Samaritans receive legal protections for good-faith resuscitation efforts; how protections extend to developers and deployers of AI-enabled CPR coaches remains an open question.
Ultimately, our work grounds AI hype in life-or-death reality. If AI is going to earn its place in medicine, it should start by helping people save the person right in front of them.
John W. Ayers is a computational epidemiologist focused on getting the public back in public health. He is a professor (medicine), vice chief of innovation (infectious disease and global public health), and head of AI (Altman Clinical and Translational Research Institute) at UC San Diego.





Congratulations on finding a groundbreaking role for AI. It can't do a lot of things, but CPR is clearly something it can handle.
Excellent work! Is an App being developed?