How Good is the Apple Watch at Detecting Hypertension?

And how will the function impact doctors and patients?

Jan 27, 2026

I hate to overstate the importance of a Sensible Medicine post, but the study reported on here by Caspian Kuma Folmsbee will affect many readers — doctors, Apple Watch wearers, both — in the coming months. I love a good test characteristics study. I love thinking about the impact of introducing health care technology to an unsuspecting population. This article comments on both.

Adam Cifu

In September 2025, Apple released a new health feature: hypertension alert. Hypertension notifications are “powered by a machine learning–based algorithm to identify key photoplethysmography (PPG) patterns that may indicate hypertension. The algorithm uses 60-second segments of PPG signals as inputs, collected approximately every two hours throughout nonoverlapping 30-day evaluation windows.”1

With the update, Apple released a validation study that defined the test characteristics (sensitivity and specificity) of the function. The study compared the algorithm noted above to a gold standard: the average of home blood pressure readings taken twice daily over 30 days. Participants were instructed to wear the watch 12 hours a day. Apple used 3216 participants to train the algorithm, 3878 to validate it, and 2236 to test it.

Table 1 below has the demographic details of the participants. About 40% were aged 18-39, 50% were white, and 20% were Asian. The training set was 18% African American, while the test set was 34% African American. In the training set, 33% had Stage 1 or 2 hypertension; in the test set, 64% were hypertensive. It is unclear how the investigators assigned participants to their respective groups.

Using blood pressure measurements as the gold standard and the training sets, the study calculated the sensitivity and specificity of the algorithm for detecting hypertension (Table 4 below).

As a reminder, sensitivity is the ability to detect disease. If there were 100 patients with hypertension, the sensitivity would be the proportion correctly identified as having hypertension. Specificity is the proportion of people without disease correctly identified as not having disease.

As you can see, the overall sensitivity for Stage 1 and 2 hypertension was 41%, and the specificity was 92%. So out of 100 hypothetical patients with hypertension, the watch would correctly notify 41, and the remaining 59 would not be notified. Of 100 hypothetical healthy (normotensive) patients, 8 would be falsely alerted that they might be hypertensive.
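The per-100 arithmetic above can be sketched in a few lines of Python. This is only an illustration, assuming the overall figures quoted from the validation study (41% sensitivity, 92% specificity); the function name `per_hundred` and its dictionary keys are hypothetical, not anything from Apple's document.

```python
def per_hundred(sensitivity: float, specificity: float, n: int = 100) -> dict:
    """Expected outcomes for n people with hypertension and n people without it."""
    return {
        # People with hypertension who receive a notification (true positives)
        "notified_with_htn": round(sensitivity * n),
        # People with hypertension who receive no notification (false negatives)
        "missed_with_htn": round((1 - sensitivity) * n),
        # People without hypertension who are notified anyway (false positives)
        "false_alerts_without_htn": round((1 - specificity) * n),
        # People without hypertension who receive no notification (true negatives)
        "silent_without_htn": round(specificity * n),
    }


result = per_hundred(sensitivity=0.41, specificity=0.92)
print(result)
```

Changing the two arguments shows how many people per 100 land in each group for any other pair of test characteristics.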

Unfortunately, a true appraisal of this study is impossible since it is not a typical academic publication. The document lacks details about the methodology. For example, where did they get these participants? Are they employed? What is their income? Did they already have hypertension? Were they already getting treatment for other medical issues? This information is critical to determine the external validity/generalizability of these test characteristics.

Furthermore, how often were people actually checking blood pressure, and were they doing it correctly? Did participants really do it twice a day as specified in the protocol? We know the type of machine they were using, but blood pressure evaluation is not a simple procedure. Were patients blinded to the watch notification? Did they seek out medical care?

There are no answers to these questions, and I wonder if it is right to roll out a feature to millions of people based on a study with only a couple of thousand people. Perhaps there were other validation studies, but this one is pulled directly from the Apple website and is presumably the most robust.

But here is my takeaway about these internal validity concerns – it does not matter.

It does not matter if the sensitivity is 20%, 40%, or even 80%.

It does not matter if the specificity is 85%, 90%, or even 95%.

Before I explain, it can be helpful to work through a theoretical example with the results.

Apple was projected to sell 55 million Apple Watches in 2025. Let’s say the real figure is more like 20 million, and let’s assume that only half of those buyers will use the hypertension notification feature. That gives 10 million users. Assume a prevalence of hypertension of 20%. (In underserved Chicago neighborhoods, that rate is as high as 50%.) Of the 2 million in that group who have undiagnosed hypertension, assuming a sensitivity of 41%, 820,000 will be notified that they might have hypertension. That leaves 1,180,000 with hypertension who are not notified.

Assuming a specificity of 92%, of the remaining 8 million (80%) without hypertension, 640,000 (8% x 8,000,000) will be falsely notified of high blood pressure. The classic point to make is that even with a “good” specificity of 92%, 640,000 is a lot of false positives. The positive predictive value in this case would be 56% (820,000 true alerts out of 1,460,000 total alerts). Put another way, if someone was notified by their watch of potential hypertension, they would have a 56% chance of truly having hypertension.
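For readers who want to check the arithmetic, here is a minimal sketch of the same worked example. It assumes the numbers used above (10 million users, 20% prevalence, 41% sensitivity, 92% specificity); the function name `screen` and its output keys are hypothetical names chosen for illustration.

```python
def screen(population: int, prevalence: float,
           sensitivity: float, specificity: float) -> dict:
    """Split a screened population into the four cells of a 2x2 table."""
    with_htn = round(population * prevalence)
    without_htn = population - with_htn

    true_pos = round(with_htn * sensitivity)            # hypertensive and notified
    false_neg = with_htn - true_pos                     # hypertensive, not notified
    false_pos = round(without_htn * (1 - specificity))  # normotensive but notified
    true_neg = without_htn - false_pos                  # normotensive, not notified

    # Positive predictive value: share of notifications that are true alerts
    ppv = true_pos / (true_pos + false_pos)
    return {
        "true_pos": true_pos,
        "false_neg": false_neg,
        "false_pos": false_pos,
        "true_neg": true_neg,
        "total_notified": true_pos + false_pos,
        "ppv": ppv,
    }


example = screen(population=10_000_000, prevalence=0.20,
                 sensitivity=0.41, specificity=0.92)
for key, value in example.items():
    print(key, value)
```

Note that the positive predictive value depends on the assumed prevalence, so a different prevalence would give a different figure even with the same sensitivity and specificity.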

Now repeat the above with worse sensitivity, or maybe better specificity. The numbers change, maybe by a couple of hundred thousand in each group.
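One way to run that repetition is sketched below. It assumes the same 10 million users and 20% prevalence as the worked example, and it varies sensitivity and specificity across the values mentioned earlier; the function name `notified_counts` and the list of scenarios are illustrative choices, not part of the validation study.

```python
def notified_counts(sensitivity: float, specificity: float,
                    population: int = 10_000_000, prevalence: float = 0.20):
    """Return (true alerts, false alerts) for one pair of test characteristics."""
    with_htn = round(population * prevalence)
    without_htn = population - with_htn
    true_pos = round(with_htn * sensitivity)
    false_pos = round(without_htn * (1 - specificity))
    return true_pos, false_pos


# (sensitivity, specificity) pairs: vary one while holding the other at the study value
scenarios = [(0.20, 0.92), (0.41, 0.92), (0.80, 0.92), (0.41, 0.85), (0.41, 0.95)]

rows = []
for sens, spec in scenarios:
    tp, fp = notified_counts(sens, spec)
    rows.append((sens, spec, tp, fp, tp + fp))
    print(f"sens={sens:.0%} spec={spec:.0%} "
          f"true alerts={tp:,} false alerts={fp:,} total notified={tp + fp:,}")
```

Under these assumptions, every scenario in the list still produces more than a million notifications.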

But here is my point – it does not matter.

Does a watch notification make it easier for a person to see their primary care doctor? A person who gets the notification will also have to compete for appointments with the 1,460,000 others who were notified, 640,000 of whom do not have hypertension but were told they might.

Flooding these patients into an already overwhelmed primary care system is not in the best interests of public health.

Primary care needs to get out of the preventive care business or at least deprioritize it. We need to focus on those who need our help, such as those with multiple comorbidities who are acutely ill and cannot get appointments for months.

If we really want to tackle the real problem of cardiovascular disease, it will take more than a watch notification.

Caspian Kuma Folmsbee is a primary care provider in Chicago. He publishes at Kuma’s Substack.

1. PPG uses a light source and a photodetector at the surface of the skin to measure volumetric variations in blood circulation. Originally used to measure heart rate, PPG produces waveforms whose variations could be useful for detecting cardiovascular risk factors such as arterial stiffness.


AM Schimberg

Jan 27

I've had the thought recently: If you could have a full body monitor/scanner/tester that was constantly evaluating the state of your entire body, would you want such a device? Would such a device be "healthy" for you? I think certainly not. Constantly fretting over every little variation or deviation from mean would not make a happy, full, well-lived life.

JDM

Jan 27 (edited)

“Primary care needs to get out of the preventive care business.”

Huh? This statement casts the entire essay into serious doubt.

Primary prevention using well supported methods is the best way to reduce morbidity and mortality in the future, and should be the goal of primary care family docs, pediatricians and internists.

The solution to overloading primary care doctors with new patients is not to stop providing preventative care. It is to continue to increase the number of primary care physicians.
