Wearable health devices aren’t always accurate

The smartwatch on your wrist tells you your sleep score, your heart rate variability, your blood oxygen, and how many calories you burned mowing the lawn. The numbers feel medical because the marketing makes them feel medical. The actual accuracy is more uneven than the polished interface suggests, and treating wearable data as clinical truth produces a steady stream of unnecessary doctor visits, missed real problems, and bad decisions about training.

Heart rate is mostly fine, sometimes catastrophic

For resting and steady-state heart rate, most modern wrist-based optical sensors do well, often within one or two beats per minute of a chest strap. The trouble starts during interval training, weight lifting, or anything involving wrist flexion. Validation studies, including peer-reviewed work from Stanford and others, have shown errors jumping into double-digit percentages during high-intensity intervals on Apple Watch, Fitbit, and Garmin devices. Skin tone matters too: green-light optical sensors perform worse on darker skin, a documented and slowly improving issue. For someone trying to train in heart-rate zones, that variance is the difference between a productive workout and a useless one. A chest strap is still the cheap, reliable answer for serious training, even though it lacks the lifestyle aesthetics.
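To check your own device, you can wear a chest strap alongside the watch and compare the two series. The sketch below is one simple way to do that, using mean absolute percentage error with the chest strap as the reference. The readings are made up for illustration and the function name is my own; nothing here comes from a particular study or vendor tool.

```python
def mean_absolute_percentage_error(reference, measured):
    """Average of |measured - reference| / reference, as a percentage."""
    if len(reference) != len(measured) or not reference:
        raise ValueError("need two equal-length, non-empty series")
    errors = [abs(m - r) / r for r, m in zip(reference, measured)]
    return 100.0 * sum(errors) / len(errors)


# Hypothetical paired readings in beats per minute (not real measurements).
chest_strap_steady = [62, 63, 61, 64, 62]
wrist_steady = [63, 62, 61, 65, 63]

# During intervals the wrist sensor lags the peaks and overshoots the recoveries.
chest_strap_intervals = [150, 172, 120, 168, 125]
wrist_intervals = [131, 150, 138, 149, 141]

steady_error = mean_absolute_percentage_error(chest_strap_steady, wrist_steady)
interval_error = mean_absolute_percentage_error(chest_strap_intervals, wrist_intervals)

print(f"steady-state error: {steady_error:.1f}%")
print(f"interval error: {interval_error:.1f}%")
```

If the interval figure comes out several times larger than the steady-state one, zone-based training off the wrist reading alone is probably not worth the effort.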

Sleep tracking is essentially educated guessing

Polysomnography, the gold standard for sleep staging, involves EEG, EOG, EMG, and a technician watching brain waves all night. Wearables use accelerometers and heart rate trends to infer the same information. The result is correlated with actual sleep stages but not equivalent. Studies comparing consumer wearables to clinical sleep labs typically find sleep-stage agreement in the 50 to 70 percent range. That’s better than random but not good enough to diagnose anything. The nightly sleep score does broadly track total sleep time and overall trends across weeks, which is genuinely useful. The detailed breakdown of REM versus deep sleep is mostly entertainment. People who restructure their lives around their Oura Ring’s nightly verdict are responding to a number that has a meaningful margin of error built in.
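It helps to see what "agreement" means in practice. Sleep studies usually score the night in short epochs and compare labels epoch by epoch. Here is a minimal sketch of that comparison, with a ten-epoch toy night I invented for illustration; the stage labels and function names are assumptions, not any device's export format.

```python
def epoch_agreement(reference_stages, device_stages):
    """Fraction of epochs where the device label matches the reference label."""
    if len(reference_stages) != len(device_stages) or not reference_stages:
        raise ValueError("need two equal-length, non-empty stage lists")
    matches = sum(1 for r, d in zip(reference_stages, device_stages) if r == d)
    return matches / len(reference_stages)


def sleep_epochs(stages):
    """Number of epochs scored as any kind of sleep (everything except wake)."""
    return sum(1 for s in stages if s != "wake")


# Hypothetical labels: one entry per scored epoch.
lab = ["wake", "light", "light", "deep", "deep", "rem", "light", "rem", "light", "wake"]
watch = ["wake", "light", "deep", "deep", "light", "light", "light", "rem", "rem", "wake"]

agreement = epoch_agreement(lab, watch)
print(f"stage agreement: {agreement:.0%}")
print(f"sleep epochs, lab vs watch: {sleep_epochs(lab)} vs {sleep_epochs(watch)}")
```

In this toy night the watch gets the stage wrong in four of ten epochs yet counts the same amount of total sleep as the lab, which is the pattern described above: totals hold up, the stage breakdown does not.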

SpO2 and ECG features come with asterisks

Single-lead ECG functions on watches can detect atrial fibrillation reasonably well in screened populations, but they generate false positives often enough that cardiologists have written about the resulting unnecessary workups. SpO2 readings on wrist devices are notoriously sensitive to fit, motion, and skin tone, and the FDA has warned about these limitations for the broader pulse oximetry category. Treating a 92 percent reading from your watch the same way you’d treat a hospital pulse oximeter is a mistake in both directions: it can cause panic over normal variation and false reassurance during real hypoxia. The features are useful as trend indicators, not as diagnostics.
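The false-positive problem is mostly arithmetic. When a condition is rare among the people wearing the device, even an accurate detector produces more false alarms than true ones. The sketch below works through that with Bayes' rule. The sensitivity, specificity, and prevalence values are placeholders I picked to show the effect, not measured figures for any watch or population.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Probability that a positive alert is a true positive."""
    true_positives = sensitivity * prevalence
    false_positives = (1.0 - specificity) * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)


# Illustrative assumptions only: the same detector in two different groups.
SENSITIVITY = 0.98
SPECIFICITY = 0.98

ppv_low_prevalence = positive_predictive_value(SENSITIVITY, SPECIFICITY, prevalence=0.005)
ppv_high_prevalence = positive_predictive_value(SENSITIVITY, SPECIFICITY, prevalence=0.05)

print(f"alert is real, condition in 0.5% of wearers: {ppv_low_prevalence:.0%}")
print(f"alert is real, condition in 5% of wearers: {ppv_high_prevalence:.0%}")
```

Under these assumed numbers, most alerts in the low-prevalence group are false alarms even though the detector itself is the same. That is why an alert is a reason to get a proper ECG, not a diagnosis.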

The takeaway

Wearables are great for trends, motivation, and noticing when something has changed. They’re poor substitutes for clinical measurement, and the gap between the two is often invisible to users. The right mental model is “this is a fitness coach with strong opinions and weak credentials.” Use the data to inform questions for your doctor, not to answer them yourself. And if a number feels off, get a real measurement before reacting.
