“Studies show” is one of the most overused phrases in journalism and one of the most underexamined. Most people, including educated ones, don’t have the training to interrogate what a study actually demonstrated. That’s not a moral failing; it’s a structural problem. But it means a lot of confidently stated claims rest on evidence much shakier than the headline suggests.
Sample size and the law of small numbers
A study of 30 participants can produce statistically significant results that mean almost nothing in the real world. Small samples are noisy, and noise gets amplified into headlines. Worse, small studies that find nothing usually never get published, while small studies that find something striking get picked up everywhere. This publication bias means the literature on any given question is often skewed toward false positives. When you see a single study, especially with fewer than a few hundred participants, the right reaction is mild interest, not certainty. The phrase that should follow is “let’s see if it replicates.”
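To make that concrete, here is a minimal simulation sketch in Python with NumPy and SciPy (nothing in this post specifies a language, and the numbers are illustrative assumptions, not real data): a tiny but real effect is studied over and over with 30 participants per group, and only the studies that reach p &lt; 0.05 get “published.” The published estimates come out several times larger than the true effect.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Illustrative assumptions: a true effect of 0.1 standard deviations,
# studied repeatedly with only n = 30 participants per group.
true_effect, n, n_studies = 0.1, 30, 10_000

published_effects = []
for _ in range(n_studies):
    control = rng.normal(0.0, 1.0, n)
    treated = rng.normal(true_effect, 1.0, n)
    t, p = stats.ttest_ind(treated, control)
    if p < 0.05 and t > 0:          # only "striking" results get written up
        published_effects.append(treated.mean() - control.mean())

print(f"true effect:              {true_effect:.2f}")
print(f"share reaching p < 0.05:  {len(published_effects) / n_studies:.1%}")
print(f"mean published estimate:  {np.mean(published_effects):.2f}")
```

The filter is the whole story: nothing here is fraudulent, but conditioning on “significant and striking” turns a modest true effect into a much bigger headline number.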
Replication: where claims go to die
The replication crisis is not an arcane academic dispute. In psychology, only about 40% of published findings replicate when redone with larger samples. In cancer biology, the rate is worse. Whole subfields, from power posing to ego depletion to much of priming research, have collapsed under scrutiny. This means the public was confidently told things, often by credentialed experts, that turned out not to be true. The lesson isn’t to dismiss science. It’s to weight findings by how many independent labs have reproduced them, not by how loudly they were announced.
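A rough sketch of the mechanism, under the same illustrative assumptions as the example above (made-up effect sizes, not data from any real replication project): take the “findings” from underpowered original studies and rerun each one with ten times the sample. Even though the underlying effect is real, well under half of the original results survive.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Illustrative assumptions: a small true effect (0.1 SD), originally
# "discovered" with 30 people per arm, then replicated with 300 per arm.
true_effect, n_small, n_large = 0.1, 30, 300

discoveries, replicated = 0, 0
for _ in range(10_000):
    a = rng.normal(0.0, 1.0, n_small)
    b = rng.normal(true_effect, 1.0, n_small)
    t, p = stats.ttest_ind(b, a)
    if p < 0.05 and t > 0:                      # an original "finding"
        discoveries += 1
        a2 = rng.normal(0.0, 1.0, n_large)
        b2 = rng.normal(true_effect, 1.0, n_large)
        t2, p2 = stats.ttest_ind(b2, a2)
        replicated += (p2 < 0.05 and t2 > 0)

# Typically well under half replicate, even with a real effect and a
# bigger replication sample.
print(f"original findings that replicated: {replicated / discoveries:.0%}")
```

The point is not that the simulated number matches the 40% figure; it is that low-powered, noise-inflated originals are expected to fail honest replications at high rates.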
Effect sizes and the difference between real and useful
A finding can be statistically significant and still effectively meaningless. A drug that improves outcomes by 0.3% across 50,000 patients can clear statistical significance and produce a press release, but the practical effect on any individual is trivial. Conversely, an intervention with a huge effect in a small study might be real and important, or it might be the noise of a small sample. Always look for effect size, not just p-values. The question “how much does this actually do?” is more useful than “is this real?”
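Here is a hedged sketch of that distinction, again with illustrative numbers rather than anything from a real trial: a “benefit” of a twentieth of a standard deviation, spread across 50,000 patients, sails past p &lt; 0.05 while the effect size stays trivially small.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Illustrative assumptions: 25,000 patients per arm, a standardized
# outcome, and a true benefit of just 0.05 standard deviations.
n = 25_000
control = rng.normal(0.00, 1.0, n)
treated = rng.normal(0.05, 1.0, n)

t, p = stats.ttest_ind(treated, control)
pooled_sd = np.sqrt((control.var(ddof=1) + treated.var(ddof=1)) / 2)
cohens_d = (treated.mean() - control.mean()) / pooled_sd

print(f"p-value:   {p:.2e}")       # comfortably below 0.05
print(f"Cohen's d: {cohens_d:.3f}")  # ~0.05: a trivially small effect
```

Both numbers are “real” in the statistical sense; only the second one tells you whether anyone should care.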
The expert in front of a microphone
Credentials matter, but they don’t override scrutiny. Experts speaking outside their narrow specialty, such as a virologist on economic policy or an economist on nutrition, should be weighted accordingly. Even within their field, experts have incentives: tenure, grants, media presence, ideological commitments. None of this means they’re lying. It means their certainty should be calibrated against the underlying evidence, not borrowed from their title.
Bottom line
Strong evidence is replicated, has meaningful effect sizes, and survives scrutiny from people motivated to disprove it. Weak evidence is a single small study with a striking result, amplified by a press release. Most of what you read in popular coverage is closer to the second than the first. Reading more carefully, or simply waiting six months, solves a surprising amount of the problem.