The lawsuits keep coming. Authors, illustrators, and news publishers are all arguing that training a large language model on their work is theft. The moral grievance is understandable. The legal claim is much weaker than the headlines suggest. Under existing US fair use doctrine, training a model on text or images you’ve lawfully accessed looks far closer to permitted transformative use than to infringement, and pretending otherwise muddles a real economic problem with the wrong cause of action.
The fight worth having isn’t about copyright. It’s about labor markets and licensing norms.
What fair use actually says
The Supreme Court’s framework, refined in cases like Campbell v. Acuff-Rose and Authors Guild v. Google, asks four questions: the purpose and character of the use, the nature of the work, the amount taken, and the effect on the market. Google won the right to scan millions of books to build a search index, a far more verbatim use than what an LLM does. AI training extracts statistical patterns, not reproductions. The output isn’t your novel; it’s a probability distribution shaped by trillions of tokens. Courts in Bartz v. Anthropic and Kadrey v. Meta have already declined to treat training itself as infringement absent piracy in acquisition. The transformative-purpose argument is among the strongest fair use claims in modern doctrine. Reasonable people can disagree on outcomes, but the doctrinal posture clearly favors the AI labs.
What creators are actually worried about
The real fear isn’t that a model memorized a paragraph. It’s that AI shrinks the addressable market for paid creative work. Stock photographers, copywriters, junior illustrators, and translators are watching client budgets evaporate. That’s a labor displacement problem, and it’s serious. But it’s not what copyright was built to address. Copyright protects specific expressions, not occupational categories. Trying to stretch it to cover competitive harm from a new technology turns it into something it has never been, and bad case law made under economic pressure tends to backfire on the very creators it’s supposed to help.
What a saner response looks like
Some publishers have already cut licensing deals: OpenAI with the Associated Press, News Corp, and Axel Springer; Google with Reddit. These aren’t admissions that training requires a license. They’re commercial settlements that buy peace and freshness. Statutory licensing schemes, like the ones that let radio stations play music without negotiating with every artist, would be a cleaner fix than litigation roulette. Tax policies that recapture some AI productivity gains and route them toward displaced workers would address the actual harm. Banning training data wholesale would freeze open research, advantage incumbents who already have the corpus, and not save a single creative job.
The bottom line
Creators have legitimate complaints about how AI is reshaping their industries. Those complaints deserve real responses: licensing, labor protections, redistribution. They don’t, however, fit neatly inside copyright doctrine, and pretending they do is producing a wave of lawsuits likely to lose. Anger at the technology is rational. Channeling it through a misapplied legal theory just delays the harder, more useful conversations about how value gets shared in the AI era.