Anthropic's 🖕to every one of us
A Californian judge just validated the industrialized IP theft of the AI industry.
TLDR Summary
Anthropic is a reckless and entitled AI start-up, symbolic of a tech industry that is completely out of control and drunk on its own success. A judge in California just ruled that they can use copyrighted data without permission to train their models. This allows them to make money off other people’s works without compensating them adequately. The sole purpose is to enrich themselves. It’s a grift disguised as progress.
If this ruling is upheld in future rulings, it would be a significant blow to property rights, the defense of which is essential to provide an incentive for creativity and ingenuity. If we don't protect creators from the industrialized IP theft of the AI industry, the rise of AI could stifle innovation rather than accelerate it.
Company and case background
Anthropic is an AI start-up founded in 2021 by OpenAI alumni. They are backed by Amazon and Google, which sets them up as one of the most important antagonists to OpenAI/Microsoft. Their main product is Claude, which has grown to a 3% market share as of June 2025.
So far, Claude is a niche product. But the space is dynamic and things can change quickly. For what it’s worth, Claude is currently the fastest growing GenAI chatbot out there.
Anthropic has been sued regularly by media publishers for IP infringement. On Monday, there was an important ruling in a class-action lawsuit filed against the company for feeding its LLMs with pirated copies of books.
Before we move on, appreciate for a moment how appalling this practice is. Not only are they using IP-protected works without permission for their business, they even dare to use pirated versions of them. They have raised billions of dollars and can’t even pay the $9.99 that you and I pay for our latest beach novel.
One of the most prominent start-ups in the US and backed by tech giants is behaving like Sean Parker in his college dorm. That alone says a lot about the type of reckless and hubristic people we are dealing with here. They think rules don’t apply to them. The entire tech industry is out of control.
As usual in these cases, Anthropic argued that their use is ‘fair’ in the context of US copyright law. As a brief refresher, in US copyright law the fair use principle is designed to govern the use of copyrighted material without permission. It’s based on subjectively weighing criteria such as:
Is the use transformative? If a new perspective is added to the material, it is more likely to qualify as fair use under the criticism/commentary angle.
Is the use commercial? If the user intends to make money off the material, the use is less likely fair.
How much of the original work is used? If most of the user’s publication is someone else’s work, it’s less likely to be fair.
Is the original material factual or creative? The more creative, the less likely fair.
Does the use help or harm the copyright owner financially? If it’s marketing for them, it’s more likely fair. If it diverts potential buyers from them, it is less likely to be fair.
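Purely as an illustration of how these factors pull in opposite directions, the weighing can be sketched as a toy checklist. To be clear, this is my own hypothetical model, not how courts actually decide: real fair-use analysis is holistic and subjective, and no determination reduces to a tally like this.

```python
from dataclasses import dataclass

@dataclass
class FairUseFactors:
    """Toy model of the fair-use considerations from 17 U.S.C. § 107."""
    transformative: bool      # does the use add new meaning or purpose?
    commercial: bool          # does the user intend to make money off it?
    amount_substantial: bool  # is most of the original work used?
    work_creative: bool       # is the source creative rather than factual?
    harms_market: bool        # does the use displace sales of the original?

def leans_fair(f: FairUseFactors) -> bool:
    """Naive tally: count factors pointing toward vs. against fair use."""
    toward = int(f.transformative) + int(not f.commercial)
    against = (int(f.amount_substantial) + int(f.work_creative)
               + int(f.harms_market))
    return toward > against

# The scenario discussed in this article: a commercial use of entire
# creative works that arguably harms the market for those works.
ai_training = FairUseFactors(
    transformative=True, commercial=True,
    amount_substantial=True, work_creative=True, harms_market=True,
)
print(leans_fair(ai_training))  # prints: False

# A short, non-commercial quotation for commentary, by contrast:
book_review = FairUseFactors(
    transformative=True, commercial=False,
    amount_substantial=False, work_creative=False, harms_market=False,
)
print(leans_fair(book_review))  # prints: True
```

Even this crude tally shows why the AI industry leans so heavily on the single factor of transformativeness: on the other factors, large-scale commercial training on entire creative works points the wrong way.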
As I discussed in the article below, it’s very difficult for the AI industry to prove their use fair based on these criteria.
The ruling
Judge Alsup ruled that Anthropic’s digitization of the books and subsequent use for training purposes can be deemed sufficiently transformative to be considered fair use. He did, however (at least!), take issue with Anthropic’s use of pirated copies.
“The use of the books at issue to train Claude and its precursors was exceedingly transformative and was a fair use under Section 107 of the Copyright Act. And, the digitization of the books purchased in print form by Anthropic was also a fair use but not for the same reason as applies to the training copies. Instead, it was a fair use because all Anthropic did was replace the print copies it had purchased for its central library with more convenient space-saving and searchable digital copies for its central library — without adding new copies, creating new works, or redistributing existing copies. However, Anthropic had no entitlement to use pirated copies for its central library. Creating a permanent, general-purpose library was not itself a fair use excusing Anthropic’s piracy.”
His main line of reasoning was to equate human learning to machine learning:
“Everyone reads texts, too, then writes new texts. They may need to pay for getting their hands on a text in the first instance. But to make anyone pay specifically for the use of a book each time they read it, each time they recall it from memory, each time they later draw upon it when writing new things in new ways would be unthinkable. For centuries, we have read and re-read books. We have admired, memorized, and internalized their sweeping themes, their substantive points, and their stylistic solutions to recurring writing problems.”
So if a human can read a book and apply the knowledge commercially without asking for permission, a machine can do the same according to this judge.
Such an understanding is very problematic for various reasons. Firstly, the nature of human learning differs greatly from machine learning. Humans use reasoning. Machines use pattern-matching. Humans generalize. Machines extrapolate and interpolate. Humans have intentions. Machines follow instructions. Because of all of this, humans can actually internalize knowledge. Machines simply process the knowledge owned by others.
I can imagine you want to disagree with me on that. So, let’s move on to the second point. Even if we accept the equivalence of human and machine learning under certain circumstances, the scale of the potential damage of misuse is very different. Machines can be used in a very targeted way to monetize the IP of others at an extreme scale.
You can build a model which transforms source data with the sole purpose of disguising where it comes from. The transformed data can then be monetized as a direct competitor to the source material. This can greatly damage the rights of the owners of the copyrighted material.
Anthropic is using the material with the very clear intention of making money off it. They are systematically building a library of millions (!) of books to train their models. People have to understand that this has a very different quality from a human reading books to become an expert on a matter and making that their career.
Consider the most extreme scenario
I like to think in extremes when I want to make conceptual points. It’s a great stress test for any principle you believe to hold generally.
The global media and entertainment industry is worth about $3tn every year. That's all money spent by humans to consume books, newspapers, movies, TV, radio, music, gaming etc.
Imagine one single company, let's call them ClosedAI, aggregated all of that media. They might even pay for it properly (paying the price people typically pay for personal use, not obtaining a distribution license, which is very different). Then this company uses their data repository to create a personalized media experience for each of its subscribers that is so good that it satisfies all their media needs. Nobody would then consume any other media anymore. Every other media producer would go out of business. And we would allow the rise of a new media behemoth that can manipulate public opinion at will.
ClosedAI would argue: “This is all fair use. We're not just regurgitating the data. None of our tokens are copies of the source material. We transform it to deliver a personalized media experience for our users.”
But can that really be fair use? Section 107 of the US Copyright Act lists four factors for consideration in the determination of fair use. One of them is 'the effect of the use upon the potential market for or value of the copyrighted work'.
If the use completely nukes the entire industry that provides the materials, can it possibly be fair? Obviously not. Systematically redistributing someone else's content for commercial purposes without their consent is theft. What is true in this extreme example is also true in principle.
If you side with AI companies here, you are not siding with progress. You are siding with grifters. Just listen to them. They are openly admitting that they are not trying to figure out what’s right, but to justify what’s in their interest.
Sincerely,
Rene