News
Federal Court Rules on AI Training with Copyrighted Books
- By John K. Waters
- 06/27/2025
A federal judge ruled this week that artificial intelligence company Anthropic did not violate copyright law when it used copyrighted books to train its Claude chatbot without author consent, but ordered the company to face trial on allegations it used pirated versions of the books.
Judge William Alsup of the U.S. District Court for the Northern District of California issued the ruling Monday in a lawsuit filed by three authors against Anthropic. The decision represents a significant development for AI companies facing copyright litigation over their training methods.
Court's Fair Use Determination
Alsup ruled that Anthropic's use of copyrighted books to train its large language models constituted fair use under copyright law. The judge compared the practice to an aspiring writer reading copyrighted texts "not to race ahead and replicate or supplant" those works, "but to turn a hard corner and create something different."
The lawsuit was filed by authors Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson, who alleged Anthropic used their work without consent in what they termed "large-scale theft."
Had Judge Alsup ruled that using copyrighted books for AI training constituted copyright infringement, it would have fundamentally altered the AI development landscape. Companies would likely have faced two primary paths: either negotiate expensive licensing deals with publishers and authors for training data, or pivot to using only public domain materials and original content.
The licensing approach would have created significant barriers to entry for smaller AI companies while potentially benefiting established players with deeper pockets. We might have seen the emergence of massive content licensing consortiums, similar to how music streaming services negotiate with record labels. Publishers and authors would have gained substantial leverage and new revenue streams from AI companies.
Alternatively, if companies had been forced to rely primarily on public domain content, AI models might have developed differently, perhaps with much earlier knowledge cutoffs or notable gaps in contemporary understanding. This could have led to a bifurcated AI ecosystem in which some models had access to modern knowledge through expensive licensing while others remained limited to older, freely available content.
Piracy Allegations to Proceed
While finding the training itself to be fair use, Alsup ordered Anthropic to face trial on allegations it knowingly obtained copies of more than 7 million books from piracy websites. The company later purchased copies of some of the books, according to court documents.
The judge expressed skepticism about the company's piracy defense, stating he doubted "any accused infringer could ever meet its burden of explaining why downloading source copies from pirate sites that it could have purchased or otherwise accessed lawfully was itself reasonably necessary to any subsequent fair use."
"That Anthropic later bought a copy of a book it earlier stole off the internet will not absolve it of liability for the theft, but it may affect the extent of statutory damages," Alsup added.
Company Response
Anthropic said in a statement it was pleased the court recognized that using published works to train large language models was consistent with copyright's purpose "in enabling creativity and fostering scientific progress."
The company disagreed with the decision to proceed to trial regarding its "acquisition of a subset of books and how they were used," and said it was "evaluating all options."
If the court had also dismissed the piracy allegations, it would have essentially given AI companies a green light to acquire training data through any means necessary, as long as the ultimate use qualified as fair use. This could have established a problematic precedent where the method of obtaining copyrighted material became irrelevant if the end use was deemed transformative.
Such an outcome might have encouraged more aggressive data acquisition practices across the industry and potentially undermined traditional content markets. It could have created a situation where piracy became a de facto acceptable method for obtaining training data, as long as companies could argue fair use for their AI systems.
Background
According to court documents, after internal concerns arose about using pirated books, Anthropic hired former Google Books executive Tom Turvey to obtain "all the books in the world" while avoiding legal issues.
Rather than seeking commercial licensing agreements with publishers, the company purchased millions of print books from retailers, many of them used, and scanned them into digital form. Alsup noted the company could have hired staff to create original content for training but that doing so would have "required spending more."
The authors who filed the lawsuit said Anthropic's actions made "a mockery of its lofty goals."
A More Nuanced Precedent
A completely different ruling could have triggered a wave of similar lawsuits against other major AI companies like OpenAI, Google, and Meta, potentially leading to industry-wide changes in training practices. We might have seen the development of standardized licensing frameworks or even legislative intervention to clarify the boundaries of fair use in AI training contexts.
The current mixed outcome—fair use for training but potential liability for piracy—creates a more nuanced precedent that maintains some protections for AI development while preserving content creators' rights against outright theft of their work.
About the Author
John K. Waters is the editor in chief of a number of Converge360.com sites, with a focus on high-end development, AI, and future tech. He's been writing about cutting-edge technologies and the culture of Silicon Valley for more than two decades, and he's written more than a dozen books. He also co-scripted the documentary film Silicon Valley: A 100 Year Renaissance, which aired on PBS. He can be reached at [email protected].