In a landmark ruling, a federal judge said this week that Anthropic did not infringe copyright by digitizing books it had purchased, and then using them to train the chatbot Claude.
"The use of the books at issue to train Claude and its precursors was exceedingly transformative and was a fair use," U.S. District Court Judge William Alsup in the Northern District of California said in a 32-page decision issued Monday.
But Alsup sided against Anthropic on allegations that it downloaded millions of pirated books, ruling that the company was not entitled to claim fair use for those downloads.
The portion of the decision in Anthropic's favor marks the first time a judge has ruled that training artificial intelligence systems with copyrighted material is a fair use, according to copyright law expert Jeremy Goldman, a partner in the law firm Frankfurt Kurnit Klein & Selz.
"It's the most important ruling to date in this entire realm," Goldman says.
The decision comes as other companies -- including OpenAI, Google and Meta -- are facing copyright infringement suits over their alleged use of newspapers, books, magazines and other material to train chatbots.
Those cases are now "going to have to grapple with this ruling," Goldman says.
Alsup's decision comes in a dispute dating to last August, when the authors Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson alleged in a class-action complaint that Anthropic infringed copyright by using published books to train its large language model (LLM).
"Anthropic’s immense success has been built, in large part, on its large-scale copyright theft," the authors alleged.
Anthropic argued to Alsup that using books to train large language models is fair use -- regardless of whether the books were purchased legally or downloaded from piracy sites.
"Copyright law does not give plaintiffs the right to prevent Anthropic from making copies in order to study plaintiffs’ writing, extract uncopyrightable information from it, and use what it learned to create revolutionary technology that itself does entirely new things," the company argued in court papers.
Alsup sided with Anthropic regarding the books it bought.
"Anthropic used copies of authors’ copyrighted works to iteratively map statistical relationships between every text-fragment and every sequence of text-fragments so that a completed LLM could receive new text inputs and return new text outputs as if it were a human reading prompts and writing responses," Alsup wrote.
"The purpose and character of using copyrighted works to train LLMs to generate new text was quintessentially transformative," he added. "Like any reader aspiring to be a writer, Anthropic’s LLMs trained upon works not to race ahead and replicate or supplant them -- but to turn a hard corner and create something different."
But he also rejected Anthropic's fair use argument regarding books it downloaded from piracy sites, noting that not all of those books were even used to train its models.
"Not every book Anthropic pirated was used to train LLMs. And, every pirated library copy was retained even if it was determined it would not be so used," Alsup wrote.