Meta Sued For Allegedly Training Llama On Pirated Books

Meta and CEO Mark Zuckerberg were sued Tuesday for allegedly downloading books from "notorious pirate sites," and then copying them in order to train the large language model Llama.

Meta and Zuckerberg allegedly "illegally torrented millions of copyrighted books and journal articles from notorious pirate sites and downloaded unauthorized web scrapes of virtually the entire internet," publishers including Elsevier, Hachette, Macmillan and McGraw Hill, along with "Presumed Innocent" author Scott Turow, allege in a class-action complaint filed in U.S. District Court for the Southern District of New York.

"They then copied those stolen fruits many times over to train Meta’s multi-billion-dollar generative AI system called Llama," the plaintiffs added. "In doing so, defendants engaged in one of the most massive infringements of copyrighted materials in history."

advertisement

advertisement

Meta spokesperson Andy Stone says the company will fight the case "aggressively."

"AI is powering transformative innovations, productivity and creativity for individuals and companies, and courts have rightly found that training AI on copyrighted material can qualify as fair use," he stated Tuesday.

Among other allegations, the plaintiffs claim that Zuckerberg and other Meta executives "authorized and directed the torrenting of over 267 TB of pirated material -- equivalent to hundreds of millions of publications and many times the size of the entire print collection of the Library of Congress."

The complaint claims that Meta violates copyright in several ways -- including by downloading pirated copies, using the books to train Llama, and generating outputs that include verbatim copies as well as summaries of the originals.

"AI-generated books are already flooding the world’s largest book marketplace, Amazon, in volumes that materially displace human-authored works," the complaint alleges. "AI-produced titles have saturated Amazon’s Kindle ecosystem, and commentators have described the flood of AI-generated books on Amazon as a persistent, years-long crisis. They also devalue the original works on which the AI model was trained."

The plaintiffs add that Llama is already "competing with texts written by human authors for sales and attention."

For instance, the complaint alleges, one writer "released three books in three months and accidentally left in the published text an AI prompt asking it to 'rewrite' passages 'to align more with' the work of a specific, published author identified by name."

The complaint joins a growing roster of cases brought by publishers, music companies and authors against tech companies that allegedly train generative artificial intelligence systems on copyrighted material. Defendants in those suits have typically argued that their alleged activity was "transformative," and therefore protected by fair use concepts.

Meta itself was hit with a similar lawsuit in California by authors including Sarah Silverman and Junot Diaz.

Last year, U.S. District Court Vince Chhabria in the Northern District of California ruled that Meta didn't infringe copyright by downloading the books and using the texts to train Llama -- but Chhabria also said the ruling was limited to the specific facts of the case, and that copying material in order to train generative artificial intelligence would likely be illegal in other circumstances.

"By training generative AI models with copyrighted works, companies are creating something that often will dramatically undermine the market for those works, and thus dramatically undermine the incentive for human beings to create things the old-fashioned way," Chhabria wrote.

But a different federal judge, William Alsup in the Northern District of California, ruled last year that Anthropic didn't infringe copyright by training Claude on purchased books. Alsup said in that matter that using books to train Claude "was exceedingly transformative and was a fair use."

However, Alsup also ruled that Anthropic wasn't entitled to claim fair use regarding allegations that it downloaded millions of pirated books.

Anthropic later agreed to settle the matter for $1.5 billion, which came to around $3,000 per book to authors whose work was unlawfully downloaded.

Next story loading loading..