Commentary

AI's Risky Business: New Content Suit Is Filed, Old Case Gets A New Opinion

It has been a watershed week in the ongoing battle over content scraping by AI vendors, and there is still one day left in the week as this is written.

In a case announced Thursday, a group of big-name publishers filed suit against a Canadian AI outfit called Cohere Inc., which reportedly is valued at over $5 billion.

The allegations are as follows:

“Cohere takes the creative output of Publishers, some of the largest, most enduring, and most important news, magazine, and digital publishers in the United States and around the world,” the complaint states. “Without permission or compensation, Cohere uses scraped copies of our articles, through training, real-time use, and in outputs, to power its artificial intelligence (“AI”) service, which in turn competes with Publisher offerings and the emerging market for AI licensing.”

It gets worse. “Not content with just stealing our works, Cohere also blatantly manufactures fake pieces and attributes them to us, misleading the public and tarnishing our brands,” the complaint adds.


Why and how is this happening? Here the complaint gets a little wonky. 

“Cohere copies Publishers’ works to train its suite of large language model (“LLM”) AI systems, called the ‘Command Family of products,’” the filing continues. “Inherent to Cohere’s goal of providing trustworthy, “verifiable answers,” its models are designed to consult “retrievable sources” including Publishers’ websites, a feature Cohere calls “crucial for use cases like content generation” or knowledge assistance.”

The publishers involved in the case, which is on file with the U.S. District Court for the Southern District of New York, include Advance Local Media, Condé Nast, The Atlantic, Forbes Media, The Guardian, Business Insider, LA Times, McClatchy Media Company, Newsday, Plain Dealer Publishing Company, POLITICO, The Republican Company, Toronto Star Newspapers and Vox Media.

“We are going to court to protect our rights,” says Danielle Coffey, president and CEO of the News/Media Alliance, to which the plaintiffs belong. “As generative AI becomes more prevalent, it is imperative that legal protections be enforced so that innovation can flourish responsibly. This not only protects investments in the creative process and developing intellectual property, but supports the quality of what users consume and the sustainability of the AI products themselves.” 

In a statement on its site, Condé Nast says, "Cohere’s behavior amounts to massive, systematic copyright infringement, as well as trademark infringement."

In another development, on Tuesday U.S. Circuit Judge Stephanos Bibas reversed his earlier opinion and handed Thomson Reuters a victory that is believed to be the first in this area.

Thomson Reuters offers a legal research platform called Westlaw, where paying users can access cases, statutes and law journal content.

The problem started when Ross, a new competitor to Westlaw, approached Thomson Reuters to license Westlaw content. 

Thomson Reuters refused because Ross was its competitor. 

“So to train its AI, Ross made a deal with LegalEase to get training data in the form of ‘Bulk Memos,’” Bibas writes. “Bulk Memos are lawyers’ compilations of legal questions with good and bad answers. LegalEase gave those lawyers a guide explaining how to create those questions using Westlaw headnotes, while clarifying that the lawyers should not just copy and paste headnotes directly into the questions.”

The scope of this operation? “LegalEase sold Ross roughly 25,000 Bulk Memos, which Ross used to train its AI search tool,” Bibas writes. In other words, Ross built its competing product using Bulk Memos, which in turn were built from Westlaw headnotes. When Thomson Reuters found out, it sued Ross for copyright infringement.

The judge granted summary judgment to Thomson Reuters on the fair use question and partial summary judgment on direct copyright infringement pertaining to the headnotes.

Why did Bibas reverse himself? 

“I thought that ‘Ross’s use might be transformative, creating a brand-new research platform that serves a different purpose than Westlaw.’ If that were true, then Ross would not be a market substitute for Westlaw. Plus, I worried whether there was a relevant, genuine issue of material fact about whether Thomson Reuters would use its data to train AI tools or sell its headnotes as training data. And I thought a jury ought to sort out ‘whether the public’s interest is better served by protecting a creator or a copier.’”

He adds, “In hindsight, those concerns are unpersuasive.”

Certainly, we haven’t heard the last of this case. But its significance may dim as some of the other lawsuits, like those against OpenAI, are adjudicated.

Last November, OpenAI won a signal victory when a federal court dismissed a lawsuit filed against it by Raw Story Media and Alternet Media. 

The court stated that defendants may, absent permission, reproduce or even create derivatives of plaintiffs' works – without incurring liability under Section 1202 – as long as defendants keep plaintiffs' copyright management information (CMI) intact.

Perhaps that decision will be reversed, too. 

 

 
