AI Startup Sued By Major Publishers Over Alleged Copyright Infringement

Cohere, an AI startup based in Canada, is being sued by 14 publishers including Condé Nast parent Advance Publications, the Los Angeles Times, The Atlantic, Forbes Media, The Guardian, McClatchy, Business Insider, Newsday, the Toronto Star and Vox Media.

All have filed a lawsuit against Cohere alleging the generative AI (GAI) startup has infringed on each publisher’s copyrighted material.

Publisher data has become gold for companies training AI models, and the demand won't stop anytime soon, because models continually need to be updated. Search engines may have set a bad precedent by crawling the web without compensating publishers beyond the clicks and traffic sent to their websites.

Without proper agreements, copyright infringement in this manner harms publishers’ referral traffic, the complaint says.


The complaint accuses Cohere of using about 4,000 copyrighted works to train its AI models and of displaying large portions of articles, sometimes in their entirety.

Pam Wasserstein, president of Vox Media, told The Wall Street Journal the suit is intended to establish terms for licensing journalism for AI training and other uses.

The lawsuit -- filed Thursday in the Southern District of New York -- also alleges that Cohere infringed on publishers’ trademarks by “hallucinating” content that was not really published by the outlets.

“Publishers say the AI-powered chatbots that provide comprehensive answers to user queries threaten their business models, which are reliant on advertising and subscription revenue,” reported the WSJ.

The company told the WSJ that the lawsuit is “misguided and frivolous” and Cohere stands by its practices for training AI models.

AI companies routinely say they prioritize controls that mitigate the risk of IP infringement and respect the rights of copyright holders, yet countless lawsuits allege otherwise.

Similar allegations have led to multimillion-dollar agreements between companies -- for example, deals between Reddit, which holds the data, and Google and OpenAI, which need the data to train their respective AI models.

The lawsuit also alleges that answers containing factually inaccurate information are attributed to the publishers.

For example, when prompted to pull an October 7, 2024 article in The Guardian titled "'The pain will never leave': Nova massacre survivors return to site one year on," which references Hamas's attack on the Israeli music festival, Cohere referenced an entirely different, nonexistent article about a 2020 attack in Nova Scotia, Canada.

Cohere said the engine pulled from an article published in June 2022 by The Guardian about returning to the scene of the 2020 mass shooting.

This week Cohere released a Secure AI Frontier Model Framework covering Risk Identification, Risk Assessment and Mitigation, Risk Assurance Mechanisms, Transparency, and Research and External Stakeholder Engagement.
