Commentary

Anthropic To Train Claude On Chats As They Leak Into Google Search

Several writers filed a lawsuit against Anthropic last year alleging that the company infringed their copyrights by training the large language models behind its AI assistant Claude on pirated copies of their fiction and nonfiction books.

The books came from shadow libraries: unauthorized online repositories of books, academic articles, and other content.

Anthropic agreed to pay $1.5 billion to some 500,000 authors whose work was used to train the models without permission. The request for preliminary approval was filed Friday with a San Francisco federal judge. But this involuntary training, or copyright theft, is not what caught my attention; rather, it is the way the use of content to train large language models (LLMs) continues to erode consumer privacy.

Why would LLMs be trained on fiction books? Perhaps to improve their ability to understand and generate higher-quality creative writing.

Nonfiction provides factual information, while fiction teaches models the nuances of narrative and human experience. Still, fiction is fiction, and characters' reactions may not reflect how an actual human would respond in a given situation.

Executives must have known the court was getting close to a decision, because Anthropic recently updated its privacy terms, effective September 28. Users see the notice when logging into Claude.ai. The first sign of the update came in a blog post in late August.

The settlement is among the first in dozens of copyright lawsuits filed against AI companies such as OpenAI, Meta, and Midjourney, according to Bloomberg. All allege misuse of proprietary online content.

Anthropic wrote in a court filing that it felt “inordinate pressure” to cut a deal to avoid a potentially business-ending trial that could have put the company on the hook for as much as $1 trillion in damages, according to the report.

Emails about the change in terms went out last week to make sure users knew. Anthropic previously did not use consumer chat data to train its models. Now it wants to train its AI systems on user conversations and coding sessions, and plans to extend data retention to five years for users who do not opt out.

These changes apply only to consumer accounts on Claude Free, Pro, and Max plans. They do not apply to Claude for Work, API access, or other services covered by Anthropic's Commercial Terms or other agreements.

Then today, Forbes reported that Anthropic has become the third AI company, after OpenAI and xAI's Grok, whose users' chatbot conversations have found their way into Google search results.

"Unlike OpenAI and xAI though, Anthropic said it blocked crawlers from Google, ostensibly preventing those pages from being indexed," Forbes reported. "But despite this, hundreds of Claude conversations still became accessible in search results (they have [since] been removed)."

Anthropic spokesperson Gabby Curtis told Forbes that Claude conversations were visible on Google and Bing only because users had posted links to the conversations online or on social media.

“We give people control over sharing their Claude conversations publicly, and in keeping with our privacy principles, we do not share chat directories or sitemaps of shared chats with search engines like Google and actively block them from crawling our site,” Curtis told Forbes in an email.
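
For context, blocking a crawler typically means serving robots.txt directives that disallow it. As a minimal illustration, the snippet below uses Python's standard-library robotparser to check whether Googlebot is permitted to fetch a given page; the shared-chat URL is hypothetical, not a confirmed Claude path.

    # Check whether a crawler may fetch a URL, per the site's robots.txt.
    # The shared-chat path below is hypothetical, for illustration only.
    from urllib import robotparser

    rp = robotparser.RobotFileParser()
    rp.set_url("https://claude.ai/robots.txt")
    rp.read()  # fetch and parse the live robots.txt

    # Prints True if Googlebot is allowed to crawl the page, False if blocked.
    print(rp.can_fetch("Googlebot", "https://claude.ai/share/example-chat"))

Worth noting: robots.txt governs crawling, not indexing. A URL linked from elsewhere on the web can still surface in search results even when crawlers are disallowed, which is consistent with Forbes' finding that conversations appeared despite the block.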

Data retention also will change. For those who agree to let Anthropic use their data to train models, the company will retain that data for five years. The company said users retain "complete" control over how it uses the data.

"If you change your training preference, delete individual chats, or delete your account, we'll exclude your data from future model training," the email said. 

Anthropic also will restrict access for entities that are more than 50% owned by companies headquartered in unsupported regions, such as China, regardless of where those entities operate in the world.

A Chinese media outlet reported that Singapore-based Trae, an AI-powered code editor ByteDance launched for overseas users, is known to use OpenAI's GPT and Anthropic's Claude models. A number of Trae users have raised the question of refunds with Trae staff on developer platforms over concerns that their access to Claude will end.
