New York Times Says Halt: Paper Sues Microsoft, OpenAI Over 'Scraping'

The battle over scraping of news content by chatbots has brought a journalistic heavyweight into the court system. 

The New York Times has filed suit against Microsoft and OpenAI, charging that they used its copyrighted work to compete against it. 

The suit, on file with the U.S. District Court for the Southern District of New York, alleges that Microsoft and OpenAI singled out the Times in particular because of its wide newsgathering capabilities and digital strength.

The defendants utilized large-language models (LLMs) “that were built by copying and using millions of The Times’s copyrighted news articles, in-depth investigations, opinion pieces, reviews, how-to guides, and more,” the suit continues. 

It adds, “While Defendants engaged in widescale copying from many sources, they gave Times content particular emphasis when building their LLMs—revealing a preference that recognizes the value of those works.”



The filing continues, “Through Microsoft’s Bing Chat (recently rebranded as ‘Copilot’) and OpenAI’s ChatGPT, Defendants seek to free-ride on The Times’s massive investment in its journalism by using it to build substitutive products without permission or payment.”

It goes on, “Powered by LLMs containing copies of Times content, Defendants’ GenAI tools can generate output that recites Times content verbatim, closely summarizes it, and mimics its expressive style, as demonstrated by scores of examples. These tools also wrongly attribute false information to The Times.”

In reporting on the case, the Times notes that Microsoft has committed $13 billion to OpenAI and has incorporated the company’s technology into its Bing search engine.

The Times story states that the paper had not obtained comment from Microsoft or OpenAI.

In its complaint, the paper grimly observes, “If The Times and other news organizations cannot produce and protect their independent journalism, there will be a vacuum that no computer or artificial intelligence can fill.”

The suit alleges that the defendants “removed The Times’s copyright-management information in building the training datasets containing millions of copies of Times Works, including removing The Times’s copyright-management information from Times Works scraped directly from The Times’s websites and removing The Times’s copyright-management information from Times Works reproduced from third-party datasets.”

The complaint also states that, “As of December 21, 2023, the only text-based content sites ranking above The Times are Wikipedia, Wordpress, and Medium.” 

The Times asks the court to enjoin the defendants from continuing the alleged practices and to award statutory damages, compensatory damages and restitution.

Meanwhile, the News/Media Alliance praised the Times for taking the action. 

“The New York Times’s complaint demonstrates the value of quality journalism to AI developers,” says Danielle Coffey, president and CEO of the News/Media Alliance. “These companies repurpose and monetize news content, competing with the very industry they are benefiting from. Quality journalism and GenAI can complement each other if approached collaboratively, but using journalism without permission or payment is unlawful, and certainly not fair use.”

Coffey continues, “The value of quality journalism has been debated for years. We are at a point where the question is not whether quality journalism should be compensated, rather a question of how much. In the case of AI, copyright protected content used without authorization should be a priority in releasing these technologies to the public so that responsible innovation can live alongside responsible reporting.”

Barron's reported that the Times's stock was up 2.9% following the court filing.

"Investors possibly see a deal in sight where OpenAI with Microsoft pays the Times for its archives," Barron's adds.

