Commentary

AI Data Wipe: OpenAI Mistakenly Erases Content Needed In 'New York Times' Case

The lawsuit filed by The New York Times against OpenAI and Microsoft has taken a new turn with the erasure by OpenAI of documents needed to prove use of the plaintiffs’ content for training models.

OpenAI had provided the plaintiffs, including The New York Daily News and other so-called News Plaintiffs, with two dedicated virtual machines to help them determine usage of their content, according to a letter from attorneys for the Times to U.S. Judge Ona T. Wang.

But on Nov. 14, “all of News Plaintiffs’ programs and search result data stored on one of the dedicated virtual machines was erased by OpenAI engineers,” the letter continues.

The Times and other plaintiffs have “no reason to believe” that the erasure was intentional. But it has apparently caused them serious inconvenience.

“While OpenAI was able to recover much of the data that it erased, the folder structure and file names of the News Plaintiffs’ work product have been irretrievably lost,” the letter states. “Unfortunately, without the folder structure and original file names, the recovered data is unreliable and cannot be used to determine where the News Plaintiffs’ copied articles were used to build Defendants’ models.”

Don’t expect any quick resolution to this. It was not clear at deadline if fellow plaintiff the Center for Investigative Reporting also lost data. 

Meanwhile, the plaintiffs have been “forced to recreate their work from scratch using significant person-hours and computer processing time,” the letter continues. “The News Plaintiffs learned only yesterday that the recovered data is unusable and that an entire week’s worth of its experts’ and lawyers’ work must be re-done, which is why this supplemental letter is being filed today.”

The plaintiffs have asked the Court to “order OpenAI to identify and admit which of the News Plaintiffs’ works it used to train each of the GPT models.” 

In another development, Microsoft has asked the court to order disclosure by the Times of its efforts to develop generative AI tools. Microsoft’s letter states, “Any failed efforts by The Times to develop its own Generative AI system using its works would undermine its claims of harm.”

Microsoft is requesting a conference with the plaintiffs and Judge Wang.

The case is on file with the U.S. District Court for the Southern District of New York. 

 
