The copyright infringement case pursued by several publishers against OpenAI has taken a new turn, one bound to cost time and money.
The plaintiffs, including The New York Times, the New York Daily News and other titles owned by MediaNews Group and Tribune Publishing, accuse OpenAI of deleting millions of digital conversations despite
court orders to preserve them. They want OpenAI to be held in contempt and sanctioned.
In a letter to U.S. Magistrate Judge Ona T. Wang, the plaintiffs allege that
“OpenAI compressed tens of billions of consumer ChatGPT logs, which rendered them unsearchable and therefore functionally unavailable,” on 24 occasions since the filing of the
original New York Times complaint.
The news plaintiffs also include The Intercept, the Center for Investigative Reporting, and Ziff Davis. The
various cases have been consolidated.
This is occurring as web scraping, “which began as a tool for search indexing, has now mutated to a global extraction industry,”
and has become a boardroom crisis, writes Areejit Banerjee in Corporate Compliance
Insights.
Research estimates show that “the web-scraping market currently sits at $1.03 billion and is projected to nearly double to $2 billion by 2030,” Banerjee
notes.
Banerjee adds, “For boards, compliance officers and chief information security officers (CISOs), this is no longer a purely technical problem; it is
a governance issue that affects fair competition, fiduciary duty and the credibility of the organization’s data-protection commitments.”
Judge Wang had issued two orders regarding data deletion, according to the letter:
- A May order instructing OpenAI to “preserve and segregate all output log data that would otherwise be deleted on a going forward basis until further order of the Court”
- A November order directing OpenAI to produce 20 million deidentified consumer ChatGPT output logs
This is part of the discovery process that conceivably could determine the damages that OpenAI
might have to pay. But the plaintiffs are satisfied with neither the pace nor the extent of the data production to date.
After the court “ordered OpenAI
to produce 20 million logs over OpenAI’s vociferous and repeated objections, OpenAI substituted millions of conversations that it was ordered to produce with other conversations –
seemingly because it had deleted millions of the selected logs,” they contend.
In addition, OpenAI “made grossly overbroad and inappropriate
redactions, including instances where OpenAI redacted URLs corresponding to News Plaintiffs’ own websites and articles and to conversations in which users requested content from news
publishers,” they allege.
The letter to the judge also alleges that “OpenAI further: (a) destroyed virtually all of its API logs;
(b) represented that it does not have possession, custody, or control of the enterprise ChatGPT logs; and (c) destroyed billions of consumer ChatGPT logs where the user turned
ChatGPT’s preservation function off or were the subject of a user-initiated deletion.”
Of course, OpenAI has also signed lucrative licensing contracts with several
top publishers.
The case is on file with the U.S. District Court for the Southern District of New York.