The copyright infringement case pursued by several publishers against OpenAI has taken a new turn, one bound to cost time and money.
The plaintiffs, including
the New York Times, the New York Daily News and other titles owned by MediaNews Group and Tribune Publishing, accuse OpenAI of deleting millions of digital conversations
despite court orders to preserve them. They want OpenAI to be held in contempt and sanctioned.
In a letter to U.S. Magistrate Judge Ona T. Wang, the plaintiffs allege that
“OpenAI compressed tens of billions of consumer ChatGPT logs, which rendered them unsearchable and therefore functionally unavailable,” on 24 occasions since the filing of the
original New York Times complaint.
The news plaintiffs also include The Intercept, the Center for Investigative
Reporting, and Ziff Davis. The various cases have been consolidated.
This is occurring as web scraping, “which began as a tool
for search indexing, has now mutated to a global extraction industry,” and has become a boardroom crisis, writes Areejit Banerjee in Corporate Compliance Insights.
Research estimates show that “the web-scraping market currently sits at $1.03 billion and is projected to nearly double to $2 billion by
2030,” Banerjee notes.
Banerjee adds, “For boards, compliance officers and chief information security officers
(CISOs), this is no longer a purely technical problem; it is a governance issue that affects fair competition, fiduciary duty and the credibility of the
organization’s data-protection commitments.”
Judge Wang had issued two orders regarding data deletion, according to the letter:
- A May order
instructing OpenAI to “preserve and segregate all output log data that would otherwise be deleted on a going forward basis until further order of the
Court.”
- A November order directing OpenAI to produce 20 million deidentified consumer ChatGPT output logs.
This is part of the
discovery process that conceivably could determine the damages that OpenAI might have to pay. But the plaintiffs are not satisfied with either the pace or the extent of the data production to
date.
After the court “ordered OpenAI to produce 20 million logs over OpenAI’s vociferous and repeated objections, OpenAI substituted millions of
conversations that it was ordered to produce with other conversations – seemingly because it had deleted millions of the selected logs,” they contend.
In addition, OpenAI “made grossly overbroad and inappropriate redactions, including instances where OpenAI redacted URLs corresponding to News Plaintiffs’ own websites and
articles and to conversations in which users requested content from news publishers,” they allege.
The letter to the judge also alleges
that “OpenAI further: (a) destroyed virtually all of its API logs; (b) represented that it does not have possession, custody, or control of the enterprise ChatGPT logs; and (c) destroyed
billions of consumer ChatGPT logs where the user turned ChatGPT’s preservation function off or were the subject of a user-initiated deletion.”
Of
course, OpenAI has also signed lucrative licensing contracts with several top publishers.
The case is on file with the U.S. District Court for the Southern District of New York.