Commentary

All The Private Chats Fit To Sift

by Wendy Davis, Staff Writer, 4 hours ago

Citing privacy concerns, artificial intelligence company OpenAI is asking a federal judge to reconsider an order requiring it to provide The New York Times and other newspapers with logs of 20 million conversations between users and ChatGPT.

"This data belongs to ChatGPT users all over the world -- families, students, teachers, government officials, financial analysts, programmers, lawyers, doctors, therapists, and even journalists -- whose private thoughts and confidential business information may now be exposed," OpenAI's lawyers argue in a letter sent Wednesday to U.S. District Court Magistrate Judge Ona Wang in New York.

"To be clear: anyone in the world who has used ChatGPT in the past three years must now face the possibility that their personal conversations will be handed over to The Times to sift through at will in a speculative fishing expedition," the company's lawyers add.

OpenAI is asking Wang to conduct further proceedings regarding the material, and to immediately halt the order to disclose the data on the grounds that turning over the information "will irreversibly harm" users' privacy.

The news organizations countered in a letter filed with Wang late Wednesday that OpenAI is simply trying to avoid turning over evidence.

"OpenAI’s ill-considered request is not about protecting user privacy. That concern is simply a pretext for yet another attempt to evade production of evidence," the organizations argue.

Wang issued the order Friday at the request of news organizations that are suing OpenAI over alleged copyright infringement. The Times and others claim OpenAI wrongly trains its large language models on news articles, and that it displays verbatim excerpts from articles in response to queries by users.

Lawyers for the news organizations told Wang they need the chat logs in order to analyze ChatGPT's outputs, arguing in an October 30 letter to Wang that the conversations will provide direct evidence of how users interact with the large language model.

Wang's order specifically requires OpenAI to disclose "anonymized" chat logs by Thursday, or within seven days of "completing the de-identification process."

A similar high-profile dispute came up earlier this century, in Viacom's copyright infringement lawsuit against Google's YouTube. In 2008, a federal judge presiding over that matter ordered Google to disclose records showing which users watched particular YouTube videos. Viacom intended to use data about viewing habits to prove that pirated clips were popular with users.

That order drew widespread condemnation by privacy advocates, and Google and Viacom later agreed to take steps to obscure user IDs and IP addresses by replacing them with anonymous identifiers. (The underlying lawsuit was ultimately settled in 2014.)

OpenAI suggests that a similar procedure won't protect users in this case.

"The logs at issue here are complete conversations: each log in the 20 million sample represents a complete exchange of multiple prompt-output pairs between a user and ChatGPT," the company argues. "Disclosure of those logs is thus much more likely to expose private information, in the same way that eavesdropping on an entire conversation reveals more private information than a 5-second conversation fragment."

The company points to a sworn declaration by a member of its technical staff, who said the company's de-identification procedure is designed to strip certain types of personally identifiable information, passwords and other "sensitive" data, but not content such as health or financial information.

OpenAI also said in a blog post on Wednesday that the Times' demand for information "disregards long-standing privacy protections, breaks with common-sense security practices, and would force us to turn over tens of millions of highly personal conversations from people who have no connection" to the litigation.

A spokesperson for the Times accused OpenAI of "fear-mongering," writing in an email to MediaPost that the company's post "misleads its users and omits the facts."

"No ChatGPT user’s privacy is at risk," the spokesperson stated, adding that OpenAI’s terms of service allow it to harness users' chats for training, and to disclose chats for litigation.

About the Author

Wendy Davis is a Senior Writer at MediaPost. You can reach Wendy at wdavis@mediapost.com