Commentary

OpenAI Slaps Back: Firm Responds To NY Times Lawsuit In Blog

OpenAI sounds almost as if it has hurt feelings. It has replied to the copyright lawsuit filed against it by The New York Times by saying it works with many news organizations, and that it wants to work with The Times.

These arguments were included in a blog post published on Monday, and reported by The Verge. Of course, they have yet to be adjudicated in a court of law. 

To recap, The Times filed suit last month, alleging that Microsoft and OpenAI utilized large language models “that were built by copying and using millions of The Times’s copyrighted news articles, in-depth investigations, opinion pieces, reviews, how-to guides, and more.”

What does OpenAI say in its own defense? For one thing, it points to early partnerships with the Associated Press, Axel Springer, the American Journalism Project and NYU.

OpenAI also makes these arguments: 

1. We collaborate with news organizations and are creating new opportunities. The firm says it has met with dozens of them, as well as leading industry organizations like the News/Media Alliance, to explore opportunities, discuss their concerns, and provide solutions.

2. Training is fair use, but we provide an opt-out because it’s the right thing to do. The fair use argument is supported by long-standing and widely accepted precedents. We view this principle as fair to creators, necessary for innovators, and critical for US competitiveness.

Then why offer an opt-out? Because “legal right is less important to us than being good citizens,” OpenAI says. “We have led the AI industry in providing a simple opt-out process for publishers (which The New York Times adopted in August 2023) to prevent our tools from accessing their sites.”

OpenAI continues with its arguments:

3. “Regurgitation” is a rare bug that we are working to drive to zero. Memorization is a rare failure of the learning process that we are continually making progress on, but it’s more common when particular content appears more than once in training data, like if pieces of it appear on lots of different public websites. So we have measures in place to limit inadvertent memorization and prevent regurgitation in model outputs. We also expect our users to act responsibly; intentionally manipulating our models to regurgitate is not an appropriate use of our technology and is against our terms of use.

4. The New York Times is not telling the full story. 

OpenAI says the discussions for creating a partnership apparently fell apart in December and that it learned about the lawsuit by reading about it in The Times. “Along the way, they had mentioned seeing some regurgitation of their content but repeatedly refused to share any examples, despite our commitment to investigate and fix any issues.”

OpenAI adds that “the regurgitations The New York Times induced appear to be from years-old articles that have proliferated on multiple third-party websites. It seems they intentionally manipulated prompts, often including lengthy excerpts of articles, in order to get our model to regurgitate. Even when using such prompts, our models don’t typically behave the way The New York Times insinuates, which suggests they either instructed the model to regurgitate or cherry-picked their examples from many attempts.”

“Despite their claims, this misuse is not typical or allowed user activity, and is not a substitute for The New York Times. Regardless, we are continually making our systems more resistant to adversarial attacks to regurgitate training data, and have already made much progress in our recent models.”

This is quite a lot to respond to, assuming The Times even plans a counter. 

1 comment about "OpenAI Slaps Back: Firm Responds To NY Times Lawsuit In Blog".
  1. Ed Papazian from Media Dynamics Inc, January 10, 2024 at 12:42 p.m.

    Ray, this certainly raises some important questions. If a company "publishes" a book, magazine, or individual article that is "protected" by copyright but is widely circulated on the internet and elsewhere, either in part or in full, can anyone simply take this material and use it in some manner as part of some commercially available information system without permission or payment to the copyright holder? Similar questions might arise for other works, such as movies or TV shows. If somebody posts an episode of the classic, top-rated, hit 1965 TV sitcom, "The Ed Papazian Show", on YouTube, can it be used in some way by somebody in a commercial manner without permission or payment to the production company that thought it had obtained copyright protection?

    It seems to me that at the very least the original creator, or copyright holder, should be credited in some very visible manner, but doing this so the credit is evident to the user is not as easy as it sounds. Do you provide large numbers of tedious footnotes throughout your hybrid work, or merely an appendix containing the credits, which most users won't bother to read?

    As a publisher, I will watch this case and others that are likely to develop with great interest. I suspect that some sort of formulation may be worked out which is based on the extent to which the copyrighted material is utilized: a few sentences may be deemed OK but not entire paragraphs or articles verbatim.
