OpenAI sounds almost as if it has hurt feelings. It has replied to the copyright lawsuit filed against it by The New York Times by saying that it works with many news organizations
and that it wants to work with The Times.
These arguments were included in a blog post published on Monday, and reported by The
Verge. Of course, they have yet to be adjudicated in a court of law.
To recap, The Times filed suit last month, alleging that Microsoft and
OpenAI utilized large language models “that were built by copying and using millions of The Times’s copyrighted news articles, in-depth investigations,
opinion pieces, reviews, how-to guides, and more.”
What does OpenAI say in its own defense? For one thing, it points to early partnerships
with the Associated Press, Axel Springer, the American Journalism Project and NYU.
OpenAI also makes these arguments:
1. We collaborate with news organizations and
are creating new opportunities. The firm has met with dozens of them, as well as leading industry organizations like the News/Media Alliance, to explore opportunities, discuss their concerns, and provide
solutions.
2. Training is fair use, but we provide an opt-out because it’s the right thing to do. The fair use argument is supported by long-standing and widely accepted precedents. We
view this principle as fair to creators, necessary for innovators, and critical for US competitiveness.
Then why offer an opt-out? Because “legal right is less important to us than being
good citizens,” OpenAI says. “We have led the AI industry in providing a simple opt-out process for publishers (which The New York Times adopted in August 2023) to prevent our
tools from accessing their sites.”
OpenAI continues with its arguments:
3. “Regurgitation” is a rare bug that we are working to drive to zero. Memorization is a rare failure
of the learning process that we are continually making progress on, but it’s more common when particular content appears more than once in training data, like if pieces of it appear on lots of
different public websites. So we have measures in place to limit inadvertent memorization and prevent regurgitation in model outputs. We also expect our users to act responsibly; intentionally
manipulating our models to regurgitate is not an appropriate use of our technology and is against our terms of use.
4. The New York Times is not telling the full story.
OpenAI says the discussions for creating a partnership apparently fell apart in December and that it learned about the lawsuit by reading about it in The Times.
“Along the way, they had mentioned seeing some regurgitation of their content but repeatedly refused to share any examples, despite our commitment to investigate and fix any
issues.”
OpenAI adds that “the regurgitations The New York Times induced appear to be from years-old articles that have proliferated
on multiple third-party websites. It seems they intentionally manipulated prompts, often including lengthy excerpts of articles, in order to get our model to regurgitate. Even when
using such prompts, our models don’t typically behave the way The New York Times insinuates, which suggests they either instructed the model to regurgitate or cherry-picked their examples from
many attempts.”
“Despite their claims, this misuse is not typical or allowed user activity, and is not a substitute for The New York Times. Regardless, we are continually making
our systems more resistant to adversarial attacks to regurgitate training data, and have already made much progress in our recent models.”
This is quite a lot to respond to, assuming
The Times even plans a counter.