The AI scraping issue is just not going away. Publications are lining up to challenge ChatGPT, the generative AI tool, with technology of their own, and to win fair compensation when their data is scraped.
This isn’t only about your headlines and articles popping up on someone else’s site: It’s about your content being used to train ChatGPT.
The discussion centers on compensation for the use of publishers’ content. Failing that, publishers are turning to technology of their own to put a halt to the process.
Since August, at least 535 news organizations have installed a blocker to prevent their stories from being snared and used in this way, the Washington Post reported on Friday.
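What does such a blocker look like? In most cases it is not elaborate software but a few lines in a site's robots.txt file asking OpenAI's crawler to stay away. A minimal sketch, assuming a publisher wants to shut out OpenAI's GPTBot crawler entirely, might read:

    User-agent: GPTBot
    Disallow: /

Crawlers that follow the robots exclusion protocol will skip the site; whether any given bot honors the request is, of course, up to its operator.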
What would a payment agreement look like? The Post adds that Shutterstock, the stock photo site, has a partnership to provide training data to OpenAI, and that it also maintains a Contributor Fund to compensate artists.
In a time when print sales are declining and digital remains dicey for some, many publishers could use this new form of revenue. And without such agreements, the courts could be clogged with a welter of litigation.
Can anything good be said of gen AI scraping? To answer that, we have to broaden the issue and look at how Google has responded to a class action lawsuit recently filed against it. If nothing else, its response shows that this is a complex issue, not so easily resolved. Google's filing argues:
“To realize the promise of this technology, Generative AI models must learn a great deal: for example, to communicate in human language, recognize context and connections
in data, and respond usefully on a multitude of subjects. Like a human mind, computer models require a great deal of training to learn these things. That means exposure to vast quantities of
information that is publicly available or otherwise lawful to use.”
The filing adds that “using publicly available information to learn is not stealing. Nor is it an invasion of
privacy, conversion, negligence, unfair competition, or copyright infringement.”
That’s one side of it, anyway.