Commentary

AI Alert: Publishers Try To Block, Or Get Paid For, Content Scraping

The AI scraping issue is just not going away. Publications are lining up to block ChatGPT, the gen AI tool, by technological means, and to win fair compensation when their data is scanned.

This isn’t only about your headlines and articles popping up on someone else’s site: It’s about using your content to train ChatGPT.

The discussion appears to center on compensation for the use of publishers’ content. Failing that, publishers are using technology of their own to put a halt to the process.

Since August, at least 535 news organizations have installed a blocker to prevent stories from being snared and used in this way, the Washington Post reported on Friday.

How would a payment agreement look? The Post adds that Shutterstock, a stock photo site, has a partnership for providing training data to OpenAI, and that it also has a Contributor Fund for compensating artists.

At a time when print sales are declining and digital remains dicey for some, many publishers could use this new form of revenue. And without such agreements, the courts could be clogged with a welter of litigation.

Can anything good be said of gen AI scraping? To answer this, we have to broaden the issue and report how Google has answered a class action lawsuit recently filed against it. If nothing else, it shows that this is a complex issue, not so easily resolved.

“To realize the promise of this technology, Generative AI models must learn a great deal: for example, to communicate in human language, recognize context and connections in data, and respond usefully on a multitude of subjects. Like a human mind, computer models require a great deal of training to learn these things. That means exposure to vast quantities of information that is publicly available or otherwise lawful to use.”

The filing adds that “using publicly available information to learn is not stealing. Nor is it an invasion of privacy, conversion, negligence, unfair competition, or copyright infringement.”

That’s one side of it, anyway. 

 
