The debate over content scraping took a new turn on Friday when TollBit, a content licensing startup, alleged that artificial intelligence companies are bypassing a web standard used by
publishers to block scraping, Reuters reports.
The web standard is the Robots Exclusion Protocol, or robots.txt, which was created in the 1990s to prevent websites from being
overwhelmed by web crawlers, according to Reuters.
A Wired probe alleges that AI search startup Perplexity has likely bypassed efforts to block its web crawler via robots.txt,
Reuters continues. And TollBit says its analytics show that numerous AI agents are bypassing the robots.txt protocol.
At the same time, Forbes has accused Perplexity of plagiarizing
its stories—for instance, its article on former Google CEO Eric Schmidt’s drone project earlier this month.
Perplexity published its own story, containing “eerily
similar wording” and an illustration from a prior Forbes story on Schmidt, Forbes states.
Publishers Daily could not independently confirm this
charge at deadline. But Forbes continues that the post, “which looked and read like a piece of journalism, didn’t mention Forbes at all.”
"Without the ability to opt out of massive scraping, we cannot monetize our valuable content and pay journalists,” says Danielle Coffey, president of the News/Media Alliance,
according to Reuters. “This could seriously harm our industry.”
On another front, Coffey and others expressed wariness over Google’s AI-Assisted Search, which summarizes
articles in greater detail.
“What Google’s trying to do is summarize things in which then, potentially, if you see the summary you don’t necessarily
dig down into the links that the search engine presents,” said Dave Hatter, cybersecurity consultant for intrustIT, during a broadcast on WVXU News in Cincinnati. “This
could have a very negative effect on people’s traffic and ultimately ad revenue or whatever it is they hope to get out of those links.”
Coffey noted that
these companies need the “quality vetted content” produced by news organizations to inform and train their models.
“Then they compete with our original content, with our
audience, and could potentially put us out of business,” Coffey said. Given Google’s position as a dominant monopoly in search, “that could be catastrophic for us,” she
added.