Commentary

Cloudflare Points Finger At Perplexity, Accusing Firm Of Stealth Activity

Cloudflare, which last month unveiled a system for blocking content scrapers, claims it sees stealth crawling behavior on the part of Perplexity, the search/answer engine.  

This is the same Perplexity that just signed a content licensing deal with Gannett and has similar arrangements with numerous other publishers. For now, Cloudflare's claims are accusations only.

But Cloudflare has de-listed Perplexity as a verified bot and “added heuristics to our managed rules that block this stealth crawling,” it says. 

What is so stealthy about it?  

“Although Perplexity initially crawls from their declared user agent, when they are presented with a network block, they appear to obscure their crawling identity in an attempt to circumvent the website’s preferences,” Cloudflare argues. “We see continued evidence that Perplexity is repeatedly modifying their user agent and changing their source ASNs to hide their crawling activity, as well as ignoring--or sometimes failing to even fetch--robots.txt files.” 

Cloudflare continues, “The Internet as we have known it for the past three decades is rapidly changing, but one thing remains constant: it is built on trust. There are clear preferences that crawlers should be transparent, serve a clear purpose, perform a specific activity, and, most importantly, follow website directives and preferences.”

Of course, Cloudflare concludes with a bit of a plug for its own service. 

“It’s been just over a month since we announced Content Independence Day, giving content creators and publishers more control over how their content is accessed,” the Cloudflare blog adds. “Today, over two and a half million websites have chosen to completely disallow AI training through our managed robots.txt feature or our managed rule blocking AI Crawlers.”

Perplexity had not responded to a request for comment at deadline. But Perplexity spokesperson Jesse Dwyer told TechCrunch that Cloudflare’s blog post is a sales pitch, and that the screenshots show no content was accessed. Moreover, Dwyer says the bot named in the blog "isn't even ours," TechCrunch reports.
