Commentary

AI Bait: Can Content Crawlers Really Be 'Exhausted'?

Cloudflare has been trying to protect publishers from content scraping with an approach that, it says, simply exhausts AI crawlers.

Instead of blocking them, Cloudflare’s AI Labyrinth links crawlers that ignore “no crawl” directives to a series of fake, AI-generated pages that are “convincing enough to entice a crawler to traverse them,” the company said in a blog post earlier this year. 

The pages look real, but “this content is not actually the content of the site we are protecting, so the crawler wastes time and resources.”

It sounds like an AI version of the Rope-a-Dope. But aren’t AI crawlers capable of getting around such a ruse?

Not necessarily. When AI crawlers follow the created links, “they waste valuable computational resources processing irrelevant content rather than extracting your legitimate website data,” Cloudflare writes. This “significantly reduces their ability to gather enough useful information to train their models effectively.”


Why is this needed? Because traditional ways of blocking malicious bots “can alert the attacker that you are on to them, leading to a shift in approach, and a never-ending arms race,” the post continues.

So how is it done?

“To generate convincing human-like content, we used Workers AI with an open source model to create unique HTML pages on diverse topics.”

In doing so, Cloudflare must avoid generating inaccurate content that could spread disinformation.

It’s worth a try, and it has to be cheaper than litigation. It also sounds like more work for content creators.

Clearly, a new approach is needed. 

Cloudflare reports that it has seen “an explosion of new crawlers used by AI companies to scrape data for model training. AI Crawlers generate more than 50 billion requests to the Cloudflare network every day, or just under 1% of all web requests we see.”

But would-be scrapers can certainly build additional capacity and devise other ways around AI Labyrinth.

Perhaps in response to this, Cloudflare CEO Matthew Prince is promising another tool that will stop content scraping on a macro scale, Axios reports. Details on what it will be are scant. But Prince states that “every publisher you have ever heard of is on board,” Axios adds.
