Artificial intelligence (AI) companies that expect to crawl a Cloudflare-hosted site will now need permission from content owners.
The permission-based model announced today blocks AI scrapers from accessing content across millions of websites that Cloudflare protects -- about 24% of all sites across the internet.
Cloudflare, an infrastructure provider, also developed a protocol that gives developers of bots and AI agents a standard to identify themselves. The company believes it has created a new “fundamental economic model for the web.”
The permission and payment model marks a significant shift with major implications for the advertising and digital ecosystem.
The model also lets publishers set per-request prices on Cloudflare-supported sites, enforced through the HTTP 402 Payment Required status code, so they can decide the fate of their own content.
This allows content creators and publishers to control access and charge for content use, and helps protect copyrighted materials from unauthorized scraping and potential misuse.
The technology, announced Tuesday, has been implemented as a default setting to prevent AI companies from scraping content without consent or compensation to original creators.
Domains that are new to Cloudflare will block AI crawlers by default, unless the site owner chooses to opt in.
Website owners choose whether they want AI crawlers to access their content, and decide how companies can use it.
AI companies must clearly state each crawler's purpose, such as training, inference, or search, which helps website owners decide which crawlers to allow.
Will Allen, Cloudflare's head of AI control, privacy and media products, told MediaPost that many companies found the binary choice of simply blocking or allowing a crawler too limiting and felt they needed some type of monetization model.
The monetization model increases transparency across the content ecosystem for AI companies and creators.
AI bots must authenticate themselves, which lets websites identify them and gives creators and website owners new identification mechanisms and control over which crawlers they want to allow, the company said.
Publishers and content creators can also set a price for AI companies to access their content through a feature called Pay Per Crawl, a monetization tool that lets them charge AI firms for access to their data and set terms and prices for bot traffic.
Cloudflare's developers used an HTTP status code, the same type of code behind 404 “not found” pages, to create a paywall for AI agents.
The 402 code, one of the 4xx client error codes, signals that payment is required. Here it is returned to bots, which makes it significantly different from a publisher’s paywall for human readers.
"The 402 code has been baked into the HTTP codes, but not really used," Allen said. "The response code is sent to a specific crawler. The publisher can block them, let them through for free, or charge them. Prices are set by the publisher."
Each crawl costs an amount set by the individual publisher.
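To make the block, allow, or charge choice Allen describes more concrete, here is a minimal Python sketch of the decision a publisher-side server might make for each crawl request. Everything in it is hypothetical: the crawler names, the `POLICY` table, the `respond_to_crawler` function, and the `X-Crawl-Price` / `X-Crawl-Charged` header names are illustrative assumptions, not Cloudflare's implementation or API. It also assumes the crawler has already authenticated itself and may state the maximum price it is willing to pay per request.

```python
from decimal import Decimal
from typing import Optional, Union

# Hypothetical publisher policy: each identified crawler is either blocked,
# allowed for free, or charged a fixed per-request price in USD.
POLICY: dict[str, Union[str, Decimal]] = {
    "ExampleSearchBot": "allow",
    "ExampleTrainingBot": Decimal("0.01"),
    "ExampleScraper": "block",
}


def respond_to_crawler(
    crawler: str, max_price: Optional[Decimal] = None
) -> tuple[int, dict[str, str]]:
    """Return (HTTP status, response headers) for one crawl request.

    crawler   -- the (already authenticated) crawler's name
    max_price -- the most the crawler says it will pay, if anything
    """
    # Crawlers with no explicit rule are blocked, mirroring a block-by-default setup.
    rule = POLICY.get(crawler, "block")

    if isinstance(rule, Decimal):
        # Paid access: serve the content only if the crawler agreed to the price.
        if max_price is not None and max_price >= rule:
            return 200, {"X-Crawl-Charged": str(rule)}  # hypothetical header
        # Otherwise answer 402 Payment Required and quote the price.
        return 402, {"X-Crawl-Price": str(rule)}  # hypothetical header

    if rule == "allow":
        return 200, {}  # free access

    return 403, {}  # explicitly or implicitly blocked


if __name__ == "__main__":
    print(respond_to_crawler("ExampleTrainingBot"))  # (402, {'X-Crawl-Price': '0.01'})
    print(respond_to_crawler("ExampleTrainingBot", Decimal("0.05")))  # (200, {'X-Crawl-Charged': '0.01'})
    print(respond_to_crawler("ExampleSearchBot"))  # (200, {})
    print(respond_to_crawler("UnknownBot"))  # (403, {})
```

In this sketch a crawler that receives a 402 learns the price from the response and can decide whether to retry with a higher offer. Using `Decimal` instead of floats avoids rounding surprises when comparing very small per-request prices.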
The pay service is part of a revenue-sharing model with publishers, and targets builders of large language models such as Anthropic, Google, OpenAI, and Meta, or any company gathering data to train foundation models.
Allen declined to talk about the revenue-sharing model because it is still in its early stages of development.
A long list of publishers and technology companies support this change, including The Arena Group, The Associated Press, The Atlantic, Atlas Obscura, BuzzFeed, Condé Nast, Digital Content Next, Dotdash Meredith, Fortune, Gannett Media, Half Baked Newsletter, and Hyperscience.
IAB Tech Lab, Independent Media, International Center for Journalists, Internet Brands, Linkup, News/Media Alliance, Pinterest, Quora, Raptive, Reddit, SimpleFeed, Sky News Group, Snopes, SourceForge, Sovrn, Stack Overflow, StockTwits, SustainableMedia.Center, Third Door Media, TIME, Universal Music Group, Webflow, and Ziff Davis also are on this list.
In the past, search engines indexed content and directed users back to the original websites, generating traffic and ad revenue for their owners.
AI broke the model of rewarding creators who help users discover new and relevant information.
Today's AI crawlers collect text, articles, images and other content to generate answers without sending visitors to the original source.