Large language model (LLM) crawlers are harvesting content at scale for training and redistribution, often resulting in IP theft, SEO dilution and lost revenue.
Yet 68% of media and publishing sites are failing to detect basic bots, a 20% increase YoY, according to DataDome's 2025 Global Bot Security Report.
Fewer than 30% are partially protected (down 16% YoY), while less than 2% are fully protected, meaning they detect both basic and sophisticated AI-powered bots (down 4% YoY).
LLM crawler volume rose from 2.6% of verified bot traffic in January 2025 to 10.1% by August across DataDome's customer base.
“AI agents are rewriting the rules of online
engagement,” says Jérôme Segura, vice president of threat research at DataDome. “They mimic human behavior, spawn synthetic browsers, bypass CAPTCHAs, and adapt in real
time.”
Segura adds: “Traditional defenses, built to spot static automation, are collapsing under this complexity. Businesses can’t tell if the AI traffic they’re
seeing is good or bad, which leaves them both exposed to fraud and blind to opportunity.”
The study presents these findings:
Of the domains studied, “88.9% disallow
GPTBot in their robots.txt files, yet this measure offers little real protection,” the study notes. “AI-powered crawlers and browsers ignore these directives, rendering static blocking
strategies obsolete.”
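For reference, the robots.txt opt-out the study describes is a purely advisory directive. A minimal sketch looks like the following (GPTBot is the user agent named in the report; the wildcard record is illustrative):

```
# robots.txt — static opt-out directives
# Compliant crawlers honor these; the study notes AI-powered
# crawlers and browsers can simply ignore them.
User-agent: GPTBot
Disallow: /

# Illustrative: allow all other crawlers full access
User-agent: *
Disallow:
```

Because enforcement is entirely voluntary on the crawler's side, this is exactly the "static blocking" the report deems obsolete.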
At the same time, legacy defenses are failing: A mere 2.8% of websites were fully protected this year, versus 8.4% in 2024.
And AI-driven traffic doesn't stop at scraping. This year, 64% of AI bot traffic reached forms, 23% reached login pages, and 5% reached checkout flows. The result: new vectors for fraud, account takeover and compliance risk.
Who is suffering the most? High-risk sectors such as Government, Nonprofit and Telecom. In contrast, Travel & Hospitality, Gambling, and Real Estate had the highest combined rates of full and partial protection.
Overall, detection rates for sites with bot security tools deployed “topped out at just 42%, with some detecting only 6% of bot traffic, revealing
major gaps in real-world effectiveness even among providers claiming bot mitigation as a core capability,” the study states. “For media and publishing, who have mostly defaulted to
blocking AI-traffic, this is a significant issue.”