Microsoft Exposes AI Bot Site Traffic

Microsoft Clarity has released a feature that shows how artificial intelligence (AI) crawlers, search bots, and automated agents interact with web content, long before AI agents deliver citation or referral traffic to publisher or content sites.

"Bot Activity," the feature launched Thursday, is geared toward website owners and digital professionals who need to identify whether traffic is generated by humans or bots.

It provides a way to identify early signals in AI content cycles.

These signals are valuable because they represent the earliest observable interaction between content and AI systems, occurring before grounding, citation, or referral activity.

Identifying the bots that crawl a website is often the earliest signal of how content might later be used by AI systems, whether it is summarized or cited in search results or served in AI assistants. Unlike traditional crawlers, AI systems can access content continuously, across many platforms, in significant volumes, and at high speeds.

Measurement is key to answering the new types of questions publishers and site owners face: which systems access their content, whether bot activity is productive or expensive, and which pages bots access most aggressively.

Microsoft's AI Bot Activity provides data on how verified bots interact with a site's content. Instead of treating crawler traffic as background noise, Clarity turns it into data publishers and marketers can measure and analyze. It shows how frequently bots crawl the site and which pages, paths, and resources they request.

The data identifies how often AI agents and bots access content and for what purposes, such as indexing, retrieval preparation, embedding generation, or other AI-driven workflows.

Microsoft data released in December shows how AI assistants have reshaped website traffic patterns. The research shows AI referrals grew 155% during an eight-month period. It examined traffic from ChatGPT, Microsoft Copilot, Google Gemini, and similar platforms after they generated referrals to source websites.

Bot Activity relies on server-side log data collected through CDN integrations, data that client-side analytics cannot see. 

Microsoft notes that connecting to a server or CDN integration may result in additional costs depending on the provider, cloud platform, traffic volume, and regional configuration.

Once connected, Clarity begins processing server-side logs to surface crawl activity. The Bot operator metric shows the proportion of site traffic generated by automated bots and agents rather than human users. By default, this view focuses on AI-related bots and crawlers.

The primary metric reflects the share of total requests that originate from bot activity. A high value indicates that automated access represents a significant portion of your overall traffic. These insights can support infrastructure planning, performance optimization, content accessibility decisions, and evaluations of whether bot activity is contributing to downstream value.
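As a rough illustration of what a request-share metric measures, the following Python sketch computes the fraction of logged requests whose user-agent string matches a known crawler token. It is not Clarity's implementation: the function names and the short token list are assumptions made for this example, and matching on user-agent substrings alone does not verify a bot the way a production system would.

```python
# Illustrative crawler tokens (an assumed, incomplete list for this sketch).
AI_BOT_MARKERS = ("GPTBot", "ClaudeBot", "PerplexityBot", "bingbot")


def is_bot(user_agent: str) -> bool:
    """Return True if the user-agent contains one of the crawler tokens."""
    ua = user_agent.lower()
    return any(marker.lower() in ua for marker in AI_BOT_MARKERS)


def bot_request_share(user_agents: list[str]) -> float:
    """Fraction of all requests (0.0 to 1.0) attributed to bots."""
    total = len(user_agents)
    if total == 0:
        return 0.0  # no traffic, so no bot share
    bot_count = sum(1 for ua in user_agents if is_bot(ua))
    return bot_count / total


# Hypothetical log sample: two crawler requests, two browser requests.
sample_user_agents = [
    "Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot)",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0",
    "Mozilla/5.0 (compatible; bingbot/2.0)",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15) Safari/605.1.15",
]
share = bot_request_share(sample_user_agents)
```

In this sample, half of the requests come from crawlers, so the share is 0.5.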

The Path requests metric highlights the pages and resources that receive the highest number of automated requests. Requests are aggregated at the path level and include content types such as HTML pages, images, JSON endpoints, XML files, and other assets.
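Path-level aggregation can be sketched in a few lines of Python. This is a minimal example under stated assumptions, not Clarity's method: the function name and the sample URLs are hypothetical, and the sketch simply drops query strings and fragments so that variants of one URL count toward the same path.

```python
from collections import Counter
from urllib.parse import urlsplit


def top_bot_paths(request_urls: list[str], n: int = 3) -> list[tuple[str, int]]:
    """Count bot requests per path and return the n most requested paths."""
    # urlsplit(...).path excludes the query string and fragment.
    counts = Counter(urlsplit(url).path or "/" for url in request_urls)
    return counts.most_common(n)


# Hypothetical bot requests taken from a server log.
bot_requests = [
    "/blog/post?utm_source=feed",
    "/blog/post",
    "/feed.xml",
    "/api/items.json?page=2",
    "/blog/post#comments",
]
top_path = top_bot_paths(bot_requests, n=1)
```

Here the three variants of `/blog/post` are grouped together, so that path leads with three requests.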
