Commentary

Chatbot Ad IDs Share Data To Google, Microsoft, Other Analytics Providers

OpenAI, Microsoft and others have recently introduced ads in their chatbots, creating incentives to identify conversations and track user behavior for ad targeting and conversion measurement. But what happens to user privacy when companies begin tracking and analyzing these conversations?

A paper titled "Tracking Conversations: Measuring Content and Identity Exposure on AI Chatbots," written by professors at the University of California at Davis, shares findings on privacy across mobile applications, browser extensions and plugins, and chatbots embedded in websites.

The goal of this study was to analyze tracking behavior of the chatbot provider’s web interface during a user-initiated conversation.

Researchers used a controlled test prompt to capture and compare network traffic across 20 chatbots in normal and private chat modes.

The same prompt, "pregnancy test near me," was used in every test. All tests were conducted in Google Chrome because it provides a baseline for tracking caused by chatbot website design and third-party integrations, without browser interference.

The traffic analysis was conducted in four stages: preprocessing to decompress and decode encoded payloads, defining search targets across content and identity categories, matching those targets against captured traffic, and attributing matches to the receiving parties.
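The four stages can be pictured with a minimal Python sketch. This is an illustration under stated assumptions, not the researchers' tooling: the search targets, the host-to-owner map, the sample requests, and the function names are all hypothetical, and only gzip, URL encoding and base64 are handled.

```python
import base64
import gzip
import zlib
from urllib.parse import unquote_plus, urlsplit

# Stage 2 (hypothetical targets): what to look for, by category.
TARGETS = {
    "content": ["pregnancy test near me"],
    "identity": ["jane.doe@example.com"],
}

# Stage 4 input (hypothetical): which owner sits behind each receiving host.
OWNERS = {"tracker.example": "ExampleAds"}

# Hypothetical captured requests as (url, body) pairs.
SAMPLE = [
    ("https://tracker.example/collect?q=pregnancy+test+near+me", b""),
    ("https://tracker.example/e", gzip.compress(b'{"email":"jane.doe@example.com"}')),
]


def decoded_views(payload: bytes) -> list[str]:
    """Stage 1: return the payload as text, plus every decoding that succeeds."""
    if payload[:2] == b"\x1f\x8b":  # gzip magic bytes
        try:
            payload = gzip.decompress(payload)
        except (OSError, EOFError, zlib.error):
            pass  # not valid gzip after all; keep the raw bytes
    text = payload.decode("utf-8", errors="replace")
    views = [text, unquote_plus(text)]
    try:
        views.append(base64.b64decode(text, validate=True).decode("utf-8", errors="replace"))
    except ValueError:  # binascii.Error is a ValueError subclass
        pass
    return views


def analyze(requests):
    """Stages 3 and 4: match targets in each request, attribute hits to an owner."""
    findings = []
    for url, body in requests:
        owner = OWNERS.get(urlsplit(url).hostname or "", "unknown")
        haystack = "\n".join(decoded_views(url.encode()) + decoded_views(body)).lower()
        for category, needles in TARGETS.items():
            for needle in needles:
                if needle.lower() in haystack:
                    findings.append((owner, category, needle))
    return findings
```

Calling analyze(SAMPLE) flags the URL-encoded prompt in the first request and the gzip-compressed email address in the second, both attributed to the same hypothetical owner.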

Forty-seven unique third-party owners and 178 distinct chatbot-to-third-party domain pairs were identified during normal chat sessions.

Analytics services appeared on 17 of 20 chatbots. Advertising services appeared on 12 of 20. Despite appearing on fewer chatbots than analytics, advertising accounted for 73 of 178 observed pairs.

Interestingly, Gemini, Meta AI, and Duck.ai did not contact any external third parties. But this does not mean those services did not send data. Gemini contacted several Google-owned domains, such as Google Analytics, Google Tag Manager, and Google static content. Since these are Google domains, the researchers classified them as platform domains rather than as domains owned by third parties.

Meta AI loads content from Facebook's CDN. Duck.ai sends requests to duckduckgo.com. The study notes that Duck.ai strips user IP addresses before forwarding requests to AI providers.

In one session, however, SeaArt contacted 13 advertisers: A8, Amazon, Google, Meta, Microsoft, Outbrain, Pinterest, Quora, Reddit, TikTok, X, Yahoo Japan, and Yandex. Genspark contacted a comparable number in the same test.

Content captures what the user asks for in the conversation, including the user’s exact prompt, readable assistant-response text when present in captured payloads, keywords drawn from the prompt, generated chat titles derived from the prompt, chat identifiers, and unique URLs to individual chats.

Identity captures who the user is or how the user can be recognized across requests.

Data transferred includes, among other things, the account’s first name, last name, display name, email address, internal user identifiers when observable, and any first-party cookies sent to third parties.

What makes this a little scary for user privacy is that four chatbots -- Genspark, SeaArt, ChatOn, and Microsoft Copilot -- embed Microsoft Clarity. The first three in that list transmit conversations in plain text.

Chatbots assign a per-conversation URL that appears in the browser address bar. The study found that 15 of the 20 chatbots transmit this URL to at least one third party through standard analytics or advertising tags, reaching 29 destinations.

For example, Meta Pixel's dl parameter, Google Analytics's collect endpoint, Bing's UET tag, and DoubleClick's conversion tags all collect the current page URL by default. 
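To make the mechanism concrete, here is a small Python sketch of how a per-conversation URL can be spotted inside a tag request. It is only an illustration: the hosts tracker.example and chat.example, the chat path, and the helper name are hypothetical, and the only parameter name taken from the article is dl.

```python
from urllib.parse import parse_qsl, urlsplit


def leaked_page_urls(request_url: str, chat_origin: str) -> list[tuple[str, str]]:
    """Return (parameter, value) pairs whose decoded value is a URL on chat_origin."""
    query = urlsplit(request_url).query
    # parse_qsl percent-decodes values, so an encoded page URL becomes readable.
    return [(key, value) for key, value in parse_qsl(query) if value.startswith(chat_origin)]


# Hypothetical tag request carrying the current page URL in a "dl" parameter.
SAMPLE_REQUEST = (
    "https://tracker.example/tr?id=123&ev=PageView"
    "&dl=https%3A%2F%2Fchat.example%2Fc%2F8f3a2b"
)
```

Here leaked_page_urls(SAMPLE_REQUEST, "https://chat.example") returns the dl parameter together with the decoded chat URL, which is all a receiving party needs to tie the request to one conversation.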

One documented data path involves Genspark. According to the paper, it transmits the full prompt to Google Maps by embedding it in the URL of a map widget query.

The researchers suggest a privacy-by-design approach in which the chatbot passes along only the minimum information needed to serve the map and related information.
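One way to picture that kind of data minimization is the Python sketch below. It is an assumption-laden illustration, not the paper's proposal in code: the keyword-to-category table, the maps.example endpoint, and the q parameter are all hypothetical.

```python
from urllib.parse import urlencode

# Hypothetical mapping from prompt keywords to a coarse place category.
PLACE_CATEGORIES = {
    "pregnancy test": "pharmacy",
    "coffee": "cafe",
}


def map_widget_url(prompt: str, base: str = "https://maps.example/embed") -> str:
    """Build a map widget URL that carries a coarse category, never the raw prompt."""
    lowered = prompt.lower()
    for keyword, category in PLACE_CATEGORIES.items():
        if keyword in lowered:
            return f"{base}?{urlencode({'q': category})}"
    # No recognized place need: send no query at all.
    return base
```

With this approach the map provider would see a request for a pharmacy rather than the user's exact wording.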

The entire paper can be found here.