Google announced a way for publishers to opt out of having data from their websites used to train the company’s artificial intelligence (AI) models, while keeping that information accessible to people querying Google Search.
The new tool, Google-Extended, allows sites to continue to get scraped and indexed by crawlers like the Googlebot while avoiding having the data used to train AI models.
The two tools in question are Bard, Google’s conversational AI tool, and Vertex AI, Google’s machine learning platform for building and deploying generative AI-powered search and chat applications.
Google-Extended will let publishers manage whether their sites help to improve Bard and Vertex AI generative APIs.
Publishers can use the toggle to control access to content on a site. Google confirmed in July that it is training its AI chatbot, Bard, on publicly available data scraped from the web.
Google-Extended is available through robots.txt, the text file that tells web crawlers whether they can access parts of a site.
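For illustration, a robots.txt entry that blocks Google-Extended while leaving ordinary search crawling untouched could look like the following; this is a sketch of the standard robots.txt syntax, not text taken from Google’s announcement:

```
# Keep this site's content out of Bard and Vertex AI training
User-agent: Google-Extended
Disallow: /

# Googlebot continues to crawl and index the site as before
User-agent: Googlebot
Disallow:
```

Because Google-Extended is a separate user agent token rather than a separate crawler, blocking it does not affect how Googlebot indexes the site for Search.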
But as “AI applications expand, web publishers will face the increasing complexity of managing different uses at scale,” Google wrote in a blog post. “That's why we're committed to engaging with the web and AI communities to explore additional machine-readable approaches.”
The crawler has been added to the Google Search Central documentation on web crawlers.
It’s important to remember that Google-Extended uses the same directives as other user agents in the robots.txt file. Marketers who want to block Google from training on any site content can add a directive disallowing all or specific paths.
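Because Google-Extended follows standard robots.txt directives, a rule can be sanity-checked with Python’s built-in `urllib.robotparser` before deploying it. The rules and URL below are illustrative examples, not values from any real site:

```python
from urllib import robotparser

# Example robots.txt: block Google-Extended site-wide, allow Googlebot.
rules = """
User-agent: Google-Extended
Disallow: /

User-agent: Googlebot
Allow: /
"""

parser = robotparser.RobotFileParser()
parser.parse(rules.splitlines())
parser.modified()  # mark the rules as loaded so can_fetch() evaluates them

# Google-Extended is barred from the whole site...
print(parser.can_fetch("Google-Extended", "https://example.com/article"))  # False
# ...while Googlebot can still crawl and index the same page.
print(parser.can_fetch("Googlebot", "https://example.com/article"))  # True
```

This mirrors how a directive-compliant crawler reads the file: each user agent is matched against its own rule group, so blocking the AI-training token leaves search indexing unaffected.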