News Sites Push For More Robots.txt Control
A global group of publishers is rallying behind a set of extensions to the robots.txt protocol called the Automated Content Access Protocol (ACAP). Made up mainly of publishers and news organizations like the Associated Press, the consortium is fighting for more control over what content search engines can access.
ACAP currently covers only text and still images, but it lets publishers limit how long the engines can cache their content, enforce page-wide nofollows, and set other options. The protocol has been tested for functionality with French search engine Exalead.
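For context, plain robots.txt can only say which paths a given crawler may or may not fetch, which is the gap the publishers want to close. The sketch below is a rough illustration using Python's standard `urllib.robotparser`; the robots.txt contents, the crawler names, and the news.example.com URLs are all made up for the example, and nothing here is ACAP syntax.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a news site. The paths and crawler
# names are illustrative assumptions, not taken from any real publisher.
ROBOTS_TXT = """\
User-agent: *
Disallow: /archive/

User-agent: ExampleBot
Disallow: /
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt rules from a string instead of fetching them."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A generic crawler falls under the "*" rules: current stories are
# fetchable, anything under /archive/ is not.
story_ok = parser.can_fetch("SomeCrawler", "https://news.example.com/story.html")
archive_ok = parser.can_fetch("SomeCrawler", "https://news.example.com/archive/old.html")

# The hypothetical ExampleBot is shut out of the whole site.
examplebot_ok = parser.can_fetch("ExampleBot", "https://news.example.com/story.html")

print(story_ok, archive_ok, examplebot_ok)
```

The only answer the baseline protocol gives is a yes or no per crawler and path. It has no way to express a cache time limit or similar usage terms, which is the kind of control the ACAP backers are asking for.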
Publishers -- and more importantly, the engines -- will have to adopt the protocol on a large scale for it to gain traction. Google, for example, argues that ACAP hasn't been proven effective for publishers outside of the news media. And Danny Sullivan notes that while robots.txt is definitely in need of an upgrade, online retailers and blog sites might need provisions that ACAP doesn't cover.