Nearly a dozen years after Internet pioneer Tim Berners-Lee warned that referrer headers could leak
information about Web users, such headers have become the hottest privacy issue to hit the courts.

In the last few weeks, Facebook, Zynga and Google have all been hit with potential class-action privacy
lawsuits. In each case, Web users allege that information that could identify them was leaked via referrer headers.

The allegations against Facebook and Zynga, a game developer, are relatively straightforward: those companies
allegedly transmitted users' Facebook IDs, which contained enough data to allow people to be identified, to advertisers.

The allegations against Google appear to be more complicated. The search company also allegedly transmits
information in referrer headers, but what Google allegedly passes along is users' queries, not their IDs. On their
own, however, queries don't necessarily contain enough information to let Web site operators figure out visitors'
identities. Even when a query consists of a name, the landing page operator doesn't know whether the user was
running a vanity search on his or her own name or searching for someone else.

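To make the mechanism concrete: when a user clicks a search result, the browser typically sends the address of the
results page in the Referer header (the HTTP specification's spelling), and that address carries the query. The short
Python sketch below is a minimal illustration, not anything drawn from the complaints, of how a landing page could read
the search terms back out. It assumes the query sits in a `q` URL parameter, as it did in Google's search result
addresses at the time; the function name `search_terms_from_referrer` and the sample URL are hypothetical.

```python
from urllib.parse import urlsplit, parse_qs

def search_terms_from_referrer(referrer: str) -> list[str]:
    """Return the search terms a landing page could read from a Referer value.

    Assumption: the search engine puts the query in a `q` parameter,
    as Google's search result URLs did; other engines may use other names.
    """
    query_string = urlsplit(referrer).query
    # parse_qs turns '+' into spaces and decodes percent-escapes;
    # an empty list means the referrer carried no `q` parameter.
    return parse_qs(query_string).get("q", [])

# Hypothetical Referer value a site might receive after a click from a results page
referrer = "http://www.google.com/search?q=jane+doe&hl=en"
print(search_terms_from_referrer(referrer))  # -> ['jane doe']
```

Note what the sketch does not recover: nothing in the header by itself says whether "jane doe" is the searcher or
someone the searcher was looking up, which is exactly the ambiguity that makes the Google case more complicated.
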
The complaint against Google, filed last week in federal district court in San Jose by online user Paloma Gaos,
references the AOL Data Valdez: AOL's decision to release three months' worth of search queries from 650,000 users.
Within days of the data breach, The New York Times "de-anonymized" user Thelma Arnold and ran a front-page profile of her.

That example certainly shows that separate pieces of "anonymous" information, taken together, can identify a specific
person. But the Data Valdez incident involved many search queries from the same individual. It's not clear that a
publisher who receives only a handful of queries from the same user will be able to compile enough information to
figure out that person's name.

Still, Google could do a lot more to tell search users that their queries -- and, in many
cases, their IP addresses -- will be passed along to their landing pages. Users could then at least make an informed decision about whether they really want to click on a result after putting their
names or other potentially sensitive information into the query box.