Commentary

Dispelling The Myth Of 'Anonymous' Data

by Wendy Davis, Staff Writer, August 12, 2009

Until relatively recently, ad industry executives tended to talk about the differences between "personally identifiable" and "non-personally identifiable" information when they discussed privacy.

But in the last three years, it's become apparent that the difference between personally identifiable and non-personally identifiable information can be illusory. Thanks to AOL, we know definitively that computer users can be identified simply by examining their search queries. In 2006, AOL released search logs showing queries made by more than 650,000 members. While the company replaced members' screen names with numeric IDs, the queries themselves were sufficiently detailed that The New York Times was able to find and profile one "anonymized" user, Thelma Arnold, within days.

And thanks to Netflix, we also know that movie reviewers who post critiques of obscure films can be identified, even when they write pseudonymously.

Because people can piece together Web users' identities without directly collecting names or addresses, regulators are backing away from the idea that collecting personal data always requires more safeguards than collecting supposedly anonymous data.

But the court system has been slow to acknowledge just how quickly users can be identified based on supposedly anonymous information. In June, a federal judge in Seattle ruled that IP addresses aren't personal information. In that case, the court held that Microsoft didn't violate its user agreement by collecting users' IP addresses, even though the agreement said the company would only gather data that doesn't personally identify users.

Additionally, last summer a federal judge in New York ordered YouTube to provide Viacom with the IP addresses of users, as part of Viacom's copyright infringement lawsuit against the video-sharing site. The judge wrote at the time that IP addresses alone can't identify users. (The companies later agreed that Google would replace the actual IP address with a substitute.)

Now, a federal judge in Kentucky has ruled that a nursing student at the University of Louisville didn't reveal personally identifiable information when she posted information on MySpace about a patient who had just given birth to a baby girl.

"The blog post does not disclose the birth mother's name, address, social security number, or the like. It does not disclose her age, race, or ethnicity. The blog post does not contain 'financial' or 'employment related information' about the birth mother. It does not disclose where she was in labor," the court wrote.

Santa Clara University law professor Eric Goldman thinks the judge made the wrong call on that narrow point. "I'm confident that any savvy investigator could combine the blog post with other data sources and quickly identify the mom with a high degree of certainty," he writes.

On the other hand, the Citizen Media Law Project points out that medical personnel would never be able to discuss their professional experiences if that kind of post were held to violate confidentiality.

Still, it doesn't appear as if the judge in this case seriously balanced the risk of de-anonymization against the student's free speech right to blog about her work. Technology often moves faster than the legal system. But when judges are called on to determine matters involving online privacy, one would think they would at least keep up with what's been happening in the last few years, as opposed to relying on outdated definitions of personal information.

1 comment about "Dispelling The Myth Of 'Anonymous' Data".

Joshua Koran from VCLK, August 15, 2009 at 10:45 a.m.
How many pieces of anonymous data do you need to create PII? Depends if you define PII as "personally identifiable information" or "uniquely identifiable anonymous id".
There is a clear difference between a "uniquely identifiable anonymous id" like a cookie or IP address and a "uniquely identifiable personal id" like a name or address. The former cannot be used to identify the latter without a merger with PII.
If a log file contains searches which contain PII, there is still no way to know if the anonymous searcher is searching on their own or someone else's PII without a merger with a PII data source.
If a set of movie habits is joined across IMDB searching and Netflix ratings, in a small set of cases it can uniquely identify an anonymous id. However, one must still merge that id with another PII source to personally identify the movie fan behind the id and/or verify that the registered pseudonym is in fact their actual name. The advertising industry self-regulatory guidelines state that such a merger requires prior opt-in consent of the user.

About the Author

Wendy Davis is a Senior Writer at MediaPost. You can reach Wendy at wdavis@mediapost.com