The Federal Trade Commission and privacy advocates have challenged that premise, but it's not clear whether Web companies, or the public at large, is convinced that individuals can be identified based on "anonymous" data collection.
Now, two researchers at the University of Texas at Austin have authored a paper attempting to debunk once and for all the concept that "anonymous" means private. In the report, "De-anonymizing Social Networks," the authors say they were able to use "anonymous" information to unmask the identities of one in three Twitter users who also had accounts on Flickr.
"The main lesson of the paper is that anonymity is not sufficient for privacy when dealing with social networks," the report concludes.
The authors also cited some Web companies, including behavioral targeting company NebuAd, that attempted to claim they didn't pose privacy risks because they only dealt with anonymous information.
"Anonymity has been unquestionably interpreted as equivalent to privacy in several high-profile cases of data-sharing," state the authors, Arvind Narayanan and Vitaly Shmatikov. "The CEO of NebuAd, a U.S. company that offers targeted advertising based on browsing histories gathered from ISPs, dismissed privacy concerns by saying that 'We don't have any raw data on the identifiable individual. Everything is anonymous.'"
NebuAd's former CEO Bob Dykes stepped down soon after making that statement to The New York Times. The company then retreated from its plan to purchase information about Web users' activity from broadband providers.
The researchers say social networking sites "should stop relying on anonymization as the 'get out of jail' card insofar as user privacy is concerned."
While the specific research into social networking sites is new, some say it's always been possible -- online or offline -- to unmask some "anonymous" speakers, given sufficient information.
"When you leave a data trail behind you, there is always some potential that with some level of work, somebody can tie that to your real identity," said Jules Polonetsky, director of the think tank Future of Privacy Forum. "The more data you leave behind, the greater the probability that somebody could put together that information," he said.
For instance, in the book publishing world, a "literary forensics" specialist unmasked Joe Klein as the author of "Primary Colors" by analyzing the writing.
I agree with the premise that the more data one possesses on a specific individual, the better one's chances of identifying that individual. However, that assumes one possesses, or has access to, that data, and it depends on how the data is formatted. For instance, if one has a data warehouse that holds anonymous data segmented into a huge number of targeting buckets and used to target advertising, how could a malcontent gain access to it and manipulate it so that individuals are identified?
At some point in time, the "data owners (whoever they may be)" need to be trusted to keep that data secure. With Level Three or higher security, can data be stolen and manipulated into revealing its secrets? Probably not. With social network sites, can that data be compromised? Maybe, but with Privacy and Security Policy updates, this becomes more and more remote.
The bottom line is that the data owners need to be shown, one way or another, that they are responsible for data security and that if it is compromised, there are financial disincentives. Since it is in the interests of our industry to keep data from being manipulated so that it reveals individuals' identities, companies need to be vigilant in their enforcement of Policies. I work for a company involved with BT, and because of Policies and systems in place, the possibility of the data being compromised or manipulated into revealing PII or individual identities is nonexistent.
Another study that really brings nothing new to the table and once again brings forward unnecessary privacy concerns about online data collection.
Arvind Narayanan and Vitaly Shmatikov could have conducted a similar study on offline marketing tactics. How is this different from InfoUSA having employees sitting in county clerks' offices gathering information on new business starts? Or how about conducting a data cleansing technique utilizing USPS' NCOA list to create a New Mover file? There really is no difference, other than that in the offline world it is done very visibly and they intentionally seek out the PII, yet no privacy advocacy concerns exist.
The FTC and privacy advocates must find some middle ground to protect the 4th amendment while preserving the 1st amendment. More weight should not be applied to one side of the argument. This can be done by outlawing certain practices such as desktop application data collection, computer registry manipulation, and purchasing of ISP data. When you take these three data collection techniques out of the mix, all you are left with is data from the browser.
I lean towards letting companies like Microsoft and Mozilla, who understand security and privacy, lead this charge. They have the experience and knowledge base that our Congress and privacy advocates lack to address issues about rogue data collection.
Tim, I find your comments extremely accurate and relevant. Thanks for taking the time to share them.