"De-anonymizing Social Networks" is a turgid read to be sure, and our eyes glaze over when they start throwing in equations. But this is the sort of academic scrutiny of privacy that is getting funded (partially by National Science Foundation grants in this case) and will form a bedrock of research about the technical possibilities around identity protection and theft. The outlines of this research are worth reviewing because they reveal the contours of what privacy protection policies will have to address in dealing with social networks.
The basic approach taken by these University of Texas researchers involves overlaying anonymized data from multiple social networks in a way that ultimately reveals identity. The paper does not argue that PII is available directly to third parties, advertisers or malicious hackers, but that anonymized social graphs that are available can be combined to render identities.
The researchers claim that anonymized social graphs are available from many social networks to academic and government data miners, advertisers, third-party application developers and network aggregators. For instance, the paper cites a few instances where widget makers and application developers exposed sensitive information about users on the network. Whether it is through available APIs into a social network or just profile scraping, "it is important to understand what a malicious third party application can learn about members of a social network even if it obtains the data in an anonymous form," the researchers argue.
The aim of the research was to determine whether sensitive information about individuals could be extracted from "anonymous" social graphs. The researchers used social graph data from Twitter and Flickr, whose APIs they say include mandatory username fields and optional name and location fields. Also included in the research was a graph of friend relationships from the LiveJournal blogging service. They used a series of algorithms (which make our heads hurt just thinking about them) to find common people across two networks and then mapped out from there the social graphs to find common cliques.
We wish we could say with some high degree of confidence how these guys managed the trick of de-anonymizing social network members via algorithms, but we confess to some measure of befuddlement over the equations behind the test and the opaqueness of the terminology. Once the basic matches were established, however, the researchers claimed that in 30.8% of the cases their algorithms correctly re-identified previously anonymized members of the social graph. We're still not clear from the paper what the level of identification of individuals was in this experiment.
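For readers who want a concrete feel for the general idea, here is a toy sketch in Python. To be clear, this is our own simplification and not the researchers' algorithm: it assumes a handful of "seed" members already matched across two networks, treats friendships as undirected, and matches a new member only when enough already-matched friends point to one clear candidate. The names (`build_graph`, `propagate`, `min_score`, `min_margin`) and the tiny sample networks are hypothetical.

```python
def build_graph(edges):
    """Build an undirected friendship graph: member -> set of friends."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def propagate(named, anonymous, seeds, min_score=2, min_margin=1):
    """Spread a few known matches outward through shared friendships.

    named / anonymous: graphs as returned by build_graph.
    seeds: matches known in advance (named member -> anonymous member).
    A member is matched only if the best candidate shares at least
    min_score matched friends and beats the runner-up by min_margin.
    """
    mapping = dict(seeds)
    changed = True
    while changed:
        changed = False
        taken = set(mapping.values())
        for member in sorted(named):
            if member in mapping:
                continue
            # Count, for each unmatched anonymous member, how many of this
            # member's already-matched friends are friends with it.
            scores = {}
            for friend in named[member]:
                if friend in mapping:
                    for cand in anonymous[mapping[friend]]:
                        if cand not in taken:
                            scores[cand] = scores.get(cand, 0) + 1
            if not scores:
                continue
            ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
            best, best_score = ranked[0]
            runner_up = ranked[1][1] if len(ranked) > 1 else 0
            if best_score >= min_score and best_score - runner_up >= min_margin:
                mapping[member] = best
                taken.add(best)
                changed = True
    return mapping


# Hypothetical network where members use real names.
named = build_graph([
    ("alice", "bob"), ("alice", "carol"), ("bob", "carol"),
    ("bob", "dave"), ("carol", "dave"), ("carol", "erin"),
    ("dave", "erin"), ("erin", "frank"),
])
# Hypothetical "anonymized" network with the same friendships.
anonymous = build_graph([
    ("u1", "u2"), ("u1", "u3"), ("u2", "u3"),
    ("u2", "u4"), ("u3", "u4"), ("u3", "u5"),
    ("u4", "u5"), ("u5", "u6"),
])

result = propagate(named, anonymous, seeds={"alice": "u1", "bob": "u2"})
print(result)
```

In this toy case, two seed matches are enough to pin down carol, dave and erin, while frank stays unmatched because only one matched friend points to a candidate for him. Raising `min_score` or `min_margin` trades fewer matches for fewer wrong ones.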
The implications of this research are twofold, at least. First, the researchers seem to be demonstrating that simple algorithms can be applied by just about any advertiser or developer on readily available anonymous social graphs to render identities. The test claims to have successfully "re-identified" thousands of Twitter users simply by cross-referencing anonymized social graphs with Flickr. In fact, these researchers said their assumptions in making the algorithms were conservative and that in the real-world overlap of massive networks like Facebook and MySpace it should be much easier to re-identify higher percentages of common users, if only because so many people readily identify themselves on one or both of the networks. As the social networks get larger and their memberships overlap, the research argues, it will become much easier to thwart anonymity. Even anonymized data contains attributes that can make the process easy.
The researchers feel that the potential for abuse is clear and present. Any potential solution would appear to necessitate a fundamental shift in business models and practices and clearer privacy laws on the subject of Personally Identifiable Information.
Perhaps more to the point, the researchers are arguing that "anonymity" online does not equal privacy. In fact, they seem to be proving that privacy can be breached fairly easily, at least on social networks.
In its clearer online FAQ on the research, the team says it does not expect a technology solution to the problem. "First, the false dichotomy between personally identifiable and non-personally identifiable information should disappear from privacy policies, laws, etc.," the team says. "Any aspect of an individual's online personality can be used for de-anonymization, and this reality should be recognized by the relevant legislation and corporate privacy policies."
Just as the lines between PII and non-PII have themselves become blurred, the associations between anonymity and privacy are growing more complex. As Jules Polonetsky of the Future of Privacy Forum tells us in response to this new research, "It's important to understand that personal and anonymous are not black and white. There are gradients. Any time there are a number of pieces of data about one user, there is an increasing potential they can be identified. For most users it's rather unlikely that anyone will go through the trouble, but it is important to understand that as you build an electronic trail, identification becomes possible."
Hardy readers and cosine freaks can download the original paper or view an online version at http://randomwalker.info/social-networks/index.htm.