Commentary

Beyond Privacy: Data for All and All for Data

Here's a thought to ponder over the long weekend: what if open access to the mountains of data so many of us are trying to hide or protect really could be understood as a good thing? What if the focus of the debate over data shifted from an obsession over privacy to a creative discussion of data's possible social and policy utility? Instead of focusing energy on limiting data use, how about redirecting energies to optimizing the sharing of data?

A bold and fascinating new perspective on the data debates is emerging this season from academia in the form of an article called "The Tragedy of the Data Commons," forthcoming in the Harvard Journal of Law and Technology. Jane Yakowitz, a visiting assistant professor of law at Brooklyn Law School, takes issue with some of the core preoccupations of privacy regulators and argues for a "data commons" in which anonymized public research data could be pooled and used by researchers, policy-makers, and scholars to understand ourselves better and plot smarter policy.

"Anonymized data is crucial to beneficial social research, and constitutes a public resource -- a commons -- under threat of depletion," she says. In her argument, opting out of sharing data is reducing the pool and accuracy of information that could be invaluable to policy. She uses as examples research that linked data sets to prove, for instance, the relationship between legalized abortion and dropping crime rates or the disproving of arguments about racial differences in education testing. "These studies and many others have made invaluable contributions to public discourse and policy debates, and they would not have been possible without anonymized research data -- what I call the 'data commons.'"

Yakowitz is talking specifically about noncommercial data, the data gleaned by researchers and the government from tax returns, medical records, test scores, etc. But her ideas could have impact on the larger discussion of all data. According to one report, she visited Google to discuss her ideas earlier this year.

The final paper is set to be published in the fall, but drafts have been circulating and generating discussion. The arguments are too involved to do them justice here. But among her stronger points is that regulations and legislation, even when they are designed to protect highly sensitive data like medical information, can unwittingly restrict good uses of the data. She reinforces a principle that many people in the commercial data field have known for years: you never know what you have. The ways in which data can be drilled, parsed, and combined for social benefit are not predictable, and they certainly are not clear to any individual protecting her data or implicit in any data set. We just don't know how information could be used for positive purposes until creative minds mix and match it. That is why she emphasizes creating a system of open access, where properly anonymized data is available without restriction to all.

"Today we get the worst of both worlds," she writes. "Data can be shared through licensing agreements to whomever the agency chooses, and privacy law and norms provide the agency with an excuse beyond reproach when the agency prefers secrecy to transparency."

On the subject of anonymization, she acknowledges there are risks involved in sharing data that theoretically might be tracked back to individual users. She unravels some of the myths surrounding anonymization and argues that the studies often cited to prove anonymized data can lead back to PII are misused and largely hypothetical. "In considering a public use dataset's disclosure risk, data archivists focus on marginal risks -- that is, the increase in risk of the disclosure of identifiable information, compared to the pre-existing risks independent from the data release," she says.

And she follows with a very interesting and provocative conclusion on the subject of anonymization: "Like any default hypothesis, the best starting point for privacy policy is to assume that reidentification does not happen until we have evidence, any evidence at all, that it does. Because there are lower hanging fruit for the identity thief and the behavioral marketer -- blog posts to be scraped and consumer databases to be purchased -- the thought that these personae non gratae are performing entropic deanonymization algorithms is implausible."

As I say, there is much more to this argument than I can do justice to here. On its face, it is a refreshing perspective that opens up the discussion of data beyond the narrow concerns of privacy alone. The full draft of Yakowitz's upcoming JOLT article is available via the Social Science Research Network.

1 comment about "Beyond Privacy: Data for All and All for Data".
  1. Jean Renard from TRM Inc., May 27, 2011 at 4:03 p.m.

    The problem with good data is that it ends up a powerful tool. I have rarely seen good data used for good, whereas I have seen it used for control and exploitation.

    If the governments that are being rocked with unrest had access to the kind of data proposed, they would find a way to act on it to prevent revolts. Until our humanity catches up to our technology as Einstein put it, I would caution against opening up even more floodgates to data. At least now there is some control, even if it is only via market competitiveness.
