Most experts approve of anonymizing data: it’s one way to protect privacy and comply with GDPR. And we all value our anonymity. But anonymized data and anonymity are not the same thing, according to a new article in Nature Communications.
The authors — Luc Rocher, Julien M. Hendrickx and Yves-Alexandre de Montjoye — found that individuals can be re-identified through
reverse engineering. To use a phrase coined by privacy expert Martin Abrams, data providers can easily bring together “the shadow you and the real you.”
The authors say they validated a statistical model to “quantify the likelihood for a re-identification attempt to be successful, even if the dataset is heavily incomplete.”
Applying that model, they found that “99.98% of Americans would be correctly re-identified in any dataset using 15 demographic attributes.”
And an attacker “can, using our model, correctly re-identify an individual with high likelihood even if the population uniqueness is low.” The study challenges claims that low population uniqueness is sufficient to protect people’s privacy.
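To make "population uniqueness" concrete, here is a minimal Python sketch of how the share of unique records can be counted as more attributes are combined. It is not the authors' statistical model: the population is randomly generated, and the names `uniqueness` and `synthetic_population`, along with the population size and number of values per attribute, are assumptions chosen only for illustration.

```python
import random
from collections import Counter


def uniqueness(records, attrs):
    """Return the fraction of records whose combination of values
    on the given attribute indices is shared with no other record."""
    keys = [tuple(record[a] for a in attrs) for record in records]
    counts = Counter(keys)
    return sum(1 for key in keys if counts[key] == 1) / len(keys)


def synthetic_population(n, n_attrs, n_values, seed=0):
    """Build a made-up population: n records, each with n_attrs
    attributes drawn uniformly from n_values possible values."""
    rng = random.Random(seed)
    return [
        tuple(rng.randrange(n_values) for _ in range(n_attrs))
        for _ in range(n)
    ]


# Hypothetical population: 10,000 people, 15 attributes, 5 values each.
population = synthetic_population(n=10_000, n_attrs=15, n_values=5)

# Uniqueness can only stay the same or rise as attributes are added.
for k in (1, 3, 5, 10, 15):
    print(k, uniqueness(population, range(k)))
```

In a toy population like this, no one is unique on a single attribute, while nearly everyone is unique once all 15 are combined. Real demographic attributes are correlated and unevenly distributed, so actual figures would differ.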
The authors note that de-identification, “the process of anonymizing datasets before sharing them, has been the main paradigm used in research and elsewhere to share data while preserving people’s privacy.”
However, “numerous supposedly anonymous data sets have recently been released and re-identified,” the report continues.
The authors conclude that “even heavily sampled anonymized datasets are unlikely to satisfy the modern standards for anonymization set forth by GDPR and seriously challenge the technical and legal adequacy of the de-identification release-and-forget model.”