Google has been working on ways to make differential privacy more accessible and usable, and on Friday, Data Privacy Day, announced a milestone for its differential privacy framework.
The framework will allow any Python developer to process data with differential privacy -- a system for publicly sharing information about a dataset by describing the patterns of groups within it while withholding information about the individuals it contains.
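The basic idea can be sketched with the Laplace mechanism, the classic building block of differential privacy: add calibrated random noise to an aggregate statistic so that any one individual's presence barely changes the published result. The snippet below is an illustrative sketch, not the API of Google's library; the function names and toy dataset are hypothetical.

```python
import math
import random

def laplace_noise(scale: float) -> float:
    """Draw one sample from a Laplace(0, scale) distribution via inverse CDF."""
    u = random.uniform(-0.5, 0.5)
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def dp_count(records, epsilon: float, sensitivity: float = 1.0) -> float:
    """Differentially private count: the true count plus
    Laplace(sensitivity / epsilon) noise. Smaller epsilon means
    more noise and stronger privacy."""
    return len(records) + laplace_noise(sensitivity / epsilon)

# Toy dataset: one record per visit.
visits = ["user1", "user2", "user3", "user1"]
print(dp_count(visits, epsilon=1.0))  # close to 4, but randomized
```

Frameworks like the one announced handle the harder parts this sketch ignores, such as bounding how many records a single user can contribute and tracking the total privacy budget across queries.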
Google worked on the development for more than a year in partnership with OpenMined, an organization of open-source developers.
Its differential privacy library was previously available in three programming languages: C++, Java, and Go. Now it’s also available in Python.
The move means millions more developers, researchers, and companies can build applications with privacy technology, enabling them to gain insights and observe trends from their datasets while protecting and respecting the privacy of individuals, according to Miguel Guevara, product manager in the privacy and data protection office at Google.
“The library is unique as it can be used with Spark and Beam frameworks, two of the leading engines for large data processing, yielding more flexibility in its usage and implementation,” he wrote in a post.
Companies have already begun experimenting with new projects, such as showing a site’s most visited webpages on a per country basis in an aggregate and anonymized way.
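A use case like that can be sketched as noisy per-partition counts: group visits by (country, page), add Laplace noise to each count, and drop partitions whose noisy count falls below a threshold so that rarely visited pages are never revealed. This is a simplified illustration under the assumption that each user contributes at most one visit (sensitivity 1); the names and data are hypothetical, and production systems must also bound per-user contributions.

```python
import math
import random
from collections import Counter

def laplace_noise(scale: float) -> float:
    """Draw one sample from a Laplace(0, scale) distribution via inverse CDF."""
    u = random.uniform(-0.5, 0.5)
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def dp_page_counts(visits, epsilon: float = 1.0, threshold: float = 20.0):
    """Noisy visit counts per (country, page) partition.

    Assumes each user contributes at most one visit, so the count
    sensitivity is 1. Partitions whose noisy count is below the
    threshold are suppressed entirely.
    """
    counts = Counter((country, page) for country, page in visits)
    result = {}
    for key, true_count in counts.items():
        noisy = true_count + laplace_noise(1.0 / epsilon)
        if noisy >= threshold:
            result[key] = noisy
    return result

# Toy data: a popular page and a rarely visited one.
visits = [("US", "/home")] * 1000 + [("US", "/rare-page")]
print(dp_page_counts(visits))  # only frequently visited pages survive
```

The threshold step is what makes the published result safe to share per country: a page visited by only one or two users almost never clears it, so its existence is not disclosed.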
Google has also released a differential privacy tool that enables users to visualize and better tune the parameters used to produce differentially private information.
A paper published the same day shares the techniques Google used to efficiently scale differential privacy, which has become a standard for private data analysis, to datasets of a petabyte or more.
Along with the differential privacy framework, Google created Plume, a system built to address the challenges of running differential privacy at that scale. The paper describes several, sometimes subtle, implementation issues and offers practical solutions that, together, make an industrial-scale system for differentially private data analysis possible.
Plume is currently deployed at Google and is routinely used to process datasets with trillions of records.