PROCHLO: Strong Privacy for Analytics in the Crowd

Andrea Bittau*, Úlfar Erlingsson*, Petros Maniatis*, Ilya Mironov*, Ananth Raghunathan*, David Lie‡, Mitch Rudominer◦, Ushasree Kode◦, Julien Tinnes◦, Bernhard Seefeld◦
*Google Brain  ‡Google Brain and U. Toronto  ◦Google

Abstract

The large-scale monitoring of computer users’ software activities has become commonplace, e.g., for application telemetry, error reporting, or demographic profiling. This paper describes a principled systems architecture, Encode, Shuffle, Analyze (ESA), for performing such monitoring with high utility while also protecting user privacy. The ESA design, and its PROCHLO implementation, are informed by our practical experiences with an existing, large deployment of privacy-preserving software monitoring.

With ESA, the privacy of monitored users’ data is guaranteed by its processing in a three-step pipeline. First, the data is encoded to control scope, granularity, and randomness. Second, the encoded data is collected in batches subject to a randomized threshold, and blindly shuffled, to break linkability and to ensure that individual data items get “lost in the crowd” of the batch. Third, the anonymous, shuffled data is analyzed by a specific analysis engine that further prevents statistical inference attacks on analysis results.

ESA extends existing best-practice methods for sensitive-data analytics, by using cryptography and statistical techniques to make explicit how data is elided and reduced in precision, how only common-enough, anonymous data is analyzed, and how this is done for only specific, permitted purposes. As a result, ESA remains compatible with the established workflows of traditional database analysis.

Strong privacy guarantees, including differential privacy, can be established at each processing step to defend against malice or compromise at one or more of those steps. PROCHLO develops new techniques to harden those steps, including the Stash Shuffle, a novel scalable and efficient oblivious-shuffling algorithm based on Intel’s SGX, and new applications of cryptographic secret sharing and blinding. We describe ESA and PROCHLO, as well as experiments that validate their ability to balance utility and privacy.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).
SOSP ’17, October 28, 2017, Shanghai, China
© 2017 Copyright held by the owner/author(s).
ACM ISBN 978-1-4503-5085-3/17/10.
DOI: https://doi.org/10.1145/3132747.3132769
Reprinted from SOSP ’17, October 28, 2017, Shanghai, China, pp. 1–19.

1. Introduction

Online monitoring of client software behavior has long been used for disparate purposes, such as measuring feature adoption or performance characteristics, as well as large-scale error-reporting [34]. For modern software, such monitoring may entail systematic collection of information about client devices, their users, and the software they run [17, 60, 69]. This data collection is in many ways fundamental to modern software operations and economics, and provides many clear benefits, e.g., it enables the deployment of security updates that eliminate software vulnerabilities [62].

For such data, the processes, mechanisms, and other means of privacy protection are an increasingly high-profile concern. This is especially true when data is collected automatically and when it is utilized for building user profiles or demographics [21, 69, 71]. Regrettably, in practice, those concerns often remain unaddressed, sometimes despite the existence of strong incentives that would suggest otherwise. One reason for this is that techniques that can guarantee privacy exist mostly as theory, as limited-scope deployments, or as innovative-but-nascent mechanisms [5, 7, 25, 28].

We introduce the Encode, Shuffle, Analyze (ESA) architecture for privacy-preserving software monitoring, and its PROCHLO implementation.¹ The ESA architecture is informed by our experience building, operating, and maintaining the RAPPOR privacy-preserving monitoring system for the Chrome Web browser [28]. Over the last 3 years, RAPPOR has processed up to billions of daily, randomized reports in a manner that guarantees local differential privacy, without assumptions about users’ trust; similar techniques have since gained increased attention [6, 7, 70, 74]. However, these techniques have limited utility, both in theory and in our experience, and their statistical nature makes them ill-suited to standard software engineering practice.

¹ PROCHLO combines privacy with the Greek word όχλος for crowds.

Our ESA architecture overcomes the limitations of systems like RAPPOR, by extending and strengthening current best practices in private-data processing. In particular, ESA enables any high-utility analysis algorithm to be compatible with strong privacy guarantees, by appropriately building on users’ trust assumptions, privacy-preserving randomization, and cryptographic mechanisms, in a principled manner, encompassing realistic attack models.

With ESA, software users’ data is collected into a database of encrypted records to be processed only by a specific analysis, determined by the corresponding data decryption key. Being centralized, this database is compatible with existing, standard software engineering processes, and its analysis may use the many known techniques for balancing utility and privacy, including differentially-private release [27].

The contents of a materialized ESA database are guaranteed to have been shuffled by a party that is trusted to remove implicit and indirect identifiers, such as arrival time, order, or originating IP address. Shuffling operates on batches of data which are collected over an epoch of time and reach a minimum threshold cardinality. Thereby, shuffling ensures that each record becomes part of a crowd, not just by hiding timing and metadata, but also by ensuring that more than a (randomized) threshold number of reported items exists for every different data partition forwarded for analysis.

Finally, each ESA database record is constructed by specific reporting software at a user’s client. This software ensures that records are encoded for privacy, e.g., by removing direct identifiers or extra data, by fragmenting to make the data less unique, or by adding noise for local differential privacy or plausible deniability. Encoded data is transmitted using nested encryption to ensure that the data will be shuffled and analyzed only by the shuffler and analyzer in possession of the corresponding private keys.

ESA protects privacy by building on users’ existing trust relationships as well as technical mechanisms. Users indicate their trust assumptions by installing and executing client-side software with embedded cryptographic keys for a specific shuffler and analyzer. Based on those keys, the software’s reporting features encode and transmit data for exclusive processing by the subsequent steps. Each of these processing steps may independently take measures to protect privacy. In particular, by adding random noise to data, thresholds, or analysis results, each step may guarantee differential privacy (as discussed in detail in §3.5).

  • [...] cryptographic primitives for secret-shared, blind data thresholding, reducing users’ trust requirements.
  • We evaluate PROCHLO on two representative monitoring use cases, as well as a collaborative filtering task and a deep-learning task, in each case achieving both high-utility and strong privacy guarantees in their analysis.

On this basis we conclude that the ESA architecture for monitoring users’ data significantly advances the practical state-of-the-art, compared to previous, deployed systems.

2. Motivation and Alternatives

The systems community has long recognized the privacy concerns raised by software monitoring, and addressed them in various ways, e.g., by authorization and access controls. Notably, for large-scale error reporting and profiling, it has become established best practice for systems software to perform data reduction, scrubbing, and minimization (respectively, to eliminate superfluous data, remove unique identifiers, and to coarsen and minimize what is collected without eliminating statistical signals) and to require users’ opt-in approval for each collected report. For example, large-scale Windows error reporting operates in this manner [34].

However, this established practice is ill-suited to automated monitoring at scale, since it is based solely on users’ declared trust in the collecting party and explicit approval of each transmitted report, without any technical privacy guarantees. Perhaps as a result, there has been little practical deployment of the many promising mechanisms based on pervasive monitoring developed by the systems community, e.g., for understanding software behavior or bugs, software performance or resource use, or for other software improvements [13, 39, 46, 59, 61, 77]. This is ironic, because these mechanisms rely on common statistics and correlations (not individual users’ particular data) and should therefore be compatible with protecting users’ privacy.

To serve these purposes, the ESA architecture is a flexible platform that allows high-utility analysis of software-monitoring data,
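
The three-step pipeline summarized in the abstract (encode, shuffle subject to a randomized threshold, analyze) can be illustrated with a small, non-cryptographic sketch. This is not the PROCHLO implementation: the function names, the report layout, and the choice of Gaussian noise on the threshold are assumptions made purely for illustration, and the nested encryption and SGX-based oblivious shuffling described in the text are omitted entirely.

```python
import random
from collections import Counter


def encode(raw_report):
    """Encode step (client side): strip identifiers, keep only the value."""
    # Hypothetical report layout: {"user_id": ..., "value": ...}.
    return raw_report["value"]


def shuffle_batch(encoded_batch, threshold, noise_scale, rng):
    """Shuffle step: keep only common-enough values, then permute the batch."""
    counts = Counter(encoded_batch)
    kept = []
    for value, count in counts.items():
        # Randomized threshold (an assumption of this sketch): Gaussian noise
        # is added to the cutoff independently for each distinct value.
        if count > threshold + rng.gauss(0.0, noise_scale):
            kept.extend([value] * count)
    rng.shuffle(kept)  # break any link to arrival order
    return kept


def analyze(shuffled_batch):
    """Analyze step: a plain histogram over the anonymous records."""
    return Counter(shuffled_batch)


def run_pipeline(raw_reports, threshold=20, noise_scale=1.0, seed=None):
    """Run encode, shuffle, and analyze over one batch of raw reports."""
    rng = random.Random(seed)
    encoded = [encode(r) for r in raw_reports]
    return analyze(shuffle_batch(encoded, threshold, noise_scale, rng))


if __name__ == "__main__":
    # 50 users report a common value; one user reports a rare value.
    reports = [{"user_id": i, "value": "crash_A"} for i in range(50)]
    reports.append({"user_id": 999, "value": "rare_crash"})
    print(run_pipeline(reports, seed=0))
```

In this sketch the value reported by a single user falls far below the threshold and is dropped before analysis, while the common value survives with its identifiers and arrival order removed. A real deployment would additionally need the encryption layers and trust separation between shuffler and analyzer that the text describes.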
