Differential Privacy Under Continual Observation

Cynthia Dwork
Microsoft Research
Mountain View, CA, USA
[email protected]

Moni Naor∗
Weizmann Institute of Science
Rehovot, Israel
[email protected]

Toniann Pitassi†
University of Toronto
Toronto, Canada
[email protected]

Guy N. Rothblum‡
Princeton University
Princeton, NJ, USA
[email protected]

ABSTRACT

Differential privacy is a recent notion of privacy tailored to privacy-preserving data analysis [10]. Up to this point, research on differentially private data analysis has focused on the setting of a trusted curator holding a large, static, data set; thus every computation is a "one-shot" object: there is no point in computing something twice, since the result will be unchanged, up to any randomness introduced for privacy. However, many applications of data analysis involve repeated computations, either because the entire goal is one of monitoring, e.g., of traffic conditions, search trends, or incidence of influenza, or because the goal is some kind of adaptive optimization, e.g., placement of data to minimize access costs. In these cases, the algorithm must permit continual observation of the system's state. We therefore initiate a study of differential privacy under continual observation. We identify the problem of maintaining a counter in a privacy-preserving manner and show its wide applicability to many different problems.

Categories and Subject Descriptors

F.2.0 [Theory of Computation]: Analysis of Algorithms and Problem Complexity—General

General Terms

Algorithms, Security, Theory, Privacy, Private Data Analysis

1. INTRODUCTION

Differential privacy is a recent privacy guarantee tailored to the problem of statistical disclosure control: how to publicly release statistical information about a set of people without compromising the privacy of any individual [10] (see [7, 8, 6] for motivation of the definition, history, basic techniques, and pointers to recent results). In a nutshell, differential privacy requires that the probability distribution on the published results of an analysis is "essentially the same," independent of whether any individual opts in to, or opts out of, the data set. (The probabilities are over the coin flips of the privacy mechanism.) Statistical databases are frequently created to achieve a social goal, and increased participation in the databases permits more accurate analyses. The differential privacy guarantee supports the social goal by assuring each individual that she incurs little risk by joining the database: anything that can happen is essentially equally likely to do so whether she joins or abstains.

Up to this point, research on differentially private data analysis has focused on the setting of a trusted curator holding a large, static, data set, in a permanently infrangible storage system. The curator either responds to queries (the interactive case) or prepares some sort of summary or synthetic database intended to answer all queries of a particular type (the non-interactive case).¹ A line of work investigates the way in which the accuracy of the responses to the queries may need to deteriorate with the number, sensitivity (a metric describing how much the addition or deletion of a member of the database can affect the outputs), and geometry of the query sequence [5, 11, 15, 16], and general techniques have been developed that in some cases match these bounds [14, 2, 10, 18, 16, 3, 13].

∗Research supported by a grant from the Israel Science Foundation. Part of this work was done while visiting MSR and the Center for Computational Intractability at Princeton University.
†Research supported by NSERC. Part of this work was done while visiting MSR.
‡Research supported by NSF Grants CCF-0635297, CCF-0832797 and by a Computing Innovation Fellowship. Parts of this work were done while visiting and interning at MSR.
¹The one exception considers a distributed setting, in which each data holder controls her own data and decides in which analyses to participate [9]. However, this is done by emulating the curator via secure multiparty computation.
In all previous work every computation is a "one-shot" object: there is no point in computing something twice, since the result will be unchanged, up to any perturbations introduced for privacy. However, many applications of data analysis involve repeated computations, either because the entire goal is one of monitoring, e.g., of traffic conditions, search trends, or incidence of influenza, typically with a goal of preparing some appropriate response, or because the goal is to optimize the placement of data to minimize access costs. In such an application the system is required to continually produce outputs. We therefore initiate a study of differential privacy under continual observation. We illustrate the design goals of our continual observation algorithms by appeal to a real-life example.

Continual Observation. Consider a website for H1N1 self-assessment.² Individuals can interact with the site to learn whether symptoms they are experiencing may be indicative of the H1N1 flu. The user fills in some demographic data (age, zipcode, sex), and responds to queries about his symptoms (fever over 100.4°F?, sore throat?, duration of symptoms?). We would like to continually analyze aggregate information of consenting users in order to monitor regional health conditions, with the goal, for example, of organizing improved flu response. Can we do this in a differentially private fashion with reasonable accuracy (despite the fact that the system is continually producing outputs)?

²https://h1n1.cloudapp.net is such a website. User-supplied data are stored for analysis only if the user consents.
We would expect a given individual to interact very few times with the H1N1 self-assessment site. For simplicity, let us say this is just once (the general case is an easy extension). In such a setting, it is sufficient to ensure event-level privacy, in which the privacy goal is to hide the presence or absence of a single event (interaction of one user with the self-assessment site). That is, the probability of any output sequence should be essentially the same, independent of the presence or absence of any single interaction with the site.

The privacy statement at the H1N1 self-assessment site encourages users to allow their data to be shared:

"This information can be very helpful in monitoring regional health conditions, plan flu response, and conduct health research. By allowing the responses to the survey questions to be used for public health, education and research purposes, you can help your community."

But the site also describes conditions under which the data may be disclosed:

"...if required to do so by law or in the good faith belief that such action is necessary to (a) conform to the edicts of the law or comply with legal process served on Microsoft or the Site; (b) protect and defend the rights or property of Microsoft and our family of Web sites; or (c) act in urgent circumstances to protect the personal safety of users of Microsoft products or members of the public."

Pan-Privacy. This raises the following question, orthogonal to that of continual observation: Is it possible to maintain a differentially private internal state, so that a consenting user has privacy even against a subpoena or other intrusion into the local state? Algorithms with this property are called pan-private, as they are private inside (internal state) and out (output sequence). This notion was recently introduced in [12]. The goal of pan-privacy is to provide a mathematically rigorous way of (essentially) eliminating the risk – of anything – incurred by sharing one's information, even in the presence of an intrusion, further encouraging participation. Another motivation is the prevention of "mission creep" for data, protecting the data curator from the (very real!) pressure to allow data to be used for purposes other than that for which they were collected. This fits well with streaming algorithms with small state, which can at most store a few data items, and [12] does focus on the streaming model. However, nothing prevents a streaming algorithm from storing in its state an event of a person of interest. Event-level pan-privacy rules this out, since the internal state must have essentially the same distribution, independent of whether an event of the person of interest has, or has not, appeared in the stream.

We remark that, since an intrusion can occur at an unpredictable time, designing pan-private algorithms is interesting even when the system will generate only a single output. This was studied in [12]. The current paper addresses pan-privacy under continual observation. We emphasize that the continual output and pan-privacy properties are interesting and non-trivial to achieve, and, generally speaking, are independent of each other.³ Their combination is, of course, more powerful than each individually.

³The extreme case of continual intrusion also implies continual output.

Adjacency Notions: User-Level vs. Event-Level Privacy. We have made the reasonable assumption that a single individual interacts with the H1N1 self-assessment site at most a small number of times (one). Such an assumption is not reasonable for other kinds of websites, such as a search engine. We ask: what kinds of statistics can we gather while preserving the privacy of an individual's entire history of accesses to the website? That is, what can we compute if we require that, no matter how many times an individual accesses the website, and no matter how these accesses are interleaved with those of others, the distribution on outputs, or, in the case of user-level pan-privacy, the distribution on pairs (internal state, output sequence), should be essentially the same, independent of the presence or absence of any individual's data in the stream? All the algorithms of [12] enjoy user-level pan-privacy. In this paper we construct several algorithms that exhibit user-level pan-privacy under continual observation.

2. OVERVIEW OF OUR RESULTS

Counter. We identify the problem of maintaining a counter in a privacy-preserving manner and show its wide applicability to many different problems. We construct a continual observation low-error pan-private counter estimating, at all times, the answer to the question "How many times has an event of interest happened so far?"

One-shot → Continual Observation. We describe a general transformation that, roughly speaking, takes any algorithm from a large class of (possibly user-level, possibly pan-private) one-shot algorithms and produces an algorithm with the same types of privacy guarantees under continual observation. This class contains algorithms for monotonically increasing or decreasing functions with a bounded range, as well as functions that are "close to" monotonic, provided the algorithm satisfies a condition we call (k, d)-unvarying. Roughly speaking, this condition says that the output can change by an additive d amount at most k times. The accuracy deteriorates as k and d increase.
Applications of the Counter. Counting is a fundamental computational primitive. We illustrate the utility of the counter by obtaining event-level pan-private algorithms for sophisticated tasks, such as following expert advice. We obtain an event-level, pan-private under continual observation, version of Follow the Perturbed Leader [17].

Negative Results. Pan-privacy comes at a price that increases with the number of intrusions: we prove that any counter providing pan-privacy against a number k of intrusions on the internal state must have error at least √k. In [12], Dwork et al. obtained an accurate, one-shot, pan-private algorithm for modular incidence counting. Unlike all the other problems addressed in that work, for this problem there is no reasonably accurate algorithm providing user-level privacy under continual observation, even without pan-privacy. This result generalizes to any function whose value "many times" increases by "a lot" (the accuracy deteriorates with the product of "many" and "a lot;" see Section 4.2).

3. DEFINITIONS AND TOOLS

3.1 Differential Privacy Basics

In the literature, a differentially private mechanism operates on a database, or data set. This is a collection of rows, where the data of an individual are held in a single row. Differential privacy ensures that the ability of an adversary to inflict harm (or good, for that matter) – of any sort, to any set of people – is essentially the same, independent of whether any individual opts in to, or opts out of, the dataset. This is done indirectly, simultaneously addressing all possible forms of harm and good, by focusing on the probability of any given output of a privacy mechanism and how this probability can change with the addition or deletion of any row. We will concentrate on pairs of databases (D, D′) differing only in one row, meaning one is a subset of the other and the larger database contains just one additional row.

Definition 3.1. [10] A randomized function K gives ε-differential privacy if for all data sets D and D′ differing in at most one row, and all S ⊆ Range(K),

Pr[K(D) ∈ S] ≤ exp(ε) × Pr[K(D′) ∈ S],   (1)

where the probability is over the coin flips of K.

The multiplicative nature of the guarantee implies that an output whose probability is zero on a given database must also have probability zero on any neighboring database, and hence, by repeated application of the definition, on any other database. The parameter ε is public, and its selection is a social question. We tend to think of ε as, say, 0.01, 0.1, or in some cases, ln 2 or ln 3.

Definition 3.2. [10] For f : D → R^d, the L1 sensitivity of f is ∆f = max_{D,D′} ||f(D) − f(D′)||₁ for all D, D′ differing in at most one row.

The Laplace distribution with parameter b, denoted Lap(b), has density function Pr(z|b) = (1/2b) exp(−|z|/b) and variance 2b². Taking b = 1/ε we have that the density at z is proportional to e^{−ε|z|}. This distribution has highest density at 0 (good for accuracy), and for any z, z′ such that |z − z′| ≤ 1 the density at z is at most e^ε times the density at z′. Finally, the distribution gets flatter as ε decreases: smaller ε means better privacy, so the noise density should be less "peaked" at 0 and change more gradually as the magnitude of the noise increases.

Theorem 3.1. [10] For f : D → R^d, the mechanism K that adds independently generated noise with distribution Lap(∆f/ε) to each of the d output terms enjoys ε-differential privacy.

Theorem 3.2. [10] The composition of an ε₁-differentially private mechanism and an ε₂-differentially private mechanism is at worst (ε₁ + ε₂)-differentially private.
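To make Theorem 3.1 concrete, here is a minimal Python sketch (ours, not from the paper) of the Laplace mechanism; the caller supplies the exact answer f(D) and its L1 sensitivity:

```python
import numpy as np

def laplace_mechanism(f_value, sensitivity, epsilon, rng=None):
    """Release f(D) with independent Lap(sensitivity/epsilon) noise added
    to each coordinate, as in Theorem 3.1."""
    rng = rng if rng is not None else np.random.default_rng()
    scale = sensitivity / epsilon
    noise = rng.laplace(loc=0.0, scale=scale, size=np.shape(f_value))
    return np.asarray(f_value, dtype=float) + noise

# Example: a counting query ("how many rows satisfy P?") has sensitivity 1,
# so epsilon = 0.1 means Lap(10) noise:
# noisy_count = laplace_mechanism(true_count, sensitivity=1.0, epsilon=0.1)
```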
3.2 Event-Level and User-Level (Pan-)Privacy

Speaking intuitively, we think of continual observation algorithms as taking steps at discrete time intervals; at each step the algorithm receives an input, computes, and produces output. We model this formally with a streaming algorithm. Computation proceeds in a sequence of atomic steps. At each step the algorithm receives an input from the stream, computes (changes state), and produces outputs. Thus the intuitive notion of "t time periods" corresponds to processing a sequence of t elements in the stream. Intrusions may only occur between atomic steps.

Because we are modeling real systems, where time is a factor (computers have clocks, as do the adversaries), we will occasionally need to model the fact that "nothing has happened" in a given time unit. We may do this by means of a "nothing happened" element. For example, the motivation behind the counter primitive below is to count the number of times that something has occurred since the algorithm was started (the counter is very general; we don't specify a priori what it is counting). This is modeled by an input stream over {0, 1}. Here, "0" means "nothing happened," "1" means the event of interest occurred, and for t = 1, 2, ... the algorithm outputs an approximation to the number of 1's seen in the length t prefix of the stream.

To define privacy, we need a notion of adjacent stream prefixes. The definition of adjacency in [12] permitted strings of radically different lengths to be adjacent. To be able to capture time, we modify the definition as follows. Let X be the universe of possible input symbols. Let S and S′ be stream prefixes (i.e., finite streams) of symbols drawn from X. Then Adj(S, S′) ("S is adjacent to S′") if and only if there exist x, x′ ∈ X so that if we change some of the instances of x in S to instances of x′, then we get S′. More formally, Adj(S, S′) iff ∃x, x′ ∈ X and ∃T ⊆ [|S|], such that S|_{T: x→x′} = S′. Here, T is a set of indices in the stream prefix S, and S|_{T: x→x′} is the result of replacing all the occurrences of x at these indices with x′ (note that w.l.o.g. we can assume S|_T contains only occurrences of x).

The definition of adjacency anticipates user-level privacy, where the behavior of the algorithm should remain the same independent of the number or pattern of instances of any single symbol x in the stream. A special case of adjacency is event-level adjacency, in which the number of instances of one symbol replaced by another is at most 1. Naturally, this will be used for defining event-level privacy. With these definitions, adjacent prefixes are always of the same length.

Definition 3.3. Let Alg be an algorithm. Let I denote the set of internal states of the algorithm, and σ the set of possible output sequences. Then algorithm Alg, mapping data stream prefixes to the range I × σ, is user-level pan-private (against a single intrusion) if for all sets I′ ⊆ I and σ′ ⊆ σ, and for all pairs of X-adjacent data stream prefixes S, S′,

Pr[Alg(S) ∈ (I′, σ′)] ≤ e^ε Pr[Alg(S′) ∈ (I′, σ′)]

where the probability is over the coin flips of Alg.

We typically omit the term ε-differential when discussing pan-privacy. However, every claim of pan-privacy must describe the privacy loss.

Event-level pan-privacy is defined analogously, with the requirement restricted to event-level adjacent streams. In the case of one-shot algorithms the output sequence has length 1. In the case of continual observation the output sequence is as long as the stream prefix processed by the algorithm. Ordinary user-level, respectively, event-level, privacy (as opposed to pan-privacy) is just differential privacy for user-level, respectively, event-level, adjacent streams.

Note that popular techniques from the streaming literature, such as Count-Min Sketch and subsampling, cannot be pan-private.
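The adjacency relation can be checked mechanically; the sketch below (ours, for intuition only) tests whether two equal-length prefixes are adjacent, with event-level adjacency as the special case of at most one replaced position:

```python
def adjacent(S, Sp, user_level=True):
    """Adj(S, S'): S' is obtained from S by changing some instances of one
    symbol x to another symbol x'. Event-level adjacency (user_level=False)
    restricts the number of changed positions to at most one."""
    if len(S) != len(Sp):
        return False  # with this definition, adjacent prefixes have equal length
    diffs = [(a, b) for a, b in zip(S, Sp) if a != b]
    if not diffs:
        return True  # the index set T may be empty
    if not user_level and len(diffs) > 1:
        return False
    # every changed position must replace the same x with the same x'
    return len(set(diffs)) == 1

# '0010' and '0110' are event-level adjacent (one '0' became '1');
# '0010' and '1110' are user-level adjacent (two '0's became '1's).
```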
4. EVENT-LEVEL PRIVATE COUNTING

In this section we introduce the web-counter problem, and present algorithms and lower bounds.

Monitoring events, such as the spread of an epidemic or the traffic at a website, frequently can be cast in terms of counting the number of "interesting" events that have occurred over time. Formally, we view time as a series of time periods, named by the natural numbers. In each time period there are two possibilities: an event occurs or no event occurs. We model this as a binary stream, with 1's corresponding to periods in which an event occurs and 0's corresponding to periods in which no event occurs. Informally, the web counter problem is to continually produce an accurate, but not necessarily exact, estimate of the number of events that have occurred "so far," while obscuring, for each time period t, whether or not an event actually occurs at t.

Definition 4.1. A randomized streaming algorithm yields a (T, α, β) counter if, in every execution, with probability at least 1 − β over the coin flips of the algorithm, simultaneously, for all 1 ≤ t ≤ T, after processing a prefix of length t, the current output contains an estimate of the number of 1's in the prefix that differs from the true weight of the prefix by at most an additive α amount.

Naturally, we want the error α to grow as slowly as possible with T and 1/β. In the remainder of this section we present an event-level counter⁴ for which α grows as roughly log(1/β) · log^{2.5} T. There is a narrow gap between this and the lower bound (see Section 4.1). The algorithm is pan-private against a single intrusion; it appears in Figure 1.

⁴Our original algorithm (see the full version of this paper or [6]) has worse dependence on T. Several people, including Ilya Mironov, Kobbi Nissim, and Adam Smith, immediately suggested an approach closely resembling the one presented here, based on the Bentley-Saxe method of converting static data structures to dynamic ones [1]. Our algorithm is slightly different, in order to ensure pan-privacy.

It will be convenient to number the time periods 0, 1, ..., T − 1, so that in time period t the algorithm is processing the (t + 1)st element in the input stream. Similarly, we will number the elements of the stream beginning with 0, so on stream x at time t the algorithm is processing x_t. The algorithm requires the following set of (time) segments. For i ∈ {1, ..., log T}, associate with each string s ∈ {0,1}^i the time segment S of 2^{log T − i} time periods {s ∘ 0^{log T − i}, ..., s ∘ 1^{log T − i}}. We say the segment begins in time s ∘ 0^{log T − i} and ends in time s ∘ 1^{log T − i}.

Theorem 4.1. The counter algorithm of Figure 1, run with parameters T, ε, β > 0, yields a (T, 4 log(1/β) · log^{2.5} T / ε, β) counter enjoying ε-differential pan-privacy against a single intrusion.

Proof. We first argue about the accuracy of the algorithm, and then prove event-level differential pan-privacy.

Accuracy. Let S be a segment. The noise value η_S is stored in the algorithm's internal state during the first time period contained in S, and is deleted during the atomic step occurring in the last time period contained in S. Thus, all the noise values for segments containing t are in memory in Step 3 when we generate time period t's output. We add 1 + log T independently chosen Laplace noise variables to the true answer: the log T segment noises, and the noise with which we initialize count. Each of these variables has variance 2ξ², where ξ = (1 + log T)/ε. Thus, in each round, with probability at least 1 − β/T, the sum of noises is within 2 log(1/β) + log T standard deviations, so by a union bound, with probability at least 1 − β, in all rounds simultaneously, this sum of noises has magnitude smaller than 4 log(1/β) · log^{2.5} T / ε.

Pan-Privacy. During an intrusion between atomic steps t∗ and t∗ + 1, that is, immediately following the processing of element t∗ in the input stream (recall that we begin numbering the elements with 0), the view of the adversary consists of (1) the noisy count, (2) the segment noise values η_S in memory when the intrusion occurs, and (3) the complete sequence of all of the algorithm's outputs. Consider adjacent databases x and x′, which differ in time t, say, without loss of generality, x_t = 1 and x′_t = 0, and an intrusion immediately following time period t∗ ≥ t (we will discuss the case t∗ < t below). We will describe a bijection between the vector of noise values used in executions on x and executions on x′, such that corresponding noise values induce identical adversary views on x and x′, and the probabilities of adjacent noise values differ only by an e^ε multiplicative factor. This implies ε-differential pan-privacy.

By assumption, the true count just after the time period t∗ ≥ t is larger when the input is x than it is when the input is x′. Fix an arbitrary execution E_x when the input stream is x. This amounts to fixing the randomness of the algorithm, which in turn fixes the noise values generated. We will describe the corresponding execution E_{x′} by describing how its noise values differ from those in E_x. The program variable count was initialized with Laplace noise. By increasing this noise by 1 in E_{x′}, the value of count just after step t∗ is identical in E_{x′} and E_x. The noise variables in memory immediately following period t∗ are independent of the input; these will be unchanged in E_{x′}.
We will make the sequence of outputs in E_{x′} identical to those in E_x by changing a collection of log T segment noise values η_S that are not in memory when the adversary intrudes, so that the sum of all noise values in all rounds up through t − 1 is unchanged, but the sum from round t on is larger by 1 for database x′ than for x. Since we increased the initialization noise for count, we now need to decrease the sum of segment noise values for periods 0, ..., t − 1 by 1, and leave unchanged the sum of segment noise values from period t on.

To do this, we find a collection of disjoint segments whose union is {0, ..., t − 1}. There is always such a collection, and it is always of size at most log T. We can construct it iteratively by, for i decreasing from ⌊log(t − 1)⌋ to 0, choosing the segment of size 2^i that is contained in {0, ..., t − 1} and is not contained in a previously chosen segment (if such a segment exists). Given this set of disjoint segments, we notice also that they all end at time t − 1 < t ≤ t∗, and so their noises are not in memory when the adversary intrudes (just following period t∗). In total (taking into account also changing the initial noise value for count), the complete view seen by the adversary is identical, and the probabilities of the (collection of) noise values used for x and x′ differ by at most an e^ε multiplicative factor.

Note that we assumed t∗ ≥ t. If t∗ < t then the initial noise added to count in E_{x′} will be the same as in E_x, and we need to add 1 to the sum of segment noises in every time period from t through T (the sum of segment noises before time t remains unchanged). This is done as above, by finding a disjoint collection of at most log T segments that exactly covers {t, ..., T − 1}. The noise values for these segments are not yet in memory when the intrusion occurs in time t∗ < t, and the proof follows similarly.

We note also that in every round individually (rather than in all rounds simultaneously), with all but β probability, the error has magnitude at most O(log(1/β) · log^{1.5} T / ε).

Counter(T, ε)

Initialization. Initialize ξ = (1 + log T)/ε, and sample count ∼ Lap(ξ).

Segments. For i ∈ {1, ..., log T}, associate with each string s ∈ {0,1}^i the time segment S of 2^{log T − i} time periods {s ∘ 0^{log T − i}, ..., s ∘ 1^{log T − i}}. The segment begins in time s ∘ 0^{log T − i} and ends in time s ∘ 1^{log T − i}.

Processing. In time period t ∈ {0, 1, ..., T − 1}, let x_t ∈ {0, 1} be the t-th input bit:

1. count ← count + x_t;
2. For every segment S which begins in time t, sample noise η_S ∼ Lap(ξ);
3. Let S₁, ..., S_{log T} be the log T segments that contain t. Output count + Σ_{i=1}^{log T} η_{S_i}.
4. For every segment S that ends in time t, erase η_S.

Figure 1: Event-Level Counter Algorithm
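For concreteness, the following Python sketch is our rendering of Figure 1 (assuming T is a power of two); each segment is identified with a binary prefix s of the current time written in log T bits, and the class and method names are ours, not the paper's:

```python
import math
import numpy as np

class PanPrivateCounter:
    """Sketch of the Figure 1 counter. T must be a power of two; a segment
    for prefix s covers all times whose log T-bit representation starts with s."""

    def __init__(self, T, epsilon, rng=None):
        self.logT = int(math.log2(T))
        self.rng = rng if rng is not None else np.random.default_rng()
        self.scale = (1 + self.logT) / epsilon           # xi = (1 + log T)/eps
        self.count = self.rng.laplace(scale=self.scale)  # count ~ Lap(xi)
        self.segment_noise = {}                          # prefix s -> eta_S

    def process(self, t, x_t):
        """One atomic step: ingest x_t and return the noisy output for time t."""
        self.count += x_t
        bits = format(t, "0{}b".format(self.logT))
        prefixes = [bits[:i] for i in range(1, self.logT + 1)]
        # Step 2: sample eta_S for every segment beginning at t (all-zeros suffix)
        for s in prefixes:
            if bits[len(s):] == "0" * (self.logT - len(s)):
                self.segment_noise[s] = self.rng.laplace(scale=self.scale)
        # Step 3: output count plus the noises of the log T segments containing t
        out = self.count + sum(self.segment_noise[s] for s in prefixes)
        # Step 4: erase eta_S for every segment ending at t (all-ones suffix)
        for s in prefixes:
            if bits[len(s):] == "1" * (self.logT - len(s)):
                del self.segment_noise[s]
        return out
```

At any moment at most log T segment noises are held in memory, which is what the pan-privacy argument in the proof of Theorem 4.1 exploits.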

4.1 A Logarithmic (in T) Lower Bound

Given the upper bound of Theorem 4.1, where the error depends only poly-logarithmically on T, it is natural to ask whether any dependence is inherent. In this section we show that a logarithmic dependence on T is indeed inherent. The proof is omitted from this extended abstract.

Theorem 4.2. Any differentially private event-level algorithm for counting over T rounds must have error Ω(log T) (even with ε = 1).

4.2 Lower Bound for Counting with Intrusions

Theorem 4.3. Any counter that is ε-pan-private under continual observation while tolerating n intrusions must have additive error Ω(√n) with probability at least 1/2 − 2δ, for any δ > 0.

The details of the proof are omitted for lack of space. We provide some intuition and the statement of the key lemma. Randomized response is a popular technique in the social sciences for obtaining answers to questions about embarrassing or even illegal behaviors [19]. We can abstract randomized response as follows. Consider a question for which there are two possible answers, say, {0, 1}. We define two trivial distributions: Zero and One, which place all their weight, respectively, on 0 and 1. We now define two answer distributions A₀ and A₁. A respondent whose answer is b ∈ {0, 1} chooses from A_b. Both A₀ and A₁ are convex combinations of Zero and One. The intuition for randomized response is that the probability of producing any specific value v ∈ {0, 1} when drawing from A₀ is sufficiently close to the probability of producing v when drawing from A₁ to provide "plausible deniability" to the respondent, while the statistician, knowing the descriptions of A₀ and A₁, can "reverse engineer" the responses to obtain statistical information about the distribution of true answers. This completes the abstract description of randomized response.
The intuition for our proof is that any counter that provides privacy is carrying out some form of randomized response.

Proof. Let A be an ε-pan-private algorithm for counting the number of 1's in x, |x| = n, with continual intrusions. At each step, A reads the next bit, x_i, of x, updates its state from s_{i−1} to s_i, and must reveal the entire state s_i (including the updated count).

Fix a constant k that depends on δ. We define two distributions on inputs of length n: I₀ and I₁. I₀ puts all of its weight on the all-0 string. A string is drawn from I₁ as follows. For each bit position i, set it to 0 with probability 1 − 1/(k√n) and set it to 1 with probability 1/(k√n). We will show that for any continual-intrusion pan-private algorithm A, the statistical distance between D₀ and D₁ is at most δ, where D₀ is the distribution on states when running A on I₀, and D₁ is the distribution on states when running A on I₁. Thus there exist two input sequences: s₀, the all-0 input, and s∗, an input with O(√n) 1's and the rest zeroes, such that the statistical distance between the distributions of states when run on s₀ versus s∗ is at most δ. This in turn implies that for one of these two inputs, the algorithm must make an additive Ω(√n) error with probability at least 1/2 − 2δ. (If on input s∗ A outputs a count that is smaller than k′√n with probability at least 1/2, then we are done. Otherwise A outputs a count that is at least k′√n with probability at least 1/2 on s∗. In this case, since the statistical distance between the states on s₀ and s∗ is at most δ, this implies that with probability at least 1/2 − 2δ, A outputs a count that is at least k′√n on s₀, completing the argument.)

It is left to show that for any ε-pan-private algorithm A with continual intrusion, the statistical distance between D₀ and D₁ is at most δ. Let p^s₀(x) be the probability, over the random coin tosses of algorithm A, of going to state x from state s upon seeing a 0. Similarly, let p^s₁(x) be the probability of going to state x from state s upon seeing a 1. We want to compare two distributions: the distribution D₀ induced on the possible states of A when running A, starting in s₀, on the distribution I₀ for n steps, and the distribution D₁ induced on the states of A when running A on I₁ for n steps, still starting from s₀. Let q^s₀ = p^s₀ and let q^s₁ = (1 − 1/(k√n)) p^s₀ + (1/(k√n)) p^s₁. D₀ is the distribution induced on the states if you start in the start state s₀ and run for n steps using probabilities q^s₀, and similarly D₁ is the distribution induced on the states running for n steps using the probabilities q^s₁.

Formally, we show (Lemma 4.4) the existence of two specific distributions C′(x) and C″(x) such that the distributions q^s₀(x) and q^s₁(x) are specific convex combinations of these distributions, say, q^s₀(x) = z₀C′(x) + z₁C″(x) and q^s₁(x) = w₀C′(x) + w₁C″(x). To upper bound the distance between q^s₀(x) and q^s₁(x) we replace C′(x) and C″(x) with distributions that are, intuitively, as far apart as possible: specifically, we replace C′(x) with the trivial Zero distribution and C″(x) with the trivial One distribution, and bound the distance between z₀·Zero + z₁·One and w₀·Zero + w₁·One.
Lemma 4.4. For every state s, if the algorithm is ε-differentially pan-private, then q^s₀ and q^s₁ can be written in the following form:

q^s₀(x) = (1/2) C′(x) + (1/2) C″(x)
q^s₁(x) = (1/2 − 1/(k√n)) C′(x) + (1/2 + 1/(k√n)) C″(x), where

C′(x) = q^s₀(x) + (k√n/2)(q^s₀(x) − q^s₁(x))
C″(x) = q^s₀(x) − (k√n/2)(q^s₀(x) − q^s₁(x))

That is, q^s₀ is obtained by choosing either C′ or C″ with equal probability, and then sampling from the chosen distribution; q^s₁ is obtained via biased sampling from the distributions C′ and C″: choose C″ with probability (1/2 + 1/(k√n)) and C′ with probability (1/2 − 1/(k√n)), and then sample from the chosen distribution.
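As a sanity check (our arithmetic, which follows directly from the definitions in the lemma), substituting C′ and C″ back into the stated mixtures recovers q^s₀ and q^s₁:

```latex
% Using C''(x) - C'(x) = -k\sqrt{n}\,\bigl(q_0^s(x) - q_1^s(x)\bigr):
\begin{align*}
\tfrac{1}{2}C'(x) + \tfrac{1}{2}C''(x) &= q_0^s(x),\\
\Bigl(\tfrac{1}{2} - \tfrac{1}{k\sqrt{n}}\Bigr)C'(x)
 + \Bigl(\tfrac{1}{2} + \tfrac{1}{k\sqrt{n}}\Bigr)C''(x)
  &= q_0^s(x) + \tfrac{1}{k\sqrt{n}}\bigl(C''(x) - C'(x)\bigr)\\
  &= q_0^s(x) - \bigl(q_0^s(x) - q_1^s(x)\bigr) = q_1^s(x).
\end{align*}
```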
5. A GENERAL TRANSFORMATION

Our main result in this section is a general transformation that converts a single-output streaming algorithm into a streaming algorithm that continually produces outputs. As an example, consider a single-output algorithm Alg for a monotone function, such as counting. Given a stream σ, the true count is the number of 1's in σ, and Alg(σ) produces an approximation to this true count. The transformed algorithm will be a counting algorithm as defined in the previous section of this paper: after processing any prefix of σ the output will be an approximation to the number of 1's in the prefix.

Our transformation works for single-output algorithms for any monotonic or "nearly monotonic" (see below) functions. User-level privacy and pan-privacy are preserved by the transformation, so if the original algorithm enjoys either or both of these properties then so will the transformed algorithm.

Remark 5.1. Throughout this section, we focus on the properties of the algorithm being transformed, such as the probability with which it produces output of specific accuracy, and we obtain results describing how these properties change (typically, deteriorate) in the transformed algorithm.

Fix a single-output streaming algorithm Alg. We begin with the notion of a k-snapshot. Intuitively, this is a snapshot of the algorithm's outputs from k different time periods. Of course, the algorithm to be transformed is a single-output algorithm, so there is only one output, and not k outputs. However, for a fixed length-T input stream σ and a fixed sequence τ of coin tosses, given a k-tuple t₁, ..., t_k of times in [T], we can fix the algorithm's coin tosses and run it on the k prefixes of length, respectively, t₁, t₂, ..., t_k. Each time we start from the beginning of σ and τ, and obtain a single output. This gives a well-defined notion of the outputs at k different times.

Definition 5.1 (k-snapshot). For a fixed T-round execution of an algorithm Alg (i.e., fixing the algorithm's random coins τ and its input stream σ), a k-snapshot is a set of k pairs (t₁, a₁), ..., (t_k, a_k), where, for 1 ≤ i ≤ k, a_i is the output produced by Alg when run on the length t_i prefix of σ, using coin sequence τ (not all coins in τ need be read). We say that a snapshot is d-varying if for all 1 ≤ i < k it holds that |a_{i+1} − a_i| ≥ d.

The deterioration in accuracy of the transformed algorithm is governed by the number of times the output can change by at least an additive amount d. When k and d are both small, the accuracy hit will be low.

Definition 5.2 ((k, d, γ)-unvarying algorithm). An algorithm Alg is (k, d, γ)-unvarying if in every execution, with all but γ probability over the algorithm's coins, there does not exist a d-varying k-snapshot for the execution. I.e., the number of times the algorithm's output varies by at least d is at most k.

Intuitively, any accurate algorithm computing a monotone function cannot vary too often or too wildly. In particular, when the range of the function to be computed is bounded, for example, when the range is contained in the interval [0, 1], then the output can change by an additive d amount at most 1/d times. We now make this precise.

Claim 5.1. Let Alg be an algorithm for computing a monotonically increasing function whose output is in [0, 1].⁵ If in every execution, with all but γ probability, the outputs of Alg are all within an α additive error, then for any 0 < d < 1, Alg is (1/d, d + 2α, γ)-unvarying. The claim also holds for monotonically decreasing functions.

⁵A monotonically increasing function is one whose value can only increase as more data items are processed. For example, density estimation is a monotone increasing function.

Proof. With probability at least 1 − γ, the algorithm's outputs are all α-accurate. Assuming we are in this high probability case, for any k > 0 consider any (d + 2α)-varying k-snapshot (t₁, a₁), ..., (t_k, a_k), where 0 < d < 1. For any j ∈ {1, ..., k}, a (d + 2α)-change in the algorithm's output between t_j and t_{j+1} corresponds to a change of at least d in the value of the function between t_j and t_{j+1} (because of α-accuracy in time periods t_j and t_{j+1}). Since the function is monotone, this change in its value can only be an increase, and since the output is bounded between 0 and 1, there can be only 1/d such increases. We conclude that k ≤ 1/d.
Before presenting our general transformation, we provide some intuition. Assume Alg is a (user-level pan-private) algorithm for a monotonic function f with range in [0, 1]. Fix an input stream σ and a sequence τ of random coins. Let Alg_t denote the output that the algorithm would produce after a prefix of length t. Recall that the transformed algorithm will have output at every step; let out_t denote this output after processing the t-length prefix of σ. The first idea is to change out infrequently, that is, only when it becomes very inconsistent with Alg_t. In other words, for "suitably chosen" d, at each time t:

If Alg_t − out_{t−1} > d then out_t ← Alg_t, else out_t ← out_{t−1}.

As noted above, we will expect at most about 1/d changes to out. For privacy, we now have two problems. First, if there is no change at time t but there is a change at time t + 1, this leaks information about σ_{t+1}, the (t+1)st symbol in σ, so, still speaking intuitively, we will need to add some noise to the comparison. For a similar reason, we need to add noise to the output. Since we expect a change to occur only about 1/d times, we can scale the noise to this amount and abort the execution if the number of updates exceeds this.

Step 3b. At time t: if Alg_t + fresh noise − out_{t−1} ≤ d then out_t ← out_{t−1} (no change in output); else out_t ← Alg_t + fresh noise (update output).

This approach leads to an algorithm ensuring a weaker form of privacy known as (ε, δ)-differential privacy. Intuitively, the obstacle to pure differential privacy is that information is also leaked by the fact that an update to out does not occur. A long sequence of non-updates may be much more likely on an input σ′ than on an adjacent input σ. For example, still speaking intuitively, this might happen if the true value of the function on a given prefix is close to triggering an update in σ and slightly farther in σ′, and then for a large number of steps there is no input, so the value of the function does not change. Although in each of these steps the ratio of the probability of no update in σ′ to the probability of no update in σ may be relatively small, say, e^ε, the ratio of the probabilities of no update during this entire input-free stretch in σ′ and in σ may be very large (e^{mε} for a stretch of length m). To address this, we introduce a "soft" threshold: instead of comparing to d we compare to d + fresh noise, where the fresh noise is chosen anew at each step. This fresh noise will be generated using the same segment-based techniques as were used in the counter construction. These involve relatively few variables at any one time (log T), while simultaneously providing a dynamic set of thresholds for comparison, changing at every step.

The general transformation is presented in Figure 2. As we will see, the transformation preserves user-level pan-privacy.

Theorem 5.2. For any (k, d, γ)-unvarying algorithm Alg with (α₀, β₀)-accuracy, ε₀ (pan-)privacy, and user-level sensitivity ρ, the transformation of Figure 2, run with parameters as above and with any T, ε, β > 0, yields a T-round continual output algorithm with (ε₀ + 3ε) (pan-)privacy. With all but (β + β₀ + γ) probability, the error in all time periods is at most:

d + α₀ + 4ρ · (log(1/β) + log T) · (k + log² T)/ε

Proof. We first argue about the accuracy of the algorithm, and then prove differential pan-privacy (for pan-privacy we assume that Alg itself is pan-private; otherwise we get only standard differential privacy for continual observation).

Accuracy. Assume that (1) the answers produced by Alg are all α₀-accurate ("Alg is accurate"); (2) the sequence of answers returned is (k, d)-unvarying, and (3) all the Laplace noise values have magnitude at most (log(1/β) + log T + O(1)) times their scale ("the noise is never large"). Taking a union bound, all of the conditions above are satisfied (in all rounds simultaneously) with all but (β + β₀ + γ) probability.

If compare_t ≤ thresh_t, so the conditional in Step 3 of round t evaluates to "true," then we say round t is light. Otherwise, that is, if compare_t > thresh_t, we say round t is heavy. Fix a round t, and let h < t be the last heavy round preceding round t, if one exists; otherwise let h = −1. Conditioned on assumption (3) that the noise is never large, the "noisy" difference compare_t − thresh_t and the "true" difference (|answer_h − answer_t|) − thresh differ by at most 2α (see the definition of α in Figure 2). This means that in every heavy round t, the difference between answer_h and answer_t is at least d. Since, by assumption (2), Alg is (k, d)-unvarying, we conclude that with all but (β + β₀ + γ) probability, the algorithm does not terminate by the HALT instruction.

Given that the algorithm never terminates in Step 3b, what is its error? Still under our assumptions (1) and (3) that Alg is accurate and the noise is never large, if the comparison of Step 3 evaluates to "true," so the round is light, we have that out_t = out_{t−1} = ··· = out_h is within a d + 4α error of Alg_t. If t is heavy, then out_t is within an α error of Alg_t. We conclude that with all but (β + β₀ + γ) probability, the error in all rounds (simultaneously) is indeed at most (d + 4α) plus the error of Alg, i.e., the total error is at most:

d + α₀ + 4ρ · (log(1/β) + log T) · (k + log² T)/ε.
Privacy. To argue pan-privacy, we first use the ε₀ pan-privacy of Alg (and the composition of differential privacy) to argue that Alg's internal state during an intrusion is privacy preserving. The proof of privacy for the combination of the internal state and the times of updates, i.e., the set of the heavy rounds, is similar in spirit to the proof of pan-privacy for the counter. Given this, and using composition of differential privacy, the actual updated outputs in the at most k heavy rounds will only incur an additional ε-cost in privacy.

Transformation for Alg with parameters (T, ε, β, ρ, k, d, γ)

Initialization. Initialize ξ_thresh = ρ · log T/ε, ξ_compare = ξ_answer = 2ρ · (k + 1)/ε, m = 0, answer_{−1} = 0, accuracy α = ρ · (log(1/β) + log T) · (k + log² T)/ε, and thresh = d + 2α.

Noise. At time period 0 ≤ t < T, choose noise values η_compare,t ∼ Lap(ξ_compare) and η_answer,t ∼ Lap(ξ_answer). We also associate log T threshold noise values with every time period t. To do this, view t ∈ [T] as a string in {0, 1}^{log T}. For the i-bit prefix t|_i of t (where i ≥ 1), sample the noise value η_thresh,t|_i ∼ Lap(ξ_thresh).

Persistent State. The value η_thresh,t|_i is sampled and stored in round (t|_i ∘ 0^{log T − i}), and is erased by the end of round (t|_i ∘ 1^{log T − i}). There are 2T − 2 of these noise values in total. The values η_compare,t and η_answer,t are sampled in round t and erased by the end of round t; that is, they do not persist.

Processing. For t ∈ {0, 1, ..., T − 1}, let answer_t denote the answer computed by Alg at time t (recall that we enumerate time and input symbols starting with 0).

1. compare_t ← |out_last − answer_t + η_compare,t|
2. thresh_t ← thresh + Σ_{i=1}^{log T} η_thresh,t|_i
3. If compare_t ≤ thresh_t then out_t ← out_{t−1}, else DO:

(a) out_t ← answer_t + η_answer,t
(b) If m < k then m ← m + 1, else HALT.

Figure 2: Continual Output Transformation for Algorithm Alg
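To make the control flow of Figure 2 concrete, here is a compressed Python sketch (ours, not the paper's). The list `alg_answers` stands in for the black-box answers of Alg, and the bookkeeping that determines when each threshold noise resides in memory, on which pan-privacy depends, is elided; only the output logic is shown:

```python
import math
import numpy as np

def continual_transform(alg_answers, T, eps, beta, rho, k, d, rng=None):
    """Sketch of Figure 2. alg_answers[t] plays the role of answer_t (Alg's
    output on the length-(t+1) prefix); returns the outputs out_0, out_1, ..."""
    rng = rng if rng is not None else np.random.default_rng()
    logT = int(math.log2(T))
    xi_thresh = rho * logT / eps
    xi_cmp = xi_ans = 2 * rho * (k + 1) / eps
    alpha = rho * (math.log(1 / beta) + logT) * (k + logT ** 2) / eps
    thresh = d + 2 * alpha

    seg_noise = {}  # eta_thresh for the i-bit prefix of t, one value per segment
    def thresh_noise(t):
        bits = format(t, "0{}b".format(logT))
        return sum(seg_noise.setdefault(bits[:i], rng.laplace(scale=xi_thresh))
                   for i in range(1, logT + 1))

    out, last, m = [], 0.0, 0
    for t in range(T):
        compare = abs(last - alg_answers[t] + rng.laplace(scale=xi_cmp))
        if compare <= thresh + thresh_noise(t):
            out.append(last)                          # light round: no update
        else:                                         # heavy round: republish
            last = alg_answers[t] + rng.laplace(scale=xi_ans)
            out.append(last)
            if m < k:
                m += 1
            else:
                break                                 # HALT, as in step (b)
    return out
```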

Fix two adjacent databases x and x′, and fix an arbitrary execution E_x when the input stream is x. This amounts to fixing the randomness of the algorithm, which in turn fixes the noise values generated. We will describe a corresponding execution E_{x′} by describing how its noise values differ from those in E_x. The adversary's view in E_{x′} will be identical to its view in E_x; moreover, the probabilities of corresponding noise values differ only by an e^{3ε} ratio. This implies 3ε-differential pan-privacy.

Let η be the noise vector in execution E_x. The adversary's view consists of the incidence vector v for the set of heavy rounds (these are the rounds in which the output changes), the values of the heavy rounds' outputs, and the internal noise values seen in the intrusion. Let the intrusion occur immediately following round t∗ of E_x. Then the observed noise values are a subset of {η_thresh,t∗|_i}_{i=1}^{log T}. In constructing the noise vector for E_{x′} we will only change noise values other than the threshold noises in round t∗; thus we only change noise values that are not observed during the intrusion.

First, to ensure that all light rounds in E_x are also light in E_{x′}, we want to increase the sum of threshold noises in every round except t∗ by ρ. This can be done by increasing the values of log T threshold noises as described below. For t ∈ {0, 1, ..., T − 1} and i ∈ {1, ..., log T}, we view the noise value η_thresh,t|_i as "covering" or "corresponding to" the segment of 2^{log T − i} time periods {t|_i ∘ 0^{log T − i}, ..., t|_i ∘ 1^{log T − i}}. For every round t∗, there exists a set S∗ of exactly log T segments such that every time period except t∗ is covered by exactly one of these segments. To see this, we start with S∗ being empty, and then for each i going from 1 to log T, we choose the one segment of size 2^{log T − i} that isn't covered by previously chosen segments and does not cover t∗. In each of these iterations we cover half of the remaining time periods, and all of the segments we choose are disjoint. This means that after log T iterations, every time period except t∗ is covered by exactly one segment. Now in η′ we increase the noise values of each of these segments in S∗ by ρ. This guarantees that when running with noise η′ the sum of threshold noises in each round except t∗ is larger by ρ than in the run on x (with noise η). In particular, since the sensitivity of Alg is at most ρ, with the possible exception of round t∗, light rounds in E_x with noise η will also be light in E_{x′} with noise η′. Because the threshold noises were chosen from the Laplace distribution with magnitude ρ · log T/ε, the probabilities of these new values differ by at most an e^ε factor from the original noise values.

Now, to ensure that heavy rounds in E_x are also heavy in E_{x′}, we need to change the noise η_compare,t by at most 2ρ between η and η′ for every heavy round t: we may need a change of ρ to handle the sensitivity of Alg, and another change of ρ to handle the increased sum of noisy thresholds. Note that the decision as to whether the noise is increased or decreased depends on Alg_t and the noise in previous rounds, but this dependence is fixed, and so indeed we have a bijection between the vectors of noise values. There are at most k heavy rounds in any execution, and so E_x has at most k heavy rounds. We choose the comparison noises from a Laplace distribution with magnitude 2ρ · (k + 1)/ε. This change in the noise values changes the noise probabilities by at most an e^{εk/(k+1)} factor.

Finally, to ensure that if round t∗ is light in E_x then it is also light in E_{x′}, we may change the comparison noise η_compare,t∗. This poses no risk to privacy, as this value does not persist after round t∗, and so will not be seen during the intrusion.
Thus, we have obtained an identical incidence vector while changing the noise probabilities by at most an e^{2ε} factor. To argue privacy for the actual answers generated in the heavy rounds, we simply note that there are at most k such answers, and this incurs another ε cost in privacy. In total, the transformed algorithm is (ε₀ + 3ε)-differentially pan-private with continual outputs. If the original algorithm Alg was not pan-private, then the transformed one is not pan-private either, but its collection of outputs is still 3ε-differentially private.

5.1 Lower Bounds for User-Level Continual Output Privacy

In the previous section, we saw that it is possible to accurately approximate a wide class of functions with user-level privacy even under continual observation (and moreover some of these algorithms are even pan-private). In particular, this was the case for unvarying functions. In contrast, consider a highly varying function such as modular incidence counting, where we want to estimate the number (between 0 and n) of users that appear an odd number of times (or more generally i times (mod k)). Each occurrence of each user is a significant event. Over many time periods, the value of this function can vary wildly many times. Thus, the general transformation of Section 5 does not yield a privacy-preserving algorithm. This is no accident. Indeed, the following theorem shows that there is no user-level differentially private continual-output algorithm for any highly varying function. (See Definition 5.2 for the definition of unvarying algorithms.) As this includes modular incidence counting, we obtain a natural statistic that can be computed pan-privately (see [12]), but that cannot be computed privately in the continual output setting (even without pan-privacy).

Definition 5.3. Let f be a function defined over a database with n users, with the property that f on a blank symbol does not change value. f is (k, d)-varying if there exists an input y and k indices t₁ < t₂ < ... < t_k such that for all i, 1 ≤ i < k, |f(y_{t_i}) − f(y_{t_{i+1}})| ≥ d. That is, the function f on y jumps by at least d units k times.
Theorem 5.3. Let f be a (k, d)-varying function defined over a database with n users. Then every differentially private, continual output algorithm must have additive error Ω(dk/n) (even with ε = 1).

Proof Sketch. We assume throughout that the global sensitivity is ρ = O(dk/n); otherwise, if ρ = Ω(dk/n), then by a standard argument for ε-privacy the error must be Ω(dk/n).

First we claim that if f is (k, d)-varying, then there exists a set S of 2^k inputs such that for every pair x, y of inputs from S, there exists an index i such that |f(x₁, ..., x_i) − f(y₁, ..., y_i)| ≥ d. Let y be the input such that f jumps by d at least k times. That is, there exist indices t₁, ..., t_k such that |f(y_{t_i}) − f(y_{t_{i+1}})| ≥ d. We partition y into k intervals, I₁, ..., I_k, as follows. The interval I_l begins at t_{l−1} + 1 and ends at t_l. Each input v ∈ S will consist of k intervals, and the total length of each v will be 2|y|. Letting v(α₁, ..., α_k) denote a particular string in S, where α_i ∈ {0, 1}, the i-th interval of v(α₁, ..., α_k) is defined as follows: (i) if α_i = 0, then the i-th interval will contain a copy of I_i followed by |I_i| blank symbols; (ii) otherwise, if α_i = 1, then the i-th interval will contain |I_i| blank symbols followed by a copy of I_i. This defines a set of 2^k inputs.

Now consider two distinct inputs v(α), v(β) from S. We want to show that at some location l, the output of f after reading the l-th bits of v(α), v(β) differs by at least d. By construction, v(α) and v(β) must differ in at least one interval. Suppose it is interval j, so that the j-th interval of v(α) consists of I_j followed by an equal number of blanks, and v(β) consists of |I_j| blanks followed by I_j. At the location immediately preceding the j-th interval of v(α), the value of f on v(α) and v(β) is the same, and is equal to the value of f on the first j − 1 intervals of y. At the first location in the j-th interval of v(α), the value of f jumps by d, whereas the value of f on v(β) on the same-length substring has not changed from its previous value. Thus the claim is proven.

Secondly, we note that if f is (k, d)-varying, then f is also (t, dk/t)-varying, as long as ρ ≤ dk/2t. If the sensitivity is ρ, then for any a < d/2ρ, we can split up an interval with a "jump" (change in the function value from the interval's beginning to its end) of d into a subintervals, where each subinterval has a jump of at least d/a (this uses the upper bound on the function's sensitivity). Setting a = t/k, which is less than d/(2ρ) as long as ρ ≤ dk/2t, we end up with t intervals in total, each with a jump of at least dk/t.

Now let S be the set of 2^t inputs with the property that for every pair of inputs in S, there exists a time period where the output of f differs by at least d′ = dk/t. If A is accurate, then w.h.p. its output is within d′/2 of the correct answer in every time period. If we associate with each input in S an event, corresponding to the algorithm being within d′/2 of the correct answer on that input in every round, then this set of events is disjoint. By privacy, however, each of these events has some (exponentially small in n) probability on every input. This implies a contradiction, since there are |S| = 2^t disjoint events, and for t > n the sum of these probabilities is larger than 1. See the full version for details (there we also rule out algorithms that are accurate only in a large fraction of the rounds).

Note that the lower bound here is not tight with the upper bound of Theorem 5.2. Finding a tight characterization is an intriguing direction for future work.

6. HOW TO USE EXPERT ADVICE

The goal of this section is to demonstrate that continual observation counters can be applied to obtain privacy-preserving solutions to fairly complex problems, especially in online environments. There are several algorithms in various settings that rely on counting a number of "successes" or "accesses," determining their action based on this count. An underlying principle behind many such algorithms is aggregating the advice of several experts, and choosing a response that is not much worse than any of the experts (regret minimization). In this section our goal is to show that such algorithms can be utilized without revealing the advice given by the experts, i.e., we protect the experts' privacy. The main issue is whether the accuracy of the approximate counter is good enough for the application considered. In this extended abstract we focus on the problem of combining expert advice (in the full version we also consider problems such as list access or list update, and how to run many different counters simultaneously).

We present a method for combining expert advice that is competitive with the best expert in hindsight. The setting is that there is a sequence of actions by n experts. At each round each expert decides on an action, and following the round the cost associated with the action is revealed (it is a value in [0, 1]). The combiner has to choose one of the experts at each round, and act by following that expert, where all the information the combiner has is the cost of the previous actions by the experts. The goal is to minimize regret, that is, to compare the combiner's performance to the best expert in hindsight.
Known performance guarantees for this problem (without privacy concerns) are of the form (1 + γ)(best expert) + O(log n/γ), for any γ. That is, there are combiner strategies that only incur a (1 + o(1)) multiplicative overhead over the best expert in hindsight. The problem has a long history; see the survey by Blum and Mansour [4].

Our goal is to perform the same sort of aggregation, but while protecting the privacy of each expert's action with respect to an observer that sees which expert is being followed. That is, for any time period t and expert i, we guarantee ε-differential privacy for expert i's decision in step t. Here we aim for event-level privacy.⁶

⁶In fact, it can be shown that it is impossible to obtain any non-trivial result with respect to user-level privacy.

Most of the combining algorithms in the literature rely, when making a decision at the t-th round, on the costs incurred by the experts so far, and not, say, on when exactly these costs occurred. This suggests applying privacy-preserving counters to this problem. The issue, however, is showing that the approximate estimation of the cost given by the privacy-preserving counter still serves the combiner of the experts problem (almost) as well as an exact computation of the cost. Some of the well-known combining algorithms, like Randomized Weighted Majority, use multiplicative weights or more involved functions of the cost, making this a non-trivial obstacle.

Kalai and Vempala [17] suggested a very simple additive algorithm called Follow the Perturbed Leader. For each round t = 1, 2, ..., T: (1) for each expert i ∈ [n] pick a weight p_t[i] independently chosen according to the exponential distribution with parameter γ, and (2) for each expert i let C_t(i) be the total cost so far. Pick the expert minimizing C_t(i) − p_t(i) and act accordingly.
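The following sketch (ours; it assumes the PanPrivateCounter sketch from Section 4, which handles continuous increments without modification) shows how Follow the Perturbed Leader composes with one private counter per expert, so that the combiner only ever sees noisy running totals:

```python
import numpy as np

def private_follow_perturbed_leader(cost_stream, n, T, epsilon, gamma, rng=None):
    """Follow the Perturbed Leader [17] on top of per-expert private counters.
    cost_stream[t][i] is expert i's cost in round t (a value in [0, 1]);
    returns the index of the expert followed in each round."""
    rng = rng if rng is not None else np.random.default_rng()
    counters = [PanPrivateCounter(T, epsilon) for _ in range(n)]
    noisy_totals = np.zeros(n)   # noisy C_t(i), i.e., cost so far per expert
    choices = []
    for t in range(T):
        # perturbation p_t[i] ~ Exponential with parameter gamma (scale 1/gamma)
        p = rng.exponential(scale=1.0 / gamma, size=n)
        choices.append(int(np.argmin(noisy_totals - p)))
        # after acting, each counter ingests this round's cost
        noisy_totals = np.array([counters[i].process(t, cost_stream[t][i])
                                 for i in range(n)])
    return choices
```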
For this algorithm, as well as Randomized Weighted Majority, the regret (w.r.t. the best expert in hindsight) is bounded by γ(best expert) + log n/γ. The natural way to adapt this algorithm is to have, for each expert, a private counter counting the cost incurred so far (it is easy to adapt the counter to work with continuous, rather than discrete, increments).

The problem is that we do not have access to the real cost; rather, we have access to an approximation C′_t where we are guaranteed to have C_t(i) ∈ C′_t(i) ± log^{1.5} t (w.h.p. in each round). A possible approach to the problem is to show that this is true for any adversarial approximation: that is, at each point in time the estimate C′ of the cost so far is guaranteed to be within ∆ of the real cost C, but an adversary may set the approximation as it wishes within these bounds, and the question is what combining algorithm is appropriate. It is not clear whether such a general approach will work.

Suppose, instead, that the approximation works by adding noise, where at any point in time the distribution of the noise is identical. This, in fact, is the case for the counter algorithm of Section 4: in each round, the noise is the sum of (the same number of) identically distributed Laplace variables. Consider the cost of the expert combining algorithm. It is equal to Σ_{t=1}^T Y_t, where Y_t is a random variable representing the cost of the combining algorithm at step t. The r.v.'s {Y_t} depend on the noise added at each step and the randomness of the combining algorithm (the perturbations in the Kalai-Vempala algorithm). From linearity of expectation, E[Σ_{t=1}^T Y_t] = Σ_{t=1}^T E[Y_t]. The expectation of the random variables Y_t remains the same when the algorithm is run on counters where exactly the same noise is added at all the steps: that is, the noise is generated only for one round and then exactly the same noise is added at all rounds. If this is the case (i.e., the same noise added at all rounds), then we can think of the algorithm as running with a prefix of steps where the cost of each expert in that prefix equals the noise added to it. Now take the guarantee on the regret w.r.t. the best expert. The difference between the real leading expert and the leading expert with the approximation noise is at most the maximal noise. So we get that the new combining algorithm is within the bound the original algorithm had, plus the maximum of n random variables, each a sum of (1 + log T) Laplace variables of magnitude O(log T/ε); that is, an additional O(log n · log^{1.5} T/ε) term in the regret. Note that here we use the fact that in each round individually, w.h.p. the noise added by the counter has magnitude O(log^{1.5} T/ε) (see the discussion following Theorem 4.1).

7. REFERENCES

[1] J. L. Bentley and J. B. Saxe. Decomposable searching problems I: Static-to-dynamic transformation. J. Algorithms, 1(4):301–358, 1980.
[2] A. Blum, C. Dwork, F. McSherry, and K. Nissim. Practical privacy: The SuLQ framework. In Proceedings of the 24th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, June 2005.
[3] A. Blum, K. Ligett, and A. Roth. A learning theory approach to non-interactive database privacy. In Proceedings of STOC, 2008.
[4] A. Blum and Y. Mansour. Learning, regret minimization, and equilibria. In Algorithmic Game Theory. Cambridge University Press, 2007.
[5] I. Dinur and K. Nissim. Revealing information while preserving privacy. In Proceedings of the Twenty-Second ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pages 202–210, 2003.
[6] C. Dwork. A firm foundation for private data analysis. Communications of the ACM (to appear).
[7] C. Dwork. An ad omnia approach to defining and achieving private data analysis. In Privacy, Security, and Trust in KDD, First ACM SIGKDD International Workshop (PinKDD), Revised Selected Papers, volume 4890 of Lecture Notes in Computer Science, pages 1–13. Springer, 2007.
[8] C. Dwork. The differential privacy frontier. In Proceedings of TCC, 2009.
[9] C. Dwork, K. Kenthapadi, F. McSherry, I. Mironov, and M. Naor. Our data, ourselves: privacy via distributed noise generation. In Advances in Cryptology: Proceedings of EUROCRYPT, pages 486–503, 2006.
[10] C. Dwork, F. McSherry, K. Nissim, and A. Smith. Calibrating noise to sensitivity in private data analysis. In Proceedings of TCC, pages 265–284, 2006.
[11] C. Dwork, F. McSherry, and K. Talwar. The price of privacy and the limits of LP decoding. In Proceedings of STOC, pages 85–94, 2007.
[12] C. Dwork, M. Naor, T. Pitassi, G. N. Rothblum, and S. Yekhanin. Pan-private streaming algorithms. In Proceedings of ICS, 2010.
[13] C. Dwork, M. Naor, O. Reingold, G. Rothblum, and S. Vadhan. When and how can privacy-preserving data release be done efficiently? In Proceedings of STOC, 2009.
[14] C. Dwork and K. Nissim. Privacy-preserving datamining on vertically partitioned databases. In Proceedings of CRYPTO 2004, volume 3152, pages 528–544, 2004.
[15] C. Dwork and S. Yekhanin. New efficient attacks on statistical disclosure control mechanisms. In Proceedings of CRYPTO 2008, pages 468–480, 2008.
[16] M. Hardt and K. Talwar. On the geometry of differential privacy. arXiv:0907.3754v2, 2009.
[17] A. T. Kalai and S. Vempala. Efficient algorithms for online decision problems. J. Comput. Syst. Sci., 71(3):291–307, 2005.
[18] F. McSherry and K. Talwar. Mechanism design via differential privacy. In Proceedings of FOCS, 2007.
[19] S. Warner. Randomized response: a survey technique for eliminating evasive answer bias. JASA, pages 63–69, 1965.