Distributed Differential Privacy By Sampling

Joshua Joy University of California - Los Angeles [email protected]

ABSTRACT
In this paper, we describe our approach to achieve distributed differential privacy by sampling alone. Our mechanism works in the semi-honest setting (honest-but-curious, whereby aggregators attempt to peek at the data but follow the protocol). We show that the utility remains constant and does not degrade due to variance, as compared to the randomized response mechanism. In addition, we show smaller privacy leakage as compared to the randomized response mechanism.

1 INTRODUCTION
Personal mobile information is being continuously collected and analyzed with minimal regard to privacy. As we transition from small personal mobile devices to large-scale sensor-collecting self-driving vehicles, the need for privacy increases.

The self-driving vehicle scenario makes it more difficult to preserve both accuracy and privacy, due to the smaller number of data owners (on the order of hundreds) as opposed to large-scale deployments of millions of data owners that learn only the heavy hitters (top-k queries).

While many privacy definitions have been proposed, differential privacy has emerged as the gold standard for privacy protection. Differential privacy essentially states that whether or not a single data owner decides to participate in data collection, the final aggregate information will be perturbed only by a negligible amount. That is, the released aggregate information gives the adversary no hints about a particular data owner. Techniques such as the Laplace mechanism add noise calibrated to the sensitivity of the query output [9], though they do not perturb which data owners actually participate. This makes it difficult to privatize graphical structures such as social networks, where it is possible to infer data about a targeted data owner from the target's friends alone.

Randomized response has been shown to be optimal in the local privacy setting [6]. However, in order to preserve accuracy with the randomized response mechanism, privacy must be sacrificed, as the data owners must respond truthfully too frequently. For example, a data owner should respond truthfully more than 80% of the time to have decent accuracy, which greatly minimizes any privacy gains [16, 17]. The reason is the high variance introduced by the coin tosses. As more aggressive sampling is performed, the variance quickly increases, making it difficult to perform accurate estimation of the underlying distribution.

As a result of the accuracy problem, there have been various privacy-preserving systems which focus on the heavy hitters only [4, 11]. These techniques ensure privacy only for large populations and can only detect or estimate the most frequently occurring distributions, rather than smaller or less frequently occurring populations.

In this paper, we show how to achieve differential privacy in the distributed setting by sampling alone. We evaluate the accuracy of our privacy-preserving approach utilizing a vehicular crowdsourcing scenario comprising approximately 50,000 records. In this dataset, each vehicle reports its location utilizing the California Transportation Dataset from magnetic pavement sensors (see Section 8.1).

2 RELATED WORK
Differential privacy [7-10] has been proposed as a mechanism to privately share data such that anything that can be learned if a particular data owner is included in the database can also be learned if that data owner is not included in the database. To achieve this privacy guarantee, differential privacy mandates that only a sublinear number of queries have access to the database and that noise proportional to the global sensitivity of the counting query is added (independent of the number of data owners).

Distributional privacy [1] is a privacy mechanism which says that the released aggregate information reveals only the underlying ground truth distribution and nothing else. This protects individual data owners and is strictly stronger than differential privacy. However, it is computationally inefficient, though it can work over a large class of queries of bounded Vapnik-Chervonenkis (VC) dimension.

Zero-knowledge privacy [14] is a cryptographically influenced privacy definition that is strictly stronger than differential privacy. Crowd-blending privacy [13] is weaker than differential privacy; however, with a pre-sampling step, it satisfies both differential privacy and zero-knowledge privacy. However, these mechanisms do not add noise linear in the number of data owners and rely on aggressive sampling, which negatively impacts the accuracy of the estimations.

Randomized response based mechanisms [12, 15, 22, 23] satisfy differential privacy as well as stronger notions such as zero-knowledge privacy. However, the accuracy of the randomized response mechanism quickly degrades unless the coin toss values are configured to large values (e.g., greater than 80%).

3 PRELIMINARIES
Differential Privacy. Differential privacy has become the gold standard privacy mechanism; it ensures that the output of a sanitization mechanism does not violate the privacy of any individual input.

Definition 3.1 ([7, 9]). (ε-Differential Privacy). A privacy mechanism San provides ε-differential privacy if, for all datasets D1 and D2 differing on at most one record (i.e., the Hamming distance H(D1, D2) ≤ 1), and for all outputs O ⊆ Range(San):

    sup_{D1,D2} Pr[San(D1) ∈ O] / Pr[San(D2) ∈ O] ≤ exp(ε)    (1)

That is, the probability that a privacy mechanism San produces a given output is almost independent of the presence or absence of any individual record in the dataset. The closer the two distributions are (i.e., the smaller ε), the stronger the privacy guarantee, and vice versa.
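To make Definition 3.1 concrete, the following minimal sketch (ours, not from the paper) computes the ε implied by the classic randomized response coin discussed above, in which a data owner answers truthfully with probability p_truth and otherwise reports a fair coin flip; the function name and parameterization are illustrative assumptions.

```python
import math

def randomized_response_epsilon(p_truth: float) -> float:
    """Epsilon of the classic randomized-response coin (illustrative only).

    Pr[report "Yes" | truth "Yes"] = p_truth + (1 - p_truth) / 2
    Pr[report "Yes" | truth "No"]  = (1 - p_truth) / 2
    Epsilon is the log of the worst-case ratio of these output probabilities.
    """
    p_given_yes = p_truth + (1.0 - p_truth) / 2.0
    p_given_no = (1.0 - p_truth) / 2.0
    return math.log(p_given_yes / p_given_no)

# Answering truthfully 80% of the time already leaks ln(9), roughly 2.2.
print(randomized_response_epsilon(0.8))
```

This illustrates the trade-off noted in the introduction: pushing the truthful-response probability high enough for acceptable accuracy inflates ε.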

Private Write. More generally, we assume that some class of private information storage [21] mechanisms is utilized by the data owner to cryptographically protect their writes to cloud services.

4 MOTIVATION
Cryptographic blinding allows a service to compute a function without the service learning the actual input or output. Thus, cryptographic blinding provides private computation. More formally, a client with value x encrypts their value E(x) such that the service can compute y = f(E(x)). The client then unblinds y to recover the actual output. The client encryption can be as simple as generating a large random value r, adding it to the input x, and then subtracting r from y.

The randomized response mechanism can be viewed as performing a distributed blinding operation, where the privacy is provided by the estimation error that occurs when trying to estimate the distributed blinded value. More formally, let YES refer to the aggregate count of the population that should truthfully respond "Yes", let NO refer to the population that should truthfully respond "No", and let s1 and s2 be the respective response probabilities. The randomized response aggregate count is then

    YES × s1 + (YES + NO) × s2 = AGG

The (YES + NO) × s2 term is the distributed blind value, which is estimated by simply calculating its expected value. The final value is estimated by subtracting the expected value (of the blind) from the aggregate and then dividing by the probability s1 used for the first term.

It can be seen that this "distributed blinding" provides a notion of privacy. However, the issue is that the estimation error increases rapidly as the NO population increases, due to the standard deviation scaling with the population size.

In the remainder of this paper we illustrate our approach, which decreases the estimation error and provides strong notions of privacy. The crux of the approach is to provide the ability to remove the error due to the variance by constructing a system of equations.

5 DISTRIBUTED DIFFERENTIAL PRIVACY BY SAMPLING
First, we define the structure of a distributed sampling private mechanism. We then illustrate various mechanisms that satisfy distributed sampling. Finally, we provide the mechanism for preserving accuracy in the distributed privacy model.

5.1 Structure
Our mechanism has multiple output values. For our purposes, we use the notation of the Yes population as the ground truth, and the remaining data owners are the No population. Each population should blend with the other such that the released aggregate information cannot be used to increase the confidence or inference of an adversary that is trying to determine the value of a specific data owner.

Data owners are aggressively sampled (e.g., 5%). To overcome the estimation error due to the large variance, the estimates of the noisy "Yes" counts and the sampled counts are combined to offset each other and effectively cancel the noise, allowing for the aggressive sampling.

5.2 Sampling
Sampling, whereby a centralized aggregator randomly discards responses, has been previously formulated as a mechanism to amplify privacy [5, 13, 18-20]. The intuition is that when sampling approximates the original aggregate information, an attacker is unable to distinguish when sampling is performed and which data owners are sampled. These privacy mechanisms range from sampling without a sanitization mechanism, to sampling to amplify a differentially private mechanism, to sampling that tolerates a bias, and even to sampling a weaker privacy notion such as k-anonymity to amplify the privacy guarantees.

However, sampling alone has several issues. First, data owners that answer "Yes" do not have the protection of strong plausible deniability, as they never respond "No" and are never "forced" to respond "Yes" (e.g., via coin tosses). Data owners that answer "No" do not provide privacy protection, as they never answer "Yes". Second, as we increase the sampling rate the variance will increase rapidly, thus weakening accuracy. Finally, the privacy strength does not increase as the population increases. The Yes population is fixed (e.g., those at a particular location) and we can only increase the No population. The No population should also contribute noise by answering "Yes" in order to strengthen privacy.
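The following minimal sketch (our illustration, not the paper's code) shows the plain sampling estimator discussed above: the aggregator counts the sampled "Yes" responses, scales up by the sampling probability, and the binomial standard deviation of that scaled estimate grows as sampling becomes more aggressive.

```python
import random

def sample_and_estimate(true_yes: int, sampling_rate: float, seed: int = 0):
    """Plain sampling: each truthful-"Yes" owner reports with prob sampling_rate."""
    rng = random.Random(seed)
    observed = sum(1 for _ in range(true_yes) if rng.random() < sampling_rate)
    estimate = observed / sampling_rate
    # Standard deviation of the scaled-up binomial count.
    std = (true_yes * sampling_rate * (1.0 - sampling_rate)) ** 0.5 / sampling_rate
    return estimate, std

# Aggressive 5% sampling of 1,000 true "Yes" owners yields a wide error band.
print(sample_and_estimate(true_yes=1000, sampling_rate=0.05))
```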

5.3 Sampling and Noise
We could leverage the No population by using the same sampling rate but having a portion of the No population respond "Yes". To perform the final estimation we simply subtract the estimated added noise.

Sampling and Noise Response. Each data owner privatizes their actual value Value by performing the following Bernoulli trial. Let π_s be either the sampling probability for the Yes population, π_s_Yes, or for the No population, π_s_No:

    Privatized Value = ⊥ with probability 1 − π_s
                       Value with probability π_s    (2)

That is, a percentage of the Yes population responds "Yes" and a percentage of the No population responds "Yes" (providing noise). However, the Yes data owners do not answer "No" and also do not have plausible deniability (that is, being forced via coin toss to respond "Yes").

5.4 Sampling and Plausible Deniability
We would like to have a percentage of each population respond opposite to their actual value, provide plausible deniability, and have outputs from the space of "Yes", "No", and ⊥ (not participating), in order for the data owners to blend with each other.

To achieve plausible deniability via coin tosses, we have a small percentage of the "Yes" population be "forced" to respond "Yes". The other output values follow from the sampling and noise scenario. For the Yes population:

    Privatized Value = ⊥ with probability 1 − π_s_Yes
                       1 with probability π_s_Yes × (π_1 + (1 − π_1) × π_2)
                       0 otherwise    (3)

For the No population:

    Privatized Value = ⊥ with probability 1 − π_s_No
                       1 with probability π_s_No × (1 − π_1) × π_2
                       0 otherwise    (4)

A benefit of plausible deniability is that the estimation of the population provides privacy protection via noise. However, it is difficult to estimate the underlying ground truth due to the added noise. We desire better calibration of the privacy mechanism.
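As a sketch of how a single data owner could realize the responses in Equations (3) and (4), the helper below draws the coin tosses in sequence; the function names are ours, the probability grouping follows our reading of the equations above, and None stands in for ⊥.

```python
import random

def privatize_yes_owner(pi_s_yes, pi_1, pi_2, rng=random):
    """Equation (3) sketch: a truthful-"Yes" owner is sampled with prob pi_s_yes;
    if sampled, it answers truthfully with prob pi_1, otherwise a coin toss with
    prob pi_2 forces a "Yes"."""
    if rng.random() >= pi_s_yes:
        return None                                # ⊥ (not participating)
    if rng.random() < pi_1:
        return 1                                   # truthful "Yes"
    return 1 if rng.random() < pi_2 else 0         # forced coin toss

def privatize_no_owner(pi_s_no, pi_1, pi_2, rng=random):
    """Equation (4) sketch: a truthful-"No" owner contributes noise only through
    the forced coin toss, taken with prob (1 - pi_1) when sampled."""
    if rng.random() >= pi_s_no:
        return None                                # ⊥ (not participating)
    if rng.random() < (1.0 - pi_1) and rng.random() < pi_2:
        return 1                                   # noisy "Yes"
    return 0

# Example: aggressive sampling with a small forced-"Yes" probability.
print(privatize_yes_owner(0.05, 0.95, 0.98), privatize_no_owner(0.05, 0.95, 0.98))
```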

5.5 Mechanism
The output mechanism has multiple output values. A fraction of each of the Yes and No populations should be included. Each population should have some notion of plausible deniability. The total number of data owners DO can be computed by summing the total number of "Yes", "No", and ⊥ responses. It should be noted that if the ⊥ data owners did not write at all, it would be difficult to estimate and calculate the underlying Yes count.

Definition 5.1. (Minimal Variance Parameters) We model the sum of independent Bernoulli trials that are not identically distributed as the Poisson binomial distribution (we combine "No" and ⊥ into the same output space for modeling purposes). Let Yes′ represent those that respond "Yes", regardless of whether they are from the Yes or No population. The success probabilities are due to the contributions from the Yes and No populations. We then sum and search for the minimum variance. Utility is maximized when:

    min Var(P("Yes" | Yes)) + Var(P("Yes" | No))    (5)

There are a couple of observations. The first is that uniform sampling across both populations (Yes and No) limits the ability to achieve optimal variance. As we increase the No population, by increasing the queries and the number of data owners that participate, the variance will correspondingly increase. For example, 10% sampling will incur a large variance for a population of one million data owners. To address this, the sampling parameters should be separately tuned for each population. We desire a small number of data owners to be sampled from the Yes population to protect privacy, and an even smaller amount from the No population (as this population will be large and only a small amount is required for linear noise). The other observation is that, for the plausible deniability, fixing the probabilities to be the same across the Yes and No populations also restricts the variance that can be achieved.

Thus, the optimal distributed sampling mechanism is one that tunes both populations.

Mechanism. Let π_s be the sampling probability, either for the Yes population (π_s_Yes1, π_s_Yes2) or for the No population (π_s_No). Let π_p be the plausible deniability parameter, either for the Yes population (π_1, π_2) or for the No population (π_3), respectively. The mechanism that we use which satisfies distributed sampling is as follows. For the Yes population:

    Privatized Value = ⊥ with probability 1 − (π_s_Yes1 + π_s_Yes2)
                       1 with probability π_s_Yes1 × π_1
                       1 with probability π_s_Yes2 × π_2
                       0 with probability π_s_Yes1 × (1 − π_1) + π_s_Yes2 × (1 − π_2)    (6)

For the No population:

    Privatized Value = ⊥ with probability 1 − π_s_No
                       1 with probability π_s_No × π_3
                       0 otherwise    (7)

It should be noted that π_s_Yes1 × π_1 is the percentage of data owners that answer truthfully "Yes", and π_s_Yes2 × π_2 is the percentage of data owners that are "forced" to respond "Yes", providing the plausible deniability. Each case has its own coin toss parameters in order to be able to fine-tune the variance and reduce the estimation error, as opposed to the prior examples where the variance cascades across terms, adding error.

6 ACCURACY
6.1 First Attempt: Cancelling the Noise
Let DO be the total number of data owners. Let YES and NO be the population counts of those that truthfully respond "Yes" and "No" respectively, such that YES + NO = DO.

LEMMA 6.1. (Yes Estimate From Aggregated Count)
The expected value of "Yes" responses is:

    E["Yes"] = π_Yes1 × π_1 × YES + π_Yes2 × π_2 × YES + π_No × π_3 × (DO − YES)    (10)

Solving for YES results in:

    YES = (E["Yes"] − π_No × π_3 × DO) / (π_Yes1 × π_1 + π_Yes2 × π_2 − π_No × π_3)    (11)

The estimator ŶES_Yes accounting for the standard deviation σ_"Yes" is:

    ŶES_Yes = (E["Yes"] ± σ_"Yes" − π_No × π_3 × DO) / (π_Yes1 × π_1 + π_Yes2 × π_2 − π_No × π_3)    (12)

LEMMA 6.2. (Standard Deviation of the Aggregated "Yes" Count)
The standard deviation σ_"Yes" is:

    Var["Yes"] = (π_Yes1 × π_1 + π_Yes2 × π_2) × (1 − (π_Yes1 × π_1 + π_Yes2 × π_2)) × YES
               + (π_No × π_3) × (1 − π_No × π_3) × NO    (13)

    σ_"Yes" = sqrt(Var["Yes"])    (14)
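A minimal sketch (ours) of the point estimate in Lemma 6.1 combined with the error term of Lemma 6.2; the observed aggregate "Yes" count plays the role of E["Yes"], and the parameter names mirror the equations above.

```python
import math

def estimate_yes_from_yes_count(observed_yes, total_owners,
                                pi_yes1, pi_yes2, pi_no, pi_1, pi_2, pi_3):
    """Equations (10)-(11) sketch: invert the expected "Yes" count to recover YES."""
    denom = pi_yes1 * pi_1 + pi_yes2 * pi_2 - pi_no * pi_3
    return (observed_yes - pi_no * pi_3 * total_owners) / denom

def stddev_of_yes_count(yes, no, pi_yes1, pi_yes2, pi_no, pi_1, pi_2, pi_3):
    """Equations (13)-(14) sketch: Bernoulli variance of the aggregated "Yes" count."""
    p_yes = pi_yes1 * pi_1 + pi_yes2 * pi_2   # prob a Yes owner outputs "Yes"
    p_no = pi_no * pi_3                       # prob a No owner outputs "Yes"
    return math.sqrt(p_yes * (1 - p_yes) * yes + p_no * (1 - p_no) * no)

# Illustrative parameters only; adding or subtracting the sigma of the second
# helper in the numerator of equation (12) gives the interval estimate.
print(estimate_yes_from_yes_count(2500, 50000, 0.45, 0.50, 0.05, 0.95, 0.98, 0.98))
```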

LEMMA 6.3. (Yes Estimate From Aggregated "No" Count)
The expected value of "No" responses is:

    E["No"] = π_Yes1 × (1 − π_1) × YES + π_Yes2 × (1 − π_2) × YES + π_No × (1 − π_3) × (DO − YES)    (15)

Solving for YES results in:

    YES = (E["No"] − π_No × (1 − π_3) × DO) / (π_Yes1 × (1 − π_1) + π_Yes2 × (1 − π_2) − π_No × (1 − π_3))    (16)

The estimator ŶES_No accounting for the standard deviation σ_"No" is:

    ŶES_No = (E["No"] ± σ_"No" − π_No × (1 − π_3) × DO) / (π_Yes1 × (1 − π_1) + π_Yes2 × (1 − π_2) − π_No × (1 − π_3))    (17)

LEMMA 6.4. (Standard Deviation of the Aggregated "No" Count)
The standard deviation σ_"No" is:

    Var["No"] = (π_Yes1 × (1 − π_1) + π_Yes2 × (1 − π_2)) × (1 − (π_Yes1 × (1 − π_1) + π_Yes2 × (1 − π_2))) × YES
              + (π_No × (1 − π_3)) × (1 − π_No × (1 − π_3)) × NO    (18)

    σ_"No" = sqrt(Var["No"])    (19)

LEMMA 6.5. (Yes Estimate From Sampled Population)
The expected value of ⊥ (not participating) responses is:

    E[⊥] = (1 − (π_Yes1 + π_Yes2)) × YES + (1 − π_No) × (DO − YES)    (20)

Solving for YES results in:

    YES = (E[⊥] − (1 − π_No) × DO) / ((1 − (π_Yes1 + π_Yes2)) − (1 − π_No))    (21)

The estimator ŶES_⊥ accounting for the standard deviation σ_⊥ is:

    ŶES_⊥ = (E[⊥] ± σ_⊥ − (1 − π_No) × DO) / ((1 − (π_Yes1 + π_Yes2)) − (1 − π_No))    (22)

LEMMA 6.6. (Standard Deviation of the Sampled Population)
The standard deviation σ_⊥ is:

    Var[⊥] = (1 − (π_Yes1 + π_Yes2)) × (π_Yes1 + π_Yes2) × YES + (1 − π_No) × π_No × NO    (23)

    σ_⊥ = sqrt(Var[⊥])    (24)

LEMMA 6.7. (Solving for YES)
There are two approaches we can take. We can either use the "Yes" estimators to estimate the underlying population as described earlier, or we can treat the equations as a system of linear equations. The observation is that setting π_1 = π_2 = π_3 = 1 results in the standard deviations σ_"Yes", σ_"No", and σ_⊥ being equal. This has the effect of producing no "No" responses, and the two equations are thus dependent. We have the following system of linear equations in the two unknown variables YES and σ:

    E["Yes"] ± σ = Observed["Yes"]    (25)

    E[⊥] ± σ = Observed[⊥]    (26)

    (E["Yes"] ± σ) + (E[⊥] ± σ) = DO    (27)

We then solve for YES and σ for each combination of ± signs using a solver, and we eliminate the solutions which assign YES a negative value.

It would be nice if we could cancel out the error. It would also be nice if the system of linear equations above had exactly one solution. However, it is not clear that we can immediately guarantee this.
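One way to carry out the sign sweep in Lemma 6.7 is sketched below under the stated assumption π_1 = π_2 = π_3 = 1; the helper solves each 2×2 system from Equations (25)-(26) directly by Cramer's rule instead of calling an external solver, and discards negative YES values. The names and the inlined algebra are ours.

```python
from itertools import product

def solve_yes_and_sigma(obs_yes, obs_bot, total_owners, pi_yes1, pi_yes2, pi_no):
    """Lemma 6.7 sketch (pi_1 = pi_2 = pi_3 = 1): try each +/- combination in
    Equations (25)-(26), solve for (YES, sigma), keep non-negative YES."""
    a = pi_yes1 + pi_yes2     # prob a Yes owner responds "Yes"
    c = pi_no                 # prob a No owner responds "Yes"
    solutions = []
    for s1, s2 in product((+1, -1), repeat=2):
        # (a - c) * YES + s1 * sigma = obs_yes - c * DO
        # (c - a) * YES + s2 * sigma = obs_bot - (1 - c) * DO
        m00, m01, m10, m11 = a - c, float(s1), c - a, float(s2)
        r0 = obs_yes - c * total_owners
        r1 = obs_bot - (1.0 - c) * total_owners
        det = m00 * m11 - m01 * m10
        if abs(det) < 1e-12:
            continue          # dependent equations for this sign choice
        yes = (r0 * m11 - m01 * r1) / det
        sigma = (m00 * r1 - m10 * r0) / det
        if yes >= 0:
            solutions.append((s1, s2, yes, sigma))
    return solutions

# Illustrative counts: 50,000 owners, 80 observed "Yes" writes, 49,930 ⊥ writes.
print(solve_yes_and_sigma(80, 49930, 50000, 0.045, 0.005, 0.0005))
```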

6.2 Reducing the Error
As we control the randomization, can we construct a mechanism whereby the error introduced by the NO population cancels out? Performing uniform sampling across both the YES and NO populations allows us to cancel the population error, though we are then not able to precisely estimate the YES population, as the YES terms also cancel out.

One observation is that the error terms could potentially cancel out if the signs were flipped. Thus, we construct our mechanism as follows. Each data owner responds twice for the same query, though slightly flips a single term to allow for the error cancellation. The responses are drawn from the following distributions, where π_⊥3 denotes 1 − π_⊥1 − π_⊥2 − π_s:

    A Privatized Value = ⊥_1 with probability π_⊥1 + π_s
                         ⊥_2 with probability π_⊥2
                         ⊥_3 with probability π_⊥3    (28)

    B Privatized Value = ⊥_1 with probability π_⊥1
                         ⊥_2 with probability π_⊥2 + π_s
                         ⊥_3 with probability π_⊥3    (29)

    Yes_C Privatized Value = ⊥_1 with probability π_⊥1
                             ⊥_2 with probability π_⊥2 + π_s
                             ⊥_3 with probability π_⊥3    (30)

    No_C Privatized Value = ⊥_1 with probability π_⊥1
                            ⊥_2 with probability π_⊥2
                            ⊥_3 with probability π_⊥3 + π_s    (31)

The expected values are as follows. The first set of expected values is:

    E[⊥_1,A] = π_⊥1 × TOTAL + π_s × TOTAL
    E[⊥_2,A] = π_⊥2 × TOTAL    (32)
    E[⊥_3,A] = π_⊥3 × TOTAL

The second set of expected values is:

    E[⊥_1,B] = π_⊥1 × TOTAL
    E[⊥_2,B] = π_⊥2 × TOTAL + π_s × TOTAL    (33)
    E[⊥_3,B] = π_⊥3 × TOTAL

The third set of expected values is:

    E[⊥_1,C] = π_⊥1 × TOTAL
    E[⊥_2,C] = π_⊥2 × TOTAL + π_s × YES    (34)
    E[⊥_3,C] = π_⊥3 × TOTAL + π_s × NO

We can then solve for the YES population as follows:

    YES = (E[⊥_2,C] − E[⊥_2,A]) / π_s    (35)
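A minimal sketch (ours) of the cancellation estimate in Equation (35): the common π_⊥2 mass is identical in the A and C responses, so differencing the two observed ⊥_2 counts isolates π_s × YES.

```python
def estimate_yes_by_cancellation(bot2_count_c, bot2_count_a, pi_s):
    """Equation (35) sketch: the pi_bot2 * TOTAL term cancels between the A and C
    responses, leaving pi_s * YES, so dividing by pi_s recovers the YES count."""
    return (bot2_count_c - bot2_count_a) / pi_s

# Illustrative counts (not from the paper's dataset): a difference of 250 over a
# 5% sampling probability points to roughly 5,000 truthful "Yes" owners.
print(estimate_yes_by_cancellation(bot2_count_c=5260.0, bot2_count_a=5010.0, pi_s=0.05))
```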

6.3 Multiple Values
We now examine how to privatize multiple output values. We can achieve this in two rounds. Say there are V possible output values (e.g., V possible location IDs), and let V′ be the truthful output value. Let the probabilities π_⊥0, π_⊥1, ..., π_⊥V together with π_s sum to one.

In round one, every data owner responds as follows:

    Round One = ⊥_0 with probability π_⊥0 + π_s
                ⊥_1 with probability π_⊥1
                ⊥_2 with probability π_⊥2
                ...
                ⊥_V with probability π_⊥V    (36)

In round two, a data owner whose truthful value is V′ shifts the π_s mass onto ⊥_V′, while the remaining data owners leave it on ⊥_0:

    Yes Round Two = ⊥_0 with probability π_⊥0
                    ⊥_1 with probability π_⊥1
                    ...
                    ⊥_V′ with probability π_⊥V′ + π_s
                    ...
                    ⊥_V with probability π_⊥V    (37)

    No Round Two = ⊥_0 with probability π_⊥0 + π_s
                   ⊥_1 with probability π_⊥1
                   ⊥_2 with probability π_⊥2
                   ...
                   ⊥_V with probability π_⊥V    (38)

The expected values are as follows (the second subscript indexes the round). The first round of expected values is:

    E[⊥_0,0] = π_⊥0 × TOTAL + π_s × TOTAL
    E[⊥_1,0] = π_⊥1 × TOTAL
    E[⊥_2,0] = π_⊥2 × TOTAL    (39)
    ...
    E[⊥_V,0] = π_⊥V × TOTAL

The second round of expected values is:

    E[⊥_0,1] = π_⊥0 × TOTAL + π_s × NO
    E[⊥_1,1] = π_⊥1 × TOTAL
    E[⊥_2,1] = π_⊥2 × TOTAL
    ...    (40)
    E[⊥_V′,1] = π_⊥V′ × TOTAL + π_s × YES
    ...
    E[⊥_V,1] = π_⊥V × TOTAL

We then solve for the YES population (the owners whose truthful value is V′) as follows:

    YES = (E[⊥_V′,1] − E[⊥_V′,0]) / π_s    (41)

7 PRIVACY GUARANTEE
7.1 Differential Privacy Guarantee
DDPS satisfies differential privacy. Let "Yes" be the privatized response, and let Yes and No be the respective populations.

    ε_DP = max( ln( Pr[⊥_2 | Yes] / Pr[⊥_2 | No] ), ln( Pr[⊥_3 | No] / Pr[⊥_3 | Yes] ) )    (42)

    Pr[⊥_2 | Yes] / Pr[⊥_2 | No] = (π_⊥2 + π_s) / π_⊥2    (43)

    Pr[⊥_3 | No] / Pr[⊥_3 | Yes] = (π_⊥3 + π_s) / π_⊥3    (44)

    ε_DP = max( ln( (π_⊥2 + π_s) / π_⊥2 ), ln( (π_⊥3 + π_s) / π_⊥3 ) )    (45)
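A small sketch (ours) of the leakage bound in Equation (45); the parameter values below are purely illustrative, and the logarithm is written out explicitly so that the result matches the definition in Equation (42).

```python
import math

def ddps_epsilon(pi_bot2: float, pi_bot3: float, pi_s: float) -> float:
    """Equation (45) sketch: epsilon is the larger log-ratio over the two outputs
    whose probabilities differ between the Yes and No populations."""
    ratio_bot2 = (pi_bot2 + pi_s) / pi_bot2   # Pr[⊥2 | Yes] / Pr[⊥2 | No]
    ratio_bot3 = (pi_bot3 + pi_s) / pi_bot3   # Pr[⊥3 | No]  / Pr[⊥3 | Yes]
    return max(math.log(ratio_bot2), math.log(ratio_bot3))

# A small sampling probability keeps both ratios, and hence epsilon, small.
print(ddps_epsilon(pi_bot2=0.30, pi_bot3=0.30, pi_s=0.05))
```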
8 EVALUATION
8.1 Accuracy
We evaluate the Distributed Differential Privacy By Sampling (DDPS) mechanism over a real dataset rather than arbitrary distributions. We utilize the California Transportation Dataset from magnetic pavement sensors [2] collected on LA/Ventura California freeways [3]. There are a total of 3,220 stations and 47,719 vehicles. We assign virtual identities to each vehicle. Each vehicle announces the station it is currently at.

Figure 1 compares the DDPS mechanism to the ground truth data over a 24-hour time period with a confidence interval of 99%. We select a single popular highway station that collects and aggregates vehicle counts every 30 seconds. (For reasons of illustration we graph a subset of points for readability.) We assign virtual identifiers and have every vehicle at the monitored station truthfully report "Yes", while every other vehicle in the population truthfully reports "No". The DDPS mechanism then privatizes each vehicle's response. Traffic management analyzing the privatized time series would be able to infer the ebbs and flows of the vehicular traffic.

The coin toss probabilities are fixed and the sampling rate π_s_No is adjusted for the privacy and error guarantees. The parameters π_s_Yes1 = 0.45, π_s_Yes2 = 0.50, π_1 = 0.95, π_2 = 0.98, π_3 = 0.98 are fixed, and π_s_No varies from 0.068 to 0.00000068.

8.2 Privacy
We evaluate the privacy strength of the DDPS mechanism by examining the crowd sizes. We use the binomial complementary cumulative distribution function (ccdf) to determine the threshold of the number of successes. The crowd blending size is controlled by the No population, that is, the number of data owners that truthfully respond "No". Also, the data owner blends into the ⊥ crowd, that is, the crowd which is not participating.

The number of crowds a single owner blends into is determined by the number of queries (e.g., the number of crowdsourced locations). The coin toss probability π_s_No × π_3 magnifies the number of crowds, that is, the number of queries that the data owner will respond to.

Table 1 shows the number of additional locations a single data owner will respond to out of 3,320. Decreasing the sampling rate causes a reduction in the number of additional locations. In this case, the mechanism essentially ignores the data owner, as the aggregate output information is essentially the same output distribution, as shown by the graph. As each data owner reports they are at their actual location less than 5% of the time, the additional three simultaneous announcements provide adequate privacy protection.

    Locations | TossThree (π_3) | SamplingRate (π_s_No) | Number Stations
    1         | 0.98            | 0.05                  | 3,320
    3         | 0.98            | 0.00025               | 83,000
    3         | 0.98            | 0.000025              | 830,000

Table 1: Multiple Locations. A single data owner will claim to be in multiple locations simultaneously.

Table 2 shows the number of data owners an individual will blend with. We vary the population of additional data owners and accordingly decrease the sampling rate π_s_No to calibrate the error. The noise added is linear in the number of data owners to protect. Sampling reduces the crowd size yet at the same time provides privacy protection.

    CrowdSize | TossThree (π_3) | SamplingRate (π_s_No) | Population
    160       | 0.98            | 0.05                  | 48,719
    140       | 0.98            | 0.00025               | 1,047,719
    130       | 0.98            | 0.000025              | 10,047,719

Table 2: Crowd size. A single data owner blends with k other data owners who noisily answer "Yes" from the No population.

9 CONCLUSION
In this paper, we show how to achieve differential privacy in the distributed setting by sampling alone. We show that we can maintain constant error (i.e., absolute error) even as the population increases, and we achieve lower privacy leakage as compared to randomized response.

Figure 1: Accuracy. Ground truth versus privatized vehicle counts over a 24-hour period with a 99% confidence interval, for DDPS and randomized response (rr) with No populations of 10k, 100k, 200k, and 1 million. The population not at the station being monitored (i.e., the No population) is increased by expanding the query to the population listed in the table.

REFERENCES
[1] Avrim Blum, Katrina Ligett, and Aaron Roth. 2013. A learning theory approach to noninteractive database privacy. J. ACM 60, 2 (2013), 12:1-12:25. https://doi.org/10.1145/2450142.2450148
[2] California Department of Transportation. 2017. California Department of Transportation. http://pems.dot.ca.gov/
[3] Caltran CWWP Information. 2017. Google's Waze announces government data exchange program with 10 initial partners. http://www.dot.ca.gov/cwwp/InformationPageForward.do
[4] T.-H. Hubert Chan, Mingfei Li, Elaine Shi, and Wenchang Xu. 2012. Differentially Private Continual Monitoring of Heavy Hitters from Distributed Streams. In PETS.
[5] Kamalika Chaudhuri and Nina Mishra. 2006. When Random Sampling Preserves Privacy. In CRYPTO 2006 (LNCS, Vol. 4117). Springer, 198-213. https://doi.org/10.1007/11818175_12
[6] John C. Duchi, Martin J. Wainwright, and Michael I. Jordan. 2013. Local Privacy and Minimax Bounds: Sharp Rates for Probability Estimation. In NIPS 2013. 1529-1537. http://papers.nips.cc/paper/5013-local-privacy-and-minimax-bounds-sharp-rates-for-probability-estimation
[7] Cynthia Dwork. 2006. Differential Privacy. In ICALP.
[8] Cynthia Dwork, Krishnaram Kenthapadi, Frank McSherry, Ilya Mironov, and Moni Naor. 2006. Our Data, Ourselves: Privacy Via Distributed Noise Generation. In EUROCRYPT.
[9] Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. 2006. Calibrating Noise to Sensitivity in Private Data Analysis. In TCC.
[10] Cynthia Dwork and Aaron Roth. 2014. The Algorithmic Foundations of Differential Privacy. Foundations and Trends in Theoretical Computer Science 9, 3-4 (2014), 211-407. https://doi.org/10.1561/0400000042
[11] Úlfar Erlingsson, Vasyl Pihur, and Aleksandra Korolova. 2014. RAPPOR: Randomized Aggregatable Privacy-Preserving Ordinal Response. In CCS.
[12] James Alan Fox and Paul E. Tracy. 1986. Randomized Response: A Method for Sensitive Surveys. Sage Publications, Beverly Hills, California.
[13] Johannes Gehrke, Michael Hay, Edward Lui, and Rafael Pass. 2012. Crowd-Blending Privacy. In CRYPTO 2012 (LNCS, Vol. 7417). Springer, 479-496. https://doi.org/10.1007/978-3-642-32009-5_28
[14] Johannes Gehrke, Edward Lui, and Rafael Pass. 2011. Towards Privacy for Social Networks: A Zero-Knowledge Based Definition of Privacy. In TCC 2011 (LNCS, Vol. 6597). Springer, 432-449. https://doi.org/10.1007/978-3-642-19571-6_26
[15] Bernard G. Greenberg, Abdel-Latif A. Abul-Ela, Walt R. Simmons, and Daniel G. Horvitz. 1969. The unrelated question randomized response model: Theoretical framework. J. Amer. Statist. Assoc. 64, 326 (1969), 520-539.
[16] J. Joy and M. Gerla. 2016. PAS-MC: Privacy-preserving Analytics Stream for the Mobile Cloud. ArXiv e-prints (April 2016). arXiv:cs.CR/1604.04892
[17] J. Joy, S. Rajwade, and M. Gerla. 2016. Participation Cost Estimation: Private Versus Non-Private Study. ArXiv e-prints (April 2016). arXiv:cs.CR/1604.04810
[18] Shiva Prasad Kasiviswanathan, Homin K. Lee, Kobbi Nissim, Sofya Raskhodnikova, and Adam D. Smith. 2008. What Can We Learn Privately? In FOCS 2008. IEEE Computer Society, 531-540. https://doi.org/10.1109/FOCS.2008.27
[19] Ninghui Li, Wahbeh H. Qardaji, and Dong Su. 2012. On sampling, anonymization, and differential privacy or, k-anonymization meets differential privacy. In ASIACCS '12. ACM, 32-33. https://doi.org/10.1145/2414456.2414474
[20] Kobbi Nissim, Sofya Raskhodnikova, and Adam D. Smith. 2007. Smooth sensitivity and sampling in private data analysis. In STOC 2007. ACM, 75-84. https://doi.org/10.1145/1250790.1250803
[21] Rafail Ostrovsky and Victor Shoup. 1997. Private Information Storage (Extended Abstract). In STOC 1997. ACM, 294-303. https://doi.org/10.1145/258533.258606
[22] Ajit C. Tamhane. 1981. Randomized Response Techniques for Multiple Sensitive Attributes. J. Amer. Statist. Assoc. 76, 376 (1981), 916-923. https://doi.org/10.1080/01621459.1981.10477741
[23] Stanley L. Warner. 1965. Randomized response: A survey technique for eliminating evasive answer bias. J. Amer. Statist. Assoc. 60, 309 (1965), 63-69.


Figure 2: Privacy Leakage. Differential privacy ε leakage as a function of the coin toss probability, for randomized response (rr) and DDPS (dps) configurations from 0.2 to 0.8.


Figure 3: Privacy Leakage. Differential privacy ε leakage of randomized response versus DDPS as a function of the coin toss probability.