Asymmetric Differential Privacy

Shun Takagi, Yang Cao, Masatoshi Yoshikawa
Kyoto University, Kyoto, Japan
[email protected], [email protected], [email protected]

ABSTRACT

Recently, differential privacy (DP) has been attracting attention as a privacy definition for publishing statistics of a dataset. However, when answering a decision problem with a DP mechanism, the answer suffers from a two-sided error. This characteristic of DP is not desirable when publishing risk information such as that concerning COVID-19. This paper proposes a relaxation of DP to mitigate this limitation and improve the utility of published information. First, we define a policy that separates information into sensitive and non-sensitive. Then, we define asymmetric differential privacy (ADP), which provides the same privacy guarantee as DP to sensitive information. This partial protection induces asymmetricity in privacy protection, which improves utility and allows a one-sided error mechanism. Following ADP, we propose mechanisms for two tasks based on counting queries that utilize these characteristics: the top-푘 query and publishing risk information of viruses with an accuracy guarantee. Finally, we conducted experiments to evaluate the proposed algorithms using real-world datasets, showing their practicality and their improvement in utility compared with state-of-the-art algorithms.

PVLDB Reference Format:
Shun Takagi, Yang Cao, and Masatoshi Yoshikawa. Asymmetric Differential Privacy. PVLDB, 14(1): XXX-XXX, 2020.
doi:XX.XX/XXX.XX

PVLDB Artifact Availability:
The source code, data, and/or other artifacts have been made available at http://vldb.org/pvldb/format_vol14.html.

This work is licensed under the Creative Commons BY-NC-ND 4.0 International License. Visit https://creativecommons.org/licenses/by-nc-nd/4.0/ to view a copy of this license. For any use beyond those covered by this license, obtain permission by emailing [email protected]. Copyright is held by the owner/author(s). Publication rights licensed to the VLDB Endowment.
Proceedings of the VLDB Endowment, Vol. 14, No. 1 ISSN 2150-8097. doi:XX.XX/XXX.XX
arXiv:2103.00996v1 [cs.CR] 1 Mar 2021

1 INTRODUCTION

Differential privacy (DP) [12] is becoming a gold-standard privacy notion: the US census announced the adoption of DP when publishing the 2020 Census results [4], and IT companies such as Google [16], Apple [31], Microsoft [9], and Uber [21] use DP to protect privacy while collecting data. This popularity stems from DP's mathematical rigorousness under the assumption that an adversary may have arbitrary knowledge about individuals in the dataset.

This paper considers the error of a randomized algorithm on a decision problem, i.e., a query answered by yes or no. Two kinds of error can occur when solving a decision problem with a randomized algorithm such as a Monte Carlo algorithm [24]: one-sided error and two-sided error. For example, suppose we answer yes to a decision problem based on Warner's randomized response [33], which satisfies DP. In that case, observers cannot know whether the true answer is yes or no because of the possible error (i.e., two-sided error). A one-sided error means that the randomized mechanism correctly answers on one side of the answer (e.g., yes), but the opposite answer may include error. If we answer yes using Mangat's randomized response [26], observers can know that the true answer is yes because the output of the algorithm has this one-sided characteristic.

As an example of a problem caused by the two-sided characteristic, we consider publishing risky location information for epidemic disease monitoring. Specifically, we want to publish whether each location is safe or not safe, i.e., whether it has been visited by many infected people (a counting query). Korea has succeeded in controlling COVID-19 by measures including publishing information on infected people's moving trajectories [30]. Despite its effectiveness, this measure was criticized as an invasion of privacy [30]. One may think to release a DP histogram of "how many infected people visited each location". However, it must include two-sided errors due to the nature of DP. That means that even if the published information says safe, it may not be true because of the noise of DP. Without a guarantee of accuracy (one-sided error), the published information is not useful for taking an appropriate countermeasure.

DP is mathematically defined, so we can derive theoretical limitations on the error on a decision problem. This paper proves that there is no one-sided error mechanism that satisfies 휖-DP (i.e., the error of all DP mechanisms must be two-sided). Therefore, we cannot construct a mechanism that satisfies DP with one-sided error. We note that Mangat's randomized response does not follow DP.

This paper proposes a new relaxed definition of DP to overcome this limitation, called Asymmetric Differential Privacy (ADP). We introduce a policy which defines whether a value is sensitive or non-sensitive. Then, we define ADP on the given policy. ADP protects privacy following the given policy, so an ADP mechanism protects individuals' sensitive values and does not protect non-sensitive values. Concretely, ADP guarantees the indistinguishability of sensitive values from non-sensitive values, but ADP does not guarantee the indistinguishability of non-sensitive values from sensitive values¹. This asymmetricity enables constructing a one-sided error mechanism. We then propose one-sided error mechanisms to publish safety information about viruses from a trajectory dataset.

¹When DP guarantees the indistinguishability of A from B, DP also guarantees the indistinguishability of B from A, which is the symmetricity of DP.

The asymmetricity not only allows a one-sided error mechanism but also improves existing DP mechanisms. First, we consider a single-dimensional counting query. The Euclidean distance error (i.e., E푧[|푥 − 푧|] where 푧 is the output and 푥 is the true count) has the lower bound √2/휖, and the Laplace mechanism [13] is optimal, as proved by Koufogiannis et al. [23]. We show that ADP allows a Euclidean distance error of 1/휖 due to the asymmetricity. Second, this paper considers two representative DP mechanisms using the Laplace mechanism: report noisy arg-max and the sparse vector technique [10, 25]. These algorithms aim to make the histogram of the top-푘 counts (i.e., a set of counting queries). We propose replacing DP with ADP in these two algorithms. The asymmetricity improves the accuracy of the output histogram with the same privacy guarantee for the sensitive values.

Finally, we conducted two types of experiments to evaluate our proposed mechanisms using two kinds of real-world datasets: click-streaming data and trajectory data. First, we show with click-streaming data that the output histogram of the top-푘 items from our ADP mechanisms is about ten times more accurate w.r.t. the mean squared error than a state-of-the-art mechanism [10] due to the relaxation. Second, we show with trajectory data that our proposed mechanisms are practical for publishing safety information about viruses. ADP can guarantee the accuracy of safety but cannot guarantee the accuracy of dangerousness (i.e., false negatives occur). Our experiments show that the false-negative ratio is practically small: in almost all cases, it is smaller than 0.01 at 휖 = 1.

Our contribution in this paper is threefold.
• We derive the theorem showing that there are no DP algorithms with one-sided error. Then, we introduce a policy and propose a new privacy definition, called Asymmetric Differential Privacy (ADP), which follows a policy to allow one-sided error, and we derive theorems that show the improved utility of the one-sided error.
• We introduce a primitive mechanism that satisfies ADP. Then, we propose a policy and mechanisms that follow ADP on the policy for the top-푘 query and for monitoring locations' safety against viruses with an accuracy guarantee.
• We show the improvements over the state-of-the-art mechanism [10] and the practicality for publishing risk information of viruses through experiments using real-world datasets.

1.1 Related Work

Chen et al. [7] proposed a way of constructing a COVID-19 vulnerability map with geo-indistinguishability. Geo-indistinguishability is a local model, so its noise is much larger and the granularity of the map needs to be coarse. Our model is central, so we need a trusted third party, but our map can have any granularity due to the low sensitivity. Contact tracing is now widely used [3, 18], such as BlueTrace in Singapore and Cocoa in Japan, but this model is said to require more than about half of the population to participate to decrease infection [29]. Actually, many countries failed due to the lack of participants [8] because of several concerns [28]. Our model requires only infected people to participate, so such a problem does not occur.

As in this work, several papers proposed relaxations of DP that define a policy to improve utility [5, 20, 22]. Their policies are based on a discriminative pair, so the asymmetricity that we utilize does not occur. Therefore, their relaxations cannot achieve our improvement of utility.

To the best of our knowledge, this paper is the first to consider one-sided error in DP. However, we note that several relaxation models of DP can achieve one-sided error implicitly [1, 11, 17, 27]. One-sided differential privacy (OSDP) [11] also defines a policy that separates records into non-sensitive and sensitive records to improve the utility of output information by utilizing non-sensitive records. However, OSDP cannot follow our policy, which separates values. If a record is a single value (i.e., the data is one-dimensional), OSDP is identical to ADP and the asymmetricity occurs in OSDP, but that paper does not mention or utilize the asymmetricity. Moreover, if we use OSDP in our tasks, the asymmetricity does not occur because we use multi-dimensional data.

Mangat's randomized response [26] follows our privacy notion, and we can interpret it as an asymmetrization of the traditional randomized response (i.e., Warner's randomized response [33]). Utility-optimized local differential privacy (ULDP) [27] is also a privacy definition that Mangat's randomized response follows, so ULDP also utilizes asymmetricity. However, ULDP is defined in the local setting on data with one attribute, so we cannot apply ULDP to our task (i.e., counting query). Context-aware local differential privacy [1] and input-discriminative local differential privacy [17], which are generalizations of ULDP, define a precise privacy level for a combination of data. Therefore, they can also utilize asymmetricity in a certain setting (although those papers did not utilize it), but for the same reason as ULDP, we cannot apply them to our task.

2 BACKGROUND

Here, we first introduce DP and related techniques, which are the basis of our proposed notions. Second, we explain the counting query and the decision problem, which are the target queries to which we introduce our privacy notion. We show the notations used in this paper in Table 1.

Table 1: Notation table.
| Symbol | Meaning |
| 퐷 ∈ D | A dataset, i.e., a set of records. |
| 푟 = (푟1, 푟2, . . . , 푟푑) | A record, i.e., a set of attributes. |
| 푟푖 ∈ X푖 | An attribute of record 푟. This paper assumes X푖 is binary. |
| 푧 ∈ Z | An output of a mechanism. |
| 푆 ⊆ Z | A subset of outputs. |
| 휆 | A random variable. |
| 퐻 = (휆1, 휆2, . . . , 휆푑) | A set of random variables. |
| 푚 : D → Z | A randomized mechanism. |
| 푚퐻 : D → Z | A mechanism using random variables 퐻. |
| R | The universe of real numbers. |
| N | The universe of natural numbers. |
| 휖 ∈ R+ | A privacy budget. |
| 푓 : D → Z | A query. This paper considers that Z is N. |
| 푞 : D → {0, 1} | A decision problem. |

2.1 Differential Privacy

Differential privacy (DP) [12] is a mathematical privacy definition that quantitatively evaluates the degree of privacy protection when publishing an output by a randomized mechanism. The definition of DP is as follows:

DEFINITION 1 ((휀, 훿)-DIFFERENTIAL PRIVACY). A randomized mechanism 푚 : D → Z satisfies (휀, 훿)-DP iff for any two neighboring datasets 퐷, 퐷′ ∈ D and any subset of outputs 푆 ⊆ Z, it holds that
Pr[푚(퐷) ∈ 푆] ≤ exp(휀) Pr[푚(퐷′) ∈ 푆] + 훿, (1)
where D and Z are the universes of datasets and outputs, respectively, and neighboring datasets are two datasets that differ in only one record.

Practically, we employ a randomized mechanism 푚 that ensures DP for a query 푓 : D → Z. The mechanism 푚 perturbs the output of 푓 to cover 푓's sensitivity, which is the maximum degree of change over any pair of neighboring datasets 퐷 and 퐷′.

DEFINITION 2 (SENSITIVITY). The sensitivity of a query 푓 is:
Δ푓 = sup_{퐷,퐷′ ∈ D} |푓(퐷) − 푓(퐷′)|,
where | · | is a norm function defined on 푓's output domain and 퐷 and 퐷′ are neighboring datasets.

Based on the sensitivity of 푓, we design the degree of noise to ensure DP.

2.1.1 Laplace Mechanism. The Laplace mechanism [13] is the most standard mechanism for DP, and much of the literature builds on it. This mechanism adds noise following the Laplace distribution to the answer of 푓:
퐿푎푝휖(푓, 퐷) = 푓(퐷) + 휆,
where 휆 is a random variable following the Laplace distribution with density (휖/2Δ푓) exp(−|휆|휖/Δ푓). The Laplace mechanism satisfies 휖-DP.

2.1.2 Composition Theorem. The composition theorem [15] shows that a mechanism that sequentially applies DP mechanisms satisfies DP. Informally, the composition theorem is as follows.

THEOREM 1. Let 푚푖 be an 휖푖-DP mechanism for 푖 ∈ [푘]. Then the sequential mechanism 푚[푘] (i.e., 푚[푘](푥) = (푚1(푥), 푚2(푥), . . . , 푚푘(푥))) satisfies (Σ푖∈[푘] 휖푖)-DP.

2.1.3 Randomness Alignment. A DP mechanism can be so complicated (e.g., the sparse vector technique described in the following section) that wrong proofs have appeared [25]. Many verification tools using a proof syntax [2, 32, 34] have therefore been proposed to verify the correctness of DP proofs. The idea behind the proof syntax is called randomness alignment. Here, we explain randomness alignment using pure DP (i.e., 훿 = 0).

In general, we need to prove that Inequality (1) holds for any output and any pair of neighboring datasets. 퐻 denotes the set of random variables used in a mechanism, and 푚퐻(퐷) denotes the output of 푚 using 퐻. We then introduce a randomness alignment 휙퐷,퐷′, which is a function that maps noise vectors 퐻 to noise vectors 퐻′ so that 푚퐻(퐷) = 푚퐻′(퐷′), where 퐷 and 퐷′ are two neighboring datasets. We follow Ding's formulation of the definitions and notations for randomness alignment [10].

DEFINITION 3 (RANDOMNESS ALIGNMENT [10]). A randomness alignment is a function 휙퐷,퐷′ such that for all 퐻, 푚퐻(퐷) = 푚휙퐷,퐷′(퐻)(퐷′).

DEFINITION 4 (LOCAL ALIGNMENT [10]). A local alignment for 푚 is a function 휙퐷,퐷′,푧 : 푆퐷:푧 → 푆퐷′:푧 such that for all 퐻 ∈ 푆퐷:푧, we have 푚퐻(퐷) = 푚휙퐷,퐷′,푧(퐻)(퐷′), where 푆퐷:푧 = {퐻 | 푚퐻(퐷) = 푧} and 푧 is a possible output of 푚.

DEFINITION 5 (ACYCLIC [10]). For any 퐻 = (휆1, 휆2, . . .), let 퐻′ = (휆′1, 휆′2, . . .) denote 휙퐷,퐷′,푧(퐻). We say that 휙퐷,퐷′,푧 is acyclic if there exist a permutation 휋 and piecewise differentiable functions 휓(푗)퐷,퐷′,푧 such that:
휆′휋(1) = 휆휋(1) + a number that only depends on 퐷, 퐷′, 푧,
휆′휋(푗) = 휆휋(푗) + 휓(푗)퐷,퐷′,푧(휆휋(1), . . . , 휆휋(푗−1)) for 푗 ≥ 2.

DEFINITION 6 (ALIGNMENT COST [10]). Suppose each 휆푖 is generated independently from a distribution 푓푖 with the property that log(푓푖(푥)/푓푖(푦)) ≤ |푥 − 푦|/훼푖 for all 푥, 푦 in the domain of 푓푖. Then the cost of 휙퐷,퐷′,푧 is defined as:
푐표푠푡(휙퐷,퐷′,푧) = Σ푖 |휆푖 − 휆′푖|/훼푖.

Using these formulations, we can derive the following theorem.

THEOREM 2. If the following conditions are satisfied, then 푚 satisfies 휖-DP [10].
(1) 푚 terminates with probability 1.
(2) The number of random variables used by 푚 can be determined from its output.
(3) Each 휆푖 is generated independently from a distribution 푓푖 with the property that log(푓푖(푥)/푓푖(푦)) ≤ |푥 − 푦|/훼푖 for all 푥, 푦 in the domain of 푓푖.
(4) For every pair of neighboring datasets 퐷, 퐷′ and every 푧, there exists a local alignment 휙퐷,퐷′,푧 that is acyclic with 푐표푠푡(휙퐷,퐷′,푧) ≤ 휖.
(5) For each pair of neighboring datasets 퐷, 퐷′, the number of distinct local alignments is countable. That is, the set {휙퐷,퐷′,푧 | 푧 ∈ Z} is countable.

Following this theorem, we can prove that a mechanism satisfies DP. We refer the reader to [10] for details.

2.2 Counting Query

The counting query is the most basic statistical query. Counting queries appear in fractional form, with weights (linear queries), or in more complex forms, but this paper considers the simplest counting query 푓 : D → N, which counts the number of records satisfying a condition. In this case, the theoretical bound of the Euclidean distance error (i.e., E푧[|푥 − 푧|] where 푧 is the output and 푥 is the true count) is as follows.

THEOREM 3 (LOWER BOUND ON THE COUNTING QUERY UNDER DP [23]). Let 휖 > 0. Every 휖-DP mechanism must have a Euclidean distance error of at least √2/휖.

The Laplace mechanism (Section 2.1.1) has error √2/휖, which means it is optimal for the counting query. When the number of counting queries is small, the Laplace mechanism is enough. However, multiple counting queries require much noise due to the composition theorem (Section 2.1.2). The sparse vector technique and the report noisy arg-max algorithm are useful for answering multiple counting queries.

2.2.1 Sparse Vector Technique. Dwork et al. first proposed the sparse vector technique to handle high-sensitivity counting queries [14], and successors have proposed many applications [25]. Recently, Ding et al. presented an accuracy-improved version of the sparse vector technique [10]. Here, we explain the most standard sparse vector technique [14]. Naively, if we sequentially use a DP mechanism (e.g., the Laplace mechanism) to answer queries (푓0, 푓1, . . . , 푓푡), the privacy guarantee is 푡휖-DP by the composition theorem (see Section 3.3.3 for details). This sequential composition is a problem when the number of queries is large, and the sparse vector technique solves this problem. The sparse vector technique returns a threshold answer (i.e., whether a query answer is above the threshold or not) instead of a noisy count. The algorithm does not consume the privacy budget when outputting a below-the-threshold answer; it adds noise to the threshold and to each query answer using the Laplace mechanism with budgets 휖/2 and 휖/(2푐), respectively, and consumes the privacy budget only when outputting an above-the-threshold answer. In other words, the sparse vector technique can find 푐 queries with above-the-threshold answers. This algorithm satisfies 휖-DP.

2.2.2 Report Noisy Arg-max. Here, we consider a maximum query (i.e., finding the maximum answer in a set of queries). By counting all attributes, we can find the argument with the maximum count, but this manipulation causes high sensitivity. The report noisy arg-max algorithm was proposed to handle this high sensitivity [15]. The algorithm first uses the Laplace mechanism with 휖/2 for all counts. Then, the algorithm outputs the argument of the maximum noisy answer. This algorithm satisfies 휖-DP.

As a running example for the rest of this paper, we use the dataset in Table 2. Each record represents whether the user has visited each location; e.g., the table shows that Bob has visited location 1 and has not visited location 2.

Table 2: Example of a dataset.
|       | location 1 | location 2 |
| Bob   | 1 | 0 |
| Tom   | 0 | 0 |
| Alice | 1 | 1 |
| Ema   | 0 | 1 |

We also introduce a policy function 푃푖 : X푖 → {0, 1}, where X푖 is the universe of the 푖th attribute. This paper assumes that an attribute's universe is binary (i.e., {0, 1}).

DEFINITION 7 (POLICY FUNCTION). A policy function 푃푖 : X푖 → {0, 1}, which takes a value of attribute 푖 as input, returns 1 when the input is sensitive and returns 0 otherwise. 푃 denotes the set of policy functions 푃1, 푃2, . . . , 푃푑 for all attributes, where 푑 is the number of attributes of a dataset. Here, we assume that all individuals in the dataset use the same policy function (i.e., the used policy is public information).

EXAMPLE 1 (POLICY FUNCTION). Suppose that users compromise on publishing that they have not visited an area but want to hide the areas they have visited to protect their privacy. Then, the policy function for Table 2 can be defined as follows:
푃푖(푥) = 1 (푥 = 1); 0 (푥 = 0),
where 푖 = 1, 2.

EXAMPLE 2 (POLICY FUNCTION OF DIFFERENTIAL PRIVACY). DP does not use a policy function, but we can interpret DP as using the following policy function for all attributes:
푃퐷푃(푥) = 1.
In other words, DP considers all data to be sensitive.

2.3 Decision Problem

A decision problem is a query that returns yes or no. In this paper, we handle the query that answers whether a count is below a threshold or not.

One-sided and two-sided error. Here, we introduce one-sided error and two-sided error, which are mainly discussed for Monte Carlo algorithms answering a decision problem. A deterministic algorithm makes no error, but a randomized algorithm such as a Monte Carlo algorithm can err.
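The counting query and the threshold decision problem just described can be written down directly. A minimal sketch using the dataset of Table 2 (the function names are ours, not the paper's):

```python
def count_query(dataset, location):
    """Counting query f: the number of records with value 1 at the given attribute."""
    return sum(record[location] for record in dataset)

def q(dataset, location, threshold):
    """Decision problem q: 1 iff the count is below the threshold (the location is 'safe')."""
    return 1 if count_query(dataset, location) < threshold else 0

# Records of Table 2: Bob, Tom, Alice, Ema.
table2 = [[1, 0], [0, 0], [1, 1], [0, 1]]
# count_query(table2, 0) == 2 (Bob and Alice visited location 1)
# q(table2, 0, 3) == 1, while q(table2, 0, 2) == 0
```

A randomized mechanism answering q instead outputs a perturbed boolean, which is where one-sided and two-sided error come in.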
algorithm causes the error. If a randomized algorithm is always correct when it returns True We note that simply publishing non-sensitive information causes (False), we call it a true-biased (false-biased) algorithm. We say a privacy leak because we assume that a policy function is published. that if an algorithm is a true-biased (false-biased) algorithm, the For example, if Bob only publishes location 2, an adversary knows algorithm is one-sided. If a randomized algorithm has no-biased, we Bob has visited location 1 because Bob does not publish the infor- say that the algorithm is two-sided. mation. Our relaxation is based on this policy function. We relax In Section 4.2, we show that the error of any 휖-DP mechanisms DP by allowing leakage of non-sensitive value. ADP guarantees that must be two-sided, which means that true-biased (false-biased) pub- a mechanism hides a record in 푃-neighboring records defined as lishing is impossible due to the constraint of traditional 휖-DP. follows. 3 NEW PRIVACY DEFINITION DEFINITION 8 (푃-NEIGHBORINGOFARECORD). We say that a record 푟 = (푟 , 푟 , . . . , 푟 ) is 푃-neighboring to a record 푟 ′ = In this section, we propose a new privacy definition, called Asym- 1 2 푑 (푟 ′, 푟 ′, . . . , 푟 ′ ) iff it holds that: metric Differential Privacy (ADP), which relaxes DP. 1 2 푑 ′ ∀푖, 푟푖 = 푟푖 푖푓 푃푖 (푟푖 ) = 0 3.1 Asymmetric Differential Privacy where 푟 and 푟 ′ are 푖 attributes of 푟 and 푟 ′, respectively. Here, we propose the relaxation of DP, called Asymmetric Differ- 푖 푖 th ential Privacy (ADP). To explain ADP, we use Table 2 as a dataset 2Our policy is different from OSDP’s policy [11] in that our policy discriminates values for an example. Each data represents whether the user has visited but OSDP’s policy discriminates records. 4 EXAMPLE 3 (푃-NEIGHBORINGOFARECORD). Bob’s record 3.2.2 Monotonicity. We define monotonicity in sensitivity as fol- in Table 2 is 푃-neighboring to [0, 0], [1, 0]. 
Alice’s record is 푃- lows: neighboring to [1, 0], [0, 1], [0, 0]. Tom’s record is not 푃-neighboring DEFINITION 12 (MONOTONICITYIN 푃-SENSITIVITY). 푃-sensitivity to any record. of 푓 is called monotonically increasing (decreasing) iff ∀퐷 ∈ ′ In DP, there is no notion corresponding to neighboring record. D, 퐷 ∈ 푁푃 (퐷) Still, we can consider that DP implicitly uses a neighboring record 푓 (퐷) ≤ 푓 (퐷 ′)(푓 (퐷) ≥ 푓 (퐷 ′)). as all records. Then, we can define the notion of 푃-neighboring dataset. As described above, DP requires that sensitivity is not monotonic due to the symmetric neighboring relationship. DEFINITION 9 (푃-NEIGHBORING OF A DATASET). We say that ′ a dataset 퐷 is 푃-neighboring to a dataset 퐷 iff only one record of 퐷 THEOREM 4. Sensitivity in DP (i.e., 푃퐷푃 -sensitivity) is not mono- ′ ′ ′ 3 differs from 퐷 and the differing record is 푃-neighboring. Let 푁푃 (퐷) tonic except the query where ∀퐷, 퐷 , 푓 (퐷) = 푓 (퐷 ) . denote the set of datasets, each of which is 푃-neighboring dataset of 퐷. PROOF. The neighboring relationship of DP is symmetric, so if 푓 (퐷) − 푓 (퐷 ′) > 0, it holds that 푓 (퐷 ′) − 푓 (퐷) < 0. Therefore, Note that the neighboring relationship in DP is symmetric, but sensitivity is not monotonic except the case where ∀퐷, 퐷 ′, 푓 (퐷) = the 푃-neighboring relationship can be asymmetric. In other words, 푓 (퐷 ′). ′ ′ if dataset 퐷 is neighboring to 퐷 , 퐷 is neighboring to 퐷. However, Q.E.D. □ even if 퐷 is 푃-neighboring to 퐷 ′, 퐷 ′ is not always 푃-neighboring to 퐷. The name asymmetric is given by this fact. EXAMPLE 5 (MONOTONICALLYDECREASING). Assume a count- Using the 푃-neighboring relationship, we can define our new ing query 푓 that counts the number of people who have visited a privacy notion, called Asymmetric Differential Privacy (ADP). location. Then, 푃-sensitivity of 푓 where 푃 is the policy in Example 1 is monotonically decreasing since a change of a sensitive value only DEFINITION 10 ((푃, 휖)-ASYMMETRIC DIFFERENTIAL PRIVACY). decreases the count. 
A randomized mechanism 푚 satisfies (푃, 휖)-Asymmetric Differential ′ Privacy (ADP) iff ∀푆 ⊆ Z and ∀퐷, 퐷 ∈ 푁푃 (퐷) 3.3 Characteristics of ADP 휖 ′ Pr[푚(퐷) ∈ 푆] ≤ e Pr[푚(퐷 ) ∈ 푆] (2) First, we describe the privacy guarantees of ADP against a strong The only difference from DP is in the neighboring relationship. adversary. Second, we analyze a relationship between ADP and DP. Our privacy definition guarantees that it is difficult for an adversary Third, we derive the composition theorem for ADP. Here, we con- ′ sider that a dataset 퐷 consists of only one record for simplicity (i.e., to distinguish 퐷 from 퐷 ∈ 푁푃 (퐷) to the degree of 휖. From an individual’s point of view, the own record has indistinguishability the setting of local differential privacy) without losing generality. from neighboring records. In other words, an ADP algorithm protects 3.3.1 Privacy Guarantee Against a Strong Adversary. A strong a sensitive value (i.e., 푃 (푟푖 ) = 1) by hiding in other values. However, adversary is knowledgeable about a target individual so that the ad- a non-sensitive value defined by a policy function (i.e., 푃 (푟푖 ) = 0) versary knows non-sensitive values but does not know sensitive can leak due to the policy. values. In other words, the adversary knows that a true record is EXAMPLE 4 (A PRIVACY GUARANTEE). Bob in Table 2 has vis- in 푁푃 (퐷) (note that here 푁푃 (퐷) is the set of records due to the assumption of |퐷| = 1). Then, from Definition 10, ited location 1 and the information is protected because 푃푙표푐푎푡푖표푛 1 (1) = 1. However, the information that Bob has not visited location 2 can Pr[푟 |푧] 휖 Pr[푟] ′ ≤ e ′ leak because 푃푙표푐푎푡푖표푛 2 (0) = 0. Tom’s record has no 푃-neighboring Pr[푟 |푧] Pr[푟 ] record, so Tom’s record can leak. ′ where 푟 is a true record and {푟 } ∈ 푁푃 ({푟 }). Here, the strong ad- versary knows that the true record is in 푁푃 ({푟 }), so the privacy 3.2 P-sensitivity guarantee against the strong adversary is the same as DP. 
In other The asymmetric characteristic on the 푃-neighboring relationship words, the strong adversary cannot improve her/his knowledge up to (see Section 3.1 for detail) can induce monotonicity of sensitivity. 푒휖 even if s/he sees 푧 output by a mechanism that satisfies푃 ( ,휖)-ADP. Monotonicity enables us to construct a more accurate and one-sided If an adversary is not knowledgeable about a target individual, s/he error algorithm. We note that the sensitivity of DP has a character- can know that a true record is in 푁푃 ({푟 }). istic of not monotonicity since DP uses the symmetric neighboring relationship, which is the main motivation to introduce ADP instead 3.3.2 Relationship to Differential Privacy. As a relationship of DP. between ADP and DP, we can prove that ADP is a relaxation of DP. Informally, the following theorems hold. 3.2.1 Definition. We define 푃-sensitivity instead of sensitivity of 퐷푃 DP by changing the neighboring relationship. THEOREM 5. (1) If a mechanism 푚 satisfies푃 ( , 휖)-ADP, 푚 satisfies 휖-DP. DEFINITION 11 (푃-SENSITIVITY). (2) If a mechanism 푚 satisfies 휖-DP, 푚 satisfies푃, ( 휖)-ADP for ′ Δ푓 ,푃 = sup 푓 (퐷 ) − 푓 (퐷) any policy function 푃. ′ 퐷 ∈D,퐷 ∈푁푃 (퐷) 3This query is not sensitive because this always returns the same value regardless of an where 푓 and 푃 are a query and a policy function, respectively. input database. 5 PROOF. (1) 푃퐷푃 -neighboring records are all records. This Q.E.D. □ means that 푃퐷푃 -neighboring datasets are the same as the ones of the neighboring datasets in DP. Therefore, the theorem 4 THEORETICAL BOUND OF ONE-SIDED holds. ERROR (2) Neighboring datasets of 퐷 in DP (i.e. 푃퐷푃 -neighboring datasets) Here, we first analyze a true-biased (false-biased) algorithm inDP include 푃-neighboring datasets of 퐷 for any 푃. Therefore, (i.e., an algorithm with one-sided error). Then, we prove that there if 푚 satisfies 휖-DP, Inequality (4) holds for any neighbor- is no true-biased (false-biased) algorithm that satisfies 휖-DP. 
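The policy function of Example 1 and the 푃-neighboring relation of Definition 8 can be checked mechanically. A minimal sketch (the function names are ours):

```python
def policy(value: int) -> int:
    """Policy function of Example 1: a visited location (value 1) is sensitive."""
    return 1 if value == 1 else 0

def is_p_neighboring(r, r_prime):
    """Definition 8: r' is P-neighboring to r iff every attribute of r that the
    policy marks non-sensitive is unchanged in r'; sensitive attributes of r
    may take any value in r'."""
    return all(v == w for v, w in zip(r, r_prime) if policy(v) == 0)

# Records from Table 2.
bob, tom, alice = [1, 0], [0, 0], [1, 1]
```

The asymmetry is visible directly: Alice's record (both values sensitive) is 푃-neighboring to Tom's record, but Tom's record (both values non-sensitive) is 푃-neighboring to no other record, as in Example 3.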
Finally, ing datasets, so Inequality (2) also holds for 푃-neighboring we show that ADP allows a true-biased (false-biased) algorithm and datasets. the theoretical bound of one-sided error. Q.E.D. □ 4.1 One-sided True Positive 3.3.3 Composition Theorem. ADP satisfies the composition We call the output of True by a true-biased algorithm (i.e., true theorem like DP. positive) one-sided true positive (OTP). From here, the boolean is THEOREM 6. We consider the sequential mechanism 푀, which se- denoted by {0, 1}. We consider a decision problem 푞 : D → {0, 1} quentially applies푚1,푚2, . . . ,푚푗 to dataset 퐷 to output 푧1, 푧2, . . . , 푧푗 . that takes a dataset as input and returns a boolean. To answer a Assume that 푚1,푚2, . . . ,푚푗 satisfy (푃, 휖1), (푃, 휖2),..., (푃, 휖푗 )-ADP, query with privacy protection, we use a randomized mechanism 푚 : 푗 D → { } ( ) respectively. Then, the sequential mechanism 푀 satisfies (푃, Í 휖 )- 0, 1 instead of 푞 퐷 , which causes an error. We say that the 푖=1 푖 ( ) ( ) ( ) ( ) ADP. output is true positive (negative) iff 푚 퐷 = 1 0 and 푞 퐷 = 1 0 , respectively. Then, the definition of OTP is as follows. ′ PROOF. From the definition of ADP, it holds that ∀푆 ⊆ Z, 퐷, 퐷 ∈ DEFINITION 13 (ONE-SIDEDTRUEPOSITIVE (OTP)). OTP is 푁푃 (퐷), 푖 ∈ [푗] output 1 of a true-biased algorithm. That is, OTP is output 1 of 푚 휖푖 ′ Pr[푚푖 (퐷) ∈ 푆] ≤ e Pr[푚푖 (퐷 ) ∈ 푆] where: Therefore, the following inequality holds. Pr[푚(퐷) = 1;푞(퐷) = 1] > 0 ∧ Pr[푚(퐷) = 1;푞(퐷) = 0] = 0 푗 [ ( ) ∈ ] 푗 [ ( ) ∈ ] Π푖=1 Pr 푚푖 퐷 푆 Π푖=1 Pr 푀 퐷 푆 = If we see output 1 of 푚(퐷), we can validate 푞(퐷) = 1. 푗 [ ( ′) ∈ ] 푗 [ ( ′) ∈ ] Π푖=1 Pr 푚푖 퐷 푆 Π푖=1 Pr 푀 퐷 푆 Í푗 휖 We omit the notation of one-sided true negative to make it easy ≤ e 푖=1 푖 to read, but it will be identical to OTP by replacing 1 with 0. Q.E.D. □ 3.3.4 Randomness Alignment in Asymmetric Differential Pri- 4.2 OTP in DP vacy. As described above, the difference between ADP and DP is From here, we consider a mechanism 푚, which satisfies (휖, 훿)-DP. 
in neighboring relationships. Therefore, we can apply a randomness From the conclusion as the following theorem, the mechanism alignment technique to prove that a mechanism satisfies (푃, 휖)-ADP achieves OTP with the probability of the proportional to 훿, which as follows. means pure DP (i.e., 휖 = 0) does not achieve OTP.
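The OTP condition of Definition 13 can be illustrated numerically. The toy mechanisms below answer the decision problem "is the count below the threshold (safe)?"; they are our own illustrative sketch, not the paper's mechanisms (those appear in Section 5). Two-sided Laplace noise occasionally claims "safe" for an unsafe count, while strictly non-negative noise can never do so, so its "safe" answer satisfies Pr[푚(퐷) = 1; 푞(퐷) = 0] = 0:

```python
import math
import random

rng = random.Random(1)

def laplace(scale):
    # Inverse-CDF sampling of a zero-mean Laplace random variable.
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def says_safe_two_sided(count, threshold, eps):
    # Laplace noise: the "safe" answer can be wrong in both directions.
    return count + laplace(1.0 / eps) < threshold

def says_safe_one_sided(count, threshold, eps):
    # Non-negative exponential noise only inflates the count, so a "safe"
    # answer (noisy count below the threshold) is always correct.
    return count + rng.expovariate(eps) < threshold

count, threshold, eps = 60, 50, 0.5   # q(D) = 0: the location is NOT safe
trials = 10000
false_safe_lap = sum(says_safe_two_sided(count, threshold, eps) for _ in range(trials))
false_safe_exp = sum(says_safe_one_sided(count, threshold, eps) for _ in range(trials))
# false_safe_exp is exactly 0; false_safe_lap is positive with overwhelming probability.
```

This is exactly the asymmetry that DP forbids and ADP permits, as formalized next.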

THEOREM 7. If the following conditions are satisfied, then 푚 satisfies (푃, 휖)-ADP.
(1) 푚 terminates with probability 1.
(2) The number of random variables used by 푚 can be determined from its output.
(3) Each 휆푖 is generated independently from a distribution 푓푖 with the property that log(푓푖 (푥)/푓푖 (푦)) ≤ |푥 − 푦|/훼푖 for all 푥, 푦 in the domain of 푓푖.
(4) For every 퐷 ∈ D, 퐷′ ∈ 푁푃 (퐷), and 푧, there exists a local alignment 휙퐷,퐷′,푧 that is acyclic with cost(휙퐷,퐷′,푧) ≤ 휖.
(5) For each 퐷 ∈ D and 퐷′ ∈ 푁푃 (퐷), the number of distinct local alignments is countable; that is, the set {휙퐷,퐷′,푧 | 푧 ∈ Z} is countable.

PROOF. Theorem 2 states that if a mechanism satisfies the conditions of Theorem 2, then

    Pr[푚(퐷) ∈ 푆] ≤ e^휖 Pr[푚(퐷′) ∈ 푆]    (3)

for any 퐷, 퐷′, and 푆 ⊆ Z. Here, we instead use 푃-neighboring datasets 퐷 ∈ D, 퐷′ ∈ 푁푃 (퐷). Therefore, Inequality (3) holds for any 퐷 and any 퐷′ ∈ 푁푃 (퐷), which is the definition of ADP.
Q.E.D. □

THEOREM 8 (OTP IN (휖, 훿)-DIFFERENTIAL PRIVACY). Let us suppose that 푚 satisfies (휖, 훿)-DP and that the minimum hamming distance needed to change the query answer of 퐷 is 푘 (i.e., ∀퐷′, 푑ℎ (퐷, 퐷′) ≥ 푘 where 푞(퐷) = 1 and 푞(퐷′) = 0)⁴. Then, it holds that:

    Pr[푂푇푃; 푞(퐷) = 1] ≤ 훿 Σ_{푖=1}^{푘} e^{(푖−1)휖}

where Pr[푂푇푃; 푞(퐷) = 1] is the probability that 푧 is OTP when 푞(퐷) = 1.

⁴ Here, 퐷 and 퐷′ are not limited to neighboring datasets.

PROOF. Since 푚 satisfies (휖, 훿)-DP,

    Pr[푚(퐷) ∈ 푆] ≤ e^휖 Pr[푚(퐷′) ∈ 푆] + 훿    (4)

where 퐷 and 퐷′ are neighboring datasets and 푆 is any subset of the output domain. Here, we assume 푞(퐷) = 1 and take 푆 to consist of all outputs that are OTP. Then, Pr[푚(퐷) ∈ 푆] is the same as Pr[푂푇푃; 푞(퐷) = 1]. Since the minimum hamming distance needed to change the query answer is 푘, we can derive the following inequality by iteratively applying Inequality (4):

    Pr[푚(퐷) ∈ 푆] ≤ e^{푘휖} Pr[푚(퐷푘) ∈ 푆] + 훿 Σ_{푖=1}^{푘} e^{(푖−1)휖}

where 퐷푘 is a dataset such that 푑ℎ (퐷, 퐷푘) = 푘 and 푞(퐷푘) = 0. From the definition of OTP (Definition 13), we have Pr[푚(퐷푘) ∈ 푆] = 0. Therefore:

    Pr[푚(퐷) ∈ 푆] = Pr[푂푇푃; 푞(퐷) = 1] ≤ 훿 Σ_{푖=1}^{푘} e^{(푖−1)휖}

Q.E.D. □

In general, 훿 should be small (conventionally, 훿 is set to 1/푛, where 푛 is the number of records) because 훿 is the probability that the DP guarantee breaks. We note that 훿 of 휖-DP is 0, which means that 휖-DP cannot achieve OTP. If 푘 = 1 and 훿 = 1/푛, we achieve OTP with probability only 1/푛. Therefore, DP is not suited to our setting, where OTP is required.

4.3 OTP in ADP
Here, we derive the upper bound on the probability that ADP achieves OTP, which shows that one-sided error mechanisms exist. Our neighboring notion is asymmetric as described above, so the hamming distance of DP is not defined. Instead of the hamming distance, we therefore use the minimum step.

DEFINITION 14 (THE MINIMUM STEP). The minimum step from 퐷 to 퐷′ is the minimum number of steps needed to change 퐷 to 퐷′ via 푃-neighboring datasets.

5 MECHANISMS
Here, we propose mechanisms that satisfy ADP. First, we introduce a basic mechanism that can be a component of ADP mechanisms, called the asymmetric Laplace mechanism. Then, we propose mechanisms for two use-cases: the top-푘 query and location monitoring.

5.1 Asymmetric Laplace Mechanism
First, we introduce the asymmetric Laplace mechanism (aLap), which is an ADP version of the Laplace mechanism⁵.

DEFINITION 15 (ASYMMETRIC LAPLACE MECHANISM). Let us assume a counting query 푓 : D → N. Then, the asymmetric Laplace mechanism 푎퐿푎푝 is

    푎퐿푎푝휖 (푓, 퐷) = 푓 (퐷) + 휆

where 휆 follows the asymmetric Laplace distribution with 휖. If the 푃-sensitivity of 푓 is monotonically decreasing, its density is

    (휖/Δ푓,푃) exp(−휆휖/Δ푓,푃)  if 휆 ≥ 0,  and 0 otherwise;

else, if the 푃-sensitivity of 푓 is monotonically increasing, its density is

    (휖/Δ푓,푃) exp(휆휖/Δ푓,푃)  if 휆 ≤ 0,  and 0 otherwise;

else, its density is

    (휖/(2Δ푓,푃)) exp(−|휆|휖/Δ푓,푃).
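As a concrete illustration of Definition 15, the asymmetric Laplace mechanism can be sampled as follows. This is a minimal Python sketch under our reading of the definition (function and parameter names are ours, not the paper's): the one-sided cases are exponential densities with scale Δ푓,푃/휖, and the non-monotonic case falls back to ordinary two-sided Laplace noise.

```python
import random

def alap(true_count, epsilon, sensitivity=1.0, monotonicity="decreasing"):
    """Sketch of the asymmetric Laplace mechanism aLap (Definition 15).

    monotonicity is the direction of the P-sensitivity of the counting
    query: "decreasing" yields non-negative exponential noise,
    "increasing" yields non-positive exponential noise, and anything
    else falls back to ordinary two-sided Laplace noise.
    """
    scale = sensitivity / epsilon
    if monotonicity == "decreasing":
        noise = random.expovariate(1.0 / scale)       # lambda >= 0
    elif monotonicity == "increasing":
        noise = -random.expovariate(1.0 / scale)      # lambda <= 0
    else:
        # A two-sided Laplace sample is the difference of two exponentials.
        noise = random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)
    return true_count + noise
```

Because the one-sided noise has mean Δ푓,푃/휖, an unbiased estimate of a counting-query answer subtracts 1/휖 from the output, matching the estimator 푎퐿푎푝휖 (푓, 퐷) − 1/휖 mentioned in the text.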
The distribution in the case where 푓 is not monotonic is the same as the Laplace distribution. When the 푃-sensitivity is monotonic, the distribution is the exponential distribution, which has a smaller variance of 1/휖². Therefore, when the query is monotonic, we can improve accuracy. We note that the estimator of the answer under Laplace noise is the noisy answer itself, but the estimator for the asymmetric Laplace mechanism is 푎퐿푎푝휖 (푓, 퐷) − 1/휖.

THEOREM 10. aLap satisfies (푃, 휖)-ADP.

PROOF. When the 푃-sensitivity of 푓 is not monotonic, the asymmetric Laplace mechanism satisfies (푃, 휖)-ADP from Theorem 5 because the Laplace mechanism satisfies 휖-DP. When the 푃-sensitivity of 푓 is monotonically decreasing, we need to show that for all 푧 ∈ Z,

    Pr[푎퐿푎푝휖 (푓, 퐷) = 푧] ≤ e^휖 Pr[푎퐿푎푝휖 (푓, 퐷′) = 푧]

where 퐷′ is a 푃-neighboring dataset of 퐷. We consider the case where 푎퐿푎푝휖 (푓, 퐷) = 푎퐿푎푝휖 (푓, 퐷′) (i.e., 푓 (퐷) + 휆 = 푓 (퐷′) + 휆′). The difference is the largest when 휆′ = 휆 + Δ푓,푃 since the 푃-sensitivity is monotonically decreasing. In other words, Pr[휆′] is the smallest when the difference is the 푃-sensitivity (i.e., Δ푓,푃). Therefore, Pr[휆]/Pr[휆′] ≤ e^휖 for all 휆, which proves the theorem. When the sensitivity of 푓 is monotonically increasing, we consider 휆′ = 휆 − Δ푓,푃, and the rest of the proof is the same as in the monotonically decreasing case.
Q.E.D. □

⁵ aLap is similar to OsdpLaplace [11], but OsdpLaplace can be used only for the count of non-sensitive records. In contrast, aLap can be used for any records since aLap utilizes the asymmetricity.

If the minimum step from 퐷0 to 퐷푘 is 푘, there are datasets 퐷0, 퐷1, . . . , 퐷푘 such that 퐷푖 ∈ 푁푃 (퐷푖−1) for all 1 ≤ 푖 ≤ 푘.

THEOREM 9 (OTP IN (푃, 휖)-ADP). Let us assume that 푚 satisfies (푃, 휖)-ADP and that the minimum step to change the query answer from 퐷 is 푘. Then, the following inequality holds:

    Pr[푂푇푃; 푞(퐷) = 1] ≤ 1 − 1/e^{푘휖}

PROOF. Let 푆 denote the set consisting of all outputs that are OTP, and let 푆′ denote the complement of 푆. Since 푚 satisfies (푃, 휖)-ADP, iteratively using Inequality (2),

    Pr[푚(퐷) ∈ 푆′] ≤ e^{푘휖} Pr[푚(퐷푘) ∈ 푆′]

where 퐷푘 is a dataset such that the minimum step from 퐷 is 푘. Here, assuming 푞(퐷) = 0 and 푞(퐷푘) = 1, we can reformulate the above inequality as

    1 ≤ e^{푘휖} (1 − Pr[푚(퐷푘) ∈ 푆])    (5)

because Pr[푚(퐷) ∈ 푆′] = 1 (by the definition of OTP, 푚 never produces an output in 푆 when 푞(퐷) = 0) and Pr[푚(퐷푘) ∈ 푆] + Pr[푚(퐷푘) ∈ 푆′] = 1. Since 푞(퐷푘) = 1, Pr[푚(퐷푘) ∈ 푆] is the probability that we achieve OTP. Therefore, by reformulating (5),

    Pr[푂푇푃; 푞(퐷) = 1] ≤ 1 − 1/e^{푘휖}

Q.E.D. □

When the 푃-sensitivity is monotonic, the variance is 1/휖², which is smaller than that of the Laplace mechanism, but when the 푃-sensitivity is not monotonic, the error is the same as that of the Laplace mechanism. Therefore, if we can construct a policy function 푃 such that the 푃-sensitivity is monotonic, we should use ADP to make the output more accurate. Otherwise, we need not use ADP and should use DP, since ADP's privacy protection is weaker than that of DP.

Remark. From the theorem of Hardt and Talwar [19], the lower bound of the Euclidean distance error is proportional to the volume of the sensitivity hull (see [19] for detail). The asymmetric neighboring relationship halves the volume, so the lower bound is also halved, which means √2/(2휖). Therefore, the asymmetric Laplace mechanism is not optimal because the sensitivity hull is not isotropic. Hardt and Talwar's isotropic transformation method may be useful to achieve optimality, but this is beyond this paper's scope.

5.1.1 Decision Problem with the Asymmetric Laplace Mechanism. We can construct a mechanism for a decision problem with 푎퐿푎푝. Here, we consider as a decision problem the query 푞 : D → {0, 1}, which asks whether a counting query 푓 (퐷) is under the threshold or not. By using 푎퐿푎푝휖 (푓, 퐷) for 푓 (퐷), 푞 satisfies (푃, 휖)-ADP.

THEOREM 11. When the 푃-sensitivity of 푓 is monotonic, the above mechanism 푞 achieves OTP with probability 1 − 1/e^{푘휖}, where 푘 is the minimum step.

PROOF. Here, we consider the case where the 푃-sensitivity of 푓 is monotonically decreasing. In this case, OTP is the output 1 (i.e., below-the-threshold) because the noise of 푎퐿푎푝 is always positive, so the noise never changes an above-the-threshold answer, which means Pr[푚(퐷) = 1; 푞(퐷) = 0] = 0 (i.e., the definition of OTP). The probability of OTP is the probability that the variable of the exponential distribution is under 푘. Therefore, the probability is

    1 − ∫_푘^∞ (휖/Δ푓,푃) exp(−휆휖/Δ푓,푃) 푑휆

Since the query is a counting query (i.e., Δ푓,푃 = 1), the probability is 1 − 1/e^{푘휖}.
Q.E.D. □

Therefore, we can output OTP with the highest probability by using 푎퐿푎푝 for this decision problem.

5.2 Top-k Query
Here, we propose two ADP mechanisms for a top-푘 query: the asymmetric report noisy max algorithm and the asymmetric sparse vector technique, which are based on the popular algorithms for DP [15]. Due to the asymmetricity of ADP, our algorithms can output a more accurate top-푘 histogram than DP.

5.2.1 Asymmetric Report Noisy Max Algorithm. The report noisy arg-max is an algorithm satisfying DP that answers the argument of a noisy max value (see Section 2.2.2 for detail). Dwork et al. first proposed the algorithm [15], outputting only the argument with a maximum value. Subsequently, Ding et al. proposed an algorithm that can output more information than Dwork's one [10]. Ding's algorithm can output top-푘 arguments with gap-information without additional privacy leakage. Here, we propose a new algorithm that satisfies ADP instead of DP. By changing DP to ADP, the algorithm can output more information than Ding's algorithm. Precisely, we can output not the arguments but the top-푘 noisy values. Algorithm 1 is the pseudocode.

Algorithm 1 Asymmetric Report Noisy Max Algorithm
Input: 퐷: dataset; 휖: privacy budget; 퐹: a set of queries whose 푃-sensitivities are monotonically decreasing and equal to 1
Output: top-푘 noisy values
1: for each query 푓푖 ∈ 퐹 do
2:   휆푖 ∼ the asymmetric Laplace distribution with 휖/푘
3:   푓̃푖 (퐷) ⇐ 푓푖 (퐷) + 휆푖
4: end for
5: (푗1, 푗2, . . . , 푗푘) ⇐ argmax푘 {푓̃0, 푓̃1, . . . , 푓̃푡}
6: return ((푗1, 푓̃푗1 (퐷)), (푗2, 푓̃푗2 (퐷)), . . . , (푗푘, 푓̃푗푘 (퐷)))

There is a difference in the assumption from Ding's algorithm. We assume that the 푃-sensitivity of the counting queries is monotonically decreasing due to the policy function. This assumption enables the mechanism to improve the output information in two aspects. First, the proposed algorithm can use the asymmetric Laplace distribution with 푘/휖 instead of the Laplace mechanism with 2푘/휖, which means the Euclidean distance error is 1/(2√2) times smaller than that of the Laplace distribution. Second, the proposed algorithm can output the top-푘 noisy values instead of arguments with gap-information. Gap information is the gap between two neighboring answers (e.g., 푓푗1 (퐷) − 푓푗2 (퐷)), so Ding's method needs to spend privacy budget to measure the top-푘 values. Our method does not need to measure them because it outputs noisy values.

THEOREM 12. Algorithm 1 satisfies (푃, 휖)-ADP.

PROOF. Let 퐻 = (휆1, 휆2, . . . , 휆푡) denote the noise vector whose element 휆푖 is the noise for query 푓푖 used in Algorithm 1. Then, we consider the following noise vector 퐻′ = (휆′1, 휆′2, . . . , 휆′푡):

    휆′푖 = 휆푖  (푖 ∈ I푧^푐);    휆′푖 = 휆푖 + 푓푖 (퐷) − 푓푖 (퐷′)  (푖 ∈ I푧)

where I푧 represents the top-푘 arguments and I푧^푐 represents the other arguments. Since the sensitivity of the queries is monotonically decreasing, the answer of 푚퐻 (퐷) (i.e., the top-푘 arguments and the values corresponding to those arguments) is the same as that of 푚퐻′ (퐷′). It holds that 푚퐻 (퐷) = 푚퐻′ (퐷′), so we can define 휙퐷,퐷′,푧 (퐻) = 퐻′ as an acyclic local alignment. Additionally, due to the monotonicity, each variable follows the exponential distribution, and the cost is 휖. Therefore, the conditions of Theorem 7 hold.
Q.E.D. □

5.2.2 Asymmetric Sparse Vector Technique. Next, we propose the asymmetric sparse vector technique, which is the ADP version of the sparse vector technique [14]. We consider a set of counting queries 퐹 = {푓1, 푓2, . . . , 푓푡} and assume that the 푃-sensitivity of the queries satisfies decreasing monotonicity and Δ푓푖,푃 = 1. Naively answering each query with the asymmetric Laplace mechanism using 휖 finally consumes privacy budget 푡휖 due to the sequential composition (Theorem 6). The traditional sparse vector technique needs to measure the top-푘 values in addition to running the sparse vector technique to make a top-푘 histogram. Therefore, Ding et al. split the privacy budget in half between the sparse vector technique and the measurement. The asymmetric sparse vector technique uses
a decision problem that asks whether 푓푖 (퐷) is under the threshold 푇푖 or not instead of the counting query (see Section 5.1.1 for detail) to save the privacy budget. If the query answer is OTP (i.e., the noisy answer is below-the-threshold), the algorithm answers the decision problem without consuming the privacy budget. If the query answer is above-the-threshold, the algorithm consumes the privacy budget, so the algorithm answers the counting query with 푎퐿푎푝휖/푐. However, our asymmetric sparse vector technique does not require the measurement because it can output noisy values. Algorithm 2 is the pseudocode.

Algorithm 2 The asymmetric sparse vector technique
Input: 휖; 푐: the number of above-the-threshold answers; 퐹 = {푓0, 푓1, . . . , 푓푡}: a set of queries whose 푃-sensitivities are monotonically decreasing and equal to 1; 푇 = {푇0, 푇1, . . . , 푇푡}: a set of thresholds
Output: out
1: count = 0
2: for each query 푓푖 ∈ 퐹 do
3:   푧 ⇐ 푎퐿푎푝휖/푐 (푓푖, 퐷)
4:   if 푧 ≥ 푇푖 then
5:     Output 푎푖 = 푧
6:     count = count + 1; abort if count ≥ 푐
7:   else
8:     Output 푎푖 = 0
9:   end if
10: end for

THEOREM 13. Algorithm 2 satisfies (푃, 휖)-ADP.

PROOF. We use the technique of randomness alignment to prove this theorem. We assume that the algorithm outputs 퐴 = (푎0, 푎1, . . . , 푎푡) using 퐻 = (휆0, 휆1, . . . , 휆푡) as the random variables for 푎퐿푎푝 (i.e., 푚퐻 (퐷) = 퐴). Then, we consider the following randomness alignment 퐻′ = (휆′0, 휆′1, . . . , 휆′푡):

    휆′푖 = 휆푖  (푎푖 = 0);    휆′푖 = 휆푖 + 푓푖 (퐷) − 푓푖 (퐷′)  (푎푖 = 푧)

It holds that 푚퐻′ (퐷′) = 퐴, which means that 휙퐷,퐷′,푧 (퐻) = 퐻′ is an acyclic local alignment. This is because 푓푖 is monotonically decreasing, so a below-the-threshold answer does not change, and 휆′푖 + 푓푖 (퐷′) = 휆푖 + 푓푖 (퐷). Therefore, the cost is 푐 · (휖/푐) = 휖, so the conditions of Theorem 7 hold.
Q.E.D. □

We note that when the query is monotonically increasing, we can get below-the-threshold answers in the same way.

The asymmetricity improves three aspects over the sparse vector technique (Section 2.2.1). First, the asymmetric sparse vector technique does not require noise for the threshold. Second, we can use the asymmetric Laplace mechanism with 휖/푐, which has a 1/(2√2) times smaller error than the Laplace mechanism. Third, we can answer a noisy counting query when the algorithm consumes the privacy budget. These three differences improve accuracy, as shown in Section 6.

5.3 Location Monitoring
Since trajectory information is highly sensitive [6], DP requires much noise to publish a trajectory itself, and the same applies to ADP. Therefore, instead of publishing a trajectory itself, we propose mechanisms that publish information about a location's safety using a counting query, as a use-case of a decision problem. We define safe to mean that the number of target people (e.g., infected people) who have visited the location is under a threshold. We propose two types of queries to monitor safety:
(1) query whether a location is safe in a range of time or not
(2) query whether locations were safe at a certain time or not
In the first query, we designate a location we want to monitor. In the second query, we designate a time at which we want to monitor locations. By defining a policy function as in Example 1 (i.e., not visiting is non-sensitive and visiting is sensitive), we can construct ADP mechanisms with one-sided error. Here, we assume that the dataset is frequently updated as data arrive or expire because the risk information changes according to the updated data.

5.3.1 Monitoring a Location. Here, we consider the query 푞 : D → {0, 1}, which asks whether the target location is safe or not. In other words, the query answers whether the number of target people who have recently visited the location is under the threshold or not. Algorithm 3 is the pseudocode of the mechanism for this query.

Algorithm 3 Monitoring a location
Input: 휖; 푓: counting query; 푇: threshold
Output: out
1: initialize Queue
2: while batch at input do
3:   initialize data in batch to be False
4:   enqueue all data in batch to Queue
5:   while Queue.tail is expired do
6:     Queue.dequeue()
7:   end while
8:   if any data.mark in Queue is True then
9:     back to Line 2
10:  end if
11:  if 푧 = 푎퐿푎푝휖 (푓, 푄푢푒푢푒) ≥ 푇 then
12:    Output 푎푖 = 푧
13:    for data in Queue do
14:      data.mark = True
15:    end for
16:  else
17:    Output 푎푖 = 0
18:  end if
19: end while

The algorithm outputs whether the location is safe or not every time a new batch of data comes (Line 2). The dataset is then updated by adding the new batch and removing expired data (Lines 3-7). We note that we predefine the expiry period of data; e.g., if we predefine the period as two weeks, a record expires when two weeks have passed since the corresponding user visited the location. Using aLap, we perturb the count and compare the perturbed count with the given threshold. If the perturbed count is above the threshold, we output the noisy value and mark all data in Queue (Lines 11-15). This marking represents the consumption of 휖, so we cannot continue to output while any data is marked (Lines 8-9). If the perturbed count is below the threshold, we output 0, which does not consume 휖 due to the asymmetric sparse vector technique (Line 17).

This algorithm satisfies (푃, 휖)-ADP, and output 0 is OTP (i.e., the safe information is accurate). We can continue to monitor the safety of the location with the highest probability of OTP due to the property of aLap described in Section 5.1.1.

We note that to monitor multiple locations, we need to separate the privacy budget (e.g., if we want to monitor three locations, we need to use 휖/3 for each query).

Algorithm 4 Monitoring locations at a certain time
Input: 휖, 퐿, 푇, {푓푙}푙∈퐿
Output: out
1: initialize Array
2: initialize marks of all locations to be False
3: while batch at input do
4:   append all data in batch to Array
5:   for each location 푙 ∈ 퐿 do
6:     if 푙.mark = True then
7:       back to Line 4
8:     end if
9:     if 푧푙푖 = 푎퐿푎푝휖 (푓푙, 퐴푟푟푎푦) ≥ 푇 then
10:      푙.mark = True
11:      Output 푎푙푖 = 푧푙푖
12:    else
13:      Output 푎푙푖 = 0
14:    end if
15:  end for
16: end while

THEOREM 14. Algorithm 3 satisfies (푃, 휖)-ADP.

PROOF. We consider all batches used in the algorithm as the dataset 퐷. We assume that the output is 푚퐻 (퐷) = 퐴 = (푎0, 푎1, . . . , 푎푡), where the used noise vector is 퐻 = (휆0, 휆1, . . . , 휆푡). Consider the following randomness alignment:

    휆′푖 = 휆푖  (푎푖 = 0);    휆′푖 = 휆푖 + 푓푖 (퐷) − 푓푖 (퐷′)  (푎푖 = 푧)

where 푓푖 is the query used at the 푖th update (i.e., 푓푖 (퐷) = 푓 (퐷input푖), where 퐷input푖 is the input at the 푖th update). Then, it holds that 푚퐻′ (퐷′) = 퐴′ = 퐴, where 퐻′ = (휆′0, 휆′1, . . . , 휆′푡), due to the monotonicity. Since we stop the output when an above-the-threshold answer appears and until the corresponding record disappears, a change of one record only affects one query answer. Therefore, it holds that 휆′푖 = 휆푖 except for one of the indices. Therefore, the alignment cost is 휖, so the conditions of Theorem 7 hold, which means that Algorithm 3 satisfies (푃, 휖)-ADP.
Q.E.D. □

5.3.2 Monitoring Locations at a Designated Time. The first mechanism adds up the needed privacy budget as the locations to monitor increase. By designating a time, we can monitor an unlimited number of locations with a fixed privacy budget since each individual is at only one location at a certain time. In other words, the query answers whether the number of target people who were at the location at the time is below the threshold. Algorithm 4 is the pseudocode of the mechanism for this query.

Like Algorithm 3, the mechanism perturbs the answer of 푓푙, which is the counting query for location 푙, using aLap. If the perturbed count is above the threshold, the mechanism outputs the noisy answer 푧푙푖 and marks the location because the mechanism consumes 휖 (Lines 9-11). If the mechanism consumed 휖 for location 푙, we skip the output for 푙 (Lines 6-8). If the perturbed count is below the threshold, we output 0 without consuming 휖 due to the asymmetric property (Line 13).

We note that to monitor multiple times, we need to separate the privacy budget (e.g., if we monitor three designated times, we need to use 휖/3 for each query).

THEOREM 15. Algorithm 4 satisfies (푃, 휖)-ADP.

PROOF. We consider all batches used in the algorithm as the dataset 퐷. We assume that the output is 푚퐻 (퐷) = 퐴 = (푎푙0,0, 푎푙0,1, . . . , 푎푙|퐿|,푡), where the used noise vector is 퐻 = (휆푙0,0, 휆푙0,1, . . . , 휆푙|퐿|,푡). Consider the following randomness alignment:

    휆′푙푖 = 휆푙푖  (푎푙푖 = 0);    휆′푙푖 = 휆푙푖 + 푓푙푖 (퐷) − 푓푙푖 (퐷′)  (푎푙푖 = 푧)

where 푓푙푖 is the query used at the 푖th update (i.e., 푓푙푖 (퐷) = 푓푙 (퐷input푖), where 퐷input푖 is the input at the 푖th update). Then, it holds that 푚퐻′ (퐷′) = 퐴′ = 퐴, where 퐻′ = (휆′푙0,0, 휆′푙0,1, . . . , 휆′푙|퐿|,푡), due to the monotonicity. Since an individual can be at only one place at a certain time, and we stop the output when an above-the-threshold answer appears and until the corresponding record disappears, the change of one record only affects one query answer. Therefore, it holds that 휆′푙푖 = 휆푙푖 except for one of the indices. Therefore, the alignment cost is 휖, so the conditions of Theorem 7 hold, which means that Algorithm 4 satisfies (푃, 휖)-ADP.
Q.E.D. □

6 EXPERIMENTS
We conducted simulations of the two use-cases using real-world datasets and evaluated their performance. We publish the source code used in these experiments at https://github.com/tkgsn/adp-algorithms.

6.1 Datasets and a Policy
First, we explain the datasets used for our experiments. We use different datasets for each use-case.

Top-푘 Query. For evaluating the asymmetric sparse vector technique and the asymmetric report noisy max algorithm, we use two real-world datasets from [25] and a synthetic dataset created by the
Varying 휖 value. Here, we fix 푘 as 100 and plot the results in Table 3: Statistics of datasets above Figure 2 with varying the value of 휖. We can see that our algorithm can achieve nearly 1 at accuracy for any dataset in 휖 = 0.5 although the free-gap algorithm cannot achieve 1 at even 휖 = 1. The histogram from our algorithm is about ten-times more accurate than the one from the free-gap algorithm. Varying 푘 value. Here, we fix 휖 as 0.5 and plot the results in below Figure 2 with varying the value of 푘. We can see that the free-gap algorithm more shapely decreases accuracy as 푘 increases than our algorithm. 6.2.2 Asymmetric Sparse Vector Technique. The sparse vector Figure 1: The number of locations with respect to the counts of technique requires a threshold. However, we cannot know the thresh- visited people. old for the top-푘 query. We took the same measure for this problem as the one of Ding et al. [10]. We randomly choose an integer 푖 from generator from the IBM Almaden Quest research group: BMS-POS, [푘, 2푘] and we use the value indexed by the chosen integer (i.e., 푖th Kosarak, and T40I10D100K. These datasets are collections of click- from the top) as the threshold. stream, and each record is a stream of items. In this paper, we show Varying 휖 value. We fix 푘 as 100 and plot the results in above the results of only BMS-POS due to space limitation. We refer the Figure 3. We can see that our algorithm is about ten-times more readers to the full version for other datasets, but the results have the accurate than the free-gap algorithm. When 휖 = 1, accuracy is same tendency as BMS-POS. We show the statistics of each dataset almost the same. This is because the sparse vector technique requires in Table 3. the threshold, which is sensitive information so that the maximum Location Monitoring. For evaluating our proposed location mon- accuracy is limited. 
Our algorithm reaches the maximum accuracy itoring mechanisms, we use the Peopleflow dataset6 as trajectory at lower 휖 than the free-gap algorithm. data of infected people. Trajectory data in the Peopleflow dataset Varying 푘 value. Here, we fix 휖 as 0.5 and plot the results in below includes location_id, which represents the category of the location Figure 3 with varying the value of 푘. The same as the above results, (e.g., restaurant or amusement facility). There are 5835 locations when 푘 is small, accuracy is almost the same, and our algorithm in the Peopleflow dataset, and we assume that if a point includes achieves similar results on large 푘 although the free-gap algorithm a location_id, the individual has visited the location. We randomly sharply decreases accuracy. separate individuals into a batch with the size of 500. We plot the number of locations according to the counts of visited people using 6.2.3 Remark. Our algorithms output a more accurate answer a randomly chosen batch in Figure 1. We can see that most locations to a top-푘 query for any dataset, 푘, and 휖 than Ding’s algorithm are not visited. Counts concentrate on locations such as a station. by providing the privacy guarantee of DP to only sensitive values defined by the policy. Therefore, if a user wants to hide non-click Policy. Here, we use the policy in Example 1. That is, we assume information, these algorithms are not appropriate. Still, if a user can that the fact that a user has clicked an item is sensitive, and the compromise the policy, we can improve the accuracy. fact that a user has not clicked an item is non-sensitive for the click The asymmetric report noisy 푘-max algorithm is more accurate datasets. For the trajectory dataset, we assume that the fact that than the asymmetric sparse vector technique for the top-푘 query. 
a user has visited a location is sensitive, and the fact that a user However, we note that the asymmetric sparse vector technique out- has not visited a location is non-sensitive. This policy induces the puts more information: noisy values of above-the-threshold answers monotonically decreasing at 푃-sensitivity when counting the fact and arguments of below-the-threshold answers. Additionally, since that a user has clicked an item (visited a location) for the same reason the decision problem in the asymmetric sparse vector technique is as Example 5. one-sided, arguments of below-the-threshold answers are accurate.

6.2 Top-k Query 6.3 Location Monitoring We here compare our algorithms with the state-of-the-art algorithms Here, the goal is to publish one-sided safety information. In other satisfying DP [10], which we call the free-gap algorithm. To see words, the algorithm outputs whether the number of target people how the parameters impact the results, we show the results varying 휖 who have visited the location is under the threshold or not. Therefore, and 푘. Here, we use two measurements: the mean squared error and we guarantee that the false-positive ratio = 0. However, we cannot accuracy of top-푘 items. I.e., we evaluate how accurate the published guarantee that the false-negative ratio = 0 due to the constraint of 6http://pflow.csis.u-tokyo.ac.jp/ ADP. Then, we use the false-negative ratio as the utility measure. 11 Figure 2: The results of the asymmetric report noisy 푘-max al- Figure 3: The results of the asymmetric sparse vector technique gorithm for BMS-POS, Kosarak, and T40I10D100K varying 휖 for BMS-POS, Kosarak, and T40I10D100K varying 휖 and 푘. and 푘.
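As a quick self-check of this one-sided guarantee, the following sketch (our own illustration, not code from the paper) applies the non-negative exponential noise of aLap to a counting query with monotonically decreasing 푃-sensitivity and confirms that an unsafe location is never reported as safe, while a safe location only occasionally suffers a false negative:

```python
import random

def is_safe(true_count, threshold, epsilon):
    """Decision query of Section 5.1.1: report True ('safe') iff the
    noisy count stays under the threshold. The noise is non-negative,
    so a truly unsafe location (count >= threshold) never looks safe."""
    noisy = true_count + random.expovariate(epsilon)  # scale 1/epsilon
    return noisy < threshold

random.seed(1)
# Unsafe locations are never reported safe: false-positive ratio = 0.
assert not any(is_safe(12, threshold=10, epsilon=1.0) for _ in range(5000))
# Safe locations are usually, but not always, reported safe.
safe_reports = sum(is_safe(3, threshold=10, epsilon=1.0) for _ in range(5000))
assert safe_reports > 0
```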

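The batch-update loop of Algorithm 3, which the next subsection evaluates, can be sketched as follows. This is a simplified, hypothetical Python rendering (class and parameter names are ours): records live in a queue for a fixed number of updates, a below-the-threshold output is free, and an above-the-threshold output marks all live records to record that 휖 has been consumed.

```python
import random
from collections import deque

class LocationMonitor:
    """Sketch of the update loop of Algorithm 3 for one location."""

    def __init__(self, epsilon, threshold, lifetime):
        self.epsilon = epsilon
        self.threshold = threshold
        self.lifetime = lifetime        # number of updates a record lives
        self.queue = deque()            # records: {"t": arrival, "marked": bool}
        self.time = 0

    def update(self, batch_size):
        self.time += 1
        # Enqueue the new batch, then drop expired records.
        for _ in range(batch_size):
            self.queue.append({"t": self.time, "marked": False})
        while self.queue and self.time - self.queue[0]["t"] >= self.lifetime:
            self.queue.popleft()
        # Budget already consumed for a still-live record: stay silent.
        if any(r["marked"] for r in self.queue):
            return None
        noisy = len(self.queue) + random.expovariate(self.epsilon)
        if noisy >= self.threshold:
            for r in self.queue:
                r["marked"] = True      # epsilon is consumed
            return noisy                # above-the-threshold: publish noisy value
        return 0                        # below-the-threshold: free output
```

Marking mirrors the pseudocode's bookkeeping: once the budget is spent on an above-the-threshold answer, the monitor stays silent until every marked record has expired.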
6.3.1 Monitoring a Location. Here, we evaluate Algorithm 3. Algorithm 3 continues to output as long as infected people do not appear. Then, we evaluate the output at each update and plot the result in Figure 4. The false-negative ratio increases as the true count (i.e., 푓 (퐷)) increases or as the threshold decreases. If the true count is 0, the false-negative ratio is less than 0.01 for 휖 = 1 and threshold = 5. This means that we can publish accurate information with more than 99% probability at each update.

Figure 4: False negative ratio for each update when varying the counts. The left figure is for threshold = 5 and the right figure is for threshold = 10.

6.3.2 Monitoring Locations at a Designated Time. Here, we evaluate Algorithm 4. First, we visualize the outputs for each case of the ground truth (휖 = ∞), 휖 = 0.5, 휖 = 1, and 휖 = 2 in Figure 5. The threshold 푇 is set as 10. In this simulation, we fix the batch size as 500 and update 5 times to add 5 batches, which means that the dataset's final size is 2500. The blue dots represent safe locations, and the orange dots are the hot spots. Comparing with the ground truth, we can see that the orange dots increase. These additional orange dots are the errors (i.e., false negatives), and the blue dots represent OTP (i.e., accurate safe information). The smaller 휖 is, the more orange dots exist due to the stronger requirement of ADP.

Figure 5: The monitored locations and hot spots (counts ≥ 10). From left: ground truth, 휖 = 0.5, 휖 = 1, and 휖 = 2.

Next, we evaluate the false-negative ratio. Here, the false-negative ratio is the ratio of the number of wrong orange dots to the number of all right blue dots in Figure 5. We plot the false-negative ratio at each update in Figure 6. From this figure, the false-negative ratio is at most 0.01. This means that we can publish 99% of safe areas with the guarantee of accuracy.

Finally, we evaluate the false-negative ratio varying the threshold, as shown in Figure 7. When the threshold is small (e.g., 2), the false-negative ratio is very high. By setting a smaller threshold, we can publish more precise information, but the output includes many errors. Setting a bigger threshold obscures the published information, but the number of errors decreases. The threshold is an important key to adjusting this trade-off.

Figure 6: The false negative ratio at each update.

Figure 7: The false negative ratio varying the threshold.

7 CONCLUSION
We proposed asymmetric differential privacy (ADP), a relaxation of differential privacy that mitigates the constraints of differential privacy. We proved that ADP improves the Euclidean error and allows one-sided error. Then, we proposed practical algorithms satisfying ADP. We showed by experiments with real-world datasets that we can get a more accurate top-푘 histogram and one-sided information about the safety of locations using our proposed algorithms. In this paper, we considered the single-dimensional counting query as the use case of ADP. However, ADP is a general definition, so applying ADP to other queries to improve utility is one of the future directions.

REFERENCES
[1] Jayadev Acharya, Kallista Bonawitz, Peter Kairouz, Daniel Ramage, and Ziteng Sun. 2020. Context Aware Local Differential Privacy. In International Conference on Machine Learning. PMLR, 52–62.
[2] Aws Albarghouthi and Justin Hsu. 2017. Synthesizing Coupling Proofs of Differential Privacy. Proc. ACM Program. Lang. 2, POPL, Article 58 (Dec. 2017), 30 pages. https://doi.org/10.1145/3158146
[3] Johannes K Becker, David Li, and David Starobinski. 2019. Tracking anonymized bluetooth devices. Proceedings on Privacy Enhancing Technologies 2019, 3 (2019), 50–65.
[4] USC Bureau. [n.d.]. On the map: Longitudinal employer-household dynamics. https://lehd.ces.census.gov/applications/help/onthemap.html#confidentiality_protection
[5] Yang Cao, Yonghui Xiao, Shun Takagi, Li Xiong, Masatoshi Yoshikawa, Yilin Shen, Jinfei Liu, Hongxia Jin, and Xiaofeng Xu. 2020. PGLP: Customizable and Rigorous Location Privacy Through Policy Graph. In European Symposium on Research in Computer Security. Springer, 655–676.
[6] Rui Chen, Benjamin CM Fung, Noman Mohammed, Bipin C Desai, and Ke Wang. 2013. Privacy-preserving trajectory data publishing by local suppression. Information Sciences 231 (2013), 83–97.
[7] Rui Chen, Liang Li, Jeffrey Jiarui Chen, Ronghui Hou, Yanmin Gong, Yuanxiong Guo, and Miao Pan. 2020. COVID-19 Vulnerability Map Construction via Location Privacy Preserving Mobile Crowdsourcing. In GLOBECOM 2020 - 2020 IEEE Global Communications Conference. IEEE, 1–6.
[8] Eva Clark, Elizabeth Y Chiao, and E Susan Amirian. 2020. Why contact tracing efforts have failed to curb coronavirus disease 2019 (COVID-19) transmission in much of the United States. Clinical Infectious Diseases (2020).
[9] Bolin Ding, Janardhan Kulkarni, and Sergey Yekhanin. 2017. Collecting telemetry data privately. In Advances in Neural Information Processing Systems. 3571–3580.
[10] Zeyu Ding, Yuxin Wang, Danfeng Zhang, and Daniel Kifer. 2020. Free gap information from the differentially private sparse vector and noisy max mechanisms. Proceedings of the VLDB Endowment (2020).
[11] Stelios Doudalis, Ios Kotsogiannis, Samuel Haney, Ashwin Machanavajjhala, and Sharad Mehrotra. 2020. One-sided differential privacy. ICDE (2020).
[12] Cynthia Dwork. 2006. Differential privacy. In Proceedings of the 33rd International Conference on Automata, Languages and Programming - Volume Part II. Springer-Verlag, 1–12.
[13] Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. 2006. Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography Conference. Springer, 265–284.
[14] Cynthia Dwork, Moni Naor, Omer Reingold, Guy N Rothblum, and Salil Vadhan. 2009. On the complexity of differentially private data release: efficient algorithms and hardness results. In Proceedings of the Forty-First Annual ACM Symposium on Theory of Computing. 381–390.
[15] Cynthia Dwork, Aaron Roth, et al. 2014. The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science 9, 3-4 (2014), 211–407.
[16] Úlfar Erlingsson, Vasyl Pihur, and Aleksandra Korolova. 2014. RAPPOR: Randomized aggregatable privacy-preserving ordinal response. In Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security. 1054–1067.
[17] Xiaolan Gu, Ming Li, Li Xiong, and Yang Cao. 2020. Providing input-discriminative protection for local differential privacy. In 2020 IEEE 36th International Conference on Data Engineering (ICDE). IEEE, 505–516.
[18] Yaron Gvili. 2020. Security analysis of the COVID-19 contact tracing specifications by Apple Inc. and Google Inc. IACR Cryptol. ePrint Arch. 2020 (2020), 428.
[19] Moritz Hardt and Kunal Talwar. 2010. On the geometry of differential privacy. In Proceedings of the Forty-Second ACM Symposium on Theory of Computing. 705–714.
[20] Xi He, Ashwin Machanavajjhala, and Bolin Ding. 2014. Blowfish privacy: Tuning privacy-utility trade-offs using policies. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data. 1447–1458.
[21] Noah Johnson, Joseph P Near, and Dawn Song. 2018. Towards practical differential privacy for SQL queries. Proceedings of the VLDB Endowment 11, 5 (2018), 526–539.
[22] Daniel Kifer and Ashwin Machanavajjhala. 2014. Pufferfish: A framework for mathematical privacy definitions. ACM Transactions on Database Systems (TODS) 39, 1 (2014), 1–36.
[23] Fragkiskos Koufogiannis, Shuo Han, and George J Pappas. 2015. Optimality of the Laplace mechanism in differential privacy. arXiv preprint arXiv:1504.00065 (2015).
[24] Robert Kudelić. 2016. Monte-Carlo randomized algorithm for minimal feedback arc set problem. Applied Soft Computing 41 (2016), 235–246.
[25] Min Lyu, Dong Su, and Ninghui Li. 2016. Understanding the sparse vector technique for differential privacy. arXiv preprint arXiv:1603.01699 (2016).
[26] Naurang S Mangat. 1994. An improved randomized response strategy. Journal of the Royal Statistical Society: Series B (Methodological) 56, 1 (1994), 93–95.
[27] Takao Murakami and Yusuke Kawamoto. 2019. Utility-optimized local differential privacy mechanisms for distribution estimation. In 28th USENIX Security Symposium (USENIX Security 19). 1877–1894.
[28] Simon Nicholas, Chris Armitage, Tova Tampe, and Kimberly Dienes. 2020. Public attitudes towards COVID-19 contact tracing apps: a UK-based focus group study. (2020).
[29] Yuto Omae, Jun Toyotani, Kazuyuki Hara, Yasuhiro Gon, and Hirotaka Takahashi. 2020. A Calculation Model for Estimating Effect of COVID-19 Contact-Confirming Application (COCOA) on Decreasing Infectors. arXiv preprint arXiv:2010.12067 (2020).
[30] Sangchul Park, Gina Jeehyun Choi, and Haksoo Ko. 2020. Information technology-based tracing strategy in response to COVID-19 in South Korea: privacy controversies. JAMA (2020).
[31] A. D. P. Team. [n.d.]. Learning with privacy at scale.
[32] Yuxin Wang, Zeyu Ding, Guanhong Wang, Daniel Kifer, and Danfeng Zhang. 2019. Proving differential privacy with shadow execution. In Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation. 655–669.
[33] Stanley L Warner. 1965. Randomized response: A survey technique for eliminating evasive answer bias. J. Amer. Statist. Assoc. 60, 309 (1965), 63–69.
[34] Danfeng Zhang and Daniel Kifer. 2017. LightDP: Towards Automating Differential Privacy Proofs. SIGPLAN Not. 52, 1 (Jan. 2017), 888–901. https://doi.org/10.1145/3093333.3009884