Tailoring Differentially Private Bayesian Inference to Distance Between Distributions
Jiawen Liu*, Mark Bun**, Gian Pietro Farina*, and Marco Gaboardi* *University at Buffalo, SUNY. {jliu223,gianpiet,gaboardi}@buffalo.edu **Princeton University. [email protected]
Contents

1 Introduction
2 Preliminaries
3 Technical Problem Statement and Motivations
4 Mechanism Proposition
  4.1 Laplace Mechanism Family
    4.1.1 Using the ℓ1 norm metric
    4.1.2 Using the improved ℓ1 norm metric
  4.2 Exponential Mechanism Family
    4.2.1 Standard Exponential Mechanism
    4.2.2 Exponential Mechanism with Hellinger Metric and Local Sensitivity
    4.2.3 Exponential Mechanism with Hellinger Metric and Smoothed Sensitivity
5 Privacy Analysis
  5.1 Privacy of Laplace Mechanism Family
  5.2 Privacy of Exponential Mechanism Family
    5.2.1 expMech(·, ·, ·) ε-Differential Privacy
    5.2.2 expMech^local(·, ·, ·) non-Differential Privacy
    5.2.3 expMech^smoo ε-Differential Privacy Proof
6 Accuracy Analysis
  6.1 Accuracy Bound for Baseline Mechanisms
    6.1.1 Accuracy Bound for the Laplace Mechanism
    6.1.2 Accuracy Bound for the Improved Laplace Mechanism
  6.2 Accuracy Bound for expMech^smoo
  6.3 Accuracy Comparison between expMech^smoo, lapMech and ilapMech
7 Experimental Evaluations
  7.1 Efficiency Evaluation
  7.2 Accuracy Evaluation
    7.2.1 Theoretical Results
    7.2.2 Experimental Results
  7.3 Privacy Evaluation
8 Conclusion and Future Work
Abstract

Bayesian inference is a statistical method which allows one to derive a posterior distribution, starting from a prior distribution and observed data. Several approaches have been explored to make this process differentially private. For example, [6] and [11] proved that, under specific conditions, sampling from the posterior distribution is already differentially private. [15] and [9] designed differentially private mechanisms that output a representation of the full posterior distribution. When the output of a differentially private mechanism is a probability distribution, accuracy is naturally measured by means of probability distances measuring how far this distribution is from the original one. Some classical examples are the total variation distance, the Hellinger distance, the χ2-distance, and the KL-divergence. In this work, we explore the design space for differentially private Bayesian inference by considering different metrics and different algorithms. We focus on two discrete models, the Beta-binomial and the Dirichlet-multinomial models, and on one probability distance, the Hellinger distance. Our mechanism can be understood as a version of the exponential mechanism where the noise is calibrated to the smooth sensitivity of the utility function, rather than to its global sensitivity. In our setting, the utility function is the probability distance we want to use to measure accuracy. To show the usefulness of this mechanism we present an experimental analysis comparing it with mechanisms based on the Laplace mechanism.
1 Introduction
Data analysis techniques are broadly used to improve services in many areas, including healthcare, finance, location-based services, and social networks. To provide better services, large amounts of data are collected from users for analysis. As a consequence, data privacy has become a critical issue: sensitive information in the data can be revealed through the analysis results. The key challenge is to release private analysis results from which an adversary cannot infer an individual's sensitive information in the data. A large body of work addresses this issue by guaranteeing privacy for specific data analysis algorithms, achieving ε-differential privacy, e.g. by adopting the Laplace mechanism, or achieving (ε, δ)-differential privacy. These approaches, however, do not necessarily give good accuracy. Here, we propose a mechanism with better accuracy.
Our work is conducted in a Bayesian inference scenario, where the posterior distribution is the analysis result obtained from the data. Publishing the posterior distribution inferred from a sensitive dataset can leak information about the individuals in the dataset. To guarantee differential privacy and protect the individuals' data, we can add noise to the posterior before releasing it. The amount of noise that we need to introduce depends on the privacy parameter and on the sensitivity of the inference to small changes in the
dataset. Sensitivity can be computed in many different ways, depending on which metric we consider on the output space of the mechanism. In the literature on private Bayesian inference ([15, 13]), it is measured only with respect to the vector of numbers parametrizing the output distribution, using, e.g., the ℓ1 norm. A more natural approach, which we explore here, is to measure sensitivity with respect to a metric on the space of inferred probability distributions. A related question is how to measure accuracy. Again, this can be answered in different ways depending on the metric imposed on the output space, and again only a few works in the literature (e.g. [15]) have used distances between probability measures for this purpose. The question that this work aims to answer is whether an approach based on probability metrics can improve on the accuracy of approaches based on metrics over the numeric parameters of the distributions. We will see that in some cases this can happen.

Main contributions.
• We design a differentially private Bayesian inference mechanism based on the standard exponential mechanism.
• We explore two ways to improve accuracy: 1) calibrating the noise to the sensitivity of a metric over distributions (e.g. the Hellinger distance (H), f-divergences, etc.); 2) using a smooth upper bound on the local sensitivity and scaling the noise to this smooth bound rather than to the global sensitivity.
• We give a full proof that the newly designed mechanism is (ε, δ)-differentially private.
• We implemented the newly proposed mechanism and other state-of-the-art mechanisms, comparing their performance in terms of accuracy and privacy.
Related Work.
Many data analysis algorithms have been studied under differential privacy, including subspace clustering [10], gradient descent in deep learning [1], logistic regression [3], principal component analysis [4], probabilistic inference [12], and convergence in statistical estimation [2]. In Bayesian inference, differentially private mechanisms have been proposed with three different goals: 1) exploiting the inherent differential privacy of posterior sampling in Bayesian inference ([6], [15], [16], [11]); 2) ensuring that data sampled and released from the posterior distribution is differentially private ([14], [7], [9]); 3) making the inference process differentially private, so that the released posterior distribution is itself private while retaining good accuracy. The third direction, on which our work focuses, is still very new.
2 Preliminaries
Bayesian Inference. Given a prior belief Pr(θ) on some parameter θ, and an observation x, the posterior distribution on θ given x is computed as:
\[
\Pr(\theta \mid x) = \frac{\Pr(x \mid \theta) \cdot \Pr(\theta)}{\Pr(x)}
\]
where the expression Pr(x|θ) denotes the likelihood of observing x under a value of θ. Since we consider x to be fixed, the likelihood is a function of θ. For the same reason, Pr(x) is a constant independent of θ. Usually in statistics the prior distribution Pr(θ) is chosen so that it represents the initial belief on θ, that is, when no data has been observed. In practice, though, prior distributions and likelihood functions are usually chosen so that the posterior belongs to the same family of distributions as the prior. In this case we say that the prior is conjugate to the likelihood function. Use of a conjugate prior simplifies calculations and allows inference to be performed in a recursive fashion over the data. Then, we have:
\[
\mathrm{bysInfer}(\Pr(x \mid \theta), \Pr(\theta), x) = \frac{\Pr(x \mid \theta) \cdot \Pr(\theta)}{\Pr(x)}
\]
Beta-binomial System. In this work we will consider a specific instance of Bayesian inference and one of its generalizations: specifically, the Beta-binomial model. We consider the situation where the underlying data is binomially distributed (∼ binomial(θ)), where θ represents the parameter (informally called the bias) of a Bernoulli distributed random variable. The prior distribution over θ ∈ [0, 1] is a beta distribution, beta(α, β), with parameters α, β ∈ ℝ⁺ and p.d.f.:
\[
\Pr(\theta) \equiv \frac{\theta^{\alpha-1}(1-\theta)^{\beta-1}}{B(\alpha, \beta)}
\]
where B(·, ·) is the beta function. The data x will be a sequence of n ∈ ℕ binary values, that is x = (x₁, . . . , xₙ), xᵢ ∈ {0, 1}, and the likelihood function is:
\[
\Pr(x \mid \theta) \equiv \theta^{\Delta\alpha}(1-\theta)^{n-\Delta\alpha}
\]
where \(\Delta\alpha = \sum_{i=1}^{n} x_i\). From this, the inference can be easily derived:
\[
\mathrm{bysInfer}(\mathrm{binomial}(\theta), \mathrm{beta}(\alpha, \beta), x) = \mathrm{beta}(\alpha + \Delta\alpha, \beta + n - \Delta\alpha)
\]
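The conjugate update above amounts to simple counting. A minimal sketch in Python (function names are ours, not from the paper), covering both the Beta-binomial case and its Dirichlet-multinomial generalization introduced next:

```python
def bys_infer_beta(alpha, beta, x):
    """Conjugate Bayesian update: beta(alpha, beta) prior with a list of
    Bernoulli observations x; returns the posterior beta parameters."""
    n = len(x)
    delta_alpha = sum(x)  # number of 1s observed
    return (alpha + delta_alpha, beta + n - delta_alpha)


def bys_infer_dirichlet(alpha, x, universe):
    """Dirichlet-multinomial generalization: alpha is one concentration
    parameter per element of the universe; counts play the role of the
    Delta-alpha vector."""
    counts = [sum(1 for xi in x if xi == a) for a in universe]
    return [a + c for a, c in zip(alpha, counts)]
```

For example, a uniform beta(1, 1) prior with observations (1, 1, 0, 1) yields the posterior beta(4, 2).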
Dirichlet-multinomial Systems. The Beta-binomial model can be immediately generalized to the Dirichlet-multinomial model, with the underlying data multinomially distributed. The bias is represented by the parameter θ, the vector of parameters of a categorically distributed random variable. The prior distribution over θ ∈ [0, 1]^k is given by a Dirichlet distribution, dirichlet(α), for k ∈ ℕ and α ∈ (ℝ⁺)^k, with p.d.f.:
\[
\Pr(\theta) \equiv \frac{1}{B(\alpha)} \cdot \prod_{i=1}^{k} \theta_i^{\alpha_i - 1}
\]
where B(·) is the generalized beta function. The data x will be a sequence of n ∈ ℕ values coming from a universe X such that |X| = k. The likelihood function will be:
\[
\Pr(x \mid \theta) \equiv \prod_{a_i \in \mathcal{X}} \theta_i^{\Delta\alpha_i},
\]
with \(\Delta\alpha_i = \sum_{j=1}^{n} [x_j = a_i]\), where [·] denotes the Iverson bracket. Denoting by Δα the vector (Δα₁, . . . , Δαₖ), the inference over θ turns out to be
\[
\mathrm{bysInfer}(\mathrm{multinomial}(\theta), \mathrm{dirichlet}(\alpha), x) = \mathrm{dirichlet}(\alpha + \Delta\alpha)
\]
where + denotes the componentwise sum of vectors of reals.

Differential Privacy.

Definition 1 (ε-differential privacy). A randomized mechanism M : X → Y is ε-differentially private iff for any adjacent¹ inputs x, x′ ∈ X, any metric H over Y, and any B ⊆ H(Y), M satisfies:
\[
\Pr[H(M(x)) \in B] \le e^{\varepsilon} \Pr[H(M(x')) \in B].
\]
Definition 2 ((ε, δ)-differential privacy). A randomized mechanism M : X → Y is (ε, δ)-differentially private iff for any adj(x, x′) ∈ X, any metric H over Y, and any B ⊆ H(Y), M satisfies:
\[
\Pr[H(M(x)) \in B] \le e^{\varepsilon} \Pr[H(M(x')) \in B] + \delta.
\]
3 Technical Problem Statement and Motivations
We are interested in exploring mechanisms for privately releasing the full posterior distributions derived in Section 2, as opposed to just sampling from them. It is worth noticing that the posterior distributions are fully characterized by their parameters and by the family (Beta or Dirichlet distribution) they belong to. Hence, in the case of the Beta-binomial model we are interested in releasing a private version of the pair of parameters (α′, β′) = (α + Δα, β + n − Δα), and in the case of the Dirichlet-multinomial model we are interested in a private version of α′ = (α + Δα). [15] and [13] have already attacked this problem by adding independent Laplace noise to the parameters of the posteriors.
¹Given x, x′, we say that x and x′ are adjacent, and write adj(x, x′), iff \(\sum_{i=1}^{n} [x_i \neq x'_i] \le 1\).
That is, in the case of the Beta-binomial system, the value released would be:
\[
(\tilde{\alpha}, \tilde{\beta}) = (\alpha + \widetilde{\Delta\alpha},\ \beta + n - \widetilde{\Delta\alpha}), \qquad \widetilde{\Delta\alpha} \sim \mathrm{Lap}\!\left(\Delta\alpha, \tfrac{2}{\varepsilon}\right),
\]
where Lap(µ, ν) denotes a Laplace random variable with mean µ and scale ν. This mechanism is ε-differentially private, and the noise is calibrated to a sensitivity of 2, which is derived by using the ℓ1 norm over the pair of parameters. Indeed, considering two adjacent data observations x, x′ that, from the same prior, give rise to two posterior distributions characterized by the pairs (α′, β′) and (α′′, β′′), we have |α′ − α′′| + |β′ − β′′| ≤ 2. This argument extends similarly to the Dirichlet-multinomial system. Details are given in Sec. 4. However, in previous works, the accuracy of the posterior was measured again with respect to the ℓ1 norm. That is, an upper bound was given on
\[
\Pr[\,|\alpha - \tilde{\alpha}| + |\beta - \tilde{\beta}| \ge \gamma\,]
\]
where (α, β) and (α̃, β̃) are as defined above. This accuracy metric is meaningless when the results released are distributions rather than numerical values. In contrast, distribution metrics such as f-divergences, the Hellinger distance, etc., naturally come to mind when we are measuring the distance between distributions. This motivates us to explore the design space of mechanisms by using a distribution metric to compute the sensitivity and to provide guarantees on the accuracy. Specifically, we will use the Hellinger distance H(·, ·): given two beta distributions beta(α1, β1) and beta(α2, β2), the following equality holds:
\[
H(\mathrm{beta}(\alpha_1, \beta_1), \mathrm{beta}(\alpha_2, \beta_2)) = \sqrt{1 - \frac{B\!\left(\frac{\alpha_1+\alpha_2}{2}, \frac{\beta_1+\beta_2}{2}\right)}{\sqrt{B(\alpha_1, \beta_1)\,B(\alpha_2, \beta_2)}}}
\]
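This closed form is easy to evaluate numerically through the log-beta function. A sketch (helper names are ours; working in log space avoids overflow for large parameters):

```python
import math


def log_beta(a, b):
    # log B(a, b) computed via log-gamma for numerical stability
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def hellinger_beta(a1, b1, a2, b2):
    """Hellinger distance between beta(a1, b1) and beta(a2, b2)."""
    log_ratio = (log_beta((a1 + a2) / 2, (b1 + b2) / 2)
                 - 0.5 * (log_beta(a1, b1) + log_beta(a2, b2)))
    # the clamp guards against tiny negative values caused by rounding
    return math.sqrt(max(0.0, 1.0 - math.exp(log_ratio)))
```

Identical distributions give distance 0, the distance is symmetric, and it approaches 1 as the two posteriors drift apart.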
Our choice to use the Hellinger distance is motivated by three facts:

• It simplifies calculations in the case of the probabilistic models considered here.

• It automatically yields bounds on the total variation distance, which also represents the maximum advantage an unbounded adversary can have in distinguishing two distributions.

• The accuracy can be improved by using a smooth bound on the local sensitivity of the Hellinger distance. As shown in Fig. 1, by taking advantage of the gap between the global sensitivity and the local sensitivity, we can improve the accuracy by calibrating noise to an upper bound on the local sensitivity instead of the global sensitivity.
Figure 1: Sensitivity of H
4 Mechanism Proposition

4.1 Laplace Mechanism Family
4.1.1 Using the ℓ1 norm metric

The Laplace mechanism lapMech(bysInfer, x) adds noise directly to the posterior distribution parameters, with post-processing, producing the output:
\[
\mathrm{lapMech}(\mathrm{bysInfer}, x) = \mathrm{beta}\!\left(\alpha + \lfloor \Delta\alpha + Y \rfloor_0^n,\ \beta + n - \lfloor \Delta\alpha + Y \rfloor_0^n\right),
\]
where \(Y \sim \mathrm{Lap}(0, \frac{\Delta \mathrm{bysInfer}}{\varepsilon})\), a Laplace distribution with location 0 and scale \(\frac{\Delta \mathrm{bysInfer}}{\varepsilon}\), in the Beta-binomial model; and
\[
\mathrm{lapMech}(\mathrm{bysInfer}, x) = \mathrm{dirichlet}\!\left(\alpha_1 + \lfloor \Delta\alpha_1 + Y_1 \rfloor_0^n,\ \cdots,\ \alpha_k + \Big\lfloor n - \sum_{i=1}^{k-1} \lfloor \Delta\alpha_i + Y_i \rfloor_0^n \Big\rfloor_0^n\right),
\]
where \(Y_i \sim \mathrm{Lap}(0, \frac{\Delta \mathrm{bysInfer}}{\varepsilon})\) in the Dirichlet-multinomial model. Here \(\lfloor \cdot \rfloor_0^n\) takes the floor value and truncates it into [0, n] to make sure the noised posterior is valid. We write lapMech(x) = beta(α̃, β̃) or dirichlet(α̃) for short, and release it as the private posterior distribution. The sensitivity of the inference process w.r.t. the ℓ1 norm in lapMech is:
\[
\Delta \mathrm{bysInfer} \equiv \max_{x, x' \in \{0,1\}^n,\ \|x - x'\|_1 \le 1} \|\mathrm{bysInfer}(x) - \mathrm{bysInfer}(x')\|_1,
\]
which is proportional to the dimensionality.
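A minimal sketch of lapMech for the Beta-binomial case (function and parameter names are ours; `sens` defaults to the ℓ1 global sensitivity of 2, and the Laplace draw is generated as a difference of two exponentials):

```python
import math
import random


def lap_mech_beta(alpha, beta, x, eps, sens=2.0):
    """Laplace mechanism on the posterior count Delta-alpha, with
    truncation to [0, n] so that the released posterior is valid."""
    n = len(x)
    delta_alpha = sum(x)
    scale = sens / eps
    # Lap(0, scale) = Exp(1/scale) - Exp(1/scale)
    y = random.expovariate(1 / scale) - random.expovariate(1 / scale)
    noisy = min(n, max(0, math.floor(delta_alpha + y)))
    return (alpha + noisy, beta + n - noisy)
```

In the two-parameter Beta case, passing sens=1 matches the smaller sensitivity used by the improved mechanism described next.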
4.1.2 Using the improved ℓ1 norm metric

In this improved Laplace mechanism, the noise added to the posterior distribution parameters is scaled to a smaller sensitivity. For two adjacent data sets x, x′, the parameters of their posterior distributions under Bayesian inference, bysInfer(x) and bysInfer(x′), differ in at most 2 dimensions, even in the Dirichlet-multinomial model, i.e., ΔbysInfer ≤ 2. It is then enough to use sensitivity 1 in 2 dimensions and 2 in higher dimensions, so the improved Laplace mechanism ilapMech(bysInfer, x) produces the output:
\[
\mathrm{beta}\!\left(\alpha + \lfloor \Delta\alpha + Y \rfloor_0^n,\ \beta + n - \lfloor \Delta\alpha + Y \rfloor_0^n\right),
\]
where \(Y \sim \mathrm{Lap}(0, \frac{1}{\varepsilon})\) in the Beta-binomial model; and
\[
\mathrm{dirichlet}\!\left(\alpha_1 + \lfloor \Delta\alpha_1 + Y_1 \rfloor_0^n,\ \cdots,\ \alpha_k + \Big\lfloor n - \sum_{i=1}^{k-1} \lfloor \Delta\alpha_i + Y_i \rfloor_0^n \Big\rfloor_0^n\right),
\]
where \(Y_i \sim \mathrm{Lap}(0, \frac{2}{\varepsilon})\) in the Dirichlet-multinomial model. Both the Laplace mechanism and the improved one are ε-differentially private [8].
4.2 Exponential Mechanism Family

In this section, we explore the exponential mechanism family expMech(·, ·, ·), considering the Hellinger distance metric and different sensitivities. Given a prior distribution βprior = beta(α, β) and a sequence of n observations x ∈ {0, 1}^n, we define the following candidate set from which the mechanisms in this family sample:
\[
R_{\mathrm{post}} \equiv \{\mathrm{bysInfer}(\mathrm{binomial}(\theta), \beta_{\mathrm{prior}}, x) \mid x \in \{0,1\}^n\}.
\]
For simplicity:
\[
R_{\mathrm{post}} \equiv \{\mathrm{beta}(\alpha', \beta') \mid \alpha' = \alpha + \Delta\alpha,\ \beta' = \beta + n - \Delta\alpha\}
\]
over all x ∈ {0, 1}^n, where Δα is as defined in Section 2, and
\[
R_{\mathrm{post}} \equiv \{\mathrm{dirichlet}(\alpha') \mid \alpha' = \alpha + \Delta\alpha\},
\]
over all x ∈ X^n in the Dirichlet-multinomial model. We do not explicitly parametrize the result by the prior and likelihood, which from now on we consider fixed and denote by βprior and binomial(θ) or multinomial(θ). We use bysInfer(x) to denote the Bayesian inference process when the other two parameters are fixed.
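Because the posterior depends on x only through Δα ∈ {0, . . . , n}, the candidate set has just n + 1 elements. A sketch (representing each beta(α′, β′) candidate as a parameter pair; the function name is ours):

```python
def candidate_set(alpha, beta, n):
    """R_post for the Beta-binomial model: one candidate posterior per
    possible value of Delta-alpha = 0, 1, ..., n."""
    return [(alpha + k, beta + n - k) for k in range(n + 1)]
```

Every candidate has the same total mass of pseudo-counts α + β + n, which is what makes the enumeration finite and cheap.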
4.2.1 Standard Exponential Mechanism
The standard exponential mechanism expMech(x, u, R_post) samples an element from the candidate set R_post = {r₁, r₂, · · · , rₙ} with probability proportional to \(\exp(\frac{\varepsilon\, u(x,r)}{2\,GS})\):
\[
\Pr_{z \sim \mathrm{expMech}(x, u, R_{\mathrm{post}})}[z = r] = \frac{\exp\!\left(\frac{\varepsilon\, u(x,r)}{2\,GS}\right)}{\sum_{r' \in R} \exp\!\left(\frac{\varepsilon\, u(x,r')}{2\,GS}\right)},
\]
where u(x, r) is the Hellinger scoring function over candidates, u(x, r) = −H(bysInfer(x), r), and GS is the global sensitivity of the scoring function (i.e. the global sensitivity of the Bayesian inference w.r.t. the Hellinger distance), calculated by:
\[
GS = \max_{\{|x - x'| \le 1;\ x, x' \in \mathcal{X}^n\}}\ \max_{\{r \in R\}}\ |H(\mathrm{bysInfer}(x), r) - H(\mathrm{bysInfer}(x'), r)|
\]
The exponential mechanism is ε-differentially private [8].
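A brute-force sketch of the standard mechanism for the Beta-binomial case (helper names are ours). Since adjacent datasets can change Δα by at most 1, the maximization over adjacent x, x′ in GS reduces to neighboring counts:

```python
import math
import random


def log_beta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def hellinger(p, q):
    (a1, b1), (a2, b2) = p, q
    lr = (log_beta((a1 + a2) / 2, (b1 + b2) / 2)
          - 0.5 * (log_beta(a1, b1) + log_beta(a2, b2)))
    return math.sqrt(max(0.0, 1.0 - math.exp(lr)))


def exp_mech(alpha, beta, n, delta_alpha, eps):
    """Standard exponential mechanism with score u(x, r) = -H(bysInfer(x), r)."""
    cands = [(alpha + k, beta + n - k) for k in range(n + 1)]
    # global sensitivity of the score, brute-forced over neighboring counts
    gs = max(abs(hellinger(cands[k], r) - hellinger(cands[k + 1], r))
             for k in range(n) for r in cands)
    true_post = (alpha + delta_alpha, beta + n - delta_alpha)
    weights = [math.exp(eps * (-hellinger(true_post, r)) / (2 * gs))
               for r in cands]
    return random.choices(cands, weights=weights)[0]
```

The true posterior gets the highest weight (its score is 0), and the weights decay exponentially with Hellinger distance from it.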
4.2.2 Exponential Mechanism with Hellinger Metric and Local Sensitivity

The exponential mechanism with local sensitivity, expMech^local(x, u, R_post), shares the same candidate set and utility function as the standard exponential mechanism. It outputs a candidate r ∈ R with probability proportional to \(\exp(\frac{\varepsilon\, u(x,r)}{2\,LS(x)})\):
\[
\Pr_{z \sim \mathrm{expMech}^{\mathrm{local}}(x, u, R_{\mathrm{post}})}[z = r] = \frac{\exp\!\left(\frac{\varepsilon\, u(x,r)}{2\,LS(x)}\right)}{\sum_{r' \in R} \exp\!\left(\frac{\varepsilon\, u(x,r')}{2\,LS(x)}\right)},
\]
where LS(x) is the local sensitivity of the scoring function, calculated by:
\[
LS(x) = \max_{x' \in \mathcal{X}^n:\ \mathrm{adj}(x, x'),\ r \in R} |H(\mathrm{bysInfer}(x), r) - H(\mathrm{bysInfer}(x'), r)|.
\]
The exponential mechanism with local sensitivity is not differentially private [8].
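LS(x) can be computed by the same enumeration as above, since an adjacent dataset changes Δα by at most 1. A sketch under the same pair-of-parameters representation (names are ours):

```python
import math


def log_beta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def hellinger(p, q):
    (a1, b1), (a2, b2) = p, q
    lr = (log_beta((a1 + a2) / 2, (b1 + b2) / 2)
          - 0.5 * (log_beta(a1, b1) + log_beta(a2, b2)))
    return math.sqrt(max(0.0, 1.0 - math.exp(lr)))


def local_sensitivity(alpha, beta, n, delta_alpha):
    """LS(x) for the Hellinger score: maximize over adjacent datasets
    (delta_alpha +/- 1, clipped to [0, n]) and over all candidates r."""
    cands = [(alpha + k, beta + n - k) for k in range(n + 1)]
    post = cands[delta_alpha]
    ls = 0.0
    for da2 in (delta_alpha - 1, delta_alpha + 1):
        if 0 <= da2 <= n:
            ls = max(ls, max(abs(hellinger(post, r) - hellinger(cands[da2], r))
                             for r in cands))
    return ls
```

By definition, LS(x) never exceeds the global sensitivity GS = max over x of LS(x), which is exactly the gap the smoothed variant exploits.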
4.2.3 Exponential Mechanism with Hellinger Metric and Smoothed Sensitivity

Definition 3. Given a data set x of size n, the γ-smooth sensitivity of the Bayesian inference process w.r.t. the Hellinger distance, S(x), is:
\[
S(x) = \max_{x'' \in \{0,1\}^n} \frac{1}{\frac{1}{LS(x'')} + \gamma \cdot d(x, x'')}, \tag{1}
\]
where d is the Hamming distance between two data sets.

Theorem 4.1. Given a data set x, the γ-smooth sensitivity of the Bayesian inference process w.r.t. the Hellinger distance, S(x), satisfies: for any adj(x, x′),
\[
\frac{1}{S(x)} - \frac{1}{S(x')} \le \gamma. \tag{2}
\]

Proof of Theorem 4.1. For adj(x, x′) and arbitrary x′′ ∈ {0, 1}^n:
By Equation (1):
\[
\frac{1}{S(x)} = \min_{x'' \in \{0,1\}^n} \left( \frac{1}{LS(x'')} + \gamma \cdot d(x, x'') \right).
\]
Since d(x, x′′) ≤ d(x, x′) + d(x′, x′′) ≤ 1 + d(x′, x′′):
\[
\begin{aligned}
\frac{1}{S(x)} &\le \min_{x'' \in \{0,1\}^n} \left( \frac{1}{LS(x'')} + \gamma \cdot (1 + d(x', x'')) \right) \\
&= \min_{x'' \in \{0,1\}^n} \left( \gamma + \frac{1}{LS(x'')} + \gamma \cdot d(x', x'') \right) \\
&= \gamma + \min_{x'' \in \{0,1\}^n} \left( \frac{1}{LS(x'')} + \gamma \cdot d(x', x'') \right) \\
&= \gamma + \frac{1}{S(x')}
\end{aligned}
\]
\[
\implies \quad \frac{1}{S(x)} - \frac{1}{S(x')} \le \gamma. \qquad \square
\]
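Since the model depends on x only through Δα, and the minimal Hamming distance between datasets with counts Δα and Δα′′ is |Δα − Δα′′|, Definition 3 can be evaluated by enumerating counts. The sketch below (names are ours) also lets one check inequality (2) directly:

```python
import math


def log_beta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def hellinger(p, q):
    (a1, b1), (a2, b2) = p, q
    lr = (log_beta((a1 + a2) / 2, (b1 + b2) / 2)
          - 0.5 * (log_beta(a1, b1) + log_beta(a2, b2)))
    return math.sqrt(max(0.0, 1.0 - math.exp(lr)))


def local_sensitivity(alpha, beta, n, da):
    cands = [(alpha + k, beta + n - k) for k in range(n + 1)]
    ls = 0.0
    for da2 in (da - 1, da + 1):
        if 0 <= da2 <= n:
            ls = max(ls, max(abs(hellinger(cands[da], r) - hellinger(cands[da2], r))
                             for r in cands))
    return ls


def smooth_sensitivity(alpha, beta, n, da, gamma):
    """gamma-smooth sensitivity from Definition 3:
    1/S(x) = min over x'' of (1/LS(x'') + gamma * d(x, x''))."""
    inv_s = min(1.0 / local_sensitivity(alpha, beta, n, da2)
                + gamma * abs(da - da2)
                for da2 in range(n + 1))
    return 1.0 / inv_s
```

Note that the da2 = da term makes 1/S(x) ≤ 1/LS(x), so S(x) always upper-bounds the local sensitivity, as required.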
Definition 4. expMech^smoo(x) outputs a candidate r ∈ R_post with probability