White-box vs Black-box: Bayes Optimal Strategies for Membership Inference

Alexandre Sablayrolles 1 2   Matthijs Douze 2   Yann Ollivier 2   Cordelia Schmid 1   Hervé Jégou 2

1 Université Grenoble Alpes, Inria, CNRS, Grenoble INP, LJK.  2 Facebook AI Research. Correspondence to: Alexandre Sablayrolles.

Proceedings of the 36th International Conference on Machine Learning, Long Beach, California, PMLR 97, 2019. Copyright 2019 by the author(s).

Abstract

Membership inference determines, given a sample and trained parameters of a machine learning model, whether the sample was part of the training set. In this paper, we derive the optimal strategy for membership inference with a few assumptions on the distribution of the parameters. We show that optimal attacks only depend on the loss function, and thus black-box attacks are as good as white-box attacks. As the optimal strategy is not tractable, we provide approximations of it leading to several inference methods, and show that existing membership inference methods are coarser approximations of this optimal strategy. Our membership attacks outperform the state of the art in various settings, ranging from a simple logistic regression to more complex architectures and datasets, such as ResNet-101 and Imagenet.

1. Introduction

Ateniese et al. (2015) state that "it is unsafe to release trained classifiers since valuable information about the training set can be extracted from them". The problem that we address in this paper, i.e., to determine whether a sample has been used to train a given model, is related to the privacy implications of machine learning models. These were first discussed in the context of support vector machines (Rubinstein et al., 2009; Biggio et al., 2014). The problem of "unintended memorization" (Carlini et al., 2018) appears in most applications of machine learning, such as natural language processing systems (Carlini et al., 2018) or image classification (Yeom et al., 2018).

More specifically, we consider the problem of membership inference, i.e., we aim at determining if a specific image was used to train a model, given only the (image, label) pair and the model parameters. This question is important to protect both the privacy and intellectual property associated with data. For neural networks, the privacy issue was recently considered by Yeom et al. (2018) for the MNIST and CIFAR datasets. The authors evidence the close relationship between overfitting and privacy of training images. This is reminiscent of prior membership inference attacks, which employ the output of the classifier associated with a particular sample to determine whether it was used during training or not (Shokri et al., 2017).

At this stage, it is worth defining the different levels of information to which the "attacker", i.e., the membership inference system, has access. We assume that the attacker knows the data distribution and the specifications of the model (training procedure, architecture of the network, etc.), even though they are not necessarily required for all methods. We refer to the white-box setting as the case where the attacker knows all the network parameters. On a side note, the setup commonly adopted in differential privacy (Dwork et al., 2006) corresponds to the white-box setting, where the attacker additionally knows all the training samples except the one to be tested.

The black-box setting is when these parameters are unknown. For classification models, the attacker has only access to the output for a given input, in one of the following forms: (i) the classifier decision; (ii) the loss of the correct label; (iii) the full response for all classes. Prior works on membership inference commonly assume (i) or (iii). Our paper focuses on the black-box case (ii), in which we know the loss incurred by the correct label. The state of the art in this setting are the shadow models proposed by Shokri et al. (2017).

In our work, we use a probabilistic framework to derive a formal analysis of the optimal attack. This framework encompasses both Bayesian learning and noisy training, where the noise is injected (Welling & Teh, 2011) or comes from the stochasticity of SGD. Under mild assumptions on the distribution of the parameters, we derive the optimal membership inference strategy. This strategy only depends on the classifier through evaluation of the loss, thereby showing that black-box attacks will perform as well as white-box attacks in this optimal asymptotic setting. This result may explain why, to the best of our knowledge, the literature does not report white-box attacks outperforming the state-of-the-art black-box-(ii) attacks.

The aforementioned optimal strategy is not tractable, therefore we introduce approximations to derive an explicit method for membership inference. As a byproduct of this derivation, we show that state-of-the-art approaches (Shokri et al., 2017; Yeom et al., 2018) are coarser approximations of the optimal strategy. One of the approximations drastically simplifies the membership inference procedure by simply relying on the loss and a calibration term. We apply this strategy to the more complex case of neural networks, and show that it outperforms all approaches we are aware of.

In summary, our main contributions are as follows:

• We show, under a few assumptions on training, that the optimal inference only depends on the loss function, and not on the parameters of the classifier. In other terms, white-box attacks don't provide any additional information and result in the same optimal strategy.
• We employ different approximations to derive three explicit membership attack strategies. We show that state-of-the-art methods constitute other approximations. Simple simulations show the superiority of our approach on a simple regression problem.
• We apply a simplified, tractable, strategy to infer the membership of images to the train set in the case of the public image classification benchmarks CIFAR and Imagenet. It outperforms the state of the art for membership inference, namely the shadow models.

The paper is organized as follows. Section 2 reviews related work. Section 3 introduces our probabilistic formulation and derives our main theoretical result. This section also discusses the connection between membership inference and differential privacy. Section 4 considers approximations for practical use-cases, which allow us to derive inference strategies in closed form, some of which are connected with existing methods from the literature. Section 5 summarizes the practical algorithms derived from our analysis. Finally, Section 6 considers the more difficult case of membership inference for real-life neural networks and datasets.

2. Related work

Leakage of information from the training set. Our work is related to the topics of overfitting and memorization capabilities of classifiers. Determining what neural networks actually memorize from their training set is not trivial. A few recent works (Zhang et al., 2017; Yeom et al., 2018) evaluate how a network can fit random labels. Zhang et al. (2017) replace true labels by random labels and show that popular neural nets can perfectly fit them in simple cases, such as small datasets (CIFAR10) or Imagenet without data augmentation. Krueger et al. (2017) extend their analysis and argue in particular that the effective capacity of neural nets depends on the dataset considered. In a privacy context, Yeom et al. (2018) exploit this memorizing property to watermark networks. As a side note, random labeling and data augmentation have been used for the purpose of training a network without any annotated data (Dosovitskiy et al., 2014; Bojanowski & Joulin, 2017).

In the context of differential privacy (Dwork et al., 2006), recent works (Wang et al., 2016; Bassily et al., 2016) suggest that guaranteeing privacy requires learning systems to generalize well, i.e., to not overfit. Wang et al. (2015) show that Bayesian posterior sampling offers differential privacy guarantees. Abadi et al. (2016) introduce noisy SGD to learn deep models with differential privacy.

Membership inference. A few recent works (Hayes et al., 2017; Shokri et al., 2017; Long et al., 2018) have addressed membership inference. Yeom et al. (2018) propose a series of membership attacks and derive their performance. Long et al. (2018) observe that some training images are more vulnerable than others and propose a strategy to identify them. Hayes et al. (2017) analyze privacy issues arising in generative models. Dwork et al. (2015) and Sankararaman et al. (2009) provide optimal strategies for membership inference in genomics data.

Shadow models. Shadow models were introduced by Shokri et al. (2017) in the context of black-box attacks. In this setup, an attacker has black-box-(iii) access (full response for all classes) to a model trained on a private dataset, and to a public dataset that follows the same distribution as the private dataset. The attacker wishes to perform membership inference using black-box outputs of the private model. For this, the attacker simulates models by training shadow models on known splits from the public set. On these simulated models, the attacker can analyze the output patterns corresponding to samples from the training set and from a held-out set. Shokri et al. (2017) propose to train an attack model that learns to predict, given an output pattern, whether it corresponds to a training or held-out sample. If the attack model simply predicts "training" when the output activations fire on the correct class, this strategy is equivalent to Yeom et al. (2018)'s adversary. Salem et al. (2019) further show that shadow models work under weaker assumptions than those of Shokri et al. (2017).

3. Membership inference model

In this section, we derive the Bayes optimal performance for membership inference (Theorem 1). We then make the connection with differential privacy and propose looser guarantees that prevent membership inference.

3.1. Posterior distribution of parameters

Let D be a data distribution, from which we sample n ∈ ℕ points z_1, z_2, ..., z_n i.i.d. ∼ D. A machine learning algorithm produces parameters θ that incur a low loss Σ_{i=1}^n ℓ(θ, z_i). Typically in the case of classification, z = (x, y) where x is an input sample and y a class label, and the loss function ℓ is high on samples (x, y') for which y' ≠ y.

We assume that the machine learning algorithm has some randomness, and we model it with a posterior distribution over parameters θ | z_1, ..., z_n. The randomness in θ either comes from the training procedure (e.g., Bayesian posterior sampling), or arises naturally, as is the case with Stochastic Gradient methods.

In general, we assume that the posterior distribution follows:

    P(\theta \mid z_1, \ldots, z_n) \propto e^{-\frac{1}{T} \sum_{i=1}^{n} \ell(\theta, z_i)},    (1)

where T is a temperature parameter, intuitively controlling the stochasticity of θ. T = 1 corresponds to the case of the Bayesian posterior, T → 0 to the case of MAP (Maximum A Posteriori) inference, and a small T to the case of averaged SGD (Polyak & Juditsky, 1992). Note that we do not include a prior on θ: we assume that the prior is either uniform on a bounded θ, or that it has been factored in the loss term ℓ.

3.2. Membership inference

Given θ produced by such a machine learning algorithm, membership inference asks the following question: what information does θ contain about its training set z_1, ..., z_n?

Formally, we assume that binary membership variables m_1, m_2, ..., m_n are drawn independently, with probability λ = P(m_i = 1). The samples for which m_i = 0 are the test set, while the samples for which m_i = 1 are the training set. Equation (1) becomes

    P(\theta \mid z_1, \ldots, z_n, m_1, \ldots, m_n) \propto e^{-\frac{1}{T} \sum_{i=1}^{n} m_i \ell(\theta, z_i)}.    (2)

Taking the case of z_1 without loss of generality, membership inference determines, given parameters θ and sample z_1, whether m_1 = 1 or m_1 = 0.

Definition 1 (Membership inference). Inferring the membership of sample z_1 to the training set amounts to computing:

    M(\theta, z_1) := P(m_1 = 1 \mid \theta, z_1).    (3)

Notation. We denote by σ the sigmoid function σ(u) = (1 + e^{-u})^{-1}. We collect the knowledge about the other samples and their memberships into the set T = {z_2, ..., z_n, m_2, ..., m_n}.

3.3. Optimal membership inference

In Theorem 1, we derive the explicit formula for M(θ, z_1).

Theorem 1. Given a parameter θ and a sample z_1, the optimal membership inference is given by:

    M(\theta, z_1) = \mathbb{E}_{\mathcal{T}} \left[ \sigma\left( \log\left( \frac{P(\theta \mid m_1 = 1, z_1, \mathcal{T})}{P(\theta \mid m_1 = 0, z_1, \mathcal{T})} \right) + t_\lambda \right) \right],    (4)

with t_λ = log(λ / (1 − λ)).

Proof. By the law of total expectation, we have:

    M(\theta, z_1) = P(m_1 = 1 \mid \theta, z_1)    (5)
                   = \mathbb{E}_{\mathcal{T}} \left[ P(m_1 = 1 \mid \theta, z_1, \mathcal{T}) \right].    (6)

Applying Bayes' formula:

    P(m_1 = 1 \mid \theta, z_1, \mathcal{T}) = \frac{P(\theta \mid m_1 = 1, z_1, \mathcal{T}) \, P(m_1 = 1)}{P(\theta \mid z_1, \mathcal{T})}    (7)
                                            = \frac{\alpha}{\alpha + \beta} = \sigma\left( \log\left( \frac{\alpha}{\beta} \right) \right),    (8)

where:

    \alpha := P(\theta \mid m_1 = 1, z_1, \mathcal{T}) \, P(m_1 = 1),    (9)
    \beta  := P(\theta \mid m_1 = 0, z_1, \mathcal{T}) \, P(m_1 = 0).    (10)

Given that P(m_1 = 1) = λ,

    \log\left( \frac{\alpha}{\beta} \right) = \log\left( \frac{P(\theta \mid m_1 = 1, z_1, \mathcal{T})}{P(\theta \mid m_1 = 0, z_1, \mathcal{T})} \right) + \log\left( \frac{\lambda}{1 - \lambda} \right),    (11)

which gives the expression for M(θ, z_1). ∎

Note that Theorem 1 only relies on the fact that θ given {z_1, ..., z_n, m_1, ..., m_n} is a random variable, but it does not make any assumption on the form of the distribution. In particular the loss ℓ does not appear in the expression. Theorem 2 uses the assumption in Equation (2) to make M(θ, z_1) more explicit; we give its formal expression below, prove it, and analyze the expression qualitatively. Let us first define the posterior over the parameters given samples z_2, ..., z_n and memberships m_2, ..., m_n:

    p_{\mathcal{T}}(\theta) := \frac{e^{-\frac{1}{T} \sum_{i=2}^{n} m_i \ell(\theta, z_i)}}{\int_t e^{-\frac{1}{T} \sum_{i=2}^{n} m_i \ell(t, z_i)} \, dt}.    (12)

Theorem 2. Given a parameter θ and a sample z_1, the optimal membership inference is given by:

    M(\theta, z_1) = \mathbb{E}_{\mathcal{T}} \left[ \sigma\left( s(z_1, \theta, p_{\mathcal{T}}) + t_\lambda \right) \right],    (13)

where we define the following score:

    \tau_p(z_1) := -T \log\left( \int_t e^{-\frac{1}{T} \ell(t, z_1)} \, p(t) \, dt \right),    (14)
    s(z_1, \theta, p) := \frac{1}{T} \left( \tau_p(z_1) - \ell(\theta, z_1) \right).    (15)

Proof. Singling out m_1 in Equation (2) yields the following expressions for α and β:

    \alpha = \lambda \, \frac{e^{-\frac{1}{T} \ell(\theta, z_1)} \, e^{-\frac{1}{T} \sum_{i=2}^{n} m_i \ell(\theta, z_i)}}{\int_t e^{-\frac{1}{T} \ell(t, z_1)} \, e^{-\frac{1}{T} \sum_{i=2}^{n} m_i \ell(t, z_i)} \, dt}    (16)
           = \lambda \, \frac{e^{-\frac{1}{T} \ell(\theta, z_1)} \, p_{\mathcal{T}}(\theta)}{\int_t e^{-\frac{1}{T} \ell(t, z_1)} \, p_{\mathcal{T}}(t) \, dt},    (17)

and

    \beta = (1 - \lambda) \, \frac{e^{-\frac{1}{T} \sum_{i=2}^{n} m_i \ell(\theta, z_i)}}{\int_t e^{-\frac{1}{T} \sum_{i=2}^{n} m_i \ell(t, z_i)} \, dt} = (1 - \lambda) \, p_{\mathcal{T}}(\theta).    (18)

Thus,

    \log\left( \frac{\alpha}{\beta} \right) = -\frac{1}{T} \ell(\theta, z_1) - \log\left( \int_t e^{-\frac{1}{T} \ell(t, z_1)} \, p_{\mathcal{T}}(t) \, dt \right) + t_\lambda = s(z_1, \theta, p_{\mathcal{T}}) + t_\lambda.    (19)

Then, Equation (8) yields the expected result. ∎

The first consequence of Theorem 2 is that M(θ, z_1) does not depend on the parameters θ beyond the evaluation of the loss ℓ(θ, ·): this strategy does not require, for instance, internal parameters of the model that a white-box attack could provide. This means that if we can compute τ_p or approximate it well enough, then the optimal membership inference depends only on the loss. In other terms, asymptotically, the white-box setting does not provide any benefit compared to black-box membership inference.

Let us analyze qualitatively the terms in the expression. Since T is a training set, p_T corresponds to a posterior over this training set, i.e., a typical distribution of trained parameters. τ_p(z_1) is the softmin of loss terms ℓ(·, z_1) over these typical models, and corresponds therefore to the typical loss of sample z_1 under models that have not seen z_1.

The quantity τ_p(z_1) can be seen as a threshold, to which the loss ℓ(θ, z_1) is compared. Around this threshold, when ℓ(θ, z_1) ≈ τ_p(z_1), then s ≈ 0: since σ(t_λ) = λ, the membership posterior probability M(θ, z_1) is equal to λ, and thus we have no information on m_1 beyond prior knowledge. As the loss ℓ(·, z_1) gets lower than this threshold, s becomes positive. Since σ is non-decreasing, when s(z_1, θ, p_T) > 0, M(θ, z_1) > λ and thus we gain non-trivial membership information on z_1.

Another consequence of Theorem 2 is that a higher temperature T decreases s, and thus decreases P(m_1 = 1 | θ, z_1): it corresponds to the intuition that more randomness in θ protects the privacy of training data.
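To make the score of Equations (14)-(15) concrete, the following minimal sketch (ours, not part of the original method) estimates τ_p(z_1) by a Monte-Carlo soft-min over reference models trained without z_1, and turns the score into a membership posterior as in Theorem 2. The function names are illustrative, and the gamma-distributed reference losses stand in for losses that would, in practice, come from re-trained models.

```python
import numpy as np

def tau_softmin(ref_losses, T=1.0):
    """Monte-Carlo estimate of tau_p(z1) = -T log E_p[exp(-l(t, z1)/T)] (Eq. 14),
    from the losses l(t_k, z1) of K models t_k sampled from p (trained without z1)."""
    a = -np.asarray(ref_losses, dtype=np.float64) / T
    m = a.max()
    log_mean_exp = m + np.log(np.mean(np.exp(a - m)))   # numerically stable log-mean-exp
    return -T * log_mean_exp

def membership_posterior(loss_on_z1, ref_losses, T=1.0, lam=0.5):
    """sigma(s(z1, theta, p) + t_lambda), with s from Eq. (15)."""
    s = (tau_softmin(ref_losses, T) - loss_on_z1) / T
    t_lam = np.log(lam / (1.0 - lam))
    return 1.0 / (1.0 + np.exp(-(s + t_lam)))

# Illustrative values: 20 reference models that never saw z1, and a released
# model whose loss on z1 is much lower than typical.
ref = np.random.gamma(shape=2.0, scale=0.5, size=20)
print(membership_posterior(loss_on_z1=0.05, ref_losses=ref))
```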

3.4. Differential privacy and guarantees

In this subsection we make the link with differential privacy. Differential privacy (Dwork et al., 2006) is a framework that allows to learn model parameters θ while maintaining the confidentiality of data. It ensures that even if a malicious attacker knows the parameters θ and the samples z_i, i ≥ 2, for which m_i = 1, the privacy of z_1 is not compromised.

Definition 2 (ε-differential privacy). A machine learning algorithm is ε-differentially private if, for any choice of z_1 and T,

    \left| \log\left( \frac{P(\theta \mid m_1 = 1, z_1, \mathcal{T})}{P(\theta \mid m_1 = 0, z_1, \mathcal{T})} \right) \right| < \varepsilon.    (20)

Note that this definition is slightly different from the one of Dwork et al. (2006) in that we consider the removal of z_1 rather than its substitution with another sample z'. Additionally we consider probability densities instead of probabilities of sets, without loss of generality.

Property 1 (ε-differential privacy). If the training is ε-differentially private, then:

    P(m_1 = 1 \mid \theta, z_1) \leq \lambda + \frac{\varepsilon}{4}.    (21)

Proof. Combining Equation (20) and the fact that σ(u) ≤ σ(v) + max(u − v, 0)/4 (Appendix A.3), we have:

    \sigma\left( \log\left( \frac{P(\theta \mid m_1 = 1, z_1, \mathcal{T})}{P(\theta \mid m_1 = 0, z_1, \mathcal{T})} \right) + t_\lambda \right) \leq \sigma(t_\lambda) + \frac{\varepsilon}{4} = \lambda + \frac{\varepsilon}{4}.    (22)

Combining this expression with Theorem 1 yields the result. ∎

Note that this bound gives a tangible sense of ε. In general, decreasing ε increases privacy, but there is no consensus over "good" values of ε; this bound indicates for instance that ε = 0.01 would be sufficient for membership privacy.

ε-differential privacy gives strong membership inference guarantees, at the expense of a constrained training procedure resulting generally in a loss of accuracy (Abadi et al., 2016). However, if we assume that the attacker knows the z_i, i ≥ 2, for which m_i = 1, ε-differential privacy is required to protect the privacy of z_1. Depending on the information we have on the z_i, i ≥ 2, there is a continuum between differential privacy (all z_i's are known) and membership inference (only prior knowledge on z_i). In the case of membership inference, it suffices to have the following guarantee:

Definition 3 ((ε, δ)-membership privacy). The training is (ε, δ)-membership private for some ε > 0, δ > 0 if, with probability 1 − δ over the choice of T:

    \int_t \ell(t, z_1) \, p_{\mathcal{T}}(t) \, dt - \ell(\theta, z_1) \leq \varepsilon.    (23)

Property 2. If the training is (ε, δ)-membership private, then:

    P(m_1 = 1 \mid \theta, z_1) \leq \lambda + \frac{\varepsilon}{4T} + \delta.    (24)

Proof. Jensen's inequality states that for any distribution p and any function f:

    \int_t f(t) \, p(t) \, dt \leq \log\left( \int_t e^{f(t)} \, p(t) \, dt \right),    (25)

hence the score from Equation (15) verifies:

    s(z_1, \theta, p) \leq \frac{1}{T} \left( \int_t \ell(t, z_1) \, p(t) \, dt - \ell(\theta, z_1) \right).    (26)

Thus, distinguishing in the expectation of Equation (13) the event where Equation (23) holds (probability at least 1 − δ) from its complement,

    P(m_1 = 1 \mid \theta, z_1) \leq \delta + (1 - \delta)\left( \lambda + \frac{\varepsilon}{4T} \right)    (27)
                                \leq \lambda + \frac{\varepsilon}{4T} + \delta,    (28)

which gives the desired bound. ∎

Membership privacy provides a post-hoc guarantee on θ, z_1. Guarantees in the form of Equation (23) can be obtained by PAC (Probably Approximately Correct) bounds.

4. Approximations for membership inference

Estimating the probability of Equation (15) mainly requires to compute the term τ_p. Since its expression is intractable, we use approximations to derive concrete membership attacks (MA). We now detail these approximations, referred to as MAST (MA Sample Threshold), MALT (MA Loss Threshold) and MATT (MA Taylor Threshold).

4.1. MAST: Approximation of τ(z_1)

We first make the mean-field assumption that p_T(t) does not depend on T (we note it p), and define

    \tau(z_1) := -T \log\left( \int_t e^{-\frac{1}{T} \ell(t, z_1)} \, p(t) \, dt \right).    (29)

The quantity τ(·) is a "calibrating" term that reflects the difficulty of a sample. Intuitively, a low τ(z_1) means that the sample z_1 is easy to predict, and thus a low value of ℓ(θ, z_1) does not necessarily indicate that z_1 belongs to the train set. Thus, Theorem 2 gives the optimal attack model:

    s_{\text{MAST}}(\theta, z_1) = -\ell(\theta, z_1) + \tau(z_1).    (30)

4.2. MALT: Constant τ

If we further assume that τ(·) is constant, the optimal strategy reduces to predicting that z_1 comes from the training set if its loss ℓ(θ, z_1) is lower than this threshold τ, and from the test set otherwise:

    s_{\text{MALT}}(\theta, z_1) = -\ell(\theta, z_1) + \tau.    (31)

A similar strategy is proposed by Yeom et al. (2018) for Gaussian models. Carlini et al. (2018) estimate a secret token in text datasets by comparing probabilities of the sentence "My SSN is X" with various values of X. Surprisingly, to the best of our knowledge, it has not been proposed in the literature to estimate the threshold τ on public data, and to apply it for membership inference. As we show in Section 6, this simple strategy yields better results than shadow models (Shokri et al., 2017). Their attack models take as input the softmax activation φ_θ(x) and the class label y, and predict whether the sample comes from the training or test set. For classification models, ℓ(θ, (x, y)) = −log(φ_θ(x)_y). Hence the optimal attack performs:

    s_{\text{MALT}}(\theta, (x, y)) = \log\left( \phi_\theta(x)_y \right) + \tau.    (32)

We argue that the attack model of Shokri et al. (2017) essentially performs such an estimation, albeit in a non-explicit way. In particular, we believe that the gap between Shokri et al. (2017)'s method and ours is due to instabilities in the estimation of τ and the numerical computation of the log, as the model is given only φ_θ(x). As a side note, the expectation over T = {z_2, ..., z_n, m_2, ..., m_n} is very similar in spirit to the shadow models, and they can be viewed as a Monte-Carlo estimation of this quantity.
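As a minimal illustration of Equation (32) (ours, with hypothetical values), the sketch below computes the MALT score directly from a classifier's logits; computing the log-softmax from logits, rather than taking the log of returned probabilities, sidesteps the numerical issues mentioned above.

```python
import numpy as np

def malt_score(logits, y, tau):
    """MALT score of Eq. (32): log softmax probability of the correct class y, plus tau.
    Working from logits keeps the log numerically stable."""
    logits = np.asarray(logits, dtype=np.float64)
    log_z = logits.max() + np.log(np.exp(logits - logits.max()).sum())  # log-sum-exp
    return (logits[y] - log_z) + tau

# Hypothetical usage: predict "training member" when the score is positive.
print(malt_score(logits=[2.1, -0.3, 0.7], y=0, tau=0.5) > 0)
```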

An experiment with Gaussian data. We illustrate the difference between MALT (global τ) and MAST (per-sample τ(·)) on a simple toy example. Let us assume we estimate the mean μ of Gaussian data with unit variance. We sample n values z_1, ..., z_n from D = N(μ, I). The estimate of the mean is θ = (1/n') Σ_{i=1}^{n} m_i z_i where n' = |{i | m_i = 1}|. We have (see Appendix A.2 for derivations):

    \ell(\theta, z_i) := \frac{1}{2} \| z_i - \theta \|^2,    (33)
    \tau(z_i) = \frac{n'}{2(n' + 1)} \| z_i - \mu \|^2,    (34)
    \tau = \mathbb{E}\left[ \frac{n'}{2(n' + 1)} \| z - \mu \|^2 \right] = \frac{n'}{2(n' + 1)} \, d.    (35)

The expression of τ(z_i) shows that the "difficulty" of sample z_i is its distance to μ, i.e., how untypical this sample is. Figure 1 shows the results with a global τ or a per-sample τ: the per-sample τ better separates the two distributions, leading to an increased membership inference accuracy.

Figure 1. Comparison of MALT and MAST for membership inference on the mean estimator for Gaussian data (n = 100 samples in 2000 dimensions). Distribution of scores s used to distinguish between samples seen or not at training. MALT: a single threshold is used for all the dataset; MAST: each sample gets assigned a different threshold. MAST better separates training and non-training samples.
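This toy experiment can be reproduced in a few lines. The sketch below (our illustration, with the setting of Figure 1 assumed: n = 100, d = 2000, λ = 1/2) draws the data, evaluates the closed forms (33)-(35), and compares MALT and MAST with a threshold-free ranking metric instead of the score histograms of Figure 1.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, lam = 100, 2000, 0.5             # setting of Figure 1
mu = np.zeros(d)                        # true mean, known to the attacker here
z = rng.normal(loc=mu, scale=1.0, size=(n, d))
m = rng.random(n) < lam                 # membership variables m_i
n_prime = int(m.sum())
theta = z[m].mean(axis=0)               # mean estimated on the training split

loss = 0.5 * ((z - theta) ** 2).sum(axis=1)                           # Eq. (33)
tau_i = n_prime / (2 * (n_prime + 1)) * ((z - mu) ** 2).sum(axis=1)   # Eq. (34)
tau = n_prime / (2 * (n_prime + 1)) * d                                # Eq. (35)

s_malt = -loss + tau                    # one global threshold
s_mast = -loss + tau_i                  # per-sample calibration

def auc(scores, members):
    """Probability that a random training sample outscores a random held-out one."""
    pos, neg = scores[members], scores[~members]
    return float((pos[:, None] > neg[None, :]).mean())

print("MALT AUC:", auc(s_malt, m))
print("MAST AUC:", auc(s_mast, m))
```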

MATT: Estimation with Taylor expansion. We assume that the posterior induced by the loss L(θ) = Σ_{i=1}^{n} m_i ℓ(θ, z_i) is a Gaussian centered on θ*, the minimum of the loss, with covariance C. This corresponds to the Laplace approximation of the posterior distribution. The inverse covariance matrix C^{-1} is asymptotically n times the Fisher matrix (van der Vaart, 1998), which itself is the Hessian H of the loss (Kullback, 1997):

    P(\theta \mid z_1, \mathcal{T}) = \frac{1}{\sqrt{\det\left( 2\pi H^{-1} \right)}} \, e^{-\frac{1}{2} (\theta - \theta^*)^T H (\theta - \theta^*)}.    (36)

We denote by θ_0* (resp. θ_1*) the mode of the Gaussian corresponding to {z_2, ..., z_n} (resp. {z_1, ..., z_n}), and H_0 (resp. H_1) the corresponding Hessian matrix. We assume that H is not impacted by removing z_1 from the training set, and thus H := H_0 ≈ H_1 (cf. Appendix A.4 for a more precise justification). The log-ratio is therefore

    \log\left( \frac{P(\theta \mid m_1 = 1, z_1, \mathcal{T})}{P(\theta \mid m_1 = 0, z_1, \mathcal{T})} \right)
    = -\frac{1}{2} (\theta - \theta_1^*)^T H (\theta - \theta_1^*) + \frac{1}{2} (\theta - \theta_0^*)^T H (\theta - \theta_0^*)
    = (\theta - \theta_0^*)^T H (\theta_1^* - \theta_0^*) - \frac{1}{2} (\theta_1^* - \theta_0^*)^T H (\theta_1^* - \theta_0^*).    (37)

The difference θ_1* − θ_0* can be estimated using a Taylor expansion of the loss gradient around θ_0* (see e.g. Koh & Liang (2017)):

    \theta_1^* - \theta_0^* \approx -H^{-1} \nabla_\theta \ell(\theta_0^*, z_1).    (38)

Combining this with Equation (37) leads to

    -(\theta - \theta_0^*)^T \nabla_\theta \ell(\theta_0^*, z_1) - \frac{1}{2} \nabla_\theta \ell(\theta_0^*, z_1)^T H^{-1} \nabla_\theta \ell(\theta_0^*, z_1).    (39)

We study the asymptotic behavior of this expression when n → ∞. In the first term, the parameters θ and θ_0* are estimates of the optimal θ*, and under mild conditions, the error of the estimated parameters is of order 1/√n; therefore the difference θ − θ_0* is of order 1/√n. In the second term, the matrix H is the summation of n sample-wise Hessian matrices. Therefore, asymptotically, the second term shrinks at a rate 1/n, which is negligible compared to the first, which shrinks at 1/√n. In addition to the asymptotic reasoning, we verified this approximation experimentally. Thus, we approximate Equation (39) to give the following score:

    s_{\text{MATT}}(\theta, z_1) = -(\theta - \theta_0^*)^T \nabla_\theta \ell(\theta_0^*, z_1).    (40)

Equation (40) has an intuitive interpretation: parameters θ were trained using z_1 if their difference with a set of parameters trained without z_1 (i.e. θ_0*) is aligned with the direction of the update −∇_θ ℓ(θ_0*, z_1).

5. Membership inference algorithms

In this section, we detail how the approximations of s(θ, z_1, p) are employed to perform membership inference. We assume that a machine learning model has been trained, yielding parameters θ. We assume also that similar models can be re-trained with different training sets. Given a sample z_1, we want to decide whether z_1 belongs to the training set.

5.1. The 0-1 baseline

We consider as a baseline the "0-1" heuristic, which predicts that z_1 comes from the training set if the class is predicted correctly, and from the test set if the class is predicted incorrectly. We note p_train (resp. p_test) the classification accuracy on the training (resp. held-out) set. The accuracy of the heuristic is (see Appendix A.1 for derivations):

    p_{\text{bayes}} = \lambda \, p_{\text{train}} + (1 - \lambda)(1 - p_{\text{test}}).    (41)

For example when λ = 1/2, since p_train ≥ p_test this heuristic is better than random guessing (accuracy 1/2) and the improvement is proportional to the overfitting gap p_train − p_test.
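A quick sanity check of Equation (41) against the first row of Table 1 (n = 400), assuming λ = 1/2:

```python
# Eq. (41) with p_train = 0.979, p_test = 0.938 and lambda = 1/2,
# the values reported in Table 1 for n = 400.
lam, p_train, p_test = 0.5, 0.979, 0.938
p_bayes = lam * p_train + (1 - lam) * (1 - p_test)
print(p_bayes)   # 0.5205, i.e. the ~52.1% reported for the 0-1 attack
```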
Table 1. Accuracy (top) and mAP (bottom) of membership inference on the 2-class logistic regression with simple CNN features, for different types of attacks. Note that 0-1 corresponds to the baseline (Yeom et al., 2018). We do not report Shokri et al. (2017) since Table 2 shows MALT performs better. Results are averaged over 100 different random seeds.

            Model accuracy         Attack accuracy
    n       train    validation    0-1     MALT    MATT
    400     97.9     93.8          52.1    54.4    57.0
    1000    97.3     94.5          51.4    52.6    54.5
    2000    96.8     95.2          50.8    51.7    53.0
    4000    97.7     95.6          51.0    51.4    52.1
    6000    97.5     96.0          50.7    51.0    51.8

            mAPtest                mAPtrain
    n       MALT     MATT          MALT    MATT
    400     55.8     60.1          51.9    57.1
    1000    53.2     56.6          50.5    54.8
    2000    51.8     54.4          50.4    53.4
    4000    51.9     53.7          50.1    52.6
    6000    51.4     53.0          50.2    52.2

5.2. Making hard decisions from scores

Variants of our method provide different estimates of s(θ, z_1, p). Theorem 2 shows that this score has to be passed through a sigmoid function, but since it is an increasing function, the threshold can be chosen directly on these scores. Estimation of this threshold has to be conducted on simulated sets, for which membership information is known. We observed that there is almost no difference between choosing the threshold on the set to be tested and cross-validating it. This is expected, as a one-dimensional parameter (the threshold) is not prone to overfitting.
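A minimal sketch of this threshold selection (ours), assuming attack scores have already been computed on a simulated set with known membership; the names and values are illustrative.

```python
import numpy as np

def choose_threshold(scores, is_member):
    """Return the threshold maximizing attack accuracy on a simulated set
    where membership is known (the estimation described in Section 5.2)."""
    scores = np.asarray(scores, dtype=np.float64)
    is_member = np.asarray(is_member, dtype=bool)
    best_t, best_acc = None, -1.0
    for t in np.unique(scores):
        acc = ((scores > t) == is_member).mean()
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc

# Illustrative scores (e.g. MALT scores on shadow models with known splits).
sim_scores = [-0.2, 0.1, 0.4, -0.5, 0.3, 0.0]
sim_member = [0, 1, 1, 0, 1, 0]
print(choose_threshold(sim_scores, sim_member))
```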

5.3. Membership algorithms

MALT: Threshold on the loss. Since τ in Equation (31) is constant, and using the invariance to increasing functions, we need only to use the loss value for the sample, ℓ(θ, z_1).

MAST: Estimating τ(z_1). To estimate τ(z_1) in Equation (29), we train several models with different subsamples of the training set. This yields a set of per-sample losses for z_1 that are averaged into an estimate of τ(z_1).

MATT: the Taylor approximation. We run the training on a separate set to obtain θ_0*. Then we take a gradient step over the loss to estimate the approximation in Equation (40). Note that this strategy is not compatible with neural networks because the assumption that parameters lie around a unique global minimum does not hold. In addition, parameters from two different networks θ and θ_0* cannot be compared directly as neural networks that express the same function can have very different parameters (e.g. because channels can be permuted arbitrarily).

6. Experiments

In this section we evaluate the membership inference methods on machine-learning tasks of increasing complexity.

6.1. Evaluation

We evaluate three metrics: the accuracy of the attack, and the mean average precision when detecting either from train (mAPtrain) or test (mAPtest) images. For the mean average precision, the scores need not be thresholded: the metric is invariant to mapping by any increasing function.

6.2. Logistic regression

CIFAR-10 is a dataset of 32×32 pixel images grouped in 10 classes. In this subsection, we consider two of the classes (truck and boat) and vary the number of training images from n = 400 to 6,000. We train a logistic regression to separate the two classes. The logistic regression takes as input features extracted from a Resnet18 pretrained on CIFAR-100 (a disjoint dataset). The regularization parameter C of the logistic regression is cross-validated on held-out data.

We assume that λ = 1/2 (n/2 training images and n/2 test images). We also reserve n/2 images to estimate θ_0* for the MATT method. In both experiments, we report the peak accuracy obtained for the best threshold (cf. Section 5.2).

Table 1 shows the results of our experiments, in terms of accuracy and mean average precision. In accuracy, the Taylor expansion method MATT outperforms the MALT method for any number of training instances n, which itself obtains much better results than the naive 0-1 attack.

Interestingly, it shows a difference between MALT and MATT: both perform similarly in terms of mAPtest, but MATT slightly outperforms MALT in mAPtrain. The main reason for this difference is that the MALT attack is asymmetric: it is relatively easy to predict that elements come from the test set, as they have a high loss, but elements with a low loss can come either from the train set or the test set.
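As an illustration of the MATT recipe of Section 5.3 in the logistic-regression setting above, the sketch below (ours) computes the score of Equation (40) for a binary logistic model; θ_0* would in practice be obtained by training on the reserved split, and is a random placeholder here only to keep the example runnable.

```python
import numpy as np

def logistic_grad(theta, x, y):
    """Gradient of l(theta, (x, y)) = log(1 + exp(-y theta^T x)), with y in {-1, +1}."""
    return -y * x / (1.0 + np.exp(y * np.dot(theta, x)))

def matt_score(theta, theta0, x, y):
    """MATT score of Eq. (40): -(theta - theta0)^T grad l(theta0, (x, y))."""
    return -np.dot(theta - theta0, logistic_grad(theta0, x, y))

# Illustrative usage: theta is the attacked model, theta0 a model trained without z1;
# a positive score points towards membership of (x, y).
rng = np.random.default_rng(0)
theta, theta0 = rng.normal(size=5), rng.normal(size=5)   # placeholders
x, y = rng.normal(size=5), 1
print(matt_score(theta, theta0, x, y))
```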

Table 2. Accuracy of membership attacks on the CIFAR-10 classification with a simple neural network. The numbers for the related works are from the respective papers.

    Method                                   Accuracy
    0-1 (Yeom et al., 2018)                  69.4
    Shadow models (Shokri et al., 2017)      73.9
    MALT                                     77.1
    MAST                                     77.6

Table 3. Imagenet classification with deep convolutional networks: accuracy of membership inference attacks on the models.

    Model        Augmentation      0-1     MALT
    Resnet101    None              76.3    90.4
                 Flip, Crop ±5     69.5    77.4
                 Flip, Crop        65.4    68.0
    VGG16        None              77.4    90.8
                 Flip, Crop ±5     71.3    79.5
                 Flip, Crop        63.8    64.3

6.3. Small convolutional network

In this section we train a small convolutional network¹ with the same architecture as Shokri et al. (2017) on the CIFAR-10 dataset, using a training set of 15,000 images. Our model is trained for 50 epochs with a learning rate of 0.001. We assume a balanced prior on membership (λ = 1/2).

We run the MALT and MAST attacks on the classifiers. As stated before, the MATT attack cannot be carried out on convolutional networks. For MAST, the threshold is estimated from 30 shadow models: we train these 30 shadow models on 15,000 images chosen among the train+held-out set (30,000 images). Thus, for each image, we have on average 15 models trained on it and 15 models not trained on it: we estimate the threshold for this image by taking the value τ(z) that best separates the two distributions; this corresponds to a non-parametric estimation of τ(z).

Table 2 shows that our estimations outperform the related works. Note that this setup gives a slight advantage to MAST as the threshold is estimated directly for each sample under investigation, whereas MALT first estimates a threshold, and then applies it to never-seen data. Yet, in contrast with the experiment on Gaussian data, MAST performs only slightly better than MALT. Our interpretation for this is that the images in the training set have a high variability, so it is difficult to obtain a good estimate of τ(z_1). Furthermore, our analysis of the estimated thresholds τ(z_1) shows that they are very concentrated around a central value τ, so their impact when added to the scores is limited. Therefore, in the following experiment we focus on the MALT attack.

¹ 2 convolutional and 2 linear layers with Tanh non-linearity.

6.4. Evaluation on Imagenet

We evaluate a real-world dataset and tackle classification with large neural networks on the Imagenet dataset (Deng et al., 2009; Russakovsky et al., 2015), which contains 1.2 million images partitioned into 1000 categories. We divide Imagenet equally into two splits, use one for training and hold out the rest of the data.

We experiment with the popular VGG-16 (Simonyan & Zisserman, 2014) and Resnet-101 (He et al., 2016) architectures. The model is learned in 90 epochs, with an initial learning rate of 0.01, divided by 10 every 30 epochs. Parameter optimization is conducted with SGD with a momentum of 0.9, a weight decay of 10⁻⁴, and a batch size of 256.

We conduct the membership inference test by running the 0-1 attack and MALT. An important factor for the success of the attacks is the amount of data augmentation. To assess the effect of data augmentation, we train different networks with varying data augmentation: None, Flip+Crop ±5, Flip+Crop (by increasing intensity).

Table 3 shows that data augmentation reduces the gap between the training and the held-out accuracy. This decreases the accuracy of the Bayes attack and the MALT attack. As we can see, without data augmentation, it is possible to guess with high accuracy if a given image was used to train a model (about 90% with our approach, against 77% for existing approaches). Stronger data augmentation reduces the accuracy of the attacks, which nonetheless remains above 64%.
7. Conclusion

This paper has addressed the problem of membership inference by adopting a probabilistic point of view. This led us to derive the optimal inference strategy. This strategy, while not explicit and therefore not applicable in practice, does not depend on the parameters of the classifier if we have access to the loss. Therefore, a main conclusion of this paper is to show that, asymptotically, white-box inference does not provide more information than an optimized black-box setting.

We then proposed two approximations that lead to three concrete strategies. They outperform competitive strategies for a simple logistic problem, by a large margin for our most sophisticated approach (MATT). Our simplest strategy (MALT) is applied to the more complex problem of membership inference from a deep convolutional network on Imagenet, and significantly outperforms the baseline.

References

Martin Abadi, Andy Chu, Ian Goodfellow, Brendan McMahan, Ilya Mironov, Kunal Talwar, and Li Zhang. Deep learning with differential privacy. In CCS, 2016.
Giuseppe Ateniese, Luigi V. Mancini, Angelo Spognardi, Antonio Villani, Domenico Vitali, and Giovanni Felici. Hacking smart machines with smarter ones: How to extract meaningful data from machine learning classifiers. IJSN, 2015.
Raef Bassily, Kobbi Nissim, Adam Smith, Thomas Steinke, Uri Stemmer, and Jonathan Ullman. Algorithmic stability for adaptive data analysis. In STOC, 2016.
Battista Biggio, Igino Corona, Blaine Nelson, Benjamin I. P. Rubinstein, Davide Maiorca, Giorgio Fumera, Giorgio Giacinto, and Fabio Roli. Security Evaluation of Support Vector Machines in Adversarial Environments. Springer International Publishing, 2014.
Piotr Bojanowski and Armand Joulin. Unsupervised learning by predicting noise. In ICML, 2017.
Nicholas Carlini, Chang Liu, Jernej Kos, Úlfar Erlingsson, and Dawn Song. The secret sharer: Measuring unintended neural network memorization & extracting secrets. arXiv preprint arXiv:1802.08232, 2018.
Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. In CVPR, 2009.
Alexey Dosovitskiy, Jost Tobias Springenberg, Martin Riedmiller, and Thomas Brox. Discriminative unsupervised feature learning with convolutional neural networks. In NIPS, 2014.
Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. Calibrating noise to sensitivity in private data analysis. In TCC, 2006.
Cynthia Dwork, Adam Smith, Thomas Steinke, Jonathan Ullman, and Salil Vadhan. Robust traceability from trace amounts. In Proceedings of the Symposium on the Foundations of Computer Science, 2015.
Jamie Hayes, Luca Melis, George Danezis, and Emiliano De Cristofaro. LOGAN: Evaluating privacy leakage of generative models using generative adversarial networks. arXiv preprint arXiv:1705.07663, 2017.
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In CVPR, 2016.
Pang Wei Koh and Percy Liang. Understanding black-box predictions via influence functions. In ICML, 2017.
David Krueger, Nicolas Ballas, Stanislaw Jastrzebski, Devansh Arpit, Maxinder S. Kanwal, Tegan Maharaj, Emmanuel Bengio, Asja Fischer, Aaron Courville, Simon Lacoste-Julien, and Yoshua Bengio. A closer look at memorization in deep networks. In ICML, 2017.
S. Kullback. Information Theory and Statistics. Dover Publications, 1997.
Yunhui Long, Vincent Bindschaedler, Lei Wang, Diyue Bu, Xiaofeng Wang, Haixu Tang, Carl A. Gunter, and Kai Chen. Understanding membership inferences on well-generalized learning models. arXiv preprint arXiv:1802.04889, 2018.
B. T. Polyak and A. B. Juditsky. Acceleration of stochastic approximation by averaging. SIAM J. Control Optim., 1992.
Benjamin I. P. Rubinstein, Peter L. Bartlett, Ling Huang, and Nina Taft. Learning in a large function space: Privacy-preserving mechanisms for SVM learning. arXiv:0911.5708, 2009.
Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, and Li Fei-Fei. ImageNet Large Scale Visual Recognition Challenge. IJCV, 2015.
Ahmed Salem, Yang Zhang, Mathias Humbert, Pascal Berrang, Mario Fritz, and Michael Backes. ML-Leaks: Model and data independent membership inference attacks and defenses on machine learning models. In NDSS, 2019.
Sriram Sankararaman, Guillaume Obozinski, Michael I. Jordan, and Eran Halperin. Genomic privacy and limits of individual detection in a pool. Nature Genetics, 2009.
Reza Shokri, Marco Stronati, and Vitaly Shmatikov. Membership inference attacks against machine learning models. In IEEE Symposium on Security and Privacy, 2017.
Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. In ICLR, 2014.
A. W. van der Vaart. Asymptotic Statistics. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, 1998.
Yu-Xiang Wang, Stephen Fienberg, and Alex Smola. Privacy for free: Posterior sampling and stochastic gradient Monte Carlo. In ICML, 2015.
Yu-Xiang Wang, Jing Lei, and Stephen E. Fienberg. On-average KL-privacy and its equivalence to generalization for max-entropy mechanisms. In PSD. Springer, 2016.
Max Welling and Yee Whye Teh. Bayesian learning via stochastic gradient Langevin dynamics. In ICML, 2011.
Samuel Yeom, Irene Giacomelli, Matt Fredrikson, and Somesh Jha. Privacy risk in machine learning: Analyzing the connection to overfitting. In CSF, 2018.
Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, and Oriol Vinyals. Understanding deep learning requires rethinking generalization. In ICLR, 2017.