White-box vs Black-box: Bayes Optimal Strategies for Membership Inference
Alexandre Sablayrolles 1 2   Matthijs Douze 2   Yann Ollivier 2   Cordelia Schmid 1   Hervé Jégou 2

1 University Grenoble Alpes, Inria, CNRS, Grenoble INP, LJK   2 Facebook AI Research. Correspondence to: Alexandre Sablayrolles <[email protected]>.

Proceedings of the 36th International Conference on Machine Learning, Long Beach, California, PMLR 97, 2019. Copyright 2019 by the author(s).

Abstract

Membership inference determines, given a sample and the trained parameters of a machine learning model, whether the sample was part of the training set. In this paper, we derive the optimal strategy for membership inference with a few assumptions on the distribution of the parameters. We show that optimal attacks only depend on the loss function, and thus black-box attacks are as good as white-box attacks. As the optimal strategy is not tractable, we provide approximations of it leading to several inference methods, and show that existing membership inference methods are coarser approximations of this optimal strategy. Our membership attacks outperform the state of the art in various settings, ranging from a simple logistic regression to more complex architectures and datasets, such as ResNet-101 and Imagenet.

1. Introduction

Ateniese et al. (2015) state that "it is unsafe to release trained classifiers since valuable information about the training set can be extracted from them". The problem that we address in this paper, i.e., determining whether a sample has been used to train a given model, is related to the privacy implications of machine learning systems. These were first discussed in the context of support vector machines (Rubinstein et al., 2009; Biggio et al., 2014). The problem of "unintended memorization" (Carlini et al., 2018) appears in most applications of machine learning, such as natural language processing systems (Carlini et al., 2018) or image classification (Yeom et al., 2018).

More specifically, we consider the problem of membership inference, i.e., we aim at determining if a specific image was used to train a model, given only the (image, label) pair and the model parameters. This question is important to protect both the privacy and intellectual property associated with data. For neural networks, the privacy issue was recently considered by Yeom et al. (2018) for the MNIST and CIFAR datasets. The authors evidence the close relationship between overfitting and privacy of training images. This is reminiscent of prior membership inference attacks, which employ the output of the classifier associated with a particular sample to determine whether it was used during training or not (Shokri et al., 2017).

At this stage, it is worth defining the different levels of information to which the "attacker", i.e., the membership inference system, has access. We assume that the attacker knows the data distribution and the specifications of the model (training procedure, architecture of the network, etc.), even though these are not necessarily required for all methods. We refer to the white-box setting as the case where the attacker knows all the network parameters. On a side note, the setup commonly adopted in differential privacy (Dwork et al., 2006) corresponds to the white-box setting, where the attacker additionally knows all the training samples except the one to be tested.

In the black-box setting, these parameters are unknown. For classification models, the attacker only has access to the output for a given input, in one of the following forms:
(i) the classifier decision;
(ii) the loss of the correct label;
(iii) the full response for all classes.
Prior works on membership inference commonly assume (i) or (iii). Our paper focuses on the black-box case (ii), in which we know the loss incurred by the correct label. The state of the art in this setting are the shadow models proposed by Shokri et al. (2017).
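To make these three access levels concrete, here is a minimal sketch of what each one exposes, assuming a PyTorch classifier `model` and a single (input, label) pair `(x, y)`; the function and variable names are ours, not the paper's.

```python
import torch
import torch.nn.functional as F

def black_box_outputs(model, x, y):
    """Return the three levels of black-box information for one (input, label) pair.

    x is a batch holding one input; y is a length-1 tensor with the correct class index.
    """
    with torch.no_grad():
        logits = model(x)                    # shape (1, num_classes)
    decision = logits.argmax(dim=1)          # (i)   the classifier decision
    loss = F.cross_entropy(logits, y)        # (ii)  the loss of the correct label
    response = F.softmax(logits, dim=1)      # (iii) the full response for all classes
    return decision, loss, response
```

Note that for a cross-entropy loss, setting (ii) is weaker than (iii): the loss is the negative log of the response for the correct class, so it can be read off the full response but not vice versa.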
In our work, we use a probabilistic framework to derive a formal analysis of the optimal attack. This framework encompasses both Bayesian learning and noisy training, where the noise is injected (Welling & Teh, 2011) or comes from the stochasticity of SGD. Under mild assumptions on the distribution of the parameters, we derive the optimal membership inference strategy. This strategy depends on the classifier only through the evaluation of the loss, thereby showing that black-box attacks will perform as well as white-box attacks in this optimal asymptotic setting. This result may explain why, to the best of our knowledge, the literature does not report white-box attacks outperforming the state-of-the-art black-box-(ii) attacks.

The aforementioned optimal strategy is not tractable; we therefore introduce approximations to derive an explicit method for membership inference. As a byproduct of this derivation, we show that state-of-the-art approaches (Shokri et al., 2017; Yeom et al., 2018) are coarser approximations of the optimal strategy. One of the approximations drastically simplifies the membership inference procedure by relying only on the loss and a calibration term. We apply this strategy to the more complex case of neural networks, and show that it outperforms all approaches we are aware of.
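To illustrate what such a loss-plus-calibration attack can look like, here is a minimal sketch. The particular estimator of the calibration term (the average loss of the sample under reference models that did not train on it) and all names are assumptions made for this example, not the exact procedure derived later in the paper.

```python
import numpy as np

def membership_score(sample_loss, calibration_losses):
    """Calibrated membership score for one sample.

    sample_loss: loss of the tested sample under the attacked model.
    calibration_losses: losses of the same sample under reference models that
        did not see it during training (a hypothetical helper input; the paper
        specifies how the calibration term is actually obtained).
    A large score means the loss is unusually low, hence the sample is likely a member.
    """
    calibration_term = float(np.mean(calibration_losses))
    return calibration_term - sample_loss

def predict_membership(sample_loss, calibration_losses, threshold=0.0):
    """Decide membership by thresholding the calibrated score."""
    return membership_score(sample_loss, calibration_losses) > threshold
```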
In summary, our main contributions are as follows:

• We show, under a few assumptions on training, that the optimal inference only depends on the loss function, and not on the parameters of the classifier. In other terms, white-box attacks do not provide any additional information and result in the same optimal strategy.

• We employ different approximations to derive three explicit membership attack strategies. We show that state-of-the-art methods constitute other approximations. Simple simulations show the superiority of our approach on a simple regression problem.

• We apply a simplified, tractable strategy to infer the membership of images to the train set in the case of the public image classification benchmarks CIFAR and Imagenet. It outperforms the state of the art for membership inference, namely the shadow models.

The paper is organized as follows. Section 2 reviews related work. Section 3 introduces our probabilistic formulation and derives our main theoretical result. This section also discusses the connection between membership inference and differential privacy. Section 4 considers approximations for practical use-cases, which allow us to derive inference strategies in closed form, some of which are connected with existing methods from the literature. Section 5 summarizes the practical algorithms derived from our analysis. Finally, Section 6 considers the more difficult case of membership inference for real-life neural networks and datasets.

2. Related work

Leakage of information from the training set. Our work is related to the topics of overfitting and memorization capabilities of classifiers. Determining what neural networks actually memorize from their training set is not trivial. A few recent works (Zhang et al., 2017; Yeom et al., 2018) evaluate how a network can fit random labels. Zhang et al. (2017) replace true labels by random labels and show that popular neural nets can perfectly fit them in simple cases, such as small datasets (CIFAR10) or Imagenet without data augmentation. Krueger et al. (2017) extend their analysis and argue in particular that the effective capacity of neural nets depends on the dataset considered. In a privacy context, Yeom et al. (2018) exploit this memorizing property to watermark networks. As a side note, random labeling and data augmentation have been used for the purpose of training a network without any annotated data (Dosovitskiy et al., 2014; Bojanowski & Joulin, 2017).

In the context of differential privacy (Dwork et al., 2006), recent works (Wang et al., 2016; Bassily et al., 2016) suggest that guaranteeing privacy requires learning systems to generalize well, i.e., to not overfit. Wang et al. (2015) show that Bayesian posterior sampling offers differential privacy guarantees. Abadi et al. (2016) introduce noisy SGD to learn deep models with differential privacy.

Membership Inference. A few recent works (Hayes et al., 2017; Shokri et al., 2017; Long et al., 2018) have addressed membership inference. Yeom et al. (2018) propose a series of membership attacks and derive their performance. Long et al. (2018) observe that some training images are more vulnerable than others and propose a strategy to identify them. Hayes et al. (2017) analyze privacy issues arising in generative models. Dwork et al. (2015) and Sankararaman et al. (2009) provide optimal strategies for membership inference in genomics data.

Shadow models. Shadow models were introduced by Shokri et al. (2017) in the context of black-box attacks. In this setup, an attacker has black-box-(iii) access (full response for all classes) to a model trained on a private dataset, and to a public dataset that follows the same distribution as the private dataset. The attacker wishes to perform membership inference using black-box outputs of the private model. For this, the attacker simulates models by training shadow models on known splits from the public set. On these simulated models, the attacker can analyze the output patterns corresponding to samples from the training set and from a held-out set. Shokri et al. (2017) propose to train an attack model that learns to predict, given an output pattern, whether it corresponds to a training or held-out sample. If the attack model simply predicts "training" when the output activations fire on the correct class, this strategy is equivalent to the adversary of Yeom et al. (2018). Salem et al. (2019) further show that shadow models work under weaker assumptions than those of Shokri et al. (2017).

3. Membership inference model

In this section, we derive the Bayes optimal performance for membership inference (Theorem 1). We then make the connection with differential privacy and propose looser guarantees that prevent membership inference.
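As a preview of what "Bayes optimal" means in this context, the optimal attack is by definition the posterior probability of membership given the trained parameters. In generic notation that is ours rather than the paper's (m in {0, 1} the membership variable of a sample z, lambda = P(m = 1) its prior, and p(theta | m, z) the distribution of the trained parameters theta), Bayes' rule gives

$$ P(m = 1 \mid \theta, z) \;=\; \frac{\lambda\, p(\theta \mid m = 1, z)}{\lambda\, p(\theta \mid m = 1, z) + (1 - \lambda)\, p(\theta \mid m = 0, z)}, $$

and Theorem 1 makes this posterior explicit under the paper's assumptions on the distribution of the parameters.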