Generative Models for Security: Attacks, Defenses, and Opportunities

LUKE A. BAUER, University of Florida, USA
VINCENT BINDSCHAEDLER, University of Florida, USA

Generative models learn the distribution of data from a sample dataset and can then generate new data instances. Recent advances in deep learning have brought forth improvements in generative model architectures, and some state-of-the-art models can (in some cases) produce outputs realistic enough to fool humans. We survey recent research at the intersection of security and privacy and generative models. In particular, we discuss the use of generative models in adversarial machine learning, in helping automate or enhance existing attacks, and as building blocks for defenses in contexts such as intrusion detection, biometrics spoofing, and malware obfuscation. We also describe the use of generative models in diverse applications such as fairness in machine learning, privacy-preserving data synthesis, and steganography. Finally, we discuss new threats due to generative models: the creation of synthetic media such as deepfakes that can be used for disinformation.

1 INTRODUCTION
Generative models learn to characterize the distribution of data using only samples from it and then generate new data instances from this distribution. Although generative models are not new, the deep learning revolution has reinvigorated research into generative model architectures, and deep generative models using state-of-the-art architectures can now produce output that is sometimes indistinguishable from real-world data. Along with this comes a host of new issues and opportunities relating to security and privacy. This paper provides a comprehensive survey of research at the intersection of generative models and security and privacy. In particular, we describe the recent use of generative models in adversarial machine learning. We also discuss applications such as producing adversarial examples without perturbations, steganography, and privacy-preserving data synthesis. We show how a number of attacks and defenses for various cybersecurity problems, such as password generation, intrusion detection, and malware detection, can benefit from generative models because of their ability to learn the distribution of the training data. By characterizing the data distribution, generative models help practitioners better understand the phenomena they are studying and supplement existing attack and defense techniques. Finally, we discuss the extent to which deep generative models present new threats when used to produce synthetic media. The increased intensity of work at the intersection of generative models and security/privacy is evidenced by a growing body of literature. This is illustrated in Fig. 1, which shows the number of papers (published and pre-prints) on this topic from 2000 to 2020. This survey is structured as follows. Section 2 provides a brief overview of generative model architectures, as well as a discussion of metrics used to evaluate generative models. In Section 3, we survey the use of generative models for intrusion detection, malware detection, and biometric spoofing. In Section 4, we describe how generative models can be used to create synthetic datasets. In Section 5, we discuss generative models in adversarial machine learning. We then describe attacks on generative models (Section 6). In Section 7, we discuss applications of generative models to steganography, whereas in Section 8 we discuss their application to fairness problems. We then discuss the emerging threat of synthetic media and disinformation (Section 9). Finally, in Section 10, we highlight some open research problems. We conclude in Section 11.


Fig. 1. Number of papers (published and pre-prints) on the topic of generative models, and on the topic of generative models and security, from 2000 to 2020 according to Google Scholar. A significant increase in work on generative models can be seen starting in 2015. In 2017, there is also a significant increase in work on generative models and security. Methodology: we gathered the list of papers from Google Scholar using the Publish or Perish program [91]. Concretely, we performed two queries. Query 1 searches for terms relating to generative models (i.e.: Generative Model or RNN or LSTM or Variational Auto-encoder or GAN) in the body of papers, whereas query 2 searches for terms relating to security/privacy (i.e.: Security or Privacy or Attacks or Defenses) in the title of papers. The first query then provides the number of generative model papers for each year, whereas the intersection of the results of both queries provides the number of generative model and security papers for each year.

2 BACKGROUND: NEURAL NETWORKS AND GENERATIVE MODELS
This section provides a brief overview of neural network architectures, generative models, and performance metrics for generative models.

2.1 Neural Networks & Deep Learning
2.1.1 Neural Networks. Neural networks are machine learning (ML) models loosely based on how brains work. A neural network consists of one or more layers of neurons or units. A neuron/unit is associated with a set of weights for its connections, a bias term, and an activation function. Each unit computes a linear combination of its inputs and weights and then outputs the result of the activation function (typically a non-linear function) applied to this combination [129]. Neural networks that have arbitrarily many layers can approximate any function to any degree of precision [206]. Layers between the input layer and the output layer are called hidden layers, and a neural network with more than one hidden layer is called a deep neural network. To train a neural network, all its parameters (i.e., weights and biases) are initialized randomly; then, through the training process, the parameters are tuned using gradient descent and backpropagation [129]. There are different choices of activation function for each unit or layer, of loss function (training objective), and of ways to connect the units of one layer to those of another layer, leading to virtually endless possibilities for neural network architectures.
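To make the unit computation concrete, the following sketch (an illustration added here, not code from any cited work) implements a fully-connected layer in NumPy: each unit computes a weighted sum of its inputs plus a bias and applies a non-linear activation.

```python
import numpy as np

def relu(z):
    # A common non-linear activation function.
    return np.maximum(0.0, z)

def dense_layer(x, W, b, activation=relu):
    # Each output unit computes a linear combination of the inputs
    # (one row of W, plus a bias term) followed by the activation.
    return activation(W @ x + b)

# Toy forward pass through a two-layer feed-forward network.
rng = np.random.default_rng(0)
x = rng.normal(size=4)                          # input vector
W1, b1 = rng.normal(size=(8, 4)), np.zeros(8)   # hidden layer parameters
W2, b2 = rng.normal(size=(2, 8)), np.zeros(2)   # output layer parameters
h = dense_layer(x, W1, b1)
y = dense_layer(h, W2, b2, activation=lambda z: z)  # linear output layer
print(y)
```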

2.1.2 Architectures. The simplest neural network architecture is composed of fully-connected (sometimes called dense) layers where the information only flows forward from the input layer to the output layer (through the hidden layers). These are called feed-forward neural networks [92]. Recurrent Neural Networks (RNNs) have layers that connect the output of units at one time step to their input at the next step, thereby creating a kind of hidden state or memory [159]. This enables feeding the network sequences in such a way that information from earlier elements of the sequence can be used when processing later elements. For example, an RNN processing text sentences can remember and use information from the first word of the sentence while processing the last word. In practice, simple RNN architectures have trouble retaining information over long periods of time [82]. Long Short-Term Memory (LSTM) cells were created to mitigate this problem and allow information to be retained over longer time frames [182, 199]. Recurrent neural networks based on LSTM cells and alternatives such as Gated Recurrent Units (GRUs) [43] are well-suited to processing time-series data or other sequential data such as natural language text and speech. Convolutional Neural Networks (CNNs) use a combination of convolutional layers and sampling/pooling layers. The convolutional layers are made up of filters that convolve over the data using a sliding window and extract features into a set of feature maps [48]. Pooling layers downsample, reduce, or otherwise aggregate the information of a feature map. CNNs are well-suited for data with spatial structure, such as images (e.g., nearby pixels of an image often contain similar information).
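As a rough illustration of these architectures (assuming PyTorch; layer sizes are arbitrary and not taken from any cited model), the sketch below defines a recurrent module for sequences and a convolutional module for images.

```python
import torch
import torch.nn as nn

# Recurrent model: an LSTM keeps a hidden state across time steps,
# so earlier elements of a sequence can influence later predictions.
lstm = nn.LSTM(input_size=32, hidden_size=64, num_layers=1, batch_first=True)
sequences = torch.randn(8, 20, 32)           # batch of 8 sequences, 20 steps each
outputs, (h_n, c_n) = lstm(sequences)        # outputs: (8, 20, 64)

# Convolutional model: filters slide over the image, and pooling layers
# downsample or aggregate the resulting feature maps.
cnn = nn.Sequential(
    nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2),             # downsample feature maps
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),                 # aggregate to one value per map
)
images = torch.randn(8, 3, 64, 64)           # batch of 8 RGB images
features = cnn(images)                       # (8, 32, 1, 1)
print(outputs.shape, features.shape)
```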

2.2 Generative Models
Generative models provide a way to sample new data from a distribution learned from existing “training” data. More formally, constructing a generative model involves estimating an underlying distribution 푝 that approximates some true (unknown) distribution 푃 of interest. For this, one uses a training dataset 푋, where each 푥 ∈ 푋 is a data point assumed to have been sampled i.i.d. according to 푃. The problem of constructing (i.e., training) the generative model is difficult because it is an underspecified problem. Indeed, there are innumerable distributions that are compatible with the training dataset. As a consequence, generative model architectures are distinguished by the assumptions used to make constructing the generative model tractable. To estimate the underlying distribution 푝, some models (e.g., autoregressive models) explicitly characterize this distribution for any data point, so that given any data point 푥 in the support of 푝, they can calculate 푝(푥) (i.e., the probability that 푥 is sampled from 푝) exactly. In contrast, other models (e.g., variational autoencoders) learn 푝 only implicitly, for example by introducing a “latent” variable 푧 and then modeling only the probability of 푥 conditional on 푧, i.e., 푝(푥|푧). The latent variable is then assigned a known prior distribution 푞 (e.g., isotropic Gaussian), so that one can sample 푧 according to 푞 and then 푥 according to 푝(푥|푧). In this survey, we focus on state-of-the-art generative models based on neural network architectures (sometimes called “deep” generative models). This includes generative adversarial networks, autoregressive models, and variational autoencoders. More traditional types of models such as probabilistic graphical models (e.g., Bayesian networks), Gaussian mixture models, and Markov chains or hidden Markov models can also be used as generative models. However, they are not the focus of this survey.
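The distinction between an explicit density and a latent-variable model can be illustrated with a toy NumPy sketch (added here for illustration; the "decoder" is a stand-in for a learned network, not any specific model).

```python
import numpy as np

rng = np.random.default_rng(0)

# Explicit density model (here a toy 1-D Gaussian): p(x) can be evaluated directly.
mu, sigma = 0.0, 1.0
def p(x):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
print(p(0.3))                    # density of a particular data point

# Latent-variable model: sample z from a known prior q (isotropic Gaussian),
# then generate x conditional on z via a decoder; p(x) itself is never
# written down explicitly.
W = rng.normal(size=(5, 2))      # stand-in for learned decoder weights
def decoder(z):
    return np.tanh(W @ z)        # deterministic decoder for simplicity
z = rng.normal(size=2)           # z ~ q = N(0, I)
x = decoder(z)                   # new data point x generated from p(x | z)
print(x)
```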

2.2.1 Autoregressive Generative Models. Autoregressive models view each data point 푥 as a tuple 푥 = (푥1, 푥2, . . . , 푥푚), which allows them to process sequential data, as well as to view inherently non-sequential data (e.g., an image) as a sequence (e.g., the sequence of pixels that the image is composed of). The key idea is to capture the underlying distribution 푝 using the chain rule of probability [207], so that $p(x) = \prod_{i=1}^{m} \Pr\{x_i \mid x_{i-1}, \ldots, x_2, x_1\}$. As a result, autoregressive models can directly estimate the density 푝(푥) for any data point 푥. Despite the sequential nature of the internal representation of the density, autoregressive models can be built using recurrent architectures, but also feed-forward neural networks such as CNNs with masked convolutions [111]. To maintain the autoregressive property, it suffices to ensure that the distribution over 푥푖 depends only on 푥푗 for 푗 < 푖. Prominent examples of autoregressive models include PixelRNN [230] and PixelCNN [231], Wavenet [173], and many others [42, 202, 232, 259, 268]. The PixelRNN and PixelCNN models produce images and are constructed by training an autoregressive model using a stack of LSTMs for PixelRNN and convolutional networks for PixelCNN. Both models sample an image pixel by pixel, with each pixel depending only on previously generated pixels. Wavenet produces state-of-the-art audio samples. It uses a CNN to process the training data. To generate audio, each sample produced is fed back into the model and used to determine the next sample.
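A minimal sketch of autoregressive generation (a toy example added here, not PixelRNN/Wavenet code): a table of conditionals Pr{푥푖 | 푥푖−1} plays the role of the learned model, a sequence is sampled one element at a time, and the same table yields the exact density via the chain rule. A real model would condition on the entire prefix rather than only the previous element.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["a", "b", "c"]

# Toy learned conditionals Pr{x_i | x_{i-1}} (each row sums to 1).
cond = {
    "a": [0.1, 0.6, 0.3],
    "b": [0.5, 0.2, 0.3],
    "c": [0.3, 0.3, 0.4],
}
prior = [0.4, 0.3, 0.3]  # Pr{x_1}

def sample_sequence(m):
    # Generate one element at a time, each conditioned on the previous one.
    x = [rng.choice(vocab, p=prior)]
    for _ in range(m - 1):
        x.append(rng.choice(vocab, p=cond[x[-1]]))
    return x

def density(x):
    # p(x) = Pr{x_1} * prod_i Pr{x_i | x_{i-1}} via the chain rule.
    p = prior[vocab.index(x[0])]
    for prev, cur in zip(x, x[1:]):
        p *= cond[prev][vocab.index(cur)]
    return p

seq = sample_sequence(5)
print(seq, density(seq))
```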

2.2.2 Sequential Generative Models. Recurrent neural networks such as those based on LSTMs and GRUs are often used for tasks such as classification or labeling. But, due to their inherently sequential nature, when these models are trained to predict the next element of a sequence, they can be viewed as generative models in the sense that given a starting input 푥1, they can “generate” the sequence 푥2, 푥3, etc. However, RNNs and LSTMs are often used as components of larger generative models rather than by themselves [144, 162, 271]. An alternative to RNNs and LSTMs for problems that deal with sequences is the transformer architecture [233, 248]. Transformer models use stacked encoding/decoding layers and self-attention [233] to allow for parallelization and for remembering information over long input sequences [53, 55, 59, 121, 174, 181, 192]. The transformer has become the state-of-the-art architecture for natural language processing [248]. Both LSTMs and transformers are often used for tasks involving natural language. For both text and speech, they are used for generation, translation, and understanding. In particular, LSTMs have been used for text generation [86, 87] and machine translation [5]. Transformers have been used to generate text [191], for language understanding [59], machine translation [5], text-to-speech [195], audio speech generation [138], and even some image generation tasks [181]. Some models are bi-directional [59, 87].

2.2.3 Autoencoders. Autoencoders (AE) learn to approximately “copy” their input to their output. To do so, the architecture combines an encoder and a decoder (both neural networks) to efficiently represent inputs as codes or latent variables in a latent space. Because the latent space is typically of lower dimensionality than the input space, the encoding process must discard extraneous information in the input, preserving only the most important information for the reconstruction. Autoencoders have applications to dimensionality reduction [155, 200] and feature learning [269], but they can also be used as generative models or to learn generative models. For example, sampling latent space points (from an adequate distribution) and feeding the samples to the decoder yields new data points from the autoencoder’s learned data distribution. To ensure that autoencoders are able to generate samples not in the training dataset, are not overfitted, and have a tractable, continuous, and interpretable latent space, they need to be regularized. There are several types of autoencoders using various regularization techniques. For example, denoising autoencoders are trained by adding noise to their training data [235], whereas contractive autoencoders are regularized by adding a penalty to the reconstruction loss [197]. A prominent alternative is the variational autoencoder, proposed by Kingma and Welling [125]. Variational AutoEncoders (VAEs) model the latent space as an isotropic Gaussian. To approximate the posterior distribution of the encoder, variational inference [27] is used.

VAEs benefit from relatively straightforward training. The encoder can be used for continual updates of the latent space or for dimensionality reduction. The decoder allows for sampling from different parts of the latent space. This can be leveraged to reveal relationships between data points and/or produce samples with a specific relationship to other samples. VAEs are often used for image generation [193], and can be used to edit existing photos [69]. However, some authors have expressed concern about the blurry output of VAEs [286], and while some mitigation methods have been proposed, this problem persists in many cases [64]. VAEs have also been used for text [106] and video generation [95]. There are many variants of VAEs such as 훽-VAE [97] or IWAE [32]. Significant differences between autoencoder architectures lie in the underlying neural network architecture used for the encoder and decoder (e.g., CNN or RNN), as well as the loss function used during training. Another notable type of autoencoder is the adversarial autoencoder (AAE) [155], which uses generative adversarial networks for variational inference.
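The following is a minimal VAE sketch (assuming PyTorch; the architecture and hyperparameters are illustrative, not taken from any specific paper): the encoder outputs a mean and log-variance, the reparameterization trick allows backpropagation through the sampling step, and the loss combines reconstruction error with a KL term that regularizes the latent space toward an isotropic Gaussian prior.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    def __init__(self, x_dim=784, z_dim=16):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim, 256), nn.ReLU())
        self.mu = nn.Linear(256, z_dim)        # mean of q(z|x)
        self.logvar = nn.Linear(256, z_dim)    # log-variance of q(z|x)
        self.dec = nn.Sequential(nn.Linear(z_dim, 256), nn.ReLU(),
                                 nn.Linear(256, x_dim), nn.Sigmoid())

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterization trick: z = mu + sigma * epsilon.
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        return self.dec(z), mu, logvar

def vae_loss(x, x_hat, mu, logvar):
    recon = F.binary_cross_entropy(x_hat, x, reduction="sum")
    # KL divergence between q(z|x) and the isotropic Gaussian prior.
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl

model = VAE()
x = torch.rand(32, 784)                        # stand-in for a training batch
x_hat, mu, logvar = model(x)
loss = vae_loss(x, x_hat, mu, logvar)
loss.backward()

# Generation: sample z from the prior and decode it into a new data point.
with torch.no_grad():
    samples = model.dec(torch.randn(4, 16))
```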

2.2.4 Generative Adversarial Networks. Generative adversarial networks (GANs) were proposed by Goodfellow et al. [84] in 2014. GANs are made up of two separate feed-forward neural networks that are set up to be adversaries of each other. The first network is a generator model that produces samples from an estimation of the training distribution. The second network is a discriminator that attempts to determine whether a given sample is from the training data (returns 1) or produced by the generator (returns 0). The generator uses the predictions from the discriminator to determine the quality of its samples, while the discriminator uses the generator’s samples to refine its results. Both networks are trained simultaneously. In practice, GANs can be difficult to train. The generator and the discriminator must learn at a similar pace or the process will fail. If the discriminator becomes significantly better than the generator, or vice-versa, the gradient will diminish until the model is unable to improve. In addition, even if the two networks are balanced in their learning, the GAN may still suffer from mode collapse [20], a condition in which the generator always produces a few high-scoring outputs rather than reflecting the diversity of the training data.
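A minimal GAN training-loop sketch (assuming PyTorch; the networks and data are placeholders) showing the alternating discriminator/generator updates described above.

```python
import torch
import torch.nn as nn

z_dim, x_dim = 16, 64
G = nn.Sequential(nn.Linear(z_dim, 128), nn.ReLU(), nn.Linear(128, x_dim))
D = nn.Sequential(nn.Linear(x_dim, 128), nn.LeakyReLU(0.2),
                  nn.Linear(128, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

def training_step(real_batch):
    n = real_batch.size(0)
    ones, zeros = torch.ones(n, 1), torch.zeros(n, 1)

    # 1) Discriminator update: real samples should score 1, generated samples 0.
    fake = G(torch.randn(n, z_dim)).detach()
    loss_d = bce(D(real_batch), ones) + bce(D(fake), zeros)
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # 2) Generator update: try to make the discriminator output 1 on fakes.
    fake = G(torch.randn(n, z_dim))
    loss_g = bce(D(fake), ones)
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_g.item()

real_batch = torch.randn(32, x_dim)   # stand-in for a batch of training data
print(training_step(real_batch))
```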

2.2.5 GAN Architectures. There are numerous GAN variants. Most significant alternative GAN architectures target the generator or the loss function [175]. GANs initially used a simple feed-forward network as a generator, but later implementations replaced this with different models including a CNN [190], an autoencoder [155], and a VAE [133]. Other variations make changes to the loss function [13, 84, 157] to improve and stabilize the training process. Finally, some variations seek to add features to GANs. For example, Zhu et al. [292] allow for image-to-image translation without the need for training images that have been translated through alternative methods. Gomez et al. [83] make GANs reversible. Donahue et al. [62] propose a GAN that can project data back into the latent space.

2.2.6 Applications & Data Types. The nature and architecture of these generative models lend themselves to some data types and applications/tasks better than others.

Text and natural language. Transformers are generally regarded as the state-of-the-art text generator models. Examples include BERT [59], OpenAI’s GPT-2 [191] and GPT-3 [31], and XLNet [259], which is an autoregressive model used for language understanding. There have also been several proposals applying GANs to text generation [38, 282], but the GAN architecture can be challenging to apply to sets or sequences of words. That said, several papers have proposed ways to improve performance on this type of data [131, 255]. Finally, VAEs have also been applied to text [106], but their performance is not competitive with transformers.

Audio and speech. A popular audio generator is Wavenet [173]. Transformers have also seen success in music generation [107]. GANs have also been applied to audio [61, 74]. In particular, Bińkowski et al. propose GAN-TTS, a generative adversarial network for text-to-speech [25]. GANs have also been used to generate music [68, 130].

Images and videos. GANs are well-suited for image generation [30, 110, 119] and editing [11, 291]. Autoregressive models are also able to produce competitive images through PixelCNN [202, 231]. VAEs are also often used for image generation [64, 102, 193, 256] and for making small edits to existing photos [69]. LSTMs [88] and transformers [181] have also been used to generate images. Finally, both VAEs [95] and GANs [228] have been applied to video generation. GANs have also been applied to video prediction [18, 140, 237].

2.3 Evaluating Generative Models
The performance of generative models is difficult to evaluate. A classifier can simply be evaluated on how many examples it correctly classifies; but there is no obvious one-size-fits-all metric to quantify the quality of model-generated data such as images, audio, or text. For generative models that have a tractable probability distribution, we can quantify the likelihood of producing a specific output, but this is not always indicative of output quality. Further, the notion of quality may be task-dependent, so that no single metric can be applied in all cases and multiple metrics may be necessary to evaluate different aspects of the model. Borji [29] gives a comprehensive comparison of many different evaluation metrics for GANs, mainly focusing on image generation. There is also evidence that metrics estimating output likelihood often do not match up with human judgement [221]. For example, factors such as background color can have a significant impact on likelihood. Also, images with (relatively) low likelihood may nevertheless appear to be much clearer and more realistic (to humans) than images with high likelihood. Additionally, measures of quality often do not take output diversity into account. Theis et al. [221] suggest that the best evaluation method depends on the model, the type of data, and the goals of the user. In this section, we provide a brief taxonomy of existing metrics. We classify metrics by what they aim to measure and summarize each class in a table. The classes are output quality, model quality, comparison against training data, and human evaluation. The tables include image, audio, and text columns to indicate the data types each metric is typically used with, as well as a model column for metrics that are often used to compare the output of a model, such as between epochs or between different versions of the same model. Metrics useful for model comparison can be used to compare two or more models attempting to generate the same distribution. Finally, metrics that evaluate output diversity quantify the extent to which the model is able to produce diverse outputs. For example, models that output data points that all look similar, or models that suffer from mode collapse, would score poorly on output diversity metrics. Note that the different aspects are not always mutually exclusive: methods that evaluate image quality can in some cases be used to determine model quality, and high-quality models usually produce high-quality output. To show this, each metric in the tables is marked according to whether it is intended to measure a given aspect, whether it can be adapted to measure it, or whether it cannot measure it.

2.3.1 Output Quality Metrics. Table 1 shows metrics that are evaluated by solely examining the output of the generative model. Methods such as asking survey participants to rate the quality of the output, or computing low-level image statistics, are often used to evaluate model output against images that are not generated. These types of methods are important since the end goal of many generative models is to produce the most realistic output possible. It should be noted that the text comparison methods are intended for use with machine translation, but they can also be used for evaluating generated text.

Table 1. Output quality metrics. These metrics solely look at the output of a generative model and can be used to compare images, audio, text, and the models themselves. We do not include the diversity column here since none of the metrics in this table are useful for determining output diversity of a model.

Metrics: Image Quality Measurements [171, 194, 245]; Low-level Image Statistics [119, 274]; Inter/Intra Track Evaluation [63]; BLEU [179]; ROUGE [141]; METEOR [57].

2.3.2 Model Quality. Table 2 shows metrics that are used to evaluate the quality of a model itself rather than only its output. These metrics often identify desirable qualities of a model such as producing diverse output or having a discriminator able to function well as a classifier. Since these metrics examine the model rather than the output, they may not always correlate with outputs that humans think of as high quality. For instance, an image may have a high likelihood of being produced by the model, but it may look “off” or unintelligible to a person. These metrics are still useful for comparing different outputs from a single model or evaluating a certain aspect such as output sharpness. Many of these metrics directly measure model output diversity. A model may produce a perfect picture, but if it always produces the same picture it is not useful as a generative model. This is especially important for evaluating GANs, as they are susceptible to mode collapse. Finally, methods such as tournament win rate and the generative adversarial metric are useful for comparing competing models.

Table 2. Model quality metrics. These metrics examine the internals and the output of a generative model to measure desirable qualities. They can be used to compare images, audio, text, the models themselves, and output diversity.

Metrics: Average Log Likelihood [84, 221]; Inception Scores [201]; Generative Adversarial Metric [109]; Tournament Win Rate and Skill Rating [172]; Normalized Relative Discriminative Score [284]; Birthday Paradox Test [14]; Number of Statistically Different Bins [196]; Precision, Recall, and F1 Score [151]; Examining Internals of Networks [40, 272]; Perplexity [113]; Classification Performance [190].

2.3.3 Comparison Against Training Data. Table 3 shows metrics that compare model output against training samples of real data. These metrics are mainly used to ensure that the model approximates the unknown training data distribution as closely as possible. This means that the model both produces output that is similar to the training data and produces a full distribution rather than just a few high-scoring samples. These metrics do not directly examine the internals of the model, allowing them to be used to compare models with different architectures trained on the same data. For example, a VAE can be compared against a GAN.

Table 3. Comparison against training data metrics. These metrics compare generative model output against training data to ensure that the model is learning the training data distribution. They can be used to compare images, audio, text, the models themselves, and output diversity.

Metrics: Fréchet Inception Distance [96]; Maximum Mean Discrepancy [76]; Reconstruction Error [250]; Wasserstein Distance [13]; Boundary Distortion [204]; Classifier Two-sample Tests [135]; Image Retrieval Performance [243]; Fréchet Audio Distance [122].
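As a concrete example of a comparison-against-training-data metric, the Fréchet Inception Distance [96] fits a Gaussian to feature embeddings of real samples and of generated samples and computes the Fréchet distance between the two Gaussians. Below is a sketch of that computation (assuming NumPy/SciPy; in practice the feature vectors come from an Inception network rather than being random).

```python
import numpy as np
from scipy import linalg

def frechet_distance(feats_real, feats_gen):
    # Fit Gaussians (mean and covariance) to each set of feature vectors.
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_g = np.cov(feats_gen, rowvar=False)
    # FID = ||mu_r - mu_g||^2 + Tr(cov_r + cov_g - 2 (cov_r cov_g)^(1/2))
    covmean, _ = linalg.sqrtm(cov_r @ cov_g, disp=False)
    covmean = covmean.real  # discard tiny imaginary parts from numerical error
    diff = mu_r - mu_g
    return diff @ diff + np.trace(cov_r + cov_g - 2.0 * covmean)

# Toy usage with random "features"; real evaluations use Inception embeddings.
rng = np.random.default_rng(0)
real_features = rng.normal(0.0, 1.0, size=(500, 64))
generated_features = rng.normal(0.1, 1.1, size=(500, 64))
print(frechet_distance(real_features, generated_features))
```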

2.3.4 Human Evaluation. Table 4 shows metrics based on human evaluation. Human evaluation, though expensive to conduct, is an important aspect of generative model evaluation. For some applications, the most important factor is whether humans can distinguish between the generated output and real samples. However, there are several challenges with human evaluation. Humans are inconsistent, so ratings can vary greatly between individuals. Another consideration is the cost and effort of recruiting participants and obtaining meaningful data from them. As a result, human evaluation should (in most cases) not be the only metric used, but applications that entail a generative model whose outputs will be shown to humans would benefit from some human evaluation.

Table 4. Human evaluation metrics. These metrics have humans directly evaluate the quality of generative model output. They can be used to compare images, audio, text, the models themselves, and output diversity.

Metrics: Nearest Neighbors [244]; Rapid Scene Categorization [225]; Rating and Preference Judgement [276]; Realism Measurement [289]; Timed Realism Measurement [289].

3 INTRUSION, BIOMETRICS, AND MALWARE
Because generative models are able to learn detailed representations of data, attackers can create realistic-looking data that may be able to bypass defenses while still retaining malicious functionality. At the same time, defenders can use generative models to learn the distribution of “normal” or benign data and attempt to leverage this to identify “anomalous” objects that do not fit the distribution, such as zero-day malware. In this section, we provide a brief overview of research on intrusion and anomaly detection, as well as biometrics and malware attacks and defenses that leverage generative models.

3.1 Intrusion Attacks
Generative models can be used to learn the distribution of normal traffic, and then automate the generation of attacks that look benign. For example, Feng et al. [75] use deep learning to automatically learn what normal traffic entering the system looks like, and then generate samples from this distribution. Lin et al. [142] use the GAN training method to train a generator against a discriminator, creating adversarial traffic samples that can evade the target detector. These models can be used to generate traffic to evade Intrusion Detection Systems (IDS) protecting Industrial Control Systems (ICS). While attacks on ICS, most notably Stuxnet [72], are well-established, they remain difficult to perform because they require knowledge of both the ICS and the IDS. Generative models change the equation because they simplify these attacks by analyzing the structure of traffic going through the IDS and crafting messages that are able to bypass the system by appearing benign, while still containing the malicious code.

3.2 Anomaly Detection
Generative models’ ability to learn data distributions can also be used to detect attacks, because they can produce a model of normal use or operation of the system. Any input that does not fit this “normal” model can then be rejected or flagged for further analysis. An advantage of this method is that it can readily detect never-before-seen malicious input. While most classifiers can be trained to distinguish malicious from benign input, this requires samples of both benign and malicious input. For example, Kim et al. [123] use deep autoencoders and a GAN to learn the features of a zero-day malware sample, allowing for sample generation while also training the GAN’s discriminator. The discriminator’s learning can then be transferred to the detector to improve the detection of the new malware. In a similar vein, Zheng et al. [288] use a deep denoising autoencoder to detect fraud in bank transfers. Another example is Chandy et al.’s [36] proposal to use a variational autoencoder to detect cyberattacks on a town’s water distribution system. A different methodology to achieve a similar goal is to use generative models to support other detectors. An example of this is Bot-GAN [267]. The idea is to use an existing botnet detector with a GAN continuously producing new samples of botnet activity. This allows the detector to become more robust, since it has many more samples to train on.
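A sketch of the reconstruction-error approach to anomaly detection (our illustration, assuming PyTorch; the data and architecture are placeholders): an autoencoder is trained only on benign/normal data, and at test time inputs whose reconstruction error exceeds a threshold are flagged as anomalous.

```python
import torch
import torch.nn as nn

# Small autoencoder trained on benign traffic/feature vectors only.
ae = nn.Sequential(nn.Linear(32, 8), nn.ReLU(), nn.Linear(8, 32))
opt = torch.optim.Adam(ae.parameters(), lr=1e-3)
mse = nn.MSELoss(reduction="none")

benign = torch.randn(1024, 32)            # stand-in for normal training data
for _ in range(200):                      # toy training loop
    loss = mse(ae(benign), benign).mean()
    opt.zero_grad(); loss.backward(); opt.step()

def anomaly_score(x):
    # Per-sample reconstruction error: high error means the input does not
    # fit the learned "normal" distribution.
    with torch.no_grad():
        return mse(ae(x), x).mean(dim=1)

# Choose a threshold from benign data (e.g., the 99th percentile of scores).
threshold = torch.quantile(anomaly_score(benign), 0.99)
test_batch = torch.randn(16, 32) * 3.0    # stand-in for suspicious traffic
flags = anomaly_score(test_batch) > threshold
print(flags)
```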

3.3 Biometric Spoofing
Biometrics use a person’s unique attributes, such as fingerprints, voice, or even handwriting, to identify an individual. Biometrics are used to protect many systems, from smartphones to buildings. However, generative models can be used to create new biometric samples to break into systems. GANs can be used to simply generate synthetic fingerprints [160], but producing random fingerprints is not typically sufficient to break biometric systems. More importantly, by sampling from the latent space, generative models can generate masterprints [28]. Masterprints are special fingerprints that incorrectly match with many different fingerprints, undermining the security of fingerprint-protected systems. Generative models can also spoof handwriting [149] and keystroke latency [164].

3.4 Malware Obfuscation and Detection
Malware are malicious programs that seek to compromise computer systems to exfiltrate data or otherwise cause harm. Ideally, malware can be detected before it compromises a system. But novel malware and obfuscated malware can be difficult to detect. Generative models can be used to obfuscate malware in new and unpredictable ways [105], but they can also be used as malware detectors [123, 124, 153, 180]. Generative models can help improve malware detection by augmenting the available amount of training data. In particular, generative models can be used to simply generate malware, creating samples to bolster detectors [124, 153]. For more refined detectors, they can be used to create clusters of similar malware around likely malware base forms, which augment the detector so that it can identify the different classes of malware even if they are changed slightly [123, 180].

An example of the use of a generative model to avoid detection is MalGAN [105]. MalGAN produces malware examples that are able to bypass many malware detectors. The idea is to train a GAN with an estimation of the target detector as the adversarial network and a generator which adds and removes malware features. This creates adversarial malware samples that are better able to avoid the target detector, compared to just adding random variation.

4 DATA SYNTHESIS
The ability of generative models to produce new samples from the distribution of existing data is useful by itself for several security and privacy related applications. In particular, generative models can be used to produce synthetic datasets that can be shared when the original (sensitive) data cannot itself be shared or publicly released for legal, policy, or privacy reasons. Furthermore, the ability to create large datasets enables the study of data which may impact security policies. For example, generating realistic passwords can help practitioners study password management policies or design defenses for data breaches.

4.1 Privacy-Preserving Data Synthesis
Organizations owning large datasets of sensitive individual data such as medical records may wish to share this data with outside parties. Due to privacy concerns captured in organizational policies or legal considerations, such data sharing is often impossible. Organizations could of course release only aggregate statistics of the data in an anonymized way, but this limits the types of analyses that can be performed. Generative models offer an attractive alternative solution: train a generative model on the sensitive dataset, then use it to produce a synthetic dataset that can be shared or published. To preserve utility, we need to ensure that the synthetic dataset reflects the statistical properties of the original dataset. Note that just because the data is “synthetic” does not mean it automatically safeguards privacy; we need to ensure that the synthetic dataset does not inadvertently leak sensitive information from the original dataset [216]. So, research in the past decade has focused on ways to create the synthetic dataset in such a way that the process satisfies 휀-differential privacy [65], a well-established notion for privacy protection. Methods to achieve differential privacy need to be tailored to the specifics of the generative model used, but the process typically involves adding random noise (from a carefully chosen distribution and in a specific way) during the model’s training process. In particular, one of the first lines of work for this used Bayesian networks as generative models [24, 186, 278, 285]. For example, PrivBayes [278] shows how to construct the Bayesian network and estimate its probabilities from data such that the process satisfies 휀-differential privacy. More recently, generative models based on neural networks have been used to create synthetic data [2, 3, 8, 21, 79, 198, 220, 224, 226, 227, 252, 253]. These papers use a variety of model architectures, including GANs and variations of GANs such as W-GAN [8], CGANs [224], or autoencoder GANs [220]. Variational autoencoders have also been used [3]. There are various techniques to achieve differential privacy (or related notions), but a common approach is to train the model using DP-SGD [1, 167], which adds carefully crafted Gaussian noise to the gradient during the stochastic gradient descent (SGD) process that tunes the parameters. Some of the other methods include average KL-privacy [226] and differentially private clusters [252]. A different but related line of work for the semi-supervised setting is Private Aggregation of Teacher Ensembles (PATE) [116, 147, 176, 178]. The idea is to partition the original (sensitive) dataset into disjoint subsets, and then train a “teacher” model on each subset. Teacher models are trained without consideration of privacy. But the predictions of all teacher models are aggregated with random noise to achieve differential privacy. Finally, a privacy-preserving “student” model can be obtained from a public dataset of unlabeled data, through knowledge transfer from the teacher models. While the PATE approach does not directly provide a synthetic dataset or a generative model, it can be extended to do so as described by Jordon et al. [116] or Long et al. [147].
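A simplified sketch of a DP-SGD update step (our illustration of the general mechanism of [1]; production code would use a library implementation and a privacy accountant to track 휀): per-example gradients are clipped to a maximum norm, and Gaussian noise calibrated to that norm is added before the parameter update.

```python
import numpy as np

rng = np.random.default_rng(0)
clip_norm = 1.0          # maximum L2 norm allowed for each per-example gradient
noise_multiplier = 1.1   # noise scale relative to clip_norm (chosen via an accountant)

def dp_sgd_step(params, per_example_grads, lr=0.1):
    clipped = []
    for g in per_example_grads:
        # Clip each example's gradient to bound its influence on the update.
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip_norm / (norm + 1e-12)))
    total = np.sum(clipped, axis=0)
    # Add Gaussian noise scaled to the clipping norm, then average over the batch.
    noisy = total + rng.normal(0.0, noise_multiplier * clip_norm, size=total.shape)
    return params - lr * noisy / len(per_example_grads)

# Toy usage: 64 per-example gradients for a 10-dimensional parameter vector.
params = np.zeros(10)
per_example_grads = rng.normal(size=(64, 10))
params = dp_sgd_step(params, per_example_grads)
print(params)
```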

4.2 Understanding Data
Generative models can also help better understand data that influence security decisions. PassGAN [99] is a GAN that is able to generate a large number of potential passwords. Previous methods of guessing passwords often consist of human-generated rules, such as replacing certain letters with numbers, or limited n-gram Markov models. While effective, they are limited and unable to discover passwords that do not fit the specific rules in use. Although PassGAN was presented as a password guessing attack method, its existence shows how generative models can be used to develop an understanding of the data being examined, without humans having to go through an enormous amount of data to identify patterns. Models like PassGAN and more recent work [166] can be used to learn common password patterns and steer users away from easily guessable passwords. Another potential application is to improve honeywords [117]. Honeywords are passwords of fake accounts set up to detect compromises. Augenstein et al. [16] show how differentially private generative models, as previously described, can be used to examine federated data while still maintaining the privacy of the entries in the dataset. They use generative models to study private federated data and debug other models trained with the data. This involves creating two private generative models using federated data, one with the best samples and one with the worst, and comparing their outputs. With this technique, the authors were able to debug a specific pixel inversion issue, showing that generative models can be used to better understand data that humans cannot realistically study.

5 ADVERSARIAL MACHINE LEARNING
Adversarial machine learning is the study of attacks against and defenses for machine learning models and the associated threat models [108]. Attacks against machine learning can be classified into one of three categories: exploratory, evasion, or poisoning [35]. Exploratory attacks attempt to extract information from a model, such as data used to train the model [209], the model parameters [77], or its hyperparameters [240]. Such attacks can violate the privacy of individuals whose data is part of the training set or steal confidential and valuable information about machine learning engineering from the model’s developers. Evasion attacks create malicious inputs (often called adversarial examples) that result in the model making a false prediction [15, 85, 177]. This can be used by attackers in various settings, such as to evade detection by a model trying to detect or filter out attacks. The goal of poisoning attacks is to corrupt the training data so that the attackers can control the model’s future predictions under certain scenarios [41, 89, 145]. Both evasion and poisoning attacks can have potentially disastrous real-world effects if they are perpetrated against some systems. For instance, a self-driving car could be made to ignore a stop sign [213] when provided with specially crafted malicious inputs. Finally, there exist defenses against these attacks. Also, within all of these categories, there are different attack scenarios and threat models, such as whether the attacker has black-box or white-box access to the model. In this section, we survey the use of generative models within adversarial machine learning. This includes the use of generative models to help craft adversarial examples (Section 5.1), as a defense against adversarial examples (Section 5.2), and for poisoning attacks (Section 5.3).

5.1 Generating Adversarial Examples
Adversarial examples are maliciously crafted inputs that force a machine learning model to produce specific decisions or outputs that are counter to human expectations. In particular, if the model is a classifier, a benign input belonging to class A is maliciously perturbed in a way that a human would consider it to still be of class A, but such that the model classifies it as class B. For example, an adversary may create an “adversarial stop sign” by adding a small perturbation to an existing stop sign in a way that is imperceptible or inconspicuous to humans but causes a self-driving car to ignore the stop sign [71]. There are several methods for creating adversarial examples depending on the constraints of the problem and the threat model. However, one of the most common methods is to treat the task of crafting an adversarial example as an optimization problem. This typically involves starting with a benign example and taking steps in the opposite direction of the gradient of the loss of the classifier (with respect to the input) on the desired target class [85, 219]. Generative models provide a different way to craft adversarial examples. The idea is to use a generative model to sample adversarial perturbations [251] or adversarial examples directly [112, 163, 215, 287]. For example, this can be accomplished by training a generative model on the original (benign) data and then finding adversarial examples in the latent space of the generative model. There are several search strategies over the latent space, but they involve two steps. First, one finds a normal/benign data point in the latent space. Then one searches near this data point (in the latent space) to find another data point which (in the original space) is classified differently by the target model. This method allows attackers to generate completely new adversarial images without explicitly constructing perturbations. A key advantage of this technique is that it can be applied to domains where valid perturbations are difficult to find. For instance, Hu et al. [105] use a GAN to generate malware that looks benign to a target discriminator. In a separate paper, the same authors use a generative RNN to create adversarial examples for RNN-based malware detectors [104]. Also, some types of data, such as text and code, do not easily lend themselves to perturbations. For example, adversarial text can be created by switching out words [67], but this is more difficult to accomplish without changing the meaning of the sentence, compared to images where the pixel values can often be shifted by a small amount while resulting in (almost) imperceptible changes. A line of work that may be of particular interest to cybersecurity practitioners is the use of GANs to generate domain names that are able to bypass detectors [9, 50]. Domain generation algorithms are often used to produce domain names for command-and-control servers. Corley et al. [50] train three different GANs capable of producing outputs that can evade state-of-the-art classifiers meant to detect domain generation algorithms. They conclude that, compared to using traditional domain generation algorithms, the use of GANs significantly improves the ability to evade detection. Finally, the ability to produce adversarial examples without producing an explicit perturbation allows one to bypass some adversarial example detectors.
This is because some adversarial example detectors are specifically designed to identify and remove perturbations of benign inputs [203, 214].
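A sketch of the latent-space search idea described above (our illustration; the generator and classifier below are toy stand-ins for pre-trained models, and a simple random local search is used where gradient-based search over the latent code is also common).

```python
import torch
import torch.nn as nn

def latent_space_attack(generator, classifier, z_benign, target_class,
                        steps=500, radius=0.5):
    # Search near a benign latent code for one whose decoded sample the
    # target classifier assigns to target_class. The resulting sample is
    # generated "from scratch" rather than by perturbing an existing input.
    for _ in range(steps):
        z_candidate = z_benign + radius * torch.randn_like(z_benign)
        x_candidate = generator(z_candidate)
        if classifier(x_candidate).argmax(dim=-1).item() == target_class:
            return x_candidate
    return None

# Toy stand-ins for a pre-trained generator and a target classifier.
generator = nn.Sequential(nn.Linear(8, 32), nn.Tanh())
classifier = nn.Sequential(nn.Linear(32, 3))
z_benign = torch.randn(1, 8)          # latent code of a benign data point
adversarial = latent_space_attack(generator, classifier, z_benign, target_class=2)
print(adversarial is not None)
```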

5.2 Defenses Against Adversarial Examples
Generative models can also be used to defend against adversarial examples. To improve the robustness of a classifier, Lee et al. [134] propose using the GAN training method as a trainer for the classifier, where the classifier works as the discriminator and an adversarial example generator works as the generator. This method makes the model more robust against adversarial examples. A similar effect can be achieved by simply adding adversarial examples to the training dataset. This is known as adversarial training [154] and it has been evaluated in several papers [9, 112].

Zhang et al. [281] use the assumption that generative models capture the underlying data distribution of a set of images to study model robustness and interactions with adversarial examples. By comparing the robustness of their generative model against a theoretical limit of robustness, they are better able to understand how models are vulnerable to adversarial examples. While the main purpose of the paper is to show the disparity between model robustness and the robustness of the actual image space, they propose that this method could also be used to bolster defensive classifiers by showing where adversarial examples are likely to be created. Alternatively, GANs can create a “denoised” version of any input that can then be fed into the classifier [19, 115, 203, 214]. This can eliminate many adversarial examples by removing the adversarial perturbation, leaving behind the original input which can be correctly classified. A prominent instance of this method is Defense-GAN [203], which uses a WGAN to take in an input image and generate a new image that retains all of the pertinent information while eliminating the extraneous artifacts that have the potential to be malicious. A similar effect can be accomplished through dimensionality reduction with a simple autoencoder, but GANs are better able to completely recreate a clean image, rather than simply removing many of the finer details. This method increases a model’s resistance to adversarial examples while being flexible, since it is not necessary to make any changes to the classifier itself; GANs just need to be run over any potential input first. Ironically, adversarial examples that are produced by generative models (discussed in Section 5.1) may defeat this type of defense because they do not rely on explicit perturbations.
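A sketch of this GAN-based "denoising" defense (our simplified rendition of the Defense-GAN idea [203]; the generator and classifier are placeholders for pre-trained models): the input is projected onto the generator's range by optimizing a latent code to minimize reconstruction error, and the reconstruction is classified instead of the raw input.

```python
import torch
import torch.nn as nn

def project_onto_generator(G, x, z_dim=8, steps=200, lr=0.05):
    # Find z* minimizing ||G(z) - x||^2; G(z*) keeps the pertinent content
    # of x while discarding artifacts that lie outside G's range.
    z = torch.randn(x.size(0), z_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        loss = ((G(z) - x) ** 2).sum()
        opt.zero_grad(); loss.backward(); opt.step()
    return G(z).detach()

# Toy stand-ins for a pre-trained generator and the downstream classifier.
G = nn.Sequential(nn.Linear(8, 32), nn.Tanh())
classifier = nn.Sequential(nn.Linear(32, 10))
x_input = torch.randn(4, 32)                  # possibly adversarial inputs
x_clean = project_onto_generator(G, x_input)  # "denoised" reconstructions
print(classifier(x_clean).argmax(dim=-1))
```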

5.3 Poisoning Attacks
The goal of poisoning attacks is to corrupt the target model [41, 89, 145]. In particular, a subset of poisoning attacks, backdoor attacks, add a “trigger” (e.g., a small pattern or even a single pixel of a specific color in the context of images) to any data point to force the classifier to make a specific decision. Backdoor attacks accomplish this goal by adding the trigger to many training images and labeling them all as a specific target class, regardless of their original class labels. During training, the model learns to associate the trigger with the target class. For backdoor attacks, a trigger does not need to be generated; it can be added to any image. In addition, the trigger need not be hidden. It can be as obvious as the attacker desires. That said, Turner et al. [229] propose label-consistent backdoor attacks that use generative models to create adversarial samples that look more similar to their original image, while still forcing the model to learn the trigger. This is useful since these examples are less likely to be filtered out of the training data. If the image is too different from the target class, a simple classifier run over the training data will pick it out and remove the image. In addition, generative models can be used to speed up the generation of poisoned samples and to mount attacks on federated learning. Yang et al. [257] used a GAN to generate poisoned samples over 200 times faster than traditional gradient methods. Zhang et al. [277] used GANs to attack federated learning networks.
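A sketch of the basic backdoor mechanism described above (our illustration, with NumPy arrays standing in for images): a small trigger pattern is stamped onto a subset of training images and their labels are changed to the attacker's target class.

```python
import numpy as np

rng = np.random.default_rng(0)

def add_trigger(image, value=1.0):
    # Stamp a small, fixed pattern (here a 3x3 bright square in the corner)
    # onto the image; during training the model learns to associate it with
    # the target class.
    poisoned = image.copy()
    poisoned[:3, :3] = value
    return poisoned

def poison_dataset(images, labels, target_class, fraction=0.05):
    n_poison = int(fraction * len(images))
    idx = rng.choice(len(images), size=n_poison, replace=False)
    images, labels = images.copy(), labels.copy()
    for i in idx:
        images[i] = add_trigger(images[i])
        labels[i] = target_class   # relabel regardless of the original class
    return images, labels

# Toy usage with random 28x28 "images" and 10 classes.
images = rng.random((1000, 28, 28))
labels = rng.integers(0, 10, size=1000)
poisoned_images, poisoned_labels = poison_dataset(images, labels, target_class=7)
```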

6 ATTACKS AGAINST GENERATIVE MODELS
In addition to adversarial machine learning attacks performed using generative models, there are attacks used against generative models themselves. Indeed, generative models have been shown to be vulnerable to the three main categories of adversarial ML attacks: exploratory, evasion, and poisoning attacks. However, it should be pointed out that evasion attacks on generative models focus on causing the model to generate specific outputs.

6.1 Exploratory Attacks
A prominent type of exploratory attack on machine learning is the membership inference attack. In a membership inference attack, the attacker aims to determine whether a specific individual’s data was part of the target model’s training data [209]. These attacks exploit the fact that machine learning models such as classifiers often behave differently when asked to make a prediction for a data point from their training dataset than for a data point not part of the training dataset (but from the same distribution). This behavior is related to overfitting, although recent work suggests that models may be vulnerable even in the absence of overfitting [136, 146, 148]. Membership inference attacks on generative models seek to exploit the same effect. In particular, membership inference attacks have been developed against GANs [94] and other generative models [98]. Additionally, VAEs are vulnerable to reconstruction attacks [98].
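A sketch of a simple threshold-based membership inference attack (our illustration of the general principle, not the specific attacks of [94, 98]): a per-example score that tends to differ between training members and non-members, such as a VAE's reconstruction error or a GAN discriminator's output, is compared against a threshold.

```python
import numpy as np

def membership_inference(scores, threshold):
    # Predict "member" when the model's score for the example is better
    # (here: lower reconstruction error) than the threshold. For GANs, a
    # high discriminator output can play the same role with the sign flipped.
    return scores < threshold

# Toy scores: members tend to have slightly lower reconstruction error.
rng = np.random.default_rng(0)
member_scores = rng.normal(0.8, 0.2, size=500)      # examples in the training data
nonmember_scores = rng.normal(1.0, 0.2, size=500)   # fresh examples
threshold = np.median(np.concatenate([member_scores, nonmember_scores]))
accuracy = 0.5 * (membership_inference(member_scores, threshold).mean()
                  + (~membership_inference(nonmember_scores, threshold)).mean())
print(f"attack accuracy: {accuracy:.2f}")   # above 0.5 indicates leakage
```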

6.2 Adversarial Examples
Generative models are susceptible to adversarial examples. Concretely, adversarial example attacks have been demonstrated against generative models in natural language processing (NLP) and image compression. In the NLP context, Wallace et al. [238] show that there exist universal adversarial triggers. These are sequences of tokens that can be appended to any input to cause the model to output specific predictions. More specifically, Wallace et al. were able to append certain words which caused models such as GPT-2 to output offensive text. Interestingly, many of these triggers worked on all of the models tested. In the image compression context, Kos et al. [127] propose a scenario where the encoder and decoder of a VAE or VAE-GAN have been separated. The encoder is used by a sender to encode an image into the latent space representation 푧, which is transported to a receiver who will attempt to decode 푧 and reproduce the original image. They argue that an attacker could potentially trick the sender into choosing to send an adversarial image that looks benign but returns an entirely different image when compressed and then decoded. For instance, a picture of a cat could be slightly perturbed to decode to an image asking for money.

6.3 Poisoning Attacks
Wallace et al. [239] show how poisoning attacks can be applied to NLP models. They focus on classification models, but they show that the method also works on translation models. The idea is to insert a small number of poisoned samples into the training data, none of which actually contain the trigger phrase, making it difficult to determine which training samples are causing the error. Although this work only applies the technique to classification and translation models, it shows the potential vulnerability of generative models to poisoning.

7 STEGANOGRAPHY
Steganography is the process of hiding secret messages inside non-secret mediums to evade detection. The idea is not to simply hide the message content, as one would with encryption. Instead, the goal is to hide the very fact that a message is being sent.

7.1 Steganography without Generative Models
Steganography hides the message in some inconspicuous “cover” medium such as an image or audio sample. While there are a number of ways to perform steganography, a popular method is through embedding. Distortion can be added to a cover, which the recipient can decode to retrieve the secret message. This method is often used with images as cover [22, 100, 101, 184], but can also use audio [158] or text [23, 187] as cover. However, even small distortions to an image or audio can be detected [78, 185, 189]. This issue is even more prevalent with text since many changes are evident to humans [23].

7.2 Steganography with Generative Models
Methods using generative models for steganography can be classified into three categories: (1) cover creation, (2) cover modification, and (3) generative steganography. We illustrate this by taxonomizing recent work across these categories and types of data in Table 5. The first category is cover creation. Here, generative models are used to generate a cover media (e.g., an image, text, or audio sample) into which the message can be embedded using other methods [208, 236] that may not rely on generative models. An advantage of this method is that it creates new cover media, which prevents detectors from comparing a potentially pre-existing version of the cover against a steganographic version. However, the distortion of the cover from embedding the secret message may still be discoverable. Note that the cover media may look conspicuous depending on the context or the quality of the generative model used. The second category involves modifying existing cover media and embedding the message in it. Methods in this category start from an initial (typically not model generated) cover media, into which the secret message is then embedded using a generative model through a variety of methods. Similar to more traditional steganography through embedding, the message can be transformed into a vector or modification matrix and then added to the cover through generated distortion [17, 51, 258]. Here GANs’ adversarial training is useful, since the generator can add the distortion while the discriminator tries to detect the steganography. This allows for more refined embedding, since the model learns how to avoid the detector while still embedding the secret message. Other steganographic methods embed a secret sample of the same data type as the cover media. Instead of a text message, an entire image or audio sample can be transformed into a new form or vector and then hidden within a cover sample. This has been used with both images [279] and audio [114, 128, 266]. Finally, the last category involves the use of generative models to directly produce the cover media with the secret message embedded in it. The idea is to use the bits of the secret message to directly sample from the probability distribution of the generative model. The receiver can then recover the message by examining the steganographic image [103, 137, 143] or audio sample [37, 260] and inverting the choices made in the generation of the sample. It is worth noting that similar techniques have been used to produce text steganographically [165, 222, 293]. In these cases, a language model (e.g., a Markov chain, a Transformer, or an LSTM) is used to define the probability distribution of text. The secret message is encoded into samples of the probability distribution of the language model. The exact encoding method varies, with two popular algorithms being Huffman coding [161] and arithmetic coding [247]. The receiver can then invert the sampling from the probability distributions and retrieve the secret message. It is worth noting that text usually has a much lower capacity than images or audio. Also, text cannot be easily distorted without it being noticeable; adding out-of-place letters or punctuation is conspicuous, so generative steganography makes text significantly more viable as a steganographic carrier. It is also possible to perform generative steganography by training generative models to embed a message or secret sample into future samples they generate.
For instance, a cycle-GAN can be trained so that all images it generates have a secret image hidden in the signal of the new image [45]. A similar method is used by Zhang et al. [283] where the message is embedded in certain parts of the image which are then retained in future generated images.
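A toy sketch of generative steganography over text (our simplified illustration; real systems use Huffman or arithmetic coding over a neural language model's distribution): each secret bit selects between the two most probable next tokens, and the receiver, who has the same model, recovers the bits by observing which token was chosen at each step.

```python
# Toy language "model": fixed next-token probabilities given the previous token.
# A real system would query an LSTM/Transformer language model at each step.
cond = {
    "the": {"cat": 0.4, "dog": 0.35, "end": 0.25},
    "cat": {"sat": 0.5, "ran": 0.3, "end": 0.2},
    "dog": {"ran": 0.5, "sat": 0.3, "end": 0.2},
    "sat": {"the": 0.6, "end": 0.4},
    "ran": {"the": 0.6, "end": 0.4},
    "end": {"the": 0.5, "end": 0.5},
}

def top_two(prev):
    # The two most likely next tokens, in order of decreasing probability.
    return sorted(cond[prev], key=cond[prev].get, reverse=True)[:2]

def hide(bits, start="the"):
    # Each secret bit picks the first or second most likely next token.
    tokens, prev = [start], start
    for b in bits:
        prev = top_two(prev)[b]
        tokens.append(prev)
    return tokens

def recover(tokens):
    # The receiver replays the model and checks which choice was made.
    return [top_two(prev).index(cur) for prev, cur in zip(tokens, tokens[1:])]

secret = [1, 0, 1, 1, 0]
stego_text = hide(secret)
assert recover(stego_text) == secret
print(" ".join(stego_text))
```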

Table 5. Methods using generative models for steganography. Columns represent the different ways the models can be used. Rows represent the data types being generated.

Method   Cover Creation    Cover Modification    Generative Steganography
Images   [51, 208, 236]    [17, 208, 258, 279]   [45, 103, 137, 143, 283]
Audio    N/A               [114, 128, 266]       [37, 260]
Text     N/A               N/A                   [165, 222, 262–265, 280, 290, 293]

7.3 Detecting Steganography
There are a number of papers on detecting steganography performed through embedding [246, 261]. The idea is to detect differences in latent spaces and probability distributions between images or text that contain hidden messages and versions with nothing embedded. However, there is a dearth of steganalysis papers focused on detecting generative steganography. Many generative steganography papers measure detectability using measures such as perplexity (for text) [262], bits per sample (for audio) [37], human comparison (all) [128, 152], or discriminator models trained on innocent versus steganographic samples [93]. Moreover, steganographic systems can potentially be thwarted if one can easily remove the secret message. For example, one could use GANs to resample suspicious images, thereby likely making the secret message unrecoverable in the process.

8 FAIRNESS

There is a rich literature of proposed notions of fairness and their formalization and application to machine learning. These include notions of individual fairness, such as fairness through awareness [66] and counterfactual fairness [132], as well as notions of group fairness, such as statistical parity [66], conditional statistical parity [49], and equalized odds [90]. However, many of these notions conflict [66], and there is little to no consensus as to which notion (if any) is the correct one. An important motivation for the study of fairness in machine learning is the recognition that machine learning can be applied in situations where the stakes are high, such as crime sentencing [10, 60] and loan approval [90, 118]. The concern is that due to training bias, bias inherent in the data [118], or other reasons, trained models may be biased. There have been many attempts to create truly fair models, from removing sensitive attributes from model representations [150, 273] to optimizing models to ignore sensitive attributes [33, 249].

Generative models provide ways of generating fair data rather than training fair models or modifying an existing model to guarantee some fairness criterion. There are several proposals to generate fair datasets. One is FairGAN by Xu et al. [254], which can remove sensitive attributes and recreate the data without them. This has been used with text datasets to recreate the whole dataset and maintain relationships while removing any indication of sensitive features. In a similar vein, Sattigeri et al. [205] propose removing sensitive features such as skin color from image databases. The hope is that as generative models improve, these methods can be used to completely replace datasets with sanitized versions. This would ensure that decisions made with the data are fair, without having to specifically train a fair model.

A separate line of work explores the use of generative models to detect unfair models. While detecting unfair models can be done without generative models, one way to check for fairness is to use a generative model to synthesize data points with and without sensitive information and then feed them to the potentially unfair model. If the model returns different results, then it is unfair. This idea has been applied to both text [26] and images [58], by changing sensitive attributes such as race, skin color, age, and others, and evaluating model output for different values of these attributes. A minimal sketch of this flip-test idea is given below.
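The following is a minimal sketch of counterfactual fairness testing in the spirit of [26, 58]: synthesize a counterfactual version of each input that differs only in a sensitive attribute and measure how often the audited model's decision flips. The functions `generate_counterfactual` and `audited_model` are hypothetical placeholders for a trained generative model and the classifier under audit; they are not from the cited papers.

```python
from typing import Callable, Sequence

def flip_rate(inputs: Sequence,
              audited_model: Callable,
              generate_counterfactual: Callable) -> float:
    """Fraction of inputs whose prediction changes when the sensitive attribute is flipped."""
    flips = 0
    for x in inputs:
        x_cf = generate_counterfactual(x)      # e.g., a GAN edit of skin color or age
        if audited_model(x) != audited_model(x_cf):
            flips += 1
    return flips / len(inputs)

# Toy usage with stand-in functions: a "model" that (unfairly) keys on the sensitive bit.
samples = [{"sensitive": s, "income": 40 + i} for i, s in enumerate([0, 1, 0, 1])]
unfair = lambda x: int(x["income"] > 41 and x["sensitive"] == 0)
flip = lambda x: {**x, "sensitive": 1 - x["sensitive"]}
print(flip_rate(samples, unfair, flip))  # > 0 indicates the decision depends on the attribute
```

In a real audit the counterfactual generator must change only the sensitive attribute while preserving everything else, which is precisely what makes high-quality generative models useful here.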

9 THREAT OF GENERATED INFORMATION

Generative models can be used to falsify information or generate fake information that is then presented as legitimate in an attempt to influence people’s thinking or decisions. A prominent example is deepfakes [73], which are realistic-looking media (i.e., audio, photos, or videos) that depict situations or events that did not occur, such as famous people saying things they did not say. The concern surrounding deepfakes is their potential use as disinformation: e.g., unsuspecting individuals may believe that a deepfake is genuine and share it [6] on social media, thereby potentially influencing public opinion [223] or maybe even the outcome of elections [212]. A related concern is that deepfakes have the potential to undermine the public’s trust in information [44]. It is also worth noting that the use of this technology for certain purposes has already been outlawed in several US states [52, 126] due to its potential to cause harm, for example through the creation of revenge porn and celebrity porn.

9.1 Creation of Deepfakes and Text-based Synthetic Media

The use of generative models to modify images or videos has legitimate applications. Early examples of this technology focused on making small alterations to images, such as making a person look older [11] or changing the color of a purse [291]. These techniques were then further utilized to alter images more substantially, for example by pasting one person’s face onto an image or video of someone else [170], creating entirely new (fictional) people [120], or even lip-syncing entirely new audio [218]. These techniques are the basis for the most famous example of this threat, deepfakes.

Deepfakes were first created by an online user known as “deepfakes”. There are several methods to create deepfakes, but an early one involved using autoencoders trained on facial images. The encoder extracts features from the face and the decoder reconstructs the face. By training two encoder-decoder pairs, one per identity, and recombining them, it is possible to swap faces: when the output of the encoder for face A is passed to the decoder of face B, the features of face A are essentially “pasted” onto face B [169] (a simplified sketch of this recombination is given at the end of this subsection). While initially used by amateurs as a proof of concept, the threat of this technology has been demonstrated on politicians [234], celebrities [156], and in revenge porn [241]. A popular method to generate deepfakes is DeepFaceLab [183]. This framework allows users to create their own deepfake models and customize them with different model architectures, loss functions, and targets. It is popular because it achieves high-quality results while requiring a comparatively small amount of user input. This is in contrast to earlier methods such as Synthesizing Obama [218], which required the manual construction of a full 3-D model of Obama’s face.

Audio can also be faked as part of a video deepfake, or a standalone audio deepfake can be produced. Audio deepfakes are often produced through voice conversion [12, 39] or impersonation [80]. This process takes as input a sentence chosen by the attacker and outputs an audio sample of the target uttering that sentence. While there are a variety of models and methods used, the general idea is to train a GAN or autoencoder to replicate the speaker’s pitch, tone, and other vocal features, while ensuring that the result matches the desired sentence(s). These techniques have already been used in several real-world voice impersonation attacks in which companies were tricked into forwarding money to attackers [217].

The threat of model-generated information presented as genuine is not limited to visual or audio media. For example, advances in language modeling have resulted in models that produce increasingly realistic outputs. When OpenAI’s GPT-2 [211] was released, the authors were concerned about malicious use of their work [47]. To prevent their model from being used for spam and disinformation, they initially released a smaller version of their model with only about 124 million parameters. Several months later, they released a 355M-parameter version, then a 774M-parameter version, and finally the full 1.5-billion-parameter model in November 2019 [210]. The time between each release allowed for detectors and other defensive measures to be developed. Since then, GPT-3 has also been developed. While the model has not been open-sourced, an API makes it available to the public. Samples from it alone [70], which can fool humans, have sparked concerns [54].
For instance, a blog received 26 thousand visitors before someone raised the possibility that its articles were generated using GPT-3 [188]. In contrast to deepfakes, which are enabled by generative models, text-based disinformation can easily be produced and disseminated by humans without the use of language models such as GPT-2 and GPT-3.
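Returning to the autoencoder-based face swap described at the start of this subsection, the sketch below shows the recombination step. The layer sizes, the flattened 64x64 image representation, and the fully connected networks are illustrative assumptions; practical pipelines such as DeepFaceLab [183] use convolutional networks, face alignment, and blending, and typically share one encoder across both identities so that the latent space is common.

```python
import torch
import torch.nn as nn

IMG = 3 * 64 * 64     # flattened 64x64 RGB face crop (illustrative size)
LATENT = 256

def make_pair():
    """One encoder-decoder pair, trained to reconstruct faces of a single identity."""
    encoder = nn.Sequential(nn.Linear(IMG, 1024), nn.ReLU(), nn.Linear(1024, LATENT))
    decoder = nn.Sequential(nn.Linear(LATENT, 1024), nn.ReLU(), nn.Linear(1024, IMG))
    return encoder, decoder

enc_a, dec_a = make_pair()    # pair for person A
enc_b, dec_b = make_pair()    # pair for person B

def reconstruction_step(encoder, decoder, faces, opt):
    """One autoencoder training step (MSE reconstruction) for one identity."""
    opt.zero_grad()
    loss = nn.functional.mse_loss(decoder(encoder(faces)), faces)
    loss.backward()
    opt.step()
    return loss.item()

faces_a = torch.rand(4, IMG)  # stand-in batch of person A's aligned face crops
opt_a = torch.optim.Adam(list(enc_a.parameters()) + list(dec_a.parameters()), lr=1e-3)
reconstruction_step(enc_a, dec_a, faces_a, opt_a)

# Face swap as described above: pass the output of A's encoder to B's decoder,
# so the extracted facial features are rendered by the other identity's decoder.
with torch.no_grad():
    swapped = dec_b(enc_a(faces_a))
print(swapped.shape)  # torch.Size([4, 12288])
```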

9.2 Detecting Synthetic Media

Several techniques have been proposed to detect deepfakes and generative models’ output more generally. These often fall into two broad categories: (1) identify artifacts commonly present in generative models’ output, or (2) detect whether a possibly synthetic image came from a known generative model.

An example of a technique in the first category is Li et al. [139]. The idea is to exploit the fact that many deepfake generators do not properly replicate blinking: in many generated videos, the subject did not blink at all, or blinked only rarely. While this detection method worked for the models they examined (LSTM-RNN), better generative models were then designed to take blinking into account. Since then, there have been similar proposals that use politicians’ facial patterns [4], eye gaze [56], or general biological signals [46] to identify deepfakes.

The second category attempts to leverage knowledge of the model to detect synthetic media. This is related to the attribution problem, which involves attributing synthetic media to the specific model that produced it. Detectors such as GLTR [81] use the likelihood of text under GPT-2 to determine if that text was generated using GPT-2 (a sketch of this idea is given at the end of this subsection). Several papers have proposed methods for inverting images back to the generating model [7] or for finding specific telltale artifacts left behind by certain models [270]. Zhang et al. [275] use entropy measures to estimate whether an image came from one of a set of known models. Chai et al. [34] developed a method that splits an image into patches and uses them to identify the places where a synthetic image is most detectable. This allows defenders to quickly identify artifacts left behind by new models. A major downside of the attribution approach is that the model used must be known. While this assumption may hold for adversaries that use public models, an adversary who trains their own model may be able to avoid detection.

Additionally, some methods do not neatly fall into the two aforementioned categories. For example, Nataraj et al. [168] propose computing co-occurrence matrices from an image, from which they are able to detect whether it was generated with a GAN. Wang et al. [242] show that a detector trained on one image generation model transfers to many other image generators while still maintaining high accuracy.
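The following sketch illustrates the GLTR-style idea [81]: score a passage by how often each token falls within GPT-2’s top predictions for its position, since model-generated text tends to consist of tokens the model itself ranks as highly likely. The top-10 statistic and the absence of a decision threshold are illustrative simplifications rather than GLTR’s exact procedure; the snippet assumes the Hugging Face transformers package is installed.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def top10_fraction(text: str) -> float:
    """Fraction of tokens that fall in GPT-2's top-10 predictions for their position."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits            # shape (1, seq_len, vocab_size)
    in_top10 = 0
    for pos in range(ids.shape[1] - 1):
        next_id = ids[0, pos + 1]             # token actually appearing at position pos+1
        top10 = torch.topk(logits[0, pos], k=10).indices
        in_top10 += int(next_id in top10)
    return in_top10 / max(ids.shape[1] - 1, 1)

score = top10_fraction("The quick brown fox jumps over the lazy dog.")
print(f"fraction of top-10 tokens: {score:.2f}")  # higher values suggest generated text
```

As with other model-specific detectors, this only helps when the text was produced by (or by a model similar to) the scoring model.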

10 DISCUSSION & OPEN PROBLEMS

There are already many applications of generative models to security and privacy. It is expected that as research into generative model architectures leads to improvements, more security-related applications will emerge. In addition, improvements will also enhance existing applications. For example, generative models producing more realistic outputs will improve the security of steganography techniques (Section 7) and the quality of synthetic datasets produced for privacy (Section 4). At the same time, these improvements will also increase the quality of deepfakes and synthetic media, thus making them harder to detect.

Beyond improvements in generative model architectures and techniques, there are important open research problems that deserve discussion.

Evaluating generative models. As we discussed in Section 2.3, there is no one-size-fits-all metric, and each specific application of generative techniques often requires its own set of metrics and evaluation methodology. A single metric or process able to evaluate output realism, similarity to the training dataset, and diversity would vastly simplify the model training process. This problem is exacerbated when considering generative models in a security context, due to the lack of security metrics for some applications. For example, when generative models are used for image-based steganography (Section 7), the goal of the evaluation methodology may be to quantify the ability of an adversary to distinguish between images that have been produced using the generative model (and thus embed some hidden message) and “normal” images. But this depends on the adversarial threat model (i.e., on the assumed capabilities and knowledge of the adversary). Are the images being compared against samples in a dataset of “normal” images? Are they being evaluated by a human? In the first case, image comparison metrics are needed, while in the second, human evaluation metrics would be more appropriate. In contrast, when applying generative models to produce adversarial examples (Section 5.1), important factors may include how well the attacking model represents the target’s latent space, as well as the output diversity and stability. A model that scores high on these metrics would be able to generate a large number of adversarial examples, even if they are not the most realistic. A possible direction for future research is to develop metrics and methodologies for evaluating generative models within the context of specific security-related applications.

Detecting and attributing synthetic media to generative models. A related open problem is the detection of synthetic media and its attribution to specific generative models. It seems natural to think that the more realistic a generative model’s output is, the harder it becomes to distinguish it from real data. However, this is not necessarily true. For instance, take a generative model that exactly reproduces a photo from its training data and only ever produces this specific output; such a model is easy to detect. It is also clear what it cannot produce (i.e., everything else). As described in Section 9, there are several techniques to detect deepfakes on the basis of “artifacts” left during the generation process. However, the limitation of these approaches is that one can (in principle) always train a better generative model that does not produce these artifacts. Thus, defenses in this vein are chasing an ever-moving target. Future research may investigate the possibility of detecting entire classes of generative techniques by the kinds of artifacts they may produce, or examine the relationship between a model’s complexity and the kinds of outputs it can produce.

11 CONCLUSIONS

Generative models have many applications. In this paper, we present a comprehensive survey of their application to security-related problems. We describe the different architectures of generative models and how they apply to different types of data or applications, and we taxonomize metrics of generative model quality. We then turn to examining the use of generative models to enhance, automate, or otherwise facilitate attacks. We also survey their defensive uses and review recent research on the use of generative techniques for privacy-preserving data synthesis, steganography, and fairness. We discuss the new threat of generative models being used to produce deepfakes and other types of synthetic media for the purpose of disinformation. Finally, we highlight some open problems and possible research directions.

REFERENCES [1] Martin Abadi, Andy Chu, , H Brendan McMahan, Ilya Mironov, Kunal Talwar, and Li Zhang. 2016. Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC conference on computer and communications security. 308–318. [2] Nazmiye Ceren Abay, Yan Zhou, Murat Kantarcioglu, Bhavani Thuraisingham, and Latanya Sweeney. 2018. Privacy preserving synthetic data release using deep learning. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, 510–526. [3] Gergely Acs, Luca Melis, Claude Castelluccia, and Emiliano De Cristofaro. 2018. Differentially private mixture of generative neural networks. IEEE Transactions on Knowledge and Data Engineering 31, 6 (2018), 1109–1121. [4] Shruti Agarwal, Hany Farid, Yuming Gu, Mingming He, Koki Nagano, and Hao Li. 2019. Protecting world leaders against deep fakes. In Proceedings of the IEEE Conference on Computer Vision and Workshops. 38–45. [5] Karim Ahmed, Nitish Shirish Keskar, and Richard Socher. 2017. Weighted transformer network for machine translation. arXiv preprint arXiv:1711.02132 (2017). [6] Saifuddin Ahmed. 2021. Who inadvertently shares deepfakes? Analyzing the role of political interest, cognitive ability, and social network size. Telematics and Informatics 57 (2021), 101508. [7] Michael Albright and Scott McCloskey. 2019. Source generator attribution via inversion. In CVPR Workshop on Media Forensics. 96–103. [8] Moustafa Alzantot and Mani Srivastava. 2019. Differential Privacy Synthetic Data Generation using WGANs. https://github.com/nesl/ nist_differential_privacy_synthetic_data_challenge. [9] Hyrum S Anderson, Jonathan Woodbridge, and Bobby Filar. 2016. DeepDGA: Adversarially-tuned domain generation and detection. In Proceedings of the 2016 ACM Workshop on and Security. 13–21. [10] Julia Angwin, Jeff Larson, Surya Mattu, and Lauren Kirchner. 2016. Machine bias. ProPublica, May 23 (2016), 2016. [11] Grigory Antipov, Moez Baccouche, and Jean-Luc Dugelay. 2017. Face aging with conditional generative adversarial networks. In 2017 IEEE international conference on image processing (ICIP). IEEE, 2089–2093. [12] Sercan O Arik, Jitong Chen, Kainan Peng, Wei Ping, and Yanqi Zhou. 2018. Neural voice cloning with a few samples. arXiv preprint arXiv:1802.06006 (2018). [13] Martin Arjovsky, Soumith Chintala, and Léon Bottou. 2017. Wasserstein generative adversarial networks. In Proceedings of the 34th International Conference on Machine Learning-Volume 70. 214–223. [14] Sanjeev Arora and Yi Zhang. 2017. Do gans actually learn the distribution? an empirical study. arXiv preprint arXiv:1706.08224 (2017). [15] Anish Athalye, Logan Engstrom, Andrew Ilyas, and Kevin Kwok. 2018. Synthesizing robust adversarial examples. In International conference on machine learning. PMLR, 284–293. [16] Sean Augenstein, H Brendan McMahan, Daniel Ramage, Swaroop Ramaswamy, Peter Kairouz, Mingqing Chen, Rajiv Mathews, et al. 2019. Generative models for effective ml on private, decentralized datasets. arXiv preprint arXiv:1911.06679 (2019). [17] Shumeet Baluja. 2017. Hiding images in plain sight: Deep steganography. In Advances in Neural Information Processing Systems. 2069–2079. [18] Aayush Bansal, Shugao Ma, Deva Ramanan, and Yaser Sheikh. 2018. Recycle-gan: Unsupervised video retargeting. In Proceedings of the European conference on computer vision (ECCV). 119–135. [19] Ruying Bao, Sihang Liang, and Qingcan Wang. 2018. 
Featurized bidirectional gan: Adversarial defense via adversarially learned semantic inference. arXiv preprint arXiv:1805.07862 (2018). [20] David Bau, Jun-Yan Zhu, Jonas Wulff, William Peebles, Hendrik Strobelt, Bolei Zhou, and Antonio Torralba. 2019. Seeing what aGANcannot generate. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 4502–4511. [21] Brett K Beaulieu-Jones, Zhiwei Steven Wu, Chris Williams, Ran Lee, Sanjeev P Bhavnani, James Brian Byrd, and Casey S Greene. 2019. Privacy- preserving generative deep neural networks support clinical data sharing. Circulation: Cardiovascular Quality and Outcomes 12, 7 (2019), e005122. [22] Walter Bender, Daniel Gruhl, Norishige Morimoto, and Anthony Lu. 1996. Techniques for data hiding. IBM systems journal 35, 3.4 (1996), 313–336. [23] Krista Bennett. 2004. Linguistic steganography: Survey, analysis, and robustness concerns for hiding information in text. (2004). [24] Vincent Bindschaedler, Reza Shokri, and Carl A Gunter. 2017. Plausible Deniability for Privacy-Preserving Data Synthesis. Proceedings of the VLDB Endowment 10, 5 (2017). [25] Mikołaj Bińkowski, Jeff Donahue, Sander Dieleman, Aidan Clark, Erich Elsen, Norman Casagrande, Luis C Cobo, and Karen Simonyan. 2019.High Fidelity Speech Synthesis with Adversarial Networks. In International Conference on Learning Representations. [26] Emily Black, Samuel Yeom, and Matt Fredrikson. 2020. FlipTest: fairness testing via optimal transport. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency. 111–121. [27] David M Blei, Alp Kucukelbir, and Jon D McAuliffe. 2017. Variational inference: A review for statisticians. Journal of the American statistical Association 112, 518 (2017), 859–877. [28] Philip Bontrager, Aditi Roy, Julian Togelius, Nasir Memon, and Arun Ross. 2018. Deepmasterprints: Generating masterprints for dictionary attacks via latent variable evolution. In 2018 IEEE 9th International Conference on Biometrics Theory, Applications and Systems (BTAS). IEEE, 1–9. [29] Ali Borji. 2019. Pros and cons of gan evaluation measures. Computer Vision and Image Understanding 179 (2019), 41–65. [30] Andrew Brock, Jeff Donahue, and Karen Simonyan. 2018. Large Scale GAN Training for High Fidelity Natural Image Synthesis. In International Conference on Learning Representations. Generative Models for Security: Attacks, Defenses, and Opportunities 21

[31] Tom B Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. 2020. Language models are few-shot learners. arXiv preprint arXiv:2005.14165 (2020). [32] Yuri Burda, Roger B Grosse, and Ruslan Salakhutdinov. 2016. Importance Weighted Autoencoders. In ICLR (Poster). [33] Toon Calders, Faisal Kamiran, and Mykola Pechenizkiy. 2009. Building classifiers with independency constraints. In 2009 IEEE International Conference on Data Mining Workshops. IEEE, 13–18. [34] Lucy Chai, David Bau, Ser-Nam Lim, and Phillip Isola. 2020. What makes fake images detectable? Understanding properties that generalize. In European Conference on Computer Vision. Springer, 103–120. [35] Anirban Chakraborty, Manaar Alam, Vishal Dey, Anupam Chattopadhyay, and Debdeep Mukhopadhyay. 2018. Adversarial attacks and defences: A survey. arXiv preprint arXiv:1810.00069 (2018). [36] Sarin E Chandy, Amin Rasekh, Zachary A Barker, and M Ehsan Shafiee. 2019. Cyberattack detection using deep generative models with variational inference. Journal of Water Resources Planning and Management 145, 2 (2019), 04018093. [37] Kejiang Chen, Hang Zhou, Hanqing Zhao, Dongdong Chen, Weiming Zhang, and Nenghai Yu. 2021. Distribution-Preserving Steganography Based on Text-to-Speech Generative Models. IEEE Transactions on Dependable and Secure Computing (2021). [38] Liqun Chen, Shuyang Dai, Chenyang Tao, Haichao Zhang, Zhe Gan, Dinghan Shen, Yizhe Zhang, Guoyin Wang, Ruiyi Zhang, and Lawrence Carin. 2018. Adversarial text generation via feature-mover’s distance. In Advances in Neural Information Processing Systems. 4666–4677. [39] Ling-Hui Chen, Zhen-Hua Ling, Li-Juan Liu, and Li-Rong Dai. 2014. Voice conversion using deep neural networks with layer-wise generative training. IEEE/ACM Transactions on Audio, Speech, and Language Processing 22, 12 (2014), 1859–1872. [40] Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya Sutskever, and Pieter Abbeel. 2016. Infogan: Interpretable representation learning by information maximizing generative adversarial nets. In Advances in neural information processing systems. 2172–2180. [41] Xinyun Chen, Chang Liu, Bo Li, Kimberly Lu, and Dawn Song. 2017. Targeted backdoor attacks on deep learning systems using data poisoning. arXiv preprint arXiv:1712.05526 (2017). [42] Xi Chen, Nikhil Mishra, Mostafa Rohaninejad, and Pieter Abbeel. 2018. Pixelsnail: An improved autoregressive generative model. In International Conference on Machine Learning. 864–872. [43] Kyunghyun Cho, Bart Van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and . 2014. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078 (2014). [44] Jon Christian. 2018. Lawmakers: Deepfakes Could “Undermine Public Trust” in “Objective Depictions of Reality”. https://futurism.com/the- byte/lawmakers-deepfakes-undermine-public-trust. [45] Casey Chu, Andrey Zhmoginov, and Mark Sandler. 2017. Cyclegan, a master of steganography. arXiv preprint arXiv:1712.02950 (2017). [46] Umur Aybars Ciftci, Ilke Demir, and Lijun Yin. 2020. Fakecatcher: Detection of synthetic portrait videos using biological signals. IEEE Transactions on Pattern Analysis and Machine Intelligence (2020). [47] Jack Clark. 2020. GPT-2: 6-Month Follow-Up. https://openai.com/blog/gpt-2-6-month-follow-up/. [48] Ronan Collobert and Jason Weston. 2008. 
A unified architecture for natural language processing: Deep neural networks with multitask learning. In Proceedings of the 25th international conference on Machine learning. 160–167. [49] Sam Corbett-Davies, Emma Pierson, Avi Feller, Sharad Goel, and Aziz Huq. 2017. Algorithmic decision making and the cost of fairness. In Proceedings of the 23rd acm sigkdd international conference on knowledge discovery and data mining. 797–806. [50] Isaac Corley, Jonathan Lwowski, and Justin Hoffman. 2019. DomainGAN: Generating Adversarial Examples to Attack Domain Generation Classifiers. arXiv preprint arXiv:1911.06285 (2019). [51] Qi Cui, Zhili Zhou, Zhangjie Fu, Ruohan Meng, Xingming Sun, and QM Jonathan Wu. 2019. Image steganography based on foreground object generation by generative adversarial networks in mobile edge computing with Internet of Things. IEEE Access 7 (2019), 90815–90824. [52] Cara Curtis. 2019. California makes deepfakes illegal to curb revenge porn and doctored political videos. https://thenextweb.com/tech/2019/10/07/ california-makes-deepfakes-illegal-to-curb-revenge-porn-and-doctored-political-videos/. [53] Zihang Dai, Zhilin Yang, Yiming Yang, Jaime G Carbonell, Quoc Le, and Ruslan Salakhutdinov. 2019. Transformer-XL: Attentive Language Models beyond a Fixed-Length Context. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 2978–2988. [54] Sejuti Das. 2020. How OpenAI’s GPT-3 Can Be Alarming For The Society. https://analyticsindiamag.com/how-openais-gpt-3-can-be-alarming- for-the-society/. [55] Mostafa Dehghani, Stephan Gouws, Oriol Vinyals, Jakob Uszkoreit, and Łukasz Kaiser. 2018. Universal transformers. arXiv preprint arXiv:1807.03819 (2018). [56] Ilke Demir and Umur A Ciftci. 2021. Where Do Deep Fakes Look? Synthetic Face Detection via Gaze Tracking. arXiv preprint arXiv:2101.01165 (2021). [57] Michael Denkowski and Alon Lavie. 2011. Meteor 1.3: Automatic metric for reliable optimization and evaluation of machine translation systems. In Proceedings of the sixth workshop on statistical machine translation. Association for Computational Linguistics, 85–91. [58] Emily Denton, Ben Hutchinson, Margaret Mitchell, and Timnit Gebru. 2019. Detecting bias with generative counterfactual face attribute augmentation. arXiv preprint arXiv:1906.06439 (2019). [59] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018). 22 Bauer and Bindschaedler

[60] William Dieterich, Christina Mendoza, and Tim Brennan. 2016. COMPAS risk scales: Demonstrating accuracy equity and predictive parity. Northpointe Inc (2016). [61] Chris Donahue, Julian McAuley, and Miller Puckette. 2018. Adversarial Audio Synthesis. In International Conference on Learning Representations. [62] Jeff Donahue, Philipp Krähenbühl, and Trevor Darrell. 2016. Adversarial feature learning. arXiv preprint arXiv:1605.09782 (2016). [63] Hao-Wen Dong, Wen-Yi Hsiao, Li-Chia Yang, and Yi-Hsuan Yang. 2018. Musegan: Multi-track sequential generative adversarial networks for symbolic music generation and accompaniment. In Thirty-Second AAAI Conference on Artificial Intelligence. [64] Alexey Dosovitskiy and Thomas Brox. 2016. Generating images with perceptual similarity metrics based on deep networks. In Advances in neural information processing systems. 658–666. [65] Cynthia Dwork. 2008. Differential privacy: A survey of results. In International conference on theory and applications of models of computation. Springer, 1–19. [66] Cynthia Dwork, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Richard Zemel. 2012. Fairness through awareness. In Proceedings of the 3rd innovations in theoretical computer science conference. 214–226. [67] Javid Ebrahimi, Anyi Rao, Daniel Lowd, and Dejing Dou. 2018. HotFlip: White-Box Adversarial Examples for Text Classification. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). 31–36. [68] Jesse Engel, Kumar Krishna Agrawal, Shuo Chen, Ishaan Gulrajani, Chris Donahue, and Adam Roberts. 2018. GANSynth: Adversarial Neural Audio Synthesis. In International Conference on Learning Representations. [69] Jesse Engel, Matthew Hoffman, and Adam Roberts. 2018. Latent Constraints: Learning to Generate Conditionally from Unconditional Generative Models. In International Conference on Learning Representations. [70] GPT3 Examples. 2020. https://gpt3examples.com/. [71] Kevin Eykholt, Ivan Evtimov, Earlence Fernandes, Bo Li, Amir Rahmati, Chaowei Xiao, Atul Prakash, Tadayoshi Kohno, and Dawn Song. 2018. Robust physical-world attacks on deep learning visual classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1625–1634. [72] Nicolas Falliere, Liam O Murchu, and Eric Chien. 2011. W32. stuxnet dossier. White paper, Symantec Corp., Security Response 5, 6 (2011), 29. [73] Don Fallis. 2020. The Epistemic Threat of Deepfakes. Philosophy & Technology (2020), 1–21. [74] William Fedus, Ian Goodfellow, and Andrew M Dai. 2018. MaskGAN: Better Text Generation via Filling in the _. In International Conference on Learning Representations. [75] Cheng Feng, Tingting Li, Zhanxing Zhu, and Deeph Chana. 2017. A deep learning-based framework for conducting stealthy attacks in industrial control systems. arXiv preprint arXiv:1709.06397 (2017). [76] Robert Fortet and Edith Mourier. 1953. Convergence de la répartition empirique vers la répartition théorique. In Annales scientifiques de l’École Normale Supérieure, Vol. 70. 267–285. [77] Matt Fredrikson, Somesh Jha, and Thomas Ristenpart. 2015. Model inversion attacks that exploit confidence information and basic countermeasures. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security. 1322–1333. [78] Jessica Fridrich and Jan Kodovsky. 2012. Rich models for steganalysis of digital images. IEEE Transactions on Information Forensics and Security 7, 3 (2012), 868–882. 
[79] Lorenzo Frigerio, Anderson Santana de Oliveira, Laurent Gomez, and Patrick Duverger. 2019. Differentially private generative adversarial networks for time series, continuous, and discrete open data. In IFIP International Conference on ICT Systems Security and Privacy Protection. Springer, 151–164. [80] Yang Gao, Rita Singh, and Bhiksha Raj. 2018. Voice impersonation using generative adversarial networks. In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2506–2510. [81] Sebastian Gehrmann, Hendrik Strobelt, and Alexander Rush. 2019. GLTR: Statistical Detection and Visualization of Generated Text. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations. Association for Computational Linguistics, Florence, Italy, 111–116. https://doi.org/10.18653/v1/P19-3019 [82] Felix A Gers, Jürgen Schmidhuber, and Fred Cummins. 2000. Learning to Forget: Continual Prediction with LSTM. Neural Computation 12, 10 (2000), 2451–2471. [83] Aidan N Gomez, Mengye Ren, Raquel Urtasun, and Roger B Grosse. 2017. The reversible residual network: Backpropagation without storing activations. In Advances in neural information processing systems. 2214–2224. [84] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. Generative adversarial nets. In Advances in neural information processing systems. 2672–2680. [85] Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. 2014. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572 (2014). [86] Alex Graves. 2013. Generating sequences with recurrent neural networks. arXiv preprint arXiv:1308.0850 (2013). [87] Alex Graves, Navdeep Jaitly, and Abdel-rahman Mohamed. 2013. Hybrid speech recognition with deep bidirectional LSTM. In 2013 IEEE workshop on automatic speech recognition and understanding. IEEE, 273–278. [88] Karol Gregor, Ivo Danihelka, Alex Graves, Danilo Jimenez Rezende, and Daan Wierstra. 2015. DRAW: A Recurrent Neural Network For Image Generation. In ICML. [89] Tianyu Gu, Kang Liu, Brendan Dolan-Gavitt, and Siddharth Garg. 2019. Badnets: Evaluating backdooring attacks on deep neural networks. IEEE Access 7 (2019), 47230–47244. Generative Models for Security: Attacks, Defenses, and Opportunities 23

[90] Moritz Hardt, Eric Price, and Nati Srebro. 2016. Equality of opportunity in supervised learning. In Advances in neural information processing systems. 3315–3323. [91] Anne-Wil Harzing. 2020. Publish or Perish. https://harzing.com/resources/publish-or-perish. [92] Mohamad H Hassoun et al. 1995. Fundamentals of artificial neural networks. MIT press. [93] Jamie Hayes and George Danezis. 2017. Generating steganographic images via adversarial training. arXiv preprint arXiv:1703.00371 (2017). [94] Jamie Hayes, Luca Melis, George Danezis, and Emiliano De Cristofaro. 2019. LOGAN: Membership inference attacks against generative models. Proceedings on Privacy Enhancing Technologies 2019, 1 (2019), 133–152. [95] Jiawei He, Andreas Lehrmann, Joseph Marino, Greg Mori, and Leonid Sigal. 2018. Probabilistic video generation using holistic attribute control. In Proceedings of the European Conference on Computer Vision (ECCV). 452–467. [96] Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. 2017. Gans trained by a two time-scale update rule converge to a local nash equilibrium. In Advances in neural information processing systems. 6626–6637. [97] Irina Higgins, Loic Matthey, Arka Pal, Christopher Burgess, Xavier Glorot, Matthew Botvinick, Shakir Mohamed, and Alexander Lerchner. 2017. beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework. Iclr 2, 5 (2017), 6. [98] Benjamin Hilprecht, Martin Härterich, and Daniel Bernau. 2019. Monte carlo and reconstruction membership inference attacks against generative models. Proceedings on Privacy Enhancing Technologies 2019, 4 (2019), 232–249. [99] Briland Hitaj, Paolo Gasti, Giuseppe Ateniese, and Fernando Perez-Cruz. 2019. Passgan: A deep learning approach for password guessing. In International Conference on Applied Cryptography and Network Security. Springer, 217–237. [100] Vojtěch Holub and Jessica Fridrich. 2012. Designing steganographic distortion using directional filters. In 2012 IEEE International workshop on information forensics and security (WIFS). IEEE, 234–239. [101] Vojtěch Holub, Jessica Fridrich, and Tomáš Denemark. 2014. Universal distortion function for steganography in an arbitrary domain. EURASIP Journal on Information Security 2014, 1 (2014), 1. [102] Xianxu Hou, Linlin Shen, Ke Sun, and Guoping Qiu. 2017. Deep feature consistent variational autoencoder. In 2017 IEEE Winter Conference on Applications of Computer Vision (WACV). IEEE, 1133–1141. [103] Donghui Hu, Liang Wang, Wenjie Jiang, Shuli Zheng, and Bin Li. 2018. A novel image steganography method via deep convolutional generative adversarial networks. IEEE Access 6 (2018), 38303–38314. [104] Weiwei Hu and Ying Tan. 2017. Black-box attacks against RNN based malware detection algorithms. arXiv preprint arXiv:1705.08131 (2017). [105] Weiwei Hu and Ying Tan. 2017. Generating adversarial malware examples for black-box attacks based on gan. arXiv preprint arXiv:1702.05983 (2017). [106] Zhiting Hu, Zichao Yang, Xiaodan Liang, Ruslan Salakhutdinov, and Eric P Xing. 2017. Toward controlled generation of text. In Proceedings of the 34th International Conference on Machine Learning-Volume 70. JMLR. org, 1587–1596. [107] Cheng-Zhi Anna Huang, Ashish Vaswani, Jakob Uszkoreit, Noam Shazeer, Ian Simon, Curtis Hawthorne, Andrew M Dai, Matthew D Hoffman, Monica Dinculescu, and Douglas Eck. 2018. Music transformer. arXiv preprint arXiv:1809.04281 (2018). 
[108] Ling Huang, Anthony D Joseph, Blaine Nelson, Benjamin IP Rubinstein, and J Doug Tygar. 2011. Adversarial machine learning. In Proceedings of the 4th ACM workshop on Security and artificial intelligence. 43–58. [109] Daniel Jiwoong Im, Chris Dongjoo Kim, Hui Jiang, and Roland Memisevic. 2016. Generating images with recurrent adversarial networks. arXiv preprint arXiv:1602.05110 (2016). [110] Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A Efros. 2017. Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE conference on computer vision and pattern recognition. 1125–1134. [111] Ajay Jain, Pieter Abbeel, and Deepak Pathak. 2020. Locally Masked for Autoregressive Models. In Conference on Uncertainty in Artificial Intelligence. PMLR, 1358–1367. [112] Ajil Jalal, Andrew Ilyas, Constantinos Daskalakis, and Alexandros G Dimakis. 2017. The robust manifold defense: Adversarial training using generative models. arXiv preprint arXiv:1712.09196 (2017). [113] Fred Jelinek, Robert L Mercer, Lalit R Bahl, and James K Baker. 1977. Perplexity—a measure of the difficulty of speech recognition tasks. The Journal of the Acoustical Society of America 62, S1 (1977), S63–S63. [114] Shunzhi Jiang, Dengpan Ye, Jiaqing Huang, Yueyun Shang, and Zhuoyuan Zheng. 2020. SmartSteganogaphy: Light-weight generative audio steganography model for smart embedding application. Journal of Network and Computer Applications 165 (2020), 102689. [115] Guoqing Jin, Shiwei Shen, Dongming Zhang, Feng Dai, and Yongdong Zhang. 2019. APE-GAN: Adversarial perturbation elimination with GAN. In ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 3842–3846. [116] James Jordon, Jinsung Yoon, and Mihaela van der Schaar. 2018. PATE-GAN: Generating synthetic data with differential privacy guarantees. In International Conference on Learning Representations. [117] Ari Juels and Ronald L Rivest. 2013. Honeywords: Making password-cracking detectable. In Proceedings of the 2013 ACM SIGSAC conference on Computer & communications security. 145–160. [118] Nathan Kallus and Angela Zhou. 2018. Residual unfairness in fair machine learning from prejudiced data. In International Conference on Machine Learning. PMLR, 2439–2448. [119] Tero Karras, Timo Aila, Samuli Laine, and Jaakko Lehtinen. 2018. Progressive Growing of GANs for Improved Quality, Stability, and Variation. In International Conference on Learning Representations. 24 Bauer and Bindschaedler

[120] Tero Karras, Samuli Laine, and Timo Aila. 2019. A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE conference on computer vision and pattern recognition. 4401–4410. [121] Nitish Shirish Keskar, Bryan McCann, Lav R Varshney, Caiming Xiong, and Richard Socher. 2019. Ctrl: A conditional transformer language model for controllable generation. arXiv preprint arXiv:1909.05858 (2019). [122] Kevin Kilgour, Mauricio Zuluaga, Dominik Roblek, and Matthew Sharifi. 2018. Fréchet Audio Distance: A Metric for Evaluating Music Enhancement Algorithms. arXiv:1812.08466 [eess.AS] [123] Jin-Young Kim, Seok-Jun Bu, and Sung-Bae Cho. 2018. Zero-day malware detection using transferred generative adversarial networks based on deep autoencoders. Information Sciences 460 (2018), 83–102. [124] Jin-Young Kim and Sung-Bae Cho. 2018. Detecting intrusive malware with a hybrid generative deep learning model. In International Conference on Intelligent Data Engineering and Automated Learning. Springer, 499–507. [125] Diederik P Kingma and Max Welling. 2013. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114 (2013). [126] Kirsten Korosec. 2019. ’Deepfake’ revenge porn is now illegal in Virginia. https://techcrunch.com/2019/07/01/deepfake-revenge-porn-is-now- illegal-in-virginia/. [127] Jernej Kos, Ian Fischer, and Dawn Song. 2018. Adversarial examples for generative models. In 2018 IEEE Security and Privacy Workshops (SPW). IEEE, 36–42. [128] Felix Kreuk, Yossi Adi, Bhiksha Raj, Rita Singh, and Joseph Keshet. 2019. Hide and speak: Deep neural networks for speech steganography. arXiv preprint arXiv:1902.03083 (2019). [129] Ben Kröse, Ben Krose, Patrick van der Smagt, and Patrick Smagt. 1993. An introduction to neural networks. (1993). [130] Kundan Kumar, Rithesh Kumar, Thibault de Boissiere, Lucas Gestin, Wei Zhen Teoh, Jose Sotelo, Alexandre de Brébisson, Yoshua Bengio, and Aaron Courville. 2019. Melgan: Generative adversarial networks for conditional waveform synthesis. arXiv preprint arXiv:1910.06711 (2019). [131] Matt J Kusner and José Miguel Hernández-Lobato. 2016. Gans for sequences of discrete elements with the gumbel-softmax distribution. arXiv preprint arXiv:1611.04051 (2016). [132] Matt J Kusner, Joshua R Loftus, Chris Russell, and Ricardo Silva. 2017. Counterfactual fairness. arXiv preprint arXiv:1703.06856 (2017). [133] Anders Boesen Lindbo Larsen, Søren Kaae Sønderby, Hugo Larochelle, and Ole Winther. 2016. Autoencoding beyond pixels using a learned similarity metric. http://proceedings.mlr.press/v48/larsen16.html. In Proceedings of The 33rd International Conference on Machine Learning (Proceedings of Machine Learning Research, Vol. 48), Maria Florina Balcan and Kilian Q. Weinberger (Eds.). PMLR, New York, New York, USA, 1558–1566. [134] Hyeungill Lee, Sungyeob Han, and Jungwoo Lee. 2017. Generative adversarial trainer: Defense to adversarial perturbations with gan. arXiv preprint arXiv:1705.03387 (2017). [135] Erich L Lehmann and Joseph P Romano. 2006. Testing statistical hypotheses. Springer Science & Business Media. [136] Klas Leino and Matt Fredrikson. 2020. Stolen memories: Leveraging model memorization for calibrated white-box membership inference. In 29th {USENIX} Security Symposium ({USENIX} Security 20). 1605–1622. [137] Jun Li, Ke Niu, Liwei Liao, Lijie Wang, Jia Liu, Yu Lei, and Minqing Zhang. 2020. A Generative Steganography Method Based on WGAN-GP. In International Conference on Artificial Intelligence and Security. 
Springer, 386–397. [138] Naihan Li, Shujie Liu, Yanqing Liu, Sheng Zhao, and Ming Liu. 2019. Neural speech synthesis with transformer network. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33. 6706–6713. [139] Yuezun Li, Ming-Ching Chang, and Siwei Lyu. 2018. In ictu oculi: Exposing ai created fake videos by detecting eye blinking. In 2018 IEEE International Workshop on Information Forensics and Security (WIFS). IEEE, 1–7. [140] Xiaodan Liang, Lisa Lee, Wei Dai, and Eric P Xing. 2017. Dual motion GAN for future-flow embedded video prediction. In Proceedings of the IEEE International Conference on Computer Vision. 1744–1752. [141] Chin-Yew Lin and Franz Josef Och. 2004. Automatic evaluation of machine translation quality using longest common subsequence and skip-bigram statistics. In Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics. Association for Computational Linguistics, 605. [142] Zilong Lin, Yong Shi, and Zhi Xue. 2018. Idsgan: Generative adversarial networks for attack generation against intrusion detection. arXiv preprint arXiv:1809.02077 (2018). [143] Ming-ming Liu, Min-qing Zhang, Jia Liu, Ying-nan Zhang, and Yan Ke. 2017. Coverless information hiding based on generative adversarial networks. arXiv preprint arXiv:1712.06951 (2017). [144] Tianyu Liu, Kexiang Wang, Lei Sha, Baobao Chang, and Zhifang Sui. 2018. Table-to-text generation by structure-aware seq2seq learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 32. [145] Yingqi Liu, Shiqing Ma, Yousra Aafer, Wen-Chuan Lee, Juan Zhai, Weihang Wang, and Xiangyu Zhang. 2017. Trojaning attack on neural networks. (2017). [146] Yunhui Long, Vincent Bindschaedler, Lei Wang, Diyue Bu, Xiaofeng Wang, Haixu Tang, Carl A Gunter, and Kai Chen. 2018. Understanding membership inferences on well-generalized learning models. arXiv preprint arXiv:1802.04889 (2018). [147] Yunhui Long, Suxin Lin, Zhuolin Yang, Carl A Gunter, and Bo Li. 2019. Scalable differentially private generative student model via pate. arXiv preprint arXiv:1906.09338 (2019). [148] Yunhui Long, Lei Wang, Diyue Bu, Vincent Bindschaedler, Xiaofeng Wang, Haixu Tang, Carl A Gunter, and Kai Chen. 2020. A Pragmatic Approach to Membership Inferences on Machine Learning Models. In 5th IEEE European Symposium on Security and Privacy, Euro S and P 2020. Institute of Electrical and Electronics Engineers Inc., 521–534. Generative Models for Security: Attacks, Defenses, and Opportunities 25

[149] Daniel P Lopresti and Jarret D Raim. 2005. The effectiveness of generative attacks on an online handwriting biometric. In International Conference on Audio-and Video-Based Biometric Person Authentication. Springer, 1090–1099. [150] Christos Louizos, Kevin Swersky, Yujia Li, Max Welling, and Richard Zemel. 2015. The variational fair autoencoder. arXiv preprint arXiv:1511.00830 (2015). [151] Mario Lucic, Karol Kurach, Marcin Michalski, Sylvain Gelly, and Olivier Bousquet. 2018. Are gans created equal? a large-scale study. In Advances in neural information processing systems. 700–709. [152] Yubo Luo and Yongfeng Huang. 2017. Text steganography with high embedding rate: Using recurrent neural networks to generate chinese classic poetry. In Proceedings of the 5th ACM workshop on information hiding and multimedia security. 99–104. [153] Adam Lutz, Victor F Sansing III, Waleed E Farag, and Soundararajan Ezekiel. 2019. Malware classification using fusion of neural networks. In Disruptive Technologies in Information Sciences II, Vol. 11013. International Society for Optics and Photonics, 110130X. [154] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. 2018. Towards Deep Learning Models Resistant to Adversarial Attacks. In International Conference on Learning Representations. [155] Alireza Makhzani, Jonathon Shlens, Navdeep Jaitly, Ian Goodfellow, and Brendan Frey. 2015. Adversarial autoencoders. arXiv preprint arXiv:1511.05644 (2015). [156] Carly Mallenbaum. 2019. Bill Hader has NOT seen those deepfake videos he’s in: ’It’s a weird technology, man’. https://www.usatoday.com/story/ entertainment/movies/2019/09/02/bill-hader-has-not-seen-those-deepfake-videos/2145558001/. [157] Xudong Mao, Qing Li, Haoran Xie, Raymond YK Lau, Zhen Wang, and Stephen Paul Smolley. 2017. Least squares generative adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision. 2794–2802. [158] ML Mat Kiah, BB Zaidan, AA Zaidan, A Mohammed Ahmed, and Sameer Hasan Al-Bakri. 2011. A review of audio based steganography and digital watermarking. International Journal of Physical Sciences 6, 16 (2011), 3837–3850. [159] Tomáš Mikolov, Martin Karafiát, Lukáš Burget, Jan Černocky,` and Sanjeev Khudanpur. 2010. Recurrent neural network based language model. In Eleventh annual conference of the international speech communication association. [160] Shervin Minaee and Amirali Abdolrashidi. 2018. Finger-gan: Generating realistic fingerprint images using connectivity imposed gan. arXiv preprint arXiv:1812.10482 (2018). [161] Alistair Moffat. 2019. Huffman coding. ACM Computing Surveys (CSUR) 52, 4 (2019), 1–35. [162] Olof Mogren. 2016. C-RNN-GAN: Continuous recurrent neural networks with adversarial training. arXiv preprint arXiv:1611.09904 (2016). [163] Hadi Mohaghegh Dolatabadi, Sarah Erfani, and Christopher Leckie. 2020. AdvFlow: Inconspicuous Black-box Adversarial Attacks using Normalizing Flows. In Advances in Neural Information Processing Systems, H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, and H. Lin (Eds.), Vol. 33. Curran Associates, Inc., 15871–15884. https://proceedings.neurips.cc/paper/2020/file/b6cf334c22c8f4ce8eb920bb7b512ed0-Paper.pdf [164] John V Monaco, Md Liakat Ali, and Charles C Tappert. 2015. Spoofing key-press latencies with a generative keystroke dynamics model. In 2015 IEEE 7th international conference on biometrics theory, applications and systems (BTAS). IEEE, 1–8. [165] H Hernan Moraldo. 2014. 
An Approach for text steganography based on Markov Chains. arXiv preprint arXiv:1409.0915 (2014). [166] Sungyup Nam, Seungho Jeon, Hongkyo Kim, and Jongsub Moon. 2020. Recurrent gans password cracker for iot password security enhancement. Sensors 20, 11 (2020), 3106. [167] Milad Nasr, Reza Shokri, et al. 2020. Improving Deep Learning with Differential Privacy using Gradient Encoding and Denoising. arXiv preprint arXiv:2007.11524 (2020). [168] Lakshmanan Nataraj, Tajuddin Manhar Mohammed, BS Manjunath, Shivkumar Chandrasekaran, Arjuna Flenner, Jawadul H Bappy, and Amit K Roy-Chowdhury. 2019. Detecting GAN generated fake images using co-occurrence matrices. Electronic Imaging 2019, 5 (2019), 532–1. [169] Thanh Thi Nguyen, Cuong M Nguyen, Dung Tien Nguyen, Duc Thanh Nguyen, and Saeid Nahavandi. 2019. Deep Learning for Deepfakes Creation and Detection. arXiv preprint arXiv:1909.11573 (2019). [170] Yuval Nirkin, Yosi Keller, and Tal Hassner. 2019. FSGAN: Subject agnostic face swapping and reenactment. In Proceedings of the IEEE international conference on computer vision. 7184–7193. [171] Augustus Odena, Christopher Olah, and Jonathon Shlens. 2017. Conditional image synthesis with auxiliary classifier gans. In Proceedings of the 34th International Conference on Machine Learning-Volume 70. JMLR. org, 2642–2651. [172] Catherine Olsson, Surya Bhupatiraju, Tom Brown, Augustus Odena, and Ian Goodfellow. 2018. Skill rating for generative models. arXiv preprint arXiv:1808.04888 (2018). [173] Aaron van den Oord, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves, Nal Kalchbrenner, Andrew Senior, and Koray Kavukcuoglu. 2016. Wavenet: A generative model for raw audio. arXiv preprint arXiv:1609.03499 (2016). [174] Timothy J O’Shea, Latha Pemula, Dhruv Batra, and T Charles Clancy. 2016. Radio transformer networks: Attention models for learning to synchronize in wireless systems. In 2016 50th Asilomar Conference on Signals, Systems and Computers. IEEE, 662–666. [175] Zhaoqing Pan, Weijie Yu, Xiaokai Yi, Asifullah Khan, Feng Yuan, and Yuhui Zheng. 2019. Recent progress on generative adversarial networks (GANs): A survey. IEEE Access 7 (2019), 36322–36333. [176] Nicolas Papernot, Martín Abadi, Ulfar Erlingsson, Ian Goodfellow, and Kunal Talwar. 2016. Semi-supervised knowledge transfer for deep learning from private training data. arXiv preprint arXiv:1610.05755 (2016). [177] Nicolas Papernot, Patrick McDaniel, Ian Goodfellow, Somesh Jha, Z Berkay Celik, and Ananthram Swami. 2017. Practical black-box attacks against machine learning. In Proceedings of the 2017 ACM on Asia conference on computer and communications security. 506–519. 26 Bauer and Bindschaedler

[178] Nicolas Papernot, Shuang Song, Ilya Mironov, Ananth Raghunathan, Kunal Talwar, and Úlfar Erlingsson. 2018. Scalable private learning with pate. arXiv preprint arXiv:1802.08908 (2018). [179] Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th annual meeting on association for computational linguistics. Association for Computational Linguistics, 311–318. [180] Sean Park, Iqbal Gondal, Joarder Kamruzzaman, and Jon Oliver. 2019. Generative malware outbreak detection. In 2019 IEEE International Conference on Industrial Technology (ICIT). IEEE, 1149–1154. [181] Niki Parmar, Ashish Vaswani, Jakob Uszkoreit, Lukasz Kaiser, Noam Shazeer, Alexander Ku, and Dustin Tran. 2018. Image Transformer. In International Conference on Machine Learning. 4055–4064. [182] Razvan Pascanu, Tomas Mikolov, and Yoshua Bengio. 2013. On the difficulty of training recurrent neural networks. In International conference on machine learning. 1310–1318. [183] Ivan Petrov, Daiheng Gao, Nikolay Chervoniy, Kunlin Liu, Sugasa Marangonda, Chris Umé, Jian Jiang, Luis RP, Sheng Zhang, Pingyu Wu, et al. 2020. DeepFaceLab: A simple, flexible and extensible face swapping framework. arXiv preprint arXiv:2005.05535 (2020). [184] Tomáš Pevny,` Tomáš Filler, and Patrick Bas. 2010. Using high-dimensional image models to perform highly undetectable steganography. In International Workshop on Information Hiding. Springer, 161–177. [185] Lionel Pibre, Jérôme Pasquet, Dino Ienco, and Marc Chaumont. 2016. Deep learning is a good steganalysis tool when embedding key is reused for different images, even if there is a cover sourcemismatch. Electronic Imaging 2016, 8 (2016), 1–11. [186] Haoyue Ping, Julia Stoyanovich, and Bill Howe. 2017. Datasynthesizer: Privacy-preserving synthetic datasets. In Proceedings of the 29th International Conference on Scientific and Statistical Database Management. 1–5. [187] Lip Y Por, TF Ang, and B Delina. 2008. Whitesteg: a new scheme in information hiding using text steganography. WSEAS transactions on computers 7, 6 (2008), 735–745. [188] Liam Porr. 2020. My GPT-3 Blog Got 26 Thousand Visitors in 2 Weeks. https://liamp.substack.com/p/my-gpt-3-blog-got-26-thousand-visitors. [189] Yinlong Qian, Jing Dong, Wei Wang, and Tieniu Tan. 2015. Deep learning for steganalysis via convolutional neural networks. In Media Watermarking, Security, and Forensics 2015, Vol. 9409. International Society for Optics and Photonics, 94090J. [190] Alec Radford, Luke Metz, and Soumith Chintala. 2015. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434 (2015). [191] Alec Radford, Jeff Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. Language Models are Unsupervised Multitask Learners. (2019). [192] Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J Liu. 2019.Exploring the limits of transfer learning with a unified text-to-text transformer. arXiv preprint arXiv:1910.10683 (2019). [193] Ali Razavi, Aaron van den Oord, and Oriol Vinyals. 2019. Generating diverse high-fidelity images with vq-vae-2. In Advances in Neural Information Processing Systems. 14837–14847. [194] Krishna Regmi and Ali Borji. 2018. Cross-view image synthesis using conditional gans. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 3501–3510. 
[195] Yi Ren, Yangjun Ruan, Xu Tan, Tao Qin, Sheng Zhao, Zhou Zhao, and Tie-Yan Liu. 2019. Fastspeech: Fast, robust and controllable text to speech. In Advances in Neural Information Processing Systems. 3171–3180. [196] Eitan Richardson and Yair Weiss. 2018. On gans and gmms. In Advances in Neural Information Processing Systems. 5847–5858. [197] Salah Rifai, Pascal Vincent, Xavier Muller, Xavier Glorot, and Yoshua Bengio. 2011. Contractive auto-encoders: Explicit invariance during feature extraction. In Icml. [198] Lucas Rosenblatt, Xiaoyan Liu, Samira Pouyanfar, Eduardo de Leon, Anuj Desai, and Joshua Allen. 2020. Differentially private synthetic data: Applied evaluations and enhancements. arXiv preprint arXiv:2011.05537 (2020). [199] David E Rumelhart, Geoffrey E Hinton, and Ronald J Williams. 1986. Learning representations by back-propagating errors. nature 323, 6088 (1986), 533–536. [200] Mayu Sakurada and Takehisa Yairi. 2014. Anomaly detection using autoencoders with nonlinear dimensionality reduction. In Proceedings of the MLSDA 2014 2nd Workshop on Machine Learning for Sensory Data Analysis. 4–11. [201] Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, and Xi Chen. 2016. Improved techniques for training gans. In Advances in neural information processing systems. 2234–2242. [202] Tim Salimans, Andrej Karpathy, Xi Chen, and Diederik P Kingma. 2017. Pixelcnn++: Improving the pixelcnn with discretized logistic mixture likelihood and other modifications. arXiv preprint arXiv:1701.05517 (2017). [203] Pouya Samangouei, Maya Kabkab, and Rama Chellappa. 2018. Defense-GAN: Protecting Classifiers Against Adversarial Attacks Using Generative Models. In International Conference on Learning Representations. [204] Shibani Santurkar, Ludwig Schmidt, and Aleksander Mądry. 2017. A classification-based study of covariate shift in gan distributions. arXiv preprint arXiv:1711.00970 (2017). [205] Prasanna Sattigeri, Samuel C Hoffman, Vijil Chenthamarakshan, and Kush R Varshney. 2019. Fairness GAN: Generating datasets with fairness properties using a generative adversarial network. IBM Journal of Research and Development 63, 4/5 (2019), 3–1. [206] Franco Scarselli and Ah Chung Tsoi. 1998. Universal approximation using feedforward neural networks: A survey of some existing methods, and some new results. Neural networks 11, 1 (1998), 15–37. Generative Models for Security: Attacks, Defenses, and Opportunities 27

[207] David A Schum. 2001. The evidential foundations of probabilistic reasoning. Northwestern University Press. [208] Haichao Shi, Jing Dong, Wei Wang, Yinlong Qian, and Xiaoyu Zhang. 2017. SSGAN: secure steganography based on generative adversarial networks. In Pacific Rim Conference on Multimedia. Springer, 534–544. [209] Reza Shokri, Marco Stronati, Congzheng Song, and Vitaly Shmatikov. 2017. Membership inference attacks against machine learning models. In 2017 IEEE Symposium on Security and Privacy (SP). IEEE, 3–18. [210] Irene Solaiman. 2020. GPT-2: 1.5B Release. https://openai.com/blog/gpt-2-1-5b-release/. [211] Irene Solaiman, Miles Brundage, Jack Clark, Amanda Askell, Ariel Herbert-Voss, Jeff Wu, Alec Radford, and Jasmine Wang. 2019. Release strategies and the social impacts of language models. arXiv preprint arXiv:1908.09203 (2019). [212] Joan E. Solsman. 2020. Deepfakes’ threat to 2020 US election isn’t what you’d think. https://www.cnet.com/features/deepfakes-threat-to-the- 2020-us-election-isnt-what-youd-think/. [213] Dawn Song, Kevin Eykholt, Ivan Evtimov, Earlence Fernandes, Bo Li, Amir Rahmati, Florian Tramer, Atul Prakash, and Tadayoshi Kohno. 2018. Physical adversarial examples for object detectors. In 12th {USENIX} Workshop on Offensive Technologies ({WOOT} 18). [214] Yang Song, Taesup Kim, Sebastian Nowozin, Stefano Ermon, and Nate Kushman. 2017. Pixeldefend: Leveraging generative models to understand and defend against adversarial examples. arXiv preprint arXiv:1710.10766 (2017). [215] Yang Song, Rui Shu, Nate Kushman, and Stefano Ermon. 2018. Constructing unrestricted adversarial examples with generative models. In Advances in Neural Information Processing Systems. 8312–8323. [216] Theresa Stadler, Bristena Oprisanu, and Carmela Troncoso. 2021. Synthetic Data – Anonymisation Groundhog Day. arXiv:2011.07018 [cs.LG] [217] Nick Statt. 2019. Thieves are now using AI deepfakes to trick companies into sending them money. https://www.theverge.com/2019/9/5/20851248/ deepfakes-ai-fake-audio-phone-calls-thieves-trick-companies-stealing-money. [218] Supasorn Suwajanakorn, Steven M Seitz, and Ira Kemelmacher-Shlizerman. 2017. Synthesizing obama: learning lip sync from audio. ACM Transactions on Graphics (TOG) 36, 4 (2017), 1–13. [219] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. 2013. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199 (2013). [220] Uthaipon Tantipongpipat, Chris Waites, Digvijay Boob, Amaresh Ankit Siva, and Rachel Cummings. 2019. Differentially Private Mixed-Type Data Generation For . arXiv preprint arXiv:1912.03250 (2019). [221] L Theis, A van den Oord, and M Bethge. 2016. A note on the evaluation of generative models. In International Conference on Learning Representations (ICLR 2016). 1–10. [222] Tina Tina Fang, Martin Jaggi, and Katerina Argyraki. 2017. Generating Steganographic Text with LSTMs. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics-Student Research Workshop. 100–106. [223] Rob Toews. 2020. Deepfakes Are Going To Wreak Havoc On Society. We Are Not Prepared. https://www.forbes.com/sites/robtoews/2020/05/25/ deepfakes-are-going-to-wreak-havoc-on-society-we-are-not-prepared/#5687fc1d7494. [224] Reihaneh Torkzadehmahani, Peter Kairouz, and Benedict Paten. 2019. Dp-cgan: Differentially private synthetic data and label generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. 
0–0. [225] David S Touretzky, Michael C Mozer, and Michael E Hasselmo. 1996. Advances in Neural Information Processing Systems 8: Proceedings of the 1995 Conference. Vol. 8. MIT Press. [226] Aleksei Triastcyn and Boi Faltings. 2018. Generating artificial data for private deep learning. arXiv preprint arXiv:1803.03148 (2018). [227] Aleksei Triastcyn and Boi Faltings. 2020. Generating higher-fidelity synthetic datasets with privacy guarantees. arXiv preprint arXiv:2003.00997 (2020). [228] Sergey Tulyakov, Ming-Yu Liu, Xiaodong Yang, and Jan Kautz. 2018. Mocogan: Decomposing motion and content for video generation. In Proceedings of the IEEE conference on computer vision and pattern recognition. 1526–1535. [229] Alexander Turner, Dimitris Tsipras, and Aleksander Madry. 2019. Label-Consistent Backdoor Attacks. arXiv preprint arXiv:1912.02771 (2019). [230] Aäron van den Oord and Nal Kalchbrenner. 2016. Pixel Recurrent Neural Networks. In ICML. [231] Aaron Van den Oord, Nal Kalchbrenner, Lasse Espeholt, Oriol Vinyals, Alex Graves, et al. 2016. Conditional image generation with pixelcnn decoders. In Advances in neural information processing systems. 4790–4798. [232] Sean Vasquez and Mike Lewis. 2019. Melnet: A generative model for audio in the frequency domain. arXiv preprint arXiv:1906.01083 (2019). [233] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in neural information processing systems. 5998–6008. [234] James Vincent. 2018. Watch Jordan Peele use AI to make Barack Obama deliver a PSA about fake news. https://www.theverge.com/tldr/2018/4/17/17247334/ai-fake-news-video-barack-obama-jordan-peele-buzzfeed. [235] Pascal Vincent, Hugo Larochelle, Isabelle Lajoie, Yoshua Bengio, Pierre-Antoine Manzagol, and Léon Bottou. 2010. Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. Journal of machine learning research 11, 12 (2010). [236] Denis Volkhonskiy, Ivan Nazarov, and Evgeny Burnaev. 2020. Steganographic generative adversarial networks. In Twelfth International Conference on Machine Vision (ICMV 2019), Vol. 11433. International Society for Optics and Photonics, 114333M. [237] Jacob Walker, Kenneth Marino, Abhinav Gupta, and Martial Hebert. 2017. The pose knows: Video forecasting by generating pose futures. In Proceedings of the IEEE international conference on computer vision. 3332–3341.
[238] Eric Wallace, Shi Feng, Nikhil Kandpal, Matt Gardner, and Sameer Singh. 2019. Universal adversarial triggers for attacking and analyzing NLP. arXiv preprint arXiv:1908.07125 (2019). [239] Eric Wallace, Tony Z Zhao, Shi Feng, and Sameer Singh. 2020. Customizing Triggers with Concealed Data Poisoning. arXiv preprint arXiv:2010.12563 (2020). [240] Binghui Wang and Neil Zhenqiang Gong. 2018. Stealing hyperparameters in machine learning. In 2018 IEEE Symposium on Security and Privacy (SP). IEEE, 36–52. [241] Chenxi Wang. 2019. Deepfakes, Revenge Porn, And The Impact On Women. https://www.forbes.com/sites/chenxiwang/2019/11/01/deepfakes-revenge-porn-and-the-impact-on-women/#1ce404831f53. [242] Sheng-Yu Wang, Oliver Wang, Richard Zhang, Andrew Owens, and Alexei A Efros. 2020. CNN-generated images are surprisingly easy to spot... for now. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 8695–8704. [243] Yaxing Wang, Lichao Zhang, and Joost Van De Weijer. 2016. Ensembles of generative adversarial networks. arXiv preprint arXiv:1612.00991 (2016). [244] Zhou Wang and Alan C Bovik. 2009. Mean squared error: Love it or leave it? A new look at signal fidelity measures. IEEE signal processing magazine 26, 1 (2009), 98–117. [245] Zhou Wang, Alan C Bovik, Hamid R Sheikh, and Eero P Simoncelli. 2004. Image quality assessment: from error visibility to structural similarity. IEEE transactions on image processing 13, 4 (2004), 600–612. [246] Juan Wen, Xuejing Zhou, Ping Zhong, and Yiming Xue. 2019. Convolutional neural network based text steganalysis. IEEE Signal Processing Letters 26, 3 (2019), 460–464. [247] Ian H Witten, Radford M Neal, and John G Cleary. 1987. Arithmetic coding for data compression. Commun. ACM 30, 6 (1987), 520–540. [248] Thomas Wolf, Julien Chaumond, Lysandre Debut, Victor Sanh, Clement Delangue, Anthony Moi, Pierric Cistac, Morgan Funtowicz, Joe Davison, Sam Shleifer, et al. 2020. Transformers: State-of-the-art natural language processing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations. 38–45. [249] Blake Woodworth, Suriya Gunasekar, Mesrob I Ohannessian, and Nathan Srebro. 2017. Learning Non-Discriminatory Predictors. In Conference on Learning Theory. 1920–1953. [250] Sitao Xiang and Hao Li. 2017. On the effects of batch and weight normalization in generative adversarial networks. arXiv preprint arXiv:1704.03971 (2017). [251] Chaowei Xiao, Bo Li, Jun-Yan Zhu, Warren He, Mingyan Liu, and Dawn Song. 2018. Generating adversarial examples with adversarial networks. In 27th International Joint Conference on Artificial Intelligence, IJCAI 2018. International Joint Conferences on Artificial Intelligence, 3905–3911. [252] Liyang Xie, Kaixiang Lin, Shu Wang, Fei Wang, and Jiayu Zhou. 2018. Differentially private generative adversarial network. arXiv preprint arXiv:1802.06739 (2018). [253] Chugui Xu, Ju Ren, Deyu Zhang, Yaoxue Zhang, Zhan Qin, and Kui Ren. 2019. GANobfuscator: Mitigating information leakage under GAN via differential privacy. IEEE Transactions on Information Forensics and Security 14, 9 (2019), 2358–2371. [254] Depeng Xu, Shuhan Yuan, Lu Zhang, and Xintao Wu. 2018. Fairgan: Fairness-aware generative adversarial networks. In 2018 IEEE International Conference on Big Data (Big Data). IEEE, 570–575. [255] Lei Xu, Maria Skoularidou, Alfredo Cuesta-Infante, and Kalyan Veeramachaneni. 2019. Modeling tabular data using conditional gan. arXiv preprint arXiv:1907.00503 (2019).
[256] Xinchen Yan, Jimei Yang, Kihyuk Sohn, and Honglak Lee. 2016. Attribute2image: Conditional image generation from visual attributes. In European Conference on Computer Vision. Springer, 776–791. [257] Chaofei Yang, Qing Wu, Hai Li, and Yiran Chen. 2017. Generative poisoning attack method against neural networks. arXiv preprint arXiv:1703.01340 (2017). [258] Jianhua Yang, Kai Liu, Xiangui Kang, Edward K Wong, and Yun-Qing Shi. 2018. Spatial image steganography based on generative adversarial network. arXiv preprint arXiv:1804.07939 (2018). [259] Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Russ R Salakhutdinov, and Quoc V Le. 2019. Xlnet: Generalized autoregressive pretraining for language understanding. In Advances in neural information processing systems. 5753–5763. [260] Zhongliang Yang, Xingjian Du, Yilin Tan, Yongfeng Huang, and Yu-Jin Zhang. 2018. Aag-stega: Automatic audio generation-based steganography. arXiv preprint arXiv:1809.03463 (2018). [261] Zhongliang Yang, Ke Wang, Jian Li, Yongfeng Huang, and Yu-Jin Zhang. 2019. TS-RNN: text steganalysis based on recurrent neural networks. IEEE Signal Processing Letters 26, 12 (2019), 1743–1747. [262] Zhongliang Yang, Nan Wei, Qinghe Liu, Yongfeng Huang, and Yu-Jin Zhang. 2019. GAN-TStega: Text Steganography Based on Generative Adversarial Networks. In IWDW. 18–31. [263] Zhongliang Yang, Lingyun Xiang, Siyu Zhang, Xingming Sun, and Yongfeng Huang. 2021. Linguistic Generative Steganography With Enhanced Cognitive-Imperceptibility. IEEE Signal Processing Letters 28 (2021), 409–413. [264] Zhong-Liang Yang, Xiao-Qing Guo, Zi-Ming Chen, Yong-Feng Huang, and Yu-Jin Zhang. 2018. RNN-stega: Linguistic steganography based on recurrent neural networks. IEEE Transactions on Information Forensics and Security 14, 5 (2018), 1280–1295. [265] Zhong-Liang Yang, Si-Yu Zhang, Yu-Ting Hu, Zhi-Wen Hu, and Yong-Feng Huang. 2020. VAE-Stega: linguistic steganography based on variational auto-encoder. IEEE Transactions on Information Forensics and Security 16 (2020), 880–895.
[266] Dengpan Ye, Shunzhi Jiang, and Jiaqin Huang. 2019. Heard more than heard: An audio steganography method based on gan. arXiv preprint arXiv:1907.04986 (2019). [267] Chuanlong Yin, Yuefei Zhu, Shengli Liu, Jinlong Fei, and Hetong Zhang. 2018. An enhancing framework for botnet detection using generative adversarial networks. In 2018 International Conference on Artificial Intelligence and Big Data (ICAIBD). IEEE, 228–234. [268] Jiaxuan You, Rex Ying, Xiang Ren, William L Hamilton, and Jure Leskovec. 2018. GraphRNN: Generating Realistic Graphs with Deep Auto-regressive Models. In ICML. [269] Mahmood Yousefi-Azar, Vijay Varadharajan, Len Hamey, and Uday Tupakula. 2017. Autoencoder-based feature learning for cyber security applications. In 2017 International joint conference on neural networks (IJCNN). IEEE, 3854–3861. [270] Ning Yu, Larry S Davis, and Mario Fritz. 2019. Attributing fake images to gans: Learning and analyzing gan fingerprints. In Proceedings of the IEEE International Conference on Computer Vision. 7556–7566. [271] Yi Yu and Simon Canales. 2019. Conditional lstm-gan for melody generation from lyrics. arXiv preprint arXiv:1908.05551 (2019). [272] Yang Yu, Zhiqiang Gong, Ping Zhong, and Jiaxin Shan. 2017. Unsupervised representation learning with deep convolutional neural network for remote sensing images. In International Conference on Image and Graphics. Springer, 97–108. [273] Rich Zemel, Yu Wu, Kevin Swersky, Toni Pitassi, and Cynthia Dwork. 2013. Learning fair representations. In International Conference on Machine Learning. 325–333. [274] Yu Zeng, Huchuan Lu, and Ali Borji. 2017. Statistics of deep generated images. arXiv preprint arXiv:1708.02688 (2017). [275] Baiwu Zhang, Jin Peng Zhou, Ilia Shumailov, and Nicolas Papernot. 2020. Not My Deepfake: Towards Plausible Deniability for Machine-Generated Media. arXiv preprint arXiv:2008.09194 (2020). [276] Han Zhang, Tao Xu, Hongsheng Li, Shaoting Zhang, Xiaogang Wang, Xiaolei Huang, and Dimitris N Metaxas. 2017. Stackgan: Text to photo-realistic image synthesis with stacked generative adversarial networks. In Proceedings of the IEEE international conference on computer vision. 5907–5915. [277] Jiale Zhang, Junjun Chen, Di Wu, Bing Chen, and Shui Yu. 2019. Poisoning Attack in Federated Learning using Generative Adversarial Nets. In 2019 18th IEEE International Conference On Trust, Security And Privacy In Computing And Communications/13th IEEE International Conference On Big Data Science And Engineering (TrustCom/BigDataSE). IEEE, 374–380. [278] Jun Zhang, Graham Cormode, Cecilia M Procopiuc, Divesh Srivastava, and Xiaokui Xiao. 2017. Privbayes: Private data release via bayesian networks. ACM Transactions on Database Systems (TODS) 42, 4 (2017), 1–41. [279] Ru Zhang, Shiqi Dong, and Jianyi Liu. 2019. Invisible steganography via generative adversarial networks. Multimedia tools and applications 78, 7 (2019), 8559–8575. [280] Siyu Zhang, Zhongliang Yang, Jinshuai Yang, and Yongfeng Huang. 2021. Provably Secure Generative Linguistic Steganography. arXiv preprint arXiv:2106.02011 (2021). [281] Xiao Zhang, Jinghui Chen, Quanquan Gu, and David Evans. 2020. Understanding the intrinsic robustness of image distributions using conditional generative models. In International Conference on Artificial Intelligence and Statistics. PMLR, 3883–3893. [282] Yizhe Zhang, Zhe Gan, Kai Fan, Zhi Chen, Ricardo Henao, Dinghan Shen, and Lawrence Carin. 2017. Adversarial feature matching for text generation. 
In Proceedings of the 34th International Conference on Machine Learning-Volume 70. 4006–4015. [283] Zhuo Zhang, Jia Liu, Yan Ke, Yu Lei, Jun Li, Minqing Zhang, and Xiaoyuan Yang. 2019. Generative steganography by sampling. IEEE Access 7 (2019), 118586–118597. [284] Zhifei Zhang, Yang Song, and Hairong Qi. 2018. Decoupled learning for conditional adversarial networks. In 2018 IEEE Winter Conference on Applications of Computer Vision (WACV). IEEE, 700–708. [285] Zhikun Zhang, Tianhao Wang, Ninghui Li, Jean Honorio, Michael Backes, Shibo He, Jiming Chen, and Yang Zhang. 2020. PrivSyn: Differentially Private Data Synthesis. arXiv preprint arXiv:2012.15128 (2020). [286] Shengjia Zhao, Jiaming Song, and Stefano Ermon. 2017. Towards deeper understanding of variational autoencoding models. arXiv preprint arXiv:1702.08658 (2017). [287] Zhengli Zhao, Dheeru Dua, and Sameer Singh. 2018. Generating Natural Adversarial Examples. In International Conference on Learning Representations. [288] Yu-Jun Zheng, Xiao-Han Zhou, Wei-Guo Sheng, Yu Xue, and Sheng-Yong Chen. 2018. Generative adversarial network based telecom fraud detection at the receiving bank. Neural Networks 102 (2018), 78–86. [289] Sharon Zhou, Mitchell Gordon, Ranjay Krishna, Austin Narcomey, Li F Fei-Fei, and Michael Bernstein. 2019. Hype: A benchmark for human eye perceptual evaluation of generative models. In Advances in Neural Information Processing Systems. 3449–3461. [290] Xuejing Zhou, Wanli Peng, Boya Yang, Juan Wen, Yiming Xue, and Ping Zhong. 2021. Linguistic steganography based on adaptive probability distribution. IEEE Transactions on Dependable and Secure Computing (2021). [291] Jun-Yan Zhu, Philipp Krähenbühl, Eli Shechtman, and Alexei A Efros. 2016. Generative visual manipulation on the natural image manifold. In European Conference on Computer Vision. Springer, 597–613. [292] Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A Efros. 2017. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE international conference on computer vision. 2223–2232. [293] Zachary M Ziegler, Yuntian Deng, and Alexander M Rush. 2019. Neural linguistic steganography. arXiv preprint arXiv:1909.01496 (2019).