Privately Training an AI Model Using Fake Images Generated by Generative Adversarial Networks

Privately Training an AI Model Using Fake Images Generated by Generative Adversarial Networks WWT Artificial Intelligence Research & Development AUGUST | 2019 World Wide Technology wwt.com Table of Contents Abstract ..................................................................................................... 3 Business Value ........................................................................................... 3 Introduction .............................................................................................. 4 Methodology ........................................................................................... 4 An Introduction to Generative Adversarial Networks ..................................... 4 Input Data ............................................................................................ 6 Experimental Setup .................................................................................. 6 Implementations and Results .....................................................................7 Deep Convolutional GAN ................................................................................. 7 Training DCGAN ..................................................................................... 9 Results ................................................................................................. 9 Conditional GAN .................................................................................... 10 Training CGAN.......................................................................................11 Results ................................................................................................11 Spectral Normalization-GAN ................................................................... 14 Training SN-GAN ................................................................................... 15 Results ............................................................................................... 15 Conclusion ............................................................................................... 20 References ............................................................................................... 22 Privately Training an AI Model Using Fake Images Generated by Generative Adversarial Networks 2 Abstract Deep learning algorithms produce sophisticated results using different machine learning and computer vision tasks. To perform well on a given problem, these algorithms require a large dataset for training. Often, deep learning algorithms lack generalization and suffer from over- fitting when trained on a small dataset. For example, the storage of image data along with its corresponding labels for supervised image analysis in medical imaging is costly and time- consuming. Another challenge is that most of the data collected by corporations and public institutions is sensitive and may be prohibited from being shared publicly or with third parties. In this paper, Generative Adversarial Networks (GANs) are used to generate synthetic images that can then be used for further analysis in deep learning algorithms or used by a third party, while obscuring any confidential information. This research has been carried out on proprietary images of race cars using multiple GAN techniques that generate precise segmented images based on car classes. The images generated using one of the techniques (SN-GAN) captured features of real data so well that the classification model trained on those generated images achieved 89.6% accuracy, when tested on real images. Business Value This paper discusses methods to generate image data using artificial intelligence. Rather than obfuscating images by deleting personal identifiers or adding noise, GANs will be used to generate entirely new images. The new sample of data will fall within the same distribution of the original data but will not correspond directly to any unique image from the original data set. Ideally, this will be done so that the structure of correspondence between variables reflects the original data. This paper aims to prove that useful models can be built using the generated data of a GAN. Many sources of sensitive data are not allowed to be shared with third-party researchers. In particular, healthcare organizations and financial records kept by banks and governments have strict limitations pertaining to the sharing of patient data. These protected sources of data could be used in a variety of positive ways if made accessible to third parties. For example, in the healthcare industry, medical images could be more easily analyzed by outside researchers. In the financial industry, financial records or credit card statements could be used by security analysts to develop techniques to detect fraud or other illegal activities. Network logs that were GAN-generated mock the activity of viruses or malware (E. David), and are able to test the performance of security software without the danger of installing the virus on a computer. GANs were first devised in 2014 and have most commonly been used to generate images of people and objects (I. Goodfellow). GANs can create entirely new data sets for images, text or numbers. They work as two competing neural networks, a generator and discriminator: the generator seeks to Privately Training an AI Model Using Fake Images Generated by Generative Adversarial Networks 3 create an image representative of real images while the discriminator attempts to discern between real images and images created by the generator. The adversarial parallel training in GANs enables the generator of the network to continuously improve its depiction of the true data distribution until its outputs are indistinguishable from the real data. In addition, adversarial-trained generative models parameterize real-world data in order to generate new samples. Introduction This research is focused on generating race car image data using Generative Adversarial Networks with a given set of input images. Three different GAN approaches are explored and compared to generate synthetic race car images. This research revolves around learning about advanced GANs and using multiple parameters to tune the generator algorithms. In the following sections, there is a detailed explanation of GANs and an introduction to the input data (section one, methodology). Experimental setup (section two) lends itself towards the implementation and results of the three different model techniques applied (section three). METHODOLOGY An Introduction to GANs A GAN consists of two neural networks: a generator and a discriminator. The generator produces fake data, and the discriminator tries to differentiate between the fake and real data. The two train against each other, as demonstrated in Figure 1. A key feature of this structure is that the generator never sees the real data and learns how to produce similar looking data through feedback from the discriminator. Thus, in situations involving confidential data, one can train the full network in a secure environment and then release only the generator to outside researchers. Then, the generator can be used to produce arbitrary quantities of data for analysis. Training the discriminator - h3 This works like any other neural network, but with the extra step of producing a current batch of fake data from the generator prior to each training iteration. One feeds both the real and generated data through the neural network, and trains the discriminator to output the correct real and fake labels. Privately Training an AI Model Using Fake Images Generated by Generative Adversarial Networks 4 Training the discriminator This works like any other neural network, but with the extra step of producing a current batch of fake data from the generator prior to each training iteration. One feeds both the real and generated data through the neural network and trains the discriminator to output the correct real and fake labels. FIGURE 1: GAN Training Process Training the generator To train the generator, one uses the combined architecture but trains only the layers belonging to the generator. These layers are updated with backpropagation to achieve labels of “real” for the generated data, as in Figure 1, shown above. Input Data Car No. 24 Car No. 42 FIGURE 2: Input data for race car analysis Car No. 48 Car No. 9 Privately Training an AI Model Using Fake Images Generated by Generative Adversarial Networks 5 Training of GAN models was conducted using a proprietary data set of 60,000 NASCAR race car images during a race. For information on how these images were curated and labeled, see our previous paper, Image Classification of Race Cars. Every car with a specific number had distinctive features in terms of color, design and sponsor positioning. Additionally, model training was tried on two different resolutions, 64x64 and 128x128. EXPERIMENTAL SETUP Hardware Training for the GANs was performed on a Cisco C480ML machine using Tesla V100 GPU. Docker containers were loaded on the C480ML to run Python 3 and the packages listed below. Most of the code development was done using Jupyter notebooks in JupyterLab environment. The following Python packages were used to perform the training and testing: The following Python packages were used to perform the training and testing: 1. Pandas 2. Numpy 3. SciKit-Learn 4. SciKit-image 5. Matplotlib 6. Tensorflow-gpu 7. Keras 8. Pillo Implementations and Results In this experiment, the objective was to generate multi-label image data using GANs trained on a proprietary data set of

Load more