
Robust Variational Autoencoders: Generating Noise-Free Images from Corrupted Images

Huimin Ren1, Yun Yue1, Chong Zhou2, Randy C. Paffenroth1, Yanhua Li1, Matthew L. Weiss1
1Worcester Polytechnic Institute, 2Microsoft Corporation
1{hren,yyue,rcpaffenroth,yli15,mlweiss}@wpi.edu, [email protected]

ABSTRACT

Generative models, including Variational Autoencoders, aim to find mappings from easily sampled latent spaces to intractable observed spaces. Such mappings allow one to generate new instances by mapping samples in the latent space to points in the high-dimensional observed space. However, in many real-world problems, pervasive noise is commonplace, and these corrupted measurements in the observed spaces can lead to substantial corruptions in the latent space. Herein, we demonstrate a novel extension to Variational Autoencoders which can generate new samples without access to any clean, noise-free training data or pre-denoising stages. Our work arises from Robust Principal Component Analysis and Robust Deep Autoencoders: we split the input data into two parts, $X = L + S$, where $S$ contains the noise and $L$ is the noise-free data which can be accurately mapped from the latent space to the observed space. We demonstrate the effectiveness of our model by comparing it against standard Variational Autoencoders, Generative Adversarial Networks, and other pre-trained denoising models.

KEYWORDS

Denoising, Variational Autoencoder, Robust Generative Model

ACM Reference Format:
Huimin Ren, Yun Yue, Chong Zhou, Randy C. Paffenroth, Yanhua Li, Matthew L. Weiss. 2020. Robust Variational Autoencoders: Generating Noise-Free Images from Corrupted Images. In AdvML '20: Workshop on Adversarial Learning Methods for Machine Learning and Data Mining, August 24, 2020, San Diego, CA. ACM, New York, NY, USA, 6 pages. https://doi.org/10.1145/1122445.1122456

1 INTRODUCTION

Generative models have been successfully applied to many application domains including image and text generation, semi-supervised learning, and domain adaptation [17, 20]. Some advanced applications of generative models have been proposed, such as generating plausible images from human-written descriptions [21] and recovering photo-realistic textures from heavily down-sampled images [15]. Building good generative models of realistic images is also a fundamental requirement of current AI systems [7].

In particular, there are two primary generative models: Generative Adversarial Networks (GANs) [10] and Variational Autoencoders (VAEs) [13]. A GAN trains a generator and a discriminator at the same time until they reach a Nash Equilibrium [10]. A VAE assumes that a collection of latent variables generates all the observations [13]. Recently, various flavors of GANs and VAEs have been proposed, which have achieved compelling results in the image generation area [2, 20].

Most generative models depend on clean, noise-free input. However, anomalies and noise are commonplace, and high-quality data is not always available [26]. Recently proposed research with generative models either focuses on removing noise from corrupted input [6] or on generating new images from available cleaned data, which can be obtained from existing off-the-shelf denoising methods [4]. This raises the question: can we combine the denoising and generation abilities of neural networks to create clean images from corrupted input data directly? It may seem intuitive to denoise first and then generate new data from the denoising output. However, the final generation depends highly on the denoising, which cannot be guaranteed to pass clean images to the generative step.

To bridge this research gap of creating realistic images from noisy input data directly, we propose a novel denoising generative model, the Robust Variational Autoencoder (RVAE), where an enhanced VAE takes corrupted images and generates noise-free images. Our main contributions are summarized as follows:

• We propose an extension of VAEs to robust cases where no clean, noise-free data is available. Such an extension allows denoising and inferring new instances at the same time, which, to the best of our knowledge, is a novel combination of robust models and generative models.
• Instead of separating the denoising and generation processes, our model integrates them. The denoising part offers clean inputs to the generative part, and the generative part provides potential corrupted points to the denoising part.
• We demonstrate the robustness of our proposed method on different data sets, including MNIST, Fashion-MNIST, and CelebA, where the input images are corrupted by different noise types, including Gaussian noise and salt-and-pepper noise.

2 OVERVIEW AND RELATED WORK

In this section, we outline some of the key ideas from RPCA, RDA, and VAEs. RPCA [5, 18] assumes observed instances and features are linearly correlated, with the exception of noise and outliers. Such a model offers a framework that can be extended and generalized from linear feature learning to non-linear feature learning, as shown in RDA [28]. VAEs, which recently gained popularity, are generative models that learn a mapping from a latent variable $z$ to the observations $X$. In the next section, we provide technical details of our novel contribution to the above-mentioned problems, more specifically, by allowing a VAE to be embedded into the denoising framework of RPCA and RDA.

2.1 From Robust Principal Component Analysis to Robust Deep Autoencoders

RPCA is a generalization of Principal Component Analysis (PCA) [8] that attempts to alleviate the effects of grossly corrupted observations, which are unavoidable in real-world data. In particular, RPCA assumes a given data matrix $X$ is comprised of an unknown low-rank matrix $L$ and an unknown sparse matrix $S$, with the goal of discovering both $L$ and $S$ simultaneously.

In the literature, there exist commonly used approaches in which RPCA can be treated as a tractable convex optimization problem as follows [5, 18]:

$$\min_{L,S} \|L\|_* + \lambda \|S\|_1 \quad \text{s.t. } \|X - L - S\|_F^2 = 0, \tag{1}$$

where $\|\cdot\|_*$ is the nuclear norm, the sum of the non-zero singular values of a matrix, $\|L\|_* = \sum_{i=1}^{n} \sigma_i$; $\|\cdot\|_1$ is the $\ell_1$ norm, the sum of absolute values, $\|S\|_1 = \sum_{i,j} |S_{i,j}|$; and $\lambda > 0$ is a regularization parameter that balances $L$ and $S$.
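To make the decomposition in eq. (1) concrete, the following NumPy sketch implements a simplified alternating scheme for the convex program: singular-value soft-thresholding for the low-rank part and elementwise soft-thresholding for the sparse part. The function names, the default parameters, and the bare alternation (which omits the dual variable of a full augmented Lagrangian solver) are illustrative assumptions, not the exact algorithms of [5, 18].

```python
import numpy as np

def shrink(M, tau):
    """Elementwise soft-thresholding, the proximal operator of tau * ||.||_1."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def svd_threshold(M, tau):
    """Soft-threshold the singular values, the proximal operator of tau * ||.||_*."""
    U, sigma, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(shrink(sigma, tau)) @ Vt

def rpca_split(X, lam=None, mu=None, n_iter=100):
    """Approximately split X into a low-rank L and a sparse S with X = L + S."""
    if lam is None:
        lam = 1.0 / np.sqrt(max(X.shape))       # common heuristic from [5]
    if mu is None:
        mu = X.size / (4.0 * np.abs(X).sum())   # step-size heuristic, an assumption
    L = np.zeros_like(X)
    S = np.zeros_like(X)
    for _ in range(n_iter):
        L = svd_threshold(X - S, 1.0 / mu)      # low-rank update on the residual
        S = shrink(X - L, lam / mu)             # sparse update absorbs outliers
    return L, S
```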
RDA maintains a deep autoencoder's ability to discover high-quality non-linear features in data, but also uses the principles of RPCA to remove outliers and noise from the data. The key insight of RDA is that noise and outliers are substantially incompressible and therefore cannot adequately be projected onto a low-dimensional hidden layer by an autoencoder. Similar to RPCA, an RDA also splits the input data $X$ into two parts, $L$ and $S$. Here $L$ represents the portion of the input data that is well represented by the autoencoder's hidden layer, and $S$ contains noise and outliers which are difficult to reconstruct. By removing noise and outliers from $X$, the autoencoder can more accurately recover the remaining $L$. In particular, the objective function for RDA is given by the following [28]:

$$\min_{\theta,S} \|L - D_\theta(E_\theta(L))\|_2 + \lambda \|S\|_1 \quad \text{s.t. } X - L - S = 0, \tag{2}$$

where $S$ is the anomalous data, $L$ is a low-dimensional manifold which can be accurately reconstructed by an encoder map $E$ and a decoder map $D$, and $\lambda$ is a parameter that tunes the level of sparsity in $S$. The first term is the objective function for a standard autoencoder, where $L$ is the input and $D_\theta(E_\theta(L))$ is the reconstruction of $L$.
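Because the $\lambda\|S\|_1$ penalty in eq. (2) is separable across entries, the update of $S$ has a closed form: the proximal operator of the $\ell_1$ norm is elementwise soft-thresholding [3],

$$\operatorname{prox}_{\lambda,\ell_1}(V)_{i,j} = \operatorname{sign}(V_{i,j}) \, \max(|V_{i,j}| - \lambda, \, 0).$$

This is the `shrink` step in the NumPy sketch above, and it reappears as the $\operatorname{prox}_{\lambda,\ell_1}$ step in Algorithm 1.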

2.2 Variational Autoencoders

A VAE assumes all the observed instances $X$ are generated from a latent variable $z$, but the distribution of $X$, $p(X)$, is intractable to compute with a limited number of observations. $p(z)$ is the prior distribution of the latent variable, $q(z|x)$ is the approximate inference mapping, and $p(x|z)$ is the generative mapping. The VAE parameterizes $q(z|x)$ and $p(x|z)$ by neural networks as

$$q(z|x) = E_{\theta_1}(x), \quad p(x|z) = D_{\theta_2}(z), \tag{3}$$

where $E_{\theta_1}$ and $D_{\theta_2}$ are two neural networks parameterized by $\theta_1$ and $\theta_2$ respectively. In analogy to autoencoders, $E_{\theta_1}$ is called the encoder and $D_{\theta_2}$ is called the decoder. Building on ideas from [13], the commonly used optimization function for VAE training is:

$$\min_{\theta_1,\theta_2} \|X - D_{\theta_2}(E_{\theta_1}(X))\| + \mathrm{KL}(E_{\theta_1}(X) \,\|\, \mathcal{N}(0,1)), \tag{4}$$

where $\mathrm{KL}$ represents the Kullback-Leibler divergence (KL divergence) and the first term, $\|X - D_{\theta_2}(E_{\theta_1}(X))\|$, represents the standard autoencoder reconstruction error.
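As a concrete reference, the following is a minimal PyTorch sketch of the encoder/decoder pair in eq. (3) and the loss in eq. (4). The layer sizes (784-512-256-49) follow Appendix B.2, but the class name, the Gaussian reparameterization details, and the mean-squared reconstruction term are illustrative assumptions rather than the authors' exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    """Minimal fully connected VAE; layer sizes 784-512-256-49 as in Appendix B.2."""
    def __init__(self, d_in=784, d_latent=49):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(d_in, 512), nn.Sigmoid(),
                                 nn.Linear(512, 256), nn.Sigmoid())
        self.mu = nn.Linear(256, d_latent)       # mean of q(z|x), i.e. E_theta1
        self.logvar = nn.Linear(256, d_latent)   # log-variance of q(z|x)
        self.dec = nn.Sequential(nn.Linear(d_latent, 256), nn.Sigmoid(),
                                 nn.Linear(256, 512), nn.Sigmoid(),
                                 nn.Linear(512, d_in), nn.Sigmoid())

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterization
        return self.dec(z), mu, logvar

def vae_loss(x, x_hat, mu, logvar):
    """Reconstruction error plus KL(q(z|x) || N(0, I)), mirroring eq. (4)."""
    rec = F.mse_loss(x_hat, x, reduction="sum")
    kl = -0.5 * torch.sum(1.0 + logvar - mu.pow(2) - logvar.exp())
    return rec + kl
```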
3 METHODOLOGY

In this section, we provide details of our model, the RVAE, which builds an anomaly filter into a standard VAE. The key idea of the RVAE is that noisy and clean data essentially arise from different distributions, and therefore the generation of both noisy and clean data from the same latent variables is highly unlikely. In particular, a VAE assumes all instances are generated from simple, low-dimensional distributions, but noise and anomalies share little information with clean data. This results in large errors if one tries to infer noise from generative mappings which are optimal on clean data.

We depict the structure of an RVAE in Figure 1, where denoising and inferring new instances are implemented at the same time. The noisy input is split into two parts, $L$ and $S$. $L$ represents the desired clean data, and it is passed to a standard VAE that includes latent variables $z$, the inference mapping $q(z|L)$, and the generative mapping $p(L|z)$. Therefore, the RVAE uses the $\ell_1$ norm to separate the data into outliers, represented by $S$, and nominal data, represented by $L$. We also provide a training algorithm for the splitting of $L$ and $S$, which is a non-differentiable and non-convex problem. The denoising and generation stages finish simultaneously, as they share the same parameters from the decoder.

[Figure 1: Structure of RVAE]

3.1 Robust Variational Autoencoders

Since noisy and clean data essentially arise from different distributions, the distributions of the clean data and the noise in the latent space are also different. This key difference allows us to isolate noise from the clean data by augmenting the VAE with a filter layer. Similar to RDA in eq. (2), an RVAE splits the input data $X$ into two parts, $X = L + S$, where $L$ represents the part of normal data that is well represented by a Gaussian distribution in the latent space, and $S$ contains noise and outliers. To achieve this property, we pose the following RVAE optimization problem:

$$\min_{\theta_1,\theta_2,L,S} \|L - D_{\theta_2}(E_{\theta_1}(L))\|_2 + \mathrm{KL}(E_{\theta_1}(L) \,\|\, \mathcal{N}(0,1)) + \lambda \|S\|_1 \quad \text{s.t. } X - L - S = 0, \tag{5}$$

where $E_{\theta_1}$ and $D_{\theta_2}$ represent the inference mapping $q(z|x)$ and the generative mapping $p(x|z)$ respectively, $\|L - D_{\theta_2}(E_{\theta_1}(L))\|_2$ represents the reconstruction error of the VAE part, and $\mathrm{KL}(E_{\theta_1}(L) \,\|\, \mathcal{N}(0,1))$ represents the KL divergence, measuring the difference between the distribution of the latent variables and a Gaussian distribution. $\|S\|_1$ is the $\ell_1$ norm of $S$, and $\lambda$ controls the level of penalization on $S$ and thus tunes the amount of data to be isolated as noise. A small $\lambda$ encourages more data to be isolated as noise, and a large $\lambda$ discourages data from being filtered into $S$.

3.2 Training Algorithm

In this section, we provide the details of our algorithm for solving the optimization problem in eq. (5). The entire objective in eq. (5) can be split into three parts: $\|L - D_{\theta_2}(E_{\theta_1}(L))\|_2$, $\mathrm{KL}(E_{\theta_1}(L) \,\|\, \mathcal{N}(0,1))$, and $\lambda\|S\|_1$, where each part relies only on either $L$ or $S$. As a result, applying the Alternating Direction Method of Multipliers (ADMM) [19] to eq. (5) is more efficient than directly training it with back-propagation [22]. In particular, the first two terms in eq. (5) rely only on $L$ and, taken together, are the optimization function for a standard VAE, which has been well implemented and has off-the-shelf methods readily available. In addition, back-propagation is a typical training algorithm for deep learning, and it is the method we use for optimizing the VAE. In other words, we train the first two terms, which rely only on $L$, together using the standard back-propagation algorithm. The $S$ part is non-differentiable but can be phrased as a proximal gradient problem, whose solution is given in [3]. We are also inspired by the training algorithm utilized in RDA, which makes use of ADMM. We iteratively optimize the $L$ part, which is a standard VAE, and the $S$ part, which is phrased as a proximal gradient problem. Both the $L$ and $S$ training steps are interspersed with projections onto the constraint manifold.

The training algorithm is provided in Algorithm 1 (Appendix A), where $\min \|L - D_{\theta_2}(E_{\theta_1}(L))\|_2 + \mathrm{KL}(E_{\theta_1}(L) \,\|\, \mathcal{N}(0,1))$ is trained by back-propagation, $S = \operatorname{prox}_{\lambda,\ell_1}(S)$ is the proximal step, and both $L = X - S$ and $S = X - L$ are alternating projection steps. $c_1$ and $c_2$ are designed to check the convergence of the algorithm.
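This alternating scheme can be condensed as the following PyTorch sketch, reusing VAE and vae_loss from the sketch in Section 2.2. The name train_rvae, the iteration counts, and the use of $\lambda$ directly as the shrinkage threshold are illustrative assumptions; see the authors' released code (Appendix B.2) for the authoritative implementation.

```python
import torch

def shrink(M, tau):
    """Elementwise soft-thresholding (proximal operator of tau * ||.||_1)."""
    return torch.sign(M) * torch.clamp(M.abs() - tau, min=0.0)

def train_rvae(X, model, lam, n_outer=30, n_inner=20, lr=1e-3, eps=1e-7):
    """Alternating optimization for eq. (5), following Algorithm 1:
    back-propagation on the VAE terms (which depend only on L) and a
    proximal step on S, interleaved with projections onto X = L + S."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    S = torch.zeros_like(X)
    LS = torch.zeros_like(X)
    for _ in range(n_outer):
        L = X - S                                  # project onto the constraint
        for _ in range(n_inner):                   # train the VAE terms by back-prop
            opt.zero_grad()
            L_hat, mu, logvar = model(L)
            loss = vae_loss(L, L_hat, mu, logvar)  # vae_loss from the sketch above
            loss.backward()
            opt.step()
        with torch.no_grad():
            L = model(L)[0]                        # L = D(E(L))
            S = shrink(X - L, lam)                 # proximal step on S
            c1 = torch.norm(X - L - S) / torch.norm(X)
            c2 = torch.norm(LS - L - S) / torch.norm(X)
            LS = L + S
        if c1 < eps or c2 < eps:                   # convergence checks of Algorithm 1
            break
    return L, S
```

After training, new noise-free samples can be generated in the usual VAE manner, by decoding draws $z \sim \mathcal{N}(0, I)$ through the decoder (model.dec in the sketch above).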
4 EXPERIMENTAL RESULTS

In this section, we demonstrate the effectiveness of our proposed RVAE on the MNIST [14], Fashion-MNIST [25], and CelebA [16] data sets. We corrupt the original images with salt-and-pepper noise [23], and we also test the robustness of the model by introducing Gaussian noise. In the following sections, we compare the RVAE with other benchmark models and demonstrate the robustness of the RVAE on images with different noise ratios.

To evaluate the denoising and generation abilities of the model on corrupted images, we compare our model with two well-known generative models: the traditional VAE and the GAN. In addition, we compare our model with three naive approaches which first remove the noise with well-known models, namely a Low-Pass Filter (LPF), RPCA, and RDA, and then generate images from the preprocessed images with a traditional VAE.

4.1 Results

4.1.1 Evaluation Metrics. To measure the quality of images generated by the model, we use the "Fréchet Inception Distance" (FID) score, which is computed by considering the similarity of two distributions $X_1$ and $X_2$ [11]. The FID score has been proven to be an effective measure which correlates well with human visual inspection [11]. Mathematically, FID assumes that instances follow a continuous multivariate Gaussian, and its formula is:

$$\|\mu_1 - \mu_2\|^2 + \mathrm{Tr}\left(\sigma_1 + \sigma_2 - 2\sqrt{\sigma_1 \sigma_2}\right),$$

where $(\mu_1, \sigma_1)$ and $(\mu_2, \sigma_2)$ are the sample means and covariances of $X_1$ and $X_2$. FID ranges from 0 to $\infty$, where a small FID score indicates a high similarity between $X_1$ and $X_2$ [11]. We calculate FID scores between generated images and the original noise-free images to measure the closeness between clean images and generated images, where small FID scores indicate successful generations.
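For reference, the FID formula above transcribes directly into NumPy/SciPy. In practice, $X_1$ and $X_2$ are Inception-network activations of the two image sets [11]; the feature-extraction step is omitted here, and the function name is our own.

```python
import numpy as np
from scipy.linalg import sqrtm

def fid(feat1, feat2):
    """Frechet Inception Distance between two sets of feature vectors
    (one instance per row), following the formula above."""
    mu1, mu2 = feat1.mean(axis=0), feat2.mean(axis=0)
    sigma1 = np.cov(feat1, rowvar=False)
    sigma2 = np.cov(feat2, rowvar=False)
    covmean = sqrtm(sigma1 @ sigma2)
    if np.iscomplexobj(covmean):        # discard tiny imaginary parts from sqrtm
        covmean = covmean.real
    return np.sum((mu1 - mu2) ** 2) + np.trace(sigma1 + sigma2 - 2.0 * covmean)
```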

4.1.2 MNIST. Figure 2 shows the quantitative comparison of generation ability between the VAE and the RVAE, where the inputs contain the same levels of salt-and-pepper noise. In Figure 2, a small $\lambda$ indicates that a large number of pixels are isolated as noise, while a large $\lambda$ means only a small part of the data belongs to $S$. Each cell of the heatmap represents the difference in FID score between the VAE and the RVAE. Since the FID score is negatively correlated with the visual quality of the generated images [11], a large difference between the VAE and RVAE FID scores indicates that the generation of the RVAE is better than that of the VAE. The blue areas show that the RVAE can generate higher quality images than the VAE due to the $\ell_1$ norm on $S$. On average, our model shows 42.47% improved image generation when the corruption ratio ranges from 3% to 27% and the $\lambda$ value varies from 10 to 100. The best result of our model shows 74.11% improved image generation when 33% of the pixels are corrupted. The generated examples of all the models when 29% of the pixels of the raw image are corrupted are shown in the first row of Table 1.

[Figure 2: Comparison between VAE and RVAE]

To evaluate the robustness of our model, we also corrupt the images with Gaussian noise. As can be seen from Figure 3a, the RVAE achieves a smaller FID score than the VAE when the noise is not too large, and there is a large difference between the VAE and the RVAE at 60% corruption. Two examples are provided to show the clear and blurry generations of the RVAE and the VAE, respectively. As the noise increases to 70% and more, both the RVAE and the VAE cannot generate high-quality images, since excess corruption makes the input images unrecognizable.

Table 1: Generated examples from the salt-and-pepper corrupted inputs of the RVAE and other benchmark methods

Corrupted Image | RVAE | VAE | GAN | RPCA+VAE | LPF+VAE | RDA+VAE

4.1.3 Fashion-MNIST. To evaluate the generative capability of the RVAE, we test our model on another data set, Fashion-MNIST. Generated examples from 41% corrupted data are shown in the middle row of Table 1. These pictures show that the RVAE has a strong capability to generate images of high visual quality, while the purely generative models (VAE and GAN) are unable to isolate noise and thus fail to generate clean and realistic images. Compared with the two-stage models, the quality of generation from the RVAE is better than RPCA+VAE and LPF+VAE; i.e., the edge of each fashion product is much sharper and clearer in the RVAE's generations, while the generated images from RPCA+VAE and LPF+VAE are blurry and unrealistic. Although the RVAE and RDA+VAE have similar generative quality, the RVAE requires less training time to reach similar performance, as shown in Figure 3b.

4.1.4 CelebA. In addition, to evaluate our method on non-gray-scale images, we implement our model on an RGB multi-channel data set, CelebA. The last row in Table 1 shows the generated examples of our model and the benchmark models when 40% of the pixels of the raw image are corrupted. Similar to the results on MNIST and Fashion-MNIST, the RVAE shows strong capabilities to generate images of better visual quality from the CelebA data set, even with highly corrupted inputs. One may notice that the quality of generation from the RVAE and RDA+VAE is about the same. However, the RDA+VAE requires a much longer training time to reach similar performance as the RVAE, as shown in Figure 3b. On average, the RVAE is 38.11% faster than the RDA+VAE.

[Figure 3: Results. (a) Gaussian noise. (b) Running efficiency comparison.]

5 CONCLUSION

In this paper, we bridge the research gap between denoising and generation and show that the RVAE can generate high-quality images in the case where no clean, noise-free data is available. We introduce $X = L + S$ to split noise and clean data, where $L$ is the input to a standard VAE and $S$ is regularized by the $\ell_1$ norm. Our training algorithm is inspired by ADMM [19], back-propagation [22], and proximal methods [3]. We evaluate the effectiveness of our model on different data sets. The experiments show that the RVAE is faithful to its name, "robust": with a wide range of $\lambda$ selections, it generates reasonable images from corrupted data with varying amounts and types of corrupting noise. We also show that our integrated denoising-generative model has superior performance over separated denoising and generation models.

Our future work will include both experimental and theoretical directions. We plan to test our robust generation model in areas other than images, such as voice and text, and to extend our work to include other generative models such as GANs.

6 ACKNOWLEDGEMENTS

Huimin Ren and Yanhua Li were partly supported by NSF grants IIS-1942680 (CAREER), CNS-1657350 (CRII), and CMMI-1831140.
REFERENCES

[1] F. Agostinelli, M. R. Anderson, and H. Lee. Adaptive multi-column deep neural networks with application to robust image denoising. In Advances in Neural Information Processing Systems, pages 1493-1501, 2013.
[2] D. Berthelot, T. Schumm, and L. Metz. BEGAN: Boundary equilibrium generative adversarial networks. arXiv preprint arXiv:1703.10717, 2017.
[3] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
[4] A. Buades, B. Coll, and J.-M. Morel. A review of image denoising algorithms, with a new one. Multiscale Modeling & Simulation, 4(2):490-530, 2005.
[5] E. J. Candès, X. Li, Y. Ma, and J. Wright. Robust principal component analysis? Journal of the ACM (JACM), 58(3):11, 2011.
[6] J. Chen, J. Chen, H. Chao, and M. Yang. Image blind denoising with generative adversarial network based noise modeling. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3155-3164, 2018.
[7] C. Doersch. Tutorial on variational autoencoders. arXiv preprint arXiv:1606.05908, 2016.
[8] J. Friedman, T. Hastie, and R. Tibshirani. The Elements of Statistical Learning, volume 1. Springer Series in Statistics, New York, NY, USA, 2001.
[9] X. Fu, J. Huang, D. Zeng, Y. Huang, X. Ding, and J. Paisley. Removing rain from single images via a deep detail network. In IEEE Conference on Computer Vision and Pattern Recognition, pages 1715-1723, 2017.
[10] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems, pages 2672-2680, 2014.
[11] M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, and S. Hochreiter. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In Advances in Neural Information Processing Systems, pages 6626-6637, 2017.
[12] D. J. Im, S. Ahn, R. Memisevic, Y. Bengio, et al. Denoising criterion for variational auto-encoding framework. In AAAI, pages 2059-2065, 2017.
[13] D. P. Kingma and M. Welling. Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114, 2013.
[14] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278-2324, 1998.
[15] C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang, et al. Photo-realistic single image super-resolution using a generative adversarial network. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 105-114. IEEE, 2017.
[16] Z. Liu, P. Luo, X. Wang, and X. Tang. Deep learning face attributes in the wild. In Proceedings of the International Conference on Computer Vision (ICCV), 2015.
[17] M. Lucic, K. Kurach, M. Michalski, S. Gelly, and O. Bousquet. Are GANs created equal? A large-scale study. In Advances in Neural Information Processing Systems, pages 698-707, 2018.
[18] R. Paffenroth, P. du Toit, R. Nong, L. Scharf, A. P. Jayasumana, and V. Bandara. Space-time signal processing for distributed pattern detection in sensor networks. IEEE Journal of Selected Topics in Signal Processing, 7(1):38-49, 2013.
[19] P. M. Pardalos. Convex optimization theory. Optimization Methods and Software, 25(3):487-487, 2010.
[20] A. Radford, L. Metz, and S. Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434, 2015.
[21] S. Reed, Z. Akata, X. Yan, L. Logeswaran, B. Schiele, and H. Lee. Generative adversarial text to image synthesis. arXiv preprint arXiv:1605.05396, 2016.
[22] D. E. Rumelhart, G. E. Hinton, and R. J. Williams. Learning representations by back-propagating errors. Nature, 323(6088):533, 1986.
[23] S. J. Sangwine and R. E. Horne. The Colour Image Processing Handbook. Springer Science & Business Media, 2012.
[24] C. K. Sønderby, J. Caballero, L. Theis, W. Shi, and F. Huszár. Amortised MAP inference for image super-resolution. arXiv preprint arXiv:1610.04490, 2016.
[25] H. Xiao, K. Rasul, and R. Vollgraf. Fashion-MNIST: A novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747, 2017.
[26] J. Xie, L. Xu, and E. Chen. Image denoising and inpainting with deep neural networks. In Advances in Neural Information Processing Systems, pages 341-349, 2012.
[27] D. Zhao, B. Guo, J. Wu, W. Ning, and Y. Yan. Robust feature learning by improved auto-encoder from non-Gaussian noised images. In 2015 IEEE International Conference on Imaging Systems and Techniques (IST), pages 1-5. IEEE, 2015.
[28] C. Zhou and R. C. Paffenroth. Anomaly detection with robust deep autoencoders. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 665-674. ACM, 2017.

A ALGORITHM

Algorithm 1 Training algorithm for RVAE
Input: $X \in \mathbb{R}^{N \times n}$, $\epsilon$
1: Initialize $L \in \mathbb{R}^{N \times n}$, $S \in \mathbb{R}^{N \times n}$ to be zero matrices with $L + S = X$, and the variational autoencoder $D_{\theta_2}(E_{\theta_1}(\cdot))$ with randomly initialized parameters.
2: while $c_1 > \epsilon$ and $c_2 > \epsilon$ and iter < max_iter do
3:   $L = X - S$
4:   $\min \|L - D_{\theta_2}(E_{\theta_1}(L))\|_2 + \mathrm{KL}(E_{\theta_1}(L) \,\|\, \mathcal{N}(0,1))$ (trained by back-propagation)
5:   $L = D_{\theta_2}(E_{\theta_1}(L))$
6:   $S = X - L$
7:   $S = \operatorname{prox}_{\lambda,\ell_1}(S)$
8:   $c_1 = \|X - L - S\|_2 / \|X\|_2$
9:   $c_2 = \|LS - L - S\|_2 / \|X\|_2$
10:  $LS = L + S$
11: end while
12: return $L$ and $S$
B EXPERIMENT SETTING AND MORE RESULTS

B.1 Datasets

MNIST is a standard benchmark data set of hand-written digits with integer labels between 0 and 9, while Fashion-MNIST is comprised of 10 classes of fashion products. Both of these data sets consist of 70,000 instances with 784 features each. CelebA is a large-scale real-world face image data set which contains more than 200,000 celebrity images. Rather than focusing on generating any specific classes of images, we utilize all training samples without their label information.

We corrupt the original images with salt-and-pepper noise [23] and also test the robustness of the model by introducing Gaussian noise. The salt-and-pepper corruption works by changing some random pixels to 0 if their original values are larger than 0.5, and to 1 if their values are smaller than 0.5. As for Gaussian noise, we add a value drawn from a Gaussian distribution with a scaling factor to every pixel. Each pixel is then clipped to [0, 1].
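Both corruption procedures can be transcribed directly from the description above; this NumPy sketch is illustrative (the function names and the fixed random seed are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def salt_and_pepper(X, ratio):
    """Flip a `ratio` fraction of pixels: bright pixels (>0.5) to 0, dark ones to 1."""
    X = X.copy()
    mask = rng.random(X.shape) < ratio
    X[mask] = np.where(X[mask] > 0.5, 0.0, 1.0)
    return X

def gaussian_noise(X, scale):
    """Add scaled Gaussian noise to every pixel, then clip back to [0, 1]."""
    return np.clip(X + scale * rng.standard_normal(X.shape), 0.0, 1.0)
```

For example, salt_and_pepper(X, 0.29) reproduces the 29% pixel corruption used for the first row of Table 1.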
B.2 Implementation Details

All the generative models for the same data set share the same number of parameters. The layer sizes of the RVAE are 784, 512, 256, 49, 256, 512, and 784. For MNIST and Fashion-MNIST, Sigmoid is used as the activation function, and the batch size of each model is 200. We train the VAE inside the RVAE for 20 epochs, with 30 iterations per epoch, to alternate projecting onto $L$ and $S$; thus the total number of iterations is 600. The benchmark methods are also trained for 600 iterations. We use the Adam optimizer to optimize our model with a learning rate of 1e-3. Since CelebA contains real color images, we add convolutional layers to the VAE model. Specifically, we use nine hidden layers which project the input data of CelebA from 64×64×3 to 32×32×64, 16×16×128, 8×8×256, 4×4×512, and 100 dimensions with convolutional layers, and decode it back to 64×64×3 dimensions.

As mentioned above, we also include two-stage pipelines where we apply a well-known denoising model to remove the noise first and then use a standard VAE to generate images from the preprocessed images. In particular, we pick three representative denoising approaches: LPF, RPCA, and RDA. LPF is a standard denoising method widely used in image processing, while RPCA and RDA are the inspirations for our RVAE model. As we introduced in Section 2.1, both RPCA and RDA filter noise into the $S$ part, and the remaining part $L$ therefore contains less noise. In our experiments, $L$ is used as the cleaned image from which to generate new images. Our code is available on GitHub at https://github.com/huiminren/RobustVAE.
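Only the layer shapes above are given for the CelebA model; the following PyTorch sketch of the encoder half matches those shapes, with the kernel sizes, strides, and ReLU activations as our own assumptions:

```python
import torch.nn as nn

# Encoder half of the convolutional VAE used for CelebA (shapes from B.2):
# 64x64x3 -> 32x32x64 -> 16x16x128 -> 8x8x256 -> 4x4x512 -> 100-d latent.
encoder = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=4, stride=2, padding=1),    # 64x64 -> 32x32
    nn.ReLU(),
    nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1),  # 32x32 -> 16x16
    nn.ReLU(),
    nn.Conv2d(128, 256, kernel_size=4, stride=2, padding=1), # 16x16 -> 8x8
    nn.ReLU(),
    nn.Conv2d(256, 512, kernel_size=4, stride=2, padding=1), # 8x8 -> 4x4
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(512 * 4 * 4, 100),    # 100-dimensional latent code
)
```

The decoder mirrors this stack with transposed convolutions back to 64×64×3.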

B.3 Statistical Results

Tables 2, 3, and 4 show the quantitative comparison results with the baseline models on MNIST, Fashion-MNIST, and CelebA. The RVAE outperforms the baselines in most experiments. One may notice that the FID scores of the RVAE are slightly larger than those of RDA+VAE on Fashion-MNIST and CelebA; however, the differences are small and not visually perceptible in the generated examples shown in Table 1. In addition, the RDA+VAE requires a much longer training time to reach similar performance as the RVAE, as shown in Figure 3b.

Table 2: FID scores for MNIST

Noise  RVAE    VAE     GAN     RPCA+VAE  LPF+VAE  RDA+VAE
1%     28.93   111.25  158.10  95.33     127.62   47.05
9%     43.34   141.32  324.35  101.98    142.79   49.96
17%    80.87   203.51  351.79  115.88    168.40   63.37
25%    90.15   186.20  342.96  127.98    178.87   74.57
33%    102.51  413.81  357.46  143.93    187.41   132.90
41%    172.81  313.42  406.24  141.76    335.77   189.35
49%    230.50  335.41  385.01  170.10    413.02   275.38

Table 3: FID scores for Fashion-MNIST

Noise  RVAE    VAE     GAN     RPCA+VAE  LPF+VAE  RDA+VAE
1%     84.99   93.80   147.98  174.84    172.16   128.13
9%     68.70   122.55  255.34  137.48    146.17   66.08
17%    82.43   144.22  280.94  144.75    164.76   77.20
25%    93.62   159.41  302.00  151.92    182.61   85.61
33%    102.42  170.86  323.75  161.19    193.26   91.60
41%    111.24  189.72  372.97  165.00    208.20   102.12
49%    120.78  213.99  405.98  172.54    216.86   110.02

Table 4: FID scores for CelebA

Noise  RVAE   VAE     GAN     RPCA+VAE  LPF+VAE  RDA+VAE
10%    63.11  65.90   372.30  87.79     76.37    63.85
20%    64.46  68.68   300.84  128.64    79.21    62.66
30%    65.71  76.99   338.42  137.59    83.63    64.57
40%    66.55  89.97   416.20  87.70     86.51    64.26
50%    70.41  115.88  341.93  175.97    93.78    65.87

C MORE RELATED WORK

Generative models and denoising models are both active topics related to our study in the literature. Though briefly introduced in Section 1, we re-summarize some of the existing works here in an organized way.

Generative Models. Generative models have achieved attractive results in multiple areas, including image generation [17, 20]. There are two main approaches to generative models, GANs [10] and VAEs [13], which generate synthetic but realistic samples from noise-free inputs. Although some works address "denoising", such as Sønderby et al. [24], who introduced noise to the input images to improve the GANs' stability, and Im et al. [12], who injected noise both into the input and into the stochastic latent layer of the VAE to obtain an improved training objective, noise-free, clean inputs are still critical to their objectives.

Denoising Models. In many real problems, especially in the areas of audio, image, or signal processing, the importance of denoising methods is never underestimated [26, 27]. In the recent literature, some deep learning works pay attention to removing noise from noisy inputs [26]. Agostinelli et al. [1] proposed an adaptive multi-column stacked sparse denoising autoencoder, which is robust to variation in noise types. Fu et al. [9] used a deep convolutional neural network to remove rain from a single image. Our work goes one step further than these denoising-only networks by allowing the generation of new realistic samples from the cleaned images.