AutoGAN: Neural Architecture Search for Generative Adversarial Networks

Xinyu Gong1, Shiyu Chang2, Yifan Jiang1, Zhangyang Wang1
1Department of Computer Science & Engineering, Texas A&M University
2MIT-IBM Watson AI Lab
{xy_gong, yifanjiang97, atlaswang}@tamu.edu, [email protected]

Abstract

Neural architecture search (NAS) has witnessed prevailing success in image classification and (very recently) segmentation tasks. In this paper, we present the first preliminary study on introducing the NAS algorithm to generative adversarial networks (GANs), dubbed AutoGAN. The marriage of NAS and GANs faces unique challenges. We define the search space for the generator architectural variations and use an RNN controller to guide the search, with parameter sharing and dynamic-resetting to accelerate the process. The Inception score is adopted as the reward, and a multi-level search strategy is introduced to perform NAS in a progressive way. Experiments validate the effectiveness of AutoGAN on the task of unconditional image generation. Specifically, our discovered architectures achieve highly competitive performance compared to current state-of-the-art hand-crafted GANs, e.g., setting new state-of-the-art FID scores of 12.42 on CIFAR-10 and 31.01 on STL-10. We also conclude with a discussion of the current limitations and future potential of AutoGAN. The code is available at https://github.com/TAMU-VITA/AutoGAN.

1. Introduction

Generative adversarial networks (GANs) have been prevailing since their origin [11]. One of their most notable successes lies in generating realistic natural images with various convolutional architectures [47, 5, 61, 26, 13, 67]. In order to improve the quality of generated images, many efforts have been proposed, including modifying discriminator loss functions [3, 68], enforcing regularization terms [6, 13, 42, 5], introducing attention mechanisms [61, 64], and adopting progressive training [26].

However, the backbone architecture design of GANs has received relatively less attention, and was often considered a less significant factor accounting for GAN performance [39, 32]. Most earlier GANs stick to relatively shallow generator and discriminator architectures, mostly owing to the notorious instability in GAN training. Lately, several state-of-the-art GANs [13, 42, 64, 5] adopted deep residual network generator backbones for better generating high-resolution images. In most other computer vision tasks, much of the progress arises from improved network architecture designs, such as in image classification [15, 16, 52, 51, 35], segmentation [8, 9, 49], and pose estimation [43]. Hereby, we advocate that enhanced backbone designs are also important for improving GANs further.

In recent years, there has been surging interest in designing sophisticated neural network architectures automatically. Neural architecture search (NAS) has been successfully developed and evaluated on the task of image classification [71, 46], and lately on image segmentation as well [7, 36]. The discovered architectures outperform human-designed models. However, naively porting existing NAS ideas from image classification/segmentation to GANs would not suffice. First and foremost, even given hand-designed architectures, the training of GANs is notoriously unstable and prone to collapse [50]. Mingling NAS into the training process will undoubtedly amplify the difficulty. As another important challenge, while validation accuracy makes a natural reward option for NAS in image classification, it is less straightforward to choose a good metric for evaluating and guiding the GAN search process.

This paper presents an architecture search scheme specifically tailored for GANs, dubbed AutoGAN. To the best of our knowledge, AutoGAN is the first attempt to incorporate NAS with GANs, and is also among the first attempts to extend NAS beyond image classification. Our technical innovations are summarized as follows:

• We define the search space to capture the GAN architectural variations. On top of that, we use an RNN controller [71] to guide the architecture search. Based on the parameter-sharing strategy in [46], we further introduce a parameter dynamic-resetting strategy in our search process, to boost the training speed.

• We use the Inception score (IS) [50] as the reward in the reinforcement-learning-based optimization of AutoGAN (see the sketch after this list). The discovered models are found to also show favorable performance under other GAN metrics, e.g., the Fréchet Inception Distance (FID) [50].

• We further introduce multi-level architecture search (MLAS) to AutoGAN, as motivated by progressive GAN training [26]. MLAS performs the search in multiple stages, in a bottom-up sequential fashion, with beam search [37].
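To make these pieces concrete, here is a minimal, self-contained sketch (not the authors' released code) of the reward-driven loop the bullets describe: an RNN controller samples a sequence of architecture tokens, and REINFORCE with a moving-average baseline pushes it toward architectures with higher Inception score. The token counts, hidden size, learning rate, and the placeholder evaluate_inception_score (which in AutoGAN would briefly train the shared-weight child GAN and score its samples) are all illustrative assumptions.

```python
import torch
import torch.nn as nn

NUM_TOKENS = 5  # assumed: number of choices per architecture decision
NUM_STEPS = 7   # assumed: number of decisions specifying one architecture

class Controller(nn.Module):
    """RNN policy that emits one architecture decision per step."""
    def __init__(self, hidden=64):
        super().__init__()
        self.lstm = nn.LSTMCell(hidden, hidden)
        self.embed = nn.Embedding(NUM_TOKENS, hidden)
        self.head = nn.Linear(hidden, NUM_TOKENS)
        self.hidden = hidden

    def sample(self):
        h = torch.zeros(1, self.hidden)
        c = torch.zeros(1, self.hidden)
        x = torch.zeros(1, self.hidden)  # start-of-sequence input
        tokens, log_probs = [], []
        for _ in range(NUM_STEPS):
            h, c = self.lstm(x, (h, c))
            dist = torch.distributions.Categorical(logits=self.head(h))
            tok = dist.sample()
            tokens.append(int(tok))
            log_probs.append(dist.log_prob(tok))
            x = self.embed(tok)  # feed the sampled choice back in, as in [71]
        return tokens, torch.stack(log_probs).sum()

def evaluate_inception_score(arch):
    # Placeholder reward: in AutoGAN this would train the shared-weight
    # child GAN briefly and compute the Inception score of its samples.
    return float(torch.rand(1)) * 10.0

controller = Controller()
opt = torch.optim.Adam(controller.parameters(), lr=3.5e-4)
baseline = 0.0
for step in range(10):
    arch, log_prob = controller.sample()
    reward = evaluate_inception_score(arch)
    baseline = 0.9 * baseline + 0.1 * reward   # moving-average baseline
    loss = -log_prob * (reward - baseline)     # REINFORCE objective
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The moving-average baseline is a standard variance-reduction choice in NAS controllers; the paper's dynamic-resetting of shared parameters would additionally reinitialize the child GAN's weights when training degenerates.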
We conduct a variety of experiments to validate the effectiveness of AutoGAN. Our discovered architectures yield highly promising results that are better than or comparable with current hand-designed GANs. On the CIFAR-10 dataset, AutoGAN obtains an Inception score of 8.55 and an FID score of 12.42. In addition, we show that the architecture discovered on CIFAR-10 even performs competitively on the STL-10 image generation task, with a 9.16 Inception score and a 31.01 FID score, demonstrating strong transferability. On both datasets, AutoGAN establishes new state-of-the-art FID scores. Many of our experimental findings concur with previous GAN-crafting experiences, and shed light on new insights into GAN generator design.

2. Related Work

2.1. Neural Architecture Search

Neural architecture search (NAS) algorithms aim to find an optimal neural network architecture, instead of using a hand-crafted one, for a specific task. Previous works on NAS have achieved great success on the task of image classification [30]. Recent works further extended NAS algorithms to dense and structured prediction [7, 36]. It is also worth mentioning that NAS has been applied to CNN compression and acceleration [17]. However, no NAS algorithm has so far been developed for generative models.

A NAS algorithm consists of three key components: the search space, the optimization algorithm, and the proxy task. For the search space, there are generally two categories: searching for the whole architecture directly (macro search), or searching for cells and stacking them in a pre-defined way (micro search). For the optimization algorithm, popular options include reinforcement learning [4, 71, 69, 72], evolutionary algorithms [60], Bayesian optimization [24], random search [7], and gradient-based optimization methods [38, 1]. The proxy task is designed to evaluate the performance of a discovered architecture efficiently during training. Examples include early stopping [7, 72], using low-resolution images [57, 10, 29], performance prediction with a surrogate model [37], employing a small backbone [7], and leveraging shared parameters [46].

Most NAS algorithms [71, 46] generate the network architecture (macro search) or the cell (micro search) in one pass of the controller. A recent work [37] introduced multi-level search to NAS in the image classification task, using beam search. The architecture search begins on a smaller cell, and top-performing candidates are preserved; the next round of search then continues from them on a bigger cell.

2.2. Generative Adversarial Network

A GAN has a generator network and a discriminator network playing a min-max two-player game against each other. It has achieved great success in many generation and synthesis tasks, such as text-to-image translation [66, 65, 61, 48], image-to-image translation [22, 70, 63], and image enhancement [33, 31, 23]. However, the training of GANs is often found to be highly unstable [50], and commonly suffers from non-convergence, mode collapse, and sensitivity to hyperparameters. Many efforts have been devoted to alleviating those problems, such as the Wasserstein loss [3], spectral normalization [42], progressive training [26], and the self-attention block [64], to name just a few.

3. Technical Approach

A GAN consists of two competing networks: a generator and a discriminator. It is also known that the two architectures have to be delicately balanced in their learning capacities. Hence, to build AutoGAN, the first question is: how should the two networks in a GAN (the generator and the discriminator, denoted as G and D hereinafter) be built together? On the one hand, if we use a pre-fixed D (or G) and search only for G (or D), it will easily incur an imbalance between the powers of D and G [18, 2] (especially at the early stage of NAS), resulting in slow updates or trivial learning. On the other hand, while it might be possible to jointly search for G and D, our empirical experiments observe that such two-way NAS further deteriorates the already unstable GAN training, leading to highly oscillating training curves and often failure of convergence. As a trade-off, we propose to use NAS to search only for the architecture of G, while growing D as G becomes deeper, by following a given routine to stack pre-defined blocks. The details of growing D are explained further in the supplementary material.
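The growing routine for D is deferred to the paper's supplementary material, so the following is only a hedged illustration of the stated idea: each time G gains a cell (and doubles its output resolution), D gains one pre-defined downsampling block so the two stay balanced. The block design and channel schedule below are our assumptions, not the paper's.

```python
import torch
import torch.nn as nn

def disc_block(in_ch, out_ch):
    # Assumed pre-defined block: two 3x3 convs followed by 2x average-pooling.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(),
        nn.AvgPool2d(2),
    )

def grow_discriminator(num_gen_cells, base_ch=64):
    """Stack one downsampling block per generator cell, then a linear head."""
    blocks = [disc_block(3, base_ch)]
    ch = base_ch
    for _ in range(num_gen_cells - 1):  # one more block per added G cell
        blocks.append(disc_block(ch, ch * 2))
        ch *= 2
    blocks += [nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(ch, 1)]
    return nn.Sequential(*blocks)

# Example: a 3-cell generator producing 32x32 images gets a 3-block D.
d = grow_discriminator(3)
print(d(torch.randn(2, 3, 32, 32)).shape)  # -> torch.Size([2, 1])
```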
Based on that, AutoGAN follows the basic idea of [71] and uses a recurrent neural network (RNN) controller to choose blocks from its search space to build the G network. The basic scheme is illustrated in Figure 1. We make multi-fold innovations to address the unique challenges arising from the specific task of training GANs. We next introduce AutoGAN from the three key aspects: the search space, the proxy task, and the optimization algorithm.

[Figure 1: The running scheme of the RNN controller.]

3.1. Search Space

AutoGAN is based on a multi-level architecture search strategy, where the generator is composed of several cells. Here we use an (s+5)-element tuple (skip_1, ..., skip_s, C, N, U, SC) to categorize the s-th cell,
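The excerpt cuts off before the tuple entries are defined. As a hedged illustration only, the sketch below decodes such a tuple into a cell configuration, assuming skip_1..skip_s are binary skip connections from earlier cells, C a convolution block type, N a normalization type, U an upsampling operation, and SC a binary in-cell shortcut; the concrete option sets are our guesses, not the paper's.

```python
from dataclasses import dataclass
from typing import List

# Assumed option sets for illustration; the paper defines the real choices
# in the remainder of Section 3.1.
CONV_TYPES = ["pre_activation", "post_activation"]    # C
NORM_TYPES = ["batch_norm", "instance_norm", "none"]  # N
UPSAMPLE_TYPES = ["bilinear", "nearest", "deconv"]    # U

@dataclass
class CellConfig:
    skips: List[bool]  # skip_1 .. skip_s: use a skip from each earlier cell?
    conv: str          # C: convolution block type
    norm: str          # N: normalization type
    up: str            # U: upsampling operation
    shortcut: bool     # SC: in-cell shortcut connection

def decode_cell(tokens: List[int]) -> CellConfig:
    # The last four entries are C, N, U, SC; everything before them is skips.
    *skips, c, n, u, sc = tokens
    return CellConfig(
        skips=[bool(t) for t in skips],
        conv=CONV_TYPES[c],
        norm=NORM_TYPES[n],
        up=UPSAMPLE_TYPES[u],
        shortcut=bool(sc),
    )

# Example: a cell that can receive skips from two earlier cells.
print(decode_cell([1, 0, 1, 2, 0, 1]))
```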