AlphaGAN: Fully Differentiable Architecture Search for Generative Adversarial Networks

Anonymous Author(s) Affiliation Address email

Abstract

Generative Adversarial Networks (GANs) are formulated as minimax game problems, whereby generators attempt to approach real data distributions by virtue of adversarial learning against discriminators. In this work, we aim to boost model learning from the perspective of network architectures, by incorporating recent progress on automated architecture search into GANs. To this end, we propose a fully differentiable search framework for generative adversarial networks, dubbed alphaGAN. The searching process is formalized as solving a bi-level minimax optimization problem, where the outer-level objective aims to seek a suitable network architecture towards pure Nash Equilibrium conditioned on the network parameters optimized at the inner level. The entire optimization performs a first-order method by alternately optimizing the two-level objectives in a fully differentiable manner, enabling architecture search to be completed in an enormous search space. Extensive experiments on the CIFAR-10 and STL-10 datasets show that our algorithm can obtain high-performing architectures with only 3 GPU-hours on a single GPU, in a search space comprising approximately $2 \times 10^{11}$ possible configurations.

1 Introduction

Generative Adversarial Networks (GANs) [1] have shown promising performance on a variety of generative tasks (e.g., image generation [2], image translation [3, 4], dialogue generation [5], and image inpainting [6]). However, pursuing high-performance generative networks is non-trivial due to the non-convex, non-concave property of the problem. There is a rich history of research aiming to improve training stability and alleviate mode collapse by introducing new generative adversarial functions (e.g., the Wasserstein distance [7], least-squares loss [8], and hinge loss [9]) or regularization (e.g., gradient penalty [10, 11]).

Alongside the direction of improving loss functions, improving architectures has proven important for stabilizing training and improving generalization. Previous works [12, 9, 10, 13, 2] employ deep convolutional networks to boost the performance of GANs. However, such manual architecture design typically requires domain-specific knowledge from human experts, which is even more challenging for GANs due to the minimax formulation they intrinsically possess. Recent progress of architecture search on a variety of tasks [14, 15, 16, 17] has shown that remarkable results can be obtained by automating the architecture search process.

In this paper, we aim to address the problem of GAN architecture search from the perspective of game theory, since GAN training is essentially a minimax problem [1] targeting a pure Nash Equilibrium of the generator and the discriminator [18, 19]. From this perspective, we propose a fully differentiable architecture search framework for GANs, dubbed alphaGAN, in which a differentiable evaluation metric is introduced for guiding architecture search towards pure Nash Equilibrium [20]. Motivated by DARTS [15], we formulate the search process of alphaGAN as a bi-level minimax optimization

problem, and solve it efficiently via stochastic gradient-type methods. Specifically, the outer-level objective aims to optimize the generator architecture parameters towards pure Nash Equilibrium, whereas the inner-level constraint targets optimizing the weight parameters conditioned on the currently searched architecture.

This work is related to several recent methods: GAN architecture search has been performed with a reinforcement learning (RL)-based paradigm [21, 22, 23] or a gradient-based paradigm [24].

Extensive experiments, including comparisons to these methods and analysis of the searched architectures, demonstrate the effectiveness of the proposed algorithm in both performance and efficiency. Specifically, alphaGAN can discover high-performance architectures while being much faster than the other automated architecture search methods.

2 GAN Architecture Search as Fully Differentiable Optimization

Differentiable Architecture Search was first proposed in [15], where the problem is formulated as a bi-level optimization. Weight parameters and architecture parameters are optimized in the inner level and outer level, respectively. The objective function in both levels is the cross-entropy loss, which can reflect the quality of current models in classification tasks.

However, deploying such a framework to search architectures of GANs is non-trivial. The training of GANs corresponds to the optimization of a minimax problem, and many previous works [2, 25] have pointed out that the adversarial loss cannot reflect the quality of GANs. A suitable evaluation metric is therefore essential for a gradient-based NAS-GAN framework.

Evaluation function. Due to the intrinsic minimax property of GANs, the training process of GANs can be viewed as a zero-sum game as in [18, 1]. The universal objective of training GANs can consequently be regarded as reaching pure equilibrium. Hence, we adopt the primal-dual gap function [25, 26] for evaluating the generalization of vanilla GANs. Given a pair of G and D, the duality-gap function is defined as

$$V(G, D) = \mathrm{Adv}(G, \bar D) - \mathrm{Adv}(\bar G, D) := \max_{\bar D} \mathrm{Adv}(G, \bar D) - \min_{\bar G} \mathrm{Adv}(\bar G, D). \qquad (1)$$

The evaluation metric $V(G, D)$ is non-negative, and $V(G, D) = 0$ can only be achieved when the pure equilibrium holds. Function (1) provides a quantified measure of "how close the current GAN is to pure equilibrium", which can be used for assessing model capacity.
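As a rough illustration of how the gap can be estimated in practice, the sketch below approximates the best responses $\bar D$ and $\bar G$ by taking a few Adam steps from copies of the current players. It is a minimal sketch under stated assumptions, not the authors' released procedure: `adv_loss(G, D, real, z)`, `real_batches`, and `z_sampler` are assumed helpers.

```python
import copy
import torch

def estimate_duality_gap(G, D, adv_loss, real_batches, z_sampler,
                         steps=5, lr=2e-4):
    """Rough estimate of V(G, D) = max_D' Adv(G, D') - min_G' Adv(G', D).

    adv_loss(G, D, real, z) is an assumed helper returning the adversarial
    objective that D maximizes and G minimizes; real_batches is a list of
    validation batches and z_sampler(n) draws n latent vectors.
    """
    # Approximate the best-response discriminator by a few ascent steps.
    # (Gradients that land on the frozen player are simply discarded here.)
    D_bar = copy.deepcopy(D)
    opt_d = torch.optim.Adam(D_bar.parameters(), lr=lr, betas=(0.0, 0.999))
    for real in real_batches[:steps]:
        loss = -adv_loss(G, D_bar, real, z_sampler(real.size(0)))
        opt_d.zero_grad()
        loss.backward()
        opt_d.step()

    # Approximate the best-response generator by a few descent steps.
    G_bar = copy.deepcopy(G)
    opt_g = torch.optim.Adam(G_bar.parameters(), lr=lr, betas=(0.0, 0.999))
    for real in real_batches[:steps]:
        loss = adv_loss(G_bar, D, real, z_sampler(real.size(0)))
        opt_g.zero_grad()
        loss.backward()
        opt_g.step()

    # Evaluate the gap on a held-out batch.
    real = real_batches[0]
    z = z_sampler(real.size(0))
    return adv_loss(G, D_bar, real, z) - adv_loss(G_bar, D, real, z)
```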

The architecture search for GANs can be formulated as a specific bi-level optimization problem:
$$\min_{\alpha} \Big\{ V(G, D) : (G, D) := \arg\min_{G}\max_{D} \mathrm{Adv}(G, D) \Big\}, \qquad (2)$$

where $V(G, D)$ is evaluated on the validation dataset and supervises the search for the optimal generator architecture as the outer-level problem, and the inner-level optimization of $\mathrm{Adv}(G, D)$ aims to learn suitable network parameters (for both the generator and the discriminator) for the GAN under the current architecture.

In this work, we exploit the hinge loss from [9, 27] as the generative adversarial function $\mathrm{Adv}(G, D)$.
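For reference, a minimal sketch of the hinge formulation, written (as is common in implementations) as two losses that D and G each minimize on raw discriminator logits; the function names are our own and not taken from the paper's code.

```python
import torch
import torch.nn.functional as F

def d_hinge_loss(d_real, d_fake):
    # Discriminator side of the hinge objective on real/fake logits.
    return torch.mean(F.relu(1.0 - d_real)) + torch.mean(F.relu(1.0 + d_fake))

def g_hinge_loss(d_fake):
    # Generator side: push D(G(z)) up, i.e. minimize -E[D(G(z))].
    return -torch.mean(d_fake)
```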

AlphaGAN formulation. By integrating the generative adversarial function (i.e., the hinge loss) and the evaluation function (1) into the bi-level optimization (2), we obtain the final objective of the framework as follows,

$$\min_{\alpha} \; V_{\mathrm{val}}(G, D) = \mathrm{Adv}(G, \bar D) - \mathrm{Adv}(\bar G, D) \qquad (3)$$

$$\text{s.t.}\quad \omega \in \arg\min_{\omega_G}\max_{\omega_D} \mathrm{Adv}_{\mathrm{train}}(G, D), \qquad (4)$$

where the generator $G$ and the discriminator $D$ are parameterized with the variables $(\alpha_G, \omega_G)$ and $\omega_D$, respectively, $\bar D = \arg\max_{D} \mathrm{Adv}_{\mathrm{val}}(G, D)$, and $\bar G = \arg\min_{G} \mathrm{Adv}_{\mathrm{val}}(G, D)$. The detailed search algorithm and other details are given in the supplementary material.

Table 1: Comparison with state-of-the-art GANs on CIFAR-10. † denotes results reproduced by us, with the structure released by AutoGAN and trained under the same setting as AutoGAN.

| Architecture | Params (M) | FLOPs (G) | Search time (GPU-hours) | Search space | Search method | IS (↑ is better) | FID (↓ is better) |
|---|---|---|---|---|---|---|---|
| DCGAN [12] | - | - | - | - | manual | 6.64 ± 0.14 | - |
| SN-GAN [9] | - | - | - | - | manual | 8.22 ± 0.05 | 21.7 ± 0.01 |
| Progressive GAN [13] | - | - | - | - | manual | 8.80 ± 0.05 | - |
| WGAN-GP, ResNet [10] | - | - | - | - | manual | 7.86 ± 0.07 | - |
| AutoGAN [22] | 5.192 | 1.77 | - | ∼ 10^5 | RL | 8.55 ± 0.1 | 12.42 |
| AutoGAN† | 5.192 | 1.77 | 82 | ∼ 10^5 | RL | 8.38 ± 0.08 | 13.95 |
| AGAN [21] | - | - | 28800 | ∼ 20000 | RL | 8.29 ± 0.09 | 30.5 |
| Random search [30] | 2.701 | 1.11 | 40 | ∼ 2 × 10^11 | Random | 8.46 ± 0.09 | 15.43 |
| alphaGAN(l) | 8.618 | 2.78 | 22 | ∼ 2 × 10^11 | gradient | 8.71 ± 0.12 | 11.23 |
| alphaGAN(s) | 2.953 | 1.32 | 3 | ∼ 2 × 10^11 | gradient | 8.60 ± 0.11 | 11.85 |

3 Experiments

In this section, we conduct extensive experiments on CIFAR-10 [28] and STL-10 [29]. First, the generator architecture is searched on CIFAR-10, and the discretized optimal structure is fully re-trained from scratch following [22] in Section 3.1. We compare alphaGAN with other automated GAN methods on multiple measures to demonstrate its effectiveness. Second, the generalization of the searched architectures is verified by fully training them on STL-10 and evaluating them in Section 3.2. To further understand the properties of our method, a series of studies on the key components of the framework is presented in Section 3.3. Experiment details are given in the supplementary material.

3.1 Searching on CIFAR-10

We first compare our method with recent automated GAN methods. For a fair comparison, we report the performance of the best run (over 3 runs) for the reproduced baselines and ours in Table 1, and provide the performance of several representative works with manually designed architectures for reference. We provide a detailed analysis and discussion of the searching process, and the statistical properties of the architectures searched by alphaGAN are given in the supplementary material.

The performance of alphaGAN with two search configurations, obtained by adjusting the step sizes of the inner and outer loops, is shown in Table 1: alphaGAN(l) passes through an entire epoch of the training and validation sets for each loop (i.e., steps = 390), whereas alphaGAN(s) uses smaller intervals with steps = 20.

The results show that our method performs well in both settings and outperforms the other automated GAN methods in terms of both efficiency and performance. alphaGAN(l) obtains the lowest FID among all baselines. In particular, alphaGAN(s) attains the best tradeoff between efficiency and performance: it achieves comparable results while searching in a large search space (significantly larger than those of the RL-based baselines) in a considerably efficient manner (i.e., only 3 GPU-hours compared to the baselines requiring tens to thousands of GPU-hours). The architecture obtained by alphaGAN(s) is light-weight and computationally efficient, reaching a good trade-off between performance and time complexity.

3.2 Transferability on STL-10

To validate the transferability of the architectures obtained by alphaGAN, we directly train models with the obtained architectures on the STL-10 dataset. The results are shown in Table 2. Both alphaGAN(l) and alphaGAN(s) show remarkable superiority in performance over the baselines with either automated or manually designed architectures, revealing that the architectures searched by alphaGAN can be effectively exploited across datasets. It is somewhat surprising that alphaGAN(s) behaves best, achieving the best performance in both IS and FID. This also shows that, compared to increasing model complexity, an appropriate selection and composition of operations can contribute to model performance in a more efficient manner, which is consistent with the primary motivation of automating architecture search.

Table 2: Results on STL-10. The structures of alphaGAN(l) and alphaGAN(s) are searched on CIFAR-10 and fully trained on STL-10. † denotes reproduced results, with the architectural configurations released by the original papers.

| Architecture | Params (M) | FLOPs (G) | IS | FID |
|---|---|---|---|---|
| SN-GAN [9] | - | - | 9.10 ± 0.04 | 40.1 ± 0.5 |
| ProbGAN [31] | - | - | 8.87 ± 0.095 | 46.74 |
| Improving MMD GAN [32] | - | - | 9.36 | 36.67 |
| AutoGAN [22] | 5.853 | 3.98 | 9.16 ± 0.12 | 31.01 |
| AutoGAN† | 5.853 | 3.98 | 9.38 ± 0.08 | 27.69 |
| AGAN [21] | - | - | 9.23 ± 0.08 | 52.7 |
| alphaGAN(l) | 9.279 | 6.26 | 9.53 ± 0.12 | 24.52 |
| alphaGAN(s) | 3.613 | 2.97 | 10.12 ± 0.13 | 22.43 |

3.3 Ablation Study

We conduct ablation experiments on CIFAR-10 to better understand the influence of its components under different configurations of both alphaGAN(l) and alphaGAN(s), focusing on two questions: the effect of searching the discriminator architecture, and how to obtain the optimal generator $\bar G$. More experiments are shown in the supplementary material.

Table 3: Ablation studies on CIFAR-10 (columns: Type, Search D?, Obtain $\bar G$ via updating $\bar\alpha_G$ and/or $\bar\omega_G$, IS, FID).

Search D's architecture or not? A natural question for alphaGAN is whether searching the discriminator structure can facilitate the searching and training of generators. The results in Table 3 show that searching the discriminator does not help the search for the optimal generator. We also conducted a trial in which GANs were trained with architectures obtained by searching both G and D, and the final performance is inferior to the setting of retraining with a given discriminator configuration. Simultaneously searching the architectures of both G and D potentially increases the effect of inferior discriminators, which may hamper the search for optimal generators conditioned on strong discriminators. In this regard, solely learning the generator's architecture may be a better choice.

How to obtain $\bar G$? In the definition of the duality gap, $\bar G$ and $\bar D$ denote the optima of G and D, respectively. As both the architecture and the network parameters are variables for G, we conduct experiments investigating the effect of updating $\bar\omega_G$ and $\bar\alpha_G$ for attaining $\bar G$. The results in Table 3 show that updating $\bar\omega_G$ alone achieves the best performance. Approximating $\bar G$ by updating $\bar\omega_G$ alone means that the architectures of G and $\bar G$ are identical, and hence optimizing the architecture parameters $\alpha_G$ in (3) can be viewed as compensating for the gap brought by the weight parameters $\omega_G$ and $\bar\omega_G$.

4 Conclusion

We presented alphaGAN, a fully differentiable architecture search framework for GANs, which is efficient and effective in seeking high-performing generator architectures from a vast number of possible configurations, achieving comparable or superior performance to state-of-the-art architectures that are either manually designed or automatically searched. In addition, the analysis of tracking the behavior of architecture performance and operation distribution gives some insights about architecture design, which may promote further research on architecture improvement. We mainly focused on vanilla GANs in this work and would like to extend the framework to conditional GANs, in which extra regularization on parts of the networks is typically imposed for task specialization, as future work.

Broader Impact

AlphaGAN will have potential positive impacts on tasks such as image/video generation, natural language generation, and other high-fidelity generation tasks. Researchers can utilize alphaGAN to design a powerful generative adversarial network with superior performance. On the other hand, we hope that this work can attract more attention in the AI research community to designing architectures for generation tasks rather than classification tasks. Moreover, the theory behind alphaGAN is transferable, so it can be applied to conditional GANs on several conditional generative tasks, e.g., style transfer, image-to-image translation, etc.

References

[1] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems, pages 2672–2680, 2014.

[2] Andrew Brock, Jeff Donahue, and Karen Simonyan. Large scale GAN training for high fidelity natural image synthesis. arXiv preprint arXiv:1809.11096, 2018.

[3] Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A. Efros. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, pages 2223–2232, 2017.

[4] Yunjey Choi, Minje Choi, Munyoung Kim, Jung-Woo Ha, Sunghun Kim, and Jaegul Choo. StarGAN: Unified generative adversarial networks for multi-domain image-to-image translation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 8789–8797, 2018.

[5] Jiwei Li, Will Monroe, Tianlin Shi, Sébastien Jean, Alan Ritter, and Dan Jurafsky. Adversarial learning for neural dialogue generation. arXiv preprint arXiv:1701.06547, 2017.

[6] Jiahui Yu, Zhe Lin, Jimei Yang, Xiaohui Shen, Xin Lu, and Thomas S. Huang. Generative image inpainting with contextual attention. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5505–5514, 2018.

[7] Martin Arjovsky, Soumith Chintala, and Léon Bottou. Wasserstein GAN. arXiv preprint arXiv:1701.07875, 2017.

[8] Xudong Mao, Qing Li, Haoran Xie, Raymond Y. K. Lau, Zhen Wang, and Stephen Paul Smolley. Least squares generative adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, pages 2794–2802, 2017.

[9] Takeru Miyato, Toshiki Kataoka, Masanori Koyama, and Yuichi Yoshida. Spectral normalization for generative adversarial networks. arXiv preprint arXiv:1802.05957, 2018.

[10] Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, and Aaron C. Courville. Improved training of Wasserstein GANs. In Advances in Neural Information Processing Systems, pages 5767–5777, 2017.

[11] Naveen Kodali, Jacob Abernethy, James Hays, and Zsolt Kira. On convergence and stability of GANs. arXiv preprint arXiv:1705.07215, 2017.

[12] Alec Radford, Luke Metz, and Soumith Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434, 2015.

[13] Shaokai Ye, Xiaoyu Feng, Tianyun Zhang, Xiaolong Ma, Sheng Lin, Zhengang Li, Kaidi Xu, Wujie Wen, Sijia Liu, Jian Tang, et al. Progressive DNN compression: A key to achieve ultra-high weight pruning and quantization rates using ADMM. arXiv preprint arXiv:1903.09769, 2019.

[14] Barret Zoph and Quoc V. Le. Neural architecture search with reinforcement learning. arXiv preprint arXiv:1611.01578, 2016.

[15] Hanxiao Liu, Karen Simonyan, and Yiming Yang. DARTS: Differentiable architecture search. arXiv preprint arXiv:1806.09055, 2018.

[16] Barret Zoph, Vijay Vasudevan, Jonathon Shlens, and Quoc V. Le. Learning transferable architectures for scalable image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 8697–8710, 2018.

[17] Andrew Brock, Theodore Lim, James M. Ritchie, and Nick Weston. SMASH: One-shot model architecture search through hypernetworks. arXiv preprint arXiv:1708.05344, 2017.

[18] Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, and Xi Chen. Improved techniques for training GANs. In Advances in Neural Information Processing Systems, pages 2234–2242, 2016.

[19] Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In Advances in Neural Information Processing Systems, pages 6626–6637, 2017.

[20] John F. Nash et al. Equilibrium points in n-person games. Proceedings of the National Academy of Sciences, 36(1):48–49, 1950.

[21] Hanchao Wang and Jun Huan. AGAN: Towards automated design of generative adversarial networks. arXiv preprint arXiv:1906.11080, 2019.

[22] Xinyu Gong, Shiyu Chang, Yifan Jiang, and Zhangyang Wang. AutoGAN: Neural architecture search for generative adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, pages 3224–3234, 2019.

[23] Yuan Tian, Qin Wang, Zhiwu Huang, Wen Li, Dengxin Dai, Minghao Yang, Jun Wang, and Olga Fink. Off-policy reinforcement learning for efficient and effective GAN architecture search. arXiv preprint arXiv:2007.09180, 2020.

[24] Chen Gao, Yunpeng Chen, Si Liu, Zhenxiong Tan, and Shuicheng Yan. AdversarialNAS: Adversarial neural architecture search for GANs. arXiv preprint arXiv:1912.02037, 2019.

[25] Paulina Grnarova, Kfir Y. Levy, Aurelien Lucchi, Nathanael Perraudin, Ian Goodfellow, Thomas Hofmann, and Andreas Krause. A domain agnostic measure for monitoring and evaluating GANs. In Advances in Neural Information Processing Systems, pages 12069–12079, 2019.

[26] Cheng Peng, Hao Wang, Xiao Wang, and Zhouwang Yang. DG-GAN: The GAN with the duality gap, 2020.

[27] Han Zhang, Ian Goodfellow, Dimitris Metaxas, and Augustus Odena. Self-attention generative adversarial networks. arXiv preprint arXiv:1805.08318, 2018.

[28] Antonio Torralba, Rob Fergus, and William T. Freeman. 80 million tiny images: A large data set for nonparametric object and scene recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(11):1958–1970, 2008.

[29] Adam Coates, Andrew Ng, and Honglak Lee. An analysis of single-layer networks in unsupervised feature learning. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, pages 215–223, 2011.

[30] Liam Li and Ameet Talwalkar. Random search and reproducibility for neural architecture search. arXiv preprint arXiv:1902.07638, 2019.

[31] Hao He, Hao Wang, Guang-He Lee, and Yonglong Tian. ProbGAN: Towards probabilistic GAN with theoretical guarantees.

[32] Wei Wang, Yuan Sun, and Saman Halgamuge. Improving MMD-GAN training with repulsive loss function. arXiv preprint arXiv:1812.09916, 2018.

[33] Martin J. Osborne and Ariel Rubinstein. A Course in Game Theory. MIT Press, 1994.

[34] Ding-Zhu Du and Panos M. Pardalos. Minimax and Applications, volume 4. Springer Science & Business Media, 2013.

[35] Jonathan Ho and Stefano Ermon. Generative adversarial imitation learning. In Advances in Neural Information Processing Systems, pages 4565–4573, 2016.

[36] Lerrel Pinto, James Davidson, Rahul Sukthankar, and Abhinav Gupta. Robust adversarial reinforcement learning. In Proceedings of the 34th International Conference on Machine Learning, pages 2817–2826. JMLR.org, 2017.

[37] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.

[38] Chris J. Maddison, Daniel Tarlow, and Tom Minka. A* sampling. In Advances in Neural Information Processing Systems, pages 3086–3094, 2014.

[39] Arber Zela, Thomas Elsken, Tonmoy Saikia, Yassine Marrakchi, Thomas Brox, and Frank Hutter. Understanding and robustifying differentiable architecture search. arXiv preprint arXiv:1909.09656, 2019.

A Preliminaries

Minimax games have regained a lot of attention [33, 34] since they were popularized in machine learning, e.g., in generative adversarial networks (GANs) [18] and reinforcement learning [35, 36]. Given a function $\mathrm{Adv}: X \times Y \rightarrow \mathbb{R}$, we consider a minimax game and its dual form:
$$\min_{G}\max_{D} \mathrm{Adv}(G, D) = \min_{G}\Big(\max_{D} \mathrm{Adv}(G, D)\Big), \qquad \max_{D}\min_{G} \mathrm{Adv}(G, D) = \max_{D}\Big(\min_{G} \mathrm{Adv}(G, D)\Big).$$

The pure equilibrium [20] of a minimax game can be used to characterize the best decisions of the two players G and D in the above minimax game.

Definition 1. $(G, D)$ is called a pure equilibrium of the game $\min_{G}\max_{D} \mathrm{Adv}(G, D)$ if it holds that
$$\max_{\bar D} \mathrm{Adv}(G, \bar D) = \min_{\bar G} \mathrm{Adv}(\bar G, D), \qquad (5)$$

where $\bar G = \arg\min_{G} \mathrm{Adv}(G, D)$ and $\bar D = \arg\max_{D} \mathrm{Adv}(G, D)$. When the minimax game equals its dual problem, $(G, D)$ is a pure equilibrium of the game. Hence, the gap between the minimax problem and its dual form can be used to measure the degree of approaching pure equilibrium [25].

The Generative Adversarial Network (GAN) proposed in [1] is mathematically defined as a minimax game problem with a binary cross-entropy loss that compares the distributions of real images and of synthetic images generated by the model. Despite the remarkable progress achieved by GANs, training high-performance models is still challenging for many generative tasks due to their fragility to almost every factor in the training process. Architectures of GANs have proven useful for stabilizing training and improving generalization [9, 10, 13, 2], and we hope to discover architectures by automating the design process with limited computational resources in a principled, differentiable manner.

B Algorithm and Optimization

In this section, we give a detailed description of the training algorithm and optimization process of alphaGAN. We first describe the entire network structure of the generator and the discriminator, the search space of the generator, and the continuous relaxation of the architectural parameters.

Base backbone of G and D. An illustration of the entire structure of the generator and the discriminator is shown in Fig. 1. The generator is constructed by stacking several cells whose topology is identical to those in AutoGAN [22] and SN-GAN [9] (Fig. 2). Each cell, regarded as a directed acyclic graph, is comprised of nodes representing intermediate feature maps and edges connecting pairs of nodes via different operations. We apply a fixed network architecture for the discriminator, based on the conventional design of [9].

Search space of G. The search space is compounded from two types of operations, i.e., normal operations and up-sampling operations. The pool of normal operations, denoted as $\mathcal{O}_n$, is comprised of {conv_1x1, conv_3x3, conv_5x5, sep_conv_3x3, sep_conv_5x5, sep_conv_7x7}. The pool of up-sampling operations, denoted as $\mathcal{O}_u$, is comprised of {deconv, nearest, bilinear}, where "deconv" denotes the ConvTransposed_2x2 operation. Our method allows $(6^3 \times 3^2)^3 \times 3^3 \approx 2 \times 10^{11}$ possible configurations for the generator architecture, which is larger than the $\sim 10^5$ of AutoGAN [22].
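A sketch of the two candidate pools in PyTorch; the operation names follow the paper, while the factory functions and padding choices are our own assumptions for illustration.

```python
import torch.nn as nn

def conv(c, k):
    # Standard convolution with 'same' padding (channel-preserving).
    return nn.Conv2d(c, c, k, padding=k // 2)

def sep_conv(c, k):
    # Depthwise-separable convolution: depthwise k x k + pointwise 1 x 1.
    return nn.Sequential(nn.Conv2d(c, c, k, padding=k // 2, groups=c),
                         nn.Conv2d(c, c, 1))

NORMAL_OPS = {
    'conv_1x1':     lambda c: conv(c, 1),
    'conv_3x3':     lambda c: conv(c, 3),
    'conv_5x5':     lambda c: conv(c, 5),
    'sep_conv_3x3': lambda c: sep_conv(c, 3),
    'sep_conv_5x5': lambda c: sep_conv(c, 5),
    'sep_conv_7x7': lambda c: sep_conv(c, 7),
}

UP_OPS = {
    'deconv':   lambda c: nn.ConvTranspose2d(c, c, 2, stride=2),
    'nearest':  lambda c: nn.Upsample(scale_factor=2, mode='nearest'),
    'bilinear': lambda c: nn.Upsample(scale_factor=2, mode='bilinear',
                                      align_corners=False),
}
```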

Continuous relaxation. The discrete selection of operations is approximated by a soft decision with a mutually exclusive function, following [15]. Formally, let $o \in \mathcal{O}_n$ denote a normal operation and $\alpha_{i,j}^{o}$ the architecture parameter associated with operation $o$ on the edge between node $i$ and its adjacent node $j$. Then the output on the edge induced by input node $i$ is
$$O_{i,j}(x) = \sum_{o \in \mathcal{O}_n} \frac{\exp\big(\alpha_{i,j}^{o}\big)}{\sum_{o' \in \mathcal{O}_n} \exp\big(\alpha_{i,j}^{o'}\big)}\, o(x), \qquad (6)$$
and the output of node $j$ is summed over all of its preceding nodes, i.e., $x^{j} = \sum_{i \in Pr(j)} O_{i,j}(x^{i})$. The selection of up-sampling operations follows the same procedure.
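The relaxation of Eq. (6) corresponds to a DARTS-style mixed operation. The sketch below, reusing the candidate pools sketched above, is an illustrative implementation and not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedOp(nn.Module):
    """Softmax-weighted mixture over a candidate pool on one edge (Eq. (6))."""

    def __init__(self, channels, op_pool):
        super().__init__()
        self.ops = nn.ModuleList([make(channels) for make in op_pool.values()])
        # One architecture parameter per candidate operation on this edge.
        self.alpha = nn.Parameter(1e-3 * torch.randn(len(self.ops)))

    def forward(self, x):
        weights = F.softmax(self.alpha, dim=-1)
        return sum(w * op(x) for w, op in zip(weights, self.ops))
```

A node output is then the sum of the mixed-operation outputs from all of its predecessors, and discretization keeps the arg-max operation on each edge.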

Solving alphaGAN. We apply an alternating minimization method to solve alphaGAN with respect to the variables $\big((\omega_G, \omega_D), (\bar\omega_G, \bar\omega_D), \alpha_G\big)$, as summarized in Algorithm 1, which is a fully differentiable gradient-type algorithm.

Algorithm 1 Searching the architecture of alphaGAN

Parameters: Initialize weight parameters $(\omega_G^{1}, \omega_D^{1})$ and generator architecture parameters $\alpha_G^{1}$. Initialize base learning rate $\eta$, momentum parameter $\beta_1$, and exponential moving average parameter $\beta_2$ for the Adam optimizer.
1: for $k = 1, 2, \dots, K$ do
2:   Set $(\omega_G^{k,1}, \omega_D^{k,1}) = (\omega_G^{k}, \omega_D^{k})$ and set $\alpha_G^{k,1} = \alpha_G^{k}$;
3:   for $t = 1, 2, \dots, T$ do
4:     Sample real data $\{x^{(l)}\}_{l=1}^{m} \sim \mathbb{P}_r$ from the training set and noise $\{z^{(l)}\}_{l=1}^{m} \sim \mathbb{P}_z$; estimate the gradient of the Adv loss with $\{x^{(l)}, z^{(l)}\}$ at $(\omega_G^{k,t}, \omega_D^{k,t})$, dubbed $\nabla \mathrm{Adv}(\omega_G^{k,t}, \omega_D^{k,t})$;
5:     $\omega_D^{k,t+1} = \mathrm{Adam}\big(\nabla_{\omega_D} \mathrm{Adv}(\omega_G^{k,t}, \omega_D^{k,t}), \omega_D^{k,t}, \eta, \beta_1, \beta_2\big)$;
6:     $\omega_G^{k,t+1} = \mathrm{Adam}\big(\nabla_{\omega_G} \mathrm{Adv}(\omega_G^{k,t}, \omega_D^{k,t}), \omega_G^{k,t}, \eta, \beta_1, \beta_2\big)$;
7:   end for
8:   Set $(\omega_G^{k+1}, \omega_D^{k+1}) = (\omega_G^{k,T}, \omega_D^{k,T})$;
9:   Receive the architecture parameters $\alpha_G^{k}$ and the network weight parameters $(\omega_G^{k+1}, \omega_D^{k+1})$; estimate the weight parameters $(\bar\omega_G^{k+1}, \bar\omega_D^{k+1})$ of $(\bar G, \bar D)$ via Algorithm 2;
10:  for $s = 1, 2, \dots, S$ do
11:    Sample real data $\{x^{(l)}\}_{l=1}^{m} \sim \mathbb{P}_r$ from the validation set and latent variables $\{z^{(l)}\}_{l=1}^{m} \sim \mathbb{P}_z$; estimate the gradient of the duality gap $V$ with $\{x^{(l)}, z^{(l)}\}$ at $\alpha_G^{k,s}$, dubbed $\nabla V(\alpha_G^{k,s})$;
12:    $\alpha_G^{k,s+1} = \mathrm{Adam}\big(\nabla_{\alpha_G} V(\alpha_G^{k,s}), \alpha_G^{k,s}, \eta, \beta_1, \beta_2\big)$;
13:  end for
14:  Set $\alpha_G^{k+1} = \alpha_G^{k,S}$;
15: end for
16: Return $\alpha_G = \alpha_G^{K}$.

Algorithm 2 Solving $\bar G$ and $\bar D$

Parameters: Receive the architecture parameters $\alpha_G$ and weight parameters $(\omega_G, \omega_D)$. Initialize weight parameters $(\bar\omega_G^{1}, \bar\omega_D^{1}) = (\omega_G, \omega_D)$ for $(\bar G, \bar D)$. Initialize base learning rate $\eta$, momentum parameter $\beta_1$, and EMA parameter $\beta_2$ for the Adam optimizer.
1: for $r = 1, 2, \dots, R$ do
2:   Sample real data $\{x^{(l)}\}_{l=1}^{m} \sim \mathbb{P}_r$ from the validation set and noise $\{z^{(l)}\}_{l=1}^{m} \sim \mathbb{P}_z$; estimate the gradient of the Adv loss with $\{x^{(l)}, z^{(l)}\}$ at $(\omega_G, \bar\omega_D^{r})$, dubbed $\nabla \mathrm{Adv}(\omega_G, \bar\omega_D^{r})$;
3:   $\bar\omega_D^{r+1} = \mathrm{Adam}\big(\nabla_{\bar\omega_D} \mathrm{Adv}(\omega_G, \bar\omega_D^{r}), \bar\omega_D^{r}, \eta, \beta_1, \beta_2\big)$;
4: end for
5: for $r = 1, 2, \dots, R$ do
6:   Sample noise $\{z^{(l)}\}_{l=1}^{m} \sim \mathbb{P}_z$; estimate the gradient of the Adv loss with $\{z^{(l)}\}$ at $(\bar\omega_G^{r}, \omega_D)$, dubbed $\nabla \mathrm{Adv}(\bar\omega_G^{r}, \omega_D)$;
7:   $\bar\omega_G^{r+1} = \mathrm{Adam}\big(\nabla_{\bar\omega_G} \mathrm{Adv}(\bar\omega_G^{r}, \omega_D), \bar\omega_G^{r}, \eta, \beta_1, \beta_2\big)$;
8: end for
9: Return $(\bar\omega_G, \bar\omega_D) = (\bar\omega_G^{R}, \bar\omega_D^{R})$.

Algorithm 1 is composed of three parts. The first part (lines 3–8), called the 'weight_part', optimizes the weight parameters $\omega$ on the training dataset via the Adam optimizer [37]. The second part (line 9), called the 'test-weight_part', optimizes the weight parameters $(\bar\omega_G, \bar\omega_D)$, and the third part (lines 10–13), called the 'arch_part', optimizes the architecture parameters $\alpha_G$ by minimizing the duality gap. Both the 'test-weight_part' and the 'arch_part' are optimized over the validation dataset via the Adam optimizer. Algorithm 2 details the process of computing $\bar G$ and $\bar D$ by updating the weight parameters $(\bar\omega_G, \bar\omega_D)$, given the currently searched generator architecture parameters $\alpha_G$ and the related network weight parameters $(\omega_G, \omega_D)$. In summary, the variables $\big((\omega_G, \omega_D), (\bar\omega_G, \bar\omega_D), \alpha_G\big)$ are optimized in an alternating fashion.
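The overall structure of Algorithm 1 can be summarized by the sketch below. Everything passed in is an assumed helper or object (`train_batches`, `val_batches`, `sample_noise`, `adv_loss`, `approximate_best_responses`, which plays the role of Algorithm 2, and the three optimizers); the hinge losses are those sketched in Section 2.

```python
def search_alphagan(G, D, opt_wG, opt_wD, opt_alpha,
                    train_batches, val_batches, sample_noise,
                    adv_loss, approximate_best_responses,
                    K=100, T=20, S=20):
    """Sketch of Algorithm 1; all arguments are assumed helpers/objects."""
    for k in range(K):
        # weight_part: T adversarial steps on the training split (inner level).
        for real in train_batches(T):
            z = sample_noise(real.size(0))
            opt_wD.zero_grad()
            d_hinge_loss(D(real), D(G(z).detach())).backward()
            opt_wD.step()

            opt_wG.zero_grad()
            g_hinge_loss(D(G(sample_noise(real.size(0))))).backward()
            opt_wG.step()

        # test-weight_part: best responses (G_bar, D_bar) on the validation split.
        G_bar, D_bar = approximate_best_responses(G, D, val_batches)

        # arch_part: S steps on alpha_G, minimizing the duality gap (outer level).
        for real in val_batches(S):
            z = sample_noise(real.size(0))
            gap = adv_loss(G, D_bar, real, z) - adv_loss(G_bar, D, real, z)
            opt_alpha.zero_grad()
            gap.backward()
            opt_alpha.step()
    return G
```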

C Experiment Details

C.1 Searching on CIFAR-10

The CIFAR-10 dataset is comprised of 50000 training images with a resolution of 32x32. We randomly split the dataset into two sets during searching: one is used as the training set for optimizing the network parameters $\omega_G$ and $\omega_D$ (25000 images), and the other is used as the validation set for optimizing the architecture parameters $\alpha_G$ (25000 images). The number of search iterations for alphaGAN(l) and alphaGAN(s) is set to 100. The channel number is 256 for the generator and 128 for the discriminator, and the dimension of the noise vector is 128. For a fair comparison, the discriminator adopted in searching is the same as the discriminator in AutoGAN [22]. The batch sizes of both the generator and the discriminator are set to 64. The learning rates of the weight parameters $\omega_G$ and $\omega_D$ are 2e-4, and the learning rate of the architecture parameters $\alpha_G$ is 3e-4. We use Adam as the optimizer. The hyperparameters for optimizing the weight parameters $\omega_G$ and $\omega_D$ are set to 0.0 for $\beta_1$, 0.999 for $\beta_2$, and 0 for the weight decay. The hyperparameters for optimizing the architecture parameters $\alpha_G$ are set to 0.5 for $\beta_1$, 0.999 for $\beta_2$, and 1e-3 for the weight decay.
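In code, the two Adam configurations above would look roughly as follows; `weight_parameters()` and `arch_parameters()` are assumed accessors separating $\omega_G$ from $\alpha_G$, not part of a released API.

```python
import torch

opt_wG = torch.optim.Adam(G.weight_parameters(), lr=2e-4,
                          betas=(0.0, 0.999), weight_decay=0)
opt_wD = torch.optim.Adam(D.parameters(), lr=2e-4,
                          betas=(0.0, 0.999), weight_decay=0)
opt_alpha = torch.optim.Adam(G.arch_parameters(), lr=3e-4,
                             betas=(0.5, 0.999), weight_decay=1e-3)
```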

We use the entire training set of CIFAR-10 for retraining the network parameters after obtaining the architectures. The minibatch size is 128 for the generator and 64 for the discriminator. The channel number is set to 256 for the generator and 128 for the discriminator, and the dimension of the noise vector is 128. The discriminator exploited in the re-training phase is identical to that used during searching. The generator is trained for 50000 iterations. The learning rates of the generator and the discriminator are set to 2e-4. The hyperparameters of the Adam optimizer are set to 0.0 for $\beta_1$, 0.9 for $\beta_2$, and 0 for the weight decay.

When testing, 50000 images are generated with random noise, and IS [18] and FID [19] are used to evaluate the performance of the generators.
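A minimal sketch of the sampling step is given below; the IS/FID computation itself is assumed to come from standard external implementations and is not shown.

```python
import torch

@torch.no_grad()
def sample_for_evaluation(G, n_images=50000, batch_size=100,
                          z_dim=128, device='cuda'):
    """Generate images from random noise for computing IS and FID."""
    G.eval()
    batches = []
    for _ in range(n_images // batch_size):
        z = torch.randn(batch_size, z_dim, device=device)
        batches.append(G(z).cpu())
    return torch.cat(batches, dim=0)
```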

C.2 Transferability

The STL-10 dataset is comprised of ∼105k training images. We resize the images to 48x48 due to considerations of memory and computational overhead. The dimension of the noise vector is 128. We train the generator for 80000 iterations. The batch sizes for optimizing the generator and the discriminator are set to 128 and 64, respectively. The channel numbers of the generator and the discriminator are set to 256 and 128, respectively. The learning rates for the generator and the discriminator are both set to 2e-4. We also use Adam as the optimizer, where $\beta_1$ is set to 0.5, $\beta_2$ is set to 0.9, and the weight decay is set to 0.

D The structures of the generator and the discriminator

The entire structures of the generator and the discriminator are illustrated in Fig. 1.

The topology of the cells in the generator and the discriminator is illustrated in Fig. 2. In the cell of the generator, the edges from node 0 to node 1 and from node 0 to node 3 correspond to up-sampling operations, and the remaining edges are normal operations. In the cell of the discriminator, the edges from node 2 to node 4 and from node 3 to node 4 are avg_pool_2x2 operations with stride 2, the edges from node 0 to node 1 and from node 1 to node 2 are conv_3x3 operations with stride 1, and the edge from node 0 to node 3 is a conv_1x1 operation with stride 1.

The structures of alphaGAN(l) and alphaGAN(s) are shown in Fig. 4 and Fig. 3, respectively.

E Relation between performance and structure

The distributions of operations in 'superior' and 'inferior' generators are shown in Fig. 5 and Fig. 6, respectively. We make the following observations. First, for up-sampling operations, superior architectures tend to exploit 'nearest' or 'bilinear' interpolation rather than 'deconvolution' operations. Second, 'conv_1x1' operations dominate in cell_1 of superior generators, suggesting that convolutions with large kernel sizes may not be optimal when the spatial dimensions of the feature maps are relatively small (i.e., 8x8). Finally, convolutions with large kernels (e.g., conv_5x5, sep_conv_3x3, and sep_conv_5x5) are preferred at higher resolutions (i.e., cell_3 of superior generators), indicating the benefit of integrating information from relatively large receptive fields for low-level representations at high resolutions.

Figure 1: The topology of the generator and the discriminator. (a) G: input → FC → cell_1 → cell_2 → cell_3 → BN+ReLU+Conv+tanh → fake_sample. (b) D: input → cell_1 → cell_2 → cell_3 → cell_4 → GAP → FC → output.

Figure 2: The topology of the cells in the generator (a) and the discriminator (b), identical to those of AutoGAN [22] and SN-GAN [9].

Figure 3: The structure of alphaGAN(s).

F Generated Samples

Generated samples of alphaGAN(s) on STL-10 are shown in Fig. 7.

G Additional Results

In this section, we present additional experimental results and analysis (omitted from the main paper due to the page limit), including model scaling, intermediate architectures during searching, the Gumbel-max trick, warm-up, an ablation study on the step sizes of the 'arch_part', the effect of channel numbers during searching, searching on STL-10, and an analysis of failure cases. The 'baseline' in Table 4 denotes the structure searched under the default settings of alphaGAN.

G.1 Robustness on Model Scaling

Figure 4: The structure of alphaGAN(l).

It would be interesting to know how the architectures perform when scaling model complexity up or down. To this end, we introduce a ratio that simply re-scales the channel dimensions of the network configuration for the fully-training step. The relation between performance and parameter size is illustrated in Fig. 8. The range in which promising performance is attained is relatively narrow for alphaGAN(s), mainly caused by the light-weight property induced by the dominant depthwise separable convolutions. Light-weight architectures naturally result in highly sparse connections between network neurons, which may be sensitive to the configuration difference between searching and re-training. In contrast, alphaGAN(l) shows acceptable performance over a wide range of parameter sizes (from 2M to 18M). Nevertheless, both present some degree of robustness to scaling of the original searching configuration.
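For concreteness, the rescaling used in this experiment can be thought of as multiplying a base channel configuration by a single ratio; the helper below is a hypothetical illustration, and the rounding rule is our assumption.

```python
def scale_channels(base_channels, ratio):
    # Rescale every stage's channel count by `ratio`, keeping at least 1.
    return [max(1, int(round(c * ratio))) for c in base_channels]

# e.g. scale_channels([64, 128, 160, 256, 384], 0.5) -> [32, 64, 80, 128, 192]
```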

G.2 Intermediate Architectures in Searching

To understand the search process of alphaGAN, we track the intermediate structures of alphaGAN(s) and alphaGAN(l) during searching and fully train them on CIFAR-10 (Fig. 9). We observe a clear trend that the architectures are learned towards high performance during searching, though slight oscillation may happen. Specifically, alphaGAN(l) realizes a gradual improvement in performance during the process, while alphaGAN(s) displays faster convergence in the early stage of the process and achieves comparable results, indicating that solving the inner-level optimization problem by virtue of rough approximations (as using more steps can always achieve a closer approximation of the optimum) can significantly benefit the efficiency of solving the bi-level problem without sacrificing accuracy.

Figure 9: Tracking architectures during searching: (a) IS and (b) FID over search iterations. alphaGAN(s) is denoted by the blue curve with plus markers and alphaGAN(l) by the red curve with triangle markers.

Figure 5: The distributions of normal operations (per cell and edge) in superior and inferior generators.

Figure 6: The distributions of up-sampling operations (per cell and edge) in superior and inferior generators.

Figure 7: Generated samples of alphaGAN(s) on STL-10.

G.3 Gumbel-max Trick

Figure 8: Relation between model capacity and performance (IS and FID versus parameter size, compared with AutoGAN). To align the model capacities with AutoGAN, the channels for G and D in alphaGAN(l) are [64, 96, 128, 192, 256, 384], the channels for G and D in alphaGAN(s) are [64, 128, 160, 256, 384], and the channels for G and D in AutoGAN are [64, 128, 192, 256, 384, 512].

Table 4: Gumbel-max trick and Warm-up.

| Type | Name | Gumbel-max? | Fix alphas? | IS | FID |
|---|---|---|---|---|---|
| alphaGAN(l) | baseline | × | × | 8.51 ± 0.06 | 11.38 |
| alphaGAN(l) | Gumbel-max | ✓ | × | 8.48 ± 0.10 | 20.69 |
| alphaGAN(l) | Warm-up | × | ✓ | 8.34 ± 0.07 | 15.49 |
| alphaGAN(s) | baseline | × | × | 8.72 ± 0.11 | 12.86 |
| alphaGAN(s) | Gumbel-max | ✓ | × | 8.56 ± 0.06 | 15.66 |
| alphaGAN(s) | Warm-up | × | ✓ | 8.25 ± 0.12 | 19.07 |

The Gumbel-max trick [38] can be written as
$$\beta^{o'} = \frac{\exp\big((\alpha^{o'} + g^{o'})/\tau\big)}{\sum_{o \in \mathcal{O}_n} \exp\big((\alpha^{o} + g^{o})/\tau\big)}, \qquad (7)$$
where $\beta^{o'}$ is the probability of selecting operation $o'$ after Gumbel-max, $\alpha^{o'}$ represents the architecture parameter of operation $o'$, $\mathcal{O}_n$ is the operation search space, $g$ denotes samples drawn from the Gumbel(0, 1) distribution, and $\tau$ is the temperature controlling the sharpness of the distribution. Instead of continuous relaxation, the trick chooses a single operation on each edge, enabling discretization during searching. We compare the results of searching with and without the Gumbel-max trick. The results in Table 4 show that searching with Gumbel-max may not be an essential factor for obtaining high-performance generator architectures.
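A sketch of Eq. (7) with the common straight-through variant, so that a single operation is selected on the forward pass while gradients still flow through the soft weights; this mirrors standard implementations rather than the authors' exact code.

```python
import torch
import torch.nn.functional as F

def gumbel_softmax_weights(alpha, tau=1.0, hard=True):
    """Sample Gumbel noise and produce (optionally hard) selection weights."""
    u = torch.rand_like(alpha).clamp_min(1e-20)
    g = -torch.log(-torch.log(u))                         # Gumbel(0, 1) samples
    beta = F.softmax((alpha + g) / tau, dim=-1)           # Eq. (7)
    if hard:
        index = beta.argmax(dim=-1, keepdim=True)
        one_hot = torch.zeros_like(beta).scatter_(-1, index, 1.0)
        beta = one_hot + beta - beta.detach()             # straight-through
    return beta
```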

G.4 Warm-up protocols

The generator contains two groups of parameters, $(\omega_G, \alpha_G)$. The optimization of $\alpha_G$ is highly related to the network parameters $\omega_G$. Intuitively, pretraining the network parameters $\omega_G$ can benefit the search of architectures, since a better initialization may facilitate convergence. To investigate this effect, we fix $\alpha_G$ and only update $\omega_G$ during the initial half of the searching schedule, and then $\alpha_G$ and $\omega_G$ are optimized alternately. This strategy is denoted as 'Warm-up' in Table 4.

The results show that the strategy may not help performance: the IS of 'Warm-up' is slightly worse than that of the baseline, and its FID is worse than that of the baseline. However, it can benefit the searching efficiency: it takes ∼15 GPU-hours for alphaGAN(l) (compared to ∼22 GPU-hours for the baseline) and ∼1 GPU-hour for alphaGAN(s) (compared to ∼3 GPU-hours for the baseline).

G.5 Effect of Step Sizes

We analyze the effect of different step sizes for the 'arch_part', corresponding to the optimization of the architecture parameters $\alpha_G$ in Algorithm 1 (lines 10–13). Since alphaGAN(l) uses larger step sizes for the 'weight_part' and 'test-weight_part' than alphaGAN(s), the step size of the 'arch_part' can be adjusted over a wider range; we therefore conduct the experiments on alphaGAN(l), with results shown in Fig. 10. We observe that the method is fairly robust to different step sizes in terms of the IS metric, while performance in terms of the FID metric may be hampered by a less proper step size.

G.6 Effect of Channels in Searching

Under the default settings of alphaGAN, we search and re-train the networks with the same predefined channel dimensions (i.e., G_channels=256 and D_channels=128). To explore the impact of the channel dimensions used during searching on the final performance of the searched architectures, we adjust the channel numbers of the generator and the discriminator during searching based on the searching configuration of alphaGAN(s). The results are shown in Table 5. We observe that our method can achieve acceptable performance under a wide range of channel numbers (i.e., 32 to 256). We also find that using consistent channel dimensions during the searching and re-training phases is beneficial to the final performance.

Figure 10: The effect of different step sizes of the 'arch_part' on (a) IS and (b) FID.

Table 5: The channels in searching on alphaGAN(s).

| Search channels | Re-train channels | Params (M) | FLOPs (G) | IS | FID |
|---|---|---|---|---|---|
| G_32 D_32 | G_32 D_32 | 0.109 | 0.02 | 7.10 ± 0.08 | 36.22 |
| G_32 D_32 | G_256 D_128 | 2.481 | 1.12 | 8.61 ± 0.12 | 14.98 |
| G_64 D_64 | G_64 D_64 | 0.403 | 0.212 | 7.97 ± 0.09 | 22.49 |
| G_64 D_64 | G_256 D_128 | 4.658 | 3.26 | 8.70 ± 0.17 | 14.02 |
| G_128 D_128 | G_128 D_128 | 1.967 | 0.91 | 8.26 ± 0.08 | 16.50 |
| G_128 D_128 | G_256 D_128 | 7.309 | 3.64 | 8.75 ± 0.09 | 13.02 |
| G_256 D_128 | G_256 D_128 | 2.953 | 1.32 | 8.72 ± 0.11 | 12.86 |
| G_256 D_128 | G_128 D_128 | 0.887 | 0.34 | 8.36 ± 0.08 | 17.12 |
| G_256 D_128 | G_64 D_64 | 0.296 | 0.09 | 7.73 ± 0.08 | 24.81 |
| G_256 D_128 | G_32 D_32 | 0.111 | 0.025 | 6.85 ± 0.1 | 35.6 |

When reducing channels during searching, we observe an increasing tendency towards depthwise convolutions with large kernels (e.g., 7x7), indicating that the operation selection induced by the automated mechanism adapts to the need of preserving the entire information flow (i.e., increasing information extraction along the spatial dimensions to compensate for the channel limits).

G.7 Searching on STL-10

We also search alphaGAN(s) on STL-10. The channel dimensions of the generator and the discriminator are set to 64 (due to GPU memory limits), and we use 48x48 as the resolution of the images. The rest of the experimental settings are the same as those for searching on CIFAR-10. The settings remain the same as in Section C.2 when retraining the networks.

The results of three runs are shown in Table 6. Our method achieves high performance on both STL-10 and CIFAR-10, demonstrating that the effectiveness and transferability of alphaGAN are not confined to a particular dataset. alphaGAN(s) remains efficient and can obtain a structure reaching the state of the art on STL-10 with only 2 GPU-hours. We also find that no failure case occurs in the three repeated experiments of alphaGAN(s), in contrast to CIFAR-10, which may be related to multiple latent factors that datasets intrinsically possess (e.g., resolution, categories); we leave this as future work.

Table 6: Search on STL-10. We search alphaGAN(s) on STL-10 and re-train the searched structures on STL-10 and CIFAR-10. In our repeated experiments, failure cases are prevented.

| Name | Search time (GPU-hours) | Dataset of re-training | Params (M) | FLOPs (G) | IS | FID |
|---|---|---|---|---|---|---|
| Repeat_1 | ∼2 | STL-10 | 4.552 | 5.55 | 9.22 ± 0.08 | 25.42 |
| Repeat_2 | ∼2 | STL-10 | 2.475 | 2.01 | 9.66 ± 0.10 | 29.28 |
| Repeat_3 | ∼2 | STL-10 | 4.013 | 3.67 | 9.47 ± 0.10 | 26.61 |
| Repeat_1 | ∼2 | CIFAR-10 | 3.891 | 2.47 | 8.29 ± 0.17 | 13.94 |
| Repeat_2 | ∼2 | CIFAR-10 | 1.815 | 0.90 | 8.20 ± 0.13 | 16.54 |
| Repeat_3 | ∼2 | CIFAR-10 | 3.352 | 1.63 | 8.62 ± 0.11 | 12.64 |

Figure 11: Distributions of (a) normal operations and (b) up-sampling operations in normal cases and failure cases of alphaGAN.

Table 7: Repeated search on CIFAR-10.

| Name | Description | Params (M) | FLOPs (G) | IS | FID |
|---|---|---|---|---|---|
| alphaGAN(s) | normal case | 4.475 | 2.36 | 8.44 ± 0.13 | 13.62 |
| alphaGAN(s) | normal case | 2.953 | 1.32 | 8.72 ± 0.11 | 12.86 |
| alphaGAN(s) | failure case | 2.994 | 1.08 | 6.77 ± 0.07 | 45.88 |
| alphaGAN(l) | normal case | 8.207 | 2.41 | 8.55 ± 0.08 | 15.42 |
| alphaGAN(l) | normal case | 8.618 | 2.78 | 8.51 ± 0.06 | 11.38 |
| alphaGAN(l) | failure case | 4.666 | 2.36 | 7.48 ± 0.1 | 52.58 |

G.8 Failure cases

As we pointed out in the main paper, the searching of alphaGAN can encounter failure cases, analogous to other NAS methods [39]. To better understand the method, we present a comparison between normal cases and failure cases in Table 7 and the distributions of operations in Fig. 11. We find that deconvolution operations dominate in these failure cases. To validate this, we conduct experiments on a variant that removes deconvolution operations from the search space, under the configuration of alphaGAN(s). The results (with 6 runs) in Table 8 show that failure cases are prevented in this scenario.

Table 8: Search w/o deconv on alphaGAN(s).

| Name | Params (M) | FLOPs (G) | IS | FID |
|---|---|---|---|---|
| Repeat_1 | 4.594 | 2.20 | 8.29 ± 0.08 | 15.12 |
| Repeat_2 | 2.035 | 0.51 | 8.34 ± 0.10 | 14.92 |
| Repeat_3 | 1.586 | 0.55 | 8.24 ± 0.09 | 18.07 |
| Repeat_4 | 1.631 | 0.58 | 8.32 ± 0.09 | 15.85 |
| Repeat_5 | 1.631 | 0.60 | 8.43 ± 0.08 | 17.15 |
| Repeat_6 | 2.064 | 1.03 | 8.26 ± 0.11 | 16.00 |

We also test another setting in which the conv_1x1 operation is integrated with the interpolation operations (i.e., nearest and bilinear) to make them learnable, like deconvolution; we denote this as 'learnable interpolation'. The results (with 6 runs) under the configuration of alphaGAN(s) are shown in Table 9, suggesting that the failure cases can also be alleviated by this strategy.

Table 9: The effect of ’learnable interpolation’ on alphaGAN(s).

| Method | Name | Params (M) | FLOPs (G) | IS | FID |
|---|---|---|---|---|---|
| Learnable Interpolation | Repeat_1 | 2.775 | 0.99 | 8.43 ± 0.15 | 14.8 |
| Learnable Interpolation | Repeat_2 | 2.243 | 0.545 | 8.49 ± 0.12 | 18.82 |
| Learnable Interpolation | Repeat_3 | 3.500 | 0.99 | 8.35 ± 0.1 | 18.93 |
| Learnable Interpolation | Repeat_4 | 3.195 | 1.53 | 8.59 ± 0.1 | 13.22 |
| Learnable Interpolation | Repeat_5 | 2.968 | 0.82 | 8.22 ± 0.11 | 14.76 |
| Learnable Interpolation | Repeat_6 | 2.712 | 0.77 | 8.41 ± 0.11 | 13.47 |
