Unbiased Auxiliary Classifier GANs with MINE

Ligong Han, Anastasis Stathopoulos, Tao Xue, Dimitris Metaxas
Rutgers University

arXiv:2006.07567v1 [cs.CV] 13 Jun 2020

Abstract

Auxiliary Classifier GANs (AC-GANs) [15] are widely used conditional generative models and are capable of generating high-quality images. Previous work [18] has pointed out that AC-GAN learns a biased distribution. To remedy this, Twin Auxiliary Classifier GAN (TAC-GAN) [5] introduces a twin classifier to the min-max game. However, it has been reported that using a twin auxiliary classifier may cause instability in training. To this end, we propose an Unbiased Auxiliary Classifier GAN (UAC-GAN) that utilizes the Mutual Information Neural Estimator (MINE) [2] to estimate the mutual information between the generated data distribution and labels. To further improve the performance, we also propose a novel projection-based statistics network architecture for MINE.* Experimental results on three datasets, Mixture of Gaussians (MoG), MNIST [12], and CIFAR10 [11], show that our UAC-GAN performs better than AC-GAN and TAC-GAN. Code can be found on the project website.†

* This is an extended version of a CVPRW'20 workshop paper with the same title. In the current version the projection form of MINE is detailed.
† https://github.com/phymhan/ACGAN-PyTorch

1. Introduction

Generative Adversarial Networks (GANs) [6] are generative models that can be used to sample from high-dimensional non-parametric distributions, such as natural images or videos. Conditional GANs [13] are an extension of GANs that utilizes label information to enable sampling from the class-conditional data distribution. Class-conditional sampling can be achieved either by (1) conditioning the discriminator directly on labels [13, 9, 14], or by (2) incorporating an additional classification loss in the training objective [15]. The latter approach originates in the Auxiliary Classifier GAN (AC-GAN) [15].

Despite its simplicity and popularity, AC-GAN is reported to produce less diverse data samples [18, 14]. This phenomenon is formally discussed in Twin Auxiliary Classifier GAN (TAC-GAN) [5]. The authors of TAC-GAN reveal that, due to a missing negative conditional entropy term in the objective of AC-GAN, it does not exactly minimize the divergence between the real and fake conditional distributions. TAC-GAN proposes to estimate this missing term by introducing an additional classifier into the min-max game. However, it has also been reported that using such a twin auxiliary classifier might result in unstable training [10].

In this paper, we propose to incorporate the negative conditional entropy in the min-max game by directly estimating the mutual information between generated data and labels. The resulting method enjoys the same theoretical guarantees as TAC-GAN and avoids the instability caused by using a twin auxiliary classifier. We term the proposed method UAC-GAN because (1) it learns an Unbiased distribution, and (2) MINE [2] relates to Unnormalized bounds [16]. Finally, our method demonstrates superior performance compared to AC-GAN and TAC-GAN on 1-D Mixture of Gaussians synthetic data, MNIST [12], and the CIFAR10 [11] dataset.

2. Related Work

Learning unbiased AC-GANs. In CausalGAN [10], the authors incorporate a binary Anti-Labeler in AC-GAN and theoretically show its necessity for the generator to learn the true class-conditional data distributions. The Anti-Labeler is similar to the twin auxiliary classifier in TAC-GAN, but it is used only for binary classification. Shu et al. [18] formulate the AC-GAN objective as a Lagrangian of a constrained optimization problem and show that AC-GAN tends to push the data points away from the decision boundary of the auxiliary classifiers. TAC-GAN [5] builds on the insights of [18] and shows that the bias in AC-GAN is caused by a missing negative conditional entropy term. In addition, [5] proposes to make AC-GAN unbiased by introducing a twin auxiliary classifier that competes in an adversarial game with the generator. TAC-GAN can be considered a generalization of CausalGAN's Anti-Labeler to the multi-class setting.

Mutual information estimation. Learning a twin auxiliary classifier is essentially estimating the mutual information between generated data and labels. We refer readers to [16] for a comprehensive review of variational mutual information estimators. In this paper, we employ the Mutual Information Neural Estimator (MINE) [2].

3. Background

3.1. Bias in Auxiliary Classifier GANs

First, we review the AC-GAN [15] and the analysis in [5, 18] to show why AC-GAN learns a biased distribution. The AC-GAN introduces an auxiliary classifier C and optimizes the following objective:

    \min_{G,C} \max_{D} \mathcal{L}_{AC}(G, C, D) =
        \underbrace{\mathbb{E}_{x \sim P_X}[\log D(x)] + \mathbb{E}_{z \sim P_Z, y \sim P_Y}[\log(1 - D(G(z, y)))]}_{(a)}
        \underbrace{-\, \mathbb{E}_{x, y \sim P_{XY}}[\log C(x, y)]}_{(b)}
        \underbrace{-\, \mathbb{E}_{z \sim P_Z, y \sim P_Y}[\log C(G(z, y), y)]}_{(c)},    (1)

where (a) is the value function of a vanilla GAN, and (b) and (c) correspond to the cross-entropy classification errors on real and fake data samples, respectively.
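To make the structure of objective (1) concrete, below is a minimal PyTorch-style sketch of how the three terms are typically computed in a training step (written in the common non-saturating form rather than the exact min-max of (1)). The two-headed discriminator returning an adversarial logit and class logits, and all function and variable names, are illustrative assumptions rather than the paper's reference implementation.

```python
import torch
import torch.nn.functional as F

def ac_gan_losses(D, G, x_real, y_real, z, y_fake):
    """Sketch of the AC-GAN terms (a), (b), (c) in Eq. (1).

    Assumes D(x) returns (adv_logit, cls_logits): a real/fake score and
    K-way class logits from the auxiliary classifier head C.
    G(z, y) returns a fake sample conditioned on label y.
    """
    x_fake = G(z, y_fake)

    adv_real, cls_real = D(x_real)
    adv_fake, cls_fake = D(x_fake.detach())

    # (a) vanilla GAN value function (discriminator side)
    loss_adv = (F.binary_cross_entropy_with_logits(adv_real, torch.ones_like(adv_real))
                + F.binary_cross_entropy_with_logits(adv_fake, torch.zeros_like(adv_fake)))

    # (b) cross-entropy of the auxiliary classifier on real data
    loss_cls_real = F.cross_entropy(cls_real, y_real)

    # (c) cross-entropy of the auxiliary classifier on fake data
    loss_cls_fake = F.cross_entropy(cls_fake, y_fake)

    d_loss = loss_adv + loss_cls_real + loss_cls_fake

    # Generator side: fool D and keep fake samples classifiable as their labels
    adv_fake_g, cls_fake_g = D(x_fake)
    g_loss = (F.binary_cross_entropy_with_logits(adv_fake_g, torch.ones_like(adv_fake_g))
              + F.cross_entropy(cls_fake_g, y_fake))

    return d_loss, g_loss
```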
Let Q^c_{Y|X} denote the conditional distribution induced by C. As pointed out in [5], adding the data-dependent negative conditional entropy -H_P(Y|X) to (b) yields the Kullback-Leibler (KL) divergence between P_{Y|X} and Q^c_{Y|X},

    -H_P(Y|X) + (b) = \mathbb{E}_{x \sim P_X}\big[D_{KL}(P_{Y|X} \,\|\, Q^c_{Y|X})\big].    (2)

Similarly, adding the term -H_Q(Y|X) to (c) yields the KL-divergence between Q_{Y|X} and Q^c_{Y|X},

    -H_Q(Y|X) + (c) = \mathbb{E}_{x \sim Q_X}\big[D_{KL}(Q_{Y|X} \,\|\, Q^c_{Y|X})\big].    (3)

As illustrated above, if we were to optimize (2) and (3), the generated data posterior Q_{Y|X} and the real data posterior P_{Y|X} would be effectively chained together by the two KL-divergence terms. However, H_Q(Y|X) cannot be considered a constant when updating G. Thus, to make the original AC-GAN unbiased, the term -H_Q(Y|X) has to be added to the objective function. Without this term, the generator tends to generate data points that are away from the decision boundary of C, and thus learns a biased (degenerate) distribution. Intuitively, minimizing -H_Q(Y|X) over G forces the generator to generate diverse samples with high (conditional) entropy.

3.2. Twin Auxiliary Classifier GANs

Twin Auxiliary Classifier GAN (TAC-GAN) [5] tries to estimate H_Q(Y|X) by introducing another auxiliary classifier C^{mi}. First, notice that the mutual information can be decomposed in two symmetric forms,

    I_Q(X; Y) = H(Y) - H_Q(Y|X) = H_Q(X) - H_Q(X|Y).

Herein, the subscript Q denotes the corresponding distribution Q induced by G. Since H(Y) is constant, optimizing -H_Q(Y|X) is equivalent to optimizing I_Q(X; Y). TAC-GAN shows that when Y is uniform, the latter form of I_Q can be written as the Jensen-Shannon divergence (JSD) between the conditionals {Q_{X|Y=1}, ..., Q_{X|Y=K}}. Finally, TAC-GAN introduces the following min-max game

    \min_{G} \max_{C^{mi}} V_{TAC}(G, C^{mi}) = \mathbb{E}_{z \sim P_Z, y \sim P_Y}[\log C^{mi}(G(z, y), y)],    (4)

to minimize the JSD between the multiple distributions. The overall objective is

    \min_{G,C} \max_{D, C^{mi}} \mathcal{L}_{TAC}(G, D, C, C^{mi}) = \mathcal{L}_{AC} + \underbrace{V_{TAC}}_{(d)}.    (5)
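The adversarial game in (4)-(5) can be sketched as follows: the twin classifier C^mi fits the labels of fake samples, while the generator receives the same cross-entropy with the opposite sign so that it minimizes V_TAC. The snippet below is a hedged illustration of this term only (it would be added on top of the AC-GAN losses); `C_mi`, `G`, and the variable names are assumptions for illustration, not the published implementation.

```python
import torch.nn.functional as F

def tac_gan_twin_losses(C_mi, G, z, y_fake):
    """Sketch of the twin-auxiliary-classifier term V_TAC in Eqs. (4)-(5).

    Assumes C_mi(x) returns K-way class logits of the twin classifier C^mi,
    and G(z, y) returns a fake sample conditioned on label y.
    """
    x_fake = G(z, y_fake)

    # C^mi maximizes E[log C^mi(G(z, y), y)], i.e. minimizes the
    # cross-entropy on fake samples (gradient blocked w.r.t. G).
    loss_c_mi = F.cross_entropy(C_mi(x_fake.detach()), y_fake)

    # G minimizes V_TAC, so the same cross-entropy enters the generator
    # update with the opposite sign (this is the term (d) in Eq. (5)).
    loss_g_mi = -F.cross_entropy(C_mi(x_fake), y_fake)

    return loss_c_mi, loss_g_mi
```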
3.3. Insights on Twin Auxiliary Classifier GANs

TAC-GAN from a variational perspective. Training the twin auxiliary classifier minimizes the label reconstruction error on fake data, as in InfoGAN [3]. Thus, when optimizing over G, TAC-GAN minimizes a lower bound of the mutual information. To see this,

    V_{TAC} = \mathbb{E}_{x, y \sim Q_{XY}}[\log C^{mi}(x, y)]
            = \mathbb{E}_{x \sim Q_X} \mathbb{E}_{y \sim Q_{Y|X}}\Big[\log \Big(\frac{Q^{mi}(y|x)}{Q(y|x)}\, Q(y|x)\Big)\Big]
            = \mathbb{E}_{x \sim Q_X} \mathbb{E}_{y \sim Q_{Y|X}}[\log Q(y|x)] - \mathbb{E}_{x \sim Q_X}\big[D_{KL}(Q_{Y|X} \,\|\, Q^{mi}_{Y|X})\big]
            \le -H_Q(Y|X).    (6)

The above shows that (d) is a lower bound of -H_Q(Y|X). The bound is tight when the classifier C^{mi} learns the true posterior Q_{Y|X} on fake data. However, minimizing a lower bound might be problematic in practice. Indeed, previous literature [10] has reported unstable training behavior when using an adversarial twin auxiliary classifier in AC-GAN.

TAC-GAN as a generalized CausalGAN. A binary version of the twin auxiliary classifier has been introduced as the Anti-Labeler in CausalGAN [10] to tackle the issue of label-conditioned mode collapse. As pointed out in [10], the use of the Anti-Labeler brings practical challenges for gradient-based training. Specifically, (1) in the early stage, the Anti-Labeler quickly minimizes its loss if the generator exhibits label-conditioned mode collapse, and (2) in the later stage, as the generator produces more and more realistic images, the Anti-Labeler behaves more and more like the Labeler (the other auxiliary classifier). Therefore, maximizing the Anti-Labeler loss and minimizing the Labeler loss become contradictory tasks, which ends up in unstable training. To account for this, CausalGAN adds an exponentially decaying weight before the Anti-Labeler loss term (or the term (d) in (5) when optimizing G).

4.1. Mutual Information Neural Estimator

The mutual information I_Q(X; Y) is equal to the KL-divergence between the joint Q_{XY} and the product of the marginals Q_X \otimes Q_Y (here we denote Q_Y = P_Y for a consistent and general notation),

    I_Q(X; Y) = D_{KL}(Q_{XY} \,\|\, Q_X \otimes Q_Y).    (7)

MINE is built on top of the bound of Donsker and Varadhan [4] for the KL-divergence between distributions P and Q: for any class \mathcal{F} of functions T for which the expectations below are finite,

    D_{KL}(P \,\|\, Q) \ge \sup_{T \in \mathcal{F}} \; \mathbb{E}_P[T] - \log \mathbb{E}_Q[e^{T}],

with equality when \mathcal{F} contains all such functions; in MINE, T is parameterized by a neural network (the statistics network).
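As a concrete illustration, below is a minimal MINE-style estimate of I_Q(X; Y) in Eq. (7) using the Donsker-Varadhan bound, with samples from the product of marginals obtained by shuffling labels within a batch. The plain MLP statistics network shown here is only a placeholder, not the projection-based architecture referred to in the abstract; all names are illustrative assumptions.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class StatisticsNetwork(nn.Module):
    """Placeholder statistics network T(x, y): a plain MLP on (x, one-hot(y))."""
    def __init__(self, x_dim, num_classes, hidden=128):
        super().__init__()
        self.num_classes = num_classes
        self.net = nn.Sequential(
            nn.Linear(x_dim + num_classes, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x, y):
        y_onehot = F.one_hot(y, self.num_classes).float()
        return self.net(torch.cat([x, y_onehot], dim=1))

def mine_estimate(T, x, y):
    """Donsker-Varadhan estimate of I_Q(X; Y):
    E_{Q_XY}[T(x, y)] - log E_{Q_X x Q_Y}[exp(T(x, y'))],
    where y' is a shuffled copy of y, approximating samples from Q_X x Q_Y.
    """
    n = x.size(0)
    t_joint = T(x, y).mean()
    y_shuffled = y[torch.randperm(n, device=y.device)]
    t_marginal = T(x, y_shuffled)
    # log-mean-exp computed via logsumexp for numerical stability
    log_mean_exp = torch.logsumexp(t_marginal, dim=0) - math.log(n)
    return t_joint - log_mean_exp.squeeze()
```

The statistics network would be trained to maximize this estimate, tightening the lower bound, while the resulting value can stand in for the missing -H_Q(Y|X) term, up to the constant H(Y), in the generator objective.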
