Topology Distance: A Topology-Based Approach For Evaluating Generative Adversarial Networks
Danijela Horak 1  Simiao Yu 1  Gholamreza Salimi-Khorshidi 1

1 Investments AI, AIG, London, United Kingdom. Correspondence to: Danijela Horak <[email protected]>.

arXiv:2002.12054v1 [cs.LG] 27 Feb 2020

Abstract

Automatic evaluation of the goodness of Generative Adversarial Networks (GANs) has been a challenge for the field of machine learning. In this work, we propose a distance complementary to existing measures: Topology Distance (TD), the main idea behind which is to compare the geometric and topological features of the latent manifold of real data with those of generated data. More specifically, we build a Vietoris-Rips complex on image features, and define TD based on the differences in persistent-homology groups of the two manifolds. We compare TD with the most commonly-used and relevant measures in the field, including Inception Score (IS), Fréchet Inception Distance (FID), Kernel Inception Distance (KID) and Geometry Score (GS), in a range of experiments on various datasets. We demonstrate the unique advantage and superiority of our proposed approach over the aforementioned metrics. A combination of our empirical results and the theoretical argument we propose in favour of TD strongly supports the claim that TD is a powerful candidate metric that researchers can employ when aiming to automatically evaluate the goodness of GANs' learning.

1. Introduction

Generative Adversarial Networks (GANs) (Goodfellow et al., 2014) are a class of deep generative models that have achieved unprecedented performance in generating high-quality and diverse images (Brock et al., 2019). They have also been successfully applied to a variety of image-generation tasks, e.g. super-resolution (Ledig et al., 2017), image-to-image translation (Zhu et al., 2017), and text-to-image synthesis (Reed et al., 2016), to name a few.

The GAN framework consists of a generator G and a discriminator D, where G generates images Xg that are expected to resemble real images Xr, while D discriminates between Xg and Xr. G and D are trained by playing a two-player minimax game in a competing manner. Such a novel adversarial training process is a key factor in GANs' success: it implicitly defines a learnable objective that is flexibly adaptive to various complicated image-generation tasks, in which it would be difficult or impossible to explicitly define such an objective.

One of the biggest challenges in the field of generative models – including for GANs – is the automatic evaluation of the goodness of such models (e.g., whether or not the data generated by such models are similar to the data they were trained on). Unlike supervised learning, where the goodness of the models can be assessed by comparing their predictions with the actual labels, or some other deep-learning models, where the goodness of the model can be assessed using the likelihood of the validation data under the distribution that the real data comes from, in most state-of-the-art generative models we do not know this distribution explicitly, or cannot rely on labels for such evaluations.

Given that the data (or their corresponding features) in such situations can be assumed to lie on a manifold embedded in a high-dimensional space (Goodfellow et al., 2016), tools from topology and geometry come as a natural choice when studying differences between two data sets. Hence, we propose Topology Distance (TD) for the evaluation of GANs; it compares the topological structures of two manifolds, and calculates a distance between them to evaluate their (dis)similarity. We compare TD with widely-used and relevant metrics, and demonstrate that it is more robust to noise than competing distance measures on GANs, and that it is better suited to distinguishing among the various shapes the data might come in. TD is able to evaluate GANs with new insights different from those of existing measurements. It can therefore be used either as an alternative to, or in conjunction with, other metrics.

1.1. Related work

There have been multiple metrics proposed to automatically evaluate the performance of GANs. In this paper we focus on the most commonly-used and relevant approaches (as follows); for a more comprehensive review of such measurements, please refer to (Borji, 2018).

Inception Score (IS)  The main idea behind IS (Salimans et al., 2016) is that generated images of high quality are expected to meet two requirements: they should contain easily classifiable objects (i.e., the conditional label distribution p(y|x) should have low entropy) and they should be diverse (i.e., the marginal distribution p(y) should have high entropy). IS measures the average KL divergence between these two distributions:

    IS = exp( E_{x∼p_g} [ KL( p(y|x) || p(y) ) ] ),    (1)

where p_g is the generative distribution. IS relies on a pretrained Inception model (Szegedy et al., 2016) for the classification of the generated images. Therefore, a key limitation of IS is that it is unable to evaluate image types that are distinct from those the Inception model was trained on.

Fréchet Inception Distance (FID) and Kernel Inception Distance (KID)  Proposed by (Heusel et al., 2017), FID relies on a pretrained Inception model, which maps each image to a vector representation (or, features). Given two groups of data in this vector space (one from the real and the other from the generated images), FID measures their similarity, assuming that the features are distributed as a multivariate Gaussian; the distance is the Fréchet distance (also known as the Wasserstein-2 distance) between the two Gaussians:

    FID(p_r, p_g) = ||µ_r − µ_g||_2^2 + Tr( Σ_r + Σ_g − 2 (Σ_r Σ_g)^{1/2} ),    (2)

where p_r and p_g denote the feature distributions of real and generated data, and (µ_r, Σ_r) and (µ_g, Σ_g) denote the means and covariances of the corresponding feature distributions, respectively. It has been shown that FID is more robust to noise (of certain types) than IS (Heusel et al., 2017), but its assumption of features following a multivariate Gaussian distribution might be an oversimplification.

A metric similar to FID is KID (Bińkowski et al., 2018), which computes the squared maximum mean discrepancy (MMD) between the features (also learned by a pretrained Inception model) of real and generated images:

    KID(p_r, p_g) = E_{x_r, x'_r ∼ p_r}[ k(x_r, x'_r) ] + E_{x_g, x'_g ∼ p_g}[ k(x_g, x'_g) ] − 2 E_{x_r ∼ p_r, x_g ∼ p_g}[ k(x_r, x_g) ],    (3)

where k denotes the polynomial kernel function k(x, x') = ( (1/d) x^T x' + 1 )^3 with feature dimension d. Compared with FID, KID makes no parametric assumption about the feature distribution, and has an unbiased estimator.

Our proposed TD is closely related to FID and KID in that it also measures the distance between latent features of real and generated data. The key difference is that the target distance is computed by considering the geometric and topological properties of those latent features.

Geometry Score (GS)  GS (Khrulkov & Oseledets, 2018) is defined as the l2 distance between the means of the relative living-times (RLT) vectors associated with the two sets of images. The RLT of a point cloud (e.g., a group of images in the feature space) is an infinite vector (u_1, u_2, ...) whose i-th entry measures the persistence intervals on which the rank of the 1-dimensional persistent-homology group equals i. That is,

    u_i = (1 / d_{n(n−1)/2}) Σ_{j=1}^{n(n−1)/2} I_j (d_{j+1} − d_j),

where I_j equals 1 if the rank of the persistent-homology group of dimension 1 on the interval [d_j, d_{j+1}] is i, and zero otherwise. The persistent-homology parameters d_i, i ∈ [0, ..., n(n−1)/2], are the sorted pairwise distances in the observed point cloud.

Geometry Score exploits an idea similar to Topology Distance, the differences being the underlying point cloud data used, the dimensionality of the homology group, and the distance measure between the persistence diagrams. We claim that our method better aligns with the existing theory in the area of computational algebraic topology and has superior experimental results.

2. Main idea

According to the manifold hypothesis (Goodfellow et al., 2016), real-world high-dimensional data (and their features) lie on a low-dimensional manifold embedded in a high-dimensional space. The main idea of this paper is to compare the latent manifold of the real data with that of the generated data, based entirely on the topological properties of the data samples from these two manifolds. Let M_r and M_g be the latent manifolds of the real and generated data, respectively. We aim to compare these two manifolds using the finite samples of points V_r from M_r and V_g from M_g.

Most mainstream methods compare the samples V_r and V_g using lower-order moments (e.g., (Heusel et al., 2017)) – similar to the way we compare two functions using their Taylor expansions, for instance. However, this would only be valid if the underlying manifold were a Euclidean space (of zero curvature), as all moments of the samples are calculated using the Euclidean distance. For a Riemannian manifold with nonzero curvature this type of approach, at least in theory, would not work, and using geodesic instead of Euclidean distance would agree more with the hypothesis.

Here we propose the comparison of the two manifolds on the basis of their topology and/or geometry. The ideal way to compare two manifolds would be to infer whether they are geometrically equivalent, i.e. isometric. This, unfortunately, is not attainable. However, we could compare two manifolds by means of the eigenvalues of the Laplace-Beltrami
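To make the metrics reviewed in Section 1.1 concrete, Eq. (1) can be sketched in a few lines of NumPy. This is a minimal illustration, not the reference implementation: it assumes the matrix of class probabilities p(y|x) has already been produced by a classifier (in practice, the pretrained Inception model), and the function name `inception_score` is ours.

```python
import numpy as np

def inception_score(p_yx, eps=1e-12):
    """Sketch of Eq. (1): exp of the mean KL divergence between the
    conditional label distribution p(y|x) and the marginal p(y).
    p_yx: array of shape (n_images, n_classes), rows are softmax outputs."""
    p_y = p_yx.mean(axis=0, keepdims=True)  # marginal distribution p(y)
    # KL(p(y|x) || p(y)) per image; eps guards the log against zeros
    kl = np.sum(p_yx * (np.log(p_yx + eps) - np.log(p_y + eps)), axis=1)
    return float(np.exp(kl.mean()))
```

For perfectly confident and perfectly diverse predictions over C classes (e.g., an identity-like p(y|x)), the score approaches C, its maximum; for a classifier that always outputs the uniform distribution, it approaches 1.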
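Eq. (2) can likewise be sketched on two feature arrays. As an assumption of this sketch, we avoid a general matrix square root by using the identity Tr((Σ_r Σ_g)^{1/2}) = Tr((Σ_g^{1/2} Σ_r Σ_g^{1/2})^{1/2}), valid for positive semi-definite covariances, so that only symmetric eigendecompositions are needed; the helper names are ours.

```python
import numpy as np

def _sqrtm_psd(m):
    """Symmetric square root of a PSD matrix via eigendecomposition."""
    w, v = np.linalg.eigh(m)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.T

def frechet_distance(feat_r, feat_g):
    """Sketch of Eq. (2) between Gaussians fitted to two feature
    arrays of shape (n_samples, dim)."""
    mu_r, mu_g = feat_r.mean(axis=0), feat_g.mean(axis=0)
    cov_r = np.cov(feat_r, rowvar=False)
    cov_g = np.cov(feat_g, rowvar=False)
    sg = _sqrtm_psd(cov_g)
    # Tr((Σr Σg)^{1/2}) computed via the PSD conjugation identity
    covmean_trace = np.trace(_sqrtm_psd(sg @ cov_r @ sg))
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r + cov_g) - 2.0 * covmean_trace)
```

Two identical feature sets give a distance of (numerically) zero, while shifting one set changes only the squared-mean term.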
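Eq. (3) with the polynomial kernel admits the standard unbiased squared-MMD estimator, in which the diagonal (self-similarity) kernel terms are excluded. Again a sketch under our own naming, operating directly on feature arrays rather than Inception activations:

```python
import numpy as np

def polynomial_kernel(x, y):
    """k(x, x') = ((1/d) x^T x' + 1)^3, with d the feature dimension."""
    d = x.shape[1]
    return (x @ y.T / d + 1.0) ** 3

def kid(feat_r, feat_g):
    """Sketch of Eq. (3): unbiased squared-MMD estimator between two
    feature arrays; diagonals of the within-set kernel matrices are
    dropped so the estimator is unbiased."""
    m, n = len(feat_r), len(feat_g)
    k_rr = polynomial_kernel(feat_r, feat_r)
    k_gg = polynomial_kernel(feat_g, feat_g)
    k_rg = polynomial_kernel(feat_r, feat_g)
    term_rr = (k_rr.sum() - np.trace(k_rr)) / (m * (m - 1))
    term_gg = (k_gg.sum() - np.trace(k_gg)) / (n * (n - 1))
    return float(term_rr + term_gg - 2.0 * k_rg.mean())
```

Because the estimator is unbiased, its value on two samples from the same distribution fluctuates around zero (and can be slightly negative), whereas distinct distributions yield clearly positive values.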
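Finally, the RLT vector underlying GS reduces to a weighted histogram once the rank of the 1-dimensional persistent-homology group on each filtration interval [d_j, d_{j+1}] is known. Computing those ranks requires a persistent-homology library (e.g., GUDHI or Ripser); in this sketch they are assumed precomputed and passed in, and the function name and truncation at `max_rank` are our own simplifications of the infinite vector.

```python
import numpy as np

def relative_living_times(dists, ranks, max_rank):
    """Sketch of the RLT vector: entry i is the total length of the
    filtration intervals [d_j, d_{j+1}] on which the rank of the
    1-dimensional persistent-homology group equals i, normalised by
    the largest pairwise distance d_{n(n-1)/2}.
    dists: sorted pairwise distances d_0 <= d_1 <= ... (filtration values)
    ranks: ranks[j] = rank of H_1 on [dists[j], dists[j+1]]"""
    lengths = np.diff(dists)                  # interval lengths d_{j+1} - d_j
    ranks = np.asarray(ranks)
    u = np.zeros(max_rank + 1)
    for i in range(max_rank + 1):
        u[i] = lengths[ranks == i].sum()      # I_j selects intervals with rank i
    return u / dists[-1]                      # normalise by d_max
```

GS would then be the squared l2 distance between the mean RLT vectors of the real and generated point clouds.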