
LoGAN: Generating Logos with a Generative Adversarial Neural Network Conditioned on Color

Ajkel Mino
Department of Data Science and Knowledge Engineering
Maastricht University
Maastricht, Netherlands
[email protected]

Gerasimos Spanakis
Department of Data Science and Knowledge Engineering
Maastricht University
Maastricht, Netherlands
[email protected]

Abstract—Designing a logo is a long, complicated, and expensive process for any designer. However, recent advancements in generative algorithms provide models that could offer a possible solution. Logos are multi-modal, have very few categorical properties, and do not have a continuous latent space. Yet, conditional generative adversarial networks can be used to generate logos that could help designers in their creative process. We propose LoGAN: an improved auxiliary classifier Wasserstein generative adversarial neural network (with gradient penalty) that is able to generate logos conditioned on twelve different colors. In 768 generated instances (12 classes and 64 logos per class), when looking at the most prominent color, the conditional generation part of the model has an overall precision and recall of 0.8 and 0.7 respectively. LoGAN's results offer a first glance at how artificial intelligence can be used to assist designers in their creative process and open promising future directions, such as including more descriptive labels which will provide a more exhaustive and easy-to-use system.

Index Terms—Conditional Generative Adversarial Neural Network, Logo generation

arXiv:1810.10395v1 [cs.CV] 23 Oct 2018

We gratefully acknowledge the support of Mediaan in this research, especially for providing the necessary computing resources.

I. INTRODUCTION

Designing a logo is a lengthy process that requires continuous collaboration between the designers and their clients. Each drafted logo takes time and effort to be designed, which turns this creative process into a tedious and expensive endeavor.

Recent advancements in generative models suggest a possible use of artificial intelligence as a solution to this problem. Specifically, Generative Adversarial Neural Networks (GANs) [1], which learn how to mimic any distribution of data. They consist of two neural networks, a generator and a discriminator, that are pitted against each other, trying to reach a Nash Equilibrium [2] in a minimax game.

Whilst GANs and other generative models have been used to generate almost anything, from MNIST digits [1] to anime faces [3], Mario levels [4] and fake celebrities [5], logo generation has yet to receive thorough exploration. One possible explanation is that logos do not contain a hierarchy of nested segments that networks can learn and try to reproduce. Furthermore, their latent space is not continuous, meaning that not every generated logo will necessarily be aesthetically pleasing. To the best of our knowledge, Sage et al. [6] are the only ones to have tackled this problem thus far. They propose a clustered approach for dealing with multi-modal data, specifically logos. Logos get assigned synthetic labels, defined by the cluster they belong to, and a GAN is trained, conditioned on these labels. However, synthetic labels do not provide enough flexibility. The need for more descriptive labels arises: labels that could better convey what is in the logos, and present designers with more detailed selection criteria.

This paper provides a first step toward more expressive and informative labels instead of synthetic ones, achieved by defining logos using their most prominent color. Twelve color classes are introduced: black, blue, brown, cyan, gray, green, orange, pink, purple, red, white, and yellow. We propose LoGAN: an improved Auxiliary Classifier Wasserstein Generative Adversarial Neural Network (with Gradient Penalty) (AC-WGAN-GP) that is able to generate logos conditioned on the aforementioned twelve colors.

The rest of this paper is structured as follows: in Section 2 the related work and background are discussed; in Section 3 the proposed architecture is presented; Section 4 explains the dataset and labeling process; Section 5 presents the experimental results, consisting of the training details and results; and finally we conclude and discuss some possible extensions of this work in Section 6.

II. BACKGROUND & RELATED WORK

First proposed in 2014 by Goodfellow et al. [1], GANs have seen a rise in popularity during the last couple of years, after possible solutions to training instability and mode collapse were introduced [7]–[9].

A. Generative Adversarial Networks

GANs (architecture depicted in Fig. 1(a)) consist of two different neural networks, a generator and a discriminator, that are trained simultaneously in a competitive manner.

The generator is fed a noise vector z from a probability distribution p_z, and outputs a generated data-point (fake image) G(z). The discriminator takes its input either from the generator (the fake image, D(G(z))) or from the training set (the real image, D(x)) and is trained to distinguish between the two. The discriminator and the generator play a two-player minimax game with value function (1), where the discriminator tries to maximize V, while the generator tries to minimize it.

min_G max_D V(D, G) = E_{x∼p_data}[log D(x)] + E_{z∼p_z}[log(1 − D(G(z)))]   (1)

Fig. 1. GAN, conditional GAN (CGAN) and auxiliary classifier GAN (ACGAN) architectures, where x denotes the real image, c the class label, z the noise vector, G the Generator, and D the Discriminator.

1) Objective functions: While GANs are able to generate high-quality images, they are notoriously difficult to train. They suffer from problems like training instability, non-convergence and mode collapse. Multiple improvements have been suggested to fix these problems [7]–[9], including using deep convolutional layers for the networks [10] and modified objective functions, e.g. least-squares [11] or the Wasserstein distance between the distributions [12]–[14].

B. Conditionality

Conditional generation with GANs entails using labeled data to generate images based on a certain class. The two types that will be discussed in the subsections below are Conditional GANs and Auxiliary Classifier GANs.

1) Conditional GANs: In a conditional GAN (CGAN) [15] (architecture depicted in Fig. 1(b)) the discriminator and the generator are conditioned on c, which could be a class label or some data from another modality. The input and c are combined in a joint hidden representation and fed as an additional input layer in both networks.

2) Auxiliary Classifier GANs: Contrary to CGANs, in Auxiliary Classifier GANs (AC-GAN) [16] (architecture depicted in Fig. 1(c)) the latent space z is conditioned on the class label. The discriminator is forced to identify fake and real images, as well as the class of the image, irrespective of whether it is fake or real.

C. GAN Applications

These different GAN architectures have been used for numerous purposes, including (but not limited to) higher-quality image generation [5], [17], image blending [18], image super-resolution [19] and object detection [20]. Up until the writing of this paper, logo generation has previously only been investigated by Sage et al. [6], who accomplish three main things:

• Define the Large Logo Dataset (LLD) [21]
• Synthetically label the logos by clustering them using the ResNet classifier network
• Build a GAN conditioned on these synthetic labels

However, as these labels are defined from computer-generated clusters, they do not necessarily provide intuitive classes for a human designer. More descriptive labels are needed: labels that could provide designers with more detailed selection criteria and better convey what is in the logos.

This paper offers a solution to the previous problem by:

1) Using twelve colors to define the logo classes
2) Defining LoGAN: an AC-WGAN-GP that can conditionally generate logos

III. PROPOSED ARCHITECTURE

The proposed architecture for LoGAN is an Auxiliary Classifier Wasserstein Generative Adversarial Neural Network with gradient penalty (AC-WGAN-GP), depicted in Fig. 2. LoGAN is based on the ACGAN architecture, with the main difference being that it consists of three neural networks, namely the discriminator, the generator and an additional classification network¹. The latter is responsible for assisting the discriminator in classifying the logos, as the original classification loss from AC-GAN was dominated by the Wasserstein distance used by the WGAN-GP.

¹ Code for this paper is available at https://github.com/ajki/LoGAN

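To make the role of the gradient penalty concrete, the sketch below evaluates the WGAN-GP discriminator and generator losses, equations (4) and (5), for a toy linear critic. The names `toy_critic` and `wgan_gp_losses` are illustrative, not the paper's code; a linear critic is used only because its input gradient is available in closed form, so the penalty can be computed without automatic differentiation.

```python
import math

def toy_critic(w, x):
    """Toy linear critic D(x) = w . x (a stand-in for the real CNN discriminator)."""
    return sum(wi * xi for wi, xi in zip(w, x))

def wgan_gp_losses(w, x_real, x_fake, lam=10.0):
    """WGAN-GP discriminator and generator losses, equations (4) and (5).

    For a linear critic, the gradient w.r.t. any input (including the
    interpolate alpha*x + (1 - alpha)*x_hat) is simply w, so the gradient
    penalty collapses to lam * (||w||_2 - 1)^2.
    """
    grad_norm = math.sqrt(sum(wi * wi for wi in w))
    penalty = lam * (grad_norm - 1.0) ** 2
    d_loss = -toy_critic(w, x_real) + toy_critic(w, x_fake) + penalty
    g_loss = -toy_critic(w, x_fake)
    return d_loss, g_loss

# A critic with ||w||_2 = 1 incurs no penalty; only the Wasserstein terms remain.
d_loss, g_loss = wgan_gp_losses([0.6, 0.8], x_real=[1.0, 0.0], x_fake=[0.0, 1.0])
print(round(d_loss, 6), round(g_loss, 6))  # 0.2 -0.8
```

The penalty pushes the critic's gradient norm toward 1, enforcing the 1-Lipschitz constraint that the Wasserstein formulation requires; a critic with, say, w = [3, 4] (gradient norm 5) would pay a penalty of 10 · (5 − 1)² = 160.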
Fig. 2. LoGAN architecture, where x denotes the real image, c the class label, z the noise vector, G the Generator, D the Discriminator and Q the Classifier.

In an original AC-GAN [16] the discriminator is forced to optimize the classification loss together with the real-fake loss. This is shown in equations (2) & (3), which describe the loss from defining the source of the image (training set or generator) and the loss from defining the class of the image respectively. The adversarial aspect of AC-GAN comes as the discriminator tries to maximize L_Class^ACGAN + L_Source^ACGAN, whilst the generator tries to maximize L_Class^ACGAN − L_Source^ACGAN [16].

L_Source^ACGAN = E[log P(S = real | X_real)] + E[log P(S = fake | X_fake)]   (2)

L_Class^ACGAN = E[log P(C = c | X_real)] + E[log P(C = c | X_fake)]   (3)

Since WGAN-GP is more stable while training, the proposed architecture makes use of the WGAN-GP loss instead of the AC-GAN loss. The loss functions for the discriminator and generator of LoGAN are the same as in WGAN-GP [14], stated in equations (4) & (5).

L_D^WGAN-GP = −E_{x∼p_d}[D(x)] + E_{x̂∼p_g}[D(x̂)] + λ E_{x̂∼p_g}[(‖∇D(αx + (1 − α)x̂)‖_2 − 1)^2]   (4)

L_G^WGAN-GP = −E_{x̂∼p_g}[D(x̂)]   (5)

The loss of the additional classifier network is defined as:

L_Q^LoGAN = E_{x,y}[log Q(y | x)]   (6)

To avoid instability during training and mode collapse, certain measures were taken. The generator and classifier were trained once for every 5 iterations of discriminator training, as suggested by Gulrajani et al. [14]; z was sampled from a Gaussian distribution [22]; and batch normalization was used [23].

IV. DATA

The dataset used to train LoGAN is the LLD-icons dataset [21], which consists of 486,377 32 × 32 icons.

1) Labeling: To extract the most prominent color from each image, a k-Means algorithm with k = 3 was used to define the RGB values of the centroids. The algorithm makes use of the MiniBatchKMeans implementation from the scikit-learn package, and the webcolors package was used to turn the RGB values into color words. The preliminary colors consisted of the X11 color names², which were grouped into 12 main classes: black, blue, brown, cyan, gray, green, orange, pink, purple, red, white, and yellow. The class distribution can be observed in Fig. 3.

Fig. 3. Dataset distribution by class.

² A list of the X11 color names, and the class grouping of RGB values, can be found on Wikipedia at https://en.wikipedia.org/wiki/Web_colors#X11_color_names

V. EXPERIMENTAL RESULTS

In this section, the training process, quality evaluation and the results obtained from the model³ will be presented.

³ The model was trained on a Windows 10 machine, with a Tesla K80 GPU, for 400 epochs, which lasted around three full days.

A. Training

Fig. 4 shows the loss graphs per batch for the discriminator (4a), generator (4b) and classifier (4c). It can be observed that both the discriminator and the generator have not converged, as the loss graphs have a downward trend. This does, however, not imply improper training, as neither WGAN nor WGAN-GP is guaranteed to reach convergence [8]. The classifier, on the other hand, has converged with a loss value close to 1. Further investigation shows that the classification loss for fake images has converged to zero (loss depicted in Fig. 5). This means that the generated images get classified correctly.

Fig. 5. Classification loss for generated images.
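The stabilization measures described earlier (five discriminator iterations per generator and classifier update, with z drawn from a Gaussian) can be sketched as a minimal training loop. The `d_step`, `g_step` and `q_step` callbacks below are hypothetical stand-ins for the actual optimization steps, not the paper's implementation.

```python
import random

N_CRITIC = 5  # discriminator updates per generator/classifier update [14]

def sample_z(dim=100):
    """Noise vector z, sampled from a Gaussian distribution [22]."""
    return [random.gauss(0.0, 1.0) for _ in range(dim)]

def train(steps, d_step, g_step, q_step):
    """Alternating schedule: one G and one Q update per N_CRITIC D updates."""
    for step in range(1, steps + 1):
        d_step(sample_z())          # discriminator sees a fresh fake batch
        if step % N_CRITIC == 0:
            g_step(sample_z())      # generator update
            q_step()                # classifier update

# Count how often each network is updated over 20 steps.
counts = {"D": 0, "G": 0, "Q": 0}
train(
    steps=20,
    d_step=lambda z: counts.update(D=counts["D"] + 1),
    g_step=lambda z: counts.update(G=counts["G"] + 1),
    q_step=lambda: counts.update(Q=counts["Q"] + 1),
)
print(counts)  # {'D': 20, 'G': 4, 'Q': 4}
```

Running the critic several times per generator step is the usual WGAN-GP recipe: the Wasserstein estimate is only meaningful when the critic is kept close to optimal.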

Fig. 4. Discriminator, Generator and Classifier losses, where the X-axis represents the batch number, and the Y-axis denotes the loss value for that batch. Panels: (a) discriminator, (b) generator, (c) classifier.

B. Quality evaluation

The network is expected to generate logo-resembling images that have clear color definition. In each epoch, 64 logos will be generated per class. The top three most prominent colors in the logos will be extracted, and the generated pairs and triplets will be analyzed. For each class c, the precision (7), recall (8) and F1-score (9) will be measured for the most prominent color in the logo.

Precision = correctly_generated_as(c) / total_generated_as(c)   (7)

Recall = correctly_generated_as(c) / total_actual(c)   (8)

F1 = 2 · (Precision · Recall) / (Precision + Recall)   (9)

C. Results

The results for the class-conditioned generation after 400 epochs are shown in Fig. 6. As expected, a slight blurriness can be noticed on the generated logos, because the training images are only 32 × 32 pixels. Despite the blurriness, the generated logos resemble feasible ones. The generated images are mainly dominated by round and square shapes. Even irregular shapes have been generated, for example the heart and the x in the array of white logos. A look-alike of the Google Chrome logo is also present in the midst of the cyan class.

Fig. 6. Results from the generation of 64 logos per class after 400 epochs of training. Classes from left to right, top to bottom: green, purple, white, brown, blue, cyan, yellow, gray, red, pink, orange, black.

The precision, recall and F1-score of the class-conditioned generation after 400 epochs are shown in Table I. The precision scores are relatively high, with the exception of white and gray. This is because the most predominantly white logos are small logos with a lot of white or transparent space around them; consequently, many small logos generated in other colors are going to be classified as white. Similarly, because of its neutrality, the gray class also appears a lot in other classes. The recall values are lower than the precision overall, with red having the highest recall at 0.92 and pink having the lowest at 0.3.

Most of the pink-labeled logos generated belong to class white. The F1-score marks black on top with 0.90, and white on the bottom with 0.38. Based on Equation (9), and the fact that white has the lowest precision, its F1-score is also expected to be low.

TABLE I
PRECISION, RECALL AND F1-SCORE OF THE MOST PROMINENT COLOR IN EACH OF THE LOGOS IN FIG. 6.

Class     Precision   Recall   F1
black     0.95        0.86     0.90
blue      0.73        0.69     0.71
brown     0.63        0.47     0.55
cyan      0.98        0.66     0.79
gray      0.57        0.50     0.53
green     1.00        0.80     0.89
orange    0.96        0.80     0.87
pink      0.95        0.30     0.45
purple    0.65        0.41     0.50
red       0.84        0.92     0.88
white     0.24        0.83     0.38
yellow    0.96        0.78     0.86
Average   0.79        0.67     0.69

Fig. 7 shows the distribution of the top three generated colors in each class. This distribution is calculated by extracting the top three colors from each logo and distinguishing the most prominent ones in each category. As expected from the results in Table I, white and gray are present in the top three for many of the classes. Some interesting combinations include the orange class, where the color brown appears, and the yellow class, where the color blue appears. There are three classes which get generated using only shades from their own class (blue, brown, purple). However, if we consider black and white as neutral colors, that number rises to five.

Fig. 7. Distribution of the top three generated colors per class.

VI. CONCLUSION & FUTURE WORK

In this paper, it was shown how designers can be assisted in their creative process for logo design by modern artificial intelligence techniques, namely Generative Adversarial Networks. The proposed model can successfully create logos if given a certain keyword, which in our case consisted of the most prominent color in the logo. This class of keywords can be considered descriptive, as it provides a property of the logo that is easy for humans to distinguish.

The proposed architecture consists of an Auxiliary Classifier Wasserstein GAN with gradient penalty (AC-WGAN-GP) and generates logos conditioned on twelve colors. This helps designers in their creative process and while brainstorming, making logo design cheaper and less time-consuming. As the generated logos have very low resolution, they can serve as a very rough first draft of a final logo, or as a means of inspiration for the designer. Regarding the results, the classifier converged, and the generated logos meet the requirements of the class they belong to. This is backed up by precision, recall, and comparison with other logos.

Despite the promising results, a limitation of the approach is the blurriness of the generated logos. At the same time, color is not a stand-alone keyword for defining a logo. Higher-resolution training images and other labels, such as the shape of the logo or the focus of the company, would provide valuable input, thus improving the results.

Possible extensions to this work include conditioning on more labels, such as the shape of the logo. However, as logos do not always have clear geometrical shapes, possible issues could include logos with text only or logos with irregular shapes. These issues could be overcome by splitting the dataset into two main groups: logos with an obvious geometrical shape, e.g. quadrilateral, circular or triangular; and logos with a non-regular shape, e.g. text, letters, hearts, etc. Extracting shapes from the first group can be easily done using packages like OpenCV. The non-regular group of shapes may be a bit more challenging to label. A possible path is to use a tool such as tesseract-ocr to perform optical character recognition to extract the text in the logo, and use it as the label. As for the irregularly shaped logos, those could be put in an extra class with a label such as 'others'.
Finally, another possible set of labels for the logos could be gathered for the LLD-logos dataset (which also provides higher-quality images) by gathering the most used words to describe the company the logo belongs to. Combining these labels with word-embedding models could potentially introduce a semantic meaning to the logos, further boosting the interpretability of the current approach.

REFERENCES

[1] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative adversarial nets," in Advances in Neural Information Processing Systems, 2014, pp. 2672–2680.
[2] J. F. Nash et al., "Equilibrium points in n-person games," Proceedings of the National Academy of Sciences, vol. 36, no. 1, pp. 48–49, 1950.
[3] Y. Jin, J. Zhang, M. Li, Y. Tian, H. Zhu, and Z. Fang, "Towards the automatic anime characters creation with generative adversarial networks," unpublished.
[4] V. Volz, J. Schrum, J. Liu, S. M. Lucas, A. M. Smith, and S. Risi, "Evolving mario levels in the latent space of a deep convolutional generative adversarial network," in Proceedings of the Genetic and Evolutionary Computation Conference (GECCO 2018). New York, NY, USA: ACM, July 2018. [Online]. Available: http://doi.acm.org/10.1145/3205455.3205517
[5] T. Karras, T. Aila, S. Laine, and J. Lehtinen, "Progressive growing of gans for improved quality, stability, and variation," unpublished.
[6] A. Sage, E. Agustsson, R. Timofte, and L. Van Gool, "Logo synthesis and manipulation with clustered generative adversarial networks," unpublished.
[7] N. Kodali, J. Abernethy, J. Hays, and Z. Kira, "On convergence and stability of gans," unpublished.
[8] L. Mescheder, A. Geiger, and S. Nowozin, "Which training methods for gans do actually converge?" unpublished.
[9] T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, and X. Chen, "Improved techniques for training gans," in Advances in Neural Information Processing Systems, 2016, pp. 2234–2242.
[10] A. Radford, L. Metz, and S. Chintala, "Unsupervised representation learning with deep convolutional generative adversarial networks," unpublished.
[11] X. Mao, Q. Li, H. Xie, R. Y. Lau, Z. Wang, and S. P. Smolley, "Least squares generative adversarial networks," in 2017 IEEE International Conference on Computer Vision (ICCV). IEEE, 2017, pp. 2813–2821.
[12] M. Arjovsky, S. Chintala, and L. Bottou, "Wasserstein gan," unpublished.
[13] D. Berthelot, T. Schumm, and L. Metz, "Began: Boundary equilibrium generative adversarial networks," unpublished.
[14] I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. C. Courville, "Improved training of wasserstein gans," in Advances in Neural Information Processing Systems, 2017, pp. 5769–5779.
[15] M. Mirza and S. Osindero, "Conditional generative adversarial nets," unpublished.
[16] A. Odena, C. Olah, and J. Shlens, "Conditional image synthesis with auxiliary classifier gans," unpublished.
[17] H. Zhang, T. Xu, H. Li, S. Zhang, X. Huang, X. Wang, and D. Metaxas, "Stackgan: Text to photo-realistic image synthesis with stacked generative adversarial networks," in IEEE Int. Conf. Comput. Vision (ICCV), 2017, pp. 5907–5915.
[18] H. Wu, S. Zheng, J. Zhang, and K. Huang, "Gp-gan: Towards realistic high-resolution image blending," unpublished.
[19] C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang et al., "Photo-realistic single image super-resolution using a generative adversarial network," unpublished.
[20] J. Li, X. Liang, Y. Wei, T. Xu, J. Feng, and S. Yan, "Perceptual generative adversarial networks for small object detection," in IEEE CVPR, 2017.
[21] A. Sage, E. Agustsson, R. Timofte, and L. Van Gool, "LLD - large logo dataset - version 0.1," https://data.vision.ee.ethz.ch/cvl/lld, 2017.
[22] T. White, "Sampling generative networks: Notes on a few effective techniques," unpublished.
[23] S. Ioffe and C. Szegedy, "Batch normalization: Accelerating deep network training by reducing internal covariate shift," unpublished.