DECEMBER 16, 2019

Generating Art from Neural Networks

Generative adversarial networks have received much media attention due to the rise of deepfakes. These algorithms are finding unique applications in the arts and helping us make giant strides in understanding artificial intelligence.

By Tejesh Kinariwala

WorldQuant, LLC, 1700 East Putnam Ave., Third Floor, Old Greenwich, CT 06870, www.weareworldquant.com

PERSPECTIVES

"All art is but imitation of nature." ― Seneca

In October 2018, a painting titled Portrait of Edmond Belamy was expected to sell at auction for $7,000 to $10,000, but to the surprise of auction house Christie's it ended up fetching the whopping price of $432,500.1 The gilt-framed painting is a portrait of a black-clad man with indistinct facial features. The corners of the portrait are unfinished, but the most unique part of the painting, and perhaps the reason for its high price, is the mathematical formula that can be seen in the bottom right corner, where the artist's signature normally would be found. The painting was created not by a human but by an algorithm. Specifically, it was generated by a class of machine learning algorithms known as generative adversarial networks (GANs), developed by Ian Goodfellow, a renowned artificial intelligence (AI) researcher currently working at Apple.

GANs have received a lot of media attention recently due to the rise of deepfakes — videos created by superimposing celebrities' and politicians' faces on other people's bodies, often those of impersonators. These deepfakes, which are powered by GANs, are eerily realistic and capable of convincing viewers that they feature real celebrities. Unsurprisingly, GANs have found applications in all kinds of visual content editing, from auto-generating anime characters to changing photos of fashion models to show different poses to increasing the resolution of blurry photographs. The video game design industry is on the verge of a revolution thanks to this technology, which is being used to create more-realistic computer graphics and virtual environments. Some consumer-facing applications, like FaceApp, also employ GANs, showing users how they would look if they aged a certain number of years. Even astronomers are using GANs to fill in parts of the sky with missing data and generate realistic realizations of deep space for further research.

[Image: Portrait of Edmond Belamy. Source: @obvious_art]

But GANs' true potential lies in how the algorithms could advance the field of AI from narrow applications to more general ones. Ever since Alan Turing published his famous paper asking whether machines can think, there has been steady progress toward developing a machine that can.2 In the past few decades, AI research has increasingly adopted statistical modeling techniques like machine learning, in which systems learn by looking for patterns in data and making inferences with minimal human intervention. One such modeling technique, called a neural network, has driven much progress in recent years, leveraging growing computational power and access to massive datasets. GANs are the latest in the line of such models and take a uniquely creative approach to using neural networks to train machines. So groundbreaking is this idea that Yann LeCun, one of the modern pioneers in artificial intelligence, has described GANs as the "coolest idea in machine learning in the last 20 years."3

FROM DISCRIMINATION TO GENERATION

To understand the game-changing potential of GANs, we need to first look at the concepts of discriminative modeling and generative modeling. In machine learning, researchers have been trying to develop algorithms that can ingest large volumes of training data to learn and understand the world. But until recently, most of the noteworthy progress in the field revolved around the idea of discriminative modeling. This refers to tasks like identifying whether a photo contains a dog or whether a given painting was created by van Gogh. Here the algorithms learn from training data, with each observation labeled. Mathematically speaking, discriminative modeling tries to estimate the probability that an observation x belongs to a category y. Since the launch of the ImageNet database in the early 2010s, the ImageNet Visual Recognition Challenge and the development of the deep convolutional neural network (CNN) have made such image classification tasks much easier, with many considering the challenge a solved problem.

Generative modeling, on the other hand, is not merely about identifying whether a photo shows a dog. It learns from a training dataset of images of dogs to figure out the rules about their appearance and to generate, or synthesize, new canine images. Importantly, this model should be probabilistic, not deterministic. A deterministic model always produces the same result, given a set of starting conditions or initial parameters. The generative model should therefore include a random element so that the new, synthesized image is different every time. Assume there is some unknown probabilistic distribution that describes why certain images are likely to be found in the training dataset and other images are not. The generative model should closely resemble this distribution and sample from it to output a group of pixels that look like they could have been part of the original training dataset.

Copyright © 2019 WorldQuant, LLC

A GAN comprises neural networks that are based on the two preceding models but engaged in opposing objective functions: a generative network and a discriminator, or adversarial, network. The generative network is trained to take random noise as input and output a synthetic candidate. To create a painting, a GAN would take in numerous samples of paintings as input and generate an artificial one. To generate artificial faces, it would study a huge dataset of real photos of people.

The adversarial network, on the other hand, is trained to discriminate between a synthetic candidate and a real one. That is, this discriminator is expected to "catch" or classify a generated painting or an artificial face as being fake. When trained in a cyclical fashion, the generative network becomes progressively better at what it does — generating synthetic candidates very close to the real ones. And the discriminator network gets better at its job of catching fakes and picking out the synthetic candidates.

Think of the generative network as a forger producing imitations of great artworks and the adversarial network as an appraiser evaluating the authenticity of those works. The two are engaged in a constant tug of war. The forger wants the fake to be misclassified as real; the appraiser stands in the forger's way because he can spot the fakes. The forger makes a large number of attempts and learns from what the appraiser allows to go through. The appraiser, for his part, learns from all the tricks the forger plays and in doing so becomes better and better at distinguishing a fake from a real work of art. This process helps both networks understand the nuances of what makes a painting real. How do we know when the training is complete? When a human eye cannot tell whether the painting was created by an algorithm or by an actual artist.

Mathematically, the GAN system can be represented by the following function:

min_G max_D V(D, G) = E_x[log D(x)] + E_z[log(1 − D(G(z)))]

D and G denote, respectively, the discriminative and generative models. D(x) represents the probability that x came from the real data rather than the generator's distribution. G(z) is a function that generates output when a noise z is introduced. By that logic, it can be seen that D(G(z)) estimates the probability that a synthesized data instance is real. E stands for the expected value of the respective probability distributions.

The first term on the right-hand side of the formula represents the likelihood of the real sample passing through the discriminator; the second term is the likelihood of the synthetic sample not passing through. The aim of the discriminator is to maximize this function so that in the most ideal case all real samples will pass through and synthetic samples won't. The generator's job is exactly the opposite — to minimize the function. The two networks engage in this zero-sum game until the model reaches an equilibrium. In fact, the signature in the Edmond Belamy painting is a version of this formula.

WALKING A TIGHTROPE

For generative adversarial networks, the most crucial challenge lies in the training process. This is typically done in a cyclical manner so both networks have an opportunity to learn from each other's progress. In one step, the generator learns from how the discriminator classified the previously generated samples. If some were more likely to get classified as real than others, the generator learns to produce more samples similar to them. The discriminator is frozen until the generator has learned as much as possible from the current state of its adversary. Once that has happened, the generator is frozen and the discriminator is allowed to learn what made some of the samples almost get classified as real in the previous iteration; this helps the discriminator spot these near misses going forward. This cycle is repeated again and again, improving both networks.

It's not ideal for one of the networks to advance too quickly. If either network gets too good before the other can catch up, the training will plateau and the overall result will be suboptimal.4 A useful analogy is that of two chess students playing each other to improve their respective games. They both have to learn and improve at roughly the same pace. If one student improves her game significantly more than the other, they will both end up with a suboptimal level of expertise: The better player will not be challenged enough, and the lesser player will keep losing without learning anything significant.

When trained well, GANs can be tools to generate information in any scenario where we have a certain understanding of what to expect and where we have a system to tell if the generated information meets our expectations.

Consider the simple but all too common goal of increasing the resolution of a photograph, or "upscaling." Starting with a low-resolution image, a GAN's generator will create thousands of random high-resolution images as candidates to be the upscaled version of the original. In other words, these are candidates for high-resolution images that could produce our original input image if their resolutions were reduced, or downsampled. The discriminator will then go through these high-resolution images and try to classify each candidate.
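This candidate-and-filter idea can be sketched in miniature with plain NumPy. This is an illustrative toy, not the actual super-resolution GAN of Ledig et al.; all names are invented, and the trained discriminator is replaced by a simple consistency check: does a candidate, once downsampled, reproduce the low-resolution input?

```python
# Toy sketch of upscaling by candidate filtering (illustration only; a real
# super-resolution GAN trains its networks end to end rather than scoring
# random proposals with a fixed rule).
import numpy as np

rng = np.random.default_rng(seed=1)

def downsample(img):
    # Average each 2x2 block: a 4x4 image becomes 2x2.
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

low_res = np.array([[0.2, 0.8],
                    [0.5, 0.1]])  # the input we want to upscale

# "Generator": propose many random high-resolution (4x4) candidates.
candidates = rng.random((1000, 4, 4))

# Discriminator stand-in: keep the candidate most consistent with the
# input image when downsampled.
errors = [np.abs(downsample(c) - low_res).sum() for c in candidates]
best = candidates[int(np.argmin(errors))]

print(np.abs(downsample(best) - low_res).max())  # small residual error
```

The sketch captures the inverse-problem framing described above: a good upscaled image is one that would downsample back to the original input, with the discriminator's training over many real high-resolution images deciding among the candidates that pass this test.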


The discriminator bases this classification on the most likely and reasonable possibilities, given its training over many high-resolution images. Together the generator and the discriminator will generate an upscaled image from a low-resolution one that will be closest to a real high-resolution image, if it had existed.5 Essentially, the GAN tries to make the best guess based on its training, even though it may not initially have all the information. GANs can also be used to remove unwanted objects or undesirable elements from an image — for example, watermarks or lampposts and trash bins. This is done by deleting the unwanted elements and letting the GAN fill the space with the most expected information, as in the process of upscaling described above.6

What if we have no input image to start with, but only a verbal description? Let's say we have the words "a blue bird sitting on a tree branch, facing left." Theoretically, a GAN should be able to create an image from just the words describing the image. In the standard process, the generator will create thousands of images and the discriminator will look through all of them, allowing only those that match the description. After many iterations, the GAN will generate an image of a blue bird sitting on a branch facing left, and it will be an entirely new creation because it was generated from a model involving a random element.

The ability to generate near-realistic data comes in handy in other areas of machine learning research, such as reinforcement learning, which involves the optimization of a goal through trial and error. Trial-and-error experiments can be complicated to conduct in certain environments. Consider the case of teaching a self-driving car to navigate a rocky terrain with cliffs and pits. If an algorithm could simulate the environment using a GAN, the testing could be done in a virtual setting and the learning could be accelerated.

THE STATE OF THE ART

Some of the more straightforward applications of GANs include upscaling, removing objects from images and converting audio to images. But the real fun begins when GANs are combined with other technologies, such as convolutional neural networks that specialize in image processing and object recognition tasks. CNNs consist of layers, or filters, that extract a certain feature from an image and produce differently filtered versions of the input image. These are capable of transforming images into representations that capture the high-level content (what objects are in the image, how they are arranged) without worrying about the exact pixel values. They can also produce a representation of the style — a texturized version of the input image that captures the color and localized structures.

In their landmark 2015 paper, computer scientists Leon Gatys, Alexander Ecker and Matthias Bethge made the breakthrough of separating content and style representations.7 They then demonstrated the idea of style transfer: By mixing an input photograph with famous artworks, they were able to synthesize new renderings of the photographs in those very artistic styles. The new rendering showed the same content as the photograph, but the style resembled the artwork. For example, combining a photograph of the Neckarfront (a tourist attraction in Tübingen, Germany) and van Gogh's painting The Starry Night as the style reference image, the algorithm was able to create a new, artistic version of the photo, complete with post-Impressionistic flourishes resembling the painting.

Although the 2015 paper was groundbreaking, it relied on a single image as the reference for the style. Subsequent research has taken this idea further by training GANs to learn from a domain of images, such as the complete works of a specific painter or art from a certain time period. This is precisely what the Paris-based art collective Obvious did to create the painting Portrait of Edmond Belamy. It trained the GAN on a dataset of 15,000 portraits painted between the 14th and 20th centuries. The generator in the GAN was tasked with synthesizing new images based on this dataset, while the discriminator tried to catch the images that were not human-made.

MORE THAN MEETS THE EYE

As GANs have evolved, new, unforeseen uses for them have been discovered. One of these involves deepfakes and has caught widespread media attention. Deepfakes use GANs to superimpose content onto a source image or video and seamlessly alter the original content. They can be used for fun and harmless applications like impersonating celebrities and transferring professional dance moves onto the bodies of amateurs, but in the wrong hands the technology can be weaponized for harassment, social engineering, political misinformation campaigns and propaganda. The video of President Barack Obama superimposed on comedian Jordan Peele was a timely warning about the possible dangers of misinformation using the technology. We have reached a stage where anyone with access to a reasonable dataset and computing power could create such videos to mislead the public, disrupt financial markets or even cause national security incidents.

Research is already underway into catching these deepfakes through forensic techniques that model subtle mannerisms and facial expressions specific to an individual's speaking patterns. One flaw was observed in the early stages of deepfake generation.


The eyes of artificially created faces didn't blink like natural eyes would. This was because the training data did not include images with the person's eyes closed, so the algorithms had no way to learn about the concept of blinking. But once this flaw was noticed, the next generation of deepfakes accounted for blinking and easily bypassed detection techniques. Subsequently, deepfake pioneer Hao Li conducted a study that revealed certain "soft biometrics" — distinct movements of the face, head and upper body — that could help distinguish real videos of people from deepfakes.8 But Li thinks it will soon become impossible to detect fakes, with new forensic techniques and countermeasures battling each other, improving each side, not unlike the two networks competing in a GAN system.9

CONCLUSION

In his famous 1950 test, Alan Turing proposed that a machine can be said to exhibit intelligent behavior if a human evaluator is unable to distinguish its responses from those of a human in a text-based conversation. Since then, the field of AI has grown dramatically and given rise to a number of applications, most of which are restricted to narrow tasks in specific domains. Think of intelligent systems like Google Translate, Siri, Alexa and common facial recognition software. These demonstrate high levels of intelligence for specific functions and in some cases are superior to human capability. But these systems are not of much use when applied to tasks other than their specialty. In contrast, a hypothetical form of artificial general intelligence would be able to extend learning across different functions and could tackle more-complicated problems, react to unfamiliar environments and make decisions on its own.

The arrival of GANs has added much excitement to this growing field of research. It has allowed machine learning techniques to progress beyond merely being able to understand and label the data supplied to them. The techniques are now getting better at figuring out how the data was generated in the first place. To achieve true intelligence, machines should not only be able to figure out whether a photo contains a dog or a cat but also be able to understand what it means for the photo to be of a dog or a cat. The latest applications of GANs in visual content generation, especially in creating artworks, seem to suggest that we are heading in the right direction. ■

Tejesh Kinariwala is a Vice President, Portfolio Management, at WorldQuant and has a bachelor's degree in electrical engineering from the Indian Institute of Technology, Delhi, and an MBA from the Indian Institute of Management, Ahmedabad.

ENDNOTES

1. "Is Artificial Intelligence Set to Become Art's Next Medium?" Christie's, December 2018.

2. Alan Turing. "Computing Machinery and Intelligence." Mind 59, no. 236 (1950): 433–460.

3. Yann LeCun. "Unsupervised Learning: The Next Frontier in AI." Robotics Institute Seminar Series, Carnegie Mellon University, November 18, 2016.

4. Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford and Xi Chen. "Improved Techniques for Training GANs." 30th Conference on Neural Information Processing Systems, 2016.

5. Christian Ledig, Lucas Theis, Ferenc Huszár, Jose Caballero, Andrew Cunningham, Alejandro Acosta, Andrew Aitken, Alykhan Tejani, Johannes Totz, Zehan Wang and Wenzhe Shi. "Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network." IEEE Conference on Computer Vision and Pattern Recognition, 2017.

6. Rakshith Shetty, Mario Fritz and Bernt Schiele. "Adversarial Scene Editing: Automatic Object Removal from Weak Supervision." 32nd Conference on Neural Information Processing Systems, 2018.

7. Leon A. Gatys, Alexander S. Ecker and Matthias Bethge. "A Neural Algorithm of Artistic Style." Journal of Vision 16, no. 12 (2015).

8. Shruti Agarwal, Hany Farid, Yuming Gu, Mingming He, Koki Nagano and Hao Li. "Protecting World Leaders Against Deep Fakes." IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2019.

9. James Vincent. "Deepfake Detection Algorithms Will Never Be Enough." The Verge, June 27, 2019.

Thought Leadership articles are prepared by and are the property of WorldQuant, LLC, and are being made available for informational and educational purposes only. This article is not intended to relate to any specific investment strategy or product, nor does this article constitute investment advice or convey an offer to sell, or the solicitation of an offer to buy, any securities or other financial products. In addition, the information contained in any article is not intended to provide, and should not be relied upon for, investment, accounting, legal or tax advice. WorldQuant makes no warranties or representations, express or implied, regarding the accuracy or adequacy of any information, and you accept all risks in relying on such information. The views expressed herein are solely those of WorldQuant as of the date of this article and are subject to change without notice. No assurances can be given that any aims, assumptions, expectations and/or goals described in this article will be realized or that the activities described in the article did or will continue at all or in the same manner as they were conducted during the period covered by this article. WorldQuant does not undertake to advise you of any changes in the views expressed herein. WorldQuant and its affiliates are involved in a wide range of securities trading and investment activities, and may have a significant financial interest in one or more securities or financial products discussed in the articles.
