
The Aesthetics of Absence: Awareness in the Age of Neural Networks

by

Matthew Groh

B.A., Middlebury College (2010)

Submitted to the Program in Media Arts and Sciences, School of Architecture and Planning, in partial fulfillment of the requirements for the degree of Master of Science in Media Arts and Sciences at the MASSACHUSETTS INSTITUTE OF TECHNOLOGY

June 2019

© Massachusetts Institute of Technology 2019. All rights reserved.

Author: Signature redacted
Program in Media Arts and Sciences, School of Architecture and Planning
May 10, 2019

Certified by: Signature redacted
Iyad Rahwan
Associate Professor of Media Arts and Sciences
Thesis Supervisor

Accepted by: Signature redacted
Tod Machover
Academic Head and Professor of Media Arts and Sciences

The Aesthetics of Absence: Awareness in the Age of Neural Networks

by Matthew Groh

Submitted to the Program of Media Arts and Sciences, School of Architecture and Planning on May 10, 2019, in partial fulfillment of the requirements for the degree of Master of Science in Media Arts and Sciences

Abstract

In this hurtling technological age, the world seems more lost than ever before. When we optimize only for what can be observed, we can lose sight of the mysteries that help to define us. This thesis begins with the premise that not all aspects of humanity are amenable to empirical study. Inspired by a monoprint painted by Paul Klee and vividly described by Walter Benjamin, we design and deploy a four-part intervention at the intersection of artificial intelligence and media. First, we probe the tradition of via negativa. Second, we develop an artificial intelligence (AI) model that can disappear objects in photographs and deploy it online on a website called Deep Angel. From August 2018 to April 2019, over 100,000 people visited Deep Angel. Third, we examine the precautionary principle for AI media manipulation with a randomized experiment. In this particular domain with this particular technology, we find that exposure to media manipulation improves individuals' ability to detect manipulations. Fourth, we create art. By infusing ancient wisdom traditions with modern technologies, this thesis points to a path out of digital and material clutter towards a rehabilitation and recovery of what has been lost in this Internet age: presence.

Thesis Supervisor: Iyad Rahwan Title: Associate Professor of Media Arts and Sciences


This master's thesis has been examined by a Committee of the Department of Media Arts and Sciences as follows:

Signature redacted

Professor Iyad Rahwan...... Thesis Supervisor Associate Professor of Media Arts and Sciences

Signature redacted

Dr. Andrew Lippman...... Thesis Reader Senior Research Scientist

Signature redacted

William Powers...... Thesis Reader
Research Scientist

Acknowledgments

Some of the text and images in this thesis have been previously submitted to, will soon be submitted to, are under review at, or have been accepted by peer-reviewed journals or art galleries. I am grateful for the wonderful guidance of Manuel Cebrian, William Powers, Andrew Lippman, and Iyad Rahwan. I thank Micah Epstein and Julian Kelly for designing fantastic graphics for the Deep Angel website, Joyce Feng for providing excellent research assistance as an MIT UROP, the Harvard Cyber Law group (Mason Kortz, Jessica Fjeld, Sally Kagay, and Rebecca Rechtszaid) for expert legal assistance and guidance, Abhimanyu Dubey for brilliant technical advice, and Zivvy Epstein for outstanding collaboration and support of the development of ideas throughout this project from the technical to the creative. All errors are my own.


Contents

1 Introduction
1.1 Creation and Creativity
1.2 Experiment in Phenomenology
1.3 Artificial Intelligence and Media Manipulation

2 Design via Negativa
2.1 Angelus Novus
2.2 Détournement
2.3 Suchness
2.4 Negative Space in Art

3 Engineering Vision for Human Vision
3.1 Machine Vision
3.2 Related Work
3.2.1 Object Detection and Instance Segmentation
3.2.2 Image Inpainting
3.3 Target Object Removal
3.4 Unanchored Object Conjuring
3.5 Deep Angel

3.5.1 Interaction Design
3.5.2 Backend Architecture
3.5.3 Live Deployment

4 Science of Deception
4.1 User Interactions
4.2 Quality Evaluations

5 Art from Absence
5.1 The Spirit of Wonder
5.2 The Broken Flaneur
5.3 AI Spirits
5.4 Shadows sans Substance

6 Conclusion

A Meta Deep Angel


List of Figures

1-1 Map of Creation: The Aristotelian Elements

1-2 Map of Creativity: The Buddhabrot of Creativity.

1-3 Photographic manipulation has been a tool of authoritarian governments in nefarious attempts to subvert reality. On the top left, Joseph Stalin is standing next to Nikolai Yezhov, whom Stalin later ordered to be executed and disappeared from the photograph. On the top right, Mao Zedong is standing beside the "Gang of Four," who were arrested a month after Mao's death and subsequently erased from this historic photograph. On the bottom, Benito Mussolini strikes a heroic pose on a horse while his trainer holds the horse steady. The photographic manipulation showcases Mussolini's skill for manipulating the facts and covers up his lack of horsemanship.

2-1 Angelus Novus by Paul Klee. 1920.

2-2 Examples of Negative Space in Paul Pfeiffer's Four Horsemen of the Apocalypse and Adrian Piper's Everything.

3-1 Comparisons of inpainting. Image graphics from Mikhail Erofeev's Image Inpainting: Humans vs. AI [22].

3-2 End-to-end pipeline for Target Object Removal following [30,71]
3-3 End-to-end pipeline for Unanchored Object Conjuring following [30,34,71]
3-4 Screenshots of Deep Angel's user interface.
3-5 Diagram of Deep Angel's server architecture.

4-1 Examples of original images uploaded to Deep Angel and corresponding manipulations.
4-2 Probability density function displaying the accuracy of guesses over images.
4-3 Accuracy of guesses over exposure to manipulated images with a 95% confidence interval.

5-1 Screenshot from The Broken Flaneur short film. Watch it on YouTube at https://youtu.be/1QCFAwuIUUE
5-2 Photographs from the AI Spirits collection.
5-3 Photographs from the Shadows sans Substance collection.

A-1 This just got meta. Mason Kortz, Joan Donovan, Jessica Fjeld, and Matt Groh are disappeared from their panel at 2019 SXSW titled "AI-Powered Media Manipulation and its Consequences."

List of Tables

4.1 Top 10 Target Object Removal Selections for Uploaded Images and Targeted Instagram Crawls on Deep Angel. Each Instagram username selection initiated a targeted crawl of Instagram for the three most recently uploaded images of the selected user.
4.2 Logistic regression results on guessing accuracy with user and image fixed effects. Standard errors in parentheses. *, **, and *** indicate statistical significance at the 90, 95, and 99 percent confidence levels, respectively. All columns include user and image fixed effects. Column (1) shows all users, (2) drops all images where nothing was disappeared, (3) drops all users who submitted fewer than 10 guesses, (4) drops all observations where a user has already seen a particular image, and (5) keeps only the images qualitatively judged as very high quality.

Chapter 1

Introduction

Change the focus of your eyes and you see the whole world before you is radiant.

Joseph Campbell

At a moment in history when technology is increasingly distracting, overwhelming, and manipulating, this thesis is a clarion call to humankind to recognize what it means to be human again. Humans are technology's creators. What we create reflects who we are. If we wish to know ourselves (and we ought), we need to make the time to introspect. This is not a new problem; it is our ever-present challenge. In Charlie Chaplin's final speech in The Great Dictator, he explains the fundamental tension between modern technology and humanity: "Machinery that gives abundance has left us in want. Our knowledge has made us cynical; our cleverness, hard and unkind. We think too much and feel too little. More than machinery we need humanity."

Absence is the oft-overlooked anti-medium that offers a rehabilitation. In the words of Marshall McLuhan, anti-media and counter-environments provide "a means of perceiving the dominant one [environment]" and create a wider awareness [44]. Through absence, we can explore being human with fresh eyes. This thesis is not

an analysis of absence. Rather, it is a generative endeavor, a provocative probe and confrontational intervention. Following the trail of absence to its exhaustion, the aesthetics of absence is a lens to understand media, society, humanity, and the soul. This probe considers how we can learn from the world by what it is not. This anti-approach is called via negativa and has roots in both early Christian mysticism and the Upanishads [68]. From religion to philosophy to politics to art, via negativa can be applied to build a deeper understanding of our environment. This thesis confronts technological determinism by co-opting artificial intelligence (AI) media manipulation to intervene in the default state. People should be able to access their default state network and set their own defaults. Via negativa is the thread that ties together speculative design, machine learning engineering, behavioral science, and generative art in this thesis. Taken together, these seemingly disparate directions unite for an intervention aimed at the ultimate question of tech humanism: how do we encode the machines of the future with our best selves? This thesis begins with three meditations: first, the relationship between four modes of creativity and absence; second, an experiment to reawaken wonder and rekindle presence; third, a confrontation with AI media manipulation. The goal of this thesis is to critique and expand on ancient wisdom, build new technology, understand human behavior, and present a fresh perspective. Drawing on metaphors from ancient civilizations to post-modern philosophers, I collaborated with colleagues to design an interactive media experiment based on an AI model. On the technical side, I engineered a neural network architecture that disappears objects from images. In other words, this neural network architecture is an AI model that can generate absence in photographs.
I hosted this AI model on a website called Deep Angel, where anyone on the Internet can interact with it. From August 2018 to April 2019, over 100,000 users visited Deep Angel, spending on average two minutes each on the website.

Users uploaded their own images and rated the quality of the manipulation of other users' uploaded images. Based on users' ratings, this thesis examines how exposure to manipulated media affects the ability to detect manipulations. Before concluding, this thesis presents three algorithmically generated artworks.
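Chapter 3 describes the real architecture (instance segmentation to find the object, then a learned inpainting network to fill the hole). As a minimal, self-contained sketch of the second stage only, with a naive diffusion fill standing in for the neural inpainter, the toy function below (my own illustration, not the thesis code) erases a masked region of a grayscale image:

```python
import numpy as np

def remove_object(image: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Disappear the masked region of a 2-D grayscale image by
    repeatedly averaging in values from known 4-neighbors.
    This diffusion fill is a crude stand-in for a learned inpainter."""
    out = image.astype(float).copy()
    unknown = mask.astype(bool).copy()
    while unknown.any():
        nbr_sum = np.zeros_like(out)
        nbr_cnt = np.zeros(out.shape)
        for axis in (0, 1):
            for shift in (1, -1):
                vals = np.roll(out, shift, axis=axis)
                known = np.roll(~unknown, shift, axis=axis)
                nbr_sum += np.where(known, vals, 0.0)
                nbr_cnt += known
        # Fill every unknown pixel that touches at least one known pixel.
        fillable = unknown & (nbr_cnt > 0)
        if not fillable.any():
            break  # mask covers the whole image; nothing to diffuse from
        out[fillable] = nbr_sum[fillable] / nbr_cnt[fillable]
        unknown &= ~fillable
    return out
```

In Deep Angel, the mask comes from an instance segmentation model and the fill from a generative inpainting network, which is what makes the removals photorealistic rather than blurry.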

1.1 Creation and Creativity

Across ancient civilizations from Greece to India to Egypt, philosophers conceptualized Earth and its complexities as a mixture of four elements: earth, water, air, and fire. In On Generation and Corruption, Aristotle posits a further atomization of terrestrial nature. He describes two forms of change (hot and wet) and their corresponding privations (cold and dry) as underlying dimensions upon which matter attains its quality [42]. From a graphical perspective, the four classic elements manifest as points on the Cartesian plane where the X and Y axes are defined by Aristotle's dualistic forms of change. While Aristotle believed this discontinuous movement along these two dimensions could explain creation on Earth, he posited that something else explains the greater cosmos. Instead of the rectilinear motion that characterized changes between the classic four elements, he imagined the movement and materiality of the cosmos as continuous and circular with no contrary [42]. To explain what first set the cosmos in motion, Aristotle proposed a fifth element, aether, that accounts for the movement and non-dualistic nature of the greater cosmos [42]. In other cultures, this fifth element has been referred to as quintessence (pre-Renaissance Europe), Qi

(ancient China), Mana (ancient Polynesia), and Akasha (ancient India). In Jewish mysticism, this element manifests as a process known as tzimtzum, the contraction of infinite light enabling an empty space in which the physical world and free will come into existence. This metaphysical non-dualistic essence is central to the Buddhist teaching

of Sunyata, the teaching of emptiness. Before probing emptiness, let us consider how things come to be through creativity. Aristotle's systematization of elements bears a striking resemblance to recent conceptualizations of creativity. In Rich Gold's 2007 book, The Plenitude: Creativity, Innovation, and Making Stuff, Gold condenses creativity into a two-by-two cartoon matrix of hats. Each hat - Science, Engineering, Design, Art - represents a cornerstone of creativity and a path to worldly production [27]. While none of these hats are strictly defined, each represents a different approach to the act of creation. In a blog post reflecting on Rich Gold's matrix, John Maeda ascribes a mission to each corner of creativity: Science is for exploration, Engineering for invention, Design for communication, and Art for expression [43]. In creative processes, these hats are all intertwined and support each other. In Age of Entanglement, Neri Oxman extends these ideas into a speculative map, the Krebs Cycle of Creativity, that addresses the questions of (1) how we travel between the four "embodiments of creativity and innovation" and (2) the results of inhabiting creativity's interstitial zones [51]. Akin to Aristotle's forms of change, Oxman identifies two axes of traversal. The first axis spans culture and nature, which divides art and design from science and engineering. The second axis bridges production and perception, which splits art and science from design and engineering. In contrast to Aristotle's conceptualization of rectilinear movement, Oxman metaphorically describes the energy and movement of creativity with the Krebs Cycle, noting the "(r)evolution[ary]", perpetual nature of creative energy [51]. Oxman implores her readers, "Granted, my determination to posit the completed circle-to assert the continuity of the [Krebs Cycle of Creativity]-may be seen as naive, or even sophomoric. Please assume the former, and suspend disbelief."
[51] I listen and ask the same as I expand on her ideas. Approach this idea with a

beginner's mind and assume naivete, not pretension. After all, William James once wrote that it is "only your mystic, your dreamer, or your insolvent tramp or loafer, [who] can afford so sympathetic an occupation, an occupation which will change the usual standards of human value in the twinkling of an eye, giving to foolishness a place ahead of power, and laying low in a minute the distinctions which it takes a hard-working conventional man a lifetime to build up" [35]. I speculate a unification of the Aristotelian classification of elements and the Krebs Cycle of Creativity, and I propose a missing fifth element.

In both maps of creation and creativity, the X-axis represents change in entropy, the degree of disorder in a system. According to the Second Law of Thermodynamics, the entropy of an isolated system never decreases.1 A closed system that exports heat to its surroundings, however, can decrease its own entropy. If we imagine nature as the forces (laws of physics) that act upon an open system at the cosmic scale and culture as the forces (where we develop and maintain values, institutions, art, and technology) that act upon a system within nature, then nature would represent increasing entropy and culture, reversing entropy. By reversing entropy, I mean rejecting the tendency for things to fall apart. We should remember Oxman's refrain that "nature is culture is nature" and recognize we could also imagine nature and culture representing the reverse [51]. The Y-axes of Aristotle's and Oxman's maps align based on the qualities of their respective forms. The best way to understand what Aristotle meant by wetness and dryness is to consider an example, say flour. When flour is dry, it is a light powder that is easy to separate and disperse. But, once you add water to it, it immediately sticks together. The material quality of dryness connotes separateness and objectivity.

1 More specifically, the Second Law of Thermodynamics states that the total entropy of an isolated system can never decrease over time.

On the other hand, wetness naturally coheres and manifests in a subjective form. From the dimensions of objectivity and subjectivity, we can project production and perception. Production is a reification of concepts and objects into a defined state. The existence of that state is objective. In contrast, perception takes an input and becomes an observer's experience. The state of the observer's experience is subjective; it depends on the observer. The missing fifth element in the map of creativity is the aesthetics of absence. It is not absence itself, but instead, it is the truth and beauty of presence that absence reveals. W.B. Yeats called it spiritus mundi, the creative spirit that inspires poets [69]. In Buddhism, it is called Sunyata or simply Buddha-nature. We can imagine this fifth element as akin to aether with no contrary and perpetual motion. The point here is to recognize the underlying essence of creativity: the generative nature of absence in the "Aha!" moments of presence. I adapt previous maps to include this fifth element at its core. Instead of the metaphor of rectilinear or cyclical motion, creativity most certainly moves in a pattern where the parts resemble the whole. Rather than cyclical repetitions, fractal motion is a metaphor for pattern matching in an ever expanding and changing context. Fractals can be generated by simple equations that produce complex geometric figures. For example, the Mandelbrot set is defined by the set of points c in the complex plane for which the iterates of the quadratic recurrence z_{n+1} = z_n^2 + c, starting from z_0 = 0, do not tend to infinity as n goes to infinity. By filtering out the non-escaping trajectories and plotting the escaping orbits on the complex plane, we begin to see the Buddhabrot, a fractal resembling the Buddha. As a metaphor, the Buddhabrot is a visual call to the Buddha-nature in everything.
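To ground the metaphor, the recurrence and the Buddhabrot construction can be sketched in a few lines of code. This is a toy illustration (function names, grid size, and sampling choices are mine, not from the thesis): sample points c in the complex plane, iterate z_{n+1} = z_n^2 + c from z_0 = 0, discard the bounded orbits, and histogram every location visited by the orbits that escape.

```python
import numpy as np

def escapes(c, max_iter=200):
    """Return the orbit of z_{n+1} = z_n^2 + c from z_0 = 0 if it
    escapes (|z| > 2), or None if it stays bounded -- i.e. c is taken
    to belong to the Mandelbrot set."""
    z, orbit = 0j, []
    for _ in range(max_iter):
        z = z * z + c
        orbit.append(z)
        if abs(z) > 2:
            return orbit
    return None

def buddhabrot(samples, size=64, max_iter=200, seed=0):
    """Histogram the escape orbits over the complex plane: escaping
    points 'paint' every location their orbit visited."""
    rng = np.random.default_rng(seed)
    hist = np.zeros((size, size))
    cs = rng.uniform(-2, 2, samples) + 1j * rng.uniform(-2, 2, samples)
    for c in cs:
        orbit = escapes(c, max_iter)
        if orbit is None:
            continue  # bounded orbit: inside the set, filtered out
        for z in orbit:
            x = int((z.real + 2) / 4 * size)
            y = int((z.imag + 2) / 4 * size)
            if 0 <= x < size and 0 <= y < size:
                hist[y, x] += 1
    return hist
```

Rendered at high sample counts and rotated so the imaginary axis runs vertically through the figure, the histogram reveals the seated silhouette that gives the Buddhabrot its name.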
By plotting the Buddhabrot on the Complex Plane and overlaying the plot with the four embodiments of creativity in the same relative positions as the previous

maps of creativity, we can consider a new perspective. By moving beyond Euclidean Space to the Complex Plane, perception and production are approached with a mathematical metaphor. Specifically, I re-formulate Oxman's perception/production axis (Aristotle's wet/dry) as the Imaginary axis spanning imagination and reification. Likewise, the nature/culture (hot/cold) axis is represented by changes in entropy, which is signified by the Real axis. The fractal itself represents the entanglement of chaos and order, nature and culture, perception and production in the iterative creation process. As another perspective on creativity, I present the Buddhabrot of Creativity.


Figure 1-1: Map of Creation: The Aristotelian Elements

Figure 1-2: Map of Creativity: The Buddhabrot of Creativity.

1.2 Experiment in Phenomenology

Another map for creation and creativity could be a blank canvas upon which the observer imbues his or her own meaning. In 1989, Shepard Fairey and his posse set in motion OBEY, a guerrilla marketing campaign with no call to action. Drawing inspiration from the 1988 film They Live and its signs containing single-word imperatives to "CONFORM," "OBEY," and "CONSUME," the posse designed and deployed an experiment.2 They designed a stencil of Andre the Giant and the single word, OBEY, and posted stickers of the stencil and OBEY in dense, urban environments. Without social media or even the World Wide Web, these stickers went viral and became a global meme. In Fairey's online Manifesto, he describes OBEY as an "experiment in phenomenology" designed to "reawaken a sense of wonder about one's environment" [23]. OBEY is jarring and frustrating. Nobody likes to be commanded to do anything, let alone simply obey. On the surface, OBEY does not refer to anything. That is the point. If people take the time to reflect on OBEY, they realize its irony, which addresses the idea that all commercial ads are trying to get us to obey. In Fairey's words, the intention of the OBEY experiment was to "stimulate curiosity and bring people to question both the sticker and their relationship with their surroundings" [23]. Deep Angel extends the essence of OBEY to AI applied to content generation and media manipulation.

1.3 Artificial Intelligence and Media Manipulation

The recent emergence of artificial intelligence (AI) powered media manipulations has widespread societal implications for all fields, and for journalism [17], democracy [17],

2 They Live was based on a short story, Eight O'Clock in the Morning by Ray Nelson.

national security [3], and art [31] in particular. On one hand, AI has the potential to scale misinformation to unprecedented levels by creating various forms of synthetic media. For example, AI systems can synthesize realistic video portraits of an individual with full control of facial expressions including eye and lip movement [26,39,59,64,65]; AI systems can clone a speaker's voice with a few training samples and generate new natural-sounding audio of something the speaker never previously said [5]; AI systems can synthesize visually indicated sound effects [50]; AI systems can generate high-quality, relevant text based on an initial prompt [53]; AI systems can produce photo-realistic images of a variety of objects and combinations of objects from text inputs [14,38,48]; and AI systems can generate photo-realistic videos of people expressing emotions from a single image [8]. On the other hand, these generative AI systems can offer new creative tools for artists and practitioners alike. For example, the Creative Adversarial Network learns art by its styles and generates new art by deviating from the styles' norms [21], while interactive GANs (iGANs) can offer applications for artists and designers to explore new ideas [16]. These examples highlight the diversity, automation, and scale of content generation in the age of AI. Plus ça change, plus c'est la même chose. Media manipulation is not new. In fact, it goes by many names: fake news, misinformation, truthiness. For a particular kind of media manipulation, there is a modern Latin term, damnatio memoriae, that refers to the erasure of an individual from official accounts, often in service of dominant political agendas. The earliest known instances of damnatio memoriae were discovered in ancient Egyptian artifacts, and similar patterns of removal have appeared in most image-based societies across time and space since [25,66].
Figure 1-3 presents iconic examples from recent history of individuals removed from photographs in this same fashion with an aim towards advancing a particular political

agenda. Beyond the philosophical and political concerns, scalable media manipulation has practical concerns. In 1986, the Whole Earth Review published its 47th issue focusing on the state-of-the-art technology for image manipulation. The review includes the following excerpt from a fictional legal trial, which speaks to both the foresight of the publication and how media manipulation has long been a technological concern:

Your Honor, we cannot accept this photograph in evidence. While it purports to show my client in a hotel bedroom with a woman not his wife, there is no way to prove the photograph is real. As we know, the craft of digital retouching has advanced to the point where a "photograph" can represent anything whatever. It could show my client in bed with Your Honor.

To be sure, digital retouching is still a somewhat expensive process. A black-and-white photo like this, and the negative it's made from, might cost a few thousand dollars to concoct as fiction, but considering my client's social position and the financial stakes of this case, the cost of the technique is irrelevant here. If Your Honor prefers, the defense will state that this photograph is a fake, but that is not necessary. The photograph could be a fake; no one can prove it isn't; therefore it cannot be admitted in evidence. Photography has no place in this or any other courtroom. For that matter, neither does film, videotape, or audiotape, in case the plaintiff plans to introduce in evidence other media susceptible to digital retouching.

-Some lawyer, any day now [2]

While this is a fictional excerpt that is three decades old, it speaks to the changing

nature of media in the age of generative neural networks. Historically, visual and audio manipulations required both skilled experts and a significant investment of time and resources. Today, an AI can produce photorealistic manipulations nearly instantaneously and at scale. This new capability poses an existential threat to standards of evidence, and thus, the changing technology calls for an examination of humans' ability to discern AI-generated manipulations and how society trusts media.

Recently, research institutions have applied the precautionary principle to the dissemination of media manipulation technologies. For example, Google withheld the discriminator for their BigGAN model while publicly hosting the generator for anyone to play with [14]. BigGAN can generate realistic-appearing objects in images [14]. Similarly, OpenAI restricted access to their GPT-2 model while open-sourcing a pared-down model trained with fewer parameters [53]. GPT-2 can generate a plausible story given an initial prompt [53]. Withholding access to AI models prevents the general population and research community from further evaluating these AI models. Technical know-how is not enough for replication of these kinds of models. The largest barriers are the computational costs and access to appropriate data. If so desired, a well-resourced state actor could overcome these barriers. As such, important questions arise at the intersection of AI and content generation: how should we apply the precautionary principle in the field of AI research, and can we adapt our ability to detect fakes produced by increasingly sophisticated AI models?

Figure 1-3: Photographic manipulation has been a tool of authoritarian governments in nefarious attempts to subvert reality. On the top left, Joseph Stalin is standing next to Nikolai Yezhov, whom Stalin later ordered to be executed and disappeared from the photograph. On the top right, Mao Zedong is standing beside the "Gang of Four," who were arrested a month after Mao's death and subsequently erased from this historic photograph. On the bottom, Benito Mussolini strikes a heroic pose on a horse while his trainer holds the horse steady. The photographic manipulation showcases Mussolini's skill for manipulating the facts and covers up his lack of horsemanship.

As a medium, photography connects us to moments, places, and interactions that we might never dream of seeing. From an Earthrise over the Moon to a black hole in outer space to Martin Luther King Jr. in front of the Lincoln Memorial during the March on Washington for Jobs and Freedom, photographs offer us a chance to examine a moment frozen in space and time. Beyond serving as evidence and insight, photography is a medium for inducing empathy. Photography can be "used to stimulate a moral response" [62]. For example, consider the moral outrage you feel when looking at the photograph of Phan Thi Kim Phuc running naked on a road after being burned by a South Vietnamese napalm attack.

This visceral feeling of moral outrage in response to the atrocities of war is absolutely justified. However, this moral response can be artificially co-opted. As an example, consider the Time Magazine cover photo of O.J. Simpson that dramatically darkened his face and evoked a sinister perception by magazine readers. The problem is that Time Magazine intentionally took advantage of how we process images. In the words of Richard Misrach, "Every medium creates a primary illusion... the novel creates an illusion of memory; music creates the illusion of passing time; drama creates the illusion of history... photography creates the primary illusion of fact" [2]. We need to be careful not to assume fact in photography.

Chapter 2

Design via Negativa

Does this spark joy?

Marie Kondo

If we assumed that both (a) all aspects of what it means to be human are amenable to empirical study and (b) our language is comprehensive enough to explain anything and everything, then we would be able to describe the world strictly through positive statements. But, neither assumption is realistic. There is so much we do not know and cannot know. Via negativa - describing the world by what it is not - can help us navigate this uncertainty.

In quantum mechanics, the Heisenberg uncertainty principle states that the position and momentum of a subatomic particle cannot both be known simultaneously. Likewise, Gödel's incompleteness theorems show that a complete and consistent set of axioms for all of mathematics cannot exist. If we can accept the uncertainty principle or Gödel's incompleteness theorems, then we admit that uncertainty and unknowability are an intrinsic part of the universe. Furthermore, language cannot communicate everything. For example, we often describe experiences

with "you had to be there." In a 1964 case before the Supreme Court, Justice Potter Stewart explained that he could not explicitly describe obscenity. Instead, he said, "I know it when I see it." Sometimes, we do not even know something when we see it. Behavioral science is a field that studies these kinds of blind spots. One particular psychological bias is known as the endowment effect [36]. Without naming it as such, Aristotle described this bias as follows: "For most things are differently valued by those who have them and by those who wish to get them: what belongs to us, and what we give away, always seems very precious to us" [6]. In other words, we attach additional value to objects that we own simply because we own them. Over time, the endowment effect leads to material overload because we continue to accumulate things and falsely attach value to things we do not truly value. As a solution to material overload, a Netflix-series tagline turned 2019 Internet meme asks, "Does this spark joy?" If something does not spark joy, then the answer - via negativa - is to throw it out. The essence of this meme is age-old wisdom. In Revelation 3:16, in the last book of the New Testament, God sends a message to the angel of the Church of Laodicea: "So, because you are lukewarm, and neither hot nor cold, I will spit you out of my mouth." Life is not worth tepid enthusiasm. When we recognize our world drifting to the lukewarm clutter, it is our moral obligation to call it out.

2.1 Angelus Novus

Angelus Novus, a monoprint by Paul Klee, embodies a call-to-arms against the lukewarm clutter of thoughtless progress. Walter Benjamin described the monoprint as follows:

Angelus Novus shows an angel looking as though he is about to move away from something he is fixedly contemplating. His eyes are staring, his mouth is open, his wings are spread. This is how one pictures the angel of history. His face is turned toward the past. Where we perceive a chain of events, he sees one single catastrophe which keeps piling wreckage upon wreckage and hurls it in front of his feet. The angel would like to stay, awaken the dead, and make whole what has been smashed. But a storm is blowing from Paradise; it has got caught in his wings with such violence that the angel can no longer close them. The storm irresistibly propels him into the future to which his back is turned, while the pile of debris before him grows skyward. This storm is what we call progress. [11]

From Benjamin's perspective, technology is going awry. A monolithic catastrophe is brewing. While the angel is blind to the future, he can see the past as a growing pile of material overload. Angelus Novus' desire to warn the world about these visions sparked the conception of Deep Angel and led to research into past activist movements that reveal the state of reality by showing what it is not.

Figure 2-1: Angelus Novus by Paul Klee. 1920.

2.2 Détournement

In logic, the Latin phrase reductio ad absurdum names a form of counterargument that makes its case by examining the extremes to which an argument's premise leads. One way to reconsider the future is to take the present to its logical extreme and examine what you see. Détournement (French for hijacking or rerouting) is an activist technique combining reductio ad absurdum with via negativa to subvert existing power structures. In the 1950s, Guy Debord and the Situationist International saw the system of capitalism as reducing life to commodified experiences. Similar to Angelus Novus' gaze into a "single catastrophe," Debord warned humanity of the spectacle, whereby a superficial manifestation of a value system gone awry takes control of individuals' agency and rules them as passive subjects controlled by a set of commodities [19]. As a way out of the spectacle, détournement turned the dominant media culture against itself to provoke thought. In détournement's wake sprang the movements of the 1980s from which They Live and OBEY draw their subversive roots.

Before détournement, there was Dada and Surrealism. Marcel Duchamp infamously described his ready-made art as an antidote to the "retinal" art of his contemporaries that only pleased the eyes [7]. Duchamp's intention was to engage the mind. Today, Banksy's artwork continues the tradition of subverting traditional power structures by inverting their forms and functions.

Closely related to the counter-environment movements of the 20th century, Diego Velázquez (the original selfie-taker) pioneered the method of flipping art from the spectacle of the scene onto the spectacle of the seeing [24]. In Las Meninas, you are looking at a painting in which the artist is staring back at you. Las Meninas is not like other paintings. As a viewer of the painting, you are triggered to think differently. Rather than contemplate the scene, you start to think about how the artwork perceives you. Or you consider how the media consumes you rather than you consuming the media. This process of meta-thinking is a key aspect of the via negativa, and these thought processes frequently arise in mystical approaches to understanding the Divine.

2.3 Suchness

In theology, via negativa is referred to as the apophatic approach to the Divine. Apophatic theology describes God by what God is not. Its opposite, cataphatic theology, approaches God by affirmations about what God is. Zen koans are seemingly paradoxical statements or questions that are designed to help truth-seekers release themselves from self-deception. In a discussion on suchness, the Buddhist philosopher D.T. Suzuki once explained that "to be absolutely nothing is to be everything. When one is in possession of something that something will keep all other somethings from coming in." [45] The underlying idea is that the exhaustion of absence is the presence of suchness. Once we tune in, we see this suchness everywhere. In order to further explain this concept, D.T. Suzuki tells a story:

Student: Am I in possession of Buddha consciousness?

Guru: No.

Student: Well, I heard that all things are in possession of Buddha consciousness. The stones, the trees, the flowers, the birds, the animals, and all beings.

Guru: Yes, you are correct. All things are in possession of Buddha consciousness. The stones, the flowers, the bees, the birds, but not you.

Student: Why not me?

Guru: Because you're asking the question. [15]

By asking the question, the student is focusing on the knowledge of himself as separate from the universe and all things. Once he begins to live in the knowledge of himself as transcendent, he will come into being with Buddha-nature. [15] Sometimes, via negativa is most effective when it is apparent and explicit.

2.4 Negative Space in Art

What happens when people are removed from photographs? Two artists, Adrian Piper and Paul Pfeiffer, reframe what we have previously seen as an act of totalitarian media manipulation. In one of Piper's pieces from her Everything series, she creates a photographic palimpsest, which effaces two people's faces and writes "Everything will be taken away" over the effaced portion of the photograph. Her series draws from the following quote from Aleksandr Solzhenitsyn's Two Hundred Years Together: "You only have power over people so long as you don't take everything away from them. But when you've robbed a man of everything, he's no longer in your power - he's free again." Piper in her art and Solzhenitsyn in his prose suggest that once we lose everything, we become free again. In other words, absence creates presence. From another perspective, Pfeiffer's Four Horsemen of the Apocalypse series examines how the world looks when we remove particular aspects. In one piece, he removes the basketball and the other players on the court, leaving Bill Russell by himself in mid-air going for a jump ball. What would have been obscured by the presence of everything else becomes vivid and prominent. In the middle of an empty court in front of an audience of thousands, Russell appears as Christ on an invisible crucifix. What is notable here is that the absence of a few components offers a new perspective and an important question: why does removing things from a scene allow us to see something new in an old scene?

Figure 2-2: Examples of Negative Space in Paul Pfeiffer's Four Horsemen of the Apocalypse and Adrian Piper's Everything

Chapter 3

Engineering Computer Vision for Human Vision

The soul without imagination is what an observatory would be without a telescope.

Henry Ward Beecher

3.1 Machine Vision

Convolutional neural networks have surpassed human-level accuracy in a variety of object recognition and detection tasks [30,40]. Likewise, recent image generation variants of the generative adversarial network are capable of producing high-quality natural images [28,37]. If computers can successfully detect objects and generate new scenes, what are the limits of computers to re-write history by making subtle yet automatic changes to photographs? In particular, how well can computers replace objects with a plausible background given the context of the photograph? If computers are reasonably successful at imagining the scenery behind objects, then how well can computers conjure objects back into scenes? Are computers creative? What kind of metaphors does algorithmic omission create? These are the high-level questions that guide the machine learning portion of my thesis. I developed two end-to-end neural networks to address these questions: (1) targeted object removal and (2) unanchored object conjuring. The targeted object removal network is intended to detect objects and erase them from images. The unanchored object conjuring network is intended to reverse the targeted object removal and reconstruct objects in an image.

3.2 Related Work

3.2.1 Object Detection and Instance Segmentation

Over the past decade, convolutional neural networks have dramatically improved computational performance in object detection and instance segmentation [40]. Object detection and instance segmentation provide an automatic and scalable method to identify objects in images and separate the objects from the background and from each other. Today, the state-of-the-art convolutional neural network for object detection and instance segmentation is Mask R-CNN, which builds upon a series of convolutional neural networks: R-CNN, Fast R-CNN, and Faster R-CNN [30]. The neural network is a region-based network that identifies a manageable number of potential object regions and evaluates each region with a convolutional neural network. Once the network identifies the bounding box of an object, it segments the object from its local bounding box.

3.2.2 Image Inpainting

Image inpainting refers to the filling in of missing pixels in an image. The first image inpainting algorithm introduced a directional image propagation scheme to refill selected portions of images [12]. Adobe commercialized inpainting as a feature called Content-Aware Fill in Photoshop. Adobe's Content-Aware Fill is powered by the PatchMatch algorithm, which enables portions of an image to be removed and replaced with an approximate nearest-neighbor image patch [9]. PatchMatch is particularly adept at matching stationary backgrounds with uniform texture, but it often fails on non-stationary cases where objects overlap with other objects and the space could be described with a semantic representation. Previous solutions for handling non-stationary cases rely on copying patches from similar scenery in a large database of images rather than imagining a wholly new patch [29].

In the last year and a half, dilated convolutional neural networks trained with an adversarial loss function have demonstrated a dramatic improvement in image inpainting performance [33,71]. These networks leverage semantic information learned from large-scale datasets to handle stationary and non-stationary backgrounds and "imagine" missing content in the masked portion of the image. Further refinement of this architecture includes a contextual attention layer to capture non-local dependencies in images and gated convolutions to handle the inpainting of non-rectangular, freeform masks [41,70,71]. The size, variety, and quality of the datasets used for training these neural networks determine the inpainting quality. These neural networks can be trained on scenery or human faces, and they perform well on sets of images similar to those on which they were trained.

One month after the launch of Deep Angel, Towards Data Science, a Medium publication, published a blog post comparing ground truth images to three versions of inpainting: (1) human artists, (2) neural network inpainting, and (3) non-neural network inpainting [22]. In subjective quality scores of image inpainting, the unaltered image has the highest score, followed by artists [22]. The highest performing computer vision inpainting model is generative inpainting trained on the MIT Places 2 dataset, which is the model used in the Target Object Removal pipeline [22].


Figure 3-1: Comparisons of inpainting algorithms. Image graphics from Mikhail Erofeev's Image Inpainting Humans vs. AI [22]

3.3 Target Object Removal

I engineered a Target Object Removal pipeline to remove objects in images and replace those objects with a plausible background. I combine a convolutional neural network (CNN) trained to detect objects with a generative adversarial network (GAN) trained to inpaint missing pixels in an image [28,30,37,40]. Specifically, the model generates object masks with a CNN based on a RoIAlign bilinear interpolation on nearby points in the feature map [30]. RoIAlign bilinear interpolation is a technique used in Mask R-CNN for preserving the spatial locations of object instances within a convolutional neural network [30]. Interpolation is a technique for constructing new data points within a range of discrete known data points. Bilinear interpolation is a technique for constructing these data points by interpolating over two variables. If we have four points, then we can write a solution to the bilinear interpolation problem as f(x, y) = a_0 + a_1 x + a_2 y + a_3 xy. After generating object masks, the pipeline crops the object masks from the image and applies a generative inpainting architecture to fill in the object masks [33,71]. The generative inpainting architecture is based on dilated CNNs with an adversarial loss function, which allows the architecture to learn semantic information from large-scale datasets and generate missing content that makes contextual sense in the masked portion of the image. The end-to-end targeted object removal pipeline consists of three interfacing neural networks:

• Object Mask Generator (G): This network creates a segmentation mask X̂ = G(X, y) given an input image X and a target class y. In our experiments, G is initialized from a semantic segmentation network trained on the 2014 MS-COCO dataset following the Mask R-CNN algorithm [30]. The network generates masks for all object classes present in an image, and the pipeline selects only the correct masks based on the input y. This network was trained on 60 object classes.

• Generative Inpainter (I): This network creates an inpainted version Z = I(X, X̂) of the input image X and the object mask X̂. I is initialized following the DeepFill algorithm trained on the MIT Places 2 dataset [71,72].

• Local Discriminator (D): The final discriminator network takes in the inpainted image and determines the validity of the image. Following the training of a GAN discriminator, D is trained simultaneously with I, where X are images from the MIT Places 2 dataset and X̂ are the same images with randomly assigned holes, following [71,72].

For every input image and class label pair, an object mask is generated using G, which is paired with the image and input to the inpainting network I that produces the generated image. The inpainter is trained from the loss of the discriminator D, following the typical GAN pipeline, which can be understood as an adversarial training process between a generator and a discriminator. An illustration of our neural network architecture is provided in Figure 3-2.
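The interaction of these networks can be sketched as a simple composition. The function and argument names below are hypothetical stand-ins, with G and I passed in as callables rather than the actual Mask R-CNN and DeepFill models:

```python
def remove_objects(image, target_class, mask_generator, inpainter):
    """Target Object Removal pipeline sketch:
    1. G proposes masks for every object class in the image.
    2. Keep only the masks matching the requested class y.
    3. I fills the masked pixels with a plausible background."""
    masks = mask_generator(image)                       # X_hat = G(X, y)
    selected = [m for m in masks if m["label"] == target_class]
    if not selected:
        return image                                    # nothing to remove
    return inpainter(image, selected)                   # Z = I(X, X_hat)
```

Injecting the networks as callables keeps the composition testable with toy stand-ins while preserving the G-then-I ordering described above.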

Figure 3-2: End-to-end pipeline for Target Object Removal following [30,71]

3.4 Unanchored Object Conjuring

If objects can be plausibly removed from images, then it is reasonable to imagine that objects can be plausibly generated in images in which they never existed. We approached adding objects to images using image-to-image translation with conditional adversarial networks [34].¹ These neural networks learn a mapping from an input image to an output image. Based on pairs of user-submitted images as outputs and their resulting manipulations as inputs, we (Zivvy and I) trained a generative model

¹The development of Unanchored Object Conjuring and AI Spirits was a collaboration between Zivvy Epstein and Matt Groh. Zivvy and Matt jointly conceived the idea, Zivvy developed a high-quality image curation tool and curated images, and Matt wrote the script to run the neural network. Manuel Cebrian and Iyad Rahwan were executive producers.

that can partially bring back missing objects in images. The latent structure of the input images is encoded in information like edges, shape, size, texture, and color that are anchored across contexts. By applying image-to-image translation to the results of the Target Object Removal pipeline, we force the model to learn both the structural representation for removed objects and their contextual location. We call this process Unanchored Object Conjuring.

In October, we filtered all images uploaded to Deep Angel down to the 5,634 images in which people were selected to be removed. We manually filtered these images to the 1,000 best manipulations based on qualitative judgements. Then, we resized and cropped the images to 1024 x 1024. We trained on these images following the pix2pixHD image-to-image translation architecture. Figure 3-3 shows the architecture for this extended Unanchored Object Conjuring pipeline.
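The resize-and-crop preprocessing can be sketched as a pure function that computes the scale and the center-crop box (the helper name and return convention are mine; the actual resampling would be done with an image library):

```python
def center_crop_box(width, height, side=1024):
    """Scale the short edge of a width x height image to `side`,
    then center-crop to a `side` x `side` square.
    Returns (scaled_w, scaled_h, left, upper, right, lower)."""
    scale = side / min(width, height)
    scaled_w, scaled_h = round(width * scale), round(height * scale)
    left = (scaled_w - side) // 2
    upper = (scaled_h - side) // 2
    return scaled_w, scaled_h, left, upper, left + side, upper + side
```

Scaling the short edge first guarantees the crop box always fits inside the resized image.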

Figure 3-3: End-to-end pipeline for Unanchored Object Conjuring following [30,34,71]

3.5 Deep Angel

3.5.1 Interaction Design

In collaboration with lawyers, designers, and colleagues at Scalable Cooperation, I designed an interactive website called Deep Angel to make the Target Object Removal architecture publicly available.² The website is hosted at https://deepangel.media.mit.edu.

Deep Angel offers two main user interactions: (1) users can upload their own images and evaluate how the AI transformed the image and (2) users can guess which images on the website have been manipulated. Figure 3-4 contains screenshots of these two interactions.

In the first interaction, users select one of sixty objects to remove and either upload an image from their computer or select an Instagram account from which to transform the first three images. After the user submits his or her selections, Deep Angel returns both the original image and a transformation of the original image with the selected objects removed.

In the second interaction, users are presented with an image manipulated by the Target Object Removal architecture and an image from the 2014 MS-COCO dataset. Users are instructed to select the image that has something removed by Deep Angel. After the user makes a selection, Deep Angel reveals which image was manipulated and offers the user the opportunity to guess again on a new pair of images.

²We retained the Cyberlaw Clinic from the Harvard Law School and Berkman Klein Center for Internet & Society to advise and support Deep Angel. Micah Epstein and Julian Kelly designed graphics and the UI/UX framework for the website. Together, Zivvy Epstein, Manuel Cebrian, and I conceived the idea for Deep Angel. Nick Obradovich and Iyad Rahwan provided valuable insights and support throughout the process. Nick suggested the idea to include a fake detection feature on the website. I performed the machine learning engineering and backend development. I also extended Micah and Julian's frontend code for a variety of additional features.

Figure 3-4: Screenshots of Deep Angel's user interface.

3.5.2 Backend Architecture

The architecture for the Deep Angel website is diagrammed in Figure 3-5. We use NGINX, a highly stable web server, to serve a Flask application, which is a Python-based web framework. The Flask application has privileged access to an external API providing access to the Target Object Removal architecture. This API is hosted on a single Nvidia GeForce GTX Titan X GPU.

When a user uploads an image, the image is uploaded to Amazon's S3 file storage system and the S3 URL for the image is sent to the Target Object Removal API. Next, the API transforms the image, saves the manipulated images to S3, returns the S3 URLs of the manipulated images to Flask, and saves all relevant data to a relational database. Likewise, when a user selects an Instagram account, the API crawls Instagram, saves the first three images of that Instagram user to S3, and repeats the same process as when an image is uploaded.
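The upload flow can be sketched as an orchestration function with its dependencies injected, so the sequence of steps is explicit. The names here are hypothetical; the real system talks to S3, the GPU-backed API, and a relational database rather than the toy callables used below:

```python
def handle_upload(image_bytes, target_class, store, removal_api, db):
    """Sketch of Deep Angel's upload flow:
    1. store the original image in file storage,
    2. ask the removal API to disappear the target class,
    3. persist both URLs, 4. return them to the client."""
    original_url = store(image_bytes)                  # e.g. upload to S3
    manipulated_url = removal_api(original_url, target_class)
    db.append({"original": original_url,
               "manipulated": manipulated_url,
               "target": target_class})
    return original_url, manipulated_url
```

With the dependencies injected, the same function covers both the direct-upload path and the Instagram-crawl path, which differ only in how the original image bytes are obtained.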

When a user is interacting with the fake detection interface, a pair of images are randomly selected for display, users select an image, the users' selection is saved to the relational database, the correct selection is revealed to the user, and a new pair of images are randomly selected for display.

Figure 3-5: Diagram of Deep Angel's server architecture

3.5.3 Live Deployment

We publicly launched Deep Angel on August 28th, 2018 on Product Hunt, a website that curates the best new products on the Internet. 930 people upvoted Deep Angel, and it was awarded #1 Product of the Day and #3 Product of the Week on Product Hunt. The next week, Deep Angel reached the top of Hacker News. Within the first few months of launching, Deep Angel was covered by the New York Times, Le Monde, Fast Company, Digg, Artsy, Aeon, and other media outlets [1,18,46,52,58,67]. From August 2018 to April 2019, over 100,000 people from across the world visited the website. With data on all these user interactions, we can begin to explore the science of deception.

Chapter 4

Science of Deception

The amount of energy needed to refute bullshit is an order of magnitude bigger than to produce it.

Alberto Brandolini

Empirically speaking, how do people interact with Deep Angel? What kind of images do people upload? How well does the Target Object Removal pipeline work? Are all manipulations plausible? Are any? How often do image removal manipulations fool people? When do they fool people? Here we describe how people used Deep Angel, and I apply methodologies from statistics and psychophysics to understand how people adapt to media manipulations.

4.1 User Interactions

Users uploaded 16,755 unique images from mobile phones and computers. In addition, users directed the crawling of 10,866 unique images from Instagram. The most frequently selected objects for removal are displayed in Table 4.1. This table contains

             Image Uploads                    Instagram
Object        Count   Order      Object        Count   Order
Person        12293   1          Person        6606    1
Car           1195    6          Cat           697     2
Cat           1037    2          Dog           467     3
Dog           1032    3          Elephant      162     4
Elephant      175     4          Car           157     6
Bicycle       152     7          Bicycle       70      7
Bird          132     22         Sheep         51      5
Tie           113     31         Stop Sign     29      8
Airplane      100     13         Airplane      28      13
Stop Sign     90      8          Skateboard    24      10

Table 4.1: Top 10 Target Object Removal Selections for Uploaded Images and Targeted Instagram Crawls on Deep Angel. Each Instagram username selection initiated a targeted crawl of Instagram for the three most recently uploaded images of the selected user.

both the number of images from which the object was removed and the order in which the object appeared in the pull-down menu. For the image uploads and the Instagram-directed crawling, seven and nine, respectively, of the first ten objects listed in the user interface were among the ten most frequently selected objects. While it is difficult to disentangle the choice architecture created by an ordered list from users' preferences about which objects to disappear, there appears to be a high propensity for users to choose to remove people.

The overwhelming majority of images uploaded and Instagram accounts selected were unique. 88 percent of the usernames entered for targeted Instagram crawls were unique. The most frequently selected Instagram accounts were cats_of_instagram (25), kimkardashian (23), and realdonaldtrump (19).

Figure 4-1: Examples of original images uploaded to Deep Angel and corresponding manipulations.

4.2 Quality Evaluations

The evaluation of generative adversarial networks (GANs) for images is complicated. There is no single best quantitative metric for evaluating the performance of a GAN [13]. There exist at least 26 quantitative and qualitative measures for evaluating GANs trained on images [13]. These include metrics like the Inception Score and Fréchet Inception Distance, which work well on data from the ImageNet dataset but have been discredited for evaluation on other datasets [10,13,32,55,56,60,73]. In a paper that I coauthored earlier this year, we explain that the problem with evaluating a generative model on images lies in the assumption that the dataset of images is a reasonable proxy for "the family of distributions from which it was sampled" [49]. In light of the limitations of quantitative metrics, the evaluation of the quality of a GAN relies on human judgements [14,20,38,57,60].

Drawing on methods from psychophysics, the Human eYe Perceptual Evaluation (HYPE) metric offers a structured, validated method for comparing the quality of GAN-generated images [73]. I measure performance based on HYPE∞, the rate at which people accurately identify manipulated images and real images without any time limitation. Following the HYPE method and standard practice in the psychophysics literature, the Deep Angel interface highlights which image was manipulated by revealing what was disappeared from the manipulated photograph after a user guesses [73]. By examining the most frequently misidentified images, it is possible to surface extremely plausible object removal manipulations. Figure 4-1 presents two pairs of the most frequently misidentified images. However, most images uploaded by users are not plausible manipulations. Specifically, 62% of images are correctly identified as manipulated by users for over 90% of guesses. This result should not be surprising because successful manipulations with the Target Object Removal pipeline require that the image fit several conditions, e.g., the object is relatively small and the background is not too complex. Figure 4-2 presents the distribution of accurate guessing across images, which shows that plausible manipulations are very image dependent.
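Per-image detection rates of this kind can be computed with a short aggregation over the guess log; the record layout here is a hypothetical simplification:

```python
from collections import defaultdict

def per_image_accuracy(guesses):
    """Fraction of guesses that correctly identified each manipulated
    image, from records shaped like {"image": id, "correct": bool}."""
    totals, hits = defaultdict(int), defaultdict(int)
    for g in guesses:
        totals[g["image"]] += 1
        hits[g["image"]] += g["correct"]
    return {img: hits[img] / totals[img] for img in totals}
```

Images detected in over 90% of guesses are implausible manipulations; the rarely-detected tail contains the most convincing ones.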

Figure 4-2: Probability density function displaying the accuracy of guesses over images

As users are exposed to image manipulations on Deep Angel, they learn to spot the manipulations. Figure 4-3 shows the relationship between guessing accuracy and the number of images that a user has seen. 72% of users accurately identify the manipulated image on the first guess and 88% accurately identify the manipulated image on the tenth guess.


Figure 4-3: Accuracy of guesses over exposure to manipulated images with a 95% confidence interval.

I designed Deep Angel with two levels of randomization. First, each pair of images presented to a user contains a randomly selected altered and unaltered image. The altered image is randomly selected from 418 images manipulated by Deep Angel that users submitted to be shared publicly. The unaltered image is randomly selected from 4,472 images from the MS-COCO dataset. Second, once the pair is selected, I randomly assign which image appears on the left and which on the right. Nearly half of the time, the altered image appears on the right. The other half of the time, the unaltered image appears on the right. Most users interacted with this interface multiple times. Each interaction followed the same randomization design, and the images displayed to users did not depend on what the user had previously seen.
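The two-level randomization can be sketched as follows (a hypothetical function standing in for the server-side logic, not the thesis code):

```python
import random

def draw_trial(altered_pool, unaltered_pool, rng=random):
    """Two-level randomization sketch: draw one altered and one
    unaltered image uniformly at random, then randomize which side
    of the screen each appears on."""
    altered = rng.choice(altered_pool)
    unaltered = rng.choice(unaltered_pool)
    pair = [("altered", altered), ("unaltered", unaltered)]
    rng.shuffle(pair)                  # left/right assignment
    return pair                        # [(label, image), (label, image)]
```

Passing the random source in as `rng` makes the draw reproducible in tests while defaulting to the module-level generator in production.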

In this randomization scheme, randomization at the image level is equivalent to randomization of the order of images that each user sees. Based on the randomized order of images, it is possible to causally test the effect of image order on accuracy of users' guesses. I test the causal effects with the following fixed effects logistic regression:

ŷ_ij = α X_ij + β T_ij + γ T_ij^{>10} + s_i + v_j + ε_ij

where ŷ_ij is the predicted accuracy (HYPE∞) of user j on image i, X_ij represents a matrix of covariates, T_ij represents the order n (up to 10) in which image i appears to user j, T_ij^{>10} represents an indicator of whether the order n in which the image was seen was greater than 10, s_i represents the image fixed effects, v_j represents the user fixed effects, and ε_ij represents the error term. Based on a visual inspection of the data, which reveals a kink in the relationship between images seen and HYPE∞, I split the treatment variable into two separate variables at the kink point. The viral spread of Deep Angel on the Internet provided a sufficient sample size to evaluate the order-based effects. Users submitted 201,128 guesses with a mean HYPE∞ of 86%. Users who submitted guesses represent 13,600 unique IP addresses. Deep Angel did not require user sign-on, so I assume each IP address represents a single individual. The median number of guesses per user is 8, and the interquartile range of the number of guesses is 3 to 16. 6,040 users submitted guesses for at least 10 images. Each image appears as the first image an average of 29 times and as the tenth image an average of 14 times.
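Under this split, the two treatment variables can be constructed from the raw image order as follows (a sketch; the function name and the exact capping convention at the kink are my assumptions):

```python
def treatment_variables(image_order):
    """Build the two regression treatments from the order n in which
    a user saw an image: T capped at 10, plus an indicator for n > 10,
    matching the split at the kink point."""
    t = min(image_order, 10)
    t_gt10 = int(image_order > 10)
    return t, t_gt10
```

The cap keeps the linear order effect confined to the first ten guesses, while the indicator absorbs any level shift beyond the tenth.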

Results

With 180,469 observations (restricting the set of images to images uploaded before February 1st, 2019), I run a logistic regression with user and image fixed effects on the likelihood of guessing correctly and present the results in Table 4.2. Each column in Table 4.2 adds an incremental filter to offer a series of robustness checks. The first column shows all observations. The second drops all images where nothing was disappeared. The third column drops all users who submitted fewer than 10 guesses. The fourth column drops all observations where a user has already seen a particular image. The fifth column drops all images qualitatively judged as below very high quality.

The treatment variables, T_ij and T_ij^{>10}, are labeled 'Number of Images Seen (up to 10)' and 'More than 10 Images Seen,' respectively. Across all five robustness checks, I find a statistically significant relationship between T_ij and ŷ_ij, and the coefficient β on T_ij is approximately 0.01. In other words, users improve their HYPE∞ by 1 percentage point on average for each of the first 10 guesses. I find a statistically significant relationship between T_ij^{>10} and ŷ_ij in the first four regressions and do not find a statistically significant relationship in the fifth regression. For the first four regressions, the interpretation of this relationship is that the HYPE∞ on the eleventh image and beyond is 1 percentage point less than the guessing accuracy on the tenth image seen. For the fifth regression, I do not find a relationship between seeing more than 10 images and HYPE∞. This statistically significant improvement suggests that, within the context of Deep Angel, exposure to media manipulation can successfully prepare people to detect fakes. The covariates X_ij comprise the non-treatment variables in Table 4.2. The placement of the manipulated image has less than a 1% effect on the accuracy of guesses, which suggests that users were not systematically clicking the left or right image. Likewise, we do not find a relationship between whether a user views Deep Angel on a mobile phone or a computer and guessing accuracy. The height and width of an image appear to have minimal effect on the accuracy. Consistent with a qualitative inspection of the images, a number of image features help explain what makes a Deep Angel manipulation believable. The proportion of the image that is masked and then replaced is important for determining the likelihood of an accurate guess. Specifically, a 10 percentage point decrease in the masked proportion of the image is associated with a 0.4 to 1.8 percentage point decrease in the likelihood of guessing correctly.

                                    (1)          (2)          (3)          (4)          (5)
Number of Images Seen (up to 10)    0.0102***    0.0112***    0.0111***    0.0110***    0.0095**
                                    (0.0004)     (0.0004)     (0.0004)     (0.0004)     (0.0045)
More than 10 Images Seen           -0.0099***   -0.0107***   -0.0103***   -0.0115***    0.0011
                                    (0.0027)     (0.0027)     (0.0027)     (0.0028)     (0.0264)
Manipulation on Right Side         -0.0068***   -0.0067***   -0.0063***   -0.0047***   -0.0200
                                    (0.0015)     (0.0015)     (0.0015)     (0.0016)     (0.0138)
Mobile Phone                        0.0013      -0.0016      -0.0024      -0.0011       0.0102
                                    (0.0022)     (0.0021)     (0.0022)     (0.0023)     (0.0206)
Proportion Masked                   0.1764***    0.1370***    0.1392***    0.1399***    0.0414***
                                    (0.0057)     (0.0056)     (0.0057)     (0.0061)     (0.0130)
# Selected Objects Disappeared     -0.0043***   -0.0057***   -0.0059***   -0.0058***   -0.0439***
                                    (0.0004)     (0.0004)     (0.0004)     (0.0004)     (0.0083)
# of Other Objects                 -0.0017*      0.0148***    0.0151***    0.0149***   -0.0478***
                                    (0.0009)     (0.0012)     (0.0012)     (0.0013)     (0.0113)
# of Distinct Object Categories    -0.0002      -0.0229***   -0.0236***   -0.0229***    0.0642***
                                    (0.0018)     (0.0021)     (0.0021)     (0.0022)     (0.0170)
Image Height                       -0.0000      -0.0000      -0.0000      -0.0000       0.0000
                                    (0.0000)     (0.0000)     (0.0000)     (0.0000)     (0.0000)
Image Width                         0.0001***    0.0001***    0.0001***    0.0001***    0.0005***
                                    (0.0000)     (0.0000)     (0.0000)     (0.0000)     (0.0001)
Entropy                             0.0048***    0.0005      -0.0004      -0.0004       0.1287***
                                    (0.0013)     (0.0014)     (0.0014)     (0.0015)     (0.0253)
N                                   180469       166951       143375       128288       6440
Mean Accuracy on 1st Image          0.72         0.73         0.78         0.78         0.72
Mean Accuracy on 10th Image         0.88         0.91         0.91         0.91         0.81
R²                                  0.28         0.26         0.18         0.19         0.65

Table 4.2: Logistic regression results on guessing accuracy with user and image fixed effects. Standard errors in parentheses. *, **, and *** indicate statistical significance at the 90, 95, and 99 percent confidence intervals, respectively. All columns include user and image fixed effects. Column (1) shows all users; (2) drops all images where nothing was disappeared; (3) drops all users who submitted fewer than 10 guesses; (4) drops all observations where a user has already seen a particular image; (5) keeps only the images qualitatively judged as very high quality.

Table 4.2: Logistic regression results on guessing accuracy with user and image fixed effects. Standard errors in parentheses. *, **, and *** indicates statistical significance at the 90, 95, and 99 percent confidence intervals, respectively. All columns include user and image fixed effects. Column (1) shows all users (2) drops all images where nothing was disappeared (3) drops all users who submitted fewer than 10 guesses (4) drops all observations where a user has already seen a particular image (5) keeps only the images qualitatively judged as very high quality. of an accurate guess. Specifically, a 10 percentage point decrease in the masked proportion of the image is associated with a 0.4 to 1.8 percentage point decrease in the likelihood of guessing correctly.
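The learning effect in Table 4.2 can be illustrated with a toy simulation. This sketch is not the thesis's actual code: it approximates the logistic specification with a linear probability model, absorbs user fixed effects via the within (demeaning) transformation, and uses assumed variable names throughout.

```python
import numpy as np

# Toy reconstruction of the fixed-effects learning regression: accuracy
# improves ~1 percentage point per image for the first 10 images seen,
# then levels off. All parameters below are illustrative assumptions.
rng = np.random.default_rng(0)
n_users, imgs_per_user = 400, 15
user_fe = rng.normal(0.0, 0.05, n_users)  # unobserved per-user skill

rows = []
for u in range(n_users):
    for k in range(1, imgs_per_user + 1):
        t1 = min(k, 10)              # 'Number of Images Seen (up to 10)'
        t2 = 1.0 if k > 10 else 0.0  # 'More than 10 Images Seen'
        p = 0.72 + 0.01 * t1 - 0.01 * t2 + user_fe[u]
        rows.append((u, t1, t2, float(rng.random() < p)))
data = np.array(rows)

# Within transformation: demeaning each variable by user absorbs the
# user fixed effects, so no dummy columns are needed.
def demean_by_user(col):
    out = col.copy()
    for u in range(n_users):
        m = data[:, 0] == u
        out[m] -= out[m].mean()
    return out

y = demean_by_user(data[:, 3])
X = np.column_stack([demean_by_user(data[:, 1]), demean_by_user(data[:, 2])])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)  # roughly [0.01, -0.01], mirroring columns (1)-(4)
```

The recovered coefficients echo the table's interpretation: about +1 percentage point per image for the first ten images, and a roughly 1 point lower level thereafter.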

Chapter 5

Art from Absence

All that is gold does not glitter,
Not all those who wander are lost;
The old that is strong does not wither,
Deep roots are not reached by the frost.

From the ashes a fire shall be woken,
A light from the shadows shall spring;
Renewed shall be blade that was broken,
The crownless again shall be king.

J.R.R. Tolkien

5.1 The Spirit of Wonder

The reawakening of wonder is a journey. Just like Shepard Fairey's OBEY, the aesthetics of absence has no meaning.1

1 By now, the reader is expected to be fully trained in apparent contradictions and the higher order reasoning required for a proper interpretation. This footnote exists as a forewarning for people who take the referenced sentence out of context.

The meaning of the aesthetics of absence is what we create. By making salient the things that are disappearing from around us, by making us reflect on all the things that we will soon miss if we stop looking, by uncrowding our thoughts, the aesthetics of absence spurs imagination and reawakens agency. Slumps are part of life's journey. Whether it is an individual or a society, slumps happen. And un-slumping is not easily done [61]. It is possible to have full information and perfect analytical capabilities yet still be stuck in a Kobayashi Maru simulation, a no-win situation. The solution is to defy the internal logic and step out of the paradigm. The flâneur is the man (or flâneuse, the woman) who breaks the paradigm, sets wonder as a north star, and wanders. Flânerie is aimless wandering; it is full presence of mind with an absence of objective; it is doing nothing (other than walking outside) with a purpose (undefined discovery). When we least expect it, we stumble on an "Aha!" moment where our souls shout "Eureka!" Let's be clear. Wandering does not mean doing nothing. It means removing the shackles of pre-defined paths and pursuing truth and beauty for their own sake. In Oliver Wendell Holmes' poem, The Flaneur, it is flânerie that breathes light into the night. It is the love for humanity that wakes John Keats' knight-at-arms from the cold hillside upon which he once met the love of his life. While the world irrevocably changes, it is wandering that transforms the tortuous memories of lost love into a Siren singing a nostalgic tune [63]. Even the flâneur goes through slumps. In the 19th century, the flâneur was an aimless wanderer on a random walk to explore the infinitudes of cities, crowds, alleyways, solitude, and chaos [47]. As cities evolved, the dangers of vehicular traffic and the commercial beckonings of billboard advertisements and department stores broke the flâneur [47]. When the Internet first appeared, cyberflânerie, a combination


of "solitude and individuality, anonymity and opacity, mystery and ambivalence, curiosity and risk-taking," took hold [47]. But the same forces of urban economics invaded cyberspace to create hyper-optimized, attention-seeking calls to action. This broke the cyberflâneur.

5.2 The Broken Flâneur

The Broken Flâneur is a short film that was shot and produced in less than 24 hours. The short film debuted in Munich, Germany at the 2018 European Conference on Computer Vision and is hosted at the Computer Vision Art Gallery, which is an online gallery showcasing art generated with computer vision techniques.2 The special effects for the film were produced solely by applying the Target Object Removal architecture to each frame of the original footage. The Broken Flâneur asks, "What's the point of flânerie if there's nothing to see?" The film holds a mirror to the protagonist's technology-induced void with a technology-generated absence. The technology-induced void is not simple absence but rather a false sense of presence. In this short film, the void is the idea that beyond the screen there is nothing left to see. Absence offers a step back, a moment of reflection, and a pause to reconsider and reflect on presence.

5.3 AI Spirits

AI Spirits is a collection of photographs injected with the spirits of people who were disappeared by Deep Angel. The collection debuted in Montréal, Canada at the

2 The Broken Flâneur was written and directed by Zivvy Epstein and me. Manuel Cebrian and Iyad Rahwan were executive producers of the short film. May Elhazzani managed the set.


Figure 5-1: Screenshot from The Broken Flâneur short film. Watch it on YouTube at https://youtu.be/1QCFAwuIUUE

2018 Neural Information Processing Systems conference and is hosted at the AI Art Gallery, which is an online gallery showcasing art, music, and design using machine learning. The photograph collection was generated by applying image-to-image translation to pairs of images manipulated by the Target Object Removal architecture. In other words, one neural network architecture disappeared the humans and another attempted to reconstruct them.
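The two-stage pipeline (one model disappears people, another tries to bring them back) can be sketched with a toy round trip. This is only an illustration of why the reconstruction is lossy; the real stages are neural networks (Target Object Removal, then image-to-image translation), and every function name here is an assumption.

```python
import numpy as np

# Toy stand-in for the AI Spirits pipeline. "Removal" zeroes a masked
# region; "reconstruction" fills it using only the surviving context
# (here, a crude mean fill). Because the masked pixels are gone, the
# round trip cannot recover the original person exactly.
rng = np.random.default_rng(1)
image = rng.random((8, 8))

mask = np.zeros((8, 8), dtype=bool)
mask[2:6, 2:6] = True  # region where the person stood

def remove_region(img, region):
    out = img.copy()
    out[region] = 0.0  # the "disappearing" stage
    return out

def reconstruct_region(img, region):
    out = img.copy()
    out[region] = img[~region].mean()  # rebuild from context only
    return out

spirit = reconstruct_region(remove_region(image, mask), mask)
error = np.abs(spirit - image)[mask].mean()
print(f"mean reconstruction error inside mask: {error:.3f}")  # nonzero: lossy
```

The nonzero error inside the mask is the point: the reconstructed "spirit" resembles its surroundings, not the person who was erased.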

AI Spirits offers a look into an implementation of the teletransportation paradox

gone awry. The teletransportation paradox is a thought experiment about identity in teletransportation. Consider a teletransportation machine that consists of two parts: (1) a machine at one location that can scan an individual's molecular composition at the atomic level and (2) a machine at another location that can reconstruct the individual based on this scan. If a person enters the teletransportation machine and reappears on the other side, is the person who reappears the same person who entered the machine in the first place? If so, what happens if there's a "bug" in the teletransportation machine and it accidentally produces multiple copies of the person who entered? Likewise, what happens when there's a transformation in the reappearance process? In ancient Greece, a version of this paradox was discussed in the myth known as the Ship of Theseus. AI Spirits presents a lossy reconstruction of humans from photographs, leading the viewer to wonder about the reconstructed subjects' identity.

Figure 5-2: Photographs from the AI Spirits collection.

5.4 Shadows sans Substance

Shadows sans Substance is a collection of photographs of long shadows with their corresponding living, breathing counterparts removed. The collection was on display at the inauguration of the MIT Schwarzman College of Computing and is currently under submission for a planned display in the Fall of 2019. We assembled this collection by curating photographs of people taken just after dawn and just before dusk and applying the Target Object Removal architecture to each photograph to remove the people. 3 Shadows sans Substance peers into a mirror world where the ephemeral becomes permanent and life becomes ephemeral. The function of this collection is to offer an orthogonal perspective on the passage of time. In our physical reality, shadows are ephemeral occlusions of light by objects. But the reverse is true in the domain of digital images manipulated by the Target Object Removal architecture: the underlying architecture lacks any sense of association between shadows and objects. The resulting transformations highlight the mortal nature of the individual and the lasting essence of his or her actions in this world. There's a famous line attributed to Friedrich Nietzsche (likely apocryphally) that goes, "Every eternal is but a reference, but a metaphor." [15] Shadows sans Substance inverts permanence with temporality, transforming shadows into a metaphor for who we are beyond our physical bodies.

3 Joyce Feng and I curated the photographs for Shadows sans Substance using Deep Angel.


Figure 5-3: Photographs from the Shadows sans Substance collection.

Chapter 6

Conclusion

We shall not cease from exploration, and the end of all our exploring will be to arrive where we started and know the place for the first time.

T.S. Eliot

With via negativa as a guide, I probed the depths of absence to reclaim presence.

Donning my design hat, I combined perspectives from Paul Klee, Walter Benjamin, Guy Debord, Shepard Fairey, Diego Velázquez, Thomas Merton, D.T. Suzuki, Susan Sontag, Adrian Piper, Paul Pfeiffer, Joseph Campbell, Oliver Wendell Holmes, and John Keats into a generative inquiry. Rather than strictly analyze text or data, I deployed an intervention to provoke deeper thinking on the meaning of presence and the potential of AI for media manipulation.

Indeed, Deep Angel provoked. In a long-form essay in Aeon referencing Deep Angel, John Cornwell (fellow at Cambridge University) argued that "AI calls for an updated theory of imagination for future generations" that "should involve the perspectives of artists, writers, composers, and spiritual thinkers." [18] He specifically references how AI can "help us see into the future and act to change it, so as to dispel the foreboding of the angel of melancholy." [18] Here, the angel of melancholy refers to Angelus Novus. A New York Times article reflecting on AI as a dual-use technology references Deep Angel to suggest, "If machines can generate believable photos and videos, we may have to change the way we view what winds up on the internet." [46]

As an intervention, Deep Angel offers an opportunity to examine individuals' adaptability to media manipulation. By randomizing the order in which images were presented to users, this thesis is able to make a causal claim about the relationship between how many manipulated photos a user has seen and a user's ability to detect manipulations. Within this particular randomized experiment, people improve their ability to guess which image had been manipulated by 10 percentage points over the first ten images seen. This experiment suggests optimism about our ability to counteract manipulation and offers an example of how to apply psychophysics to machine behavior. [54]

As art, AI Spirits represents the gestalt of Deep Angel's inquiry into absence and return to presence. The aesthetics of absence is the process of the creative spirit. The aesthetics is an equal concern for beauty and truth within absence. Portuguese speakers describe saudade as an emotion particular to Lusophone culture that roughly translates to the feeling of the presence of absence. While often considered untranslatable, saudade has been described as "a feeling that manages to give, despite being a confrontation with what has been taken away. It is revelatory: when caught in saudade's grip, we become aware of that which is most important to us, that which makes us what we are." [4] True presence comes from the illumination of absence. So we beat on like boats against the current, designing, deploying, analyzing, and transcending the fractal of presence and absence.

Appendix A

Meta Deep Angel

Figure A-1: This just got meta. Mason Kortz, Joan Donovan, Jessica Fjeld, and Matt Groh are disappeared from their panel at 2019 SXSW titled "AI-Powered Media Manipulation and its Consequences."

Bibliography

[1] Des objets disparaissent de photos en un clic grâce à l'intelligence artificielle. [Objects disappear from photos in one click thanks to artificial intelligence.]

[2] Flying saucers in San Francisco. Whole Earth Review.

[3] Greg Allen and Taniel Chan. Artificial intelligence and national security. Belfer Center for Science and International Affairs Cambridge, MA, 2017.

[4] Michael Amoruso. Saudade, the untranslatable word for the presence of absence. Aeon.

[5] Sercan 0 Arik, Jitong Chen, Kainan Peng, Wei Ping, and Yanqi Zhou. Neural voice cloning with a few samples. arXiv preprint arXiv:1802.06006, 2018.

[6] Aristotle. Contents, pages v-v. Cambridge Texts in the History of Philosophy. Cambridge University Press, 2000.

[7] H Harvard Arnason and Elizabeth Mansfield. History of modern art: painting, sculpture, architecture, photography. Pearson, 2013.

[8] Hadar Averbuch-Elor, Daniel Cohen-Or, Johannes Kopf, and Michael F Cohen. Bringing portraits to life. ACM Transactions on Graphics (TOG), 36(6):196, 2017.

[9] Connelly Barnes, Eli Shechtman, Adam Finkelstein, and Dan B Goldman. PatchMatch: A randomized correspondence algorithm for structural image editing. ACM Transactions on Graphics (Proc. SIGGRAPH), 28(3), August 2009.

[10] Shane Barratt and Rishi Sharma. A note on the inception score. arXiv preprint arXiv:1801.01973, 2018.

[11] Walter Benjamin. Theses on the philosophy of history. 1989.

[12] Marcelo Bertalmio, Guillermo Sapiro, Vincent Caselles, and Coloma Ballester. Image inpainting. In Proceedings of the 27th Annual Conference on Computer Graphics and Interactive Techniques, SIGGRAPH '00, pages 417-424, New York, NY, USA, 2000. ACM Press/Addison-Wesley Publishing Co.

[13] Ali Borji. Pros and cons of GAN evaluation measures. CoRR, abs/1802.03446, 2018.

[14] Andrew Brock, Jeff Donahue, and Karen Simonyan. Large scale GAN training for high fidelity natural image synthesis. arXiv preprint arXiv:1809.11096, 2018.

[15] Joseph Campbell. The hero's journey: Joseph Campbell on his life and work, volume 7. New World Library, 2003.

[16] Shan Carter and Michael Nielsen. Using artificial intelligence to augment human intelligence. Distill, 2(12):e9, 2017.

[17] Robert Chesney and Danielle Keats Citron. Deep fakes: A looming challenge for privacy, democracy, and national security. 2018.

[18] John Cornwell. Imagine if we didn't fear the machines of our own making. Aeon.

[19] Guy Debord. La société du spectacle (1967). Paris: Les Éditions Gallimard, 1992.

[20] Emily L Denton, Soumith Chintala, Rob Fergus, et al. Deep generative image models using a Laplacian pyramid of adversarial networks. In Advances in neural information processing systems, pages 1486-1494, 2015.

[21] Ahmed M. Elgammal, Bingchen Liu, Mohamed Elhoseiny, and Marian Mazzone. CAN: creative adversarial networks, generating "art" by learning about styles and deviating from style norms. CoRR, abs/1706.07068, 2017.

[22] Mikhail Erofeev. Image inpainting: Humans vs. AI. https://towardsdatascience.com/image-inpainting-humans-vs-ai-48fc4bca7ecc, September 2018. Accessed: 2019-04-01.

[23] Shepard Fairey. Manifesto. Obey Giant, 5, 1990.

[24] Michel Foucault. The order of things. Routledge, 2005.

[25] David Freedberg. The power of images: Studies in the history and theory of response. University of Chicago Press, Chicago, 1989.

[26] Pablo Garrido, Levi Valgaerts, Hamid Sarmadi, Ingmar Steiner, Kiran Varanasi, Patrick Pérez, and Christian Theobalt. Vdub: Modifying face video of actors for plausible visual alignment to a dubbed audio track. In Computer Graphics Forum, volume 34, pages 193-204. Wiley Online Library, 2015.

[27] Rich Gold. The Plenitude: Creativity, Innovation, and Making Stuff. MIT Press, 2007.

[28] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in neural information processing systems, pages 2672-2680, 2014.

[29] James Hays and Alexei A Efros. Scene completion using millions of photographs. ACM Transactions on Graphics (SIGGRAPH 2007), 26(3), 2007.

[30] Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross B. Girshick. Mask R-CNN. CoRR, abs/1703.06870, 2017.

[31] Aaron Hertzmann. Can computers create art? In Arts, volume 7, page 18. Multidisciplinary Digital Publishing Institute, 2018.

[32] Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. Gans trained by a two time-scale update rule converge to a local nash equilibrium. In Advances in Neural Information Processing Systems, pages 6626-6637, 2017.

[33] Satoshi Iizuka, Edgar Simo-Serra, and Hiroshi Ishikawa. Globally and Locally Consistent Image Completion. ACM Transactions on Graphics (Proc. of SIGGRAPH 2017), 36(4), 2017.

[34] Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A. Efros. Image-to-image translation with conditional adversarial networks. CoRR, abs/1611.07004, 2016.

[35] William James. On Some of Life's Ideals. Henry Holt and Company, 1912.

[36] Daniel Kahneman, Jack L Knetsch, and Richard H Thaler. Experimental tests of the endowment effect and the coase theorem. Journal of Political Economy, 98(6):1325-1348, 1990.

[37] Tero Karras, Timo Aila, Samuli Laine, and Jaakko Lehtinen. Progressive growing of gans for improved quality, stability, and variation. arXiv preprint arXiv:1710.10196, 2017.

[38] Tero Karras, Samuli Laine, and Timo Aila. A style-based generator architecture for generative adversarial networks. arXiv preprint arXiv:1812.04948, 2018.

[39] Hyeongwoo Kim, Pablo Garrido, Ayush Tewari, Weipeng Xu, Justus Thies, Matthias Nießner, Patrick Pérez, Christian Richardt, Michael Zollhöfer, and Christian Theobalt. Deep video portraits. arXiv preprint arXiv:1805.11714, 2018.

[40] Yann LeCun, Yoshua Bengio, and Geoffrey E. Hinton. Deep learning. Nature, 521(7553):436-444, 2015.

[41] Guilin Liu, Fitsum A. Reda, Kevin J. Shih, Ting-Chun Wang, Andrew Tao, and Bryan Catanzaro. Image inpainting for irregular holes using partial convolutions. CoRR, abs/1804.07723, 2018.

[42] G.E.R. Lloyd. Aristotle: The Growth and Structure of His Thought. Cambridge University Press, 1968.

[43] John Maeda. The Bermuda Quadrilateral (2006). https://maeda.pm/2017/11/14/the-bermuda-quadrilateral-2006/, September 2006. Accessed: 2019-04-20.

[44] Marshall McLuhan. Counterblast. Rapp & Whiting Ltd., 1970.

[45] Thomas Merton. Zen and the Birds of Appetite, volume 261. New Directions Publishing, 1968.

[46] Cade Metz. Efforts to acknowledge the risks of new a.i. technology. The New York Times.

[47] Evgeny Morozov. The death of the cyberflâneur. https://www.nytimes.com/2012/02/05/opinion/sunday/the-death-of-the-cyberflaneur.html, February 2012. Accessed: 2019-04-01.

[48] Anh Nguyen, Jason Yosinski, Yoshua Bengio, Alexey Dosovitskiy, and Jeff Clune. Plug & play generative networks: Conditional iterative generation of images in latent space. CoRR, abs/1612.00005, 2016.

[49] Shayne O'Brien, Matthew Groh, and Abhimanyu Dubey. Evaluating generative adversarial networks on explicitly parameterized distributions. CoRR, abs/1812.10782, 2018.

[50] Andrew Owens, Phillip Isola, Josh H. McDermott, Antonio Torralba, Edward H. Adelson, and William T. Freeman. Visually indicated sounds. CoRR, abs/1512.08512, 2015.

[51] Neri Oxman. Age of entanglement. Journal of Design and Science, 2016.

[52] Jacqui Palumbo. AI will have the biggest impact on photography since the digital camera. Artsy.

[53] Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. Language models are unsupervised multitask learners. OpenAI Blog, 1:8, 2019.

[54] Iyad Rahwan, Manuel Cebrian, Nick Obradovich, Josh Bongard, Jean-François Bonnefon, Cynthia Breazeal, Jacob W. Crandall, Nicholas A. Christakis, Iain D. Couzin, Matthew O. Jackson, Nicholas R. Jennings, Ece Kamar, Isabel M. Kloumann, Hugo Larochelle, David Lazer, Richard McElreath, Alan Mislove, David C. Parkes, Alex 'Sandy' Pentland, Margaret E. Roberts, Azim Shariff, Joshua B. Tenenbaum, and Michael Wellman. Machine behaviour. Nature, 568(7753):477-486, 2019.

[55] Suman Ravuri, Shakir Mohamed, Mihaela Rosca, and Oriol Vinyals. Learning implicit generative models with the method of learned moments. arXiv preprint arXiv:1806.11006, 2018.

[56] Mihaela Rosca, Balaji Lakshminarayanan, David Warde-Farley, and Shakir Mohamed. Variational approaches for auto-encoding generative adversarial networks. arXiv preprint arXiv:1706.04987, 2017.

[57] Andreas Rössler, Davide Cozzolino, Luisa Verdoliva, Christian Riess, Justus Thies, and Matthias Nießner. Faceforensics: A large-scale video dataset for forgery detection in human faces. arXiv preprint arXiv:1803.09179, 2018.

[58] Steve Rousseau. This AI that erases things in images is the best thing online this week. Digg.

[59] Shunsuke Saito, Lingyu Wei, Liwen Hu, Koki Nagano, and Hao Li. Photorealistic facial texture inference using deep neural networks. CoRR, abs/1612.00523, 2016.

[60] Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, and Xi Chen. Improved techniques for training gans. In Advances in neural information processing systems, pages 2234-2242, 2016.

[61] Seuss. Oh, the places you'll go! Random House Books for Young Readers, 1990.

[62] Susan Sontag. On photography, volume 48. Macmillan, 2001.

[63] Oliver Wendell Holmes Sr. The flâneur. https://www.poetryfoundation.org/poems/44382/the-flaneur, December 1882. Accessed: 2019-04-01.

[64] Supasorn Suwajanakorn, Steven M Seitz, and Ira Kemelmacher-Shlizerman. Synthesizing obama: learning lip sync from audio. ACM Transactions on Graphics (TOG), 36(4):95, 2017.

[65] Justus Thies, Michael Zollhöfer, Marc Stamminger, Christian Theobalt, and Matthias Nießner. Face2face: Real-time face capture and reenactment of rgb videos. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2387-2395, 2016.

[66] Eric R Varner. Monumenta Graeca et Romana: Mutilation and transformation: damnatio memoriae and Roman imperial portraiture, volume 10. Brill, 2004.

[67] Mark Wilson. MIT's new tool erases anything (or anyone) from old photos. Fast Company.

[68] Clifton Wolters. The cloud of unknowing. Courier Dover Publications, 2018.

[69] W.B. Yeats. The Second Coming. The Dial, 1920.

[70] Jiahui Yu, Zhe Lin, Jimei Yang, Xiaohui Shen, Xin Lu, and Thomas S Huang. Free-form image inpainting with gated convolution. arXiv preprint arXiv:1806.03589, 2018.

[71] Jiahui Yu, Zhe Lin, Jimei Yang, Xiaohui Shen, Xin Lu, and Thomas S Huang. Generative image inpainting with contextual attention. arXiv preprint arXiv:1801.07892, 2018.

[72] Bolei Zhou, Agata Lapedriza, Aditya Khosla, Aude Oliva, and Antonio Torralba. Places: A 10 million image database for scene recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017.

[73] Sharon Zhou, Mitchell Gordon, Ranjay Krishna, Austin Narcomey, Durim Morina, and Michael S. Bernstein. HYPE: human eye perceptual evaluation of generative models. CoRR, abs/1904.01121, 2019.
