A reprint from American Scientist, the magazine of Sigma Xi, The Scientific Research Society

This reprint is provided for personal and noncommercial use. For any other use, please send a request to Brian Hayes by electronic mail to [email protected].

Computing Science

Computer Vision and Computer Hallucinations

A peek inside an artificial neural network reveals some pretty freaky images.

Brian Hayes

Brian Hayes is senior writer for American Scientist. Additional material related to the Computing Science column can be found online at http://bit-player.org. E-mail: [email protected]

People have an amazing knack for image recognition. We can riffle through a stack of pictures and almost instantly label each one: dog, birthday cake, bicycle, teapot. What we can't do is explain how we perform this feat. When you see a rose, certain neurons in your brain's visual cortex light up with activity; a tulip stimulates a different set of cells. What distinguishing features of the two flowers determine this response? Experiments that might answer such questions are hard to carry out in the living brain.

What about studying image recognition in an artificial brain? Computers have lately become quite good at classifying images—so good that expert human classifiers have to work hard to match their performance. Because these computer systems are products of human design, it seems we should be able to say exactly how they work. But no: It turns out computational vision systems are almost as inscrutable as biological ones. They are "deep neural networks," modeled on structures in the brain, and their expertise is not preprogrammed but rather learned from examples. What they "know" about images is stored in huge tables of numeric coefficients, which defy direct human comprehension.

In the past year or two, however, neural nets have begun to yield up a few fleeting glimpses of what's going on inside. One set of clues comes from images specially designed to fool the networks, much as optical illusions fool the biological eye and brain. Another approach runs the neural network in reverse; instead of giving it an image as input and asking for a concept as output, we specify a concept and the network generates a corresponding image. A related technique called deep dreaming burst on the scene last spring following a blog post from Google Research. Deep dreaming transforms and embellishes an image with motifs the network has learned to recognize. A mountaintop becomes a bird's beak, a button morphs into an eye, landscapes teem with turtle-dogs, fish-lizards, and other chimeric creatures. These fanciful, grotesque images have become an Internet sensation, but they can also serve as a mirror on the computational mind, however weirdly distorted.

Learning to See

The neurons of an artificial neural network are simple signal-processing units. Thousands or millions of them are arranged in layers, with signals flowing from one layer to the next. A neural network for classifying images has an input layer at the bottom with one neuron for each pixel (or three neurons per pixel for color images). At the top of the stack is a layer with one output neuron for each possible category of image. Between the input and output layers are "hidden" layers, where features that distinguish one class from another are somehow extracted and stored.

A newly constructed neural network is a blank slate; before it can recognize anything, it must be trained. An image is presented to the input layer, and the network proposes a label. If the choice is incorrect, an error signal propagates backward through the layers, reducing the activation of the wrongly chosen output neuron. The training process does not alter the wiring diagram of the network or the internal operations of the individual neurons. Instead, it adjusts the weight, or strength, of the connections between one neuron and the next. The discovery of an efficient "backpropagation" algorithm, which quickly identifies the weights that most need adjusting, was the key to making neural networks a practical tool.

Early neural networks had just one hidden layer, because deeper networks were too difficult to train. In the past 10 years this problem has been overcome by a combination of algorithmic innovation, faster hardware, and larger training sets. Networks with more than a dozen layers are now commonplace. Some networks are fully connected: Every neuron in a layer receives input from every neuron in the layer below. The new image-recognition networks are built on a different plan. In most of the layers each neuron receives inputs from only a small region of the layer below—perhaps a 3×3 or 5×5 square. All of these patches share the same set of weights, and so they detect the same motifs, regardless of position in the image plane. The result of applying such position-independent filters is known as convolution, and image-processing systems built in this way are called convolutional neural networks, or convnets.

The convnet architecture creates a natural hierarchy of image structures. In the lower layers of the network each neuron sees a neighborhood of only a few pixels, but as information propagates upward it diffuses over wider areas. Thus small-scale features (eyes, nose, mouth) later become elements of a coherent whole (a face).
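That layered, patch-based design is compact enough to express in a few lines of code. Here is a minimal sketch in Python using the PyTorch library (my choice for illustration; the networks described in this column were built with other tools). It stacks two convolutional layers with small shared-weight filters beneath a fully connected output layer, then runs one training step of the kind just described:

```python
import torch
import torch.nn as nn

# A toy convnet. Each neuron in a convolutional layer sees only a small
# (5x5 or 3x3) patch of the layer below, and all patches in a layer share
# one set of weights -- the position-independent filter described above.
class TinyConvNet(nn.Module):
    def __init__(self, num_classes=1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, padding=2),   # 5x5 patches
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),  # 3x3 patches
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # One output neuron per possible image category.
        self.classifier = nn.Linear(32 * 56 * 56, num_classes)

    def forward(self, x):              # x: a batch of 3x224x224 color images
        return self.classifier(self.features(x).flatten(1))

net = TinyConvNet()

# One training step: present an image, compare the network's proposed
# label with the correct one, and let the error signal propagate backward,
# adjusting connection weights but never the wiring diagram itself.
optimizer = torch.optim.SGD(net.parameters(), lr=0.01)
image, label = torch.randn(1, 3, 224, 224), torch.tensor([3])
loss = nn.functional.cross_entropy(net(image), label)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```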

[Figure: The process known as deep dreaming transforms a photograph of peculiar landforms—conical sandstone "hoodoos" in northern New Mexico—into a far stranger collage of animal forms, faces, architectural fantasies and abstract patterns. The algorithm probes the content of an artificial neural network, accentuating various motifs that the network has learned to "look for" in images. Many of the embellishments seem to arise from local features of the image. A dark patch becomes a dog's eye or nose, and the rest of the animal grows from that nucleus. But there are also intriguing global transformations. Note how parts of the steep terrain have become a gently sloping plane seen in perspective.]

An annual contest called the ImageNet Large Scale Visual Recognition Challenge has become a benchmark for progress in computer vision. Contestants are given a training set of 1.2 million images sorted into 1,000 categories. Then the trained programs must classify another 100,000 images, trying to match the labels suggested by human viewers. Some of the categories are fairly broad (restaurant, barn), others much more specific (Welsh springer spaniel, steel arch bridge).

For the past three years the contest has been dominated by convnets. The 2014 winner was a system called GoogLeNet, developed by Christian Szegedy of Google and eight colleagues. The network is a 22-layer convnet with some 60 million parameters to be adjusted during training.

Seeing in Reverse

When a convnet learns to recognize a Welsh springer spaniel, what exactly has it learned? If a person performs the same task, we say that he or she has acquired a concept, or mental model, of what the dog breed looks like. Perhaps the same kind of model is encoded in the connection weights of GoogLeNet, but where should you look for it among those 60 million parameters?

One promising trick for sifting through the network's knowledge is to reverse the layer-to-layer flow of information. Among the groups exploring this idea are Andrea Vedaldi and Andrew Zisserman of the University of Oxford and their colleagues. Given a specific target neuron in the upper layers of the network, they ask what input image would maximize the target neuron's level of activation. A variation of the backpropagation algorithm can answer this question, producing an image that in some sense embodies the network's vision of a flower or an automobile. (You might try the same exercise for yourself. When you summon to mind a category such as measuring cup, what images flash before your eyes?)
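In code, this reversal amounts to gradient ascent on the pixels rather than gradient descent on the weights. The sketch below is my own illustration in PyTorch, not the Oxford group's software; it also omits the smoothness penalties their method needs to yield recognizable pictures:

```python
import torch
from torchvision import models

# Gradient ascent on the input: find an image that maximizes the
# activation of one target output neuron. (Class index 285 is an
# arbitrary ImageNet category, chosen only for illustration.)
model = models.googlenet(weights="DEFAULT").eval()
target_class = 285

image = torch.randn(1, 3, 224, 224, requires_grad=True)
optimizer = torch.optim.Adam([image], lr=0.05)

for step in range(200):
    optimizer.zero_grad()
    activation = model(image)[0, target_class]
    (-activation).backward()     # maximize by descending the negation
    optimizer.step()

# `image` now embodies, however murkily, the network's vision of the class.
```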
The reversal process can never be complete and unambiguous. Classification is a many-to-one mapping, which means the inverse mapping is one-to-many. Each class concept represents a potentially infinite collection of input images. Moreover, the network does not retain all of the pixels for any of these images, and so it cannot show us representative examples. As members of the Oxford group write, "the network captures just a sketch of the objects." All we can hope to recover is a murky and incomplete collage of features that the convnet found to be useful in classification. The dalmatian image has black and white spots, and the lemon image includes globular yellow objects, but many other details are missing or indecipherable.

Learning from Failure

Quite a lot of what's known about human cognitive abilities comes from studies of mental malfunctions, including the effects of injury and disease as well as more mundane events such as verbal errors and misinterpreted images. Two intriguing recent results apply this idea to image recognition in convnets.

A group led by Szegedy (the developer of GoogLeNet) harnessed an optimization algorithm to find "adversarial" images, specially crafted to fool a convnet classifier. Start with an image that the network correctly recognizes as a school bus, change a few pixels—changes so slight they are imperceptible to the human eye—and the network now assigns the image to another class.

Anh Nguyen of the University of Wyoming, with Jason Yosinski and Jeff Clune, has performed a complementary experiment. They generated images that look to the human observer like pure noise, yet the network recognizes them with high confidence as a cheetah or a centipede.

These findings raise questions about the reliability and robustness of neural network methods, but those concerns should not be overblown. It is not the case that any small random change to an image is likely to mislead the classifier. As a matter of fact, convnets perform well even with heavy doses of random noise. The adversarial examples are so rare they will almost never be encountered by chance, yet their existence indicates that the network's training leaves "wormholes" where two distant regions of the image space are brought together.
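A simplified version of the idea can be written in a few lines. The sketch below is illustrative only: it takes a plain gradient-sign step toward a wrong label (a shortcut popularized in later work) rather than the box-constrained optimization Szegedy's group actually used:

```python
import torch
import torch.nn.functional as F
from torchvision import models

model = models.googlenet(weights="DEFAULT").eval()

def adversarial(image, wrong_label, epsilon=0.005, steps=10):
    """Nudge a correctly classified image toward `wrong_label` by
    following the loss gradient in pixel space. Each step is capped
    at `epsilon`, so the total perturbation stays imperceptibly small."""
    adv = image.clone().requires_grad_(True)
    target = torch.tensor([wrong_label])
    for _ in range(steps):
        loss = F.cross_entropy(model(adv), target)
        grad, = torch.autograd.grad(loss, adv)
        # Step *down* the loss for the wrong label.
        adv = (adv - epsilon * grad.sign()).detach().requires_grad_(True)
    return adv.detach()
```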
"We Need to Go Deeper"

In June of this year an article posted on the Google Research Blog suddenly brought the mysteries of deep neural networks to the attention of a much wider audience. The post was accompanied by a gallery of outlandish but strangely engaging images that attracted interest not just from the computer vision community but also from artists, cognitive scientists, and the press and public. This new genre of graphic works was given the name inceptionism, alluding to a line in the science fiction film Inception: "We need to go deeper." A follow-up blog post introduced the term deep dreaming, which has caught on.

The algorithm behind the dream images was devised by Alexander Mordvintsev, a Google software engineer in Zurich. In the blog posts he was joined by two coauthors: Mike Tyka, a biochemist, artist, and Google software engineer in Seattle; and Christopher Olah of Toronto, a software engineering intern at Google.

Here's a recipe for deep dreaming. Start by choosing a source image and a target layer within the neural network. Present the image to the network's input layer, and allow the recognition process to proceed normally until it reaches the target layer. Then, starting at the target layer, apply the backpropagation algorithm that corrects errors during the training process. However, instead of adjusting connection weights to improve the accuracy of the network's response, adjust the source image to increase the amplitude of the response in the target layer. This forward-backward cycle is then repeated a number of times, and at intervals the image is resampled to increase the number of pixels.
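Translated into code, the recipe looks roughly like the sketch below. This is a bare-bones paraphrase in PyTorch, not the authors' Caffe-based program; the choice of `inception4c` as the target layer and the step sizes are my own, and niceties such as pixel jitter and color normalization are omitted:

```python
import torch
import torch.nn.functional as F
from torchvision import models

model = models.googlenet(weights="DEFAULT").eval()
activations = {}
# Capture the response of a middle layer during the forward pass.
model.inception4c.register_forward_hook(
    lambda module, inputs, output: activations.update(value=output))

def dream(image, steps=20, lr=0.02):
    """Repeat the forward-backward cycle, adjusting the source image to
    increase the amplitude of the target layer's response."""
    image = image.clone().requires_grad_(True)
    for _ in range(steps):
        model(image)                                # forward pass
        loss = activations["value"].pow(2).mean()   # response amplitude
        loss.backward()                             # backward to the pixels
        with torch.no_grad():
            image += lr * image.grad / (image.grad.abs().mean() + 1e-8)
            image.grad.zero_()
    return image.detach()

# At intervals, resample the image to a larger number of pixels ("octaves").
img = torch.rand(1, 3, 128, 128)
for octave in range(3):
    img = dream(img)
    img = F.interpolate(img, scale_factor=1.4, mode="bilinear")
```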

As the iterations continue, ghostly patterns emerge from the image, faintly at first and then more distinctly. A dark smudge becomes a dog's nose, a wrinkled bit of cloth turns into a spider web, lighthouses and windmills sprout from the empty blue sky. The process is self-reinforcing. A neural network has within it a huge jumble of image elements drawn from the training set, many of which can be matched to random fragments of the source image. The network acts a bit like Hamlet feigning madness, when he looks at a cloud and sees first a camel, then a weasel, then a whale.

[Figure: Painterly effects decorate another landscape photograph given the deep-dreaming treatment. Abstract patterns that resemble contour lines, shaded relief maps, embossing, and brushstrokes are prominent in the earliest layers of the neural network (although more pictorial animal forms begin to emerge in the two bottom panels). Some of the patterns are similar to motifs found in the mammalian visual cortex; some have been likened to hallucinations induced by psychoactive drugs.]

In an e-mail exchange I asked Mordvintsev, Tyka, and Olah how they came to invent their technique. I was surprised to learn that the original goal was solving a routine graphics problem: preventing loss of detail when enlarging an image. "We expected that maximizing the magnitude of current internal activations of the [convnet] on random patches of a slightly blurry image would add some of the missing details. Turned out it did."

A few weeks after their first blog post, Mordvintsev, Tyka, and Olah published their deep dream program, making it free for anyone to download and run. Others immediately began experimenting with the code, and several websites now offer deep dreaming as a service. One company has packaged the code with a point-and-click interface for $15 (but it's not as versatile as the original).

The deep dream program itself is only about 100 lines of code, written in the Python programming language, but it relies on several other large frameworks and libraries, including a few that must be compiled from source code. If all goes well, installing the software takes a few hours. It did not go well for me on my first try, or my second. I finally succeeded by starting fresh with a blank disk drive.

Dream and Hallucination

"Dreaming," in my view, is not quite the right metaphor for this process. Dreams are what the mind conjures when the perceptual apparatus is shut down; here the visual system is hyperactive, and what it generates are hallucinations. In these images we witness a neural network struggling to make sense of the world. The training process has implanted expectations about how pieces of reality should fit together, and the network fills in the blanks accordingly. Photographs and other "natural" images—all those that might conceivably represent a three-dimensional scene on planet Earth—form a minute subset of all possible arrays of colored pixels. The network can only construct images consistent with this very special sample.

Many aspects of the images suggest a focus on purely local transformations. Faces, whether human or animal, generally have the proper complement of eyes, nose, and mouth, but the face may well be mounted on the wrong kind of body. Also, neural networks apparently can't count. Dogs are not limited to just four legs, or just one head. Yet there are also some global constraints that seem to be enforced throughout the image frame. However many legs an animal has, they all reach the ground. Objects of all kinds stand upright and rest upon a surface. The system can even create such a surface if necessary, turning a vertical wall into a "ground plane" seen in perspective. In some cases there's a rough sense of scale consistent with the perspective view: Big dog down front, tiny building on the horizon.

The most flamboyant dream images come from layers near the middle of the convnet stack, but the results from lower layers are also interesting, both aesthetically and for what they reveal about perceptual mechanisms. In the mammalian visual cortex some of the earliest stages of processing detect edges in various orientations, gradients, and other simple high-contrast forms such as center-surround patterns. It's fascinating to see that similar motifs turn up in the early layers of a convolutional neural network. And they were not put there by the programmer; they emerged from the network's own geometric analysis of the training set.

One could dismiss the deep dream technique as an overengineered contrivance for making funny-looking pictures, like an Instagram filter run amok. And indeed the fad may fade away as quickly as it came. So far, the methodology is documented only in source code and blog posts; if there is more scholarly work under way, it has not yet been published. Will anything of substance ever come out of this line of inquiry?

I don't know, but I have some questions I would like to see answered. In particular, why are certain kinds of content so heavily overrepresented in the dream images? The abundance of canines may reflect biases in the ImageNet database (120 of the 1,000 categories are dog breeds). Birds, spiders, ornate buildings, lanterns, and gazebos are also frequent, and eyes are everywhere. But where are the cats? All of these images were downloaded from the Web, which is supposed to be full of cats!

I would also like to know which geometric elements in the substrate image are most likely to be embellished. I thought I might approach this question by looking at the program's action on simple textures, such as a photograph of beach pebbles. It turns out that such planar patterns don't evoke much; the network seems to need 3D structure to stimulate the creative urge.

The freaky menagerie of deep dream images is both entertaining and distracting. I think it important to keep in mind that the underlying technology was designed not to generate these weird images but to recognize and classify ordinary ones. Furthermore, the program does that job quite well. The two-headed dogs and the sky spiders are evidently part of that process. The task now is to understand why.

Bibliography

Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. 2012. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, vol. 25.

Mahendran, Aravindh, and Andrea Vedaldi. 2014. Understanding deep image representations by inverting them. http://arxiv.org/abs/1412.0035.

Mordvintsev, Alexander, Michael Tyka, and Christopher Olah. 2015. Inceptionism: Going deeper into neural networks. Google Research Blog, http://googleresearch.blogspot.ch/2015/06/inceptionism-going-deeper-into-neural.html. See also DeepDream—a code example for visualizing neural networks, Google Research Blog, http://googleresearch.blogspot.com/2015/07/deepdream-code-example-for-visualizing.html; and the Deepdream code repository, https://github.com/google/deepdream.

Nguyen, Anh, Jason Yosinski, and Jeff Clune. 2015. Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In Computer Vision and Pattern Recognition 2015.

Russakovsky, Olga, et al. 2015. ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision, DOI:10.1007/s11263-015-0816-y.

Simonyan, Karen, Andrea Vedaldi, and Andrew Zisserman. 2014. Deep inside convolutional networks: Visualising image classification models and saliency maps. http://arxiv.org/abs/1312.6034.

Szegedy, Christian, et al. 2014. Intriguing properties of neural networks. http://arxiv.org/abs/1312.6199.

Yosinski, Jason, et al. 2015. Understanding neural networks through deep visualization. 31st International Conference on Machine Learning. http://www.evolvingai.org/files/2015_Yosinski_ICML.pdf.

A note to my readers

It has been my privilege to write the Computing Science column since 1993. This is my 125th column, and it will be my last. I thank my patient editors. I thank the many scientists and mathematicians who have generously shared their work, and guided mine. And I thank the readers of American Scientist—by far the most thoughtful and responsive audience I have ever had. To answer some questions that often go unspoken: I have not been fired, and I am not retiring. On my agenda is learning more math, doing more computing, and writing all about it. If you would like to follow my further adventures, please stay tuned to http://bit-player.org.