
Loopy Neural Nets: Imitating Feedback Loops in the Human Brain

Isaac Caswell, Chuanqi Shen, Lisa Wang
[email protected], [email protected], [email protected]
Stanford University, 450 Serra Mall, Stanford, CA 94305

Abstract

Artificial neural networks purport to be biomimetic, but are by definition acyclic computational graphs. As a corollary, neurons in artificial nets fire only once and have no time-dynamics. Both of these properties contrast with what neuroscience has taught us about human brain connectivity, especially with regard to object recognition. We therefore propose a way to simulate feedback loops in the brain by unrolling loopy neural networks several timesteps, and investigate the properties of these networks. We compare different variants of loops, including multiplicative composition of inputs and additive composition of inputs. We demonstrate that loopy networks outperform deep feedforward networks with the same number of parameters on the CIFAR-10 dataset, as well as nonloopy versions of the same network, and perform equally well on the MNIST dataset. In order to further understand our models, we visualize neurons in loop layers with guided backprop, demonstrating that the same filters behave increasingly nonlinearly at higher unrolling levels. Furthermore, we interpret loops as attention mechanisms, and demonstrate that the composition of the loop output with the input image produces images that look qualitatively like attention maps.

1. Introduction

Neural networks, in particular deep neural networks, comprise a nascent field that has seen much active research in recent years. A neural network is a model inspired by the brain, in which each neuron supports simple, primitive functions but an entire network of neurons can perform complicated tasks; concretely, it consists of a series of interconnected layers. Each layer of neurons computes a simple function of the previous layer, but amalgamating many layers together allows the final network to perform a range of arbitrarily complex tasks. Active research, fuelled by a huge increase in computing power, has transformed neural networks into a state-of-the-art technology. With neural networks, computers nowadays can act as personal assistants (Siri and Cortana), play games [7], and identify cats [5].

However, artificial neural networks differ from their biological counterparts in a distinct way: artificial neural networks are directed acyclic graphs (DAGs), but the network of neurons in our brains contains many feedback loops. In fact, neuroscience tells us that the architecture of the human brain is fundamentally cyclic. For instance, the well-documented "what pathway" and "where pathway", the two main visual object recognition pathways in humans, contain many feedback loops, leading for instance to the phenomenon known as top-down attention [1].

In this paper, we propose a new model, which we call loopy neural networks (LNNs). At a high level, an LNN mimics the cyclic structures in the human brain and is created by augmenting conventional neural networks with "loop layers" that allow outputs from deeper layers to be fed back to lower layers.

2. Our Model

2.1. Theoretical Model

Fig. 1 shows a simple example of our proposed model. In this example we use convolutional layers, but the same idea can be applied to any type of layer. While neural networks are in general acyclic, this model contains a loop. In particular, the output from the loop undergoes elementwise addition with the input layer before being fed as input to the first layer again.

It would be ideal if the loop continued being processed until the result converged. However, in general convergence is not guaranteed, nor would this be computationally feasible. Nor is it in fact especially biomimetic, as the input to a real brain is constantly changing. Therefore we approximate the loopy structure by simulating the execution of a small number of passes through the feedback loop.

To accomplish this, we propose a mechanism similar to that in recurrent neural networks (RNNs). The model is further defined by a parameter k (called the unroll factor), which determines the number of times the loop will be processed. k is so named because it is very similar to the unrolling mechanism in RNNs. For example, the diagram on the right in fig. 1 depicts what happens when the network on the left is unrolled 3 times.

With the addition of loops, it is hoped that LNNs improve upon vanilla neural networks in the following ways.

1. Feedback mechanism. By allowing lower-level layers to know the weights of higher-level features, a more refined choice of weights for the lower-level layers may be possible.

2. Compact representation. Even a shallow LNN can resemble a deep neural network when unrolled several times. Yet the unrolled network uses far fewer parameters than a deep neural network of the same depth. It is hoped that both networks can provide similar expressive power despite the discrepancy in the number of parameters. If this is found to be true, then LNNs can serve as a compact representation of complicated, deep models.

Training an LNN is very similar to training a vanilla neural network. After an LNN is unrolled, forward propagation can be performed in the standard manner. Backward propagation can also be done on the unrolled LNN in the standard manner. However, due to the shared parameters between layers in the unrolled network, the gradient needs to be computed differently. For example, in fig. 1, we treat each layer in each unroll as distinct and perform backward propagation as normal; however, the actual gradient of W_1 is the sum of the gradients of W_1 at each unroll. In other words,

dW_i = \sum_{j=1}^{k} dW_{i,j}

where dW_i is the gradient of layer W_i and dW_{i,j} is the gradient of W_i at the j-th unroll.

Figure 1. CIFAR-10 loopy model with loop layer. The unrolled network is shown on the right.

2.2. Architecture

We wrote a library based on Lasagne (which is based on Theano, which is based on C, which is based on assembly... a deep framework for a deep problem) that allows us to specify the layer details, loop configurations, relevant hyperparameters, etc. in a config file; see https://github.com/icaswell/loopy_CNNs. We accomplish layer duplication in the unrolled case by tying the weights among layers which correspond to the same layer in the loopy model. This is as simple as passing the same Theano shared variable to all unrolled layers.
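The tying itself can be sketched in a few lines of Theano-style code; this is a minimal illustration of the idea, not our actual library code, and the shapes are arbitrary.

```python
import numpy as np
import theano
import theano.tensor as T

x = T.vector('x')
W = theano.shared(np.random.randn(8, 8), name='W')

# Two "unrolled" copies of the same layer reuse the same shared variable,
# so their weights are tied: a single update to W updates both copies.
h1 = T.tanh(T.dot(W, x))            # first unroll
h2 = T.tanh(T.dot(W, x + h1))       # second unroll, with an additive merge
cost = T.sum(h2 ** 2)               # stand-in for a real loss

# Because W appears in every unroll, T.grad returns the sum of the
# per-unroll gradient contributions, i.e. the dW_i = sum_j dW_{i,j} rule
# from Section 2.1.
gW = T.grad(cost, W)
step = theano.function([x], [cost, gW])

c, g = step(np.random.randn(8))
```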

3. Data Sets

Since our project focuses on the design of a new neural net architecture, we decided to use two standard computer vision datasets, MNIST and CIFAR-10.

3.1. MNIST

MNIST is a dataset of handwritten digits [6]. It is considered a standard dataset in computer vision and pattern recognition. We started with this dataset since it requires less memory than datasets with color images. We used 40,000 images for training, 10,000 for validation and 10,000 for test. There are 10 classes, one for each digit. The images are in grey-scale and the size of each image is 28x28 pixels. We did not do any other processing on the MNIST images before feeding them into our neural nets.

3.2. CIFAR-10

CIFAR-10 is a subset of the Tiny Images dataset. CIFAR-10 contains 60,000 color images from ten object classes and is also considered a standard dataset in computer vision. The ten classes are airplane, automobile, bird, cat, deer, dog, frog, horse, ship and truck. The distribution of classes across the dataset is even and each image belongs to exactly one of the ten classes. Examples from the CIFAR-10 dataset can be seen in fig. 2. Since the images are in color, each image is represented in 3 color channels; the image size is 32x32 pixels, resulting in a 3x32x32 representation for each image. We used 20,000 images for training, 1,000 for validation and 1,000 for test. We did not do any other processing on the CIFAR-10 images before feeding them into our neural nets.
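For concreteness, here is a small sketch of the split described above, assuming the CIFAR-10 images and labels are already loaded as NumPy arrays X (shape N x 3 x 32 x 32) and y; no other preprocessing is applied.

```python
import numpy as np

def split_cifar(X, y, n_train=20000, n_val=1000, n_test=1000, seed=0):
    """Shuffle and split CIFAR-10 arrays into the subsets used in our experiments."""
    idx = np.random.default_rng(seed).permutation(len(X))
    tr = idx[:n_train]
    va = idx[n_train:n_train + n_val]
    te = idx[n_train + n_val:n_train + n_val + n_test]
    return (X[tr], y[tr]), (X[va], y[va]), (X[te], y[te])
```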

Figure 2. Sample images from CIFAR-10.

Figure 3. Building block of residual learning.

4. Related Work

There are models in the existing literature that also try to incorporate loops, and we derived some of our ideas from these models.

Recurrent Neural Networks. A Recurrent Neural Network (RNN) is a model that takes at each timestep an input from the external world and an input from itself from the previous timestep, summarizing all the information it has seen so far. The way that we unroll our networks leads to a very similar architecture: each copy of the network (each 'unroll') takes as input all outputs from loops from the previous unroll, as well as the output of the highest layer in the network with no upstream loop. The unrolled structure looks very much like an RNN with the same input at every timestep. There are then two main conventional differences between LNNs and RNNs: firstly, as noted, the input is static, and secondly, there may be an arbitrary number of 'hidden states' output by a cell that are inserted into different points in the computational machinery of the next cell, usually by direct elementwise composition rather than pre-processing through a fully connected layer first. These are conventional differences in that they differ in the convention of how RNNs are constructed, rather than in their definition: one does not tend to see RNNs with elementwise multiplication of hidden state with input, for instance, or internal convolutional layers, or deep processing of input variables. In summary, LNNs should be thought of as a generalization or a reformulation of RNNs with different semantics, approaching the problem of refining judgment on a static input rather than characterizing timeseries data.

ResNet: Deep Residual Learning. Deep residual nets try to address the problems that arise with very deep networks by reformulating layers as learning residual functions [3]. In their work, He et al. propose ResNet, a residual net with 152 layers, which won 1st place in all five main tracks of the ILSVRC & COCO 2015 competitions. Over the six years of ILSVRC competitions from 2010 to 2015, the winning neural net architecture always had more layers than the winning net of the previous year. This suggested that high accuracies on image recognition tasks are correlated with the depth of the neural net. However, with deeper networks the problem of degradation arises: as depth increases, the validation accuracy saturates at some point and degrades afterwards. According to He et al., this is not a problem of overfitting, since the training accuracy goes down as well. Deep residual learning addresses degradation by adding identity shortcut connections, which skip layers. Fig. 3 shows a building block of residual learning with the identity shortcut. Our loopy network, once unrolled, is similar to a network with skip connections, with the difference that the loop connections feed into layers that share parameters with previous layers. Our addition loops are similar to these identity skip connections since they do not introduce extra parameters and feed into an addition node that combines two outputs from different layers. However, unlike the layers between the loop connection in our loopy networks, the layers of residual networks do not share parameters: each weight layer in ResNet has its own individual set of parameters.

Recurrent CNN [9]. The authors tackled the scene labelling problem, i.e. providing each pixel of the image with a label that determines which class the pixel belongs to. To do so, they used a "recurrent convolutional neural network" model. The underlying CNN takes as input an image and the label predictions from the previous run (for the first run these are initialized to 0), and outputs the label scores for each context patch of the input image. The new set of label predictions is then fed into the CNN again with a version of the original image scaled to the same size as the new set of label scores. This process continues for k iterations for a predetermined k. This is very similar to a specific subset of LNNs, namely those which have a loop output composed with the input image. The parameter k is analogous to our unrolling depth, specifying in both cases how often to digest the input image. Recurrent CNN attained 80.2% pixel accuracy and 69.9% class accuracy on the Stanford Background Dataset, on par with then-state-of-the-art results.

dasNet [10]. The authors worked on the image classification problem on the CIFAR-10 dataset and employed reinforcement learning on a pretrained model. Their motivation, like ours, was specifically to imitate feedback loops in the brain. First, the authors constructed a pre-trained Maxout network [2] that could achieve an accuracy of 90.39%. Then, they used reinforcement learning to train a deterministic policy π based on the Maxout model. The policy outputs a vector a that scales each filter by a certain value (hence causing the model to focus its attention on certain filters/features). Initially a is set to all 1s (so all filters have equal importance). The input image is fed through the Maxout network and a set of statistical values o is computed based on the output of each filter. o is then fed into π to obtain a', after which the input image is fed through the Maxout network again, this time parameterized by a'. This process continues for T iterations for a predefined T, and the class scores of the final run give the desired prediction. With this new model, the authors achieved an accuracy of 91.19%. This model was created under similar motivations to ours, and its realization (multiple passes through an almost identical network) is also similar to our loop mechanism. However, while dasNet improves upon a network that already performs very well, we train models from scratch.

Aside: Loopy Belief Propagation. The loopy model is reminiscent of a cyclic probabilistic graphical model/Markov random field. For such models, a popular class of inference techniques is loopy belief propagation (e.g. [8]), in which gradient-like signals are passed from each node (neuron) to all nodes that depend on it, until convergence is reached (in the ideal case). Unrolling can be seen as a variant of loopy belief propagation with a specified order of message passing and a fixed number of iterations. Pure loopy belief propagation would have the advantage that it could be parallelized across layers and even across filters, raising the possibility that, given the right architecture, learning our models could be sped up significantly. This asynchronous, parallel approach is also more closely related to the neural reality we are trying to imitate.

5. Models

5.1. MNIST Models

We used a simple architecture with 3 layers for MNIST. All 3 layers consist of 3x3 filters, and the layers have 16, 8 and 1 filter(s) respectively (fig. 4). There is also a loop from the third layer to the first layer via an addition node. The model is unrolled 1, 2 and 3 times and the results are analyzed (note that unrolling the model once means the loop is not utilized). As simple models could already achieve high test accuracy on MNIST, we did not experiment with different models on this dataset.

Figure 4. MNIST loopy network with 3 conv layers.

5.2. CIFAR-10 Models

5.2.1 Degrees of Freedom

To compare the effectiveness of our proposed loopy network architecture with non-loopy models, we experimented with the following variations:

1. Loop vs. no loop. We want to evaluate whether adding a loop without changing the number of parameters can affect accuracies and training behaviors.

2. Unrolled loopy network vs. deep network. Adding loops might help networks learn more features and be more expressive. It could also offer a more compact model compared to a deeper network. Hence, we are interested in comparing a loopy net with its deep equivalent, i.e. a deep network with the same architecture but untied weights.

3. Loop without parameters vs. loop with parameters. Finally, we can also add parameters within the loop itself. We would like to explore how loop parameters influence the learning behavior by comparing such a network to a similar network that does not have parameters in the loop.

Based on these parameters, we designed the following four models (a configuration sketch for the Loopy model follows the list):

1. Vanilla. The Vanilla net is a simple architecture with three convolutional layers. The first two conv layers have 64 3x3 filters, followed by a conv layer with 3 3x3 filters and a fully connected layer (fig. 5). The third conv layer has exactly 3 filters to allow us to add a loop back into the first layer for the Loopy model.

2. Loopy. By extending the Vanilla model with a loop around all three convolutional layers and unrolling three times, we get the Loopy model. Note that the input has a depth of 3, hence the output from the loop must also have a depth of 3. This explains why the third conv layer has exactly 3 filters. Loopy has exactly the same number of parameters as the Vanilla network, since the loop does not introduce any new parameters. Comparing the results of Vanilla and Loopy allows us to evaluate the first variation and determine whether adding loops can improve the expressiveness or learning behavior of neural nets. Fig. 6 shows the Loopy net and its unrolled architecture.

Figure 5. CIFAR-10 vanilla network with 3 conv layers.

Figure 6. CIFAR-10 loopy network with 3 conv layers and 3 unrolls.

3. Deep. This net has a very similar architecture to Loopy, but the layers in the unrolled network do not share parameters. It has three times more parameters than Vanilla and Loopy, and has the same depth as the unrolled Loopy network. We can compare the results of Loopy and Deep to evaluate whether a loop on a shallow net can imitate the behavior of a deeper net. Fig. 7 illustrates the architecture of the deep net.

Figure 7. CIFAR-10 deep network with 9 conv layers.

4. Loopy with loop parameters. This net has one loop around the conv layers in the main stack and a conv layer in the loop itself. To keep the number of parameters the same, we reduced the number of conv layers in the main stack to two. We unrolled this model three times. Interestingly, once unrolled, this network looks almost identical to Loopy, but without the final 3-filter conv layer. That top layer can be seen as an informational bottleneck in Loopy, as it compresses the final activations to only three filters (fig. 1), and as a result we might expect Loopy with loop parameters to be more expressive. In addition, having loop parameters implies that the last conv layer before the fully-connected layer does not have to "split its purpose" between predicting labels and directing attention, since the semantics of attention directing can be handled entirely in the loop layer.
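As promised above, here is a hypothetical, illustrative config for the Loopy model in the spirit of the config files described in Section 2.2; the field names are invented and do not match the library's actual schema.

```python
# Hypothetical, illustrative config for the Loopy CIFAR-10 model;
# field names are made up and do not match the library's actual schema.
loopy_config = {
    "input_shape": (3, 32, 32),
    "stack": [
        {"name": "conv1", "type": "conv", "num_filters": 64, "filter_size": (3, 3)},
        {"name": "conv2", "type": "conv", "num_filters": 64, "filter_size": (3, 3)},
        # exactly 3 filters so the loop output matches the 3-channel input
        {"name": "conv3", "type": "conv", "num_filters": 3, "filter_size": (3, 3)},
    ],
    "loops": [
        # loop from conv3 back to conv1, merged with the input by addition;
        # "loop_layers" would hold the extra conv layer of model (4)
        {"from": "conv3", "to": "conv1", "merge": "add", "loop_layers": []},
    ],
    "unroll": 3,
    "output": {"type": "dense", "num_units": 10, "nonlinearity": "softmax"},
}
```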

6. Results and Discussion

6.1. Training Time

Due to time constraints, we trained the models over a variety of machines with different configurations. However, the ratios between the training times of the different models are consistent across all platforms used. Vanilla models have the shortest training time, while loopy and deep models have almost identical training times, taking about k times the training time of the vanilla model. This matches our expectations, as the loopy model is trained after being unrolled, and the unrolled loopy model has the same configuration as the deep model. Therefore, while LNNs can provide savings in terms of memory, with the current implementation there are no savings in terms of training time.

Figure 8. CIFAR-10 accuracy results after 24 epochs, comparing different architectures with the same number of parameters: (1) vanilla network, (2) loopy network, (3) loopy network with loop layer and addition node (+layer), (4) loopy network with loop layer and multiplication node (×layer), and (5) deep network with the same architecture as (2) but untied weights. The two loop-layer models were not trained with batch size 32.

                       vanilla   loopy    +layer   ×layer   deep
train acc (batch 5)    61.9%     72.6%    90.5%    89.9%    69.8%
train acc (batch 32)   43.7%     45.2%    -        -        35.6%
val acc   (batch 5)    46.6%     57.4%    51.8%    37.7%    56.1%
val acc   (batch 32)   47.3%     42.45%   -        -        28.3%
test acc  (batch 5)    49.7%     56.4%    53.5%    45.1%    50.2%
test acc  (batch 32)   37.9%     40.8%    -        -        34.7%

6.2. Benchmarking Results

6.2.1 MNIST

With one unroll, a validation accuracy of 97% was achieved without any hyperparameter search. Increasing the number of unrolls did not decrease this value. This at least shows that adding loops does not make the model worse. It is hard, however, to distinguish between the performances of the models due to the uniformly high accuracies.

6.2.2 CIFAR-10

Fig. 8 shows our experimental results for CIFAR-10. We can observe several interesting trends.

1. Loop vs. no loop. As expected, the loopy model performs significantly better than the vanilla model in all cases. This shows that LNNs are indeed an improvement over vanilla neural networks and give the model more expressive power.

2. Unrolled loopy network vs. deep network. The loopy model is in fact able to outperform the deep network in all cases, confirming our hope that an LNN can be as expressive as (if not more expressive than) a deep neural network of the same unrolled depth, despite the smaller number of parameters.

Contrary to our expectations, the deep model sometimes performed worse than the vanilla model. There are several possible reasons for this. Firstly, it could be due to computational constraints: we did little hyperparameter tuning, and it could be that the loopy model is more robust to perturbations in hyperparameters than the deep model. Secondly, the deep model may require more epochs to attain better results. Thirdly, a deep neural network with the same depth as an unrolled LNN can be very deep when k is large, and such a network can experience problems with vanishing or exploding gradients. Because the gradient of a layer in an LNN is the summation of the gradients of that layer in each unroll, layers in an LNN are more robust to vanishing gradients. More work is needed in this area to explore the discrepancies in the accuracy of the models.

Building off this idea, another way of viewing the success of LNNs in our experiments is from a training-efficiency perspective. In the tied-weights scenario (Loopy), each update higher in the network simultaneously updates parameters at many layers in the network. Therefore, even in the case that the gradient vanishes completely before reaching the lowest layers, they will still receive meaningful updates. In addition (pun intended), since there are multiple updates to each matrix per backwards pass, the training process may be sped up. We speculate that these may be some of the most important factors in our increased performance, and even if the idea of an LNN as we conceive it is completely superseded, this paradigm (updating parameters at a variety of depths with the higher, less corrupted error signal) may be used to train prohibitively large networks more effectively. (ResNet is an example of a similar principle.)

3. Loop without parameters vs. loop with parameters. Both models with loop layers (multiplicative composition and additive composition) performed worse than the loopy model, with lower validation and test accuracy, yet the models with a loop layer have significantly higher training accuracy. This suggests that models with loop parameters overfit much more severely than models without loop parameters, indicating that they have higher expressive capacity. One suspects that with enough hyperparameter tuning, these models could outperform the other models.

Curiously, additive composition performs significantly better than multiplicative composition, and multiplicative composition in fact has lower test accuracy than the vanilla network. We are unable to explain why this is so.

Aside: we initially used a batch size of 32, but decreased it to 5 in order to train the different models more quickly. While we expected models trained with a larger batch size to perform better (because they are more resistant to the inherent randomness of stochastic gradient descent), we obtained consistently better results with batch size 5. We speculate that this is because the function space contains many local minima that are close together; while a smaller batch size results in noisier gradients, it helps the optimizer escape these local minima.

6.3. Guided Backprop

To get a better idea of what different filters in different layers respond to, we implemented guided backpropagation [4]. The basic visualization technique entails introducing an artificial gradient at the filter of interest, wherein the gradient values for that filter are set to 1 and the values for all other filters in that layer are set to 0; this signal is then propagated back into the original image. Guided backpropagation refines this technique with the addition that all negative values in intermediate gradients are masked out, with the justification that this prevents intermediate neurons from decreasing the activation of the higher-layer unit one aims to visualize.

The results of a sample run of guided backpropagation are shown in figs. 10 and 11. These figures show the reaction of the same filter at different levels of unrolling. Importantly, the same neuron seems to react in a more refined and nonlinear fashion as the number of unrolls increases. This lends further credence to the idea that LNNs can simulate the expressive power of a deeper network.

Figure 10. Guided backpropagation on the same filter at different unrollings for an MNIST image. Note how it responds to different stimuli at different levels.

Figure 11. Guided backpropagation on the same filter at different unrollings for a CIFAR-10 image. The responses become more refined.
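The masking rule at the heart of guided backpropagation can be sketched in a few lines; this is a standalone NumPy illustration of the rule for a single ReLU layer and of the artificial seed gradient, not our actual implementation.

```python
import numpy as np

def guided_relu_backward(grad_out, relu_input):
    """Guided-backprop rule for one ReLU layer.

    Ordinary backprop passes grad_out through wherever the forward
    activation was positive; guided backprop additionally zeroes out
    the negative entries of the incoming gradient itself.
    """
    return grad_out * (relu_input > 0) * (grad_out > 0)

def seed_gradient(layer_shape, filter_index):
    """Artificial gradient used to visualize a single filter.

    All entries are zero except the chosen filter's map, which is set to 1.
    layer_shape is (num_filters, height, width).
    """
    g = np.zeros(layer_shape)
    g[filter_index] = 1.0
    return g
```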
6.4. Merge Layer Visualizations

In our experiments we had exactly one loop, which was pointwise composed directly with the input. For this reason we also had the opportunity to visualize this merge node directly. (Recall that a merge node is the pointwise addition or multiplication of a loop output with the output of some other layer, in our case the input image.) The output of the merge node is a modified version of the input, and as such can be seen as a weighting of what higher levels in the network deem to be important elements of the input image. In the language of attention models, it can be interpreted as an attention map.

What we found is that for multiplication nodes, the modified input resembled an attention map, supporting the hypothesis that the loop output learns to weight the important parts of the input. (Curiously, for addition nodes, merge layers were very close to the original image.) Fig. 9 shows images after the loop output from the first unroll has highlighted the important elements of the image, and then after the loop output from the second unroll has done the same. The higher the unroll level, the more precise the attention map.

Figure 9. Original CIFAR image (row 1), followed by its attention map after the first and then the second unrolling.
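A minimal sketch of how such a visualization can be produced from the merge node, assuming the input image and the loop output are available as NumPy arrays; the rescaling to [0, 1] is our own choice for display, not something the model does.

```python
import numpy as np

def merge_attention_map(image, loop_out, mode="mul"):
    """Visualize a merge node as an attention-style image.

    image and loop_out both have shape (3, H, W). The merge output is
    rescaled to [0, 1] per channel so it can be displayed directly.
    """
    merged = image * loop_out if mode == "mul" else image + loop_out
    lo = merged.min(axis=(1, 2), keepdims=True)
    hi = merged.max(axis=(1, 2), keepdims=True)
    return (merged - lo) / (hi - lo + 1e-8)
```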

7. Future Work and Conclusion

We have demonstrated that LNNs are promising models that not only are more robust, but also give better performance when compared to their unrolled deep counterparts. The guided backpropagation experiments also suggest that even shallow layers in an LNN can exhibit characteristics of deep layers when unrolls are performed. Nevertheless, there still exist many experimental angles left to explore, and we discuss some of them below.

1. Loop location/tightness. All LNNs we explored contain a loop from the last convolutional layer to the first convolutional layer. It would be interesting to explore how loop tightness (e.g. loops between intermediate, even adjacent, convolutional layers) affects performance. We expect loops that are too tight to have little utility and to possibly drown out higher-level features.

2. Number of loops. Currently we test models with a single loop. An LNN with multiple loops may be able to achieve better performance.

3. Optimizing training time. Even though LNNs contain far fewer parameters than their unrolled deep counterparts, the training time for both models is very similar. This is because we explicitly unroll the LNN and treat it as an unrolled deep network when performing forward and backward propagation. Given that each unroll basically performs the same computations again on a different "input", perhaps there exists a way to speed up this process. An example of such an approach would be a closer imitation of loopy belief propagation.

4. Experiments with larger models and more unrolls. These were out of our computational budget this time through, but perhaps in the next unrolling of this project we can demonstrate more expressive results...

References

[1] M. A. Goodale and A. D. Milner. Separate visual pathways for perception and action. Trends in Neurosciences, 15(1):20-25, 1992.
[2] I. J. Goodfellow, D. Warde-Farley, M. Mirza, A. Courville, and Y. Bengio. Maxout networks. CoRR, abs/1302.4389, 2013.
[3] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. arXiv preprint arXiv:1512.03385, 2015.
[4] J. T. Springenberg, A. Dosovitskiy, T. Brox, and M. Riedmiller. Striving for simplicity: The all convolutional net. ICLR, 2015.
[5] Q. Le, M. Ranzato, R. Monga, M. Devin, K. Chen, G. Corrado, J. Dean, and A. Ng. Building high-level features using large scale unsupervised learning. In International Conference on Machine Learning, 2012.
[6] Y. LeCun, C. Cortes, and C. J. Burges. The MNIST database of handwritten digits, 1998.
[7] V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. A. Riedmiller. Playing Atari with deep reinforcement learning. CoRR, abs/1312.5602, 2013.
[8] K. P. Murphy, Y. Weiss, and M. I. Jordan. Loopy belief propagation for approximate inference: An empirical study. CoRR, abs/1301.6725, 2013.
[9] P. H. O. Pinheiro and R. Collobert. Recurrent convolutional neural networks for scene parsing. CoRR, abs/1306.2795, 2013.
[10] M. Stollenga, J. Masci, F. J. Gomez, and J. Schmidhuber. Deep networks with internal selective attention through feedback connections. CoRR, abs/1407.3068, 2014.
