Going in circles is the way forward: the role of recurrence in visual inference
Ruben S van Bergen (1) and Nikolaus Kriegeskorte (1,2,3,4)

Addresses
1. Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, United States
2. Department of Psychology, Columbia University, New York, NY, United States
3. Department of Neuroscience, Columbia University, New York, NY, United States
4. Affiliated member, Electrical Engineering, Columbia University, New York, NY, United States

Current Opinion in Neurobiology 2020, 65:176–193
This review comes from a themed issue on Whole-brain interactions between neural circuits, edited by Karel Svoboda and Laurence Abbott.
https://doi.org/10.1016/j.conb.2020.11.009
0959-4388/© 2020 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

Abstract
Biological visual systems exhibit abundant recurrent connectivity. State-of-the-art neural network models for visual recognition, by contrast, rely heavily or exclusively on feedforward computation. Any finite-time recurrent neural network (RNN) can be unrolled along time to yield an equivalent feedforward neural network (FNN). This important insight suggests that computational neuroscientists may not need to engage recurrent computation, and that computer-vision engineers may be limiting themselves to a special case of FNN if they build recurrent models. Here we argue, to the contrary, that FNNs are a special case of RNNs, and that computational neuroscientists and engineers should engage recurrence to understand how brains and machines can (1) achieve greater and more flexible computational depth, (2) compress complex computations into limited hardware, (3) integrate priors and priorities into visual inference through expectation and attention, (4) exploit sequential dependencies in their data for better inference and prediction, and (5) leverage the power of iterative computation.

Introduction
The primate visual cortex uses a recurrent algorithm to process sensory input [1–3]. Anatomically, connectivity is cyclic. Neurons are connected in cycles within local cortical circuits [4–6]. Global inter-area connections are dense and mostly bidirectional [7–9]. Physiologically, the dynamics of neural responses bear temporal signatures indicative of recurrent processing [10,1,11]. Behaviorally, visual perception can be disturbed by carefully timed interventions that coincide with the arrival of re-entrant information to a visual area [12–15]. The evidence for recurrent computation in the primate brain, thus, is unequivocal. What is less obvious, however, is why the brain uses a recurrent algorithm.

This question has recently been brought into sharper focus by the successes of deep feedforward neural network models (FNNs) [16,17]. These models now match or exceed human performance on certain visual tasks [18–20], and better predict primate recognition behavior [21,22,23] and neural activity [24–29] than current alternative models.

Although computer vision and computational neuroscience both have a long history of recurrent models [30–33], feedforward models have earned a dominant status in both fields. How should we account for this discrepancy between brains and models?

One answer is that the discrepancy reflects the fact that brains and computer-vision systems operate on different hardware and under different constraints on space, time, and energy. Perhaps we have come to a point at which the two fields must go their separate ways. However, this answer is unsatisfying. Computational neuroscience must still find out how visual inference works in brains. And although engineers face quantitatively different constraints when building computer-vision systems, they, too, must care about the spatial, temporal, and energetic limitations their models must operate under when deployed in, for example, a smartphone. Moreover, as long as neural network models continue to dominate computer vision, more efficient hardware implementations are likely to be more similar to biological neural networks than current implementations using conventional processors and graphics processing units (GPUs).

A second explanation for the discrepancy is that the abundance of recurrent connections in cortex belies a superficial role in neural computation. Perhaps the core computations can be performed by a feedforward network [34], while recurrent processing serves more auxiliary and modulatory functions, such as divisive normalization [35] and attention [36–39]. This perspective is convenient because it enables us to hold on to the feedforward model in our minds. The auxiliary and modulatory functions let us acknowledge recurrence without fundamentally changing the way we envision the algorithm of recognition.

However, there is a third and more exciting explanation for the discrepancy between recurrent brains and feedforward models: although feedforward computation is powerful, a recurrent algorithm provides a fundamentally superior solution to the problem of visual inference, and this algorithm is implemented in primate visual cortex. This recurrent algorithm explains how primate vision can be so efficient in terms of space, time, energy, and data, while being so rich and robust in terms of the inferences and their generalization to novel environments.

In this review, we argue for the latter possibility, discussing a range of potential computational functions of recurrence and citing the evidence suggesting that the primate brain employs them. We aim to distinguish established from more speculative, and superficial from more profound forms of recurrence, so as to clarify the most exciting directions for future research that will close the gap between models and brains.

Unrolling a recurrent network
What exactly do we mean when we say that a neural network, whether biological or artificial, is recurrent rather than feedforward? This may seem obvious, but it turns out that the distinction can easily be blurred. Consider the simple network in Figure 1a. It consists of three processing stages, arranged hierarchically, which we will refer to as areas, by analogy to cortex. Each area contains a number of neurons (real or artificial) that apply fixed operations to their input. Visual input enters in the first area, where it undergoes some transformation, the result of which is passed as input to the second area, and so forth. Information travels exclusively in one direction (the ‘forward’ direction, from input to output), and so this is an example of a feedforward architecture. Notably, the number of transformations between input and output is fixed, and equal to the number of areas in the network.
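To make the fixed depth of this computation concrete, here is a minimal sketch in Python with NumPy. The three-area hierarchy follows the description of Figure 1a, but the layer sizes, random weights, and ReLU nonlinearity are our own illustrative assumptions, not specifics from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Three 'areas', each applying a fixed transformation to its input.
# The weight values are arbitrary; only the architecture matters here.
W1 = rng.standard_normal((64, 128))  # area 1: 128-dim image -> 64 units
W2 = rng.standard_normal((32, 64))   # area 2: 64 units      -> 32 units
W3 = rng.standard_normal((10, 32))   # area 3: 32 units      -> 10 outputs

def relu(x):
    return np.maximum(x, 0.0)

def feedforward(image):
    """One pass through the hierarchy. The depth of computation is
    fixed by the architecture: exactly three transformations, one
    per area, regardless of the input."""
    a1 = relu(W1 @ image)
    a2 = relu(W2 @ a1)
    return W3 @ a2

output = feedforward(rng.standard_normal(128))
print(output.shape)  # (10,)
```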
Now compare this to the architecture in Figure 1b. Here, we have added lateral and feedback connections to the network. Lateral connections allow the output of an area to be fed back into the same area, to influence its computations in the next processing step. Feedback connections allow the output of an area to influence the computations of areas earlier in the hierarchy. In such a network, computation unfolds over time: at each step, every area transforms the inputs it received in the previous step. We can visualize this process by unrolling the network along time (Figure 1c), drawing a separate copy of each area and its connections for every time step, a notion we will return to later. Notice how this temporally unrolled, small network resembles a larger feedforward neural network with more connections and areas between its input and output. We can emphasize this recurrent-feedforward equivalence by interpreting the computational graph over time as a spatial architecture, and visually arranging the induced areas and connections in a linear spatial sequence, an operation we call unrolling in space (Figure 1d). This results in a deep feedforward architecture with many skip connections between areas that are separated by more than one level in this new hierarchy, and with many connections that are exact copies of one another (sharing identical connection weights).

Thus, any finite-time RNN can be transformed into an equivalent FNN. But this should not be taken to mean that RNNs are a special case of FNNs. In fact, FNNs are a special case of finite-time RNNs (Figure 2a), comprising those which happen to have no cycles. More practically, not every unrolled finite-time RNN is a realistic FNN (Figure 2b). By realistic networks, we mean networks that conform to the real-world constraints the system must operate under. For computational neuroscience, a realistic network is one that fits in the brain of the animal and does not require a deeper network architecture or more processing steps than the animal can accommodate. For computer vision, a realistic network is one that can be trained and deployed on the hardware available at the training and deployment stages. For example, there may be limits on the storage and energy available, which would limit the complexity of the architecture and computational graph. A realistic finite-time RNN, when unrolled, can yield an unworkably deep FNN. Although the most widely used current method for training RNNs (backpropagation through time) requires unrolling, an RNN is not equivalent to its unrolled FNN twin at the stage of real-world deployment: the RNN’s recurrent connections need not be physically duplicated, but can be reused across cycles of computation.
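This recurrent-feedforward relationship, and the asymmetry in hardware cost, can be made concrete in a few lines of code. The sketch below is our own minimal illustration (a two-area network; the dimensions, weight scaling, and update rule are assumptions chosen for the example, not taken from the paper): it runs a small RNN for T cycles, re-expresses the same computation as a feedforward pass through T physical copies of the weights, and verifies that the outputs agree.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 16  # units per area (illustrative)

# One copy of each connection matrix, scaled for stable dynamics.
W_ff  = rng.standard_normal((n, n)) / np.sqrt(n)  # feedforward: area 1 -> area 2
W_lat = rng.standard_normal((n, n)) / np.sqrt(n)  # lateral:     area 2 -> area 2
W_fb  = rng.standard_normal((n, n)) / np.sqrt(n)  # feedback:    area 2 -> area 1

def relu(x):
    return np.maximum(x, 0.0)

def run_rnn(x, T):
    """Recurrent network: the same three matrices are reused on every
    cycle of computation; the input x is present throughout."""
    a1 = a2 = np.zeros(n)
    for _ in range(T):
        a1, a2 = relu(x + W_fb @ a2), relu(W_ff @ a1 + W_lat @ a2)
    return a2

def run_unrolled_fnn(x, T):
    """Time-unrolled twin: one 'layer' per cycle. A generic FNN of this
    shape stores T separate copies of every matrix; here the copies are
    identical, which is exactly the weight sharing unrolling induces."""
    layers = [(W_ff.copy(), W_lat.copy(), W_fb.copy()) for _ in range(T)]
    a1 = a2 = np.zeros(n)
    for Wf, Wl, Wb in layers:
        a1, a2 = relu(x + Wb @ a2), relu(Wf @ a1 + Wl @ a2)
    return a2

x, T = rng.standard_normal(n), 5
assert np.allclose(run_rnn(x, T), run_unrolled_fnn(x, T))
print('weights stored by RNN:          ', 3 * n * n)      # 768
print('weights stored by unrolled FNN: ', 3 * n * n * T)  # 3840
```

The final two print statements carry the point of the comparison: the recurrent implementation pays for each connection once and reuses it on every cycle, whereas its unrolled feedforward twin must physically duplicate the connections T times.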
An important recent observation [40,41,42] is that the architecture that results from spatially unrolling a recurrent network resembles the architectures of state-of-the-art FNNs used in computer vision, which similarly contain