LETTER Communicated by Sophie Deneve
Cortical Circuitry Implementing Graphical Models
Shai Litvak [email protected]
Shimon Ullman [email protected]
Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 76100, Israel
In this letter, we develop and simulate a large-scale network of spiking neurons that approximates the inference computations performed by graphical models. Unlike previous related schemes, which used sum and product operations in either the log or linear domains, the current model uses an inference scheme based on the sum and maximization operations in the log domain. Simulations show that using these operations, a large-scale circuit, which combines populations of spiking neurons as basic building blocks, is capable of finding close approximations to the full mathematical computations performed by graphical models within a few hundred milliseconds. The circuit is general in the sense that it can be wired for any graph structure, it supports multistate variables, and it uses standard leaky integrate-and-fire neuronal units. Following previous work, which proposed relations between graphical models and the large-scale cortical anatomy, we focus on the cortical microcircuitry and propose how anatomical and physiological aspects of the local circuitry may map onto elements of the graphical model implementation. We discuss in particular the roles of three major types of inhibitory neurons (small fast-spiking basket cells, large layer 2/3 basket cells, and double-bouquet neurons), subpopulations of strongly interconnected neurons with their unique connectivity patterns in different cortical layers, and the possible role of minicolumns in the realization of the population-based maximum operation.
1 Introduction
A number of studies have shown that optimal inference, which finds the most probable interpretation given the sensory input, can successfully explain psychophysical results (Ernst & Banks, 2002; Knill & Richards, 1996; Kording & Wolpert, 2004; Rao, Olshausen, & Lewicki, 2002; Stocker & Simoncelli, 2006; Weiss, Simoncelli, & Adelson, 2002). Related studies described how neurons may implement such a computation and proposed the view that computations of the cortical circuitry can be usefully
Neural Computation 21, 3010–3056 (2009) C 2009 Massachusetts Institute of Technology
Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/neco.2009.05-08-783 by guest on 26 September 2021 Cortical Circuitry Implementing Graphical Models 3011
approximated in terms of abstract probabilistic graphical models (Beck & Pouget, 2007; Dayan, Hinton, Neal, & Zemel, 1995; Deneve, 2008a; Doya, Ishii, Alexandre, & Rao, 2007; Hinton & Sejnowski, 1986; Huys, Zemel, Natarajan, & Dayan, 2007; Koechlin, Anton, & Burnod, 1999; Lee & Mumford, 2003; Ma, Beck, Latham, & Pouget, 2006; Ott & Stoop, 2006; Rao, 2005b; Sahani & Dayan, 2003).

The graphical model framework has an intuitive appeal for describing cortical computations both functionally and structurally. Probabilistic graphical models can describe complex, uncertain, real-world relations between many variables and systematically integrate stored long-term knowledge with ongoing observation. Inference computation is typically implemented using a connected network of a large number of relatively simple interacting elements.

One of the most useful graphical inference methods is known as belief propagation (Jordan & Sejnowski, 2001; Pearl, 1988). This algorithm uses alternations of summation and multiplication operations based on the graph structure. Several works suggested that belief propagation is directly implemented by cortical circuits (Beck & Pouget, 2007; Koechlin et al., 1999; Lee & Mumford, 2003; Ma et al., 2006). One difficulty with this direct approach is the neuronal implementation of the product operation (Hausser & Mel, 2003; Mel & Koch, 1990). One suggestion for dealing with this issue was to apply the scheme in the log domain, replacing the products with sums (Deneve, 2008a; Ott & Stoop, 2006; Rao, 2004b, 2005b). Although this solves the multiplication problem, it also raises new difficulties concerning the original summation operation.

In this letter, we develop a biologically plausible version of a graphical model and simulate a large-scale network of spiking neurons that approximates the computation performed by the graphical model in realistic times.
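The log-domain trade-off mentioned above can be illustrated numerically. The following sketch (values arbitrary, for illustration only) shows why moving to the log domain eliminates multiplication but makes belief propagation's summation awkward, while the max operation of belief revision survives the transformation unchanged:

```python
import math

# In the log domain, products become sums: log(a*b) = log a + log b,
# which removes the need for neural multiplication.
a, b = 0.3, 0.5
assert math.isclose(math.log(a * b), math.log(a) + math.log(b))

# But the summation of belief propagation becomes a log-sum-exp,
# which is no longer a simple linear operation on the log values:
log_sum = math.log(a + b)  # what belief propagation needs
lse = math.log(math.exp(math.log(a)) + math.exp(math.log(b)))
assert math.isclose(log_sum, lse)

# By contrast, a max commutes with the log transform:
# log(max(a, b)) = max(log a, log b).
assert math.isclose(math.log(max(a, b)), max(math.log(a), math.log(b)))
```

This is the motivation, developed below, for basing the circuit on maximization and summation rather than on summation and multiplication.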
In the model, we use, instead of belief propagation, a related inference algorithm called belief revision (Pearl, 1987; Weiss, 1997). This algorithm is similar to belief propagation but uses maximization and multiplication operations. The current model is based on a log-domain variant of belief revision, which requires maximization and summation. To implement the maximization operation, we enhanced the circuit studied by Yu, Giese, and Poggio (2002). The new circuit uses populations of spiking leaky integrate-and-fire neurons to represent the different inputs and the output, and it computes the maximum within a few tens of milliseconds. Summation on similar timescales was implemented using coupled inhibitory and excitatory populations. Following the construction of the basic circuit elements, we next connect multiple summation and maximization circuits according to the belief-revision scheme and construct a large-scale network of LIF neurons. We show in simulations that the network is able to find highly likely solutions for difficult multivariate problems within a few hundred milliseconds. Finally, we propose a detailed mapping between the spiking inference circuit and the cortical microcircuitry based on recent empirical studies.
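To make the belief-revision computation concrete at the algorithmic level (this is a standard max-sum, Viterbi-style sketch for a chain-structured pairwise MRF, not the spiking implementation described below; all names are illustrative):

```python
import numpy as np

def max_sum_chain(log_phi_node, log_phi_edge):
    """Log-domain belief revision (max-sum) on a chain-structured MRF.

    log_phi_node: list of length-K arrays of log node compatibilities.
    log_phi_edge: list of K x K arrays of log pairwise compatibilities,
                  one per edge (i-1, i).
    Returns the maximizing state assignment as a list of state indices.
    """
    n = len(log_phi_node)
    msg = np.zeros_like(log_phi_node[0])  # message flowing into node i
    back = []                             # backpointers for decoding
    # Forward pass: alternate summation (in the log domain) and maximization.
    for i in range(1, n):
        scores = (log_phi_node[i - 1] + msg)[:, None] + log_phi_edge[i - 1]
        back.append(np.argmax(scores, axis=0))
        msg = np.max(scores, axis=0)
    # Backward pass: decode the best assignment from the backpointers.
    states = [int(np.argmax(log_phi_node[-1] + msg))]
    for bp in reversed(back):
        states.append(int(bp[states[-1]]))
    return states[::-1]
```

The inner loop uses exactly the two operations the letter attributes to the neural circuit: a summation of log compatibilities and a maximization over the states of the neighboring variable.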
This mapping is supported by several lines of evidence. First, computation in the log domain is partially supported by experimental studies (Carpenter & Williams, 1995; Jazayeri & Movshon, 2006; Yang & Shadlen, 2007). Max-like responses have been reported in cortical neurons in monkey V4 and in cat V1 (Gawne & Martin, 2002; Lampl, Ferster, Poggio, & Riesenhuber, 2004). There is anatomical evidence (Yoshimura & Callaway, 2005; Yoshimura, Dantzker, & Callaway, 2005) that the maximization circuit, which is based on specifically connected neuronal populations of excitatory and inhibitory neurons, is implemented in the superficial layers of the cortex by pyramidal neurons and a specific type of inhibitory neurons, characterized by spike rate adaptation. These studies also support the view that the suggested summation circuit is implemented by highly coupled excitatory populations and fast-spiking basket neurons. Together, this leads to a model where maximization, or a similar nonlinearity, is implemented in the superficial layers of the cortex, while layer 4, and possibly other cortical layers, implement summation, or a related near-linear computation. Finally, the mapping suggests that cortical minicolumns, or related small, localized neuronal structures (Jones, 2000; Peters & Sethares, 1997; Rockland & Ichinohe, 2004), serve as relatively separated maximization circuits, which are associated with different states of the same network variable.

The organization of the letter is as follows. Section 2 gives a short introduction to the main aspects of graphical models needed for the discussion. Readers who are familiar with the Markov random field model may want to skip this section. Section 3 describes the new inference circuit and reports inference results. In section 4, we suggest how the proposed circuit can be mapped onto cortex. Finally, in section 5, the new model is compared with related previous models, and issues for future study are suggested.
2 The Structure of Probabilistic Graphical Models
In this section we describe briefly the basic structure of probabilistic graphical models and the computations they perform. The goal is not to provide a comprehensive review, but only to summarize the main concepts and notations used in this letter. (For a comprehensive treatment of these computations, see Aji & McEliece, 2000; Jordan & Weiss, 2002; Pearl, 1988.)

The general problem solved by graphical models is to assign optimal, or near-optimal, values to a set of unknown variables based on observations. For example, in vision, the problem may be to recognize objects in the scene and localize their parts based on the measured intensity values of image pixels. During learning, the model learns an approximation to the joint distribution P(x1, x2, ..., xN) over all variables xi in a set X that can include the possible objects, their parts and subparts, and the image pixel values. During recognition, the problem is to find the best values of a subset H ⊂ X of the hidden or unobserved variables (the objects, parts, and subparts) from a complementary set O = X \ H of observed variables (the pixel intensities),
for example, by maximizing P(x1, x2, ..., xN) over possible values of all xi in H. This is a general and natural inference task, but its solution for an arbitrary function P can be computationally demanding. The approach followed by graphical models is to express, or approximate, the function P as the product of simpler functions: P(x1, x2, ..., xN) = Πi ψi(Oi, Hi). The functions ψi are simpler than P in the sense that each depends on a small subset of all variables. The computational problem is then to find maxH (Πi ψi(Oi, Hi)). As we will see, the functions max and product in this computation can also be replaced by other function pairs.

The components of such a graphical model are described using a mathematical graph, which will not be discussed here in detail. For example, a node in the graph can be associated with a variable. When all functions ψi involve only two variables, each ψi can be represented by an edge in the graph between the two associated nodes. The most useful aspect of graphical models is that they decompose the complex and global optimization problem into simpler and more local computations associated with the nodes and edges of the graph, using the same basic function pair.

A range of theoretical graphical models has been studied (Ghahramani, 2002; Jordan, 1999; Yedidia, Freeman, & Weiss, 2003). These models can differ in directionality of the edges, the type of variables being used, and the graph structure. The edges can be undirected, as in the case of Markov random fields, or directed, as in Bayesian networks. The variables of the model can be discrete, continuous, or combined, as in mixtures of experts. They can be unconstrained as in factor graphs, random variables as in hidden Markov models, or specific, such as the gaussians of Kalman filters.
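As a concrete illustration of this decomposition, the following sketch builds a tiny factored model over three binary variables (the factor values are arbitrary, chosen only for the example) and shows that the global maximization over the hidden variables, computed by exhaustive search, agrees with a local computation that exploits the distributive law:

```python
import itertools

# Toy pairwise factorization P(x1,x2,x3) ∝ psi12(x1,x2) * psi23(x2,x3);
# each factor depends on only two variables (values are illustrative).
psi12 = {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.8}
psi23 = {(0, 0): 0.7, (0, 1): 0.3, (1, 0): 0.4, (1, 1): 0.6}

x3 = 0  # observed variable; x1 and x2 are hidden

# Global maximization by exhaustive search over the hidden variables.
best = max(itertools.product((0, 1), repeat=2),
           key=lambda h: psi12[h[0], h[1]] * psi23[h[1], x3])

# Distributive law: max_x1 psi12(x1,x2)*psi23(x2,x3) equals
# (max_x1 psi12(x1,x2)) * psi23(x2,x3), so the global maximization
# decomposes into local max-product steps along the graph.
m1 = {x2: max(psi12[x1, x2] for x1 in (0, 1)) for x2 in (0, 1)}
x2_star = max((0, 1), key=lambda v: m1[v] * psi23[v, x3])
x1_star = max((0, 1), key=lambda v: psi12[v, x2_star])

assert (x1_star, x2_star) == best  # both routes agree
```

The exhaustive search grows exponentially with the number of hidden variables, while the decomposed version performs only local operations per node and edge, which is the point of the factorization.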
However, in all these models, inference can be performed based on the same principle: iterating two mathematical operations over a set of evolving values that are associated with variables and states according to the structure of the graph and based on the distributive law. This decomposed inference computation has been termed the generalized distributive law (GDL; Aji & McEliece, 2000). Many algorithms, including the fast Fourier transform (FFT) and fast Hadamard transform, the Viterbi algorithm, the forward-backward algorithm, the Kalman filter, and turbo codes, were shown to be instances of the GDL (Kschischang, Frey, & Loeliger, 2001). Computation methods of graphical models such as belief propagation and particle filtering (Samejima, Doya, Ueda, & Kimura, 2004), which are based on the GDL, are capable of coping with difficult, real-world, multivariate problems, including vision (Epshtein & Ullman, 2005), diagnosis and bioinformatics (Jaimovich, Elidan, Margalit, & Friedman, 2006), and coding and communication (MacKay, 2003).

In the cortical model discussed and simulated below, the circuit implements a GDL-based inference for a simple graphical model termed pairwise Markov random field (MRF). Similar GDL-based inference circuits can be constructed for other graphical models such as Bayesian networks and
factor graphs. The formulation of pairwise MRF is useful for the next section, where several cortical models, which implement inference using GDL principles, are reviewed.

A pairwise MRF is an undirected graphical model. Let X = {X1, X2, ..., XN} be a set of random variables, each having a discrete set of states. Let P(X1 = x1, X2 = x2, ..., XN = xN) be the joint probability over all the variables. G = (V, E) is an undirected graph with a set V of N nodes and a set E of edges. Each node vi of G is associated with the random variable Xi, and each edge between node vi and node vj indicates statistical dependencies between the two associated variables Xi and Xj, represented by the compatibility function φij(Xi, Xj) defined below. The joint distribution is assumed to have the neighborhood property: P(xi | x1, ..., xi−1, xi+1, ..., xN) = P(xi | Ne(xi)), where Ne(Xi) are the neighbors of Xi in G. It is known from the Hammersley-Clifford theorem (Besag, 1974) that the probability distribution can be represented as the product of pairwise compatibility functions: