LETTER Communicated by Sophie Deneve

Cortical Circuitry Implementing Graphical Models

Shai Litvak [email protected] Shimon Ullman [email protected] Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot, Israel, 71600

In this letter, we develop and simulate a large-scale network of spiking neurons that approximates the inference computations performed by graphical models. Unlike previous related schemes, which used sum and product operations in either the log or linear domains, the current model uses an inference scheme based on the sum and maximization operations in the log domain. Simulations show that using these operations, a large-scale circuit, which combines populations of spiking neurons as basic building blocks, is capable of finding close approximations to the full mathematical computations performed by graphical models within a few hundred milliseconds. The circuit is general in the sense that it can be wired for any graph structure, it supports multistate variables, and it uses standard leaky integrate-and-fire neuronal units. Following previous work, which proposed relations between graphical models and the large-scale cortical anatomy, we focus on the cortical microcircuitry and propose how anatomical and physiological aspects of the local circuitry may map onto elements of the graphical model implementation. We discuss in particular the roles of three major types of inhibitory neurons (small fast-spiking basket cells, large layer 2/3 basket cells, and double-bouquet neurons), subpopulations of strongly interconnected neurons with their unique connectivity patterns in different cortical layers, and the possible role of minicolumns in the realization of the population-based maximum operation.

1 Introduction

A number of studies have shown that optimal inference, which finds the most probable interpretation given the sensory input, can successfully explain psychophysical results (Ernst & Banks, 2002; Knill & Richards, 1996; Kording & Wolpert, 2004; Rao, Olshausen, & Lewicki, 2002; Stocker & Simoncelli, 2006; Weiss, Simoncelli, & Adelson, 2002). Related studies described how neurons may implement such a computation and proposed the view that computations of the cortical circuitry can be usefully

Neural Computation 21, 3010–3056 (2009) © 2009 Massachusetts Institute of Technology

Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/neco.2009.05-08-783 by guest on 26 September 2021 Cortical Circuitry Implementing Graphical Models 3011

approximated in terms of abstract probabilistic graphical models (Beck & Pouget, 2007; Dayan, Hinton, Neal, & Zemel, 1995; Deneve, 2008a; Doya, Ishii, Alexandre, & Rao, 2007; Hinton & Sejnowski, 1986; Huys, Zemel, Natarajan, & Dayan, 2007; Koechlin, Anton, & Burnod, 1999; Lee & Mumford, 2003; Ma, Beck, Latham, & Pouget, 2006; Ott & Stoop, 2006; Rao, 2005b; Sahani & Dayan, 2003).

The graphical model framework has an intuitive appeal for describing cortical computations both functionally and structurally. Probabilistic graphical models can describe complex, uncertain, real-world relations between many variables and systematically integrate stored long-term knowledge with ongoing observation. Inference computation is typically implemented using a connected network of a large number of relatively simple interacting elements.

One of the most useful graphical inference methods is known as belief propagation (Jordan & Sejnowski, 2001; Pearl, 1988). This algorithm uses alternations of summation and multiplication operations based on the graph structure. Several works suggested that belief propagation is directly implemented by cortical circuits (Beck & Pouget, 2007; Koechlin et al., 1999; Lee & Mumford, 2003; Ma et al., 2006). One difficulty with this direct approach is the neuronal implementation of the product operation (Hausser & Mel, 2003; Mel & Koch, 1990). One suggestion for dealing with this issue was to apply the scheme in the log domain, replacing the products with sums (Deneve, 2008a; Ott & Stoop, 2006; Rao, 2004b, 2005b). Although this solves the multiplication problem, it also raises new difficulties concerning the original summation operation.

In this letter, we develop a biologically plausible version of a graphical model and simulate a large-scale network of spiking neurons that approximates the computation performed by the graphical model in realistic times.
In the model we use, instead of belief propagation, a related inference algorithm called belief revision (Pearl, 1987; Weiss, 1997). This algorithm is similar to belief propagation but uses maximization and multiplication operations. The current model is based on a log-domain variant of belief revision, which requires maximization and summation. To implement the maximization operation, we enhanced the circuit studied by Yu, Giese, and Poggio (2002). The new circuit uses populations of spiking leaky integrate-and-fire neurons to represent the different inputs and the output within a few tens of milliseconds. Summation in similar timescales was implemented using coupled inhibitory and excitatory populations. Following the construction of the basic circuit elements, we next connect multiple summation and maximization circuits according to the belief-revision scheme and construct a large-scale network of LIF neurons. We show in simulations that the network is able to find highly likely solutions for difficult multivariate problems within a few hundred milliseconds. Finally, we propose a detailed mapping between the spiking inference circuit and the cortical microcircuitry based on recent empirical studies.


First, computation in the log domain is partially supported by experimental studies (Carpenter & Williams, 1995; Jazayeri & Movshon, 2006; Yang & Shadlen, 2007). Max-like responses are reported in cortical neurons in V1 and V4 of the cat (Gawne & Martin, 2002; Lampl, Ferster, Poggio, & Riesenhuber, 2004). There is anatomical evidence (Yoshimura & Callaway, 2005; Yoshimura, Dantzker, & Callaway, 2005) that the maximization circuit, which is based on specifically connected neuronal populations of excitatory and inhibitory neurons, is implemented in the superficial layers of the cortex by pyramidal neurons and a specific type of inhibitory neurons, characterized by spike rate adaptation. These studies also support the view that the suggested summation circuit is implemented by highly coupled excitatory populations and fast-spiking basket neurons. Together, this leads to a model where maximization, or a similar nonlinearity, is implemented in the superficial layers of the cortex, while layer 4, and possibly other cortical layers, implement summation, or a related near-linear computation. Finally, the mapping suggests that cortical minicolumns, or related small, localized neuronal structures (Jones, 2000; Peters & Sethares, 1997; Rockland & Ichinohe, 2004), serve as relatively separated maximization circuits, which are associated with different states of the same network variable.

The organization of the letter is as follows. Section 2 gives a short introduction to the main aspects of graphical models needed for the discussion. Readers who are familiar with the Markov random field model may want to skip this section. Section 3 describes the new inference circuit and reports inference results. In section 4, we suggest how the proposed circuit can be mapped onto cortex. Finally, in section 5, the new model is compared with related previous models, and issues for future study are suggested.

2 The Structure of Probabilistic Graphical Models

In this section we describe briefly the basic structure of probabilistic graphical models and the computations they perform. The goal is not to provide a comprehensive review, but only to summarize the main concepts and notations used in this letter. (For a comprehensive treatment of these computations, see Aji & McEliece, 2000; Jordan & Weiss, 2002; Pearl, 1988.)

The general problem solved by graphical models is to assign optimal, or near-optimal, values to a set of unknown variables based on observations. For example, in vision, the problem may be to recognize objects in the scene and localize their parts based on the measured intensity values of image pixels. During learning, the model learns an approximation to the joint distribution $P(x_1, x_2, \ldots, x_N)$ over all variables $x_i$ in a set X that can include the possible objects, their parts and subparts, and the image pixel values. During recognition, the problem is to find the best values of a subset $H \subset X$ of the hidden or unobserved variables (the objects, parts, and subparts) from a complementary set $O = X \setminus H$ of observed variables (the pixel intensities),


for example, by maximizing $P(x_1, x_2, \ldots, x_N)$ over possible values of all $x_i$ in H. This is a general and natural inference task, but its solution for an arbitrary function P can be computationally demanding. The approach followed by graphical models is to express, or approximate, the function P as the product of simpler functions, $P(x_1, x_2, \ldots, x_N) = \prod_i \Psi_i(O_i, H_i)$. The functions $\Psi_i$ are simpler than P in the sense that each depends on a small subset of all variables. The computational problem is then to find $\max_H \left( \prod_i \Psi_i(O_i, H_i) \right)$. As we will see, the functions max and product in this computation can also be replaced by other function pairs.

The components of such a graphical model are described using a mathematical graph, which will not be discussed here in detail. For example, a node in the graph can be associated with a variable. When all functions $\Psi_i$ involve only two variables, each $\Psi_i$ can be represented by an edge in the graph between the two associated nodes. The most useful aspect of graphical models is that they decompose the complex and global optimization problem into simpler and more local computations associated with the nodes and edges of the graph, using the same basic function pair.

A range of theoretical graphical models has been studied (Ghahramani, 2002; Jordan, 1999; Yedidia, Freeman, & Weiss, 2003). These models can differ in the directionality of the edges, the type of variables being used, and the graph structure. The edges can be undirected, as in the case of Markov random fields, or directed, as in Bayesian networks. The variables of the model can be discrete, continuous, or combined, as in mixtures of experts. They can be unconstrained as in factor graphs, random variables as in hidden Markov models, or specific, such as the gaussians of Kalman filters.
However, in all these models, inference can be performed based on the same principle: iterating two mathematical operations over a set of evolving values that are associated with variables and states according to the structure of the graph and based on the distributive law. This decomposed inference computation has been termed the generalized distributive law (GDL; Aji & McEliece, 2000). Many algorithms, including the fast Fourier transform (FFT) and fast Hadamard transform, the Viterbi algorithm, the forward-backward algorithm, the Kalman filter, and turbo codes, were shown to be instances of the GDL (Kschischang, Frey, & Loeliger, 2001). Computation methods of graphical models such as belief propagation and particle filtering (Samejima, Doya, Ueda, & Kimura, 2004), which are based on the GDL, are capable of coping with difficult, real-world, multivariate problems, including vision (Epshtein & Ullman, 2005), diagnosis and bioinformatics (Jaimovich, Elidan, Margalit, & Friedman, 2006), and coding and communication (MacKay, 2003).

In the cortical model discussed and simulated below, the circuit implements a GDL-based inference for a simple graphical model termed pairwise Markov random field (MRF). Similar GDL-based inference circuits can be constructed for other graphical models such as Bayesian networks and


factor graphs. The formulation of pairwise MRF is useful for the next section, where several cortical models, which implement inference using GDL principles, are reviewed.

A pairwise MRF is an undirected graphical model. Let $X = \{X_1, X_2, \ldots, X_N\}$ be a set of random variables, each having a discrete set of states. Let $P(X_1 = x_1, X_2 = x_2, \ldots, X_N = x_N)$ be the joint probability over all the variables. $G = (V, E)$ is an undirected graph with a set V of N nodes and a set E of edges. Each node $v_i$ of G is associated with the random variable $X_i$, and each edge between node $v_i$ and node $v_j$ indicates statistical dependencies between the two associated variables $X_i$ and $X_j$, represented by the compatibility function $\phi_{ij}(X_i, X_j)$ defined below. The joint distribution is assumed to have the neighborhood property: $P(x_i \mid x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_N) = P(x_i \mid Ne(x_i))$, where $Ne(X_i)$ are the neighbors of $X_i$ in G. It is known from the Hammersley-Clifford theorem (Besag, 1974) that the probability distribution can be represented as the product of pairwise compatibility functions:

$$P(x_1, x_2, \ldots, x_N) = \prod_{(i,j)} \phi_{ij}(x_i, x_j). \tag{2.1}$$

Given the specific states ($X_o = \hat{x}_o$) for some subset $O \subset X$ of observed variables $X_o \in O$ (also called the visible variables or the evidence), the inference problem is to choose an optimal set of states ($X_h = \hat{x}_h$) for the set $H = X \setminus O$ of the other, unobserved variables $X_h \in H$. The choice of unobserved states can be thought of as the preferred interpretation (e.g., the recognized objects) of the evidence (the pixel intensities). Typically two types of optimality criteria are used:

- The MPE (most probable explanation) estimate of all unobserved variables is the set of states $\hat{x}_h$ such that the joint posterior probability of all states $P(\hat{x}_1, \hat{x}_2, \ldots, \hat{x}_N)$ is maximal. MPE is also sometimes referred to in the literature as the MAP (maximum a posteriori) of all unobserved variables.
- The MPM (maximum posterior marginal) estimate chooses each unobserved state $\hat{x}_h$ independently of the others. First, the marginal probabilities $P_h(x_h)$ are calculated for the states of $X_h$ by marginalizing over all unobserved variables except $X_h$: $P_h(x_h) = P(x_h \mid O) = \sum_{H \setminus X_h} P(X_1, X_2, \ldots, X_N)$. Then $\hat{x}_h$ is selected to be the state with the highest marginal probability. MPM is sometimes called the minimum mean squared error (MMSE) estimate.

Both MPE and MPM estimates are commonly used in practical optimiza- tion problems.
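To make the two criteria concrete, the following sketch enumerates a toy three-variable chain MRF exhaustively and extracts both estimates; the compatibility values are illustrative, not taken from the letter:

```python
import itertools

# Toy pairwise MRF: three binary variables on a chain 0-1-2.
# phi[(i, j)][xi][xj] is the compatibility of states (xi, xj) on edge (i, j).
phi = {
    (0, 1): [[1.0, 0.2], [0.2, 0.8]],   # variables 0 and 1 prefer to agree
    (1, 2): [[0.3, 1.0], [1.0, 0.3]],   # variables 1 and 2 prefer to differ
}

def joint(states):
    """Unnormalized joint probability: product of pairwise compatibilities."""
    p = 1.0
    for (i, j), table in phi.items():
        p *= table[states[i]][states[j]]
    return p

configs = list(itertools.product([0, 1], repeat=3))

# MPE: the single jointly most probable configuration.
mpe = max(configs, key=joint)

# MPM: pick each variable's state by its marginal, computed by summing
# the joint over all other variables.
def marginal(i, xi):
    return sum(joint(s) for s in configs if s[i] == xi)

mpm = tuple(max([0, 1], key=lambda xi: marginal(i, xi)) for i in range(3))
```

On this small example the two criteria happen to agree; in general they can select different configurations, since MPE optimizes the joint while MPM optimizes each marginal separately.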


We next summarize briefly how the MPE or MPM is computed in graphical models, and in the next section we show how similar computations could be implemented in a neural network. Finding the exact MPE and MPM in the general case of a loopy graph is a difficult, NP-hard problem, which essentially requires examining the exponential number (in N) of all combinations of the states (Cooper, 1990). A message-passing algorithm that can be used to approximate and, under some conditions, find the exact MPE solution is called belief revision (Pearl, 1987; Weiss, 1997) and can be implemented by iteratively calculating, at time steps $t = 1, \ldots, T$, the following message equation for every edge $X_i - X_j$ and every state $x_j \in X_j$:

$$M_{ij}^t(x_j) = \max_{x_i \in X_i} \left( \phi_{ij}(x_i, x_j) \prod_{X_n \in Ne(X_i) \setminus X_j} M_{ni}^{t-1}(x_i) \right). \tag{2.2}$$

The product in equation 2.2 is over all messages coming from the neighbors of $X_i$ except the target node $X_j$. The maximization is over all states of $X_i$. The message $M_{ij}^t(x_j)$ from node $X_i$ to node $X_j$ is thus a function of the target variable $X_j$, that is, a vector with r values, where r is the number of states of $X_j$. Equation 2.2 describes an iterative procedure, where the left-hand-side message $M_{ij}^t(x_j)$ represents a new value, based on older values at the previous iteration on the right-hand side. Roughly, in sending a message from $X_i$ to $X_j$, the node $X_i$ takes the maximal value of several products of incoming messages to $X_i$ (ignoring messages from the target $X_j$ itself), multiplied by the pairwise function $\phi_{ij}$. An approximated local MPE estimate, that is, the preferred state $\hat{x}_i$ of $X_i$ given all the evidence in the network, can be computed at any time T by the product of all incoming messages:

$$\hat{x}_i = \operatorname*{argmax}_{x_i \in X_i} \prod_{X_n \in Ne(X_i)} M_{ni}^T(x_i). \tag{2.3}$$

A detailed step-by-step example of this max-sum inference can be found in Freeman, Pasztor, and Carmichael (2000, sec. 2.1). While belief revision is used for inferring the MPE, a similar algorithm for computing the MPM estimate also exists and is called belief propagation. Table 1 compares the two estimates and their associated message-passing iterative equations. The only difference between the two algorithms is that one uses maximization and the other uses summation. The computation of the optimal estimate (see equation 2.3) is identical for both algorithms. The similarity between belief revision and belief propagation suggests that they are two
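The message update of equation 2.2 and the readout of equation 2.3 can be sketched on a toy binary chain, where belief revision is exact; the compatibility values are illustrative:

```python
import math

# Max-product belief revision on a binary chain MRF 0-1-2.
phi = {
    (0, 1): [[1.0, 0.2], [0.2, 0.8]],
    (1, 2): [[0.3, 1.0], [1.0, 0.3]],
}
# Store both directions of each edge: phi[(j, i)][xj][xi] = phi[(i, j)][xi][xj].
for (i, j), t in list(phi.items()):
    phi[(j, i)] = [[t[b][a] for b in range(2)] for a in range(2)]

neighbors = {0: [1], 1: [0, 2], 2: [1]}
# M[(i, j)][xj]: message from node i to node j, initialized uniformly.
M = {(i, j): [1.0, 1.0] for i in neighbors for j in neighbors[i]}

for _ in range(5):  # iterate Eq. 2.2 (synchronous updates)
    M = {
        (i, j): [
            max(
                phi[(i, j)][xi][xj]
                * math.prod(M[(n, i)][xi] for n in neighbors[i] if n != j)
                for xi in range(2)
            )
            for xj in range(2)
        ]
        for (i, j) in M
    }

# Eq. 2.3: local MPE estimate from the product of incoming messages.
def belief(i):
    return [math.prod(M[(n, i)][x] for n in neighbors[i]) for x in range(2)]

estimate = tuple(max(range(2), key=lambda x: belief(i)[x]) for i in range(3))
```

Because the chain is loop free, this recovers the exact MPE found by exhaustive search; on loopy graphs the same updates give only an approximation.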


Table 1: MPE versus MPM: Statistical Estimates and the Corresponding GDL-Based Message-Passing Equations.

Exact estimate:
- MPE: $P_i(x_i) = \max_{\text{all } X_j \in H \setminus X_i} P(x_1, x_2, \ldots, x_N)$
- MPM: $P_i(x_i) = \sum_{\text{all } X_j \in H \setminus X_i} P(x_1, x_2, \ldots, x_N)$

Approximate message equation:
- MPE, belief revision (max product): $M_{ij}^t(x_j) = \max_{x_i \in X_i} \phi_{ij}(x_i, x_j) \prod_{X_n \in Ne(X_i) \setminus X_j} M_{ni}^{t-1}(x_i)$
- MPM, belief propagation (sum product): $M_{ij}^t(x_j) = \sum_{x_i \in X_i} \phi_{ij}(x_i, x_j) \prod_{X_n \in Ne(X_i) \setminus X_j} M_{ni}^{t-1}(x_i)$

Notes: Note the structural similarity of the two message equations and the correspondence between the mathematical operations used for the exact and approximated inference. While the exact estimate requires global information related to all variables, the iterative message equation multiplies local messages coming from the direct neighbors of each node.

versions of a single general computation. Indeed, two groups described at about the same time generalized frameworks for inference on MRFs, Bayesian networks, and other graphical models. One generalization (Aji & McEliece, 2000) introduced the notion of the generalized distributive law (GDL), while the second (Kschischang et al., 2001) used the notion of factor graphs. The GDL describes a method for fast inference by concatenating two binary operations, + and •, which define a commutative semiring over a set K. We will call them below the basic operations. This generalization was established for the case of graphical models without loops. In this case, the algorithms are guaranteed to converge and find the exact global optimum. Moreover, the computation time is linear in the size of the graph instead of the exponential time required for the straightforward, exhaustive MPE search (or MPM marginalization). On loopy graphs, GDL-based algorithms are not guaranteed to converge, and their theoretical justification is only partial (Heskes, 2004; Murphy, Weiss, & Jordan, 1999; Yedidia et al., 2003; Yuille, 2002). However, experience gained from many applications in various domains (Freeman et al., 2000; Meltzer, Yanover, & Weiss, 2005) shows that in practice, these algorithms are very useful and often quickly find a good, near-optimal interpretation.

We use the terms GDL-based inference and graphical-like inference interchangeably while referring to this general inference method, which is based on the GDL. In the next sections, we discuss issues of neuronal implementation of graphical-like inference and describe the neuronal inference circuit that performs this task.
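The distributive-law step that the GDL exploits can be checked numerically in the (max, +) semiring: a max over a sum factorizes, so a global optimum can be assembled from local maximizations. The values below are arbitrary:

```python
# f(x) and g(x, y) are two local factors over binary variables x and y.
f = {0: 1.0, 1: 3.0}
g = {(0, 0): 0.5, (0, 1): 2.0, (1, 0): 1.5, (1, 1): 0.2}

# Exhaustive: max over all (x, y) pairs of f(x) + g(x, y).
brute = max(f[x] + g[(x, y)] for x in (0, 1) for y in (0, 1))

# Distributed: push the max over y inward, then take the max over x.
# This is the (max, +) analog of distributing a sum over a product.
distributed = max(f[x] + max(g[(x, y)] for y in (0, 1)) for x in (0, 1))

assert brute == distributed
```

The same identity with (sum, product) in place of (max, +) underlies belief propagation, which is why both algorithms are instances of one framework.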

3 A Model of Max-Sum Inference by Cortical Columns

In this section we describe a graphical-like inference model of cortical circuitry based on maximization (max) and summation (sum). In section 3.1,


we describe the underlying message equation of the model. Section 3.2 describes a corresponding circuit with abstract sum and max units. The neuronal, spike-based implementation of summation and maximization is discussed in section 3.3. Section 3.4 describes the implementation of the full neuronal inference circuit with LIF neurons. Finally, in section 3.5 we present inference results using this circuit.

3.1 Biological Graphical-Like Computation: Belief Consolidation. In section 2 we described the belief revision message equation 2.2, which is often used for finding a useful, highly likely solution in loopy graphs with many variables. This equation uses maximization and multiplication, whose neuronal implementation is not straightforward. In this section, we develop a variant of belief revision, which is more amenable to neuronal implementation but maintains the key qualities of the original computation. First, by taking the log on both sides of equation 2.2, we get

$$m_{ij}^t(x_j) = \max_{x_i \in X_i} \left( \psi_{i,j}(x_i, x_j) + \sum_{X_n \in Ne(X_i) \setminus X_j} m_{ni}^{t-1}(x_i) \right), \tag{3.1}$$

where m and $\psi$ are the log versions of M and $\phi$ in the original equation. Note that both the product and the max of belief revision have natural equivalents in the log domain, unlike the log-domain version of the sum-product belief propagation equation (see Table 1).

Next, we rewrite the computation in a slightly modified form. The implementation of equation 3.1 is an alternating cascade of sum and max operations, which is visualized in Figure 1. Starting at a different phase of this alternation, we get

$$m_{ij}^t(x_i) = \sum_{X_n \in Ne(X_i) \setminus X_j} \max_{x_n \in X_n} \left( \psi_{n,i}(x_n, x_i) + m_{ni}^{t-1}(x_n) \right), \tag{3.2}$$

which is an alternative message equation, whose iterations create exactly the same cascade. In the terminology of Figure 1, equation 3.2 describes sum-max LINCs, while equation 3.1 is equivalent to max-sum LINCs. While in equation 3.1 the message $m_{ij}$ was a function of the target variable $X_j$, in equation 3.2 it is a function of the source variable $X_i$.

A direct implementation of equation 3.2 requires a summation unit for calculating a unique sum for each edge of the original graph and each state of $X_i$, as the message from $X_i$ to $X_j$ depends on both variables. This dependency arises because the message sent from $X_i$ to $X_j$ combines the messages from all the neighbors of $X_i$ with the exception of the recipient node, $X_j$.


Figure 1: Two connectivity patterns in a graphical-like inference circuit. The inset shows a segment of a graph with four connected variables. The main figure shows schematically the corresponding segment of the full inference circuit over the graph. Summation (Σ) units associated with all states of one variable X2 or X3 are wired (solid arrows) to a maximization (max) unit. A unique compatibility value, associated with the source and the target states, is added to each sum (small hatched squares). Each max unit is associated with one neighbor of X1 and a single state of X1. The max units are wired to a sigma element (dotted arrows from X2 and X3 to X1). This alternating cascade of max-sum elements and solid-dotted wiring patterns is repeated throughout the entire inference circuit. Dotted frames represent max-sum LINCs, and solid frames represent sum-max LINCs (details in the text).

A simplified, approximated implementation, which we simulate below, sends the same message to all recipients:

$$\bar{m}_i^t(x_i) = \sum_{X_n \in Ne(X_i)} \max_{x_n \in X_n} \left( \psi_{n,i}(x_n, x_i) + m_n^{t-1}(x_n) \right). \tag{3.3}$$

This simplified message value $\bar{m}_i^t(x_i)$ is often called the belief of state $x_i$ at time t. Note that calculating the messages (or "beliefs") for all nodes according to equation 3.3 is more economical than calculating the messages for all pairs of nodes in equation 3.2. In a richly interconnected network,


the contribution of a single neighbor is usually small, and therefore the uniform messages in equation 3.3 are similar to the different messages in equation 3.2. Numerical experiments show that in such networks, the two messaging schemes produce similar results (details below).

Next, we address the problem of representing negative, unbounded values of messages and compatibility functions in the log domain. To overcome this, the log values were shifted to a positive bounded range, which is equivalent to scaling in the linear domain. This normalization step is done over the values $\bar{m}_i^t$ from the left side of equation 3.3, using their mean value $\langle \bar{m}_i^t(x) \rangle_{x \in X_i}$:

$$m_i^t(x_i) = \left[ \eta \cdot \left( \bar{m}_i^t(x_i) - \langle \bar{m}_i^t(x) \rangle_{x \in X_i} \right) \right]^+. \tag{3.4}$$

The shifted log compatibilities are

$$\psi_{n,i}(x_n, x_i) = \left[ \log \phi_{n,i}(x_n, x_i) + \sigma \right]^+ \in [0, \sigma]. \tag{3.5}$$

The parameters $\eta$ and $\sigma$ control the range of values, which should be coded by neural activity and synaptic efficacies. Ignoring the remaining negative values eliminates small messages and compatibility values but does not typically affect the MPE, as demonstrated by the experimental results below. We refer to the max-sum iterative scheme of equations 3.3 and 3.4 as belief consolidation.

Since in equation 3.3 we already sum over all neighbors, reading the final estimate from the messages produced by belief consolidation is straightforward:

$$\hat{x}_i = \operatorname*{argmax}_{x_i \in X_i} m_i^t(x_i). \tag{3.6}$$

We compared the accuracy of estimates achieved by belief consolidation (see equations 3.3 and 3.4) with those of the standard belief revision (see equation 2.2). We used 100 randomly generated graphs with six unobserved nodes: three states at each variable and three neighbors. (For more detail on the graph construction, see section 3.5.) The parameters in all inference trials were $\eta = 1.1$, $\sigma = 6.0$. Belief consolidation found 80 exact MPE solutions versus 92 found by belief revision. However, 99 solutions found by belief consolidation were within the top 1%, in terms of likelihood, of all 729 possible configurations, compared to 96 of belief revision. All solutions found by both belief consolidation and belief revision were within the top 5% most likely configurations. On another set of 100 graphs with 12 unobserved binary nodes, results were similar.

In summary, belief consolidation (see equations 3.3 and 3.4) is a variant of belief revision in the log domain, which sends identical, positive messages


to all neighbors, and achieves comparable results. The following section describes how these message equations can be implemented in a circuit of max and sum units.
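Before turning to circuits, the belief consolidation iteration itself can be sketched in a few lines on a toy binary chain MRF. The compatibility values are illustrative; $\eta$ and $\sigma$ follow the values quoted for the inference trials:

```python
import math

ETA, SIGMA = 1.1, 6.0
phi = {
    (0, 1): [[1.0, 0.2], [0.2, 0.8]],
    (1, 2): [[0.3, 1.0], [1.0, 0.3]],
}
for (i, j), t in list(phi.items()):         # store both edge directions
    phi[(j, i)] = [[t[b][a] for b in range(2)] for a in range(2)]

# Eq. 3.5: shifted log compatibilities, clipped to [0, sigma].
psi = {e: [[max(0.0, min(SIGMA, math.log(v) + SIGMA)) for v in row]
           for row in t] for e, t in phi.items()}

neighbors = {0: [1], 1: [0, 2], 2: [1]}
m = {i: [0.0, 0.0] for i in neighbors}      # one message vector per node

for _ in range(10):
    # Eq. 3.3: sum over neighbors of a max over the source states.
    raw = {i: [sum(max(psi[(n, i)][xn][xi] + m[n][xn] for xn in range(2))
                   for n in neighbors[i])
               for xi in range(2)]
           for i in neighbors}
    # Eq. 3.4: subtract the per-node mean and rectify, keeping values
    # positive and bounded, as a firing rate must be.
    m = {i: [max(0.0, ETA * (v - sum(raw[i]) / 2)) for v in raw[i]]
         for i in neighbors}

# Eq. 3.6: read the estimate directly from the consolidated messages.
estimate = tuple(max(range(2), key=lambda x: m[i][x]) for i in range(3))
```

On this chain the scheme settles on the same configuration as exact MPE search, consistent with the comparison reported above for larger random graphs.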

3.2 From Equations to Circuits. A graphical-like inference circuit for computing the messages associated with all states and all variables in the graph in parallel can be constructed by assembling many computational units that implement summation and maximization. A critical issue is the wiring: the neuronal units should be wired according to the state-variable relations and the neighborhood relations on the graph. In particular, for belief consolidation (see equation 3.3), the sum units associated with all states of a given variable $X_n$ should be wired to a single max unit, and max units associated with all neighbors $X_n$ of $X_i$ and with a single state $x_i \in X_i$ should be wired to a single sum unit. Figure 1 demonstrates schematically these wiring patterns in a segment of a full inference circuit. Note that a circuit for the log-domain belief revision (see equation 3.1) is similar but requires a specific set of sum units for each target variable. A belief propagation circuit has the same connectivity but uses product and sum units instead of sum and max units.

The circuit in Figure 1 can also be viewed as a cascade of repeating compound units. We call such a compound unit a LINC (a local inference circuit). There are two options for choosing LINCs: max-sum LINCs (dotted frames), which implement max over sum, or sum-max LINCs (solid frames), which implement sum over max. While each max-sum LINC is associated with an edge of the original graph, a sum-max LINC is associated with a variable in the graph. Some previous inference models suggested that binary variables and their corresponding LINC variants are implemented in the cortex by a single neuron (see section 5). The current model suggests that a max-sum LINC is realized by several nearby cortical minicolumns in a single cortical area, while minicolumns in two connected areas of the cortical hierarchy can be viewed as realizing together a sum-max LINC of a multistate variable (see Figure 8).
In section 4, we discuss findings on connectivity between different types of cortical neurons in different layers that support this view. In the sections that follow, we describe how the belief consolidation circuit with abstract sum and max units can be implemented by spiking neurons.

3.3 Graphical-Like Circuit Elements Using Spiking Neurons. In this section, we discuss the implementation of the building blocks of a neuronal inference circuit for graphical-like computation. A full inference circuit is based on about 16,000 LIF neurons. It is obtained by connecting elementary circuits that implement two mathematical operations: a linear circuit for summation and a maximization circuit. For each elementary circuit, we give a short description of the connectivity, demonstrate the typical responses,


and report results indicating the robustness of the circuit. Before describing the elementary circuits, we briefly describe general aspects of the simulated circuits.

3.3.1 Populations of Spiking Neurons. The model uses populations of leaky integrate-and-fire neurons (Gerstner & Kistler, 2002) to implement the inference computation. Analog values that are computed during the inference, such as messages, are coded by the mean rate in neuronal populations during short periods of a few tens of milliseconds. The standard LIF equation and typical parameters that were used are detailed in appendix A. The model was simulated using the publicly available neuronal simulation environment CSIM (Natschläger, Markram, & Maass, 2003) running under the Linux environment. An integration time step of 0.1 ms was used for calculating membrane potentials, spikes, and synaptic events.

The elementary neuronal circuits combine excitatory and inhibitory neuronal populations. The number of populations and the connectivity between them vary from one circuit to another. For stand-alone simulations of an elementary circuit, we typically used excitatory populations of 100 neurons each and inhibitory populations of 36 neurons each. In the context of the large neuronal inference circuit, we used maximization and linear circuits with populations of 25 excitatory and 9 inhibitory neurons, which are slightly less accurate but require fewer simulation resources. Overall, a typical simulated inference circuit has a total of about 14,000 excitatory neurons and 2,000 inhibitory ones.

Populations in an elementary circuit are connected by population connections. A population connection is the set of all synapses between neurons in the source population and neurons in the target population. In all cases, we created the connections randomly according to the connection probability P. Different sets of connection probabilities between populations define different elementary circuits with specific functional properties.
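As a rough sketch of the basic unit, the following integrates a single LIF neuron with the same 0.1 ms time step; the membrane parameters are generic textbook values, not the exact ones of appendix A:

```python
# Minimal leaky integrate-and-fire (LIF) unit, Euler-integrated.
DT = 1e-4          # integration time step: 0.1 ms
TAU_M = 0.03       # membrane time constant (s); assumed
R_M = 2e6          # membrane resistance (ohm); assumed
V_REST, V_THRESH, V_RESET = 0.0, 0.015, 0.0   # volts; assumed

def lif_spikes(i_in, t_end=0.1):
    """Integrate dV/dt = (-(V - V_REST) + R_M * i_in) / TAU_M and
    return the spike times produced during t_end seconds."""
    v, spikes = V_REST, []
    for step in range(int(t_end / DT)):
        v += DT / TAU_M * (-(v - V_REST) + R_M * i_in)
        if v >= V_THRESH:          # threshold crossing: spike and reset
            spikes.append(step * DT)
            v = V_RESET
    return spikes

# The mean firing rate grows with the driving current, which is the
# rate code the population-based circuits rely on.
rates = [len(lif_spikes(i * 1e-8)) / 0.1 for i in (1, 2, 3, 4)]
```

A population, as used in the model, is simply many such units driven by overlapping random synaptic inputs, with the analog value read out as the population mean rate.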
In each population connection, the synaptic weights w_k of the different specific connections between source and target neurons are chosen according to a gamma distribution. For simplicity, we used the same mean efficacy of 0.25 nA in all connections and differentiated them using the single parameter of the connection probability P. The weights are set at the beginning of the simulation and do not change. For each circuit, the connection probabilities were tuned using an optimization procedure that minimized the error between the simulated spiking response and the expected mathematical function of the circuit. However, we found that the circuits function in a similar manner over a wide range of the parameters.

Each circuit has one or more input populations and output populations. The same type of neural code is used in both, which allows the construction of compound circuits from simpler ones. In all circuits, we consider the mean firing rate of an excitatory population to represent a positive real value. For elementary circuits, which perform quick computation, the

Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/neco.2009.05-08-783 by guest on 26 September 2021 3022 S. Litvak and S. Ullman


Figure 2: (Left) The linear circuit. An excitatory population of 100 neurons (red) and an inhibitory population of 36 neurons (black) are recurrently connected. Both are input populations and receive the combined inputs from external source populations. Output is decoded from the excitatory population. The numbers correspond to the population connection probabilities (expressed in percentages). (Center) In this example, one external source population (not shown on the left) is projecting to the linear circuit. Sixty-four spikes generated independently during 25 ms at the source population (top plot) invoke 63 spikes at the excitatory population of the linear circuit within a few milliseconds delay (bottom plot). The height of the dot corresponds to the index of the neuron in the population. (Right) Both the inhibitory and the excitatory populations of the linear circuit respond linearly to different levels of source activity over 20 independent trials, each lasting 25 ms. Axes represent mean population rates. When several source populations are connected to the circuit, it responds linearly to the combined activity in all of them.

output is decoded by counting the spikes in the relevant output populations during a short interval of 25 ms. In this interval, cells rarely fire more than a single spike, and therefore the mean rate also represents the fraction of active cells, which is a graded positive value typically in the range [0, 1]. Input values in the range [0, 1] are also provided to the relevant input populations of an elementary circuit through population connections from external source excitatory populations. During 25 ms, spikes are artificially generated by the external source populations, such that the mean rate represents the required input value. The precise spike times are selected independently during this time window. For the neuronal inference circuit, both the encoding of input and the decoding of output are done over a longer period of several hundred milliseconds (details below).

We have described general properties of the simulated neuronal populations. In the next section, we describe how several populations are connected to form the linear (summation) circuit and the maximization circuit.
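The population rate code described above can be illustrated abstractly. The 25 ms window and the at-most-one-spike-per-cell regime follow the text; generating each spike at an independently chosen time within the window is a simplifying assumption of this sketch.

```python
import numpy as np

rng = np.random.default_rng(1)
WINDOW = 0.025  # 25 ms encoding/decoding window

def encode(value, n_neurons=100):
    """Generate one window of spikes so the fraction of active neurons
    approximates `value` in [0, 1]; each neuron fires at most one spike,
    at an independently chosen time within the window."""
    active = rng.random(n_neurons) < value
    times = rng.uniform(0.0, WINDOW, n_neurons)
    return [(i, t) for i, t in zip(np.where(active)[0], times[active])]

def decode(spikes, n_neurons=100):
    """Decode the mean rate as the fraction of active cells in the window."""
    return len({i for i, _ in spikes}) / n_neurons

spikes = encode(0.6)   # decode(spikes) recovers roughly 0.6
```

Because the same code is used at inputs and outputs, circuits built from these populations can be composed directly, as the text notes.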

3.3.2 Linear Circuit. A simple linear circuit with two populations of spiking neurons, excitatory and inhibitory (see Figure 2), is used to sum the


activity of the input sources: given the input values a_1, …, a_k, the circuit approximates α · Σ_i a_i for small values of k (we tested cases with k ≤ 12) and a fixed positive α. In the context of inference computation (see equation 3.1), the linear circuit is useful for summation over messages from different neighbors and for adding compatibility values.

We observed in the simulation that when an excitatory source population is connected to an excitatory target population, spiking activity at the source might cause a burst of synchronous activity at all target neurons. The population spike was avoided by including in the circuit several factors that increased the variability of membrane potentials and injected synaptic currents across the population. These include the balancing interaction between excitation and inhibition (Wilson & Cowan, 1972), the gamma-distributed efficacies, probabilistic synapses (Maass & Natschläger, 2000), and some variability of threshold and resting potentials.

The circuit responds nearly linearly to the combined number of input spikes when the activity level is not more than a single spike, on average, for each neuron. It can be used to replicate the activity of the source, linearly amplify it, or sum the responses of several sources. The excitatory population sums up the input activity, and the inhibitory population keeps the activity within a desired range. Connection probabilities were optimized to approximate summation with a gain (α above) of 1. The optimal connectivity parameters, in percentages, are indicated near the corresponding arrows in Figure 2 (left). This circuit was tested with 200 independent trials. At each trial, a number x in the range [0, 1] was sampled from a uniform distribution, and three input values a_1, a_2, a_3 were chosen randomly and normalized such that their sum equaled x. This guaranteed that the entire range of possible inputs was tested.
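The test procedure can be sketched for the idealized, noise-free computation; here the spiking dynamics are replaced by the target function α · Σ_i a_i with the optimized gain α = 1, so the sketch shows only how the trial inputs were generated and scored.

```python
import numpy as np

rng = np.random.default_rng(2)

def ideal_linear_circuit(inputs, alpha=1.0):
    """Target function of the linear circuit: alpha * sum_i a_i."""
    return alpha * float(np.sum(inputs))

errors = []
for _ in range(200):
    x = rng.random()                 # target sum, uniform in [0, 1]
    parts = rng.random(3)
    a = parts / parts.sum() * x      # a_1, a_2, a_3 normalized to sum to x
    errors.append(abs(ideal_linear_circuit(a) - x))
```

For the spiking implementation, the reported mean error of about 0.04 replaces the near-zero error of this ideal computation.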
When a 25 ms time window was used for the population activity, the approximation error over 200 trials had a mean of 0.042 (standard deviation of 0.054). Using time windows of 35 ms and 15 ms gave a similar mean approximation accuracy of 0.041 (±0.050) and 0.054 (±0.060), respectively. When the connectivity parameters were randomly jittered in the range of 0.9 to 1.1 of the original values, the mean error over 5000 independent trials was similar: 0.051 (±0.055). We tested the circuit with different numbers of inputs, ranging from 1 to 36, and obtained similar accuracy levels.

Together, these results indicate that the linear circuit is robust to noise and variations in parameters, as expected from a biologically realistic circuit. In sections 4.1 and 4.3 we suggest a possible correspondence between summation circuits and "neuronal cliques," which are highly interconnected populations of excitatory cortical neurons and fast-spiking inhibitory basket neurons.

3.3.3 Maximization Circuit. The maximization circuit (see Figure 3, left) can approximate max_i(a_i) given the input values a_1, …, a_k. The circuit is a type of population-based, soft, winner-takes-all (WTA) network. Several excitatory input populations are recurrently connected to a central inhibitory


Figure 3: (Left) The maximization circuit. Each of the three max components (excitatory input populations, indicated by red rectangles) is driven by an external source population (not shown) and is recurrently connected to a central inhibitory population (black elongated rectangle). Responses of the excitatory populations are added by the linear circuit (dashed rectangle), which represents the output of the maximization circuit. The numbers on the arrows indicate the connection probabilities (in percentages) and are identical for all excitatory populations. The circuit behaves similarly over a wide range of the parameters. (Center) Weighted maximization circuit. Each input line to the maximization circuit goes through an additional input linear circuit, which adds the corresponding compatibility value as explained in the text. (Right) Near-max responses. Two hundred independent trials, each with three inputs. The x-axis indicates the true maximum at each trial, and the y-axis is the circuit result coded by the mean population rate.

population. In addition, they all project to a linear circuit, whose output population encodes the near-max response. Each input population is driven by one external source.

The operation of the circuit is based on the simple relation max(a, b) = a + [b − a]^+ for any two inputs a and b (where [x]^+ ≡ max(0, x)). For multiple inputs a_1, …, a_n, the max can be calculated as a series of max operations over two variables: assume that A_{i−1} is the maximal value of the first i − 1 inputs; then A_i = max_{j=1..i} a_j = max(A_{i−1}, a_i) = A_{i−1} + [a_i − A_{i−1}]^+. The proof in appendix B shows that the computation also holds for multiple inputs, without sequential ordering. The simulations show that an implementation of the ideal circuit with noisy LIF neurons produces a good approximation to the max computation. Robustness to noise and other parameters was tested by simulations. In 200 independent trials (see


Figure 3, right) using sets of three randomly selected input values in the range [0, 1], and a 25 ms time window for the populations' activity, the approximation error had a mean of 0.056 and a standard deviation of 0.072. A larger time window of 35 ms results in a slightly superior approximation (0.053 ± 0.065), while a shorter window of 15 ms induces slightly less accurate results (0.072 ± 0.081). The network was robust to jittering the connectivity parameters randomly in up to ±10% of the optimal values (indicated near the arrows in Figure 3, left) and to different numbers of inputs. Finally, we observed a trade-off between size and accuracy: larger populations can be used to achieve higher accuracy.

A simplified version of the maximization circuit (without self-inhibition and excitation) is similar to the circuit studied by Yu et al. (2002) and others. The circuit of Yu et al. uses a single excitatory spiking neuron for each input line and, for the output line, measures the rate of the output neuron over several hundred milliseconds and assumes ongoing high levels of input and output activity during this period. In comparison, our circuit uses populations instead of single neurons, codes the input and output values using the mean rates over the populations during a short period of 25 ms, and assumes momentary input and output. The population approach allows computing the max with only about a single spike for each neuron. The linear circuit at the output, as well as the self-excitation and self-inhibition in each input population, contributes to the accuracy of the new circuit.

In the context of inference in the log domain (see equations 3.1 and 3.3), it is necessary to compute maximization over weighted message values. The weights in the log domain are additive.
We implemented a weighted maximization circuit (see Figure 3, center), which adds the compatibility value ψ_i corresponding to each input message m_i, using a linear circuit, before finding the maximum. The compatibility value ψ_i is coded as a multiplicative gain of the connection probabilities to the excitatory (0.08 · ψ_i) and inhibitory (0.065 · ψ_i) populations of the ith linear circuit, from a background excitatory population (elongated with a cross pattern). The background population (one for the entire inference circuit) simulates ongoing background activity with a fixed firing rate. In this manner, the linear circuit receives currents corresponding to ψ_i through one input line and currents corresponding to m_i through the other line. The overall computation of the weighted maximization circuit is max_i(ψ_i + m_i). In section 4 we discuss evidence linking the structure of the maximization circuit to the properties of a minicolumn in the superficial cortical layers.

3.4 The Neuronal Inference Circuit. In previous sections, we described the neuronal implementation of the summation and maximization nodes. In this section, we describe how they are interconnected to form the full neuronal inference circuit, which can find highly likely (near-MPE) interpretations of the visible variables in a few hundred milliseconds by implementing belief consolidation in parallel for all variables. An example of a


full circuit and its components is shown in Figure 4. The entire circuit was simulated with LIF neurons.

The building block of the neuronal inference circuit is the neuronal local inference circuit (neuronal LINC; see Figure 4, main rectangle), which implements equations 3.3 and 3.4 for a single variable. For an n-ary variable with k neighbors, the neuronal LINC uses n × k weighted maximization circuits ("max" frames), each over n values, and n summation circuits (frames marked by sigma), each over k values (n = 3, k = 2 for Figure 4). The output of each summation node is a single value. Together the n values comprise a message vector corresponding to one variable, which is the output of the neuronal LINC. Regarding the effect of scaling up the network, note that if the graph is sparse (the number of neighbors of all variables is bounded by a small number K) and the size of each sum and max circuit is fixed, then the size of the neuronal inference circuit that corresponds to the entire graph is linear in the total number of states of all variables.

The output message values are normalized in a normalization circuit (green dotted frame in Figure 4), where the summation circuits representing the different message values recurrently inhibit each other. The excitatory populations of all summation circuits are reciprocally connected to a central inhibitory population (black rectangle in the center). This circuit approximates the normalization step of equation 3.4. For more details, see appendix C. Section 4.4 describes a possible realization of the normalization in cortex.

Each neuronal LINC is connected to other LINCs according to the MRF graph and propagates the same message to all neighbors. The neuronal LINC is the neuronal embodiment of the abstract sum-max LINC (see Figure 1, solid frame of X1). All connectivity parameters of the neuronal inference circuit are listed as numbers near the connecting arrows in Figures 2 through 4.
Each number indicates the percentage of connected neurons in the two associated populations. For the rest of the simulation parameters, see the section on populations at the beginning of section 3.3 and appendix A.

We suggest that a neuronal LINC corresponds to several functional cortical minicolumns and spreads over layer 4, layers 2 to 3, and layer 5, as discussed in section 4.

The same structure can be systematically extended to an arbitrary graph and any number of states. Given an MRF graph G and an associated set of compatibility functions over the edges, a corresponding neuronal inference circuit can be constructed by connecting instances of neuronal LINCs according to the neighborhood relation of G and setting the relevant synapses with the log of the compatibilities (by equation 3.5). The example illustrated in Figure 4 contains six input nodes (the Y_i) and six interconnected nodes (X_i).
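An abstract, non-spiking sketch of one LINC update may help fix the structure. Equations 3.3 and 3.4 are not reproduced in this section, so the max-sum form below (a sum over neighbors of maxima of weighted incoming messages, followed by a log-domain normalization) follows the verbal description; the normalization by subtracting the maximum is our assumption.

```python
import numpy as np

def linc_update(incoming, log_psi):
    """One abstract update of a neuronal LINC for an n-state variable with
    k neighbors. incoming[j] is the n-vector message from neighbor j;
    log_psi[j, s, t] is the log-compatibility between own state s and
    neighbor j's state t."""
    k, n = incoming.shape
    out = np.empty(n)
    for s in range(n):
        # n x k weighted maximizations, then a summation over the k neighbors
        out[s] = sum(np.max(log_psi[j, s] + incoming[j]) for j in range(k))
    # Normalization step (cf. equation 3.4); subtracting the maximum is one
    # common log-domain choice and is assumed here.
    return out - out.max()

# Toy call: 2 neighbors, 2 states, neutral (zero) log-compatibilities.
m_out = linc_update(np.array([[0.1, 0.5], [0.3, 0.2]]), np.zeros((2, 2, 2)))
```

With neutral compatibilities, every state receives the same total and the normalized outgoing message is flat, as expected.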

3.5 Inference with Spiking Neurons: Results. To test how the model performs with spiking neurons, we constructed neuronal inference circuits


Figure 4: Neuronal LINCs in the context of the full neuronal inference circuit. The full circuit (left) contains six inputs (Yi ) and six interconnected internal nodes (the variables Xi ). Each node is a neuronal LINC (right; large black frames), which approximates equation 3.3. The neuronal LINC is built from populations of LIF neurons (small red, black, and dashed rectangles). These populations are combined to form several instances of weighted maximization nodes (max frames) and summation nodes (sigma frames). The main black frame shows the neuronal LINC for the variable X1. In the top graph, X1 has three neighbors: X2, X3, X5. Due to size limitations, the corresponding neuronal LINC shows only the projections from X2 and X3 and the related maximization nodes. For these two neighbors and for each of the three states of X1, the relevant compatibility values are added to the incoming messages, and the maximum of the weighted message values is found by one maximization circuit. Each pair of maximum results is combined by one summation node (linear circuit). The three sums are normalized by a normalization circuit (green dotted frame; see details in text), which contains a globally connected inhibitory population (black rectangle in the center) and comprises the outgoing message of X1. Copies of this message are propagated to all of X1's neighbors (outgoing edges on the MRF graph; each edge represents three connections). The circuit finds a highly likely, near-MPE interpretation of the visible variables within a few hundred milliseconds.


for MRF graphs of the kind described in Figure 4. Two sets of graphs, each of 100 different inference circuits, were implemented and tested. While in section 3.1 we reported results for the same sets of graphs using a direct implementation of the message equation, here we describe results with spiking networks. The circuits in set A were based on MRF graphs with 12 binary unobserved variables, 12 observed variables, and a search space of 4096 possible interpretations. That is, given the observed data, there are 4096 possible assignments of the unobserved variables, and the network searches for the most likely one. Set B was based on graphs with six ternary unobserved variables, six visible variables, and a search space of 729 interpretations. In all graphs, each unobserved variable X_i had three unobserved neighbors, selected randomly, and a single visible neighbor Y_i. For each graph, values for all compatibility functions φ_ij(X_i, X_j), φ_i(X_i, Y_i) were chosen independently from a uniform distribution and normalized. For each graph, a set Ŷ = {ŷ_i} of randomly selected visible states was chosen. Inference circuits of set A were made of 13,512 neurons and 318,973 synapses on average. Set B circuits used 16,656 neurons and 940,152 synapses on average.

An inference trial over a neuronal inference circuit was simulated during 500 ms. During this time, the input was presented to the circuit, and the output was extracted. To set the input to the circuit, we generated spike trains during the entire simulation time for the populations associated with the Y_i's, with a fixed mean rate corresponding to the relevant compatibility values and the chosen evidence state ŷ_i, in the range 0 to 40 spikes per second. The output of the circuit was extracted from the output populations of all the neuronal LINCs during a test period lasting between 100 and 400 ms of the inference trial.
For each output population, the instantaneous mean rate was evaluated by counting spikes in 150 ms time bins, separated by 10 ms from each other (140 ms overlap between two consecutive bins). The instantaneous rate represents the message value of the corresponding state (red curves in Figure 5). For a given neuronal LINC, the instantaneously most active population reflects the preferred state. Collectively, the winning states over all variables comprise the instantaneous interpretation or solution expressed momentarily by the network. We defined the dominant solution to be the interpretation expressed by the network for the longest time during the test period. This solution was compared with the solutions found by nonneuronal inference methods.

Figure 5 shows an example of a successful inference trial with the neuronal inference circuit. To understand how the computation evolves and finds highly likely solutions, we refer to equation 3.3. Computationally, the likely solutions are found by iterating the message passing in equation 3.3 a sufficient number of times. Since this message equation is approximated by spiking LINCs (see Figure 4), the inference process in Figure 5 can be understood in terms of iterating the approximated messages m_i(x_i), which are coded by the spiking populations. The 18 individual plots are the neural


Figure 5: An example inference trial. The neuronal inference circuit finds the MPE solution over a pairwise-MRF graph with 6 variables, 3 states each, and 18 edges with random compatibility functions. A simulation period of 400 ms is plotted. (Top 18 plots) Each row X_i S_j represents the activity in one output excitatory population (25 neurons) of the neuronal LINC associated with the state j of variable i (the output of three sigma units in Figure 4). A gray dot indicates a spike (the height is the index of the neuron). The red curve represents a smoothed version of the mean spike rate in the population on a scale of 0–60 Hz. A colored segment indicates the preferred state at any given time, whose instantaneous mean rate is the maximal among the three states of the variable. The preferred states found by the network define an instantaneous solution or interpretation of the input. An end point of a colored segment (black vertical line) indicates a change of the instantaneous interpretation. (Bottom plot) Log likelihood of instantaneous interpretations found by the circuit. Each number indicates the index of the instantaneous interpretation among all 729 possible ones when ordered according to their likelihood. At about 310 ms, the network found the MPE (most likely) solution.

representations of the messages generated for the 6 ternary variables X_i in the MRF graph. Assuming that a message is computed in approximately 25 ms using spikes, the 400 ms are equivalent to 16 iterations. At an early stage of the inference, only the messages from the immediate neighbors of a given node contribute to the sum of equation 3.3. Later, once the excitatory output populations of the LINCs start spiking, the messages from more distant LINCs start affecting the new evolving messages. As this propagation of messages continues, the instantaneous solution (the combination of preferred states, whose plots are colored) reflects a growing consistency between the preferred states at different LINCs, and the solution likelihood typically increases. For an explicit step-by-step example of message evolution in a max-sum inference, with numerical messages (rather than spikes), see the detailed description in Freeman et al. (2000, section 2.1, equations 1–21).
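The decoding of the dominant solution described above can be sketched as follows. The 150 ms bins advanced in 10 ms steps follow the text; the array layout and the toy spike counts are hypothetical.

```python
import numpy as np
from collections import Counter

def dominant_solution(spike_counts):
    """spike_counts[v, s, b]: spikes of variable v's state-s population in
    150 ms bin b (bins advanced in 10 ms steps). The instantaneous
    interpretation is the argmax state of each variable in each bin; the
    dominant solution is the interpretation held for the most bins."""
    winners = spike_counts.argmax(axis=1)                 # variables x bins
    per_bin = [tuple(winners[:, b]) for b in range(winners.shape[1])]
    return Counter(per_bin).most_common(1)[0][0]

# Hypothetical counts: 2 variables, 2 states, 4 bins.
counts = np.array([[[5, 5, 1, 6], [1, 1, 9, 2]],
                   [[9, 2, 2, 8], [0, 7, 7, 1]]])
```

In this toy example the interpretation (state 0, state 0) is expressed in two of the four bins and is therefore the dominant solution, even though other interpretations appear transiently, as in Figure 5.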


Near-optimal solutions are found in most of the 100 inference trials over different graphs in both sets. For set A, the likelihood of the solution in 61 trials was in the top 1% of the most likely configurations. Ninety-three trials (including the 61) are in the top 5% of configurations, and 98 are in the top 20%. In five cases the dominant solution is the exact MPE solution. For set B, 11 are the exact MPE, 37 solutions are in the top 1%, 71 in the top 5%, and 91 in the top 20%. The distribution of the solutions is summarized in Figure 6 (center).

The same figure also compares the quality of the solutions found by the neuronal inference circuit with the solutions found by an explicit implementation of belief revision and belief consolidation. In terms of finding a near-MPE solution (within the top 5% of most likely solutions), the solutions found by the neuronal inference circuit were slightly inferior compared with the solutions of the explicit algorithms. For set A, there are 93 near-MPE spiking solutions versus 98 of belief consolidation and 100 of belief revision. For set B, there are 71 spiking solutions versus 100 of belief consolidation and 99 of belief revision.

We also measured, for comparison, the accuracy of solutions where the preferred state of X_i was selected based on local prior probabilities (from Y_i), without considering the pairwise interactions between different X_i's. This method proved to be significantly inferior to all other inference methods: for set A (B), there were 0 (3) exact MPE solutions, 19 (16) in the top 1%, and 48 (33) in the top 5%. The log likelihood of the solutions found using the spiking network was higher by 2.9 (2.3) compared with the maximal local probability solutions, averaged over 100 graphs. These results indicate that although the spiking network inference is not as effective as the underlying approximated computation, it does effectively use pairwise compatibilities.
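The "top 1%/5%" measure can be computed by exhaustively scoring all configurations, which is feasible here because the search spaces contain only 729 and 4096 assignments. The pairwise-MRF log score below is the standard unary-plus-pairwise form assumed from the text; the toy model is hypothetical.

```python
import itertools
import numpy as np

def rank_among_all(solution, n_vars, n_states, edges, log_psi, log_phi):
    """Rank (1 = most likely) of `solution` among all n_states**n_vars
    assignments of a pairwise MRF, scored in the log domain by unary
    evidence terms log_phi[i][x_i] plus pairwise terms
    log_psi[(i, j)][x_i, x_j]."""
    def score(x):
        s = sum(log_phi[i][x[i]] for i in range(n_vars))
        return s + sum(log_psi[e][x[e[0]], x[e[1]]] for e in edges)
    s_sol = score(solution)
    all_scores = [score(x) for x in
                  itertools.product(range(n_states), repeat=n_vars)]
    return 1 + sum(s > s_sol for s in all_scores)

# Hypothetical toy model: two binary variables, one edge, flat evidence.
toy_psi = {(0, 1): np.array([[0.0, 1.0], [2.0, 3.0]])}
toy_phi = [np.zeros(2), np.zeros(2)]
```

A solution whose rank falls within the top 5% of the ordered configuration list counts as near-MPE in the statistics above.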
The two histograms on the right of Figure 6 show another measure of the approximation accuracy of the neuronal circuits. We calculated the KL divergence between the exact marginal distributions of different variables and the approximated distributions extracted from the mean firing rates of the corresponding neuronal populations (see the triplets of plots marked by the same color in Figure 5), in the period between 200 and 400 ms, after transforming them from the log domain to the linear domain and normalizing. The KL distances are typically small: a mean of 0.123 (σ = 0.115) for 600 tested distributions of ternary variables and 0.119 (σ = 0.151) for 1200 binary variables. This indicates that the firing rates effectively encode an approximation of the marginal distributions.

To understand the source of accuracy differences between the spiking and the direct implementation of belief consolidation, we tested another inference model. This model used direct belief consolidation, where a random jitter was added after the implementation of each mathematical operation (maximization, summation, normalization). The level of the jitter was equivalent to the measured error in the specific spiking subcircuit that implements the operation (see section 3.3). The inference results were inferior


Figure 6: (Left) Stability of inference solutions. Each plot describes one inference trial over a neuronal network corresponding to a different MRF graph. The plot shows the log likelihood of the instantaneous solution held by the network. The instantaneous solution of the network can be stable, as in Figure 5, or nonstable, as shown in these examples. (Top) The network finds the MPE solution but loses it after a while. (Middle) The network oscillates between the first and the fourth most likely interpretations. (Bottom) No stability or simple oscillation can be observed during the first 0.4 second. (Center) Accuracy of the neuronal inference circuit. One hundred inference trials with different MRF graphs and evidence were conducted using (a) an explicit computation of the standard belief revision equation 2.2 (black bars), (b) an explicit computation of the belief consolidation equations 3.3 and 3.4 (dark gray bars), (c) a neuronal inference circuit based on spiking neurons (white bars), and (d) selection of preferred states according to the local prior probability (light gray bars). Each bar describes the number of solutions whose likelihood is within the top 5% of all configurations (ordered by likelihood). The top plot shows the results for graphs with 12 binary variables. On 93 out of 100 inference trials, the spiking network solution was among the top 5% of solutions. The bottom plot shows results for graphs with six variables, three states each. (Right) The approximation accuracy of the neuronal inference circuits measured in terms of the KL distance between the exact marginal distribution of the variables and the approximated values, extracted from the mean rates of the populations associated with the corresponding variable-states (see Figure 5) in the period 200–400 ms. The top (bottom) histogram represents all KL distances for the binary (ternary) variables in 100 different networks.
The majority of the KL distances are small, indicating that the network effectively approximates the marginal distributions.
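The KL comparison used here and in the text can be sketched abstractly. Exponentiation and normalization of the log-domain values follow the description in section 3.5; the exact scaling of firing rates to log values is our simplifying assumption.

```python
import numpy as np

def rates_to_distribution(mean_rates):
    """Convert log-domain population rates to an approximate marginal:
    exponentiate and normalize (the rate-to-log scaling is assumed)."""
    z = np.exp(np.asarray(mean_rates, dtype=float))
    return z / z.sum()

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions over the same states;
    eps guards against zero entries."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    return float(np.sum(p * np.log(p / q)))
```

Applying `kl_divergence` to each exact marginal and its rate-derived approximation over all variables yields the histograms summarized above.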


compared with belief consolidation without jitter but superior to the spiking network. This indicates that the inaccuracy of the neuronal inference circuit comes in part from the subcircuits' approximation error but also from additional sources. For example, the error may grow following phenomena in the spiking network, such as partial synchronization in subpopulations, which evolves during the inference (see dense and sparse regions of spikes in the top plots of Figure 5).

Interpretations found by the neuronal inference circuit often move between several likely interpretations (Figure 6, left), typically in the top 10% range. In most cases, only one or two variables' instantaneous winning states alternate, while the other variables are stable. Oscillations may appear also with the explicit algorithm (Heskes, 2004; Murphy et al., 1999; Yedidia et al., 2003).

The high speed of the inference computation, often less than 400 ms, is a result of both the highly parallel spiking implementation and the efficient underlying max-sum inference algorithm (see equation 3.3). First, the representation of numerical values is based on a rate code in populations of several tens of neurons, such that in a short time (e.g., 25 ms) and with biologically realistic rates (e.g., 0–60 Hz), it is sufficiently accurate for the inference task. Second, each of the summation, maximization, or normalization subcircuits described above can perform the computation over many inputs within 25 ms or less. Third, the output population of every elementary circuit starts responding to the input within only a few milliseconds after the input is presented (see, e.g., Figure 2, center). Consequently, the response of a cascade of elementary circuits in a LINC is not much longer than the operation of a single elementary circuit, and 400 ms in the simulations are roughly equivalent to 16 time steps of the underlying sum-max message equation.
Fourth, all messages for all variables are computed in parallel by the corresponding LINCs. Finally, the underlying max-sum belief consolidation is a case of the generalized distributive law (Aji & McEliece, 2000), which effectively decomposes multivariate computations (see section 2). In the direct numerical implementation of belief consolidation, we found that after 16 time steps, the algorithm typically finds an optimal or near-optimal solution. This combination of factors can explain the fast computation by the neuronal inference circuit.

4 Cortical Correlates of the Graphical Model

The neuronal inference circuit described above (see Figure 4) demonstrates that efficient distributed inference can be performed by a network of spiking neurons, on a biological time scale, using a rate code in neural subpopulations. In this section, we discuss relevant aspects of cortical circuitry and how they may relate to the model. In terms of its large-scale structure, the model of neuronal inference computation described above refers to a loopy pairwise MRF. However, the

Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/neco.2009.05-08-783 by guest on 26 September 2021 Cortical Circuitry Implementing Graphical Models 3033

model is not restricted to such graph structures. Computationally, graphical models can be directed or undirected, loopy or loop free. There are differing views on the directionality and the dominance of local cycles in a graphical model of the cortical circuitry. Massive intra-areal lateral connectivity (Angelucci et al., 2002) suggests that cortical variables represented within a single cortical area are tightly connected and may create significantly loopy subgraphs and possibly undirected, symmetric compatibility functions. The feedforward and feedback inter-areal connections (Felleman & Van Essen, 1991) correspond more naturally to hierarchical, directed graphs. The computations performed in the cortical hierarchy may be governed by relatively independent bottom-up and top-down streams of information (Hinton, Dayan, Frey, & Radford, 1995; Ullman, 1995) or by strongly coupled streams, which form a relaxation process (Lee & Mumford, 2003). We will not discuss here details of the large-scale structure of cortical graphical models, but shall focus on the mapping of the local structures of graphical models to the local, layered cortical circuit in a column of a few hundred microns. The notion of a canonical local cortical circuitry, shared by many species and cortical areas and characterized by several types of inhibitory and excitatory neurons, their laminar location, and interconnection probabilities, is an active field of investigation (for a recent review, see Douglas & Martin, 2004). The neuronal inference circuit suggests the existence within the cortex of several levels of organization and connectivity patterns. In particular, the model suggests the existence of neuronal groups corresponding to the representation of values (states) of variables, and a higher-level organization into variables, each one composed of a number of possible states.
The variables and states should be interconnected in a particular pattern (see Figure 1), allowing, for example, the appropriate maximization and summation to take place. In the sections below, we briefly review relevant aspects of the local cortical organization, together with their possible relation to the neuronal inference circuit.

4.1 Subnetworks Within a Column. The goal of the discussion here is to suggest possible connections between functional subunits within the inference circuit and substructures within the cortical column. In this section, we briefly review relevant empirical evidence concerning local subnetworks within the cortical column. It is well known that neurons with similar response properties are arranged in columns spanning all cortical layers (Mountcastle, 2003). The functional column is often assumed to be the basic common building block of the cortical circuitry. However, a functional column, whose typical diameter is 300 to 500 μm, is a rather large piece of cortical machinery with a complex internal structure. This local cortical circuit was recently characterized quantitatively by specifying the average probability of connections between every major type of neurons and their laminar location (Binzegger,
Douglas, & Martin, 2004). There is, however, a question as to whether any two nearby neurons of given types and layers have the same connection probability. The alternative is the existence of several fine-scale clusters or subnetworks of strongly interconnected neurons, which segregate nearby neurons into separate functional units. In this case, neurons of the same cluster will have a higher connection probability than neurons from different clusters. Indeed, several possible structures were suggested for the fine-scale segregation of a functional column into microcolumns, minicolumns, or modules. Criteria for identifying such structures were based on various repeating anatomical features, including vertical chains or clumps of up to 20 pyramidal neurons in layers 2 to 6 (most vividly identified in the temporal cortex); ascending bundles of dendrites of layer 5 and 3 pyramidal neurons (more details below) and different systems of dendritic bundles of layer 6 pyramids and of layer 4 small pyramids; and upper-layer honeycomb-like minicolumns that stain for Zn (found in the rat visual cortex and primates' cortex; for a review, see Jones, 2000; Rockland & Ichinohe, 2004). From these anatomical descriptions it remains unclear, however, whether the pyramidal neurons in such localized clusters also have common and unique input sources and share common projection targets. Such selective connectivity is required for a cluster to be functionally differentiated from nearby clusters and to allow it to play the role of a separate subunit within the larger column. As the structures are relatively small (about 50–100 μm) compared with the typical dendritic field of a pyramidal neuron (e.g., 300 μm), neurons of nearby structures tend to have similar input. Thalamic, cortical, and other projections are therefore likely to contribute to multiple nearby clusters.
However, there are partial indications that different clusters within the same neighborhood can have different and specific connectivity patterns. Two examples are specific corticocortical input to layer 2 honeycomb-like structures in the rat visual cortex (Ichinohe, Fujiyama, Kaneko, & Rockland, 2003) and in monkeys (Ichinohe & Rockland, 2004), and bundles of dendrites of pyramidal neurons projecting specifically to the contralateral hemisphere in the mouse motor cortex (Lev & White, 1997). In summary, the evidence suggests the existence of localized neuronal structures in cortex that are smaller than the functional column, but the functional role of these structures is still unclear. A different and possibly complementary division of the functional column into subunits is the notion of several fine-scale, strongly interconnected subnetworks whose neurons are spatially mixed. Neurons of different subnetworks share overlapping dendritic fields but connect preferentially to a specific subnetwork. Growing evidence in recent years supports this general notion. Physiological studies of connection probabilities indicate the existence of highly connected substructures in layer 5 as well as the superficial layers. For example, in layer 5 of the rat visual cortex, bidirectionally connected pairs and strongly interconnected three-neuron clusters are significantly overrepresented (Song, Sjostrom, Reigl, Nelson, & Chklovskii,
2005). Similarly, Wang et al. (2006) found in the ferret PFC a subclass of layer 5 pyramidal neurons in which the chance of a connected pair being reciprocally connected was as high as about one-half. Evidence for fine-scale subnetworks of pyramidal neurons exists also in the superficial layers of the rat visual cortex, where they tend to form selective connectivity with subnetworks in other layers. Yoshimura et al. (2005) found that connected pyramidal neurons in layers 2/3 tend to be reciprocally connected and also share the same input sources from layer 4, indicating that each layer 4 subnetwork preferentially connects to a specific layer 2/3 subnetwork (Yoshimura et al., 2005, Figure 4). Evidence for selective connectivity from layer 2/3 subnetworks to layer 5 subnetworks also exists (Kampa, Letzkus, & Stuart, 2006). Ohki, Chung, Ch'ng, Kara, and Reid (2005) provided functional support for the parallel-subnetworks view by presenting an orientation map of the rat visual cortex at single-cell resolution, in which neurons with robust orientation specificity are spatially mixed. In the cat visual cortex, orientation-selective cells appear spatially segregated (at an eight-orientation resolution). However, spatially mixed functional subnetworks may still exist at finer orientation resolution or perhaps for another stimulus parameter. The connectivity pattern of inhibitory neurons is also consistent with the notion of segregated subcolumnar structures. Yoshimura and Callaway (2005) found that in the superficial layers, inhibitory fast-spiking (FS) basket cells tend to be reciprocally connected with specific pyramidal neurons and share the same input from layer 4 with these neurons. We conclude that the empirical data support the existence of multiple local subnetworks within the cortical column, which are characterized by three main properties. First, a subnetwork is made of excitatory pyramidal neurons and inhibitory FS basket cells.
Second, neurons in a subnetwork tend to be strongly interconnected. Third, neurons in a subnetwork tend to share a common input. We use the term neuronal cliques to describe such subnetworks of strongly coupled populations of pyramidals and of inhibitory FS basket cells. In the next sections, we describe the proposed relationships between neuronal cliques in different layers and their role in the inference circuit. Briefly, we suggest that linear summation nodes correspond to local layer 4 neuronal cliques that include excitatory cells and inhibitory FS basket cells, and that maximization nodes correspond to superficial-layer minicolumns that include double-bouquet inhibitory cells and several neuronal cliques of pyramidal and FS basket cells.

4.2 Cortical Minicolumns Correspond to the Maximization Nodes of the Inference Circuit. A central element of the neuronal inference circuit is the maximization-node (max frames in Figure 4). It finds the maximum (or softmax) of several input values—a weighted version of message values related to a specific input variable. Each of the input values is represented in the maximization circuit by a max component (the populations are depicted
by red rectangles in a max frame in Figure 4). Central inhibition (the black rectangle in a max frame) drives the nonlinear response of the max components. In cortex, we suggest that a cortical minicolumn may correspond to a maximization circuit. Within each minicolumn, several neuronal cliques in the superficial layers correspond to the max components, while inhibitory double-bouquet cells and possibly related inhibitory types correspond to the central inhibition. Following is the anatomical evidence supporting this suggestion. With respect to inhibitory neurons, the model suggests that FS basket cells are related locally to a neuronal clique, while double bouquets provide broader inhibition between different components. It was found (Yoshimura & Callaway, 2005) that in the rat visual cortex, FS baskets in layer 2/3 tend to connect preferentially with neurons in a specific clique, but inhibitory neurons characterized by spike rate adaptation nonselectively connect with pyramidals in all neuronal cliques. Additional evidence, reviewed in Markram et al. (2004), suggests that the less specific inhibitory adaptive cells (AC) correspond to the double bouquets and related morphological types. DeFelipe, Hendry, Hashikawa, Molinari, and Jones (1990), Peters and Sethares (1997), and others studied extensively the arrangement of double-bouquet neurons in layer 2/3 and their vertical “horse tail” bundles of axonal branches, which typically descend through layer 4 down to layer 5. Peters and Sethares (1997) found that in the monkey striate cortex, these horse tails are often located at regular distances of about 23 μm, approximately corresponding to the regularity of ascending bundles of apical dendrites of layers 2, 3, and 5 pyramidal neurons and of independent dendritic bundles of layer 6 pyramidal neurons. It was suggested that such a narrow module or minicolumn of roughly 150 excitatory and inhibitory cells may serve as a repeating cortical subcircuit.
In different species and areas, the axon and dendrite periodicity varies in the range of 22 to 70 μm, while the cell count varies in the range of 142 to 335. (See Table 1 in Buxhoeveden & Casanova, 2002, and Table 2 in Lev & White, 1997. For a detailed review on minicolumns, see Buxhoeveden & Casanova, 2002.) The extent of minicolumn-specific connectivity of double bouquets and pyramidals is not fully known. The synapses of the double-bouquet axons may target neurons whose cell body is located outside the minicolumn boundaries. However, with respect to layer specificity, Binzegger et al. (2004) confirm the relatively high rate of connections between double bouquets and pyramidals of the superficial layers. Our suggestion combines the two findings on double bouquets: nonselectivity with respect to neuronal cliques of the superficial layers, and narrow columnar arrangement. A minicolumn with several separated neuronal cliques, each having a unique input, and with centrally connecting inhibitory double bouquets, can function as a maximization node, as demonstrated by
our simulation (see Figure 3). Moreover, several adjacent minicolumns may serve as different maximization nodes over different weighted versions of the same set of inputs. This is required in the model for the computation of message values for different states of the same target variable. Figure 7 shows two nearby minicolumns, each with three superficial neuronal cliques, colored according to their respective input sources in layer 4. The arrangement is similar to the model in Figure 4: the cortical neuronal cliques correspond to the max components and the double-bouquet neurons to the central inhibition in one maximization node. This architecture of the superficial layers supports a key aspect of graphical-like inference: evaluating the contribution of a specific source state to the message of a specific target state. A superficial neuronal clique instantiates the interaction between the two states. The set of nearby minicolumns, which project to different states of the target variable, locally intersects with the set of incoming projections from different layer 4 cliques, which represent different states of the source variable (more details in the following section). At each of the intersection points, there is one superficial neuronal clique representing one source state and one target state. Evidence for a narrow, minicolumn-like functional arrangement exists, for example, in the primary motor cortex of behaving monkeys (Amirikian & Georgopoulos, 2003), where neurons showing a similar preferred direction in 3D movement space were arranged in 50 to 100 μm minicolumns, located roughly 200 μm apart. The concept of isolated, independent minicolumns is probably a simplification of cortical reality, as the dendritic fields of pyramidal and other neurons span much beyond 100 μm. However, synaptic learning mechanisms can contribute to the specificity of local clusters even when their dendritic fields are overlapping.
Self-organization can promote the functional connectivity of the maximization circuit, namely, that several max components (neuronal cliques) with related functional responses share a central, reciprocally connected inhibition. We conclude that maximization nodes are implemented in the superficial cortical layers by small, local, and relatively independent minicolumns, which combine several neuronal cliques with central inhibition.
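The maximization-by-central-inhibition scheme can be caricatured with a two-population rate model. This is our illustrative abstraction, not the LIF simulation of Figure 3, and all constants are arbitrary: several excitatory max components share one inhibitory pool, and with well-separated inputs only the largest input survives the pooled inhibition.

```python
import numpy as np

def max_node(inputs, dt=0.1, steps=500):
    """Rate-model sketch of a maximization node: several excitatory
    'max components' share one central inhibitory pool. Hypothetical
    abstraction of the spiking circuit, not the paper's code."""
    inputs = np.asarray(inputs, dtype=float)
    r = np.zeros_like(inputs)  # rates of the max components
    inh = 0.0                  # central inhibitory pool
    for _ in range(steps):
        inh += dt * (r.sum() - inh)                    # pooled inhibition
        r += dt * (np.maximum(inputs - inh, 0.0) - r)  # thresholded drive
    # with well-separated inputs, only the winner stays active; the pooled
    # rate settles near half the maximal input, so a factor 2 rescales it
    return 2.0 * r.sum(), int(np.argmax(r))
```

At the fixed point with a single surviving component, r = I_max − inh and inh = r, so the pooled rate is I_max/2; with close inputs the output is a soft maximum, which matches the max-or-softmax behavior attributed to the node.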

4.3 Layer 4 Neuronal Cliques Correspond to Summation Nodes of the Inference Circuit. In this section we suggest a mapping between neuronal cliques of layer 4 and summation nodes of the neuronal inference circuit (sigma frames in Figure 4). A summation node finds the sum (or the mean) of several input values. In cortex, we suggest that feedforward projections from the superficial neuronal cliques of several minicolumns in a lower cortical area terminate on a neuronal clique in layer 4 of a higher cortical area. The target clique is structured as a linear circuit (see Figure 2) whose fast-spiking inhibitory basket neurons and excitatory neurons balance its activity and allow a linear response to the sum of its inputs, much like the


Figure 7: Organization of minicolumns and neuronal cliques in layers 2/3 and 4 that forms maximization circuits. Two minicolumns, each with three neuronal cliques at the superficial layers (indicated by red, pink, and yellow), are shown in this schematic drawing of a vertical section across layers 2/3, 4, and 5. In the minicolumn model, double-bouquet inhibitory neurons with their descending axons (black squares and bold lines) and layer 2/3 and 5 pyramidal cells with their ascending dendritic bundles (circles with dotted lines) cooperate to form minicolumns whose diameter is several tens of microns. In addition, the figure incorporates evidence (see text) showing segregation into spatially overlapping neuronal cliques of excitatory neurons and inhibitory fast-spiking basket neurons (small colored squares) in layers 4 and 2/3. Neurons in each clique are strongly interconnected. Neurons in a layer 2/3 clique share common input from layer 4 (circles of the same color). Inhibitory neurons with an adaptive firing rate (mainly double bouquets) connect nonpreferentially to all neuronal cliques. This arrangement, shown schematically in the gray frame, suggests that a minicolumn functions as a maximization circuit (see Figure 3): each neuronal clique corresponds to a max component, while the adapting inhibitory neurons correspond to the central inhibition. The minicolumn finds the maximum of several input message values. A message arrives at layer 4 neuronal cliques, which are not minicolumn specific (bottom arrow). From there, the message values are projected to layer 2/3 neuronal cliques in several nearby minicolumns, using different synaptic weights for different minicolumns (middle arrow). The different maximal results computed by different minicolumns are sent to higher areas in the cortical hierarchy through feedforward projections (upper arrow). In this way, minicolumns function as multiple maximization nodes in a cortical inference circuit.


linear circuit at the output of a maximization circuit (dashed rectangle on the left of Figure 3), which adds up the activity of the max components. Note that unlike the superficial-layer cliques, which are suggested to respond nonlinearly due to the strong interaction with double bouquets, the extent of interaction between layer 4 cliques and double bouquets is much smaller (Binzegger et al., 2004). Figure 8 presents the suggested feedforward connectivity: several maximization nodes in a lower area project specifically to a target neuronal clique in a higher area. Compare the sketch of cortex suggested in Figure 8 with the corresponding neuronal LINC presented in Figure 4. In both figures, two source variables contribute to the computation of a message related to a target variable. While max nodes are implemented in the superficial layers of the lower area, summation nodes are implemented by neuronal cliques in layer 4 of the higher area. In the previous sections we described the mapping between key properties of the graphical-like inference circuit and the local cortical circuits. We suggested that two types of operations—summation and maximization—and two types of connectivity patterns are implemented in cortical layer 4 and layers 2/3. In the next section we briefly discuss a number of more global aspects of the relation between the cortical architecture and the inference circuits.
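The balanced excitation-inhibition summation clique can be caricatured with a minimal rate model (again our own abstraction with arbitrary weights and time constants, not the paper's LIF circuit): recurrent inhibition tracks the excitatory rate, and the steady-state excitatory rate is a fixed linear fraction of the summed input, here one-half.

```python
import numpy as np

def sum_node(inputs, dt=0.1, steps=500):
    """Rate sketch of a layer 4 summation clique: excitatory rate e is
    driven by the summed input and balanced by fast-spiking inhibition i.
    Illustrative abstraction only."""
    drive = float(np.sum(inputs))
    e = i = 0.0
    for _ in range(steps):
        e += dt * (max(drive - i, 0.0) - e)  # excitation minus inhibition
        i += dt * (e - i)                    # inhibition tracks excitation
    # steady state: e = drive - i and i = e, hence e = drive / 2,
    # a linear function of the summed input
    return e
```

The point of the balance is linearity: doubling the total input doubles the output, which is what the summation node needs in order to represent a sum (or mean) of message values.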

4.4 Large-Scale Aspects of the Model

4.4.1 Variables and Their Interconnections. In a graphical model, inference requires that messages associated with connected variables (nodes) be combined. In the cortex, many long-range connections exhibit a patchy connectivity pattern that may be a manifestation of the corresponding graph. That is, patches may correspond to variables or clusters of variables, and their anatomical connections form the graph structure between variables. Patches of efferent cells and axonal terminations, whose size is typically a few hundred microns, can be seen after injections of retrograde or anterograde tracers into small enough cortical locations. Such patches were found for feedforward, feedback, and horizontal intra-areal connections (e.g., Anderson & Martin, 2002; Angelucci et al., 2002). Patches in lower visual areas tend to be arranged in a regular manner, which may correspond to regular relations between simple visual features represented by variables in lower layers of the hierarchical graph. In higher cortical areas, such as the inferior temporal cortex, patches are typically less regular and may correspond to complex relations between variables in higher layers of the graph, which represent compound visual structures (Fujita, 2002). The exact relation between cortical patches and variables of the graph is an open question. Considering the relatively large patch size and the wide range of features to which neurons can be sensitive in a single patch, it is likely that a patch corresponds to a cluster of variables rather than a single variable.


Figure 8: Feedforward cortical connectivity forms a neuronal LINC. In this schematic sketch of cortical connectivity, two variables X2 and X3, each with three states, are represented in area A by neuronal cliques in layer 4 (sets of colored circles—three for each variable). Three states of X1 are represented by colored cliques in area B. A state (colored clique) in area B gets specific connections from one maximization node of every source variable in area A (two rectangles whose background color is the same as the area B clique). A maximization node (rectangle in layer 2/3) finds the maximum of the three message values of the area A variable, which are uniquely weighted with respect to the target X1 state. The target neuronal clique in area B finds the sum of its two input max circuits. The connectivity of the summation circuit implemented by the blue neuronal clique with its two input minicolumns is shown schematically in the gray box. Compare the main figure with the simulated inference circuit of Figure 4: the superficial circuits in the six minicolumns correspond to the six max nodes, and the three neuronal cliques in area B correspond to the three summation nodes. Together the two cortical areas realize a sum-max LINC (see Figure 1, solid frames), while a max-sum LINC is realized by several nearby minicolumns in a single area (dotted frames in Figure 1). With alternating max and sum operations and two connectivity patterns implemented in the superficial layers and in layer 4, cortical architecture implements graphical-like computation. See section 4 for details.

Within a patch, neuronal cliques may correspond to states of the variables as discussed in the previous sections.
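Under this mapping, one feedforward message value for a target state combines the two stages described in the preceding sections: each source variable's minicolumn takes the max of its weighted state messages (layers 2/3), and the layer 4 target clique sums the results across source variables. A direct numerical sketch, in our own notation, where W[v] holds hypothetical log-compatibility weights between the states of source v and the target states:

```python
import numpy as np

def linc_message(msgs, weights, target_state):
    """One sum-max LINC step: for a given target state, take the max over
    each source variable's weighted state messages, then sum across the
    source variables. Illustrative notation, not the paper's code."""
    total = 0.0
    for v, m in enumerate(msgs):  # one max node (minicolumn) per source
        total += np.max(weights[v][:, target_state] + m)
    return total
```

Repeating this for every target state gives the full outgoing message vector; the normalization discussed below then keeps the values bounded.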

4.4.2 Learning Relations Between Variables. Inference in a graphical model is based on the integration of immediate evidence with long-term knowledge, which is embedded in the parameters of the models. These parameters
represent the statistical relations between different features in the world and are typically learned from experience. We suggest that in cortex, these statistical relations are encoded in the synapses of pyramidal neurons of the superficial layers, where the maximization circuits discussed earlier are located. Two important sources of input to superficial neurons are axons of local layer 4 neurons, which represent relatively simpler, lower-level features, and feedback afferents from higher areas in the cortical hierarchy, which represent more complex, higher-level features (Douglas & Martin, 2004). The convergence of the feedforward and feedback streams of information allows the superficial neurons and their synapses to extract the graphical model parameters—namely, the conditional probabilities or compatibility functions that describe the statistical relation between the higher- and lower-level features. This can be obtained by spike-timing-dependent plasticity (Abbott & Nelson, 2000) and other neuronal learning mechanisms, as shown, for example, by Deneve (2008b). The superficial neurons are the primary origins of the feedforward projections, which carry the weighted maximization results to variables represented in higher cortical areas. Johnson and Burkhalter (1997) found that superficial neuronal cliques in the rat primary visual cortex are reciprocally connected with visual cortical area LM. In other words, patches of highly interconnected neurons in layers 2/3 of V1 tend to be both the origin of feedforward projections to a higher area and the recipient of the feedback from the same area. Precisely aligned reciprocity between feedforward and feedback was also found in the cat (Salin, Kennedy, & Bullier, 1995).
This suggests that higher-level variables, which are estimated based on the weighted maximization of lower-level variables, send their estimates via the feedback projections back to the contributing lower-level superficial neurons, where the specific higher- and lower-level information converges and allows learning of the required conditional probabilities. An alternative location for this convergence is the synapses from neurons in layer 6 to layer 4. Binzegger et al. (2004) report that this is the strongest excitatory connection, in terms of number of synapses, between different cortical layers in a V1 column. Layer 6 neurons are one of the major targets of feedback connections, such that the relevant high-level information, as well as the low-level information, is available at these synapses. In the current simulated model, the integration of high- and low-level signals for driving the learning process was not incorporated. This aspect of the circuit is left for future extensions.
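The proposed convergence of feedforward and feedback signals could, in principle, let a simple coactivation rule estimate the compatibility parameters. The following is a deliberately crude Hebbian caricature of our own construction; real spike-timing-dependent plasticity, as in Deneve (2008b), is far richer:

```python
import numpy as np

def learn_compatibility(samples, n_hi, n_lo, eta=0.02):
    """Hebbian sketch: a synaptic matrix W between higher-level states
    (feedback) and lower-level states (feedforward) is potentiated on
    coactivation and slowly decays, so each entry tracks the running
    co-occurrence frequency of its state pair."""
    W = np.zeros((n_hi, n_lo))
    for hi, lo in samples:
        W *= (1.0 - eta)  # slow, uniform decay
        W[hi, lo] += eta  # potentiate the coactive pair
    return W
```

After many samples, frequently co-occurring state pairs dominate W, which is the kind of statistical relation the compatibility functions are meant to encode.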

4.4.3 Message Normalization. An important aspect of the graphical model is that features (states) of the same variable, such as different orientations, are mutually exclusive. In the neuronal LINC, cross-variable inhibition (the large black rectangle in the center of Figure 4) normalizes the message values associated with different states of the same variable, such that the overall activity remains bounded. We suggest that in cortex, long axonal
branches of large, superficial basket cells play this role and ensure competition between states. This is supported, for example, by Kisvarday et al. (2002), who found that in the cat primary visual cortex, such cells supply specific inhibition between dissimilar orientations.
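Such cross-state competition amounts to a normalization of the message vector. A divisive-normalization rate sketch, our own abstraction of the pooled basket-cell feedback with arbitrary constants:

```python
import numpy as np

def normalize_states(inputs, dt=0.1, steps=500):
    """Pooled inhibition divides the drive to each state population, so
    relative message values are preserved while total activity stays
    bounded. Illustrative sketch, not the paper's circuit."""
    x = np.asarray(inputs, dtype=float)
    r = np.zeros_like(x)
    for _ in range(steps):
        pool = r.sum()                    # shared basket-cell signal
        r += dt * (x / (1.0 + pool) - r)  # divisively normalized drive
    return r
```

At the fixed point each rate is proportional to its input divided by one plus the total activity, so the ordering of state values is preserved while the total grows only sublinearly with the input.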

4.5 Predictions. Several predictions regarding cortical architecture can be made based on the graphical-like inference model suggested in this letter. First, a general prediction is that the two underlying connectivity patterns of the GDL (see Figure 1) exist in cortex. This prediction concerns the connectivity rather than the specific choice of operations (sum, max, or other options discussed in section 2) or the specific neuronal realization of messages, states, and variables. More concrete predictions are based on the specific cortical model (see section 4). Although we discussed supporting experimental evidence for this model, some aspects of the model are empirically unknown but predicted by it. A second, concrete prediction concerns the selective inhibition in minicolumns. In the model, max components associated with competing features of a single variable are separately evaluated in one maximization circuit. This requires that a set of superficial-layer neurons (associated with one source variable) have a corresponding set of one or more double bouquets (or related inhibitory types) that is preferentially and reciprocally connected to it. Two such sets are schematized in Figure 7. This preference may be induced by the vertical orientation of double-bouquet axons and pyramidal dendrites, but also by their synaptic learning rules. We predict that synapses between the double bouquets and the superficial pyramidals will exhibit activity-dependent plasticity and promote connections between coactive cells. This has a specific functional implication in the model: it promotes the self-organization of relatively independent clusters of reciprocally interconnected pyramidals and double bouquets. Learning has been demonstrated at inhibitory synapses on pyramidal neurons (Haas, Nowotny, & Abarbanel, 2006; Holmgren & Zilberter, 2001), but specific data for synapses from double-bouquet and basket cells are currently not available.
It would not be surprising if different learning rules characterized synapses formed by the two types of cells and gave rise to the two different connectivity patterns described earlier. A third prediction relates to the feedforward projections from pyramidal neurons of the superficial layers in a lower cortical area to their targets in layer 4 of a higher cortical area. We expect to find selective connectivity of the kind found by Yoshimura et al. (2005) between layer 4 and layers 2/3 within a single area. Based on the model, the projections from several functionally related superficial maximization nodes terminate together, specifically on the same target neuronal cliques in a higher area. For example, in Figure 8, three pairs of maximization nodes (minicolumns) in area A are specifically connected to three neuronal cliques in layer 4 of area B. In each pair, the maximization nodes are functionally related. Note that this prediction does
not concern the size or the spatial arrangement of the neuronal cliques, which may be spatially mixed or separated.

4.6 Summary of the Cortical Model. We have described several aspects of fine-scale cortical organization and proposed their correspondence to the elements of the neuronal inference circuit. Following are the main proposed relations between the graphical model and the cortical structure:

- Neuronal cliques of strongly interconnected excitatory and fast-spiking inhibitory basket neurons serve as basic building blocks for cortical circuits. Such cliques are typically spatially mixed in cortex.
- Neuronal cliques preferentially connect to a subset of other cliques in a manner that forms a cortical inference circuit.
- The activity of a neuronal clique can be interpreted as one numerical value (e.g., a likelihood), using the mean rate in the population during a short period of time.
- A layer 4 neuronal clique corresponds to a summation node of the graphical model. It receives input from several functionally related minicolumns in lower cortical areas, takes the sum (or mean) of their activity, and represents a message value.
- A layer 2/3 neuronal clique corresponds to a max component: it gets input from a specific layer 4 clique and represents a weighted version of its input message value.
- A minicolumn corresponds to a maximization node. It computes the maximal (or softmax) value over the several weighted message values represented by its neuronal cliques. Adapting inhibitory cells (double bouquets and related types) are reciprocally connected to all input cliques and drive their nonlinear responses.
- A cortical LINC corresponds to several maximization nodes (superficial-layer minicolumns) and summation nodes (layer 4 neuronal cliques).
- The long axonal branches of large basket cells are responsible for the normalization of message values, or cross-state competition, in cortically represented variables.
Overall, the efficiency of the distributed graphical-like inference is achieved by combining a selective connectivity scheme, based on states and variables (see Figure 1), and two processing stages: max of several weighted message values by layers 2/3, together with the sum of several maximal results by layer 4. The view of the local cortical circuitry as a network of neuronal cliques that selectively connect to each other according to a precise set of rules is clearly an abstraction rather than a detailed biological reality. However, the structural principles expressed by this abstraction may be useful for understanding the computational power of the cortical circuitry, as suggested by graphical models.
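The two processing stages named above (a max over weighted message values in layers 2/3, followed by a sum over the resulting maxima in layer 4) can be sketched at the level of the abstract computation. The following is an illustrative reconstruction, not the paper's simulation code; the message values and weights are hypothetical, and messages are assumed to be in the log domain.

```python
import numpy as np

def minicolumn_max(messages, weights):
    """Maximization node (a minicolumn): max over weighted log-domain messages."""
    return np.max(messages + weights)

def layer4_sum(max_values):
    """Summation node (a layer 4 clique): sum of the maxima computed by
    several functionally related minicolumns."""
    return np.sum(max_values)

# Toy example: one layer 4 clique pooling three minicolumns, each of which
# receives a three-state log-domain message (all values hypothetical).
msgs = [np.array([0.1, 0.7, 0.3]),
        np.array([0.5, 0.2, 0.4]),
        np.array([0.9, 0.6, 0.8])]
wts = [np.array([0.0, -0.2, -0.1])] * 3   # log-domain weights
out = layer4_sum([minicolumn_max(m, w) for m, w in zip(msgs, wts)])
```

Here the population activity that would encode each message value is collapsed to a single number, which is the interpretation the model assigns to a clique's mean rate.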


5 Conclusion

A graphical model is a representation that encodes the complex relations between elements using variables and their states, represented in a graph structure. This structure decomposes a complex joint function (such as a joint probability distribution) into simpler, local functions, which become amenable to efficient computation. The visual system may similarly represent, for example, the relationships between different parts of an object for the purpose of object recognition. A particular part, such as a mouth within a face, may appear in an image in different versions, such as open, closed, smiling, or neutral, at different locations and sizes. In the terminology of graphical models, the mouth is a variable that can have different values or states. When performing visual recognition, the system combines evidence from many observed variables to find the most likely states of all the variables and consequently produce an interpretation of the entire image. This corresponds to the inference performed in graphical models, which seeks the most likely assignment of the unobserved variables given the observed data. The model described in this letter combines for the first time the general structure of graphical models with a population-based, fast, cortical-like network model simulated with standard LIF neurons. The computations are obtained in biologically realistic inference time and produce highly likely approximate inference results. This model includes a number of novel aspects related to the computations performed by the network, its neuronal implementation, and its relation to known aspects of the cortical circuitry. These novel contributions are listed below and compared to corresponding properties of several previous related models.
With respect to the underlying computation, the model proposes a neuronal implementation based on the max-sum scheme, which is naturally derived from max-product belief revision shifted to the log domain. This pair of basic operations was previously used in cortical modeling (e.g., in Riesenhuber & Poggio, 1999), but it is used here for the first time for a graphical-like inference cortical model. Previous models that pioneered the use of graphical-like neuronal circuits used different pairs of basic operations. Rao (2004b) used the log-domain variant of a sum-product scheme, such that products become sums. The log of sums is approximated by a sum of weighted logs: \(\log(\sum_i M_i) \approx \sum_i a_i \log(M_i)\). In an alternative model with spiking neurons (Rao, 2005b), the log of sums is explicitly computed in dendrites, using log and exponential transformations. Deneve (2008a) suggested using exclusively binary variables, such that each neuron propagates single-value messages, encoding an estimate of the log likelihood ratio: \(\log(P(X_i = 1)/P(X_i = 0))\). A spiking model was proposed using a log-domain version of the sum-product message equation. Ott and Stoop (2006) also used binary variables, log likelihood ratios, and single-value messages carried by a single neuron. They showed that under some assumptions, a
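The equivalence between max-product messages and their log-domain max-sum counterparts, on which this choice of operations rests, can be checked directly. The potential and evidence values below are hypothetical; the snippet only illustrates that taking logs turns products into sums while leaving the max intact.

```python
import numpy as np

# Max-product message over the states j of a receiving variable:
#   m(j) = max_i psi(i, j) * v(i).
# Shifting to the log domain turns the product into a sum and preserves the max:
#   log m(j) = max_i [log psi(i, j) + log v(i)].
psi = np.array([[0.6, 0.4],
                [0.1, 0.9]])   # hypothetical pairwise potential
v = np.array([0.3, 0.7])       # hypothetical incoming evidence

m_prod = np.max(psi * v[:, None], axis=0)                   # max-product
m_sum = np.max(np.log(psi) + np.log(v)[:, None], axis=0)    # max-sum (log domain)

assert np.allclose(np.log(m_prod), m_sum)
```

Because the logarithm is monotonic, it commutes with the max; this is what makes max-sum an exact log-domain counterpart of max-product, rather than an approximation like the weighted-log treatment of sum-product.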


time-continuous, log-domain version of the sum-product equations for MRFs can be approximated by Hopfield network dynamics. Beck and Pouget (2007) suggested a neuronal implementation for the time-dependent inference equation of a chain HMM with time-varying input, based on a dynamical system of a recurrently connected neuronal population. Multiplication was used directly by the system, assuming it is implemented by dendrites. Ma et al. (2006) suggested implementing the product operation in a sum-product equation by representing the analog message values using a probabilistic population code, which takes advantage of the variability of tuning curves in a population. As discussed in sections 3 and 4, the current model, using sum-max operations, has an advantage in terms of its biological plausibility, speed of computation, and agreement with empirical data. Another unique aspect of the current model is the combination of a standard LIF single-unit model with a general, scalable circuit, namely, the support of an arbitrary graph structure and discrete variables with any number of states. This is obtained by intermediate-size subcircuits implementing maximization, summation, and normalization, which are replicated and connected as required by the underlying scheme of the general graphical-like inference circuit (see Figure 1). This approach is different from that of several past models (Beck & Pouget, 2007; Deneve, 2008a; Ma et al., 2006; Ott & Stoop, 2006; Rao, 2004a, 2005b), which suggested mapping a single neuronal unit to one element (a variable or a state) of the graphical model. In order to achieve the required graphical-like computation associated with a state or a variable using a single unit, previous models typically imposed some restrictions on the structure of the graphical model or the model of single-unit dynamics. For example, some models focused on hidden Markov chains or on graphs with binary variables only.
Depending on the specific implementation, these models may require fewer neurons for computing each message value and have simpler network connectivity than the current model. The current model supports a general graph structure and multivalue variables using repeating basic units, which appear consistent with the layered cortical architecture described in section 4. The current inference model was shown in simulations to find a highly likely interpretation of the input, in a large space of possible interpretations, within a biologically realistic time of a few hundred milliseconds, using spikes, with different graph structures and different settings of the dependencies between variables (see Figure 6). The fast computation is achieved by a high degree of parallelization at two levels. First, the computation is decomposed by the underlying graphical model message equation, which is implemented in parallel for all states and variables using many local inference circuits. Second, the basic operations are implemented by local circuits with multiple populations of neurons (see Figure 4). Each population represents an analog value using the mean rate in a short time period of a few tens of milliseconds. The summation, maximization, and normalization subcircuits were shown, using simulations, to quickly approximate the


desired output. For example, the maximization circuit finds the maximum value over several inputs within 25 ms, after typically a single spike per active neuron. While the input and output are coded with rates, the computation is implemented in terms of spiking dynamics in a population, not by rate equations. The use of populations for fast computation is common in neuronal modeling. However, using populations for full graphical-like inference is more difficult (Ma et al., 2006; Pouget, Dayan, & Zemel, 2003; Zemel, Dayan, & Pouget, 1998). The current model combines graphical model decomposition, parallel circuits, and spiking dynamics in populations to achieve fast inference. In the current model, all neurons in a population encoding a given value are assumed to have similar response properties. The accuracy of the representation and the associated computation depends on the population size. Given a fixed population size, there is a trade-off between the number of states that can be represented and the accuracy of representation: when the number of states increases, accuracy decreases. An interesting direction for further study is an extension of the max-sum LINC to support more efficient population coding, where a combination of the activities of many neurons, with different response properties, can approximately represent a density function. An extended max-sum inference circuit, which is based on combinatorial population coding, may be more economical in resources than the current circuit. It could also support interpolation; that is, not all states of the variable would have to be explicitly represented by an independent neuron or population. Representation and computation with population codes along these lines were proposed and studied, for example, by Abbott and Dayan (1999), Eliasmith and Anderson (2003), Jazayeri and Movshon (2006), Ma et al. (2006), Pouget et al. (2003), Seung and Sompolinsky (1993), Zemel et al.
(1998), and Zhang, Ginzburg, McNaughton, and Sejnowski (1998). The model addresses a number of basic anatomical and physiological aspects of the local cortical microcircuitry and proposes how they may naturally map onto elements of the graphical model implementation. In this, we follow previous studies of Dayan et al. (1995), Friston (2003), Hinton and Sejnowski (1983), Koechlin et al. (1999), Lee and Mumford (2003), Rao (2005a), Rao and Ballard (1999), Ullman (1995), and others, which suggested that feedforward and feedback connections may contribute to graphical-like processing in the cortical hierarchy. In addition to the large-scale connectivity of the cortical hierarchy, which was extensively discussed in previous works, our model suggests a connection between graphical-like processing and the following aspects of the local, fine-scale, cortical circuitry: the role of subpopulations of strongly connected neurons (neuronal cliques) and their unique connectivity patterns in different layers (see Figure 8); the roles of three major types of inhibitory neurons (small fast-spiking basket cells, large superficial basket cells, and double bouquets); and the proposed relation of minicolumns to maximization circuits (see Figure 7). It should


be noted that in discussing these relations, we do not suggest an exact correspondence between the mathematical graphical models and the cortical circuitry. Rather, the graphical models provide an approximation to important aspects of the cortical circuitry and help to raise a number of testable predictions for future studies, described in section 4. The model also raises a number of open questions. First, it does not address in its current form several important aspects of the cortical architecture, which may be directly relevant to graphical-like computation. These additional aspects include the large-scale structure of the underlying cortical graph, in particular, the feedback and its connectivity profile (Rockland, 1997); the function of unique types of inhibitory neurons (Markram et al., 2004), beyond the three qualitative archetypes discussed in the model; the role of specific subtypes of pyramidal neurons and of other excitatory neurons (e.g., spiny stellate cells of layer 4) in the graphical-like processing; firing adaptation of neurons; and short-term synaptic plasticity (Abbott & Regehr, 2004), which may affect the operation of summation and maximization nodes and contribute to the accuracy of normalization and to the convergence of the inference process. In addition, learning the statistical dependencies between variables is a central problem, complementary to the inference problem. It raises, in the context of the sum-max model, a number of open issues, including the local learning rules and their synaptic and intrinsic neuronal implementation, the global mechanisms that trigger the learning, and the details of the cortical pathways that deliver the required information to the learning site. A broader theory of graphical-like inference in cortex, which includes these aspects, is a challenge for further studies.
Understanding the structure of the cortical inference circuits may also be useful for improving our understanding of computational graphical models and how to use them effectively. Cortical solutions found by evolution may propose ways to overcome issues of inference speed, convergence, and accuracy when complex relations between real-world variables are involved. This may be learned from local aspects of the circuit, as discussed in our model, and from aspects of the large-scale cortical graphs, such as hierarchy, sparseness, and cycles. The external world, which we grasp through our sensory modalities, may be effectively modeled using graphs with some specific structural patterns, which the study of the cortex may help to reveal.

Appendix A: LIF Neurons

The subthreshold membrane potential of the LIF neuron is governed by

\[ C_m \frac{dV_m}{dt} = -\frac{1}{R_m}\,(V_m - V_{\mathrm{resting}}) + \big(I_{\mathrm{syn}}(t) + I_{\mathrm{inject}}\big). \tag{A.1} \]

When the membrane potential V_m crosses V_th, an action potential is generated and the membrane potential is set back to V_resting and clamped for a


Table 2: Neuronal and Synaptic Parameters Used in the Simulated Circuits.

Threshold potential: Vth ~ U(−47, −43) mV
Resting membrane potential: Vresting ~ U(−75, −65) mV
Membrane resistance: Rm = 100 MΩ
Membrane time constant: τm = Cm · Rm = 30 ms
Absolute refractory period: trefrac = 3 ms
Constant nonspecific background current: Iinject ~ U(0.161, 0.163) nA
Excitatory synapse time constant: τsynE = 3 ms
Inhibitory synapse time constant: τsynI = 6 ms
Synaptic transmission probability: psyn = 0.75

refractory period of trefrac. Isyn(t) is the time-dependent sum of the postsynaptic currents injected through all the input synapses. The momentary current at each synapse k is

\[ I_k(t) = w_k \cdot e^{-t/\tau_{\mathrm{syn}}}, \tag{A.2} \]

where t is the time since the last synaptic event. The synaptic events are nondeterministic and occur after a presynaptic spike with probability psyn. Neuronal and synaptic parameters are detailed in Table 2 and are based on data from Gray and McCormick (1996), Gupta, Wang, and Markram (2000), Hausser and Roth (1997), and Markram, Wang, and Tsodyks (1998).
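As a concrete illustration, the LIF dynamics of equation A.1 with the exponential synaptic current of equation A.2 can be integrated with a simple Euler scheme. The parameter values follow Table 2 (taking mid-range values for the uniformly distributed ones); the single 100 Hz Poisson input synapse and its weight w are illustrative assumptions, not values from the letter.

```python
import numpy as np

# Euler integration of the LIF equation A.1 with the exponential synaptic
# current of equation A.2. Parameters follow Table 2; the input spike train
# and the synaptic weight w are illustrative choices.
dt = 1e-4                       # 0.1 ms integration step
R_m = 100e6                     # membrane resistance, 100 MOhm
tau_m = 30e-3                   # membrane time constant
C_m = tau_m / R_m
V_rest, V_th = -70e-3, -45e-3   # mid-range of U(-75, -65) and U(-47, -43) mV
t_refrac = 3e-3                 # absolute refractory period
I_inject = 0.162e-9             # mid-range background current
tau_syn = 3e-3                  # excitatory synapse time constant
p_syn = 0.75                    # synaptic transmission probability
w = 0.2e-9                      # synaptic weight (peak of I_k), hypothetical

rng = np.random.default_rng(0)
V, last_syn, refrac_until, spikes = V_rest, -np.inf, 0.0, []
for step in range(int(0.5 / dt)):           # simulate 500 ms
    t = step * dt
    # A presynaptic Poisson spike triggers a synaptic event with prob. p_syn.
    if rng.random() < 100 * dt and rng.random() < p_syn:
        last_syn = t
    I_syn = w * np.exp(-(t - last_syn) / tau_syn) if np.isfinite(last_syn) else 0.0
    if t < refrac_until:
        V = V_rest                          # clamped during refractoriness
        continue
    V += (dt / C_m) * (-(V - V_rest) / R_m + I_syn + I_inject)
    if V >= V_th:                           # threshold crossing: spike and reset
        spikes.append(t)
        V = V_rest
        refrac_until = t + t_refrac
```

With these parameters, the background current alone drives the membrane toward roughly V_rest + R_m · I_inject ≈ −53.8 mV, below threshold, so spikes depend on the synaptic input.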

Appendix B: Maximization Circuit: Analysis of Population Activity

In section 3.3 we described the maximization circuit. Here we show that the circuit approximates the max operation, using a simplified model of population activity. Each excitatory population in Figure 3 (left, three red rectangles) is modeled as a single unit, which responds linearly to its cumulative synaptic input. When discrete time steps are used, the output activity E_i(t) of excitatory population i at time t is described by

\[ E_i(t) = [P_i(t)]^{+}, \quad \text{where } P_i(t) = P_i(t-1) + u_i(t-1) - I(t-1) = \sum_{s=1}^{t-1} \big( u_i(s) - I(s) \big). \tag{B.1} \]

The membrane potential P_i(t) of unit i accumulates all excitatory and inhibitory inputs before time t, u_i(t) is the external input from source i at time t, and I(t) is the activity of the inhibitory population (elongated black rectangle, Figure 3). We model the inhibitory activity as an instantaneous response to the overall momentary excitatory input, \( I(t) = \sum_{i=1}^{n} E_i(t) \), assuming fast synapses.

Let \( Z(t) = \sum_{s=1}^{t-1} \sum_{i=1}^{n} E_i(s) = \sum_{s=1}^{t-1} I(s) \) be the accumulated excitatory activity of all input populations over all time steps before t. This is also the accumulated activity of the inhibitory population, as well as the accumulated activity of the output population of the circuit (see Figure 3, dashed rectangle at the top). First, we analyze the special case where the input values a_i are presented to the circuit sequentially starting from t = 0, that is, u_i(t) = a_i for t = i − 1 and u_i(t) = 0 otherwise, for i = 1, ..., n. At time step t, only the single excitatory unit with index t, which has received the external input u_t(t − 1) ≥ 0, may be active with E_t(t) > 0. The instantaneous inhibitory activity is then equal to the activity of this excitatory unit: I(t) = E_t(t). On the next time step, t + 1, this inhibitory activity is distributed with a minus sign to all excitatory units, including unit t, which ensures that E_t(s) = 0 for all s ≥ t + 1. Under these conditions,

\[ I(t) = \sum_{i=1}^{n} E_i(t) = E_t(t) = \Big[ \sum_{s=1}^{t-1} \big( u_t(s) - I(s) \big) \Big]^{+} \tag{B.2} \]

from equation B.1. Since only u_t(t − 1) > 0,

\[ = \Big[\, u_t(t-1) - \sum_{s=1}^{t-1} I(s) \Big]^{+} = \big[ u_t(t-1) - Z(t) \big]^{+}. \tag{B.3} \]

We can now show by induction on t that \( Z(t+1) = \max_{i=1,\dots,t} u_i(i-1) \), as follows:

\[ Z(t+1) = Z(t) + I(t) = Z(t) + \big[ u_t(t-1) - Z(t) \big]^{+} = \max\big( Z(t),\, u_t(t-1) \big). \tag{B.4} \]

The last step holds because \( a + [b - a]^{+} = \max(a, b) \). Using the induction assumption,
\[ Z(t+1) = \max\Big( \max_{i=1,\dots,t-1} u_i(i-1),\; u_t(t-1) \Big) = \max_{i=1,\dots,t} u_i(i-1). \tag{B.5} \]

This shows that when the input values a_1, ..., a_n are presented to the system sequentially at time steps t = 0, ..., n − 1, the accumulated response of the output population at time T = n + 1 is the maximal value: \( Z(n+1) = \max_{i=1,\dots,n} a_i \). In the more general, nonsequential case, the input value at each external source line is arbitrarily divided across all time steps 0, ..., T − 1. We use small time steps and assume that at every time step t, only a single excitatory


population with index k may be active, that is, E_k(t) ≥ 0. This also means that I(t) = E_k(t). Let \( a_k(t) = \sum_{s=0}^{t-1} u_k(s) \) be the total input that has already arrived at unit k by time t. Based on equation B.1 and the definition of Z, the instantaneous inhibitory activity is then

\[ E_i(t) = \Big[\, a_i(t) - \sum_{s=1}^{t-1} I(s) \Big]^{+} = \big[ a_i(t) - Z(t) \big]^{+}. \tag{B.6} \]

In particular, \( I(t) = [a_k(t) - Z(t)]^{+} \). Using \( a + [b - a]^{+} = \max(a, b) \), we get

\[ Z(t+1) = Z(t) + I(t) = Z(t) + \big[ a_k(t) - Z(t) \big]^{+} = \max\big( Z(t),\, a_k(t) \big). \tag{B.7} \]

The induction assumption in this case is \( Z(t) = \max_{i=1,\dots,n} a_i(t-1) = a_{k'}(t-1) \), where k' is the single active unit at time t − 1. Based on this assumption and on equation B.7,
\[ Z(t+1) = \max\big( Z(t),\, a_k(t) \big) = \max\Big( \max_{i=1,\dots,n} a_i(t-1),\; a_k(t) \Big). \tag{B.8} \]

Since only E_k(t) is active at time t, \( E_k(t) = [a_k(t) - Z(t)]^{+} \ge 0 \) and \( E_i(t) = [a_i(t) - Z(t)]^{+} = 0 \) for all i ≠ k; hence \( a_k(t) \ge a_i(t) \) for all i ≠ k. Each a_i(t) is monotonically increasing in t; hence \( \max_{i=1,\dots,n} a_i(t-1) \le \max_{i=1,\dots,n} a_i(t) \). Together this implies that \( Z(t+1) = \max_{i=1,\dots,n} a_i(t) \), as required. Since for all i, \( a_i(T) = \sum_{s=0}^{T-1} u_i(s) = a_i \), at time T + 1 we get \( Z(T+1) = \max_{i=1,\dots,n} a_i(T) = \max_{i=1,\dots,n} a_i \).
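The discrete-time recursion of equations B.1 to B.8 is easy to simulate directly. The sketch below is an illustrative reimplementation of the simplified population model (not the spiking circuit itself); for the sequentially presented inputs analyzed above, the accumulated output Z converges to the maximal input value.

```python
import numpy as np

# Discrete-time model of the maximization circuit (equations B.1-B.8):
# rectified linear excitatory units that accumulate input, and an inhibitory
# unit that instantaneously feeds back their summed activity.
def max_circuit(u):
    """u[t, i] is the external input u_i(t). Returns the accumulated output
    activity Z, which should approach max_i sum_t u[t, i]."""
    T, n = u.shape
    P = np.zeros(n)        # membrane potentials P_i(t)
    Z = 0.0                # accumulated output (and inhibitory) activity
    for t in range(T + 1):
        E = np.maximum(P, 0.0)              # E_i(t) = [P_i(t)]^+   (B.1)
        I = E.sum()                         # instantaneous inhibition I(t)
        Z += I
        u_t = u[t] if t < T else np.zeros(n)
        P = P + u_t - I                     # P_i(t+1) = P_i(t) + u_i(t) - I(t)
    return Z

# Sequential presentation, the special case analyzed first:
a = np.array([0.4, 0.9, 0.2])
u_seq = np.diag(a)          # u_i(t) = a_i at t = i - 1, zero otherwise
print(max_circuit(u_seq))   # prints 0.9, the maximal input value
```

Tracing the run: I(1) = 0.4, I(2) = [0.9 − 0.4]^+ = 0.5, and I(t) = 0 afterward, so Z accumulates to 0.9, matching the induction argument above.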

Appendix C: Normalization Circuit

The normalization circuit approximates the normalization step (see equation 3.4, with η = 1.2) of the belief consolidation scheme. The circuit is shown in Figure 4 (green dotted frame) as part of a LINC. The connectivity parameters (percentages, indicated by the arrows) were chosen using an optimization procedure, similar to the maximization and linear circuits. The circuit approximation error was tested in a simulation using randomly selected sets of 36 inputs in the range [0, 1]. The mean error for 200 independent trials was 0.075 (±0.092). When the connectivity parameters were randomly jittered in the range of 0.9 to 1.1 of their original values, the mean error over 5000 trials was slightly higher: 0.105 (±0.102). The circuit is also robust to different numbers of inputs.


Acknowledgments

We thank Rodney Douglas for the inspiring discussions on spiking circuits and cortical architecture and for his contribution in raising financial support. This work was supported by ISF grant 7-0369 and EU IST grant FP6-2005-015803 and conducted at the Moross Laboratory for Vision and Motor Control (Rehovot, Israel) and in part at the Institute for Neuroinformatics (Zurich, Switzerland).

References

Abbott, L. F., & Dayan, P. (1999). The effect of correlated variability on the accuracy of a population code. Neural Computation, 11(1), 91–101.
Abbott, L. F., & Nelson, S. B. (2000). Synaptic plasticity: Taming the beast. Nature Neuroscience, 3, 1178–1183.
Abbott, L. F., & Regehr, W. G. (2004). Synaptic computation. Nature, 431(7010), 796–803.
Aji, S. M., & McEliece, R. J. (2000). The generalized distributive law. IEEE Transactions on Information Theory, 46(2), 325–343.
Amirikian, B., & Georgopoulos, A. P. (2003). Modular organization of directionally tuned cells in the motor cortex: Is there a short-range order? Proceedings of the National Academy of Sciences, 100, 12474–12479.
Anderson, J. C., & Martin, K. A. (2002). Connection from cortical area V2 to MT in macaque monkey. Journal of Comparative Neurology, 443(1), 56–70.
Angelucci, A., Levitt, J. B., Walton, E. J., Hupe, J. M., Bullier, J., & Lund, J. S. (2002). Circuits for local and global signal integration in primary visual cortex. Journal of Neuroscience, 22(19), 8633–8646.
Beck, J. M., & Pouget, A. (2007). Exact inferences in a neural implementation of a hidden Markov model. Neural Computation, 19(5), 1344–1361.
Besag, J. (1974). Spatial interaction and the statistical analysis of lattice systems. Journal of the Royal Statistical Society, 36(2), 192–326.
Binzegger, T., Douglas, R. J., & Martin, K. A. (2004). A quantitative map of the circuit of cat primary visual cortex. Journal of Neuroscience, 24(39), 8441–8453.
Buxhoeveden, D. P., & Casanova, M. F. (2002). The minicolumn hypothesis in neuroscience. Brain, 125(5), 935–951.
Carpenter, R. H., & Williams, M. L. (1995). Neural computation of log likelihood in control of saccadic eye movements. Nature, 377(6544), 59–62.
Cooper, G. F. (1990). The computational complexity of probabilistic inference using Bayesian belief networks. Artificial Intelligence, 42(2–3), 393–405.
Dayan, P., Hinton, G. E., Neal, R. M., & Zemel, R. S. (1995). The Helmholtz machine.
Neural Computation, 7, 1022–1037.
DeFelipe, J., Hendry, S. H., Hashikawa, T., Molinari, M., & Jones, E. G. (1990). A microcolumnar structure of monkey cerebral cortex revealed by immunocytochemical studies of double bouquet cell axons. Neuroscience, 37(3), 655–673.
Deneve, S. (2008a). Bayesian spiking neurons I: Inference. Neural Computation, 20(1), 91–117.


Deneve, S. (2008b). Bayesian spiking neurons II: Learning. Neural Computation, 20(1), 118–145.
Douglas, R. J., & Martin, K. A. C. (2004). Neuronal circuits of the neocortex. Annual Review of Neuroscience, 27, 419–451.
Doya, K., Ishii, S., Alexandre, P., & Rao, R. P. N. (Eds.). (2007). Bayesian brain. Cambridge, MA: MIT Press.
Eliasmith, C., & Anderson, C. H. (2003). Neural engineering: Computation, representation, and dynamics in neurobiological systems. Cambridge, MA: MIT Press.
Epshtein, B., & Ullman, S. (2005). Feature hierarchies for object classification. In Proceedings of the International Conference on Computer Vision (pp. 220–227).
Ernst, M. O., & Banks, M. S. (2002). Humans integrate visual and haptic information in a statistically optimal fashion. Nature, 415(6870), 429–433.
Felleman, D. J., & Van Essen, D. C. (1991). Distributed hierarchical processing in the cerebral cortex. Cerebral Cortex, 1(1), 1–47.
Freeman, W. T., Pasztor, E. C., & Carmichael, O. T. (2000). Learning low-level vision. International Journal of Computer Vision, 40(1), 25–47.
Friston, K. (2003). Learning and inference in the brain. Neural Networks, 16(9), 1325–1352.
Fujita, I. (2002). The inferior temporal cortex: Architecture, computation, and representation. Journal of Neurocytology, 31(3–5), 359–371.
Gawne, T. J., & Martin, J. M. (2002). Responses of primate visual cortical V4 neurons to simultaneously presented stimuli. Journal of Neurophysiology, 88(3), 1128–1135.
Gerstner, W., & Kistler, W. M. (2002). Spiking neuron models. Cambridge: Cambridge University Press.
Ghahramani, Z. (2002). An introduction to hidden Markov models and Bayesian networks. International Journal of Pattern Recognition and Artificial Intelligence, 15(1), 9–42.
Gray, C. M., & McCormick, D. A. (1996). Chattering cells: Superficial pyramidal neurons contributing to the generation of synchronous oscillations in the visual cortex. Science, 274(5284), 109–113.
Gupta, A., Wang, Y., & Markram, H. (2000).
Organizing principles for a diversity of GABAergic interneurons and synapses in the neocortex. Science, 287(5451), 273–278.
Haas, J. S., Nowotny, T., & Abarbanel, H. D. (2006). Spike-timing-dependent plasticity of inhibitory synapses in the entorhinal cortex. Journal of Neurophysiology, 96(6), 3305–3313.
Hausser, M., & Mel, B. (2003). Dendrites: Bug or feature? Current Opinion in Neurobiology, 13(3), 372–383.
Hausser, M., & Roth, A. (1997). Estimating the time course of the excitatory synaptic conductance in neocortical pyramidal cells using a novel voltage jump method. Journal of Neuroscience, 17(20), 7606–7625.
Heskes, T. (2004). On the uniqueness of loopy belief propagation fixed points. Neural Computation, 16(11), 2379–2413.
Hinton, G. E., Dayan, P., Frey, B. J., & Neal, R. M. (1995). The wake-sleep algorithm for unsupervised neural networks. Science, 268, 1158–1160.
Hinton, G. E., & Sejnowski, T. J. (1983). Optimal perceptual inference. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 448–453). Piscataway, NJ: IEEE Press.


Hinton, G. E., & Sejnowski, T. J. (1986). Learning and relearning in Boltzmann machines. In D. E. Rumelhart & J. L. McClelland (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition, Vol. 1: Foundations (pp. 282–317). Cambridge, MA: MIT Press.
Holmgren, C. D., & Zilberter, Y. (2001). Coincident spiking activity induces long-term changes in inhibition of neocortical pyramidal cells. Journal of Neuroscience, 21(20), 8270–8277.
Huys, Q. J., Zemel, R. S., Natarajan, R., & Dayan, P. (2007). Fast population coding. Neural Computation, 19(2), 404–441.
Ichinohe, N., Fujiyama, F., Kaneko, T., & Rockland, K. S. (2003). Honeycomb-like mosaic at the border of layers 1 and 2 in the cerebral cortex. Journal of Neuroscience, 23(4), 1372–1382.
Ichinohe, N., & Rockland, K. S. (2004). Region specific micromodularity in the uppermost layers in primate cerebral cortex. Cerebral Cortex, 14(11), 1173–1184.
Jaimovich, A., Elidan, G., Margalit, H., & Friedman, N. (2006). Towards an integrated protein-protein interaction network: A relational Markov network approach. Journal of Computational Biology, 13(2), 145–164.
Jazayeri, M., & Movshon, J. A. (2006). Optimal representation of sensory information by neural populations. Nature Neuroscience, 9(5), 690–696.
Johnson, R. R., & Burkhalter, A. (1997). A polysynaptic feedback circuit in rat visual cortex. Journal of Neuroscience, 17(18), 7129–7140.
Jones, E. G. (2000). Microcolumns in the cerebral cortex. Proceedings of the National Academy of Sciences, 97, 5019–5021.
Jordan, M. I. (1999). Learning in graphical models. Cambridge, MA: MIT Press.
Jordan, M. I., & Sejnowski, T. J. (2001). Graphical models: Foundations of neural computation. Cambridge, MA: MIT Press.
Jordan, M. I., & Weiss, Y. (2002). Probabilistic inference in graphical models. In M. Arbib (Ed.), Handbook of neural networks and brain theory (2nd ed.). Cambridge, MA: MIT Press.
Kampa, B. M., Letzkus, J. J., & Stuart, G. J. (2006).
Cortical feed-forward networks for binding different streams of sensory information. Nature Neuroscience, 9(12), 1472–1473.
Kisvarday, Z. F., Ferecsko, A. S., Kovacs, K., Buzas, P., Budd, J. M., & Eysel, U. T. (2002). One axon–multiple functions: Specificity of lateral inhibitory connections by large basket cells. Journal of Neurocytology, 31(3–5), 255–264.
Knill, D. C., & Richards, W. (1996). Perception as Bayesian inference. Cambridge: Cambridge University Press.
Koechlin, E., Anton, J. L., & Burnod, Y. (1999). Bayesian inference in populations of cortical neurons: A model of motion integration and segmentation in area MT. Biological Cybernetics, 80(1), 25–44.
Kording, K. P., & Wolpert, D. M. (2004). Bayesian integration in sensorimotor learning. Nature, 427(6971), 244–247.
Kschischang, F. R., Frey, B. J., & Loeliger, H.-A. (2001). Factor graphs and the sum-product algorithm. IEEE Transactions on Information Theory, 47(2), 498–519.
Lampl, I., Ferster, D., Poggio, T., & Riesenhuber, M. (2004). Intracellular measurements of spatial integration and the MAX operation in complex cells of the cat primary visual cortex. Journal of Neurophysiology, 92(5), 2704–2713.


Lee, T., & Mumford, D. (2003). Hierarchical Bayesian inference in the visual cortex. Journal of the Optical Society of America A, 20(7), 1434–1448.
Lev, D. L., & White, E. L. (1997). Organization of pyramidal cell apical dendrites and composition of dendritic clusters in the mouse: Emphasis on primary motor cortex. European Journal of Neuroscience, 9(2), 280–290.
Ma, W. J., Beck, J. M., Latham, P. E., & Pouget, A. (2006). Bayesian inference with probabilistic population codes. Nature Neuroscience, 9(11), 1432–1438.
Maass, W., & Natschläger, T. (2000). A model for fast analog computation based on unreliable synapses. Neural Computation, 12(7), 1679–1704.
MacKay, D. J. C. (2003). Information theory, inference, and learning algorithms. Cambridge: Cambridge University Press.
Markram, H., Toledo-Rodriguez, M., Wang, Y., Gupta, A., Silberberg, G., & Wu, C. (2004). Interneurons of the neocortical inhibitory system. Nature Reviews Neuroscience, 5(10), 793–807.
Markram, H., Wang, Y., & Tsodyks, M. (1998). Differential signaling via the same axon of neocortical pyramidal neurons. Proceedings of the National Academy of Sciences, 95, 5323–5328.
Mel, B. W., & Koch, C. (1990). Sigma-pi learning: On radial basis functions and cortical associative learning. In D. Touretzky (Ed.), Advances in neural information processing systems, 2 (pp. 474–481). San Francisco: Morgan Kaufmann.
Meltzer, T., Yanover, C., & Weiss, Y. (2005). Globally optimal solutions for energy minimization in stereo vision using reweighted belief propagation. In Proceedings of the IEEE International Conference on Computer Vision (pp. 428–435). Piscataway, NJ: IEEE Press.
Mountcastle, V. B. (2003). Introduction: Computation in cortical columns. Cerebral Cortex, 13(1), 2–4.
Murphy, K., Weiss, Y., & Jordan, M. (1999). Loopy belief propagation for approximate inference: An empirical study. In Proceedings of the Conference on Uncertainty in Artificial Intelligence (pp. 467–475). San Francisco: Morgan Kaufmann.
Natschläger, T., Markram, H., & Maass, W. (2003). Computer models and analysis tools for neural microcircuits. In R. Kötter (Ed.), Neuroscience databases: A practical guide (pp. 123–138). Norwell, MA: Kluwer.
Ohki, K., Chung, S., Ch'ng, Y. H., Kara, P., & Reid, R. C. (2005). Functional imaging with cellular resolution reveals precise micro-architecture in visual cortex. Nature, 433(7026), 597–603.
Ott, T., & Stoop, R. (2006). The neurodynamics of belief propagation on binary Markov random fields. In B. Schölkopf, J. Platt, & T. Hoffman (Eds.), Advances in neural information processing systems, 18 (pp. 1057–1064). Cambridge, MA: MIT Press.
Pearl, J. (1987). Distributed revision of composite beliefs. Artificial Intelligence, 33(2), 173–215.
Pearl, J. (1988). Probabilistic reasoning in intelligent systems: Networks of plausible inference. San Mateo, CA: Morgan Kaufmann.
Peters, A., & Sethares, C. (1997). The organization of double bouquet cells in monkey striate cortex. Journal of Neurocytology, 26(12), 779–797.
Pouget, A., Dayan, P., & Zemel, R. S. (2003). Inference and computation with population codes. Annual Review of Neuroscience, 26, 381–410.

Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/neco.2009.05-08-783 by guest on 26 September 2021 Cortical Circuitry Implementing Graphical Models 3055

Rao, R. P. (2004a). Bayesian computation in recurrent neural circuits. Neural Computation, 16(1), 1–38.
Rao, R. P. N. (2004b). Bayesian computation in recurrent neural circuits. Neural Computation, 16(1), 1–38.
Rao, R. P. (2005a). Bayesian inference and attentional modulation in the visual cortex. Neuroreport, 16(16), 1843–1848.
Rao, R. P. N. (2005b). Hierarchical Bayesian inference in networks of spiking neurons. In L. K. Saul, Y. Weiss, & L. Bottou (Eds.), Advances in neural information processing systems, 17 (pp. 1113–1120). Cambridge, MA: MIT Press.
Rao, R. P., & Ballard, D. H. (1999). Predictive coding in the visual cortex: A functional interpretation of some extra-classical receptive-field effects. Nature Neuroscience, 2(1), 79–87.
Rao, R. P. N., Olshausen, B. A., & Lewicki, M. S. (Eds.). (2002). Probabilistic models of the brain: Perception and neural function. Cambridge, MA: MIT Press.
Riesenhuber, M., & Poggio, T. (1999). Hierarchical models of object recognition in cortex. Nature Neuroscience, 2(11), 1019–1025.
Rockland, K. S. (1997). Elements of cortical architecture: Hierarchy revisited. In K. S. Rockland, J. H. Kaas, & A. Peters (Eds.), Cerebral cortex: Extrastriate cortex in primates (pp. 243–293). New York: Plenum Press.
Rockland, K. S., & Ichinohe, N. (2004). Some thoughts on cortical minicolumns. Experimental Brain Research, 158(3), 265–277.
Sahani, M., & Dayan, P. (2003). Doubly distributional population codes: Simultaneous representation of uncertainty and multiplicity. Neural Computation, 15(10), 2255–2279.
Salin, P. A., Kennedy, H., & Bullier, J. (1995). Spatial reciprocity of connections between areas 17 and 18 in the cat. Canadian Journal of Physiology and Pharmacology, 73(9), 1339–1347.
Samejima, K., Doya, K., Ueda, K., & Kimura, M. (2004). Estimating internal variables and parameters of a learning agent by a particle filter. In S. Thrun, L. Saul, & B. Schölkopf (Eds.), Advances in neural information processing systems, 16 (pp. 1335–1342). Cambridge, MA: MIT Press.
Seung, H. S., & Sompolinsky, H. (1993). Simple models for reading neuronal population codes. Proceedings of the National Academy of Sciences, 90, 10749–10753.
Song, S., Sjöström, P. J., Reigl, M., Nelson, S., & Chklovskii, D. B. (2005). Highly nonrandom features of synaptic connectivity in local cortical circuits. PLoS Biology, 3(3), e68.
Stocker, A. A., & Simoncelli, E. P. (2006). Noise characteristics and prior expectations in human visual speed perception. Nature Neuroscience, 9(4), 578–585.
Ullman, S. (1995). Sequence seeking and counter streams: A computational model for bidirectional information flow in the visual cortex. Cerebral Cortex, 5(1), 1–11.
Wang, Y., Markram, H., Goodman, P. H., Berger, T. K., Ma, J., & Goldman-Rakic, P. S. (2006). Heterogeneity in the pyramidal network of the medial prefrontal cortex. Nature Neuroscience, 9(4), 534–542.
Weiss, Y. (1997). Belief propagation and revision in networks with loops (No. AIM-1616). Cambridge, MA: Massachusetts Institute of Technology.
Weiss, Y., Simoncelli, E. P., & Adelson, E. H. (2002). Motion illusions as optimal percepts. Nature Neuroscience, 5(6), 598–604.


Wilson, H. R., & Cowan, J. D. (1972). Excitatory and inhibitory interactions in localized populations of model neurons. Biophysical Journal, 12(1), 1–24.
Yang, T., & Shadlen, M. N. (2007). Probabilistic reasoning by neurons. Nature, 447(7148), 1075–1080.
Yedidia, J., Freeman, W., & Weiss, Y. (2003). Understanding belief propagation and its generalizations. In G. Lakemeyer & B. Nebel (Eds.), Exploring artificial intelligence in the new millennium (pp. 239–269). San Francisco: Morgan Kaufmann.
Yoshimura, Y., & Callaway, E. M. (2005). Fine-scale specificity of cortical networks depends on inhibitory cell type and connectivity. Nature Neuroscience, 8(11), 1552–1559.
Yoshimura, Y., Dantzker, J. L., & Callaway, E. M. (2005). Excitatory cortical neurons form fine-scale functional networks. Nature, 433(7028), 868–873.
Yu, A. J., Giese, M. A., & Poggio, T. A. (2002). Biophysiologically plausible implementations of the maximum operation. Neural Computation, 14(12), 2857–2881.
Yuille, A. L. (2002). CCCP algorithms to minimize the Bethe and Kikuchi free energies: Convergent alternatives to belief propagation. Neural Computation, 14(7), 1691–1722.
Zemel, R. S., Dayan, P., & Pouget, A. (1998). Probabilistic interpretation of population codes. Neural Computation, 10(2), 403–430.
Zhang, K., Ginzburg, I., McNaughton, B. L., & Sejnowski, T. J. (1998). Interpreting neuronal population activity by reconstruction: Unified framework with application to hippocampal place cells. Journal of Neurophysiology, 79(2), 1017–1044.

Received May 9, 2008; accepted March 27, 2009.
