Markov Transitions Between Attractor States in a Recurrent Neural Network
Jeremy Bernstein∗ (Computation and Neural Systems, California Institute of Technology, USA)
Ishita Dasgupta∗ (Department of Physics, Harvard University, USA)
David Rolnick∗ (Department of Mathematics, Massachusetts Institute of Technology, USA)
Haim Sompolinsky∗† (The Edmond and Lily Safra Center for Brain Sciences, Hebrew University of Jerusalem, Israel)

∗All authors contributed equally to this work.
†Also: Center for Brain Science, Harvard University, USA
Copyright © 2017, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Stochasticity is an essential part of explaining the world. Increasingly, neuroscientists and cognitive scientists are identifying mechanisms whereby the brain uses probabilistic reasoning in representational, predictive, and generative settings. But stochasticity is not always useful: robust perception and memory retrieval require representations that are immune to corruption by stochastic noise. In an effort to combine these robust representations with stochastic computation, we present an architecture that generalizes traditional recurrent attractor networks to follow probabilistic Markov dynamics between stable and noise-resistant fixed points.

Motivation

With the advancement of probabilistic theories of human cognition (Griffiths et al. 2010), there has been increasing interest in neural mechanisms that can represent and compute these probabilities. Several new models of neural computation carry out Bayesian probabilistic inference taking into account both data and prior knowledge, and can represent uncertainty about the conclusions they draw (Ma, Beck, and Pouget 2008; Pecevski, Buesing, and Maass 2011; Shi et al. 2010). In many tasks, neural mechanisms are required that can transition stochastically to a new state depending on the current state: for example, to predict the path of a moving object (Vul et al. 2009), gauge the effect of a collision (Sanborn and Griffiths 2009), or estimate the dynamic motion of fluids (Bates et al. 2015), as well as in the general context of carrying out correlated sampling over a posterior distribution (Gershman, Vul, and Tenenbaum 2012; Bonawitz et al. 2014; Denison et al. 2013). The Markov transition probabilities in these cases are dictated by knowledge of the world. The stochasticity of transitions allows decisions that are tempered by uncertainty, rather than making a "best guess" or point estimate that is agnostic to uncertainty and is chosen deterministically based on some measure of optimality. Further, Markov chain Monte Carlo methods (Neal 1993) allow us to engineer a Markov chain with stationary distribution equal to any distribution of interest. Therefore a simple Markov chain with the right transition probabilities can also form the basis for neurally plausible probabilistic inference on a discrete state space.

It is important here to distinguish between stochasticity in our perception or neural representation of states, and stochasticity incorporated into a computational step. The first is unavoidable and due to noise in our sensory modalities and communication channels. The second is inherent to a process the brain is carrying out in order to make probabilistic judgments, and represents useful information about the structure of the environment. While it is difficult to tease apart these sources of noise and variability, (Beck et al. 2012) suggest that sensory or representational noise is not the primary reason for trial-to-trial variability seen in human responses and that there are other sources of stochasticity arising from the process of inference that might be more important and influential in explaining observed behavioral variability. Humans are in fact remarkably immune to noise in percepts - for example when identifying occluded objects (Johnson and Olshausen 2005) and filtering out one source of sound amid ambient noise (Handel 1993).

Hopfield networks represent an effective model for storage and representation that is largely immune to noise; different noisy or partial sensory percepts all converge to the same memory as long as they fall within that memory's basin of attraction. These "memory" states are represented in a distributed system and are robust to the death of individual neurons. Stochastic transitions in Hopfield networks therefore are a step towards stochastic computation that still ensures a noise-robust representation of states.

The Markov chain dynamics we model also have applications in systems where experimental verification is more lucid. For example, the Bengalese finch's song has been effectively modeled as a hidden Markov model (Jin and Kozhevnikov 2011). While deterministic birdsong in the zebra finch has previously been modeled by feedforward chains of neurons in HVC (Long and Fee 2008), our network provides a potential neural model for stochastic birdsong. Further, its specific structure has possible parallels in songbird neural architecture, as we later detail.

Background

A Hopfield network (Hopfield 1982) is a network of binary neurons with recurrent connections given by a symmetric synaptic weight matrix, $J_{ij}$. The state $x_i$ of the $i$th neuron is updated according to the following rule:

$$x_i \leftarrow \operatorname{sign}\left(\sum_{j=1}^{n} J_{ij} x_j\right) \tag{1}$$

With this update rule, every initial state of the network deterministically falls into one of a number of stable fixed points which are preserved under updates. The identity of these fixed points (attractors or memories) can be controlled by appropriate choice of $J_{ij}$, according to any of various learning rules (Hebb 2005; Rosenblatt 1957; Storkey 1997; Hillar, Sohl-Dickstein, and Koepsell 2012). If the network is initialized at a corrupted version of a memory, it is then able to converge to the true memory, provided that the corrupted/noisy initialization falls within the true memory's basin of attraction. This allows Hopfield networks to be a model for content-addressable, associative memory.

Due to symmetry of weights, a traditional Hopfield network always converges to a stable attractor state. By adding asymmetric connections, it is possible to induce transitions between the attractor states. (Sompolinsky and Kanter 1986) show that a set of deterministic transitions between attractor states can be learned with a Hebbian learning rule, by means of time-delayed slow connections. Here, the transition structure is built into the synapses of the network and is not stochastic. The challenge we address in this paper is to leverage what we know from past work about deterministic transitions in attractor networks and combine it with a source of noise to make these transitions stochastic, with controllable Markov probabilities for each transition.

Network architecture

We propose a network consisting of three parts: A memory network, a noise network, and a mixed network (see Fig. 1). The memory network taken by itself is an attractor network with stabilizing recurrent connections; it stores states of the Markov chain as attractors. The noise network also stores a number of attractor states (the noise states); in its case, the transitions between attractors occur uniformly at random.

The mixed network is another attractor network, which receives input from both the memory and noise networks, according to fixed random weights. The attractors (mixed states) of the mixed network are chosen according to the memory and noise states; thus, a different pair of memory state and noise state will induce the mixed network to fall into a different attractor. The memory network receives input from the mixed network, which induces it to transition between the memory attractor states.

[...] the mixed network should be linearly separable. A simple concatenation of memory and noise states would result in a strong linear dependence between mixed states, making them difficult to linearly separate (Cover 1965). We recover linear separability in our model by instead constructing the mixed network as a random projection of memory and noise states into a higher dimensional space (Barak, Rigotti, and Fusi 2013).

The connections from the mixed network back to the memory network that induce the transition are slow connections (see (Sompolinsky and Kanter 1986)); they are time-delayed by a constant $\tau$ and are active at intervals of $\tau$. This allows the memory network to stabilize its previous state before a transition occurs. Thus, at every time step, each memory neuron takes a time-delayed linear readout from the mixed representation, adds it to the Hopfield contribution from the memory network and passes the sum through a threshold non-linearity.

Formally, the dynamics are given by the following equations, where $x^M_i$ ($0 \le i < n_M$), $x^N_j$ ($0 \le j < n_N$), and $x^Q_k$ ($0 \le k < n_Q$) denote states of neurons in the memory network, noise network, and mixed network, respectively. The function $\delta^{\mathrm{mod}}_\tau(t)$ is 1 when $t \equiv 0 \pmod{\tau}$, otherwise 0; and the notation $x(t - \tau)$ denotes the state $x$ at time $t - \tau$ (otherwise assumed to be time $t$). The function $\nu(t; \tau)$ represents a noise function that is resampled uniformly at random at intervals of $\tau$.¹

$$
\begin{aligned}
x^M_i &\leftarrow \operatorname{sign}\left(\sum_{\ell=1}^{n_M} J^M_{i\ell}\, x^M_\ell + \delta^{\mathrm{mod}}_\tau(t) \sum_{k=1}^{n_Q} J^{MQ}_{ik}\, x^Q_k(t - \tau)\right), \\
x^N_j &\leftarrow \operatorname{sign}\left(\sum_{\ell=1}^{n_N} J^N_{j\ell}\, x^N_\ell + \nu(t; \tau)\right), \\
x^Q_k &\leftarrow \operatorname{sign}\left(\sum_{\ell=1}^{n_Q} J^Q_{k\ell}\, x^Q_\ell + \sum_{i=1}^{n_M} J^{QM}_{ki}\, x^M_i + \sum_{j=1}^{n_N} J^{QN}_{kj}\, x^N_j\right).
\end{aligned}
$$

The weight matrices $J^M$, $J^N$, $J^Q$, and $J^{MQ}$ are learned (see below), while $J^{QM}$ and $J^{QN}$ are random, with $n_Q \gg n_M, n_N$.

We implement the noise network as a ring attractor: a ring of neurons where activating any contiguous half-ring yields an attractor state. Here we have adapted the model described in (Ben-Yishai, Bar-Or, and Sompolinsky 1995) to the discrete setting according to the following dynamics:

$J^N_{ij} = -1$ for $n_N/4 \le |i - j| \le 3n_N/4$;
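As an illustration of the single-network update rule in equation (1), the following is a minimal sketch in Python with NumPy, not the implementation used in this work. It assumes the outer-product Hebbian rule with zero self-connections (one of several learning rules cited in the Background), asynchronous updates in random order, and the convention sign(0) = +1. The function names `hebbian_weights` and `run_hopfield` are hypothetical.

```python
import numpy as np


def hebbian_weights(memories):
    """Symmetric weights J = (1/n) * sum_mu xi^mu (xi^mu)^T with zero diagonal.

    Assumption: outer-product Hebbian rule; the paper lists several
    alternative learning rules and does not fix one at this point.
    """
    memories = np.asarray(memories, dtype=float)
    n = memories.shape[1]
    J = memories.T @ memories / n
    np.fill_diagonal(J, 0.0)  # no self-connections
    return J


def run_hopfield(J, x0, max_sweeps=50, rng=None):
    """Apply x_i <- sign(sum_j J_ij x_j) one neuron at a time.

    Stops when a full sweep changes no neuron (a fixed point) or after
    max_sweeps sweeps. Ties in the field are broken towards +1.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=int)
    for _ in range(max_sweeps):
        changed = False
        for i in rng.permutation(len(x)):
            new_state = 1 if J[i] @ x >= 0 else -1
            if new_state != x[i]:
                x[i] = new_state
                changed = True
        if not changed:
            break
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 200
    memories = rng.choice([-1, 1], size=(3, n))  # three random +/-1 patterns
    J = hebbian_weights(memories)

    # Corrupt the first memory by flipping 10% of its entries.
    noisy = memories[0].copy()
    flipped = rng.choice(n, size=20, replace=False)
    noisy[flipped] *= -1

    recovered = run_hopfield(J, noisy, rng=rng)
    print("recovered stored memory:", np.array_equal(recovered, memories[0]))
```

With only a few stored patterns relative to the number of neurons, a moderately corrupted initial state lies inside the basin of attraction of the stored memory, so the sketch converges back to it. This is the noise-robust behaviour that the memory network in the architecture relies on.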