UCLA Electronic Theses and Dissertations
Title The Role of Short-Term Synaptic Plasticity in Neural Network Spiking Dynamics and in the Learning of Multiple Distal Rewards
Permalink https://escholarship.org/uc/item/63r8s0br
Author O'Brien, Michael John
Publication Date 2013
Peer reviewed|Thesis/dissertation
University of California, Los Angeles
The Role of Short-Term Synaptic Plasticity in Neural Network Spiking Dynamics and in the Learning of Multiple Distal Rewards
A dissertation submitted in partial satisfaction of the requirements for the degree Doctor of Philosophy in Mathematics
by
Michael John O’Brien
2013

© Copyright by
Michael John O'Brien
2013

Abstract of the Dissertation

The Role of Short-Term Synaptic Plasticity in Neural Network Spiking Dynamics and in the Learning of Multiple Distal Rewards
by
Michael John O’Brien Doctor of Philosophy in Mathematics University of California, Los Angeles, 2013 Professor Chris Anderson, Chair
In this thesis, we assess the role of short-term synaptic plasticity in an artificial neural network constructed to emulate two important brain functions: self-sustained activity and signal propagation. We employ a widely used short-term synaptic plasticity (STP) model in a symbiotic network, in which two subnetworks with differently tuned STP behaviors are weakly coupled. This enables both self-sustained global network activity, generated by one of the subnetworks, and faithful signal propagation within subcircuits of the other subnetwork. Finding the parameters for a properly tuned STP network is difficult. We provide a theoretical argument for a method that boosts the probability of finding the elusive STP parameters by two orders of magnitude, as demonstrated in tests.
We then combine STP with a novel critic-like synaptic learning algorithm, which we call ARG-STDP, for attenuated-reward-gating of STDP. STDP refers to a commonly used long-term synaptic plasticity model called spike-timing-dependent plasticity. With ARG-STDP, we are able to learn multiple distal rewards simultaneously, improving on the previous reward-modulated STDP (R-STDP), which could learn only a single distal reward. However, we also provide a theoretical upper bound on the number of distal rewards that can be learned using ARG-STDP.
We also consider the problem of simulating large spiking neural networks. We describe an architecture for efficiently simulating such networks, suitable for implementation on a cluster of General Purpose Graphical Processing Units (GPGPUs). Novel aspects of the architecture are described, and its performance is benchmarked and analyzed on a GPGPU cluster. With the advent of inexpensive GPGPU cards and compute power, the described architecture offers an affordable and scalable tool for the design, real-time simulation, and analysis of large-scale spiking neural networks.
The dissertation of Michael John O'Brien is approved.
Dean Buonomano Joseph Teran Andrea Bertozzi Chris Anderson, Committee Chair
University of California, Los Angeles 2013
Table of Contents
1 Introduction ...... 1
1.1 Motivation ...... 1
1.2 Historical Context ...... 2
1.3 Thesis Overview ...... 4
1.4 Chapter Summaries ...... 6
2 Background: Computational Models for Neural Dynamics and Synaptic Plasticity ...... 8
2.1 Neuron Models ...... 8
2.1.1 Hodgkin and Huxley Neurons ...... 8
2.1.2 Leaky Integrate-and-Fire Neurons ...... 11
2.1.3 Izhikevich Neurons ...... 13
2.2 Plasticity Models ...... 13
2.2.1 Spike Time Dependent Plasticity ...... 14
2.2.2 Short Term Plasticity ...... 14
3 Short Term Plasticity Aided Signal Propagation ...... 16
3.1 Introduction ...... 16
3.2 RAIN Networks ...... 17
3.3 Signal Propagation ...... 19
3.3.1 Circuit Design ...... 19
3.4 Properties of STP ...... 21
3.5 STP Conditioned RAIN ...... 24
3.6 Signal Transmission in Coupled STP Networks ...... 25
3.6.1 Network Layout ...... 25
3.6.2 Coupled RAIN Dynamics ...... 28
3.6.3 Coupled Signal Propagation Dynamics ...... 30
3.7 Finding Master STP Parameters ...... 36
3.8 Analysis ...... 38
3.8.1 Analyzing Firing Rate Changes ...... 38
3.8.2 Critical Firing Rate ...... 44
3.8.3 Assessing Circuit Layer Correlation ...... 45
3.9 Conclusion ...... 46
4 Learning Multiple Signals Through Reinforcement ...... 48
4.1 Introduction ...... 48
4.2 Distal Reward Problem ...... 49
4.3 Methods ...... 50
4.3.1 Reward Modulated STDP ...... 51
4.3.2 R-STDP with Attenuated Reward Gating ...... 52
4.4 Single Synapse Reinforcement Experiment ...... 54
4.5 Generalization to Multiple Synapse Learning ...... 57
4.5.1 R-STDP with STP Learns Multiple r-Patterns ...... 59
4.5.2 ARG-STDP Learns Multiple r-Patterns ...... 60
4.5.3 STP Stabilizes ARG-STDP Network Learning Dynamics ...... 61
4.6 Properties of ARG-STDP with STP ...... 64
4.6.1 Reward Predictive Properties of r-Patterns ...... 64
4.6.2 Learning Robustness to Reward Release Probability ...... 66
4.6.3 Learning Robustness to Reward Ordering ...... 68
4.6.4 Network Scaling ...... 69
4.6.5 The Reward Scheduling Problem ...... 69
4.6.6 Firing Rate Affects Learning Capacity ...... 72
4.6.7 Eligibility Trace Time Constant Affects Learning Capacity ...... 73
4.6.8 Interval Learning ...... 75
4.7 Analysis ...... 77
4.7.1 Defining the Correlation Metric ...... 77
4.7.2 Computing the Decaying Eligibility Trace ...... 78
4.8 Discussion ...... 81
5 HRL Simulator ...... 85
5.1 Introduction ...... 85
5.1.1 GPGPU Programming with CUDA ...... 86
5.1.2 Spiking Neural Simulators ...... 87
5.2 Simulator Description ...... 89
5.2.1 User Network Model Description ...... 90
5.2.2 Input ...... 92
5.2.3 Analysis ...... 95
5.3 Simulator Design ...... 96
5.3.1 Modular Design ...... 96
5.3.2 Parallelizing Simulation/Communication ...... 98
5.3.3 MPI Communication ...... 99
5.3.4 Simulation ...... 103
5.4 Performance Evaluation ...... 112
5.4.1 Large-Scale Neural Model ...... 112
5.4.2 GPU Performance ...... 115
5.4.3 CPU Performance ...... 118
5.4.4 Network Splitting ...... 120
5.4.5 Memory Consumption ...... 120
5.5 Discussion ...... 121
5.6 Conclusion ...... 122
6 Conclusion ...... 124
References ...... 126
List of Figures
3.1 RAIN network configuration. The red arrows indicate inhibitory connections and the blue arrows indicate excitatory connections...... 17
3.2 The firing rate for the networks tested in the synaptic weight parameter sweep. 19
3.3 Signal propagation circuit network architecture. A naturally occurring feed-forward circuit is found within a RAIN network. The feed-forward connections are then strengthened, and this is the circuit we observe for signal propagation...... 20
3.4 A) Signal propagation through 5 layers. B) A reverberating signal that is experienced in layer 5, but without inputs to layer 1. C) The average firing rate of the neurons in each layer for the duration of the experiment...... 21
3.5 The dynamic synapses plotted as a function of the presynaptic firing rate. The STP parameters can be chosen to produce a fixed-point firing rate. Here, the fixed point is 10 Hz, at which point µ_mn = W_mn, which was already chosen to produce stable RAIN firing...... 23
3.6 A) RAIN activity for 100 of the network neurons. The network parameters are suboptimal, leading to activity that lasts less than 2 seconds. B) RAIN activity for 100 of the network neurons. STP is employed, enabling the network to overcome the faulty choice in network parameters. The activity lasts more than 10 seconds...... 24
3.7 The coupled signal propagation network architecture. Two circuit networks are weakly coupled together. The two networks have the same general neural parameters and configuration statistics, but the STP parameters for each network can be chosen independently, producing different firing dynamics in each network. The left network is referred to as Master, having STP parameters that yield self-sustained network activity. The right network is referred to as Slave, which has STP parameters that allow short excitatory bursts through, then kill network activity...... 25
3.8 A) Slave and Master are uncoupled. Master continues indefinitely whereas Slave dies. B) Slave has projections onto Master. Here Slave dies, as expected, and Master continues indefinitely...... 29
3.9 A) Master has projections onto Slave. This is sufficient to restart Slave whenever Slave dies. B) Slave and Master are mutually coupled. In this case, only Slave received initial inputs, and Master relied on Slave for a jump-start. This demonstrates that Slave has the ability to start Master in the event Master dies. In this configuration, both networks thrive indefinitely...... 31
3.10 An analysis of the coupling required for the connections between Master and Slave. A & B) The average firing rate of Slave and Master, for one second of elapsed time, for different connectivity probabilities. These were performed with a bridge synapse strength of 30 nS. C & D) The average firing rate of Slave and Master, for one second of elapsed time, for different connectivity strengths. These were performed with a synaptic bridge connection probability of 2E-4...... 32
3.11 For any layer k of interest, we construct a binary projection neuron pair. Layer k projects onto the excitatory indicator neuron (blue). The indicator neuron has an excitatory connection to the inhibitory neuron (red) which, in turn, inhibits the indicator neuron to prevent the indicator from being overwhelmed by the circuit layer during a stimulus...... 33
3.12 A & B) Signal propagation through 5 layers for Master and Slave. C & D) A reverberating signal that is experienced in layer 5 of Master, but not in Slave. E & F) The average firing rate of the neurons in each layer for Slave and Master, respectively...... 35
4.1 System reward R, reward tracker R_k, and success signal S_k for reward channel k are plotted. The time constant τ_R controls the rate of convergence of R_k → R. The independent axis is discrete and denotes the number of times success signal k is presented. Though the domain is discrete, interpolation is used to emphasize the trend...... 53
4.2 Network configuration diagram. There are 1000 neurons, with 800 excitatory and 200 inhibitory, and 1.5% network connectivity. The blue arrows indicate excitatory connections, and the red arrows indicate inhibitory connections. In addition, N pre-synaptic neurons are chosen at random and denoted by Pre_k for k ∈ [1, 2, ..., N]. For each pre-synaptic neuron Pre_k, a random post-synaptic neuron is chosen from its fan-out pool and denoted by Post_k. The synaptic weight between each Pre_k and Post_k is set to zero, whereas the rest of the synaptic strengths are set to either 0.3 (for excitatory synapses) or 0.8 (for inhibitory synapses). In addition, for each neuron pair k, a separate reward channel is introduced, represented by a VTA_k (ventral tegmental area) neuron that releases a global reward or success signal, represented by the green arrow...... 55
4.3 Synaptic learning under R-STDP. a) & c) Evolution of the synaptic weight for the 1-synapse and 2-synapse learning experiments, respectively, for a duration of 10,000 seconds. Each color represents a unique synapse. b) & d) Conductance histogram showing the final network conductance distribution for the 1-synapse and 2-synapse learning experiments (in log scale), respectively. Synaptic strength is measured in units of g_max, where 1.0 is fully potentiated. The synapses at 0.8 are inhibitory synapses, which are held static (red)...... 58
4.4 Synaptic learning under R-STDP with STP. a) & c) Evolution of the synaptic weight for the 20-synapse and 25-synapse learning experiments, respectively, for a duration of 100,000 seconds. Each color represents a unique synapse. b) & d) Conductance histogram showing the final network conductance distribution for the 20-synapse and 25-synapse learning experiments (in log scale), respectively. Synaptic strength is measured in units of g_max, where 1.0 is fully potentiated. The synapses at 0.8 are inhibitory synapses, which are held static (red)...... 59
4.5 Synaptic learning under ARG-STDP. a) & c) Evolution of the synaptic weight for the 16-synapse and 17-synapse learning experiments, respectively, for a duration of 30,000 seconds. Each color represents a unique synapse. b) & d) Conductance histogram showing the final network conductance distribution for the 16-synapse and 17-synapse learning experiments (in log scale), respectively. Synaptic strength is measured in units of g_max, where 1.0 is fully potentiated. The synapses at 0.8 are inhibitory synapses, which are held static (red)...... 62
4.6 Analysis of average synaptic growth and firing rates. The neuron pools are E, I, Pre, and Post, indicating the excitatory, inhibitory, Pre_k, and Post_k neuron pools. a) & c) & e) The average firing rates of each pool of neurons for the 16-synapse, 17-synapse, and 17-synapse with STP learning experiments, respectively. The inset in (c) shows the detrimental rise in the average firing rate of Post. b) & d) & f) The average synaptic strengths between the different neuron groups for the 16-synapse, 17-synapse, and 17-synapse with STP learning experiments, respectively, measured in units of g_max...... 63
4.7 STP has a stabilizing effect on synaptic learning within the network. a) & b) Depict the evolution of the synaptic weights for a duration of 30,000 seconds and the conductance histogram showing the final network synaptic conductance distribution, respectively, for the 17-synapse learning experiment without STP. c) & d) Depict the evolution of the synaptic weights for a duration of 30,000 seconds and the conductance histogram showing the final network conductance distribution, respectively, for the 17-synapse learning experiment with STP. In (a) and (c), each color represents a unique synapse and the synaptic strengths are measured in units of g_max, where 1.0 is fully potentiated. In (b) and (d), plotted in log scale, the synapses at 0.8 (red) are inhibitory synapses, which are held static...... 65
4.8 Synaptic learning under ARG-STDP with STP. a) & c) Evolution of the synaptic weight for the 30-synapse and 40-synapse learning experiments, respectively, for a duration of 100,000 seconds. Each color represents a unique synapse. b) & d) Conductance histogram showing the final network conductance distribution for the 30-synapse and 40-synapse learning experiments (in log scale), respectively. Synaptic strength is measured in units of g_max, where 1.0 is fully potentiated. The synapses at 0.8 (red) are inhibitory synapses, which are held static...... 66
4.9 Heat map depicting the values of the correlation d(k, ℓ) between the kth r-pattern and the rewards released from the ℓth reward channel, where k, ℓ ∈ [1, ..., 10]...... 67
4.10 The network learning capacity is plotted as a function of p. The data points indicate verified learning whereas the error bars correspond to simulations that were conducted with a granularity of 10 r-patterns. Thus, the error bars are one-sided with a length of 9...... 68
4.11 In ARG-STDP, the reward's effect on the weight gain in a synapse depends on the amount of time that passes from the completion of the r-pattern until the presentation of the reward. Here, consider the effects of a reward at time zero on the r1-pattern, which is within the 2-second RGI, and the r2-pattern, which is beyond the RGI. Though the length of the RGI is chosen somewhat arbitrarily, its effects are clear, and it gives us a benchmark to compare with across experiments...... 71
4.12 The average eligibility trace, ⟨E_ij⟩, as a function of N, the number of reward channels. Network learning decreases as N becomes large. Several examples with various values of N have been simulated, demonstrating the decreasing learning capacity of a network...... 73
4.13 Synaptic learning under ARG-STDP with STP. Here the firing rates of Pre_k and Post_k, for k ∈ [1, ..., N], are reduced to 0.5 Hz, down from 1 Hz in previous experiments. a) Evolution of the synaptic weight for the 120-synapse learning experiment, for a duration of 800,000 seconds. Each color represents a unique synapse. b) Conductance histogram showing the final network conductance distribution for the 120-synapse learning experiment (in log scale). Synaptic strength is measured in units of g_max, where 1.0 is fully potentiated. The synapses at 0.8 (red) are inhibitory synapses, which are held static...... 74
4.14 Synaptic learning under ARG-STDP with STP. Here τ_E = 300 ms, down from 1000 ms in previous experiments. a) Evolution of the synaptic weight for the 100-synapse learning experiment, for a duration of 200,000 seconds. Each color represents a unique synapse. b) Conductance histogram showing the final network conductance distribution for the 100-synapse learning experiment (in log scale). Synaptic strength is measured in units of g_max, where 1.0 is fully potentiated. The synapses at 0.8 (red) are inhibitory synapses, which are held static...... 75
4.15 Interval learning. Each color represents a unique synapse. a) A seesaw pattern emerged in some of the simulations. In this case, the following were used: a reduced spiking rate (0.5 Hz) for the Pre_k and Post_k neurons; two synaptic groups, each of size 30, for 60 total synapses; learning intervals of 100,000 seconds; and β = 1.3. This simulation was run for 600,000 seconds. b) In this experiment the following were used: a reduced spiking rate (0.5 Hz) for the Pre_k and Post_k neurons; two synaptic groups of size 100, for 200 total synapses; learning intervals of 10,000 seconds; and β = 1.1. This simulation was run for 1,200,000 seconds. c) In this experiment the following were used: τ_E = 300 ms; two synaptic groups of size 100, for 200 total synapses; learning intervals of 10,000 seconds; and β = 1.1. This simulation was run for 300,000 seconds...... 76
4.16 Comparison of the decaying eligibility traces for the standard experiment and the experiments from sections 4.6.6 and 4.6.7...... 81
5.1 The simulator modules of HRLSim, with all the interactions between them, are shown here...... 95
5.2 Flow charts showing how the communication thread is parallelized with the computation thread...... 97
5.3 An example showing how dummy neurons can be used to simplify message passing...... 100
5.4 Dynamic spike packing is compared with the AER approach for simulating a network with 5000 outgoing axons...... 102
5.5 The conversion from a graph representation of a network to a flattened linear array of the same network...... 105
5.6 Graph showing the timing breakdown of a GPU simulation with neuron inte- gration in green, pre-synaptic updates in blue, simple post-synaptic updates in red, and optimized post-synaptic updates in purple...... 109
5.7 (Top) Two 80% excitatory / 20% inhibitory networks connected in a small-world fashion. Here 25% means that 25% of all the neurons have outgoing axons that connect to an external network. Since the weight is fixed at zero, the exact connectivity of the axons is irrelevant. 1:80, 1:20, and 1:1 indicate a fanout of 80, 20, and 1, respectively. (Bottom) The raster plot of 2000 neurons from one of these networks showing their bursting firing...... 112
5.8 GPU Results, 100,000 neurons per node. (A), (B), and (C) show the runtime distribution, with respect to ||I||, on 4, 16 and 64 nodes (400K, 1,600K, and 6,400K total neurons), respectively. (D) shows the total-time linear regression for plots (A), (B), and (C). (E) shows the histogram of ||I|| over I ∈ I. (F) shows how the runtime scales to network size...... 115
5.9 GPU Results, 20,000 neurons per node. (A), (B), and (C) show the runtime distribution, with respect to ||I||, on 4, 16 and 64 nodes (80K, 320K, and 1,280K total neurons), respectively. (D) shows the total-time linear regression for plots (A), (B), and (C). (E) shows the histogram of ||I|| over I ∈ I. (F) shows how the runtime scales to network size...... 116
5.10 CPU Results, 20,000 neurons per node. (A), (B), and (C) show the runtime distribution, with respect to ||I||, on 16, 64 and 128 nodes (320K, 1,280K, and 2,560K total neurons), respectively. (D) shows the total-time linear regression for plots (A), (B), and (C). (E) shows the histogram of ||I|| over I ∈ I. (F) shows how the runtime scales to network size...... 119
List of Tables
3.1 Network parameters used in this chapter...... 18
3.2 Master and Slave STP parameters used in this chapter...... 26
3.3 The success rate for finding Master-like STP parameters for various regions of the STP parameter domain. The first entry is uniform random sampling; the most prolific regions defined in section 3.7 follow...... 39
4.1 Network parameters used in this chapter...... 51
5.1 Network parameters used in the benchmarks...... 114
5.2 Memory consumed by HRLSim during simulation for the 20K neuron network topology described in section 5.4.1. The left column reports memory for the 100 fanout networks used in the benchmarks throughout, and the right column reports memory for a more realistic situation of a 10K fanout...... 120
Vita
2005 B.A. (Mathematics, Physics and Computer Science). Claremont McKenna College, Claremont, California.
2006 M.A. (Mathematics). UCLA, Los Angeles, California.
2003 Research Assistant. Reed Institute of Decision Science, Claremont, California.
2004–2005 Engineer Intern. Raytheon, El Segundo, California.
2005–2012 Teaching Assistant. Mathematics Department, UCLA.
2006–2008 Systems Engineer Intern. Aerospace Corporation, El Segundo, California.
2008 Adjunct Professor. Mathematics Department, Claremont McKenna College, Claremont, California.
2011 Adjunct Professor. Mathematics Department, Claremont McKenna College, Claremont, California.
2008–present Research Assistant. Center for Neural and Emergent Systems, Information and System Sciences Department, HRL Laboratories LLC, Malibu, California.
Publications
Fast Douglas Rachford Splitting Optimization Methods. Michael J. O’Brien and Thomas Goldstein. In preparation.
Using Short Term Plasticity in Symbiotic Coupled Networks to Aid Faithful Signal Propagation. Michael J. O'Brien and Narayan Srinivasa. In preparation.
Efficiently Passing Messages in Distributed Spiking Neural Network Simulation. Corey M. Thibeault, Kirill Minkovich, Michael J. O'Brien, Frederick C. Harris, Jr., and Narayan Srinivasa. In preparation.
HRLSim: A High Performance Spiking Neural Network Simulator for GPGPU Clusters. Kirill Minkovich, Corey M. Thibeault, Michael J. O'Brien, Aleksey Nogin, Youngkwan Cho, and Narayan Srinivasa. IEEE Transactions on Neural Networks and Learning Systems. In review.
A Spiking Neural Model for Stable Reinforcement of Synapses Based on Multiple Distal Rewards. Mike J. O'Brien and Narayan Srinivasa. Neural Computation 2013, 25(1), 123-156.
Equality in Pollard's Theorem on Addition of Congruence Classes. Eva Nazarevicz, Mike O'Brien, Mike O'Neill and Carolyn Staples. Acta Arith. 127 (2007), 1-15.
CHAPTER 1
Introduction
1.1 Motivation
The brain is by far the most sophisticated computing tool on earth. Though it can be outperformed in certain specific tasks, such as chess and Jeopardy, its performance is unparalleled in solving a wide range of problems that require flexibility, creativity, and complexity. For instance, the biological brain is orders of magnitude better than any known artificial architecture at navigating ever-changing environments and at recognizing people and objects despite changes in spatial orientation, partial obstruction, and deterioration from aging. These tasks are considered simple, even for young children, and yet machines cannot achieve them efficiently or accurately. On the other end of the spectrum, the brain is capable of complex reasoning, deduction, and art: calculating the age of the universe, proving the existence of arbitrarily long arithmetic progressions within the primes, the Sistine Chapel fresco, and the Toccata and Fugue in D minor. Each of these, while not considered easy, was achieved by the creativity and brilliance of the biological brain architecture. It is clear that the brain possesses different, if not higher, computational abilities than artificial architectures have been able to achieve. Furthermore, this can all be achieved in a package that requires approximately a liter of space and thirty watts of power [Len03, AL01]. The most advanced computers, on the other hand, require upwards of 100 MW of power and 40 ML of space. So, it is with great interest that mankind uses its brain-derived intellect in a quest for self-understanding.
We strive to create artificial neural networks that emulate the neurological dynamics of the brain in order to create computational systems that come closer to the capabilities and efficiencies of the human brain for processing information and learning. The brain is a biologically evolved learning system, billions of years in the making, which provides great inspiration for developing artificial learning systems for solving very challenging problems. In this thesis, we will explore some of the building blocks of the biologically inspired artificial neural learning systems that are sought after.
1.2 Historical Context
The anatomical organization of the brain has long been studied, as it was thought that the brain was the center of consciousness. Around 1900, the Spanish anatomist Santiago Ramón y Cajal proposed the idea that discrete cells are the primary functional units, communicating with each other via specialized junctions, later termed synapses by Sir Charles Sherrington. However, it was not until 1952 that Hodgkin and Huxley discovered that the action potential, which serves as the communication between neurons, is generated by a series of chemical reactions in which the electrical potential between the intracellular and extracellular regions is manipulated by ion channels in the cell membrane [HH52]. Hodgkin and Huxley provided a mathematical model to describe the nonlinear evolution of the cellular membrane potential. The Hodgkin and Huxley model, discussed in section 2.1.1, is the foundation of current neuroscience research.
Despite the breakthrough of the Hodgkin and Huxley model for cellular dynamics, alone it is not enough to produce the evolving intelligence that is paramount to our species. For instance, artificial neural networks have long been a staple of computer science. An artificial neural network is typically defined as a network of connected compute nodes, where the specific computation within a node is called the activation function. The activation function is a mapping from the node’s input weights to the node’s output. For a proper connection weight set, artificial neural networks can demonstrate very complex computational behavior, such as face recognition. However, for a given activation function, the behavior is static with respect to the connection weight set. So, even with the highly evolved, non-linear activation
function described by the Hodgkin-Huxley equations, or for any activation function, the network behavior can be complex, but cannot learn. It is the synaptic plasticity in the brain that allows for learning, making the brain special. With plastic weights, a network can evolve to counter unexpected difficulties. For instance, a network can deal with a sudden loss of nodes (injury), or a change to the rules of the game (environmental changes). This capacity is essential to us as a species, but also has important applications in science. For example, if an unmanned space mission beyond the reach of practical radio communication is damaged by space debris, it could relearn to navigate by rewiring the important connections within its neural architecture, through trial and error as provided by the environment.
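To make the preceding definition of a compute node concrete, the following minimal sketch implements a single artificial node with a logistic activation function. The names and constants are ours, chosen for illustration; with the weights held fixed, the node's input-output behavior is static, which is exactly why plasticity is needed for learning.

```python
import math

def activation(inputs, weights, bias=0.0):
    """A single artificial node: a weighted sum of the inputs passed
    through a fixed nonlinearity (here, the logistic function)."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# With fixed weights the node's behavior is static: the same inputs
# always yield the same output, so the network cannot learn.
y = activation([1.0, 0.5], [0.8, -0.4])
```

Learning, in this picture, is precisely the process of changing the `weights` vector over time, which is what the plasticity rules studied in this thesis provide.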
Given evolving weights, the emergent network behavior can fall into a broad spectrum of possibilities, which can be achieved through efficient synaptic plasticity rules that optimize a network's behavior. In computer science, genetic algorithms, gradient descent, and simulated annealing are amongst the tools used for evolving optimal synaptic weights. These tools produce good weight sets for solving a particular problem; however, an on-line learning algorithm is required for neural networks to learn in real-time, through interactions with the environment. Computer scientists employ a number of on-line learning techniques to solve this problem, such as temporal difference learning, but in this work we will be investigating biologically plausible synaptic plasticity rules.
In 1949, Donald Hebb proposed the existence of a biological mechanism through which synaptic connections between causally connected neurons become stronger, whereas other connections become weaker [Heb49]. Causally connected neurons are a pair of neurons j and i, where neuron j has a feed-forward connection to neuron i, and spikes in neuron j tend to elicit spikes in neuron i. In this case, we say that neuron j participates in firing neuron i. In 1973, evidence for such synaptic weight changes was discovered, and the plasticity was termed long-term potentiation [BL73, Lo03]. Learning induced by causally spiking neurons has become known as Hebbian learning. It has provided modern neuroscience with an on-line mechanism through which neural networks can evolve and learn from interactions with environmental stimuli.
The precise form of the learning, however, is a very important research problem. There are volumes of literature attempting to address this question. The true and full nature of synaptic plasticity is likely some combination of the mechanisms that have been studied, as well as some that are yet to be discovered. For instance, though Hebbian plasticity has been observed, so has anti-Hebbian plasticity, as well as short-term (non-permanent) plasticity, amongst others. Yet experimental difficulties arise in validating the proposed synaptic plasticity models due to the difficulty of measuring isolated synapses, let alone a cluster of interacting neurons. In general, however, it is important to consider a variety of biologically plausible models, not only to push forward the theory of biological synaptic plasticity and the emergent neural dynamics, but also to provide learning mechanisms that can be used in artificial architectures.
1.3 Thesis Overview
In this thesis, we assess the role of short-term synaptic plasticity in an artificial neural network constructed to emulate two important brain functions: self-sustained activity and signal propagation. Short-term plasticity is a mechanism by which the synaptic weights are temporarily altered with respect to the firing rate of the presynaptic neuron. For instance, in some types of synaptic connections, a fast-firing presynaptic neuron can exhaust the connection, leading to a reduced postsynaptic response for subsequent presynaptic spikes by depressing the effective synaptic weight. Short-term plasticity is often ignored in computational neuroscience because its dynamics do not shape the long-term synaptic weight distribution and, probably, because its usefulness is not fully understood. We validate the usefulness of short-term plasticity for self-sustained network activity as well as signal propagation within a neural network, demonstrating that the dynamics of short-term plasticity produce interesting network-wide dynamics.
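The exhausting, or depressing, behavior described above can be illustrated with a small sketch of a resource-depletion synapse in the spirit of the Markram-style STP models used later. The parameter values and variable names here are illustrative assumptions, not the exact model or notation of the thesis: each presynaptic spike consumes a fraction of a resource variable, which recovers exponentially between spikes.

```python
import math

def depressing_synapse(spike_times, W=1.0, U=0.5, tau_rec=0.8):
    """Sketch of short-term depression: each presynaptic spike consumes
    a fraction U of the available resources x, which recover toward 1
    with time constant tau_rec (seconds). Returns the effective synaptic
    efficacy W*U*x at each spike time."""
    x, last_t = 1.0, None
    efficacies = []
    for t in spike_times:
        if last_t is not None:
            # Resources recover exponentially during the inter-spike interval.
            x = 1.0 - (1.0 - x) * math.exp(-(t - last_t) / tau_rec)
        efficacies.append(W * U * x)
        x -= U * x  # a fraction of the resources is consumed by the spike
        last_t = t
    return efficacies

# A fast presynaptic train (50 Hz) progressively exhausts the synapse,
# while a slow train (2 Hz) largely recovers between spikes.
fast = depressing_synapse([i * 0.02 for i in range(10)])
slow = depressing_synapse([i * 0.5 for i in range(10)])
```

In this sketch the fast train's efficacy decays toward a low steady state while the slow train retains most of its strength, capturing the rate-dependent temporary weakening described in the text.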
We also extend a form of long-term (permanent) Hebbian synaptic plasticity in order to develop a model that can learn from distal, or delayed, rewards. Hebbian models rely on local interactions between the presynaptic and postsynaptic neurons. The system we develop assumes an additional global reward signal, such as dopamine within the brain, that modulates the Hebbian plasticity rule. The introduction of the extracellular global reward allows for learning from distal rewards. This is an important feature, as learning via environmental feedback is a crucial type of learning. We found the learning within the system to be unstable when more than one independent distal reward was present. We then augmented the system with short-term plasticity and investigated its stability when learning multiple distal rewards. The stabilizing properties of short-term plasticity aid the long-term Hebbian learning, enabling the stable learning of multiple distal rewards. This further validates the usefulness of short-term plasticity.
In studying neural networks, in addition to developing appropriate neural and synaptic models, there exists the computational task of implementing the models. Throughout this thesis, we consider relatively small neural networks (20,000 neurons), but in general these are just atomic models built to demonstrate the functional advantages of certain techniques. As the field of computational neuroscience continues to mature, the refined atomic models will be the building blocks for large-scale models. The building blocks will interact with each other, providing for complex emergent behavior. For instance, in [IE08] a large-scale model of a mammalian thalamocortical network is simulated. The network is comprised of one million neurons and almost half a billion synapses. In this model, it took one minute to simulate one second of activity. As these large-scale models are developed, it is necessary to produce a neural network simulator capable of efficiently simulating large networks. In this thesis, we propose a simulation architecture to address this concern, and demonstrate its ability to efficiently simulate large-scale neural networks.
1.4 Chapter Summaries
In this work, we consider two standard types of synaptic plasticity: long-term spike-timing dependent plasticity, commonly called STDP, and short-term plasticity, called STP. The specifics of these models will be discussed in chapter 2, along with an introduction to several important neuron models. STDP is considered the primary synaptic plasticity rule, slowly working to permanently alter the synaptic strengths based on spike correlations; however, the specific model it follows is debated. We use a widely considered model proposed by Song and Abbott [SMA00]. We also use a standard form of STP, proposed by Markram et al. [MLF97], as a network regulatory device. The synaptic plasticity in this case is temporary and its effect on the synaptic strengths immediate.
In chapter 3, we use STP to help stabilize networks with self-sustaining random asynchronous spiking activity, and also to aid faithful signal transmission within a noisy network's sub-circuit. We couple two different types of networks in a symbiotic relationship in order to provide the desired dynamics. One network provides stability with respect to self-sustaining activity, and the second network provides the medium in which signal propagation is more faithful. An analytical characterization to assist in finding the relevant STP parameters is derived. The characterization is demonstrated to boost the probability of finding a useful network by two orders of magnitude over a random search.
In chapter 4, we consider the distal reward problem, in which a neural network is to learn a stimulus signal based on a subsequent, and delayed, reward. This is analogous to Pavlovian, or classical, conditioning. By augmenting traditional STDP with a reward trace, often likened to an extracellular presence of dopamine, the distal reward problem was solved for a single distal reward [Izh07b, Flo07]. It was, however, thought that this technique was limited in scope to learning a single distal reward [FSG10]. In chapter 4 we employ STP, in combination with reward-modulated STDP, to learn multiple distal rewards. We also develop a novel learning rule in which the effects of the dopamine modulation attenuate with time. This further enhances the number of rewards we are able to learn. With this algorithm, we demonstrate the learning of upwards of 200 distal rewards. Despite this success over previous methods, we also demonstrate a theoretical upper bound on our technique.
In chapter 5, the problem of large-scale neural network simulation is addressed. A simulator is described, designed specifically for efficiently simulating large-scale models. The simulator is the first designed to exploit a parallel architecture of many GPGPUs (general-purpose graphics processing units). Each GPGPU is characterized by a very high computational throughput, achieved by its highly parallel architecture. We demonstrate that the proposed simulator architecture scales well with large networks.
CHAPTER 2
Background: Computational Models for Neural Dynamics and Synaptic Plasticity
The computational power of the brain relies on robust and fault-tolerant neural networks. The complex behavior of the brain is realized through the individual firing of each cell along with the complex network configuration joining the cells together. The classic action-potential mechanisms of a neuron in computational neural modeling are presented in section 2.1. In order to be robust and fault-tolerant, the network connection complexities required for high-level activity must evolve naturally from a set of (chemically induced) rules. These rules are far from fully understood and are the subject of wide research. However, many basic principles have been found. The computational models typically used for plasticity are presented in section 2.2.
2.1 Neuron Models
In this section, we present several different neural models that accomplish the spiking dynamics that are the basis of the brain's robust computational power. Each of these models has its own strengths as well as drawbacks.
2.1.1 Hodgkin and Huxley Neurons
The spatial extent of a neuron is defined by the cell membrane that separates the intracellular contents from the cell's environment. This membrane acts as an insulator between the intracellular and extracellular ions. This insulation induces a concentration difference in ion density, resulting in an electrical potential across the cell membrane. For each ion species, neurons have a large number of microscopic channels composed of selective proteins (ensuring selectivity to the ion species). In each microscopic ion channel, the associated proteins form a small number of physical gates that regulate the flow of the ion species across the channel. Each gate can be in either a permissive or a non-permissive state. If all of the gates within an ion channel are in the permissive state, then ions are able to flow across the channel, and we call the channel open. If any of the gates are in the non-permissive state, ions cannot flow across the channel, and the channel is called closed [NR98].
When a membrane potential reaches a certain threshold through ion-gating interactions with the environment, the voltage triggers a non-linear sequence of ion channels opening and/or closing. This produces a 2 ms process of depolarization, followed by repolarization. The membrane then resets to the cellular resting potential. This is the neural action potential modeled by the Hodgkin and Huxley neuron. Hodgkin and Huxley were the first to simplify the study of the membrane potential as an electrical circuit, considering the neuron membrane as a capacitor and the potential across ion channels as batteries. They proposed that the ionic conductances of a neuron were dynamically changing functions of the membrane potential [HH52]. It is now known that the voltage dependence is due to the biophysical properties of the ion channels. Given an input current, I(t), charge will build up on the capacitor, or leak through the channels. The electrical circuit is described by
C_m dV/dt = I_ion + I_ext,  (2.1)
where V is the membrane potential, C_m is the membrane capacitance, I_ext is the externally applied current, and I_ion is the net flow of ion current across the membrane. I_ion is the sum of
I_Na = g_Na m^3 h (E_Na − V),  (2.2)
I_K = g_K n^4 (E_K − V),  (2.3)
I_L = g_L (E_L − V),  (2.4)

representing the sodium current, potassium current, and leakage current, respectively [NR98].
For r ∈ {Na, K, L}, g_r and E_r correspond to the experimentally normalized macroscopic conductance and equilibrium potential for the macroscopic ion channel (which is the aggregate of the microscopic ion channels), and m, h, n ∈ [0, 1] are gating probability variables for different types of gates, examined below.
If we consider a single type of gate and its probability p of being in the permissive state, then the probability transition is assumed to obey the first-order kinetics
dp/dt = α(V)(1 − p) − β(V) p.  (2.5)
Here α is a voltage-dependent rate constant describing the transition from the non-permissive state to the permissive state. Likewise, β is a voltage-dependent rate constant describing the transition from the permissive state to the non-permissive state. Both are fit to experimental data. With this formalism, the microscopic sodium channels are governed by three independent m-type gates and one independent h-type gate, resulting in equation (2.2) when the channels are considered in aggregate. Likewise, each microscopic potassium channel is governed by four independent n-type gates, resulting in the macroscopic behavior of equation (2.3). The combined dynamics, known as the Hodgkin-Huxley neuron model (HH), produce the four-variable ODE system [NR98]:
C_m dV/dt = g_Na m^3 h (E_Na − V) + g_K n^4 (E_K − V) + g_L (E_L − V) + I_ext,  (2.6)
dm/dt = α_m(V)(1 − m) − β_m(V) m,  (2.7)
dh/dt = α_h(V)(1 − h) − β_h(V) h,  (2.8)
dn/dt = α_n(V)(1 − n) − β_n(V) n.  (2.9)
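As a concrete illustration, the four-variable system (2.6)-(2.9) can be integrated with a simple forward-Euler scheme. The rate functions α(V) and β(V) below are the classic squid-axon fits (with voltage measured relative to rest, so the resting potential sits near 0 mV); the time step, spike-counting threshold, and current values are our own illustrative choices, not taken from the text:

```python
import math

def hh_step(V, m, h, n, I_ext, dt):
    """One forward-Euler step of eqs. (2.6)-(2.9), classic squid-axon fits.
    Units: mV, ms, uA/cm^2; conductances in mS/cm^2."""
    a_m = 0.1 * (25.0 - V) / (math.exp((25.0 - V) / 10.0) - 1.0)
    b_m = 4.0 * math.exp(-V / 18.0)
    a_h = 0.07 * math.exp(-V / 20.0)
    b_h = 1.0 / (math.exp((30.0 - V) / 10.0) + 1.0)
    a_n = 0.01 * (10.0 - V) / (math.exp((10.0 - V) / 10.0) - 1.0)
    b_n = 0.125 * math.exp(-V / 80.0)
    # Ionic currents, eqs. (2.2)-(2.4), with g_Na=120, g_K=36, g_L=0.3.
    I_ion = (120.0 * m**3 * h * (115.0 - V)
             + 36.0 * n**4 * (-12.0 - V)
             + 0.3 * (10.6 - V))
    V += dt * (I_ion + I_ext)               # eq. (2.6), C_m = 1 uF/cm^2
    m += dt * (a_m * (1.0 - m) - b_m * m)   # eq. (2.7)
    h += dt * (a_h * (1.0 - h) - b_h * h)   # eq. (2.8)
    n += dt * (a_n * (1.0 - n) - b_n * n)   # eq. (2.9)
    return V, m, h, n

def simulate(I_ext, T=100.0, dt=0.01):
    """Integrate for T ms and count action potentials (upward crossings)."""
    V, m, h, n = 0.0, 0.05, 0.6, 0.32   # approximate resting state
    spikes, above = 0, False
    for _ in range(int(T / dt)):
        V, m, h, n = hh_step(V, m, h, n, I_ext, dt)
        if V > 50.0 and not above:
            spikes += 1
        above = V > 50.0
    return spikes
```

With a sustained suprathreshold current (e.g. 10 µA/cm²) the model fires; with no input it remains at rest.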
The HH model is the basis for most biophysical neuron models. Traditionally, however, most large-scale neural network simulations require only an emulation of macro neural dynamics, and a simpler high-level abstraction of the HH dynamics is used. Two common models will be presented below.
2.1.2 Leaky Integrate-and-Fire Neurons
A leaky integrate-and-fire (LIF) [Abb99] neuronal model simplifies the Hodgkin-Huxley model by assuming a binary spike-or-no-spike neuron in which pre-synaptic inputs are integrated into a post-synaptic neuron. Once the post-synaptic neuron's potential crosses a threshold, the neuron emits a spike. The assumption here is that the action potential is more important to neural network behavior than the specifics of how the action potential is generated at the cellular level. This assumption has led to the widespread use of abstraction models that represent the essential network-level dynamics of the Hodgkin-Huxley model.
In the following model, it is assumed that neurons are either excitatory or inhibitory. A pre-synaptic spike from an excitatory (inhibitory) neuron increases (decreases) the post-synaptic neuron's membrane voltage. With this, each neuron i has spiking behavior governed by the voltage equation
C_m dV_i/dt = g_L (V_rest − V_i) + g_i^exc(t)(E_exc − V_i) + g_i^inh(t)(E_inh − V_i) + I_ext(t).  (2.10)
A neuron spike at time t_i^sp is defined by the reset criteria:

lim_{t → t_i^sp−} V_i(t) = V_thr,  (2.11)
lim_{t → t_i^sp+} V_i(t) = V_reset.  (2.12)
After an action potential, the voltages are clamped for a refractory period of 2 ms. In this dissertation: V_rest = −74 mV is the resting neuronal voltage; V_thr = −54 mV is the neural membrane action-potential threshold; I_ext denotes external current; E_exc = 0 mV and E_inh = −80 mV are the excitatory and inhibitory reversal potentials, respectively; and g_i^exc and g_i^inh are the summed conductance contributions from the excitatory and inhibitory pre-synaptic inputs, indexed by j, to post-synaptic neuron i. The dynamics of these conductances can be described as:
τ_ℓ dg_i^ℓ/dt = −g_i^ℓ + g_max^ℓ Σ_j W_ij(t) δ(t − (t_j^sp + Δ_j)), for ℓ ∈ {exc, inh}.  (2.13)
Here, τ_exc = 5 ms and τ_inh = 20 ms are the conductance decay constants, δ is the Dirac delta function, t_j^sp is the time of neuron j's last spike, and Δ_j is the axonal delay for neuron j. An axonal delay is the delay between the time a neuron's action potential is generated near the neuronal soma and the action potential's arrival at the axonal terminals, where synapses with other cells are formed. This value can be less than a millisecond, or more than 100 ms.
The values W_ij(t) indicate the synaptic weight from neuron j to neuron i at time t and are measured in units of g_max^exc or g_max^inh (depending on the type of presynaptic neuron), which is the maximum synaptic conductance. An input resistance of 150 MΩ is assumed throughout this work. The conductances can be plastic, as discussed in section 2.2.
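A minimal single-neuron sketch of equations (2.10)-(2.13) follows. The constants are those given above (together with C_m = 200 pF, g_leak = 10 nS, and V_reset = −60 mV from table 3.1); the forward-Euler step and the example input spike trains are illustrative choices of ours, not part of the model:

```python
# Conductance-based LIF neuron (eqs. 2.10-2.13), integrated with forward Euler.
# Units: mV, ms, nS, pF, so that dV = dt * I[pA] / C[pF] comes out in mV.
C_M, G_L = 200.0, 10.0                          # pF, nS
V_REST, V_THR, V_RESET = -74.0, -54.0, -60.0    # mV
E_EXC, E_INH = 0.0, -80.0                       # mV
TAU_EXC, TAU_INH = 5.0, 20.0                    # ms
REFRAC = 2.0                                    # ms refractory clamp

def simulate_lif(exc_times, inh_times, w_exc, w_inh, T=200.0, dt=0.1):
    """Integrate one neuron. exc_times/inh_times are sets of input spike
    arrival times (ms); w_exc/w_inh are conductance increments (nS).
    Returns the list of output spike times (ms)."""
    V, g_exc, g_inh = V_REST, 0.0, 0.0
    refrac_left, out = 0.0, []
    for k in range(int(T / dt)):
        t = round(k * dt, 6)
        # A delta-function arrival adds its weight to the conductance (eq. 2.13).
        if t in exc_times:
            g_exc += w_exc
        if t in inh_times:
            g_inh += w_inh
        # Exponential decay of the conductances.
        g_exc -= dt * g_exc / TAU_EXC
        g_inh -= dt * g_inh / TAU_INH
        if refrac_left > 0.0:
            refrac_left -= dt   # voltage clamped during the refractory period
            continue
        # Membrane equation (2.10).
        dV = (G_L * (V_REST - V) + g_exc * (E_EXC - V)
              + g_inh * (E_INH - V)) / C_M
        V += dt * dV
        if V >= V_THR:          # threshold (2.11): emit spike, reset (2.12)
            out.append(t)
            V = V_RESET
            refrac_left = REFRAC
    return out
```

A short barrage of strong excitatory inputs drives the neuron over threshold; with no input the voltage simply sits at V_rest.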
2.1.3 Izhikevich Neurons
The Izhikevich neuron model is more costly computationally than the LIF neuron, but can recreate a larger range of neuron classes [Izh03, Izh07a, Izh04]. It is thought to be a good compromise between the computationally efficient LIF neuron and the accurate dynamics of the Hodgkin-Huxley neuron. The model uses continuous dynamics to represent all the different types of behaviors found in real neurons, and does so without the artificial thresholding employed by LIF neurons. The model is expressed by the simple membrane voltage equation
dV_i/dt = 0.04 V_i^2 + 5 V_i + 140 − u_i + g_i^exc(t)(E_exc − V_i) + g_i^inh(t)(E_inh − V_i) + I_ext(t),  (2.14)

a recovery variable

du/dt = a(bV − u),  (2.15)

and the spike reset rules:

if V ≥ 30, then V ← c and u ← u + d.  (2.16)
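A sketch of the Izhikevich dynamics under a constant injected current, with the synaptic conductance terms of (2.14) omitted for brevity; a = 0.02, b = 0.2, c = −65, d = 8 are the standard regular-spiking parameters from [Izh03], and the Euler step is our own choice:

```python
def izhikevich(I_ext, T=300.0, dt=0.5, a=0.02, b=0.2, c=-65.0, d=8.0):
    """Izhikevich neuron (eqs. 2.14-2.16) driven by a constant current I_ext,
    with the synaptic conductance terms omitted. Returns spike times in ms."""
    V, u = -70.0, -14.0            # stable resting point for these parameters
    spikes = []
    for k in range(int(T / dt)):
        # Membrane equation (2.14), conductance terms dropped.
        V += dt * (0.04 * V * V + 5.0 * V + 140.0 - u + I_ext)
        # Recovery variable (2.15).
        u += dt * a * (b * V - u)
        if V >= 30.0:              # reset rule (2.16)
            spikes.append(k * dt)
            V, u = c, u + d
    return spikes
```

With I_ext = 10 the neuron fires tonically; with I_ext = 0 it stays at rest.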
2.2 Plasticity Models
In this section, the basic plasticity rules employed in this dissertation are presented. Synaptic plasticity rules serve an important role in neural network evolution, as they are key to how a neural network learns a behavior.
2.2.1 Spike Time Dependent Plasticity
Spike time-dependent plasticity, or STDP, is used as the basic synaptic plasticity model [SMA00], described succinctly by [Flo07]. STDP is a Hebbian [Heb49] learning rule that potentiates causal synaptic connections. Specifically, if a pre-synaptic spike precedes a post-synaptic spike, then the corresponding synapse will be strengthened. If, on the other hand, a post-synaptic spike precedes the pre-synaptic spike, then the corresponding synapse is weakened.
In this plasticity model, the term X_j(t) = Σ_sp_j δ(t − t_j^sp) denotes the spike train of neuron j as a sum of Dirac functions over the spike times of neuron j. The synaptic update rule for the weight W_ij, between pre-synaptic neuron j and post-synaptic neuron i, is given by:
Ẇ_ij(t) = P_ij(t) X_i(t) − D_ij(t) X_j(t − Δ_ij),  (2.17)
Ṗ_ij(t) = −P_ij(t)/τ_+ + A_+ X_j(t − Δ_ij),  (2.18)
Ḋ_ij(t) = −D_ij(t)/τ_− + A_− X_i(t),  (2.19)
where P_ij is the potentiation trace, tracking the influence of pre-synaptic spikes, and D_ij is the depression trace, tracking the influence of post-synaptic spikes. A_+ and A_− correspond to the maximum potentiation and depression of synaptic strength possible, respectively, and τ_+ and τ_− determine the effective time windows for potentiation and depression, respectively. To ensure network stability, we require β := A_+τ_+/(A_−τ_−) < 1, so that depression is stronger than potentiation [SMA00]. The values W_ij, measured in units of g_max, are artificially limited to the interval [0, 1].
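Because P_ij and D_ij decay exponentially between spikes, equations (2.17)-(2.19) admit a simple event-driven implementation for a single synapse: the traces only need updating at spike events. The numerical values of A_± and τ_± below are illustrative only, chosen so that A_+τ_+ < A_−τ_−:

```python
import math

# Event-driven STDP for one synapse (eqs. 2.17-2.19). Illustrative constants.
A_PLUS, A_MINUS = 0.005, 0.00525
TAU_PLUS = TAU_MINUS = 20.0       # ms

def run_stdp(pre_times, post_times, w=0.5, delay=0.0):
    """Process pre-/post-synaptic spike times (ms) in order; return the
    updated weight, clipped to [0, 1]. delay is the axonal delay Delta_ij."""
    P = D = 0.0                   # potentiation / depression traces
    last = 0.0
    events = sorted([(t + delay, "pre") for t in pre_times] +
                    [(t, "post") for t in post_times])
    for t, kind in events:
        # Exponential decay of the traces since the last event (2.18, 2.19).
        P *= math.exp(-(t - last) / TAU_PLUS)
        D *= math.exp(-(t - last) / TAU_MINUS)
        last = t
        if kind == "pre":
            w -= D                # depression term of eq. (2.17)
            P += A_PLUS
        else:
            w += P                # potentiation term of eq. (2.17)
            D += A_MINUS
    return min(1.0, max(0.0, w))
```

A causal pairing (pre at 10 ms, post at 15 ms) increases the weight; the reversed pairing decreases it.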
2.2.2 Short Term Plasticity
Short-term plasticity (STP) temporarily modifies the synaptic weights used in the neural dynamics based on the pre-synaptic firing rate. In the experiments involving STP, the following algorithm was employed [TPM98, MWT98, MM02]. When invoking STP, integration of the voltage (equation (2.10) or equation (2.14)) is augmented with what is called an effective synaptic weight μ_ij, rather than the absolute synaptic weight W_ij. These effective synaptic weights are short-term modifications of the absolute weights, dependent on the pre-synaptic firing rate. That is, formally, replace equation (2.13) with
τ_ℓ dg_i^ℓ/dt = −g_i^ℓ + Σ_j μ_ij(t) δ(t − t_j^sp), for ℓ ∈ {exc, inh},  (2.20)

where μ_ij is computed using another set of equations [TPM98]:
μ_ij = A_ij x U_1,  (2.21)
u̇ = −u/τ_F_ij + U_ij (1 − u) r(t),  (2.22)
ẋ = (1 − x)/τ_D_ij − U_1 x r(t),  (2.23)
U_1 = u(1 − U_ij) + U_ij.  (2.24)
Here, A_ij is a scaling constant, τ_D_ij and τ_F_ij are the depression and facilitation time constants, u tracks synaptic utilization, x tracks synaptic availability, r is the instantaneous firing rate of pre-synaptic neuron j, U_ij is a constant determining the initial release probability of the first spike, and U_1 is a mathematical convenience factor.
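Equations (2.21)-(2.24) can be integrated directly for a constant pre-synaptic rate r. The sketch below uses illustrative parameters of our own for a depressing synapse (τ_D > τ_F), for which the effective weight μ falls as the pre-synaptic rate rises:

```python
def stp_effective_weight(r, T=2.0, dt=0.001, A=1.0, U=0.2,
                         tau_D=0.2, tau_F=0.05):
    """Integrate eqs. (2.21)-(2.24) at a constant pre-synaptic rate r
    (Hz; time constants in seconds) and return the effective weight
    mu = A * x * U1 after T seconds. Parameter values are illustrative."""
    u, x = 0.0, 1.0               # utilization and availability
    for _ in range(int(T / dt)):
        U1 = u * (1.0 - U) + U                 # eq. (2.24)
        du = -u / tau_F + U * (1.0 - u) * r    # eq. (2.22)
        dx = (1.0 - x) / tau_D - U1 * x * r    # eq. (2.23)
        u += dt * du
        x += dt * dx
    return A * x * (u * (1.0 - U) + U)         # eq. (2.21)
```

For these parameters, driving the synapse at 40 Hz yields a smaller effective weight than driving it at 5 Hz, i.e. the synapse depresses with rate.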
CHAPTER 3
Short Term Plasticity Aided Signal Propagation
3.1 Introduction
In the absence of stimulus, the brain remains active. That is, the brain maintains a sustained background level of neural activity irrespective of stimulus input. This baseline activity in neural networks is referred to as RAIN activity, or Recurrent Asynchronous Irregular Nonlinear activity [BQH10, VA05, KSA08]. These networks can be achieved through a balance of excitation and inhibition, where the contributions from each nearly cancel. The activity then is a result of fluctuations about the mean [SN94, TS95, TM97]. In vivo measurements demonstrate that neural responses are highly variable [BW76, Dea81, SK93, HSK96, ALG00]. Reproducing this variability has been established as an essential aspect of neural models [USK94]. Oftentimes, models employ random noise from an external source. However, though neurons are subject to external noise, it is evident that most cortical variability is generated from internal activity [ASG96]. Sparsely connected balanced networks of spiking neurons can sustain the background noisy activity without the need of a random external source [VS96, VS98, AB97, Bru00, MHK03, LAH04].
Once a background of neural activity is in place, it is important to understand how signals can be faithfully transmitted through the noise. External signal inputs can blow up or dissipate along the signal path, causing either system-wide runaway activity, obfuscating the original signal, or a loss of information.

This chapter is joint work with Narayan Srinivasa.
In this chapter, we first consider the problem of generating RAIN networks, and then we examine the faithfulness of signal transmission through an embedded circuit. We use STP (section 2.2.2) in a novel coupled network to both enhance network RAIN activity as well as boost network signal transmission capabilities. We then explore the problem of selecting the appropriate STP parameters.
3.2 RAIN Networks
Figure 3.1: RAIN network configuration. The red arrows indicate inhibitory connections and the blue arrows are excitatory connections.
To construct a RAIN network, we use 8,000 excitatory neurons and 2,000 inhibitory neurons with a connectivity of 1.5%. This means that for any two neurons j and i, the probability that there is a connecting synapse from j to i is 1.5%. Figure 3.1 is the network diagram. The uniform synaptic strengths from the excitatory pool are denoted by W_exc and the synaptic weights from the inhibitory pool are W_inh. For example, for any excitatory neuron j, and any other neuron i in j's fanout pool, the strength of the synapse from j to i is W_exc.
With the network parameters in table 3.1, we proceed by performing a parameter sweep

All Networks
C_m = 200 pF         g_leak = 10 nS
E_inh = −80 mV       E_exc = 0 mV
V_thresh = −54 mV    V_reset = −60 mV
E_rest = −74 mV      fanout = 150
τ_exc = 5 ms         τ_inh = 15 ms

Table 3.1: Network parameters used in this chapter.
across the values (W_exc, W_inh) ∈ (0, 10] × (0, 100], in nS, with a discretization of 0.1 nS and 1 nS respectively. For each parameter set, 200 random neurons (allowing both excitatory and inhibitory) are stimulated with Poisson-distributed current for 50 ms, resulting in an initial 60 Hz of activity, at which point all external inputs are turned off and the network's activity is recorded for 2 seconds. The final 100 ms of activity is analyzed for average firing rate and inter-spike-interval coefficient of variation for each neuron. These values are then averaged across the network, and recorded. The goal is to find sustainable asynchronous network activity between 10 and 20 Hz. A coefficient of variation above 1 ensures network asynchrony. With this approach, we were able to find (W_exc, W_inh) = (4.1 nS, 98 nS), amongst others (figure 3.2), with a low firing rate and asynchronous activity, all sustained for at least 2 seconds. These parameters are used throughout this chapter.
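The per-neuron irregularity measure used in this sweep can be computed as follows; this is just the standard definition of the inter-spike-interval coefficient of variation:

```python
import math

def isi_cv(spike_times):
    """Coefficient of variation of the inter-spike intervals (ISI standard
    deviation over ISI mean). Perfectly regular trains give 0; irregular,
    Poisson-like trains give values near or above 1, the asynchrony
    criterion used in the sweep."""
    isis = [b - a for a, b in zip(spike_times, spike_times[1:])]
    if len(isis) < 2:
        return 0.0
    mean = sum(isis) / len(isis)
    var = sum((x - mean) ** 2 for x in isis) / len(isis)
    return math.sqrt(var) / mean
```

A regular 100 Hz train scores 0, while an uneven train scores well above it.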
Figure 3.2: The firing rate for the networks tested in the synaptic weight parameter sweep.
3.3 Signal Propagation
3.3.1 Circuit Design
An important aspect of neural computation is the transmission of information within the cortex [VA05, KRA10]. The signal transmission traits of a neural network are thus an important feature to study. Here we consider a simple model for signal transmission proposed in [VA05]. Once a sustainable background network exhibiting RAIN activity is established, we select a random 5-layer circuit from the network in the following way. The first layer of the circuit consists of 30 random neurons from the network. The second layer of the circuit consists of 30 neurons selected from the pool of postsynaptic neurons of the first layer, with the requirement that any neuron selected must have at least 3 feed-forward connections from layer 1. For layer n > 2, we select 30 neurons from the pool of postsynaptic neurons of layer n − 1, where each selected neuron has at least 3 feed-forward connections from layer n − 1. In addition, we impose a no-short-circuit requirement, where each neuron in layer n has no connections from a neuron in layer k, where k < n − 1. This forces the signal to propagate through the layers in order. If 30 neurons cannot be selected for a layer, the layer will consist of as many neurons as can be found satisfying the requirements. The feed-forward synaptic weights in the circuit are strengthened by a factor of 16. See figure 3.3 for the network and circuit architecture. We refer to this design as the circuit network. It will be used as a basis for larger networks in subsequent sections. The signal propagation through 5 layers is shown in figure 3.4. Notice the reverberating signals that can be propagated as well, which are signals that are transmitted through the circuit without a layer 1 stimulus.

Figure 3.3: Signal propagation circuit network architecture. A naturally occurring feed-forward circuit is found within a RAIN network. The feed-forward connections are then strengthened, and this circuit is the circuit we observe for signal propagation.
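The layer-selection procedure described above can be sketched as follows. The function and parameter names are our own; `adj[j]` is assumed to hold the set of postsynaptic targets of neuron j:

```python
import random

def select_circuit(adj, num_layers=5, width=30, min_ff=3, rng=None):
    """Select a feed-forward circuit from a recurrent network, following the
    construction in the text. adj[j] is the set of postsynaptic targets of
    neuron j; names and defaults here are our own."""
    rng = rng or random.Random(0)
    neurons = list(adj)
    layers = [rng.sample(neurons, width)]        # layer 1: random neurons
    for n in range(1, num_layers):
        prev = layers[n - 1]
        earlier = set().union(*layers[:n - 1])   # layers k < n - 1
        used = set().union(*layers)
        candidates = []
        for i in neurons:
            if i in used:
                continue
            # at least min_ff feed-forward inputs from the previous layer ...
            ff = sum(1 for j in prev if i in adj[j])
            # ... and no inputs from layers k < n - 1 (no short circuits)
            if ff >= min_ff and not any(i in adj[j] for j in earlier):
                candidates.append(i)
        # take up to `width` neurons; fewer if not enough satisfy the rules
        layers.append(rng.sample(candidates, min(width, len(candidates))))
    return layers
```

As in the text, a layer simply ends up smaller than 30 when too few neurons satisfy the requirements.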
Figure 3.4: A) Signal propagation through 5 layers. B) A reverberating signal that is experienced in layer 5, but without inputs to layer 1. C) The average firing rate of the neurons in each layer for the duration of the experiment.
3.4 Properties of STP
The introduction of STP (see section 2.2.2) into the neural dynamics strongly influences the stability of the RAIN activity and a network's ability to propagate signals. In [STM07], it was demonstrated that synaptic STP dynamics can support a fixed average network firing rate, leading to stable firing dynamics. In section 3.8.1, the linear approximation to the change in firing rate with respect to the change in network inputs is examined, using mean field theory. In this section we present the fixed-point requirements, and the heuristics behind the analysis conducted in section 3.8.1, following [STM07].
First, consider the steady state for equations (2.21) to (2.24), given a steady firing rate r* (steady-state values indicated by asterisks):

μ*_ij = A_ij x* U_1*,  (3.1)
u* = τ_F_ij U_ij r* / (1 + τ_F_ij U_ij r*),  (3.2)
x* = 1 / (1 + τ_D_ij U_1* r*),  (3.3)
U_1* = u*(1 − U_ij) + U_ij.  (3.4)
Assuming that the static weights W_ij were selected to give an average network firing rate at r*, the dynamical synapses can produce a fixed point at firing rate r* if the multiplicative constant A_ij is selected in the following way. For a given set of STP parameters and desired firing rate r*, pick

A_ij(U_ij, τ_D_ij, τ_F_ij, r*) = W_ij / ( x*(U_ij, τ_D_ij, τ_F_ij, r*) · U_1*(U_ij, τ_D_ij, τ_F_ij, r*) ).  (3.5)
We can now compute the effective weight μ_ij at r*:

μ*_ij = A_ij x* U_1* = ( W_ij / (x* U_1*) ) x* U_1* = W_ij,  (3.6)
which yields fixed firing dynamics, by assumption. Thus, with A_ij picked in this way, we attain a fixed-point firing rate for the system. We now consider a heuristic for the stability of r*. In order to make r* a stable fixed point, [STM07] proposed that for a given firing rate r, the effective synaptic weights μ_ij should obey:
1. When r < r*:
   (a) increase synaptic efficacy for E → E and I → I synapses;
   (b) decrease synaptic efficacy for E → I and I → E synapses.

2. When r > r*:
   (a) decrease synaptic efficacy for E → E and I → I synapses;
   (b) increase synaptic efficacy for E → I and I → E synapses.

Figure 3.5: The dynamic synapses plotted as a function of the presynaptic firing rate. The STP parameters can be chosen to produce a fixed-point firing rate. Here, the fixed point is 10 Hz, at which point μ_mn = W_mn, which was already chosen to produce stable RAIN firing.
This heuristic is visualized in figure 3.5. Consider the left figure, which plots the excitatory-to-excitatory and inhibitory-to-excitatory dynamic synapses (μ_ee and μ_ei respectively). When a presynaptic excitatory neuron is firing faster than 10 Hz, the assumption is that the postsynaptic neuron is also firing too fast, and then μ_ee < W_ee, providing less excitation than a static synapse, to slow the network down. Similarly, for slower than 10 Hz presynaptic excitatory firing, μ_ee > W_ee, to speed up the network. If an inhibitory presynaptic neuron is firing faster than 10 Hz, then μ_ei > W_ei, providing more inhibition to slow down the postsynaptic neuron. If the presynaptic inhibitory neuron is firing slower than 10 Hz, then μ_ei < W_ei, to lessen the inhibition to the postsynaptic neuron.
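The fixed-point construction of equations (3.1)-(3.5) is easy to verify numerically. In the sketch below, the STP time constants mirror the Master E → E entry of table 3.2 and the weight is the W_exc found in section 3.2; by construction, the effective weight at r* recovers W exactly:

```python
def stp_steady_state(U, tau_D, tau_F, r):
    """Steady-state utilization u*, availability x*, and U1* at constant
    rate r, per eqs. (3.2)-(3.4). Time constants in seconds, r in Hz."""
    u = tau_F * U * r / (1.0 + tau_F * U * r)      # eq. (3.2)
    U1 = u * (1.0 - U) + U                         # eq. (3.4)
    x = 1.0 / (1.0 + tau_D * U1 * r)               # eq. (3.3)
    return u, x, U1

def scale_constant(W, U, tau_D, tau_F, r_star):
    """Pick A_ij per eq. (3.5) so the effective weight equals W at r*."""
    _, x, U1 = stp_steady_state(U, tau_D, tau_F, r_star)
    return W / (x * U1)

# Fixed point at r* = 10 Hz; tau values mirror the Master E -> E row of
# table 3.2, and W is the W_exc of section 3.2 (in nS).
W, U, tau_D, tau_F, r_star = 4.1, 0.2, 0.009, 0.072, 10.0
A = scale_constant(W, U, tau_D, tau_F, r_star)
_, x_star, U1_star = stp_steady_state(U, tau_D, tau_F, r_star)
mu_star = A * x_star * U1_star      # eq. (3.1): equals W by construction
```

The cancellation in equation (3.6) guarantees mu_star == W up to floating-point round-off, independently of the STP parameters chosen.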
Figure 3.6: A) RAIN activity for 100 of the network neurons. The network parameters are suboptimal, leading to activity that lasts less than 2 seconds. B) RAIN activity for 100 of the network neurons. STP is employed, enabling the network to overcome the faulty choice in network parameters. The activity lasts more than 10 seconds.
3.5 STP Conditioned RAIN
We found that STP's stabilization properties, discussed above, can improve the fault tolerance of a network's self-sustained RAIN activity despite a suboptimal balance between excitatory and inhibitory weights. Optimally balanced excitatory/inhibitory networks can produce continual RAIN activity, as seen in section 3.2. However, self-sustained RAIN activity can be fleeting in networks with unbalanced parameters; it can also be cut short by network-wide shocks, caused by external inputs to the network or internal activity fluctuations, which compromise the excitatory/inhibitory balance and subsequently silence the network. Figure 3.6 (A) shows the stunted activity in a network with improperly chosen network parameters. Introducing STP dynamics can fix the RAIN activity, resulting in sustained activity. Figure 3.6 (B) demonstrates sustained activity for a network identical to that in (A), with the addition of dynamical synapses. The activity, influenced by STP, persists for over 10 seconds.
3.6 Signal Transmission in Coupled STP Networks
3.6.1 Network Layout
Figure 3.7: The coupled signal propagation network architecture. Two circuit networks are weakly coupled together. The two networks have the same general neural parameters and configuration statistics, but the STP parameters for each network can be chosen independently, producing different firing dynamics in each network. The left network is referred to as Master, having STP parameters that yield self-sustained network activity. The right network is referred to as Slave, which has STP parameters that allow short excitatory bursts through and then kill network activity.
In this section, we consider the advantages of STP with respect to the signal transmission experiment proposed in section 3.3.1. In this version of the experiment, we consider two types of STP parameters. Consider a dual-network setup, where each network is configured as in section 3.3.1, and we include dynamical synapses governed by STP. We call the two networks Master and Slave, which are weakly coupled (see figure 3.7). The difference between the Master and Slave networks lies solely in the type of STP parameters selected. In Master, the STP parameters chosen provide longevity with respect to the RAIN dynamics, as demonstrated in section 3.5. The Slave network, on the other hand, employs STP parameters that tend to kill spiking dynamics. For example, in the brain, the prefrontal cortex (PFC) is known to be reciprocally connected to many other areas in the brain. However, the STP parameters in the PFC are more facilitating than other cortical areas, which are primarily depressive [MBT08, Fus08]. Parts of the PFC are thought to be more important for neural modulatory purposes rather than computing and signal propagation [ZB06, BZ07], which would be analogous to the role played by Master in our coupled network architecture. In this chapter, we demonstrate that, likewise, Master is ill-suited for signal propagation, as Master's tendency to sustain activity elicits reverberating signals. On the other hand, the dynamics in Slave tend to kill spike activity. This allows for quick bursts of signal to propagate through a circuit, yet the dynamics kill subsequent ripple effects. On its own, Slave cannot sustain activity, and dies quickly. However, as we will show below, the weak coupling between Slave and Master is enough to sustain baseline RAIN activity in Slave. This is important because, without such activity, stimulus signals will not propagate through the circuit as readily. The baseline activity keeps the membrane voltage of the neurons near threshold, enabling quick ascension to action potential and, in turn, quick responses to input signals.

Type     Master                            Slave
E → E    U = 0.2, τD = 9 ms, τF = 72 ms    U = 0.2, τD = 10 ms, τF = 5 ms
E → I    U = 0.2, τD = 10 ms, τF = 5 ms    U = 0.2, τD = 9 ms, τF = 72 ms
I → E    U = 0.2, τD = 10 ms, τF = 5 ms    U = 0.2, τD = 9 ms, τF = 72 ms
I → I    U = 0.2, τD = 9 ms, τF = 72 ms    U = 0.2, τD = 10 ms, τF = 5 ms

Table 3.2: Master and Slave STP parameters used in this chapter.
In this work, we are interested in the emergent global network stability and how signal propagates through the local circuits, as we demonstrate in the following sections. However, the Master and Slave networks, and the weak coupling between them, are approximations of the more biologically relevant small-world networks [ASW06, SCK04]. In small-world topologies, neurons have very few neighbors, but the average synaptic pathway between any two random neurons is short, allowing for efficient global communication. Within either Slave or Master, though the probability that two neurons are neighbors is low (since the connectivity is just 1.5%), the average path length between two neurons is 2.1, in adherence with the fundamental structural properties of small-world architectures. Weak coupling of Master and Slave still allows for relatively short global path lengths, since the probability that a single neuron does not have a bridge neuron to the other network in its fanout pool is almost zero. However, because the coupling is weak, the global transfer of dynamics between networks is slow, preventing sudden disturbances in either network from adversely affecting the other. Thus, while we are analyzing the global stability in a very specific network architecture, these results are applicable to more biologically inspired networks. The key to the success of this model is in the local-versus-global dynamic, in which global transmission rates are low (but present, inducing the requisite stability properties), yet local feed-forward circuits are easily found. These attributes are shared with general biological networks.
3.6.2 Coupled RAIN Dynamics
We refer to the coupling between the networks as the bridge. In this chapter, except for figure 3.10, we use a bridge probability p = 2E-6 and a synaptic strength of 30 pS for each bridge synapse. The bridge probability p indicates the probability that a neuron in one network is connected to a neuron in the other. As the networks have 10,000 neurons each, p = 2E-6, for example, corresponds to 200 connections in each direction (assuming a bidirectional bridge), each of strength 30 pS. In the control configuration for the coupled network, these parameters are used, along with a bidirectional bridge. The STP parameters used for Master and Slave in this chapter are listed in table 3.2.
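The connection counts quoted here follow directly from treating each ordered neuron pair as an independent Bernoulli trial; a sketch of the arithmetic:

```python
def expected_bridge_connections(n_src, n_dst, p):
    """Expected number of bridge synapses from one network to the other when
    each ordered neuron pair is connected independently with probability p."""
    return n_src * n_dst * p
```

With 10,000 neurons per network, p = 2E-6 gives 200 expected connections in each direction, consistent with the text; p = 2E-8 and p = 2E-4 give 2 and 20,000, respectively.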
We now consider several variants of the coupled network configuration, where we vary the direction of the bridge and examine the resulting activity in Master and Slave. We initialize each network independently (unless otherwise stated), as described in section 3.2. Figures 3.8 and 3.9 summarize the following results. When the networks operate independently (no bridge), Master sustains activity whereas Slave dies quickly, as expected (see figure 3.8 (A)). When Slave bridges to Master, Slave dies quickly, but Master sustains activity, as in figure 3.8 (B). When Master bridges to Slave, figure 3.9 (A), both networks can
Figure 3.8: A) Slave and Master are uncoupled. Master continues indefinitely whereas Slave dies. B) Slave has projections onto Master. Here Slave dies, as expected, and Master continues indefinitely.
sustain activity. In figure 3.9 (B), a bidirectional bridge is used, but only Slave is initialized. This is enough to, in turn, initialize Master via the bridge. Master is then also able to maintain activity in Slave through the bridge. These configurations demonstrate that Slave needs Master, whereas Master could potentially survive without Slave.¹ However, a bidirectional weak coupling is desirable: in the event that Master experiences a shock, Slave can prevent the death of Master through rare inputs, as seen in the initialization of Master through the bridge connections in figure 3.9 (B). This is important because achieving RAIN activity is a delicate balance, difficult to attain [KSA08].
In figure 3.10 we examine how weak the bidirectional bridge can be before network activity breaks down. In figure 3.10 (A) and (B), the average firing rate of Slave and Master is plotted for different bridge connectivity probabilities p, as shown in the legend. For p = 2E-7, merely 20 connections in each direction, both networks thrive. For p = 2E-8, two bridge connections, Master lives while Slave dies. The network dies quickly for p = 2E-4, which is unsurprising, as that corresponds to 20,000 connections in each direction, twice the number of neurons in each network. In this case, a surge in activity is easily propagated globally, which in turn forces network synchrony and then death [KSA08]. In figure 3.10 (C) and (D), the average firing rate of Slave and Master is plotted for different connectivity strengths. The synaptic strength s in the legend indicates the connection strength of the bridge synapses. The coupled network can self-sustain even for s = 5 nS, but the network dies for s = 2 nS.
3.6.3 Coupled Signal Propagation Dynamics
In this section, we consider the signal propagation capabilities of each network. We build the coupled circuit network by combining two circuit networks from section 3.3.1. The networks are then coupled and endowed with independent STP parameters as in section 3.6.2. Again, we refer to the individual circuit networks as Master and Slave, based on the type of STP
¹ However, in the brain, the PFC, which could serve as Master, does require some stimulation to become self-sustaining.
Figure 3.9: A) Master has projections onto Slave. This is sufficient to restart Slave whenever Slave dies. B) Slave and Master are mutually coupled. In this case, only Slave received initial inputs, and Master relied on Slave for a jump-start. This demonstrates that Slave has the ability to start Master in the event Master dies. In this configuration, both networks thrive indefinitely.
Figure 3.10: An analysis of the coupling required for the connections between Master and Slave. A & B) The average firing rate of Slave and Master, for one second of elapsed time, for different connectivity probabilities. These were performed with a bridge synapse strength of 30 nS. C & D) The average firing rate of Slave and Master, for one second of elapsed time, for different connectivity strengths. These were performed with a synaptic bridge connection probability of 2E-4.
Figure 3.11: For any layer k of interest, we construct a binary projection neuron pair. Layer k projects onto the excitatory indicator neuron (blue). The indicator neuron has an excitatory connection to the inhibitory neuron (red) which, in turn, inhibits the indicator neuron to prevent the indicator from being overwhelmed by the circuit layer during a stimulus.

dynamics used in the network. We then run the signal propagation experiment from before, with identical inputs to Master and Slave.
From figure 3.12 (D) and (F), we see that reverberating signals are prominent in Master. Visually, the circuit in Slave does not become hyperactive as readily, due to its tendency to kill activity. This decreases the severity of the reverberating signal. In order to quantify the visual results, we wanted to measure signal propagation in Master and Slave. For each circuit layer of interest, we introduce a binary projection neuron pair composed of an excitatory neuron and an inhibitory neuron. The excitatory neuron is called the indicator neuron, to which the circuit layer being measured projects. We consider the indicator neuron's activity as representative of the layer that projects onto it. The indicator neuron connects to an inhibitory neuron which, in turn, connects back to the indicator neuron, as shown in figure 3.11. The inhibitor prevents runaway excitement in the indicator. The connection strengths between the layers and the respective indicator neurons, and the connection strengths within the binary pair of neurons, are all consistent with the overall background network strengths (non-amplified). The negative feedback loop to the indicator neuron prevents it from becoming overwhelmed by a surge in the circuit layer's activity. We will measure synchrony in the indicator neurons in order to determine faithful signal propagation. Denote the input layer's
indicator neuron for Master and Slave by ML1 and SL1, respectively. Likewise, denote the final layer's indicator neuron for Master and Slave by ML4 and SL4, respectively. Note that in measuring synchrony, we limit ourselves to 4 circuit layers because we were able to fill layer 4 with respect to the criteria described in section 3.3.1, whereas layer 5 was routinely several neurons short.
In section 3.8.3, we outline a metric for measuring the synchrony of the indicator neurons. The metric m(·, ·) takes two spike trains x, y ∈ {0, 1}^N to the unit interval [0, 1], where m(x, x) = 1 implies an exact correlation, whereas m(x, y) = 0 implies no correlation. Define χ(·) to be the spike train of a neuron. Formally, for a neuron n, the spike train vector χ_{t1}^{t2}(n) ∈ {0, 1}^{t2−t1} is defined by χ_t = 1 if neuron n spikes at time t, and zero otherwise, where t is measured in ms and comes from the discrete interval {t1, ..., t2 − 1}. In this case, we ran the signal propagation experiment for 20 seconds, injecting signals of 180 Hz for 25 ms into layer 1 of each network. The inputs to both Slave and Master circuits were the same. The time between signals was chosen uniformly from {150, ..., 450} milliseconds. We measured the correlation of the indicator neurons on the last 10 seconds of spiking activity. We found that
m(χ_{10K}^{20K}(ML1), χ_{10K}^{20K}(ML4)) = .344
m(χ_{10K}^{20K}(SL1), χ_{10K}^{20K}(SL4)) = .710,

which is a significant difference; by comparison, values under .3 were generally found to bound the correlation between random neurons. That is, the signal propagating from layer 1 to layer 4 in the Slave network is much more faithful than that in the Master network. Thus, with this configuration, we have found a symbiotic relationship between Master and Slave. Master generates the requisite inputs to keep Slave alive, whereas Slave provides a better medium for signal propagation.
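The spike-train vectors χ used in these measurements can be assembled directly from recorded spike times; a minimal NumPy sketch (the spike times below are illustrative):

```python
import numpy as np

def spike_train_vector(spike_times_ms, t1, t2):
    """Return chi_{t1}^{t2}: a {0,1} vector of length t2 - t1, with a 1 in
    each millisecond bin of [t1, t2) where the neuron spiked."""
    x = np.zeros(t2 - t1, dtype=np.uint8)
    for t in spike_times_ms:
        if t1 <= t < t2:
            x[int(t) - t1] = 1
    return x

# Three spikes landing inside the 10-second measurement window [10K, 20K).
x = spike_train_vector([10_050, 10_300, 19_999], 10_000, 20_000)
print(int(x.sum()))  # 3
```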
Figure 3.12: A & B) Signal propagation through 5 layers for Master and Slave. C & D) A reverberating signal that is experienced in layer 5 of Master, but not in Slave. E & F) The average firing rate of the neurons in each layer for Slave and Master respectively.
3.7 Finding Master STP Parameters
We have established that different types of STP parameters can generate different desirable network dynamics. In this section we attempt to classify regions of the STP parameter domain with respect to their likelihood of producing the dynamics requisite for a Master-like network. In section 3.4 it was conjectured that the derivative of µ (referred to in this section as dµ) is important to the dynamics induced by STP. In sections 3.8.1 and 3.8.2, an analytical argument is given in support of the conjecture. In [STM07], the authors identify which derivative signs are important for a fixed-point firing rate with respect to their mean field current injection model, but our goal is a bit different: we simply require self-sustained neural dynamics with respect to the spike-based neural networks we have been considering throughout this chapter. The analysis done in section 3.8.1 assumes small changes in network inputs, which will not always be true. Furthermore, the analysis in section 3.8.1 is a first-order approximation to the network dynamics, ignoring the nonlinear dynamical interactions. The higher-order dynamics are very difficult to predict, and it is currently unclear if any such predictions can be made. Also, for our purposes, we are less strict on the fixed-point firing rate, and only require sustained activity at a reasonable rate. To be precise, our criterion is sustained firing for two seconds at a rate between 1 and 50 Hz. For these reasons, we cannot conclude at this point a hard and fast rule, based on analytical estimates, that will always yield a Master-like network. Based on our work in section 3.8.2, we can, however, significantly increase the chances of finding such networks.
First, we restrict the STP parameters as follows. Let U ∈ (0, 1), as it is a probability, and let τD, τF ∈ [0, 2], with the bounds inspired by experimental data [STM07]. Somewhat arbitrarily, we use r* = 12 Hz as our desired firing rate for setting the value of Aij in equation (3.5). Our only requirement in this choice was that it be a biologically feasible firing rate, consistent with low background activity. We proceed by labeling the synaptic connection types as EE, EI, IE, and II for excitatory-to-excitatory, excitatory-to-inhibitory, inhibitory-to-excitatory and inhibitory-to-inhibitory, respectively.
Note that because STP has three parameters, and each network has four connection types, choosing the STP parameters at random and independently for the different connection types means choosing parameters from a 12-dimensional space, making a brute force search intractable. Furthermore, as it turns out, a random search is unlikely to yield acceptable parameters, as demonstrated by the first entry in table 3.3.
To explore the parameter space more efficiently, we characterize three different dynamic synapse regimes within the domain space, and explore each regime. For each connection type, we independently consider an STP parameter class of N, Pb, or Pa (giving us 81 different types of STP parameter combination for the four different connection types), where N, Pb, and Pa depend on r* and rcrit, and are defined as follows. In section 3.8.2, we derive a critical firing rate rcrit, in equation (3.25), such that for firing rates r below rcrit, dµ is positive, and for firing rates above rcrit, dµ is negative. Given a specific r* and rcrit pair, we consider the following three regimes.

• N (negative): Classified by rcrit < 0, resulting in
  – dµ > 0 can never occur, and
  – dµ < 0 for all positive r.

• Pb (positive, threshold below r*): Classified by 0 < rcrit < r*, resulting in
  – dµ > 0 when 0 < r < rcrit < r*, and
  – dµ < 0 when rcrit < r.

• Pa (positive, threshold above r*): Classified by r* < rcrit, resulting in
  – dµ > 0 for r < rcrit, and
  – dµ < 0 for r* < rcrit < r.
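These regimes can be classified programmatically from rcrit; a sketch assuming equation (3.25) and the target rate r* = 12 Hz (function names and the sample UDF values are ours):

```python
import math

def r_crit(U: float, D: float, F: float) -> float:
    """Critical firing rate from eq. (3.25): r_crit = -1/F + sqrt((1-U)/(U*D*F))."""
    return -1.0 / F + math.sqrt((1.0 - U) / (U * D * F))

def stp_class(U: float, D: float, F: float, r_star: float = 12.0) -> str:
    """Classify a UDF parameter triple as N, Pb, or Pa relative to r*."""
    rc = r_crit(U, D, F)
    if rc < 0:
        return "N"                      # always depressive
    return "Pb" if rc < r_star else "Pa"

print(stp_class(0.5, 0.05, 0.1))  # rcrit ~ 4.1 Hz, below r* = 12 Hz -> "Pb"
```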
For our analysis of the 12-dimensional STP network parameter space, we define a network type WXYZ, where W, X, Y, Z ∈ {N, Pb, Pa}, to be a network in which the EE, EI, IE, and II connections are governed by the W, X, Y, and Z class of STP parameters, respectively. For each of the 81 network types WXYZ, the STP parameters for each synapse type are selected from the corresponding STP class uniformly at random, with the additional caveat that we require the dynamical synapses to be slow moving: |dµ| < .01. We found that this additional requirement improves our experimental success rates. Through random sampling of more than 10^8 parameter sets, it was found that about 18.8% of the STP parameters in the space violated the small dµ condition. For each network type WXYZ, and with the additionally constrained dµ, we chose 10^6 sets of parameters to test, uniformly at random, as stated above. For each parameter set chosen, we recorded a success if, at the end of 2 seconds, the network was firing at a rate between 1 and 50 Hz. We present the highest percentages of success in table 3.3, along with the results for choosing parameters uniformly at random (the first entry).
Though the success rate for the most successful region in table 3.3 is only around 3.2%, it is two orders of magnitude larger than searching at random. This provides a preliminary heuristic on which regions of the STP parameter domain to search for parameters that induce self-sustaining spiking activity. From this statistical analysis, it is clear that the ideal parameter region requires that the EE and EI connections have dµ > 0 for firing rates above our target firing rate of 12 Hz. The other two types of synapses appear to be less important.
3.8 Analysis
3.8.1 Analyzing Firing Rate Changes
Here we expand upon the arguments of Sussillo et al. [STM07]. The following argument assumes the steady state dynamics described in equations (3.1), (3.3) and (3.4). In the following, we drop STP parameter subscripts and use D = τD and F = τF , for convenience.
We consider a network of two populations: excitatory and inhibitory. Let Wmn be the mean synaptic weight from population n to population m, where m, n ∈ {exc, inh}
Type       # Successes   % Success
RAND       332           0.0332
PaPaPaPa   31978         3.1978
PaPaPbPb   19728         1.9728
PaPaPbN    17335         1.7335
PaPaNN     12478         1.2478
PaPaPaPb   9459          0.9459
PaPaNPb    5863          0.5863
PaPbPaPa   3353          0.3353
PaPaPbPa   3131          0.3131
PbPbPaPa   2257          0.2257
PaPaPaN    2082          0.2082
PaPbPbPb   1972          0.1972
PaPbPbN    1918          0.1918
PbPbPbPb   1710          0.1710
PbPbPbN    1690          0.1690
PaPbNN     1452          0.1452
PbPbNN     1425          0.1425
PaPbPaPb   1420          0.1420
PbPaPaPa   1332          0.1332
PbPaPbPb   1013          0.1013
Table 3.3: The success rate for finding Master-like STP parameters for various regions of the STP parameter domain. The uniformly-at-random baseline is the first entry; the most prolific regions defined in section 3.7 follow.
(where we occasionally use the notation e = exc and i = inh). Similarly, let µmn be the mean dynamic synapse between the populations. We assume that the dynamic synapses are instantly equilibrating functions of the presynaptic firing rates. We assume that the synaptic
weights Wmn are chosen to produce a stable network firing rate of r* = (re*, ri*)^T for a constant input of v = (ve, vi)^T. Here, re and ri are the average firing rates of the excitatory and inhibitory populations, and ve and vi are the external inputs to the excitatory and inhibitory populations. Recall that for r*, we have µmn = Wmn, from the choice of the constants Aij in equation (3.5). Further, we assume that for some decay constant τm,
τm drm/dt = −rm + fm[vm + µme(re)re − µmi(ri)ri],    (3.7)

where fm is some monotonic function, called the rate transfer function. This is an approximation to the network dynamics, commonly referred to as the mean field approximation, where all dynamics are considered in aggregate. For a constant network firing rate r*, equation (3.7) yields
rm* = fm[vm + µme(re*)re* − µmi(ri*)ri*].    (3.8)
We use perturbation theory to examine the change in firing rate r* + δr = (re* + δre, ri* + δri)^T for a change in external input v + δv = (ve + δve, vi + δvi)^T. With this perturbation, and the instant equilibration of the dynamic synapses, equation (3.8) becomes
rm* + δrm = fm[vm + δvm + µme(re* + δre)(re* + δre) − µmi(ri* + δri)(ri* + δri)].    (3.9)
For z = F(x), we use the linearization δz ≈ F′(x) · δx. Up to first order in δr, equation (3.9) becomes
δrm ≈ βm(δvm + Wmeδre + dmeδrere* − Wmiδri − dmiδriri*),    (3.10)
where dmn = dµmn/dxn evaluated at rn*, βm = fm′(vm + Wmere* − Wmiri*), and we have used that µmn(rn*) = Wmn. Define
B = [βe 0; 0 βi],   W = B [Wee −Wei; Wie −Wii],   D = B [dee −dei; die −dii] [re* 0; 0 ri*].    (3.11)
With this notation, equation (3.10) can be written as
δr ≈ Bδv + Wδr + Dδr. (3.12)
The solution to this system is
δr ≈ (I − (W + D))^{-1} B δv,    (3.13)

where I is the identity matrix. We proceed with the assumption that the dynamic synapses are almost static (D ≈ 0), and approximate δr with respect to δv up to O(D²).
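For concrete matrices, equation (3.13) is a single linear solve; an illustrative NumPy sketch (the numerical values are placeholders, not fitted network parameters):

```python
import numpy as np

def firing_rate_change(B, W, D, dv):
    """Solve eq. (3.13): dr = (I - (W + D))^{-1} B dv."""
    I = np.eye(2)
    return np.linalg.solve(I - (W + D), B @ dv)

# Illustrative values: weak recurrent weights, nearly static synapses (D ~ 0).
B = np.diag([1.0, 1.0])
W = np.array([[0.2, -0.3], [0.4, -0.1]])
D = np.array([[0.01, -0.01], [0.01, -0.01]])
dv = np.array([0.1, 0.0])
print(firing_rate_change(B, W, D, dv))
```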
In the following, we refer to the components of a matrix M with subscripts, as in Mij. We invert the matrix in equation (3.13), while ignoring higher order terms from D, and we get
δr ≈ (1/(α − c)) [1 − Wii − Dii   Wei + Dei; Wie + Die   1 − Wee − Dee] [βe 0; 0 βi] δv

   = (1/(α − c)) [βeδve(1 − Wii − Dii) + βiδvi(Wei + Dei); βeδve(Wie + Die) + βiδvi(1 − Wee − Dee)],    (3.14)

where we have defined
α = (1 − Wee)(1 + Wii) + WeiWie (3.15)
and
c = Dee(1 − Wii) + Dii(1 − Wee) + WeiDie + WieDei. (3.16)
We now estimate 1/(α − c) under the assumptions that D ≈ 0 and sup Dmn ≪ inf Wmn, which implies that c/α < 1, since every term of c contains an element of D. We get:

1/(α − c) = (1/α) · 1/(1 − c/α) = (1/α) Σ_{k=0}^{∞} (c/α)^k = (1/α)(1 + c/α) + O(D²).    (3.17)
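The first-order truncation in equation (3.17) can be checked numerically for small c/α (illustrative numbers):

```python
# c is O(D), so it is small relative to alpha; values are illustrative.
alpha, c = 1.2, 0.01

exact = 1.0 / (alpha - c)
first_order = (1.0 / alpha) * (1.0 + c / alpha)
print(abs(exact - first_order))  # error is O((c/alpha)^2), here ~6e-5
```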
Combining equations (3.14) and (3.17), we can estimate the change in excitatory firing rate with respect to the change in excitatory input:
δre/δve ≈ (βe/α)(1 + c/α)(1 − Wii − Dii)
        = (βe/α)[(1 − Wii)(1 + c/α) − Dii(1 + c/α)]
        ≈ (βe(1 − Wii)/α)[1 + c/α − Dii/(1 − Wii)]
        = (βe(1 − Wii)/α)[1 + Dee(1 − Wii)/α + Dii(1 − Wee)/α + WeiDie/α + WieDei/α − Dii/(1 − Wii)]
        = (βe(1 − Wii)/α)[1 + Dee(1 − Wii)/α + WeiDie/α + WieDei/α + Dii((1 − Wee)(1 − Wii) − α)/(α(1 − Wii))]
        = (βe(1 − Wii)/α)[1 + Dee(1 − Wii)/α + WeiDie/α + WieDei/α − DiiWeiWie/(α(1 − Wii))].
Substituting the values for the matrix components, we arrive at

δre/δve ≈ (βe(1 + βiWii)/α)[1 + βedeere*(1 + βiWii)/α − βeWeiβidie/α − βiWieβedeiri*/α − βidiiri*βeWeiβiWie/(α(1 + βiWii))].    (3.18)
Similarly, we get
δre/δvi ≈ −(βiβeWei/α)[1 + βedeere*(1 + βiWii)/α − βidiiri*(1 − βeWee)/α − βeWeiβidiere*/α + βedeiri*(1 + βiWii)(1 − βeWee)/(αβeWei)].    (3.19)
Observe that B is positive semi-definite by the monotonicity of f. We now prove that α is positive. If the synapses are static (D = 0), then equation (3.13) reduces to
δr ≈ (I − W)^{-1} B δv.    (3.20)
Observe that α = det(I − W), and we have
δr ≈ (1/α) [1 − Wii   Wei; Wie   1 − Wee] B δv.    (3.21)
Substituting in the appropriate values, and solving for the change in δre with respect to δve, we have
δre/δve = (1/α)(1 + βiWii)βe.    (3.22)
This can also be derived from equation (3.18) when all elements from D are set to zero. Because of the monotonicity of the rate transfer function, f, an increase to the excitatory
inputs ve must result in an increase in re, thus α > 0.
With both α > 0 and B positive semi-definite, and all of our weights Wmn ∈ [0, 1] (so positive), the amount of change in equations (3.18) and (3.19) is largely determined by the signs of the derivatives dmn, as explored in section 3.7.
3.8.2 Critical Firing Rate
Given that the dynamics of the network depend on the rates of change of the dynamic synapses, we proceed to characterize the parameters that give the desired characteristics.
As before, let m, n ∈ {exc, inh}. As dmn = dµmn/dr*, we begin by computing dµ/dr from equations (3.1) to (3.4), where we have dropped the star notation and, again, dropped the STP subscripts and used D = τD and F = τF, for convenience. We get:

dµ/dr = U(F − DF²Ur² − 2DFUr − FU − DU) / (DFUr² + DUr + FUr + 1)².    (3.23)
As U is a probability, and thus positive, the sign of the derivative is completely determined by the expression
F − DF²Ur² − 2DFUr − FU − DU,    (3.24)
which is quadratic in r. To find the transition point in equation (3.23), we equate equation (3.24) to zero and solve for r. Recalling that U is bound to (0, 1), we arrive at

rcrit = −1/F + √((1 − U)/(UDF)),    (3.25)
where we have taken the (potentially) positive branch of the solution. We note that equation (3.24) is dominated by a negative quadratic coefficient, so as the steady-state firing rate r grows large, the synapse governed by equation (3.23) is depressive. Thus, for any set of parameters (U, D, F), commonly referred to as UDF, one of two cases can happen.
First, if rcrit, as computed by equation (3.25), is negative, the synapse governed by the UDF parameters will always be depressive. On the other hand, if rcrit is positive, then for a steady-state firing rate r < rcrit, the synapses governed by UDF will be facilitating, whereas for r > rcrit, the UDF synapses will be depressive. In section 3.7, it is with this synaptic characterization that we search the UDF parameter space for STP parameters that give Master-like networks.
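The sign change at rcrit can be verified directly from equation (3.23); a sketch with illustrative UDF values (time constants in seconds, per the bounds of section 3.7):

```python
import math

def dmu_dr(U, D, F, r):
    """Evaluate dmu/dr from eq. (3.23); its sign is set by the numerator."""
    num = U * (F - D * F**2 * U * r**2 - 2 * D * F * U * r - F * U - D * U)
    den = (D * F * U * r**2 + D * U * r + F * U * r + 1.0) ** 2
    return num / den

U, D, F = 0.5, 0.05, 0.1                                  # illustrative parameters
rc = -1.0 / F + math.sqrt((1.0 - U) / (U * D * F))        # eq. (3.25), ~4.14 Hz
print(dmu_dr(U, D, F, rc - 1.0) > 0)  # True: facilitating below r_crit
print(dmu_dr(U, D, F, rc + 1.0) < 0)  # True: depressing above r_crit
```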
3.8.3 Assessing Circuit Layer Correlation
The metric used in section 3.6.3 takes the inner product of two spike train vectors, each convolved with a Gaussian and then normalized, to measure the correlation between the two vectors. Note that this correlation measuring technique is also used in chapter 4.
For neurons n and m, we let x = χ_{t1}^{t2}(n) and y = χ_{t1}^{t2}(m). We define σ = 70 ms and ∆ = 9 ms. We found these values to be a good trade-off in the following algorithm.
Let g be the Gaussian distributed vector, centered at t = 0 with standard deviation σ. Then the resultant convolutions x ∗ g and y ∗ g have Gaussian bumps centered at each of the spike occurrences, or ones, in the original vectors x and y, respectively.
Denote the discrete Fourier transform of a vector v by v̂. Then the Hermitian property of the discrete Fourier transform operator, Parseval's identity, and the convolution theorem are used to compute the metric described above:

m(x, y) = ⟨ (x ∗ g)/‖x ∗ g‖ , (y ∗ g)/‖y ∗ g‖ ⟩    (3.26)
        = ⟨ (x ∗ g)ˆ/‖(x ∗ g)ˆ‖ , (y ∗ g)ˆ/‖(y ∗ g)ˆ‖ ⟩    (3.27)
        = ⟨ (x̂ · ĝ)/‖x̂ · ĝ‖ , (ŷ · ĝ)/‖ŷ · ĝ‖ ⟩,    (3.28)

where the multiplication in equation (3.28) is component-wise, allowing for the efficient computation of m(x, y) for large vectors. Since x, y and g are all positive, m(x, y) has range [0, 1], and larger values indicate a greater correlation between the spike trains.
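A minimal NumPy sketch of this metric, using a wrapped (circular) Gaussian kernel and FFT-based convolution (a simplifying assumption of ours; the boundary handling is not specified in the text):

```python
import numpy as np

def spike_metric(x, y, sigma=70.0):
    """Correlation m(x, y): convolve each {0,1} spike train with a Gaussian,
    normalize, and take the inner product (convolution done in Fourier space)."""
    n = len(x)
    t = np.arange(n)
    # Gaussian kernel centered at t = 0, wrapped so the convolution is circular.
    g = np.exp(-0.5 * (np.minimum(t, n - t) / sigma) ** 2)
    xs = np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(g)))
    ys = np.real(np.fft.ifft(np.fft.fft(y) * np.fft.fft(g)))
    return float(xs @ ys / (np.linalg.norm(xs) * np.linalg.norm(ys)))

x = np.zeros(1000)
x[[100, 400, 800]] = 1
print(round(spike_metric(x, x), 3))  # identical trains give 1.0
```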
3.9 Conclusion
In this chapter, we began by finding synaptic weights that generate self-sustained RAIN activity in the network. This alone is a problem worth studying [VA05, KSA08]. We succeeded in finding the correct balance in the spiking domain. We then proceeded to study signal propagation through the RAIN network, as in [VA05]. However, we found that reverberating signals are prominent, degrading the faithfulness of the signal propagation. It may be possible to tune the network just right so that the reverberating signals are not present; however, such manual tuning is very difficult.
In light of this problem, we considered STP dynamics, building on the work of [STM07], which demonstrates that STP can induce a steady-state firing rate. We found STP parameters that could produce self-sustained RAIN activity, which was actually more fault-tolerant to an improper excitatory/inhibitory balance than static synapses alone. We also found STP parameters that quickly kill networks. Combining networks with both types of dynamics, we proposed a novel coupled Master/Slave network that relies on a symbiotic relationship between the networks, in which Master sets the pace for the coupled system and Slave is leveraged for faithful signal propagation. However, Slave on its own cannot survive.
Finally, we studied the problem of finding STP parameters to induce Master-like network activity. This problem was found to be difficult, yet we were able to find a heuristic, with analytical support, that increases the likelihood of finding such networks by two orders of magnitude over a random search of the parameter space.
More work needs to be done on paring down the STP parameter space, and a deeper analysis of the STP parameters' effects on neural dynamics needs to be conducted. This is a difficult problem, and a general analytical characterization may not be attainable. However, it is likely that some relationship amongst the STP parameters could be found to boost the overall success rate of finding Master-like networks, giving a deeper understanding of the system.
CHAPTER 4
Learning Multiple Signals Through Reinforcement
4.1 Introduction
Reinforcement learning is an approach to trial-and-error learning in which an agent's actions are guided by a class of signals called rewards. Reinforcement learning models are built into agents/systems that can learn from their interaction with their environment. Reinforcement models have an advantage over supervised learning models [RM87] because they obviate the need for a supervisor to provide real-time feedback to the agent. The rewards during reinforcement learning are derived from the environment and provide a sense of value to the agent to guide learning during agent-environment interactions. Typically the reward appears after the cues and actions that correspond to it; this is known as the distal reward problem [Hul43, Izh07b] and, in the reinforcement learning community, as the credit assignment problem [SB98]. Ultimately, the goal of designing such systems is to produce autonomous, self-programming systems that achieve their goals in a flexible and reliable manner.
Most computational approaches to modeling reinforcement learning have focused on the “temporal difference” algorithm [SB98, HFO10], which computes the expected reward using an explicit account of temporal discounting [SB98]. In this chapter, the focus is on developing a biologically plausible approach to modeling the distal reward problem using spiking neural models. This is because the primary mode of communication between neurons in the brain
This chapter is joint work with Narayan Srinivasa and will appear in Neural Computation.
is encoded in the form of impulses, action potentials or spikes. This mode of communication enables the brain, composed of billions of neural cells, to consume less than 20 W of power [Len03, AL01]. A solution to the distal reward problem in the spiking domain would thus be a very efficient solution.
Reward signaling in the mammalian brain has been linked to the dopamine system [SR90, LAS91]. A model linking STDP and dopamine signaling, known as reward-modulated STDP (R-STDP), was developed in both [Izh07b] and [Flo07]. In R-STDP, the synapses are evolved by STDP and modulated by a global reward signal such as dopamine. Despite the success of R-STDP, [FSG10] demonstrate that R-STDP cannot learn multiple reinforcement tasks simultaneously. In this chapter, we extend R-STDP to solve the problem of simultaneously learning multiple distal rewards.
4.2 Distal Reward Problem
In Pavlovian conditioning experiments, an agent learns to associate certain cues with resultant rewards or punishments. This is reinforcement learning because the learning is derived from the reward (or punishment) administered following the cue. In the context of spiking neural networks, the spiking sequence that is associated with either a reward or punishment is referred to hereinafter as an r-pattern. Furthermore, the term reward will be used to mean either reward or punishment, since both can be used in reinforcement learning. Continuing with this terminology, in Pavlovian learning, reward lags the r-pattern by seconds, yet the reward still yields effective learning [Pav27, Hul43, HDB95, Sch98, DA01]. The delay between the r-pattern and reward is precisely the reason reinforcement learning is such a powerful tool: it allows for hindsight evaluation of the agent-environment interactions, which the agent can then incorporate into behavior modification. However, this delay also poses difficult questions. Since reward lags the r-pattern, the r-pattern is no longer present when the reward is available to aid in learning, which, in spiking neural networks, takes the form of synaptic strength modification.
The second observation is that the rest of the network continues to spike during the delay between the r-pattern and the system uptake of reward. Thus, if the reward is truly to enhance the r-pattern, making it more likely to appear in the future, how does the reward “pick out” the particular spiking pattern which induced the reward? For instance, consider the situation where a dog is told to sit. Suppose the dog then performs two nearly simultaneous actions, such as shaking its head and sitting. The dog is then of course given a treat for sitting. However, how does the dog “know” that the action of sitting was rewarded, and not that of shaking its head? The key, of course, is in repetition, but this is on the macroscopic/behavioral level. It is interesting to see the corresponding correlates at the cellular level. This problem of reinforcing a specific r-pattern over other spiking patterns in the network is called the “distal reward problem” [Hul43], or the “credit assignment problem” [Min61, BSA83, HDB95, SB98, DA01, WP05].
As discussed in the introduction, [Flo07] and [Izh07b] solved the distal reward problem for a single r-pattern, in the context of spiking neural networks, using reward-modulated STDP (R-STDP). In this chapter, we extend R-STDP to enable a spiking neural network to learn multiple r-patterns, as outlined in the following sections.
4.3 Methods
We use the LIF neuron model described in section 2.1.2, with the network parameters listed in table 4.1. In this chapter, we build on the STDP plasticity model from section 2.2.1. Here
A+ and A− correspond to changes of +0.5% and −0.65% of the maximum synaptic strength
possible, respectively, and the time constants τ+ and τ− determine the effective time windows
for potentiation and depression, respectively. In most simulations, β = A−τ−/A+τ+ = 1.3.
When changing β in this chapter, A+, τ+ and τ− are fixed while A− is varied. Note that gmax = 15 nS, chosen such that a pre-synaptic spike across a fully potentiated synapse is not strong enough, by itself, to elicit a post-synaptic spike from resting potential.
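The quoted ratio follows from the parameter values above; a one-line check:

```python
# STDP parameters from table 4.1.
A_plus, A_minus = 0.005, 0.0065
tau_plus, tau_minus = 20.0, 20.0  # ms

beta = (A_minus * tau_minus) / (A_plus * tau_plus)
print(round(beta, 2))  # 1.3
```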
All Networks
Cm = 200 pF         gleak = 10 nS
Einh = −80 mV       Eexc = 0 mV
Vthresh = −54 mV    Vreset = −60 mV
Erest = −74 mV      fanout = 150
τexc = 5 ms         τinh = 15 ms
A+ = 0.005          A− = 0.0065
τ+ = 20 ms          τ− = 20 ms

Table 4.1: Network parameters used in this chapter.
Section 4.3.1 defines R-STDP, which expands STDP for use in reinforcement learning. In section 4.3.2 a new learning rule is developed, called ARG-STDP, which improves the reinforcement model, enabling the learning of multiple distal rewards.
We also employ STP, described in section 2.2.2, to stabilize network dynamics. When STP is used, we use Aij = 2.03 (equation (3.5)), τD = 50 ms, τF = 20 ms, and U = 0.5.
4.3.1 Reward Modulated STDP
Reward modulation can be used in a straightforward way to extend any Hebbian unsupervised learning rule [FSG10]. In this chapter, R-STDP is used, which results from reward modulation of the STDP rule [Flo07, Izh07b]. To extend STDP with reward modulation, a global, extracellular modulator, such as dopamine, is assumed to exist. Letting R(t) denote the extracellular dopamine concentration, the plasticity equations are as in section 2.2.1, except that equation (2.17) is replaced with:
Ẇij(t) = α · R(t) · Eij(t)    (4.1)
Ėij(t) = −Eij(t)/τE + Pij(t)Xi(t) − Dij(t)Xj(t − ∆j),    (4.2)

where α controls the learning speed of the system. In all simulations in this chapter, α = 12.
The value τE = 1000 ms is the time constant governing the eligibility trace, Eij, which tracks the potential contributions to the synaptic weight change from the potentiation trace and the depression trace. These are potential weight changes because the weight will not be affected by Eij unless the system is gated on by the global reward modulator R(t). In general, R(t) can be punishing as well as rewarding, depending on its sign. Likewise, Eij can be positive or negative, determined by the temporal ordering of the spikes between neuron i and neuron j [Flo07, Izh07b]. The Eij are initialized to zero and essentially track the underlying STDP rule, with the exception that they are not confined to be within the interval [0, 1], and thus can be negative when a spike in neuron i precedes a spike in neuron j. Though in general R(t) can be positive or negative, the focus in this chapter will be on the positive feedback R(t) ≥ 0, without any loss of generality.
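As an illustrative sketch (not the thesis's simulation code), equations (4.1)-(4.2) can be written as a single forward-Euler step. The potentiation and depression traces P and D, and the spike indicators x_i and x_j, are assumed to be supplied by the surrounding STDP machinery; times are in ms.

```python
def r_stdp_step(W, E, P, D, x_i, x_j, R, alpha=12.0, tau_E=1000.0, dt=1.0):
    """One Euler step of the R-STDP dynamics in equations (4.1)-(4.2)."""
    # Eligibility trace: decays with time constant tau_E and accumulates
    # Hebbian coincidences from the potentiation/depression traces.
    dE = -E / tau_E + P * x_i - D * x_j
    E = E + dt * dE
    # The weight changes only when the global reward signal R gates it on.
    W = W + dt * alpha * R * E
    return W, E
```

With R = 0 the weight is untouched while the eligibility trace decays; with R > 0 the accumulated trace is converted into a weight change.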
4.3.2 R-STDP with Attenuated Reward Gating
We assume a mechanism that slowly reduces the amount of "reward" released into the system as the preference for the correct firing sequence becomes stronger. That is, we use a separate attenuating success signal for each of the N reward channels. Specifically, for each signal channel k ∈ [1, ..., N], a separate reward predictor, Rk, is introduced. This reward predictor is initially set to zero, and slowly tracks toward the reward associated with each successful presentation of signal k. The function
    Sk = S(R, k, t) = { R(t) − Rk   if r-pattern k induces a reward,
                      { 0           otherwise,                           (4.3)
Figure 4.1: System reward R, reward tracker Rk and success signal Sk for reward channel k are plotted. The time constant τR controls the rate of convergence of Rk → R. The independent axis is discrete and denotes the number of times success signal k is presented. Though the domain is discrete, interpolation is used to emphasize the trend.
is called the success signal, which is broadcast to the system after each reward presentation. The success signal is a measure of the reward prediction error, which converges to zero as Rk → R(t) (see figure 4.1). To enable individualized attention for each task, Σk Sk is used in place of R(t) in equation (4.1):
    dWij/dt = ( Σ_{k=1}^{N} Sk ) · Eij(t).                               (4.4)

To ensure that the reward prediction error converges to zero individually for each reward channel, we use the following simple update rule:
    Rk ← Rk + (R − Rk)/τR,                                               (4.5)
where τR = 500. The constant τR determines the speed of convergence for Rk → R(t). In this case, convergence speed is relative to the number of presentations of success signal k.
For a given τR, after τR occurrences of signal k, Rk ≈ 0.63 · R, whereas after 2 · τR presentations of success signal k, Rk ≈ 0.86 · R.
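These convergence figures can be checked numerically. A minimal sketch of the update in equation (4.5), assuming a constant unit reward R = 1 per presentation:

```python
def track_reward(R, tau_R, n_presentations):
    """Apply the reward-predictor update R_k <- R_k + (R - R_k)/tau_R."""
    Rk = 0.0
    for _ in range(n_presentations):
        Rk += (R - Rk) / tau_R
    return Rk
```

With τR = 500, `track_reward(1.0, 500, 500)` gives about 0.63 and `track_reward(1.0, 500, 1000)` about 0.86, the discrete analogue of 1 − e^(−n/τR).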
In order to formally define R(t), let T_k^r be the list of r-pattern presentation times for channel k. For each t_k^r ∈ T_k^r, let d_k^r be a delay uniformly selected from (τE/2, 3τE/2). Let p denote the probability that an r-pattern induces a reward in channel k. With the exception of section 4.6.2, p = 1.0. Finally, let X be a random variable uniformly distributed on (0, 1), and define the indicator function P(X) to be 1 if X < p and zero otherwise. Then R(t) = R(t, X) can be specified as:
    R(t) = Σ_k Σ_{t_k^r ∈ T_k^r} δ(t − (t_k^r + d_k^r)) · P(X),          (4.6)
Equation (4.6) implies that R(t) = 1, with probability p, whenever a delayed reward (induced by an r-pattern) is presented to the system, and zero otherwise. This learning algorithm is referred to hereinafter as attenuated-reward-gating of STDP, or ARG-STDP.
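The reward process of equation (4.6) can be sketched as an event scheduler: each r-pattern occurrence on channel k schedules a unit reward after a delay drawn uniformly from (τE/2, 3τE/2), delivered with probability p. The function name and event-list representation are illustrative choices; times are in ms.

```python
import random

def schedule_rewards(pattern_times_by_channel, tau_E=1000.0, p=1.0, rng=None):
    """Return (channel, delivery_time) events standing in for the deltas in (4.6)."""
    rng = rng or random.Random(0)
    rewards = []
    for k, times in pattern_times_by_channel.items():
        for t in times:
            d = rng.uniform(0.5 * tau_E, 1.5 * tau_E)  # delay in (tau_E/2, 3*tau_E/2)
            if rng.random() < p:                       # reward with probability p
                rewards.append((k, t + d))
    return sorted(rewards, key=lambda e: e[1])
```

For example, an r-pattern at t = 0 on channel 0 yields one reward event somewhere in (500, 1500) ms when p = 1, and no event when p = 0.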
4.4 Single Synapse Reinforcement Experiment
In [Izh07b], a simple problem is introduced to demonstrate the learning of a single distal reward using R-STDP. We describe this experiment here, and then extend it in section 4.5 to address the problem of learning multiple distal rewards.
The experimental setup is as follows. The network consists of 1000 leaky integrate-and-fire neurons (800 excitatory, 200 inhibitory), as in figure 4.2. Network connectivity is defined to be the probability that neuron j is connected to neuron i with a synapse. Unless otherwise indicated, all of the experiments in this chapter use a network connectivity of 1.5%, giving a network fan-out of 15 (each pre-synaptic neuron is connected to 15 post-synaptic neurons, on average). This yields 15,000 total network synapses, on average. The network is stimulated with current, through a Poisson process, to produce Poisson-like spike trains while the average network firing rate is maintained around 1 Hz.
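The random connectivity described above can be sketched as follows. This is a simplified illustration (the actual network also distinguishes excitatory and inhibitory populations and drives the neurons with Poisson input):

```python
import random

def build_connectivity(n_neurons=1000, p_connect=0.015, rng=None):
    """Connect each ordered neuron pair independently with probability p_connect."""
    rng = rng or random.Random(1)
    synapses = []  # (pre, post) pairs
    for pre in range(n_neurons):
        for post in range(n_neurons):
            if pre != post and rng.random() < p_connect:
                synapses.append((pre, post))
    return synapses
```

With 1000 neurons and p = 0.015 this yields about 1000 · 999 · 0.015 ≈ 15,000 synapses, i.e. a fan-out of roughly 15.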
From the network, an excitatory neuron, j1, is selected at random and labeled Pre1. Likewise, one of its post-synaptic neurons, i1, is selected at random and denoted Post1. The synapse from Pre1 to Post1 is denoted Syn1, with synaptic weight W_{i1 j1}. The weight W_{i1 j1} is set to zero, while the other network synaptic weights are initialized to a constant value of 0.3 (0.8 for inhibitory synapses). In this base experiment, excitatory synapses are modified according to R-STDP, while inhibitory synapses are static. The R-STDP rule depends on a global reward R(t); the following paragraphs describe how R(t) is generated.
Figure 4.2: Network configuration diagram. There are 1000 neurons, with 800 excitatory and 200 inhibitory, and 1.5% network connectivity. The blue arrows indicate excitatory connections, and the red arrows indicate inhibitory connections. In addition, N pre-synaptic neurons are chosen at random and denoted Prek for k ∈ [1, 2, ..., N]. For each pre-synaptic neuron Prek, a random post-synaptic neuron is chosen from its fan-out pool and denoted Postk. The synaptic weights between each Prek and Postk are set to zero, whereas the rest of the synaptic strengths are set either to 0.3 (for excitatory synapses) or 0.8 (for inhibitory synapses). In addition, for each neuron pair k, a separate reward channel is introduced, represented by a VTAk (ventral tegmental area) neuron that releases a global reward or success signal, represented by the green arrow.
Label a connected neuron pair (j, i) as (Pre, Post), with synapse Syn. In general, if a spike in Pre precedes a spike in Post by a small window, then due to the Hebbian nature of the eligibility trace, Eij experiences a sharp increase. If Eij > 0 after the increase, Syn is eligible for potentiation upon gating from the reward signal. In the opposite case, where a spike in Post precedes a spike in Pre by a small window, Eij experiences a sharp decrease, and if Eij < 0, Syn is eligible for depression. Specifically, if Pre = Pre1 spikes at time t and Post = Post1 spikes at time t′, then a coincident (anti-coincident) spike pair is defined as one in which 0 < t′ − t ≤ 10 ms (0 < t − t′ ≤ 10 ms); that is, the small window must be 10 ms or less. With this terminology, and the fact that each neuron spikes at 1 Hz, a coincident spike pair is expected about once every 100 seconds, on average. It is the coincident spike pair that we define to be the r-pattern that is to be learned by the network.
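The spike-pair classification above can be sketched directly. The expected-rate claim follows because, with independent 1 Hz Poisson firing, each pre-synaptic spike is followed by a post-synaptic spike within the 10 ms window with probability about 0.01, giving roughly one coincident pair per 100 seconds. Times are in ms.

```python
def classify_pair(t_pre, t_post, window=10.0):
    """Classify a (pre, post) spike pair by the 10 ms coincidence rule."""
    if 0 < t_post - t_pre <= window:
        return "coincident"        # post follows pre: the r-pattern
    if 0 < t_pre - t_post <= window:
        return "anti-coincident"   # pre follows post
    return "neither"
```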
To facilitate synaptic potentiation and depression, the network is also complemented by a neuron from the ventral tegmental area [GTJ00], denoted VTA1. The VTA1 neuron's purpose is to release extracellular dopamine (DA) into the system. Upon the occurrence of an r-pattern at time t*, a delay d is selected uniformly at random from (τE/2, 3τE/2), and the VTA1 neuron is stimulated at time t* + d, inducing a DA release. The evolution of the strength of Syn1 is shown in figure 4.3, along with a histogram of the final network synaptic conductance distribution. By the end of the simulation, Syn1 is fully potentiated, yet the rest of the network synapses are maintained at less than half strength, with many of them depressed very near zero (aside from the inhibitory synaptic conductances, which are not plastic). Stronger depression (A− > A+) skews the final distribution towards zero. Note that because the parameters were selected so that a pre-synaptic spike across a fully potentiated synapse does not, by itself, cause a post-synaptic spike, the firing rate of Post1 is negligibly affected by learning in this experiment. The success of R-STDP in this setting is explained in [Izh07b].
4.5 Generalization to Multiple Synapse Learning
In this section, we describe an experiment for learning across multiple distal reward channels.
For k ∈ [1, ..., N], we select an excitatory neuron at random and label it Prek. For each Prek, we select an excitatory neuron, Postk, from Prek's fan-out pool. The synapse connecting Prek to Postk is denoted Synk. As above, the initial weight for each of these synapses is set to zero, while the rest of the network synapses are initialized as described previously. The same experiment is performed, but now the system is rewarded for an r-pattern from any of the (Prek, Postk) pairs. The results for N = 2 are shown in figure 4.3. As predicted in [FSG10], the system cannot learn more than one r-pattern with R-STDP. This is due to the bias of the rule: while channel one broadcasts a distal reward, Syn2 is just as often in an anti-correlated state as in a correlated state (at least in the early phases of learning). So, when the delayed reward is presented to the system, Syn2 will be depressed half of the time and potentiated half of the time. However, depression is stronger than potentiation, so this has a detrimental effect on any gains Syn2 may have made. Syn1 faces the same learning obstacles as Syn2 and, as evident from figure 4.3c, neither of them is stably potentiated.
In an effort to solve this generalized problem, two approaches are considered. First, in section 4.5.1, STP is examined as a solution because of its stabilizing effects on a network [STM07]. Second, in section 4.5.2, a reward attenuation scheme is adopted, motivated as follows: if every successful learning effort undertaken by a system maximized the extracellular reward, then the average extracellular modulator present in the system would trend upward, making gains with each newly learned task. This is an unrealistic assumption given the extended lifetime of a system and the many tasks learned in that lifetime. Instead, a mechanism that slowly reduces the amount of "reward" released into the system is assumed. This is a more realistic assumption, and leads to task-specific reward release, as demonstrated in this chapter. Given individualized attention to each task, it is anticipated that a network will more ably learn multiple tasks. The concept of an attenuated reward signal has been
considered before [FSG10, US09]. In fact, [FSG10] argues that in order to learn multiple signals, the broadcast reward has to vanish, on average, for each channel individually.

Figure 4.3: Synaptic learning under R-STDP. a) & c) Evolution of the synaptic weight for the 1-synapse and 2-synapse learning experiments, respectively, for a duration of 10,000 seconds. Each color represents a unique synapse. b) & d) Conductance histogram showing the final network conductance distribution for the 1-synapse and 2-synapse learning experiments (in log scale), respectively. Synaptic strength is measured in units of gmax, where 1.0 is fully potentiated. The synapses at 0.8 are inhibitory synapses, which are held static (red).
In the following sections, it is necessary to benchmark the network learning performance. Thus, it is assumed that the network has successfully learned the signals when 90% of the synapses in question have been stably potentiated to at least the middle of the synaptic conductance range.
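This success criterion can be sketched as a simple predicate. Weights are expressed in units of gmax, so the middle of the conductance range is 0.5:

```python
def learning_succeeded(target_weights, threshold=0.5, fraction=0.9):
    """True when at least 90% of the targeted synapses sit above mid-range."""
    potentiated = sum(1 for w in target_weights if w >= threshold)
    return potentiated >= fraction * len(target_weights)
```

For example, 9 of 10 target synapses at 0.9 passes the benchmark, while 8 of 10 fails it.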
58 4.5.1 R-STDP with STP Learns Multiple r-Patterns
In this section, STP is employed (see section 2.2.2 for details) to learn multiple r-patterns. Using STP, at least 20 r-patterns can be successfully learned according to the metric in the previous section, as shown in figure 4.4. However, with the chosen parameters, R-STDP augmented with STP could not learn 25 r-patterns.
Figure 4.4: Synaptic learning under R-STDP with STP. a) & c) Evolution of the synaptic weight for the 20-synapse and 25-synapse learning experiments, respectively, for a duration of 100,000 seconds. Each color represents a unique synapse. b) & d) Conductance histogram showing the final network conductance distribution for the 20-synapse and 25-synapse learning experiments (in log scale), respectively. Synaptic strength is measured in units of gmax, where 1.0 is fully potentiated. The synapses at 0.8 are inhibitory synapses, which are held static (red).
The fact that the biased learning rule, R-STDP, could learn more than one r-pattern seems to be at odds with the conclusions of [FSG10], in which it is argued that a biased learning rule cannot learn multiple r-patterns. However, the conclusions of [FSG10] are based on the average synaptic weight change:

    ⟨∆Wij⟩ = ⟨S(R) Eij⟩ = Cov(S(R), Eij) + ⟨S(R)⟩⟨Eij⟩.                   (4.7)

In [FSG10], equation (4.7) is used to argue that while the covariance term is useful for learning a network-wide r-pattern, the ⟨S(R)⟩⟨Eij⟩ term must be suppressed, as it ignores correlations between the reward and the stimulus and thus detracts from the network's learning of the r-pattern. The authors then argue that learning multiple r-patterns requires either an unbiased learning rule, where ⟨Eij⟩ = 0, or a system that employs a success signal which vanishes on average. In biology, for a given stimulus, it is realistic that only a small subset of the brain is responsible for the correct reaction. Thus, average weight dynamics that reflect the state of the entire network are not sufficient to capture the effects of local changes due to learning, as evidenced by figure 4.4, in which 20 r-patterns are successfully learned using a biased rule. In contrast to the networks studied in this chapter, the networks studied in [FSG10] were small, non-recurrent, strictly feed-forward networks in which every neuron was partly responsible for producing the correct dynamics. Thus, in [FSG10] the average synaptic change was crucial to successfully learning multiple r-patterns.
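The role of the ⟨S(R)⟩⟨Eij⟩ term can be illustrated numerically. With synthetic samples (purely hypothetical values) in which the success signal is statistically independent of the eligibility trace, the covariance term is near zero, so the entire average weight change comes from the bias term:

```python
import random

rng = random.Random(0)
# Success signal with nonzero mean (~0.5), independent of the trace:
S = [0.5 + rng.gauss(0, 1) for _ in range(100000)]
# Biased eligibility trace with nonzero mean (~ -0.2):
E = [-0.2 + rng.gauss(0, 1) for _ in range(100000)]

def mean(xs):
    return sum(xs) / len(xs)

avg_dW = mean([s * e for s, e in zip(S, E)])  # <S E>, the average weight change
bias_term = mean(S) * mean(E)                 # <S><E>
# Since Cov(S, E) ~ 0 here, avg_dW ~ bias_term ~ -0.1: the drift is driven
# entirely by the bias, not by any reward-trace correlation.
```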
4.5.2 ARG-STDP Learns Multiple r-Patterns
Ideally, the mechanism for gating the success signals would be an online biological mechanism, such as a spiking critic model [PMD09]. For simplicity, we employ an algorithmic approach in this chapter to test the effectiveness of the mechanism. It is assumed that a separate reward channel exists for each r-pattern, gated by the dopaminergic neuron VTAk for each k. A reward is said to come across channel k if VTAk sends the global attenuated success signal, Sk, which gates network synaptic plasticity. ARG-STDP and its details are presented in section 4.3.2.
Using ARG-STDP with the parameters selected, it is possible to learn up to 16 r-patterns (see figure 4.5). When learning from 17 distal reward channels, synaptic learning becomes unstable, causing network synapses to be potentiated or depressed haphazardly, including those that have been targeted for learning. The underlying STDP rule employed in this chapter converges to a bimodal weight distribution in the stable learning paradigm [SMA00, RBT00]. While attempting to learn 17 synapses, however, the network conductance histogram is spread across the possible dynamic range, as shown in figure 4.5d, indicating network instability. Figure 4.5c shows the effects of network instability on learning.
4.5.3 STP Stabilizes ARG-STDP Network Learning Dynamics
Figure 4.6c and figure 4.6d show a positive feedback loop resulting from the interaction between the excitatory neuron group, E, and the neurons Post = {Postk | k ∈ [1, ..., N]}. The synapses from Pre to Post (where Pre is defined similarly to Post) are strengthened, resulting in an increased average firing rate for the Post group. The increase in firing rate for Post then results in an average synaptic weight gain from E to Post. This further increases the firing rate of the Post group, leading to a positive feedback loop of uncontrolled firing in Post and synaptic potentiation from E to Post. This perturbation is enough to disrupt the network dynamics, causing an increase in network firing (in figure 4.6c, the firing rate of the entire network erupts to almost 40 Hz) and unstable learning dynamics, as demonstrated in figure 4.5c and figure 4.6d.
We employ STP (section 2.2.2) to stabilize the network spiking dynamics, regularizing the firing rate. As shown in figure 4.6e, all neuron pools fire around 1 Hz. STP eliminates the fateful rise in the average firing rate of the Post population seen in figure 4.6c. This in turn eliminates the positive feedback loop that resulted in both unstable firing rates and unstable learning. Figure 4.6f demonstrates improved learning, in which only the intended synaptic pool (Pre to Post) is potentiated, while the rest of the network synapses remain close to zero, on average.
Figure 4.5: Synaptic learning under ARG-STDP. a) & c) Evolution of the synaptic weight for the 16-synapse and 17-synapse learning experiments, respectively, for a duration of 30,000 seconds. Each color represents a unique synapse. b) & d) Conductance histogram showing the final network conductance distribution for the 16-synapse and 17-synapse learning experiments (in log scale), respectively. Synaptic strength is measured in units of gmax, where 1.0 is fully potentiated. The synapses at 0.8 are inhibitory synapses, which are held static (red).
In order to ensure that the effective synaptic weights generated by STP are not significantly different from the static weights used during the onset of learning with respect to short-term dynamics, the effective weights were normalized with respect to steady-state firing at 1 Hz (section 3.4), so that at an average firing rate of 1 Hz the effective weights coincide with the static weights. STP and ARG-STDP were then turned on only after the steady-state dynamics were achieved. This has no qualitative effect on the experiment: the learning was the same whether STP and ARG-STDP were on from the beginning of the simulation or turned on after 100 seconds had elapsed.
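A sketch of this normalization, using the common rate-based steady-state approximation of the Tsodyks-Markram STP model (an assumption here; the thesis's exact normalization is defined in section 3.4). At Poisson rate r, facilitation and depression settle to u_ss = U(1 + rτF)/(1 + U rτF) and x_ss = 1/(1 + u_ss rτD), and the effective weight is A · u_ss · x_ss · w.

```python
def stp_steady_state(r, U=0.5, tau_F=0.020, tau_D=0.050):
    """Rate-based steady-state facilitation/depression variables at rate r (Hz)."""
    u_ss = U * (1 + r * tau_F) / (1 + U * r * tau_F)
    x_ss = 1.0 / (1 + u_ss * r * tau_D)
    return u_ss, x_ss

def normalize_static_weight(w_static, r=1.0, A=2.03, **stp_params):
    """Scale w so that A * u_ss * x_ss * w matches w_static at rate r."""
    u_ss, x_ss = stp_steady_state(r, **stp_params)
    return w_static / (A * u_ss * x_ss)
```

Interestingly, with the chapter's parameters (U = 0.5, τF = 20 ms, τD = 50 ms, A = 2.03), the factor A · u_ss · x_ss at 1 Hz is very close to 1 under this approximation, so the normalized weights barely differ from the static ones.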
Figure 4.6: Analysis of average synaptic growth and firing rates. The neuron pools are E, I, Pre, Post, indicating the excitatory, inhibitory, Prek, and Postk neuron pools. a) & c) & e) The average firing rates of each pool of neurons for the 16-synapse, 17-synapse, and 17-synapse with STP learning experiments, respectively. The inset in (c) shows the detrimental rise in the average firing rate of Post. b) & d) & f) The average synaptic strengths between the different neuron groups for the 16-synapse, 17-synapse, and 17-synapse with STP learning experiments, respectively, measured in units of gmax.
Using STP, a network can learn at least 30 r-patterns, but not more than 40 (see figure 4.8), with the particular choice of parameters used. Specifically, τR can influence how many distal rewards can be learned. Increasing this value enables a greater number of reward presentations per channel to influence learning, but with the tradeoff of an increased amount of time required to fully learn the r-patterns. One important property to note is that under the influence of STP, even when the network fails to learn all the synapses, learning remains stable. That is, the network never enters the chaotic state shown in the 17-synapse learning experiment, but instead shares the stable learning dynamics of networks using only R-STDP with STP (compare (c) and (d) from figures 4.4, 4.5 and 4.8).
4.6 Properties of ARG-STDP with STP
In the previous section, it was established that multiple distal rewards can be learned using ARG-STDP augmented by STP. In this section the dynamics of the reward gating framework will be further explored.
4.6.1 Reward Predictive Properties of r-Patterns
So far, it has been shown that multiple synapses can be reinforced in a stable manner without interfering with each other. From the experimental setup, it is clear that the occurrence of the kth r-pattern will be predictive of a reward from the kth reward channel. However, to demonstrate r-pattern independence, it is necessary to demonstrate that the kth r-pattern selectively predicts only rewards from the kth reward channel, and not others. To demonstrate this, a 10-synapse learning experiment was simulated for 100,000 seconds. Though learning converged within the first 10,000 seconds, only the final 10,000 seconds of data were analyzed. For each pair of integers k, ℓ ∈ [1, ..., 10], let d(k, ℓ) be a metric measuring the correlation between the temporal occurrences of the kth r-pattern and rewards from the ℓth reward channel, where larger values of d(k, ℓ) indicate a stronger correlation. The specific metric used here is defined in section 4.7.1 (though other metrics were tested and yielded similar results). Independence in r-pattern learning is achieved when d(k, k) ≫ d(k, ℓ) for k ≠ ℓ. Figure 4.9 shows the values of the metric d for each k and ℓ pair. The values of d on the diagonal are significantly stronger than the values off the diagonal, indicating independence of r-pattern learning, as desired.

Figure 4.7: STP has a stabilizing effect on synaptic learning within the network. a) & b) Depict the evolution of the synaptic weights for a duration of 30,000 seconds and the conductance histogram showing the final network synaptic conductance distribution, respectively, for the 17-synapse learning experiment without STP. c) & d) Depict the evolution of the synaptic weights for a duration of 30,000 seconds and the conductance histogram showing the final network conductance distribution, respectively, for the 17-synapse learning experiment with STP. In (a) and (c), each color represents a unique synapse and the synaptic strengths are measured in units of gmax, where 1.0 is fully potentiated. In (b) and (d), plotted in log scale, the synapses at 0.8 (red) are inhibitory synapses, which are held static.
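A hedged sketch of such a pattern-reward correlation metric (the thesis's actual metric is defined in section 4.7.1; the simple count below of rewards landing in a fixed window after each pattern is an illustrative stand-in). Times are in seconds.

```python
def d_metric(pattern_times, reward_times, window=2.0):
    """Average number of rewards falling within `window` seconds after each pattern."""
    hits = 0
    for t in pattern_times:
        hits += sum(1 for r in reward_times if t < r <= t + window)
    return hits / max(len(pattern_times), 1)
```

Applied to matched channels (k = ℓ), the delayed rewards consistently land inside the window, producing large diagonal values; for mismatched channels the hits are only chance coincidences.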
Figure 4.8: Synaptic learning under ARG-STDP with STP. a) & c) Evolution of the synaptic weight for the 30-synapse and 40-synapse learning experiments, respectively, for a duration of 100,000 seconds. Each color represents a unique synapse. b) & d) Conductance histogram showing the final network conductance distribution for the 30-synapse and 40-synapse learning experiments (in log scale), respectively. Synaptic strength is measured in units of gmax, where 1.0 is fully potentiated. The synapses at 0.8 (red) are inhibitory synapses, which are held static.
4.6.2 Learning Robustness to Reward Release Probability
To test robustness, a probabilistic reward release p is used to control the probability that a reward is released given the occurrence of an r-pattern over any reward channel. The experiments considered thus far were based on p = 100%. We explored the learning behavior as p → 0, and the network’s performance was found to be robust to inconsistent reward. In fact, using the learning metric defined in section 4.5, a network can learn 30 r-patterns,
but not 40, with p as low as 15%. The specific results for a network with a reduced p value are omitted, as they look very similar to figure 4.8. In figure 4.10, the network's learning capacity is plotted as a function of p.

Figure 4.9: Heat map depicting the values of the correlation d(k, ℓ) between the kth r-patterns and the rewards released from the ℓth reward channel, where k, ℓ ∈ [1, ..., 10].
Note that due to this robustness to the reward release probability, it can be concluded that ARG-STDP with STP will still learn successfully when the reward delays are picked from another properly chosen distribution, such as an exponential distribution, tuned so that the delayed reward falls within the bounds prescribed here at least 15% of the time. That is, consider a reward across channel ℓ that falls outside of the delay interval considered in this chapter, due to a different choice of delay distribution. If Synℓ is still eligible for potentiation, it will be rewarded as usual. If the delay is significantly longer than the potentiation window, Synℓ will not be able to "distinguish" the reward from that of another channel k ≠ ℓ, and the effects of the "misplaced" channel ℓ reward on synapse ℓ will be no more significant than those of a channel k reward.
Figure 4.10: The network learning capacity is plotted as a function of p. The data points indicate verified learning, whereas the error bars correspond to simulations that were conducted with a granularity of 10 r-patterns. Thus, the error bars are one-sided with a length of 9.
4.6.3 Learning Robustness to Reward Ordering
To test the network's learning dependency on the specific order of rewards, 50 experiments were conducted, each with a different random number generator seed. This produced a different reward schedule for each of the 50 experiments. Using the metric defined in section 4.5, each network successfully learned 30 distal rewards, indicating that the network is robust to the specific ordering of the r-pattern presentations.
4.6.4 Network Scaling
We then studied the influence of the network's size on the number of r-patterns that can be learned. Recall that the term connectivity refers to the probability that a single pair of neurons is connected by a synapse, while fan-out refers to the average number of post-synaptic neurons per neuron. With this terminology, the control network (used in all experiments thus far) is one of 1000 neurons, 1.5% connectivity (for a fan-out of 15, with 15,000 total network synapses), and gmax = 15 nS as the maximal conductance value. Two network scaling techniques were considered. First, the network was scaled by adding more neurons while keeping the fan-out constant at 15 (that is, the network connectivity is scaled), which holds the average synaptic input to each neuron constant as the size of the network changes. The second scaling method is one in which the connectivity remains constant at 1.5%, but gmax is scaled, again in order to keep the average synaptic input to each neuron constant. Using both of these techniques, networks of 2,000, 5,000 and 10,000 neurons were simulated. Surprisingly, these networks could learn 30 or more r-patterns, but fewer than 40. The results for these experiments are omitted for brevity, as they look similar to figure 4.8, with the caveat that the conductance histograms contain many more synapses, due to network size; the overall distribution trend, however, looks similar.
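The two scaling techniques can be sketched as simple parameter maps; the function names are illustrative, and the baseline values are the control network's (1000 neurons, 1.5% connectivity, gmax = 15 nS):

```python
def scale_by_connectivity(n_neurons, fanout=15):
    """Method 1: hold fan-out fixed; connection probability shrinks with size."""
    return fanout / (n_neurons - 1)

def scale_by_gmax(n_neurons, base_n=1000, base_gmax=15.0):
    """Method 2: hold connectivity at 1.5%; since fan-out then grows linearly
    with network size, gmax shrinks by the same factor to keep the average
    synaptic input per neuron constant. Returns gmax in nS."""
    return base_gmax * base_n / n_neurons
```

For a 10,000-neuron network, method 1 gives a connection probability of about 0.15%, while method 2 keeps 1.5% connectivity but reduces gmax to 1.5 nS.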
As increasing the size of the network did not allow for the learning of more distal rewards, it is apparent that network scale has very little to do with the learning capacity of a network under ARG-STDP. To test this claim, the networks were reduced in scale using the same two techniques as above. The same capacity for learning was found, even for sizes as small as 100 neurons, supporting the claim of independence between network size and learning capacity. This claim is further explored in the next section.
4.6.5 The Reward Scheduling Problem
Since network learning capacity (somewhere between N = 30 and N = 40) is largely independent of network size, the hypothesis is that the networks reach a reward-scheduling temporal density threshold that prevents learning more r-patterns. If τE = 1 s, global rewards delayed up to two seconds after a coincident (or anti-coincident) spike pair still have an efficacy of 13.5% of the maximum. This is significant, considering that success signals are broadcast with a delay of between 500 ms and 1500 ms, which corresponds to between 61% and 22% efficacy, respectively. For simplicity, assume that a success signal will have a significant influence on synaptic plasticity with respect to spike events occurring two seconds prior to it, and refer to this time interval as a reward-gated interval, or RGI (see figure 4.11). Since each neuron spikes at 1 Hz on average, any pair of connected neurons will have a coincident spike pair once every hundred seconds, on average. Thus, for each channel k, a reward injection is expected approximately once every hundred seconds. Now consider a random anti-coincident spike pair (which also happens once every hundred seconds). Then