UCLA Electronic Theses and Dissertations

Title: The Role of Short-Term Synaptic Plasticity in Neural Network Spiking Dynamics and in the Learning of Multiple Distal Rewards

Permalink: https://escholarship.org/uc/item/63r8s0br

Author: O'Brien, Michael John

Publication Date: 2013

Peer reviewed|Thesis/dissertation

University of California Los Angeles

The Role of Short-Term Synaptic Plasticity in Neural Network Spiking Dynamics and in the Learning of Multiple Distal Rewards

A dissertation submitted in partial satisfaction of the requirements for the degree Doctor of Philosophy in Mathematics

by

Michael John O’Brien

2013

© Copyright by Michael John O'Brien 2013

Abstract of the Dissertation

The Role of Short-Term Synaptic Plasticity in Neural Network Spiking Dynamics and in the Learning of Multiple Distal Rewards

by

Michael John O'Brien
Doctor of Philosophy in Mathematics
University of California, Los Angeles, 2013
Professor Chris Anderson, Chair

In this thesis, we assess the role of short-term synaptic plasticity in an artificial neural network constructed to emulate two important brain functions: self-sustained activity and signal propagation. We employ a widely used short-term synaptic plasticity model (STP) in a symbiotic network, in which two subnetworks with differently tuned STP behaviors are weakly coupled. This enables both self-sustained global network activity, generated by one of the subnetworks, and faithful signal propagation within subcircuits of the other subnetwork. Finding the parameters for a properly tuned STP network is difficult. We provide a theoretical argument for a method which boosts the probability of finding the elusive STP parameters by two orders of magnitude, as demonstrated in tests.

We then combine STP with a novel critic-like synaptic learning algorithm, which we call ARG-STDP, for attenuated-reward-gating of STDP. STDP refers to a commonly used long-term synaptic plasticity model called spike-timing dependent plasticity. With ARG-STDP, we are able to learn multiple distal rewards simultaneously, improving on the previous reward modulated STDP (R-STDP), which could learn only a single distal reward. However, we also provide a theoretical upper bound on the number of distal rewards that can be learned using ARG-STDP.

We also consider the problem of simulating large spiking neural networks. We describe an architecture for efficiently simulating such networks. The architecture is suitable for implementation on a cluster of General Purpose Graphics Processing Units (GPGPUs). Novel aspects of the architecture are described, and its performance is benchmarked on a GPGPU cluster. With the advent of inexpensive GPGPU cards and compute power, the described architecture offers an affordable and scalable tool for the design, real-time simulation, and analysis of large scale spiking neural networks.

The dissertation of Michael John O'Brien is approved.

Dean Buonomano
Joseph Teran
Andrea Bertozzi
Chris Anderson, Committee Chair

University of California, Los Angeles 2013

Table of Contents

1 Introduction
1.1 Motivation
1.2 Historical Context
1.3 Thesis Overview
1.4 Chapter Summaries

2 Background: Computational Models for Neural Dynamics and Synaptic Plasticity
2.1 Neuron Models
2.1.1 Hodgkin and Huxley Neurons
2.1.2 Leaky Integrate-and-Fire Neurons
2.1.3 Izhikevich Neurons
2.2 Plasticity Models
2.2.1 Spike Time Dependent Plasticity
2.2.2 Short Term Plasticity

3 Short Term Plasticity Aided Signal Propagation
3.1 Introduction
3.2 RAIN Networks
3.3 Signal Propagation
3.3.1 Circuit Design
3.4 Properties of STP
3.5 STP Conditioned RAIN
3.6 Signal Transmission in Coupled STP Networks
3.6.1 Network Layout
3.6.2 Coupled RAIN Dynamics
3.6.3 Coupled Signal Propagation Dynamics
3.7 Finding Master STP Parameters
3.8 Analysis
3.8.1 Analyzing Firing Rate Changes
3.8.2 Critical Firing Rate
3.8.3 Assessing Circuit Layer Correlation
3.9 Conclusion

4 Learning Multiple Signals Through Reinforcement
4.1 Introduction
4.2 Distal Reward Problem
4.3 Methods
4.3.1 Reward Modulated STDP
4.3.2 R-STDP with Attenuated Reward Gating
4.4 Single Synapse Reinforcement Experiment
4.5 Generalization to Multiple Synapse Learning
4.5.1 R-STDP with STP Learns Multiple r-Patterns
4.5.2 ARG-STDP Learns Multiple r-Patterns
4.5.3 STP Stabilizes ARG-STDP Network Learning Dynamics
4.6 Properties of ARG-STDP with STP
4.6.1 Reward Predictive Properties of r-Patterns
4.6.2 Learning Robustness to Reward Release Probability
4.6.3 Learning Robustness to Reward Ordering
4.6.4 Network Scaling
4.6.5 The Reward Scheduling Problem
4.6.6 Firing Rate Affects Learning Capacity
4.6.7 Eligibility Trace Time Constant Affects Learning Capacity
4.6.8 Interval Learning
4.7 Analysis
4.7.1 Defining the Correlation Metric
4.7.2 Computing the Decaying Eligibility Trace
4.8 Discussion

5 HRL Simulator
5.1 Introduction
5.1.1 GPGPU Programming with CUDA
5.1.2 Spiking Neural Simulators
5.2 Simulator Description
5.2.1 User Network Model Description
5.2.2 Input
5.2.3 Analysis
5.3 Simulator Design
5.3.1 Modular Design
5.3.2 Parallelizing Simulation/Communication
5.3.3 MPI Communication
5.3.4 Simulation
5.4 Performance Evaluation
5.4.1 Large-Scale Neural Model
5.4.2 GPU Performance
5.4.3 CPU Performance
5.4.4 Network Splitting
5.4.5 Memory Consumption
5.5 Discussion
5.6 Conclusion

6 Conclusion

References

List of Figures

3.1 RAIN network configuration. The red arrows indicate inhibitory connections and the blue arrows are excitatory connections.

3.2 The firing rate for the networks tested in the synaptic weight parameter sweep.

3.3 Signal propagation circuit network architecture. A naturally occurring feed-forward circuit is found within a RAIN network. The feed-forward connections are then strengthened, and this is the circuit we observe for signal propagation.

3.4 A) Signal propagation through 5 layers. B) A reverberating signal that is experienced in layer 5, but without inputs to layer 1. C) The average firing rate of the neurons in each layer for the duration of the experiment.

3.5 The dynamic synapses plotted as a function of the presynaptic firing rate. The STP parameters can be chosen to produce a fixed point firing rate. Here, the fixed point is 10 Hz, at which point µ_mn = W_mn, which was already chosen to produce stable RAIN firing.

3.6 A) RAIN activity for 100 of the network neurons. The network parameters are suboptimal, leading to activity that lasts less than 2 seconds. B) RAIN activity for 100 of the network neurons. STP is employed, enabling the network to overcome the faulty choice in network parameters. The activity lasts more than 10 seconds.

3.7 The coupled signal propagation network architecture. Two circuit networks are weakly coupled together. The two networks have the same general neural parameters and configuration statistics, but the STP parameters for each network can be chosen independently, producing different firing dynamics in each network. The left network is referred to as Master, having STP parameters that yield self-sustained network activity. The right network is referred to as Slave, which has STP parameters that allow short excitatory bursts through, then kill network activity.

3.8 A) Slave and Master are uncoupled. Master continues indefinitely whereas Slave dies. B) Slave has projections onto Master. Here Slave dies, as expected, and Master continues indefinitely.

3.9 A) Master has projections onto Slave. This is sufficient to restart Slave whenever Slave dies. B) Slave and Master are mutually coupled. In this case, only Slave received initial inputs, and Master relied on Slave for a jump-start. This demonstrates that Slave has the ability to start Master in the event Master dies. In this configuration, both networks thrive indefinitely.

3.10 An analysis of the coupling required for the connections between Master and Slave. A & B) The average firing rate of Slave and Master, for one second of elapsed time, for different connectivity probabilities. These were performed with a bridge synapse strength of 30 nS. C & D) The average firing rate of Slave and Master, for one second of elapsed time, for different connectivity strengths. These were performed with a synaptic bridge connection probability of 2E-4.

3.11 For any layer k of interest, we construct a binary projection neuron pair. Layer k projects onto the excitatory indicator neuron (blue). The indicator neuron has an excitatory connection to the inhibitory neuron (red) which, in turn, inhibits the indicator neuron to prevent the indicator from being overwhelmed by the circuit layer during a stimulus.

3.12 A & B) Signal propagation through 5 layers for Master and Slave. C & D) A reverberating signal that is experienced in layer 5 of Master, but not in Slave. E & F) The average firing rate of the neurons in each layer for Slave and Master respectively.

4.1 System reward R, reward tracker R_k and success signal S_k for reward channel k are plotted. The time constant τ_R controls the rate of convergence of R_k → R. The independent axis is discrete and denotes the number of times success signal k is presented. Though the domain is discrete, interpolation is used to emphasize the trend.

4.2 Network configuration diagram. There are 1000 neurons, with 800 excitatory and 200 inhibitory and 1.5% network connectivity. The blue arrows indicate excitatory connections, and the red arrows indicate inhibitory connections. In addition, N pre-synaptic neurons are chosen at random and denoted by Pre_k for k ∈ [1, 2, ..., N]. For each pre-synaptic neuron Pre_k, a random post-synaptic neuron is chosen from its fan-out pool, and denoted by Post_k. The synaptic weights between each Pre_k and Post_k are set to zero, whereas the rest of the synaptic strengths are either set to 0.3 (for excitatory synapses) or 0.8 (for inhibitory synapses). In addition, for each of the neuron pairs, k, a separate reward channel is introduced, represented by a VTA_k (ventral tegmental area) neuron that releases a global reward or success signal, represented by the green arrow.

4.3 Synaptic learning under R-STDP. a) & c) Evolution of the synaptic weight for the 1-synapse and 2-synapse learning experiments respectively, for a duration of 10,000 seconds. Each color represents a unique synapse. b) & d) Conductance histogram showing the final network conductance distribution for the 1-synapse and 2-synapse learning experiments (in log scale), respectively. Synaptic strength is measured in units of g_max, where 1.0 is fully potentiated. The synapses at 0.8 are inhibitory synapses, which are held static (red).

4.4 Synaptic learning under R-STDP with STP. a) & c) Evolution of the synaptic weight for the 20-synapse and 25-synapse learning experiments, respectively, for a duration of 100,000 seconds. Each color represents a unique synapse. b) & d) Conductance histogram showing the final network conductance distribution for the 20-synapse and 25-synapse learning experiments (in log scale), respectively. Synaptic strength is measured in units of g_max, where 1.0 is fully potentiated. The synapses at 0.8 are inhibitory synapses, which are held static (red).

4.5 Synaptic learning under ARG-STDP. a) & c) Evolution of the synaptic weight for the 16-synapse and 17-synapse learning experiments, respectively, for a duration of 30,000 seconds. Each color represents a unique synapse. b) & d) Conductance histogram showing the final network conductance distribution for the 16-synapse and 17-synapse learning experiments (in log scale), respectively. Synaptic strength is measured in units of g_max, where 1.0 is fully potentiated. The synapses at 0.8 are inhibitory synapses, which are held static (red).

4.6 Analysis of average synaptic growth and firing rates. The neuron pools are E, I, Pre, Post, indicating the excitatory, inhibitory, the Pre_k, and the Post_k neuron pools. a) & c) & e) The average firing rates of each pool of neurons for the 16-synapse, 17-synapse, and 17-synapse with STP learning experiments, respectively. The inset in (c) shows the detrimental rise in the average firing rate of Post. b) & d) & f) The average synaptic strengths between the different neuron groups for the 16-synapse, 17-synapse, and 17-synapse with STP learning experiments, respectively, measured in units of g_max.

4.7 STP has a stabilizing effect on synaptic learning within the network. a) & b) Depict the evolution of the synaptic weights for a duration of 30,000 seconds and the conductance histogram showing the final network synaptic conductance distribution, respectively, for the 17-synapse learning experiment without STP. c) & d) Depict the evolution of the synaptic weights for a duration of 30,000 seconds and the conductance histogram showing the final network conductance distribution, respectively, for the 17-synapse learning experiment with STP. In (a) and (c), each color represents a unique synapse and the synaptic strengths are measured in units of g_max, where 1.0 is fully potentiated. In (b) and (d), plotted in log-scale, the synapses at 0.8 (red) are inhibitory synapses, which are held static.

4.8 Synaptic learning under ARG-STDP with STP. a) & c) Evolution of the synaptic weight for the 30-synapse and 40-synapse learning experiments, respectively, for a duration of 100,000 seconds. Each color represents a unique synapse. b) & d) Conductance histogram showing the final network conductance distribution for the 30-synapse and 40-synapse learning experiments (in log scale), respectively. Synaptic strength is measured in units of g_max, where 1.0 is fully potentiated. The synapses at 0.8 (red) are inhibitory synapses, which are held static.

4.9 Heat map depicting the values of the correlation d(k, ℓ) between the kth r-patterns and the rewards released from the ℓth reward channel, where k, ℓ ∈ [1, ..., 10].

4.10 The network learning capacity is plotted as a function of p. The data points indicate verified learning whereas the error bars correspond to simulations that were conducted with a granularity of 10 r-patterns. Thus, the error bars are one-sided with a length of 9.

4.11 In ARG-STDP, the reward's effect on the weight gain in a synapse is dependent on the amount of time that passes from the completion of the r-pattern until the presentation of the reward. Here, consider the effects of a reward at time zero on the r_1-pattern, which is within the 2 second RGI, and the r_2-pattern, which is beyond the RGI. Though the length of the RGI is somewhat arbitrarily picked, its effects are clear, and it gives us a benchmark to compare with across experiments.

4.12 The average eligibility trace, ⟨E_ij⟩, as a function of N, the number of reward channels. Network learning decreases as N becomes large. Several examples with various values of N have been simulated, demonstrating the decreasing learning capacity of a network.

4.13 Synaptic learning under ARG-STDP with STP. Here the firing rates of Pre_k and Post_k, for k ∈ [1, ..., N], are reduced to 0.5 Hz, down from 1 Hz in previous experiments. a) Evolution of the synaptic weight for the 120-synapse learning experiment, for a duration of 800,000 seconds. Each color represents a unique synapse. b) Conductance histogram showing the final network conductance distribution for the 120-synapse learning experiment (in log scale). Synaptic strength is measured in units of g_max, where 1.0 is fully potentiated. The synapses at 0.8 (red) are inhibitory synapses, which are held static.

4.14 Synaptic learning under ARG-STDP with STP. Here τ_E = 300 ms, down from 1000 ms in previous experiments. a) Evolution of the synaptic weight for the 100-synapse learning experiment, for a duration of 200,000 seconds. Each color represents a unique synapse. b) Conductance histogram showing the final network conductance distribution for the 100-synapse learning experiment (in log scale). Synaptic strength is measured in units of g_max, where 1.0 is fully potentiated. The synapses at 0.8 (red) are inhibitory synapses, which are held static.

4.15 Interval learning. Each color represents a unique synapse. a) A seesaw pattern emerged in some of the simulations. In this case, the following were used: a reduced spiking rate (0.5 Hz) for the Pre_k and Post_k neurons; two synaptic groups, each of size 30, for 60 total synapses; learning intervals of 100,000 seconds; and β = 1.3. This simulation was run for 600,000 seconds. b) In this experiment the following were used: a reduced spiking rate (0.5 Hz) for the Pre_k and Post_k neurons; two synaptic groups of size 100, for 200 total synapses; learning intervals of 10,000 seconds; and β = 1.1. This simulation was run for 1,200,000 seconds. c) In this experiment the following were used: τ_E = 300 ms; two synaptic groups of size 100, for 200 total synapses; learning intervals of 10,000 seconds; and β = 1.1. This simulation was run for 300,000 seconds.

4.16 Comparison of the decreasing eligibility traces for the standard experiment and the experiments from sections 4.6.6 and 4.6.7.

5.1 The simulator modules of HRLSim, with all the interactions between them, are shown here.

5.2 Flow charts showing how the communication thread is parallelized with the computation thread.

5.3 An example showing how dummy neurons can be used to simplify message passing.

5.4 Dynamic spike packing is compared with the AER approach for simulating a network with 5000 outgoing axons.

5.5 The conversion from a graph representation of a network to a flattened linear array of the same network.

5.6 Graph showing the timing breakdown of a GPU simulation with neuron integration in green, pre-synaptic updates in blue, simple post-synaptic updates in red, and optimized post-synaptic updates in purple.

5.7 (Top) Two 80% excitatory / 20% inhibitory networks connected in a small world fashion. Here 25% means that 25% of all the neurons have outgoing axons that connect to an external network. Since the weight is fixed at zero the exact connectivity of the axons is irrelevant. 1:80, 1:20 and 1:1 indicate a fanout of 80, 20 and 1, respectively. (Bottom) The raster plot of 2000 neurons from one of these networks showing their bursting firing.

5.8 GPU Results, 100,000 neurons per node. (A), (B), and (C) show the runtime distribution, with respect to ||I||, on 4, 16 and 64 nodes (400K, 1,600K, and 6,400K total neurons), respectively. (D) shows the total-time linear regression for plots (A), (B), and (C). (E) shows the histogram of ||I|| over I ∈ I. (F) shows how the runtime scales to network size.

5.9 GPU Results, 20,000 neurons per node. (A), (B), and (C) show the runtime distribution, with respect to ||I||, on 4, 16 and 64 nodes (80K, 320K, and 1,280K total neurons), respectively. (D) shows the total-time linear regression for plots (A), (B), and (C). (E) shows the histogram of ||I|| over I ∈ I. (F) shows how the runtime scales to network size.

5.10 CPU Results, 20,000 neurons per node. (A), (B), and (C) show the runtime distribution, with respect to ||I||, on 16, 64 and 128 nodes (320K, 1,280K, and 2,560K total neurons), respectively. (D) shows the total-time linear regression for plots (A), (B), and (C). (E) shows the histogram of ||I|| over I ∈ I. (F) shows how the runtime scales to network size.

List of Tables

3.1 Network parameters used in this chapter.

3.2 Master and Slave STP parameters used in this chapter.

3.3 The success rate for finding Master-like STP parameters for various regions of the STP parameter domain. Uniformly at random is the first entry, and the most prolific regions defined in section 3.7 follow.

4.1 Network parameters used in this chapter.

5.1 Network parameters used in the benchmarks.

5.2 Memory consumed by HRLSim during simulation for the 20K neuron network topology described in section 5.4.1. The left column reports memory for the 100 fanout networks used in the benchmarks throughout, and the right column reports memory for a more realistic situation of a 10K fanout.

Vita

2005 B.A. (Mathematics, Physics and Computer Science). Claremont McKenna College, Claremont, California.

2006 M.A. (Mathematics). UCLA, Los Angeles, California.

2003 Research Assistant. Reed Institute of Decision Science, Claremont, California.

2004–2005 Engineer Intern. Raytheon, El Segundo, California.

2005–2012 Teaching Assistant. Mathematics Department, UCLA.

2006–2008 Systems Engineer Intern, Aerospace Corporation, El Segundo, California.

2008 Adjunct Professor. Mathematics Department, Claremont McKenna College, Claremont, California.

2011 Adjunct Professor. Mathematics Department, Claremont McKenna College, Claremont, California.

2008–present Research Assistant. Center for Neural and Emergent Systems, Information and System Sciences Department, HRL Laboratories LLC, Malibu, California.

Publications

Fast Douglas-Rachford Splitting Optimization Methods. Michael J. O'Brien and Thomas Goldstein. In preparation.

Using Short Term Plasticity in Symbiotic Coupled Networks to Aid Faithful Signal Propagation. Michael J. O'Brien and Narayan Srinivasa. In preparation.

Efficiently Passing Messages in Distributed Spiking Neural Network Simulation. Corey M. Thibeault, Kirill Minkovich, Michael J. O'Brien, Frederick C. Harris, Jr., and Narayan Srinivasa. In preparation.

HRLSim: A High Performance Spiking Neural Network Simulator for GPGPU Clusters. Kirill Minkovich, Corey M. Thibeault, Michael J. O'Brien, Aleksey Nogin, Youngkwan Cho, and Narayan Srinivasa. IEEE Transactions on Neural Networks and Learning Systems. In review.

A Spiking Neural Model for Stable Reinforcement of Synapses Based on Multiple Distal Rewards. Mike J. O'Brien and Narayan Srinivasa. Neural Computation, 2013, 25(1), 123–156.

Equality in Pollard's Theorem on Addition of Congruence Classes. Eva Nazarevicz, Mike O'Brien, Mike O'Neill and Carolyn Staples. Acta Arith. 127 (2007), 1–15.

CHAPTER 1

Introduction

1.1 Motivation

The brain is by far the most sophisticated computing tool on earth. Though it can be outperformed in certain specific tasks, such as chess and Jeopardy, its performance is unparalleled in solving a wide range of problems that require flexibility, creativity and complexity. For instance, the biological brain is orders of magnitude better than any known artificial architecture at navigating ever-changing environments and at recognizing people and objects despite changes in spatial orientation, partial obstruction, and deterioration from aging. These tasks are simple even for young children, and yet machines cannot achieve them efficiently or accurately. On the other end of the spectrum, the brain is capable of complex reasoning, deduction and art: calculating the age of the universe, proving the existence of arbitrarily large arithmetic progressions within the primes, the Sistine Chapel fresco and the Toccata and Fugue in D minor. Each of these, while not considered easy, was achieved by the creativity and brilliance of the biological brain architecture. It is clear that the brain possesses different, if not higher, computational abilities than artificial architectures have been able to achieve. Furthermore, this all can be achieved in a package that requires approximately a liter of space and thirty watts of power [Len03, AL01]. The most advanced computers, on the other hand, require upwards of 100 MW of power and 40 ML of space. So, it is with great interest that mankind uses its brain-derived intellect in a quest for self-understanding.

We strive to create artificial neural networks that emulate the neurological dynamics of the brain in order to create computational systems that come closer to the capabilities and efficiencies of the human brain for processing information and learning. The brain is a biologically evolved learning system, hundreds of millions of years in the making, which provides great inspiration for developing artificial learning systems for solving very challenging problems. In this thesis, we will explore some of the building blocks of the biologically inspired artificial neural network learning systems that are sought after.

1.2 Historical Context

The anatomical organization of the brain has long been studied, as it was thought that the brain was the center of consciousness. Around 1900, the Spanish anatomist Santiago Ramón y Cajal proposed the idea that discrete cells are the primary functional units, communicating with each other via specialized junctions, which were later termed synapses by Sir Charles Sherrington. However, it was not until 1952 that Hodgkin and Huxley discovered that the action potential, which serves as the communication between neurons, is generated by a series of chemical reactions in which the electrical potential between the intracellular and extracellular regions is manipulated by ion channels in the cell membrane [HH52]. Hodgkin and Huxley provided a mathematical model to describe the nonlinear evolution of the cellular membrane potential. The Hodgkin and Huxley model, discussed in section 2.1.1, remains the foundation of current research.

Despite the breakthrough of the Hodgkin and Huxley model for cellular dynamics, it alone is not enough to produce the evolving intelligence that is paramount to our species. For instance, artificial neural networks have long been a staple of computer science. An artificial neural network is typically defined as a network of connected compute nodes, where the specific computation within a node is called the activation function. The activation function is a mapping from the node's input weights to the node's output. For a proper connection weight set, artificial neural networks can demonstrate very complex computational behavior, such as face recognition. However, for a given activation function and a fixed connection weight set, the behavior is static. So, even with the highly evolved, non-linear activation function described by the Hodgkin-Huxley equations, or for any activation function, the network behavior can be complex, but cannot learn. It is the synaptic plasticity in the brain that allows for learning, making the brain special. With plastic weights, a network can evolve to counter unexpected difficulties. For instance, a network can deal with a sudden loss of nodes (injury), or a change to the rules of the game (environmental changes). This consideration is essential to our being, as mankind, but also has important applications in science. For example, if an unmanned space mission beyond the reach of practical radio communication is damaged by space debris, it could relearn to navigate by rewiring the important connections within its neural architecture, through trial and error as provided by the environment.

Given evolving weights, the emergent network behavior can fall anywhere within a broad spectrum of possibilities, and efficient synaptic plasticity rules can drive a network toward an optimal behavior. In computer science, genetic algorithms, gradient descent, and simulated annealing are amongst the tools used for evolving optimal synaptic weights. These tools produce good weight sets for solving a particular problem; however, an on-line learning algorithm is required for neural networks to learn in real-time, through interactions with the environment. Computer scientists employ a number of on-line learning techniques to solve this problem, such as temporal difference learning, but in this work we will be investigating biologically plausible synaptic plasticity rules.

In 1949, Donald Hebb proposed the existence of a biological mechanism through which synaptic connections between causally connected neurons become stronger, whereas other connections become weaker [Heb49]. Causally connected neurons are a pair of neurons j and i, where neuron j has a feed-forward connection to neuron i, and spikes in neuron j tend to elicit spikes in neuron i. In this case, we say that neuron j participates in firing neuron i. In 1973, evidence for such synaptic weight changes was discovered, and the plasticity was termed long-term potentiation [BL73, Lo03]. Learning induced by causally spiking neurons has become known as Hebbian learning. It has provided modern neuroscience with an on-line mechanism through which neural networks can evolve and learn from interactions with environmental stimuli.

The precise form of the learning, however, is a very important research problem, and there are volumes of literature attempting to address it. The true and full nature of synaptic plasticity is likely some combination of the mechanisms that have been studied, as well as some that are yet to be discovered. For instance, though Hebbian plasticity has been observed, so has anti-Hebbian plasticity, as well as short-term (non-permanent) plasticity, amongst others. Experimental validation of the proposed synaptic plasticity models is difficult, owing to the challenge of measuring isolated synapses, let alone a cluster of interacting neurons. In general, however, it is important to consider a variety of biologically plausible models, not only to push forward the theory of biological synaptic plasticity and the emergent neural dynamics, but also to provide learning mechanisms that can be used in artificial architectures.

1.3 Thesis Overview

In this thesis, we assess the role of short-term synaptic plasticity in an artificial neural network constructed to emulate two important brain functions: self-sustained activity and signal propagation. Short-term plasticity is a mechanism by which the synaptic weights are temporarily altered with respect to the firing rate of the presynaptic neuron. For instance, in some types of synaptic connections, a fast firing presynaptic neuron can exhaust the connection, leading to a reduced postsynaptic response for subsequent presynaptic spikes by depressing the effective synaptic weight. Short-term plasticity is often ignored in computational neuroscience because its dynamics do not shape the long-term synaptic weight distribution and, probably, because its usefulness is not fully understood. We validate the usefulness of short-term plasticity for self-sustained network activity as well as signal propagation within a neural network, demonstrating that the dynamics of short-term plasticity produce interesting network-wide dynamics.

We also extend a form of long-term (permanent) Hebbian synaptic plasticity in order to develop a model that can learn from distal, or delayed, rewards. Hebbian models rely on local interactions between the presynaptic and postsynaptic neurons. The system we develop assumes an additional global reward signal, such as dopamine within the brain, that modulates the Hebbian plasticity rule. The introduction of the extracellular global reward allows for learning from distal rewards. This is an important feature, as learning via environmental feedback is a crucial type of learning. We found that learning within the system becomes unstable when more than one independent distal reward must be learned. We then augmented the system with short-term plasticity and investigated its stability under the learning of multiple distal rewards. The stabilizing properties of short-term plasticity aid the long-term Hebbian learning, enabling the stable learning of multiple distal rewards. This further validates the usefulness of short-term plasticity.

In studying neural networks, in addition to developing appropriate neural and synaptic models, there exists the computational task of implementing the models. Throughout this thesis, we consider relatively small neural networks (20,000 neurons), but in general these are just atomic models built to demonstrate the functional advantages of certain techniques. As the field of computational neuroscience continues to mature, the refined atomic models will be the building blocks for large-scale models. The building blocks will interact with each other, providing for complex emergent behavior. For instance, in [IE08] a large-scale model of a mammalian thalamocortical network is simulated. The network comprises one million neurons and almost half a billion synapses. In this model, it took one minute to simulate one second of activity. As these large-scale models are developed, it is necessary to produce a neural network simulator capable of efficiently simulating large networks. In this thesis, we propose a simulation architecture to address this concern, and demonstrate its ability to efficiently simulate large-scale neural networks.

1.4 Chapter Summaries

In this work, we consider two standard types of synaptic plasticity: long-term spike timing dependent plasticity, commonly called STDP, and short-term plasticity, called STP. The specifics of these models will be discussed in chapter 2, along with an introduction to several important neuron models. STDP is considered the primary synaptic plasticity rule, slowly working to permanently alter the synaptic strengths based on spike correlations; however, the specific model it follows is debated. We use a widely considered model proposed by Song and Abbott [SMA00]. We also use a standard form of STP, proposed by Markram et al. [MLF97], as a network regulatory device. The synaptic plasticity in this case is temporary, and the effect on the synaptic strengths is immediate.

In chapter 3, we use STP to help stabilize networks with self-sustaining random asynchronous spiking activity, and also to aid in faithful signal transmission within a noisy network's sub-circuit. We couple two different types of networks in a symbiotic relationship in order to provide the desired dynamics. One network provides stability with respect to self-sustaining activity, and the second network provides the medium in which signal propagation is more faithful. An analytical characterization to assist in finding the relevant STP parameters is derived. The characterization is demonstrated to boost the probability of finding a useful network by two orders of magnitude over a random search.

In chapter 4, we consider the distal reward problem, in which a neural network is to learn a stimulus signal based on a subsequent, and delayed, reward. This is analogous to Pavlovian, or classical, conditioning. By augmenting traditional STDP with a reward trace, often likened to an extracellular presence of dopamine, the distal reward problem was solved for a single distal reward [Izh07b, Flo07]. It was, however, thought that this technique was limited in scope to learning a single distal reward [FSG10]. In chapter 4 we employ STP, in combination with the reward modulated STDP, to learn multiple distal rewards. We also develop a novel learning rule in which the effects of the dopamine modulation attenuate with time. This further enhances the number of rewards we are able to learn. With this algorithm, we demonstrate the learning of upwards of 200 distal rewards. Despite this success over previous methods, we also demonstrate a theoretical upper bound on our technique.

In chapter 5, the problem of large-scale neural network simulation is addressed. A simulator is described, designed specifically for efficiently simulating large scale models. The simulator is the first designed to exploit a parallel architecture of many GPGPUs (general purpose graphics processing units). Each GPGPU is characterized by a very high computational throughput, achieved by its highly parallel architecture. We demonstrate that the proposed simulator architecture scales well with large networks.

CHAPTER 2

Background: Computational Models for Neural Dynamics and Synaptic Plasticity

The computational power of the brain relies on robust and fault tolerant neural networks. The complex behavior of the brain is realized through the individual firing of each cell along with the complex network configuration joining the cells together. The classic action potential mechanisms of a neuron in computational neural modeling are presented in section 2.1. In order to be robust and fault-tolerant, the network connection complexities required for high-level activity must evolve naturally from a set of (chemically induced) rules. These rules are far from fully understood and are the subject of wide research. However, many basic principles have been found. The computational models typically used for plasticity are presented in section 2.2.

2.1 Neuron Models

In this section, we present several different neural models that produce the spiking dynamics that are the basis of the brain's robust computational power. Each of these models has its own strengths as well as drawbacks.

2.1.1 Hodgkin and Huxley Neurons

The spatial extent of a neuron is defined by the cell membrane that separates the intracellular contents from the cell's environment. This membrane acts as an insulator between the intracellular and extracellular ions. This insulator induces a concentration difference in ion density, resulting in an electrical potential across the cell membrane. For each ion species, neurons have a large number of microscopic channels composed of selective proteins (ensuring selectivity to the ion species). In each microscopic ion channel, the associated proteins form a small number of physical gates that regulate the flow of the ion species across the channel. Each gate can be either in a permissive or a non-permissive state. If all of the gates within an ion channel are in the permissive state, then ions are able to flow across the channel, and we call the channel open. If any of the gates are in the non-permissive state, ions cannot flow across the channel, and the channel is called closed [NR98].

When a membrane potential reaches a certain threshold through ion-gating interactions with the environment, the voltage triggers a non-linear sequence of ion channels opening and/or closing. This produces a 2 ms process of depolarization, followed by repolarization. The membrane then resets to the cellular resting potential. This is the neural action-potential modeled by the Hodgkin and Huxley neuron. Hodgkin and Huxley were the first to simplify the study of the membrane potential as an electrical circuit, considering the neuron membrane as a capacitor and the potential across ion channels as batteries. They proposed that the ionic conductances of a neuron were dynamically changing functions of the membrane potential [HH52]. It is now known that the voltage dependence is due to the biophysical properties of the ion channels. Given an input current, I(t), charge will build up on the capacitor, or leak through the channels. The electrical circuit is described by

C_m \frac{dV}{dt} = I_{ion} + I_{ext},  (2.1)

where V is the membrane potential, C_m is the membrane capacitance, I_ext is the externally applied current, and I_ion is the net flow of ion current across the membrane. I_ion is the sum of

I_{Na} = g_{Na} m^3 h (E_{Na} - V),  (2.2)

I_{K} = g_{K} n^4 (E_{K} - V),  (2.3)

I_{L} = g_{L} (E_{L} - V),  (2.4)

representing the sodium current, potassium current and leakage current, respectively [NR98].

For r ∈ {Na, K, L}, g_r and E_r correspond to the experimentally normalized macroscopic conductance and equilibrium potential for the macroscopic ion channel (which is the aggregate of the microscopic ion channels), and m, h, n ∈ [0, 1] are gating probability variables for different types of gates, examined below.

If we consider a single type of gate and its probability p of being in the permissive state, then the probability transition is assumed to obey the first-order kinetics

\frac{dp}{dt} = \alpha(V)(1 - p) - \beta(V)\, p.  (2.5)

Here α is a voltage-dependent rate constant describing the transition from the non-permissive state to the permissive state. Likewise, β is a voltage-dependent rate constant describing the transition from the permissive state to the non-permissive state. They are both fit to experimental data. With this formalism, the microscopic sodium channels are governed by three independent m-type gates and one independent h-type gate, resulting in equation (2.2) when the channels are considered in aggregate. Likewise, each microscopic potassium channel is governed by four independent n-type gates, resulting in the macroscopic behavior of equation (2.3). The combined dynamics, known as the Hodgkin-Huxley neuron model (HH), produce the four-variable ODE system [NR98]:

C_m \frac{dV}{dt} = g_{Na} m^3 h (E_{Na} - V) + g_{K} n^4 (E_{K} - V) + g_{L}(E_{L} - V) + I_{ext},  (2.6)

\frac{dm}{dt} = \alpha_m(V)(1 - m) - \beta_m(V)\, m,  (2.7)

\frac{dh}{dt} = \alpha_h(V)(1 - h) - \beta_h(V)\, h,  (2.8)

\frac{dn}{dt} = \alpha_n(V)(1 - n) - \beta_n(V)\, n.  (2.9)

The HH model is the basis for most biophysical neuron models. Traditionally, however, most large-scale neural network simulations require only an emulation of macro neural dynamics, and a simpler high-level abstraction of the HH dynamics is used. Two common models will be presented below.
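For concreteness, the system (2.6)-(2.9) can be integrated directly with forward Euler. The rate functions α and β are not given in this chapter, so the following minimal sketch uses the classic squid-axon fits from Hodgkin and Huxley's original formulation, with voltages measured as displacements from rest; those constants, and the stimulus value, are illustrative assumptions rather than parameters used in this dissertation.

```python
import numpy as np

# Classic squid-axon rate constants (assumed here; V is the displacement
# from the resting potential, in mV, as in the original HH convention).
def alpha_m(V): return 0.1 * (25 - V) / (np.exp((25 - V) / 10) - 1)
def beta_m(V):  return 4.0 * np.exp(-V / 18)
def alpha_h(V): return 0.07 * np.exp(-V / 20)
def beta_h(V):  return 1.0 / (np.exp((30 - V) / 10) + 1)
def alpha_n(V): return 0.01 * (10 - V) / (np.exp((10 - V) / 10) - 1)
def beta_n(V):  return 0.125 * np.exp(-V / 80)

g_Na, g_K, g_L = 120.0, 36.0, 0.3      # maximal conductances, mS/cm^2
E_Na, E_K, E_L = 115.0, -12.0, 10.6    # reversal potentials, mV
C_m = 1.0                              # membrane capacitance, uF/cm^2

def simulate_hh(I_ext=10.0, T=50.0, dt=0.01):
    """Forward-Euler integration of equations (2.6)-(2.9)."""
    V, m, h, n = 0.0, 0.05, 0.6, 0.32
    trace = np.empty(int(T / dt))
    for k in range(len(trace)):
        I_ion = (g_Na * m**3 * h * (E_Na - V)      # sodium current (2.2)
                 + g_K * n**4 * (E_K - V)          # potassium current (2.3)
                 + g_L * (E_L - V))                # leakage current (2.4)
        V += dt * (I_ion + I_ext) / C_m            # equation (2.6)
        m += dt * (alpha_m(V) * (1 - m) - beta_m(V) * m)   # (2.7)
        h += dt * (alpha_h(V) * (1 - h) - beta_h(V) * h)   # (2.8)
        n += dt * (alpha_n(V) * (1 - n) - beta_n(V) * n)   # (2.9)
        trace[k] = V
    return trace

V_trace = simulate_hh()  # a sustained current elicits a train of action potentials
```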

2.1.2 Leaky Integrate-and-Fire Neurons

A leaky integrate-and-fire (LIF) [Abb99] neuronal model simplifies the Hodgkin-Huxley model by assuming a binary spike or no-spike neuron in which pre-synaptic inputs are integrated into a post-synaptic neuron. Once the post-synaptic neuron's potential crosses a threshold, the neuron emits a spike. The assumption here is that the action-potential is more important to neural network behavior than the specifics of how the action-potential is generated at the cellular level. This assumption has led to the widespread use of abstraction models that represent the essential network-level dynamics of the Hodgkin-Huxley model.

In the following model, it is assumed that neurons are either excitatory or inhibitory. A pre-synaptic spike from an excitatory (inhibitory) neuron increases (decreases) the post-synaptic neuron's membrane voltage. With this, each neuron i has spiking behavior governed by the voltage equation

C_m \frac{dV_i}{dt} = g_L (V_{rest} - V_i) + g_i^{exc}(t)(E_{exc} - V_i) + g_i^{inh}(t)(E_{inh} - V_i) + I_{ext}(t).  (2.10)

A neuron spike at time t_i^{sp} is defined by the reset criteria:

\lim_{t \to t_i^{sp-}} V_i(t) = V_{thr},  (2.11)

\lim_{t \to t_i^{sp+}} V_i(t) = V_{reset}.  (2.12)

After an action potential, the voltages are clamped for a refractory period of 2 ms. In this dissertation: V_rest = -74 mV is the resting neuronal voltage; V_thr = -54 mV is the neural membrane action-potential threshold; I_ext denotes external current; E_exc = 0 mV and E_inh = -80 mV are the excitatory and inhibitory reversal potentials, respectively; and g_i^{exc} and g_i^{inh} are the summed conductance contributions from the excitatory and inhibitory pre-synaptic inputs, indexed by j, to post-synaptic neuron i. The dynamics of these conductances can be described as:

\tau_\ell \frac{dg_i^\ell}{dt} = -g_i^\ell + \sum_j g_{max}^\ell \, W_{ij}(t)\, \delta(t - (t_j^{sp} + \Delta_j)), \quad \text{for } \ell \in \{exc, inh\}.  (2.13)

Here, τ_exc = 5 ms and τ_inh = 20 ms are the conductance decay constants, δ is the Dirac delta function, t_j^{sp} is the time of neuron j's last spike, and ∆_j is the axonal delay for neuron j. An axonal delay is the time between the release of an action potential near the neuronal soma and its arrival at the axonal terminals, where synapses with other cells are formed. This value can be less than a millisecond, or more than 100 ms.

The values W_ij(t) indicate the synaptic weight from neuron j to neuron i at time t and are measured in units of g_max^{exc} or g_max^{inh} (depending on the type of presynaptic neuron), which is the maximum synaptic conductance. An input resistance of 150 MΩ is assumed throughout this work. The conductances can be plastic, as discussed in section 2.2.

12 2.1.3 Izhikevich Neurons

The Izhikevich neuron model is more costly computationally than the LIF neuron, but can recreate a larger range of neuron classes [Izh03, Izh07a, Izh04]. It is thought to be a good compromise between the computationally efficient LIF neuron and the accurate dynamics of the Hodgkin-Huxley neuron. The model uses continuous dynamics to represent all the different types of behaviors found in real neurons, and does so without the artificial thresholding employed by LIF neurons. The model is expressed by the simple membrane voltage equation

\frac{dV}{dt} = 0.04V^2 + 5V + 140 - u + g_i^{exc}(t)(E_{exc} - V) + g_i^{inh}(t)(E_{inh} - V) + I_{ext}(t),  (2.14)

a recovery variable

\frac{du}{dt} = a(bV - u),  (2.15)

and the spike reset rules:

\text{if } V \geq 30, \text{ then} \quad \begin{cases} V \leftarrow c, \\ u \leftarrow u + d. \end{cases}  (2.16)
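As a sketch, equations (2.14)-(2.16) can be integrated with forward Euler under a constant current input (the synaptic conductance terms are omitted). The parameters a, b, c, d determine the neuron class; the regular-spiking values below are the standard ones from [Izh03] and are assumptions for illustration.

```python
def izhikevich(I_ext=10.0, T=1000.0, dt=0.5):
    """Integrate equations (2.14)-(2.16); V in mV, time in ms.
    a, b, c, d are the standard regular-spiking values from [Izh03]."""
    a, b, c, d = 0.02, 0.2, -65.0, 8.0
    V, u, spikes = c, b * c, []
    for k in range(int(T / dt)):
        V += dt * (0.04 * V**2 + 5 * V + 140 - u + I_ext)  # equation (2.14)
        u += dt * a * (b * V - u)                          # equation (2.15)
        if V >= 30:                                        # reset rule (2.16)
            spikes.append(k * dt)
            V, u = c, u + d
    return spikes

print(izhikevich())  # regular spiking under constant input
```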

2.2 Plasticity Models

In this section, the basic plasticity rules employed in this dissertation are presented. Synaptic plasticity rules serve an important role in neural network evolution, as they are the key to how a neural network learns a behavior.

13 2.2.1 Spike Time Dependent Plasticity

Spike time-dependent plasticity, or STDP, is used as the basic synaptic plasticity model [SMA00], described succinctly by [Flo07]. STDP is a Hebbian [Heb49] learning rule that potentiates causal synaptic connections. Specifically, if a pre-synaptic spike precedes a post-synaptic spike, then the corresponding synapse will be strengthened. If, on the other hand, a post-synaptic spike precedes the pre-synaptic spike, then the corresponding synapse is weakened.

In this plasticity model, the term X_j(t) = \sum_{sp_j} \delta(t - t_j^{sp}) denotes the spike train of neuron j as a sum of Dirac functions over the spike times of neuron j. The synaptic update rule for the weight W_ij, between pre-synaptic neuron j and post-synaptic neuron i, is given by:

\dot{W}_{ij}(t) = P_{ij}(t)\, X_i(t) - D_{ij}(t)\, X_j(t - \Delta_{ij}),  (2.17)

\dot{P}_{ij}(t) = -\frac{P_{ij}(t)}{\tau_+} + A_+ X_j(t - \Delta_{ij}),  (2.18)

\dot{D}_{ij}(t) = -\frac{D_{ij}(t)}{\tau_-} + A_- X_i(t),  (2.19)

where P_ij is the potentiation trace, tracking the influence of pre-synaptic spikes, and D_ij is the depression trace, tracking the influence of post-synaptic spikes. A_+ and A_- correspond to the maximum potentiation and depression of synaptic strength possible, respectively, and τ_+ and τ_- determine the effective time windows for potentiation and depression, respectively. To ensure network stability, β := A_-τ_-/A_+τ_+ > 1, so that depression is stronger than potentiation [SMA00]. The values W_ij, measured in units of g_max, are artificially limited to the interval [0, 1].
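Because the trace equations (2.18) and (2.19) are linear between spikes, P_ij and D_ij can be decayed analytically and bumped at spike times, giving a compact event-driven implementation of rule (2.17). The sketch below does this for a single synapse; the parameter values are placeholders (chosen so that β = A_-τ_-/A_+τ_+ > 1), not the values used in the experiments.

```python
import math

class STDPSynapse:
    """One synapse W_ij updated by equations (2.17)-(2.19)."""
    def __init__(self, W=0.5, A_plus=0.005, A_minus=0.006,
                 tau_plus=0.020, tau_minus=0.020, delay=0.001):
        self.W, self.P, self.D, self.t = W, 0.0, 0.0, 0.0
        self.A_plus, self.A_minus = A_plus, A_minus
        self.tau_plus, self.tau_minus = tau_plus, tau_minus
        self.delay = delay

    def _decay_to(self, t):
        # analytic decay of the traces between events, (2.18)-(2.19)
        self.P *= math.exp(-(t - self.t) / self.tau_plus)
        self.D *= math.exp(-(t - self.t) / self.tau_minus)
        self.t = t

    def pre_spike(self, t):
        # the spike acts after the axonal delay: it depresses W by the
        # post-spike trace D, then bumps the potentiation trace P
        self._decay_to(t + self.delay)
        self.W = max(0.0, self.W - self.D)   # W limited to [0, 1]
        self.P += self.A_plus

    def post_spike(self, t):
        self._decay_to(t)
        self.W = min(1.0, self.W + self.P)
        self.D += self.A_minus

# Repeated pre-before-post pairings (causal) potentiate the synapse.
syn = STDPSynapse()
for k in range(100):
    syn.pre_spike(k * 0.1)
    syn.post_spike(k * 0.1 + 0.005)
print(syn.W)   # grows toward 1.0
```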

2.2.2 Short Term Plasticity

Short term plasticity (STP) temporarily modifies the synaptic weights used in the neural dynamics based on the pre-synaptic firing rate. In the experiments involving STP, the following algorithm was employed [TPM98, MWT98, MM02]. When invoking STP, integration of the voltage (equation (2.10) or equation (2.14)) is augmented by what is called an effective synaptic weight µ_ij, rather than the absolute synaptic weights W_ij. These effective synaptic weights are short-term modifications of the absolute weights, dependent on the pre-synaptic firing rate. That is, formally, replace equation (2.13) with

\tau_\ell \frac{dg_i^\ell}{dt} = -g_i^\ell + \sum_j \mu_{ij}(t)\, \delta(t - t_j^{sp}), \quad \text{for } \ell \in \{exc, inh\},  (2.20)

where µ_ij is computed using another set of equations [TPM98]:

\mu_{ij} = A_{ij}\, x\, U_1,  (2.21)

\dot{u} = -\frac{u}{\tau_{F_{ij}}} + U_{ij}(1 - u)\, r(t),  (2.22)

\dot{x} = \frac{1 - x}{\tau_{D_{ij}}} - U_1\, x\, r(t),  (2.23)

U_1 = u(1 - U_{ij}) + U_{ij}.  (2.24)

Here, A_ij is just a scaling constant, τ_{D_{ij}} and τ_{F_{ij}} are the depression and facilitation time constants, u tracks synaptic utilization, x tracks synaptic availability, r is the instantaneous firing rate of pre-synaptic neuron j, U_ij is a constant determining the initial release probability of the first spike, and U_1 is just a mathematical convenience factor.

CHAPTER 3

Short Term Plasticity Aided Signal Propagation

3.1 Introduction

In the absence of stimulus, the brain remains active. That is, the brain maintains a sustained background level of neural activity irrespective of stimulus input. This baseline activity in neural networks is referred to as RAIN activity, or Recurrent Asynchronous Irregular Nonlinear activity [BQH10, VA05, KSA08]. These networks can be achieved through a balance of excitation and inhibition, where the contributions from each nearly cancel. The activity then is a result of fluctuations about the mean [SN94, TS95, TM97]. In vivo measurements demonstrate that neural responses are highly variable [BW76, Dea81, SK93, HSK96, ALG00]. Reproducing this variability has been established as an essential aspect of neural models [USK94]. Oftentimes, models employ random noise from an external source. However, though neurons are subject to external noise, it is evident that most cortical variability is generated from internal activity [ASG96]. Sparsely connected balanced networks of spiking neurons can sustain the background noisy activity without the need of a random external source [VS96, VS98, AB97, Bru00, MHK03, LAH04].

Once a background of neural activity is in place, it is important to understand how signals can be faithfully transmitted through the noise. External signal inputs can blow up or dissipate along the signal path, causing either system-wide runaway activity, obfuscating the original signal, or a loss of information.

This chapter is joint work with Narayan Srinivasa.

In this chapter, we first consider the problem of generating RAIN networks, and then we examine the faithfulness of signal transmission through an embedded circuit. We use STP (section 2.2.2) in a novel coupled network both to enhance network RAIN activity and to boost network signal transmission capabilities. We then explore the problem of selecting the appropriate STP parameters.

3.2 RAIN Networks

Figure 3.1: RAIN network configuration. The red arrows indicate inhibitory connections and the blue arrows are excitatory connections.

To construct a RAIN network, we use 8,000 excitatory neurons and 2,000 inhibitory neurons with a connectivity of 1.5%. This means that for any two neurons j and i, the probability that there is a connecting synapse from j to i is 1.5%. Figure 3.1 is the network diagram. The uniform synaptic strengths from the excitatory pool are denoted by W_exc, and the synaptic weights from the inhibitory pool are W_inh. For example, for any excitatory neuron j, and any other neuron i in j's fanout pool, the strength of the synapse from j to i is W_exc.

With the network parameters in table 3.1, we proceed by performing a parameter sweep

All Networks

C_m = 200 pF          g_leak = 10 nS
E_inh = -80 mV        E_exc = 0 mV
V_thresh = -54 mV     V_reset = -60 mV
E_rest = -74 mV       fanout = 150
τ_exc = 5 ms          τ_inh = 15 ms

Table 3.1: Network parameters used in this chapter.

across the values (W_exc, W_inh) ∈ (0, 10] × (0, 100], in nS, with a discretization of 0.1 nS and 1 nS, respectively. For each parameter set, 200 random neurons (allowing both excitatory and inhibitory) are stimulated with Poisson distributed current for 50 ms, resulting in an initial 60 Hz of activity, at which point all external inputs are turned off and the network's activity is recorded for 2 seconds. The final 100 ms of activity is analyzed for the average firing rate and the inter-spike-interval coefficient of variation of each neuron. These values are then averaged across the network and recorded. The goal is to find sustainable asynchronous network activity between 10 and 20 Hz; a coefficient of variation above 1 ensures network asynchrony. With this approach, we were able to find (W_exc, W_inh) = (4.1 nS, 98 nS), amongst others (figure 3.2), with a low firing rate and asynchronous activity, all sustained for at least 2 seconds. These parameters are used throughout this chapter.
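The acceptance test for this sweep can be sketched as follows. The network simulation itself is abstracted behind a hypothetical simulate_rain function that stimulates 200 random neurons for 50 ms, runs the network for 2 seconds with inputs off, and returns per-neuron spike trains; only the sweep loop and the rate/CV criteria described above are shown.

```python
import numpy as np

def isi_cv(spike_times):
    """Coefficient of variation of one neuron's inter-spike intervals."""
    isi = np.diff(np.sort(spike_times))
    return isi.std() / isi.mean() if len(isi) >= 2 and isi.mean() > 0 else 0.0

def sweep(simulate_rain):
    """simulate_rain(w_exc, w_inh) -> list of per-neuron spike-time arrays
    (seconds); its implementation (the network itself) is assumed."""
    accepted = []
    for w_exc in np.arange(0.1, 10.01, 0.1):       # nS
        for w_inh in np.arange(1.0, 100.01, 1.0):  # nS
            trains = [np.asarray(t) for t in simulate_rain(w_exc, w_inh)]
            tail = [t[t >= 1.9] for t in trains]   # final 100 ms only
            rate = np.mean([len(t) / 0.1 for t in tail])   # Hz
            cv = np.mean([isi_cv(t) for t in tail])
            if 10.0 <= rate <= 20.0 and cv > 1.0:  # sustained, asynchronous
                accepted.append((w_exc, w_inh, rate, cv))
    return accepted
```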

18 Figure 3.2: The firing rate for the networks tested in the synaptic weight parameter sweep.

3.3 Signal Propagation

3.3.1 Circuit Design

Figure 3.3: Signal propagation circuit network architecture. A naturally occurring feed-forward circuit is found within a RAIN network. The feed-forward connections are then strengthened, and this is the circuit we observe for signal propagation.

An important aspect of neural computation is the transmission of information within the cortex [VA05, KRA10]. The signal transmission traits of a neural network are thus an important feature to study. Here we consider a simple model for signal transmission proposed in [VA05]. Once a sustainable background network exhibiting RAIN activity is established, we select a random 5-layer circuit from the network in the following way. The first layer of the circuit consists of 30 random neurons from the network. The second layer of the circuit consists of 30 neurons selected from the pool of postsynaptic neurons of the first layer, with the requirement that any neuron selected must have at least 3 feed-forward connections from layer 1. For layer n > 2, we select 30 neurons from the pool of postsynaptic neurons of layer n - 1, where each selected neuron has at least 3 feed-forward connections from layer n - 1. In addition, we impose a no-short-circuit requirement, where each neuron in layer n has no connections from a neuron in layer k, for k < n - 1. This forces the signal to propagate through the layers in order. If 30 neurons cannot be selected for a layer, the layer will consist of as many neurons as can be found satisfying the requirements. The feed-forward synaptic weights in the circuit are strengthened by a factor of 16. See figure 3.3 for the network and circuit architecture. We refer to this design as the circuit network; it will be used as a basis for larger networks in subsequent sections. The signal propagation through 5 layers is shown in figure 3.4. Notice that reverberating signals can be propagated as well, which are signals transmitted through the circuit without a layer 1 stimulus. The selection procedure is sketched below.
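A sketch of the layer selection, under the assumption that the network's connectivity is available as adj, where adj[j] is the set of post-synaptic targets of neuron j (a hypothetical representation; strengthening the selected feed-forward weights by the factor of 16 is left out).

```python
import numpy as np

def build_circuit(adj, rng, n_layers=5, width=30, min_inputs=3):
    """Greedily select an embedded feed-forward circuit from a RAIN network,
    enforcing the >=3 feed-forward inputs and no-short-circuit constraints."""
    n = len(adj)
    layers = [list(rng.choice(n, size=width, replace=False))]
    used = set(layers[0])
    for _ in range(1, n_layers):
        prev, earlier = layers[-1], set().union(*layers[:-1])
        candidates = []
        for i in range(n):
            if i in used:
                continue
            if sum(i in adj[j] for j in prev) < min_inputs:
                continue            # needs >=3 connections from layer n-1
            if any(i in adj[j] for j in earlier):
                continue            # no inputs from any layer k < n-1
            candidates.append(i)
        rng.shuffle(candidates)
        layer = candidates[:width]  # may hold fewer than `width` neurons
        layers.append(layer)
        used.update(layer)
    return layers

# Toy usage with the statistics of section 3.2: 10,000 neurons, fanout 150.
rng = np.random.default_rng(0)
adj = [set(rng.choice(10_000, size=150, replace=False)) for _ in range(10_000)]
print([len(layer) for layer in build_circuit(adj, rng)])
```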

20 Figure 3.4: A) Signal propagation through 5 layers. B) A reverberating signal that is experienced in layer 5, but without inputs to layer 1. C) The average firing rate of the neurons in each layer for the duration of the experiment.

3.4 Properties of STP

The introduction of STP (see section 2.2.2) into the neural dynamics strongly influences the stability of the RAIN activity and a network's ability to propagate signals. In [STM07], it was demonstrated that synaptic STP dynamics can support a fixed average network firing rate, leading to stable firing dynamics. In section 3.8.1, the linear approximation to the change in firing rate with respect to the change in network inputs is examined using mean field theory. In this section, we present the fixed-point requirements and the heuristics behind the analysis conducted in section 3.8.1, following [STM07].

First, consider the steady state for equations (2.21) to (2.24), given a steady firing rate r∗ (steady state values indicated by asterisks):

\mu_{ij}^* = A_{ij}\, x^* U_1^*,  (3.1)

u^* = \frac{\tau_{F_{ij}} U_{ij} r^*}{1 + \tau_{F_{ij}} U_{ij} r^*},  (3.2)

x^* = \frac{1}{1 + \tau_{D_{ij}} U_1^* r^*},  (3.3)

U_1^* = u^*(1 - U_{ij}) + U_{ij}.  (3.4)

Assuming that the static weights W_ij were selected to give an average network firing rate of r*, the dynamical synapses can produce a fixed point at firing rate r* if the multiplicative constant A_ij is selected in the following way. For a given set of STP parameters and desired firing rate r*, pick

A_{ij}(U_{ij}, \tau_{D_{ij}}, \tau_{F_{ij}}, r^*) = \frac{W_{ij}}{x^*(U_{ij}, \tau_{D_{ij}}, \tau_{F_{ij}}, r^*)\; U_1^*(U_{ij}, \tau_{D_{ij}}, \tau_{F_{ij}}, r^*)}.  (3.5)

We can now compute the effective weight µ_ij at r*:

\mu_{ij}^* = A_{ij}\, x^* U_1^* = \frac{W_{ij}}{x^* U_1^*}\, x^* U_1^* = W_{ij},  (3.6)

which yields fixed firing dynamics, by assumption. Thus, with A_ij picked in this way, we attain a fixed-point firing rate for the system. We now consider a heuristic for the stability of r*. In order to make r* a stable fixed point, [STM07] proposed that for a given firing rate r, the effective synaptic weights µ_ij should obey:

1. When r < r*:

   (a) increased synaptic efficacy for E → E and I → I synapses,

   (b) decreased synaptic efficacy for E → I and I → E synapses.

2. When r > r*:

   (a) decreased synaptic efficacy for E → E and I → I synapses,

   (b) increased synaptic efficacy for E → I and I → E synapses.

Figure 3.5: The dynamic synapses plotted as a function of the presynaptic firing rate. The STP parameters can be chosen to produce a fixed-point firing rate. Here, the fixed point is 10 Hz, at which point μ_mn = W_mn, which was already chosen to produce stable RAIN firing.

This heuristic is visualized in figure 3.5. Consider the left panel, which plots the excitatory-to-excitatory and inhibitory-to-excitatory dynamic synapses (μ_ee and μ_ei, respectively). When a presynaptic excitatory neuron is firing faster than 10 Hz, the assumption is that the postsynaptic neuron is also firing too fast; then μ_ee < W_ee, providing less excitation than a static synapse would, to slow the network down. Similarly, for presynaptic excitatory firing slower than 10 Hz, μ_ee > W_ee, to speed up the network. If an inhibitory presynaptic neuron is firing faster than 10 Hz, then μ_ei > W_ei, providing more inhibition to slow down the postsynaptic neuron. If the presynaptic inhibitory neuron is firing slower than 10 Hz, then μ_ei < W_ei, to lessen the inhibition to the postsynaptic neuron.
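The fixed-point construction behind figure 3.5 can be checked numerically. Below is a small Python sketch of equations (3.1)–(3.5), with times in seconds and rates in Hz; the function names are ours, and the parameter values are only an illustration of a depressive synapse.

```python
def stp_steady_state(U, tau_D, tau_F, r):
    """Steady-state STP variables, eqs. (3.2)-(3.4)."""
    u = (tau_F * U * r) / (1.0 + tau_F * U * r)
    U1 = u * (1.0 - U) + U
    x = 1.0 / (1.0 + tau_D * U1 * r)
    return x, U1

def effective_weight(W, U, tau_D, tau_F, r, r_star=10.0):
    """Effective dynamic weight mu(r) = A * x*(r) * U1*(r), eq. (3.1),
    with A chosen by eq. (3.5) so that mu(r_star) == W exactly."""
    x_s, U1_s = stp_steady_state(U, tau_D, tau_F, r_star)
    A = W / (x_s * U1_s)                      # eq. (3.5)
    x, U1 = stp_steady_state(U, tau_D, tau_F, r)
    return A * x * U1

# Illustrative depressive parameters: mu > W below the 10 Hz fixed point
# and mu < W above it, exactly the heuristic of figure 3.5.
W_ee = 0.3
for r in (5.0, 10.0, 20.0):
    print(r, effective_weight(W_ee, U=0.2, tau_D=0.5, tau_F=0.02, r=r))
```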

Figure 3.6: A) RAIN activity for 100 of the network neurons. The network parameters are suboptimal, leading to activity that lasts less than 2 seconds. B) RAIN activity for 100 of the network neurons. STP is employed, enabling the network to overcome the faulty choice in network parameters. The activity lasts more than 10 seconds.

3.5 STP Conditioned RAIN

We found that STP's stabilization properties, discussed above, can improve the fault tolerance of a network's self-sustained RAIN activity despite a suboptimal balance between excitatory and inhibitory weights. Optimally balanced excitatory/inhibitory networks can produce continual RAIN activity, as seen in section 3.2. However, self-sustained RAIN activity can be fleeting in networks with unbalanced parameters; moreover, network-wide shocks, caused by external inputs to the network or by internal activity fluctuations, can compromise the excitatory/inhibitory balance and subsequently silence the network. Figure 3.6 (A) shows the stunted activity in a network with improperly chosen network parameters. Introducing STP dynamics can rescue the RAIN activity, resulting in sustained activity. Figure 3.6 (B) demonstrates sustained activity for a network identical to that in (A), with the addition of dynamical synapses. The activity, influenced by STP, persists for over 10 seconds.

3.6 Signal Transmission in Coupled STP Networks

3.6.1 Network Layout

Figure 3.7: The coupled signal propagation network architecture. Two circuit networks are weakly coupled together. The two networks have the same general neural parameters and configuration statistics, but the STP parameters for each network can be chosen independently, producing different firing dynamics in each network. The left network is referred to as Master, having STP parameters that yield self-sustained network activity. The right network is referred to as Slave, which has STP parameters that allow short excitatory bursts through but then kill network activity.

In this section, we consider the advantages of STP with respect to the signal transmission experiment proposed in section 3.3.1. In this version of the experiment, we consider two types of STP parameters, listed in table 3.2.

Type     Master                             Slave
E → E    U = 0.2, τD = 9 ms, τF = 72 ms     U = 0.2, τD = 10 ms, τF = 5 ms
E → I    U = 0.2, τD = 10 ms, τF = 5 ms     U = 0.2, τD = 9 ms, τF = 72 ms
I → E    U = 0.2, τD = 10 ms, τF = 5 ms     U = 0.2, τD = 9 ms, τF = 72 ms
I → I    U = 0.2, τD = 9 ms, τF = 72 ms     U = 0.2, τD = 10 ms, τF = 5 ms

Table 3.2: Master and Slave STP parameters used in this chapter.

Consider a dual-network setup, where each network is configured as in section 3.3.1, and we include dynamical synapses governed by STP. We call the two networks Master and Slave, and they are weakly coupled (see figure 3.7). The difference between the Master and Slave networks lies solely in the type of STP parameters selected. In Master, the STP parameters chosen provide longevity with respect to the RAIN dynamics, as demonstrated in section 3.5. The Slave network, on the other hand, employs STP parameters that tend to kill spiking dynamics. For example, in the brain, the prefrontal cortex (PFC) is known to be reciprocally connected to many other areas in the brain. However, the STP parameters in the PFC are more facilitating than in other cortical areas, which are primarily depressive [MBT08, Fus08]. Parts of the PFC are thought to be more important for neuromodulatory purposes rather than for computing and signal propagation [ZB06, BZ07], which would be analogous to the role played by Master in our coupled network architecture. In this chapter, we demonstrate that, likewise, Master is ill-suited for signal propagation, as Master's tendency to sustain activity elicits reverberating signals. On the other hand, the dynamics in Slave tend to kill spike activity. This allows quick bursts of signal to propagate through a circuit, yet the dynamics kill subsequent ripple effects. On its own, Slave cannot sustain activity and dies quickly. However, as we will show below, the weak coupling between Slave and Master is enough to sustain baseline RAIN activity in Slave. This is important because, without such activity, stimulus signals will not propagate through the circuit as readily. The baseline activity keeps the membrane voltage of the neurons near threshold, enabling quick ascension to action potential and, in turn, quick responses to input signals.

In this work, we are interested in the emergent global network stability and in how signals propagate through the local circuits, as we demonstrate in the following sections. The Master and Slave networks, and the weak coupling between them, approximate the more biologically relevant small-world networks [ASW06, SCK04]. In small-world topologies, neurons have very few neighbors, but the average synaptic pathway between any two random neurons is short, allowing for efficient global communication. Within either Slave or Master, though the probability that two neurons are neighbors is low (since the connectivity is just 1.5%), the average path length between two neurons is 2.1, in adherence with the fundamental structural properties of small-world architectures. Weak coupling of Master and Slave still allows for relatively short global path lengths, since the probability that a single neuron does not have a bridge neuron in its fanout pool is almost zero. However, because the coupling is weak, the global transfer of dynamics between networks is slow, preventing sudden disturbances in either network from adversely affecting the other. Thus, while we are analyzing the global stability in a very specific network architecture, these results are applicable to more biologically inspired networks. The key to the success of this model is in the local versus global dynamic, in which global transmission rates are low (but present, inducing the requisite stability properties), yet local feed-forward circuits are easily found. These attributes are shared with general biological networks.

3.6.2 Coupled RAIN Dynamics

We refer to the coupling between the networks as the bridge. In this chapter, except for figure 3.10, we use a bridge probability p = 2E-6 and a synaptic strength of 30 nS for each bridge synapse. The bridge probability p indicates the probability that a neuron in one network is connected to a neuron in the other. As the networks have 10,000 neurons each, p = 2E-6 corresponds to 200 connections in each direction (assuming a bidirectional bridge), each of strength 30 nS. In the control configuration for the coupled network, these parameters are used, along with a bidirectional bridge. The STP parameters used for Master and Slave in this chapter are listed in table 3.2.

We now consider several variants of the coupled network configuration, where we vary the direction of the bridge and examine the resulting activity in Master and Slave. We initialize each network independently (unless otherwise stated), as described in section 3.2. Figures 3.8 and 3.9 summarize the following results. When the networks operate independently (no bridge), Master sustains activity whereas Slave dies quickly, as expected (see figure 3.8 (A)). When Slave bridges to Master, Slave dies quickly, but Master sustains activity, as in figure 3.8 (B). When Master bridges to Slave, figure 3.9 (A), both networks can sustain activity.

Figure 3.8: A) Slave and Master are uncoupled. Master continues indefinitely whereas Slave dies. B) Slave has projections onto Master. Here Slave dies, as expected, and Master continues indefinitely.

In figure 3.9 (B), a bidirectional bridge is used, but only Slave is initialized. This is enough to, in turn, initialize Master via the bridge. Master is also able to maintain activity in Slave through the bridge. These configurations demonstrate that Slave needs Master, whereas Master could potentially survive without Slave.[1] However, a bidirectional weak coupling is desirable: in the event that Master experiences a shock, Slave can prevent the death of Master through rare inputs, as seen in the initialization of Master through the bridge connections in figure 3.9 (B). This is important, as achieving RAIN activity is a delicate balance, and difficult to attain [KSA08].

In figure 3.10 we examine how weak the bidirectional bridge can be before activity breaks down. In figure 3.10 (A) and (B), the average firing rate of Slave and Master is plotted for different bridge connectivity probabilities p, as shown in the legend. For p = 2E-7, merely 20 connections in each direction, both networks still thrive. For p = 2E-8, two bridge connections, Master lives while Slave dies. The network dies quickly for p = 2E-4, which is unsurprising, as that corresponds to 20,000 connections in each direction, twice the number of neurons in each network. In this case, a surge in activity is easily propagated globally, which in turn forces network synchrony and then death [KSA08]. In figure 3.10 (C) and (D), the average firing rate of Slave and Master is plotted for different connectivity strengths. The synaptic strength s in the legend indicates the connection strength of the bridge synapses. The coupled network can self-sustain even for s = 5 nS, but the network dies for s = 2 nS.

3.6.3 Coupled Signal Propagation Dynamics

In this section, we consider the signal propagation capabilities of each network. We build the coupled circuit network by combining two circuit networks from section 3.3.1. The networks are then coupled and endowed with independent STP parameters as in section 3.6.2. Again, we refer to the individual circuit networks as Master and Slave, based on the type of STP dynamics used in each network.

[1] However, in the brain, the PFC, which could serve as Master, does require some stimulation to become self-sustaining.

Figure 3.9: A) Master has projections onto Slave. This is sufficient to restart Slave whenever Slave dies. B) Slave and Master are mutually coupled. In this case, only Slave received initial inputs, and Master relied on Slave for a jump-start. This demonstrates that Slave has the ability to start Master in the event Master dies. In this configuration, both networks thrive indefinitely.

Figure 3.10: An analysis of the coupling required for the connections between Master and Slave. A & B) The average firing rate of Slave and Master, for one second of elapsed time, for different connectivity probabilities. These were performed with a bridge synapse strength of 30 nS. C & D) The average firing rate of Slave and Master, for one second of elapsed time, for different connectivity strengths. These were performed with a synaptic bridge connection probability of 2E-4.

Figure 3.11: For any layer k of interest, we construct a binary projection neuron pair. Layer k projects onto the excitatory indicator neuron (blue). The indicator neuron has an excitatory connection to the inhibitory neuron (red) which, in turn, inhibits the indicator neuron to prevent the indicator from being overwhelmed by the circuit layer during a stimulus.

We then run the signal propagation experiment from before, with identical inputs to Master and Slave.

From figure 3.12 (D) and (F), we see that reverberating signals are prominent in Master. Visually, the circuit in Slave does not become hyperactive as readily, due to its tendency to kill activity. This decreases the severity of the reverberating signals. In order to quantify the visual results, we wanted to measure signal propagation in Master and Slave. For each circuit layer of interest, we introduce a binary projection neuron pair composed of an excitatory neuron and an inhibitory neuron. The excitatory neuron is called the indicator neuron, to which the circuit layer being measured projects. We consider the indicator neuron's activity as representative of the layer that projects onto it. The indicator neuron connects to an inhibitory neuron which, in turn, connects back to the indicator neuron, as shown in figure 3.11. The inhibitor prevents runaway excitation in the indicator. The connection strengths between the layers and the respective indicator neurons, and the connection strengths within the binary pair of neurons, are all consistent with the overall background network strengths (non-amplified). The negative feedback loop to the indicator neuron prevents it from becoming overwhelmed by a surge in the circuit layer's activity. We will measure synchrony in the indicator neurons in order to determine faithful signal propagation. Denote the input layer's

indicator neuron for Master and Slave by ML1 and SL1, respectively. Likewise, denote the final layer's indicator neuron for Master and Slave by ML4 and SL4, respectively. Note that in measuring synchrony, we limit ourselves to 4 circuit layers because we were able to fill layer 4 with respect to the criteria described in section 3.3.1, whereas layer 5 was routinely several neurons short.

In section 3.8.3, we outline a metric for measuring the synchrony of the indicator neurons. The metric m(·, ·) takes two spike trains x, y ∈ {0, 1}^N to the unit interval [0, 1], where m(x, x) = 1 implies an exact correlation, whereas m(x, y) = 0 implies no correlation. Define χ(·) to be the spike train of a neuron. Formally, for a neuron n, the spike train vector χ_{t₁}^{t₂}(n) ∈ {0, 1}^{t₂−t₁} is defined by χ_t = 1 if neuron n spikes at time t, and zero otherwise, where t is measured in ms and comes from the discrete interval {t₁, …, t₂ − 1}. In this case, we ran the signal propagation experiment for 20 seconds, injecting signals of 180 Hz for 25 ms into layer 1 of each network. The inputs to both Slave and Master circuits were the same. The time between signals was chosen uniformly from {150, …, 450} milliseconds. We measured the correlation of the indicator neurons on the last 10 seconds of spiking activity. We found that

\[
m\!\left(\chi_{10\mathrm{K}}^{20\mathrm{K}}(\mathrm{ML1}),\, \chi_{10\mathrm{K}}^{20\mathrm{K}}(\mathrm{ML4})\right) = 0.344,
\qquad
m\!\left(\chi_{10\mathrm{K}}^{20\mathrm{K}}(\mathrm{SL1}),\, \chi_{10\mathrm{K}}^{20\mathrm{K}}(\mathrm{SL4})\right) = 0.710,
\]

a significant difference. For comparison, a value under 0.3 was generally found to be the upper bound for the correlation between random neurons. That is, the signal propagating from layer 1 to layer 4 in the Slave network is much more faithful than that in the Master network. Thus, with this configuration, we have found a symbiotic relationship between Master and Slave: Master generates the requisite inputs to keep Slave alive, whereas Slave provides a better medium for signal propagation.

Figure 3.12: A & B) Signal propagation through 5 layers for Master and Slave. C & D) A reverberating signal that is experienced in layer 5 of Master, but not in Slave. E & F) The average firing rate of the neurons in each layer for Slave and Master, respectively.

3.7 Finding Master STP Parameters

We have established that different types of STP parameters can generate different desirable network dynamics. In this section we attempt to classify regions of the STP parameter domain with respect to their likelihood of producing the dynamics requisite for a Master-like network. In section 3.4 it was conjectured that the derivative of μ (referred to in this section as dμ) is important to the dynamics induced by STP. In sections 3.8.1 and 3.8.2, an analytical argument is given in support of the conjecture. In [STM07], the authors propose which derivative signs are important for a fixed-point firing rate with respect to their mean field current injection model, but our goal is a bit different: we simply require self-sustained neural dynamics with respect to the spike-based neural networks we have been considering throughout this chapter. The analysis done in section 3.8.1 assumes small changes in network inputs, which will not always hold. Furthermore, the analysis in section 3.8.1 is a first-order approximation to the network dynamics, ignoring the nonlinear dynamical interactions. The higher-order dynamics are of course very difficult to predict, and it is currently unclear whether any such predictions can be made. Also, for our purposes, we are less strict about the fixed-point firing rate, and only require sustained activity at a reasonable rate; to be precise, our criterion is sustained firing for two seconds at a rate between 1 and 50 Hz. For these reasons, we cannot at this point state a hard-and-fast rule, based on analytical estimates, that will always yield a Master-like network. Based on our work in section 3.8.2, we can, however, significantly increase the chances of finding such networks.

First, we restrict the STP parameters as follows. Let U ∈ (0, 1), as it is a probability, and let τ_D, τ_F ∈ [0, 2] (in seconds), with the bounds inspired by experimental data [STM07]. Somewhat arbitrarily, we use r* = 12 Hz as our desired firing rate for setting the value of A_ij in equation (3.5); our only requirement in this choice was that it be a biologically feasible firing rate, consistent with low background activity. We proceed by labeling the synaptic connection types as EE, EI, IE, and II for excitatory-to-excitatory, excitatory-to-inhibitory, inhibitory-to-excitatory, and inhibitory-to-inhibitory, respectively.

Note that because STP has three parameters and each network has four connection types, if we choose the STP parameters at random and independently for the different connection types, we are choosing parameters from a 12-dimensional space, making a brute-force search intractable. Furthermore, as it turns out, a random search is unlikely to yield acceptable parameters, as demonstrated by the first entry in table 3.3.

To explore the parameter space more efficiently, we characterize three different dynamic synapse regimes within the domain space, and explore each regime. For each connection type, we independently consider an STP parameter class of N, Pb, or Pa (giving us 81 different types of STP parameter combinations for the four connection types), where N, Pb, and Pa depend on r* and r_crit and are defined as follows. In section 3.8.2, we derive a critical firing rate r_crit, in equation (3.25), such that for firing rates r below r_crit, dμ is positive, and for firing rates above r_crit, dμ is negative. Given a specific r* and r_crit pair, we consider the following three regimes.

• N (negative): classified by r_crit < 0, resulting in

  – dμ > 0 can never occur, and

  – dμ < 0 for all positive r.

• Pb (positive, threshold below r*): classified by 0 < r_crit < r*, resulting in

  – dμ > 0 when 0 < r < r_crit < r*, and

  – dμ < 0 when r_crit < r.

• Pa (positive, threshold above r*): classified by r* < r_crit, resulting in

  – dμ > 0 for r < r_crit, and

  – dμ < 0 for r* < r_crit < r.

For our analysis of the 12-dimensional STP network parameter space, we define a network type WXYZ, where W, X, Y, Z ∈ {N, Pb, Pa}, corresponding to a network where the EE, EI, IE, and II connections are governed by the W, X, Y, and Z classes of STP parameters, respectively. For each of the 81 network types WXYZ, the STP parameters for each synapse type are selected from the corresponding STP class uniformly at random, with the additional caveat that we require the dynamical synapses to be slow moving: |dμ| < 0.01. We found that this additional requirement improves our experimental success rates. Through random sampling of more than 10^8 parameter sets, it was found that about 18.8% of the STP parameters in the space violated the small-dμ condition. For each network type WXYZ, and with the additionally constrained dμ, we chose 10^6 sets of parameters to test, uniformly at random, as stated above. For each parameter set chosen, we recorded a success if, at the end of 2 seconds, the network was firing at a rate between 1 and 50 Hz. We present the highest percentages of success in table 3.3, along with the results for choosing parameters uniformly at random (the first entry). A sketch of the sampling procedure appears below.
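The following minimal Python sketch classifies a UDF triple via the critical rate of equation (3.25) and the derivative of equation (3.23). The function names, the parameter bounds, and the choice to check the |dμ| constraint at the target rate r* are our own simplifications of the procedure described above, not the exact experimental code.

```python
import math
import random

def r_crit(U, D, F):
    """Critical firing rate, eq. (3.25); negative means always depressive."""
    return -1.0 / F + math.sqrt((1.0 - U) / (U * D * F))

def dmu_dr(U, D, F, r):
    """Derivative of the (unscaled) dynamic synapse, eq. (3.23)."""
    num = U * (F - D * F**2 * U * r**2 - 2 * D * F * U * r - F * U - D * U)
    den = (D * F * U * r**2 + D * U * r + F * U * r + 1.0) ** 2
    return num / den

def classify(U, D, F, r_star=12.0):
    rc = r_crit(U, D, F)
    if rc < 0.0:
        return "N"
    return "Pb" if rc < r_star else "Pa"

def sample_regime(target, r_star=12.0, dmu_max=0.01):
    """Rejection-sample one UDF triple from the regime `target`."""
    while True:
        U = random.uniform(1e-3, 1.0 - 1e-3)
        D = random.uniform(1e-3, 2.0)   # tau_D, seconds
        F = random.uniform(1e-3, 2.0)   # tau_F, seconds
        if (classify(U, D, F, r_star) == target
                and abs(dmu_dr(U, D, F, r_star)) < dmu_max):
            return U, D, F
```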

Though the success rate for the most successful region in table 3.3 is only around 3.2%, it is two orders of magnitude larger than that of searching at random. This provides a preliminary heuristic for which regions of the STP parameter domain to search for parameters that induce self-sustaining spiking activity. From this statistical analysis, it is clear that the ideal parameter region requires that the EE and EI connections have dμ > 0 for firing rates above our target firing rate of 12 Hz. The other two types of synapses appear to be less important.

3.8 Analysis

3.8.1 Analyzing Firing Rate Changes

Here we expand upon the arguments of Sussillo et al. [STM07]. The following argument assumes the steady-state dynamics described in equations (3.1), (3.3) and (3.4). In what follows, we drop the STP parameter subscripts and use D = τ_D and F = τ_F, for convenience.

We consider a network of two populations: excitatory and inhibitory. Let W_mn be the mean synaptic weight from population n to population m, where m, n ∈ {exc, inh} (we occasionally use the notation e = exc and i = inh).

Type        # Successes    % Success
RAND        332            0.0332
PaPaPaPa    31978          3.1978
PaPaPbPb    19728          1.9728
PaPaPbN     17335          1.7335
PaPaNN      12478          1.2478
PaPaPaPb    9459           0.9459
PaPaNPb     5863           0.5863
PaPbPaPa    3353           0.3353
PaPaPbPa    3131           0.3131
PbPbPaPa    2257           0.2257
PaPaPaN     2082           0.2082
PaPbPbPb    1972           0.1972
PaPbPbN     1918           0.1918
PbPbPbPb    1710           0.1710
PbPbPbN     1690           0.1690
PaPbNN      1452           0.1452
PbPbNN      1425           0.1425
PaPbPaPb    1420           0.1420
PbPaPaPa    1332           0.1332
PbPaPbPb    1013           0.1013

Table 3.3: The success rates for finding Master-like STP parameters in various regions of the STP parameter domain. The uniformly-at-random baseline is the first entry; the most prolific regions defined in section 3.7 follow.

Similarly, let μ_mn be the mean dynamic synapse between the populations. We assume that the dynamic synapses are instantly equilibrating functions of the presynaptic firing rates. We assume that the synaptic weights W_mn are chosen to produce a stable network firing rate of r* = (r_e*, r_i*)^T for a constant input of v = (v_e, v_i)^T. Here, r_e and r_i are the average firing rates of the excitatory and inhibitory populations, and v_e and v_i are the external inputs to the excitatory and inhibitory populations. Recall that for r*, we have μ_mn = W_mn, from the choice of the constants A_ij in equation (3.5). Further, we assume that for some decay constant τ_m,

\[
\tau_m \frac{dr_m}{dt} = -r_m + f_m[v_m + \mu_{me}(r_e)\, r_e - \mu_{mi}(r_i)\, r_i], \tag{3.7}
\]

where f_m is some monotonic function, called the rate transfer function. This is an approximation to the network dynamics, commonly referred to as the mean field approximation, where all dynamics are considered in aggregate. For a constant network firing rate r*, equation (3.7) yields

\[
r_m^* = f_m[v_m + \mu_{me}(r_e^*)\, r_e^* - \mu_{mi}(r_i^*)\, r_i^*]. \tag{3.8}
\]

We use perturbation theory to examine the change in firing rate r* + δr = (r_e* + δr_e, r_i* + δr_i)^T for a change in external input v + δv = (v_e + δv_e, v_i + δv_i)^T. With this perturbation, and the instant equilibration of the dynamic synapses, equation (3.8) becomes

\[
r_m^* + \delta r_m = f_m[v_m + \delta v_m + \mu_{me}(r_e^* + \delta r_e)(r_e^* + \delta r_e) - \mu_{mi}(r_i^* + \delta r_i)(r_i^* + \delta r_i)]. \tag{3.9}
\]

For z = F(x), we use the linearization δz ≈ F′(x) · δx. Up to first order in δr, equation (3.9) becomes

\[
\delta r_m \approx \beta_m\left(\delta v_m + W_{me}\,\delta r_e + d_{me}\,\delta r_e\, r_e^* - W_{mi}\,\delta r_i - d_{mi}\,\delta r_i\, r_i^*\right), \tag{3.10}
\]

where d_mn = dμ_mn/dr evaluated at r_n*, β_m = f_m′(v_m + W_me r_e* − W_mi r_i*), and we have used that μ_mn(r_n*) = W_mn. Define

\[
B = \begin{pmatrix} \beta_e & 0 \\ 0 & \beta_i \end{pmatrix}, \qquad
W = B \begin{pmatrix} W_{ee} & -W_{ei} \\ W_{ie} & -W_{ii} \end{pmatrix}, \qquad
D = B \begin{pmatrix} d_{ee} & -d_{ei} \\ d_{ie} & -d_{ii} \end{pmatrix}
\begin{pmatrix} r_e^* & 0 \\ 0 & r_i^* \end{pmatrix}. \tag{3.11}
\]

With this notation, equation (3.10) can be written as

\[
\delta r \approx B\,\delta v + W\,\delta r + D\,\delta r. \tag{3.12}
\]

The solution to this system is

\[
\delta r \approx \left(I - (W + D)\right)^{-1} B\,\delta v, \tag{3.13}
\]

where I is the identity matrix. We proceed with the assumption that the dynamic synapses are almost static (D ≈ 0), and approximate δr with respect to δv up to O(D²).

In the following, we refer to the components of a matrix M with subscripts, as in M_ij. We invert the matrix in equation (3.13), while ignoring higher-order terms from D, and we get

\[
\delta r \approx \frac{1}{\alpha - c}
\begin{pmatrix} 1 - W_{ii} - D_{ii} & W_{ei} + D_{ei} \\ W_{ie} + D_{ie} & 1 - W_{ee} - D_{ee} \end{pmatrix}
\begin{pmatrix} \beta_e & 0 \\ 0 & \beta_i \end{pmatrix} \delta v
= \frac{1}{\alpha - c}
\begin{pmatrix} \beta_e\,\delta v_e\,(1 - W_{ii} - D_{ii}) + \beta_i\,\delta v_i\,(W_{ei} + D_{ei}) \\ \beta_e\,\delta v_e\,(W_{ie} + D_{ie}) + \beta_i\,\delta v_i\,(1 - W_{ee} - D_{ee}) \end{pmatrix}, \tag{3.14}
\]

where we have defined

\[
\alpha = (1 - W_{ee})(1 - W_{ii}) + W_{ei} W_{ie} \tag{3.15}
\]

and

\[
c = D_{ee}(1 - W_{ii}) + D_{ii}(1 - W_{ee}) + W_{ei} D_{ie} + W_{ie} D_{ei}. \tag{3.16}
\]

We now estimate 1/(α − c) under the assumptions that D ≈ 0 and sup D_mn ≪ inf W_mn, which implies that c/α < 1, since every term of c contains an element of D. We get:

\[
\frac{1}{\alpha - c} = \frac{1/\alpha}{1 - c/\alpha} = \frac{1}{\alpha} \sum_{k=0}^{\infty} \left(\frac{c}{\alpha}\right)^{k} = \frac{1}{\alpha}\left(1 + \frac{c}{\alpha}\right) + O(D^2). \tag{3.17}
\]

Combining equations (3.14) and (3.17), we can estimate the change in excitatory firing rate with respect to the change in excitatory input:

\begin{align*}
\frac{\delta r_e}{\delta v_e}
&\approx \frac{\beta_e}{\alpha}\Bigl(1 + \frac{c}{\alpha}\Bigr)(1 - W_{ii} - D_{ii})\\
&= \frac{\beta_e}{\alpha}\Bigl[(1 - W_{ii})\Bigl(1 + \frac{c}{\alpha}\Bigr) - D_{ii}\Bigl(1 + \frac{c}{\alpha}\Bigr)\Bigr]\\
&\approx \frac{\beta_e(1 - W_{ii})}{\alpha}\Bigl(1 + \frac{c}{\alpha} - \frac{D_{ii}}{1 - W_{ii}}\Bigr)\\
&= \frac{\beta_e(1 - W_{ii})}{\alpha}\Bigl(1 + \frac{D_{ee}(1 - W_{ii})}{\alpha} + \frac{D_{ii}(1 - W_{ee})}{\alpha} + \frac{W_{ei}D_{ie}}{\alpha} + \frac{W_{ie}D_{ei}}{\alpha} - \frac{D_{ii}}{1 - W_{ii}}\Bigr)\\
&= \frac{\beta_e(1 - W_{ii})}{\alpha}\Bigl(1 + \frac{D_{ee}(1 - W_{ii})}{\alpha} + \frac{W_{ei}D_{ie}}{\alpha} + \frac{W_{ie}D_{ei}}{\alpha} + \frac{D_{ii}\bigl[(1 - W_{ee})(1 - W_{ii}) - \alpha\bigr]}{\alpha(1 - W_{ii})}\Bigr)\\
&= \frac{\beta_e(1 - W_{ii})}{\alpha}\Bigl(1 + \frac{D_{ee}(1 - W_{ii})}{\alpha} + \frac{W_{ei}D_{ie}}{\alpha} + \frac{W_{ie}D_{ei}}{\alpha} - \frac{D_{ii}W_{ei}W_{ie}}{\alpha(1 - W_{ii})}\Bigr).
\end{align*}

Substituting the values for the matrix components, we arrive at
\[
\frac{\delta r_e}{\delta v_e} \approx \frac{\beta_e(1 + \beta_i W_{ii})}{\alpha}\left(1 + \frac{\beta_e d_{ee} r_e^*(1 + \beta_i W_{ii})}{\alpha} - \frac{\beta_e W_{ei}\,\beta_i d_{ie}\, r_e^*}{\alpha} - \frac{\beta_i W_{ie}\,\beta_e d_{ei}\, r_i^*}{\alpha} - \frac{\beta_i d_{ii} r_i^*\,\beta_e W_{ei}\,\beta_i W_{ie}}{\alpha(1 + \beta_i W_{ii})}\right). \tag{3.18}
\]

Similarly, we get

\[
\frac{\delta r_e}{\delta v_i} \approx -\frac{\beta_i \beta_e W_{ei}}{\alpha}\left(1 + \frac{\beta_e d_{ee} r_e^*(1 + \beta_i W_{ii})}{\alpha} - \frac{\beta_i d_{ii} r_i^*(1 - \beta_e W_{ee})}{\alpha} - \frac{\beta_e W_{ei}\,\beta_i d_{ie}\, r_e^*}{\alpha} + \frac{\beta_e d_{ei} r_i^*(1 + \beta_i W_{ii})(1 - \beta_e W_{ee})}{\alpha\,\beta_e W_{ei}}\right). \tag{3.19}
\]

Observe that B is positive semi-definite by the monotonicity of f. We now prove that α is positive. If the synapses are static (D = 0), then equation (3.13) reduces to

\[
\delta r \approx (I - W)^{-1} B\,\delta v. \tag{3.20}
\]

Observe that α = det(I − W), and we have

\[
\delta r \approx \frac{1}{\alpha}\begin{pmatrix} 1 - W_{ii} & W_{ei} \\ W_{ie} & 1 - W_{ee} \end{pmatrix} B\,\delta v. \tag{3.21}
\]

Substituting in the appropriate values, and solving for δr_e in terms of δv_e, we have

\[
\frac{\delta r_e}{\delta v_e} = \frac{1}{\alpha}\,(1 + \beta_i W_{ii})\,\beta_e. \tag{3.22}
\]

This can also be derived from equation (3.18) when all elements of D are set to zero. Because of the monotonicity of the rate transfer function f, an increase in the excitatory inputs v_e must result in an increase in r_e; thus α > 0.

With both α > 0 and B positive semi-definite, and all of our weights W_mn ∈ [0, 1] (so positive), the amount of change in equations (3.18) and (3.19) is largely determined by the signs of the derivatives d_mn, as explored in section 3.7.

3.8.2 Critical Firing Rate

Given that the dynamics of the network depend on the rates of change of the dynamic synapses, we proceed to characterize the parameters that give the desired characteristics.

As before, let m, n ∈ {exc, inh}. As d_mn = dμ_mn/dr evaluated at r*, we begin by computing dμ/dr from equations (3.1) to (3.4), where we have dropped the star notation and, again, dropped the STP subscripts and used D = τ_D and F = τ_F, for convenience. We get:

\[
\frac{d\mu}{dr} = \frac{U\left(F - DF^2Ur^2 - 2DFUr - FU - DU\right)}{\left(DFUr^2 + DUr + FUr + 1\right)^2}. \tag{3.23}
\]

As U is a probability, and thus positive, the sign of the derivative is completely determined by the expression

\[
F - DF^2Ur^2 - 2DFUr - FU - DU, \tag{3.24}
\]

which is quadratic in r. To find the transition point in equation (3.23), we equate equation (3.24) to zero and solve for r. Recalling that U is confined to (0, 1), we arrive at

\[
r_{\mathrm{crit}} = -\frac{1}{F} + \sqrt{\frac{1 - U}{UDF}}, \tag{3.25}
\]

where we have taken the (potentially) positive branch of the solution. We note that equation (3.24) is dominated by a negative quadratic coefficient, so as the steady-state firing rate r grows large, the synapse governed by equation (3.23) is depressive. Thus, for any set of parameters (U, D, F), commonly referred to as UDF, one of two cases can happen. First, if r_crit, as computed by equation (3.25), is negative, the synapse governed by the UDF parameters will always be depressive. On the other hand, for a positive r_crit, then for a steady-state firing rate r < r_crit, the synapses governed by UDF will be facilitating, whereas for r > r_crit, the UDF synapses will be depressive. In section 3.7, it is with this synaptic characterization that we search the UDF parameter space for STP parameters that give Master-like networks.
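As a quick illustration, consider the Master E → E parameters from table 3.2: U = 0.2, τ_D = 9 ms, and τ_F = 72 ms. With times in seconds, equation (3.25) gives r_crit = −1/0.072 + √(0.8/(0.2 · 0.009 · 0.072)) ≈ −13.9 + 78.6 ≈ 64.7 Hz. Since this is well above r* = 12 Hz, these synapses fall in the Pa class of section 3.7, consistent with the dominance of the Pa-leading regions in table 3.3.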

3.8.3 Assessing Circuit Layer Correlation

The metric that is used in section 3.6.3 uses the inner-product of two spike train vectors, each convolved with a Gaussian and then normalized, to measure the correlation between the two vectors. Note that this correlation measuring technique is also used in chapter 4.

For neurons n and m, we let x = χ_{t₁}^{t₂}(n) and y = χ_{t₁}^{t₂}(m). We define σ = 70 ms and Δ = 9 ms. We found these values to be a good trade-off in the following algorithm.

Let g be the Gaussian distributed vector, centered at t = 0 with standard deviation σ. Then the resultant convolutions x ∗ g and y ∗ g have Gaussian bumps centered at each of the spike occurrences, or ones, in the original vectors x and y, respectively.

Denote the discrete Fourier transform of a vector v by v̂. Then the Hermitian property of the discrete Fourier transform operator, Parseval's identity, and the convolution theorem are used to compute the metric described above:

\begin{align*}
m(x, y) &= \left\langle \frac{x * g}{\lVert x * g \rVert},\; \frac{y * g}{\lVert y * g \rVert} \right\rangle \tag{3.26}\\
&= \left\langle \frac{\widehat{x * g}}{\lVert \widehat{x * g} \rVert},\; \frac{\widehat{y * g}}{\lVert \widehat{y * g} \rVert} \right\rangle \tag{3.27}\\
&= \left\langle \frac{\hat{x} \cdot \hat{g}}{\lVert \hat{x} \cdot \hat{g} \rVert},\; \frac{\hat{y} \cdot \hat{g}}{\lVert \hat{y} \cdot \hat{g} \rVert} \right\rangle, \tag{3.28}
\end{align*}

where the multiplication in equation (3.28) is component-wise, allowing for the efficient computation of m(x, y) for large vectors. Since x, y and g are all positive, m(x, y) has range [0, 1], and larger values indicate a greater correlation between the spike trains.
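The metric lends itself to a compact implementation. The following Python/NumPy sketch (our own illustrative code, equivalent in spirit to equations (3.26)–(3.28)) smooths the binary spike trains with a circular Gaussian convolution computed via the FFT and takes the inner product of the normalized results.

```python
import numpy as np

def spike_train_metric(x, y, sigma=70.0, dt=1.0):
    """m(x, y) for binary spike trains x, y (one bin per dt ms).

    Both trains are convolved (circularly) with a Gaussian of standard
    deviation `sigma` ms, normalized, and dotted together, per eq. (3.26).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    t = (np.arange(n) - n // 2) * dt
    g = np.exp(-0.5 * (t / sigma) ** 2)
    g = np.roll(g, -(n // 2))            # center the kernel at t = 0
    gf = np.fft.rfft(g)
    xs = np.fft.irfft(np.fft.rfft(x) * gf, n)
    ys = np.fft.irfft(np.fft.rfft(y) * gf, n)
    nx, ny = np.linalg.norm(xs), np.linalg.norm(ys)
    if nx == 0.0 or ny == 0.0:           # a silent train has no correlation
        return 0.0
    return float(np.dot(xs / nx, ys / ny))
```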

3.9 Conclusion

In this chapter, we began by finding synaptic weights that generate self-sustained RAIN activity in the network. This alone is a problem worth studying [VA05, KSA08]. We succeeded in finding the correct balance in the spiking domain. We then proceeded to study signal propagation through the RAIN network, as in [VA05]. However, we found that reverberating signals are prominent, degrading the faithfulness of the signal propagation. It is likely possible to tune the network just right so that the reverberating signals are not present; however, such manual tuning is very difficult.

In light of this problem, we considered STP dynamics, building on the work of [STM07], which demonstrates that STP can induce a steady-state firing rate. We found STP parameters that could produce self-sustained RAIN activity, which was actually more fault-tolerant to an improper excitatory/inhibitory balance than static synapses alone. We also found STP parameters that quickly kill networks. In combining networks with both types of dynamics, we proposed a novel coupled Master/Slave network that relies on a symbiotic relationship between the networks, in which Master sets the pace for the coupled system and Slave is leveraged for faithful signal propagation. However, Slave on its own cannot survive.

Finally, we studied the problem of finding STP parameters to induce Master-like network activity. This problem was found to be difficult, yet we were able to find a heuristic, with analytical support, that increases the likelihood of finding such networks by two orders of magnitude over a random search of the parameter space.

More work needs to be done on paring down the STP parameter space, and a deeper analysis of the STP parameters' effects on neural dynamics needs to be conducted. This is a difficult problem, and a general analytical characterization may not be attainable. However, it is likely that some relationship amongst the STP parameters could be found to boost the overall success rate of finding Master-like networks, giving a deeper understanding of the system.

CHAPTER 4

Learning Multiple Signals Through Reinforcement

4.1 Introduction

Reinforcement learning is an approach to trial-and-error learning in which an agent's actions are guided by a class of signals called rewards. Reinforcement learning models are built into agents/systems that can learn from their interaction with their environment. Reinforcement models have an advantage over supervised learning models [RM87] because they obviate the need for a supervisor to provide real-time feedback to the agent. The rewards during reinforcement learning are derived from the environment to provide a sense of value to the agent, guiding learning during agent-environment interactions. Typically the reward appears after the cues and actions that correspond to it; this is known as the distal reward problem [Hul43, Izh07b] and, in the reinforcement learning community, as the credit assignment problem [SB98]. Ultimately, the goal of designing such systems is to produce autonomous, self-programming systems that achieve their goals in a flexible and reliable manner.

Most computational approaches to modeling reinforcement learning have focused on the "temporal difference" algorithm [SB98, HFO10], which computes the expected reward using an explicit account of temporal discounting [SB98]. In this chapter, the focus is on developing a biologically plausible approach to modeling the distal reward problem using spiking neural models. This is because the primary mode of communication between neurons in the brain is encoded in the form of impulses, action potentials, or spikes. This mode of communication enables the brain, composed of billions of neural cells, to consume less than 20 W of power [Len03, AL01]. A solution to the distal reward problem in the spiking domain would thus be a very efficient solution.

This chapter is joint work with Narayan Srinivasa and will appear in Neural Computation.

The reward signaling in the mammalian brain has been linked to the dopamine system [SR90, LAS91]. A model that linked STDP and dopamine signaling was developed in both [Izh07b] and [Flo07], known as reward-modulated STDP (R-STDP). In R-STDP, the synapses are evolved by STDP and modulated by a global reward signal such as dopamine. Despite the success of R-STDP, [FSG10] demonstrated that R-STDP cannot learn multiple reinforcement tasks simultaneously. In this chapter, we extend R-STDP to solve the problem of simultaneously learning multiple distal rewards.

4.2 Distal Reward Problem

In Pavlovian conditioning experiments, an agent learns to associate certain cues with resultant rewards or punishments. This is reinforcement learning because the learning is derived from the reward (or punishment) administered following the cue. In the context of spiking neural networks, the spiking sequence that is associated with either a reward or punishment is referred to hereinafter as an r-pattern. Furthermore, the term reward will be used to mean either reward or punishment, since both can be used in reinforcement learning. Continuing with this terminology, in Pavlovian learning, reward lags the r-pattern by seconds, yet the reward still yields effective learning [Pav27, Hul43, HDB95, Sch98, DA01]. The delay between the r-pattern and reward is precisely the reason reinforcement learning is such a powerful tool: it allows for hindsight evaluation of the agent-environment interactions, which the agent can then incorporate into behavior modification. However, this delay also poses difficult questions. Since reward lags the r-pattern, the r-pattern is no longer present when the reward is available to aid in learning, which, in spiking neural networks, takes the form of synaptic strength modification.

The second observation is that the rest of the network continues to spike during the delay between the r-pattern and the system uptake of reward. Thus, if the reward is truly to enhance the r-pattern, making it more likely to appear in the future, how does the reward "pick out" the particular spiking pattern that induced the reward? For instance, consider the situation where a dog is told to sit. Suppose the dog then performs two nearly simultaneous actions, such as shaking its head and sitting. The dog is then of course given a treat for sitting. However, how does the dog "know" that the action of sitting was rewarded, and not that of shaking its head? The key, of course, is in repetition, but this is on the macroscopic/behavioral level. It is interesting to see the corresponding correlates at the cellular level. This problem of reinforcing a specific r-pattern over other spiking patterns in the network is called the "distal reward problem" [Hul43], or the "credit assignment problem" [Min61, BSA83, HDB95, SB98, DA01, WP05].

As discussed in the introduction, [Flo07] and [Izh07b] solved the distal reward problem for a single r-pattern, in the context of spiking neural networks, using reward-modulated STDP (R-STDP). In this chapter, we extend R-STDP to enable a spiking neural network to learn multiple r-patterns, as outlined in the following sections.

4.3 Methods

We use the LIF neuron model described in section 2.1.2, with the network parameters listed in table 4.1. In this chapter, we build on the STDP plasticity model from section 2.2.1. Here A+ and A− correspond to changes of +0.5% and −0.65% of the maximum synaptic strength possible, respectively, and the time constants τ+ and τ− determine the effective time windows for potentiation and depression, respectively. In most simulations, β = A−τ−/(A+τ+) = 1.3. When changing β in this chapter, A+, τ+ and τ− are fixed while A− is varied. Note that gmax = 15 nS, chosen such that a pre-synaptic spike across a fully potentiated synapse is not strong enough, by itself, to elicit a post-synaptic spike from resting potential.

All Networks
Cm = 200 pF            gleak = 10 nS
Einh = −80 mV          Eexc = 0 mV
Vthresh = −54 mV       Vreset = −60 mV
Erest = −74 mV         fanout = 150
τexc = 5 ms            τinh = 15 ms
A+ = 0.005             A− = 0.0065
τ+ = 20 ms             τ− = 20 ms

Table 4.1: Network parameters used in this chapter.

Section 4.3.1 defines R-STDP, which expands STDP for use in reinforcement learning. In section 4.3.2 a new learning rule is developed, called ARG-STDP, which improves the reinforcement model, enabling the learning of multiple distal rewards.

We also employ STP, described in section 2.2.2, to stabilize network dynamics. When STP is used, we use A_ij = 2.03 (equation (3.5)), τ_D = 50 ms, τ_F = 20 ms, and U = 0.5.

4.3.1 Reward Modulated STDP

Reward modulation can be used in a straightforward way to extend any Hebbian unsupervised learning rule [FSG10]. In this chapter, R-STDP is used, which results from reward modulation of the STDP rule [Flo07, Izh07b]. To extend STDP with reward modulation, a global, extracellular modulator, such as dopamine, is assumed to exist. If R(t) denotes the extracellular dopamine concentration, then the plasticity equations are as in section 2.2.1, except that equation (2.17) is replaced with:

\[
\dot{W}_{ij}(t) = \alpha \cdot R(t) \cdot E_{ij}(t) \tag{4.1}
\]
\[
\dot{E}_{ij}(t) = -\frac{E_{ij}(t)}{\tau_E} + P_{ij}(t)\, X_i(t) - D_{ij}(t)\, X_j(t - \Delta_j), \tag{4.2}
\]

where α controls the learning speed of the system. In all simulations in this chapter, α = 12.

The value τ_E = 1000 ms is the time constant governing the eligibility trace, E_ij, which tracks the potential contributions to the synaptic weight change from the potentiation trace and the depression trace. These are potential weight changes because the weight will not be affected by E_ij unless the system is gated on by the global reward modulator R(t). In general, R(t) can be punishing as well as rewarding, depending on its sign. Likewise, E_ij can be positive or negative, determined by the temporal ordering of the spikes between neuron i and neuron j [Flo07, Izh07b]. The E_ij are initialized to zero and essentially track the underlying STDP rule, with the exception that they are not confined to the interval [0, 1], and thus can be negative when a spike in neuron i precedes a spike in neuron j. Though in general R(t) can be positive or negative, the focus in this chapter will be on positive feedback, R(t) ≥ 0, without any loss of generality.
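For concreteness, a minimal Python sketch of equations (4.1)–(4.2) for a single synapse is shown below, using simple exponentially decaying pre- and post-synaptic traces to stand in for the P_ij and D_ij traces of section 2.2.1. That substitution, the Euler integration, and all variable names are assumptions of this sketch, not the exact simulator code.

```python
def r_stdp_step(W, E, P, D, pre, post, R, dt=1.0,
                tau_E=1000.0, tau_stdp=20.0,
                A_plus=0.005, A_minus=0.0065, alpha=12.0):
    """One Euler step (dt in ms) of R-STDP for a synapse j -> i.

    `pre`/`post` are 0/1 spike indicators for this step; `R` is the
    dopamine concentration (1 at reward delivery times, else 0).
    """
    P += -P * dt / tau_stdp + A_plus * pre     # potentiation trace (pre spikes)
    D += -D * dt / tau_stdp + A_minus * post   # depression trace (post spikes)
    E += -E * dt / tau_E + P * post - D * pre  # eligibility trace, eq. (4.2)
    W = min(max(W + alpha * R * E * dt, 0.0), 1.0)  # eq. (4.1), W in units of gmax
    return W, E, P, D
```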

4.3.2 R-STDP with Attenuated Reward Gating

We assume a mechanism that slowly reduces the amount of "reward" released into the system as the preference for the correct firing sequence becomes stronger. That is, we use a separate attenuating success signal for each of the N reward channels. Specifically, for each signal channel k ∈ [1, …, N], a separate reward predictor, R_k, is introduced. This reward predictor is initially set to zero and slowly tracks towards the reward associated with each successful presentation of signal k. The function

\[
S_k = S(R, k, t) = \begin{cases} R(t) - R_k & \text{if r-pattern } k \text{ induces a reward,} \\ 0 & \text{otherwise,} \end{cases} \tag{4.3}
\]

Figure 4.1: System reward R, reward tracker R_k, and success signal S_k for reward channel k are plotted. The time constant τ_R controls the rate of convergence of R_k → R. The independent axis is discrete and denotes the number of times success signal k is presented. Though the domain is discrete, interpolation is used to emphasize the trend.

is called the success signal, which is broadcast to the system after each reward presentation. The success signal is a measure of the reward prediction error, which converges to zero as R_k → R(t) (see figure 4.1). To enable individualized attention for each task, Σ_k S_k is used in place of R(t) in equation (4.1):

\[
\frac{dW_{ij}}{dt} = \left(\sum_{k=1}^{N} S_k\right) E_{ij}(t). \tag{4.4}
\]

To ensure that the reward prediction error converges to zero individually for each reward channel, we use the following simple update rule:

\[
R_k \leftarrow R_k + \frac{R - R_k}{\tau_R}, \tag{4.5}
\]

where τ_R = 500. The constant τ_R determines the speed of convergence of R_k → R(t). In this case, convergence speed is relative to the number of presentations of success signal k: after n presentations, R_k = R(1 − (1 − 1/τ_R)^n) ≈ R(1 − e^{−n/τ_R}). Thus, for a given τ_R, after τ_R occurrences of signal k, R_k ≈ 0.63 · R, whereas after 2 · τ_R presentations of success signal k, R_k ≈ 0.86 · R.

In order to formally define R(t), let T_k^r be the list of r-pattern presentation times for channel k. For each t_k^r ∈ T_k^r, let d_k^r be a delay uniformly selected from (τ_E/2, …, 3τ_E/2). Let p denote the probability that an r-pattern induces a reward in channel k. With the exception of section 4.6.2, p = 1.0. Finally, let X be a random variable uniformly distributed on (0, 1). Define the indicator function P(X) to be 1 if X < p and zero otherwise. Then R(t) = R(t, X) can be specified as:

\[
R(t) = \sum_{k} \sum_{t_k^r \in T_k^r} \delta\!\left(t - (t_k^r + d_k^r)\right) \cdot P(X), \tag{4.6}
\]

Equation (4.6) implies that R(t) = 1, with probability p, whenever a delayed reward (induced by an r-pattern) is presented to the system, and zero otherwise. This learning algorithm is referred to hereinafter as attenuated-reward-gating of STDP, or ARG-STDP.
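Putting equations (4.3)–(4.6) together, the per-channel reward bookkeeping is only a few lines. The Python sketch below (class and method names are ours) returns the attenuated success signal S_k whenever a delayed reward arrives on channel k; per equation (4.4), the sum of these signals then replaces R(t) in the weight update.

```python
import random

class AttenuatedReward:
    """One reward predictor R_k per channel, eqs. (4.3) and (4.5)."""

    def __init__(self, n_channels, tau_R=500.0, p=1.0):
        self.R_k = [0.0] * n_channels
        self.tau_R = tau_R
        self.p = p                        # reward probability per r-pattern

    def on_reward(self, k, R=1.0):
        """Delayed reward for r-pattern k; returns the success signal S_k."""
        if random.random() >= self.p:     # eq. (4.6): reward delivered w.p. p
            return 0.0
        S_k = R - self.R_k[k]             # eq. (4.3): reward prediction error
        self.R_k[k] += (R - self.R_k[k]) / self.tau_R   # eq. (4.5)
        return S_k
```

Because each R_k climbs toward R, a channel's gating strength decays as its reward becomes fully predicted, which is exactly the attenuation that lets the channels learn without interfering with one another.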

4.4 Single Synapse Reinforcement Experiment

In [Izh07b], a simple problem is introduced to demonstrate the learning of a single distal reward using R-STDP. We describe this experiment here, and then extend it in section 4.5 to address the problem of learning multiple distal rewards.

The experiment setup is as follows. The network consists of 1000 leaky integrate-and-fire neurons (800 excitatory, 200 inhibitory), as in figure 4.2. Network connectivity is defined to be the probability that neuron j is connected to neuron i with a synapse. Unless otherwise indicated, all of the experiments in this chapter use a network connectivity of 1.5%, giving a network fan-out of 15 (each pre-synaptic neuron is connected to 15 post-synaptic neurons, on average). This yields 15,000 total network synapses, on average. The network is stimulated with current, through a Poisson process, to produce Poisson-like spike trains while the average network firing rate is maintained around 1 Hz.

From the network, an excitatory neuron, j1, is selected at random and labeled Pre1. Likewise, one of its post-synaptic neurons, i1, is selected at random and denoted Post1. The synapse from Pre1 to Post1 is denoted Syn1, and has synaptic weight W_{i1 j1}. The weight W_{i1 j1} is set to zero, while the other network synaptic weights are initialized to a constant value of 0.3 (0.8 for inhibitory synapses). In this base experiment, excitatory synapses are modified according to R-STDP, while inhibitory synapses are static. The R-STDP rule depends on a global reward R(t). The following paragraphs describe how R(t) is generated.

Figure 4.2: Network configuration diagram. There are 1000 neurons, with 800 excitatory and 200 inhibitory, and 1.5% network connectivity. The blue arrows indicate excitatory connections, and the red arrows indicate inhibitory connections. In addition, N pre-synaptic neurons are chosen at random and denoted by Pre_k for k ∈ [1, 2, …, N]. For each pre-synaptic neuron Pre_k, a random post-synaptic neuron is chosen from its fan-out pool and denoted by Post_k. The synaptic weight between each Pre_k and Post_k is set to zero, whereas the rest of the synaptic strengths are either set to 0.3 (for excitatory synapses) or 0.8 (for inhibitory synapses). In addition, for each of the neuron pairs k, a separate reward channel is introduced, represented by a VTA_k (ventral tegmental area) neuron that releases a global reward or success signal, represented by the green arrow.
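As a rough illustration of this setup, the following NumPy sketch builds the random connectivity, initializes the weights, and zeroes the N reinforced synapses. All names and the dense-matrix representation are ours; a real simulation would of course also need the LIF dynamics and the Poisson drive.

```python
import numpy as np

rng = np.random.default_rng(0)
N_EXC, N_INH, P_CONN = 800, 200, 0.015
n = N_EXC + N_INH

# conn[i, j] == True means a synapse from neuron j onto neuron i
conn = rng.random((n, n)) < P_CONN
np.fill_diagonal(conn, False)

W = np.zeros((n, n))
W[conn] = 0.3                              # excitatory default (units of gmax)
inh = conn.copy(); inh[:, :N_EXC] = False  # columns N_EXC.. are inhibitory
W[inh] = 0.8                               # inhibitory synapses, held static

# choose N (Pre_k, Post_k) pairs among excitatory neurons and zero them
N_PAIRS = 2
pairs = []
while len(pairs) < N_PAIRS:
    j = int(rng.integers(0, N_EXC))              # Pre_k
    fanout = np.flatnonzero(conn[:N_EXC, j])     # excitatory fan-out pool
    if fanout.size == 0:
        continue
    i = int(rng.choice(fanout))                  # Post_k
    W[i, j] = 0.0
    pairs.append((j, i))
```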

Label a connected neuron pair (j, i) as (Pre, Post), with synapse Syn. Then, in general, if a spike in Pre precedes a spike in Post by a small window, then, due to the Hebbian nature of the eligibility trace, E_ij experiences a sharp increase. If E_ij > 0 after the increase, Syn is eligible for potentiation upon gating from the reward signal. In the opposite case, where a spike in Post precedes a spike in Pre by a small window, E_ij experiences a sharp decrease. If E_ij < 0, then Syn is eligible for depression. Specifically, if Pre = Pre1 spikes at time t, and Post = Post1 spikes at t′, then a coincident (anti-coincident) spike pair is defined as one in which 0 < t′ − t ≤ 10 (0 < t − t′ ≤ 10). That is, the small window must be 10 ms or less. With this terminology, and the fact that each neuron is spiking at 1 Hz, a coincident spike pair is expected about once every 100 seconds, on average. It is the coincident spike pair that we define to be the r-pattern that is to be learned by the network.
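The r-pattern detection and the delayed dopamine release are equally simple to state in code; a brief sketch (names ours) follows.

```python
import random

TAU_E = 1000.0   # eligibility time constant, ms
WINDOW = 10.0    # coincidence window, ms

def coincident(t_pre, t_post):
    """The r-pattern: a pre spike leading a post spike by at most 10 ms."""
    return 0.0 < (t_post - t_pre) <= WINDOW

def dopamine_time(t_pattern):
    """Schedule the DA release a random delay after the r-pattern,
    with d ~ Uniform(tau_E/2, 3*tau_E/2), as in the delays of eq. (4.6)."""
    return t_pattern + random.uniform(0.5 * TAU_E, 1.5 * TAU_E)
```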

To facilitate synaptic potentiation and depression, the network is also complemented by a neuron from the ventral tegmental area [GTJ00], denoted VTA1. The VTA1 neuron's purpose is to release extracellular dopamine (DA) into the system. Upon the occurrence of an r-pattern at time t*, a delay d is selected uniformly at random from (τ_E/2, 3τ_E/2), and the VTA1 neuron is stimulated at time t* + d, inducing a DA release. The evolution of the strength of Syn1 is shown in figure 4.3, as is a conductance histogram of the final network synaptic conductance distribution. By the end of the simulation, Syn1 is fully potentiated, yet the rest of the network synapses are maintained at less than half strength, with many of them depressed very near zero (aside from the inhibitory synaptic conductances, which are not plastic). Stronger depression (A− > A+) skews the final distribution towards zero. Note that, as the parameters were selected so that a pre-synaptic spike across a fully potentiated synapse does not, by itself, cause a post-synaptic spike, the firing rate of Post1 is negligibly affected by learning in this experiment. [Izh07b] explains the success of R-STDP.

4.5 Generalization to Multiple Synapse Learning

In this section, we describe an experiment for learning across multiple distal reward channels.

For k ∈ [1, …, N], we select an excitatory neuron at random and label it Pre_k. For each Pre_k, we select an excitatory neuron, Post_k, from Pre_k's fan-out pool. The synapse connecting Pre_k to Post_k is denoted Syn_k. As above, the initial weight for each of these synapses is set to zero, while the rest of the network synapses are initialized as described previously. The same experiment is performed, but now the system is rewarded for an r-pattern from any of the (Pre_k, Post_k) pairs. The results for N = 2 are shown in figure 4.3. As predicted in [FSG10], the system cannot learn more than one r-pattern with R-STDP. This is due to the bias of the rule: while channel one broadcasts a distal reward, Syn2 is just as often in an anti-correlated state as in a correlated state (at least in the early phases of learning). So, when the delayed reward is presented to the system, Syn2 will be depressed half of the time and potentiated half of the time. However, depression is stronger than potentiation, so this has a detrimental effect on any gains Syn2 may have made. Syn1 faces the same learning obstacles as Syn2, and, as is evident from figure 4.3c, neither of them is stably potentiated.

In an effort to solve this generalized problem, two approaches are considered. First, in section 4.5.1, STP is examined as a solution, because of its stabilizing effects on a network [STM07]. Second, in section 4.5.2, a reward attenuation scheme is adopted, as the following explains: if every successful learning effort undertaken by a system maximizes the extracellular reward, then the average extracellular modulator present in the system will trend upward, making gains with each newly learned task. This is an unrealistic assumption, given the extended lifetime of a system and the many tasks learned in that lifetime. Instead, a mechanism that slowly reduces the amount of "reward" released into the system is assumed. This is a more realistic assumption, and it leads to task-specific reward release, as demonstrated in this chapter. Given individualized attention to each task, it is anticipated that a network will more ably learn multiple tasks. The concept of an attenuated reward signal has been

considered before [FSG10, US09]. In fact, [FSG10] argues that, in order to learn multiple signals, the broadcast reward has to vanish, on average, for each channel individually.

Figure 4.3: Synaptic learning under R-STDP. a) & c) Evolution of the synaptic weight for the 1-synapse and 2-synapse learning experiments, respectively, for a duration of 10,000 seconds. Each color represents a unique synapse. b) & d) Conductance histograms showing the final network conductance distribution for the 1-synapse and 2-synapse learning experiments (in log scale), respectively. Synaptic strength is measured in units of gmax, where 1.0 is fully potentiated. The synapses at 0.8 are inhibitory synapses, which are held static (red).

In the following sections, it is necessary to benchmark the network learning performance. Thus, it is assumed that the network has successfully learned the signals when 90% of the synapses in question have been stably potentiated to at least the middle of the synaptic conductance range.

58 4.5.1 R-STDP with STP Learns Multiple r-Patterns

In this section, STP is employed (see section 2.2.2 for details) to learn multiple r-patterns. Using STP, at least 20 r-patterns can be successfully learned according to the metric in the previous section, as shown in figure 4.4. However, with the chosen parameters, R-STDP augmented with STP could not learn 25 r-patterns.

Figure 4.4: Synaptic learning under R-STDP with STP. a) & c) Evolution of the synaptic weight for the 20-synapse and 25-synapse learning experiments, respectively, for a duration of 100,000 seconds. Each color represents a unique synapse. b) & d) Conductance histogram showing the final network conductance distribution for the 20-synapse and 25-synapse learning experiments (in log scale), respectively. Synaptic strength is measured in units of gmax, where 1.0 is fully potentiated. The synapses at 0.8 are inhibitory synapses, which are held static (red).

The fact that the biased learning rule, R-STDP, could learn more than one r-pattern seems to be at odds with the conclusions of [FSG10], in which it is argued that a biased learning rule cannot learn multiple r-patterns. However, the conclusions of [FSG10] are based on the average synaptic weight change:

\[
\langle \Delta W_{ij} \rangle = \langle S(R)\, E_{ij} \rangle = \mathrm{Cov}(S(R), E_{ij}) + \langle S(R) \rangle \langle E_{ij} \rangle. \tag{4.7}
\]

In [FSG10], equation (4.7) is used to argue that while the covariance term is useful for learning a network-wide r-pattern, the ⟨S(R)⟩⟨E_ij⟩ term must be suppressed, as it detracts from the network's learning of the r-pattern because it ignores correlations between the reward and the stimulus. The authors then argue that learning multiple r-patterns must be done either by an unbiased learning rule, where ⟨E_ij⟩ = 0, or by a system that employs a success signal which vanishes on average. In biology, for a given stimulus, it is realistic that only a small subset of the brain is responsible for the correct reaction. Thus, average weight dynamics that reflect the state of the entire network are not sufficient to capture the effects of local changes due to learning, as evidenced by figure 4.4, which shows 20 r-patterns successfully learned using a biased rule. In contrast to the networks studied in this chapter, the networks studied in [FSG10] were of the non-recurrent, strictly feed-forward type with small network size, and every neuron was partly responsible for producing the correct dynamics. Thus, in [FSG10] the average synaptic change was crucial to successfully learning multiple r-patterns.
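The decomposition in equation (4.7) is just the standard covariance identity ⟨SE⟩ = Cov(S, E) + ⟨S⟩⟨E⟩, which a two-line NumPy check confirms on surrogate data (the arrays below are arbitrary stand-ins, not simulation output):

```python
import numpy as np

rng = np.random.default_rng(1)
S = rng.random(100_000)                      # surrogate success-signal samples
E = 0.5 * S + rng.normal(0.0, 0.1, S.size)   # correlated surrogate eligibility
lhs = np.mean(S * E)
rhs = np.cov(S, E, bias=True)[0, 1] + S.mean() * E.mean()
assert np.isclose(lhs, rhs)                  # eq. (4.7) holds identically
```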

4.5.2 ARG-STDP Learns Multiple r-Patterns

Ideally, the mechanism for gating the success signals would be an online biological mechanism, such as a spiking critic model [PMD09]. For simplicity, we employ an algorithmic approach in this chapter to test the effectiveness of the mechanism. It is assumed that a separate reward channel exists for each r-pattern, gated by the dopaminergic neuron VTA_k for each k. A reward is said to come across channel k if VTA_k sends the global attenuated success signal, S_k, which gates network synaptic plasticity. ARG-STDP and its details are presented in section 4.3.2.

Using ARG-STDP with the parameters selected, it is possible to learn up to 16 r-patterns (see figure 4.5). When learning from 17 distal reward channels, synaptic learning becomes unstable, causing network synapses to be potentiated or depressed haphazardly, including those that have been targeted for learning. The underlying STDP rule employed in this chapter converges to a bimodal weight distribution in the stable learning paradigm [SMA00, RBT00]. While attempting to learn 17 synapses, however, the network conductance histogram is spread across the possible dynamic range, as shown in figure 4.5d, indicating network instability. Figure 4.5c shows the effects of network instability on learning.

4.5.3 STP Stabilizes ARG-STDP Network Learning Dynamics

Figure 4.6c and figure 4.6d show a positive feedback loop resulting from the interaction between the excitatory neuron group, E, and the neurons Post = {Post_k | k ∈ [1, …, N]}. The synapses from Pre to Post are strengthened (where Pre is defined similarly to Post), resulting in an increased average firing rate for the Post group. The increase in firing rate for Post then results in an average synaptic weight gain from E to Post. This further increases the firing rate of the Post group, leading to a positive feedback loop of uncontrolled firing in Post and synaptic potentiation from E to Post. This is enough of a perturbation to disrupt the network dynamics, causing an increase in network firing (in figure 4.6c, the firing rate of the entire network erupts to almost 40 Hz) and unstable learning dynamics, as demonstrated in figure 4.5c and figure 4.6d.

We employ STP (section 2.2.2) to stabilize the network spiking dynamics, regularizing the firing rate. As shown in figure 4.6e, all neuron pools fire at around 1 Hz. STP eliminates the fateful rise in the average firing rate of the Post population seen in figure 4.6c. This in turn eliminates the positive feedback loop that resulted in both unstable firing rates and unstable learning. Figure 4.6f demonstrates improved learning, where only the intended synaptic pool (Pre to Post) is potentiated and the rest of the network synapses remain close to zero, on average.

Figure 4.5: Synaptic learning under ARG-STDP. a) & c) Evolution of the synaptic weights for the 16-synapse and 17-synapse learning experiments, respectively, for a duration of 30,000 seconds. Each color represents a unique synapse. b) & d) Conductance histograms showing the final network conductance distribution for the 16-synapse and 17-synapse learning experiments (in log scale), respectively. Synaptic strength is measured in units of g_max, where 1.0 is fully potentiated. The synapses at 0.8 (red) are inhibitory synapses, which are held static.

In order to ensure that the effective synaptic weights generated by STP are not significantly different from the static weights used during the onset of learning, with respect to short-term dynamics, the effective weights were normalized with respect to steady-state firing at 1 Hz (section 3.4), so that at an average firing rate of 1 Hz the effective weights coincide with the static weights. STP and ARG-STDP were then turned on only after the steady-state dynamics were achieved. This has no qualitative effect on the experiment: the learning was the same in both cases (STP and ARG-STDP on from the beginning of the simulation compared to turning them on after 100 seconds had elapsed).

Figure 4.6: Analysis of average synaptic growth and firing rates. The neuron pools are E, I, Pre, Post, indicating the excitatory, the inhibitory, the Pre_k, and the Post_k neuron pools. a) & c) & e) The average firing rates of each pool of neurons for the 16-synapse, 17-synapse, and 17-synapse with STP learning experiments, respectively. The inset in (c) shows the detrimental rise in the average firing rate of Post. b) & d) & f) The average synaptic strengths between the different neuron groups for the 16-synapse, 17-synapse, and 17-synapse with STP learning experiments, respectively, measured in units of g_max.

Using STP, a network can learn at least 30 r-patterns, but not more than 40 (see figure 4.8), with the particular choice of parameters used. Specifically, τ_R can influence how many distal rewards can be learned. Increasing this value enables a greater number of reward presentations per channel to influence learning, but with the tradeoff of an increased amount of time required to fully learn the r-patterns. One important thing to note is that under the influence of STP, even when the network fails at learning all the synapses, learning remains stable. That is, the network never enters the chaotic state shown in the 17-synapse learning experiment, but instead shares the stable learning dynamics of networks using only R-STDP with STP (compare (c) and (d) from figures 4.4, 4.5 and 4.8).

4.6 Properties of ARG-STDP with STP

In the previous section, it was established that multiple distal rewards can be learned using ARG-STDP augmented by STP. In this section the dynamics of the reward gating framework will be further explored.

4.6.1 Reward Predictive Properties of r-Patterns

So far, it has been shown that multiple synapses can be reinforced in a stable manner without interfering with each other. From the experimental setup, it is clear that the occurrence of the kth r-pattern will be predictive of a reward from the kth reward channel. However, to demonstrate r-pattern independence, it is necessary to demonstrate that the kth r-pattern selectively predicts only rewards from the kth reward channel, and not others. To demonstrate that this is so, a 10-synapse learning experiment was simulated for 100,000 seconds. Though learning converged within the first 10,000 seconds, only the final 10,000 seconds of data were analyzed. For each pair of integers k, ℓ ∈ [1, …, 10], let d(k, ℓ) be a metric measuring the correlation between the temporal occurrences of the kth r-pattern and rewards from the ℓth reward channel, where larger values of d(k, ℓ) indicate a stronger correlation. The specific metric used here is defined in section 4.7.1 (other metrics were also tested and yielded similar results). Independence in r-pattern learning is achieved when d(k, k) ≫ d(k, ℓ) for k ≠ ℓ. Figure 4.9 shows the values of the metric d for each k and ℓ pair. The values of d on the diagonal are significantly stronger than the values off of the diagonal, indicating independence of r-pattern learning, as desired.

Figure 4.7: STP has a stabilizing effect on synaptic learning within the network. a) & b) Depict the evolution of the synaptic weights for a duration of 30,000 seconds and the conductance histogram showing the final network synaptic conductance distribution, respectively, for the 17-synapse learning experiment without STP. c) & d) Depict the evolution of the synaptic weights for a duration of 30,000 seconds and the conductance histogram showing the final network conductance distribution, respectively, for the 17-synapse learning experiment with STP. In (a) and (c), each color represents a unique synapse and the synaptic strengths are measured in units of g_max, where 1.0 is fully potentiated. In (b) and (d), plotted in log scale, the synapses at 0.8 (red) are inhibitory synapses, which are held static.

Figure 4.8: Synaptic learning under ARG-STDP with STP. a) & c) Evolution of the synaptic weights for the 30-synapse and 40-synapse learning experiments, respectively, for a duration of 100,000 seconds. Each color represents a unique synapse. b) & d) Conductance histograms showing the final network conductance distribution for the 30-synapse and 40-synapse learning experiments (in log scale), respectively. Synaptic strength is measured in units of g_max, where 1.0 is fully potentiated. The synapses at 0.8 (red) are inhibitory synapses, which are held static.

4.6.2 Learning Robustness to Reward Release Probability

To test robustness, a probabilistic reward release p is used to control the probability that a reward is released given the occurrence of an r-pattern over any reward channel. The experiments considered thus far were based on p = 100%. We explored the learning behavior as p → 0, and the network's performance was found to be robust to inconsistent reward. In fact, using the learning metric defined in section 4.5, a network can learn 30 r-patterns, but not 40, with p as low as 15%. The specific results of a network with a reduced p value are omitted, as they look very similar to figure 4.8. In figure 4.10 the network's learning capacity is plotted as a function of p.

Figure 4.9: Heat map depicting the values of the correlation d(k, ℓ) between the kth r-patterns and the rewards released from the ℓth reward channel, where k, ℓ ∈ [1, …, 10].

Note that, given this robustness to the reward release probability, reward delays may instead be drawn from another properly chosen distribution, such as an exponential distribution: as long as the delayed reward falls within the bounds prescribed here at least 15% of the time, ARG-STDP with STP will be successful in learning. That is, consider a reward across channel ℓ that falls outside of the delayed interval considered in this chapter, due to a different choice of delay distribution. If Syn_ℓ is still eligible for potentiation, it will be rewarded as usual. If the delay is significantly longer than the potentiation window, Syn_ℓ will not be able to "distinguish" the reward from that of another channel k ≠ ℓ. The effects of the "misplaced" channel-ℓ reward on synapse ℓ will be no more significant than those of a channel-k reward.

Figure 4.10: The network learning capacity is plotted as a function of p. The data points indicate verified learning, whereas the error bars correspond to simulations that were conducted with a granularity of 10 r-patterns. Thus, the error bars are one-sided with a length of 9.

4.6.3 Learning Robustness to Reward Ordering

To test the network's learning dependency on the specific order of rewards, 50 experiments were conducted, each with a different random number generator seed. This produced a different reward schedule for each of the 50 experiments. Using the metric defined in section 4.5, each network successfully learned 30 distal rewards, indicating that the network is robust to the specific ordering of the r-pattern presentations.

4.6.4 Network Scaling

We then studied the influence of network size on the number of r-patterns that can be learned. Recall that the term connectivity refers to the probability that a single pair of neurons is connected by a synapse. On the other hand, fan-out refers to the number of post-synaptic neurons, on average, for a single neuron. With this terminology, the control network (used in all experiments thus far) is one of 1000 neurons, 1.5% connectivity (for a fan-out of 15, with 15,000 total network synapses), and g_max = 15 nS as the maximal conductance value. Two network scaling techniques were considered. First, the network was scaled by adding more neurons while keeping the fan-out constant at 15 (that is, the network connectivity is scaled), which holds the average synaptic input to each neuron constant as the size of the network changes. The second scaling method is one in which the connectivity remains constant at 1.5%, but g_max is scaled, again in order to keep the average synaptic input to each neuron constant. Using both of these techniques, networks of sizes 2,000, 5,000 and 10,000 were simulated. Surprisingly, these networks could learn 30 or more r-patterns, but fewer than 40. The results for these experiments are omitted for brevity, as they look similar to figure 4.8, with the caveat that the conductance histograms contain many more synapses, due to network size; the overall distribution trend, however, looks similar.
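As a concrete illustration of the two scaling schemes, the following standalone sketch computes the scaled parameters; it is our own reconstruction of the bookkeeping, not HRLSim code, and the g_max rule is inferred from the stated requirement that the average synaptic input per neuron stay constant.

#include <cstdio>

int main() {
    const double baseFanOut = 15.0;   // synapses per neuron in the control network
    const double baseConn   = 0.015;  // 1.5% connectivity in the control network
    const double baseGmax   = 15.0;   // nS

    for (double n : {100.0, 2000.0, 5000.0, 10000.0}) {
        // Scheme 1: constant fan-out; connectivity is scaled down as 15/n.
        double conn1 = baseFanOut / n;
        // Scheme 2: constant connectivity; fan-out grows with n, so gmax is
        // scaled so that fan-out * gmax (average input per neuron) is constant.
        double fanOut2 = baseConn * n;
        double gmax2   = baseGmax * baseFanOut / fanOut2;
        std::printf("n=%6.0f | scheme 1: connectivity=%.3f%% | "
                    "scheme 2: fan-out=%.0f, gmax=%.2f nS\n",
                    n, 100.0 * conn1, fanOut2, gmax2);
    }
    return 0;
}

For example, at n = 2000 the first scheme gives 0.75% connectivity, while the second gives a fan-out of 30 with g_max = 7.5 nS; in both cases each neuron still receives the same average synaptic drive as in the control network.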

As increasing the size of the network did not allow for the learning of more distal rewards, it is apparent that network scale has very little to do with the learning capacity of a network under ARG-STDP. To test this claim, the networks were reduced in scale using the same two techniques as above. The same capacity for learning was found, even for sizes as small as 100 neurons, supporting the claim of independence between network size and learning capacity. This claim is further explored in the next section.

4.6.5 The Reward Scheduling Problem

Since network learning capacity (somewhere between N = 30 and N = 40) is largely independent of network size, the hypothesis is that the networks reach a reward-scheduling temporal density threshold that prevents learning more r-patterns. If τ_E = 1 s, global rewards delayed up to two seconds after a coincident (or anti-coincident) spike pair still have an efficacy of 13.5% of the maximum. This is significant, considering that success signals are broadcast with a delay of between 500 ms and 1500 ms, which corresponds to between 61% and 22% efficacy, respectively. For simplicity, assume that a success signal will have a significant influence on synaptic plasticity with respect to spike events occurring two seconds prior to it, and refer to this time interval as a reward-gated interval, or RGI (see figure 4.11). Since each neuron spikes at 1 Hz on average, any pair of connected neurons will have a coincident spike pair once every hundred seconds, on average. Thus, for each channel k, a reward injection is expected approximately once every hundred seconds. Now consider a random anti-coincident spike pair (which also happens once every hundred seconds). The likelihood that it is not in an RGI is (98/100)^N, where N is the number of reward channels (assuming each RGI is two seconds in length). For N = 30, this is about 55%, implying that an anti-coincident spike pair falls within an RGI approximately 45% of the time. Likewise, for N = 40, an anti-coincident spike pair falls into an RGI about 55% of the time. The breakdown in learning happens somewhere between N = 30 and N = 40, where depression due to anti-coincident spike pairs begins to dominate the learning process.
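This estimate is easy to verify numerically; the short standalone sketch below (illustrative only, not part of the simulations) evaluates the probability for several values of N:

#include <cmath>
#include <cstdio>

int main() {
    // Each channel occupies one 2-second RGI per 100 seconds on average, so a
    // random spike pair misses a single channel's RGI with probability 98/100.
    // With N independent channels, it misses all of them with probability
    // (98/100)^N.
    for (int N = 10; N <= 50; N += 10) {
        double p_in_rgi = 1.0 - std::pow(98.0 / 100.0, N);
        std::printf("N = %2d: P(pair falls in some RGI) = %.1f%%\n",
                    N, 100.0 * p_in_rgi);
    }
    return 0;  // N = 30 gives ~45%, N = 40 gives ~55%, as stated above.
}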

To understand this phenomenon in a more formal way, consider a pair of neurons (j, i) for an active reward channel k. The neuron pair is denoted by (Pre_k, Post_k), with the associated synapse Syn_k and weight W_ij. During a simulation in which the network is rewarding coincident spike pairs associated with this neuron pair, consider the time-averaged eligibility trace ⟨E_ij⟩ of Syn_k as a function of the average potentiation trace, ⟨P⟩, and the average depression trace, ⟨D⟩, for Syn_k. For simplicity, ⟨E_ij⟩ = ⟨P⟩ − ⟨D⟩. The reward-modulated gains in ⟨P⟩ are gated by coincident spike pairs for Pre_k and Post_k, whereas the gains in ⟨D⟩ are gated by anti-coincident spike pairs.

Figure 4.11: In ARG-STDP, the reward's effect on the weight gain in a synapse is dependent on the amount of time that passes from the completion of the r-pattern until the presentation of the reward. Here, consider the effects of a reward at time zero on the r1-pattern, which is within the 2-second RGI, and the r2-pattern, which is beyond the RGI. Though the length of the RGI is somewhat arbitrarily picked, its effects are clear, and it gives us a benchmark to compare with across experiments.

Given a coincident spike pair across Syn_k, two sources contribute to ⟨P⟩: the gating of a coincident spike pair for reward channel k, or a coincident spike pair in any other reward channel ℓ ≠ k. Let the average contribution to ⟨P⟩ due to the first source be denoted by a positive constant c_1. Likewise, denote the average contribution to ⟨P⟩ for a spike pair due to any one of the ℓth channels by another positive constant, c_2. Assuming that there are a total of N reward channels, it follows that ⟨P⟩ = c_1 + (N − 1)·c_2, as there are N − 1 independent channels contributing a value of c_2. The value of ⟨D⟩ is similarly estimated as ⟨D⟩ = (N − 1)·βc_2, where β > 1, corresponding to STDP's slight bias towards depression. Unlike the ⟨P⟩ estimate, the ⟨D⟩ estimate does not have any contributions from reward channel k, as no success signal is broadcast for anti-coincident spike pairs. Combining these estimates, the total expected eligibility trace can be expressed as:

\langle E_{ij} \rangle = \langle P \rangle - \langle D \rangle = c_1 + (N - 1) c_2 (1 - \beta) = c_2 (1 - \beta) N + c_3, \qquad (4.8)

where c_3 = c_1 + (β − 1)c_2 is again a positive constant. The striking aspect of this equation is that the eligibility trace has a negative slope with respect to the number of distal reward channels (see figure 4.12). Near the beginning of the simulation, the average eligibility trace of Syn_k and the average broadcast success signal are independent. Thus,

\left\langle \frac{dW_{ij}}{dt} \right\rangle = \langle E_{ij}\, S(t) \rangle = \langle E_{ij} \rangle \langle S(t) \rangle. \qquad (4.9)

As N increases, it becomes less likely that Syn_k is potentiated, since ⟨E_ij⟩ decreases linearly as N grows. See section 4.7.2 for an estimate of c_1 and c_2 based on the constants used in the network and the plasticity models. The results of these computations are in agreement with the simulation results (see figure 4.12).

Figure 4.12: The average eligibility trace, ⟨E_ij⟩, as a function of N, the number of reward channels. Network learning decreases as N becomes large. Several examples with various values of N have been simulated, demonstrating the decreasing learning capacity of a network.

4.6.6 Firing Rate Affects Learning Capacity

In section 4.6.4 it was shown that the network's size has a very limited role in the learning capacity, and section 4.6.5 argued that a dense reward schedule limits learning. In an effort to break the 30-synapse learning barrier, the firing rates of the Pre_k and Post_k neurons were slowed. This reduces the reward injection rate on a per-channel basis. For instance, if the firing rate of these neurons is reduced to 0.5 Hz, then a reward can be expected once every 400 seconds per channel. In this case, the probability that a random anti-coincident spike pair falls within an RGI is 1 − (398/400)^N. From the computations in the previous section, the prediction is that this probability can be around 45% and the synapses can still be potentiated (since this is the probability that corresponds to learning 30 synapses previously). Picking N = 120 gives the desired probability, and the results in figure 4.13 demonstrate successful learning for this choice of N, with respect to the proposed learning metric, in agreement with the hypothesis. For N = 130, the network is not able to successfully learn, which is also in agreement with the hypothesis. Though learning capacity improves using this technique, the time required to learn is greatly increased, since the learning rate has been slowed; this reflects a generally known inverse relationship between learning rate and learning quality. For a formal comparison of this technique's advantage over the control experiment, see figure 4.16 in section 4.7.2, which compares ⟨E_ij⟩ for several experiments.

4.6.7 Eligibility Trace Time Constant Affects Learning Capacity

A second way to reduce the effects of the dense reward broadcast schedule is to reduce the eligibility trace time constant. This shortens each RGI, allowing for more reward channels to be active while maintaining the same reward coverage. For example, in experiments with τ_E = 300 ms, the efficacy of a delayed reward is at 13.5% of the maximum (the same threshold considered in section 4.6.5, where τ_E = 1000 ms) after a 600 ms delay. Thus, the new RGI, corresponding to τ_E = 300 ms, is 600 ms. In order to maintain the same amount of average synaptic modulation, the reward delay must be reduced. When τ_E = 1000 ms, the reward delay was picked uniformly from [500, …, 1500] ms, yielding reward efficacies in the interval (0.223, 0.6). To maintain approximately the same efficacy distribution when τ_E = 300 ms, the reward delay was selected uniformly from [150, …, 450] ms. Thus, rewards must come much more quickly after the coincident spike pairs. Here, the probability that a random anti-coincident spike pair falls in an RGI is 1 − (99.4/100)^N. The value N = 99 makes this approximately 45%. Figure 4.14 shows learning for N = 100. However, as with increasing N in the previous section, for N = 110 the results are not nearly as good. For a formal comparison of this technique's advantage over the control experiment, see figure 4.16 in section 4.7.2, which compares ⟨E_ij⟩ for several experiments.

Figure 4.13: Synaptic learning under ARG-STDP with STP. Here the firing rates of Pre_k and Post_k, for k ∈ [1, …, N], are reduced to 0.5 Hz, down from 1 Hz in previous experiments. a) Evolution of the synaptic weights for the 120-synapse learning experiment, for a duration of 800,000 seconds. Each color represents a unique synapse. b) Conductance histogram showing the final network conductance distribution for the 120-synapse learning experiment (in log scale). Synaptic strength is measured in units of g_max, where 1.0 is fully potentiated. The synapses at 0.8 (red) are inhibitory synapses, which are held static.

Figure 4.14: Synaptic learning under ARG-STDP with STP. Here τ_E = 300 ms, down from 1000 ms in previous experiments. a) Evolution of the synaptic weights for the 100-synapse learning experiment, for a duration of 200,000 seconds. Each color represents a unique synapse. b) Conductance histogram showing the final network conductance distribution for the 100-synapse learning experiment (in log scale). Synaptic strength is measured in units of g_max, where 1.0 is fully potentiated. The synapses at 0.8 (red) are inhibitory synapses, which are held static.

4.6.8 Interval Learning

It is known that learning many overlapping tasks at once is difficult [BSB96]. In this section, we examine interval learning to understand how it can influence the number of distal rewards a network can learn. In these experiments, two or more groups of r-patterns are created. During the experiment, the system partitions the time domain into intervals of a fixed length, called learning intervals. As time progresses from one learning interval to the next, the system alternates which group of r-patterns is rewarded. Group sizes of 30 and 100, and learning intervals of 10K, 30K and 100K seconds, were used. The methods for reducing the global reward schedule density problem were then applied as outlined in sections 4.6.6 and 4.6.7. It was observed that strong depression of synapses causes loss of the synaptic gains once the learning group is changed. This can cause a sawtooth learning pattern, as shown in figure 4.15a. The effect can be reduced by using smaller learning intervals; however, it could not be totally mitigated in this fashion. Reducing the ratio β from 1.3 to 1.1 (that is, reducing the depression strength with respect to the potentiation strength) results in significant improvement in learning. The best results (figure 4.15b and figure 4.15c) rely on the reduced β, as well as on larger group sizes and small learning intervals. [FSG10] also demonstrates successful results with interval learning.

Figure 4.15: Interval learning. Each color represents a unique synapse. a) A sawtooth pattern emerged in some of the simulations. In this case, the following were used: a reduced spiking rate (0.5 Hz) for the Pre_k and Post_k neurons; two synaptic groups, each of size 30, for 60 total synapses; learning intervals of 100,000 seconds; and β = 1.3. This simulation was run for 600,000 seconds. b) In this experiment the following were used: a reduced spiking rate (0.5 Hz) for the Pre_k and Post_k neurons; two synaptic groups of size 100, for 200 total synapses; learning intervals of 10,000 seconds; and β = 1.1. This simulation was run for 1,200,000 seconds. c) In this experiment the following were used: τ_E = 300 ms; two synaptic groups of size 100, for 200 total synapses; learning intervals of 10,000 seconds; and β = 1.1. This simulation was run for 300,000 seconds.

4.7 Analysis

4.7.1 Defining the Correlation Metric

Note that this technique is also used in chapter 3. Let (k, ℓ) ∈ [1, …, 10]², where the first coordinate represents the kth r-pattern and the second coordinate represents the ℓth reward channel. For r-pattern k, let Pre_k and Post_k denote the corresponding pre-synaptic neuron and post-synaptic neuron. Denote the ℓth reward channel by VTA_ℓ.

The temporal resolution of the experiment in section 4.6.1 was 1 ms, and the data analyzed spanned 10,000 seconds. Thus, define spike vectors Pre_k and Post_k to be vectors of length 10,000,000 where, for z ∈ {Pre_k, Post_k}, the tth component of z, z(t), is defined to be one if neuron z spikes at time t, and zero otherwise. Then, define the kth r-pattern occurrence vector v_k, of the same length as above, in the following manner. Let v_k(t) = 1 if both Post_k(t) = 1 and \prod_{i=1}^{10} (1 − Pre_k(t − i)) = 0, and v_k(t) = 0 otherwise. That is, if Post_k spikes at time t and Pre_k spikes within the discrete 10 ms interval [t − 10, …, t − 1], implying an occurrence of the kth r-pattern at time t, then v_k(t) = 1.

The reward injection vector for VTA_ℓ is denoted by w_ℓ, and is the same length as above and defined similarly. Specifically, w_ℓ(t) = 1 if the system receives a reward injection from channel ℓ at time t + 1000. A 1000 ms shift is used to recenter the reward vector, since reward injection lags r-patterns by an average of 1000 ms due to the nature of the distal reward problem.

The metric that is used in section 4.6.1 uses the inner product of the r-pattern vector and the reward vector, each convolved with a Gaussian and then normalized, to measure the correlation between the two vectors. Let g be the Gaussian-distributed vector, centered at t = 0 with σ = 400 ms. Then the resultant convolutions v_k ∗ g and w_ℓ ∗ g have Gaussian bumps centered at each of the occurrences, or ones, in the original vectors v_k and w_ℓ, respectively.

Denote the discrete Fourier transform of a vector x by \hat{x}. Then, the Hermitian property of the discrete Fourier transform operator, Parseval's identity and the convolution theorem are used to compute the metric described above:

d(k, \ell) = \left\langle \frac{v_k * g}{\lVert v_k * g \rVert},\; \frac{w_\ell * g}{\lVert w_\ell * g \rVert} \right\rangle \qquad (4.10)

= \left\langle \frac{\widehat{v_k * g}}{\lVert \widehat{v_k * g} \rVert},\; \frac{\widehat{w_\ell * g}}{\lVert \widehat{w_\ell * g} \rVert} \right\rangle \qquad (4.11)

= \left\langle \frac{\hat{v}_k \cdot \hat{g}}{\lVert \hat{v}_k \cdot \hat{g} \rVert},\; \frac{\hat{w}_\ell \cdot \hat{g}}{\lVert \hat{w}_\ell \cdot \hat{g} \rVert} \right\rangle, \qquad (4.12)

where the multiplication in equation (4.12) is component-wise, allowing for the efficient computation of d(k, ℓ) for large vectors. Since v_k, w_ℓ and g are all positive, d(k, ℓ) has range [0, 1], and larger values indicate a greater correlation between the occurrences of the kth r-pattern and the rewards injected by the ℓth reward channel.
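For readability, a direct (non-FFT) evaluation of the same metric is sketched below. It is our own illustrative implementation, assuming the reward vector has already been recentered by the 1000 ms shift described above; the FFT route of equation (4.12) computes the same quantity far more efficiently for vectors of length 10,000,000.

#include <cmath>
#include <cstddef>
#include <vector>

// Convolve a binary event vector with an (unnormalized) Gaussian kernel.
// Normalization is unnecessary because the metric divides by the norms.
static std::vector<double> smooth(const std::vector<int>& events,
                                  double sigmaMs) {
    const int half = static_cast<int>(4 * sigmaMs);  // truncate at 4 sigma
    std::vector<double> kernel(2 * half + 1);
    for (int i = -half; i <= half; ++i)
        kernel[i + half] = std::exp(-0.5 * (i / sigmaMs) * (i / sigmaMs));

    std::vector<double> out(events.size(), 0.0);
    for (std::size_t t = 0; t < events.size(); ++t)
        if (events[t])  // event vectors are sparse, so convolve event-wise
            for (int i = -half; i <= half; ++i) {
                long u = static_cast<long>(t) + i;
                if (u >= 0 && u < static_cast<long>(out.size()))
                    out[u] += kernel[i + half];
            }
    return out;
}

// d(k, l): inner product of the Gaussian-smoothed, normalized vectors.
double metric(const std::vector<int>& v, const std::vector<int>& w) {
    std::vector<double> a = smooth(v, 400.0), b = smooth(w, 400.0);
    double dot = 0, na = 0, nb = 0;
    for (std::size_t t = 0; t < a.size(); ++t) {
        dot += a[t] * b[t];
        na  += a[t] * a[t];
        nb  += b[t] * b[t];
    }
    return dot / (std::sqrt(na) * std::sqrt(nb));
}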

4.7.2 Computing the Decaying Eligibility Trace

The computation of the constants c_1 and c_2 used in generating figure 4.12 is described below. The notation from section 4.6.5 is used. The parameters that are used in these calculations are A_+; A_−; τ_−; τ_+; β = A_−τ_−/(A_+τ_+) > 1; p, as defined in section 4.6.2; d_1 and d_2, which mark the bounds of the interval from which the delay between an r-pattern and its subsequent reward is uniformly picked; r, the average firing rate of each of the Post_k and Pre_k neurons; χ = 1000/r, the average inter-spike interval of each of the Post_k and Pre_k neurons, which is a more intuitive parameter to use than r; and κ, the coincidence interval, defined such that if Post_k spikes at t and Pre_k spikes at t′, then the spike pair is considered coincident if t − t′ < κ. The following assumptions are also made:

\tau_+ \ll \chi \qquad (4.13)

\tau_- \ll \chi \qquad (4.14)

\tau_E \ll \frac{\chi^2}{\kappa}. \qquad (4.15)

Equations (4.13) and (4.14) reduce STDP to nearest-neighbor STDP, or NN-STDP [MDG08]. More specifically, synaptic plasticity is only significantly influenced by temporally adjacent spikes in the pre-synaptic neuron and post-synaptic neuron. For example, consider post-synaptic spikes at t_1 and t_2, and pre-synaptic spikes at t′_1 and t′_2, with the ordering t_1 < t′_1 < t′_2 < t_2. Then, the spike pair (t_1, t′_1) will contribute to depression and the spike pair (t′_2, t_2) will contribute to potentiation, but the spike pairs (t_1, t′_2) and (t′_1, t_2) will have no significant influence. This is due to the exponentially decaying nature of STDP, coupled with the small firing rate (which implies a large inter-spike interval).

The first task is to estimate the contribution from reward channel k to ⟨P⟩. Assuming Pre_k spikes at time 0, let s denote the time that Post_k spikes, where s belongs to (0, κ) with uniform probability. Given that Pre_k spikes at time 0 and Post_k spikes at time s, the system will be modulated by reward through channel k, with probability p, at time t, where t is picked uniformly from (s + d_1, s + d_2). Then, the contribution to potentiation from channel k is A_+ e^{−s/τ_+} e^{−t/τ_E}. Considering intervals of length χ, one expects an r-pattern to occur with probability κ/χ. Of the intervals in which an r-pattern occurs, a reward is released through channel k with a probability of p. So, the expected contribution over the possible values of s and t is

\frac{\kappa}{\chi} \int_0^{\kappa} \frac{1}{\kappa} \int_{s+d_1}^{s+d_2} \frac{1}{d_2 - d_1} \, p \, A_+ e^{-s/\tau_+} e^{-t/\tau_E} \, dt \, ds, \qquad (4.16)

averaged over χ/κ intervals of length χ. The value of equation (4.16) is denoted by c_1, which is the positive constant:

c_1 := \frac{p A_+ \tau_E^2 \tau_+}{\chi (d_2 - d_1)(\tau_E + \tau_+)} \left( e^{-d_1/\tau_E} - e^{-d_2/\tau_E} \right) \left( 1 - \exp\left( -\frac{\tau_E + \tau_+}{\tau_E \tau_+} \kappa \right) \right). \qquad (4.17)

To estimate the contribution to ⟨P⟩ from a single reward channel ℓ ≠ k assume, as before, that Pre_k spikes at time 0, and Post_k spikes at some time s, which is uniformly distributed on (0, χ). As the probability of an r-pattern occurring in channel ℓ is κ/χ, the expected number of intervals of length χ between rewards is χ/κ. Since each interval is of length χ ms, reward ℓ is presented to the system on average once every χ²/κ ms. So, after Post_k spikes at time s, reward ℓ is expected to be injected into the system at time t, which is drawn uniformly from (0, χ²/κ). Then the contribution to the potentiation from channel ℓ over the possible values of s and t is

p \int_0^{\chi} \frac{1}{\chi} \int_0^{\chi^2/\kappa} \frac{\kappa}{\chi^2} \, A_+ e^{-s/\tau_+} e^{-t/\tau_E} \, dt \, ds, \qquad (4.18)

with the consideration that a reward is only released with probability p upon an r-pattern in channel ℓ. Evaluating this integral yields

\frac{p \kappa A_+ \tau_E \tau_+}{\chi^3} \left( 1 - e^{-\chi^2/(\kappa \tau_E)} \right) \left( 1 - e^{-\chi/\tau_+} \right). \qquad (4.19)

This is a positive constant. Under assumptions (4.13) and (4.15), c_2 is defined as the approximation to equation (4.19):

c_2 := \frac{p \kappa A_+ \tau_E \tau_+}{\chi^3}. \qquad (4.20)

Given N reward channels, the combined contribution to the average potentiation trace is ⟨P⟩ = (N − 1)c_2 + c_1.

Note that in a first-order approximation, ⟨D⟩ has no contributions from channel k, as a reward is only released by channel k if there is a coincident spike pair across Syn_k, in which case the eligibility trace is positive. On the other hand, the contributions to ⟨D⟩ from reward channel ℓ are computed similarly as above, giving

\frac{p \kappa A_- \tau_E \tau_-}{\chi^3} = \beta c_2. \qquad (4.21)

Then the combined contribution to the average depression trace is ⟨D⟩ = (N − 1)βc_2. It follows that ⟨E_ij⟩ = c_2(1 − β)N + c_3, where c_3 := c_1 + c_2(β − 1) is a positive constant.

Figure 4.16 compares ⟨E_ij⟩ for the control experiment (using the parameters used throughout most of this chapter) with the experiments of sections 4.6.6 and 4.6.7, in which the values for r and τ_E are adjusted, respectively.

Figure 4.16: Comparison of the decreasing eligibility traces for the standard experiment and the experiments from sections 4.6.6 and 4.6.7.
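The following standalone sketch evaluates equations (4.17), (4.20) and the resulting linear estimate ⟨E_ij⟩(N) = c_2(1 − β)N + c_3. The STDP amplitude and time constant values below are illustrative placeholders, not the exact constants of the simulations.

#include <cmath>
#include <cstdio>

int main() {
    const double p = 1.0;          // reward release probability
    const double Aplus = 0.005;    // A_+ potentiation amplitude (placeholder)
    const double tauP = 20.0;      // tau_+ in ms (placeholder)
    const double tauE = 1000.0;    // eligibility trace time constant in ms
    const double d1 = 500.0, d2 = 1500.0;  // reward delay bounds in ms
    const double chi = 1000.0;     // mean inter-spike interval in ms (1 Hz)
    const double kappa = 10.0;     // coincidence window in ms (placeholder)
    const double beta = 1.3;       // depression/potentiation bias

    // Equation (4.17):
    const double c1 = p * Aplus * tauE * tauE * tauP
        / (chi * (d2 - d1) * (tauE + tauP))
        * (std::exp(-d1 / tauE) - std::exp(-d2 / tauE))
        * (1.0 - std::exp(-(tauE + tauP) / (tauE * tauP) * kappa));
    // Equation (4.20):
    const double c2 = p * kappa * Aplus * tauE * tauP / (chi * chi * chi);
    const double c3 = c1 + c2 * (beta - 1.0);

    // Equation (4.8): the eligibility trace decreases linearly in N.
    for (int N = 10; N <= 50; N += 10)
        std::printf("N=%2d: <E_ij> ~ %.3e\n", N, c2 * (1.0 - beta) * N + c3);
    return 0;
}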

4.8 Discussion

In this chapter, we showed that R-STDP can be extended to learn multiple distal rewards, using the novel ARG-STDP with STP. In [FSG10], the authors describe a plasticity rule called the R-Max rule [XS04, PTB06, Flo07], which is derived using the principle of reward maximization. This plasticity rule also enables the learning of multiple distal rewards, as shown in [FSG10], yet it is able to do so precisely because the rule lacks an unsupervised bias. Though [Flo07] painstakingly derives the R-Max rule, in simulations R-STDP is used because, as [Flo07] argues, there is no experimentally justified model that incorporates a neuron's firing intensity in the plasticity rule as R-Max does. Because of the lack of experimental evidence for the R-Max rule, and because of R-Max's deficiency in tasks, this chapter focused on expanding the more frequently used R-STDP rule. Expanding on R-STDP, the ARG-STDP rule is able to demonstrate the capacity to learn multiple distal rewards for a single learning problem, yet it maintains an unsupervised bias. Unsupervised bias is an important aspect of unsupervised learning, and thus it should not be eliminated entirely from plasticity rules. For instance, leveraging the unsupervised learning bias, [YWW07] demonstrates the consistency of STDP with cortical reorganization.

It should be noted that both R-STDP and R-Max are phenomenological rules with the structure of a local Hebbian rule modulated by a global reward, so both are plasticity candidates that attempt to fit data curves. However, the data curves are generally measured in vitro, possibly producing physiologically inaccurate results. For instance, plasticity can even depend on a synapse's location within the dendritic tree [LKS06]. It is clear that R-STDP is not sufficient to reproduce all of the non-linear aspects of synaptic plasticity; however, R-STDP is in partial agreement with corticostriatal plasticity, where long-term changes in efficacy require the activation of D1/D5 receptors [PK08]. Furthermore, as argued in [FSG10], ARG-STDP might be favored over R-Max because it is in the spirit of temporal difference learning [SB98] and because ARG-STDP agrees with the widespread interpretation of subcortical dopamine signals as a reward-prediction error [Sch07, Sch10]. On the other hand, [CG10] argue in favor of a voltage-dependent plasticity rule. The models available today have limitations, but the techniques in this chapter offer further insights into solving the multiple distal reward problem, independent of the underlying reinforcement model.

[FUS11] also solves the distal reward problem using a variant of the R-Max algorithm. [FUS11] is interesting in that it uses cascading eligibility traces to learn at various timescales. [FUS11] also demonstrates learning that scales with network size; however, the work shows that learning speeds up with larger networks, rather than demonstrating an increased learning capacity. [FUS11] also argues for a biological basis of multiple eligibility traces. It would be of interest to see whether cascading eligibility traces can be used in tandem with R-STDP or ARG-STDP and STP with similar results, and whether such extensions can learn multiple distal rewards at varying timescales.

Though this work is inspired by previous work [Flo07, Izh07b, FSG10], there are several differences. While [Izh07b] uses the Izhikevich neuronal model, a LIF neuronal model, which has simpler dynamics, is used in this work. Despite the simpler neural dynamics, stable learning of many distal rewards was demonstrated. Also, in contrast to this work, [Izh07b] does not employ exponential conductance decay in its synaptic input model. That is, [Izh07b] uses the conductance summation equation

g_i^{\ell}(t) = \sum_{j \in K_i} W_{ij}(t)\, \delta(t - t_j^{sp}), \qquad (4.22)

in contrast to the biologically verified dynamics employed in equation (2.13).

This work applies similar techniques as in [FSG10] by requiring the success signal across each channel to vanish independent of other channels. However, this work differs from [FSG10] in several ways. Here, an event-driven spiking neural network model with conductance-based LIF neurons is employed; [FSG10] used the SRM0 neuronal model [GK02]. The model presented in this chapter uses STP to learn more distal rewards, and an analysis was provided to predict an upper bound on the number of distal rewards that can be learned with respect to the parameters used in the simulations. Also, the network configuration used in this chapter is different in that a much larger and more general recurrent pool of neurons is used, which consists of both excitatory and inhibitory neurons. Furthermore, a small system embedded within the background network was responsible for the learning task described here, which is biologically more realistic. In [FSG10] a small, excitatory, strictly feed-forward network is used, in its entirety, for learning.

83 The learning setting differs as well. In this chapter learning is continuous and general in that r-patterns may occur at any given time and may overlap. The r-patterns, in turn, induce rewards over any channel at some delayed time drawn uniformly from an interval. Thus, the rewards may occur in a different order than the r-patterns, and may also overlap. This is in contrast to [FSG10], in which the neural network trains via set trials with a specific r-pattern presented during each trial, followed by reward after a fixed amount of time.

The ARG-STDP learning rule augmented with STP in this chapter enables stable learning of many distal rewards, avoiding the potential network instability induced by high firing rates while maintaining a learning bias. ARG-STDP uses a mechanism to attenuate the success signal for each r-pattern individually as a function of the number of times the r-pattern is presented. As R_k is initialized to zero and converges monotonically to R in the discussed experiment, the success signal S = R − R_k is always greater than zero. Thus, the unsupervised bias of R-STDP is not removed, as [FSG10] claims for a similar algorithm. In fact, here learning becomes negligible when R_k ≈ R, which is the same point at which the rule's bias is removed. In contrast, [FSG10] argues that the unsupervised bias of R-STDP is removed when S = R − ⟨R⟩ is broadcast, where ⟨R⟩ is computed in a similar manner to R_k (however, different parameters are used to create a short-memory trial average of the reward). Then the argument is that ⟨S⟩ = ⟨R − ⟨R⟩⟩ = 0, removing the rule's bias.

Future work would involve exploring the mathematical structure of the STP models, seeking to understand the precise characteristics that enable STP-equipped networks to learn multiple distal rewards. Our analysis of the upper bound on the number of distal rewards a network can learn is independent of the STP dynamics, so further investigation into the interactions between reinforcement learning and STP is warranted.

It is also important to realize a spiking-domain model that mimics the algorithmic approach of ARG-STDP. Our work focuses on the benefits of such a model, but does not answer the question of how to achieve the requisite reward prediction from purely spiking components.

CHAPTER 5

HRL Simulator

5.1 Introduction

Computer modeling of spiking neural networks has an important role to play in the understanding of brain function. Neurons communicate primarily via action potentials, or spikes, that are generated by the integration of dendritic synaptic currents caused by spikes from other neurons. There have been several techniques developed to measure neural activity in order to understand brain function, and most approaches are local and limited in their ability to simultaneously measure a large number of neurons [Buz06]. While there have been recent advances in these measurement techniques [WBW09], it is very difficult to simultaneously obtain precise neural data at multiple spatial and temporal scales. The capacity to simulate biologically plausible models of large-scale neural networks in real time offers an alternate solution to improve the understanding of brain function. While these models cannot capture all the details of biology, they operate by abstracting the phenomenology of the various cellular and network functions. By using these models, it is possible to analyze behavioral implications of network dynamics and thus make testable predictions for various network topologies and learning dynamics. These models can also offer a path to develop neuromorphic systems with real-world applications.

There are many difficulties to overcome in the modeling of large-scale neural systems, such as numerically integrating the governing equations for neurons and synapses, as well as efficiently communicating spiking information between neurons. These inherent difficulties can be further compounded by the need for high-performance and real-time simulations. Although the numerical modeling of spiking neural systems appears to be highly parallel, the models generally do not scale linearly with the number of compute elements because of the strong interdependence of the neurons. We present a general framework for large-scale neural simulations that demonstrates higher levels of scaling on general computing architectures than the current state of the art. The simulation environment, named HRLSim, was designed for both parallel Central Processing Unit (CPU) architectures and parallel General Purpose Graphics Processing Unit (GPGPU) cluster computers.

This chapter is joint work with Kirill Minkovich, Corey Thibeault, Aleksey Nogin, Youngkwon Cho and Narayan Srinivasa and has been submitted to IEEE Transactions on Neural Networks.

The motivation for creating a new neural simulator was driven by the need to support the neuromorphic hardware of the SyNAPSE project [SC12]. A key goal of the project is to implement, in a square centimeter of CMOS, 10^6 neurons with 10^10 synapses and an average connectivity of 10^4 synapses per neuron. Recently, as part of this effort, the HRL SyNAPSE team published a compiler for the automatic translation of a given neural architecture into custom neuromorphic hardware [MSC12]. HRLSim was developed to verify the functional performance of large-scale spiking models. The verified spiking models are then ported onto hardware using the neuromorphic compiler.

5.1.1 GPGPU Programming with CUDA

GPUs have a large number of single-instruction multiple-data (SIMD) processors capable of efficiently processing huge amounts of data in parallel. In addition, the cost of creating clusters of GPUs is considerably lower than that of CPU-based supercomputers capable of comparable performance [FQK04]. It is no surprise that they are being exploited for general-purpose computing.

In the computational neuroscience community there have been a large number of projects focused on single or dual GPUs localized to a single compute node. However, there are no GPU-cluster based neural simulation environments openly available for studying large-scale neural models.

5.1.2 Spiking Neural Simulators

For many researchers the choice of neural simulation environment can be tricky due to a trade-off between biological realism and computational complexity. If access to high-performance computing resources is unavailable, large-scale modeling may be out of reach. In addition, the time investment required for installing and learning a new simulator is a hindrance. With the relatively low cost of GPGPU computing, more researchers now have the ability to explore large-scale models. However, the difficulty of adopting a new simulation environment remains.

There are a number of general CPU-based simulators that support large-scale neural models. NEURON [HC08, HC97] and GENESIS [BBH02, BB98] are two of the most popular. Both offer CPU versions for single and distributed computer environments. Other simulators include NEST [GD07], NCS [WGH01, Wil01], and PCSIM [PNS09] (the parallel successor of CSIM [BRC07]). Each of these provides a parallel CPU implementation that is well-suited for many distributed environments. However, they do not yet offer a GPU-compatible version that can take advantage of large-scale GPGPU clusters. The lack of GPU support for neural simulation resulted in a number of projects focused on creating general environments specific to GPU implementations. Nageswaran et al. [NDK09] developed a single-GPU spiking neural simulator with a C++ user interface for creating networks. An enhanced version of this single-GPU simulator was described in [RND11]. However, both target a single GPU. Thibeault et al. [THH11] presented a proof-of-concept simulator that targeted multiple GPUs within a single computer. They used a method for distributing the simulations on multiple nodes, which was based on a novel spike message passing scheme that represented the neuron states using individual bits. Izhikevich neurons [Izh03] were supported, along with features such as conductance-based synapses and STDP-based plasticity.

Similarly, a number of projects have resulted in model-specific GPU implementations. Scorcioni et al. [Sco10] presented a single-GPU simulator capable of modeling 100,000 Izhikevich neurons, with a fan-out of 100 randomly connected STDP synapses, in real time. Tiesel et al. [TM09] created a single planar network of Integrate-and-Fire (I&F) neurons using the OpenGL graphics Application Programming Interface (API). Along the same lines, the work of Taha et al. [HT10] presented a two-layer input-output network specific to image recognition. Igarashi et al. [ISF11] developed a heterogeneous model of action selection in the basal ganglia. Two different neuron types are simulated in the model: Izhikevich neurons for the striatum, and Leaky Integrate-and-Fire (LIF) neurons with Hodgkin-Huxley channels for the other areas. The simulation was executed on a single CPU-GPU combination in real time. Richmond et al. [RBG11] presented a model with two layers of Integrate-and-Fire neurons with recurrent connections. The resulting code demonstrated a speed-up as high as 42x over a comparable Python implementation. In this case the parallelism of the GPU was exploited for parametric optimization. Fidjeland et al. [FS10] presented results for a single-GPU simulation similar to the work of [NDK09]. The system could not simulate as many neurons in real time but did demonstrate higher throughput, defined as spike arrivals per second.

Yudanov et al. [YSM10] presented a single-GPU simulation of Izhikevich neurons integrated using an adaptive Parker-Sochacki method. Emphasis was placed on sub-millisecond event tracking and accuracy between CPU and GPU implementations. A speedup of 9x was achieved over a comparable CPU implementation. Nere et al. [NHL11] presented an extension to a learning model of the mammalian neocortex. The simulation abstracted the neural activity up to the level of neocortical minicolumns using a rate-based model. Synaptic plasticity is only applied to active columns and follows a Hebbian learning rule, where the weight matrix between columns is increased if the input is active and decreased if it is not. Simulations were distributed between a single CPU and either one or two GPU cards. The resulting implementation demonstrated a 60x speedup over the single-threaded implementation. De Camargo et al. [DRS11] created a GPU simulation of multi-compartment conductance-based neurons. The test network was composed of excitatory pyramidal cells and inhibitory cells. Each neuron contained two channel conductances modeled using Hodgkin-Huxley dynamics. Variations in the number of connections, weights and neural activity were explored, resulting in a speedup of 40x in some cases over the serial CPU implementation.

In addition to presenting the first GPU-cluster based spiking neural simulator, this chapter presents several novel concepts in neural simulation that increase both performance and scalability. A description of the modeling and programming elements of HRLSim is provided in section 5.2. The design details are presented in section 5.3. Benchmark results for the performance of HRLSim are summarized in section 5.4. In section 5.5, possible improvements to HRLSim are discussed.

5.2 Simulator Description

5.2.0.1 Neural Dynamics

HRLSim currently supports LIF (section 2.1.2) neurons and Izhikevich (section 2.1.3) neurons.

5.2.0.2 Synaptic Plasticity

HRLSim currently supports STDP (section 2.2.1), weighted STDP [RBT00] (not discussed in this dissertation), R-STDP (section 4.3.1), ARG-STDP (section 4.3.2), STP (section 2.2.2), and homeostatic plasticity [Tur08] (not discussed in this dissertation).

5.2.0.3 Transmission Delays

HRLSim currently supports four different transmission delays: 2 ms, 6 ms, 11 ms, and 21 ms. This choice was made to be compatible with the neuromorphic hardware being developed under the SyNAPSE project, but can be readily extended to cover more fine-grained delays if needed. The costs of extending to more transmission delays are additional memory and a slight loss of speed. Simulation speed is slightly reduced because the simulator must check which synapses belong to each transmission delay, perform extra memory management, and decay a larger number of variables (however, variable decay has been highly optimized, as discussed later). The memory cost depends on the maximum delay and the total number of possible delays: spike trains are held in memory until the maximum transmission delay is reached, and the memory required for the plasticity updates increases. Notice that in equation (2.19), D_ij does not depend on neuron j, and thus only D_i = D_ij needs to be stored. On the other hand, in equation (2.18), P_ij depends on the specific delay ∆_ij. So, for the potentiation trace, a distinct copy of the parameter is required for each transmission delay associated with neuron i. This is not overly prohibitive, and HRLSim can easily be configured through preprocessor directives to cope with more transmission delays.
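The bookkeeping described above can be summarized in a minimal sketch; this is our own illustration of the storage layout implied by equations (2.18)-(2.19), not HRLSim source.

#include <array>
#include <cmath>

// Illustrative storage layout only (not HRLSim source). Each neuron i
// carries one depression trace D_i, shared by all of its synapses, but one
// potentiation trace per supported transmission delay, since P depends on
// the synapse-specific delay Delta_ij.
constexpr int kNumDelays = 4;
constexpr std::array<int, kNumDelays> kDelaysMs = {2, 6, 11, 21};

struct PlasticityTraces {
    float d = 0.0f;                        // D_i: one copy per neuron
    std::array<float, kNumDelays> p = {};  // P_i[delay]: one copy per delay

    // Exponential decay of all traces over one time step of dtMs.
    void decay(float dtMs, float tauMinus, float tauPlus) {
        d *= std::exp(-dtMs / tauMinus);
        for (float& pd : p) pd *= std::exp(-dtMs / tauPlus);
    }
};

Adding more delays grows the p array (and the buffered spike history), which is exactly the memory and decay-work cost described above.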

5.2.1 User Network Model Description

5.2.1.1 Model Development (C++/PyNN like Interface)

The simulator consists of two main parts: the neuron simulator and the user experiments. The user experiments are further divided into two parts: an .exp text file and the C++ code. The .exp file is used to provide constants to the C++ file and to select simulator options (such as using a LIF or Izhikevich neuron model). The simplest .exp file provides the name of the C++ file and of the base class for the user experiment. More sophisticated experiment designs can leverage the .exp file to efficiently perform parameter sweeps, and can use it to specify user-defined preprocessor flags for creating highly generalized user experiments without touching any C++ code. The network-building parts are described in a PyNN-style [DBE09] C++ API, which enables them to be compiled into the simulator. Each user-defined experiment can provide three types of functions: functions to build the network, functions to inject stimuli into the network, and functions to collect statistics about the simulation. These three modules provide a good balance of the functionality and complexity needed for simulations and on-line analysis. The simulator interfaces with the user-defined experiment through a standardized class interface.

5.2.1.2 C++ Preprocessor and Code Generation

The use of C++ in the user experiment allows for faster network building, input generation, and on-line analysis, all three of which must interface with the main simulator. On the side of the simulator, the C++ compiler allows certain optimizations to be performed, such as loop unrolling, pre-calculation of constants, and removal of unused simulator code. This adapts the simulator to the exact network that is being simulated. Using preprocessor flags, which can be set in the .exp file, a highly optimized simulation executable is generated for the specific experiment through the removal of conditional statements and unnecessary computations. These optimizations make a large difference in GPU performance, since memory look-ups and branching executions are a common performance bottleneck. The only downside is that with each change to the network the simulator has to be recompiled, for which the OMake [NH06] build system is used. Consecutive build times are negligible, since only the parts that were affected have to be recompiled. Once the binary is built, constructing and starting the simulation for a network with 100K neurons and 10M synapses can be performed in less than 4 seconds.

5.2.1.3 Sample User Experiment

Listing 5.1 shows an example of the C++ API that is used to create a simple 80% excitatory and 20% inhibitory network with 1000 neurons and an average fan-out of 100. Lines 1-10 set up the class definition to allow storage of population data and provide functions which the simulator can use to generate additional inputs. In the constructor (lines 12-25), lines 13-15 generate three populations, lines 17-21 define the synaptic weights and connections between the excitatory and inhibitory populations, and lines 23-24 define the synaptic weights and connections from the dummy neurons. The dummy neurons are used for providing inputs into the network from external sources. Lines 27-30 provide the simulator with a method for generating inputs, where line 28 sets the input duration in ms (100 in this example) and line 29 injects a single spike into one of the dummy neurons. Because of line 28, the fill_spikes function is called after every 100 ms to generate the next set of inputs to the network. Listing 5.2 is an abbreviated public signature of the most relevant components used in defining an experiment. For brevity, the container classes for synapses and neurons (SynapseKind, DummyKind and NeuronKind) have been omitted, although they just contain the obvious defining attributes, such as inhibitory/excitatory, plastic/static, synaptic weight, Izhikevich parameters (to define the type of neuron behavior), etc. The container classes also have member functions to set these values, such as SetInhibitory() and SetFixedOut(), which make the neuron inhibitory and make the output synapses a fixed strength, respectively.

5.2.2 Input

There are two ways to provide inputs to the simulator: current injection or spike injection. Spike injection is performed through dummy input neurons, which are in turn connected to a set of regular neurons. This allows a single input to be routed to many different neurons. Current injection, on the other hand, is fed directly into the neuron as a floating-point addition to the voltage. The single-threaded version of the simulator supports both spike and current injection, while the distributed version of the simulator only supports spike injection. This restriction is due to two factors. First, to maximize performance, the amount of communication had to be reduced. Since current injection uses floating-point precision, it would greatly increase the communication requirements in comparison to a spike injection, which can be modeled by a single bit. Second, it would have also required an additional type of message to be exchanged between GPUs, thereby further slowing down the communication.

 1 class SimpleExperiment : public UserExperiment {
 2 public:
 3   SimpleExperiment(BuildNetwork &buildNet);
 4   void fill_spikes(InputGen &input_gen);
 5
 6   // network information
 7   NeuronPopulation E;  // excitatory
 8   NeuronPopulation I;  // inhibitory
 9   DummyPopulation D;   // dummy (inputs)
10 };
11
12 SimpleExperiment::SimpleExperiment(BuildNetwork &buildNet) {
13   E = buildNet.NewPopulation(800, NeuronKind());
14   I = buildNet.NewPopulation(200, NeuronKind().SetInhibitory().SetFixedOut());
15   D = buildNet.NewDummyPopulation(10, DummyKind().SetFixedOut());
16
17   UniformGen weight(0.49, 0.51);
18   E->ConnectionsFixedPostNumber(E, 40, weight);
19   E->ConnectionsFixedPostNumber(I, 10, weight);
20   I->ConnectionsFixedPostNumber(E, 40, weight);
21   I->ConnectionsFixedPostNumber(I, 10, weight);
22
23   ConstFloatGen inp_weight(6.0 / G_MAX);
24   D->ConnectionsFixedPostNumber(E, 1, inp_weight);
25 }
26
27 void SimpleExperiment::fill_spikes(InputGen &input_gen) {
28   input_gen.set_spike_buf_size(100);
29   input_gen.add_spike(0, D, rand() % 10);
30 }

Listing 5.1: C++ source code to generate a simple 80/20 excitatory/inhibitory network with 1000 neurons and an average fan-out of 100.

Lifting this restriction is possible, in general, but at a significant performance hit. Because HRLSim was developed to support spike-based neuromorphic hardware, and because of the performance cost of communicating current injection, the decision was made to support only spike injection on the multi-threaded platform.

Despite only having spike inputs available to multi-threaded simulations, integrating virtual environments, such as CASTLE [PL08] and WEBOTS [Mic98], with the simulator through the UserExperiment module is straightforward. The only requirements for HRLSim to interface with an environment are that the environment models stimuli as spike trains, which can be fed into the user module UserExperiment::fill_spikes, and that the environment can receive and interpret spikes, accessible from the Statistics class within the UserExperiment::print_extra method, as motor commands.

class UserExperiment {
public:  // user generated:
    virtual void print_extra(Statistics &stats) {}     // stats
    virtual void fill_spikes(InputGen &inputs) {}      // input spikes
    virtual void fill_currents(InputGen &inputs) {}    // input currents
};

class BuildNetwork {
public:
    // add new populations to the network
    NeuronPopulation NewPopulation(int size, NeuronKind &neuron);
    DummyPopulation NewDummyPopulation(int size, DummyKind &neuron);
};

// Parent to DummyPopulation/NeuronPopulation
class Population {
public:
    // 1-to-1 connections
    Connections1to1(NeuronPopulation &to, SynapseKind &s);
    // Uniformly random connections
    ConnectionsRandom(NeuronPopulation &to, float prob, SynapseKind &s);
    // Fixed number (n) of post-synapses
    ConnectionsFixedPostNumber(NeuronPopulation &to, float n, SynapseKind &s);
    // Create connections from a weight matrix
    ConnectionsUsingWeightTable(NeuronPopulation &to, float **weights);
};

class Statistics {
public:
    void print_stats();  // print stats
    // UserExperiment::print_extra can access each of the following
    // recent stats for population p, neuron n or synapse s
    float get_rate(Population &p);     // firing rate
    int get_sp(Population &p, int n);  // num spikes
    float get_wt(SynapseId &s);        // syn weight
};

class InputGen {
public:
    // Resize the vector to t empty trains:
    void set_spike_buf_size(int t);    // spikes
    void set_current_buf_size(int t);  // currents
    // add spike/curr to neuron n of pop p at time t
    void add_spike(int t, DummyPopulation &p, int n);
    void add_current(int t, NeuronPopulation &p, int n, float curr);
    // Poisson spike train for neuron n of pop p from time t0 to tf
    int addExpSpikes(DummyPopulation &p, int n, float mean, int t0, int tf);
};

Listing 5.2: C++ visible class signatures used in constructing an experiment.

Figure 5.1: The simulator modules of HRLSim and the interactions between them.

The statistics and input generation modules can both be extended by the user (see figure 5.1), allowing for easy integration with any virtual environment that fits these criteria. Furthermore, the neural models implemented on HRLSim have successfully used spike injection (rather than current injection) to drive activity.
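To make this concrete, the following sketch shows how such glue code might look. The Environment class and its active_sensors/actuate methods are hypothetical placeholders for an external simulator; the UserExperiment, InputGen, and Statistics calls follow listing 5.2.

// A hedged sketch of HRLSim/environment glue; Environment is hypothetical.
class RobotExperiment : public UserExperiment {
public:
  DummyPopulation  Sensors;  // stimulus spike sources
  NeuronPopulation Motors;   // read out as motor commands
  Environment      env;      // hypothetical external simulator

  void fill_spikes(InputGen &inputs) {
    inputs.set_spike_buf_size(100);
    // Encode each active sensor as a spike at the current time step,
    // one dummy neuron per sensor (active_sensors() is hypothetical).
    for (int n : env.active_sensors())
      inputs.add_spike(0, Sensors, n);
  }

  void print_extra(Statistics &stats) {
    // Interpret recent spike counts of the motor population as commands
    // (actuate() is a hypothetical environment helper).
    for (int n = 0; n < 10; ++n)
      if (stats.get_sp(Motors, n) > 0)
        env.actuate(n);
  }
};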

5.2.3 Analysis

Analysis of a simulation can be performed either while the simulation is running or after it has completed. Analyzing the model during the simulation lets users monitor and, to some extent, refine the model.

5.2.3.1 On-line Analysis

The simulator provides the user with hooks for on-line spike analysis of the running simulation. At the end of each (user-defined) statistics printing interval, users can access the total number of spikes that occurred during the interval, for each neuron, and the synaptic weight between any two neurons at the end of the interval (see listing 5.2). On-line analysis is accomplished through either the default statistics module or a custom, user-extended statistics module.
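For example, a minimal user-extended hook might look like the following sketch, which assumes the accessors of listing 5.2; the experiment and member names are illustrative.

#include <cstdio>

// A minimal sketch of an on-line analysis hook: print the excitatory
// population's recent firing rate and one tracked synaptic weight at
// the end of every statistics interval.
class MonitoredExperiment : public UserExperiment {
public:
  NeuronPopulation E;   // excitatory population
  SynapseId watched;    // some synapse chosen at build time

  void print_extra(Statistics &stats) {
    std::printf("E rate: %f Hz, watched weight: %f\n",
                stats.get_rate(E), stats.get_wt(watched));
  }
};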

5.2.3.2 Off-Line Analysis

Often, with larger models, the statistics take a considerable amount of time to calculate. In addition, visualization offers better insight into the neural model. An off-line library is provided for efficient analysis of the neural and synaptic outputs of a completed simulation. The library is written in C++ for higher performance but is accessed from Python through the Boost.Python framework [Pyt]. This combines the performance of C++ with the ease of visualization and manipulation that Python provides.

The finished simulation results are analyzed by first specifying the binary data files to read (a complete spike history, synaptic weights at user-defined intervals, and the network configuration) and the neuron population, with corresponding indices, to extract. The analysis object constructor is called from Python but instantiated in C++ for speed. The created object can then be used to gather statistics, with the results stored in C++ STL vectors but treated in Python as list objects. These can then be further manipulated or plotted using available Python utilities.

5.3 Simulator Design

5.3.1 Modular Design

Figure 5.1 shows the complete breakdown of all the simulation modules, with the user-defined modules shown in a lighter color.

Figure 5.2: Flow charts showing how the communication thread is parallelized with the computation thread.

A simulation starts by creating the main Process object, which in turn builds a Statistics module for logging spikes, printing simulation status, and performing user-specified calculations; an Input Generation module for user-provided inputs; and a Master Compute module for building the network, splitting up the network, and performing the simulation. The Master Compute module uses the BuildNetwork module (which provides the user with APIs to construct the network) and the user experiment to build the Network and State modules. The Network module is the immutable portion of the network state, such as connectivity and parameters, whereas the State module is the mutable portion, such as weights and voltages. This distinction allows for efficient reporting, as the Network module is only recorded once, at the beginning of the simulation,

and the State module is recorded at the end of every stats interval. After generating the network, the Master Compute module then splits the network, building a set of Communication modules (for performing the message passing interface (MPI) communication) and Slave Compute modules (for performing the actual simulation).

Once the simulation is initialized, the Master Compute module is responsible for providing the inputs and parsing the outputs of the neural network by running the user-provided analysis and input generation code. The Slave Compute modules are responsible for performing the simulation. This division of labor leaves the Master Compute module with enough computational resources to perform user-defined tasks. Throughout the simulation, the Master Compute module continuously writes out three files: network state, network spikes, and synaptic weights. This information is only written at user-specified intervals, giving fine control over how much time is dedicated to saving state. All information is passed to the Master Compute node, which can then write the requisite files while the slave nodes continue with the simulation. This allows the simulator to resume from a saved state and also enables off-line analysis. Each of these individual features (spike analysis, weight analysis, state-saving) can be toggled on or off by the user in order to speed up simulations that do not require the data.

5.3.2 Parallelizing Simulation/Communication

To achieve the best performance on the slave nodes, the computation has to be parallelized as much as possible with the communication. Figure 5.2 shows how the execution of the Computation module and the Communication module are interleaved. The three gray boxes represent the three parallelized threads: computation, transmitting, and receiving. The blue box contains the serialized tasks. When a task can be parallelized, it is branched off and executed in one of the threads, with the dotted line denoting the parallelized task's execution. We define pre-synaptic updates to be the simulation updates that happen upon the arrival of a pre-synaptic action potential. Likewise, post-synaptic updates are those that happen following a post-synaptic action potential. With this terminology, the following steps (with

corresponding labels in figure 5.2) are used to maximize the amount of parallelization:

1. The receive thread starts receiving incoming spikes for iteration k

2. The computation thread waits for the post-synaptic updates from iteration k − 1 to finish

3. Integration is performed to generate the outgoing spikes

4. The transmit thread starts transmitting iteration k’s outgoing spikes

5. The pre-synaptic updates are computed

6. The post-synaptic updates are computed for iteration k (which are used by the next iteration)

7. The receive and transmit threads wait for incoming spikes to be received and the outgoing spikes to be sent, synchronizing communication

HRLSim overlaps communication and computation as much as possible. The communication threads are given maximal time to finish, hiding communication latency behind computation.
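A minimal single-peer sketch of steps 1-7 using non-blocking MPI is shown below; integrate, pre_updates, and post_updates are hypothetical stand-ins for the simulation kernels, and the thread handoffs of steps (2) and (6) are collapsed into straight-line code.

#include <mpi.h>

enum { IN_MAX = 4096, OUT_MAX = 4096 };
static int in_buf[IN_MAX], out_buf[OUT_MAX];

int integrate(int *out_spikes);  // hypothetical: integrate neurons, emit spikes
void pre_updates();              // hypothetical: pre-synaptic updates
void post_updates();             // hypothetical: post-synaptic updates

void simulate_iteration(int k, int peer) {
  MPI_Request recv_req, send_req;
  // (1) start receiving iteration k's incoming spikes in the background
  MPI_Irecv(in_buf, IN_MAX, MPI_INT, peer, k, MPI_COMM_WORLD, &recv_req);
  // (2) the threaded simulator waits here for iteration k-1's
  //     post-synaptic updates to finish
  // (3) integrate the neurons, producing this iteration's outgoing spikes
  int n_out = integrate(out_buf);
  // (4) start transmitting iteration k's outgoing spikes
  MPI_Isend(out_buf, n_out, MPI_INT, peer, k, MPI_COMM_WORLD, &send_req);
  // (5) and (6): synaptic updates proceed while messages are in flight
  pre_updates();
  post_updates();
  // (7) synchronize communication before the next iteration begins
  MPI_Wait(&recv_req, MPI_STATUS_IGNORE);
  MPI_Wait(&send_req, MPI_STATUS_IGNORE);
}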

5.3.3 MPI Communication

Efficiently passing spike messages in a neural simulation environment is a topic of interest to a number of simulation projects [MMG05, PEM07, MCL06]. In many of these simulation environments it is argued that the majority of time is spent in processing rather than communication, so optimizing the latter offers little to no performance benefit [MMG05]. Although this argument is valid for traditional distributed architectures, it does not hold when utilizing high-performance GPGPUs, which have a much greater computational throughput. Thus, to differentiate HRLSim's communication scheme, three of its main aspects will be presented: dummy neurons, message packing, and message passing. Together, these techniques allow for a significant increase in speed.

Figure 5.3: An example showing how dummy neurons can be used to simplify message passing.

5.3.3.1 Dummy Neurons

Dummy neurons operate like spike relays, simply passing on any incoming spikes as outputs without performing any integration. These neurons are used when a neuron on one computation resource has connections to neurons on other computation resources. An example of this process is shown in figure 5.3. The use of dummy neurons not only provides a mechanism for compressing the amount of data that has to be transferred but also removes the need for extra look-up tables to deal with external efferent connections. The dummy neurons are handled just like regular neurons to reduce code complexity. They are similar to the “proxy nodes” introduced in [PEM07].

5.3.3.2 Message Packing

In addition to reducing the cost of message passing by parallelizing the computation and communication, hybrid message passing provides a guaranteed upper bound on the amount of data sent regardless of the spike rate of the network. This is performed using HRLSim's novel dynamic spike packing, which dynamically switches between the AER (Address-Event Representation) [Boa00] scheme and a bit representation of the outgoing ensemble of neurons. The bit representation scheme essentially encodes the state of these neurons as single bits

in an array. A “1” bit indicates that the neuron fired, while a “0” bit indicates it did not. Note that because of the 2 ms refractory period and the voltage integration time step of 1 ms (discussed below), a neuron can spike at most once per time bin. Using this scheme, the entire ensemble can be encoded in N/32 integers, where N is the number of neurons in the ensemble. To ensure optimal performance, the simulator only uses this scheme when the activity of the ensemble is high enough that the bit representation is smaller than the AER scheme. Figure 5.4 shows the difference between dynamic spike packing and AER when simulating a network with 5000 outgoing synapses. This method is similar to that of Thibeault et al. [THH11]. However, the method presented in [THH11] encoded the state of the entire network at each iteration rather than just the outgoing connections. In addition, the method in [THH11] did not dynamically switch to the most efficient encoding. With the dummy neurons described above, the bit representation of the outgoing neurons is sorted based on their outgoing projections. The sorted list enables computation to be performed very efficiently: on GPUs, due to the non-overlapping memory access, and on CPUs, due to memory locality, which reduces the number of cache misses.
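The following sketch illustrates the packing decision; the names are illustrative rather than HRLSim's internal API, and in practice a header word would also record which encoding was chosen so the receiver can decode.

#include <cstdint>
#include <vector>

// Pack the indices of fired neurons using whichever encoding is smaller:
// AER (one 32-bit word per spike) or a bitmask (N/32 words, fixed size).
std::vector<uint32_t> pack_spikes(const std::vector<int> &fired, int N) {
  const size_t mask_words = (N + 31) / 32;
  if (fired.size() < mask_words) {
    // Low activity: the AER list of firing indices is smaller.
    return std::vector<uint32_t>(fired.begin(), fired.end());
  }
  // High activity: a fixed-size bitmask bounds the message size.
  std::vector<uint32_t> bits(mask_words, 0);
  for (int n : fired)
    bits[n / 32] |= 1u << (n % 32);
  return bits;
}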

5.3.3.3 Message Passing

The MPI [Uni] API facilitates the inter-process communication in HRLSim. During the network initialization phase, the network is split and assigned to the Slave Compute modules. Along with the requisite network pieces, each Slave Compute module is also given a post-synaptic target lookup table that details which nodes the target neurons live on. Then, after the integration part of the compute cycle, spikes are packaged together with respect to the destination node IDs. These spike packages are then sent using the MPI protocol. Several different MPI communication schemes were explored, and it was determined that for Infiniband-based communication with GPGPU-based computation the non-blocking point-to-point (P2P) methods MPI_Isend and MPI_Irecv performed best; this is the communication method used for the results of this chapter. However, HRLSim provides four different communication schemes that the user can choose from; the three other communication methods are being further analyzed and their results will be reported in [TMS13].

Figure 5.4: Dynamic spike packing is compared with the AER approach for simulating a network with 5000 outgoing axons.

• Blocking P2P: Separate calls to MPI_Isend and MPI_Recv.

• Non-blocking P2P: Separate calls to MPI_Isend and MPI_Irecv.

• AlltoAll Collective: Fixed size message buffers but different data is sent to all receiving nodes.

• AlltoAllv Collective: Variable sized data buffers. Relies on the MPI implementation to provide performance optimizations.

5.3.4 Simulation

5.3.4.1 Integration/Synaptic Updates

Neural simulations can be either time-based or event-driven. For a time-based simulation, a timestep is selected and each component is simulated for that timestep. The simplicity of this method allows for parallelization and scalability. For an event-driven simulation, each component is simulated to determine the time of its next event, which is inserted into a priority queue. Specifically, this method determines the next time a neuron should fire and only examines that neuron when it is expected to fire. This method can perform very well when simulating a small network with a low firing rate, since there is no wasted work during low-activity periods. The problem with this approach is that it is difficult to parallelize and thus does not scale easily. For these reasons, HRLSim employs time-based simulation with a 1 ms time step. At every time step, the simulator performs a two-step Euler integration followed by synaptic updates, including the STDP and current summation calculations. In the current implementation of HRLSim, the choices of a 1 ms timestep and the Euler integration method are project specific, made for both speed and hardware compatibility. Recall that HRLSim was developed to verify spiking models capable of being ported to neuromorphic hardware [SC12]. With these choices, interesting neural dynamics have been demonstrated [SC13, OS13], and accuracy has been demonstrated as described in section 5.5. Despite the SyNAPSE-project-specific choices made here, HRLSim's high-level architecture does not preclude more sophisticated integration techniques or smaller timesteps.
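As an illustration, a per-neuron update might look like the following sketch. It assumes a conductance-based leaky integrate-and-fire neuron with the parameters of table 5.1, and interprets the two-step Euler integration as two 0.5 ms Euler substeps per 1 ms timestep; both are assumptions, as the exact model and substep layout are not specified here.

// A hedged sketch of a 1 ms timestep for a conductance-based LIF neuron.
const float Cm = 200e-12f, gleak = 10e-9f;                 // F, S
const float Erest = -74e-3f, Eexc = 0.0f, Einh = -80e-3f;  // V
const float Vth = -54e-3f, Vreset = -60e-3f;               // V

// Returns true if the neuron fired during this timestep.
bool step_neuron(float &v, float gexc, float ginh) {
  const float h = 0.5e-3f;  // assumed 0.5 ms Euler substep
  for (int s = 0; s < 2; ++s) {
    float I = gleak * (Erest - v) + gexc * (Eexc - v) + ginh * (Einh - v);
    v += h * I / Cm;
  }
  if (v >= Vth) { v = Vreset; return true; }
  return false;
}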

5.3.4.2 Network Storage/Layout

Using object-oriented code makes network graph creation and connectivity simple. However, with an object-oriented build API, it is very common that at the completion of the network build, the network data is distributed across non-contiguous (scattered) memory. Thus, after the user finishes creating the network graph using HRLSim's API (see lines 13-24 of listing 5.1, for example), an additional step is performed to flatten the network into a

contiguous vector representation. The transformation is shown in figure 5.5, where a network with three neurons is converted into three vectors: a vector giving the number of outputs for each neuron, a vector giving where to start looking in the synaptic connection vector, and a vector listing the post-synaptic connections of each neuron. These three vectors are used together to describe the connectivity of the network. The master node then distributes the flattened network representation and the corresponding synapses to the slave nodes in a

simple ordered way. For N nodes, neuron i is assigned to node ⌊i/N⌋, where ⌊·⌋ is the floor function. In this distribution scheme, neurons that are in populations together are likely to be assigned to the same node. The assumption here is that a population is more highly connected to itself than to other populations. Future work will involve developing a more intelligent splitting scheme.
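A sketch of this flattening, with illustrative names, is:

#include <vector>

// The flattened form of figure 5.5: per-neuron adjacency lists become
// three parallel vectors (out-degree, offset into the target list, and
// the concatenated post-synaptic targets).
struct FlatNetwork {
  std::vector<int> num_out;  // number of outputs per neuron
  std::vector<int> offset;   // start index into targets for each neuron
  std::vector<int> targets;  // concatenated post-synaptic neuron ids
};

FlatNetwork flatten(const std::vector<std::vector<int>> &adj) {
  FlatNetwork f;
  int start = 0;
  for (const auto &post : adj) {
    f.num_out.push_back((int)post.size());
    f.offset.push_back(start);
    f.targets.insert(f.targets.end(), post.begin(), post.end());
    start += (int)post.size();
  }
  return f;
}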

Though the flattened network is memory efficient, the original object-oriented representation can be quite large. In benchmarking the limitations of HRLSim, it was occasionally necessary to build and flatten the network on a big-memory node before splitting the representation across the slave nodes. Future work on HRLSim will involve parallelizing the build process across the slave nodes, both to avoid memory restrictions and to speed up the process.

5.3.4.3 CPU

HRLSim uses a standard CPU implementation with optimized memory storage (as described above). However, the biggest bottleneck of network scaling is the synaptic updates. The following optimizations were benchmarked using the 20K network described in section 5.4.1.

Delayed STDP Computation

One of the slowest parts of the CPU simulation is the STDP [SMA00] updates, which are performed at every time step for each synapse. One part of the calculation that takes a large amount of time is the STDP decay of the internal variables P (potentiation) and D (depression). This problem is compounded by transmission delays, which increase the number of decays of P by the number of delay lines.

Figure 5.5: The conversion from a graph representation of a network to a flattened linear array of the same network.

P and D are decayed through the following updates:

D ← D · e^(−1.0/τ−)

P ← P · e^(−1.0/τ+)

where τ+ is the time constant for the LTP (long-term potentiation) window and τ− is the time constant for the LTD (long-term depression) window, both in milliseconds. Crucially, these variables are only needed when a neuron fires. By delaying the calculation until it is needed, the computation can be grouped and optimized. For example, if X is multiplicatively decayed n times by some constant ρ, then the combined decay reduces to:

X ← X · ρ^n.    (5.1)

Keeping track of n and using equation (5.1) produces a substantial simulation speed-up. Applying this optimization to D and P reduced the runtime by 68%.

Look-up Tables

In order to efficiently compute the decay in equation (5.1), ρ^n is pre-calculated for all n ≤ 512, where 512 was found empirically to be the best tradeoff. For n > 512, ρ^n was either approximated as zero or, depending on the value of ρ, broken down using the decomposition ρ^n = (ρ^512)^⌊n/512⌋ · ρ^(n mod 512), where ⌊·⌋ is the floor function. A Taylor expansion method was also considered, but was found to be slower and less accurate: in benchmarks, a 5-term Taylor expansion produced a 17% speed-up over traditional power calculations, while the look-up table with ρ^n precomputed for n ≤ 512 resulted in a 30% speed-up. Thus, the precomputation method was selected for both speed and accuracy (adding more terms to the Taylor expansion would yield better accuracy, but at the expense of speed, so this is not a viable alternative).
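A sketch combining the delayed decay and the look-up table, with illustrative names, is:

#include <cmath>
#include <vector>

// Delayed-decay optimization: rather than decaying a trace at every
// timestep, count the elapsed steps n and apply rho^n only when the
// value is needed, reading rho^n from a 512-entry table and using the
// decomposition rho^n = (rho^512)^(n/512) * rho^(n mod 512) for large n.
struct DelayedDecay {
  std::vector<float> lut;  // lut[n] = rho^n for 0 <= n < 512
  float rho512;            // rho^512, for the large-n decomposition

  explicit DelayedDecay(float rho) : lut(512), rho512(std::pow(rho, 512)) {
    lut[0] = 1.0f;
    for (int n = 1; n < 512; ++n) lut[n] = lut[n - 1] * rho;
  }

  // Decay x by n accumulated steps, on demand.
  float apply(float x, int n) const {
    for (; n >= 512; n -= 512) x *= rho512;
    return x * lut[n];
  }
};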

These schemes lead to increased memory access, because P and D have to be checked to see whether they need to be decayed before they are used. This extra memory access is easily offset by the reduction in computation on CPUs, but on GPUs it poses a large slowdown. Besides the additional memory access, this optimization can result in divergent computation on GPUs, which also hurts performance.

5.3.4.4 GPU

Background

The differences between the computations performed on a GPU and a CPU stem from the inherent differences in the architectures. The most decisive difference is that GPUs have up to 448 cores (NVIDIA Tesla C2075), while CPUs have up to 16 cores (AMD Opteron 2600).

The GPU cores are called Scalar Processors (SPs) and are grouped into Streaming Multiprocessors (SMs); thus, the Tesla C2075 has 448 SPs, divided into SMs. While there are many other important differences, such as clock speed, memory bandwidth, and memory structure, the number of cores is what truly separates them. An overview of the CUDA GPU architecture can be found in [NDK09], but some of the terminology and high-level ideas are summarized here.

With hundreds of cores available, every large computation has to be broken down into a set of highly parallelizable tasks. Each parallelizable task is split into a separate function, denoted a kernel. The idea is that a kernel can be run massively in parallel, with each parallel thread executing on a separate GPU core and operating on (mostly) different data. If a particular value must be updated by multiple threads, atomic operations are used to eliminate “race conditions”, in which incorrect results occur as a consequence of competing threads attempting to modify the same piece of memory at the same time.

A kernel is concurrently executed by processing threads, which are organized into blocks. A block can have up to 512 threads (1024 for CUDA compute capability 2.0 and above). For programming convenience, thread blocks can be of dimension one, two, or three, as long as the total number of threads does not exceed the block size limit. Since a block of threads has a size limit, many blocks are needed to execute a large number of threads. The blocks must all be the same size and shape, and are organized in a grid. When a kernel execution is launched, each block is assigned to an SM as it becomes available. Blocks may execute in any order; thus, kernels must be written to be correct for any block execution order. In order to minimize processor wait time, threads from the blocks are systematically (in a way that optimizes memory access) assigned to groups called warps. If a particular warp requires an expensive memory access, the warp is swapped out in favor of another warp that can execute immediately while the memory is being retrieved. In order to have enough threads to populate a number of warps, multiple blocks are assigned to an SM at a time.

Another capability of modern GPUs is the parallelization of certain GPU operations. In order to do this, one uses what are termed CUDA streams.

Each CUDA stream is a queue of GPU operations, such as kernel launches and memory copies.

The small number of publicly available GPU-based simulator codebases suggests that porting existing CPU-optimized code to GPUs is difficult. To overcome these architectural differences, the above CUDA features were used to develop a suite of optimizations. These optimizations include: using neural network statistics to select grid/block sizes, dynamic kernel selection, kernel parallelization, integer approximation, memory optimizations, and communication message packing. These optimizations, described below, require CUDA compute capability 1.2 or higher, and allow for the efficient simulator scheduling depicted in figure 5.2, where each computation block is executed as a kernel.

Adjust grid/block size to network size and expected dynamics

GPUs employ single-instruction, multiple-data parallelism, which means that splitting the data is crucial to achieving maximum performance. It is common for GPU code to force the GPU architecture onto the data being processed, but to get the best performance the data must be partitioned to match the GPU architecture. To this end, the structure of the neural network was used to split the data onto the GPU cores. For example, fan-out information was used to split methods that operate over fan-outs. This information is extracted before the simulation starts and is used to determine the thread grid and block sizes of every kernel launch.

Dynamic method selection based on firing rate

Another way to improve execution time is to align the memory access to the thread execution, so that consecutive threads access consecutive memory locations. This presents a complication when memory access cannot be easily predicted (e.g., when it depends on the firing rate). To address this issue, a dynamic selection algorithm was developed to automatically choose between two methods for post-synaptic updates. Method 1 iterates only over the neurons that just fired, while Method 2 iterates over all the neurons and updates only the ones that just fired.

Figure 5.6: Graph showing the timing breakdown of a GPU simulation, with neuron integration in green, pre-synaptic updates in blue, simple post-synaptic updates in red, and optimized post-synaptic updates in purple.

A benefit of both methods is that the runtime is directly determined by how many neurons fired in the previous iteration. This allows the number of neurons that fired to serve as the selection criterion for determining which method should be used. Since the selection is dynamic, it can adjust to changing network conditions or code improvements. To determine the initial transition point during network initialization, two firing rates are chosen and both Methods 1 and 2 are applied to them. The methods scale linearly, as evidenced by figure 5.6 (described below), so a linear fit is constructed for each method, and the transition point is selected as the intersection of the two lines. Once a transition point is selected, there is no overhead in selecting the proper algorithm. If the runtime matches the expected value, no extra work has to happen. If the runtime does not match, the transition point is adjusted by selecting between the two methods based on the lowest runtime.

Figure 5.6 shows the runtime breakdown of the main kernels that make up the CUDA computation: integration, pre-synaptic updating, and post-synaptic updating. “New Post Synaptic Update” is created by dynamically selecting between Method 1 and Method 2. When simulating the 100K example from section 5.4.1 for 30 virtual seconds, it was observed that if only Method 1 were used the runtime would be 26 seconds, and if only Method 2 were used the runtime would be 167 seconds (due to the majority of iterations having a low firing rate); with the hybrid post-synaptic method, the runtime is 21 seconds.
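A sketch of the selection logic, with illustrative names and the runtime-feedback adjustment reduced to a comment, is:

// Dynamic selection between the two post-synaptic update methods. Each
// runtime is modeled as linear in the spike count (runtime = a + b * s,
// fitted from two calibration runs); the transition point is the
// intersection of the two lines. At runtime, if the observed cost
// disagrees with the model, the transition point is nudged toward the
// cheaper method.
struct MethodSelector {
  float a1, b1;  // Method 1: iterate over the fired neurons
  float a2, b2;  // Method 2: scan all neurons, update the fired ones

  float transition() const { return (a2 - a1) / (b1 - b2); }

  bool use_method1(int num_fired) const {
    return (float)num_fired < transition();
  }
};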

Kernel parallelization

We observed that for many CUDA function calls, the output is not immediately needed. For example, the output of the synaptic update functions is not needed until the next integration step. Also, many of the memory operations are not required to complete until the next iteration. To take advantage of this, HRLSim uses multiple CUDA streams. If separate kernels do not depend on large memory copies, they can be assigned to separate streams and executed in parallel. In HRLSim, streams are used to launch all the synaptic update functions in the background and to parallelize as much of the memory access as possible. This feature is extremely important because communication between the CPU and the GPU is slow, often being the main bottleneck in GPU-based computing.
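A sketch of this use of streams is shown below; the kernels, buffers, and launch dimensions are hypothetical, and the host spike buffer is assumed to be pinned (allocated with cudaMallocHost) so that the asynchronous copy can actually overlap with kernel execution.

__global__ void pre_update(float *state);   // hypothetical update kernel
__global__ void post_update(float *state);  // hypothetical update kernel

void background_updates(float *d_state, int *d_spikes,
                        int *h_spikes, size_t bytes) {
  cudaStream_t syn_stream, copy_stream;
  cudaStreamCreate(&syn_stream);
  cudaStreamCreate(&copy_stream);
  // Synaptic updates run in the background; their results are not
  // needed until the next integration step.
  pre_update<<<128, 256, 0, syn_stream>>>(d_state);
  post_update<<<128, 256, 0, syn_stream>>>(d_state);
  // Meanwhile, the spikes from the last integration are copied out.
  cudaMemcpyAsync(h_spikes, d_spikes, bytes,
                  cudaMemcpyDeviceToHost, copy_stream);
  // Block only when the results are actually required.
  cudaStreamSynchronize(syn_stream);
  cudaStreamSynchronize(copy_stream);
  cudaStreamDestroy(syn_stream);
  cudaStreamDestroy(copy_stream);
}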

Integer approximation

Many of the early Tesla GPU cards, like the C1060, did not support operations such as floating-point atomic addition on 32-bit words in global and shared memory. These operations are crucial in many of HRLSim's neural computation steps. Fortunately, there is support for integer atomic operations on the Tesla C1060, and thus many of the floating-point operations were converted into integer operations by determining the necessary precision.

Although this resulted in a small functional difference from the CPU computation, the alternative of performing the computation serially or in a tree fashion would have been unacceptably slow. As HRLSim is currently implemented using atomic integer operations, a card with CUDA compute capability 1.2 or higher is required. The Tesla C1060 supports CUDA compute capability up to 1.3, though much of the code could be simplified if atomic float operations were available (CUDA compute capability 2.x or higher).

Memory optimization

HRLSim optimizes memory access in two ways. First, all the synapse-related data structures are aligned to 128 bits, because single-instruction memory reads can be up to 128 bits wide. By preventing overlapping access to memory, the total runtime was reduced by about 1%. Second, all the memory allocations are based on the expected sizes rather than the maximum size. This greatly reduces the memory footprint, with the rare penalty that the memory might need to be copied to a larger vector. This optimization was essential for efficiently packing the outgoing spike vectors.
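For example, a 16-byte synapse record might be declared as follows; the field layout is illustrative, not HRLSim's actual structure.

// alignas(16) places each record on a 128-bit boundary, so one wide
// memory read can fetch the whole synapse.
struct alignas(16) Synapse {
  float weight;  // synaptic weight
  float P;       // STDP potentiation trace
  float D;       // STDP depression trace
  int last_n;    // timesteps since the traces were last decayed
};
static_assert(sizeof(Synapse) == 16, "one 128-bit load per synapse");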

Communication message packing

When comparing the amount of data produced by a GPU computation unit with that produced by a CPU one, the ratio is over 10x. The GPU computation therefore has to take on some of the communication complexity in order to prevent the communication thread from being overwhelmed: a special kernel takes the newly generated spikes and packs them into messages to be sent to other nodes.

Figure 5.7: (Top) Two 80% excitatory / 20% inhibitory networks connected in a small-world fashion. Here 25% means that 25% of all the neurons have outgoing axons that connect to an external network. Since the weight is fixed at zero, the exact connectivity of the axons is irrelevant. 1:80, 1:20 and 1:1 indicate a fanout of 80, 20 and 1, respectively. (Bottom) The raster plot of 2000 neurons from one of these networks showing their burst firing.

5.4 Performance Evaluation

5.4.1 Large-Scale Neural Model

The selection of a neural network architecture has a large effect on how well a simulator can perform. For example, a simple network with a tight Gaussian firing rate distribution is easy to simulate and parallelize, since each process does a similar amount of work. On the other hand, a network with uncorrelated bursting dynamics is much harder to simulate, since each process does a different amount of work.

This imbalance in computational effort bounds the simulation time of each iteration by the slowest simulating resource. For example, if two nodes are simulating a network and they take 0.5 ms and 1 ms, respectively, then the overall runtime (i.e., the time taken to simulate the network) of that iteration is 1 ms. This property makes it difficult to efficiently scale the simulation of such networks to a large number of nodes, since the runtime of each iteration is proportional to the slowest simulation across all the nodes. The bottleneck of any simulation is the slowest (most burdened) node; thus, in the following paragraphs we analyze compute time as a function of the number of spikes a node must process in a given time step. For the following discussion, let I be the set of all iterations, or simulation time-steps. For I ∈ I, define the size ||I|| to be the maximum, over all nodes, of the number of spikes generated on each node. Then, define the max node of an iteration I to be the node which generated ||I|| spikes during the iteration. If there is a tie, the slowest of the tied nodes is the max node.

To create these bursting networks, each node simulates a set of 80/20 excitatory/inhibitory networks. Each node simulates either 20K or 100K neurons, of which 25% are externally connected to the other nodes, creating small-world connectivity [ASW06]. Thus, a network with 100K neurons has 25K outgoing axons. Table 5.1 shows the parameters used in the model for these benchmarks. Figure 5.7 shows how two of these networks would be connected, and it also shows burst firing in one of the two networks. In order to keep the firing rates similar, the connections between the networks were set to a fixed weight of zero, which still resulted in communication but kept the firing rates of the networks independent. When examining the distribution of {||I|| : I ∈ I}, shown in figure 5.8 (E), figure 5.9 (E), and figure 5.10 (E), there is a clear trend: as more nodes are used, Prob(||I|| > k) increases for any positive k. For this reason, this specific type of network is the slowest to simulate and was thus chosen as a worst-case scenario for evaluating HRLSim.

This benchmark was performed on a cluster of 92 compute nodes, each with two Intel Xeon E5520 2.27 GHz CPUs (4 dual-thread cores each) and two NVIDIA Tesla C1060

All Networks

Cm = 200 pF          gleak = 10 nS
Einh = −80 mV        Eexc = 0 mV
Vthresh = −54 mV     Vreset = −60 mV
Erest = −74 mV       fanout = 100
τexc = 5 ms          τinh = 100 ms
A+ = 0.025           A− = 0.0265
τ+ = 20 ms           τ− = 20 ms

100K Neurons per Node Networks

gmax^exc = 6.02 nS   gmax^inh = 6.02 nS

20K Neurons per Node Networks

gmax^exc = 7 nS      gmax^inh = 7 nS

Table 5.1: Network parameters used in the benchmarks.

Figure 5.8: GPU Results, 100,000 neurons per node. (A), (B), and (C) show the runtime distribution, with respect to ||I||, on 4, 16 and 64 nodes (400K, 1,600K, and 6,400K total neurons), respectively. (D) shows the total-time linear regression for plots (A), (B), and (C). (E) shows the histogram of ||I|| over I ∈ I. (F) shows how the runtime scales with network size.

cards, with an Infiniband communication backend. Each slave node has 12 GB of memory, while the head node has 47 GB. Note that in the following simulations, it was necessary to build the networks on the head node because of memory constraints (see section 5.3.4.2). Despite the limitations imposed by the head node, the entire simulation on each slave node required less than 300 MB of memory (for 100K neurons with a fanout of 100).

5.4.2 GPU Performance

Using the C1060 card, HRLSim can simulate a network of 110K neurons and 11M synapses for 100 virtual seconds in 99 seconds. The following benchmarks are all run for 100 virtual seconds, and the results are measured in wall time, i.e., real time elapsed. For the benchmarks, Master is used for measuring the wall time of a system-wide iteration. Each node measures the time it takes to complete a single iteration and sync with Master, as well as the time required for the computation part of the iteration.

Figure 5.9: GPU Results, 20,000 neurons per node. (A), (B), and (C) show the runtime distribution, with respect to ||I||, on 4, 16 and 64 nodes (80K, 320K, and 1,280K total neurons), respectively. (D) shows the total-time linear regression for plots (A), (B), and (C). (E) shows the histogram of ||I|| over I ∈ I. (F) shows how the runtime scales with network size.

Each node then records its communication time as its iteration time minus its computation time. Because of the communication with Master, all nodes sync to Master at every iteration.

To test how HRLSim with CUDA computation scales with respect to network size, it was run with 2, 4, 8, 16, 32, and 64 GPU cards, where each GPU card simulated either a 100K (figure 5.8) or a 20K (figure 5.9) spiking neuron network. For brevity, only the results for 4, 16 and 64 GPU cards are presented; the others are similar. Plots (A), (B), and (C) from figures 5.8 and 5.9 show, as a function of ||I||: the total wall time (green) to execute a system-wide iteration I; the wall time required for communication (blue) on the max node; and the wall time required to execute iteration I on the max node (red). Notice that the total iteration time (green) trends as the sum of the almost constant max-node communication time (blue) and the wall time of the max node (red). Communication remains roughly constant because of two factors: HRLSim's message packing creates a hard

limit on the amount of data to be transmitted, and HRLSim is able to efficiently parallelize the communication and the simulation. The communication of the max node is hidden by the threading, as well as by the fact that the max node does not have to wait for the faster nodes (the faster nodes, however, have a higher communication cost because they must wait for the slower nodes). All these results were tightly grouped except for figure 5.9 (C), where there appear to be two trend lines. Upon further investigation, it was observed that the simulation runtime had two main contributors: integration/synaptic updates and message packing. For the 64-node system, during low firing activity, there were two distinct operating states depending on how many outgoing messages needed to be packed. At low firing rates, packing the outgoing messages using the AER method (section 5.3.3.2) took almost as much time as everything else. As the firing rate increases, the message packing penalty becomes constant due to the bit representation, and the integration and synaptic updating dominate the simulation time. This phenomenon was only seen here because of the unusual combination of a small network (20K) and high external connectivity.

Figure 5.8(D) and figure 5.9(D) are side-by-side comparisons of the linear regressions of the total simulation time for each network size. Figure 5.8(E) and figure 5.9(E) are histograms depicting Prob(||I|| = k) for all observed iteration sizes k. Increasing the number of slave nodes produces a greater probability that one of them will run slowly relative to the others. Notice the vertical translation of the iteration trend lines in figure 5.8(D) as the number of nodes increases. This is due to the increased number of spikes needing to be processed on each node. For example, with 64 nodes and an iteration with ||I|| = 8000, the max node produced 8000 spikes, but there are other nodes that are also close to this threshold. With 64 nodes available, it is much more likely that there exists a node in the bursting regime, as seen in figure 5.8. Since more nodes are bursting, more spikes are being produced, increasing the number of spikes in the system. In turn, the number of spikes that any given node must process increases. So, whereas the max node only generates 8000 spikes, the number of incoming spikes that it must process increases, increasing the amount of time required for packing, unpacking, and processing the spikes. This is more apparent in

GPU performance due to the extra costs of memory copying.

An interesting result was obtained when total simulation runtime was examined as a function of network scale (as shown in panel (F) of figures 5.8 and 5.9). The 20K network shows almost linear scaling, but the 100K network flattens out. This flattening suggests that as long as some slave node takes a significant amount of computation time, the communication latency will be hidden by the threading, adding no additional time. As more nodes are added to the network, it becomes more likely that some node is producing a large number of spikes (bursting), as demonstrated by panel (E) in figures 5.8 and 5.9. The saturation of slow nodes across the iterations dominates the simulation time, producing the flattening in the 100K networks. In the 20K networks, we fully expect the same sort of flattening as more data points are included; the 20K networks do not burst as readily as the 100K networks, so the flattening is delayed. These results demonstrate that HRLSim's computation time degrades gracefully (linearly or better) under this type of scaling.

The above analysis determined that HRLSim scales well with network size. To determine how HRLSim scales for a network of fixed size, a 1.6 million neuron network was divided across 16 and 80 nodes, corresponding to 100K and 20K neurons per node, respectively. The simulation took 174 seconds using 16 nodes and 148 seconds using 80 nodes. Although this suggests that using more nodes is more efficient, the result does not extend to 160 nodes (10K per node), where even the 16-node simulation performs much better, owing to the smaller number of communicating nodes.

5.4.3 CPU Performance

To evaluate the CPU simulation core, it is necessary to determine the largest network that it can simulate in real time. It was determined that HRLSim's CPU core can simulate a network with 20K neurons and 2M synapses firing at 10 Hz for 100 virtual seconds in 98 real seconds. This real-time limit is why the 20K network was selected for the scalability test; each simulation runs for 100 virtual seconds, as in the GPU case.

Figure 5.10: CPU Results, 20,000 neurons per node. (A), (B), and (C) show the runtime distribution, with respect to ||I||, on 16, 64 and 128 nodes (320K, 1,280K, and 2,560K total neurons), respectively. (D) shows the total-time linear regression for plots (A), (B), and (C). (E) shows the histogram of ||I|| over I ∈ I. (F) shows how the runtime scales with network size.

To test the scalability of HRLSim using CPU computation, it was run on 8, 16, 32, 64, 128, and 256 CPU cores, where each core simulated a 20K neuron network (figure 5.10). Plots (A), (B), and (C) from figure 5.10 summarize the results for 16, 64, and 128 cores, respectively. The figures show that starting at 128 cores, the empty MPI synchronization calls alone account for a significant amount of runtime, and at 256 cores communication time dominates until about 5% of the network is firing (not shown). Even with these high communication costs, simulation time eventually dominates, allowing the communication latency to be mostly hidden and adding almost zero time to the iteration. Examining the trend lines corresponding to the total iteration time, figure 5.10 (D) shows that as the number of nodes increases, so does the initial cost, but the slope also decreases. Also, as with the CUDA simulation, there is some spiking imbalance when the number of nodes is increased, as Prob(||I|| > k) increases with the number of nodes (figure 5.10 (E)). Overall, the CPU scaling data, shown in figure 5.10 (F), exhibit a favorable linear relationship between network size and runtime.

100 Fanout (80K Neurons, 8M Synapses):    Master 0.413 GB    Slave 0.0892 GB
10K Fanout (80K Neurons, 800M Synapses):  Master 41.2 GB     Slave 3.96 GB

Table 5.2: Memory consumed by HRLSim during simulation for the 20K neuron network topology described in section 5.4.1. The left column reports memory for the 100 fanout networks used in the benchmarks throughout, and the right column reports memory for a more realistic situation of a 10K fanout.

5.4.4 Network Splitting

In the benchmarks above, the networks were split based on their small-world network topologies, with tightly connected components assigned to the same computation resource. This corresponds to the optimal way a network can be split, but for most other networks such a simple partition is not feasible. In order to test the worst possible split, each neuron was assigned to a random computation resource. For the 128-node network with 20K neurons per CPU core, the number of outgoing axons on each node increased over 400x (from 3K to 1385K), but the simulation time increased only 23% (from 790 seconds to 970 seconds). This implies that the communication and simulation threading was working correctly and that the message packing scheme prevented the network from becoming saturated.

5.4.5 Memory Consumption

Memory requirements are often a bottleneck when parallelizing large-scale neural simulations across many nodes. Kunkel et al. [KPE12] thoroughly examine the memory requirements of NEST for simulating large-scale networks across many processes. Before a thorough investigation into HRLSim's memory requirements can be made, HRLSim's build process needs to be parallelized. Since the build process is done entirely on Master, the memory of Master (47 GB)

limits the size of networks that can be built. Furthermore, HRLSim currently stores a redundant copy of the network on the Master node, even after the split and distribution of the network to the slaves. This allows Master to print and compute statistics, and to record spike and weight histories and the network state. This redundant memory counts against HRLSim in a full memory analysis. After HRLSim's build and data-recording processes are parallelized, a full memory analysis, in the spirit of [KPE12], will be done for both CPU and GPU in future versions of the simulator. Here, we report the memory consumption of the 20K CPU experiment from section 5.4.1. We also considered a more realistic network where the fanout is 10K rather than 100, but everything else is kept the same. Table 5.2 contains the memory used by both Master and the slave nodes, where 4 slaves are used. Note that the memory is reported for the actual simulation, after the network representation has been flattened. We used only 4 slave nodes (80K neurons) because building the 10K connectivity experiment required 47.9 GB of memory on Master; thus, more than 4 nodes were not considered.

5.5 Discussion

While the neural models simulated in section 5.4 are the simplest that would convey the scalability of HRLSim, many other networks have been simulated by the SyNAPSE team members, including HRL Laboratories, Boston University, George Mason University, University of California at Irvine, and The Institute. In addition to the testing performed by the team members, the functionality of the simulator was verified against Richert et al. (2011) [RND11] by checking that an 80/20 Izhikevich network had the same firing dynamics on both simulators, supporting the accuracy of the simulator.

The original motivation for the HRLSim simulator was to perform simulation of neural networks that would be implemented on SyNAPSE [SC12] hardware. During the development process it has evolved into much more than a simulator of a specific neural hardware. It can now be used for evaluating novel neural dynamics, additional biological features, and specialized computational techniques.

However, the trend in neural simulation, and in software in general, is that the more extensible the design, the more the performance suffers.

GPU simulators, including this one, currently lack a level of extensibility that most developers would desire. In addition, almost all of them are tied to a specific hardware platform. Adding new features can be tedious and difficult to implement, while porting these implementations to new hardware is almost completely unreasonable. The trade-off between performance and extensibility is a choice that either complicates the design or restricts the environment to specific hardware and features. Projects like OpenCL [Khr] can abstract away some of the hardware dependence, but at a cost in performance. Additionally, template-based designs can improve extensibility, but they bring an associated increase in code complexity and, again, can decrease performance. The need for simulator performance is in direct conflict with the need for extensibility.

HRLSim requires high performance to support the SyNAPSE project and its user base. However, the compiled nature of HRLSim does allow for future extensions without a negative impact on current performance. This ensures that supporting the needs of future users will not affect current users, although the complexity of implementing those extensions varies.

In addition to the features the simulator currently supports, we plan to add: a PyNN interface to simplify importing existing models, additional visualization tools, parallel network building, parallel statistics calculations, parallel weight/spike history file generation, and a simple API to specify the hardware restrictions needed to simulate custom architectures such as Neurogrid [MAS07] and FACETS [SGM06]. These additional features and a complete analysis of the communication schemes [TMS13] will be discussed in future publications.

5.6 Conclusion

A novel simulator environment, HRLSim, was described for modeling very large-scale spiking neural models. The design of the simulator code, as well as its communication scheme, was described.

Using small-world network topologies, various performance metrics were collected and analyzed on a GPGPU cluster. The results show that HRLSim offers a real-time simulation tool for studying very large-scale spiking neural models. Furthermore, with GPGPU cards becoming more affordable and each card becoming more powerful, HRLSim enables a scalable solution to support the design and analysis of very large spiking neural models.

CHAPTER 6

Conclusion

In this thesis, the role of STP in self-sustained spiking activity and signal propagation was assessed within a spiking neural network. We used STP to create two networks, one that could generate self-sustained activity and another that could faithfully propagate signals. By weakly coupling these networks, we produced a global network capable of both of these fundamental tasks. The relationship is symbiotic in that one network maintains global activity, while the second network is responsible for faithful signal propagation. We derived a characterization that aids in finding STP parameters for the self-sustaining network, boosting the probability of finding such parameters by two orders of magnitude. Despite this characterization, it is still difficult to find the needed STP parameters, as we were only able to boost the success rate to 3.2%. Future work will focus on finding a more complete characterization of the STP parameter space to ensure a high degree of success in finding networks that are self-sustaining.

We also developed the ARG-STDP algorithm, a novel plasticity rule capable of learning multiple distal rewards in the single-synapse reinforcement learning problem. We analyzed the model for learning stability and for the capacity to learn multiple distal rewards. In addition, we formulated a theoretical limit on the learning capacity of the ARG-STDP rule. While ARG-STDP is limited in its learning capacity, we showed that combining it with STP enables an improved learning capacity. Future work will involve converting the algorithmic critic approximation in ARG-STDP into a biologically plausible, online critic that can be realized in the spiking domain. We will also consider more realistic problems that involve larger neuron-population learning tasks, as opposed to the single-synapse learning task studied here.


We concluded our work with a description of a novel simulation environment. We described a simulator design, along with communication specifications, suitable for implementation on a cluster of GPGPUs. The simulator architecture was designed for modeling very large-scale spiking neural models. Using small-world network topologies, we performed various benchmark measurements. The simulator's performance on these benchmarks shows that the architecture offers a real-time simulation tool for studying very large-scale spiking neural models. Furthermore, with GPGPU cards becoming more affordable and increasingly powerful, our architecture enables a scalable solution to support the design and analysis of very large spiking neural models, making efficient large-scale simulation environments attainable with limited resources.

References

[AB97] D J Amit and Nicolas Brunel. “Model of global spontaneous activity and local structured activity during delay periods in the cerebral cortex.” Cerebral Cortex, 7(3):237–252, 1997.

[Abb99] Larry F. Abbott. “Lapicque’s introduction of the integrate-and-fire model neuron (1907).” Brain Research Bulletin, 50(5-6):303–304, 1999.

[AL01] David Attwell and Simon B. Laughlin. “An Energy Budget for Signaling in the Grey Matter of the Brain.” Journal of Cerebral Blood Flow and Metabolism, 21(10):1133–1145, October 2001.

[ALG00] J S Anderson, I Lampl, D C Gillespie, and D Ferster. “The contribution of noise to contrast invariance of orientation tuning in cat visual cortex.” Science, 290(5498):1968–1972, 2000.

[ASG96] A Arieli, A Sterkin, A Grinvald, and Ad Aertsen. “Dynamics of ongoing activity: explanation of the large variability in evoked cortical responses.” Science, 273(5283):1868–1871, 1996.

[ASW06] Sophie Achard, Raymond Salvador, Brandon Whitcher, John Suckling, and Ed Bullmore. “A Resilient, Low-Frequency, Small-World Human Brain Functional Network with Highly Connected Association Cortical Hubs.” Journal of Neuroscience, 26(1):63–72, 2006.

[BB98] James M Bower and David Beeman. The Book of GENESIS: Exploring Realistic Neural Models with the GEneral NEural SImulation System. Springer-Verlag, New York, 2 edition, 1998.

[BBH02] James M Bower, David Beeman, and Mike Huck. “GENESIS Simulation System.” In The Handbook of Brain Theory, pp. 475–478. MIT Press, Cambridge, MA, 2002.

[BL73] Tim V. Bliss and Terje Lømo. “Long-Lasting Potentiation of Synaptic Transmission in the Dentate Area of the Anaesthetized Rabbit Following Stimulation of the Perforant Path.” The Journal of Physiology, 232(2):331–356, 1973.

[Boa00] Kwabena A Boahen. “Point-to-point connectivity between neuromorphic chips using address events.” IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, 47(5):416–434, 2000.

[BQH10] Laurence C. Jayet Bray, Mathias Quoy, Frederick C. Harris, and Philip H. Goodman. “A Circuit-Level Model of Hippocampal Place Field Dynamics Modulated by Entorhinal Grid and Suppression-Generating Cells.” Frontiers in Neural Circuits, 4(November):1–12, January 2010.

[BRC07] Romain Brette, Michelle Rudolph, N. Ted Carnevale, Michael L. Hines, David Beeman, James M Bower, Markus Diesmann, Abigail Morrison, Philip H. Goodman, Frederick C. Harris, Milind Zirpe, Thomas Natschläger, Dejan Pecevski, Bard Ermentrout, Mikael Djurfeldt, Anders Lansner, Olivier Rochel, Thierry Vieville, Eilif Muller, Andrew P Davison, Sami El Boustani, and Alain Destexhe. “Simulation of Networks of Spiking Neurons: A Review of Tools and Strategies.” Journal of Computational Neuroscience, 23:349–398, 2007.

[Bru00] Nicolas Brunel. “Dynamics of networks of randomly connected excitatory and inhibitory spiking neurons.” Journal of Physiology Paris, 94(5-6):445–463, 2000.

[BSA83] Andrew G. Barto, Richard S. Sutton, and Charles William Anderson. “Neuronlike Adaptive Elements That Can Solve Difficult Learning Control Problems.” IEEE Transactions on Systems, Man, and Cybernetics, 13(5):834–846, 1983.

[BSB96] Thomas Brashers-Krug, Reza Shadmehr, and Emilio Bizzi. “Consolidation in Human Motor Memory.” Nature, 382(18):252–255, 1996.

[Buz06] György Buzsáki. Rhythms of the Brain, volume 54. Oxford University Press, 2006.

[BW76] B. Delisle Burns and A. C. Webb. “The Spontaneous Activity of Neurones in the Cat's Visual Cortex.” Proceedings of the Royal Society of London, Series B: Biological Sciences, 194:211–223, 1976.

[BZ07] Helen Barbas and Basilis Zikopoulos. “The Prefrontal Cortex and Flexible Behavior.” The Neuroscientist, 13(5):532–45, 2007.

[CG10] Claudia Clopath and Wulfram Gerstner. “Voltage and Spike Timing Interact in STDP - A Unified Model.” Frontiers in Synaptic Neuroscience, 2(25):11, January 2010.

[DA01] Peter Dayan and Larry F. Abbott. Theoretical Neuroscience: Computational and Mathematical Modeling of Neural Systems, volume 39 of Computational Neuroscience. MIT Press, Cambridge, 2001.

[DBE09] Andrew P Davison, Daniel Brüderle, Jochen M. Eppler, Jens Kremkow, Eilif Muller, Dejan Pecevski, Laurent Perrinet, and Pierre Yger. “PyNN: A Common Interface for Neuronal Network Simulators.” Frontiers in Neuroinformatics, 2(January):10, 2009.

[Dea81] A F Dean. “The variability of discharge of simple cells in the cat striate cortex.” Experimental Brain Research, 44(4):437–440, 1981.

[DRS11] Raphael Y De Camargo, Luiz Rozante, and Siang W Song. “A multi-GPU algorithm for large-scale neuronal networks.” Concurrency and Computation: Practice and Experience, 23(6):556–572, 2011.

[Flo07] Răzvan V. Florian. “Reinforcement Learning Through Modulation of Spike-Timing-Dependent Synaptic Plasticity.” Neural Computation, 19(6):1468–1502, 2007.

[FQK04] Zhe Fan, Feng Qiu, Arie Kaufman, and Suzanne Yoakum-Stover. “GPU Cluster for High Performance Computing.” In IEEE Supercomputing, volume 00, p. 47, 2004.

[FS10] A K Fidjeland and Murray P. Shanahan. “Accelerated simulation of spiking neural networks using GPUs.” 2010.

[FSG10] Nicolas Frémaux, Henning Sprekeler, and Wulfram Gerstner. “Functional Requirements for Reward-Modulated Spike-Timing-Dependent Plasticity.” The Journal of Neuroscience, 30(40):13326–13337, 2010.

[Fus08] Joaquín M. Fuster. The Prefrontal Cortex, volume 1. Academic Press, 2008.

[FUS11] Johannes Friedrich, Robert Urbanczik, and Walter Senn. “Spatio-Temporal Credit Assignment in Neuronal Population Learning.” PLoS Computational Biology, 7(6):1–13, June 2011.

[GD07] Marc-Oliver Gewaltig and Markus Diesmann. “NEST (NEural Simulation Tool).” Scholarpedia, 2(4):1430, 2007.

[GK02] Wulfram Gerstner and Werner M Kistler. Spiking Neuron Models, volume 66. Cambridge University Press, 2002.

[GTJ00] Hirac Gurden, Masatoshi Takita, and Th´er`eseM. Jay. “Essential Role of D1 But Not D2 Receptors in the NMDA Receptor- Dependent Long-Term Potentiation at HippocampalPrefrontal Cortex Synapses In Vivo.” The Journal of Neuroscience, 20(22):RC106, November 2000.

[HC97] Michael L. Hines and N. Ted Carnevale. “The NEURON simulation environ- ment.” Neural Computation, 9(6):1179–1209, 1997.

[HC08] Michael L. Hines and N. Ted Carnevale. “Translating Network Models to Paral- lel Hardware in NEURON.” Journal of Neuroscience Methods, 169(2):425–455, 2008.

[HDB95] James C Houk, Joel L Davis, and David G Beiser. Models of Information Pro- cessing in the Basal Ganglia. MIT Press, 1995.

[Heb49] D O Hebb. The Organization of Behavior: A Neuropsychological Theory, vol- ume 44 of A Wiley book in clinical psychology. Wiley, 1949.

[HFO10] Thomas E Hazy, Michael J Frank, and Randall C O’Reilly. “Neural Mecha- nisms of Acquired Phasic Dopamine Responses in Learning.” Neuroscience and Biobehavioral Reviews, 34:701–720, April 2010.

128 [HH52] A L Hodgkin and A F Huxley. “A quantitative description of membrane cur- rent and its application to conduction and excitation in nerve.” The Journal of Physiology, 117(4):500–544, 1952.

[HSK96] G R Holt, W R Softky, C. Koch, and R J Douglas. “Comparison of discharge variability in vitro and in vivo in cat visual cortex neurons.” Journal of Neuro- physiology, 75(5):1806–1814, 1996.

[HT10] Bing Han and Tarek M Taha. “Acceleration of spiking neural network based pattern recognition on NVIDIA graphics processors.” Applied Optics, 49(10):B83–B91, 2010.

[Hul43] C L Hull. Principles of behavior. Appleton-Century-Crofts, 1943.

[ISF11] Jun Igarashi, Osamu Shouno, Tomoki Fukai, and Hiroshi Tsujino. “Real-time simulation of a spiking neural network model of the basal ganglia circuitry using general purpose computing on graphics processing units.” Neural Networks, 24(9):950–960, 2011.

[Izh03] Eugene M Izhikevich. “Simple model of spiking neurons.” IEEE Transactions on Neural Networks, 14(6):1569–1572, January 2003.

[Izh04] Eugene M Izhikevich. “Which model to use for cortical spiking neurons?” IEEE Transactions on Neural Networks, 15(5):1063–1070, September 2004.

[Izh07a] Eugene M Izhikevich. Dynamical Systems in Neuroscience. Computational Neuroscience. The MIT Press, 2007.

[Izh07b] Eugene M Izhikevich. “Solving the Distal Reward Problem through Linkage of STDP and Dopamine Signaling.” Cerebral Cortex, 17:2443–2452, 2007.

[Khr] Khronos. “OpenCL: The Open Standard for Parallel Programming of Heterogeneous Systems.”

[KPE12] Susanne Kunkel, Tobias C Potjans, Jochen M. Eppler, Hans Ekkehard Plesser, Abigail Morrison, and Markus Diesmann. “Meeting the Memory Challenges of Brain-Scale Network Simulation.” Frontiers in Neuroinformatics, 5(January):1–15, 2012.

[KRA10] Arvind Kumar, Stefan Rotter, and Ad Aertsen. “Spiking Activity Propagation in Neuronal Networks: Reconciling Different Perspectives on Neural Coding.” Nature Reviews Neuroscience, 11(9):615–627, 2010.

[KSA08] Arvind Kumar, Sven Schrader, Ad Aertsen, and Stefan Rotter. “The High- Conductance State of Cortical Networks.” Neural Computation, 20:1–43, 2008.

[LAH04] A. Lerchner, M. Ahmadi, and J. Hertz. “High-Conductance States in a Mean-Field Cortical Network Model.” Neurocomputing, 58–60:935–940, 2004.

[LAS91] Tomas Ljungberg, Paul Apicella, and Wolfram Schultz. “Responses of monkey midbrain dopamine neurons during delayed alternation performance.” Brain Research, 567:337–341, 1991.

[Len03] Peter Lennie. “The Cost of Cortical Computation.” Current Biology, 13:493–497, 2003.

[LKS06] Johannes J Letzkus, Björn M Kampa, and Greg J Stuart. “Learning Rules for Spike Timing-Dependent Plasticity Depend on Dendritic Synapse Location.” The Journal of Neuroscience, 26(41):10420–10429, October 2006.

[Lo03] Terje Lømo. “The Discovery of Long-Term Potentiation.” Philosophical Transactions of the Royal Society of London - Series B: Biological Sciences, 358(1432):617–620, 2003.

[MAS07] Paul A Merolla, John V Arthur, Bertram E Shi, and Kwabena A Boahen. “Expandable Networks for Neuromorphic Chips.” 2007.

[MBT08] Gianluigi Mongillo, Omri Barak, and Misha Tsodyks. “Synaptic Theory of Working Memory.” Science, 319:1543–1546, 2008.

[MCL06] M Migliore, C Cannia, W W Lytton, Henry Markram, and Michael L. Hines. “Parallel network simulations with NEURON.” Journal of Computational Neuroscience, 21(2):119–129, 2006.

[MDG08] Abigail Morrison, Markus Diesmann, and Wulfram Gerstner. “Phenomenological models of synaptic plasticity based on spike timing.” Biological Cybernetics, 98:459–478, 2008.

[MHK03] Carsten Mehring, Ulrich Hehl, Masayoshi Kubo, Markus Diesmann, and Ad Aertsen. “Activity dynamics and propagation of synchronous spiking in locally connected random networks.” Biological Cybernetics, 88(5):395–408, 2003.

[Mic98] Olivier Michel. “Webots: Symbiosis between virtual and real mobile robots.” Virtual Worlds, 1434:254–263, 1998.

[Min61] Marvin Minsky. “Steps Toward Artificial Intelligence.” In Edward A. Feigenbaum and Julian Feldman, editors, Computers and Thought, volume 49, pp. 406–450. McGraw-Hill, New York, 1961.

[MLF97] Henry Markram, Joachim Lübke, Michael Frotscher, and Bert Sakmann. “Regulation of Synaptic Efficacy by Coincidence of Postsynaptic APs and EPSPs.” Science, 275:213–215, January 1997.

[MM02] Wolfgang Maass and Henry Markram. “Synapses as dynamic memory buffers.” Neural Networks, 15:155–161, 2002.

[MMG05] Abigail Morrison, Carsten Mehring, Theo Geisel, Ad Aertsen, and Markus Diesmann. “Advancing the boundaries of high-connectivity network simulation with distributed computing.” Neural Computation, 17(8):1776–1801, 2005.

[MSC12] Kirill Minkovich, Narayan Srinivasa, and Jose M Cruz-Albrecht. “Programming Time-Multiplexed Reconfigurable Hardware Using a Scalable Neuromorphic Compiler.” IEEE Transactions on Neural Networks, 23(6):889–901, 2012.

[MWT98] Henry Markram, Yun Wang, and Misha Tsodyks. “Differential Signaling via the Same Axon of Neocortical Pyramidal Neurons.” Proceedings of the National Academy of Sciences, 95:5323–5328, 1998.

[NDK09] Jayram Moorkanikara Nageswaran, Nikil Dutt, Jeffrey L Krichmar, Alex Nicolau, and Alexander V Veidenbaum. “A configurable simulation environment for the efficient simulation of large-scale spiking neural networks on graphics processors.” Neural Networks, 22(5-6):791–800, 2009.

[NH06] Aleksey Nogin and Jason Hickey. “OMake: Designing a Scalable Build Process.” In Luciano Baresi and Reiko Heckel, editors, Fundamental Approaches to Software Engineering, pp. 63–78. Springer-Verlag, 2006.

[NHL11] Andrew Nere, Atif Hashmi, and Mikko Lipasti. “Profiling Heterogeneous Multi-GPU Systems to Accelerate Cortically Inspired Learning Algorithms.” 2011.

[NR98] Mark Nelson and John Rinzel. “The Hodgkin-Huxley Model.” In The Book of GENESIS: Exploring Realistic Neural Models with the GEneral NEural SImulation System, chapter 4, pp. 29–50. Springer-Verlag, 2nd edition, 1998.

[OS13] Michael J O’Brien and Narayan Srinivasa. “A spiking neural model for stable re- inforcement of synapses based on multiple distal rewards.” Neural Computation, 25(1):123–156, January 2013.

[Pav27] Ivan Petrovich Pavlov. Conditioned Reflexes. Oxford University Press, 1927.

[PEM07] Hans Ekkehard Plesser, Jochen M. Eppler, Abigail Morrison, Markus Diesmann, and Marc-Oliver Gewaltig. “Efficient Parallel Simulation of Large-Scale Neuronal Networks on Clusters of Multiprocessor Computers.” In A.-M. Kermarrec and L. Bougé, editors, Euro-Par 2007 Parallel Processing. Springer, Berlin Heidelberg, 2007.

[PK08] Verena Pawlak and Jason N D Kerr. “Dopamine Receptor Activation is Required for Corticostriatal Spike-Timing-Dependent Plasticity.” The Journal of Neuroscience, 28(10):2435–2446, March 2008.

[PL08] Art Pope and Pat Langley. “CASTLE: A Framework for Integrating Cognitive Models into Virtual Environments.” In Biologically Inspired Cognitive Architectures: Papers from the AAAI Fall Symposium, AAAI Technical Report, 2008.

[PMD09] Wiebke Potjans, Abigail Morrison, and Markus Diesmann. “A Spiking Neural Network Model of an Actor-Critic Learning Agent.” Neural Computation, 21:301–339, 2009.

[PNS09] Dejan Pecevski, Thomas Natschläger, and Klaus Schuch. “PCSIM: A Parallel Simulation Environment for Neural Circuits Fully Integrated with Python.” Frontiers in Neuroinformatics, 3(May):15, 2009.

[PTB06] Jean-Pascal Pfister, Taro Toyoizumi, David Barber, and Wulfram Gerstner. “Optimal Spike-Timing-Dependent Plasticity for Precise Action Potential Firing in Supervised Learning.” Neural Computation, 18:1318–1348, June 2006.

[Pyt] Python. “Python::boost Documentation.”

[RBG11] Paul Richmond, Lars Buesing, Michele Giugliano, and Eleni Vasilaki. “Democratic Population Decisions Result in Robust Policy-Gradient Learning: A Parametric Study with GPU Simulations.” PLoS ONE, 6(5):19, 2011.

[RBT00] M. C. van Rossum, Guo Qiang Bi, and Gina G Turrigiano. “Stable Hebbian Learning from Spike Timing-Dependent Plasticity.” The Journal of Neuroscience, 20(23):8812–8821, 2000.

[RM87] David E. Rumelhart and James L. McClelland. Parallel Distributed Processing, volume 1. Bradford Books, 1987.

[RND11] Micah Richert, Jayram Moorkanikara Nageswaran, Nikil Dutt, and Jeffrey L Krichmar. “An Efficient Simulation Environment for Modeling Large-Scale Cortical Processing.” Frontiers in Neuroinformatics, 5(September):1–15, 2011.

[SB98] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. Adaptive Computation and Machine Learning. MIT Press, 1998.

[SC12] Narayan Srinivasa and Jose M Cruz-Albrecht. “Neuromorphic Adaptive Plastic Scalable Electronics.” IEEE Pulse, 3(February):51–56, 2012.

[SC13] Narayan Srinivasa and Y. K. Cho. “A Self-Organizing Spiking Neural Model for Learning Fault-Tolerant Spatio-Motor Transformations.” IEEE Transactions on Neural Networks, 2013.

[Sch98] Wolfram Schultz. “Predictive Reward Signal of Dopamine Neurons.” Journal of Neurophysiology, 80:1–27, 1998.

[Sch07] Wolfram Schultz. “Behavioral dopamine signals.” Trends in Neurosciences, 30(5):203–210, May 2007.

[Sch10] Wolfram Schultz. “Dopamine Signals for Reward Value and Risk: Basic and Recent Data.” Behavioral and Brain Functions, 6(24):1–9, January 2010.

[SCK04] Olaf Sporns, Dante R Chialvo, Marcus Kaiser, and Claus C Hilgetag. “Organization, Development and Function of Complex Brain Networks.” Trends in Cognitive Sciences, 8(9):418–425, 2004.

[Sco10] Ruggero Scorcioni. “GPGPU implementation of a synaptically optimized, anatomically accurate spiking network simulator.” Biomedical Sciences and Engineering Conference BSEC 2010, 11(Suppl 1):P133, 2010.

[SGM06] J Schemmel, A Grübl, K Meier, and Eilif Muller. “Implementing Synaptic Plasticity in a VLSI Spiking Neural Network Model.” In Proceedings of the International Joint Conference on Neural Networks (IJCNN), 2006.

[SK93] W R Softky and C. Koch. “The highly irregular firing of cortical cells is inconsistent with temporal integration of random EPSPs.” Journal of Neuroscience, 13(1):334–350, 1993.

[SMA00] Sen Song, Kenneth D Miller, and Larry F. Abbott. “Competitive Hebbian learning through spike-timing-dependent synaptic plasticity.” Nature Neuroscience, 3(9):919–926, 2000.

[SN94] M N Shadlen and W T Newsome. “Noise, neural codes and cortical organization.” Current Opinion in Neurobiology, 4(4):569–579, 1994.

[SR90] Wolfram Schultz and Ranulfo Romo. “Dopamine Neurons of the Monkey Midbrain: Contingencies of Responses to Stimuli Eliciting Immediate Behavioral Reactions.” Journal of Neurophysiology, 63(3):607–624, March 1990.

[STM07] David Sussillo, Taro Toyoizumi, and Wolfgang Maass. “Self-Tuning of Neural Circuits Through Short-Term Synaptic Plasticity.” Journal of Neurophysiology, 97:4079–4095, June 2007.

[THH11] Corey M. Thibeault, R. Hoang, and Frederick C. Harris. “A novel Multi-GPU Neural Simulator.” In ISCA International Conference on Bioinformatics and Computational Biology, New Orleans, LA, 2011.

[TM97] T W Troyer and Kenneth D Miller. “Physiological gain leads to high ISI variability in a simple model of a cortical regular spiking cell.” Neural Computation, 9(5):971–983, 1997.

[TM09] J P Tiesel and A S Maida. “Using parallel GPU architecture for simulation of planar I/F networks.” 2009.

[TMS13] Corey M. Thibeault, Kirill Minkovich, and Narayan Srinivasa. “Efficiently Passing Messages in Distributed Spiking Neural Network Simulation.” 2013.

[TPM98] Misha Tsodyks, Klaus Pawelzik, and Henry Markram. “Neural Networks with Dynamic Synapses.” Neural Computation, 10:821–835, 1998.

[TS95] Misha Tsodyks and T Sejnowski. “Rapid state switching in balanced cortical network models.” Network: Computation in Neural Systems, 6(2):111–124, 1995.

[Tur08] Gina G Turrigiano. “The self-tuning neuron: synaptic scaling of excitatory synapses.” Cell, 135(3):422–435, 2008.

[Uni] Ohio State University. “MVAPICH: MPI over InfiniBand, 10GigE/iWARP and RoCE.”

[US09] Robert Urbanczik and Walter Senn. “Reinforcement learning in populations of spiking neurons.” Nature Neuroscience, 12:250–252, 2009.

[USK94] M. Usher, M. Stemmler, and C. Koch. “Network Amplification of Local Fluctuations Causes High Spike Rate Variability, Fractal Patterns and Oscillatory Local Field Potentials.” Neural Computation, 6:795–836, 1994.

[VA05] Tim P Vogels and Larry F. Abbott. “Signal Propagation and Logic Gating in Networks of Integrate-and-Fire Neurons.” The Journal of Neuroscience, 25(46):10786–10795, 2005.

[VS96] C Van Vreeswijk and Haim Sompolinsky. “Chaos in neuronal networks with balanced excitatory and inhibitory activity.” Science, 274(5293):1724–1726, 1996.

[VS98] C Van Vreeswijk and Haim Sompolinsky. “Chaotic balanced state in a model of cortical circuits.” Neural Computation, 10(6):1321–1371, 1998.

[WBW09] Brian A Wilt, Laurie D Burns, Eric Tatt Wei Ho, Kunal K Ghosh, Eran A Mukamel, and Mark J Schnitzer. “Advances in light microscopy for neuroscience.” Annual Review of Neuroscience, 32(1):435–506, 2009.

[WGH01] E. C. Wilson, Philip H. Goodman, and Frederick C. Harris. “Implementation of a Biologically Realistic Parallel Network Simulator.” In SIAM Conference on Parallel Processing for Scientific Computing, pp. 1–11, Portsmouth, VA, 2001.

[Wil01] E. C. Wilson. Parallel Implementation of a Large Scale Biologically Realistic Neocortical Neural Network Simulator. PhD thesis, University of Nevada, Reno, 2001.

[WP05] Florentin Wörgötter and Bernd Porr. “Temporal Sequence Learning, Prediction, and Control - A Review of different models and their relation to biological mechanisms.” Neural Computation, 17(2):245–319, February 2005.

[XS04] Xiaohui Xie and H. Sebastian Seung. “Learning in Neural Networks by Reinforcement of Irregular Spiking.” Physical Review E, 69(4):1–10, April 2004.

[YSM10] D Yudanov, M Shaaban, R Melton, and L Reznik. “GPU-based simulation of spiking neural networks with real-time performance & high accuracy.” 2010.

[YWW07] Joshua M Young, Wioletta J Waleszczyk, Chun Wang, Michael B Calford, Bogdan Dreher, and Klaus Obermayer. “Cortical Reorganization Consistent with Spike Timing- but not Correlation-Dependent Plasticity.” Nature Neuroscience, 10(7):887–895, July 2007.

[ZB06] Basilis Zikopoulos and Helen Barbas. “Prefrontal Projections to the Thalamic Reticular Nucleus form a Unique Circuit for Attentional Mechanisms.” Journal of Neuroscience, 26(28):7348–7361, 2006.
