UCLA Electronic Theses and Dissertations
Permalink: https://escholarship.org/uc/item/63r8s0br

University of California
Los Angeles

The Role of Short-Term Synaptic Plasticity in Neural Network Spiking Dynamics
and in the Learning of Multiple Distal Rewards

A dissertation submitted in partial satisfaction of the
requirements for the degree Doctor of Philosophy in Mathematics

by

Michael John O'Brien

2013

© Copyright by Michael John O'Brien 2013

Abstract of the Dissertation

The Role of Short-Term Synaptic Plasticity in Neural Network Spiking Dynamics
and in the Learning of Multiple Distal Rewards

by

Michael John O'Brien
Doctor of Philosophy in Mathematics
University of California, Los Angeles, 2013
Professor Chris Anderson, Chair

In this thesis, we assess the role of short-term synaptic plasticity in an artificial neural network constructed to emulate two important brain functions: self-sustained activity and signal propagation. We employ a widely used short-term synaptic plasticity (STP) model in a symbiotic network, in which two subnetworks with differently tuned STP behaviors are weakly coupled. This enables both self-sustained global network activity, generated by one of the subnetworks, and faithful signal propagation within subcircuits of the other subnetwork. Finding the parameters for a properly tuned STP network is difficult. We provide a theoretical argument for a method that boosts the probability of finding the elusive STP parameters by two orders of magnitude, as demonstrated in tests.

We then combine STP with a novel critic-like synaptic learning algorithm, which we call ARG-STDP, for attenuated reward gating of STDP. STDP refers to a commonly used long-term synaptic plasticity model called spike-timing-dependent plasticity. With ARG-STDP, we are able to learn multiple distal rewards simultaneously, improving on the previous reward-modulated STDP (R-STDP), which could learn only a single distal reward. However, we also provide a theoretical upper bound on the number of distal rewards that can be learned using ARG-STDP.

We also consider the problem of simulating large spiking neural networks, and we describe an architecture for doing so efficiently. The architecture is suitable for implementation on a cluster of general-purpose graphics processing units (GPGPUs). Novel aspects of the architecture are described, and its performance is benchmarked on a GPGPU cluster. With the advent of inexpensive GPGPU cards, the described architecture offers an affordable and scalable tool for the design, real-time simulation, and analysis of large-scale spiking neural networks.

The dissertation of Michael John O'Brien is approved.

Dean Buonomano
Joseph Teran
Andrea Bertozzi
Chris Anderson, Committee Chair

University of California, Los Angeles
2013

Table of Contents

1 Introduction ..... 1
  1.1 Motivation ..... 1
  1.2 Historical Context ..... 2
  1.3 Thesis Overview ..... 4
  1.4 Chapter Summaries ..... 6
2 Background: Computational Models for Neural Dynamics and Synaptic Plasticity ..... 8
  2.1 Neuron Models ..... 8
    2.1.1 Hodgkin and Huxley Neurons ..... 8
    2.1.2 Leaky Integrate-and-Fire Neurons ..... 11
    2.1.3 Izhikevich Neurons ..... 13
  2.2 Plasticity Models ..... 13
    2.2.1 Spike Time Dependent Plasticity ..... 14
    2.2.2 Short Term Plasticity ..... 14
3 Short Term Plasticity Aided Signal Propagation ..... 16
  3.1 Introduction ..... 16
  3.2 RAIN Networks ..... 17
  3.3 Signal Propagation ..... 19
    3.3.1 Circuit Design ..... 19
  3.4 Properties of STP ..... 21
  3.5 STP Conditioned RAIN ..... 24
  3.6 Signal Transmission in Coupled STP Networks ..... 25
    3.6.1 Network Layout ..... 25
    3.6.2 Coupled RAIN Dynamics ..... 28
    3.6.3 Coupled Signal Propagation Dynamics ..... 30
  3.7 Finding Master STP Parameters ..... 36
  3.8 Analysis ..... 38
    3.8.1 Analyzing Firing Rate Changes ..... 38
    3.8.2 Critical Firing Rate ..... 44
    3.8.3 Assessing Circuit Layer Correlation ..... 45
  3.9 Conclusion ..... 46
4 Learning Multiple Signals Through Reinforcement ..... 48
  4.1 Introduction ..... 48
  4.2 Distal Reward Problem ..... 49
  4.3 Methods ..... 50
    4.3.1 Reward Modulated STDP ..... 51
    4.3.2 R-STDP with Attenuated Reward Gating ..... 52
  4.4 Single Synapse Reinforcement Experiment ..... 54
  4.5 Generalization to Multiple Synapse Learning ..... 57
    4.5.1 R-STDP with STP Learns Multiple r-Patterns ..... 59
    4.5.2 ARG-STDP Learns Multiple r-Patterns ..... 60
    4.5.3 STP Stabilizes ARG-STDP Network Learning Dynamics ..... 61
  4.6 Properties of ARG-STDP with STP ..... 64
    4.6.1 Reward Predictive Properties of r-Patterns ..... 64
    4.6.2 Learning Robustness to Reward Release Probability ..... 66
    4.6.3 Learning Robustness to Reward Ordering ..... 68
    4.6.4 Network Scaling ..... 69
    4.6.5 The Reward Scheduling Problem ..... 69
    4.6.6 Firing Rate Affects Learning Capacity ..... 72
    4.6.7 Eligibility Trace Time Constant Affects Learning Capacity ..... 73
    4.6.8 Interval Learning ..... 75
  4.7 Analysis ..... 77
    4.7.1 Defining the Correlation Metric ..... 77
    4.7.2 Computing the Decaying Eligibility Trace ..... 78
  4.8 Discussion ..... 81
5 HRL Simulator ..... 85
  5.1 Introduction ..... 85
    5.1.1 GPGPU Programming with CUDA ..... 86
    5.1.2 Spiking Neural Simulators ..... 87
  5.2 Simulator Description ..... 89
    5.2.1 User Network Model Description ..... 90
    5.2.2 Input ..... 92
    5.2.3 Analysis ..... 95
  5.3 Simulator Design ..... 96
    5.3.1 Modular Design ..... 96
    5.3.2 Parallelizing Simulation/Communication ..... 98
    5.3.3 MPI Communication ..... 99
    5.3.4 Simulation ..... 103
  5.4 Performance Evaluation ..... 112
    5.4.1 Large-Scale Neural Model ..... 112
    5.4.2 GPU Performance ..... 115
    5.4.3 CPU Performance ..... 118
    5.4.4 Network Splitting ..... 120
    5.4.5 Memory Consumption ..... 120
  5.5 Discussion ..... 121
  5.6 Conclusion ..... 122
6 Conclusion ..... 124
References ..... 126

List of Figures

3.1 RAIN network configuration. The red arrows indicate inhibitory connections and the blue arrows indicate excitatory connections. ..... 17
3.2 The firing rate for the networks tested in the synaptic weight parameter sweep. ..... 19
3.3 Signal propagation circuit network architecture. A naturally occurring feed-forward circuit is found within a RAIN network. The feed-forward connections are then strengthened, and this is the circuit we observe for signal propagation. ..... 20
3.4 A) Signal propagation through 5 layers. B) A reverberating signal that is experienced in layer 5, but without inputs to layer 1. C) The average firing rate of the neurons in each layer for the duration of the experiment. ..... 21
3.5 The dynamic synapses plotted as a function of the presynaptic firing rate. The STP parameters can be chosen to produce a fixed-point firing rate. Here, the fixed point is 10 Hz, at which point μ_mn = W_mn, which was already chosen to produce stable RAIN firing. ..... 23
3.6 A) RAIN activity for 100 of the network neurons. The network parameters are suboptimal, leading to activity that lasts less than 2 seconds. B) RAIN activity for 100 of the network neurons. STP is employed, enabling the network to overcome the faulty choice of network parameters. The activity lasts more than 10 seconds. ..... 24
3.7 The coupled signal propagation network architecture. Two circuit networks are weakly coupled together. The two networks have the same general neural parameters and configuration statistics, but the STP parameters for each network can be chosen independently, producing different firing dynamics in each network. The left network is referred to as Master, having STP parameters that yield self-sustained network activity. The right network is referred to as Slave, whose STP parameters allow short excitatory bursts through and then kill network activity. ..... 25
3.8 A) Slave and Master are uncoupled. Master continues indefinitely, whereas Slave dies. B) Slave has projections onto Master. Here Slave dies, as expected, and Master continues indefinitely. ..... 29
3.9 A) Master has projections onto Slave. This is sufficient to restart Slave whenever Slave dies. B) Slave and Master are mutually coupled. In this case, only Slave received initial inputs, and Master relied on Slave for a jump-start. This demonstrates that Slave has the ability to start Master in the event Master dies. In this configuration, both networks thrive indefinitely. ..... 31
3.10 An analysis of the coupling required for the connections between Master and Slave. A & B) The average firing rate of Slave and Master, for one second of elapsed time, for different connectivity probabilities. These were performed with a bridge synapse strength of 30 nS. C & D) The average firing rate of Slave and Master, for one second of elapsed time, for different connectivity strengths. These were performed with a synaptic bridge connection probability of 2E-4. ..... 32
3.11 For any layer k of interest, we construct a binary projection neuron pair. Layer k projects onto the excitatory indicator neuron (blue). The indicator neuron has an excitatory connection to the inhibitory neuron (red) which, in turn, inhibits the indicator neuron to prevent the indicator from being overwhelmed by the circuit layer during a stimulus. ..... 33
3.12 A & B) Signal propagation through 5 layers for Master and Slave. C & D) A reverberating signal that is experienced in layer 5 of Master, but not in Slave. E & F) The average firing rate of the neurons in each layer for Slave and Master, respectively. ..... 35
4.1 System reward R, reward tracker R_k, and success signal S_k for reward channel k are plotted.
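
The STP model invoked in the abstract and in Figures 3.5 and 3.6 is the widely used dynamic-synapse formulation in which each synapse tracks a facilitation variable u and a depression (resource) variable x, and scales its static weight by u·x at each presynaptic spike. The following is a minimal event-driven sketch of that standard update, assuming illustrative parameter names and values (U, tau_f, tau_d) rather than the ones tuned in this thesis.

    import math

    # Minimal sketch of the standard dynamic-synapse (STP) update, applied
    # event-driven at presynaptic spike times. U, tau_f, and tau_d are
    # illustrative assumptions, not the values tuned in the thesis.
    def stp_efficacies(spike_times, U=0.2, tau_f=0.6, tau_d=0.8):
        """Return the u*x factor that scales the static weight at each spike."""
        u, x = 0.0, 1.0            # facilitation starts empty, resources start full
        last_t = None
        out = []
        for t in spike_times:
            if last_t is not None:
                dt = t - last_t
                u *= math.exp(-dt / tau_f)                   # facilitation decays toward 0
                x = 1.0 - (1.0 - x) * math.exp(-dt / tau_d)  # resources recover toward 1
            u += U * (1.0 - u)     # each spike increments the release probability
            out.append(u * x)      # effective weight at this spike = W * u * x
            x -= u * x             # the spike consumes a fraction u of the resources
            last_t = t
        return out

    # Example: a regular 10 Hz train. With suitably chosen parameters the
    # efficacy settles toward a fixed point, the situation depicted in Figure 3.5.
    print([round(e, 3) for e in stp_efficacies([0.1 * k for k in range(10)])])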
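
Chapter 4's ARG-STDP likewise builds on reward-modulated STDP, in which each synapse maintains a decaying eligibility trace of recent STDP events (see Section 4.7.2) that is converted into a weight change only when a reward arrives. The sketch below shows just that baseline gating; the attenuated reward gating itself, including the reward tracker R_k and success signal S_k of Figure 4.1, is specific to this work and is not reproduced here.

    # Schematic of the baseline R-STDP update that ARG-STDP extends: the
    # eligibility trace c low-pass filters STDP events with time constant tau_c,
    # and a delayed reward gates its conversion into an actual weight change.
    # All parameter values are illustrative assumptions.
    def r_stdp_step(c, w, stdp_event, reward, dt=0.001, tau_c=1.0, lr=0.01):
        """Advance one synapse by one time step of length dt (seconds)."""
        c += dt * (-c / tau_c) + stdp_event   # trace decays; spike pairings add to it
        w += lr * c * reward * dt             # weight moves only while reward is nonzero
        return c, w

    # Example: one potentiation event at t = 0 and a distal reward at t = 1 s.
    # The weight still moves because the trace has not fully decayed.
    c, w = 0.0, 0.5
    for step in range(2000):                  # simulate 2 s at 1 ms resolution
        stdp = 1.0 if step == 0 else 0.0
        r = 1.0 if step == 1000 else 0.0
        c, w = r_stdp_step(c, w, stdp, r)
    print(round(c, 4), round(w, 6))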