SS13: Mixed-signal Circuits for Machine Learning and Edge-AI

Metaplasticity in Multistate Memristor Synaptic Networks

Fatima Tuz Zohora*, Abdullah M. Zyarah†, Nicholas Soures†, Dhireesha Kudithipudi*
*Neuromorphic AI Lab, University of Texas at San Antonio
†Rochester Institute of Technology

Abstract—Recent studies have shown that metaplastic synapses can retain information longer than simple binary synapses and are beneficial for continual learning. In this paper, we explore the characteristics of the multistate metaplastic synapse in the context of high retention and reception of information. The inherent behavior of a memristor emulating the multistate synapse is employed to capture the metaplastic behavior. An integrated neural network study of learning and memory retention is performed by integrating the synapse in a 5 × 3 crossbar at the circuit level and a 128 × 128 network at the architectural level. On-device training circuitry ensures dynamic learning in the network. In the 128 × 128 network, it is observed that the number of input patterns the multistate synapse can classify is ≈ 2.1× that of a simple binary synapse model, at a mean accuracy of ≥ 75%.

Keywords—Metaplasticity, Memristor, Multistate synapse

I. INTRODUCTION

Neural plasticity in the brain is the ability to learn and adapt to intrinsic or extrinsic stimuli by reorganizing the morphology, functions, or connectivity of its constituent synapses and neurons. Synaptic plasticity is a complex dynamic process that modulates and regulates network dynamics depending on external activity over multiple timescales. Metaplasticity refers to the plasticity of synaptic plasticity itself [1]. A metaplastic synaptic network enables a synapse to tune its level of plasticity depending on the pre-synaptic activity. This property is deemed crucial for high memory retention and learning capability in a synaptic network [2]. It has been shown that simple binary synapses exhibit high memory retention when the imposed activity is highly sparse. However, for moderately sparse neuronal activity, interference between multiple stimuli makes high memory retention and learning difficult to achieve. Since binary synapses cannot concurrently learn new activity and retain knowledge of past activity, their memory lifetime drops significantly [3]. To solve this issue, Fusi et al. [2] proposed a cascade model of synapse, in which synapses with binary efficacy have multiple metastates. Synapses exhibit varying degrees of plasticity depending on their metaplastic state. This property enables a network of such synapses to retain knowledge of past activity while remaining plastic enough to learn new activity. While the cascade synapse outperforms a simple binary synapse in response to moderately sparse activity, its memory retention for highly sparse activity is orders of magnitude below that of a simple binary synapse. In [3], Leibold et al. proposed a variant of the metaplastic synapse model in which the metastates are serially connected and transitions from one state to another are equally likely. This metaplasticity model, also referred to as the multistate synapse, shows less degradation in memory lifetime for highly sparse activity and outperforms the cascade model in memory capacity [3]. In this paper, we focus on the multistate synaptic model.

Previous research on metaplasticity focused on physical metaplastic behavior in memristor devices [4]–[6]. Most of the prior literature concentrates on device-level analysis, considering only continuous synaptic efficacy with no network-level realization. However, incorporating metaplastic synapses in a crossbar architecture can lead to a compact and powerful neuromorphic architecture capable of high memory retention. Since edge devices encounter large amounts of streaming data, such an architecture can immensely benefit their overall performance.

One of the early realizations of the binary metaplastic synapse was proposed in [3]. Since this model can retain previously learned information and maintain its response to new information simultaneously, such a synaptic model can better capture all the information learned throughout its lifetime. Hence, it shows better resilience against catastrophic forgetting compared to binary synapses. In this research, we study this synaptic model at scale in memristive neural accelerators. The main contributions of this paper are as follows:

• to emulate binary metaplastic synapses by exploiting inherent device properties of a memristor.
• to demonstrate the efficacy of the metaplastic synapse in a 5 × 3 crossbar circuit architecture with on-device learning capability.
• to compare the performance of binary vs. metaplastic synapses in a two-layer neural network emulating hardware constraints.

II. METAPLASTIC SYNAPTIC NETWORK MODEL

The multistate synapse is a relatively simple model in which metaplasticity is modeled by serially connected metastates, with equal probability of transitioning from one state to the next. Fig. 1 shows the metastates of the multistate synapse and the transitions between them. The red and blue bubbles represent synaptic metastates with efficacy 1 and 0 respectively. The arrows show the transition direction: red arrows correspond to potentiation and blue arrows to depression. As shown in Fig. 1, the synapse changes its efficacy only when it is in metalevel (η) 0; in all other cases it only changes its metalevel while retaining its efficacy. A multistate model with n metalevels can exhibit (2n−1) forgetting timescales, which helps it retain knowledge of past activity [3].

Fig. 1. Representation of the multistate metaplastic synapse model mapped to a physical memristor device behavior captured from [8].
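To make the transition rules concrete, the following is a minimal behavioral sketch of the multistate synapse as a state machine (Python). The class name, the `p_transition` parameter, and the saturation at the deepest metalevel are illustrative assumptions on our part; the transition logic itself follows the serial-chain description of [3] above.

```python
import random

class MultistateSynapse:
    """Serial-chain metaplastic synapse after Leibold et al. [3].

    The synapse has a binary efficacy (0 or 1) and a metalevel
    eta in {0, ..., n-1}. Efficacy can flip only at metalevel 0;
    at any deeper metalevel an update just moves eta, making the
    synapse more or less rigid while its efficacy is retained.
    """

    def __init__(self, n_metalevels=3, p_transition=1.0, efficacy=0):
        self.n = n_metalevels
        self.p = p_transition     # all transitions equally likely
        self.efficacy = efficacy  # binary synaptic weight
        self.eta = 0              # metalevel 0 = most plastic state

    def potentiate(self):
        if random.random() > self.p:
            return  # stochastic transition did not fire
        if self.efficacy == 1:
            # already high: climb the chain, harder to depress later
            self.eta = min(self.eta + 1, self.n - 1)
        elif self.eta == 0:
            self.efficacy = 1  # flip allowed only at metalevel 0
        else:
            self.eta -= 1      # step back toward the plastic state

    def depress(self):
        if random.random() > self.p:
            return
        if self.efficacy == 0:
            self.eta = min(self.eta + 1, self.n - 1)
        elif self.eta == 0:
            self.efficacy = 0
        else:
            self.eta -= 1
```

With n = 3 metalevels the chain has six states and, by the relation above, 2·3 − 1 = 5 distinct forgetting timescales.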
In [7] and [3], the authors investigate memory lifetime by imposing a specific pattern of activity on the network and observing how long the network can recollect the learned information. It is shown that complex synapses with metaplasticity can retain information longer than simple binary synapses when the neuronal activity becomes less sparse. In this work, we explore how metaplasticity affects the accuracy of a synaptic network in detecting all the patterns learned throughout its lifetime, as well as its capability to learn new activity. We consider a simple feedforward network where Nin input neurons are connected to Nout output neurons through a network of sparse synapses. Random input patterns and corresponding output patterns of activity f (f% of bits are high) are generated and applied to a network with connectivity C, i.e., C% of the input and output neurons are connected to each other. Initially, the connected synapses have random efficacy and are in their most plastic state. Similar to [3], we use the McCulloch-Pitts neuron model at the output nodes. This neuron detects activity if the incoming signal is greater than its threshold, which is set based on the average input to an output neuron in the randomly initialized network. In a network with connectivity C and input activity f, the threshold is equal to NinCf/2. We use an error-based learning rule to train the network, where error = (y − yn) (y is the ground-truth label and yn is the network output) and only the synapses with active presynaptic inputs are updated. A synapse is potentiated for positive error and depressed for negative error. Using this setup, we train 128 × 128 networks (f = 25%, C = 25%) of simple binary and multistate synapses. We also train a similar-sized network with gradient descent (GD) in which the synaptic weights are thresholded for computation. Two types of accuracy are tracked in the networks: (1) accuracy in detecting the most recent input, to evaluate learning capability, and (2) mean accuracy across all the patterns encountered, to evaluate the network's resilience against catastrophic forgetting.
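This training protocol can be summarized with a short sketch that reuses the MultistateSynapse class above. The variable names, the seed, and the single presentation per pattern are our assumptions; the connectivity mask, the NinCf/2 threshold, and the sign-of-error update follow the description in this section. Only the multistate case is shown; accuracy tracking over past patterns is omitted for brevity.

```python
import numpy as np

# Requires the MultistateSynapse class sketched in Section II.
rng = np.random.default_rng(0)
Nin, Nout, f, C = 128, 128, 0.25, 0.25

# Fixed sparse connectivity: C% of input-output pairs are wired.
mask = rng.random((Nin, Nout)) < C
syn = {(i, j): MultistateSynapse(efficacy=int(rng.integers(0, 2)))
       for i, j in zip(*np.nonzero(mask))}  # random initial efficacy

theta = Nin * C * f / 2  # McCulloch-Pitts threshold, NinCf/2

def forward(x):
    w = np.zeros((Nin, Nout))
    for (i, j), s in syn.items():
        w[i, j] = s.efficacy
    return (x @ w > theta).astype(int)

def train_pattern(x, y):
    yn = forward(x)
    err = y - yn  # error = (y - yn), per output neuron
    for (i, j), s in syn.items():
        if x[i] == 1:  # only active presynaptic inputs update
            if err[j] > 0:
                s.potentiate()
            elif err[j] < 0:
                s.depress()

# Stream random patterns with f% of bits high, one presentation each.
for _ in range(100):
    train_pattern((rng.random(Nin) < f).astype(int),
                  (rng.random(Nout) < f).astype(int))
```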
In Fig. 2(a) we see that the binary network outperforms both the GD and multistate networks in learning accuracy: the multistate network shows ≈ 91% accuracy after encountering 100 patterns, whereas for the binary network it is ≈ 99%. However, in Fig. 2(d) we see that the mean accuracy drops significantly more slowly in the multistate network than in both the GD and binary networks. To compare the performance across networks we empirically

III. MODELING MULTISTATE SYNAPSES WITH MEMRISTORS

In this work, we leverage the device characteristics of a memristor to realize a multistate synapse. As presented in [8], the device under consideration shows a gradual change in conductance during RESET. For modeling, we assume this behavior during the SET operation as well. To emulate the metastates, the memristor was trained with 15 µs pulses of 1.2 V to potentiate or depress from one state to another. In this process, we obtain three states at high and at low conductance, which can represent the different metastates. The correlation between the metastates and the memristor state variable (w), which is proportional to its conductance, is shown in Fig. 1. In the ideal multistate model, a change in metalevel incurs no change in synaptic efficacy. However, in the hardware emulation the conductance of the memristor varies across metastates. The device has to be programmed to ensure that the difference in conductance between the high- and low-efficacy states is substantial. In the modeled memristor, the lowest and highest resistive states were set to 100 kΩ and 10 MΩ respectively, and the ratio of conductance between the high- and low-efficacy states at metalevel zero is ≈ 4.5.
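As a rough illustration of this mapping, the sketch below assigns the six chain states to conductances between the two reported resistance bounds. Only the 100 kΩ/10 MΩ bounds and the ≈ 4.5 metalevel-zero ratio come from the text; the intermediate w values are placeholders standing in for the measured device states of [8].

```python
# Illustrative mapping from chain state to device conductance. Only
# the 100 kOhm / 10 MOhm resistance bounds and the ~4.5 efficacy
# ratio at metalevel 0 come from the text; the intermediate w values
# are placeholders standing in for the measured states of [8].
G_MAX = 1 / 100e3  # 10 uS, lowest resistive state (100 kOhm)
G_MIN = 1 / 10e6   # 0.1 uS, highest resistive state (10 MOhm)

# (efficacy, metalevel) -> state variable w in [0, 1], proportional
# to conductance as in Fig. 1.
w_of_state = {
    (1, 2): 1.00, (1, 1): 0.85, (1, 0): 0.70,  # high-efficacy branch
    (0, 0): 0.15, (0, 1): 0.07, (0, 2): 0.00,  # low-efficacy branch
}

def conductance(efficacy, eta):
    w = w_of_state[(efficacy, eta)]
    return G_MIN + w * (G_MAX - G_MIN)

# Placeholder w values are tuned so this lands near the reported ~4.5.
ratio = conductance(1, 0) / conductance(0, 0)
print(f"high/low efficacy conductance ratio at metalevel 0: {ratio:.1f}")
```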