BRIEF RESEARCH REPORT
published: 25 February 2020
doi: 10.3389/fncom.2020.00014

Probabilistic Circuits for Autonomous Learning: A Simulation Study

Jan Kaiser*, Rafatul Faria, Kerem Y. Camsari and Supriyo Datta

Department of Electrical and Computer Engineering, Purdue University, West Lafayette, IN, United States

Modern machine learning is based on powerful algorithms running on digital computing platforms, and there is great interest in accelerating the learning process and making it more energy efficient. In this paper we present a fully autonomous probabilistic circuit for fast and efficient learning that makes no use of digital computing. Specifically, we use SPICE simulations to demonstrate a clockless autonomous circuit where the required synaptic weights are read out in the form of analog voltages. This allows us to demonstrate a circuit that can be built with existing technology to emulate the Boltzmann machine learning algorithm based on gradient optimization of the maximum likelihood function. Such autonomous circuits could be of particular interest as standalone learning devices in the context of mobile and edge computing.

Keywords: on-device learning, Boltzmann machine algorithm, probabilistic computing, magnetic tunnel junction (MTJ), machine learning, analog circuit

1. INTRODUCTION

Machine learning, inference, and many other emerging applications (Schuman et al., 2017) make use of neural networks comprising (1) a binary stochastic neuron (BSN) (Ackley et al., 1985; Neal, 1992) and (2) a synapse that constructs the input Ii to the ith BSN from the outputs mj of all other BSNs. The output mi of the ith BSN fluctuates between +1 and −1 with a probability controlled by its input

mi(t + τN) = sgn[tanh(Ii(t)) − r]    (1)

where r represents a random number in the range [−1, +1], and τN is the time it takes for a neuron to provide a stochastic output mi in accordance with a new input Ii. (Equation 1 can also be written in binary notation with a unit step function and a sigmoid function.)

Usually the synaptic function Ii({m}) is linear and is defined by a set of weights Wij such that

Ii(t + τS) = Σj Wij mj(t)    (2)

where τS is the time it takes to recompute the inputs {I} every time the outputs {m} change. Typically Equations (1), (2) are implemented in software, often with special accelerators for the synaptic function using GPUs/TPUs (Schmidhuber, 2015; Jouppi, 2016).

The time constants τN and τS are not important when Equations (1) and (2) are implemented on a digital computer using a clock to ensure that neurons are updated sequentially and the synapse is updated between any two updates. But they play an important role in the clockless operation of autonomous hardware that makes use of the natural physics of specific systems to implement Equations (1) and (2) approximately.
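To make Equations (1) and (2) concrete, the following is a minimal behavioral Python sketch (not part of the original circuit description); the three-neuron weight matrix and the synchronous update loop are illustrative placeholders, whereas the autonomous hardware relies on τN ≫ τS rather than an explicit update schedule.

```python
import numpy as np

rng = np.random.default_rng(0)

def bsn_output(I):
    """Equation (1): bipolar binary stochastic neuron.
    r is uniform in [-1, +1]; the output is +1 or -1 with a probability
    controlled by the input I through tanh."""
    r = rng.uniform(-1.0, 1.0, size=np.shape(I))
    return np.sign(np.tanh(I) - r)

def synapse(W, m):
    """Equation (2): linear synapse, I_i = sum_j W_ij m_j."""
    return W @ m

# Illustrative three-neuron network with a placeholder symmetric weight matrix.
W = np.array([[ 0.0,  1.0, -0.5],
              [ 1.0,  0.0,  0.3],
              [-0.5,  0.3,  0.0]])
m = rng.choice([-1.0, 1.0], size=3)      # random initial bipolar state
for _ in range(5):
    I = synapse(W, m)                    # synapse responds on the fast time scale tau_S
    m = bsn_output(I)                    # neurons respond on the slower time scale tau_N
print(m)                                 # final bipolar state of the three neurons
```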


A key advantage of using BSNs is that Equation (1) can be implemented compactly using stochastic magnetic tunnel junctions (MTJs), as shown in Camsari et al. (2017a,b), while resistive or capacitive crossbars can implement Equation (2) (Hassan et al., 2019a). It has been shown that such hardware implementations can operate autonomously without clocks if the BSN operates slower than the synapse, that is, if τN ≫ τS, as shown by Sutton et al. (2019).

Stochastic neural networks defined by Equations (1) and (2) can be used for inference, whereby the weights Wij are designed such that the system has a very high probability of visiting configurations defined by {m} = {v}n, where {v}n represents a specified set of patterns. However, the most challenging and time-consuming part of implementing a neural network is not the inference function, but the learning required to determine the correct weights Wij for a given application. This is commonly done using powerful cloud-based processors, and there is great interest in accelerating the learning process and making it more energy efficient so that it can become a routine part of mobile and edge computing.

In this paper we present a new approach to the problem of fast and efficient learning that makes no use of digital computing at all. Instead it makes use of the natural physics of a fully autonomous probabilistic circuit composed of standard electronic components like resistors, capacitors, and transistors along with stochastic MTJs.

We focus on a fully visible Boltzmann machine (FVBM), a form of stochastic recurrent neural network, for which the most common learning algorithm is based on the gradient ascent approach to optimize the maximum likelihood function (Carreira-Perpinan and Hinton, 2005; Koller and Friedman, 2009).
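As a brief reminder (a standard Boltzmann machine result, stated here in the notation of this paper rather than taken from it), the maximum-likelihood gradient that motivates the incremental rule used below is

```latex
\frac{\partial \log L}{\partial W_{ij}}
  = \langle v_i v_j \rangle_{\mathrm{data}} - \langle m_i m_j \rangle_{\mathrm{model}},
\qquad
\Delta W_{ij} \propto \langle v_i v_j \rangle_{\mathrm{data}}
  - \langle m_i m_j \rangle_{\mathrm{model}} - \lambda W_{ij}
```

where the −λWij term comes from the L2 regularization (Ng, 2004) rather than from the likelihood itself.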
We use a slightly simplified version of this approach, whereby the weights are changed incrementally according to

Wij(t + Δt) = Wij(t) + ε [vivj − mimj − λWij(t)]

where ε is the learning parameter and λ is the regularization parameter (Ng, 2004). The term vivj is the correlation between the ith and the jth entry of the training vector {v}n. The term mimj corresponds to the sampled correlation taken from the model's distribution. The advantage of this network topology is that the weight update is local, since it only requires information of the two neurons i and j connected by weight Wij. In addition, the learning rule can tolerate stochasticity, for example in the form of sampling noise, which makes it an attractive algorithm to use for hardware machine learning (Carreira-Perpinan and Hinton, 2005; Fischer and Igel, 2014; Ernoult et al., 2019).

For our autonomous operation we replace the equation above with its continuous-time version (τL: learning time constant)

dWij/dt = [vivj − mimj − λWij] / τL    (3)

which we translate into an RC circuit by associating Wij with the voltage on a capacitor C driven by a voltage source (Vv,ij − Vm,ij) with a series resistance R (Figure 1):

C dVij/dt = (Vv,ij − Vm,ij − Vij) / R    (4)

with vivj = Vv,ij/(VDD/2) and mimj = Vm,ij/(VDD/2). From Figure 1 and comparing Equations (3), (4) it is easy to see how the weights and the learning and regularization parameters are mapped into circuit elements: Wij = AvVij/V0, λ = V0/(AvVDD/2), and τL = λRC, where Av is the voltage gain of OP3 in Figure 1 and V0 is the reference voltage of the BSN. For proper operation the learning time scale τL has to be much larger than the neuron time τN to be able to collect enough statistics throughout the learning process.

A key element of this approach is the representation of the weights W with voltages rather than with programmable resistances, for which memristors and other technologies are still in development (Li et al., 2018b). By contrast, the charging of capacitors is a textbook phenomenon, allowing us to design a learning circuit that can be built today with established technology. The idea of using capacitor voltages to represent weights in neural networks has been presented by several authors for different network topologies in analog learning circuits (Schneider and Card, 1993; Card et al., 1994; Kim et al., 2017; Sung et al., 2018). The use of capacitors has the advantage of having a high level of linearity and symmetry for the weight updates during the training process (Li et al., 2018a).

In section 2, we will describe such a learning circuit that emulates Equations (1)–(3). The training images or patterns {v}n are fed in as electrical signals into the input terminals, and the synaptic weights Wij can then be read out in the form of voltages from the output terminals. Alternatively, the values can be stored in a non-volatile memory from which they can subsequently be read and used for inference. In section 3, we will present SPICE simulations demonstrating the operation of this autonomous learning circuit.

2. METHODS

The autonomous learning circuit has three parts, where each part represents one of the three Equations (1)–(3). On the left-hand side of Figure 1, the training data is fed into the circuit by supplying a voltage Vv,ij, which is given by the ith entry of the bipolar training vector, vi, multiplied by the jth entry, vj, and scaled by the supply voltage VDD/2. The training vectors can be fed in sequentially or as an average of all training vectors. The weight voltage Vij across capacitor C follows Equation (4), where Vv,ij is compared to the voltage Vm,ij, which represents the correlation of the outputs of BSNs mi and mj. Voltage Vm,ij is computed in the circuit by using an XNOR gate that is connected to the outputs of BSN i and BSN j. The synapse in the center of the circuit connects weight voltages to neurons according to Equation (2). Voltage Vij has to be multiplied by 1 or −1 depending on the current value of mj. This is accomplished by using a switch which connects either the positive or the negative node of Vij to the operational amplifiers OP1 and OP2. Here, OP1 accumulates all negative contributions and OP2 accumulates all positive contributions of the synaptic function. The differential amplifier OP3 takes the difference between the output voltages of OP2 and OP1 and amplifies the voltage by the amplification factor Av. This voltage conversion is used to control the voltage level of Vij in relation to the input voltage of each BSN. The voltage level at the input of the BSN is fixed by the reference voltage of the BSN, which is V0.
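As a sanity check of the mapping between Equations (3) and (4), here is a hedged Python sketch that integrates the capacitor voltage with a forward-Euler step and reads out the dimensionless weight through Wij = AvVij/V0; the component values C, R, Av, and V0 are the ones quoted later for the full-adder example, while VDD is an assumed supply level and the fixed correlations stand in for the fluctuating BSN outputs.

```python
import numpy as np

# Component values quoted for the full-adder example (section 3.1); VDD is assumed.
C, R = 1e-9, 5e3            # 1 nF, 5 kOhm
Av, V0 = 10.0, 50e-3        # OP3 gain and BSN reference voltage (50 mV)
VDD = 0.8                   # assumed supply voltage for the 14 nm technology

def weight_voltage_step(V_ij, v_corr, m_corr, dt):
    """One forward-Euler step of Equation (4):
    C dV_ij/dt = (V_v,ij - V_m,ij - V_ij) / R,
    with V_v,ij = v_i v_j * VDD/2 and V_m,ij = m_i m_j * VDD/2."""
    V_v = v_corr * VDD / 2.0
    V_m = m_corr * VDD / 2.0
    return V_ij + (V_v - V_m - V_ij) / (R * C) * dt

def dimensionless_weight(V_ij):
    """Mapping from capacitor voltage to weight: W_ij = Av * V_ij / V0."""
    return Av * V_ij / V0

# A data correlation of +1 and a model correlation of -1 drive the weight
# voltage up, saturating on the RC time scale (here 5 microseconds).
V = 0.0
for _ in range(1000):
    V = weight_voltage_step(V, v_corr=+1.0, m_corr=-1.0, dt=1e-9)
print(dimensionless_weight(V))
```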


FIGURE 1 | Clockless learning circuit designed to emulate Equations (1)–(3) autonomously.

However, the voltage level of Vij can be adjusted and utilized to adjust the regularization parameter λ in the learning rule (Equation 3). The functionality of the BSN is described by Equation (1), where the dimensionless input is given by Ii(t) = Vi,in(t)/V0. This relates the voltage Vij to the dimensionless weight by Wij = AvVij/V0. The hardware implementation of the BSN uses a stochastic MTJ in series with a transistor, as presented by Camsari et al. (2017b). Due to thermal fluctuations of the low-barrier magnet (LBM) of the MTJ, the output voltage of the MTJ fluctuates randomly but with the right statistics given by Equation (1). The time dynamics of the LBM can be obtained by solving the stochastic Landau-Lifshitz-Gilbert (LLG) equation. Due to the fast thermal fluctuations of the LBM in the MTJ, Equation (1) can be evaluated on a subnanosecond timescale, leading to fast generation of samples (Hassan et al., 2019b; Kaiser et al., 2019b).

Figure 1 shows the hardware implementation of just one weight and one BSN. The size of the whole circuit depends on the size of the training vector N. For every entry of the training vector one BSN is needed. The number of weights, which is the number of RC circuits, is given by N(N − 1)/2, where every connection between BSNs is assumed to be reciprocal. To learn biases, another N RC circuits are needed.

The learning process is captured by Equations (3) and (4). The whole learning process is similar to the software implementation of persistent contrastive divergence (PCD) (Tieleman, 2008), since the circuit takes samples from the model's distribution (Vm,ij) and compares them to the target distribution (Vv,ij) without reinitializing the Markov chain after a weight update. During the learning process the voltage Vij reaches a constant average value where dVij/dt ≈ 0. This voltage Vij = Vij,learned corresponds to the learned weight.

For inference, the capacitor C is replaced by a voltage source of voltage Vij,learned. Consequently, the autonomous circuit will compute the desired functionality given by the training vectors. In general, training and inference have to be performed on identical hardware in order to learn around variations (see Supplementary Material for more details). It is important to note that in inference mode this circuit can be used for optimization by performing electrical annealing. This is done by increasing all weight voltages Vij by the same factor over time. In this way the ground state of a Hamiltonian like the Ising Hamiltonian can be found (Sutton et al., 2017; Camsari et al., 2019).

3. RESULTS

In this section the autonomous learning circuit in Figure 1 is simulated in SPICE. We show how the proposed circuit can be used for both inference and learning. As examples, we demonstrate learning on a full adder (FA) and on 5 × 3 digit images. The BSN models are simulated in the framework developed by Camsari et al. (2015).
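As a quick cross-check of the component counts quoted above and used in the two examples that follow, the snippet below (illustrative only; the function name is ours) applies the scaling of N BSNs, N(N − 1)/2 reciprocal weight RC circuits, and N bias RC circuits.

```python
def circuit_size(N):
    """Component count for a training vector of length N, as described in the
    Methods section: one BSN per visible unit, N(N-1)/2 reciprocal weight
    RC circuits, and N additional RC circuits if biases are learned."""
    return {"BSNs": N, "weight_RC": N * (N - 1) // 2, "bias_RC": N}

print(circuit_size(5))    # full adder: 5 BSNs, 10 weights (no biases needed in section 3.1)
print(circuit_size(15))   # 5x3 digit images: 15 BSNs, 105 weights, 15 biases (section 3.2)
```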


TABLE 1 | Truth table of a full adder.

A (v1)   B (v2)   Cin (v3)   S (v4)   Cout (v5)   Dec   PIdeal

−1   −1   −1   −1   −1    0   0.125
−1   −1    1    1   −1    6   0.125
−1    1   −1    1   −1   10   0.125
−1    1    1   −1    1   13   0.125
 1   −1   −1    1   −1   18   0.125
 1   −1    1   −1    1   21   0.125
 1    1   −1   −1    1   25   0.125
 1    1    1    1    1   31   0.125

Every 0 in the binary representation of the full adder is replaced by −1 in the bipolar representation. "Dec" represents the decimal conversion of each line. PIdeal is the ideal probability distribution, where every line's probability is p = 1/8 = 0.125.

FIGURE 2 | Feeding of training data into the circuit. (A) Weight voltage V1,5 over time for sequential and average feeding in of the correlation between visible unit i and visible unit j for training a full adder. (B) Correlation v1v5 vs. time t. All eight lines of the truth table of a full adder are cycled through, where every vector is shown for time T = 1 ns at a time. (C) Enlarged version of subfigure (A). For sequential feeding in of data, the voltage change in v1v5 directly affects V1,5.

For all SPICE simulations the following parameters are used for the stochastic MTJ in the BSN implementation: saturation magnetization MS = 1,100 emu/cc, LBM diameter D = 22 nm, LBM thickness l = 2 nm, TMR = 110%, damping coefficient α = 0.01, T = 300 K, and demagnetization field HD = 4πMS, with magnet volume V = (D/2)²πl. For the transistors, 14 nm HP-FinFET Predictive Technology Models (PTM, http://ptm.asu.edu/) are used with fin number fin = 1 for the inverters and fin = 2 for the XNOR gates. Ideal operational amplifiers and switches are used in the synapse. The characteristic time of the BSNs τN is on the order of 100 ps (Hassan et al., 2019b) and much larger than the time it takes for the synaptic connections, namely the resistors and operational amplifiers, to propagate BSN outputs to neighboring inputs. It has to be noted that in principle other hardware implementations of the synapse for computing Equation (2) could be utilized as long as the condition τN ≫ τS is satisfied.

3.1. Learning Addition

As a first training example, we use the probability distribution of a full adder. The FA has 5 nodes and 10 weights that have to be learned. In the case of the FA training, no biases are needed. The probability distribution of a full adder with bipolar variables is shown in Table 1. To learn this distribution, the correlation terms vivj in the learning rule have to be fed into the voltage node Vv,ij. The correlation depends on which training vector/truth table line is fed in. For the second line of the truth table, for example, v1v2 = −1 · −1 = 1 and v1v3 = −1 · 1 = −1, with A being the first node, B the second node, and so on. In Figure 2B the correlation v1v5 is shown. For the sequential case the value of v1v5 is obtained by cycling through all lines of the truth table, where each training vector is shown for 1 ns. A and Cout in Table 1 only differ in the fourth and fifth lines, for which v1v5 = −1. For all other cases v1v5 = 1. The average of all lines is shown as a red solid line. Figure 2A shows the weight voltage Vij with i = 1 and j = 5 for FA learning and the first 1,000 ns of training. The following learning parameters have been used for the FA: τL = 62.5 ns, where C = 1 nF and R = 5 kΩ, Av = 10, and Rf = 1 MΩ. This choice of learning parameters ensures that τL ≫ τN. Due to the averaging effect of the RC circuit, both sequential and average feeding of the training vector result in similar learning behavior as long as the RC constant is much larger than the timescale of sequential feeding. Figure 2C shows the enlarged version of Figure 2A. For the sequential feeding, voltage V1,5 changes substantially every time v1v5 switches to −1.

At the start of training all weight voltages are initialized to 0 V and the probability distribution is uniform. The training is performed for 5,500 ns. In Figure 3A the ideal probability distribution of the FA, PIdeal, is compared to the normalized histogram PSPICE of the sampled BSN configurations taken from the last 500 ns of learning. The training vector is fed in as an average. For PSPICE the eight trained configurations of Table 1 are the dominant peaks. To monitor the training process, the Kullback-Leibler divergence between the trained and the ideal probability distribution, KL(PIdeal||PSPICE(t)), is plotted as a function of training time t in Figure 3B, where PSPICE(t) is the normalized histogram taken over 500 ns. PSPICE at t = 0 corresponds to the histogram taken from t = 0 to t = 500 ns. During training the KL divergence decreases over time until it reaches a constant value at about 0.1. It has to be noted that after the weight matrix is learned correctly for a fully visible Boltzmann machine, the KL divergence can be reduced further by increasing all weights uniformly by a factor I0, which corresponds to the inverse temperature of the Boltzmann machine (Aarts and Korst, 1989). Figure 3 shows that the probability distribution of a FA can be learned very fast with the proposed autonomous learning circuit. In addition, the learning performance is robust when components of the circuit are subject to variation. In the Supplementary Material, additional figures of the learning performance are shown when the diameter of the magnet and the resistances of the RC circuits are subject to variation. The robustness against variations can be explained by the fact that the circuit can learn around variations. BSNs using LBMs under variations have also been analyzed by Abeed and Bandyopadhyay (2019) and Drobitch and Bandyopadhyay (2019).
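To connect Table 1, Figure 2, and the KL-divergence monitoring, the following Python sketch builds the bipolar full-adder truth table, forms the average correlations ⟨vivj⟩ that set Vv,ij when the data is fed in as an average, and evaluates KL(PIdeal||P) against a stand-in histogram; the variable names are ours and the uniform histogram merely mimics the untrained circuit, not the SPICE output.

```python
import numpy as np
from itertools import product

# Bipolar full-adder truth table of Table 1: columns A, B, Cin, S, Cout (v1..v5).
rows = []
for a, b, cin in product([0, 1], repeat=3):
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    rows.append([a, b, cin, s, cout])
V = 2 * np.array(rows) - 1                    # map 0 -> -1, 1 -> +1

# Average correlations <v_i v_j> over the eight lines; these set V_v,ij
# (scaled by VDD/2) when the training data is fed in as an average.
avg_corr = (V.T @ V) / len(V)
print(avg_corr[0, 4])                         # <v1 v5> = 0.5, cf. the red average line in Figure 2B

# KL divergence used to monitor training, here against a uniform stand-in
# histogram representing the untrained circuit.
def kl_divergence(p_ideal, p_sampled, eps=1e-12):
    p_sampled = np.clip(p_sampled, eps, None)
    mask = p_ideal > 0
    return float(np.sum(p_ideal[mask] * np.log(p_ideal[mask] / p_sampled[mask])))

decimals = ((V + 1) // 2) @ (2 ** np.arange(4, -1, -1))   # "Dec" column of Table 1
p_ideal = np.zeros(32)
p_ideal[decimals] = 1 / 8
print(kl_divergence(p_ideal, np.full(32, 1 / 32)))        # log(4), about 1.39 at t = 0
```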


FIGURE 3 | Training of a full adder in SPICE. (A) Probability distribution of a trained full adder compared to the ideal distribution with binary inputs A, B, Cin and outputs S and Cout. The training is performed for 5,500 ns. Blue bars are the probability distribution extracted from SPICE simulations by creating a histogram of the configurations of m over the last 500 ns of training. (B) Kullback-Leibler divergence between PSPICE, obtained by doing a moving average of 500 ns, and the target distribution, defined as KL(PIdeal||PSPICE(t)) = Σm PIdeal(m) log[PIdeal(m)/PTrain(m, t)]. The following parameters have been used in the simulations: C = 1 nF, R = 5 kΩ, RF = 1 MΩ, Av = 10, V0 = 50 mV.

FIGURE 4 | Training and testing of 5 × 3 digit images. (A) 5 × 3 digit images from 0 to 9. (B) Kullback-Leibler divergence during training for 3,000 ns using the autonomous circuit. (C–E) Image completion: for inference, six unique pixels are clamped for every digit (as shown in C). (D,E) show the heatmap of BSN outputs during inference for running the circuit for 100 ns for (D) I0 = 1 and (E) I0 = 2.

3.2. Learning Image Completion

As a second example, the circuit is utilized to train the ten 5 × 3 pixel digit images shown in Figure 4A. Here, 105 reciprocal weights and 15 biases have to be learned. The network is trained for 3,000 ns and the bipolar training data is fed in as the average of the 10 vivj terms for every digit. The same learning parameters as in the previous section are used here. In Figure 4B, the KL divergence between the SPICE histogram and the ideal probability distribution is shown as a function of time, where the ideal distribution has 10 peaks with each peak being 10% for each digit. Most of the learning happens in the first 1,500 ns of training; however, the KL divergence still reduces slightly during the later parts of learning. After 3,000 ns the KL divergence reaches a value of around 0.5.

For inference we replace the capacitor by a voltage source where every voltage is given by the previously learned voltage Vij. The circuit is run for 10 instances, where every instance has a unique clamping pattern of 6 pixels representing one of the 10 digits. The clamped inputs are shown in Figure 4C. The input of a clamped BSN is set to ±VDD/2. Each instance is run for 100 ns and the outputs of the BSNs are monitored. The BSNs fluctuate between the configurations given by the learned probability distribution. In Figure 4D, the heatmap of the output of the BSNs is shown. For every digit the most likely configuration is given by the trained digit image. To illustrate this point, the amount of BSN fluctuations is reduced by increasing the learned weight voltages by a factor of I0 = 2. The circuit is again run in inference mode for 100 ns with the same clamping patterns. In Figure 4E the heatmap is shown. The circuit locks into the learned digit configuration. This shows that in inference mode the circuit can be utilized for image completion.
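The inference mode just described (learned weights held by voltage sources, six clamped pixels, fluctuations suppressed by the factor I0) can be mimicked with a short behavioral Python sketch; the weight matrix below is a random placeholder rather than the learned voltages, updates are synchronous for brevity (the circuit instead relies on τN ≫ τS), and clamping is applied to the unit states as a stand-in for fixing the BSN inputs at ±VDD/2.

```python
import numpy as np

rng = np.random.default_rng(1)

def run_inference(W, b, clamp_idx, clamp_val, I0=1.0, steps=1000):
    """Behavioral stand-in for inference mode: units fluctuate under fixed
    weights scaled by the annealing factor I0, with a clamped subset, and the
    time-averaged output plays the role of the heatmaps in Figures 4D,E."""
    N = W.shape[0]
    m = rng.choice([-1.0, 1.0], size=N)
    m[clamp_idx] = clamp_val
    avg = np.zeros(N)
    for _ in range(steps):
        I = I0 * (W @ m + b)                 # Equation (2) plus bias, scaled by I0
        r = rng.uniform(-1.0, 1.0, size=N)
        m = np.sign(np.tanh(I) - r)          # Equation (1)
        m[clamp_idx] = clamp_val             # re-impose the clamping pattern
        avg += m
    return avg / steps

# Placeholder 15-unit (5 x 3 pixel) network with random symmetric weights.
N = 15
W = rng.normal(0.0, 0.2, (N, N))
W = (W + W.T) / 2.0
np.fill_diagonal(W, 0.0)
b = np.zeros(N)
clamp_idx = np.arange(6)                     # six clamped pixels, as in Figure 4C
clamp_val = np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0])
print(run_inference(W, b, clamp_idx, clamp_val, I0=2.0).reshape(5, 3))
```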

4. DISCUSSION

In this paper we have presented a framework for mapping a continuous version of the Boltzmann machine learning rule (Equation 3) to a clockless autonomous circuit. We have shown full SPICE simulations to demonstrate the feasibility of this circuit running without any digital component, with the learning parameters set by circuit parameters. Due to the fast BSN operation, samples are drawn at subnanosecond speeds, leading to fast learning; as such, the learning speed should be at least multiple orders of magnitude faster compared to other computing platforms (Adachi and Henderson, 2015; Korenkevych et al., 2016; Terenin et al., 2019). The advantage of this autonomous architecture is that it produces random numbers naturally and does not rely on pseudo-random number generators like linear-feedback shift registers (LFSRs) (which are for example used in Bojnordi and Ipek, 2016). These LFSRs have overhead and are not as compact and efficient as the hardware BSN used in this paper. As shown by Borders et al. (2019), typical LFSRs need about 10x more energy per flip and more than 100x more area than an MTJ-based BSN. Another advantage of this approach is that the interfacing with digital hardware only needs to be performed after the learning has been completed. Hence, no expensive analog-to-digital conversion has to be performed during learning. We believe this approach could be extended to other energy-based machine learning algorithms, like equilibrium propagation introduced by Scellier and Bengio (2017), to design autonomous circuits. Such standalone learning devices could be of particular interest in the context of mobile and edge computing.

DATA AVAILABILITY STATEMENT

The datasets generated for this study are available on request to the corresponding author.

AUTHOR CONTRIBUTIONS

JK and SD wrote the paper. JK performed the simulations. RF helped setting up the simulations. KC developed the simulation modules for the BSN. All authors discussed the results and helped refine the manuscript.

FUNDING

This work was supported in part by ASCENT, one of six centers in JUMP, a Semiconductor Research Corporation (SRC) program sponsored by DARPA. KC gratefully acknowledges support from the Center for Science of Information (CSoI), an NSF Science and Technology Center, under grant CCF-0939370.

ACKNOWLEDGMENTS

This manuscript has been released as a preprint at arXiv (Kaiser et al., 2019a).

SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fncom.2020.00014/full#supplementary-material

REFERENCES

Aarts, E., and Korst, J. (1989). Simulated Annealing and Boltzmann Machines: A Stochastic Approach to Combinatorial Optimization and Neural Computing. New York, NY: John Wiley & Sons, Inc.

Abeed, M. A., and Bandyopadhyay, S. (2019). Low energy barrier nanomagnet design for binary stochastic neurons: design challenges for real nanomagnets with fabrication defects. IEEE Magn. Lett. 10, 1–5. doi: 10.1109/LMAG.2019.2929484

Ackley, D. H., Hinton, G. E., and Sejnowski, T. J. (1985). A learning algorithm for Boltzmann machines. Cogn. Sci. 9, 147–169. Available online at: https://www.sciencedirect.com/science/article/abs/pii/S0364021385800124

Adachi, S. H., and Henderson, M. P. (2015). Application of quantum annealing to training of deep neural networks. arXiv:1510.06356. Available online at: https://arxiv.org/abs/1510.06356

Bojnordi, M. N., and Ipek, E. (2016). "Memristive Boltzmann machine: a hardware accelerator for combinatorial optimization and deep learning," in 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA) (IEEE), 1–13. Available online at: https://ieeexplore.ieee.org/abstract/document/7446049

Borders, W. A., Pervaiz, A. Z., Fukami, S., Camsari, K. Y., Ohno, H., and Datta, S. (2019). Integer factorization using stochastic magnetic tunnel junctions. Nature 573, 390–393. doi: 10.1038/s41586-019-1557-9

Camsari, K. Y., Chowdhury, S., and Datta, S. (2019). Scalable emulation of sign-problem-free Hamiltonians with room-temperature p-bits. Phys. Rev. Appl. 12:034061. doi: 10.1103/PhysRevApplied.12.034061

Camsari, K. Y., Faria, R., Sutton, B. M., and Datta, S. (2017a). Stochastic p-bits for invertible logic. Phys. Rev. X 7:031014. doi: 10.1103/PhysRevX.7.031014

Camsari, K. Y., Ganguly, S., and Datta, S. (2015). Modular approach to spintronics. Sci. Rep. 5:10571. doi: 10.1038/srep10571

Camsari, K. Y., Salahuddin, S., and Datta, S. (2017b). Implementing p-bits with embedded MTJ. IEEE Elect. Device Lett. 38, 1767–1770. doi: 10.1109/LED.2017.2768321

Card, H. C., Schneider, C. R., and Schneider, R. S. (1994). Learning capacitive weights in analog CMOS neural networks. J. VLSI Signal Process. 8, 209–225. doi: 10.1007/BF02106447

Carreira-Perpinan, M. A., and Hinton, G. E. (2005). "On contrastive divergence learning," in Aistats, Vol. 10, 33–40. Available online at: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.221.8829&rep=rep1&type=pdf#page=42

Drobitch, J. L., and Bandyopadhyay, S. (2019). Reliability and scalability of p-bits implemented with low energy barrier nanomagnets. IEEE Magn. Lett. 10, 1–4. doi: 10.1109/LMAG.2019.2956913

Ernoult, M., Grollier, J., and Querlioz, D. (2019). Using memristors for robust local learning of hardware restricted Boltzmann machines. Sci. Rep. 9:1851. doi: 10.1038/s41598-018-38181-3

Fischer, A., and Igel, C. (2014). Training restricted Boltzmann machines: an introduction. Pattern Recogn. 47, 25–39. doi: 10.1016/j.patcog.2013.05.025

Hassan, O., Camsari, K. Y., and Datta, S. (2019a). Voltage-driven building block for hardware belief networks. IEEE Design Test 36, 15–21. doi: 10.1109/MDAT.2019.2897964

Hassan, O., Faria, R., Camsari, K. Y., Sun, J. Z., and Datta, S. (2019b). Low-barrier magnet design for efficient hardware binary stochastic neurons. IEEE Magn. Lett. 10, 1–5. doi: 10.1109/LMAG.2019.2910787

Jouppi, N. (2016). Google Supercharges Machine Learning Tasks With TPU Custom Chip. Google Cloud Platform Blog. Google. Available online at: https://cloud.google.com/blog/products/gcp/google-supercharges-machine-learning-tasks-with-custom-chip

Kaiser, J., Faria, R., Camsari, K. Y., and Datta, S. (2019a). Probabilistic circuits for autonomous learning: a simulation study. arXiv [Preprint]. arXiv:1910.06288. Available online at: https://arxiv.org/abs/1910.06288

Kaiser, J., Rustagi, A., Camsari, K. Y., Sun, J. Z., Datta, S., and Upadhyaya, P. (2019b). Subnanosecond fluctuations in low-barrier nanomagnets. Phys. Rev. Appl. 12:054056. doi: 10.1103/PhysRevApplied.12.054056

Kim, S., Gokmen, T., Lee, H. M., and Haensch, W. E. (2017). "Analog CMOS-based resistive processing unit for deep neural network training," in 2017 IEEE 60th International Midwest Symposium on Circuits and Systems (MWSCAS) (IEEE), 422–425. Available online at: https://ieeexplore.ieee.org/abstract/document/8052950

Koller, D., and Friedman, N. (2009). Probabilistic Graphical Models: Principles and Techniques - Adaptive Computation and Machine Learning. The MIT Press. Available online at: https://mitpress.mit.edu/books/probabilistic-graphical-models

Korenkevych, D., Xue, Y., Bian, Z., Chudak, F., Macready, W. G., Rolfe, J., et al. (2016). Benchmarking quantum hardware for training of fully visible Boltzmann machines. arXiv:1611.04528. Available online at: https://arxiv.org/abs/1611.04528

Li, Y., Kim, S., Sun, X., Solomon, P., Gokmen, T., Tsai, H., et al. (2018a). "Capacitor-based cross-point array for analog neural network with record symmetry and linearity," in 2018 IEEE Symposium on VLSI Technology, 25–26. Available online at: https://ieeexplore.ieee.org/document/8510648

Li, Y., Wang, Z., Midya, R., Xia, Q., and Yang, J. J. (2018b). Review of memristor devices in neuromorphic computing: materials sciences and device challenges. J. Phys. D Appl. Phys. 51:503002. doi: 10.1088/1361-6463/aade3f

Neal, R. M. (1992). Connectionist learning of belief networks. Artif. Intell. 56, 71–113. doi: 10.1016/0004-3702(92)90065-6

Ng, A. Y. (2004). "Feature selection, L1 vs. L2 regularization, and rotational invariance," in Twenty-First International Conference on Machine Learning - ICML '04 (Banff, AB: ACM Press), 78.

Scellier, B., and Bengio, Y. (2017). Equilibrium propagation: bridging the gap between energy-based models and backpropagation. Front. Comput. Neurosci. 11:24. doi: 10.3389/fncom.2017.00024

Schmidhuber, J. (2015). Deep learning in neural networks: an overview. Neural Netw. 61, 85–117. doi: 10.1016/j.neunet.2014.09.003

Schneider, C. R., and Card, H. C. (1993). Analog CMOS deterministic Boltzmann circuits. IEEE J. Solid State Circ. 28, 907–914. doi: 10.1109/4.231327

Schuman, C. D., Potok, T. E., Patton, R. M., Birdwell, J. D., Dean, M. E., Rose, G. S., et al. (2017). A survey of neuromorphic computing and neural networks in hardware. arXiv [Preprint]. arXiv:1705.06963 [cs]. Available online at: https://arxiv.org/abs/1705.06963

Sung, C., Hwang, H., and Yoo, I. K. (2018). Perspective: a review on memristive hardware for neuromorphic computation. J. Appl. Phys. 124:151903. doi: 10.1063/1.5037835

Sutton, B., Camsari, K. Y., Behin-Aein, B., and Datta, S. (2017). Intrinsic optimization using stochastic nanomagnets. Sci. Rep. 7:44370. doi: 10.1038/srep44370

Sutton, B., Faria, R., Ghantasala, L. A., Camsari, K. Y., and Datta, S. (2019). Autonomous probabilistic coprocessing with petaflips per second. arXiv [Preprint]. arXiv:1907.09664. Available online at: https://arxiv.org/abs/1907.09664

Terenin, A., Dong, S., and Draper, D. (2019). GPU-accelerated Gibbs sampling: a case study of the Horseshoe Probit model. Stat. Comput. 29, 301–310. doi: 10.1007/s11222-018-9809-3

Tieleman, T. (2008). "Training restricted Boltzmann machines using approximations to the likelihood gradient," in Proceedings of the 25th International Conference on Machine Learning - ICML '08 (Helsinki: ACM Press), 1064–1071. doi: 10.1145/1390156.1390290

Conflict of Interest: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2020 Kaiser, Faria, Camsari and Datta. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
