
Brain-like Computing – Technologies and Architectures

Prof. Pinaki Mazumder University of Michigan Ann Arbor, MI 48105

Acknowledgement: National Science Foundation (CCF & ECCS)

EVOLUTION OF NEUROMORPHIC COMPUTING:

Perceptron (’60) Gartner Hype Cycle (10 E 2 neurons)

Neural Net (’90) (10 E 3 Neurons)

Neuromorphic Hardware (’00) • (10 E 5 neurons) • Spintronics • MQW Devices Multi-level • Nanoscale CMOS Nanocrossbar (10 E 7 neurons)

Designed by Mazumder Group 1992

Perceptron Mark 1 • Hopfield Net Chip • TrueNorth (IBM) ~ 1 M Neurons

Mazumder Group’s Neuromorphic Research

Self-Healing VLSI Design (1989-1996)

• Hopfield Neural Net as Algorithmic Hardware for Spare Allocation by Node Cover over Bipartite Graph

Learning based VLSI Chips (2010-2016)

• STDP Learning for Position Detector
• STDP Learning for Virtual Bug Navigation
• STDP Learning for XOR/Edge Detection
• Q-Learning for Maze Search Algorithm on Memristor Array
• Reinforcement Learning/Actor-Critic NN for Optimal Nonlinear Control Applications

Cellular Neural Networks (2008-2013)

• Color Image Processing
• Velocity Tuned Filter
• Memristor/RRAM based CNN
• RTD+HEMT based CNN

Publications listed on the slide:

• IEEE Trans. on CAS, 1993
• IEEE Trans. on CAS, 1993
• IEEE Trans. on Computer, 1996
• Proceedings of the IEEE, 2012
• IEEE Trans. on VLSI, 2009
• Nano Letters Journal, 2010
• IEEE Trans. on Nanotechnology, 2008
• IEEE Trans. on Computer, 2016
• IEEE Trans. on Neural Nets, 2014
• IEEE Nanotechnology, 2011
• IEEE Trans. on Nanotechnology, 2013
• IEEE Nanotechnology, 2014
• ACM Journal on Emerging Technologies in Computing Systems, 2013
• IEEE Cellular Neural Networks, 2012

Neuromorphic Self-Healing Memory Design using Memristor Array

• Product Code (SEC), Augmented PC (DEC) → Requires Muxes (~8%)
• Hamming Code (SEC) → Higher Overhead
• Projective Geometry Code (DEC/TED) → Galois Field Decoding, …
• BCH & Reed Solomon (DEC) → Decoding complexity is high

High Density a-Si Based Nano-Crossbar

1kb crossbar array

Jo et al., Nano Lett., 9, 870 (2009). From Prof. Wei Lu’s Research Group. It is Scalable and CMOS Compatible.

[Figure: 20 x 20 crossbar map with Ag and p-Si electrodes numbered 1 to 20, fabbed by Wei Lu’s Group at Univ. of Michigan. Resistance legend: < 50 kΩ; 50 ~ 150 kΩ; 150 ~ 300 kΩ; 300 kΩ ~ 1 MΩ; Not Written]

Defects! & More Defects!! 400 cells & 31 defects = 8%

Compaction of Faulty Array & Find Vertex Cover in Bipartite Graph

[Figure: bipartite graph with node labels R1, R2, C1, C3]

Find Vertex Cover: [R1, R2; C1, C3] (Mazumder & Yih, IEEE Trans. on CAD, 1992)

Defective Cells are Edges in Bipartite Graph
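The mapping from defective cells to a vertex cover can be illustrated in software. The sketch below is a textbook approach (maximum bipartite matching followed by König's construction), not the neural-network hardware method of the cited papers. The defect list is hypothetical and was chosen only so that the resulting cover matches the [R1, R2; C1, C3] example on the slide.

```python
def min_vertex_cover(defects):
    """Return (rows, cols) forming a minimum vertex cover of the defect graph.

    defects: list of (row, col) pairs, one edge per faulty cell.
    """
    rows = sorted({r for r, _ in defects})
    adj = {r: sorted({c for rr, c in defects if rr == r}) for r in rows}
    row_of = {}  # matched column -> row
    col_of = {}  # matched row -> column

    def augment(r, seen):
        # Kuhn's augmenting-path step: try to find a column for row r.
        for c in adj[r]:
            if c in seen:
                continue
            seen.add(c)
            if c not in row_of or augment(row_of[c], seen):
                row_of[c] = r
                col_of[r] = c
                return True
        return False

    for r in rows:
        augment(r, set())

    # Koenig's construction: follow alternating paths from unmatched rows.
    stack = [r for r in rows if r not in col_of]
    seen_rows, seen_cols = set(stack), set()
    while stack:
        r = stack.pop()
        for c in adj[r]:
            if c in seen_cols:
                continue
            seen_cols.add(c)
            partner = row_of[c]  # every reached column is matched
            if partner not in seen_rows:
                seen_rows.add(partner)
                stack.append(partner)

    cover_rows = [r for r in rows if r not in seen_rows]
    return cover_rows, sorted(seen_cols)


# Hypothetical defect map: each pair is one faulty cell (row line, column line).
defects = [("R1", "C2"), ("R1", "C4"), ("R2", "C5"), ("R2", "C6"),
           ("R3", "C1"), ("R4", "C1"), ("R5", "C3"), ("R6", "C3")]
cover = min_vertex_cover(defects)
print(cover)
```

Replacing the returned rows and columns with spares repairs every faulty cell in this example. Limits on the number of spare rows and spare columns are not modeled here.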

Unrestricted Vertex Cover Problem can be solved in Polynomial Time by Bipartite Graph Matching Algorithm. However, Restricted Vertex Cover Problem is NP-complete.

Neuromorphic Self-Healing for Any Type of Memory Array

With BISR 100%

Without BISR 20%

M.D. Smith & P. Mazumder, IEEE Trans. on Computers, Vol. 45, No. 1, Jan. 1996, pp. 109-115.

10,000 trans.

Fusion of Sensing & Processing

Nanoscale Cellular Neural Network (CNN) by Using Quantum Tunneling Devices & Memristor Array

Fig. 2 Schematic view of the pyramid-shape InGaAs quantum dot.

3-D Confined Quantum Box Array

Memristor Array: Metal / a-Si / p-Si stack, shown in the Low Resistance State (LRS) and the High Resistance State (HRS)

3D-SBEM-TIS flow (IEEE Trans. on Nanotech, 2008):

1. Start: define quantum dot structure
2. Save mass (e and h), band alignment, doping distribution in files
3. Calculate 2D wave function Φ in xy plane
4. Solve 1D Schrodinger equation (in z direction) to obtain χ(z)
5. Using boundary condition to get 3D S-Matrix
6. Transmission rate T(E): T = T(N) ⋅ T(N−1) ⋅⋅⋅ T(2) ⋅ T(1)
7. Integrate T(E) over all possible energy to obtain the tunneling current

[Figure: 2D wave function Φ over the X and Y axes; χ(z) versus z (nm)]

To Pixel: Edge Detection, Vertical Line Detection
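The cascaded product T = T(N) ⋅⋅⋅ T(1) in the flow above can be illustrated with a much simpler one-dimensional analogue. The sketch below is not the 3D-SBEM-TIS solver: it assumes piecewise-constant potentials, a single effective mass, zero potential outside the layers, and units in which ħ²/2m = 1. The function names are hypothetical.

```python
import cmath


def layer_matrix(energy, potential, width):
    """2x2 matrix that carries (psi, dpsi/dx) across one constant-potential layer."""
    k = cmath.sqrt(energy - potential)  # imaginary inside a barrier (E < V)
    if k == 0:
        return ((1.0, width), (0.0, 1.0))
    return ((cmath.cos(k * width), cmath.sin(k * width) / k),
            (-k * cmath.sin(k * width), cmath.cos(k * width)))


def matmul2(p, q):
    """Product of two 2x2 matrices given as nested tuples."""
    return ((p[0][0] * q[0][0] + p[0][1] * q[1][0],
             p[0][0] * q[0][1] + p[0][1] * q[1][1]),
            (p[1][0] * q[0][0] + p[1][1] * q[1][0],
             p[1][0] * q[0][1] + p[1][1] * q[1][1]))


def transmission(energy, layers):
    """Transmission probability through layers [(potential, width), ...].

    Layer 1 is listed first; the total matrix is T(N) ... T(2) T(1).
    Requires energy > 0 because the leads sit at zero potential.
    """
    total = ((1.0, 0.0), (0.0, 1.0))
    for potential, width in layers:
        total = matmul2(layer_matrix(energy, potential, width), total)
    k0 = cmath.sqrt(energy)
    (a, b), (c, d) = total
    # Match an incoming plus reflected wave on the left to an outgoing wave on the right.
    t = 2.0 / (a + d - 1j * k0 * b + 1j * c / k0)
    return abs(t) ** 2


# Single rectangular barrier: height 1.0, width 2.0, electron energy 0.5.
print(transmission(0.5, [(1.0, 2.0)]))
```

Splitting the barrier into two half-width layers gives the same transmission, which is a quick check that the matrices are multiplied in the right order.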

(C) 2015 Professor Pinaki Mazumder, University of Michigan

Analog Programmable Nano-Architecture for Static and Dynamic Image Processing

[Figure: (a) I-V characteristic, Current (A) from 10^-11 to 10^-3 versus Voltage (V) from 0 to 3; cell schematic with Input (I), Center (C), Output (O), Memristor, and RTD. (b) Qdot + Memristor Chip] IEEE Trans. on Nanotechnology, 2013

Analog Programmable Unconventional Computing

IEEE Trans. on Nanotechnology, 2013

Qdot + Memristor Chip: Edge Detection, HLD & VLD

Molecular Computer?

Spike Timing Dependent Plasticity (STDP) Learning Networks using Biological Neuron Model

Ionic Transport in Biological Neuron & its Silicon Implementation

Hodgkin-Huxley Model

Spike Timing Dependent Plasticity (STDP) Learning Networks

Pre

Post

Synaptic Weight
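The slide relates pre- and post-synaptic spike timing to the synaptic weight without stating a formula. As a rough illustration, the sketch below uses the common pair-based exponential STDP window. The rule itself, the amplitudes `a_plus` and `a_minus`, and the time constants are assumptions for illustration, not values taken from the talk.

```python
import math


def stdp_delta_w(t_pre, t_post, a_plus=0.010, a_minus=0.012,
                 tau_plus=20e-3, tau_minus=20e-3):
    """Weight change for one pre/post spike pair (times in seconds).

    Pre before post strengthens the synapse; post before pre weakens it.
    All default parameters are hypothetical.
    """
    dt = t_post - t_pre
    if dt > 0:
        return a_plus * math.exp(-dt / tau_plus)
    if dt < 0:
        return -a_minus * math.exp(dt / tau_minus)
    return 0.0


# Pre spike 5 ms before the post spike, then the reverse ordering.
print(stdp_delta_w(0.000, 0.005))
print(stdp_delta_w(0.005, 0.000))
```

In this window the magnitude of the change decays as the two spikes move further apart in time.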

Previous Approach of STDP Implementation On Crossbar with Constant Amplitude

Programming pulses with different pulse widths

100 µs

Conductance change after each pulse

1. Digitally Controlled

2. Constant Amplitude

3. Temporal Correlation

Position Detector Application

• Split up area into a 5x5 grid

• Each grid cell has one neuron that is connected to an adjacent neuron through an STDP synapse

• Detect a light source at any position on the grid

• No post-processing circuitry necessary

• k-Winner-Take-All (k-WTA) implementation
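The k-WTA selection behind the position detector can be sketched in a few lines. This is a software illustration only: it assumes each of the 5x5 neurons reports a scalar activation proportional to the light it receives, and it leaves out the STDP synapses between neighbors. The function names and the sample grid are hypothetical.

```python
def k_winner_take_all(activations, k=1):
    """Return a 0/1 list marking the k largest activations."""
    order = sorted(range(len(activations)),
                   key=lambda i: activations[i], reverse=True)
    winners = set(order[:k])
    return [1 if i in winners else 0 for i in range(len(activations))]


def locate_light(grid):
    """Return (row, col) of the single winning neuron on a rectangular grid."""
    flat = [value for row in grid for value in row]
    winner = k_winner_take_all(flat, k=1).index(1)
    return divmod(winner, len(grid[0]))


# Hypothetical 5x5 light intensities with the source near row 2, column 3.
grid = [[0.0, 0.0, 0.1, 0.1, 0.0],
        [0.0, 0.1, 0.2, 0.4, 0.2],
        [0.0, 0.1, 0.4, 0.9, 0.4],
        [0.0, 0.0, 0.2, 0.4, 0.2],
        [0.0, 0.0, 0.0, 0.1, 0.0]]
print(locate_light(grid))
```

Raising `k` marks several neighboring cells at once, which is one way to report a light source that straddles a cell boundary.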

STDP Neural Circuit for Position Detector

Basis              Memristor Design         CMOS Design            Ratio
Synaptic area      < (0.5 µm x 0.5 µm)      17 µm x 16 µm
Synaptic density   > 4 devices/µm2          0.0037 devices/µm2     x1000
Neuron area        20 µm x 10 µm            8 µm x 12 µm
Neuron density     0.005 devices/µm2        0.0104 devices/µm2     x2
Volatility         Nonvolatile              Volatile

Ebong and Mazumder, Proceedings of the IEEE, Feb. 2012.

STDP Circuits for XOR/Edge Detector

(Inhibitory = 20 MΩ, Excitatory = 5.6 MΩ)

Input = 011110; Output = 010010

n23 = n12 XOR n14 = 1 when n12 = 0 & n14 = 1

n23 = n12 XOR n14 = 0 when n12 = 1 & n14 = 1

Basic neuromorphic functions (WTA, XOR, Edge Detector) were implemented using the STDP learning scheme. IEEE Nano 2011

Networks

Memristor Based Q-Learning Network

CMOS Digital Actor-Critic Network

Q-Learning Hardware – Reinforcement Learning

[Figure labels: Evaluative Feedback (Rewards); Controller; Neural Network; C1, C2; Environment; Actor; current state; next state; action]

Proceedings of the 14th IEEE International Conference on Nanotechnology, Toronto, Canada, August 18-21, 2014

Iterative Architecture for Value Iteration using Memristors

Idongesit E. Ebong, Member, IEEE and Pinaki Mazumder, Fellow, IEEE

Abstract: Memristors promise higher device density and design flexibility. Besides utilizing memristors for digital memory, another promising avenue for adoption is the advancement of neural network circuits capable of learning. Neural network implementations with memristors have been proposed, including memristor synaptic training methodologies. This work highlights applications of a neural learning methodology inspired by Q-learning. Memristors are used as analog storage elements to store a large Q-table. The method is qualified with a maze problem in order to show that the proposed network can be used to learn to approximate an optimal path to solving the maze problem. Brief results highlighting the methodology on a maze problem and discussion on generating random keys are provided. This work combines model-free reinforcement learning with neural networks.

I. INTRODUCTION

Memristors [1],[2] have been proposed for use in different applications to alleviate hardware complexity issues related to CMOS-only hardware in the sub-22nm scaling regime. These applications include FPGA, cellular neural networks, digital memory, and programmable analog resistors. In addition, memristors have been proposed for higher-level algorithmic implementations including instar and outstar training [3], performing optimal control, modeling the visual cortex, etc [4],[5]. The use of memristors in Boolean computing has several limitations [6] since direct control of memristors is very imprecise. Therefore, application areas where precise resistance values are not required better suit these devices, as shown in fuzzy system implementations using memristors [7].

This work accepts the imprecise memristor values and follows a similar route by striving to realize value iteration [8] with memristor hardware and providing two applications of the chosen architecture. Q-Learning learns state-action values (Q-values); storing these Q-values in tabular form for the entire state-action space is shown to reach optimal solutions even under exploration. The drawback associated with this approach is that the memory size requirement for the tabular form of the algorithm may be prohibitive. In order to alleviate this problem, function approximators have been used to reduce memory size. Function approximators though need careful design since poor design may lead to divergence. We propose using a memristor crossbar instead. Section II deals with the mathematical details, Section III introduces the Maze application and the prescribed hardware, Section IV provides simulation results and discussion, and Section V concludes the paper.

Fig. 1. Maze example showing starting position (green square) and ending position (red square)

II. Q-LEARNING AND MEMRISTOR MODELING

The Q-learning algorithm [8] is provided in (1). Equation (1) provides the update for the estimated Q-value ($\tilde{Q}$) at the current state ($s_t$) and action taken at the current state ($a_t$). $\alpha_t$ is a learning parameter, and $r_t$ is the reward. In this form of Q-learning, the model of the environment does not need to be accurate. Therefore after learning, the exact reward values do not affect the overall behavior of the network [9].

$$\tilde{Q}(s_t,a_t) \leftarrow \tilde{Q}(s_t,a_t)\,\bigl(1-\alpha_t(s_t,a_t)\bigr) + \alpha_t(s_t,a_t)\,r_t + \alpha_t(s_t,a_t)\,\max_{a_{t+1}}\tilde{Q}(s_{t+1},a_{t+1}) \tag{1}$$

The MAX function in (1) will produce a value that is a linear combination between $\tilde{Q}(s_t,a_t)$ and another value $\delta_t$. The value of $\delta_t$ can be zero, positive or negative. It is a correcting factor that discerns how far apart the Q-value of the current state-action pair is from the Q-value of the next state-action pair. Therefore (1) can be written as (2):

$$\tilde{Q}(s_t,a_t) \leftarrow \tilde{Q}(s_t,a_t) + \alpha_t(s_t,a_t)\,(r_t + \delta_t) \tag{2}$$

since $\max_{a_{t+1}}\tilde{Q}(s_{t+1},a_{t+1}) = \tilde{Q}(s_t,a_t) + \delta_t$.

Neural network inspired approaches have been shown to efficiently perform maximizing and minimizing functions [10]. Memristors in the crossbar configuration are not only used as memory but also as processing elements. By monitoring the current through selected memory devices, the MAX function can be evaluated in parallel.
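The equivalence between the two forms of the Q-value update can be checked numerically. The sketch below writes the correction term as `delta`, defined as the maximum next-state Q-value minus the current Q-value. The function names and the sample numbers are illustrative assumptions.

```python
def q_update_form_1(q_current, alpha, reward, max_q_next):
    """Update in the form of (1): blend the old value, the reward, and the MAX term."""
    return q_current * (1.0 - alpha) + alpha * reward + alpha * max_q_next


def q_update_form_2(q_current, alpha, reward, max_q_next):
    """Update in the form of (2): old value plus alpha times (reward + delta)."""
    delta = max_q_next - q_current  # how far the next state's best value is from the current one
    return q_current + alpha * (reward + delta)


# Illustrative numbers only.
print(q_update_form_1(0.4, 0.5, 0.1, 0.7))
print(q_update_form_2(0.4, 0.5, 0.1, 0.7))
```

Both functions return the same number for any inputs, because substituting the definition of `delta` into the second form and expanding gives the first.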
The next step is to cast (1) in a form that parallels memristor properties. Using the linear-drift dopant diffusion model [11], the resistive value of a memristor is:

$$M_T = R_0\sqrt{1 - \frac{2\,\eta\,\Delta R\,\phi(t)}{Q_0 R_0^2}} \tag{3}$$

$M_T$ is the total memristance, $R_0$ is the initial resistance of the memristor, $\eta$ can be viewed as the memristor pin configuration with respect to applied voltage that takes on a value of $\pm 1$, $\Delta R$ is the memristor's resistive range (difference between maximum resistance $R_{OFF}$ and minimum resistance $R_{ON}$), $\phi(t)$ is the total flux through the device, and $Q_0$ is the charge required to pass through the memristor for the dopant boundary to move a distance comparable to the device width.

By choosing a constant voltage pulse $V_{app}$ and applying this constant pulse for a specified time $t_{spec}$, (3) can be discretized into $n$ different applications of $V_{app}$ for $t_{spec}$. In addition, we have shown through [12] that this discretization leads to memristors behaving as (4). In (4), $a = 1 - \beta V_{app} t_{spec}\, n$, $b = \beta V_{app} t_{spec}$, $\beta$ is defined as $(2\eta\Delta R)/(Q_0 R_0^2)$, and $n$ is an integer.

$$M_T \approx \begin{cases} R_0, & n = 0 \\ R_0 - \sum_{n} R_0\sqrt{a}\left[\dfrac{1}{2}\dfrac{b}{a} + \dfrac{1}{8}\left(\dfrac{b}{a}\right)^{2}\right], & n > 0 \end{cases} \tag{4}$$

For the intended application, Hebbian learning is envisioned. Therefore (4) is valid if updating the memristor in one direction, ensuring the resistance is always increasing with increasing $n$. The hardware that uses this memristor is described next.
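The discretized memristor model can be checked numerically. The sketch below assumes that after n pulses the flux is V_app · t_spec · n, writes the coefficient (2ηΔR)/(Q0 R0²) as `BETA` (a name chosen here), and uses the parameter values quoted in the paper's simulation section. It compares the exact square-root model with the two-term series for the resistance change produced by one more pulse.

```python
import math

R0 = 2e6        # initial resistance, ohms
BETA = -199.8   # (2 * eta * delta_R) / (Q0 * R0**2), in 1/(V s)
V_APP = 1.2     # programming pulse amplitude, volts
T_SPEC = 2e-3   # programming pulse width, seconds


def memristance(n):
    """Memristance after n identical pulses, from the square-root model."""
    return R0 * math.sqrt(1.0 - BETA * V_APP * T_SPEC * n)


def pulse_increment(n):
    """Two-term series estimate of memristance(n + 1) - memristance(n)."""
    a = 1.0 - BETA * V_APP * T_SPEC * n
    b = BETA * V_APP * T_SPEC
    x = b / a
    return -R0 * math.sqrt(a) * (0.5 * x + 0.125 * x * x)


exact_step = memristance(51) - memristance(50)
series_step = pulse_increment(50)
print(exact_step, series_step)
```

With a negative `BETA` every pulse raises the resistance, which is the one-directional update the text requires. The series estimate is tightest when b/a is small, that is, after many pulses.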
I. E. Ebong is with the Department of Electrical Engineering and Computer Science at University of Michigan, Ann Arbor, MI 48109 USA. P. Mazumder is a professor with the Department of Electrical Engineering and Computer Science at University of Michigan, Ann Arbor, MI 48109 USA. 978-1-4799-5622-7/$31.00 ©2014 IEEE

III. MAZE APPLICATION AND HARDWARE ARCHITECTURE

Given the maze in Fig. 1, we would like to train, through value iteration, the generation of optimal actions to reach the target (RED) from the start position (GREEN). The 16 × 16 maze shows admissible states in white and inadmissible states in black. Our approach to solving this maze using memristors is to store the value of each state (admissible or inadmissible) in a memristor crossbar. A 16 × 16 memristor crossbar array is therefore needed to store all values. The maze pattern can be preprogrammed to the crossbar array whereby inadmissible states are programmed to $R_{OFF}$ and the admissible states are programmed to values around an initial resistance $R_0$. The search space is discretized into time periods where one move is made per unit time. Each move made must either progress to adjacent states or stay at the current state. For example, if at time period p=1 the current state is the green square, then the three valid states for transition are the two adjacent white squares and the green square. Decisions regarding state transition are made by obtaining the stored values of valid states in reference to the present state. The drawback to this one-step lookahead approach is its limited depth search that takes longer for training to converge to an approximation of the optimal path from start to end.

The memristor crossbar is used to store state values. Therefore, to reduce hardware complexity with respect to accessing the crossbar, two crossbars are used: one stores values in order (Network 1 in Fig. 2) while the other crossbar stores values of Network 1 mirrored about the diagonal from the top left corner to the bottom right corner (Network 2 in Fig. 2). The top level system in Fig. 2 shows an agent acting on the environment. The components of the system are: controller, memristor network, comparators (C blocks), and actor. The actor performs chosen actions, the comparators compare two values within the memristor crossbar, the memristor network performs the MAX function and generates next state information, and the controller coordinates communication between all components. The two memristor networks have the same components and a detailed network schematic is also provided in Fig. 2.

The network blocks have two sets of neurons (1st and 2nd layer), allowing access to the value of each state on the memristor crossbar array. This architecture approximates a recurrent neural network. Network 1 is the forward path and Network 2 is the feedback path. For example, in the maze application, neurons correspond to horizontal and vertical coordinates. At any given time the admissible actions are: stay at current state, move one space in any diagonal, horizontal, or vertical direction. Network 1 determines the next Y position, while Network 2 determines the next X position. The controller coordinates actions of the networks using four control phases: Start, Run, Check, and Train.

The Start phase is a wait phase whereby the crossbar network is not accessed. All the switches in Fig. 2 are open, all input and output neurons disabled, and output registers are zeros. In the Run phase the network obtains the next position; the first neuron to spike will have its corresponding output register latch a “1” while the others are “0,” and will provide a signal to the controller that this phase is complete. In the Check phase the digital network asserts $V_{READ}$ and connects $R_{LOAD}$ to decipher the values stored at two locations (the value of current state vs. that of the next state). If the current state's sampled voltage is equal to or greater than the next state's voltage, then a punish signal is generated. In the Train phase, the punish signal is used to reduce the weight of the current state. The neural network approach is used to translate the time to spike to approximate the environment.

The architecture is a hybrid architecture that combines analog processing with digital controls. This architecture is used to approximate value iteration and is tested in the context of a simple maze problem.

Fig. 2. (Top) System top level (Bottom) Schematic of the Network Components

Fig. 3. (Left) Activation of neurons (Right) Equivalent circuit of activated devices

Fig. 4. Effect of $v_{thresh}$ on the charging time to spike
For the maze application, value iteration updates based on (2), but the exact nature of the update term, $\alpha_t(s_t,a_t)\,(r_t + \delta_t)$, has not clearly been defined. In the maze problem, $\alpha_t(s_t,a_t)$ will be limited to take on a value of either 1 or 0 and the sum of $r_t$ and $\delta_t$ can be cast to take on the value of $\Delta M_T$, which corresponds to the $R_0\sqrt{a}\left[\frac{1}{2}\frac{b}{a} + \frac{1}{8}\left(\frac{b}{a}\right)^2\right]$ term in (4). This proposed matching works in this application because the envisioned system has memristors initialized around $R_0$ and any memristor updates will adjust resistance by $\Delta M_T$. $\alpha_t(s_t,a_t)$ is 1 if $\tilde{Q}(s_t,a_t)$ is greater than or equal to $\tilde{Q}_{max}(s_{t+1},a_{t+1})$, otherwise $\alpha_t(s_t,a_t)$ is 0. The punish signal generated in the Check phase determines which value $\alpha_t(s_t,a_t)$ takes. This restriction on $\alpha_t(s_t,a_t)$ ensures $\tilde{Q}_{max}(s_{t+1},a_{t+1})$ is always decreased when updated since $r_t + \delta_t$ is always a positive number.

Fig. 5. (a,b) Near optimal paths for two mazes from start to finish (c,d) Number of steps till convergence for each maze

[Fig. 5 plot residue: # steps versus iteration; annotated points (26,45) and (15,28)]

The value for $\alpha_t(s_t,a_t)$ depends on $\tilde{Q}_{max}(s_{t+1},a_{t+1})$, and $\tilde{Q}_{max}(s_{t+1},a_{t+1})$ is obtained from the neuromorphic side of the circuit. A simple leaky integrate and fire neuron should work for this purpose. The left schematic in Fig. 3 is used to explain the nearest neighbor concept. From a current X position and a current Y position, the switch corresponding to $X_j$ is activated and $Y_{i-1}$, $Y_i$, and $Y_{i+1}$ are enabled. Using RC integrators to model neuron internal state, the equivalent circuit for these activated devices is shown (Fig. 3 right). The first order RC circuit shows that the internal state of the neurons takes on the form $v_n\left(1 - e^{-t/(M_{ij} C_{int})}\right)$. By choosing a spiking threshold $v_{thresh}$ for the neurons less than $v_n$, neuron j can spike whenever $v_n\left(1 - e^{-t/(M_{ij} C_{int})}\right)$ reaches $v_{thresh}$. The difference between the activated neurons lies in $t^{j}_{spike}$, how long it takes for neuron j to spike:

$$t^{j}_{spike} > -M_{ij} \cdot C_{int} \cdot \ln\left(1 - \frac{v_{thresh}}{v_n}\right) \tag{5}$$

Figure 4b shows the graph of (5) and how the choice of $v_{thresh}$ can affect circuit operation. Since the MAX function depends on the comparison of spike times of different neurons, separation of these spike times for different n values is critical for correct circuit operation. By increasing $v_{thresh}$, while other parameters are kept at their previous levels, there is a wider change in spike time. The quoted times are in µs, and if transistors used for implementation are sensitive to the hundreds of nanosecond range, then there should be minimal problem detecting the larger of n=50 and n=51.
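The time-to-spike relation can be turned into a small numerical check of the "first neuron to spike wins" behavior. The sketch below uses the bound as an equality, takes C_int = 1 pF and v_thresh/v_n = 0.75 from the paper's simulation section, and uses made-up memristance values for three neighboring states. The function names are hypothetical.

```python
import math

C_INT = 1e-12        # integration capacitance, farads
THRESH_RATIO = 0.75  # v_thresh / v_n


def time_to_spike(memristance_ohms, c_int=C_INT, thresh_ratio=THRESH_RATIO):
    """Time for the RC-integrated neuron state to reach the spiking threshold."""
    return -memristance_ohms * c_int * math.log(1.0 - thresh_ratio)


def first_to_spike(memristances):
    """Index of the neuron that spikes first, i.e. the lowest memristance."""
    times = [time_to_spike(m) for m in memristances]
    return times.index(min(times))


# Hypothetical neighbors: a punished state, a fresh state near R0, a wall at R_OFF.
neighbors = [2.5e6, 2.0e6, 20e6]
print(first_to_spike(neighbors))
print(time_to_spike(2.0e6))
```

Because the spike time scales linearly with memristance, the MAX over stored values reduces to detecting the earliest spike, and a larger threshold ratio stretches the gap between neighboring spike times.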

968 IV. RESULTS AND DISCUSSION Learning method would provide a way to adjust weights in order while avoiding the hardware cost of having outside The current simulation results for the models derived cache memories to keep track of which synaptic elements vn 16 were obtained through MATLAB. The parameters used for )

s need to be adjusted.

μ Vthresh=0.9

( simulation were vthresh/vn =0.75, Vapp=1.2 V, Cint=1 pF,

14 s Vthresh=0.2 1 1 e t R ⌦ R ⌦ k spec=2 ms, =-199.8 V s , 0=2 M , ON =20 k , V. C ONCLUSION -1 i 12 ,j Cint i p ·

M s

and ROFF =20 M⌦. Fig. 5a and Fig. 5b provide path from

N2j-1 n 10 We have shown the concept of value iteration being M o RLOAD i,j-1 r start to finish when using the architecture of Fig. 2. Fig. 5c

Output u applied to the memristor crossbar in a way that is realizable N2 Registers Mi,j e 8 j n

and Fig. 5d, respectively, provide the relationship for number v M C M e i,j n i int with the aid of CMOS hardware. We have shown how , r j+ 6 1 o of steps to reaching the target vs. number of training stages N2 f maze learning can be implemented using the crossbar along j+1 e

M C b i,j+1 int 4 for convergence. The hardware prefers exploration in the first

e the same lines as value iteration with Q-values. We have m

i iteration. During the second iteration, the number of steps is t 2 dissected the memristor modeling equation to show that the N1 N1 N1 i-1 i i+1 0 drastically reduced. The maze in Fig. 5a has two paths to neural network model whereby state information can be 0 10 20 30 40 50 target, but the shorter route is chosen. Fig. 5b has a longer translated to delayed spike timing is shown. Overall, the Fig. 3. (Left) Activation of neurons (Right) Equivalent circuit of activated n path to target, hence, it takes longer to converge. target of the proposed implementation is a mixed signal devices The paths shown in the solutions are near-optimal, with design that combines the benefits of analog circuits, digital Fig. 4. Effect of v on the charging time to spike thresh the suboptimal moves circled. After convergence, memristor domain spike representations, and the use of memristors as updates cease on their own, therefore, ensuring the hardware synapses. zeros. In the Run phase the network obtains the next position; t/Mij Cint will monitor its own performance. This quality allows for of the neurons take on the form vn(1 e ). By The next step is to verify this use of memristor crossbar the first neuron to spike will have its corresponding output this application to be extended to key generation using register latch a “1” while the others are “0,” and will provide choosing a spiking threshold vthresh for the neurons less than in hardware and to apply the concept to various other appli- t/Mij Cint memristive neural hardware. This approach can generate v , neuron j can spike whenever v (1 e ) reaches cations. As this method deals with the process of adjusting a signal to the controller that this phase is complete. In the n n vthresh. The difference between the activated neurons lies in multiple keys due to the nature of the architecture. Fig. 
6 Check phase the digital network asserts VREAD and connects memristor values on the crossbar in a non-programmable tj , how long it takes for neuron j to spike: plots relative values of the memristor on the Z axis vs. RLOAD to decipher the values stored at two locations (the spike manner, the applications that would benefit from it would value of current state vs. that of the next state). If the current the position on a 2D grid. From one iteration to the next, need to be neuromorphic ones where the memristor crossbar j vthresh the entire crossbar could look very different, so if multiple state’s sampled voltage is equal to or greater than the next t > Mij Cint ln 1 (5) array needs to be adjusted in its entirety. Providing a method spike · · v state’s voltage, then a punish signal is generated. In the Train ✓ n ◆ random keys were to be generated using the fabric on the for this to happen algorithmically without preprogramming phase, the punish signal is used to reduce the weight of Figure 4b shows the graph of (5) and how the choice of left, it would be very hard to reconstruct that key using the would be the best course of action. the current state. The neural network approach is used to vthresh can affect circuit operation. Since the MAX function fabric on the right. translate the time to spike to approximate the environment. depends on the comparison of spike times of different The current hardware is very specific to the maze search REFERENCES The architecture is a hybrid architecture that combines both neurons, separation of these spike times for different n values application but is useful in neural processing systems as a [1] L. O. Chua, “Memristor - missing circuit element,” IEEE Transactions analog processing with digital controls. This architecture is critical for correct circuit operation. By increasing vthresh, whole. Neural systems rely on the updating of synapses be- on Circuit Theory, vol. CT18, no. 5, pp. 507–519, 1971. 
while other parameters are kept at their previous levels, there [2] D. B. Strukov, G. S. Snider, D. R. Stewart, and R. S. Williams, “The is used to approximate value iteration and is tested in the tween neurons based on network activity as time progresses. missing memristor found,” Nature, vol. 453, no. 7191, pp. 80–83, context of a simple maze problem. is a wider change in spike time. The quoted times are in µs, The Q-Learning inspired approach traverses a network of 2008. For the maze application, value iteration updates based on and if transistors used for implementation are sensitive to the synapses in particular orders therefore one very good use [3] G. Snider, “Instar and outstar learning with memristive nanodevices,” hundreds of nanosecond range, then there should be minimal Nanotechnology, vol. 22, no. 1, p. 015201, 2011. [Online]. Available: (2), but the exact nature of the update term, ↵t(st,at) of this method would be the adjusting of synapse values ⇥ problem detecting the larger of n=50 and n=51. http://stacks.iop.org/0957-4484/22/i=1/a=015201 (rt + t), has not clearly been defined. In the maze problem, Performance of Memristor Q-Learningrelative to one another Hardware without actually preprogramming the [4] P. Mazumder, S. M. Kang, and R. Waser, “Memristors: Devices, ↵t(st,at) will be limited to take on a value of either 1 or 0 sequence of updates. With the uncertainties in memristors, models, and applications [scanning the issue],” Proceedings of the and the sum of rt and t can be cast to take on the value of IEEE, vol. 100, no. 6, pp. 1911 –1919, june 2012. 2 effectively programming the devices to specific resistances [5] P. J. Werbos, “Memristors for more than just memory: How to M which corresponds to the R pa 1 b + 1 b T 0 2 a 8 a are out of the question. By using a method like this that use learning to expand applications,” in Advances in Neuromorphic term in (4). 
This proposed matching worksh in this applicationi allows synaptic devices to be adjusted, one after another in Memristor Science and Applications, ser. Springer Series in Cognitive and Neural Systems, R. Kozma, R. E. Pino, G. E. Pazienza, J. G. because the envisioned system has memristors initialized order to meet overall processing requirements may be the around R and any memristor updates will adjust resistance Taylor, and V. Cutsuridis, Eds., vol. 4. Springer Netherlands, 2012, 0 breakthrough necessary to finally utilizing these devices as pp. 63–73. by M . ↵ (s ,a ) is 1 if Q˜(s ,a ) is greater than or T t t t t t synapses. In neuromorphic applications, the absolute synap- [6] I. E. Ebong and P. Mazumder, “Self-controlled writing and erasing in equal to Q˜max(st+1,at+1), otherwise ↵t(st,at) is 0. The a memristor crossbar memory,” Nanotechnology, IEEE Transactions punish signal generated in the Check phase determines which tic weight value does not matter as long as the relative weight on, vol. 10, no. 6, pp. 1454–1463, 2011. relationship of the weight matrix are maintained. This Q- [7] F. Merrikh-Bayat and S. Bagheri Shouraki, “Memristive neuro-fuzzy value ↵t(st,at) takes. This restriction on ↵t(st,at) ensures Memristor Model Used in ˜ (a) (b) system,” Systems, Man, and , Part B: Cybernetics, IEEE Qmax(st+1,at+1) is always decreased when updated since Matlab Simulation Transactions on, vol. PP, no. 99, pp. 1–17, 2012. 1000 1200 rt + t is always a positive number. [8] C. J. C. H. Watkins, “Learning from Delayed Rewards,” Ph.D. disser- ˜ 800 1000 tation, Cambridge University, 1989. The value for ↵t(st,at) depends on Qmax(st+1,at+1), 800 600 0 10 [9] B. Balleine, N. Daw, and J. O’Doherty, “Multiple Forms of Value and Q˜max(st+1,at+1) is obtained from the neuromorphic (26,45) 600 0 Learning and the Function of Dopamine,” in Neuroeconomics: decision −10 # steps 400 side of the circuit. A simple leaky integrate and fire neuron (15,28) −10 making and the brain, P. W. 
Glimcher, Ed. Academic Press, 2009,

# steps 400 −20 should work for this purpose. The left schematic in Fig. 3 is 200 200 −20 pp. 367–385. −30 used to explain the nearest neighbor concept. From a current −30 [10] J. J. Hopfield and D. W. Tank, “‘Neural’ computation of decisions 0 10 20 0 10 20 30 −40 −40 in optimization problems,” Biological Cybernetics, vol. 52, no. 3, pp. X position and a current Y position, switch corresponding 20 20 iteration iteration 15 20 15 20 141–152, 1985. 10 15 10 15 10 10 to Xj is activated and Yi 1, Yi, and Yi+1 are enabled. Using 5 5 [11] Y. N. Joglekar and S. J. Wolf, “The elusive memristor: properties of (c) (d) X 5 X 5 0 0 Y 0 0 RC integrators to model neuron internal state, the equivalent Y basic electrical circuits,” European Journal of Physics, vol. 30, no. 4, pp. 661–675, 2009. circuit for these activated devices is shown (Fig. 3 right). Fig. 5. (a,b) Near optimal paths for two mazes from start to finish Fig. 6. (Left) Memristor Fabric with relative values after one iteration [12] I. E. Ebong, “Training memristors for reliable computing,” Ph.D. The first order RC circuit shows that the internal state (c,d) NumberIdong of steps and till Mazumder, convergence for IIEEE each maze Nano 2014 Synapse States after dissertation, The University of Michigan, 2013. (Right)1st Memristor and Fabric 2 withnd relative Iterations weights after two iterations 969 970 114 P. Mazumder et al. / INTEGRATION, the VLSI journal 54 (2016) 109–117
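The first-to-spike selection around Eq. (5) in the memristor Q-learning pages can be sketched numerically. This is a minimal illustration, not the authors' MATLAB model: it evaluates the bound in (5) as an equality, takes $C_{int}$ = 1 pF and $v_{thresh}/v_n$ = 0.75 from the quoted simulation parameters, and uses hypothetical neighbour memristances near $R_0$ = 2 MΩ. The function names are assumptions introduced for the example.

```python
import math

C_INT = 1e-12          # integration capacitance, 1 pF (from the quoted parameters)
THRESH_RATIO = 0.75    # v_thresh / v_n (from the quoted parameters)

def spike_time(memristance, c_int=C_INT, thresh_ratio=THRESH_RATIO):
    """Bound of Eq. (5): time for v_n*(1 - exp(-t/(M*C))) to reach v_thresh."""
    if not 0.0 < thresh_ratio < 1.0:
        raise ValueError("v_thresh must lie strictly between 0 and v_n")
    return -memristance * c_int * math.log(1.0 - thresh_ratio)

def first_to_spike(memristances, **kwargs):
    """Return (index of the earliest-spiking neuron, list of all spike times)."""
    times = [spike_time(m, **kwargs) for m in memristances]
    winner = min(range(len(times)), key=times.__getitem__)
    return winner, times

# Hypothetical memristances (ohms) for the three enabled neighbours, near R0 = 2 MOhm.
neighbours = [2.1e6, 1.9e6, 2.0e6]
winner, times = first_to_spike(neighbours)

# Spike-time separation between the 2.0 MOhm and 2.1 MOhm devices at two thresholds.
gap_low = spike_time(2.1e6, thresh_ratio=0.2) - spike_time(2.0e6, thresh_ratio=0.2)
gap_high = spike_time(2.1e6, thresh_ratio=0.9) - spike_time(2.0e6, thresh_ratio=0.9)

print(f"winner index: {winner}")
print("spike times (us):", [round(t * 1e6, 3) for t in times])
print(f"separation at 0.2: {gap_low * 1e9:.1f} ns, at 0.9: {gap_high * 1e9:.1f} ns")
```

Under these assumptions the device with the smallest memristance spikes first, and raising the threshold ratio from 0.2 to 0.9 widens the gap between two nearby devices from roughly 22 ns to roughly 230 ns, which is in line with the remark that a higher $v_{thresh}$ makes the spike times easier to tell apart.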

Plasticity (STDP) Based Learning Chip for Virtual Bug Navigation
Non-Evaluative Feedback (Correlation)

consist of a training controller, a testing controller, a mux to select between training signals and testing signals, a training signal generator and the SNN.

Fig. 6 also shows the dataflow of each design. The biggest difference between Fig. 6a and b is the peripheral circuitry needed to handle the differing network sizes. Fig. 6a has a block for the SNN, which contains 7 neurons and 8 synapses, while Fig. 6b needed to be broken apart in order to handle the wire connectivity. The floorplans correspond directly to the layouts, so those are not included in this article. In layout pictures, each major block is circled and corresponds to a block shown in the corresponding floor plan. Table 2 breaks down the layout area of each major component defined in Fig. 6. The full chip area without pads for each design is provided as well. With pads included, the small design has an area of 2.8 mm by 2.8 mm while the larger design has an area of 6.5 mm by 4.8 mm.

Table 2
Layout area of both CMOS designs.

Block                       SMALL                  LARGE
SNN                         800 μm × 400 μm        -
Input layer                 -                      4600 μm × 1600 μm
Hidden layer                -                      3750 μm × 500 μm
Output layer                -                      800 μm × 200 μm
Training signal generator   450 μm × 20 μm         710 μm × 36 μm
Testing controller          300 μm × 980 μm        900 μm × 400 μm
Training controller         720 μm × 160 μm        4900 μm × 200 μm
Control signal mux          300 μm × 300 μm        950 μm × 800 μm
Whole chip without pads     1500 μm × 1200 μm      5200 μm × 4000 μm

Table 3 shows the power consumption of both designs. Although, in terms of the number of neurons, the large design is 58 times the small design, its area is only 4 times that of the small design and its power consumption is only 9 times that of the small design. This implies that the area and power consumption of the design do not increase linearly with respect to the size of the design and thus allows us to increase the design's size with lower cost. The next section presents the results of both chips and provides a cross comparison between all platforms.

4. Results and discussion

The indirect training algorithm was implemented in MATLAB, FPGA and CMOS to train the SNN presented in Section 2 to allow the virtual insect to perform the terrain navigation task. The small design has also been tested on an Altera Cyclone II EPC20F484C7 FPGA board to ensure that the implemented design works in real hardware. To implement the small design, 21,172 logic elements and 1104 dedicated logic registers were used.

The performance of the trained insect was first evaluated by MATLAB simulations, as demonstrated in Fig. 7, in which the green line and the blue line depict trails of the untrained state and the trained state of the virtual insect, respectively. The results show that after the SNN was fully trained, the virtual bug was capable of

Table 3
Power consumption of both CMOS designs.

                    Total power/mW   Leakage/μW   SNN's power/mW   Other power/mW   # of neurons
Large design        13.24            1.06         9.485            3.755            406
Small design        1.468            0.0679       0.889            0.579            7
Large/small ratio   9.019            15.61        10.669           6.485            58

Fig. 8. Untrained hardware virtual insect trajectory on a map of size 1000 × 1000.

VLSI Journal, 2015
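The sub-linear scaling claim made alongside Table 3 can be checked with a few lines of arithmetic. The sketch below uses only figures quoted in the text (neuron counts, total power, and chip dimensions with pads); the dictionary keys and variable names are illustrative, not taken from the paper.

```python
# Figures quoted in the text and in Table 3; the key names are illustrative.
small = {"neurons": 7, "total_power_mw": 1.468, "area_with_pads_mm2": 2.8 * 2.8}
large = {"neurons": 406, "total_power_mw": 13.24, "area_with_pads_mm2": 6.5 * 4.8}

# Large/small ratio for each quantity.
ratios = {key: large[key] / small[key] for key in small}

# Average power per neuron in microwatts for each design.
power_per_neuron_uw = {
    "small": 1000 * small["total_power_mw"] / small["neurons"],
    "large": 1000 * large["total_power_mw"] / large["neurons"],
}

for key, value in ratios.items():
    print(f"{key}: large/small = {value:.3f}")
for name, value in power_per_neuron_uw.items():
    print(f"{name} design: {value:.1f} uW per neuron")
```

The neuron count grows 58-fold while total power grows about 9-fold and padded area about 4-fold, so the average power per neuron falls from roughly 210 μW in the small design to roughly 33 μW in the large one.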

Fig. 7. Trails of the virtual insect on a uniform, obstacle-free terrain and on a terrain with different roughness, populated by obstacles [1]. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Conclusion

Facets of Neuromorphic or Brain-like Computing:

1. Self-Healing
2. Learning & Plasticity
3. Cognition
4. Associative Memory

Adaptive Hardware Platform: Optimal , multi-agent systems, swarm intelligence, control, computer games, telecommunications, smart grid for power distribution, and Markov decision process (MDP)