Scalable Hardware for Memristor Based Artificial Neural Network Systems

A thesis submitted to the

Graduate School

of the University of Cincinnati

in partial fulfillment of the

requirements for the degree of

MASTER OF SCIENCE

in the Dept. of Electrical Engineering and Computing Systems

of the College of Engineering and Applied Sciences

May 2016

by

Ananthakrishnan Ponnileth Rajendran

B.Tech, Amrita Vishwa Vidyapeetham University, Kerala, India

May 2013

Thesis Advisor and Committee Chair: Dr. Ranga Vemuri

Abstract

Since the physical realization of the Memristor by HP Labs in 2008, research on Memristors and Memristive devices has gained momentum, with focus primarily on modelling and fabricating Memristors and on developing applications for Memristive devices. The Memristor's potential can be exploited in applications such as neuromorphic engineering, memory technology, and analog and digital logic circuit implementations. Research on Memristor based neural networks has thus far focused on developing algorithms and methodologies for implementation. The Memristor Bridge Synapse, a Wheatstone bridge-like circuit composed of four Memristors, is a very effective way to implement weights in hardware neural networks. Research on Memristor Bridge Synapse implementations coupled with the Random Weight Change algorithm proved effective in learning complex functions, with potential for implementation on hardware with simple and efficient circuitry. However, the simulations and experiments conducted were purely in software and served only as a proof of concept. Realizing neural networks using the Memristor Bridge Synapse capable of on-chip training requires an effective hardware architecture with numerous components and complex timing.

This thesis presents a scalable hardware architecture for implementing artificial neural networks using the Memristor Bridge Synapse, capable of being trained on-chip using the Random Weight Change algorithm. Individual components required for implementing training logic, timing and evaluation are described and simulated using SPICE. A complete training simulation for a small neural network based on the proposed architecture was performed using HSPICE. A prototypical placement and routing tool for the architecture is also presented.


To my parents and my sister. Thank you for being my inspiration.

In memory of my friends Govind and Srinivas. You’ll forever be in my heart.

Acknowledgements

I would like to start by thanking the most important people in my life, my family. My parents Rajendran and Rajam have made a lot of sacrifices to help my sister Malavika and me realize our dreams. Thank you very much for believing in me and motivating me towards realizing my goals. I will forever be indebted to you.

I consider myself very lucky to have received the opportunity to work under Dr. Ranga Vemuri. The knowledge you imparted will forever stay with me. Thank you very much for letting me be a part of DDEL and guiding me through my Master's journey. Thank you Dr. Wen-Ben Jone and Dr. Carla Purdy for being part of my defense committee. Thanks to Rob Montjoy for providing continuous support with the DDEL machines.

Special thanks to my friend Prabanjan for our innumerable discussions and the ideas you gave me to put my work together. I would like to thank my friends Diwakar, Ashwini and Meera for providing a helping hand on numerous occasions. Thank you Renuka for reviewing my thesis.

I would like to thank all my teachers from primary school through college for moulding me into the person I am today. Special thanks to Dr. Rajesh Kannan Megalingam for kindling my interest in the field of VLSI and motivating me to pursue a Master's degree. Last but not least, thanks to all my friends and relatives for being a part of my journey of life. I will forever be grateful for your help and support.

Contents

1 Introduction
1.1 The Memristor
1.2 Artificial Neural Networks
1.3 Artificial Neural Networks on Hardware
1.3.1 Analog Neural Network Implementations
1.3.2 Memristor Based Neural Networks
1.4 Random Weight Change Algorithm
1.5 Memristor Bridge Synapse
1.6 Thesis Statement
1.7 Thesis Overview

2 The Memristor Neural Network Architecture
2.1 Architecture Overview
2.2 Architecture Components
2.2.1 Neuron Block
2.2.2 Microcontroller
2.2.3 Shift Register
2.2.4 Connection Buses
2.3 Memristor Bridge Synapse Bit-Slice
2.4 Architecture in a Nutshell
2.5 Summary

3 Placement and Routing Tool for Memristor Neural Network Architecture
3.1 Tool Overview
3.2 Tool Flow
3.3 Output and Performance Analysis
3.3.1 Area Analysis
3.3.2 Runtime Performance
3.4 Scalability
3.5 Summary

4 Experimental Results and Analysis
4.1 Memristor Simulation
4.2 Memristor Bridge Synapse Simulation
4.3 Memristor Bridge Synapse Bit-Slice Simulation
4.4 Simple Neural Network Simulation
4.5 OR-Gate Training in SPICE
4.5.1 Experimental Setup
4.5.2 Observation and Analysis
4.6 Power and Timing Estimation
4.6.1 Power
4.6.2 Timing
4.7 Training Performance
4.8 Summary

5 Conclusion and Future Work
5.1 Conclusion
5.2 Future Work
5.2.1 Implementing Stronger Activation Function
5.2.2 Linear Feedback Shift Register for Random Bits
5.2.3 Implementing other Hardware Friendly Algorithms
5.2.4 Bit-slice in Layout
5.2.5 Testing with more Memristor Models
5.2.6 Reconfigurable Neural Network

Bibliography

List of Figures

1.1 Conceptual symmetries of the four circuit variables with the three classical circuit elements and the memristor [1].
1.2 Cross section of HP's Crossbar Array showing the memristor switch [2].
1.3 Representation of a simple three-layered artificial neural network [3].
1.4 Differential floating gate synapse schematic diagram of Electrically Trainable Analog Neural Network (ETANN) [4].
1.5 Analog current synapse, synapse current input, weight-control and neuron output circuit schematic of model proposed in [5].
1.6 Schematic of a weight cell of CMOS integrated feed-forward neural network [6].
1.7 Excitatory neuron with the input sensing circuit of Memristor Crossbar Architecture for Synchronous Neural Networks [7].
1.8 Weighting and Range Select circuit for RANLB and MTNLB [8].
1.9 (a) Activation function circuit for RANLB. (b) Activation function circuit for MTNLB [8].
1.10 Circuit that accomplishes weighting using the Memristor bridge synaptic circuit and voltage-to-current conversion with differential amplifier in [9].
1.11 (a) Typical multi-layered neural network inputs in voltage form. (b) Schematic of learning architecture for the equivalent hardware for the neural network in (a) [9].
1.12 Flowchart for Random Weight Change Algorithm.
1.13 Illustration of energy surface tracing by back-propagation and random weight change algorithm [9].
1.14 Memristor Bridge Synapse Circuit [10].
2.1 Sample input for face pose identification problem [11].
2.2 Three layered neural network for face pose identification.
2.3 Memristor based neural network architecture for face pose identification.
2.4 Simple three-layered neural network.
2.5 Memristor Bridge Synapse design.
2.6 Summing logic for neuron N3 from Figure 2.4.
2.7 Summing circuit using voltage average and operational amplifier circuits.
2.8 Difference circuit using differential amplifier.
2.9 Neuron N3 inputs and output for the neural network in Figure 2.4.
2.10 Memristor Bridge Synapse Bit-Slice.
3.1 Placement of 10 blocks of output layer on layout represented with p-diffusion.
3.2 After routing of input for placed blocks in Figure 3.1.
3.3 Completed placement and routing for neural network with 30 inputs, 10 hidden layer neurons and 10 output layer neurons.
3.4 Flowchart showing the tool flow for placement and routing.
3.5 Layout for face pose identification neural network with 960 inputs, 10 hidden layer neurons and 4 output layer neurons.
3.6 Output layer layout for face pose identification neural network.
3.7 Output layer layout for neural network with 80 inputs, 12 hidden layer neurons and 15 output layer neurons.
3.8 Neural network with 80 inputs and 15 output layer neurons having two hidden layers with 30 neurons in the first hidden layer and 25 neurons in the second hidden layer.
4.1 Circuit for Memristor simulation with Memristor M1 (Ron = 116Ω, Roff = 16kΩ) in series with resistor R1 (100Ω) and voltage source Vin.
4.2 Memristor simulation with DC voltage +1V and -1V.
4.3 Resistance change in the memristor for millisecond input pulse-width.
4.4 Resistance change in the memristor for microsecond input pulse-width.
4.5 Memristor Bridge Synapse circuit used for simulation.
4.6 Memristor Bridge Synapse simulation waveform.
4.7 Evaluation pulse applied to Memristor Bridge Synapse.
4.8 Memristor Bridge Synapse Bit-Slice simulation waveform.
4.9 Neural network training input application and output evaluation.
4.10 Neural network weight update pulse application.
4.11 Neural network output at evaluation during different iterations.
4.12 Flowchart showing tool flow for neural network training simulator in SPICE.
4.13 Neural network output for learning OR-gate function at the start of simulation.
4.14 Neural network output for learning OR-gate function for 54th iteration of training.
4.15 Neural network output for learning OR-gate function at the end of simulation.
4.16 Mean squared error vs iterations for training OR-gate function.
4.17 Mean squared error vs iterations for simulation 2.
4.18 Mean squared error vs iterations for simulation 3.
4.19 Mean squared error vs iterations for simulation 4.
4.20 Mean squared error vs iterations for simulation 5.

List of Tables

2.1 Training input selection logic.
3.1 Comparison of total layout area for neural networks for different technology nodes.
3.2 Fraction of unused area in layout for different neural networks.
4.1 Instantaneous current and resistance measurements for forward biased memristor.
4.2 Instantaneous current and resistance measurements for reverse biased memristor.
4.3 Weight change for different training signal pulse-widths for memristor bridge synapse.
4.4 Comparison of training performance for multiple simulations for training OR-gate function in HSPICE.

Chapter 1

Introduction

In 1971, Leon Chua presented an argument that a fourth two-terminal device should exist along with the three classical circuit elements, namely, the resistor, capacitor and inductor [12]. He named this fourth circuit element the Memristor. Chua pointed out that the three basic circuit elements were defined based on a relationship between two of the four fundamental circuit variables: current, voltage, charge and flux-linkage. There are six possible relationships between these four circuit variables, of which two are direct relationships:

\[ q = \int i(t)\,dt \tag{1.1} \]

is the relationship between charge (q) and current (i), and

\[ \phi = \int v(t)\,dt \tag{1.2} \]

is the relationship between flux-linkage (φ) and voltage (v). The other three relations are based on the axiomatic definitions of the three classical circuit elements: the resistor is defined by the relationship between current and voltage, the inductor by current and flux-linkage, and the capacitor by charge and voltage. Chua postulated, from both a logical and an axiomatic point of view, that a fourth basic two-terminal device should exist, characterized by the relationship between charge and flux-linkage.

There was no physical realization of such a two-terminal device for over three decades after Chua's proposal, until, in 2008, Dmitri B. Strukov et al. from HP Labs published an article claiming that memristance arises naturally in nanoscale systems when solid-state electronic and ionic transport are coupled under an external bias voltage [13]. Since this discovery, research on Memristors and Memristive devices has gained momentum, with focus primarily on modelling and fabricating memristors and on developing applications for memristive devices. The memristor's potential can be exploited in applications such as neuromorphic engineering, memory technology, and analog and digital logic circuit implementations. The work presented in this thesis focuses on the application of memristors in the area of artificial neural networks.

Figure 1.1: Conceptual symmetries of the four circuit variables with the three classical circuit elements and the memristor [1].

1.1 The Memristor

The Memristor is a two-terminal device whose electrical resistance is not constant but varies depending on the amount of charge that has flowed through it. This variable resistance of the Memristor is termed its Memristance. The memristor is non-volatile in nature, meaning that the device remembers its most recent resistance value even after it is disconnected from an electric power supply. This property of the memristor makes it very useful for applications such as designing efficient memories and hardware realizations of artificial neural networks.

Figure 1.2: Cross section of HP's Crossbar Array showing the memristor switch [2].

There have been several implementations of the memristor device, such as the Polymeric Memristor, Layered Memristor, Ferroelectric Memristor, Spin Memristive systems, etc. In this text, we will discuss the Titanium Dioxide Memristor that HP developed in 2008. Researchers Dmitri B. Strukov et al. developed the memristor while working on a crossbar memory architecture at HP Labs. The crossbar is an array of perpendicular wires that are connected using switches at points where they cross. Their idea was to open and close these switches by applying voltages at the ends of the wires. The design of these switches led to the creation of the memristor.

HP's memristor is formed by sandwiching a thin layer of titanium dioxide (TiO2) between two platinum electrodes. The electrodes are about 5nm thick and the TiO2 layer is about 30nm thick. The TiO2 layer is divided into two separate regions, one composed of pure TiO2 and the other slightly depleted of oxygen atoms. These oxygen vacancies act as charge carriers and help conduct current through the device, leading to a lower resistance in the oxygen-depleted region. The application of an electric field results in a drift of these oxygen vacancies, which results in a shift of the boundary between the low and high resistance regions. Figure 1.2 shows a cross sectional view of HP's crossbar array with the memristor. If an electric field is applied across the two electrodes, the boundary between the normal region and the oxygen-depleted region moves either towards or away from the upper platinum electrode. If the boundary moves towards the upper electrode, the resistance increases, and vice versa. Thus, the resistance of the device depends on how much charge has passed through it in a particular direction. The memristance is observed only when both the pure and doped regions contribute to the resistance. After enough charge passes through the device, the ions become unable to move further and the device enters hysteresis. The device then acts as a simple resistor until the direction of the current is reversed.

In 2010, R. Stanley Williams of HP Labs reported that they were able to fabricate memristors as small as 3 nm by 3 nm in size with a switching time of 1 ns (1 GHz speed). Such small dimensions and high speed promise many applications for the memristor. In the work presented here, the memristor's ability to provide a wide range of resistance values is utilized in creating synaptic weights for artificial neural networks. For simplicity, we have used the term 'resistance' instead of 'memristance' throughout this text.
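To make the qualitative description above concrete, the linear ion-drift model published alongside HP's device [13] can be simulated in a few lines of Python. This is a minimal sketch; the parameter values and time step are illustrative assumptions, not measurements from this thesis.

```python
# Minimal sketch of the linear ion-drift memristor model from [13].
# All parameter values are illustrative assumptions.
R_ON, R_OFF = 100.0, 16e3   # low/high resistance limits (ohms)
D = 10e-9                   # TiO2 film thickness (m)
MU_V = 1e-14                # dopant mobility (m^2 V^-1 s^-1)

def simulate(voltages, dt, w=0.5 * D):
    """Integrate the doped-region width w for a sequence of voltage samples
    and return the instantaneous memristance at every step."""
    history = []
    for v in voltages:
        m = R_ON * (w / D) + R_OFF * (1.0 - w / D)  # memristance
        i = v / m                                   # device current
        w += MU_V * (R_ON / D) * i * dt             # linear ion drift
        w = min(max(w, 0.0), D)                     # clamp to the film
        history.append(m)
    return history

# A +1 V pulse lowers the resistance slightly; the following -1 V pulse
# undoes it, which is why short complemented evaluation pulses barely
# disturb a stored weight.
r = simulate([1.0] * 1000 + [-1.0] * 1000, dt=1e-6)
print(round(r[0]), round(r[999]), round(r[-1]))
```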

1.2 Artificial Neural Networks

Artificial neural networks are groups of nodes connected by weighted edges. They are models inspired by biological neural networks and are used to estimate or approximate functions that usually depend on a large number of unknown inputs. The ability of artificial neural networks to adapt to a given set of circumstances is what makes them very attractive for applications such as pattern recognition, data mining, game-play and decision making, medical diagnosis, etc. Neural networks adapt to a given set of inputs by modifying the weights of the interconnects between their neurons based on a suitable algorithm. An activation function at the neuron defines its output for an input or set of inputs to it. There are mainly three learning paradigms, viz. supervised learning, unsupervised learning and reinforcement learning.

Every neural network has one input layer and one output layer. It may have one or more hidden layers. Figure 1.3 shows a simple three-layered neural network. The number of neurons in each layer depends on the function that the network is trying to approximate. The neural networks discussed in this thesis are feed-forward neural networks, i.e., data only flows in the forward direction and there is no feedback of the data while the network is evaluated. The neural network in Figure 1.3 is a fully interconnected arrangement in the sense that every neuron in one layer is connected to every neuron in the succeeding layer. This is not a necessity while designing a neural network, since all connections may not be required to implement a specific function. However, it is very difficult to accurately predict the optimal number of hidden layer neurons and connections that a particular problem might require. The beauty lies in the fact that neural networks have the ability to learn whether or not a particular neuron or connection has a significant impact on the output.

Figure 1.3: Representation of a simple three-layered artificial neural network [3].

Supervised learning is one of the most commonly used learning methods for artificial neural networks. In this kind of learning, the aim is to infer the mapping implied by the data; the cost function is related to the mismatch between the user's mapping and the data, and it implicitly contains prior knowledge about the problem domain [3]. The mean-squared error is often used as the cost, and the learning tries to reduce the average error between the network's output and the desired output. The Backpropagation algorithm is a well-known and efficient algorithm used for training neural networks. Training is accomplished by adjusting the weights on the connections between neurons with the aim of reducing the mean-squared error at the output of the neural network.

The Backpropagation algorithm calculates the gradient of a loss function with respect to all of the weights in the network. The algorithm tries to minimize the loss function by feeding the gradient to an optimization method which uses it to update the weights. In order for the Backpropagation algorithm to work, the activation function used by the neurons should be differentiable. The activation function is any mathematical function at the neuron which defines its output for a given set of inputs. The Backpropagation algorithm is very effective in training neural networks, but poses a lot of challenges when implementing it on a standalone hardware system. The algorithm works in two phases; the propagation phase and the weight update phase. In the propagation phase, the algorithm first forward propagates a training input through the network and generates the output activations. In the next step, the algorithm does a backward propagation of the output activations through the network using the target pattern to generate the difference between input and output values of all the hidden and output neurons. In the weight update phase, the algorithm first multiplies the difference obtained with the input activation to find the gradient of the weight. Then it uses this gradient to update each of the weights in the network.
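In symbols, assuming the conventional gradient-descent formulation with a learning rate η (the paragraph above describes this step in words), each weight update takes the form

\[ \Delta w_{ij} = -\eta \, \frac{\partial E}{\partial w_{ij}} \]

where E is the loss function being minimized.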

It is quite evident that the Backpropagation algorithm, though very effective, requires complex multiplication, summation and derivatives that are difficult to implement in VLSI circuits [14]. A simpler algorithm is desirable for designing a standalone hardware neural network system. Several hardware-friendly algorithms have been implemented to train artificial neural networks on hardware, and the Random Weight Change algorithm is one such popular algorithm. Though not as efficient as Backpropagation, it is hardware friendly and much simpler to implement.

1.3 Artificial Neural Networks on Hardware

Implementation of artificial neural networks on hardware has been popular for over three decades. Hardware neural networks extend from Analog to Digital to FPGA and even to Optical Neural Networks. In this section, we briefly explore a few analog neural network implementations and neural network implementations using memristors.

1.3.1 Analog Neural Network Implementations

Implementation of artificial neural networks on hardware gained popularity in the 1980s, with Intel's Electrically Trainable Analog Neural Network (ETANN) 80170NX chip being one of the earliest fully developed analog chips [4]. The ETANN is a general purpose neurochip that stores its weights as electric charge on non-volatile floating gate transistors (floating-gate MOSFETs, or FGMOS) with the help of EEPROM cells, and uses Gilbert-multiplier synapses to provide four-quadrant multiplication. Training for the ETANN is done off chip using a host and the weights are written into the ETANN [4]. The chip contains 64 fully interconnected neurons and can be cascaded by bus interconnection to form a network of up to 1024 neurons with up to 81,920 weights [15].

Figure 1.4 shows the synapse circuit of the ETANN, which is an NMOS version of the Gilbert multiplier with a pair of EEPROM cells in which a differential voltage is stored as the weight. Fowler-Nordheim tunneling is used to add and remove electrons from the floating gates in the EEPROM cells to adjust the weights [4]. The ETANN was used in several systems, such as the Mod2 Neurocomputer, which implemented 12 ETANN chips for real-time image processing [16], and the MBOX II, which makes use of 8 ETANN chips to create an analog audio synthesizer [15].

One of the major drawbacks of this chip was the limited resolution in storing the synaptic weights: the long-term resolution of the weights was no more than five bits.


Figure 1.4: Differential floating gate synapse schematic diagram of Electrically Trainable Analog Neural Network (ETANN) [4].

Another issue was the writing speed and cyclability of the EAROMs used to store the weights which restricted the application of chip-in-the-loop training [17].

Milev and Hristov [5] present a simple analog-signal synapse with inherent quadratic non-linearity implemented using MOSFETs with no floating-gate transistors. They designed a neural matrix for fingerprint feature extraction with 2176 analog current mode synapses arranged in eight layers of 16 neurons with 16 inputs each. A chip was fabricated in a standard 0.35µm TSMC process to demonstrate the feasibility of non-linear synapses in practical application.

Apart from the 16 x 8 neural matrix of 128 analog 16-input neurons, the chip also implements 16-bit latched digital inputs multiplexed with 16 analog-current inputs, 16 analog-current signal outputs, and a 9-bit current-output digital-to-analog converter (DAC). Weight storage is done on an on-chip SRAM of more than 19K size. The architecture allows cascaded interconnection for system expansion. The internal system clock is specified at a maximum frequency of 200 MHz. However, the input-data processing speed is determined by the current propagation delay through the components in the network and varies significantly with the reference current driving the analog synapse circuits [5].


Figure 1.5: Analog current synapse, synapse current input, weight-control and neuron output circuit schematic of model proposed in [5].

Lui et al. [6] developed a mixed signal CMOS feed-forward neural network chip with on-chip error reduction hardware. The design is compact and capable of high-speed parallel learning using the Random Weight Change (RWC) algorithm. Weight storage in the system is accomplished using capacitors. Capacitors implemented as weights are compact and easy to program, but are susceptible to leakage, leading to errors in the stored weights. In their system, Lui et al. designed large capacitors to ensure that the leakage is negligible. The chip is designed to operate in conditions that change continuously, and the weight leakage problem is mitigated by constant weight updates. They found that the weight retention time for the capacitors was around 2s for losing 1% of the weight value at room temperature.

Figure 1.6 shows the schematic of a single weight cell with a shift register for random input, the weight storage and modification circuit, and the multiplier circuit. Lui et al. were able to fabricate and test a chip with 100 weights in a 10x10 array with 10 inputs and 10 outputs. They tested the chip by connecting it to a PC using an analog-to-digital converter (ADC) and a digital-to-analog converter (DAC). In this work, we make use of the same RWC algorithm used by Lui et al. in their system.


Figure 1.6: Schematic of a weight cell of CMOS integrated feed-forward neural network [6].

The RWC algorithm is described in detail in the next section.

The analog neural network implementations discussed in this text are only a small subset of the innumerable VLSI implementations of artificial neural networks. Misra and Saha [15] provide a comprehensive survey of hardware implementations of artificial neural networks spanning over 20 years. Their discussion is not limited to analog neural network implementations, but extends to digital, hybrid, FPGA based, RAM based and optical neural networks.

1.3.2 Memristor Based Neural Networks

The potential to mimic brain logic is one of the most attractive features of the memristor. Various neuron and synapse designs have been proposed using memristors for realizing artificial neural networks. Here, we briefly discuss a couple of neural network implementations using memristors, followed by the Memristor Bridge Synapse based neural network that we have used as the primary reference in our work.

Starzyk and Basawaraj [7] propose an architecture and training scheme for neural networks implemented using crossbar connections of memristors with a view to preserving the high density of synaptic connections. They employ simple threshold based neurons, a synapse consisting of only a single memristor, and a common sensing network. The synapse is designed with a view to creating large-scale systems with synapses arranged in a grid structure capable of being trained on-chip. The system is composed of a single layer feed-forward neural network with n inputs and m outputs.

Figure 1.7: Excitatory neuron with the input sensing circuit of Memristor Crossbar Architecture for Synchronous Neural Networks [7].

The neuron of the Memristor Crossbar Architecture proposed in [7] operates in three different phases, viz. the sensing phase, the active phase and the resting phase. During the sensing phase, the neuron waits for input activity and does not fire. An increase in any of the input signals above the threshold switches the neuron into the active phase, where the neuron either fires or does not for a specific amount of time. Once the active phase timing expires, the neuron goes into the resting phase, where all the inputs and outputs go to 0V, and remains in this state till the next sampling time. The excitatory neuron with the input sensing circuit of the Memristor Crossbar Architecture is shown in Figure 1.7. The design was tested in HSPICE for organization of the neural network on noisy digit recognition.

In [8], Soltiz et al. propose two Neuron Logic Block (NLB) designs to overcome the inability of existing perceptron based NLB designs using thin-film memristors, which implement static threshold activation functions, to train linearly inseparable functions. Their designs overcome this limitation by allowing the effective activation function to be adapted during learning. Soltiz et al. contribute a perceptron based NLB design with an adaptive activation function and a perceptron based NLB with a static activation function and multiple activation thresholds, and demonstrate the designs for reconfigurable logic and optical character recognition for handwritten digits.


Figure 1.8: Weighting and Range Select circuit for RANLB and MTNLB [8].

Figure 1.8 shows the weighting and range selection circuit implemented using memristors for the Robust Adaptive Neural Logic Block (RANLB) and the Multithreshold Neural Logic Block (MTNLB). The RANLB implements an adaptive activation function using the circuit in Figure 1.9 (a), by providing an adjustable digital value for each input current range. A flip-flop stores the digital value for each input current range. The MTNLB is designed with a view to overcoming the high area overhead of the RANLB's activation function, which limits its use in large neural networks where area is a primary constraint. The MTNLB employs a static activation function in such a way that the ability to learn linearly inseparable functions is not compromised. Figure 1.9 (b) shows the activation function circuit for the MTNLB.

Figure 1.9: (a) Activation function circuit for RANLB. (b) Activation function circuit for MTNLB [8].


Figure 1.10: Circuit that accomplishes weighting using the Memristor bridge synaptic circuit and voltage-to-current conversion with differential amplifier in [9].

The Memristor Bridge Synapse introduced by Kim et al. in [10] is a very popular synaptic design used to implement neural networks. [9], [18], [19] and [20] present implementations of the Memristor Bridge Synapse in artificial neural networks. In our work, we build on the work presented in [9] by Adhikari et al. on neural networks constructed using the Memristor Bridge Synapse and trained with the Random Weight Change algorithm.

Each neuron in the Memristor Bridge Synapse based neural network in [9] is composed of multiple synapses and one activation unit. The inputs to the neural network are supplied as voltage values, which are weighted and then converted to currents by differential amplifiers. Kirchhoff's Current Law (KCL) is used to sum the currents and produce the output of a neuron. The differential amplifier along with the active load circuit forms the activation unit of the neuron. Figure 1.10 shows the Memristor Bridge Synapse connected to the differential amplifier circuit. Figure 1.11 (a) shows a simple neural network with two neurons, and Figure 1.11 (b) shows the equivalent hardware circuit for the neural network in Figure 1.11 (a) along with the architecture for the training regime.

Adhikari et al. designed and simulated the differential amplifier and the active load circuit in HSPICE and developed a look-up table from the results. The memristor model, error calculation, random number generation and training pulse application were simulated in MATLAB. They tested the architecture to learn the 3-bit parity problem, a robot workspace and face pose identification using neural networks with 3 input x 5 hidden x 1 output, 10 input x 20 hidden x 1 output and 960 input x 10 hidden x 4 output nodes respectively in MATLAB [9]. Their aim was to show that Memristor Bridge Synapse based neural networks trained using the Random Weight Change algorithm could be used to realize simple, compact and reliable neural networks capable of being used for real-life applications.

Figure 1.11: (a) Typical multi-layered neural network inputs in voltage form. (b) Schematic of learning architecture for the equivalent hardware for the neural network in (a) [9].

In our work, we have used the Memristor Bridge Synapse based neural networks described in [9] as the base and build a complete hardware architecture which can be implemented on a chip. We have made several modifications to the architecture presented in [9], but have retained the Memristor Bridge Synapse as the primary component of the system along with the application of the RWC algorithm for training. The RWC algorithm and the circuit implementation of the Memristor Bridge Synapse are discussed in detail in the following sections.

1.4 Random Weight Change Algorithm

The Random Weight Change (RWC) algorithm was first described by Hirotsu and Brooke in 1993. They proposed the algorithm as an alternative to Backpropagation to eliminate the need for complex calculations while training a neural network. The non-idealities of analog circuits are another reason why Backpropagation is not preferred for hardware implementations. They successfully implemented and tested the algorithm on a chip with 18 neurons and 100 weights, which learned the XOR-gate problem [14].

Figure 1.12: Flowchart for Random Weight Change Algorithm.

The algorithm randomly changes all of the weights by a small increment of −δ or +δ from their initial state. The training input is then supplied to the network and the output error is calculated. If the new error has reduced compared to the previous iteration, the same weight change is applied again, until the output error either increases or falls within a desired limit. If the output error increases, the weights are updated randomly again. The algorithm can be summarized using the following equations from [14]:

\[ w_{ij}(n+1) = w_{ij}(n) + \Delta w_{ij}(n+1) \tag{1.3} \]

where

\[ \Delta w_{ij}(n+1) = \begin{cases} \Delta w_{ij}(n) & \text{if } E(n+1) < E(n) \\ \delta \cdot Rand(n) & \text{if } E(n+1) \ge E(n) \end{cases} \]

E() is the root mean-squared error at the output, δ is a small constant and Rand(n) = +1 or -1 randomly. The flowchart in Figure 1.12 illustrates the steps in the Random Weight Change algorithm.
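For concreteness, the update rule above can be sketched in a few lines of Python. Here evaluate_error stands in for an assumed user-supplied routine that measures the network's output error for a given weight vector, and delta, target and max_iter are illustrative choices rather than values from [14]:

```python
import random

def rwc_train(weights, evaluate_error, delta=0.01, target=1e-3, max_iter=10000):
    """Sketch of the Random Weight Change rule of Eq. (1.3)."""
    change = [delta * random.choice((1, -1)) for _ in weights]
    error = evaluate_error(weights)
    for _ in range(max_iter):
        if error <= target:
            break
        # Eq. (1.3): the current change is always applied to every weight.
        weights = [w + c for w, c in zip(weights, change)]
        new_error = evaluate_error(weights)
        if new_error >= error:
            # No improvement: draw a fresh random change of +/- delta.
            change = [delta * random.choice((1, -1)) for _ in weights]
        error = new_error
    return weights, error
```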

Figure 1.13: Illustration of energy surface tracing by back-propagation and random weight change algorithm [9].

The Random Weight Change algorithm is less efficient than Backpropagation. Figure 1.13 shows a comparison of the RWC algorithm with Backpropagation. With Backpropagation, the operating point goes down along the steepest slope of the energy curve in the network. With the RWC algorithm, the operating point goes up and down on the energy curve rather than descending straight along it. However, RWC's operating point statistically descends and finally reaches the correct answer [14].

Figure 1.14: Memristor Bridge Synapse Circuit [10].

The RWC algorithm is very effective for analog implementations of artificial neural networks, as it eliminates the need for complex circuitry and is not greatly affected by circuit non-idealities. Moreover, the algorithm does not require any specific network structure and can be applied to all feed-forward neural networks. Fully connected feedback networks, however, may have local minimum problems [14].

1.5 Memristor Bridge Synapse

The Memristor Bridge Synapse is a Wheatstone-bridge-like circuit composed of four identical memristors. Figure 1.14 shows the arrangement of the memristors in the Bridge Synapse. The memristors are arranged such that the polarities of memristors M1 and M4 are the same and opposite to those of M2 and M3. When a positive voltage is supplied at Vin, M1 and M4 are forward biased, which leads to a decrease in their resistances. M2 and M3, on the other hand, become reverse biased and their resistances increase [10]. The outputs of the Bridge Synapse are tapped out at the nodes A and B.

The Bridge Synapse basically acts as two voltage divider circuits. The voltages at the nodes A and B are given by the simple voltage divider formula:

\[ V_A = \frac{M_2}{M_1 + M_2} \, V_{in} \tag{1.4} \]

\[ V_B = \frac{M_4}{M_3 + M_4} \, V_{in} \tag{1.5} \]

where M1, M2, M3 and M4 are the resistances of the memristors M1, M2, M3 and M4 respectively. The weight of the Memristor Bridge Synapse is the difference between the voltages VA and VB. Initially, when all the memristors are in the same state, the nodes VA and VB have the same value. The synaptic weight of the Bridge Synapse is described by the following expressions from [10]:

The synaptic weight is positive if

\[ \frac{M_2}{M_1} > \frac{M_4}{M_3}, \]

negative if

\[ \frac{M_2}{M_1} < \frac{M_4}{M_3}, \]

and zero if

\[ \frac{M_2}{M_1} = \frac{M_4}{M_3}. \]

The output of the Bridge Synapse can be modelled by the equation

\[ V_{out} = \psi \, V_{in} \tag{1.6} \]

where ψ is the synaptic weight defined by

\[ \psi = \frac{M_2}{M_1 + M_2} - \frac{M_4}{M_3 + M_4} \tag{1.7} \]

The Memristor Bridge Neuron is implemented by summing the output signals from different Bridge Synapses. Differential amplifiers are used to process the weighted inputs from primary inputs or other neurons. The implementation of the Bridge Neuron is described in Chapter 2.
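As a quick numerical illustration of Equations (1.4) through (1.7), the following Python sketch computes the bridge output and synaptic weight for given memristances; the resistance values in the example are arbitrary assumptions:

```python
def bridge_output(m1, m2, m3, m4, v_in):
    """Memristor Bridge Synapse output per Equations (1.4)-(1.7)."""
    v_a = v_in * m2 / (m1 + m2)              # Eq. (1.4)
    v_b = v_in * m4 / (m3 + m4)              # Eq. (1.5)
    psi = m2 / (m1 + m2) - m4 / (m3 + m4)    # synaptic weight, Eq. (1.7)
    return v_a - v_b, psi                    # Vout = psi * Vin, Eq. (1.6)

# Equal memristances give zero weight; skewing them gives a signed weight.
print(bridge_output(8e3, 8e3, 8e3, 8e3, 1.0))   # (0.0, 0.0)
print(bridge_output(7e3, 9e3, 9e3, 7e3, 1.0))   # (0.125, 0.125)
```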


1.6 Thesis Statement

Since the physical realization of the memristor by HP Labs in 2008, research on memristors and their applications has been constantly gathering pace. The potential of memristors in realizing simple and fast neuromorphic circuits is immense. As the lithographic processes for fabricating memristors evolve, architectures and tools for circuit realization also need to evolve.

The majority of the research on memristor based neural networks has thus far focused on various algorithms and methodologies for implementation. The Bridge Synapse based artificial neural network presented in [9] shows a lot of promise for practical implementation because of the simplicity of its design. In the work presented in [9], the authors focused on illustrating the simplicity and effectiveness of using the Memristor Bridge Synapse in tandem with the Random Weight Change algorithm for neural network implementations. They proposed to use the Memristor Bridge Synapse as the weighting element of the neural network, to which inputs were applied as voltage pulses. At the neuron level, voltage-to-current conversion was achieved using differential amplifiers to take advantage of Kirchhoff's Current Law to sum the inputs of the neurons. The differential amplifier along with the active load circuit forms the activation unit of the neuron.

In [9], the authors tested their design by first simulating the differential amplifier and the active load circuit in HSPICE and creating a look-up table, which was then used for training the neural network in MATLAB. The neural network circuit was created in MATLAB using a memristor model. The error calculation and random number generation were done by MATLAB code, and the weight updates were done by changing the resistance of the memristors in the bridge synapse based on the random numbers. They successfully trained neural networks for the 3-bit parity problem, for learning a robot workspace and for face pose identification.

Although [9] proves that neural networks using the Memristor Bridge Synapse for weighting along with the RWC algorithm for training are a good approach for real-life applications, a path for an actual realization of a chip was not described. Moreover, on-chip training requires additional circuitry, and timing becomes critical. In our work, we focus on developing an architecture that can efficiently implement the RWC algorithm and Memristor Bridge Synapses to create a hardware neural network that can be trained completely on chip. We have made modifications to the design of the neuron and activation function in [9], but the training algorithm and weighting methodology remain the same.

Our architecture is composed of the neural network circuit realized using Memristors and differential amplifiers. The architecture also incorporates a microcontroller, which is responsible for measuring and calculating the output error and supplying the random training signals and timing signals to the neural network. We designed and implemented circuits to supply the random inputs and apply them to each individual Memristor Bridge Synapse during training.

We also developed a placement and routing tool to realize the architecture on a physical layout. The tool takes the number of inputs, hidden layers and outputs as its input and generates a physical layout with interconnections between neuron blocks on different layers. Since layout libraries for memristors are not available yet, the placement and routing tool is only a prototype to illustrate how the architecture would appear on a layout and to gather an approximation of the area occupied by a specific network.

The majority of the simulations in this work were performed using HSPICE. SPICE-level simulations are the best available approximations to actual circuit behavior in hardware. Simulations were performed for individual components of the architecture and for complete neural network circuits. We also developed a simulator in Perl to train a small neural network in HSPICE. The Perl code mimicked functions of the microcontroller, such as supplying random inputs and clock signals, by generating PWL inputs to the HSPICE circuit. A neural network with 2 inputs, 3 hidden layer neurons and 1 output layer neuron successfully learned the OR-gate function in HSPICE.
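As an illustration of how a script can stand in for the microcontroller during simulation, a helper of the following sort can emit the corner points of one rectangular pulse for an HSPICE PWL source. The helper is hypothetical and written in Python for readability (the thesis work used Perl):

```python
def pwl_pulse(t_start, width, level, rise=1e-9):
    """Corner points (time, voltage) of one rectangular PWL pulse."""
    return [(t_start, 0.0), (t_start + rise, level),
            (t_start + width, level), (t_start + width + rise, 0.0)]

# A 1 V evaluation pulse of 100 us starting at t = 1 ms:
points = pwl_pulse(1e-3, 100e-6, 1.0)
print("Vin in 0 PWL(" + " ".join(f"{t:g} {v:g}" for t, v in points) + ")")
```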

The aim of our work was to develop an architecture suitable for implementing memristor based neural networks on chip. With the core of the neural network implemented in HSPICE using real components and only minimal functionality simulated using software, we were able to show that our architecture is well suited to be realized on a chip.

1.7 Thesis Overview

The remainder of this document is organized in the following manner: Chapter 2 discusses the architecture for implementing neural networks with the Memristor Bridge Synapse. The Chapter describes the various components of the architecture and their functions. An overview of the functioning of the neural network and the bit-slice design of the synapse is also presented in this Chapter. The placement and routing tool for the architecture layout is described in Chapter 3. This Chapter explains the algorithm and the implementation of the tool, and presents and discusses its output. The Chapter also discusses how the tool is designed to produce layouts for varying numbers of neurons and neural layers. Chapter 4 describes the experimental setup and observations, and analyzes the results of the experiments conducted at different abstractions of the neural network design. All components of the neural network are simulated both individually and as a full circuit. The power calculations and estimations for neural network training and normal operation are also presented in this Chapter. The conclusions drawn from this thesis and future work are described in Chapter 5.

Chapter 2

The Memristor Neural Network Architecture

The primary focus of this thesis is to develop an efficient hardware architecture to implement the memristor based artificial neural networks described in [9]. This Chapter describes our architecture, the various components of the neural network system and their functions. The architecture is best explained with the help of examples. In this thesis, we have used two different neural networks for simulations at different levels of abstraction. A small neural network that aimed to learn the OR-gate problem was used in simulations to verify the functionality of the Memristor Bridge Synapse, the other components, and the entire architecture at the SPICE level. A much larger neural network for face pose identification, explained in [9], was simulated using Python to verify the functioning of large Memristor Bridge Synapse based neural networks for more practical applications.

Figure 2.1: Sample input for face pose identification problem [11].


2.1 Architecture Overview

Image recognition is a popular application of artificial neural networks, and memristor bridge synapse based artificial neural networks are efficient in learning functions of this kind. We illustrate the working of the neural network architecture using the face pose identification problem discussed in [9].

The sample inputs to the neural network for the face pose identification problem are shown in Figure 2.1. The images for face recognition are available for download from CMU [11]. In this problem, the network tries to learn the direction in which the face of the subject in the image is oriented. There are four face poses that the network aims to learn: left, right, straight and up, as depicted in Figure 2.1 (a) through (d). The images are greyscale with 32x30 resolution. Figure 2.2 shows a representation of the neural network used for this problem.

Figure 2.2: Three layered neural network for face pose identification.


The network has a total of 960 (32*30) inputs, 10 hidden layer neurons and 4 output neurons. Every neuron in one layer is connected to every neuron in the succeeding layer. The network consists of a total of 9640 memristor bridge synapses. The circuit produces outputs of [1 -1 -1 -1], [-1 1 -1 -1], [-1 -1 1 -1] and [-1 -1 -1 1] for left, right, straight and up orientations of the subject's face respectively. In Figure 2.2, the input layer neurons are only a representation of the fan-out of the external inputs to multiple memristor bridge synapses. No function is applied to the inputs at the input layer neurons.
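As a quick check of that count, the number of bridge synapses in a fully connected feed-forward network is the sum of the products of adjacent layer sizes:

```python
layers = [960, 10, 4]  # inputs, hidden neurons, output neurons
synapses = sum(a * b for a, b in zip(layers, layers[1:]))
print(synapses)  # 960*10 + 10*4 = 9640 bridge synapses
```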

Figure 2.3: Memristor based neural network architecture for face pose identification.

For this neural network design, the hidden and output layers are much smaller than the input layer. In this particular example of face pose identification, the number of input neurons is almost 100 times the number of hidden neurons, and the number of output neurons is less than half the number of hidden neurons. A chip for such a neural network can have close to 1,000 pins, and the architecture in Figure 2.3 is designed keeping the constraint of connecting the pins to internal signals in mind. Since all inputs go to all neurons, each neuron block in the middle layer receives its inputs from a bus. A neuron block consists of as many Memristor Bridge Synapses as there are inputs to the block (960 in this example) and three operational amplifier circuits, two for summing and one for the difference. The middle layer neuron blocks are placed close to the periphery of the chip on three sides and the output is drawn out from the fourth side.

The input layer bus (consisting of 960 wires in the example) is placed around the middle layer neuron blocks. This way, the inputs from the pins can easily be supplied to the bus, and the bus lines can be conveniently accessed by each neuron block. The middle layer bus has as many lines as there are middle layer neuron blocks (10 in this case). The output of each neuron in the middle layer is connected to the bus and supplied to the output layer neuron blocks. The outputs of the output layer neuron blocks are connected to the microcontroller, which reads the values generated by the network, calculates the error and performs training. The outputs can also be tapped out through other pins on the chip.

2.2 Architecture Components

With respect to the description of the architecture in Figure 2.3, the components of the neural network can be categorized as internal and external to the neuron block. The components that are external to the neuron blocks are the connection buses and the microcontroller. We first describe the components internal to the neuron block and then move on to the components external to it.

We describe the internal components of the neuron block with the help of a simple neural network. Figure 2.4 shows an artificial neural network with two input layer neurons, two hidden layer neurons and one output layer neuron. The aim of this neural network is to learn the OR-gate function. The training inputs are applied through the nodes IN1 and IN2. There are a total of five neurons, N1 through N5, and six memristor bridges, BR1 through BR6, in this network. Each neuron in one layer is connected to every neuron in the succeeding layer. The neurons N1 and N2 are only representations and do not apply any function to the inputs. The applied inputs fan out from the N1 and N2 neurons to different bridges. For example, the input supplied at IN1 fans out to bridges BR1 and BR2. Each memristor bridge produces two output components, the VA component and the VB component. These components are represented by the two lines that originate from each bridge synapse and go into the neuron, where the summing logic is implemented.

Figure 2.4: Simple three-layered neural network.

2.2.1 Neuron Block

2.2.1.1 Memristor Bridge Synapse

The Memristor Bridge Synapse is the primary component of the neural network and takes up the most area on the chip. Each memristor is about 50 nm x 50 nm. A single memristor bridge requires about 200 nm x 200 nm of area after including the routing between the memristors. The biggest network simulated in this work has almost 10,000 bridge synapses.

Figure 2.5 shows the design of a Memristor Bridge Synapse. The input is applied at one end of the bridge (node IN) and the other end is tied to ground. As discussed in Chapter 1, the two memristors connected on either side of node A are arranged such that one of them will be forward biased and the other reverse biased when a voltage is applied at node IN. The same logic applies to the memristors connected on either side of node B, the only difference being that their orientation with respect to node IN is opposite to that of the two memristors on either side of node A. The nature of this arrangement ensures that the voltage drop at one of the nodes A and B will be greater than the other: when the voltage drop at one node increases, the drop at the other node decreases. It also ensures that the total resistance of the memristor bridge is constant and brings a symmetry to the weight supplied by the bridge.

Figure 2.5: Memristor Bridge Synapse design.

The weight supplied by the bridge synapse is the difference between the node voltages (VA − VB). The weight is changed by supplying either a positive or a negative voltage at IN. For the bridge in Figure 2.5, a positive voltage pulse at IN results in a decrease in the resistances of the memristors M1 and M4, and an increase in the resistances of M2 and M3. Consequently, the voltage drop at A increases and the voltage drop at B decreases, as explained using Equations 1.4 and 1.5. On the contrary, if a negative voltage pulse is applied at IN, the voltage drop at A decreases and that at B increases. The weight supplied will be either positive or negative depending on whether VA or VB is greater.

It is interesting to note that both the evaluation and training pulses are applied to the memristor bridge through the same node. The question arises as to how the evaluation input would affect the resistance of the bridge and, in turn, the weight of the bridge, if both are applied through the same node. From the experiments conducted, we observed that if the pulse width of the input is within 1 ms, it does not bring any notable change to the resistance of a memristor. Moreover, to ensure that the evaluation pulse does not alter the resistance of the bridge synapse, the evaluation pulse is also supplied as a complement: e.g., if an input to the neural network is +1V, a -1V pulse is applied for the same duration as the +1V input during evaluation to reverse any change to the resistance caused by the input pulse. To change the resistance of a memristor by 40 Ω, a pulse of width 250 µs was required. The experiments and observations are described in detail in Chapter 4.

All connections between neurons in the network are established using the memristor bridge synapse. While training is in progress, each memristor bridge is applied a training pulse based on the random number that was generated for it. The circuitry for applying random pulses is discussed in a later section.

The activation function at the neuron is implemented by operational amplifiers using summing logic. The neuron receives its inputs from the various bridges that are connected to it. Each bridge supplies two voltage components (VA and VB). The neuron first sums these two components individually and then evaluates the difference between the two sums.

Figure 2.6: Summing logic for neuron N3 from Figure 2.4.

Figure 2.6 shows the summing logic implementation of neuron N3 from the circuit in Figure 2.4. Each bridge synapse has two output components, the VA component (voltage from node A) and the VB component (voltage from node B). At the neuron, the individual VA and VB components are summed together first. After the summing is complete, the difference of the two summed values is evaluated; this evaluated voltage is the output of the neuron. VASUM and VBSUM in Figure 2.6 are evaluated by summing the VA components and VB components from memristor bridges BR1 and BR3. After the summation, the difference is evaluated by subtracting VBSUM from VASUM, which gives the output N3OUT of neuron N3.

Both the summing and difference logic are implemented with the help of operational amplifiers. Each neuron contains three operational amplifiers, two for the summing circuits and one for the difference circuit. The implementations of the summing and difference circuits are explained in the following sections.
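The summing-and-difference behaviour just described amounts to the following idealized computation; this is a sketch that ignores op-amp non-idealities, and the sample voltages are made-up values:

```python
def neuron_output(bridge_outputs):
    """Sum the VA components and the VB components separately, then take
    the difference, mirroring the two summing amplifiers and the
    difference amplifier of the neuron block."""
    va_sum = sum(va for va, _ in bridge_outputs)
    vb_sum = sum(vb for _, vb in bridge_outputs)
    return va_sum - vb_sum

# Neuron N3 receives the (VA, VB) pairs of bridges BR1 and BR3.
print(neuron_output([(0.55, 0.45), (0.40, 0.60)]))  # approximately -0.10
```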

2.2.1.2 Summing Amplifier

The summing operation is implemented using a voltage averaging circuit along with an operational amplifier, as depicted in Figure 2.7. Note that the resistors used in the amplifier circuits are normal resistors and not memristors; the memristors are used only in the bridge synapses. The voltage averaging is accomplished by connecting the input voltages to resistors of resistance R, the other ends of which are connected to the same node. For example, in Figure 2.7, the voltages VA from BR1 and BR3 are connected to two resistors of resistance R, and the voltage at node S1 is then the average of the two input voltages. To get the sum from the averaged voltage, the voltage at node S1 needs to be multiplied by the total number of inputs to the summing circuit. This is accomplished by adjusting the gain of the operational amplifier. In the circuit in Figure 2.7, the operational amplifier is in the non-inverting configuration, whose gain is determined by the two resistors R1 and R2.

Figure 2.7: Summing circuit using voltage average and operational amplifier circuits.

The gain for this particular amplifier is two, since there are two inputs to the summing

circuit. VASUM will automatically be generated after the circuit receives the input voltages. For a neuron which has n inputs, the operational amplifier will be configured to have a gain of n. The gain of the transistor once fixed, does not have to be altered during the operation of the neural network. An important point to note here is that the output, VASUM of the summing circuit is limited by the supply voltage to the differential amplifier. In the case of the circuit in Figure 2.7, the output voltage will be within -1V to +1V.

2.2.1.3 Difference Amplifier

Figure 2.8: Difference circuit using differential amplifier.

The implementation of the difference amplifier is much more straightforward. The differential amplifier circuit used for this operation is shown in Figure 2.8. It is configured with a gain of 1; the input VASUM is supplied to the non-inverting input and VBSUM to the inverting input. All the resistors in the circuit have the same value. The circuit essentially performs the operation N3OUT = VASUM - VBSUM.


2.2.2 Microcontroller

The microcontroller is one of the key components of the architecture. It is responsible for implementing the training algorithm by supplying all the necessary signals to the memristor bridge synapses and the neurons. The microcontroller additionally generates the random numbers required to train the weights of the bridge synapses.

2.2.2.1 Signals Generated by the Microcontroller

The microcontroller contains the logic for generating the control signals that are required to train and operate the neural network. There are three control signals: update/evaluate, shift in and clk.

1. update/evaluate: This signal decides whether the neural network is in weight update or evaluation mode. When update is high (update = 1), the network is in weight update mode. The microcontroller supplies this signal to activate the weight update process by enabling the +1V and -1V power rails. When the signal is low, the network is either evaluating its output using the supplied external input, or is in an idle state. When the network is in idle state, all bridge synapses and neurons are undriven.

The update signal is also used to isolate the memristor bridges from the operational amplifiers during the weight update phase. This isolation is very important to ensure that the training pulse on one memristor bridge synapse is not propagated forward to the next layer. It is achieved by disabling the input power rails to the differential amplifier through power gating.

2. shift in: The random numbers for each memristor bridge synapse are supplied using this signal. Each bridge requires a random bit (either 0 or 1), and this random number is generated and supplied by the microcontroller. The random numbers are passed on to a shift register that is connected to the bridge synapses. Each bridge synapse has one D flip-flop associated with it to supply the random number for training. The random numbers are supplied to the shift register through the shift in line. There may be more than one shift in line depending on the size of the neural network and the number of shift registers implemented.

3. clk: The clk is the global clock supplied to the entire neural network and is used for supplying the random numbers to all the flip-flops in the network. This signal is activated only when the random inputs are supplied to the neural network for training.

2.2.2.2 Functions of the Microcontroller

The microcontroller is the central component of the architecture. It is responsible for supplying the input, supplying training signals, updating weights and evaluating the output of the network. It ensures that all the components in the network function in a synchronized manner.

Figure 2.9: Neuron N3 inputs and output for the neural network in Figure 2.4.

1. Synchronizing Input Application:

The microcontroller enables and disables the application of external inputs to the neural network. The external inputs are supplied to the network for only a very short period of time (<1ms) during the evaluation phase. The inputs must be disabled while the training pulses are applied to the memristor bridge synapses; otherwise, two strong signals would drive a single node, which may even damage the circuit. Using the signals it generates, the microcontroller ensures that multiple sources will not drive a memristor bridge synapse at any given time.

2. Disabling Operational Amplifiers:

Figure 2.9 shows the two input bridges BR1 and BR3 of neuron N3 and the bridge BR5 connected to its output. The external inputs as well as the weight update pulses are supplied to BR1 and BR3 through nodes IN1 and IN2 respectively. The weight update pulse for BR5 is supplied through node N3OUT. During the weight update process, the voltage pulses supplied to BR1 and BR3 would propagate all the way to the differential amplifiers and be amplified into a voltage value at N3OUT, conflicting with the update pulse supplied there for BR5. To avoid this scenario, the differential amplifiers are turned off while the training pulses are supplied. Since the inputs and outputs of the differential amplifiers are then electrically isolated, the bridges themselves remain electrically isolated from one another. This is ensured by gating the power rails of the operational amplifiers with the update/evaluate signal generated by the microcontroller.

3. Generating Random Numbers:

The microcontroller is responsible for generating the random numbers required for updating the weights of the memristor bridges during training. Each memristor bridge requires a uniquely generated binary random number to decide the direction of its weight update: if the random number associated with a bridge synapse is 1, its weight is increased; if it is 0, the bridge's weight is decreased. Each memristor bridge synapse has a D flip-flop that stores its training input. All the flip-flops are connected as a shift register and their inputs are supplied serially, since a neural network may have more than ten thousand bridge synapses, each with its own D flip-flop, making it impractical to supply so many random numbers in parallel.

4. Error Calculation and Processing:

Every iteration in training generates an output. The microcontroller reads this output and compares it with an expected output already stored in its memory to generate a mean-squared error value. This error is compared against a predefined, stored threshold. If the error falls within the threshold, the training process comes to an end. If the error is above the threshold, it is compared with the error generated during the previous iteration. If the new error is less than the old error, the flip-flop values are not updated, and the weight update is done in the same direction for each bridge as in the previous iteration. If the new error is greater than the old error, the microcontroller generates new random values and sends them through the shift register to all memristor bridges before the weight updates are done.
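For clarity, the per-iteration decision described above can be summarized in code. The following Python sketch is illustrative only (names and structure are ours, not a specification of the microcontroller logic):

```python
import random

def rwc_step(error, prev_error, threshold, bits, n_bridges):
    """One decision step of the Random Weight Change training loop.
    bits[i] = 1 means bridge i gets a +1V update pulse, 0 means -1V."""
    if error <= threshold:
        return "stop", bits                 # training complete
    if error > prev_error:
        # Error grew: shift a fresh random direction into every flip-flop.
        bits = [random.randint(0, 1) for _ in range(n_bridges)]
    # Otherwise keep the same bits: update weights in the same direction.
    return "update", bits
```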

2.2.3 Shift Register

A shift register is a cascade of flip-flops that share the same clock. A shift register is used in the neural network to supply the random inputs necessary for updating the weights of the memristor bridge synapses. Each bridge synapse has a flip-flop associated with it, which holds a 1 or 0 to dictate whether a positive or negative pulse should be applied to the memristor bridge for weight update. A positive pulse increases the bridge's weight while a negative pulse decreases it.

There will be as many flip-flops as there are memristor bridge synapses. The number of shift registers may vary depending on the design. If there is a large number of memristor bridges, it may be suitable to have multiple shift registers to which the random numbers can be supplied in parallel. For example, if there are 10,000 flip-flops in the neural network, having two shift registers of 5,000 flip-flops each, with the inputs to the two registers supplied in parallel, completes the process in half the time. However, if there are only 20 bridges, it is not desirable to have two separate shift registers, due to the logic overhead required to supply pulses to two different shift registers.

Although the flip-flops of the shift register are part of the neuron block, we describe the shift register as a separate component, since it is formed by the interconnection of flip-flops across multiple neuron blocks.


2.2.4 Connection Buses

The connection buses are used to supply the inputs to each memristor bridge in the neuron block. The same input needs to be supplied to multiple memristor bridge synapses, and we use a bus to establish this connection. For example, for the neural network in Figure 2.3, each input is supplied to 10 memristor bridge synapses inside 10 different neuron blocks that are placed on three edges of the chip. A bus of 960 lines is laid around the 10 neuron blocks so that all neuron blocks receive all 960 input signals.

2.3 Memristor Bridge Synapse Bit-Slice

The memristor bridge synapse is the most recurring element in the neural network design. Each memristor bridge synapse requires additional circuitry to implement the weight change and evaluation logic. This extra circuitry is composed of a flip-flop to hold the selection logic for weight change and a multiplexer to select the input power rails. Since these components need to be implemented alongside every memristor bridge synapse, it is logical to create a bit-slice design with the bridge and the components that go along with it. The flip-flops in different bit-slices are connected together to form a shift register. Figure 2.10 shows the bit-slice for the memristor bridge synapse. A multiplexer circuit is implemented in the bit-slice using transistor logic to supply either +1V or -1V to the input br in of the memristor bridge synapse. The D flip-flop, which is part of the shift register, holds the value that determines which power rail the memristor bridge synapse will be connected to during weight update. Table 2.1 summarizes the logic. When update = 0, node br in will either be undriven or driven by bridge input, depending on whether or not the microcontroller has asserted the training input signal; the power rails to the multiplexer are gated, so br in can only be driven by bridge input. When update = 1, the power rails to the multiplexer are active and the weight update pulse is automatically applied.

Table 2.1: Training input selection logic.


Figure 2.10: Memristor Bridge Synapse Bit-Slice.

update   shift out   br in
0        0           undriven / bridge input
0        1           undriven / bridge input
1        0           -1V
1        1           +1V

Combining these components into a bit-slice design makes creating the layout for the architecture much more efficient. The neuron blocks will be composed of an array of these bit-slices and the differential amplifier circuits.
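The selection logic of Table 2.1 is simple enough to state directly. The Python sketch below (a hypothetical helper, for illustration only) captures the multiplexer behavior:

```python
def br_in_drive(update, shift_out):
    """Drive applied to node br_in by the bit-slice multiplexer (Table 2.1)."""
    if update == 0:
        # Rails gated: br_in is undriven, or driven by bridge input
        # if the microcontroller asserts the training input.
        return "undriven / bridge input"
    return "+1V" if shift_out == 1 else "-1V"
```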

2.4 Architecture in a Nutshell

The scalable hardware architecture is composed of several layers of neuron blocks. The number of neuron blocks in each layer defines the number of neurons in the layer. Each neuron block in a layer receives all the outputs of the previous layer or all the primary inputs.


The innermost layer in the architecture is the output layer, whose neuron blocks are considerably smaller than those of the other layers. Each neuron block contains as many memristor bridge synapse bit-slices as it has inputs. Each bit-slice is composed of a memristor bridge synapse, a multiplexer circuit, an inverter and a flip-flop. The flip-flops across multiple bit-slices are connected together to form a shift register. The neuron blocks also contain three operational amplifier circuits for summation and difference. Training of the neural network is synchronized by the microcontroller. It reads the neural network output to compute the error, generates all control signals and random bits for training, and supplies the training input. Training is implemented by the Random Weight Change algorithm, with the microcontroller comparing the output error at each iteration with that of the previous iteration and deciding whether to continue weight change in the same direction or switch to a new random direction. Training is stopped when the output error goes below a set threshold or when the iteration limit is reached.

2.5 Summary

In this Chapter, we presented an overview of the proposed architecture for memristor based artificial neural networks. All the individual building blocks of the architecture were introduced, and we explained how they piece together to form the complete hardware for an artificial neural network. In the next Chapter, we introduce a placement and routing tool that can be used for realizing the hardware architecture presented in this Chapter.

Chapter 3

Placement and Routing Tool for Memristor Neural Network Architecture

In Chapter 2, we described the hardware architecture for implementing Memristor Bridge Synapse based artificial neural networks. In this Chapter we describe the placement and routing tool that can be used to translate the architecture into a physical layout. The tool is only a prototype and is designed to be modified when memristor layout libraries become available.

3.1 Tool Overview

The hardware architecture was designed in such a way that neural networks with a large number of inputs can be easily implemented on a chip. In Chapter 2, we described the architecture with the help of an example neural network with 960 inputs, 10 hidden layer neurons and 4 output layer neurons with complete connectivity. This neural network is composed of a total of 9640 Memristor Bridge Synapses, 38,560 memristors, 9640 flip-flops, 9640 voltage multiplexers, 9640 inverters and 42 differential amplifiers. Realizing such a large circuit with over 65,000 components requires efficient floorplanning and routing. The tool also needs to be adaptable to cases where the number of hidden layers is more than one, which is the primary advantage of the hardware architecture.

We have implemented a tool in C++ to generate a layout capable of being loaded into the Magic [21] layout editor. Magic is a free and open source VLSI layout tool initially developed at UC Berkeley. Our placement and routing tool generates a .mag file with layout information stored as coordinates in a two dimensional grid space, which can be read directly by the Magic layout editor. The placement and routing is done at a level of abstraction where we illustrate only neuron blocks being placed and routed. The neuron blocks consist of Memristor Bridge Synapse bit-slices and differential amplifiers.

As mentioned earlier, placement and routing in Magic is based on two dimensional grids, where the dimension of each grid is the feature size of the technology used. The blocks and the connection wires are printed on the two dimensional grid space by specifying what material should be printed on each grid. Once the placement and routing is complete, the grid coordinates and the material each grid holds are written out as text to a .mag file. In most placement and routing tools, the grid space is stored in a data structure, so that each entry pertaining to a coordinate can hold a specific value indicating what material that grid space holds. Once the placement and routing is complete, the tool prints out the coordinates and the content of each grid to the .mag file.

For our neural network architecture, the total grid space required for placing and routing the neural network for face pose identification was 7680 x 11520. If a data structure were created to store each grid in the grid space, memory would have to be allocated for a total of 88,473,600 grid values.

For most VLSI circuits, placement and routing is accomplished by implementing well-established algorithms. These tools also take a certain amount of runtime to create and print out a layout to a .mag file, depending on the size of the circuit and the routing complexity. Our architecture, however, is basically multiple instantiations of the same components arranged in a specific, structured way: we place and route neuron blocks on three sides of concentric squares. By taking advantage of this symmetry in the design, we mathematically calculate the grids on which different materials fall and directly write the data to the .mag file. This way, we avoid the overhead of creating a large data structure to store the contents of each grid, which in turn brings a significant reduction in runtime. An analysis of the tool performance is given in a later Section of this Chapter.
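As a sketch of this approach, the following Python fragment writes precomputed rectangles straight to a .mag file without materializing the grid. The exact header lines expected by Magic can vary between versions, so treat the format details here as illustrative assumptions rather than a specification:

```python
def write_mag(path, layers, tech="scmos"):
    """Emit precomputed rectangles directly in Magic's .mag text format.
    layers maps a layer name (e.g. 'pdiffusion') to a list of rectangles
    given as (xbot, ybot, xtop, ytop) in grid (lambda) units."""
    with open(path, "w") as f:
        f.write("magic\n")
        f.write(f"tech {tech}\n")
        for name, rects in layers.items():
            f.write(f"<< {name} >>\n")
            for x1, y1, x2, y2 in rects:
                f.write(f"rect {x1} {y1} {x2} {y2}\n")
        f.write("<< end >>\n")

# One 20 x 20 neuron block represented in p-diffusion:
write_mag("network.mag", {"pdiffusion": [(0, 0, 20, 20)]})
```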

In the following sections, we describe the tool flow and the algorithm, the layout components, an analysis of the output, and how the tool produces the layout for networks with different numbers of hidden layer neurons. We also discuss the tool run-time and the area consumed by the layout.

3.2 Tool Flow

The placement and routing tool takes the number of inputs, number of hidden layers and hidden layer neurons and the number of output layer neurons as its input. From the description of the architecture in Chapter 2, we can see that neuron blocks are placed on three sides of a square shaped chip and the outputs are drawn out to the fourth side. The input layer neurons are placed close to the periphery of the chip and the output layer neurons at its center. The hidden layers will be placed between the input and output layer neurons.

The tool only places and routes the neuron blocks discussed in Chapter 2. Each neuron block will contain three operational amplifier circuits and as many Memristor Bridge Synapse bit-slices as inputs to the layer in which the block is present.

The tool starts by first placing the output layer neuron blocks on three sides of the innermost area of the chip. Placement and routing in Magic, as discussed in the previous section, is on two dimensional grids. The materials on the layout are printed on the two dimensional grid space by specifying the bottom left and top right corners of a rectangle and the type of material occupying that rectangular area. The number of blocks to be placed on the top, bottom and left sides is derived from the following simple formulae:

\[ \text{No. of blocks on top side} = \left\lfloor \frac{\text{Total no. of blocks}}{3} \right\rfloor \]

\[ \text{No. of blocks on left side} = \left\lfloor \frac{\text{Total no. of blocks}}{3} \right\rfloor + (\text{Total no. of blocks}) \bmod 3 \]

\[ \text{No. of blocks on bottom side} = \left\lfloor \frac{\text{Total no. of blocks}}{3} \right\rfloor \]
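In code, the distribution is a one-liner per side. This Python sketch (the function name is ours) reproduces the formulae, with the remainder absorbed by the left side:

```python
def distribute_blocks(total):
    """Split a layer's neuron blocks across the top, left and bottom
    sides of the square, per the formulae above."""
    base = total // 3
    return {"top": base, "left": base + total % 3, "bottom": base}

print(distribute_blocks(10))  # {'top': 3, 'left': 4, 'bottom': 3}
```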

Figure 3.1: Placement of 10 blocks of output layer on layout represented with p-diffusion.

For simplicity, we abstract the components inside the neuron block and represent each neuron block with p-diffusion in the Magic layout editor. The size of each block is estimated from the number of inputs to the block. For example, if a particular layer receives n inputs from either the previous layer or the primary input to the network, the size of each block is set to 2n x 2n grids. The blocks are separated by an offset of 5 grid spaces. The separation of blocks with an offset is done only to illustrate the placement and routing of blocks; in an actual implementation, the blocks can be placed next to each other. Figure 3.1 shows 10 neuron blocks of dimension 20 x 20 grids, where each block receives 10 inputs from either the previous layer or the primary input.

Figure 3.2: After routing of input bus for placed blocks in Figure 3.1.

After placing the blocks, the tool creates a bus on the outside of the layer of neuron blocks that spans all neuron blocks in the layer. Since each neuron block requires all inputs from the previous layer, the bus is used to supply the inputs. The inputs to the bus come from either the output of the previous layer or the primary input. Figure 3.2 shows the layout of Figure 3.1 after input bus routing.

Once routing of the input bus is complete, the tool routes the output of the neuron layer. If the layer under construction is the output layer, then wires are drawn to connect the output to the pins. For hidden layers, the output of the layer is connected to the input bus of the next layer. Note that the output layer is routed first, followed by the hidden layers and finally the input layer.

After construction of one layer is complete, the tool picks up the next higher layer in the network and repeats the same process. On finishing construction of all layers of the neural network, the tool routes the control signals and power rails through the blocks. Figure 3.3 shows a completely routed 3-layered neural network with 30 inputs, 10 hidden layer neurons and 10 output layer neurons.

Figure 3.3: Completed placement and routing for neural network with 30 inputs, 10 hidden layer neurons and 10 output layer neurons.

The flowchart in Figure 3.4 summarizes the tool flow for placement and routing. In the next section, we analyze the output of the layout tool for a few example neural networks.

3.3 Output and Performance Analysis

In the previous section, we presented the flow for our placement and routing tool and showed the final layout for a neural network with 30 inputs, 10 hidden layer neurons and 10 output layer neurons. In this section, we discuss the output produced for different sized neural networks, analyze the area occupied by the layout and its efficiency, and briefly discuss the runtime for generating the layout.

Figure 3.4: Flowchart showing the tool flow for placement and routing.

3.3.1 Area Analysis

To analyze the area occupied by the neural networks on layout, we present layouts for three different neural networks. We discuss the total area occupied by each layout and also show what percentage of the total area is used for placement and routing for the different sized neural networks.

The largest neural network we created using the tool was for the face pose identification problem, which received 960 inputs and was composed of 10 hidden layer neurons and 4 output layer neurons. Figure 3.5 shows the full layout for this neural network.

Figure 3.5: Layout for face pose identification neural network with 960 inputs, 10 hidden layer neurons and 4 output layer neurons.

The network occupies a total grid dimension of 9642 x 15393 grids. It is clearly visible from the Figure that a large percentage of the total area is left unused. The reason for this is the nature of the neural network implemented. Due to the large number of inputs to the network, each neuron in the hidden layer receives 960 inputs, which means that there are 960 Memristor Bridge Synapses associated with each hidden neuron. In contrast, each neuron in the output layer receives only 10 inputs from the previous layer, which makes the neuron blocks in the output layer very small. Figure 3.6 shows the output layer of the neural network after zooming into the center of the layout.


Figure 3.6: Output layer layout for face pose identification neural network.

In Figure 3.7, we show the output layout of a neural network with 80 inputs, 12 hidden layer neurons and 15 output layer neurons. Table 3.1 compares the total layout area for different neural networks at different technology nodes, and Table 3.2 shows the total unused area in the layouts for the same networks.

Table 3.1: Comparison of total layout area for neural networks for different technology nodes.

Inputs   Hidden layer neurons   Output layer neurons   Grid Dimensions   Area (mm2) for 2λ = 45 nm   32 nm         22 nm
960      10                     4                      9642x15396        0.075                       0.038         0.018
80       12                     15                     1007x1313         6.60 × 10^-4                3.38 × 10^-4  1.60 × 10^-4
30       10                     10                     342x513           8.88 × 10^-5                4.49 × 10^-5  2.12 × 10^-5

Figure 3.7: Output layer layout for neural network with 80 inputs, 12 hidden layer neurons and 15 output layer neurons.

Table 3.2: Fraction of unused area in layout for different neural networks

Inputs   Hidden layer neurons   Output layer neurons   Grid Dimensions   Unused area
960      10                     4                      9642x15396        35%
80       12                     15                     1007x1313         40%
30       10                     10                     342x513           41%

From Table 3.2, we can see that a significant area of the layout is unused. This area can be used to implement other logic required for the operation of the neural network. For example, the generation of random numbers can be accomplished by implementing a linear feedback shift register (LFSR) within the circuit. This way, the time required to shift the random numbers into the shift register can be significantly reduced.
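As an illustration of the kind of on-chip random number generation suggested here, the sketch below models a 16-bit Fibonacci LFSR with a standard maximal-length polynomial. The width and tap choice are our assumptions for illustration, not part of the proposed design:

```python
def lfsr16(seed=0xACE1):
    """16-bit Fibonacci LFSR (taps 16, 14, 13, 11) yielding one
    pseudo-random training bit per clock."""
    state = seed
    while True:
        bit = ((state >> 15) ^ (state >> 13) ^ (state >> 12) ^ (state >> 10)) & 1
        state = ((state << 1) | bit) & 0xFFFF
        yield bit

gen = lfsr16()
print([next(gen) for _ in range(8)])  # e.g. one bit per bridge synapse
```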

3.3.2 Runtime Performance

Since we take advantage of the symmetry of the architecture and the multiple instantiations of the same components, creating the layout through mathematical and geometric calculations that require little memory access and processing, the runtime for creating the layout is very short. The largest layout we created using the tool was the neural network for face pose identification, which occupied 9642 x 15393 grids. The runtime required to create this network on a PC with an Intel Core i3-370M processor at 2.40 GHz was less than 0.2s. Since the runtime for the tool is so small, we do not report the runtimes for the other neural networks we created.

3.4 Scalability

Figure 3.8: Neural network with 80 inputs and 15 output layer neurons having two hidden layers with 30 neurons in the first hidden layer and 25 neurons in the second hidden layer.

We show the scalability of the tool with the example of one neural network. This neural network has 2 hidden layers, with 30 neurons in the first hidden layer and 25 neurons in the second hidden layer. The network has 80 inputs and 15 output layer neurons. Figure 3.8 shows the neural network layout. This neural network occupies 1974x2303 grids on layout. When scaling the architecture to incorporate more hidden layers in the neural network, the number of neurons in each layer needs to be carefully planned. A reasonable ratio needs to be maintained between the number of inputs to a layer, the number of neurons in the layer and the number of neurons in the succeeding layer. It should be noted that the number of primary inputs to the neural network needs to be greater than the number of neurons in the first layer.

3.5 Summary

In this Chapter, we introduced and described the placement and routing tool that can be used to realize the scalable hardware architecture for memristor based neural networks. We gave an overview of the tool and explained how it was designed. The tool flow was described, and an analysis of the tool output in terms of the area occupied by different neural networks was given. The scalability of the tool was illustrated with the help of an example neural network with two hidden layers. In the next Chapter, we discuss the experiments, observations and results.

Chapter 4

Experimental Results and Analysis

The simulations in this work were primarily done in SPICE and Python. The basic components such as the memristor, the memristor bridge synapse, the memristor bit-slice and a small neural network were simulated using SPICE. Bigger neural networks were simulated using Python, which mimicked the behavior of the basic components at a higher level of abstraction. In this Chapter, we describe the observations and results of the SPICE simulations and analyze the results. The simulation results from Python are not presented here, since they do not convey anything different from what was reported in [9].

We build confidence in the design by first illustrating the proper functionality of the basic components of the architecture using SPICE. We begin by describing the behavior of the memristor followed by the memristor bridge synapse. Once these components are described, we go on to simulate the summing and difference logic using operational amplifier circuits. After simulating all the individual components, we first perform a partial simulation to illustrate the training process of the small neural network. This network will have all the basic components working together in unison. We then go on to simulate a complete training of a neural network to learn the OR-gate function.


4.1 Memristor Simulation

Biolek et al. developed a mathematical model for the memristor based on the findings in [13], incorporating non-linear dopant drift modeled using window functions [22]. In the experiments presented here, we have used an ideal model of the memristor in the simulations.
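To make the device behavior concrete, the sketch below integrates the linear ion-drift memristor model underlying [13]. This is a simplified illustration, not the model used in our simulations: it omits the window functions of [22], and the thickness D and mobility MU_V are assumed, illustrative values.

```python
RON, ROFF = 116.0, 16e3   # on/off resistances used in this thesis (ohms)
D = 10e-9                 # device thickness (m), assumed
MU_V = 1e-14              # dopant mobility (m^2/(V*s)), assumed

def memristance(v_samples, dt, x0=0.5):
    """Euler integration of dx/dt = MU_V*RON/D^2 * i(t), with
    R(x) = RON*x + ROFF*(1 - x) and the state x clamped to [0, 1]."""
    x, rs = x0, []
    for v in v_samples:
        r = RON * x + ROFF * (1.0 - x)
        i = v / r                                   # instantaneous current
        x = min(max(x + MU_V * RON / D**2 * i * dt, 0.0), 1.0)
        rs.append(r)
    return rs

# Resistance trace for a constant +1V drive over 10ms (1000 steps of 10us):
trace = memristance([1.0] * 1000, dt=10e-6)
```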

Figure 4.1: Circuit for Memristor simulation with Memristor M1 (Ron =116Ω, Roff =16kΩ) in series with resistor R1 (100Ω) and Voltage source Vin.

Figure 4.1 shows the circuit used for simulating the memristor. In this circuit, memris- tor M1 is connected in series with a resistor R1 and a voltage source Vin. The memristor used in this simulation has on resistance (Ron) of 116Ω and off resistance (Roff ) of 16kΩ. The series resistor R1 is 100Ω.

The circuit was simulated by supplying +1V and -1V DC voltages and the change in the resistance of the memristor M1 was measured. The resistance of the memristor was calculated based on the measured instantaneous current and applied voltage.

The memristor in its initial state has a resistance of 1kΩ. A +1V was first applied to terminal A and the memristor was brought to its ON state (state of least resistance). Then a -1V was applied for a certain period of time and the memristor was brought to its OFF state (state of most resistance). Figure 4.2 shows the waveform for this simulation.

Figure 4.2: Memristor simulation with DC voltages +1V and -1V.

It took about 9ms for the +1V pulse to completely turn the memristor ON and make it reach Ron = 116Ω. From the Ron state, it took the -1V pulse about 20ms to bring the memristor to its near-OFF state. From Figure 4.2, we can see that the memristor does not go completely into the OFF state for a long period of time. This is due to the non-linear nature of the memristor. This behavior is expected, since it takes a pulse of longer duration to turn the memristor OFF than to turn it ON.

To observe the resistance change of the memristor more closely, we applied voltage pulses of different durations and measured the resistance change, with a view to finding a suitable training pulse for the memristor bridge synapses. We performed two simulations on two instances of the circuit in Figure 4.1: to one of the circuits, a positive pulse was supplied at node A, and to the other a negative pulse was applied at node A, making them forward and reverse biased respectively.

In the first simulation, voltage pulses with durations in milliseconds were supplied to both instances of the circuit. Figure 4.3 shows the simulation waveforms for the millisecond pulses. Pulses of pulse-width 10ms and 5ms were applied to the circuit, the instantaneous initial and final currents were measured, and the resistance of the memristor was calculated. In the second simulation, pulses with durations in microseconds were applied and similar calculations were made. Figure 4.4 shows the simulation waveforms for pulses of pulse-width 400µs and 250µs.

Figure 4.3: Resistance change in the memristor for millisecond input pulse-widths.

The waveform in Figure 4.4 also illustrates that applying a negative pulse for the same duration can reverse the effect of the initial positive pulse, and vice versa. The results of the simulation are summarized in Table 4.1 and Table 4.2; Table 4.1 shows data for the forward biased circuit and Table 4.2 for the reverse biased circuit. The instantaneous initial and final currents were measured for both forward and reverse biased memristors, and the instantaneous resistance values were calculated. Input voltages of pulse-width 10ms, 5ms, 400µs and 250µs were applied and the resistance changes were measured. The objective of this experiment was to identify a pulse-width that would bring an optimal resistance change in the memristor for updating the weight of the memristor bridge synapse.

Table 4.1: Instantaneous current and resistance measurements for forward biased memristor.

Pulse-width   Iinit (A)      Ifinal (A)     Rinit (Ω)   Rfinal (Ω)   Delta R (Ω)
10ms          1.23 × 10^-4   1.59 × 10^-4   8057.9377   6176.2819    1881.6557
5ms           1.59 × 10^-4   1.93 × 10^-4   6176.2819   5083.4957    1092.7862
400µs         1.23 × 10^-4   1.24 × 10^-4   8057.9377   7989.9604    67.977314
250µs         1.24 × 10^-4   1.24 × 10^-4   7989.9604   7946.9944    42.965912

Figure 4.4: Resistance change in the memristor for microsecond input pulse-widths.

From the data, we can see that pulses with pulse-widths in the millisecond range bring a very large change in the resistance of the memristors. The pulses in the microsecond range seem more suitable for weight training, since the resistance change is less than 100Ω; this gives a wider range for the weights supplied by the memristor bridge synapse. A positive 1V pulse of 400µs decreases the resistance of the memristor by about 68Ω, and a negative 1V pulse of the same duration brings an increase of nearly the same amount. Simulating the memristor bridge synapse will give a better idea of how the pulse-width of the training pulse affects the weights of the neural network.

Table 4.2: Instantaneous current and resistance measurements for reverse biased memristor.

Pulse-width   Iinit (A)      Ifinal (A)     Rinit (Ω)   Rfinal (Ω)   Delta R (Ω)
10ms          1.23 × 10^-4   1.03 × 10^-4   8057.9377   9587.1065    1529.1688
5ms           1.03 × 10^-4   9.67 × 10^-5   9587.1065   10239.123    652.01678
400µs         1.23 × 10^-4   1.22 × 10^-4   8057.9377   8125.7136    67.775907
250µs         1.22 × 10^-4   1.21 × 10^-4   8125.7136   8167.1958    41.482187

4.2 Memristor Bridge Synapse Simulation

To simulate the memristor bridge synapse, we used the arrangement shown in Figure 2.5 and inserted a voltage source at node IN. Figure 4.5 shows the circuit used for simulating the memristor bridge synapse.


Figure 4.5: Memristor Bridge Synapse circuit used for simulation.

As described earlier, the memristors are connected such that in each branch of the bridge one memristor is forward biased and the other reverse biased. The output voltage of the bridge synapse is tapped from nodes A and B. The weight supplied by the memristor bridge synapse is adjusted by changing the resistances of the memristors on the bridge. The difference in the voltages at nodes A and B (VA − VB) is the output of the bridge synapse.

Figure 4.6: Memristor Bridge Synapse simulation waveform.

The memristor bridge synapse was simulated by applying a 1V pulse of width 400µs. Before the pulse was applied, all the memristors were brought to their initial state, with all memristors having the same resistance. The waveforms from the simulation are shown in Figure 4.6. The first pulse, applied for 400µs, is the update pulse. The second spike in the waveform is the training input pulse, applied for a much shorter duration (5ns). We can observe that the voltage difference (VA − VB) produced by the memristor bridge synapse is about 4.26mV after updating the weights. When the weight update pulse is applied, the resistances of memristors M1 and M4 decrease, while those of M2 and M3 increase. This results in an increased voltage drop at node A and a decreased drop at node B.

Figure 4.7: Evaluation pulse applied to Memristor Bridge Synapse.

The evaluation pulse is applied to read out the output of the network and is of much shorter duration than the training pulse. The 5ns pulse is short enough not to bring any notable change in the resistance of any of the memristors in the bridge synapse. Figure 4.7 shows a magnified view of the evaluation pulse from Figure 4.6. We can see that the output voltages at neither node A nor node B changed after the evaluation pulse was supplied. Usually, the evaluation pulse is applied together with another voltage pulse of the same duration but opposite polarity, to negate any resistance change that might be incurred during evaluation.

Table 4.3 shows the voltage difference generated for training pulses of different pulse-widths. In each of the simulations, all the memristors in the memristor bridge synapse were in the same initial state (R = 8050Ω) before the training pulse was applied. A voltage pulse of 1V amplitude was applied to node IN as the training input.

Table 4.3: Weight change for different training signal pulse-widths for the memristor bridge synapse

Pulse-width (µs)   VA (V)    VB (V)    VA − VB (V)
400                0.50213   0.49787   0.00426
800                0.50426   0.49574   0.00852
1200               0.50638   0.49362   0.01276
2400               0.51277   0.48723   0.02554
4800               0.52499   0.47501   0.04998

The memristors used in our experiments have an ON resistance of 116Ω and an OFF resistance of 16kΩ. In terms of magnitude, the minimum and maximum output voltages the memristor bridge synapse can produce are 0V and 0.9928V, assuming that the maximum input voltage is 1V. This means that if a 400µs pulse of 1V magnitude is used as the training pulse, each bridge synapse will have about 466 possible weights. If a 4800µs pulse of 1V magnitude is used, there would be around 40 possible weights for each memristor bridge synapse. The length of the training pulse should be chosen based on the requirements of the function the neural network is attempting to approximate. For certain problems it is effective to have a larger number of available weights, while for others a smaller number could be more efficient.
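The weight-resolution figures above follow directly from the ratio of the output range to the per-pulse step. A quick check in Python (constants taken from the text; the helper name is ours):

```python
V_MAX = 0.9928  # maximum output magnitude of the bridge synapse (V)

def possible_weights(delta_v):
    """Approximate count of distinct weights when each training pulse
    moves VA - VB by delta_v volts over the range [-V_MAX, +V_MAX]."""
    return round(2 * V_MAX / delta_v)

print(possible_weights(0.00426))  # ~466 for a 400us, 1V pulse
print(possible_weights(0.04998))  # ~40 for a 4800us, 1V pulse
```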

4.3 Memristor Bridge Synapse Bit-Slice Simulation

The Memristor Bridge Synapse Bit-Slice was simulated to test its functionality. The bit-slice design as described in Figure 2.10 is composed of the memristor bridge synapse, a flip-flop and a multiplexer circuit to choose between +1V and -1V training pulse. The output of the flip-flop controls whether the multiplexer supplies a +1V or -1V to the input of the memristor bridge synapse during training.

A simple simulation was done to verify the functionality of the memristor bridge synapse bit-slice. The experiment results are shown in the waveform in Figure 4.8. The update signal is used to control weight update and training input application. When update is low, the +1V and -1V rails used for weight update are gated and driven to GND. This ensures that there will be no leakage current through the multiplexer circuit that would otherwise alter the weight of the bridge. When update is high, the power rails are active and the weight update process is activated. The weight-update pulse is applied to the memristor bridge synapse for as long as the update signal is high. During the weight update phase, the training input is not applied.

Figure 4.8: Memristor Bridge Synapse Bit-Slice simulation waveform.

In the simulation waveform in Figure 4.8, the update signal is initially kept low. A high signal is produced at the input of the flip-flop and a clock signal is applied. Once the clock is applied, the output of the flip-flop (scan out) becomes high. After a small time gap, the update signal is made high and the weight update process is activated. When the weight update is completed, a small pulse is applied through the training input port to the memristor bridge synapse to evaluate the voltages at nodes A and B. When evaluation is complete, a low signal is produced at the flip-flop's input and the same process is repeated.

In this experiment, a weight-update pulse of 30ms was applied while the flip-flop output held a high value, and a 20ms pulse was applied when it held a low value. The waveforms in Figure 4.8 show that the bit-slice circuit functions correctly.


4.4 Simple Neural Network Simulation

To illustrate the working of the memristor based artificial neural network as a system of basic components working together, we simulated the training of the simple memristor based artificial neural network in Figure 2.4, explained in Chapter 2. This neural network aims to approximate the OR-gate function. The components of this neural network include six memristor bridge synapses (Figure 2.5), six summing amplifiers (Figure 2.7) and three difference amplifiers (Figure 2.8). The elements of the memristor bridge synapse bit-slice, such as the D flip-flop and the multiplexer, are omitted in this simulation for simplicity. The weight update pulses are supplied directly to the bridge synapse terminals using separate voltage sources.

Figure 4.9: Neural network training input application and output evaluation.

The initial conditions of the memristors were changed for better clarity in illustrating the functioning of the network: each memristor is set to a different initial resistance. Figure 4.9 shows the simulation results of the circuit functioning. In the waveform, signal n5out is the output of the neural network. Signals in1 and in2 are the two training input signals to the neural network. update is the signal used to switch between weight update and training input application. The training input is applied to the circuit when update is low; the weight update process happens automatically when update goes high.

For the first step, update is made low and the training pulse is applied. The training inputs are given as a complementary pair of pulse-width 1µs. The output is only measured and evaluated during the first 1µs. The second complemented input is applied to restore any change caused to the memristors due to the application of the input.

Figure 4.10: Neural network weight update pulse application.

After the training inputs are applied and the output measured, the output error is calculated by comparing the obtained output with the expected output. Taking the mean squared error of the output, we see that the output error is about 37.4%. Since this error is much greater than desired, we apply weight update pulses to the memristor bridge synapses. For the first training iteration, random weight update pulses are applied to each of the memristor bridges in the network. There are six memristor bridges in this particular example network, and six individual pulses are applied to train them. Figure 4.10 shows the weight update pulses applied to the memristor bridges.

The first set of random weight update pulses applied to the memristor bridge synapses is [-1 1 1 -1 -1 -1] for [BR1 BR2 BR3 BR4 BR5 BR6]. Each weight update pulse is applied for a duration of 400µs. The pulse changes the weight of a memristor bridge synapse in the positive or negative direction, depending on whether a positive or negative voltage is applied. After the weight update pulses are applied, the training input is applied and the network is evaluated to measure the output error. The output value during evaluation is 0.38426V and the output error is 37.9%: the output error has increased after the first set of weight update pulses. The RWC algorithm dictates that a new set of random weight update pulses be applied to the memristor bridge synapses if the output error increased compared to the previous iteration. A new set of voltages, [1 -1 1 1 1 -1], is applied to the memristor bridge synapses, and an output voltage of 0.38608V is obtained on evaluation. The new output error is 37.6%, which is lower than in the previous iteration, so the same weight update pulses are applied again until the error either increases or reaches the expected value. Figure 4.11 (a)-(d) shows the evaluation pulses magnified.

Figure 4.11: Neural network output at evaluation during different iterations.


4.5 OR-Gate Training in SPICE

4.5.1 Experimental Setup

To show a complete training simulation of the Memristor based neural network designed using the hardware architecture described in this thesis, we created a training simulator using HSPICE and Perl. The simulator mimics the behavior of the neural network on hardware as closely as possible. SPICE is the closest and most accurate simulation technique available for electrical and VLSI circuits.

Figure 4.12: Flowchart showing tool flow for neural network training simulator in SPICE.


In our simulator, the neural network is defined as a SPICE circuit. All the logic components of the neural network are modeled at the transistor level in SPICE, except for the differential amplifier, for which we have used an ideal model. For the memristor, we have used the ideal memristor model from [23]. Perl mimics the operations of the microcontroller by generating the control signals and supplying the training input.

The flowchart in Figure 4.12 shows the simulator flow. The simulator is basically a wrapper around HSPICE created using Perl. All interactions of the user are through the command line interface to the Perl script. The simulator receives the number of inputs, hidden layer neurons and output layer neurons along with pointers to files containing the training inputs and the expected output. The user can also supply the error threshold and a maximum iteration count for terminating the simulation. The simulation will terminate if the output error goes below the error threshold or if the number of iterations of training reaches the limit.

The Perl script first reads the training input and expected output from their respective files and stores the information in a data structure. It then creates a SPICE file of the complete neural network with all of its components, using the specifications provided by the user. Note that the SPICE file contains only the neural network; none of the functions of the microcontroller are implemented in SPICE.

For the first iteration, the SPICE file contains only the supplied training input, and all other control signals are made inactive. The Perl script includes instructions to sample the output of the neural network when the training inputs are applied as voltages. The output voltage values are stored in a log file generated by HSPICE. At the end of the HSPICE simulation for an iteration, the HSPICE tool is given a directive to store the circuit state to a file that can be loaded by another SPICE file, so that the next iteration begins from where the previous one finished.

Figure 4.13: Neural network output for learning OR-gate function at the start of simulation.

Once the simulation is complete, Perl reads the network output voltage values from the HSPICE log file and compares them with the expected output to compute the output error. For the first iteration, the error for the previous iteration is saved as 0. The wrapper checks whether the new error is greater or less than that of the previous iteration. If the new error is greater, the wrapper generates random bits for training the neural network. It modifies the PWL voltage inputs defined in the SPICE file to update the weights of the neural network. It also adds lines to reload the circuit state from the end of the previous iteration, and calls the HSPICE simulator for the next iteration. If the new error is less than that of the previous iteration, the wrapper first checks whether the new error is below the error threshold. If so, the training simulation ends; if the new error is still greater than the error threshold, the wrapper starts the next iteration of HSPICE simulation.
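The control flow of the wrapper can be summarized as follows. This Python sketch mirrors the Perl logic at a high level only; run_iteration and apply_new_random_bits are hypothetical stand-ins for the HSPICE invocation and the PWL-source rewriting:

```python
def train(run_iteration, apply_new_random_bits, threshold, max_iters):
    """Skeleton of the simulator wrapper. run_iteration() must launch one
    HSPICE pass and return the mean squared output error."""
    prev_err = 0.0  # first iteration always looks "worse", forcing random bits
    for it in range(1, max_iters + 1):
        err = run_iteration()
        if err < threshold:
            return it                  # converged: stop training
        if err > prev_err:
            apply_new_random_bits()    # RWC: choose a new random direction
        prev_err = err
    return max_iters                   # iteration limit reached
```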

Since the RWC algorithm is an iterative heuristic, the output is not guaranteed to be optimal. There is a chance that the output error may never go below the error threshold during training. Even if the output error is only 0.1% above the error threshold, the circuit continues training and may not find a solution before the iteration limit is reached. Unlike in software, it is not possible to save a snapshot of the circuit for the best-case output and revert to that state; setting the circuit to a specific state would involve changing the resistance values of many memristors. Hence, choosing a suitable error threshold is critical in training the neural network with the RWC algorithm.


4.5.2 Observation and Analysis

We successfully trained a neural network with 2 inputs, 3 hidden layer neurons and one output layer neuron to learn the OR-gate function. The training simulation was done using the Perl-HSPICE simulator explained in the previous section. For this simulation, all the memristors were initially in the same state; hence, the weight supplied by all Memristor Bridge Synapses was 0 at the start of training. Figure 4.13 shows the circuit output for the first iteration of training. v(in1) and v(in2) represent the input pulses supplied to the neural network, and v(l3in) represents the output. The input pulses are supplied as complementary pairs in order to revert the effect of the applied input. The output values obtained for the neural network were [8.2067aV 5.384nV 5.2055nV 9.736nV] for an expected output of [0V 1V 1V 1V]. Note that all four input combinations for the two-input OR-gate are supplied to the neural network for training.

Figure 4.14: Neural network output for learning OR-gate function for 54th iteration of training.

In Figure 4.14 we show the output waveforms for the 54th iteration of training. It is interesting to note that in this iteration, the output voltage for the input combination v(in1) = 1V and v(in2) = 0V is -0.012V. The error threshold for the simulation was set at 0.015%. At the end of the simulation, the outputs obtained were [150.86fV 0.92259V 0.98883V 1V]. The simulation ran for a total of 276 iterations, with random bits being supplied in 28 of those iterations. The weight update pulses were supplied for 500µs in each iteration. Figure 4.15 shows the output waveform from the neural network at the end of the simulation.

Figure 4.15: Neural network output for learning OR-gate function at the end of simulation.

4.6 Power and Timing Estimation

In this section, we give an approximation of the power consumption and timing of the neural network circuit. We mathematically analyze the power consumption and timing for the circuit during training and come up with a generalized formula for estimating these metrics for different neural network designs.

4.6.1 Power

The major contributors to power consumption in the neural network, in both training and normal operation, are the memristor bridge synapses. Since these components are resistive elements, they are likely to consume most of the power. Here, we mathematically calculate the power consumed by a single memristor bridge synapse, based on the memristor model that we have used in our design.

The arrangement of the memristors on the memristor bridge synapse ensures that the total resistance of the bridge synapse remains constant: when a training pulse is applied, the resistances of two of the memristors on the bridge increase and those of the other two decrease. This feature makes it easy to calculate the power consumed during the operation of the circuit. For our simulations, we used a memristor model with Ron = 116Ω, Roff = 16kΩ and Rinit = 8050Ω. We assume that all the memristors of all bridge synapses in the circuit are in the same initial state before commencing training, which implies that the total resistance of each memristor bridge synapse is about 8050Ω at all times. With this, we can calculate the instantaneous power drawn by the memristor bridge synapse during training. To update weights, we supply a positive or negative pulse of 1V magnitude. The total instantaneous power is given by the equations below.

\[ \text{Power}, \ P = \frac{1}{8050}\ \text{W} \qquad (4.1) \]

\[ P = 124.22\ \mu\text{W} \qquad (4.2) \]

Both the instantaneous power and the average power for the memristor bridge synapse during training are the same, since the bridge can be viewed as a DC circuit with constant overall resistance during the weight change process, even though the individual memristors are changing their resistance values. Note that the multiplexer circuit, the inverter and the flip-flop associated with each memristor bridge synapse also consume power during training, but we ignore the power consumed by these elements since it is negligible compared to the power consumed by the bridge synapse. So, the total average power consumed during training can be generalized by the following formula.

\[ \text{Total Average Power}, \ P_{tot} = (P \times \text{number of bridge synapses})\ \text{W} \qquad (4.3) \]

\[ \text{Total Average Power}, \ P_{tot} = (124.22 \times \text{number of bridge synapses})\ \mu\text{W} \qquad (4.4) \]

Equation 4.3 shows the general formula for the total average power of a memristor bridge synapse based artificial neural network, and Equation 4.4 instantiates it for the memristor model used in our simulations. The neural network used to simulate the OR-gate function described in the previous section consisted of a total of 9 memristor bridge synapses, so the total average power consumed by the entire network during one iteration of training is 1.118 mW. The power consumption of the neural network during the evaluation phase of training, and during standalone operation after training, cannot be accurately estimated, since it depends on the input voltages supplied and the outputs of the differential amplifiers at each neuron. However, the worst case instantaneous or average power consumed by the neural network during standalone operation will be the same as the instantaneous power consumed during training.
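Equations 4.3 and 4.4 translate directly into a short calculation. A sketch (the function name is ours; constants are those used above):

```python
R_BRIDGE = 8050.0  # constant total bridge resistance (ohms)
V_PULSE = 1.0      # magnitude of the weight update pulse (V)

def training_power_w(n_bridges):
    """Total average training power per Equations 4.3/4.4, in watts."""
    return (V_PULSE ** 2 / R_BRIDGE) * n_bridges  # 124.22 uW per bridge

print(training_power_w(9) * 1e3)  # ~1.118 mW for the OR-gate network
```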

4.6.2 Timing

The complete time for training a neural network cannot be accurately predicted because of the algorithm used to train the network. However, we can estimate the time required for one iteration of training. Application of the weight update pulse occupies the majority of the time in one training iteration. In our simulations, we supplied training pulses of 500µs duration to update the weights. To evaluate the new output after updating the weights, signals can be supplied in the nanosecond range; compared to the time required for updating weights, the evaluation time is negligible. Another contributing factor during training is the time required to shift random bits into the flip-flops when the weight change directions have to be updated. This time depends on the number of memristor bridge synapses in the network and the total number of individual shift registers in the circuit. The clock period for the flip-flops can also be in the nanosecond range. The following equations summarize the time for training.


\[ \text{Time for one training iteration (with random bit generation)} = \text{weight update time} + \text{shift-in time} \qquad (4.5) \]

\[ \text{Time for one training iteration (without random bit generation)} = \text{weight update time} \qquad (4.6) \]

Equations 4.5 and 4.6 show the total time required for one training iteration with and without the application of new random training bits, respectively. The neural network simulation for the OR-gate function required a total of 276 iterations, with 28 iterations requiring random bit generation. In our simulations, we used a clock of period 2µs to shift the values into the shift register. There were a total of 9 flip-flops in the circuit, all connected to form one shift register, which meant 9 clock cycles were required to shift in all the values. So, the total time for training the circuit in hardware would be:

\[ \text{Total time} = [(500 + 2 \times 9) \times 28 + 500 \times 248]\ \mu\text{s} \qquad (4.7) \]

\[ \text{Total time} = 138.50\ \text{ms} \qquad (4.8) \]

We can see from Equation 4.8 that the actual time required to train this neural network in hardware is less than 0.15s. Even though flip-flops can operate with clock periods much shorter than 2µs, we supplied such long pulses to reduce the run time of the HSPICE simulation by reducing the required resolution. In a real hardware scenario, the shift-in time for the random bits would be far less than what is reported here.
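The same arithmetic generalizes to other designs. A sketch (the function name is ours; the parameter defaults match the OR-gate experiment):

```python
def training_time_us(total_iters, random_iters,
                     pulse_us=500, clk_us=2, n_flip_flops=9):
    """Total hardware training time per Equations 4.5-4.7, in microseconds."""
    shift_us = clk_us * n_flip_flops  # serial shift-in of new random bits
    return (random_iters * (pulse_us + shift_us)
            + (total_iters - random_iters) * pulse_us)

print(training_time_us(276, 28) / 1e3)  # ~138.5 ms, matching Equation 4.8
```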

4.7 Training Performance

We performed five training simulations to learn the OR-gate function using the same neural network to analyze the performance of the training algorithm. One of the training simulations required 276 iterations of weight updates. During training, the network


Figure 4.16: Mean squared error vs iterations for training OR-gate function.

received new random bits for the weight updates in 28 iterations, meaning that the output error increased 28 times during training. The graph in Figure 4.16 shows how the mean squared error of the output falls over the course of training for this simulation; the Y-axis shows the mean squared error and the X-axis the number of iterations. We can see from the graph that initially the error increases and decreases a few times before it begins to fall continuously. After this initial slow descent, the error rises again at around 50 iterations. Following another change in the random inputs, the error decreases steeply as the network finds a suitable set of weight-change directions to match the expected output. The error rises and falls a few more times before it finally drops below the error threshold of 0.015%.
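The error trajectory just described follows directly from the control flow of the Random Weight Change algorithm. Below is a minimal software sketch of that control flow, assuming a generic evaluate() function returning the network's mean squared error; the names, the scalar delta, and the list-of-weights representation are illustrative abstractions of the analog circuit, not a description of the hardware:

```python
import random

def rwc_train(weights, evaluate, delta=0.01, threshold=0.015, max_iters=1000):
    """Minimal Random Weight Change loop. Keep stepping every weight in its
    current random direction while the error falls; whenever the error rises,
    draw fresh random directions (the spikes visible in Figure 4.16)."""
    directions = [random.choice([-1, 1]) for _ in weights]
    prev_error = evaluate(weights)
    for _ in range(max_iters):
        if prev_error < threshold:
            break
        # one weight-update pulse per synapse, in the current direction
        weights = [w + d * delta for w, d in zip(weights, directions)]
        error = evaluate(weights)
        if error > prev_error:
            # error increased: shift in new random bits / directions
            directions = [random.choice([-1, 1]) for _ in weights]
        prev_error = error
    return weights
```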

Table 4.4: Comparison of training performance for multiple HSPICE simulations training the OR-gate function

Simulation No.   Total Weight Updates   Random Updates   Continuous Updates   Training Time on Hardware (ms)
      1                  276                  28                248                      138.50
      2                   56                   2                 54                       28.04
      3                  693                  20                673                      346.86
      4                   97                   6                 91                       48.61
      5                  732                   4                728                      366.07


Table 4.4 summarizes the results of all five training simulations for the OR-gate. The number of iterations required for training cannot be predicted because of the randomness of the algorithm. For the second simulation, training required only 56 iterations, while for the fifth it required 732. Comparing the fourth and fifth simulations, the fourth finished in 97 iterations but had more random weight updates than the fifth. In the fourth simulation, the network found good weight-change directions for its memristor bridge synapses and the error curve had a steeper slope. Figures 4.17-4.20 show the mean squared error plots for simulations 2-5.
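As a quick cross-check (not part of the thesis itself), the hardware-time column of Table 4.4 can be regenerated from the iteration counts using the timing model of Section 4.6.2:

```python
# Cross-check of Table 4.4's last column: each random-update iteration costs
# the 500 us pulse plus 18 us of shift-in (9 flip-flops at a 2 us clock);
# each continuous-update iteration costs the 500 us pulse alone.
rows = [(28, 248), (2, 54), (20, 673), (6, 91), (4, 728)]
for sim, (rand, cont) in enumerate(rows, start=1):
    total_ms = ((500 + 18) * rand + 500 * cont) / 1000
    print(f"Simulation {sim}: {total_ms:.2f} ms")
# -> 138.50, 28.04, 346.86, 48.61, 366.07 ms, matching the table
```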

Figure 4.17: Mean squared error vs iterations for simulation 2.


Figure 4.18: Mean squared error vs iterations for simulation 3.

Figure 4.19: Mean squared error vs iterations for simulation 4.


Figure 4.20: Mean squared error vs iterations for simulation 5.

4.8 Summary

The primary objective of this Chapter was to simulate and illustrate the working of the neural network hardware architecture in SPICE. We began by simulating the individual components of the neural network and followed these simulations with a full training simulation performed entirely in SPICE. We also presented an estimate of the circuit's power consumption and the timing requirements for training. In the next Chapter, we draw conclusions from this work and propose enhancements and extensions to the architecture presented in this thesis.

Chapter 5

Conclusion and Future Work

5.1 Conclusion

The Memristor based artificial neural networks presented in [9] employed the Memristor Bridge Synapse to implement weights and the Random Weight Change algorithm for training. The focus of the work in [9] was only to prove that Memristor Bridge Synapse based neural networks can be used to learn complex functions and may be implemented on a chip with supplementary hardware. The simulations were done in software to illustrate the training of the neural network, but a path to an actual hardware implementation was not provided.

We based our work on the finding in [9] that the Memristor Bridge Synapse is an effective means of implementing weights which, when combined with the RWC algorithm, can yield neural networks capable of learning complex functions on chip without requiring a host computer. Our aim was to develop a complete hardware architecture for implementing Memristor Bridge Synapse based artificial neural networks. First, we presented an efficient way to place the different layers of neurons so that the maximum number of inputs can be supplied to the network with minimal routing. Then we described the primary components of the neural network, such as the Memristor Bridge Synapse and the operational amplifiers, along with the other hardware components necessary to implement the training logic. After describing these building blocks, we showed how the various components can be combined into a bit-slice structure that can be repeated to form layers of neurons.

We developed a prototypical placement and routing tool for the proposed architecture to illustrate how the neural network would appear in layout. The tool also gives an approximate indication of how much area the neural network requires and of how efficiently the architecture utilizes the chip area.

To ascertain that the proposed architecture and all of its components can successfully implement an artificial neural network capable of learning complex functions on chip, we performed various SPICE simulations. We first simulated each of the basic components of the architecture individually. After verifying their functionality, we combined the basic components into circuits performing the different tasks involved in implementing and training the neural network and tested their operation. Finally, we performed a complete neural network training simulation in HSPICE to learn the OR-gate function.

Through our simulations and analysis, we conclude that the hardware architecture presented in this thesis is an effective way to implement artificial neural networks using memristors. As large-scale production of memristors in physical layouts becomes possible, our architecture can be realized directly on chip without requiring any additional circuitry, and it can easily be scaled to several layers of neurons to learn complex functions.

5.2 Future Work

In this section, we present a few ideas that might improve the robustness of the system, its ability to learn functions, and its power consumption.

5.2.1 Implementing Stronger Activation Function

The activation function implemented in the hardware architecture is simply the sum of the individual VA and VB voltage components of each bridge, followed by taking the difference of the two sums. A more complex activation function could be implemented to improve the learning process. Circuits are available for implementing popular activation functions such as the sigmoid, and these circuits could be added to the neuron. The circuit would need to be tested to determine how effective other activation functions are when implemented alongside the Memristor Bridge Synapse and the Random Weight Change algorithm.
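As a behavioral illustration only, the sketch below contrasts the current linear neuron output with a sigmoid applied to the same difference of summed bridge voltages; the function, its gain parameter, and the output normalization are assumptions for illustration, not a circuit-level design:

```python
import math

def neuron_output(va_sum: float, vb_sum: float, gain: float = 1.0,
                  use_sigmoid: bool = True) -> float:
    """Neuron activation on the summed bridge voltages. The current
    architecture outputs the plain difference VA_sum - VB_sum; a sigmoid
    stage would squash that difference into the bounded range (0, 1)."""
    diff = va_sum - vb_sum  # existing summing/differential amplifier behavior
    if not use_sigmoid:
        return diff
    return 1.0 / (1.0 + math.exp(-gain * diff))  # candidate sigmoid stage
```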

5.2.2 Linear Feedback Shift Register for Random Bits

In our architecture, the task of generating random bits for training was assigned solely to the microcontroller, which serially shifts the bits into the shift register before the training pulses can be applied. This process takes several clock cycles, depending on the size of the network. If the shift registers themselves were made to generate random bit values, the process would take far less time: completely new random bits could be generated in a single clock cycle, saving considerable time during training. The layout of the architecture also contains plenty of free space in which to implement the logic for an LFSR.
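For illustration, a maximal-length 9-bit LFSR (matching the nine flip-flops of our OR-gate network) can be modeled in a few lines; the tap choice x^9 + x^5 + 1 is a standard maximal-length polynomial, and this is a behavioral sketch rather than the proposed circuit:

```python
def lfsr_step(state: int, taps: int = 0b100010000, width: int = 9) -> int:
    """One clock of a 9-bit Fibonacci LFSR with feedback polynomial
    x^9 + x^5 + 1 (a standard maximal-length choice). All nine flip-flops
    receive a fresh pseudo-random bit in a single cycle, instead of
    needing nine serial shift-in cycles."""
    feedback = bin(state & taps).count("1") & 1  # parity of the tapped bits
    return ((state << 1) | feedback) & ((1 << width) - 1)

state = 0b101010101  # any nonzero seed works
for _ in range(5):
    state = lfsr_step(state)
    print(format(state, "09b"))
```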

5.2.3 Implementing other Hardware Friendly Algorithms

The same hardware architecture could be used to implement neural networks with training algorithms other than the Random Weight Change algorithm. In [24], Moerland and Fiesler describe a few hardware-friendly algorithms for artificial neural networks.

5.2.4 Bit-slice in Layout

A layout for the Memristor Bridge Synapse bit-slice can be created, and functionality can be incorporated into the placement and routing tool to automatically place and route the bit-slice on the layout by replacing the p-diffusion blocks that represent the neuron blocks.

5.2.5 Testing with more Memristor Models

The hardware architecture was tested with only one memristor model in our experiments. The architecture should be tested with different memristor models, which may allow shorter training pulse applications depending on the memristors' device parameters. Moreover, we tested the network with only an ideal model of the memristor; in reality, the memristor's non-idealities might play a significant part in the efficiency of the neural network implementation. Thorough testing of the neural network can be done once characterized libraries of memristors become available.

5.2.6 Reconfigurable Neural Network

When translated to a layout, our architecture will have a fixed number of inputs, hidden-layer neurons and output-layer neurons. Different functions require different numbers of neurons in each layer for efficient implementation. If logic could be incorporated into the system allowing the user to choose the number of neurons in each layer, it would provide more flexibility and robustness in implementing various types of functions.

Bibliography

[1] Wikipedia, “Memristor — wikipedia, the free encyclopedia,” 2016. [Online; accessed 4-February-2016].

[2] R. Williams, “How we found the missing memristor,” Spectrum, IEEE, vol. 45, pp. 28–35, Dec 2008.

[3] Wikipedia, “Artificial neural network — wikipedia, the free encyclopedia,” 2016. [Online; accessed 5-February-2016].

[4] M. Holler, S. Tam, H. Castro, and R. Benson, “An electrically trainable artificial neural network (etann) with 10240 ’floating gate’ synapses,” in Neural Networks, 1989. IJCNN., International Joint Conference on, pp. 191–196 vol.2, 1989.

[5] M. Milev and M. Hristov, “Analog implementation of ann with inherent quadratic nonlinearity of the synapses,” Neural Networks, IEEE Transactions on, vol. 14, no. 5, pp. 1187–1200, 2003.

[6] J. Liu, M. A. Brooke, and K. Hirotsu, “A cmos feedforward neural-network chip with on-chip parallel learning for oscillation cancellation,” Neural Networks, IEEE Transactions on, vol. 13, no. 5, pp. 1178–1186, 2002.

[7] J. A. Starzyk et al., “Memristor crossbar architecture for synchronous neural networks,” Circuits and Systems I: Regular Papers, IEEE Transactions on, vol. 61, no. 8, pp. 2390–2401, 2014.

[8] M. Soltiz, D. Kudithipudi, C. Merkel, G. S. Rose, and R. E. Pino, “Memristor-based neural logic blocks for nonlinearly separable functions,” Computers, IEEE Transactions on, vol. 62, no. 8, pp. 1597–1606, 2013.

[9] S. Adhikari, H. Kim, R. Budhathoki, C. Yang, and L. Chua, “A circuit-based learning architecture for multilayer neural networks with memristor bridge synapses,” Circuits and Systems I: Regular Papers, IEEE Transactions on, vol. 62, pp. 215–223, Jan 2015.

[10] H. Kim, M. Sah, C. Yang, T. Roska, and L. Chua, “Memristor bridge synapses,” Proceedings of the IEEE, vol. 100, pp. 2061–2070, June 2012.

[11] CMU, “Neural networks for face recognition.” [Online; accessed 18-February-2016].

[12] L. Chua, “Memristor-the missing circuit element,” Circuit Theory, IEEE Transactions on, vol. 18, pp. 507–519, Sep 1971.

[13] D. B. Strukov, G. S. Snider, D. R. Stewart, and R. S. Williams, “The missing memristor found,” Nature, vol. 453, pp. 80–83, May 2008.

[14] K. Hirotsu and M. Brooke, “An analog neural network chip with random weight change learning algorithm,” in Neural Networks, 1993. IJCNN ’93-Nagoya. Proceedings of 1993 International Joint Conference on, vol. 3, pp. 3031–3034 vol.3, Oct 1993.

[15] J. Misra and I. Saha, “Artificial neural networks in hardware: A survey of two decades of progress,” Neurocomputing, vol. 74, no. 1, pp. 239–255, 2010.

[16] M. L. Mumford, D. K. Andes, and L. R. Kern, “The mod 2 neurocomputer system design,” Neural Networks, IEEE Transactions on, vol. 3, no. 3, pp. 423–433, 1992.

[17] I. Bayraktaroğlu, A. S. Oğrenci, G. Dündar, S. Balkır, and E. Alpaydın, “annsys: an analog neural network synthesis system,” Neural Networks, vol. 12, no. 2, pp. 325–338, 1999.


[18] S. P. Adhikari, C. Yang, H. Kim, and L. O. Chua, “Memristor bridge synapse-based neural network and its learning,” IEEE Transactions on Neural Networks and Learning Systems, vol. 23, pp. 1426–1435, Sept 2012.

[19] S. P. Adhikari, H. Kim, R. K. Budhathoki, C. Yang, and J.-M. Kim, “Learning with memristor bridge synapse-based neural networks,” in 2014 14th International Workshop on Cellular Nanoscale Networks and their Applications (CNNA), pp. 1–2, July 2014.

[20] M. P. Sah, C. Yang, H. Kim, T. Roska, and L. Chua, “Memristor bridge circuit for neural synaptic weighting,” in 2012 13th International Workshop on Cellular Nanoscale Networks and their Applications, pp. 1–5, Aug 2012.

[21] “Magic VLSI Layout Tool.” http://opencircuitdesign.com/magic/. Accessed: 04-19- 2016.

[22] Z. Biolek, D. Biolek, and V. Biolkova, “Spice model of memristor with nonlinear dopant drift,” Radioengineering, vol. 18, no. 2, pp. 210–214, 2009.

[23] D. Biolek, M. Di Ventra, and Y. V. Pershin, “Reliable spice simulations of memristors, memcapacitors and meminductors,” arXiv preprint arXiv:1307.2717, 2013.

[24] P. Moerland and E. Fiesler, “Neural network adaptations to hardware implementa- tions,” tech. rep., IDIAP, 1997.
