Memristive Device Based Brain-Inspired Navigation and Localization for Robots

A dissertation submitted to the

Graduate School

of the University of Cincinnati

in partial fulfillment of the

requirements for the degree of

DOCTOR OF PHILOSOPHY

in the Department of Mechanical and Materials Engineering

of the College of Engineering and Applied Sciences

by

Mohammad Sarim

Master of Technology

Aligarh Muslim University, UP, India

December 2012

Committee Chair: Manish Kumar, Ph.D.

Abstract

Biomimetic robots have attracted tremendous research interest for applications ranging from resource hunting in unknown terrains to search and rescue operations in disasters such as earthquakes, fires, or terror attacks. Biological species are known to learn from the environment, collect and process data, and make appropriate decisions rather seamlessly.

Such sophisticated computing capabilities are difficult to achieve in their artificial robotic counterparts, especially in real time and with ultra-low energy consumption. Traditionally, researchers have explored computational methods such as artificial neural networks.

However, a biological neuronal system is inherently complex. Simulating such a large network of neurons on conventional computing platforms is computationally very expensive and consumes tremendous power. Further, large-scale networks suffer from the famous “memory wall” problem.

Emerging synaptic memory devices are ideal for addressing these limitations, since they are capable of simultaneous computation and learning through resistance switching of their oxide layer on application of a voltage signal, thus avoiding the memory wall.

In this work, we have developed a memristive device based learning scheme for imparting deep learning capabilities to robots. These devices are arranged in a crossbar array to develop a neuromorphic computing platform that can provide a highly scalable and energy efficient computing architecture for learning in robots. A device-physics based model of a two-terminal synaptic memory device is used to simulate the learning behavior in the robot. The resistance switching is modeled using the Frenkel-Poole emission in the memory device.

We have demonstrated the validity of this approach by navigating a robot, integrated with just a few devices, through an unknown environment rigged with obstacles. A major advantage of our approach over traditional navigation methods, such as potential field algorithms, is the ability to escape regions of local minima. Further, our learning scheme is highly scalable and can readily be applied to any miniature robot. We demonstrated this by integrating our learning scheme on a commercially available Khepera III robot by K-Team and guiding it through an unknown environment filled with obstacles. The Khepera III is a very capable robot with on-board processing and integrated ultrasonic and infrared proximity sensors. The sensors feed current into the modeled memory devices which, in turn, drive the robot according to the resistance values. These resistance values, or synaptic weights, are then 'learned' through the mechanism of Spike Timing Dependent Plasticity (STDP).

To explore the scalability of this approach, we also developed a robot localization mechanism using a large-scale network of such devices. This approach is motivated by the ‘place cells’ located in the hippocampus of the animal brain that fire when the animal enters a corresponding ‘place field’ in the environment, thus making them responsible for providing a cognitive map of the environment.

As the robot explores the environment, the localization network associates the environment features around it with the location information coming from the place cells. We demonstrated this approach on the Khepera III robot moving in an unknown environment with randomly located distinct landmarks. We also integrated this localization mechanism with the navigational ability developed earlier to complete the robot navigation scheme. The experimental results show that, after learning, the robot is capable of localizing itself with good precision, thus corroborating the validity of this approach and establishing the potential and robustness of memristive device based networks.


Acknowledgments

“In the name of Allah, the Most Beneficent, the Most Merciful”

First and foremost, I would like to express my sincere gratitude to my research advisor and committee chair, Dr. Manish Kumar, for his constant support and motivation throughout the development of this work. This work would not have been possible without the encouragement and guidance of Dr. Kumar, and I am highly grateful for his excellent advice and support. I would also like to thank him for all the financial support that enabled me to carry out this work with cutting-edge research and computational facilities. His constant encouragement towards publications, conference presentations, and talks has added a lot of value to my research understanding during my PhD journey.

Many thanks to my committee members, Dr. Rashmi Jha, and Dr. Ali Minai, for their excellent guidance and inputs towards making this work an outstanding research breakthrough. Their inputs on hardware modeling and simulation aspects, respectively, have greatly added to the value of this work and the publications. I would also like to thank Dr. David Thompson and Dr. Tamara

Lorenz, for their support towards the refinement of this work through their reviews, inputs and suggestions. I am deeply grateful to all my committee members, collectively, for serving on my dissertation committee and reviewing the work in time to make it perfect.

I would also like to thank Dr. Balaji Sharma, my senior at CDS Lab, for assisting me with my robots and getting them ready for experiments. Further thanks to my lab members, Alireza,

Ruoyu, Mohammadreza, Gaurav, Rumit, Aditya, Hans, Matthew, and everyone else at CDS Lab for their support and help during the past few years.

Many thanks to Nicole Jenkins and Lorri Blanton from UC International for shaping me up as a leader through the IPALs program. Cheers to fellow IPALs and my best wishes for this initiative.

To my Mom and Dad, who supported my decision to travel 7,600 miles in pursuit of knowledge: I can never thank them enough for their motivation and constant encouragement while being away from their son. Thanks to my siblings Sana and Yasir for all their love. I would like to especially thank my dearest wife, Sara, for her patience and motivation throughout this PhD journey, which

I could not have imagined without her.

I am grateful to The University of Toledo as well as University of Cincinnati, the College of

Engineering and Applied Sciences, the Department of Mechanical and Materials Engineering, and the Graduate School at UC for providing me an excellent research environment. I also acknowledge the National Science Foundation for funding this research work.

Contents

1 Introduction 1

2 Literature Review 5

2.1 Types of Machine Learning ...... 5

2.1.1 Supervised Learning Techniques ...... 6

2.1.2 Unsupervised Learning Techniques ...... 9

2.2 Learning in Biological Systems ...... 10

2.3 Biological Navigation and Localization ...... 11

2.4 Learning in Robotics ...... 12

2.4.1 Navigation ...... 12

2.4.2 Simultaneous Localization and Mapping (SLAM) ...... 13

2.5 Problem Identification ...... 14

3 Spike Timing Dependent Plasticity 16

4 Device Models 19

4.1 Macro-Model of the Memristor Device ...... 19

4.2 Leaky Integrate and Fire Neuron Model ...... 21

4.3 Device-Physics Derived Model ...... 22

5 Learning Schemes 26

5.1 System Description ...... 26

5.1.1 Sensor Configuration ...... 26

5.1.2 Robot Kinematics ...... 28

5.1.3 Place Cell Configuration ...... 29

5.1.4 Environment Features ...... 30

5.2 Learning Scheme for Navigation ...... 31

5.3 Learning Scheme for Localization ...... 32

5.3.1 Network Design ...... 32

5.3.2 Learning Rule ...... 34

5.3.3 Learning Scheme ...... 35

6 Simulation Results 37

6.1 Simulation Results for Navigation ...... 37

6.1.1 Macro-Model (M1) ...... 37

6.1.2 Device-Physics Derived Model (M2) ...... 38

6.1.3 Comparison with Paths Based on Reinforcement Learning Algorithms . . . . 39

6.2 Device Variability Study ...... 42

6.2.1 Variability in Device Doping Concentration ...... 43

6.2.2 Variability in Update of Resistive States ...... 44

6.2.3 Device Malfunction ...... 44

6.3 Simulation Results for Localization ...... 46

6.3.1 Memristive Device Model ...... 46

6.3.2 Comparison with Computational SLAM ...... 46

7 Experimental Results 60

7.1 Khepera III Robot ...... 60

7.2 Robot Navigation ...... 60

7.2.1 Experimental Setup ...... 60

7.2.2 Results ...... 61

7.3 Localization Results ...... 63

7.3.1 Environment Setup ...... 63

7.3.2 Memristive Device based Model ...... 63

7.3.3 Comparison with Computational SLAM ...... 64

7.4 Discussions ...... 65

8 Discussions 71

8.1 Future Work ...... 72

List of Figures

1.1 A Pascaline ...... 2

1.2 Charles Babbage’s differential engine ...... 3

1.3 IBM’s synaptic computer ...... 4

2.1 An Artificial Neural Network (ANN) ...... 7

2.2 Bayesian network ...... 8

3.1 Synaptic plasticity ...... 17

4.1 Membrane action potential ...... 20

4.2 STDP based learning rule ...... 21

4.3 Device properties ...... 24

4.4 Memristive device based learning rule ...... 25

5.1 Sensor simulation ...... 28

5.2 Robot modeling ...... 30

5.3 Learning scheme for robot navigation ...... 33

5.4 Neuromorphic array for robot localization ...... 33

5.5 Memristive device based learning rule for robot localization ...... 34

5.6 The integrated learning scheme ...... 36

6.1 Robot navigation for model M1 ...... 38

6.2 Robot navigation for model M2 ...... 50

6.3 Robot navigation with Reinforcement learning - global knowledge ...... 51

6.4 Robot navigation with Reinforcement learning - local knowledge ...... 52

6.5 Device performance - uniformly distributed variability in doping concentration . . . 53

6.6 Probability distribution functions for device initial doping concentration ...... 53

6.7 Device performance - normally distributed variability in doping concentration . . . . 53

6.8 Device performance - variability in state update ...... 54

6.9 Device array layout ...... 54

6.10 Robot navigation layouts for shorted device ...... 55

6.11 Performance results for shorted device ...... 55

6.12 Robot navigation - shorted device ...... 56

6.13 Memristive device based robot navigation and localization ...... 56

6.14 Localization errors - device based localization ...... 57

6.15 3D view of localization error - device based localization ...... 57

6.16 Robot localization using computational SLAM ...... 58

6.17 Localization error - computational SLAM ...... 58

6.18 3D view of localization error - computational SLAM ...... 59

6.19 Device-based localization in actual experimental layout ...... 59

7.1 Experimental setup for navigation ...... 61

7.2 Time-lapse of robot navigation - layout 1 ...... 62

7.3 Time-lapse of robot navigation - layout 2 ...... 63

7.4 Robot navigation plots ...... 64

7.5 Environment setup for robot localization ...... 65

7.6 Snapshots of robot localization - run 1 ...... 66

7.7 Snapshots of robot localization - run 2 ...... 67

7.8 Robot localization errors - device-based localization ...... 67

7.9 3D view of localization error - device based localization ...... 68

7.10 Robot localization errors - computational SLAM ...... 68

7.11 3D view of localization error - computational SLAM ...... 69

List of Tables

4.1 F-P equation parameters...... 22

6.1 Path lengths for different environment layouts ...... 42

6.2 Device variability results ...... 45

6.3 Device-based localization - Simulation results ...... 48

6.4 Device-based localization in actual experimental layout - Simulation results . . . . . 49

6.5 Computational SLAM - Simulation results ...... 49

7.1 Device-based localization - Experimental results ...... 69

7.2 Computational SLAM - Experimental results ...... 70

List of Symbols

(x, y, ψ) Robot pose

δn Change in carrier concentration in the memristive device oxide layer

∆t Spike time difference between pre- and post-synaptic spikes

ψ˙ Robot turning rate

W˙ Change in structural parameter

µ Mobility of the dielectric

µi Location of the i-th place cell

νMR Voltage drop across the memristor

ω Pulse width

φ Target heading

φB Depth of trap from the conduction band of HfO2

Σi Covariance matrix for place cell i

τ Time constant of the membrane

θ Sensor orientation

ξ STDP rule

A Area of the memristive device

cij Conductance of the memristive device located at index (i, j) in the array

E Electric field

EL Resting potential of the LIF neuron

I Injected current to the LIF neuron

n0 Concentration of the doping in the memristive device oxide layer

nl Number of landmarks

P^k Place cell activity at location k = [x y]^T

Q Sensor quality

q Electron charge

Rm Membrane resistance for the LIF neuron

s_j^k Weighted sum through the device array when the robot is at location k

v Robot velocity

Vreset Resting potential of the LIF neuron after the spike

Vth Spike threshold for the LIF neuron

x_i^k Inputs to the device array when the robot is at location k

Chapter 1

Introduction

The human brain is a marvelous learning machine. We are exposed to a myriad of sensory data every second and can still process, organize, and make decisions based on the information. Scientists and researchers have always been interested in Artificial Intelligence (AI). The history of computing machines is quite old. In 1642, Blaise Pascal invented the first digital calculating machine, the mechanical calculator known as the 'Pascaline' (Figure 1.1). Three decades later, Leibniz invented the binary numeral system and envisioned a universal calculus of reasoning. During the mid 19th century,

Charles Babbage and Ada Lovelace worked on programmable mechanical calculating machines

(Figure 1.2).

A hundred years later, in 1941, Konrad Zuse built the first working program-controlled computers. By 1950, the field had evolved to the point that Alan Turing proposed the famous Turing

Test as a measure of machine intelligence. In the mid 1980s, neural networks became widely used with the backpropagation algorithm. The 1990s saw major advances in all areas of

AI, with significant demonstrations in machine learning, intelligent tutoring, case-based reasoning, multi-agent planning, etc. In 2005, a project to simulate the brain at the molecular level, known as "Blue Brain", was launched. More recently, we have witnessed self-driving cars, computer programs defeating professionals at their own games, smartphone apps acting as personal assistants, and so on. In

2014, the Nobel Prize in Physiology or Medicine was awarded to Dr. John O'Keefe for the discovery of place cells in the brain.

Looking at all these developments, it is clear that the human curiosity towards understanding and building an all-intelligent robotic machine is very strong. Researchers now have a better understanding of how biological species carry out the task of learning from their environment and make decisions. However, the challenge of actually implementing this on a robot is quite big. This is because biological species have such a huge and complex system of neurons and connections between them that it is not feasible to simulate that capability without running into computational complexities and real-time application inefficiencies. For instance, researchers have estimated that the human brain has anywhere between 0.6 × 10^14 and 2.4 × 10^14 synaptic connections. A hypothetical computer to simulate this in real time would require 12 GW of power, whereas a human brain consumes merely 20 W. Besides, there is the famous "memory wall" problem associated with conventional computers based on the von Neumann architecture. The von Neumann model, described by John von Neumann in 1945 in the First Draft of a Report on the EDVAC [1], describes a design architecture for an electronic digital computer made up of a processing unit containing an arithmetic logic unit and processor registers; a control unit containing an instruction register and program counter; a memory to store both data and instructions; external mass storage; and input and output mechanisms. The memory wall refers to the mismatch between computation speed and memory access speed caused by the limited communication bandwidth between the CPU and memory. Further, conventional Complementary Metal Oxide Semiconductor (CMOS) memory and logic devices are approaching their fundamental scalability limits.

Figure 1.1: A Pascaline, signed by Pascal in 1652.1

These factors have led researchers to start looking into synaptic memory devices.

Due to their resistance switching behavior on application of an appropriate voltage signal, they are capable of simultaneous computation and learning. Such neuromorphic computers can demonstrate

1 © 2005 David Monniaux, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=186079

high computation ability and can even achieve parity with the biological brain, albeit with some shortcomings. Recently, the IBM Blue Gene/Q supercomputer (Figure 1.3) was used to program 2.084 billion neurosynaptic cores containing 53 × 10^10 neurons and 1.37 × 10^14 synapses. However, it ran

1542× slower than real time while consuming 70 mW of power [2].

Figure 1.2: The Science Museum’s Difference Engine No. 2, built from Babbage’s design.2

The objective of this work is to explore the potential of such neuromorphic devices in imparting deep learning capabilities to a robot. The artificial intelligence thus created can be used to solve numerous complicated problems in the field of robotics. Here, we propose a neuromorphic architecture of synaptic memory devices applied on a robot to solve the common problems of Simultaneous Localization and Mapping (SLAM) and robot navigation.

The current state of the art in indoor SLAM uses different sensors for localizing the robot while mapping the environment. Some of the commonly used methods for localization include inertial sensors, flow sensors, vision sensors, laser sensors, infrared proximity sensors, and radio beacons. The mapping can be achieved by vision sensors and laser sensors. However, navigation algorithms based on these sensors are not very efficient. For instance, inertial sensors coupled with odometry suffer from dead-reckoning errors that grow with time. Flow sensors are largely dependent on ambient light and background features and hence do not work well when there are not enough features in the background to be detected. Laser sensors give out over ten thousand data points every second. Processing such a data stream needs a lot of computation and hence gives rise

2 Photo by User:geni, GFDL, https://commons.wikimedia.org/w/index.php?curid=4807331

3 http://www.research.ibm.com/articles/brain-chip.shtml

Figure 1.3: Synapse 16-chip board by IBM.3

to high energy requirements. This again boils down to the memory wall and scalability constraints discussed above for conventional computing platforms. Radio beacons suffer from interference and have errors that accumulate over time. To achieve reasonable accuracy with these sensors, some sensor fusion is needed, which increases the computational cost.

The applications of SLAM are vast, including agriculture, archaeology, autonomous vehicles, biology and conservation, geology, remote sensing, military, robotics, video-gaming, and so on. In this work, we attempted to bridge this technology gap through a non-von Neumann computing architecture that can eliminate these shortcomings. We also developed a localization mechanism for autonomous robots that validates the application of our approach to large scale synaptic networks in a scalable and energy efficient manner. The motivation comes from the discovery of 'place cells' in the hippocampus of the animal brain, which spike at their maximum rate when the animal occupies a particular location in its environment. Thus, place cells provide a cognitive map of the animal's surroundings. We identified the benefits of this inherent mapping system and exploited the resulting information to enable the robot to learn the associations between the surroundings and its current location.

Chapter 2

Literature Review

Learning is an elaborate phenomenon. It comprises acquisition of new descriptive knowledge, development of cognitive skills through practice, organization and representation of new knowledge, and discovery of new theories through experimentation. Researchers have always been interested in imparting learning capabilities to computers. However, solving this problem still remains a challenging task in artificial intelligence.

Learning in artificial systems has been widely studied over the years. The authors in [3] distinguish the major periods of machine learning research, each centered around a different paradigm, as follows:

• neural modeling and decision-theoretic techniques

• symbolic concept-oriented learning

• knowledge-intensive learning systems exploring various learning tasks

2.1 Types of Machine Learning

The concept of machine learning is divided into two main types - supervised learning and unsupervised learning.

2.1.1 Supervised Learning Techniques

In the predictive, or supervised, learning approach, the goal is to learn a mapping between inputs and outputs, given a 'training set' of input-output pairs. Supervised classification is one of the tasks most frequently carried out by intelligent systems. In the following paragraphs, we discuss the most important supervised machine learning techniques.

Logic Based Techniques

In the following, we discuss two types of logic based methods: decision trees and rule-based classifiers.

Murthy [4] has surveyed the work on decision trees and their benefits in the field of machine learning. Essentially, decision trees classify instances by sorting them based on feature values. The most well-known algorithm for building decision trees is C4.5 [5]. A comparison of decision trees and other learning algorithms is given in [6]. In [7], Gehrke proposes Rainforest, a framework for developing fast and scalable algorithms to construct decision trees while taking care of memory requirements. Though decision trees are inherently univariate, using splits based on a single feature at each internal node, [8] uses multivariate trees by constructing new binary features.

Gama [9] has also worked on multivariate decision trees. We have utilized decision trees for flight formation of unmanned aerial vehicles in a dynamic environment [10], [11].

An overview of rule-based methods is given in [12]. Such methods derive the rules directly from the training data using some sort of algorithm. More work on this can be found in [13] and [14].

Perceptron Based Techniques

Another class of supervised learning methods is based on the perceptron. A perceptron can be described as a sum of weighted inputs passed through an adjustable threshold: if the sum is above the threshold, the output is 1; otherwise it is 0. The learning algorithm is run on the training set repeatedly until the prediction mapping is correct on all of the training set. This mapping is then used to label the test set.
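To make this update rule concrete, the following minimal Python sketch implements the perceptron training loop described above; the toy data and learning rate are chosen purely for illustration and are not taken from any of the cited works.

import numpy as np

def train_perceptron(X, y, epochs=100, lr=1.0):
    """Classic perceptron rule: adjust the weights whenever a sample is misclassified."""
    w = np.zeros(X.shape[1])   # weights
    b = 0.0                    # adjustable threshold (bias)
    for _ in range(epochs):
        errors = 0
        for xi, target in zip(X, y):
            out = 1 if (np.dot(w, xi) + b) > 0 else 0   # thresholded weighted sum
            if out != target:                           # update only on mistakes
                w += lr * (target - out) * xi
                b += lr * (target - out)
                errors += 1
        if errors == 0:        # converged: the training set is perfectly classified
            break
    return w, b

# Toy linearly separable example: logical AND
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])
w, b = train_perceptron(X, y)
print(w, b)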

Perceptrons are capable of classifying only linearly separable sets of data. If the instances are not linearly separable, then the perceptron will fail to converge. To mitigate this problem,

Figure 2.1: A feed forward ANN.

Artificial Neural Networks (ANNs) have been created. Zhang [15] has reviewed neural networks for classification. A multilayer neural network consists of an input and an output layer, with hidden layers in between, as shown in Figure 2.1. The network is trained on a paired data set and the weights of the connections are computed to determine the mapping for new data. Determining the size of the hidden layer is a problem, since an underestimate leads to under-approximation while an overestimate leads to excessive computation and hence difficulty in finding the global optimum. Researchers in [16] and [17] have studied this aspect. Once the structure of the network is determined, the weights between the layers need to be trained. The training can be done through different algorithms; the most common method is the backpropagation algorithm. However, this is slow, and there have been efforts to speed up the training process: for example, in [18] the authors estimate the initial weights instead of setting them randomly. Genetic Algorithm

(GA) can be used to train the network [19] and to find the architecture of the neural networks [20].

Based on [21] and [6], it is concluded that neural networks are usually better able to provide incremental learning than decision trees. However, their training time is much longer, and they perform as well as decision trees but not generally better [22].

Statistical Learning Algorithms

Contrary to ANNs, statistical approaches have an explicit underlying probability model. Such a model provides the probability that an instance belongs to a certain class, rather than just a classification.

Under this classification, Bayesian networks and instance-based learning algorithms are considered.

A Bayesian network is a graphical model of probability relationships among a set of variables, as shown in Figure 2.2. The task of learning a Bayesian network comprises learning the DAG (Directed

Acyclic Graph) structure of the network and the determination of the parameters. Once the network structure is fixed, learning the parameters in the Conditional Probability Tables (CPT) is usually done by estimating a locally exponential number of parameters from the data provided [23]. Works in this area have been documented in [24]–[29].

Figure 2.2: An example of a Bayesian network.1

Instance-based learning algorithms wait for the generalization process until classification is performed. Hence, they take less computation time during training phase as compared to decision trees, ANNs, Bayesian networks, etc., but more computation time during the classification process.

The most straightforward algorithm is nearest neighbor algorithm. k-Nearest Neighbor (kNN) is based on the principle that the instances with similar properties tend to exist together.

The major disadvantage of instance-based learning is the high computation time for classification.

1 Charniak, Eugene. "Bayesian networks without tears." AI Magazine 12, no. 4 (1991): 50.

Support Vector Machines

Support Vector Machines (SVMs) work on the idea of maximizing the 'margin' on either side of a hyperplane separating the data groups, thus creating the largest possible distance between the separated classes. Therefore, the model complexity of an SVM is independent of the number of features encountered in the training data. In fact, SVMs are well suited for learning tasks with a large number of features in the data set. Training an SVM can be done quickly through the Sequential Minimal Optimization

(SMO) algorithm [30]. Researchers have also formulated faster versions of SMO [31]. An advantage of SVM training is that it necessarily reaches the global minimum and avoids local minima, unlike many other search algorithms. However, basic SVM methods are binary, and hence the actual problem needs to be reduced to a binary classification problem.

2.1.2 Unsupervised Learning Techniques

In the descriptive, or unsupervised, learning approach, the goal is to find patterns in the given data without any prior knowledge of the kind of patterns to look for. There is no obvious 'training set' here, unlike in the supervised learning techniques. Here, we discuss unsupervised learning in neural networks. In neural computation, there are two major categories of unsupervised learning methods: the first class is motivated by standard statistical methods like principal component analysis

(PCA) and factor analysis (FA). These methods form a continuous transformation of the original input to feature vectors and are especially useful in feature extraction. The second class is more clustering related, i.e., learning vector coding based on competitive learning. Such methods are able to map highly non-linear input data onto low-dimensional neural lattices and are suitable for data clustering and visualization.

Many of the learning rules for PCA are based on the neuron model developed in [32]. In PCA, the purpose is to find a subset of variables that would give as good a representation as possible, for a given set of multivariate measurements, with less redundancy. However, it is not always feasible to solve the PCA eigenvectors by standard numerical methods. To solve this problem, PCA neural network learning rules are devised as discussed in [32]. Another modification called generalized

Hebbian algorithm (GHA) was also presented by Sanger [33]. Other related algorithms have been presented in [34]–[37]. A PCA computation in neural networks using back-propagation algorithm

for learning has been presented in [38]. More recent works on PCA include [39]–[41].
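As a concrete instance of this family of rules, the following minimal Python sketch implements Oja's single-neuron Hebbian PCA rule, whose weight vector converges to the first principal component; the exact formulations in [32] and [33] differ in detail, so this should be read only as an illustrative approximation.

import numpy as np

def oja_first_component(X, lr=0.002, epochs=50):
    """Oja's rule: a Hebbian update with a normalizing decay term."""
    rng = np.random.default_rng(0)
    w = rng.normal(size=X.shape[1])
    w /= np.linalg.norm(w)
    Xc = X - X.mean(axis=0)            # zero-mean the data
    for _ in range(epochs):
        for x in Xc:
            y = w @ x                  # neuron output
            w += lr * y * (x - y * w)  # Hebbian term minus decay keeps ||w|| bounded
    return w / np.linalg.norm(w)

# Example: correlated 2-D data; the result approximates the leading eigenvector
rng = np.random.default_rng(1)
data = rng.normal(size=(500, 2)) @ np.array([[2.0, 0.0], [1.2, 0.5]])
print(oja_first_component(data))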

The FA has been used to find relevant and meaningful factors that explain observed results [42]–

[44]. The Independent Component Analysis (ICA) is a more recent model that maximally reduces the redundancy between the hidden variables. It has been widely studied, and the more prominent works are [45]–[48]. Recent works on FA and ICA can be found in [49]–[52].

Traditional vector coding algorithms include the Linde-Buzo-Gray (LBG) algorithm, which is similar to the k-means clustering algorithm [53], the adaptive resonance theory (ART) network [54], etc. However, the best known neural network vector coding algorithm is the self-organizing map (SOM) introduced by Kohonen [55]. In a SOM, in addition to the traditional clustering approaches, the neurons are arranged in a multi-dimensional lattice, such that each neuron has a set of neighbors.

The goal of SOM is to form a topological mapping from the input to the set of neurons while simultaneously performing vector coding. The mathematical and convergence properties have been studied by several researchers, such as [55]–[58]. More recent works on SOM and their applications to various fields can be found in [59]–[61].

2.2 Learning in Biological Systems

Biological species gather sensory data from the environment through basic sensors, intuitively process that data in real time, and make appropriate decisions. This process has inspired and helped researchers develop various models for understanding and implementing such a capability on robots. Artificial Neural Networks (ANNs) have been evolving towards more realistic functional models of biological systems. Initially, for first generation neurons, a step-function threshold was used [62]. With further advancements, second generation neurons use continuous activation functions as the threshold for computing the output [63]. The third generation neurons utilize spiking models of neurons, thereby improving accuracy as well as computational power

[64]. Different models of spiking neurons include the integrate-and-fire model [65], the resonate-and-

fire model [66], the Hodgkin-Huxley model [67], and the Izhikevich model [68]. Izhikevich showed that his model is computationally effective [69] and fairly simple, yet this is the most widely used model for spiking neurons as evident from [70]–[74]. Further applications of spiking neuron models for navigation in artificial systems include optical flow navigation inspired from honeybees [75]. Low

et al. [76] have developed a camera based navigation approach that allows the robot to converge to a target using its location in the camera frame only. A review of bio-inspired navigation control schemes is given by Trullier et al. [77] and by Franz and Mallot [78]. In [79] and [80], Floreano et al. have applied spiking neural circuits to control navigation in a small robot. They used a spike response model to build a neural network to control the robot. In [81] and [82], the authors have used a distributed adaptive control model based on artificial neural networks. In [83], the authors have developed an obstacle avoidance and navigation control algorithm based on Classical Conditioning [84] by taking into account the Spike Timing Dependent Plasticity (STDP) rule.

2.3 Biological Navigation and Localization

Rodents are the most studied animals in the area of biological navigation and mapping. Early works established that rats can continue to navigate even in the absence of a continuous feature-rich stream of information, i.e., in pitch darkness [85], [86]. In the 70's, O'Keefe et al. identified place cells in the rodent hippocampus that responded to the spatial location of the animal [87], [88].

Later, another set of neurons was identified, called head direction cells, that responded to the animal's orientation [89]. Recently, evidence of grid cells has emerged from [90], [91] - cells that

fire at regular grid-like intervals in the environment.

Head direction cells have the peak firing rate when the rodent’s head is facing a specific direction, thus mapping the rodent orientation. This direction is referred to as the cell’s preferred direction or orientation. As the rodent turns away from the preferred direction, the cell’s firing rate decreases.

Sargolini [92] also found that for a majority of head direction cells, the cell firing rate is also affected by the animal's translational and turning velocity. Mizumori [85] also demonstrated that rats continue to navigate in the dark and that the directional information is retained in the absence of visual cues for short periods of time, and is updated by ideothetic input. Experiments have also shown that allothetic cues influence the firing of head direction cells, and rotating a visual cue in the environment can cause about the same shift in the preferred orientation of the head direction cells

[89], [93].

Place cells are analogous to head direction cells in two dimensions. These cells map the animal’s location within an environment. Their firing rate is maximum when the animal occupies a certain

location within an environment and reduces as the animal moves away from that location. This preferred location for a place cell, where it fires maximally, is known as the place field of that cell. It has been established that the firing of place cells is also affected by a number of other factors, such as visual cues, barriers, and food reward locations [94]–[96]. Save et al. [97] experimented with rats blinded from an early age and found that place cell activity is also present in the absence of visual cues; however, this results in generally lower firing rates. Like head direction cells, place fields also rotate by a corresponding amount when the distal cues are rotated [88].

Most of the rodent hippocampal models consist of head direction and place cells. Such models often employ some form of attractor network [98]–[100]; however, there are other models available in the literature too, such as an associative mapping model for updating head direction based on angular velocity [101], and a circular shift register [102].

The path integration is achieved by injecting activity into the attractor network, thus shifting the peak towards the injected activity. An implementation has been successfully tested by Arleo [103].

Since purely ideothetic updates to head direction are vulnerable to cumulative error, some sort of allothetic correction is needed for robust path integration. Skaggs associated visual cues with head direction cells corresponding to the cue's direction in an egocentric rather than allocentric reference frame [102].

The place cell models form maps by using the distances to visual cues to shape the firing activity of simulated place cells [104]. However, a main limitation of such a model is that it depends on visual cues and cannot perform path integration in the absence of visual input. Researchers have also developed 'view cells', which are essentially directional place cells, to prevent the place cell activity from becoming direction dependent in narrow environments [105].

2.4 Learning in Robotics

2.4.1 Navigation

The navigation problem consists of three parts, viz., localization, obstacle avoidance, and path planning. Localization can be achieved using a Global Positioning System (GPS) outdoors, but for indoor environments or cluttered areas without very reliable GPS coverage, like streets with tall buildings, dense forests, etc., some sort of alternate localization method is needed. Hoy et al. [106] have reviewed a range of techniques for navigation of unmanned vehicles through unknown environments with obstacles. As mentioned in their work, there are two major approaches for optimal path planning, viz., global path planning methods and sensor-based methods. The difference comes from the amount of information available about the environment. As is obvious, global path planning is used when complete knowledge of the obstacles is available, while the sensor-based methods are suitable when only local information about the obstacles is available. Sensor-based methods are more useful since in the majority of real world scenarios full knowledge of the environment is not available. Various sensor-based methods include obstacle avoidance via boundary following, artificial potential field methods, reinforcement learning methods, and so on. The wall following method has been applied to border patrolling and structure inspection [107], [108], autonomous underwater vehicles [109], lane following by autonomous road vehicles [110], and surveying an indoor area [111]. Potential field functions and reinforcement learning methods are used in [112], while [113] employs a reinforcement learning method with Cerebellar Model Articulation Controllers

(CMACs). We have reviewed a variety of path planning algorithms for unmanned vehicles in [114].

2.4.2 Simultaneous Localization and Mapping (SLAM)

An effective way of localization is Simultaneous Localization and Mapping (SLAM). The SLAM problem deals with a robot trying to incrementally build a map of an unknown environment while simultaneously determining its location within that map. During the early days of probabilistic SLAM, researchers were looking to apply estimation-theoretic methods to mapping and localization. However, it was recognized that consistent probabilistic mapping was a fundamental problem in robotics with major conceptual and computational issues.

In late 80’s, Smith and Cheesman [115] and Durrant-Whyte [116] showed that there must be a high degree of correlation between estimates of locations of different landmarks in a map, and that it increases with successive observations. However, these early works did not focus on the conver- gence properties of the map or its steady-state behavior. It was assumed that the estimated map errors would not converge, and hence, researchers decoupled the full filter to a series of landmark to vehicle filters. The structure of the SLAM problem along with the result of map convergence, and the acronym ‘SLAM’ came with the 1995 survey paper in International Symposium on Robotics

Research [117]. Soon after, many researchers started working on SLAM in indoor, outdoor, as well as underwater environments [118]–[121]. Traditionally, researchers have used an Extended Kalman

Filter (EKF) for incrementally estimating the posterior distribution over the robot's pose along with the landmarks' positions [122]. This was extended to environments with a large number of landmarks in [123], [124]. We also developed an indoor localization and navigation algorithm for Unmanned

Aerial Vehicles in [125].

2.5 Problem Identification

Although a lot of work has been done in the area of artificial learning in robotics, the problem of robot localization and navigation in unknown environments still remains challenging. The problem is not trivial due to the lack of ability to learn the really complex structure of the environment.

Biological brains, on the other hand, are able to do this because of their ability to carry out deep learning and store the information in large neuronal networks.

In this work, we have developed a novel learning scheme that is inspired by the network of neurons connected by synapses in the human brain. In traditional neural network schemes, the synaptic dynamics are represented by complex mathematical functions. However, in our work, we present a scheme that can be conveniently implemented on resistive synaptic memory (or memristive) devices. We believe that implementation of such learning schemes on emerging neuromorphic hardware will provide a scalable and energy efficient route for robotic controls, especially on miniaturized robots where high performance learning in an energy efficient fashion is desired [126].

The memristive devices are resistive elements, arranged here in a crossbar array, that change their resistance when a voltage is applied. We have modeled these devices in MATLAB and integrated them with the learning scheme to develop an artificial brain mechanism for robots.

We have demonstrated the validity of our learning scheme by navigating a two wheeled differential drive robot while localizing it in an environment with randomly placed obstacles. The robot is able to associate the information coming from the on-board sensors, such as the direction of the target and the proximity of obstacles, to navigate efficiently while avoiding the obstacles. The landmarks around the robot serve as the 'memory recall' for the current location in the environment.

We also compared our learning scheme to a very well known and efficient path planning technique, viz., the reinforcement learning based approach. We found that for simple layouts of obstacles,

our navigation approach approximates an optimal path even with just local knowledge of the environment. Further, this navigation approach does not suffer from the problem of the robot getting stuck in a region of local minima, unlike, e.g., potential field based algorithms. For localization, we compared our approach with a computational SLAM technique that uses a particle filter for landmark matching and data association. It was found that our approach can provide impressive localization performance, comparable to the highly computational algorithms. Hence, this work establishes the potential of synaptic learning through weight modification for solving numerous problems in robotics, particularly in the area of navigation and localization.

Chapter 3

Spike Timing Dependent Plasticity

Spike Timing Dependent Plasticity (STDP) is a family of learning mechanisms originally postulated in the context of artificial machine learning algorithms (or computational neuroscience), exploiting spike-based computations (as in brains) with great emphasis on the relative timing of spikes [127].

According to STDP, synaptic weights between two neurons are increased if presynaptic spike occurs before the postsynaptic spike (Long Term Potentiation (LTP)), thus establishing a causal correlation between spiking times of the two neurons. On the other hand, if the postsynaptic spike occurs before the presynaptic spike, the synaptic weight between those neurons is decreased (Long

Term Depression (LTD)). Recently, STDP has been introduced to explain biological plasticity

[128] and [129]. Synaptic weight modification using STDP rule has been demonstrated in temporal pattern recognition [130], temporal sequence learning [131], [132], [133], and navigation [134], [135],

[136]. Timothée et al. have done extensive work on STDP towards recognition of repeating patterns

[137] and multiple repeating patterns with multiple STDP neurons [73] in continuous spike trains.

They demonstrated that a single leaky integrate-and-fire (LIF) neuron equipped with STDP is able to detect a repeating pattern in a continuous spike train. In other words, the neuron becomes selective to successive coincidences of the pattern. In [73], it was observed that the neurons tend to compete for the patterns and try to cover the different patterns or part of pattern. These results illustrate how STDP can detect repeating patterns and generate selective responses to them.

Some researchers have looked into the aspect of reward modulated STDP for a simple differential drive robot [138], where they applied a functional approach to model motor-sensor interactions and focused on the learning in the neural controller of the robot. They have taught the robot different behaviors like obstacle avoidance, foraging, or a combination of the two. Other approaches are to use genetic algorithms [139], [140], or different STDP models [141] to learn synaptic connections to control motor movements. Florian [142] has implemented reinforcement learning through modulation of STDP. In [143], Evans also worked on a reinforcement learning task of foraging and poison avoidance using a robot controlled with a spiking neural network.

Figure 3.1: A visualization of the LTP and LTD processes.1

The neurons interchange information through their inter-connections, known as synapses. The neuron spike is the membrane voltage difference between the inside and outside of the cellular membrane. The memristor voltage is then the difference between the presynaptic and postsynaptic neuron spikes. The large membrane voltages during a spike (with magnitude around 70 mV) cause molecular membrane channels to open and close, allowing ionic and molecular substances to flow through or restricting them, thus creating a difference in their concentrations. At the same time, neurotransmitters from the presynaptic cell fuse into the membrane and are released into the synaptic cleft.

1 Boundless. "Synaptic Plasticity". Boundless Biology. Boundless, 08 Aug. 2016. Retrieved 27 Oct. 2016 from https://www.boundless.com/biology/textbooks/boundless-biology-textbook/the-nervous-system-35/how-neurons-communicate-200/synaptic-plasticity-765-11998/

Some of these neurotransmitters are collected by the postsynaptic neuron, thereby changing its membrane conductivity. This process is visualized in Figure 3.1. The cumulative effect of presynaptic spikes will eventually trigger the generation of a postsynaptic spike. These spikes, when falling within a time interval of each other, modify the synaptic weight between those two neurons.

The synaptic weight could be considered as a measure of number/size of neurotransmitter packets released during a spike. This phenomenon was reported in 1949 by Hebb [144] as follows:

“When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that

A’s efficiency, as one of the cells firing B, is increased.”

STDP is a refinement of the above postulate by taking into account the precise timing of pre- and postsynaptic spikes. The change in synaptic weight is expressed as a function of the time difference, ∆T , between pre- and postsynaptic spikes.

Mathematically, this STDP learning function ξ(∆T) is described as [127]:

ξ(∆T) = a+ e^(−∆T/τ+)   if ∆T > 0
ξ(∆T) = −a− e^(∆T/τ−)   if ∆T < 0     (3.1)

where a+ and τ+ are respectively the LTP learning rate and time constant, and a− and τ− are the same parameters for LTD.
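A direct transcription of Eqn. (3.1) into a small Python function is shown below; the learning rates and time constants are placeholder values for illustration, not parameters taken from [127].

import numpy as np

def stdp(delta_t, a_plus=0.1, a_minus=0.12, tau_plus=20e-3, tau_minus=20e-3):
    """STDP weight change xi(dT) from Eqn. (3.1).
    delta_t = t_post - t_pre in seconds; positive -> LTP, negative -> LTD."""
    delta_t = np.asarray(delta_t, dtype=float)
    ltp = a_plus * np.exp(-delta_t / tau_plus)     # causal pairings potentiate
    ltd = -a_minus * np.exp(delta_t / tau_minus)   # anti-causal pairings depress
    return np.where(delta_t > 0, ltp, np.where(delta_t < 0, ltd, 0.0))

print(stdp([10e-3, -10e-3, 40e-3, -40e-3]))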

Chapter 4

Device Models

In this chapter, we discuss the model of the Mn:HfO2 based memristive device derived from the principles of device physics [126]. Such devices, as discussed, are capable of changing their structural parameter, usually the resistance, upon application of a voltage signal. The spikes coming from the neurons are modeled mathematically, and the membrane action potential can be computed from the superposition of those spikes with the time difference. When the membrane voltage becomes higher than the threshold potential, a spike is generated and the membrane potential falls to a reset potential. We have modeled the spikes and membrane potentials for different cases of time difference, and the change in the structural parameter using STDP is also modeled.

Here, we present two models used in this study for the device: (1) Macro-model of the device, which is a behavioral model made of circuit elements, based on the macro-model developed in

[127], and (2) Device-physics derived model, which captures the low-level features of the memristive devices, viz. the conduction mechanism in the device. This is based on the device model developed in [126].

4.1 Macro-Model of the Memristor Device

For memristor devices, the change in structural parameter is driven by voltage drop across it, νMR, as described by Snider [145], [146]:

W˙ = f(νMR) (4.1)

Figure 4.1: Membrane voltage action potential for positive ∆T

The parameters I0 and v0 are device parameters that may or may not depend on W. The function f(·) can be modeled to grow exponentially and/or include a threshold barrier vth [147] as:

For |νMR| > vth:

f(νMR) = I0 sign(νMR) [e^(|νMR|/v0) − e^(vth/v0)]     (4.2)

and f(νMR) = 0 otherwise.

This memristor voltage νMR is plotted in Figure 4.1 for ∆t = 0.75ms. The values used are:

I0 = 10 µA and v0 = 1/7 V. As can be observed, the value of νMR is more than the threshold voltage vth, which is unity. Hence, we can find the change in structural parameter W from Eqn. (4.1) as:

∆W(∆T) = ∫ f(νMR(t, ∆t)) dt = ξ(∆T)     (4.3)

The function ∆W (∆t) imitates the behavior of the STDP rule ξ as described by Bi and Poo from physiological experiments [148]. This is plotted in Figure 4.2.
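The following minimal Python sketch illustrates Eqns. (4.1)-(4.3); the memristor voltage trace used here is an assumed decaying pulse rather than the exact spike waveforms of this work, so the numbers are purely illustrative.

import numpy as np

I0, V0, VTH = 10e-6, 1.0 / 7.0, 1.0   # device parameters from the text

def f(v_mr):
    """Structural-parameter drift rate dW/dt = f(v_MR), Eqn. (4.2)."""
    if abs(v_mr) <= VTH:
        return 0.0
    return I0 * np.sign(v_mr) * (np.exp(abs(v_mr) / V0) - np.exp(VTH / V0))

def delta_w(v_trace, dt):
    """Eqn. (4.3): integrate f over the memristor voltage trace v_MR(t)."""
    return sum(f(v) * dt for v in v_trace)

# Illustration: a 1 ms window in which v_MR briefly exceeds the threshold
dt = 1e-5
t = np.arange(0.0, 1e-3, dt)
v_trace = 1.2 * np.exp(-t / 2e-4)      # assumed decaying pulse, peak 1.2 V
print(delta_w(v_trace, dt))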

Figure 4.2: STDP update function

4.2 Leaky Integrate and Fire Neuron Model

A Leaky Integrate and Fire (LIF) neuron model sums the synaptic currents arriving at it. The rise in potential is given as:

V = (EL + I Rm) + (V − (EL + I Rm)) e^(−1/τ)     (4.4)

As soon as V > Vth, V changes to Vreset and a spike is generated. The parameters described above are:

EL : Resting potential

Vth : Spike threshold

Rm : Membrane resistance

τ : Time constant of the membrane

I : Injected current

Vreset : Reset voltage after spike
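A minimal discrete-time Python sketch of this LIF update is given below; the membrane parameters are placeholder values, and the time step is made explicit, whereas Eqn. (4.4) corresponds to a unit step.

import math

def lif_step(V, I, dt=1e-3, EL=-65e-3, Vth=-50e-3, Vreset=-70e-3, Rm=10e6, tau=20e-3):
    """One update of Eqn. (4.4): V relaxes toward EL + I*Rm; crossing Vth emits a spike."""
    V_inf = EL + I * Rm                              # steady-state potential for this current
    V = V_inf + (V - V_inf) * math.exp(-dt / tau)    # exponential relaxation
    spiked = V > Vth
    if spiked:
        V = Vreset                                   # reset after the spike
    return V, spiked

# Drive the neuron with a constant 2 nA input and count spikes over 100 steps
V, n_spikes = -65e-3, 0
for _ in range(100):
    V, s = lif_step(V, I=2e-9)
    n_spikes += int(s)
print(n_spikes)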

Table 4.1: F-P equation parameters.

Parameter   Value                 Units
q           1.609 × 10^−19        C
µ           0.15                  cm²/V·s
E           1 × 10^6              V/cm
A           4 × 10^−12            cm²
n0          9 × 10^20             cm^−3
φB          0.207                 eV
ε           25 × 8.85 × 10^−12    F/m
k           8.617 × 10^−5         eV/K
T           300                   K

4.3 Device-Physics Derived Model

In this section, we present the device-physics derived model of a two-terminal resistive memory device as presented in [126]. This model focuses on the conduction mechanism in two-terminal memristive devices, which the authors found to be based on Frenkel-Poole (F-P) emission [149], given the excellent R² values obtained for the F-P fitting. This is a closer approximation and can more precisely predict the behavior of a memristive device changing its resistance on application of a voltage signal. A two-terminal memory device is considered for simulating the synaptic array junctions. These devices are made up of a TiN top electrode (TE) and a Ru bottom electrode (BE) with Mn-doped HfO2 as the switching layer. The change in carrier concentration according to the input current can be predicted and hence can be used to design a device with the required specifications. The equation for F-P emission can be given as:

I = q µ E A n0 exp(−(φB − √(qE/(πε))) / (kT))     (4.5)

Here, µ is the mobility of the dielectric, E is the electric field, A is the area of the device, n0 is the defect concentration, and φB is the depth of the trap from the conduction band of HfO2, which is corrected for the electric field in the exponential. The F-P equation parameters are tabulated in

Table 4.1.
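As a consistency check, the following Python sketch evaluates the F-P current of Eqn. (4.5) using the parameters of Table 4.1; the unit handling mirrors the table, and this is an illustrative calculation rather than the simulation code used in this work.

import math

# Table 4.1 parameters (units as listed there)
q    = 1.609e-19        # C
mu   = 0.15             # cm^2 / (V s)
E    = 1e6              # V / cm
A    = 4e-12            # cm^2
n0   = 9e20             # cm^-3
phiB = 0.207            # eV
eps  = 25 * 8.85e-12    # F / m
k    = 8.617e-5         # eV / K
T    = 300.0            # K

def fp_current(n):
    """Frenkel-Poole current, Eqn. (4.5), for a carrier concentration n (cm^-3)."""
    E_si = E * 100.0                                           # V/cm -> V/m for the barrier lowering
    barrier_lowering = math.sqrt(q * E_si / (math.pi * eps))   # field-induced lowering, in eV per electron
    exponent = -(phiB - barrier_lowering) / (k * T)
    return q * mu * E * A * n * math.exp(exponent)             # prefactor in cm-based units gives amperes

print(fp_current(n0))   # on the order of 1e-5 A with the tabulated values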

To demonstrate the possibility of implementing STDP using the proposed device model, a 1 V pulse is applied while the pulse width (ω) is modulated based on the time difference, ∆t, between pre-synaptic and post-synaptic firing spikes. The ∆t is mapped to the pulse width as follows. Since the highest change in conductance occurs when the time difference is small, ∆t = ±10 ms corresponds to ω = 200 ms, ∆t = ±20 ms corresponds to ω = 100 ms, ∆t = ±30 ms corresponds to ω = 50 ms, and ∆t = ±40 ms corresponds to ω = 20 ms. This relation is represented by the following equation:

ω = L1 exp(K1|∆t|) (4.6)

where L1 and K1 are fitting parameters, computed to be L1 = 368.3 and K1 = −0.05199, respectively. This relation is also plotted in Figure 4.3a. Upon the application of a potentiating voltage, the concentration of defects in the doped oxide increases due to constant voltage stress and stress-induced leakage current. The change in carrier concentration, ∆n, on application of a voltage pulse can be modeled as follows:

∆n = L2 exp(K2ω) (4.7)

where L2 and K2 are fitting parameters, computed to be L2 = 8.111 × 10^19 and K2 = 0.006501, respectively. This relation is based on the one found in [126]. The variation in carrier concentration with pulse width is shown in Figure 4.3b. This variation in n0 can be included in the original F-P equation as follows:

I = q µ E A [n0 ± ∆n] exp(−(φB − √(qE/(πε))) / (kT))     (4.8)

where +∆n refers to an increase in carrier concentration, i.e., a potentiating pulse, and −∆n refers to a decrease in carrier concentration, i.e., a depressing pulse. Interestingly, defects in the doped oxide can be annihilated by reversing the polarity of the applied bias (i.e., by applying a depressing pulse), which provides a decrease in ∆n.

Finally, the variation of current with the increase in carrier concentration can be seen in Figure 4.3c. The variation in device resistance as the carrier concentration changes can be observed from Figure 4.3d.
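Putting Eqns. (4.6)-(4.8) together, the following Python sketch computes the percentage change in conductance produced by a single pre/post spike pairing. Since the F-P current of Eqn. (4.8) is proportional to the carrier concentration, the relative conductance change equals the relative change in n0; the script is illustrative only.

import math

L1, K1 = 368.3, -0.05199      # Eqn. (4.6) fitting parameters (dt and w in ms)
L2, K2 = 8.111e19, 0.006501   # Eqn. (4.7) fitting parameters
n0 = 9e20                     # initial doping concentration, cm^-3 (Table 4.1)

def conductance_change_percent(delta_t_ms):
    """Percentage change in device conductance for one pre/post spike pairing."""
    w = L1 * math.exp(K1 * abs(delta_t_ms))    # Eqn. (4.6): spike-time difference -> pulse width
    dn = L2 * math.exp(K2 * w)                 # Eqn. (4.7): pulse width -> concentration change
    sign = 1.0 if delta_t_ms > 0 else -1.0     # potentiating vs. depressing pulse, Eqn. (4.8)
    return sign * 100.0 * dn / n0

for dt in (10, 20, 30, 40, -10, -20, -30, -40):
    print(dt, round(conductance_change_percent(dt), 1))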

The resistance of the device can be calculated by applying a 1 V pulse to read the current through the device. Hence, the new value of device resistance is given by dividing the voltage pulse by the current value. Figure 4.4 shows the variation in device conductance with the spike time difference. We can observe that this resembles the STDP update function from the macro-model (Figure 4.2) and also the experimental results from Bi and Poo [148]. It should be noted that the depressing pulse, i.e., (n0 − ∆n), is represented here as a negative spike time difference and that the percentage change in device conductance is with respect to n0.

Figure 4.3: Plots showing the properties of the device-physics derived model of the memory device. (a) Mapping pulse width to spike time difference. (b) Variation in carrier concentration with applied pulse width. (c) Variation in device current with the increase in carrier concentration. (d) Variation in device resistance with carrier concentration.

Figure 4.4: Change in device conductance with spike time difference

Chapter 5

Learning Schemes

5.1 System Description

In this section, we discuss the learning scheme developed to complement the memory device model that imparts the artificial intelligence capability to a robot.

5.1.1 Sensor Configuration

A two wheeled differential drive robot is navigated towards a known target location while avoiding obstacles. To achieve this in simulation, two types of sensors are modeled: target sensors and obstacle sensors. These are represented as 1 through 5 in Figure 5.2. The target sensors return the direction of the target in the form of a sensor reading and a sensor quality. The sensor quality, Q, is defined as the ability of the sensor to provide correct information; in other words, it is quantitatively the sensor reliability. Hence, from Figure 5.2, if φ is the angular distance of the target (shown by the red star) from the sensors, and θi (i = 1, 2, ..., 5) are the locations of sensors 1-5, where

θ = [π, 3π/4, π/2, π/4, 0]     (5.1)

then the sensor quality can be written as

Qi = 1 − δi/max(δ) (5.2)

where δi = |φ − θi|. The above parameter, Q, when multiplied by the sensor reading, which is the distance of the target from the robot, gives an estimate of the target direction. As an example, consider the position of the target, shown by the red star, in Figure 5.2. Since the target is (radially) closest to sensor 4, then sensor 5, and so on, Q4 > Q5 > Q3 > Q2 > Q1. The sensor current that goes into the synaptic array is the product of the sensor reading, as defined earlier, and Q. These currents are added in the motor neuron circuit, which generates a spike as described by the LIF model earlier.

These spikes then actuate the motors, hence moving the robot. If the resulting motion moves the robot heading towards the target, this action is associated with the active sensor by strengthening the corresponding connection. If the robot heading moved away from the target, the connections are weakened.
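A small Python sketch of the target-sensor quality computation in Eqns. (5.1)-(5.2) is shown below; the mapping from readings to injected currents is schematic, and the example bearing and distance are arbitrary.

import numpy as np

theta = np.array([np.pi, 3 * np.pi / 4, np.pi / 2, np.pi / 4, 0.0])   # sensor orientations, Eqn. (5.1)

def sensor_quality(phi):
    """Eqn. (5.2): quality of each target sensor for a target at bearing phi."""
    delta = np.abs(phi - theta)          # angular distance of the target from each sensor
    return 1.0 - delta / delta.max()     # 1 for the best-aligned sensor, 0 for the worst

def sensor_currents(phi, target_distance):
    """Current injected into the synaptic array: sensor reading scaled by its quality."""
    return sensor_quality(phi) * target_distance

# Target slightly to the right of straight ahead (between sensors 4 and 5)
print(sensor_currents(phi=np.pi / 8, target_distance=2.0))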

The obstacle information comes from the front three obstacle sensors (2, 3, and 4). For simulation purposes, the area in front of the robot is divided into three grids and these sensors check whether each grid is occupied. If a grid is occupied, the corresponding sensor starts sending electric current into the synaptic array, which changes the motor speeds based on the synaptic weight associated with that particular sensor. Once the robot moves, these grids are examined for obstacles again. If the robot moved away from the obstacle, this motion is associated with the obstacle sensors by strengthening the connection between the sensor and the motor neurons. If the grids are still occupied, this could be an unfavorable motion and is penalized by weakening the synaptic connection.

For experimental work, the sensors give out a current inversely proportional to the distance from the obstacle in range. If the robot is too close to an obstacle, the current value is very high as compared to when the robot is far away from the obstacle.

The sensor currents are related to the wheel linear velocities as follows:

vL = (TL − OR)vt + vc (5.3)

vR = (TR − OL)vt + vc (5.4)

Let i = L, R represent left (L) and right (R) wheels respectively, then vi are wheel linear velocities, Ti are spikes generated from the LIF neuron by the target sensor currents, and Oi are spikes generated from the LIF neuron by the obstacle sensor currents. Further, vt and vc are velocity components that affect the turning rate and constant linear velocity of the robot, respectively. The

Khepera III robot that we used for the experimental results (Section 7.1) takes these v_L and v_R as inputs.
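The sketch below maps the LIF spike outputs to the wheel velocities of Eqs. (5.3)-(5.4); the spike values in the example and the velocity components v_t and v_c are placeholder assumptions, not the constants used on the actual robot.

```python
def wheel_velocities(T_L, T_R, O_L, O_R, v_t=0.05, v_c=0.02):
    """Map target (T) and obstacle (O) spike outputs to wheel linear velocities.

    Implements Eqs. (5.3)-(5.4); v_t and v_c are assumed turning and cruise
    velocity components (m/s), not values taken from this work.
    """
    v_L = (T_L - O_R) * v_t + v_c
    v_R = (T_R - O_L) * v_t + v_c
    return v_L, v_R

# Example: only the right-wheel target neuron spiked and no obstacle currents arrived
print(wheel_velocities(T_L=0, T_R=1, O_L=0, O_R=0))
```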

An omnidirectional LASER rangefinder is also modeled to see the landmarks in the range of the robot. The LASER rangefinder gives the linear distance to the closest point of reflection of the LASER with corresponding bearing value, thus identifying any obstacles in the range. To complement the LASER rangefinder, we also model a feature sensor that is essentially a vision sensor, like camera. However, to maintain simplicity, we assume that the feature sensor is able to read the landmark features directly as discussed in Section 5.1.4. Figure 5.1 shows the simulated sensor data for the omnidirectional LASER sensor as the robot navigates in the environment. The

‘×’ represents the heading of the robot. As can be observed, the LASER sensor essentially maps the landmarks present in the range of the sensor around the robot. These distances to the landmarks at the respective bearing angle are mapped to proportional current values and sent to the feature sensor as explained in more detail in Section 5.3.1.

Figure 5.1: LASER sensor simulation as the robot moves

5.1.2 Robot Kinematics

Consider a differential drive model of a robot as shown in Figure 5.2. The velocities of the left and right wheels are characterized as scalar angular velocities ωL and ωR, respectively. We follow the

formulation as derived in [150]. Hence, the body-fixed components v and ω_K of the robot velocity are related to the wheel velocities as:

\begin{bmatrix} v \\ \omega_K \end{bmatrix} = C \begin{bmatrix} \omega_R \\ \omega_L \end{bmatrix}    (5.5)

where, v is the linear velocity, ωK is turning rate, and C is defined as

C = \begin{bmatrix} \frac{r_R}{2} & \frac{r_L}{2} \\ \frac{r_R}{b} & -\frac{r_L}{b} \end{bmatrix}    (5.6)

in which r_L and r_R are the radii of the left and right wheels respectively, and b is the axle distance between the wheels. The factory parameters of Khepera III are r_L = r_R = r = 0.0205 m and b = 0.08841 m. Substituting Eqn. (5.6) into (5.5) yields:

v = \frac{r}{2}(\omega_R + \omega_L)    (5.7)

\omega_K = \frac{r}{b}(\omega_R - \omega_L)    (5.8)

The robot kinematic equations can then be written as:

\dot{x}_K = v \cos\psi
\dot{y}_K = v \sin\psi    (5.9)
\dot{\psi} = \omega_K

where \dot{x}_K and \dot{y}_K are the components of the robot's velocity in the x and y directions respectively, and ψ is the robot heading measured from the x-axis.
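A minimal sketch of this kinematic model is shown below using the Khepera III factory parameters quoted above; the wheel speeds and the integration time step in the example are arbitrary.

```python
import numpy as np

r = 0.0205    # wheel radius (m), Khepera III factory parameter
b = 0.08841   # axle distance between the wheels (m)

def body_velocities(omega_R, omega_L):
    """Eqs. (5.7)-(5.8): wheel angular velocities -> linear speed and turning rate."""
    v = 0.5 * r * (omega_R + omega_L)
    omega_K = (r / b) * (omega_R - omega_L)
    return v, omega_K

def step(x, y, psi, omega_R, omega_L, dt=0.01):
    """One Euler integration step of the kinematic model of Eq. (5.9)."""
    v, omega_K = body_velocities(omega_R, omega_L)
    return x + v * np.cos(psi) * dt, y + v * np.sin(psi) * dt, psi + omega_K * dt

# Example: right wheel slightly faster than the left wheel
print(step(0.0, 0.0, 0.0, omega_R=10.0, omega_L=9.0))
```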

5.1.3 Place Cell Configuration

Figure 5.2: Modeling of a two-wheeled differential drive robot

The robot is modeled to contain a number of place cells that spike as it moves around the environment. The place fields are modeled as 2D Gaussian distributions with mean µ_i and covariance Σ_i, where µ_i = [x_i y_i]^T is the location of the i-th place cell. Also,

\Sigma_i = \begin{bmatrix} \sigma_{x_i}^2 & \rho\,\sigma_{x_i}\sigma_{y_i} \\ \rho\,\sigma_{y_i}\sigma_{x_i} & \sigma_{y_i}^2 \end{bmatrix}

where Σ_i is the covariance matrix for place cell i. We assume that ∀i, σ_{x_i} = σ_{y_i} = σ and the correlation ρ = 0, so that

\Sigma = \begin{bmatrix} \sigma^2 & 0 \\ 0 & \sigma^2 \end{bmatrix}

Hence, the place cell activity P^k = [P_i^k] at a location k = [x y]^T can be written as follows:

P_i^k = \frac{1}{2\pi\sqrt{|\Sigma|}} \exp\left(-\frac{1}{2}(k - \mu_i)\,\Sigma^{-1}(k - \mu_i)^T\right)    (5.10)
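The place-cell activity of Eq. (5.10) can be sketched as follows; the place-cell center, field width σ, and query location in the example are arbitrary.

```python
import numpy as np

def place_cell_activity(k, mu, sigma):
    """Eq. (5.10): isotropic 2D Gaussian place-field activity at location k = [x, y]."""
    Sigma = sigma**2 * np.eye(2)          # covariance with rho = 0 and sigma_x = sigma_y
    d = np.asarray(k, float) - np.asarray(mu, float)
    norm = 1.0 / (2.0 * np.pi * np.sqrt(np.linalg.det(Sigma)))
    return norm * np.exp(-0.5 * d @ np.linalg.inv(Sigma) @ d)

# Example: place cell centred at (50, 50) cm with a 10 cm field width
print(place_cell_activity([55.0, 48.0], mu=[50.0, 50.0], sigma=10.0))
```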

5.1.4 Environment Features

The environment is modeled as a closed rectangular area with a number of landmarks randomly placed across the area. The landmarks are modeled as distinct polygons as follows:

We first find the centers of the landmarks, ci = [cxi cyi ], for i = 1, 2, ..., nl, nl being number of landmarks. These are scattered evenly in every 0.25[XY ] section of the environment, where

X and Y are the dimensions of the environment. The landmark vertices, li = [lxi lyi ], are then generated as follows:

For θ = [0 : 2π/V : 2π], V being the number of vertices desired:

lxi = cxi + (B + R) cos θ (5.11)

lyi = cyi + (B + R) sin θ (5.12) where, B = αY , and R = B.rand(). These vertices are then joined with straight lines to get a polygon with V vertices. The value of α may be varied to get desired size of the landmarks. The rand() function gives out uniformly distributed random numbers in the range [0, 1].
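A small sketch of this landmark-generation procedure is given below; the value of α and the random-number handling are illustrative choices, since the text only states that α may be varied to obtain the desired landmark size.

```python
import numpy as np

def make_landmark(cx, cy, V, Y, alpha=0.05, rng=np.random.default_rng()):
    """Eqs. (5.11)-(5.12): generate V polygon vertices around the centre (cx, cy)."""
    theta = np.arange(V) * 2.0 * np.pi / V      # one vertex per angular step of 2*pi/V
    B = alpha * Y                               # base radius, proportional to the field size
    R = B * rng.random(theta.shape)             # uniform random radial jitter in [0, B]
    lx = cx + (B + R) * np.cos(theta)
    ly = cy + (B + R) * np.sin(theta)
    return np.column_stack([lx, ly])            # vertices, joined by straight lines

# Example: a 6-vertex landmark in a 20 m x 20 m environment
print(make_landmark(5.0, 12.0, V=6, Y=20.0))
```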

It is assumed that each landmark has a unique feature that will be picked up by the robot.

These features may be physical features of the landmarks, such as, size, color, etc. We give distinct features to the landmarks. The feature sensor described earlier can read the vector of these features when the landmark is in range. This also constitutes the input to the localization neural network as discussed in Section 5.3.

5.2 Learning Scheme for Navigation

The learning scheme that we developed can be visualized from the schematic as shown in Figure 5.3.

To understand, consider the input current coming from the sensors. It may be noted that there are two types of sensors: i) target sensor; and ii) obstacle sensor. Both of these sensors output current based on distances from target and obstacles respectively. These currents split into the left and right wheel neuron model according to the synaptic resistances. They are then added in the motor neuron circuits based on an LIF neuron model. This neuron model then generates a spike when the membrane voltage crosses the threshold. As a result, the robot moves to a new position and/or orientation. The new sensor inputs are then used to compare the change in orientation. If this change is favorable (robot heading moved towards the target and/or away from the obstacle), the synaptic resistance decreases for that pair of sensory-motor neuron. On the other hand, an unfavorable movement (robot heading moved away from the target and/or towards the obstacle) leads to an increase in the synaptic resistance for that sensory-motor neuron pair. This learning scheme is also given in Algorithm 1.

while not reached target do
    Find the target and obstacle information from the sensors.
    Send out currents based on the perceived signal.
    Add the currents from all sensors in the motor neuron circuit to generate a spike, thus moving the robot.
    Find the new position and orientation of the robot.
    Find the target and obstacle information from the sensors again.
    if robot heading changed towards the target then
        Strengthen the connection between the sensor and the spiked motor neuron.
    else
        Weaken the connection between the sensor and the spiked neuron.
    end
    if the robot moved away from the obstacle then
        Strengthen the connection between the obstacle sensor and the spiked motor neuron.
    else
        Weaken the connection between the obstacle sensor and the spiked neuron.
    end
end
Algorithm 1: Learning scheme for active learning using STDP with a two-terminal memory device model
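The sketch below condenses the strengthen/weaken step of Algorithm 1 into a few lines; the dictionary representation of the synaptic weights and the relative learning rate eta are simplifying assumptions and do not correspond to the actual device update, which follows the STDP mechanism described in Chapter 4.

```python
def update_synapses(weights, active_pairs, favorable, eta=0.05):
    """Simplified sketch of the reward-modulated update in Algorithm 1.

    weights[(s, m)] is the synaptic conductance between sensor s and motor
    neuron m. A favorable move (heading turned towards the target or away
    from the obstacle) strengthens the active connections, i.e. conductance
    rises and the device resistance falls; an unfavorable move weakens them.
    eta is an assumed relative learning rate, not a value from this work.
    """
    for pair in active_pairs:                    # sensor-motor pairs that spiked together
        weights[pair] *= (1 + eta) if favorable else (1 - eta)
    return weights

# Example: sensor 3 and the left motor neuron were co-active and the move helped
w = {(3, "left"): 1.0e-6, (3, "right"): 1.0e-6}
print(update_synapses(w, [(3, "left")], favorable=True))
```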

5.3 Learning Scheme for Localization

5.3.1 Network Design

The robot is trained in a given environment with landmarks and features as described in Section 5.1.4. The input to the localization neural network, x_i^k, is the vector of landmark features. This vector is calculated as follows:

x_i^k(\Theta) = f^k(\Theta) \cdot z^k(\Theta)    (5.13)

where f^k(\Theta) = [f_1(\Theta)\ f_2(\Theta)\ \ldots\ f_{n_l}(\Theta)]^T is the feature vector for the landmark at bearing angle Θ, and z^k is the output of the LASER sensor (the distance value) at that bearing. Hence, from Figure 5.4, the inputs to the network, x_i^k, are the vectors of landmark features, x_i (i = 1, 2, ..., n_l), when the robot is at location k in the environment. The weighted sum through the device array can be calculated as:

s_j^k = \sum_i x_i \, c_{ij}    (5.14)

where c_{ij} are the conductances of the memristive devices. The range of c_{ij} is assumed to be 0.01 × 10^{-6} Ω^{-1} to 100 × 10^{-6} Ω^{-1}. These are initialized as randomly distributed about the mean of c_0 = 1.137 × 10^{-6} Ω^{-1} with a standard deviation of 10% of c_0. The output of the network can be calculated by passing this weighted sum, s_j^k, through the sigmoid function y_j^k = f(s) = \frac{1}{1 + e^{-s}}. It should be noted that s^k are normalized to the range [−1, 1] before being passed to the sigmoid function.

Figure 5.3: Schematic of the learning scheme

This output, y, represents the place field activity of the robot that may be decoded into (x, y) coordinates for representation purposes.
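A minimal sketch of this forward pass (Eq. (5.13), Eq. (5.14), and the sigmoid output) is shown below; the array sizes, the random conductance initialization around c_0, and the exact min-max normalization of s^k are illustrative assumptions.

```python
import numpy as np

def localization_forward(f, z, C):
    """Forward pass of the localization network.

    f : (n_landmarks,) feature values read at the scanned bearings
    z : (n_landmarks,) LASER distances at the same bearings
    C : (n_landmarks, n_place_cells) memristive conductance matrix
    """
    x = f * z                                          # Eq. (5.13): feature-weighted range input
    s = x @ C                                          # Eq. (5.14): weighted sum through the crossbar
    s = 2.0 * (s - s.min()) / (s.max() - s.min()) - 1  # normalise s to [-1, 1]
    return 1.0 / (1.0 + np.exp(-s))                    # sigmoid place-field activity y

# Example with random conductances about c0 = 1.137e-6 (10% spread), 8 landmarks, 25 place cells
rng = np.random.default_rng(0)
C = rng.normal(1.137e-6, 0.1 * 1.137e-6, size=(8, 25))
print(localization_forward(rng.random(8), rng.random(8), C))
```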

Figure 5.4: Device network for localization

5.3.2 Learning Rule

The learning rule is assumed to be linear for learning the landmark features. To realize the device model, we calculate the pulse width needed to change the conductance, by changing the carrier concentration, as discussed in the Device-Physics based model in Section 4.3. The pulse width, ω, is related to the error in network output as follows:

\omega^k = K_1 \, x^{k\,T} (P^k - y^k)    (5.15)

where K_1 is a normalization constant to keep ω^k in the range of [−200, 200] ms. The change in conductance is then calculated as:

\Delta c^k = \omega^k \left[\frac{1 + \mathrm{sign}(\omega)}{2} K_p + \frac{1 - \mathrm{sign}(\omega)}{2} K_n\right]    (5.16)

where K_p and K_n are the rates for learning and unlearning, respectively. Thus the conductances may be updated as c_{ij} ← c_{ij} + ∆c_{ij} by applying a pulse of width ω_{ij}. This relation is plotted in Figure 5.5 for values of K_p = 3 × 10^{-7} and K_n = 5 × 10^{-9}. The units of K_p and K_n may be noted as ms^{-1} Ω^{-1}.

Figure 5.5: Change in conductance with the applied pulse width
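The learning rule of Eqs. (5.15)-(5.16) can be sketched as follows, using the K_p and K_n values quoted above; the value of K_1 and the input sizes are placeholder assumptions, and the clipping of ω to [−200, 200] ms is made explicit as a safeguard.

```python
import numpy as np

def conductance_update(x, P, y, K1=1.0, Kp=3e-7, Kn=5e-9):
    """Eqs. (5.15)-(5.16): pulse width from the output error and the resulting
    conductance change. K1 is an assumed normalisation constant intended to keep
    omega within [-200, 200] ms; the clip below enforces that range explicitly."""
    omega = K1 * np.outer(x, P - y)            # one pulse width per synapse (ms)
    omega = np.clip(omega, -200.0, 200.0)
    dC = omega * (Kp * (1 + np.sign(omega)) / 2 + Kn * (1 - np.sign(omega)) / 2)
    return dC                                   # add element-wise to the conductance matrix

# Example: 3 inputs, 4 place cells
rng = np.random.default_rng(1)
print(conductance_update(rng.random(3), rng.random(4), rng.random(4)))
```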

5.3.3 Learning Scheme

The complete learning scheme integrating localization with navigation and obstacle avoidance is shown in Figure 5.6. As shown in the figure, the LASER sensor gives out a scan of the environment around the robot, thus detecting any landmarks/obstacles in range. This scan is input to a feature sensor that identifies the landmarks in range and extracts the unique features of those landmarks. In a practical scenario, we can think of this feature sensor as a vision sensor that, through image processing, can identify a landmark and the associated features. The output of the feature sensor is a vector of all the features collected around the robot with respect to the bearing angle. After going through the device array, we get an output as described in Section 5.3. This output is basically the array of place cell activities that may be decoded into an (x, y) coordinate for visualization purposes. This output is then compared against the place cell activity and a pulse width is evaluated for changing the device conductance as described in Section 5.3.2.

The output of the LASER sensor is also sent to the obstacle sensor to carry out obstacle avoidance and navigation. This has been described earlier in Section 5.2.

Figure 5.6: Schematic of the learning scheme for localization and navigation

Chapter 6

Simulation Results

6.1 Simulation Results for Navigation

In this section, we present the simulation results for navigation of a two-wheeled differential drive robot through an unknown environment without any knowledge of the obstacles a-priori. The simulation environment is a 20m × 20m area with randomly placed obstacles. The boundary of the environment is also modeled as an obstacle. Seven different obstacle layouts, (a) through (g), are considered for comparing the performances of the four methods: i) Proposed memristive device based method that uses macro-model (M1); ii) Proposed memristive device based method that uses device-physics derived model (M2); iii) Reinforcement learning method using global knowledge

(RLg); and iv) Reinforcement learning method using local knowledge (RLl). The robot is equipped with five infrared proximity sensors placed around the robot. It should be noted that the robot is navigating with only local knowledge of the environment (for methods M1, M2, RLl) and is fairly agnostic to the placement of the obstacles.

6.1.1 Macro-Model (M1)

Figure 6.1 shows the navigation results of the robot while utilizing the macro-model of the memory device. The robot starts from ‘◦’ and has to reach the target shown by ‘×’. The last three obstacle layouts are more interesting to note here since they contain a region of local minima. It can be seen that the robot is able to navigate out of the U-shaped obstacle as shown in Figure 6.1e, and out of the reversed layout in Figure 6.1f. Another complex layout is shown in Figure 6.1g, where the robot is able to successfully navigate through the maze-like structure.


Figure 6.1: Navigation of robot through different obstacle layouts employing the macro-model of memristive device (M1)

6.1.2 Device-Physics Derived Model (M2)

Figure 6.2 shows the navigation results of the robot while utilizing the device-physics derived model of the memory device. As with the previous model, the robot can navigate to the target while avoiding the obstacles and again, the local minima.

38 6.1.3 Comparison with Paths Based on Reinforcement Learning Algorithms

Reinforcement Learning Algorithm

Reinforcement Learning problems involve learning how to take actions, based on situations, so as to maximize a reward signal. The earned reward can be used as feedback for the next action, which is known as exploitation, or the agent may choose to explore the environment for better action selection in the future.

Reinforcement learning algorithms generally involve estimating a value function. Value functions are functions of states (or of state-action pairs, which are referred to as Q-values) that estimate how important it is for the agent to be in a given state [151]. For refining the values of the grids, the value iteration algorithm [151] is used as described below:

Initialize V(s) randomly.
Initialize ∆ as a large number, where ∆ is the difference between the values of a particular state in successive iterations. Set the count k = 1.
Repeat
    For each s ∈ S:

        V_k(s) \leftarrow \min_a \sum_{s'} P^a_{ss'}\left[R^a_{ss'} + \gamma V_{k-1}(s')\right]    (6.1)

    Compute ∆ = V_k(s) − V_{k−1}(s).
    Increment the count: k ← k + 1.
until ∆ < 0.01
Output a policy π, such that

\pi(s) = \arg\min_a \sum_{s'} P^a_{ss'}\left[R^a_{ss'} + \gamma V^*(s')\right]    (6.2)

where P^a_{ss'} refers to the state transition probability from state s to s' by taking action a, R^a_{ss'} refers to the expected reward for that particular transition, and V^* is the converged (optimal) value function. The update equation is given by:

V_k(s) \leftarrow \min_a \sum_{s'} P^a_{ss'}\left[1 + \gamma V_{k-1}(s')\right]    (6.3)

where 1 signifies the one-step cost to move from one state to the other, and P^a_{ss'} is the probability of choosing an action a in state s, whose value depends on the location of the robot. This one-step cost usually represents the distance the robot is required to move to complete one transition.

In this work, the reinforcement learning problem is to reach a known goal position via the shortest path, while avoiding obstacles. For this path planning problem, the complete environment is divided into a number of grids and each grid carries a value that reflects the cost to reach the goal from that grid. The exploration step involves starting from a random position and trying to reach the goal by using the current knowledge of obstacles that comes from the sensors. For a better estimate of the expected reward (the values of the grids in this case), this step must be performed a number of times. Once enough information is collected, the robot exploits the knowledge of the grid values to choose the lowest value among its neighbors. This brings it one step closer to the goal. The choice of the number of exploration steps depends on the size of the field and the speed of computation.

The complete algorithm is constructed as in Algorithm 2.

To validate the performance of the proposed algorithms, their results are compared with paths obtained via the following two algorithms based on reinforcement learning (RL):

RL with global knowledge of obstacles (RLg), where the locations of all the obstacles within the environment are known a-priori and are used to build a grid-based map of the environment through exploration. The final map provides the minimal distance from all locations to the target with probability close to 1, so that the path obtained using this approach is the shortest possible path between the initial position (◦) and the goal position (×). These paths are shown in Figure 6.3 for the same layouts of obstacles as before.

RL with local knowledge of obstacles (RLl), where the initial map does not include information about obstacles, and the robot only discovers them when they are encountered during navigation. Thus, paths must be learned gradually and are not guaranteed to be optimal, especially in the case of non-convex obstacles. Figure 6.4 shows the paths found using the RL algorithm with local knowledge of the environment (RLl).

Generate grids and set all grid-values to zero.
while current position ≠ goal do
    for a given number of steps, k do
        while current position ≠ goal do
            Identify neighbor grids and visible grids*.
            Move to next grid**.
            Update previous grid's value as
                V_k(s) \leftarrow \min_a \sum_{s'} P^a_{ss'}\left[1 + \gamma V_{k-1}(s')\right]
            Set current position = next grid.
        end
    end
    Identify neighbor grids and visible grids.
    Identify occupied grids (obstacles) from SLAM.
    Mark occupied grids with a grid value of 10000.
    Move to next grid.
    Set current position = next grid.
end
* grids within the range of the LASER.
** next grid is the grid with the minimum value among the neighboring grids. If there is more than one minimum, we choose randomly.
Algorithm 2: Path Planning using Reinforcement Learning
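For reference, a compact sketch of the grid value-iteration update of Eq. (6.3) is given below, assuming deterministic moves to 4-connected neighbor cells and a unit one-step cost; this is a simplification of Algorithm 2, which additionally interleaves exploration and obstacle marking.

```python
import numpy as np

def value_iteration(grid_cost, goal, gamma=1.0, tol=0.01):
    """Sketch of the grid value-iteration update of Eq. (6.3).

    grid_cost marks occupied cells with a large value (e.g. 10000); free cells
    start at zero. Moves are assumed deterministic to 4-connected neighbours,
    so the transition probability collapses to 1 for the chosen neighbour.
    """
    V = np.array(grid_cost, dtype=float)
    rows, cols = V.shape
    while True:
        delta = 0.0
        for i in range(rows):
            for j in range(cols):
                if (i, j) == goal or grid_cost[i][j] >= 10000:
                    continue                      # keep goal at 0 and obstacles expensive
                neighbours = [(i + di, j + dj) for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1))
                              if 0 <= i + di < rows and 0 <= j + dj < cols]
                new_v = min(1.0 + gamma * V[a, b] for a, b in neighbours)
                delta = max(delta, abs(new_v - V[i, j]))
                V[i, j] = new_v
        if delta < tol:
            return V        # the robot then always moves to the neighbour with the smallest value

# Example: 5x5 obstacle-free grid with the goal in a corner
print(value_iteration(np.zeros((5, 5)), goal=(4, 4)))
```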

A comparison can now be made between the two models of the memristive device and the paths obtained using RL algorithms. Table 6.1 shows the path lengths for the M1, M2, RLg, and RLl algorithms. It can be observed that the device-physics derived model (M2) has better performance as compared to the macro-model (M1). This is expected since the device-physics derived model captures the actual resistance switching mechanism through the F-P equation more accurately. It is very interesting to note that our model is quite efficient and approaches the optimal path for all but the most difficult obstacle layouts. Further, it is able to navigate through the region of local minima quite efficiently as evident from the last three obstacle layouts (Figures 6.2e- 6.2g vs. Figures 6.4e- 6.4g). This ability to handle local minima is one of the most striking features of the proposed approach, and one where many other path planning and navigation algorithms encounter problems. It can be inferred that an active real-time learning considerably improves the performance of the robot navigation in our case.

41 Table 6.1: Comparison of path lengths for different layouts of obstacles (meters)

Layout    M1       M2       RLg      RLl
a         30.05    27.92    25.92    25.92
b         33.66    30.25    25.92    25.92
c         29.08    26.93    25.33    25.33
d         29.99    29.34    27.09    28.26
e         44.86    41.87    26.74    61.06
f         26.98    26.55    24.85    24.85
g         40.57    38.84    30.18    58.18

6.2 Device Variability Study

One of the major limitations of analog state memristive devices has been the variability in the initial and various reconfigurable resistive states. The variabilities in states can arise due to variation in the process parameters during device fabrication or due to the mechanism of switching itself.

Most of the oxide-based memristive devices rely on intentional or unintentional doping to achieve reconfigurable states. Variability in doping can introduce variability in states. Hence, the variability due to various factors such as stochastic formation/rupture of the conductive filaments, variation of tunneling gap distances, newly generated traps, etc. [152] needs to be studied to develop a set of device specifications for application-specific tasks, such as robotic navigation. Furthermore, it is important to develop memristive device-based learning algorithms that are not only robust to variabilities in these devices but also exploit these variabilities for a more enhanced performance.

Researchers have studied the immunity to these variabilities in fabricated devices for applications like handwritten number recognition [153]. We studied three prominent sources of variation in device parameters as follows:

A. Variability in device initial doping concentration

1) Uniform distribution

2) Normal distribution

B. Variability in update of resistive states

C. Device malfunction

To evaluate the performance of the developed learning scheme with the described variabilities, we navigated a robot in an unknown environment with randomly placed obstacles. The robot tries to reach the target location while avoiding the obstacles. It is important to note here that the robot navigates with only local knowledge of the environment and does not know the locations of all the obstacles a-priori. We navigate the robot for a reasonably large number of steps for a total of 10 times, and a success rate is defined as the percentage of times the robot is able to reach the goal in a reasonable number of steps. It should be noted that this definition of success rate is somewhat subjective and depends on the required performance of the system. A success rate of more than 80% is considered to be acceptable. This study has been published in [154].

6.2.1 Variability in Device Doping Concentration

The memristive devices rely on intentional and unintentional doping during the fabrication process to achieve reconfigurable states. The variations in process parameters during fabrication give rise to variations in these resistive states. We have modeled this variation as following 1) a uniform distribution and 2) a normal distribution.

Uniform distribution

A variability parameter (v) is varied from 1 through 100. This means a range of n_0/v to n_0·v. Figure 6.5 shows the performance results of the robot navigation for the case of uniformly distributed variation in initial doping concentration. We can observe that a value of v up to 13 is acceptable for the robot navigation scenario. As the range gets wider, the performance of the system degrades rapidly.

Normal distribution

The values for the initial doping concentration are modeled as normally distributed with mean n_0 = 9 × 10^{20} cm^{-3} and a standard deviation (σ) that varies as v% of n_0, where v is again varied from 1 through 100. However, these values are assumed to lie between the bounds of n_{lo} = n_0/v and n_{hi} = n_0·v as in the previous case. This also comes from the fact that the doping concentration cannot be negative or unreasonably high. This is achieved by considering a truncated normal distribution [155] in the specified range.

Figure 6.6 represents the probability distribution functions for both models for v = 60. The desired value (n_0), the actual means of the distributions, and the lower and upper bounds are also marked. It should be noted that for small values of v, the probability distribution curves are narrower and most of the values lie close to the desired value n_0. As the variability increases, the uniform distribution stretches much faster than the normal distribution. This means that while there is still a high probability of getting a value close to the desired value n_0 for the normal distribution, the uniform distribution has already moved quite far away from n_0. This also explains the rapid decline in performance for uniformly distributed variation (Figure 6.5), unlike the case of the normal distribution, where large values of σ are still tolerable, as shown in Figure 6.7.
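The two variability models can be sampled as in the sketch below; the rejection-sampling implementation of the truncated normal distribution is a simple stand-in for the construction described in [155].

```python
import numpy as np

n0 = 9e20          # desired initial doping concentration (cm^-3)

def sample_doping(v, size, mode="normal", rng=np.random.default_rng()):
    """Draw initial doping concentrations under variability parameter v.

    Uniform case: values spread over [n0/v, n0*v].
    Normal case:  normal with mean n0 and std = (v/100)*n0, truncated to the
                  same interval via simple rejection sampling.
    """
    lo, hi = n0 / v, n0 * v
    if mode == "uniform":
        return rng.uniform(lo, hi, size)
    sigma = (v / 100.0) * n0
    samples = []
    while len(samples) < size:                  # reject draws outside [lo, hi]
        x = rng.normal(n0, sigma)
        if lo <= x <= hi:
            samples.append(x)
    return np.array(samples)

print(sample_doping(60, 5, mode="uniform"))
print(sample_doping(60, 5, mode="normal"))
```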

6.2.2 Variability in Update of Resistive States

At each time step, the device resistance is changed by applying a pulse whose width is modulated as described in Section 4.3. However, the inherent noise in the signal and the device dynamics will cause the state to attain a value different from the one calculated from Eqn. 4.8. This is modeled by considering a normally distributed noise in the change in carrier concentration (Eqn. 4.7) with a standard deviation (SD) varied up to 100%. Figure 6.8 shows the performance results of the robot navigation with normally distributed noise in updating the device resistance states. It can be concluded that a noise of up to 55% is acceptable in the resistance update mechanism. We can also see from the beginning of the curve that a small amount of noise actually helps to get a better performance out of the learning scheme.

6.2.3 Device Malfunction

This study highlights the fault tolerance of our learning scheme. Here, we assume that a device spontaneously stops working after some time and gets stuck in a low resistance state (shorted). For simulating this scenario, we have marked the devices with their IDs from 1 through 16 as shown in

Figure 6.9.

To capture the effects of the devices irrespective of the layout of the environment, two different obstacle courses are modeled and the robot is navigated through both of them. These obstacle layouts are mirror images of each other as can be seen from Figure 6.10.

Figure 6.11 shows the performance results of the robot navigation with shorted devices for layouts 1 and 2, respectively. We can observe that device malfunction greatly affects the robot navigation scenario and that its impact depends on the environment layout. For layout 1, devices #13 and #15 are very crucial for the navigation. This is expected since they connect the front and right obstacle sensors with the left wheel array. Hence, when the robot approaches the obstacle to its right, the right wheel does not get any current due to these devices being shorted, and hence the robot cannot navigate away from the obstacle. This can be seen in Figure 6.12. The device is shorted at the beginning of the red line, and the robot gets stuck at its end, unable to reach the goal.

The results can be summarized in Table 6.2 as follows:

Table 6.2: Results Summary

Study                          Case                    Acceptable Variability
Device doping concentration    Uniform distribution    Up to v = 13
Device doping concentration    Normal distribution     Up to 100% SD with mean n0
Update of resistive state      -                       Up to 55% SD from calculated value
Shorted device                 Layout 1                All except #13 and #15
Shorted device                 Layout 2                #5, #7, and #11

We can observe that the developed learning scheme is very robust and works with satisfactory performance over a wide range of variations in the device parameters. Hence, minor variations in the process parameters are, in fact, tolerable and, in some cases, even improve the performance of the system. The device malfunction study also shows that the performance is severely limited if one of the devices fails, and hence redundant devices are needed to achieve a truly fault-tolerant system.

We would also like to emphasize that the defined success rate is rather subjective, and the acceptable tolerance limits will depend on the trade-off between fabrication process refinement and robot performance that is acceptable for the particular application.

6.3 Simulation Results for Localization

6.3.1 Memristive Device Model

We present the results of robot localization while it navigates in an unknown environment. The environment size is kept as 115cm×138cm to match the experimental environment size. The sensor range is assumed to be 40cm. Figures 6.13a and 6.13b show two runs for the robot navigation using the proposed learning scheme. As earlier, the ‘O’ denotes starting position while ‘×’ denotes the waypoints. The ‘◦’ denotes the estimated position from the localization network as the robot moves along the shown path. The faint cyan dots show the position of all the place cells.

The error in localization is plotted in Figures 6.14a and 6.14b that may also be visualized as in

Figures 6.15a and 6.15b.

6.3.2 Comparison with Computational SLAM

We compare the localization performance of our memristive device-based learning scheme with a conventional computational SLAM model employing a particle filter. For this purpose, we run the simulation using the Simple SLAM algorithm developed by Randolph Voorhies [156]. This algorithm is based on Montemerlo et al.'s FastSLAM 1.0 algorithm [157], with optimizations removed for clarity. This is explained in the following sections.

Simultaneous Localization and Mapping

Simultaneous Localization and Mapping (SLAM) deals with the problem of estimating the state of the robot while simultaneously building a map of the environment. Given a map with N landmarks, Γ = Γ_1, Γ_2, ..., Γ_N, the robot state s^t at time t can be estimated by the following posterior distribution:

p(\Gamma, s^t \mid z^t, u^t, n^t)    (6.4)

where z^t = z_1, z_2, ..., z_t is a sequence of measurements and u^t = u_1, u_2, ..., u_t is a sequence of robot control inputs. The variables n_t specify the identity of the landmark observed at time t.

To calculate the posterior (6.4), the robot is given a probabilistic motion model p(s_t | u_t, s_{t−1}). Additionally, there is a probabilistic measurement model, p(z_t | s_t, Γ, n_t), describing how measurements evolve from the state. Both models are non-linear functions with independent Gaussian noise:

p(z_t \mid s_t, \Gamma, n_t) = g(s_t, \Gamma_{n_t}) + \varepsilon_t    (6.5)

p(s_t \mid u_t, s_{t-1}) = h(u_t, s_{t-1}) + \delta_t    (6.6)

where g and h are nonlinear functions, and ε_t and δ_t denote Gaussian noise with covariance R_t and P_t, respectively.

FastSLAM

FastSLAM [157] is based on the following factorization of the posterior:

p(\Gamma, s^t \mid z^t, u^t, n^t) = p(s^t \mid z^t, u^t, n^t) \prod_n p(\Gamma_n \mid s^t, z^t, u^t, n^t)    (6.7)

For SLAM problems, this factorization states that if the path of the robot is known, the landmark positions could be estimated independently of each other. However, in practice, it is not possible to know the vehicle’s path. Still, the independence makes it possible to factor the posterior into an estimate of path probability and the position of the landmarks, conditioned on each path.

FastSLAM samples the robot path through a particle filter. Each particle consists of its map with N Kalman filters. The updates in FastSLAM begin with sampling new poses based on the most recent robot movement command ut:

s_t^{[m]} \sim p(s_t \mid s_{t-1}^{[m]}, u_t).    (6.8)

The FastSLAM then updates the estimate of the observed landmarks considering the measurement zt as follows:

p(\Gamma_{n_t} \mid s^{t,[m]}, n^t, z^t) = \eta \, p(z_t \mid \Gamma_{n_t}, s_t^{[m]}, n_t) \cdot p(\Gamma_{n_t} \mid s^{t-1,[m]}, n^{t-1}, z^{t-1})    (6.9)

where η is a constant. Finally, FastSLAM resamples the particles to include the most recent measurement [158].
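For intuition, the sketch below shows the sample-weight-resample structure that FastSLAM builds on, reduced to a localization-only particle filter with a single landmark at a known position; the motion and measurement noise values are arbitrary, and the full FastSLAM additionally maintains one Kalman filter per landmark in every particle, which is omitted here.

```python
import numpy as np

rng = np.random.default_rng(2)

def motion_model(pose, u, dt=0.1, noise=(0.01, 0.01)):
    """Sample a new pose from p(s_t | s_{t-1}, u_t) (Eq. 6.8), unicycle motion."""
    v, w = u
    x, y, psi = pose
    v_n = v + rng.normal(0, noise[0])
    w_n = w + rng.normal(0, noise[1])
    return np.array([x + v_n * np.cos(psi) * dt, y + v_n * np.sin(psi) * dt, psi + w_n * dt])

def measurement_likelihood(pose, landmark, z, R=np.diag([0.1, 0.05])):
    """Gaussian likelihood of a (range, bearing) measurement z of a known landmark."""
    dx, dy = landmark[0] - pose[0], landmark[1] - pose[1]
    z_hat = np.array([np.hypot(dx, dy), np.arctan2(dy, dx) - pose[2]])
    e = z - z_hat
    e[1] = (e[1] + np.pi) % (2 * np.pi) - np.pi       # wrap the bearing error
    return np.exp(-0.5 * e @ np.linalg.inv(R) @ e)

def particle_filter_step(particles, u, z, landmark):
    """One simplified step: sample poses, weight by the measurement, resample."""
    poses = np.array([motion_model(p, u) for p in particles])
    w = np.array([measurement_likelihood(p, landmark, z) for p in poses])
    w = w / w.sum()
    idx = rng.choice(len(poses), size=len(poses), p=w)  # importance resampling
    return poses[idx]

# Example: 100 particles, one observed landmark at (2, 1)
particles = np.zeros((100, 3))
new_particles = particle_filter_step(particles, u=(1.0, 0.1), z=np.array([2.2, 0.5]), landmark=(2.0, 1.0))
print(new_particles.mean(axis=0))
```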

The FastSLAM algorithm is simplified as SimpleSLAM in [156]. Here, FastSLAM is implemented in MATLAB. The code simulates a robot cruising around in a 2D world which contains 5 uniquely identifiable landmarks at unknown locations. The robot is equipped with a sensor that can measure the range and angle to these landmarks, but each measurement is corrupted by noise.

For this work, the SimpleSLAM is modified to suit the needs of our scenarios (e.g. increasing the number of landmarks).

The simulation setup is the same as for the device-based SLAM, and the results can be seen in Figures 6.16 - 6.18.

The simulation is run for 10 different start and destination positions for both device-based and computational SLAM models. The results are tabulated in Tables 6.3 and 6.5. We also simulated the robot localization in the actual experimental layout for 10 different runs. One such run can be seen in Figure 6.19. The results are tabulated in Table 6.4.

Table 6.3: Device-based localization - Simulation errors (cm)

Run       Mean error   Median error   Minimum error   Maximum error
1         5.3309       4.2519         0.2308          18.2189
2         13.2807      9.5699         0.4249          35.5349
3         9.4838       8.5606         0.3568          23.3278
4         5.5195       5.0032         0.2971          25.1072
5         9.9028       2.4907         0.3202          33.0683
6         9.3419       5.2257         0.2774          31.7365
7         5.2119       4.3822         0.7235          16.3203
8         4.2691       3.7433         0.5552          13.7719
9         5.1479       4.3106         0.7112          20.8699
10        7.1416       4.8811         0.3139          34.2034
Overall   7.4630       4.6317         0.2308          35.5349

Table 6.4: Device-based localization in actual experimental layout - Simulation errors (cm)

Run       Mean error   Median error   Minimum error   Maximum error
1         3.9814       3.8532         0.5065          12.2394
2         3.8542       3.3456         0.0838          10.7950
3         5.3833       3.8793         0.1708          29.0707
4         4.9751       4.2464         0.0780          14.2436
5         4.6076       3.7063         0.2650          12.7815
6         4.5615       4.8591         0.7409          12.9829
7         5.4448       5.1975         1.7732          8.6780
8         3.3448       2.5962         0.3743          20.8676
9         3.2339       3.0212         0.6859          9.6025
10        3.6584       3.1317         0.5313          9.8386
Overall   4.3045       3.7798         0.0780          29.0707

Table 6.5: Computational SLAM - Simulation errors (cm)

Run       Mean error   Median error   Minimum error   Maximum error
1         0.3376       0.2754         0.0647          1.9000
2         0.2872       0.2665         0.0154          0.8526
3         0.4037       0.3375         0.0746          1.2587
4         0.3605       0.3149         0.0388          2.0692
5         0.4176       0.3818         0.0198          1.7108
6         0.3381       0.2960         0.0252          1.0658
7         0.2929       0.2675         0.0395          0.6421
8         0.3485       0.2870         0.0297          1.5969
9         0.3540       0.2590         0.0261          1.8012
10        0.3756       0.3161         0.0313          1.1742
Overall   0.3516       0.2915         0.0154          2.0692


Figure 6.2: Navigation of robot through different obstacle layouts employing the device-physics derived model of memristive device (M2)


Figure 6.3: Navigation of robot through different obstacle layouts employing the Reinforcement Learning algorithm with global knowledge (RLg)


Figure 6.4: Navigation of robot through different obstacle layouts employing the Reinforcement Learning algorithm with local knowledge (RLl)

Figure 6.5: Performance results for uniform distribution case

Figure 6.6: Probability distribution functions for device initial doping concentration

Figure 6.7: Performance results for normal distribution case

Figure 6.8: Performance results for variability in update of resistive states

Figure 6.9: Device IDs in the neuromorphic array

(a) Layout 1 (b) Layout 2

Figure 6.10: Two obstacle layouts considered for the device malfunctions study

(a) Layout 1 (b) Layout 2

Figure 6.11: Performance results for shorted device for two layouts

Figure 6.12: Navigation scenario for shorted device #13 in Layout 1

(a) Run 1 (b) Run 2

Figure 6.13: Robot navigation and localization for two different runs through device-based model

(a) Run 1 (b) Run 2

Figure 6.14: Robot localization errors for two different runs through device-based localization

(a) Run 1 (b) Run 2

Figure 6.15: Robot localization errors for two different runs through device-based localization - 3D view

(a) Run 1 (b) Run 2

Figure 6.16: Robot navigation and localization for two different runs through computational SLAM

(a) Run 1 (b) Run 2

Figure 6.17: Robot localization errors for two different runs through computational SLAM

(a) Run 1 (b) Run 2

Figure 6.18: Robot localization errors for two different runs through computational SLAM - 3D view

Figure 6.19: Robot navigation and localization in actual experimental layout

Chapter 7

Experimental Results

7.1 Khepera III Robot

The Khepera III is a very capable robot developed by K-Team, with built-in ultrasonic sensors and infrared proximity sensors. It can communicate with a PC using serial commands through a serial port, WiFi, or Bluetooth. The infrared proximity sensors give out values inversely proportional to the distance from the closest obstacle. The Khepera III takes input in the form of wheel speeds for moving.

7.2 Robot Navigation

7.2.1 Experimental Setup

An experimental setup is established as shown in Figure 7.1. The Khepera III robot communicates with the workstation using Bluetooth. The workstation hosts the proposed memristive device model and the learning scheme. The sensor data are received in real-time and, after processing through the learning scheme, the movement commands are sent in terms of wheel speeds. An overhead camera is set up to estimate the position of the robot. It should be emphasized that the robot position data is used to generate the plots for representation purposes only (see Figures 7.4a and 7.4b); this data is not communicated to the robot. We also set up a USB camera to record the navigation videos.

Figure 7.1: Experimental setup for robot navigation

The experimental setup also shows the experimental area, where the target is marked by a ‘×’ and the obstacles are the randomly placed rectangular boxes denoted in the figure. The perimeter of the area also has walls that serve as a bounding obstacle. The robot starts from a random location on the map and navigates the environment to reach the target while avoiding the obstacles.

The robot gathers the obstacle data through the built-in infrared proximity sensors that give out values inversely proportional to the distance from the closest obstacle. This raw data is converted to a proportional current value that is sent to the obstacle sensor array. The output spikes from the motor neuron circuit are mapped to the wheel velocities for driving the robot.

7.2.2 Results

The robot navigation is run for several different layouts of obstacles. We include two such layouts in this section. A time-lapse snapshot of robot navigation through obstacle layout 1 is shown in Figure 7.2. As we can observe, the robot initially moves towards the right and then encounters the obstacle. It then corners around the obstacle to reach the target. The position of the robot is estimated through an overhead camera and plotted as shown in Figure 7.4a. The robot is provided with the direction and distance to the target, and the presence of obstacles is sensed using the on-board infrared proximity sensors as mentioned in Section 5.1.1.

Figure 7.2: Time-lapse snapshot of robot navigation through obstacle layout 1

Another navigation scenario is shown in Figure 7.3 which is more interesting to note due to the presence of a region of local minima. The robot position is plotted in Figure 7.4b. In this navigation scenario, the robot gets into the region of local minima and, after some exploration, is able to navigate out of it and reach the target. This is a striking feature of our learning scheme where conventional navigation algorithms based on local information of the environment (such as potential field functions) fail.

We experimented with different layouts of obstacles and obtained very promising results that corroborate the validity and the potential of the learning scheme. This work has been published online and some videos of the robot navigation can be found as the supplementary material to

[159].

Figure 7.3: Time-lapse snapshot of robot navigation through obstacle layout 2

7.3 Localization Results

7.3.1 Environment Setup

The environment is set up as shown in Figure 7.5 for demonstrating the localization aspect of the learning scheme. The complete experimental setup follows the same protocols as discussed earlier in Section 7.2.1.

7.3.2 Memristive Device based Model

We present the robot navigation and localization results for two different runs in Figures 7.6 and 7.7.

The landmarks are shown in black filled circles that light up in green as they come in range of the robot. The ‘×’ shows the target and ‘◦’ denotes the estimated position from the localization network as the robot moves in the environment. Figure 7.8 shows the localization errors for the two runs that may also be visualized in Figure 7.9. The broken circle around the target represents the range within which the robot is considered to have reached the target.


Figure 7.4: Navigation of robot through different obstacle layouts - (a) Layout 1, and (b) Layout 2.

7.3.3 Comparison with Computational SLAM

We also performed localization using the computational SLAM method described in Section 6.3.2, while navigating the actual Khepera robot in an experimental setting. Figure 7.10 shows the localization error for the two runs that may also be visualized in Figure 7.11. The broken circle around the target represents the range within which the robot is considered to have reached the target.

We tested 10 different runs with different start and destination positions for both the device-based and computational SLAM models. The results are tabulated in Tables 7.1 and 7.2. Videos of some of the device-based localization runs may be found in [160].

Figure 7.5: Environment setup for robot localization

7.4 Discussions

From Tables 6.5 and 7.2, we can notice the degradation in performance of the computational SLAM during the experimental runs as compared to the simulation runs, by as much as an order of magnitude (10^1) in mean error and maximum error. This is expected since, for the simulation, we model the errors in the robot sensors ourselves while sensing the environment and landmarks. Hence, this is a known error in the model and the algorithm performs well. For the experimental scenario, the actual errors or their behavior/distribution are not known. Hence, we see that the performance of the algorithm is not as good as that in simulation.

For the case of device-based localization, referring to Tables 6.3 and 6.4, we can observe that the localization performance depends on the environment layout. We can see that the actual experimental layout, which contains fewer landmarks spaced farther apart, actually leads to a better memory recall. It is intuitive that a highly cluttered environment will lead to confusion in the location recall of the robot due to a large number of perceived features at all locations.

This may be resolved by an adaptive sensor range that picks up the most relevant features based on the incoming feature stream.

Figure 7.6: Snapshot of robot navigation and localization - run 1

Further comparing the simulation and experimental results of device-based localization (Tables 6.3 and 7.1), we find that the simulation was able to capture the actual experimental runs very precisely, as seen from the localization performance. This is partly because the noise in sensor measurement and robot modeling does not affect our developed approach very much, and partly because the memristive devices have a high tolerance to variability in weight update, as seen in our variability study (Section 6.2.2).

Figure 7.7: Snapshot of robot navigation and localization - run 2

(a) Run 1 (b) Run 2

Figure 7.8: Robot localization errors for two different runs through device-based SLAM

(a) Run 1 (b) Run 2

Figure 7.9: Robot localization errors for two different runs through device-based localization - 3D view

(a) Run 1 (b) Run 2

Figure 7.10: Robot localization errors for two different runs through computational SLAM

(a) Run 1 (b) Run 2

Figure 7.11: Robot localization errors for two different runs through computational SLAM - 3D view

Table 7.1: Device-based localization - Experimental errors (cm)

Run       Mean error   Median error   Minimum error   Maximum error
1         3.8399       3.2690         0.4758          20.9869
2         3.7264       2.8343         0.3986          18.8481
3         4.6478       3.7290         0.3052          24.7500
4         3.7045       3.4195         0.6928          13.2461
5         5.3961       3.9414         0.4694          14.8980
6         4.0890       3.0465         0.1777          18.3121
7         4.9927       3.2535         0.3189          21.5087
8         3.1843       2.5552         0.0876          21.6100
9         6.1209       4.2080         1.1519          26.4341
10        4.2500       3.6425         0.3761          13.2067
Overall   4.3952       3.3443         0.0876          26.4341

Table 7.2: Computational SLAM - Experimental errors (cm)

Run       Mean error   Median error   Minimum error   Maximum error
1         2.8204       2.7273         0.2456          6.8733
2         3.6618       3.1364         0.3841          13.0197
3         2.4726       1.9802         0.0344          11.1645
4         3.0489       3.0279         0.2468          8.3364
5         3.2195       3.2187         0.1363          9.4001
6         3.5087       3.2767         0.2959          10.3119
7         2.8649       2.5514         0.1117          16.8448
8         3.1839       3.3514         0.1665          4.1661
9         3.6606       3.4948         0.2463          19.3131
10        2.3578       1.6813         0.0329          17.6788
Overall   3.0799       3.0822         0.0329          19.3131

Chapter 8

Discussions

We have presented a novel learning scheme, implemented on synaptic memory devices and inspired by biological species, for efficient processing and computing, and hence learning, in robots in an unsupervised manner. Such devices are capable of carrying out computational processing as well as memory storage simultaneously in an energy efficient manner. This property makes them superior to classical von Neumann architectures and ideal for the next generation of computing.

First, we explored the potential of our learning scheme for navigation of robots in an unknown environment with randomly placed obstacles. We found that a robot coupled with just a few such devices is able to perform very well, reaching a known target in a very efficient manner. Further, our approach has a striking advantage over traditional navigation approaches, such as potential field algorithms, in that it does not get stuck in local minima. This approach is also very scalable and can be readily applied to any miniature-sized robot running in any environment.

Next, we looked into the localization aspect of the SLAM problem and found that the robot with a large number of such devices arranged in a crossbar fashion is able to learn the associations between the features of the landmarks around it and the location information coming from the built-in place cells. This is a groundbreaking outcome from our approach and, as validated from the comparison with the computational SLAM, it corroborates the fact that such memristive devices are the next generation of an energy efficient and parallel computing architecture.

We have published parts of our work demonstrating the validity and potential of our learning scheme as [154], [159], [161], [162]. The work on robot localization will be published shortly [163].

8.1 Future Work

While we have integrated and complemented the navigation scheme with the localization aspect, there are still a few areas of extension. The first step forward would be to implement this learning scheme on fabricated device arrays, thus gauging the scalability and potential of such a scheme with actual devices. The device variability study conducted in Section 6.2 will prove vital for designing and developing memristive device arrays for this purpose.

The other aspect of localization, i.e., mapping, still remains to be explored. The traditional ways are too computationally expensive to be realized on miniature robots in an energy efficient manner; hence, a parallel processing array of these devices will be ideal for storing maps of the environment. The power of grid cells may also be utilized, coupled with 'place cells', to build a complete cognitive map and realize a truly brain-inspired mechanism of learning.

Finally, the developed learning scheme may be applied to explore its potential in various other learning scenarios, for example, image recognition, voice recognition, and other deep learning areas, which need a lot of training and effort to be developed for a particular application. The human brain does this in a very efficient and intuitive way, and hence these applications may benefit greatly from our learning scheme.

Bibliography

[1] J. Von Neumann and M. D. Godfrey, “First draft of a report on the edvac,” IEEE Annals

of the History of Computing, vol. 15, no. 4, pp. 27–75, 1993.

[2] T. M. Wong, R. Preissl, P. Datta, M. Flickner, R. Singh, S. K. Esser, E. McQuinn, R.

Appuswamy, W. P. Risk, H. D. Simon, et al., "10^14," IBM Research Division, Research

Report RJ10502, 2012.

[3] R. S. Michalski, J. G. Carbonell, and T. M. Mitchell, Machine learning: An artificial intel-

ligence approach. Springer Science & Business Media, 2013.

[4] S. K. Murthy, “Automatic construction of decision trees from data: A multi-disciplinary

survey,” Data mining and knowledge discovery, vol. 2, no. 4, pp. 345–389, 1998.

[5] J. R. Quinlan, C4. 5: programs for machine learning. Elsevier, 2014.

[6] T.-S. Lim, W.-Y. Loh, and Y.-S. Shih, “A comparison of prediction accuracy, complexity,

and training time of thirty-three old and new classification algorithms,” Machine learning,

vol. 40, no. 3, pp. 203–228, 2000.

[7] J. Gehrke, R. Ramakrishnan, and V. Ganti, “Rainforest—a framework for fast decision

tree construction of large datasets,” Data Mining and Knowledge Discovery, vol. 4, no. 2-3,

pp. 127–162, 2000.

[8] Z. Zheng, “Constructing conjunctions using systematic search on decision trees,” Knowledge-

Based Systems, vol. 10, no. 7, pp. 421–430, 1998.

[9] J. Gama and P. Brazdil, “Linear tree,” Intelligent Data Analysis, vol. 3, no. 1, pp. 1–22,

1999.

[10] M. Radmanesh, A. Nemati, M. Sarim, and M. Kumar, “Flight formation of quad-copters

in presence of dynamic obstacles using mixed integer linear programming,” in ASME 2015

Dynamic systems and control conference, American Society of Mechanical Engineers, 2015,

V001T06A009–V001T06A009.

[11] M. Radmanesh and M. Kumar, “Flight formation of UAVs in presence of moving obstacles

using fast-dynamic mixed integer linear programming,” Aerospace Science and Technology,

vol. 50, pp. 149–160, 2016.

[12] J. Fürnkranz, “Separate-and-conquer rule learning,” Artificial Intelligence Review, vol. 13,

no. 1, pp. 3–54, 1999.

[13] A. An and N. Cercone, “Rule quality measures improve the accuracy of rule induction:

An experimental approach,” in International Symposium on Methodologies for Intelligent

Systems, Springer, 2000, pp. 119–129.

[14] T. Lindgren, “Methods for rule conflict resolution,” in European Conference on Machine

Learning, Springer, 2004, pp. 262–273.

[15] G. P. Zhang, “Neural networks for classification: A survey,” IEEE Transactions on Systems,

Man, and Cybernetics, Part C (Applications and Reviews), vol. 30, no. 4, pp. 451–462, 2000.

[16] L. Camargo and T. Yoneyama, “Specification of training sets and the number of hidden

neurons for multilayer perceptrons,” Neural Computation, vol. 13, no. 12, pp. 2673–2680,

2001.

[17] M. A. Kon and L. Plaskota, “Information complexity of neural networks,” Neural Networks,

vol. 13, no. 3, pp. 365–375, 2000.

[18] J. Y. Yam and T. W. Chow, “Feedforward networks training speed enhancement by optimal

initialization of the synaptic coefficients,” IEEE Transactions on Neural Networks, vol. 12,

no. 2, pp. 430–434, 2001.

[19] M. Siddique and M. Tokhi, “Training neural networks: Backpropagation vs. genetic algo-

rithms,” in Neural Networks, 2001. Proceedings. IJCNN’01. International Joint Conference

on, IEEE, vol. 4, 2001, pp. 2673–2678.

[20] G. G. Yen and H. Lu, “Hierarchical genetic algorithm based neural network design,” in

Combinations of Evolutionary Computation and Neural Networks, 2000 IEEE Symposium

on, IEEE, 2000, pp. 168–175.

[21] P. W. Eklund and A. Hoang, “A performance survey of public domain supervised machine

learning algorithms,” Australian Journal of Intelligent Information Systems. v9 i1, pp. 1–47,

2002.

[22] I. G. Maglogiannis, Emerging artificial intelligence applications in computer engineering:

real word AI systems with applications in eHealth, HCI, information retrieval and pervasive

technologies. Ios Press, 2007, vol. 160.

[23] F. V. Jensen, An introduction to Bayesian networks. UCL press London, 1996, vol. 210.

[24] M Madden, “The performance of bayesian network classifiers constructed using different

techniques,” in Proceedings of European conference on machine learning, workshop on prob-

abilistic graphical models for classification, 2003, pp. 59–70.

[25] D. M. Chickering, “Optimal structure identification with greedy search,” Journal of machine

learning research, vol. 3, no. Nov, pp. 507–554, 2002.

[26] S. Acid and L. M. de Campos, “Searching for bayesian network structures in the space

of restricted acyclic partially directed graphs,” Journal of Artificial Intelligence Research,

vol. 18, pp. 445–490, 2003.

[27] R. G. Cowell, “Conditions under which conditional independence and scoring methods lead

to identical selection of bayesian network models,” in Proceedings of the Seventeenth con-

ference on Uncertainty in artificial intelligence, Morgan Kaufmann Publishers Inc., 2001,

pp. 91–97.

[28] Y. Yang and G. I. Webb, “On why discretization works for naive-bayes classifiers,” in Aus-

tralasian Joint Conference on Artificial Intelligence, Springer, 2003, pp. 440–452.

[29] R. R. Bouckaert, “Naive bayes classifiers that perform well with continuous variables,” in

Australasian Joint Conference on Artificial Intelligence, Springer, 2004, pp. 1089–1094.

[30] J. C. Platt et al., “Using analytic QP and sparseness to speed training of support vector

machines,” Advances in neural information processing systems, pp. 557–563, 1999.

[31] S. S. Keerthi and E. G. Gilbert, “Convergence of a generalized SMO algorithm for SVM

classifier design,” Machine Learning, vol. 46, no. 1-3, pp. 351–360, 2002.

[32] E. Oja, “Simplified neuron model as a principal component analyzer,” Journal of mathe-

matical biology, vol. 15, no. 3, pp. 267–273, 1982.

[33] T. D. Sanger, “Optimal unsupervised learning in a single-layer linear feedforward neural

network,” Neural networks, vol. 2, no. 6, pp. 459–473, 1989.

[34] P. Foldiak, “Adaptive network for optimal linear feature extraction,” in Neural Networks,

1989. IJCNN., International Joint Conference on, IEEE, 1989, pp. 401–405.

[35] K. I. Diamantaras and S. Y. Kung, Principal component neural networks: theory and appli-

cations. John Wiley & Sons, Inc., 1996.

[36] J. Rubner and P. Tavan, “A self-organizing network for principal-component analysis,” EPL

(Europhysics Letters), vol. 10, no. 7, p. 693, 1989.

[37] L. Wang and J. Karhunen, “A unified neural bigradient algorithm for robust PCA and MCA,” to appear in Int. J. of Neural Systems.

[38] S. Haykin and N. Network, “A comprehensive foundation,” Neural Networks, vol. 2, no. 2004,

2004.

[39] K.-L. Du and M. Swamy, “Principal component analysis,” in Neural Networks and Statistical

Learning, Springer, 2014, pp. 355–405.

[40] R. Bro and A. K. Smilde, “Principal component analysis,” Analytical Methods, vol. 6, no. 9,

pp. 2812–2831, 2014.

[41] J. Wang, “Principal component analysis,” in Geometric Structure of High-Dimensional Data

and Dimensionality Reduction, Springer, 2012, pp. 95–114.

[42] H. H. Harmon, Modern factor analysis, 1967.

[43] M. Kendall and A. Stuart, The Advanced Theory of Statistics, Vol. I-III, Griffin: London, 1976-1979.

[44] A. Webb, Statistical Pattern Recognition. Arnold, London, 1999.

[45] J.-F. Cardoso, “Blind signal separation: Statistical principles,” Proceedings of the IEEE,

vol. 86, no. 10, pp. 2009–2025, 1998.

[46] A. Hyvärinen and E. Oja, “A fast fixed-point algorithm for independent component analy-

sis,” Neural computation, vol. 9, no. 7, pp. 1483–1492, 1997.

[47] J. Karhunen, E. Oja, L. Wang, R. Vigario, and J. Joutsensalo, “A class of neural networks

for independent component analysis,” IEEE Transactions on Neural Networks, vol. 8, no. 3,

pp. 486–504, 1997.

[48] E. Oja, “The nonlinear pca learning rule in independent component analysis,” Neurocom-

puting, vol. 17, no. 1, pp. 25–45, 1997.

[49] G. D. Garson, Factor analysis. 2013.

[50] E. E. Cureton and R. B. D’Agostino, Factor analysis: An applied approach. Psychology

Press, 2013.

[51] R. P. McDonald, Factor analysis and related methods. Psychology Press, 2014.

[52] K.-L. Du and M. Swamy, “Independent component analysis,” in Neural Networks and Sta-

tistical Learning, Springer, 2014, pp. 419–450.

[53] R. J. Schalkoff, Pattern recognition. Wiley Online Library, 1992.

[54] S. Grossberg, “Direct perception or adaptive resonance?” Behavioral and Brain Sciences,

vol. 3, no. 03, pp. 385–386, 1980.

[55] T. Kohonen, “The self-organizing map,” Proceedings of the IEEE, vol. 78, no. 9, pp. 1464–

1480, 1990.

[56] H. Ritter, T. Martinetz, K. Schulten, D. Barsky, M. Tesch, and R. Kates, Neural computation

and self-organizing maps: an introduction. Addison-Wesley Reading, MA, 1992.

[57] E. Oja and S. Kaski, Kohonen maps. Elsevier, 1999.

[58] M. Van Hulle, Faithful Representations and Topographic Maps. Wiley, New York, 2000.

[59] M. M. Van Hulle, “Self-organizing maps,” in Handbook of Natural Computing, Springer,

2012, pp. 585–622.

[60] G. Deboeck and T. Kohonen, Visual explorations in finance: with self-organizing maps.

Springer Science & Business Media, 2013.

[61] E. Askanazi, “Self organizing maps,” Bulletin of the American Physical Society, vol. 59,

2014.

[62] H. Adeli and S.-L. Hung, Machine learning: neural networks, genetic algorithms, and fuzzy

systems. John Wiley & Sons, Inc., 1994.

[63] W. Maass, “Lower bounds for the computational power of networks of spiking neurons,”

Neural computation, vol. 8, no. 1, pp. 1–40, 1996.

[64] S. Ghosh-Dastidar and H. Adeli, “Improved spiking neural networks for EEG classification

and epilepsy and seizure detection,” Integrated Computer-Aided Engineering, vol. 14, no. 3,

pp. 187–212, 2007.

[65] W. Gerstner and W. M. Kistler, Spiking neuron models: Single neurons, populations, plas-

ticity. Cambridge university press, 2002.

[66] E. M. Izhikevich, “Resonate-and-fire neurons,” Neural networks, vol. 14, no. 6, pp. 883–894,

2001.

[67] A. L. Hodgkin and A. F. Huxley, “A quantitative description of membrane current and

its application to conduction and excitation in nerve,” The Journal of physiology, vol. 117,

no. 4, p. 500, 1952.

[68] E. M. Izhikevich et al., “Simple model of spiking neurons,” IEEE Transactions on neural

networks, vol. 14, no. 6, pp. 1569–1572, 2003.

[69] E. M. Izhikevich, “Which model to use for cortical spiking neurons?” IEEE transactions on

neural networks, vol. 15, no. 5, pp. 1063–1070, 2004.

[70] S. M. Bohte, J. N. Kok, and H. La Poutre, “Error-backpropagation in temporally encoded

networks of spiking neurons,” Neurocomputing, vol. 48, no. 1, pp. 17–37, 2002.

[71] J. J. Wade, L. J. McDaid, J. A. Santos, and H. M. Sayers, “SWAT: A spiking neural net-

work training algorithm for classification problems,” IEEE Transactions on neural networks,

vol. 21, no. 11, pp. 1817–1830, 2010.

[72] F. Ponulak and A. Kasinski, “Supervised learning in spiking neural networks with ReSuMe:

Sequence learning, classification, and spike shifting,” Neural Computation, vol. 22, no. 2,

pp. 467–510, 2010.

[73] T. Masquelier, R. Guyonneau, and S. J. Thorpe, “Competitive STDP-based spike pattern

learning,” Neural computation, vol. 21, no. 5, pp. 1259–1276, 2009.

[74] J. M. Brader, W. Senn, and S. Fusi, “Learning real-world stimuli in a neural network with

spike-driven synaptic dynamics,” Neural computation, vol. 19, no. 11, pp. 2881–2912, 2007.

[75] M. V. Srinivasan, S.-W. Zhang, J. S. Chahl, E. Barth, and S. Venkatesh, “How honeybees

make grazing landings on flat surfaces,” Biological cybernetics, vol. 83, no. 3, pp. 171–183,

2000.

[76] E. M. Low, I. R. Manchester, and A. V. Savkin, “A biologically inspired method for vision-

based docking of wheeled mobile robots,” Robotics and Autonomous Systems, vol. 55, no. 10,

pp. 769–784, 2007.

[77] O. Trullier, S. I. Wiener, A. Berthoz, and J.-A. Meyer, “Biologically based artificial naviga-

tion systems: Review and prospects,” Progress in neurobiology, vol. 51, no. 5, pp. 483–544,

1997.

[78] M. O. Franz and H. A. Mallot, “Biomimetic robot navigation,” Robotics and Autonomous

Systems, vol. 30, no. 1, pp. 133–153, 2000.

[79] D. Floreano and C. Mattiussi, “Evolution of spiking neural controllers for autonomous

vision-based robots,” in Evolutionary Robotics. From Intelligent Robotics to Artificial Life,

Springer, 2001, pp. 38–61.

[80] D. Floreano, Y. Epars, J.-C. Zufferey, and C. Mattiussi, “Evolution of spiking neural circuits

in autonomous mobile robots,” International Journal of Intelligent Systems, vol. 21, no. 9,

pp. 1005–1024, 2006.

[81] P. F. Verschure, B. J. Kröse, and R. Pfeifer, “Distributed adaptive control: The self-organization

of structured behavior,” Robotics and Autonomous Systems, vol. 9, no. 3, pp. 181–196, 1992.

[82] P. F. Verschure, T. Voegtlin, and R. J. Douglas, “Environmentally mediated synergy between

perception and behaviour in mobile robots,” Nature, vol. 425, no. 6958, pp. 620–624, 2003.

[83] P. Arena, L. Fortuna, M. Frasca, and L. Patanè, “Learning anticipation via spiking networks:

Application to navigation control,” Neural Networks, IEEE Transactions on, vol. 20, no. 2,

pp. 202–216, 2009.

[84] I. P. Pavlov and G. V. Anrep, Conditioned reflexes. Courier Corporation, 2003, vol. 614.

[85] S. Mizumori and J. Williams, “Directionally selective mnemonic properties of neurons in

the lateral dorsal nucleus of the thalamus of rats,” Journal of Neuroscience, vol. 13, no. 9,

pp. 4015–4028, 1993.

[86] G. J. Quirk, R. U. Muller, and J. L. Kubie, “The firing of hippocampal place cells in the dark

depends on the rat’s recent experience,” Journal of Neuroscience, vol. 10, no. 6, pp. 2008–

2017, 1990.

[87] J. O’Keefe and J. Dostrovsky, “The hippocampus as a spatial map. preliminary evidence

from unit activity in the freely-moving rat,” Brain research, vol. 34, no. 1, pp. 171–175, 1971.

[88] J. O’Keefe and D. Conway, “Hippocampal place units in the freely moving rat: Why they fire

where they fire,” Experimental brain research, vol. 31, no. 4, pp. 573–590, 1978.

[89] J. S. Taube, R. U. Muller, and J. B. Ranck, “Head-direction cells recorded from the post-

subiculum in freely moving rats. i. description and quantitative analysis,” Journal of Neu-

roscience, vol. 10, no. 2, pp. 420–435, 1990.

[90] M. Fyhn, S. Molden, M. P. Witter, E. I. Moser, and M.-B. Moser, “Spatial representation

in the entorhinal cortex,” Science, vol. 305, no. 5688, pp. 1258–1264, 2004.

[91] T. Hafting, M. Fyhn, S. Molden, M.-B. Moser, and E. I. Moser, “Microstructure of a spatial

map in the entorhinal cortex,” Nature, vol. 436, no. 7052, pp. 801–806, 2005.

[92] F. Sargolini, M. Fyhn, T. Hafting, B. L. McNaughton, M. P. Witter, M.-B. Moser, and E. I.

Moser, “Conjunctive representation of position, direction, and velocity in entorhinal cortex,”

Science, vol. 312, no. 5774, pp. 758–762, 2006.

[93] J. S. Taube, R. U. Muller, and J. B. Ranck, “Head-direction cells recorded from the post-

subiculum in freely moving rats. ii. effects of environmental manipulations,” Journal of

Neuroscience, vol. 10, no. 2, pp. 436–447, 1990.

[94] J. J. Knierim, H. S. Kudrimoti, and B. L. McNaughton, “Place cells, head direction cells,

and the learning of landmark stability,” Journal of Neuroscience, vol. 15, no. 3, pp. 1648–

1659, 1995.

[95] J. J. Knierim, “Dynamic interactions between local surface cues, distal landmarks, and

intrinsic circuitry in hippocampal place cells,” Journal of Neuroscience, vol. 22, no. 14,

pp. 6254–6264, 2002.

[96] K. M. Gothard, W. E. Skaggs, and B. L. McNaughton, “Dynamics of mismatch correc-

tion in the hippocampal ensemble code for space: Interaction between path integration and

environmental cues,” Journal of Neuroscience, vol. 16, no. 24, pp. 8027–8040, 1996.

[97] E. Save, A. Cressant, C. Thinus-Blanc, and B. Poucet, “Spatial firing of hippocampal place

cells in blind rats,” Journal of Neuroscience, vol. 18, no. 5, pp. 1818–1826, 1998.

[98] A. D. Redish, A. N. Elga, and D. S. Touretzky, “A coupled attractor model of the rodent

head direction system,” Network: Computation in Neural Systems, vol. 7, no. 4, pp. 671–685,

1996.

[99] K. Zhang, “Representation of spatial orientation by the intrinsic dynamics of the head-

direction cell ensemble: A theory,” Journal of Neuroscience, vol. 16, no. 6, pp. 2112–2126,

1996.

[100] A. Samsonovich and B. L. McNaughton, “Path integration and cognitive mapping in a con-

tinuous attractor neural network model,” Journal of Neuroscience, vol. 17, no. 15, pp. 5900–

5920, 1997.

[101] B. McNaughton, L. Chen, and E. Markus, ““dead reckoning,” landmark learning, and the

sense of direction: A neurophysiological and computational hypothesis,” Journal of Cognitive

Neuroscience, vol. 3, no. 2, pp. 190–202, 1991.

[102] W. E. Skaggs, J. J. Knierim, H. S. Kudrimoti, and B. L. McNaughton, “A model of the

neural basis of the rat’s sense of direction,” in Advances in neural information processing

systems, 1995, pp. 173–180.

[103] A. Arleo, Spatial Learning and Navigation in Neuro-Mimetic Systems: Modeling the Rat Hippocampus. dissertation.de, 2000.

[104] N. Burgess, M. Recce, and J. O’Keefe, “A model of hippocampal function,” Neural networks,

vol. 7, no. 6, pp. 1065–1081, 1994.

[105] P. Gaussier, A. Revel, J.-P. Banquet, and V. Babeau, “From view cells and place cells to

cognitive map learning: Processing stages of the hippocampal system,” Biological cybernetics,

vol. 86, no. 1, pp. 15–28, 2002.

[106] M. Hoy, A. S. Matveev, and A. V. Savkin, “Algorithms for collision-free navigation of mobile

robots in complex cluttered environments: A survey,” Robotica, vol. 33, no. 03, pp. 463–497,

2015.

[107] A. R. Girard, A. S. Howell, and J. K. Hedrick, “Border patrol and surveillance missions

using multiple unmanned air vehicles,” in Decision and Control, 2004. CDC. 43rd IEEE

Conference on, IEEE, vol. 1, 2004, pp. 620–625.

[108] G. Saggiani and B. Teodorani, “Rotary wing UAV potential applications: An analytical study

through a matrix method,” Aircraft Engineering and Aerospace Technology, vol. 76, no. 1,

pp. 6–14, 2004.

[109] M. Caccia, R. Bono, G. Bruzzone, and G. Veruggio, “Variable-configuration UUVs for ma-

rine science applications,” Robotics & Automation Magazine, IEEE, vol. 6, no. 2, pp. 22–32,

1999.

[110] K. Lee and M. Han, “Lane-following method for high speed autonomous vehicles,” Interna-

tional Journal of Automotive Technology, vol. 9, no. 5, pp. 607–613, 2008.

[111] M. Sarim, A. Nemati, and M. Kumar, “Autonomous wall-following based navigation of

unmanned aerial vehicles in indoor environments,” in Proceedings of the 2015 AIAA SciTech

Conference, AIAA Infotech @ Aerospace, AIAA, 2015.

[112] D. B. Aranibar and P. J. Alsina, “Reinforcement learning-based path planning for au-

tonomous robots,” in EnRI-XXIV Congresso da Sociedade Brasileira de Computação, 2004,

p. 10.

[113] S. Fujisawa, R. Kurozumi, T. Yamamoto, and Y. Suita, “Path planning for mobile robots

using an improved reinforcement learning scheme,” in Intelligent Control, 2002. Proceedings

of the 2002 IEEE International Symposium on, IEEE, 2002, pp. 67–74.

[114] M. Radmanesh, M. Kumar, P. H. Guentert, and M. Sarim, “Overview of path-planning and

obstacle avoidance algorithms for UAVs: A comparative study,” Unmanned systems, vol. 6,

no. 2, pp. 1–24, 2018.

[115] R. C. Smith and P. Cheeseman, “On the representation and estimation of spatial uncer-

tainty,” The international journal of Robotics Research, vol. 5, no. 4, pp. 56–68, 1986.

[116] H. F. Durrant-Whyte, “Uncertain geometry in robotics,” IEEE Journal on Robotics and

Automation, vol. 4, no. 1, pp. 23–31, 1988.

[117] H. Durrant-Whyte, D. Rye, and E. Nebot, “Localization of autonomous guided vehicles,” in

Robotics Research, Springer, 1996, pp. 613–625.

[118] J. A. Castellanos, J. M. Martínez, J. Neira, and J. D. Tardós, “Experiments in multisensor

mobile robot localization and map building,” IFAC Proceedings Volumes, vol. 31, no. 3,

pp. 369–374, 1998.

[119] J. J. Leonard and H. J. S. Feder, “A computationally efficient method for large-scale con-

current mapping and localization,” in Robotics Research, Springer, 2000, pp. 169–176.

[120] J. Guivant, E. Nebot, and S. Baiker, “Autonomous navigation and map building using laser

range sensors in outdoor applications,” Journal of robotic systems, vol. 17, no. 10, pp. 565–

583, 2000.

[121] S. B. Williams, P. Newman, G. Dissanayake, and H. Durrant-Whyte, “Autonomous un-

derwater simultaneous localisation and map building,” in Robotics and Automation, 2000.

Proceedings. ICRA’00. IEEE International Conference on, IEEE, vol. 2, 2000, pp. 1793–

1798.

[122] R. Smith, M. Self, and P. Cheeseman, “Estimating uncertain spatial relationships in robotics,”

in Autonomous robot vehicles, Springer, 1990, pp. 167–193.

[123] F. Lu and E. Milios, “Globally consistent range scan alignment for environment mapping,”

Autonomous robots, vol. 4, no. 4, pp. 333–349, 1997.

[124] J. E. Guivant and E. M. Nebot, “Optimization of the simultaneous localization and map-

building algorithm for real-time implementation,” Robotics and Automation, IEEE Trans-

actions on, vol. 17, no. 3, pp. 242–257, 2001.

[125] E. Schnipke, S. Reidling, J. Meiring, W. Jeffers, M. Hashemi, R. Tan, A. Nemati, and

M. Kumar, “Autonomous navigation of UAV through GPS-denied indoor environment with

obstacles,” in AIAA Infotech@ Aerospace, 2015, p. 0715.

[126] S. Mandal, A. El-Amin, K. Alexander, B. Rajendran, and R. Jha, “Novel synaptic memory

device for neuromorphic computing,” Scientific reports, vol. 4, 2014.

[127] C. Zamarreño-Ramos, L. A. Camuñas-Mesa, J. A. Pérez-Carrasco, T. Masquelier, T. Serrano-

Gotarredona, and B. Linares-Barranco, “On spike-timing-dependent-plasticity, memristive

devices, and building a self-learning visual cortex,” Frontiers in neuroscience, vol. 5, 2011.

[128] S. Song, K. D. Miller, and L. F. Abbott, “Competitive Hebbian learning through spike-

timing-dependent synaptic plasticity,” Nature neuroscience, vol. 3, no. 9, pp. 919–926, 2000.

[129] E. M. Izhikevich, J. A. Gally, and G. M. Edelman, “Spike-timing dynamics of neuronal

groups,” Cerebral cortex, vol. 14, no. 8, pp. 933–944, 2004.

[130] W. Gerstner, R. Ritz, and J. L. Van Hemmen, “Why spikes? Hebbian learning and retrieval

of time-resolved excitation patterns,” Biological cybernetics, vol. 69, no. 5-6, pp. 503–515,

1993.

[131] A. A. Minai and W. B. Levy, “Sequence learning in a single trial,” in INNS world congress

on neural networks, Erlbaum Hillsdale, NJ, vol. 2, 1993, pp. 505–508.

[132] L. Abbott and K. I. Blum, “Functional significance of long-term potentiation for sequence

learning and prediction,” Cerebral Cortex, vol. 6, no. 3, pp. 406–416, 1996.

[133] P. D. Roberts, “Computational consequences of temporally asymmetric learning rules: I. Dif-

ferential Hebbian learning,” Journal of Computational Neuroscience, vol. 7, no. 3, pp. 235–

246, 1999.

[134] K. Blum, L. Abbott, et al., “A model of spatial map formation in the hippocampus of the

rat,” Neural Computation, vol. 8, no. 1, pp. 85–93, 1996.

[135] W. Gerstner and L. Abbott, “Learning navigational maps through potentiation and mod-

ulation of hippocampal place cells,” Journal of computational neuroscience, vol. 4, no. 1,

pp. 79–94, 1997.

[136] M. R. Mehta, M. C. Quirk, and M. A. Wilson, “Experience-dependent asymmetric shape of

hippocampal receptive fields,” Neuron, vol. 25, no. 3, pp. 707–715, 2000.

[137] T. Masquelier, R. Guyonneau, and S. J. Thorpe, “Spike timing dependent plasticity finds

the start of repeating patterns in continuous spike trains,” PloS one, vol. 3, no. 1, e1377,

2008.

[138] B. W. Büel, “A neurally controlled robot that learns,” Master’s thesis, 2011.

[139] H. Hagras, A. Pounds-Cornish, M. Colley, V. Callaghan, and G. Clarke, “Evolving spiking

neural network controllers for autonomous robots,” in Robotics and Automation, 2004. Pro-

ceedings. ICRA’04. 2004 IEEE International Conference on, IEEE, vol. 5, 2004, pp. 4620–

4626.

[140] D. Floreano, J.-C. Zufferey, and J.-D. Nicoud, “From wheels to wings with evolutionary

spiking circuits,” Artificial Life, vol. 11, no. 1-2, pp. 121–138, 2005.

[141] A. Bouganis and M. Shanahan, “Training a spiking neural network to control a 4-dof robotic

arm based on spike timing-dependent plasticity,” in Neural Networks (IJCNN), The 2010

International Joint Conference on, IEEE, 2010, pp. 1–8.

[142] R. V. Florian, “Reinforcement learning through modulation of spike-timing-dependent synap-

tic plasticity,” Neural Computation, vol. 19, no. 6, pp. 1468–1502, 2007.

[143] R. Evans, “Reinforcement learning in a neurally controlled robot using dopamine modulated

STDP,” Master’s thesis, 2012.

[144] D. O. Hebb, The organization of behavior, 1949.

[145] G. S. Snider, “Self-organized computation with unreliable, memristive nanodevices,” Nan-

otechnology, vol. 18, no. 36, p. 365202, 2007.

[146] G. S. Snider, “Spike-timing-dependent learning in memristive nanodevices,” in Nanoscale

Architectures, 2008. NANOARCH 2008. IEEE International Symposium on, IEEE, 2008,

pp. 85–92.

[147] S. H. Jo, T. Chang, I. Ebong, B. B. Bhadviya, P. Mazumder, and W. Lu, “Nanoscale

memristor device as synapse in neuromorphic systems,” Nano letters, vol. 10, no. 4, pp. 1297–

1301, 2010.

[148] G.-Q. Bi and M.-M. Poo, “Synaptic modification by correlated activity: Hebb’s postulate

revisited,” Annual review of neuroscience, vol. 24, no. 1, pp. 139–166, 2001.

[149] W. Zhu, T.-P. Ma, T. Tamagawa, J. Kim, and Y. Di, “Current transport in metal/hafnium

oxide/silicon structure,” IEEE Electron Device Letters, vol. 23, no. 2, pp. 97–99, 2002.

[150] G. Antonelli, S. Chiaverini, and G. Fusco, “A calibration method for odometry of mobile

robots based on the least-squares technique: Theory and experimental validation,” IEEE

Transactions on Robotics, vol. 21, no. 5, pp. 994–1004, 2005, issn: 1552-3098. doi: 10.1109/TRO.2005.851382.

[151] R. S. Sutton and A. G. Barto, Reinforcement learning: An introduction. MIT Press, Cambridge, 1998, vol. 1.

[152] S. Yu, X. Guan, and H.-S. P. Wong, “On the stochastic nature of resistive switching in metal

oxide RRAM: Physical modeling, Monte Carlo simulation, and experimental characterization,”

in Electron Devices Meeting (IEDM), 2011 IEEE International, IEEE, 2011, pp. 17–3.

[153] D. Querlioz, O. Bichler, P. Dollfus, and C. Gamrat, “Immunity to device variations in a spik-

ing neural network with memristive nanodevices,” IEEE Transactions on Nanotechnology,

vol. 12, no. 3, pp. 288–295, 2013.

[154] M. Sarim, R. Jha, and M. Kumar, “Neuromorphic device specifications for unsupervised

learning in robots,” in Aerospace and Electronics Conference (NAECON), 2017 IEEE Na-

tional, IEEE, 2017, pp. 44–51.

[155] N. L. Johnson, S. Kotz, and N. Balakrishnan, Continuous univariate distributions, 1994.

[156] R. Voorhies, Randvoorhies/simpleslam, 2012. [Online]. Available: https://github.com/randvoorhies/SimpleSLAM.

[157] M. Montemerlo, S. Thrun, D. Koller, and B. Wegbreit, “FastSLAM: A factored solution to

the simultaneous localization and mapping problem,” in Proceedings of the AAAI National

Conference on Artificial Intelligence, Edmonton, Canada: AAAI, 2002.

[158] D. Rubin, J. Bernardo, M. De Groot, D. Lindley, and A. Smith, Bayesian statistics 3, 1988.

[159] M. Sarim, M. Kumar, R. Jha, and A. A. Minai, “Memristive device based learning for

navigation in robots,” Bioinspiration & biomimetics, vol. 12, no. 6, p. 066011, 2017.

[160] M. Sarim, Khepera localization videos using memristive device-based neuromorphic network,

2018. [Online]. Available: https://www.youtube.com/playlist?list=PLFKpluGvxSV4v-h63XNP2nDTZoB88CdPX.

[161] M. Sarim, T. Schultz, M. Kumar, and R. Jha, “An artificial brain mechanism to develop

a learning paradigm for robot navigation,” in ASME 2015 Dynamic Systems and Control

Conference, American Society of Mechanical Engineers, 2015.

[162] M. Sarim, T. Schultz, R. Jha, and M. Kumar, “Ultra-low energy neuromorphic device based

navigation approach for biomimetic robots,” in 2016 National Aerospace and Electronics

Conference & Ohio Innovation Summit (NAECON-OIS), IEEE, 2016.

[163] M. Sarim, M. Kumar, R. Jha, and A. A. Minai, Memristive device based brain-inspired

localization of robots, 2018, in preparation.
