Spiking neuromorphic architecture for associative learning

Dissertation submitted to the Graduate School of the University of Cincinnati in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Department of Electrical Engineering and Computer Science of the College of Engineering and Applied Science.

Author: Jones
M.S., University of Cincinnati, 2016
B.S., University of Cincinnati, 2015
September 29, 2020
Committee: Rashmi Jha (Chair), Marc Cahay, Manish Kumar, Cory Merkel, and Ali Minai

Abstract

The work in this dissertation demonstrates the implementation of a specialized neural network for associative memory within novel neuromorphic hardware. The architecture uses CMOS-based circuitry for information processing and memristive devices for the network's memory. It is based on a non-Von Neumann style of computer architecture called in-memory computing, in which information storage and processing reside in a single location. The CMOS circuitry within the architecture has both digital and analog components for processing. The memristive devices used are a newer form of memristive device that possesses a gate used to potentiate or depress the device. These gated-memristive devices allow for simpler hardware architectures for tasks such as reading and writing a device simultaneously. The architecture also exploits a property often seen in memristive devices: a semi-volatile state. This semi-volatility can be used in tandem with a spiking neuromorphic architecture to perform unique tasks during learning, depending on the degree of volatility in the device. Once memories are programmed into the network, it can later recall them by observing partial information and performing pattern completion. The final portion of this dissertation studies how the network behaves when exposed to a larger dataset over time and analyzes how it performs recall on that data. An array of metrics is used to evaluate the network's performance during these tests, and potential expansions of network functionality are explored and studied in order to enhance its capabilities in certain applications.


Thank you to all my friends, family, and research colleagues for all the love, support, and advice through the development of this work.


Funding Acknowledgement

The work in this dissertation was supported by the National Science Foundation under the following award numbers: ECCS-1156294, SHF-1718428, and ECCS-1926465.


Table of Contents

I. Introduction
II. Background
   a. A Brief History/Overview of Neural Networks
   b. History of Neuromorphic Computing
   c. Approaches to Neuromorphic Computing
   d. Synaptic Devices
   e. Neuron Circuits and Devices
   f. Application Space of Neuromorphic Computing
III. The Neuron Circuit
   a. The Original Octopus Retina Circuit
   b. Expanding the Octopus Retina Circuit
   c. Circuit Profiles
IV. Synaptic Devices
   a. Introduction to Gated-Synaptic Devices
   b. Superiority of Gated-Synaptic Devices
   c. Initial Device Model
   d. Results of Initial Version
   e. Generic Synaptic Model
   f. Generic Model Results
   g. Using the Generic Model for a NbOx Gated-Synaptic Device
V. Architecture
   a. Associative Memory
   b. Recurrent/Hopfield Networks
   c. The Segmented Attractor Network
   d. Implementing Network in Hardware Using the Initial Device Model
   e. Using SAN for Navigation
   f. Using the Generic Model in a Segmented Attractor Network
   g. Results of the Generic Model Segmented Attractor Network
VI. Analysis of the Segmented Attractor Network
   a. Importance of Synaptic Values
   b. Recalling Different Amounts of Input
   c. Varying the Segmented Attractor Network's Dimensions
   d. Implementing a Dataset onto the Segmented Attractor Network
   e. The EHoS Dataset
   f. Demonstration of the EHoS Dataset on the Segmented Attractor Network
   g. Expanding the Segmented Attractor Network's Behavior
   h. Predicting Future Memories
   i. Erasing Common Information Observed
   j. Forgetting Memories Over Time
   k. Using Behavior Ensembles
VII. Conclusion


List of Figures

II. Background
Figure 1: Diagrams of two-terminal, three-terminal, and four-terminal synaptic devices

III. The Neuron Circuit
Figure 2: Schematic of the SR Octopus Retina neuron circuit
Figure 3: Frequency/duty cycle profiles of the SR Octopus Retina neuron circuit (ASU node)
Figure 4: Power consumption profile of SR Octopus Retina neuron circuit (ASU node)
Figure 5: Spiking energy efficiency comparison for SR Octopus Retina neuron circuit

IV. Synaptic Devices
Figure 6: Read/write comparison diagram for two and three terminal synaptic devices
Figure 7: Potentiation/decay demonstration of double-gated synaptic device model
Figure 8: Port diagram for the behavioral gated-synaptic device model
Figure 9: Variation of gc during potentiation for gated-synaptic device model
Figure 10: Variation of brev during a vin sweep for gated-synaptic device model
Figure 11: Variable impact diagram for gated-synaptic device model
Figure 12: Experiment replications using the gated-synaptic device model
Figure 13: Pulse count fit of gated-synaptic device model to niobium oxide device
Figure 14: Voltage sweep fit of gated-synaptic device model to niobium oxide device
Figure 15: State decay fit of gated-synaptic device model to niobium oxide device

V. Architecture
Figure 16: Diagram of an example segmented attractor network
Figure 17: Two-memory demonstration of an attractor network using double-gated model
Figure 18: Layout diagrams for navigational neuromorphic architecture
Figure 19: Results of navigation test using the double-gated synaptic model
Figure 20: Hardware diagram of a segmented attractor network using gated-synaptic devices
Figure 21: Frequency/duty cycle profiles of SR Octopus Retina neuron circuit (TSMC node)
Figure 22: Frequency response of segmented attractor network during association
Figure 23: Synaptic heatmap of segmented attractor network during association
Figure 24: Frequency response of segmented attractor network during recall

VI. Analysis of the Segmented Attractor Network
Figure 25: Flowchart for generic segmented attractor network simulations
Figure 26: Hit rate of segmented attractor network as its size is increased
Figure 27: Hit rate of segmented attractor network as its size is increased (lower weight)
Figure 28: Hit rate of segmented attractor network as synaptic weight is varied
Figure 29: Hit rate of segmented attractor network while varying amount of hidden input
Figure 30: Hit rate of segmented attractor network while varying set count
Figure 31: Hit rate of segmented attractor network while varying features per set count
Figure 32: Average Hamming distance of all memories in the EHoS dataset
Figure 33: Scaled diagram of SAN for EHoS dataset
Figure 34: Flowchart for the SAN simulations on the EHoS dataset
Figure 35: Hit rate during a basic SAN EHoS dataset run
Figure 36: Unique memory ratio during a basic SAN EHoS dataset run
Figure 37: Memory occurrences during a basic SAN EHoS dataset run
Figure 38: Hit rate of SAN EHoS dataset run with varied input
Figure 39: Unique memory ratio of SAN EHoS dataset run with varied input
Figure 40: Memory occurrences of SAN EHoS dataset run with varied input
Figure 41: Expanded learning behavior diagrams for the segmented attractor network
Figure 42: Additional circuitry required for predictive memory behavior
Figure 43: Hit rate of SAN EHoS dataset run with predictive memory behavior
Figure 44: Unique memory ratio of SAN EHoS dataset run with predictive memory behavior
Figure 45: Memory occurrences of SAN EHoS dataset run with predictive behavior
Figure 46: Hit rate of SAN EHoS dataset run with low predictive behavior
Figure 47: Unique memory ratio of SAN EHoS dataset run with low predictive behavior
Figure 48: Memory occurrences of SAN EHoS dataset run with low predictive behavior
Figure 49: Hit rate of SAN EHoS dataset run with erase behavior
Figure 50: Unique memory ratio of SAN EHoS dataset run with erase behavior
Figure 51: Memory occurrences of SAN EHoS dataset run with erase behavior
Figure 52: Hit rate of SAN EHoS dataset run with high erase behavior
Figure 53: Unique memory ratio of SAN EHoS dataset run with high erase behavior
Figure 54: Memory occurrences of SAN EHoS dataset run with high erase behavior
Figure 55: Hit rate of SAN EHoS dataset run with forgetting behavior
Figure 56: Unique memory ratio of SAN EHoS dataset run with forgetting behavior
Figure 57: Unique memory ratio of SAN EHoS dataset run with faster forgetting behavior
Figure 58: Memory occurrences of SAN EHoS dataset run with forgetting behavior
Figure 59: Memory occurrences of SAN EHoS dataset run with faster forgetting behavior
Figure 60: Hit rate of SAN EHoS dataset run with full behavior ensemble
Figure 61: Unique memory ratio of SAN EHoS dataset run with full behavior ensemble
Figure 62: Memory occurrences of SAN EHoS dataset run with full behavior ensemble
Figure 63: Hit rate of SAN EHoS dataset run with forget and erase behaviors
Figure 64: Unique memory ratio of SAN EHoS dataset run with forget and erase behaviors
Figure 65: Memory occurrences of SAN EHoS dataset run with forget and erase behaviors
Figure 66: Hit rate of SAN EHoS dataset run with predict and erase behaviors
Figure 67: Unique memory ratio of SAN EHoS dataset run with predict and erase behaviors
Figure 68: Memory occurrences of SAN EHoS dataset run with predict and erase behaviors


List of Tables

IV. Synaptic Devices
Table 1: User-defined model parameters for the behavioral gated-synaptic device model
Table 2: User-defined parameters for the gated-synaptic device experimental replications
Table 3: User-defined model parameters for the niobium oxide device experiment fits

V. Architecture
Table 4: Recall frequency response of the segmented attractor network
Table 5: Recall frequency response of the segmented attractor network with faster decay

VI. Analysis of the Segmented Attractor Network
Table 6: U-Factors of sets within the EHoS dataset

List of Algorithms

IV. Synaptic Devices
Algorithm 1: Algorithm for the behavioral gated-synaptic device model


I. Introduction

For the last several decades, the primary focus of the field of computing has been the continued fulfillment of Moore's Law. As the decades passed, transistors shrank and computers became faster and more efficient. The progress has been enormous: the microprocessor was invented, pipelines were added to CPUs, memory capacities grew by several orders of magnitude, clock speeds rose until they hit the ceiling imposed by CPU heat extraction rates, multi-core CPUs were introduced, and graphics cards became capable of handling far more parallel tasks.

All of these facets of computing continue to improve to this day, albeit at a slower pace. Moore's Law has begun to die, primarily due to the difficulty of fabricating ever-shrinking transistors [1]. Progress will continue to be made on fabrication techniques for smaller transistors, but not at the rate to which the tech industry has acclimated itself. If computing capabilities are to improve further, the industry must look to factors other than transistor size to improve how information is processed and handled.

One factor that has drawn great interest in the past ten years or so is how computer architecture is organized at the most basic level. For many years, the widely accepted method of arranging a computer's structure has been the Von Neumann architecture [2]. Von Neumann architecture explicitly splits the internal structure of the computer into two primary components: processing and memory. These two components are separate from one another: processing is performed in one area of the computer while memory is stored elsewhere, and information from memory is passed back and forth to the processing module to perform various tasks. An analogy for this method of computing is a person in a library [3]. The person might be doing work at a table (the processor) but must seek out information from the bookshelves (the memory) and bring the books back to the table to do any work based on their contents.

Von Neumann architecture has certainly proven effective at processing information efficiently, but it is not without its shortcomings. One often-cited issue is the so-called "Von Neumann bottleneck," where the speed of processing is limited by the available bandwidth between the processing and memory modules of the computer [4-6]. For tasks with large amounts of information that can be processed in parallel, removing this bottleneck offers a massive opportunity to improve processing speeds.

To circumvent the Von Neumann bottleneck, engineers have begun to investigate non-Von Neumann architectures that avoid this bandwidth limitation. A paradigm called in-memory computing has arisen as a primary contender to Von Neumann architectures for parallel processing [7]. In this form of architecture, the memory of the system resides in the same location where the processing takes place. This design removes the bottleneck entirely and allows data to be processed in parallel much more efficiently.

One computation component that has arisen in the past decade as a platform for in-memory computing is the GPU [8, 9]. Companies such as NVIDIA have pushed their GPUs to the limit on parallelizable tasks. One of the key selling points for GPUs during the 2010s was their capability of simulating a data structure ideal for certain types of parallel processing: the neural network [10]. Neural networks process information via pattern recognition or associative memories in a manner akin to how biological systems process information [11]. They store information in the form of synaptic weights within the network that can be adjusted through various means. The weights of these synapses can then be analyzed by neurons, which act as the neural network's processing units. The collective output of the neurons within the network can then be measured to determine the solution to whatever problem the neural network has been given.
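As a minimal illustration of this weights-and-neurons scheme (a toy sketch with arbitrary numbers, not any network from this work), a single neuron simply computes a weighted sum of its inputs and compares it to a threshold:

```python
# Toy artificial neuron: a weighted sum of inputs passed through a threshold.
# The inputs and weights below are arbitrary illustrative values.
inputs  = [1.0, 0.0, 1.0]    # activity on three input lines
weights = [0.8, -0.3, 0.5]   # synaptic weights (the network's stored state)

activation = sum(x * w for x, w in zip(inputs, weights))  # weighted sum = 1.3
output = 1 if activation > 1.0 else 0                     # threshold decision
print(output)  # -> 1: the neuron fires for this input pattern
```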

Neural networks can be implemented on computation and logic platforms other than GPUs, such as FPGAs or ASICs [12]. Countless research articles have been published on neural networks implemented on such frameworks [12-15]. Many companies now also use neural networks for all sorts of data processing, from voice/speech analysis [16] to economic market prediction [17].

Despite the flourishing field of neural network design, implementing neural networks on currently available systems still has its limits. GPUs and ASICs consume a great deal of power to operate [12], and FPGAs suffer from memory capacity issues [18]. All of these approaches attempt to fit neural networks onto systems that were never designed around their unique structure. If neural networks are to reach their full potential, they must run on systems that are not merely adapted to run them but are instead built specifically for neural network processing. A field of research specifically targeted at this issue exists: neuromorphic computing.

Neuromorphic computing looks to tackle neural networks from a hardware perspective. Its primary goal is to implement a neural network directly in hardware to a degree where all the overhead of managing and analyzing the network is mostly or completely removed [15]. This implementation technique will allow for faster, smaller, and more power-efficient neural network hardware that will enable more widespread use of machine learning technology [15].

This dissertation will introduce a neuromorphic architecture designed by the author and show how it is used to store memories in the form of different inputs occurring simultaneously (i.e., associative memory). This work has been published in high-impact-factor, peer-reviewed journals and presented at conferences. First, a background of neural networks and neuromorphic computing will be provided for historical context. Next, the neuron circuit and synaptic device models used in the architecture will be introduced. After that, architecture-level simulations will be shown for both the software and hardware versions of the architecture to demonstrate its capabilities. Finally, the dissertation will look at different learning behaviors that can potentially enhance the performance of the network in certain situations before making concluding remarks.


II. Background

a. A Brief History/Overview of Neural Networks

The origin of neural network research can be traced all the way back to the late 1800s. Individuals such as Bain [19], James [20], and Sherrington [21] performed preliminary work on how neural networks operate on the basis of neuron output. This work was all biological research, however, and work on computational neural networks would not come until decades later, in the 1940s.

In 1943, Warren McCulloch and Walter Pitts defined a computational model for neural networks [22]. Their work set the gears of computational neural network development turning for others who read it. Later in the decade, Donald Hebb developed the famous Hebb rule and Hebbian learning [23]. In this work, Hebb proposed the principles behind the phrase, "Neurons that fire together, wire together": neurons that fire simultaneously potentiate any synapses connecting them.
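In its simplest modern textbook form (a standard formulation rather than Hebb's original notation), the rule increases the weight between two neurons in proportion to their joint activity:

$$\Delta w_{ij} = \eta \, x_i \, x_j$$

where $w_{ij}$ is the synaptic weight between neurons $i$ and $j$, $x_i$ and $x_j$ are their activities, and $\eta$ is a learning rate.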

In the 1950s, another critical breakthrough was made by the pair of Alan Hodgkin and Andrew Huxley. Hodgkin and Huxley developed a set of differential equations describing how action potentials in squid neurons operate [24]. Although this was primarily biological work, the fundamentals established by the Hodgkin-Huxley model are still used in computational neural network studies in contemporary work [25].
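The central equation of that model (reproduced here in its standard textbook form) balances the membrane capacitance against sodium, potassium, and leak currents, with the gating variables $m$, $h$, and $n$ each governed by its own first-order differential equation:

$$C_m \frac{dV}{dt} = I_{ext} - \bar{g}_{Na}\, m^3 h\, (V - E_{Na}) - \bar{g}_K\, n^4 (V - E_K) - \bar{g}_L (V - E_L)$$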

Another major breakthrough in neural network development was made in 1958 when Frank Rosenblatt established the idea of the perceptron [26]. The perceptron acted as a processing unit within neural networks that performed tasks such as supervised learning, where the network would be shown a scenario, prompted to guess the outcome, and then corrected based upon the proximity of its guess to the true solution. The perceptron has remained a basic but very powerful tool in neural network research for decades; papers using perceptrons are still published to this day [15, 27].
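A minimal sketch of that guess-and-correct procedure (the classic perceptron update rule on a toy problem, with illustrative parameter values):

```python
# Classic perceptron learning rule: guess, compare to the label, nudge weights.
def step(z):
    return 1 if z > 0 else 0

# Toy dataset: learn logical OR. Each entry is ((x1, x2), label).
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w, b, eta = [0.0, 0.0], 0.0, 0.1   # weights, bias, learning rate

for _ in range(10):                # a few passes over the dataset
    for (x1, x2), target in data:
        guess = step(w[0] * x1 + w[1] * x2 + b)
        error = target - guess     # correction toward the true solution
        w[0] += eta * error * x1
        w[1] += eta * error * x2
        b    += eta * error

print([step(w[0]*x1 + w[1]*x2 + b) for (x1, x2), _ in data])  # -> [0, 1, 1, 1]
```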

Other popular algorithms and methods in neural networks took many years to take shape rather than emerging in a single paper. Concepts such as backpropagation appeared in various forms in papers as far back as the early 1960s [28], but the algorithm did not reach its modern form until 1986 [29]. In that same year, the concept of deep learning was introduced by Rina Dechter [30]. Deep learning has continued to evolve over the course of many research efforts [31], and the concept is still being stretched to its absolute limit in networks of ever-increasing complexity.

b. History of Neuromorphic Computing

Neuromorphic computing got its start from some of the field's pioneers in the late 1980s. Researchers such as Carver Mead started to develop basic components and principles for neural processing circuits, such as the Axon-Hillock circuit [32]. Throughout the 1990s, other researchers (who had often previously studied under Mead) continued to improve and develop neural processing circuits [15, 33, 34]. The field was still in its infancy at this point, however, due to the limitations of the technology available at the time.

It wasn't until the turn of the century that a wider range of groups and individuals joined the field of neuromorphic computing. Research groups across universities and corporations began to research methods of creating neuromorphic components such as neurons [35, 36]. Publications started to emerge involving actual neuromorphic circuits that could perform extremely basic functions [37]. As technology continued to progress throughout the 2000s, the rate of publications on neuromorphic computing continued to increase [15].


During the early 2010s, an explosion of work on neuromorphic computing occurred. Companies like IBM began to create neuromorphic systems that other groups could use [38]. Research groups such as Giacomo Indiveri's at the University of Zürich began to make much larger neuromorphic chips able to perform tasks such as image processing [39]. Behind the scenes, corporations such as Intel were quietly observing the work of others while developing their own neuromorphic architectures, which would not be seen until the latter half of the decade [40]. The importance of neuromorphic computing became well known across the tech industry, and research efforts in the field continue to accelerate to this day.

At the dawn of the 2020s, neuromorphic computing is in a very exciting but still uncertain period. Many corporations and research groups are still attempting to figure out the optimal methods for performing neural network analysis in hardware. For the longest time, the tech industry was extremely focused on improving a single facet of computing performance: shrinking the transistor. Now that the return on investment from shrinking transistors is lower, the computing field's top experts must once again think of more creative methods to improve computing performance in an attempt to revive the processing improvements promised by Moore's Law.

c. Approaches to Neuromorphic Computing

Neuromorphic computing can be broadly described as the implementation of neural networks in hardware, but within the field there are multiple methods of accomplishing this task. These methods can be split into three primary categories: digital architectures, bio-inspired architectures, and hybrid architectures.


One of the simplest methods of implementing a neuromorphic architecture is the purely digital approach. These architectures are often implemented on platforms such as FPGA boards or ASICs [12]. This method often implements synapses as arrays of registers within a memory bank and then uses sequences of arithmetic circuits (adders, accumulators, etc.) to act as the neurons [12]. These forms of neuromorphic architecture can be much easier to implement in industry since they require little to no research into new devices or circuits. However, they fall prey to poor scalability or power consumption issues when attempting more complex tasks [12, 18]. Examples of these implementations include various FPGA implementations of application-specific neuromorphic architectures [41] and Intel's Loihi chip [40]. Intel's architecture has recently started to tackle the power consumption issue that has plagued digital implementations for years, but it still has room for improvement in certain processing areas [42].

Bio-inspired architectures are often more analog-based neuromorphic implementations that try to mimic the techniques biological neurons use to learn and accomplish tasks. They tend to center around a type of neural network called a spiking neural network (SNN), where the neurons output sequences of spikes instead of a single value to indicate output strength [43]. Most of these architectures also look for alternatives to digital memory for synaptic storage. The goal is to find a single device, or small set of devices, that can replace the digital registers that commonly hold synaptic values in digital architectures; such a device should increase scalability and reduce the power consumption of the architecture. These types of architectures should eventually outperform digital architectures, but more research is required to mature them. Examples include IBM's TrueNorth [38] and Stanford's bio-inspired architecture [44]. Bio-inspired architectures will be the primary focus of the work discussed in this dissertation.

A third category of neuromorphic architectures is hybrid designs. These architectures mix bio-inspired and digital designs in an attempt to create an optimal architecture with currently available technology. An example of such an architecture is Indiveri et al.'s exQUAD chip [39], where digital synapses are mixed with analog neuron circuits. Intel's Loihi chip can also be argued to be a hybrid architecture, since it processes information with a spiking scheme despite being primarily digital [40].

d. Synaptic Devices

When implementing a bio-inspired neuromorphic architecture, a method of storing each synapse's state must be determined. Many methods of implementing synapses have been used in various architectures [38, 39]. One of the simplest is to use a standard register to store the synapse's weight as a binary value. This method is used in implementations such as FPGA-based neural networks [13] and Intel's Loihi chip [40]. This implementation requires a large number of transistors, however, so it does not scale to extremely large architectures [18].

As an alternative to a CMOS register implementation, engineers have looked to more novel forms of memory to implement synapses. The primary device category studied for implementing synapses is the memristor, originally proposed by Leon Chua in 1971 as the final missing circuit element alongside the resistor, capacitor, and inductor [45]. Memristors exhibit a relationship between electrical flux and charge, which allows them to change their resistive state based upon the electrical bias applied to the device. This resistance is inherently non-volatile (to an extent): once the electrical bias is removed, the device maintains its newly adjusted resistive state. This phenomenon appears as hysteresis in the device's I-V curve during a voltage sweep [45].

Physically implemented memristors were largely a mystery until 2008, when HP Labs announced the first implementation of a memristor using a simple two-terminal titanium-oxide device [46]. Since then, research into memristive devices has exploded and continues to be pursued with great interest. For roughly the first ten years after that demonstration, most research focused solely on two-terminal memristors (Fig. 1a) [47-50]. Heavy research has been done on optimal material stacks and fabrication techniques for memristive devices, along with other subjects such as read/write schemes for accessing the device [51, 52].

Some of the most popular memristors that have been studied include various transition metal-oxide devices such as the original titanium oxide devices [46], tantalum oxide [47], niobium oxide [48], and strontium titanate [49]. These devices rely on either metal dopants or oxygen vacancies to form a conductive path through the device in its lower resistive state [46]. Some of these devices have also been reported to possess multiple stable resistive states [50], which allows them to store more than a simple binary value.

Another popular type of two-terminal device is phase-change memory (PCM). Instead of forming a conductive path through the device, PCM devices rely on changing the crystalline state of the device to adjust its bulk electron mobility [53]. This change is driven not directly by the electrical bias applied to the device but by the heat that bias generates, which switches the device between its crystalline (low-resistance) and amorphous (high-resistance) states [53]. These types of devices are notably used within IBM's TrueNorth neuromorphic architecture [38].

Within the past couple of years, focus has started to shift to a more advanced type of memristive device that possesses an additional terminal, typically used as a programming gate (Fig. 1b). This gate terminal can be used to either potentiate or depress the device depending on the voltage polarity applied. The extra terminal allows for simpler hardware implementations of neuromorphic systems and lets operations such as reads and writes be performed simultaneously, which is not possible on two-terminal devices. Various forms of these three-terminal, or gated-memristive, devices have appeared in studies with varying working mechanisms and functionalities [54-57]. Some studies have gone a step further and proposed dual-gated devices (four-terminal memristors, Fig. 1c) [58, 59]. These devices would allow for even simpler programming schemes for neuromorphic architectures; however, they are experimental or purely theoretical at this point and require much more research to explore their capabilities.


Figure 1. (a) Diagram showing a side view of a two-terminal memristive device, with a scalability diagram of a two-terminal 3x3 matrix below. Dopants or defects within the device move throughout the channel to change its resistance via applied bias on Vin or Vout. (b) Diagram showing the device design from Herrmann et al. [57], with a planar scalability diagram for the device within a 3x3 matrix beneath. The device from [57] uses oxygen vacancies to control the resistance of the device via a conductive path between TE and BE that is formed via voltage from Vgate. Positive voltage decreases resistance by pushing vacancies closer to the electrodes, while negative bias increases resistance by drawing vacancies away from the electrodes. (c) The proposed double-gated memristive device from [59], with a planar scalability diagram for a 3x3 matrix of the device below. The Gate 1 and BE lines for the device are parallel. The device starts with vacancies along one side of the device channel. A high negative bias is first applied from Vp to Vn to initialize the device's state to a high resistance. Positive bias can then be applied from Vp to Vn to potentiate the device, while negative bias depresses it. The voltage across Vp and Vn must be greater than the device's Vt, otherwise the device will not change state. This figure can be found in [59].

e. Neuron Circuits and Devices

The second component that needs to be designed when implementing a bio-inspired neuromorphic architecture is the neuron element. The neuron circuit's purpose is to gather and process information given to it by the synaptic devices within the architecture and then pass that information along to other synapses and neurons. The most basic form of an analog neuron circuit is Carver Mead's Axon-Hillock circuit, introduced in his 1989 book Analog VLSI and Neural Systems [32]. The circuit comprises a pair of capacitors and six transistors and operates on an integrate-and-fire principle [32]. Voltage accumulates on the circuit's input node to the point where it turns the buffer circuit ON. Once the buffer switches ON, the voltage now on the output node flips the reset circuit into the ON state and causes the voltage on the input node to deplete as the input current is redirected toward ground. This process shuts off the buffer, which shuts off the reset element and once again allows voltage to accumulate on the input node. The circuit benefits from being extremely simple, but it has its limitations [60]. Since the introduction of the Axon-Hillock circuit, many alternative neuron circuits have been introduced.
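A software caricature of that integrate-and-fire loop (a behavioral sketch with arbitrary constants, not a transistor-level description of the Axon-Hillock circuit):

```python
# Behavioral sketch of an integrate-and-fire neuron: input current charges a
# capacitor node until a threshold is crossed, the output fires, and the
# reset path drains the node back toward ground.
C, v_th, v_reset = 1e-12, 0.7, 0.0   # farads and volts (illustrative values)
dt, i_in = 1e-6, 1e-7                # time step (s) and input current (A)

v, spike_times = 0.0, []
for step in range(50):
    v += (i_in / C) * dt             # integrate: dV = I * dt / C
    if v >= v_th:                    # "buffer" turns ON -> output fires
        spike_times.append(step * dt)
        v = v_reset                  # "reset" path drains the input node

print(f"{len(spike_times)} spikes, first at t = {spike_times[0]:.1e} s")
```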


In 2011, Indiveri et al. compiled a fairly comprehensive summary of all notable neuron circuits proposed to that point [61]. Since then, other neuron circuits have been developed [62-65], although none of them have yet been used in any large-scale neuromorphic architecture.

f. Application Space of Neuromorphic Computing

Neuromorphic computing can be applied to a wide range of applications where neural networks are used for artificial intelligence or machine learning. One space where the power and processing efficiency of neuromorphic computing can shine is the large data processing farms run by tech companies such as Google, Amazon, and Apple. These facilities care about both processing overhead and the power their computing systems consume.

For these large data processing centers, processing information such as voice data can be taxing. Google invented a simple solution to alleviate such burdens on its data centers: the TPU chip [66]. This chip acts as a streamlined matrix multiplication processor that allows the system to calculate the values of synaptic inputs to neurons within neural networks more efficiently. Other companies, such as Intel with its Loihi chip, are looking to build general-purpose architectures to service widely varying demands on a neuromorphic system [40].

One final space where neuromorphic systems can be of use is embedded applications. In AMD's Ryzen processor line, a small neuromorphic circuit was used to assist with branch prediction [67]. Other specific applications could also find neuromorphic circuitry useful. Power-constrained applications could become much more capable if neuromorphic circuitry were introduced, due to the potential for neuromorphic architectures to be low power [15]. One power-constrained application that could be served well by neuromorphic computing is navigation, which will be discussed later in this dissertation.


III. The Neuron Circuit

a. The Original Octopus Retina Neuron

In 2003, Culurciello et al. published a paper on image processing [68]. In the paper, they proposed a neuron circuit inspired by the neuron model of an octopus's retina. The circuit, dubbed the "Octopus Retina" neuron circuit in Indiveri's 2011 review [61], helped perform image analysis for the architecture used in Culurciello's 2003 paper [68]. In comparison to other post-Axon-Hillock neuron circuits, it is very small: only seven transistors and one capacitor (red section of Fig. 2). The circuit is also designed to be very power efficient (0.043 pJ to set, 3.88 pJ to reset). These characteristics make it very attractive for scalable neuromorphic architectures. The downside of the neuron circuit, however, is that it has no self-reset mechanism and instead requires external stimulus on its Vreset node in order to reset. If the neuron circuit were to be used within a neuromorphic architecture, it would need to be modified to spike repeatedly, like other neuron circuits.

b. Expanding the Octopus Retina Circuit

In order to adapt the Octopus Retina neuron circuit for neuromorphic architectures, a few modifications have to be made. Within a bio-inspired neuromorphic architecture, the circuit is likely to be used alongside a large number of analog circuit components, whose usage can introduce noise into the circuitry. To help stabilize the input current given to the neuron, a current mirror is placed on the input of the neuron (M1 and M2 in Fig. 2).

Next, the circuit needs some sort of self-reset mechanism so it spikes repeatedly given input to the neuron. The simplest way to accomplish this is to connect the original output node (the voltage node that connects M9's source and M10's drain in Fig. 2) to the Vreset node of the original circuit (M4's gate in Fig. 2). When the output node of the neuron activates, that voltage automatically resets the circuit (similar to the reset mechanism of the original Axon-Hillock circuit). To add this functionality, a series of inverters was placed between the original output node and Vreset. This inverter array adds delay between the circuit turning ON and OFF; without it, the circuit would reset immediately once the output reached a sufficient voltage (which would appear as if the circuit never fires at all). Also, the original Octopus Retina neuron circuit resets on the falling edge of Vreset, not the rising edge. The rate and behavior of this reset can be adjusted via the two MOSFETs between the first two inverters in the chain and ground, and the bias applied to them (Vb).

Within a larger neuromorphic architecture, the neuron circuit should be expected to drive potentially highly variable loads or very high fan-outs of synaptic devices. If these loads or fan-outs were placed directly onto the original output node of the circuit, they could interfere with the neuron's new self-reset mechanism: the output load can pull the output voltage low enough that the node can no longer turn ON, meaning the circuit can no longer fire. To solve this problem, a buffer is placed between the original output node and the true output of the circuit (Vout). The added buffer acts as a voltage divider between the neuron circuit and the output load, allowing the circuit to operate independently of the current output load or fan-out.


Figure 2. Diagram of the Self-Resetting (SR) Octopus Retina neuron circuit [59]. The portion of the circuit highlighted in red is from the original version of the circuit proposed by Culurciello [68] (sizes are in Appendix A). Both Vb nodes used a bias voltage of 0.4V.

The complete circuit can be seen in Fig. 2. The highlighted portion (in red) is the original Octopus Retina neuron circuit from Culurciello's previous work [68]. The circuit is closer in size to other modern neuron circuits proposed in the past couple of decades, but it still possesses only a single capacitor (many neuron circuits possess two or more [61]) and retains power efficiency similar to that of the original neuron circuit. This newly modified version is simply called the Self-Resetting (SR) Octopus Retina neuron circuit.

c. Circuit Profiles

Now that the circuit is fully modified to accommodate a neuromorphic environment, its profile must be extracted. Fig. 3 shows the frequency and duty cycle profile of the circuit when it is constructed using the 180nm technology node available from Arizona State's predictive technology models [69]. The frequency and duty cycle at each measured input current show very predictable behavior for the SR Octopus Retina neuron circuit across the performance spectrum. The circuit can spike in the 50-100 Hz regime with as little input as 10 pA and can reach operating frequencies of ~9 MHz when given its maximum input current (10 µA). Beyond its maximum input current, its frequency and duty cycle remain constant around 9 MHz and 44%, respectively. This behavior differs from that of the Axon-Hillock circuit, which suffers from duty cycle saturation at sufficiently high input currents: the Axon-Hillock's output node becomes permanently ON instead of spiking.

Figure 3. Frequency and duty cycle profile for the SR Octopus Retina neuron circuit [59].


The other critical portion of its profile that needs to be extracted is its power/energy profile. The power drawn by all sources on the circuit (including the input) can be seen in Fig. 4. The power on the input predictably increases with input current, as does the power drawn from VDD. Past ~1 µA, however, the power consumed by VDD levels off. The power consumed by both voltage bias nodes also remains orders of magnitude below VDD across the full performance spectrum (Vb1=Vb2=0.4V for all simulations shown here).

Figure 4. Power consumption profile for the SR Octopus Retina neuron circuit (including Iin port) [59].


Combining the data from the frequency and power profiles and taking an average yields the circuit's average energy per spike. Using the data from Figs. 3 and 4, the SR Octopus Retina neuron circuit consumes on average ~1.07 pJ/spike [59] when using the previously mentioned 180nm technology node. This value places it in a very favorable spot among previously designed neuron circuits. When comparing a neuron's maximum operating frequency against its energy per spike (Fig. 5), the SR Octopus Retina neuron circuit competes with and surpasses many other neuron circuits proposed in the past decade. This makes it an ideal candidate for power-efficient, bio-inspired, application-specific neuromorphic architectures.
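The energy-per-spike figure follows directly from dividing the total power drawn at an operating point by the spike frequency at that point and averaging across the profile. A sketch of the arithmetic (the frequency/power pairs below are hypothetical placeholders, not the actual data behind Figs. 3 and 4):

```python
# Energy per spike = total power / spike frequency, averaged over the
# operating range. These sample points are hypothetical, for illustration.
operating_points = [
    (75.0,  1.5e-10),   # (spike frequency in Hz, total power in W)
    (1.0e5, 2.0e-7),
    (9.0e6, 9.5e-6),
]

energies = [power / freq for freq, power in operating_points]  # joules/spike
avg = sum(energies) / len(energies)
print(f"average energy per spike: {avg * 1e12:.2f} pJ")
```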


Figure 5. Spiking energy efficiency versus maximum operating frequency for the SR Octopus Retina neuron circuit [59], compared against other previously published neuron circuits [39, 62-65]. Reference numbers in the figure are from [59].


IV. Synaptic Devices

a. Introduction to Gated-Synaptic Devices

As mentioned previously, recent research has shown a growing interest in gated-synaptic devices. These devices have taken many forms in publications over recent years, with one of the earlier versions being a nanowire-based device [54]. The device in [54], developed by Sacchetto et al., was not created specifically with neuromorphic computing in mind (it was proposed for general memory applications), but neuromorphic computing is mentioned as a potential application for the device.

In the past couple of years, interest in gated-synaptic devices has resurged as the field of neuromorphic computing has begun to mature. One of the more recent examples of a gated-synaptic device is Lim et al.'s gated Schottky diode [55]. Lim et al.'s device is targeted specifically at neuromorphic applications and is shown to have a fairly linear potentiation curve but a very sharp depression curve [55]. Regardless, in the simulated architecture in which the device was used, the network was able to obtain 92-94% classification accuracy on MNIST using a three-layer perceptron neural network [55].

A more exotic example of a gated-synaptic device is Murdoch et al.'s light-gated synaptic device [56]. The device has a standard two terminals and is instead potentiated/depressed via light stimulation. The device shows promise for low-power neuromorphic applications; however, it currently possesses a very low ON/OFF ratio, as the difference between its OFF and ON currents is only ~6 nA when potentiated with a white LED [56].

One of the more common methods of developing gated-synaptic devices is to build a device whose bulk is one of many transition-metal oxides. The device's conduction level is then controlled by forming/destroying a metallic or oxygen-vacancy filament within the bulk or by influencing the concentration gradient of defects within the bulk channel. A primary example of this type of device was developed by Herrmann and Jha [57]. The device used strontium titanate (SrTiO3, or STO) as its bulk material, and the concentration of defects along the device's conductive path was manipulated via bias applied to the device's gate.

One of the last types of gated-synaptic devices is the more theoretical dual-gated synaptic device. These devices have been proposed in works such as M and Ramakrishnan [70], Sacchetto et al. [54], and Bao et al. [58], but they have remained either fairly theoretical [59, 70] or reliant on difficult-to-scale bulk materials such as carbon nanotubes or liquid electrolytes [54, 58]. However, these devices hold great potential for simplifying neuromorphic architectures, so they are worth exploring. The first type of device explored in this dissertation will be a theoretical dual-gated synaptic device driven by a physics-based model.

b. Superiority of Gated-Synaptic Devices

Gated-synaptic devices (GSDs) bring a pair of vital benefits over their two-terminal predecessors [71]. The first is the capability of simultaneous read and write operations. A typical two-terminal device requires both terminals to perform either a read or a write [71]; to even select the device for a write operation, it is often placed into a "1T1R" (1-transistor, 1-resistor) configuration, where a transistor acts as a selector for the memristor (Fig. 6) [72-74]. With a GSD, reads are conducted via the two terminals along the device channel while writes are conducted via the gate terminal (Fig. 6) [71]. Since reads and writes occur via different terminals, these operations can occur at the same time. This feature is of great use in situations where constantly swapping between read and write modes would require extra architecture or overhead for the neural network, and it is specifically useful in applications with asynchronous processing, such as spiking neural networks.

Figure 6. Diagram comparing read and write operations in two-terminal devices and GSDs (e.g., gated-RRAM) [71].

The second benefit of GSDs is more power-efficient write operations [71]. Since the gate terminal of a GSD has an isolation layer between it and the device channel, very little current flows from the gate terminal into the channel. This current is so small in magnitude that publications often do not report it in their findings [55, 75-76]; in the rare cases where they do, it is a mere fraction of the device channel's typical current from its input (i.e., top electrode, or TE) to its output (i.e., bottom electrode, or BE) terminal (Fig. 6) [57]. As a result, write operations in GSDs are more power efficient than those of typical two-terminal memristors, where write operations must occur across the entire device channel, which needs to be of a higher conductance value.


c. Initial Device Model

The dual-gated synaptic device proposed here takes the form of the device seen in Fig. 1c and has been published in Jones et al. [59]. The device uses niobium dioxide (NbO2) as its bulk material and manipulates the oxygen vacancy concentration gradient to potentiate/depress the device. The conduction mechanisms that govern the device's conductive state within the proposed theoretical model are Frenkel-Poole and Schottky/Ohmic conduction. The current through the device at any given time is given by the voltage across the device channel divided by the sum of the resistances created within the device's bulk by the Frenkel-Poole and Schottky/Ohmic conduction mechanisms, while programming is governed by the effective gate voltage Veff = Vp - Vn:

$$I_{syn} = \frac{V_{in} - V_{out}}{R_{fp} + R_{SO}} \tag{1}$$ [59].

Both Rfp and RSO are controlled by the concentration of vacancies within the device's bulk along its conductive path (Nc). This conductive path is a filament formed along one side of the bulk prior to use; the filament could be formed during the manufacturing process, or peripheral circuitry could be added to the architecture to form it. A negative Veff is then applied across the device to initially place it in its high resistive state (HRS). The value of Nc at any given time is defined by the equation

$$N_c = N_{c,max}\,\frac{w_{max}}{w_{max} + \exp\left(-\left(\dfrac{w_c}{w_{max}} - w_{max}\right)\right)} + N_{c,min} \tag{2}$$ [59],

where wmax is the maximum width of the conductive path within the device and wc is the current width of the conductive path. Nc,max and Nc,min are the maximum and minimum vacancy concentration values for the device, respectively.

The value of wc at each time step of simulation is given by the equation


$$w_c = w_{c,t-1} + \frac{t_{step}\,\mu_{o2vac}\,V_{eff}}{W_{ch}} - d \tag{3}$$ [59],

where tstep is the amount of time between the current and previous simulation iterations, µo2vac is the mobility of oxygen vacancies within a vacuum, Wch is the width of the bulk device, and wc,t-1 is wc from the previous time step. The value of d within (3) represents a term often not modeled within memristive devices: the volatility, or rate of decay, of the device's conductive state. Most memristors are modeled as ideal devices that are assumed to be non-volatile or nearly non-volatile [81, 82], but this model includes that volatility for potential benefits when implemented in architecture (described in Section V). The value of d is given by the Gaussian equation

$$d = \frac{d_{max}}{\sigma_n\sqrt{2\pi}}\exp\left(\frac{-(N_c - M_n)^2}{2\sigma_n^2}\right) \tag{4}$$ [59],

where dmax is a constant that controls the maximum decay rate for the device, and Mn and σn are the mean and standard deviation of the device's vacancy concentration range, respectively (calculated from Nc,min and Nc,max).
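A minimal Python transcription of this update loop, equations (1)-(4), may help build intuition. All parameter values here are illustrative, and the mapping from Nc to the Frenkel-Poole and Schottky/Ohmic resistances is a placeholder, since the full conduction expressions are given in [59]:

```python
import math

# Normalized, illustrative constants (not the fitted values from [59])
N_MAX, N_MIN = 1.0, 0.01            # vacancy concentration bounds (norm.)
W_MAX, W_CH  = 1.0, 1.0             # conductive-path / channel widths (norm.)
MU_VAC       = 1.0                  # oxygen-vacancy mobility (illustrative)
D_MAX        = 1e-3                 # peak decay-rate constant
M_N          = (N_MAX + N_MIN) / 2  # mean of the concentration range
SIGMA_N      = (N_MAX - N_MIN) / 6  # std. dev. of the concentration range

def n_c(w_c):
    # Eq. (2): sigmoidal vacancy concentration vs. conductive-path width
    return N_MAX * W_MAX / (W_MAX + math.exp(-(w_c / W_MAX - W_MAX))) + N_MIN

def decay(nc):
    # Eq. (4): Gaussian, state-dependent decay (strongest mid-range)
    return (D_MAX / (SIGMA_N * math.sqrt(2 * math.pi))) * \
           math.exp(-((nc - M_N) ** 2) / (2 * SIGMA_N ** 2))

def step_width(w_prev, v_eff, t_step=1e-3):
    # Eq. (3): drift driven by the gate bias, minus the inherent decay d
    return w_prev + t_step * MU_VAC * v_eff / W_CH - decay(n_c(w_prev))

def i_syn(v_in, v_out, nc, r0=1e6):
    # Eq. (1) with a placeholder Nc -> resistance mapping (a stand-in for
    # the Frenkel-Poole and Schottky/Ohmic expressions in [59])
    return (v_in - v_out) / (r0 / nc)

w = 0.0
for t in range(1000):                # potentiate: Veff = 10 V
    w = step_width(w, v_eff=10.0)
print(f"read current after set:   {i_syn(1.0, 0.0, n_c(w)):.3e} A")

for t in range(1000):                # relax: Veff = 0, state decays via d
    w = step_width(w, v_eff=0.0)
print(f"read current after decay: {i_syn(1.0, 0.0, n_c(w)):.3e} A")
```

Consistent with the decay discussion below, the sketch decays only slightly after a full set, since the Gaussian in (4) is smallest at the extremes of the conduction range.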

The conceptual functionality of the device can be seen in Fig. 7c. When a large enough Veff (greater than the device's Vt) is applied, the device's resistance reduces. The device's resistance will then increase when no bias is applied, due to d. The resistance can also be increased if a large enough negative bias (below -Vt) is applied to the device.

d. Results of Initial Version

In order to test the functionality of the device, a simple experiment was created where a read voltage of 1V was applied from Vin to Vout across the device. Veff was then applied at a range of magnitudes over a series of trials in order to observe the rate at which the device potentiates. The results of these simulation trials can be seen in Fig. 7a. As Veff increases, the rate at which the device potentiates also increases, though with diminishing returns. This phenomenon is primarily due to the sigmoidal nature of the equation that represents Nc. The set time of the device when Vt = 1.8V ranges from ~4ms to ~0.6ms as Veff is increased from 2V to 10V.

In order to observe the conductive state of the device decaying properly, a second set of simulations was created. In these simulations, Veff was set to a fixed value of 10V at t = 0. At specific points along the device's conductance curve, Veff was set to 0V so the state of the device could decay over time. The results can be seen in Fig. 7b. As Veff is set to 0V at different points along the conductance curve, the rate of decay at each point varies in a Gaussian fashion (in accordance with the equation for d). This type of decay was chosen because it has appeared in devices such as that of Rush and Jha [48], where the decay was highest in the mid-range conduction region of the device and lowest in the high- and low-range conduction regions. Although the synaptic device shown by Rush and Jha was a two-terminal device, it used NbO2 as its bulk material, so the same decay mechanism can be proposed for this device as well.


Figure 7. (a) Synaptic current (Isyn) during transient simulations where a constant voltage bias is applied to Vp and Vn. The profile of the current curve is that of a sigmoid, as (2) describes. (b) Plot depicting how decay changes based upon the state of the synaptic device. The decay is Gaussian (per (4)) and can be seen in its various states as the constant bias applied across Vp and Vn is removed at different times. (c) Diagram describing conceptually how the device's resistance changes when biases of opposite polarities are applied simultaneously to Vp and Vn. If no overlap occurs, there is no change in the device's resistance besides inherent decay. This figure can be found in [59].

e. Generic Synaptic Model

Despite the potential usefulness of the dual-gated synaptic device within neuromorphic architectures, it remains a theoretical device due to the complications of fabricating such a device. Therefore, a more realistic approach could be taken when building neuromorphic architectures with currently available technology. In order to properly model any type of gated-synaptic device, a generic model should be used so the device can be varied to match the large variety of neuromorphic architectures that currently exist. Unfortunately, no generic model for gated-synaptic devices exists for hardware simulation, so one was created in Verilog-A for use in SPICE environments. A block diagram of the gated-synaptic device can be seen in Fig. 8.

Figure 8. A basic schematic diagram showing how the generic Verilog-A model interprets a GSD. Programming of the device (i.e., write operations) is primarily done via the gate terminal, while signal analysis (i.e., read operations) is done via the device channel between Vin and Vout [78].

In order to create a generic GSD model, a survey of currently published devices that can be classified as GSDs was conducted. The model focuses on capturing the temporal behavior GSDs exhibit when exposed to stimulus. During the survey, a total of seven devices from previous publications was studied in order to understand the general behavior of currently developed GSDs [55-58, 75-77]. The devices studied cover a wide selection of implementation methods. The resulting model and its results were submitted to IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD) [78]. The different categories of devices are listed below with descriptions of the devices that fall under each type.

Silicon – This type of GSD is built purely out of traditional silicon and silicon compounds normally used in CMOS fabrication. In this survey, one device of this category was studied, from the previously mentioned Lim et al. [55]. Lim et al.'s device is built using a set of cleverly designed diodes in order to obtain memristive behavior.

Gated-ReRAM – This category of GSD covers a wider range of devices than the other categories studied in this survey. For a device to be classified under this category, it should follow design notions similar to the traditional two-terminal memristors developed in the past. These characteristics include possessing a channel constructed from a material such as a transition metal-oxide and utilizing direct electrical stimulation of the channel to potentiate/depress the device. The key difference between these devices and traditional two-terminal ReRAM is that gated-ReRAM devices have their memristance adjusted primarily via the added gate terminal attached to the device channel. Three devices in this survey fall under this category: Herrmann et al.'s SrTiO3 gated-ReRAM device [57], Burgt et al.'s polystyrene sulfonate device [75], and Tang et al.'s ECRAM device [76]. Despite the naming differences between these three publications, all three can be considered gated-ReRAM due to how they are designed and how they operate.

Liquid Electrolyte – Liquid electrolyte GSDs possess the exotic property of using a liquid-based medium for their device channel during operation. Electrical stimulation of the device is performed via an additional gate terminal attached to the fluid channel to adjust the memristance of the device. Only one device of this type, from the previously mentioned Bao et al. [58], was studied in the survey to create the GSD model.

Light-Gated – While these devices technically do not possess a physical gate terminal, light-gated GSDs qualify as GSDs through their unique ability to be potentiated via light exposure. Two devices that can be classified as light-gated GSDs, from Murdoch et al. [56] and Tan et al. [77], were studied in the creation of this model. For simplicity in the model, the light exposed to the device is treated as an analogous voltage applied to the model's gate terminal in order to potentiate the device.

In studying the devices in this survey, the goal was to isolate general behaviors seen across all GSDs while also locating behaviors that occurred only in specific devices. The general behaviors extracted from all devices guide how the core of the behavioral model operates, while behaviors specific to certain cases direct how specific parameters or edge cases are handled by the model.

The core operation of a GSD entails five key characteristics. First, every conductance curve (with respect to time) possesses a "general shape." This shape can change from device to device, but typically lies on a spectrum ranging from an inverse-exponential curve on one end to a sigmoidal curve on the other, with a linear curve in the middle. To encapsulate this behavior within the GSD model, a parameter defined as gc controls the general shape of the curve and ranges from 0 to 1. If the GSD model is potentiated at a constant rate with respect to time with varying values of gc, the shape of the curve varies in response. A result of such a situation can be found in Fig. 9.

Figure 9. Simulation example of the GSD model that demonstrates the influence of gc on the general shape of the conductance curve with respect to time. As gc varies from 0 to 1, the curve changes from inverse exponential, to linear, to sigmoidal. All other parameters of the model during this example were set to their default values shown in Table 1 while vin=1V and vgate=1V [78].

The second core behavior all GSDs exhibit is having a minimum amount of time required to reach their highest potentiated state. This time can vary massively from device to device, with some devices taking many seconds to fully set while others take only microseconds. Within the GSD model, this behavior is captured by a variable defined as tset.

The third core characteristic shown in all GSDs is possessing a minimum and maximum conductance value for the device channel. Devices (usually) start in their lowest conductive state prior to operation and can then be potentiated to their highest conductance value once operation has begun. These conductance limits are represented within the model by the terms gmin and gmax, which bound the general shape of the conductance curve defined by gc to a specific range.

The fourth property all GSDs show is some degree of state-based, short-term decay of the device's conductance state. This behavior can be seen after a device has been quickly potentiated: the device proceeds to decay back to a lower conductance value over a period ranging from milliseconds (or less) to seconds. The rate at which the device decays within the model is controlled by the term rstp.

The fifth and final characteristic all GSDs possess is simply long-term plasticity. Even though each GSD exhibits some form of short-term decay, if potentiated long enough it will maintain a high conductance state for a long period of time or (in terms of the application in which the GSD is implemented) indefinitely. This capability to hold conductance states for a long time is primarily represented by two terms within the model, qltp and rltp. The term qltp is the primary driver for how strongly/quickly long-term plasticity occurs within the device, while rltp determines the degree to which this long-term plasticity decays over time.

In addition to these core behaviors every GSD has, a plethora of other device-specific properties and behaviors also appeared within the survey. For example, gate threshold voltage is a property common in traditional MOSFETs that also appears in one of the devices surveyed (Herrmann et al. [57]). This property is defined as vt within the model, with the term tc controlling the type of threshold that exists in the device (soft/hard). Some devices can be programmed, reset, or influenced by bias applied to their device channel despite the gate being the primary avenue for device programming (e.g., both light-gated GSDs in Murdoch et al. [56] and Tan et al. [77]). This behavior is controlled by the model term oc. Other devices (e.g., Lim et al. [55]) have a very asymmetric potentiation/depression curve. This asymmetry can be represented with the term namp, which adjusts how quickly a device is depressed. One device (Burgt et al. [75]) even relies on an inverted bias application scheme to manipulate the device's conductance (i.e., negative bias potentiates the device and positive bias depresses it). Since this behavior was core to that device's functionality, it was placed into the model parameter f, which controls whether a normal or inverted applied bias manipulates the device.

Another property that is not tied to a specific device design, but can arise from device-to-device variance or previous experiments performed on a device, is the starting conductance value. This potentiation level can be controlled by the term xstart, which could also prove useful for setting up initial random weights in higher, system-level simulations of neuromorphic architectures if ever needed.

One final property that is often desirable in GSDs, and is shown by one device in this survey, is having diode characteristics within the device channel. The prime example within the survey is from Lim et al. [55], since the device is comprised of multiple diodes. This diode behavior gives the device a lower conductance value in reverse bias than in forward bias. It keeps all current flowing forward through the neural network in which the device is used and prevents sneak currents from appearing within the synaptic grid, which could alter the outcome of a network's converged state. This reverse-bias diode behavior is controlled by the term brev. The term only dictates diode behavior during reverse bias, not forward bias (since the current-limiting factor in forward bias is the device channel's general ohmic conductance). It can be seen in action as it is varied between 0 and 1 in Fig. 10.


Figure 10. Simulation example demonstrating the diode behavior exhibited by the generic model when placed under reverse bias and brev is dropped below 1. As brev is decreased, the strength of the diode behavior while under reverse bias increases. While in forward bias, no diode behavior is observed since the typical device channel ohmic conductance limits the curve to a linear relation. All parameters for the model were set to the default values shown in Table 1 with vgate=GND (unprogrammed) and vin being swept from -3V to 3V [78].


Every parameter used to control the behavioral GSD model is described in Table 1. All parameters can be defined by the user prior to simulation to change how the model operates. The goal of the parameters in Table 1 is to give the user an intuitive, high-level way to control how the device being modeled operates. This type of generic emulation allows the wide range of GSDs shown in the survey to be represented by the model.

Table 1. User-Defined Model Parameters for the Generic GSD Model [78]

Parameter | Default | Range | Description
gc | 0.0 | 0 to 1 | Central fitting parameter that controls the device's overall transient conductance curve shape.
brev | 1.0 | 0 to 1 | Defines whether the behavior during reverse bias is that of a memristor/resistor (brev=1), a diode (brev=0), or in between (0<brev<1).
gmin | 1e-11 S | <gmax | Minimum conductance value of the device channel.
gmax | 1e-6 S | >gmin | Maximum conductance value. Acts as an asymptote for the inverse exponential/sigmoid portion of the conductance equation.
tset | 1e-6 s | >0 | Ideal set time of the device (assuming no decay and potentiation at 1V above threshold voltage on vgate).
vt | 0.0 V | ≥0 | Threshold voltage for potentiation/depression.
namp | 1 | >0 | Controls amplified depression when negative bias is applied to vgate. The higher the value of namp, the higher the amplification.
oc | 0.0 | 0 to 1 | Dictates how much the bias across the device channel (vin−vout) influences the effective voltage applied to the device via vgate. oc=0 means it has no influence; oc=1 means the channel bias is fully included in calculating Veff.
tc | 0.0 | 0 to 1 | Dictates how much vt emphasis occurs when calculating the change in x and xmin. tc=0 is none, while tc=1 is a perfect difference between vgate and vt.
rstp | 0.0 | ≥0 | Rate of decay for short-term plasticity. Magnitude often inversely proportional to set time (tset).
qltp | 0.0 | 0 to 1 | Controls the quality of the long-term potentiation rate of the device. The higher the value of qltp, the closer the long-term potentiation conductance value is to gmax as the device is programmed.
rltp | 0.0 | ≥0 | Controls the rate of decay for long-term plasticity. The higher the value, the sharper the rate of decay for the long-term plasticity state.
f | 1 | 1, −1 | Indicates if the polarity at which the device is potentiated/depressed is flipped. f=1 means it is not flipped; f=−1 means it is.
xstart | 0.0 | 0 to 1 | Defines the initial conductance state of the device. xstart=0 means the device is in its lowest conductance state; xstart=1 means it is in its highest.
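For readers who prefer code to tables, the parameters of Table 1 map naturally onto a single configuration object. The sketch below is a minimal Python rendering of Table 1 (the model itself is written in Verilog-A); the class name GSDParams is a hypothetical helper introduced here, and the defaults are the ones listed above.

```python
from dataclasses import dataclass

@dataclass
class GSDParams:
    gc: float = 0.0       # conductance-curve shape (0=inv. exp., 0.5=linear, 1=sigmoid)
    brev: float = 1.0     # reverse-bias behavior (1=ohmic, 0=full diode)
    gmin: float = 1e-11   # minimum channel conductance (S)
    gmax: float = 1e-6    # maximum channel conductance (S)
    tset: float = 1e-6    # ideal set time (s) at 1 V above threshold on the gate
    vt: float = 0.0       # gate threshold voltage (V)
    namp: float = 1.0     # depression amplification for negative gate bias
    oc: float = 0.0       # influence of channel bias (vin - vout) on Veff
    tc: float = 0.0       # threshold type (0=soft, 1=hard difference from vt)
    rstp: float = 0.0     # short-term-plasticity decay rate
    qltp: float = 0.0     # long-term-potentiation quality (0 to 1)
    rltp: float = 0.0     # long-term-plasticity decay rate
    f: int = 1            # bias polarity (1=normal, -1=inverted, as in Burgt et al.)
    xstart: float = 0.0   # initial normalized conductance state (0 to 1)
```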


To further demonstrate how the model affects the general operation of a GSD, a schematic example of a traditional potentiate, decay, and reset test can be seen in Fig. 11. The diagram shows where each parameter typically changes the device's conductance curve with respect to time, providing an intuitive, graphical avenue for users to understand the model.

Figure 11. Conceptual plot diagram depicting a typical potentiation/decay/depression curve of a GSD. Within the diagram, the locations where each simulation parameter has influence are labeled. Certain parameters only exert themselves under certain testing conditions (e.g., namp), while others are noticeable in all simulations (e.g., gmin and gmax) [78].

The GSD model is guided by a set of equations that utilize the parameters defined in Table 1 to determine the device's potentiated state and channel current at any given time. To calculate the channel current at any given time (Isyn), the piecewise equation

$$I_{syn} = \begin{cases} g_{syn}\,\Delta V, & \Delta V \ge 0 \\ b_{rev}\,g_{syn}\,\Delta V + (1-b_{rev})\,g_{syn}\left(e^{\Delta V}-1\right), & \Delta V < 0 \end{cases} \qquad (5)\ [78],$$

is used. The term ΔV within (5) is defined as the voltage potential across the two channel terminals (Vin and Vout) of the device and is given by

$$\Delta V = v_{in} - v_{out} \qquad (6)\ [78].$$
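As a concrete rendering of (5) and (6), the following Python sketch computes the channel current from a given conductance and terminal voltages; it is an illustrative transcription of the equations, not the Verilog-A source.

```python
import math

def channel_current(gsyn: float, vin: float, vout: float, brev: float) -> float:
    """Channel current per eq. (5): purely ohmic in forward bias, while in
    reverse bias brev blends ohmic (brev=1) and diode-like (brev=0) behavior."""
    dv = vin - vout  # eq. (6)
    if dv >= 0:
        return gsyn * dv
    return brev * gsyn * dv + (1.0 - brev) * gsyn * (math.exp(dv) - 1.0)
```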

The other previously unmentioned term within (5) is gsyn, which represents the GSD’s conductance at the given point in time. This term is calculated by the lengthy equation

$$g_{syn} = \max(1-2g_c,\,0)\left(g_{range}\left(1-e^{-px}\right)\right) + \left(1-\left|2g_c-1\right|\right)\left(g_{range}\,x + g_{min}\right) + \max(2g_c-1,\,0)\,\frac{g_{max}}{1+e^{-mx+s}} \qquad (7)\ [78].$$

The length of (7) is due to how it encapsulates the spectrum of general shapes a conductance curve can take with respect to time from device to device. There are three components within (7) that define the inverse exponential, linear, and sigmoidal components of the device's conductance, respectively. The term gc plays a key role in determining how much of each component appears in the calculated conductance through the max() and abs() coefficient functions. These coefficients determine the general shape of the curve as shown in Fig. 9, where gc=0.0 is fully inverse exponential, gc=0.5 is fully linear, and gc=1.0 is fully sigmoidal.

Within (7), there is a set of pre-calculated simulation parameters that are wholly reliant upon the user-defined values of gmin and gmax. The first of these four terms, s, is used to properly offset the sigmoidal curve component and is defined by

$$s = \ln\!\left(g_{max}/g_{min} - 1\right) \qquad (8)\ [78].$$

Another pre-calculated parameter within the sigmoidal component is m, which exists to ensure the sigmoidal curve scales at the correct rate and is calculated as

$$m = \ln\!\left(1/g_{range} - 1\right) + s \qquad (9)\ [78].$$


Within (9), and in the original gsyn equation (7), the term grange appears. This term is simply the difference between the minimum and maximum conductance values defined by the user,

$$g_{range} = g_{max} - g_{min} \qquad (10)\ [78].$$

The final pre-calculated parameter, p, is used solely within the inverse exponential component of (7). Like m, it exists to ensure the inverse exponential component scales at the correct rate. To calculate p, the equation

$$p = -\ln\!\left(g_{min}/g_{range}\right) \qquad (11)\ [78],$$

is used.
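Putting (7)-(11) together, the conductance at any potentiation level x can be computed directly. The sketch below is a literal Python transcription of those equations using the default gmin/gmax of Table 1; it assumes gmin < gmax and, per eq. (9), a conductance range below 1 S.

```python
import math

def conductance(x: float, gc: float, gmin: float = 1e-11, gmax: float = 1e-6) -> float:
    """g_syn per eq. (7): a gc-weighted blend of inverse-exponential, linear,
    and sigmoidal shapes, scaled by the pre-calculated terms of eqs. (8)-(11)."""
    grange = gmax - gmin                       # eq. (10)
    s = math.log(gmax / gmin - 1.0)            # eq. (8)
    m = math.log(1.0 / grange - 1.0) + s       # eq. (9)
    p = -math.log(gmin / grange)               # eq. (11)
    inv_exp = grange * (1.0 - math.exp(-p * x))
    linear = grange * x + gmin
    sigmoid = gmax / (1.0 + math.exp(-m * x + s))
    return (max(1.0 - 2.0 * gc, 0.0) * inv_exp
            + (1.0 - abs(2.0 * gc - 1.0)) * linear
            + max(2.0 * gc - 1.0, 0.0) * sigmoid)

# gc = 0 -> inverse exponential, 0.5 -> linear, 1 -> sigmoidal (cf. Fig. 9).
for gc in (0.0, 0.5, 1.0):
    print(gc, conductance(0.5, gc))
```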

The one term that primarily controls how gsyn varies over time is x. This term is a unitless variable that tracks how “potentiated” the state of the device is and can vary from zero to one.

When determining how much x should change from one time point to the next, the first quantity to calculate is the scaling factor for x, xscale, defined by

$$x_{scale} = \frac{\Delta t}{t_{set}} \qquad (12)\ [78],$$

where Δt simply represents the amount of time between t and t−1. When running this model within a SPICE environment, this is simply the difference between the current and previous time steps. Once the scaling term is calculated, the next step is to calculate the effective voltage applied to the device. This term, defined as Veff, is given by the equation

$$V_{eff} = f\,v_{gate} - o_c\,\Delta V \qquad (13)\ [78].$$

Within (13), the terms f and oc appear, which affect what type/polarity of bias influences the conductance of the device as previously described. For the edge case shown in Burgt et al. [75], f would be set to −1, whilst if a device is additionally manipulated by the channel potential, oc can be made non-zero. For a traditional device (e.g., a traditional gated-ReRAM device), (13) simply reads as Veff=Vgate (i.e., only gate bias has an effect). There is one special condition on the value of Veff. If Veff is negative and its magnitude has surpassed the device's threshold voltage (if one is defined), the model recalculates Veff to include the namp term previously described, which is utilized to model devices such as Lim et al.'s [55]. This recalculation of Veff is defined as

$$V_{eff} = n_{amp}V_{eff} \quad \left(\text{if } V_{eff} < 0 \ \text{and}\ \left|V_{eff}\right| > v_t\right) \qquad (14)\ [78].$$

Once Veff has been readjusted (if needed) and if abs(Veff) > vt, a change in x, given by Δx, is calculated. If x is between its minimum allowed value (xmin, which begins the simulation at 0) and 1, the change is calculated; if x is at its operational bounds, the change is set to zero. This change is defined by the piecewise equation

$$\Delta x = \begin{cases} 0, & x \le x_{min} \ \text{or}\ x \ge 1 \\ x_{gate} - d_{stp}, & x_{min} < x < 1 \end{cases} \qquad (15)\ [78].$$

The first term used to calculate the change in x is xgate, given by

$$x_{gate} = x_{scale}\left(V_{eff} - \operatorname{sign}(V_{eff})\,t_c v_t\right) \qquad (16)\ [78],$$

which incorporates the previously described scaling term for x (xscale), the effective voltage applied to the gate (Veff), the threshold voltage for the gate (vt), and the term describing the type of threshold vt is for the GSD (tc). The other term in (15) used to calculate Δx is dstp, which represents the amount of short-term decay occurring at this timestep within the GSD. This term is state-based in nature: the magnitude of decay is directly proportional to the potentiation level of the device. It is defined by the equation

$$d_{stp} = r_{stp}t_{set}\left(x - x_{min}\right)\Delta t \qquad (17)\ [78],$$

and utilizes the rstp term that controls the rate at which short-term decay occurs.

The final aspect of the GSD needing modeling is the fifth core behavior all GSDs exhibit: some form of long-term plasticity. Within the model this is represented by a sub-term, xmin, which acts as a minimum bound for x at any given time, as described in (15). If Veff surpasses the threshold voltage for the system, a change in xmin, defined as Δxmin, is calculated. The value of xmin is bounded between 0 and 1, and its change is calculated by the equation

$$\Delta x_{min} = \begin{cases} 0, & x_{min} \le 0 \ \text{or}\ x_{min} \ge 1 \\ q_{ltp}x_{gate} - d_{ltp}, & 0 < x_{min} < 1 \end{cases} \qquad (18)\ [78].$$

The long-term potentiation within Δxmin is calculated by the component qltp·xgate, whilst the decay of the long-term potentiation is controlled by the second component, dltp, calculated by

$$d_{ltp} = r_{ltp}t_{set}\Delta t \qquad (19)\ [78].$$
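Gathering (12)-(19), one timestep of the state update can be written compactly. The sketch below mirrors the control flow of Algorithm 1 below, applying the gate-driven change only above threshold and the two decay terms on every step; prm is the hypothetical GSDParams container sketched after Table 1.

```python
def update_state(x: float, xmin: float, vgate: float, vin: float, vout: float,
                 dt: float, prm) -> tuple:
    """One timestep of the potentiation state x and its long-term floor xmin,
    per eqs. (12)-(19); prm is a GSDParams-like container (see Table 1)."""
    xscale = dt / prm.tset                              # eq. (12)
    veff = prm.f * vgate - prm.oc * (vin - vout)        # eq. (13)
    if abs(veff) > prm.vt:
        if veff < 0:
            veff = prm.namp * veff                      # eq. (14)
        sign = 1.0 if veff >= 0 else -1.0
        xgate = xscale * (veff - sign * prm.tc * prm.vt)   # eq. (16)
        x += xgate                                      # eq. (15), bottom
        xmin += prm.qltp * xgate                        # eq. (18), bottom
    if x > xmin:
        x -= prm.rstp * prm.tset * (x - xmin) * dt      # eq. (17)
    if xmin > 0:
        xmin -= prm.rltp * prm.tset * dt                # eq. (19)
    xmin = min(max(xmin, 0.0), 1.0)                     # eq. (18), top bound
    x = min(max(x, xmin), 1.0)                          # eq. (15), top bound
    return x, xmin
```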

All the equations within the model are orchestrated by the procedure shown in Algorithm 1. The module is defined in Verilog-A and instantiated within a SPICE environment when run; testing then takes place within the SPICE environment. The model can also be converted to other code frameworks such as Python or C++ if a custom device simulation environment is available, given that the environment simulates the device with respect to time in some manner.


Algorithm 1. Generic Model for GSD [78]

Algorithm 1 GSD_module (vin, vout, vgate)

1: Define user-defined parameters (vt, brev, gmin, gmax, tset, rstp, gc, namp, oc, tc, qltp, rltp, xstart, f)
2: Define/calculate parameters (grange, s, m, p, xmax) (Eqs. 8-11)
3: Define variables (veff, x, gsyn, tcurr, tpast, xscale, xmin)
4: Begin behavioral block
5: Calculate xscale and veff (Eqs. 12 and 13)
6: if abs(veff) > vt
7:   if veff < 0
8:     Apply namp to veff (Eq. 14)
9:   Calculate x and xmin (Eq. 16; Eqs. 15, 18 bottom condition)
10: if x > xmin
11:   Calculate dstp and apply to x (Eq. 17)
12: if xmin > 0
13:   Calculate dltp and apply to xmin (Eq. 19)
14: Bound xmin between 0 and xmax (Eq. 18 top condition)
15: Bound x between xmin and xmax (Eq. 15 top condition)
16: Calculate gsyn (Eq. 7)
17: if vin − vout ≥ 0
18:   Calculate Isyn (do not include diode component) (Eq. 5 top)
19: else
20:   Calculate Isyn (include diode component) (Eq. 5 bottom)
21: End behavioral block
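As a usage illustration, the fragment below drives the earlier sketches through a simple program-then-release test, analogous to the potentiate/decay portion of Fig. 11. All stimulus values and parameter choices here are arbitrary assumptions for demonstration, not fits to any surveyed device.

```python
# Hypothetical pulse test: drive the gate at 1 V for the first quarter of the
# run, then release it, reading the channel at a small forward bias throughout.
prm = GSDParams(gc=0.5, qltp=0.05, rstp=5e-3, tset=1e-6)  # arbitrary values
x, xmin, dt = prm.xstart, 0.0, 1e-8                       # 10 ns timestep
trace = []
for step in range(2000):
    vgate = 1.0 if step < 500 else 0.0                    # program, then release
    x, xmin = update_state(x, xmin, vgate, 0.1, 0.0, dt, prm)
    g = conductance(x, prm.gc, prm.gmin, prm.gmax)
    trace.append(channel_current(g, 0.1, 0.0, prm.brev))  # read at 0.1 V
print(trace[499], trace[-1])  # read current at the end of programming vs. the end
```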

f. Generic Synaptic Model Results

In order to sufficiently test the behavioral GSD model, rigorous testing was conducted within an HSPICE simulation environment where the GSD was instantiated as a Verilog-A module. The survey of devices used to construct the model was then also utilized in testing the robustness of the model across a plethora of temporal evaluations. Across the seven devices surveyed, 26 individual tests were selected from the previously published papers [55-58, 75-77]. These 26 experiments were emulated within the HSPICE environment, and the GSD model's user-defined parameters were manually tuned to fit the behavior observed in the selected experiments. The tests vary from gate pulse tests, drain sweep tests, gate sweep tests, multi-state evaluations, etc., in order to confirm the model emulates the behaviors seen in the surveyed GSDs.

For all 26 tests emulated within the HSPICE environment, Table 2 contains all user-defined parameters necessary to capture the desired behavior from each test. The original experiment's figure from the surveyed publication is labeled on the left side of each row. To the right of that label is the subfigure label within Fig. 12 where the simulated version of the selected GSD experiment resides. A primary goal for each publication's surveyed device was to keep the user-defined parameters as similar as possible, since the model should encapsulate each device in all cases and not just one. However, this goal was not always achieved.

There are multiple explanations for differing parameter values between individual tests within a single publication. The first is that certain tests utilize differently designed devices (such as in Tang et al. [76]) or potentially do not mention when different devices are used for different tests. Another reason parameter values can vary is simple device-to-device variance when more than one device is used across different tests. Furthermore, those devices could have been used in a sequence of tests that altered their behavior as each test was performed. Some tests might apply extremely high voltage to a device's gate, which might be useful in demonstrating a behavior in one test, but that drastic change could have physically altered the device to a degree where the behavioral GSD model is unable to further emulate the original behavior in other tests. Finally, not all the tests within each surveyed publication were described in full detail. Some experiments had unknown read voltages, programming times, times between reads, etc. These unknowns can also alter the behavior of a test or increase the difficulty of replicating it in simulation.


Table 2. User-Defined Parameters for GSD Experiment Replications (Fig. 12)

Rows (by surveyed publication): Bao et al. [58]; Burgt et al. [75]; Herrmann et al. [57]; Lim et al. [55]; Murdoch et al. [56]; Tan et al. [77]; Tang et al. [76].

Fig. 12 contains the full results of the emulated tests from the seven surveyed devices. The solid lines in each subfigure are from the simulation, whilst the colored stars signify graphically extrapolated critical points from the original experiments that were used in obtaining the values shown in Table 2.

The first GSD surveyed was developed by Bao et al. [58]. This device is an exotic species of GSD that utilizes a liquid electrolyte for its memristive device channel. Four experiments from Bao et al. were selected for emulation within the HSPICE environment and are shown in Fig. 12a-d. These tests primarily focus on applying voltage pulses to the gate of the GSD and observing decay of the device's conductance over time. A single test in Fig. 12d instead demonstrates a repeatable set/reset capability of the device. One critical event during the evaluation of this liquid electrolyte GSD took place in Fig. 12a. Within the work published by Bao et al., there is no explicit mention of a gate threshold voltage [58]. In the emulation of the device in Fig. 12a, however, it was discovered that in order to obtain the low current value when a 1V pulse was applied to the gate while simultaneously obtaining the higher current value achieved when applying a 5V pulse, a gate threshold voltage must be utilized. A threshold voltage of 0.7V was used to capture the levels seen in the original experiment. This type of behavioral extraction shows the GSD model can directly reveal possible parameter values not described in the original experiment.

The second device surveyed comes from Burgt et al. and is a more standard resistive-RAM type of GSD with a twist [75]. Instead of requiring positive voltage to potentiate the device and negative bias to depress it, these polarities are inverted: negative bias potentiates the device whilst positive bias depresses it. The five experiments from Burgt et al. selected for replication can be seen in Fig. 12e-i. These tests are split between set/reset and set/decay tests, along with one test, shown in Fig. 12e, demonstrating how the device can cycle through five explicit conductance states given certain voltage values on its gate. The GSD model's capability of replicating this experiment to the level it did further shows the robustness with which it captures the memristive behavior of each device studied.

The next device surveyed comes from work done by Herrmann et al. [57]. This device is a traditional resistive-RAM GSD that utilizes SrTiO4 for its device channel. The four experiments replicated from Herrmann et al. are shown in Fig. 12j-m. One benefit of this work is that an explicit threshold voltage for the gate is already defined, vt=0.788V. This value, obtained in the original work, was used in replicating all four experiments from the paper. These tests primarily focus on sweeping either the gate or the input terminal of the device channel to observe how the channel current evolves. One final experiment in Fig. 12m also shows what occurs when a DC bias of a certain magnitude is applied to the gate over time. At lower levels of gate bias, a slight decay of the device's conductance over time is observed. This type of behavior cannot be captured by the GSD model due to its design. However, the decay disappears at higher gate voltage values. Since the decay only occurs at lower gate voltages (e.g., Vgate=1V), where the channel current values are multiple orders of magnitude smaller than those at higher gate voltages, the lack of decay emulation in this circumstance can be considered insignificant.

The fourth device surveyed in Table 2 and Fig. 12 is one developed by Lim et al. [55]. This device is highly CMOS compatible due to being created purely from a set of cleverly designed diodes. The experiments selected for emulation from this work can be seen in Fig. 12n-p. Two of the experiments focus on simple set/reset curves that show a highly asymmetrical relation between the potentiation and depression rates of the device. The other test, shown in Fig. 12n, shows the conductance of the device under reverse bias conditions. Since the device is built using diodes, a reverse-bias diode curve (in absolute value) can be observed. The sharpness of the simulated curve is not quite the same as the original experiment at higher pulse counts applied to the gate, but the end values are the same. This type of reverse-bias behavior is often highly desirable in synaptic devices to ensure current only flows one way through a synaptic layer for processing (i.e., to prevent sneak currents).

The next device surveyed is the first of two devices that do not technically have a gate terminal, but can instead be defined to have a "gate" via light stimulation. Three experiments from this first light-potentiated GSD, developed by Murdoch et al. [56], were replicated in Fig. 12q-s. The tests for this device all primarily focused on potentiation/decay evaluation. One test, in Fig. 12r, also included a periodic reset signal to the output terminal of the device channel in order to reset the device. This behavior was correctly emulated by the model via the user-defined parameter oc, which was set to one in order to allow the bias across the channel to have a high degree of influence over the effective voltage (Veff) that changes the memristance of the device at any time.

The other light-potentiated GSD studied was developed by Tan et al. [77]. This device proved one of the trickiest to emulate and possessed an interesting behavior that was extremely difficult for the GSD model to reproduce. Although the model closely emulated the original device's behavior in Figs. 12t and 12v, the behavior shown in Fig. 12u could not be captured fully. According to the work done by Tan et al., the device's design exhibits a physical phenomenon known as negative differential resistance (NDR) [77]. This behavior allows the device to conduct increasing amounts of current even as the applied voltage is decreased. Despite aggressive tuning of parameter values, the degree of NDR shown after the device was programmed could not be achieved in Fig. 12u. This experiment is the only circumstance among the 26 simulations conducted where the model failed to capture a device's behavior.

The final device studied was developed by Tang et al. and IBM [76]. This device is touted as "electro-chemical RAM" (ECRAM) and is similar to resistive-RAM GSDs. Multiple devices of different sizes were studied in this work, shown in Fig. 12w-z. These experiments primarily concerned set/reset tests showing the (mostly) symmetrical behavior of the device during potentiation and depression.


Figure 12. Figure showing all replicated tests from previous publications that were documented in Table 2 [78]. Critical points that were utilized to replicate each experiment are depicted as stars. (a)-(d) Bao et al. (e)-(i) Burgt et al. (j)-(m) Herrmann et al. (n)-(p) Lim et al. (q)-(s) Murdoch et al. (t)-(v) Tan et al. (w)-(z) Tang et al.


After conducting the 26 experiments in Fig. 12, it can be stated with strong certainty that the generic GSD model is a success. Outside of device or testing variance, or unusual physical phenomena such as NDR, the model can emulate a very wide range of devices at the behavioral level through a large array of tests. This type of model will hopefully be of great benefit to anyone interested in utilizing GSDs within their neuromorphic architectures.

g. Using the Generic Synaptic Model for a NbOx Gated-Synaptic Device

Now that the generic device model has been verified against previously published data, it can also be verified against a device in its prepublication phase. Experiments were performed on a gated-synaptic device by Aaron Ruen in a work submitted to ACM's Journal on Emerging Technologies in Computing (JETC) [71]. This GSD is of the gated-ReRAM species and possesses a niobium oxide (NbOx) channel where memristance is maintained. The device underwent a total of three temporal experiments to observe its behavior with respect to time. The experiments were then replicated in simulation with the generic GSD model with minimal adjustment of parameters between experiments. The parameters for all three experiment emulations can be seen in Table 3. The only values adjusted between experiments in Table 3 are gmin, gmax, and tset, primarily due to device-to-device variance; all other values remain the same.


Table 3. User-Defined Model Parameters for NbOx GSD Experiment Fits (Figs. 13-15) [71]

Columns: Fig. 13 values | Fig. 14 values | Fig. 15 values.

The first experiment emulated in [71] is the measurement of the device channel's current after logarithmically increasing pulse counts are applied to the gate. The results for both the original experiment and the simulation can be seen in Fig. 13. Overall, the extreme gate voltage biases are replicated with high precision, whilst intermediate values fit more closely at higher pulse counts. The variance of intermediate voltage values applied to the gate at lower pulse counts appears significant on a log scale (as shown in Fig. 13) but is overall very minor on a linear scale.


Figure 13. Channel current of a gated-ReRAM GSD compared to a fitted version of the generic model previously described as programming pulses were applied to the device’s gate terminal [71]. The data from the gated-ReRAM device was obtained by Aaron Ruen.

The second experiment emulated can be seen in Fig. 14. This experiment used a logarithmically increasing pulsing scheme on the gate similar to the first experiment. However, it instead swept the input voltage of the device channel (VTE) after applying pulses to the gate, rather than reading a single channel current value. The model predicts slightly lower values than measured at lower pulse counts and then begins to slightly overestimate at higher pulse counts. The model appears to predict a more sudden change in conductance than what was measured in the device, but still within reason (i.e., an order of magnitude).

Figure 14. Channel current of a gated-ReRAM GSD compared to a fitted version of the generic model previously described as its input terminal had a bias swept across it whilst its output terminal was grounded [71]. This process was done multiple times at logarithmically increasing intervals of pulses applied to the gate. The data from the gated-ReRAM device was obtained by Aaron Ruen.

The final experiment replicated in [71] focuses on the decay of the device's state after being preprogrammed (Fig. 15). Similar results are once again observed, where underestimation occurs at lower programming pulse counts and overestimation occurs at higher pulse counts. Overall, the model predicts an end result very similar to that shown in the experimental data.

Figure 15. Channel current of a gated-ReRAM GSD compared to a fitted version of the generic model previously described as the device's conductive state decays over time [71]. The device was programmed by different numbers of voltage pulses to its gate prior to the decay experiment. The data from the gated-ReRAM device was obtained by Aaron Ruen.

Emulating a fabricated device such as the one in [71] can be useful for future development of neuromorphic hardware. The generic GSD model shown here will prove useful when trying to capture certain device behaviors (such as decay) that other models do not. This type of decay behavior can be functional in certain types of neuromorphic designs, such as certain species of recurrent neural networks.


V. Architecture

a. Associative Memory

A core concept that unites all neural networks and neuromorphic computing architectures is the process of recognizing/reconstructing patterns. This process is how all neural networks and neuromorphic architectures learn when given information, and it mimics how organisms capable of learning do so [83]. One of the most fundamental studies demonstrating this behavior in complex organisms is Pavlov's work on classical conditioning [84]. In this work, Pavlov showed how complex organisms such as dogs could have certain behaviors conditioned to stimuli that previously weren't linked to the behavior. By ringing a bell and then giving food to a dog, Pavlov was eventually able to get the dogs to salivate (a behavior linked to seeing the food) by just ringing the bell, before food was even presented.

The concept of classical conditioning Pavlov demonstrated in this study is the core of a basic type of learning that occurs within neural networks called associative learning. Associative learning is the process by which two concepts or ideas become related within the neural network by being shown to the network simultaneously or in sequence with one another [85, 86]. This type of learning is what allows us to think of the color red when someone mentions the word apple, or to know the next verse in our favorite song. Through associative learning, a neural network can create relational maps of ideas that have been identified or analyzed by other types of neural networks before it, such as multi-layer perceptrons or self-organizing feature maps [87, 88].


b. Recurrent/Hopfield Networks

Recurrent neural networks are a class of neural networks consisting of one or more layers in which the neurons within a single layer often have their inputs linked in some manner to the output of the layer [89]. This connection scheme allows recurrent networks to accomplish many tasks typical feedforward networks cannot, such as signal classification [90], speech recognition [89], and image generation [91].

One such subclass of recurrent networks is the Hopfield network. Hopfield networks store information similarly to content-addressable memory [92] and are able to store information in the form of associations within the network. Hopfield networks were first defined by J. J. Hopfield in 1982 [93]. In that work he defined how a Hopfield network is made and how memory storage and recall operate within it.

The original memory capacity of Hopfield networks is quite small [94]. Others have attempted to define variations of the Hopfield network to improve its memory capacity [95, 96], but usually only in theoretical terms or networks. It would be desirable to create a more scalable version of the Hopfield network for associative memory that could also be implemented in hardware in a simple manner.

c. The Segmented Attractor Network

In order to approach associative learning with recurrent networks in this dissertation, a collection of small recurrent neural networks connected to one another in a format similar to a Hopfield network was developed. This network is called the segmented attractor network (SAN) and handles associative learning by making a couple of key assumptions. The first is that all information the network observes is split into different sets, and within each set exists a list of features belonging to that set. When implemented within hardware, these different sets of information can correspond to information coming from different sensors or input methods (i.e., accelerometer, thermometer, pre-processed camera data, etc.). They can also define different categories of data (e.g., size, age, time stamp, etc.). The second assumption is that features within a set cannot associate with one another. This means that if the network has a set representing temperature, 70˚F cannot be associated with 50˚F.

When implemented in architecture, the sets of information are represented by arrays of neurons defined as s, where the scalar magnitude of s is defined by y(t):

$$s = \begin{bmatrix} s_1 \\ s_2 \\ \vdots \\ s_n \end{bmatrix}_{n\times 1}, \quad \text{where } |s| = y(t) = \begin{bmatrix} y_1(t) \\ y_2(t) \\ \vdots \\ y_n(t) \end{bmatrix}_{n\times 1} \qquad (20)\ [97].$$

In (20), the term n is the total number of sets within the architecture. Within each set (si) there exists an array of features, where each feature (fij) holds a label that allows it to exist within that si. Therefore, each si and its magnitude is defined by

$$s_i = \begin{bmatrix} f_{i1} \\ f_{i2} \\ \vdots \\ f_{im} \end{bmatrix}_{m\times 1}, \quad \text{where } |s_i| = y_i(t) = \begin{bmatrix} y_{i1}(t) \\ y_{i2}(t) \\ \vdots \\ y_{im}(t) \end{bmatrix}_{m\times 1} \qquad (21)\ [97],$$

where m is the total number of features within the set. It is not necessary for each si to have the same value of m, but this can be done if the application requires it or to help demonstrate the functionality of the network.

The weight matrix within the network is represented by a set of synapses that cover all possible associations between sets and their features. A full example layout of this representation can be seen in Fig. 16. The complete weight matrix is represented by W,

$$W = \begin{bmatrix} W_{1,1} & \cdots & W_{1,n} \\ \vdots & \ddots & \vdots \\ W_{n,1} & \cdots & W_{n,n} \end{bmatrix}_{n\times n}, \quad \text{where } W_{i,j} = 0 \text{ if } i = j \qquad (22)\ [97].$$


Within W exists a matrix of submatrices where each submatrix is represented by Wi,j. Each Wi,j possesses weights that associate one si to another. Weights are set to their activated value, wON, when the two features the weight associates fire simultaneously via having external input (E) applied to them.

Figure 16. Diagram of the segmented attractor network including its external input, synapses, and neurons [97]. This particular network has two sets (n=2), and three features per set (m=3).

To perform recall, an iterative process analyzes the output magnitude of each neuron within the entire network in order to identify the feature from each set with the highest-magnitude output. The complete equation representing this process at each time step is

$$\begin{bmatrix} y_{11}(t) \\ y_{12}(t) \\ \vdots \\ y_{nm}(t) \end{bmatrix} = \begin{bmatrix} w_{11,11} & \cdots & w_{11,nm} \\ \vdots & \ddots & \vdots \\ w_{nm,11} & \cdots & w_{nm,nm} \end{bmatrix} \begin{bmatrix} y_{11}(t-1) \\ y_{12}(t-1) \\ \vdots \\ y_{nm}(t-1) \end{bmatrix} + \begin{bmatrix} E_{11} \\ E_{12} \\ \vdots \\ E_{nm} \end{bmatrix} \qquad (23)\ [97].$$


In (23), yij(t−1) represents the output of the specified feature from the previous time step. If this artificial network were translated into hardware, the output magnitude of each neuron would correspond to the firing frequency of each neuron within a spiking neural network, and taking the product of the weight matrix with the neurons' outputs from the previous time step simulates how feedback depreciates within the network.
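To make the recall dynamics of (20)-(23) concrete, the following NumPy sketch builds a toy SAN with the Fig. 16 layout (n=2 sets, m=3 features), stores one association, and iterates (23) from a partial cue. The weight value and iteration count are illustrative assumptions.

```python
import numpy as np

n, m, w_on = 2, 3, 0.5            # two sets, three features each (Fig. 16)
W = np.zeros((n * m, n * m))      # flat index: feature j of set i -> i*m + j

def associate(i, j):
    """Program the symmetric pair of weights linking two features (eq. 22
    forbids intra-set links, so i and j should come from different sets)."""
    W[i, j] = W[j, i] = w_on

associate(0, 3)                   # one memory: f11 associated with f21

E = np.zeros(n * m)
E[0] = 1.0                        # partial cue: external input on f11 only
y = np.zeros(n * m)
for _ in range(10):
    y = W @ y + E                 # eq. (23): y(t) = W y(t-1) + E
print(y.reshape(n, m))            # highest output per row (set) wins recall
```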

d. Implementing the Segmented Attractor Network Using the Initial Device Model

To implement the SAN in hardware, a simple point attractor network must first be constructed in order to prove the functionality of the previously discussed neuromorphic components. This simple attractor network uses the previously discussed SR Octopus Retina neurons and the initial physics-based dual-gated synaptic device model that uses niobium oxide (NbO2) as its bulk material. A diagram of this architecture can be seen in Fig. 17a. In addition to the SR Octopus Retina neurons, a set of ideal OP amps is used to amplify each neuron's signal before it reaches each synapse's gate. There are two OP amps for each neuron: one inverting amplifier and one non-inverting amplifier. For the architecture shown in Fig. 17a, the OP amps are set to gains of 5 and −5 in order to potentiate the device as quickly as possible. The outputs of the inverting OP amps are run to the Vn gate terminals on each synapse, while those of the non-inverting OP amps are run to the Vp gate terminals (as shown in Fig. 17a).

Similar to the theoretical version of the SAN discussed previously, the network undergoes two separate phases of simulation. The first is the training or "association" phase, where memories are programmed into the network; the second is the recall phase, where the network attempts to reconstruct previously seen memories when shown only part of the input it has previously seen.


Figs. 17b and 17c show a simulation where the network has two separate memories placed into its synapses. The first memory associates the neurons N1 and N2 with one another while the second memory associates N3 and N4. The plot in Fig. 17b shows the running frequency output of each neuron within the network (i.e. the frequency as calculated by the time between one spike and the next spike). What can be seen in the recall phase of Fig. 17b is that when N1 has external input applied, N2 responds due to the association previously drawn between the two neurons via the synaptic connections in the network. The same happens with N4 when N3 is given external input at the end of the simulation.

At the bottom of Fig. 17, Fig. 17c shows the synaptic weight map of this architecture in the form of each synapse's resistance value. Black synapses correspond to a high resistance value, while yellow synapses correspond to a lower one. Each plot within Fig. 17c is a snapshot at a different simulation time showing the formation of the two memories within the network. This demonstration shows that the core neuronal and synaptic components function correctly for neuromorphic applications and can therefore be used in constructing a larger-scale SAN.



Figure 17(a). Schematic of the basic four-neuron network that uses the proposed double gated-synaptic devices. (b) Plot showing the running frequency of all neurons within the network with respect to time during a training/recall test where two memories are shown. (c) Synaptic heatmap snapshots for the network as each memory is shown to the network. The high resistance in the snapshots (black squares) is typically ~3.3 GΩ. This figure can be found in [59].

e. Using the Network for Navigation

In order to show the capabilities of a hardware-based SAN, the SAN is placed into a slightly larger architecture that performs a very basic form of navigation. The navigation done here uses a simple test environment for a drone. The drone exists in a simple, cylindrical environment where colored landmarks are placed along the wall of the cylinder (Fig. 18a). Among the colored landmarks, a "blue" landmark is considered the target landmark for the drone to navigate towards. Within this environment, the drone can move up/down and turn left/right along its navigational axis.

To navigate the environment, a neuromorphic architecture is used to remember the landmarks the drone has seen and tell it where to go based upon those landmarks. This architecture can be found in Fig. 18b and consists of an input layer, an associative layer, and a motor layer. The input layer simply takes digital data given to the network and translates it into spike trains via an array of SR Octopus Retina neurons. The associative layer consists of a SAN that represents all of the possible heights within the environment (z-axis position), the colors of the landmarks within the environment (red, orange, green, blue), and the drone's head direction (0˚ to 300˚ in increments of 60˚), along with all their proper associations (Fig. 18c). The motor layer consists of a set of four neuron circuits that represent the four actions the drone can take in its environment (move up/down, turn left/right). Those neurons have synapses associated with them that are programmed via the associative layer when a blue (target) landmark is seen (Fig. 18d). This programming of the motor layer then directs the network towards that target landmark after it has been seen, along the shortest path possible, no matter where the drone is placed within the environment.


Figure 18(a). Diagram of a sample environment for the proposed architecture to traverse. (b) Diagram depicting the system-level design for the architecture, which contains three core layers (input layer, associative layer, and motor layer). (c) Diagram of the associative layer (a segmented attractor network) that creates a landmark map for the system. (d) Schematic showing the design of the architecture's motor layer. All synaptic Vp gates that possess multiple signals are preceded by a two- or three-input OR gate that unifies the signals. All Vp signals are positively amplified prior to reaching each synapse, and all synaptic Vn signals are negatively amplified prior to reaching each synapse. This figure can be found in [59].

An example of the drone moving through its environment can be seen in Fig. 19a. During this process, the drone first randomly wanders its environment, recording observations with its associative layer as it travels. Once it finds its target, the motor layer is programmed. The drone is then placed back at its original starting location to show that the output of the network directs it to move towards its previously seen target along the shortest path possible. The results from the perspective of each neuron's output in the motor layer can be seen in Fig. 19b. The outputs of the motor layer are all equal and low until the drone sees its target. Once the drone is placed back at its starting location, the motor layer indicates that it should move up and turn right to reach its target along the shortest path possible. The synaptic resistance maps for both the associative and motor layers can be seen in Figs. 19c and 19d, respectively. The maps shown in these two subfigures are from after the drone has observed its blue landmark target. This demonstration shows the SAN's capability of associating pieces of information with one another and using that information in a useful application.



Figure 19(a). Diagram showing the conceptual procedure for the test presented to the architecture. (b) Plot showing the running frequency as the architecture executes the test shown in Fig. 19(a). The recall at the end of the test occurs when the system is shown the first point it saw, and it shows that the network wants to go in the correct direction. (c) Synaptic heat map for the architecture's associative layer once the test from Fig. 19(a) was complete. (d) Synaptic heat map for the architecture's motor layer once the test was complete. This figure can be found in [59].

f. Using the Generic Model in a Segmented Attractor Network

Despite the promising results shown in Jones et al. [59], the architecture relied upon a double-gated version of the device, which has not been as widely realized as GSDs with a single gate [78]. Therefore, demonstrating a neuromorphic architecture in simulation using the generic GSD model could show a network that is closer to a feasible implementation. Not only would this network be more feasible, but it could also be built out of any GSD that meets the generic requirements captured by the model's behavioral nature.

In order to demonstrate this neuromorphic architecture, a segmented attractor network was designed to hold associations between inputs and was submitted in a paper to ACM's Journal on Emerging Technologies in Computing [71]. This network is spiking in nature and, unlike the architecture shown in Jones et al. [59], is not used in tandem with any other neuromorphic layers. The schematic for this architecture can be seen in Fig. 20. The architecture contains six neurons, 24 synaptic devices (i.e., GSDs), and 12 logic AND gates. Every external input provided to the neurons is preceded by a resistive diode that converts the external voltage applied to the input of each neuron into a current. The diodes also ensure that the feedback current from the synapses stays within the network and does not leak backwards into the external input connections. The neurons are divided into three sets (s1-s3), where each set possesses two features (s1=N11, N12; s2=N21, N22; s3=N31, N32). The synapses within the synaptic grid have their gates programmed via the logic AND gates, which utilize the external input provided to the network. Each AND gate combines two external inputs into a single output that runs to the gate terminals of a pair of GSDs within the grid in order to program them when external input is provided. The pair of GSDs programmed by each AND gate resides on opposite, symmetrical sides of the synaptic grid, and each of these GSDs runs its postsynaptic connection to one of the two neurons whose external inputs feed that AND gate.
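The AND-gate write path can be summarized in a few lines of Python: an AND gate exists for every cross-set pair of external inputs, and a high output programs the two symmetric grid positions for that pair. The sketch below is an illustrative model of that wiring, not extracted netlist code; the helper name programmed_pairs is hypothetical.

```python
from itertools import combinations

neurons = ["N11", "N12", "N21", "N22", "N31", "N32"]   # three sets, two features

def programmed_pairs(active):
    """Synapse pairs driven by the 12 AND gates: one gate per cross-set pair
    of external inputs, each programming two symmetric grid positions."""
    return [((a, b), (b, a)) for a, b in combinations(active, 2)
            if a[1] != b[1]]     # same first digit = same set: never linked

# Memory #1 (N11, N21, N31) drives three gates, i.e., six of the 24 GSDs.
print(programmed_pairs(["N11", "N21", "N31"]))
```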


Figure 20. Diagram depicting a hardware schematic for a six-neuron segmented attractor network that contains three sets, where two features exist per set [71].

The outputs of the neurons in the architecture shown in Fig. 20 are recurrently connected to the presynaptic connections in the synaptic grid. These recurrent connections are (in a geometric sense) perpendicular to the postsynaptic connections made to the neurons. The neuron output is then measured to determine the network's converged solution: the neuron within each set with the highest frequency is once again the winning neuron of its set. The neurons in this version of the architecture implement the same SR Octopus Retina neuron circuit previously described, using the TSMC 180nm technology node. The frequency and duty-cycle profiles with respect to input current are given in Fig. 21.


Figure 21. Frequency and duty cycle as a function of input current for the SR Octopus Retina neuron built using 180nm TSMC technology node transistors [71].

g. Results of the Generic Model Segmented Attractor Network

To evaluate this version of the SAN architecture, a simple three-memory sequence was placed into the network during an association phase. After this association phase, recall was performed by providing a single external input to the SAN at a time. Each external input is then stepped through in sequence to determine the recalled state of the system.


During the first phase of simulation, the association phase, one neuron from each set is driven in sequence to insert three memories into the network. The memories given are the following.

1. Memory #1: N11, N21, N31

2. Memory #2: N12, N22, N32

3. Memory #3: N11, N22, N31

The output frequency of the network during the association phase can be seen in Fig. 22. The dark black areas in each neuron's frequency output signify when that neuron is being provided external input and spiking at its maximum frequency. At the end of the association phase, during the third memory, neurons besides the ones being provided external input begin to spike as well. Those neurons spike at lower frequencies due to the feedback provided through the synaptic grid in the SAN: the synapses previously programmed by the earlier memories (#1 and #2) feed back to N12, N21, and N32, causing them to spike.

Figure 22. Frequency response of all neurons within the architecture shown in Fig. 20 during an association phase where three memories are provided to the network [71]. Darker areas indicate higher frequency responses; the black regions indicate where external input is provided to the network in order to form memories.

The synapses within the synaptic grid that cause that feedback can be seen in the temporal snapshots shown in Fig. 23. These synapses use the rightmost values seen previously in Table 3 when emulating the NbOx-based GSD. One thing that can be seen in the snapshots is that after a memory is placed into the network, some of the synapses begin to decay with time (as shown previously in Fig. 15). This decay of each synapse's state will influence the recall state of the system once the recall phase begins.

Figure 23. Snapshots of the synaptic heatmap of the segmented attractor network described in Fig. 20 during the association phase shown in Fig. 22 [71]. As time progresses, synapses are programmed in the network to higher conductance values so the network can perform recall on these observed memories later.

The recall phase can be conducted once the association phase is complete. The frequency output over time of each neuron within the network can be seen in Fig. 24. The dark black diagonal in Fig. 24 marks the neurons receiving external input, as in Fig. 22. Within the lighter regions of the network's frequency response, echoes of the three memories originally placed into the network during the association phase can naturally be seen. To measure these outputs quantitatively, the average frequency response of each neuron during each external input's application was calculated. These averages can be seen in Table 4.

Figure 24. Frequency response of the neurons within the SAN with respect to time during the recall phase [71]. During recall, external input is provided to each neuron one-by-one in order to observe what memory is recalled during the given stimulus. If the response in the figure is closely observed, visible echoes of the memories shown in Fig. 22 can be seen over time as recall is performed.

Within Table 4, the highest average frequencies within each set of neurons, which determine the SAN's state during each external input's application, determine the recalled memory. These outputs are recorded on the right side of the table and labeled as one of the memories from the original association phase. As shown, the first memory is recalled in half of the cases tested, whilst the second memory appears twice and the third once. Despite the first memory being the oldest one in the network, its recall remains quite strong because the decay of the GSDs within the synaptic grid is low and the third memory reinforces two of the neurons associated with the first memory at the very end of the association phase (Fig. 22).
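A minimal sketch of the per-set winner-take-all readout that fills the right side of Table 4 (shown next), using the first row's frequencies as the example input:

```python
def winners(freqs, m=2):
    """Index of the highest-frequency neuron within each set of m features."""
    return [max(range(i, i + m), key=lambda k: freqs[k])
            for i in range(0, len(freqs), m)]

# Row 1 of Table 4 (external input on N11): winners are N11, N21, N31 -> memory #1.
print(winners([26.8, 3.65, 8.36, 6.21, 8.12, 3.67]))   # [0, 2, 4]
```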

Table 4. Recall Frequency Responses of SAN [71]

External Input      | Recalled Frequencies (MHz)                  | Recalled Memory
s1    s2    s3      | N11    N12    N21    N22    N31    N32      | s1    s2    s3    #
N11   -     -       | 26.8   3.65   8.36   6.21   8.12   3.67     | N11   N21   N31   1
N12   -     -       | 3.23   29.2   2.24   6.27   2.69   8.88     | N12   N22   N32   2
-     N21   -       | 5.65   3.35   28.0   5.15   5.51   3.07     | N11   N21   N31   1
-     N22   -       | 5.69   5.01   5.26   23.0   5.53   5.00     | N11   N22   N31   3
-     -     N31     | 9.94   4.30   9.38   8.22   27.7   3.99     | N11   N21   N31   1
-     -     N32     | 2.26   8.49   2.15   6.08   2.62   28.8     | N12   N22   N32   2

To demonstrate how factors such as GSD decay affect how the SAN recalls states, another simulation was conducted that repeated the same procedure as the one shown in Figs. 22-24, but with the decay factor of the GSD model doubled from 5e-3 to 1e-2. The results of this test can be seen in Table 5, which shows an almost opposite picture from the previous table: memory #1 is recalled only once, while the second is recalled three times and the third twice. This result shows that decay plays a major role in what is recalled from the SAN (and in associative memory in general) when using GSDs.

Table 5. Recall Frequency Responses of SAN with Higher Decay (rstp=1e-2) [71]

External Input          Recalled Frequencies (MHz)                    Recalled Memory
s1    s2    s3      N11    N12    N21    N22    N31    N32      s1    s2    s3     #
N11   -     -       27.9   3.78   6.12   7.72   10.0   3.76     N11   N22   N31    3
N12   -     -       2.67   29.2   1.25   7.02   2.60   9.49     N12   N22   N32    2
-     N21   -       3.77   2.99   28.9   7.18   3.79   1.79     N11   N21   N31    1
-     N22   -       4.48   4.92   4.28   22.0   4.48   4.88     N12   N22   N32    2
-     -     N31     9.70   3.84   5.79   7.60   27.7   3.97     N11   N22   N31    3
-     -     N32     2.90   9.30   1.88   6.94   3.39   29.1     N12   N22   N32    2
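To see why doubling rstp matters, a minimal sketch of the state-decay relation (formalized later in (34)) is shown below; the normalization and step count here are illustrative, not taken from the GSD model in [78]:

% Relative synaptic state remaining after k time steps under
% w(t+1) = w(t)*(1 - r_stp), for the two decay factors compared here
k = 0:200;                        % time steps after programming
w_low  = (1 - 5e-3).^k;           % r_stp = 5e-3 (Table 4 conditions)
w_high = (1 - 1e-2).^k;           % r_stp = 1e-2 (Table 5 conditions)
plot(k, w_low, k, w_high);
legend('r_{stp} = 5e-3', 'r_{stp} = 1e-2');
xlabel('Time step'); ylabel('Normalized synaptic state');

After 100 time steps, roughly 61% of a programmed state remains at rstp=5e-3 versus about 37% at rstp=1e-2, so older memories lose their influence on recall far sooner.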


The results shown in the two previous tables might initially seem like a detriment to the network's performance, but when viewed through an application-focused lens, this volatility becomes a beneficial trait. In long-term, real-world use of the SAN, the network will eventually be provided with enough information about the world to saturate all of its synaptic states. At that point it receives too much information to ever give meaningful recall on the fraction of information it is being shown. Because the conductance state of the GSDs in the network is volatile, memories are instead slowly forgotten over time.

This rate of forgetting is dictated by the rate at which a synapse's state decays and can be tuned via device engineering for the specific application space the network targets. Forgetting memories in this way makes space for new memories in the future that may be more relevant to the network's current situation.


VI. Analysis of the Segmented Attractor Network

a. Importance of Synaptic Values

Simple hardware demonstrations of a SAN are valuable for showing the feasibility of the architecture in hardware, but studying larger, generic versions of the network via higher-level simulation frameworks is also key. Studying the network at a larger (albeit more abstract) scale reveals the network's capabilities in best-case scenarios. To see how the SAN operates under generic, ideal conditions, a simulation framework was established to study various design parameters of the network and how they affect its performance.

Within a MATLAB environment, a SAN was created using the mathematical operations previously described by (20)-(23). A flowchart depicting the simulation framework that utilizes (20)-(23) can be seen in Fig. 25. This analysis of the SAN primarily examines several key parameters that determine the success of recalling a SAN's state given part of a previously observed memory:

1. How does the value of the synaptic weights change the outcome of the network?

2. How does recalling different amounts of input affect the network's hit rate?

3. How does adjusting the physical size of the network change the network's hit rate?


Figure 25. Flowchart for the algorithm used to program and analyze the segmented attractor network [97]. Memories are placed into the network by programming synapses to their wON value (where wOFF=0). External input is defined as E=1. During recall, Ihid≥1, depending on the simulation being executed.

In order to simulate the SAN within MATLAB and demonstrate its capability, the process was split into two halves. The first half of this simulation process is the SAN's "training" phase, where it is populated with random memories. For the sake of these simulations, a "memory" here is defined as associating a single fij from each si with one another. The second half of the simulation is performing recall on the random assortment of memories the network has observed. To perform recall, the network is shown a fraction of the features within a single memory it has previously seen as external input, and then (17) is run iteratively until a solution is revealed. If two features within the same set appear to have equal value, one of the two is randomly selected and applied as external input to the network to help it converge to a correct solution.

This process is done over many test points to obtain an average hit rate. The network is considered to have a hit when it guesses all original features of the memory correctly. If it guesses even one incorrectly, it is considered a miss. Once the average hit rate has converged, the simulation is complete. This entire process can be seen in the diagram shown in Fig. 25.
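A minimal MATLAB sketch of a single recall trial under these rules is shown below. Variable names are illustrative, and a single summation pass stands in for the iterative update described above; the dissertation's framework iterates until the solution converges and breaks ties randomly:

% W: Ntot x Ntot synaptic grid; setID(i): set of neuron i;
% mem: one true feature (neuron index) per set; hiddenSets: sets given no input
function hit = recallTrial(W, setID, mem, hiddenSets, E)
    x = zeros(size(W, 1), 1);
    shown = mem(~ismember(1:numel(mem), hiddenSets));
    x(shown) = E;                          % external input to the shown features
    act = W * double(x > 0) + x;           % summation neurons: feedback plus input
    guess = zeros(1, numel(mem));
    for s = 1:numel(mem)                   % winner-take-all within each set
        idx = find(setID == s);
        [~, k] = max(act(idx));            % simplified tie handling (first maximum)
        guess(s) = idx(k);
    end
    hit = isequal(sort(guess), sort(mem)); % a hit only if every feature is correct
end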

The first thing that must be studied when looking at generic scenarios for the SAN is how the actual value of the synaptic weights within the synaptic grid affects the network's outcome. The first experiment to study this behavior is a simple demonstration of how the synaptic weight interacts with the network's size. For this demonstration, n will represent the number of sets within the SAN while m will represent the number of features per set. Since this is a generic demonstration, the number of features per set will always be the same (even though that rule does not need to be explicitly followed for a network to be a SAN). If both m and n are scaled linearly to increase the size of the network while 200 random memories are placed into and recalled from the network, a hit rate can be calculated for all memories currently placed into the network.

For the first experiment (shown in Fig. 26), the value of each synapse once it has been programmed (wON) was set to 0.1. This value is analogous to the high conductance value the synapse would possess if it were a device implemented in hardware. What can be seen in Fig. 26 is an increase in the network's hit rate at higher memory counts as the network becomes larger. This outcome is not surprising, but one feature that deserves explanation within Fig. 26 is the blue tail that extends into the low hit rate region in the higher memory count area of the plot. Within a couple of incremental increases to the size of the network, from n, m = 9 to 11, the hit rate drops rapidly. One thing that must be considered when simulating this network is not only the value of wON but also the value of the external input, E.

Figure 26. Average hit rate of the SAN as the size of the network is increased [97]. The number of neurons hidden from the network when performing recall (Ihid) is set to 1, while wON=0.1. As more memories are placed into the network, the overall hit rate drops at all sizes, but at different rates.

Within these generic simulations, the neurons in the network are treated as simple summation devices. This method of neuron implementation ensures that the neurons operate in a similar way to the rate-based spiking neurons used in hardware. The pitfall of this type of implementation in a recurrently connected network such as the SAN is how much feedback the network gives back to the neurons versus the magnitude of the external input. If a neuron is being provided external input, it should be the outright winning feature of its set since it is already receiving trusted input. If the feedback to another neuron within the set becomes larger than this external input, however, it can cause the network to choose the incorrect neuron when attempting to recall a memory.

When looking at the simulations shown in Fig. 26, the external input provided to the network is 1, while the wON for each synapse is 0.1. If a situation were to arise where 9-10 synapses were all providing feedback to a neuron within a set that had another neuron receiving external input, the feedback-driven neuron might override the outcome of the externally driven neuron and cause the network to recall its state incorrectly. This type of scenario is what happens at higher memory counts in Fig. 26. As the network approaches n, m = 10, the hit rate drops rapidly due to this type of scenario occurring. What this result shows is

E > wON(n − 1)    (24) [97].

In (24), the term n − 1 appears because every neuron receives feedback from all sets besides its own.
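As a concrete check of (24) against Fig. 26: with E = 1 and wON = 0.1, the bound fails once n − 1 reaches 10, since wON(n − 1) = 0.1 × 10 = 1.0 = E. This matches the rapid hit-rate collapse observed as the network grows from n, m = 9 to 11.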

If wON were decreased from 0.1 to 0.04 and the same experiment from Fig. 26 rerun, a result like the one shown in Fig. 27 would occur. Since the value of each synapse's feedback is now much smaller, the network can still reliably recall memories at higher memory counts. If the y-axis of Fig. 27 were extended to a high enough range and enough memories were added to the network, a similar effect would appear as in Fig. 26, where the hit rate at (relatively) higher memory counts would quickly drop due to feedback overriding external input once more.


Figure 27. Average hit rate of the SAN as the size of the network is increased [97]. The number of neurons hidden from the network when performing recall (Ihid) is set to 1, while wON=0.04. As more memories are placed into the network, the overall hit rate drops at all sizes, but at different rates. Unlike in Fig. 26, the synapses show no overruling behavior with respect to the external input provided to the network in this example.

To study the direct effect of wON on hit rate independent of the size of the network, another experiment was conducted where wON was varied from 0.025 to 0.25 to see any change in the network's hit rate as memories were added to the SAN. The result of this test can be seen in Fig. 28. Although not as stark as the results shown in Figs. 26 and 27, the result in Fig. 28 reinforces the notion from the previous two figures that decreasing each synapse's influence on the network's output relative to the external input's magnitude is one way to improve the network's overall performance. To strengthen the principle put forth by (24),

E ≫ wON(n − 1)    (25) [97].


If the weight (i.e. conductance, in hardware terms) of the synapse is kept as low as possible, it will provide the best potential benefit to the network's overall performance. In hardware, however, this rule can only be taken so far. Since the SAN relies on rate-based output to determine its outcome, the hardware must be able to measure a reliable difference between the winning neuron and the other neurons within a set. If the difference between neurons is too small, noise and measurement inaccuracy become factors. Therefore, the conductance of the synapses within the grid should be kept low enough to never compete with an external input's value, but not so low that the difference between neurons is indistinguishable by measurement systems.

Figure 28. Average hit rate of the network as wON is varied from 0.025 to 0.25 [97]. There is an improvement in hit rate as wON is decreased while memories are added to the network. The number of neurons hidden from the network in this demonstration was 1, and the network possessed 8 sets and 8 features per set.


b. Recalling Different Amounts of Input

Another angle to take when studying how a generic SAN performs is how much input is given to the network when performing recall. Since recall performs the basic task of pattern completion, the expected outcome is that the more information provided to the network, the more reliably it can predict the outcome. For the simulations shown here, the number of sets that do not receive external input during recall, and must instead rely on synaptic feedback, is defined as Ihid. For Figs. 26-28, this value was always set at one. For the following experiment, however, this value is varied for a network that has 20 sets. The result of this experiment as memories are placed into the network can be seen in Fig. 29. The outcome is the expected one: as more information is hidden from the network, the less likely the network is to recall the input correctly (especially at higher memory counts).

Figure 29. Average hit rate of the network as the number of neurons hidden from the network is varied [97]. The network here had 20 sets and 20 features per set while wON=0.04. As more information is hidden from the network during recall, the average hit rate decreases as memories are added to the network.

c. Varying the Segmented Attractor Network's Dimensions

The final aspect to study regarding the generic version of the SAN is individually changing the number of sets and features within the network to observe how each affects the network's performance. For the first experiment, the number of features per set in the SAN was fixed at ten whilst the number of sets in the network was varied from 2 to 20. What can be seen in the results of the experiment shown in Fig. 30 is an increase in performance as the number of sets increases, with the network's memory capacity plateauing around n=7. This plateau when increasing the number of sets is most likely attributable to the network running out of space to store memories, since the number of features per set is not increasing with n. If the roles of n and m were flipped, however, a much different outcome occurs.

Figure 30. Average hit rate of the SAN as the number of sets within the network was varied [97]. This network had 8 features per set, wON=0.04, and Ihid=1.


In order to show the network’s memory capacity potential when varying the number of features per set, a SAN was created with ten sets (n=10), a wON value for synapses of 0.01, and half of the features from each memory not shown to the network when attempting to perform recall. This network then had its hit rate analyzed over a wide range of values for m (where m is the same value for each si). This process was done with a range of memories in the network from

1 to 200 in order to create a complete depiction of the network’s memory capacity. The results of these simulations can be seen in Fig. 31. What can be seen in Fig. 31 is that the network’s memory capacity increases in an interesting fashion when looking at simulations where the hit rate of the network is very high (hit rate > 0.95). If comparing the memory capacity to that of a typical

Hopfield network or even that of the Storkey modified version of the Hopfield network [96], the memory capacity surpasses that of any previously demonstrated attractor network. The traditional

Hopfield network has a memory capacity in best case scenarios around 0.15n [94, 95], where n in this case represents the total number of neurons within your network. The Storkey version of the

Hopfield network possesses a higher capacity of around ~0.3n [96]. Even in the example network analyzed in Fig. 31 at the higher value of m shown, the memory capacity for the network peaks around ~0.375n. This increase in memory capacity is due to the network’s structured manner of introducing sparsity within itself. As the number of features per set increases within the network, information has more space to fit within the network.


Figure 31. Hit rate of the segmented attractor network as an increasing number of memories was placed into a network that had its features-per-set count varied from m=1 to m=40 [97]. The value for wON was set to 0.01, and the number of sets (n) was fixed at 10. The number of inputs hidden from the network during analysis (Ihid) was fixed at 5.

d. Implementing a Dataset onto the Segmented Attractor Network

Whilst observing the effects of generic memory capacity on a segmented attractor network is useful for understanding how some of its core mechanics operate, something that could prove more interesting is observing how the network behaves when provided an actual dataset. Randomly generated values provide general insight into how the network operates, but a dataset would provide the network with an environment that follows a set of obvious or non-obvious rules. For example, in the MNIST dataset the network is only provided images of handwritten digits. Images of airplanes or frogs are never given to the network. Other datasets like MSTAR provide images of potential military targets [79]. Not only does the network have to classify the images it is shown, but it also must determine if they are threatening or non-threatening.


For associative memory, the goal is different from that of typical classification neural networks. Associative memory's primary objective is to catalogue and later recall higher-level associations between previously defined abstract ideas or concepts. Therefore, typical classification datasets such as MNIST and its derivatives are not ideal candidates for an evaluation dataset. The segmented attractor network requires a dataset comprised of higher-level information in order to properly test its capabilities. Since there are no notable neural network datasets consisting of such higher-level data, one was created. This dataset is called the European Heads of State (EHoS) dataset. It consists of 700 data points that were/are leaders from various European states throughout history. The dataset is not a comprehensive account of European history, but it highlights the rulers of some of the most notable European states in order to show how some states are linked.

e. The EHoS Dataset

As previously mentioned, the EHoS dataset consists of 700 different data points where each point is a leader. Each leader is categorized by eight traits that will be defined as the sets in the segmented attractor network. These eight sets are:

1. First Name – First name of the given leader.

2. Last Name/Numeral/Title – A common last name, numeral (e.g. Henry I), or title (e.g. Alexander the Great) often used for the ruler.

3. Century of Rule – The century in which most of their rule took place.

4. State – The state which the ruler led.

5. Position – The position the leader held while in charge (e.g. King, President, etc.).

6. Dynasty/Political Party – The dynasty or political party to which the leader belonged.


7. Cause of Death – What caused the death of the leader.

8. Reign (Years) – How many years the leader was in power.

Within the dataset, there are 328 unique first names, 190 unique last names/numerals/titles, 29 unique centuries of rule, 22 unique states, 24 unique positions, 117 unique dynasties/political parties, nine unique causes of death, and 59 unique reign lengths. The number of unique features within each set with respect to the total number of data points within the dataset is defined as a set's "Uniqueness" Factor, or U-Factor. The equation to calculate a set's U-Factor is given by

Ui = fi / Itot    (26),

where fi is the number of unique features within the set, and Itot is the total number of data points within the dataset. The U-Factor of each set within the EHoS dataset is contained in the following table.

Table 6: U-Factor Values for Each Set within the EHoS Dataset

Set             F.Name   L.Name   Century   State   Position   Dynasty   Death   Reign
U-Factor (%)     46.9     27.1      4.1      3.1      3.4       16.7      1.3     8.4

As can be seen in Table 6, a wide range of U-Factors exists within the EHoS dataset. Some sets have a high degree of uniqueness (e.g. First Names) while others are more uniform (e.g. Cause of Death).
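As a worked instance of (26), the First Name set contains 328 unique features out of the dataset's 700 data points, giving U = 328/700 ≈ 0.469, or 46.9%, which matches the first entry of Table 6.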

Another thing to consider when analyzing the overall content of the EHoS dataset is the Hamming distance between each point within the dataset. Hamming distance is commonly defined as the number of positions at which two values differ [80]. For example, the Hamming distance between the two decimal values 114 and 118 is 1, since only a single digit differs between the two numbers. This same principle can be applied to the EHoS dataset, where each data point is compared to every other data point within the dataset to find that specific point's average Hamming distance. A figure that graphically displays the average Hamming distance for all points within the dataset can be seen in Fig. 32.
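A minimal MATLAB sketch of this per-memory averaging is shown below, assuming the dataset has been encoded as a 700 x 8 matrix D of per-set feature indices, one row per leader (the name D is illustrative):

% Average Hamming distance of each leader to all other leaders
numPts = size(D, 1);
avgHamming = zeros(numPts, 1);
for i = 1:numPts
    diffs = sum(D ~= D(i, :), 2);               % row-wise Hamming distances
    avgHamming(i) = sum(diffs) / (numPts - 1);  % self-comparison contributes 0
end
plot(sort(avgHamming));                          % sorted profile, as in Fig. 32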

Figure 32. Average Hamming distance of each memory (leader) within the EHoS dataset with respect to all other memories within the dataset. The memories were then sorted with respect to their average Hamming distance.

Fig. 32 shows that all memories fall along a curve that places the average Hamming distance of each point between 6.5 and 8.0. There are a few outliers on both ends of the dataset, but most values lie upon a roughly linear curve from ~6.75 to ~7.6. The overall average Hamming distance for all points in the EHoS dataset is 7.19. This average means that any arbitrary leader within the dataset will most likely share one common feature with any other random leader (i.e. the closest integer Hamming distance to 7.19 is 7, and there are eight sets total).

This overall average shows that the dataset does possess many internal associations, but also that most leaders are fairly unique with respect to one another.

f. Demonstration of the EHoS Dataset on the Segmented Attractor Network

The SAN designed to handle the EHoS dataset comprises 788 neurons and 441,699 synapses. To obtain these values, the total number of neurons in the network is calculated by the equation

Ntot = Σ_{i=1}^{stot} fi    (27),

where stot represents the number of sets within the SAN (stot=8 for the EHoS dataset) and fi is the number of features within each set (i.e. Table 6). To calculate the number of synapses required for the network, the equation

wtot = Σ_{i=1}^{stot} fi(Ntot − fi)    (28)

is used.
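A minimal MATLAB sketch of these two sizing calculations, using the per-set unique feature counts quoted above (the exact tallies in the dataset file may differ slightly from these rounded narrative counts):

% Per-set feature counts: F.Name, L.Name, Century, State,
% Position, Dynasty, Death, Reign
f = [328 190 29 22 24 117 9 59];
Ntot = sum(f);                  % total neurons, per (27)
wtot = sum(f .* (Ntot - f));    % total synapses, per (28)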

Fig. 33 shows a roughly to-scale diagram of how the SAN relates different sets of information with one another. Sets with many features consume more space within the SAN than those with fewer features, and the empty diagonal varies in size with the feature count of each set. This diagram demonstrates how the content of the information the SAN is meant to remember can influence the design of the network's inner structure.


Figure 33. A roughly to-scale diagram of a SAN that can interpret the EHoS dataset. Many synapses exist for relations between sets with many features (e.g. First/Last Names) while far fewer synapses exist for relations between sets with few features (e.g. State and Cause of Death).

To run the SAN, a code framework was built in MATLAB that was designed to lightly emulate the type of behavior typically seen in the hardware version of the network (similar to the previous generic MATLAB version of the SAN). The entire flow of the program is demonstrated in Fig. 34. The process to evaluate the SAN on the EHoS dataset is lengthy but can be broken down into the initial setup, memory association, memory recall, post-analysis, and post-analysis check stages.

The first of these main stages is brief and mostly involves translating the information from the EHoS dataset into something the SAN can utilize. The initial setup stage first takes in the information from the EHoS dataset (Appendix C) and translates this table into numeric data that can be easily interpreted by the SAN during both association and recall. The network


then has all of its initial parameters defined to formulate what type of simulation is about to occur. Finally, the framework generates a random sequence of memories from the EHoS dataset that will be used during testing. This sequence always includes every single memory exactly one time and is 700 memories long. It should be noted that all tests performed on the SAN in this dissertation use a single random sequence of inputs for consistent comparisons between trials. Not every input sequence generates the same results, but the general behavior should be consistent from trial to trial.

The next stage of analyzing the SAN is the first of two memory processing stages – memory association. An important remark to make at this point is that the SAN here is not analyzed in the way neural networks are normally evaluated. Normally, neural networks go through two separate phases of evaluation: the training phase and the testing phase. In this analysis, training (memory association) and testing (memory recall) occur in an interleaved fashion as the trial progresses. This type of evaluation is more akin to the concept of lifelong learning [98], since the network is evaluated with respect to time.

The memory association phase primarily handles placing the next memory in the sequence into the SAN and then performing any special learning behaviors specified when initializing the network’s parameters. These special learning behaviors will be discussed later in subsection g.

The second of the two memory processing stages is the memory recall stage. In this stage, the network's state is considered "frozen" while the network is evaluated on every memory that has been shown to it so far. During this memory recall process, four of the eight features that comprise each memory are iteratively selected and shown to the network. The network then attempts to perform pattern completion on the remainder of the memory to correctly recall the previously shown memory. In order to comprehensively evaluate every possible pattern completion situation that could be given to the network, every possible combination of four features from each memory is considered (8 choose 4 = 70). If during evaluation two or more features appear tied, the network randomly selects one of the tied features and applies an external input signal to that feature within the SAN. The network then reanalyzes its recalled state to converge to a single array of winning features. This random tiebreaker is necessary in the more theoretical approach to the SAN shown here, but in a hardware implementation of the network it would probably not be necessary. Hardware introduces natural noise to the SAN, which makes ties much more unlikely. Only when the frequency difference between two features falls under the measurement tolerance of the circuit measuring the SAN's neuron output does a tiebreaker need to be performed in hardware.
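A minimal MATLAB sketch of this exhaustive enumeration and tiebreak is shown below (evaluateRecall is a hypothetical stand-in for the evaluation step in Fig. 34, and the tied indices are illustrative):

combos = nchoosek(1:8, 4);            % all 70 ways to show 4 of the 8 sets
for c = 1:size(combos, 1)
    shownSets = combos(c, :);         % sets receiving external input this trial
    % evaluateRecall(shownSets);      % pattern completion per the Fig. 34 flow
end
% Tiebreak: pick one tied feature at random and drive it as external input
tied = [12 47];                       % illustrative indices of tied neurons
pick = tied(randi(numel(tied)));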

The final two stages of analyzing the SAN are very brief and are primarily for record keeping and determining the next step of the analysis. Post-analysis performs three tasks: determining if the network's recalled state matches the memory that was partially provided to the network (i.e. hit rate), checking whether the recalled state has been recalled by the network yet at this point in time and exists within the EHoS dataset, and tallying up how often each memory is recalled at each point in time. Finally, the framework determines whether to go to the next combination of inputs for the current memory being evaluated, move to another memory to recall, show a new memory to the network, or end the simulation.


Figure 34. Flowchart that shows the simulation procedure for running and evaluating the SAN on the EHoS dataset.

For the first demonstration of the SAN on the EHoS dataset, the first test is a basic run of the network with no special learning behaviors included in the architecture. The overall performance of the network is tracked in three separate plots in Figs. 35-37. As will be seen in Figs. 35-37 and in future results from other tests, certain setups of the SAN perform well on some of the three metrics shown (hit rate, unique memory ratio, and recall occurrences) but may not accomplish other things that are desirable in certain applications.

The first metric to analyze the network, hit rate, is shown in Fig. 35 for the first demonstration of the SAN on the EHoS dataset. To determine hit rate, each colored element of the heatmap shown in Fig. 35 is calculated by the equation

Hit Rate = Correct Recalls / Total Recall Trials    (29).


For (29), it is important to note that the correct recalls and the total number of recall trials apply to just that point in the heatmap, which depends on the simulation's current time step and the current combination of inputs being provided to the network. The y-axis in Fig. 35 is listed as the U-Factor Average, with a range of values that are not perfect intervals of one another (but are sorted). These values are the averaged U-Factors of the sets (Table 6) whose features give input in each input combination across the heatmap. Every memory shown to the network up to the current time step gets one trial on every row along the y-axis to exhaustively analyze how the network responds under all circumstances.

One thing that can be observed in Fig. 35 is that the U-Factor average does appear to exert some general control over the hit rate of the network at most time steps of the simulation. This control does not always hold (e.g. U-Factor Average = 0.1338), but a general trend appears as the U-Factor average increases. The loss of dependence on U-Factor average in specific situations might be due to other factors, such as whether the largest/smallest set within the SAN is currently receiving external input, the internal state of the network, etc. Other candidate variables for explaining hit rate were evaluated, but no clear winner emerged, with U-Factor average remaining the best contender for the primary explanatory factor.


Figure 35. Hit Rate as a function of the average recalled U-Factor as memories are shown to the network one-by-one over time.

The next metric that was measured to study the SAN’s performance was determining how many unique memories (i.e. individual memories in the EHoS dataset) the network can recall at each time step. This metric is defined as the unique memory ratio and is calculated by the equation

Unique Memory Ratio = Unique Memories Recalled / Total Memories Shown to Network    (30).


In (30), the denominator term is the total number of memories shown to the network up to the current time step. At time step 1, the denominator is 1. At time step 300, the denominator is 300.

The purpose of this metric is to determine how many of the memories shown to the network still exist within it in a recallable fashion. Even if a memory is recalled only once out of hundreds of different input combinations, it counts towards the unique memory ratio. Since the SAN is shown one memory per time step, a perfect unique memory ratio for the SAN to maintain over time would be 1. However, as Fig. 36 shows, this value is not always maintained as the simulation progresses.

In Fig. 36, a few notable trends can be seen in the unique memory ratio profile. First, the SAN is able to maintain a unique memory ratio of 1 for the first 200-300 time steps of the simulation. The second observation is that the SAN maintains a fairly high unique memory ratio until the end of the simulation. By the 700th time step, the ratio has only dropped to 0.975. This end result means that of the 700 different memories shown to the network over the entire simulation, it can recall 97.5% of them under at least one circumstance. The end trend, however, alludes to potential future issues if the dataset the SAN was interpreting were larger. If the EHoS dataset were doubled or tripled in size, the unique memory ratio might decline further as time progresses. This trend is primarily due to the network slowly filling up with information over time. Eventually, certain information is no longer unique and cannot recall specific memories under any scenario.

The final observation to be made about Fig. 36 is an event that occurs in the middle of the simulation. When the SAN's unique memory ratio initially breaks from the value of 1, it does not break in the direction initially expected. Around the 260th time step, the unique memory ratio climbs above one. If the network is only shown one new memory per time step, how could the unique memory ratio climb above 1? This phenomenon is explainable and will be shown in finer detail in Fig. 37.

Figure 36. The ratio between the number of unique memories the SAN currently holds and the total number of memories shown to the network at that given point in time during the simulation.

The final metric analyzed during the SAN simulation is how often the network recalls every memory ever shown to the network. This metric is called recall occurrences and is qualitatively described as


Recall Occurrences = Number of Recalls for a Specific Memory    (31).

Once again, (31) is measured at the current time step for each memory. Analyzing the network in this manner allows Fig. 37 to display how the SAN recalls and sometimes forgets certain memories over time. Since one new memory is shown to the network per time step, Fig. 37 has a distinctive diagonal that extends from the lower-left to the upper-right corner of the heatmap. The black region in the upper-left portion of the plot exists since those memories have not been shown to the network yet. Other black regions appear for certain memories as they are forgotten as time passes (due to more memories being introduced to the network and overshadowing old ones) or are never able to take root in the network (i.e. solid, black, horizontal lines).

In Fig. 36, the phenomenon of the unique memory ratio exceeding the value of 1 was unexpected. However, in Fig. 37 this occurrence becomes much more apparent, with distinctive red lines appearing within the large black region of the heatmap. These are memories that had not yet been introduced to the SAN, but, due to sheer chance of the network's internal state, a recall output matched a future memory that was going to be shown to the SAN. This effect can be considered a light form of chance prediction. Those recalled states were not accurate estimates of the recall input provided to the network during evaluation but did match another memory within the EHoS dataset.


Figure 37. Recall count heatmap for every memory at each point during the simulation.

In the evaluation of the SAN seen so far, recall has only been performed using 50% recall input (i.e. 4 out of the 8 features of every memory shown to perform pattern completion). This is not the only amount of recall input that can be provided to the network; any number of features between 1 and 7 could be given. To demonstrate how the amount of recall input affects the network, another test was conducted where the simulation from Figs. 35-37 was repeated for all amounts of recall input that can be provided to the network. The results of this analysis can be seen in Figs. 38-40.


In Fig. 38, the hit rate for all amounts of recall input can be seen in a series of seven plots. The resolution of the y-axis changes from plot to plot due to the differing numbers of combinations (e.g. selecting two features out of eight versus five features out of eight). As recall input increases from plot to plot, the general hit rate of the SAN improves. This result is expected: the more information the network is given to recall the original memory, the more likely it is to succeed, since it has to estimate less information.

Figure 38. Composite image of hit rate heatmaps where each heat map represents a different amount of input recall given to the SAN during evaluation.

In Fig. 39, the clear effects of increasing the amount of recall input can be seen in the unique memory ratio. Fig. 39 shows that a near-perfect unique memory ratio can be maintained within the SAN as long as the recall input provided to the network is >50%. It also shows that while some memories might appear to be unrecallable when given certain amounts of recall input, those same memories still exist within the network if more information is externally given.


Figure 39. Plot showing the unique memory ratio of the recall process for the SAN at each time step during seven separate simulations where differing amounts of input recall were given to the network. Both the ratios for the recall inputs of 75% and 87.5% have perfect unique memory ratios of one for the entirety of their trials.

Finally, Fig. 40 shows how the network recalls different memories with differing amounts of recall input. When low amounts of recall input are given, many memories are never recalled and most memories are not recalled very often. However, as recall input approaches 50-62.5%, all or nearly all memories become recallable and are recalled much more often. An interesting trend to observe from 62.5% to 87.5% recall input is that even though all memories are recallable, the number of times each memory is recalled decreases and becomes extremely uniform. This result can be explained by the hit rate at higher amounts of recall input being very high (as shown by Fig. 38), which translates to more uniform, more reliable recalls that do not recall other memories by accident.

Figure 40. Composite image of recall occurrence heatmaps for every individual memory presented to the SAN during seven different trials where varying amounts of input recall were provided in each test.

g. Expanding the Segmented Attractor Network's Behavior

There are several methods that could be utilized to improve the performance of the SAN in various ways. The few focused on here primarily target types of behavior seen in human cognition [99-103]. Each specific memory behavior will be tested individually against the same sequence of the EHoS dataset given in the first example in Figs. 35-37. Then, all behaviors will be combined in order to test how the entire ensemble of behaviors operates together.

The first of the three behaviors targeted in these experiments is the capability of making predictions about what content the network might observe in the future. The implementation method for this behavior is described in Fig. 41a. This behavior enhances the network's capability to combine two or more memories it has seen previously with the memory it is currently being shown. If the previous memories and the current one are similar enough, the output of a neuron will surpass the network's predefined prediction threshold (Pth) and form associations between this neuron and all other neurons currently being provided external input (E) or also currently exceeding Pth. The synaptic connections made between these neurons are no different from normal synaptic connections within the SAN; only the means of formation varies. The threshold Pth does not apply to neurons that are currently receiving external input since they are already forming associative connections with one another. This predictive process is described by the equation

wij = { wij,    fi < Pth and fj < Pth
      { wON,   (fi ≥ Pth and fj = E) or (fi ≥ Pth and fj ≥ Pth)    (32).

It should be noted that these predictive connections, whenever made, will always keep the SAN symmetric with respect to the diagonal (i.e. an even number of synapses will always be programmed during a prediction).
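A minimal MATLAB sketch of the update in (32) is shown below, assuming act holds each neuron's summed feedback, ext is a logical mask of externally driven neurons, and setID maps neurons to sets (all names are illustrative):

predict = (act >= Pth) & ~ext;        % neurons crossing the prediction threshold
members = find(predict | ext);        % predicted neurons plus driven neurons
for a = members(:)'
    for b = members(:)'
        if a ~= b && setID(a) ~= setID(b)  % the SAN has no intra-set synapses
            W(a, b) = wON;                 % program both directions so the grid
            W(b, a) = wON;                 % stays symmetric about the diagonal
        end
    end
end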


Figure 41. Diagrams showing examples of the expanded types of behavior that can be applied to the SAN. (a) When neurons not currently receiving external input are given enough feedback, they can tie themselves to neurons currently receiving external input, allowing the network to form predictive connections. (b) When the feedback to a neuron (not including its external input) reaches a certain point, all synapses associated with that neuron are reset, erasing that content from the SAN's synaptic grid. (c) When using volatile synapses, the SAN passively forgets memories over time if a synapse is not reactivated by a future memory.

This behavior primarily targets the network making a concerted effort to use information it has previously seen to potentially predict future events. These predictions will not always be accurate (and, depending on the dataset, might rarely be), but they could provide the network with some degree of foresight. This type of behavior can be correlated to the human behavior of speculating about or predicting future events via previously observed information [99]. If this type of behavior were implemented in a hardware version of a SAN (such as in [71]), each synaptic device would be programmed by a combination of both the external input to the SAN and the output spike patterns of the neurons within the network. If the output spike frequency of a neuron surpasses a certain threshold, the rate at which the synapse is potentiated can surpass the natural decay rate of the synaptic device and instead place it into a high conductance state. The only extra circuitry required for this operation would be the combinational logic shown in Fig. 42.

Figure 42. Combinational circuitry needed for forming predictive memories. Only one additional AND gate and one additional OR gate would be needed per synapse since each synapse already receives input from one AND gate.

One issue with the SAN having predictive capability is that the network could form too many predictions as time passes. This could cause the network to saturate its synaptic connections, preventing it from drawing any logical conclusion from the information it is presented. Throughout daily activity, humans observe and interact with an overflowing amount of information. Most of the information the human brain encounters is deemed unimportant during processing and is discarded [100, 101]. Sometimes this process does not occur instantly, but at a later time [101]. For example, one possible explanation for why humans dream is that the brain is consolidating and processing information observed during the previous day and discarding the unimportant portions [101]. This process is known as the sleep-wake cycle [100]. If this type of behavior were implemented in some manner in the SAN, it could potentially help with this issue.

To implement this type of behavior in the SAN, an erasing scheme as shown in Fig. 41b was used. For this erasing scheme, another threshold, defined as Eth, was created. This threshold defines the maximum weight sum of the synapses that are post-synaptically connected to a single neuron. If this threshold is crossed, the neuron post-synaptically connected to those synapses will trigger a RESET signal and reset all synapses associated with that neuron, both pre- and post-synaptic. This response means that

wi: = w:i = 0,   if Σ wi: > Eth    (33).

The purpose of this erasing behavior is primarily to remove highly common features that associate many other features together. The aim is to keep the important information in each memory distinct by breaking those ultra-frequent feature ties. This behavior removes information from certain memories so that the entire memory can no longer be recalled in the future, but the remaining portion of the memory should be recalled more clearly.
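A minimal MATLAB sketch of the reset in (33), with W and Eth as defined above (names are illustrative):

overloaded = find(sum(W, 2) > Eth);   % neurons whose weight sum exceeds Eth
for i = overloaded(:)'
    W(i, :) = 0;                      % reset all synapses on this neuron's row
    W(:, i) = 0;                      % and column, i.e. both pre- and post-synaptic
end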

The unfortunate side effect of this behavior is that, if implemented in the hardware version of a SAN, it would require extensive peripheral circuitry to detect programmed synapses and then send reset signals to the appropriate columns/rows of synapses within the synaptic grid. Although not out of the realm of possibility, it complicates the hardware architecture to a fair extent.

The final behavior studied here is simpler and more straightforward than the previous two. Despite the erasing behavior's potential for keeping the network from saturating with information over time, it might not be enough. Eventually, some memories might never be able to take root within the network due to the number of previous memories already present. To combat this potential issue, the simple concept of forgetting memories as time naturally passes can be implemented. This behavior is incredibly easy to implement, as it only requires the synapses within the network to possess a degree of semi-volatile plasticity. Many of the previously shown candidates for synaptic devices [56-58, 71, 75-77] already possess such behavior. No peripheral circuitry would be required to implement this behavior, as the physics of the synaptic devices would act as the source.

To implement a simplified version of the previously described semi-volatile conductance state seen in synaptic devices, a simple equation is applied to the weight of every synapse after each memory is placed into the network (i.e. the passing of time).

wij(t + 1) = wij(t)(1 − rstp)    (34).

This equation emulates, in a simple sense, the state-based decay behavior seen in GSDs, as shown in the model presented in [78]. The term rstp from that model appears within (34), but in a slightly different role than in the GSD model: it still dictates the rate at which each synaptic weight decays over time, but here as the percentage of the device's current state that is lost at each time step.

h. Predicting Future Memories

To enhance the chance predictive behavior previously seen in Fig. 37, the predictive behavior shown in Fig. 41a can be implemented in the SAN in order to potentially predict more future memories. To test this behavior, the SAN was configured to form predictions when necessary with a prediction threshold of Pth=6wON. This threshold means that whenever a neuron not being provided external input has six or more synapses providing feedback directly from neurons that are currently receiving external input, the neuron surpassing Pth will associate itself with all features currently being provided external input.


Another way to describe the prediction formation process is to say that if the Hamming distance between a neuron's synaptic connections and the memory currently being presented to the network is two or less, that feature will associate itself with all features in the currently presented memory.

The same input sequence used prior was used for this test, and the results can be seen in Figs. 43-45. Fig. 43 shows that the hit rate for the SAN does indeed take a hit when using the predictive behavior. Over the course of the simulation, the network attempted to make an extreme number of predictions (527 to be exact). This excess of predictions caused the network to populate its synaptic connections rapidly, resulting in a hit rate that quickly fell over time.


Figure 43. Hit Rate as a function of the average recalled U-Factor as memories are shown to the network one-by-one over time when the network is told to predict potential future memories. For this test, Pth=6wON.

Not only did the network’s overall hit rate decrease with the predictive behavior, but also its number of unique memories it could store at any given point in time. Previously, the unique memory ratio stayed close to 1 until near the end of the simulation (when it only dropped to 0.975).

In the simulation where the SAN was told to predict, the large amount of predictions caused the network to oversaturate its synaptic connections. The oversaturation of connections quickly

110

confused the network to the point where it could not recall many memories even once (as depicted in Fig. 44).

Figure 44. The ratio between the number of unique memories the SAN currently holds and the total number of memories shown to the network at that given point in time during the simulation when the network is told to predict potential future memories. For this test, Pth=6wON.

The complete picture of the network's oversaturation can be seen in Fig. 45, where the memory occurrences over the course of the entire simulation are mapped. As time passes, an increasing number of memories cannot take root within the system at any point in time due to oversaturation, resulting in a very sparse recall map and overall poor performance. The network does correctly predict a couple of memories, as shown by the lines above the diagonal in the upper portion of Fig. 45. These few predictions came at a great cost, however, and they unfortunately were not permanent. It is also not certain whether these predictions were correct due to the predictive behavior or due to sheer chance, as in the previous experiment.

Figure 45. Recall count heatmap for every memory at each point during the simulation when the network is told to predict potential future memories. For this test, Pth=6wON.


If the network were to make fewer predictions, the resulting performance might improve. To test this hypothesis, the same simulation was rerun with Pth=7wON. The value for Pth cannot exceed this value since, when using the EHoS dataset, the network only has a total of eight sets. The results of this simulation can be seen in Figs. 46-48. Over the duration of this simulation, the network still made many predictions, albeit far fewer than in the previous trial (250 for Pth=7wON).

The hit rate shown in Fig. 46 demonstrates a profile that looks strikingly similar to that of the original experiment shown in Fig. 35. This suggests that the reduced number of predictions helped alleviate some of the previous issues with the predictive behavior.


Figure 46. Hit Rate as a function of the average recalled U-Factor as memories are shown to the network one-by-one over time when the network is told to predict potential future memories. For this test, Pth=7wON.

Likewise, in Fig. 47 the unique memory ratio is much improved over the one in Fig. 44. This result again suggests that a lower number of predictions is most likely ideal.


Figure 47. The ratio between the number of unique memories the SAN currently holds and the total number of memories shown to the network at that given point in time during the simulation when the network is told to predict potential future memories. For this test, Pth=7wON.

The recall occurrence map in Fig. 48 shows another profile very similar to that of the original experiment. General recall occurrences are much higher than before, but the network appears to have been less successful at predicting future memories than in the previous trial.


Figure 48. Recall count heatmap for every memory at each point during the simulation when the network is told to predict potential future memories. For this test, Pth=7wON.

Although the performance of the predictive behavior is not ideal, part of this issue could be attributed to the EHoS dataset. Many specific features within the EHoS dataset are highly unique (e.g. first names, last names, dynasties, etc.) and might only occur a few times within the dataset. Other attributes, such as cause of death, state, etc., have much lower U-Factors. This variance in U-Factors across different sets of information means that some specific features of memories could be predicted correctly (i.e. features belonging to lower U-Factor sets) while others are not. The way the SAN determines a recalled memory requires the entire memory to be recalled, not only a fraction of the original features, so some of those misses could be attributed to cases where the network was off by only one or two features. In datasets where the U-Factor for each set is lower, predictions could be made more effectively. A lower U-Factor for every set, however, means fewer features per set, which points to the overall network being smaller. Smaller networks cannot store as much information as larger ones [97], which would cause their synaptic grids to saturate much faster. In general, a behavior should be developed to maintain the network over time as the synaptic grid begins to saturate. This behavior should accommodate the network not only in cases where the predictive behavior is being used, but also when utilizing its baseline functionality as shown in Figs. 35-37.

i. Erasing Common Information Observed

To tend to the SAN’s saturation of the synaptic grid issue over time, the erasing behavior shown in Fig. 41b can be implemented on the base functionality level of the network (i.e. not prediction behavior being used for now). To test this behavior, the same memory input sequence was used once more with an erase threshold defined (as previously described) of Eth=400wON. The results of this simulation can be seen in Figs. 49-51.

Starting with the hit rate as before, an immediate observation can be made in Fig. 49 partway through the simulation, between the 400th and 500th time steps. The erase threshold defined for this simulation is quite large, and the anomaly that appears at this point is the only erasure event that occurs within this trial. Up until the erasure event, the simulation proceeded exactly as the base simulation shown in Fig. 35. After the event, the overall hit rate for the network took an immediate dive. In the short term, this erasure event degrades the performance of the network. However, with the commonly occurring feature having its synaptic connections reset after this point, there is more room in the network for future memories to take root. As time passes in Fig. 49, the overall hit rate for the network increases instead of decreasing since new memory space has been made. Once the final memory has been introduced to the network, the hit rate approaches the original hit rate from the base functionality shown in Fig. 35.

Figure 49. Hit Rate as a function of the average recalled U-Factor as memories are shown to the network one-by-one over time when the network is told to erase commonly occurring features. For this test, Eth=400wON.


A more impressive depiction of the SAN's recovery after the erasure event can be seen in the unique memory ratio shown in Fig. 50. Once the erasure event occurs, the network's number of stored unique memories takes a steep, ~8% hit. However, as time passes, the unique memory ratio quickly recovers back to a level of 0.98 (i.e. 98%). In Fig. 36, the final unique memory ratio for the base-level functionality of the SAN was 0.975, or 97.5%. From this result it can be concluded that the SAN, when using the erasing behavior, can store at least a slightly higher number of unique memories than when not using this behavior.

Figure 50. The ratio between the number of unique memories the SAN currently holds and the total number of memories shown to the network at that given point in time during the simulation when the network is told to erase commonly occurring features. For this test, Eth=400wON.

To further observe how the erasing behavior affects the network, the memory occurrence map shown in Fig. 51 gives an interesting picture of how the erasure event shifts which memories are recalled. After the event, some memories that had become unrecallable suddenly became recallable again, while others that were still recallable at the time of the event became unrecallable. Over time, however, more newly introduced memories were able to take root in the network, and other memories that had previously become unrecallable returned because memories with low Hamming distances to the previously erased ones were shown to the network after the erasure event.


Figure 51. Recall count heatmap for every memory at each point during the simulation when the network is told to erase commonly occurring features. For this test, Eth=400wON.

To further study the interesting results of the erasing behavior, another simulation was conducted with a much lower erase threshold of Eth=200wON. The same memory sequence was shown to the network as before. The results of this trial can be seen in Figs. 52-54. Over the course of this simulation, a total of 14 erasure events occurred due to the much lower erase threshold.

The hit rate shown in Fig. 52 for the lower threshold erasing behavior is in constant flux due to the number of erasure events occurring. One common theme, however, is that the hit rate tends to recover or increase after each event. These events eventually occur to such an extent that, by the end of the simulation, they have severely damaged the network's overall hit rate compared to the higher threshold trial demonstrated in Fig. 49.

Figure 52. Hit Rate as a function of the average recalled U-Factor as memories are shown to the network one-by-one over time when the network is told to erase commonly occurring features. For this test, Eth=200wON.


Likewise, the unique memory ratio for this lower erase threshold, shown in Fig. 53, tells a similar story of constant erasure events occurring within the network, preventing it from ever reaching high unique memory ratio values. At the end of the simulation, two erasure events happen in quick succession, gravely damaging the number of unique memories in the network to a point from which it only partially recovers.

Figure 53. The ratio between the number of unique memories the SAN currently holds and the total number of memories shown to the network at that given point in time during the simulation when the network is told to erase commonly occurring features. For this test, Eth=200wON.

The network's drift toward recalling specific memories can be seen in the memory occurrence map shown in Fig. 54. As each erasure event occurs, certain memories become either temporarily or permanently unrecallable. However, newer memories almost always take root after each event. This phenomenon can be observed in the right-angle triangle pattern that closely follows the diagonal across Fig. 54.

Figure 54. Recall count heatmap for every memory at each point during the simulation when the network is told to erase commonly occurring features. For this test, Eth=200wON.


From the two simulations conducted on the erase behavior described in Fig. 41b and shown in Figs. 49-54, a few interesting conclusions can be made. First, they demonstrate the potential of such a maintenance scheme for the SAN in long-term application spaces such as lifelong learning. The EHoS dataset only has 700 memories (or data points) within it, which is not a negligible amount, but not a massive one either. Some of the sets of data within the EHoS dataset also have very high U-Factors, which means the network naturally has more space to store diverse information. In a scenario where the SAN is used for long periods of time, if the network were given a large amount of data, or a dataset with sets that had universally low U-Factors, the SAN could quickly saturate its synaptic grid with too much information. The network would quickly become confused (in a manner like the prediction behavior seen in Figs. 43-45) and could not recall important information about the items it is supposed to remember. The erase behavior demonstrates a useful scheme for keeping the network away from its saturated state while maintaining higher performance.

Second, the change in performance between the first (Eth=400wON) and second (Eth=200wON) erase behavior trials supports another important conclusion. The erase behavior can be very beneficial to network performance; however, if used too often, it can decrease the network's recall capability to a point where it never recovers before the next erasure event occurs. This result means that the erase threshold should be kept relatively high with respect to the amount of data provided to the network over time to ensure high performance.

Finally, the last conclusion that can be drawn from the erase behavior tests concerns how receptive the network is to new memories over time. In the original base behavior run (Fig. 37), memories began to appear that could never take root in the network as time passed (i.e., the solid black lines that persist in Fig. 37 until the end of the simulation). If it is desirable to ensure that every memory exists in the network for at least some period, it is necessary to remove these events. The erase behavior reduces this phenomenon to a degree, but it does not remove it completely. When Eth=400wON, Fig. 51 shows fewer of these events occurring, but some persist. When Eth is lowered to 200wON in Fig. 54, only a few black lines remain. If the erase threshold were lowered even further, this issue of new memories failing to take root would presumably disappear, but at the cost of general network performance. Also, as described previously, implementing the erase behavior in hardware would be quite expensive from a space, processing, and power perspective due to the amount of peripheral circuitry required to monitor, detect, and perform the erase behavior.

Could there be a simpler way to fix the new memory problem? Luckily, the behavior described in Fig. 41c answers this question.

j. Forgetting Memories Over Time

If it is necessary to ensure that every single new memory shown to the network is immediately recallable, the forgetting behavior described in Fig. 41c can answer that problem with ease. As previously discussed, many synaptic device candidates are volatile to some extent by nature. In hardware, this volatile nature can be harnessed so that the SAN slowly "forgets" memories over time. To demonstrate this behavior in action, the same memory sequence as before was given to the SAN when told to forget with rstp=0.005 (0.5%). This value of rstp means that between every time step in the simulation (i.e., between each memory being exposed to the SAN), every synaptic connection within the SAN's synaptic grid loses 0.5% of its current weight value. In this version of the SAN, synaptic connections no longer remain at wON indefinitely; they instead start at wON once programmed and then decay at a rate of 0.5% of their current value at each time step.
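The per-step decay rule just described can be captured in a one-line update. The snippet below is a minimal Python sketch of that rule (assuming the synaptic grid W is a NumPy array); it mirrors the text's description rather than the hardware implementation:

import numpy as np

# Forgetting behavior: between consecutive time steps, every synaptic
# weight loses the fraction rstp of its current value.
def decay_step(W, rstp=0.005):
    W *= (1.0 - rstp)   # each weight keeps 99.5% of its value per step
    return W

# A connection programmed to w_on therefore follows w_on*(1-rstp)**n after
# n time steps, decaying toward zero instead of holding at w_on forever.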


The results of this simulation are contained in Figs. 55-57. One immediate observation about the hit rate seen in Fig. 55 is that it drops to a low value very quickly past the early stages of the simulation. For this behavior, however, this result is not of much concern. The behavior forgets previous information over time by design, so this hit rate is expected and should not be taken as a failure metric for the behavior.

Figure 55. Hit Rate as a function of the average recalled U-Factor as memories are shown to the network one-by-one over time when the network is told to decay its synaptic connections over time. For this test, rstp=0.005 (0.5%).


Once again, in Fig. 56, the unique memory ratio continually decreases as time passes. The network was able to maintain a high unique memory ratio for a period before the decrease, however. This initial retention is due to the slow rate of decay of the synapses. Higher values of rstp would result in a sharper and faster decrease in the unique memory ratio in addition to a lower floor for the ratio value. As can be seen in Fig. 56, a trend resembling such a floor does begin to appear at the end of the simulation. The level of this floor would be dependent upon the value of rstp, since it dictates how quickly memories fade from the network over time.
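As a rough illustration of how rstp sets the pace of forgetting, a weight programmed to wON decays to wON(1-rstp)^n after n time steps. With rstp=0.005, a connection falls to half of wON after n = ln(0.5)/ln(0.995), or roughly 138 time steps, while rstp=0.015 halves it in only about 46 steps. This is why the higher decay rate examined next produces both a faster drop and a lower floor.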

Figure 56. The ratio between the number of unique memories the SAN currently holds and the total number of memories shown to the network at that given point in time during the simulation when the network is told to decay its synaptic connections over time. For this test, rstp=0.005 (0.5%).

The floor of the unique memory ratio can be seen more clearly if the decay rate of the synapses in the system is increased to rstp=0.015 (1.5%). With this increase, a very distinct floor appears just above the 30% unique memory ratio mark.

Figure 57. The unique memory ratio for the SAN when its rstp value is set to 0.015 (1.5%).

Finally, Fig. 58 shows the critical behavior that the forgetting behavior is designed to exhibit. As time passes for the SAN, new memories are introduced to the network while other, older memories begin to fade. The memory "band" that forms across the diagonal of Fig. 58 shows a sort of temporal forgetting shelf that generally keeps in step with the diagonal. Past that shelf, it would be logical to assume that a memory is forgotten forever. That hypothesis is not what is observed in Fig. 58, however. Fig. 58 shows that once memories appear to be "forgotten" past the shelf, they can periodically reappear as recallable memories at later times. This phenomenon is very interesting and deserves explanation.

Figure 58. Recall count heatmap for every memory at each point during the simulation when the network is told to decay its synaptic connections over time. For this test, rstp=0.005 (0.5%).

The previously described "shelf" where memories seem to be forgotten is much more apparent when the SAN is re-run with the higher decay rate of rstp=0.015 (1.5%). Fig. 59 shows the existence of this shelf in a much more prominent fashion. Plenty of noise still exists after the shelf, however, where memories are intermittently remembered and forgotten as other memories are exposed to the network.

Figure 59. Recall count heatmap for every memory at each point during a simulation of the SAN when told to decay at a value of rstp=0.015 (1.5%).

From the forgetting behavior tested and shown here, two critical conclusions can be made. The first involves what the forgetting behavior was initially designed to solve. As can be seen in Fig. 58, every memory can be recalled for at least some period once introduced; no more solid black lines exist in the map. This result shows that the forgetting behavior could be an easy alternative for ensuring the immediate recallability of memories within the SAN. A side conclusion from this result is that the behavior could also be useful in solving the memory saturation problem if the SAN were provided an extensive amount of data (i.e., larger than the EHoS dataset).

The second critical conclusion that can be drawn from the forgetting behavior is that memories initially seem "forgotten for good," but then reappear later in time. This result might appear odd at first, but it makes sense when considering the information in future memories and its Hamming distance to the information in previously "forgotten" memories.

When a memory is "forgotten" in the SAN in Figs. 55-59, the term "forgotten" does not mean it is completely wiped from memory. Instead, the memory is simply weaker in strength than the other memories in the SAN's recent memory, so it cannot be recalled.

However, if a new memory is exposed to the network that has a low Hamming distance to a previously observed memory that appears forgotten (i.e., the two memories are incredibly similar), enough of the synaptic connections associated with that old, forgotten memory are reactivated to once again enable the old memory to be fully recalled. A human neurological analogy to this phenomenon is a person remembering a past event that is tied to specific stimuli such as a smell or sound. Despite that memory being mostly forgotten for long periods of time, those specific smells or sounds can trigger the old memory to be seemingly remembered out of nowhere.

Examples of this phenomenon in humans widely range from sudden hits of nostalgia [102] to extreme moments of post-traumatic stress disorder [103]. The behavior demonstrated in Fig. 58 is a result analogous to those phenomena.
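The Hamming distance test underlying this reactivation effect is simple to state concretely. Below is an illustrative Python helper, not taken from the dissertation, using hypothetical EHoS-style feature tuples:

# Hamming distance between two memories encoded as equal-length tuples:
# the number of feature positions at which they differ.
def hamming_distance(mem_a, mem_b):
    assert len(mem_a) == len(mem_b)
    return sum(a != b for a, b in zip(mem_a, mem_b))

# A new memory at Hamming distance 1 from a faded one re-potentiates most
# of the faded memory's synapses, which is how a "forgotten" memory can
# suddenly become recallable again.
new_mem = ("Charles", "I", "17th AD", "England", "King", "Stuart")
old_mem = ("James",   "I", "17th AD", "England", "King", "Stuart")
print(hamming_distance(new_mem, old_mem))  # -> 1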


k. Using Behavior Ensembles

Up to this point, the extra behaviors for the SAN have only been studied on their own. These behaviors can, however, be used with one another to obtain combined behaviors that might be beneficial. Using all three extra behaviors together is an experiment worth performing and will be the first behavior ensemble studied in this subsection.

For the full ensemble test of the SAN, the simulation parameter values of Pth=6wON, Eth=200wON, and rstp=0.005 were used. When all three behaviors are used together in a single simulation of the SAN, the end results across Figs. 60-62 appear like an averaged version of the previous plots that showed the individual learning behaviors. Fig. 60 displays a very modest hit rate performance for the network, akin to either the predictive behavior's hit rate or the forgetting behavior's hit rate. A single erasure event can be seen occurring just past the 300th time step. The lack of further erasure events in this version of the network can be explained by the forgetting behavior preventing Eth from being exceeded. Unfortunately, only small increases in hit rate with respect to time due to erasure events can be seen at high U-Factor averages before they quickly disappear under the transient decay of old weights caused by the forgetting behavior. It should also be noted that the predictive behavior made the same number of predictions that it did when previously used by itself (527), so this might have been a contributing factor to the hit rate seen in Fig. 60.
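To make the interplay of the three behaviors concrete, the loop below is a hedged Python sketch of one possible ensemble simulation step ordering: learn, predict, erase, then forget. The learning, prediction, and erase rules shown are simplified stand-ins with hypothetical names; the dissertation's simulator may order and implement these steps differently:

import numpy as np

def run_ensemble(memories, W, w_on, p_th=6.0, e_th=200.0, rstp=0.005):
    """memories: list of feature-index tuples; W: feature-by-feature grid."""
    predictions, erasures = 0, 0
    for mem in memories:
        for i in mem:                 # learn: potentiate pairwise connections
            for j in mem:             # between the memory's features
                if i != j:
                    W[i, j] = w_on
        # predict (placeholder rule): features whose accumulated weight
        # exceeds Pth are linked together as a guessed future memory
        hot = np.flatnonzero(W.sum(axis=0) > p_th * w_on)
        predictions += int(hot.size > 1)
        # erase: clear features whose accumulated weight exceeds Eth
        common = np.flatnonzero(W.sum(axis=0) > e_th * w_on)
        W[:, common] = 0.0
        W[common, :] = 0.0
        erasures += int(common.size > 0)
        W *= (1.0 - rstp)             # forget: every weight decays per step
    return predictions, erasures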


Figure 60. Hit Rate as a function of the average recalled U-Factor as memories are shown to the network one-by-one over time when the network is told to use all three behaviors. For this test, Pth=6wON, Eth=200wON, and rstp=0.005 (0.5%).

When studying the unique memory ratio for the network, a mix of previous patterns can once again be seen in Fig. 61. The forgetting behavior drops the network's unique memory ratio slowly over time, with the erasure event near the middle of the simulation causing a quick drop followed by a gradual increase in the unique memory ratio afterwards. This increase is quickly overwhelmed, however, by the transient decay of the system, as was shown in Fig. 60.

Figure 61. The ratio between the number of unique memories the SAN currently holds and the total number of memories shown to the network at that given point in time during the simulation when the network is told to use all three behaviors. For this test, Pth=6wON, Eth=200wON, and rstp=0.005 (0.5%).

When observing the recall occurrence heatmap for the simulation in Fig. 62, evidence of all three behaviors can be seen in certain parts of the figure. The erasure event is very apparent near the middle of the simulation, the network's older memories decrease in recall frequency as time progresses, and a handful of predicted memories occur above the plot's diagonal. These predictions are unfortunately still rare, and some of them are temporary. The question to ask at this point regarding the predictive behavior is whether the predictions appear due to the behavior or due to random chance.

Figure 62. Recall count heatmap for every memory at each point during the simulation when the network is told to use all three behaviors. For this test, Pth=6wON, Eth=200wON, and rstp=0.005 (0.5%).


The way to test whether the predictive behavior is assisting the network in making predictions is to perform a second ensemble test that includes only the erase and forgetting behaviors and excludes the predictive behavior. The results of this version of the ensemble test can be seen in Figs. 63-65.

As can be seen in Fig. 63, the hit rate results appear nearly identical to those previously seen in Fig. 60. This is evidence that the predictive behavior had little effect on the network's overall hit rate when used in tandem with the other behaviors. This result is a positive one in showing that the predictive behavior, when used with other behaviors, can have one of its original detrimental effects mitigated.


Figure 63. Hit Rate as a function of the average recalled U-Factor as memories are shown to the network one-by-one over time when the network is told to only slowly decay its synaptic connections over time and erase commonly occurring features. For this test, Eth=200wON and rstp=0.005 (0.5%).

Once again, in Fig. 64, the unique memory ratio obtained when not using the predictive behavior is almost the same. In Fig. 44, it was shown that the unique memory ratio would drop drastically when specifically using the predictive behavior. The comparison with Fig. 64 demonstrates that the predictive behavior once more appears to have some of its previous negative effects removed.

Figure 64. The ratio between the number of unique memories the SAN currently holds and the total number of memories shown to the network at that given point in time during the simulation when the network is told to only slowly decay its synaptic connections over time and erase commonly occurring features. For this test, Eth=200wON and rstp=0.005 (0.5%).

The recall occurrence heatmap in Fig. 65 once again looks strikingly similar to the one shown in Fig. 62. There are small differences hidden within the initial recall values for some memories along the diagonal, but they are minor. The real conclusion to be taken from this figure is that the predictions that appeared in Fig. 62 all still appear here. This means those predictions were due to random chance, and not due to the predictive behavior. Even though the predictive behavior had its negatives removed, its one beneficial trait did not show promise either.

Figure 65. Recall count heatmap for every memory at each point during the simulation when the network is told to only slowly decay its synaptic connections over time and erase commonly occurring features. For this test, Eth=200wON and rstp=0.005 (0.5%).


As one final attempt to demonstrate a positive outcome of the predictive behavior on the network's long-term performance, one last behavior ensemble simulation was conducted where the network was told to form predictions (with Pth=6wON) and also erase commonly occurring features within the SAN (with Eth=200wON). During this test, the internal state of the SAN's synaptic grid became a battleground between the SAN attempting to make many predictions as it encountered data over time and the erase behavior removing information that was quickly filling the SAN's memory. The hit rate for this simulation is displayed in Fig. 66. It demonstrates a hit rate pattern similar to the previous erase behavior hit rate shown in Fig. 52, though variations in the pattern are present. The prediction behavior in this simulation attempted to make 460 predictions. This amount is reduced from previous trials, likely because the erase behavior constantly erased commonly occurring data (i.e., 36 different erasure events occurred during the length of the trial).


Figure 66. Hit rate heatmap of the SAN when it was told to perform predictions and erase commonly occurring features within the synaptic grid. For this test, Eth=200wON and Pth=6wON.

A version of the unique memory ratio that appears as a mix between the previously shown erase and prediction behaviors appears in Fig. 67. After the catastrophic drop in unique memories maintained between the 200th and 300th time steps of the simulation, the ratio slowly dropped, with a couple of intermittent drops of ~10-15% along the curve.


Figure 67. Unique memory ratio for the SAN when told to perform predictions and erase commonly occurring features within the synaptic grid. For this test, Eth=200wON and Pth=6wON.

Finally, Fig. 68 shows the recall occurrence heatmap for this ensemble of behaviors. One observation is a higher number of correctly predicted memories in the black region of the heatmap than in the two previous ensemble trials. Unfortunately, these predictions were all temporary and did not persist within the system. However, this piece of evidence does give a small amount of promise to the predictive behavior's potential in other situations.


Figure 68. Recall occurrence heatmap for the SAN when told to perform predictions and erase commonly occurring features within the synaptic grid. For this test, Eth=200wON and Pth=6wON.

One potential shortfall of the predictive behavior used in the SAN demonstrated here is that, if Fig. 32 is taken into account, there are no memories in the entire EHoS dataset with an average Hamming distance below six from all other memories in the dataset. In order for a prediction to be made accurately within the dataset, a total of three memories must be alike enough; two of these memories act as "predictors" for the later-occurring third memory. There are isolated cases of high likeness within the dataset, but they are not common. In a dataset where items have a lower average Hamming distance from one another, the predictive behavior could possibly make more accurate predictions than the ones shown here (which often appear to happen by chance). Fig. 68 does lend some promise to the predictive behavior when used specifically with the erase behavior, but the ratio of predictions the behavior makes to the number of accurate guesses is most likely still too high.
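This suitability check can be automated before choosing to enable the predictive behavior. The helper below is a hypothetical Python sketch, not taken from the dissertation, for computing each memory's average Hamming distance to every other memory in a dataset:

# Illustrative check of whether a dataset can support the prediction
# behavior: each memory's average Hamming distance to all other memories.
def avg_hamming_distances(memories):
    out = []
    for m in memories:
        dists = [sum(a != b for a, b in zip(m, other))
                 for other in memories if other is not m]
        out.append(sum(dists) / len(dists))
    return out

# If min(avg_hamming_distances(dataset)) stays above ~6, as Fig. 32 reports
# for EHoS, accurate predictions are unlikely to arise except by chance.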


VII. Conclusion

During the design process for any engineering system, the individual(s) designing the system must choose where to start and end the design process within the system's hierarchy. Some systems require a top-down approach, where applications or theory at the higher levels of system complexity guide the design process due to requirements or restraints placed on the system. Other systems require a bottom-up approach, where certain fundamental rules at the lowest levels of the system hierarchy constrain or guide the tasks the system can perform. The design process demonstrated in this dissertation has been of the latter type, where the innate behavior and design of low-level devices within the system guide its overall nature.

Through the design of the SR Octopus Retina neuron circuit and the innate behavior of the gated-synaptic devices demonstrated in this dissertation, these two core building blocks laid a ruleset showing how fundamentals at the device and circuit level of a neuromorphic system can shape how it behaves at the macroscale. From the initial work that utilized the double-gated synaptic device model to demonstrate a neuromorphic architecture for basic navigation tasks, to the demonstration of the segmented attractor network using a behavioral model for GSDs and dataset-level evaluation, systems were shown that can asynchronously process the information provided to them for associative learning tasks.

Developing architectures and systems that demonstrate associative learning behavior will be a core design paradigm for future AI systems built for complex tasks in lifelong learning situations. As neuromorphic architectures and neural networks become increasingly complex in future years, techniques must be developed to handle the increasing amount of information these systems output in order to display emergent behaviors that living organisms such as humans exhibit but that our current AI does not. The associative learning system shown here is one such system, able to analyze high-level concepts, tasks, and ideas and relate them to one another to perform higher-level actions in a continuous manner.


References

[1] M. M. Waldrop, "The chips are down for Moore's law," Nature News, vol. 530, no. 7589, 2016, p. 144.
[2] J. von Neumann, "First Draft of a Report on the EDVAC," University of Pennsylvania Moore School Library, Contract No. W-670-ORD-4926, 1945.
[3] D. A. Patterson and J. L. Hennessy, Computer Organization and Design, 4th edition, Morgan Kaufmann, 2011.
[4] J. Backus, "Can programming be liberated from the von Neumann style? A functional style and its algebra of programs," Communications of the ACM, vol. 21, no. 8, 1978, pp. 613-641.
[5] J. Edwards and S. O'Keefe, "Eager recirculating memory to alleviate the von Neumann bottleneck," 2016 IEEE Symposium Series on Computational Intelligence (SSCI), IEEE, 2016.
[6] M. Naylor and C. Runciman, "The Reduceron: Widening the von Neumann bottleneck for graph reduction using an FPGA," Symposium on Implementation and Application of Functional Languages, Springer, Berlin, Heidelberg, 2007.
[7] D. Ielmini and H.-S. P. Wong, "In-memory computing with resistive switching devices," Nature Electronics, vol. 1, no. 6, 2018, pp. 333-343.
[8] C. J. Thompson, S. Hahn, and M. Oskin, "Using modern graphics architectures for general-purpose computing: a framework and analysis," 35th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-35), IEEE, 2002.
[9] C. Chen et al., "GFlink: An in-memory computing architecture on heterogeneous CPU-GPU clusters for big data," IEEE Transactions on Parallel and Distributed Systems, vol. 29, no. 6, 2018, pp. 1275-1288.
[10] A. K. Fidjeland and M. P. Shanahan, "Accelerated simulation of spiking neural networks using GPUs," The 2010 International Joint Conference on Neural Networks (IJCNN), IEEE, 2010.
[11] A. K. Jain, J. Mao, and K. M. Mohiuddin, "Artificial neural networks: A tutorial," Computer, vol. 29, no. 3, 1996, pp. 31-44.
[12] E. Nurvitadhi et al., "Accelerating binarized neural networks: Comparison of FPGA, CPU, GPU, and ASIC," 2016 International Conference on Field-Programmable Technology (FPT), IEEE, 2016.
[13] M. Motamedi et al., "Design space exploration of FPGA-based deep convolutional neural networks," 2016 21st Asia and South Pacific Design Automation Conference (ASP-DAC), IEEE, 2016.
[14] Q. Liu et al., "Live Demonstration: Face Recognition on an Ultra-low Power Event-driven Convolutional Neural Network ASIC," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2019.
[15] C. D. Schuman et al., "A survey of neuromorphic computing and neural networks in hardware," arXiv preprint arXiv:1705.06963, 2017.
[16] G. B. Penn et al., "System and method for applying a convolutional neural network to speech recognition," U.S. Patent No. 9,734,824, 15 Aug. 2017.
[17] M. Lu et al., "Stock index prediction method and device based on neural network model and time series," CN Patent No. CN110264352A, 20 Sep. 2019.
[18] X. Zhang et al., "High-performance video content recognition with long-term recurrent convolutional network for FPGA," 2017 27th International Conference on Field Programmable Logic and Applications (FPL), IEEE, 2017.
[19] A. Bain, Mind and Body: The Theories of Their Relation, vol. 4, Henry S. King, 1873.
[20] W. James, The Principles of Psychology, vol. 2, H. Holt, 1890.

[21] C. S. Sherrington, "Experiments in examination of the peripheral distribution of the fibres of the posterior roots of some spinal nerves. Part II," Philosophical Transactions of the Royal Society B, vol. 190, 1898. DOI: 10.1098/rstb.1898.0002
[22] W. S. McCulloch and W. Pitts, "A logical calculus of the ideas immanent in nervous activity," The Bulletin of Mathematical Biophysics, vol. 5, 1943, pp. 115-133. DOI: 10.1007/BF02478259
[23] D. O. Hebb, The Organization of Behavior, a Neuropsychological Theory, John Wiley & Sons, Inc., 1949.
[24] A. L. Hodgkin and A. F. Huxley, "Currents carried by sodium and potassium ions through the membrane of the giant axon of Loligo," The Journal of Physiology, vol. 116, no. 4, 1952, pp. 449-472.
[25] Ö. Yildirim et al., "Simulation of biological networks," Anatomy: International Journal of Experimental & Clinical Anatomy, vol. 13, 2019.
[26] F. Rosenblatt, "The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain," Cornell Aeronautical Laboratory, Psychological Review, vol. 65, no. 6, 1958, pp. 386-408. DOI: 10.1037/h0042519
[27] F. Silva et al., "Perceptrons from memristors," Neural Networks, vol. 122, 2020, pp. 273-278.
[28] H. J. Kelley, "Gradient theory of optimal flight paths," ARS Journal, vol. 30, no. 10, 1960, pp. 947-954.
[29] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, "Learning representations by back-propagating errors," Nature, vol. 323, no. 6088, 1986, pp. 533-536.
[30] R. Dechter, "Learning While Searching in Constraint-Satisfaction-Problems," Proceedings of the 5th National Conference on Artificial Intelligence, 1986, pp. 178-185.
[31] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, MIT Press, 2016.
[32] C. Mead, Analog VLSI and Neural Systems, Addison-Wesley, 1989.
[33] R. Sarpeshkar, L. Watts, and C. Mead, "Refractory neuron circuits," CNS Technical Report CNS-TR-92-08, California Institute of Technology, 1992.
[34] J. Lazzaro and J. Wawrzynek, "Low-power silicon neurons, axons and synapses," Silicon Implementation of Pulse Coded Neural Networks, Springer, Boston, MA, 1994, pp. 153-164.
[35] G. Indiveri, "A low-power adaptive integrate-and-fire neuron circuit," Proceedings of the 2003 International Symposium on Circuits and Systems (ISCAS '03), vol. 4, IEEE, 2003.
[36] T. Morie et al., "A multinanodot floating-gate MOSFET circuit for spiking neuron models," IEEE Transactions on Nanotechnology, vol. 2, no. 3, 2003, pp. 158-164.
[37] A. Afifi, A. Ayatollahi, and F. Raissi, "Implementation of biologically plausible spiking neural network models on the memristor crossbar-based CMOS/nano circuits," 2009 European Conference on Circuit Theory and Design, IEEE, 2009.
[38] J. Hsu, "IBM's new brain [News]," IEEE Spectrum, vol. 51, no. 10, 2014, pp. 17-19.
[39] G. Indiveri, F. Corradi, and N. Qiao, "Neuromorphic architectures for spiking deep neural networks," 2015 IEEE International Electron Devices Meeting (IEDM), Washington, DC, 2015, pp. 4.2.1-4.2.4. DOI: 10.1109/IEDM.2015.7409623
[40] C.-K. Lin et al., "Programming spiking neural networks on Intel's Loihi," Computer, vol. 51, no. 3, 2018, pp. 52-61.

[41] M. Motamedi et al., "Design space exploration of FPGA-based deep convolutional neural networks," 2016 21st Asia and South Pacific Design Automation Conference (ASP-DAC), IEEE, 2016.
[42] M. Davies et al., "Loihi: A neuromorphic manycore processor with on-chip learning," IEEE Micro, vol. 38, no. 1, 2018, pp. 82-99.
[43] S. Ghosh-Dastidar and H. Adeli, "Spiking neural networks," International Journal of Neural Systems, vol. 19, no. 4, 2009, pp. 295-308.
[44] B. V. Benjamin et al., "Neurogrid: A mixed-analog-digital multichip system for large-scale neural simulations," Proceedings of the IEEE, vol. 102, no. 5, 2014, pp. 699-716.
[45] L. Chua, "Memristor-the missing circuit element," IEEE Transactions on Circuit Theory, vol. 18, no. 5, 1971, pp. 507-519.
[46] D. B. Strukov et al., "The missing memristor found," Nature, vol. 453, no. 7191, 2008, pp. 80-83.
[47] A. C. Torrezan et al., "Sub-nanosecond switching of a tantalum oxide memristor," Nanotechnology, vol. 22, no. 48, 2011.
[48] A. J. Rush and R. Jha, "NbOx synaptic devices for spike frequency dependent plasticity learning," 2017 75th Annual Device Research Conference (DRC), South Bend, IN, 2017, pp. 1-2. DOI: 10.1109/DRC.2017.7999430
[49] H. Nili et al., "Nanoscale resistive switching in amorphous perovskite oxide (a-SrTiO3) memristors," Advanced Functional Materials, vol. 24, no. 43, 2014, pp. 6741-6750.
[50] S. Park et al., "Programmable analogue circuits with multilevel memristive device," Electronics Letters, vol. 48, no. 22, 2012, pp. 1415-1417.
[51] C.-M. Jung, J.-M. Choi, and K.-S. Min, "Two-step write scheme for reducing sneak-path leakage in complementary memristor array," IEEE Transactions on Nanotechnology, vol. 11, no. 3, 2012, pp. 611-618.
[52] I. E. Ebong and P. Mazumder, "Self-controlled writing and erasing in a memristor crossbar memory," IEEE Transactions on Nanotechnology, vol. 10, no. 6, 2011, pp. 1454-1463.
[53] Y. Li et al., "Ultrafast synaptic events in a chalcogenide memristor," Scientific Reports, vol. 3, 2013.
[54] D. Sacchetto, G. De Micheli, and Y. Leblebici, "Multiterminal memristive nanowire devices for logic and memory applications: A review," Proceedings of the IEEE, vol. 100, no. 6, 2011, pp. 2008-2020.
[55] S. Lim, J.-H. Bae, J.-H. Eum, S. Lee, C.-H. Kim, D. Kwon, and J.-H. Lee, "Hardware-based Neural Networks using a Gated Schottky Diode as a Synapse Device," 2018 IEEE International Symposium on Circuits and Systems (ISCAS), May 2018, pp. 1-5. DOI: 10.1109/ISCAS.2018.8351152
[56] B. J. Murdoch, T. J. Raeber, Z. C. Zhao, A. J. Barlow, D. R. McKenzie, D. G. McCulloch, and J. G. Partridge, "Light-gated amorphous carbon memristors with indium-free transparent electrodes," Carbon, vol. 152, Nov. 2019, pp. 59-65. DOI: 10.1016/j.carbon.2019.06.022
[57] E. Herrmann, A. Rush, T. Bailey, and R. Jha, "Gate Controlled Three-Terminal Metal Oxide Memristor," IEEE Electron Device Letters, vol. 39, no. 4, Apr. 2018, pp. 500-503. DOI: 10.1109/LED.2018.2806188
[58] L. Bao, J. Zhu, Z. Yu, R. Jia, Q. Cai, Z. Wang, L. Xu, Y. Wu, Y. Yang, Y. Cai, and R. Huang, "Dual-Gated MoS2 Neuristor for Neuromorphic Computing," ACS Applied Materials & Interfaces, Oct. 2019, pp. 1-35.

[59] A. Jones et al., "A neuromorphic SLAM architecture using gated-memristive synapses," Neurocomputing, 2019.
[60] J. Lazzaro and J. Wawrzynek, "Low-power silicon neurons, axons and synapses," Silicon Implementation of Pulse Coded Neural Networks, Springer, Boston, MA, 1994, pp. 153-164.
[61] G. Indiveri et al., "Neuromorphic silicon neuron circuits," Frontiers in Neuroscience, vol. 5, 2011, p. 73.
[62] J. M. Cruz-Albrecht, M. W. Yung, and N. Srinivasa, "Energy-Efficient Neuron, Synapse and STDP Integrated Circuits," IEEE Transactions on Biomedical Circuits and Systems, vol. 6, no. 3, Jun. 2012, pp. 246-256. DOI: 10.1109/TBCAS.2011.2174152
[63] L. Zhao, Q. Hong, and X. Wang, "Novel designs of spiking neuron circuit and STDP learning circuit based on memristor," Neurocomputing, vol. 314, Nov. 2018, pp. 207-214. DOI: 10.1016/j.neucom.2018.06.062
[64] T. Tuma, A. Pantazi, M. Le Gallo, A. Sebastian, and E. Eleftheriou, "Stochastic phase-change neurons," Nature Nanotechnology, vol. 11, May 2016, pp. 693-699. DOI: 10.1038/NNANO.2016.70
[65] A. Joubert, B. Belhadj, O. Temam, and R. Héliot, "Hardware spiking neurons design: analog or digital?" The 2012 International Joint Conference on Neural Networks, Brisbane, Australia, Jun. 2012. DOI: 10.1109/IJCNN.2012.6252600
[66] D. Patterson, "50 years of computer architecture: From the mainframe CPU to the domain-specific TPU and the open RISC-V instruction set," 2018 IEEE International Solid-State Circuits Conference (ISSCC), IEEE, 2018.
[67] Advanced Micro Devices, Inc., "A New Horizon in Processing," Advanced Micro Devices, Inc. presentation, 2016.
[68] E. Culurciello, R. Etienne-Cummings, and K. A. Boahen, "A biomorphic digital image sensor," IEEE Journal of Solid-State Circuits, vol. 38, no. 2, Feb. 2003, pp. 281-294. DOI: 10.1109/JSSC.2002.807412
[69] Y. Cao et al., Arizona State University's Predictive Technology Models. [Online]. Available: http://ptm.asu.edu/. Last accessed: 1/24/2020.
[70] V. H. M. and V. N. Ramakrishnan, "A Novel Approach to Analyze Current-Voltage Characteristics of Double Gated-Memristor," 2018 4th International Conference on Devices, Circuits and Systems (ICDCS), IEEE, 2018.
[71] A. Jones, A. Ruen, and R. Jha, "A Spiking Neuromorphic Architecture Using Gated-RRAM for Associative Memory," manuscript submitted to ACM Journal on Emerging Technologies in Computing, Aug. 2020, under review.
[72] T.-Y. Lin, Y.-X. Chen, J.-F. Li, C.-Y. Lo, D.-M. Kwai, and Y.-F. Chou, "A Test Method for Finding Boundary Currents of 1T1R Memristor Memories," 2016 IEEE 25th Asian Test Symposium (ATS '16), Hiroshima, Japan, 2016, pp. 281-286. DOI: 10.1109/ATS.2016.44
[73] E. J. Merced-Grafals, N. Dávila, N. Ge, R. S. Williams, and J. P. Strachan, "Repeatable, accurate, and high speed multi-level programming of memristor 1T1R arrays for power efficient analog computing applications," Nanotechnology, vol. 27, no. 36, Aug. 2016, pp. 1-9. DOI: 10.1088/0957-4484/27/36/365202
[74] M. Zangeneh and A. Joshi, "Design and Optimization of Nonvolatile Multibit 1T1R Resistive RAM," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 22, no. 8, Aug. 2014, pp. 1815-1828. DOI: 10.1109/TVLSI.2013.2277715

[75] Y. van de Burgt, E. Lubberman, E. J. Fuller, S. T. Keene, G. C. Faria, S. Agarwal, M. J. Marinella, A. A. Talin, and A. Salleo, "A non-volatile organic electrochemical device as a low-voltage artificial synapse for neuromorphic computing," Nature Materials, vol. 16, no. 4, Apr. 2017, pp. 414-418. DOI: 10.1038/nmat4856
[76] J. Tang, D. Bishop, S. Kim, M. Copel, T. Gokmen, T. Todorov, S. Shin, K.-T. Lee, P. Solomon, K. Chan, W. Haensch, and J. Rozen, "ECRAM as Scalable Synaptic Cell for High-Speed, Low-Power Neuromorphic Computing," 2018 IEEE International Electron Devices Meeting (IEDM '18), San Francisco, CA, 2018, pp. 13.1.1-13.1.4. DOI: 10.1109/IEDM.2018.8614551
[77] H. Tan, G. Liu, H. Yang, X. Yi, L. Pan, J. Shang, S. Long, M. Liu, Y. Wu, and R.-W. Li, "Light-Gated Memristor with Integrated Logic and Memory Functions," ACS Nano, vol. 11, Oct. 2017, pp. 11298-11305. DOI: 10.1021/acsnano.7b05762
[78] A. Jones and R. Jha, "A Compact Gated-Synapse Model for Neuromorphic Circuits," submitted to IEEE Transactions on Computer-Aided Design (TCAD), Apr. 2020, under review. Pre-print available on arXiv: https://arxiv.org/abs/2006.16302
[79] R. Schumacher and K. Rosenbach, "ATR of battlefield targets by SAR classification results using the public MSTAR dataset compared with a dataset by QinetiQ UK," RTO SET Symposium on Target Identification and Recognition Using RF Systems, 2004.
[80] B. Waggener, Pulse Code Modulation Techniques, Springer, 1995, p. 206.
[81] D. Batas and H. Fiedler, "A memristor SPICE implementation and a new approach for magnetic flux-controlled memristor modeling," IEEE Transactions on Nanotechnology, vol. 10, no. 2, 2010, pp. 250-255.
[82] Y. V. Pershin and M. Di Ventra, "On the validity of memristor modeling in the neural network literature," Neural Networks, vol. 121, 2020, pp. 52-56.
[83] G. Witzany, "Memory and Learning as Key Competences of Living Organisms," Memory and Learning in Plants, Springer, Cham, 2018, pp. 1-16.
[84] I. P. Pavlov and W. Gantt, "Lectures on conditioned reflexes: Twenty-five years of objective study of the higher nervous activity (behaviour) of animals," 1928.
[85] N. J. Mackintosh, Conditioning and Associative Learning, Oxford: Clarendon Press, 1983.
[86] H. D. Beale, H. B. Demuth, and M. T. Hagan, Neural Network Design, PWS, Boston, 1996.
[87] S. K. Pal and S. Mitra, "Multilayer perceptron, fuzzy sets, classification," 1992.
[88] T. L. Huntsberger and P. Ajjimarangsee, "Parallel self-organizing feature maps for unsupervised pattern recognition," International Journal of General System, vol. 16, no. 4, 1990, pp. 357-372.
[89] T. Mikolov et al., "Recurrent neural network based language model," Eleventh Annual Conference of the International Speech Communication Association, 2010.
[90] V. Srinivasan, C. Eswaran, and N. Sriraam, "Approximate entropy-based epileptic EEG detection using artificial neural networks," IEEE Transactions on Information Technology in Biomedicine, vol. 11, no. 3, 2007, pp. 288-295.
[91] K. Gregor et al., "DRAW: A recurrent neural network for image generation," arXiv preprint arXiv:1502.04623, 2015.
[92] K. Pagiamtzis and A. Sheikholeslami, "Content-addressable memory (CAM) circuits and architectures: A tutorial and survey," IEEE Journal of Solid-State Circuits, vol. 41, no. 3, 2006, pp. 712-727.

[93] J. J. Hopfield, "Neural networks and physical systems with emergent collective computational abilities," Proc. Natl. Acad. Sci. USA, vol. 79, 1982, pp. 2554-2558.
[94] E. Gardner, "Maximum Storage Capacity in Neural Networks," Europhysics Letters, vol. 4, no. 4, 1987, pp. 481-485.
[95] N. Davey and S. P. Hunt, "The capacity and attractor basins of associative memory models," International Work-Conference on Artificial Neural Networks, Alicante, Spain, 1999.
[96] A. Storkey, "Increasing the capacity of a Hopfield network without sacrificing functionality," International Conference on Neural Networks, Houston, TX, USA, 1997.
[97] A. Jones et al., "A Segmented Attractor Network for Neuromorphic Associative Learning," Proceedings of the International Conference on Neuromorphic Systems, 2019.
[98] G. I. Parisi, R. Kemker, J. L. Part, C. Kanan, and S. Wermter, "Continual lifelong learning with neural networks: A review," Neural Networks, vol. 113, May 2019, pp. 54-71. DOI: 10.1016/j.neunet.2019.01.012
[99] D. Bzdok and J. P. A. Ioannidis, "Exploration, Inference, and Prediction in Neuroscience and Biomedicine," Trends in Neurosciences, vol. 42, no. 4, Apr. 2019, pp. 251-262. DOI: 10.1016/j.tins.2019.02.001
[100] X. Zhou, S. A. Ferguson, R. W. Matthews, C. Sargent, D. Darwent, D. J. Kennaway, and G. D. Roach, "Sleep, Wake, and Phase Dependent Changes in Neurobehavioral Function under Forced Desynchrony," Sleep, vol. 34, no. 7, Jul. 2011, pp. 931-941. DOI: 10.5665/SLEEP.1130
[101] R. Stickgold, "Sleep-dependent memory consolidation," Nature, vol. 437, Oct. 2005, pp. 1273-1278.
[102] E. Hungenberg, M. Slavich, A. Bailey, and T. Sawyer, "Examining Minor League Baseball Spectator Nostalgia: A Neuroscience Perspective," Sport Management Review, Jun. 2020. DOI: 10.1016/j.smr.2020.04.001
[103] B. Bottalico and T. Bruni, "Post traumatic stress disorder, neuroscience, and the law," International Journal of Law and Psychiatry, vol. 35, no. 2, Apr. 2012, pp. 112-120. DOI: 10.1016/j.ijlp.2011.12.001

Appendix A: SR Octopus Retina Neuron Transistor Sizes (ASU PTM Version) [59]

Transistor | Length (nm) | Width (nm)
M1 | 720 | 180
M2 | 720 | 180
M3 | 180 | 180
M4 | 180 | 180
M5 | 180 | 180
M6 | 180 | 180
M7 | 180 | 180
M8 | 180 | 180
M9 | 720 | 180
M10 | 720 | 180
M11 | 180 | 180
M12 | 720 | 180
M13 | 720 | 180
M14 | 720 | 180
M15 | 720 | 180
M16 | 720 | 180
M17 | 720 | 180
M18 | 720 | 180
M19 | 720 | 180
M20 | 180 | 180
M21 | 180 | 180
M22 | 180 | 180
M23 | 180 | 180


Appendix B: GSD Model Code (Verilog-A)

`include "constants.vams"    //NOTE: standard Verilog-A include files added so the module compiles; not present in the extracted text.
`include "disciplines.vams"

//GENERIC GATED-SYNAPTIC DEVICE MODEL
//ver. 1.9, LAST UPDATED: 3/26/2020
//Latest Updates:
//-Final updates to the model have been added. Two new user-parameters have been created (f and xstart) to allow for inverting the voltage needed for potentiation/depression and adjusting the initial conductance value of the device, respectively.
//-Cosmetic adjustments to the model have been finalized. This includes renaming certain variables and parameters, limiting ranges of certain user-parameters, and cleaning up comments.
//AUTHOR: ALEXANDER JONES

module syngen(vin,vout,vgate,vstate);

//Node List: vin    - Input node to synapse. Considered the device's presynaptic connection.
//           vout   - Output node to synapse. Considered the device's postsynaptic connection.
//           vgate  - Gate node to synapse. Primary node used to potentiate/depress the synaptic device to increase/decrease its conductance.
//           vstate - State readout for the device's x-value.

inout vin,vout,vgate,vstate;
electrical vin,vout,vgate,vstate;

//USER-DEFINED SIMULATION PARAMETERS

//Threshold voltage for potentiation/depression.
parameter real vt = 0 from [0:inf);

//Term that defines whether the behavior during reverse bias is that of a memristor (brev=1), diode (brev=0), or in between (0<brev<1).
parameter real brev = 1 from [0:1]; //NOTE: this declaration was truncated in the extracted text and has been reconstructed; the default value of 1 is assumed.

//Minimum conductance value.
parameter real gmin = 1e-11;

//Theoretical maximum conductance value. Acts as an asymptote of the inverse exponential and sigmoid portions of the conductance equation.
parameter real gmax = 1e-6;

//Set time for the device (assuming no decay (rstp=0 and rltp=0) and potentiation @1V above the threshold voltage (vt)). Magnitude often inversely proportional to the short-term plasticity decay rate (rstp).
parameter real tset = 1e-6;

//Rate of decay for short-term plasticity. Magnitude often inversely proportional to the set time (tset).
parameter real rstp = 0.00 from [0:inf);

//Allows control over amplified depression of the device when a negative voltage is applied. namp=1 gives a symmetric response, while higher values make for quicker depression.
parameter real namp = 1 from (0:inf);

//Dictates how much the bias across the device channel (vin-vout) influences the effective voltage applied to the device via vgate. oc=0 means it has no influence, and oc=1 means the channel bias is fully included in calculating veff.
parameter real oc = 0 from [0:1];

//Dictates how much vt emphasis occurs when calculating the change in x and xmin. tc=0 is none, while tc=1 is a perfect difference between vgate and vt.
parameter real tc = 0 from [0:1];

//Controls the quality of the long-term potentiation rate of the device. The higher the value of qltp, the closer the long-term potentiation conductance value is to gmax as the device is programmed.
parameter real qltp = 0.00 from [0:1];

//Controls the rate of decay for long-term plasticity. The higher the value, the sharper the rate of decay for the long-term plasticity state.
parameter real rltp = 0.00 from [0:inf);

//Term that indicates if the polarity at which the device is potentiated/depressed is flipped or not. f=1 means it is not flipped, while f=-1 means it is.
parameter real f = 1 from [-1:1] exclude (-1:1);

//Parameter that defines the initial conductance state of the device. xstart=0 means the device is in its lowest conductance state, and xstart=1 means it is in its highest conductance state.
parameter real xstart = 0 from [0:1];

//Central fitting parameter controlling the device's overall transient conductance curve shape. gc is a perfect inverse exponential @0, a perfect linear curve @0.5, and a perfect sigmoid @1. In-between values are linear combinations of the two closest equation values (0<=gc<=1).
parameter real gc = 0.5 from [0:1]; //NOTE: this declaration was truncated in the extracted text and has been reconstructed; the default value of 0.5 is assumed.

//CALCULATED/DEFINED SIMULATION PARAMETERS

//Range between minimum and maximum conductance.
parameter real grange = gmax - gmin;

//Amount the sigmoid is shifted to get the state variable started at gmin when x=0.
parameter real s = ln(gmax/gmin-1);

//Calculated maximum state variable value for sigmoidal conductance. Stops once the conductance has reached a point where gsyn = gmax - gmin.
parameter real m = ln(1/grange-1)+s;

//Calculated maximum state variable value for inverse exponential conductance. Stops once the conductance has reached a point where gsyn = grange.
parameter real p = -ln(gmin/grange);

//Defined value for xmax.
parameter real xmax = 1;

//VARIABLES

//Effective voltage that is compared to the threshold voltage.
real veff = 0;

//State variable that controls the state of conductance.
real x = xstart;

//Synaptic conductance value for the device channel.
real gsyn = 0;

//Tracks the current timestamp in the simulation.
real currtime = 0;

//Tracks the time of the previous timestamp in the simulation.
real pasttime = 0;

//Calculates how much x should move based upon the difference between the current and previous simulation timestamps.
real xscale = 0;

//Tracks the lowest value at which x can be. xmin can be increased/decreased depending on the user-defined simulation parameters and negative potential applied to vgate.
real xmin = 0;

analog begin

	//Time calculation
	pasttime = currtime;
	currtime = $realtime;

	//Calculation of the current xscale value
	xscale = (currtime-pasttime)/tset;

	//Calculating the effective voltage applied to vgate
	veff = f*V(vgate) + oc*(V(vin)-V(vout));

	//Tests against the threshold voltage to decide whether to potentiate/depress.
	if (abs(veff) > vt) begin

		//Amplifies veff if its polarity is negative, in accordance with namp.
		if (veff < 0) begin
			veff = namp*veff;
		end

		//Calculates the change in x and xmin.
		if (veff >= 0) begin
			x = x + (veff-tc*vt)*xscale;
			xmin = xmin + (veff-tc*vt)*xscale*qltp;
		end
		else begin
			x = x + (veff+tc*vt)*xscale;
			xmin = xmin + (veff+tc*vt)*xscale*qltp;
		end

	end

	//Calculates short-term decay.
	if (x > xmin) begin
		x = x - (x-xmin)*(currtime-pasttime)*tset*rstp;
	end

	//Decays xmin via long-term decay.
	if (xmin > 0) begin
		xmin = xmin - (currtime-pasttime)*tset*rltp;
	end

	//Bounds the xmin variable.
	if (xmin < 0) begin
		xmin = 0;
	end
	if (xmin > xmax) begin
		xmin = xmax;
	end

	//Bounds the x state variable.
	if (x < xmin) begin
		x = xmin;
	end
	if (x > xmax) begin
		x = xmax;
	end

	//Calculates conductance as a gc-weighted blend of the inverse exponential, sigmoid, and linear curves.
	gsyn = max(1-2*gc,0)*(grange*(1-exp(-x*p))+gmin) + max(2*gc-1,0)*gmax/(1+exp(-x*m+s)) + (-abs(2*gc-1)+1)*(gmin+x*grange);

	//Calculates the final channel current (memristive forward bias, brev-blended memristive/diode reverse bias).
	if (V(vin)-V(vout) >= 0) begin
		I(vin,vout) <+ gsyn*(V(vin)-V(vout));
	end
	else begin
		I(vin,vout) <+ brev*gsyn*(V(vin)-V(vout)) + (1-brev)*gsyn*(exp(V(vin)-V(vout))-1);
	end

	//State readout
	V(vstate) <+ x;

end
endmodule


Appendix C: EHoS Dataset

First Title/Last Cent State Position Dynasty/Family/Poli Cause of Reign Name Name/Numeral ury tical Party Death (Years) Doris Bures 21st President Socialist Alive 0 AD 20th Austria President Austrian People's Natural 0 AD Causes 21st Austria President Socialist Alive 12 AD Michael Hainisch 20th Austria President None Natural 8 AD Causes Adolph Hitler 20th Austria Führer Nazi Suicide 7 AD Norbert Hofer 21st Austria President Freedom Alive 0 AD Charles I 20th Austria Emperor Habsburg Exile 1 AD Ferdinan I 19th Austria Emperor Habsburg Natural 13 d AD Causes Francis I 19th Austria Emperor Habsburg Natural 30 AD Causes Francis I 19th Austria Emperor Habsburg Natural 67 AD Causes 20th Austria President Socialist Natural 8 AD Causes Rudolf Kirchschläger 20th Austria President None Natural 12 AD Causes 20th Austria President Austrian People's Natural 0 AD Causes 20th Austria President Austrian People's Natural 11 AD Causes Karlhein Kopf 21st Austria President Austrian People's Alive 0 z AD Theodor Körner 20th Austria President Socialist Natural 5 AD Causes 20th Austria President Socialist Natural 0 AD Causes Wilhelm Miklas 20th Austria President Christian Social Natural 9 AD Causes 20th Austria President Austrian People's Natural 0 AD Causes 20th Austria President Socialist Natural 5 AD Causes Adolf Schärf 20th Austria President Socialist Natural 7 AD Causes 20th Austria President Socialist Natural 1 AD Causes Alexand Van der Bellen 21st Austria President Green Alive 3 er AD 20th Austria President Austrian People's Natural 6 AD Causes Ivan Alexander 14th Shishman Natural 40 AD Causes Presian I 9th Bulgaria Unknown 16 AD Alexand I 19th Bulgaria Prince Battenberg Natural 7 er AD Causes

166

Boris I 9th Bulgaria Prince Dulo Natural 37 AD Causes Constant I 13th Bulgaria Tsar Asen Combat 20 ine AD Ivan I 12th Bulgaria Tsar Asen Assassinat 6 AD ed Kaliman I 13th Bulgaria Tsar Asen Assassinat 5 AD ed Michael I 13th Bulgaria Tsar Asen Assassinat 10 AD ed Petar I 10th Bulgaria Tsar Dulo Natural 42 AD Causes Simeon I 10th Bulgaria Tsar Dulo Natural 34 AD Causes Ferdinan I 20th Bulgaria Tsar Saxe-Coburg and Natural 31 d AD Gotha Causes George I 13th Bulgaria Tsar Terter Unknown 12 AD Ivan II 13th Bulgaria Tsar Asen Natural 23 AD Causes Kaliman II 13th Bulgaria Tsar Asen Assassinat 0 AD ed Petar II 12th Bulgaria Tsar Asen Assassinat 5 AD ed Boris II 10th Bulgaria Tsar Dulo Accident 1 AD Simeon II 20th Bulgaria Tsar Saxe-Coburg and Alive 3 AD Gotha Constant II 15th Bulgaria Tsar Shishman Exile 25 ine AD George II 14th Bulgaria Tsar Shishman Natural 1 AD Causes Ivan III 13th Bulgaria Tsar Asen Unknown 1 AD Boris III 20th Bulgaria Tsar Saxe-Coburg and Natural 24 AD Gotha Causes Michael III 14th Bulgaria Tsar Shishman Combat 7 AD None 9th Bulgaria Kanasubigi Dulo Unknown 17 AD Asparuk None 7th Bulgaria Khan Dulo Combat 20 h AD Kardam None 8th Bulgaria Khan Dulo Unknown 26 AD Kormesi None 8th Bulgaria Khan Dulo Unknown 17 y AD None 9th Bulgaria Khan Dulo Natural 11 AD Causes Malamir None 9th Bulgaria Khan Dulo Natural 5 AD Causes Sevar None 8th Bulgaria Khan Dulo Natural 15 AD Causes None 8th Bulgaria Khan Dulo Unknown 9 AD Tervel None 8th Bulgaria Khan Dulo Unknown 20 AD Pagan None 8th Bulgaria Khan None Assassinat 1 AD ed Sabin None 8th Bulgaria Khan None Natural 1 AD Causes

167

Toktu None 8th Bulgaria Khan None Combat 1 AD Telets None 8th Bulgaria Khan Ugain Assassinat 3 AD ed Kormiso None 8th Bulgaria Khan Vokil Unknown 3 sh AD Umor None 8th Bulgaria Khan Vokil Unknown 0 AD Vinekh None 8th Bulgaria Khan Vokil Assassinat 6 AD ed Vladimir None 9th Bulgaria Prince Dulo Executed 4 AD Boril None 13th Bulgaria Tsar Asen Executed 11 AD Kaloyan None 13th Bulgaria Tsar Asen Assassinat 10 AD ed Mitso None 13th Bulgaria Tsar Asen Unknown 1 AD Samuel None 11th Bulgaria Tsar Cometopuli Natural 17 AD Causes Roman None 10th Bulgaria Tsar Dulo Executed 14 AD Chaka None 13th Bulgaria Tsar None Executed 1 AD Ivaylo None 13th Bulgaria Tsar None Executed 3 AD Smilets None 13th Bulgaria Tsar Smilets Unknown 6 AD Gavril Radomir 11th Bulgaria Tsar Cometopuli Assassinat 1 AD ed Ivan Shishman 14th Bulgaria Tsar Shishman Unknown 24 AD Ivan Sratsimir 14th Bulgaria Tsar Shishman Executed 40 AD Ivan Stephen 14th Bulgaria Tsar Shishman Unknown 1 AD Theodor Svetoslav 14th Bulgaria Tsar Terter Natural 21 e AD Causes Ivan Vladislav 11th Bulgaria Tsar Cometopuli Combat 3 AD Isaac 12th Byzantium Emperor Angelid Assassinat 10 AD ed Marcus Aurelius 5th Byzantium Emperor Leonid Executed 1 AD Philippi Bardanes 8th Byzantium Emperor None Unknown 1 kos AD Androni 11th Byzantium Emperor Doukid Unknown 11 kos AD Constant Doukas 11th Byzantium Emperor Doukid Unknown 4 ine AD Konstant Doukas 11th Byzantium Emperor Doukid Combat 18 ios AD Justin I 6th Byzantium Emperor Justinian Natural 9 AD Causes Justinian I 6th Byzantium Emperor Justinian Natural 38 AD Causes Alexios I 11th Byzantium Emperor Komnenid Natural 37 AD Causes Androni I 15th Byzantium Emperor Komnenid Executed 2 kos AD

168

Manuel I 13th Byzantium Emperor Komnenid Unknown 37 AD Isaac I 11th Byzantium Emperor Unknown 2 AD Theodor I 13th Byzantium Emperor Laskarid Unknown 21 e AD Anasatiu I 6th Byzantium Emperor Leonid Natural 27 s AD Causes Leo I 5th Byzantium Emperor Leonid Natural 17 AD Causes 9th Byzantium Emperor Macedonian Unknown 19 AD John I 10th Byzantium Emperor Macedonian Assassinat 7 AD ed Romano I 10th Byzantium Emperor Macedonian Exile 24 s AD Michael I 9th Byzantium Emperor Nikephorian Natural 1 AD Causes Nikepho I 9th Byzantium Emperor Nikephorian Combat 9 ros AD Michael II 9th Byzantium Emperor Amorian Exile 9 AD Constan II 7th Byzantium Emperor Heraclian Assassinat 27 s AD ed Justinian II 7th Byzantium Emperor Heraclian Assassinat 16 AD ed Leo II 8th Byzantium Emperor Isaurian Natural 24 AD Causes Justin II 6th Byzantium Emperor Justinian Natural 13 AD Causes Tiberius II 6th Byzantium Emperor Justinian Natural 3 AD Causes Alexios II 14th Byzantium Emperor Komnenid Executed 3 AD John II 12th Byzantium Emperor Komnenid Accident 25 AD Theodor II 13th Byzantium Emperor Laskarid Natural 4 e AD Causes Leo II 5th Byzantium Emperor Leonid Assassinat 0 AD ed Basil II 11th Byzantium Emperor Macedonian Unknown 49 AD Nikepho II 10th Byzantium Emperor Macedonian Assassinat 6 ros AD ed Romano II 10th Byzantium Emperor Macedonian Assassinat 4 s AD ed Anastasi II 8th Byzantium Emperor None Executed 2 os AD Androni II 14th Byzantium Emperor Palaiologan Unknown 45 kos AD Manuel II 15th Byzantium Emperor Palaiologan Unknown 34 AD Theodos II 5th Byzantium Emperor Theodosian Accident 48 ius AD Michael III 9th Byzantium Emperor Amorian Assassinat 25 AD ed Alexios III 12th Byzantium Emperor Angelid Executed 8 AD Nikepho III 11th Byzantium Emperor Doukid Exile 3 ros AD

169

Constant III 7th Byzantium Emperor Heraclian Natural 0 ine AD Causes John III 13th Byzantium Emperor Laskarid Unknown 33 AD Romano III 11th Byzantium Emperor Macedonian Assassinat 6 s AD ed Theodos III 8th Byzantium Emperor None Unknown 2 ius AD Tiberios III 8th Byzantium Emperor None Executed 7 AD Androni III 14th Byzantium Emperor Palaiologan Natural 13 kos AD Causes Alexios IV 13th Byzantium Emperor Angelid Assassinat 0 AD ed Romano IV 11th Byzantium Emperor Doukid Exile 3 s AD Constant IV 7th Byzantium Emperor Heraclian Natural 17 ine AD Causes Leo IV 8th Byzantium Emperor Isaurian Natural 5 AD Causes John IV 13th Byzantium Emperor Laskarid Executed 3 AD Michael IV 11th Byzantium Emperor Macedonian Natural 7 AD Causes Androni IV 14th Byzantium Emperor Palaiologan Unknown 3 kos AD Constant IX 11th Byzantium Emperor Macedonian Natural 13 ine AD Causes Michael IX 14th Byzantium Emperor Palaiologan Unknown 25 AD Matthew 14th Byzantium Emperor Palaiologan Natural 4 AD Causes John Komnenos 16th Byzantium Emperor Komnenid Executed 2 AD Christop Lekapenos 10th Byzantium Emperor Macedonian Exile 10 her AD Constant Lekapenos 10th Byzantium Emperor Macedonian Exile 21 ine AD Stephen Lekapenos 10th Byzantium Emperor Macedonian Exile 21 AD Theophil None 9th Byzantium Emperor Amorian Natural 13 os AD Causes Heracliu None 7th Byzantium Emperor Heraclian Natural 30 s AD Causes Heraklo None 7th Byzantium Emperor Heraclian Unknown 0 nas AD Leontios None 7th Byzantium Emperor Heraclian Executed 3 AD Martinus None 7th Byzantium Emperor Heraclian Unknown 0 AD Tiberius None 7th Byzantium Emperor Heraclian Assassinat 28 AD ed Artabasd None 8th Byzantium Emperor Isaurian Unknown 1 os AD Maurice None 6th Byzantium Emperor Justinian Executed 20 AD Theodos None 6th Byzantium Emperor Justinian Executed 20 ius AD Basilisc None 5th Byzantium Emperor Leonid Executed 1 us AD

170

Zeno None 5th Byzantium Emperor Leonid Natural 17 AD Causes Alexand None 10th Byzantium Emperor Macedonian Accident 1 er AD Stauraki None 9th Byzantium Emperor Nikephorian Natural 0 os AD Causes Theophy None 9th Byzantium Emperor Nikephorian Unknown 1 lact AD Phocas None 7th Byzantium Emperor None Executed 8 AD Arcadius None 5th Byzantium Emperor Theodosian Natural 13 AD Causes Marcian None 5th Byzantium Emperor Theodosian Natural 7 AD Causes Irene of Athens 8th Byzantium Empress Isaurian Unknown 5 AD Theodor Porphyrogenita 11th Byzantium Empress Macedonian Natural 14 a AD Causes Zoe Porphyrogenita 11th Byzantium Empress Macedonian Unknown 22 AD Alexios V 13th Byzantium Emperor Angelid Executed 0 AD Constant V 8th Byzantium Emperor Isaurian Natural 34 ine AD Causes Michael V 11th Byzantium Emperor Macedonian Executed 0 AD Leo V 9th Byzantium Emperor None Assassinat 7 AD ed Androni V 15th Byzantium Emperor Palaiologan Unknown 4 kos AD John V 14th Byzantium Emperor Palaiologan Unknown 49 AD Constant VI 8th Byzantium Emperor Isaurian Assassinat 17 ine AD ed Leo VI 9th Byzantium Emperor Macedonian Unknown 26 AD Michael VI 11th Byzantium Emperor None Exile 1 AD John VI 14th Byzantium Emperor Palaiologan Exile 7 AD Michael VII 11th Byzantium Emperor Doukid Unknown 11 AD Constant VII 10th Byzantium Emperor Macedonian Assassinat 46 ine AD ed John VII 14th Byzantium Emperor Palaiologan Unknown 0 AD Constant VIII 11th Byzantium Emperor Macedonian Unknown 3 ine AD John VIII 15th Byzantium Emperor Palaiologan Unknown 23 AD Michael VIII 13th Byzantium Emperor Palaiologan Unknown 23 AD Constant X 11th Byzantium Emperor Doukid Unknown 8 ine AD Constant XI 15th Byzantium Emperor Palaiologan Combat 4 ine AD Lothair I 9th Carolingian Emperor Carolingian Natural 32 AD Empire Causes Louis I 9th Carolingian Emperor Carolingian Natural 26 AD Empire Causes
Berengar | I | 10th AD | Carolingian Empire | Emperor | Unruoching | Unknown | 8
Guy | I | 9th AD | Carolingian Empire | Emperor | Widonid | Combat | 3
Lambert | I | 9th AD | Carolingian Empire | Emperor | Widonid | Unknown | 6
Louis | II | 9th AD | Carolingian Empire | Emperor | Carolingian | Unknown | 19
Louis | III | 10th AD | Carolingian Empire | Emperor | Bosonid | Unknown | 4
Arnulph | None | 9th AD | Carolingian Empire | Emperor | Carolingian | Natural Causes | 3
Charlemagne | None | 9th AD | Carolingian Empire | Emperor | Carolingian | Natural Causes | 14
(missing) | (missing) | 9th AD | Carolingian Empire | Emperor | Carolingian | Natural Causes | 1
(missing) | (missing) | 9th AD | Carolingian Empire | Emperor | Carolingian | Natural Causes | 6
Hanno | I | 6th BC | Carthage | King | Didonian | Unknown | 24
Hamilcar | I | 5th BC | Carthage | King | Magonids | Combat | 30
Hasdrubal | I | 6th BC | Carthage | King | Magonids | Combat | 20
Mago | I | 6th BC | Carthage | King | Magonids | Unknown | 20
Hannibal | I | 5th BC | Carthage | Suffete | Magonids | Combat | 34
Hamilcar | II | 4th BC | Carthage | Suffete | Hannonian | Unknown | 21
Hanno | II | 5th BC | Carthage | Suffete | Magonids | Unknown | 40
Himilco | II | 5th BC | Carthage | Suffete | Magonids | Suicide | 10
Mago | II | 4th BC | Carthage | Suffete | Magonids | Combat | 21
Hanno | III | 4th BC | Carthage | Suffete | Magonids | Unknown | 4
Mago | III | 4th BC | Carthage | Suffete | Magonids | Unknown | 31
Malchus | None | 6th BC | Carthage | King | Didonian | Unknown | 6
Dido | None | 8th BC | Carthage | Queen | Didonian | Suicide | 54
Bomilcar | None | 4th BC | Carthage | Suffete | Hannonian | Executed | 1
Gisco | None | 4th BC | Carthage | Suffete | Hannonian | Unknown | 7
Hanno | the Great | 4th BC | Carthage | Suffete | Hannonian | Unknown | 3
Oliver | Cromwell | 17th AD | England | Lord Protector | Cromwell | Natural Causes | 4
Richard | Cromwell | 17th AD | England | Lord Protector | Cromwell | Natural Causes | 0
Harold | Godwinson | 11th AD | England | King | Godwin | Combat | 0
Harold | Harefoot | 11th AD | England | King | (missing) | Natural Causes | 4
Richard | I | 12th AD | England | King | Anjou | Combat | 9
Henry | I | 12th AD | England | King | Normandy | Accident | 35
William | I | 11th AD | England | King | Normandy | Accident | 20
Edward | I | 13th AD | England | King | Plantagenet | Natural Causes | 34
Charles | I | 17th AD | England | King | Stuart | Executed | 23
James | I | 17th AD | England | King | Stuart | Natural Causes | 22
Edmund | I | 10th AD | England | King | Wessex | Assassinated | 6
Elizabeth | I | 16th AD | England | Queen | Tudor | Natural Causes | 44
Mary | I | 16th AD | England | Queen | Tudor | Natural Causes | 5
Henry | II | 12th AD | England | King | Anjou | Natural Causes | 34
William | II | 11th AD | England | King | Normandy | Accident | 12
Edward | II | 14th AD | England | King | Plantagenet | Assassinated | 19
Richard | II | 14th AD | England | King | Plantagenet | Executed | 22
Charles | II | 17th AD | England | King | Stuart | Natural Causes | 24
James | II | 17th AD | England | King | Stuart | Natural Causes | 3
Mary | II | 17th AD | England | Queen | Stuart | Natural Causes | 5
Edward | III | 14th AD | England | King | Plantagenet | Natural Causes | 50
Henry | III | 13th AD | England | King | Plantagenet | Natural Causes | 56
William | III | 17th AD | England | King | Stuart | Natural Causes | 13
Richard | III | 15th AD | England | King | York | Combat | 2
Edmund | Ironside | 11th AD | England | King | Wessex | Assassinated | 0
Henry | IV | 15th AD | England | King | Lancaster | Natural Causes | 13
Edward | IV | 15th AD | England | King | York | Unknown | 20
John | None | 13th AD | England | King | Anjou | Natural Causes | 17
Stephen | None | 12th AD | England | King | Blois | Natural Causes | 18
Harthacnut | None | 11th AD | England | King | Denmark | Accident | 2
Sweyn | None | 11th AD | England | King | Denmark | Accident | 0
Philip | None | 16th AD | England | King | Tudor | Natural Causes | 4
Æthelred | None | 10th AD | England | King | Wessex | Combat | 2
Æthelstan | None | 10th AD | England | King | Wessex | Unknown | 14
Eadred | None | 10th AD | England | King | Wessex | Natural Causes | 9
Eadwig | None | 10th AD | England | King | Wessex | Unknown | 3
Anne | None | 18th AD | England | Queen | Stuart | Natural Causes | 5
Edward | the Confessor | 11th AD | England | King | Wessex | Natural Causes | 23
Edward | the Elder | 10th AD | England | King | Wessex | Combat | 24
Cnut | the Great | 11th AD | England | King | Denmark | Unknown | 19
Alfred | the Great | 9th AD | England | King | Wessex | Unknown | 12
Edward | the Martyr | 10th AD | England | King | Wessex | Assassinated | 2
Edgar | the Peaceful | 10th AD | England | King | Wessex | Unknown | 15
Henry | V | 15th AD | England | King | Lancaster | Natural Causes | 9
Edward | V | 15th AD | England | King | York | Assassinated | 0
Henry | VI | 15th AD | England | King | Lancaster | Executed | 38
Edward | VI | 16th AD | England | King | Tudor | Natural Causes | 6
Henry | VII | 15th AD | England | King | Tudor | Natural Causes | 23
Henry | VIII | 16th AD | England | King | Tudor | Natural Causes | 37
Vincent | Auriol | 20th AD | France | President | SFIO | Natural Causes | 7
(missing) | (missing) | 10th AD | France | King | Capet | Unknown | 9
Jean | Casimir-Perier | 19th AD | France | President | Opportunist Republican | Natural Causes | 0
Jacques | Chirac | 21st AD | France | President | UMP | Natural Causes | 11
René | Coty | 20th AD | France | President | CNIP | Natural Causes | 4
Charles | de Gaulle | 20th AD | France | President | UNR | Natural Causes | 10
Patrice | de MacMahon | 19th AD | France | President | Moderate Monarchist | Natural Causes | 5
Paul | Deschanel | 20th AD | France | President | Democratic Republican | Natural Causes | 0
Paul | Doumer | 20th AD | France | President | Radical-Socialist | Assassinated | 0
Gaston | Doumergue | 20th AD | France | President | Radical-Socialist | Natural Causes | 7
Armand | Fallières | 20th AD | France | President | Democratic Republican | Natural Causes | 7
Félix | Faure | 19th AD | France | President | Opportunist Republican | Natural Causes | 4
Marie | François Sadi Carnot | 19th AD | France | President | Opportunist Republican | Assassinated | 6
Valéry | Giscard d'Estaing | 20th AD | France | President | Republican | Alive | 6
Jules | Grévy | 19th AD | France | President | Opportunist Republican | Natural Causes | 8
François | Hollande | 21st AD | France | President | Socialist | Alive | 4
Napoleon | I | 18th AD | France | Consul | Bonaparte | Exile | 5
Napoleon | I | 19th AD | France | Emperor | Bonaparte | Exile | 9
Henry | I | 11th AD | France | King | Capet | Natural Causes | 29
John | I | 14th AD | France | King | Capet | Unknown | 0
Philip | I | 11th AD | France | King | Capet | Natural Causes | 47
Louis | I | 19th AD | France | King | Orléans | Natural Causes | 17
Robert | I | 10th AD | France | King | Robertian | Combat | 0
Francis | I | 16th AD | France | King | Valois | Natural Causes | 32
Philip | II | 13th AD | France | King | Capet | Natural Causes | 42
Robert | II | 11th AD | France | King | Capet | Unknown | 34
Carloman | II | 9th AD | France | King | Carolingian | Accident | 2
Francis | II | 16th AD | France | King | Valois | Natural Causes | 1
Henry | II | 16th AD | France | King | Valois | Accident | 12
John | II | 14th AD | France | King | Valois | Natural Causes | 13
Napoleon | III | 19th AD | France | Emperor | Bonaparte | Natural Causes | 18
Philip | III | 13th AD | France | King | Capet | Natural Causes | 15
Louis | III | 9th AD | France | King | Carolingian | Accident | 3
Henry | III | 16th AD | France | King | Valois | Assassinated | 15
Napoleon | III | 19th AD | France | President | Bonaparte | Natural Causes | 4
Henry | IV | 16th AD | France | King | Bourbon | Assassinated | 20
Charles | IV | 14th AD | France | King | Capet | Unknown | 6
Philip | IV | 13th AD | France | King | Capet | Natural Causes | 29
Louis | IV | 10th AD | France | King | Carolingian | Accident | 18
Louis | IX | 13th AD | France | King | Capet | Natural Causes | 43
Charles | IX | 16th AD | France | King | Valois | Natural Causes | 13
Albert | Lebrun | 20th AD | France | President | Democratic Republican | Natural Causes | 8
Émile | Loubet | 20th AD | France | President | Democratic Republican | Natural Causes | 7
Emmanuel | Macron | 21st AD | France | President | En Marche! | Alive | 3
Alexandre | Millerand | 20th AD | France | President | None | Natural Causes | 3
François | Mitterrand | 20th AD | France | President | Socialist | Natural Causes | 13
Rudolph | None | 10th AD | France | King | Bosonid | Natural Causes | 12
Lothair | None | 10th AD | France | King | Carolingian | Unknown | 31
Odo | of | 9th AD | France | King | Robertian | Unknown | 9
Alain | Poher | 20th AD | France | President | Democratic Centre | Natural Causes | 0
Raymond | Poincaré | 20th AD | France | President | Democratic Republican | Natural Causes | 7
Georges | Pompidou | 20th AD | France | President | UDR | Natural Causes | 4
Nicolas | Sarkozy | 21st AD | France | President | UMP | Alive | 4
Charles | the Bald | 9th AD | France | King | Carolingian | Natural Causes | 33
Charles | the Fat | 9th AD | France | King | Carolingian | Natural Causes | 2
(missing) | (missing) | 10th AD | France | King | Carolingian | Executed | 24
Louis | the Stammerer | 9th AD | France | King | Carolingian | Natural Causes | 1
Adolphe | Thiers | 19th AD | France | President | None | Natural Causes | 1
Philip | V | 14th AD | France | King | Capet | Natural Causes | 5
Louis | V | 10th AD | France | King | Carolingian | Accident | 0
Charles | V | 14th AD | France | King | Valois | Natural Causes | 16
Louis | VI | 12th AD | France | King | Capet | Natural Causes | 29
Charles | VI | 15th AD | France | King | Valois | Natural Causes | 42
Philip | VI | 14th AD | France | King | Valois | Natural Causes | 22
Louis | VII | 12th AD | France | King | Capet | Natural Causes | 43
Charles | VII | 15th AD | France | King | Valois | Natural Causes | 38
Louis | VIII | 13th AD | France | King | Capet | Natural Causes | 3
Charles | VIII | 15th AD | France | King | Valois | Accident | 14
(missing) | (missing) | 19th AD | France | King | Bourbon | Natural Causes | 5
Louis | X | 14th AD | France | King | Capet | Accident | 2
Louis | XI | 15th AD | France | King | Valois | Natural Causes | 22
Louis | XII | 16th AD | France | King | Valois | Natural Causes | 16
Louis | XIII | 17th AD | France | King | Bourbon | Natural Causes | 33
Louis | XIV | 17th AD | France | King | Bourbon | Natural Causes | 72
Louis | XV | 18th AD | France | King | Bourbon | Natural Causes | 58
Louis | XVI | 18th AD | France | King | Bourbon | Executed | 18
Louis | XVIII | 19th AD | France | King | Bourbon | Natural Causes | 9
Konrad | Adenauer | 20th AD | Germany | Chancellor | Christian Democratic Union | Natural Causes | 14
Willy | Brandt | 20th AD | Germany | Chancellor | Social Democratic | Natural Causes | 4
Ludwig | Erhard | 20th AD | Germany | Chancellor | Christian Democratic Union | Natural Causes | 3
Adolph | Hitler | 20th AD | Germany | Führer | Nazi | Suicide | 10
William | I | 19th AD | Germany | Emperor | Hohenzollern | Natural Causes | 17
William | II | 20th AD | Germany | Emperor | Hohenzollern | Natural Causes | 30
Frederick | III | 19th AD | Germany | Emperor | Hohenzollern | Natural Causes | 0
Kurt | Kiesinger | 20th AD | Germany | Chancellor | Christian Democratic Union | Natural Causes | 2
Helmut | Kohl | 20th AD | Germany | Chancellor | Christian Democratic Union | Natural Causes | 16
Angela | Merkel | 21st AD | Germany | Chancellor | Christian Democratic Union | Alive | 14
Helmut | Schmidt | 20th AD | Germany | Chancellor | Social Democratic | Natural Causes | 8
Gerhard | Schröder | 20th AD | Germany | Chancellor | Social Democratic | Alive | 7
George | I | 18th AD | Great Britain | King | Hanover | Natural Causes | 12
George | II | 18th AD | Great Britain | King | Hanover | Natural Causes | 33
George | III | 18th AD | Great Britain | King | Hanover | Natural Causes | 41
Anne | None | 18th AD | Great Britain | Queen | Stuart | Natural Causes | 7
Ferdinand | I | 16th AD | Holy Roman Empire | Emperor | Habsburg | Natural Causes | 7
Joseph | I | 18th AD | Holy Roman Empire | Emperor | Habsburg | Natural Causes | 5
Leopold | I | 17th AD | Holy Roman Empire | Emperor | Habsburg | Natural Causes | 46
Maximilian | I | 16th AD | Holy Roman Empire | Emperor | Habsburg | Unknown | 25
Francis | I | 18th AD | Holy Roman Empire | Emperor | Lorraine | Natural Causes | 19
Otto | I | 10th AD | Holy Roman Empire | Emperor | Ottonian | Natural Causes | 11
Frederick | I | 12th AD | Holy Roman Empire | Emperor | Staufen | Accident | 35
Albert | I | 14th AD | Holy Roman Empire | King | Habsburg | Assassinated | 9
Rudolf | I | 13th AD | Holy Roman Empire | King | Habsburg | Natural Causes | 17
Ferdinand | II | 17th AD | Holy Roman Empire | Emperor | Habsburg | Natural Causes | 17
Francis | II | 18th AD | Holy Roman Empire | Emperor | Habsburg | Natural Causes | 14
Joseph | II | 18th AD | Holy Roman Empire | Emperor | Habsburg | Natural Causes | 24
Leopold | II | 18th AD | Holy Roman Empire | Emperor | Habsburg | Unknown | 1
Maximilian | II | 16th AD | Holy Roman Empire | Emperor | Habsburg | Natural Causes | 12
Rudolph | II | 16th AD | Holy Roman Empire | Emperor | Habsburg | Natural Causes | 35
Henry | II | 11th AD | Holy Roman Empire | Emperor | Ottonian | Natural Causes | 10
Otto | II | 10th AD | Holy Roman Empire | Emperor | Ottonian | Unknown | 15
Conrad | II | 11th AD | Holy Roman Empire | Emperor | Salian | Natural Causes | 12
Frederick | II | 13th AD | Holy Roman Empire | Emperor | Staufen | Natural Causes | 30
Lothair | II | 12th AD | Holy Roman Empire | Emperor | Supplinburg | Unknown | 4
Ferdinand | III | 17th AD | Holy Roman Empire | Emperor | Habsburg | Natural Causes | 20
Frederick | III | 15th AD | Holy Roman Empire | Emperor | Habsburg | Natural Causes | 53
Otto | III | 10th AD | Holy Roman Empire | Emperor | Ottonian | Natural Causes | 5
Henry | III | 11th AD | Holy Roman Empire | Emperor | Salian | Natural Causes | 9
Charles | IV | 14th AD | Holy Roman Empire | Emperor | (missing) | Natural Causes | 32
Henry | IV | 11th AD | Holy Roman Empire | Emperor | Salian | Natural Causes | 39
Otto | IV | 13th AD | Holy Roman Empire | Emperor | Welf | Suicide | 16
Louis | IV | 14th AD | Holy Roman Empire | Emperor | Wittelsbach | Natural Causes | 33
Matthias | None | 17th AD | Holy Roman Empire | Emperor | Habsburg | Natural Causes | 6
Sigismund | None | 15th AD | Holy Roman Empire | Emperor | Luxembourg | Natural Causes | 3
Wenceslaus | None | 14th AD | Holy Roman Empire | King | Luxembourg | Natural Causes | 24
Alfonso | of Castile | 13th AD | Holy Roman Empire | King | Ivrea | Unknown | 18
Richard | of Cornwall | 13th AD | Holy Roman Empire | King | Plantagenet | Natural Causes | 15
William | of Holland | 13th AD | Holy Roman Empire | King | Holland | Assassinated | 8
Adolf | of Nassau | 13th AD | Holy Roman Empire | King | Nassau | Combat | 6
Rupert | of the Palatinate | 15th AD | Holy Roman Empire | King | Wittelsbach | Unknown | 9
Henry | Raspe | 13th AD | Holy Roman Empire | King | Thuringia | Combat | 0
Charles | V | 16th AD | Holy Roman Empire | Emperor | Habsburg | Natural Causes | 37
Henry | V | 12th AD | Holy Roman Empire | Emperor | Salian | Natural Causes | 14
Charles | VI | 18th AD | Holy Roman Empire | Emperor | Habsburg | Natural Causes | 29
Henry | VI | 12th AD | Holy Roman Empire | Emperor | Staufen | Natural Causes | 6
Henry | VII | 14th AD | Holy Roman Empire | Emperor | Luxembourg | Natural Causes | 1
Charles | VII | 18th AD | Holy Roman Empire | Emperor | Wittelsbach | Natural Causes | 2
Günther | von Schwarzburg | 14th AD | Holy Roman Empire | King | Schwarzburg | Natural Causes | 0
Vseslav | Briachislavich | 10th AD | Kievan Rus' | Grand Prince | Rurikid | Natural Causes | 0
Igor | I | 10th AD | Kievan Rus' | Grand Prince | Rurikid | Executed | 32
Iziaslav | I | 10th AD | Kievan Rus' | Grand Prince | Rurikid | Combat | 17
Mstislav | I | 12th AD | Kievan Rus' | Grand Prince | Rurikid | Unknown | 6
Rostislav | I | 12th AD | Kievan Rus' | Grand Prince | Rurikid | Unknown | 7
Sviatopolk | I | 10th AD | Kievan Rus' | Grand Prince | Rurikid | Combat | 1
Sviatoslav | I | 10th AD | Kievan Rus' | Grand Prince | Rurikid | Assassinated | 27
Viacheslav | I | 12th AD | Kievan Rus' | Grand Prince | Rurikid | Unknown | 0
Viacheslav | I | 12th AD | Kievan Rus' | Grand Prince | Rurikid | Natural Causes | 0
Vladimir | I | 10th AD | Kievan Rus' | Grand Prince | Rurikid | Unknown | 35
Vsevolod | I | 10th AD | Kievan Rus' | Grand Prince | Rurikid | Natural Causes | 0
Yaropolk | I | 10th AD | Kievan Rus' | Grand Prince | Rurikid | Assassinated | 8
Yaroslav | I | 10th AD | Kievan Rus' | Grand Prince | Rurikid | Natural Causes | 34
Yuri | I | 12th AD | Kievan Rus' | Grand Prince | Rurikid | Assassinated | 4
Igor | II | 12th AD | Kievan Rus' | Grand Prince | Rurikid | Assassinated | 0
Iziaslav | II | 12th AD | Kievan Rus' | Grand Prince | Rurikid | Unknown | 5
Mstislav | II | 12th AD | Kievan Rus' | Grand Prince | Rurikid | Unknown | 1
Sviatopolk | II | 12th AD | Kievan Rus' | Grand Prince | Rurikid | Unknown | 19
Sviatoslav | II | 10th AD | Kievan Rus' | Grand Prince | Rurikid | Natural Causes | 3
Vladimir | II | 12th AD | Kievan Rus' | Grand Prince | Rurikid | Natural Causes | 12
Vsevolod | II | 12th AD | Kievan Rus' | Grand Prince | Rurikid | Unknown | 7
Yaropolk | II | 12th AD | Kievan Rus' | Grand Prince | Rurikid | Unknown | 6
Iziaslav | III | 12th AD | Kievan Rus' | Grand Prince | Rurikid | Unknown | 1
Vladimir | III | 12th AD | Kievan Rus' | Grand Prince | Rurikid | Unknown | 0
Askold | None | 9th AD | Kievan Rus' | Grand Prince | Kyi | Unknown | 40
Olga | of Kiev | 10th AD | Kievan Rus' | Regent | Rurikid | Natural Causes | 17
Oleg | of Novgorod | 10th AD | Kievan Rus' | Grand Prince | Rurikid | Accident | 30
Antipater | Etesias | 3rd BC | Macedon | King | Antipatrid | Unknown | 0
Aeropus | I | 6th BC | Macedon | King | Argead | Unknown | 26
Alcetas | I | 6th BC | Macedon | King | Argead | Unknown | 29
Alexander | I | 5th BC | Macedon | King | Argead | Unknown | 44
Amyntas | I | 6th BC | Macedon | King | Argead | Unknown | 49
Argaeus | I | 7th BC | Macedon | King | Argead | Unknown | 38
Perdiccas | I | 7th BC | Macedon | King | Argead | Unknown | 22
Philip | I | 7th BC | Macedon | King | Argead | Combat | 38
Demetrius | I Poliorcetes | 3rd BC | Macedon | King | Antigonid | Executed | 8
Antipater | II | 3rd BC | Macedon | King | Antipatrid | Assassinated | 3
Aeropus | II | 4th BC | Macedon | King | Argead | Unknown | 3
Alcetas | II | 5th BC | Macedon | King | Argead | Assassinated | 6
Alexander | II | 4th BC | Macedon | King | Argead | Assassinated | 3
Amyntas | II | 4th BC | Macedon | King | Argead | Assassinated | 0
Archelaus | II | 4th BC | Macedon | King | Argead | Assassinated | 0
Argaeus | II | 4th BC | Macedon | King | Argead | Combat | 1
Perdiccas | II | 5th BC | Macedon | King | Argead | Unknown | 35
Philip | II | 4th BC | Macedon | King | Argead | Assassinated | 23
Demetrius | II Aetolicus | 3rd BC | Macedon | King | Antigonid | Combat | 10
Antigonus | II Gonatas | 3rd BC | Macedon | King | Antigonid | Natural Causes | 39
Amyntas | III | 4th BC | Macedon | King | Argead | Natural Causes | 22
Perdiccas | III | 4th BC | Macedon | King | Argead | Combat | 9
Philip | III Arrhidaeus | 4th BC | Macedon | King | Argead | Executed | 6
Antigonus | III Doson | 3rd BC | Macedon | King | Antigonid | Combat | 8
Philip | IV | 3rd BC | Macedon | King | Antipatrid | Natural Causes | 0
Alexander | IV | 4th BC | Macedon | King | Argead | Assassinated | 13
Amyntas | IV | 4th BC | Macedon | King | Argead | Executed | 3
Ptolemy | Keraunos | 3rd BC | Macedon | King | None | Combat | 2
Perseus | None | 2nd BC | Macedon | King | Antigonid | Executed | 11
Cassander | None | 4th BC | Macedon | King | Antipatrid | Natural Causes | 8
Sosthenes | None | 3rd BC | Macedon | King | Antipatrid | Unknown | 3
Archelaus | None | 5th BC | Macedon | King | Argead | Assassinated | 14
Crateuas | None | 4th BC | Macedon | King | Argead | Unknown | 0
Orestes | None | 4th BC | Macedon | King | Argead | Unknown | 3
Pausanias | None | 4th BC | Macedon | King | Argead | Assassinated | 0
Lysimachus | None | 3rd BC | Macedon | King | None | Combat | 5
Meleager | None | 3rd BC | Macedon | King | None | Executed | 0
Antipater | None | 4th BC | Macedon | Regent | Argead | Natural Causes | 11
Ptolemy | of Aloros | 4th BC | Macedon | Regent | Argead | Assassinated | 3
Pyrrhus | of | 3rd BC | Macedon | King | None | Combat | 2
Alexander | the Great | 4th BC | Macedon | King | Argead | Unknown | 13
Philip | V | 3rd BC | Macedon | King | Antigonid | Unknown | 42
Alexander | VI | 3rd BC | Macedon | King | Antipatrid | Assassinated | 3
Daniel | Aleksandrovich | 13th AD | Moscow | Grand Prince | Rurikid | Unknown | 20
Yuri | Dmitriyevich | 15th AD | Moscow | Grand Prince | Rurikid | Unknown | 0
Ivan | I | 14th AD | Moscow | Grand Prince | Rurikid | Unknown | 14
Vasily | I | 15th AD | Moscow | Grand Prince | Rurikid | Unknown | 35
Ivan | II | 14th AD | Moscow | Grand Prince | Rurikid | Unknown | 6
Vasily | II | 15th AD | Moscow | Grand Prince | Rurikid | Unknown | 35
Ivan | III | 15th AD | Moscow | Grand Prince | Rurikid | Natural Causes | 43
Vasily | III | 16th AD | Moscow | Grand Prince | Rurikid | Natural Causes | 28
Yuri | III | 14th AD | Moscow | Grand Prince | Rurikid | Assassinated | 22
Ivan | IV | 16th AD | Moscow | Grand Prince | Rurikid | Natural Causes | 13
Dmitry | Ivanovich | 14th AD | Moscow | Grand Prince | Rurikid | Unknown | 29
Simeon | Ivanovich | 14th AD | Moscow | Grand Prince | Rurikid | Natural Causes | 13
Dmitry | Yuryevich | 15th AD | Moscow | Grand Prince | Rurikid | Assassinated | 1
Vasily | Yuryevich | 15th AD | Moscow | Grand Prince | Rurikid | Unknown | 1
Rurik | I | 9th AD | Novgorod | Prince | Rurikid | Natural Causes | 17
Oleg | of Novgorod | 9th AD | Novgorod | Prince | Rurikid | Accident | 3
Albert | Frederick | 16th AD | (missing) | (missing) | Hohenzollern | Natural Causes | 50
Frederick | I | 17th AD | Prussia | Duke | Hohenzollern | Natural Causes | 12
Frederick | I | 18th AD | Prussia | King | Hohenzollern | Natural Causes | 13
Frederick | I | 18th AD | Prussia | King | Hohenzollern | Natural Causes | 27
William | I | 19th AD | Prussia | King | Hohenzollern | Natural Causes | 10
Frederick | II | 18th AD | Prussia | King | Hohenzollern | Natural Causes | 46
Frederick | II | 18th AD | Prussia | King | Hohenzollern | Natural Causes | 11
Frederick | III | 19th AD | Prussia | King | Hohenzollern | Natural Causes | 42
Frederick | IV | 19th AD | Prussia | King | Hohenzollern | Natural Causes | 20
Albert | None | 16th AD | Prussia | Duke | Hohenzollern | Natural Causes | 42
John | Sigismund | 17th AD | Prussia | Duke | Hohenzollern | Natural Causes | 1
Frederick | William | 17th AD | Prussia | Duke | Hohenzollern | Natural Causes | 47
George | William | 17th AD | Prussia | Duke | Hohenzollern | Natural Causes | 20
Severus | Alexander | 3rd AD | Rome | Emperor | Severan | Assassinated | 13
Romulus | Augustulus | 5th AD | Rome | Emperor | None | Natural Causes | 0
Marcus | Aurelius | 2nd AD | Rome | Emperor | Nerva-Antonine | Natural Causes | 19
Constantius | Chlorus | 4th AD | Rome | Emperor | Constaninian | Natural Causes | 1
Herennius | Etruscus | 3rd AD | Rome | Emperor | None | Combat | 1
Claudius | Gothicus | 3rd AD | Rome | Emperor | None | Natural Causes | 1
Tullus | Hostilius | 7th BC | Rome | King | None | Unknown | 30
Constans | I | 4th AD | Rome | Emperor | Constaninian | Assassinated | 13
Gordian | I | 3rd AD | Rome | Emperor | Gordian | Suicide | 0
Theodosius | I | 4th AD | Rome | Emperor | Theodosian | Natural Causes | 2
Valentinian | I | 4th AD | Rome | Emperor | Valentinian | Natural Causes | 11
Constantine | II | 4th AD | Rome | Emperor | Constaninian | Combat | 3
Constantius | II | 4th AD | Rome | Emperor | Constaninian | Natural Causes | 24
Maximinus | II | 4th AD | Rome | Emperor | Constaninian | Suicide | 2
Gordian | II | 3rd AD | Rome | Emperor | Gordian | Combat | 0
Philip | II | 3rd AD | Rome | Emperor | None | Assassinated | 5
Constans | II | 5th AD | Rome | Emperor | Theodosian | Executed | 2
Valentinian | II | 4th AD | Rome | Emperor | Valentinian | Assassinated | 16
Gordian | III | 3rd AD | Rome | Emperor | Gordian | Assassinated | 5
Constantine | III | 5th AD | Rome | Emperor | Theodosian | Executed | 2
Constantius | III | 5th AD | Rome | Emperor | Theodosian | Natural Causes | 0
Valentinian | III | 5th AD | Rome | Emperor | Theodosian | Assassinated | 30
Didius | Julianus | 2nd AD | Rome | Emperor | None | Executed | 0
Ancus | Marcius | 7th BC | Rome | King | Romulan | Natural Causes | 26
Petronius | Maximus | 5th AD | Rome | Emperor | Anicii | Assassinated | 0
Magnus | Maximus | 4th AD | Rome | Emperor | Theodosian | Executed | 4
Julius | Nepos | 5th AD | Rome | Emperor | None | Assassinated | 1
Julian | None | 4th AD | Rome | Emperor | Constaninian | Combat | 3
Licinius | None | 4th AD | Rome | Emperor | Constaninian | Executed | 15
Martinian | None | 4th AD | Rome | Emperor | Constaninian | Executed | 15
Maxentius | None | 4th AD | Rome | Emperor | Constaninian | Combat | 6
Vetranio | None | 4th AD | Rome | Emperor | Constaninian | Natural Causes | 0
Domitian | None | 1st AD | Rome | Emperor | Flavian | Assassinated | 15
Titus | None | 1st AD | Rome | Emperor | Flavian | Natural Causes | 2
Vespasian | None | 1st AD | Rome | Emperor | Flavian | Natural Causes | 9
Augustus | None | 1st BC | Rome | Emperor | Julio-Claudian | Natural Causes | 13
Caligula | None | 1st AD | Rome | Emperor | Julio-Claudian | Assassinated | 3
Claudius | None | 1st AD | Rome | Emperor | Julio-Claudian | Assassinated | 13
Nero | None | 1st AD | Rome | Emperor | Julio-Claudian | Suicide | 13
Tiberius | None | 1st AD | Rome | Emperor | Julio-Claudian | Natural Causes | 50
Commodus | None | 2nd AD | Rome | Emperor | Nerva-Antonine | Assassinated | 12
Hadrian | None | 2nd AD | Rome | Emperor | Nerva-Antonine | Natural Causes | 20
Nerva | None | 1st AD | Rome | Emperor | Nerva-Antonine | Natural Causes | 1
Trajan | None | 2nd AD | Rome | Emperor | Nerva-Antonine | Natural Causes | 18
Aemilian | None | 3rd AD | Rome | Emperor | None | Assassinated | 0
Anthemius | None | 5th AD | Rome | Emperor | None | Executed | 5
Aurelian | None | 3rd AD | Rome | Emperor | None | Assassinated | 5
Avitus | None | 5th AD | Rome | Emperor | None | Assassinated | 1
Balbinus | None | 3rd AD | Rome | Emperor | None | Assassinated | 0
Carinus | None | 3rd AD | Rome | Emperor | None | Combat | 2
Carus | None | 3rd AD | Rome | Emperor | None | Natural Causes | 0
Decius | None | 3rd AD | Rome | Emperor | None | Combat | 1
Diocletian | None | 3rd AD | Rome | Emperor | None | Natural Causes | 20
Eugenius | None | 4th AD | Rome | Emperor | None | Executed | 2
Florianus | None | 3rd AD | Rome | Emperor | None | Assassinated | 0
Galba | None | 1st AD | Rome | Emperor | None | Assassinated | 0
Galerius | None | 4th AD | Rome | Emperor | None | Natural Causes | 6
Gallienus | None | 3rd AD | Rome | Emperor | None | Assassinated | 14
Glycerius | None | 5th AD | Rome | Emperor | None | Assassinated | 1
Hostilian | None | 3rd AD | Rome | Emperor | None | Natural Causes | 0
Jovian | None | 4th AD | Rome | Emperor | None | Natural Causes | 0
Majorian | None | 5th AD | Rome | Emperor | None | Executed | 4
Maximan | None | 4th AD | Rome | Emperor | None | Suicide | 21
Numerian | None | 3rd AD | Rome | Emperor | None | Assassinated | 1
Olybrius | None | 5th AD | Rome | Emperor | None | Natural Causes | 0
Otho | None | 1st AD | Rome | Emperor | None | Suicide | 0
Pertinax | None | 2nd AD | Rome | Emperor | None | Assassinated | 0
Probus | None | 3rd AD | Rome | Emperor | None | Assassinated | 6
Pupienus | None | 3rd AD | Rome | Emperor | None | Assassinated | 0
Quintillus | None | 3rd AD | Rome | Emperor | None | Assassinated | 0
Saloninus | None | 3rd AD | Rome | Emperor | None | Assassinated | 14
Tacitus | None | 3rd AD | Rome | Emperor | None | Natural Causes | 0
Valerian | None | 3rd AD | Rome | Emperor | None | Combat | 7
Vitellius | None | 1st AD | Rome | Emperor | None | Assassinated | 0
Caracalla | None | 3rd AD | Rome | Emperor | Severan | Assassinated | 6
Diadumenian | None | 3rd AD | Rome | Emperor | Severan | Executed | 1
Elagabalus | None | 3rd AD | Rome | Emperor | Severan | Assassinated | 3
Geta | None | 3rd AD | Rome | Emperor | Severan | Assassinated | 0
Macrinus | None | 3rd AD | Rome | Emperor | Severan | Executed | 1
Honorius | None | 5th AD | Rome | Emperor | Theodosian | Natural Causes | 28
Joannes | None | 5th AD | Rome | Emperor | Theodosian | Executed | 1
Victor | None | 4th AD | Rome | Emperor | Theodosian | Combat | 4
Gratian | None | 4th AD | Rome | Emperor | Valentinian | Assassinated | 16
Valens | None | 4th AD | Rome | Emperor | Valentinian | Combat | 14
Romulus | None | 8th BC | Rome | King | Romulan | Assassinated | 37
Antoninus | Pius | 2nd AD | Rome | Emperor | Nerva-Antonine | Natural Causes | 22
Numa | Pompilius | 7th BC | Rome | King | Romulan | Natural Causes | 43
Valerius | Severus | 4th AD | Rome | Emperor | Constaninian | Suicide | 0
Libius | Severus | 5th AD | Rome | Emperor | None | Assassinated | 4
Septimius | Severus | 3rd AD | Rome | Emperor | Severan | Natural Causes | 17
Lucius | Tarquinius Priscus | 6th BC | Rome | King | Etruscan | Assassinated | 37
Lucius | Tarquinius Superbus | 6th BC | Rome | King | Etruscan | Unknown | 25
Philip | the Arab | 3rd AD | Rome | Emperor | None | Combat | 5
Constantine | the Great | 4th AD | Rome | Emperor | Constaninian | Natural Causes | 30
Maximinus | Thrax | 3rd AD | Rome | Emperor | None | Assassinated | 3
Servius | Tullius | 6th BC | Rome | King | Etruscan | Assassinated | 45
Valerius | Valens | 4th AD | Rome | Emperor | Constaninian | Executed | 15
Lucius | Verus | 2nd AD | Rome | Emperor | Nerva-Antonine | Natural Causes | 8
Sophia | Alekseyevna | 17th AD | (missing) | Regent | Romanov | Unknown | 7
Simeon | Bekbulatovich | 16th AD | Russia | Tsar | Qasim | Unknown | 1
Boris | Godunov | 16th AD | Russia | Tsar | Godunov | Natural Causes | 7
Alexander | I | 19th AD | Russia | Emperor | Romanov | Natural Causes | 24
Nicholas | I | 19th AD | Russia | Emperor | Romanov | Natural Causes | 29
Paul | I | 18th AD | Russia | Emperor | Romanov | Assassinated | 4
Catherine | I | 18th AD | Russia | Empress | Romanov | Natural Causes | 2
Alexis | I | 17th AD | Russia | Tsar | Romanov | Unknown | 30
Michael | I | 17th AD | Russia | Tsar | Romanov | Natural Causes | 31
(missing) | I | 17th AD | Russia | Tsar | Romanov | Natural Causes | 39
Feodor | I | 16th AD | Russia | Tsar | Rurikid | Unknown | 13
False Dmitry | I | 17th AD | Russia | Tsar | Rurikid | Assassinated | 0
Alexander | II | 19th AD | Russia | Emperor | Romanov | Assassinated | 26
Michael | II | 20th AD | Russia | Emperor | Romanov | Assassinated | 0
Nicholas | II | 20th AD | Russia | Emperor | Romanov | Executed | 22
Peter | II | 18th AD | Russia | Emperor | Romanov | Natural Causes | 2
Catherine | II | 18th AD | Russia | Empress | Romanov | Natural Causes | 34
Feodor | II | 17th AD | Russia | Tsar | Godunov | Assassinated | 0
Alexander | III | 19th AD | Russia | Emperor | Romanov | Natural Causes | 13
Peter | III | 18th AD | Russia | Emperor | Romanov | Assassinated | 0
Feodor | III | 17th AD | Russia | Tsar | Romanov | Unknown | 6
Anna | Ioannovna | 18th AD | Russia | Empress | Romanov | Natural Causes | 10
Ivan | IV | 16th AD | Russia | Tsar | Rurikid | Natural Causes | 37
Vasily | IV | 17th AD | Russia | Tsar | Rurikid | Executed | 4
Władysław | IV | 17th AD | Russia | Tsar | Vasa | Natural Causes | 2
Anna | Leopoldovna | 18th AD | Russia | Regent | Mecklenburg | Exile | 1
Dmitry | Medvedev | 21st AD | Russia | President | None | Alive | 4
Nikolai | Nikolaevich | 20th AD | Russia | Emperor | Romanov | Natural Causes | 0
Elizabeth | None | 18th AD | Russia | Empress | Romanov | Natural Causes | 20
Constantine | Pavlovich | 19th AD | Russia | Emperor | Romanov | Natural Causes | 0
Vladimir | Putin | 21st AD | Russia | President | None | Alive | 16
Ivan | V | 17th AD | Russia | Tsar | Romanov | Natural Causes | 13
Ivan | VI | 18th AD | Russia | Emperor | Welf | Assassinated | 1
Boris | Yeltsin | 20th AD | Russia | President | None | Natural Causes | 8
Yuri | Andropov | 20th AD | Soviet Union | General Secretary | Soviet | Natural Causes | 1
Lavrentiy | Beria | 20th AD | Soviet Union | First Deputy Chairman | Soviet | Executed | 0
Leonid | Brezhnev | 20th AD | Soviet Union | General Secretary | Soviet | Natural Causes | 18
Konstantin | Chernenko | 20th AD | Soviet Union | General Secretary | Soviet | Natural Causes | 1
Mikhail | Gorbachev | 20th AD | Soviet Union | President | Soviet | Alive | 6
Andrei | Gromyko | 20th AD | Soviet Union | Chairman | Soviet | Natural Causes | 0
Lev | Kamenev | 20th AD | Soviet Union | Deputy Chairman | Soviet | Executed | 2
Alexei | Kosygin | 20th AD | Soviet Union | Chairman | Soviet | Natural Causes | 12
Nikita | Kruschev | 20th AD | Soviet Union | First Secretary | Soviet | Natural Causes | 11
Vladimir | Lenin | 20th AD | Soviet Union | Chairman | Soviet | Natural Causes | 6
Georgy | Malenkov | 20th AD | Soviet Union | Chairman | Soviet | Natural Causes | 0
Vyacheslav | Molotov | 20th AD | Soviet Union | First Deputy Chairman | Soviet | Natural Causes | 0
Nikolai | Podgorny | 20th AD | Soviet Union | Chairman | Soviet | Natural Causes | 12
Joseph | Stalin | 20th AD | Soviet Union | General Secretary | Soviet | Natural Causes | 31
Dmitry | Ustinov | 20th AD | Soviet Union | Minister of Defence | Soviet | Natural Causes | 0
Grigory | Zinoviev | 20th AD | Soviet Union | Chairman | Soviet | Executed | 2
Niceto | Alcalá-Zamora | 20th AD | Spain | President | Liberal Republican Right | Natural Causes | 5
Manuel | Azaña | 20th AD | Spain | President | Republican Left | Natural Causes | 3
Diego | Barrio | 20th AD | Spain | President | Republican Union | Natural Causes | 0
Emilio | Castelar | 19th AD | Spain | President | Republican Possibilist | Natural Causes | 0
Estanislao | Figueras | 19th AD | Spain | President | Democratic Federal Republican | Natural Causes | 0
Francisco | Franco | 20th AD | Spain | Caudillo | Francoist | Natural Causes | 39
Joseph | I | 19th AD | Spain | King | Bonaparte | Natural Causes | 5
Juan | I | 20th AD | Spain | King | Bourbon | Alive | 38
Louis | I | 18th AD | Spain | King | Bourbon | Natural Causes | 0
Charles | I | 16th AD | Spain | King | Habsburg | Natural Causes | 39
Amadeo | I | 19th AD | Spain | King | Savoy | Natural Causes | 2
Philip | I | 16th AD | Spain | King | Trastámara | Natural Causes | 0
Isabella | I | 15th AD | Spain | Queen | Trastámara | Natural Causes | 29
Charles | II | 17th AD | Spain | King | Habsburg | Natural Causes | 35
Philip | II | 16th AD | Spain | King | Habsburg | Natural Causes | 42
Ferdinand | II | 15th AD | Spain | King | Trastámara | Natural Causes | 41
Isabella | II | 19th AD | Spain | Queen | Bourbon | Natural Causes | 35
Charles | III | 18th AD | Spain | King | Bourbon | Natural Causes | 29
Philip | III | 17th AD | Spain | King | Habsburg | Natural Causes | 22
Charles | IV | 18th AD | Spain | King | Bourbon | Natural Causes | 19
Philip | IV | 17th AD | Spain | King | Habsburg | Natural Causes | 44
Joanna | None | 16th AD | Spain | Queen | Trastámara | Natural Causes | 50
Francisco | Pi I Margall | 19th AD | Spain | President | Democratic Federal Republican | Natural Causes | 0
Nicolás | Salmerón | 19th AD | Spain | President | Progressive | Natural Causes | 0
Francisco | Serrano y Domínguez | 19th AD | Spain | President | None | Natural Causes | 0
Francisco | Serrano y Domínguez | 19th AD | Spain | Regent | None | Natural Causes | 2
Philip | V | 18th AD | Spain | King | Bourbon | Natural Causes | 23
Felipe | VI | 21st AD | Spain | King | Bourbon | Alive | 6
Ferdinand | VI | 18th AD | Spain | King | Bourbon | Natural Causes | 21
Ferdinand | VII | 19th AD | Spain | King | Bourbon | Natural Causes | 0
Ferdinand | VII | 19th AD | Spain | King | Bourbon | Natural Causes | 19
Alfonso | XII | 19th AD | Spain | King | Bourbon | Natural Causes | 10
Alfonso | XIII | 20th AD | Spain | King | Bourbon | Natural Causes | 44
Elizabeth | II | 20th AD | United Kingdom | Queen | Windsor | Alive | 68
George | III | 19th AD | United Kingdom | King | Hanover | Natural Causes | 18
George | IV | 19th AD | United Kingdom | King | Hanover | Natural Causes | 10
William | IV | 19th AD | United Kingdom | King | Hanover | Natural Causes | 6
Victoria | None | 19th AD | United Kingdom | Queen | Hanover | Natural Causes | 63
George | VI | 20th AD | United Kingdom | King | Windsor | Natural Causes | 15
Edward | VII | 20th AD | United Kingdom | King | Saxe-Coburg and Gotha | Natural Causes | 9
George | VII | 20th AD | United Kingdom | King | Windsor | Natural Causes | 25
Edward | VIII | 20th AD | United Kingdom | King | Windsor | Natural Causes | 0
Dmitry | Aleksandrovich | 13th AD | Vladimir | Grand Prince | Rurikid | Unknown | 14
Andrey | I | 12th AD | Vladimir | Grand Prince | Rurikid | Assassinated | 17
Ivan | I | 14th AD | Vladimir | Grand Prince | Rurikid | Unknown | 8
Mikhail | I | 12th AD | Vladimir | Grand Prince | Rurikid | Unknown | 1
Andrey | II | 13th AD | Vladimir | Grand Prince | Rurikid | Unknown | 2
Yaroslav | II | 13th AD | Vladimir | Grand Prince | Rurikid | Assassinated | 8
Yuri | II | 13th AD | Vladimir | Grand Prince | Rurikid | Combat | 24
Andrey | III | 13th AD | Vladimir | Grand Prince | Rurikid | Unknown | 13
Sviatoslav | III | 13th AD | Vladimir | Grand Prince | Rurikid | Unknown | 3
Vsevolod | III | 12th AD | Vladimir | Grand Prince | Rurikid | Unknown | 35
Yaropolk | III | 12th AD | Vladimir | Grand Prince | Rurikid | Unknown | 1
Yaroslav | III | 13th AD | Vladimir | Grand Prince | Rurikid | Unknown | 7
Yuri | III | 14th AD | Vladimir | Grand Prince | Rurikid | Assassinated | 4
Alexander | Mikhailovich | 14th AD | Vladimir | Grand Prince | Rurikid | Executed | 1
Dmitry | Mikhailovich | 14th AD | Vladimir | Grand Prince | Rurikid | Executed | 4
Konstantin | of Rostov | 13th AD | Vladimir | Grand Prince | Rurikid | Unknown | 1
Mikhail | of Tver | 14th AD | Vladimir | Grand Prince | Rurikid | Executed | 13
Alexander | Vasilyevich | 14th AD | Vladimir | Grand Prince | Rurikid | Unknown | 3
Alexander | Yaroslavich | 13th AD | Vladimir | Grand Prince | Rurikid | Natural Causes | 11
Mikhail | Yaroslavich | 13th AD | Vladimir | Grand Prince | Rurikid | Combat | 0
Vasily | Yaroslavich | 13th AD | Vladimir | Grand Prince | Rurikid | Unknown | 5
Friedrich | Ebert | 20th AD | Weimar Republic | President | Social Democratic | Natural Causes | 6
Paul | Hindenburg | 20th AD | Weimar Republic | President | None | Natural Causes | 9
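Every entry above follows the same eight-field schema (name, numeral/epithet, century, kingdom, title, house, cause of death, and reign length), so the dataset is straightforward to handle programmatically. The sketch below illustrates one possible way to parse pipe-delimited rows of this form into records; it is a minimal illustration only, and the Ruler class, its field names, and the file name ehos.txt are assumptions made for this example rather than the loader actually used elsewhere in this work.

    # Minimal sketch of loading EHoS-style rows. The Ruler class, its
    # field names, and the file name "ehos.txt" are illustrative
    # assumptions, not the exact code used in this dissertation.
    import csv
    from dataclasses import dataclass

    @dataclass
    class Ruler:
        name: str            # e.g. "Constantine"; "(missing)" where lost
        numeral: str         # Roman numeral or epithet; "None" if absent
        century: str         # e.g. "7th AD" or "6th BC"
        kingdom: str         # e.g. "Byzantium"
        title: str           # e.g. "Emperor"
        house: str           # dynasty or party; "None" if absent
        cause_of_death: str  # e.g. "Natural Causes"; "Alive" if still living
        reign_years: int     # length of reign in years

    def load_ehos(path):
        """Parse pipe-delimited rows like the table above into Ruler records."""
        rulers = []
        with open(path, newline="", encoding="utf-8") as f:
            for row in csv.reader(f, delimiter="|"):
                fields = [cell.strip() for cell in row]
                # Skip the header line and anything that is not a full row.
                if len(fields) != 8 or fields[0] == "Name":
                    continue
                rulers.append(Ruler(*fields[:7], reign_years=int(fields[7])))
        return rulers

Because each column holds a single categorical attribute, records of this shape map naturally onto the per-attribute input segments used when the dataset is presented to the segmented attractor network in Chapter VI.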