Optimal Input Encoding For Memristor Based Reservoir Computing

Charles Boutens

Supervisor: Prof. dr. ir. Joni Dambre Counsellor: Prof. dr. ir. Joni Dambre

Master's dissertation submitted in order to obtain the academic degree of Master of Science in Engineering Physics

Department of Electronics and Information Systems
Chair: Prof. dr. ir. Rik Van de Walle
Faculty of Engineering and Architecture
Academic year 2016-2017

Optimal Input Encoding For Memristor Based Reservoir Computing

Charles Boutens
Supervisor(s): Prof. dr. ir. Joni Dambre (Department of Electronics and Information Systems, Ghent University)

Abstract— With Moore's law reaching its end, alternatives to the digital computing paradigm are being investigated. These unconventional approaches range from quantum- and optical computing to promising analogue, neuromorphic implementations. A recent approach towards computation with analogue, physical systems is the emerging field of physical reservoir computing (PRC). By considering the physical system as an excitable, dynamical medium called a reservoir, this approach enables the exploitation of the intrinsic computational power available in all physical systems.
In this work the RC approach towards computation is applied to networks consisting of resistive switches (RS). A densely connected atomic switch network (ASN) and a network built up from TiO2 memristors are used as reservoirs in simulation. Both simulation models rely on volatile extensions of Strukov's widespread current controlled memristor (CCMR) model. The dynamics of the ASN in simulation are thoroughly characterized, and based on these observations two encoding schemes are proposed in order to solve two reservoir benchmark tasks, the memory capacity and the NARMA-10 task. Experiments executed with the reservoirs consisting of TiO2 memristors lead to the observation of the failure of the RC approach in the absence of volatility of the used devices.
By comparing the simulation with the actual dynamics of the ASN, it is concluded that the CCMR model, used to describe the individual atomic switches, fails at capturing some of the fundamental characteristics of the device. By looking at the performance of both the ASN and the TiO2 reservoir, and at similar research found in the literature, it is argued that the CCMR models are unsuited for the RC approach. Nevertheless, the real ASN devices do show promising reservoir properties. A voltage controlled memristor model (VCMR), already used in a memristor-based RC approach, is suggested to continue further research. Also, some interesting new architectures are briefly discussed that could increase the performance of these RS reservoirs.

Index terms – memristor based reservoir computing, neuromorphic computation, physical reservoir computing, memristor models, memristor models for reservoir computing.

I. Introduction

Digital computation has been the standard for over half a century, mainly thanks to its extreme robustness against noise and variability. However, with the continuous demand for more computational power and the slowing down of Moore's law [1] [2], alternatives to the silicon based computational methods are being developed and researchers are reconsidering the field of unconventional computing, which stayed on the sideline of their Turing machine brothers for many years [3].

One of the most promising fields of unconventional approaches to computation might be the brain-inspired field of analogue, neuromorphic computing [4]. Here, the ultimate vision is to use self-organised neural networks consisting of nano-scale components with variable properties and erroneous behaviour, inevitable at the nano-scale. These neuromorphic approaches require highly connected, complex neural networks with adaptive synapse-like connections. Recently [5] [6] [7] interest has arisen in the functionalities of locally connected switching networks. These networks consist of switches and memory components such as molecular transistors, negative differential resistances [8], memristors [6] or atomic switches [9].

Contrary to the digital framework, in analogue computing the state of the system can take up any continuous value. Hence processes cannot be reliably repeated with exact equivalence. Also, in the fabrication process of these analogue computation blocks, device variability is inevitable, and thus a computational framework aiming at using these systems should account for both inter- and intra-device variability.

This is where the RC approach towards computing with recurrent neural networks (RNN) comes into play [10] [11] [12]. Here an artificial, randomly assembled RNN is considered as a dynamical, excitable medium called the reservoir, whose internal dynamics are untrained. The RC concept relies on two key points. Firstly, the detailed nature of the reservoir is unimportant, only its overall dynamics play a role. Secondly, RC allows for a simple training approach where only the readout layer is trained. This flexibility enables the RC approach to be applied to a large variety of dynamical systems. Swapping the RNN reservoir with a physical system leads to what is called physical reservoir computing (PRC). PRC presents the right framework and tools to exploit the analogue computational power, intrinsic to physical dynamical systems, in a robust way: i.e. without the need to control the randomness and fluctuations of the system. The physical system is considered as a reservoir that maps the input via its own intrinsic dynamics onto a high dimensional feature space. These features are read out by measuring the system's state and are subsequently combined in a trained way (using any standard regression technique) to produce the desired output. PRC has been demonstrated in various physical systems, both experimentally and in simulations, ranging from a bucket of water in [13] and morphological computation [14] to very promising photonics implementations [15] [16] [17] [18] [19]. In the light of neuromorphic computing, the RC approach has also been applied to networks consisting of nano-scale switching elements based on memristive junctions [20] [21] [22] [23] [24].

With the advancements in material sciences, self-assembled networks of memristive devices at the nano-scale have been produced, forming very promising candidates in the search for synthetic synapses for the fabrication of neuromorphic systems. These devices exhibit properties similar to those present in biological neurons, such as hysteresis, short-term and long-term plasticity (STP, LTP), long-term depression etc. [6] [5]. One very promising implementation is a highly interconnected atomic switch network (ASN) [7] [5] [22] [9]. In the ASN, the individual atomic switches are composed of an Ag|Ag2S|Ag metal-insulator-metal (MIM) interface, where the switching process is governed mostly by a combination of two phenomena: the formation/annihilation of a conductive metal filament together with a bias induced phase transition in the insulator layer of the MIM junction. A multi-electrode array (MEA) is overgrown by a complex structure of densely connected silver nanowires, the ASN. The electrodes on the MEA can serve both as a source and as a readout. This densely connected network structure gives rise to new interesting properties unseen in single Ag|Ag2S|Ag atomic switches. It has been reported [5] that the ASNs show signs of an operational regime near the "edge of chaos", where the network exhibits avalanche dynamics, criticality and power law scaling of temporal metastability. Moreover, the ASN shows distributed, continuous network activity caused by the interplay between filament formation and dissolution across the whole network. A voltage drop due to the formation of a filament in the network instigates the filament's thermodynamical instability and hence its corresponding decay.

The second memristive device under investigation is the TiO2 memristor created by HP [25]. This device consists of a thin TiO2 film sandwiched between two platinum electrodes. The TiO2 layer consists of a highly conductive channel (TiO2−x), due to positively charged oxygen vacancies, and a very narrow insulating TiO2 barrier near the positive electrode. By applying a voltage across the memristor, a dopant drift in the conducting channel can modulate the barrier width. Due to vacancy migration, locally reduced TiO2 Magnéli phases are formed, which eventually line up to form highly conductive channels that extend along the memristor's active core. Again, both the formation of a conducting filament as well as a phase transition result in the observed resistance switching. The TiO2 memristor exhibits both volatile and non-volatile switching behavior [26] [6]. The metastable phase transition within the TiO2 core of the device lies at the basis of this volatile behavior. It happens before any thermodynamically stable phase transitions occur with the formation or annihilation of conductive channels consisting of reduced TiO2 filaments (non-volatile switching).
II. Experimental Setup

A. Memristor Models

In [25] a thin semiconductor film of thickness D, sandwiched between two metal contacts, is considered as shown in Figure 1. Here w corresponds to the width of the doped region. The total resistance of the memristor device is determined by two variable resistors connected in series. R_ON and R_OFF denote the resistance of the memristor in the two limit cases where w = D and w = 0 respectively. Ohm's law leads to the following port equation:

    v(t) = [R_ON x + R_OFF (1 − x)] i(t),    (1)

where x = w/D ∈ [0, 1]. Applying a bias v across the device will cause the charged dopants to drift towards the undoped region and hence move the boundary between both regions [27]. Here, the simplest case of ohmic electronic conduction and linear ionic drift in a uniform field is considered, leading to the following state equation for the state variable x:

    dx/dt = (µ R_ON / D^2) i(t),    (2)

where µ denotes the average ion mobility. The coupled equations (1), (2) take the normal form for a current controlled memristor [28]. This CCMR model is used as the basis for both the TiO2 memristor and the ASN models in [26] and [22] respectively.

Fig. 1. Schematic representation of the TiO2 memristor [29].
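Equations (1) and (2) are simple enough to integrate directly. The sketch below is independent of the MATLAB and SPICE models used later in this work: it drives a single CCMR with a sinusoidal current and traces the characteristic pinched hysteresis loop. Every numerical value in it is an illustrative placeholder rather than a fitted device value.

```python
import numpy as np

# Illustrative sketch of the CCMR of equations (1)-(2); all parameter values
# are placeholders, not the fitted device values used elsewhere in this work.
R_ON, R_OFF = 100.0, 16e3   # limiting resistances (ohm), assumed
D = 10e-9                   # film thickness (m), assumed
MU = 1e-14                  # average ion mobility (m^2 s^-1 V^-1), assumed

dt = 1e-4                   # integration step (s)
t = np.arange(0.0, 2.0, dt)
i_drive = 1e-4 * np.sin(2 * np.pi * 1.0 * t)    # 1 Hz sinusoidal current (A)

x = 0.1                     # initial normalised state x = w / D
v = np.empty_like(t)
for k, i in enumerate(i_drive):
    v[k] = (R_ON * x + R_OFF * (1.0 - x)) * i   # port equation (1)
    x += dt * MU * R_ON / D**2 * i              # state equation (2), forward Euler
    x = min(max(x, 0.0), 1.0)                   # keep x inside [0, 1]

# Plotting v against i_drive now traces the pinched hysteresis loop of the device.
```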

A simulation model for the ASN has been developed in MATLAB [22]. The CCMR model introduced above is used as a starting point to describe the behavior of a single atomic switch. The port equation (1) remains unchanged, with w ∈ [0, D] now representing the length of the Ag nano filament. If w = 0 there is no conducting filament, corresponding with a resistance R_OFF. For w = D the filament is fully formed and the switch is in the ON state, R_ON. Next, a new state equation is introduced for w:

    dw(t)/dt = [µ_v (R_ON / D) i(t)] Ω(w(t)) − τ (w(t) − D) + η(t),    (3)

where µ_v represents the ionic mobility of Ag+ and Ω(w) = w (D − w) / D^2 is a window function modelling the nonlinear dopant drift [29]. The second term represents the thermodynamical instability of the filament, and η(t) is a stochastic term accounting for fluctuations in the density of available silver ions and for the stochastic nature of the filament formation/dissolution process.

The TiO2 memristor model [26] also starts off from Strukov's CCMR model. However, here, in order to capture both the volatile and the non-volatile switching characteristics (Figure 2), three state variables x, y and z are introduced to describe the memristor. These are linked by the following set of coupled differential equations, represented by a volatile, a non-volatile and a charge cell:

    Volatile cell:       C_x dx/dt = −(x − y)/R_x + I_0(x)                          (4)
    Non-volatile cell:   C_y dy/dt = I_0(y) if z > q_p;  I_0(y) if z < q_N;  0 else  (5)
    Charge cell:         C_z dz/dt = I_mem − z/R_z                                   (6)

    where I_0(h) = I_mem µ_v R_ON f(h) / D^2.                                        (7)

Here f(h) is again a window function modelling the nonlinear dopant drift, and I_0(h) relates the input current I_mem to the drift velocity of h. This set of equations is implemented in SPICE, an analog electronic circuit simulator used in integrated circuit and board-level design.

Fig. 2. Volatile behavior of a TiO2 ReRAM cell. Effect of the interpulse timing on the relative change in conductance ∆C/C0. Blue: measured device response; red: modelled device response. (a) An interpulse time of 1s leads only to volatile state transitions. (b) Interpulse times of 600ms result in non-volatile switching [26].
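Outside of SPICE, equations (4)-(7) can be explored with a direct forward-Euler integration. The following sketch is not the SPICE netlist of [26]: the shape of the window function f(h), the threshold values and all numerical constants are illustrative assumptions, chosen only to show how the three cells are coupled.

```python
import numpy as np

# Minimal forward-Euler sketch of the three-cell model of equations (4)-(7).
# NOT the SPICE implementation of [26]; f(h), the thresholds and every numerical
# value below are illustrative assumptions.
R_ON, D, MU_V = 100.0, 10e-9, 1e-14          # assumed device constants
CX, RX = 1e-3, 1.0                            # volatile cell, assumed
CY = 1e-3                                     # non-volatile cell, assumed
CZ, RZ = 1e-3, 1.0                            # charge cell, assumed
Q_P, Q_N = 0.5, -0.5                          # charge thresholds, assumed

def f(h):
    # Window function for the nonlinear dopant drift; parabolic form assumed.
    return h * (1.0 - h)

def i0(h, i_mem):
    # Equation (7): couples the input current to the drift velocity of h.
    return i_mem * MU_V * R_ON * f(h) / D**2

def step(state, i_mem, dt):
    x, y, z = state
    dx = (-(x - y) / RX + i0(x, i_mem)) / CX        # volatile cell, eq. (4)
    if z > Q_P or z < Q_N:                          # non-volatile cell, eq. (5)
        dy = i0(y, i_mem) / CY
    else:
        dy = 0.0
    dz = (i_mem - z / RZ) / CZ                      # charge cell, eq. (6)
    return (x + dt * dx, y + dt * dy, z + dt * dz)

# Example: response of the state variables to a train of current pulses.
dt, state = 1e-4, (0.1, 0.1, 0.0)
drive = np.where((np.arange(20000) % 5000) < 500, 1e-4, 0.0)   # pulsed I_mem (A)
trace = []
for i_mem in drive:
    state = step(state, i_mem, dt)
    trace.append(state)
```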
B. Memristor Reservoirs

The ASN reservoir is built by starting from a 10×10 grid and connecting the different nodes. Both short range connections, between nearest neighbors, as well as long range connections are formed, Figure 3. Each connection represents an atomic switch that is modeled by the equations (1), (3) introduced above, where the different memristor parameters are drawn from a corresponding probability distribution [22]. For the TiO2 reservoir, a similar approach is used. Now a 3×3 grid is connected by memristors in such a way as to form a hexagonal structure. Each individual memristor is described by the (semi-volatile) SPICE model [26] described above. Model parameters are again drawn from a random distribution with mean given by the values in [26] and σ = 25%.

Fig. 3. Network configurations used. a) Sparsely connected network with some long ranging connections (maximally ranging over 10 neighboring nodes). b) Densely connected network with only close connections (maximally ranging over 2 neighboring nodes). Each grid point serves as a node at which the voltage is read out. The green and red nodes represent the input and ground node respectively.
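In simulation such a reservoir reduces to a resistor network whose edge conductances depend on the individual device states. The sketch below shows one plausible way to assemble it; it is not the MATLAB code of [22]. Nodes sit on a grid, edges are drawn at random up to a maximum grid range (cf. the sparse and dense configurations of Figure 3), per-device parameters are sampled from a distribution, and the node voltages at one instant follow from nodal analysis with the input node clamped to the source voltage and the ground node to 0V. The edge count, the parameter distributions and the small leak conductance are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def build_grid_network(n=10, max_range=10, n_edges=300):
    """Random network on an n x n grid; edges connect nodes at most `max_range`
    grid steps apart (cf. Figure 3). Per-edge parameters are drawn at random."""
    coords = [(i, j) for i in range(n) for j in range(n)]
    edges = set()
    while len(edges) < n_edges:
        a, b = rng.integers(0, n * n, size=2)
        if a == b:
            continue
        (ia, ja), (ib, jb) = coords[a], coords[b]
        if max(abs(ia - ib), abs(ja - jb)) <= max_range:
            edges.add((min(a, b), max(a, b)))
    params = {e: {"R_on": rng.normal(1e2, 25), "R_off": rng.normal(1e4, 2.5e3),
                  "x": rng.uniform(0.0, 1.0)} for e in edges}
    return n * n, params

def node_voltages(n_nodes, params, in_node, gnd_node, v_in):
    """Solve the resistive network for all node voltages (nodal analysis)."""
    G = np.zeros((n_nodes, n_nodes))
    for (a, b), p in params.items():
        g = 1.0 / (p["R_on"] * p["x"] + p["R_off"] * (1.0 - p["x"]))  # eq. (1)
        G[a, a] += g; G[b, b] += g; G[a, b] -= g; G[b, a] -= g
    G += 1e-12 * np.eye(n_nodes)   # tiny leak keeps the system well conditioned
    free = [k for k in range(n_nodes) if k not in (in_node, gnd_node)]
    v = np.zeros(n_nodes)
    v[in_node] = v_in
    rhs = -G[np.ix_(free, [in_node, gnd_node])] @ np.array([v_in, 0.0])
    v[free] = np.linalg.solve(G[np.ix_(free, free)], rhs)
    return v

n_nodes, params = build_grid_network()
v = node_voltages(n_nodes, params, in_node=0, gnd_node=99, v_in=1.0)
```

A time-dependent simulation would alternate this nodal solve with a per-edge update of the state variables, using the edge currents implied by the solved voltages.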
C. Reservoir Measures

Measure of Non-Linearity

The 'deviation from linearity in the frequency domain' [30], denoted by δφ, is introduced as a first measure for the nonlinear mapping capacities of the reservoir. This measure is obtained by feeding the system with a single frequency (fc) sine wave and calculating the ratio between the energy in the original input frequency, Ec, and the energy contained in all other frequencies (minus the DC component), Etot, in the system's response:

    δφ = 1 − Ec/Etot.

A δφ close to one corresponds to a strongly nonlinear regime, as nearly all the energy is located in higher harmonics. On the other hand, a δφ close to zero relates to a linear regime.

Linear Memory Capacity

The following is a standard measure related to the memory in recurrent neural networks [31]. The input u(n) is sampled from a uniform distribution between [0, 0.5] and the desired output is defined as dφ(n) = u(n − φ), a φ-delayed version of the input. The capacity function Cφ is given by:

    Cφ = Cov(y, dφ)^2 / (var(y) var(dφ)).

Adding up these capacity functions for all positive φ values leads to

    Cmem = Σφ Cφ,

which is called the linear memory capacity of the reservoir.
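Both measures are straightforward to compute from recorded reservoir signals. The sketch below is a minimal version: δφ is taken from the power spectrum of one node signal (here Etot is interpreted as the total response energy excluding DC, which keeps δφ within [0, 1]), and the memory capacity uses a ridge-regression readout per delay. The maximum delay and the ridge parameter are arbitrary choices.

```python
import numpy as np

def delta_phi(response, fs, fc):
    """Deviation from linearity, delta_phi = 1 - Ec / Etot (cf. [30]).
    `response` is one recorded node signal sampled at `fs` Hz, `fc` the input frequency."""
    spectrum = np.abs(np.fft.rfft(response)) ** 2
    freqs = np.fft.rfftfreq(len(response), d=1.0 / fs)
    spectrum[0] = 0.0                      # discard the DC component
    e_tot = spectrum.sum()                 # total energy minus DC (assumption, see above)
    e_c = spectrum[np.argmin(np.abs(freqs - fc))]
    return 1.0 - e_c / e_tot

def memory_capacity(states, u, max_delay=20, ridge=1e-6):
    """Linear memory capacity: sum over delays of the capacity function C_phi.
    `states` is a (timesteps x readouts) matrix of measured reservoir signals,
    `u` the scalar input sequence driving the reservoir."""
    X = np.hstack([states, np.ones((len(states), 1))])   # linear readout with bias
    total = 0.0
    for phi in range(1, max_delay + 1):
        d = u[:-phi]                       # target: u(n - phi)
        Xp = X[phi:]
        w = np.linalg.solve(Xp.T @ Xp + ridge * np.eye(X.shape[1]), Xp.T @ d)
        y = Xp @ w
        C = np.cov(y, d)
        total += C[0, 1] ** 2 / (C[0, 0] * C[1, 1])
    return total
```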

Academic Benchmark Task

A benchmark task in reservoir characterization, used to measure a reservoir's capacity both on nonlinear mapping as well as on memory persistence, is the nonlinear auto-regressive moving average 10 (NARMA-10) task. The input u(k) is again drawn from a uniform distribution in the interval [0, 0.5] and the output is defined by the following time series d(k):

    d(k+1) = α d(k) + β d(k) Σ_{i=1..n} d(k−i) + γ u(k) u(k−9) + δ,

with α = 0.3, n = 9, β = 0.05, γ = 1.5 and δ = 0.1. In order to predict the time series given by d(k), the computational system at hand must have memory of the 10 previous inputs, and equally be able to compute the nonlinear combinations in the expression.
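The NARMA-10 target and the error measure used later (the NRMSE) can be generated as follows; this is a direct transcription of the definition above, with a fixed random seed purely for reproducibility.

```python
import numpy as np

def narma10(u):
    """NARMA-10 target series d(k) for an input sequence u(k) ~ U[0, 0.5]."""
    alpha, beta, gamma, delta, n = 0.3, 0.05, 1.5, 0.1, 9
    d = np.zeros_like(u)
    for k in range(9, len(u) - 1):
        d[k + 1] = (alpha * d[k]
                    + beta * d[k] * np.sum(d[k - n:k])   # sum_{i=1..n} d(k - i)
                    + gamma * u[k] * u[k - 9]
                    + delta)
    return d

def nrmse(y, d):
    """Normalized root-mean-square error used to score the NARMA-10 task."""
    return np.sqrt(np.mean((y - d) ** 2) / np.var(d))

u = np.random.default_rng(0).uniform(0.0, 0.5, 2000)
d = narma10(u)
```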

III. Results

A. Atomic Switch Network

Step Response

As a first experiment the network is driven by step inputs with varying heights, Figure 4. The transient time presented in Figure 4 a is obtained by looking at the voltage responses at each of the network nodes, Figure 4 b.1, for a 1V input bias. After an initial transient regime (the transient time) each of these node voltages settles to a constant value, after which the state of the system stays unchanged. This can be understood by looking at the individual atomic switches. Figure 4 b.2 shows the progress of the fraction of formation of the filament lengths x(t) = w(t)/D ∈ [0, 1]. The squares in Figures 4 b.1 and b.2 show the relation between the change in filament width and the corresponding change in voltage response at the different nodes. For clarity the initial fierce filament/voltage changes are enlarged in Figures 4 c.1 and c.2. Here an important fact catches the eye, namely that immediately after the start of the experiments, t ∈ (0s − 0.01s), the largest part of the filaments 'dies out', meaning that the filament length exponentially decays to x = 0 and the corresponding AS switches to the high resistance OFF state. This exponential decay is caused by the dissolution term −τ (w(t) − D) in equation (3). The same behavior occurs for all applied biases. However, with increasing bias, an increasing amount of the filaments 'survive'. If the applied bias is too low, on the other hand, then all filaments will decay towards x = 0, as the flux dependent growth rate can't overcome the decay rate. This switching towards the OFF state happens very quickly due to its exponential form and short timescale τ^-1, resulting in the short transient times seen for biases under 0.4V.

Fig. 4. Network's response for the sparse network configuration to an input step at t = 0. a) Transient times for different applied biases. b.1) Voltage progression at different nodes for 1V bias. The voltage at the input node is depicted by the yellow constant lines. b.2) Fraction of the filament formation for the different atomic switches. c.1-c.2) Zoom in on the initial transient for the 1V applied bias case.

In order to better grasp this 'die out' phenomenon, the progression of filament lengths across the network for different times is presented in Figure 5. At time t = 0 all the filament lengths are initialized randomly. However, already after t = 0.01s, the largest part of the network has died out, and only three switches are still notably conducting, corresponding to the purple, orange and blue filaments in Figure 4 c.1. As time progresses a couple more filaments are formed until the system reaches its steady state. The current looks for the path with least resistance connecting the input to the ground (basically it solves a kind of shortest path problem). As the current flows along this path, it initializes a chain effect: current passing through the memristor leads to an increase in filament length (due to the electronic flux dependent growth rate). The corresponding increase in the memristor's conductance in turn results in a higher current density along this path. This cycle is repeated until the filament is fully formed. The same phenomenon is observed for the dense network configuration, Figure 6.

Fig. 5. Filament lengths x ∈ [0, 1] for the sparse network configuration at different times for a step input with height 1V.

Fig. 6. Filament lengths x ∈ [0, 1] for the dense network configuration at different times for a step input with height 2V.

Frequency Response

The values of δφ, seen in Figure 7, are relatively low, corresponding to a nearly linear system response. It can be noted that increasing the amplitude results in an increase in δφ. The same, be it a bit more subtle, holds for increasing biases, except for low amplitudes where an additional bias leads to a decrease in δφ.

Fig. 7. δφ = 1 − Ec/Etot as a measure of the network's nonlinearity for input sine waves with fc = 11Hz and a range of different biases and amplitudes.

Figure 8 shows the network's response for some specific bias-amplitude combinations. In Figure 8 a.1, small input amplitude and no extra bias term, the network behaves in a completely linear fashion, producing mere scalings of the input voltage. Figure 8 a.2 shows why: the applied input of 1V isn't strong enough to sustain any filament formation and the decay term outweighs the growth rate in the state equation. All memristors nearly instantaneously switch to the OFF state and the network now basically consists of plain resistors. As the amplitude is raised, more current passes through the network, leading to a higher amount of filaments being formed, Figure 8 b and c. This results in a larger part of the network contributing to the nonlinear system's response, Figures 8 b.2 versus c.2 (more filaments are formed and annihilated each period). Hence a larger part of the input's energy is found in higher harmonics of the input frequency. The effect of an additional applied bias can be seen by looking at Figure 8 d.2 versus Figure 8 c.2. During the second part of the period, where the sine becomes negative, the extra voltage provided by the bias helps sustain a part of the formed filaments. This way, the system responds longer in a nonlinear fashion to the applied input, which leads to the measured increase in δφ. Finally, from Figures 8 e.1 and e.2 the decrease in δφ for low amplitudes and increasing biases can be understood. If the applied bias is too large relative to the signal's amplitude, then most filaments that are fully formed will not decay in the second half of the period. The corresponding memristors will remain saturated for most of the time and act as resistors with resistance R_ON. This leads to linear responses as seen in Figure 8 e.1 and to low δφ values.

Fig. 8. The ASN's response, driven by an input sine wave with frequency fc = 11Hz, with varying biases and amplitudes. (a.1-b.1-c.1-d.1-e.1) Voltage progression at different nodes (including the input node) for time periods when the system has reached its steady state. (a.2-b.2-c.2-d.2-e.2) Filament length x ∈ [0, 1] for these time periods.

Memory Capacity

Due to the nature of the discrete input u(n) for the memory capacity and NARMA task, namely white noise, it is straightforward to opt for the encoding schemes presented in Figure 9. Each value of the discrete input u(n) is mapped to a pulse with a certain width and amplitude. Also, an additional bias is required for input pulses to produce significant changes in the ASN's filament lengths. A bias, width and amplitude sweep was performed in order to determine which combinations produce the most interesting system responses. Here it was seen that pulses with the right amplitude and width combination on top of a constant bias result in momentary state changes of the ASN, by forming some additional filaments that create short lived extra conducting channels. The used input encoding values and the corresponding results for the memory capacity task can be found in Table I. Compared to the hierarchical single-cycle-reservoir (SCR) architectures in [23], where normalized memory capacities of 0.9 are reached, here it can be clearly noted that the used reservoir doesn't show a lot of memory.

Fig. 9. Input encoding schemes used for the memory capacity and NARMA-10 task. The discrete input u(n) ∈ [0, 0.5] is mapped to the continuous time signal u(t) ∈ V1 + V2 [0, 0.5] with a certain width.

TABLE I
Memory Capacity: Encoding Scheme 1 / 2

∆t (s)   V1-V2 = 1.5-2 V   V1-V2 = 2.75-2 V   V1-V2 = 3.75-2 V
0.1      0.09 / 0.07       0.10 / 0.08        0.08 / 0.07
0.05     0.06 / 0.06       0.08 / 0.06        0.07 / 0.05
0.03     0.05 / 0.04       0.07 / 0.06        0.10 / 0.09

Memory capacity values for the ASN where input encoding scheme 1/2 is used. The values are normalized by dividing by the number of readout electrodes N = 14.
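The encoding can be written down compactly as a mapping from the discrete samples u(n) to a piecewise-constant voltage waveform. The sketch below assumes a 50% duty cycle and an arbitrary bias level between pulses; only the V1 + V2·u(n) pulse level follows the description of Figure 9.

```python
import numpy as np

def encode_pulses(u, dt_pulse, v1, v2, v_bias, fs=1000.0):
    """Map u(n) in [0, 0.5] to a pulse train: during each pulse of width `dt_pulse`
    the level is V1 + V2 * u(n) (cf. Figure 9); between pulses the line sits at a
    constant bias. The 50% duty cycle and the bias level are assumptions."""
    spp = int(dt_pulse * fs)          # samples per pulse
    period = 2 * spp                  # pulse followed by a bias-only gap
    signal = np.full(len(u) * period, v_bias)
    for n, value in enumerate(u):
        signal[n * period:n * period + spp] = v1 + v2 * value
    return signal

# Example with values loosely echoing Table I: V1 = 2.75 V, V2 = 2 V, 0.05 s pulses.
u = np.random.default_rng(0).uniform(0.0, 0.5, 200)
waveform = encode_pulses(u, dt_pulse=0.05, v1=2.75, v2=2.0, v_bias=1.0)
```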

Fig. 8. The ASN’s response, driven by an input sine wave with frequency fc = 11Hz with varying biases and amplitudes. (a.1- b.1-c.1-d.1-e.1) Voltage progression at different nodes (including input node) for time periods when the system has reached its steady state. (a.2-b.2-c.2-d.2-e.2) Filament length x ∈ [0, 1] for these time periods.

0.2 are obtained. In [32] it is stated that the best NRMSE for the NARMA-10 task obtained with a linear reservoir in the ESN approach is 0.4. The performance of the ASN on both the memory capacity and NARMA-10 task can be understood from the fact that only a small part of the reservoir actually contributes to the computation, caused by the shortest-path phenomenon described earlier. Fig. 10. T iO2 network response to an applied sine wave with fre- quency fz = 1Hz and amplitude 10µ A. a.1) Progression of filament length. a.2) Detailed view on the blue square of a.1). B. T iO2 Reservoir b.1) Current response through the different memristors. b.2) Zoomed in version of the current response. For the networks of interconnected T iO2 memristors in the SPICE environment, the used set of differential equa- tions turned out to be unstable in a network configuration resolved this problem but resulted in a new, more funda- driven by a voltage source. Switching to a current source, mental issue regarding the non-volatility of these devices. as it presented a more natural choice for the CCMR models, Something fundamentally different about the T iO2 model compared to the ASN is its non-volatile switching behav- case as can be seen by the sudden unpredictable change in ior. In [24] it is stated how device volatility is key for the the system’s state while operating in a steady regime. RC approach to work as it is closely related to the echo state property (ESP) for RNNs [33] which makes sure that IV. Conclusion and Future Work the functional relationship between the input and the sys- By comparing the results obtained from the ASN model tem’s response is localized in time. However, computation to the physical ASN’s behavior, it can be concluded that with these networks would be possible as long as there ex- the CCMR model doesn’t capture some of the main char- ists a unique mapping from the input to the state of the acteristics of the physical system. The model ascribes reservoir. A first remark is that only zero DC inputs can the resistance switching of an individual AS to the cur- be used to drive this network, otherwise the memristors rent passing through the junction instead of the voltage will saturate and behave as simple resistors. One option is across its terminals. This leads to the described chain- to use frequency encoding schemes without any additional effect, where an increase in conductance is amplified by bias in order to satisfy this condition. As a first step, a sine additional current passing through the device. Where the wave with f = 1Hz is applied to the network and both the physical ASN shows distributed, continuous network ac- current through the memristors as the filament widths x tivity caused by the interplay between filament formation are measured. Figure 10 a.1 shows how again only a cou- and dissolution across the whole network, the modeled net- ple of filaments grow towards x = 1 after the current is work’s response is mainly restricted to a small set of fila- applied. These filaments correspond to the high current ments forming the conducting channel of least resistance. values shown in Figure 10 b.1. The same mechanism oc- This non-conformity between simulation and reality can curs as in the ASN network, the current again looks for be ascribed to the used CCMR model based on Strukov’s the shortest path from input to ground, which is formed memristor model. by these three filaments. 
They transport the majority of In [24] volatile extensions of Strukov’s CCMR are in- the current. The difference now however, is that the other vestigated for RC purposes. Here, in order to create the filaments in the network don’t decay but stay around their richest reservoir dynamics as a whole, each CCMR has an initial resistance value RINIT (which is a clear indication of individual current source tuning its response. From this it the fact that the ESP isn’t obeyed). In this case, RINIT is can be understood that in order to make the RC approach chosen corresponding to x = 0.5. As can be seen in Figure work for a reservoir consisting of memristors described by 10 b.2, the other memristors still conduct a small amount (the volatile extension of) Strukov’s model, a huge amount of the current which form very interesting transformations of parameters need to be introduced and tuned according of the original input sine wave. Looking back at 10 a.1, it to each individual memristor. Avoiding the direct tuning appears as if the network has settled into a steady regime of all these different parameters is precisely the reason why after t ≈ 90s, however, out of the blue, at t ≈ 190s the fil- RC formed such a promising approach towards analogue ament widths start changing again as can be seen in more computing. It can be argued that Strukov’s CCMR model detail in 10 a.2. The same phenomenon occurs for differ- and its extensions in general, are unsuited for RC purposes, ent amplitudes and frequencies. A straightforward exam- regardless of their physical correctness. ple that clarifies the problem that arises when performing Several physical phenomena lie at the origin of the re- computational tasks in this scenario, can be found in the sistive switching effect. These phenomena have been dis- higher harmonic generation (HHG) task. Here the network cussed thoroughly in the literature in the past few years, responses, i.e. the currents through the different memris- [34] [35] [36]. However progress in mathematical models tors as measured in Figure 10 b.1, are combined to form remains relatively modest, see [37] [38] for a nice overview. higher harmonic versions of the input (in this case a sine The ASN does show interesting characteristics that make it wave with frequency f = 2Hz). A first time, the training a very suitable candidate for RC. In order to easily further phase runs from t = 100s (after the initial transients have investigate different architectures, encoding schemes and died out, as can be verified by Figure 10 a.1) until t = 150s, operating regimes, a good, physically valid model is key. and testing occurs between t = 150s and t = 175. Here the Hence as a first step it would be interesting to look at the system performs incredibly well with an accuracy of 96% resemblance between a network simulated with the volt- in reproducing the desired wave form. However when test- age controlled memristor (VCMR) model [39] and the real ing is repeated over an equally long time interval, but now ASN. This VCMR model has already been applied to form in the region where the sudden state change occurs, from randomly connected networks in a SCR hierarchical archi- t = 200s until t = 225s, the accuracy drops to 86%. This tecture [23]. The individual neurons, consisting of these is obvious as the features presented to the trained weights randomly connected memristor networks, were inspired by in order to optimally produce the desired output suddenly the ASNs. 
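The higher harmonic generation experiment described above boils down to fitting a linear readout on the measured memristor currents over one time window and evaluating it on later windows. The sketch below shows those mechanics with placeholder data: the current matrix is random noise standing in for the traces of Figure 10 b.1, and the window boundaries simply mirror the ones quoted in the text.

```python
import numpy as np

def train_readout(features, target, ridge=1e-8):
    """Ridge-regression readout mapping measured currents to the desired waveform."""
    X = np.hstack([features, np.ones((len(features), 1))])
    return np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ target)

def apply_readout(features, w):
    return np.hstack([features, np.ones((len(features), 1))]) @ w

# Placeholder data: `currents` stands in for the memristor currents of Figure 10 b.1.
fs, f_in = 100.0, 1.0
t = np.arange(0.0, 250.0, 1.0 / fs)
currents = np.random.default_rng(0).normal(size=(len(t), 12))   # placeholder traces
target = np.sin(2 * np.pi * 2.0 * f_in * t)                     # second harmonic

train = (t >= 100.0) & (t < 150.0)
test_a = (t >= 150.0) & (t < 175.0)     # steady window
test_b = (t >= 200.0) & (t < 225.0)     # window containing the state change

w = train_readout(currents[train], target[train])
for name, sel in [("steady", test_a), ("after state change", test_b)]:
    y = apply_readout(currents[sel], w)
    err = np.sqrt(np.mean((y - target[sel]) ** 2) / np.var(target[sel]))
    print(name, "NRMSE:", round(float(err), 3))
```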
IV. Conclusion and Future Work

By comparing the results obtained from the ASN model to the physical ASN's behavior, it can be concluded that the CCMR model doesn't capture some of the main characteristics of the physical system. The model ascribes the resistance switching of an individual AS to the current passing through the junction instead of the voltage across its terminals. This leads to the described chain effect, where an increase in conductance is amplified by additional current passing through the device. Where the physical ASN shows distributed, continuous network activity caused by the interplay between filament formation and dissolution across the whole network, the modeled network's response is mainly restricted to a small set of filaments forming the conducting channel of least resistance. This non-conformity between simulation and reality can be ascribed to the used CCMR model based on Strukov's memristor model.

In [24] volatile extensions of Strukov's CCMR are investigated for RC purposes. There, in order to create the richest reservoir dynamics as a whole, each CCMR has an individual current source tuning its response. From this it can be understood that in order to make the RC approach work for a reservoir consisting of memristors described by (the volatile extension of) Strukov's model, a huge amount of parameters needs to be introduced and tuned according to each individual memristor. Avoiding the direct tuning of all these different parameters is precisely the reason why RC formed such a promising approach towards analogue computing. It can be argued that Strukov's CCMR model and its extensions in general are unsuited for RC purposes, regardless of their physical correctness.

Several physical phenomena lie at the origin of the resistive switching effect. These phenomena have been discussed thoroughly in the literature in the past few years [34] [35] [36]. However, progress in mathematical models remains relatively modest; see [37] [38] for a nice overview. The ASN does show interesting characteristics that make it a very suitable candidate for RC. In order to easily further investigate different architectures, encoding schemes and operating regimes, a good, physically valid model is key. Hence, as a first step it would be interesting to look at the resemblance between a network simulated with the voltage controlled memristor (VCMR) model [39] and the real ASN. This VCMR model has already been applied to form randomly connected networks in a SCR hierarchical architecture [23]. The individual neurons, consisting of these randomly connected memristor networks, were inspired by the ASNs. Hence, this VCMR model might be a better fit than the currently used CCMR models to simulate the behavior of the ASN.

Regarding the drastic improvements made in [23] by introducing the hierarchical SCR architecture, some additional ideas on reservoir architectures that could be interesting to look at are suggested here. The weights connecting the individual 'neurons' of the SCR are untrained and the best values are found by performing a grid search. As a first improvement, FORCE learning [40] could be used in order to train these connections in an online manner. Additionally, instead of only using the parallel SCR formation, a deep architecture could be used, inspired by the deep RC approach with ESNs introduced in [41]. Here several reservoirs are stacked, forming different architectures. This approach leads to the processing of the input on different time scales by different reservoirs. However, the signal to noise ratio might decrease rapidly with an increasing number of reservoirs. Here again, the weights can be trained using FORCE learning.

References

[1] S. Kumar, "Fundamental limits to Moore's law," Fundamental Limits to Moore's Law, Stanford University 9 (2012).
[2] M. M. Waldrop, "The chips are down for Moore's law," Nature News 530, 144 (2016).
[3] Z. Konkoli and G. Wendin, "Toward bio-inspired information processing with networks of nano-scale switching elements," arXiv preprint arXiv:1311.6259 (2013).
[4] C. Mead, "Neuromorphic electronic systems," Proceedings of the IEEE 78, 1629 (1990).
[5] A. Z. Stieg, A. V. Avizienis, H. O. Sillin, C. Martin-Olmos, M. Aono, and J. K. Gimzewski, "Emergent criticality in complex Turing B-type atomic switch networks," Advanced Materials 24, 286 (2012).
[6] R. Berdan, E. Vasilaki, A. Khiat, G. Indiveri, A. Serb, and T. Prodromakis, "Emulating short-term synaptic dynamics with memristive devices," Scientific Reports 6 (2016).
[7] A. V. Avizienis, H. O. Sillin, C. Martin-Olmos, H. H. Shieh, M. Aono, A. Z. Stieg, and J. K. Gimzewski, "Neuromorphic atomic switch networks," PLoS ONE 7, e42772 (2012).
[8] D. B. Strukov and K. K. Likharev, in Nanotechnology (IEEE-NANO), 2011 11th IEEE Conference on (IEEE, 2011) pp. 865–868.
[9] E. Demis, R. Aguilera, H. Sillin, K. Scharnhorst, E. Sandouk, M. Aono, A. Stieg, and J. Gimzewski, "Atomic switch networks - nanoarchitectonic design of a complex system for natural computing," Nanotechnology 26, 204003 (2015).
[10] H. Jaeger, "The echo state approach to analysing and training recurrent neural networks - with an erratum note," Bonn, Germany: German National Research Center for Information Technology, GMD Technical Report 148, 34 (2001).
[11] W. Maass, T. Natschläger, and H. Markram, "Real-time computing without stable states: A new framework for neural computation based on perturbations," Neural Computation 14, 2531 (2002).
[12] D. Verstraeten, B. Schrauwen, M. D'Haene, and D. Stroobandt, "An experimental unification of reservoir computing methods," Neural Networks 20, 391 (2007).
[13] C. Fernando and S. Sojakka, in Advances in Artificial Life (Springer, 2003) pp. 588–597.
[14] H. Hauser, A. J. Ijspeert, R. M. Füchslin, R. Pfeifer, and W. Maass, "Towards a theoretical foundation for morphological computation with compliant bodies," Biological Cybernetics 105, 355 (2011).
[15] K. Vandoorne, W. Dierckx, B. Schrauwen, D. Verstraeten, R. Baets, P. Bienstman, and J. Van Campenhout, "Toward optical signal processing using photonic reservoir computing," Optics Express 16, 11182 (2008).
[16] Y. Paquot, F. Duport, A. Smerieri, J. Dambre, B. Schrauwen, M. Haelterman, and S. Massar, "Optoelectronic reservoir computing," Scientific Reports 2 (2012).
[17] L. Larger, M. C. Soriano, D. Brunner, L. Appeltant, J. M. Gutiérrez, L. Pesquera, C. R. Mirasso, and I. Fischer, "Photonic information processing beyond Turing: an optoelectronic implementation of reservoir computing," Optics Express 20, 3241 (2012).
[18] K. Vandoorne, P. Mechet, T. Van Vaerenbergh, M. Fiers, G. Morthier, D. Verstraeten, B. Schrauwen, J. Dambre, and P. Bienstman, "Experimental demonstration of reservoir computing on a silicon photonics chip," Nature Communications 5 (2014).
[19] K. Vandoorne, J. Dambre, D. Verstraeten, B. Schrauwen, and P. Bienstman, "Parallel reservoir computing using optical amplifiers," IEEE Transactions on Neural Networks 22, 1469 (2011).
[20] M. S. Kulkarni and C. Teuscher, in Nanoscale Architectures (NANOARCH), 2012 IEEE/ACM International Symposium on (IEEE, 2012) pp. 226–232.
[21] J. R. Burger and C. Teuscher, in Nanoscale Architectures (NANOARCH), 2013 IEEE/ACM International Symposium on (IEEE, 2013) pp. 1–6.
[22] H. O. Sillin, R. Aguilera, H.-H. Shieh, A. V. Avizienis, M. Aono, A. Z. Stieg, and J. K. Gimzewski, "A theoretical and experimental study of neuromorphic atomic switch networks for reservoir computing," Nanotechnology 24, 384004 (2013).
[23] J. Bürger, A. Goudarzi, D. Stefanovic, and C. Teuscher, in Proceedings of the 2015 IEEE/ACM International Symposium on Nanoscale Architectures (NANOARCH '15) (IEEE, 2015) pp. 33–38.
[24] J. P. Carbajal, J. Dambre, M. Hermans, and B. Schrauwen, "Memristor models for machine learning," Neural Computation (2015).
[25] D. B. Strukov, G. S. Snider, D. R. Stewart, and R. S. Williams, "The missing memristor found," Nature 453, 80 (2008).
[26] R. Berdan, C. Lim, A. Khiat, C. Papavassiliou, and T. Prodromakis, "A memristor SPICE model accounting for volatile characteristics of practical ReRAM," IEEE Electron Device Letters 35, 135 (2014).
[27] J. Blanc and D. L. Staebler, "Electrocoloration in SrTiO3: Vacancy drift and oxidation-reduction of transition metals," Physical Review B 4, 3548 (1971).
[28] L. O. Chua, "Memristor - the missing circuit element," IEEE Transactions on Circuit Theory 18, 507 (1971).
[29] Z. Biolek, D. Biolek, and V. Biolkova, "SPICE model of memristor with nonlinear dopant drift," Radioengineering 18, 210 (2009).
[30] D. Verstraeten, J. Dambre, X. Dutoit, and B. Schrauwen, in The 2010 International Joint Conference on Neural Networks (IJCNN) (IEEE, 2010) pp. 1–8.
[31] H. Jaeger, Short Term Memory in Echo State Networks (GMD-Forschungszentrum Informationstechnik, 2001).
[32] L. Appeltant, "Reservoir computing based on delay-dynamical systems," Thèse de Doctorat, Vrije Universiteit Brussel / Universitat de les Illes Balears (2012).
[33] H. Jaeger, Tutorial on Training Recurrent Neural Networks, Covering BPPT, RTRL, EKF and the "Echo State Network" Approach (GMD-Forschungszentrum Informationstechnik, 2002).
[34] R. Waser and M. Aono, "Nanoionics-based resistive switching memories," Nature Materials 6, 833 (2007).
[35] Y. Yang, P. Gao, S. Gaba, T. Chang, X. Pan, and W. Lu, "Observation of conducting filament growth in nanoscale resistive memories," Nature Communications 3, 732 (2012).
[36] A. Sawa, "Resistive switching in transition metal oxides," Materials Today 11, 28 (2008).
[37] R. S. Williams and M. D. Pickett, in Memristors and Memristive Systems (Springer, 2014) pp. 93–104.
[38] R. Kozma, R. E. Pino, and G. E. Pazienza, Advances in Neuromorphic Memristor Science and Applications, Vol. 4 (Springer Science & Business Media, 2012).
[39] E. Lehtonen and M. Laiho, in Cellular Nanoscale Networks and Their Applications (CNNA), 2010 12th International Workshop on (IEEE, 2010) pp. 1–4.
[40] D. Sussillo and L. F. Abbott, "Generating coherent patterns of activity from chaotic neural networks," Neuron 63, 544 (2009).
[41] C. Gallicchio and A. Micheli, in European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (2016).

Preface

One of the most interesting and fascinating quests in science, for me, is understanding the working of the human brain and its capacities. This thesis touched on so many interesting subjects (machine learning, neuromorphic computing, physics, etc.) and hence it formed an incredibly satisfying concluding piece to my 5 years of engineering physics education.

First and foremost I would like to thank my supervisor, professor Joni Dambre, for the opportunity she gave me. Also her patience, support and time management skills to schedule me in for our weekly consults near the end of my deadline deserve all my gratitude.

A special thanks goes to professor Adam Stieg and Renato Aguilera, UCLA, for their great cooperation in the process and for giving me the opportunity to work with their simulation model. I really hope their research can benefit from this work.

Also I would like to thank all the people involved in the "US-Belgium Workshop on Memristive Networks" that took place in Ghent this summer, for the unforgettable and informative experience. It really gave me an incredible boost to work on this project.

Next I have to thank my parents for supporting me and giving me all the opportunities in life, and my friends for being there for me in times when I needed a laugh, a break and a beer to put everything back in perspective.

And finally, my best friend and the most special girl in my life, Astrid, for being the best, for dreaming with me, for her support and for believing in me no matter what.

Charles Boutens, January 2017

Permission for consultation

”The author gives permission to make this master dissertation available for consultation and to copy parts of this master dissertation for personal use. In the case of any other use, the copyright terms have to be respected, in particular with regard to the obligation to state expressly the source when quoting results from this master dissertation.”

Charles Boutens, January 2016

Abstract

With Moore's law reaching its end, alternatives to the digital computing paradigm are being investigated. These unconventional approaches range from quantum- and optical computing to promising analogue, neuromorphic implementations. A recent approach towards computation with analogue, physical systems is the emerging field of physical reservoir computing (PRC). By considering the physical system as an excitable, dynamical medium called a reservoir, this approach enables the exploitation of the intrinsic computational power available in all physical systems.

In this work the RC approach towards computation is applied to networks consisting of resistive switches (RS). A densely connected atomic switch network (ASN) and a network built up from TiO2 memristors are used as reservoirs in simulation. Both simulation models rely on volatile extensions of Strukov's widespread current controlled memristor (CCMR) model. The dynamics of the ASN in simulation are thoroughly characterized, and based on these observations two encoding schemes are proposed in order to solve two reservoir benchmark tasks, the memory capacity and the NARMA-10 task. Experiments executed with the reservoirs consisting of TiO2 memristors lead to the observation of the failure of the RC approach in the absence of volatility of the used devices.

By comparing the simulation with the actual dynamics of the ASN, it is concluded that the CCMR model, used to describe the individual atomic switches, fails at capturing some of the fundamental characteristics of the device. By looking at the performance of both the ASN and the TiO2 reservoir, and at similar research found in the literature, it is argued that the CCMR models are unsuited for the RC approach. Nevertheless, the real ASN devices do show promising reservoir properties. A voltage controlled memristor model (VCMR), already used in a memristor-based RC approach, is suggested to continue further research. Also, some interesting new architectures are briefly discussed that could increase the performance of these RS reservoirs.

Index terms – memristor based reservoir computing, neuromorphic computation, physical reservoir computing, memristor models, memristor models for reservoir computing.

List of Figures

1.1 Representation of a FFNN with two hidden layers
1.2 On the left: representation of a RNN with input layer, output layer with possible feedback and hidden neurons forming cyclic paths. On the right: the corresponding unfolded representation of the RNN
1.3 Representation of an ESN consisting of a RNN, the reservoir, with random weights and trained read out. [66]
2.1 Projection of the input into a higher dimensional space can make a classification problem linearly separable [98].
2.2 Delayed feedback reservoir scheme. Along the delay line N states, separated by a distance θ = τ/N from each other, are chosen to represent virtual nodes, with τ the delay time and θ the read out time [5]
3.1 Examples of CR, PSM, and PHL for a current-controlled memristor [92]. See table 3.2 for the definition of the input, output, TIU, TIY, PSM and CR in the case of a CCMR.
3.2 Resistive switches implemented in a cross-bar structure as they are used for memory storage.
3.3 Cross section of the HP TiO2 memristor [48]
3.4 Typical switching dynamics for an anion (TiO2−x) device characterized by a voltage pulse stress (a, top) with variable pulse duration and amplitude. In particular, panel a (bottom) shows 16 curves, that is, 8 each for set (green) and reset (blue), with each curve showing the evolution of the normalized resistance (R, measured at specific bias) for the device that is initially set to the OFF (ON) state and then continuously switched to the ON (OFF) state by voltage pulses with fixed amplitude and exponentially increasing duration [114].
3.5 (a) Optical microphotograph of a single TiO2 memristor. (b) Example of measured volatile characteristics of the device shown in (a); three disruptive pulses are applied with an interpulse timing of 1s, which trigger volatile state transitions; READ pulses of small amplitude (0.5V, non-disruptive) are applied every 20 ms to assert the conductivity of the device. (c) Conceptual model of conduction mechanisms that can render a volatile (metastable), as well as a non-volatile (stable) transition in the device's conductance. [8]
3.6 Schematic representation of the TiO2 memristor [10]
3.7 Block diagram of the SPICE model [10]
3.8 (a) Circuit schematic of the volatile SPICE model, based on the model in [10], extended with two extra cells to account for volatility. (b) Simulated pinched hysteresis loop under stimulus by a voltage sine wave of 1Hz; reduction to a linear resistor for a sine wave of 100Hz. (c) Resistance of the device under stimulus as in (b). (d)-(g) Simulation example of volatile to non-volatile state transitions, shown in (d) to (f) for two identical devices stimulated by identical input stimuli with different interpulse timing: red line 500ms, blue line 1s. (d) Instantaneous resistance of the model, Rmem = v/Imem. (e) Non-volatile resistance levels at which the model will subsequently settle, as shown in (d). (f) Volatile charge response (Y = Ron y + (1 − y) Roff). (g) Input stimulus. [8]
3.9 Volatile behavior of a TiO2 ReRAM cell. Effect of the interpulse timing on the relative change in conductance ∆C/C0. Blue: measured device response; red: modelled device response. (a) An interpulse time of 1s leads only to volatile state transitions. (b) (c) With interpulse times of 600ms - 200ms, the energy barrier is exceeded and apart from a transient behavior, also a non-volatile state transition has occurred [8]
3.10 Schematic representation of the Ag|Ag2S|Ag MIM junction. Under an applied bias, the Ag cations migrate from the anode to the cathode where they are reduced. This leads to the formation of a conducting metallic filament [83]
3.11 a) Initial weak switching regime of the atomic switch network. b) The switching from a) is included and rescaled to show the difference with the hard-switching regime [6]
3.12 Volatile behavior of an Ag2S gap-type atomic switch [15]
3.13 Atomic switch network device. (a) Multi-electrode array of outer platinum electrodes lithographically patterned on a silicon substrate enables electrical characterization and stimulation of the central network. Scale bar = 4 mm. (b) SEM image of an atomic switch network comprised of self-organized silver nanowires electrodeposited on a grid of copper posts. Overlapping junctions of wires form atomic switches when functionalized. Scale bar = 500 µm. [21]
3.14 Ultra-sensitive IR image of a distributed device conductance under external bias at 300K; electrodes are outlined in white. [86]
3.15 ASN response, nonlinear transformations. (a) and (b) Input waveforms shown are a 750mV, 30ms FWHM Gaussian pulse with a 1.25V offset input at electrode 12, and a 10V, 50ms square wave repeated at 10Hz with no offset input at electrode 4. (c) and (d) Each recorded output waveform is plotted with respect to where the recording was physically located on the device. The patterned seed network in (c) is grown on top of the MEA in (d). (e) and (f) Distributed network activity causes voltage signals that are input to the device to be transformed into higher-dimensional output representations. These output voltage representations are simultaneously recorded at each electrode in the 4 × 4 array. Vertical axes are normalized for clarity. [21]
3.16 Connection graph of the ASN grid used in simulation experiments
3.17 Simulation of device activation demonstrating (a) an initial soft switching repeated indefinitely, until (b) a transition in behavior from soft (blue) to hard (red) switching. (c) Hard switching persists indefinitely. This behavior was ubiquitous across all configurations, with discrepancies in the bias amplitude/frequency. Experimental device activation curves are shown as insets for comparison [83]
4.1 Architecture of the memristive SCR. (a) Amplitude dependent memristive switching characteristics for a 10Hz applied sine wave; (b) example of a randomly assembled memristive network. The circles indicate nodes in which memristive devices (links between nodes) connect. The colored nodes represent an example CMOS/memristor interface with blue as input node (In), orange as ground node (0V), and green as differential output nodes (O1, O2). (c) Simple cycle reservoir. Instead of analog neurons, memristive networks provide the input-output mapping of each SCR node. [12]
4.2 Network configurations used. a) Sparsely connected network with some long ranging connections (maximally ranging over 10 neighboring nodes). b) Densely connected network with only close connections (maximally ranging over 2 neighboring nodes). Each grid point serves as a node at which the voltage is read out. The green and red node represent the input/ground node respectively.
4.3 Network's response for the sparse network configuration to an input step at t = 0. a) Transient times for different applied biases. b.1-c.1) Voltage progression at different nodes for 1V bias (b) and 2.9V bias (c). The voltage at the input nodes is depicted by the yellow constant lines. b.2-c.2) Fraction of the filament formation for the different atomic switches. b.2.1)-b.1.1) Zoom in on the initial transient for the 1V applied bias case.
4.4 Filament lengths x ∈ [0, 1] for the sparse network configuration at different times for a step input with height 1V.
4.5 Filament lengths x ∈ [0, 1] for the sparse network configuration at different times for a step input with height 2.9V.
4.6 Filament lengths x ∈ [0, 1] for the dense network configuration at different times for a step input with height 1.5V.
4.7 Filament lengths x ∈ [0, 1] for the dense network configuration at different times for a step input with height 2V.
4.8 ASN's response to different input pulses, for pulses with 1V constant bias, varying pulse heights and pulse widths as given in the insets. The voltage progression at different nodes is plotted, right after the end of the pulse, for a representative time until the system has reached its new steady state.
4.9 δφ = 1 − Ec/Etot, section 2.1.6, as a measure of the network's nonlinearity for input sine waves with fc = 11Hz and a range of different biases and amplitudes
4.10 The ASN's response, driven by an input sine wave with frequency fc = 11Hz with varying biases and amplitudes. (a.1-b.1-c.1-d.1-e.1) Voltage progression at different nodes (including the input node) for time periods when the system has reached its steady state. (a.2-b.2-c.2-d.2-e.2) Filament width x ∈ [0, 1] for these time periods.
4.11 Input encoding schemes used for the memory capacity and NARMA-10 task. The discrete input u(n) ∈ [0, 0.5] is mapped to the continuous time signal u(t) ∈ V1 + V2 [0, 0.5] with a certain width.
4.12 Network architecture used for the TiO2 memristor network. The upper left node is connected to the ground, the bottom right node to the input.
4.13 Network response to an applied sine wave with frequency fz = 1Hz and amplitude 10 µA. a.1) Progression of filament width. a.2) Detailed view on the blue square of a.1). b.1) Current response through the different memristors. b.2) Zoomed in version of the current response.
5.1 Memristors in series. Set of memristors for simple signal processing. Each memristor is fed a constant current mi = µi I0i. [15]

List of Tables

3.1 General Memristors
3.2 VCMR & CCMR
4.1 Used parameters for the ASN simulations
4.2 Used parameters in the TiO2 simulations
4.3 Memory Capacity Encoding Scheme 1
4.4 Memory Capacity Encoding Scheme 2
4.5 NARMA-10 task Encoding Scheme 1
4.6 NARMA-10 task Encoding Scheme 2

Abbreviations

NDR Negative differential resistance
ANN Artificial neural network
MLP Multilayered perceptron
FFNN Feedforward neural network
RNN Recurrent neural network
TDNN Time-delay neural network
BPTT Backpropagation through time
SVM Support vector machine
RC Reservoir computing
ESN Echo state network
LSM Liquid state machine
PRC Physical reservoir computing
ESP Echo state property
FFT Fast Fourier transform
NARMA-10 Nonlinear auto-regressive moving average 10
TDDS Time delay dynamical systems
TIU Time-domain integral of the input
TIY Time-domain integral of the response
PSM Parameter versus state map
CR Constitution relation
PHL Pinched hysteresis loop
ECMR Effort controlled memristor
FCMR Flow controlled memristor
VCMR Voltage controlled memristor
CCMR Current controlled memristor
HP Hewlett Packard
RS Resistive switch
MIM Metal-insulator-metal
STP Short-term plasticity
LTP Long-term plasticity
AS Atomic switch
SPICE Simulation Program with Integrated Circuit Emphasis
ASN Atomic switch network
MEA Multi-electrode array
ELD Electroless deposition
SEM Scanning electron microscope
SCR Simple-cycle-reservoir topology
HHG Higher harmonic generation
NRMSE Normalized root-mean-square error

Contents

Preface
Permission for consultation
Abstract
Abbreviations

1 Background
  1.1 The End Of The Digital Era
  1.2 Unconventional Computing
  1.3 Analogue, Neuromorphic Approaches to Computation
  1.4 Artificial Neural Networks
    1.4.1 Feedforward Neural Networks
    1.4.2 Recurrent Neural Networks
    1.4.3 Reservoir Computing
  1.5 Physical Reservoir Computing

2 Theoretical Background
  2.1 Echo State Networks
    2.1.1 Mathematical Framework of ESNs
    2.1.2 The RC View on Computation
    2.1.3 Reservoir Dynamics
    2.1.4 Reservoir Parameter Tuning
    2.1.5 Training in the ESN Framework
    2.1.6 Measures to characterize the reservoir
  2.2 Physical Reservoirs

3 Memristor
  3.1 The Missing Electrical Component
  3.2 The Memristor
    3.2.1 Characteristics of the Ideal Memristor
    3.2.2 State Variable
    3.2.3 General Memristive Systems
  3.3 Resistive Switches
  3.4 TiO2 Memristor
    3.4.1 Physics
    3.4.2 Simulation Framework
  3.5 Atomic Switch Networks
    3.5.1 Physics
    3.5.2 Simulation Framework
  3.6 Voltage Controlled Memristor Model

4 Memristor Based Reservoir Computing
  4.1 Previous Work in the Field
  4.2 The Memristor Network as Reservoir
  4.3 Reservoir Computing with Atomic Switch Networks
    4.3.1 System Characterization
    4.3.2 Tasks
  4.4 Reservoir Computing with TiO2 Network
    4.4.1 Architecture
    4.4.2 Conclusive Example

5 Conclusions and Future Challenges
  5.1 Conclusions
  5.2 Future Challenges

Chapter 1

Background

1.1 The End Of The Digital Era

In 1965, Gordon Moore – co founder of Intel – stated his famous law pre- dicting the doubling of the number of transistors in integrated circuits every two years. His prediction proved accurate for several decades and has served as a target guideline for the semiconductor industry, which is now aiming for exaflop computing by 2020 [23]. However this scaling down is about to reach its limits due to a number of reasons [31], [50] and now it is expected by most semiconductor industry forecasters, that Moore’s law will end by the year 2025 [53] [105]. However, ever more computational power is required for solving a variety of computationally intense problems that may be essential for further progress of mankind, e.g. long-range weather forecasting, reverse engineering of the human brain, understanding climate change etc [49]. Digital computation has been the standard for over half a century for all our computational tasks. This is mainly due to the fact that digital computing is extremely robust to noise and variability. In the digital frame- work, computation corresponds to bit flips (related to Flops). The higher the amount of bit flips per second, the higher the computational power of the system. Every bit is in essence a transistor switching between two voltage states, ’off’ and ’on’. The corresponding stored energy leads to local heating as it has to be dissipated every clock cycle. The down-scaling of transistors is tackled in such a way as to keep the electric fields in the channel constant,

also known as Dennard's scaling law [22]. However, due to the impossibility of scaling down the voltages beyond limits set by reproducibility and robustness against static and dynamic fluctuations, this scaling law is reaching its limits and, with it, Moore's law is approaching its end. Rising performance demands, driven by the increasing amount of available data in several sectors, and the diminishing growth in computational resources, due to the slowing down of Moore's law, require new solutions to keep resource budgets manageable in the computational industry. One promising solution for this rising problem lies in the field of approximate computing. This new research field is based on the observation that, instead of performing exact digital computation, huge efficiency and power gains can be achieved by allowing selective approximations for the many applications that do not require the high precision that is used today [70].

1.2 Unconventional Computing

With the continuous demand for more computational power and the slowing down of Moore's law, alternatives to the silicon-based computational methods are being developed and researchers are reconsidering the field of unconventional, or alternative, computing, which stayed on the sidelines of its Turing-machine counterparts for many years. Conventional digital Turing computation has proven to be incredibly successful; however, it encompasses only a small subset of all computational possibilities. The digital computing approach was only one of many computing models introduced in the early days of computer science [101] [94]. Unconventional approaches to computation are still in their infancy, but promise equally revolutionary results as they mature. The term covers a broad spectrum of computational paradigms ranging from quantum-, optical-, analogue- and chemical computing to reaction-diffusion systems, neuromorphic computing and many more [49]. Where the Von Neumann architecture aims at sequential processing and programmability, most non-Von Neumann models distribute the computation amongst several parallel processing units. These unconventional approaches can be used in combination with non-Von Neumann computational

paradigms to reach incredibly powerful parallel computation. Here, instead of combining conventional, digital processors in parallel, the intrinsic, parallel computational power found in, for example, most analogue systems is exploited, which could give rise to huge increases in speed and efficiency. Solving exponentially hard problems by designing application-specific devices forms another field that can really benefit from unconventional computing approaches. Where conventional digital methods only exploit two states of a transistor, 'on' and 'off', in analogue terms the same transistor has an infinite number of states. Analogue computing directly exploits the intrinsic dynamics of a physical system in its response to external stimuli, which requires the selection of a physical system with dynamics matching the computational properties of the task at hand. An example where this analogue approach clearly outperforms its digital competition is the solving of sets of differential equations presented in [1]. In a digital approach, time is discretized and the full set of equations is solved for each timestep. Additionally, each transistor only represents one of two values, resulting in the need for circuits consisting of millions of transistors and digital clock cycles. In an analogue circuit, voltages and currents encode the variables in the set of differential equations. As the voltages and currents across the circuit need to balance out, varying one will change the other, and hence changing the input over time results in a complete solution to the full set of equations. Finally, in situations where large-scale CMOS solutions are hard to implement, embedded computation solutions based on these unconventional computing paradigms can present a solution. For example, flight-control systems in airplanes are analogue in nature; however, in modern implementations a digital simulation of the control computer is used. Another example is found in morphological computing in robotics [32], where the robot's morphology is designed in such a way as to incorporate a large part of the control complexity. This idea is inspired by the way animal bodies have evolved to allow for energy-efficient movement and simple periodic central pattern generator control. In the long run, unconventional computing schemes will try to compete with CMOS-driven digital computing systems in both speed and efficiency.

But for now, the main objective is limited to the exploration of computational capacities and applications of these unconventional computing systems.

1.3 Analogue, Neuromorphic Approaches to Computation

One of the most promising fields of unconventional approaches to computation might be the brain-inspired field of analogue, neuromorphic computing. The concept of neuromorphic computing was developed in the late 1980s [67], and aimed at mimicking the neuro-biological architecture of the brain. Nowadays, the term 'neuromorphic' is used to describe both hardware (analogue, digital and mixed) [64] and software implementations of models of neural systems. The ultimate vision of the analogue realization of these systems is to use self-organised neural networks consisting of nano-scale components with highly variable properties and erroneous behaviour, which are inevitable at the nano-scale. Here the intrinsic computational properties of these individual devices and the parallel architecture of the network are exploited to reproduce the computational capacities of the brain. These neuromorphic approaches require highly connected complex neural networks with adaptive synapse-like connections. Analogue VLSI has been the main focus of efforts to build neuromorphic computers in semiconductor hardware [68] ever since the 1980s. Recently [86] [9] [6] interest has arisen in the functionalities of locally connected switching networks. Here the network consists of switches and memory components such as molecular transistors, negative differential resistances (NDR) [87], memristors [9] or atomic switches [21].

1.4 Artificial Neural Networks

Artificial neural networks (ANNs) are a family of machine learning models inspired by the central nervous system, especially the brain. Neurons, the computing units of the nervous system, communicate through synapses, which form the connections between them. In 1957, the psychologist F. Rosenblatt introduced the perceptron as a simplified mathematical model of

a neuron in the brain [79]. The neuron takes the weighted sum of a set of inputs as its activation and subsequently thresholds this activation to output the value of 1 if the activation is large enough or 0 otherwise, much like the firing of a neuron in the brain. He also proposed a way to make these artificial neurons learn a specific task, by using examples of the desired output and accordingly adjusting the input weights to the neuron's activation. Combining several of these simple building blocks into a larger network with a feedforward structure leads to what is called a multilayered perceptron (MLP), also known as a feedforward neural network.
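To make this concrete, the following is a minimal Python sketch of the perceptron's thresholded weighted sum and of Rosenblatt's weight-update rule; the learning rate, number of epochs and the AND task used as example are purely illustrative choices, not taken from the text.

```python
import numpy as np

def perceptron_output(w, b, x):
    """Weighted sum of the inputs followed by a hard threshold."""
    return 1 if np.dot(w, x) + b > 0 else 0

def train_perceptron(samples, targets, epochs=20, lr=0.1):
    """Rosenblatt's rule: nudge the weights according to the error on each example."""
    w, b = np.zeros(samples.shape[1]), 0.0
    for _ in range(epochs):
        for x, d in zip(samples, targets):
            error = d - perceptron_output(w, b, x)   # +1, 0 or -1
            w += lr * error * x
            b += lr * error
    return w, b

# Learning the (linearly separable) AND function from examples
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
targets = np.array([0, 0, 0, 1])
w, b = train_perceptron(X, targets)
print([perceptron_output(w, b, x) for x in X])       # [0, 0, 0, 1]
```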

1.4.1 Feedforward Neural Networks

There are two types of network topologies for ANNs: feedforward neural networks (FFNNs) and recurrent neural networks (RNNs). In the former architecture, Figure 1.1, the input signal is 'piped' through the network from the input to the output through 'hidden layers' of neurons from the left to the right, hence the name feedforward network. The network only interacts with the outside world through its input layer, where the (possibly multidimensional) input is received, and its output layer, where the outputs of the network are presented to the outside world. In between these two layers, there is a series of hidden layers, consisting of internal units and connection weights. There are only connections between neurons of subsequent layers. At each neuron, the outputs of the previous ones are combined as a weighted sum, i.e. the activation of the neuron. This activation is subsequently fed through a nonlinearity, most commonly a tanh or sigmoid function, to form the neuron's output which, in turn, serves as the input for the neurons in the following layer until the output layer is reached. In order for the network to produce the desired output, the weights connecting the neurons are trained for the task. Introduced in the 1970s [108] [61] and made famous in 1986 by G. Hinton in [80], backpropagation is the most commonly used training algorithm for adjusting the connection weights in ANNs. It relies on the well-known optimization method known as gradient descent. Here the parameters of a function are adjusted in order to minimize a cost function. By iteratively taking small steps in the direction of the negative

gradient of the cost function w.r.t. the parameters, a solution is found that minimizes this cost function. This solution could be sub-optimal if the algorithm gets stuck in local minima. In order to train these FFNNs, the gradient descent algorithm is also used to optimize the connection weights of the network. The bottleneck of this procedure lies in the calculation of the gradients of the cost function w.r.t. the internal weights. This is where the backpropagation algorithm excelled over the preceding training methods and made training neural networks manageable for the first time in the 1980s, after some time out of the spotlight. It starts by initializing all the weights (either randomly or with specialized methods [29]), after which a two-phase cycle is repeated. In the first phase, the input is passed forward through the network with the current weights until the output is reached. There the error between the actual output and the desired output is calculated. In the second phase, the gradients are calculated by applying the chain rule and propagating the error backwards through the network, hence the name backpropagation. The weights are updated accordingly and this whole two-phase cycle is repeated. As the network is trained, these weight updates happen in such a way that the neurons in each subsequent layer start to form a more abstract representation of the input. The input is eventually expanded into a high-dimensional, nonlinear feature space, learned for the specific task. In the end this leads to a static arbitrary nonlinear input-output mapping. Due to the feedforward architecture of these networks, they are incapable of processing temporal information. Intrinsically embedded in the design of these networks lies the assumption that all input examples are independent of each other. However, for a lot of tasks (time series prediction [28], speech recognition [30], pattern recognition [62] etc.) this is not the case. In 1989 one approach was suggested to tackle the problem of temporal data in the form of time-delay neural networks (TDNNs) [104]. Here the feedforward architecture remains, but now each neuron processes only a subset of the input and has several sets of weights for different delays of the input data. These weight-sharing and time-window concepts, introduced with the TDNN, served as inspiration for LeCun's first convolutional neural network [55]: a feedforward network architecture specifically designed to exploit the intrinsic structure in images. However, in the case of temporal data, the feedforward

topology is usually replaced by a designated ANN architecture that captures the temporal characteristics of the task. These types of networks are known under the name of Recurrent Neural Networks.

Figure 1.1: Representation of a FFNN with two hidden layers.
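As an illustration of the forward pass and the gradient-descent update described in this section, the sketch below trains a one-hidden-layer network with a tanh nonlinearity on a toy regression problem using plain numpy; the layer sizes, learning rate, target function and iteration count are arbitrary assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny network: 1 input -> 8 tanh hidden units -> 1 linear output
W1, b1 = rng.normal(scale=0.5, size=(8, 1)), np.zeros((8, 1))
W2, b2 = rng.normal(scale=0.5, size=(1, 8)), np.zeros((1, 1))

x = np.linspace(-1, 1, 50).reshape(1, -1)    # training inputs, one per column
d = np.sin(3 * x)                            # desired outputs
lr = 0.1

for step in range(2000):
    # Forward pass: weighted sums fed through the nonlinearity
    h = np.tanh(W1 @ x + b1)
    y = W2 @ h + b2

    # Backward pass: apply the chain rule to the squared error, layer by layer
    e = y - d
    dW2 = e @ h.T / x.shape[1]
    db2 = e.mean(axis=1, keepdims=True)
    da1 = (W2.T @ e) * (1 - h ** 2)          # derivative of tanh
    dW1 = da1 @ x.T / x.shape[1]
    db1 = da1.mean(axis=1, keepdims=True)

    # Gradient descent: small step in the direction of the negative gradient
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1

print("final mean squared error:", float(((W2 @ np.tanh(W1 @ x + b1) + b2 - d) ** 2).mean()))
```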

1.4.2 Recurrent Neural Networks

Just as with the feedforward architecture, a recurrent neural network still consists of neurons that are connected by weights. Each neuron again takes the weighted sum of the outputs of the neurons connected to it as its activation, and performs a nonlinear transformation. In the case of RNNs the feedforward structure is dropped and neurons are connected in such a way as to form cyclic paths, see Figure 1.2 left. Hence the name recurrent neural networks. Due to the feedback in the network topology, the system shows internal temporal dynamics, i.e. it has a memory. In contrast to the static nonlinear input-output mapping seen with feedforward networks, the recurrence in RNNs enables these networks to perform dynamical, nonlinear computations of the input signal, making them suitable for processing temporal input sequences. Again these networks are trained for the task in an iterative way. Several algorithms (Backpropagation Through Time, Real-Time Recurrent Learning and Extended Kalman Filtering Based Techniques) have been developed

for this purpose [44].

Figure 1.2: On the left: representation of a RNN with input layer, output layer with possible feedback and hidden neurons forming cyclic paths. On the right: the corresponding unfolded representation of the RNN.

Backpropagation through time (BPTT), introduced in 1990 [109], is the extension of the regular backpropagation algorithm, as seen in Section 1.4.1, for training the weights in recurrent neural networks. First the RNN is "unfolded" through time by stacking identical copies of the network on top of each other and treating each loop as an input to the subsequent network, Figure 1.2 on the right. In this way a feedforward structure is again obtained and the regular backpropagation algorithm can be applied. In the forward pass, the total input-output training sequence is fed to the network. At each timestep the output is calculated and compared to the desired output at that time. Next, starting from the final training timestep, the errors are propagated backwards through the feedforward representation, and thus backwards through time. The weights are then updated correspondingly, after which this whole process is repeated. If the network is trained for a total time period T, given T subsequent input-output examples, then the unfolded recurrent neural network corresponds to a FFNN with T hidden layers. A problem that used to impede the successful training of "deep" neural networks (i.e. FFNN with multiple

hidden layers) is known as the vanishing gradient problem [38]. As the name suggests, it is related to the decrease of the derivatives for earlier layers encountered in training deep neural networks. This is caused by the sequence of multiplications of derivatives smaller than 1 needed to obtain the gradients in the first layers. Due to these small derivatives, only small changes are made to these weights during the update step and unfeasibly many training iterations are required to get to reasonable results. As training of RNNs relies on the same principles as regular backpropagation, and the unfolded network corresponds to a "deep" FFNN of T layers, the vanishing gradient problem also resulted in the delay of the success of these RNNs [7]. They did outperform static feedforward networks on temporal tasks, but were difficult to train optimally. The trained networks were able to account for short-term dependencies, but failed to do so for the long-term characteristics of the task, as the parameters settled in suboptimal solutions. In 1997 [39] a solution was found in long short-term memory networks (a modification of the general RNN architecture) to train these RNNs. Still, there was a general disbelief in the potential of neural networks, as new machine learning algorithms such as the support vector machine (SVM) were outperforming them [56]. During the last decade, improvements both in the understanding of training these artificial neural networks and in pure computation power have led to the huge successes booked in the field of deep learning in recent years. First, in 2006, the problem of vanishing gradients was solved by using the right weight initialization instead of random weights [36] [29]. Next, the increase of computational power and the use of GPUs over CPUs (which led to a drastic shrinkage in training time [76]), combined with large amounts of training data, proved that deep neural networks could outperform other machine learning techniques. Further improvements were made by reconsidering different nonlinearities instead of the standard tanh and sigmoid functions [29] and by the introduction of the concept of "dropout" [37]. However, in the early 2000s, before the breakthrough of deep learning, an entirely different approach to designing and training RNNs, now known as the machine learning branch of reservoir computing (RC), was introduced, as will be discussed next.

1.4.3 Reservoir Computing

Reservoir Computing is an approach to design, train and analyze RNNs. There are three fundamental principles of RC, distinguishing it from other views on neural networks. First of all, large, random RNNs are used in which the weights connecting the neurons are not trained, but randomly generated. This random RNN is considered as an excitable medium, a reservoir. When driven by an external stimulus, each unit in the reservoir creates its own nonlinear transform of the input. The state of the reservoir as a whole, which consists of all the individual transformations of the input (i.e. the states of the individual neurons), can be seen as an expansion of the input in a 'random' feature space. This randomness comes from the randomly initialized weights forming the connections between the units. Secondly, the output signals are computed by taking a weighted sum of the system's responses, the states of the neurons. In other words, different features from the feature space are combined to form the output. Finally, these output weights are trained in a supervised way, by linear regression for example, in order to approximate the desired output as well as possible. In essence, the network is used as a black-box computational tool, where its intrinsic random dynamics are exploited to perform temporal tasks. Two computing paradigms, based on these fundamental principles of RC, are echo state networks (ESNs) introduced by Herbert Jaeger in [43] and Liquid State Machines (LSMs) by Wolfgang Maass [65]. Although both concepts are inspired by the functionality of the brain, ESNs are designed specifically as a machine learning tool while Maass' LSMs have the intention to model and mimic the behavior of the human brain as closely as possible. They were introduced separately and from these two different viewpoints. Later (2007) their resemblance was noticed by D. Verstraeten and, together with a third methodology to work with untrained RNNs, the Backpropagation Decorrelation learning rule [85], they were unified in [100]. Although this completely new approach of handling RNNs led to instantaneous state-of-the-art results in several temporal tasks such as (chaotic) time-series prediction [45], financial forecasting and even speech recognition [84] [93], the trained versions of these RNNs have caught up and eventually

surpassed the ESN framework. Hence RC, as a competing force with trained RNNs, has steadily lost ground, being outperformed by the latter on nearly every task. Nevertheless RC has found a new promising niche in the field of analogue computing, where its view upon computation is used in order to exploit the intrinsic computing power of physical systems.

Figure 1.3: Representation of an ESN consisting of a RNN, the reservoir, with random weights and a trained readout [66].

1.5 Physical Reservoir Computing

As was seen in Section 1.2, the analogue computing paradigm relies on the exploitation of the intrinsic dynamics of physical systems to solve computational problems. The state of these physical systems can take up any continuous value, contrary to the set of discrete values in the digital framework. Hence processes cannot be reliably repeated with exact equivalence in case of analogue computing. Also, in the fabrication of these analogue computation blocks, device variability is inevitable and thus a computational framework aiming at using these systems should account for both inter- and intra-device variability.

This is where the RC approach towards computing with RNNs comes into play. The concept relies on two key points. Firstly, the detailed nature of the reservoir is unimportant, only its overall dynamics play a role. Secondly, RC allows for a simple training approach where only the readout layer is trained. This flexibility enables the RC approach to be applied to a large variety of dynamical systems for computation. Applying these same core concepts of RC to physical systems leads to what is called Physical Reservoir Computing (PRC). PRC presents the right framework and tools to exploit the analogue computational power, intrinsic to physical dynamical systems, in a robust way: i.e. without the need to control the randomness that lies in the different parameters of the physical system. The physical system is considered as a reservoir that maps the input via its own intrinsic dynamics onto a high dimensional feature space. These features are read out by measuring the state of the system and can subsequently be combined in a trained way to produce the desired output. There have been successful efforts to translate the backpropagation algorithm to physical systems, in the fields of photonics and acoustics [34]. Nevertheless, due to its simplicity, flexibility and robustness, the RC approach has become one of the main paradigms to perform computation with analogue systems. PRC has been demonstrated in various physical systems, both experimentally and in simulations. In [24], as a proof of concept, a water bucket is used as reservoir. Here, the water is disturbed by speech signals and the wrinkle-patterns on the surface are used to represent the state of the system. These dynamical input-driven patterns are read out by a camera and used as features for a simple perceptron in order to solve the XOR problem and undertake speech recognition. An example of PRC in morphological computation can be found in [32], where a simple spring-mass system emulates the complex body morphology and is used as a computational block to facilitate body control. Promising photonics-based hardware implementations are described in [96] [74] [54] [97] [95]. Although these photonic realizations are hard to integrate with current CMOS hardware, they form a particularly attractive alternative for applications where the information is already in the optical domain, e.g. in telecom and image processing. State-of-the-art performance

is obtained on a variety of tasks, and the high bandwidth, low power consumption and inherent parallelism characteristic of light are fully exploited for these computational purposes. A first proof of concept of PRC in analogue, neuromorphic systems is presented in [82], using an analogue VLSI chip. Furthermore, the RC approach is also applied to networks consisting of nano-scale switching elements based on memristive junctions, e.g. [52] [83] [12]. The nano-scale nature of these devices comes with a huge amount of inevitable variability, hence the RC concepts form a promising approach in developing schemes for computing with these unreliable components. Electronically driven and easily scalable, these neuromorphic networks can be embedded in a classical digital CMOS environment to form powerful parallel computational units. The research done in this part of analogue computing shows the promising future that lies in exploiting the intrinsic complex dynamical properties of physical systems in situations where the gains in robustness and efficiency outweigh the induced errors compared to digital alternatives. However, most of the technologies presented here are still in an embryonic stage and further exploration of the computational capacities and applications of these devices is necessary.

Chapter 2

Theoretical Background

Reservoir Computing was introduced in Section 1.4.3 as an alternative way of training RNNs. Two adaptations of these general concepts emerged, in the form of H. Jaeger's Echo State Networks and the more biologically inspired Liquid State Machines of Wolfgang Maass. In this chapter the mathematical framework for ESNs is presented. The different reservoir parameters and their influence on the dynamics of the reservoir will be discussed. In Section 2.1.6 measures to characterize the computational performance of the ESNs will be introduced. Finally this newly acquired knowledge will be extrapolated to physical reservoirs in Section 2.2.

2.1 Echo State Networks

2.1.1 Mathematical Framework of ESNs

The ESN model is introduced in discrete time, as was done by H. Jaeger in [43], where only discrete timesteps n = 1, 2, 3, ... are considered. Here the framework for the case of simple linear regression on the network's state will be presented. As can be seen in Figure 1.3, the total network has the following characteristics: a K-dimensional input layer, L output units and an internal recurrent network consisting of N internal units. The input, respectively the output, at timestep n is denoted by u(n) = (u_1(n), ..., u_K(n)) and y(n) = (y_1(n), ..., y_L(n)). The activations of the internal neurons are represented by x(n) = (x_1(n), ..., x_N(n)). Four weight matrices contain all the

connection weights: an N × K matrix W^in = (w^in_ij) for the weights connecting the input neurons to the internal units, an N × N matrix W = (w_ij) interconnecting the internal units, an L × (K + N + L) matrix W^out = (w^out_ij) connecting both the input and the internal units to the output as well as the output nodes among each other, and finally an N × L matrix W^back = (w^back_ij) representing the feedback connections from the output into the internal neurons. Although these feedback connections are often redundant, sometimes they prove necessary to obtain good results. The equations governing the update step of the internal units and the output are given by:

x(n + 1) = f(W^in u(n + 1) + W x(n) + W^back y(n)),    (2.1)

y(n + 1) = f^out(W^out (u(n + 1), x(n + 1), y(n))),    (2.2)

where f and f^out are applied elementwise. These are the nonlinearities of the internal and output neurons respectively, usually sigmoid functions. An important aspect is that the internal connection weight matrix W isn't restricted in any way (i.e. no particular feedforward layered structure is prescribed); however, the aim is to create recurrent pathways. In the ESN approach, the weight matrices W, W^in and W^back are randomly initialized, together with the initial state x(0) of the internal nodes at n = 0. Combined with the above update equations, this describes a dynamical system (a nonlinear system in case of a nonlinear transfer function) that maps a certain input u(n) to a corresponding output y(n) by taking a linear combination, denoted by W^out, of the internal states (neurons) x(n) at each timestep n. Contrary to the idea of BPTT where all the internal weights are learned, the system's dynamics in the ESN approach aren't trained for a specific task, as the weights are initialized at random. The training phase only addresses the readout layer and determines the optimal values for W^out. Different training techniques will be discussed in Section 2.1.5.
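A minimal discrete-time sketch of update equation 2.1, with a purely linear readout and without the output feedback term; the reservoir size, weight scalings and random seed are arbitrary illustrative choices rather than values used elsewhere in this work.

```python
import numpy as np

rng = np.random.default_rng(42)

N, K = 100, 1                                # internal units and input dimension
W_in = rng.uniform(-0.5, 0.5, (N, K))
W = rng.uniform(-0.5, 0.5, (N, N))
W *= 0.9 / max(abs(np.linalg.eigvals(W)))    # rescale W to a spectral radius of 0.9

def run_reservoir(u):
    """Drive the reservoir with the input sequence u and collect the states x(n)."""
    x = np.zeros(N)
    states = []
    for u_n in u:
        # Equation (2.1) without the output feedback term
        x = np.tanh(W_in @ np.atleast_1d(u_n) + W @ x)
        states.append(x.copy())
    return np.array(states)                  # shape (timesteps, N)

# The internal weights stay fixed; only the linear readout y(n) = W_out x(n)
# is trained later (see Section 2.1.5).
u = rng.uniform(0, 0.5, 1000)
X = run_reservoir(u)
```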

2.1.2 The RC View on Computation

The whole RC approach to designing and training RNNs relies on a conceptual and computational distinction that is made between a dynamic reservoir (a randomly connected RNN that serves as a nonlinear temporal expansion function) and a static (usually linear) readout that is trained to produce the desired output. This separation can be understood starting from the two different purposes which the reservoir and the readout serve. The goal of the reservoir is to expand the input history into a high dimensional reservoir state space [43] [63]. On the other hand, the readout is trained to tap into this state space and combine the high dimensional input representations in such a way as to form the desired output. This is a familiar idea that is common with Kernel Methods (e.g. SVM and radial basis function methods), where the input is expanded into a high dimensional feature space without any added computational effort. On these transformed features any standard machine learning algorithm can be applied to perform the task. If these 'simple' methods were to be applied on the original input data, they wouldn't be powerful enough to perform the desired mapping. This concept can be easily visualized with the following example of the XOR task, as presented in Figure 2.1. The XOR problem is an example of a nonlinear classification task. In the original 2D input space, the data is not linearly separable. However, by projecting the input data into a three-dimensional space using a well-chosen transformation, the data becomes linearly separable. Hence, in this new higher dimensional feature space, the task can be solved by a linear classification model. The real power hidden in these Kernel methods lies in a smart mathematical detour in the way this projection happens, where the explicit computation of these features is avoided. This follows from Mercer's theorem (a continuous symmetric positive semi-definite kernel K(x, z) can always be written as an inner product in a high dimensional feature space) and is known as the 'Kernel trick' [33]. Inner products between data samples are often used in machine learning methods to represent some sort of similarity measure and hence as features in the computation. By making use of the 'Kernel trick' these inner products can be replaced by a (well-chosen) kernel function without the explicit

computation of the high-dimensional mapping from the input space to the corresponding feature space. In this regard, even infinite-dimensional feature spaces can be attained without any additional cost in computation. In reservoir computing, the reservoir serves as a 'temporal kernel' that expands the input, due to its intrinsic dynamics, into a higher dimensional feature space. The mapping of the input into the reservoir state space is temporal in nature as the reservoir is in essence a dynamic system. Another important distinction with Kernel Methods is the fact that now the kernel mapping has to be computed explicitly. In the ESN approach this is done by solving the RNN's dynamics using the update equations described in Section 2.1.1. However, in a more general view of RC this enables the use of, e.g., physical systems as reservoirs. Training the readout layer is a relatively straightforward non-temporal task. The crucial part in the RC approach, on the other hand, is setting up the reservoir in such a way that its state observations serve as "good" temporal kernels, i.e. the states of the reservoir contain the necessary features to perform the task at hand. This is the bottleneck of the RC approach and is still ill-understood in many aspects.

Figure 2.1: Projection of the input into a higher dimensional space can make a classification problem linearly separable [98].
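The XOR illustration of Figure 2.1 can be reproduced in a few lines: in the original two dimensions no single linear threshold separates the classes, but after a hand-picked (purely illustrative) expansion (x1, x2) → (x1, x2, x1·x2) one linear threshold suffices.

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
labels = np.array([0, 1, 1, 0])                  # XOR targets

# Hand-picked nonlinear expansion into a 3D feature space: (x1, x2) -> (x1, x2, x1*x2)
phi = np.column_stack([X, X[:, 0] * X[:, 1]])

# In this space a single linear threshold separates the classes
w, threshold = np.array([1.0, 1.0, -2.0]), 0.5
predictions = (phi @ w > threshold).astype(int)
print(predictions, (predictions == labels).all())  # [0 1 1 0] True
```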

2.1.3 Reservoir Dynamics

One of the key requirements for RC that must be fulfilled by the system is that of fading memory, as introduced by Maass in [65], closely related to the echo state property (ESP) in the ESN framework. See [44] for a mathematical definition. Fading memory is present when the state of the reservoir depends on the input history in a decaying way, meaning that excitations far in the past have no effect on the current state. In this sense the response of the system to the input is localized in time, hence it provides the system with a notion of memory [35]. Moreover, the influence of the initial conditions of the system is also reduced if the system exhibits fading memory. Equivalent conclusions can be drawn starting from the ESP in the ESN approach. The ESP makes sure that the state of the reservoir only depends on the input history up until a certain point, not on its initial state:

x_i(n) = e_i(u(n), u(n − 1), ...),    (2.3)

where x_i is the state of an individual neuron and e_i is called an echo function. In the ESN approach, the total state of the network is interpreted as a combination of echoes of the input. If the network has been run for a sufficiently long time, the history of the input uniquely determines the current network state. Next consider a nonlinear one-dimensional dynamical system in discrete time, governed by the following update equation:

d(n) = e(u(n), u(n − 1), u(n − 2), ...),

with e a (possibly nonlinear) function of the previous inputs. The most general description of such a system is considered, without limiting the dependence of d(n) on the number of previous inputs, i.e. the memory of the system. The task now is to model this system by approximating the function e, using the reservoir computing approach. An ESN consisting of N internal neurons, whose individual states are represented by x_i, is considered. The network is driven by the input u and the state of the network x is updated

according to the update equation 2.1 presented in Section 2.1.1. Next, the static readout is trained. For the sake of notation, linear output units are considered while output-output and direct input-output connections are omitted in equation 2.2. Training the linear readout weights then results in:

d(n) = e(u(n), u(n − 1), u(n − 2), ...) ≈ y(n) = W^out x(n),

where y represents the trained network output. Explicitly writing out the matrix multiplication gives:

y(n) = Σ_{i=1..N} w^out_i x_i(n) = Σ_{i=1..N} w^out_i e_i(u(n), u(n − 1), ...),

where in the last step the ESP, equation 2.3, is used to rewrite the individual states x_i as echo functions. From this it can be seen how the approximation y can be interpreted as a linear combination of echo functions e_i. Also, the arguments of the desired function e and the arguments of e_i are identical in nature: both are collections of previous inputs. This transparent relation between the reservoir's output and the desired output relies directly on the interpretation of network states as echo states. Without the echo state property, one could neither mathematically understand the relationship between the actual network output and the desired output, nor would the RC approach work. The previous 'example' also shows that in order to achieve good approximations, the echo state functions e_i ought to provide a "rich" set of

dynamics to combine from. However, some subtleties are involved in this definition of 'richness'. As the input causes the state of the reservoir to change in a dynamical, nonlinear way, these states can be collected at every timestep into X, the state matrix. The numerical rank of the state matrix X is denoted as r, which can never exceed N, the number of internal units of the reservoir. In a geometrical sense, the responses gathered in the state matrix span an r-dimensional subspace of all possible features of the input. For a specific task, these features are combined in a weighted manner in order to obtain the desired output. Hence it would be expected that the higher the rank, the more likely it is that the features needed for the task are present in this subspace. However, if the system's response is chaotic or random, this would lead to a high-rank state matrix that doesn't generalize well to unseen data. The responses have to be sufficiently dependent on the input history in order to produce decent features, which is not the case in a chaotic regime. This is not represented by the rank of X but rather by its range (the linear span of its singular vectors) [15]. It has been stated, both in the LSM [57] and the ESN [58] framework, that network performance increases the closer it operates to the "edge of chaos". In this operating regime both temporal and spatial correlations of the reservoir diverge, leading to high network memory and 'richness'. The more dynamically the system responds to the input, the more varied these responses will be and the more likely it becomes that the right features for the task are generated by the reservoir, linking back to its role as a temporal kernel. However, if the system's activity becomes too dynamic, i.e. it behaves chaotically, all the necessary information about the input will be swallowed by the intrinsic, wild and random dynamics of the system. Finally, it is important to note that the performance of the reservoir deteriorates fast with increasing input dimension. This is caused by the fact that, even though the network's behavior is only slightly nonlinear, all these input dimensions will get mixed up by the network's dynamics and form many (nonlinear) combinations. So a lot of exotic, but mostly useless, input transformations will be produced, while most of the interesting signals and memory needed for the task are washed out.

2.1.4 Reservoir Parameter Tuning

The RC approach, although beautiful in its simplicity and low learning complexity, still has its bottleneck in tuning the different reservoir hyper-parameters in order to push the dynamics of the system towards an interesting regime. This is mainly due to the fact that a principled approach for reservoir design is still missing. Most of the time, due to the easy and fast learning phase, a grid search is done over the parameter space. These hyper-parameters can be divided into several categories, ranging from the network topology (number of internal nodes, sparse or densely connected networks, etc.) to the scaling parameters (the weights of the randomly initialized connection, input and potentially feedback matrices) [99]. Here the influence of the different hyper-parameters of the reservoir on its dynamic behavior will be discussed, as well as how to tune these for optimal performance given the intrinsic challenges related to the task. With echo state networks, one of the most important network parameters that govern the dynamics of the reservoir is the spectral radius λ_max (the supremum among the absolute values of the eigenvalues) of the reservoir's connection weight matrix W. The correct choice of this parameter is crucial for the reservoir to be suited for the task at hand, as it is intimately related to the intrinsic timescale of the reservoir's dynamics. Usually a spectral radius with a value between zero and one is chosen. A small value for the spectral radius leads to fast dynamics of the reservoir, whereas larger values (slightly less than 1) give rise to slower dynamics and longer memory as the input signal 'echoes' longer through the network. The timescale of the network should be adapted to match the intrinsic timescale of the task in order to produce good results. The reason the spectral radius is (often) chosen smaller than 1 is that in [43] it is stated that for an untrained recurrent network, with tanh as nonlinearity function and a spectral radius λ_max > 1, the ESP is not guaranteed w.r.t. any input/output interval containing the zero input/output (0, 0) (a sufficient condition for the non-existence of echo states). However, in practice the ESP is consistently satisfied in the case of a spectral radius smaller than 1. As is often the case, here as well the devil is in the details. The strictness of the non-existence of

echo states for λ_max > 1 is only valid for zero input/output, as the spectral radius is a static measure only dependent on the weight matrix, not on the actual input. Hence, as soon as the network is excited by an external input, the 'effective' spectral radius of the network (taking the actual input into account) decreases due to the fact that the activations of the neurons move along the nonlinearity. As the derivative of the tanh is maximal at zero and decreases for all other values, this influences the dynamics of the reservoir and also the 'effective' spectral radius. This means that for sufficiently large inputs, the ESP can still hold for spectral radii larger than 1. Input scaling is also key in tuning the reservoir's behavior to suit a given task. The input weights W^in feed the input to the reservoir nodes and hence help to determine the activation of the nonlinearity. Large values for W^in drive the activations more towards the nonlinear edges of the tanh, leading to more nonlinear system dynamics, whereas with small input scaling the system resides more in a linear regime. However, choosing these W^in values too large leads to saturation of the internal nodes, as the activation constantly sits at the edges of the tanh function. With symmetric inputs, adding an extra bias term can help to break the symmetric response of the reservoir. Another parameter to consider is the connectivity or sparseness of the network, related to the number of zeros in the connection matrix W. Usually a rather sparse network is chosen in order to make the internal nodes inside the network less dependent and create a richer network response. The number of internal nodes, or network size, is the final parameter contributing to the reservoir's performance. As a general rule one uses the largest network one can afford computationally, as more internal states increase the chance that useful features are present.
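In practice these scaling parameters are usually set by construction rather than measured afterwards; the sketch below shows one common recipe for generating a sparse random connection matrix with a prescribed spectral radius together with a scaled input matrix. All numerical values are arbitrary examples.

```python
import numpy as np

def make_reservoir(N=200, K=1, spectral_radius=0.95, input_scaling=0.5,
                   sparsity=0.9, seed=0):
    """Random ESN weight matrices with a prescribed spectral radius and sparsity."""
    rng = np.random.default_rng(seed)

    W = rng.uniform(-1, 1, (N, N))
    W[rng.random((N, N)) < sparsity] = 0.0       # keep only a fraction of the connections
    rho = max(abs(np.linalg.eigvals(W)))         # current spectral radius
    W *= spectral_radius / rho                   # rescale to the desired value

    W_in = input_scaling * rng.uniform(-1, 1, (N, K))
    return W, W_in

W, W_in = make_reservoir()
print("spectral radius:", max(abs(np.linalg.eigvals(W))))   # ~0.95
```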

2.1.5 Training in the ESN Framework

In Section 2.1.1, the ESN framework was introduced with a simple linear regression readout of the network's states. Here, building further on the view on computation described in Section 2.1.2, a more general approach towards training will be described. The whole RC procedure basically boils down to the following: a recurrent

network is initialized randomly, and considered as a dynamical system that is fixed, i.e. its internal dynamics are not tweaked for a specific task. Next, the system is driven by the input. After the initial transients have died out, the internal states of the network are considered at each timestep and combined to form the desired output. It is only in this last step that training occurs, in order to match the mapping from the internal states to the desired output. Both offline (or batch) and online training methods can be applied for the learning phase in RC. In offline learning, the model is trained once on the entire training data set and stays fixed afterwards. With online learning techniques, the model keeps on learning with every new data sample it encounters. Starting off with offline learning, in case of the ESN setup described in Section 2.1.1, this can be made more mathematically rigorous as will be described next. Let d(n) denote the desired L × 1 dimensional output vector that needs to be approximated by y(n), the actual output of the reservoir. In order to achieve this, a representative time series of the desired input-output behavior is taken. The system starts in a random initial state x(0) at n = 0 and gets excited by the input stimulus. In order to get to a representative input-output behavior, the initial transient of the system has to die out, so that the state of the system (and hence the output) only depends on the input and not on the initial state x(0). Hence, for training, the initial transient timesteps are discarded, e.g. the first T_i steps. Next the system is run for an additional m training steps from n = T_i until n = T_i + m. During this training phase the input, all the states of the internal neurons and the previous output are collected in an m × (K + N + L) state collecting matrix M. The same is done for the desired outputs at each timestep during training in the m × L dimensional matrix D. Each ith row of the matrix M corresponds with a (K + N + L) dimensional feature vector for the desired L × 1 dimensional output at timestep i, the ith row of the matrix D. At this point, any machine learning model can be applied that takes the feature matrix of the training data M and maps this to the desired output training data D [44]. Most often, a simple linear regression (or ridge regression to prevent over-fitting) is chosen due to ease of implementation and training speed. However, more specialized machine

learning algorithms have also been used, such as SVMs for example. In online learning, the parameters governing the mapping from the reservoir's state to the output are updated at each timestep. This is done by comparing the actual output with the desired output and updating the model parameters accordingly. For example, linear regression and ridge regression models can also be trained in an online fashion using stochastic gradient descent. One example of an online learning technique specifically designed for RC, called FORCE learning, is presented in [91]. Inspired by the fact that an increasing amount of chaos in the network's dynamics corresponds to an increase in its computational capacities, as mentioned in Section 2.1.3, they start from networks showing spontaneous chaotic behavior (i.e. networks with spectral radii larger than 1). A simple linear readout is used together with additional feedback connections. FORCE learning enables the exploitation of this interesting operating region of spontaneous activity by suppressing chaotic network behavior during training. This is achieved by strong and rapid weight modifications in the initial training phase. The FORCE learning algorithm isn't restricted to the training of the output weights; the internal weights can also be modified in an online way. Results in [91] show that network performance indeed increases as the amount of chaotic behavior is increased, up until a certain point, known as "the edge of chaos" or, more correctly, "the edge of stability", where the FORCE learning algorithm fails to suppress the chaotic activity during training.
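A sketch of the offline (batch) procedure described above: the reservoir is driven, the washed-out states are collected in a state matrix M, and a ridge regression yields the readout weights. The washout length, regularization strength and the delayed-input target used as example are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# A small reservoir, as in Section 2.1.1
N = 100
W_in = rng.uniform(-0.5, 0.5, N)
W = rng.uniform(-0.5, 0.5, (N, N))
W *= 0.9 / max(abs(np.linalg.eigvals(W)))

# Drive it with white input noise and collect the states
u = rng.uniform(0, 0.5, 2000)
x, states = np.zeros(N), []
for u_n in u:
    x = np.tanh(W_in * u_n + W @ x)
    states.append(x.copy())
states = np.array(states)

# Offline training: discard the transient, build M and D, solve a ridge regression
washout, delay, ridge = 100, 2, 1e-6
M = states[washout:]
D = u[washout - delay:len(u) - delay]            # illustrative target: the input delayed by 2 steps
W_out = np.linalg.solve(M.T @ M + ridge * np.eye(N), M.T @ D)

y = M @ W_out
print("training NRMSE:", np.sqrt(np.mean((y - D) ** 2) / np.var(D)))
```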

2.1.6 Measures to characterize the reservoir

Here a couple of measures will be introduced that can be used to characterize a reservoir's computational capacities. These measures will mainly focus on two important task-related properties of a reservoir, namely its memory capacity and nonlinear richness.

Measure of Non-Linearity

A first measure for the amount of non-linearity in the response of the reservoir is given by the ’deviation from linearity in the frequency domain’

denoted by δ_φ. It was introduced in [99] to characterize ESNs; however, since this measure doesn't make any assumptions about the properties of the reservoir itself, it can be applied to different reservoir types as well. It is inspired by the fact that a nonlinear system, driven by a sine wave with a single frequency f_c, produces higher harmonics in its response, whereas a linear system can only respond with re-scaled outputs at the same frequency. Hence an intuitive way to define the deviation from linearity of a nonlinear dynamical system is to look at the ratio between the energy in the original input frequency f_c and the energy contained in all other frequencies in the system's response:

δ_φ = 1 − E_c / E_tot.

Here E_c is the energy of the response at f_c and E_tot is the total energy at all frequencies, minus the DC component. This measure is obtained by feeding the system with a single-frequency (f_c) sine wave and computing the Fast Fourier Transform (FFT) of the reservoir states. These FFTs are then averaged over all neurons to obtain an average power spectral density. Next,

E_tot and E_c are calculated by integrating over the corresponding frequency ranges.

A δ_φ close to one corresponds to a strongly nonlinear regime, as nearly all the energy is located in higher harmonics that originate from the nonlinear behavior of the reservoir. On the other hand, a δ_φ close to zero relates to a nearly linear regime.
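A sketch of how δ_φ could be estimated numerically; since no particular reservoir is specified at this point, a few static nonlinear transformations of the drive signal stand in for recorded reservoir states, and the sample rate, drive frequency and duration are placeholder values.

```python
import numpy as np

fs, f_c, T = 1000.0, 10.0, 4.0                 # sample rate, drive frequency, duration
t = np.arange(0, T, 1 / fs)
u = np.sin(2 * np.pi * f_c * t)                # single-frequency sine drive

# Stand-in "reservoir states": a few static nonlinear transformations of the drive
states = np.stack([np.tanh(2 * u), u ** 2, np.tanh(u + 0.3 * u ** 3)], axis=1)

# Power spectrum of each state trace, averaged over all traces
spectrum = np.mean(np.abs(np.fft.rfft(states, axis=0)) ** 2, axis=1)
freqs = np.fft.rfftfreq(states.shape[0], 1 / fs)

E_c = spectrum[np.argmin(abs(freqs - f_c))]    # energy in the bin at the drive frequency
E_tot = spectrum[1:].sum()                     # total energy, DC bin excluded

delta_phi = 1 - E_c / E_tot
print("deviation from linearity:", delta_phi)
```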

Linear Memory Capacity

This is a standard measure, introduced by H. Jaeger in [42], that is related to the memory in recurrent neural networks. The input u(n) is sampled from a uniform distribution between [0, 0.5] and the desired output is defined as d_φ(n) = u(n − φ), a φ-delayed version of the input. The capacity function C_φ is given by:

C_φ = Cov(y, d_φ)² / (Var(y) Var(d_φ)).

Adding up these capacity functions for all positive φ values leads to:

C_mem = Σ_φ C_φ,

which is called the Linear Memory Capacity of the reservoir. It is shown that with ESNs this measure cannot exceed N, the number of internal neurons, which is intuitively clear considering that even if each neuron remembers its input history perfectly, combined they can't store more information than the N previous inputs. Also, there is a trade-off between the memory and the nonlinear computation power of the reservoir [99] [20]. This can be understood from the fact that it gets easier to reconstruct past inputs from network states the more these form linear combinations of the input. However, the more nonlinear these combinations get, the harder it will be to reconstruct the delayed input signals. A more general and complete measure for the memory capacity of a system is defined in [20] as the nonlinear memory capacity. In essence this measure describes how well nonlinear combinations of different previous inputs can be reproduced by the reservoir. However, in order to produce this measure, long time series are needed. Hence here another benchmark task will be introduced that is related to both the memory and the nonlinear computation properties of the reservoir.
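A sketch of how the linear memory capacity could be estimated for a simulated ESN: one linear readout is trained per delay φ and the capacity functions are summed. The reservoir construction, washout length, ridge parameter and maximum delay are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 100
W_in = rng.uniform(-0.5, 0.5, N)
W = rng.uniform(-0.5, 0.5, (N, N))
W *= 0.9 / max(abs(np.linalg.eigvals(W)))

u = rng.uniform(0, 0.5, 5000)                    # white input noise in [0, 0.5]
x, states = np.zeros(N), []
for u_n in u:
    x = np.tanh(W_in * u_n + W @ x)
    states.append(x.copy())
states = np.array(states)

washout, max_delay, ridge = 200, 40, 1e-8
M = states[washout:]
C_mem = 0.0
for phi in range(1, max_delay + 1):
    d_phi = u[washout - phi:len(u) - phi]        # the input delayed by phi steps
    w_out = np.linalg.solve(M.T @ M + ridge * np.eye(N), M.T @ d_phi)
    y = M @ w_out
    C_mem += np.cov(y, d_phi)[0, 1] ** 2 / (np.var(y) * np.var(d_phi))

print("linear memory capacity:", C_mem)
```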

Academic Benchmark

A benchmark task in reservoir characterization used to measure a reservoir's capacity both in terms of nonlinear mapping and memory persistence is the nonlinear auto-regressive moving average 10 (NARMA-10) task. It is a discrete-time temporal task with a 10th order time lag. The input u(k) is drawn from a uniform distribution in the interval [0, 0.5] and the output is defined by the following time series d(k):

d(k + 1) = α d(k) + β d(k) Σ_{i=1...n} d(k − i) + γ u(k) u(k − 9) + δ,

with α = 0.3, n = 9, β = 0.05, γ = 1.5 and δ = 0.1.

In order to predict the time series given by d(k) the computational system at hand must have memory of the 10 previous inputs, and equally be able to compute the nonlinear combinations in the expression.
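A short sketch that generates an input/target pair following the recursion above, using the parameter values just given; the series length and random seed are arbitrary.

```python
import numpy as np

def narma10(length, alpha=0.3, beta=0.05, gamma=1.5, delta=0.1, seed=0):
    """Generate the input u(k) ~ U[0, 0.5] and the NARMA-10 target series d(k)."""
    rng = np.random.default_rng(seed)
    u = rng.uniform(0.0, 0.5, length)
    d = np.zeros(length)
    for k in range(9, length - 1):
        d[k + 1] = (alpha * d[k]
                    + beta * d[k] * d[k - 9:k].sum()   # sum of d(k-1) ... d(k-9)
                    + gamma * u[k] * u[k - 9]
                    + delta)
    return u, d

u, d = narma10(2000)
```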

2.2 Physical Reservoirs

In this section the RC approach will be extended to physical systems. The RNN from the ESN framework will be replaced by a physical system, which is again considered as a dynamical reservoir that maps the input to a high dimensional feature space. Nevertheless, some subtleties are involved in translating the RC concepts from discrete-time ANNs to physical systems in continuous time. First of all, the state of the physical system and a way to read out this state need to be defined. In the case of a reservoir consisting of artificial neurons this was straightforward, as it was just the value of the internal nodes at timestep n. In a physical system however, state variables have to be found that are measurable and representative for the system's state changes induced by the applied input. As was seen in Section 1.5, in [24] a water bucket is used as reservoir and the wrinkle-patterns on the surface represent the state of the system. In [5], time-delay dynamical systems (TDDS) are investigated to serve in the reservoir computing approach. The entire RNN is replaced by a single nonlinear node subjected to delayed feedback. The input is transformed by the nonlinearity of the single node and resides in the delay line (with τ the delay time). After a time τ, it enters the nonlinear node again and the process is repeated. Along the delay line, the transformed input passes through different states, which are read out with a constant interval θ. These readout states are called virtual nodes, as they don't correspond to actual physical nodes. However, they do represent nonlinear transformations of the input, where the transformation happened earlier in the single real nonlinear node. Next, these 'virtual' nodes, containing the delayed transformed response, are linearly combined in a supervised fashion to perform the desired task. As this setup replaces an entire network of

connected nodes by one single nonlinear node, it leads to a drastic simplification for the experimental implementation of PRC.

Figure 2.2: Delayed feedback reservoir scheme. Along the delay line, N states separated by a distance θ = τ/N from each other are chosen to represent virtual nodes, with τ the delay time and θ the readout time [5].

This theoretical delay line approach is implemented in an opto-electronic setup, where a diode laser is delayed in an optical fiber and the nonlinearity is realized optically using a Mach-Zehnder interferometer, showing results comparable to state-of-the-art digital implementations of reservoir computing [74]. In case of a RNN as reservoir, both positive and negative weights are present in the input and internal connection matrices. However, when moving to the physical domain, there is no analogue for these negative weights. Hence, the possible input transformations that can be attained by the reservoir are limited by this fact. One way to resolve this is with input encoding, where the physical reservoir is presented with the necessary input transformations it can't compute by itself. A nice example can be found in the photonics implementation on an integrated optical chip (a passive silicon photonics chip) presented in [97]. Contrary to the usual fibre-based implementations [74], here coherent light is used, which yields a significant performance improvement over the real-valued networks using incoherent light. As the input is now presented as complex numbers, the internal degrees of freedom of the reservoir are essentially doubled. In Section 2.1.6, the memory capacity and NARMA task were introduced in discrete time, with discrete white input noise u(n) ∈ [0, 0.5] and discrete output values d(n). However, real physical systems operate in continuous

time, hence the discrete input noise needs to be mapped to a continuous-time signal u(t). The same mapping is performed on the desired discrete output d(n) to obtain d(t). The input signal u(t) is then fed to the system, which transforms it according to its internal dynamics. These transformations are read out from the state of the system and combined (for example using linear regression) to form the continuous-time output y(t). The weights in this linear combination are trained such that y(t) approximates d(t) as well as possible. In turn, y(t) needs to be decoded to discrete time y(n), after which the system's performance can be evaluated by comparing y(n) to d(n). Another important aspect when moving from discrete time to continuous time in physical systems is the influence of the different timescales governing the system on its computational capacities. Taking another look at the TDDS can provide some insights. Here three distinct timescales can be identified: the delay time τ, the readout time of the virtual nodes θ, and the timescale T of the real nonlinear node. The interplay of these three timescales influences the diversity of the states and hence the efficiency with which the computational power intrinsic to the system is harvested. First of all, θ has to be chosen such that θ << τ in order to have enough virtual states to work with. In case T << θ the nonlinear node would have reached its steady state before the first readout at time θ. Hence, there would be no coupling between the virtual nodes, as they would only be determined by the instantaneous value of the input and the delayed reservoir state. However, if θ < T, then the state of the system read out at time t depends on the states of the previous virtual nodes. So good performance is expected when these three timescales are related according to θ < T << τ. In case of the photonics implementation on a photonics chip [97], mentioned earlier, the system's timescale depends on both the signal's propagation speed and the interconnection delays. Hence, in physical systems, not only the spatial architecture and the signal speed, but even the drive speed of the used sources, the readout speed of the measuring equipment etc. need to be taken into account. Every physical system has certain timescales related to its intrinsic dynamics, so in order to successfully apply RC it is important to have a good grasp of the different processes underlying the system's behavior. Just as was the case with ESNs, here as well a naive black-box approach won't work,

as a lot of subtleties are involved in getting the system to operate in a regime that is interesting from a computational point of view. This is where smart design and input encoding come into play to drive the system into these desired regimes for specific tasks.
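The virtual-node idea can be caricatured in discrete time as follows; this is a strongly simplified toy emulation (leaky-style coupling, random input mask), not the opto-electronic implementation of [74], and all coupling constants are made-up illustrative values.

```python
import numpy as np

rng = np.random.default_rng(7)
N_v = 50                                   # number of virtual nodes along the delay line
mask = rng.uniform(-1, 1, N_v)             # random input mask, one weight per virtual node

def delay_line_reservoir(u, alpha=0.8, beta=0.5, gamma=0.3):
    """Toy emulation: each input sample is time-multiplexed over the N_v virtual nodes.

    A virtual node couples to its own value one delay period ago (alpha), to the
    masked input (beta) and, since theta < T, to the preceding virtual node (gamma).
    """
    x, states = np.zeros(N_v), []
    for u_k in u:
        new_x = np.empty(N_v)
        prev = x[-1]                       # last virtual node of the previous delay period
        for i in range(N_v):
            new_x[i] = np.tanh(alpha * x[i] + beta * mask[i] * u_k + gamma * prev)
            prev = new_x[i]
        x = new_x
        states.append(x.copy())
    return np.array(states)                # one row of virtual-node read-outs per input step

states = delay_line_reservoir(rng.uniform(0, 0.5, 1000))
```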

Chapter 3

Memristor

3.1 The Missing Electrical Component

In 1971 [18] the circuit theorist Leon Chua postulated the existence of the memristor. By looking at the relationships (voltage versus current, voltage versus charge, and magnetic flux versus current) linked by the three fundamental circuit elements (resistor, capacitor and inductor), he inferred that a fourth fundamental circuit element might exist, linking magnetic flux and charge. In contrast to the static resistor, the memristor has a dynamic relationship between current and voltage, remembering its past inputs, hence the name 'memory resistor', or memristor for short. The physical behavior of memristors (and memristive devices in general) can be broadened beyond the scope of electrically controlled devices [92]: memristive behavior emerges in other physical phenomena as well, and the mathematical concepts behind the memristor can also be applied to describe it. In the description that follows, therefore, not only the electrical case will be treated; a more general mathematical framework will be introduced, independent of the physical system at hand.

3.2 The Memristor

3.2.1 Characteristics of the Ideal Memristor

The ideal memristor, a theoretical concept as introduced by Chua for the electrical case, is defined by a set of equations that will be presented here. The first equation, called the port equation, describes the relationship between the memristor’s port variables, its excitation variable u(t) and the response y(t), in the following way:

y(t) = g(x) u(t)    Port Equation,    (3.1)

where t denotes time. g(x), known as the parameter versus state map (PSM), represents a nonlinear function of the state variable x(t), whose time derivative is governed by the input signal u(t) and described by the following state equation:

dx/dt = u(t)    State Equation.    (3.2)

Next, two more quantities are defined: the time-domain integrals of the input, u_I (TIU), and of the response, y_I (TIY). Integrating the state equation 3.2 with respect to time yields:

u_I = x    TIU.    (3.3)

Substituting this in the port equation 3.1 and performing the integration results in:

y_I = F(u_I) = F(x)    Constitution Relation (CR).    (3.4)

Differentiating 3.4 w.r.t. time gives:

d/dt y_I(t) = y(t) = d/dt F(x) = (dF(x)/dx)(dx/dt) = (dF(x)/dx) u(t).    (3.5)

And thus:

g(x) = dF(x)/dx.    (3.6)

From this it can be seen that the CR and the PSM are equivalent descriptions of the memristor's behavior. As a third characteristic, the relationship between y and u is also frequently used, where the memristor is driven by a sinusoidal source. Plotting the memristor's output versus its input gives rise to a pinched hysteresis loop (PHL), Figure 3.1. This PHL is defined in [18] as a double-valued Lissajous figure of y(t) versus u(t) for all times t, with the restriction that it has to pass through the origin (here the loop is pinched). The PHL forms the memristor's most familiar fingerprint. For increasing frequencies the pinched hysteresis effect degenerates, eventually collapsing to a straight line through the origin.
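To make this fingerprint concrete, the toy sketch below (Python; the quadratic constitution relation F(q) = R_0 q + k q^2/2 and all parameter values are assumptions chosen purely for illustration) drives an ideal current-controlled memristor with a sinusoidal current and collects the corresponding i-v pairs. Plotting v against i yields a pinched hysteresis loop, which collapses towards the straight line v = R_0 i at high drive frequency.

    import numpy as np

    def ideal_ccmr_response(freq, r0=100.0, k=5e5, i_amp=1e-3, n=2000):
        # Ideal current-controlled memristor: v(t) = R_M(q(t)) i(t),
        # with PSM R_M(q) = r0 + k*q, i.e. CR F(q) = r0*q + k*q**2/2.
        t = np.linspace(0.0, 2.0 / freq, n)          # two drive periods
        i = i_amp * np.sin(2 * np.pi * freq * t)     # sinusoidal current input
        q = np.cumsum(i) * (t[1] - t[0])             # native state: charge = integral of i
        v = (r0 + k * q) * i                         # port equation
        return i, v

    # i-v pairs at two frequencies; at the higher frequency the loop shrinks
    # towards the straight line v = r0 * i (the memristor acts as a resistor).
    i_lo, v_lo = ideal_ccmr_response(freq=1.0)
    i_hi, v_hi = ideal_ccmr_response(freq=100.0)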

Figure 3.1: Examples of CR, PSM and PHL for a current-controlled memristor [92]. See Table 3.2 for the definition of the input, output, TIU, TIY, PSM and CR in case of a CCMR.

Table 3.1: General Memristors

Port variables
  Domain                   Effort E             Flow F
  Electrical               Voltage              Current
  Mechanical, translation  Force                Linear velocity
  Mechanical, rotation     Torque               Angular velocity
  Hydraulic                Pressure             Volumetric flow
  Thermodynamic            Temperature          Entropy flow
  Chemical                 Chemical potential   Molar flow

Native state variables
  Domain                   Momentum = TIE         Displacement = TIF
  Electrical               Flux                   Charge
  Mechanical, translation  Momentum               Position
  Mechanical, rotation     Angular momentum       Angle
  Hydraulic                Pressure momentum      Volume
  Thermodynamic            Temperature momentum   Entropy
  Chemical                 Chemical momentum      Moles

Effort and flow and their time-domain integrals in various domains as port quantities and native state variables of generalized memristors [92].

Memristors can be categorized into two general classes: effort- and flow-controlled memristors (ECMR and FCMR). In this way, the set of equations defining the memristor as introduced above can be applied to a multitude of physical fields. How the theoretically introduced port and state variables translate to these different fields is presented in Table 3.1. The general concepts of effort and flow correspond to voltage and current in the specific case of electronically controlled memristors. For the voltage controlled memristor (VCMR) and the current controlled memristor (CCMR) the state and circuit variables are presented in Table 3.2. From this it can be seen that the PSM, g(x), corresponds to the memductance (G_M) for the VCMR and to the memristance (R_M) for the CCMR. In case of the CCMR, the memristance describes the relation between the input current and the output voltage:

v(t) = R_M(q(t)) i(t),

hence the memristor's electrical resistance depends on the charge, i.e. on the history of the current that has passed through the memristor. It is important to note that the state variable x, which corresponds to the charge q in this case, does not change in the absence of current flowing through the device. This results in the so-called non-volatility property: the device remembers its history when the electric power supply is turned off, i.e. the memristor keeps its most recent resistance value. The degeneration of the PHL with increasing frequency, resulting in a linear input-output relation, corresponds in the electrical case to the memristor acting as a simple resistor at higher frequencies.

Table 3.2: VCMR & CCMR

Memristor              VCMR       CCMR
Input (u)              v          i
Output (y)             i          v
TIU = state (x)        φ          q
TIY (y_I)              q          φ
PSM [g(x)]             G_M(φ)     R_M(q)
CR [y_I = F(u_I)]      q(φ)       φ(q)

Voltage and current controlled memristors (VCMR and CCMR) and their circuit and state variables. v, i denote the voltage and current respectively. q corresponds to the charge and φ represents the magnetic flux. [92]

3.2.2 State Variable

From the state equation 3.2 it can be seen that the natural state quantity of the memristor, denoted by x, is the time integral of the excitation. These state quantities are called 'native state variables'. For example, from Table 3.2 we read that for a CCMR the native state variable is the charge q, the integrated current that has flowed through the memristor. In actual existing systems, the state is determined by a physical process described by multiple parameters that may be interdependent. Hence, instead of using the native state variable, another variable could be used to describe the state of the system. However, due to the (possibly nonlinear) dependence of this new variable on the native state variable, the port and state equations (3.1, 3.2) for this newly chosen state variable might not take the same unified form as they do for the native state variable.

3.2.3 General Memristive Systems

Ideal memristors are described by the state and port equations (3.1, 3.2) defined above, where x is the native state variable. The state equation 3.2 states that the time evolution of the state parameter does not depend on the state itself. From the port equation 3.1 it is found that the memristor’s PSM g(x) is independent of the instantaneous values of the excitation. In [19], both port and state equations are replaced by:

y(t) = g(x, u, t) u(t), (3.7)

dx(t)/dt = f(x, u, t).    (3.8)

Here x(t) denotes an n-dimensional vector of state variables. Systems described by the above equations are called nth-order memristive systems and form the mathematical generalization of the ideal memristors introduced before. This more general model of memristive systems has been applied to describe several empirically observed non-electrical phenomena, including the Hodgkin-Huxley model of the axon, Josephson junctions, neon bulbs, a thermistor at constant ambient temperature [19] and even biological structures such as blood and skin [47]. These new versions of the port and state equations allow memristive systems, in contrast to memristors, to show volatile behavior: even without any applied input, the state vector x(t) can change over time.

3.3 Resistive Switches

In the previous section, the memristor was introduced as a mathematical concept. In 2008, a team from the Hewlett Packard (HP) labs claimed to have produced the first electrical memristor device, based on a thin film of titanium dioxide [88]. Exposed to a strong electric field or current, this device changes its two-terminal resistance, a phenomenon known as resistive switching. In [88] a physical model was proposed to describe this switching behavior and to form the link with the memristor concept introduced by Chua. Later, Chua argued for a broader definition of the electrical memristor, including all two-terminal non-volatile memory devices based on resistance switching [17]. Ever since, a lot of controversy has arisen regarding the legitimacy of HP's claim and the applicability of memristor theory to any physically realizable device [103] [69] [102]. Despite the ongoing controversy, in the remainder of this work the terms memristor and resistive switch (RS) will both be used.

Resistive switches, although linked to the memristor as stated above, are more accurately classified as memristive systems. These RS, two-terminal circuit elements composed of a metal-insulator-metal (MIM) junction, exhibit a nonlinear relationship between the current and voltage across their terminals, also known as memristance. Several materials can be used to form this MIM junction. The RS's switching behavior and nonlinear current-voltage relationship are ascribed to the adjustment of Schottky barrier heights at the metal-insulator interface, phase changes of the insulating material, the formation/annihilation of conductive filaments that extend across the active region of the device, Joule heating and the displacement of ionic species [9]. Yet, regardless of the underlying physical mechanism, in order to obtain substantial switching of the resistance in these devices a conductive channel across the junction has to be created or destroyed.

Since 2008, interest in this new nano-scale device has skyrocketed, with research in a variety of application fields: chaotic circuits [71] [14], processors [77] [26], and robot control [25] [40], to name a few. Especially in the design of new solid-state memory devices, also known as resistive random-access memory or ReRAM devices [107] [2], and in the modelling of synaptic weights in neuromorphic computing [41] [9] [75] [60] [46], the properties of these new memristive devices are highly attractive. In the case of ReRAM, the RS are implemented in cross-bar architectures to serve as building blocks for these new forms of ultra-dense memory storage, Figure 3.2. Here one is interested in the memristor-like characteristics of the RS, i.e. its non-volatility property. However, volatile behavior is present in the developed technologies [8] [72]. Under a weak stimulus, these systems attain a temporary state and after a certain time they decay back to a state of equilibrium. These state-restoration time constants are system dependent and can range from a few hundreds of nanoseconds to years. As said before, the RS are not memristors in the strict sense of the word, but fall under the class of the more general memristive systems.

Figure 3.2: Resistive switches implemented in a cross-bar structure as they are used for memory storage.

Biological neurons in the brain, on the other hand, exhibit both short-term revertible dynamics, known as short-term plasticity (STP), as well as long-term stable modifications of their state, long-term plasticity (LTP) [9]. The LTP property of synapses is assumed to lie at the basis of memory in the brain and forms an active field of research. STP, although not related to long-term memory, may provide a lot of computational power and relates to the large variance present in the response of synapses to a specific signal.

In the search for synthetic synapses for the fabrication of neuromorphic systems, resistive switches form a particularly promising class of candidates [41]. Their nonlinear dynamics show properties also present in biological synapses, such as hysteresis, LTP, long-term depression, spike-timing dependent plasticity, etc. Other qualities that make these nano-devices suitable candidates for fabricating brain-like systems are their low energy consumption, small size and simple architecture (as described above: a two-terminal device composed of a MIM junction).

In the remainder of this chapter two types of RS will be introduced: the TiO2 memristor and the atomic switch (AS). Their physical properties will be investigated and for each a simulation model will be proposed based on a CCMR model. In the final section of this chapter, a VCMR model will be presented as an alternative.

3.4 TiO2 Memristor

3.4.1 Physics

The first memristive device under investigation is the TiO2 memristor created by HP in 2008 [88]. This device consists of a thin TiO2 film (≈ 10 nm), an insulator, sandwiched between two platinum electrodes, as can be seen in Figure 3.3. In order to create the switching channel from the insulating TiO2 film, an electroforming process is applied [112]. The TiO2 layer then consists of a highly conductive channel (TiO2−x), due to positively charged oxygen vacancies, and a very narrow insulating TiO2 barrier near the positive electrode. Due to the difference in work function between the upper platinum electrode and the TiO2 insulator, a Schottky barrier is formed at their interface. The high concentration of oxygen vacancies in the conducting layer, however, obstructs the Schottky barrier between the lower electrode and the TiO2−x layer; here an ohmic contact is formed [111].

By applying a voltage across the memristor, a dopant drift in the conducting channel can modulate the barrier width w. A positive voltage leads to a drift of the oxygen vacancies towards the TiO2 layer, where they cancel the Schottky barrier and transform the Pt/TiO2 interface into an ohmic contact. As the width w decreases, the tunneling of electrons through the barrier increases. This leads to an increase in conductivity of the memristor.

Figure 3.3: Cross section of the HP TiO2 memristor [48]

Due to this migration of the vacancies, locally reduced TiO2 Magnéli phases are formed, which eventually line up to form highly conductive channels that extend along the memristor's active TiO2 core; the memristor is switched to its ON state. By applying a negative bias, or due to the effects of Joule heating caused by the high current densities, these conductive channels are annihilated, leading to a drastic decrease of the device's conductivity. This negative bias also results in the migration of the oxygen vacancies away from the TiO2 layer, increasing the barrier width and the memristor's resistance. This low-conductive state corresponds to the OFF state of the device. Figure 3.4 shows the resulting nonlinear switching dynamics by which the TiO2 device is governed.

Figure 3.4: Typical switching dynamics for an anion (TiO2−x) device characterized by a voltage pulse stress (a, top) with variable pulse duration and amplitude. Panel a (bottom) shows 16 curves, 8 each for set (green) and reset (blue), with each curve showing the evolution of the normalized resistance (R, measured at a specific bias) for a device that is initially set to the OFF (ON) state and then continuously switched to the ON (OFF) state by voltage pulses with fixed amplitude and exponentially increasing duration [114].

These volatile versus non-volatile switching characteristics seen in Figure 3.4 (i.e. for small biases no state changes occur, but with further increase in amplitude the state changes abruptly) can be explained by nonlinear transport of the oxygen vacancies. Figure 3.5 c presents a conceptual model for the conduction mechanism explaining both volatile and non-volatile state transitions. The vacancy motion can be described as thermally activated hopping of oxygen ions with a critical energy barrier Ei. In order to induce a stable, non-volatile transition, this energy barrier has to be exceeded. This can be accomplished by a single strong voltage pulse, or by several smaller pulses together, making use of the fact that the energy barrier Ei depends on the previous state of the device. Most likely, this is caused by the fact that several channels can be formed simultaneously across the junction, leading to a higher conductive state of the device. In this sense, the memristor acts as a nonlinear accumulator. Once Ei is exceeded, the system undergoes a transition from one long-term thermodynamically stable state to another, giving rise to a non-volatile change in resistance. If Ei is not reached, however, a volatile response occurs, with the eventual restoration of the initial equilibrium state (Figure 3.5 b). A metastable phase transition within the TiO2 core of the device lies at the basis of this volatile behavior. This happens before any thermodynamically stable phase transitions occur with the formation or annihilation of conductive channels consisting of reduced TiO2 filaments.

The higher the hopping rate of the oxygen vacancies, the higher the drift velocity of the ions and hence also the switching speed of the device [89]. Due to the small dimensions of these nano-scale devices, large electric fields are not unlikely and exponential drift velocities occur. As said before, Joule heating is also a common phenomenon in these devices due to the high current densities. As the current is exponentially dependent on the temperature, both of these effects contribute to the nonlinear switching behavior.

3.4.2 Simulation Framework

Due to the rising interest in memristors (Section 3.3), but their limited availability, accurate modeling of their behavior is required. For this purpose several dedicated simulation models have been created [110]. Here a memristor model [8] that also accounts for volatile effects is introduced. The model is written in SPICE (Simulation Program with Integrated Circuit Emphasis), an analog electronic circuit simulator used in integrated circuit and board-level design. The next section starts off with Strukov's current controlled memristor model [88] for the HP memristor. Next it is shown how this model is implemented in SPICE and an additional 'window function' is introduced to account for nonlinear dopant drift [10]. Finally, the extended memristor model from [8] is introduced to take the observed volatility into account.

Figure 3.5: (a) Optical microphotograph of a single TiO2 memristor. (b) Example of measured volatile characteristics of the device shown in (a); three disruptive pulses are applied with an interpulse timing of 1 s, which trigger volatile state transitions; READ pulses of small, non-disruptive amplitude (0.5 V) are applied every 20 ms to assess the conductivity of the device. (c) Conceptual model of conduction mechanisms that can render a volatile (metastable) as well as a non-volatile (stable) transition in the device's conductance. [8]

Current Controlled Memristor Model

In [88], the authors start from the description of a general current-controlled memristive system with the following port and state equations:

V = R(I, w, t)I,

dw/dt = f(I, w, t),

where w can be a set of state variables. In order to give a physical interpretation to these equations, a thin semiconductor film of thickness D sandwiched between two metal contacts is considered, as shown in Figure 3.6. Here w corresponds to the width of the doped region. The total resistance of the memristor device is determined by two variable resistors connected in series.

R_ON and R_OFF denote the resistance of the memristor in the two limit cases w = D and w = 0 respectively, and hence the total resistance, R_MEM, is written as follows:

R_MEM(x) = R_ON x + R_OFF (1 − x),    (3.9)

where

x = w/D ∈ [0, 1].    (3.10)

Ohm's law leads to the following port equation (3.11), where, in analogy with Section 3.2.1, x corresponds to the (non-native) state variable, R_MEM(x) to the PSM g(x), and v(t), i(t) to the output and input respectively:

v(t) = R_MEM(x) i(t).    (3.11)

Applying an external bias v across the device will cause the charged dopants to drift towards the undoped region and hence move the boundary between both regions [11]. Here, the simplest case of ohmic electronic conduction and linear ionic drift in a uniform field is considered, leading to the following state equation:

Figure 3.6: Schematic representation of the TiO2 memristor [10]

dx/dt = (µ R_ON / D^2) i(t),    (3.12)

for the state variable x. Here µ (≈ 10^-14 m^2 s^-1 V^-1) denotes the average ion mobility, which influences the resistive switching speed of the device. The coupled equations 3.11 and 3.12 take the normal form of a current-controlled memristor. However, this is only valid for x ∈ [0, 1], i.e. w ∈ [0, D]. In both limit cases the memristor is saturated and acts as a simple resistor with resistance R_OFF or R_ON respectively, until the bias is reversed.
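A minimal numerical sketch of this linear-drift model (Python, forward-Euler integration; the parameter values are illustrative placeholders rather than fitted device values):

    import numpy as np

    def linear_drift_ccmr(i_t, dt, D=10e-9, mu=1e-14, r_on=100.0, r_off=16e3, x0=0.5):
        # Strukov-style linear-drift CCMR: dx/dt = (mu*r_on/D^2) * i(t),
        # v = R_mem(x) * i, with x clipped to [0, 1] to emulate saturation.
        # i_t: array of input current samples, dt: time step.
        k = mu * r_on / D**2
        x = x0
        v = np.empty(len(i_t))
        for n, i in enumerate(i_t):
            v[n] = (r_on * x + r_off * (1.0 - x)) * i          # port equation (3.11)
            x = float(np.clip(x + k * i * dt, 0.0, 1.0))        # state equation (3.12)
        return v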

Non-Volatile SPICE-Model

As described in Section 3.4.1, the high electric fields in these nano-technologies cause strong nonlinear dopant drift. These nonlinearities occur particularly at the interfaces between the electrodes and the TiO2 layer, i.e. for x ≈ 0 and x ≈ 1. In [10] a window function f(x) is introduced to model this behavior:

f(x) = 1 − (2x − 1)^(2p),    (3.13)

with p a positive integer. This leads to the following extension of the state equation 3.12:

dx/dt = k i(t) f(x),    k = µ R_ON / D^2.    (3.14)

The above state and port equations, 3.11 - 3.14, can be implemented in SPICE with the block diagram shown in Figure 3.7. The main circuit consists of the upper two blocks, where the left one provides the relation between the voltage and the current. This can be seen by rewriting the memristance 3.9 as

R_MEM(x) = R_OFF − x ∆R,    ∆R = R_OFF − R_ON.    (3.15)

As x varies through time, a voltage-controlled voltage source E_MEM is used, dependent on the voltage at node x in the right block. Here, the definition of capacitance is used to perform the time integration of the state parameter x:

C dV/dt = I.    (3.16)

Integrating both sides w.r.t. time leads to:

V = (1/C) ∫_0^t I dt + V_0.    (3.17)

If a capacitance of 1 F is taken and the current through the capacitor is set equal to the right-hand side of the state equation, then the voltage V(x) over the capacitance C_x gives the normalized width of the doped region, i.e. the state parameter x = w/D. The initial boundary between the doped and undoped region, x_0, corresponds to the initial voltage over the capacitor C_x.

x_0 = (R_OFF − R_INIT) / ∆R,    (3.18)

with R_INIT ∈ [R_ON, R_OFF] the initial resistance of the memristor.
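Outside SPICE, the same nonlinear-drift behavior can be reproduced by adding the window function to the Euler update of the previous sketch. The fragment below is again only an illustrative Python sketch, not the SPICE netlist of [10]; the choice p = 10 is an assumption.

    def initial_state(r_init, r_on, r_off):
        # Initial normalized doped width from the initial resistance (eq. 3.18).
        return (r_off - r_init) / (r_off - r_on)

    def window(x, p=10):
        # Window function f(x) = 1 - (2x - 1)^(2p), eq. 3.13: suppresses dopant
        # drift near the boundaries x ~ 0 and x ~ 1.
        return 1.0 - (2.0 * x - 1.0)**(2 * p)

    def nonlinear_drift_step(x, i, dt, k, r_on, r_off, p=10):
        # One Euler step of dx/dt = k * i(t) * f(x) (eq. 3.14), returning the
        # voltage from the port equation and the updated, clipped state.
        v = (r_on * x + r_off * (1.0 - x)) * i
        x_new = min(max(x + k * i * window(x, p) * dt, 0.0), 1.0)
        return v, x_new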

Figure 3.7: Block diagram of the SPICE model [10]

Figure 3.8: (a) Circuit schematic of the volatile SPICE model, based on the model in [10], extended with two extra cells to account for volatility. (b) Simulated pinched hysteresis loop under stimulus by a voltage sine wave of 1 Hz; reduction to a linear resistor for a sine wave of 100 Hz. (c) Resistance of the device under the stimulus of (b). (d)-(g) Simulation example of volatile to non-volatile state transitions for two identical devices stimulated by identical input stimuli with different interpulse timing: red line 500 ms, blue line 1 s. (d) Instantaneous resistance of the model, R_mem = v/I_mem. (e) Non-volatile resistance levels at which the model subsequently settles, as shown in (d). (f) Volatile charge response (Y = R_on y + (1 − y) R_off). (g) Input stimulus. [8]

Volatile Extension

As mentioned in Section 3.4.1, the fabricated TiO2 memristors exhibit volatile behavior due to initial metastable phase transitions in the switching channel. To account for this volatile behavior, the model from [10] is extended with two additional cells in [8], as can be seen in Figure 3.8 a. The following parameters determine the characteristics of the model and can be tuned to match experimental data: R_ON, R_OFF, R_INIT, µ, D, C_x, R_x, q_p, q_n, P. Here, R_ON, R_OFF and R_INIT serve the same purpose as in the non-volatile model. C_x controls the magnitude of the volatile resistive switching behavior proportional to the input, while R_x determines the time constant of the state decay. The state variable is, just as in the non-volatile model, given by x, which determines the memristor's resistance R_MEM(x), equation 3.9. However, now three internal variables describe the state of the model: x, y and z, initialized as follows:

x_0 = y_0 = (R_off − R_init) / (R_off − R_on);    z_0 = 0.    (3.19)

These three state variables are linked by the following set of coupled differential equations, represented by a volatile, a non-volatile and a charge cell:

Volatile Cell:       C_x dx/dt = −(x − y)/R_x + I_0(x)    (3.20)

Non-volatile Cell:   C_y dy/dt = I_0(y)  if z > q_p and V > 0;
                                 I_0(y)  if z < q_n and V < 0;
                                 0       otherwise    (3.21)

Charge Cell:         C_z dz/dt = I_mem − z/R_z,    (3.22)

where I_0(h) = I_mem µ_v R_on f(h) / D^2.    (3.23)

Figure 3.9: Volatile behavior of a TiO2 ReRAM cell. Effect of the interpulse timing on the relative change in conductance ∆C/C_0. Blue: measured device response; red: modelled device response. (a) An interpulse time of 1 s leads only to volatile state transitions. (b), (c) With interpulse times of 600 ms and 200 ms, the energy barrier is exceeded and, apart from a transient behavior, a non-volatile state transition has also occurred. [8]

Here f(h) is again a window function modelling the nonlinear dopant drift, and I_0(h) relates the input current I_mem to the drift velocity of h. The set of coupled differential equations is modeled by the three cells represented in Figure 3.8 (a). This is done in a similar way as for the non-volatile model: RC circuits are used to perform the integration of the variables, and their interdependency is translated into voltage/current controlled voltage sources integrated in the different cells. Figures 3.8 (a), (d)-(g) present a nice example to understand how both volatile and non-volatile switching is modeled. In Figure 3.8 (g) the input stimuli, with different interpulse timing, are shown. The charge cell in (a) forms a leaky integrator (RC circuit) where V(z) is proportional to the charge that passes through the memristor, Figure 3.8 (f). Whenever this charge level reaches a certain threshold (q_p, q_n), the current source G_y in the non-volatile cell is turned on and the capacitor C_y is charged, leading to a non-volatile state transition of the device, Figure 3.8 (e). The volatile cell is another RC circuit that models the volatile state transitions. The extra voltage source E_NOV = V(y) enforces the non-volatile state, Figure 3.8 (d). In Figure 3.9 the model is compared to experimental data of a TiO2 memristor regarding the response to write pulses. Good agreement with the actual response of the device is found by fitting the model parameters.
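A behavioral sketch of the three coupled cells, stepped with forward Euler, could look as follows (Python; the threshold logic, the simple window with exponent 1 and all parameter values are simplified assumptions for illustration, not the fitted SPICE model of [8]):

    def volatile_ccmr_step(x, y, z, i_mem, v, dt, p):
        # p: dict of parameters {R_on, R_off, D, mu, C_x, R_x, C_y, R_z, C_z, q_p, q_n}
        def i0(h):
            # Drift term I_0(h) = I_mem * mu * R_on * f(h) / D^2, with a simple window f(h).
            f = 1.0 - (2.0 * h - 1.0)**2
            return i_mem * p["mu"] * p["R_on"] * f / p["D"]**2

        dx = (-(x - y) / p["R_x"] + i0(x)) / p["C_x"]            # volatile cell (3.20)
        if (z > p["q_p"] and v > 0) or (z < p["q_n"] and v < 0):
            dy = i0(y) / p["C_y"]                                 # non-volatile cell (3.21)
        else:
            dy = 0.0
        dz = (i_mem - z / p["R_z"]) / p["C_z"]                    # charge cell (3.22)
        return x + dx * dt, y + dy * dt, z + dz * dt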

3.5 Atomic Switch Networks

3.5.1 Physics

The second memristive system under consideration is a highly interconnected atomic switch network (ASN) [6]. Atomic switches, just as HP's TiO2 memristor, fall under the broader class of resistive switches. They also exhibit some of the common memristive characteristics, including pinched I-V hysteresis and large ON/OFF switching ratios. Again, multiple mechanisms contribute to the switching behavior seen in atomic switches. The switching process is a combination of the formation/annihilation of a conductive metal filament together with a bias-induced phase transition in the insulator layer of the MIM junction. The atomic switches that will be introduced are composed of an Ag|Ag2S|Ag MIM interface. The corresponding bias-catalyzed phase transition in the insulating material happens between the monoclinic acanthite (α) and body-centered cubic argentite (β) phase of Ag2S. Whereas the acanthite phase is both electrically and ionically insulating (the mobility of the Ag+ ions is very low), the argentite phase is semiconducting and has an exceptionally high diffusion coefficient for silver. Under an applied bias, Ag2S undergoes a phase transition from the insulating α- to the conductive β-phase. This leads to an increase in conductivity of the switch, the ON state. This argentite phase, however, is thermodynamically unstable in the absence of an applied bias. Hence, in that case, the Ag2S returns to its stable insulating α-phase, the atomic switch's initial high-resistance OFF state. The switching between this (semi)conducting β and insulating α state characterizes a weakly memristive behavior, Figure 3.11 a.

However, under continuous applied bias, due to the high ionic mobility in the β-phase (comparable to that of gaseous silver atoms), the mobile Ag+ cations start migrating from the anode to the cathode. There they are reduced, and atom by atom they start forming a silver filament, leading to a decrease of the electrical resistance at the junction. The switch encounters a dramatic increase in conductivity as soon as this highly conductive Ag nanofilament is fully formed: it has transitioned to the ON state. Again, with removal of the applied bias, the β-phase becomes unstable and so does the filament, which results in filament dissolution and switching to the OFF state. The interplay between the formation and dissolution of this strongly conductive filament corresponds to a strongly memristive behavior, as can be seen in Figure 3.11 b. The volatility in this type of atomic switch is caused by the thermodynamical instability of the β-Ag2S phase in the absence of an applied bias. This can be associated with the short-term memory aspect of the system.

Figure 3.10: Schematic representation of the Ag|Ag2S|Ag MIM junction. Under an applied bias, the Ag cations migrate from the anode to the cathode, where they are reduced. This leads to the formation of a conducting metallic filament [83].

Figure 3.11: (a) Initial weak switching regime of the atomic switch network. (b) The switching from (a) is included and rescaled to show the difference with the hard-switching regime [6].

Figure 3.12: Volatile behavior of an Ag2S gap-type atomic switch [15]

The filament's dissolution time constant, and thus the volatility rate, depends on the filament thickness. Figure 3.12 shows the volatile behavior of an Ag2S gap-type AS, where it can be seen that the decay rate depends on the state of the memristor. Under continued applied bias, more Ag ions will be reduced, even after the filament has been completely formed, leading to a thickening of the filament and an increase in its stability. This results in a more resilient conducting pathway and can be regarded as a form of long-term memory.

Usually atomic switches are used in crossbar architectures in combination with regular CMOS components for memory storage, as discussed in Section 3.3. These crossbars are designed in such a way as to address each memristive element sequentially and individually, as was shown in Figure 3.2. The system presented in [6] is, in this aspect, quite different. As can be seen in Figure 3.13, a multi-electrode array (MEA) is overgrown by an atomic switch network that consists of densely connected silver nanowires forming a complex structure. The electrodes on the MEA (a 4 × 4 grid, 16 in total, connecting the ASN to outer electrical contact pads) can serve both as source and as readout. In order to produce these structures, copper micro-spheres, ranging from 1 to 10 µm, are used as seeds for electroless deposition (ELD) of silver. The metallic silver nanowires are created by spontaneous oxidation reactions of metallic copper in contact with an AgNO3 solution. The morphology of the nanowire structure depends on the distribution of the seeds and the concentration of Ag+ ions in the AgNO3 solution. This atomic switch network, inspired by the connectivity found in the brain, consists of a huge number of interconnections, with a density of around 10^9 junctions/cm^2 (according to analysis of scanning electron microscope (SEM) images). The lengths of the nanowires created in the process range from 100 nm to over 1 mm, forming multiple junctions.

Figure 3.13: Atomic switch network device. (a) Multi-electrode array of outer platinum electrodes lithographically patterned on a silicon substrate enables electrical characterization and stimulation of the central network. Scale bar = 4 mm. (b) SEM image of an atomic switch network comprised of self-organized silver nanowires electrodeposited on a grid of copper posts. Overlapping junctions of wires form atomic switches when functionalized. Scale bar = 500 µm. [21]

Figure 3.14: Ultra-sensitive IR image of a distributed device conductance under external bias at 300 K; electrodes are outlined in white. [86]

This densely connected network structure gives rise to new, interesting properties that are not seen in single Ag|Ag2S|Ag atomic switches. As can be seen in Figure 3.13, due to the structure of the MEA and the ASN, it is physically impossible to measure or stimulate a single switch individually. So whenever the voltage or current at a certain electrode is measured, it corresponds to the response of the network as a whole and not just to the response of a single switch. This is validated by IR image analysis of the network under an applied input, from which it is concluded that the input is distributed across the whole device, as can be seen in Figure 3.14, where Joule heating from current flow shows how power is dissipated throughout the network. Hence the measured I-V characteristics have to be ascribed to the properties of the network as a whole and are influenced by its structure and connectivity.

Whenever a bias is applied at one of the electrodes, the voltage is distributed across the whole ASN. This leads to the initiation of the formation of different filaments throughout the network. The completion of a filament at one location leads to a drastic increase in the conductivity locally, hence a potential drop at the junction, which in turn leads to a redistribution of the potential over the network.

Somewhere else in the network, the buildup of the potential can then cause another filament to form, while the local potential drop at the completed filament results in its thermodynamic dissolution. This leads to a chain reaction of filament formation and dissolution all across the network. Whereas a single atomic switch under a continuous bias eventually becomes saturated and starts behaving as a static system, the network structure and connections in the ASN ensure that the device's response stays dynamic even under a constant applied bias. This has been shown experimentally by stimulating the device with a constant voltage for several hours in a row and measuring the network's activity, which showed continuous dynamical behavior distributed all across the network. It has also been reported [86] that ASNs, and similar network devices, show signs of an operational regime near the "edge of chaos", where the network exhibits avalanche dynamics, criticality and power-law scaling of temporal metastability. Figure 3.15 shows the ASN's response to two different waveforms at the different electrodes on the MEA. These voltage responses form rich and interesting nonlinear transformations of the original input signal and hence show the possible suitability of these systems as reservoirs in an RC approach.

3.5.2 Simulation Framework

A simulation model of the Ag|Ag2S|Ag atomic switch network has been developed in MATLAB [83], based on [73] and on the experimental results of the ASN devices. Unlike the SPICE model of the TiO2 memristor, this model is explicitly aimed at reproducing the behavior of the network as a whole, rather than just one single memristor. In [73], Strukov's current controlled memristor model [88] is again used to describe the behavior of a single atomic switch. The port equation 3.11 is repeated here for clarity.

v(t) = [R_ON w(t)/D + R_OFF (D − w(t))/D] i(t),    (3.24)

again with w ∈ [0, D], now representing the length of the Ag nanofilament, the state variable of the AS. If w = 0 there is no conducting filament, hence the switch is in the OFF state, corresponding to a resistance R_OFF. For w = D (the junction's gap size), the filament is fully formed, the switch has a low resistance R_ON and is said to be in the ON state.

Figure 3.15: ASN response, nonlinear transformations. (a) and (b) Input waveforms shown are a 750 mV, 30 ms FWHM Gaussian pulse with a 1.25 V offset, input at electrode 12, and a 10 V, 50 ms square wave repeated at 10 Hz with no offset, input at electrode 4. (c) and (d) Each recorded output waveform is plotted with respect to where the recording was physically located on the device. The patterned seed network in (c) is grown on top of the MEA in (d). (e) and (f) Distributed network activity causes voltage signals that are input to the device to be transformed into higher-dimensional output representations. These output voltage representations are simultaneously recorded at each electrode in the 4 × 4 array. Vertical axes are normalized for clarity. [21]

Next, a new state equation is introduced, reflecting the different physical mechanisms related to the formation and annihilation of the filament:

dw(t)/dt = [µ_v (R_on/D) i(t)] Ω(w(t)) − τ (w(t) − D) + η(t),    (3.25)

where µ_v represents the ionic mobility of Ag+.

Ω(w(t)) = w(t)(D − w(t))/D^2 is called the window function, and serves the same purpose as in the TiO2 SPICE model. Three different contributions to the change in filament length can be recognized. The first term, [µ_v (R_ON/D) i(t)] Ω(w(t)), represents the dependence of the growth rate on the electronic flux through the junction. Next, there is a term describing the dissolution of the filament, τ (w(t) − D), where τ serves as the dissolution rate constant (the value of which was determined by a numerical survey to find the best fit to the experimental data). Finally, an extra stochastic term is added to the equation:

η(t) = α(t) ∆w(t),

with ∆w(t) representing the change in filament width at time t and α(t) a noise factor. This last term takes both the fluctuations in the density of available Ag ions and the random nature of the filament formation/dissolution process into account.

Both equations (3.24, 3.25) describe the behavior of a single atomic switch. Next, a square lattice of nodes is created, where the nodes mimic the copper seeds in the hardware design. Both short-range connections, between nearest neighbors, and long-range connections are formed, Figure 3.16. Each connection represents an atomic switch that is modeled by the equations introduced above. In order to best mimic the behavior of the Ag|Ag2S|Ag network as a whole, the probability distributions of the junction width D and the added noise term α(t) were determined so as to fit the experimental results. The D distribution has a mean of 5 nm, while α is centered at 0 with a standard deviation σα. As the model is built to simulate the ASN, it has to emulate the behavior of this highly interconnected structure. Yet, computer memory and simulation time have to stay reasonable. In [73] finite-size effects of the simulation model are investigated to determine the optimal number of nodes in the square lattice, while in [83] periodic boundary conditions are applied to deal with these finite-size effects. In this way the ASN can be modeled using network configurations consisting of 10 × 10 nodes. Figure 3.17 shows the switching behavior in simulation and the corresponding experimental results. It can be seen that the transition from a weak to a hard memristive switching regime can be modeled qualitatively well by varying the σα parameter.

Figure 3.16: Connection graph of the ASN grid used in simulation experiments.

Figure 3.17: Simulation of device activation demonstrating (a) an initial soft switching repeated indefinitely, until (b) a transition in behavior from soft (blue) to hard (red) switching. (c) Hard switching persists indefinitely. This behavior was ubiquitous across all configurations, with discrepancies in the bias amplitude/frequency. Experimental device activation curves are shown as insets for comparison. [83]
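To illustrate how a single junction of such a network can be stepped forward in time, the fragment below implements equations 3.24 and 3.25 for one atomic switch (Python, forward Euler; the parameter values and the Gaussian form of the noise are assumptions for illustration, not the fitted distributions of [73] or [83]):

    import numpy as np

    rng = np.random.default_rng(0)

    def atomic_switch_step(w, i, dt, D=5e-9, mu_v=1e-12, r_on=1.0, r_off=1e3,
                           tau=1e3, sigma_alpha=0.1):
        # Port equation (3.24): series combination weighted by the filament length w.
        v = (r_on * w / D + r_off * (D - w) / D) * i
        # State equation (3.25): flux-driven growth with window Omega(w),
        # dissolution term and a stochastic contribution eta = alpha * dw.
        omega = w * (D - w) / D**2
        dw = (mu_v * r_on / D * i * omega - tau * (w - D)) * dt
        dw += rng.normal(0.0, sigma_alpha) * dw          # noise proportional to the update
        w_new = float(np.clip(w + dw, 0.0, D))
        return v, w_new

In the full network model, one such update is applied per connection of the lattice at every time step, with per-junction values of D and α drawn from the distributions discussed above.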

3.6 Voltage Controlled Memristor Model

In [59], a voltage controlled memristor model is introduced based on the measured I-V characteristic of the TiO2 memristor in [113]. Compared to the current controlled models seen earlier, where the conduction channel length is used as the state variable of the system, here the conduction channel area (or, equivalently, width) is chosen. For many devices this approach seems more natural, since these conduction channels typically form locally and in parallel. Moreover, continued applied bias, even after the filament is fully formed, either increases the thickness of the conduction channel or the number of conduction channels. These two phenomena can both be modelled by the same increase in conduction channel area. The voltage controlled memristor model is further extended in [16] with an additional linear diffusion term in the state equation, to account for volatility. The port equation reads:

i = (1 − w) α [1 − exp(−βv)] + w γ sinh(δv),

where α, β, γ and δ are positive-valued fitting parameters determined by the material properties. The first term corresponds to the contribution from Schottky emission between the oxide layer and the electrode, whereas the second term represents conduction due to tunneling in the conducting channels through the MIM junction. The corresponding state equation, describing the rate of change of the channel area w, is determined by its expansion rate, related to the exponential ionic drift, and by the additional lateral diffusion of ions that build up the conducting channel:

dw/dt = λ sinh(ηv) − w/τ.
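A compact numerical sketch of this volatile VCMR (Python, forward Euler; all fitting-parameter values below are placeholders chosen only to make the fragment self-contained, not the values fitted in [59] or [16]):

    import numpy as np

    def vcmr_simulate(v_t, dt, alpha=1e-6, beta=2.0, gamma=1e-4, delta=3.0,
                      lam=1e2, eta=4.0, tau=5e-3, w0=0.1):
        # Voltage-controlled memristor: the state w is the normalized conduction
        # channel area; dw/dt = lam*sinh(eta*v) - w/tau (growth vs. lateral diffusion).
        w = w0
        i_out = np.empty(len(v_t))
        for n, v in enumerate(v_t):
            i_out[n] = (1.0 - w) * alpha * (1.0 - np.exp(-beta * v)) \
                       + w * gamma * np.sinh(delta * v)              # port equation
            w = float(np.clip(w + (lam * np.sinh(eta * v) - w / tau) * dt, 0.0, 1.0))
        return i_out

Because the state decays with time constant τ even when no voltage is applied, this model exhibits the fading-memory behavior that will turn out to be essential for reservoir computing in the next chapter.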

Chapter 4

Memristor Based Reservoir Computing

In Section 1.5 a couple of examples were shown of physical systems that have been used as reservoirs in PRC. The interesting properties of memristive systems in general, Section 3.2, and two realizations, the TiO2 and Ag|Ag2S|Ag resistive switches (Sections 3.4 and 3.5), were discussed. The RS's characteristics (ranging from history-dependent resistance and nonlinear output response to applied bias, to synaptic properties such as short- and long-term memory) were linked to the characteristics of synapses in the brain (Section 3.3), and it was concluded that memristors form a very promising class of devices for the fabrication of neuromorphic systems. In Section 1.4.3 it was explained how two fields emerged simultaneously, each with a different purpose, but both with a general reservoir computing approach towards RNNs. Where Maass' LSMs were purposely built to reproduce and better understand the activity in the brain, Jaeger's ESNs were introduced from an engineering point of view in order to exploit the intrinsic computational power hidden in these RNN reservoirs. Now, with the advent of these new nano-scale memristive devices, a similar distinction between two fields of research can be made. One branch, related to the viewpoint of the LSM, uses these new devices to fabricate synthetic synapses and integrate them into large interconnected networks in order to reproduce the behavior of the brain [9] [41] [3] [90].

In this master thesis, however, the memristive devices will be deployed from a reservoir computing point of view, related to the viewpoint of ESNs. The RC approach will be applied to networks of interconnected memristors in order to exploit their interesting dynamical behavior for computational purposes. This is a continuation of previous works in the domain of memristor-based reservoir computing [52] [13] [83] [12] [15]. Regardless of this distinction in purpose, just as in the case of LSMs and ESNs, both fields could benefit from combined efforts and an exchange of ideas.

4.1 Previous Work in the Field

The first research regarding memristor-based reservoir computing was done in [52]. Here, in analogy to the RC approach, a non-planar, random architecture of memristors is used as reservoir. Several reservoirs with a varying number of memristors are used for a set of tasks, ranging from benchmark memory experiments to basic logical computations (e.g. OR, AND, XOR gates), to analyze the computational power of memristor reservoirs. The memristor SPICE model used to simulate the individual memristors, however, does not account for volatility, nor are memristor variations taken into account, which are inevitable in the production process of nano-scale devices such as memristors.

In [13] the reservoir contains memristors with variations in the device parameters, for a more realistic investigation of memristor-based RC. Next, a structured planar network topology is proposed. Using the random memristor reservoirs from [52], extended with device variations, as a benchmark, the proposed structured networks are compared against the randomly connected reservoirs for robustness against device variability. This is done on a signal classification task. Not only are these structured networks reproducible for varying network sizes, they also show a higher tolerance against device variability. However, memristor volatility is not taken into account in the used SPICE model.

In [83] the ASN network described earlier is used for the first time in a reservoir computing approach, to perform a higher harmonic generation (HHG) task. The results show that the networks can indeed be used as a pattern-generating kernel. It is stated, however, that in order for the ASNs to carry out more intricate tasks, it might be necessary to introduce multiple, simultaneous inputs as well as real-time feedback.

The main focus in [15] is on the modeling of the memristors themselves. In this paper the necessity of volatility of the memristors in the reservoir for the RC approach to work is shown. New memristor models are introduced based on Strukov's model [88], with an additional diffusion term in the state equation to account for volatility. Inspired by the general theory of ESNs [43], other important characteristics of the memristors in the reservoir are determined.

Finally, in [12] a novel hierarchical architecture is proposed, based on the simplified ESN network topologies introduced in [78], the simple cycle reservoir (SCR) topology. Instead of using artificial neurons, randomly assembled memristive networks, inspired by the ASN [83], are introduced as individual reservoir nodes and connected to form an SCR. This hierarchical architecture is proposed to optimally exploit the computational power of the individual memristors, as it was seen that with simple, singly connected memristor networks performance stopped improving with increasing network size. This is mainly due to the fact that the voltage drop over the single memristive devices has to be in the right range: too high voltages can damage the devices, while there needs to be a minimum voltage over the memristor in order to cause any memristive state changes, which are necessary for computation. This novel architecture is tested on several reservoir benchmark tasks: memory capacity, HHG, multiple superimposed oscillator prediction and the NARMA-10 task, and compared to [52], [13] and [83]. The results on the memory capacity task show a clear trade-off between nonlinear behavior and memory, as theoretically stated in [44] and [20] for artificial ESNs. Drastic improvements are also achieved on the waveform generation task (20% improvement compared to the single memristive networks) as well as on the NARMA-10 task. On this last task, unsolvable by the memristor networks without the new architecture, error rates twice as low as those of the conventional SCR consisting of artificial neurons are achieved. Also important to note is that in this work, compared to [52] and [13], device volatility is included in the state equation and, even more crucially, for the first time in the RC approach the used memristor model is not based on Strukov's current controlled memristor model. Instead, the voltage controlled memristor model introduced in Section 3.6 is used.

Figure 4.1: Architecture of the memristive SCR. (a) Amplitude-dependent memristive switching characteristics for a 10 Hz applied sine wave. (b) Example of a randomly assembled memristive network. The circles indicate nodes in which memristive devices (links between nodes) connect. The colored nodes represent an example CMOS/memristor interface with blue as input node (In), orange as ground node (0 V), and green as differential output nodes (O1, O2). (c) Simple cycle reservoir. Instead of analog neurons, memristive networks provide the input-output mapping of each SCR node. [12]

4.2 The Memristor Network as Reservoir

As was described in Section 2.2, some subtleties have to be considered when applying the RC approach to a physical system. A good understanding of the system's underlying principles is key for the exploitation of its intrinsic dynamics for computational purposes. Now all the insights discussed so far in Sections 2.2 and 2.1.3 will be combined and their implications for the memristor reservoir will be considered. Two types of reservoirs will be used: the ASN as presented in Section 3.5, and a planar network consisting of different nodes connected by TiO2 memristors, Section 3.4.1. These memristor reservoirs will serve as a temporal kernel, projecting the input into a high-dimensional feature space. In order to access this feature space the state of the system needs to be read out, i.e. a way to represent the state of the memristor reservoir needs to be defined.

Different options have been used so far in previous works. In [52] and [13] a voltage source is applied to one of the nodes in the network and the state is defined by the voltage at the other nodes. In the case of the ASN [83], a voltage is applied at one of the electrodes of the MEA while the system's state is represented by the voltage responses measured at the remaining electrodes, as it is impossible (and unnecessary) to interact with any individual atomic switch in the network. [15] uses a current source to drive a network consisting of memristors connected in series; the voltage over each individual memristor is used as the state of the system. Finally, in [12], the state of each individual SCR neuron is defined by the differential voltage between two randomly selected memristors inside the neuron, Figure 4.1.

The first important property that needs to be satisfied by the reservoir for the RC approach to work is that of fading memory, as was stressed in [15]. This is necessary for the functional relationship between the input driving the reservoir and its response to be localized in time. Hence, volatility of the memristive system is key. Moreover, in the absence of volatility, a memristor under a constant bias may become saturated, in which case it acts as a simple resistor (a linear element), which only leads to a scaling of the input voltage. In this regard, the ASN seems a perfect fit, as the Ag filaments are thermodynamically unstable and dissolution happens spontaneously all across the network due to the constant redistribution of voltage. The TiO2 memristor, on the other hand, apart from its short-term volatile behavior, also has the ability to switch states, leading to long-term non-volatile memory. This behavior does not comply with the fading memory property and will lead to the failure of the RC approach, as will be discussed later in Section 4.4.2.

In the production process of these nano-scale devices, even in highly controlled processing flows, variations in device parameters are inevitable, and computing systems consisting of these components must take this device noise into account. Whereas this is usually seen as an unwanted property in conventional memristor applications, in the RC approach this variability is exploited as an advantage to increase the 'richness', by increasing the linear independence between the responses, and hence the computational capacity of the memristor network [15] [13]. In the simulation model of the ASN, device variability is included when setting up the network. The width D, the R_ON/R_OFF ratio and the stochastic term α are all drawn from a random distribution with mean and variance as stated in Table 4.1. For the reservoirs consisting of TiO2 memristors, the same procedure is applied: the parameters for the individual memristors in the network are also drawn from random distributions. The memristor parameter values used to fit the behavior of the TiO2 memristor, as presented in [8], are taken as the means of the distributions. Next, the deviations σ from these means are defined, leading to the parameter values and distributions presented in Table 4.2.

Table 4.1: Used parameters for the ASN simulations

Parameter    D (nm)         µ_v (m^2 s^-1 V^-1)   R_ON/R_OFF           τ (s^-1)    α
Average      5              0.5 · 10^-12          10^-3                1 - 10^3    0
Deviation    σ_D: 0 - 40%                         σ_ON/OFF: 0 - 40%                σ_α: 0 - 30%

Parameters used in simulation, as discussed in Section 3.5.2. Total gap width (D); ionic mobility (µ_v) of Ag+ in Ag2S; ratio of resistances (R_ON/R_OFF); filament dissolution rate constant (τ); modulation level (α) of the noise in the w(t) term at each timestep. [83]

Table 4.2: Used parameters in the TiO2 simulations

Parameter    D (nm)       µ_v (m^2 s^-1 V^-1)   R_ON (Ω)       R_OFF (kΩ)      q_p (nV)       q_n (nV)
Average      10           1 · 10^-12            10             100             300            −300
Deviation    σ_D: 25%                           σ_RON: 25%     σ_ROFF: 25%     σ_qp: 25%      σ_qn: 25%

Parameter    C_x (F)      R_x (Ω)      R_z (Ω)
Average      0.5          1            0.1
Deviation    σ_Cx: 25%    σ_Rx: 25%

Parameters used for the volatile TiO2 memristor model from [8], as discussed in Section 3.4.2.
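The device-variability procedure described above could be mimicked as in the following sketch (Python; the choice of a normal distribution and the example numbers are assumptions used only for illustration):

    import numpy as np

    rng = np.random.default_rng(42)

    def sample_device_params(n_devices, means, rel_sigmas):
        # Draw per-device parameters from normal distributions whose mean is the
        # fitted value (Tables 4.1 / 4.2) and whose standard deviation is the
        # stated fraction of that mean.
        return {
            name: rng.normal(mean, rel_sigmas.get(name, 0.0) * abs(mean), size=n_devices)
            for name, mean in means.items()
        }

    # Example: 100 TiO2 memristors with a 25% spread on D, R_on and R_off.
    devices = sample_device_params(
        100,
        means={"D": 10e-9, "R_on": 10.0, "R_off": 100e3},
        rel_sigmas={"D": 0.25, "R_on": 0.25, "R_off": 0.25},
    )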

As was discussed in Sections 2.1.3 and 2.2, having a clear notion of the timescales governing the dynamics of the physical system at hand is important in order to exploit its intrinsic computational power. Inspired by this, input encoding schemes can be proposed to match the system's timescales and have it operate in an interesting regime. For the ASNs, the switching process depends on several mechanisms, as was described earlier in Section 3.5. However, in the case of strongly memristive behavior, the switching speed is mainly determined by the filament formation and dissolution rates. The formation process is related to the ionic drift of the Ag ions towards the cathode, and hence the formation time depends on both the ionic mobility of Ag+ in β-Ag2S and the gap width. The filament dissolution rate, on the other hand, governs the switching-off time of the atomic switches. Filament thickening under continued applied bias leads to increasingly stable filaments and hence longer dissolution times. In the ASN model these two time dependencies are reflected in the first and second terms of the state equation 3.25. In the first term, the ionic mobility µ_v and the junction's gap size D govern the filament formation speed related to the ionic drift, while the filament dissolution term leads to an exponential decay with rate constant τ.

In the case of the volatile TiO2 model, described by the set of equations presented in Section 3.4.2, the ionic drift is governed by a similar term, and thus the same parameters µ_v and D are related to the formation time. The decay time constant in the volatile cell is determined by the values of both R_x and C_x.

4.3 Reservoir Computing with Atomic Switch Networks

In a first exploration phase, Section 4.3.1, the ASN is excited with basic inputs, ranging from constant applied biases and input pulses to sine waves with different frequencies, amplitudes and offsets, in order to characterize the behavior of the system and look for possibly computationally interesting regimes. The insights thus gathered into the system's dynamics are subsequently applied in the search for the most effective encoding scheme for the input, in order to solve two benchmark tasks: the memory capacity and NARMA-10 tasks.
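For reference, the benchmark targets can be generated from the white-noise input as in the sketch below (Python). The delayed-input targets of the memory capacity task follow directly from its definition; for NARMA-10 the widely used recursion with its standard coefficients is assumed here, the exact definition used in this work being the one given in Section 2.1.6.

    import numpy as np

    rng = np.random.default_rng(1)

    def memory_capacity_targets(u, max_delay):
        # Target for delay k is the input shifted by k steps: d_k(n) = u(n - k).
        # (np.roll is a circular shift; in practice the first k samples are discarded.)
        return {k: np.roll(u, k) for k in range(1, max_delay + 1)}

    def narma10(u):
        # Commonly used NARMA-10 recursion:
        # y(n+1) = 0.3 y(n) + 0.05 y(n) sum_{i=0}^{9} y(n-i) + 1.5 u(n-9) u(n) + 0.1
        y = np.zeros_like(u)
        for n in range(9, len(u) - 1):
            y[n + 1] = (0.3 * y[n]
                        + 0.05 * y[n] * np.sum(y[n - 9:n + 1])
                        + 1.5 * u[n - 9] * u[n]
                        + 0.1)
        return y

    u = rng.uniform(0.0, 0.5, size=2000)   # white input noise u(n) in [0, 0.5]
    targets = memory_capacity_targets(u, max_delay=30)
    d_narma = narma10(u)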

Figure 4.2: Network configurations used. (a) Sparsely connected network with some long-ranging connections (spanning at most 10 neighboring nodes). (b) Densely connected network with only short connections (spanning at most 2 neighboring nodes). Each grid point serves as a node at which the voltage is read out. The green and red nodes represent the input and ground node respectively.

4.3.1 System Characterization

The ASNs used in the following experiments are presented in Figure 4.2. Here, the voltage nodes are represented by the grid points, whereas the actual atomic switches are depicted by the different lines connecting these nodes. Input and ground nodes for the experiments are denoted by the green and red dot respectively. Model parameters can be found in Table 4.1. The first ASN realization is a sparsely connected network with long connections ranging over several neighboring nodes, mimicking the long nanowires forming several junctions across the ASN, as seen in the real devices discussed in Section 3.5. The second network configuration consists of dense, short-range connections, where connections span at most two neighboring nodes.

Step Response

As a first experiment the network is driven by a step input, starting at time t = 0, with varying heights. For the results presented in Figure 4.3, the exact same sparse network configuration is used for each run, i.e. the initialization of the different network parameters is identical in each experiment. The transient time presented in Figure 4.3 a is obtained by looking at the voltage responses at each of the network nodes. Two of these voltage progressions are shown in Figures 4.3 b.1 and c.1, for two different input biases. After an initial transient regime (the transient time) each of these node voltages settles to a constant value, after which the state of the system stays unchanged. This can be understood by looking at what happens at each of the internal connections between the nodes, the individual atomic switches. Figures 4.3 b.2-c.2 show the progression of the fraction of formation of the filament lengths x(t) = w(t)/D ∈ [0, 1], as introduced in Section 3.5.2, with x = 0 and x = 1 corresponding to the filament being fully dissolved and fully formed, respectively. The squares in Figures 4.3 b.1 and b.2 show the relation between the change in filament length and the corresponding change in voltage response at the different nodes. For clarity, the initial rapid filament and voltage changes for the case of the 1 V bias input step are enlarged in Figures 4.3 b.1.1 and b.2.1. Here an important fact catches the eye: immediately after the start of the experiments, t ∈ (0 s, 0.01 s), the largest part of the filaments 'dies out', meaning that the filament length exponentially decays to x = 0 and the corresponding memristor switches to the high-resistance OFF state. This exponential decay is caused by the dissolution term −τ⁻¹(w(t) − D) in equation 3.25, with the time constant τ introduced in Section 3.5.2. The same behavior occurs for all applied biases. However, with increasing bias, an increasing number of filaments 'survive'. If the applied bias is too low, on the other hand, all filaments decay towards x = 0, as the flux-dependent growth rate [µv (Ron/D) I(t)] Ω(w) cannot overcome the decay rate. This switching towards the OFF state happens very quickly, due to its exponential form and short timescale τ, resulting in the short transient times seen for biases under 0.4 V. It is also interesting to note that whenever a filament is formed, it does not decay back to its initial state, and that the formation of one filament influences the formation of others.

In order to better grasp this 'die out' phenomenon (Figure 4.3 b.2.1) and the varying relation between transient time and applied bias (Figure 4.3 a), the progression of filament lengths across the network at different times is presented in Figures 4.4 and 4.5. At time t = 0 all filament lengths are initialized randomly. However, already after t = 0.01 s the largest part of the network has died out, and only three switches are still notably conducting in Figure 4.4, corresponding to the purple, orange and blue filaments in Figure 4.3 b.2.1. As time progresses a couple more filaments are formed until the system reaches its steady state. The current looks for the path of least resistance connecting the input to the ground (it essentially solves a kind of shortest-path problem). As the current flows along this path, it initiates a chain effect: current passing through the atomic switch leads to an increase in filament length (due to the electronic-flux-dependent growth rate). The corresponding increase in the switch's conductance in turn results in a higher current density along this path. This cycle is repeated until the filament is fully formed. The current will keep choosing this route, as it is the 'shortest path' from input to ground, preventing the filament from decaying. This can be seen in the very rapid initial formation of the long connecting edge ending in the ground node in Figure 4.4; it is part of the initial path of least resistance and stays fully formed for as long as the bias is applied. To verify that this 'shortest path' picture holds, the same experiment is repeated with the dense network configuration, where long-range connections are absent. Figure 4.6 shows that, again, the largest part of the network dies out and only a small fraction of the filaments, forming the most conductive path, carries all the current from input to ground. As the bias is increased, not all the current can pass through a single high-conductivity path, and more channels are formed, as can be seen in Figures 4.5 and 4.7. With this in mind, the at first sight peculiar relationship between bias and transient time seen in Figure 4.3 a can be understood. With higher biases, more channels are formed, and the formation of these side branches takes time, proportional to the current passing through the filament. Until all filaments have reached their final length, changes in the memristance of these branches will result in changes in the voltage response and hence higher transient times.

Figure 4.3: Network's response for the sparse network configuration to an input step at t = 0. a) Transient times for different applied biases. b.1-c.1) Voltage progression at different nodes for a 1 V bias (b) and a 2.9 V bias (c); the voltage at the input node is depicted by the constant yellow line. b.2-c.2) Fraction of filament formation for the different atomic switches. b.1.1-b.2.1) Zoom in on the initial transient for the 1 V applied bias case.

Figure 4.4: Filament lengths x ∈ [0, 1] for the sparse network configuration at different times for a step input with height 1 V.

Figure 4.5: Filament lengths x ∈ [0, 1] for the sparse network configuration at different times for a step input with height 2.9V.

Figure 4.6: Filament lengths x ∈ [0, 1] for the dense network configuration at different times for a step input with height 1.5 V.

Figure 4.7: Filament lengths x ∈ [0, 1] for the dense network configuration at different times for a step input with height 2V.

However, if the input bias has just the right value (for example V = 2.6 V, corresponding to a transient time of ttr = 0.2 s), then the current is distributed perfectly across the initially formed conducting channels and no new branches are created. The formation of these initial channels happens rather quickly, as can be seen in Figures 4.4, 4.5, 4.6 and 4.7, leading to small transient times.
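The 'shortest path' chain effect described above can be reproduced qualitatively with a very small toy network. The sketch below is not the simulation setup used in this work: the grid size, the parameter values, the reuse of the toy state equation sketched earlier and the neglect of device polarity (only the current magnitude drives formation) are all simplifying assumptions. Node voltages are obtained from the conductance Laplacian at every time step, after which each edge state is updated.

    # Toy reproduction of the 'shortest path' chain effect on a 4x4 grid of
    # volatile current-controlled switches. All values are illustrative.
    import numpy as np

    rng = np.random.default_rng(0)
    n_side = 4
    N = n_side * n_side

    def idx(r, c):
        return r * n_side + c

    edges = []
    for r in range(n_side):
        for c in range(n_side):
            if c + 1 < n_side: edges.append((idx(r, c), idx(r, c + 1)))
            if r + 1 < n_side: edges.append((idx(r, c), idx(r + 1, c)))
    edges = np.array(edges)

    ground, source, V_in = 0, N - 1, 1.8      # opposite corners, applied step
    mu, lam = 5e5, 10.0                       # same toy parameters as before
    R_on, R_off = 1e2, 1.6e4
    x = rng.uniform(0.05, 0.3, size=len(edges))   # random initial filament lengths

    def node_voltages(x):
        g = 1.0 / (R_off + (R_on - R_off) * x)    # edge conductances
        L = np.zeros((N, N))
        for (a, b), ge in zip(edges, g):
            L[a, a] += ge; L[b, b] += ge
            L[a, b] -= ge; L[b, a] -= ge
        free = [n for n in range(N) if n not in (ground, source)]
        v = np.zeros(N); v[source] = V_in
        v[free] = np.linalg.solve(L[np.ix_(free, free)],
                                  -L[np.ix_(free, [source])].ravel() * V_in)
        return v

    dt = 1e-4
    for _ in range(10000):                        # 1 s of simulated time
        v = node_voltages(x)
        i = (v[edges[:, 0]] - v[edges[:, 1]]) / (R_off + (R_on - R_off) * x)
        # polarity ignored for simplicity: only the current magnitude drives growth
        x = np.clip(x + dt * (mu * x * (1 - x) * np.abs(i) - lam * x), 0.0, 1.0)

    # Only the edges along the dominant conducting route keep a formed filament,
    # while the rest decay towards 0; the exact pattern shifts with the bias and
    # the random initialization, and larger biases open additional channels.
    print(np.round(x, 2))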

Pulse Response

Next, different pulses are applied to the network (sparse configuration), with variation in both width and amplitude. In order for the ASN to actually respond to such a short pulse, an additional bias is required; otherwise no filaments would be formed prior to the pulse, and the pulse by itself would not be able to create any meaningful additional channels. After the initial transients of the bias have died out and the system operates in its steady state (constant voltage at each node and every filament width at a stable operating value), this additional voltage pulse imposes an extra current through the network, resulting in the formation of new filaments. This disrupts the steady state of the network, as can be seen in Figure 4.8. After an additional settling time, the system again returns to a steady regime, which might differ from its steady state prior to the pulse, as the formation of new filaments might have changed the current's course. The system's response to these pulses depends on all factors (constant bias, pulse amplitude and pulse width). Too low a bias leads to networks that are in essence 'dead': prior to the pulse all memristors are in their OFF state, and even if the pulse induces some filaments to (partially) form, these decay exponentially afterwards, as the constant bias is not sufficient to sustain them. Too high a bias, on the other hand, results in many parallel channels carrying the current from source to drain (Figure 4.5), so that the (comparably small) extra contribution of the pulse will not be enough to induce any noteworthy changes. The same goes for a pulse that is too short. Due to the interplay of all these different factors, it is hard to predict in advance which combination will result in interesting system responses, hence a grid search over the bias/amplitude/width space is carried out.

A couple of the most interesting pulse regimes are presented in Figure 4.8. The network is first driven by a constant bias of 1 V until the initial transients have died out; next, the pulse is applied on top of this bias, and the progression of the voltages at the different nodes is recorded. It can be clearly noted that all three input parameters play a role in the dynamics of the system's response. Interestingly, with increasing amplitudes, shorter pulse widths result in longer and more varied responses. However, too short widths (for a given amplitude) result in no system state changes at all.

Frequency Response

In Section 2.1.6 the deviation from linearity in the frequency domain, δφ, was introduced. Driving the system with a sine wave at a constant frequency fc, δφ essentially measures the amount of energy that resides in the original input frequency fc versus the energy contained in all other frequencies of the system's response. In Figure 4.9 the results are presented for a sine wave with input frequency fc = 11 Hz and different bias and amplitude values, ranging from 0 V to 5.5 V and from 0.5 V to 15.5 V respectively, in steps of 0.5 V. The network is driven by this sine wave for a total of 30 periods. The first 10 periods are not included in the calculation of δφ, as they might still exhibit transient behavior and influence the results accordingly. Similar results are obtained for other frequencies.
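A minimal sketch of how such a deviation-from-linearity measure can be computed from a recorded node voltage is given below, following the definition δφ = 1 − Ec/Etot used in the caption of Figure 4.9. Whether the DC component is excluded and how the bin at fc is selected are assumptions of this sketch, not necessarily the exact conventions of Section 2.1.6.

    # Sketch of the nonlinearity measure delta_phi = 1 - E_c / E_tot: the fraction
    # of the response energy that is NOT at the drive frequency f_c.
    import numpy as np

    def delta_phi(response, fs, fc):
        """response: 1-D node-voltage trace (transients already removed)."""
        spectrum = np.fft.rfft(response - response.mean())   # drop DC offset (assumed)
        power = np.abs(spectrum) ** 2
        freqs = np.fft.rfftfreq(len(response), d=1.0 / fs)
        k = np.argmin(np.abs(freqs - fc))                    # bin closest to f_c
        return 1.0 - power[k] / power.sum()

    # Quick check: a pure sine gives delta_phi ~ 0, a clipped (nonlinear) one does not.
    fs, fc = 1000.0, 11.0
    t = np.arange(0, 2.0, 1.0 / fs)
    u = np.sin(2 * np.pi * fc * t)
    print(delta_phi(u, fs, fc), delta_phi(np.clip(u, -0.3, 0.3), fs, fc))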

Overall, the values of δφ seen in Figure 4.9 are relatively low (a maximum value of 0.09 is found for high amplitudes), corresponding to a nearly linear system response. Still, some interesting relationships can be seen between the amount of nonlinearity present in the system's response and the chosen bias-amplitude values. First, increasing the amplitude results in an increase in δφ. The same, albeit more subtly, holds for increasing biases, except at low amplitudes: there, a low amplitude in combination with an additional bias leads to a decrease in δφ. Figure 4.10 shows the network's response for some specific bias-amplitude combinations, from which the calculated δφ values can be qualitatively understood. Figure 4.10 a.1 shows the response at different nodes in the case of a small input amplitude of 1 V and no extra bias term.

Figure 4.8: ASN's response to different input pulses with a 1 V constant bias and varying pulse heights and widths, as given in the insets. The voltage progression at different nodes is plotted from right after the end of the pulse for a representative time, until the system has reached its new steady state.

Figure 4.9: δφ = 1 − Ec/Etot (Section 2.1.6) as a measure of the network's nonlinearity for input sine waves with fc = 11 Hz and a range of different biases and amplitudes.

Here the network behaves in a completely linear fashion, producing mere scalings of the input voltage. Figure 4.10 a.2 shows why: the applied input of 1 V is not strong enough to sustain any filament formation and the decay term outweighs the growth rate. All atomic switches switch nearly instantaneously to the OFF state, and the network now basically consists of plain resistors. Figures 4.10 b and c explain the increase in δφ for increasing amplitudes. As the amplitude is raised, more current passes through the network, leading to a larger number of filaments being formed. This in turn results in a larger part of the network contributing to the nonlinear system response, as can be seen by comparing Figures 4.10 b.2 and c.2 (more filaments are formed and annihilated each period). Hence, more nodes will be nonlinear transformations of the input sine wave, i.e. a larger part of the input's energy is found in higher harmonics of the input frequency. The (subtle) increase in δφ with increasing biases, in the case of sufficiently large amplitudes, can be understood by comparing Figures 4.10 c and d. In case d, an extra bias of 5 V is added to the sine wave with an amplitude of 15 V. The responses measured at the nodes, as seen in Figure 4.10 d.1, are not necessarily more nonlinear than in the case of zero bias. As can be seen in all previous examples without an additional bias

term, the system only responds in a nonlinear way during the part of the sine wave's period where the voltage is positive. This can be understood by looking at the filament progression in these cases. After the first quarter of a period, the applied voltage starts decreasing. As soon as this voltage drops beneath a certain threshold, the current through the network can no longer sustain the formed filaments: the dissolution term starts dominating the current-dependent growth rate and the filaments begin to decay. As soon as the voltage drops below zero after half a period, negative currents through the network lead to an even faster decay of the filament lengths. Hence, during the second part of the period, all switches operate in the OFF state and only perform a scaling of the input. The effect of an additional applied bias can be seen by comparing Figure 4.10 d.2 with Figure 4.10 c.2. During the second part of the period, where the sine becomes negative, the extra voltage provided by the bias helps sustain a part of the formed filaments. This way, the system responds in a nonlinear fashion to the applied input for a longer time, which leads to the measured increase in δφ. Finally, from Figures 4.10 e.1 and e.2 the decrease in δφ for low amplitudes and increasing biases can be understood. If the applied bias is too large relative to the signal's amplitude, then most filaments that are fully formed will not decay during the second half of the period. The corresponding switches remain saturated most of the time and act as resistors with resistance RON. This leads to the linear responses seen in Figure 4.10 e.1 and to low δφ values. It must be noted that during each repetition of the sine input, the same filaments are responsible for the current transport. They form only a couple of conducting channels, while the rest of the network immediately dies out. This is again related to the fact that the current always chooses the 'shortest path' from source to drain, which is independent of time; each new period, the same path is chosen by the current.

4.3.2 Tasks

In the following, the ASN will be characterized regarding its memory and nonlinear computational power by looking at its performance on two benchmark reservoir characterization tasks: memory capacity and the NARMA-10 task.

Figure 4.10: The ASN's response when driven by an input sine wave with frequency fc = 11 Hz and varying biases and amplitudes. (a.1-b.1-c.1-d.1-e.1) Voltage progression at different nodes (including the input node) for time periods in which the system has reached its steady state. (a.2-b.2-c.2-d.2-e.2) Filament width x ∈ [0, 1] for these time periods.

Figure 4.11: Input encoding schemes used for the memory capacity and NARMA-10 tasks. The discrete input u(n) ∈ [0, 0.5] is mapped to the continuous-time signal u(t) ∈ V1 + V2 · [0, 0.5] with a certain width.

Here, to build further on the research done in [83], the ASN will again be driven by a voltage source at one of the nodes in simulation, and the voltage responses at 14 randomly chosen nodes (corresponding to the MEA's 16 electrodes, of which two are used as input and ground) will represent the state of the system.

Optimal input encoding

As was discussed in Section 2.2 and observed in Section 4.3.1, using the ASN as a black-box computational tool will not lead to any useful results. In order to drive the system into a computationally potent regime for the two upcoming tasks (memory capacity and NARMA-10), the insights gathered in Section 4.3.1 are used to propose two input encoding schemes (Figure 4.11) that aim at exploiting as much of the ASN's computational power as possible. The encoding of the input always depends on both the task and the system properties.

As was seen in Section 4.3.1, only small deviations from linearity (δφ close to 0) were found in the ASN's response to a sinusoidal wave. Hence, frequency encoding of the input would in this case not be the best way to exploit the ASN's computational capacities.

Section 4.3.1, however, showed some pulse configurations that led to interesting system responses, dependent on the width, bias and amplitude of the input pulse. Given the nature of the discrete input u(n) for the memory capacity and NARMA tasks, namely white noise, it is straightforward to opt for the encoding schemes presented in Figure 4.11. Each value of the discrete input u(n) is mapped to a pulse with a certain width and amplitude. As an additional bias is required for input pulses to produce significant changes in the ASN's filament widths, the following encoding schemes are proposed:

u(t) = V1 + V2 · u(n)   for t ∈ [n∆t, (n + 1)∆t[,

with ∆t the width of the pulse, for scheme 1. And:

u(t) = V1 + V2 · u(n)   for t ∈ [n∆t, (n + 1/2)∆t[,
u(t) = V1               for t ∈ [(n + 1/2)∆t, (n + 1)∆t[,

for scheme 2. These forms of encoding are based on the results shown in Figure 4.8. There it is seen that pulses with the right amplitude and width combination, on top of a constant bias, result in momentary state changes of the ASN by forming some additional filaments that create short-lived extra conducting channels. A sweep over bias, width and amplitude was performed in order to determine which combinations produce the most interesting system responses; the values found for these parameters are used in both encoding schemes. V1 corresponds to the constant bias, on top of which the additional pulses with amplitudes ranging between 0 and 0.5·V2 are superposed. The widths of the individual pulses are initially also based on those found in Section 4.3.1. In encoding scheme 1 all input pulses are concatenated one after the other, whereas in scheme 2 the input returns to the bias baseline after each pulse. Both the memory capacity and the NARMA-10 task use the same white-noise input, hence the ASN responses obtained by feeding the encoded input to the network can be used as features for both tasks. The input consists of 200 values u(n) drawn from a uniform distribution between 0 and 0.5. The u(n) sequence is subsequently encoded using encoding scheme 1 or 2,

with different values for V1, V2 and different widths, to produce u(t), which is fed to the ASN (sparsely connected configuration with long connections). The first 25 input values are discarded in order for the system to reach its steady state. The following 100 values are used for training the readout weights using ridge regression, and finally the system's performance is tested on the remaining 75 values. The results for both tasks are presented in the following sections.
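The sketch below illustrates the two encoding schemes and the ridge-regression readout in code. The reservoir itself is left abstract: run_asn is a hypothetical placeholder for the network simulation, the chosen V1, V2 and pulse width are just one of the combinations appearing in the tables that follow, and the regularization strength is an assumption.

    # Sketch of the two encoding schemes and the ridge-regression readout.
    import numpy as np

    rng = np.random.default_rng(1)

    def encode(u, V1, V2, width, fs, scheme=1):
        """Map discrete u(n) in [0, 0.5] to a continuous-time pulse train u(t)."""
        samples_per_step = int(width * fs)
        out = []
        for un in u:
            pulse = np.full(samples_per_step, V1 + V2 * un)
            if scheme == 2:                       # return to the bias baseline
                pulse[samples_per_step // 2:] = V1
            out.append(pulse)
        return np.concatenate(out)

    def ridge_readout(states, target, alpha=1e-3):
        """Train linear readout weights on the node-voltage features."""
        X = np.hstack([states, np.ones((len(states), 1))])   # add bias column
        return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ target)

    u = rng.uniform(0.0, 0.5, 200)                # white-noise input, as in the text
    u_t = encode(u, V1=2.75, V2=2.0, width=0.05, fs=1000.0, scheme=1)
    # states = run_asn(u_t)  -> (200, 14) matrix of node voltages, one row per u(n)
    # W = ridge_readout(states[25:125], target[25:125])       # train on 100 values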

Memory Capacity

Tables 4.3 and 4.4 summarize the ASN's memory capacity for both encoding schemes. As discussed in Section 2.2, the memory capacity is calculated from the decoded discrete output y(n) and the desired discrete output d(n). Compared to the hierarchical SCR architectures in [12], where normalized memory capacities of 0.9 are reached, corresponding to a memory of nearly 20 bits of input history, it is clear that the reservoir used here does not show a lot of memory.

Table 4.3: Memory Capacity Encoding Scheme 1

∆t (s) / V1 - V2 (V)    1.5 - 2    2.75 - 2    3.75 - 2
0.1                     0.09       0.1         0.08
0.05                    0.06       0.08        0.07
0.03                    0.05       0.07        0.1

Memory capacity values for the ASN where input encoding scheme 1 is used. The values are normalized by dividing by the number of readout electrodes N = 14.

Table 4.4: Memory Capacity Encoding Scheme 2

∆t (s) / V1 - V2 (V)    1.5 - 2    2.75 - 2    3.75 - 2
0.1                     0.07       0.08        0.07
0.05                    0.06       0.06        0.05
0.03                    0.04       0.06        0.09

Memory capacity values for the ASN where input encoding scheme 2 is used. The values are normalized by dividing by the number of readout electrodes N = 14.
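For completeness, a sketch of how normalized memory-capacity values such as those in Tables 4.3 and 4.4 can be computed from the recorded node voltages: one linear readout per delay k is trained to reconstruct u(n − k), the squared correlation coefficients are summed, and the result is divided by the number of readout electrodes N = 14, as stated in the table captions. The maximum delay and the omission of a separate train/test split are simplifications of this sketch.

    # Sketch of the (normalized) memory-capacity computation.
    import numpy as np

    def memory_capacity(states, u, max_delay=30, n_readouts=14, alpha=1e-3):
        """states: (T, n_readouts) node voltages; u: the white-noise input."""
        mc = 0.0
        for k in range(1, max_delay + 1):
            X, d = states[k:], u[:-k]                      # features and delayed target
            Xb = np.hstack([X, np.ones((len(X), 1))])
            W = np.linalg.solve(Xb.T @ Xb + alpha * np.eye(Xb.shape[1]), Xb.T @ d)
            y = Xb @ W
            mc += np.corrcoef(y, d)[0, 1] ** 2             # squared correlation MC_k
        return mc / n_readouts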

NARMA-10 task

In order to measure the reservoir’s performance on the NARMA-10 task, the normalized root-mean-square error (NRMSE) is used. Here again, the actual decoded output y(n) is compared to the desired output d(n) in discrete time. The results are summarized in Tables 4.5 and 4.6.

Table 4.5: NARMA-10 task Encoding Scheme 1

∆t (s) / V1 - V2 (V)    1.5 - 2    2.75 - 2    3.75 - 2
0.1                     0.75       0.81        0.91
0.05                    0.85       0.83        0.71
0.03                    0.83       0.81        0.77

ASN performance on the NARMA-10 task. Obtained NRMS errors for different pulse widths ∆t and V1-V2 values for encoding scheme 1.

Table 4.6: NARMA-10 task Encoding Scheme 2

∆t (s) / V1 - V2 (V)    1.5 - 2    2.75 - 2    3.75 - 2
0.1                     0.67       0.74        0.71
0.05                    0.76       0.81        0.97
0.03                    0.92       0.94        1.1

ASN performance on the NARMA-10 task. Obtained NRMS errors for different pulse widths ∆t and V1-V2 values for encoding scheme 2.

These results can again be compared to the SCR in [12], where NRMS errors as low as 0.2 are obtained. In [4] it is stated that the best NRMSE for the NARMA-10 task obtained with a linear reservoir in the ESN approach is 0.4. Moreover, increasing the number of readout nodes for the ASN, e.g. reading out all nodes instead of only the 14 randomly chosen ones, does not lead to drastic improvements. The performance of the ASN on both the memory capacity and the NARMA-10 task can be understood from the fact that only a small part of the reservoir actually contributes to the computation, caused by the shortest-path phenomenon described earlier.
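For reference, the NARMA-10 target and the NRMSE used in Tables 4.5 and 4.6 can be generated as sketched below. The recursion is the standard NARMA-10 form from the reservoir-computing literature; small variations of the coefficients exist, so the exact expression should be treated as an assumption rather than the one used in Section 2.2.

    # Sketch of the NARMA-10 target generation and the NRMSE figure of merit.
    import numpy as np

    def narma10(u):
        """u: input sequence in [0, 0.5]; returns the NARMA-10 target d(n)."""
        d = np.zeros(len(u))
        for n in range(9, len(u) - 1):
            d[n + 1] = (0.3 * d[n]
                        + 0.05 * d[n] * d[n - 9:n + 1].sum()
                        + 1.5 * u[n - 9] * u[n]
                        + 0.1)
        return d

    def nrmse(y, d):
        return np.sqrt(np.mean((y - d) ** 2) / np.var(d))

    u = np.random.default_rng(2).uniform(0.0, 0.5, 200)
    d = narma10(u)
    print(nrmse(np.zeros_like(d), d))   # a trivial all-zero predictor scores above 1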

4.4 Reservoir Computing with the TiO2 Network

4.4.1 Architecture

The second network used is built up from TiO2 resistive switches, as introduced in Section 3.4.1. They are arranged to form hexagonal planar structures, as presented in Figure 4.12. The input and ground nodes are again situated at the bottom right corner and the upper left corner. Initially, as with the ASN reservoir, a voltage source was used to drive the network, and the system's internal state was again defined by the voltages at each node. However, when using this setup, something interesting happened that really helped to gain a better understanding of the problems that arise when using these memristor models in an RC approach. Applying a voltage source to a single memristor works as expected; however, as soon as networks consisting of multiple memristors are driven by a voltage source, the SPICE software runs into convergence trouble. These convergence issues keep returning sporadically and cannot be dealt with by simply adjusting the settings of the SPICE simulation. Their persistence served as a hint that something more fundamental is wrong with the used setup. Looking at the set of differential equations this TiO2 SPICE model is solving, Section 3.4.2, it is clear that not the voltage but the current through the memristor is directly responsible for the change in memristance. Hence the encountered convergence issues are related to too abrupt initial changes in the modeled doped-region width, corresponding to x in the memristor model, when the network is driven by a voltage source.

Figure 4.12: Network architecture used for the TiO2 memristor network. The upper left node is connected to the ground, the bottom right node to the input.

In some cases even non-physical behavior occurs, where the value of x, which should lie in [0, 1], abruptly overshoots this upper bound. It was therefore opted to replace the voltage source by a current source, which resolves the convergence problem, and to describe the state of the network by the currents running through the individual memristors. This is a sensible approach, as the state of the system depends on x, whose rate of change is determined by the current flowing through the memristor.
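The overshoot mentioned above can be illustrated on a single device: integrating a bare current-controlled state equation without any window function lets x leave the interval [0, 1] under a stiff voltage step, whereas a simple x(1 − x) window keeps it bounded. The lumped mobility factor, the step size, and the link to the solver behavior are illustrative assumptions of this sketch, not the parameters or internals of the SPICE model of Section 3.4.2.

    # Sketch of the overshoot problem: the bare Strukov-type update dx/dt =
    # mu_eff * i(t) can push x past 1, while an x(1-x) window saturates it.
    mu_eff = 4e4                 # lumped mobility factor mu_v*R_on/D^2 (assumed)
    R_on, R_off = 1e2, 1.6e4

    def step(x, v, dt, windowed):
        # Resistance is evaluated with x clamped to [0, 1] so it stays physical
        # even when the unbounded state variable runs away.
        r = R_off + (R_on - R_off) * min(max(x, 0.0), 1.0)
        dx = mu_eff * (v / r) * dt
        if windowed:
            dx *= x * (1 - x)    # simple x(1-x) window keeps x inside [0, 1]
        return x + dx

    for windowed in (False, True):
        x = 0.9
        for _ in range(200):                     # 2 ms of a 5 V step, dt = 10 us
            x = step(x, 5.0, 1e-5, windowed)
        print("windowed" if windowed else "bare    ", round(x, 3))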

4.4.2 Conclusive Example

Something fundamentally different about the TiO2 model compared to the ASN is its non-volatile switching behavior. As was discussed in Section 4.2, a reservoir composed of these memristors does not obey the ESP. However, computation with these networks would still be possible as long as there exists a unique mapping from the input to the state of the reservoir. A first remark is that only zero-DC inputs can be used to drive this network, otherwise the memristors saturate and behave as simple resistors. One option is to use frequency-encoding schemes without any additional bias in order to satisfy this condition. As a first step, a sine wave with f = 1 Hz is applied

to the network, and both the currents through the memristors and the filament widths x are measured. Figure 4.13 a.1 shows how again only a couple of filaments grow towards x = 1 after the current is applied. These filaments correspond to the high current values shown in Figure 4.13 b.1. The same mechanism occurs as in the ASN network: the current again looks for the shortest path from input to ground, which is formed by these three filaments, and they transport the majority of the current. The difference now, however, is that the other filaments in the network do not decay but stay around their initial resistance value RINIT (which is a clear indication that the

ESP isn’t obeyed). In this case, RINIT is chosen corresponding to x = 0.5. As can be seen in Figure 4.13 b.2, the other memristors still conduct a small amount of the current which form very interesting transformations of the original input sine wave. Looking back at 4.13 a.1, it appears as if the network has settled into a steady regime after t ≈ 90s, however, out of the blue, at t ≈ 190s the filament widths start changing again as can be seen in more detail in 4.13 a.2. The same phenomenon occurs for different amplitudes and frequencies. As was seen in Section 2.1.3, the ESP is a necessary condition for the training algorithm to work as it ensures that the current state of the system only depends on the input and previous system states up until some time in the past. In this way there exists a one on one correspondence between the input and the state of the reservoir. This condition is violated in this case as can be seen by the sudden unpredictable change in the system’s state while operating in a steady regime. A straightforward example that clarifies the problem that arises when performing computational tasks in this scenario, can be found in the HHG task. Here the network responses, i.e. the currents through the different memristors as measured in Figure 4.13 b.1, are combined to form higher harmonic versions of the input (in this case a sine wave with frequency f = 2Hz). A first time, the training phase runs from t = 100s (after the initial transients have died out, as can be verified by Figure 4.13 a.1) until t = 150s, and testing occurs between t = 150s and t = 175. Here the system performs incredibly well with an accuracy of 96% in reproducing the desired wave form. However when testing is repeated over an equally long time interval, but now in the region where the sudden state change occurs, from t = 200s

Figure 4.13: Network response to an applied sinusoidal input current with frequency f = 1 Hz and amplitude 10 µA. a.1) Progression of the filament widths. a.2) Detailed view of the blue square in a.1). b.1) Current response through the different memristors. b.2) Zoomed-in version of the current response.

until t = 225 s, the accuracy drops to 86%. This is to be expected, as the features presented to the trained weights in order to optimally produce the desired output suddenly change. As a side note, this example can be taken as an indication that the HHG task is a less general and conclusive measure of the system's nonlinear response than δφ, introduced in Section 2.1.6 and used to characterize the ASN in Section 4.3.1. Here, although the network performs excellently on the HHG task, only δφ = 0.03 is calculated. This is a consequence of the fact that nearly all the energy is dissipated by the three filaments that form the shortest path. Their responses are nearly linear (because these memristors operate in an almost saturated regime), as can be seen in Figure 4.13 b.1. Hence the largest part of the energy resides in linear transformations of the input, resulting in a low δφ value. Nevertheless, the nonlinear current transformations found in the other filaments exhibit all the higher harmonic frequencies needed to produce the desired output. These signals are scaled by the appropriate weights, determined during training, and combined in order to form the higher-harmonic sine wave. Different zero-DC encoding schemes have been tried, as well as different network configurations and memristor parameters, with the aim of pushing the network into a stable configuration in which the RC approach works. However, even in symmetric architectures consisting of a 3×3 square grid (input source at the middle node and four ground connections, one at each corner), built with memristors without any variation in the parameters, the same situation keeps returning. Hence, as was stated in [15], device volatility is key for the RC approach to work.
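The role of volatility for the ESP can be made explicit with a single-device sketch: two copies of the same device, started from different states and driven by the same zero-mean sine current, converge to the same trajectory when a decay term is present and keep a persistent separation when it is absent (λ = 0), mirroring the non-volatile TiO2 case. The state equation is again the volatile extension of Strukov's model quoted in Chapter 5, with illustrative parameter values.

    # Sketch: fading memory (ESP) with a decay term vs. persistent initial-state
    # dependence without one. Parameter values are illustrative assumptions.
    import numpy as np

    def run(x0, lam, mu=1e4, i0=1e-5, f=1.0, dt=1e-3, t_end=10.0):
        t = np.arange(0.0, t_end, dt)
        i = i0 * np.sin(2 * np.pi * f * t)        # zero-mean drive current
        x = np.empty_like(t); x[0] = x0
        for k in range(len(t) - 1):
            dx = mu * x[k] * (1 - x[k]) * i[k] - lam * x[k]
            x[k + 1] = np.clip(x[k] + dt * dx, 1e-6, 1 - 1e-6)
        return x

    for lam in (1.0, 0.0):                        # volatile vs. non-volatile
        xa, xb = run(0.2, lam), run(0.8, lam)
        print(f"lambda={lam}: final separation {abs(xa[-1] - xb[-1]):.4f}")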

Chapter 5

Conclusions and Future Challenges

5.1 Conclusions

In Section 4.3.1, it was investigated how the ASN model responded to different input stimuli. The formation of a shortest path between input and ground, and a corresponding 'die out' phenomenon of the rest of the network's switches, was observed. The network's response to sine waves with different amplitudes and offsets was summarized by looking at the different

δφ values in Figure 4.9. It was concluded that most of the input's energy resided in linear scalings of the input rather than nonlinear transformations, resulting in low δφ values and hence a nearly linear system response. These results were explained by looking at the differential equation, based on Strukov's model, used for modeling the individual atomic switches. By comparing the modeled results to the physical ASN's behavior, it can be concluded that the CCMR model does not capture some of the main characteristics of the physical system. The model ascribes the resistance switching of an individual atomic switch to the current passing through the junction instead of the voltage across its terminals. As was stated in Section 4.3.1, this leads to the described chain effect, where an increase in conductance is amplified by additional current passing through the device. In the actual physical ASN, on the other hand, it is believed that a voltage drop due to the

formation of the filament instigates the thermodynamic instability of the filament and hence its corresponding decay. Where the physical ASN shows distributed, continuous network activity, caused by the interplay between filament formation and dissolution across the whole network, even under a non-zero DC bias, the modeled network's response is mainly restricted to a small set of filaments forming the conducting channel of least resistance. This discrepancy between simulation and reality can be ascribed to the used CCMR model based on Strukov's memristor model.

The networks of interconnected TiO2 memristors showed the same 'shortest path' behavior and also pointed to some additional problems in using a reservoir consisting of this type of CCMR for reservoir computing. First, in the SPICE environment, the used set of differential equations for the memristor turned out to be unstable in a network configuration driven by a voltage source. Switching to a current source, as it presented a more natural choice for the CCMR models, revealed a new, more fundamental issue regarding the non-volatility of these devices. In Section 4.2 it was mentioned how the ESP is key for the RC approach to work, as it makes sure that the functional relationship between the input and the system's response is localized in time.

This issue also arose in the HHG task performed by the TiO2 network, Section 4.4.2. From this it can be concluded that volatility of the used devices, in compliance with the ESP, is key for the RC approach to work. As was mentioned earlier in Section 4.1, in [15] several memristor models are investigated regarding their applicability to RC. Given the results discussed above, it is interesting to go deeper into the insights presented in that work. In [15], in order to incorporate volatility into the model, an additional linear diffusion term is added to Strukov's model, equivalent to the decay term in the ASN and TiO2 models, resulting in the following state equation:

dx/dt = µ x(1 − x) I(t) − λ x.

Here µ corresponds to the term µv RON/D seen in the state equations of the ASN and TiO2 models. It is further shown that this nonlinear memristor model can (approximately) be seen as a Wiener system, which imposes strong restrictions on the processing capacities of the system. It is deduced that this correspondence might also hinder the usability of these systems in an

Figure 5.1: Memristors in series. Set of memristors for simple signal processing. Each memristor is fed a constant current mi = µi I0i. [15]

RC approach, since memory functions are restricted to decaying exponentials (and a uniformly decaying memory function is often unsuited for machine-learning problems). Still, the volatility observed in the physical devices is more complex than this simple linear volatility and hence more suitable for RC. In [15], the used network consists of a series of memristors, as presented in Figure 5.1. In this setup each memristor has its individual current source to tune its response. This approach is proposed in order to have each device operating in an optimal regime, so that when the different memristor responses are combined, the richest total response is achieved by the network

(i.e. the richest state matrix). Each individual current source mi depends on both the value of µ and the decay time constant λ of that specific memristor, as it is shown that the difference between these two terms determines the response of the memristor. Hence it can be understood that, in order to make the RC approach work for a reservoir consisting of memristors described by (the volatile extension of) Strukov's model, a large number of parameters needs to be introduced and tuned according to each

individual memristor's unique µ- and λ-values. Avoiding the direct tuning of all these different parameters is precisely the reason why RC forms such a promising approach towards analogue computing. Hence, from the results discussed in this work and the necessity of introducing individual current sources to create a global, rich system response in [15], it can be argued that Strukov's current-controlled memristor model and its extensions are in general unsuited for RC purposes, regardless of their physical correctness. However, when combined in an organized, well-controlled manner as in [41] [9], in order to simulate memristor networks mimicking the firing behavior of neurons in the brain, these models do comply with the desired task. As was seen in Section 2.2, the importance of insight into the system's dynamics for obtaining decent results with RC cannot be overstated. Thus, correct modeling of the system at hand is absolutely key when investigating the possible RC applicability of a system. In the case of memristor-based RC, it was concluded in [15] that "Before moving towards more detailed models, there is a need for thorough evaluation of the current models against the behavior of real devices". These models can then be used to propose new architectures and new encoding schemes, or to discover new interesting operational regimes for the real physical system. Especially in the case of systems that are difficult or expensive to fabricate, such as the memristor reservoirs, proper modelling is important to cut production costs and push the research forward. As was discussed in Section 3.3, and later in greater detail for the two specific cases of the TiO2 RS (Section 3.4.1) and the ASN (Section 3.5), several physical phenomena lie at the origin of the resistive switching effect. These phenomena have been discussed thoroughly in the literature in the past few years [106] [115] [81]. However, progress in mathematical models remains relatively modest; see [110] [51] for a nice overview. The great challenge in modeling the RS effect lies in incorporating all the physical phenomena at the basis of the switching mechanism, while maintaining enough generality to be applicable to the large variety of systems in which RS occurs.

5.2 Future Challenges

The ASN presented in Section 3.5 does show some very interesting characteristics that make it a suitable candidate for RC. As said in the previous section, in order to easily investigate different architectures, encoding schemes and operating regimes, a good, physically valid model is key. Hence, as a first step, it would be interesting to look at the resemblance between a network simulated with the VCMR model introduced in Section 3.6 and the actual ASN. As was said earlier in Section 4.1, this VCMR model has already been applied to form randomly connected networks in an SCR hierarchical architecture [12]. The individual neurons, consisting of these randomly connected memristor networks, were inspired by the ASNs. Hence, this VCMR model might be a better fit than the currently used CCMR models to simulate the behavior of the ASN. In view of the drastic improvements made in [12] by introducing the hierarchical SCR architecture, some additional ideas on reservoir architectures are suggested here that could be interesting to look at. The weights connecting the individual 'neurons' of the SCR in [12] are untrained, and the best values are found by performing a grid search. As a first improvement, FORCE learning, as discussed in Section 2.1.5, could be used to train these connections in an online manner. Additionally, instead of only using the parallel SCR formation, a deep architecture could be used, inspired by the deep RC approach with ESNs introduced in [27]. Here several reservoirs are stacked, forming different architectures. This approach leads to the processing of the input on different timescales by different reservoirs; however, the signal-to-noise ratio might decrease rapidly with an increasing number of reservoirs. Here again, the weights can be trained using FORCE learning. Mathematically defining the computational capacity of a device is a highly non-trivial task. Therefore, there is a need for standardised measures to characterize the reservoir and its performance. Often different measures, or benchmark tasks, are used, making it hard to compare the reservoir with the state of the art. Finally, where in the photonics case, as discussed in Section 1.5, the advantages of light as input source were fully exploited in application domains such as telecom and image processing, memristor-based

RC still lacks a niche application field in which it could start competing with conventional CMOS in efficiency, speed, or both.

Bibliography

[1] Sara Achour, Rahul Sarpeshkar, and Martin C Rinard. Configuration synthesis for programmable analog devices with arco. In Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 177–193. ACM, 2016.

[2] Hiroyuki Akinaga and Hisashi Shima. Resistive random access memory (reram) based on metal oxides. Proceedings of the IEEE, 98(12):2237– 2251, 2010.

[3] Fabien Alibart, Elham Zamanidoost, and Dmitri B Strukov. Pattern classification by memristive crossbar circuits using ex situ and in situ training. Nature communications, 4, 2013.

[4] Lennert Appeltant. Reservoir computing based on delay-dynamical systems. Thèse de Doctorat, Vrije Universiteit Brussel/Universitat de les Illes Balears, 2012.

[5] Lennert Appeltant, Miguel Cornelles Soriano, Guy Van der Sande, Jan Danckaert, Serge Massar, Joni Dambre, Benjamin Schrauwen, Clau- dio R Mirasso, and Ingo Fischer. Information processing using a single dynamical node as complex system. Nature communications, 2:468, 2011.

[6] Audrius V Avizienis, Henry O Sillin, Cristina Martin-Olmos, Hsien Hang Shieh, Masakazu Aono, Adam Z Stieg, and James K Gimzewski. Neuromorphic atomic switch networks. PloS one, 7(8):e42772, 2012.

[7] Yoshua Bengio. A connectionist approach to speech recognition. International journal of pattern recognition and artificial intelligence, 7(04):647–667, 1993.

[8] Radu Berdan, Chuan Lim, Ali Khiat, Christos Papavassiliou, and Themistoklis Prodromakis. A memristor spice model accounting for volatile characteristics of practical reram. Electron Device Letters, IEEE, 35(1):135–137, 2014.

[9] Radu Berdan, Eleni Vasilaki, Ali Khiat, Giacomo Indiveri, Alexandru Serb, and Themistoklis Prodromakis. Emulating short-term synaptic dynamics with memristive devices. Scientific reports, 6, 2016.

[10] Zdeněk Biolek, Dalibor Biolek, and Viera Biolkova. Spice model of memristor with nonlinear dopant drift. Radioengineering, 18(2):210–214, 2009.

[11] Joseph Blanc and David L Staebler. Electrocoloration in srti o 3: Va- cancy drift and oxidation-reduction of transition metals. Physical Re- view B, 4(10):3548, 1971.

[12] Jens Bürger, Alireza Goudarzi, Darko Stefanovic, and Christof Teuscher. Hierarchical composition of memristive networks for real-time computing. In Proceedings of the 2015 IEEE/ACM International Symposium on Nanoscale Architectures (NANOARCH '15), pages 33–38. IEEE, 2015.

[13] John Robert Burger and Christof Teuscher. Variation-tolerant computing with memristive reservoirs. In Nanoscale Architectures (NANOARCH), 2013 IEEE/ACM International Symposium on, pages 1–6. IEEE, 2013.

[14] Arturo Buscarino, Luigi Fortuna, Mattia Frasca, and Lucia Valentina Gambuzza. A chaotic circuit based on hewlett-packard memris- tor. Chaos: An Interdisciplinary Journal of Nonlinear Science, 22(2):023136, 2012.

[15] Juan Pablo Carbajal, Joni Dambre, Michiel Hermans, and Benjamin Schrauwen. Memristor models for machine learning. Neural computation, 2015.

[16] Ting Chang, Yuchao Yang, and Wei Lu. Building neuromorphic cir- cuits with memristive devices. IEEE Circuits and Systems Magazine, 13(2):56–73, 2013.

[17] Leon Chua. Resistance switching memories are memristors. Applied Physics A, 102(4):765–783, 2011.

[18] Leon O Chua. Memristor-the missing circuit element. Circuit Theory, IEEE Transactions on, 18(5):507–519, 1971.

[19] Leon O Chua and Sung Mo Kang. Memristive devices and systems. Proceedings of the IEEE, 64(2):209–223, 1976.

[20] Joni Dambre, David Verstraeten, Benjamin Schrauwen, and Serge Mas- sar. Information processing capacity of dynamical systems. Scientific reports, 2, 2012.

[21] EC Demis, R Aguilera, HO Sillin, K Scharnhorst, EJ Sandouk, M Aono, AZ Stieg, and JK Gimzewski. Atomic switch networks: nanoarchitectonic design of a complex system for natural computing. Nanotechnology, 26(20):204003, 2015.

[22] Robert H Dennard, Fritz H Gaensslen, V Leo Rideout, Ernest Bassous, and Andre R LeBlanc. Design of ion-implanted mosfet’s with very small physical dimensions. IEEE Journal of Solid-State Circuits, 9(5):256– 268, 1974.

[23] Jack Dongarra, Pete Beckman, Terry Moore, Patrick Aerts, Giovanni Aloisio, Jean-Claude Andre, David Barkai, Jean-Yves Berthou, Taisuke Boku, Bertrand Braunschweig, et al. The international exascale soft- ware project roadmap. International Journal of High Performance Computing Applications, 25(1):3–60, 2011.

[24] Chrisantha Fernando and Sampsa Sojakka. Pattern recognition in a bucket. In Advances in artificial life, pages 588–597. Springer, 2003.

[25] Ella Gale, Ben de Lacy Costello, and Andrew Adamatzky. Design of a hybrid robot control system using memristor-model and ant-inspired based information transfer protocols. arXiv preprint arXiv:1402.4004, 2014.

[26] Ella Gale, Ben de Lacy Costello, and Andrew Adamatzky. Boolean logic gates from a single memristor via low-level sequential logic. In International Conference on Unconventional Computing and Natural Computation, pages 79–89. Springer, 2013.

[27] Claudio Gallicchio and Alessio Micheli. Deep reservoir computing: A critical analysis. In European Symposium on Artificial Neural Net- works, Computational Intelligence and Machine Learning, 2016.

[28] C Lee Giles, Steve Lawrence, and Ah Chung Tsoi. Noisy time series prediction using recurrent neural networks and grammatical inference. Machine learning, 44(1-2):161–183, 2001.

[29] Xavier Glorot and Yoshua Bengio. Understanding the difficulty of training deep feedforward neural networks. In Aistats, volume 9, pages 249–256, 2010.

[30] Alex Graves, Abdel-rahman Mohamed, and Geoffrey Hinton. Speech recognition with deep recurrent neural networks. In 2013 IEEE inter- national conference on acoustics, speech and signal processing, pages 6645–6649. IEEE, 2013.

[31] Wilfried Haensch, Edward J Nowak, Robert H Dennard, Paul M Solomon, Andres Bryant, Omer H Dokumaci, Arvind Kumar, Xin- lin Wang, Jeffrey B Johnson, and Massimo V Fischetti. Silicon cmos devices beyond scaling. IBM Journal of Research and Development, 50(4.5):339–361, 2006.

[32] Helmut Hauser, Auke J Ijspeert, Rudolf M Füchslin, Rolf Pfeifer, and Wolfgang Maass. Towards a theoretical foundation for morphological computation with compliant bodies. Biological cybernetics, 105(5):355–370, 2011.

[33] Marti A. Hearst, Susan T Dumais, Edgar Osman, John Platt, and Bernhard Scholkopf. Support vector machines. IEEE Intelligent Sys- tems and their Applications, 13(4):18–28, 1998.

[34] Michiel Hermans, Michaël Burm, Thomas Van Vaerenbergh, Joni Dambre, and Peter Bienstman. Trainable hardware for dynamical computing using error backpropagation through physical media. Nature communications, 6, 2015.

[35] Michiel Hermans and Benjamin Schrauwen. Memory in linear recurrent neural networks in continuous time. Neural Networks, 23(3):341–355, 2010.

[36] Geoffrey E Hinton, Simon Osindero, and Yee-Whye Teh. A fast learning algorithm for deep belief nets. Neural computation, 18(7):1527–1554, 2006.

[37] Geoffrey E Hinton, Nitish Srivastava, Alex Krizhevsky, Ilya Sutskever, and Ruslan R Salakhutdinov. Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv:1207.0580, 2012.

[38] Sepp Hochreiter. Untersuchungen zu dynamischen neuronalen Netzen. Diploma, Technische Universität München, page 91, 1991.

[39] Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural computation, 9(8):1735–1780, 1997.

[40] Gerard Howard, Ella Gale, Larry Bull, Ben de Lacy Costello, and Andy Adamatzky. Evolution of plastic learning in spiking networks via mem- ristive connections. IEEE Transactions on Evolutionary Computation, 16(5):711–729, 2012.

[41] Giacomo Indiveri, Robert Legenstein, George Deligeorgis, Themistoklis Prodromakis, et al. Integration of nanoscale memristor synapses in neuromorphic computing architectures. Nanotechnology, 24(38):384010, 2013.

[42] Herbert Jaeger. Short term memory in echo state networks. GMD- Forschungszentrum Informationstechnik, 2001.

[43] Herbert Jaeger. The echo state approach to analysing and training recurrent neural networks-with an erratum note. Bonn, Germany: German National Research Center for Information Technology GMD Technical Report, 148:34, 2001.

[44] Herbert Jaeger. Tutorial on training recurrent neural networks, covering BPTT, RTRL, EKF and the "echo state network" approach. GMD-Forschungszentrum Informationstechnik, 2002.

[45] Herbert Jaeger and Harald Haas. Harnessing nonlinearity: Predicting chaotic systems and saving energy in wireless communication. science, 304(5667):78–80, 2004.

[46] Sung Hyun Jo, Ting Chang, Idongesit Ebong, Bhavitavya B Bhadviya, Pinaki Mazumder, and Wei Lu. Nanoscale memristor device as synapse in neuromorphic systems. Nano letters, 10(4):1297–1301, 2010.

[47] GK Johnsen, CA Lütken, ØG Martinsen, and S Grimnes. Memristive model of electro-osmosis in skin. Physical Review E, 83(3):031916, 2011.

[48] Zdenek Kolka, Dalibor Biolek, and Viera Biolkova. Improved model of tio2 memristor. Radioengineering, 24(2):379, 2015.

[49] Zoran Konkoli and Göran Wendin. Toward bio-inspired information processing with networks of nano-scale switching elements. arXiv preprint arXiv:1311.6259, 2013.

[50] Jonathan Koomey, Stephen Berard, Marla Sanchez, and Henry Wong. Implications of historical trends in the electrical efficiency of comput- ing. IEEE Annals of the History of Computing, 33(3):46–54, 2011.

[51] Robert Kozma, Robinson E Pino, and Giovanni E Pazienza. Advances in neuromorphic memristor science and applications, volume 4. Springer Science & Business Media, 2012.

[52] Manjari S Kulkarni and Christof Teuscher. Memristor-based reser- voir computing. In Nanoscale Architectures (NANOARCH), 2012 IEEE/ACM International Symposium on, pages 226–232. IEEE, 2012.

[53] Suhas Kumar. Fundamental limits to moores law. Fundamental Limits to Moore’s Law. Stanford University, 9, 2012.

[54] Laurent Larger, Miguel C Soriano, Daniel Brunner, Lennert Appeltant, Jose M Gutiérrez, Luis Pesquera, Claudio R Mirasso, and Ingo Fischer. Photonic information processing beyond turing: an optoelectronic implementation of reservoir computing. Optics express, 20(3):3241–3249, 2012.

[55] Yann LeCun, Bernhard Boser, John S Denker, Donnie Henderson, Richard E Howard, Wayne Hubbard, and Lawrence D Jackel. Back- propagation applied to handwritten zip code recognition. Neural com- putation, 1(4):541–551, 1989.

[56] Yann LeCun, LD Jackel, Leon Bottou, A Brunot, Corinna Cortes, JS Denker, Harris Drucker, I Guyon, UA Muller, Eduard Sackinger, et al. Comparison of learning algorithms for handwritten digit recog- nition. In International conference on artificial neural networks, vol- ume 60, pages 53–60, 1995.

[57] Robert Legenstein and Wolfgang Maass. Edge of chaos and predic- tion of computational performance for neural circuit models. Neural Networks, 20(3):323–334, 2007.

[58] TNNBR Legenstein. At the edge of chaos: Real-time computations and self-organized criticality in recurrent neural networks. In Advances in Neural Information Processing Systems 17: Proceedings of the 2004 Conference, volume 17, page 145. MIT Press, 2005.

[59] Eero Lehtonen and Mika Laiho. Cnn using memristors for neighborhood connections. In Cellular Nanoscale Networks and Their Applications (CNNA), 2010 12th International Workshop on, pages 1–4. IEEE, 2010.

[60] Bernabe Linares-Barranco, Teresa Serrano-Gotarredona, Luis A Camuñas-Mesa, Jose A Perez-Carrasco, Carlos Zamarreño-Ramos, and Timothee Masquelier. On spike-timing-dependent-plasticity, memristive devices, and building a self-learning visual cortex. Frontiers in neuroscience, 5:26, 2011.

[61] Seppo Linnainmaa. The representation of the cumulative rounding error of an algorithm as a taylor expansion of the local rounding errors. Master’s Thesis (in Finnish), Univ. Helsinki, pages 6–7, 1970.

[62] Carl Grant Looney. Pattern recognition using neural networks: theory and algorithms for engineers and scientists. Oxford University Press, Inc., 1997.

[63] Mantas Lukoševičius and Herbert Jaeger. Reservoir computing approaches to recurrent neural network training. Computer Science Review, 3(3):127–149, 2009.

[64] Akshay Kumar Maan, Deepthi Anirudhan Jayadevi, and Alex Pap- pachen James. A survey of memristive threshold logic circuits. IEEE transactions on neural networks and learning systems, 2016.

[65] Wolfgang Maass, Thomas Natschläger, and Henry Markram. Real-time computing without stable states: A new framework for neural computation based on perturbations. Neural computation, 14(11):2531–2560, 2002.

[66] Leandro Maciel, Fernando Gomide, David Santos, and Rosangela Ballini. Exchange rate forecasting using echo state networks for trad- ing strategies. In 2014 IEEE Conference on Computational Intelligence for Financial Engineering & Economics (CIFEr), pages 40–47. IEEE, 2014.

[67] Carver Mead. Neuromorphic electronic systems. Proceedings of the IEEE, 78(10):1629–1636, 1990.

[68] Carver Mead and Mohammed Ismail. Analog VLSI implementation of neural systems, volume 80. Springer Science & Business Media, 2012.

[69] Paul Meuffels and Rohit Soni. Fundamental issues and problems in the realization of memristors. arXiv preprint arXiv:1207.7319, 2012.

[70] Sparsh Mittal. A survey of techniques for approximate computing. ACM Computing Surveys (CSUR), 48(4):62, 2016.

[71] Bharathwaj Muthuswamy. Implementing memristor based chaotic cir- cuits. International Journal of Bifurcation and Chaos, 20(05):1335– 1350, 2010.

[72] Takeo Ohno, Tsuyoshi Hasegawa, Alpana Nayak, Tohru Tsuruoka, James K Gimzewski, and Masakazu Aono. Sensory and short-term memory formations observed in a ag2s gap-type atomic switch. Ap- plied Physics Letters, 99(20):203108, 2011.

[73] Ehsan Nedaaee Oskoee and Muhammad Sahimi. Electric cur- rents in networks of interconnected memristors. Physical Review E, 83(3):031105, 2011.

[74] Yvan Paquot, Francois Duport, Antoneo Smerieri, Joni Dambre, Ben- jamin Schrauwen, Marc Haelterman, and Serge Massar. Optoelectronic reservoir computing. Scientific reports, 2, 2012.

[75] Yuriy V Pershin and Massimiliano Di Ventra. Neuromorphic, digital, and quantum computation with memory circuit elements. Proceedings of the IEEE, 100(6):2071–2080, 2012.

[76] Rajat Raina, Anand Madhavan, and Andrew Y Ng. Large-scale deep unsupervised learning using graphics processors. In Proceedings of the 26th annual international conference on machine learning, pages 873– 880. ACM, 2009.

[77] Alexander D Rast, Francesco Galluppi, Xin Jin, and SB Furber. The leaky integrate-and-fire neuron: A platform for synaptic model exploration on the spinnaker chip. In The 2010 International Joint Conference on Neural Networks (IJCNN), pages 1–8. IEEE, 2010.

[78] Ali Rodan and Peter Tino. Minimum complexity echo state network. IEEE transactions on neural networks, 22(1):131–144, 2011.

[79] Frank Rosenblatt. The perceptron, a perceiving and recognizing au- tomaton Project Para. Cornell Aeronautical Laboratory, 1957.

[80] David E Rumelhart, Geoffrey E Hinton, and Ronald J Williams. Learn- ing representations by back-propagating errors. Cognitive modeling, 5(3):1, 1988.

[81] Akihito Sawa. Resistive switching in transition metal oxides. Materials today, 11(6):28–36, 2008.

[82] Felix Schürmann, Karlheinz Meier, and Johannes Schemmel. Edge of chaos computation in mixed-mode vlsi - a hard liquid. In NIPS, pages 1201–1208, 2004.

[83] Henry O Sillin, Renato Aguilera, Hsien-Hang Shieh, Audrius V Avizienis, Masakazu Aono, Adam Z Stieg, and James K Gimzewski. A theoretical and experimental study of neuromorphic atomic switch networks for reservoir computing. Nanotechnology, 24(38):384004, 2013.

[84] Mark D Skowronski and John G Harris. Automatic speech recogni- tion using a predictive echo state network classifier. Neural networks, 20(3):414–423, 2007.

[85] Jochen J Steil. Backpropagation-decorrelation: online recurrent learn- ing with o (n) complexity. In Neural Networks, 2004. Proceedings. 2004 IEEE International Joint Conference on, volume 2, pages 843– 848. IEEE, 2004.

[86] Adam Z Stieg, Audrius V Avizienis, Henry O Sillin, Cristina Martin-Olmos, Masakazu Aono, and James K Gimzewski. Emergent criticality in complex turing b-type atomic switch networks. Advanced Materials, 24(2):286–293, 2012.

[87] Dmitri B Strukov and Konstantin K Likharev. All-ndr crossbar logic. In Nanotechnology (IEEE-NANO), 2011 11th IEEE Conference on, pages 865–868. IEEE, 2011.

[88] Dmitri B Strukov, Gregory S Snider, Duncan R Stewart, and R Stan- ley Williams. The missing memristor found. nature, 453(7191):80–83, 2008.

[89] Dmitri B Strukov and R Stanley Williams. Exponential ionic drift: fast switching and low volatility of thin-film memristors. Applied Physics A, 94(3):515–519, 2009.

[90] Anand Subramaniam, Kurtis D Cantley, Gennadi Bersuker, David C Gilmer, and Eric M Vogel. Spike-timing-dependent plasticity using biologically realistic action potentials and low-temperature materials. IEEE Transactions on Nanotechnology, 12(3):450–459, 2013.

[91] David Sussillo and Larry F Abbott. Generating coherent patterns of activity from chaotic neural networks. Neuron, 63(4):544–557, 2009.

[92] Ronald Tetzlaff et al. Memristors and Memristive Systems. Springer, 2014.

[93] Fabian Triefenbach, Azarakhsh Jalalvand, Benjamin Schrauwen, and Jean-Pierre Martens. Phoneme recognition with large hierarchical reservoirs. In Advances in neural information processing systems, pages 2307–2315, 2010.

[94] Alan Mathison Turing. The chemical basis of morphogenesis. Bulletin of mathematical biology, 52(1-2):153–197, 1990.

[95] Kristof Vandoorne, Joni Dambre, David Verstraeten, Benjamin Schrauwen, and Peter Bienstman. Parallel reservoir computing using optical amplifiers. IEEE transactions on neural networks, 22(9):1469–1481, 2011.

[96] Kristof Vandoorne, Wouter Dierckx, Benjamin Schrauwen, David Verstraeten, Roel Baets, Peter Bienstman, and Jan Van Campenhout. Toward optical signal processing using photonic reservoir computing. Optics Express, 16(15):11182–11192, 2008.

[97] Kristof Vandoorne, Pauline Mechet, Thomas Van Vaerenbergh, Martin Fiers, Geert Morthier, David Verstraeten, Benjamin Schrauwen, Joni Dambre, and Peter Bienstman. Experimental demonstration of reservoir computing on a silicon photonics chip. Nature communications, 5, 2014.

[98] David Verstraeten. Reservoir computing: computation with dynamical systems. PhD thesis, Ghent University, 2009.

[99] David Verstraeten, Joni Dambre, Xavier Dutoit, and Benjamin Schrauwen. Memory versus non-linearity in reservoirs. In The 2010 international joint conference on neural networks (IJCNN), pages 1–8. IEEE, 2010.

[100] David Verstraeten, Benjamin Schrauwen, Michiel D'Haene, and Dirk Stroobandt. An experimental unification of reservoir computing methods. Neural networks, 20(3):391–403, 2007.

[101] John Von Neumann. The general and logical theory of automata. Cerebral mechanisms in behavior, 1(41):1–2, 1951.

[102] Sascha Vongehr. Missing the memristor. Advanced Science Letters, 17(1):285–290, 2012.

[103] Sascha Vongehr and Xiangkang Meng. The missing memristor has not been found. Scientific reports, 5, 2015.

[104] Alex Waibel, Toshiyuki Hanazawa, Geoffrey Hinton, Kiyohiro Shikano, and Kevin J Lang. Phoneme recognition using time-delay neural networks. IEEE transactions on acoustics, speech, and signal processing, 37(3):328–339, 1989.

[105] M Mitchell Waldrop. The chips are down for Moore's law. Nature News, 530(7589):144, 2016.

[106] Rainer Waser and Masakazu Aono. Nanoionics-based resistive switching memories. Nature materials, 6(11):833–840, 2007.

[107] Zhiqiang Wei, Y Kanzawa, K Arita, Y Katoh, K Kawai, S Muraoka, S Mitani, S Fujii, K Katayama, M Iijima, et al. Highly reliable TaOx ReRAM and direct evidence of redox reaction mechanism. In 2008 IEEE International Electron Devices Meeting, pages 1–4. IEEE, 2008.

[108] Paul Werbos. Beyond regression: New tools for prediction and analysis in the behavioral sciences. PhD thesis, Harvard University, 1974.

[109] Paul J Werbos. Backpropagation through time: what it does and how to do it. Proceedings of the IEEE, 78(10):1550–1560, 1990.

[110] R Stanley Williams and Matthew D Pickett. The art and science of constructing a memristor model. In Memristors and Memristive Systems, pages 93–104. Springer, 2014.

[111] Wang Xiao-Ping, Chen Min, and Shen Yi. Switching mechanism for TiO2 memristor and quantitative analysis of exponential model parameters. Chinese Physics B, 24(8):088401, 2015.

[112] J Joshua Yang, Feng Miao, Matthew D Pickett, Douglas AA Ohlberg, Duncan R Stewart, Chun Ning Lau, and R Stanley Williams. The mechanism of electroforming of metal oxide memristive switches. Nanotechnology, 20(21):215201, 2009.

[113] J Joshua Yang, Matthew D Pickett, Xuema Li, Douglas AA Ohlberg, Duncan R Stewart, and R Stanley Williams. Memristive switching mechanism for metal/oxide/metal nanodevices. Nature nanotechnology, 3(7):429–433, 2008.

[114] J Joshua Yang, Dmitri B Strukov, and Duncan R Stewart. Memristive devices for computing. Nature nanotechnology, 8(1):13–24, 2013.

[115] Yuchao Yang, Peng Gao, Siddharth Gaba, Ting Chang, Xiaoqing Pan, and Wei Lu. Observation of conducting filament growth in nanoscale resistive memories. Nature communications, 3:732, 2012.
