Nonlinear Dyn https://doi.org/10.1007/s11071-018-4730-z

REVIEW

Memristor-based neural networks with weight simultaneous perturbation training

Chunhua Wang · Lin Xiong · Jingru Sun · Wei Yao

Received: 24 May 2018 / Accepted: 11 December 2018 © Springer Nature B.V. 2018

Abstract  The training of neural networks involves numerous operations on the weight matrix. If neural networks are implemented in hardware, all weights can be updated in parallel. However, neural networks based on CMOS technology face many challenges in the weight-updating phase. For example, the derivation of the activation function and error back propagation make the back propagation (BP) algorithm difficult to realize at the circuit level, even though BP is efficient and popular for training neural networks. In this paper, a novel synaptic unit based on double identical memristors is designed, on the basis of which a new neural network circuit architecture is proposed. The whole network is trained by a hardware-friendly weight simultaneous perturbation (WSP) algorithm. The hardware implementation of neural networks based on the WSP algorithm only involves the feedforward circuit and does not require a bidirectional circuit. Furthermore, merely two forward calculations are needed to update all weight matrices for each pattern, which significantly simplifies the weight update circuit and allows a simpler and easier implementation of the neural network in hardware. The practicability, utility and simplicity of this scheme are demonstrated on supervised learning tasks.

Keywords  Memristor · Synaptic unit · Hardware implementation · Multilayer neural networks (MNNs) · Weight simultaneous perturbation (WSP)

C. Wang (corresponding author) · L. Xiong · J. Sun · W. Yao
College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China
e-mail: [email protected]

1 Introduction

With the rapid development of artificial intelligence, neural networks are attracting more and more attention and play a significantly important role in fields such as image recognition, speech recognition [1] and automatic control. Up to now, most neural networks have been realized in software, which lacks the inherent parallelism of neural networks. Circuit operations, however, are inherently parallel and capable of providing high-speed computation. Therefore, it is necessary to research hardware implementations of neural networks.

Regarding hardware implementations of neural networks, many studies have already been conducted [2–8], most of which implemented neural networks via CMOS technology. However, the implementation of nonvolatile weight storage is a major bottleneck for them. On the other hand, when area and consumption are taken into account, it is a hard task to efficiently realize circuit-based neural networks using CMOS technology.

Recently, a novel device, the memristor [9–13], has provided a brand-new approach for hardware implementations of neural networks. The memristor, a nonvolatile programmable resistor, was first physically realized by HP Labs in 2008 [14]. The current through the memristor or the voltage across the memristor can be used to tune the memristor resistance. When its excitation is turned off, the device keeps its most recent resistance until the excitation is turned on again. The memristor is widely applied to chaotic circuits [15,16], memory and neural networks. In memristor-based neural networks, the memristor's nonvolatility is strongly reminiscent of the biological synapse, which makes it a promising candidate for weight storage. And the fact that the memristor is a nanodevice makes it easier to integrate in analog circuits. Because of this, memristors have been widely used in hardware implementations of neural networks.

Many memristor-based neural networks were previously trained by the so-called spike-timing-dependent plasticity (STDP) [17–21], which was intended to explain the biological phenomenon of tuning the synaptic junction strength of neurons. Besides, Sheri et al. realized a neuromorphic character recognition system with two PCMO memristors as a synapse [22]. Nishitani et al. modified an existing three-terminal ferroelectric memristor (3T-FeMEM) model, and STDP was exploited to perform supervised learning [19]. However, the convergence of STDP-based learning is not guaranteed for general inputs [23].

Other memristor-based learning systems have recently attracted attention. For example, memristor-based single-layer neural networks (SNNs) trained by the perceptron algorithm were designed to complete some linearly separable tasks [24,25]. Yet, it is the single-layer structure and binary outputs that limit their wide-ranging applications. Therefore, constructing memristor-based multilayer neural networks (MNNs) [26–29], which are rather more practical and efficient than SNNs, is requisite. In 2012, Adhikari et al. put forward a modified chip-in-the-loop learning rule for MNNs with memristor bridge synapses that consisted of three transistors and four identical memristors and realized zero, negative and positive synaptic weights [30]. Unlike conventional chip-in-the-loop learning, a complex multilayer network learning task was conducted by decomposing it into multiple simple single-layer network learning tasks, and the gradient descent algorithm was used to train all single-layer networks [30]. Their design simplifies the training process and indeed reduces the communication overhead and the circuit complexity. However, their initial training and the weight update calculation were performed in software. In 2015, Soudry et al. designed an artificial synapse consisting of a single memristor and two transistors, and the BP algorithm was used to train the whole network [31]. Indeed, it is fairly novel in their design that the errors are encoded as corresponding weight update times and that the errors and inputs are introduced into the weight updating process by means of the state variables of the memristors. However, the BP algorithm needs to propagate the error backward from the output layer to the hidden layers, and the weight update is obtained by calculating the derivative of the activation function. These complex calculations are not easy to implement at the circuit level.

Focusing on the difficulties in efficiently implementing learning rules like the BP algorithm in hardware, some hardware-friendly algorithms were proposed and applied to the circuit implementation of neural networks. For example, in 2015, the random weight change (RWC) algorithm was used to train a memristor-based neural network where the synaptic weights were randomly updated by a small constant in each iteration [32]. Their design simplifies the circuit structure and does not involve complex circuit operations. However, in RWC's weight updating process, just the sign but not the value of the error variation is considered, and the direction of weight change is random every time, which results in more iterations being needed.

In order to solve the problems of the memristor-based neural networks trained by the RWC and BP algorithms, in this paper a new memristor-based neural network circuit trained by the weight simultaneous perturbation (WSP) algorithm is proposed. The WSP algorithm was introduced in [33]. Perturbations whose signs are randomly selected with equal probability are simultaneously added to all weights, and the difference between the first error function without perturbations and the second one with perturbations is used to control the change of the weights in WSP. Unlike the RWC algorithm, both the sign and the value of the variation of the error function are exploited to update all weights. Hence, fewer iterations are needed compared with the RWC algorithm. Additionally, in WSP, complex calculations such as derivation and error back propagation are also not involved, in contrast to the BP algorithm.

Though the WSP algorithm is relatively efficient and simple, a memristor-based neural network circuit trained by the WSP algorithm has so far not been available. The hardware architecture of a neural network trained by the WSP algorithm is therefore proposed in this paper. In addition, in order to solve the problem of storing and updating weights, a novel synaptic unit based on double identical memristors is designed. Taking full advantage of the characteristics of the WSP algorithm, this design only needs two forward computations for each pattern to update all weight matrices, which greatly simplifies the circuit structure. This paper indicates that it is possible for neural networks to be realized in a simpler, more compact and reliable form at the circuit level.

The remainder of this paper is organized as follows. In Sect. 2, basic information about the memristor, the RWC algorithm and the WSP algorithm is introduced. In Sect. 3, the circuit implementations of SNNs and MNNs are described in detail. In Sect. 4, the proposed circuit architectures are numerically evaluated on supervised tasks to demonstrate their practicability. Sect. 5 discusses the results, and Sect. 6 presents conclusions.

2 Preliminaries

In this section, basic information about the memristor, the RWC algorithm and the WSP algorithm is described.

2.1 Memristor

The memristor model proposed by Soudry et al. [31] is adopted in this paper. The memristor behaves as a voltage-controlled conductance,

i(t) = G(s(t))\, v(t)   (1)

where G(s) is the memductance and s(t) is the internal state variable, which integrates the applied voltage,

\dot{s}(t) = v(t)   (2)

so the state, and hence the resistance, changes only while an excitation is applied. The memductance G(s) can be expanded as a Taylor series around a certain point s^*:

G(s) = \frac{G(s^*)}{0!} + \frac{G'(s^*)}{1!}(s - s^*) + \frac{G''(s^*)}{2!}(s - s^*)^2 + \cdots + \frac{G^{(n)}(s^*)}{n!}(s - s^*)^n + O[(s - s^*)^n]   (3)

where n is a positive integer, G^{(n)}(s^*) represents the nth-order derivative of the function G(s) at s = s^*, and O[(s - s^*)^n] represents a higher-order infinitesimal of (s - s^*)^n.

Soudry et al. demonstrated that, if the variations in the value of s(t) are restricted to be small, G(s(t)) can be approximated by a first-order Taylor series around the point s^* [31]. The memductance is hence given by

G(s) = \frac{G(s^*)}{0!} + \frac{G'(s^*)}{1!}(s - s^*)   (4)

or, in simplified form,

G(s(t)) = g^* + \hat{g}\, s(t)   (5)

where \hat{g} = [dG(s)/ds]_{s=s^*} and g^* = G(s^*) - \hat{g}\, s^*. Based on this memristor model, SNNs and MNNs can be efficiently implemented in hardware.

2.2 Random weight change

In the random weight change (RWC) algorithm, the synaptic weights are randomly tuned by a small constant, and the learning rule is given as

\omega_{ij}(n+1) = \omega_{ij}(n) + \Delta\omega_{ij}(n+1)
\Delta\omega_{ij}(n+1) = \begin{cases} \Delta\omega_{ij}(n), & \text{if } E(n+1) < E(n) \\ \delta \cdot \mathrm{rand}(\pm 1), & \text{otherwise} \end{cases}   (6)

where E(n) is the error at iteration n, \delta is a small positive constant, and \mathrm{rand}(\pm 1) denotes a sign chosen at random for each weight.
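For an algorithm-level picture, the following minimal Python sketch implements the RWC rule summarized in Eq. (6); the error function, the weight shape and the constant `delta` are placeholders introduced here for illustration, not values taken from the circuit description.

```python
import numpy as np

def rwc_step(W, dW_prev, E_prev, error_fn, delta=0.01, rng=np.random.default_rng()):
    """One random weight change (RWC) iteration following Eq. (6).

    W        : current weight matrix
    dW_prev  : weight change applied in the previous iteration
    E_prev   : error measured before that change
    error_fn : callable returning the scalar error for a weight matrix
    delta    : fixed magnitude of the random weight change (illustrative value)
    """
    E_now = error_fn(W)
    if E_now < E_prev:
        # The last change reduced the error: keep moving in the same direction.
        dW_next = dW_prev
    else:
        # Otherwise pick a fresh random direction of fixed magnitude for every weight.
        dW_next = delta * rng.choice([-1.0, 1.0], size=W.shape)
    return W + dW_next, dW_next, E_now
```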

Moreover, the direction of the weight change is random every time; only the general trend of the weight trajectory follows the direction of energy decline, as shown in [32, Fig. 1]. Therefore, more iterations are needed in RWC.

The WSP algorithm is different from the RWC algorithm. In WSP, both the sign and the value of the variation of the error function are used to tune the weights, which makes the data fully utilized, as described in Sect. 2.3. It also does not require complicated operations such as derivative calculation and error back propagation.

2.3 Weight simultaneous perturbation

For convenience, the following nomenclature is used in this paper.
\omega_{ij} (i = 1, 2, ..., m, j = 1, 2, ..., n): a weight.
\Delta\omega_{ij} (i = 1, 2, ..., m, j = 1, 2, ..., n): the variation of \omega_{ij}.
h_{ij} (i = 1, 2, ..., m, j = 1, 2, ..., n): a sign, h_{ij} = ±1; the sign of h_{ij} is randomly selected with equal probability.
W: an m × n weight matrix consisting of \omega_{ij}.
\Delta W: an m × n weight update matrix consisting of \Delta\omega_{ij}.
H: an m × n sign matrix consisting of h_{ij}.
x: a vector representing an input pattern.
d: a vector representing a target output.
o: a vector representing the actual output of the network without perturbations for the input x.
o_{per}: a vector representing the actual output of the network with simultaneous perturbations for the input x.

Hence, the relationship between x and o is given by

o = f(Wx)   (7)

where f(·) represents the activation function. The error function is defined as

E = \frac{1}{2}\|d - o\|^2   (8)

In the BP algorithm, the weight update learning rule is

\Delta\omega_{ij} = -\eta \frac{\partial E}{\partial \omega_{ij}}, \qquad \omega_{ij}(n+1) = \omega_{ij}(n) + \Delta\omega_{ij}   (9)

Regarding the WSP algorithm, its execution details are as follows. Firstly, the normal error function, without simultaneous perturbations, is calculated. Then, perturbations whose signs are randomly selected are simultaneously applied to all weights, and the error function calculation is executed again. Finally, the weight update matrices are calculated according to the difference between the first and second error functions. In 1993, Alspector et al. [33] demonstrated that, in the WSP algorithm, the amount of change in a weight can be approximated by the following formulas for a sufficiently small perturbation:

\Delta\omega_{ij} = -\eta \frac{\partial E}{\partial \omega_{ij}} \approx -\eta \frac{E(\omega_{ij} + \omega_{per} h_{ij}) - E(\omega_{ij})}{\omega_{per}} h_{ij}, \qquad \omega_{ij}(n+1) = \omega_{ij}(n) + \Delta\omega_{ij}   (10)

The learning rule is expressed in matrix form as

\Delta W = -\eta \frac{E(W + \omega_{per} H) - E(W)}{\omega_{per}} H = -\eta \frac{\Delta E}{\omega_{per}} H, \qquad W(n+1) = W(n) + \Delta W   (11)

where \omega_{per}, a positive (usually small) constant, is the value of the simultaneous perturbation injected into the weights, and \eta is the learning rate, a positive coefficient.

According to Eq. (11), the WSP algorithm requires only two feedforward operations of the neural network. Compared with the BP algorithm, the derivative of the activation function and error back propagation, which are difficult to implement in hardware, are not involved. Apart from simple sign circuits that control the direction of the weight perturbation and weight adjustment, it does not require any other multiplication or transposition operations.
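As a software-level illustration of Eqs. (10)–(11), the sketch below carries out the two forward passes of one WSP iteration for a single weight matrix; the `forward` callable and the default values of η and ω_per are assumptions made only for this example.

```python
import numpy as np

def wsp_step(W, x, d, forward, eta=0.2, w_per=0.002, rng=np.random.default_rng()):
    """One weight simultaneous perturbation (WSP) update, Eq. (11).

    forward(W, x) must return the network output o = f(Wx).
    """
    H = rng.choice([-1.0, 1.0], size=W.shape)     # random sign matrix, equal probability

    o = forward(W, x)                             # first forward pass (no perturbation)
    E_fir = 0.5 * np.sum((d - o) ** 2)            # Eq. (8)

    o_per = forward(W + w_per * H, x)             # second forward pass (perturbed weights)
    E_sec = 0.5 * np.sum((d - o_per) ** 2)

    dW = -eta * (E_sec - E_fir) / w_per * H       # Eq. (11)
    return W + dW
```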
3 Circuit architecture design

In this section, the implementation of simple SNN and general MNN architectures based on memristor arrays with the WSP algorithm is described. MNNs are generally constructed by cascading multiple SNNs; therefore, attention is mainly paid to the design of SNNs.

3.1 Artificial synapse

As regards the circuit architecture of the artificial synapse in memristor crossbars, three possible approaches are exhibited in Fig. 1. In Fig. 1a, a single memristor at each cross point of the circuit is regarded as a synapse for weight storage, where the adjustable range of each weight is relatively small. Accordingly, in order to expand this range, the architectures in Fig. 1b, c are extensively employed in memristor crossbars. In Fig. 1b, each input signal and its opposite-polarity signal are fed to a column of memristors, and in Fig. 1c each input signal is applied to a pair of memristors [27]. By adding simple auxiliary circuits, the two memristors are considered as one synapse. In this paper, the architecture in Fig. 1b is selected.

Fig. 1  Three circuit architectures of the memristor-based artificial synapse

An architecture consisting of two MOSFETs (one p-type and one n-type) and a single ideal memristor was proposed by Soudry et al. [31, Fig. 3(a)]. Taking full advantage of their architecture, a novel artificial synapse based on double identical memristors is designed, as shown in Fig. 2.

Fig. 2  A synaptic unit based on double identical memristors

The current of the n-type transistor in the linear region is

i_n = k_n \left[ (v_{gs} - v_{th}) v_{ds} - \frac{1}{2} v_{ds}^2 \right]   (12)

where v_{gs} and v_{ds} are the gate–source voltage and the drain–source voltage, respectively, v_{th} is the threshold voltage, and k_n is the conductance parameter of the n-type transistor. When v_{gs} < v_{th}, the current of the n-type transistor is equal to zero, namely i_n = 0. The current of the p-type transistor in the linear region is given by the analogous expression (13), where k_p is the conductance parameter of the p-type transistor. When v_{gs} > -v_{th}, the current of the p-type transistor is equal to zero, namely i_p = 0. Assume that k_n = k_p = k and that the absolute value of the threshold voltage, |v_{th}|, is the same for both transistor types.

The synaptic unit is fed with three voltage input signals: u and \bar{u} = -u are, respectively, connected to the terminals of the n-type and p-type transistors, and an enable signal e, whose value is 0, V_{DD} or -V_{DD} (with V_{DD} > v_{th}), is connected to the gates of the transistors. We assume that

-v_{th} < u < v_{th}   (14)

k(V_{DD} - 2v_{th}) \gg G(s(t))   (15)

Consequently, there are three possible situations:

1. When e = 0, the four transistors are off. In other words, the output current of the synaptic unit is equal to zero, namely I = 0. In this case, the state variables are not changed.

2. When e = V_{DD}, Tr2 and Tr3 both work in the linear region, while Tr1 and Tr4 are off. The currents I_1 and I_2 through the two memristors are, respectively,

I_1 = (g^* + \hat{g} s_1) u, \qquad I_2 = -(g^* + \hat{g} s_2) u   (16)

The output current I of the synaptic unit is

I = I_1 + I_2 = (s_1 - s_2)\, \hat{g}\, u = \Delta s\, \hat{g}\, u   (17)

3. When e = -V_{DD}, Tr1 and Tr4 both work in the linear region, while Tr2 and Tr3 are off. The currents I_1 and I_2 through the two memristors are, respectively,

I_1 = -(g^* + \hat{g} s_1) u, \qquad I_2 = (g^* + \hat{g} s_2) u   (18)

The output current I of the synaptic unit is

I = I_1 + I_2 = (s_2 - s_1)\, \hat{g}\, u = \Delta s\, \hat{g}\, u   (19)

where, for notational simplicity, \Delta s contains the sign of the change in the two state variables.

According to the relationship between the input signal and the output signal in Eqs. (17) and (19), we can define the synaptic weight as

\omega = \Delta s\, \hat{g}   (20)

The above analysis indicates that the design of the synaptic unit is feasible.
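The behavior summarized in Eqs. (16)–(20) can be mimicked numerically; the sketch below evaluates the output current and the weight of one synaptic unit from its two memristor state variables under the linearized memductance of Eq. (5). The values of g* and ĝ are those of Table 1, while the state values and input voltage in the example are made up.

```python
G_STAR = 1e-6      # g*     [S]            (Table 1)
G_HAT  = 180e-6    # g-hat  [S V^-1 s^-1]  (Table 1)

def synapse_current(s1, s2, u, e_sign):
    """Output current of the double-memristor synaptic unit.

    e_sign = +1 corresponds to e = +V_DD (Eq. 17), e_sign = -1 to e = -V_DD (Eq. 19),
    and e_sign = 0 to a disabled unit in which all four transistors are off.
    """
    if e_sign == 0:
        return 0.0
    if e_sign > 0:                         # Tr2 and Tr3 conduct
        i1 = (G_STAR + G_HAT * s1) * u
        i2 = -(G_STAR + G_HAT * s2) * u
    else:                                  # Tr1 and Tr4 conduct
        i1 = -(G_STAR + G_HAT * s1) * u
        i2 = (G_STAR + G_HAT * s2) * u
    return i1 + i2                         # equals +/-(s1 - s2) * g-hat * u

def synapse_weight(s1, s2):
    """Synaptic weight of the unit as defined in Eq. (20)."""
    return (s1 - s2) * G_HAT

# Example: a small state imbalance and a 40 mV input
print(synapse_current(s1=1e-3, s2=-1e-3, u=0.04, e_sign=+1))
print(synapse_weight(s1=1e-3, s2=-1e-3))
```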
3.2 SNN circuit architecture

Based on the proposed synaptic unit in Fig. 2, a 3 × 3 SNN circuit architecture is designed in Fig. 3, where each synaptic unit is indexed by (i, j), with i ∈ {1, ..., 3} and j ∈ {1, ..., 3}. Each synaptic unit receives two inputs u_j, \bar{u}_j and an enable signal e_{ij}, and produces an output current I_{ij}. Each column of synaptic units in the array shares the two inputs u_j and \bar{u}_j. The current summing operation, \sum_j I_{ij}, is conducted at the terminal of each row, and the row output interfaces convert the total current to the dimensionless output signal r_i.

Fig. 3  A 3 × 3 SNN circuit architecture. ω_{ij} represents a synaptic unit which receives two voltage input signals u_j, \bar{u}_j and an enable signal e_{ij}, and produces an output current I_{ij}; the terminal e_{ij} controls not only the sign of the input signal but also the duration and direction of the weight adjustment signal. The brown box represents the row output interface for data conversion (i ∈ {1, ..., 3}, j ∈ {1, ..., 3}). (Color figure online)

3.3 Circuit operation

During the running period of the WSP algorithm, the circuit operations in each trial are composed of five simple phases. First, in the computing phase, the current summation operation and the calculation of the error function are realized. Second, in the perturbing phase, perturbations with random signs are added to all weights simultaneously. Third, a new computing phase is needed in order to obtain the new error function. Fourth, in the restoring phase, restoring signals are applied to the memristors to eliminate the effect of the perturbing phase on the state variables. Fifth, in the updating phase, all weight matrices are adjusted according to Eq. (11). The signals and their effect on the state variable are shown in Fig. 4.

Fig. 4  Operation protocol of a single synaptic unit (note that for the curve of S in the figure, we intentionally enlarge the variation of S in the perturbing phase and the restoring phase, and expand the duration of these two phases, in order to better observe its change)

(1) First computing phase. A vector x from the dataset is encoded by multiplying it by a positive constant a, which converts the dimensionless number x_j into a corresponding voltage u_j, namely u = ax. In order not to change the state of the memristors at the end of the phase, when the input signal is greater than zero, namely x_j > 0, the enable signal is designed as follows:

e_{ij}(t) = \begin{cases} V_{DD}, & 0 \le t < 0.5T_1 \\ -V_{DD}, & 0.5T_1 \le t < T_1 \end{cases}   (21)

When the input signal is less than zero, namely x_j < 0, the enable signal is designed as follows:

e_{ij}(t) = \begin{cases} -V_{DD}, & 0 \le t < 0.5T_1 \\ V_{DD}, & 0.5T_1 \le t < T_1 \end{cases}   (22)

According to Eq. (2), the total changes of the state variables of the two memristors are, respectively, given by

\Delta s_{ij1} = \int_0^{0.5T_1} (a x_j)\,dt + \int_{0.5T_1}^{T_1} (-a x_j)\,dt = 0
\Delta s_{ij2} = \int_0^{0.5T_1} (-a x_j)\,dt + \int_{0.5T_1}^{T_1} (a x_j)\,dt = 0   (23)

From Eq. (23), the net changes of the state variables are both zero, which indicates that the resistance of both memristors is ultimately unchanged. When t ∈ [0, 0.5T_1), the transistors Tr2 and Tr3 both work in the linear region, while Tr1 and Tr4 are correspondingly in the cutoff region. The currents I_1 and I_2 in Fig. 2 are calculated according to Eq. (16):

I_{ij1} = a (g^* + \hat{g} s_{ij1}) x_j, \qquad I_{ij2} = -a (g^* + \hat{g} s_{ij2}) x_j   (24)

And the output current I of each synaptic unit, from Eq. (17), is

I_{ij} = I_{ij1} + I_{ij2} = a \hat{g} (s_{ij1} - s_{ij2}) x_j = a \hat{g} s_{ij} x_j   (25)

where s_{ij} = s_{ij1} - s_{ij2}.

Therefore, the total current at each row terminal, \Phi_i, is

\Phi_i = \sum_j I_{ij} = a \hat{g} \sum_j s_{ij} x_j   (26)

The target output vector d from the dataset is commonly dimensionless. The constant c, consequently, is essential for converting the current into a dimensionless number in the row output interface, namely

o_i = c \Phi_i = a c \hat{g} \sum_j s_{ij} x_j   (27)

According to Eq. (20), we obtain

\omega_{ij} = a c \hat{g} s_{ij}   (28)

On the basis of Ohm's law, the arithmetic operation Wx is thus realized in this circuit. Finally, according to the target output d and the actual output o, the first error function, also called the cost function in machine learning, is calculated and stored in a buffer circuit. The error function is given by

E_{fir} = \frac{1}{2}\|d - o\|^2   (29)

(2) Perturbing phase. During the perturbing phase, a pair of voltage signals u_{per} and \bar{u}_{per} with unchanging amplitude and duration are applied to the terminals u and \bar{u} in order to alter the state of the memristors and introduce the simultaneous perturbation. To generate a random perturbation, let e = h_{ij} V_{DD} control the sign of each perturbation, as described in Sect. 3.1; the duration of \bar{u}_{per} is the same as that of e:

e_{ij} = h_{ij} V_{DD} \quad \text{if } T_1 \le t < T_2   (30)

From Eq. (28), when t ∈ [T_1, T_2), the state change corresponding to the perturbation constant \omega_{per} is given by

\Delta s_{per} = \frac{\omega_{per}}{a c \hat{g}}   (31)

From the definition of the state variable, the total change in the internal state of the synaptic unit is

u_{per} = a x_{per}, \quad \Delta s_{per} = \Delta s_1 - \Delta s_2 = \int_{T_1}^{T_2} \left[ u_{per} - (-u_{per}) \right] dt = 2 a x_{per} (T_2 - T_1), \quad T_1 \le t < T_2   (32)

where the absolute values of \Delta s_1 and \Delta s_2 are equal because the two memristors of the synaptic unit are identical. According to Eqs. (31) and (32), the duration of the perturbing signal with constant amplitude is

T_{per} = T_2 - T_1 = \left| \frac{\omega_{per}}{2 a^2 c \hat{g} x_{per}} \right|   (33)

where T_{per} is a positive constant. In other words, the duration of the perturbing phase is unchanged throughout the training process, which also explains why the magnitude and duration of the voltage signals u_{per} and \bar{u}_{per} are fixed.

(3) Second computing phase. During each computing phase, the operations of the entire network are the same as in the first computing phase. In this phase, the enable signal is (for x_j > 0)

e_{ij}(t) = \begin{cases} V_{DD}, & T_2 \le t < \frac{T_2 + T_3}{2} \\ -V_{DD}, & \frac{T_2 + T_3}{2} \le t < T_3 \end{cases}   (34)

And the input signals are the same as in the first computing phase. Hence, the total changes in the internal state variables of the synaptic unit are

\Delta s_{ij1} = \int_{T_2}^{(T_2+T_3)/2} (a x_j)\,dt + \int_{(T_2+T_3)/2}^{T_3} (-a x_j)\,dt = 0
\Delta s_{ij2} = \int_{T_2}^{(T_2+T_3)/2} (-a x_j)\,dt + \int_{(T_2+T_3)/2}^{T_3} (a x_j)\,dt = 0   (35)

And the error function is given by

E_{sec} = \frac{1}{2}\|d - o_{per}\|^2   (36)

(4) Restoring phase. During the perturbing phase, simultaneous perturbations are added that change the state of the memristors. Restoring signals are hence needed to restore the memristors to the stage before the simultaneous perturbations were added, which is critical for the accurate update of all weights. Yet, there are some differences compared with the perturbing phase. The enable signal and input signal are, respectively,

u_{per} = a x_{per}, \qquad e_{ij} = -h_{ij} V_{DD}, \qquad T_3 \le t < T_4   (37)

And the duration of e and u must meet the following requirement:

T_{per} = T_4 - T_3   (38)

(5) Updating phase. During the updating phase, the updating voltage signals with constant amplitude, namely u_{upd} and \bar{u}_{upd}, are connected to the terminals u and \bar{u}, respectively. The difference between the two error functions is given by

\Delta E = E(W + \omega_{per} H) - E(W) = E_{sec} - E_{fir}   (39)

According to the learning rule described by Eq. (11), the variation of each weight is

\Delta\omega_{ij} = -\eta \frac{\Delta E}{\omega_{per}} h_{ij}   (40)

From Eq. (28), we obtain

\Delta s_{ij} = -\eta \frac{\Delta E}{a c \hat{g} \omega_{per}} h_{ij}   (41)

The direction of the weight adjustment is controlled by the enable signal e:

e_{ij} = \begin{cases} V_{DD}, & \Delta\omega_{ij} > 0 \\ 0, & \Delta\omega_{ij} = 0 \\ -V_{DD}, & \Delta\omega_{ij} < 0 \end{cases}   (42)

From Eq. (2), the variation of the state variable of the synaptic unit is hence (for \Delta\omega_{ij} > 0)

u_{upd} = a x_{upd}, \quad \Delta s_{ij} = \int_{T_4}^{T_5} \left[ u_{upd} - (-u_{upd}) \right] dt = \int_{T_4}^{T_5} 2 a x_{upd}\, dt = 2 a (T_5 - T_4) x_{upd}, \quad T_4 \le t < T_5   (43)

According to Eqs. (41) and (43), the duration of the updating signal with constant amplitude a x_{upd} is

T_{upd} = T_5 - T_4 = \eta \left| \frac{\Delta E}{2 a^2 c \hat{g} \omega_{per} x_{upd}} \right|   (44)

Therefore, the sign and duration of the input and update signals can be easily controlled by the enable signal e. Finally, as an example, assume that the signs of \Delta\omega_{11} and \Delta\omega_{12} are positive and the others are negative in Fig. 3; the corresponding weight update circuit is then designed as in Fig. 5.

Fig. 5  Circuit structure of weight updating when \Delta\omega_{11} and \Delta\omega_{12} are both positive and the others are negative. The duration of the enable signal is T_{upd}

Through the above analysis, there are three noteworthy points:
1. In the whole training process, the duration and amplitude of the perturbation signal, namely T_{per} and a x_{per}, are constants, and the random sign of the perturbation signal is controlled by the terminal e.
2. The duration of the computing and restoring phases for each pattern is also constant throughout the training process.
3. In the kth weight update process, the duration and amplitude of the updating signal, namely T_{upd} and a x_{upd}, are constants for all synaptic units, and the sign of the updating signal is controlled by the terminal e (k, a positive integer, is the index of the weight update).
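Because the perturbation and the weight update are both encoded purely as pulse durations, Eqs. (33), (38) and (44) reduce to two small formulas. The sketch below evaluates them with the constants of Table 1 and the signal amplitudes quoted in Sect. 4; the value of ΔE used in the example is hypothetical.

```python
A     = 0.1      # encoding constant a [V]                  (Table 1)
C     = 1.0e8    # current-to-number conversion constant c  (Table 1)
G_HAT = 180e-6   # g-hat                                    (Table 1)

def t_per(w_per, x_per):
    """Duration of the perturbing pulse, Eq. (33); the restoring pulse has the same length, Eq. (38)."""
    return abs(w_per / (2 * A**2 * C * G_HAT * x_per))

def t_upd(delta_E, eta, w_per, x_upd):
    """Duration of the updating pulse, Eq. (44)."""
    return eta * abs(delta_E / (2 * A**2 * C * G_HAT * w_per * x_upd))

# a*x_per = 40 mV and a*x_upd = 60 mV are the amplitudes quoted in Sect. 4;
# delta_E = 1e-3 is a made-up error difference for illustration.
print(t_per(w_per=0.002, x_per=0.4))                         # perturbing-pulse length [s]
print(t_upd(delta_E=1e-3, eta=0.2, w_per=0.002, x_upd=0.6))  # updating-pulse length  [s]
```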

3.4 MNN circuit architecture

In Sects. 3.2 and 3.3, attention was primarily paid to the SNN circuit architecture and its operation under WSP training. Since the derivative of the activation function and error back propagation are not involved for a multilayer neural network trained with WSP, the circuit structure is greatly simplified. Therefore, in this part, the MNN circuit architecture obtained by cascading multiple SNNs, mainly a two-layer neural network, is discussed. The schematic of the MNN circuit architecture is shown in Fig. 6.

Fig. 6  Architecture of a general MNN trained by the WSP algorithm. f(·) represents the activation function. The green box represents a layer in a multilayer network. (Color figure online)

For a two-layer neural network, the mathematical model is expressed as

o = W_2 f(W_1 x)   (45)

where f(·) denotes the activation function, and W_1 and W_2 are the input–hidden and the hidden–output weight matrices, respectively. From Eq. (11), the weight update rule for a two-layer network is given by

\Delta W_1 = -\eta \frac{\Delta E}{\omega_{per}} H_1, \qquad \Delta W_2 = -\eta \frac{\Delta E}{\omega_{per}} H_2   (46)

where H_1 and H_2 are, respectively, the random sign matrices of W_1 and W_2. The learning rule for a general MNN is derived from Eq. (11) as

\Delta W_p = -\eta \frac{\Delta E}{\omega_{per}} H_p   (47)

where the subscript p, a positive integer, is the index of the network layer, without taking the input layer into account.

As for the circuit operations of MNNs, they are basically the same as those of SNNs. Similarly, only two feedforward operations are needed to synchronously update all weight matrices of the MNN when the WSP algorithm is used to train the neural network according to Eq. (47).
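A software analogue of Eqs. (45)–(47) is sketched below; both weight matrices are perturbed in the same trial and share the same error difference ΔE, mirroring the single perturbing phase of the circuit. The choice of activation and the default values of η and ω_per (borrowed from the Iris experiment in Sect. 4.3) are only illustrative.

```python
import numpy as np

def two_layer_forward(W1, W2, x, f=lambda z: 1.0 / (1.0 + np.exp(-z))):
    """o = W2 f(W1 x), Eq. (45); a sigmoid activation is assumed here."""
    return W2 @ f(W1 @ x)

def wsp_two_layer_step(W1, W2, x, d, eta=0.02, w_per=0.001,
                       rng=np.random.default_rng()):
    """Simultaneously perturb and update both weight matrices, Eqs. (46)-(47)."""
    H1 = rng.choice([-1.0, 1.0], size=W1.shape)
    H2 = rng.choice([-1.0, 1.0], size=W2.shape)

    E_fir = 0.5 * np.sum((d - two_layer_forward(W1, W2, x)) ** 2)
    E_sec = 0.5 * np.sum((d - two_layer_forward(W1 + w_per * H1,
                                                W2 + w_per * H2, x)) ** 2)
    dE = E_sec - E_fir

    W1 = W1 - eta * dE / w_per * H1
    W2 = W2 - eta * dE / w_per * H2
    return W1, W2
```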

4 Circuit simulation

In this section, the neural network circuit is modeled in SimElectronics, attached to the Simscape toolbox in MATLAB, to evaluate the circuit performance. The memristor model [31] proposed by Soudry et al. is exploited in this paper. In SimElectronics, basic electrical component models, including the MOSFET, are provided. In addition, other circuit operations required for realizing the WSP algorithm, such as random sign generation with equal probability and calculation of the error function, are implemented with MATLAB code. As for the weight updating, a voltage signal with a constant magnitude, namely a x_{upd} in Fig. 4, is needed, and its polarity is controlled by the enable signal e. Hence, the weight update value is converted to a pulse of the corresponding duration according to Eq. (44). The other parameters used in this paper are shown in Table 1.

Table 1  Circuit parameters

Parameter   Value
u_per       40 mV
a           0.1 V
V_DD        5 V
T_1         20 µs
c           1.0 × 10^8 A^-1
g*          1 µS
ĝ           180 µS V^-1 s^-1

4.1 Basic operations of the synaptic unit

To verify the basic operations of the proposed synaptic unit, a numerical simulation is carried out. A relatively simple 2 × 2 synaptic unit circuit, whose input signals x_1 and x_2 are set to 0.1 and 0.8, respectively, is simulated over ten operation cycles, namely t ∈ [0, 10T_5]. The simultaneous perturbation \omega_{per} is 0.004, and its sign is randomly chosen with equal probability. The magnitude of the updating signal, namely a x_{upd}, is 60 mV. With regard to the weight adjustments of the four synaptic units, assume that

\Delta\omega_{11} = 0.008 \times \mathrm{sign}(5T_5 - t)
\Delta\omega_{12} = 0.008 \times \mathrm{sign}(5T_5 - t)
\Delta\omega_{21} = 0.008 \times \mathrm{sign}(t - 5T_5)
\Delta\omega_{22} = 0.008 \times \mathrm{sign}(t - 5T_5)   (48)

Because the synaptic unit is a dual identical-memristor structure, the absolute value of the corresponding weight change for each memristor is 0.004.

Based on the above hypothetical data, the circuit simulation results are shown in Fig. 7, together with the input voltages of the synaptic units and their corresponding weights. The feasibility of the basic operations of the circuit is thereby verified.

Fig. 7  A 2 × 2 synaptic unit circuit simulation during ten operation cycles. Top and middle: voltage (solid blue) and weight (solid red) change for each synaptic unit. Bottom: corresponding output result (Output1, Output2). v_{ij} is the input signal connected to the terminal u of the synaptic unit (i ∈ [1, 2], j ∈ [1, 2]). (Color figure online)

4.2 Three-input odd parity function

A three-input odd parity function task is conducted using a two-layer network with five neurons in the hidden layer and only one neuron in the output layer. Table 2 shows the truth table of the three-input odd parity function. In this task, the activation function is set as the sigmoid function. The learning rate η is 0.2, and the significantly important parameter \omega_{per}, the amount of weight simultaneous perturbation, is set to 0.002. If \omega_{per} is too large, the whole system may not converge, according to Eq. (11); on the contrary, if it is too small, the number of iterations becomes very large. The magnitude of the perturbation voltage, namely a x_{per}, is 40 mV. The curve of network training error versus iterations is shown in Fig. 8. Eight different test samples, i.e., all possible inputs of the three-input odd parity function, are finally used to test the performance of the network. The target output and actual output are shown in Fig. 9, where the mean squared error (MSE) is 0.0016. As shown in Fig. 8, about 1000 iterations are needed when training with the WSP algorithm. For the same task, the iterations with the RWC algorithm are approximately 10,000, and the iterations of BP are about 1000 in [38, Fig. 12], which indicates that the performance of the proposed memristor-based neural network with WSP training is satisfactory.

Table 2  Truth table of the three-input odd parity function

x1  x2  x3  Output
0   0   0   0
0   0   1   1
0   1   0   1
0   1   1   0
1   0   0   1
1   0   1   0
1   1   0   0
1   1   1   1
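For readers who want to reproduce this experiment at the algorithmic (software) level, the sketch below trains a 3-5-1 network on the truth table of Table 2 with the WSP rule and the parameters quoted above (η = 0.2, ω_per = 0.002, sigmoid hidden activation). The weight initialization, the random seed and the number of epochs are assumptions; convergence behavior will vary with them, and the sketch ignores all circuit-level effects.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Truth table of the three-input odd parity function (Table 2).
X = np.array([[i, j, k] for i in (0, 1) for j in (0, 1) for k in (0, 1)], dtype=float)
D = X.sum(axis=1) % 2                       # odd-parity targets

W1 = rng.normal(scale=0.5, size=(5, 3))     # hidden layer: 5 neurons (assumed initialization)
W2 = rng.normal(scale=0.5, size=(1, 5))     # output layer: 1 neuron
eta, w_per = 0.2, 0.002

def forward(W1, W2, x):
    return W2 @ sigmoid(W1 @ x)             # o = W2 f(W1 x), Eq. (45)

for epoch in range(2000):
    for x, d in zip(X, D):                  # one WSP update per pattern
        H1 = rng.choice([-1.0, 1.0], size=W1.shape)
        H2 = rng.choice([-1.0, 1.0], size=W2.shape)
        E_fir = 0.5 * np.sum((d - forward(W1, W2, x)) ** 2)
        E_sec = 0.5 * np.sum((d - forward(W1 + w_per * H1, W2 + w_per * H2, x)) ** 2)
        dE = E_sec - E_fir
        W1 -= eta * dE / w_per * H1         # Eq. (46)
        W2 -= eta * dE / w_per * H2

errors = [float(np.sum((d - forward(W1, W2, x)) ** 2)) for x, d in zip(X, D)]
print(sum(errors) / len(errors))            # mean squared error over the eight patterns
```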

Fig. 8  Training error versus iterations. a The training error with the algorithm. b The training error with the circuit

Fig. 9  The target output and the actual output of the circuit simulation for the eight samples. The red and blue dots represent the actual output and the target output, respectively. (Color figure online)

4.3 Iris classification

Another, more complex classification task, Iris classification, is conducted to test the accuracy of the proposed memristor-based architecture trained by the WSP algorithm. The Iris dataset [39] contains 150 patterns which can be divided into three classes (Setosa, Versicolor, Virginica), and each pattern consists of four features, which are normalized in this task. The simulation is executed using the designed two-layer network with four neurons in the hidden layer and three neurons in the output layer. In this task, the activation function is set as the sigmoid function. The three classes of the dataset are encoded with three bits, and the details are shown in Table 3, in which Output-X1, Output-X2 and Output-X3 represent the outputs of the three neurons in the output layer, respectively. In terms of network parameters, the learning rate η is 0.02, and the significantly important parameter \omega_{per} is set to 0.001. The training set size is 120. Finally, the training error is shown in Fig. 10, and 30 test samples are selected to test the network performance. In Fig. 11, the target output and actual output are depicted. The MSEs of the individual bits in Fig. 11a–c are 0.0004, 0.0012 and 0.0006, respectively. The iterations with the RWC algorithm are approximately 20,000 in Fig. 10c, while those of WSP are approximately 2,000 in Fig. 10b. The iterations of BP are about 1200 in [31, Fig. 8(b)]. Compared with the BP algorithm, more iterations are needed with WSP training. But comparing [31, Fig. 4(b)] with Fig. 6, the circuit structure trained by the WSP algorithm is simpler, because it does not involve complicated circuit operations such as the derivative of the activation function and error back propagation. These results indicate that the proposed scheme is practical.

Table 3  Coding list for the three classes

Category     Encoded output
             Output-X1  Output-X2  Output-X3
Setosa       1          0          0
Versicolor   0          1          0
Virginica    0          0          1
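The three-bit coding of Table 3, together with a winner-take-all readout of the three output neurons, can be written compactly as below; the argmax readout is an assumption made here for evaluating the analog outputs, not something prescribed by the circuit.

```python
import numpy as np

CLASSES = ("Setosa", "Versicolor", "Virginica")

# Table 3: each class is encoded with three bits (Output-X1, Output-X2, Output-X3).
ENCODING = {
    "Setosa":     np.array([1.0, 0.0, 0.0]),
    "Versicolor": np.array([0.0, 1.0, 0.0]),
    "Virginica":  np.array([0.0, 0.0, 1.0]),
}

def decode(output):
    """Map the three analog output-neuron values back to a class label."""
    return CLASSES[int(np.argmax(output))]

print(decode(np.array([0.12, 0.81, 0.07])))   # -> "Versicolor"
```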

Fig. 10  Training error versus iterations. a The training error with the algorithm. b The training error with the circuit. c The training error with the RWC algorithm

Fig. 11  The target output and the actual output of the circuit simulation for 30 test samples. a–c are, respectively, the results of the output neurons Output-X1, Output-X2 and Output-X3. The red and blue dots represent the actual output and the target output, respectively. (Color figure online)

5 Discussion

The iterations of the WSP, RWC and BP algorithms for the three-input odd parity function and Iris classification are shown in Table 4. The WSP algorithm is an approximation to the gradient descent algorithm, so its convergence is slower than that of the BP algorithm, based on Eq. (10). The RWC algorithm is based on the Brownian random motion equation, so its convergence is slower than that of the WSP algorithm, based on Eq. (6). Although it converges faster than RWC and WSP, the BP algorithm involves complex operations such as the derivative of the activation function and error back propagation. Moreover, in BP, the weight changes depend on the gradient of the error function. When the error function has a great number of local minima, the neural network easily converges to a local optimum. Many proposed solutions insert a perturbation into the network in order to escape from local minima; the introduction of noise during learning allows the solution to approach the global optimum. Furthermore, it has been shown that training with noise may improve the generalization capability of the network and reduce the negative effects of weight variations after learning [40]. The advantages of WSP with respect to BP are the ability to avoid local minima of the error function and the low computational complexity, which leads to a simplification of the hardware implementation.

Table 4  The iterations of the RWC, BP and WSP algorithms

Algorithm   Three-input odd parity function   Iris classification
RWC         10,000                            20,000
BP          900 [40]                          1200 [40]
WSP         1000                              2000

6 Conclusion

A memristor-based hardware implementation of a neural network with on-chip learning is proposed.

Moreover, the WSP algorithm, a hardware-friendly algorithm, is adopted to train the neural network; it requires neither error back propagation nor calculation of the derivative of the activation function. Only two feedforward operations are needed to update all weights. Therefore, the circuit architecture is simpler and more compact compared with that of the BP algorithm. By combining nonvolatile memristors and the WSP algorithm, a simple, compact and reliable hardware implementation of neural networks is put into practice. Two supervised learning tasks, the three-input odd parity function and Iris classification, are conducted to test the accuracy of the network, and the accuracy of the proposed architecture is similar to that of its software counterpart, which opens an opportunity for large-scale online learning in MNNs.

Acknowledgements  This work was supported in part by the National Natural Science Foundation of China (No. 61571185), the Natural Science Foundation of Hunan Province, China (No. 2016JJ2030) and the Open Fund Project of Key Laboratory in Hunan Universities (No. 15K027).

Compliance with ethical standards

Conflict of interest  The authors declare that they have no conflict of interest.

References

1. Tao, F., Busso, C.: Gating neural network for large vocabulary audiovisual speech recognition. IEEE/ACM Trans. Audio Speech Lang. Process. (TASLP) 26(7), 1286–1298 (2018)
2. Talaśka, T., Kolasa, M., Dlugosz, R., Pedrycz, W.: Analog programmable distance calculation circuit for winner takes all neural network realized in the CMOS technology. IEEE Trans. Neural Netw. Learn. Syst. 27(3), 661–673 (2016)
3. Talaśka, T., Kolasa, M., Dlugosz, R., Farine, P.A.: An efficient initialization mechanism of neurons for Winner Takes All Neural Network implemented in the CMOS technology. Appl. Math. Comput. 267, 119–138 (2015)
4. Sengupta, A., Banerjee, A., Roy, K.: Hybrid spintronic-CMOS deep neural network with on-chip learning: devices, circuits, and systems. Phys. Rev. Appl. 6(6), 064003 (2016)
5. Dlugosz, R., Kolasa, M., Pedrycz, W., Szulc, M.: Parallel programmable asynchronous neighborhood mechanism for Kohonen SOM implemented in CMOS technology. IEEE Trans. Neural Netw. 22(12), 2091–2104 (2011)
6. Wu, X., Saxena, V., Zhu, K., Balagopal, S.: A CMOS spiking neuron for brain-inspired neural networks with resistive synapses and in situ learning. IEEE Trans. Circuits Syst. II 62(11), 1088–1092 (2015)
7. Pan, C., Naeemi, A.: Non-Boolean computing benchmarking for beyond-CMOS devices based on cellular neural network. IEEE J. Explor. Solid-State Comput. Devices Circuits 2, 36–43 (2016)
8. Goknar, I.C., Yildiz, M., Minaei, S., Deniz, E.: Neural CMOS-integrated circuit and its application to data classification. IEEE Trans. Neural Netw. Learn. Syst. 23(5), 717–724 (2012)
9. Valov, I., Kozicki, M.: Non-volatile memories: organic memristors come of age. Nat. Mater. 16, 1170–1172 (2017)
10. Wang, Z., Joshi, S., Savel'ev, S.E., Jiang, H., Midya, R., Lin, P., Wu, Q.: Memristors with diffusive dynamics as synaptic emulators for neuromorphic computing. Nat. Mater. 16(1), 101 (2017)
11. Herrmann, E., Rush, A., Bailey, T., Jha, R.: Gate controlled three-terminal metal oxide memristor. IEEE Electron Device Lett. 39(4), 500–503 (2018)
12. van de Burgt, Y., Lubberman, E., Fuller, E.J., Keene, S.T., Faria, G.C., Agarwal, S., Salleo, A.: A non-volatile organic electrochemical device as a low-voltage artificial synapse for neuromorphic computing. Nat. Mater. 16(4), 414–418 (2017)
13. Gupta, I., Serb, A., Khiat, A., Zeitler, R., Vassanelli, S., Prodromakis, T.: Sub 100 nW volatile nano-metal-oxide memristor as synaptic-like encoder of neuronal spikes. IEEE Trans. Biomed. Circuits Syst. 12(2), 351–359 (2018)
14. Strukov, D.B., Snider, G.S., Stewart, D.R., Williams, R.S.: The missing memristor found. Nature 453(7191), 80 (2008)
15. Zhou, L., Wang, C., Zhou, L.: Generating hyperchaotic multi-wing attractor in a 4D memristive circuit. Nonlinear Dyn. 85(4), 2653–2663 (2016)
16. Zhou, L., Wang, C., Zhou, L.: A novel no-equilibrium hyperchaotic multi-wing system via introducing memristor. Int. J. Circuit Theory Appl. 46(1), 84–98 (2018)
17. Panwar, N., Rajendran, B., Ganguly, U.: Arbitrary spike time dependent plasticity (STDP) in memristor by analog waveform engineering. IEEE Electron Device Lett. 38(6), 740–743 (2017)
18. Cai, W., Ellinger, F., Tetzlaff, R.: Neuronal synapse as a memristor: modeling pair- and triplet-based STDP rule. IEEE Trans. Biomed. Circuits Syst. 9(1), 87–95 (2015)
19. Nishitani, Y., Kaneko, Y., Ueda, M.: Supervised learning using spike-timing-dependent plasticity of memristive synapses. IEEE Trans. Neural Netw. Learn. Syst. 26(12), 2999–3008 (2015)
20. Boyn, S., Grollier, J., Lecerf, G., Xu, B., Locatelli, N., Fusil, S., et al.: Learning through ferroelectric domain dynamics in solid-state synapses. Nat. Commun. 8, 14736 (2017)
21. Kheradpisheh, S.R., Ganjtabesh, M., Thorpe, S.J., Masquelier, T.: STDP-based spiking deep convolutional neural networks for object recognition. Neural Netw. 99, 56–67 (2018)
22. Sheri, A.M., Hwang, H., Jeon, M., Lee, B.G.: Neuromorphic character recognition system with two PCMO memristors as a synapse. IEEE Trans. Ind. Electron. 61(6), 2933–2941 (2014)
23. Legenstein, R., Naeger, C., Maass, W.: What can a neuron learn with spike-timing-dependent plasticity? Neural Comput. 17(11), 2337–2382 (2005)
24. Alibart, F., Zamanidoost, E., Strukov, D.B.: Pattern classification by memristive crossbar circuits using ex situ and in situ training. Nat. Commun. 4, 2072 (2013)
25. Prezioso, M., Merrikh-Bayat, F., Hoskins, B.D., Adam, G.C., Likharev, K.K., Strukov, D.B.: Training and operation of an integrated neuromorphic network based on metal-oxide memristors. Nature 521, 61–64 (2015)
26. Li, C., Belkin, D., Li, Y., Yan, P., Hu, M., Ge, N., Song, W., et al.: Efficient and self-adaptive in-situ learning in multilayer memristor neural networks. Nat. Commun. 9, 2385 (2018)
27. Bayat, F.M., Prezioso, M., Chakrabarti, B., Nili, H., Kataeva, I., Strukov, D.: Implementation of multilayer perceptron network with highly uniform passive memristive crossbar circuits. Nat. Commun. 9, 2331 (2018)
28. Hu, X., Feng, G., Duan, S., Liu, L.: A memristive multilayer cellular neural network with applications to image processing. IEEE Trans. Neural Netw. Learn. Syst. 28(8), 1889–1901 (2016)
29. Zeng, X., Wen, S., Zeng, Z., Huang, T.: Design of memristor-based image convolution calculation in convolutional neural network. Neural Comput. Appl. 30(2), 503–508 (2018)
30. Adhikari, S.P., Yang, C., Kim, H., Chua, L.O.: Memristor bridge synapse-based neural network and its learning. IEEE Trans. Neural Netw. Learn. Syst. 23(9), 1426–1435 (2012)
31. Soudry, D., Di Castro, D., Gal, A., Kolodny, A., Kvatinsky, S.: Memristor-based multilayer neural networks with online gradient descent training. IEEE Trans. Neural Netw. Learn. Syst. 26(10), 2408–2421 (2015)
32. Adhikari, S.P., Kim, H., Budhathoki, R.K., Yang, C., Chua, L.O.: A circuit-based learning architecture for multilayer neural networks with memristor bridge synapses. IEEE Trans. Circuits Syst. I 62(1), 215–223 (2015)
33. Alspector, J., Meir, R., Yuhas, B., Jayakumar, A., Lippe, D.: A parallel gradient descent method for learning in analog VLSI neural networks. In: Advances in Neural Information Processing Systems, pp. 836–844 (1993)
34. Kvatinsky, S., Friedman, E.G., Kolodny, A., Weiser, U.C.: TEAM: threshold adaptive memristor model. IEEE Trans. Circuits Syst. I 60(1), 211–221 (2013)
35. Belli, M.R., Conti, M., Turchetti, C.: Analog Brownian weight movement for learning of artificial neural networks. In: European Symposium on Artificial Neural Networks (ESANN), pp. 19–21 (1995)
36. Conti, M., Orcioni, S., Turchetti, C.: A new stochastic learning algorithm for analog hardware implementation. In: International Conference on Artificial Neural Networks (ICANN), pp. 1171–1176 (1998)
37. Kvatinsky, S., Satat, G., Wald, N., Friedman, E.G., Kolodny, A., Weiser, U.C.: Memristor-based material implication (IMPLY) logic: design principles and methodologies. IEEE Trans. Very Large Scale Integr. Syst. 22(10), 2054–2066 (2014)
38. Yang, C., Kim, H., Adhikari, S.P., Chua, L.O.: A circuit-based neural network with hybrid learning of backpropagation and random weight change algorithms. Sensors 17, 16 (2016)
39. Bache, K., Lichman, M.: UCI machine learning repository. http://archive.ics.uci.edu/ml (2018)
40. Murray, A.F., Edwards, P.J.: Enhanced MLP performance and fault tolerance resulting from synaptic weight noise during training. IEEE Trans. Neural Netw. 5(5), 792–802 (1994)
