
Article

Hybrid Circuit of Memristor and Complementary Metal-Oxide-Semiconductor for Defect-Tolerant Spatial Pooling with Boost-Factor Adjustment

Tien Van Nguyen, Khoa Van Pham and Kyeong-Sik Min *

School of Electrical Engineering, Kookmin University, Seoul 02707, Korea
* Correspondence: [email protected]; Tel.: +82-2-910-4634

 Received: 17 June 2019; Accepted: 29 June 2019; Published: 1 July 2019 

Abstract: Hierarchical Temporal Memory (HTM) has been known as a software framework that models the brain's neocortical operation. However, mimicking the brain's neocortical operation by hardware rather than software is more desirable, because the hardware can not only describe the neocortical operation but can also exploit the brain's architectural advantages. To develop a hybrid circuit of memristor and Complementary Metal-Oxide-Semiconductor (CMOS) for realizing HTM's spatial pooler (SP) in hardware, memristor defects such as stuck-at-faults and variations should be considered. To solve the defect problem, we first show that the boost-factor adjustment can make HTM's SP defect-tolerant, because the false activation of defective columns is suppressed. Second, we propose a memristor-CMOS hybrid circuit with the boost-factor adjustment to realize this defect-tolerant SP in hardware. The proposed circuit does not rely on the conventional defect-aware mapping scheme, which cannot avoid the false activation of defective columns. For the Modified National Institute of Standards and Technology (MNIST) vectors, the boost-factor adjusted crossbar with defects = 10% shows a rate loss of only ~0.6%, compared to the ideal crossbar with defects = 0%. On the contrary, the defect-aware mapping without the boost-factor adjustment demonstrates a significant rate loss of ~21.0%. The energy overhead of the boost-factor adjustment is only ~0.05% of the programming energy of the memristor synapse crossbar.

Keywords: memristor-CMOS hybrid circuit; defect-tolerant spatial pooling; boost-factor adjustment; memristor crossbar; neuromorphic hardware

1. Introduction

The human brain's neocortex covers the brain's surface area and is known to carry out most of the brain's intelligent functions. The thickness of the neocortex has been observed to be as thin as 2.5 mm, where six layers are stacked one by one [1–3]. The six neocortical layers seem to be columnar, in which complicated vertical and horizontal synaptic connections are intertwined among neurons to form a 3-dimensional neuronal architecture [4,5]. The neocortical neurons collectively respond to human sensory information from the retina, cochlea, and olfactory organ [6]. The collective activation of neocortical neurons is trained over and over with respect to time, by changing the strength of synaptic connections according to the sensory stimuli. The neuronal activation and synaptic plasticity can be thought of as fundamental aspects of human perception and cognition, which are computed in a different way from conventional Von Neumann machines.

As a software framework, Hierarchical Temporal Memory (HTM) has been developed to model the cognitive functions of the neocortex [7–11]. By doing so, HTM can recognize and interpret various spatiotemporal patterns, mimicking how the human brain's neocortex understands sensory stimuli. The software framework of HTM is divided into two functional blocks: Spatial Pooler (SP)


and Temporal Memory (TM). The role of SP is receiving and learning the sensory information. In SP, the sensory information is transformed into the collective activation of neocortical neurons. From the biological experiments, the neocortical neurons have been observed to be activated sparsely, not densely, in response to human sensory stimuli. The sparse activation of neocortical neurons is mathematically described as Sparse Distributed Representation (SDR) in HTM [1]. After SP learns the spatial features of the sensory stimuli, TM responds to the temporal sequences of SDR patterns generated from SP. By learning the temporal sequences of SDR patterns, TM can perform recognition and prediction for them.

Figure 1a shows a conceptual diagram of SP operation, where the input-space neurons are mapped to the SP neurons [8]. Here, the input-space and SP neurons refer to the neurons of the sensory organ and neocortex, respectively. The sensory stimuli generated from the input-space neurons are connected with the neocortical neurons, as indicated in Figure 1a. The lines between the input and the SP spaces represent the synaptic connections. Synaptic weights of the connections are trained according to the Hebbian learning rule in HTM [8]. If an SP neuron becomes active in response to an input-space stimulus, the synaptic weights belonging to this neuron are strengthened, and weakened otherwise [8]. The circle zone in the SP space represents a local inhibition area, within which only a few neurons are allowed to be active. In HTM, the size of the inhibition zone in the SP space can be decided to control the sparsity of neuronal activation. It has been known that the percentage of neuronal activation is as sparse as 2% on average in the brain's neocortex. This low sparsity of neuronal activation may have something to do with the high energy-efficiency of neocortical cognitive operation.

Figure 1. (a) The conceptual diagram of Spatial Pooler (SP) operation, where the input-space neurons are mapped to the SP neurons; and (b) the comparison of the ideal crossbar without defects and the real crossbar with defects. LRS and HRS mean Low Resistance State and High Resistance State, respectively.

In the previous publications, we developed hybrid CMOS-memristor circuits for implementing HTM, which was originally developed as a software framework, as mentioned earlier [12,13].

Memristors have been studied intensively for many years for their potential in neuromorphic hardware, since the first experimental demonstration [14,15]. This is because the memristive behaviors seem very similar to the synaptic plasticity observed experimentally in biological neurons. From the biological experiments, the synaptic connections have been observed to be strengthened or weakened dynamically by electrical spiking signals applied to them [16]. Moreover, memristors can be fabricated to build a 3-dimensional crossbar architecture using the CMOS-compatible Back-End-Of-Line (BEOL) process [17,18]. The 3-dimensional connectivity of memristor synapses is very similar to the anatomical structure of the biological neocortex. In terms of cognitive functions, the memristor crossbar can perform vector-matrix multiplications in parallel, which can be considered very important in implementing energy-efficient computing like the human brain's cognition, unlike the state-of-the-art Von Neumann based computers [19,20].

One important thing to consider in the memristor crossbar is defects, as shown in Figure 1b. In the real memristor crossbar, there are stuck-defects, such as stuck-at-0, stuck-at-1, etc. [21]. In addition, variation-related defects can also be considered, where each memristor can have different LRS and HRS values due to process variations [22]. Here, LRS and HRS mean Low Resistance State and High Resistance State, respectively. Figure 1b compares the ideal crossbar (without defects) and the real one (with defects). The solid and open red circles with stars represent stuck-at-LRS and stuck-at-HRS defects, respectively. The memristor defects such as stuck-at-faults and variations may be caused by the randomness of the filamentary current path, which can be formed or erased by the current and voltage applied to the memristor. The filamentary current path created or erased during memristor programming can have statistical distributions. The statistical distributions from device-to-device, wafer-to-wafer, lot-to-lot, and process-to-process variations lead to the variations in memristance and to stuck-at-faults [21].

To minimize the loss of recognition rate due to these memristor defects, we can consider the defect-tolerance scheme based on the conventional defect-aware mapping [21]. To explain the previous defect mapping scheme, the following logic function is assumed to be implemented in the crossbar [21]: f = X1X2 + X2X3 + X3X1 + /X1/X2/X3. In the logic function, /X1 means the inversion of X1. Figure 2a shows the real memristor crossbar (with defects). Here, I1, I2, etc. represent input columns. O1, O2, etc. are output rows. The gray circle indicates a good memristor cell, which can be programmed to HRS or LRS. The solid and open red circles represent stuck-at-1 and stuck-at-0 defects, respectively. Figure 2b shows the direct mapping without considering the defect map. P1, P2, P3, and P4 indicate the first, second, third, and fourth partial products in the target logic function. P1 calculates X1X2. However, P2 calculates X1X2X3, not the X2X3 defined in the logic function, because of the stuck-at-1 fault on the crossing point between X1 and P2. P4 also calculates a wrong partial product. The stuck-at-0 fault is found at the crossing point between /X2 and P4. By doing so, P4 calculates /X1/X3 instead of the target product of /X1/X2/X3.


Figure 2. (a) The real crossbar with defects; (b) the direct mapping of the logic function without considering the defect map; (c) the defect-aware mapping of the logic function considering the defect type and location; (d) the flowchart of crossbar training using the conventional defect-aware mapping [21]; and (e) the proposed flowchart of the defect-tolerant crossbar training without using the defect map.

Figure 2c shows the defect-aware mapping, where the defects can be used in implementing the logic function according to the defect type and location. To do so, the crossbar's rows in Figure 2c are reordered to consider the defect type and location in calculating the partial products. For example, the first row in Figure 2c is assigned to P3, not P1. P1 is assigned to the second row to calculate X1X2. The stuck-at-1 fault on the second row can be used in calculating P1 = X1X2. Similarly, the stuck-at-1 fault on P4 can be employed to calculate P4 = /X1/X2/X3. Moreover, the stuck-at-0 faults on P2 and P4 do not cause a wrong result for the calculation of the partial products of P2 and P4. As shown in Figure 2c, the defects can be employed in implementing the target logic function according to the defect type and location. However, the defect-aware mapping scheme demands very complicated circuits, such as memory, processor, controller, etc., to be implemented in hardware.

Figure 2d shows the flowchart of crossbar training using the conventional defect-aware mapping. After fabricating the memristor crossbar, the defect map should be obtained by measuring the crossbar. As a post-fabrication configuration, the trained synaptic weights can be transferred to the crossbar using the defect-aware mapping, as explained in Figure 2c. To do so, however, complicated digital circuits, such as memory, controller, processor, etc., are needed for implementing the defect-aware mapping in hardware, as mentioned earlier.
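To make the conventional defect-aware mapping of Figure 2b,c concrete, the following minimal Python sketch (our own illustration, not the algorithm or circuitry of [21]; the fault map and helper names are hypothetical) checks whether a row with known stuck-at faults can still realize a target product term, and searches for a compatible assignment of partial products to rows by brute force.

```python
from itertools import permutations

# Column indices of the literals X1, X2, X3, /X1, /X2, /X3 -> 0..5.
# Each partial product of f = X1X2 + X2X3 + X3X1 + /X1/X2/X3 is the set of
# literal columns that must be programmed to LRS on its row.
PRODUCTS = [{0, 1}, {1, 2}, {2, 0}, {3, 4, 5}]          # P1..P4

def row_can_realize(product, stuck_at_1, stuck_at_0):
    """A row can realize a product term if every stuck-at-1 (LRS) cell lies on
    a literal the product needs, and no stuck-at-0 (HRS) cell does."""
    return stuck_at_1 <= product and not (stuck_at_0 & product)

def defect_aware_mapping(products, faults):
    """faults[row] = (stuck_at_1_columns, stuck_at_0_columns).
    Brute-force search for a row order that realizes every product."""
    rows = list(faults)
    for order in permutations(rows, len(products)):
        if all(row_can_realize(p, *faults[r]) for p, r in zip(products, order)):
            return dict(zip(["P1", "P2", "P3", "P4"], order))
    return None                                          # no compatible mapping

# Hypothetical fault map loosely inspired by Figure 2a (not the exact one).
faults = {
    "O1": (set(), set()),
    "O2": ({0}, set()),      # stuck-at-1 on the X1 column
    "O3": (set(), {0}),      # stuck-at-0 on the X1 column
    "O4": ({5}, set()),      # stuck-at-1 on the /X3 column
}
print(defect_aware_mapping(PRODUCTS, faults))
# -> {'P1': 'O1', 'P2': 'O3', 'P3': 'O2', 'P4': 'O4'}
```

Even this toy search already needs the fault map, bookkeeping memory, and a controller, which is the hardware overhead the proposed scheme avoids.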
Not using the defect-aware mapping, in this paper we propose a simple memristor-CMOS hybrid circuit of defect-tolerant spatial-pooling, which does not need the complicated circuits of memory, controller, processor, etc., as shown in Figure 2e, where, unlike in Figure 2d, the crossbar's defect map is not used. For developing the memristor-CMOS hybrid circuit, we first show that the spatial-pooling based on Hebbian learning can be defect-tolerant, owing to the boost-factor adjustment, in Section 2. Additionally, we propose a new memristor-CMOS hybrid circuit, where the winner-take-all circuit is implemented without using capacitors that occupy a large area. In Section 3, the proposed hybrid circuit is verified to recognize well the Modified National Institute of Standards and Technology (MNIST) hand-written digits, in spite of memristor defects such as stuck-at-faults, variations, etc. In Section 4, we discuss and compare the following three cases: (1) Spatial-pooling without both the boost-factor adjustment and the defect-aware mapping, (2) spatial-pooling with the defect-aware mapping, and (3) spatial pooling with the boost-factor adjustment, in terms of hardware implementation, energy consumption, and recognition rate. Finally, in Section 5, we summarize this paper.

2. Materials and Methods

To develop a memristor-CMOS hybrid circuit for realizing HTM's SP function in hardware, memristor defects such as stuck-at-faults and variations should be considered. To consider the memristor defects in developing the hybrid circuit of the SP function, we explain the memristor fabrication and its behavioral model in the following sub-section 'a. Materials'. Then, we describe how the boost-factor adjustment in HTM's SP operation can make it defect-tolerant, because the false activation of defective columns in the crossbar is suppressed, in the sub-section 'b. Methods (scheme)'. In the sub-section 'c. Methods (circuit)', we propose the memristor-CMOS hybrid circuit by explaining its schematic and operation in detail. The hybrid circuit with the boost-factor adjustment is discussed and compared with the previous techniques without the boost-factor adjustment later in this paper. The simulation results and comparison indicate that the memristor-CMOS hybrid circuit with the boost-factor adjustment can improve the recognition rate by more than ~20% over the previous defect-map-based technique. This hybrid circuit can be very useful for energy-efficient computing in future IoT systems, where many IoT sensors are connected to a cloud of centralized data processing, as explained later.

a. Materials

Figure 3a shows a cross-sectional view of the memristor fabricated in this paper. The fabricated memristor has a film structure made of a Pt/LaAlO3/Nb-doped SrTiO3 stack [23]. A microscope picture of the measured device is shown in Figure 3b, where the top electrode area is 100 µm × 100 µm [24]. The top and bottom electrodes of the measured device were formed by platinum (Pt) and SrTiO3, respectively [23]. Figure 3c shows the current–voltage relationships of the fabricated memristor and the Verilog-A model, respectively [23]. The measurement was performed by a Keithley-4200 (Semiconductor Characterization System, Tektronix, Inc., Beaverton, OR, USA) [23]. Here, the HRS/LRS ratio in Figure 3c was observed to be as large as 100. The black and red lines in Figure 3c represent the behavioral model of memristors and the measured data, respectively. The behavioral model described by Verilog-A in Figure 3c was used in the circuit simulation of the memristor-CMOS hybrid circuit in this paper.
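The Verilog-A behavioral model used in the circuit simulations follows the equations reported in [23], which are not reproduced here. As a rough, generic stand-in, the sketch below shows a threshold-type behavioral memristor model (a common textbook simplification, not the model of [23]); a state variable w in [0, 1] interpolates the conductance between HRS and LRS with the HRS/LRS ratio of about 100 observed in Figure 3c, and the threshold voltage and drift rate are assumed values.

```python
class MemristorModel:
    """Generic threshold-type behavioral memristor (illustrative only).

    The conductance is a linear mix of G_HRS and G_LRS set by a state w in
    [0, 1]; w drifts only when |v| exceeds a threshold. This is a common
    simplification and NOT the Verilog-A model of [23]."""

    def __init__(self, r_lrs=1e6, r_hrs=1e8, v_th=1.0, rate=50.0, w0=0.0):
        self.g_lrs = 1.0 / r_lrs     # LRS conductance (assumed LRS = 1 MOhm)
        self.g_hrs = 1.0 / r_hrs     # HRS conductance (HRS/LRS ratio ~100)
        self.v_th = v_th             # programming threshold voltage (assumed)
        self.rate = rate             # state-change rate per (V*s) (assumed)
        self.w = w0                  # internal state: 0 = HRS, 1 = LRS

    def conductance(self):
        return self.w * self.g_lrs + (1.0 - self.w) * self.g_hrs

    def current(self, v):
        return self.conductance() * v

    def apply_pulse(self, v, dt):
        """Update the state for a programming pulse of amplitude v, width dt."""
        if abs(v) > self.v_th:
            sign = 1.0 if v > 0 else -1.0
            self.w = min(1.0, max(0.0, self.w + sign * self.rate * (abs(v) - self.v_th) * dt))


m = MemristorModel()
for _ in range(20):                  # a train of SET pulses drives w toward LRS
    m.apply_pulse(v=2.0, dt=1e-3)
print(f"w = {m.w:.2f}, R = {1.0 / m.conductance():.2e} Ohm")
```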

Figure 3. (a) The cross-sectional view of the measured memristor [23]; (b) the microscope picture of the measured memristor [24]; and (c) the memristor's current–voltage relationships of the measurement and Verilog-A model [23].

b. Methods (scheme): boost-factor adjustment scheme for defect-tolerant spatial-pooling

The spatial-pooling in the HTM software framework is composed of initialization, overlap computation, inhibition, and learning, as shown in Figure 4a [8,12]. After the initialization step (Phase 1), three steps: overlap computation (Phase 2), inhibition (Phase 3), and learning (Phase 4), are repeated sequentially [8,12]. In Phase 1, random sets of inputs are selected from the input space, as indicated in Figure 1a. The number of random sets of inputs per training vector is the same as the number of the crossbar's columns. Each input in this random set can be connected to an output neuron in the SP via a synapse [8,12]. In Phase 2, the amount of overlap of each output neuron with the chosen set of inputs from the input space is calculated [8,12]. The amount of overlap of each neuron in the SP can be calculated as the number of connected synapses with the active inputs, multiplied by each column's boost factor. In Phase 3, we decide which columns can be winners within the inhibition radius [8,12]. By doing so, the sparsity regarding the percentage of activation in the neocortical neurons can be controlled so as not to exceed a certain limit. In the case of the human brain's neocortex, only 2% of neocortical neurons have been observed to be activated in response to human sensory stimuli. In Phase 4, Hebbian learning is performed to strengthen and weaken synaptic connections [8,12]. For the winners chosen in Phase 3, the synaptic permanence values for the active inputs are increased by p+. For the inactive inputs, the permanence values are decreased by p-. p+ and p- represent the increment and decrement of synaptic permanence, respectively. The permanence value is allowed to vary between 0 and 1. If it reaches 1 or 0, the synaptic weight is changed to LRS or HRS, respectively.
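The four phases can be summarized by the short NumPy sketch below. It is a behavioral sketch of the algorithm as described above, not the HTM reference implementation: the inhibition is simplified to a global top-k selection, and the parameter names (n_winners, p_plus, connect_th, etc.) are our own.

```python
import numpy as np

rng = np.random.default_rng(0)

n_inputs, n_columns = 400, 256        # crossbar rows and columns used in this paper
n_winners = 2                         # winners kept per inhibition step (simplified: global)
p_plus, p_minus = 0.01, -0.01         # permanence increment / decrement
connect_th = 0.5                      # permanence above this counts as a connected synapse

permanence = rng.random((n_inputs, n_columns))   # Phase 1: random initialization
boost = np.ones(n_columns)                       # boost factors start equal

def spatial_pool_step(x):
    """One SP iteration for a binary input vector x of length n_inputs."""
    connected = (permanence >= connect_th).astype(float)

    # Phase 2: overlap = (# of connected synapses on active inputs) * boost factor
    overlap = boost * (x @ connected)

    # Phase 3: inhibition -- keep only the n_winners columns with the largest overlap
    active = np.zeros(n_columns, dtype=bool)
    active[np.argsort(overlap)[-n_winners:]] = True

    # Phase 4: Hebbian learning on the winning columns only
    for c in np.flatnonzero(active):
        permanence[:, c] += np.where(x > 0, p_plus, p_minus)
    np.clip(permanence, 0.0, 1.0, out=permanence)
    return active                                 # the SDR output bits

sdr = spatial_pool_step(rng.integers(0, 2, n_inputs))
print("active columns:", np.flatnonzero(sdr))
```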


Figure 4. (a) The spatial-pooling algorithm composed of initialization, overlap computation, inhibition, and learning [8,12]; and (b) the defect map of a memristor crossbar with 10% random defects. Here the numbers of rows and columns of the crossbar are 400 and 256, respectively. The random defects are stuck-at-LRS and stuck-at-HRS defects.

One more parameter that needs to be updated after the activation of each neuron is the boost factor. The boost factor can be defined with the following inverse relationship with the activity ratio [8]:

$b_i = e^{-\beta\left(a_i - \langle a_{i,\mathrm{neighbor}} \rangle\right)}$ (1)

Here, bi means the boost factor of column i. β is a positive parameter that controls the strength of the adaptation effect. ai is the activity ratio of column i, and ⟨ai,neighbor⟩ means the average activity ratio of the column's neighborhood. For given M test vectors, the activity ratio of column i can be calculated with

$a_i = \frac{1}{M}\sum_{j=1}^{M} \mathrm{activation}(\mathrm{column}, i)$ (2)

Here, ai is the activity ratio of column i. M is the number of test vectors. The activation function defined with 'activation(column, i)' in Equation (2) becomes one if column i is activated. If the column is not activated, the activation function is zero. For a neuron activated very frequently, its boost factor should be adjusted to be very small to lower the probability of activation. On the contrary, if a neuron is chosen very rarely, its boost factor should be increased. As explained earlier, by adjusting each column's boost factor according to each column's activity ratio, the number of activations can be distributed more evenly for all columns in the crossbar.
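Equations (1) and (2) translate directly into the following sketch (variable names such as activation_history, beta, and radius are ours; the neighborhood size and β value are illustrative, not the settings used in the paper), which recomputes every column's boost factor from its activity ratio relative to the average activity of its neighborhood.

```python
import numpy as np

def update_boost(activation_history, beta=10.0, radius=32):
    """activation_history: (M, N) 0/1 matrix of M test vectors x N columns.

    a_i of Equation (2) is the column-wise activation average; b_i of
    Equation (1) is exp(-beta * (a_i - <a_i,neighbor>)). beta and the
    neighborhood radius are illustrative values only."""
    a = activation_history.mean(axis=0)                       # Equation (2)
    boost = np.empty(a.size)
    for i in range(a.size):
        lo, hi = max(0, i - radius), min(a.size, i + radius + 1)
        boost[i] = np.exp(-beta * (a[i] - a[lo:hi].mean()))   # Equation (1)
    return boost

# A column that fires far more often than its neighbors (e.g., because of
# stuck-at-LRS defects) gets a boost factor well below 1; its neighbors get
# slightly more than 1, which evens out the activity across columns.
rng = np.random.default_rng(1)
history = (rng.random((1000, 256)) < 0.02).astype(int)   # ~2% baseline sparsity
history[:, 7] = 1                                        # column #7 is always active
print(update_boost(history)[[6, 7, 8]])
```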

We now discuss how each column's activity ratio can be affected by memristor defects. Figure 4b shows a defect map of a 400 × 256 memristor crossbar. Here, we assume random defects = 10% in the crossbar. The random defects can be stuck-at-HRS and stuck-at-LRS. Because the HRS defects do not cause erroneous activation of neurons, we focus on the LRS defects here. In Figure 5a, the number of defects per column is ranked from the largest to the smallest. The number of columns in the crossbar is assumed to be 256 in Figure 5a. Each column is assumed to have 400 cells. Among the 400 cells per column, the most defective column has almost ~90 defects. The smallest number of defects per column is ~0. Figure 5b,c compare the simulated boost factors of the crossbars without and with the boost-factor adjustment, respectively. In Figure 5b, all 256 columns have the same boost factor, fixed at 50. On the contrary, in Figure 5c, each column's boost factor is adjusted between 0 and 100, according to each column's activity ratio.

Figure 5d,e compare the activity ratios of the crossbars without and with the boost-factor adjustment. Here, each column's activity ratio is shown on the y-axis with respect to the column number ranked according to the number of defects. Figure 5d clearly indicates that a large number of defects in a defective column causes frequent activation of the column. A small number of defects results in rare activation of the column. The frequent activation due to the defective column is very likely to be false and should be suppressed. Figure 5e shows that the frequent activation of the defective columns can be suppressed by decreasing the boost factor of the defective columns below that of their neighbors. By doing so, we can reduce the false activation of the defective columns. Thus, the recognition rate loss due to the defective columns can be minimized by the boost-factor adjustment. In Figure 5f, the crossbar's entropy is compared without and with the boost-factor adjustment. The entropy of the crossbar with N columns is calculated with Equation (3) [8].

$\mathrm{Entropy} = \sum_{i=1}^{N}\left[-a_i \log_2 a_i - (1 - a_i)\log_2(1 - a_i)\right]$ (3)

In Equation (3), 'Entropy' means the calculated amount of entropy. N is the number of columns in the crossbar. ai is the activity ratio of column i. 'log' means the logarithmic function. Figure 5f indicates that the crossbar with the boost-factor adjustment shows much larger entropy than the crossbar without the boost-factor adjustment. The better entropy can result in a better recognition rate, as shown later in this paper.
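Equation (3) is the sum of per-column binary entropies of the activity ratios. The helper below is a direct transcription (with a small epsilon added for numerical safety, which is our own addition) and illustrates why an evened-out activity profile yields a larger entropy than a defect-skewed one.

```python
import numpy as np

def crossbar_entropy(a, eps=1e-12):
    """Equation (3): sum over all columns of the binary entropy (in bits) of
    each activity ratio a_i. eps only guards log2(0) for columns that are
    always or never active."""
    a = np.clip(np.asarray(a, dtype=float), eps, 1.0 - eps)
    return float(np.sum(-a * np.log2(a) - (1.0 - a) * np.log2(1.0 - a)))

# A defect-skewed activity profile (a few columns dominate) has lower entropy
# than an evened-out profile with ~2% activity everywhere after boosting.
skewed = np.array([0.5] * 10 + [0.001] * 246)
evened = np.full(256, 0.02)
print(crossbar_entropy(skewed), crossbar_entropy(evened))
```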



Figure 5. (a) The number of defects per column ranked from largest (left) to smallest (right); (b) the simulated boost factor of the crossbar without the boost-factor adjustment; (c) the simulated boost factor of the crossbar with the boost-factor adjustment; (d) the simulated activity ratio of the crossbar without the boost-factor adjustment; (e) the simulated activity ratio of the crossbar with the boost-factor adjustment; and (f) the comparison of crossbar entropy without and with the boost-factor adjustment.

c. Methods (circuit): memristor-CMOS hybrid circuit of defect-tolerant spatial pooling

Figure 6a shows a schematic of the memristor-CMOS hybrid circuit of defect-tolerant spatial pooling, where each column's boost factor can be adjusted to make each column's activity ratio more even. The memristor crossbar is composed of 400 rows and 256 columns for recognizing the MNIST hand-written digits. The 400 rows can receive the 20 × 20 input pixels of each MNIST test vector. The 256 columns correspond to the 256 output neurons of the SP. In Figure 6a, X0 and X1 are the first and second rows, respectively. g0,0 means the memristor's conductance of row #0 and column #0. Similarly, g1,0 means the memristor's conductance of row #1 and column #0. I0 and I1 represent the currents of columns #0 and #1, respectively. I0 and I1 enter the current-to-voltage converters B0 and B1, respectively, where each column's boost factor can be adjusted according to each column's activity ratio. Here, V0 and V1 are the converted voltages of columns #0 and #1, respectively. The converted voltages, V0 and V1, enter the comparators C0 and C1, respectively, where V0 and V1 are compared with VREF. VREF is obtained from the maximum output voltage among the neighbors using the diode-connected MOSFETs M0, M1, etc. If we assume the diode 'ON' voltage is very small, VREF can be very similar to the maximum voltage among all the output voltages such as V0, V1, etc. If V0 or V1 is very close to VREF, then column #0 or column #1 will be activated as a winner, inhibiting the neighboring columns from being activated. One thing to note here is that the VREF for selecting the winner columns is obtained dynamically by extracting the largest output voltage among the neighbors. If a new input vector is applied to the crossbar, the output voltages change, too. Thus, we can obtain a new maximum voltage for the new input vector dynamically. Comparing each column's output voltage with the new maximum, we can choose the next winner columns that are very close to the new maximum voltage. Y0 and Y1 refer to the output SDR bits for columns #0 and #1, respectively.
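Behaviorally, the diode-based winner-take-all can be sketched as follows (a software abstraction of the circuit in Figure 6a, not a SPICE model; the diode drop and the winning margin are assumed values, and the 2-winners-per-64-column limit is the setting used later in the simulations): VREF tracks the maximum converted voltage minus a small diode drop, and any column whose voltage comes within a small margin of VREF is declared a winner.

```python
import numpy as np

def winner_take_all(v, zone=64, max_winners=2, v_diode=0.05, margin=0.01):
    """Behavioral sketch of the diode-connected winner-take-all of Figure 6a.

    v         : converted column voltages V0, V1, ... (one per SP column)
    v_diode   : assumed 'ON' drop of the diode-connected MOSFET network
    margin    : how close to VREF a column must be to win (assumption)"""
    sdr = np.zeros(v.size, dtype=int)
    for start in range(0, v.size, zone):                 # one inhibition zone at a time
        v_zone = v[start:start + zone]
        v_ref = v_zone.max() - v_diode                   # VREF from the diode network
        candidates = np.flatnonzero(v_zone >= v_ref - margin)
        # keep at most max_winners columns, preferring the largest voltages
        winners = candidates[np.argsort(v_zone[candidates])[-max_winners:]]
        sdr[start + winners] = 1                         # output SDR bits Y0, Y1, ...
    return sdr

rng = np.random.default_rng(2)
voltages = rng.random(256)                               # one voltage per SP column
print("winning columns:", np.flatnonzero(winner_take_all(voltages)))
```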
The circuit proposed in this paper does not use any capacitor for realizing the winner-take-all function, as shown in Figure 6a. This is different from the previous publications [12,20], where the capacitor was used to integrate the column current over time to accumulate the charge. The accumulated charge can be represented by the capacitor's voltage. If one column's voltage reaches a certain level at the earliest time, then that column is chosen as the winner [12,20]. Instead of capacitors occupying a very large area, the diode-connected MOSFETs are used here to obtain the maximum voltage among all output voltages, as shown in Figure 6a. The winning column can be chosen by comparing each column's output voltage with the maximum voltage extracted from the diode-connected MOSFETs. The diode-connected MOSFETs in Figure 6a can occupy a much smaller area than the capacitors used in the previous publications [12,20].

One problem with the winner-take-all circuit using the diode-connected MOSFETs in Figure 6a is that there may be multiple winners, not a single one, in some cases. To investigate the number of winning columns per input vector, the statistical sparsity distribution is compared between the previous winner-take-all and the proposed circuit in Figure 6a. To do so, the average and variance of the sparsity distribution are calculated for the previous winner-take-all and the proposed circuit in Figure 6a. The average sparsity values of the previous winner-take-all and the proposed circuit in Figure 6a are 2.2% and 2.3%, respectively. The calculated variance values are 0.09 and 0.11, respectively. The small difference in variance between the previous and proposed circuits indicates that the winner-take-all can be implemented with the diode-connected MOSFETs and voltage comparators, without using the capacitors occupying a very large area.

Figure 6b shows a detailed schematic of the current-to-voltage converter, B0, with the boost-factor adjustment. OP1 and OP2 are OP amplifiers. The column current, I0, goes through R1. The node voltage, N1, becomes -I0 × R1. The converted voltage from I0 is given to R2. Thereby, the current through R2 goes to Mb,0 and is finally converted to V0. V0 is the output voltage of the current-to-voltage converter. Here, we used R1 = 5 kΩ and R2 = 100 MΩ, respectively. For the boost-factor adjustment, Mb,0 should be changed according to the activity ratio of column #0, with respect to the activity ratios of the neighbors. S1, S2, and S3 are the switches controlled by SW0. SW0 is applied by the boost-factor adjustment controller. VP means the memristor programming pulse. VP is applied to Mb,0, through S1 and S2, to change the memristor's conductance. VP is applied to the boost-factor memristor for the boost-factor adjustment when SW0 is high. On the contrary, when SW0 is low, S3 becomes 'ON' and V0 is compared with the other output voltages such as V1, V2, etc.

Figure 6c shows the operational diagram of the proposed memristor-CMOS hybrid circuit illustrated in Figure 6a,b. As indicated in Figure 6c, the crossbar performs the overlap calculation, in which the input voltage is multiplied with the memristor's conductance. Then, each column's current can be calculated by summating all the cell currents belonging to the column. The column current enters the current-to-voltage converter. The converted voltage from each column is delivered to the winner-take-all, where the winning column is chosen.
Based on the winning column, the learning controller adjusts each column's boost factor and the permanence values of the cells belonging to the column, according to the Hebbian rule. The steps indicated in the operational diagram in Figure 6c are repeated when a new input vector is applied to the crossbar.


Figure 6. (a) The detailed schematic of the memristor-CMOS (Complementary Metal-Oxide-Semiconductor) hybrid circuit of defect-tolerant spatial pooling. The hybrid circuit is composed of the memristor crossbar, the current-to-voltage (I-to-V) converters with the boost-factor adjustment, and the winner-take-all circuit with the diode-connected Metal-Oxide-Semiconductor Field-Effect Transistors (MOSFETs) and comparators; (b) the detailed schematic of the voltage-converter circuit with the boost-factor adjustment; and (c) the operational diagram of the memristor-CMOS hybrid circuit.

The simulated waveforms in Figure 7 demonstrate the operation of the proposed memristor-CMOS hybrid circuit with the boost-factor adjustment. Here, the circuit simulation was performed using CADENCE SPECTRE (Cadence Design Systems, Inc., San Jose, CA, USA) and SAMSUNG 0.13-µm circuit simulation parameters [25]. The mathematical equations of the Verilog-A model of memristors used in the circuit simulation were explained in detail in a previous publication [23]. In the simulation, we assumed the memristor crossbar of SP with 400 rows and 256 columns. The number of synaptic memristors per column is 25 among 400 cells. It means that at most 25 cells can be activated among the 400 cells per column in the crossbar. The increment and decrement of permanence are +0.01 and -0.01, respectively. The initial permanence values are assumed to be random between 0 and 1. The minimum amount of overlap between the input-space and spatial-pooler space can start from zero. The amount of overlap can be calculated by multiplying the input voltages with the memristor synaptic weights. The size of the inhibition zone is 64 columns in the crossbar. The number of winning columns is not allowed to exceed 2 among 64 columns. Thereby, the sparsity in Figure 6a can be controlled within 2%, as the brain's neocortex does.

In Figure 7, during the crossbar training time, each column's activity ratio is calculated by counting the number of activations of each column. If column #0 becomes activated, Y0 becomes high. Similarly, if column #1 becomes activated, Y1 is high. After the crossbar training time, the boost factor can be adjusted according to each column's activity ratio, as described in Equation (1). The more frequent activation of column #0 decreases its boost factor more. The less frequent activation of column #1 reduces its boost factor only a little, as shown in Figure 7. For adjusting the boost factors of columns #0 and #1, the pulse widths of SW0 and SW1, respectively, are modulated. Mb,0 can be decreased more than Mb,1 by many programming pulses of VP, because SW0 is high for a longer time than SW1. On the contrary, Mb,1 is changed little, because SW1 is high only for a very short time. The pulse modulation of SW0 and SW1 can be controlled very easily by counting the number of activations of each column during the crossbar training time.
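The pulse-width control of SW0, SW1, etc. reduces to a simple counting rule, sketched below (our own illustration of the behavior described above; the time scales are assumptions): the more often a column won during the training time, the longer its switch stays high and the more VP programming pulses reach its boost-factor memristor Mb,i, which lowers that column's boost factor.

```python
def sw_high_times(activation_counts, t_unit=1.0):
    """SWi stays high for a time proportional to column i's activation count
    during the crossbar training time (t_unit is an assumed time scale)."""
    return [count * t_unit for count in activation_counts]

def vp_pulses_applied(high_times, t_vp_period=1.0):
    """Number of VP programming pulses that land on Mb,i while SWi is high
    (t_vp_period is an assumed VP pulse period)."""
    return [int(t // t_vp_period) for t in high_times]

# Example: column #0 won 12 times and column #1 won twice during training,
# so SW0 stays high six times longer and Mb,0 receives six times more
# programming pulses than Mb,1, lowering the boost factor of column #0 more.
counts = [12, 2]
print(vp_pulses_applied(sw_high_times(counts)))   # -> [12, 2]
```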


Figure 7. The simulated waveforms of the proposed memristor-CMOS hybrid circuit with the boost-factor adjustment.

3. Simulation Results

For calculating the recognition rate, we tested MNIST vectors [26,27] with the proposed memristor-CMOS hybrid circuit. To reflect the real crossbar with non-ideal effects, we considered source resistance, neuron resistance, wire resistance, etc., in the recognition-rate simulation [22]. Figure 8a shows a schematic of the memristor crossbar that includes these non-ideal parasitic effects. Here, RS and RN represent the source resistance and neuron resistance, respectively [2]. RW represents the wire resistance from metal layers. In the non-ideal crossbar, RN and RS are assumed to be 0.27% of HRS and 0.067% of HRS, respectively [22]. RW is assumed to be ~1 Ω per cell in this paper. These RS and RN values, which are 0.27% of HRS and 0.067% of HRS, respectively, are the worst-case values of the source and neuron resistance observed from the fabricated real crossbars [22]. In Figure 8a, V0, V1, and Vn represent the input voltages. I0, I1, and Im represent the column currents.

We now explain the crossbar architecture for recognizing the MNIST vectors. Here, the number of rows in the crossbar should be 400, which is the same as the number of input voltages. Each MNIST vector is composed of 20 × 20 = 400 pixels. Thus, the number of input voltages is 400 for recognizing the MNIST vector. For the number of columns of the SP crossbar, 256, 1024, and 4096 columns are used in Figure 8b, c, and d, respectively. It is known that having more SP columns can result in a better recognition rate [12]. This is because each SP column can store a specific feature of the tested vectors. If the number of SP columns becomes larger, then more features can be stored in the columns. Thereby, the recognition rate for the tested images can be improved with an increasing number of SP columns. The number of SP columns = 256 is the same condition as a memristor-implemented Convolutional Neural Network, where the testing image has 20 × 20 pixels, the kernel size is 5 × 5, and the number of kernels = 1. Similarly, the number of SP columns = 1024 is the same condition as a Convolutional Neural Network with a 20 × 20 image, 5 × 5 kernel, and the number of kernels = 4. The number of SP columns = 4096 is the same condition as a Convolutional Neural Network with a 20 × 20 image, 5 × 5 kernel, and the number of kernels = 16. In this simulation, we did not simulate crossbars with more than 4096 SP columns, because we do not use more than 16 kernels for recognizing the MNIST vectors in the Convolutional Neural Network.
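As a crude illustration of why the parasitic resistances matter, the sketch below compares an ideal column current with a first-order estimate that lumps only the neuron resistance RN at the column foot (our own simplification; RS and RW are ignored, and this is not the distributed network model used for the recognition-rate simulations of [22]).

```python
import numpy as np

def column_current(v, g, r_n=0.0):
    """First-order estimate of one column's current with a single lumped
    neuron resistance r_n at the column foot.

    Ideal:   I = sum_j V_j * g_j
    Lumped:  the column node rises to I * r_n, so
             I = (sum_j V_j * g_j) / (1 + r_n * sum_j g_j)
    R_S and R_W are ignored entirely -- a crude sketch of the trend, not the
    distributed network of Figure 8a used for the simulations."""
    v, g = np.asarray(v, dtype=float), np.asarray(g, dtype=float)
    return float(v @ g) / (1.0 + r_n * g.sum())

# Toy column of four cells: two LRS (1 MOhm) and two HRS (100 MOhm) cells.
v = [0.5, 0.5, 0.0, 0.5]                  # read voltages on the rows (assumed 0.5 V)
g = [1e-6, 1e-8, 1e-6, 1e-8]              # cell conductances in siemens
print(column_current(v, g))               # ideal column current
print(column_current(v, g, r_n=2.7e5))    # with R_N = 0.27% of HRS (HRS ~ 100 MOhm)
```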


Figure 8. (a) The memristor crossbar with the non-ideal effects of RS, RN, and RW. Here RS = 0.27%*HRS and RN = 0.067%*HRS; (b) the MNIST recognition rate of the non-ideal crossbar with 256 SP columns. Here, SP means Spatial Pooler. The percentage σ of memristance variation of HRS and LRS is assumed to be 0% in Figure 8; (c) the MNIST recognition rate of the non-ideal crossbar with 1024 SP columns. Here the percentage σ of memristance variation of HRS and LRS is 0%; and (d) the MNIST recognition rate of the non-ideal crossbar with 4096 SP columns. The percentage σ of memristance variation in HRS and LRS is assumed to be 0%.

Figure 8b shows the MNIST recognition rate of the memristor crossbar with SP columns = 256. Here, the percentage of defects in the crossbar is changed from 0% to 20%. The percentage σ of memristance variation of HRS and LRS is assumed to be zero. For the percentage of defects = 0%, the crossbars without and with the boost-factor adjustment show recognition rates of 77.3% and 77.6%, respectively. When the percentage of defects is very small, the boost-factor adjustment affects the recognition rate very little. However, if the percentage increases, the boost-factor adjustment plays an important role in keeping the recognition rate as high as the rate at defects = 0%, as shown in Figure 8b. For defects = 20%, the boost-factor adjustment can improve the recognition rate by as much as 30.6%, compared to the crossbar without the boost-factor adjustment.

Figure 8c is for the SP columns = 1024. As mentioned earlier, the crossbar with SP columns = 1024 recognizes MNIST vectors better than the SP columns = 256. For the percentage of defects = 0%, the recognition rates of 256 and 1024 SP columns are 77.6% and 92.5%, respectively. As indicated in Figure 8b, the boost-factor adjustment in Figure 8c can maintain this good recognition rate, even though the percentage of defects is increased to 20%. For the percentage = 20%, the gap of recognition rates without and with the boost-factor adjustment is as much as 35.9%.

Figure 8d is for the SP columns = 4096. If the percentage of defects is 0%, the recognition rate of the crossbar is as high as 96.2%.
In spite of the percentage of defects = 20%, the boost-factor adjustment can keepFigure the rate 9a as highshows as 94%,the statistical whereas the distribution crossbar withouts of HRS the boost-factorand LRS, where adjustment the ispercentage as low as 78%.σ of memristanceFigure9a variation shows the is statistical assumed distributions to be 30%. Figure of HRS 9 andb, c, LRS, and whered are for the percentagethe SP columnsσ of memristance= 256, 1024, andvariation 4096, isrespectively. assumed to As be 30%.indicated Figure in9 Figureb, c, and 9, dthe are more for the SP SPcolumns columns can= result256, 1024, in the and better 4096, recognitionrespectively. rate. As In indicated Figure 9b, in the Figure percentage9, the more of de SPfects columns is changed can resultfrom 0% in theto 20%. better For recognition the percentage rate. ofIn defects Figure9 =b, 0%, the percentagethe boost-factor of defects adjustment is changed affects from the 0% recognition to 20%. For rate the very percentage little. However, of defects if= the0% , percentagethe boost-factor of defects adjustment is increased affects to the20%, recognition the boost-factor rate very adjustment little. However, can improve if the the percentage recognition of ratedefects significantly is increased compared to 20%, theto the boost-factor crossbar without adjustment the boost-factor can improve adjustment the recognition. Similarly, rate significantly in Figure 9ccompared with the toSP the columns crossbar = 1024, without the therecognition boost-factor rates adjustment. without and Similarly, with the inboost-factor Figure9c withadjustment the SP arecolumns 54% and= 1024, 87%, the respectively, recognition when rates the without defects and = 20% with. In the Figure boost-factor 9d with adjustment the SP columns are 54% = 4096, and 87%,the recognitionrespectively, rates when without the defects and =with20%. the In boost-fact Figure9d withor adjustment the SP columns are 74%= 4096,and 93.9%, the recognition respectively, rates whenwithout the and defects with = the 20%. boost-factor adjustment are 74% and 93.9%, respectively, when the defects = 20%.
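For readers who want to reproduce the non-ideal crossbar model in software, the following minimal Python sketch builds a synapse matrix with the nominal values used above (LRS = 1 MΩ, HRS = 100 MΩ), a chosen percentage of stuck-at-fault cells, and a memristance spread of σ percent. The Gaussian variation model, the 50/50 split between stuck-at-LRS and stuck-at-HRS cells, and all function names are illustrative assumptions, not taken from the Spectre simulation used in the paper.

```python
import numpy as np

def build_nonideal_crossbar(n_rows, n_cols, defect_pct, sigma_pct,
                            lrs=1e6, hrs=1e8, seed=0):
    """Illustrative non-ideal crossbar model (not the paper's simulation setup).

    Every cell starts at HRS (untrained). A fraction `defect_pct` of cells is
    declared stuck-at-fault, half stuck at LRS and half stuck at HRS, and all
    cells receive a Gaussian spread of `sigma_pct` percent around their
    nominal memristance.
    """
    rng = np.random.default_rng(seed)
    memristance = np.full((n_rows, n_cols), hrs, dtype=float)

    # Mark stuck-at-fault cells and freeze them at LRS or HRS.
    stuck = rng.random((n_rows, n_cols)) < defect_pct / 100.0
    stuck_low = stuck & (rng.random((n_rows, n_cols)) < 0.5)
    memristance[stuck_low] = lrs           # stuck-at-LRS cells
    memristance[stuck & ~stuck_low] = hrs  # stuck-at-HRS cells

    # Apply memristance variation: Gaussian, std = sigma_pct % of the nominal value.
    memristance *= 1.0 + (sigma_pct / 100.0) * rng.standard_normal((n_rows, n_cols))
    return memristance, stuck

# Example: 400 input rows x 256 SP columns, 10% defects, sigma = 30%.
R, stuck_mask = build_nonideal_crossbar(400, 256, defect_pct=10, sigma_pct=30)
```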

[Figure 9 plots: (a) probability density vs. memristance (Ω) for LRS = 1 MΩ and HRS = 100 MΩ with σ = 30%; (b)–(d) recognition rate (%) vs. percentage of defects in the crossbar (%) for 256, 1024, and 4096 SP columns, without and with the boost-factor adjustment.]

Figure 9. (a) The statistical distributions of LRS and HRS with the percentage σ of memristance variation = 30% in the simulation; (b) the MNIST recognition rate of the non-ideal crossbar with 256 SP columns and the percentage σ of memristance variation = 30%. Here, SP means Spatial Pooler; (c) the MNIST recognition rate of the non-ideal crossbar with 1024 SP columns and the percentage σ of memristance variation = 30%; and (d) the MNIST recognition rate of the non-ideal crossbar with 4096 SP columns and the percentage σ of memristance variation = 30%.
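The trend in Figures 8 and 9 can also be understood with a toy software analogy of the spatial-pooling step: columns containing stuck-at-LRS cells draw spuriously large column currents and therefore win the inhibition, unless each column's overlap is scaled by a per-column boost factor that is lowered for defective columns. The sketch below illustrates only this suppression mechanism; the conductance values, the 10% defective-column fraction, and the winner count are illustrative assumptions, and the code is not a description of the proposed memristor-CMOS circuit.

```python
import numpy as np

def sp_active_columns(inputs, conductance, boost, n_active):
    """Toy spatial-pooling step (software analogy, not the circuit itself).

    inputs:      binary input vector (1 = active input line)
    conductance: crossbar conductance matrix, rows = inputs, cols = SP columns
    boost:       per-column boost factor; lowering it for defective columns
                 mimics the boost-factor adjustment discussed in the text
    n_active:    number of winner columns kept by the global inhibition
    """
    overlap = inputs @ conductance          # column currents (overlap scores)
    overlap = overlap * boost               # boost-factor adjustment
    winners = np.argsort(overlap)[-n_active:]
    active = np.zeros(conductance.shape[1], dtype=bool)
    active[winners] = True
    return active

# Columns holding stuck-at-LRS cells draw extra current; lowering their boost
# factor toward zero keeps them from being falsely activated.
rng = np.random.default_rng(1)
G = rng.random((400, 256)) * 1e-8            # illustrative conductances
defective_cols = rng.random(256) < 0.10      # assume 10% defective columns
G[:, defective_cols] += 1e-6                 # stuck-at-LRS cells inflate the current

x = (rng.random(400) < 0.1).astype(float)    # sparse binary input vector
no_adjust = sp_active_columns(x, G, boost=np.ones(256), n_active=20)
boost = np.where(defective_cols, 0.0, 1.0)   # suppress defective columns
with_adjust = sp_active_columns(x, G, boost=boost, n_active=20)
print("falsely active defective columns:",
      int(no_adjust[defective_cols].sum()), "->",
      int(with_adjust[defective_cols].sum()))
```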

4. Discussion

In this section, to assess the benefit of the proposed circuit precisely, we discuss and compare the following three SP schemes in Table 1: (1) spatial-pooling without either the boost-factor adjustment or the defect-aware mapping, (2) spatial-pooling with the defect-aware mapping, and (3) spatial-pooling with the boost-factor adjustment.

Table 1. Comparison of the possibility of hardware implementation, the energy consumption for the crossbar programming, and the MNIST recognition rate for the three SP schemes. Here, SP means Spatial Pooler. The energy consumption is calculated during the training time of 10,000 MNIST vectors.

(1) Spatial-pooling without the boost-factor adjustment and the defect-aware mapping
    Possibility for hardware implementation: able to be implemented with hardware.
    Energy consumption of the crossbar programming (SP columns = 256): 3.9 mJ for the crossbar programming.
    MNIST recognition rate: 77.3% (defects = 0%) and 55.6% (defects = 10%) for 256 SP columns; 92% and 65.4% for 1024 SP columns; 95.7% and 81.1% for 4096 SP columns.

(2) Spatial-pooling with the defect-aware mapping
    Possibility for hardware implementation: the defect-aware mapping in Figure 2d demands the very complicated hardware of memory, processor, controller, etc.
    Energy consumption of the crossbar programming (SP columns = 256): 3.9 mJ for the crossbar programming.
    MNIST recognition rate: 77.3% (defects = 0%) and 56.3% (defects = 10%) for 256 SP columns; 92% and 66.5% for 1024 SP columns; 95.7% and 82.4% for 4096 SP columns.

(3) Spatial-pooling with the boost-factor adjustment
    Possibility for hardware implementation: able to be implemented with hardware.
    Energy consumption of the crossbar programming (SP columns = 256): 3.9 mJ for the crossbar programming, plus 2 µJ for the boost-factor adjustment (energy overhead due to the boost-factor adjustment: ~0.05%).
    MNIST recognition rate: 77.6% (defects = 0%) and 77% (defects = 10%) for 256 SP columns; 92.5% and 91.8% for 1024 SP columns; 96.2% and 95.4% for 4096 SP columns.

First, we discuss the possibility of hardware implementation in Table 1. As mentioned earlier, (1) and (3) can be implemented in hardware. However, the defect-aware mapping of (2), as indicated in Figure 2d, demands very complicated circuits such as memory, a processor, a controller, etc.

Second, the energy consumptions of the crossbar programming are compared among (1), (2), and (3) in Table 1. The amount of programming energy is simulated during the training time of 10,000 MNIST vectors. Schemes (1) and (2) consume 3.9 mJ for programming the crossbar with HRS and LRS according to the Hebbian learning rule, as explained in Figure 4a. The energy overhead due to the boost-factor adjustment is less than ~0.05% of the crossbar programming energy. This is because each column has only one memristor for the boost-factor adjustment, compared to 400 cells per column for Hebbian learning.

Regarding the recognition rate, scheme (1) in Table 1, without the boost-factor adjustment and the defect-aware mapping, shows MNIST recognition rates of 77.3% and 55.6% when the defects = 0% and 10%, respectively. Similarly, scheme (2), with only the defect-aware mapping, shows rates of 77.3% and 56.3% when the defects = 0% and 10%, respectively. Without the boost-factor adjustment, the defective columns necessarily become activated frequently. The frequent activation of defective columns degrades the recognition rate significantly, as shown for (2) in Table 1. On the contrary, scheme (3), with the boost-factor adjustment, shows rates of 77.6% and 77% when the defects = 0% and 10%, respectively. It has very little loss of recognition rate in spite of the defects = 10%. The gap between the defects = 0% and 10% is negligibly small for the crossbar with the boost-factor adjustment.

We now discuss the relationship of this work to previous works on HTM hardware realization. As previous works of this paper, we developed memristor crossbar circuits for performing the SP and TM operations of HTM, respectively [12,13]. However, in the previous works, we did not consider the memristor defects, which should be taken into account in a real memristor crossbar having defects of stuck-at-faults and variations. Thus, the SP hardware implemented with the real defective memristor crossbar can be an essential part of a future HTM hardware system. Additionally, as a further work, we will try to fabricate a crossbar having more than 100 memristors and combine the fabricated crossbar with the CMOS circuit to verify the SP operation by hardware, for testing the MNIST vectors.

Finally, we discuss possible applications of the memristor-CMOS hybrid circuit of HTM hardware. As Internet of Things (IoT) sensors become more popular in human life and the environment, the amount of data generated from the sensors becomes enormous [28–30]. To handle this huge amount of data from the physical world, we can think of integrating IoT sensors and the memristor-CMOS hybrid circuit into one chip [31,32]. By doing so, the unstructured data from the sensors can be pre-processed and interpreted near the sensors by the integrated memristor-CMOS hybrid circuit of HTM hardware. If we deliver all the data generated from the IoT sensors to the cloud, without any pre-processing of the unstructured data near the IoT sensors, the amount of computing energy demanded at the cloud may be huge [33]. Thus, the memristor-CMOS hybrid circuit that can perform the pre-processing of the unstructured data from the IoT sensors can be very useful for energy-efficient computing in the future.
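As a quick sanity check of the overhead quoted in Table 1, dividing the boost-factor adjustment energy by the crossbar programming energy reproduces the ~0.05% figure; the snippet below only restates the numbers already given in the table.

```python
# Energy figures taken from Table 1 (training of 10,000 MNIST vectors, 256 SP columns).
crossbar_programming_energy = 3.9e-3   # 3.9 mJ for Hebbian programming of the crossbar
boost_adjustment_energy = 2e-6         # 2 uJ for programming the boost-factor memristors
overhead = boost_adjustment_energy / crossbar_programming_energy
print(f"Boost-factor adjustment overhead: {overhead:.2%}")  # ~0.05%
```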

5. Conclusions

The SP of HTM has been known as a software framework to model human brain's neocortical operations such as recognition, cognition, etc. However, mimicking the brain's neocortical operation by hardware rather than software is more desirable, because the hardware not only describes the neocortical operation, but also employs the brain's architectural advantages such as high energy efficiency, extremely parallel computation, etc. To realize HTM's SP by hardware, in this paper, we developed the memristor-CMOS hybrid circuit. One important point for the hardware implementation is that memristor defects such as stuck-at-faults, memristance variations, etc., should be considered in developing the memristor-CMOS hybrid circuit of SP. Considering memristor defects in the hardware implementation, we first showed that the boost-factor adjustment can make HTM's SP defect-tolerant, because the false activation of defective columns can be suppressed. Second, we proposed the memristor-CMOS hybrid circuit with the boost-factor adjustment for realizing the defect-tolerant spatial-pooling in hardware. The proposed circuit does not rely on the conventional defect-aware mapping scheme, which cannot avoid the false activation of defective columns in spatial-pooling. For the MNIST data-set, the boost-factor adjusted crossbar with defects = 10% was verified to have a rate loss as low as ~0.6%, compared to the ideal crossbar with defects = 0%. On the contrary, the defect-aware mapping without the boost-factor adjustment demonstrated a significant rate loss, as much as ~21.0%. The energy overhead of the boost-factor adjustment was estimated to be as little as ~0.05% of the programming energy of the memristor synapse crossbar.

Author Contributions: All authors have contributed to the submitted manuscript. K.-S.M. defined the research topic. T.V.N. and K.V.P. performed the simulation and measurement. K.-S.M. wrote the manuscript. All authors read and approved the submitted manuscript.

Funding: The work was financially supported by NRF-2015R1A5A7037615, MOTIE/KEIT (10052653), ETRI grant (18ZB1800), and Samsung Electronics under Project Number SRFC-IT1701-07.

Acknowledgments: The CAD tools were supported by IC Design Education Center (IDEC), Daejeon, Korea.

Conflicts of Interest: The authors declare that they have no competing interests.

References

1. Hawkins, J.; Blakeslee, S. On Intelligence: How a New Understanding of the Brain Will Lead to the Creation of Truly Intelligent Machines; Holt & Company: New York, NY, USA, 2004.
2. Horton, J.C.; Adams, D.L. The cortical column: A structure without a function. Philos. Trans. R. Soc. Lond. B Biol. Sci. 2005, 360, 837–862. [CrossRef] [PubMed]
3. Thomson, A. Neocortical layer 6, a review. Front. Neuroanat. 2010, 4, 13. [CrossRef] [PubMed]
4. Hensch, T.; Stryker, M. Columnar architecture sculpted by GABA circuits in developing cat visual cortex. Science 2004, 303, 1678–1681. [CrossRef] [PubMed]
5. Muir, D.R.; Cook, M. Anatomical constraints on lateral competition in columnar cortical architectures. Neural Comput. 2014, 26, 1624–1666. [CrossRef] [PubMed]
6. Douglas, R.J.; Martin, K.A.C.; Whitteridge, D. A canonical microcircuit for neocortex. Neural Comput. 1989, 1, 480–488. [CrossRef]

7. Hawkins, J.; Ahmad, S.; Dubinsky, D. Hierarchical Temporal Memory including HTM Cortical Learning Algorithms; Tech. Rep.; Numenta, Inc.: Palo Alto, CA, USA, 2011.
8. Cui, Y.; Ahmad, S.; Hawkins, J. The HTM spatial pooler—A neocortical algorithm for online sparse distributed coding. bioRxiv 2016, bioRxiv:085035. Available online: https://doi.org/10.1101/085035 (accessed on 2 November 2016). [CrossRef]
9. Ahmad, S.; Hawkins, J. Properties of sparse distributed representations and their application to hierarchical temporal memory. arXiv 2015, arXiv:1503.07469.
10. Ahmad, S.; Hawkins, J. How do neurons operate on sparse distributed representations? A mathematical theory of sparsity, neurons and active dendrites. arXiv 2016, arXiv:1601.00720.
11. Cui, Y.; Ahmad, S.; Hawkins, J. Continuous online sequence learning with an unsupervised neural network model. arXiv 2015, arXiv:1512.05463. [CrossRef]
12. Truong, S.N.; Pham, K.V.; Min, K.S. Spatial-pooling memristor crossbar converting sensory information to sparse distributed representation of cortical neurons. IEEE Trans. Nanotechnol. 2018, 17, 482–491. [CrossRef]
13. Nguyen, T.; Pham, K.; Min, K.S. Memristor-CMOS hybrid circuit for temporal-pooling of sensory and hippocampal responses of cortical neurons. Materials 2019, 12, 875. [CrossRef] [PubMed]
14. Chua, L.O. Memristor-the missing circuit element. IEEE Trans. Circuit Theory 1971, 18, 507–519. [CrossRef]
15. Strukov, D.B.; Snider, G.S.; Stewart, D.R.; Williams, R.S. The missing memristor found. Nature 2008, 453, 80–83. [CrossRef] [PubMed]
16. Jo, S.H.; Chang, T.; Ebong, I.; Bhadviya, B.B.; Mazumder, P.; Lu, W. Nanoscale memristor device as synapse in neuromorphic systems. Nano Lett. 2010, 10, 1297–1301. [CrossRef] [PubMed]
17. Kügeler, C.; Meier, M.; Rosezin, R.; Gilles, S.; Waser, R. High-density 3D memory architecture based on the resistive switching effect. Solid State Electron. 2009, 53, 1287–1292. [CrossRef]
18. Shulaker, M.M.; Wu, T.F.; Pal, A.; Zhao, L.; Nishi, Y.; Saraswat, K.; Wong, H.-S.P.; Mitra, S. Monolithic 3D integration of logic and memory: Carbon nanotube FETs, resistive RAM, and silicon FETs. In Proceedings of the IEEE International Electron Devices Meeting, San Francisco, CA, USA, 15–17 December 2014; pp. 638–641.
19. Truong, S.N.; Shin, S.H.; Byeon, S.D.; Song, J.S.; Min, K.S. New twin crossbar architecture of binary memristors for low-power image recognition with discrete cosine transform. IEEE Trans. Nanotechnol. 2015, 14, 1104–1111. [CrossRef]
20. Truong, S.N.; Ham, S.J.; Min, K.S. Neuromorphic crossbar circuit with nanoscale filamentary-switching binary memristors for speech recognition. Nanoscale Res. Lett. 2014, 9, 1–9. [CrossRef]
21. Tunali, M.A.O. A survey of fault-tolerance algorithms for reconfigurable nano-crossbar arrays. ACM Comput. Surv. 2017, 50, 79:1–79:35. [CrossRef]
22. Chakraborty, I.; Roy, D.; Roy, K. Technology aware training in memristive neuromorphic systems based on non-ideal synaptic crossbars. IEEE Trans. Emerg. Top. Comput. Intell. 2018, 2, 335–344. [CrossRef]
23. Truong, S.; Pham, K.; Yang, W.; Shin, S.; Pedrotti, K.; Min, K.S. New pulse amplitude modulation for fine tuning of memristor synapses. Microelectron. J. 2016, 55, 162–168. [CrossRef]
24. Pham, K.; Tran, S.; Nguyen, T.; Min, K.-S. Asymmetrical training scheme of binary-memristor-crossbar-based neural networks for energy-efficient edge-computing nanoscale systems. Micromachines 2019, 10, 141. [CrossRef] [PubMed]
25. Virtuoso Spectre Circuit Simulator User Guide; Cadence Design Systems, Inc.: San Jose, CA, USA, 2011.
26. LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324. [CrossRef]
27. Deng, L. The MNIST database of handwritten digit images for machine learning research. IEEE Signal Process. Mag. 2012, 29, 141–142. [CrossRef]
28. Sun, X.; Ansari, N. EdgeIoT: Mobile edge computing for the Internet of Things. IEEE Commun. Mag. 2016, 54, 22–29. [CrossRef]
29. Gusev, M.; Dustdar, S. Going back to the roots: The evolution of edge computing, an IoT perspective. IEEE Internet Comput. 2018, 22, 5–15. [CrossRef]
30. Gopika, P.; Mario, D.F.; Tarik, T. Edge computing for the internet of things: A case study. IEEE Internet Things J. 2018, 5, 1275–1284.
31. Abunahla, H.; Mohammad, B.; Mahmoud, L.; Darweesh, M.; Alhawari, M.; Jaoude, M.; Hitt, G. MemSens: Memristor-based radiation sensor. IEEE Sens. J. 2018, 18, 3198–3205. [CrossRef]

32. Krestinskaya, O.; James, A.; Chua, L. Neuro-memristive circuits for edge computing: A review. arXiv 2018, arXiv:1807.00962.
33. Plastiras, G.; Terzi, M.; Kyrkou, C.; Theocharides, T. Edge intelligence: Challenges and opportunities of near-sensor machine learning applications. In Proceedings of the 2018 IEEE 29th International Conference on Application-specific Systems, Architectures and Processors (ASAP), Milan, Italy, 10–12 July 2018; pp. 1–7.

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).