LETTER Communicated by Manu Rastogi

ASIC Implementation of a Nonlinear Dynamical Model for Hippocampal Prosthesis

Zhitong Qiao [email protected] Yan Han [email protected] Xiaoxia Han [email protected] Institute of Microelectronics and Nanoelectronics, Zhejiang University, Hangzhou 310027, China

Han Xu [email protected] School of Medicine, Zhejiang University, Hangzhou 310058, China

Will X. Y. Li [email protected] School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China

Dong Song [email protected] Theodore W. Berger [email protected] Department of Biomedical Engineering, Center for Neural Engineering, University of Southern California, Los Angeles, CA 90089, U.S.A.

Ray C. C. Cheung [email protected] Department of Electronic Engineering, City University of Hong Kong, Hong Kong 999077, China

A hippocampal prosthesis is a very large scale integration (VLSI) biochip that needs to be implanted in the biological brain to treat cognitive dysfunction. In this letter, we propose a novel low-complexity, small-area, and low-power programmable hippocampal neural network application-specific integrated circuit (ASIC) for a hippocampal prosthesis. It is based on the nonlinear dynamical model of the hippocampus, namely the multi-input, multi-output (MIMO) generalized Laguerre-Volterra model (GLVM). It can realize the real-time prediction of hippocampal neural

Neural Computation 30, 2472–2499 (2018) © 2018 Massachusetts Institute of Technology doi:10.1162/neco_a_01107


activity. New hardware architecture, a storage space configuration scheme, low-power convolution, and gaussian random number generator modules are proposed. The ASIC is fabricated in 40 nm technology with a core area of 0.122 mm² and test power of 84.4 μW. Compared with the design based on the traditional architecture, experimental results show that the core area of the chip is reduced by 84.94% and the core power is reduced by 24.30%.

1 Introduction

The hippocampus, an important part of the brain, is mainly responsible for the formation of memory and for spatial positioning. Research has found that the hippocampus is mainly responsible for the formation of new memories (Valiant, 2012). Therefore, damage to the hippocampus and surrounding regions of the medial temporal lobe can result in a permanent loss of the ability to form new long-term memories, causing cognitive dysfunction such as Alzheimer's disease (AD) and other dementias (Berger, Orr, & Orr, 1983; Eichenbaum, Fagan, Mathews, & Cohen, 1988; Milner, 1970; Squire & Zola-Morgan, 1991). Thus far, most drug treatment programs have failed to treat AD. Some drugs can reduce the rate of cognitive decline in patients with early AD but cannot repair nerve damage; moreover, these drugs still present some undesirable side effects (Mullard, 2016; Sevigny et al., 2016). Clearly, the ability of drug treatment to alleviate cognitive decline is very limited.

The hippocampal cognitive neural prosthesis, or hippocampal prosthesis for short, has been proposed to address this issue by replacing damaged tissue with a neurochip that mimics the functions of the original biological circuitry. It is used to replace the damaged region of the hippocampus (the CA3–CA1 path) and thereby repair the memory and cognitive dysfunction caused by damage to the hippocampus. It consists of five modules: a low-noise amplifier (LNA), an analog-to-digital converter (ADC), a spike sorter, a multi-input, multi-output response model (MIMO–GLVM), and a charge-metering stimulus amplifier (CM), as shown in Figure 1 (Berger et al., 2012). The analog front end consists of 16 LNAs and 16 ADCs in parallel; that is, 16 input electrodes implanted in the hippocampus deliver neural signals for amplification and digitization. The digitized signals are then classified by 16 spike sorters into spike event channels, where events are represented by a single bit. Outputs (responses to the spike events) are computed by a single MIMO–GLVM-based hippocampal neural network, which delivers 8 channels of output to 8 CMs. Currently, probe technology and the size of the hippocampus in rats limit the chip to 16 parallel inputs and 8 differential outputs.

In this letter, we focus on the research and implementation of the hippocampal neural network application-specific integrated circuit (ASIC) for


Figure 1: Functional block diagram of the hippocampal prosthesis.

hippocampal prosthesis. It is based on the nonlinear dynamical model of the hippocampus, the MIMO–GLVM. The MIMO–GLVM-based hippocampal neural network is an artificial neural network (ANN). It is the core module and mainly realizes the memory function of the hippocampus; that is, it converts short-term memory into long-term memory. Because of the similarity between an ANN and a biological neural network (NN), it can be used to replace the damaged CA3–CA1 pathway in the hippocampus, completing the normal processing and transmission of neural signals. (The structure of the hippocampal CA3–CA1 pathway and the implantation diagram of the hippocampal prosthesis are in Figures 2 and 3 of Berger et al., 2012.)

In recent years, ANNs have developed rapidly. An increasing number of research groups are developing VLSI chips that implement hundreds to thousands of spiking neurons with biophysically realistic dynamics, with the intention of emulating brain-like real-world behavior in hardware and robotic systems rather than simply simulating their performance on general-purpose digital computers (Neftci, Chicca, Indiveri, & Douglas, 2011; Martí, Rigotti, Seok, & Fusi, 2016; Cymbalyuk, Patel, Calabrese, DeWeerth, & Cohen, 2000; Bartolozzi & Indiveri, 2007; Giulioni, Pannunzi, Badoni, Dante, & Giudice, 2009). For example, IBM has developed TrueNorth, an integrated circuit with a million spiking neurons. It reaches the computational level of supercomputers, but with extremely low power consumption (Merolla et al., 2014; Service, 2014). A convolutional neural network (CNN) is a particular kind of ANN specifically designed for hardware implementation, usually in embedded and real-time systems, such as image processing applications (Karahaliloglu, Gans, Schemm, & Balkir, 2008; Rawat & Wang, 2017; Chen,


Krishna, Emer, & Sze, 2017). Its main goals are to improve system speed and reduce system power consumption.

Unlike these other studies, our study is aimed at a neural prosthesis that needs to be implanted in the biological brain. Because the frequency of neural spike signals is very low, the working speed of the MIMO hippocampal NN does not need to be very high, as long as the oversampling frequency can be achieved. Like the neuromorphic systems above, it aims to deliver the required computation with extremely low power consumption (Merolla et al., 2014; Service, 2014). The ASIC platform has the following outstanding advantages compared with the field-programmable gate array (FPGA) and software simulation platforms:

1. The architecture is customized, so it is efficient in area, power, and speed.
2. The die area is very small, so that it can be implanted in the organism.
3. It can integrate digital and analog circuits on a single chip.

The ASIC will serve as the main platform for the realization of the hippocampal prosthesis. As the core module, the ASIC design of the MIMO hippocampal NN is very important.

Currently, the only available work on ASIC-based MIMO hippocampal NNs is in Berger et al. (2012). That paper proposes a prototype of the hippocampal prosthesis ASIC, which was fabricated in a 180 nm process. The study gives a detailed introduction to the GLVM algorithm, but little information on the specific implementation of the circuit and the corresponding area, power consumption, accuracy, and functional test results of the chip. An FPGA-based MIMO hippocampal NN hardware architecture has been proposed to realize the coefficient estimation of the GLVM and the prediction of neuronal population firing activity (Li, Cheung et al., 2013a, 2013b; Li, Xin et al., 2014; Li, Chan et al., 2011a, 2011b). Actually, the coefficient estimation module is very complex and does not need to be implanted in the brain. Therefore, the coefficient estimation function can be realized outside the brain; only the prediction function needs to be realized by the ASIC to be implanted into the brain. After the coefficient estimation process is finished, the coefficients can be sent to the ASIC, so the hippocampal NN needs to be programmable. Considering that nerve cells are sensitive to heat and that battery life is vital, chips with low power are critical. However, not much work is available on appropriate low-power design technology.

In this letter, we present an entire MIMO–GLVM-based programmable hippocampal NN ASIC. Compared with other work, we focus on a new architecture and a low-power, low-complexity ASIC design based on an advanced 40 nm process. We offer a detailed ASIC implementation scheme. In addition, our test has validated the function.


Our work makes the following contributions:

1. A novel power- and area-efficient programmable hippocampal NN ASIC architecture. The ASIC is fabricated in a 40 nm process with a core area of 0.122 mm² and test power of 84.4 μW. Compared with the traditional architecture, our experimental results show that the core area of the chip is reduced by 84.94% and the core power by 24.30%.
2. A highly efficient storage space configuration scheme for the hippocampal NN that has high utilization of physical space with little wasted storage space.
3. A novel low-power and low-complexity convolution unit and gaussian random number generator (GRNG) module. Compared with traditional circuits, the circuit area of the convolution unit can be reduced by 72.43%, and the power consumption of the GRNG module can be reduced by about 152.68 times.
4. A study based on an advanced 40 nm complementary metal-oxide-semiconductor (CMOS) process and suitable low-power technologies, which we discuss for the NN ASIC.

The letter is organized as follows. We introduce the nonlinear dynamical model of the hippocampus, the GLVM, in section 2 and give details on the ASIC design of the programmable hippocampal NN in section 3. We present the design and validation results of the proposed scheme in section 4, discuss the advantages and limitations of the proposed architecture in section 5, and draw conclusions in section 6.

2 Hippocampal Nonlinear Dynamical Model—GLVM

A mathematical model describing how information carried by biosignals flows through the brain regions is important to the development of the neural prosthesis. One approach is parametric modeling (Song, Marmarelis, & Berger, 2009). However, this approach requires a large number of coefficients and intensive computation, which are not feasible in the hippocampal prosthesis design. Considering these factors, we refer to nonparametric models that use such engineering modeling techniques as network analyses, information theory, and statistical methods to investigate the behavior of biological neurons or neural networks.

In the nonparametric model, the CA3–CA1 pathway of the hippocampus can be abstracted as a MIMO model. It can be decomposed into multiple independent multi-input, single-output (MISO) models as shown in Figure 2, which can be found in previous work (Berger et al., 2012). Mathematically, the MISO model can be expressed by the following equations:

y = \begin{cases} 0, & w < \theta \\ 1, & w \ge \theta \end{cases}    (2.1)

w = u(k, x) + a(h, y) + \varepsilon(\sigma).    (2.2)


Figure 2: The MIMO model can be decomposed into multiple independent MISO models. (Top) Structure of a MIMO model. (Bottom) Structure of a MISO model.

See the notations in the MISO model in Table 1.

Neurons communicate with each other using all-or-none action potentials, or spikes. x/y = 1 represents the action potential, and x/y = 0 represents the resting potential. The input x and output y are both 1-bit signals. The MIMO model studied in this letter is a 64-input, 8-output MIMO system, which is very suitable for the design of a hippocampal cognitive prosthetic biochip (Berger et al., 2012). It can be decomposed into eight independent MISO models, whose structure is shown at the bottom of Figure 2.
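To make equations 2.1 and 2.2 concrete, the following minimal Python sketch (illustrative only, not the hardware implementation; all parameter values are assumed) evaluates one MISO output sample from a synaptic potential, an afterpotential, a noise spread, and a threshold.

```python
import random

def miso_output(u, a, sigma, theta):
    """Evaluate equations 2.1 and 2.2 for one time step.

    u: synaptic potential, a: afterpotential,
    sigma: spread of the neuronal gaussian noise (Table 1 lists sigma as the
           variance; it is used here as the standard deviation of random.gauss
           for simplicity),
    theta: firing threshold of the cell membrane.
    Returns the binary spike output y.
    """
    w = u + a + random.gauss(0.0, sigma)   # prethreshold membrane potential (eq. 2.2)
    return 1 if w >= theta else 0          # all-or-none spike (eq. 2.1)

# Example with hypothetical values: weak net drive rarely crosses the threshold
y = miso_output(u=0.2, a=-0.05, sigma=0.1, theta=0.5)
```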

2.1 Generalized Functional Additive Model. In order to solve the modeling problem of the hippocampus, Song et al. (2013) proposed a generalized functional additive model (GFAM; Hampson et al., 2012). In the model, the synaptic potential u can be expressed with a MISO Volterra kernel:

u(t) = k_0 + \sum_{n=1}^{N} \sum_{\tau=0}^{M} k_1^{(n)}(\tau)\, x_n(t - \tau).    (2.3)


Table 1: Table of Notations in the MISO Model.

Symbols    Meanings
y          Output of the MISO model, representing the output spike trains of the hippocampal CA1 region
w          Prethreshold potential of cell membrane
θ          Threshold of cell membrane
u          Synaptic potential
a          Afterpotential
ε          Gaussian white noise input
σ          Variance of neuronal gaussian noise
x          Input of the model, which represents the input spike trains of the hippocampal CA3 region
k, h       Kernel functions

Table 2: Table of Notations in the GFAM.

Symbols Meanings

x_n             nth input signal
N               Number of input signals x
M               Memory length
k_0             Zeroth-order kernel
k_1^{(n)}(τ)    First-order kernel of the nth input at lag τ
h(τ)            Feedback kernel at lag τ

The afterpotential a can be expressed with the SISO Volterra kernel:

a(t) = \sum_{\tau=1}^{M} h(\tau)\, y(t - \tau).    (2.4)

See the notations for the GFAM model in Table 2.
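As a reference point for the recursive formulation introduced in section 2.2, the following Python sketch (illustrative only; the array shapes and names are assumed) evaluates the GFAM sums of equations 2.3 and 2.4 directly over the full memory length M.

```python
import numpy as np

def gfam_potentials(x, y_hist, k0, k1, h, t, M):
    """Direct evaluation of equations 2.3 and 2.4 at time index t.

    x:      (N, T) array of input spike trains
    y_hist: (T,) array of past output spikes
    k0:     zeroth-order kernel (scalar)
    k1:     (N, M + 1) array of first-order feedforward kernels
    h:      (M + 1,) feedback kernel (index 0 unused)
    """
    u = k0
    for n in range(x.shape[0]):
        for tau in range(0, M + 1):
            if t - tau >= 0:
                u += k1[n, tau] * x[n, t - tau]        # eq. 2.3
    a = 0.0
    for tau in range(1, M + 1):
        if t - tau >= 0:
            a += h[tau] * y_hist[t - tau]              # eq. 2.4
    return u, a
```

With the values used later in this letter (N = 64, M = 500), this direct form touches more than 32,000 kernel values per output sample, which is what motivates the Laguerre expansion described next.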

2.2 Generalized Laguerre-Volterra Model. When the memory length M is very long, the number of coefficients in the model will be large, which will greatly increase model complexity. One of the commonly used methods to reduce the number of model coefficients is to expand the kernel functions k and h with a series of basis functions; that is, they are written in the form of weighted sums of a series of basis functions. Another improved model has been proposed in Song et al. (2013): the generalized Laguerre-Volterra model (GLVM). In this model, the standard orthogonal Laguerre basis functions b can be used to expand the Volterra kernels with the Laguerre expansion of the Volterra kernel (LEV) technique.


With the LEV technique, the feedforward kernel k and the feedback kernel h can be rewritten:

k_1^{(n)}(\tau) = \sum_{j=1}^{L} c_1^{(n)}(j)\, b_j(\tau), \quad \tau \in [0, M],    (2.5)

h(\tau) = \sum_{j=1}^{L} c_h(j)\, b_j(\tau), \quad \tau \in [1, M],    (2.6)

where c is the coefficient of the Laguerre basis function, L is the number of Laguerre basis functions, and j is the order of the Laguerre basis function (j = 1, 2, ..., L). The standard orthogonal Laguerre basis function b can be calculated by a simple iteration using the following equation:

b_j(\tau) =
\begin{cases}
\sqrt{\alpha^{\tau}(1 - \alpha)}, & j = 1,\ \tau \ge 0 \\
\sqrt{\alpha}\, b_{j-1}(0), & j > 1,\ \tau = 0 \\
\sqrt{\alpha}\,[\,b_j(\tau - 1) + b_{j-1}(\tau)\,] - b_{j-1}(\tau - 1), & j > 1,\ \tau > 0
\end{cases}    (2.7)

where α is the Laguerre parameter. Thus, using the LEV technique, we can rewrite the synaptic potential u and the afterpotential a as

u(t) = c_0 + \sum_{n=1}^{N} \sum_{j=1}^{L} c_1^{(n)}(j)\, v_j^{(n)}(t),    (2.8)

a(t) = \sum_{j=1}^{L} c_h(j)\, v_j^{(h)}(t),    (2.9)

where v_j^{(n)} is the convolution of the input pulse spike train x_n with the Laguerre basis function b_j, and v_j^{(h)} is the convolution of the output spike train y with b_j:

v_j^{(n)}(t) = \sum_{\tau=0}^{M} b_j(\tau)\, x_n(t - \tau),    (2.10)

v_j^{(h)}(t) = \sum_{\tau=1}^{M} b_j(\tau)\, y(t - \tau).    (2.11)

In addition to reducing the number of model coefficients, the LEV technique has another advantage: the convolved products v^{(n)} and v^{(h)}


Table 3: Number of Coefficients c/k in Each MISO Model.

Kernel    k0/c0    k1/c1              h/ch       Total
GFAM      1        N(M + 1) = 32,064  M = 500    32,565
GLVM      1        NL = 192           L = 3      196

can be computed recursively (Song et al., 2013; Marmarelis & Marmarelis, 1993; Ogura, 1972):

v_j(\tau) =
\begin{cases}
\sqrt{\alpha}\, v_1(\tau - 1) + \sqrt{1 - \alpha}\, x(\tau), & j = 1,\ \tau > 0 \\
\sqrt{\alpha^{\,j-1}(1 - \alpha)}\, x(\tau), & j \ge 1,\ \tau = 0 \\
\sqrt{\alpha}\, v_j(\tau - 1) + \sqrt{\alpha}\, v_{j-1}(\tau) - v_{j-1}(\tau - 1), & j > 1,\ \tau > 0
\end{cases}    (2.12)
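The recursion in equation 2.12 replaces the M-point sums of equations 2.10 and 2.11 with a constant amount of work per sample. The following Python sketch (illustrative only; the value of α is an assumed example) updates the L convolution products one sample at a time; starting from a zero state reproduces the τ = 0 branch automatically.

```python
import math

def update_laguerre_state(v_prev, x_t, alpha):
    """One time step of the recursive Laguerre convolution (eq. 2.12).

    v_prev: list of the L previous convolution products v_j(t - 1)
    x_t:    current 0/1 spike sample
    alpha:  Laguerre parameter (0 < alpha < 1)
    Returns the updated list of v_j(t).
    """
    sa = math.sqrt(alpha)
    v_new = []
    for j, vj_prev in enumerate(v_prev, start=1):
        if j == 1:
            vj = sa * vj_prev + math.sqrt(1.0 - alpha) * x_t
        else:
            # v_{j-1}(t) is v_new[j - 2]; v_{j-1}(t - 1) is v_prev[j - 2]
            vj = sa * vj_prev + sa * v_new[j - 2] - v_prev[j - 2]
        v_new.append(vj)
    return v_new

# Example: L = 3 basis functions, assumed alpha = 0.9, short spike train
alpha, L = 0.9, 3
v = [0.0] * L
for x_t in [0, 1, 0, 0, 1]:
    v = update_laguerre_state(v, x_t, alpha)
```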

Therefore, when calculating the synaptic potential u and the afterpotential a, it is not necessary to span the entire memory length M and calculate the coefficients associated with all sampling points in equations 2.3 and 2.4. It is necessary only to estimate the coefficients associated with the L Laguerre basis functions. Since L is much smaller than M (L = 3 in this letter), the number of coefficients to be estimated is reduced to a large extent and the calculation process becomes more efficient. In this letter, N = 64 and L = 3, with M = 500. The number of coefficients c/k in each MISO model is listed in Table 3. After the GLVM is used, the number of coefficients in the MISO model is reduced by 99.398% in comparison with the GFAM. Because of the considerable advantages of the GLVM, we chose it for the hippocampal NN ASIC design in this letter.

Some conclusions can be drawn from the above model. The matrix C, which consists of c_0, c_1, and c_h, is the weight of the hippocampal NN. It embodies the existence of connections between neurons and the strength of each connection. In addition, the matrix V, composed of v^{(n)} and v^{(h)}, is iteratively updated, reflecting the influence of historical input on the current output, which also reflects the memory characteristics of the hippocampal NN.

The GLVM also has some disadvantages. First, the complexity of the convolution unit circuit will increase, and second, the model training process will become more complicated. For the first point, the design of convolution unit circuits with low complexity and low power consumption therefore becomes even more important. Second, the complicated model training


can be done outside the brain without placing a burden on the ASIC design of the hippocampal neural network.

Model training of the GLVM can be done outside the brain through Matlab. After the coefficient estimation process is finished, the coefficients can be sent to the ASIC. The Laguerre coefficient matrix C can be calculated iteratively at each time step using point-process filtering algorithms such as the steepest descent point-process filter (SDPPF; Eden, Frank, Barbieri, Solo, & Brown, 2004; Li et al., 2013a):

C(t) = C(t - 1) + R \left[ \frac{\partial \log P_f(t)}{\partial C} \right]^{T}_{C(t-1)} \big( y(t) - P_f(t) \big),    (2.13)

where R is the learning rate and P_f is the conditional firing probability intensity function, which reflects the conditional probability of generating a spike.
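As a rough illustration of how such an update could be applied offline, the Python sketch below (a simplification that assumes a logistic form for the firing probability P_f; the actual training in this work is done in Matlab outside the chip) performs one SDPPF-style step of equation 2.13.

```python
import numpy as np

def sdppf_step(c, v, y_t, learning_rate):
    """One steepest-descent point-process filter update in the style of eq. 2.13.

    c:   current coefficient vector (bias c0 followed by the Laguerre coefficients)
    v:   regressor vector (1.0 for the bias term, then the convolution products)
    y_t: observed 0/1 output spike at this time step
    Assumes a logistic link, P_f = sigmoid(c . v), purely for illustration.
    """
    w = float(np.dot(c, v))
    p_f = 1.0 / (1.0 + np.exp(-w))       # conditional firing probability
    grad_log_pf = (1.0 - p_f) * v        # d log P_f / dC under the logistic link
    return c + learning_rate * grad_log_pf * (y_t - p_f)

# Example with a hypothetical 4-element coefficient vector
c = np.zeros(4)
v = np.array([1.0, 0.3, -0.1, 0.05])     # bias regressor plus three v_j values
c = sdppf_step(c, v, y_t=1, learning_rate=0.05)
```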

3 ASIC Design of Programmable Hippocampal Neural Networks

3.1 Hardware Architecture. In existing studies, MIMO hippocampal NN hardware is based on traditional parallel architecture (Berger et al., 2012; Li et al., 2013a, 2013b; Li et al., 2014; Li et al., 2011a, 2011b). It is broken down into a series of structurally identical MISO models, one for each independent output, as shown in Figure 2. Then the implementation of the MISO becomes the focus of research and design. This hardware architecture has the following advantage:

• The architecture easily expands to more outputs. The system consists of eight independent MISOs whose structures are the same. Therefore, the system architecture has very good flexibility.

However, there are some disadvantages with this architecture:

• The scale of the system is very large, so the circuit area and power consumption are also very large. In fact, many processing units can be shared to reduce the system size.
• This architecture is not efficient for the ASIC design. If it is used for the ASIC design, the storage space and physical space cannot be effectively used.

Considering these factors, we propose a novel ASIC-based MIMO–GLVM hippocampal NN architecture, shown in Figure 3. It is not composed of eight identical MISOs and does not focus on the implementation of one MISO; rather, it directly studies the realization of the whole MIMO–GLVM NN. It has been proven to be area and power efficient.


Figure 3: Proposed architecture of the MIMO NN. (Top) Proposed structure of the convolution unit. (Bottom) Proposed storage space configuration scheme. GRNG is the gaussian random number generator module. Qn_x and Qn_y are the parameters in the GLVM as shown in Table 4. Their expressions are the same, but the corresponding Laguerre parameters α are different. fp_cmp is a floating-point comparator.


The programmable MIMO hippocampal NN has two stages: stage 1 is the coefficient configuration, and stage 2 is the MIMO prediction of output spike trains. First, the system works in stage 1. This stage is sequential, and the coefficients (σ, C, and V) in the MIMO–GLVM are serially configured from the 32-bit input port coeff_in. When all the coefficients are configured, stage 1 is finished, and the coeff_up_finish signal is pulled high. Then the system shifts to stage 2. Stage 2 is also sequential. The processing modules predict the output spike trains in real time based on the input spike trains x and the coefficients configured in stage 1. The processing modules first predict the output spike train using the coefficients C and V of MISO-1, then MISO-2, ..., and finally MISO-8, completing one round of the prediction process. After that, the entire prediction process is repeated.
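The two-stage control flow can be summarized by the behavioral Python sketch below (illustrative only; the port and signal names follow the text, while the helper callables are assumed placeholders for the per-MISO datapath).

```python
def run_hippocampal_nn(coeff_words, get_input_spikes, predict_miso, n_miso=8, n_rounds=4):
    """Behavioral sketch of the two-stage operation; not the RTL itself.

    coeff_words:      list of 32-bit words presented serially on coeff_in (stage 1)
    get_input_spikes: callable returning the current 64 input spike bits
    predict_miso:     callable (coeffs, miso_index, x) -> 0/1 output spike
    """
    # Stage 1: serial configuration of sigma, C, and V for all eight MISOs
    coeffs = list(coeff_words)
    coeff_up_finish = True          # asserted once every coefficient is loaded

    # Stage 2: round-robin prediction, MISO-1 through MISO-8, repeated each round
    results = []
    while coeff_up_finish and len(results) < n_rounds:
        x = get_input_spikes()
        results.append([predict_miso(coeffs, m, x) for m in range(n_miso)])
    return results
```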

3.2 Storage Space Configuration. The configuration scheme of the ASIC storage space is closely related to the architecture of the convolution module. A convolution module has been proposed by Li et al. (2013a) with the circuit using three parallel paths to calculate v_1^{(n)}/v_1^{(h)}, v_2^{(n)}/v_2^{(h)}, and v_3^{(n)}/v_3^{(h)}. There are 65 signals, including the 64 input signals x_n and 1 feedback signal y. Therefore, for each path, there are 65 jth-order Laguerre coefficients, c_1^{(1)}(j), c_1^{(2)}(j), ..., c_1^{(64)}(j) and c_h(j), and 65 corresponding jth-order convolution products, v_j^{(1)}, v_j^{(2)}, ..., v_j^{(64)}, and v_j^{(h)}. In order to get high precision, all the coefficients are stored in the single-precision floating-point (FP) format. Therefore, whether the jth-order c or v, each requires storage space for 65 words. For the matrix C or V in each MISO, there are 65 × 3 = 195 elements in addition to c_0. When they are stored in the single-precision FP format, C and V need 196 and 195 words of storage space, respectively.

In the FPGA-based study (Li et al., 2013a), all the coefficients are stored in register banks (RBs); the RBs make the system area very large, which is undesirable in the ASIC design. The SRAM may be the most satisfying choice. But there is a problem in the ASIC design: because of the memory compiler limitation, the SRAM storage depth must be an integer multiple of 16. As discussed previously, whether the jth-order c or v, each requires 65 words of storage space, which is not an integer multiple of 16. Therefore, if all the coefficients are stored with SRAM, each uses 64 + 16 = 80 words of storage, with 15 words of storage space wasted. Each MISO will therefore waste a total of 15 × 3 × 2 = 90 words of space, 3 for j = 1, 2, 3 and 2 for c and v. Therefore, 8 MISOs will waste a total of 90 × 8 = 720 words of storage space. There is another problem: each MISO requires 3 SRAMs to store the matrix V and 3 SRAMs to store the corresponding matrix C. Therefore, one MISO requires 6 SRAMs, and the overall MIMO requires 48 SRAMs. Due to the existence of a keep-out margin between SRAM and SRAM, or between SRAM and standard cells, physical space cannot be effectively used.


Table 4: Parameters in the GLVM.

Parameters    Corresponding Expressions in the GLVM
Q1            α^(1/2)
Q2            (1 − α)^(1/2)
Q3            α − 1
Q4            α^(1/2)(1 − α)^(1/2)
Q5            α^(1/2)(α − 1)
Q6            α(1 − α)^(1/2)

It is obvious that the existing configuration of storage space, which makes the storage space for coefficients very dispersed, makes it difficult to store these data compactly, causing a waste of storage space and low utilization of physical space. This scheme therefore is not very efficient for ASIC design.

The goal of this study is to minimize the number of SRAMs and make the storage space as compact as possible. The proposed scheme concentrates all coefficient matrices C in the MIMO together and stores them in one SRAM, and it uses the same scheme for all coefficient matrices V. The core processing modules slide in parallel between the two SRAMs that store the coefficient matrices C and V and process them. In this way, all the processing units can be multiplexed. Obviously, this architecture is significantly less complex than the traditional parallel architecture, and there are only two SRAMs in the MIMO system. The size of the SRAM that stores the matrix C is 1568 words. In order to facilitate the processing of the processing units and simplify the system structure, storage of the matrix V is symmetric with the matrix C. So the size of the SRAM that stores the matrix V is also 1568 words, with only 8 words of storage space wasted. The data can be stored compactly with little storage space wasted and high utilization of physical space.
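The SRAM sizing follows directly from the coefficient counts; the short Python check below (a cross-check of the figures quoted above, assuming N = 64, L = 3, and eight MISOs) reproduces the 1568-word depth, the 8 wasted words, and the 720 words wasted by the traditional per-path scheme.

```python
# Cross-check of the storage figures quoted in the text.
N_INPUTS, L_BASIS, N_MISO = 64, 3, 8

words_c_per_miso = 1 + (N_INPUTS + 1) * L_BASIS        # c0 plus 65 coefficients per order -> 196
words_v_per_miso = (N_INPUTS + 1) * L_BASIS            # 195 convolution products

sram_c_depth = words_c_per_miso * N_MISO               # 1568 words for all C matrices
sram_v_depth = sram_c_depth                            # V SRAM kept symmetric with C
wasted_proposed = sram_v_depth - words_v_per_miso * N_MISO   # 8 words wasted in total

# Traditional scheme: each 65-word bank is rounded up to a multiple of 16 (80 words)
wasted_traditional = (80 - 65) * L_BASIS * 2 * N_MISO  # 720 words wasted

print(sram_c_depth, wasted_proposed, wasted_traditional)     # 1568 8 720
```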

3.3 Convolution Unit. The convolution unit plays an important role in the whole circuit. The calculation that the convolution unit achieves is

v_1^{(n)}(t) = Q_1 v_1^{(n)}(t - 1) + Q_2 x_n(t),    (3.1)

v_2^{(n)}(t) = Q_3 v_1^{(n)}(t - 1) + Q_1 v_2^{(n)}(t - 1) + Q_4 x_n(t),    (3.2)

v_3^{(n)}(t) = Q_5 v_1^{(n)}(t - 1) + Q_3 v_2^{(n)}(t - 1) + Q_1 v_3^{(n)}(t - 1) + Q_6 x_n(t).    (3.3)

The parameters in these equations are in Table 4. Li et al. (2013a) have proposed a convolution structure based on the FPGA platform. In their study, v_1^{(n)}/v_1^{(h)}, v_2^{(n)}/v_2^{(h)}, and v_3^{(n)}/v_3^{(h)} are calculated by three parallel paths that


Figure 4: The relationship between the three j-order convolution results: v_1^{(n)}(t − 1), v_2^{(n)}(t − 1), and v_3^{(n)}(t − 1) shift from left to right.

are independent of each other. There is no logical resource sharing between the paths, so the area of the circuit is very large. In order to improve the system's speed, many registers are inserted, increasing the size of the convolution logic. For the application of a hippocampal prosthesis, this part does not need to work at very high frequency in dealing with neural signals, because only neural spike signals with mean firing rates in the range of 0.5 to 15 Hz need to be included in the processing (Song, Chan et al., 2009). At the same time, this structure will waste storage space and lead to a low utilization of physical space. In summary, this structure is not very efficient for ASIC design.

In this letter, we propose a novel small-area and low-power convolution circuit whose structure is shown at the top of Figure 3. Equations 3.1 to 3.3 can be rewritten as shown in Figure 4. The calculation of the three j-order convolution results is not independent; there are links between them. When the position of the parameters Q_n (n = 1, 3, 5) remains unchanged, v_1^{(n)}(t − 1), v_2^{(n)}(t − 1), and v_3^{(n)}(t − 1) shift from left to right. Thus, the multiplier associated with Q_n (n = 1, 3, 5) can be shared. One operand of the multiplier is Q_n, and the other is v, which can be implemented with shift registers.

The convolution period refers to the time required to complete one round of the convolution operations in equations 3.1 to 3.3, and it takes three clock cycles. In the first clock cycle, the data v_1^{(n)}(t − 1) are sent to m1_reg from the port mat_in, and the calculation in equation 3.1 is carried out by the circuit to get v_1^{(n)}(t). Then the result of the calculation is selected by the multiplexer. In the second clock cycle, v_2^{(n)}(t − 1) is sent to m1_reg, and v_1^{(n)}(t − 1) is shifted to m2_reg. The circuit calculates v_2^{(n)}(t) according to equation 3.2. Similarly, in the third clock cycle, v_3^{(n)}(t − 1) is sent to m1_reg, v_2^{(n)}(t − 1) is shifted to m2_reg, and v_1^{(n)}(t − 1) is shifted to m3_reg. The circuit calculates v_3^{(n)}(t) according to equation 3.3.

In the first 64 convolution periods, the circuit calculates the convolution results of x, namely, v_1^{(n)}, v_2^{(n)}, and v_3^{(n)}. In the next convolution period,


the circuit calculates the convolution results of y_1, namely, v_1^{(h)}, v_2^{(h)}, and v_3^{(h)}. Here y_1 is the prediction output y from the last round. Thus, 65 convolution periods are required to complete the entire convolution operation for each MISO. When the convolution results (namely, v^{(n)} or v^{(h)}) are calculated, v and c are accumulated to calculate the sum of the synaptic potential u and afterpotential a in equations 2.8 and 2.9, namely, the membrane potential u + a. It goes through the threshold trigger module to get the final output. At the same time, the circuit calculates the convolution for another MISO.

According to Li et al. (2013a), since all operations related to the 1-bit signal x or y_1 are FP operations, the circuit first converts the 1-bit signal to a 32-bit FP signal to participate in the operation. In fact, x or y_1 does not have to be converted to 32-bit FP data to participate in the operation; it can instead act as the 1-bit select signal of a 2-to-1 multiplexer (MUX) that selects the final result. This eliminates the need for an FP conversion unit and an FP multiplier, so the area becomes smaller. In addition to a significant reduction in the number of registers, this circuit reduces the number of FP multipliers from 9 to 3, and the number of FP adders and multiplexers is also decreased. Combined with the above points, our experiment found that in a 40 nm process, the circuit area of this module can be reduced by 72.43%.

In the hippocampal NN, the neuronal spike trains of the hippocampal CA3 and CA1 regions are very sparse. After analyzing the rat hippocampal CA3 and CA1 neuronal spike trains collected by the multielectrode array, we found that the static probability of a 1 is only approximately 1.23%. Considering this finding, we used the operand isolation (OI) technique, shown in the top panel of Figure 3. Some AND gates are inserted; one port connects the high switching activity signal, and the other connects the spike control signal x/y_1, which is a low switching activity signal. Given that the input and output spike trains are very sparse, they are zero in most cases. Therefore, the high switching activity will not be transferred to the subsequent FP adder and 2-to-1 MUX, significantly reducing the power consumption. Compared with the same design without the OI technique, our experiment found that the core power of the chip can be reduced by 27.43%, which is considerable. This proves that OI technology is suitable for the application of a low-power NN ASIC design and provides an application scheme conducive to the future development of low-power NN ASICs.
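The time-multiplexed datapath can be mimicked in software; the Python sketch below (illustrative only, not the RTL) evaluates equations 3.1 to 3.3 with the Q coefficients of Table 4 and skips the input-dependent terms when the spike bit is 0, mirroring in software the effect of the operand-isolation gating described above. An assumed example value of α is used.

```python
import math

def make_q(alpha):
    """Q1..Q6 from Table 4 for a given Laguerre parameter alpha."""
    sa, s1a = math.sqrt(alpha), math.sqrt(1.0 - alpha)
    return {1: sa, 2: s1a, 3: alpha - 1.0, 4: sa * s1a,
            5: sa * (alpha - 1.0), 6: alpha * s1a}

def conv_unit_step(v_prev, spike, q):
    """Equations 3.1-3.3 for one channel and one time step.

    v_prev: (v1, v2, v3) from the previous step; spike: 0/1 input bit.
    When spike == 0 the x-dependent products are skipped, which is the
    case the operand-isolation gating exploits in hardware.
    """
    v1p, v2p, v3p = v_prev
    v1 = q[1] * v1p
    v2 = q[3] * v1p + q[1] * v2p
    v3 = q[5] * v1p + q[3] * v2p + q[1] * v3p
    if spike:                              # x_n(t) = 1: add the Q2/Q4/Q6 terms
        v1, v2, v3 = v1 + q[2], v2 + q[4], v3 + q[6]
    return (v1, v2, v3)

# Example: assumed alpha = 0.9, sparse spike train
q = make_q(0.9)
v = (0.0, 0.0, 0.0)
for s in [0, 0, 1, 0, 0]:
    v = conv_unit_step(v, s, q)
```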

3.4 Threshold Trigger. The threshold trigger (TT) module is used to generate the output spike trains. It consists of a gaussian random number generator (GRNG) module and an FP comparator. The GRNG is used to generate the gaussian noise term ε(σ). It reflects the spontaneous firing rate of CA1 and is able to capture the system uncertainty that results from both the intrinsic neuronal noise and the


unobserved inputs (neurons whose spiking activities essentially contribute to the model outputs but are not included in the model). The previous parallel architecture used a GRNG in each MISO module, so the area and power are very large. In previous studies, the GRNG was designed based on the central limit theorem (CLT; Tkacik, 2003) and the FPGA platform (Li et al., 2013a, 2013b). In order to generate random numbers that conform to the gaussian distribution, many uniform random numbers need to be accumulated, so a high-frequency clock is required. Power analysis found that the power consumption of this module accounts for a large part of the chip, mainly because the frequency of the module is very high. Here, we use the Box-Muller (BM) algorithm (Box & Muller, 1958; Li, Lee, & Hwang, 2015). Since the random numbers generated by the algorithm are themselves subject to the gaussian distribution, it is not necessary to accumulate multiple sets of random numbers, so the working frequency of the module can be reduced, thus reducing its dynamic power consumption. Compared with the CLT method, the power can be reduced by 19.085 times by using the BM algorithm.

In addition, in this study, a single shared GRNG module replaces eight GRNG modules. Compared with the traditional scheme, this achieves the same effect, but the number of GRNG modules is only one-eighth of the traditional scheme. At the same time, in a round of prediction processes, the eight MISOs take turns predicting the output, but the noise source needs to provide only one gaussian random number (GRN). In order to reflect the different spontaneous firing probabilities of CA1, a multiplier at the end of the GRNG adjusts the standard deviation of the noise, so different MISOs have different gaussian noise. In the case of generating the same amount of gaussian noise, the GRNG power consumption can be further reduced by eight times. Compared with the traditional scheme, the power consumption of the GRNG module in the MIMO NN ASIC can be reduced by about 19.085 × 8 = 152.68 times.

The proposed circuit is shown in Figure 5. The uniform random number generator (URNG) is implemented by a 43-bit linear feedback shift register (LFSR) and a 37-bit cellular automata shift register (CASR), which can be found in Li et al. (2013a). The ln, sqrt, and sin modules are fixed-point natural logarithm, square root, and sine computation digital signal processors (DSPs), respectively. They are used to implement the BM algorithm. When the fixed-point GRNs are generated, every two GRNs are averaged to improve the quality of the GRNs. Then they are converted to FP format by fix2float. The FP GRNs are stored in the first-input, first-output memory (FIFO). The fifo_r_en signal is the read-enable signal used to read the noise data stored in the FIFO. When the eight MISOs take turns performing the prediction process, only one GRN is generated. When a MISO is predicting the output spike trains, the corresponding σ is selected to adjust the variance of the neuronal gaussian noise to get the gaussian noise term ε(σ).


Figure 5: Circuit structure of GRNG. URNG is the uniform random number generator. The ln, sqrt, and sin modules are fixed-point natural logarithm, square root, and sine computation DSPs, respectively. fix2float is a fixed-point to floating-point converter.

The power consumption of the module can be further reduced. The LFSR and CASR are used to generate the uniform random numbers. The signal prog_full generated by the FIFO can be used as the gated clock enable signal to control the data flow of the two modules and other registers; thus, the dynamic power and area of the URNG module can be reduced. This technique is called integrated clock gating (ICG). When the eight MISOs predict the output spikes, only one gaussian noise sample is required, so the fifo_r_en signal can be used as the gated clock enable signal of the FIFO.

When the gaussian random number ε(σ) is generated, it is added to the membrane potential u + a to get the prethreshold potential w. The final output prediction spike y can then be obtained with just one FP comparator fp_cmp and one inverter. The 1-bit output port altb of fp_cmp is high when input port a is less than input port b.

The convolution and TT modules slide in parallel between the two SRAMs that store the C and V coefficient matrices. When one prediction output spike y is generated by each MISO, it is saved. In the next round of the prediction process, the spike is used as y_1 to participate in the convolution operation.
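A software model of the Box-Muller noise path might look like the following Python sketch (illustrative only: the chip uses LFSR/CASR uniform sources and fixed-point DSPs, whereas this sketch uses floating point and Python's uniform generator; the normalization of the averaged samples is an assumption, since the text does not specify it).

```python
import math
import random

def box_muller_pair():
    """Generate two independent standard gaussian samples (Box & Muller, 1958)."""
    u1 = random.random() or 1e-12      # avoid log(0)
    u2 = random.random()
    r = math.sqrt(-2.0 * math.log(u1))
    return r * math.cos(2.0 * math.pi * u2), r * math.sin(2.0 * math.pi * u2)

def noise_term(sigma):
    """Gaussian noise term epsilon(sigma) for one MISO.

    Two Box-Muller samples are averaged, following the text; the per-MISO
    sigma multiplier then sets the final spread of the noise.
    """
    g1, g2 = box_muller_pair()
    return sigma * 0.5 * (g1 + g2)
```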

4 Results

For comparison, we also designed a chip based on the traditional parallel architecture. In order not to waste storage space, SRAMs and RBs are used for the storage of coefficients. In order to reduce the area of the chip, the GRNG module is also shared by eight MISOs.


Figure 6: Chip layout (top) and corresponding die photo (bottom) of the MIMO–GLVM NN ASIC based on the proposed architecture.

The MIMO hippocampal NN ASICs based on the proposed architecture and on the traditional parallel architecture are both implemented by design with Verilog code, logic synthesis with Design Compiler, and automatic placement and routing with IC Compiler. They have passed static timing verification, dynamic simulation verification, formal verification, and physical verification. Both are fabricated in SMIC 40 nm 1P8M technology.

The chip layout and corresponding die photo of the MIMO hippocampal NN ASIC based on the proposed architecture are shown in Figure 6. The chip layout and corresponding die photo of the MIMO hippocampal NN ASIC based on the traditional parallel architecture are shown in Figure 7.

The functions of both chips have been verified by testing. (The test system is in Figure 8.) The test process begins by using Matlab to train the


Figure 7: Chip layout (top) and corresponding die photo (bottom) of the MIMO–GLVM NN ASIC based on the traditional parallel architecture.

MIMO–GLVM to obtain the coefficients in the model. Then the ASIC starts to work in stage 1, and the coefficients are sent to the ASIC by a Xilinx Spartan-6 XC6SLX9 FPGA. When stage 1 is finished, the ASIC starts to work in stage 2, the prediction stage. In stage 2, the ASIC and the software algorithm (whose precision has already been validated by experiments; Song et al., 2007) are used to perform the prediction process with one session of the neuronal firing data (animal 1150). The waveform generator is used to


Figure 8: Test system of the MIMO–GLVM NN ASIC.

generate the 2 MHz clock. The other inputs of the ASIC are generated by the FPGA, and the outputs are captured by the FPGA and an oscilloscope.

The training and prediction data are acquired from male Long-Evans rats aged four to six months that have been trained to perform the delayed nonmatch-to-sample (DNMS) task (Hampson et al., 2012). The hippocampal CA3 and CA1 neuronal firing activities are recorded by the multielectrode array while the rats are performing the task. The recorded spike signals are processed by the spike-sorting algorithm.

The accuracy of the system can be measured by the normalized mean square error (NMSE) of the membrane potential and the output spikes:

\mathrm{NMSE} = \sum_{t=1}^{T} \big( y(t) - y'(t) \big)^2 \Big/ \sum_{t=1}^{T} y'(t)^2,    (4.1)

where y denotes the variable to be evaluated, which can be the membrane potential u + a or the output spikes y; y(t) is the hardware calculation result and y′(t) is the software calculation result. T is the length of the data set; in our experiment, T is 8000.

For simplicity, here we give only the test results based on the proposed architecture. The calculation results of the membrane potential are shown in Figures 9a and 9b. The difference between the two data sets is shown in Figure 9c. After analyzing the results, we found that the NMSE of the


Figure 9: Calculation results of membrane potential. (a) Software calculation results. (b) Hardware calculation results. (c) Difference between the software and hardware results.


Table 5: Performance Comparison.

Technology                    SMIC 40 nm 1P8M
Supply voltage                Core: 0.99–1.21 V; I/O: 2.5 V
Architecture                  Traditional parallel architecture    Proposed architecture
NMSE (u + a)                  3.6271 × 10^−12                      3.6656 × 10^−12
NMSE (y)                      0                                    0
Chip area                     1.643 mm²                            1.464 mm²
Core area                     0.810 mm²                            0.122 mm²
Gate count (2-input NAND)     1,629K                               158.7K
On-chip SRAM                  12.00K bytes                         12.25K bytes
Coefficients' registers       224 bytes                            32 bytes
Frequency                     98 kHz                               2 MHz
Core total power^a            111.5 μW                             84.4 μW
Core static power             92.8 μW                              5.2 μW
Core dynamic power            18.7 μW                              79.2 μW

^a Test condition: input spike mean firing rates are in the range of 0.5 to 15 Hz.

membrane potentials u + a and the output prediction spike trains y are 3.6656 × 10^−12 and 0, respectively. It is clear that the precision of the hardware is very high.

The performance comparison with the ASIC based on the traditional parallel architecture is in Table 5. As the most important part of the hippocampal prosthesis, the core of the chip will be embedded in the whole prosthesis in the future, so only the area and power consumption of the core are worthy of attention.

So far, there has been little MIMO–GLVM-based hippocampal NN ASIC research. It is reported only in Berger et al. (2012), who do not give data about chip area, power consumption, accuracy, or functional test results. Other research has been based on the FPGA platform. In Li et al. (2011a), the MIMO system consists of multiple FPGAs, each implementing a MISO. The power consumption is 1.9 W for one MISO, so the power consumption is 15.2 W for the MIMO system. In this letter, eight MISOs are integrated in one chip. Test results show that the power consumption of the proposed ASIC is only 84.4 μW.

Compared with the hippocampal NN ASIC using the traditional parallel architecture, the NMSE of u + a is larger in the ASIC using the proposed architecture, but it increased by only 1.06% and is acceptable because the NMSE of the most important final output prediction spike trains y is 0. The physical space occupied by SRAMs is reduced by 59.38%, from 114,643 μm² to 46,571 μm², proving the high efficiency of the proposed storage space configuration scheme. The core area of the chip is also reduced, from 0.810 mm² to 0.122 mm², or 84.94%, demonstrating the


low-complexity and small-area characteristics of the proposed architecture. Test results show that the core power is reduced by 24.30%, which demonstrates the low-power characteristic of the proposed architecture.

The power consumption of the proposed ASIC is very small for the following reasons:

• The chip operates at a low frequency. Although it can run at 2 MHz or higher, for the application of a hippocampal prosthesis the MIMO–GLVM NN ASIC does not need to work at a higher frequency, because only the neural spike signals with mean firing rates in the range of 0.5 to 15 Hz need to be included in the processing (Song, Chan et al., 2009). Provided that the oversampling frequency of the neural spike signal is achieved, the system frequency (2 MHz) is minimized to reduce power consumption, and the chip can still realize the real-time prediction of neural activity.
• The proposed architecture greatly reduces the core area of the chip. In the 40 nm process, the leakage power accounts for a significant proportion, and the leakage power is greatly reduced due to the reduced area.
• A variety of low-power strategies are adopted, and low-power convolution and GRNG modules are proposed for the MIMO hippocampal NN ASIC design.
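The headline percentages can be checked from the raw numbers in Table 5 and the SRAM areas quoted above; the short Python snippet below is a cross-check of the quoted figures, not additional measured data.

```python
# Cross-check of the reported reductions (numbers taken from Table 5 and the text).
core_area_old, core_area_new = 0.810, 0.122          # mm^2
core_power_old, core_power_new = 111.5, 84.4         # uW
sram_area_old, sram_area_new = 114_643, 46_571       # um^2

area_reduction = (core_area_old - core_area_new) / core_area_old      # ~0.8494
power_reduction = (core_power_old - core_power_new) / core_power_old  # ~0.2430
sram_reduction = (sram_area_old - sram_area_new) / sram_area_old      # ~0.5938

print(f"{area_reduction:.2%} {power_reduction:.2%} {sram_reduction:.2%}")
```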

5 Discussion

There are three advantages of the proposed architecture:

1. Highly efficient storage space configuration. Compared with the chip based on the traditional parallel architecture, the proposed architecture has high utilization of physical space with little storage space wasted. The physical space occupied by SRAMs is reduced by 59.38%.
2. Low complexity and small area. Compared with the chip based on the traditional parallel architecture, the core area of the chip can be reduced by 84.94%.
3. Low power consumption. Compared with the chip based on the traditional parallel architecture, the chip's core power consumption can be reduced by 24.30%.

The proposed architecture has these disadvantages:

1. The chip needs a higher clock frequency, which increases the dynamic power consumption. After adopting the proposed architecture, the core processing unit is time-multiplexed between the eight MISOs. Therefore, in order to achieve the same function as the parallel architecture, the operating speed of the core processing units needs to be increased by eight times. In addition, the convolution


unit uses a pipeline structure, making the chip operating frequency increase by less than three times. Together, these make the proposed architecture work about 20 times faster than the traditional parallel architecture, increasing the dynamic power consumption.
2. The system scalability (extension to more outputs) is not as good as that of the parallel architecture. The parallel architecture consists of multiple MISOs that are independent of each other, so its scalability is relatively high. In the proposed architecture, the coefficient matrices of multiple MISOs are stored together, and the core processing units are time-division-multiplexed, thus reducing the system area and scale at the expense of scalability.
3. The accuracy of the system is not as high as that of the parallel architecture. In the chip, the convolution unit is the key to calculation accuracy. The convolution unit circuit presented in this letter greatly reduces the circuit area at the expense of computational accuracy.

For the first point, although the chip's dynamic power consumption increases, the static power consumption is greatly reduced, so the overall power consumption is still reduced by 24.30%. For the third point, although the precision of the system is not as high as that of the parallel architecture, the calculation accuracy of the cell membrane potential is reduced by only 1.06%, and the accuracy of the output y is the same, without affecting the correctness of the system functions. As for the second point, the two architectures represent a trade-off between scalability and chip area.

In the current work, the input/output data for model training are prerecorded. Training data were collected from implanted electrodes in normal rat hippocampus without the need for implantation of the hippocampal prosthesis chip. Only individuals with an impaired hippocampus need to have an implanted hippocampal prosthesis biochip. Due to the similarities of the hippocampal structures, these training coefficients also apply to other rats. The model training part is done on the server, and the final coefficients obtained by the training are stored in the FPGA. In the test, we transmit the coefficients through the FPGA to the ASIC. In future hippocampal prostheses, wireless interfaces will be integrated, and these coefficients will be transmitted to the chip wirelessly, so the operation needs to be performed only once after the chip has been powered on. In the current chip test, the waveform generator is used to generate the clock signal. In future hippocampal prostheses, a ring oscillator can be used to generate the clock signal.

The coefficients are stored in a single-precision FP format, which would no doubt result in higher memory requirements. By further study of the neural network of the hippocampus, two important phenomena can be found: the neuron's output may not be affected by all the inputs connected to it, and for a given input, not all historical stimuli have an important contribution to its connected neurons' outputs.


These phenomena give the coefficient matrices C and V certain sparse characteristics. Because the complete Volterra model does not account for these sparse features, the number of elements in the coefficient matrix to be estimated is huge and not particularly efficient in representing the system. The sparse Volterra model (sVM), a simplified version of the Volterra model, selects only a few statistical subsets of all the model coefficients and can greatly reduce the number of elements in the matrix of coefficients to be estimated, reducing the complexity of the model. In order to reduce the memory requirements, two methods can be used. One is reducing the number of coefficients: we will consider the sparse characteristics of neural connections based on the GLVM and study a hardware-implemented sparse GLVM (sGLVM) (Song et al., 2013, 2015). The other is using a fixed-point format instead of the single-precision FP format to store these coefficients.

Another potential extension of our current work is the utilization of the stochastic state point-process filter (SSPPF; Eden et al., 2004; Chan, Song, & Berger, 2009) for the model training instead of the SDPPF. In the SSPPF algorithm, the learning rate is not set as a constant; it needs to be adjusted during each round of calculation. This can improve the speed at which the model converges during training. The drawback is that this algorithm is more complicated, so it will consume more software memory.

This chip is aimed at an implantable neural prosthesis, so the power issue is more critical. In muscle tissue, the power density of the chip cannot be higher than 800 μW/mm²; otherwise it will cause cell necrosis (Seese, Harasaki, Saidel, & Davies, 1998). The power density of this chip is 691.8 μW/mm², which is lower than this limit. No comparable figure has been reported for brain tissue. Since the MIMO–GLVM NN ASIC is only one part of the hippocampal prosthesis, it will make more sense to discuss the power density of the whole hippocampal prosthesis in the future. We will study this issue in future hippocampal prosthesis design.
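As a sanity check on the power-density figure (simple arithmetic using the core power and core area reported in section 4):

```python
core_power_uw = 84.4            # core test power, in microwatts
core_area_mm2 = 0.122           # core area, in square millimeters
power_density = core_power_uw / core_area_mm2   # ~691.8 uW/mm^2
assert power_density < 800.0    # limit reported for muscle tissue (Seese et al., 1998)
```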

6 Conclusion

A novel MIMO–GLVM-based programmable hippocampal neural network ASIC is proposed. New hardware architecture, a storage space configuration scheme, and low-power convolution and gaussian random number generator modules are proposed. The ASIC is fabricated in 40 nm technology with a core area of 0.122 mm² and test power of 84.4 μW. Compared with the design based on the traditional architecture, experimental results show that the core area of the chip is reduced by 84.94% and the core power is reduced by 24.30%, demonstrating the low-complexity, small-area, and low-power characteristics of the proposed architecture. As the most important module of the hippocampal prosthesis, it will facilitate the research and development of the implantable hippocampal prosthesis.


Acknowledgments

This work was supported in part by Shenzhen Municipal Research grant JCYJ20150630140546712, in part by the National Natural Science Foundation of China under grant 61601226, and in part by the Natural Science Foundation of Jiangsu Province of China under grant BK20160850.

References

Bartolozzi, C., & Indiveri, G. (2007). Synaptic dynamics in analog VLSI. Neural Computation, 19(10), 2581–2603.
Berger, T. W., Orr, W. B., & Orr, W. B. (1983). Hippocampectomy selectively disrupts discrimination reversal conditioning of the rabbit nictitating membrane response. Behavioural Brain Research, 8(1), 49–68.
Berger, T. W., Song, D., Chan, R. H. M., Marmarelis, V. Z., LaCoss, J., Wills, J., . . . Granacki, J. J. (2012). A hippocampal cognitive prosthesis: Multi-input, multioutput nonlinear modeling and VLSI implementation. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 20(2), 198–211.
Box, G. E. P., & Muller, M. E. (1958). A note on the generation of random normal deviates. Annals of Mathematical Statistics, 29(2), 610–611.
Chan, R. H. M., Song, D., & Berger, T. W. (2009). Nonstationary modeling of neural population dynamics. In Proceedings of the International Conference of the IEEE Engineering in Medicine and Biology Society (pp. 4559–4562). Piscataway, NJ: IEEE.
Chen, Y., Krishna, T., Emer, J. S., & Sze, V. (2017). Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks. IEEE Journal of Solid-State Circuits, 52(1), 127–138.
Cymbalyuk, G. S., Patel, G. N., Calabrese, R. L., DeWeerth, S. P., & Cohen, A. H. (2000). Modeling alternation to synchrony with inhibitory coupling: A neuromorphic VLSI approach. Neural Computation, 12(10), 2259–2278.
Eden, U. T., Frank, L. M., Barbieri, R., Solo, V., & Brown, E. N. (2004). Dynamic analysis of neural encoding by point process adaptive filtering. Neural Computation, 16(5), 971–998.
Eichenbaum, H., Fagan, A., Mathews, P., & Cohen, N. J. (1988). Hippocampal system dysfunction and odor discrimination learning in rats: Impairment or facilitation depending on representational demands. Behavioral Neuroscience, 102(3), 331–339.
Giulioni, M., Pannunzi, M., Badoni, D., Dante, V., & Giudice, P. D. (2009). Classification of correlated patterns with a configurable analog VLSI neural network of spiking neurons and self-regulating plastic synapses. Neural Computation, 21(11), 3106–3129.
Hampson, R. E., Song, D., Chan, R. H. M., Sweatt, A. J., Riley, M. R., Gerhardt, G. A., . . . Deadwyler, S. A. (2012). A nonlinear model for hippocampal cognitive prosthesis: Memory facilitation by hippocampal ensemble stimulation. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 20(2), 184–197.
Karahaliloglu, K., Gans, P., Schemm, N., & Balkir, S. (2008). Pixel sensor integrated neuromorphic VLSI system for real-time applications. Neurocomputing, 72(1), 293–301.


Li, W. X. Y., Chan, R. H. M., Zhang, W., Cheung, R. C. C., Song, D., & Berger, T. W. (2011a). High-performance and scalable system architecture for the real-time estimation of generalized Laguerre-Volterra MIMO model from neural population spiking activity. IEEE Journal on Emerging and Selected Topics in Circuits and Systems, 1(4), 489–501.
Li, W. X. Y., Chan, R. H. M., Zhang, W., Cheung, R. C. C., Song, D., & Berger, T. W. (2011b). A hardware-based computational platform for generalized Laguerre-Volterra MIMO model for neural activities. In Proceedings of the 2011 Annual International Conference of the IEEE Engineering in Medicine and Biology Society (pp. 7282–7285). Piscataway, NJ: IEEE.
Li, W. X. Y., Cheung, R. C. C., Chan, R. H. M., Song, D., & Berger, T. W. (2013a). Real-time prediction of neuronal population spiking activity using FPGA. IEEE Transactions on Biomedical Circuits and Systems, 7(4), 489–498.
Li, W. X. Y., Cheung, R. C. C., Chan, R. H. M., Song, D., & Berger, T. W. (2013b). A reconfigurable architecture for real-time prediction of neural activity. In Proceedings of the 2013 IEEE International Symposium on Circuits and Systems (pp. 1869–1872). Piscataway, NJ: IEEE.
Li, W. X. Y., Xin, Y., Chan, R. H. M., Song, D., Berger, T. W., & Cheung, R. C. C. (2014). Laguerre-Volterra model and architecture for MIMO system identification and output prediction. In Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society (pp. 4539–4542). Piscataway, NJ: IEEE.
Li, Y., Lee, T., & Hwang, J. (2015). Modular design and implementation of field-programmable-gate-array-based gaussian noise generator. International Journal of Electronics, 103(5), 819–830.
Marmarelis, V. Z., & Marmarelis, V. Z. (1993). Identification of nonlinear biological systems using Laguerre expansions of kernels. Annals of Biomedical Engineering, 21(6), 573–589.
Martí, D., Rigotti, M., Seok, M., & Fusi, S. (2016). Energy-efficient neuromorphic classifiers. Neural Computation, 28(10), 2011–2044.
Merolla, P., Arthur, J., Alvarez-Icaza, R., Cassidy, A., Sawada, J., Akopyan, F., . . . Modha, D. (2014). A million spiking-neuron integrated circuit with a scalable communication network and interface. Science, 345(6197), 668–673.
Milner, B. (1970). Memory and the medial temporal regions of the brain. In K. H. Pribram & D. E. Broadbent (Eds.), Biology of memory (pp. 29–50). New York: Academic Press.
Mullard, A. (2016). Symptomatic AD treatment fails in first phase III. Nature Reviews Drug Discovery, 15(11), 738.
Neftci, E., Chicca, E., Indiveri, G., & Douglas, R. (2011). A systematic method for configuring VLSI networks of spiking neurons. Neural Computation, 23(10), 2457–2497.
Ogura, H. (1972). Orthogonal functionals of the Poisson process. IEEE Transactions on Information Theory, 18(4), 473–481.
Rawat, W., & Wang, Z. (2017). Deep convolutional neural networks for image classification: A comprehensive review. Neural Computation, 29(9), 2352–2449.
Seese, T. M., Harasaki, H., Saidel, G. M., & Davies, C. R. (1998). Characterization of tissue morphology, angiogenesis, and temperature in the adaptive response of muscle tissue to chronic heating. Laboratory Investigation, 78(12), 1553.


Service, R. (2014). The brain chip. Science, 345(6197), 614–616.
Sevigny, J., Chiao, P., Bussière, T., Weinreb, P. H., Williams, L., Maier, M., . . . Sandrock, A. (2016). The antibody aducanumab reduces Aβ plaques in Alzheimer's disease. Nature, 537(7618), 50–56.
Song, D., Chan, R. H. M., Marmarelis, V. Z., Hampson, R. E., Deadwyler, S. A., & Berger, T. W. (2007). Nonlinear dynamic modeling of spike train transformations for hippocampal-cortical prostheses. IEEE Transactions on Biomedical Engineering, 54(6), 1053–1066.
Song, D., Chan, R. H. M., Marmarelis, V. Z., Hampson, R. E., Deadwyler, S. A., & Berger, T. W. (2009). Nonlinear modeling of neural population dynamics for hippocampal prostheses. Neural Networks, 22(9), 1340–1351.
Song, D., Marmarelis, V. Z., & Berger, T. W. (2009). Parametric and non-parametric modeling of short-term synaptic plasticity: Part I: Computational study. Journal of Computational Neuroscience, 26(1), 1–19.
Song, D., Robinson, B. S., Hampson, R. E., Marmarelis, V. Z., Deadwyler, S. A., & Berger, T. W. (2015). Sparse generalized Volterra model of human hippocampal spike train transformation for memory prostheses. In Proceedings of the Engineering in Medicine and Biology Society (p. 3961). Piscataway, NJ: IEEE.
Song, D., Wang, H., Tu, C. Y., Marmarelis, V. Z., Hampson, R. E., Deadwyler, S. A., & Berger, T. W. (2013). Identification of sparse neural functional connectivity using penalized likelihood estimation and basis functions. Journal of Computational Neuroscience, 35(3), 335–357.
Squire, L. R., & Zola-Morgan, S. (1991). The medial temporal lobe memory system. Science, 253(5026), 1380–1386.
Tkacik, T. E. (2003). A hardware random number generator. In B. Kaliski, Ç. K. Koç, & C. Paar (Eds.), Lecture Notes in Computer Science: Vol. 2523. Cryptographic hardware and embedded systems (pp. 450–453). Berlin: Springer.
Valiant, L. G. (2012). The hippocampus as a stable memory allocator for cortex. Neural Computation, 24(11), 2873–2899.

Received December 1, 2017; accepted March 21, 2018.
