PyCARL: A PyNN Interface for Hardware-Software Co-Simulation of Spiking Neural Networks

Adarsha Balaji1, Prathyusha Adiraju2, Hirak J. Kashyap3, Anup Das1,2, Jeffrey L. Krichmar3, Nikil D. Dutt3, and Francky Catthoor2,4
1Electrical and Computer Engineering, Drexel University, Philadelphia, USA
2Neuromorphic Computing, Stichting Imec Nederlands, Eindhoven, Netherlands
3Cognitive Science and Computer Science, University of California, Irvine, USA
4ESAT Department, KU Leuven and IMEC, Leuven, Belgium
Correspondence Email: [email protected], [email protected], [email protected]

Abstract—We present PyCARL, a PyNN-based common Python programming interface for hardware-software co-simulation of spiking neural networks (SNNs). Through PyCARL, we make the following two key contributions. First, we provide an interface of PyNN to CARLsim, a computationally-efficient, GPU-accelerated and biophysically-detailed SNN simulator. PyCARL facilitates joint development of machine learning models and code sharing between CARLsim and PyNN users, promoting an integrated and larger neuromorphic community. Second, we integrate cycle-accurate models of state-of-the-art neuromorphic hardware such as TrueNorth, Loihi, and DYNAP-SE in PyCARL to accurately model hardware latencies, which delay spikes between communicating neurons and degrade the performance of machine learning models. PyCARL allows users to analyze and optimize the performance difference between software-based simulation and hardware-oriented simulation. System designers can also use PyCARL to perform design-space exploration early in the product development stage, facilitating faster time-to-market of neuromorphic products.

Index Terms—spiking neural network (SNN); neuromorphic computing; CARLsim; co-simulation; design-space exploration

I. INTRODUCTION

Advances in computational neuroscience have produced a variety of software for simulating spiking neural networks (SNNs) [1], [2] — NEST [3], PCSIM [4], Brian [5], MegaSim [6], and CARLsim [7]. These simulators model neural functions at various levels of detail and therefore have different requirements for computational resources.

In this paper, we focus on CARLsim [7], which facilitates parallel simulation of large SNNs using CPUs and multi-GPUs, simulates multiple compartment models, 9-parameter Izhikevich and leaky integrate-and-fire (LIF) spiking neuron models, and integrates the fourth-order Runge-Kutta (RK4) method for improved numerical precision. CARLsim's support for built-in biologically realistic neuron, synapse, current and emerging learning models, together with continuous integration and testing, makes it an easy-to-use and powerful simulator of biologically-plausible SNN models. Benchmarking results demonstrate simulation of 8.6 million neurons and 0.48 billion synapses using 4 GPUs, and up to 60x speedup with multi-GPU implementations over a single-threaded CPU implementation.

To facilitate faster application development and portability across research institutes, a common Python programming interface called PyNN has been proposed [8]. PyNN provides a high-level abstraction of SNN models, promotes code sharing and reuse, and provides a foundation for simulator-agnostic analysis, visualization and data-management tools. Many SNN simulators now support interfacing with PyNN — PyNEST [9] for the NEST simulator, PyPCSIM [4] for the PCSIM simulator, and Brian 2 [10] for the Brian simulator. Through PyNN, applications developed using one simulator can be analyzed/simulated using another simulator with minimal effort.

Currently, no interface exists between CARLsim, which is implemented in C++, and the Python-based PyNN. Therefore, applications developed in PyNN cannot be analyzed using CARLsim and, conversely, CARLsim-based applications cannot be analyzed/simulated using other SNN simulators without significant effort. This creates a large gap between these two research communities.

Our objective is to bridge this gap and create an integrated neuromorphic research community, facilitating joint development of machine learning models and efficient code sharing. Figure 1 illustrates the standardized application programming interface (API) architecture in PyNN. Brian 2 and PCSIM, which are native Python implementations, employ direct communication via the pynn.brian and pynn.pcsim API calls, respectively. NEST, on the other hand, is not a native Python simulator, so the pynn.nest API call first results in code generation to native SLI code, a stack-based language derived from PostScript. The generated code is then used by the Python interpreter PyNEST to simulate an SNN application utilizing the backend NEST simulator kernel. Figure 1 also shows our proposed interface for CARLsim, which is exposed via the new pynn.carlsim API in PyNN. We describe this interface in detail in Section III.

On the hardware front, neuromorphic computing [11] has shown significant promise to fuel the growth of machine learning, thanks to the low-power design of neuron circuits, distributed implementation of computing and storage, and integration of non-volatile synaptic memory. In recent years, several spiking neuromorphic architectures have been designed: SpiNNaker [12], DYNAP-SE [13], TrueNorth [14] and Loihi [15]. Unfortunately, due to the non-zero latency of hardware components, spikes between communicating neurons may experience non-deterministic delays, impacting SNN performance.
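The PyNN backend-dispatch pattern described above (a common frontend selecting among pynn.nest, pynn.pcsim, pynn.brian, and the proposed pynn.carlsim) can be sketched in a few lines of Python. The registry, class names, and setup signature below are illustrative assumptions for exposition, not PyNN's actual implementation.

```python
# Illustrative sketch of PyNN-style backend dispatch (not PyNN's real code):
# a common frontend maps a backend name to a simulator-specific object.

class Backend:
    """Minimal stand-in for a simulator backend (e.g., pynn.carlsim)."""
    def __init__(self, name, native_python):
        self.name = name
        # False => needs code generation or a C++ kernel behind a wrapper
        # (e.g., NEST via SLI, CARLsim via SWIG).
        self.native_python = native_python

    def setup(self, **params):
        return "%s: setup(%s)" % (self.name, sorted(params))

# Hypothetical registry mirroring the four frontends in Figure 1.
REGISTRY = {
    "nest":    Backend("nest", native_python=False),
    "pcsim":   Backend("pcsim", native_python=True),
    "brian":   Backend("brian", native_python=True),
    "carlsim": Backend("carlsim", native_python=False),
}

def get_backend(name):
    """Return the simulator backend for `name`, as pynn.<name> would."""
    try:
        return REGISTRY[name]
    except KeyError:
        raise ValueError("unknown backend: %s" % name)

sim = get_backend("carlsim")
print(sim.setup(timestep=0.01, min_delay=1.0))
```

The point of the pattern is that application code only ever talks to the object returned by the dispatch call, so swapping simulators means changing one name rather than the model description.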

Fig. 1: PyNN standardized API architecture and our proposed pynn-to-carlsim interface.

II. OUR INTEGRATED FRAMEWORK PYCARL

Figure 3 shows a high-level overview of our integrated framework PyCARL, based on PyNN. An SNN model written in PyNN is simulated using the CARLsim backend kernel with the proposed pynn-to-carlsim interface (contribution 1). This generates the first output snn.sw.out, which consists of the synaptic strength of each connection in the network and the precise timing of spikes on these connections. This output is then used in the proposed carlsim-to-hardware interface, allowing simulation of the SNN on a cycle-accurate model of a state-of-the-art neuromorphic hardware such as TrueNorth [14], Loihi [15], and DYNAP-SE [13]. Our cycle-accurate model generates the second output snn.hw.out, which consists of 1) hardware-specific metrics such as latency, throughput, and energy, and 2) SNN-specific metrics such as inter-spike interval distortion and disorder spike count (which we formulate and elaborate in Section IV). The SNN-specific metrics estimate the performance drop due to non-zero hardware latencies.

Currently, no PyNN-based simulator incorporates neuromorphic hardware latencies. Therefore, SNN performance estimated using PyNN can differ from the performance obtained on hardware. Our objective is to estimate this performance difference, allowing users to optimize their machine learning model to meet a desired performance on a target neuromorphic hardware. Figure 2 shows our proposed carlsim-to-hardware interface to model state-of-the-art neuromorphic hardware at a cycle-accurate level, using the output generated from the proposed pynn-to-carlsim interface (see Figure 1). We describe this interface in Section IV.
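As a rough illustration of how the snn.sw.out and snn.hw.out spike data can be compared, the sketch below computes a simple inter-spike interval (ISI) distortion between software-simulated and hardware-simulated spike trains. The function names and the mean-absolute-difference formula are illustrative assumptions; the paper's precise metric formulations appear in Section IV.

```python
# Illustrative sketch (not the paper's exact formulation): estimate inter-spike
# interval (ISI) distortion by comparing per-neuron ISIs from the software
# simulation (snn.sw.out) against those from the hardware model (snn.hw.out).

def isi(spike_times):
    """Inter-spike intervals of one neuron's sorted spike-time list (ms)."""
    return [t1 - t0 for t0, t1 in zip(spike_times, spike_times[1:])]

def isi_distortion(sw_spikes, hw_spikes):
    """Mean absolute ISI difference, averaged over all intervals.

    sw_spikes/hw_spikes: lists indexed by neuron id, each a sorted list of
    spike times. Assumes the hardware run preserves spike counts.
    """
    total, count = 0.0, 0
    for sw, hw in zip(sw_spikes, hw_spikes):
        for d_sw, d_hw in zip(isi(sw), isi(hw)):
            total += abs(d_hw - d_sw)
            count += 1
    return total / count if count else 0.0

# Interconnect latency stretches the second ISI of neuron 0 by 2 ms.
sw = [[0.0, 10.0, 20.0], [5.0, 15.0]]
hw = [[0.0, 10.0, 22.0], [5.0, 15.0]]
print(isi_distortion(sw, hw))
```

A distortion of zero means the hardware delivered every spike with the same relative timing as the software simulation; larger values indicate latency-induced timing drift of the kind the carlsim-to-hardware interface is designed to expose.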

Fig. 3: Our integrated framework PyCARL.

We now describe the components of PyCARL and show how to use PyCARL to perform design-space explorations.

The two new interfaces developed in this work can be integrated inside a design-space exploration (DSE) framework (illustrated in Figure 2(b)) to explore different SNN topologies and neuromorphic hardware configurations, optimizing both SNN performance, such as accuracy, and hardware performance, such as latency, energy, and throughput.

Fig. 2: (a) Our proposed interface to estimate SNN performance on neuromorphic hardware and (b) design-space exploration (DSE) based on this contribution.

Summary: To summarize, our comprehensive co-simulation framework, which we call PyCARL, allows CARLsim-based detailed software simulations, hardware-oriented simulations, and neuromorphic design-space explorations, all from a common PyNN frontend, allowing extensive portability across different research institutes. By using cycle-accurate models of state-of-the-art neuromorphic hardware, PyCARL allows users to perform hardware exploration and performance estimation early during application development, accelerating the neuromorphic product development cycle.

III. PYNN-TO-CARLSIM INTERFACE IN PYCARL

Apart from bridging the gap between the PyNN and CARLsim research communities, the proposed pynn-to-carlsim interface is also significant in the following three ways. First, Python being an interactive language allows users to interact with the CARLsim kernel through the command line, reducing application development time. Second, the proposed interface allows code portability across different operating systems (OSes) such as Linux, Solaris, Windows, and Macintosh. Third, Python being open source allows distributing the proposed interface with mainstream OS releases, exposing neuromorphic computing to the systems community.

An interface between Python (PyNN) and C++ (CARLsim) can be created using the following two approaches. The first approach is to statically link the C++ library with a Python interpreter.
This involves copying all library modules used in CARLsim into a final executable image by an external program called a linker, or link editor. Statically linked files are significantly larger in size because external programs are built into the executable files, which must be loaded into memory every time they are invoked. This increases program execution time. Static linking also requires all files to be recompiled every time one or more of the shared modules change. The second approach is dynamic linking, which involves placing the names of the external libraries (shared libraries) in the final executable file, with the actual linking taking place at run time. Dynamic linking is performed by the OS through API calls. Dynamic linking places only one copy of the shared library in memory. This significantly reduces the size of executable programs, thereby saving memory and disk space. Individual shared modules can be updated and recompiled without recompiling the entire source code. Finally, the load time of shared libraries is reduced if the shared library code is already present in memory. Due to lower execution time, reduced memory usage, and flexibility, we adopt dynamic linking of CARLsim with PyNN.

We now describe the two steps involved in creating the proposed pynn-to-carlsim interface.

A. Step 1: Generating the Interface Binary carlsim.so

Unlike PyNEST, which generates the interface binary manually, we propose to use the Simplified Wrapper and Interface Generator (SWIG), downloadable at http://www.swig.org. SWIG simplifies the process of interfacing high-level languages such as Python with low-level languages such as C/C++, preserving the robustness and expressiveness of these low-level languages from the high-level abstraction.

The SWIG compiler creates a wrapper binary using headers, directives, macros, and declarations from the underlying C++ code of CARLsim. Figures 4-8 show the different components of the input file carlsim.i needed to generate the compiled interface binary carlsim.so. The first component is the set of interface files included using the %include directive (Figure 4).

Fig. 4: Define interface files using the %include directive.

The second component consists of declarations of the data structures (e.g., vectors) of CARLsim using the %template directive (Figure 5).

    namespace std {
        %template(vectori) vector<int>;
        %template(vectord) vector<double>;
    };

Fig. 5: Declare CARLsim data structures using the %template directive.

The third component is the main module definition that can be loaded in Python using the import command (Figure 6).

    %module carlsim
    %{
    /* Put headers and other declarations here */
    #include "../CARLsim4/carlsim/interface/inc/carlsim.h"
    #include "../CARLsim4/carlsim/interface/inc/carlsim_datastructures.h"
    #include "../CARLsim4/carlsim/interface/inc/carlsim_definitions.h"
    #include "../CARLsim4/carlsim/interface/inc/callback.h"
    %}

Fig. 6: Main module definition for import in Python.

The fourth component consists of enumerated data types defined by the directive enum (Figure 7). In this example we show two definitions: 1) the STDP curve and 2) the computing platform. Many languages map enums, structs and constants in a class definition into constants with the class name as a prefix; SWIG generates a set of such mappings that can be used by the target scripting language. The enumerated data types needed for the working of the complete simulator have been added to the carlsim module.

    enum STDPCurve {
        EXP_CURVE,
        PULSE_CURVE,
        TIMING_BASED_CURVE,
        UNKNOWN_CURVE
    };
    enum ComputingBackend {
        CPU_CORES,
        GPU_CORES
    };

Fig. 7: Enumerated data types using the enum directive.

The last component is the CARLsim class object along with its member functions (Figure 8). These member functions expose the simulator's functionalities; the carlsim module has also been extended with functionality such as spike generation from file and from vector.

    class CARLsim {
    public:
        // creating carlsim object
        CARLsim(const std::string& netName = "SNN",
                SimMode preferredSimMode = CPU_MODE,
                LoggerMode loggerMode = USER,
                int ithGPUs = 0, int randSeed = -1);
        ~CARLsim();

        // creating groups and spike generator groups
        int createSpikeGeneratorGroup(const std::string& grpName, int nNeur,
                int neurType, int preferredPartition = ANY,
                ComputingBackend preferredBackend = CPU_CORES);
        int createSpikeGeneratorGroup(const std::string& grpName,
                const Grid3D& grid, int neurType, int preferredPartition = ANY,
                ComputingBackend preferredBackend = CPU_CORES);
        int createGroup(const std::string& grpName, int nNeur,
                int neurType, int preferredPartition = ANY,
                ComputingBackend preferredBackend = CPU_CORES);
        int createGroup(const std::string& grpName, const Grid3D& grid,
                int neurType, int preferredPartition = ANY,
                ComputingBackend preferredBackend = CPU_CORES);
    };

Fig. 8: CARLsim class object.

The major advantage of using SWIG is that it uses a layered approach to generate a wrapper over C++ classes. At the lowest level, a collection of procedural ANSI-C style wrappers is generated by SWIG. These wrappers take care of the basic type conversions, type checking, error handling and other low-level details of the C++ bindings. To generate the interface binary carlsim.so, the input file carlsim.i is compiled using the swig compiler as shown in Figure 9. When wrapping C++ code, the -c++ option must be specified so that C++ keywords are recognized and memory management is handled correctly. The compilation also produces a carlsim.py file, making the CARLsim simulator available as a Python library.

Fig. 9: Compilation of carlsim.i using the SWIG compiler to generate carlsim.so.

B. Step 2: Designing the PyNN API to Link carlsim.so

We now describe the proposed pynn.carlsim API to link the interface binary carlsim.so in PyNN. The carlsim.so interface binary is placed in the setup sub-package directory of PyNN. This exposes CARLsim internal methods as a Python library using the import command, as in from carlsim import *. The PyNN front-end API architecture supports implementing both basic functionalities (common to all backend simulators) and specialized simulator-specific functionalities.

1) Implementing common functionalities: PyNN defines many common functionalities to create a basic SNN model. Examples include cell types, connectors, synapses, and electrodes. Figure 10 shows the UML class diagram to create the Izhikevich cell type [16] using the pynn.carlsim API. The inheritance relationship shows that the PyNN standardmodels class includes the definition of all the methods under the StandardModelType class. The Izhikevich model and all similar standard cell types are specializations of this StandardModelType class, which subsequently inherits from the PyNN BaseModelType class. Defining other standard components of an SNN model follows a similar inheritance pattern using the common internal API functions provided by PyNN.
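The inheritance chain just described (BaseModelType, then StandardModelType, then a standard cell type such as Izhikevich) can be sketched as follows. This is a minimal illustration of the pattern with simplified, assumed class bodies; it is not PyNN's actual code.

```python
# Simplified sketch of the PyNN-style cell-type inheritance pattern
# (class bodies are illustrative assumptions, not PyNN's implementation).

class BaseModelType:
    default_parameters = {}

    def __init__(self, **parameters):
        # Unknown parameters are rejected; missing ones take defaults.
        unknown = set(parameters) - set(self.default_parameters)
        if unknown:
            raise ValueError("unknown parameters: %s" % sorted(unknown))
        self.parameters = dict(self.default_parameters, **parameters)

class StandardModelType(BaseModelType):
    def get_parameter_names(self):
        return sorted(self.default_parameters)

class Izhikevich(StandardModelType):
    # Four Izhikevich parameters with commonly used defaults.
    default_parameters = {"a": 0.02, "b": 0.2, "c": -65.0, "d": 6.0}

cell = Izhikevich(d=8.0)
print(cell.get_parameter_names())
print(cell.parameters["d"])
```

Because parameter validation and bookkeeping live in the base classes, a backend such as pynn.carlsim only has to translate the validated parameters into its own simulator calls.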
Fig. 10: UML class diagram of the Izhikevich cell type showing the relationship with the pynn.carlsim API.

2) Implementing specialized CARLsim functions: Using the standard PyNN classes, it is also possible to define and expose non-standard CARLsim functionalities. Figure 11 details the state class of CARLsim and the composition adornment relationship between the state class of the simulator module and the CARLsim class in the pynn.carlsim API. The composition adornment means that, apart from the composition relationship between the contained class (CARLsim) and the container class (State), the object of the contained class also goes out of scope when the containing class goes out of scope. Thus, the State class exercises complete control over the members of the CARLsim class objects. The class member variable network of the simulator.State class contains an instance of the CARLsim object. From Figure 11 it can be seen that the CARLsim class consists of methods that can be used for the initial configuration of an SNN model, as well as methods for running, stopping and saving the simulation. These functions are appropriately exposed to PyNN by including them in the pynn.carlsim API methods, which are created as members of the class simulator.State.

Fig. 11: UML class diagram of the simulator state of the pynn.carlsim API.

Figure 12 shows the implementations of the run() and setupNetwork() methods in the simulator.State class. These methods call the corresponding functions in the CARLsim class of the pynn.carlsim API. This technique can be used to expose other non-standard CARLsim methods in the pynn.carlsim API.

    def run(self, simtime):
        self.running = True
        nSec = simtime / 1000
        nMsec = simtime % 1000
        self.network.runNetwork(int(nSec), int(nMsec))

    def setupNetwork(self):
        self.network.setupNetwork()

Fig. 12: Snippet showing the exposed CARLsim functions in the pynn.carlsim API.

C. Using the pynn-to-carlsim Interface

To verify the integrity of our implementation, Figure 13 shows the source code of a test application written in PyNN. The source code sets SNN parameters using PyNN. A simple spike generation group with 1 neuron and a second neuron group with 3 excitatory Izhikevich neurons are created. The user can run the code in CPU or GPU mode by specifying the respective parameters on the command line.

    from numpy import arange
    from pyNN.utility import get_simulator
    from pyNN.carlsim import *
    import time

    # Configure the application (i.e., the additional simulator parameters)
    sim, options = get_simulator(
        ("netName", "String for name of simulation"),
        ("--gpuMode", "Enable GPU_MODE (CPU_MODE by default)",
         {"action": "store_true"}),
        ("logMode", "Enter logger mode (USER by default)",
         {"default": "USER"}),
        ("ithGPUs", "Number of GPUs"),
        ("randSeed", "Random seed"))

    logMode = None
    ithGPUs = None
    randSeed = None

    # Validate and assign appropriate options
    netName = options.netName
    if options.gpuMode:
        simMode = sim.GPU_MODE
    else:
        simMode = sim.CPU_MODE
    if options.logMode == "USER":
        logMode = sim.USER
    elif options.logMode == "DEVELOPER":
        logMode = sim.DEVELOPER
    ithGPUs = int(options.ithGPUs)
    if simMode == sim.CPU_MODE and int(ithGPUs) > 0:
        print("Simulation set to CPU_MODE - overriding numGPUs to 0")
        ithGPUs = 0
    randSeed = int(options.randSeed)

    ##################################################################
    # Start of application code
    ##################################################################
    sim.setup(timestep=0.01, min_delay=1.0, netName=netName,
              simMode=simMode, logMode=logMode, ithGPUs=ithGPUs,
              randSeed=randSeed)

    numNeurons = 1

    # Define the neuron groups
    inputCellType = sim.SpikeSourceArray("input", numNeurons,
                                         "EXCITATORY_NEURON", "CUBA")
    spike_source = sim.Population(numNeurons, inputCellType)
    izhikevichCellType = sim.Izhikevich("EXCITATORY_NEURON", a=0.02, b=0.2,
                                        c=-65, d=6,
                                        i_offset=[0.014, 0.0, 0.0])
    neuron_group1 = sim.Population(numNeurons, izhikevichCellType)

    # Connect the neuron groups
    connection = sim.Projection(spike_source, neuron_group1,
                                sim.OneToOneConnector(),
                                receptor_type='excitatory')

    # This function has to be called before any record function is called.
    sim.state.setupNetwork()

    # Start the recording of the groups
    neuron_group1.record('spikes')

    # Run the simulation for 1000 ms
    sim.run(1000)
    sim.end()

Fig. 13: An example application written in PyNN.

It can be seen from the figure that command line arguments have been specified to receive the simulator-specific parameters from the user. These command line parameters can be replaced with custom-defined methods in the pynn.carlsim API of PyNN, or by using configuration files in XML or JSON format. The application code shows the creation of an SNN model by calling the get_simulator method. To use the CARLsim back-end, a user must specify "carlsim" as the simulator for this script while invoking the Python script, as shown in Figure 14. This results in PyNN internally creating an instance of CARLsim. The returned object (denoted as sim in the figure) is then used to access all the internal API methods, offering the user control over the simulation.

Fig. 14: Executing the test application from the command line.

Figure 15 shows the start of the simulation when executing the command in Figure 14. As can be seen, CPU_MODE is set by overriding GPU_MODE, since no argument was provided in the command and the default mode is CPU_MODE. The test runs in the logger mode USER with a random seed of 42, as specified in the command line. From Figure 13, we see that the application is set in the current-based (CUBA) mode, which is also reported during simulation (Figure 15). Timing parameters such as the AMPA decay time and the GABAb decay time are set in simulation as shown in Figure 13.

Fig. 15: Simulation snippet.

D. Generating Output snn.sw.out

At the end of simulation, the proposed pynn-to-carlsim interface generates the following information.
• Spike Data: the exact spike times of all neurons in the SNN model, stored in a 2D spike vector. The first dimension of the vector is the neuron id and the second dimension is spike times. Each element spkVector[i] is thus a vector of all spike times for the ith neuron.
• Weight Data: the synaptic weights of all synapses in the SNN model, stored in a 2D connection vector. The first dimension of the vector is the pre-synaptic neuron id and the second dimension is the post-synaptic neuron id. The element synVector[i, j] is the synaptic weight of the connection (i, j).

The spike and weight data can be used to analyze and adjust the SNN model. They form the output snn.sw.out of our integrated framework PyCARL.

IV. HARDWARE-ORIENTED SIMULATION IN PYCARL

To estimate the performance impact of executing SNNs on a neuromorphic hardware, the standard approach is to map the SNN to the hardware and measure the change in spike timings, which are then analyzed to estimate the performance deviation from software simulations. However, there are three limitations to this approach. First, neuromorphic hardware is currently in its research and development phase in a select few research groups around the world; it is not yet commercially available to the bigger systems community. Second, neuromorphic hardware that is available for research has limitations on the number of synapses per neuron. For instance, DYNAP-SE can only accommodate a maximum of 128 synapses per neuron. These hardware platforms therefore cannot be used to estimate performance impacts on large-scale SNN models. Third, existing hardware platforms have limited interconnect strategies for communicating spikes between neurons, and therefore they cannot be used to explore the design of scalable neuromorphic architectures that minimize latency, a key requirement for executing real-time machine learning applications. To address these limitations, we propose to design a cycle-accurate neuromorphic hardware simulator, which allows the systems community to explore current and future neuromorphic hardware, simulate large SNN models, and estimate the performance impact.

A. Designing a Cycle-Accurate Hardware Simulator

Figure 16(a) shows the architecture of a neuromorphic hardware with multiple crossbars and a shared interconnect. Analogous to the mammalian brain, synapses of an SNN can be grouped into local and global synapses based on the distance over which information (spikes) is conveyed. Local synapses are short-distance links, where pre- and post-synaptic neurons are located in the same vicinity; they map inside a crossbar. Global synapses are those where the pre- and post-synaptic neurons are farther apart. To reduce the power consumption of the neuromorphic hardware, the following strategies are adopted:
• the number of point-to-point local synapses is limited to a reasonable dimension (the size of a crossbar); and
• instead of point-to-point global synapses (which are long-distance) as found in a mammalian brain, the hardware implementation usually consists of a time-multiplexed interconnect shared between global synapses.
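The local/global grouping described above can be sketched as a partition of synapses by crossbar assignment. The contiguous neuron-to-crossbar mapping and the crossbar size used below are illustrative assumptions (128 matches the per-neuron synapse limit of DYNAP-SE mentioned earlier, but real mappings are a design choice).

```python
# Illustrative sketch: classify synapses as local (pre- and post-synaptic
# neurons mapped to the same crossbar) or global (different crossbars),
# assuming a simple contiguous mapping of 128 neurons per crossbar.

CROSSBAR_SIZE = 128  # assumed crossbar dimension

def crossbar_of(neuron_id, crossbar_size=CROSSBAR_SIZE):
    """Crossbar index of a neuron under a contiguous mapping."""
    return neuron_id // crossbar_size

def partition_synapses(synapses, crossbar_size=CROSSBAR_SIZE):
    """Split (pre, post) pairs into local and global synapse lists."""
    local, global_ = [], []
    for pre, post in synapses:
        if crossbar_of(pre, crossbar_size) == crossbar_of(post, crossbar_size):
            local.append((pre, post))    # stays inside one crossbar
        else:
            global_.append((pre, post))  # uses the shared interconnect
    return local, global_

synapses = [(0, 5), (10, 127), (100, 200), (130, 140)]
local, glob = partition_synapses(synapses)
print(len(local), len(glob))
```

Only the global list contends for the time-multiplexed interconnect, which is why a cycle-accurate model of that interconnect is the key ingredient for latency estimation.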


In conclusion, the implementation details of a pilot version of PyNN with CARLsim included as one of the back-end simulators have been detailed in this chapter. The provided implementation serves as a proof-of-concept as well as a platform for developing full-featured support for CARLsim in PyNN.

(a) Neuromorphic architecture: crossbars connected through a shared interconnect. (b) SNN simulation with hardware in the loop (cycle accurate): the PyNN frontend and the CARLsim simulator generate snn.sw.out, while NOXIM-based cycle-accurate models of the interconnect bus and switches, non-volatile memory (NVM), and neuromorphic hardware (TrueNorth, DynapSE, Loihi) generate snn.hw.out.

TABLE I: Performance metrics obtained in executing an SNN on the neuromorphic hardware. These metrics represent the performance impact of executing an SNN on the hardware.

1) Existing NOXIM Metrics: NOXIM has been developed using a modular structure that allows new statistics to be integrated. Since local synapses map within the crossbar, their latency is not affected by the shared interconnect; the following metrics are therefore collected for the global synapse network.
• Latency: the difference between the sending and receiving times of spikes, measured in clock cycles.
• Throughput: the number of total routed spikes per unit time.
• Area consumption: as a traditional interconnect simulator, NOXIM also reports the area consumption of the network.

2) New NOXIM Metrics: We introduce the following two new metrics to assess hardware performance: ISI distortion and spike disorder. Figure 18 shows the statistics collection architecture in PyCARL. To illustrate how ISI distortion and spike disorder impact model performance, we use the example of Figure 17. Information in an SNN is encoded in inter-spike intervals (ISI) [18]. Let $\{t_1, \cdots, t_K\} \subset [0, T]$ be a neuron's firing times. The average ISI of this spike train is given by [18]:

$$I = \frac{\sum_{i=2}^{K}(t_i - t_{i-1})}{K-1} \qquad (1)$$

ISI distortion is the change in ISI introduced when executing the SNN on hardware, where latencies delay spikes between communicating neurons. For SNNs where information is encoded in terms of spike rate, we formulate spike disorder as follows. Let $F_i = \{F_i^1, \cdots, F_i^n\}$ be the expected spike arrivals at neuron $i$ (from the software simulation, snn.sw.out) and $\hat{F}_i = \{\hat{F}_i^1, \cdots, \hat{F}_i^n\}$ be the actual spike arrivals considering hardware latencies (from snn.hw.out). The spike disorder is computed as

$$\text{spike disorder} = \frac{\sum_{j=1}^{n}\big(F_i^j - \hat{F}_i^j\big)^2}{n} \qquad (2)$$
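The two metrics can be checked with a few lines of Python. A minimal sketch of Eqs. (1) and (2); the helper names are illustrative, not part of PyCARL:

```python
# Sketch of the two metrics: average inter-spike interval (Eq. 1) and
# spike disorder (Eq. 2). Pure-Python helpers for illustration.

def average_isi(times):
    """Average ISI of a sorted spike train t_1 <= ... <= t_K (Eq. 1)."""
    K = len(times)
    if K < 2:
        return 0.0
    return sum(times[i] - times[i - 1] for i in range(1, K)) / (K - 1)

def spike_disorder(expected, actual):
    """Mean squared deviation between expected and actual spike arrivals (Eq. 2)."""
    n = len(expected)
    return sum((f - fh) ** 2 for f, fh in zip(expected, actual)) / n

print(average_isi([0, 10, 20, 40]))          # (10 + 10 + 20) / 3
print(spike_disorder([1, 2, 3], [1, 3, 2]))  # (0 + 1 + 1) / 3
```

In a software-only simulation the expected and actual arrival sequences coincide, so the spike disorder is zero, matching the observation reported later for the pynn-carlsim interface.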

CHAPTER 3. PERFORMANCE EVALUATION WITH SPECIFIC METRICS FOR NEUROMORPHIC COMPUTING

(a) Impact of ISI distortion on accuracy. Top sub-figure shows a scenario where an output spike is generated based on the rate spikes received from the three input neurons. Bottom sub-figure shows a scenario where the second spike from neuron 3 is delayed. There are no output spikes generated.

(b) Impact of spike disorder on accuracy. Top sub-figure shows a scenario where spike A is received at the output neuron before spike B, causing the output spike at 21ms. Bottom sub-figure shows a scenario where the order of A & B is reversed. There are no output spikes generated as a result.

Fig. 17: Impact of ISI distortion (a) and spike disorder (b) on the output spike for a simple SNN with three input neurons connected to a single output neuron.

According to the formula, when the number of successful one-to-many communications increases, the fan-out ratio grows simultaneously. If a communication mechanism can support a high fan-out ratio, it is more suitable for integrating SNNs.

3.1.3 Modified existing metric

Although a metric measuring packet (spike) latency is already provided in the original version of Noxim, an additional statistic based on the latency of spikes is required, considering the feature of rate coding. In the modification, a latency is accepted only if it is limited to the length of the rate sampling window for rate coding. The sampling window is adopted to calculate the real-time spiking rate based on the total number of spikes in the window. If the latency of a spike is longer than the window, it is considered a failed communication.

3.1.4 Integration of the additional metrics

Based on the analysis of the Noxim source code hosted on GitHub, the statistic mechanism for simulation is concisely depicted in Figure 3.1.
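The modified latency statistic of Section 3.1.3 (a spike is accepted only if its latency fits within the rate-coding sampling window, and counted as a failed communication otherwise) can be sketched as follows; the function name and return format are illustrative assumptions, not Noxim's actual API:

```python
def classify_spike_latencies(latencies, window):
    """Split spike latencies into accepted ones (within the rate sampling
    window) and failed communications (longer than the window)."""
    accepted = [lat for lat in latencies if lat <= window]
    failed = len(latencies) - len(accepted)
    mean_latency = sum(accepted) / len(accepted) if accepted else 0.0
    return mean_latency, failed

# Two of the five spikes exceed a 20-cycle window and count as failures.
mean_lat, failed = classify_spike_latencies([3, 7, 25, 4, 40], window=20)
print(mean_lat, failed)  # mean of [3, 7, 4]; 2 failed communications
```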

Apart from the functionality tests, we evaluate PyCARL using large SNNs for 4 synthetic and 4 realistic applications. The synthetic applications are indicated with the letter 'S' followed by a number (e.g., S 1000), where the number represents the total number of neurons in the application. The 4 realistic applications are image smoothing (ImgSmooth) [7] on 64x64 images, edge detection (EdgeDet) [7] on 64x64 images using difference-of-Gaussian, multi-layer perceptron (MLP)-based handwritten digit recognition (MLP-MNIST) [19] on 28x28 images of handwritten digits, and CNN-based heart-beat classification (HeartClass) using ECG signals [20]–[22].

V. EVALUATION METHODOLOGY

A. Simulation Environment

We conduct all experiments on a system with 8 CPUs, 32GB RAM, and an NVIDIA Tesla GPU, running Ubuntu 16.04.

TABLE II: Applications used for evaluating PyCARL.

Category            | Applications     | Synapses  | Topology                             | Spikes
functionality tests | testKernel1      | 1         | FeedForward (1, 1)                   | 6
functionality tests | testKernel2      | 101,135   | Recurrent (Random)                   | 96,885
functionality tests | testKernel3      | 100,335   | FeedForward (800, 200)               | 63,035
functionality tests | Izhikevich       | 4         | FeedForward (3, 1)                   | 3
functionality tests | Connections      | 7,200     | Recurrent (Random)                   | 1,439
functionality tests | SmallNetwork     | 200       | FeedForward (20, 20)                 | 47
functionality tests | Varying Poisson  | 50        | FeedForward (1, 50)                  | 700
synthetic           | S 1000           | 240,000   | FeedForward (400, 400, 100)          | 5,948,200
synthetic           | S 1500           | 300,000   | FeedForward (500, 500, 500)          | 7,208,000
synthetic           | S 2000           | 640,000   | FeedForward (800, 400, 800)          | 45,807,200
synthetic           | S 2500           | 1,440,000 | FeedForward (900, 900, 700)          | 66,972,600
realistic           | ImgSmooth [7]    | 136,314   | FeedForward (4096, 1024)             | 17,600
realistic           | EdgeDet [7]      | 272,628   | FeedForward (4096, 1024, 1024, 1024) | 22,780
realistic           | MLP-MNIST [19]   | 79,400    | FeedForward (784, 100, 10)           | 2,395,300
realistic           | HeartClass [20]  | 2,396,521 | CNN (1)                              | 1,036,485
(1) Input(82x82) - [Conv, Pool]*16 - [Conv, Pool]*16 - FC*256 - FC*6

Fig. 18: Statistics collection architecture in PyCARL.

Figure 3.1: Statistic architecture of Noxim.

According to the figure, each computational block, which is also regarded as a node in the network, integrates a local statistic module, i.e. the Stats class. This class encapsulates a vector (a dynamic array in C++) to store communication history, and methods (functions) to compute local statistics based on the history. In the global layer, the NoC instance integrates a similar module, i.e. the GlobalStats class, to compute the overall statistics by traversing all the nodes and invoking the methods in the local statistic module.

Exploration of Segmented Bus Architecture for Neuromorphic Computing
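The per-node Stats and global GlobalStats split described above can be mirrored in a few lines of Python. The class and method names follow the description; the details are illustrative assumptions rather than Noxim's actual C++ API:

```python
class Stats:
    """Local statistics module attached to one node: records communication history."""
    def __init__(self):
        self.history = []              # per-event spike latencies
    def record(self, latency):
        self.history.append(latency)
    def mean_latency(self):
        return sum(self.history) / len(self.history) if self.history else 0.0

class Node:
    """A computational block in the network, carrying its own Stats instance."""
    def __init__(self):
        self.stats = Stats()

class GlobalStats:
    """Global module: traverses all nodes and aggregates their local statistics."""
    def __init__(self, nodes):
        self.nodes = nodes
    def mean_latency(self):
        events = [lat for node in self.nodes for lat in node.stats.history]
        return sum(events) / len(events) if events else 0.0

nodes = [Node() for _ in range(3)]
nodes[0].stats.record(4)
nodes[1].stats.record(6)
nodes[1].stats.record(8)
print(GlobalStats(nodes).mean_latency())  # (4 + 6 + 8) / 3 = 6.0
```

The same traversal pattern extends to the new metrics: each node records its own ISI and disorder samples, and the global module reduces them.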

B. Evaluated Applications

Table II reports the applications that we used to evaluate PyCARL. The application set consists of 7 functionality tests from the CARLsim and PyNN repositories. The CARLsim functionality tests are testKernel{1,2,3}. The PyNN functionality tests are Izhikevich, Connections, SmallNetwork, and Varying Poisson. These functionality tests verify the biological properties of neurons and synapses. Columns 2, 3 and 4 in the table report the number of synapses, the SNN topology and the number of spikes simulated by these functionality tests.

VI. RESULTS AND DISCUSSION

A. Evaluating pynn-carlsim Interface of PyCARL

We evaluate the proposed pynn-to-carlsim interface in PyCARL using the following two performance metrics.
• Memory usage: This is the amount of main memory (DDRx) occupied by each application when simulated using PyCARL (Python). Main memory usage is reported in terms of the resident set size (in kB). Results are normalized to the native CARLsim simulation (in C++).
• Simulation time: This is the time consumed to simulate each application using PyCARL. Execution time is measured as CPU time (in ms). Results are normalized to the native CARLsim simulation.

1) Memory Usage: Figure 19 plots the memory usage of each application using PyCARL normalized to CARLsim. For easy reference, the absolute memory usage of CARLsim is also reported on the bar for each application. We make the following three main observations. First, the memory usage is application-dependent. The memory usage of testKernel1 with a single synapse is 6.9MB, compared to the memory usage of 151.4MB for Synth 2500 with 1,440,000 synapses. Second, the memory usage of PyCARL is on average 3.8x higher than CARLsim. This is because 1) the pynn-carlsim interface loads all shared CARLsim libraries in the main memory during initialization, irrespective of whether or not they are utilized during simulation, and 2) some of CARLsim's dynamic data structures are re-created during SWIG compilation, as SWIG cannot access these native data structures in the main memory. Our future work involves solving both these limitations to reduce the memory footprint of PyCARL. Third, smaller SNNs result in higher memory overhead. This is because for smaller SNNs, the memory allocation for CARLsim libraries becomes the primary contributor of the memory overhead in PyCARL. CARLsim, on the other hand, loads only the libraries that are needed for the SNN simulation.

Fig. 19: Memory usage of PyCARL normalized to CARLsim (out of scale results are reported on the bar).

2) Simulation Time: Figure 20 plots the simulation time of each of our applications using PyCARL, normalized to CARLsim. For easy reference, the absolute simulation time of CARLsim is also reported on the bar for each application. We make the following two main observations. First, the simulation time using PyCARL is on average 4.7x higher than CARLsim. The high simulation time of PyCARL is contributed by two components: 1) initialization time, which includes the time to load all shared libraries, and 2) the time for simulating the SNN. We observe that the simulation time of the SNN is comparable between PyCARL and CARLsim. The difference is in the initialization time of PyCARL, which is higher than CARLsim. Second, the overhead for smaller SNNs (i.e., ones with fewer spikes) is much higher, because the initialization time dominates the overall simulation time for these SNNs. Therefore, PyCARL, which has a higher initialization time, has a higher simulation time than CARLsim.

Fig. 20: Simulation time of PyCARL normalized to CARLsim (out of scale results are reported on the bar).

To analyze the simulation time, Figure 21 plots the distribution of total simulation time into initialization time and SNN simulation time. For testKernel1 with only 6 spikes (see Table II), the initialization time is over 99% of the total simulation time. Since the initialization time is considerably higher in PyCARL, the overall simulation time is 17.1x that of CARLsim (see Figure 20). On the other hand, for a large SNN like Synth 2500, the initialization time is only 8% of the total simulation time. For this application, PyCARL's simulation time is only 4% higher than CARLsim.

Fig. 21: Total simulation time distributed into initialization time and SNN simulation time.

We conclude that the total simulation time using PyCARL is only marginally higher than native CARLsim for larger SNNs, which are typical in most machine learning models. This is an important requirement to enable fast design space exploration early in the model development stage. Hardware-aware circuit-level simulators are much slower and have large memory footprints. Finally, other PyNN-based SNN simulators do not have the hardware information, so they can only provide functional checking of machine learning models.

B. Evaluating carlsim-hardware Interface of PyCARL

1) Hardware Configurations Supported in PyCARL: Table III reports the supported spike routing algorithms in PyCARL.

TABLE III: Routing algorithms supported in PyCARL.

Algorithms | Description
XY         | Packets first go horizontally and then vertically to reach destinations.
West First | West direction should be taken first if needed in the proposed route to destination.
North Last | North direction should be taken last if needed in the proposed route to destination.
Odd Even   | Turning from the east at tiles located in even columns and turning to the west at tiles in odd columns are prohibited.
DyAD       | XY routing when there is no congestion, and Odd Even routing when there is congestion.
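The resident-set-size and CPU-time metrics used above can be collected from inside a Python process with only the standard library. A minimal sketch; note that ru_maxrss is reported in kB on Linux but in bytes on macOS, and the helper name is illustrative:

```python
import resource
import time

def measure(fn, *args):
    """Run fn and report (CPU time in ms, peak resident set size: kB on Linux)."""
    t0 = time.process_time()
    fn(*args)
    cpu_ms = (time.process_time() - t0) * 1000.0
    rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return cpu_ms, rss

# Measure a stand-in workload; a real run would wrap the SNN simulation call.
cpu_ms, rss = measure(lambda n: sum(i * i for i in range(n)), 100_000)
print(cpu_ms, rss)
```

Normalizing two such measurements (PyCARL run versus native CARLsim run) yields the ratios reported in Figures 19 and 20.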

3.3 Simulation Results and Analysis

In this section, an 18→18 fully-connected two-layer network is built to run the simulations with different routing algorithms to evaluate their performance based on the added metrics. It is mapped onto a 6×6 mesh. After the results are collected, an analysis is conducted. The traffic of the simulations is generated randomly by CarlSim and has a relatively low throughput; in this case, the communication network is not highly stressed.

3.3.1 Simulation results

Five routing algorithms, i.e. XY, West First, North Last, Odd Even, and DyAD, are adopted to run five simulations to benchmark the added metrics. A brief description is provided in Table 3.1. These five are the integrated algorithms of Noxim.

Table 3.1: Brief descriptions of adopted routing algorithms.

Algorithm  | Description
XY         | Packets first go horizontally and then vertically to reach destinations [36].
West First | Turning into the west is not allowed. Western direction should be the first route [37].
North Last | Turning away from the north is not possible. Northern direction should be the last route [37].
Odd Even   | Turning from the east at tiles located in even columns and turning to the west at tiles in odd columns are prohibited [38].
DyAD       | XY routing when there is no congestion, and Odd Even routing when there is congestion [38].

The following part of this section is arranged in this order: distribution figures of the five algorithms are listed first, followed by the table of computed values. Each figure or table is assigned a corresponding title.

To illustrate the statistics collection, we use a fully connected synthetic SNN with two feedforward layers of 18 neurons each. The SNN is mapped to a hardware with 36 crossbars arranged in a 6x6 mesh topology. Figure 22 shows a typical distribution of spike latency and ISI distortion (in clock cycles) collected when configuring the global synapse network with XY routing.

(a) Latency distribution.
Figure 3.4: Distribution of spike latency (a) and ISI distortion (b) in XY simulation.

TABLE IV: Evaluating routing algorithms in PyCARL.

Algorithms | Avg. ISI (cycles) | Disorder Count (cycles) | Avg. Latency (cycles) | Avg. Throughput (spikes/cycle)
XY         | 48 | 203 | 26.75 | 0.191
West First | 44 | 198 | 26.76 | 0.191
North Last | 43 | 185 | 26.77 | 0.191
Odd Even   | 44 | 176 | 26.77 | 0.191
DyAD       | 44 | 186 | 26.78 | 0.191


Fig. 23: Accuracy of MLP MNIST on five neuromorphic hardware configurations (2x2-(256), 3x3-(128), 4x4-(64), 5x5-(36), 6x6-(25)) compared to the accuracy obtained via software-only simulation using the pynn-carlsim interface.

(b) ISI distribution.
Fig. 22: (a) Latency and (b) ISI distortion for XY routing.

Table IV reports the statistics collected for different routing algorithms for the global synapse network of the neuromorphic hardware. PyCARL facilitates system design exploration in the following two ways. First, system designers can explore these statistics and set a network configuration to achieve the desired optimization objective. In our prior work [23], we have developed a segmented bus interconnect for neuromorphic hardware using PyCARL. Second, system designers can analyze these statistics for a given hardware to estimate the performance of SNNs on hardware. In our prior work [24]–[27], we have analyzed such performance deviation using PyCARL.

2) Performance Impact on Hardware: To illustrate how the performance of a machine learning application changes on hardware, Figure 23 shows the accuracy of MLP MNIST obtained on five hardware configurations programmed in PyCARL. We also report the accuracy of MLP MNIST obtained using software-only simulation with the proposed pynn-carlsim interface. The hardware configuration n × n (m) is for a neuromorphic hardware with n² crossbars, arranged using an n × n mesh network. Each crossbar can accommodate m input and m output neurons, with a maximum of m pre-synaptic connections per output neuron. We observe that compared to an accuracy of 89% obtained using the pynn-carlsim interface, the best-case accuracy on a 6 × 6 hardware (36 crossbars with 25 input and 25 output neurons per crossbar) is only 66.6% – a loss of 22.4%. This loss is due to hardware latencies, which delay some spikes more than others, and are not accounted for when performing accuracy estimation through software-only simulations. The proposed carlsim-hardware interface in PyCARL facilitates estimating metrics like the accuracy impact of machine learning applications on neuromorphic hardware. In general, any performance metric could be estimated this way.

3) SNN Performance on DYNAP-SE: Figure 24 evaluates the statistics collection feature of PyCARL on DYNAP-SE [13], a state-of-the-art neuromorphic hardware, to estimate the performance impact between software-only simulation (using the pynn-carlsim interface) and hardware-oriented simulation (using the carlsim-hardware interface) for each application.

Fig. 24: ISI distortion and disorder of hardware-oriented simulation, normalized to software-only simulation.

We observe that these applications have an average ISI distortion of 3.375 cycles and disorder of 6.5 cycles when executed on the specific neuromorphic hardware. In the software-only simulation (using the pynn-carlsim interface), ISI distortion and disorder count are both zero. These design metrics directly influence performance, as illustrated in Figure 17.

4) Design Space Exploration using PyCARL: We now demonstrate how the statistics collection feature of PyCARL can be used to perform design space explorations optimizing hardware metrics such as latency and energy. We demonstrate PyCARL for DYNAP-SE using an instance of particle swarm optimization (PSO) [28] to distribute the synapses in order to minimize latency and energy. The mapping technique is adapted from our earlier published work [25]. Although optimizing SNN mapping to the hardware is not the main focus of this paper, the following results illustrate the capability of PyCARL to perform such optimization.

Figure 25 plots the energy and latency of each application obtained using PyCARL, normalized to PyNN, which balances the synapses on different crossbars of the hardware. We observe that PyCARL achieves an average 50% lower energy and 24% lower latency than PyNN's native load balancing strategy. These improvements clearly motivate the significance of PyCARL in advancing neuromorphic computing.

Fig. 25: Energy and latency of PyCARL normalized to PyNN.

VII. CONCLUSIONS

We present PyCARL, a Python programming interface that allows CARLsim-based spiking neural network simulations with neurobiological details at the neuron and synapse levels, hardware-oriented simulations, and design-space explorations for neuromorphic computing, all from a common PyNN frontend. PyCARL allows extensive portability across different research institutes. We evaluate PyCARL using functionality tests as well as synthetic and realistic SNN applications on a state-of-the-art neuromorphic hardware. By using cycle-accurate models of neuromorphic hardware, PyCARL allows users to perform neuromorphic hardware and machine learning model explorations and performance estimation early during application development, accelerating the neuromorphic product development cycle. We conclude that PyCARL is a comprehensive framework that has significant potential to advance the field of neuromorphic computing.

PyCARL is available for download at [29].

ACKNOWLEDGMENT

This work is supported by 1) the National Science Foundation Award CCF-1937419 (RTML: Small: Design of System Software to Facilitate Real-Time Neuromorphic Computing) and 2) the National Science Foundation Faculty Early Career Development Award CCF-1942697 (CAREER: Facilitating Dependable Neuromorphic Computing: Vision, Architecture, and Impact on Programmability).

REFERENCES

[1] W. Maass, "Networks of spiking neurons: The third generation of neural network models," Neural Networks, 1997.
[2] M. L. Hines and N. T. Carnevale, "The NEURON simulation environment," Neural Computation, 1997.
[3] M.-O. Gewaltig and M. Diesmann, "NEST (neural simulation tool)," Scholarpedia, 2007.
[4] D. Pecevski, T. Natschläger, and K. Schuch, "PCSIM: A parallel simulation environment for neural circuits fully integrated with Python," Frontiers in Neuroinformatics, 2009.
[5] D. F. Goodman and R. Brette, "Brian: A simulator for spiking neural networks in Python," Frontiers in Neuroinformatics, 2008.
[6] B. Linares-Barranco, "The Modular Event-Driven Growing Asynchronous Simulator (MegaSim)," https://bitbucket.org/bernabelinares/megasim, 2020.
[7] T.-S. Chou, H. J. Kashyap, J. Xing, S. Listopad, E. L. Rounds, M. Beyeler, N. Dutt, and J. L. Krichmar, "CARLsim 4: An open source library for large scale, biologically detailed spiking neural network simulation using heterogeneous clusters," in IJCNN, 2018.
[8] A. P. Davison, D. Brüderle, J. M. Eppler, J. Kremkow, E. Muller, D. Pecevski, L. Perrinet, and P. Yger, "PyNN: A common interface for neuronal network simulators," Frontiers in Neuroinformatics, 2009.
[9] J. M. Eppler, M. Helias, E. Muller, M. Diesmann, and M.-O. Gewaltig, "PyNEST: A convenient interface to the NEST simulator," Frontiers in Neuroinformatics, 2009.
[10] S. Marcel and B. Romain, "Brian 2, an intuitive and efficient neural simulator," eLife, 2019.
[11] C. Mead, "Neuromorphic electronic systems," Proc. of the IEEE, 1990.
[12] S. B. Furber, F. Galluppi, S. Temple, and L. A. Plana, "The SpiNNaker project," Proc. of the IEEE, 2014.
[13] S. Moradi, N. Qiao, F. Stefanini, and G. Indiveri, "A scalable multicore architecture with heterogeneous memory structures for dynamic neuromorphic asynchronous processors (DYNAPs)," TBCAS, 2018.
[14] M. V. DeBole, B. Taba, A. Amir, F. Akopyan, A. Andreopoulos, W. P. Risk, J. Kusnitz, C. O. Otero et al., "TrueNorth: Accelerating from zero to 64 million neurons in 10 years," Computer, 2019.
[15] M. Davies, N. Srinivasa, T.-H. Lin, G. Chinya, Y. Cao, S. H. Choday, G. Dimou, P. Joshi, N. Imam, S. Jain et al., "Loihi: A neuromorphic manycore processor with on-chip learning," IEEE Micro, 2018.
[16] E. M. Izhikevich, "Simple model of spiking neurons," TNNLS, 2003.
[17] V. Catania, A. Mineo, S. Monteleone et al., "Noxim: An open, extensible and cycle-accurate network on chip simulator," in ASAP, 2015.
[18] S. Grün and S. Rotter, Analysis of Parallel Spike Trains, 2010.
[19] P. U. Diehl and M. Cook, "Unsupervised learning of digit recognition using spike-timing-dependent plasticity," Frontiers in Computational Neuroscience, 2015.
[20] A. Balaji, F. Corradi, A. Das et al., "Power-accuracy trade-offs for heartbeat classification on neural networks hardware," JOLPE, 2018.
[21] A. Das, P. Pradhapan, W. Groenendaal, P. Adiraju, R. T. Rajan, F. Catthoor, S. Schaafsma, J. L. Krichmar, N. Dutt, and C. Van Hoof, "Unsupervised heart-rate estimation in wearables with liquid states and a probabilistic readout," Neural Networks, 2018.
[22] A. K. Das, F. Catthoor, and S. Schaafsma, "Heartbeat classification in wearables using multi-layer perceptron and time-frequency joint distribution of ECG," in CHASE, 2018.
[23] A. Balaji, Y. Wu et al., "Exploration of segmented bus as scalable global interconnect for neuromorphic computing," in GLSVLSI, 2019.
[24] A. Das, Y. Wu, K. Huynh, F. Dell'Anna et al., "Mapping of local and global synapses on spiking neuromorphic hardware," in DATE, 2018.
[25] A. Balaji, A. Das, Y. Wu, K. Huynh, F. G. Dell'Anna, G. Indiveri, J. L. Krichmar, N. D. Dutt, S. Schaafsma, and F. Catthoor, "Mapping spiking neural networks to neuromorphic hardware," TVLSI, 2019.
[26] S. Song, A. Balaji, A. Das, N. Kandasamy et al., "Compiling spiking neural networks to neuromorphic hardware," in LCTES, 2020.
[27] A. Balaji, S. Song, A. Das, N. Dutt, J. Krichmar, N. Kandasamy, and F. Catthoor, "A framework to explore workload-specific performance and lifetime trade-offs in neuromorphic computing," CAL, 2019.
[28] J. Kennedy et al., "Particle swarm optimization," in ICNN, 1995.
[29] PyCARL: A PyNN Interface for Hardware-Software Co-Simulation of Spiking Neural Network. https://github.com/drexel-DISCO/PyCARL.