Modeling Spiking Neural Networks on SpiNNaker

Novel Architectures
Editors: Volodymyr Kindratenko and Pedro Trancoso

By Xin Jin, Mikel Luján, Luis A. Plana, Sergio Davies, Steve Temple, and Steve B. Furber

Computing in Science & Engineering, September/October 2010. Copublished by the IEEE CS and the AIP. 1521-9615/10/$26.00 © 2010 IEEE.

SpiNNaker is a massively parallel architecture with more than a million processing cores that can model up to 1 billion spiking neurons in biological real time.

Despite an increasing amount of experimental data and deeper scientific understanding, deciphering the inner workings of biological brains remains a grand challenge. Investigations into the human brain's microscopic structure have shown that neuron cells are the key components in the cortex. Each individual neuron is physically very much like other cells in our body, but it's different in that it interacts with other neurons by receiving or sending electrical pulses, or spikes.

Researchers have proposed several mathematical models to describe the biological process of neurons firing spikes. These vary in their computational complexity as well as their fidelity, while maintaining biological plausibility. The spike is a common first class of abstraction among these various mathematical models. Spike events are communicated to all connected neurons, with typical fan-outs on the order of 10^3. Computational modeling of spiking neurons has abundant parallelism and no explicit requirement for cache-coherent shared memory. Thus, researchers can use large supercomputing systems and high-performance computing (HPC) clusters for this kind of simulation [1]. However, spike communication stresses standard HPC clusters and networks, making them unsuitable for real-time simulation.

In SpiNNaker, our treatment of spikes is a key innovation implemented with application-specific hardware: a multicast, packet-switched, and self-timed communication fabric with on-chip routers. To maintain flexibility and generality, the neuronal models run in software on embedded ARM968 processors. These neuronal models communicate by means of spike packets directly supported by the SpiNNaker architecture.

We taped out the SpiNNaker test chips in 2009, with the batch arriving in Manchester in December. As Figure 1 shows, these test chips are fully functional SpiNNaker chips but have a highly reduced core count: only two cores per chip. Here, we offer an overview of our research project and describe the first experiments with these test chips running spiking neurons based on Eugene Izhikevich's model [2]. Note that we're not targeting artificial neural networks (such as perceptrons or multilayer networks) that were inspired by, but don't model, biologically plausible neural systems.

Figure 1. SpiNNaker test chip. (a) The test-board section showing a SpiNNaker chip and SDRAM. (b) A chip plot highlighting individual components: communication NoC, router, tightly coupled memory, ARM968 core, system NoC, boot ROM, SDRAM (memory), system RAM, Ethernet controller, clock generator, and phase-locked loop (PLL).

SpiNNaker Overview

SpiNNaker is a multicore-based architecture built for a specific purpose. The smallest configuration is a single SpiNNaker chip, which can simulate up to 20,000 neurons [3]. In this small form, the low-power properties of the ARM cores and asynchronous interconnect make SpiNNaker a feasible option for embedded control systems such as those in robots. This is a clear advantage over large HPC systems.

Compared to dedicated hardware solutions based on field-programmable gate arrays, analog circuits, or hybrid analog-digital VLSI [4], SpiNNaker offers flexibility in choosing the neuronal dynamics, models, and learning rules. These are important features when we consider the experimental nature of state-of-the-art neural modeling.

The SpiNNaker chips are connected using a 2D toroidal triangular mesh based upon a lightweight packet-switched fabric [5]. Packets represent neural spikes and travel seamlessly through the fabric, which comprises a network-on-chip (NoC) and an interchip connection network and connects more than 65,000 (2^16) nodes housed in the largest SpiNNaker configuration. To emulate biological systems' very high connectivity, the on-chip routers provide multicast routing.

The SpiNNaker architecture is based on three guiding principles:

• Virtualized topology. The neural model's physical organization is decoupled from the target system's physical organization. So, in principle, any neuron can be modeled on any processing core in the system. Although the machine employs a 2D topology, it can model a 3D (or higher) neural structure. This is possible because electronic communication speeds are much higher than biological communication speeds.
• Bounded asynchrony. Time models itself. Because the system operates in real time (possibly scaled, although we assume ×1 scaling), there's no requirement for explicit synchronization in the computation. Things happen when they happen. Although this leads to non-deterministic behavior, the biological system being modeled shares this property.
• Energy frugality. Processors are free; the real cost of computing is energy. This is why we use embedded processors, and why the synchronous dynamic RAM (SDRAM) is of the mobile double data rate (DDR) variety; in both cases, we sacrifice some performance for greatly enhanced power efficiency.

As Figure 2 shows, each SpiNNaker chip has 18 identical ARM9 processing cores running at 200 MHz. As Figure 3 shows, the core is directly connected to two memory blocks: instruction tightly coupled memory (ITCM) and data tightly coupled memory (DTCM). The TCMs are small (32 and 64 Kbytes, respectively) but run extremely fast at processor clock speed; this is ideal for storing frequently accessed instructions or data.

One of the cores on each chip is selected to perform system management tasks. The other processing cores run independent event-driven neural processes, each simulating a group of neurons. The events are generated by peripherals, such as the direct memory access (DMA) controller. Cores communicate with other cores and chips through the communication network. Access to all other on-chip resources on the system NoC is through the DMA controller, which is mainly used for reading (when a packet arrives) or writing (when updating synaptic information during learning) the neural state stored in the external SDRAM.

As Figure 2 shows, there's one router on each chip capable of on- and off-chip communication. It has 18 ports for internal use of the ARM cores and six ports to communicate with six adjacent chips. All ports are full duplex and implement self-timed protocols. The self-timed protocols make the SpiNNaker chip a globally asynchronous, locally synchronous design: individual processing cores on the chip act as (clocked) synchronous "islands" surrounded by a "sea" of asynchronous connectivity. This not only facilitates the VLSI design process, but also isolates faulty cores and provides timing tolerance.

The router's internal organization is hierarchical; ports are merged in three stages before using the actual routing engine. The router is designed to support point-to-point and multicast communications. The multicast engine helps reduce pressure at the injection ports, and, compared to a pure point-to-point alternative, it significantly reduces the number of packets that traverse the communication fabric.

Packet routing must be done in an innovative way. Because neurons send spikes to thousands of other neurons, it's impractical to list all destinations in every packet. Therefore, routers make routing decisions based on the packet's source address (the identifier of the neuron that fired the spike). The network itself will deliver the packets to all chips containing neurons that have synaptic connections with the source neuron. These connections are embedded in the 1,024-word routing tables inside the routers, and must be preloaded using application-specific information. To minimize the space pressure on the routing tables, these offer a masked associative route look-up.

SpiNNaker chips are arranged in a 2D triangular torus topology with links to the neighbors in the north, south, east, west, southwest, and northeast. The routers perform a default routing that sends the packet following a straight line, a process that avoids using extra entries in the routing tables. For example, if the packet comes from the north, it will be sent to the south. The topology allows two-hop routes to go from a chip to each one of its neighbors. These two-hop paths between neighbor chips are known as emergency routes, and the router can invoke them automatically to bypass problematic links due to transient congestion states or link failures.

Figure 2. [Caption truncated in the source. Block-diagram labels: packet router with 2of7-encoded input and output links, packet decode, routing engine, output select, and router control; communications NoC (input and output); processors Proc0 to ProcN, each with a communications controller and an AXI master; system NoC; system RAM, system ROM, watchdog, system controller, Ethernet, clock phase-locked loop (PLL), and a PL340 SDRAM interface to a 1-Gbit DDR SDRAM.]
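The source-keyed multicast look-up and the straight-line default route described above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the router's actual hardware design: the class and function names, the link labels, the first-match-wins policy, and the key layout (upper bits selecting a group of neurons, lower bits the neuron within it) are all hypothetical choices made for illustration. Only the 1,024-entry table size, the six link directions, and the masked match on the source key come from the text.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple

# The six inter-chip directions named in the article, with the link a packet
# leaves on if it keeps travelling in a straight line.
OPPOSITE = {"N": "S", "S": "N", "E": "W", "W": "E", "NE": "SW", "SW": "NE"}


@dataclass(frozen=True)
class RouteEntry:
    key: int                    # expected value of the masked source key
    mask: int                   # bits of the source key that must match
    links: FrozenSet[str]       # inter-chip links to copy the packet to
    cores: FrozenSet[int]       # local cores to copy the packet to


class Router:
    MAX_ENTRIES = 1024          # table size stated in the article

    def __init__(self) -> None:
        self.table: List[RouteEntry] = []

    def add_entry(self, key: int, mask: int, links=(), cores=()) -> None:
        if len(self.table) >= self.MAX_ENTRIES:
            raise OverflowError("routing table is full")
        self.table.append(
            RouteEntry(key & mask, mask, frozenset(links), frozenset(cores))
        )

    def route(
        self, source_key: int, arrived_from: Optional[str] = None
    ) -> Tuple[FrozenSet[str], FrozenSet[int]]:
        """Return (outgoing links, local cores) for a spike packet."""
        # Masked associative look-up: one entry can cover many source neurons.
        # Assumption: the first matching entry wins.
        for entry in self.table:
            if source_key & entry.mask == entry.key:
                return entry.links, entry.cores
        # Default routing: no table entry, so continue in a straight line.
        if arrived_from is not None:
            return frozenset({OPPOSITE[arrived_from]}), frozenset()
        # Assumption: an unmatched, locally injected packet goes nowhere.
        return frozenset(), frozenset()


router = Router()
# One entry covers all 2,048 source keys that share the upper 21 bits.
router.add_entry(key=0x00012000, mask=0xFFFFF800, links={"E", "NE"}, cores={3, 7})

print(router.route(0x00012345, arrived_from="W"))  # matched: multicast copy
print(router.route(0x00099000, arrived_from="N"))  # unmatched: default route
```

Because the mask lets one entry stand for a whole block of source neurons, and unmatched packets simply continue straight on, a chip only needs table entries where a spike's path branches or turns. Emergency routing around failed or congested links is not modeled here.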
