Larger GPU-Accelerated Brain Simulations with Procedural Connectivity
bioRxiv preprint doi: https://doi.org/10.1101/2020.04.27.063693; this version posted April 28, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

James C Knight^a,1 and Thomas Nowotny^a

^a Centre for Computational Neuroscience and Robotics, School of Engineering and Informatics, University of Sussex, Brighton, United Kingdom

This manuscript was compiled on April 27, 2020

Large-scale simulations of spiking neural network models are an important tool for improving our understanding of the dynamics and ultimately the function of brains. However, even small mammals such as mice have on the order of 1 × 10^12 synaptic connections which, in simulations, are each typically characterized by at least one floating-point value. This amounts to several terabytes of data – an unrealistic memory requirement for a single desktop machine. Large models are therefore typically simulated on distributed supercomputers, which is costly and limits large-scale modelling to a few privileged research groups. In this work, we describe extensions to GeNN – our Graphical Processing Unit (GPU) accelerated spiking neural network simulator – that enable it to 'procedurally' generate connectivity and synaptic weights 'on the go' as spikes are triggered, instead of storing and retrieving them from memory. We find that GPUs are well-suited to this approach because of their raw computational power which, due to memory bandwidth limitations, is often under-utilised when simulating spiking neural networks. We demonstrate the value of our approach with a recent model of the Macaque visual cortex consisting of 4.13 × 10^6 neurons and 24.2 × 10^9 synapses. Using our new method, it can be simulated on a single GPU – a significant step forward in making large-scale brain modelling accessible to many more researchers. Our results match those obtained on a supercomputer and the simulation runs up to 35 % faster on a single high-end GPU than previously on over 1000 supercomputer nodes.

spiking neural networks | GPU | high-performance computing | brain simulation

The brain of a mouse has around 70 × 10^6 neurons, but this number is dwarfed by the 1 × 10^12 synapses which connect them (1). In computer simulations of spiking neural networks, propagating spikes involves adding the synaptic input from each spiking presynaptic neuron to the postsynaptic neurons. The information describing which neurons are synaptically connected and with what weight is typically generated before a simulation is run and stored in large arrays. For large-scale brain models this creates high memory requirements, so that they can typically only be simulated on large distributed computer systems using software such as NEST (2) or NEURON (3). By careful design, these simulators can keep the memory requirements for each node constant, even when a simulation is distributed across thousands of nodes (4). However, high-performance computing (HPC) systems are bulky, expensive and consume a lot of power, and they are hence typically shared resources, only accessible to a limited number of researchers and for time-limited investigations.

Neuromorphic systems (5–9) take inspiration from the brain and have been developed specifically for simulating large spiking neural networks more efficiently. One particularly relevant feature of the brain is that its memory elements – the synapses – are co-located with the computing elements – the neurons. In neuromorphic systems, this often translates to dedicating a large proportion of each chip to memory. However, while such on-chip memory is fast, it can only be fabricated at relatively low density, so that many of these systems economize – either by reducing the maximum number of synapses per neuron to as few as 256 or by reducing the precision of the synaptic weights to 6 (9), 4 (5) or even 1 bit (7). This allows some classes of spiking neural networks to be simulated very efficiently, but reducing the degree of connectivity to fit within the constraints of current neuromorphic systems inevitably changes the dynamics of brain simulations (10).

Unlike most other neuromorphic systems, the SpiNNaker (6) neuromorphic supercomputer is fully programmable and combines large on-chip and external memories, distributed across the system, which enables real-time simulation of large-scale models (11). This is promising for the future but, due to its prototype nature, the availability of SpiNNaker hardware is limited and even moderately-sized simulations still require a physically large system (9 boards for a model with around 10 × 10^3 neurons and 300 × 10^6 synapses (11)).

Modern GPUs have relatively little on-chip memory and, instead, dedicate the majority of their silicon area to arithmetic logic units. GPUs use dedicated hardware to rapidly switch between tasks so that the latency of accessing external memory can be 'hidden' behind computation, as long as there is sufficient computation to be performed. For example, the memory latency of a typical modern GPU can be completely hidden if each CUDA core performs approximately 10 arithmetic operations per byte of data accessed from memory. Unfortunately, propagating a spike in a spiking neural network simulation is likely to require accessing around 8 B of memory but to perform many fewer than the required 80 instructions. This makes spike propagation highly memory bound. Nonetheless, we have shown in previous work (12) that, as GPUs have significantly higher total memory bandwidth than even the fastest CPU, moderately sized models of around 10 × 10^3 neurons and 1 × 10^9 synapses can be simulated on a single GPU with competitive speed and energy consumption. However, individual GPUs do not have enough memory to simulate larger brain models and, although small numbers of GPUs can be connected using the high-speed NVLink (13) interconnect, larger GPU clusters suffer from the same communication overheads as any other distributed HPC system.

In this work, we present a novel approach that uses the large amount of computational power available on a GPU to reduce both memory and memory bandwidth requirements and enable large-scale brain simulations on a single GPU workstation.

We implemented procedural connectivity in GeNN by repurposing our previously developed parallel initialisation methods. Instead of running them once for all synapses at the beginning of the simulation, we rerun the methods during the simulation to regenerate the outgoing synapses of each neuron that fires a spike and immediately use the identified connections and weights to run the post-synaptic code which calculates the effect of the spike on other neurons. This is possible because the outgoing synaptic connections of each neuron are typically largely independent from those of other neurons, as we shall see from typical examples below.

In the absence of knowledge of the exact microscopic connectivity in the brain, there are a number of typical connectivity schemes that are used in brain models. We will now discuss two typical examples and how they can be implemented efficiently on a GPU. One very common connectivity scheme is the 'fixed probability connector', for which each neuron in the presynaptic population is connected to each neuron in

Significance Statement

Simulations are an important tool for investigating how brains work. However, in order to faithfully reproduce some of the features found in biological brains, large models are required. Simulating such models has, until now, required so much memory that it could only be done on large, expensive supercomputers. In this work, we present a new method for simulating large models that significantly reduces memory requirements. This method is particularly well-suited for use on Graphical Processing Units (GPUs), which are a common fixture in many workstations. We demonstrate that using our new method we can not only simulate a very large brain model on a single GPU, but also do so up to 35 % faster than in previous supercomputer simulations.

J.K. and T.N. wrote the paper. T.N. is the original developer of GeNN. J.K. is currently the primary GeNN developer and was responsible for extending the code generation approach to the procedural simulation of synaptic connectivity. J.K. performed the experiments and the analysis of the results that are presented in this work.

The authors declare no conflict of interest.

^1 To whom correspondence should be addressed. E-mail: [email protected]
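The latency-hiding argument reduces to a short back-of-envelope calculation: at roughly 10 arithmetic operations per byte, the roughly 8 B fetched per synaptic event would have to be matched by about 80 instructions to keep the GPU busy. A minimal sketch of this arithmetic, in which the instruction count actually performed per event is an assumed round number for illustration rather than a measured value:

```python
# Figures quoted in the text: a typical modern GPU hides memory latency at
# ~10 arithmetic operations per byte fetched, and one synaptic event
# (propagating one spike through one synapse) touches ~8 B of memory.
OPS_PER_BYTE_TO_HIDE_LATENCY = 10
BYTES_PER_SYNAPTIC_EVENT = 8

# Instructions a synaptic event would need in order to hide its own fetch.
required_ops = OPS_PER_BYTE_TO_HIDE_LATENCY * BYTES_PER_SYNAPTIC_EVENT
print(required_ops)  # 80

# A synaptic event is little more than a fused multiply-add plus some index
# arithmetic -- assume ~10 instructions (illustrative, not measured).
actual_ops = 10
print(f"arithmetic utilisation when memory bound: {actual_ops / required_ops:.0%}")
```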
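The procedural mechanism described above – regenerating a neuron's outgoing synapses from scratch whenever it spikes – works because generation is deterministic: seeding a random number generator from the presynaptic neuron's index recreates exactly the same connections and weights on every spike, so nothing per-synapse ever needs to be stored. The following is a minimal Python sketch of this principle, not GeNN's generated CUDA code; the function names, base seed, connection probability and Gaussian weight distribution are illustrative assumptions:

```python
import random

def outgoing_synapses(pre_idx, num_post, prob, base_seed=1234):
    """Procedurally (re)generate the outgoing synapses of one neuron.

    Seeding a private RNG from the presynaptic index makes the draw
    deterministic, so exactly the same connections and weights are
    recreated every time this neuron spikes.
    """
    rng = random.Random(base_seed + pre_idx)
    row = []
    for post_idx in range(num_post):        # fixed-probability connector
        if rng.random() < prob:
            weight = rng.gauss(0.5, 0.1)    # weight is regenerated, not stored
            row.append((post_idx, weight))
    return row

def propagate_spike(pre_idx, post_input, prob=0.1):
    """On a spike, regenerate the outgoing row and apply it immediately."""
    for post_idx, weight in outgoing_synapses(pre_idx, len(post_input), prob):
        post_input[post_idx] += weight      # post-synaptic update

post = [0.0] * 1000
propagate_spike(42, post)

# Determinism check: a second spike of neuron 42 reproduces the same row.
assert outgoing_synapses(42, 1000, 0.1) == outgoing_synapses(42, 1000, 0.1)
```

Because each row depends only on its own neuron's seed, rows can be regenerated independently and in parallel, which is what makes the approach a good fit for the thousands of concurrent threads on a GPU.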
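For a fixed probability connector, each candidate synapse exists independently with some probability p, so the gaps between successive connected postsynaptic indices follow a geometric distribution. Sampling those gaps directly – a standard trick – generates a row in O(number of connections) steps instead of testing every candidate pair, which suits per-neuron parallel generation. A sketch under the assumption that p lies strictly between 0 and 1 (identifiers are our own):

```python
import math
import random

def fixed_probability_row(rng, num_post, prob):
    """Connected postsynaptic indices for one presynaptic neuron.

    Instead of drawing a Bernoulli variate per candidate pair, jump
    straight to the next connected index by sampling the gap from a
    geometric distribution (prob must lie strictly between 0 and 1).
    """
    denom = math.log(1.0 - prob)
    targets = []
    post = -1
    while True:
        u = 1.0 - rng.random()                 # u in (0, 1]
        post += 1 + int(math.log(u) / denom)   # geometrically distributed gap
        if post >= num_post:
            return targets
        targets.append(post)

rng = random.Random(7)
row = fixed_probability_row(rng, 10_000, 0.1)
print(len(row))   # statistically close to prob * num_post = 1000
```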