
tinyML EMEA Technical Forum 2021 Proceedings

June 7 – 10, 2021, Virtual Event

Bottom-Up and Top-Down Neural Processing Systems Design: Unveiling the Road toward Neuromorphic Intelligence

Charlotte Frenkel

Institute of Neuroinformatics, UZH and ETH Zürich, Switzerland — [email protected]

Neuromorphic Engineering – Why? Efficiency of artificial vs. natural intelligence?

[Figure: power consumption vs. system volume — 176-GPU AlphaGo v1 (2015): ~100 kW; 48-TPU AlphaGo v2 (2016): ~10 kW; 4-TPU AlphaGo Zero (2017): ~1 kW; human cognition: 20 W in ~2 L]

[Silver & Hassabis, https://deepmind.com/blog/article/alphago-zero-starting-scratch, 2017]

Frenkel, tinyML EMEA’21 keynote

Neuromorphic Engineering – Why? Efficiency of bio-inspired neuromorphic computing?

[Figure: power consumption at 100 Hz (J/spike) vs. size (µm²/neuron) — digital computer ~10⁻⁶ J/spike, silicon neurons ~10⁻⁸–10⁻¹⁰ J/spike, biological neuron ~10⁻¹² J/spike]

A paradigm shift compared to Moore's-law scaling:
- Data representation: sparse, event-driven spike trains
- Architecture: distributed processing with co-located neurons and synapses
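The event-driven representation above can be illustrated with a toy leaky integrate-and-fire (LIF) neuron that is updated only when an input spike arrives, applying the membrane leak analytically between events. This is a minimal sketch with assumed parameters (`tau`, `w`, `v_th`), not the model implemented in any of the chips discussed here:

```python
import math

def lif_event_driven(events, tau=20.0, w=0.6, v_th=1.0):
    """Process a sorted list of input spike times; the membrane
    potential only needs updating when an event arrives."""
    v, t_last, out_spikes = 0.0, 0.0, []
    for t in events:
        v *= math.exp(-(t - t_last) / tau)  # analytical leak since last event
        v += w                               # integrate the incoming spike
        if v >= v_th:
            out_spikes.append(t)
            v = 0.0                          # reset on output spike
        t_last = t
    return out_spikes

# A burst of closely spaced spikes crosses threshold; sparse ones do not
print(lif_event_driven([1.0, 2.0, 3.0, 50.0, 100.0]))  # → [2.0]
```

The point of the exercise: no computation happens between events, which is where the energy savings of sparse, event-driven processing come from.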

[Poon & Zhou, Front. Neurosci., 2011]

Neuromorphic Engineering – How? A design strategy toward efficiency and cognition?

[Figure: the same power-vs-size plot, mapping digital computers to artificial intelligence, biological neurons to natural intelligence, and silicon neurons to a prospective neuromorphic intelligence — efficiency? cognition?]

[Poon & Zhou, Front. Neurosci., 2011]

Neuromorphic Engineering – How? A design strategy toward efficiency and cognition?

- Software simulation (CPU/GPU): low-cost simulation of neuromorphic and neural networks
- Dedicated/distributed simulation (FPGA; SpiNNaker 1/2, Manchester/TUD): simulation acceleration for neural networks (fast)
- Subthreshold analog mixed-signal (ROLLS and DYNAPs, UZH/ETHZ; NeuroGrid, Stanford): biological-time brain emulation (slow) and basic research
- Above-threshold analog mixed-signal (BrainScaleS 1/2, Heidelberg): neuroscience simulation acceleration
- Large-scale full-custom digital designs (TrueNorth, IBM; Loihi, Intel)
- Small-scale full-custom digital designs (ODIN, MorphIC, SPOON, UCLouvain): bio-inspired edge computing (experimentation platforms) and low-cost adaptive edge computing (dedicated accelerators); see also [Seo, CICC'11], [Knag, JSSC'15], [Park, ISSCC'19]

Neuromorphic Engineering – How? Unveiling roads to embedded cognition

[Figure: two design roads toward embedded cognition — bottom-up design (experimentation platforms): (1) neuron & synapse building blocks from neuroscience observation, then (2) large-scale silicon integration, facing a versatility/efficiency tradeoff and an efficiency/cognition tradeoff; top-down design (accelerators): (3) algorithms for a real-world application, then (4) silicon integration into neuromorphic processors, facing an accuracy/efficiency tradeoff]

Outline

Part I – Bottom-up neuromorphic design

• Building blocks

• Integration

Part II – Top-down neuromorphic design

• Algorithms

• Integration

Conclusion and perspectives

Outline

Part I – Bottom-up neuromorphic design

• Building blocks

Neurons and synapses as adaptive processing and memory elements [Frenkel, ISCAS, 2017]

• Integration [Frenkel, BioCAS, 2017]

Part II – Top-down neuromorphic design

• Algorithms

• Integration

Conclusion and perspectives

Frenkel, tinyML EMEA’21 keynote 8 Design strategy Analog or digital? We’ll come back to this. ☺ Biophysical behavior emulation simulation Analog Digital phenomenological modeling Biophysical versatility Good Bad Area Design time High Low Technology scaling Low potential High potential Noise, mismatch, PVT Bug or feature? Low sensitivity

How can we make the best of both worlds?

Design strategy — What should we aim for and phenomenologically implement?

Neurons

• 20 Izhikevich behaviors of cortical spiking neurons
  - Useful for time-to-first-spike encodings
  - Introduce competition for unsupervised learning in winner-take-all networks [Kreiser, BioCAS'17]

Synapses

• Spike-based online learning
  - Discrimination of specific frequencies
  - Stereo sound source localization [Schoepe, BioCAS'19]
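For reference, the behaviors above come from Izhikevich's compact two-variable neuron model. A minimal Euler-integration sketch with the standard textbook parameters for the regular-spiking behavior — this is the original model, not the phenomenological hardware implementation proposed in this talk:

```python
def izhikevich(a, b, c, d, i_in, t_max=200, dt=1.0):
    """Simulate the Izhikevich neuron model and return spike times [ms]."""
    v, u = -65.0, b * -65.0          # membrane potential and recovery variable
    spikes = []
    for step in range(int(t_max / dt)):
        v += dt * (0.04 * v * v + 5.0 * v + 140.0 - u + i_in)
        u += dt * a * (b * v - u)
        if v >= 30.0:                # spike cutoff
            spikes.append(step * dt)
            v, u = c, u + d          # membrane reset and recovery jump
    return spikes

# Regular-spiking (RS) cortical neuron under constant input current
rs = izhikevich(a=0.02, b=0.2, c=-65.0, d=8.0, i_in=10.0)
```

Changing the four parameters (a, b, c, d) reproduces the different firing behaviors (chattering, bursting, etc.) that the proposed digital neuron captures phenomenologically.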

Frenkel, tinyML EMEA’21 keynote [Izhikevich, IEEE Trans. NN, 2004] 10 Proposed phenomenological digital neuron Tackling the versatility/efficiency tradeoff Qiao, 2015 Schemmel, 2010 Rangan, 2010 Imam, 2013 Time-multiplexed (0.18mm) (0.18mm) (90nm) {12-16-20-24-bit} version Rubino, 2021 20 (22nm) Key features: This work Aamir, 2018 x3 Phenom. Izhikevich (65nm) - Entirely event-driven {(11+4)-bit} (no time-stepped integration) 15 Folowosele, 2009 - Only 4 functions necessary:

behaviors 13 (0.15mm) Cassidy, 2013 ▪ Threshold adaptation Sourikopoulos, 2017 ▪ Time window generation 11 {N/A} (65nm) ▪ Simple template matching 9 ▪ Membrane potential Molin, 2017 sign rotation

Izhikevich Joubert, 2012 (0.5mm) # # (65nm) Legend Analog neuron 3 Park, 2014 Merolla, 2011 Digital neuron (90nm) {10-bit} (28-nm norm.) 0 2 Frenkel, tinyML EMEA’21 keynote 100 1000 10000 Area [mm ] 11 Design strategy What should we aim for and phenomenologically implement? → perspectives Neurons

• 20 Izhikevich behaviors of cortical spiking neurons

Synapses

• Spike-based online learning

Proposed digital synapse — Tackling the versatility/efficiency tradeoff

Key challenge – fan-in = 100–10,000 synapses/neuron

Custom dual-port SRAM [Cassidy, ISCAS'11] vs. foundry single-port SRAM [Seo, CICC'11]

Record density
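The SDSP rule used here (spike-driven synaptic plasticity, after Brader, Senn and Fusi) updates a synapse only when a presynaptic spike arrives, based on the postsynaptic state at that instant. A simplified binary-weight sketch — the thresholds and the single stop-learning calcium band are illustrative assumptions, not the exact rule of the chips:

```python
def sdsp_update(w, v_post, ca_post, v_th=0.8, ca_min=0.2, ca_max=2.0, step=1):
    """On a presynaptic spike: potentiate if the postsynaptic membrane is
    depolarized and calcium is in range, otherwise depress (simplified)."""
    if not (ca_min <= ca_post <= ca_max):
        return w                      # stop-learning region: no update
    if v_post > v_th:
        return min(w + step, 1)       # potentiation (binary weight: clip at 1)
    return max(w - step, 0)           # depression

print(sdsp_update(0, v_post=1.0, ca_post=1.0))  # → 1 (potentiation)
print(sdsp_update(1, v_post=0.3, ca_post=1.0))  # → 0 (depression)
print(sdsp_update(1, v_post=0.3, ca_post=3.0))  # → 1 (stop-learning)
```

Because the update is triggered only by presynaptic spikes and reads only local postsynaptic state, a single shared update circuit can serve a whole SRAM-stored synaptic array — which is what makes the record density possible.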

Outline

Part I – Bottom-up neuromorphic design

• Building blocks

• Integration

Proposed neuromorphic experimentation platforms [Frenkel, Trans. BioCAS, 2019a] [Frenkel, Trans. BioCAS, 2019b]

Part II – Top-down neuromorphic design

• Algorithms

• Integration

Conclusion and perspectives

Architecture of ODIN

ODIN – A 256-neuron 64k-synapse Online-learning Digital Neurosynaptic core

[Figure: ODIN architecture — an SPI slave with clock generation (max. 100 MHz; RST, CLK_EXT, SCK, MOSI, MISO pins) interfaces the SNN controller; a 256-neuron memory (4-kB SRAM) and a 256²-synapse memory (32-kB SRAM) feed the phenomenological Izhikevich/LIF neuron update logic and the SDSP up/down synaptic update logic; AER input/output links and an event scheduler handle events and monitoring]

Frenkel, tinyML EMEA’21 keynote 15 ODIN – Chip microphotograph and specifications

Routing flexibility/efficiency (AER): fan-in 256, fan-out 256
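AER (address-event representation) encodes each spike as the bare address of the neuron that fired, sent over an asynchronous handshaked bus; the fan-out is then recovered at the receiver from a locally stored connectivity table. A minimal sketch with hypothetical wiring (not ODIN's actual synaptic crossbar):

```python
# Each output spike travels as a bare address; the receiving side expands
# it into its fan-out using a locally stored connectivity table.
fan_out_table = {0: [4, 5], 1: [5, 6], 2: [4, 6, 7]}  # hypothetical wiring

def route_aer(spike_addresses):
    """Expand a stream of AER addresses into destination events."""
    delivered = []
    for addr in spike_addresses:
        delivered.extend(fan_out_table.get(addr, []))
    return delivered

print(route_aer([0, 2]))  # → [4, 5, 4, 6, 7]
```

Time represents itself on the bus: no timestamp needs to be transmitted as long as events are delivered with low latency, which is what keeps AER links lightweight.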

Frenkel, tinyML EMEA’21 keynote 16 Architecture of MorphIC

Chip-level and core architecture: 512 neurons/core, 528k synapses/core, fan-in 1k, fan-out 2k, stochastic SDSP (S-SDSP) learning on binary synapses

Frenkel, tinyML EMEA’21 keynote 17 MorphIC – Chip microphotograph and specifications

Comparison with SoA experimentation platforms (mixed-signal and digital)

Most direct comparison: IBM TrueNorth core vs. ODIN (same technology node, same number of neurons and synapses per neurosynaptic core, same area).

ODIN vs. TrueNorth, per neurosynaptic core:
- Synapses: 4-bit with learning (ODIN) vs. 1-bit without learning (TrueNorth)
- Neurons: 20 Izhikevich behaviors vs. 11
- Energy/SOP: 12.7 pJ @ 0.55 V vs. 26 pJ @ 0.775 V
- Connectivity: AER (ODIN, MorphIC) vs. large-scale mesh (TrueNorth)

Comparison with SoA experimentation platforms (mixed-signal and digital)

Area: ODIN and MorphIC have the highest neuron and synapse densities among all SNNs with embedded synaptic weight storage.

Comparison with SoA experimentation platforms (mixed-signal and digital)

Power: ODIN has the lowest energy per synaptic event among all digital SNNs, and MorphIC keeps competitive energy efficiency. They outperform subthreshold analog SNNs in accelerated time, but not in biological-time processing.
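The per-synaptic-operation figures quoted in this comparison make energy budgets easy to estimate. A back-of-the-envelope sketch using the 12.7 pJ/SOP ODIN number from the slides — the workload size is an assumption for illustration only:

```python
e_sop = 12.7e-12            # J per synaptic operation (ODIN @ 0.55 V, from the slides)
sops_per_inference = 1e6    # assumed workload: 1M synaptic events per inference

energy_uj = e_sop * sops_per_inference * 1e6  # convert J to µJ
print(f"{energy_uj:.1f} µJ per inference")    # → 12.7 µJ per inference
```

Note that with event-driven hardware the SOP count scales with input activity, so sparser stimuli directly translate into lower energy per inference.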

Results on the spiking EMG/DVS sensor fusion benchmark [Ceolini, Frenkel, Shrestha et al., Front. Neurosci., 2020]


Sensor fusion: EMG data processed on ODIN, DVS data on MorphIC.

Accuracy / energy tradeoff:
- ODIN + MorphIC: 89.4% / 37.4 µJ
- Loihi: 96% / 1105 µJ
- Software: 95.4% / 32,100 µJ

Neuromorphic designs are more efficient than GPUs, as would be expected from dedicated hardware. But are they more efficient than conventional accelerators? → perspectives

See the ODIN and MorphIC papers for more benchmarking, incl. online- and offline-trained MNIST.

Outline

Part I – Bottom-up neuromorphic design

• Building blocks

• Integration

Part II – Top-down neuromorphic design

• Algorithms — Minimizing the training cost of neural networks for adaptive edge computing [Frenkel & Lefebvre, Front. Neurosci., 2021]

• Integration

Conclusion and perspectives

Learning without feedback — Releasing the weight transport and update locking of backprop


[Figure: from backprop toward feedforward local training, reducing computational and memory cost]

[Lillicrap, Nat. Comms., 2016] [Nokland, NeurIPS, 2016]

Direct Random Target Projection (DRTP) — Ideal use cases?

Adaptive edge computing

▪ Very low power and area overheads can be expected for an on-chip implementation.

▪ Datasets representative of the complexity associated with autonomous smart sensors: MNIST or CIFAR-10.

Disclaimer: whether DRTP scales to ImageNet is probably not the right question. ☺

→ We'll verify these claims in silico.

Neuroscience

DRTP could come in line with recent findings in cortical areas that reveal the existence of output-independent target signals in the dendritic instructive pathways of intermediate neurons. [Magee & Grienberger, Annual Review of Neuroscience, 2020]
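DRTP replaces the backpropagated error at each hidden layer with a fixed random projection of the one-hot target, so every layer can be updated during the forward pass — no weight transport, no update locking. A NumPy sketch on a toy problem; the layer sizes, data, learning rate, and sign conventions are illustrative assumptions, not the paper's exact experimental setup:

```python
import numpy as np

rng = np.random.default_rng(0)

def drtp_step(x, y, Ws, Bs, lr=0.05):
    """One DRTP update: hidden layers use fixed random projections of the
    target as learning signals; no backward pass, no weight transport."""
    hs = [x]
    for W in Ws[:-1]:
        hs.append(np.tanh(hs[-1] @ W))               # forward through hidden layers
    logits = hs[-1] @ Ws[-1]
    p = np.exp(logits - logits.max()); p /= p.sum()  # softmax output
    deltas = [p - y]                                 # local output error
    for k in range(len(Ws) - 2, -1, -1):
        deltas.insert(0, (Bs[k] @ y) * (1.0 - hs[k + 1] ** 2))  # B_k @ target
    for k, W in enumerate(Ws):
        W -= lr * np.outer(hs[k], deltas[k])         # all updates are local
    return int(p.argmax())

# Toy 2-class problem: two well-separated Gaussian blobs (hypothetical data)
X = np.vstack([rng.normal(2.0, 1.0, (100, 2)), rng.normal(-2.0, 1.0, (100, 2))])
labels = np.array([0] * 100 + [1] * 100)
Y = np.eye(2)[labels]
Ws = [rng.normal(0, 0.5, (2, 16)), rng.normal(0, 0.5, (16, 2))]
Bs = [rng.normal(0, 0.5, (16, 2))]                   # fixed random projection
for _ in range(5):
    preds = [drtp_step(X[i], Y[i], Ws, Bs) for i in range(200)]
acc = np.mean(np.array(preds) == labels)
```

Because the hidden learning signal depends only on the target, it is available as soon as the label is, which is what makes a low-cost on-chip implementation plausible.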

Outline

Part I – Bottom-up neuromorphic design

• Building blocks

• Integration

Part II – Top-down neuromorphic design

• Algorithms

• Integration — Neuromorphic accelerators [Frenkel, ISCAS, 2020] (best paper award)

Conclusion and perspectives

Which bio-inspired elements? Taking a step back with the top-down design strategy

[Figure: top-down design road — starting from the real-world application (adaptive edge computing) and the DRTP learning algorithm, bio-inspired elements are selected from neuroscience observation: computation is sparse, event-driven, and time-based]

[Figure: event-driven CNN (eCNN) — an input image from a spiking retina [S.C. Liu, 2014] is processed by event-driven convolutional layers, frame-based max-pooling, and frame-based + event-driven fully-connected layers (FC1, FC2) that produce the output label; the fully-connected layers are DRTP-enabled]

Architecture of SPOON

SPOON – A Spiking Online-Learning Convolutional Neuromorphic Processor

SPOON – Chip microphotograph and specifications

→ perspectives

Stay tuned for the journal extension! (pre-silicon numbers, not yet updated)

[Chip microphotograph: SPOON, 28-nm eCNN, 951 µm × 331 µm (0.32 mm²)]

DRTP can be implemented on-chip at a very low cost!

Benchmarking: MNIST and N-MNIST

SPOON benchmarking — Against SoA spiking neural networks on MNIST

[Figure: accuracy vs. energy per inference for SPOON (including a power-optimized configuration) against SoA spiking neural networks; lower energy and higher accuracy are better]

Only SPOON allows reaching the efficiency of ANN/CNN/BNN accelerators while enabling online learning with event-based sensors.

Outline

Part I – Bottom-up neuromorphic design

• Building blocks

• Integration

Part II – Top-down neuromorphic design

• Algorithms

• Integration

Conclusion and perspectives — Summary of the key messages, next directions

Neuromorphic Engineering – Key Claims: Unveiling roads to embedded cognition

[Figure: the bottom-up (experimentation platforms) and top-down (accelerators) design roads — (1) neuron & synapse building blocks from neuroscience observation, (2) large-scale silicon integration, (3) algorithms for real-world applications, (4) silicon integration into neuromorphic processors; tradeoffs: versatility/efficiency, efficiency/cognition, accuracy/efficiency]

Claim 1 — Hardware-aware neuroscience model design and selection allows reaching record neuron and synapse densities with low-power operation for large-scale integration in silico.

Claim 2 — Combining event-driven and frame-based processing with weight-transport-free update-unlocked training supports low-cost adaptive edge computing with spike-based sensors.


Claim 3 — Top-down guidance helps push bottom-up neuron and synapse integration beyond the purpose of neuroscience experimentation platforms, while bottom-up guidance supports top-down design toward brain reverse-engineering.

Perspectives

▪ Neuromorphic engineering and spiking neural networks: from "Can we make it work?" to "Will it bring a competitive advantage?" (not only against GPUs). This calls for something better than MNIST: audio (KWS) and bio-signal processing (time, biological-time) [Davies, Nat. Mach. Intel., 2019]

▪ Phenomenological digital design: pragmatic short-to-midterm approach. Promising avenues: leveraging the variability of subthreshold analog design; fine-grained mixed-signal design.

▪ Bottom-up trend: dendrites [Sacramento, NeurIPS'18]

▪ Top-down trend: new wave of training algorithms mapping onto bio-plausible primitives [Payeur, bioRxiv, 2020] [Bellec, Nat. Comms., 2020]

▪ Cognition: a case for neuromorphic? [Man & Damasio, Nat. Mach. Intel., 2019]

Acknowledgments

PhD Postdoc

Institution:

Funding:

Supervisors:

Profs. David Bol, Jean-Didier Legat Prof. Giacomo Indiveri

Key colleagues:

Martin Lefebvre

Questions?

@C_Frenkel
cfrenkel

Charlotte-Frenkel

ChFrenkel

[email protected]

Main references:

- ODIN: C. Frenkel et al., "A 0.086-mm² 12.7-pJ/SOP 64k-synapse 256-neuron online-learning digital spiking neuromorphic processor in 28nm CMOS," IEEE Trans. BioCAS, 2019. Open-sourced: github.com/ChFrenkel/ODIN
- MorphIC: C. Frenkel et al., "MorphIC: A 65-nm 738k-synapse/mm² quad-core binary-weight digital neuromorphic processor with stochastic spike-driven online learning," IEEE Trans. BioCAS, 2019.
- DRTP: C. Frenkel, M. Lefebvre et al., "Learning without feedback: Fixed random learning signals allow for feedforward training of deep neural networks," Frontiers in Neuroscience, 2021. Open-sourced: github.com/ChFrenkel/DirectRandomTargetProjection
- SPOON: C. Frenkel et al., "A 28-nm convolutional neuromorphic processor enabling online learning with spike-based retinas," IEEE ISCAS, 2020. Journal extension coming soon.
- Review: C. Frenkel, D. Bol and G. Indiveri, "Bottom-up and top-down neural processing systems design: Neuromorphic intelligence as the convergence of natural and artificial intelligence," arXiv preprint arXiv:2106.01288, 2021. Just released!

Premier Sponsor

Automated TinyML

Zero-code SaaS solution

Create tiny models, ready for embedding, in just a few clicks!

Compare the benchmarks of our compact models to those of TensorFlow and other leading neural network frameworks.

Build Fast. Build Once. Never Compromise.

Executive Sponsors

Arm: The Software and Hardware Foundation for tinyML

1. Connect to high-level frameworks — application-level frameworks and optimized models for embedded
2. AI ecosystem partners — supported by end-to-end tooling such as Arm Keil MDK for profiling and debugging
3. Connect to runtime (e.g. TensorFlow Lite Micro) — optimized low-level NN libraries (i.e. CMSIS-NN), RTOS such as Mbed OS, Arm Cortex-M CPUs and microNPUs

Stay connected: @ArmSoftwareDevelopers, @ArmSoftwareDev
Resources: developer.arm.com/solutions/machine-learning-on-arm

© 2020 Arm Limited (or its affiliates)

TinyML for all developers

Edge Impulse workflow: acquire valuable training data securely (dataset) → enrich data and train ML algorithms → test impulse with real-time device data flows → embedded and edge-compute deployment options. Real sensors in real time; open source SDK.

www.edgeimpulse.com

Advancing AI research to make efficient AI ubiquitous:
- Perception: object detection, speech recognition, contextual fusion
- Reasoning: scene understanding, language understanding, behavior prediction
- Power efficiency: model design, compression, quantization, algorithms, efficient hardware, software tools
- Personalization: continuous learning — contextual, always-on, privacy-preserved, distributed learning
- Efficient learning: robust learning through minimal data, on-device learning
- A platform to scale AI for decision making across the industry: IoT/IIoT, edge cloud, automotive, cloud, mobile

Qualcomm AI Research is an initiative of Qualcomm Technologies, Inc.

Syntiant Corp. is moving artificial intelligence and machine learning from the cloud to edge devices. Syntiant's chip solutions merge deep learning with semiconductor design to produce ultra-low-power, high-performance, deep neural network processors. These network processors enable always-on applications in battery-powered devices, such as smartphones, smart speakers, earbuds, hearing aids, and laptops. Syntiant's Neural Decision Processors™ offer wake word, command word, and event detection in a chip for always-on voice and sensor applications.

Founded in 2017 and headquartered in Irvine, California, the company is backed by Amazon, Applied Materials, Atlantic Bridge Capital, Bosch, Intel Capital, Microsoft, Motorola, and others. Syntiant was recently named a CES® 2021 Best of Innovation Awards Honoree, shipped over 10M units worldwide, and unveiled the NDP120 part of the NDP10x family of inference engines for low-power applications.

www.syntiant.com @Syntiantcorp

Platinum Sponsors

www.infineon.com


Gold Sponsors

Adaptive AI for the Intelligent Edge

Latentai.com

Build Smart IoT Sensor Devices From Data — SensiML pioneered TinyML software tools that auto-generate AI code for the intelligent edge.

• End-to-end AI workflow • Multi-user auto-labeling of time-series data • Code transparency and customization at each step in the pipeline

We enable the creation of production-grade smart sensor devices. sensiml.com

Silver Sponsors

Copyright Notice

The presentation(s) in this publication comprise the proceedings of tinyML® EMEA Technical Forum 2021. The content reflects the opinion of the authors and their respective companies. This version of the presentation may differ from the version that was presented at tinyML EMEA. The inclusion of presentations in this publication does not constitute an endorsement by tinyML Foundation or the sponsors.

There is no copyright protection claimed by this publication. However, each presentation is the work of the authors and their respective companies and may contain copyrighted material. As such, it is strongly encouraged that any use reflect proper acknowledgement to the appropriate source. Any questions regarding the use of any materials presented should be directed to the author(s) or their companies. tinyML is a registered trademark of the tinyML Foundation. www.tinyML.org