
Silicon retina technology
Tobi Delbruck, Inst. of Neuroinformatics, University of Zurich and ETH Zurich

Sensors Group sensors.ini.uzh.ch

Sponsors: Swiss National Science Foundation NCCR Robotics project, EU projects CAVIAR, SEEBETTER and VISUALISE, Samsung, DARPA, University of Zurich and ETH Zurich

sensors.ini.uzh.ch inilabs.com

Conventional cameras ("static vision sensors") output a stroboscopic sequence of frames

(Images: golf-guides.blogspot.com; Muybridge 1878, nearly 140 years ago)

Good:
• Compatible with 50+ years of machine vision
• Allows small pixels (1 µm for consumer, 3–5 µm for machine vision)

Bad:
• Redundant output
• Temporal aliasing
• Limited dynamic range (60 dB)
• Fundamental "latency vs. power" trade-off

The Human Eye as a digital camera

• 100M photoreceptors
• 1M output fibers carrying max 100 Hz spike rates
• 180 dB (10⁹) operating range
• >20 different "eyes"
• Many GOPs of computing
• 3 mW power consumption

Output is a sparse, asynchronous stream of digital spike events.

This talk has 4 parts
• Dynamic Vision Sensor Silicon Retinas
• Simple object tracking by algorithmic processing of events
• Using probabilistic methods for state estimation
• "Data-driven" deep inference with CNNs

DVS (Dynamic Vision Sensor)
[Pixel schematic: photoreceptor (log I) → change amplifier ("bipolar cells") → ON/OFF comparators ("ganglion cells"); each ±Δlog I brightness change emits an event, which resets the pixel]
Lichtsteiner et al., ISSCC 2007, JSSC 2009 (retina analogy from Rodieck 1998)
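The pixel's change-detection behavior is easy to state in software. The following is a minimal behavioral sketch of the idea (a simulation, not the analog circuit; the threshold value and sample format are illustrative assumptions):

```python
import math

def dvs_pixel_events(samples, theta=0.15):
    """Behavioral DVS pixel model: emit ON/OFF events whenever log
    intensity moves more than +/-theta from the last reset level."""
    events = []
    mem = math.log(samples[0][1])      # log intensity at the last event
    for t, intensity in samples[1:]:
        d = math.log(intensity) - mem
        while abs(d) >= theta:         # a large step can emit several events
            events.append((t, 'ON' if d > 0 else 'OFF'))
            mem += theta if d > 0 else -theta   # reset toward current level
            d = math.log(intensity) - mem
    return events

# A brightening then darkening pixel produces ON then OFF event bursts:
print(dvs_pixel_events([(0, 1.0), (1, 1.2), (2, 2.0), (3, 2.0), (4, 1.0)]))
```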

DVS pixel has wide dynamic range

[DVS events recorded at 780 lux and at 5.8 lux, viewing an Edmund 0.1-density chart; illumination ratio 135:1] (ISSCC 2007)

Using the DVS for high-speed (low data rate) imaging

Data rate <1 MB/s; "frame rate" equivalent to 10 kHz, but with 100× less data (10 kHz × 16k pixels = 160 MB/s). ISSCC 2007
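The comparison is simple arithmetic; a sketch, assuming a ~16k-pixel (128×128) array and 1 byte per pixel per frame:

```python
# Equivalent frame-sampled data rate at a 10 kHz frame rate:
frame_rate_hz = 10_000
pixels = 128 * 128                        # ~16k pixels
frame_data_rate = frame_rate_hz * pixels  # bytes per second
print(frame_data_rate / 1e6, "MB/s")      # ~164 MB/s, vs <1 MB/s of DVS events
```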

DAVIS (Dynamic and Active Pixel Vision Sensor)
[Pixel schematic: the DVS change-detection path (photoreceptor → change amplifier → ON/OFF comparators, ±Δlog I events) combined with an APS intensity readout (reset/value) on the same photodiode]
Brandli et al., Symp. VLSI 2013, JSSC 2014 (retina analogy from Rodieck 1998)

DVS/DAVIS + IMU demo

Brandli, Berner, Delbruck et al., Symp. VLSI 2013, JSSC 2014, ISCAS 2015

DAVIS346
[Chip layout (8 mm die): 346×260 DAVIS pixel array, 18.5 µm pixels, 180 nm CIS process; peripheral blocks: AER asynchronous DVS event readout, bias generator, APS column-parallel ADCs and scanner]

Important layout considerations
1. Post-layout simulations to minimize parasitic coupling
2. Shielding of parasitic photodiodes

Event threshold matching measurement
Experiment: Apply a slow triangle-wave LED stimulus to the entire array and measure the number of events that pixels generate

Conclusion: Pixels generate 11 ± 3 events per factor-of-3.3 contrast. Since ln(3.3) = 1.19 and 1.19/11 = 0.11, the contrast threshold is 11% ± 4%.
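The same calculation in a few lines, using the measured numbers from the slide (the helper name is ours):

```python
import math

def contrast_threshold(n_events, contrast_ratio):
    """Per-event log-intensity threshold implied by n_events fired
    over a total log contrast of ln(contrast_ratio)."""
    return math.log(contrast_ratio) / n_events

n, dn = 11, 3                          # measured events per factor-3.3 contrast
theta = contrast_threshold(n, 3.3)     # 1.19 / 11 ~ 0.11 -> ~11%
lo = contrast_threshold(n + dn, 3.3)   # 14 events -> ~8.5%
hi = contrast_threshold(n - dn, 3.3)   # 8 events  -> ~14.9%
print(f"threshold ~ {theta:.0%} (range {lo:.0%}-{hi:.0%})")  # ~11% +/- 4%
```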

Measuring DVS pixel latency
Experiment: Stimulate a small area of the sensor with a flashing LED spot and measure response latencies from the recorded event stream
[Figure: histogram of event times after the stimulus, showing the latency (mean delay) and the jitter (spread)]

Conclusion: Pixels can have a minimum latency of about 12 µs under bright illumination, but "real-world" latencies are more like 100 µs–1 ms.

DVS pixel has built-in temperature compensation

Photoreceptor: Vp ∝ T·ln(Ip)
Threshold: θon ∝ T·ln(Ion/Id)
Since the photoreceptor gain and the threshold voltage both scale with absolute temperature T, the temperature dependence cancels out.
Nozaki, Delbruck 2017 (unpublished)
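A sketch of the cancellation, written with the thermal voltage U_T = kT/q; the slide's current symbols I_p, I_on, I_d are kept, I_0 is an assumed reference current, and the exact circuit expressions in Nozaki & Delbruck may differ:

```latex
% Photoreceptor output and event threshold both scale with U_T = kT/q:
V_p = U_T \ln\!\left(\frac{I_p}{I_0}\right), \qquad
\theta_{\mathrm{on}} = U_T \ln\!\left(\frac{I_{\mathrm{on}}}{I_d}\right)
% An ON event fires when the photoreceptor change exceeds the threshold:
\Delta V_p \ge \theta_{\mathrm{on}}
\iff U_T\,\Delta\ln I_p \ge U_T \ln\!\left(\frac{I_{\mathrm{on}}}{I_d}\right)
\iff \Delta\ln I_p \ge \ln\!\left(\frac{I_{\mathrm{on}}}{I_d}\right)
% U_T, and hence T, cancels: the contrast threshold is temperature independent.
```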

Integrated bias generator and circuit design enable operation over an extended temperature range
Nozaki, Delbruck 2017 (submitted)

DVS pixel size trend
[Plot: DVS/DAVIS pixel pitch vs. process generation (350 nm, 350/180 nm, 180 nm, 90 nm), compared with global-shutter APS and rolling-shutter consumer APS pixels]

https://docs.google.com/spreadsheets/d/1pJfybCL7i_wgH3qF8zsj1JoWMtL0zHKr9eygikBdElY/edit#gid=0

Event camera silicon retina developments

Sensor families: DVS/DAVIS, ATIS/CCAM, CeleX

Commercial entities:
• iniLabs (Zurich) – R&D prototypes
• Insightness (Zurich) – drones and augmented reality
• Samsung (S. Korea) – consumer electronics
• Pixium Vision (Paris) – retinal implants
• iniVation (Zurich) – industrial applications, automotive
• Chronocam (Paris) – automotive
• Hillhouse (Singapore) – automotive

www.iniLabs.com

• Founded 2009
• Run as not-for-profit
• Neuromorphic sensor R&D prototypes
• Open-source software, user guides, app notes, sample data
• Shipped devices based on multi-project-wafer silicon to 100+ organizations

• Dynamic Vision Sensor Silicon Retinas
• Simple object tracking by algorithmic processing of events
• Using probabilistic methods for state estimation
• "Data-driven" deep inference with CNNs

Tracking objects from DVS events using spatio-temporal coherence
1. For each event, find the nearest cluster
   • If the event falls within a cluster, move the cluster toward the event
   • If not, seed a new cluster
2. Periodically prune starved clusters, merge clusters, etc. (lifetime management) — see the sketch below

Advantages:
1. Low computational cost (e.g. <5% CPU)
2. No frame memory (~100 bytes/object)
3. No frame correspondence problem

Litzenberger 2007
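A minimal sketch of this event-driven cluster tracker, loosely following the steps above (the radius, mixing rate, and timeout values are illustrative, not Litzenberger's parameters; cluster merging is omitted):

```python
import math

class Cluster:
    def __init__(self, x, y, t):
        self.x, self.y, self.t = x, y, t   # position and last-update time
        self.mass = 1                      # events captured so far

def track(events, radius=10.0, mix=0.05, timeout=10_000):
    """events: iterable of (t_us, x, y). Returns the live cluster list."""
    clusters = []
    for t, x, y in events:
        # 1. Find the nearest cluster to this event
        near = min(clusters, key=lambda c: math.hypot(c.x - x, c.y - y),
                   default=None)
        if near and math.hypot(near.x - x, near.y - y) < radius:
            near.x += mix * (x - near.x)   # event inside: nudge cluster
            near.y += mix * (y - near.y)   # toward the event
            near.t, near.mass = t, near.mass + 1
        else:
            clusters.append(Cluster(x, y, t))   # event outside: seed new cluster
        # 2. Prune starved clusters (per event here for brevity;
        #    the original does this periodically)
        clusters = [c for c in clusters if t - c.t < timeout]
    return clusters
```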

Robo Goalie
Delbruck et al., ISCAS 2007, Frontiers 2013

Using the DVS allows a 2 ms reaction time at 4% processor load with USB bus connections

This talk has 4 parts
• Dynamic Vision Sensor Silicon Retinas
• Simple object tracking by algorithmic processing of events
• Using probabilistic methods for state estimation
• "Data-driven" deep inference with CNNs

Simultaneous Mosaicing and Tracking with DVS

Hanme Kim, A. Handa, … Andy J. Davison, BMVC 2014.

Goal: to do event-based, semi-dense visual odometry

• We want to estimate the state vector s (camera pose, visual scene spatial brightness gradients, and sensor event thresholds) from the events e using Bayesian filtering: p(s|e)
• The sensor likelihood p(e|s) is modeled as a mixture of an inlier Gaussian distribution and an outlier uniform distribution
• A tractable posterior q(s|e) ≈ p(s|e) is found by minimizing the Kullback-Leibler (KL) divergence
• This leads to closed-form update equations in the form of a classical Kalman filter, and is thus computationally efficient (unlike particle filtering)
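For intuition only, here is the shape of a scalar Kalman measurement update; this is a generic textbook step, not the paper's actual state vector, measurement model, or robust inlier/outlier weighting:

```python
def kalman_update(s, P, e, H, R):
    """One scalar Kalman measurement update.
    s, P: state estimate and its variance
    e: a measurement (here, derived from a DVS event)
    H: measurement model (e ~ H*s + noise), R: measurement noise variance."""
    y = e - H * s              # innovation
    S = H * P * H + R          # innovation variance
    K = P * H / S              # Kalman gain
    return s + K * y, (1 - K * H) * P   # corrected state, reduced uncertainty

# Each incoming event refines the estimate a little:
s, P = 0.0, 1.0
for e in [0.9, 1.1, 1.0]:
    s, P = kalman_update(s, P, e, H=1.0, R=0.5)
print(s, P)
```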

G. Gallego, E. Mueggler, D. Scaramuzza (submitted to PAMI, 2016)

Towards event-based, semi-dense SLAM: 6-DOF pose estimation

G. Gallego et al., PAMI (submitted 2016).

This talk has 4 parts
• Dynamic Vision Sensor Silicon Retinas
• Simple object tracking by algorithmic processing of events
• Using probabilistic methods for state estimation
• "Data-driven" deep inference with CNNs

Demo - RoShamBo

RoShamBo CNN architecture
Conventional 5-layer LeNet with ReLU/MaxPool and 1 FC layer before the output.

Input: 2D rectified histogram of 2k DVS events per "frame" (from the 240×180 DVS, presented as 64×64; 0.1 Hz–1 kHz frame rate)
• Conv 5×5 → 16×60×60, MaxPool 2×2 → 16×30×30
• Conv 3×3 → 32×28×28, MaxPool 2×2 → 32×14×14
• Conv 3×3 → 64×12×12, MaxPool 2×2 → 64×6×6
• Conv 3×3 → 128×4×4, MaxPool 2×2 → 128×2×2
• Conv 1×1 + MaxPool 2×2 → 128×1×1
• Outputs: Paper, Scissors, Rock, Background
Total: 18 MOp (~9M MACs)
Compute times: 2 ms on a 150 W Core i7 PC in Caffe; 8 ms on a 1 W CNN accelerator on FPGA
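The layer dimensions above are consistent with a 64×64 single-channel input. Below is a sketch of the network in PyTorch; the activation and pooling placement is inferred from the slide, and this is not the released trained model:

```python
import torch
import torch.nn as nn

# 64x64 one-channel DVS event histogram -> 4 classes
roshambo_net = nn.Sequential(
    nn.Conv2d(1, 16, 5), nn.ReLU(), nn.MaxPool2d(2),     # 16x60x60 -> 16x30x30
    nn.Conv2d(16, 32, 3), nn.ReLU(), nn.MaxPool2d(2),    # 32x28x28 -> 32x14x14
    nn.Conv2d(32, 64, 3), nn.ReLU(), nn.MaxPool2d(2),    # 64x12x12 -> 64x6x6
    nn.Conv2d(64, 128, 3), nn.ReLU(), nn.MaxPool2d(2),   # 128x4x4 -> 128x2x2
    nn.Conv2d(128, 128, 1), nn.ReLU(), nn.MaxPool2d(2),  # 128x2x2 -> 128x1x1
    nn.Flatten(),
    nn.Linear(128, 4),          # Paper / Scissors / Rock / Background
)

x = torch.randn(1, 1, 64, 64)   # one 2k-event histogram "frame"
print(roshambo_net(x).shape)    # torch.Size([1, 4])
```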

I.-A. Lungu, F. Corradi, and T. Delbruck, "Live Demonstration: Convolutional Neural Network Driven by Dynamic Vision Sensor Playing RoShamBo," in 2017 IEEE International Symposium on Circuits and Systems (ISCAS 2017), Baltimore, MD, USA, 2017.

RoShamBo training images

A. Aimar, E. Calabrese, H. Mostafa, A. Rios-Navarro, R. Tapiador, I.-A. Lungu, A. Jimenez-Fernandez, F. Corradi, S.-C. Liu, A. Linares-Barranco, and T. Delbruck, "NullHop: Flexibly efficient FPGA CNN accelerator driven by DAVIS neuromorphic vision sensor," in NIPS 2016, Barcelona, 2016.

Conclusions
1. The DVS was developed by following a neuromorphic approach of emulating key properties of biological retinas.
2. The wide dynamic range and sparse, quick output make these sensors useful in real-time, uncontrolled conditions.
3. Applications could include vision prosthetics, surveillance, robotics, and consumer electronics.
4. The precise event timing could improve learning and inference.
5. The main challenges are to reduce pixel size and to develop effective algorithms. Only industry can do the first, but academia has plenty of room to play for the second.
6. Event sensors can nicely drive deep inference, and there is a lot of room to improve deep-inference power efficiency at the system level!