Received: 6 September 2018 | Revised: 23 January 2019 | Accepted: 24 January 2019 | DOI: 10.1002/widm.1310

ADVANCED REVIEW

Neuromorphic vision: Sensors to event-based algorithms

A. Lakshmi 1,2 | Anirban Chakraborty 3 | Chetan S. Thakur 2

1 Centre for Artificial Intelligence and Robotics, Defence Research and Development Organization, Bangalore, India
2 Department of Electronic Systems Engineering, Indian Institute of Science, Bangalore, India
3 Department of Computational and Data Sciences, Indian Institute of Science, Bangalore, India

Correspondence: Chetan Singh Thakur, Department of Electronic Systems Engineering, Indian Institute of Science, Bangalore 560012, India. Email: [email protected]

Abstract
Regardless of the marvels brought by conventional frame-based cameras, they have significant drawbacks owing to data redundancy and temporal latency. This causes problems in applications where low-latency transmission and high-speed processing are mandatory. Proceeding on this line of thought, the neurobiological principles of the biological retina have been adapted to accomplish data sparsity and high dynamic range at the pixel level. These bio-inspired neuromorphic vision sensors alleviate the serious bottleneck of data redundancy by responding to changes in illumination rather than to illumination itself. This paper briefly reviews one representative of such neuromorphic sensors, the activity-driven event-based vision sensor, which mimics the human eye. The spatio-temporal encoding of event data permits the incorporation of temporal correlation, in addition to spatial correlation, into vision processing, which makes it more robust. Consequently, conventional vision algorithms have to be reformulated to adapt to this new generation of vision sensor data, which requires the design of algorithms for sparse, asynchronous, and precisely timed information. Theories and new research have recently begun to emerge in the domain of event-based vision, and the need to compile the vision research carried out in this sensor domain has become essential. Towards this, this paper reviews state-of-the-art event-based vision algorithms by categorizing them into three major vision applications: object detection/recognition, object tracking, and localization and mapping.

This article is categorized under:
Technologies > Machine Learning

1 | INTRODUCTION

In this era of rapid innovation and technological marvels, much effort is being put into developing visual sensors that emulate the working principles of the human retina. These efforts are motivated by, and driven towards, the ultimate objective of catering to applications such as high-speed robotics and the analysis of high-dynamic-range scenes, situations where conventional video cameras fall short. The colossal amount of data from a conventional camera overwhelms the processing capacity of embedded systems and hence inevitably becomes a bottleneck. Some fundamental level of preprocessing therefore has to be performed at the acquisition level, as opposed to loading processing systems with redundant data. Developing sensors for this novel modality, however, requires information-processing technologies built on the domain of neuromorphic engineering.
One such research effort prompted the development of neuromorphic vision sensors, also known as event-based vision sensors, which have been remarkably successful in addressing the issues encountered with conventional cameras. These sensors record information in a revolutionary way: they asynchronously register only changes in illumination, with microsecond accuracy.

These sensors find extensive use in high-speed vision-based applications, which are indispensable in fields such as micro-robotics. Frame-based vision algorithms are not effective in analyzing these event data streams: because they expect synchronous data, they cannot exploit the aforementioned benefits of neuromorphic vision sensor data. Consequently, a shift in the algorithmic paradigm from conventional frame-based to event-based sensors is essential. The development of event-based vision algorithms has yet to reach the scale of its frame-based counterpart, or the level of accuracy those algorithms achieve on conventional video data, mainly because these sensors have very limited commercial availability and are accessible mostly as prototypes or proofs of concept. Nevertheless, quite a number of interesting computer vision algorithms have recently been developed for this type of sensor, designed to optimally exploit its higher temporal resolution and data sparsity.

With the advancement of present-day technologies, these sensors are starting to appear in the commercial market. This also necessitates a structured and detailed study of the important vision algorithms already conceived, or yet to be developed, for analyzing event data streams. However, there is hardly any such survey in the event domain, unlike the plethora of similar works for frame-based sensors. Hence, we present a survey of the vision algorithms available for event data. Although our focus is on reviewing event-based vision algorithms, we also provide a concise overview of silicon retinae, with emphasis on a specific type known as the temporal difference silicon retina. Furthermore, we briefly review the event datasets available in the public domain for evaluating the aforementioned event-based vision algorithms.

The structure of the paper is as follows. The paper introduces the silicon retina and the temporal difference silicon retina in a nutshell, followed by the state of the art in three important event-based vision applications, viz., object detection and recognition, object tracking, and localization and mapping. As a closing remark, the paper gives a gist of open-source event datasets, along with the available code and implementations of event-based vision sensor algorithms.

2 | SILICON RETINA

The information processing of neurobiological systems is fundamentally different from that of present-day image-capturing devices, and biological vision systems consistently outperform human-engineered vision sensors. Conventional image sensors are driven by a synchronous clock signal, which results in the accumulation of an enormous amount of monotonous and redundant data. This has motivated research in neuromorphic engineering to build vision sensors that mimic the human eye, leading to the development of silicon retinae. Silicon retinae are event-based, unlike conventional sensors, which are frame-based. This makes them closer to the biological model of the human eye, which is driven by events occurring in the real world. The first electronic model of the retina came into existence in the early 1970s (Fukushima, Yamaguchi, Yasuda, & Nagata, 1970; Mahowald, 1994), following which numerous forms of silicon retinae, widely known as event-based vision sensors, came into existence: the spatial contrast retina, which encodes relative light-intensity change with respect to the background (Barbaro, Burgi, Mortara, Nussbaum, & Heitger, 2002; Bardallo, Gotarredona, & Barranco, 2009; Ruedi et al., 2009); gradient-based sensors, which encode static edges; the temporal intensity retina (Mallik, Clapp, Choi, Cauwenberghs, & Cummings, 2005); and the temporal difference retina (Kramer, 2002; Lichtsteiner, Posch, & Delbruck, 2006). This study reviews the rationale of temporal difference silicon retinae, which are commonly known as activity-driven event-based vision sensors. The temporal difference silicon retina generates events in response to temporal changes of illumination in the scene. The following sections discuss, in brief, their architecture, types, history, advantages, the companies involved, and the software available.

2.1 | Event generation mechanism

In a conventional sensor, an image consists of pixels I_{x,y}(t) whose values are sampled at fixed time intervals Δt; this yields a representation that does not lend itself to compactness. In the temporal difference silicon retina, only the pixels with a significant luminance change across time, denoted I'_{x,y}(t), are retained. An event at spatial location (x, y) and time t can be represented as

$$ e_t = \begin{cases} \operatorname{sign}\left(I'_{x,y}(t)\right) & \text{if } \left|I'_{x,y}(t)\right| > \Delta \\ 0 & \text{if } \left|I'_{x,y}(t)\right| \le \Delta \end{cases} \qquad (1) $$

where Δ is the threshold. These events are spatio-temporal in nature.
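To make the thresholding rule of Equation (1) concrete, the following is a minimal sketch, in Python, of how such events could be simulated from a pair of intensity frames. The function name generate_events, the threshold value, and the (x, y, t, polarity) tuple format are illustrative assumptions, not the implementation of any particular sensor.

import numpy as np

def generate_events(prev_frame, curr_frame, timestamp, threshold=0.1):
    """Simulate temporal-difference events following Equation (1)."""
    diff = curr_frame - prev_frame                   # temporal change I'_{x,y}(t)
    ys, xs = np.nonzero(np.abs(diff) > threshold)    # pixels exceeding Delta
    # One event per significant pixel; silent pixels produce no output.
    return [(int(x), int(y), timestamp, int(np.sign(diff[y, x])))
            for x, y in zip(xs, ys)]

# Example: two 4 x 4 frames that differ at a single pixel.
f0 = np.zeros((4, 4))
f1 = f0.copy()
f1[2, 1] = 0.5                                       # brightness increase at x=1, y=2
print(generate_events(f0, f1, timestamp=1000))       # [(1, 2, 1000, 1)]

Note that the sketch thresholds raw intensity differences exactly as Equation (1) states; practical temporal difference sensors typically respond to changes in log intensity, but the event-generation principle is the same.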
2.2 | Event transmission mechanism

As the temporal difference silicon retina is spike based, it utilizes a spike-based interfacing protocol known as address event representation (AER; Figure 1) to transmit asynchronous information over a narrow channel (Gotarredona, Andreou, & Barranco, 1999). In AER, each neuron is assigned a unique address by its encoder/arbiter. When an event is generated, the address of the neuron and the relevant information are sent to the destination over the bus. The decoder at the receiver interprets the received information and routes it to the particular destination. When the number of event-generating neurons surpasses the transmission capacity of the bus, multiplexing is employed.

2.3 | Event processing mechanism

This section describes the way in which the spatio-temporal data of the silicon retina can be interpreted.
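To connect the transmission and processing mechanisms, the following hypothetical sketch shows how an event might be packed into an AER address word on the sensor side and unpacked back into a spatio-temporal (x, y, t, polarity) tuple at the receiver, which is the form the event processing stage consumes. The 128 × 128 array size, the bit layout, and the function names are assumptions made purely for illustration; actual AER bus widths and handshaking differ between devices.

# Hypothetical AER packing for a 128 x 128 pixel array:
# 7 bits for x, 7 bits for y, 1 polarity bit (15 bits per address word).
# Timestamps are commonly assigned by the receiver on arrival, as done here.

def aer_encode(x, y, polarity):
    """Pack one event into a single address word for the shared bus."""
    return (x << 8) | (y << 1) | (1 if polarity > 0 else 0)

def aer_decode(word, arrival_time):
    """Unpack an address word into a spatio-temporal event tuple."""
    x = (word >> 8) & 0x7F
    y = (word >> 1) & 0x7F
    polarity = 1 if (word & 0x1) else -1
    return (x, y, arrival_time, polarity)

word = aer_encode(x=1, y=2, polarity=1)
print(aer_decode(word, arrival_time=1000))   # (1, 2, 1000, 1)

As in the sketch, the output of the AER link is a sparse stream of (x, y, t, polarity) tuples rather than dense frames, and it is this stream that the event processing mechanisms discussed next must interpret.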