High Performance Motion Detection: Some Trends
Total Page:16
File Type:pdf, Size:1020Kb
High performance motion detection: some trends toward new embedded architectures for vision systems Lionel Lacassagne, Antoine Manzanera, Julien Denoulet, Alain Mérigot To cite this version: Lionel Lacassagne, Antoine Manzanera, Julien Denoulet, Alain Mérigot. High performance motion detection: some trends toward new embedded architectures for vision systems. Journal of Real- Time Image Processing, Springer Verlag, 2009, 4 (2), pp.127-146. 10.1007/s11554-008-0096-7. hal- 01131002 HAL Id: hal-01131002 https://hal.archives-ouvertes.fr/hal-01131002 Submitted on 12 Mar 2015 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Journal of RealTime Image Processing manuscript No. (will be inserted by the editor) Lionel Lacassagne · Antoine Manzanera · Julien Denoulet · Alain Merigot´ High Performance Motion Detection: Some trends toward new embedded architectures for vision systems Received: date / Revised: date Abstract The goal of this article is to compare some opti- Introduction mized implementations on current High Performance Plat- forms in order to highlight architectural trends in the field For more than thirty years, Moore’s law had ruled the perfor- of embedded architectures and to get an estimation of what mance and the development of computers: speed and clock should be the components of a next generation vision sys- frequency were the races to win. This trend slowly drifted tem. We present some implementations of robust motion de- as the processing power of computers reached a seemingly tection algorithms on three architectures: a general purpose stable value. Other constraints (static current consumption, RISC processor - the PowerPC G4 - a parallel artificial retina leakage, less MIPS per gates and less MIPS per Watts) of dedicated to low level image processing - Pvlsar34 - and the current technology - 90 nm and 65 nm - gave researchers Associative Mesh, a specialized architecture based on asso- an impulse to look for innovative directions to improve effi- ciative net. To handle the different aspects and constraints of ciency and performance of their architectures. Current chal- embedded systems, execution time and power consumption lenge is to tackle power consumption to increase systems of these architectures are compared. autonomy. Such technology, like IP core within embedded systems, make the processor frequency adaptable and lead to a finely optimized energy consumption. As image pro- cessing and computer vision are very CPU demanding, we focus on the impact of the architecture for a frequently used class of algorithms: the motion detection algorithms. Keywords Three architectural paradigms are compared: Embedded system, Vision-SoC, RISC, SIMD, SWAR, par- allel architecture, programmable artificial retina, associative – SWAR: SIMD Within A Register: the impact of the SIMD nets model, vision, image processing, motion detection, High multimedia extension inside RISC processors, to enhance Performance Computing. performance; – programmable artificial retina: one elementary processor Lionel Lacassagne per pixel for cellular massively parallel computation and Institut d’Electronique Fondamentale (IEF/AXIS) low power consumption; Universite´ Paris Sud – associative net: impact of reconfigurable graph/net be- E-mail: [email protected] tween processors for local and global computations and Antoine Manzanera also the impact of asynchronous processors on power Laboratoire d’Electronique et d’Informatique consumption. Ecole Nationale Superieure´ de Techniques avancees´ (ENSTA) E-mail: [email protected] We focus on the advantages and limitations of these ar- Julien Denoulet chitectures through a set of benchmarks. We also show how Laboratoire des Instruments et Systemes` d’Ile de France to modify the algorithms to take advantage each architec- (LISIF/SYEL) Universite´ Pierre et Marie Curie ture’s specificities. We provide different performance indexes E-mail: [email protected] like speed, energy required and a “down-clocking” frequency Alain Merigot´ to enforce real-time execution. Such indexes provide insight Institut d’Electronique Fondamentale (IEF/AXIS) on future trends in computer architecture for embedded sys- Universite´ Paris Sud tems. E-mail: [email protected] 2 The paper is organized as follow. The first section in- “frame difference” option is classical and fairly obvious: the troduces a set of motion detection algorithms: Frame Dif- temporal derivative is approximated by a difference between ference, Markovian relaxation, Sigma-Delta algorithm and consecutive frames, whose absolute value is used as a single post-processing morphological operators. The second sec- motion map (observation) Ot(x) = |It(x)−It−1(x)| and the tion presents the three architectures: the PowerPC G4, Pvl- initial value of the motion label Eˆt is obtained by threshold- sar34 (a 200 × 200 Programmable Artificial Retina) and the ing Ot. The “SigmaDelta” option – detailed in Section 1.1 – Associative Mesh (an asynchronous net with SIMD func- is a recent algorithm (29), based on non-linear estimation of tional units). This section also provides details about how temporal statistics of every pixel. algorithms are optimized in regard to the targeted architec- The spatiotemporal regularization part aims at exploiting tures. The third section deals with benchmarking: bench- the correlations between neighboring pixels in the motion mark of the different algorithms in term of speed and in term measures in order to improve the localization of the moving of power consumption. To conclude, a synthesis of two ex- objects. Two main options are considered here: (a) Morpho- tensive benchmarks is provided. logical filtering, detailed in Section 1.2 and (b) Markovian relaxation, detailed in Section 1.3. So, the “Sigma-Delta” can be seen as a pre-processing 1 Motion detection algorithms step for the Markovian regularization or as the main algo- rithm when followed by a morphological post-processing. As the number of places observed by cameras is constantly increasing, a natural trend is to eliminate the human interac- tion within the video monitoring systems and to design fully 1.1 Sigma-Delta Estimation automatic video surveillance devices. Although the relative importance of the low level image processing may vary from The principle of the Σ∆ algorithm is to estimate two pa- one system to the other, the computational weight of the rameters Mt and Vt of the temporal signal It within every low level operators is generally high, because they involve a pixel using Σ∆ modulations. It is composed of four steps: great amount of data. Thus, the ability of video surveillance (1) update the current background image Mt with a Σ∆ fil- systems to detect a relevant event (intrusion, riot, distress...) ter, (2) compute the frame difference between Mt and It, is strongly related to the performance of some crucial image (3) update the time-variance image Vt from the difference Ot ˆ processing functions. using a Σ∆ filter and (4) estimate the initial motion label Et Such fundamental processing step is the motion detec- by comparing the current difference Ot and time-variance tion, whose purpose is to partition the pixels of every frame Vt. of the image sequence into two classes: the background, cor- for each pixel x: responding to pixels belonging to the static scene (label: 0) if Mt(x) < It(x),Mt(x) = Mt−1(x) + 1 and the foreground, corresponding to pixels belonging to a if Mt(x) > It(x),Mt(x) = Mt−1(x) − 1 moving object (label: 1). A motion detection algorithm must otherwise Mt(x) = Mt−1(x) discriminate the moving objects from the background as ac- step1: update Mt curately as possible, without being too sensitive to the sizes for each pixel x: and velocities of the objects, or to the changing conditions of Ot(x) = |Mt(x) − It(x)| the static scene. For long autonomy and discretion purposes, step2: compute Ot the system must not consume too much computational re- sources (energy and circuit area). The motion detection is for each pixel x such that Ot(x) 6= 0: if Vt(x) < N × Ot(x),Vt(x) = Vt−1(x) + 1 usually the most computationally demanding function of a if Vt(x) > N × Ot(x),Vt(x) = Vt−1(x) − 1 video surveillance system. How the algorithm is actually otherwise Vt(x) = Vt−1(x) computed and on which architecture, then become crucial step3: update Vt questions. x Three algorithm/architecture pairs will be considered here. for each pixel : if Ot(x) < Vt(x) In order to compare those very different architectures, we then Eˆt = 0 will consider different versions of motion detection algo- else Eˆt = 1 rithms with similar quality but relying on different compu- step4: estimate Eˆ tational models, some of them being more adapted to one t architecture than the other. Apparently, the only parameter is the amplification fac- The motion detection algorithm can be separated into tor N of the difference (typical values of N are in 2 ... 4). two parts: (1) time-differentiation and (2) spatiotemporal reg- The dimension of N is the number of standard deviation ularization. used in the initialization of the motion label. In fact, the The purpose of the time-differentiation part is to pro- updating frequency, which has the dimension of number of vide, for every pixel x and every time index t: a measure gray level per second, can also be adapted. This is a way of of the temporal variation (the observation), denoted Ot and customizing the Σ∆ estimation to different kinds of motion an initial value of the motion binary label, denoted Eˆt. The and image noise (30). 3 ^ εB (I)(x) = I(x − b) γB = δB ◦ εB The final motion label Et then corresponds to the objects b∈B (connected components) bigger than Bn that appear on two _ δB (I)(x) = I(x + b) ϕB = εB ◦ δB consecutive frames.