Introducing SLAMBench, a performance and accuracy benchmarking methodology for SLAM

Luigi Nardi1, Bruno Bodin2, M. Zeeshan Zia1, John Mawer3, Andy Nisbet3, Paul H. J. Kelly1, Andrew J. Davison1, Mikel Luján3, Michael F. P. O'Boyle2, Graham Riley3, Nigel Topham2 and Steve Furber3

Fig. 1: The SLAMBench framework makes it possible for experts coming from the computer vision, compiler, run-time, and hardware communities to cooperate in a unified way to tackle algorithmic and implementation alternatives.

Abstract— Real-time dense computer vision and SLAM offer great potential for a new level of scene modelling, tracking and real environmental interaction for many types of robot, but their high computational requirements mean that use on mass market embedded platforms is challenging. Meanwhile, trends in low-cost, low-power processing are towards massive parallelism and heterogeneity, making it difficult for robotics and vision researchers to implement their algorithms in a performance-portable way. In this paper we introduce SLAMBench, a publicly-available software framework which represents a starting point for quantitative, comparable and validatable experimental research to investigate trade-offs in performance, accuracy and energy consumption of a dense RGB-D SLAM system. SLAMBench provides a KinectFusion implementation in C++, OpenMP, OpenCL and CUDA, and harnesses the ICL-NUIM dataset of synthetic RGB-D sequences with trajectory and scene ground truth for reliable accuracy comparison of different implementations and algorithms. We present an analysis and breakdown of the constituent algorithmic elements of KinectFusion, and experimentally investigate their execution time on a variety of multicore and GPU-accelerated platforms. For a popular embedded platform, we also present an analysis of energy efficiency for different configuration alternatives.

arXiv:1410.2167v2 [cs.RO] 26 Feb 2015

I. INTRODUCTION

Recent advances in computer vision have led to real-time algorithms for dense 3D scene understanding, such as KinectFusion [19], which estimate the pose of a depth camera while building a highly detailed 3D model of the environment. Such real-time 3D scene understanding capabilities can radically change the way robots interact with and manipulate the world.

While classical point feature-based simultaneous localisation and mapping (SLAM) techniques are now crossing into mainstream products via embedded implementation in projects like Project Tango [3] and Dyson 360 Eye [1], dense SLAM algorithms with their high computational requirements are largely at the prototype stage on PC or laptop platforms. Meanwhile, there has been a great focus in computer vision on developing benchmarks for accuracy comparison, but not on analysing and characterising the envelopes for performance and energy consumption.

The growing interest in real-time applications means that the difficulty of parallelisation affects every application developer. Existing benchmarks are typically general purpose, e.g. the MediaBench suite [16] for evaluation of embedded multimedia and communications, the DaCapo benchmark [7] for sequential client and server Java code, and the SPEC CPU benchmark [14] which stresses the processor, memory subsystem and compiler. Such benchmarks have proved valuable for the evaluation of existing scientific and engineering applications, but they fail to capture essential elements of modern computer vision algorithms and, in particular, SLAM applications. To address these issues, we present SLAMBench [4] (see Fig. 1), which is a portable implementation of KinectFusion with verifiable accuracy relative to a known ground truth. SLAMBench allows computer vision researchers to provide and evaluate alternative algorithmic implementations, but it also enables domain-specific optimisation, auto-tuning and domain-specific languages (DSLs) in dense SLAM. It takes a step forward in improving the way academia and industry evaluate system design and implementation in computer vision. To the best of our knowledge, SLAMBench is the first performance, energy and accuracy benchmark dedicated to 3D scene understanding applications.

The contributions of this paper are:
• We present a SLAM performance benchmark that combines a framework for quantifying quality-of-result with instrumentation of execution time and energy consumption.
• We present a qualitative analysis of the parallel patterns underlying each of the constituent computational elements of the KinectFusion algorithm.
• We present a characterisation of the performance of the KinectFusion implementation using both multicore CPUs and GPUs, on a range of hardware platforms from desktop to embedded.
• We present an analysis of the algorithm's energy consumption under various implementation configurations using an embedded, multicore system on chip (SoC) with GPU acceleration.
• We offer a platform for a broad spectrum of future research in jointly exploring the design space of algorithmic and implementation-level optimisations.

1Dept. Computing, Imperial College, London, UK {luigi.nardi, zeeshan.zia, p.kelly, ajd}@imperial.ac.uk
2Institute for Computing Systems Architecture, The Univ. of Edinburgh, Scotland {bbodin, mob, npt}@inf.ed.ac.uk
3School of Computer Science, Univ. of Manchester, UK {andy.nisbet, john.mawer, mikel.lujan, graham.riley, steve.furber}@manchester.ac.uk

II. RELATED WORK

Computer vision research has traditionally focused on optimising the accuracy of algorithms. In autonomous driving, for example, the KITTI benchmark suite [11] provides data and evaluation criteria for stereo, optical flow, visual odometry and 3D object recognition. The ICL-NUIM dataset [12] and the TUM RGB-D benchmark [23] aim to benchmark the accuracy of visual odometry and SLAM algorithms. However, in an energy and performance constrained context, such as a battery-powered robot, it is important to achieve sufficient accuracy while maximising battery life. New benchmarks are needed which provide tools and techniques to investigate these constraints.

An important early benchmark suite for performance evaluation entirely dedicated to computer vision is SD-VBS [25]. SD-VBS provides single-threaded C and MATLAB implementations of 28 commonly used computer vision kernels that are combined to build 9 high-level vision applications; only some modules relevant to SLAM are included, notably Monte Carlo localisation. SD-VBS [25] prohibits modifications to the algorithm, only allowing the implementation to be tuned to suit novel hardware architectures. This limits the use of the benchmark in the development of novel algorithms.

Another attempt at such performance evaluation is MEVBench [9], which focuses on a set of visual recognition applications including face detection, feature classification, object tracking and feature extraction. It provides single and multithreaded C++ implementations for some of the kernels with a special emphasis on low-power embedded systems. However, MEVBench only focuses on recognition algorithms and does not include a SLAM pipeline.

While such efforts are a step in the right direction, they do not provide the software tools for accuracy verification and exploitation of hardware accelerators or graphics processor units (GPUs). Nor do they enable investigation of energy consumption, performance and accuracy envelopes for 3D scene reconstruction algorithms across a range of hardware targets. The lack of benchmarks stems from the difficulty in systematically comparing the accuracy of the reconstruction while measuring the performance. In this work we focus specifically on SLAM and introduce a publicly-available framework for quantitative, comparable and validatable experimental research in the form of a benchmark for dense 3D scene understanding. A key feature of SLAMBench is that it is designed on top of the recently-proposed ICL-NUIM accuracy benchmark [12], and thus supports wider research in hardware and software. The quantitative evaluation of solution accuracy built into SLAMBench enables algorithmic research to be performed. The latter is an important feature that is lacking in current performance benchmarks. A typical output of SLAMBench consists of the performance achieved and the accuracy of the result, along with the energy consumption (on platforms where such measurement is possible). These parameters capture the potential trade-offs for real-time vision platforms.

III. SLAM AND KINECTFUSION

SLAM systems aim to perform real-time localisation and mapping "simultaneously" for a sensor moving through an unknown environment. SLAM could be part of a mobile robot, enabling the robot to update its estimated position in an environment, or an augmented reality (AR) system where a camera is moving through the environment, allowing the system to render graphics at appropriate locations in the scene (e.g. to illustrate repair procedures in-situ [13] or to animate a walking cartoon character on top of a table). Localisation typically estimates the location and pose of the sensor (e.g. a camera) with regard to a map which is extended as the sensor explores the environment.

The KinectFusion [19] algorithm utilises a depth camera to perform real-time localisation and dense mapping. A single raw depth frame from these devices has both noise and holes. KinectFusion registers and fuses the stream of measured depth frames obtained as the scene is viewed from different viewpoints into a clean 3D geometric map. While it is beyond the scope of this paper to go into the details of the KinectFusion algorithm, we briefly outline the key computational steps involved in the top row of Fig. 2.

KinectFusion normalizes each incoming depth frame and applies a bilateral filter [24] (Preprocess), before computing a point cloud (with normals) for each pixel in the camera frame of reference. Next, KinectFusion estimates (Track) the new 3D pose of the moving camera by registering this point cloud with the current global map using a variant of iterative closest point (ICP) [6]. Once the new camera pose has been estimated, the corresponding depth map is fused into the current 3D reconstruction (Integrate). KinectFusion utilises a voxel grid as the data structure to represent the map, employing a truncated signed distance function (TSDF) [10] to represent 3D surfaces. The 3D surfaces are present at the zero crossings of the TSDF and can be recovered by a raycasting step, which is also useful for visualising the reconstruction. The key advantage of the TSDF representation is that it simplifies fusion of new data into existing data to the calculation of a running average over the current TSDF volume and the new depth image.

Kernel           Pipeline    Pattern         In       Out  %
acquire          Acquire     n/a             pointer  2D   0.03
mm2meters        Preprocess  Gather          2D       2D   0.06
bilateralFilter  Preprocess  Stencil         2D       2D   33.68
halfSample       Track       Stencil         2D       2D   0.05
depth2vertex     Track       Map             2D       2D   0.11
vertex2normal    Track       Stencil         2D       2D   0.27
track            Track       Map/Gather      2D       2D   4.72
reduce           Track       Reduction       2D       6x6  2.99
solve            Track       Sequential      6x6      6x1  0.02
integrate        Integrate   Map/Gather      2D/3D    3D   12.85
raycast          Raycast     Search/Stencil  2D/3D    2D   35.87
renderDepth      Rendering   Map             2D       2D   0.12
renderTrack      Rendering   Map             2D       2D   0.06
renderVolume     Rendering   Search/Stencil  3D       2D   9.18

TABLE I: SLAMBench kernels. Pipeline refers to the SLAM high-level building blocks of the top row of Fig. 2. Pattern describes the parallel computation pattern (see Sec. IV-C) of a kernel. % gives the percentage of the time spent in the kernel while running sequentially on the Exynos 5250 processor (see Table II). For the breakdown on other platforms see Fig. 5.
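The TSDF fusion described in Sec. III reduces, per voxel, to a weighted running average of truncated distance samples. The sketch below is illustrative only: the `Voxel` and `fuse` names are ours, not KFusion's actual API, and projection of voxels into the depth frame is omitted.

```cpp
#include <algorithm>

// One voxel of the truncated signed distance function (TSDF) volume:
// a distance value plus a weight counting how many samples were fused.
struct Voxel {
    float tsdf;   // truncated signed distance to the nearest surface
    float weight; // number of samples fused so far (capped)
};

// Fuse a new truncated distance sample `d` into a voxel as a running
// average, as the integrate step does for every voxel that projects into
// the current depth frame. Capping the weight keeps the map able to
// adapt when the scene changes.
inline void fuse(Voxel& v, float d, float max_weight = 100.0f) {
    v.tsdf = (v.tsdf * v.weight + d) / (v.weight + 1.0f);
    v.weight = std::min(v.weight + 1.0f, max_weight);
}
```

Because each voxel is updated independently, this is exactly the kind of per-element update that parallelises well on a GPU.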


Fig. 2: SLAMBench framework.

KinectFusion has been adopted as a major building block in more recent SLAM systems [27], [15], [28], [8], [20] and is part of Microsoft's Kinect SDK [2]. A number of open implementations of the KinectFusion algorithm have been made in recent years, including KFusion [21] (utilised in this paper) and KinFu [22].

IV. SLAMBENCH

SLAMBench, see Fig. 2, is a benchmark that provides portable, but untuned, KinectFusion [19] implementations in C++ (sequential), OpenMP, CUDA and OpenCL for a range of target platforms. As an example, the ODROID board contains two GPU devices, see Sec. V-A, but the current OpenCL implementation only exploits one GPU device. SLAMBench includes techniques and tools to validate an implementation's accuracy using the ICL-NUIM dataset, along with algorithmic modifications to explore accuracy and performance trade-offs.

A. ICL-NUIM DATASET

ICL-NUIM [12] is a high-quality synthetic dataset providing RGB-D sequences for 4 different camera trajectories through a living room model. An absolute trajectory error (ATE) is calculated as the difference between the ground truth and the estimated trajectory of a camera produced by a SLAM implementation; this enables system accuracy to be measured at each frame.

Fig. 1 shows the input scene from the ICL-NUIM dataset and the reconstruction performed using SLAMBench. ICL-NUIM can provide frame sequences that: (1) are free of noise, and (2) have noise added to RGB-D frames according to a statistical model matching Microsoft's Kinect sensor. The noise model is applied to each frame used as input to a SLAM system in order to give realistic results.

The ICL-NUIM benchmark provides not only a number of realistic pre-rendered sequences, but also open source code that can be used by researchers in order to generate their own test data as required. As the unmodified ICL-NUIM dataset is the input of the SLAMBench benchmark, any user can run SLAMBench in a straightforward way on other datasets.

B. SLAMBENCH KERNELS

Table I summarises the 14 main computationally significant computer vision kernels contained in SLAMBench. The following is a description of each kernel:
1) acquire: acquires a new RGB-D frame. This input step is included explicitly in order to account for I/O costs during benchmarking, and for real applications.
2) mm2meters: transforms a 2D depth image from millimeters to meters. If the input image size is not the standard 640x480, only a part of the image is converted and mapped into the output.
3) bilateralFilter: is an edge-preserving blurring filter applied to the depth image. It reduces the effects of noise and invalid depth values.
4) halfSample: is used to create a three-level image pyramid by sub-sampling the filtered depth image. Tracking solutions from low resolution images in the pyramid are used as guesses at higher resolutions.
5) depth2vertex: transforms each pixel of a new depth image into a 3D point (vertex). As a result, this kernel generates a point cloud.
6) vertex2normal: computes the normal vectors for each vertex of a point cloud. Normals are used in the projective data association step of the ICP algorithm to calculate the point-plane distances between two corresponding vertices of the synthetic point cloud and a new point cloud.
7) track: establishes correspondence between vertices in the synthetic and new point cloud.
8) reduce: adds up all the distances (errors) of corresponding vertices of two point clouds for the minimisation process. On GPUs, the final sum is obtained using a parallel tree-based reduction.
9) solve: uses TooN [5] to perform a singular value decomposition on the CPU that solves a 6x6 linear system. A 6-dimensional vector is produced to correct the new estimate of the camera pose.
10) integrate: integrates the new point cloud into the 3D volume. It computes the running average used in the fusion described in Sec. III.
11) raycast: computes the point cloud and normals corresponding to the current estimate of the camera position.
12) renderDepth: visualises the depth map acquired from the sensor using a colour coding.
13) renderTrack: visualises the result of the tracking. For each pixel different colours are associated with one of the possible outcomes of the tracking pass, e.g. 'correct tracking', 'pixel too far away', 'wrong normal', etc.
14) renderVolume: visualises the 3D reconstruction from a fixed viewpoint (or a user-specified viewpoint when in the GUI mode).

All kernels are implemented on GPUs, in OpenCL and CUDA, except acquire and solve.

In addition to these kernels, the SLAMBench pipeline also contains two initialisation kernels not shown in Table I and Fig. 2: generateGaussian and initVolume. These kernels are part of an initialisation step that is performed only at start-up. generateGaussian generates a Gaussian bell curve and stores it in a 1D array; initVolume initialises the 3D volume.

C. SLAMBENCH PARALLEL PATTERNS

The parallel patterns of Table I are defined following definitions from [18] and shown in Fig. 3. Since the kernels represent complex real-world SLAM computations, the computation of a single kernel may fit multiple parallel patterns, as depicted in Table I.
• Map (see Fig. 3(a)): a pure function operates on every element of an input array and produces one result per element. Threads do not need to synchronise.
• Reduction (see Fig. 3(b)): a reduction function combines all the elements of an input array to generate a single output. In this case, tree-based implementations can be used to parallelise such a reduction.
• Stencil (see Fig. 3(c)): each output element is computed by applying a function on its corresponding input array element and its neighbors.
• Gather (see Fig. 3(d)): is similar to map but its memory accesses are random. Parallel gather implementations are similar to those of map.
• Search (see Fig. 3(e)): is similar to gather, but it retrieves data from an array based on matching against content. Parallelism is achieved by searching for all keys in parallel, or by searching in different parts of the array in parallel.

D. PERFORMANCE EVALUATION METHODOLOGY

A methodology for SLAM performance evaluation is necessary for three reasons. First, since we are dealing with numerical approximations and iterative algorithms, a systematic way to check the accuracy of the results on different platforms is required. For example, in the pose estimation, KFusion assumes small motion, through a small angle approximation, from one frame to the next in order to achieve fast projective data association [6]. Also, the rigid body transform in the camera pose estimation is computed by testing whether each new iteration carries new useful information. Finally, the error between point clouds is computed using a reduction of l2-norms, whose result may depend on the order in which summation takes place.

The second issue requiring careful evaluation methodology is that we are dealing with an application that may be used in different contexts, such as mobile robotics and AR. Each application places different stress on the SLAM pipeline.

Thirdly, the KinectFusion computation is dependent on the images acquired, the way the camera is moved and the tracking frame rate. A methodology is required to enable a SLAMBench user to avoid divergent behaviours across different platforms where the target system cannot process all frames in real-time. We use a dataset of pre-recorded scenes and the Process-every-frame mode described below.

The SLAMBench methodology consists of four elements:
1) GUI vs terminal interface: SLAMBench includes raycasting for visualisation but we have not included the display of the image. Rendering using OpenGL lies outside the scope of this work.
2) Pre-recorded scenes: the KinectFusion computation depends upon the scene and camera trajectory. In order to address the variability that comes with the images and the camera movement in the benchmark space we use the pre-recorded scenes described in Sec. IV-A, thus ensuring reproducibility. It is important to notice that using pre-recorded scenes can impact the performance of the application, since in this case the acquire is negligible compared to other kernels, as shown in Sec. V-B. But in a real-life context the input device induces unavoidable latency, which makes it important to take this into account in the performance analysis.
3) Frame rate: in a real-time application context, the frame rate of the application depends on the computational speed of the platform used. If the application is unable to process frames in real-time then input frames are dropped. To ensure consistent evaluation of accuracy, energy consumption and performance it is necessary to guarantee that the compared applications process the same set of frames. For this reason we introduce a Process-every-frame mode, where the application acquires the next pre-recorded frame (without considering timestamps) from a dataset file only when the previous frame has been processed. The frames are enumerated and scheduled in the same order on all platforms. This ensures that all simulations process exactly the same frames, thus rendering the computation independent of the frequency of the camera and the performance of the platform.
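The map and reduction patterns of Sec. IV-C can be illustrated in plain C++. The tree-shaped loop below mirrors how a GPU reduce kernel halves the number of partial sums on each pass; this is a sketch of the patterns, not the SLAMBench implementation, and the function names are ours.

```cpp
#include <cstddef>
#include <vector>

// Map: a pure function applied independently to every element.
// No synchronisation between elements is needed, so each output
// could be computed by a separate thread.
std::vector<float> map_square(const std::vector<float>& in) {
    std::vector<float> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) out[i] = in[i] * in[i];
    return out;
}

// Reduction: combine all elements into a single value. The tree order
// below is the shape a parallel reduction uses: every pass halves the
// number of partial sums, giving O(log n) parallel steps.
float reduce_sum_tree(std::vector<float> a) {
    for (std::size_t n = a.size(); n > 1; n = (n + 1) / 2) {
        for (std::size_t i = 0; i < n / 2; ++i) a[i] = a[2 * i] + a[2 * i + 1];
        if (n % 2) a[n / 2] = a[n - 1];  // carry an odd leftover element forward
    }
    return a.empty() ? 0.0f : a[0];
}
```

Stencil and gather follow the same per-element shape as map, differing only in which input elements each output reads (fixed neighbours vs. arbitrary indices).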


Fig. 3: Parallel patterns: (a) Map, (b) Reduction, (c) Stencil, (d) Gather and (e) Search.

4) Accuracy evaluation: SLAMBench provides an automatic tool that performs testing against the ground truth from ICL-NUIM. The ATE, mean, median, standard deviation, min and max error statistics (as described in Sec. IV-A) are computed using the synthetic trajectories given by ICL-NUIM. This allows us to ensure that versions of the implementation using different languages and platforms are sufficiently accurate, whilst not expecting bit-wise output equivalence.

We note SLAMBench currently only exploits the trajectory ground truth from ICL-NUIM, and not the 3D reconstruction it also offers. This is because, as described by Handa et al. [12], interactively driven computer vision tools are required in order to perform reconstruction comparison. However, localisation and mapping are strongly correlated, which means that the ATE serves as a good indication of whether the reconstruction is accurate enough.

V. EXPERIMENTAL EVALUATION

A. DEVICES

For our experiments we used five platforms; Table II summarises their characteristics. For the sake of brevity we refer to these machines as TITAN, GTX870M, TK1, ODROID and Arndale. SLAMBench is targeted at processors from the desktop down to low-power arenas, e.g. processors used in smart-phones and tablets. These embedded devices are gaining in popularity and acquiring more capable cameras and mobile processors. We carefully select the platforms to be representative of the actual devices used by consumers and industry in the different desktop, mobile and embedded categories.

Comparison between different GPU architectures is a non-trivial task; this is not helped by the wide range of vendor-specific languages used, commercial sensitivity and varying architectures. NVIDIA characterise their devices by the number of "CUDA cores". Thus the TK1 has 192 CUDA cores while ARM's GPUs have 4-8 more powerful cores.

In the ARM embedded devices space we run experiments on the Hardkernel ODROID-XU3 based on the Samsung Exynos 5422. The Exynos 5422 implements ARM's big.LITTLE heterogeneous multiprocessing (HMP) solution with four Cortex-A15 "big" out-of-order processors, four "LITTLE" in-order processors and a Mali-T628-MP6 GPU; this GPU essentially consists of two separate devices, one consisting of 4 cores and the other 2. In our experiments we only use the 4-core GPU device. The ODROID platform has integrated power monitors which will be explored in Sec. V-C. At the ARM end we also perform experiments on the dual-core Arndale board based on a Samsung Exynos 5250 and Mali-T604-MP4 GPU.

Jetson TK1 is NVIDIA's embedded Linux development platform featuring a Tegra K1 SoC. It is provided with a quad-core ARM Cortex-A15 (NVIDIA 4-Plus-1) in the same SoC with an NVIDIA GPU composed of 192 CUDA cores.

At the desktop end of the spectrum we selected a 4-core Intel Haswell i7-4770K associated with an NVIDIA GTX TITAN GPU containing 2688 CUDA cores. This machine is a high-end desktop with a power envelope up to 400 W, although the actual power consumption is, in our experience, considerably less. Including such a powerful machine in the testbed is interesting in order to explore the limits of the application when many parallel compute units are available.

In order to be as representative as possible in the cross-platform evaluation, we also include a high-end laptop. The laptop features an Intel Haswell mobile processor i7-4700MQ together with an NVIDIA GTX 870M mobile discrete GPU. This GPU contains 1344 CUDA cores, making it more suitable than the GTX TITAN for high-end laptop usage.

B. PERFORMANCE AND ACCURACY RESULTS

Default SLAMBench parameters are used for the experiments as in [21]: the integration rate is set at 2 frames, the voxel volume is 256x256x256, the volume size is 4.8x4.8x4.8 m3, the rendering rate is set at 4 frames and the input image is halved during the preprocessing, i.e. 320x240. All the experiments are run on the living room trajectory 2 of the ICL-NUIM dataset. Table III demonstrates that a consistent maximum ATE of between 2.01 cm, on the ODROID OpenCL implementation, and 2.07 cm, on other platforms/implementations, was computed over all runs. These errors are comparable with [12]. The reduce kernel gives rise to small numerical differences as it performs non-associative summations leading to different rounding errors.

Fig. 4 shows the performance achieved across different platforms for all available languages. The FPS achieved is reported on top of each histogram. The y axis represents the time in seconds to process one frame. The TITAN desktop achieves the highest frame rate, running at 135 FPS. Using its on-die GPU, the embedded TK1 platform almost reaches real-time with ca. 22 FPS, while the ODROID and Arndale boards achieve an average frame rate of 5.46 and 4.21 FPS respectively. Using the CPU cores only, via OpenMP, the 8-core ODROID platform achieves 2.76 FPS while the Arndale and TK1 reach 0.8 and 1.78 FPS respectively.

Machine names       TITAN            GTX870M          TK1               ODROID (XU3)                    Arndale
Machine type        Desktop          Laptop           Embedded          Embedded                        Embedded
CPU                 i7 Haswell       i7 Haswell       NVIDIA 4-Plus-1   Exynos 5422                     Exynos 5250
CPU cores           4                4                4 (Cortex-A15) + 1  4 (Cortex-A15) + 4 (Cortex-A7)  2 (Cortex-A15)
CPU GHz             3.5              2.4              2.3               1.8                             1.7
GPU                 NVIDIA TITAN     NVIDIA GTX 870M  NVIDIA Tegra K1   ARM Mali-T628-MP6               ARM Mali-T604-MP4
GPU architecture    Kepler           Kepler           Kepler            Midgard 2nd gen.                Midgard 1st gen.
GPU FPU32s          2688             1344             192               60                              40
GPU MHz             837              941              852               600                             533
GPU GFLOPS (SP)     4500             2520             330               60+30 (72+36)                   60 (71)
Language            CUDA/OpenCL/C++  CUDA/OpenCL/C++  CUDA/C++          OpenCL/C++                      OpenCL/C++
OpenCL version      1.1              1.1              n/a               1.1                             1.1
Toolkit version     CUDA 5.5         CUDA 5.5         CUDA 6.0          Mali SDK 1.1                    Mali SDK 1.1
Ubuntu OS (kernel)  13.04 (3.8.0)    14.04 (3.13.0)   14.04 (3.10.24)   14.04 (3.10.53)                 12.04 (3.11.0)

TABLE II: Devices used for our experiments.
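The small ATE differences reported for the reduce kernel in Sec. V-B come from the non-associativity of floating-point addition: summing the same values in a different order can round differently. A minimal, self-contained demonstration (not SLAMBench code; function names are ours):

```cpp
#include <vector>

// Left-to-right summation, as a sequential C++ reduce would do it.
float sum_sequential(const std::vector<float>& a) {
    float s = 0.0f;
    for (float x : a) s += x;
    return s;
}

// Pairwise (tree) summation over four elements, the order a two-level
// GPU reduction tree would use. Both are valid sums of the same numbers,
// but float addition is not associative, so the rounding can differ.
float sum_pairwise4(const std::vector<float>& a) {
    return (a[0] + a[1]) + (a[2] + a[3]);
}
```

With the values {1e8, 1, -1e8, 1}, left-to-right summation yields 1.0f while pairwise summation yields 0.0f, because 1e8f + 1.0f rounds back to 1e8f. This is why SLAMBench checks accuracy thresholds rather than expecting bit-wise equivalence across platforms.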

Platform   C++    OMP    OCL    CUDA
TITAN      2.07   2.07   2.07   2.07
GTX870M    2.07   2.07   2.07   2.07
TK1        2.06   2.06   n/a    2.07
ODROID     2.06   2.06   2.01   n/a
Arndale    2.06   2.06   2.07   n/a

TABLE III: Absolute trajectory error (ATE) in centimeters.
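The ATE values of Table III are built from a per-frame error and its summary statistics (Sec. IV-D.4). A sketch of the computation follows; the names are ours, and the trajectory alignment performed by the full evaluation tooling is omitted for brevity:

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

struct Vec3 { float x, y, z; };

// Per-frame absolute trajectory error: Euclidean distance between the
// ground-truth and estimated camera positions for the same frame.
float ate(const Vec3& gt, const Vec3& est) {
    float dx = gt.x - est.x, dy = gt.y - est.y, dz = gt.z - est.z;
    return std::sqrt(dx * dx + dy * dy + dz * dz);
}

// Summary statistics over the per-frame errors, as reported by the
// accuracy tool: mean, min and max (median and standard deviation
// follow the same single-pass/sort pattern).
struct AteStats { float mean, min, max; };

AteStats summarise(const std::vector<float>& errors) {
    AteStats s{0.0f, errors.front(), errors.front()};
    for (float e : errors) {
        s.mean += e;
        s.min = std::min(s.min, e);
        s.max = std::max(s.max, e);
    }
    s.mean /= static_cast<float>(errors.size());
    return s;
}
```

A maximum ATE such as the 2.01-2.07 cm of Table III then corresponds to `summarise(...).max` over all frames of a run.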

Fig. 4: High-level SLAM building block elapsed times (stacked: rendering, integrate, track, preprocess, acquire) across TITAN, GTX870M, TK1, ODROID and Arndale. FPS is reported on top of each histogram.
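The per-frame times and FPS figures of Fig. 4 are related simply: FPS is the number of frames divided by the total elapsed wall-clock time. A minimal timing harness in this style (the helper names are ours, not SLAMBench's API) might look like:

```cpp
#include <chrono>
#include <vector>

// Wall-clock time of one pipeline stage, in seconds. Each high-level
// building block (preprocess, track, integrate, raycast, rendering)
// can be wrapped this way to produce the stacked bars of Fig. 4.
template <typename F>
double time_stage(F&& stage) {
    auto t0 = std::chrono::steady_clock::now();
    stage();
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count();
}

// Mean frames-per-second over a run, given per-frame elapsed times
// (the figure printed on top of each histogram).
double fps(const std::vector<double>& frame_seconds) {
    double total = 0.0;
    for (double t : frame_seconds) total += t;
    return static_cast<double>(frame_seconds.size()) / total;
}
```

Using a monotonic clock (`steady_clock`) rather than the system clock avoids distortions from clock adjustments during long runs.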

Despite our efforts to make the OpenCL and CUDA versions as similar as possible, we observe a 10% difference of performance in favour of CUDA. An unexpected result Fig. 5: Percentage of time spent in each kernel. is the difference in the proportion of time spent in each kernel between OpenCL and CUDA in the desktop and laptop histograms, see next paragraph. We are currently track kernels scale badly, whereas the integrate is average. investigating why such a wide difference exists and why We can see this effect on all the platforms. The proportions renderDepth and renderTrack scale so badly in CUDA. stay roughly unchanged on the 2-core Arndale because of Fig. 5 shows the percentage of time spent in the limited number of cores. each kernel and how this changes over different plat- There is a correlation between Table I and the qualita- forms/implementations. Ideally, an application containing tive scaling factors observed in Fig. 5. The reasons why kernels that scale perfectly will have the same proportions on bilateralFilter scales well are threefold: first it operates all the histograms of a given platform. If a kernel scales well, a stencil pattern; second it loads/stores 2D images which i.e. better than other kernels in the average, then it would implies that data can fit into caches exploiting data locality; reduce its proportion on a histogram that contains more third the percentage of the program taken by this kernel is computational units for a given platform. As an example, if not negligible which means that this kernel has room for we consider the 8-core ODROID and we compare the C++ speedup. The reduce kernel does not scale linearly because and OpenMP histograms we can see that the bilateralFilter it follows a reduce pattern; despite tree-based reduction, this and the raycast scales well. In contrast the reduce and the kind of pattern suffers on highly parallel devices as can be observed in the CUDA and OpenCL histograms. 
Since 4 1.23 GPU integrate manipulates 3D data it takes a performance hit 3.5 DRAM A7 3 on platforms with reduced data cache size, i.e. embedded 0.50 A15 devices, while scaling nicely on TITAN and GTX780M. The 2.5 raycast bar is approximately constant across the histograms. 2 This kernel also manipulates 3D data and, as a consequence, 2.33

Energy (J) 1.5 has high memory bandwidth demand and has intensive 1 0.86 computation due to the stencil pattern it operates. The high 0.18 number of load/store operations is balanced by the high 0.5 0.18 computation needed. 0 A15 A7 A15+A7 A7 GPU+A15+A7GPU+A7 We can also notice that small kernels, which are not visible in the C++ histograms, if they scale badly, will appear in histograms where massive parallel resources are deployed. C++ OpenMP OpenCL An example is the CPU-only solve kernel. Fig. 6: Energy usage per frame utilising various hardware resources on the ODROID, mean time per frame marked C. ENERGY CONSUMPTION ANALYSIS: ODROID-XU3 CASE STUDY Anecdotally it is sometimes suggested that GPU based over a single core C++ implementation, assuming a race applications are power-hungry, indeed in many systems they to sleep strategy is employed whereby data is processed as do draw a significant amount of the system power, however fast we can then we switch a low power mode, indicating the GPU may also carry a disproportionate amount of the potential energy savings are achievable in SLAM applica- workload. We use the ODROID device introduced in Sec. V- tions by using parallel implementations. There is a significant A to evaluate the energy efficiency of its various processing energy usage for the Cortex-A15 even when it has been elements in the context of a SLAM application. The platform configured as offline, this would point to the operating system has on-board voltage/current sensors and split power rails, not forcing the cores to be powered down and indicating a which allows the power consumption of the ”big” Cortex- greater reduction in energy consumption may be achievable A15 cores, the ”LITTLE” Cortex-A7 cores, GPU and DRAM when they are not in use. to be measured individually. 
The sensors can be read by the SLAMBench application to provide measurements at various points of execution, which can be logged or plotted interactively in the GUI. Energy consumption is only explored on the ODROID because consistent power analysis tools are not present on all devices.

We use SLAMBench along with the platform's energy monitors to measure energy/power consumption and performance using the C++, OpenMP and OpenCL implementations: the C++ tests use either the Cortex-A15 or the Cortex-A7; the OpenMP tests use all eight cores or just the Cortex-A7; and the OpenCL tests use the larger GPU device together with either a Cortex-A15 or a Cortex-A7. These tests are conducted in process-every-frame mode, and consequently show the relative energy efficiency of the various implementation/resource combinations for the same amount of computation.

Currently, the ODROID's Linux kernel only supports a performance driver which prefers to shift threads onto the Cortex-A15 cores to deliver high performance. In order to prevent this, and to force the use of the Cortex-A7 cores, we set the Cortex-A15 cores offline where appropriate.

Fig. 6 clearly demonstrates that the OpenCL/GPU implementation uses significantly less energy than approaches that rely solely on the conventional CPU cores. Furthermore, we can see that for OpenCL the performance benefit of using the Cortex-A15 over the Cortex-A7 is negligible, but the energy consumed is 50% higher (507.28 compared with 337 Joules); there is consequently no benefit in this application to using the high-performance CPU. The OpenMP implementation on the Cortex-A7 also gives an improvement in energy over a single-core C++ implementation, assuming a race-to-sleep strategy is employed, whereby data is processed as fast as possible and the system then switches to a low-power mode; this indicates that potential energy savings are achievable in SLAM applications by using parallel implementations. There is significant energy usage on the Cortex-A15 rail even when the cores have been configured as offline; this points to the operating system not forcing the cores to be powered down, and indicates that a greater reduction in energy consumption may be achievable when they are not in use.

A SLAM system will fail if the inter-frame movement of the camera is too great, as it will fail to track the camera pose. We can see that on the ODROID we would typically drop a significant number of frames using our default settings of voxel size, image size, integration rate and rendering rate. SLAMBench allows the user to run at the rate of the system, dropping frames where appropriate. The user can then explore the design space to achieve an optimal ATE. For example, on an embedded system they may choose to create a coarser model, not render the model as it is created (this is typical for a robotics application), or update the model less often, all of which will typically deliver a substantial reduction in dropped frames and, within limits, in the ATE.

Parallel approaches offer a significant energy and performance improvement over single-threaded approaches. The Mali GPU provides significantly improved energy efficiency compared with the conventional cores for this SLAM application, but care must be taken to ensure that CPU threads are scheduled on the most appropriate conventional cores, and that unused cores are put into an appropriate low-power state.

VI. CONCLUSIONS AND FUTURE WORK

SLAMBench is the first SLAM benchmark which is concerned not just with accuracy but also with computational performance and energy consumption. We show how SLAMBench allows a user to perform a fair comparison of hardware accelerators, software tools and novel algorithms in a modern dense SLAM framework; our facility for automatically comparing estimated trajectories against ground truth ensures that estimation accuracy is maintained and kept separate from performance comparison. Each SLAMBench kernel is characterised by its parallel pattern and the weight it carries in the SLAM computation; swapping module implementations is relatively straightforward, so that certain parts of an algorithm can be isolated and improved.
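The module-swapping idea can be pictured as a registry mapping each kernel to one of several interchangeable implementations. This sketch is purely illustrative; SLAMBench's real mechanism is selection among its C++, OpenMP, OpenCL and CUDA variants, and the kernel and variant names here are only examples:

```python
# Illustrative kernel registry: each kernel name maps to a set of
# interchangeable implementation variants.
KERNELS = {}

def register(kernel, variant):
    """Decorator: record fn as the `variant` implementation of `kernel`."""
    def wrap(fn):
        KERNELS.setdefault(kernel, {})[variant] = fn
        return fn
    return wrap

@register("bilateralFilter", "reference")
def bilateral_reference(image):
    # stand-in for a scalar C++-style implementation
    return ("reference", image)

@register("bilateralFilter", "fast")
def bilateral_fast(image):
    # stand-in for an optimised (e.g. OpenCL) implementation
    return ("fast", image)

def run(kernel, variant, *args):
    """Dispatch to the selected implementation of a kernel."""
    return KERNELS[kernel][variant](*args)
```

With this shape, swapping the implementation under study is a one-word change at the call site, leaving the rest of the pipeline untouched.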
We present performance results on state-of-the-art desktop, laptop and embedded devices. Four configurations achieved super real-time FPS in the desktop and laptop space, with a peak performance of 135 FPS. The Tegra K1 achieves 22 FPS, which is impressive for an embedded device, though this board is not optimised for energy usage, which may restrict its applicability in robotics. In contrast, the ODROID-XU3 achieves 5.5 FPS at 2.1 Watts average dissipation. All of these performance figures leverage the efficiency of GPUs, showing how effective heterogeneous computing can be for SLAM on a wide range of modern devices.

We have also presented the first SLAM energy consumption evaluation on the ODROID-XU3 embedded platform, which is suitable for robotics, analysing the performance/power trade-off at a fixed accuracy. The Cortex-A7/Mali combination provides the same FPS as the Cortex-A15/Mali with a 50% lower energy envelope. This research paves the way for systematic power dissipation evaluation using SLAMBench. Future work will include the addition of performance-counter-based energy monitoring, on devices that provide hardware counter APIs, through the inclusion of PAPI within the framework as outlined in [26].

SLAMBench facilitates design-space exploration. A number of algorithm parameters, such as the voxel density or the TSDF truncation distance µ, are available in the SLAMBench user interface. Exploring how these parameters improve the SLAM application while simultaneously changing compiler optimisation parameters, such as the OpenCL work-group size or thread-coarsening [17], may have a high impact on performance/power/accuracy metrics.
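Such an exploration can be organised as a simple grid over the joint algorithmic and compiler parameter space. The axes and values below are hypothetical illustrations, and each configuration would be fed to a benchmark run and scored on FPS, energy and ATE:

```python
import itertools

# Hypothetical design-space axes: algorithmic parameters (volume
# resolution, TSDF truncation distance mu) crossed with a
# compiler/runtime parameter (OpenCL work-group size).
SPACE = {
    "volume_resolution": [128, 256, 512],   # voxels per side
    "mu": [0.05, 0.1, 0.2],                 # TSDF truncation (metres)
    "workgroup_size": [32, 64, 128],        # OpenCL work-group size
}

def configurations(space):
    """Yield every point of the Cartesian product as a parameter dict."""
    keys = sorted(space)
    for values in itertools.product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))

configs = list(configurations(SPACE))
# 3 * 3 * 3 = 27 candidate configurations to benchmark and rank.
```

Exhaustive grids grow quickly, so in practice a search of this space would be pruned or sampled, but the enumeration above is the starting point.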
Finally, SLAMBench provides the necessary foundation for investigating domain-specific optimisations. We plan to build our own DSL, targeting high-performance, low-power SLAM applications, and believe that it will inspire similar efforts from the relevant communities. SLAMBench is intended to develop over time and is expected to include a larger set of SLAM kernels in later versions. A key step would be to integrate kernels that allow KFusion to scale better with the size of the scene to be reconstructed, for example by adding alternative and complementary data structures such as point-based fusion [15], octrees [28], voxel hashing-based representations [20] and moving volumes, as used in Kintinuous [27].
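Of these alternatives, voxel hashing can be sketched in a few lines: a sparse map from voxel-block coordinates to dense blocks, allocated on demand, so memory grows with the observed surface rather than with the bounding volume. This is a toy illustration of the idea, not the implementation of [20]:

```python
BLOCK = 8  # 8x8x8 voxel blocks, allocated on demand

class HashedTSDF:
    """Toy sparse TSDF volume: hash map from block coords to dense blocks."""

    def __init__(self):
        self.blocks = {}  # (bx, by, bz) -> flat list of (tsdf, weight)

    def _locate(self, x, y, z):
        key = (x // BLOCK, y // BLOCK, z // BLOCK)
        if key not in self.blocks:
            # initialise every voxel to (1.0, 0.0): free space, zero weight
            self.blocks[key] = [(1.0, 0.0)] * BLOCK ** 3
        idx = (x % BLOCK) * BLOCK * BLOCK + (y % BLOCK) * BLOCK + z % BLOCK
        return key, idx

    def integrate(self, x, y, z, sdf, w=1.0):
        """Weighted running-average TSDF update, as in KinectFusion."""
        key, idx = self._locate(x, y, z)
        d, wt = self.blocks[key][idx]
        self.blocks[key][idx] = ((d * wt + sdf * w) / (wt + w), wt + w)

    def lookup(self, x, y, z):
        key, idx = self._locate(x, y, z)
        return self.blocks[key][idx][0]

# Only touched blocks consume memory:
vol = HashedTSDF()
vol.integrate(100, 200, 300, 0.5)  # allocates exactly one 8^3 block
```

The same interface shape would accommodate an octree or moving-volume backend, which is what makes these structures complementary candidates for KFusion.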
ACKNOWLEDGMENTS

We acknowledge funding by the EPSRC grant PAMELA EP/K008730/1. M. Luján is funded by a Royal Society University Research Fellowship. We thank G. Reitmayr for the original KFusion implementation; G. S. Shenoy, A. Handa, B. Franke, D. Ham, and the PAMELA Steering Group for discussions.

REFERENCES

[1] Dyson 360 Eye web site. https://www.dyson360eye.com.
[2] Microsoft Kinect SDK programming guide. http://msdn.microsoft.com/en-us/library/dn188670.aspx.
[3] Project Tango web site. https://www.google.com/atap/projecttango.
[4] SLAMBench web site. http://apt.cs.manchester.ac.uk/projects/PAMELA/tools/SLAMBench.
[5] TooN web site. http://www.edwardrosten.com/cvd/toon/html-user/.
[6] P. J. Besl and N. D. McKay. Method for registration of 3-D shapes. In Robotics-DL tentative. Int. Society for Optics and Photonics, 1992.
[7] S. M. Blackburn, R. Garner, C. Hoffmann, A. M. Khang, K. S. McKinley, R. Bentzur, A. Diwan, D. Feinberg, D. Frampton, S. Z. Guyer, et al. The DaCapo benchmarks: Java benchmarking development and analysis. In ACM Sigplan Notices, 2006.
[8] J. Chen, D. Bautembach, and S. Izadi. Scalable real-time volumetric surface reconstruction. ACM TOG, 2013.
[9] J. Clemons, H. Zhu, S. Savarese, and T. Austin. MEVBench: A mobile computer vision benchmarking suite. In IISWC, 2011.
[10] B. Curless and M. Levoy. A volumetric method for building complex models from range images. In Proc. Computer Graphics and Interactive Techniques. ACM, 1996.
[11] A. Geiger, P. Lenz, and R. Urtasun. Are we ready for autonomous driving? The KITTI vision benchmark suite. In CVPR, 2012.
[12] A. Handa, T. Whelan, J. McDonald, and A. Davison. A benchmark for RGB-D visual odometry, 3D reconstruction and SLAM. In ICRA, 2014.
[13] S. J. Henderson and S. K. Feiner. Augmented reality in the psychomotor phase of a procedural task. In ISMAR, 2011.
[14] J. L. Henning. SPEC CPU2006 benchmark descriptions. ACM SIGARCH Computer Architecture News, 2006.
[15] M. Keller, D. Lefloch, M. Lambers, S. Izadi, T. Weyrich, and A. Kolb. Real-time 3D reconstruction in dynamic scenes using point-based fusion. In 3DTV-Conference. IEEE, 2013.
[16] C. Lee, M. Potkonjak, and W. H. Mangione-Smith. MediaBench: a tool for evaluating and synthesizing multimedia and communications systems. In Proc. ACM/IEEE Int. Symp. on Microarchitecture, 1997.
[17] A. Magni, C. Dubach, and M. F. O'Boyle. A large-scale cross-architecture evaluation of thread-coarsening. In Proc. of SC13: Int. Conf. for High Performance Computing, Networking, Storage and Analysis. ACM, 2013.
[18] M. D. McCool. Structured parallel programming with deterministic patterns. In Proc. of the 2nd USENIX Conf. on Hot Topics in Parallelism, 2010.
[19] R. A. Newcombe, S. Izadi, O. Hilliges, D. Molyneaux, D. Kim, A. J. Davison, P. Kohli, J. Shotton, S. Hodges, and A. Fitzgibbon. KinectFusion: Real-time dense surface mapping and tracking. In ISMAR, 2011.
[20] M. Nießner, M. Zollhöfer, S. Izadi, and M. Stamminger. Real-time 3D reconstruction at scale using voxel hashing. ACM TOG, 2013.
[21] G. Reitmayr and H. Seichter. KFusion github. https://github.com/GerhardR/kfusion, 2011.
[22] R. B. Rusu and S. Cousins. 3D is here: Point Cloud Library (PCL). In ICRA, 2011.
[23] J. Sturm, N. Engelhard, F. Endres, W. Burgard, and D. Cremers. A benchmark for the evaluation of RGB-D SLAM systems. In IROS, 2012.
[24] C. Tomasi and R. Manduchi. Bilateral filtering for gray and color images. In Sixth Int. Conf. on Computer Vision. IEEE, 1998.
[25] S. K. Venkata, I. Ahn, D. Jeon, A. Gupta, C. Louie, S. Garcia, S. Belongie, and M. B. Taylor. SD-VBS: The San Diego vision benchmark suite. In IISWC, 2009.
[26] V. Weaver, M. Johnson, K. Kasichayanula, J. Ralph, P. Luszczek, D. Terpstra, and S. Moore. Measuring energy and power with PAPI. In Parallel Processing Workshops (ICPPW), pages 262-268, 2012.
[27] T. Whelan, M. Kaess, M. Fallon, H. Johannsson, J. Leonard, and J. McDonald. Kintinuous: Spatially extended KinectFusion. In WS on RGB-D: Advanced Reasoning with Depth Cameras, 2012.
[28] M. Zeng, F. Zhao, J. Zheng, and X. Liu. Octree-based fusion for realtime 3D reconstruction. Graphical Models, 2013.