
Int. J. Appl. Math. Comput. Sci., 2016, Vol. 26, No. 1, 99–122 DOI: 10.1515/amcs-2016-0007

EFFICIENT GENERATION OF 3D SURFEL MAPS USING RGB–D SENSORS

ARTUR WILKOWSKI a,∗, TOMASZ KORNUTA b, MACIEJ STEFAŃCZYK b, WŁODZIMIERZ KASPRZAK b

a Industrial Research Institute for Automation and Measurements, Al. Jerozolimskie 202, 02-486 Warsaw, Poland; e-mail: [email protected] (∗ corresponding author)

b Institute of Control and Computation Engineering, Warsaw University of Technology, ul. Nowowiejska 15/19, 00-665 Warsaw, Poland; e-mail: {T.Kornuta,M.Stefanczyk,W.Kasprzak}@elka.pw.edu.pl

The article focuses on the problem of building dense 3D occupancy maps using commercial RGB-D sensors and the SLAM approach. In particular, it addresses the problem of 3D map representations, which must be able both to store millions of points and to offer efficient update mechanisms. The proposed solution consists of two key elements, visual odometry and surfel-based mapping, but it contains substantial improvements: storing the surfel maps in octree form and utilizing a frustum-culling-based method to accelerate the map update step. The performed experiments verify the usefulness and efficiency of the developed system.

Keywords: RGB-D, V-SLAM, surfel map, frustum culling, octree.

1. Introduction

Industrial robots operate in structured, fully deterministic environments, thus they usually do not need an explicit representation of the environment. In contrast, the lack of up-to-date maps of the surroundings of service and field robots might cause serious effects, including the damaging of both robots and objects, or even injuring people. Besides, the diversity and complexity of tasks performed by such robots requires several types of maps and different abstraction levels of representation. For example, simple geometric (occupancy) maps enable robots to avoid obstacles and collisions, whereas semantic maps allow reasoning, planning and navigation between different places (Martínez Mozos et al., 2007). A semantic map combines a geometric map of the environment with objects recognized in the image (Kasprzak, 2010; Kasprzak et al., 2015).

Multiple sensor readings (images, clouds) are required to generate a geometric map of the environment. Theoretically, the coordinate system transformation between a pair of readings can be obtained in advance, i.e., from the wheel odometry of mobile robots or from the direct kinematics of manipulators. However, inaccurate robot motion models (wheel drifts, friction, imprecise kinematic parameters, etc.) lead to faulty odometry readings, resulting in map inconsistencies. To overcome these effects, algorithms were developed to estimate the robot's trajectory directly from sensor readings. They are part of a solution named “simultaneous localization and mapping” (SLAM) (Thrun and Leonard, 2008). There are many flavours of SLAM systems, using diverse methods of estimation of the robot and environment state, as well as applying all kinds and combinations of sensors, e.g., monocular vision in a probabilistic, EKF-based framework (Skrzypczyński, 2009). In particular, a specialized term “visual SLAM” (V-SLAM in short) was coined to distinguish the SLAM systems relying mainly on cameras, which relates to the fact that the estimation of subsequent poses of the moving camera is called “visual odometry” (VO) (Nistér et al., 2004).

The recent advent of cheap RGB-D sensors accelerated the V-SLAM research. An RGB-D sensor provides two concurrent, synchronized image streams—colour images and depth maps, which represent the colour of observed scene points and their relative distance to the camera—that subsequently can be combined into point clouds.

Fig. 1. Components of an example RGB-D-SLAM system.

Those clouds typically contain hundreds of thousands of points, acquired with a high framerate, while the mapped areas are usually complex. Thus, several efficiency and quality issues in V-SLAM must be addressed:

• precision and computational efficiency of point cloud registration,

• global coherence of maps,

• efficient and precise representation of a huge number of points.

In this paper, the focus is on the third issue. There is a proposition developed for an efficient representation of a precise three-dimensional map of the environment. An “efficient representation” means it is compact (there are few spurious points) and can be rapidly accessed and updated.

A 3D occupancy map can be represented in several ways:

1. “voxel” representation—the 3D space is divided into small “cubes” (Marder-Eppstein et al., 2010),

2. “surfel” representation—the space is modelled by an arrangement of surface elements (Krainin et al., 2010),

3. MLS (multi-level surface)—contains a list of obstacles (occupied voxels) associated with each element of a 2D grid (Triebel et al., 2006),

4. octal trees (octrees)—constitute an effective implementation of the voxel map using the octal tree structure (Wurm et al., 2010),

5. combined voxel–surfel representation (e.g., the multi-resolution surfel map).

The core of the proposed system is a map of small surface patches of uniform size (called surfels), integrating readings of an RGB-D sensor. The utilization of a surfel map has several advantages (Weise et al., 2009; Henry et al., 2014). Firstly, it is a compact representation of a sequence of point clouds being merged into surfels. Secondly, the surfel update procedure takes the sensor's current pose into account, allowing both visibility checks and reduction of depth quantization noise—the most prominent type of noise in cheap RGB-D sensors. Finally, every surfel can be processed independently, simplifying and speeding up the map update procedure in comparison with a mesh-like representation.

The existing solutions for surfel map creation are not speed-optimized. A single update of the map might take even several seconds (Henry et al., 2012). In this paper, two substantial efficiency improvements are proposed:

1. utilization of the “frustum culling” method, based on an octree representation, to accelerate surfel updating;

2. replacement of a dense ICP algorithm by a much more efficient sparse-ICP method, as proposed by Dryanovski et al. (2013).

The article is organized as follows. In Section 2 the recent developments in the field of RGB-D-based mapping are presented. A structure of a typical V-SLAM system is described (Section 2.1), followed by a survey of its key steps: the feature extraction (Section 2.2), feature matching (Section 2.3) and the estimation of camera motion (Section 2.4) in visual odometry. Then, solutions to dense point cloud alignment (Section 2.5) and global map coherence (Section 2.6) are explained. In Section 3 the most popular 3D mapping systems based on RGB-D V-SLAM are reviewed. Section 4 describes the proposed solution, starting from its important data structures, followed by a short overview of the system and operation details. In Section 5, the ROS-based system implementation is shortly outlined. Section 6 presents experiments performed for the validation of the developed solution. Conclusions are given in Section 7. The Appendix contains a theoretical primer on “frustum culling”.

2. RGB-D-based V-SLAM

2.1. Structure of the RGB-D-SLAM system. A state diagram of an example RGB-D-SLAM system is presented in Fig. 1. First, the camera motion (the transition between consecutive poses) is estimated. Typically, it is based on matching features extracted from both current and previous images, finding parts of both images corresponding to the same scene elements. There are algorithms based only on RGB image features (either for a single camera (Nistér et al., 2004) or a stereo-camera case (Konolige et al., 2011)), but the additional depth information makes the matching process much easier. As a result, a sparse 3D cloud of features is generated, allowing robust estimation of the transformation between the camera poses.

Usually, a subset of detected features is sufficient to estimate the transformation. Thus, when taking numerous subsets, different estimates of the transformation are found. Typically this step is based on the RANSAC (random sample consensus) algorithm (Fischler and Bolles, 1981), which selects the transformation that is the most consistent with the majority of feature pairs.

The next step, transformation refinement, is typically based on the ICP (iterative closest point) algorithm (Zhang, 1994). This is an iteration of least-square-error (LSE) solutions to the problem of dense point cloud correspondence.

The four steps above (feature extraction and matching, transform estimation and refinement) are repeated for every next RGB-D image. However, the optimal alignments between every two consecutive images can eventually lead to a map that is globally inconsistent. This is especially visible in the case of closed loops in the sensor's trajectory. In order to handle this issue, a loop closure procedure is performed. It refines both the 3D coordinates describing the scene geometry and the motion of the camera, using several or all views in the sequence.

In the following subsections, algorithms used in the above mentioned steps are presented.

2.2. Feature extraction. Two major categories of image features can be distinguished: global and local. The former represent certain characteristics calculated on the basis of a whole image. In the case of objects, global features can be extracted from the entire point cloud constituting the object or all image segments making it up (however, one must a priori determine which elements belong to a given object). In contrast, local features are associated with locations of the so-called keypoints—pixels in the image (or points in a 3D space) considered to be characteristic, hence important. Additionally, local features express the characteristics of the neighbourhood of this point in the form of a descriptor. Global features allow quick determination of the similarity of entire images or objects, while the local ones are useful for the comparison of individual parts. With respect to the alignment of two point clouds, only local features are of interest.

Furthermore, in the case of RGB-D images, another orthogonal division of features can be proposed: features extracted from the appearance (i.e., colour) and features extracted from the 3D shape (a depth map or a point cloud). An example of a global feature calculated from the RGB image is a color histogram of all the object points. A good example of a global feature extracted from the point cloud is the VFH (viewpoint feature histogram) (Rusu et al., 2010), which creates a histogram of angles between surface normal vectors for pairs of points constituting the object cloud.

In the case of local features, one should consider keypoint detectors and feature descriptors. Recently proposed detectors locating key points in RGB images include the FAST (features from accelerated segment test) (Rosten and Drummond, 2006) and the AGAST (adaptive and generic accelerated segment test) (Mair et al., 2010), whereas sample novel descriptors are BRIEF (binary robust independent elementary features) (Calonder et al., 2010) and the FREAK (fast retina keypoint) (Alahi et al., 2012).

There are also several features having their own detectors and descriptors, such as SIFT (Lowe, 1999) or BRISK (binary robust invariant scalable keypoints) (Leutenegger et al., 2011). The latter possesses the so-called “binary descriptor”—a vector of boolean values instead of real or integer numbers. It is worth noting that in recent years binary descriptors have become more popular, especially in real-time applications, due to their very fast matching ability (Miksik and Mikolajczyk, 2012; Figat et al., 2014; Nowicki and Skrzypczyński, 2014).

Several approaches explore the depth information only, aiming at detecting keypoints (i.e., corners) by adapting well-known 2D methods, like the “Harris 3D” detector (Sipiran and Bustos, 2011). Examples of local features extracted from point clouds include the SHOT (signature of histograms of orientations) (Tombari et al., 2010) and SC (3-D shape context) features (Frome et al., 2004). The NARF (normal aligned radial feature) (Steder et al., 2011) is both a keypoint detector and a descriptor, designed to work with depth maps.

If the information to be processed consists of an RGB image along with the aligned depth information, classic keypoint detectors can be used to extract points of interest based on color information and then employ their coordinates to extract descriptors from the depth data. Such an example is the “5D Harris” detector, which detects corners based on the gradient image (computed from the intensity image) and an image containing normal vectors to the surface (determined on the basis of a depth map). An example of an RGB-D-based descriptor is “color-SHOT” (CSHOT) (Tombari et al., 2011), an extension of the SHOT feature. The CSHOT descriptor consists of a histogram of orientations of points constituting the neighborhood of a given keypoint (identically as in the original SHOT approach), extended by the color histogram computed on the basis of the colour of these points.
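For readers who want to experiment with the detectors and binary descriptors listed above, the following minimal C++ sketch (not taken from any of the reviewed systems) shows how keypoints and descriptors can be obtained with OpenCV. ORB is used here as a stand-in for the FAST/BRIEF-style combinations, and the input file name is hypothetical.

```cpp
// Minimal sketch: detect keypoints and compute binary descriptors on an RGB frame.
// ORB (FAST corners + rotation-aware BRIEF descriptors) stands in for the methods
// surveyed above; it is not the detector/descriptor used by the authors.
#include <opencv2/core.hpp>
#include <opencv2/features2d.hpp>
#include <opencv2/imgcodecs.hpp>
#include <iostream>
#include <vector>

int main() {
  cv::Mat image = cv::imread("frame.png", cv::IMREAD_GRAYSCALE);  // hypothetical input
  if (image.empty()) return 1;

  cv::Ptr<cv::ORB> orb = cv::ORB::create(500);      // detect up to 500 keypoints
  std::vector<cv::KeyPoint> keypoints;
  cv::Mat descriptors;                              // one 256-bit binary descriptor per row
  orb->detectAndCompute(image, cv::noArray(), keypoints, descriptors);

  std::cout << "keypoints: " << keypoints.size() << std::endl;
  return 0;
}
```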

2.3. Finding correspondences. The problem of finding the correspondences in two sets of points is typically solved by a “k-nearest neighbours” (k-NN) algorithm. However, because feature descriptors are typically vectors of large, constant sizes (e.g., the SIFT descriptor is a single 128-dimensional vector), this is a high-dimensional problem. The most popular solution is the FLANN (fast library for approximate nearest neighbors) (Muja and Lowe, 2009), an approximate nearest neighbour search algorithm. It utilizes multiple randomized kd-trees or a combination of hierarchical k-means trees with a priority search order (depending on input data). The same authors also proposed an improvement of the FLANN for fast approximate matching of binary features using parallel search of randomized hierarchical trees (Muja and Lowe, 2012).

Fig. 2. Example of VO-based alignment of the “five people” cloud.

Figure 2 presents VO-based alignment of a sample cloud from the Point Cloud Library (PCL) (Rusu and Cousins, 2011), with marked features and feature correspondences.

2.4. Transformation estimation. A single point in the Cartesian space has only a “position” attribute; thus, for a given pair of corresponding points a translation vector can be computed. Hence, contrary to the estimation of the epipolar geometry from 2D point correspondences, the estimation of the full transformation (i.e., translation along with rotation) between two point clouds requires three 3D point correspondences (assuming non-collinear points). To find which of the many possible correspondences is the best, a RANSAC-based (Fischler and Bolles, 1981) algorithm is typically used. In every iteration, the transformation is computed on the basis of three randomly selected point correspondences. Next, the remaining correspondences are validated as to whether they fit into this transformation, thus forming two sets: inliers (the fitting ones) and outliers (the correspondences that do not fit). The procedure is repeated for a given number of iterations or until the sum of estimation errors falls below a given threshold.

2.5. Transformation refinement. The initial transformation, computed on the basis of sparse clouds of features, is typically refined with the use of the “iterative closest point” (ICP) algorithm (Zhang, 1994). It improves the initial transformation while evaluating the alignment of dense point clouds created on the basis of RGB-D images. The principle of ICP is to refine the transformation iteratively by minimizing the mean square error between the new cloud transformed into the original reference frame and the original cloud. There are many different flavours of ICP (Pomerleau et al., 2013), like utilizing additional information both for the nearest neighbour candidate selection of a given point (e.g., utilizing normal vectors (Censi, 2008)) and for the between-point distance computation (e.g., metrics using both position and colour information (Men et al., 2011; Łępicka et al., 2016)).
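The RANSAC scheme of Section 2.4 can be sketched compactly in C++ with Eigen. The snippet below is an illustration, not the authors' implementation: it assumes the 3D correspondences are already established, uses Eigen::umeyama for the per-sample rigid alignment, and the iteration count and inlier threshold are arbitrary placeholders.

```cpp
// Sketch of RANSAC-based rigid transform estimation between two matched 3D point sets
// (Section 2.4). Eigen::umeyama solves the rigid alignment for each 3-point sample.
#include <Eigen/Dense>
#include <random>
#include <vector>

Eigen::Matrix4d ransacRigidTransform(const std::vector<Eigen::Vector3d>& src,
                                     const std::vector<Eigen::Vector3d>& dst,
                                     int iterations = 200, double inlier_thresh = 0.03) {
  Eigen::Matrix4d best = Eigen::Matrix4d::Identity();
  if (src.size() < 3 || src.size() != dst.size()) return best;

  std::mt19937 rng(42);
  std::uniform_int_distribution<std::size_t> pick(0, src.size() - 1);
  std::size_t best_inliers = 0;

  for (int it = 0; it < iterations; ++it) {
    // 1. Draw three correspondences and compute a candidate transformation.
    Eigen::Matrix3d s, d;
    for (int c = 0; c < 3; ++c) {
      std::size_t i = pick(rng);
      s.col(c) = src[i];
      d.col(c) = dst[i];
    }
    Eigen::Matrix4d T = Eigen::umeyama(s, d, false);   // rigid, no scaling

    // 2. Count correspondences consistent with the candidate.
    std::size_t inliers = 0;
    for (std::size_t i = 0; i < src.size(); ++i) {
      Eigen::Vector3d mapped = T.topLeftCorner<3, 3>() * src[i] + T.topRightCorner<3, 1>();
      if ((mapped - dst[i]).norm() < inlier_thresh) ++inliers;
    }
    if (inliers > best_inliers) { best_inliers = inliers; best = T; }
  }
  return best;  // in practice, re-estimate the transform from all inliers
}
```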

2.6. Loop closure. When only two consecutive clouds are matched against each other, the matching errors tend to accumulate over time, giving significant discrepancies at the end of a given sequence. This is especially visible when returning to the point of origin (closing a loop). Several approaches have been proposed to handle this so-called “drift” problem. In the following, the three most popular methods are described.

Global path optimization using graph approaches. Vertices in a graph represent sensor positions and are associated with the collected point clouds. Edges represent constraints and are created when correspondences between particular clouds are obtained. To improve efficiency, only some selected point clouds are memorized, or only keypoints are stored instead of full clouds. The graph is then optimized to find globally the best set of transformations between memorized point clouds. For this purpose, a global graph optimization method is applied, e.g., TORO optimization (Henry et al., 2012) or g2o optimization (Kummerle et al., 2011). The set of sensor positions is optimized by introducing artificial error measures to compare actual relations between sensor poses (as they are described by the parameter vector) and recorded pose differences acting as constraints.

Bundle adjustment. In the case of “bundle adjustment” methods, the optimization criterion is the reprojection error as used in Section 2.4. The basic idea is to perform the transformation estimation step with the correspondences extended to several consecutive frames (images). If k is the average number of point correspondences between two frames and n is the number of frames used in bundle adjustment, the total number of point correspondences feeding the optimization process is around (n(n−1)/2)·k. Thus, the number of constraints grows quadratically with the number of frames, while the number of parameters grows only linearly. This leads to a robust optimization task and an expected improvement of the camera pose path. One of the most successful algorithms is the “sparse bundle adjustment” (Konolige, 2010). The bundle adjustment procedure can be part of the transformation estimation step, or it can be a separate post-processing step.

Matching against the full map. An alternative solution for maintaining map consistency is to match the current frame features with the full map constructed so far, instead of matching them with the features of one or several most recent frames (Dryanovski et al., 2013). Efficiency is ensured by storing in the map only selected keypoints from each frame. However, this approach deteriorates for loops of arbitrary length, and therefore it is usually accompanied by a global graph optimization step.

3. RGB-D-based mapping systems

Recently, several systems have been proposed for V-SLAM and 3D occupancy mapping using RGB-D sensors. The most successful ones include:

• RGB-D mapping (Henry et al., 2012) (Peter Henry, University of Washington, Seattle),

• KinectFusion (Izadi et al., 2011) (Microsoft Research, UK) and Kintinuous (MIT) (Whelan et al., 2013),

• RGB-D SLAM (Endres et al., 2012) (University of Freiburg, TU München),

• fast visual odometry and mapping from RGB-D data (Dryanovski et al., 2013) (The City University of New York).

In the case of the latter three, there are implementations available in the ROS (Quigley et al., 2009) or the PCL (Rusu and Cousins, 2011). Short descriptions of the listed solutions follow below.

3.1. RGB-D mapping. A characteristic feature of the RGB-D mapping system presented by Henry et al. (2012) is the surfel representation of the occupancy map. The initial estimation of the correspondence between two point clouds is based on SIFT descriptors of key image points. After the initial alignment, the solution is fine-tuned using the ICP algorithm, which optimizes a double criterion encompassing the point-to-plane distance for the dense cloud and the distance of matched SIFT points (using elastic weights for both components).

During map creation, a graph of key frames is prepared, containing transformations between frames and probabilistic constraints. Graph optimization (loop closure) is performed with the TORO method (which is actually a maximum likelihood estimator). It is worth mentioning that sparse bundle adjustment is another option evaluated in this work. System performance is described as satisfactory; however, it is far from real-time: cloud alignment takes about 1 s, whereas the surfel update step takes 3 s per frame.

3.2. KinectFusion. The KinectFusion method is optimized for utilization of the GPU and real-time processing. It uses a simplified point-to-plane version of the ICP algorithm, which results from GPU processing restrictions. This boosts the performance, but clouds must be sampled frequently enough.

The map is represented as a set of voxels. However, the voxels are interpreted in terms of the truncated signed distance function (TSDF) and contain the truncated distance from the closest surface. The voxel map is stored in the GPU; thus the map size is limited by the GPU memory. This limitation was overcome in the Kintinuous method (Whelan et al., 2013), where the map is transferred to and from external memory when necessary. In the latter approach, ICP is supported also by the visual odometry system FOVIS.

The reduction of noise and measurement errors is performed by averaging TSDF values. The map is represented as a set of voxels; however, for visualization, ray-casting is used to extract surfaces (by detecting zero-crossings of the TSDF). The method works in real time (thanks to the GPU). The results of some of our experimental mapping with the KinectFusion algorithm are given in Fig. 3.

3.3. RGB-D SLAM. The RGB-D SLAM system is organized in a fairly uncomplicated way. It uses only matchings between keypoint descriptors SIFT, SURF, ORB (and their GPU versions) in order to compute the transformation between two frames. Application of keypoints cannot provide very accurate results, thus loop closure algorithms are used extensively to achieve a globally optimal solution. New frames are compared with a group of previous frames (some of them are selected deterministically and some randomly). The frames and the relations between them constitute vertices and edges of the graph. The graph is then optimized using the g2o framework. The method was implemented in the ROS framework (Quigley et al., 2009) and cooperates with the ROS OctoMap (Hornung et al., 2013) package for creation of the occupancy map. A sample map, which is the result of our mapping experiments with RGB-D SLAM, is presented in Figs. 4 and 5.

Fig. 3. 3D scene of “a desk in a room” registered from multiple viewpoints using the KinectFusion algorithm.

Fig. 5. Depth map exported to OctoMap for the “a desk in a room” scene using the RGB-D SLAM method.

Fig. 4. On-line updating of the depth map for the “a desk in a room” scene using the RGB-D SLAM method.

3.4. FVOM: Fast visual odometry and mapping. A characteristic feature of the approach proposed by Dryanovski et al. (2013) is that the ICP algorithm does not work on the two most recent point clouds, but instead aligns the newest point cloud to the complete model of the scene constructed from the point clouds collected so far. In order to do it efficiently, only selected keypoints from each frame are taken into account and incorporated into the model (the full clouds can be stored in external memory).

For each frame, Shi–Tomasi keypoints (Shi and Tomasi, 1994) are extracted from the RGB image (also other keypoints are supported), but the original descriptor is replaced with a new one, containing the mean Cartesian position of the keypoint along with the position variance, computed on the basis of the proposed model of sensor measurement uncertainties (depending on distance measurements). The resulting sparse cloud is aligned against the current model of the scene using the ICP. After alignment, the new cloud is incorporated into the model. The assimilation process takes into account point uncertainties and uses Kalman filter update rules to propagate uncertainties from measurements into the model. This probabilistic framework is to some extent used also for computation of distances between features in the ICP algorithm.

The alignment of the new cloud against the whole model decreases the extent of alignment error propagation (the drift). However, in new implementations (under the ROS), the basic algorithm is supplemented also with g2o global optimization for off-line processing. On-line processing is very fast: the average processing time of a single frame is about 16.1 ms.

Mapping results that we obtained using the FVOM method are presented in Figs. 6 and 7. Figure 6 presents keypoints captured during the on-line processing, whereas Fig. 7 presents the final integrated cloud optimized with the g2o algorithm.

4. Proposed 3D mapping system

In this section the structure and steps of the proposed 3D mapping system are presented. It creates a 3D surfel map and updates it using the incoming RGB-D sensor frames. The main advantage of such a surfel map is that it provides significant data compression with only a very small data loss. What is even more important, the quantization noise of the sensor is also kept under control, since the surfel–scan averaging procedure is performed mainly along the depth axis.

4.1. System architecture. The data-flow diagram of our system is given in Fig. 8. The data obtained from the RGB-D sensor are first processed by the Visual odometry module in order to obtain a sequence of sensor poses. The next module, the Keyframe generator, selects keyframes from the input sequence of frames. The Loop closure module takes as inputs the generated keyframes and estimated poses, and performs global sensor path optimization.

The generated keyframes and refined sensor poses are input data for the Surfel update and Surfel addition modules, which modify the Surfel map data container. The Surfel update module modifies existing scene surfels based on sensor readings, while the Surfel addition module adds new surfels in unoccupied locations. The Frustum culling module works as an intermediary between the Surfel map container and the Surfel update module and efficiently filters surfels that cannot be currently observed by the sensor, thus reducing the workload of the Surfel update module. The above mentioned system modules are described further on in this section.
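The module chain described above can be summarized by the following illustrative C++ skeleton. It is only a structural sketch of the data flow from Fig. 8; all type and method names are hypothetical and do not correspond to the authors' code.

```cpp
// Illustrative skeleton of the data flow: RGB-D frames -> visual odometry ->
// keyframe selection -> surfel update / addition on the map container.
#include <vector>

struct RgbdFrame { };                      // colour image + registered depth map
struct Pose { double x = 0, y = 0, z = 0; };  // simplified sensor pose

struct VisualOdometry {
  Pose track(const RgbdFrame&) { return Pose{}; }   // pose increment estimation
};
struct KeyframeSelector {
  bool isNewKeyframe(const Pose&) { return true; }  // t_alpha / t_dist test (Section 4.4)
};
struct SurfelMap {
  void update(const RgbdFrame&, const Pose&) {}     // modify observed (frustum-culled) surfels
  void add(const RgbdFrame&, const Pose&) {}        // create surfels for unmatched readings
};

void processSequence(const std::vector<RgbdFrame>& frames) {
  VisualOdometry vo;
  KeyframeSelector selector;
  SurfelMap map;
  for (const RgbdFrame& f : frames) {
    Pose p = vo.track(f);
    if (!selector.isNewKeyframe(p)) continue;  // skip redundant frames
    map.update(f, p);                          // merge readings into existing surfels
    map.add(f, p);                             // insert surfels for unused readings
  }
}
```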

Fig. 6. Keypoints generated for the “a desk in a room” scene during on-line registration using the fast visual odometry algorithm.

Fig. 7. G2o-optimized depth map for the “a desk in a room” scene using the fast visual odometry algorithm.

4.2. Novelty of the approach. The proposed approach to surfel mapping follows to some extent the solutions described by Krainin et al. (2010) and Henry et al. (2012); however, it contains some novel elements that provide an efficiency improvement. The first improvement is the acceleration of the surfel map update procedure thanks to “octree-based frustum culling”.

Octree-based frustum culling applied for map updating. As the utilization of frustum culling in registration is not new, it is necessary to point out the novelty of the proposed approach. Luck et al. (2000) utilize a frustum-based segmentation technique in order to outline the portions of the scene visible from the range sensor in the currently estimated position, which are next registered with the newly added point cloud with the use of the ICP. Hence, in this case, the frustum (modelled here simply as a set of triangles extending from the laser scanner) is used to accelerate the ICP. An extension of this approach is proposed by May et al. (2009), who embed frustum culling in the ICP by employing the pose estimate of the previous iteration step. As a result, the scene points outside the model frustum are filtered out by testing against clipping planes before the nearest neighbour search, which, analogically, speeds up the computations. Hence both of the mentioned approaches utilize frustum culling for the estimation (or refinement) of the current sensor pose. On the contrary, in the proposed system, frustum culling is employed in a different step of the registration process, i.e., in the scene model update. This is related to the fact that the update of a surfel map requires transforming (even) millions of surfels into the camera coordinate frame and subsequently projecting them onto the image plane, which is much slower than the merging of two point clouds. The utilization of octree-based frustum culling for rapid outlining of the portions of a map to be updated accelerated this step and enabled the system to work in near real time.

Sparse ICP in visual odometry. The second novelty concerns the utilization of a visual odometry method based on a sparse ICP algorithm (Dryanovski et al., 2013) that is more efficient than the one proposed in another surfel-based approach (Henry et al., 2012). In addition to the sole sparsity, the new scan is matched against multiple previous ones, eliminating the need for a loop closure of small areas. The implementation of this method is able to work at the rate of even 60 fps.

Fig. 8. Data flow in the proposed system.

4.3. Visual odometry. The Visual odometry module is responsible for computation of the pose increment of the Kinect sensor between consecutive frames. The approach by Dryanovski et al. (2013) is followed and the algorithm proposed by Shi and Tomasi (1994) is used for detection of keypoints in the image, whereas the mean and the covariance of the Cartesian positions of each keypoint constitute the feature descriptor. The mean and covariances are computed by applying the Gaussian mixture model in the keypoint Cartesian position. The 3D uncertainty used in those computations is derived from the depth uncertainty model of the camera, with additional assumptions that not only z, but also x and y coordinates are prone to errors and that there are dependencies between those measurements.

The resulting sparse cloud is then passed to the ICP, with the assumption that the initial transformation is equal to the transformation computed for the previous frame. Hence, although it operates on features, there is in fact no feature matching step. Instead, the new sparse cloud is aligned with the one containing the whole model of the scene (instead of the cloud extracted from the last frame), with an additional mechanism for forgetting the oldest points. As a result, the algorithm is capable of creating quite coherent maps even without the bundle adjustment procedure.

4.4. Keyframe generation. The inputs to the Keyframe generation module are the transformation information from the Visual odometry module and the RGB-D image. Raw RGB-D frames coming from the sensor usually contain redundant data captured from similar sensor positions, which do not deliver much new information. Hence the main goal of the module is to select the most important RGB-D frames in order to reduce the volume of data processed by subsequent modules. This is achieved by selecting only the frames obtained from sensor positions significantly different from the corresponding previous sensor positions.

The algorithm defines two thresholds: t_alpha for the angle between two sensor poses and t_dist for the distance between two sensor poses. When the incoming sensor pose differs by no more than t_alpha regarding the orientation and by no more than t_dist regarding the position from the sensor pose associated with the last keyframe, no new keyframe is created. Otherwise, a new keyframe is constructed and submitted to further processing.
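A minimal implementation of this keyframe test could look as follows. The sketch assumes that poses are available as rigid transformations; the threshold values are placeholders, as the paper does not state the ones actually used.

```cpp
// Illustrative keyframe test (Section 4.4): a new keyframe is created only when the
// current sensor pose differs from the last keyframe pose by more than t_alpha in
// orientation or more than t_dist in translation. Threshold values are placeholders.
#include <Eigen/Geometry>

bool isNewKeyframe(const Eigen::Isometry3d& last_keyframe_pose,
                   const Eigen::Isometry3d& current_pose,
                   double t_alpha_rad = 0.35,   // ~20 degrees, illustrative
                   double t_dist = 0.30) {      // 30 cm, illustrative
  Eigen::Isometry3d delta = last_keyframe_pose.inverse() * current_pose;
  double dist = delta.translation().norm();
  double angle = Eigen::AngleAxisd(delta.rotation()).angle();
  return dist > t_dist || angle > t_alpha_rad;
}
```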
4.5. Loop closure. The loop closing procedure is performed off-line and consists of two steps. In the first one it creates a graph, with vertices corresponding to the selected keyframes and edges corresponding to the transformations between the overlapping keyframes. In the second step, the sensor poses are globally optimized using the g2o framework (Kummerle et al., 2011).

In order to identify overlapping keyframes, the procedure extracts SURF features for every pair of keyframes and finds the best matching between their features. Next, the transformation between each two keyframes is established using the RANSAC algorithm. A pair of keyframes is identified as overlapping when enough inlier correspondences vote for the specific transformation. The transformation is then associated with the newly created edge in the graph.

The g2o algorithm uses the created graph as a source of constraints in the optimization problem. The optimized parameters are the sensor poses associated with the keyframes. For each value of the parameter vector, between-keyframe pose transformations may be computed. These transformations are subsequently compared against the constraint transformations computed from the SURF feature matchings. The error function thus obtained forms an optimization criterion for the g2o procedure. As an outcome, a vector of optimized sensor poses for keyframes is given.

4.6. Surfel map. The surfel map is composed of elements called surfels. The name surfel is an abbreviation for surface element and has its origin in computer graphics (Pfister et al., 2000). A surfel represents a surface patch (usually circular) in a 3D space, characterised by its position, radius and a normal vector describing its orientation (Fig. 9). Additionally, it may contain other useful attributes, e.g., color, curvature or a measure of confidence. The measure of confidence may be defined as the number of views where the surfel was observed, but also (in a more complex solution) as a histogram of viewing directions. Surfels became useful as building blocks of 3D maps, as proposed by Weise et al. (2009) and followed by others, e.g., Krainin et al. (2010) or Henry et al. (2012), mainly due to the point compression and noise reduction capabilities of surfel-based mapping algorithms. In the proposed approach, every i-th surfel s_i contains the following attributes: position in the Cartesian space p_i = [X_i, Y_i, Z_i]^T, radius r_i, orientation n_i, color c_i and confidence v_i, the latter being equal to the number of views in which the surfel was observed.
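In code, a surfel with exactly these attributes can be represented by a small structure such as the following sketch (field names and types are illustrative; the actual implementation is PCL-based and may differ).

```cpp
// Plain-C++ rendition of the surfel attributes listed above.
#include <Eigen/Core>
#include <cstdint>

struct Surfel {
  Eigen::Vector3f position;   // p_i = [X_i, Y_i, Z_i]^T in the global frame
  float radius;               // r_i
  Eigen::Vector3f normal;     // n_i, unit orientation vector
  std::uint8_t r, g, b;       // colour c_i
  int confidence;             // v_i: number of views in which the surfel was observed
};
```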

Fig. 9. Visualization of a surfel with its attributes.

Fig. 10. Octree example.

4.7. Octree storage. The surfel map is stored in an octree, which is a tree data structure used for storing data in three dimensions. Each octree node represents a cube in a 3D space. The cube is divided into 8 octants (by bisecting the space along each of the 3 dimensions), and each of the node children represents one of the octants. Thus, each of the nodes in the tree has an associated bounding box. When a surfel is added to the tree, it either occupies one of the existing leaf nodes, or a new one is created to accommodate the new surfel. An image presenting the division of a 3D space by an octree is given in Fig. 10. After the addition of all surfels to the tree, the structure of occupied voxels resembles the general structure of the underlying surfel cloud. The octree is especially useful for performing volume queries with a complexity independent of the existing point configuration. For instance, it enables fast rejection of large subsets of surfels from further deliberations as lying outside of the area of interest.
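A similar volume query is available in the PCL octree module; the sketch below is used purely as an illustration of the idea. It organizes a handful of surfel centre points in an octree and performs an axis-aligned box query; the resolution, points and box bounds are arbitrary, and minor API details differ between PCL versions.

```cpp
// Sketch of a fast volume query over surfel centres organized in a PCL octree.
#include <pcl/point_cloud.h>
#include <pcl/point_types.h>
#include <pcl/octree/octree_search.h>
#include <iostream>
#include <vector>

int main() {
  pcl::PointCloud<pcl::PointXYZ>::Ptr centres(new pcl::PointCloud<pcl::PointXYZ>);
  centres->push_back(pcl::PointXYZ(0.1f, 0.2f, 1.0f));   // illustrative surfel centres
  centres->push_back(pcl::PointXYZ(2.5f, 0.0f, 3.0f));

  pcl::octree::OctreePointCloudSearch<pcl::PointXYZ> octree(0.5);  // 0.5 m leaf size
  octree.setInputCloud(centres);
  octree.addPointsFromInputCloud();

  // Fast rejection of surfels outside an axis-aligned region of interest.
  std::vector<int> inside;
  octree.boxSearch(Eigen::Vector3f(-1.f, -1.f, 0.f),
                   Eigen::Vector3f( 1.f,  1.f, 2.f), inside);
  std::cout << inside.size() << " surfel centres inside the box\n";
  return 0;
}
```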

Fig. 11. Octree frustum culling example.

4.8. Frustum culling. The most computationally expensive operation in the surfel map creation is the surfel update step. During this step, the surfels in the scene need to be reprojected onto the sensor image plane, so that their coordinates might be compared with the new scan. In the work of Henry et al. (2012), all surfels were transformed into the current camera coordinate frame and projected onto the image plane. This approach is not very efficient, since the number of surfels that need to be transformed and projected may reach several millions. As a result, Henry et al. (2012) report that the surfel processing step takes on average 3 seconds per frame.

In the proposed approach, frustum culling is applied immediately to the surfels to efficiently prune surfels outside the sensor cone of view, called the frustum. The adopted approach is similar to the one used in computer graphics, e.g., in the Point Cloud Library (Rusu and Cousins, 2011) visualization module, with the exception that we develop the real camera model instead of a simplified artificial one. The frustum culling algorithm can be divided into two major steps: the computation of the clipping plane equations and using the clipping planes to filter out 3D data. For brevity, the theoretical details of the computation of clipping planes for the Kinect camera at a specific position in world coordinates are skipped here and included in the Appendix.

In the implementation of the second step, an octree is utilized for efficient organization of the surfel map in a 3D space. When applying a hierarchical search to the octree, one can easily prune certain segments of the space at coarse resolution, thus saving computation time. More precisely, in the case when the whole octree node is outside or inside the frustum, the same applies to all of its children, so they need not be further verified. The idea of combining frustum culling with an octree representation to select the subset of surfels that must be reprojected is presented in Fig. 11.

4.8.1. Frustum culling for a voxel. With a 3D scene organized in voxels, we must be able to decide whether a voxel lies within the frustum or not. The simplest way is to check each vertex of the voxel against the frustum clipping planes. However, this can be done more efficiently by approximating the voxel with the smallest circumscribing sphere. Then, we say that the voxel lies within the frustum if the corresponding sphere lies within it. This is achieved by a simple sphere test, defined as follows.

Let us describe a sphere S by its center and radius (p0, r). The sphere coordinates are compared against all frustum clipping planes, and for each plane a decision is made whether the sphere lies within a positive halfspace of the plane, a negative halfspace of the plane, or intersects the plane. Let us assume that each plane P is described by the equation in Hessian normal form:

n · p = −t, (1)

where the vector n normal to the plane is standardized and t is the distance from the plane to the origin of the coordinate system. We also assume that the normal vector points in the direction of the frustum interior, thus dividing the space into positive and negative halfspaces. This enables fast computation of the “signed” distance between the center of the sphere and the plane:

d(S, P) = n · p0 + t. (2)

Now,

• if d(S, P) > r, then the sphere lies within the positive half-space of the plane;

• if −r ≤ d(S, P) ≤ r, then the sphere intersects the plane;

• if d(S, P) < −r, then the sphere lies in the negative half-space of the plane.

By applying these simple decision rules to all planes constituting the frustum, we may decide whether the sphere (hence the inscribed voxel) lies within, outside or intersects the frustum.

4.8.2. Frustum culling for a cloud organized in an octree. If the cloud is organized in an octree, the frustum culling technique can be performed quite efficiently, since most of the pruning affects higher tree levels. The idea of a hierarchical, logarithmic culling procedure was first proposed in the classic work of Clark (1976). In our system, in order to perform the culling, we traverse the surfel octree in a depth-first manner. For each node, the following actions are undertaken based on the sphere circumscribing its bounding box:

• if the node is a leaf node and its bounding box is within the frustum or intersects the frustum, add all its points to the observed cloud;

• if the node is a leaf node and its bounding box is outside the frustum, skip all the points associated with the node;

• if the node is a branch node and its bounding box is within the frustum, add all the points to the observed cloud;

• if the node is a branch node and its bounding box intersects the frustum, recursively process the children of the node;

• if the node is a branch node and its bounding box is outside the frustum, skip the children of the node.

The procedure above implies that for points from leaf nodes that intersect the frustum one cannot conclude whether they lie inside or outside the frustum, and the decision must be made individually for each point. Therefore, the size of the leaf node determines the efficiency of frustum culling. There emerges a trade-off between the efficiency of the frustum culling method and the octree depth, and the overheads associated with the management of large data structures.
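The two procedures above translate almost directly into code. The following self-contained C++/Eigen sketch implements the sphere–plane test of Eqs. (1)–(2) and the depth-first leaf/branch rules of Section 4.8.2 over a simplified octree node structure; it is an illustration of the technique, not the PCL-based implementation used in the actual system.

```cpp
// Sphere-plane classification (Eqs. (1)-(2)) and recursive octree frustum culling.
#include <Eigen/Core>
#include <array>
#include <memory>
#include <vector>

struct Plane { Eigen::Vector3f n; float t; };      // n·p = -t, n points into the frustum
using Frustum = std::array<Plane, 6>;

enum class Region { Inside, Intersects, Outside };

// Classify the sphere (p0, r) against all frustum planes using d(S,P) = n·p0 + t.
Region classifySphere(const Eigen::Vector3f& p0, float r, const Frustum& frustum) {
  bool intersects = false;
  for (const Plane& pl : frustum) {
    float d = pl.n.dot(p0) + pl.t;          // signed distance, Eq. (2)
    if (d < -r) return Region::Outside;     // entirely in the negative half-space
    if (d <= r) intersects = true;          // sphere crosses this plane
  }
  return intersects ? Region::Intersects : Region::Inside;
}

struct OctreeNode {
  Eigen::Vector3f centre;                              // centre of the bounding box
  float half_diagonal;                                 // radius of the circumscribing sphere
  std::vector<int> surfel_indices;                     // filled only in leaves
  std::vector<std::unique_ptr<OctreeNode>> children;   // empty for leaves
  bool isLeaf() const { return children.empty(); }
};

void collectSurfels(const OctreeNode& node, std::vector<int>& out) {
  out.insert(out.end(), node.surfel_indices.begin(), node.surfel_indices.end());
  for (const auto& child : node.children) collectSurfels(*child, out);
}

// Depth-first traversal implementing the leaf/branch rules of Section 4.8.2.
void cullNode(const OctreeNode& node, const Frustum& f, std::vector<int>& observed) {
  Region reg = classifySphere(node.centre, node.half_diagonal, f);
  if (reg == Region::Outside) return;                  // skip the whole subtree
  if (node.isLeaf() || reg == Region::Inside) {
    collectSurfels(node, observed);                    // add all points below this node
    return;
  }
  for (const auto& child : node.children)              // Intersects: recurse into children
    cullNode(*child, f, observed);
}
```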
4.9. Surfel update. With all the surfels constituting the observed cloud determined, they can be updated with the new scan. The map update consists of two steps: surfel update and surfel addition. Before updating and adding new surfels, normal vectors are computed for the new scan. For this purpose, an effective normal computation method for RGB-D cameras was implemented, as described by Holzer et al. (2012).

In the model update procedure, the method proposed by Weise et al. (2009) and Krainin et al. (2010) is followed quite closely. For each subsequent j-th sensor scan, the j-th sensor pose with respect to the global reference frame G is obtained from visual odometry. This information is subsequently utilized in transforming the position p_i and normal vector n_i of every surfel from the current map into the current camera frame C_j. Thus, assuming the right-handed coordinate system, the surfel depth z_i is simply the z-coordinate of the surfel position expressed in C_j. Then, the surfel is projected onto the image plane and a sensor reading associated with the surfel is obtained (for efficiency the “nearest-neighbor interpolation” is used). If, based on the interpolated coordinates, no reading can be associated with the surfel (the surfel projection is out of bounds of the RGB-D image, or there is no valid reading), the surfel is not updated.

Let us denote the depth currently acquired from the sensor as z'_i. The depth reading is regarded as valid if the depth is within the bounds of a reliable reading [z_r,min, z_r,max] and the z-component of the reading normal n'_i is above a certain threshold. If the reading is regarded as valid, then, similarly to Weise et al. (2009), different update rules are used depending on the difference z'_i − z_i:

• if |z'_i − z_i| ≤ z_max, then the surfel is updated. The running averages are computed to update the point position, normal direction and color; if the new reading is taken from a closer location, also the surfel radius is updated; with each new reading associated with the surfel, the visibility confidence v_i of the surfel is incremented;

• if z'_i − z_i < −z_max, then the reading is behind the surfel, and therefore it becomes invalid or the surfel is removed, depending on the value of the surfel confidence (a static threshold is used here);

• if z'_i − z_i > z_max, then the sensor reading is ahead of the surfel, so the surfel is not updated, and the reading may be subsequently used to form a new surfel.

An illustration of the different cases of surfel update is shown in Fig. 12. Here z_max is a constant describing the maximum depth difference between the surfel and the reading still allowing the merge operation. In order to speed up computations, surfels that are outside the depth bounds of the reliable reading (increased by z_max, i.e., [z_r,min − z_max, z_r,max + z_max]) are not updated.

Fig. 12. Different cases of the surfel update procedure. The large disc is the surfel to be updated (the black dot denotes the actual reading).
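The merge rule used in the first case (|z'_i − z_i| ≤ z_max) can be sketched as a confidence-weighted running average. The snippet below is a simplified illustration, not the authors' code: it updates only position, normal, radius and confidence, and leaves out colour handling and the validity/removal logic of the other two cases.

```cpp
// Simplified sketch of the surfel-reading merge for the |z' - z| <= z_max case:
// running averages weighted by the visibility confidence v_i.
#include <Eigen/Core>

struct Surfel {
  Eigen::Vector3f position;   // in the global frame
  Eigen::Vector3f normal;
  float radius;
  int confidence;             // v_i = number of supporting views
};

void mergeReading(Surfel& s,
                  const Eigen::Vector3f& reading_position,
                  const Eigen::Vector3f& reading_normal,
                  float reading_radius) {
  const float w = static_cast<float>(s.confidence);
  s.position = (w * s.position + reading_position) / (w + 1.0f);          // running average
  s.normal   = ((w * s.normal + reading_normal) / (w + 1.0f)).normalized();
  if (reading_radius < s.radius) s.radius = reading_radius;  // reading from a closer view
  s.confidence += 1;                                         // one more supporting view
}
```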

4.10. Surfel addition. For each sensor reading with a valid depth that was not used for a surfel update, a new surfel is created by making a copy of the reading position, normal vector and color (p'_k, n'_k, c'_k). The projection of the surfel onto the image plane approximately corresponds to the camera pixel size, so the surfel radius must take into account both the distance and the normal vector orientation. The radius computation formula is adopted from the work of Weise et al. (2009) as

r_k = √2 · z'_k / ((α + β) · n'_k(z)), (3)

where α and β denote the focal lengths of the sensor along its x and y axes, z'_k is the depth of the reading and n'_k(z) is the z-component of its normal vector (both expressed in the camera frame C_j).

5. Implementation

The above described mapping system was implemented in the Robot Operating System (ROS) framework (Quigley et al., 2009) with the use of the Point Cloud Library routines (Rusu and Cousins, 2011). The system is constructed from 3 main ROS nodes: the visual_odometry node, the keyframe_mapper node, and the surfel_mapper node. The data flow between the nodes (presented in the form of ROS topics) is given in Fig. 13. The characteristic of each node is given below.

Fig. 13. Implementation-specific system architecture.

• visual_odometry node—the main node responsible for establishing the sensor position based on sensor readings. The node subscribes to a sequence of RGB and depth images from the sensor and publishes the odometry information (the current sensor pose with respect to the first sensor pose):

– /rgbd/rgb—rectified image from the RGB camera,

– /rgbd/depth—rectified image from the depth camera; the image is registered in the RGB coordinate frame,

– /camera_info—RGB (and depth) camera information regarding the camera reference frame and camera parameters,

– /tf—estimated transformation between a fixed coordinate frame and the current sensor coordinate frame;

• keyframe_mapper node—an auxiliary node used for keyframe generation and sensor path optimization. The node takes as an input visual odometry messages and RGB-D data, and outputs registered keyframes generated for sufficiently distinct sensor positions. It also performs an off-line loop closing and provides a new set of registered keyframes together with a new set of positions as /mapper_path data:

– /mapper_path—an ROS topic with path messages containing sensor positions, which either mirror the positions provided by visual_odometry (at start) or represent a corrected set of positions (after loop closure),

– /keyframes—keyframe point clouds registered in the fixed odometry frame. Registration uses sensor positions identical to /mapper_path;

• surfel_mapper node—the main node used for surfel map creation. Its input: registered point cloud keyframes and estimated sensor positions (/mapper_path). Its output: snapshots of the constructed surfel map:

– /surfelmap—surfel information from the specified bounding box, published in the form of ROS markers,

– /surfelmap_preview—fast low resolution preview of the full constructed surfel map (for each octree leaf only a single point is generated (see Fig. 14)).

The ROS implementation of the system described in this paper is available at github.com/piappl/robrex_mapping.git. For the visual_odometry and keyframe_mapper nodes, an implementation provided at github.com/ccny-ros-pkg/ccny_rgbd_tools was used.
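As an illustration of how another ROS node can consume the published data, the following minimal roscpp listener subscribes to /mapper_path. The nav_msgs/Path message type is an assumption made for this sketch, since the text above only states that the topic carries sensor positions.

```cpp
// Minimal roscpp sketch of a consumer of the /mapper_path topic (message type assumed).
#include <ros/ros.h>
#include <nav_msgs/Path.h>

void pathCallback(const nav_msgs::Path::ConstPtr& msg) {
  // Each pose corresponds to a (possibly loop-closure-corrected) keyframe position.
  ROS_INFO("received path with %d poses", static_cast<int>(msg->poses.size()));
}

int main(int argc, char** argv) {
  ros::init(argc, argv, "mapper_path_listener");
  ros::NodeHandle nh;
  ros::Subscriber sub = nh.subscribe("/mapper_path", 1, pathCallback);
  ros::spin();
  return 0;
}
```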

6. Experiments

6.1. Experimental setup. The experiments were performed on sequences of synchronized RGB and depth images. The first set of three sequences was recorded using the Kinect camera in PIAP laboratories. The three sequences used are: the “PIAP Office Room” (PIAP OR for short) sequence, the “PIAP Room and Corridor” (PIAP RC) sequence (containing the same room with the corridor fragment and the stairway) and the “PIAP Hangar” (PIAP H) sequence (representing a walk around a 17 × 8 m robot hangar). The sequences here are given in the order of the increasing size of the mapped area. Additionally, for demonstration purposes, some preliminary experiments were also performed on a shorter “PIAP Desk” sequence.

In addition to our own sequences, we wanted to experimentally verify our system on some publicly available datasets, which would enable other researchers to compare their results with ours. We decided to use the dataset proposed by Handa et al. (2014) (Imperial College London, ICL), containing two artificially generated scenes, dedicated in principle to the benchmarking of visual SLAM. The dataset is partitioned into two subsets: “Living Room” (ICL LR for short) and “Office Room” (ICL OR), each containing four sequences. Each sequence comes in two flavors—the pure rendered one and that with noise added.

Table 1. Visual evaluation of VO quality (without and with loop closing turned on) for the test sequences. The following notation is used: “–” (scene geometry is visually wrong), “+/–” (scene geometry is generally fine, but some artifacts can be easily spotted, e.g., “doubled” surfaces), “+” (scene geometry is fine, there are no easily perceivable artifacts).

Sequence                      no loop    loop
“PIAP OR”                     +          +
“PIAP RC”                     +/–        +/–
“PIAP H”                      –          +/–
“ICL LR 0”                    +          +
“ICL LR 0 (with noise)”       +/–        +
“ICL LR 1”                    +/–        +
“ICL LR 1 (with noise)”       +/–        +
“ICL LR 2”                    +/–        +
“ICL LR 2 (with noise)”       –          –
“ICL LR 3”                    –          –
“ICL LR 3 (with noise)”       –          –
“ICL OR 0”                    +/–        +/–
“ICL OR 0 (with noise)”       +/–        +
“ICL OR 1”                    +/–        –
“ICL OR 1 (with noise)”       +/–        +
“ICL OR 2”                    +          +
“ICL OR 2 (with noise)”       +/–        +
“ICL OR 3”                    +          +
“ICL OR 3 (with noise)”       +/–        +

Fig. 14. Preview of the “room” sequence. Leaf nodes of the surfel map octree are displayed.

During the experiments, the visual_odometry module was used to provide visual odometry information and the keyframe_mapper module generated a sequence of keyframes. The keyframes were subsequently used by the surfel_mapper module for surfel map creation. The octree preview of the “PIAP OR” sequence is given in Fig. 14. In a typical scenario, all three modules, as well as the data acquisition process, work simultaneously. For our PIAP test sequences, we were able to process incoming data in real time. The surfel_mapper module, the visual_odometry module and the keyframe_mapper module were all run in separate threads.

6.2. Visual odometry performance. Although the evaluation of the third-party fast visual odometry algorithm is not the main topic of this paper, some crude quality assessment turned out to be necessary in order to ensure the reliability of visual odometry data for subsequent benchmarking of our surfel representation. Each sequence was used for generating the map with explicit loop closing turned on or off. The results were visually inspected and are given in Table 1. In addition, for sequences with ground truth trajectories available we computed absolute trajectory errors (ATEs) and relative pose errors (RPEs). These results are given in Table 4. Visualizations of trajectories together with ground truth and residuals are given in Fig. 15. The analysis of the quantitative results confirms the visual inspection of map quality.

During the experiments it was established that fast visual odometry and mapping is prone to failures in the case of mapping large surfaces with a small number of features (such as large walls or empty corridors). This is basically a result of the fact that FVOM performance is based on the utilization of sparse features. This applied both to basic mapping (using ICP on a sparse feature map) and mapping with loop closing (which matches features globally). For larger sequences (e.g., “PIAP H”), the basic ICP mapping fails to properly close the sequence loop, which is to a large extent amended by the application of the explicit loop closing feature. What is more, the utilization of the off-line loop closing increased the mapping accuracy for all sequences, which is reflected, e.g., in a smaller (by a few percent) number of surfels constituting the resulting scene (due to relatively more surfel updates).

For the “ICL LR” test sequences it was possible to evaluate the odometry quality by comparing the resulting map with the ground truth model of the room. Such a comparison was performed on two sequences (“ICL LR 0 (with noise)” and “ICL LR 1 (with noise)”; cf. Table 2). It must be noted that the mean distance between the ground truth model and the map estimated using visual odometry is very close to that of the map obtained based on the ground truth trajectory. For ground truth trajectories the error is between 6 and 9 mm, whereas for visual odometry the error is between 9 and 11 mm, which we found acceptable. The results also suggest some noise reduction capabilities of surfel update over simple point summation, as the map constructed using surfel update gives a 1–2 mm smaller average distance-to-ground-truth than the reconstruction utilizing simple point summation.

Table 4. Generated trajectories compared to ground truth trajectories. The RMSE of the absolute trajectory error (ATE) and the relative pose error (RPE) (translational (t-RPE) and rotational (r-RPE) components) are given. Procedures proposed by Handa et al. (2014) were used for benchmarking.

                            no loop RMSE                     loop RMSE
Sequence                  ATE [m]  t-RPE [m]  r-RPE [deg]  ATE [m]  t-RPE [m]  r-RPE [deg]
ICL LR 0                  0.025    0.039      2.765        0.012    0.021      1.062
ICL LR 0 (with noise)     0.036    0.066      2.931        0.018    0.030      1.487
ICL LR 1                  0.103    0.147      4.924        0.011    0.022      0.647
ICL LR 1 (with noise)     0.143    0.204      5.968        0.013    0.020      0.752
ICL LR 2                  0.068    0.105      3.983        0.013    0.023      0.761
ICL LR 2 (with noise)     1.063    1.770      86.633       1.057    1.762      86.627
ICL LR 3                  0.413    0.610      11.179       0.080    0.155      5.405
ICL LR 3 (with noise)     0.498    0.727      8.926        0.134    0.245      5.434
ICL OR 0                  0.025    0.040      1.855        0.022    0.033      0.971
ICL OR 0 (with noise)     0.068    0.120      7.475        0.027    0.043      1.355
ICL OR 1                  0.041    0.070      1.823        0.306    1.017      74.496
ICL OR 1 (with noise)     0.061    0.106      2.691        0.182    0.368      8.868
ICL OR 2                  0.021    0.031      1.074        0.012    0.019      0.397
ICL OR 2 (with noise)     0.098    0.152      3.535        0.022    0.036      1.059
ICL OR 3                  0.028    0.048      0.760        0.011    0.023      0.889
ICL OR 3 (with noise)     0.059    0.092      2.527        0.038    0.059      1.935
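For orientation, the translational ATE RMSE reported above boils down to the following computation. This is a simplified sketch: it assumes the two trajectories are already time-associated and expressed in a common frame, whereas the benchmark scripts of Handa et al. (2014) additionally align them before evaluation.

```cpp
// Simplified absolute trajectory error (ATE) RMSE: root-mean-square of translational
// distances between corresponding estimated and ground truth positions.
#include <Eigen/Core>
#include <algorithm>
#include <cmath>
#include <vector>

double ateRmse(const std::vector<Eigen::Vector3d>& estimated,
               const std::vector<Eigen::Vector3d>& ground_truth) {
  const std::size_t n = std::min(estimated.size(), ground_truth.size());
  if (n == 0) return 0.0;
  double sum_sq = 0.0;
  for (std::size_t i = 0; i < n; ++i)
    sum_sq += (estimated[i] - ground_truth[i]).squaredNorm();
  return std::sqrt(sum_sq / static_cast<double>(n));
}
```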

Table 2. Comparison of the map with a ground truth 3D model. The result is presented as the mean distance between map points and the ground truth cloud. The comparison was made using real (ground truth) trajectories (denoted as GT) as well as visual odometry trajectories (denoted as VO). Map types comprise point-based maps and surfel-based maps. Procedures proposed by Handa et al. (2014) with modifications given at www.doc.ic.ac.uk/~ahanda/VaFRIC/living_room.html (accessed July 2015) were used for benchmarking.

Sequence                    traj.  type     mean dist. [m]
“ICL LR 0 (with noise)”     GT     points   0.008
                            GT     surfels  0.006
                            VO     surfels  0.009
“ICL LR 1 (with noise)”     GT     points   0.010
                            GT     surfels  0.009
                            VO     surfels  0.011

Table 3. Basic data describing the benchmarking sequences. “keyframes” is the number of keyframes generated by the keyframe_mapper module, “points” is the total number of scans considered usable for map creation.

Sequence                    keyframes   points
“ICL OR 2”                  86          14 963 707
“ICL OR 2 (with noise)”     84          13 392 091
“ICL LR 2”                  81          13 751 353
“PIAP OR”                   171         26 613 260
“PIAP RC”                   355         39 271 937
“PIAP H”                    367         37 844 437

Eventually, for detailed benchmarking of the surfel map and frustum culling, 6 sequences were selected: “PIAP OR”, “PIAP RC”, “PIAP H”, “ICL OR 2”, “ICL OR 2 (with noise)”, “ICL LR 2”. From among the synthetic sequences, those covering the largest portion of the “virtual room” were selected. A brief characteristic of the sequences used in the benchmarks is given in Table 3.

6.3. Surfel data compression. One of the most important features of a surfel map representation is the compression of incoming point clouds with as low an information loss as possible. Some point compression was provided by the utilization of keyframes, captured only when the sensor position significantly changed. However, even so, each surface in the map was captured from several sensor positions and thousands of data merging procedures were applied for each frame.

During the experiments it was observed that the system performs well concerning compression of incoming points, and redundant surfels appear very rarely. In Fig. 15 we may see small details of the mapped scenes. In both the figures we observe small cracks between surfels, and the number of overlapping surfels is relatively small. A larger overlap can be observed on surfaces that are almost parallel to the sensor optical axis (e.g., a table in Fig. 15(a)). This is mainly due to the fact that the projection of the (circular) pixels on the surface under such circumstances results in elongated ellipses, which must be approximated by circles.

Fig. 15. Selected details of surfel maps for the sequences "PIAP Desk" and "PIAP OR": teabox on the desktop (a), flower pot (b).

In our experiments we set the scan merging distance to quite a large value of d_max = 5 cm. We believe that this value is appropriate for mapping larger spaces such as rooms, since it enables surfel-scan merging even for imperfectly aligned scans, especially at a larger distance from the sensor. However, such a value may lead to merging finer details of objects into more prominent surfaces. Therefore, for individual objects, smaller values are recommended.

Our intuitive notion of the good compression capabilities of the surfel representation is also supported by quantitative experiments performed on the benchmark sequences. In Fig. 17 we compare a scenario where all the readings from keyframes become part of the map with the scenario of using our surfel mapping approach. The results are the numbers of points/surfels in the map as a function of the frame number. As may be seen, by the end of the sequence the number of surfels is from 4.5 to 6.5 times smaller than the number of cloud points in the simple method, depending on the sequence. We will later show that such a discrepancy may have a serious impact on the complexity of subsequent processing as well as the quality of results.

The level of participation of scans in the surfel update procedure varies between sequences and also inside sequences. In our benchmarks, the average level of participation was about 84% for artificial ICL sequences without noise and between 70% and 80% for real sequences and artificial sequences with noise added. In most cases, the level of participation was above 60%, with occasional valleys.

6.4. Frustum culling. We performed several benchmarks to see if and to what extent the frustum culling method improves the surfel update step.

The first experimental results, presented in Fig. 18, concern the number of surfels that need to be transformed into the camera frame in order to perform reprojection onto the image plane at a new sensor position. In the case of the update without utilization of frustum culling, the number of surfels grows steadily with each subsequent input frame (as it depends on the total number of surfels in the scene). On the contrary, the number of surfels transformed using the culling method is generally much smaller and depends on the complexity of the actual scene fragment observed by the sensor, rather than on the actual frame index. More precisely, peaks in the chart correspond to exploration of already known areas of the map, whereas valleys correspond to exploration of new areas. The ratio between necessary surfel transformations with and without frustum culling is the lowest for sequences in small confined spaces (e.g., the "ICL OR" sequences, which produce only slightly above 2 mln surfels), and the largest when mapping larger areas with only seldom "remapping" of already known space (the "PIAP RC" and "PIAP H" sequences, which produce 8–9 mln surfels). For instance, for the "PIAP H" sequence the average ratio is over 18.7, while for "ICL OR 2" it is only 3.9.

The time of single surfel processing varies. It depends mainly on how many processing steps are applied to each surfel. The surfel processing chain may consist of surfel rigid transformation, surfel projection onto the image plane and surfel update. Therefore, in addition to the number of surfels that are taken into account during processing, it is worth comparing the actual processing time of each keyframe when using, or not, the frustum culling method.
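To make the processing chain above concrete, the following sketch shows one possible per-keyframe update loop combining frustum culling, surfel transformation, projection and merging. It is only an illustration under stated assumptions, not the authors' implementation: the Surfel and PinholeCamera types, the surfelsInFrustum() query and the running-average merge rule are introduced here, while the merging threshold corresponds to the d_max = 5 cm used in the experiments.

```cpp
#include <Eigen/Dense>
#include <cmath>
#include <cstddef>
#include <vector>

// Illustrative sketch of a per-keyframe surfel-map update with frustum culling.
// All types, names and parameters are assumptions made for this example.
struct Surfel {
    Eigen::Vector3f position;   // surfel centre, global frame
    Eigen::Vector3f normal;     // surfel normal, global frame
    int observations;           // number of merged measurements
};

struct PinholeCamera {
    float fx, fy, cx, cy;       // intrinsics
    int width, height;          // image size [px]
    float z_near, z_far;        // valid depth range [m]

    // Projects a camera-frame point; false if outside the depth range or image.
    bool project(const Eigen::Vector3f& p, int& u, int& v) const {
        if (p.z() < z_near || p.z() > z_far) return false;   // fast z-range check
        u = static_cast<int>(std::lround(fx * p.x() / p.z() + cx));
        v = static_cast<int>(std::lround(fy * p.y() / p.z() + cy));
        return u >= 0 && u < width && v >= 0 && v < height;
    }
};

// Stand-in for the octree frustum query: a brute-force test is used here,
// whereas the actual system prunes whole octree branches against the frustum
// planes, so most surfels are never visited at all.
std::vector<std::size_t> surfelsInFrustum(const std::vector<Surfel>& map,
                                          const Eigen::Isometry3f& world_to_cam,
                                          const PinholeCamera& cam) {
    std::vector<std::size_t> out;
    int u, v;
    for (std::size_t i = 0; i < map.size(); ++i)
        if (cam.project(world_to_cam * map[i].position, u, v)) out.push_back(i);
    return out;
}

// One keyframe: measured points and normals, already expressed in the global frame.
struct Keyframe {
    std::vector<Eigen::Vector3f> points;
    std::vector<Eigen::Vector3f> normals;
};

void updateMap(std::vector<Surfel>& map, const Keyframe& kf,
               const Eigen::Isometry3f& world_to_cam, const PinholeCamera& cam,
               float d_max = 0.05f) {  // scan merging distance (5 cm in the experiments)
    std::vector<int> hit(static_cast<std::size_t>(cam.width) * cam.height, -1);

    // 1. Frustum culling, rigid transformation and projection of map surfels.
    for (std::size_t idx : surfelsInFrustum(map, world_to_cam, cam)) {
        int u, v;
        if (cam.project(world_to_cam * map[idx].position, u, v))
            hit[static_cast<std::size_t>(v) * cam.width + u] = static_cast<int>(idx);
    }

    // 2. Surfel update (merge) or surfel addition for each measured point.
    for (std::size_t i = 0; i < kf.points.size(); ++i) {
        int u, v;
        if (!cam.project(world_to_cam * kf.points[i], u, v)) continue;
        const int idx = hit[static_cast<std::size_t>(v) * cam.width + u];
        if (idx >= 0 && (map[idx].position - kf.points[i]).norm() < d_max) {
            Surfel& s = map[idx];                        // merge: running average
            const float w = static_cast<float>(s.observations);
            s.position = (w * s.position + kf.points[i]) / (w + 1.0f);
            s.normal   = (w * s.normal + kf.normals[i]).normalized();
            s.observations += 1;
        } else {
            map.push_back({kf.points[i], kf.normals[i], 1});  // new surfel
        }
    }
}
```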

Fig. 16. Comparison of estimated trajectories (gray) with ground truth trajectories (black). The difference is marked in light-gray. In all cases, loop closing was explicitly utilized. Procedures proposed by Handa et al. (2014) were used for benchmarking. Panels: "ICL LR 0"–"ICL LR 3" and "ICL OR 0"–"ICL OR 3", each without and with noise.

In Fig. 19 we present a comparison of the time spent on the update and addition step for each subsequent frame of the benchmark sequences, while in Fig. 20 we compare the total time of frame processing for a surfel mapper with frustum culling turned on and off.

Those figures show that the surfel update method using frustum culling offers superior performance over the method that does not use it. The difference, however, is not as prominent as in the case of comparing pure numbers of processed surfels. The main reason is that most surfels can be quickly rejected shortly after transformation using a z-range check, and the processing of "valid" surfels consumes a significant part of the resources for both the culling and no-culling methods. However, for larger scenes, especially when new areas are scanned (so only a limited number of surfels actually participate in the update), the surfel transformation overhead must eventually dominate. This is confirmed by analyzing the processing times for different sequences. It can be noticed that for longer sequences (such as "PIAP H" or "PIAP RC") the frame processing time can even be considered constant (disregarding fluctuations) when applying frustum culling, while the processing time in the case of no frustum culling is a constantly growing function (again disregarding fluctuations).

Taking as an example the "PIAP H" sequence, in the case of the frustum culling-based update the average surfel update and addition time for the whole sequence was about 24 ms and the total frame processing time was 58 ms (with 27 ms spent on normals' computation). For the method without frustum culling the processing times were equal to 63 ms and 96 ms, respectively (with comparable time for normals' computation). Thus, the speedup gained due to using frustum culling was 2.62 regarding only surfel update and addition and 1.65 regarding the total frame processing time.

Fig. 17. Comparison of cloud sizes when using the simple point addition model and the surfel update model. Sequences "ICL OR 2" (a), "ICL OR 2 (with noise)" (b), "ICL LR 2" (c), "PIAP OR" (d), "PIAP RC" (e), "PIAP H" (f). (Each panel plots the cloud size against the frame number for the point addition and surfel update variants.)

Fig. 18. Comparison of the number of surfels that need to be transformed into a sensor frame (and possibly projected) with frustum culling and without it. Sequences "ICL OR 2" (a), "ICL OR 2 (with noise)" (b), "ICL LR 2" (c), "PIAP OR" (d), "PIAP RC" (e), "PIAP H" (f). (Each panel plots the number of transformed surfels against the frame number for the culling and no-culling variants.)

6.5. Frustum efficiency. In order to apply the frustum culling technique, we used an octree with a leaf size of 20 cm. To evaluate the impact of this parameter on our test sequence "PIAP OR", we collected information on the current number of surfels in the scene, the number of surfels pruned using frustum culling and the number of surfels within the frustum which gave valid projections onto the image plane.

Fig. 19. Surfel processing times (comprising surfel update and addition steps). Sequences "ICL OR 2" (a), "ICL OR 2 (with noise)" (b), "ICL LR 2" (c), "PIAP OR" (d), "PIAP RC" (e), "PIAP H" (f). (Each panel plots the surfels' update and addition time (s) against the frame number for the culling and no-culling variants, together with their difference.)

Fig. 20. Total frame processing times. Sequences "ICL OR 2" (a), "ICL OR 2 (with noise)" (b), "ICL LR 2" (c), "PIAP OR" (d), "PIAP RC" (e), "PIAP H" (f). (Each panel plots the frame processing time (s) against the frame number for the culling and no-culling variants, together with their difference.)

As a result, we extracted two measures: culling efficiency (surfels pruned vs. all surfels in the scene) and frustum precision (surfels correctly projected on the image plane vs. surfels in the frustum). The results are given in Fig. 21.

Based on the results, we may say that the frustum precision measure rarely falls below 70% for all sequences. For the "PIAP OR" sequence, the average was 83% for the whole sequence, which means that only 17% of the surfels had to be unnecessarily transformed into the camera frame. The figures are similar for the remaining sequences (81–88%). This means that there is little justification for further decreasing the octree leaf size, since the expected benefits might be insignificant compared with possible overheads related to the management of a larger octree structure.
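Expressed as formulas (the symbols below are introduced here only for illustration), the two per-keyframe measures are:

```latex
% Per-keyframe evaluation measures used in Fig. 21 (notation introduced here):
%   N_total   -- all surfels currently stored in the map,
%   N_pruned  -- surfels rejected by the octree-based frustum culling,
%   N_frustum -- surfels reported as lying inside the frustum,
%   N_proj    -- surfels that yield valid projections onto the image plane.
\begin{equation*}
  \text{culling efficiency} = \frac{N_{\mathrm{pruned}}}{N_{\mathrm{total}}},
  \qquad
  \text{frustum precision} = \frac{N_{\mathrm{proj}}}{N_{\mathrm{frustum}}}.
\end{equation*}
```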

Fig. 21. Evaluation of octree frustum efficiency for the octree leaf size equal to 20 cm. Culling efficiency is computed as a ratio of the number of surfels pruned to all surfels in the scene. Frustum precision is computed as a ratio of the surfels correctly projected on the image plane to all the surfels in the frustum. Sequences "ICL OR 2" (a), "ICL OR 2 (with noise)" (b), "ICL LR 2" (c), "PIAP OR" (d), "PIAP RC" (e), "PIAP H" (f). (Each panel plots both measures, in percent, against the frame number.)

While frustum precision is not dependent on the actual keyframe index, the culling efficiency value quickly grows when new keyframes are acquired. Taking the "PIAP OR" sequence as an example, culling efficiency reaches the level of about 90% and stabilizes when we start to reinspect areas already mapped. It is interesting to note that around the 150th frame the value suddenly drops, when the sensor reaches an area with a view densely packed with objects. This is accompanied by a peak in the frame processing time charts (see Fig. 20). For the "PIAP OR" sequence the average value of culling efficiency is about 78%, whereas the average for the last 130 frames is 85%. The value is even higher for large mapping areas (91% average for the whole "PIAP H" sequence), but lower for small mapping areas (51% for the "ICL OR 2" sequence).

The culling efficiency and frustum precision measures may not reflect well the actual performance gain obtained by using frustum culling, given a specific octree leaf size. For instance, using a fine-grained octree will result in very high frustum precision scores and also higher culling efficiency.

However, the overall impact on performance may be negative due to the overhead associated with the management of a large octree index. Therefore, for the "PIAP OR" sequence we conducted a performance experiment, measuring the average time of the surfel update and surfel addition steps for different sizes of the octree leaf node. The results are presented in Fig. 22. Note that the best performance was obtained for an octree leaf size varying between 20 and 40 cm. The former value was consistently applied in other experiments.

Fig. 22. Octree leaf node size vs. time spent on the surfel update and surfel addition steps for the "PIAP OR" sequence. (The plot shows the average surfel update and addition time (s) against the octree node size on a logarithmic scale, from 0.01 to 10.)

7. Conclusions

Typical approaches to surfel map creation (e.g., Henry et al., 2012) suffer from efficiency problems, which question their applicability to robotic tasks, where on-line processing is important. Two sources of these problems were identified. The first one is the typical use of an incremental, dense point cloud registration algorithm (for the purpose of visual odometry) based on the ICP method, which is highly resource consuming. The second reason is the surfel update procedure, which is slow for sufficiently large data clouds.

In the proposed system, both of the above problems are addressed and resolved. The incremental ICP is replaced by a sparse ICP algorithm, as proposed by Dryanovski et al. (2013). The algorithm is more efficient than its dense version and is able to work at a rate of even 60 fps. What is more important, it aligns the new scan against multiple previous ones, eliminating the need for a loop closure of small areas.

Secondly, a frustum culling procedure is applied in order to eliminate as many surfels as possible from the computationally expensive surfel update step. The surfel map is indexed by an octree and the culling procedure exploits the hierarchical map structure.

Taking into account the proposed improvements, a mapping system was constructed and implemented under the popular Robot Operating System (ROS). Finally, an experimental validation of this system was performed. The experiments carried out on RGB-D image sequences obtained by the Kinect sensor showed the superior efficiency of the proposed surfel-level mapping over simple point cloud summation.

The surfel update step using the proposed culling method performs on average over two times faster than a step without it. Most likely, this ratio will grow when larger scenes are processed. For the test sequences, the frame processing time rarely exceeded 100 ms, which allows on-line processing, since the acquisition rate of key frames is usually lower than 0.1 fps.

For those reasons the proposed surfel map representation and processing methods constitute very promising tools for a 3D occupancy map representation in V-SLAM systems based on modern, high-framerate sensors such as the Microsoft Kinect or ASUS XTion. Their most important advantages are strong data compression and noise reduction abilities, accompanied by minimal data loss.

The main area for further improvement targets the problem of global optimization. So far the approach has included a loop-closure method, supplementing the FVOM module in an off-line manner (graph generation in the case of longer test sequences takes several minutes). Recently, new approaches have emerged that enable fast, on-line loop closure (Kawewong et al., 2013). It would be beneficial to incorporate such an approach in the mapping system, to obtain coherent large-scale maps in real time. Another interesting area of improvement is the utilization of higher-level knowledge concerning surface smoothing and noise reduction of the acquired surfel map.

Acknowledgment

The authors gratefully acknowledge the financial support of the National Centre for Research and Development (Poland), grant no. PBS1/A3/8/2012. The initial version of this work was presented at the special session on Robotic perception employing RGB-D images during the 13th National Conference on Robotics in Kudowa Zdrój, Poland, 2014.

References

Alahi, A., Ortiz, R. and Vandergheynst, P. (2012). FREAK: Fast retina keypoint, 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Providence, RI, USA, pp. 510–517.

Calonder, M., Lepetit, V., Strecha, C. and Fua, P. (2010). BRIEF: Binary robust independent elementary features, Computer Vision (ECCV 2010), Heraklion, Greece, pp. 778–792.

Censi, A. (2008). An ICP variant using a point-to-line metric, IEEE International Conference on Robotics and Automation, ICRA 2008, Pasadena, CA, USA, pp. 19–25.

Clark, J.H. (1976). Hierarchical geometric models for visible surface algorithms, Communications of the ACM 19(10): 547–554.

Dryanovski, I., Valenti, R. and Xiao, J. (2013). Fast visual odometry and mapping from RGB-D data, 2013 IEEE International Conference on Robotics and Automation (ICRA), Karlsruhe, Germany, pp. 2305–2310.

Endres, F., Hess, J., Engelhard, N., Sturm, J., Cremers, D. and Burgard, W. (2012). An evaluation of the RGB-D SLAM system, 2012 IEEE International Conference on Robotics and Automation (ICRA), St Paul, MN, USA, pp. 1691–1696.

Figat, J., Kornuta, T. and Kasprzak, W. (2014). Performance evaluation of binary descriptors of local features, in L.J. Chmielewski et al. (Eds.), Proceedings of the International Conference on Computer Vision and Graphics, Lecture Notes in Computer Science, Vol. 8671, Springer, Berlin/Heidelberg, pp. 187–194.

Fischler, M.A. and Bolles, R.C. (1981). Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography, Communications of the ACM 24(6): 381–395.

Frome, A., Huber, D., Kolluri, R., Bülow, T. and Malik, J. (2004). Recognizing objects in range data using regional point descriptors, Computer Vision (ECCV 2004), Prague, Czech Republic, pp. 224–237.

Gribb, G. and Hartmann, K. (2001). Fast extraction of viewing frustum planes from the world-view-projection matrix, http://gamedevs.org.

Handa, A., Whelan, T., McDonald, J. and Davison, A.J. (2014). A benchmark for RGB-D visual odometry, 3D reconstruction and SLAM, 2014 IEEE International Conference on Robotics and Automation (ICRA), Hong Kong, China, pp. 1524–1531.

Henry, P., Krainin, M., Herbst, E., Ren, X. and Fox, D. (2012). RGB-D mapping: Using Kinect-style depth cameras for dense 3D modeling of indoor environments, International Journal of Robotic Research 31(5): 647–663.

Henry, P., Krainin, M., Herbst, E., Ren, X. and Fox, D. (2014). RGB-D mapping: Using depth cameras for dense 3D modeling of indoor environments, in O. Khatib et al. (Eds.), Experimental Robotics, Springer, Berlin/Heidelberg, pp. 477–491.

Holzer, S., Rusu, R., Dixon, M., Gedikli, S. and Navab, N. (2012). Adaptive neighborhood selection for real-time surface normal estimation from organized point cloud data using integral images, Intelligent Robots and Systems (IROS), Vilamoura-Algarve, Portugal, pp. 2684–2689.

Hornung, A., Wurm, K.M., Bennewitz, M., Stachniss, C. and Burgard, W. (2013). OctoMap: An efficient probabilistic 3D mapping framework based on octrees, Autonomous Robots 34(3): 189–206.

Izadi, S., Kim, D., Hilliges, O., Molyneaux, D., Newcombe, R., Kohli, P., Shotton, J., Hodges, S., Freeman, D., Davison, A. and Fitzgibbon, A. (2011). KinectFusion: Real-time 3D reconstruction and interaction using a moving depth camera, Proceedings of the 24th Annual ACM Symposium on User Interface Software and Technology, UIST'11, Santa Barbara, CA, USA, pp. 559–568.

Kasprzak, W. (2010). Integration of different computational models in a computer vision framework, 2010 International Conference on Computer Information Systems and Industrial Management Applications (CISIM), Cracow, Poland, pp. 13–18.

Kasprzak, W., Pietruch, R., Bojar, K., Wilkowski, A. and Kornuta, T. (2015). Integrating data- and model-driven analysis of RGB-D images, in D. Filev et al. (Eds.), Proceedings of the 7th IEEE International Conference Intelligent Systems IS'2014, Advances in Intelligent Systems and Computing, Vol. 323, Springer, Berlin/Heidelberg, pp. 605–616.

Kawewong, A., Tongprasit, N. and Hasegawa, O. (2013). A speeded-up online incremental vision-based loop-closure detection for long-term SLAM, Advanced Robotics 27(17): 1325–1336.

Konolige, K. (2010). Sparse bundle adjustment, Proceedings of the British Machine Vision Conference, Aberystwyth, UK, pp. 102.1–102.11, DOI: 10.5244/C.24.102.

Konolige, K., Agrawal, M. and Sola, J. (2011). Large-scale visual odometry for rough terrain, in M. Kaneko and Y. Nakamura (Eds.), Robotics Research, Springer, Berlin/Heidelberg, pp. 201–212.

Koshy, G. (2014). Calculating OpenGL perspective matrix from OpenCV intrinsic matrix, http://kgeorge.github.io.

Krainin, M., Henry, P., Ren, X. and Fox, D. (2010). Manipulator and object tracking for in hand 3D object modeling, Technical Report UW-CSE-10-09-01, University of Washington, Seattle, WA.

Kummerle, R., Grisetti, G., Strasdat, H., Konolige, K. and Burgard, W. (2011). g2o: A general framework for graph optimization, 2011 IEEE International Conference on Robotics and Automation (ICRA), Shanghai, China, pp. 3607–3613.

Leutenegger, S., Chli, M. and Siegwart, R. (2011). BRISK: Binary robust invariant scalable keypoints, 2011 IEEE International Conference on Computer Vision (ICCV), Barcelona, Spain, pp. 2548–2555.

Lowe, D.G. (1999). Object recognition from local scale-invariant features, Proceedings of the International Conference on Computer Vision, ICCV'99, Kerkyra, Greece, Vol. 2, pp. 1150–1157.

Luck, J., Little, C. and Hoff, W. (2000). Registration of range data using a hybrid simulated annealing and iterative closest point algorithm, ICRA'00: IEEE International Conference on Robotics and Automation, San Francisco, CA, USA, Vol. 4, pp. 3739–3744.

Łępicka, M., Kornuta, T. and Stefańczyk, M. (2016). Utilization of colour in ICP-based point cloud registration, in R. Burduk et al. (Eds.), Proceedings of the 9th International Conference on Computer Recognition Systems (CORES 2015), Advances in Intelligent Systems and Computing, Vol. 403, Springer, Berlin/Heidelberg, pp. 821–830.

Mair, E., Hager, G.D., Burschka, D., Suppa, M. and Hirzinger, G. (2010). Adaptive and generic corner detection based on the accelerated segment test, Computer Vision (ECCV 2010), Hersonissos, Greece, pp. 183–196.

Marder-Eppstein, E., Berger, E., Foote, T., Gerkey, B. and Konolige, K. (2010). The office marathon: Robust navigation in an indoor office environment, 2010 IEEE International Conference on Robotics and Automation (ICRA), Anchorage, AK, USA, pp. 300–307.

Martínez Mozos, O., Triebel, R., Jensfelt, P., Rottmann, A. and Burgard, W. (2007). Supervised semantic labeling of places using information extracted from sensor data, Robotics and Autonomous Systems 55(5): 391–402.

May, S., Dröschel, D., Fuchs, S., Holz, D. and Nuchter, A. (2009). Robust 3D-mapping with time-of-flight cameras, IEEE/RSJ International Conference on Intelligent Robots and Systems, 2009, St. Louis, MO, USA, pp. 1673–1678.

Men, H., Gebre, B. and Pochiraju, K. (2011). Color point cloud registration with 4D ICP algorithm, 2011 IEEE International Conference on Robotics and Automation (ICRA), Shanghai, China, pp. 1511–1516.

Miksik, O. and Mikolajczyk, K. (2012). Evaluation of local detectors and descriptors for fast feature matching, 21st International Conference on Pattern Recognition (ICPR), Tsukuba, Japan, pp. 2681–2684.

Muja, M. and Lowe, D.G. (2009). Fast approximate nearest neighbors with automatic algorithm configuration, International Conference on Computer Vision Theory and Application VISSAPP'09, Lisbon, Portugal, pp. 331–340.

Muja, M. and Lowe, D.G. (2012). Fast matching of binary features, 9th Conference on Computer and Robot Vision (CRV), Toronto, Canada, pp. 404–410.

Nistér, D., Naroditsky, O. and Bergen, J. (2004). Visual odometry, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2004, Washington, DC, USA, Vol. 1, pp. I–652.

Nowicki, M. and Skrzypczyński, P. (2014). Performance comparison of point feature detectors and descriptors for visual navigation on android platform, 2014 International Wireless Communications and Mobile Computing Conference (IWCMC), Nicosia, Cyprus, pp. 116–121.

Pfister, H., Zwicker, M., van Baar, J. and Gross, M. (2000). Surfels: Surface elements as rendering primitives, Proceedings of the 27th Annual Conference on Computer Graphics and Interactive Techniques, SIGGRAPH'00, New Orleans, LA, USA, pp. 335–342.

Pomerleau, F., Colas, F., Siegwart, R. and Magnenat, S. (2013). Comparing ICP variants on real-world data sets, Autonomous Robots 34(3): 133–148.

Quigley, M., Gerkey, B., Conley, K., Faust, J., Foote, T., Leibs, J., Berger, E., Wheeler, R. and Ng, A. (2009). ROS: An open-source robot operating system, Proceedings of the Open-Source Software Workshop at the International Conference on Robotics and Automation (ICRA), Kobe, Japan.

Rosten, E. and Drummond, T. (2006). Machine learning for high-speed corner detection, in A. Leonardis et al. (Eds.), Computer Vision (ECCV 2006), Lecture Notes in Computer Science, Vol. 3951, Springer, Berlin/Heidelberg, pp. 430–443.

Rusu, R.B., Bradski, G., Thibaux, R. and Hsu, J. (2010). Fast 3D recognition and pose using the viewpoint feature histogram, 2010 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Taipei, Taiwan, pp. 2155–2162.

Rusu, R.B. and Cousins, S. (2011). 3D is here: Point Cloud Library (PCL), International Conference on Robotics and Automation, Shanghai, China, pp. 1–4.

Shi, J. and Tomasi, C. (1994). Good features to track, 1994 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR'94, Seattle, WA, USA, pp. 593–600.

Sipiran, I. and Bustos, B. (2011). Harris 3D: A robust extension of the Harris operator for interest point detection on 3D meshes, The Visual Computer 27(11): 963–976.

Skrzypczyński, P. (2009). Simultaneous localization and mapping: A feature-based probabilistic approach, International Journal of Applied Mathematics and Computer Science 19(4): 575–588, DOI: 10.2478/v10006-009-0045-z.

Song-Ho, A. (2013). OpenGL projection matrix, http://www.songho.ca/opengl/gl_projectionmatrix.html.

Steder, B., Rusu, R.B., Konolige, K. and Burgard, W. (2011). Point feature extraction on 3D range scans taking into account object boundaries, 2011 IEEE International Conference on Robotics and Automation (ICRA), Shanghai, China, pp. 2601–2608.

Thrun, S. and Leonard, J.J. (2008). Simultaneous localization and mapping, in B. Siciliano and O. Khatib (Eds.), Handbook of Robotics, Springer, Berlin/Heidelberg, pp. 871–890.

Tombari, F., Salti, S. and Di Stefano, L. (2010). Unique signatures of histograms for local surface description, in K. Daniilidis et al. (Eds.), Computer Vision (ECCV 2010), Lecture Notes in Computer Science, Vol. 6314, Springer-Verlag, Berlin/Heidelberg, pp. 356–369.

Tombari, F., Salti, S. and Di Stefano, L. (2011). A combined texture-shape descriptor for enhanced 3D feature matching, 18th IEEE International Conference on Image Processing (ICIP), Brussels, Belgium, pp. 809–812.

Triebel, R., Pfaff, P. and Burgard, W. (2006). Multi-level surface maps for outdoor terrain mapping and loop closing, 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems, Beijing, China, pp. 2276–2282.

Weise, T., Wismer, T., Leibe, B. and Van Gool, L. (2009). In-hand scanning with online loop closure, IEEE 12th International Conference on Computer Vision Workshops (ICCV Workshops), Kyoto, Japan, pp. 1630–1637.

Whelan, T., Johannsson, H., Kaess, M., Leonard, J. and McDonald, J. (2013). Robust real-time visual odometry for dense RGB-D mapping, 2013 IEEE International Conference on Robotics and Automation (ICRA), Karlsruhe, Germany, pp. 5724–5731.

Wurm, K.M., Hornung, A., Bennewitz, M., Stachniss, C. and Burgard, W. (2010). OctoMap: A probabilistic, flexible, and compact 3D map representation for robotic systems, ICRA 2010 Workshop, Taipei, Taiwan.

Zhang, Z. (1994). Iterative point matching for registration of free-form curves and surfaces, International Journal of Computer Vision 13(2): 119–152.

Artur Wilkowski received his M.Sc. degree in applied computer science in 2004 and a Ph.D. degree in computer science in 2012 from the Warsaw University of Technology (both with honors). Since 2006 he has been a research and teaching employee at that university, and from 2013 to 2015 he was also engaged by the Industrial Research Institute for Automation and Measurements. His research interests cover the area of computer vision, pattern recognition and artificial intelligence, and their applications in robotics, biometrics and surveying. He is an author of over 30 scientific papers and research reports concerning these topics.

Tomasz Kornuta graduated from the Warsaw University of Technology (WUT) in 2003 (B.Sc.) and 2005 (M.Sc.). In 2013 he received the Ph.D. degree from the same university (with honors). Since 2008 he has been with the Institute of Control and Computation Engineering (ICCE) at the WUT, as of 2009 as the head of the Laboratory of the Foundations of Robotics. His scientific interests include robot control, software engineering for robotics, robot ontologies, intelligent robot behaviours and active perception (visual and RGB-D-based perception combined with force sensing). He is the author/coauthor of over 50 publications regarding those subjects. He has also served as a reviewer for several national and international robotic conferences.

Maciej Stefańczyk graduated from the Faculty of Electronics and Information Technology at the Warsaw University of Technology, Poland, from which he obtained his B.Sc. and M.Sc. degrees in 2010 and 2011, respectively (both with honors). He is currently a Ph.D. student at the same university, interested in the use of active vision-based systems extended by a knowledge base in robot control systems. His main research interests include the utilization of visual information in both robotics and computer entertainment systems.

Włodzimierz Kasprzak received a doctoral degree in pattern recognition (1997) from the University of Erlangen–Nuremberg, and a Ph.D. in computer science (1987) as well as a D.Sc. in automatic control (2001) from the Warsaw University of Technology (WUT). In 2014 he was awarded the professorial title. Since 1997 he has been with the Institute of Control and Computation Engineering at the WUT. His research interests are in pattern recognition and artificial intelligence methods, and their applications for image and speech analysis, robot perception and task planning. He is the author of 3 books as well as over 150 papers and research reports. Professor Kasprzak is a member of the IEEE and national societies of the IAPR (TPO, DAGM).

Appendix

Theoretical primer on frustum culling

A1. Frustum culling in computer graphics

Frustum culling is a common method used in computer graphics to restrict a visualization only to volumes actually observed by the virtual camera. Frustum culling may be performed in two ways. Firstly, it may be done after transforming all scene points into the camera frame and further into the clipping space. Secondly, it may be performed in the original scene space by computing clipping planes and comparing points against them. In both cases, it is useful to compute the so-called projection matrix first.

A2. Projection matrix

In the OpenGL library, the projection matrix $M_{4\times 4}$ is the matrix transforming 3D point coordinates (in a camera coordinate system) lying in the view frustum into a box of extent $[-1, 1]^3$. The transformation is performed using homogeneous coordinates:

$$ k \begin{pmatrix} {}^{F}X \\ {}^{F}Y \\ {}^{F}Z \\ 1 \end{pmatrix} = M \begin{pmatrix} {}^{C}X \\ {}^{C}Y \\ {}^{C}Z \\ 1 \end{pmatrix}, \tag{A1} $$

where $({}^{C}X, {}^{C}Y, {}^{C}Z, 1)$ are point coordinates in the camera space and $({}^{F}X, {}^{F}Y, {}^{F}Z, 1)$ are the resulting clipping space coordinates. Now, after transforming each point, it is easy to decide whether or not it lies in the frustum, since all points inside the frustum are mapped into $[-1, 1]^3$, and all points outside it are mapped out of this box.

A3. 3D clipping planes

The projection matrix also allows finding the parameters of the clipping planes, which is of more interest in the solution presented in this paper. In the OpenGL formulation of the projection matrix, the parameters of the clipping planes are formed by the sums and differences of adequate rows of the projection matrix $M$, as given by Gribb and Hartmann (2001). Thus, one obtains plane parameters in the camera reference frame. Because parameters of planes are required with respect to the global reference frame, corresponding rows of the projection-view matrix must be used, which is a matrix resulting from multiplication of the rigid global-to-camera transformation and the projection matrix, i.e., $M \, {}^{C}_{G}T$. The resulting parameters of the clipping planes require only standardization into the Hessian normal form.
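As a small illustration of the clip-space variant from Section A2, the sketch below (an assumption-level example using Eigen; the function name is introduced here) transforms a camera-frame point by a 4x4 projection matrix of the form used in Eqn. (A1) and tests the normalized coordinates against the [-1, 1]^3 box:

```cpp
#include <Eigen/Dense>
#include <cmath>

// Clip-space frustum test sketch (cf. Section A2). Assumes 'projection' maps
// camera-frame points into the [-1, 1]^3 box after perspective division.
bool pointInFrustum(const Eigen::Matrix4f& projection, const Eigen::Vector3f& p_cam)
{
    const Eigen::Vector4f p = projection * p_cam.homogeneous();  // homogeneous transform
    if (std::abs(p.w()) < 1e-9f)
        return false;                                             // degenerate case
    const Eigen::Vector3f ndc = p.head<3>() / p.w();              // perspective division
    return std::abs(ndc.x()) <= 1.0f &&
           std::abs(ndc.y()) <= 1.0f &&
           std::abs(ndc.z()) <= 1.0f;
}
```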

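Analogously, the clipping-plane variant from Section A3 can be sketched as follows. The row combinations follow Gribb and Hartmann (2001), while the helper names and the sign convention (positive half-space means "inside") are assumptions of this example:

```cpp
#include <Eigen/Dense>
#include <array>

// Extracts the six frustum planes (left, right, bottom, top, near, far) from a
// combined projection-view matrix A = M * T (projection times global-to-camera
// transform), following Gribb and Hartmann (2001). Each plane is (a, b, c, d)
// with a*x + b*y + c*z + d >= 0 for points on its inner side, normalized to the
// Hessian normal form (unit normal vector).
std::array<Eigen::Vector4f, 6> extractFrustumPlanes(const Eigen::Matrix4f& A)
{
    std::array<Eigen::Vector4f, 6> planes = {
        Eigen::Vector4f((A.row(3) + A.row(0)).transpose()),   // left:   w + x
        Eigen::Vector4f((A.row(3) - A.row(0)).transpose()),   // right:  w - x
        Eigen::Vector4f((A.row(3) + A.row(1)).transpose()),   // bottom: w + y
        Eigen::Vector4f((A.row(3) - A.row(1)).transpose()),   // top:    w - y
        Eigen::Vector4f((A.row(3) + A.row(2)).transpose()),   // near:   w + z
        Eigen::Vector4f((A.row(3) - A.row(2)).transpose())    // far:    w - z
    };
    for (auto& p : planes)
        p /= p.head<3>().norm();                              // Hessian normal form
    return planes;
}

// A point (global frame) lies inside the frustum if it is on the inner side of
// all six planes. An octree node can be culled analogously, e.g. by testing its
// corners or bounding sphere against the same planes.
bool insideFrustum(const std::array<Eigen::Vector4f, 6>& planes,
                   const Eigen::Vector3f& point)
{
    for (const auto& p : planes)
        if (p.dot(point.homogeneous()) < 0.0f)
            return false;
    return true;
}
```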
A4. Projection matrix for real-world cameras

The formulation of the projection matrix M in the OpenGL library is given as follows (Song-Ho, 2013):

$$ M = \begin{pmatrix} \dfrac{2n}{r-l} & 0 & \dfrac{r+l}{r-l} & 0 \\ 0 & \dfrac{2n}{t-b} & \dfrac{t+b}{t-b} & 0 \\ 0 & 0 & \dfrac{-(f+n)}{f-n} & \dfrac{-2fn}{f-n} \\ 0 & 0 & -1 & 0 \end{pmatrix}, \tag{A2} $$

where $f$ and $n$ are respectively the z-coordinates of the far and near clipping planes, and $l$, $r$, $t$, $b$ are the left, right, top and bottom bounds of the visible image plane (it is assumed that the optical axis goes through the point (0, 0) in the image plane).

In the case of real-world cameras, the optical axis is assumed to cross the point $(c_x, c_y)$ in the image plane, and the camera intrinsics are described by the following matrix:

$$ K = \begin{pmatrix} \alpha & 0 & c_x \\ 0 & \beta & c_y \\ 0 & 0 & 1 \end{pmatrix}, \tag{A3} $$

where $\alpha$ and $\beta$ are the focal lengths in the x and y directions, respectively. Using these symbols and taking into account that in OpenGL the z-axis points towards the camera, one can rewrite the projection matrix M as follows:

$$ M = \begin{pmatrix} \dfrac{2\alpha}{w} & 0 & \dfrac{2c_x}{w}-1 & 0 \\ 0 & \dfrac{2\beta}{h} & \dfrac{2c_y}{h}-1 & 0 \\ 0 & 0 & \dfrac{f+n}{f-n} & \dfrac{-2fn}{f-n} \\ 0 & 0 & 1 & 0 \end{pmatrix}, \tag{A4} $$

where $w$ and $h$ are respectively the image width and height in pixels. It is easy to observe that each point located between the far and the near plane, which is projected inside the image bounds, is mapped into the cube $[-1, 1]^3$ when transformed by the matrix M. For instance, the ${}^{F}X$ coordinate is effectively computed as

$$ {}^{F}X = \left( \alpha \, \frac{{}^{C}X}{{}^{C}Z} + c_x \right) \frac{2}{w} - 1. \tag{A5} $$

Noting that the expression in parentheses is simply the point's x-coordinate in the image, we see that all points projecting into the camera image must have "clip space" coordinates within the $[-1, 1]$ bounds. A more detailed discussion regarding the relations between camera intrinsics and projection matrices (although for non-eccentric cameras) is given by Koshy (2014).

Received: 13 November 2014
Revised: 1 May 2015
Re-revised: 24 August 2015