Online Surfel-Based Mesh Reconstruction


This is the author’s version of an article that has been published in this journal. Changes were made to this version by the publisher prior to publication. The final version of record is available at

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. XX, NO. X, SEPTEMBER 2018

SurfelMeshing: Online Surfel-Based Mesh Reconstruction

Thomas Schöps, Torsten Sattler, and Marc Pollefeys

Abstract—We address the problem of mesh reconstruction from live RGB-D video, assuming a calibrated camera and poses provided externally (e.g., by a SLAM system). In contrast to most existing approaches, we do not fuse depth measurements in a volume but in a dense surfel cloud. We asynchronously (re)triangulate the smoothed surfels to reconstruct a surface mesh. This novel approach makes it possible to maintain a dense surface representation of the scene during SLAM which can quickly adapt to loop closures. This is achieved by deforming the surfel cloud and asynchronously remeshing the surface where necessary. The surfel-based representation also naturally supports strongly varying scan resolution. In particular, it reconstructs colors at the input camera’s resolution. Moreover, in contrast to many volumetric approaches, ours can reconstruct thin objects since objects do not need to enclose a volume. We demonstrate our approach in a number of experiments, showing that it produces reconstructions that are competitive with the state-of-the-art, and we discuss its advantages and limitations. The algorithm (excluding loop closure functionality) is available as open source at

Index Terms—3D Modeling and Scene Reconstruction, RGB-D SLAM, Real-Time Dense Mapping, Applications of RGB-D Vision, Depth Fusion, Loop Closure, Surfels.
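To make the surfel representation named in the abstract more concrete, the following is a minimal illustrative sketch of a surfel record, a weighted-average fusion step, and a rigid update such as a loop closure correction might apply. It is not the update rule of this article: the field names, the choice to keep the smallest radius seen, and the helper names `fuse` and `apply_rigid_transform` are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Surfel:
    """Hypothetical surfel record: an oriented disc with a fusion weight."""
    position: Vec3
    normal: Vec3          # assumed to be unit length
    radius: float         # disc radius, tied to the measurement resolution
    weight: float = 1.0   # accumulated measurement weight


def _normalized(v):
    length = math.sqrt(sum(c * c for c in v))
    if length == 0.0:
        return v  # degenerate case (e.g. opposing normals): leave unchanged
    return tuple(c / length for c in v)


def fuse(surfel, position, normal, radius, weight=1.0):
    """Blend one new measurement into a surfel by weighted averaging.

    Assumption: the smallest radius observed so far is kept, so that
    close-up (higher resolution) measurements refine the surfel.
    """
    total = surfel.weight + weight
    surfel.position = tuple(
        (a * surfel.weight + b * weight) / total
        for a, b in zip(surfel.position, position)
    )
    surfel.normal = _normalized(tuple(
        a * surfel.weight + b * weight
        for a, b in zip(surfel.normal, normal)
    ))
    surfel.radius = min(surfel.radius, radius)
    surfel.weight = total


def apply_rigid_transform(surfel, rotation, translation):
    """Move a surfel with a rigid motion (3x3 row-major rotation, translation).

    Updating every surfel this way is cheap, which is why point and surfel
    clouds are easy to deform when poses are corrected.
    """
    def rotate(v):
        return tuple(sum(r * c for r, c in zip(row, v)) for row in rotation)

    surfel.position = tuple(
        p + t for p, t in zip(rotate(surfel.position), translation)
    )
    surfel.normal = rotate(surfel.normal)
```

A real system would store surfels in GPU buffers and associate measurements with surfels by projection; the sketch only shows the per-surfel arithmetic.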
• Thomas Schöps and Marc Pollefeys are with the Department of Computer Science, ETH Zurich, Switzerland. E-mail:
• Marc Pollefeys is additionally with Microsoft, Zurich.
• Torsten Sattler is with Chalmers University of Technology. E-mail:

Copyright (c) 2019 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing

arXiv:1810.00729v2 [cs.CV] 20 Nov 2019

1 INTRODUCTION

The availability of fast programmable GPUs has enabled handling the massive amount of RGB-D data generated by depth sensors such as the Kinect in real-time. This has led to dense Simultaneous Localization and Mapping (SLAM) systems [1], [2], online 3D scene perception approaches [3], [4], and applications such as Augmented Reality (AR) [5], [6] building on top of SLAM and dense mapping.

Performing dense mapping in real-time is particularly important for interactions with the environment in AR or in robotics. For example, robots navigating through unknown scenes must map their surroundings quickly to be able to advance their path planning [7], [8], [9]. Complex AR applications, e.g., AR games in which users can race cars through their living room, need more information than is provided in a single depth map, which would be sufficient for occlusion handling but does not provide any information in the occluded areas. Consumer applications benefit from online mapping, which avoids waiting times for users. Online mapping is also useful as a live preview for offline 3D reconstructions, e.g., created with hand-held mobile devices. Non-expert users in particular benefit from this as feedback about which parts of the scene have already been reconstructed in good quality and where additional data is needed.

A central question in the context of SLAM and online dense mapping is the choice of scene representation. On the one hand, applications such as collision detection in physics simulations [10], occlusion handling in AR applications [5], [11], and obstacle detection and path planning for autonomous navigation [7], [8], [9] benefit from continuous surface representations, e.g., meshes. On the other hand, scene representations need to be flexible to handle spatial deformations caused by loop closure events in SLAM. It is important that they can be updated efficiently when new data becomes available, to allow for real-time processing. Ideally, a scene representation should also be adaptive in terms of resolution: It should be able to model both large structures and small or thin objects without too much memory overhead.

The predominant scene representation for RGB-D SLAM and real-time dense 3D reconstruction is a voxel volume [12]. Each voxel in the volume stores the truncated signed distance to the closest surface, which can be updated very efficiently in parallel. A continuous surface representation in the form of a mesh can then be extracted from this volume with the Marching Cubes algorithm [13]. However, volume-based methods lack flexibility: Handling loop closures during online operation is expensive, since accurate compensation can imply changing the whole volume. At the same time, the resolution of the voxel volume is often fixed in practice for efficiency reasons [2], [14], [15], which limits the adaptiveness of the representation (cf. Fig. 2, left). In addition, objects cannot be reconstructed if they are small or thin with respect to the voxel size.

Another popular approach is to represent the scene with a set of points or with surfels [16], [17], i.e., oriented discs (cf. Fig. 2, middle right). This scene representation is flexible, since point or surfel coordinates can be updated very efficiently for the whole reconstruction. It is also highly adaptive, as measurements at a higher resolution lead to denser point or surfel clouds, and it easily handles thin objects. The main drawback of this representation is its discrete nature, which can be resolved by meshing. However, global methods such as Delaunay triangulation [18], [19], [20] or Poisson reconstruction [21] are too computationally expensive to run in real-time on the dense point clouds generated by RGB-D sensors. In contrast, efficient local meshing methods [22], [23] are very sensitive to noise, leading to both noisy surface reconstructions and holes in the meshes.

In this paper, we show that online meshing of reconstructed dense surfel clouds is possible through denoising and adaptive remeshing (cf. Fig. 1). Our algorithm is a novel combination of surfel reconstruction [17], [24] (Sec. 3.1) with fast point set triangulation [22], [23] (Sec. 3.3). The key step for enabling local triangulation is a novel surfel denoising scheme (Sec. 3.2) to handle the noisy input data (cf. Fig. 8). For efficiency, flexibility, and adaptiveness, we introduce a remeshing algorithm (Sec. 3.4) that recomputes the mesh locally if required. In contrast to volume-based methods, our approach easily adapts to loop closures and leads to higher resolution models (cf. Fig. 2) while automatically handling thin objects (cf. Fig. 15). In contrast to purely point or surfel cloud-based approaches, our method reconstructs high-resolution meshes during online operation. To the best of our knowledge, ours is the first approach to provide a continuous surface reconstruction while allowing flexible deformations for loop closures, being adaptive to different resolutions, and allowing efficient updates.

Looking at the complete system, there is a fundamental question to be asked about how tightly the SLAM and 3D reconstruction components should be integrated with each other. A tight integration allows sharing data structures, so that both systems can influence each other while reducing duplicated efforts. Voxel-based methods such as KinectFusion [1] commonly use the volume for both tracking and mapping. A loose integration makes it easy to exchange the 3D reconstruction and SLAM parts independently. This improves portability, e.g., one could use ARCore or ARKit when porting a system to Android or iOS, respectively. In this article, we focus on the 3D reconstruction part and present a meshing pipeline that is rather loosely coupled, while offering the potential for tighter integration. In fact, making a step towards a tight integration between both parts, similar …

Fig. 1. Triangle mesh reconstructed with our method with loop closures handled during reconstruction, colored (left) and shaded.

Fig. 2. Left to right: voxel-based meshes by InfiniTAM [29] and FastFusion [27] (each with default voxel size settings), surfel splat rendering by ElasticFusion [17], our surfel-based mesh. The same camera trajectory is used with all approaches. Notice the sharper colors produced by our method.

… create consistent scene models. We exclude offline methods as the focus of our work is on (real-time) online operation.

Voxel volume-based methods build on truncated signed distance function (TSDF) fusion [12]. Popularized by KinectFusion [1], many following works focused on scaling this approach to larger scenes [25], [26], adding multi-resolution capability [27], [28], or improving efficiency [29], [30]. Mesh extraction typically runs at a lower frame rate compared to TSDF fusion. In contrast to our method, these works do not handle loop closures during reconstruction.

Kintinuous [2] uses a volume that moves with the camera. Scene parts leaving this volume are triangulated [23], and loops are closed by deforming the mesh. Compared to our method, [2] does not merge meshes after loop closures, resulting in multiple (potentially inconsistent) meshes per surface. Kähler et al. [14] use many small subvolumes that move independently to handle loop closures, combining overlapping subvolumes by weighted averaging during rendering.
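The TSDF fusion that the voxel-based methods above build on can be illustrated with a minimal sketch of the per-voxel update: each new signed-distance observation is truncated, normalized, and blended into the stored value by a running weighted average. The sign convention (positive in front of the observed surface), the weight cap `max_weight`, and the names `Voxel` and `integrate` are assumptions chosen for illustration, not details taken from any of the cited systems.

```python
from dataclasses import dataclass


@dataclass
class Voxel:
    """Hypothetical voxel record for TSDF fusion."""
    tsdf: float = 1.0     # truncated signed distance, normalized to [-1, 1]
    weight: float = 0.0   # accumulated observation weight


def integrate(voxel, signed_distance, truncation,
              obs_weight=1.0, max_weight=100.0):
    """Fuse one signed-distance observation into a voxel.

    signed_distance: distance from the voxel to the observed surface along
        the viewing ray, assumed positive in front of the surface.
    truncation: truncation band in the same units as signed_distance.
    """
    if signed_distance < -truncation:
        # Far behind the observed surface: the voxel is occluded, skip it.
        return
    d = min(1.0, signed_distance / truncation)  # truncate and normalize
    total = voxel.weight + obs_weight
    voxel.tsdf = (voxel.tsdf * voxel.weight + d * obs_weight) / total
    # Capping the weight keeps the voxel responsive to later observations.
    voxel.weight = min(total, max_weight)
```

Because each voxel is updated independently of its neighbors, the update parallelizes trivially across the volume. The sketch also shows why correcting poses after a loop closure is costly for this representation: already-fused averages cannot be moved, so the affected part of the volume has to be recomputed.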
Recommended publications
  • Rendering of Complex Heterogenous Scenes Using Progressive Blue Surfels
    Rendering of Complex Heterogenous Scenes using Progressive Blue Surfels SASCHA BRANDT, Paderborn University CLAUDIUS JÄHN, MATTHIAS FISCHER, Paderborn University FRIEDHELM MEYER AUF DER HEIDE, Paderborn University Fig. 1. 20 donut production lines with ∼50M Polygons each (∼1G total) rendered at interactive frame rates of ∼120 fps at a resolution of 2560×1440. We present a technique for rendering highly complex 3D scenes in real-time the interest dropped as one might consider the problem solved con- by generating uniformly distributed points on the scene’s visible surfaces. sidering the vast computational power of modern graphics hardware The technique is applicable to a wide range of scene types, like scenes and the rich set on different rendering techniques like culling and directly based on complex and detailed CAD data consisting of billions of approximation techniques. polygons (in contrast to scenes handcrafted solely for visualization). This A new challenge arises from the newest generation of head allows to visualize such scenes smoothly even in VR on a HMD with good mounted displays (HMD) like the Oculus Rift or its competitors image quality, while maintaining the necessary frame-rates. In contrast to other point based rendering methods, we place points in an approximated – high framerates are no longer a nice-to-have feature but become blue noise distribution only on visible surfaces and store them in a highly a hard constraint on what content can actually be shown to the GPU efficient data structure, allowing to progressively refine the number of user. To avoid nausea it is necessary to keep a high frame-rate of rendered points to maximize the image quality for a given target frame rate.
  • Luna Moth: Supporting Creativity in the Cloud
    Pedro Alfaiate Instituto Superior Técnico / Luna Moth INESC-ID Inês Caetano Instituto Superior Técnico / INESC-ID Supporting Creativity in the Cloud António Leitão Instituto Superior Técnico / INESC-ID 1 ABSTRACT Algorithmic design allows architects to design using a programming-based approach. Current algo- 1 Migration from desktop application rithmic design environments are based on existing computer-aided design applications or building to the cloud. information modeling applications, such as AutoCAD, Rhinoceros 3D, or Revit, which, due to their complexity, fail to give architects the immediate feedback they need to explore algorithmic design. In addition, they do not address the current trend of moving applications to the cloud to improve their availability. To address these problems, we propose a software architecture for an algorithmic design inte- grated development environment (IDE), based on web technologies, that is more interactive than competing algorithmic design IDEs. Besides providing an intuitive editing interface which facilitates programming tasks for architects, its performance can be an order of magnitude faster than current aalgorithmic design IDEs, thus supporting real-time feedback with more complex algorithmic design programs. Moreover, our solution also allows architects to export the generated model to their preferred computer-aided design applications. This results in an algorithmic design environment that is accessible from any computer, while offering an interactive editing environment that inte- grates into the architect’s workflow. 72 INTRODUCTION programming languages that also support traceability between Throughout the years, computers have been gaining more ground the program and the model: when the user selects a component in the field of architecture. In the beginning, they were only used in the program, the corresponding 3D model components are for creating technical drawings using computer-aided design highlighted.
  • 3D Modeling: Surfaces
    CS 430/536 Computer Graphics I Overview • 3D model representations 3D Modeling: • Mesh formats Surfaces • Bicubic surfaces • Bezier surfaces Week 8, Lecture 16 • Normals to surfaces David Breen, William Regli and Maxim Peysakhov • Direct surface rendering Geometric and Intelligent Computing Laboratory Department of Computer Science Drexel University 1 2 1994 Foley/VanDam/Finer/Huges/Phillips ICG 3D Modeling Representing 3D Objects • 3D Representations • Exact • Approximate – Wireframe models – Surface Models – Wireframe – Facet / Mesh – Solid Models – Parametric • Just surfaces – Meshes and Polygon soups – Voxel/Volume models Surface – Voxel – Decomposition-based – Solid Model • Volume info • Octrees, voxels • CSG • Modeling in 3D – Constructive Solid Geometry (CSG), • BRep Breps and feature-based • Implicit Solid Modeling 3 4 Negatives when Representing 3D Objects Representing 3D Objects • Exact • Approximate • Exact • Approximate – Complex data structures – Lossy – Precise model of – A discretization of – Expensive algorithms – Data structure sizes can object topology the 3D object – Wide variety of formats, get HUGE, if you want each with subtle nuances good fidelity – Mathematically – Use simple – Hard to acquire data – Easy to break (i.e. cracks represent all primitives to – Translation required for can appear) rendering – Not good for certain geometry model topology applications • Lots of interpolation and and geometry guess work 5 6 1 Positives when Exact Representations Representing 3D Objects • Exact
  • Full Body 3D Scanning
    3D Photography: Final Project Report Full Body 3D Scanning Sam Calabrese Abhishek Gandhi Changyin Zhou fsmc2171, asg2160, Figure 1: Our model is a dancer. We capture his full-body 3D model by combining image-based methods and range scanner, and then do an animation of dancing. Abstract Compared with most laser scanners, image-based methods using triangulation principles are much faster and able to provide real- In this project, we are going to build a high-resolution full-body time 3D sensing. These methods include depth from motion [Aloi- 3D model of a live person by combining image-based methods and monos and Spetsakis 1989], shape from shading [Zhang et al. laser scanner methods. A Leica 3D range scanner is used to obtain 1999], depth from defocus/focus [Nayar et al. 1996][Watanabe and four accurate range data of the body from four different perspec- Nayar 1998][Schechner and Kiryati 2000][Zhou and Lin 2007], and tives. We hire a professional model and adopt many measures to structure from stereo [Dhond and Aggarwal 1989]. They often re- minimize the movement during the long laser-scanning. The scan quire the object surface to be textured, non-textured, or lambertian. data is then sequently processed by Cyclone, MeshLab, Scanalyze, These requirements often make them impractical in many cases. VRIP, PlyCrunch and 3Ds Max to obtain our final mesh. We take In addition, image-based methods usually cannot give a precision three images of the face from frontal and left/right side views, and depth estimation since they do patch-based analysis.
  • Bonus Ch. 2 More Modeling Techniques
    Bonus Ch. 2 More Modeling Techniques When it comes to modeling in modo, the sky is the limit. This book is designed to show you all of the techniques available to you, through written word and visual examples on the DVD. This chapter will take you into another project, in which you’ll model a landscape. From there, you’ll texture it, and later you’ll add the environment. You’ll see how modo’s micro polygon displacement works and how powerful it is. From there, you’ll create a cool toy gun. The techniques used in this project will show you how to create small details that make the model come to life. Then, you’ll learn to texture the toy gun to look like real plastic. Building a Landscape Landscapes traditionally have been a chore for 3D artists. This is because to prop- erly create them, you need a lot of geometry. A lot of geometry means a lot of polygons, and a lot of polygons means a lot of render time. But the team at Luxology has introduced micro polygon displacement in modo 201/202, allow- ing you to create and work with simple objects, but render with millions of poly- gons. How is this possible? The micro polygon displacement feature generates additional polygons at render time. The goal is that finer details can be achieved without physically modeling them into the base object. You can then add to the details achieved through micro poly displacements with modo’s bump map capabilities and generate some terrific-looking models.
  • Digital Sculpting with Zbrush
    DIGITAL SCULPTING WITH ZBRUSH Vincent Wang ENGL 2089 Discourse Analysis 2 ZBrush Analysis Table of Contents Context ........................................................................................................................... 3 Process ........................................................................................................................... 5 Analysis ........................................................................................................................ 13 Application .................................................................................................................. 27 Activity .......................................................................................................................... 32 Works Cited .................................................................................................................. 35 3 Context ZBrush was created by the Pixologic Inc., which was founded by Ofer Alon and Jack Rimokh (Graphics). It was first presented in 1999 at SIGGRAPH (Graphics). Version 1.5 was unveiled at the MacWorld Expo 2002 in New York and SIGGRAPH 2002 in San Antonio (Graphics). Pixologic, the company describes the 3D modeling software as a “digital sculpting and painting program that has revolutionized the 3D industry…” (Pixologic). It utilizes familiar real-world tools in a digital environment, getting rid of steep learning curves and allowing the user to be freely creative instead of figuring out all the technical details. 3D models that are created in
  • 3D Modeling and the Role of 3D Modeling in Our Life
    ISSN 2413-1032 COMPUTER SCIENCE 3D MODELING AND THE ROLE OF 3D MODELING IN OUR LIFE 1Beknazarova Saida Safibullaevna 2Maxammadjonov Maxammadjon Alisher o’g’li 2Ibodullayev Sardor Nasriddin o’g’li 1Uzbekistan, Tashkent, Tashkent University of Informational Technologies, Senior Teacher 2Uzbekistan, Tashkent, Tashkent University of Informational Technologies, student Abstract. In 3D computer graphics, 3D modeling is the process of developing a mathematical representation of any three-dimensional surface of an object (either inanimate or living) via specialized software. The product is called a 3D model. It can be displayed as a two-dimensional image through a process called 3D rendering or used in a computer simulation of physical phenomena. The model can also be physically created using 3D printing devices. Models may be created automatically or manually. The manual modeling process of preparing geometric data for 3D computer graphics is similar to plastic arts such as sculpting. 3D modeling software is a class of 3D computer graphics software used to produce 3D models. Individual programs of this class are called modeling applications or modelers. Key words: 3D, modeling, programming, unity, 3D programs. Nowadays 3D modeling impacts in every sphere of: computer programming, architecture and so on. Firstly, we will present basic information about 3D modeling. 3D models represent a physical body using a collection of points in 3D space, connected by various geometric entities such as triangles, lines, curved surfaces, etc. Being a collection of data (points and other information), 3D models can be created by hand, algorithmically (procedural modeling), or scanned. 3D models are widely used anywhere in 3D graphics.
  • Multi-View Scene Capture by Surfel Sampling: from Video Streams to Non-Rigid 3D Motion, Shape & Reflectance
    Multi-View Scene Capture by Surfel Sampling: From Video Streams to Non-Rigid 3D Motion, Shape & Reflectance Rodrigo L. Carceroni Kiriakos N. Kutulakos Departamento de Ciência da Computação Department of Computer Science Universidade Federal de Minas Gerais University of Toronto Belo Horizonte, MG, CEP 31270-010, Brazil Toronto, ON M5S3H5, Canada Abstract In this paper we study the problem of recovering the 3D shape, reflectance, and non-rigid motion properties of a dynamic 3D scene. Because these properties are completely unknown and because the scene’s shape and motion may be non-smooth, our approach uses multiple views to build a piecewise-continuous geometric and radiometric representation of the scene’s trace in space-time. A basic primitive of this representation is the dynamic surfel, which (1) encodes the instantaneous local shape, reflectance, and motion of a small and bounded region in the scene, and (2) enables accurate prediction of the region’s dynamic appearance under known illumination conditions. We show that complete surfel-based reconstructions can be created by repeatedly applying an algorithm called Surfel Sampling that combines sampling and parameter estimation to fit a single surfel to a small, bounded region of space-time. Experimental results with the Phong reflectance model and complex real scenes (clothing, shiny objects, skin) illustrate our method’s ability to explain pixels and pixel variations in terms of their underlying causes: shape, reflectance, motion, illumination, and visibility.
Keywords: Stereoscopic vision (3D reconstruction, multiple-view geometry, multi-view stereo, space carving); Motion analysis (multi-view motion estimation, direct estimation methods, image warping, deformation analysis); 3D motion capture; Reflectance modeling (illumination modeling, Phong reflectance model) This research was conducted while the authors were with the Departments of Computer Science and Dermatology at the University of Rochester, Rochester, NY, USA.
  • Using Depth Cameras for Dense 3D Modeling of Indoor Environments
    RGB-D Mapping: Using Depth Cameras for Dense 3D Modeling of Indoor Environments Peter Henry1, Michael Krainin1, Evan Herbst1, Xiaofeng Ren2, Dieter Fox1;2 Abstract RGB-D cameras are novel sensing systems that capture RGB images along with per-pixel depth information. In this paper we investigate how such cam- eras can be used in the context of robotics, specifically for building dense 3D maps of indoor environments. Such maps have applications in robot navigation, manip- ulation, semantic mapping, and telepresence. We present RGB-D Mapping, a full 3D mapping system that utilizes a novel joint optimization algorithm combining visual features and shape-based alignment. Visual and depth information are also combined for view-based loop closure detection, followed by pose optimization to achieve globally consistent maps. We evaluate RGB-D Mapping on two large indoor environments, and show that it effectively combines the visual and shape informa- tion available from RGB-D cameras. 1 Introduction Building rich 3D maps of environments is an important task for mobile robotics, with applications in navigation, manipulation, semantic mapping, and telepresence. Most 3D mapping systems contain three main components: first, the spatial align- ment of consecutive data frames; second, the detection of loop closures; third, the globally consistent alignment of the complete data sequence. While 3D point clouds are extremely well suited for frame-to-frame alignment and for dense 3D reconstruc- tion, they ignore valuable information contained in images. Color cameras, on the other hand, capture rich visual information and are becoming more and more the sensor of choice for loop closure detection [21, 16, 30].
  • 3D Computer Graphics Compiled By: H
    animation Charge-coupled device Charts on SO(3) chemistry chirality chromatic aberration chrominance Cinema 4D cinematography CinePaint Circle circumference ClanLib Class of the Titans clean room design Clifford algebra Clip Mapping Clipping (computer graphics) Clipping_(computer_graphics) Cocoa (API) CODE V collinear collision detection color color buffer comic book Comm. ACM Command & Conquer: Tiberian series Commutative operation Compact disc Comparison of Direct3D and OpenGL compiler Compiz complement (set theory) complex analysis complex number complex polygon Component Object Model composite pattern compositing Compression artifacts computationReverse computational Catmull-Clark fluid dynamics computational geometry subdivision Computational_geometry computed surface axial tomography Cel-shaded Computed tomography computer animation Computer Aided Design computerCg andprogramming video games Computer animation computer cluster computer display computer file computer game computer games computer generated image computer graphics Computer hardware Computer History Museum Computer keyboard Computer mouse computer program Computer programming computer science computer software computer storage Computer-aided design Computer-aided design#Capabilities computer-aided manufacturing computer-generated imagery concave cone (solid)language Cone tracing Conjugacy_class#Conjugacy_as_group_action Clipmap COLLADA consortium constraints Comparison Constructive solid geometry of continuous Direct3D function contrast ratioand conversion OpenGL between
  • Extents and Limitations of 3D Computer Models for Graphic Analysis of Form and Space in Architecture SALEH UDDIN, Mohammed University of Missouri-Columbia, U.S.A
    Go to contents 18 Extents and Limitations of 3D Computer Models for Graphic Analysis of Form and Space in Architecture SALEH UDDIN, Mohammed University of Missouri-Columbia, U.S.A. This paper investigates the strength and limitations of basic 3D diagrammatic models and their related motion capabilities in the context of graphic analysis. The focus of such analysis is to create a computer based environment to represent visual analysis of architectural form and space. The paper highlights the restrictions that were found in a specific 3D-computer model environment to satisfy a basic diagrammatic need for analysis. Motion related features that take into account of parametric changes are also investigated to help enhance representation of analytic models. Acknowledging the restrictions, it can be stated that computational media are the only ones at present that can create an interactive multimedia format using components constructed through various computational techniques Keywords: 3D Model, Analytic Diagram, Motion Model, Form Analysis, Design Principles. Analysis points’ and also supported the theoretical model. Throughout history, the notion of ‘precedent’ has Thus, it can be said that an analysis is an abstract always provided conceptual models to serve the quest process of simplification that reduces a total work into for appropriate architectural forms. It is a relatively its essential elements. In architecture, the primary goal recent phenomenon that considers architecture of a design analysis and its representation is to expose through a veil of literary metaphors. Since the end the underlying concept, organizational pattern, design product of architecture is a built outcome, the most characteristics, and ‘tectonics’ through simplified basic theoretical stance must be supported in turn by diagrams.
  • Multi-Resolution Surfel Maps for Efficient Dense 3D Modeling And
    Journal of Visual Communication and Image Representation 25(1):137-147, Springer, January 2014, corrected Table 3 & Sec 6.2. Multi-Resolution Surfel Maps for Efficient Dense 3D Modeling and Tracking Jörg Stückler and Sven Behnke Autonomous Intelligent Systems, Computer Science Institute VI, University of Bonn, Friedrich-Ebert-Allee 144, 53113 Bonn, Germany fstueckler, Abstract Building consistent models of objects and scenes from moving sensors is an important prerequisite for many recognition, manipulation, and navigation tasks. Our approach integrates color and depth measurements seamlessly in a multi-resolution map representation. We process image sequences from RGB-D cameras and consider their typical noise properties. In order to align the images, we register view-based maps efficiently on a CPU using multi-resolution strategies. For simultaneous localization and mapping (SLAM), we determine the motion of the camera by registering maps of key views and optimize the trajectory in a probabilistic framework. We create object models and map indoor scenes using our SLAM approach which includes randomized loop closing to avoid drift. Camera motion relative to the acquired models is then tracked in real-time based on our registration method. We benchmark our method on publicly available RGB-D datasets, demonstrate accuracy, efficiency, and robustness of our method, and compare it with state-of-the-art approaches. We also report on several successful public demonstrations where it was used in mobile manipulation tasks. Keywords: RGB-D image registration, simultaneous localization and mapping, object modeling, pose tracking 1. Introduction Robots performing tasks in unstructured environments must estimate their pose in reference to objects and parts of the environment.