Autonomous 3D Reconstruction, Mapping and Exploration of Indoor Environments with a Robotic

IEEE ROBOTICS AND AUTOMATION LETTERS. PREPRINT VERSION. ACCEPTED JUNE, 2019 1 Autonomous 3D reconstruction, mapping and exploration of indoor environments with a robotic arm Yiming Wang1, Stuart James2, Elisavet Konstantina Stathopoulou3, Carlos Beltran-Gonz´ alez´ 4 Yoshinori Konishi5 and Alessio Del Bue1 Abstract—We propose a novel information gain metric that the autonomous agent is embodied by a robotic arm equipped combines hand-crafted and data-driven metrics to address the with a RGBD camera providing both depth and pose. The next best view problem for autonomous 3D mapping of unknown proposed approach is general and can be extended to other indoor environments. For the hand-crafted metric, we propose an entropy-based information gain that accounts for the previous mobile systems [25], [12] and aerial 3D mapping [18]. view points to avoid the camera to revisit the same location and Regardless various tasks, such as active object recogni- to promote the motion toward unexplored or occluded areas. tion [17], [13], [10] and 3D reconstruction [16], [15], [24], Whereas for the learnt metric, we adopt a Convolutional Neural [12], [4], the main methodology of NBV is similar: modelling Network (CNN) architecture and formulate the problem as a the information gain and selecting the next view with the most classification problem. The CNN takes as input the current depth image and outputs the motion direction that suggests the informative measurement where additional constraints, such as largest unexplored surface. We train and test the CNN using a the motion-related cost and safety, can be applied [16], [25], new synthetic dataset based on the SUNCG dataset. The learnt [24], [12], [4]. Our system follows such paradigm: at each time motion direction is then combined with the proposed hand- step the system selects a new pose among a set of candidates crafted metric to help handle situations where using only the that satisfies certain constraints regarding: i) navigation, i.e. hand-crafted metric tends to face ambiguities. We finally evaluate the autonomous paths over several real and synthetic indoor the new pose should be reachable from the current pose, scenes including complex industrial and domestic settings and and ii) feasible 3D reconstruction, i.e. the new pose should prove that our combined metric is able to further improve the guarantee a minimal Field of View (FoV) overlapping with exploration coverage compared to using only the proposed hand- current pose. The pose with the highest metric that quantifies crafted metric. the information gain will be the next pose. Index Terms—Computer Vision for Automation; Range Sens- Modelling the information gain plays a key role in the NBV ing; Deep Learning in Robotics and Automation problem. In this work we propose a novel hand-crafted metric together with a data-driven metric to address the severely ill- posed NBV problem for 3D mapping of an unknown envi- I. INTRODUCTION ronment. For the hand-crafted information metric, we propose UTONOMOUS systems need to perceive their surround- an entropy-based volumetric utility that accounts for historical A ing 3D world in order to navigate and operate safely. view points to avoid the camera revisiting the same pose, while Even if 3D reconstruction is a mature technology, less attention attracting the camera to visit voxels with a lower confidence has been posed to the problem of efficiently mapping and that can be caused by occlusion. We achieve this via a view- covering the 3D structure of an unknown space. This task has dependent visibility descriptor that records the accumulated strong relations to the longstanding Next Best View (NBV) information gain distributed by viewing directions. Different problem [3], [14], [22], [19] but with the additional issue of from inferring the occlusion from spatial hints as in other not having any prior knowledge of the environment where works [15], [4]; we try to infer the occlusion via the temporal- the autonomous system is moving. In particular, in this work spatial hint provided by the proposed descriptor. For the data-driven metric, we introduce an approach that avoids to Manuscript received: February, 24, 2019; Revised May, 27, 2019; Accepted June, 19, 2019. formulate the task as a regression problem or training a 3D This paper was recommended for publication by Editor Cesar Dario Cadena CNN, which commonly require large numbers of parameters to Lerma upon evaluation of the Associate Editor and Reviewers’ comments. optimize along with extensive training data. Instead, we learn 1 Y. Wang and A. Del Bue are with Visual Geometry and Mod- the information utility function in a classification setting which elling (VGM) Lab, Istituto Italiano di Tecnologia (IIT), Genova, Italy [email protected] needs less training data and computation. We propose to use a 2S. James is with Center for Cultural Heritage Technology (CCHT), Istituto more efficient CNN architecture, that takes as input the current Italiano di Tecnologia (IIT), Marghera, Italy depth image and outputs the ranked motion directions as a 3E. K. Stathopoulou contributed to this paper while she was a member of the Visual Geometry and Modelling (VGM) Lab, Istituto Italiano di indication for where to explore. We further propose various Tecnologia (IIT), Genova, Italy strategies to combine the learnt metric with the proposed hand- 4 C. Beltran-Gonz´ alez´ is with Pattern Analysis and Computer Vision crafted metric in order to drive the robot out of ambiguous (PAVIS) Department, Istituto Italiano di Tecnologia (IIT), Genova, Italy 5Y. Konishi is with OMRON research, Kyoto, Japan. situations when using only the hand-crafted metric. This likely Digital Object Identifier (DOI): see top of this page. happens in the beginning of the exploration when moving any 2 IEEE ROBOTICS AND AUTOMATION LETTERS. PREPRINT VERSION. ACCEPTED JUNE, 2019 Fig. 1: At each step the pose and depth are provided to the motion planner. The Depth CNN provides a data-driven metric for suggesting the direction while the volumetric gain is a hand-crafted metric. The two metrics are combined to produce the next movement. This process is iterated taking the best of both metrics to maximize surface coverage in as few steps as possible. directions can be equally good in terms of volumetric gain. contraction for occlusion plane) voxels that are defined as To summarize, our contributions are: 1) A novel entropy voxels bordering free and occluded space [24]. based view dependent information gain function, 2) A simple In addition to counting-based metrics, there are also metrics and efficient data-driven CNN that learns the information gain based on probabilistic occupancy estimation that accounts for function over depth images, 3) a combined hand-crafted and the measurement uncertainty [6], [12],[11]. The main method data-driven technique over time and 4) a synthetic and real for computing probabilistic information metrics is based on dataset with ground truth for NBV problems. information entropy [15], [7], [12], [4]. As a ray traverse the The rest of the paper is organized as follow. SectionII map, the information gain of each ray is the accumulated discusses the related works on selecting the NBV for 3D gain of all visible voxels in the form of either a sum [4] or exploration and mapping. Section III describes our hand- an average [7]. The sum favours views with rays traversing crafted information gain utility, SectionIV describes the deeper into the map, while the average favours more on the CNN structure for learning the motion hints, and SectionV uncertainty of voxels regardless the ray depth. Moreover, inac- details how we combine both utilities. SectionVI provides the curate prediction of the new measurement probability during experimental evaluation and finally Section VII concludes the ray tracing can be an issue for computing the information paper providing perspectives for future work. gain if occlusion is not considered. To address this issue, Potthast and Sukhatme [15] utilize a Hidden Markov Model to estimate the likelihood of an unknown voxel being visible at II. RELATED WORK any viewpoints. The work in [4] accounts for the probability In this section, we will cover related works on NBV for of a voxel being occluded via weighting the information gain 3D exploration and mapping with particular focus on the by the product of the emptiness probability of all voxels modelling of information gain. before reaching that voxel. Although the computation can be NBV without 3D knowledge is an ill-posed problem and different, the heuristic behind both [15], [4] is similar, i.e. a how to model the information gain plays an essential role voxel with a large unobserved volume between its position in addressing this task. The modelling of information gain and the candidate view position is more likely to be occluded greatly depends on how the 3D environment is represented, and therefore contributes less information gain. which can be categorized as surface-based [2] and volume- In addition to the hand-crafted information gain metrics, based representation [20]. The volumetric representation is there is also a recent effort to train a 3D CNN that aims to learn often employed for online motion planing for its compactness the information utility function [5]. 3D CNN is trained with and efficiency in visibility operations [4]. The status of each the known 3D models of the scene, and the utility is defined voxel can be either occupied, free or unknown. The candidate as the decrease in uncertainty of surface voxels with a new camera poses for the next move should be in the free voxels measurement. The goal is that, at run time, the network works and satisfy any kinematic or navigation constraints. as an oracle that can predict the information gain for any input Multiple volumetric information metrics have been propose given the current occupancy map.

Autonomous 3D Reconstruction, Mapping and Exploration of Indoor Environments with a Robotic

Image-Based 3D Reconstruction: Neural Networks Vs. Multiview Geometry

Amodal 3D Reconstruction for Robotic Manipulation Via Stability and Connectivity

Stereoscopic Vision System for Reconstruction of 3D Objects

Configurable 3D Scene Synthesis and 2D Image Rendering with Per-Pixel Ground Truth Using Stochastic Grammars

3D Scene Reconstruction from Multiple Uncalibrated Views

3D Shape Reconstruction from Vision and Touch

3D Reconstruction Is Not Just a Low-Level Task: Retrospect and Survey

Automatic Reconstruction of Textured 3D Models of Textured 3Dmodels Automatic Reconstruction Dipl.-Ing

Image-Based Synthesis and Re-Synthesis of Viewpoints Guided by 3D Models

3D Reconstruction and Recognition Acknowledgement

Bayesian Reconstruction of 3D Human Motion from Single-Camera Video

Arxiv:2001.05613V2 [Cs.CV] 14 Oct 2020 Mental Results Demonstrate That the Mean Per Joint Position I.E., Parts Or All of the Body Must Not Be Lost at Any Time