Deliberative Perception
Venkatraman Narayanan

CMU-RI-TR-17-67

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Robotics

The Robotics Institute
Carnegie Mellon University
Pittsburgh, Pennsylvania 15213

Thesis Committee
Maxim Likhachev, Chair
Martial Hebert
Siddhartha S. Srinivasa
Manuela M. Veloso
Dieter Fox, University of Washington

August 2017

COPYRIGHT © 2017 VENKATRAMAN NARAYANAN

Abstract

A recurrent and elementary robot perception task is to identify and localize objects of interest in the physical world. In many real-world situations such as in automated warehouses and assembly lines, this task entails localizing specific object instances with known 3D models. Most modern-day methods for the 3D multi-object localization task employ scene-to-model feature matching or regression/classification by learners trained on synthetic or real scenes. While these methods are typically fast in producing a result, they are often brittle, sensitive to occlusions, and depend on the right choice of features and/or training data.

This thesis introduces and advocates a deliberative approach, where the multi-object localization task is framed as an optimization over the space of hypothesized scenes. We demonstrate that deliberative reasoning, such as understanding inter-object occlusions, is essential to robust perception, and that discriminative techniques can effectively guide such reasoning. The contributions of this thesis broadly fall under three parts:

The first part, PErception via SeaRCH (PERCH) and its extension C-PERCH, formulates Deliberative Perception as an optimization over hypothesized scenes, and develops an efficient tree search algorithm for the same.

The second part focuses on accelerating global search through statistical learners, in the form of search heuristics (Discriminatively-guided Deliberative Perception), and by modulating the search-space (RANSAC-Trees).
The final part introduces general-purpose graph search algorithms that bridge statistical learning and search. Of these, the first is an anytime algorithm for leveraging edge validity priors to accelerate graph search, and the second, Improved Multi-Heuristic A*, permits the use of multiple, inadmissible heuristics that might arise from learning.

Experimental validation on multiple robots and real-world datasets, one of which we introduce, indicates that we can leverage the complementary strengths of fast learning-based methods and deliberative classical search to handle both "hard" (severely occluded) and "easy" portions of a scene by automatically sliding the amount of deliberation required.

Acknowledgements

I cannot sufficiently express my gratitude for my advisor, Max, whose constant support, patience, and guidance have resulted in an enjoyable journey. I consider myself fortunate to have been your student, and to have been able to work on a number of interesting problems over the years. Our association has presented me with practical lessons in research philosophy, problem-solving, and integrity that will stay for a long time to come.

I am thankful to my thesis committee members for their time and critique of this work. I am grateful to Sidd for the many valuable discussions and advice, to Martial, for the encouragement, and to Manuela, for all the insightful and big-picture questions. I thank Dieter for hosting me in his lab and for the collaboration from which I have learnt much. I must also thank Drew, for serving on my qualifier committee and for some of the most enriching classes at CMU.

SBPL has been my home away from home, and I am fortunate to have had wonderful lab-mates over time: Andrew, Ben, Brad, Ellis, Ishani, Fahad, Jon, John, Kalin, Kalyan, Karthik, Mike, Sameer, Sandip, Shivam, Siddharth, Sung, and Victor.
I am particularly indebted to Mike for teaching me much of what I know today, and for being the best mentor a fledgling graduate student could ask for. I sincerely hope that your time is rewarded.

Thanks are due to my fellow robo-grads, Abhijeeth, Arun, Humphrey, and Karthik for their time in proof-reading manuscripts and providing feedback on practice talks at different stages of this work. I am also grateful to Varun, for being a permanent source of wisdom, and Sanjiban, for providing timely feedback and partnering in visa ordeals. I must thank the administrative staff at the RI, Peggy, Suzanne, and Rachel for always being there to help and check on, despite their incredible amounts of work.

Life in Pittsburgh has been delightful thanks to a great set of friends. Madhu, Salini, Shivram, Sreekanth, and Swati: I will cherish forever the music sessions, travels, games, potlucks, and philosophical discussions. Abhijeeth and Sidzoo, I will miss our crossword "breaks". Going back in time, I am thankful to my friends from undergrad, Brain, Padhu, Prabhu, Praha, Deepak, Guru, RT, and Vijay, for their company, and for leading me into robotics.

Finally, none of this would have been possible without the support of my family. I thank my parents, Narayanan and Gomathi, for their unconditional trust and love, my sister Jayanthi for constantly reminding me of life outside the lab, and my extended family for providing a stimulating environment to grow in.

For Mangala Paati and the Sankarans

Contents

List of Figures
List of Tables

1 Introduction
  1.1 Motivation
  1.2 Conventional Approaches
    1.2.1 Case in Point: The Amazon Picking Challenge
  1.3 Proposed Approach
  1.4 Thesis Overview

2 Background
  2.1 Object Instance Detection
    2.1.1 Local and Global 3D Feature Descriptors
    2.1.2 Generative Approaches
    2.1.3 Learning-based Approaches
  2.2 Heuristic Search
    2.2.1 A* Search
    2.2.2 Variants of A*

I Foundations

3 PERCH: Perception via Search for Multi-Object Instance Recognition
  3.1 Setup
  3.2 Notation
  3.3 Optimization Formulation
  3.4 Monotone Scene Generation Tree
    3.4.1 Construction
    3.4.2 Search
  3.5 Completeness
  3.6 Evaluation
    3.6.1 Dataset
    3.6.2 Implementation Details
    3.6.3 Baselines
    3.6.4 Results

4 Extension to Unmodeled Clutter and Optimizations
  4.1 C-PERCH
    4.1.1 Notation
    4.1.2 Augmented Objective
    4.1.3 Tractability
  4.2 Pose Uncertainty Estimates
  4.3 Experiments
  4.4 Discussion
  4.5 Search Optimizations
    4.5.1 Depth Image Memoization
    4.5.2 Lazy Search
    4.5.3 Edge Cost Normalization
    4.5.4 Precomputed Distance Fields

II Discrimination and Deliberation

5 Discriminatively-guided Deliberative Perception
  5.1 Discriminative Heuristic Generation
  5.2 D2P Implementation
    5.2.1 R-CNN Heuristics
    5.2.2 Baseline Implementations
  5.3 Results
    5.3.1 Comparison with Baselines
    5.3.2 Utility of Lazy Edge Evaluations
    5.3.3 Discretization vs. ICP Tradeoff
    5.3.4 Synthetic Example

6 RANSAC-Trees for 6 DoF Pose
  6.1 Sampling-based Search and Sample Consensus
  6.2 Algorithm
  6.3 Theoretical Analysis
    6.3.1 Asymptotic Properties
    6.3.2 PAC-type Bounds
  6.4 The LOV Dataset
  6.5 Experiment Details
    6.5.1 Deep Learning for Dense Object Coordinate Regression
    6.5.2 RANSAC Details
    6.5.3 Evaluation
    6.5.4 Results
  6.6 Discussion

III Bridging Heuristic Search and Learning

7 Anytime Search on Graphs with Time-consuming Edge Evaluations
  7.1 Motivating Examples
  7.2 Background
  7.3 Overview
  7.4 Expected Shortest Paths* (ESP*)
    7.4.1 Problem Setup
    7.4.2 Algorithm
    7.4.3 Theoretical Analysis
  7.5 Optimal Policy for Edge Evaluation under Anytime Interruption
  7.6 Evaluation
    7.6.1 Mobile Manipulation Planning
    7.6.2 Synthetic Benchmarking
  7.7 Discussion