<<

Pose Estimation using Overhead Imagery and Semantics

by

Yousif Farid Dawood Rouben

S.B., Mathematics, MIT (2016)
S.B., Electrical Engineering and Computer Science, MIT (2016)

Submitted to the Department of Electrical Engineering and Computer Science in partial fulfilment of the requirements for the degree of Master of Engineering in Electrical Engineering and Computer Science at the MASSACHUSETTS INSTITUTE OF TECHNOLOGY

September 2018

© 2018 Yousif Farid Dawood Rouben. All rights reserved.

The author hereby grants to MIT permission to reproduce and to distribute publicly paper and electronic copies of this thesis document in whole or in part in any medium now known or hereafter created.

Author: Department of Electrical Engineering and Computer Science, September 7, 2018

Certified by: Nicholas Roy, Professor of Aeronautics and Astronautics, Thesis Supervisor

Accepted by: Katrina LaCurts, Chair, Master of Engineering Thesis Committee

Pose Estimation using Overhead Imagery and Semantics

by Yousif Farid Dawood Rouben

Submitted to the Department of Electrical Engineering and Computer Science on September 7, 2018, in partial fulfilment of the requirements for the degree of Master of Engineering in Electrical Engineering and Computer Science

Abstract

As hardware improves, so does the ability to run more complex algorithms. Improvements in the portability and performance of computing units have allowed for more compact robotic systems such as Micro Aerial Vehicles (MAVs), while developments in camera technologies have allowed for the generation of denser robot perception, such as depth and semantics, derived from environmental cues such as colour, shape and texture. Additionally, with the advent of commercial civilian satellites and initiatives like the Google Maps project, most of the earth's outdoor urban environments have been captured through satellite imagery. These overhead images contain useful information that can aid with outdoor robot navigation and localisation, by, for example, being used as prior maps in a localisation pipeline. This thesis presents a method for generating semantically consistent robot localisation updates on-board a MAV. The key contribution is a process that fuses local ground and global satellite image information through geometric alignment at the semantic level. This method uses a semantically segmented collapsed point cloud from a camera system on-board a MAV, without any notion of mapping, referencing only a potentially stale global overhead image. A particle filter is used to enable multi-hypothesis data association and allow for efficient recovery from motion uncertainty.

Thesis Supervisor: Nicholas Roy Title: Professor of Aeronautics and Astronautics

Acknowledgements

To my family: Thank you does not cut it; nothing ever will. I love you all. Mama, you are my light. This will never change. To my neighbours who have taught me so much in life and have been the foundations to my inquisitive nature: I am forever grateful. Thank you to Nick Roy for allowing me to join his group, providing me with support and direction in both research and life and for being nothing but direct with me. I would also like to thank John (Jake) Ware and John Carter for their invaluable insights, mentorship and for making research fun. Thank you Nicholas Greene for unknowingly serving as my general encyclopaedia, debugger and teacher. Thanks to Nicholas Villanueva for being in the same boat and reminding me that boats float more often than they sink. Thanks to Fernando Yordan for listening to my ideas and generally being a potato. Finally, I would like to thank the Robust Robotics Group and FLA team for their support and feedback - you have taught me a lot this past year.

Contents

1 Introduction
   1.1 Motivation
   1.2 Problem Statement
   1.3 Contributions
   1.4 Thesis Outline

2 Related Work
   2.1 Robot Localisation
      2.1.1 Basic Notation
      2.1.2 Dead Reckoning and GPS
      2.1.3 Recursive State Estimation
   2.2 Vision Based Overhead Localisation
      2.2.1 Feature Based Approaches
      2.2.2 Learned Embedding Approaches
   2.3 Semantic Segmentation
      2.3.1 Datasets

3 Approach
   3.1 Algorithm Outline
   3.2 Problem Formulation
      3.2.1 Notation
      3.2.2 Problem Statement
   3.3 Semantic Segmentation: Training
   3.4 Semantic Topview Generation
   3.5 Monte Carlo Localisation
      3.5.1 Initialisation
      3.5.2 Prediction: Motion Model
      3.5.3 Correction: Measurement Model for Particle Weighting
      3.5.4 Pose Estimate Calculations
   3.6 Summary

4 Results
   4.1 Semantic Segmentation
   4.2 Simulation
      4.2.1 Setup
      4.2.2 Results
   4.3 Real Flight Data

5 Conclusion

List of Figures

1-1 Examples of Commercial Autonomous Robots
1-2 Examples of Overhead Imagery
1-3 Examples of Perception Sensors for Autonomous Vehicles

2-1 Graphical Model of Markovian State Estimation Problem
2-2 Semantic Segmentation Annotated Training Image Pair Example

3-1 Semantic Overhead Localisation Pipeline
3-2 Semantic Topview Generation Pipeline
3-3 Odometry Model Diagram

4-1 Overhead Image of the Simulated Environment
4-2 Approximate Flight Path for Simulation Tests
4-3 "Figure-Eight" Simulation Flight Path
4-4 Example of "Easy" Simulation Trajectory and Errors
4-5 Example of "Medium" Simulation Trajectory and Errors
4-6 Example of "Hard" Simulation Trajectory and Errors
4-7 Simulation Position and Orientation Mean Square Errors
4-8 CDFs of Absolute Position and Orientation Errors in Simulation
4-9 Examples of Pose Estimate Trajectories along a "Figure-Eight" Path
4-10 Semantically Labelled Overhead Image of Real World Flight Region
4-11 SAMWISE vs Semantic Overhead Localisation Estimate Trajectories
4-12 Semantic Path Reconstruction using SAMWISE
4-13 Semantic Path Reconstruction using Semantic Overhead Localisation

List of Tables

3.1 Semantic Segmentation Training Set: Class Mappings

4.1 Semantic Segmentation Prediction Examples Across Test Datasets
4.2 Semantic Segmentation Results: Pixel Accuracies
4.3 Semantic Segmentation Results: Intersection over Union
4.4 Semantic Segmentation Training Set: Cross-Entropy Loss Reweightings
4.5 Parameters Controlling Testing Difficulty in Simulation
4.6 Mean and Standard Deviation of Simulation Position Mean Square Errors

Chapter 1

Introduction

The field of robotics has progressed greatly over the past few decades. However, one of the most compelling and yet most difficult challenges in robotics remains autonomy: allowing a robot to operate without human intervention or guidance. A key aspect of autonomy, particularly for robots that are tasked with kinematic actions, such as ground and aerial vehicles that navigate in various environments, is localisation. In order for a robot to complete its designated tasks, it needs to know, with accuracy, the where (localisation) and the when (synchronisation) so that it can determine the how (planning) and the what (actions).

This thesis presents a robot localisation approach based on Monte Carlo Localisation (MCL) methods that uses satellite imagery as the prior map and a camera system with depth estimation capabilities as the environmental measurement sensor. Semantic segmentation is used, enabling semantically meaningful alignments between the global overhead image and the local onboard camera imagery. The presented approach is designed for real-time use onboard autonomous vehicles and is tested using a micro aerial vehicle (MAV) framework.

This chapter outlines the motivations for the research presented (Section 1.1), a brief introduction to the robot localisation problem that this thesis addresses (Section 1.2), the contribution of this thesis towards exploring the field of semantic localisation using an overhead image (Section 1.3), and finally, an overview of the subsequent chapters (Section 1.4).

(a) Skydio aerial photography drone [1] (b) Waymo's self-driving car [2]

(c) Chinese surveillance drone [3]

Figure 1-1: Examples of autonomous robots for transport, photography and surveil- lance.

1.1 Motivation

The development of autonomous robots has been largely driven by advancements in both hardware and software. Improvements in hardware have enabled the execution of more complex algorithms and systems, while the development of new algorithms has allowed for more robust robot perception, motion, control and planning. Indeed, the two go hand in hand in moving towards truly autonomous, real-time systems that could integrate themselves into our everyday lives, ranging in purpose from aerial photography [1] to driving [2], surveillance [3] to search and rescue [4].

Typically, the robotic platform of focus for localisation techniques has been autonomous ground vehicles. However, a more generalised and arguably more challenging class of robots is aerial vehicles, particularly micro aerial vehicles (MAVs). Although their dynamic models are well understood, they are, in some ways, a more challenging platform than traditional ground vehicles. Their movements are not restricted to planar surfaces (such as roads), while additional degrees of freedom impose requirements for systems that can account for the increased complexity. For example, cameras mounted on MAVs are no longer restricted to a fixed "dashcam" view,

as would be the case with ground vehicles, making problems such as semantic segmentation and object detection more difficult to solve. Additionally, while constraints on size, weight and power allow for more versatile applications such as package delivery [5, 6], this versatility comes with limitations, particularly the speed with which a MAV can manoeuvre, especially when denied prior knowledge of its environment, the Global Positioning System (GPS) for global localisation, or communications to external systems, thus necessitating that all sensing, computing and inference be conducted onboard the vehicle's limited computational unit.

Figure 1-2: Examples of publicly available satellite overhead images overlooking the main MIT campus (right) and the Guardian Centers of Georgia (left). Images courtesy of Google Maps [7].

Sensors such as inertial measurement units (IMUs) have become readily available and affordable over the past decade and allow for cheaper, more accessible - albeit noisy - odometry, while more expensive commercial systems like GPS [8] provide global positioning to within a few meters of error [9]. Additionally, localisation techniques such as MCL [10] exist, where local sensory information, usually generated from range-based sensors such as LiDAR or sonar, is used to cross-reference against a prior of the environment, usually in the form of a 2D metric map. As a result of the range of initiatives for mapping the world through satellite imagery, including several made publicly available [11] (such as Google Earth [12]), there currently exists a plethora of overhead imagery covering most of the outdoor urban world. Such information could serve as valuable priors to the robot localisation problem in outdoor urban environments, which could facilitate more versatile, robust developments

in urban autonomous driving and flight.

In cases where priors (such as a map of the environment) are not known, other techniques can be used, such as Simultaneous Localisation and Mapping (SLAM) frameworks [10, 13, 14], where robots simultaneously localise and build a map of their environment, using one to do the other and vice versa. As the robot moves within its environment, it perceives the environment through its various sensors and fuses those noisy measurements in order to localise itself with respect to the environment, which it must also map in parallel. This is challenging: the robot needs a map in order to estimate its pose, but it also needs an estimate of its pose, together with the outputs of its sensors, in order to estimate that map; a paradoxical problem in many senses.

In addition to direct improvements in localisation techniques, there have also been great advancements in other areas that may implicitly improve robot localisation. One field that has seen significant progress in the last five years is semantic segmentation, which is the problem of segmenting images such that each pixel is associated with a semantic class representing what that pixel belongs to. By extracting such information from both maps and local sensory inputs, one can further constrain the matching problem between these two data sets such that they incorporate an additional notion of correctness: are the data sets that are being matched between the map and the sensory input semantically consistent? In other words, are they the same "thing"?

Being able to localise within an environment using only local sensors and computation onboard MAVs is the underlying motivation for this thesis. Although high-speed, high-fidelity GPS-denied localisation and navigation is a task that challenges the entire autonomy stack, this thesis focuses on improving the state-of-the-art in robot localisation using potentially stale overhead images and an onboard camera and computation system. The central approach of this thesis, semantic overhead localisation, will be introduced and shown to be capable of providing corrections to noisy pose estimates.

(a) Hokuyo 2D Scanning LiDAR [40] (b) Velodyne 3D Scanning LiDAR [41]

(c) Intel Realsense D435 Stereo Camera [42] (d) PointGrey Flea3 Monocular Camera [43]

Figure 1-3: The most popular perception sensors used within autonomous vehicle stacks are typically laser based scanning LiDARs and cameras.

1.2 Problem Statement

Deep learning and neural networks have shown that we can now fairly reliably segment a single image of a scene into different semantic components, such as roads, buildings and trees [15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]. Several robot systems have used this information for choosing local navigation strategies [26, 27, 28, 29, 30]. Similar strategies have been used to segment overhead imagery [31, 32, 33, 34], also for navigation but at a global scale [35, 36, 37, 38, 39]. One area that remains relatively unexamined, however, is that of fusing segments from overhead imagery with local segments. For example, a vehicle may know from an overhead image that it is supposed to be following a road, but is that road the same road that it observes in the local camera image? The key question this thesis considers is how to fuse segmentations from local and global images and use them in a navigation strategy. We would especially like to do this without resorting to a globally consistent, detailed metric map, but rather by reasoning at the symbolic level.

In typical MCL approaches that use ranging sensors (such as sonar or LiDAR) for generating local measurements of the environment and a pre-built 2D map of the

environment, localisation is usually achieved through geometric alignments or feature matching, since the two sources of data exist within the same planar representation. However, these approaches fail to take into consideration the semantic inconsistencies of such alignments. With a vision-based technique, by contrast, semantics come naturally due to the sheer amount of information captured in each frame. Additionally, unlike 2D range-based approaches, vision approaches allow for denser signals since 3D information can be generated, which would otherwise require significantly more expensive 3D range sensors. Recent advancements in deep-learning-based semantic segmentation techniques enable real-time on-board segmentation of image streams, thus allowing for more insightful matchings to be made between local and global inputs.

This thesis presents a new localisation framework that is based on the well-established MCL method to localise a robot within a map via local measurements from on-board sensors. This framework is intended to serve as an additional layer to a global localisation framework, providing real-time pose corrections at regular intervals. In this new framework, the map of the environment is an overhead satellite image that need not be a fully detailed metric map. The local measurements used as inputs to the localiser are images from a camera on board the MAV and depth information obtained through the use of stereo vision, monocular vision techniques or RGBD sensors. Given a prior over the robot's current pose, the goal is to generate accurate robot pose estimates through robust alignments of environment geometry, obtained by fusing global and local information at the semantic level.

1.3 Contributions

This thesis makes three key contributions. Firstly, it demonstrates that training a semantic segmentation network with a sufficiently large and varied dataset composed of urban images allows for urban scene segmentation that generalises for use on board a MAV in environments not used during the training phase. Secondly, it

establishes that localisation from stale overhead images is viable, given local vision sensors that provide greater contextual detail than more costly alternatives such as laser scanners. Finally, it illustrates, first hand, the usefulness of semantic information in global localisation. In particular, it demonstrates the ability to use a "binary semantic photometric error" as a more robust reweighting factor in a particle filter's measurement model than traditional colour-based photometric error.

The effectiveness of the system presented in this thesis is evaluated in both a simulation environment and using flight data captured from onboard a real-world MAV platform. A battery of flights conducted in a simulated version of a real-world outdoor urban environment demonstrates that a robot can effectively correct its pose estimates and recover from drift errors, even as inverse depth noise is increased in the depth estimates. In addition, sample trials on data collected from onboard a real quad-rotor have shown reductions in error between the estimated and actual landing pose.

1.4 Thesis Outline

This thesis begins by providing an overview of relevant background information in the fields of robot localisation, overhead localisation and semantic segmentation in Chapter 2. The remainder of this thesis presents a novel approach to autonomous robot localisation using semantics from local and global imagery. Chapter 3 details the foundations and schematics of this new approach, while Chapter 4 presents the results derived from testing this framework, first in simulation, followed by real-world physical flight data. Finally, Chapter 5 provides concluding remarks and briefly notes future work that could further this line of research.

Chapter 2

Related Work

This chapter provides a brief overview of the vision-based robot localisation problem and some of the existing methods for solving it, with a concentrated focus on overhead localisation. Section 2.1 discusses the robot localisation problem, introduces some basic notation and compares some of the most popular problem-solving techniques. Section 2.2 presents past works in camera-based overhead localisation. Finally, Section 2.3 discusses semantic segmentation with a focus on deep-learning pipelines and several publicly available datasets for training segmentation networks in urban environments.

2.1 Robot Localisation

In this section, we begin by first introducing some basic notation (Section 2.1.1) in the context of robot localisation that will be used in the latter sections of this chapter and in subsequent chapters to discuss the robot localisation problem, formulations, approaches and results. We continue by briefly outlining some of the most basic methods for robot localisation (Section 2.1.2) as well as more standard approaches that form the foundations of what is generally used in practice (Section 2.1.3).

21 2.1.1 Basic Notation

The typical robot localisation problem involves a robot moving through an environment and trying to estimate, at discrete time steps $t \in \mathbb{N}_0$, its position and orientation, also known as its pose (interchangeably referred to as "state"), usually within a 2D or 3D coordinate frame. We denote the robot pose at time $t$ by $\mathbf{x}_t \in \chi$, where, for example, $\chi = SE(2)$ in the case of 2D localisation or $\chi = SE(3)$ in the case of 3D localisation.

Robot movements are executed through control inputs, with the input at discrete time $t$ denoted by $\mathbf{u}_t$; it represents the command used to move the robot in the interval $(t-1, t]$ from pose $\mathbf{x}_{t-1}$ to pose $\mathbf{x}_t$. Common sources for these control inputs include inertial measurement units (IMUs) and visual odometry (VO) systems.

The environment is often represented by either a metric map, which we denote by $Map$, or a set of $K$ landmarks $(\mathbf{l}_1, \ldots, \mathbf{l}_K)$. The space in which these representations exist depends on the setup of the localisation problem; however, for this thesis, we typically assume that $\mathbf{l}_k \in \mathbb{R}^3$ and $Map \subset \mathbb{R}^{m \times n}$ for $m, n \in \mathbb{N}$.

Measurements of the surrounding environment through local sensors can be used to help determine robot pose by, for example, using data association techniques that try to match noisy measurements from sensors to landmarks in the environment. Although not always the case, measurements are typically represented by a single variable $\mathbf{z}_t \in \mathcal{Z}$ for each discrete timestep $t$. Since this thesis focuses on vision-based approaches, $\mathcal{Z}$ is typically a subset of $\mathbb{R}^{m \times n}$ in the case of the image domain or $\mathbb{R}^{3 \times 1 \times k}$ in the case of 3D point clouds, for some $m, n, k \in \mathbb{N}$.

At times, it may be necessary to refer to a history of values for some variable $\mathbf{v}$. This history may either be a temporal history between two discrete timesteps or simply a set of values such as landmarks. We represent this history by $\mathbf{v}_{t:t'} = (\mathbf{v}_t, \mathbf{v}_{t+1}, \ldots, \mathbf{v}_{t'})$ for $t' > t$ and $t, t' \in \mathbb{N}_0$.

With this notation, we define robot localisation as the process of estimating, at discrete time $t$, the pose of the robot $\mathbf{x}_t$ within some predefined coordinate frame, given a history of command inputs $\mathbf{u}_{1:t}$ and measurements $\mathbf{z}_{1:t}$.

22 2.1.2 Dead Reckoning and GPS

The most basic form of robot localisation is often attributed to dead reckoning, whereby motion measurements are used to estimate a distribution over robot pose, without any notion of a map. An initial pose distribution $p(\mathbf{x}_0)$ is required, however, with respect to which the robot is initially localised by propagating the pose using input measurements. In the case of wheeled robots, measurements can be derived using odometry from wheel encoders, while more generally, sensors such as Inertial Measurement Units (IMUs) can be used, which combine accelerometers and gyroscopes to measure the force, angular rate and magnetic field surrounding the robot to which they are attached [44]. This method is subject to pose accuracy degradation as noise is accumulated over time in an integral fashion. Alternative sensor-based methods exist, with the most popular being GPS [8], which generates real-time timestamped global position and velocity estimates through triangulation with four or more satellites from a constellation that orbits the earth. While unencrypted civilian GPS is considered accurate to within 4.9 meters [9, 45, 46], it is susceptible to exploits, namely spoofing attacks [47, 48], and its reliability for mission-critical robot localisation is hampered by issues with signal blockage and reflection when operating in urban canyons (such as around buildings, bridges and trees), indoors or underground [49, 50, 45]. Although methods do exist that fuse dead reckoned odometry to deal with GPS signal drop-out and degradation [51], in general, the existing accuracy of GPS networks is unacceptable for localising both large-scale robots (such as autonomous cars) and more compact vehicles (such as MAVs), where errors of a few meters may be the difference between safety and catastrophe when adopting these measurements in the autonomy stack.
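To make the accumulation of dead-reckoning error concrete, the following is a minimal Python sketch (not taken from this thesis) that integrates body-frame odometry increments into an SE(2) pose; the increment format and noise magnitudes are illustrative assumptions.

```python
# A minimal sketch of 2D dead reckoning, assuming odometry arrives as
# per-step increments (dx, dy, dtheta) expressed in the robot's body frame.
import numpy as np

def dead_reckon(x0, odometry):
    """Propagate an SE(2) pose through a sequence of body-frame increments."""
    poses = [np.asarray(x0, dtype=float)]
    for dx, dy, dtheta in odometry:
        x, y, theta = poses[-1]
        # Rotate the body-frame translation into the world frame, then integrate.
        x += np.cos(theta) * dx - np.sin(theta) * dy
        y += np.sin(theta) * dx + np.cos(theta) * dy
        theta = (theta + dtheta + np.pi) % (2 * np.pi) - np.pi  # wrap to [-pi, pi)
        poses.append(np.array([x, y, theta]))
    return poses

# Example: noisy increments along a straight line drift away from ground truth.
rng = np.random.default_rng(0)
increments = [(1.0 + rng.normal(0, 0.02), rng.normal(0, 0.02), rng.normal(0, 0.01))
              for _ in range(100)]
print(dead_reckon((0.0, 0.0, 0.0), increments)[-1])
```

Because each noisy increment is simply integrated, the error in the final pose grows without bound over time, which is exactly the degradation described above.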

Indeed, robot localisation need not only involve knowing the robot's pose relative to a point of reference. As robots become increasingly complex systems that interact with their environments, the need for more local reasoning has become paramount in informing several parts of the autonomy stack, such as planners that estimate optimal strategies for robot actions. Typically, this is done using methods that reason

Figure 2-1: Graphical model of the Markovian state estimation problem. White nodes represent the hidden state variables to be estimated, dark nodes express control and measurement observations, while arrows illustrate conditional dependence of destination on source.

about measurements of the environment generated by local sensors and a map that represents a belief of the environment's state. In the case where a prior map of the environment is present, global pattern matching methods or recursive state estimation techniques such as Kalman filters (KFs) and MCL can be used, whereby data association between the map and local sensor measurements of the environment is conducted. However, in the case where no map exists or the existing map is deficient, the problem is often modelled as a Simultaneous Localisation and Mapping (SLAM) problem, which requires mapping the environment and simultaneously localising from the generated map. In the following section, we briefly discuss recursive state estimation, which forms the basis of the robot localisation approach we present in Chapter 3.

2.1.3 Recursive State Estimation

Since the robot pose $\mathbf{x}_t$ cannot be measured directly, it must instead be inferred from noisy measurements $\mathbf{z}_{1:t}$ and control inputs $\mathbf{u}_{1:t}$. As such, the robot localisation problem can be thought of as one that requires estimating the probability distribution over the robot's pose given a set of noisy inputs. More precisely, we denote this distribution as:

$$bel(\mathbf{x}_t) = p(\mathbf{x}_t | \mathbf{z}_{1:t}, \mathbf{u}_{1:t}) \tag{2.1}$$

We note, however, that the computational and memory complexity grows exponentially with the number of measurements and control inputs. To make maintaining the conditional distribution more manageable, it is often assumed that the system is Markovian (Figure 2-1), whereby the state $\mathbf{x}_t$ is independent of all past states given $\mathbf{x}_{t-1}$ and $\mathbf{u}_t$, and measurements are independent given the state.

While the belief is defined as a probability distribution over $\mathbf{x}_t$ that incorporates the most recent measurement and control input, the prediction, denoted by $\overline{bel}(\mathbf{x}_t)$, is the probability distribution before incorporating the most recent measurement $\mathbf{z}_t$ but just after executing the most recent control input $\mathbf{u}_t$:

$$\overline{bel}(\mathbf{x}_t) = p(\mathbf{x}_t | \mathbf{z}_{1:t-1}, \mathbf{u}_{1:t}). \tag{2.2}$$

Calculating the belief distribution from the prediction distribution is possible and forms the basis of recursive state estimation. The Bayes filter is the most general algorithm for calculating belief distributions and serves as the foundation for probabilistically estimating the robot state given noisy measurement and control data. It is a recursive procedure that uses the Markov assumption presented in Figure 2-1 to compute the current belief $bel(\mathbf{x}_t)$ from the previous belief $bel(\mathbf{x}_{t-1})$. The procedure is outlined in Algorithm 1; we will derive the belief step (line 7) followed by the prediction step (line 5).

By Bayes’ rule, we can rewrite the belief distribution as:

Algorithm 1 Recursive update step of the Bayes filter.

1: function BayesFilter($bel(\mathbf{x}_{t-1})$, $\mathbf{u}_t$, $\mathbf{z}_t$)
2:     // For all pose estimates
3:     for all $\mathbf{x}_t$ do
4:         // Compute prediction
5:         $\overline{bel}(\mathbf{x}_t) = \int p(\mathbf{x}_t | \mathbf{u}_t, \mathbf{x}_{t-1})\, bel(\mathbf{x}_{t-1})\, d\mathbf{x}_{t-1}$
6:         // Measurement update that corrects from prediction to belief
7:         $bel(\mathbf{x}_t) = \eta\, p(\mathbf{z}_t | \mathbf{x}_t)\, \overline{bel}(\mathbf{x}_t)$
8:     return $bel(\mathbf{x}_t)$

$$\begin{aligned}
bel(\mathbf{x}_t) &= p(\mathbf{x}_t | \mathbf{z}_{1:t}, \mathbf{u}_{1:t}) &(2.3)\\
&= \frac{p(\mathbf{z}_t | \mathbf{x}_t, \mathbf{z}_{1:t-1}, \mathbf{u}_{1:t})\, p(\mathbf{x}_t | \mathbf{z}_{1:t-1}, \mathbf{u}_{1:t})}{p(\mathbf{z}_t | \mathbf{z}_{1:t-1}, \mathbf{u}_{1:t})} &(2.4)\\
&= \eta\, p(\mathbf{z}_t | \mathbf{x}_t, \mathbf{z}_{1:t-1}, \mathbf{u}_{1:t})\, p(\mathbf{x}_t | \mathbf{z}_{1:t-1}, \mathbf{u}_{1:t}) &(2.5)\\
&= \eta\, p(\mathbf{z}_t | \mathbf{x}_t, \mathbf{z}_{1:t-1}, \mathbf{u}_{1:t})\, \overline{bel}(\mathbf{x}_t) &(2.6)
\end{aligned}$$

where $\eta$ is a normalisation constant that we can parametrise out.

Due to conditional independence:

$$p(\mathbf{z}_t | \mathbf{x}_t, \mathbf{z}_{1:t-1}, \mathbf{u}_{1:t}) = p(\mathbf{z}_t | \mathbf{x}_t). \tag{2.7}$$

From this, we derive line 7 of the Bayes Filter Algorithm:

$$bel(\mathbf{x}_t) = \eta\, p(\mathbf{z}_t | \mathbf{x}_t)\, \overline{bel}(\mathbf{x}_t). \tag{2.8}$$

By the theorem of total probability, we can rewrite the prediction step defined by Equation 2.2 as:

$$\overline{bel}(\mathbf{x}_t) = \int p(\mathbf{x}_t | \mathbf{x}_{t-1}, \mathbf{z}_{1:t-1}, \mathbf{u}_{1:t})\, p(\mathbf{x}_{t-1} | \mathbf{z}_{1:t-1}, \mathbf{u}_{1:t})\, d\mathbf{x}_{t-1}. \tag{2.9}$$

We now derive line 5 of the Bayes filter algorithm by applying the Markov assumption and noting that we can omit $\mathbf{u}_t$ from the set of conditioning variables of $p(\mathbf{x}_{t-1} | \mathbf{z}_{1:t-1}, \mathbf{u}_{1:t})$ for randomly chosen controls:

$$\overline{bel}(\mathbf{x}_t) = \int p(\mathbf{x}_t | \mathbf{x}_{t-1}, \mathbf{u}_t)\, p(\mathbf{x}_{t-1} | \mathbf{z}_{1:t-1}, \mathbf{u}_{1:t-1})\, d\mathbf{x}_{t-1}. \tag{2.10}$$
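As a concrete illustration of Algorithm 1, the following is a minimal histogram sketch over a discrete one-dimensional state space, assuming the transition and measurement models are supplied as arrays (the prediction integral then becomes a sum). The example transition matrix and likelihood are illustrative assumptions, not models used in this thesis.

```python
# A minimal histogram sketch of the Bayes filter update of Algorithm 1 over a
# discrete 1D state space; the integral in the prediction step becomes a sum.
import numpy as np

def bayes_filter_step(bel_prev, transition, likelihood):
    """One recursive update of the Bayes filter.

    bel_prev:    belief over states at time t-1, shape (N,)
    transition:  p(x_t | u_t, x_{t-1}) as an (N, N) matrix, rows sum to 1
    likelihood:  p(z_t | x_t) evaluated for the current measurement, shape (N,)
    """
    bel_bar = transition.T @ bel_prev          # prediction (line 5)
    bel = likelihood * bel_bar                 # measurement update (line 7)
    return bel / bel.sum()                     # eta normalises the posterior

# Example: 5 cells, the robot tends to move one cell right per control input.
N = 5
T = np.zeros((N, N))
for i in range(N):
    T[i, min(i + 1, N - 1)] += 0.8             # intended motion
    T[i, i] += 0.2                             # chance of staying put
bel = np.full(N, 1.0 / N)                      # uniform initial belief
z_lik = np.array([0.1, 0.1, 0.6, 0.1, 0.1])    # sensor says "probably cell 2"
print(bayes_filter_step(bel, T, z_lik))
```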

The Bayes filter, however, is intractable for most classes of problems — with the exception of linear Gaussian systems — due to the appearance of an integral, and can therefore only be implemented over a finite representation by means of approximate methods that try to compute $bel(\mathbf{x}_t)$ by estimating three key distributions: the initial belief $p(\mathbf{x}_0)$, the measurement probability $p(\mathbf{z}_t | \mathbf{x}_t)$ and the state transition probability $p(\mathbf{x}_t | \mathbf{x}_{t-1}, \mathbf{u}_t)$. The state transition probability distribution is typically estimated using a motion model such as dead reckoning, while the measurement probability distribution is estimated using a measurement model that tries to associate local sensory inputs with landmarks or a map in order to generate a measure of confidence for the observation. Many types of measurement models exist and often vary depending on the sensors used. For example, measurement models associated with visual sensors often involve feature matching, whereby features (such as corners or edges) from input images are matched with known features of the environment (represented by landmarks or a map), which then allows a probability to be estimated based on the closeness of the matchings. In Section 2.2.1, we discuss pipelines that use feature-matching measurement models as part of their localisation procedures between images from local camera sensors and a map of the environment.

Two of the most popular classes of recursive state estimation techniques are Gaussian and nonparametric filters, which differ in their representation of the probability density function (PDF) over the state $\mathbf{x}_t$. As their name implies, Gaussian filters represent belief through known models, namely multivariate normal distributions expressed by two sets of unknown parameters: a mean vector $\mu$ and a positive definite covariance matrix $\Gamma$, with PDF:

$$f(x) = |2\pi\Gamma|^{-\frac{1}{2}} \exp\left(-\frac{1}{2}(x - \mu)^T \Gamma^{-1} (x - \mu)\right). \tag{2.11}$$
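As a small numerical check of Equation 2.11, the following sketch evaluates the density by hand for an assumed two-dimensional state and compares it against scipy.stats.multivariate_normal; the mean, covariance and query point are arbitrary illustrative values.

```python
# A small numerical check of Equation 2.11 for an assumed 2D state.
import numpy as np
from scipy.stats import multivariate_normal

def gaussian_pdf(x, mu, gamma):
    """Evaluate f(x) = |2*pi*Gamma|^(-1/2) exp(-1/2 (x-mu)^T Gamma^{-1} (x-mu))."""
    d = x - mu
    norm = np.linalg.det(2 * np.pi * gamma) ** -0.5
    return norm * np.exp(-0.5 * d @ np.linalg.solve(gamma, d))

mu = np.array([1.0, -2.0])
gamma = np.array([[2.0, 0.3], [0.3, 1.0]])     # positive definite covariance
x = np.array([0.5, -1.5])
print(gaussian_pdf(x, mu, gamma), multivariate_normal(mu, gamma).pdf(x))
```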

The most popular and widely used Gaussian filter is known as the Kalman filter

[52], which computes the PDF over the state $\mathbf{x}_t$ by its first and second moments, but is restricted to linear Gaussian systems in order to exactly estimate the posterior. The Extended Kalman filter (EKF) is an extension of the Kalman filter that can be used to approximate nonlinear systems by using a first-order Taylor expansion about the current estimate to linearise the system.

Nonparametric filters, such as particle filters, on the other hand, are not restricted by the functional form of the system, as they approximate the belief through a finite set of $N$ state samples drawn from the belief distribution; as $N$ is taken to infinity, the approximation error — defined over the statistics computed from the samples relative to the statistics of the original distribution — converges uniformly to zero. These state samples, called particles in the context of particle filters, are represented by $X_t = \{\mathbf{x}_t^{[i]}; i \in [1, N]\}$. The more densely concentrated the particles are in a particular state, the higher the posterior probability around that state. It is because of this that particle filters can be used for state estimation of non-linear, non-Gaussian systems. Particle filters are, however, computationally more expensive than Kalman filters, with computation time suspected to increase exponentially, instead of quadratically, with state space dimension. Adaptive versions do exist which can vary the particle set size dynamically, making particle filtering methods attractive, as they allow for adaptability to a variety of localisation problems and computational constraints. We revisit the MCL algorithm, which is based on particle filtering, later in Chapter 3 as it comprises the backbone of our localisation approach.
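The following is a minimal sketch of a single nonparametric (particle filter) update for an assumed one-dimensional state with Gaussian motion and measurement noise; it is intended only to illustrate the propagate-weight-resample cycle described above, not the MCL implementation presented later in this thesis.

```python
# A minimal sketch of one particle filter update for an assumed 1D state.
import numpy as np

def particle_filter_step(particles, u, z, rng, motion_std=0.1, meas_std=0.5):
    """Propagate N particles through the motion model, weight them by the
    measurement likelihood and resample in proportion to those weights."""
    # Prediction: sample from p(x_t | x_{t-1}, u_t).
    particles = particles + u + rng.normal(0.0, motion_std, size=particles.shape)
    # Correction: weight each particle by p(z_t | x_t).
    weights = np.exp(-0.5 * ((z - particles) / meas_std) ** 2)
    weights /= weights.sum()
    # Resampling: draw a new, equally weighted particle set.
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx]

rng = np.random.default_rng(1)
particles = rng.uniform(-5.0, 5.0, size=500)   # broad initial belief
for u, z in [(1.0, 1.2), (1.0, 2.1), (1.0, 2.9)]:
    particles = particle_filter_step(particles, u, z, rng)
print(particles.mean(), particles.std())
```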

2.2 Vision Based Overhead Localisation

A variety of approaches exist that use satellite imagery within autonomous vehicle pipelines for purposes that range from planning to localisation. However, due to the computational complexity of several of these algorithms, not all are viable for online use. For example, Kaminsky et al. [53] proposed a system whereby they performed ground-to-overhead optimal global alignment using an objective function that matched 3D points — generated using structure from motion [54] techniques — to image edges and imposed free space constraints based on the visibility of points in each local image. This was done by aligning the 2D ground plane projection of the point cloud with the overhead image and using GPS data to further improve results. The approach was promising in its ability to accurately localise, but was constrained to offline use, in part due to its reliance on structure from motion techniques. Alternatively, Sofman et al. [55], in earlier research, demonstrated the ability to use overhead imagery for predicting traversal costs over large areas by learning to use scope-limited features. Although this does not perform direct localisation, it illustrates the breadth with which satellite imagery can be used within robotic navigation systems. This section highlights localisation pipelines that use satellite imagery as priors of the environment and attempt to use images from camera sensors on board a robot for localisation in urban environments. Section 2.2.1 discusses the most common category of approaches, which focus on matching features belonging to either building outlines or roads due to their similar observability in both local and satellite imagery. Finally, Section 2.2.2 discusses more recent deep-learning based works that use learned embeddings to represent local and global images in a common feature space, allowing for unified similarity measurements.

2.2.1 Feature Based Approaches

Earlier works in overhead localisation using an onboard camera sensor primarily focused on feature-based matching methods to either perform direct global localisation or act as measurement models that estimated the measurement distribution within

29 a recursive estimation framework. Features of interest were usually those considered distinct and observable in both local and global imagery.

Building Features

Some of the most distinct observable features in an urban environment come from orthogonal structures such as trees, poles, traffic signs and buildings. Many of these objects tend to be static, particularly buildings, and are thus ideal for use in localisation pipelines that employ satellite images, which may not always be up-to-date. McHenry et al. [56] presented a very early example of work that leveraged such observed features by using edge-detection-based heuristics for extracting roof-lines. The observed roof-line features were given range through wide baseline stereo and fed into a particle filter along with the overhead image for localisation. Although this research illustrated the possibility of local-global image localisation, the breadth of features, their manner of extraction and the use of the overhead image were very limited, having not made full use of the richness of available information. In a similar light, Leung et al. [57] also focused on highlighting building boundaries (walls) in aerial images. However, instead of employing stereo vision, they used vanishing point analysis to infer 3D information from 2D images to identify relevant wall features from local images. They then compared the orientation of building boundaries to determine the importance factor of particles in a particle filter. Although the accuracy of this system was shown to be similar to GPS, the authors noted the requirement for a considerable duration of motion — up to 150 s — in order for the particle cloud to converge towards the robot pose, effectively rendering the system impractical for many robot-critical tasks.

More recent work by Lansiedel et al. [58, 59] used sparse semantic data by extracting building outlines and road networks from OpenStreetMaps [60] to localise a robot in unknown urban environments. Local 3D point cloud data generated from a laser scan was first reduced to a noise-absorbing occupancy grid by using the rectangular cuboid approximation framework (RMAP) [61]. Then, a 2D topview consisting of line segments representing extracted orthogonal structures (buildings) was extracted by using the probabilistic Hough transform [62]. The Hough transform works by mapping each edge pixel to various points in a parameter space that represents all possible lines that could pass through the point. Sets of collinear points will intersect in the parameter space and the point of maximum intersection provides the parameters that describe the edge in the input image. An augmented version of the classical Chamfer Matching technique [63, 64] that minimises the sum over a distance transform was used for template matching between the locally generated topview and the semantic overhead image to generate matching scores of potentially viable poses. They restricted the number of poses considered as valid candidates for localisation by utilising information from the semantic overhead map to discard estimates located within buildings or more than 12 m from any road. Although a 3D laser scanner was used to generate local measurements of the environment, the approach could be generalised for use with a camera system that includes a depth estimation pipeline. Doing so would lead to sparser point clouds, but enable the use of semantic segmentation techniques to extract additional scene information allowing for more semantically consistent alignments with the overhead image. A benefit of the texture independence derived from laser scan use, however, is increased robustness to temporal changes such as season and lighting conditions. Since this method is a global localisation technique that relies on geometry alone and does not use odometry nor keep track of past robot state, it is susceptible to perceptual aliasing in cases where the environment contains multiple regions that are geometrically indistinguishable. While the authors state that their non-optimised single-threaded implementation is not suitable for real-time use onboard a robot, it could still be used to provide initial and periodic pose estimate distributions for an MCL pipeline, allowing for more informed re-insertions of particles.
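As an illustration of the line extraction step described above, the following sketch applies OpenCV's probabilistic Hough transform to a rasterised topview; the input file name and the edge and Hough thresholds are illustrative assumptions rather than the parameters used by Lansiedel et al.

```python
# A minimal sketch of extracting line segments (e.g. building walls) from a
# rasterised topview with the probabilistic Hough transform. The file name and
# thresholds below are illustrative assumptions.
import cv2
import numpy as np

grid = cv2.imread("topview.png", cv2.IMREAD_GRAYSCALE)   # occupied cells are bright
edges = cv2.Canny(grid, 50, 150)                          # edge pixels feed the transform
segments = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=30,
                           minLineLength=20, maxLineGap=5)
if segments is not None:
    for x1, y1, x2, y2 in segments[:, 0]:
        print(f"wall segment: ({x1},{y1}) -> ({x2},{y2})")
```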

Although building roof outlines have been the feature of choice in the approaches discussed thus far, Wang et al. [65] have shown that vertical edges can also be used. They introduced feature-based localisation between air and ground (FLAG), a framework that computes stable descriptorless features associated with vertical structure in the environment from an onboard stereo camera system and matches them to corresponding hand-annotated features in an aerial image. BRIEF descriptors [66] were used to match vertical edge pixels in corresponding stereo image pairs, and a sparse, filtered 3D point cloud consisting only of the matched pairs was generated using least squares triangulation techniques [67]. A second filtering procedure for outlier removal was executed on the point cloud by using density-based spatial clustering (DBSCAN) [68], which groups closely packed points together and rejects points in low density regions. They then collapsed the point cloud down to a 2D grid by eliminating the $Z$ "altitude" component of each point and detected descriptorless ground plane features by thresholding and flagging grid cells that contained greater than 50 collapsed points. These descriptorless features were then fed into a particle filter measurement model where data association with the overhead image was performed as a nearest neighbour search. FLAG was tested indoors on a real robot platform and on an outdoor MAV dataset, with results showing improved performance over dead reckoning and similar performance to state-of-the-art LiDAR-based SLAM when operating in regions with orthogonal structures. However, when no vertical features were detected, the system relied on odometry and performance degraded to dead reckoning, limiting its usefulness to city-like environments. Indeed, prolonged dead reckoned motion and the effects of false matches in data association due to the descriptorless nature of the features were the motivating factors for their use of multi-hypothesis data association instead of alternative maximum likelihood or single hypothesis methods like Kalman filters. One way of combating false matches and their effects in the data association step would be to constrain the problem further by incorporating semantic labellings during feature extraction, but this would be contingent on achieving reliable pixel-wise segmentations prior to the stereo matching step.
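The collapse-and-threshold step can be sketched in a few lines, assuming a filtered point cloud stored as an (N, 3) array; the cell size is an illustrative assumption, while the 50-point threshold follows the description above. This is only a schematic of the idea, not the FLAG implementation.

```python
# A minimal sketch of collapsing a point cloud to a 2D grid and flagging dense
# cells as descriptorless features. Cell size is an assumption.
import numpy as np

def ground_plane_features(points, cell_size=0.25, min_points=50):
    """Drop the Z component, bin points into a 2D grid and flag dense cells."""
    xy = points[:, :2]
    cells = np.floor(xy / cell_size).astype(int)              # grid index per point
    uniq, counts = np.unique(cells, axis=0, return_counts=True)
    return uniq[counts > min_points] * cell_size               # flagged cell origins

cloud = np.random.default_rng(2).normal(size=(10000, 3))       # stand-in point cloud
print(ground_plane_features(cloud).shape)
```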

Ground Plane Features

Since ground vehicles have historically been the most popular robotic platforms of choice, road features such as curb edges and lane markings have also been used for overhead localisation, allowing for measurement models that can operate in a larger subset of urban environments devoid of man-made structures. Pink [69] demonstrated a method that matched lane-marking features from local

32 and global images by posing the problem as a general point pattern matching problem, which they solved using iterative closest point (ICP) [70] and iterative reweighted least squares (IRLS) [71]. Since their platform of choice was a ground vehicle, they focused on detecting lane markings through the use of a Canny edge detector [72] on the lower half of the image plane; a method that is crude, prone to false positives and sensitive to lighting conditions and occlusions. GPS was used for coarse initial position and heading estimates at each iteration of the matching algorithm, which strongly influenced the correctness of the algorithm’s estimates, because ICP solutions to the matching problem are locally and not necessarily globally optimal.

Instead of posing the problem as a point matching problem, Noda et al. [73] used projective transformations (homography) to warp the local images to a ground plane topview representation that allowed feature matchings to be conducted directly with on-road features extracted from a pre-built feature map derived from the overhead image. To mitigate against variations in viewpoint, illumination conditions and other temporal differences between aerial and local imagery, Speeded Up Robust Features (SURF) [74] image descriptors were used for feature matching. To combat occlusions in local sensing, sequential matching was conducted between projected frames and then used for matching with the overhead feature map; however, occlusions in the overhead image were not addressed. Since this method was essentially posed as a global matching problem, disregarding odometry and pose history, its performance relied on the existence of clear, non-occluded, unique visual textures, which is not often the case in urban environments, where roads are often subject to dynamic changes and where occlusions and perceptual aliasing are likely to occur. Viswanathan et al. [75] presented a multi-hypothesis variant, whereby a database of descriptors was first computed offline from the satellite image over a uniformly sampled rectangular grid of locations. Onboard the vehicle, local images were warped to form topviews of the ground, from which a single image descriptor was computed for each topview. Comparisons between the topview descriptor and those in the database were made and the $L_2$ norm was used as a means of ranking matches, with matchings considered positive if they fell within 5 meters of the vehicle. A variety of descriptors were tested,

33 and they found that Scale-invariant Feature Transform (SIFT) [76] worked best. The best performing descriptor from the aerial-ground matching was then incorporated into a particle filter and demonstrated improvements in accuracy over GPS alone.
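The descriptor matching underlying these aerial-ground comparisons can be sketched with OpenCV (version 4.4 or later for the built-in SIFT); the image file names and the ratio-test threshold are illustrative assumptions, and this is not the exact pipeline of Viswanathan et al.

```python
# A minimal sketch of SIFT descriptor matching between a warped ground topview
# and a satellite tile. File names and the ratio threshold are assumptions.
import cv2

topview = cv2.imread("topview.png", cv2.IMREAD_GRAYSCALE)
aerial = cv2.imread("aerial_tile.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(topview, None)
kp2, des2 = sift.detectAndCompute(aerial, None)

matcher = cv2.BFMatcher(cv2.NORM_L2)
matches = matcher.knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]  # Lowe ratio test
print(f"{len(good)} putative correspondences")
```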

While homography can generate dense topviews from local images, it operates under the implicit assumption that the entire road region lies within a single plane, which is often not the case. Indeed, even the slightest appearance of orthogonal structure (such as a bush) can lead to distortions in the resulting topview. Senlet and Elgammal [38, 36, 39] attempted to overcome these problems by generating topviews from 3D point clouds, generated using stereo vision techniques. Generating usable topviews, however, requires sufficiently dense point clouds, which they accomplished by estimating disparity at four levels of resolution over the same image pair, using estimates from lower resolution images to fill holes in estimates from higher resolution images. The final topview used for localisation was then generated by taking a weighted average over topviews generated from successive frames. Greater weighting was placed on information from newer frames to mitigate the distortions of far objects in the earlier frames during the depth estimation process. Chamfer matching between the topview and a binary road map generated from satellite imagery was then used for particle reweighting within a particle filter scheme, which demonstrated real-time performance comparable to GPS and the ability to recover from odometry drift. Although the results were promising, their approach was limited to use on board ground vehicles under the assumption of operating on well defined, unoccluded road networks. We believe that both this and the previous two homography-based approaches reflect situations where semantic segmentation could benefit performance by allowing for the detection and extraction of objects not considered part of the road network.
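A chamfer-style matching score of the kind described above can be sketched using a distance transform, assuming binary masks for the local topview and the road map; the candidate offset stands in for a particle pose, and in practice the distance transform would be precomputed once per map.

```python
# A minimal sketch of a chamfer-style score between a binary local topview and
# a binary road map; lower scores indicate better alignment.
import numpy as np
from scipy.ndimage import distance_transform_edt

def chamfer_score(local_mask, map_mask, offset):
    """Mean distance from each occupied local pixel to the nearest map pixel."""
    dist_to_map = distance_transform_edt(~map_mask)   # distance to nearest road pixel
    ys, xs = np.nonzero(local_mask)
    ys, xs = ys + offset[0], xs + offset[1]
    inside = (ys >= 0) & (ys < map_mask.shape[0]) & (xs >= 0) & (xs < map_mask.shape[1])
    if not inside.any():
        return np.inf
    return dist_to_map[ys[inside], xs[inside]].mean()

map_mask = np.zeros((100, 100), dtype=bool); map_mask[50, :] = True     # a straight road
local_mask = np.zeros((100, 100), dtype=bool); local_mask[52, 40:60] = True
print(chamfer_score(local_mask, map_mask, (0, 0)),
      chamfer_score(local_mask, map_mask, (-2, 0)))
```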

In general, descriptor-based feature matching requires significant computation and has historically been slow; however, with improvements in feature matching algorithms [77, 78] and hardware, its application within localisation pipelines, such as those described thus far, has become more feasible for online robot use.

34 2.2.2 Learned Embedding Approaches

Deep learning techniques have also been employed to find local-to-global correspondences for localisation by exploiting the availability of large datasets containing georeferenced pairings between local ground-view and satellite images. Chu et al. [79] compiled a database of such pairings to learn a dictionary consisting of colour, edge and neural features, which they used for retrieval during testing. They performed localisation by estimating the co-occurrence probabilities between ground and satellite images using this feature dictionary, demonstrating observable improvements over typical feature matching methods. They also presented a ranking-loss based algorithm that learned location-discriminative feature projection matrices to better capture relationships between ground and satellite images, which resulted in further improvements in accuracy. However, the requirement for large amounts of data in compiling the database of pairings makes this approach infeasible for rapid online deployment and affects its ability to generalise to new environments.

Kim et al. [80] used a Siamese [81] multi-view neural model that learned location-discriminative embeddings, in which local images were matched with their corresponding georeferenced global image by learning a feature representation that maps local and global images to a feature space where matched local-global image pairs are close and unmatched pairs are far apart. The Siamese network was trained using matched and unmatched image pairs, each consisting of a ground and an aerial image. They used this learned function as the measurement model of a particle filter to overcome significant viewpoint and appearance variations between the images and maintain a distribution over the vehicle's pose. Similar to the approach by Chu et al. [79], they relied on geotagged ground-level images for training and matching, but were able to demonstrate additional generalisability to environments not seen in training. However, online robot use was deemed infeasible due to the Convolutional Neural Network (CNN) feature representation of each ground-satellite image pair, which required 55 ms to generate on an NVIDIA Titan X.
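The cross-view embedding idea can be sketched with a small two-tower (pseudo-Siamese) PyTorch model trained with a contrastive loss; the tower architecture, embedding size and margin are illustrative assumptions and do not reproduce the network of Kim et al.

```python
# A minimal two-tower sketch of learning a shared ground-aerial embedding space
# with a contrastive loss; architecture and margin are illustrative assumptions.
import torch
import torch.nn as nn

class Tower(nn.Module):
    def __init__(self, embed_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, embed_dim))

    def forward(self, x):
        return nn.functional.normalize(self.net(x), dim=1)   # unit-length embedding

ground_tower, aerial_tower = Tower(), Tower()

def contrastive_loss(g, a, same, margin=0.5):
    """Pull matched ground-aerial pairs together, push unmatched pairs apart."""
    d = (g - a).norm(dim=1)
    return (same * d**2 + (1 - same) * torch.clamp(margin - d, min=0)**2).mean()

ground = torch.randn(8, 3, 128, 128)                          # stand-in image batches
aerial = torch.randn(8, 3, 128, 128)
labels = torch.randint(0, 2, (8,)).float()                    # 1 = matched pair
loss = contrastive_loss(ground_tower(ground), aerial_tower(aerial), labels)
loss.backward()
print(float(loss))
```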

Unlike both of the methods discussed so far, which perform the matching as a single query over the local image, Tian et al. [82] perform multiple queries over the buildings in the local image to ensure better global consistency. Faster R-CNN [83] is used to segment buildings from the ground image in real time, while a Siamese network is used to find ground-aerial image building matchings. Unlike Kim et al. [80], however, Tian et al. do not employ a particle filter, instead solving a global localisation problem by performing a custom multiple-matching nearest neighbours search between buildings in the local image and reference buildings in the overhead images. This method is also not viable for real-time robot use due to the compounded computational overhead of the Siamese network, segmentation network and nearest neighbours algorithm. Additionally, the accuracy is too poor for precise robot localisation; however, it could be used to inform other GPS-denied pipelines with candidate pose distributions within a wider urban city environment.

2.3 Semantic Segmentation

The goal of semantic segmentation, in the context of images, is to label each pixel of an image with a value representing the semantic class that it belongs to. Semantic segmentation is useful in that it allows for dense symbolic reasoning at the pixel level, which image classification and object detection do not achieve. In this section, we briefly discuss semantic segmentation from a deep-learning perspective. We then review outdoor urban datasets used for training in Section 2.3.1.

A naive approach to semantic segmentation with CNNs is to solve the problem in a similar, but much more computationally expensive, way to that used to solve the image classification problem, whereby identically padded convolutional layers are stacked together to preserve dimensionality throughout the network and generate an output segmentation that matches the input dimensions. Unlike the image classification problem, which only requires coarser outputs representing the "what" in an image, semantic segmentation additionally requires the "where", and on a per-pixel basis. It is because of this that the identical padding between layers is needed, thus leading to networks with a larger number of parameters than corresponding classification networks, which typically apply striding factors to the convolutions or insert pooling layers to downsample the feature maps as we move deeper into the network. This downsampling results in a loss of information, but increases the receptive field and allows for greater context to be gained during the learning procedure, which helps in understanding differences such as roads and lane markings (where lane markings are contextually contained within roads).

As a result of the problems associated with maintaining full input dimensionality throughout the network in order to generate pixel-wise labels, the encoder-decoder network architecture was developed and currently stands as the most general deep learning architecture for semantic segmentation. This architecture works by using two networks: an encoder and a decoder. The encoder, usually a pre-trained classification network, increases the receptive field by reducing the spatial dimensions of the input image using pooling layers and strided convolutions, while the decoder projects the lower resolution features learnt by the encoder back into higher resolution pixel space, thus generating the desired dense pixel-wise classifications. Like the encoder, which uses pooling layers to achieve many-to-one mappings, the decoder can use unpooling layers to achieve one-to-many mappings — using predefined interpolation methods — at the expense of information loss and coarseness. However, the most popular upsampling technique in segmentation networks is the transposed convolution (commonly referred to as deconvolution), which uses learned parameters. Unlike the typical convolution layer, which achieves a many-to-one mapping by taking the dot product between a convolution kernel and the input, deconvolution achieves a one-to-many mapping by multiplying each value in the feature map with all the deconvolution kernel weights, which are then mapped into the higher dimensional feature map.
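The encoder-decoder structure described above can be sketched in a few lines of PyTorch: strided convolutions downsample the input while transposed convolutions (deconvolutions) project the features back to full resolution for per-pixel class scores. The layer sizes and class count below are illustrative assumptions, not any of the architectures discussed in this chapter.

```python
# A minimal encoder-decoder sketch: strided convolutions downsample, transposed
# convolutions upsample back to per-pixel class logits. Sizes are assumptions.
import torch
import torch.nn as nn

class TinySegNet(nn.Module):
    def __init__(self, num_classes=8):
        super().__init__()
        self.encoder = nn.Sequential(                         # 1/4 resolution features
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())
        self.decoder = nn.Sequential(                          # back to full resolution
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, num_classes, 4, stride=2, padding=1))

    def forward(self, x):
        return self.decoder(self.encoder(x))                  # per-pixel class logits

x = torch.randn(1, 3, 256, 256)
print(TinySegNet()(x).shape)                                  # -> (1, 8, 256, 256)
```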

Long et al. [84] presented the Fully Convolutional Network (FCN), which used an off-the-shelf image classification network such as AlexNet [85], GoogLeNet [86] or VGG16 [87] for the encoder, while using deconvolutional layers to form the decoder. The FCN pipeline is able to take inputs of any size because the network uses only convolutional and pooling layers and no fully connected layers, which require fixed input sizes. In reality, FCN does not discard the fully connected layers of the encoder, but instead treats them as larger convolutional layers, thus loosening the input dimensionality constraint. However, the FCN network struggles to recover semantic boundaries due to the strong downsampling factor associated with the encoder (up to a factor of 32). To combat this, they formed "skip connections" between earlier, finer layers of the network and later layers of the decoder to provide the details needed to reconstruct finer segmentation boundaries. SegNet [15] is another semantic segmentation architecture, based on FCN, that uses a 13-layer encoder and symmetric 13-layer decoder, but completely does away with the fully connected layers that FCN converted into convolutional layers. This allowed them to reduce the number of parameters in their network by almost a factor of 10, which, at the time, made it the state-of-the-art in lightweight, accurate segmentation. Additionally, unlike FCN, which copies over encoder features to the decoder, SegNet copies the max-pooling indices, allowing for additional boosts in speed. Paszke et al. [17] presented ENet, an even more lightweight architecture with approximately 18x fewer parameters that required 70x fewer FLOPs than SegNet to segment an input of size 640 × 360 × 3, while demonstrating comparable accuracies. This performance boost was mainly achieved by optimising the early stages of their network: performing early downsampling and heavily reducing the input size within the first two blocks of the network. This was claimed to be possible due to spatial redundancy in visual information, which they believed necessitated earlier extraction of good features to inform deeper layers of the network. Since they used an unbalanced encoder-decoder architecture with a much shallower decoder, they avoided excessive downsampling in the later stages of the encoder by using dilated convolutions [88], which allowed for an exponential increase in field of view without decreasing the resolution. This is achieved by applying convolutional kernels with gaps specified by a dilation rate. For example, a dilation rate of one gives a normal convolution kernel, while a rate of two leads to a one-pixel "padding gap" being added between and around each kernel value.
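The effect of a dilation rate can be seen directly in PyTorch: both layers below preserve the input resolution, but the dilated 3x3 kernel covers a 5x5 neighbourhood, enlarging the receptive field without downsampling. The channel counts are arbitrary illustrative values.

```python
# A small sketch of how dilation enlarges the receptive field at constant
# resolution: both layers keep the 64x64 input size, but the dilated 3x3
# kernel spans a 5x5 neighbourhood.
import torch
import torch.nn as nn

x = torch.randn(1, 16, 64, 64)
standard = nn.Conv2d(16, 16, kernel_size=3, padding=1)              # 3x3 footprint
dilated = nn.Conv2d(16, 16, kernel_size=3, padding=2, dilation=2)   # 5x5 footprint
print(standard(x).shape, dilated(x).shape)                           # both (1, 16, 64, 64)
```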

While we have briefly introduced the encoder-decoder architecture for semantic segmentation and example networks, the field is rapidly growing and a number of new and improved networks and methods are being developed constantly. For a more thorough survey of semantic segmentation, we refer the reader to [89, 90, 24, 91].

(a) Raw Image (b) Ground Truth Segmented Image

Figure 2-2: An example of an image pair used for training a semantic segmentation network. The annotated image is presented as an RGB visualised image for easy viewing; however, during training an 8-bit greyscale image is used that allows for up to 255 semantic classes and a void class for unwanted categories. Images courtesy of Mapillary Vistas [92].

2.3.1 Datasets

With the advent of deep learning techniques for semantic segmentation, the need for datasets to train and evaluate the various network architectures has become paramount in exploiting their performance. Such datasets are composed of a set of image pairs, with each pair consisting of a raw camera image and a corresponding ground truth semantically labelled image. Unlike object detection datasets, which consist of sparse bounding box labellings, semantic segmentation datasets are more time-consuming and costly to compile due to the need for denser annotations at the pixel level that require manual verification. In this section, we briefly discuss the most significant publicly available semantic segmentation datasets for outdoor urban environments. Many of these datasets contain intersecting subsets of semantic classes, with some of the most common being: road, sky, building, pavement, vegetation, terrain, pedestrian and car.

The first publicly available urban driving dataset for semantic segmentation was the CamVid dataset [93], which consisted of 701 images manually annotated into 32 semantic classes. For several years, this was the dataset of choice; however, it quickly became clear that its limited size, variability and variety were insufficient for training networks that generalise across environments. The Cityscapes dataset [94] was a newer dataset similar to CamVid, but of significantly larger size. It consisted of 5,000 densely and 20,000 coarsely annotated images, with semantic labellings belonging to 30 classes. The dataset was derived from dashcam recordings on board vehicles operating in 50 cities across France, Germany and Switzerland, taken over several months to account for variations in season, albeit during clear daytime weather. Frames were manually selected to include a large number of dynamic objects, varying scene layouts and backgrounds. Although Cityscapes was an improvement over the CamVid dataset due to its size and higher resolution images, the scene variation was still insufficient for generalisable performance across domains.

Since generating real-world datasets is difficult, attempts have been made to generate synthetic datasets from simulated environments. Using simulated environments allows for the automatic generation of pixel-wise ground truth annotations and the ability to easily create variations in season, illumination, weather and texture. The most notable synthetic dataset is the Synthia dataset [95], consisting of 13-class densely annotated images from dashcam driving scenarios. This dataset consists of 13,400 images, with no two images captured within 10 meters of each other, to improve visual variability within the dataset. Typically, datasets such as Synthia are used to initially train networks before fine-tuning on smaller real-world datasets. This sort of procedure learns the domain of the synthetic dataset, but allows the statistics of the real domain to be considered during the second stage of training. With an increasing number of techniques [96, 97, 98, 99] focused on generating more photo-realistic synthetic images from simulated environments, it may one day be sufficient to train using simulated data while addressing the effects of domain shift.

The Mapillary Vistas Dataset [92] can be considered the first dataset aimed at generalisable network performance, consisting of 18,000 test and 2,000 validation densely annotated street-level image pairs with 66 semantic categories. The dataset was selected from a repository of approximately 140 million images, with a final 90:10 composition of images from road or sidewalk views (90%) and highways, rural areas and off-road views (10%). Images with large amounts of distortion were filtered out, in addition to images dominated by parts of the capturing vehicle or device. This dataset also incorporates a variety of seasons, weather, lighting conditions and geographic regions by including images from all six populated continents. Additionally, camera diversity was incorporated by accounting for different resolutions, focal lengths, camera models and aspect ratios.

The two newest datasets are the Berkeley Deep Drive dataset (BDD100K) [100] and the ApolloScape dataset [101]. The former consists of 10,000 densely annotated images sampled from dashcam video sequences capturing varying weather and lighting conditions from four regions within the United States. While the dataset is larger and more varied than Cityscapes, it remains dwarfed in comparison to the Mapillary dataset. The ApolloScape dataset, on the other hand, consists of 143,906 labelled images captured from four regions in China, but lacks variation in lighting conditions since all images were captured during the day. The dataset is broken down into three difficulty categories: easy, moderate and hard, where images are placed into each category based on a heuristic that uses the frequency of movable objects such as vehicles and pedestrians. Although the dashcam method of capture and scene variability of the ApolloScape dataset are similar to those of CamVid, Cityscapes and BDD100K, the sheer size of the dataset may prove promising in training better networks, as has been the case in the context of the object detection problem.

Despite datasets for semantic segmentation becoming larger and more common, issues still remain with regards to consistency. Since each dataset contains a different set of classes, differences in granularity can make using multiple datasets together more challenging.

Chapter 3

Approach

Despite the numerous overhead localisation algorithms presented in Section 2.2, none made direct use of scene semantics at the pixel level, with all but two focusing on a single semantic aspect and disregarding the richness of semantic information that visual sensors uniquely provide. Indeed, a motivating factor for such narrow use of semantics was the difference in observability between local and global images, which led more recent approaches to shift their focus towards learned embeddings that try to learn a common unified feature space in which data association between local and satellite imagery becomes possible.

This chapter presents a novel multi-hypothesis algorithm that enables the explicit use of multiple semantic classes at the pixel level by exploiting advances in depth estimation and semantic segmentation pipelines to allow for pixel-wise data association between local and overhead images within a unified 2D planar representation of the environment. Pose corrections to relative pose changes are estimated online without the use of a GPU, with throughput bottlenecked by the semantic segmentation network.

3.1 Algorithm Outline

This section briefly outlines the semantic overhead localisation pipeline, presented in Figure 3-1 and described more fully in the remainder of this chapter. We recall that the goal of this thesis is to estimate the pose of a moving robot using only relative pose change estimates, a possibly stale overhead image, and the image stream and corresponding depth estimates from an onboard camera system. The pipeline begins offline with the manual segmentation of the overhead image, although automated procedures for doing this could also be employed [31, 33, 34]. Then, the local image stream is segmented using a semantic segmentation deep learning framework (Section 3.3). The segmented local image, along with a corresponding depthmap, is used to generate a semantic 3D point cloud that is collapsed into a planar 2D semantic topview representation (Section 3.4). The semantic topview, together with the semantic overhead and the relative pose change information, is passed into an MCL framework that propagates particles using a standard motion model (Section 3.5.2). Particles are then reweighted by a measurement model that uses Binary Semantic Photometric Error (Section 3.5.3), a novel variant of the standard photometric error method in which sensitivity to lighting and viewing conditions no longer affects the photometric procedure directly, but is instead dealt with at segmentation time. The pose estimate and variance are then computed from a resampled particle set that represents the updated belief distribution (Section 3.5.4), before finally blurring the set to ensure a sufficient spread for continued multi-hypothesis tracking.

Figure 3-1: Semantic Overhead Localisation Pipeline: The overhead image is segmented offline prior to algorithm execution. Online, the local image stream is segmented and semantic topviews are generated using registered depthmaps to project into the camera frame and recover scene structure. Relative pose change input is used to inform the motion model of an MCL pipeline, while the segmented overhead and semantic topview are used in a measurement model to update the belief state of the system through a set of particles, from which a pose estimate is computed and returned.

3.2 Problem Formulation

In this section, we briefly present the notation (Section 3.2.1) used to mathematically describe our semantic overhead localisation approach (Section 3.2.2).

3.2.1 Notation

We represent images of size 푚 × 푛 pixels as functions defined over a 2D domain. The image at discrete time 푡 is represented by 퐼푡 : Ω → R+, where Ω ⊂ N0² represents the image pixel domain. The same definition is also used for the static semantic overhead image, denoted by 푂푉 퐻. The depthmap at discrete time 푡, registered to image 퐼푡, is defined by a scalar function that maps from 2D pixel space to a scalar depth value along the camera's optical axis, such that 퐷푡 : Ω → R+, where Ω ⊂ N0². We denote the intrinsic camera parameters by the matrix

\[
\mathbf{K} = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix} \in \mathbb{R}^{3 \times 3}, \tag{3.1}
\]

where (푓푥, 푓푦) and (푐푥, 푐푦) denote the (푥, 푦) focal lengths and optical centre in pixels, respectively. We let x̄ = [xᵀ 1]ᵀ represent the homogeneous coordinate of a vector x, such that at time 푡 a pixel u ∈ Ω in the 2D image frame is projected into the 3D camera frame by

\[
\mathbf{p} = \mathbf{K}^{-1} D_t(\mathbf{u}) \bar{\mathbf{u}}. \tag{3.2}
\]
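As a concrete illustration of Equation 3.2 (a Python/NumPy sketch; the intrinsic values are made-up assumptions rather than calibration results), a pixel with a valid depth can be back-projected into the 3D camera frame as follows.

import numpy as np

# Assumed example intrinsics (fx, fy, cx, cy are illustrative values only).
K = np.array([[525.0,   0.0, 319.5],
              [  0.0, 525.0, 239.5],
              [  0.0,   0.0,   1.0]])

def back_project(u, v, depth, K):
    """Project pixel (u, v) with depth along the optical axis into the camera frame."""
    pixel_h = np.array([u, v, 1.0])              # homogeneous pixel coordinate
    return depth * (np.linalg.inv(K) @ pixel_h)  # p = K^-1 * D(u) * u_bar

point = back_project(400, 300, 2.5, K)
print(point)  # 3D point (X, Y, Z) in the camera frame, with Z == 2.5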

We define the pose of a point p푓1 ∈ R³ in frame 푓1 with respect to 3D frame 푓2 by the transform

\[
\mathbf{T}^{f_2}_{f_1} = \begin{bmatrix} \mathbf{R} & \mathbf{t} \\ \mathbf{0} & 1 \end{bmatrix} \in SE(3), \tag{3.3}
\]

with rotation matrix R ∈ SO(3) and translation vector t ∈ R³, such that p푓1 can be transformed into frame 푓2 by $\bar{\mathbf{p}}_{f_2} = \mathbf{T}^{f_2}_{f_1} \bar{\mathbf{p}}_{f_1}$. It should also be noted that similar definitions can be used for the strictly 2D case (or the same definition if one were to augment all 2D points with a third dimension always set to zero). Finally, we define a semantic indicator function

\[
\mathbb{1}_S(a, b) = \begin{cases} 1 & \text{if } a = b \text{ and } a, b \in S \\ 0 & \text{otherwise,} \end{cases} \tag{3.4}
\]

which, given two semantic labels 푎, 푏 ∈ 푆, where 푆 ⊂ N0 is the set of possible semantic labels, outputs 1 if and only if the two labels match. This definition will be useful later in Section 3.5.3 when defining the binary semantic photometric error.

3.2.2 Problem Statement

Given a prior semantically segmented overhead image of the environment 푂푉 퐻, relative pose change u푡, camera image 퐼푡 and depthmap 퐷푡 from local sensors onboard a moving robot, we would like to estimate the pose x푡 of the robot (assumed equivalent to the camera pose) within the overhead coordinate frame, while also considering scene semantics. The depthmap is used to project the image into 3D space as a point cloud that represents structure in the environment. Our goal is therefore to estimate online:

∙ the semantic topology of the local environment from 퐼푡 and 퐷푡;

∙ the robot pose x푡 within the overhead frame.

Our primary contribution will be to estimate the robot pose over a distribution that incorporates both structural and semantic consistency with the environment as represented by 푂푉 퐻. Since the body of research in this thesis is not focused on depth estimation, and thanks to a variety of software and hardware-based depth estimation pipelines, we assume access to a black box depth estimator of choice, such as [102, 103].

3.3 Semantic Segmentation: Training

In this section, we briefly discuss the training of the ENet [17] segmentation architecture presented in Section 2.3, which we use to generate real-time pixel-wise semantics on images derived from a robot's onboard camera system. The ENet network architecture was selected for semantic segmentation due to its balance between computational overhead and classification accuracy. Additionally, an optimised Caffe [104] implementation of the network that merges batch normalisation and dropout layers into convolutional kernels was already accessible [105], allowing for immediate use alongside other Caffe-based inference pipelines, such as the MobileNetSSD [106] object detector, that are typically found within an autonomy stack. It should be noted that initial attempts were made to use the SegNet [15] architecture; however, it was quickly found that CPU-only use onboard a robot would not be feasible due to its number of parameters and floating point operations.

Since the objective of our measurement model is to perform semantic data association between local and global imagery, the overhead images also require segmentation, which we perform manually, because testing (Chapter 4) will be confined to a single environment in both simulation and physical flight.

Dataset

The first step towards generating a deployable version of the ENet architecture consists of training the network on a dataset large and varied enough to enable segmentation performance that generalises across a variety of unseen urban environments. We use a modified version of the Mapillary Vistas dataset [92], described in Section 2.3.1, to train our network. First, we merge the test and validation sets, consisting of 18,000 and 2,000 densely annotated street-level image pairs respectively, in order to maximise the size of our training set. Next, we merge similar semantic classes together and assign unwanted classes to the undefined set. This enables us to increase the pixel-wise training data available for classes of interest at the expense of increased coarseness. Table 3.1 illustrates the remapping used to generate the ground truth greyscale label images for the final training set, with classes remapped to void omitted for convenience. A total of 16 semantic classes were formed, with their selection intended to capture some of the most common scene semantics in urban street environments. Although the Mapillary Vistas dataset is dwarfed in size when compared to the ApolloScape [101] dataset, we chose it for training due to its rich variation in viewing, lighting, environmental and camera conditions, with almost no temporal consistency between images. Initial training conducted using the Cityscapes [94] training set showed strong qualitative performance on its corresponding test set, but performed extremely poorly when exposed to environments not seen during training. In fact, even crude hybrid training schemes that involved training with the Mapillary dataset and then fine-tuning with the Cityscapes dataset, or vice versa, proved ineffective. We present segmentation results from our trained network in Section 4.1.
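The remapping itself can be implemented as a simple lookup applied to each 8-bit greyscale label image. The sketch below (Python/NumPy) builds a 256-entry lookup table that sends every unmapped label to the void value 255; the original label IDs shown are illustrative placeholders, with the full mapping being the one summarised in Table 3.1.

import numpy as np

VOID = 255

# Partial example mapping from original dataset label IDs to the merged IDs
# of Table 3.1; the original IDs used here are illustrative only.
MERGE_MAP = {
    13: 0,   # e.g. an original road-type label     -> Road
    7: 0,    # e.g. an original crosswalk label     -> Road
    15: 1,   # e.g. an original sidewalk label      -> Sidewalk
    17: 2,   # e.g. an original building label      -> Building
}

def build_lookup(merge_map, void=VOID):
    """Build a 256-entry lookup table; unmapped labels become void."""
    lut = np.full(256, void, dtype=np.uint8)
    for original_id, merged_id in merge_map.items():
        lut[original_id] = merged_id
    return lut

def remap_labels(label_image, lut):
    """Apply the lookup table to an 8-bit greyscale label image."""
    return lut[label_image]

lut = build_lookup(MERGE_MAP)
labels = np.array([[13, 7], [15, 200]], dtype=np.uint8)
print(remap_labels(labels, lut))  # [[0 0] [1 255]]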

Training Setup and Parameters

The ENet network was trained in two stages, with the input dimensions to the network constrained to 640 × 480 × 3 pixels. In the first stage, the encoder was trained for 600,000 iterations with a base learning rate of 0.005 and a multi-step learning policy with step values of 50,000 and 450,000, dictating the iterations at which to reduce the learning rate by a factor of 10. A weight decay of 0.0002, a batch size of 4 and the Adam solver [107] for gradient descent optimisation were used, with a momentum value of 0.999.

For the second stage, we used the output weights from stage one as pretrained weights for training the entire encoder-decoder architecture. The network was trained for 990,000 iterations using mostly the same parameters as stage one, but with the step values changed to 670,000, 850,000 and 950,000. A smaller batch size of 2 was used during this stage, since the memory required for training the entire architecture was roughly double that needed for the encoder alone. Since the batch normalisation layers in ENet shift the input feature maps according to their mean and variance statistics for each mini-batch during training, we computed these statistics over the entire dataset and shifted accordingly, generating an updated set of weights for the network. Finally, the batch normalisation and dropout layers were merged into the convolutional kernels, as done in the original ENet paper, to reduce total inference time. All training was conducted using an NVIDIA Titan X GPU with 11GB of DDR5 RAM, an Intel i7-7700K 4.20GHz quad-core processor and 32GB of DDR4 system RAM. Total training time amounted to approximately nine days.

Mapillary Vistas Semantic Class Remapping

Final Class      Label ID   Merged Classes
Road             0          Road, Bike Lane, Crosswalk, Parking, Rail Track, Service Lane, Manhole, Pothole
Sidewalk         1          Sidewalk, Pedestrian Area
Building         2          Building, Wall
Fence            3          Fence
Pole             4          Pole, Utility Pole
Traffic Light    5          Traffic Light
Traffic Sign     6          Traffic Sign (Back), Traffic Sign (Front)
Vegetation       7          Vegetation
Terrain          8          Terrain
Sky              9          Sky
Pedestrian       10         Person, Bicyclist, Motorcyclist, Other Rider
Car              11         Car
Railing          12         Guard Rail, Barrier
Curb             13         Curb, Curb Cut
Lane Markings    14         Lane Markings (General), Lane Markings (Crosswalk)
Street Light     15         Street Light
Void             255        33 remaining classes

Table 3.1: The final set of semantic classes chosen for training were based on those most expected to be present in a typical urban road environment. Similar semantic classes were merged to form larger "super classes" with more pixel-wise data for training. This was done to try and improve final segmentation accuracy at the expense of coarser classifications.

3.4 Semantic Topview Generation

Visual robot localisation from an overhead image requires a means by which data from the local camera system can be associated with the overhead image. As seen in Section 2.2, most pipelines focus on a single semantic class that displays similar observable characteristics in both local and global imagery, for either feature or geometric matching. In this section, we describe a method used to generate a 2D ground plane representation of the locally observable environment that will allow for direct pixel-wise association with an overhead image, shown later in Section 3.5.3.

Figure 3-2: Semantic Topview Generation Pipeline: The input image is first segmented and then projected into the camera frame using a corresponding registered depthmap to form a 3D point cloud. The point cloud is then collapsed onto the ground plane and a voting mechanism over a discretisation of the plane is conducted to generate a final semantic topview.

The procedure takes as input an image 퐼푡 and corresponding depthmap 퐷푡 at time 푡. It begins by generating a 3D semantic point cloud P푠푒푚,푡: 퐼푡 is fed to the semantic segmentation network trained in Section 3.3 to produce a greyscale label image 퐼푠푒푚,푡, and Equation 3.2 is then applied to project all pixels that contain depth information into the 3D camera frame. For the remainder of this algorithm, P푠푒푚,푡 is treated as a collapsed ground plane 2D point cloud by ignoring the 푌푐푎푚푒푟푎 component of all its points. This is considered equivalent to projecting all points onto the (푋푐푎푚푒푟푎, 푍푐푎푚푒푟푎) plane of the camera frame, thus generating a 2D topological representation of the locally observable 3D environment.

A semantic topview 푇푉푠푒푚,푡 at time 푡 is defined as a discretised 2D grid consisting of cells, each representing an equal subset of the space, with resolution equal to that of the overhead image. Each cell is assigned a value corresponding to the semantic class that dominates the space. For the purposes of efficient computation, we implement the topview as a set of (퐾 + 1) 2D point clouds such that 푇푉푠푒푚,푡 = {푇푉푠푒푚,푡[0], . . . , 푇푉푠푒푚,푡[퐾]}, where the first 퐾 point clouds represent class-specific information over the 퐾 possible semantic classes and the final point cloud represents their union. Unlike a contiguous buffer, point clouds allow for the omission of void cells, while their separation into class-specific containers means that only a single pass over the entire topview is required during the data association step set out in Section 3.5.3. Additionally, rigid transformations can be applied more quickly and easily through the use of the Point Cloud Library (PCL) [108], an important consideration for a procedure that may be applied thousands of times at each iteration of our localisation algorithm (Section 3.5.3).

To determine which semantic class dominates a cell in the topview, we begin by iterating through P푠푒푚,푡 and generating a histogram of the number of points of each semantic class that fall within each cell. Then, for each cell, a point with semantic label 푖 ∈ [0, 퐾 − 1] is added to the semantic topview if, and only if, class 푖 received the largest vote count among all semantic classes for the 3D points that fell into that cell and that count was at least equal to a predefined threshold 훽. If for a given cell no semantic class has a count of at least 훽, the cell is assumed to be undefined and no point corresponding to it is added to the semantic topview. For this thesis, we set 훽 = 1 in simulation and 훽 = 5 for real flight data. A higher 훽 value is used for physical flight data to compensate for noisier depth estimation that does not capture the topology of the local environment as well as in simulation.

For the remainder of this chapter, we simplify the definition of a semantic topview to that of an image, as presented in Section 3.2.1. A topview is thought of as an image of size 푚 × 푛 pixels and is represented by a scalar function 푇푉푠푒푚,푡 : Ω → R+, where Ω ⊂ R² represents the corresponding image pixel domain.
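The voting procedure can be sketched as follows (Python/NumPy; the cell size, class count and threshold are illustrative, and the thesis implementation operates on PCL point clouds rather than Python dictionaries).

import numpy as np

def semantic_topview(points_xz, labels, num_classes, cell_size, beta=1):
    """Collapse labelled points onto the ground plane and vote per cell.

    points_xz : (N, 2) array of (X_camera, Z_camera) coordinates.
    labels    : (N,) array of semantic class indices in [0, num_classes - 1].
    Returns a dict mapping (cell_x, cell_z) -> winning class, keeping only
    cells whose winning class received at least `beta` votes.
    """
    cells = np.floor(points_xz / cell_size).astype(int)
    votes = {}
    for cell, label in zip(map(tuple, cells), labels):
        counts = votes.setdefault(cell, np.zeros(num_classes, dtype=int))
        counts[label] += 1
    topview = {}
    for cell, counts in votes.items():
        winner = int(np.argmax(counts))
        if counts[winner] >= beta:
            topview[cell] = winner
    return topview

pts = np.array([[0.1, 2.0], [0.2, 2.1], [0.9, 2.0]])
print(semantic_topview(pts, np.array([3, 3, 1]), num_classes=5, cell_size=0.5))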

3.5 Monte Carlo Localisation

The underlying goal of our approach is to estimate the belief distribution over the robot's pose. Recalling the definitions of belief (Equation 2.1) and prediction (Equation 2.2), in Section 2.1.3 we demonstrated how to calculate belief from prediction through an algorithm (Algorithm 1) called the Bayes filter, and briefly described parametric and nonparametric methods to estimate the belief distribution through three key distributions: the initial belief 푝(x0), the measurement probability 푝(z푡|x푡) and the state transition probability 푝(x푡|x푡−1, u푡).

In this section, we present an MCL-based localisation framework that maintains a set of particles X푡 = {x푡[푖]; 푖 ∈ [1, 푁]} to estimate the belief distribution at time 푡. Algorithm 2 outlines the high-level structure of our MCL approach, which is used to estimate the robot pose by applying the following three steps recursively: prediction, correction and resampling. An initial belief distribution 푝(x0) must also be defined, and we briefly discuss this in Section 3.5.1. Prediction involves propagating the particle set at time 푡 − 1 to generate a new particle set at time 푡 using the newest control input u푡, according to a motion model (Section 3.5.2) that models the state transition probability 푝(x푡|x푡−1, u푡). During the correction phase, the weighting of each particle in the set {(x푡[푖], 푤푡[푖]); 푖 = 1 to 푁} is readjusted by considering the newest measurement z푡, according to a measurement model (Section 3.5.3) that estimates the measurement probability 푝(z푡|x푡, 푀푎푝). Finally, particles from the updated particle set are resampled with replacement according to a discrete distribution defined by their weights, to form a new particle set of equal size that approximates 푏푒푙(x푡).

In the remainder of this chapter, we present our approach to the prediction (Section 3.5.2) and correction (Section 3.5.3) procedures by defining our motion and measurement models, respectively. We also briefly explain how our pose estimate x푡 and variance measures at time 푡 are calculated from the output particle set X푡 (Section 3.5.4).

3.5.1 Initialisation

Since we are interested in providing pose estimates on top of an existing state estimation framework that provides us with relative pose change inputs, we initialise the particles about some prior pose estimate x0 of the robot in the overhead coordinate frame. The particles are distributed using a multivariate Gaussian with the PDF defined by Equation 2.11 and parameters 휇x0 and Γx0, which we set in Chapter 4. If, however, the prior pose x0 is unknown, this step can be replaced by initialising particles uniformly across the overhead frame, which is the typical approach in global MCL frameworks.
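As a small illustration (a Python/NumPy sketch, not the thesis code), the initial particle set can be drawn from the multivariate Gaussian prior; the covariance shown matches the diagonal values used later in Chapter 4.

import numpy as np

def initialise_particles(mu_x0, cov_x0, num_particles=10000, seed=0):
    """Sample the initial particle set about a prior pose (x, y, theta)."""
    rng = np.random.default_rng(seed)
    particles = rng.multivariate_normal(mu_x0, cov_x0, size=num_particles)
    weights = np.full(num_particles, 1.0 / num_particles)  # uniform initial weights
    return particles, weights

# Example with the prior centred at the origin and a diagonal covariance.
particles, weights = initialise_particles(
    mu_x0=np.zeros(3), cov_x0=np.diag([5.0, 5.0, 0.25]))
print(particles.shape)  # (10000, 3)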

Algorithm 2 Monte Carlo Localisation Algorithm

1: function ParticleFilter(X푡−1, u푡, z푡, 푀푎푝)
2:     // Initialise particle sets
3:     X̄푡 = ∅
4:     X푡 = ∅
5:     for 푖 = 1 to 푁 do
6:         // 1) Prediction
7:         x푡[푖] = SampleMotionModelOdometry(x푡−1[푖], u푡)
8:         // 2.1) Correction (Part 1): Binary Semantic Photometric Error
9:         푤푡[푖] = MeasurementModel(x푡[푖], z푡, 푀푎푝)
10:        X̄푡 = X̄푡 + ⟨x푡[푖], 푤푡[푖]⟩
11:    // 2.2) Correction (Part 2): Final weights calculation
12:    휂푡 = 훾0 · max푖∈[1,푁]({푤푡[푖]})
13:    for 푖 = 1 to 푁 do
14:        푤푡[푖] = 1 / exp(푤푡[푖] / 휂푡)
15:    // 3) Resampling
16:    for 푖 = 1 to 푁 do
17:        draw 푗 with replacement with probability ∝ 푤푡[푗]
18:        X푡 = X푡 + x푡[푗]
19:    return X푡

The segmented overhead image 푂푉 퐻 is loaded into memory and stored as a set of (퐾 + 1) point clouds 푂푉 퐻 = {OVH[0], ..., OVH[퐾]}: one for each of the 퐾 possible semantic classes and a final one for their union.

3.5.2 Prediction: Motion Model

Although our framework is designed to generalise to MAV localisation, it operates within a simplified 2D planar coordinate system represented by SE(2). As a result, we employ the standard motion model described by Thrun et al. [10], with the noise distribution for each of the three state variables modelled as an independent Gaussian and parameters chosen empirically based on algorithm runtime and the expected maximum velocity of the robot. We set these values in Chapter 4 when discussing the experimental setup and results. Modelling robot motion as shown in Figure 3-3 allows us to ignore the intricacies of robot dynamics within a given time interval and simplify the problem into three piecewise components to which noise can be added. This allows us to propagate particles forward in time to estimate the state transition distribution 푝(x푡|x푡−1, u푡).

Figure 3-3: The motion of the robot between two timesteps 푡 and 푡′, 푡 > 푡′, is approximated by three noisy sequential actions: rotation 훿푟표푡1, translation 훿푡푟푎푛푠 and finally rotation 훿푟표푡2, computed using the relative pose change inputs.

3.5.3 Correction: Measurement Model for Particle Weighting

Now that the relative pose change u푡 has been used to propagate the particle set forward in time, we need to use the measurement z푡 to determine how well each particle estimates the robot's pose with respect to the overhead. In this section, we use the semantic topview 푇푉푠푒푚,푡 generated in Section 3.4 as our measurement input and perform data association with the overhead at the pixel level, to determine particle scores that are normalised, fed into a logistic function and renormalised to generate final particle weightings.

Algorithm 3 Sampling Based Motion Model using Relative Pose Changes

1: function SampleMotionModelRPC(x푡−1, u푡)
2:     // x푡−1 = [푥 푦 휃]ᵀ
3:     // u푡 = [x̄푡−1, x̄푡] = [[푥, 푦, 휃]ᵀ, [푥′, 푦′, 휃′]ᵀ]
4:
5:     // Break down the relative pose change into the three motion components
6:     훿푟표푡1 = atan2(푦′ − 푦, 푥′ − 푥)
7:     훿푡푟푎푛푠 = sqrt((푥′ − 푥)² + (푦′ − 푦)²)
8:     훿푟표푡2 = 휃′ − 휃 − 훿푟표푡1
9:
10:    // Add noise sampled from independent normal distributions to each of the three motion components
11:    훿̂푟표푡1 = 훿푟표푡1 − sample(휇푟표푡1, 휎²푟표푡1)
12:    훿̂푡푟푎푛푠 = 훿푡푟푎푛푠 − sample(휇푡푟푎푛푠, 휎²푡푟푎푛푠)
13:    훿̂푟표푡2 = 훿푟표푡2 − sample(휇푟표푡2, 휎²푟표푡2)
14:
15:    // Propagate x푡−1 forward according to the noisy motion
16:    푥′ = 푥 + 훿̂푡푟푎푛푠 · cos(휃 + 훿̂푟표푡1)
17:    푦′ = 푦 + 훿̂푡푟푎푛푠 · sin(휃 + 훿̂푟표푡1)
18:    휃′ = 휃 + 훿̂푟표푡1 + 훿̂푟표푡2
19:
20:    return x푡 = [푥′ 푦′ 휃′]ᵀ

Binary Semantic Photometric Error

Since 푇푉푠푒푚,푡 exists within a similar 2D planar representation as the overhead, a rigid transformation $\mathbf{T}^{\mathbf{x}_t^{[i]}}_{f_{topview}}$ between the topview frame $f_{topview}$ and each particle $\mathbf{x}_t^{[i]} \in \mathcal{X}_t$ can be computed and used to transform the topview so that its origin is centred at, and relative to, $\mathbf{x}_t^{[i]}$. From this, direct pixel-wise comparisons between the transformed topview and the overhead image can now be made. We will now present "binary semantic photometric error", a measure of how well the semantic structure of the topview matches that of the corresponding region of the overhead when positioned at $\mathbf{x}_t^{[i]}$. This error will then be used to determine particle weights using a logistic function.

Recalling the definition of the semantic indicator function $\mathbb{1}_S$ from Equation 3.4, we define the binary semantic photometric error of a particle $\mathbf{x}_t^{[i]} \in \mathcal{X}_t$ by

\[
w_t^{[i]} = \sum_{k=0}^{\|S\|-1} \; \sum_{\mathbf{p} \in TV^{k}_{sem,t}} \alpha_k \cdot \mathbb{1}_S\!\left( TV_{sem,t}\!\left(\mathbf{T}^{\mathbf{x}_t^{[i]}}_{f_{topview}} \bar{\mathbf{p}}\right),\; OVH\!\left(\mathbf{T}^{\mathbf{x}_t^{[i]}}_{f_{topview}} \bar{\mathbf{p}}\right) \right), \tag{3.5}
\]

where 훼푘 is an importance factor for semantic class 푘, which we define based on the ratio with which the class occurs in the overhead map

\[
\alpha_k = \frac{\|OVH^{k}\|}{\|OVH^{\|S\|}\|}. \tag{3.6}
\]

This definition of 훼푘 allows semantic classes that occur less frequently in the environment to provide stronger signals to the particle reweighting scheme. Typical classes that fall into this category are lane markings, curbs and building outlines.
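For illustration, a grid-based version of Equation 3.5 could look like the following sketch (Python/NumPy). It assumes, as a simplification, that both the transformed topview and the overhead are given as dense 2D label arrays over the same grid, with void cells marked as 255; the thesis implementation instead iterates over class-specific point clouds.

import numpy as np

VOID = 255

def binary_semantic_photometric_error(topview, overhead, alphas, void=VOID):
    """Weighted count of cells whose semantic labels match (Equation 3.5).

    topview, overhead : 2D integer label arrays defined over the same grid,
                        with the topview already transformed into the
                        overhead frame for a given particle.
    alphas            : per-class importance factors (Equation 3.6).
    """
    valid = topview != void                  # only cells observed locally
    matches = valid & (topview == overhead)  # indicator 1_S per cell
    error = 0.0
    for k, alpha_k in enumerate(alphas):
        error += alpha_k * np.count_nonzero(matches & (topview == k))
    return error

tv = np.array([[0, 1, VOID], [2, 2, 0]])
ovh = np.array([[0, 1, 1], [2, 0, 0]])
alphas = np.array([0.5, 0.2, 0.3])
print(binary_semantic_photometric_error(tv, ovh, alphas))  # 1.5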

Reweighting Normalisation

Once the binary semantic photometric error has been calculated for all particles, it is converted into a weighting factor by taking the inverse exponential of its scaled normalised value. This is shown on lines 12 to 14 of Algorithm 2. The normalisation is done in order to generate a relative ranking of "goodness", while the scaling by 훾0 is needed to increase the sensitivity of the logistic function and ensure sufficient particle cloud collapse. We set 훾0 = 0.01; however, this value may be varied depending on the desired collapse sensitivity. Once all particle weights are computed, a final normalisation procedure is executed to ensure the sum over all weights is equal to 1, thus allowing us to update the belief state by resampling particles with replacement according to a discrete distribution defined by the weights.
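Lines 12 to 14 of Algorithm 2, together with the final normalisation, can be sketched as follows (Python/NumPy, with 훾0 = 0.01 as stated above).

import numpy as np

def particle_weights(errors, gamma0=0.01):
    """Convert raw per-particle scores into normalised resampling weights."""
    eta = gamma0 * np.max(errors)          # scaled normalisation factor (line 12)
    weights = 1.0 / np.exp(errors / eta)   # inverse exponential (line 14)
    return weights / np.sum(weights)       # final normalisation to sum to 1

scores = np.array([120.0, 80.0, 30.0])
print(particle_weights(scores))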

3.5.4 Pose Estimate Calculations

Once the belief distribution has been updated by resampling particles with repetition based on a discrete distribution defined by the particle weights, statistics over the distribution can be computed to provide a pose estimate and some measure of "quality". We define the pose estimate at time 푡 by x푡 = [휇푥,푡 휇푦,푡 휇휃,푡]ᵀ, which is characterised as the mean pose over the particle set.

We compute 휇푥,푡 and the corresponding variance 휎푥,푡 (and similarly for 푦) by the standard definition for sample mean and variance

\[
\mu_{x,t} = \frac{1}{N} \sum_{i=1}^{N} x_t^{[i]} \tag{3.7}
\]
\[
\sigma_{x,t} = \frac{1}{N} \sum_{i=1}^{N} \left( x_t^{[i]} - \mu_{x,t} \right)^2. \tag{3.8}
\]

Since angles are circular values, standard arithmetic methods for computing the mean and variance cannot be used. Instead, we compute these quantities by employing circular statistical methods [109]. Wrap-around is dealt with by converting from polar coordinates to Cartesian coordinates in the complex plane, where all angles are converted to corresponding points on the unit circle. The mean is then calculated by computing the arithmetic mean over the points on the unit circle and converting back to polar coordinates to extract the resulting angle. The variance, on the other hand, is computed under the assumption that the 휃 distribution of the particle set is approximated by a wrapped normal distribution around the unit circle [110].

\[
S_{\theta,t} = \frac{1}{N} \sum_{i=1}^{N} \sin \theta_t^{[i]} \tag{3.9}
\]
\[
C_{\theta,t} = \frac{1}{N} \sum_{i=1}^{N} \cos \theta_t^{[i]} \tag{3.10}
\]
\[
R_{\theta,t} = \sqrt{S_{\theta,t}^2 + C_{\theta,t}^2} \tag{3.11}
\]
\[
\mu_{\theta,t} = \mathrm{atan2}(S_{\theta,t}, C_{\theta,t}) \tag{3.12}
\]
\[
\sigma_{\theta,t} = -2 \log(R_{\theta,t}) \tag{3.13}
\]
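A direct transcription of Equations 3.9 to 3.13 (a Python/NumPy sketch with example headings chosen near the ±π wrap-around):

import numpy as np

def circular_mean_and_variance(thetas):
    """Circular mean and wrapped-normal variance estimate of particle headings."""
    s = np.mean(np.sin(thetas))        # S_theta (3.9)
    c = np.mean(np.cos(thetas))        # C_theta (3.10)
    r = np.hypot(s, c)                 # R_theta (3.11)
    mean = np.arctan2(s, c)            # mu_theta (3.12)
    variance = -2.0 * np.log(r)        # sigma_theta (3.13)
    return mean, variance

headings = np.array([3.1, -3.1, 3.0])  # angles near the +/- pi wrap-around
print(circular_mean_and_variance(headings))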

3.6 Summary

This chapter described a semantic overhead localisation algorithm in detail. The algorithm maintains a distribution over the robot pose composed of a set of particles in a particle filter. Given a live image stream and depthmaps from an off-the-shelf black box depth estimator of choice, incoming images are fed into a semantic segmentation deep learning network to generate pixel-wise segmentations, which are then used with the corresponding depthmaps to generate 3D semantic point clouds. Each point cloud is then collapsed and used to generate a semantic topview that represents the local sensory perception of the structure of the environment in the same 2D planar representation as the overhead.

Given odometry inputs, particles are propagated forward by a motion model that models noise using independent zero-mean Gaussians. The semantic topview is then transformed for each particle relative to its pose in the overhead coordinate frame. The Binary Semantic Photometric Error is calculated between the topview and the semantic overhead, with a final normalisation using the maximum error over the particle set. The particles are then reweighted by taking the inverse exponential of their respective normalised errors. The mean and variance over the pose parameters of the particle distribution are then computed and returned as the resulting pose estimate and quality indicator, respectively. Finally, a multivariate independent Gaussian blur is applied to the particle distribution to offset the effects of strong particle cloud collapse on the ability to track multiple hypotheses.

This approach allows for semantically consistent pose estimates without the use of traditional feature matching algorithms. A quantitative and qualitative evaluation of the semantic overhead localisation pipeline, in simulation and on real-world MAV flight data respectively, is presented in Chapter 4.

Chapter 4

Results

In this chapter, the performances of the trained segmentation network and semantic localisation pipeline presented in Chapter 3 are evaluated. Section 4.1 evaluates the segmentation network on five different urban driving datasets. Section 4.2 presents quantitative analysis of the full localisation pipeline in a photo-realistic simulation of a real-world environment. Finally, Section 4.3 presents preliminary qualitative analysis using data recorded onboard a MAV during real-world physical flight.

4.1 Semantic Segmentation

The trained network is capable of operating in CPU mode or in GPU mode using a CUDA-enabled GPU. The average prediction time in both modes was assessed by executing the network 100 times in a loop over a single RGB image and taking the average. We defined prediction time as the time taken to load an image, resize it to the VGA (640 × 480 × 3) resolution input size of the network, execute the network over it and finally extract the final greyscale prediction image. The throughput was then calculated by taking the inverse of this average.

In GPU mode, a throughput of 29.7Hz was measured using an NVIDIA Titan X GPU with 11GB of DDR5 dedicated RAM, a 4.2GHz Intel i7-7700K quad-core CPU and 32GB of DDR4 RAM. Approximately 90% of the GPU and 1.6GB of its memory, a full virtual CPU core (thread) and 0.4GB of system RAM were consumed.

[Table 4.1 image grid: Raw Image / Ground Truth / Prediction columns for the Camvid, Cityscapes, BDD100k, Synthia-SF SEQ3 and Synthia-SF SEQ5 test sets]

Table 4.1: Raw image, ground truth and corresponding network prediction labels for a randomly sampled image from each of the five test sets. The network is able to segment lane markings and curbs even when not labelled in a dataset.

In CPU mode, the network was clocked at 0.8Hz and consumed a single CPU thread and 1.6GB of RAM onboard an Intel NUC with a 2.6GHz i7-6700HQ quad-core processor and 16GB of DDR4 RAM. CPU mode was tested using a NUC because it was the processing unit used by the drones that generated the dataset with which we qualitatively evaluate our system in Section 4.3. Although 0.8Hz is far from the standard camera frame rate of 30Hz, in Section 4.2 we observe that this is sufficient for online execution, since our system is intended to generate periodic pose corrections to an already existing pose estimation framework.

The classification performance was evaluated on five datasets, modified to closely match the semantic classes of the trained network. These datasets were selected to evaluate the network's ability to generalise across environments, in the real world and in simulation, in addition to temporal variations such as clear or cloudy weather.

The real-world test sets comprised the Cityscapes [94] and BDD100K [100] validation sets and the full Camvid [93] dataset, consisting of 500, 745 and 701 image pairs respectively. The simulated test sets consisted of the Synthia San Francisco (Synthia-SF) [111] SEQ3 and SEQ5 datasets, consisting of 201 and 366 image pairs respectively. The newer Synthia-SF dataset was chosen over the original Synthia dataset [95] since it included a lane markings class, in addition to more photo-realistic images. We used the two sequences SEQ3 and SEQ5 of the dataset because they were captured during clear and cloudy weather respectively and allowed for an evaluation of generalisability with respect to subtle weather changes. Since each of the five datasets comprised images of different resolutions, all raw images were rescaled to VGA resolution when passed into the network, while prediction outputs were scaled back to the original input size using bilinear interpolation, which preserved pixel values within the set of possible semantic classes. An example of a raw input image, ground truth segmentation and network output for each dataset is presented in Table 4.1.

To align all semantic classes in the test sets as closely as possible with those defined in the training set, class remapping similar to that set out in Table 3.1 was conducted. If a desired semantic class did not exist in a test set, it was ignored in all prediction images during the generation of evaluation metrics. Exceptions, however, were made for the "curb" and "lane markings" classes, which were treated as "sidewalk" and "road" classes respectively in the prediction images at evaluation time. This was due to the assumption that curbs typically occur at the edge of sidewalks and lane markings within roads, while ignoring them completely in the prediction during evaluation would likely lead to a negative bias in the computation of the classification accuracy metrics.

Network classification performance was evaluated using five standard metrics from the semantic segmentation literature: classwise pixel accuracy (cPA), mean pixel accuracy (mPA), classwise intersection over union (cIoU), mean intersection over union (mIoU) and frequency weighted intersection over union (fwIoU). For the remainder of this section, we define 퐾 as the total number of semantic classes that exist in a dataset, 푛푘,푘′ as the number of pixels of class 푘 that are predicted to belong to class 푘′, and 푡푘 as the total number of pixels of class 푘 in the ground truth segmentation set.

Classwise pixel accuracy is a simple metric that computes, for a semantic class 푘, the ratio between the total number of correctly classified pixels and the total number of ground truth pixels for that class in the dataset

\[
\mathrm{cPA}_k = \frac{n_{k,k}}{t_k}, \tag{4.1}
\]

while mean pixel accuracy is simply the mean over all the possible 퐾 semantic classes in the test set

\[
\mathrm{mPA} = \frac{1}{K} \sum_{k=0}^{K-1} \mathrm{cPA}_k. \tag{4.2}
\]

Although the pixel accuracy metric is useful in understanding how well the network can correctly classify pixels that belong to each semantic class, it gives no indication of performance with respect to misclassifying incorrect pixels as belonging to a semantic class. To gain insight into such performance, the intersection over union metric is used. The cIoU computes, for a given class 푘, the ratio between the total number of correctly classified pixels (intersection) and the sum (union) of the total number of pixels that were incorrectly classified as a different class (false negatives), the total number of pixels that were incorrectly classified as the class of interest (false positives) and the total number of correctly classified pixels (intersection)

\[
\mathrm{cIoU}_k = \frac{n_{k,k}}{t_k + \left( \sum_{k'=0}^{K-1} n_{k',k} \right) - n_{k,k}}. \tag{4.3}
\]

The mIoU is, similar to the mPA, the mean of the cIoU over all possible K classes

\[
\mathrm{mIoU} = \frac{1}{K} \sum_{k=0}^{K-1} \mathrm{cIoU}_k. \tag{4.4}
\]
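All of these metrics, together with the frequency weighted variant introduced in Equation 4.5 below, can be computed from a single confusion matrix. The sketch below (Python/NumPy, with an illustrative three-class confusion matrix) follows Equations 4.1 to 4.4 directly.

import numpy as np

def segmentation_metrics(confusion):
    """Compute cPA, mPA, cIoU, mIoU and fwIoU from a KxK confusion matrix.

    confusion[k, k_pred] = number of pixels of ground truth class k predicted
    as class k_pred; t_k is the row sum.
    """
    tp = np.diag(confusion).astype(float)            # n_{k,k}
    t = confusion.sum(axis=1).astype(float)          # t_k (ground truth totals)
    predicted = confusion.sum(axis=0).astype(float)  # pixels predicted as each class
    cpa = tp / t                                     # Equation 4.1
    mpa = cpa.mean()                                 # Equation 4.2
    ciou = tp / (t + predicted - tp)                 # Equation 4.3
    miou = ciou.mean()                               # Equation 4.4
    fwiou = (t * ciou).sum() / t.sum()               # Equation 4.5 (see below)
    return cpa, mpa, ciou, miou, fwiou

conf = np.array([[50, 5, 0],
                 [10, 30, 5],
                 [0, 5, 20]])
cpa, mpa, ciou, miou, fwiou = segmentation_metrics(conf)
print(round(mpa, 3), round(miou, 3), round(fwiou, 3))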

One issue with the mIoU metric is its equal treatment of all classes in generating a score. This, however, does not represent the relative proportions of each semantic class in the dataset. The fwIoU metric does exactly that, by dictating the contribution of each class to the metric by its frequency of occurrence in the dataset:

\[
\mathrm{fwIoU} = \frac{1}{\sum_{k'=0}^{K-1} t_{k'}} \cdot \sum_{k=0}^{K-1} t_k \cdot \mathrm{cIoU}_k. \tag{4.5}
\]

This metric, however, will often give higher values than the mIoU, since larger semantic classes tend to be segmented more accurately, as they are likely to have dominated network training.

Table 4.2: Classwise and mean pixel accuracy performance of the network on the 5 datasets (Cityscapes (VAL), BDD100K (VAL), Camvid (FULL), Synthia-SF (SEQ3) and Synthia-SF (SEQ5)).

Table 4.3: Classwise, mean and frequency weighted IoU performance of the network on the 5 datasets.

Pixel accuracy and intersection over union performance metrics are presented in Table 4.2 and Table 4.3, respectively. The network was able to generalise across the board between synthetic and real data and environments. Performance was observed to be good for the larger classes, however poor for smaller-area classes that typically occupied less image area or appeared less frequently in the training set. In general,

Semantic Class     median freq/freq(semantic class)
Road               0.1569
Sidewalk           0.6685
Building           0.2254
Fence              1.5433
Pole               2.5338
Traffic Light      8.6868
Traffic Sign       5.5858
Vegetation         0.2038
Terrain            1.3383
Sky                0.1059
Pedestrian         4.5923
Car                0.8591
Railing            1.2937
Curb               2.9740
Lane Markings      1.3431
Street Light       46.2001

Table 4.4: Weights that could be used to class-balance the network into performing better on smaller classes. Each weight correlates to a relative measure of how “small” a semantic class is, by considering its ratio of pixel representation in all images that contain it. Smaller values indicate stronger representation in the training set.

the per class IoU metrics were lower than their respective PA counterparts because of how IoU factors in false positives and false negatives. In addition, the fwIoU scores were all significantly higher than their corresponding mIoU scores, which supports the notion that larger classes are represented more in the datasets.

One way to mitigate the poor performance in classifying smaller classes is to generate a more class-balanced network by reweighting each class in the cross-entropy loss function defined by the final softmax layer of the network. Eigen and Fergus [112] suggest reweighting each pixel of class 푘 by median_freq/freq(푘), where (i) freq(푘) is defined as the total number of pixels of class 푘 divided by the total number of pixels in images where class 푘 is present, and (ii) median_freq is the median of these values. Table 4.4 presents these weight factors calculated over the network training set and shows a clear representation bias towards classes that performed better in our test sets.
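The weights in Table 4.4 follow directly from this definition; a sketch of the computation (Python/NumPy, with hypothetical per-class and per-image pixel counts as the assumed inputs):

import numpy as np

def median_frequency_weights(pixel_counts, image_pixels, presence):
    """Median frequency balancing weights of Eigen and Fergus [112].

    pixel_counts : (K,) total pixels of each class over the training set.
    image_pixels : (M,) total pixels of each training image.
    presence     : (M, K) boolean matrix, True where class k appears in image m.
    """
    # freq(k): pixels of class k divided by total pixels of images containing k.
    freq = pixel_counts / (presence * image_pixels[:, None]).sum(axis=0)
    return np.median(freq) / freq

counts = np.array([9000.0, 500.0, 120.0])   # hypothetical class totals
img_px = np.array([10000.0, 10000.0])       # two hypothetical images
present = np.array([[True, True, False],
                    [True, False, True]])
print(median_frequency_weights(counts, img_px, present))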

66 4.2 Simulation

In this section, we evaluate the performance of our semantic overhead localisation pipeline at three different levels of difficulty within a simulated environment. We begin by describing the basic setup, difficulty levels and localisation pipeline parameters in Section 4.2.1 and then present our results in Section 4.2.2.

4.2.1 Setup

The simulated environment used to test our framework was the same one used by the MIT team that participated in the Defense Advanced Research Projects Agency (DARPA) Fast Lightweight Autonomy (FLA) program, which involved a MAV flying at speeds of up to 20m/s in an outdoor urban environment while simultaneously avoiding obstacles and exploring the space. The simulator comprised a photo-realistic environment that closely modelled the Guardian Centers of Georgia — the final test site of the MIT team. Simulated perceptual data, including dense depth, was provided by the Unity game engine, while MAV dynamics were simulated by a 12-state nonlinear quadrotor model [113] using Drake [114]. Our localisation pipeline was implemented as two C++ nodes using the Robot Operating System (ROS) [115]: one for segmentation and the other for overhead localisation. Input images to the segmentation network were 3-channel RGB with a 59.95 degree horizontal field of view, while the MAV's altitude and top speed were capped at approximately 2m and 4m/s, respectively.

To assess the performance of our system, we tested it using three difficulty levels defined by noise parameters that affected the sensory inputs to our system. Noisy relative pose changes were modelled by adding, in an integral fashion, independent Gaussian noise to each of the three ground truth pose variables. For depthmap inputs, we did not affect their density, but instead limited their effective range and applied noise by adding independent Gaussian noise to the inverse of each depth value and then taking the inverse again to generate new noisy depth values. The reason for using inverse depth noise was to mimic the behaviour of typical depth sensors, which experience greater noise when sensing objects farther away.

           푥 noise        푦 noise        휃 noise        inverse depth noise    max depth range
Easy       풩(0, 0.33²)    풩(0, 0.33²)    풩(0, 0.05²)    N/A                    20m
Medium     풩(0, 1.0²)     풩(0, 1.0²)     풩(0, 1.0²)     N/A                    20m
Hard       풩(0, 1.5²)     풩(0, 1.5²)     풩(0, 0.2²)     풩(0, 0.05²)            12m

Table 4.5: The three difficulty levels of the simulation tests are dictated by parameters affecting relative pose change and depth inputs to the localisation framework.
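The depth corruption used for the hard setting can be sketched as follows (Python/NumPy, using the σ = 0.05 inverse depth noise and 12m range cap from Table 4.5; treating out-of-range pixels as invalid zeros is an assumption of this sketch).

import numpy as np

def corrupt_depthmap(depth, inv_sigma=0.05, max_range=12.0, seed=0):
    """Apply inverse-depth Gaussian noise and a maximum range cut-off."""
    rng = np.random.default_rng(seed)
    noisy = depth.copy()
    valid = (depth > 0) & (depth <= max_range)     # drop returns beyond the range cap
    inv = 1.0 / depth[valid]
    inv += rng.normal(0.0, inv_sigma, size=inv.shape)
    noisy[valid] = 1.0 / np.clip(inv, 1e-3, None)  # back to depth, avoid division by zero
    noisy[~valid] = 0.0                            # mark invalid / out-of-range pixels
    return noisy

depths = np.array([[2.0, 8.0], [15.0, 0.0]])
print(corrupt_depthmap(depths))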

Table 4.5 presents the different parametrisations used to define the three difficulty levels: easy, medium and hard. The easy setup applied minimal Gaussian noise to the relative pose change inputs while maintaining noiseless depthmaps at a range of up to 20m. This setup was used to gently test our system while operating in close-to-ideal conditions. The medium setup used the same depth parameters as easy, but employed much noisier relative pose change inputs to simulate larger drifts of magnitudes similar to real-world sensory inputs. Finally, the hard setup further increased the relative pose change noise parameters while also reducing the depth range and applying inverse depth noise. This setup was designed to stress the system and evaluate its ability to recover from highly corrupted motion inputs given less perceptual information.

The simulated environment consisted of nine different semantic classes: terrain, road, lane marking, curb, building, pedestrian, vegetation, car and sky. Despite this, only five of these semantic classes were present in the overhead image and, as a result, a five-class semantic overhead image was generated, as shown in Figure 4-1. While all semantic classes were labelled by the area they occupied in the overhead image, buildings were only labelled along their contours, with their interiors designated to the "unclassified" set. This was done to ensure consistency with what the MAV's depth sensor can perceive. Additionally, while the environment was mostly modelled as static, several moving pedestrians and cars were present along the roads. The differences between the overhead image and the environment were used to subtly introduce occlusions and verify our system's ability to deal with differences between the overhead and local sensory input that are not attributed to noise.

(a) Overhead Image (b) Segmented Overhead Image

Figure 4-1: The overhead image of the simulation environment and its corresponding segmentation. Only five semantic classes are observable in the overhead: terrain (green), road (purple), lane marking (yellow), curb (orange) and building (black), with unclassified pixels labelled in white.

To ensure the repeatability of the localisation pipeline, 10 independent flights were conducted and logged for each of the three difficulty levels. All tests were flown along a flight path very similar to the one shown in Figure 4-2; however, to ensure sufficient randomness, each flight was generated randomly with minor variations, such as the MAV correcting its course when, for example, faced with a nearby moving vehicle. The average distance travelled per flight was measured at approximately 400m, while the average throughput of the localisation pipeline was clocked at 0.72Hz, with the majority of the computation time per iteration of the algorithm consumed by the segmentation network (0.8Hz, Section 4.1).

Finally, we set the parameters of the MCL framework empirically, based on prior runtime performance tests that balanced runtime against localisation performance. The particle set size was fixed at 10,000 particles, while the initialisation distribution parameters (Section 3.5.1) were set as 휇x0 = 0 and Γx0 = 푑푖푎푔(5, 5, 0.25).

Motion model noise parameters defined in Section 3.5.2 were selected as 휇푟표푡1 = 휇푟표푡2 = 휇푡푟푎푛푠 = 0, 휎푟표푡1 = 휎푟표푡2 = 0.1 radians and 휎푡푟푎푛푠 = 0.3m, with the sample distributions modelled by independent Gaussians. Finally, the parameters defining the blurring procedure at the end of each iteration of the MCL algorithm were varied according to the difficulty level of the tests. Easy and medium flights used zero-mean Gaussian blurring with blur strength defined by the diagonal matrix Γ = 푑푖푎푔(1, 1, 0.1), while hard flights also used zero-mean Gaussian blurring, with strength defined by 2·Γ. The reason for this difference in blurring strength was to allow the MCL framework to cope with the greater sensory noise in the hard flights by ensuring sufficient particle spread, so that the framework could recover from larger input drift and perceptual aliasing.

Figure 4-2: All 30 flight tests were conducted with the same start and goal locations, denoted by "S", and flew according to a closed loop trajectory — depicted by the bold black arrows — moving from "S" to intersection "1", followed by "2", "3" and finally back to "S".

4.2.2 Results

To illustrate the ability of our system to provide pose estimate corrections to noisy relative pose change inputs, we first present an example trajectory and corresponding error plots for each of the three simulation difficulties (easy, medium and hard) in Figures 4-4, 4-5 and 4-6, respectively. As expected, relative pose change input drift increased with simulation difficulty due to the larger noise parameters.

           Position MSE [m²]              Orientation MSE [rad²]
           Mean            Stdv           Mean                Stdv
Easy       0.868 (7.78)    0.867 (0.156)  0.00369 (0.0582)    0.00342 (0.000409)
Medium     2.46 (73.2)     1.41 (3.86)    0.00316 (0.0576)    0.00349 (0.00172)
Hard       6.49 (73.37)    5.95 (3.30)    0.00592 (0.230)     0.00382 (0.00448)

Table 4.6: The sample mean and standard deviation over our algorithm's and the noisy sim pose (values in parentheses) MSEs for all 10 trials of each simulation difficulty. Both metrics increase with difficulty, indicating greater uncertainty and reduced localisation accuracy with respect to noise. Our algorithm has substantially smaller mean values, indicative of better estimates.

However, in all three scenarios, our semantic overhead localisation framework was able to overcome the input drift and estimate poses that closely matched the ground truth. From the 푥, 푦 and 휃 error plots, we note that the particle set standard deviations over each of the pose variables spiked at instances where the corresponding pose estimate errors did too, suggesting that particle spread could be used to provide a quality metric for the estimates. Additionally, the hard setup appeared to suffer from greater estimate uncertainty through larger particle set variances. This is illustrated by the appearance of red ellipsoids in Figure 4-6, with height and width defined by the corresponding 푥 and 푦 particle set variances at each timestep, which are not visible in Figures 4-4 and 4-5 due to their significantly smaller magnitudes.

To further evaluate the effectiveness of our localisation approach, we computed the mean squared error (MSE) over position and orientation for each individual simulation trial, for both the noisy simulation pose and our pose estimates, as presented in Figure 4-7. We observed that the position MSE increased with simulation difficulty for both our computed pose estimates and the corrupted simulation poses, with the medium and hard trials experiencing similar input MSEs that were much larger than the easy trials, while the corresponding MSEs from our algorithm increased only marginally by comparison. The orientation MSEs illustrate similar performance. Table 4.6 summarises the mean and standard deviation of the MSEs for our system and the noisy sim poses (in parentheses) across each difficulty level and all 10 trials. Trial 7 of the hard simulation is shown to be an outlier with respect to positional

MSE. Examining the trajectory, the performance can be attributed to semantic perceptual aliasing that occurred at two instances where the MAV had just turned past intersections 1 and 3, shown in Figure 4-2. During these manoeuvres, the MAV was flying directly over the curb on one side of the road and the segmentation network incorrectly classified the curb and surrounding terrain as lane markings and road respectively. Due to the blurring at the end of each iteration of the MCL pipeline, and given these observations, both sides of the road were determined to be likely positions for the robot. In both instances, however, the algorithm was eventually able to correct itself when finally presented with more accurate image segmentations.

Cumulative distribution function (CDF) plots of the absolute position and orientation errors over all pose estimates for each of the three simulation difficulties are presented in Figure 4-8. We observed that the median (50th percentile) position errors for the easy, medium and hard simulations were 0.27, 0.37 and 0.76 meters, respectively, while the 90th percentiles were 1.0, 1.86 and 4.6 meters, respectively. The median orientation errors were 0.0073, 0.0089 and 0.016 radians, respectively, while the 90th percentiles were 0.031, 0.043 and 0.079 radians, respectively. Although the 90th percentile position errors are somewhat similar to GPS in the hard case, the orientation errors are promising and demonstrate our algorithm's ability to use semantics to reliably correct rotational drift. However, it should be noted that this performance may not be observed if the robot is flying for prolonged periods of time over regions of low semantic entropy, such as regions of terrain, in which case performance would likely degrade to that of the control input.

Finally, to briefly demonstrate our algorithm's ability to operate under different flight paths, we present an example of trajectories (Figure 4-9) — for each of the three simulation difficulties — generated from drone flight looping twice along a "figure-eight" path, shown in Figure 4-3. Qualitatively, we observe a trend similar to the previous 30 trials discussed: the semantic overhead localisation algorithm is able to provide significantly better pose estimates than its noisy input, while estimate error increases with setup difficulty.

Figure 4-3: The “figure-eight” path consists of a trajectory with motion between subgoals ordered by: S, 1, 2, 3, 4, 1, 2, 5, S.

4.3 Real Flight Data

In this section, we present preliminary qualitative results of our semantic overhead localisation framework executed on real flight data collected from the MIT FLA vehicle during flight tests at the Guardian Center of Georgia facility. Colour image and depthmap input streams were generated using an Intel Realsense D435 [42], while pose estimates were generated using SAMWISE [116], a sliding-window visual inertial state estimator. We used these estimates to generate the relative pose change inputs for our system. The total flight time and path length were approximately 150 seconds and 300 meters respectively, while flight altitude and speed were maintained at approximately 2.3m and 4m/s respectively. Figure 4-10 shows the semantically labelled subsection of the overhead image that the data was recorded from. Since the satellite image was taken from an oblique angle, building edges were interpolated.

The semantic overhead localisation algorithm was executed on the recorded flight data using the motion model parameters described in Section 4.2.1, in addition to the same blurring parameters as in the “hard” simulation trials. Although the camera manufacturer's specifications state a maximum effective depth range of 10m, we used all depth information up to 20m. This was done to verify our system's robustness to large amounts of inverse depth noise given semantic alignments.
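For illustration, the sketch below shows one way the relative pose change inputs could be derived from consecutive planar pose estimates. It is a sketch under the assumption that poses are reduced to (x, y, θ) in a common world frame; the function name is illustrative and not part of SAMWISE or our pipeline.

    import numpy as np

    def relative_pose_change(prev_pose, curr_pose):
        # prev_pose, curr_pose: (x, y, theta) tuples in a common world frame.
        # Returns the (dx, dy, dtheta) increment expressed in prev_pose's body frame,
        # i.e. the odometry-style input consumed by the particle filter motion model.
        x0, y0, th0 = prev_pose
        x1, y1, th1 = curr_pose
        c, s = np.cos(th0), np.sin(th0)
        dx_w, dy_w = x1 - x0, y1 - y0
        dx = c * dx_w + s * dy_w                           # rotate the world-frame translation into the body frame
        dy = -s * dx_w + c * dy_w
        dth = (th1 - th0 + np.pi) % (2.0 * np.pi) - np.pi  # wrapped heading change
        return dx, dy, dth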

Finally, during the topview generation (Section 3.4) phase, we gave lane marking points greater weight in deciding the semantic label of each cell in the topview (a schematic sketch of this weighting is given below). This was done to compensate for the fact that lane markings in the real-world environment were very thin and often worn out.

Pose trajectories generated by the SAMWISE state estimator and by the semantic overhead localisation algorithm are presented in Figure 4-11. Since the MAV was not fitted with a GPS sensor, a crude approximation of the “ground truth” vehicle trajectory was generated using a priori waypoints, the live video feed from onboard the MAV and human observers, to provide a qualitative, visual understanding of system performance. Despite the SAMWISE pose estimates incurring substantial drift, the semantic localisation algorithm is able to correct for this by ensuring semantic consistency between local observations and the overhead image. Figures 4-12 and 4-13 show the topview semantic reconstructions of the world based on the SAMWISE and semantic localisation pose estimates respectively, overlaid on top of the overhead image. The semantic localisation algorithm generates an observably more consistent semantic reconstruction.
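As referenced above, cell labels in the topview were decided with lane marking points given greater weight. The following is a schematic sketch of such class-weighted voting over the collapsed points that fall into one topview cell; the class names and the weight value are illustrative assumptions, not the parameters used on the real flight data.

    from collections import Counter

    # Illustrative per-class vote weights; lane markings are boosted so that thin,
    # sparsely observed markings can still win a cell.
    CLASS_WEIGHTS = {"road": 1.0, "terrain": 1.0, "building": 1.0, "lane_marking": 3.0}

    def cell_label(point_labels):
        # Weighted vote over the semantic labels of the points collapsed into a single cell.
        votes = Counter()
        for label in point_labels:
            votes[label] += CLASS_WEIGHTS.get(label, 1.0)
        return max(votes, key=votes.get) if votes else "unclassified"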

(a) Pose Trajectories; (b) X Errors and Particles Stdv; (c) Y Errors and Particles Stdv; (d) Orientation Errors and Particles Stdv; (e) Euclidean Position Errors

Figure 4-4: An example trajectory from one of the 10 “easy” simulation trials. The trajectory formed by the noisy relative pose change inputs is subject to drift away from the ground truth trajectory, while our algorithm is able to correct for the input noise and generate a trajectory closer to ground truth.

(a) Pose Trajectories; (b) X Errors and Particles Stdv; (c) Y Errors and Particles Stdv; (d) Orientation Errors and Particles Stdv; (e) Euclidean Position Errors

Figure 4-5: An example trajectory from one of the 10 “medium” simulation trials. The trajectory formed by the noisy relative pose change inputs is subject to observably larger drift away from the ground truth trajectory than in the easy trial. Despite this, our algorithm is able to compute a corrected trajectory.

(a) Pose Trajectories; (b) X Errors and Particles Stdv; (c) Y Errors and Particles Stdv; (d) Orientation Errors and Particles Stdv; (e) Euclidean Position Errors

Figure 4-6: An example trajectory from one of the 10 “hard” simulation trials. Although the trajectory formed by the noisy relative pose change inputs appears similar to that in the medium trial, the orientation errors are up to twice as large. Despite this, our algorithm continues to compute an improved trajectory over the noisy input.

(a) Position Mean Square Error; (b) Orientation Mean Square Error

Figure 4-7: Mean Square Errors (MSEs) for each of the 10 trials show clear reductions in position and orientation errors in our algorithm's pose estimates. Although the MSE for the corrupted simulation pose increases substantially with simulation difficulty, the effect is much weaker with our algorithm.

(a) CDF of Absolute Position Errors; (b) CDF of Absolute Orientation Errors

Figure 4-8: CDFs of absolute position and orientation errors over all pose estimates from our algorithm for each of the 3 simulation difficulties. Larger errors become more likely as sensory input noise increases and depth range decreases.

(a) “Easy”; (b) “Medium”; (c) “Hard”

Figure 4-9: Pose trajectories for each difficulty setup along a “figure-eight” path. The semantic overhead localisation pipeline is able to recover from translational and rotational drifts even as input noise is increased.

Figure 4-10: The flight region of the real-world recorded data consisted of 4 semantic classes. To match what is observable by the MAV's depth sensor, buildings were labelled around their contours with the internal areas designated as unclassified.

Figure 4-11: A crude approximation of the ground truth vehicle trajectory is shown in black, with the approximate start location marked by the “S”. This was generated given the a priori waypoints, live video feed and human observers. The SAMWISE estimates accumulated drift errors of up to approximately 30m and falsely placed the final pose at the start location. The semantic overhead localisation algorithm was able to correct for estimator drift and generate pose estimates closer to the actual vehicle trajectory.

Figure 4-12: The semantic reconstruction of the flight path using the semantic topviews, generated at each timestep of the semantic localisation algorithm and centred at the corresponding SAMWISE poses. Due to the large drift in the state estimator, the semantic reconstruction does not match the overhead image during the last leg of the flight. Road is placed where terrain should be, and the buildings (grey in the semantic topviews) are placed at the edge of the terrain to the left of the start location.

Figure 4-13: The semantic reconstruction of the flight path using the semantic topviews, generated at each timestep of the semantic localisation algorithm and centred at the corresponding semantic overhead localisation poses. The road, lane marking and building reconstructions, although still noisy, are truer than the reconstruction using SAMWISE poses. Terrain is mostly aligned correctly; however, some overlap with buildings can be seen, owing to errors in the pose estimates.

Chapter 5

Conclusion

Despite rapid improvements in semantic segmentation to date and the presence of large satellite image databases, most frameworks that localise using an overhead image do not use more than one semantic class within the environment to perform data association with local visual sensor input; none use semantic segmentation networks, and most focus on feature-matching based methods. This thesis presented research that addressed the overhead localisation problem as a multi-semantic matching problem, aiming to generate more accurate pose estimates by constraining the data association process.

A visual semantic overhead localisation pipeline was presented in Chapter 3 that used a deep learning semantic segmentation framework to generate dense scene semantics from an onboard camera sensor for direct data association with an overhead image at the pixel level. To generate local scene semantics, a segmentation network was trained on a large, publicly available urban dataset and was shown to generalise across unseen environments for the larger semantic classes that are more likely to be present in overhead images. The key contribution is the generation of a semantic topview representing the semantic geometric topology of the locally sensed environment. This topview is used to directly estimate particle weights in the measurement model of an MCL framework by applying pixel-wise semantic match comparisons with a semantic overhead image to generate a semantic photometric error score. Semantic classes that appear less frequently in the overhead image are given greater error weights, allowing them to provide a strong “corrective” signal in cases of estimate drift.
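To make the weighting idea concrete, the following is a minimal sketch of a per-particle score computed by pixel-wise comparison of the local topview against the overhead crop sampled under the particle's hypothesised pose. The class IDs, the error weights, the choice of weighting mismatches by the overhead pixel's class and the exponential mapping to a weight are all illustrative assumptions rather than the exact form used in the pipeline.

    import numpy as np

    # Illustrative per-class error weights: rarer overhead classes (e.g. lane markings)
    # contribute a larger penalty when a pixel's labels disagree.
    # Index order assumed: [unclassified, road, terrain, building, lane_marking].
    CLASS_ERROR_WEIGHTS = np.array([0.0, 1.0, 1.0, 1.0, 4.0])

    def particle_weight(topview_labels, overhead_labels, sigma=50.0):
        # topview_labels: (H, W) int array of local semantic topview labels.
        # overhead_labels: (H, W) int array of overhead labels under the particle's pose.
        mismatch = topview_labels != overhead_labels
        # Penalise each mismatching pixel according to its overhead class weight.
        error = CLASS_ERROR_WEIGHTS[overhead_labels][mismatch].sum()
        return np.exp(-error / sigma)  # map the error score to an unnormalised particle weight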

This approach allows for semantically consistent pose estimates under the assumptions of accurate local image segmentations, dense depth estimation and correctly scaled overhead images.

The semantic localisation pipeline was evaluated in Chapter 4, first in simulation (Section 4.2) and then using real-world flight data (Section 4.3). Simulation testing consisted of flight within a simulated environment with three levels of difficulty parametrised by control input noise, maximum depth range and inverse depth noise. The quantitative analysis illustrates that the semantic localisation pipeline is able to correct translational drift to within GPS-level accuracy, while rotational drift is corrected to within 5 degrees 90% of the time. Qualitative results generated using data recorded onboard a real MAV during a 300m, 150 second flight demonstrated our algorithm's ability to correct the pose estimates of a state-of-the-art state estimator, producing observably better semantic consistency.

While the research presented in this thesis demonstrates the ability to improve pose estimate accuracy using multi-class pixel-wise data association, we note possible extensions that could incorporate the overhead localisation techniques reviewed in Section 2.2. For instance, once the local image has been segmented, its output could be used to produce masks for each semantic class, enabling class-specific features to be detected over the corresponding image regions of each mask. The same procedure could be conducted over the semantic overhead image so that multi-class feature matching or geometric alignment procedures, such as chamfer matching and ICP, could be used to generate semantically consistent solutions. We believe this would prove useful in cases where the robot is operating in regions of low semantic variability and our system's performance degrades to control input accuracy, since segmentation provides semantics at the expense of colour and texture information.

Bibliography

[1] Evan Ackerman. Skydio demonstrates incredible obstacle-dodging full auton- omy with new r1 consumer drone. https://spectrum.ieee.org/automaton/ robotics/drones/skydio-r1-drone. Accessed: 2018-7-23.

[2] Daisuke Wakabayashi. Waymo’s autonomous cars cut out human drivers in road tests. https://www.nytimes.com/2017/11/07/technology/ waymo-autonomous-cars.html. Accessed: 2018-7-23.

[3] Mike Wehner. Robotic bird drones will moni- tor china’s citizens. https://nypost.com/2018/06/27/ robotic-bird-drones-will-monitor-chinas-citizens/. Accessed: 2018- 7-23.

[4] Mary-Ann Russon. Drones to the rescue! https://www.bbc.com/news/ business-43906846. Accessed: 2018-7-23.

[5] Jeff Desjardins. Amazon and ups are betting big on drone delivery. http://www.businessinsider.com/ amazon-and-ups-are-betting-big-on-drone-delivery-2018-3. Accessed: 2018-7-23.

[6] The Economist. Manna from heaven: How e-commerce with drone delivery is taking flight in china. https://www.economist.com/business/2018/06/09/how-e-commerce-with-drone-delivery-is-taking-flight-in-china. Accessed: 2018-7-23.

[7] Google Maps. Google maps. https://maps.google.com. Accessed: 2018-8-27.

[8] Yutaka Masumoto. Global positioning system, May 11 1993. US Patent 5,210,540.

[9] Josef Marek and Ladislav Štěpánek. Accuracy and availability of the satel- lite navigation system gps. In Microwave Techniques (COMITE), 2010 15th International Conference on, pages 121–124. IEEE, 2010.

[10] Sebastian Thrun, Wolfram Burgard, and Dieter Fox. Probabilistic robotics. MIT press, 2005.

[11] GISGeography. 15 free satellite imagery data sources. https://gisgeography.com/free-satellite-imagery-data-list/. Accessed: 2018-7-23.

[12] Kalev Leetaru. How google earth and google maps taught us to see the world from above. https://www.forbes.com/sites/kalevleetaru/2018/04/03/ how-google-earth-and-google-maps-taught-us-to-see-the-world-from-above/ #6b61feb576e6. Accessed: 2018-7-23.

[13] Hugh Durrant-Whyte and Tim Bailey. Simultaneous localization and mapping: part i. IEEE robotics & automation magazine, 13(2):99–110, 2006.

[14] Tim Bailey and Hugh Durrant-Whyte. Simultaneous localization and mapping (slam): Part ii. IEEE Robotics & Automation Magazine, 13(3):108–117, 2006.

[15] Vijay Badrinarayanan, Alex Kendall, and Roberto Cipolla. Segnet: A deep con- volutional encoder-decoder architecture for image segmentation. IEEE Trans- actions on Pattern Analysis and Machine Intelligence, 39:2481–2495, 2017.

[16] Abhishek Chaurasia and Eugenio Culurciello. Linknet: Exploiting encoder representations for efficient semantic segmentation. In Visual Communications and Image Processing (VCIP), 2017 IEEE, pages 1–4. IEEE, 2017.

[17] Adam Paszke, Abhishek Chaurasia, Sangpil Kim, and Eugenio Culurciello. Enet: A deep neural network architecture for real-time semantic segmentation. CoRR, abs/1606.02147, 2016.

[18] Mostafa Gamal, Mennatullah Siam, and Moemen Abdel-Razek. Shuffleseg: Real-time semantic segmentation network. arXiv preprint arXiv:1803.03816, 2018.

[19] Dong-Ki Kim, Daniel Maturana, Masashi Uenoyama, and Sebastian Scherer. Season-invariant semantic segmentation with a deep multimodal network. In Field and Service Robotics, pages 255–270. Springer, 2018.

[20] Abhinav Valada, Johan Vertens, Ankit Dhall, and Wolfram Burgard. Adap- net: Adaptive semantic segmentation in adverse environmental conditions. In Robotics and Automation (ICRA), 2017 IEEE International Conference on, pages 4644–4651. IEEE, 2017.

[21] Johan Vertens, Abhinav Valada, and Wolfram Burgard. Smsnet: Semantic mo- tion segmentation using deep convolutional neural networks. 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 582– 589, 2017.

[22] Nazrul Haque, N. Dinesh Reddy, and K. Madhava Krishna. Joint semantic and motion segmentation for dynamic scenes using deep convolutional networks. CoRR, abs/1704.08331, 2017.

[23] Bi-ke Chen, Chen Gong, and Jian Yang. Importance-aware semantic segmentation for autonomous driving system. In Proceedings of the 26th International Joint Conference on Artificial Intelligence, pages 1504–1510. AAAI Press, 2017.

[24] Alberto Garcia-Garcia, Sergio Orts-Escolano, Sergiu Oprea, Victor Villena- Martinez, and José García Rodríguez. A review on deep learning techniques applied to semantic segmentation. CoRR, abs/1704.06857, 2017.

[25] Mennatullah Siam, Mostafa Gamal, Moemen Abdel-Razek, Senthil Yogamani, and Martin Jägersand. Rtseg: Real-time semantic segmentation comparative study. CoRR, abs/1803.02758, 2018.

[26] Nikolay Atanasov, Menglong Zhu, Kostas Daniilidis, and George J Pappas. Semantic localization via the matrix permanent. In Robotics: Science and Systems, volume 2, 2014.

[27] Daniel Maturana, Po-Wei Chou, Masashi Uenoyama, and Sebastian Scherer. Real-time semantic mapping for autonomous off-road navigation. In Field and Service Robotics, pages 335–350. Springer, 2018.

[28] Daniel Maturana, Sankalp Arora, and Sebastian Scherer. Looking forward: A semantic mapping system for scouting with micro-aerial vehicles. In Intelli- gent Robots and Systems (IROS), 2017 IEEE/RSJ International Conference on, pages 6691–6698. IEEE, 2017.

[29] Dong Wook Ko, Chuho Yi, and Il Hong Suh. Semantic mapping and naviga- tion: A bayesian approach. In Intelligent Robots and Systems (IROS), 2013 IEEE/RSJ International Conference on, pages 2630–2636. IEEE, 2013.

[30] Janice Lin, Wen-June Wang, Sheng-Kai Huang, and Hsiang-Chieh Chen. Learn- ing based semantic segmentation for robot navigation in outdoor environment. In Fuzzy Systems Association and 9th International Conference on Soft Com- puting and Intelligent Systems (IFSA-SCIS), 2017 Joint 17th World Congress of International, pages 1–5. IEEE, 2017.

[31] Alina Marcu. A local-global approach to semantic segmentation in aerial images. CoRR, abs/1607.05620, 2016.

[32] Pascal Kaiser, Jan Dirk Wegner, Aurélien Lucchi, Martin Jaggi, Thomas Hof- mann, and Konrad Schindler. Learning aerial image segmentation from online maps. IEEE Transactions on Geoscience and Remote Sensing, 55(11):6054– 6068, 2017.

[33] Shivaprakash Muruganandham. Semantic segmentation of satellite images using deep learning, 2016.

[34] Dimitrios Marmanis, Jan D Wegner, Silvano Galliani, Konrad Schindler, Mihai Datcu, and Uwe Stilla. Semantic segmentation of aerial images with an ensemble of cnns. ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 3:473, 2016.

[35] Dragos Costea and Marius Leordeanu. Aerial image geolocalization from recognition and matching of roads and intersections. arXiv preprint arXiv:1605.08323, 2016.

[36] Turgay Senlet and Ahmed Elgammal. Satellite image based precise robot lo- calization on sidewalks. In Robotics and Automation (ICRA), 2012 IEEE In- ternational Conference on, pages 2647–2653. IEEE, 2012.

[37] Turgay Senlet, Tarek El-Gaaly, and Ahmed Elgammal. Hierarchical semantic hashing: Visual localization from buildings on maps. In Pattern Recognition (ICPR), 2014 22nd International Conference on, pages 2990–2995. IEEE, 2014.

[38] Turgay Senlet and Ahmed Elgammal. A framework for global vehicle local- ization using stereo images and satellite and road maps. In Computer Vision Workshops (ICCV Workshops), 2011 IEEE International Conference on, pages 2034–2041. IEEE, 2011.

[39] Turgay Senlet. Visual localization, semantic video segmentation and labeling us- ing satellite maps. Rutgers The State University of New Jersey-New Brunswick, 2015.

[40] Hokuyo. Hokuyo utm-30lx. https://www.hokuyo-aut.jp/search/single. php?serial=169. Accessed: 2018-8-27.

[41] Velodyne. Velodyne vls-128. https://velodynelidar.com/vls-128.html. Accessed: 2018-8-27.

[42] Intel. Intel realsense depth camera d435. https://click.intel.com/intelr-realsensetm-depth-camera-d435.html. Accessed: 2018-8-27.

[43] Dante D'Orazio. Point grey flea3 camera captures 4k video in a 1.2 inch cube form factor for $949. https://www.theverge.com/2012/7/1/3128515/point-grey-flea3-worlds-smallest-4k-video-camera. Accessed: 2018-8-27.

[44] Lauro Ojeda and Johann Borenstein. Personal dead-reckoning system for gps- denied environments. In Safety, Security and Rescue Robotics, 2007. SSRR 2007. IEEE International Workshop on, pages 1–6. IEEE, 2007.

[45] gps.gov. Gps accuracy. https://www.gps.gov/systems/gps/performance/accuracy/. Accessed: 2018-8-10.

[46] Frank van Diggelen and Per Enge. The world's first gps mooc and worldwide laboratory using smartphones. In Proceedings of the 28th International Technical Meeting of The Satellite Division of the Institute of Navigation (ION GNSS+ 2015), pages 361–369, 2015.

[47] Nils Ole Tippenhauer, Christina Pöpper, Kasper Bonne Rasmussen, and Srdjan Capkun. On the requirements for successful gps spoofing attacks. In Proceedings of the 18th ACM conference on Computer and communications security, pages 75–86. ACM, 2011.

[48] Andrew J Kerns, Daniel P Shepard, Jahshan A Bhatti, and Todd E Humphreys. Unmanned aircraft capture and control via gps spoofing. Journal of Field Robotics, 31(4):617–636, 2014.

[49] Bradford W Parkinson, Per Enge, Penina Axelrad, and James J Spilker Jr. Global positioning system: Theory and applications, Volume II. American Institute of Aeronautics and Astronautics, 1996.

[50] Francisco Rovira-Más and Ratul Banerjee. Gps data conditioning for enhancing reliability of automated off-road vehicles. Proceedings of the Institution of Mechanical Engineers, Part D: Journal of automobile engineering, 227(4):521–535, 2013.

[51] Yuanliang Zhang and Dong Pyo Hong. Navigation of mobile robot using low-cost gps. International Journal of Precision Engineering and Manufacturing, 16(4):847–850, 2015.

[52] Rudolph Kalman. A new approach to linear filtering and prediction problems. Journal of basic Engineering, 82(1):35–45, 1960.

[53] Ryan S Kaminsky, Noah Snavely, Steven M Seitz, and Richard Szeliski. Alignment of 3d point clouds to overhead images. In Computer Vision and Pattern Recognition Workshops, 2009. CVPR Workshops 2009. IEEE Computer Society Conference on, pages 63–70. IEEE, 2009.

[54] H Christopher Longuet-Higgins. A computer algorithm for reconstructing a scene from two projections. Nature, 293(5828):133, 1981.

[55] Boris Sofman, Ellie Lin, J Andrew Bagnell, John Cole, Nicolas Vandapel, and Anthony Stentz. Improving robot navigation through self-supervised online learning. Journal of Field Robotics, 23(11-12):1059–1075, 2006.

[56] Michael McHenry, Yang Cheng, and Larry Matthies. Vision-based localization in urban environments. In Unmanned Ground Vehicle Technology VII, volume 5804, pages 359–371. International Society for Optics and Photonics, 2005.

[57] Keith Yu Kit Leung, Christopher M Clark, and Jan P Huissoon. Localization in urban environments by matching ground level video images with an aerial image. In Robotics and Automation, 2008. ICRA 2008. IEEE International Conference on, pages 551–556. IEEE, 2008.

[58] Christian Landsiedel and Dirk Wollherr. Global localization of 3d point clouds in building outline maps of urban outdoor environments. International journal of intelligent robotics and applications, 1(4):429–441, 2017.

[59] Christian W Landsiedel. Semantic mapping for autonomous robots in urban environments.

[60] Mordechai Haklay and Patrick Weber. Openstreetmap: User-generated street maps. Ieee Pervas Comput, 7(4):12–18, 2008.

[61] Sheraz Khan, Athanasios Dometios, Chris Verginis, Costas Tzafestas, Dirk Wollherr, and Martin Buss. Rmap: a rectangular cuboid approximation framework for 3d environment mapping. Autonomous Robots, 37(3):261–277, 2014.

[62] Jiri Matas, Charles Galambos, and Josef Kittler. Robust detection of lines using the progressive probabilistic hough transform. Computer Vision and Image Understanding, 78(1):119–137, 2000.

[63] Harry G. Barrow, Jay M. Tenenbaum, Robert C. Bolles, and Helen C. Wolf. Parametric correspondence and chamfer matching: Two new techniques for image matching. In IJCAI, 1977.

[64] Ming-Yu Liu, Oncel Tuzel, Ashok Veeraraghavan, and Rama Chellappa. Fast directional chamfer matching. In Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on, pages 1696–1703. IEEE, 2010.

[65] Xipeng Wang, Steve Vozar, and Edwin Olson. Flag: Feature-based localization between air and ground. In Robotics and Automation (ICRA), 2017 IEEE International Conference on, pages 3178–3184. IEEE, 2017.

[66] Michael Calonder, Vincent Lepetit, Christoph Strecha, and Pascal Fua. Brief: Binary robust independent elementary features. In European conference on computer vision, pages 778–792. Springer, 2010.

[67] Richard Hartley and Andrew Zisserman. Multiple view geometry in computer vision. Cambridge university press, 2003.

[68] Martin Ester, Hans-Peter Kriegel, Jörg Sander, Xiaowei Xu, et al. A density- based algorithm for discovering clusters in large spatial databases with noise. In Kdd, volume 96, pages 226–231, 1996.

[69] Oliver Pink. Visual map matching and localization using a global feature map. In Computer Vision and Pattern Recognition Workshops, 2008. CVPRW’08. IEEE Computer Society Conference on, pages 1–7. IEEE, 2008.

[70] Paul J Besl and Neil D McKay. Method for registration of 3-d shapes. In Sensor Fusion IV: Control Paradigms and Data Structures, volume 1611, pages 586–607. International Society for Optics and Photonics, 1992.

[71] Kenneth P Bube and Robert T Langan. Hybrid ℓ1/ℓ2 minimization with applications to tomography. Geophysics, 62(4):1183–1195, 1997.

[72] John Canny. A computational approach to edge detection. IEEE Transactions on pattern analysis and machine intelligence, (6):679–698, 1986.

[73] Masafumi Noda, Tomokazu Takahashi, Daisuke Deguchi, Ichiro Ide, Hiroshi Murase, Yoshiko Kojima, and Takashi Naito. Vehicle ego-localization by match- ing in-vehicle camera images to an aerial image. In Asian Conference on Com- puter Vision, pages 163–173. Springer, 2010.

[74] Herbert Bay, Andreas Ess, Tinne Tuytelaars, and Luc Van Gool. Speeded-up robust features (surf). Computer vision and image understanding, 110(3):346– 359, 2008.

[75] Anirudh Viswanathan, Bernardo R Pires, and Daniel Huber. Vision based robot localization by ground to satellite matching in gps-denied situations. In Intelligent Robots and Systems (IROS 2014), 2014 IEEE/RSJ International Conference on, pages 192–198. IEEE, 2014.

[76] David G Lowe. Object recognition from local scale-invariant features. In Com- puter vision, 1999. The proceedings of the seventh IEEE international confer- ence on, volume 2, pages 1150–1157. Ieee, 1999.

[77] Tianjia Wu and Zhenjiang Miao. An improved feature image matching algo- rithm based on locality-sensitive hashing. In Signal Processing (ICSP), 2016 IEEE 13th International Conference on, pages 723–728. IEEE, 2016.

[78] Aomei Li, Wanli Jiang, Weihua Yuan, Dehui Dai, Siyu Zhang, and Zhe Wei. An improved fast+surf fast matching algorithm. Procedia Computer Science, 107:306 – 312, 2017. Advances in Information and Communication Technology: Proceedings of 7th International Congress of Information and Communication Technology (ICICT2017).

[79] Hang Chu, Hongyuan Mei, Mohit Bansal, and Matthew R Walter. Accu- rate vision-based vehicle localization using satellite imagery. arXiv preprint arXiv:1510.09171, 2015.

[80] Dong-Ki Kim and Matthew R Walter. Satellite image-based localization via learned embeddings. In Robotics and Automation (ICRA), 2017 IEEE Inter- national Conference on, pages 2073–2080. IEEE, 2017.

[81] Sumit Chopra, Raia Hadsell, and Yann LeCun. Learning a similarity metric discriminatively, with application to face verification. In Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, volume 1, pages 539–546. IEEE, 2005.

[82] Yicong Tian, Chen Chen, and Mubarak Shah. Cross-view image matching for geo-localization in urban environments. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1998–2006, 2017.

[83] Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster r-cnn: Towards real-time object detection with region proposal networks. In Advances in neural information processing systems, pages 91–99, 2015.

[84] Jonathan Long, Evan Shelhamer, and Trevor Darrell. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 3431–3440, 2015.

[85] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems, pages 1097–1105, 2012.

[86] Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. Going deeper with convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1–9, 2015.

[87] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.

[88] Fisher Yu and Vladlen Koltun. Multi-scale context aggregation by dilated convolutions. arXiv preprint arXiv:1511.07122, 2015.

[89] Martin Thoma. A survey of semantic segmentation. CoRR, abs/1602.06541, 2016.

[90] Hongyuan Zhu, Fanman Meng, Jianfei Cai, and Shijian Lu. Beyond pixels: A comprehensive survey from bottom-up to semantic image segmentation and cosegmentation. Journal of Visual Communication and Image Representation, 34:12–27, 2016.

[91] Qichuan Geng, Zhong Zhou, and Xiaochun Cao. Survey of recent progress in semantic image segmentation with cnns. Science China Information Sciences, 61(5):051101, 2018.

[92] Gerhard Neuhold, Tobias Ollmann, S Rota Bulo, and Peter Kontschieder. The mapillary vistas dataset for semantic understanding of street scenes. In Proceedings of the International Conference on Computer Vision (ICCV), Venice, Italy, pages 22–29, 2017.

[93] Gabriel J Brostow, Julien Fauqueur, and Roberto Cipolla. Semantic object classes in video: A high-definition ground truth database. Pattern Recognition Letters, 30(2):88–97, 2009.

[94] Marius Cordts, Mohamed Omran, Sebastian Ramos, Timo Rehfeld, Markus Enzweiler, Rodrigo Benenson, Uwe Franke, Stefan Roth, and Bernt Schiele. The cityscapes dataset for semantic urban scene understanding. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 3213–3223, 2016.

[95] German Ros, Laura Sellart, Joanna Materzynska, David Vazquez, and Antonio M Lopez. The synthia dataset: A large collection of synthetic images for semantic segmentation of urban scenes. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 3234–3243, 2016.

[96] Gregory J Stein and Nicholas Roy. Genesis-rt: Generating synthetic images for training secondary real-world tasks. In IEEE International Conference on Robotics and Automation (ICRA), 2018.

[97] Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, Jan Kautz, and Bryan Catanzaro. High-resolution image synthesis and semantic manipulation with conditional gans. In Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), pages 1–13, 2018.

[98] Swami Sankaranarayanan, Yogesh Balaji, Arpit Jain, Ser Nam Lim, and Rama Chellappa. Learning from synthetic data: Addressing domain shift for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3752–3761, 2018.

[99] Fatemeh Sadat Saleh, Mohammad Sadegh Aliakbarian, Mathieu Salzmann, Lars Petersson, and Jose M Alvarez. Effective use of synthetic data for urban scene semantic segmentation. arXiv preprint arXiv:1807.06132, 2018.

[100] Fisher Yu, Wenqi Xian, Yingying Chen, Fangchen Liu, Mike Liao, Vashisht Madhavan, and Trevor Darrell. BDD100K: A diverse driving video database with scalable annotation tooling. CoRR, abs/1805.04687, 2018.

[101] Xinyu Huang, Xinjing Cheng, Qichuan Geng, Binbin Cao, Dingfu Zhou, Peng Wang, Yuanqing Lin, and Ruigang Yang. The apolloscape dataset for autonomous driving. CoRR, abs/1803.06184, 2018.

[102] W Nicholas Greene and Nicholas Roy. Flame: Fast lightweight mesh estimation using variational smoothing on delaunay graphs. In Computer Vision (ICCV), 2017 IEEE International Conference on, pages 4696–4704. IEEE, 2017.

[103] Leonid Keselman, John Iselin Woodfill, Anders Grunnet-Jepsen, and Achintya Bhowmik. Intel realsense stereoscopic depth cameras. arXiv preprint arXiv:1705.05548, 2017.

[104] Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, and Trevor Darrell. Caffe: Convolutional architecture for fast feature embedding. In Proceedings of the 22nd ACM international conference on Multimedia, pages 675–678. ACM, 2014.

[105] Timo Sämann. Enet for caffe. https://github.com/TimoSaemann/ENet. Accessed: 2018-8-23.

[106] chuanqi305. Mobilenet-ssd. https://github.com/chuanqi305/ MobileNet-SSD. Accessed: 2018-8-23.

[107] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.

[108] Radu Bogdan Rusu and Steve Cousins. 3D is here: Point Cloud Library (PCL). In IEEE International Conference on Robotics and Automation (ICRA), Shang- hai, China, May 9-13 2011.

[109] Nicholas I Fisher. Statistical analysis of circular data. Cambridge University Press, 1995.

[110] Kanti V Mardia and Peter E Jupp. Directional statistics, volume 494. John Wiley & Sons, 2009.

[111] Daniel Hernandez-Juarez, Lukas Schneider, Antonio Espinosa, David Vázquez, Antonio M López, Uwe Franke, Marc Pollefeys, and Juan C Moure. Slanted stixels: Representing san francisco’s steepest streets. arXiv preprint arXiv:1707.05397, 2017.

[112] David Eigen and Rob Fergus. Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture. In Proceedings of the IEEE International Conference on Computer Vision, pages 2650–2658, 2015.

[113] Daniel Mellinger, Nathan Michael, and Vijay Kumar. Trajectory generation and control for precise aggressive maneuvers with quadrotors. The International Journal of Robotics Research, 31(5):664–674, 2012.

[114] Russ Tedrake and the Drake Development Team. Drake: A planning, control, and analysis toolbox for nonlinear dynamical systems, 2016.

[115] Morgan Quigley, Ken Conley, Brian Gerkey, Josh Faust, Tully Foote, Jeremy Leibs, Rob Wheeler, and Andrew Y Ng. Ros: an open-source robot operating system. In ICRA workshop on open source software, volume 3, page 5. Kobe, Japan, 2009.

[116] Ted J Steiner, Robert D Truax, and Kristoffer Frey. A vision-aided inertial navigation system for agile high-speed flight in unmapped environments: Dis- tribution statement a: Approved for public release, distribution unlimited. In Aerospace Conference, 2017 IEEE, pages 1–10. IEEE, 2017.
