DEGREE PROJECT IN COMPUTER SCIENCE AND ENGINEERING, SECOND CYCLE, 30 CREDITS STOCKHOLM, SWEDEN 2017

Visual-Inertial Odometry for Autonomous Ground Vehicles

AKSHAY KUMAR BURUSA

KTH ROYAL INSTITUTE OF TECHNOLOGY SCHOOL OF COMPUTER SCIENCE AND COMMUNICATION

Visual-Inertial Odometry for Autonomous Ground Vehicles

Akshay Kumar Burusa [email protected]

Master's Thesis at the School of Computer Science and Communication
KTH Supervisor: John Folkesson
Scania Supervisor: Zhan Wang
Examiner: Patric Jensfelt

September 2017

Abstract

Monocular cameras are prominently used for estimating the motion of Unmanned Aerial Vehicles. With growing interest in autonomous vehicle technology, the use of monocular cameras in ground vehicles is on the rise. This is especially favorable for localization in situations where the Global Navigation Satellite System (GNSS) is unreliable, such as open-pit mining environments. However, most monocular camera based approaches suffer from obscure scale information. Ground vehicles impose a greater difficulty due to high speeds and fast movements. This thesis aims to estimate the scale of monocular vision data by using an inertial sensor in addition to the camera. It is shown that the simultaneous estimation of pose and scale in autonomous ground vehicles is possible by the fusion of visual and inertial sensors in an Extended Kalman Filter (EKF) framework. However, the convergence of scale is sensitive to several factors including the initialization error. An accurate estimation of scale allows the accurate estimation of pose. This facilitates the localization of ground vehicles in the absence of GNSS, providing a reliable fall-back option.

Sammanfattning

Monocular cameras are often used for motion estimation of unmanned aerial vehicles. With the increased interest in autonomous vehicles, the use of monocular cameras in ground vehicles has also grown. This is particularly advantageous in situations where satellite navigation (Global Navigation Satellite System, GNSS) is unreliable, for example in open-pit mines. Most systems that use monocular cameras have problems estimating scale. This estimation becomes even harder because of a ground vehicle's higher speeds and faster movements. The aim of this degree project is to estimate the scale of image data from a monocular camera by complementing it with data from inertial sensors. It is shown that simultaneous estimation of position and scale for a vehicle is possible through the fusion of image and inertial sensor data using an Extended Kalman Filter (EKF). The convergence of the estimate depends on several factors, including the initialization error. An accurate estimate of the scale also enables an accurate estimate of the position. This enables localization of vehicles in the absence of GNSS and thereby offers increased redundancy.

Acknowledgements

This thesis was completed at the department of Autonomous Transport Solution, REPF, at Scania CV AB.

I would like to thank Zhan Wang for providing me the opportunity to work on this thesis. I thank him for his continuous guidance and for giving me great insight regarding how to approach problems. I thank Lars Hjorth for supporting and encouraging me to complete the thesis. I thank Bengt Boberg for providing me the necessary materials that helped me achieve a greater understanding of concepts. I also thank Patricio Valenzuela for all the interesting and significant discussions.

I would like to thank my supervisor at KTH, John Folkesson, for encouraging me and for all his suggestions which kept me focused on the important problems.

Finally, I thank my dear family and amazing friends who were very supportive of my decisions and helped me throughout my studies. This thesis would not be possible without them.

Akshay Kumar Burusa
Stockholm, September 2017

Contents

Abstract

Sammanfattning

Acknowledgements

Glossary

1 Introduction
  1.1 Problem Statement
  1.2 Objectives
  1.3 Organization

2 Background
  2.1 Quaternions
  2.2 Sensors
    2.2.1 Pose from Visual Sensor
    2.2.2 Pose from IMU
  2.3 Fusion of IMU and Vision
    2.3.1 Bayesian Inference
    2.3.2 Kalman Filter
    2.3.3 Extended Kalman Filter
    2.3.4 Error-State Kalman Filter

3 Related Work
  3.1 Visual Odometry
    3.1.1 Feature-based Approach
    3.1.2 Direct Approach
    3.1.3 Hybrid Approach
  3.2 Visual-Inertial Odometry
    3.2.1 Loosely-coupled
    3.2.2 Tightly-coupled

4 Implementation
  4.1 Coordinate Frames
  4.2 Vector Notation
  4.3 Visual-Inertial System
  4.4 Extended Kalman Filter design
    4.4.1 Continuous-time Nonlinear State Model
    4.4.2 Discrete-time Nonlinear State Model
    4.4.3 Prediction Step
    4.4.4 Measurement Model
    4.4.5 Update Step

5 Results
  5.1 Malaga Dataset
  5.2 Experimental Setup
    5.2.1 Malaga Track 7 Results
    5.2.2 Malaga Track 8 Results
  5.3 Summary

6 Conclusion

Bibliography

Appendices

A Sustainability, Ethics and Social Impact

B Detailed Results
  B.1 Malaga Track 7
  B.2 Malaga Track 8

Glossary

Pose The combination of position and orientation of an object with respect to a coordinate frame.

Odometry Use of data from sensors to estimate egomotion of an object over time. This estimation with the use of cameras is termed as visual odometry.

Scale Scale refers to the ratio between the size of estimated vehicle trajectory and the size of true trajectory.

Epipolar Geometry Projective geometry between two different views of a Camera.

Photometric Error The intensity error between a pixel in one image and its corresponding projection in another image. Pixels are matched based on intensity.

Reprojection Error The geometric error between the 2D location of a feature in an image and the corresponding projection of the 3D feature in another image.

Chapter 1

Introduction

Experiments for developing autonomous vehicles started as early as the 1920s [15]. Almost a century later, autonomous vehicle technology has come a long way and is almost ready for large-scale commercial production.

In 1995, the NavLab 5 team at Carnegie Mellon University successfully developed an autonomous vehicle [21] which drove from Pittsburgh to San Diego without much human intervention. However, further development was required to make the vehicles safe and robust in all conditions. These vehicles are required to navigate through a complex environment of urban streets. A major contribution to autonomous vehicles came from the robotic technology that was developed for the DARPA challenge [28], an autonomous vehicle challenge funded by the US Department of Defense which started in 2004. Soon all major automotive companies started adopting this technology, and today driverless vehicles have gained huge popularity, although concerns about safety still remain.

Autonomous vehicle technology is not yet a solved problem, in the sense that these vehicles are not completely ready for the roads. They fail to handle environments such as crowded streets or bad weather conditions. Regarding the safety of human life, there are several ethical issues, such as the famous “Trolley problem” [27] and how an autonomous vehicle would react under such circumstances.

1.1 Problem Statement

Autonomous vehicles rely heavily on their sensors for information about the environment. One obvious drawback here is that the failure of one sensor can result in serious consequences. This is addressed by having multiple sensors

that can perceive similar information. For example, obstacle detection can be achieved using cameras and ultrasonic sensors. The combined use of these sensors is highly desirable for higher accuracy and also to handle cases of sensor failure.

This thesis focuses on the problem of localization and odometry of autonomous ground vehicles, in particular, autonomous trucks. Global Navigation Satellite Systems (GNSS) are most commonly used for this purpose. However, GNSS is also prone to failure. It is very common for GNSS receivers to lose satellite signals in remote locations such as open-pit mining environments. A high-accuracy GNSS sensor is very expensive, and hence using multiple GNSS sensors is not practical. This implies that we require another set of sensors that can perform the same role. Lately, cameras are being used for localization and odometry in Unmanned Aerial Vehicles (UAVs).

The use of a monocular camera in ground vehicles introduces some challenges, such as fast movements and obscure scale information. This thesis investigates the possibility to simultaneously estimate vehicle position and recover scale information by combining data from a monocular camera and an Inertial Measurement Unit (IMU).

1.2 Objectives

• Investigate a robust real-time visual odometry algorithm for autonomous trucks in an open-pit mining scenario.

• Investigate the possibility of accurate scale recovery with the fusion of monocular camera and IMU.

• Analyze the algorithm on real data that closely represents an open-pit mining scenario.

1.3 Organization

Chapter 2: Background An introduction to odometry using cameras and IMU is given, followed by an overview of probabilistic state estimation and sensor fusion techniques.

Chapter 3: Related Work The several methods available for sensor fusion, particularly in visual-inertial odometry, are the focus of this chapter. Some of the state-of-the-art works are introduced.


Chapter 4: Implementation An algorithm for visual-inertial odometry for autonomous vehicles is implemented using an Extended Kalman Filter (EKF) framework.

Chapters 5, 6 : Results and Conclusion The algorithm was tested on an online dataset and the estimation of scale for monocular visual data was analyzed.


Chapter 2

Background

This chapter begins with the introduction of quaternions, which are widely used to represent 3D orientation of autonomous vehicles. This is followed by techniques of pose estimation using visual and inertial data. Estimation theory is then introduced that leads to the implemented filter. Details of implementation will follow in Chapter 4.

2.1 Quaternions

Quaternions are a set of hypercomplex numbers that are most prominently used in 3D mechanics. They are suitable for attitude representation in 3-dimensional space and are used to represent orientation in pose estimation. There are several conventions for quaternions with their own set of rules and formulas. The Hamilton convention is followed in this thesis, as explained by Kuipers [14].

A quaternion is a 4-dimensional vector in the space of quaternions H of the form,

q = qw + qxi + qyj + qzk ∈ H, (2.1) where the imaginary components i, j, k follow the rule

i2 = j2 = k2 = ijk = −1, (2.2)

ij = −ji = k, jk = −kj = i, ki = −ik = j. (2.3)

The quaternion can be represented as,


q = \begin{bmatrix} q_w \\ \mathbf{q}_v \end{bmatrix} = \begin{bmatrix} q_w \\ q_x \\ q_y \\ q_z \end{bmatrix}, \qquad (2.4)

where q_w is the real or scalar part and q_v = q_x i + q_y j + q_z k is the imaginary or vector part of the quaternion.

Properties

• Sum The quaternion sum is given by,

"p # "q # "p ± q # p ± q = w ± w = w w . (2.5) pv qv pv ± qv The sum is commutative and associative.

p ± q = q ± p, (2.6)

p ± (q ± r) = (p ± q) ± r. (2.7)

• Product The quaternion product is given by,

" p q − pT q # p ⊗ q = w w v v . (2.8) pwqv + qwpv + pv × qv The product is not commutative but it is associative.

p ⊗ q ≠ q ⊗ p, (2.9)

p ⊗ (q ⊗ r) = (p ⊗ q) ⊗ r. (2.10)

• Rotation matrix
The quaternion q_A^B between two coordinate frames represents the rotation from frame A to frame B. The rotation matrix C_{q_A^B} defined by it rotates a vector from frame B to frame A, that is, x_A = C_{q_A^B} x_B.

 2 2 2 2  qw + qx − qy − qz 2(qxqy − qwqz) 2(qxqz + qwqy)  2 2 2 2  Cq =  2(qxqy + qwqz) qw − qx + qy − qz 2(qyqz − qwqx)  (2.11) 2 2 2 2 2(qxqz − qwqy) 2(qyqz + qwqx) qw − qx − qy + qz


• Quaternion product matrices Matrix representation for multiplication of quaternion q and angular velocity ω.

\Omega(\omega) = \begin{bmatrix} 0 & -\omega_x & -\omega_y & -\omega_z \\ \omega_x & 0 & \omega_z & -\omega_y \\ \omega_y & -\omega_z & 0 & \omega_x \\ \omega_z & \omega_y & -\omega_x & 0 \end{bmatrix} \qquad (2.12)

\bar{\Omega}(q) = \begin{bmatrix} -q_x & -q_y & -q_z \\ q_w & -q_z & q_y \\ q_z & q_w & -q_x \\ -q_y & q_x & q_w \end{bmatrix}. \qquad (2.13)
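To make the conventions above concrete, the following minimal NumPy sketch (not part of the thesis implementation; function names are illustrative) evaluates the rotation matrix of Equation (2.11), the product matrix of Equation (2.12) and the Hamilton product of Equation (2.8) for quaternions stored as [w, x, y, z].

```python
import numpy as np

def rotation_matrix(q):
    """Rotation matrix C_q of Eq. (2.11) for a unit quaternion q = [qw, qx, qy, qz]."""
    qw, qx, qy, qz = q
    return np.array([
        [qw*qw + qx*qx - qy*qy - qz*qz, 2*(qx*qy - qw*qz),             2*(qx*qz + qw*qy)],
        [2*(qx*qy + qw*qz),             qw*qw - qx*qx + qy*qy - qz*qz, 2*(qy*qz - qw*qx)],
        [2*(qx*qz - qw*qy),             2*(qy*qz + qw*qx),             qw*qw - qx*qx - qy*qy + qz*qz]])

def omega(w):
    """Quaternion product matrix Omega(omega) of Eq. (2.12) for w = [wx, wy, wz]."""
    wx, wy, wz = w
    return np.array([
        [0.0, -wx, -wy, -wz],
        [wx,  0.0,  wz, -wy],
        [wy, -wz,  0.0,  wx],
        [wz,  wy, -wx,  0.0]])

def quat_multiply(p, q):
    """Hamilton quaternion product p (x) q of Eq. (2.8), both as [w, x, y, z]."""
    pw, pv = p[0], np.asarray(p[1:])
    qw, qv = q[0], np.asarray(q[1:])
    return np.concatenate(([pw*qw - pv @ qv], pw*qv + qw*pv + np.cross(pv, qv)))
```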

2.2 Sensors

Estimation of motion can be achieved by several sensors. The sensors that are of interest in this thesis are camera and Inertial Measurement Unit (IMU). The estimation of pose (position + orientation) using these sensors is discussed in this section.

2.2.1 Pose from Visual Sensor

A camera is an optical device that captures incident light rays with an optical system, and a light-sensitive electronic sensor stores this information in a digital format. This digital image is then processed by algorithms for applications such as motion estimation. Estimating the motion of a vehicle using visual cues from cameras attached to it is called Visual Odometry (VO).

Figure 2.1: General Pipeline for Incremental Structure from Motion.

Feature-based methods for visual odometry are a well-researched topic [7] [18] [20] [22] [25]. The general pipeline for these methods is shown in Figure 2.1

and explained below. It involves determining the epipolar geometry between two image frames, estimating the relative pose between the frames, followed by the optimization of the estimated pose using bundle adjustment. These steps require functions such as feature extraction and tracking, finding feature correspondences and minimization of errors. Initially, these processes required heavy computation which forced the optimization to be done offline. Klein et al. [11] proposed parallel threads for tracking and mapping which made online optimization in a bundle adjustment framework possible. Most algorithms today continue to use the parallel approach.

Visual odometry is the estimation of camera motion from visual data and Structure from Motion (SfM) involves the simultaneous estimation of motion and reconstruction of a 3D environment from visual cues. These are briefly introduced here. More details can be found in the description by Cipolla [22].

STEP 1: Image Features

Image features are uniquely identifiable areas in the image that are associated with some 3D object in the world. They should be recognizable even as the scene changes. A feature detector extracts interesting features in the image, such as corners or textured areas. Feature descriptors are vectors that are useful to differentiate between two features, that is, they can be considered as numerical fingerprints.

Together, feature detectors and descriptors help in storing the most relevant and unique information about a scene. They are also important to solve the correspondence problem, that is, identifying the same set of features between multiple views of the same scene.

STEP 2: Epipolar Geometry

This is also known as the two-view geometry which defines the relation between a 3D point and its projection onto two different image frames, as described by Figure 2.2.

The epipolar constraint can be obtained from the essential matrix E which relates the corresponding projections in two image frames. Consider a point X with respect to the center of projection C; its location X′ with respect to the center of projection C′ is

X = RX' + t, \qquad (2.14)

Figure 2.2: When the projection of a 3D point X in one image is known to be x, then the projection x′ on the second image will be restricted to the epipolar line m′. The 3D point X and the centers of projection C and C′ belong to the same plane.

where R and t are the rotation and translation matrices between the coordinates of C and C′. Pre-multiplying Equation (2.14) by X^T [t]_×¹ gives,

X^T [t]_\times R X' = 0. \qquad (2.15)

So, the essential matrix is defined as E = [t]×R, which implies

X^T E X' = 0. \qquad (2.16)

This is also true for the projected image points, which gives

x^T E x' = 0. \qquad (2.17)

¹ [a]_\times is the skew-symmetric matrix. For a = (a_x\ a_y\ a_z)^T, \; [a]_\times = \begin{bmatrix} 0 & -a_z & a_y \\ a_z & 0 & -a_x \\ -a_y & a_x & 0 \end{bmatrix}


The essential matrix applies only to calibrated cameras. If the cameras are not calibrated, the intrinsic parameters of the cameras are required to determine the epipolar geometry, which is then given by the fundamental matrix F.

F = K'^{-T} E K^{-1}, \qquad (2.18)

where K and K′ are the camera calibration matrices.

Once the epipolar geometry is determined, it is possible to estimate the relative pose (rotation and translation) between two camera frames when a set of feature correspondences are known.

STEP 3: Motion Reconstruction

The relative pose between two frames can be obtained from the singular value decomposition of the essential matrix associated with the two frames, as proposed by Hartley [9],

E = U\Sigma V^T, \qquad (2.19)

where U and V are orthogonal matrices and Σ is a rectangular diagonal matrix. The rotation R and translation t are given by,

0 −1 0   T R = U 1 0 0 V , (2.20) 0 0 1  0 1 0   T [t]× = U −1 0 0 U . (2.21) 0 0 0

The camera projection matrix, which describes the mapping of a camera from 3D world points to 2D image points, can be obtained from the rotation and translation matrices. Considering the projection matrix P associated with first frame to be

P = K[I \mid 0], \qquad (2.22)

the projection matrix P′ associated with the second frame is

P' = K'[R \mid t]. \qquad (2.23)


There are four possible values for P′ due to the choice of signs for R and t. One way to determine the best solution is to triangulate the four solutions and calculate the reprojection error.
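As a hedged illustration of Equations (2.19)–(2.21), the NumPy sketch below decomposes an essential matrix into the four candidate (R, t) pairs; selecting the physically valid pair (e.g. by triangulation and a cheirality check) is left to the caller, and the helper name is hypothetical.

```python
import numpy as np

def decompose_essential(E):
    """Return the four candidate (R, t) pairs from E = U Sigma V^T (Eqs. 2.19-2.21)."""
    U, _, Vt = np.linalg.svd(E)
    # Enforce proper rotations (determinant +1).
    if np.linalg.det(U) < 0:
        U = -U
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    W = np.array([[0.0, -1.0, 0.0],
                  [1.0,  0.0, 0.0],
                  [0.0,  0.0, 1.0]])
    R1 = U @ W @ Vt
    R2 = U @ W.T @ Vt
    t = U[:, 2]          # translation direction, known only up to sign and scale
    return [(R1, t), (R1, -t), (R2, t), (R2, -t)]
```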

STEP 4: Triangulation and Reprojection error

Triangulation refers to calculating the location of a 3D point in space using the projections of the point onto two or more images. In an ideal case, the 3D point will lie on the intersection of the back-projected rays. However, almost always these rays do not intersect at the true 3D point due to the presence of noise and measurement errors, as shown in Figure 2.3.


Figure 2.3: Best estimate of the triangulated 3D point X.

So we estimate the location of the 3D point that minimizes the reprojection error, that is, the squared error between all the estimated and measured image points. This is formulated as a minimization problem,

X = \arg\min_{X} \sum_i \|x_i - \hat{x}_i(P_i, X)\|^2. \qquad (2.24)


At the end of this step, we will have a good estimate of the relative pose between two image frames. When this process is applied to two consecutive frames of a camera, we can estimate the motion of the camera.
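A minimal sketch (assumed, not taken from the thesis) of linear (DLT) triangulation and the reprojection error of Equation (2.24); P1 and P2 are 3x4 projection matrices and x1, x2 are pixel coordinates of the same point in the two views.

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one point from two views; returns a 3-vector."""
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1]])
    _, _, Vt = np.linalg.svd(A)
    Xh = Vt[-1]                       # homogeneous solution (null space of A)
    return Xh[:3] / Xh[3]

def reprojection_error(P, X, x):
    """Squared reprojection error ||x - x_hat(P, X)||^2 as in Eq. (2.24)."""
    xh = P @ np.append(X, 1.0)
    return np.sum((x - xh[:2] / xh[2]) ** 2)
```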

STEP 5: Bundle Adjustment

Bundle adjustment optimizes the 3D geometry, the relative pose and also the intrinsic camera parameters at the same time. The optimization formulation is based on minimizing the weighted sum of squared reprojection errors.

Imagine there are ‘p’ 3D points captured from ‘q’ different views, and let xij be the ith projection on the jth frame. Then the minimization follows as,

\min_{X,P} \sum_{i=1}^{p} \sum_{j=1}^{q} b_{ij}\, \|x_{ij} - \hat{x}_{ij}(P_j, X_i)\|^2, \qquad (2.25)

where b_{ij} is a binary indicator that is '1' if point i is visible from frame j.

A local bundle adjustment applies the optimization to a set of co-visible frames, while the rest of the frames remain fixed. A global bundle adjustment optimizes all the frames and 3D points assuming only the initial frame to be fixed.

2.2.2 Pose from IMU

An Inertial Measurement Unit (IMU) measures the linear acceleration, angular velocity and magnetic field acting on a body. A combination of accelerometer, gyroscope and magnetometer is used for this purpose.

Accelerometer

An accelerometer is a sensor that measures linear accelerations. It has 3 degrees of freedom.

The measured accelerations are influenced by noise and bias. Suppose that a_t is the true acceleration, void of any external disturbance, and a_m is the measured acceleration from the sensor; then a_t and a_m are related by,

a_m = C_q^{-1}(a_t - g) + b_a - n_a, \qquad (2.26)

12 2.2. SENSORS

where b_a is the acceleration bias, n_a is the measurement noise modelled as Gaussian white noise, g is the vector corresponding to the acceleration due to gravity and C_q is the rotation matrix that records how the body is oriented with respect to a stationary coordinate system.

The acceleration bias is not constant but varies with time. It is modelled as a random walk with bias noise n_{b_a} modelled as Gaussian white noise with zero mean,

\dot{b}_a = n_{b_a}. \qquad (2.27)

Gyroscope

A gyroscope is a sensor that measures angular velocities. It has 3 degrees of freedom.

Similar to the accelerometer, a gyroscope measurement is also influenced by noise and bias. Suppose that ω_t is the true angular velocity, void of any external disturbance, and ω_m is the measured angular velocity from the sensor; then ω_t and ω_m are related by,

\omega_m = \omega_t + b_\omega - n_\omega, \qquad (2.28)

where b_ω is the angular velocity bias and n_ω is the measurement noise modelled as Gaussian white noise with zero mean.

The angular velocity bias is not constant but varies with time. It is modelled as a random walk with bias noise n_{b_ω} modelled as Gaussian white noise,

\dot{b}_\omega = n_{b_\omega}. \qquad (2.29)
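The sensor models of Equations (2.26)–(2.29) are easy to simulate; the sketch below generates biased, noisy gyroscope readings with a random-walk bias. It is only an illustration: the noise strengths, sample rate and the function name are assumptions, not values from the thesis.

```python
import numpy as np

def simulate_gyro(omega_true, dt, sigma_n=0.01, sigma_bn=1e-4, seed=0):
    """Generate biased, noisy gyro readings from true angular rates (Eqs. 2.28-2.29)."""
    rng = np.random.default_rng(seed)
    b = np.zeros(3)                                   # bias starts at zero
    measurements = []
    for w_t in omega_true:
        b = b + sigma_bn * np.sqrt(dt) * rng.standard_normal(3)  # bias random walk, Eq. (2.29)
        n = sigma_n * rng.standard_normal(3)                     # white measurement noise
        measurements.append(w_t + b - n)                         # Eq. (2.28)
    return np.array(measurements)

# Example: 100 samples of a constant 0.1 rad/s yaw rate at 100 Hz.
gyro = simulate_gyro(np.tile([0.0, 0.0, 0.1], (100, 1)), dt=0.01)
```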

Coordinate Frames

There are two different coordinate frames that need to be introduced when working with IMUs.

• Body frame (B) or IMU frame (I): The body coordinate frame is rigidly attached to the IMU. When an IMU is attached to a vehicle, this coor- dinate frame moves according to the motion of the vehicle.


• Navigation frame (N) or World frame (W): The navigation coordinate frame is firmly attached to the earth's surface. Usually, this is the location where the vehicle starts from. The motion of the vehicle is registered with respect to this frame.

Navigation Equations

The motion of the IMU is modelled with a set of navigation equations. Consider the position of the IMU in the World frame p_w^i, the velocity of the IMU in the World frame v_w^i, the attitude quaternion describing the rotation from the World frame to the IMU frame q_w^i, and the accelerometer and gyro biases b_a and b_ω. The IMU kinematics and bias model are as follows,

\dot{p}_w^i = v_w^i, \qquad (2.30)
\dot{v}_w^i = C_{q_w^i}(a_m - b_a + n_a) - g, \qquad (2.31)
\dot{q}_w^i = \tfrac{1}{2}\,\Omega(\omega_m - b_\omega + n_\omega)\, q_w^i, \qquad (2.32)
\dot{b}_a = n_{b_a}, \qquad (2.33)
\dot{b}_\omega = n_{b_\omega}. \qquad (2.34)

where C_{q_w^i} represents the rotation matrix that transforms a vector from the Inertial frame to the World frame, Ω(ω) is the quaternion product matrix as defined in Section 2.1 and g is the vector corresponding to the acceleration due to gravity. The pose of the vehicle is represented by [p_w^i, q_w^i]. The integration of these differential equations helps us estimate the pose from IMU measurements.

2.3 Fusion of IMU and Vision

The goal of sensor fusion is to estimate the state x of a system given observa- tions z from different sensors and possibly a control sequence u acting on the system.

Generally, several sensors are used to gather information of interest regarding a system. These sensor observations are influenced by noise and it is not straight-forward to determine a model that accurately maps the observations to the system state. Moreover, the noise acting on the system and uncertainty of the model are of stochastic nature.


This situation is dealt with by formulating an approximate system model. The parameters of this model are estimated using observed data such that the noise and uncertainty in the system are minimized.

Consider a general discrete-time state space system,

xk = f(x1:k−1, u1:k−1, w1:k−1), (2.35)

zk = h(x1:k, v1:k). (2.36) Here, x is the state vector describing the system, u is the control input, w is the noise related to the uncertainty in the system, z is the measurement vector, v is the measurement noise and k is the time step.

2.3.1 Bayesian Inference

Bayesian inference is an important tool for statistical estimation and integration of disparate information in an optimal way. This process uses the prior belief about the system state, combines it with the current observations and determines an optimal belief regarding the current system state.

Mathematically, this can be determined with the conditional probability den- sity p(xk|z1:k) where xk is the state at time k and z1:k are all the observations made until time k. The Markov property will be assumed here.

The probability density of the current state given the previous observations depends on the probability associated with the transition from state x_{k−1} to state x_k and the probability density of the previous state. This is the prior belief,

p(x_k \mid z_{1:k-1}) = \int p(x_k \mid x_{k-1})\, p(x_{k-1} \mid z_{1:k-1})\, dx_{k-1}. \qquad (2.37)

The probability density of the current state after the current observation has been made depends on the measurement likelihood and the prior belief,

p(x_k \mid z_{1:k}) = \frac{p(z_k \mid x_k)\, p(x_k \mid z_{1:k-1})}{p(z_k \mid z_{1:k-1})}. \qquad (2.38)

The term p(z_k \mid z_{1:k-1}) is determined by,

p(z_k \mid z_{1:k-1}) = \int p(z_k \mid x_k)\, p(x_k \mid z_{1:k-1})\, dx_k. \qquad (2.39)

The Equations (2.37) to (2.39) give us a recursive Bayesian estimation to determine the conditional probability density p(xk|z1:k).
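On a discretized state space, Equations (2.37)–(2.39) translate directly into a histogram (grid) Bayes filter. The toy 1D sketch below is only an illustration of the recursion; the random-walk transition model and Gaussian likelihood are assumed for the example and are not part of the thesis.

```python
import numpy as np

def bayes_update(prior, transition, likelihood):
    """One cycle of Eqs. (2.37)-(2.39) on a discretized state space.
    prior:      p(x_{k-1} | z_{1:k-1}),  shape (N,)
    transition: p(x_k | x_{k-1}),        shape (N, N), columns sum to 1
    likelihood: p(z_k | x_k),            shape (N,)
    """
    predicted = transition @ prior                # Eq. (2.37): prior belief
    posterior = likelihood * predicted            # numerator of Eq. (2.38)
    return posterior / posterior.sum()            # Eq. (2.39) provides the normalization

# Toy example on a 1D grid.
grid = np.linspace(0.0, 10.0, 101)
prior = np.full(grid.size, 1.0 / grid.size)                          # uniform initial belief
trans = np.exp(-0.5 * ((grid[:, None] - grid[None, :]) / 0.5) ** 2)  # random-walk motion model
trans /= trans.sum(axis=0, keepdims=True)
like = np.exp(-0.5 * ((grid - 4.0) / 1.0) ** 2)                      # measurement near x = 4
posterior = bayes_update(prior, trans, like)
```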


2.3.2 Kalman Filter

The Kalman filter is a special case of a recursive Bayesian filter for multivariate normal distributions, where the system is discrete-time linear and the noise parameters are assumed to be additive white Gaussian. The filter is named after Rudolf E. Kálmán who contributed towards its development.

The filter consists of two parts, prediction and correction. The prediction step uses the system equations to determine a prior estimate of the state at the current time step. The correction step updates the prior estimate based on observations made. The Kalman filter follows Bayesian inference and estimates the states based on the uncertainties in the system and measurement models.

Consider the system described by Equations (2.35) and (2.36). The linear discrete-time system can be described as,

xk = Ak−1xk−1 + Bk−1uk−1 + wk−1, (2.40)

zk = Hkxk + vk, (2.41) where Ak is the state transition model, Bk is the control input model and Hk is the sensor observation model.

The system and measurement noises w_k and v_k are assumed to be additive white Gaussian. The uncertainty introduced by these noises is captured by the covariance matrices Q_k and R_k respectively, where

Q_k = E[w_k w_k^T], \qquad (2.42)
R_k = E[v_k v_k^T], \qquad (2.43)

where E[X] indicates the expected value of X.

The error covariance matrix Pk captures the uncertainty in the estimated state xˆk, which is defined as

P_k = E[(x_k - \hat{x}_k)(x_k - \hat{x}_k)^T]. \qquad (2.44)

The recursive algorithm is as follows,


• Prior estimation of states: An estimate of states at time step k is determined using the system model based on the states and control input at time k − 1,

xˆk|k−1 = Ak−1xˆk−1|k−1 + Bk−1uk−1. (2.45) • Prior estimation of error covariance: The uncertainty of states varies as the system evolves due to model un- certainty and external noise. The error covariance distribution is scaled and transformed according to the state transition matrix and system noise covariance matrix, T Pk|k−1 = Ak−1Pk−1|k−1Ak−1 + Qk−1. (2.46) • Predicted measurement The observed quantity can be estimated from the current system state. This is given by, zˆk = Hkxˆk|k−1. (2.47) T The uncertainty related to this estimate is HkPkHk . • Refining the estimate with measurements: There are two distributions, the predicted measurement with mean and T variance (Hkxˆk,HkPkHk ) and the observed measurement with mean and variance (zk,Rk). The best state estimate will include these two measurements. The combination of these gives us the distribution, 0 Mean Hkxˆk|k = Hkxˆk|k−1 + Kk(zk − Hkxˆk|k−1) (2.48) T T 0 T Covariance HkPk|kHk = HkPk|k−1Hk − KkHkPk|k−1Hk , (2.49)

where K'_k depends on the predicted measurement covariance and the observed measurement covariance, given by K'_k = H_k P_{k|k-1} H_k^T (H_k P_{k|k-1} H_k^T + R_k)^{-1}.

• Update the state estimate: From Equations (2.48) and (2.49), we can determine the new state estimate and the associated error covariance as

\hat{x}_{k|k} = \hat{x}_{k|k-1} + K_k(z_k - H_k\hat{x}_{k|k-1}) \qquad (2.50)

P_{k|k} = P_{k|k-1} - K_k H_k P_{k|k-1} \qquad (2.51)
K_k = P_{k|k-1} H_k^T (H_k P_{k|k-1} H_k^T + R_k)^{-1}, \qquad (2.52)

where Kk is the Kalman gain. The algorithm is then repeated with the updated values.


Summary

The Kalman filter algorithm can be summarized as,

Prediction Step

\hat{x}_{k|k-1} = A_{k-1}\hat{x}_{k-1|k-1} + B_{k-1}u_{k-1} \qquad (2.53)
P_{k|k-1} = A_{k-1}P_{k-1|k-1}A_{k-1}^T + Q_{k-1}. \qquad (2.54)

Update Step

\tilde{z}_k = z_k - H_k\hat{x}_{k|k-1} \qquad (2.55)
K_k = P_{k|k-1}H_k^T(H_k P_{k|k-1}H_k^T + R_k)^{-1} \qquad (2.56)

\hat{x}_{k|k} = \hat{x}_{k|k-1} + K_k\tilde{z}_k \qquad (2.57)
P_{k|k} = (I - K_k H_k)P_{k|k-1}(I - K_k H_k)^T + K_k R_k K_k^T. \qquad (2.58)
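The summary above maps almost line-for-line onto code. The following is a compact NumPy sketch of Equations (2.53)–(2.58), not the thesis implementation; matrix names follow the text and the Joseph form of Equation (2.58) is used for the covariance update.

```python
import numpy as np

def kf_predict(x, P, A, B, u, Q):
    """Prediction step, Eqs. (2.53)-(2.54)."""
    x = A @ x + B @ u
    P = A @ P @ A.T + Q
    return x, P

def kf_update(x, P, z, H, R):
    """Update step, Eqs. (2.55)-(2.58)."""
    y = z - H @ x                                  # innovation
    S = H @ P @ H.T + R                            # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)                 # Kalman gain
    x = x + K @ y
    I_KH = np.eye(P.shape[0]) - K @ H
    P = I_KH @ P @ I_KH.T + K @ R @ K.T            # Joseph-form covariance update
    return x, P
```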

2.3.3 Extended Kalman Filter

An optimal solution is reached only when all noise is additive white Gaussian with zero mean and the system is perfectly linear. Real-world systems are almost always non-linear and are better represented in the form

xk = f(xk−1, uk−1) + wk−1, (2.59)

zk = h(xk) + vk. (2.60)

The general Kalman filter can be applied in this case if the system can be approximately linearized at every time step. The extended Kalman filter linearizes the system about an estimate of the current mean. Using the first order Taylor expansion for f(.) and h(.) about the mean gives,

f(x_{k-1}, u_{k-1}) \approx f(\hat{x}_{k-1}, u_{k-1}) + \underbrace{\left.\frac{\partial f(x_{k-1}, u_{k-1})}{\partial x_{k-1}}\right|_{x_{k-1}=\hat{x}_{k-1}}}_{F_{k-1}} (x_{k-1} - \hat{x}_{k-1}) \qquad (2.61)

h(x_k) \approx h(\hat{x}_k) + \underbrace{\left.\frac{\partial h(x_k)}{\partial x_k}\right|_{x_k=\hat{x}_k}}_{H_k} (x_k - \hat{x}_k). \qquad (2.62)


The prior estimate of the state is determined by the expected value of f(xk−1, uk−1) conditioned by zk−1,

E[f(x_{k-1}, u_{k-1}) \mid z_{k-1}] \approx f(\hat{x}_{k-1}, u_{k-1}) + F_{k-1}E[e_{k-1} \mid z_{k-1}], \qquad (2.63)

where e_{k-1} = x_{k-1} - \hat{x}_{k-1}. Since E[e_{k-1} \mid z_{k-1}] = 0, the prior state estimate is

\hat{x}_{k|k-1} = f(\hat{x}_{k-1|k-1}, u_{k-1}). \qquad (2.64)

The state error gives us

e_k = x_k - \hat{x}_k = f(x_{k-1}, u_{k-1}) + w_{k-1} - f(\hat{x}_{k-1}, u_{k-1}) \approx F_{k-1}e_{k-1} + w_{k-1}. \qquad (2.65)

Using Equation (2.65), we can determine the prior state covariance as,

P_{k|k-1} = E[e_k e_k^T] = F_{k-1}E[e_{k-1}e_{k-1}^T]F_{k-1}^T + E[w_{k-1}w_{k-1}^T] = F_{k-1}P_{k-1|k-1}F_{k-1}^T + Q_{k-1}. \qquad (2.66)

Summary

The extended Kalman filter algorithm can be summarized as,

Prediction Step

\hat{x}_{k|k-1} = f(\hat{x}_{k-1|k-1}, u_{k-1}) \qquad (2.67)
P_{k|k-1} = F_{k-1}P_{k-1|k-1}F_{k-1}^T + Q_{k-1}. \qquad (2.68)

Update Step

\tilde{z}_k = z_k - h(\hat{x}_{k|k-1}) \qquad (2.69)
K_k = P_{k|k-1}H_k^T(H_k P_{k|k-1}H_k^T + R_k)^{-1} \qquad (2.70)

\hat{x}_{k|k} = \hat{x}_{k|k-1} + K_k\tilde{z}_k \qquad (2.71)
P_{k|k} = (I - K_k H_k)P_{k|k-1}(I - K_k H_k)^T + K_k R_k K_k^T. \qquad (2.72)
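The same structure carries over to the EKF of Equations (2.67)–(2.72): the nonlinear models f, h and their Jacobians F, H are supplied by the caller and evaluated at the current estimate. The sketch below is a generic illustration only, not the filter implemented in this thesis.

```python
import numpy as np

def ekf_step(x, P, u, z, f, h, F_jac, H_jac, Q, R):
    """One predict/update cycle of the extended Kalman filter (Eqs. 2.67-2.72)."""
    # Prediction with the nonlinear model and its Jacobian at the previous estimate.
    F = F_jac(x, u)
    x = f(x, u)
    P = F @ P @ F.T + Q
    # Update with the linearized measurement model at the predicted estimate.
    H = H_jac(x)
    y = z - h(x)                                   # innovation
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)                 # Kalman gain
    x = x + K @ y
    I_KH = np.eye(P.shape[0]) - K @ H
    P = I_KH @ P @ I_KH.T + K @ R @ K.T            # Joseph-form covariance update
    return x, P
```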


2.3.4 Error-State Kalman Filter

In most Kalman filter implementations involving pose estimation, it is common to use the error-state Kalman filter formulation, which overcomes the drawbacks of a full-state Kalman filter. Roumeliotis et al. [23] introduce the error-state Kalman filter. Most systems exhibit high-frequency non-linear dynamics that are best described only by a non-linear model, and an accurate linearization of such a system is not possible. However, the error-state Kalman filter considers only the perturbations in the state, which remain close to zero mean, can be integrated linearly and are suitable for linear Gaussian filtering. This leads to better accuracy in error-state Kalman filters.

Another motivation for choosing the error-state filter is the additional constraint on the quaternion used to represent the orientation. The quaternion is a four-element representation, whereas the orientation in space has only three degrees of freedom. The fourth element is dependent on the other three elements through a constraint. This makes the corresponding covariance matrix singular and causes numerical instability [26].

The error-state is simply the arithmetic difference of the estimated state x̂ from the true state x, that is, x̃ = x − x̂. This rule does not apply to the quaternion, which uses a quaternion difference instead of an arithmetic error to define the error-state. Hence, an error quaternion δq is defined as,

\delta q = q \otimes \hat{q}^{-1} = \begin{bmatrix} \delta q_w \\ \delta \mathbf{q}_v \end{bmatrix} \qquad (2.73)
= \begin{bmatrix} \cos(\delta\theta/2) \\ \mathbf{k}\sin(\delta\theta/2) \end{bmatrix} \qquad (2.74)
\approx \begin{bmatrix} 1 \\ \tfrac{1}{2}\mathbf{k}\,\delta\theta \end{bmatrix}, \qquad (2.75)

where q_v is the imaginary part and q_w is the real part of the quaternion q. The error quaternion is approximated by k δθ, which is a three-dimensional vector that represents the error angle.

The nominal and error states are predicted in parallel, as shown by Equation (2.76) and Equation (2.77). The nominal state ignores all noises in the system, while the error-state tracks the perturbations and noises. Note that the mean of the error-state is always reset to zero.

\hat{x}^n_{k+1} = F^n_k(x, u)\,\hat{x}^n_k \qquad (2.76)
\delta\hat{x}_{k+1} = F^x_k(x, u, w)\,\delta\hat{x}_k. \qquad (2.77)


Corrections are made to the error-state using measurement data. The error-state mean is then added to the nominal state to determine the corrected state estimates, shown by

\hat{x}_k = \hat{x}^n_k + \delta\hat{x}_k, \qquad (2.78)

where \hat{x}_k is the state estimate, x^n_k is the nominal state and δx_k is the error-state. F^n_k and F^x_k are the Jacobians of the nominal and error states respectively.

Based on the discussion above, the error-state Kalman filter is chosen in this thesis for implementing the fusion of visual and inertial data.
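To illustrate Equations (2.73)–(2.78) for the attitude part of the state, the following sketch builds the small-angle error quaternion and injects a corrected attitude error into the nominal quaternion (Hamilton convention, [w, x, y, z] order, global error as in Eq. (2.73)). The helper names are illustrative and not taken from the thesis code.

```python
import numpy as np

def quat_multiply(p, q):
    """Hamilton product, quaternions stored as [w, x, y, z]."""
    pw, pv = p[0], np.asarray(p[1:])
    qw, qv = q[0], np.asarray(q[1:])
    return np.concatenate(([pw*qw - pv @ qv], pw*qv + qw*pv + np.cross(pv, qv)))

def small_angle_quat(dtheta):
    """Error quaternion of Eq. (2.75): delta_q ~ [1, dtheta/2], then normalized."""
    dq = np.concatenate(([1.0], 0.5 * np.asarray(dtheta)))
    return dq / np.linalg.norm(dq)

def inject_attitude_error(q_nominal, dtheta):
    """Attitude part of Eq. (2.78): since delta_q = q (x) q_hat^{-1}, q = delta_q (x) q_nominal."""
    return quat_multiply(small_angle_quat(dtheta), q_nominal)
```

After the injection, the error-state mean is reset to zero, consistent with the discussion above.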


Chapter 3

Related Work

This chapter introduces several visual odometry and visual-inertial odometry techniques. The need for visual-inertial fusion is discussed and the strengths of these techniques are analyzed, which provides motivation for the implemented method.

3.1 Visual Odometry

Estimating the motion of a vehicle using visual cues from cameras attached to it is Visual Odometry (VO). This has applications in the field of robotics, and recently it is being used in autonomous vehicles for localization. We analyze three approaches to visual odometry.

3.1.1 Feature-based Approach

The feature-based method is a classical approach to the visual odometry problem. This method relies on unique interest points in the image, as illustrated by Figure 3.1. These points are tracked and matched over consecutive images. The general pipeline for this was introduced in Section 2.2.1.

ORB-SLAM is a visual odometry and mapping algorithm. Mur-Artal et al. [18] [20] use Oriented FAST and Rotated BRIEF (ORB) features proposed by Rublee et al. [24]. ORB features are rotation invariant but not scale invariant. ORB-SLAM runs tracking, local mapping and loop closing in separate threads. The tracking thread is responsible for feature matching between consecutive frames and estimating pose by minimizing the reprojection error for matched features. The mapping thread performs a local optimization of camera poses to achieve optimal reconstruction of the surroundings. The loop closure thread is constantly searching for loops, fuses duplicate data and performs a pose


graph optimization to achieve global consistency. The ORB-SLAM method is quite robust but loses tracking during pure rotations in the monocular case.

Figure 3.1: Feature extraction and tracking from an image frame. (Reprinted from ORB-SLAM [20]).

Another feature-based method is LibViso, proposed by Geiger et al. [7]. This method focuses on 3D reconstruction of the environment using Stereo cameras. Features are extracted by Blob and Corner masks and are matched using a circular matching approach between consecutive stereo images. The camera motion is estimated by minimizing the reprojection error using Gauss-Newton method.

3.1.2 Direct Approach

Feature-based methods involve a process of feature extraction and matching. The direct method suggests an approach that compares pixels directly, hence skipping the feature extraction step. Direct methods use the pixel intensity information for estimating motion between two frames, achieved using photometric error minimization such as the one used by Engel et al. [5]. This means that every pixel introduces a constraint in the optimization problem, leading to an accurate pose and also a dense map of the environment.

These methods work on the assumption that the projection of a point in both frames has the same intensity. This assumption often fails due to lighting changes, sensor noise, pose errors and dynamic objects. Hence, direct methods require a high frame rate which minimizes the intensity changes. The light intensity can also be handled by normalizing the global intensity, as done by Gonçalves and Comport [8]. Another issue is high computation due to the use of all pixels over all frames. This is addressed by only using pixels with sufficient information, that is, with high intensity gradient as proposed by Engel et al. [5]. This produces a semi-dense map that mostly contains edges


and corners, as illustrated by Figure 3.2. Concha et al. [2] proposed regularizing the image intensity error in addition to minimizing the photometric error. This enforces a smooth solution and aids in reconstructing low-gradient pixels. However, this method is based on the assumption that homogeneous color regions from a super-pixel are planar, which works decently for indoor scenarios and handheld motions only. Also, direct methods are more sensitive to camera calibration as compared to feature-based methods [4].

Figure 3.2: Extraction of pixel intensity data at corners and edges. (Reprinted from LSD-SLAM [5]).

3.1.3 Hybrid Approach

A hybrid approach can be used to combine the benefits of feature-based and direct methods. Hybrid methods require moderate computation, can handle large inter-frame motion and produce a semi-dense map. Forster et al. [6] use a 4x4 patch around features and estimate the camera pose by minimizing the photometric error of these feature patches. In order to refine the result, the reprojection error of each feature is determined with respect to the keyframe that has observed the feature at roughly the same angle. This produces a reprojection residual which is minimized in order to estimate the camera pose.

A combined feature-based and direct method for stereo visual odometry was proposed by Krombach et al. [13]. This method initializes a keyframe using stereo correspondence. Small motions are then estimated using the feature-based method. When there is a fast or large motion, the direct method is used to assign a keyframe, and this is optimized with stereo measurements and also the propagated depth from the previous keyframe. The method focuses on combining the speed of feature-based methods and the dense map from direct methods to fill the voids caused by large motions.


3.2 Visual-Inertial Odometry

Visual odometry methods are well defined; however, they suffer from some drawbacks.

• Loss of Tracking Visual odometry performs best when the frame rate of the camera is high, which means there is sufficient overlap of features between two images. This is not practical since it requires high computation speed. So when there is a sudden movement of the camera, there isn’t enough overlap of features due to low frame rate. This might cause loss of tracking.

• Scale Ambiguity When using a single camera for visual odometry, scale of the scene is unknown to the algorithm. Hence, it generates a result that is accurate but not scaled correctly, as shown by Figure 3.3. Scale here refers to the ratio between the size of the vehicle trajectory estimated from visual odometry and the size of true trajectory.

An IMU provides good odometry information for large sudden movements across a small time interval. The information from IMU is also scaled correctly. These factors motivate us to use an Inertial Measurement Unit (IMU) along with the visual sensor. On the downside, the IMU data is prone to noise and drift. So, the fusion of Visual and Inertial sensor is a suitable choice.

3.2.1 Loosely-coupled

A loosely-coupled approach for visual-inertial systems keeps the visual and inertial frameworks as independent entities. Weiss and Siegwart [31] proposed such a loosely-coupled implementation, where the metric scale estimation remains independent of the visual algorithm. Kelly and Sukhatme [10] implemented a loosely-coupled visual-inertial approach using an Unscented Kalman Filter (UKF), which is better designed to handle non-linearity in the system compared to the Extended Kalman Filter (EKF). Their solution self-calibrates the transformation between a camera and IMU, while simultaneously localizing the body and mapping the environment. Lynen et al. [17] designed a multi-sensor fusion in an EKF framework with Micro Aerial Vehicle (MAV) data, which is robust to long-term missions. All these filter-based implementations consider the visual odometry algorithm as a black-box and only use the pose


Figure 3.3: The trajectory estimated by Monocular Visual odometry is accurate up to a scale. Here, the true scale 0.5 indicates that the Visual odometry data is in the ratio 1:2 with the Ground truth.

Figure 3.4: General framework for Loosely-coupled Visual Inertial odometry.

and covariance result provided by it. The general framework is shown by Figure 3.4.

The loosely-coupled framework is efficient as it requires less computation, which is also beneficial for real-time operation. It is also easier to expand the system to include other sensors. However, it loses in terms of accuracy because it neglects the cross-coupling between the visual and inertial parameters.

3.2.2 Tightly-coupled

The tightly-coupled approach for visual-inertial systems combines the visual and inertial parameters under a single optimization problem and their states are jointly estimated. Leutenegger et al. [16] proposed one such formulation,

J(x) = \underbrace{\sum_{i=1}^{I}\sum_{k=1}^{K}\sum_{j\in\mathcal{J}(i,k)} {e_r^{i,j,k}}^T W_r^{i,j,k}\, e_r^{i,j,k}}_{\text{visual}} + \underbrace{\sum_{k=1}^{K-1} {e_s^{k}}^T W_s^{k}\, e_s^{k}}_{\text{inertial}}, \qquad (3.1)

where J(x) is the cost function that contains the weighted reprojection errors e_r and the weighted temporal errors from the IMU e_s; i is the camera index, k is the frame index and j is the image feature index.

Figure 3.5: General framework for Tightly-coupled Visual Inertial odometry.

Such an approach can be formulated for direct visual odometry methods too, as proposed by Concha et al. [3]. The optimization function minimizes the IMU residual error, IMU pose error and photometric error. The optimization between two frames assumes the previous frame to be constant and adjusts the current frame. A similar energy function that considers IMU and Image errors is also proposed by Usenko et al. [29], which is used to estimate the camera pose and translational velocity along with IMU biases.


A tightly-coupled approach considers the correlation between both the sensors, leading to better accuracy. However, it is hard to debug and computationally costlier than the loosely-coupled method.

Choice of method

A loosely-coupled visual-inertial odometry framework is chosen for implementation since it is modular, flexible and computationally less demanding. This framework requires a visual odometry algorithm and a sensor fusion algorithm. For the visual odometry algorithm, the feature-based method is preferred over the direct method, since the direct method is computationally expensive and unable to handle large inter-frame motions. Ground vehicle applications involve high speeds, and it is crucial for the visual odometry algorithm to handle fast movements.


Chapter 4

Implementation

In this chapter, the implementation of visual-inertial odometry using an error-state Kalman filter is discussed. The chapter begins with a brief description of the coordinate frames involved and goes on to detail the models for the inertial and visual sensor data.

4.1 Coordinate Frames


Figure 4.1: Different coordinate frames in the system and the transformations that relate them.


The visual-inertial system consists of four major coordinate frames. Understanding these frames and the transformations between them is crucial for successful filter design.

• IMU Frame: The IMU frame is rigidly attached to the IMU body and moves along with it. The measurements obtained from the IMU are according to this coordinate frame. For the system used, the IMU coordinate frame is defined as +X forward, +Y right, +Z up.

• World Frame: The World frame is a static reference frame with respect to which the IMU motion is estimated. The World frame is defined as +X forward, +Y right, +Z up.

• Camera Frame: This describes the coordinate system of the camera which moves along with the vehicle. It is essential for performing computer vision operations. For the system used, the Camera coordinate frame is defined as +X right, +Y down, +Z forward.

• Vision Frame: This is the static reference frame with respect to which the Camera motion is observed. For the system used, the Vision frame is defined as +X right, +Y down, +Z forward.

4.2 Vector Notation

A vector related to a sensor S and represented in the coordinate frame F is given by v_F^S.

4.3 Visual-Inertial System

The Visual system consists of a monocular camera, since they are cheaper and easier to calibrate compared to stereo cameras. The visual odometry algorithm used in this thesis is ORB-SLAM, which works with Oriented FAST and Rotated BRIEF (ORB) features. ORB-SLAM can handle high speeds and large inter-frame motions. Also, it has an open-source implementation available [19] and showed robust tracking during the preliminary tests. Pose information of the camera is obtained as the output.


The inertial system consists of data from Inertial Measurement Unit, that is, linear accelerations am and angular velocities ωm acting on the vehicle body. The pose of the vehicle is estimated from this data.

The two pose estimates, from camera and IMU, are transformed to a common navigation reference frame (World frame) and fused using extended Kalman filter framework.

4.4 Extended Kalman Filter design

4.4.1 Continuous-time Nonlinear State Model

The motion of the IMU is modelled with a set of navigation equations as described in Section 2.2.2. The states of the filter consist of the position of the IMU in the World frame p_w^i, the velocity of the IMU in the World frame v_w^i, the attitude quaternion describing the rotation from the World frame to the IMU frame q_w^i, the accelerometer and gyro biases b_a and b_ω, and a scaling factor λ between the Vision and the World coordinates. λ will help us estimate the scale of the visual odometry output.

x_k = \begin{bmatrix} p_w^i & v_w^i & q_w^i & b_a & b_\omega & \lambda \end{bmatrix}^T \qquad (4.1)

The IMU data are inputs to the system, that is, the linear acceleration a_m and angular velocity ω_m. The different noises acting on the system are the acceleration noise n_a, gyro noise n_ω, acceleration bias noise n_{b_a} and gyro bias noise n_{b_ω}. Hence the input u_k and system noise w_k are defined as,

u_k = \begin{bmatrix} a_m & \omega_m \end{bmatrix}^T \quad \text{and} \quad w_k = \begin{bmatrix} n_a & n_\omega & n_{b_a} & n_{b_\omega} \end{bmatrix}^T. \qquad (4.2)

The IMU kinematics and bias model is as follows,

\dot{p}_w^i = v_w^i \qquad (4.3)
\dot{v}_w^i = C_{q_w^i}(a_m - b_a + n_a) - g \qquad (4.4)
\dot{q}_w^i = \tfrac{1}{2}\,\Omega(\omega_m - b_\omega + n_\omega)\,q_w^i \qquad (4.5)
\dot{b}_a = n_{b_a} \qquad (4.6)
\dot{b}_\omega = n_{b_\omega} \qquad (4.7)
\dot{\lambda} = 0, \qquad (4.8)


where C_{q_w^i} represents the rotation matrix that transforms a vector from the Inertial frame to the World frame, Ω(ω) is the quaternion product matrix as defined in Section 2.1 and g is the vector corresponding to the acceleration due to gravity. The symbol ⊗ represents quaternion multiplication as discussed in Section 2.1.

4.4.2 Discrete-time Nonlinear State Model

Using a first-order Euler discretization, we obtain the discrete-time (nominal) state-space equations,

p_w^i(k) = p_w^i(k-1) + v_w^i(k-1)\,\Delta t \qquad (4.9)
v_w^i(k) = v_w^i(k-1) + \left[C_{q_w^i(k-1)}(a_m - b_a + n_a) - g\right]\Delta t \qquad (4.10)
q_w^i(k) = q_w^i(k-1) \otimes q\{(\omega_m - b_\omega + n_\omega)\,\Delta t\} \qquad (4.11)

b_a(k) = b_a(k-1) + n_{b_a}\Delta t \qquad (4.12)

b_\omega(k) = b_\omega(k-1) + n_{b_\omega}\Delta t \qquad (4.13)
\lambda(k) = \lambda(k-1). \qquad (4.14)
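A minimal sketch of the nominal-state propagation of Equations (4.9)–(4.14): noise terms are dropped (the nominal model ignores them), the state is kept in a plain dictionary, and the gravity sign convention and helper names are assumptions rather than thesis code.

```python
import numpy as np

G = np.array([0.0, 0.0, 9.81])   # gravity vector in the World frame (sign convention assumed)

def quat_mult(p, q):
    pw, pv, qw, qv = p[0], p[1:], q[0], q[1:]
    return np.concatenate(([pw*qw - pv @ qv], pw*qv + qw*pv + np.cross(pv, qv)))

def quat_from_rotvec(phi):
    angle = np.linalg.norm(phi)
    if angle < 1e-12:
        return np.array([1.0, 0.0, 0.0, 0.0])
    axis = phi / angle
    return np.concatenate(([np.cos(angle / 2)], np.sin(angle / 2) * axis))

def rot_from_quat(q):
    w, x, y, z = q
    return np.array([
        [w*w+x*x-y*y-z*z, 2*(x*y-w*z),     2*(x*z+w*y)],
        [2*(x*y+w*z),     w*w-x*x+y*y-z*z, 2*(y*z-w*x)],
        [2*(x*z-w*y),     2*(y*z+w*x),     w*w-x*x-y*y+z*z]])

def propagate_nominal(s, a_m, w_m, dt):
    """Nominal-state propagation, Eqs. (4.9)-(4.14), with noise terms omitted."""
    C = rot_from_quat(s["q"])                              # rotation IMU -> World
    s["p"] = s["p"] + s["v"] * dt                          # Eq. (4.9)
    s["v"] = s["v"] + (C @ (a_m - s["ba"]) - G) * dt       # Eq. (4.10)
    s["q"] = quat_mult(s["q"], quat_from_rotvec((w_m - s["bw"]) * dt))  # Eq. (4.11)
    # Biases (4.12)-(4.13) and the scale (4.14) stay constant in the nominal model.
    return s
```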

As discussed in Section 2.3.4, the error-state Kalman filter is used to handle the quaternion in its minimal representation for numerical stability. The error-states are,

\delta x = \begin{bmatrix} \delta p_w^i & \delta v_w^i & \delta\theta_w^i & \delta b_a & \delta b_\omega & \delta\lambda \end{bmatrix}^T. \qquad (4.15)

The dynamics of the error-state are given by,

\delta p_w^i = \delta p_w^i + \delta v_w^i\,\Delta t \qquad (4.16)
\delta v_w^i = \delta v_w^i - \left([C_{q_w^i}(a_m - b_a + n_a)]_\times\,\delta\theta_w^i + C_{q_w^i}\,\delta b_a - C_{q_w^i}\,\delta n_a - g\right)\Delta t \qquad (4.17)
\delta\theta_w^i = \delta\theta_w^i - \left(C_{q_w^i}\,\delta b_\omega - C_{q_w^i}\,\delta n_\omega\right)\Delta t \qquad (4.18)

\delta b_a = \delta b_a + n_{b_a}\Delta t \qquad (4.19)

\delta b_\omega = \delta b_\omega + n_{b_\omega}\Delta t \qquad (4.20)
\delta\lambda = \delta\lambda. \qquad (4.21)

4.4.3 Prediction Step

As mentioned in Section 2.3.3, the system must be linearized in order to use the Kalman filter framework. Jacobian matrices are computed at each time step to obtain the local linearized model around the current state estimate.


The discrete-time nonlinear error state equations can be represented as,

\delta x_k = f(\delta x_{k-1}, u_{k-1}, w_{k-1}), \qquad (4.22)

where f is the state prediction function, x is the state vector, δx is the error-state vector, u is the input vector and w is the system noise vector. The Jacobians are defined as,

F_{k-1} = \left.\frac{\partial f}{\partial \delta x_{k-1}}\right|_{\hat{x}_{k-1}} \quad \text{and} \quad G_{k-1} = \left.\frac{\partial f}{\partial w_{k-1}}\right|_{w_{k-1}}. \qquad (4.23)

The Jacobian with respect to the state estimates is computed to be,

F_{k-1} = \begin{bmatrix} I_3 & \Delta t I_3 & 0_3 & 0_3 & 0_3 & 0_{3\times1} \\ 0_3 & I_3 & -\Delta t[C_{q_w^i} a_i]_\times & -\Delta t C_{q_w^i} & 0_3 & 0_{3\times1} \\ 0_3 & 0_3 & I_3 & 0_3 & -\Delta t C_{q_w^i} & 0_{3\times1} \\ 0_3 & 0_3 & 0_3 & I_3 & 0_3 & 0_{3\times1} \\ 0_3 & 0_3 & 0_3 & 0_3 & I_3 & 0_{3\times1} \\ 0_{1\times3} & 0_{1\times3} & 0_{1\times3} & 0_{1\times3} & 0_{1\times3} & 1 \end{bmatrix}, \qquad (4.24)

where the true acceleration ai = am −ba is the measured acceleration corrected for bias. The Jacobian with respect to the system noise is computed to be,

  03 03 03 03 ∆tC i 0 0 0   qw 3 3 3     03 ∆tCqi 03 03  G =  w  . (4.25) k−1  0 0 ∆tI 0   3 3 3 3     03 03 03 ∆tI3 03 03 03 03

The matrix Q_{k-1} is the system noise covariance matrix and is defined as shown in Equation (4.26). The vectors σ_{n_a,c}, σ_{n_ω,c}, σ_{n_{b_a},c}, σ_{n_{b_ω},c} represent the noise strengths of n_a, n_ω, n_{b_a}, n_{b_ω} respectively for the continuous-time model.

 2  σna_c/∆t 03 03 03  2   03 σnω_c/∆t 03 03  Qk−1 =  2  . (4.26)  03 03 σ /∆t 03  nba _c  2  03 03 03 σ /∆t nbω _c


Once the F_{k-1} and G_{k-1} matrices are determined, the states and corresponding covariances are predicted using the equations described in Section 2.3.3.

Nominal state: \quad \hat{x}_{k|k-1} = f(\hat{x}_{k-1|k-1}, u_{k-1}) \qquad (4.27)
Covariance: \quad P_{k|k-1} = F_{k-1}P_{k-1|k-1}F_{k-1}^T + G_{k-1}Q_{k-1}G_{k-1}^T. \qquad (4.28)
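A sketch of the covariance prediction of Equation (4.28) using the block structure of Equations (4.24)–(4.26). The 16-dimensional error-state ordering [δp, δv, δθ, δb_a, δb_ω, δλ] follows Equation (4.15); the noise-strength arguments are placeholders, not tuned values from the thesis.

```python
import numpy as np

def skew(v):
    """Skew-symmetric matrix [v]_x."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def propagate_covariance(P, C, a_i, dt, sig_a, sig_w, sig_ba, sig_bw):
    """Covariance prediction of Eq. (4.28) with F, G, Q built per Eqs. (4.24)-(4.26)."""
    I3 = np.eye(3)
    F = np.eye(16)
    F[0:3, 3:6] = dt * I3
    F[3:6, 6:9] = -dt * skew(C @ a_i)      # effect of attitude error on velocity
    F[3:6, 9:12] = -dt * C                 # effect of accelerometer-bias error
    F[6:9, 12:15] = -dt * C                # effect of gyro-bias error
    G = np.zeros((16, 12))
    G[3:6, 0:3] = dt * C
    G[6:9, 3:6] = dt * C
    G[9:12, 6:9] = dt * I3
    G[12:15, 9:12] = dt * I3
    Q = np.diag(np.concatenate([np.full(3, sig_a**2), np.full(3, sig_w**2),
                                np.full(3, sig_ba**2), np.full(3, sig_bw**2)])) / dt
    return F @ P @ F.T + G @ Q @ G.T
```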

4.4.4 Measurement Model

The resultant pose from the visual odometry algorithm is used as measurement data for the loosely-coupled EKF.

The camera position z_p is defined as the position of the camera in the Vision frame, p_v^c, given by

z_p = p_v^c = C_{q_v^w}\left(p_w^i\,\lambda + C_{q_w^i}\,p_i^c - p_w^v\right) + n_{z_p}, \qquad (4.29)

where n_{z_p} is the position measurement noise.

Similarly, the camera orientation z_q is defined as the rotation from the Vision frame to the Camera frame, q_v^c, given by

z_q = q_v^c = q_v^w \otimes q_w^i \otimes q_i^c + n_{z_q}, \qquad (4.30)

where q_v^w is the orientation of the World frame with respect to the Vision frame, q_i^c is the orientation of the Camera frame with respect to the IMU frame, and n_{z_q} is the orientation measurement noise. It is important to note that Equation (4.30) depends on the quaternion convention used; Equation (4.30) follows the Hamilton convention for the order of multiplication.

Equations (4.29) and (4.30) relate the measurements to the system states. The innovation or residual is computed as,

\tilde{z}_p = z_p - \hat{z}_p, \qquad \tilde{z}_q = z_q \otimes \hat{z}_q^{-1}, \qquad (4.31)

where z_p and z_q are the actual measured observations while \hat{z}_p and \hat{z}_q are the estimated observations.
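The following sketch evaluates the measurement model of Equations (4.29)–(4.31): the predicted camera pose in the Vision frame is computed from the current state and the residual is formed as a vector difference for position and a quaternion difference for orientation. The calibration terms (p_i^c, q_i^c, q_v^w, p_w^v), the exact frame conventions and the helper names are assumptions for illustration, not the thesis code.

```python
import numpy as np

def qmult(p, q):
    pw, pv, qw, qv = p[0], p[1:], q[0], q[1:]
    return np.concatenate(([pw*qw - pv @ qv], pw*qv + qw*pv + np.cross(pv, qv)))

def qconj(q):
    return np.array([q[0], -q[1], -q[2], -q[3]])

def rot(q):
    w, x, y, z = q
    return np.array([[w*w+x*x-y*y-z*z, 2*(x*y-w*z), 2*(x*z+w*y)],
                     [2*(x*y+w*z), w*w-x*x+y*y-z*z, 2*(y*z-w*x)],
                     [2*(x*z-w*y), 2*(y*z+w*x), w*w-x*x-y*y+z*z]])

def predict_measurement(p_wi, q_wi, lam, q_vw, p_wv, p_ic, q_ic):
    """Predicted camera pose in the Vision frame, Eqs. (4.29)-(4.30)."""
    z_p = rot(q_vw) @ (p_wi * lam + rot(q_wi) @ p_ic - p_wv)
    z_q = qmult(qmult(q_vw, q_wi), q_ic)
    return z_p, z_q

def residual(z_p, z_q, zhat_p, zhat_q):
    """Innovation of Eq. (4.31): position difference and small-angle attitude error."""
    dq = qmult(z_q, qconj(zhat_q))     # quaternion residual (unit-quaternion inverse = conjugate)
    dtheta = 2.0 * dq[1:]              # small-angle approximation, cf. Eq. (2.75)
    return z_p - zhat_p, dtheta
```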


4.4.5 Update Step

Jacobian matrices are computed to obtain the local linearized measurement model. This is used for the filter correction. The observation model can be represented as,

zk = h(δxk, vk), (4.32) where h is the measured observation function, δx is the error-state vector and v is the measurement noise vector.

x_k = \begin{bmatrix} p_w^i & v_w^i & q_w^i & b_a & b_\omega & \lambda \end{bmatrix}^T \quad \text{and} \quad v_k = \begin{bmatrix} n_{z_p} & n_{z_q} \end{bmatrix}^T. \qquad (4.33)

The Jacobians are defined as,

H_k = \left.\frac{\partial h}{\partial \delta x_k}\right|_{\hat{x}_{k|k-1}} = \left.\frac{\partial h}{\partial x_k}\,\frac{\partial x_k}{\partial \delta x_k}\right|_{\hat{x}_{k|k-1}} \qquad (4.34)
M_k = \left.\frac{\partial h}{\partial v_k}\right|_{v_{k|k-1}}. \qquad (4.35)

The Jacobian of the measured observation with respect to the state estimates is as follows,

" c i # w w i w Cqv λ 03 −Cqv [Cqw pi ]× 03 03 Cqv pw Hk = ¯ , (4.36) 03 03 Hq 03 03 03×1 where H¯ q is obtained by approximation of Hq,

H_q = \frac{\partial(q_v^w \otimes q_w^i \otimes q_i^c)}{\partial q_w^i} \approx \begin{bmatrix} 1 & 0_{1\times3} \\ 0_{3\times1} & \bar{H}_q \end{bmatrix}. \qquad (4.37)

The Jacobian with respect to the measurement noise is as follows,

M_k = \begin{bmatrix} I_3 & 0_3 \\ 0_3 & I_3 \end{bmatrix} \qquad (4.38)

The matrix R_k is the measurement noise covariance matrix and is defined as shown in Equation (4.39). The vectors σ_{n_{z_p}}, σ_{n_{z_q}} represent the standard deviations of the noises n_{z_p}, n_{z_q} respectively.

" 2 # σ 03×4 R = nzp . (4.39) k 0 σ2 4×3 nzq


Once the Hk and Mk matrices are determined, the state estimates are updated using the equations described in Section 2.3.3.

Innovation: \quad \tilde{z}_k = z_k - h(\hat{x}_{k|k-1}) \qquad (4.40)
Innovation covariance: \quad S_k = H_k P_{k|k-1} H_k^T + M_k R_k M_k^T \qquad (4.41)
Kalman gain: \quad K_k = P_{k|k-1} H_k^T S_k^{-1} \qquad (4.42)

Correction: \quad \tilde{x}_k = K_k\tilde{z}_k \qquad (4.43)

The state covariance matrix is updated using,

P_{k|k} = (I - K_k H_k)P_{k|k-1}(I - K_k H_k)^T + K_k M_k R_k M_k^T K_k^T \qquad (4.44)
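Putting Equations (4.40)–(4.44) together, a hedged sketch of the update step: the error-state correction is computed from the stacked residual (position error and attitude error angle) and the covariance is updated in Joseph form. The function name and argument layout are illustrative, not the thesis implementation.

```python
import numpy as np

def esekf_update(P, H, M, R, residual):
    """Error-state update, Eqs. (4.40)-(4.44); residual stacks [position error, dtheta]."""
    S = H @ P @ H.T + M @ R @ M.T                       # innovation covariance (4.41)
    K = P @ H.T @ np.linalg.inv(S)                      # Kalman gain (4.42)
    dx = K @ residual                                   # error-state correction (4.43)
    I_KH = np.eye(P.shape[0]) - K @ H
    P = I_KH @ P @ I_KH.T + K @ (M @ R @ M.T) @ K.T     # covariance update (4.44)
    return dx, P
```

The correction dx is then folded into the nominal state: additively for position, velocity, biases and scale, and via the error quaternion of Equation (2.75) for the attitude, after which the error-state mean is reset to zero.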

Chapter 5

Results

The implemented filter is tested with an online dataset and the results are presented in this chapter. The experimental setup is described first followed by results obtained using two different vehicle trajectories. The ability to recover scale is then discussed.

5.1 Malaga Dataset

The Malaga dataset has been used for evaluating the visual-inertial filter on data from real sensors. This dataset has semi-urban and highway driving conditions, which are most likely to be faced by autonomous trucks during transportation between the mining site and the processing plant. This dataset was collected by Blanco-Claraco et al. [1] in several urban scenarios and provides GPS, stereo camera, and IMU sensor data. Tracks 7 and 8 of this dataset have been used for conducting tests.

The inertial unit used is an xSens-MTi-28A53G35 running at a rate of 100 Hz. Point Grey Research's Bumblebee 2 stereo camera is used for capturing images at 20 fps. The synchronized and rectified images of 800×600 resolution are used. The ORB-SLAM framework [18] [20] is used as the visual odometry black-box. The ground truth is provided by the mmGPS sensor from Topcon, providing positional data at 1 Hz.

5.2 Experimental Setup

The purpose of this experiment is to evaluate the ability of the Visual-Inertial EKF framework to estimate the scale. Scale here refers to the ratio between

the size of the vehicle trajectory estimated from visual odometry and the size of the true trajectory.

The idea is to test the filter for scale estimation under conditions that are most likely and reasonable. The first step is to choose a range of true scale values. This differs for different applications and sensor setups. The visual odometry can be studied to check the most common scale range. A range of 0.5 to 5.0 is chosen, that is, the visual odometry data is half the scale of ground truth to five times the scale, shown by Equation (5.1). The choice of this range enables the comparison of results with the work of Weiss et al. [31].

The next step is to choose a range of initialization errors for the scale state. The work of Weiss [30] is referred to, which studied the ability to estimate the scale state under different initialization errors and system input excitations (linear acceleration and angular velocity). This is depicted by Figures 5.1 and 5.2, which show that it is important to have a good initialization for the scale state. A deviation in the position, velocity and acceleration bias estimates can be observed for a scale RMSE over 0.2. So, an initial scale offset¹ of 0.7 to 1.3 is chosen, where the other system states are less influenced by the initialization error in scale, shown by Equation (5.2). Note that the experiments in [31] use an initial scale offset of 1.0 to 2.0, which only corresponds to cases where the visual scale is greater than the true scale. The effect of scaling down is also important to test, as it is equally likely that the visual scale is lower than the true scale. Hence, the error offsets from 0.7 to 1.0 were also chosen.

For each value of true scale, all values of initial scale offset are tested. This gives a total of 42 test cases.

True Scale = \begin{bmatrix} 0.5 & 1.0 & 2.0 & 3.0 & 4.0 & 5.0 \end{bmatrix} \qquad (5.1)
Initial Scale Offset = \begin{bmatrix} 0.7 & 0.8 & 0.9 & 1.0 & 1.1 & 1.2 & 1.3 \end{bmatrix}. \qquad (5.2)

Various scale settings were applied to Malaga Track 7 and Malaga Track 8. These two tracks have different linear acceleration and angular velocity excitations, which provide variation in the test conditions. For the sake of brevity, only one case out of 42 is shown for each track, where the true scale is 3.0 with an initial scale offset of 0.8. Each of them is analyzed for filter performance with respect to all states.

¹ An offset is usually an additive error. However, the scale offset here is multiplied with the true scale to attain the required deviation. An initial scale offset of 0.8 implies that the scale state is initialized using a value 0.8 times the true scale.


Figure 5.1: Different excitation levels on the system input (linear accelerations and angular velocities) affect the convergence of the scale state. High excitation of the linear accelerations is favourable for faster convergence. (Reprinted from the work of Weiss et al. [30]).

Figure 5.2: The evolution of the state RMSE is shown as the initialization error of the scale increases. Higher RMSE indicates a longer convergence time. It is difficult to converge when the initial scale state error is beyond 4.0 times the true value. (Reprinted from the work of Weiss [30].)


5.2.1 Malaga Track 7 Results

Figure 5.3: Comparison of the vehicle position estimate in the navigation frame with the GPS ground truth. The ability of the filter to adjust the scale and follow the ground truth can be observed. (a) 2D trajectory in the X-Y plane. (b) Position along the three axes plotted against time.

The position estimate of the filter is shown in Figure 5.3. We notice deviations during rotations, but these reduce soon after, which could imply that the scale state is adjusting to the effect of the gyroscope excitation. Figure 5.4 exhibits the ability of the filter to estimate the acceleration and gyroscope biases along with their uncertainty. The region of the 3σ bounds converges to a constant value, indicating a steady state of the filter.

The velocity and yaw state estimates of the filter are shown in Figure 5.5. The velocity estimate has an average value of 8.0650 m/s over the course of the trajectory, which is a reasonable velocity for a vehicle. The yaw estimate depicts the rotation of the vehicle about the z-axis. From the plot, it is easy to see that the yaw estimate agrees with the trajectory taken by the vehicle. On a practical note, the estimates of orientation using the gyroscope and visual odometry are accurate, which could be the reason for the confident 3σ bound in Figure 5.5.

Finally, the scale convergence plot in Figure 5.6 shows that the scale does not converge to the true value but slowly approaches it. The uncertainty converges to a steady value, but the true scale of 3.0 is not within the 3σ bounds, indicating that the scale estimate is over-confident.
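Such an over-confidence check can be stated compactly in terms of the filter output: the true scale should fall within the estimate plus or minus three standard deviations taken from the corresponding entry of the EKF covariance. The sketch below uses hypothetical variable names and a hypothetical variance value; it is not taken from the implementation.

import numpy as np

def scale_is_consistent(scale_est, scale_var, true_scale):
    """Return True if the true scale lies within the 3-sigma bounds of the
    scale estimate; scale_var is the corresponding diagonal entry of the
    EKF covariance matrix."""
    return abs(true_scale - scale_est) <= 3.0 * np.sqrt(scale_var)

# Track 7 example: estimate 2.7741 vs. true scale 3.0 (the variance value
# below is purely hypothetical).
print(scale_is_consistent(2.7741, 0.002, 3.0))  # False -> over-confident estimate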


Figure 5.4: Estimation of the (a) acceleration bias and (b) gyroscope bias states of the visual-inertial filter along the three axes. The 3σ bounds of uncertainty are also shown.


Figure 5.5: Estimation of the velocity and yaw of the vehicle. The 3σ bounds of uncertainty are also shown. The velocity estimate has a reasonable range and the yaw estimate agrees with the trajectory of the vehicle.

Figure 5.6: State estimation plot for the scale, showing convergence to a value of 2.7741 while the true scale is 3.0. The 3σ bounds of uncertainty are also shown.


5.2.2 Malaga Track 8 Results

Figure 5.7: Comparison of the vehicle position estimate in the navigation frame with the GPS ground truth. The ability of the filter to adjust the scale and follow the ground truth can be observed. (a) 2D trajectory in the X-Y plane. (b) Position along the three axes plotted against time.

The position estimate of the filter is shown in Figure 5.7. The x-position deviates from the ground truth due to a failure in the visual odometry black-box. The failure occurs near the end of the trajectory due to insufficient image features. This shows that the visual-inertial filter accumulates errors when the visual odometry black-box fails. Hence, it is important to have a failure detection mechanism. Figure 5.8 exhibits the ability of the filter to estimate the acceleration and gyroscope biases along with their uncertainty.
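The thesis does not implement such a detector, but a minimal heuristic could, for example, flag frames in which the position increment reported by the visual odometry black-box implies an implausibly high speed. The sketch below is only one possible heuristic with an assumed frame rate and speed threshold; it is not the method used in this work.

import numpy as np

def detect_vo_spikes(vo_positions, dt=1.0 / 20.0, max_speed=30.0):
    """Flag visual odometry frames whose position increment implies an
    implausible speed (default: above 30 m/s between 20 fps frames).

    vo_positions : (N, 3) positions reported by the VO black-box
    Returns a boolean array of length N-1, True where a spike is suspected.
    """
    steps = np.linalg.norm(np.diff(vo_positions, axis=0), axis=1)
    return steps > max_speed * dt

# Flagged measurements could then be skipped instead of being fused in the EKF.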

The velocity and yaw state estimates of the filter are shown in Figure 5.9. The velocity estimate has an average value of 9.0751 m/s over the course of the trajectory, which is a reasonable velocity for a ground vehicle. The yaw estimate depicts the rotation of the vehicle about the z-axis. Similar to Track 7, the yaw estimate agrees with the trajectory of the vehicle and has a confident 3σ bound.

Finally, the scale convergence plot in Figure 5.10 shows convergence to the true value. The uncertainty also converges to a steady value, and the true scale of 3.0 is within the 3σ bounds.


Figure 5.8: Estimation of the (a) acceleration bias and (b) gyroscope bias states of the visual-inertial filter along the three axes. The 3σ bounds of uncertainty are also shown.


Figure 5.9: Estimation of the velocity and yaw of the vehicle. The 3σ bounds of uncertainty are also shown. The velocity estimate has a reasonable range and the yaw estimate agrees with the trajectory of the vehicle.

Figure 5.10: State estimation plot for the scale, showing convergence to a value of 2.9977 while the true scale is 3.0. The 3σ bounds of uncertainty are also shown.


5.3 Summary

The convergence behaviour of the scale estimate for all 42 test cases on Malaga Track 7 and Track 8 is summarized in Figure 5.11 and Figure 5.12, respectively. For each value of the true scale, initial scale offsets from 0.7 to 1.3 are applied and the convergence of the scale state is noted.

Figure 5.11: Convergence behaviour plot for Malaga Track 7. The true scale varies from 0.5 to 5.0. The initial values of the scale have an initialization error offset varying from 0.7 to 1.3 times the true scale value. For each of these experimental settings, the red circles depict the final filter values to which the scale estimates have converged.

Figure 5.12: Convergence behaviour plot for Malaga Track 8. The true scale varies from 0.5 to 5.0. The initial values of the scale have an initialization error offset varying from 0.7 to 1.3 times the true scale value. For each of these experimental settings, the red circles depict the final filter values to which the scale estimates have converged.

The major observation is that the scale estimate shows a general tendency to approach the true scale in all cases. The estimated value of the scale always lies between the initial value and the true value, both when the initialization error is greater than the true scale and when it is lower. Despite some inaccuracies, this shows that it is possible to estimate the true scale using the visual-inertial filter.


The scale estimates do not converge to the true value in all cases. The scale estimate for Malaga Track 7 exhibits a cumulative RMS error of 0.3638, while that of Malaga Track 8 exhibits a cumulative RMS error of 0.2572. These RMSE values do not necessarily indicate an error in convergence, but rather a slow rate of convergence. From the convergence plots in Figure 5.6 and Figure 5.10, it can be noted that the filter roughly estimates the scale factor within the first 30 seconds and then slowly moves towards the true value. This holds for all cases, which implies that the filter removes the effect of incorrect initialization. The rate of convergence depends on the initialization error of the scale state and the excitation level of the linear acceleration inputs, as studied by Weiss [30] and illustrated by Figure 5.1 and Figure 5.2.
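The exact definition of the cumulative RMS error is not restated here; one plausible way to compute it, assumed purely for illustration, is over the converged scale values of all test cases of a track:

import numpy as np

def cumulative_rmse(final_scales, true_scales):
    """RMS error between converged scale estimates and their true values,
    taken over all test cases of one track (assumed definition)."""
    final_scales = np.asarray(final_scales, dtype=float)
    true_scales = np.asarray(true_scales, dtype=float)
    return float(np.sqrt(np.mean((final_scales - true_scales) ** 2)))

# e.g. for Track 7 this quantity would evaluate to roughly 0.3638, given the
# converged values of all 42 test cases (variable names are hypothetical).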

Moreover, the convergence of the scale is also influenced by other factors, such as the calibration of the extrinsic parameters between the camera and the IMU, the calibration of the IMU noise parameters, and failures in the visual odometry black-box. Improving these factors could result in better and faster convergence of the scale estimate.

Considering these factors, the results are promising, since the scale estimate shows a tendency to approach the true value. The filter is also able to track other states, such as velocity and orientation, in agreement with the trajectory taken by the vehicle. On the other hand, the convergence rate is slower than desired, which could lead to a significant error in the estimation of the vehicle pose. To avoid this, it is important to have a good initial estimate of the scale state, a better calibration of the sensor parameters, and failure detection for the visual odometry data.


Chapter 6

Conclusion

In this thesis, a monocular visual-inertial odometry algorithm with scale recovery is examined. An error-state Kalman filter was implemented for the fusion of monocular vision and inertial data, which estimates system states such as position, velocity, sensor biases, and scale.

This system is developed for vehicular applications, in particular autonomous trucks in open-pit mining scenarios where GNSS might not be reliable. The system is tested with the Malaga dataset, a publicly available semi-urban dataset collected from a car. This dataset was chosen since it largely coincides with the highway and semi-urban scenarios that an autonomous truck could experience.

The main contribution of this thesis is the application of monocular visual-inertial odometry with scale recovery to ground vehicles. This application provides different excitation conditions, and testing on data from real sensors makes the problem challenging.

The major issue with monocular visual-inertial odometry is the obscurity of the scale of the visual data. Hence, the ability to accurately estimate the scale was studied in this thesis. The estimation of scale was tested on Track 7 and Track 8 of the Malaga dataset under different scale conditions and initialization errors. It is found that an accurate estimation of scale is possible with the designed filter, as the results show a general tendency to approach the true value.

It is possible to implement this framework for the localization of autonomous ground vehicles despite the challenges of high speeds and obscure scale information.


Future Work

Future work should focus on achieving faster convergence of the scale estimate. One way to tackle this issue is to obtain a good initial guess for the scale. A method suggested by Kneip et al. [12] uses only three consecutive camera images and feature correspondences to achieve this. This approach has already been implemented and tested by Weiss [30], showing promising results for simulated and real data in a laboratory environment. Moreover, a failure detection step is also included there, which identifies irregular spikes in the visual odometry data. Such a check reduces the introduction of errors from the visual odometry black-box.
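The closed-form solution of Kneip et al. [12] is not reproduced here, but the underlying idea of bootstrapping the scale from inertial data can be illustrated with a much rougher heuristic: compare the distance obtained by double-integrating gravity-compensated accelerations over a short window with the distance reported by visual odometry over the same window. The sketch below implements only this simplified heuristic, not the method of [12] or [30], and all variable names are assumptions.

import numpy as np

def rough_initial_scale(acc_world, dt, vo_positions):
    """Rough initial scale guess: unscaled distance reported by visual
    odometry divided by the metric distance obtained from the IMU, matching
    the scale definition used in Section 5.2 (visual size / true size).

    acc_world    : (N, 3) gravity-compensated accelerations in the navigation frame
    dt           : IMU sampling period in seconds
    vo_positions : (M, 3) visual odometry positions over the same time window
    """
    # Double-integrate the acceleration; this assumes zero initial velocity
    # and is only usable over a short window because of drift.
    vel = np.cumsum(acc_world * dt, axis=0)
    pos = np.cumsum(vel * dt, axis=0)
    metric_dist = np.sum(np.linalg.norm(np.diff(pos, axis=0), axis=1))
    vo_dist = np.sum(np.linalg.norm(np.diff(vo_positions, axis=0), axis=1))
    return vo_dist / metric_dist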

As part of future work, the filter could be tested in a real-time implementation with actual data from a truck in an open-pit mining scenario. The system could then be tested for robustness to different environments, such as bad weather, poor lighting conditions, or a lack of image features. The filter currently depends heavily on the visual odometry black-box to function well. This dependence could be reduced by considering multi-sensor fusion with GNSS and LIDAR.

Bibliography

[1] Blanco-Claraco, José-Luis, Moreno-Dueñas, Francisco-Ángel, and González-Jiménez, Javier. “The Málaga urban dataset: High-rate stereo and LiDAR in a realistic urban scenario”. In: The International Journal of Robotics Research 33.2 (2014), pp. 207–214.
[2] Concha, Alejo and Civera, Javier. “DPPTAM: Dense piecewise planar tracking and mapping from a monocular sequence”. In: Intelligent Robots and Systems (IROS), 2015 IEEE/RSJ International Conference on. IEEE. 2015, pp. 5686–5693.
[3] Concha, Alejo et al. “Visual-inertial direct SLAM”. In: Robotics and Automation (ICRA), 2016 IEEE International Conference on. IEEE. 2016, pp. 1331–1338.
[4] Engel, Jakob, Koltun, Vladlen, and Cremers, Daniel. “Direct sparse odometry”. In: IEEE Transactions on Pattern Analysis and Machine Intelligence (2017).
[5] Engel, Jakob, Schöps, Thomas, and Cremers, Daniel. “LSD-SLAM: Large-scale direct monocular SLAM”. In: European Conference on Computer Vision. Springer. 2014, pp. 834–849.
[6] Forster, Christian, Pizzoli, Matia, and Scaramuzza, Davide. “SVO: Fast semi-direct monocular visual odometry”. In: Robotics and Automation (ICRA), 2014 IEEE International Conference on. IEEE. 2014, pp. 15–22.
[7] Geiger, Andreas, Ziegler, Julius, and Stiller, Christoph. “Stereoscan: Dense 3d reconstruction in real-time”. In: Intelligent Vehicles Symposium (IV), 2011 IEEE. IEEE. 2011, pp. 963–968.
[8] Gonçalves, Tiago and Comport, Andrew I. “Real-time direct tracking of color images in the presence of illumination variation”. In: Robotics and Automation (ICRA), 2011 IEEE International Conference on. IEEE. 2011, pp. 4417–4422.


[9] Hartley, Richard I. “Estimation of relative camera positions for uncalibrated cameras”. In: European Conference on Computer Vision. Springer. 1992, pp. 579–587.
[10] Kelly, Jonathan and Sukhatme, Gaurav S. “Visual-inertial sensor fusion: Localization, mapping and sensor-to-sensor self-calibration”. In: The International Journal of Robotics Research 30.1 (2011), pp. 56–79.
[11] Klein, Georg and Murray, David. “Parallel tracking and mapping for small AR workspaces”. In: Mixed and Augmented Reality, 2007. ISMAR 2007. 6th IEEE and ACM International Symposium on. IEEE. 2007, pp. 225–234.
[12] Kneip, Laurent et al. “Closed-form solution for absolute scale velocity determination combining inertial measurements and a single feature correspondence”. In: Robotics and Automation (ICRA), 2011 IEEE International Conference on. IEEE. 2011, pp. 4546–4553.
[13] Krombach, Nicola, Droeschel, David, and Behnke, Sven. “Combining feature-based and direct methods for semi-dense real-time stereo visual odometry”. In: International Conference on Intelligent Autonomous Systems. Springer. 2016, pp. 855–868.
[14] Kuipers, Jack B et al. Quaternions and rotation sequences. Vol. 66. Princeton University Press, 1999.
[15] LaFrance, Adrienne. “Your Grandmother’s Driverless Car”. In: The Atlantic (June 2016). url: https://www.theatlantic.com/technology/archive/2016/06/beep-beep/489029/ (visited on 09/11/2017).
[16] Leutenegger, Stefan et al. “Keyframe-based visual–inertial odometry using nonlinear optimization”. In: The International Journal of Robotics Research 34.3 (2015), pp. 314–334.
[17] Lynen, Simon et al. “A robust and modular multi-sensor fusion approach applied to MAV navigation”. In: Intelligent Robots and Systems (IROS), 2013 IEEE/RSJ International Conference on. IEEE. 2013, pp. 3923–3929.
[18] Mur-Artal, Raul, Montiel, Jose Maria Martinez, and Tardos, Juan D. “ORB-SLAM: a versatile and accurate monocular SLAM system”. In: IEEE Transactions on Robotics 31.5 (2015), pp. 1147–1163.
[19] Mur-Artal, Raul, Montiel, Jose Maria Martinez, and Tardos, Juan D. ORB-SLAM Project Webpage. url: http://webdiis.unizar.es/~raulmur/orbslam/ (visited on 09/11/2017).


[20] Mur-Artal, Raul and Tardós, Juan D. “ORB-SLAM2: An open-source SLAM system for monocular, stereo, and RGB-D cameras”. In: IEEE Transactions on Robotics (2017).
[21] Pomerleau, Dean. “RALPH: Rapidly adapting lateral position handler”. In: Intelligent Vehicles ’95 Symposium, Proceedings of the. IEEE. 1995, pp. 506–511.
[22] Robertson, Duncan P and Cipolla, Roberto. “Structure from motion”. In: (2008).
[23] Roumeliotis, Stergios I, Sukhatme, Gaurav S, and Bekey, George A. “Circumventing dynamic modeling: Evaluation of the error-state Kalman filter applied to mobile robot localization”. In: Robotics and Automation, 1999. Proceedings. 1999 IEEE International Conference on. Vol. 2. IEEE. 1999, pp. 1656–1663.
[24] Rublee, Ethan et al. “ORB: An efficient alternative to SIFT or SURF”. In: Computer Vision (ICCV), 2011 IEEE International Conference on. IEEE. 2011, pp. 2564–2571.
[25] Scaramuzza, Davide and Fraundorfer, Friedrich. “Visual odometry [tutorial]”. In: IEEE Robotics & Automation Magazine 18.4 (2011), pp. 80–92.
[26] Shuster, MD, Lefferts, EJ, and Markley, FL. “Kalman filtering for spacecraft attitude estimation”. In: AIAA 20th Aerospace Sciences Meeting, Orlando, Florida. Vol. 232. 1982.
[27] Thomson, Judith Jarvis. “The trolley problem”. In: The Yale Law Journal 94.6 (1985), pp. 1395–1415.
[28] Thrun, Sebastian et al. “Stanley: The robot that won the DARPA Grand Challenge”. In: Journal of Field Robotics 23.9 (2006), pp. 661–692.
[29] Usenko, Vladyslav et al. “Direct visual-inertial odometry with stereo cameras”. In: Robotics and Automation (ICRA), 2016 IEEE International Conference on. IEEE. 2016, pp. 1885–1892.
[30] Weiss, Stephan M. “Vision based navigation for micro helicopters”. PhD thesis. 2012.
[31] Weiss, Stephan and Siegwart, Roland. “Real-time metric state estimation for modular vision-inertial systems”. In: Robotics and Automation (ICRA), 2011 IEEE International Conference on. IEEE. 2011, pp. 4531–4537.


Appendix A

Sustainability, Ethics and Social Impact

In this thesis, a visual-inertial algorithm has been developed for autonomous vehicle technology, which certainly has ethical and societal impacts.

The algorithm is used to estimate the position of the vehicle. Consider a scenario where the vehicle is on a collision course and needs to take preventive measures. It relies on the position data, and it is crucial to have an accurate position estimate; a mishap due to an inaccurate position estimate is inexcusable. It is therefore crucial that the algorithm is tested for different scenarios and environments and provides a reliable position estimate. In case the algorithm is not certain about the position, it should provide a degree of uncertainty for the estimate so that other collision avoidance methods can be incorporated.

One of the advantages of autonomous driving is a higher level of efficiency. Urban traffic is regulated better and heavy transport trucks use optimized paths. This reduces fuel consumption or improves battery life in electric vehicles. Moreover, the concept of ride-sharing will reduce the overall number of vehicles required in cities. A positive social impact of autonomous vehicles is improved accessibility for the elderly and handicapped. However, the growing amount of city space and resources used for autonomous vehicles is a concern.


Appendix B

Detailed Results

B.1 Malaga Track 7

Various scale settings were applied to Malaga Track 7. The resultant trajectories for different true scale values with an initial scale offset of 1.1 are summarized in Figure B.1. The resultant trajectories for a true scale of 3.0 with different initial scale errors are summarized in Figure B.2.


Figure B.1: Variation in the output trajectory for different true scales with a +10% error in the initial estimates. Panels (a)–(f): initial scale 0.55, 1.1, 2.2, 3.3, 4.4, and 5.5 (1.1 error offset).


Figure B.2: Variation in the output trajectory for a true scale of 3.0 with different initial estimates. Panels (a)–(f): initial scale 2.1, 2.4, 2.7, 3.3, 3.6, and 3.9 (error offsets 0.7, 0.8, 0.9, 1.1, 1.2, and 1.3).


B.2 Malaga Track 8

Various scale settings were applied to Malaga Track 8. The resultant trajectories for different true scale values with an initial scale offset of 1.1 are summarized in Figure B.3. The resultant trajectories for a true scale of 3.0 with different initial scale errors are summarized in Figure B.4.


Figure B.3: Variation in the output trajectory for different true scales with a 1.1 error offset in the initial estimates. Panels (a)–(f): initial scale 0.55, 1.1, 2.2, 3.3, 4.4, and 5.5.


Figure B.4: Variation in the output trajectory for a true scale of 3.0 with different initial estimates. Panels (a)–(f): initial scale 2.1, 2.4, 2.7, 3.3, 3.6, and 3.9 (error offsets 0.7, 0.8, 0.9, 1.1, 1.2, and 1.3).
