User-Oriented Markerless Augmented Reality Framework Based on 3D Reconstruction and Loop Closure Detection
The University of Birmingham

A USER-ORIENTED MARKERLESS AUGMENTED REALITY FRAMEWORK BASED ON 3D RECONSTRUCTION AND LOOP CLOSURE DETECTION

by

YUQING GAO

A thesis submitted to the University of Birmingham for the degree of DOCTOR OF PHILOSOPHY (Ph.D.)

School of Engineering
Department of Electronic, Electrical & Systems Engineering
University of Birmingham
November 2016

University of Birmingham Research Archive
e-theses repository

This unpublished thesis/dissertation is copyright of the author and/or third parties. The intellectual property rights of the author or third parties in respect of this work are as defined by the Copyright, Designs and Patents Act 1988 or as modified by any successor legislation. Any use made of information contained in this thesis/dissertation must be in accordance with that legislation and must be properly acknowledged. Further distribution or reproduction in any format is prohibited without the permission of the copyright holder.

ABSTRACT

Visual augmented reality aims to blend computer-generated graphics with real-world scenes to enhance the end user's perception of, and interaction with, a given environment. For those AR applications which require registering the augmentations at exact locations within the real scene, the system must track the user's camera motion and estimate the spatial relationship between the user and the environment. Vision-based tracking is one of the more effective tracking methods: it uses relatively low-cost, easily accessible cameras as input devices and, furthermore, exploits several computer vision (CV) techniques to solve the problem. It can typically be divided into marker-based and markerless methods. Marker-based AR applications have, in the past, proved sufficiently robust and accurate, and marker-based tracking methods have been widely supported by almost every AR software development kit developed and marketed to date. However, they always require the introduction of artificial markers into the workplace, which may be undesirable in some cases (e.g. outdoors), due to deterioration over time as a result of exposure to weather effects, or due to requirements not to tamper with objects and sites of historic or religious significance. In contrast, markerless tracking methods attempt to make use of the natural features extracted from the original environment as their reference. Several CV-based methods, such as Structure from Motion (SfM) and visual SLAM, have already been applied to the process of unsupervised markerless template training, and many research projects have applied markerless tracking methods to solve AR issues within their chosen application areas. However, a general development framework that supports higher-level application designers and developers in customising AR applications for different environments and different purposes is rare.

The present research proposes a conceptual markerless AR framework system, the process for which is divided into two stages: an offline database training session for the designers, and an online AR tracking and display session for the final users. In the offline session, two types of 3D reconstruction application, RGBD-SLAM and SfM, are integrated into the development framework for building the reference template of a target environment. The performance and applicable conditions of these two methods are presented in this thesis, and application developers can choose which method to apply according to their development demands.

A general development user interface is provided to the developer for interaction, including a simple GUI tool for configuring the augmentations. The present proposal also applies a Bag of Words strategy to enable rapid "loop-closure detection" in the online session, efficiently querying the application user's view against the trained database in order to locate the user's pose. The rendering and display of the augmentations are currently implemented within an OpenGL window, an outcome of the research considered worthy of future detailed investigation and development.

ACKNOWLEDGEMENTS

I would like to express my sincere, respectful gratitude to my supervisor, Professor Robert J. Stone, for his guidance, supervision, constant encouragement and great help throughout these years. Thank you for supporting me with your unlimited patience, positive attitude and the warmest heart, which helped me to overcome the hardest times during my PhD study.

I would like to express my appreciation to Dr. Mike Spann for his precious suggestions and academic support. I would also like to extend my gratitude to Dr. Jingjing Xiao for helpful discussion and collaboration; her endless intellectual enthusiasm impressed me deeply. My special thanks to Dr. Cheng Qian for his tremendous assistance throughout my PhD study and for being such a sincere friend. I would like to thank Professor Peter Gardner and Mary Winkles for their kind help and support with my PhD programme. Many thanks to Vishant Shingari, Chris Bibb, Mohammadhossein Moghimi, Laura Nice, and all the colleagues in the Human Interface Technologies (HIT) team for their help and for being great friends.

I would like to express my sincere gratitude to Deborah Cracknell and Mark Du'chesne from The National Marine Aquarium for sharing their valuable data and information with me. I also want to thank James Robert, whose writing has inspired me so much.

The greatest thanks go to my beloved family. I cannot adequately express my gratitude to my mother and father, who always give me unconditional support and love; I could not have achieved anything without your support. Finally, I would also like to thank all of my friends who have supported me thus far. Thank you!

TABLE OF CONTENTS

ABSTRACT
TABLE OF CONTENTS
LIST OF ILLUSTRATIONS
LIST OF TABLES
LIST OF ABBREVIATIONS
Chapter 1 Introduction
1.1. Scientific and Novel Contributions
1.2. Thesis outline
Chapter 2 Literature Review and Background
2.1. Augmented Reality
2.1.1. Visual AR technologies
2.1.2. AR development frameworks
2.1.3. AR system evaluation
2.2. Computer vision methods in AR
2.2.1. Camera and camera calibration
2.2.2. Visual features
2.2.3. CV-based localisation and mapping
2.2.4. Image retrieval for loop closing
2.3. Hardware, software supports and datasets for evaluation
2.3.1. Hardware
2.3.2. Software
2.3.3. CV datasets for evaluation
2.4. Problems and Challenges
Chapter 3 Preparatory Studies
3.1. AR application development and requirement audience survey
3.1.1. Questionnaire analysis
3.1.2. Conclusion
3.2. Geometric transformations
3.2.1. Reference frames and coordinate systems
3.2.2. 3D-to-3D rigid transformations
3.2.3. 3D-to-2D camera projections
Chapter 4 3D Reconstruction for Template Training
4.1. RGBD-SLAM
4.1.1. Graph-based SLAM
4.1.2. System implementation