TECHNISCHE UNIVERSITÄT MÜNCHEN
Lehrstuhl für Echtzeitsysteme und Robotik

Efficient and Robust Pose Estimation Based on Inertial and Visual Sensing

Elmar Mair

Vollständiger Abdruck der von der Fakultät für Informatik der Technischen Universität München zur Erlangung des akademischen Grades eines

Doktors der Naturwissenschaften (Dr. rer. nat.) genehmigten Dissertation.

Vorsitzender: Univ.-Prof. Dr. Nassir Navab
Prüfer der Dissertation:
1. Univ.-Prof. Dr.-Ing. Darius Burschka
2. Prof. Gregory Donald Hager, Ph.D., Johns Hopkins University, Baltimore/USA

Die Dissertation wurde am 01.12.2011 bei der Technischen Universität München eingereicht und durch die Fakultät für Informatik am 30.03.2012 angenommen.

Abstract

Reliable motion estimation on resource-limited platforms is important for many applications. While insects solve this problem in an exemplary manner, mobile robots still require bulky computation and sensor equipment to provide the required robustness. In this thesis, we aim for an efficient and reliable navigation system which is independent of external devices. To this end, we assess highly effective, yet application-independent, biological concepts. Based on these insights, we propose an inertial-visual system as a minimal sensor combination which still allows for efficient and robust navigation.

We focus especially on algorithms for image-based motion estimation. Different methods are developed to allow for efficient image processing and pose estimation at high frame rates. Tracking of several hundred features and motion estimation from these correspondences within a few milliseconds on low-power processing units have been achieved. The precision of the motion computation is evaluated in dependence of the aperture angle, the tracking accuracy, and the number of features. In addition, we derive error propagations for image-based pose estimation algorithms. These can be used as accuracy estimates when fusing the measurements with other sensors.

We propose two different ways of combining inertial measurement units and cameras: either the inertial data is used to support the feature tracking, or it is fused with the visual motion estimates in a Kalman filter. For the spatial and temporal registration of the sensors, we also present different solutions. Finally, the presented approaches are evaluated on synthetic and on real data. Furthermore, the algorithms are integrated into several applications, such as hand-held 3D scanning, visual environment modeling, and driving as well as flying robots.

Zusammenfassung

Eine zuverlässige Bewegungsschätzung auf Plattformen mit eingeschränkten Ressourcen ist essentiell für viele Anwendungen. Während Insekten diese Aufgabe auf vorbildliche Weise lösen, benötigen mobile Roboter immer noch umfangreiche Sensorik und Rechenkapazität, um die erforderliche Robustheit gewährleisten zu können. Auf der Suche nach einer effizienten und zuverlässigen Navigationslösung beginnt diese Arbeit mit einer Beurteilung von effektiven, aber noch anwendungsunabhängigen, biologischen Konzepten. Basierend auf diesen Einsichten wird die Kombination eines Inertialsensors und einer Kamera als minimales Navigationssystem, das noch angemessene Zuverlässigkeit gewährt, vorgeschlagen. Dabei liegt der Fokus vor allem auf der bildbasierten Positionsbestimmung.

Verschiedene Methoden wurden entwickelt, um eine effiziente Bildverarbeitung und Positionsberechnung bei hohen Bildwiederholungsraten zu ermöglichen. Es wurde erreicht, dass auf einem Low-Power-Prozessor in wenigen Millisekunden mehrere hundert Merkmale verfolgt werden können. Der Einfluss des Öffnungswinkels, der Tracking-Genauigkeit und der Anzahl der Merkmale auf die Positionsbestimmung wird evaluiert. Des Weiteren werden Fehlerabschätzungen für Positionsberechnungs-Methoden hergeleitet, die als Gütemaß für die Fusion von Kameramessungen mit anderen Sensoren Verwendung finden. Zwei verschiedene Kombinationsmöglichkeiten von inertialen Navigationssystemen und Kameras werden vorgestellt: Entweder unterstützt die inertiale Messung das Tracking von Merkmalen, oder ein Kalman-Filter fusioniert die inertialen und visuellen Bewegungsschätzungen. Für die zeitliche und räumliche Registrierung der Sensoren werden ebenfalls verschiedene Ansätze vorgeschlagen.

Schließlich findet eine Auswertung der vorgestellten Verfahren mit realen und synthetischen Daten statt. Zudem werden die Algorithmen in verschiedene Applikationen integriert, wie z.B. handgeführtes 3D-Scannen, visuelle Umgebungsmodellierung und fahrende sowie fliegende Roboter.

Riassunto

La stima affidabile del moto su piattaforme con risorse limitate è importante per molte applicazioni. Mentre gli insetti risolvono questo problema in modo esemplare, i robot mobili, per garantire la robustezza necessaria, hanno bisogno di svariati sensori e di ingombranti attrezzature per il calcolo. In questa tesi l'obiettivo è di sviluppare un sistema di navigazione efficace ed affidabile e allo stesso tempo indipendente da dispositivi esterni. Il primo approccio è stata la valutazione di concetti biologici adatti a fornire le basi per gli studi seguenti. Grazie a queste intuizioni si è lavorato ad una combinazione di sensori minima, che tuttavia garantisce una navigazione efficiente e robusta, formata da un sensore inerziale ed una videocamera. Questa dissertazione si concentra specialmente sulla stima della posizione basata sull'analisi delle immagini.

Per rendere efficace la loro elaborazione e il calcolo della posizione ad elevata frequenza, sono stati proposti diversi metodi. In questo modo è possibile monitorare diverse centinaia di punti di riferimento con un processore a basso consumo. Inoltre sono stati valutati l'influenza dell'angolo di apertura, l'esattezza del tracking e il numero di punti di riferimento sulla stima della posizione. Si è poi risaliti a propagazioni dell'errore di calcolo della posizione, che possono essere usate come indice di qualità per la fusione di misurazioni video con altri sensori. Infine sono state individuate due possibili combinazioni di un'unità di misura inerziale con una videocamera. Queste informazioni inerziali possono sostenere il tracking di punti di riferimento, oppure possono essere combinate in un filtro Kalman con il calcolo del movimento basato su una camera. Per la sincronizzazione spaziale e temporale dei sensori sono state proposte diverse metodiche.

Alla fine i metodi presentati sono stati analizzati con dati reali e sintetici. Gli algoritmi sono stati integrati in diverse applicazioni, come per esempio la scansione manuale 3D, la modellatura visuale dell'ambiente e robot mobili e volanti.

This thesis is dedicated to my parents for their love, endless support and encouragement.

Acknowledgements

First of all, I want to thank Prof. Darius Burschka, who gave me the opportunity to do my PhD in his group. He was a great supervisor, mentor and friend to me. I want to thank him for his time and patience and all the chances he offered me. Next, I want to thank Prof. Gregory D. Hager, who allowed me to visit him and his lab at the Johns Hopkins University (JHU) for almost five months. It was a great experience with a lot of insights and fruitful discussions with him and his lab members. Further, I want to thank him for reviewing my thesis.

Special thanks also to Prof. Gerhard Hirzinger and Dr. Michael Suppa of the Institute of Robotics and Mechatronics at the German Aerospace Center (DLR) for the funding and the chance to work in such an inspiring research environment since the very beginning of my PhD. Thanks also to Dr. Wolfgang Stürzl of the Department of Neurobiology at the University of Bielefeld and Prof. Jochen Zeil of the Research School of Biology (RSB) of the Australian National University (ANU) for the chance to visit and talk to the biologists in Canberra. The discussions with them were very inspiring to me. Both greatly supported me with the biological motivation of this thesis and gave me a lot of hints. Thanks also to the excellence cluster Cognition for Technical Systems (CoTeSys) for funding the first one and a half years of my work at the TUM. Further, I want to thank Dr. Gerhard Schrott, Monika Knürr, Amy Bücherl, and Gisela Hibsch for their administrative support at the TUM and for making it more than a research environment.

I could have never done this work without the support, the exchange of ideas and the discussions with my colleagues and friends at the Institute for Robotics and Embedded Systems at the TUM, especially the Machine Vision and Perception (MVP) group, and at the Institute of Robotics and Mechatronics at the DLR, especially the department Perception and Cognition. Thank you all for the great time and all the fun we had. At this point I want to mention Oliver Ruepp, my office mate at the TUM, who opened my eyes to some mathematical problems and helped me, e.g., to formulate Appendix A.4 in a (as he would say) "less intuitive" way. Further, I want to thank the main project partners for the great collaboration: Werner Meier in the Virtual Environment Modeling project (CoTeSys), Klaus Strobl in the DLR 3D-Modeler project, and my office mate at the DLR, Korbinian Schmid, in the DLR multicopter project. Special thanks also go to my students Felix Ahlborg, Marcus Augustine, Michael Fleps, Michelle Kruell, Sebastian Riedel, Konstantin Werner, Florian Wilde, and Jonas Zaddach for their great work.

Finally, I want to thank my family Luise, Hans, Sabine and Egon for their support and their patience. I also want to thank Manuela, who gave me strength and energy during the hard-working phases of my PhD, Marco, for being the best housemate I can imagine, and all my other friends whom I was neglecting so often but who, nevertheless, were so patient with me.

Contents

List of Figures

List of Tables

1 Motivation
  1.1 Efficient Navigation Inspired by Biological Insights
    1.1.1 Assessing the Sensory Physiology of Insects
    1.1.2 Biological Insights in the Navigation Behavior of Insects
    1.1.3 Biologically Inspired Efficient Navigation System
  1.2 Outline – Towards Efficient Inertial-Visual Navigating Robots

2 Mathematical Framework and Concepts
  2.1 Rigid Body Transformations and Coordinate Frames
  2.2 Rotation Representations
  2.3 Camera Model
  2.4 Optical Flow
  2.5 RANSAC
  2.6 IMU Strapdown Computation
  2.7 Kalman Filter Equations

3 Related Work
  3.1 Image Features
    3.1.1 FAST Revisited
  3.2 Feature Trackers
  3.3 Motion Estimation Techniques
    3.3.1 Closed Form Solutions
      3.3.1.1 Implementation Variants
    3.3.2 Iterative Solutions


    3.3.3 Least Squares and Maximum Likelihood Solutions
    3.3.4 Active Matching
  3.4 Fusion of IMU and Camera Measurements
    3.4.1 IMU-Camera Registration
    3.4.2 Fusion Concepts

4 Fusion of IMU and Camera
  4.1 Temporal Alignment
    4.1.1 Temporal Alignment by Cross-Correlation
    4.1.2 Temporal Alignment by Phase Congruency
    4.1.3 Online Adaption
  4.2 Spatial Alignment
    4.2.1 Objective Function
    4.2.2 B-Spline Based Trajectory Modeling
    4.2.3 Constraints and Optimization Details
    4.2.4 Initialization of the Spatial Alignment
    4.2.5 Optimization Parameter Initialization
  4.3 IMU Supported Tracking
  4.4 Kalman Filter Based Fusion
    4.4.1 Dynamic Model
    4.4.2 EKF Propagation and Update
    4.4.3 Pseudo-Measurement Model
    4.4.4 The Filter-Steps in a Nutshell

5 Image-Based Pose Estimation
  5.1 Z∞-Algorithm – Efficient Monocular Motion Estimation
    5.1.1 Z∞-Principle
    5.1.2 RANSAC Revisited
    5.1.3 Rotation Estimation
    5.1.4 Translation Estimation
  5.2 Keyframe Based VSLAM
    5.2.1 Robustified VGPS

6 Uncertainty Prediction for Image-Based Motion Estimation
  6.1 Error Propagation of the Eight-Point Algorithm
    6.1.1 Error Propagation for the Essential Matrix
    6.1.2 Error Propagation for the Rotation Matrix
    6.1.3 Error Propagation for the Translation Vector


  6.2 Error Propagation of the Z∞-Algorithm
    6.2.1 Error Propagation for the Rotation Matrix
    6.2.2 Error Propagation for the Translation Vector

7 Feature Tracking
  7.1 Extended KLT Tracking
    7.1.1 KLT-Based Stereo Initialization
  7.2 AGAST-Based Tracking by Matching
    7.2.1 Adaptive and Generic Accelerated Segment Test
    7.2.2 Minimal Rotation-Invariant Descriptor
    7.2.3 Tracking By Matching

8 Experiments
  8.1 Evaluation of the Feature Tracking Methods
    8.1.1 Evaluation of the Extended KLT
    8.1.2 Efficiency Evaluation of AGAST
  8.2 Evaluation of the Pose Estimation Algorithms
    8.2.1 Evaluation of Closed-Form Pose Estimation Algorithms
    8.2.2 Experimental Evaluation of the Z∞-Algorithm
    8.2.3 Experimental Evaluation of the Keyframe-Based VSLAM
  8.3 Comparison of the Z∞-Algorithm and Keyframe-Based VSLAM
  8.4 Evaluation of the First Order Error Propagation
    8.4.1 Error Propagation for the Eight-Point Variants
    8.4.2 Error Propagation for the Z∞-Algorithm
  8.5 Evaluation of Sensor Registration and Fusion Techniques
    8.5.1 IMU-Camera Temporal Alignment
    8.5.2 IMU-Camera Spatial Registration
    8.5.3 IMU Aided KLT Tracking
    8.5.4 EKF Based IMU-Camera Fusion

9 Applications
  9.1 Self-Referenced 3D-Modeling
    9.1.1 Accuracy Evaluation
  9.2 Visual Environment Modeling
    9.2.1 Cognitive Household
    9.2.2 Cognitive Factory
  9.3 Autonomous Quadrocopter Flight


10 Conclusion
  10.1 Discussion
  10.2 Contributions
  10.3 Future Work
    10.3.1 Methods Specific Future Work
    10.3.2 Task Specific Future Work

Publications

A Appendix
  A.1 Image Pyramids
  A.2 Boxplots
  A.3 Image-Based Global Referencing
  A.4 Linear System Reduction
  A.5 Electronically Synchronized IMU-Camera Setup

Mathematical Notation

List of Abbreviations

References

List of Figures

1.1 Omnidirectional and bee-eye view of an outdoor scene (by courtesy of Wolfgang Stürzl).
1.2 Outdoor images acquired by an omnicam mounted on a quadrocopter.
1.3 Flow diagram with modules for an inertial-visual mobile system.

2.1 Camera model.

4.1 IMU supported Tracking.

5.1 Illustration of the relation between landmark distance and discretization.
5.2 Illustration for the measurable translational component.

7.1 Principle of the adaptive and generic accelerated segment test.

8.1 Comparison of the processing times of the KLT-variants.
8.2 Corner-binding experiments.
8.3 KLT based stereo initialization experiments.
8.4 AST mask sizes.
8.5 Checkerboard data set for AST experiments.
8.6 The corner response for different AST pattern.
8.7 Comparison of the corner response of different AST patterns.
8.8 Performance of various decision trees.
8.9 Scenes used for the AST performance test.
8.10 Accuracy of the eight-point estimation for f = 100 px.
8.11 Accuracy of the eight-point estimation for f = 250 px.
8.12 Accuracy of the eight-point estimation for f = 600 px.
8.13 Accuracy of the eight-point estimation for f = 900 px.


8.14 Comparison of the eight-point algorithm (original and Mühlich variant) with the Z∞-algorithm.
8.15 Comparison of the condition numbers for the eight-point variants and the Z∞-algorithm.
8.16 Euler angle error of the eight-point variants and the Z∞-algorithm for various aperture angles.
8.17 The relation between number of features and detection accuracy.

8.18 Visualization of the Z∞-steps.
8.19 Motion integration of IMU- and Z∞-measurements.
8.20 Comparison of rotation measurements of an IMU and the Z∞-algorithm.
8.21 Motion estimation based on SURF features.
8.22 Optical flow based obstacle avoidance.
8.23 Computation time of the Z∞-algorithm.
8.24 Model of castle “Neuschwanstein” used for RVGPS accuracy experiments.
8.25 Ground truth comparison of keyframe-based SLAM.
8.26 Experimental comparison VGPS vs. RVGPS.
8.27 Experimental setup: a Pioneer 3-DX equipped with two Marlin cameras.
8.28 VSLAM- and Z∞-screenshots.
8.29 Comparison of Z∞ and VSLAM.
8.30 Accuracy estimation of the error propagation for the eight-point algorithm.
8.31 Accuracy of the error propagation for the eight-point algorithm for different aperture angles.
8.32 Accuracy of the error propagation for the eight-point algorithm for a different number of features.
8.33 Accuracy of the error propagation for the eight-point algorithm assuming various noise.

8.34 Accuracy estimation of the error propagation for the Z∞ algorithm.
8.35 Accuracy of the error propagation for the Z∞ algorithm for different aperture angles.
8.36 Accuracy of the error propagation for the Z∞ algorithm for a different number of features.
8.37 Accuracy of the error propagation for the Z∞ algorithm for various levels of noise.
8.38 PointGrey Flea2G camera and a Xsens-MTi IMU.
8.39 Illustration of temporal alignment.
8.40 Cross-correlation of IMU and camera measurements.
8.41 Phase-shift based alignment of IMU and camera measurements.


8.42 Comparison of temporal alignment methods on synthetic data.
8.43 Comparison of temporal alignment methods on real data.
8.44 Comparison of weighted and unweighted rotation initialization for the spatial alignment.
8.45 Performance of the presented rotational alignment methods.
8.46 Performance of the unweighted and weighted rotation estimation on not correctly temporally aligned real data.
8.47 Estimation of the scale factor α with respect to the temporal alignment.
8.48 Error between the simulated measurements and the estimated trajectories by the B-spline.
8.49 Comparison of spatial alignment methods.
8.50 Error plot for good and bad initialized B-splines.
8.51 Evaluation of the number of knots.
8.52 Influence evaluation of the temporal alignment.
8.53 High-dynamic trajectory while IMU-aided tracking.
8.54 Simulated motion for EKF experiments.
8.55 Results of the EKF fusion.
8.56 Details of the EKF based estimation.
8.57 Evaluation of the EKF convergence for different keyframe time-spans.
8.58 Influence of the IMU-camera lever, the keyframe time-span and the quality of the scale initialization on the scale convergence.

9.1 DLR 3D-Modeler system description.
9.2 DLR 3D-Modeler egomotion and scene perception modules.
9.3 Referencing systems for the DLR 3D-Modeler.
9.4 The DLR 3D-Modeler modules diagram for egomotion estimation.
9.5 3D point clouds acquired by the DLR 3D-Modeler.
9.6 Accuracy comparison of DLR 3D-Modeler egomotion estimation.
9.7 DLR 3D-Modeler scan results.
9.8 Localization results for a Pioneer 3-DX trajectory.
9.9 Virtual environment modeling in the cognitive household scenario.
9.10 Virtual environment modeling in the cognitive factory scenario.
9.11 Quadrocopter system setup.

10.1 Flow diagram with modules for an inertial-visual mobile system – the own contributions are highlighted and future work is greyed out.

A.1 Explanation of boxplots.


A.2 SURF based global referencing methods.
A.3 Comparison of surprise detection based on different localization variants.
A.4 The DLR 3D-Modeler image-processing modules for localization and global referencing.
A.5 Block diagram of the hardware-synchronized IMU-camera setup.
A.6 Picture of the hardware-synchronized IMU-camera prototype.

List of Tables

5.1 Concrete examples for the distance threshold z∞.

8.1 Comparison of the average tests performed for each class of configuration using various mask sizes and different methods.
8.2 Computational time of various AST-decision trees.

8.3 Key values of the comparison between Z∞ and VSLAM for the circle experiment.
8.4 Statistical quantities of the results from the presented temporal alignment techniques.
8.5 Simulation results of the batch-optimization for the translational alignment.
8.6 Results for the translational alignment of the batch-optimization on real data.

A.1 Binomial coefficients (Pascal's triangle) as used for the Binomial filter.
A.2 Explanation of the mathematical notation.
A.3 List of abbreviations used within this work.

Chapter 1

Motivation

Since the early beginnings of mobile robotics, researchers have worked on robots which autonomously move from a starting point A to an end point B. Different questions arise if one wants to tackle this problem – some major ones are: Which trajectory shall the robot follow to safely reach B? How can the robot be kept on this path? What is the robot's current location relative to B? These three questions are treated in separate research areas. The Motion Planning community tries to answer the first question and, thus, looks for algorithms to compute trajectories under various aspects, such as efficiency, safety, velocity, and so forth. The second question is extensively discussed in control theory. These two questions are beyond the scope of this work, in which we will concentrate on parts of the remaining issue: How can mobile robots efficiently and reliably navigate and localize themselves in 3D space?

The knowledge of its own location with respect to a prior position (localization) or an aspired position (as part of navigation) is crucial for a mobile robot¹. However, similar problems have to be solved in many non-robotic applications, like localization devices for humans, scanning and modeling of objects or of the whole environment, surveying, and many more. All these applications need to estimate the absolute or relative position based on one or more sensors. Of course, varying requirements arise depending on the application.

Up to now, most localization or navigation algorithms with broad impact rely on artificial global references, e.g. the Global Positioning System (GPS). Such methods are already available in mass products, although they suffer from a restricted application space and limited accuracy and are, hence, not feasible for all applications. Man-made global referencing systems will always somehow limit the area of application; even GPS

¹ In general, an approach solving one of these two problems can also solve the other one. Thus, we will not distinguish between them in the following, unless explicitly mentioned.

provides no signal or only reduced signal quality, for instance in houses, cities, canyons, narrow mountain valleys or forests. Therefore, on-system solutions which do not depend on external devices are desirable. Such methods are usually based on the fusion of a large number of sensors to achieve the required reliability, while simultaneously building a map of the world to limit drifting. High processing costs, the price of the sensors and the required computational power are the limiting factors of their impact. The commercially available systems are quite expensive or do not have adequate reliability or accuracy. Most approaches provided in the literature tackle only the localization problem for a certain application, but do not provide a generic solution. Moreover, due to high memory consumption, these methods in general do not scale well and, thus, restrict the workspace. A local navigation system which allows for reliable pose estimation in general environments while providing a scalable map for drift compensation is therefore of great interest.

Various aspects have to be considered when deciding which sensors to use. Many applications, like the localization of hand-held devices, exhibit highly dynamic motions. Furthermore, increasing dynamics are also noticeable in mobile robots, which generally become more and more agile, solving their tasks more quickly. To deal with such motions, sensors have to be used that provide short integration times and high sampling rates. Another significant characteristic of sensors is whether they passively capture the environment or actively emit signals, e.g. light or sound, to perform their measurements. Passive sensors have no direct influence on their measurements and, thus, they completely depend on the scene characteristics, like lighting conditions or object properties. This makes it difficult to develop generic algorithms with universal parameters. However, despite this drawback, the passive nature of these sensors in general allows for more diverse environments and, thus, for a wider field of application than active sensors. The latter usually have a rather high power consumption, provide only a limited measurement range, and suffer from cross-talk as well as easy jamming and detection.

There is a continuous trend to miniaturize platforms, not only to reduce production costs, but also to allow for micro or light-weight systems. To achieve that, compact sensors should be used, their number should be reduced, and the algorithms should be optimized for memory and computational efficiency. However, since every sensor has its drawbacks, one should not rely on a single one, but fuse the data acquired by different sensors to yield reliability and robustness. Every additional sensor improves the estimation result, but for the sake of the system's compactness, only as many of them as necessary should be used. A trade-off has to be found between reliability and efficiency.


A short excursion into biology should provide some insights into which senses nature found to be useful for navigation and how this task is solved by insects. The sensor concept and the methods which we will present in this work are inspired by these biological assessments.

1.1 Efficient Navigation Inspired by Biological Insights

To motivate our idea of an efficient navigation system, we will first try to analyze navigation from a biological point of view. Any life-form has been optimized by the process of evolution over many millions of years. This selection process led to the most efficient and robust navigation systems. Sometimes they have been adapted to a specific ecological niche, and sometimes they have been generalized to allow for a large environmental variety. The tasks of most animals may always be specific and rather simple, e.g. foraging, mating or moving back home, and, hence, they do not need generic navigation strategies. However, the same species of even rather primitive animals, like insects, can be found in a large variety of environments. Insects are quite resource-limited animals compared to humans or other mammals – they require a highly efficient but still robust navigation apparatus. Therefore, we let ourselves be inspired by their sensory organs and their behavior when trying to answer the questions: "Which sensors were found by evolution to be necessary for robust navigation?" and "How can insects with their small brain reliably find their way back home after moving thousands of body lengths to a foraging site?"

Even though much research in biology has been devoted to answering these questions, the experiments are not always conclusive and, hence, many aspects are still part of current discussion, or only theories without proper experimental grounding are available. Nevertheless, we will try to assess various ideas and concentrate on the common assumptions by pointing out meaningful experiments. For that, we are not looking for any family- or even species-specific behaviors or physiological details, but we try to extract common concepts. Indeed, the physiological properties relevant for navigation seem to be in general quite similar, even between different families of insects. Especially their long-distance navigation techniques seem to be comparatively coherent. Probably the best studied insects, in terms of long-distance navigation, are bees and ants. Although they act in different spaces, they seem to use common navigation techniques, which should, hence, allow for an environment- and task-independent generalization. Therefore, we will focus on the extensive experiments and theories on ants and bees to find an answer to the aforementioned questions.


The insights in this section should motivate a compact but reliable sensor setup and efficient navigation algorithms. However, an extensive survey and discussion is beyond the scope of this work. Comprehensive overviews of specific aspects, e.g. the sensory system, navigation strategies or insect vision, can be found, amongst others, in [15, 16, 17]. In the following, we will first assess the insects' physiology and their navigation behavior and finally make the transition to mobile robotics by mapping the senses and the strategies to technical sensors and navigation algorithms.

1.1.1 Assessing the Sensory Physiology of Insects

How sophisticated a sensory organ is depends on the importance of the information it provides in a specific environment. Apart from that, the physiology between species is quite similar. Ants, e.g., live in forests, deserts, houses, and on rocks, basically limited only by the presence of enough food. Of course, different species of ants vary slightly in their physiology and therefore fit more or less well into a specific environment. Many of them use pheromones to mark their path. By varying the intensity or the pattern, recruiting individuals may signal whether the scent trail is promising for foraging or not. However, such a feature would not help desert ants, like Cataglyphis bicolor, because any pheromone would evaporate immediately in the desert's heat [18]. To get an idea of which senses insects may use to solve their navigation task, we will start with a list of senses and their possible use for navigation.

• Orientation sensing: Animals can estimate the global direction based on their sensitivity to the Earth's magnetic field (magnetic compass) [19, 20]. A compass is crucial to ensure the direction of heading, accurate path integration, and the alignment of visual representations of the environment. Turtles, lobsters, newts, and birds, for instance, even use a geomagnetic map for navigation [21, 22]. Insects also use their visual system for orientation estimation (visual compass). They use terrestrial information, like the landmark panorama [23] or far distant beacons, and celestial information, like the polarized skylight [24], the direction of the sun or the moon, the spectral or intensity distribution of skylight, and the sky itself (i.e. clouds). Dedicated organs seem to be the ocelli [25, 26] and the dorsal rim area (DRA) – a specialized dorsal part of the compound eye, which contains receptors that are sensitive to the plane of polarization of light [27].

• Vision: The two compound eyes of insects consist of thousands of facets, the so-called ommatidia, which are spread almost over the whole sphere. Each ommatidium consists of several individual photoreceptor units. House flies (Musca domestica), for instance, have about 8,000 ommatidia, worker honeybees (Apis mellifera) about 10,000, and dragonflies (Anisoptera) up to 60,000 ommatidia. For comparison, a camera with a resolution of 320 × 240 already has 76,800 pixels – more than most insects have ommatidia. The interommatidial angle in honeybees varies from 1.4° to 4.5°, and the acceptance angle of the ommatidia has been measured to be about 2.6° [28, 29, 30]. The blind area, covered by the torso of the bee, is approximately an ellipse of about 47° azimuth and 100° elevation. The fields of view (FOV) of the two eyes also overlap by about 30° [29].
Several insects possess a multi-chromatic opponent processing system (multi-spectral pixels). Honeybees, e.g., have trichromatic perception for blue, green and ultraviolet. The latter is important for some specific tasks, for instance to perceive patterns on nectar-laden flowers that are invisible to us. However, there might also be some navigational aspects involved, like the fact that in the ultraviolet channel the sky is always bright and far clouds are invisible [31], or the increased contrast of the skyline panorama [32].
The neuronal system provides fast, analog processing of the visual information. In insects, each ommatidium is serviced by ca. 12 neurons in the first layer of processing (Lamina) and by up to 60 neurons in the second layer of processing (Medulla). There are extensive lateral interactions at many different scales, especially in the medulla. Flies have shown in experiments that they can reliably respond to visual changes up to 200 Hz [33]. Flying insects cannot make use of any leg-based odometry. For them, visual odometry is crucial for path integration. Visual odometry uses the optical flow¹ to estimate the distance traveled [34].

• Proprioception: It is quite likely that insects are, like humans, aware of the position and motion of their limbs – information that can be gathered from mechanoreceptors in joints and muscles. An important concept of proprioception is leg-based odometry, which is used by various walking insects and is probably their most significant sense for navigation. By counting their steps and fusing this with compass information, they can obtain accurate and reliable homing vectors even on rough terrain, where the step sizes vary.

• Tactile sensing: The tactile recognition of landmarks, tactile obstacle detection, and wall following are crucial for animals with antennae, vibrissae or other sensitive hairs that live in dark, confined environments, like the crayfish [35]. However, this sense is not really relevant for long-distance navigation.

¹ Optical flow represents how fast the world moves in the visual system – this concept will be described in Section 2.4 in more detail.


• Inertial sensing: Several insects, e.g. flies, have modified hind wings, so-called halteres, which are flapped rapidly and measure the angular velocity (like gyroscopes) based on the Coriolis forces [36]. This information allows animals to keep their head and, thus, their sensors stabilized about all three rotational axes. Some insects have a dedicated organ to measure accelerations, the so-called statocysts: a heavy body lying on a bed of mechanoreceptors that are sensitive to shearing forces. Additionally, the force exerted by the muscles also reveals applied forces. There is also evidence that the relative angle between thorax and abdomen is used to measure gravity. Further, there is a very widely distributed reflex, the so-called dorsal light reflex, which is used by many animals to define the vertical direction [37]. It uses the overall light distribution to control the roll orientation of the head. (A small sketch of how a measured gravity direction constrains the attitude in a technical system follows after this list.)

• Odor: Ants and also flying stingless bees use pheromones to mark trails, their nest and food sources. Many other animals mark their territorial boundaries with pheromones. The navigational problem that arises once an odor plume has been detected is to infer where the odor comes from, taking into consideration that it has been perturbed by wind and other disturbances [38]. Hence, odor is mainly used to detect artificial markers, e.g. pheromones, and as a communication channel between individuals, e.g. to reveal the mating state. However, there is some experimental evidence that odors form stable patterns at landscape scale, which are used by pigeons (doves) for navigation [39, 40, 41].

• Taste: Similar to odor, the taste sense can be used to detect artificial markers, like pheromones, or for other task-specific communication. One major difference between taste and odor is that a marker can only be detected at the actual marker location. This prevents far-distant perception, but it also avoids the localization problem of odor.

• Hearing: Sound is measured on the insect's surface by the so-called tympanal organ [42]. Special vibration-sensitive legs have also been shown in experiments to be responsible for sound perception, e.g. in scorpions and spiders [43]. Further experiments prove that sound or ultrasound is used as an active sense for mate attraction or predator avoidance in flying crickets [44], and bats, e.g., are also known to use acoustic landmarks for navigation [45].

• Wind sensing: Flying insects can measure the air speed with their antennae and with filigree hairs on their head. There is no experimental evidence that this information is fused with the visual odometry and orientation sensing for path integration, but making use of this sense would probably help.

• Range sensing: It is not clear to which extent insects use their stereo system for range sensing. The short baseline and the low resolution make stereo triangulation possible only within a few centimeters. Hence, it seems likely that insects do not rely much on stereo triangulation; only for close objects would stereo vision be applicable. Such binocular spatial localization has been experimentally shown for the praying mantis while catching prey [46]. However, while insects do not possess a dedicated range sensing organ, other animals, e.g. bats, do. The ultrasound of bats is like the tactile sensing of crawling insects – it reveals the geometry of the environment and is used for navigation and obstacle detection.
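As referenced in the inertial-sensing item above, gravity sensing (statocysts, the dorsal light reflex) gives an animal a reference for the vertical direction. In technical terms, a single quasi-static accelerometer reading constrains two attitude angles (roll and pitch) but not the heading. The following minimal sketch illustrates this relation; the axis and sign conventions and the quasi-static assumption are ours for illustration and are not taken from this thesis.

```python
import math

def roll_pitch_from_accel(ax, ay, az):
    """Roll and pitch (rad) from one accelerometer sample.

    Sketch only: assumes the sensor is quasi-static, so the measured
    specific force is dominated by gravity (the z axis reads about +g
    when level). Any translational acceleration corrupts the estimate,
    and the yaw angle is unobservable from gravity alone.
    """
    roll = math.atan2(ay, az)
    pitch = math.atan2(-ax, math.hypot(ay, az))
    return roll, pitch

# A sensor lying level reports roughly (0, 0, 9.81) m/s^2: both angles are ~0.
print(roll_pitch_from_accel(0.0, 0.0, 9.81))
```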

1.1.2 Biological Insights in the Navigation Behavior of Insects

In this section, we will try to outline how insects solve the navigation task with their senses and their limited processing and memory resources. We will focus on navigation strategies which do not rely on any communication and, hence, we neglect, e.g., the pheromone trails of ants and the famous "waggle" dance of bees, which is how bees communicate the location of foraging sites in the hive. These aspects are of major interest for research on swarm intelligence and are beyond the scope of this work. The non-interactive navigation strategies rely mainly on path integration by any kind of odometry (proprioceptive navigation) and on navigation by environmental perception (exteroceptive navigation). The following senses can be used to perceive cues in the environment for navigation: vision, odor, taste, hearing and range sensing. We will discuss the importance of each sensor in Section 1.1.3 and conclude that visual cues are sufficient to solve all tasks required for navigation. Several species of ants and bees seem to navigate successfully over long distances based only on their visual system and on path integration. Hence, in the following we will only discuss visual navigation concepts.

Path Integration

Path integration is the continuous integration of movements over an insect's journey. Cues that can be used for this process are, for instance, angular velocity, optical flow, compass information (e.g. the gravity vector), and any proprioception, like counting the number of steps. The fusion of all this information allows for accurate dead reckoning. The integration of the path is in general very robust, because it uses mostly motion cues which do not directly rely on the environment and, hence, are not affected by any external disturbances. Therefore, it is also the major fall-back solution in case the navigation based on environmental cues fails. The result of path integration is a global vector pointing to the starting point, e.g. the nest, or, if the path has already been traveled, to an arbitrary point of interest, e.g. a foraging site. This vector denotes a straight and, hence, the fastest way to the goal. A long exploration path can be shortcut on the way back simply by following this vector. However, other experiments show that insects also use local path integration vectors to connect landmarks.

Fiddler crabs (Uca vomeris), e.g., reveal the importance of path integration when fighting. They have only a small action space, but they need to find a hole in a flat world from a viewpoint close to the ground. Their path integration is very accurate and their only way to navigate. If such crabs are fighting, they use their large pincer not to hurt each other, but to lift each other off the ground. At the point they lose touch with the ground, their path integration fails and they are not able to find and hide in their holes anymore, which exposes them to predators [47, 48].
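To make the idea of path integration concrete, the following toy sketch accumulates a sequence of (heading, distance) increments – a compass bearing plus a stride length from step counting or a visually estimated distance – into a position and a home vector. It is purely illustrative and not an implementation from this thesis; real path integration fuses several noisy cues and, as discussed below, its error grows with the traveled distance.

```python
import math

def integrate_path(steps):
    """Dead-reckoning sketch of insect-style path integration.

    `steps` is a sequence of (heading_rad, distance) pairs. Returns the
    accumulated position as well as the bearing and length of the "home
    vector" pointing back to the starting point.
    """
    x = y = 0.0
    for heading, dist in steps:
        x += dist * math.cos(heading)
        y += dist * math.sin(heading)
    home_bearing = math.atan2(-y, -x)     # direction back to the start/nest
    home_distance = math.hypot(x, y)
    return (x, y), home_bearing, home_distance

# Outbound leg: 100 unit strides east, then 50 unit strides north.
outbound = [(0.0, 1.0)] * 100 + [(math.pi / 2, 1.0)] * 50
print(integrate_path(outbound))
```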

Visual Exteroceptive Navigation

However, for insects which forage in larger environments, path integration alone is not accurate enough for long-distance explorations. Any small error is continuously accumulated and, hence, the deviation from the proper direction and distance at the end of the journey increases with the traveled distance [49]. This is a systematic problem of any dead-reckoning-based method and cannot be avoided. The visual system allows insects to follow known paths and to localize within a familiar environment. In case visual navigation fails, insects seem to rely again on path integration. Once they are at the end of the vector and their visual navigation system is still lost, they initiate a species-specific search pattern to look for the goal they were heading to, e.g. the nest [50]. If an insect is passively displaced by, e.g., a gust of wind or an animal, the computed path integration will miss the goal by the magnitude of the displacement [51].

While path integration is rather well understood and verified, the various visual navigation concepts in insects still pose many open questions. The ocelli are also a visual sense, but there is no evidence that this organ is used for anything other than attitude control (sky compass) and, thus, they are not further discussed at this point. The compound eye is, therefore, the only organ providing detailed visual navigation information. Let us keep in mind that the view of a bee is completely different from what humans see. As can be seen in Fig. 1.1¹, the resolution is lower, but the field of view covers almost the whole sphere. It is difficult to realize an accurate body stabilization based on such a low resolution. Other organs, e.g. the halteres, are more likely to be used for this task. However, optical flow is clearly used to prevent drifting and to keep a certain height [52]. By assuming that its velocity is constant, a bee can simply stay at a certain altitude by keeping the optical flow constant. Another effect which one can note when observing insects is that flying against too strong a wind will force an insect to land. Moreover, if insects fly over a clear lake, they may drown, because they try to keep the optical flow of the ground at a certain level. There is also some work in the literature showing that, under some assumptions, it would be possible to estimate the egomotion based on optical flow and on measuring the air speed [53].

For visual navigation there is no need for high accuracy, but it is crucial to provide highly robust and reliable localization. As we will show and discuss in Section 8.2.1, the resolution is the major factor which determines the accuracy, and a field of view much larger than 120° worsens the precision for a constant resolution, but it helps to gain robustness and reliability. Many questions remain as to which visual information is gathered and how it is processed, memorized, and used by the insect's brain while navigating through the world. In the following, we will discuss experiments which give some insights into the exteroceptive navigation of insects. For that, we will formulate three questions and try to find the answers. The statements we make are mainly based on experiments on bees and ants, because an extensive survey of the problem would go beyond the scope of this work. However, for the sake of clarity, we will make general statements without delimiting the validity of every conclusion to the respective species.

¹ www.zf-laser.com

Figure 1.1: (a) Omnidirectional view; (b) bee-eye view. The left image shows an unwarped omnidirectional view rendered from the data acquired by a Z&F laser scanner. The right image shows the same view as seen through the eye of a bee. (Both images are by courtesy of Wolfgang Stürzl [30].)
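The constant-optical-flow altitude behavior described above can be phrased as a very simple feedback rule: the ventral (downward-looking) flow scales roughly with forward speed over height, so at constant speed, regulating the flow to a setpoint regulates the height. The sketch below is a toy proportional rule with made-up gains, units and setpoint, not a controller used in this work; it also shows why a mirror-like lake surface, which yields almost no measurable flow, fools such a rule.

```python
def altitude_command(flow_measured, flow_setpoint=1.0, altitude=2.0, gain=0.5):
    """One step of a toy 'keep the ventral optic flow constant' rule.

    flow ~ forward_speed / height: too much flow means the ground is too
    close and the agent should climb. All numbers are placeholders.
    """
    error = flow_setpoint - flow_measured
    return altitude - gain * error   # flow too high -> error < 0 -> climb

# Example: flow readings above the setpoint push the commanded height up.
h = 2.0
for flow in (1.4, 1.2, 1.05):        # hypothetical flow readings
    h = altitude_command(flow, altitude=h)
print(h)
```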


Do insects use the whole field of view or only single landmarks for pose estimation?

Humans have an optical system which provides high resolution for a small aperture angle (the fovea), improving the recognition of focused objects they are looking for. In contrast, the resolution in the optical system of insects does not vary as much over the whole FOV and is in general much lower. However, a large field of view allows the current location to be identified robustly, but does it provide the high accuracy necessary to find specific locations, like the nest entrance or a known feeding source? Experiments have shown that it is possible to approach a goal based on the gradient of image differences, once a set of images of the trajectory to follow is known [54, 55, 56]. For that, each view is subtracted from a so-called snapshot, an image which has been acquired while learning the trail or the nest environment, and by following the resulting gradient the insect reaches the position where the snapshot has been acquired. From there on, the next snapshot can be approached, and so on, until the insect finds the goal. Such global methods which use similarities between images are easy to realize as a neuronal network in a bee's brain. Further, they would also explain the learning flights of bees, where the insect backs away from the goal in a series of arcs that are roughly centered on the goal [57]. However, it is still not clear according to which criteria they decide when to acquire a new snapshot. A trade-off has to be found between enough similarity between the images to allow for an overlapping catchment area of the gradients and enough difference to be memory efficient.

While the latter conclusions would suggest that insects use their panoramic view to find a location, there are many other experiments showing that the visual navigation of insects is strongly related to specific landmarks in the environment [58, 59]. Thinking of an omnidirectional view, there are many possible landmarks, and the insect has to determine how much it relies on each feature. Thereby, it would make sense to weight the cues in the image according to their distance and according to how reliable they are. The visual localization accuracy in space is directly related to the distance of the objects in the field of view. An insect can measure the distance to a landmark by speed-based parallax or motion-based parallax [60]. According to [61], landmarks should be weighted based on the following three characteristics: salience – landmarks should be unique and easy to distinguish from other parts of the scene; permanence or reliability – landmarks and their position should be constant over time; relevance – a landmark should help to recognize important places or decision points. There is experimental evidence which shows that not all landmarks have the same influence on navigation, but that a sort of weighting of the information gathered from landmarks occurs [62, 63]. The question remains whether single landmarks are recognized as objects or whether the location of the corresponding cues is only encoded in the retina (so-called

retinotopic encoding). It has been empirically shown, e.g. with water striders [64] and ants [65], that insects follow a path by keeping a certain edge or point at a specific retinal position. Nevertheless, they are also able to relate landmarks to each other and to categorize them, as we will see in the next question.

Summarizing the results, we obtain the following insights: Insects use the panoramic view to find a location, but it seems that a weighting of the image areas is also involved. Strong landmarks are predominant and have more impact on the insect's navigation than the "background curtain". The panoramic, far-distant background provides a visual compass, but its drawback is that it does not reveal any translational information. Hence, close landmarks or texture are crucial to correct for translational drift. Honeybees, e.g., almost perfectly stabilize their head while flying, which allows them to fly purely translationally and to change direction by rotational saccades [66, 67, 68]. Thus, they behaviorally separate rotation from translation, which eases and robustifies vision-based motion computation significantly, as we will see in Section 5.1. Further, to keep a specific position they also seem to use so-called visual servoing techniques, where they keep image cues at predefined retinal locations.
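As a concrete illustration of the snapshot idea discussed above – descending the gradient of image differences toward the place where a reference view was stored – the following sketch probes small test displacements in simulation and moves toward the one that best matches the snapshot. The `render_view` callable (returning the panoramic image seen at a given 2D position) is a hypothetical stand-in for a simulator or image database, and the probing is a crude numerical surrogate for the gradient descent described in [54, 55, 56], not the insects' actual mechanism or a method of this thesis.

```python
import numpy as np

def homing_step(render_view, snapshot, position, step=0.05):
    """Move one step 'downhill' on the image difference to a stored snapshot.

    render_view(p) is assumed to return the grayscale panoramic image seen
    at 2D position p; snapshot is the image stored at the goal. Instead of
    an analytic gradient, four small test moves are evaluated and the one
    with the smallest root-mean-square image difference is kept.
    """
    def rms_diff(p):
        view = render_view(p).astype(float)
        return float(np.sqrt(np.mean((view - snapshot) ** 2)))

    directions = (np.array([1.0, 0.0]), np.array([-1.0, 0.0]),
                  np.array([0.0, 1.0]), np.array([0.0, -1.0]))
    candidates = [np.asarray(position, dtype=float) + step * d for d in directions]
    return min(candidates, key=rms_diff)
```

Repeating such steps until the difference stops decreasing, and then switching to the next stored snapshot, yields the route-following behavior sketched in the text.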

Which cues do insects use as landmarks and how are correspondences between them established?

Insects have been shown to rely on various visual cues, like color [69, 70, 71], apparent size, edge orientation [72] and the vertical center of gravity of a shape [73], but also on their spatial arrangement [74]. Usually, all these cues indicate the same origin or direction, but what happens if different cues provide contradictory information? Insects have been shown not only to average or interpolate between cues, but they are also able to generalize and categorize them [75]. They can, e.g., discriminate between differently oriented stripes, but at the same time they also generalize a pattern and "identify" it as stripes independent of its orientation. Experiments show that they do not only navigate based on the resulting probability of the most likely location, but they also take into consideration how reliable a location or direction is. If the information from the visual system is judged not to yield sufficiently probable navigation information and, hence, the insect is lost, it starts a search pattern.

Even though the different visual stimuli are apparently fused, there is no proof of a probabilistic fusion of landmark guidance and path integration. The latter serves as a fall-back strategy and scaffold for learning – there is no evidence that ants use landmarks to reset their current global path integration coordinates when they are displaced to a location with familiar visual stimuli [76, 77, 78]. Landmarks are primed by path

integration, and path integration is suppressed in landmark-rich environments [79]. Due to this lack of a close fusion between path integration and visual landmarks, we can assume that, on the one hand, landmarks are not used to improve the accuracy of path integration, while, on the other hand, global path integration is not used to provide landmarks with positional coordinates. There is no evidence that insects build geometric maps of their environment. Nevertheless, they use landmarks to trigger certain local path integration vectors with defined distance and direction, which were stored as a specific part of the route [60].

Summarizing, we can say that the tasks of path integration and navigation are separated. Thus, insects do not combine localization and (geometrical) map building, as is suggested and implemented by the popular SLAM (simultaneous localization and mapping) methods in robotics. Rather, topological maps are used to solve this task, although a large part of the biological community is of the opinion that insects do not rely on maps at all. Further, many cues of different types are used in a weighted way to allow for robust navigation. As soon as an insect is lost, it relies on fall-back algorithms.

How are paths or routes stored in the insect's memory?

Landmarks are route-specific – insects may use different routes when they go to or come from a certain location, even though the omnidirectional view would allow them to use the same landmarks [80]. They seem not to be able to invert a path if it is only known in one direction. An ant, e.g., which is on the way to the foraging site and becomes displaced to a different path which it knows only from going back to the nest, will follow this path to the nest instead of taking the inverse trajectory to the foraging location it was actually heading to [81]. Due to the panoramic view, the ant would have also seen the landmarks from behind, but apparently it cannot use them in that constellation. Humans seem to build a so-called cognitive map, which allows them to navigate based on the spatial relations of prominent features. Experiments led to the conclusion that bees use their path integration system, piloting system and image matching system in a non-cooperative and competing manner [82]. There is no indication of a level of combinatorial integration of the representations which would justify the assumption of the existence of cognitive maps in insects. They rather follow specific routes which they set up during exploration or by following a scout to the foraging location.

However, the location of landmarks influences how insects choose their trajectory [58] and, hence, it is quite probable that individuals of the same species heading to the same location will follow the same trajectory, even if it is not a straight line. This behavior seems to be a trade-off found by nature between memory efficiency and traveling efficiency. Much less abstraction capability is required and less information about

the environment needs to be stored when an insect simply moves from one landmark to the next instead of taking unknown shortcuts. Further, insects spend a variable amount of memory to recognize locations or paths, depending on how often they need it [60]. It seems that they refine their path every time, evaluating the reliability and, thus, the importance of landmarks for navigation. Of course, it is more important to find the nest again than a foraging site and, thus, any insect which lives in a hive or a nest spends some time memorizing the close surroundings of the nest before leaving it.

Concluding, we have seen that insects do not have a complex cognitive map which they can rely on, but rather a set of paths which they can follow based on landmarks. The location of the landmarks also influences the trajectory, which leads to the assumption that insects store routes as lists of landmarks or views. Landmarks can be related to a certain path, which allows the insects to pick up a known route if lost, but only in the direction in which they have stored the path. Following specific, landmark-related exploration strategies reduces the amount of information to be memorized.
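The route representation suggested by these findings – not a geometric map, but an ordered list of stored views, each optionally triggering a local path-integration vector – can be written down as a tiny data structure. The sketch below is our own schematic reading of the behavior described above; the field names and the perception/locomotion hooks are invented for illustration and are not a model proposed in this thesis.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class RouteSegment:
    snapshot_id: str                    # identifier of a stored view/landmark
    local_vector: Tuple[float, float]   # local displacement to travel once recognized

def follow_route(route: List[RouteSegment],
                 recognize: Callable[[str], bool],
                 move_by: Callable[[Tuple[float, float]], None]) -> bool:
    """Follow a route stored as an ordered list of landmark-triggered local vectors.

    `recognize(snapshot_id)` and `move_by(vector)` are hypothetical hooks into
    perception and locomotion. The route is one-directional, matching the
    observation that insects cannot simply invert a stored path.
    """
    for segment in route:
        if not recognize(segment.snapshot_id):
            return False   # lost: caller falls back to path integration or a search pattern
        move_by(segment.local_vector)
    return True
```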

1.1.3 Biologically Inspired Efficient Navigation System

The selection process in biological evolution guarantees that all the mechanisms we find nowadays in nature are superior – within the current constraints – to any previous biological solution. However, it should not be our aim to copy the biological systems, because we do not have the same application constraints or the same realization requisites in today's technology, and we cannot provide the same embodiment with our robots. Further, animals in general have constraints other than those of robotic applications. We should rather try to be inspired by the computational mechanisms which we can observe in nature and translate them to our robots and applications.

Sensory Equipment

Let us start with the question of which current technologies could realize the insects' senses and which of them are necessary for a robust but minimal system. In the following, we list the sensors corresponding to the senses of insects and assess them:

• Orientation sensing: It is possible to gain relative and absolute compass information by using cameras with or without a polarization filter, but it is currently much easier to provide a magnetic compass for global orientation estimation in robots. However, such compasses have difficulties in the vicinity of magnetic materials like, e.g., the iron in buildings. Furthermore, one can use the gravity


vector measured by accelerometers to determine two degrees of freedom of the attitude. This method fails as soon as a significant translational acceleration is applied.

• Vision: Cameras have become more and more popular in robotics due to their low power consumption, compactness, and low price, and due to the fact that they are noninvasive and accurate. However, only a few approaches are known which try to achieve visual systems with spatial and temporal resolution characteristics similar to the compound eye of insects [83, 84]. Most of the technological realizations of optical flow devices are one-dimensional arrays of photoreceptors [83]. There are first technological solutions which allow for a parallel processing of two-dimensional fields; up to now, however, they only exist in sizes up to 30 × 30 [85]. Another approach is to use a regular camera and a field-programmable gate array (FPGA) to realize parallel dense optical flow computation [86, 87, 88, 89], or a graphics processing unit (GPU). FPGA-based solutions require long development times, while GPUs suffer from high power consumption and heat generation. To allow an implementation on resource-limited embedded systems, one should for the time being focus on the development of sequential algorithms. Further, a common drawback of both processing units is that it is hard to realize complex, adaptive algorithms and, hence, they usually only compute the optical flow from image to image based on computationally simple algorithms, e.g. [86], and feed the control loop with the outcome [90]. However, this does not prevent drift, although drift prevention is actually a main advantage of optical sensors. The stabilization can also be provided by the inertial sensors, but motion estimation based on them inevitably drifts. Thus, one should not focus on visual localization for inner-loop control, which requires high sampling rates and accuracies, but use it for drift prevention. The catadioptric imaging system presented in [30] provides a 280° field of view, while the panoramic mirror in [91] has a smaller aperture angle but a uniform angular resolution in elevation. In general, cameras on robots are equipped with narrow field-of-view lenses, trying to reduce the accuracy limits introduced by the pixel discretization. This contradicts the observations of insect eyes. Humans overcome the drawback of a limited field of view at high resolution by extensive head and eye movements and by building an environmental model in their mind, which allows them to navigate despite seeing only a small part of the world. This is computationally expensive and beyond the minimalist navigation systems of insects. There are several algorithms that were designed for spherical views and, thus, are also more accurate and reliable for large aperture angles [92, 93].


In Section 8.2.1, we will see that the accuracy of a generic algorithm on projective images has an optimum for the aperture angle and decreases again for too wide angles, assuming a constant resolution. Anyway, a large FOV increases the robustness and reliability of motion estimation and especially of global localization. Further, it reduces the aperture problem, improves obstacle and object detection, and allows for the realization of a visual compass, as illustrated in Fig. 1.2.

Figure 1.2: This sequence shows images acquired by a catadioptric camera mounted on a quadrocopter. It would be straightforward to extract the horizon (note the slight irregularities and texture differences) which could then be used to estimate the full rotation and not only the yaw angle (visual compass).

In Section 1.1.1 we mentioned that the insect's eyes process images at a high rate. How relevant is this for visual navigation? If image cues move by several pixels between processing instances, data association is computationally much more expensive than if only neighboring pixels have to be evaluated. A neuronal network requires fewer interactions of neighboring neurons and becomes more robust if the optical flow vectors are short. In the machine learning community, e.g., it is well known that a neuronal net with a small area of interaction between neighboring neurons is preferable to a highly cross-linked one (broad vs. deep neuronal net). The distance of visual cues on the retina or the image plane depends on the velocity of motion and the processing rate. Hence, fast visual processing makes the computation of correspondences easier and reduces the number of possible matches. Otherwise, assumptions on the motion or the environment have to be made in order to disambiguate within a large set of possible matches. Hence, for a robust, efficient, and reliable motion estimation, a camera with a large FOV and high processing rates should be preferred.

• Proprioception: The current joint positions or the link sizes of a robot are in general well known. Lately, also impedance-based control has become possible due to respective force sensors in the joints [94, 95]. Such robotic concepts are especially interesting for, e.g., safety issues [96]. However, counting the revolutions per


minute of a wheel usually does not allow for accurate odometry estimation due to slippage and changing ground conditions.

• Tactile sensing: Tactile sensors are of great interest in robotics [97, 98]. However, they are only of little importance for unconstrained, long-distance navigation.

• Inertial sensing: Inertial sensors measure the derivatives of motion, which is crucial for highly dynamic control. Inertial measurement units (IMUs) consist of a gyroscope and an accelerometer for each of the three spatial axes. Their information can also be used for strapdown calculations (see Section 2.6) to improve path integration.

• Odor: Research on the development of odor sensors started in the 1990s and has become more popular with various promising approaches in recent years [99]. The navigation-relevant aspect of odor is quite application specific and not interesting for a general solution. There exists, e.g., some work on autonomous plume-tracking robots, which are intended to replace lobsters in sniffing out mines [100]. Odor as a communication channel can easily be replaced by any other communication technology, e.g. radio.

• Taste: The technological realizations of the sense of taste are far too complex and time consuming to be used in mobile robotics. Further, taste as a navigation strategy would only be relevant if it were feasible to set artificial markers. Hence, this sense can be neglected for our considerations.

• Hearing: Acoustic signals were first used for navigation in underwater vehicles [101], but nowadays they are also used on mobile robot platforms to localize relative to sound sources [102]. However, the achievable accuracy is not as good as for visual or range sensing devices and, thus, hearing is in general only used to direct the attention of the robot rather than to localize it.

• Wind sensing: For flying systems it might be of interest to measure the air speed and, thus, their velocity relative to the wind. Such anemometers are available as small assemblies. However, depending on the aircraft it might be difficult to measure the wind – the spinning rotors of a helicopter, e.g., make such a measurement difficult.

• Range sensing: Range finders are quite common in robotics. Light or laser detection and ranging (LIDAR/LADAR), time-of-flight (TOF) cameras, sonar, and


stereo camera systems are a few sensors which measure distances to the environment. While most of these devices are large or need a complex setup, sonar is quite small and lightweight. Although not very accurate, its measurements can be used for reliable obstacle detection. Unlike infrared (IR) diodes, sonar can also detect glass and it is not influenced by the presence of sunlight. Range measurements can be used in a straightforward manner to build geometric maps, which are easy to understand and to navigate on. However, most of these devices are active sensors and they all suffer from a limited measurement range.

Three crucial tasks of a mobile robot system in a general environment can be identified as application and system independent: motion estimation to control the robot, navigation in its environment, and obstacle detection and avoidance. In the next three paragraphs we evaluate the relevance of the mentioned sensors for these tasks.

Low-level motion estimation

IMUs are the most important sensors for highly dynamic control in mobile robots. The compactness, low power consumption, and high sample rates of these sensor units make them superior to any other sensor available on the market for this purpose. The major drawback of this sensor is that it only measures the motion derivatives, namely angular velocity and translational acceleration, which makes it prone to drift. The roll and pitch angles can be stabilized using the gravity vector measured by the accelerometers. Wind sensing is, except for special tasks, not useful for ground robots and is therefore disregarded in our considerations. Range sensors measure the distance to objects within a certain range. If these objects are static, a mobile system can estimate its motion based on these measurements. A monocular camera allows estimating the rotation and the translation up to scale. The scale of the translation depends on the distance of the visible landmarks. The crucial requirement is texture - if there is no visible structure, an image-based motion estimation becomes impossible. A camera is actually an instrument which measures angles (a goniometer). Due to the projection of the 3D world onto a 2D image surface, one dimension gets lost and can only be partly recovered by using several measurements. This loss of one dimension and the large amount of data make visual motion estimation algorithms in general complex, error-prone, and slow. Orientation sensors, like a magnetic compass or visual orientation estimation techniques, can further improve the robustness of motion estimation. A magnetic compass, e.g., can estimate the yaw angle, which cannot be determined from the gravity vector measured by the IMU. For mobile robots, proprioception is only relevant for odometry estimation in case the robot is ground-based.


High-level navigation

There are various visual cues which can be used for landmark-based navigation. Furthermore, there are algorithms which use image differences to follow certain paths. Odor and taste are only useful if artificial markers are available. Passive acoustic localization also requires external sound sources, which are not always provided. We are aiming for a more generic solution and, hence, we neglect these senses. Haptic sensors are interesting for close-distance exploration, narrow spaces, or manipulation, but they are not very useful for navigation in the air, e.g., for flying systems. The ground-based odometry estimates of proprioceptive sensors are prone to drift, but they have the advantage that they do not rely on the environment and, thus, are always available. Geometric cues from range sensors are usually not as reliable and robust for navigation, because they provide too little variation at a noise-robust level of detail.

Obstacle detection

Visual systems provide obstacle detection in terms of the time to impact, defined by the length of the optical flow vectors. However, again only objects with texture can be seen. Probably the most important directive for a mobile robot is not to harm a human being. Apart from that, robots are usually rather fragile and do not have a chitinous exoskeleton like insects, which allows the latter to bump into obstacles if vision fails (in which case tactile sensing starts playing a role). Hence, a fall-back sensor for obstacle avoidance which replaces the chitin shell would be useful. Obstacle detection is most easily provided by range sensing devices. For that we do not need a high resolution or accuracy - a noisier but compact and energy-efficient sensor like sonar is sufficient for this purpose and could provide a sort of "virtual chitinous skin", which allows the software to detect a "virtual bumping" into an object and prevent the real contact.

Concluding, we can say that a camera alone could solve all three tasks in a generic environment, but based on one camera alone it is not possible to estimate the proper scale, which is required by the control loop. Moreover, vision is not applicable to highly dynamic systems due to motion blur and rather low sampling rates. Like every sensor, cameras also have drawbacks, e.g., environments with changing light conditions, poor texture, or repetitive patterns make the feature association quite error-prone. Hence, we define a minimal sensory equipment to reliably and robustly solve the navigation task as an IMU and a camera, as seen in insects. Cameras allow for a relative referencing, which prevents drift as long as the same landmarks are visible, while IMUs permit robust motion estimates also at high dynamics. Further, IMUs may drift, but they never fail and always provide measurements at a constant rate. They also allow one to

estimate the proper motion scale. Both are compact and passive sensors which perfectly complement each other. Very high camera frame rates would be necessary to replace an IMU by a visual system, and one would also lose the reliability gained by a fail-safe sensor. Anyway, high frame rates should be preferred to ease the data association problem in landmark tracking. A wide aperture angle, as in omnidirectional cameras, provides robust and reliable navigation and obstacle detection. Range sensors provide a very accurate localization and, thus, a reliable motion estimation in indoor or object-dense environments (e.g. forests, cities, etc.), where enough measurements are provided by close landmarks. To increase the robustness of the system in case of failure, an obstacle detection based on range sensors, e.g. sonar, is recommendable. Any further sensor, like, e.g., a magnetic compass or a barometer, would of course improve the motion estimation and increase robustness, but is beyond a minimalist sensor concept. To navigate between two points in space, only as much accuracy in odometry estimation is necessary as is required to provide a robust landmark association for exteroceptive navigation. Any drift of the egomotion estimation can then be compensated by the navigation algorithms.

Design of Biologically Inspired Algorithms

Unlike the algorithms proposed and implemented by the SLAM community, insects seem to keep the tasks of motion estimation and landmark- or map-based navigation separated. They have techniques to estimate their egomotion and they have methods to memorize paths based on various landmarks and cues. They use basic principles, like global and local path integration, to achieve high robustness and reliability in navigation while keeping the processing costs low. Further, they do not build a geometric map, but rather store their paths separately and access them independently. This allows for a memory-efficient navigation which is highly scalable. However, it is no longer efficient as soon as there are too many, partly overlapping trajectories - some landmarks would be stored redundantly. In such cases a cognitive map would be more flexible (as, e.g., proposed by [103]). One major characteristic of the visual navigation of insects is that they adapt their motion to their neuronal processing ("algorithms"). They plan their trajectory continuously based on their current view. This relaxes the motion problem: when repeating a path, an insect does not even have to memorize how it moved the first time, but just has to follow the same usual strategy and it will move the same way as before. It is not clear from the experiments whether insects have implemented such

a concept, which would ease memorization a lot. An indication for it is that different ants choose almost the same paths to a certain destination, which solely depend on the available landmarks. On the other hand, the experiments also show that insects memorize local vectors which describe the distance and direction from each landmark onwards. A combination of both techniques increases robustness and reliability. We stated that a visual system should run at high speed, because the sensory system has to fit the dynamics of the motion, which eases processing. While insects evaluate every photoreceptor ("pixel") in parallel, conventional CPUs process the large amount of image data sequentially. Hence, it is crucial to reduce the number of pixels to process by selecting only pixels which contain relevant information. There is no need to process pixels showing a white wall, because no motion information can be gathered from them. By a sparse image evaluation, the processing costs per image are reduced, which allows for much higher frame rates with all the advantages mentioned before. By physically separating rotation from translation, the motion estimation in insects becomes much easier than in the general case. Such a separation would significantly reduce the processing costs for motion estimation and increase its accuracy.

[Figure 1.3: flow diagram. Layers from left to right: physical sensors (standard, fisheye, and catadioptric camera; IMU with ω and a+g outputs), offline calibration and sensor abstraction (camera calibration with goniometer abstraction, IMU-camera registration, IMU calibration), preprocessing (optical flow estimation, image differences, landmark/cue extraction, strapdown calculation), and navigation layer (obstacle detection and avoidance, visual homing, visual compass, pose estimation and fusion with error propagation, map management with map error propagation, location estimation, motion estimation).]

Figure 1.3: This flow diagram illustrates significant software modules for an efficient autonomous robot system based on a camera (red) and an IMU (blue). Basically all tasks can be achieved by processing the camera measurements only. The IMU, however, increases accuracy and robustness, especially at high dynamics, as well as the system's reliability, because it is a fail-safe sensor which does not depend on the environment. The navigation map can be based on arbitrary sensors and, thus, these modules are colored in green.

Fig. 1.3 gives an overview of the required software modules for our proposed insect-inspired navigation. In this work, we are going to focus on the motion estimation

task, which involves feature detection and tracking, IMU-camera calibration, and sensor fusion, with the aim of achieving high frame rates also on resource-limited mobile platforms by developing efficient image processing algorithms. A more detailed overview is given in the next section.

1.2 Outline – Towards Efficient Inertial-Visual Navigating Robots

In this thesis, we do not reason about how to implement special hardware; instead, from an algorithmic point of view, we start at the lowest level, evaluating available methods and comparing them to our own solutions. Keeping the proposed, biologically inspired minimal system in mind (see Fig. 1.3), we focus on the following modules: IMU-camera registration and fusion, optical flow computation, and the pose estimation from it as well as its error propagation. Furthermore, we touch on the topics of visual homing, obstacle detection, and obstacle avoidance based on optical flow. Our algorithms have been developed to allow for a reliable navigation also on resource-limited mobile platforms. They are designed under the premise of efficiency and reliability - aiming for minimalist systems providing adequate robustness. To preserve the generic aspects of the various algorithms, we will introduce them in a top-down manner. In order to accurately fuse an IMU and a camera, they have to be temporally and spatially registered. We discuss the importance of proper temporal alignment and present solutions for it. Moreover, a closed-form solution for the spatial alignment is introduced, which can be used for a rough registration or for the initialization of our novel batch-optimization based approach, which is then compared to state-of-the-art methods, proving the accuracy of the algorithm. The sensor measurements are fused in two ways, either by aiding landmark tracking or in an extended Kalman filter. Aiming for an efficient image-based pose estimation, we analyze our own, biologically motivated approach and various variants of the eight-point algorithm. We evaluate them with respect to robustness and accuracy under the aspects of aperture angle, the number of features, and their detection accuracy. Furthermore, we derive a first-order error propagation, which estimates the expected motion accuracy based on the available feature correspondences. We have identified high frame-rate image processing as one major key to robust and reliable visual motion estimation. The bottleneck in image-based motion computation is usually the low-level processing of the image. Only by an efficient detection of feature correspondences can we provide high motion update rates. Hence, we will

again enhance state-of-the-art tracking algorithms and evaluate our own solutions. The latter are developed based on the outcome of our accuracy evaluations and under the constraint of high frame rates. This new feature tracker can handle several hundreds of features in a few milliseconds on low-power processing units. The presented algorithms are used in various applications, like online 3D modeling, virtual environment modeling for surprise detection, and autonomous quadrocopter flights. This yields the following structure for the rest of this thesis. First, we introduce some mathematical concepts in Chapter 2, and related work is discussed in Chapter 3. Next, we start with our contributions, presenting techniques for IMU-camera registration and fusion in Chapter 4. In Chapter 5, we introduce our image-based motion estimation methods and derive the first-order error propagations in Chapter 6. The last theoretic chapter, Chapter 7, tackles our low-level image processing with feature extraction and tracking. In Chapter 8, we experimentally validate the proposed methods, and we present some applications based on our algorithms in Chapter 9. Chapter 10 concludes this thesis and considers future work.

Chapter 2

Mathematical Framework and Concepts

In the following, the mathematical framework is introduced and some required mathematical concepts are sketched. The mathematical notation is explained and summarized in the section of the same name after the appendix. In addition, the illustration of samples by boxplots is explained in Appendix A.2.

2.1 Rigid Body Transformations and Coordinate Frames

Let P be a point in the three-dimensional Euclidean space, which is represented globally by a Cartesian coordinate frame, such that $P \in \mathbb{R}^3$ and the elements of $P = (x_P\ y_P\ z_P)^T$ denote the Cartesian coordinates. Any rigid transformation in 3D space can then be represented by a rotation and a translation [104, 105]:

$$P' = R P + t \,, \qquad (2.1)$$

where $R \in SO(3, \mathbb{R})$ represents the rotation and $t \in \mathbb{R}^3$ the translation. By extending the point representation to the so-called homogeneous form $P = (x_P\ y_P\ z_P\ 1)^T$, the rigid body transformation can be expressed by a single transformation matrix:

$$P' = T P \,, \quad \text{with} \quad T = \begin{pmatrix} R & t \\ 0 & 1 \end{pmatrix} \qquad (2.2)$$

Because it will always be clear from the context, we refrain from marking point representations according to whether they are homogeneous or not.


Within this work we assume that the motion of all points in space is the same and, thus, that they are all part of a rigid body. The question remains which coordinate frame is used as reference for the description of motion or the indication of points. For example, each sensor measures relative to its own coordinate system. In order to combine different measurements, the alignment between these coordinate frames has to be known. First, let us assume a global (world) coordinate frame W defining the global origin and the global x-, y-, and z-directions by its X-, Y-, and Z-axis. The alignment between each couple of coordinate frames can be described by a transformation matrix ${}^A T_B$, where this matrix denotes the transformation from B to A. Leading superscripts denote the frame of reference, while successive superscripts denote the sensor which acquired the measurements. Hence, e.g., the rotational velocity at time instance $t$, measured by a camera but expressed in the IMU coordinate system I, is described as ${}^I\omega^C_t$. In the case of translations, we require two successive subscripts to express the start and end frame of the translation. Thus, e.g., ${}^W t_{CI}$ denotes the translation from the camera frame to the IMU frame expressed in the world frame. In Eq. 2.1 we assume that the points move relative to the frame of reference. However, in mobile robot navigation the main task is to estimate the motion of the robot relative to some fixed landmarks. Thus, the landmarks are assumed to be static, while the sensors on the robot experience the motion. Hence, the motion of the points can be explained by the inverse of the motion of the sensor which measures the landmarks, leading to the following motion equation and the inverse transformation matrix

$$P' = R^T (P - t) \quad \text{and} \quad T^{-1} = \begin{pmatrix} R^T & -R^T t \\ 0 & 1 \end{pmatrix} \,. \qquad (2.3)$$

We will choose between the representations in Eq. 2.1 and Eq. 2.3 depending on which formulation is more adequate and eases the understanding.
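To make the use of Eqs. 2.1–2.3 concrete, the following minimal numpy sketch builds a homogeneous transformation from a rotation matrix and a translation vector, applies it to a point, and inverts it. The function names and the example rotation are illustrative only and not part of the thesis.

```python
import numpy as np

def make_transform(R, t):
    """Stack R (3x3) and t (3,) into a 4x4 homogeneous transformation T (Eq. 2.2)."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def invert_transform(T):
    """Inverse of a rigid transformation, cf. Eq. 2.3: rotation R^T and translation -R^T t."""
    R, t = T[:3, :3], T[:3, 3]
    T_inv = np.eye(4)
    T_inv[:3, :3] = R.T
    T_inv[:3, 3] = -R.T @ t
    return T_inv

# Example: rotate 90 degrees about the z-axis and translate along x.
Rz = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
T = make_transform(Rz, np.array([1.0, 0.0, 0.0]))
P = np.array([1.0, 0.0, 0.0, 1.0])          # homogeneous point
P_prime = T @ P                              # Eq. 2.2
P_back = invert_transform(T) @ P_prime       # recovers P
```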

2.2 Rotation Representations

There are several ways in which a rotation may be expressed. The different representations and their conversions are fairly well described in the literature [105]. Hence, we will focus on the formulas and relations required in the next chapters. The following representations will be used: Euler angles, direction cosine matrices (DCMs), Euler angle-axis, Tsai angle-axis, and quaternions. To resolve the ambiguity of sign in the angle-axis representations, the absolute angle is chosen to be always positive. The Tsai angle-axis representation p (as described

in [106]) of a rotation with absolute angle $\theta$ and unit-length rotation axis $\hat{p}$ can be computed by

$$p = 2 \sin\frac{\theta}{2}\, \hat{p} \ \big|\ \theta > 0 \qquad (2.4)$$

A DCM is denoted by $R$, which has the characteristic of being orthonormal, while a quaternion is usually represented by $q$ and has the property of having unit length. The fourth element of the quaternion, $q_4$, represents $\cos\frac{\phi}{2}$, where $\phi$ is the absolute rotation angle, and is, thus, the only real parameter. A bar over a quaternion denotes the inverse rotation

$$\bar{q} = \begin{pmatrix} -q_1 \\ -q_2 \\ -q_3 \\ q_4 \end{pmatrix} \,. \qquad (2.5)$$

The operator $\odot$ denotes the quaternion product so that ${}^C q_A = {}^C q_B \odot {}^B q_A$. The conversion from an arbitrary rotation representation into a DCM or a quaternion is described by the functions $R = \mathrm{DCM}(\ldots)$ and $q = \mathrm{Q}(\ldots)$, respectively.
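As a small illustration of these conventions, the sketch below converts an angle-axis rotation into a quaternion and into the Tsai vector of Eq. 2.4, and implements the conjugate of Eq. 2.5 and the product $\odot$. The component ordering $(x, y, z, w)$ with the scalar part last follows the convention described above; the function names are assumptions for the example.

```python
import numpy as np

def quat_from_angle_axis(theta, axis):
    """Unit quaternion (x, y, z, w) from angle theta and rotation axis."""
    axis = axis / np.linalg.norm(axis)
    return np.append(np.sin(theta / 2.0) * axis, np.cos(theta / 2.0))

def tsai_from_angle_axis(theta, axis):
    """Tsai angle-axis vector p = 2 sin(theta/2) * axis, cf. Eq. 2.4."""
    axis = axis / np.linalg.norm(axis)
    return 2.0 * np.sin(theta / 2.0) * axis

def quat_conjugate(q):
    """Inverse rotation of a unit quaternion, cf. Eq. 2.5."""
    return np.array([-q[0], -q[1], -q[2], q[3]])

def quat_mult(q2, q1):
    """Quaternion product q2 (x) q1, i.e. rotation q1 followed by q2, convention (x, y, z, w)."""
    x1, y1, z1, w1 = q1
    x2, y2, z2, w2 = q2
    return np.array([
        w2 * x1 + x2 * w1 + y2 * z1 - z2 * y1,
        w2 * y1 - x2 * z1 + y2 * w1 + z2 * x1,
        w2 * z1 + x2 * y1 - y2 * x1 + z2 * w1,
        w2 * w1 - x2 * x1 - y2 * y1 - z2 * z1,
    ])
```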

Small Angle Approximations

Sometimes it is helpful to use approximations for small rotations. In the following we list a few relations which will be used later on:

$$\delta q = \begin{pmatrix} x_\sigma \sin\frac{\sigma}{2} \\ y_\sigma \sin\frac{\sigma}{2} \\ z_\sigma \sin\frac{\sigma}{2} \\ \cos\frac{\sigma}{2} \end{pmatrix} \underset{\sigma \approx 0}{\approx} \begin{pmatrix} x_\sigma \frac{\sigma}{2} \\ y_\sigma \frac{\sigma}{2} \\ z_\sigma \frac{\sigma}{2} \\ 1 \end{pmatrix} \,, \qquad (2.6)$$

where $\sigma$ is the absolute rotation angle, $\hat{\sigma} = (x_\sigma\ y_\sigma\ z_\sigma)^T$ is the unit-length rotation axis of the rotation $\sigma$, and $\delta q$ denotes a small quaternion rotation. Such quaternions can then be represented by only their imaginary parts

$$\delta\tilde{q} = \begin{pmatrix} \delta_{q;1} \\ \delta_{q;2} \\ \delta_{q;3} \end{pmatrix} \overset{!}{\underset{\sigma \approx 0}{=}} \begin{pmatrix} x_\sigma \frac{\sigma}{2} \\ y_\sigma \frac{\sigma}{2} \\ z_\sigma \frac{\sigma}{2} \end{pmatrix} \,. \qquad (2.7)$$

Moreover, we use

$$\Delta R \underset{\sigma \approx 0}{\approx} I_3 + [\sigma]_\times \underset{\sigma \approx 0}{\approx} I_3 + 2\,[\delta\tilde{q}]_\times \quad \text{and} \quad \Delta R^T \underset{\sigma \approx 0}{\approx} I_3 - [\sigma]_\times \underset{\sigma \approx 0}{\approx} I_3 - 2\,[\delta\tilde{q}]_\times \,, \qquad (2.8)$$

where $\Delta R$ is a small rotation expressed as a DCM.
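The quality of the first-order approximation in Eq. 2.8 is easy to check numerically; the following sketch (with an arbitrarily chosen small rotation) compares the exact DCM from Rodrigues' formula against $I_3 + [\sigma]_\times$.

```python
import numpy as np

def skew(v):
    """Skew-symmetric cross-product matrix [v]_x."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

# Small rotation vector sigma (angle times unit axis), e.g. 0.01 rad about z.
sigma = np.array([0.0, 0.0, 0.01])
theta = np.linalg.norm(sigma)
axis = sigma / theta

# Exact DCM via Rodrigues' formula.
K = skew(axis)
R_exact = np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * K @ K

# First-order approximation of Eq. 2.8.
R_approx = np.eye(3) + skew(sigma)
print(np.max(np.abs(R_exact - R_approx)))   # on the order of theta^2 / 2
```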


2.3 Camera Model

In this work, we make use of the pinhole camera model [107, 108]. A pinhole camera provides a perspective projection from 3D to 2D. Any point P in 3D space is projected onto the image plane according to the following formula

$$\tilde{r} = f\,\frac{P}{z_P} = f \begin{pmatrix} \frac{x_P}{z_P} \\ \frac{y_P}{z_P} \\ 1 \end{pmatrix} \overset{\mathrm{def}}{=} f r \qquad (2.9)$$

where $r$ and $\tilde{r}$ denote the projections of the 3D point onto the unit focal plane and the real image plane in the camera coordinate system, respectively. The exact pixel location in the image coordinate frame, with pixel coordinates $u$ and $v$, can thus be computed by

$$p = \begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} \frac{f}{s_x}\frac{x_P}{z_P} + o_x \\ \frac{f}{s_y}\frac{y_P}{z_P} + o_y \end{pmatrix} = \begin{pmatrix} f_x \frac{x_P}{z_P} + o_x \\ f_y \frac{y_P}{z_P} + o_y \end{pmatrix} \qquad (2.10)$$

with $s_x$, $s_y$ being the pixel size in x- and y-direction, while $o_x$, $o_y$ represent the pixel offset of the principal point from the upper left corner, which is the origin of the image coordinate frame C. This relation can also be written in matrix form, resulting in the so-called intrinsic camera parameter matrix

$$K = \begin{pmatrix} f_x & s & o_x \\ 0 & f_y & o_y \\ 0 & 0 & 1 \end{pmatrix} \,. \qquad (2.11)$$

The skew factor $s$, which allows modeling a coupling between the X- and Y-axis, can usually be neglected, because such a coupling only exists if the chip is not mounted parallel to the lens, as usually only occurs in cameras of low quality. Hence, this factor has not been listed in Eq. 2.10 for the sake of clarity. However, a further step is necessary to provide the nonlinear mapping of the division:

$$p = \begin{pmatrix} \frac{u'}{w'} \\ \frac{v'}{w'} \end{pmatrix} \,, \quad \text{with} \quad \begin{pmatrix} u' \\ v' \\ w' \end{pmatrix} = K r \,. \qquad (2.12)$$

The intrinsic parameters also comprise the radial distortion introduced by the lens. In this work, we in general assume already undistorted points, if not mentioned otherwise. The transformation parameters, a rotation ${}^C R_W$ and a translation ${}^C t_W$ or a


homogeneous transformation matrix ${}^C T_W$, denoting the spatial alignment of a camera to its frame of reference (e.g., the world coordinate frame, a second camera, a robot, or another device), are called extrinsic parameters. Fig. 2.1 sketches the pinhole camera model with the involved coordinate frames.


Figure 2.1: This drawing illustrates the camera projection with the intrinsic and extrinsic camera parameters. While the real image plane lies behind the optical center (the pinhole, the point through which all rays pass), it is usually easier to imagine a plane in front of the optical center. In this plane all objects appear upright and not mirrored as in the real image plane.

We calibrate the intrinsic and extrinsic parameters of the camera with CalDe and CalLab [109].
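The projection chain of Eqs. 2.9–2.12 can be summarized in a few lines of code. The sketch below uses made-up intrinsic parameters purely for illustration and assumes an undistorted, calibrated camera.

```python
import numpy as np

def project(P_cam, K):
    """Project a 3D point given in camera coordinates onto the image plane (Eqs. 2.9-2.12)."""
    r = P_cam / P_cam[2]                 # point on the unit focal plane
    uvw = K @ r                          # apply the intrinsic parameter matrix
    return uvw[:2] / uvw[2]              # pixel coordinates (u, v)

# Illustrative intrinsics (focal lengths and principal point in pixels).
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])

P_cam = np.array([0.2, -0.1, 2.0])       # point 2 m in front of the camera
print(project(P_cam, K))                 # -> [370. 215.]
```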

2.4 Optical Flow

Optical flow is the pattern of apparent motion in a visual scene caused by the relative motion between an observer, e.g. an eye or a camera, and the scene [107, 108]. Each element of the observer gets connected to the element where the projected point in space appeared in the last time-step, which yields a dense vector field between two images.

These so-called optical flow vectors contain information about the relative rotation, translation, and distance of the observer to any object in the scene. The length of an optical flow vector is defined by the ratio between the relative velocity of the projected point and its distance; its inverse thus represents the time-to-collision. This information can be used for obstacle detection. Sparse optical flow denotes the variant where the optical flow is computed only for a few elements (e.g. pixels of an image) and not for the whole projection (dense optical flow). Assuming a static environment, this concept can also be used for motion estimation. If only a translation is applied, all optical flow vectors intersect in two points on a sphere, the focus of expansion (FOE) and the focus of contraction (FOC), whereas at most one of them is visible in pinhole cameras, where it is called the epipole. The axis through the optical center and the FOE denotes the direction of motion; thus, the orientation of the optical flow vectors relative to the epipole reveals the sign of the direction of motion. If the vectors point away from the epipole, the motion is towards it (the epipole equals the FOE), and otherwise away from it (the epipole equals the FOC). If a general motion is applied, the computation of rotation and translation becomes more complicated, but it is still possible by closed-form solutions, as we will see in Section 3.3.
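The geometric statement that, under pure translation, all flow vectors pass through the FOE can be exploited directly. The following sketch is not taken from the thesis; it simply intersects the flow lines of synthetic, purely translational flow in a least-squares sense to recover the FOE. All numbers are fabricated for illustration.

```python
import numpy as np

def estimate_foe(points, flows):
    """Least-squares intersection of flow lines: under pure translation every
    optical flow vector lies on a line through the focus of expansion (FOE)."""
    # Normal of each flow line (perpendicular to the flow direction).
    normals = np.stack([-flows[:, 1], flows[:, 0]], axis=1)
    normals /= np.linalg.norm(normals, axis=1, keepdims=True)
    A = normals                              # N x 2
    b = np.sum(normals * points, axis=1)     # N
    foe, *_ = np.linalg.lstsq(A, b, rcond=None)
    return foe

# Synthetic example: forward motion with the FOE at pixel (320, 240).
rng = np.random.default_rng(0)
pts = rng.uniform([0, 0], [640, 480], size=(100, 2))
flow = 0.05 * (pts - np.array([320.0, 240.0]))   # expansion away from the FOE
print(estimate_foe(pts, flow))                   # -> approx. [320. 240.]
```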

2.5 RANSAC

The Random Sample Consensus algorithm (RANSAC) is an iterative framework to find the parameters of a given model from a data set containing outliers [110]. In each iteration, a minimal data subset is randomly chosen to calculate the model parameters. This model is then applied to the remaining data set and the elements are split into fitting elements (inliers) and non-fitting elements (outliers). This step is repeated several times and the best model, according to the number of inliers and their residual error, is chosen. Preemptive RANSAC is a variant which allows the algorithm to leave the loop if a certain quality criterion is met before the maximum number of iterations is reached.
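A generic RANSAC loop is only a few lines long. The sketch below keeps the hypothesis with the largest inlier set (inlier count only, for brevity); the robust line fit at the end is an invented toy example, not an experiment from this work.

```python
import numpy as np

def ransac(data, fit_model, residuals, sample_size, threshold, iterations=100):
    """Generic RANSAC skeleton: repeatedly fit a model to a minimal random
    sample and keep the hypothesis with the most inliers."""
    rng = np.random.default_rng()
    best_model, best_inliers = None, np.zeros(len(data), dtype=bool)
    for _ in range(iterations):
        sample = rng.choice(len(data), size=sample_size, replace=False)
        model = fit_model(data[sample])
        inliers = residuals(model, data) < threshold
        if inliers.sum() > best_inliers.sum():
            best_model, best_inliers = model, inliers
    return best_model, best_inliers

# Toy example: robust line fit y = a*x + b with roughly a third of the points corrupted.
x = np.linspace(0, 10, 100)
y = 2.0 * x + 1.0
y[::3] += np.random.default_rng(1).uniform(-20, 20, size=len(y[::3]))
data = np.stack([x, y], axis=1)

fit = lambda s: np.polyfit(s[:, 0], s[:, 1], 1)            # minimal sample: 2 points
res = lambda m, d: np.abs(np.polyval(m, d[:, 0]) - d[:, 1])
model, inliers = ransac(data, fit, res, sample_size=2, threshold=0.5)
print(model)   # close to [2.0, 1.0]
```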

2.6 IMU Strapdown Computation

An inertial measurement unit measures only 3D motion derivatives: the angular velocity, as the first derivative of the attitude, is provided by the gyroscopes, and the translational acceleration, as the second derivative of the position, is gathered by the accelerometers. These measurements are acquired in the IMU frame (or body frame) I, the coordinate frame which is rigidly attached to the sensor unit and, thus, to the

moving platform. The measurements are taken with respect to a fixed reference frame F, the so-called inertial frame, which has its origin in the center of the Earth and a rigid orientation with respect to the fixed stars. Another important reference frame is the navigation frame N. Its origin corresponds to the one of I, but its X- and Y-axis are aligned with the North and East direction, respectively, while the Z-axis points to the Earth's center.1 The accelerometers, however, measure not only the forces due to translational acceleration, but also the Earth's gravity force and the Coriolis force, which results from the Earth's rotation. In a strapdown computation, one computes the pose from the measurements of an IMU [111]. For that, the rotation and the translational velocity have to be propagated from the former measurements, and the Coriolis and gravity force compensated motion derivatives are integrated. In the presence of vibrations, and if the strapdown computation runs slower than the IMU sampling rate, other effects, like coning and sculling, also play an important role (the interested reader is referred to [111] for further explanations). For the sake of compactness, weight, and price, we will use sensors which are based on the technology of micro-electro-mechanical systems (MEMS), but which are noisier than, e.g., optical IMUs and suffer from biases. We assume to sample the IMU at its maximum rate and, hence, we will neglect the aforementioned effects as well as the influence of the Earth's rotation in our strapdown computation. This results in the following strapdown equations. The attitude quaternion at time-step k + 1 can be computed by

$${}^N q_{I;k+1} = {}^N q_{I;k} \odot {}^{I;k}q_{I;k+1} \qquad (2.13)$$

where ${}^{I;k}q_{I;k+1}$ results from the measured rotational velocity ${}^I\omega^I$, which is assumed to be constant over the sample time $\Delta t$, by the small-angle approximation

$${}^{I;k}q_{I;k+1} = \begin{pmatrix} \frac{\sigma}{\|\sigma\|} \sin\frac{\|\sigma\|}{2} \\ \cos\frac{\|\sigma\|}{2} \end{pmatrix} \quad \text{with} \quad \sigma = {}^I\omega^I \Delta t \,. \qquad (2.14)$$

The new position of the body frame ${}^N t_{FI;k+1}$ can be computed by integrating the current velocity ${}^N v_{FI;k}$ and the measured translational acceleration ${}^N a_{FI;k+1}$, which is also assumed to be constant over the sampling time,

$${}^N t_{FI;k+1} = {}^N t_{FI;k} + {}^N v_{FI;k}\,\Delta t + \frac{1}{2}\,{}^N a_{FI;k+1}\,\Delta t^2 \qquad (2.15)$$

1For the sake of simplification a fixed world reference frame W might be assumed in other chapters with an arbitrary origin and orientation.

where

$${}^N v_{FI;k+1} = {}^N v_{FI;k} + \left( I_3 + {}^N R_{I;k+1}\,\frac{1}{2}[\sigma]_\times \right) {}^N a_{FI;k+1}\,\Delta t \qquad (2.16)$$

and

$${}^N a_{FI;k+1} = {}^N R_{I;k+1}\,{}^I a^I_{k+1} + {}^N g \qquad (2.17)$$

with ${}^I a^I_{k+1}$ the measured acceleration and ${}^N g = (0\ 0\ 9.81)^T$ the gravitation vector. For a derivation of these equations please refer to [111].
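The sketch below implements one such propagation step as given by Eqs. 2.13–2.17. It follows the quaternion convention with the scalar part last; the helper functions, the gravity sign convention taken directly from the text above, and any sensor values fed into it are illustrative assumptions, not a reference implementation.

```python
import numpy as np

def skew(v):
    return np.array([[0.0, -v[2], v[1]], [v[2], 0.0, -v[0]], [-v[1], v[0], 0.0]])

def quat_mult(q2, q1):
    x1, y1, z1, w1 = q1
    x2, y2, z2, w2 = q2
    return np.array([w2*x1 + x2*w1 + y2*z1 - z2*y1,
                     w2*y1 - x2*z1 + y2*w1 + z2*x1,
                     w2*z1 + x2*y1 - y2*x1 + z2*w1,
                     w2*w1 - x2*x1 - y2*y1 - z2*z1])

def quat_to_dcm(q):
    x, y, z, w = q
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)]])

def strapdown_step(q, v, t, omega, acc, dt, g=np.array([0.0, 0.0, 9.81])):
    """One strapdown propagation step following Eqs. 2.13-2.17."""
    sigma = omega * dt
    ang = np.linalg.norm(sigma)
    if ang > 1e-12:
        dq = np.append(np.sin(ang / 2) * sigma / ang, np.cos(ang / 2))   # Eq. 2.14
    else:
        dq = np.array([0.0, 0.0, 0.0, 1.0])
    q_new = quat_mult(q, dq)                                             # Eq. 2.13
    R_new = quat_to_dcm(q_new)
    a_nav = R_new @ acc + g                                              # Eq. 2.17
    v_new = v + (np.eye(3) + R_new @ (0.5 * skew(sigma))) @ a_nav * dt   # Eq. 2.16
    t_new = t + v * dt + 0.5 * a_nav * dt**2                             # Eq. 2.15
    return q_new, v_new, t_new
```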

2.7 Kalman Filter Equations

The Kalman filter is an optimal filter for linear problems with zero-mean, white Gaussian noise [112]. A lot of work on this data fusion concept can be found in the literature [111, 113]. In this section, we will only mention the two steps, the propagation and the update, of the indirect (or error-state) Kalman framework and its general equations, to which we will refer in the following. Any Kalman filter variation applied in this work will be introduced and discussed at the point where we use it.

Propagation

The Kalman filter requires a linear dynamic model which describes the transition from the error state $\delta^+_{x;k}$ at time instance $k$ to $\delta^-_{x;k+1}$

$$\delta^-_{x;k+1} = F_{k+1}\,\delta^+_{x;k} + G_{k+1}\,n_{x,k} \,, \qquad (2.18)$$

where $F_{k+1}$ is the state transition matrix and $G_{k+1}$ maps the system noise $n_{x,k}$ into the state space. The covariance matrix $P^+_k$ propagates as follows:

$$P^-_{k+1} = F_{k+1}\,P^+_k\,F^T_{k+1} + G_{k+1}\,Q_k\,G^T_{k+1} \qquad (2.19)$$

with $Q_k$ denoting the system noise matrix.

Update

As soon as a measurement $z_{k+1}$ is acquired, the residual $r_{k+1}$ between it and the estimated measurement $\hat{z}_{k+1}$ is computed by

$$r_{k+1} = z_{k+1} - \hat{z}_{k+1} \,. \qquad (2.20)$$


Now, an update step can be performed, which fuses the propagated state and the acquired measurement. For that, we compute the innovation $S_{k+1}$ using the measurement matrix $H_{k+1}$, which represents the mapping between the state space and the measurement space, and the covariance matrix $L_{k+1}$ of the new measurement

$$S_{k+1} = H_{k+1}\,P^-_{k+1}\,H^T_{k+1} + L_{k+1} \,. \qquad (2.21)$$

The so-called Kalman gain $K_{k+1}$

$$K_{k+1} = P^-_{k+1}\,H^T_{k+1}\,S^{-1}_{k+1} \qquad (2.22)$$

is then used for the covariance update

$$P^+_{k+1} = P^-_{k+1} - K_{k+1}\,H_{k+1}\,P^-_{k+1} \qquad (2.23)$$

as well as the state update

$$\delta^+_{x;k+1} = \delta^-_{x;k+1} + K_{k+1}\,r_{k+1} \,. \qquad (2.24)$$
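Eqs. 2.18–2.24 translate almost literally into code. The following minimal numpy sketch shows one propagation and one update step; it omits details of a full indirect filter, such as resetting the error state after each update, and all matrix dimensions are left to the caller.

```python
import numpy as np

def kf_propagate(dx, P, F, G, Q):
    """Propagation step, Eqs. 2.18-2.19 (the system noise has zero mean)."""
    dx_pred = F @ dx
    P_pred = F @ P @ F.T + G @ Q @ G.T
    return dx_pred, P_pred

def kf_update(dx_pred, P_pred, z, z_hat, H, L):
    """Update step, Eqs. 2.20-2.24."""
    r = z - z_hat                                   # residual, Eq. 2.20
    S = H @ P_pred @ H.T + L                        # innovation covariance, Eq. 2.21
    K = P_pred @ H.T @ np.linalg.inv(S)             # Kalman gain, Eq. 2.22
    P_new = P_pred - K @ H @ P_pred                 # covariance update, Eq. 2.23
    dx_new = dx_pred + K @ r                        # state update, Eq. 2.24
    return dx_new, P_new
```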


Chapter 3

Related Work

In this section, we discuss relevant approaches in the literature and state-of-the-art solutions for the different topics treated within this work. These include image features and their tracking techniques, motion estimation algorithms, and sensor fusion related methods.

3.1 Image Features

Thinking of a camera as a sensor for motion estimation, one prerequisite is inevitable: a relation between two images has to be established. The way such a relation is determined is probably the most significant aspect regarding robustness, reliability, and accuracy in image-based navigation. Either every pixel of the whole image is used, which is only feasible on parallel processing units, or the amount of pixels to process is reduced by evaluating their relevance. The properties commonly considered for this purpose are intensity, color, and texture. An exception is the feature proposed by Kovesi [114, 115], which is based on the phase congruency of the image's frequencies. These features are insensitive to lighting, but require an evaluation of the image in the frequency domain, which is processing intensive. The detection of color or texture features in general bears a computational overhead and, thus, we do not consider them here either. In Section 1.1.2, we have seen that from a biological point of view there are three characteristics of a possible landmark which have to be considered: salience, permanence, and reliability. Salience is probably the most crucial property. The more salient a gradient is, the more accurately its location and, hence, the motion can be determined. The permanence of a feature is an aspect which can only be evaluated over time and, thus, is only crucial for landmark-based navigation, not for visual odometry. The

reliability of feature detectors is usually referred to as repeatability by the computer vision community. This value represents how often, and, thus, how reliably, a feature can be found in a sequence of images. Compared to edges and color cues, corners are more accurate and provide a two-dimensional constraint. Considering corners as the intersection of two edges, these features have no spatial extension and, therefore, there is no ambiguity in their location. Of course, this aspect is only valid if the locality of a corner is preserved and the response of a detector is as close as possible to the real corner location. Several different approaches to corner extraction are known in the literature. All aim for an efficient, accurate, and/or reliable detection – three rather conflicting characteristics. In the following, we mention the most popular feature detectors and explain in more detail the ones related to our proposed solutions. For a more comprehensive survey please refer to [116]. The Harris corner detection algorithm is probably one of the most popular corner extraction methods [117]. It is based on the first-order Taylor expansion of the second derivative of the local sum of squared differences (SSD). The eigenvalues of this linear transformation reveal how much the SSD approximation varies if the patch were shifted along the image axes. There are solutions which interpret the eigenvalues based on a threshold [118] or without one [119]. So-called global matching algorithms allow features to be detected within the whole image. For that, a corner detector has to provide a high repeatability, so that it detects the same features also after large affine transformations. The global tracker SIFT [120, 121] (scale invariant feature transform) uses differences of Gaussians (DoG), while the faster SURF [122, 123] (speeded up robust features) uses a Haar wavelet approximation of the determinant of the Hessian. A generalization of the DoG detector to box- and octagon-like shapes is presented as CenSurE [124] (center surround extrema). Smith developed the so-called smallest uni-value segment assimilating nucleus test (SUSAN) [125] for corner detection. The brightness of the center pixel, the nucleus, is compared to its circular pixel neighborhood, and the area of the uni-value segment assimilating nucleus (USAN) is computed. Corners and edges can be detected by evaluating this area, and it can also be used for noise reduction. The advantages of this approach are that neither noise-sensitive derivatives nor other computationally expensive operations have to be performed. In [125], a circular disc with a diameter of 3.4 pixels is used, which yields a total area of 37 pixels. In the last decade, the processing power of standard computers has become fast enough to provide corner extraction at video rate. However, running conventional corner detection (e.g. the Harris corner detector) while performing other intensive tasks is still computationally infeasible on most single processors. With the introduction of

recent techniques such as Rosten's features from accelerated segment test (FAST) [126], feature extraction has seen a significant performance increase for real-time computer vision applications. While being efficient, this method has proven in several applications to be reliable due to its high repeatability [127]. In this thesis, we are going to present a novel corner detection approach, which is based on the same corner criterion as FAST, but which provides a significant performance increase for arbitrary images. Unlike FAST, the corner detector does not have to be trained for a specific scene, but it dynamically adapts to the environment while processing an image. For a better understanding of our approach, we will sketch FAST in more detail in the next section.

3.1.1 FAST Revisited

The FAST principle is based on the SUSAN corner detector. Again, the center of a circular area is used to determine brighter and darker neighboring pixels. However, in the case of FAST, not the whole area of the circle is evaluated, but only the pixels on the discretized circle describing the segment. Like SUSAN, FAST uses a Bresenham's circle of diameter 3.4 pixels as test mask. Thus, for a full accelerated segment test, 16 pixels have to be compared to the value of the nucleus. To avoid this extensive test, the corner criterion has been relaxed even further. The criterion for a pixel to be a corner according to the accelerated segment test (AST) is as follows: there must be at least S connected pixels on the circle which are brighter or darker than a threshold determined by the center pixel value. The values of the other 16 − S pixels are disregarded. Therefore, the value S defines the maximum angle of the detected corner. Keeping S as large as possible, while still suppressing edges (edges result in S = 8), increases the repeatability of the corner detector. Thus, FAST with segment size 9 (FAST-9) is usually the preferred version, and it is also used in our experiments unless otherwise stated. The AST applies a minimum difference threshold (t) when comparing the value of a pixel on the circular pattern with the brightness of the nucleus. This parameter controls the sensitivity of the corner response. A large t-value results in few but strong corners, while a small t-value also yields corners with smoother gradients. In [127] it is shown that the AST with S = 9 has a high repeatability compared to other corner detectors such as, e.g., Harris, DoG, or SUSAN. The repeatability of a corner detector is a quality criterion which measures the capability of a method to detect the same corners of a scene from varying viewpoints. One question still remains, namely which pixel to compare first, second, third, and so forth. Obviously, there is a difference in speed whether one consecutive pixel

after another is evaluated or, e.g., bisection on the circle pattern is used to test whether the corner criterion applies or cannot apply anymore. This kind of problem is known as the constrained twenty questions paradigm. When to ask which question results in a decision tree, with the aim of reducing its average path length. In [128], Rosten uses ID3 [129], a machine learning method, to find the best tree based on training data of the environment where FAST is applied. Doing so, it is not guaranteed that all possible pixel configurations are found (see Section 8.1.2). Already small rotations of the camera may yield pixel configurations which have not been measured in the test images. Even if all the pixel configurations are present, a small rotation about the optical axis would drastically change the probability distribution of the measured pixel configurations. This may result in an incorrect and slow corner response. Learning the probabilistic distribution of a certain scene is, therefore, not applicable, unless only the same viewpoints and the same scene are expected. Please note that the decision tree is optimized for a specific environment and has to be re-trained every time the environment changes in order to provide the best performance. The decision tree learning used by the FAST algorithm builds a ternary tree with the possible pixel states "darker", "brighter", and "similar". At each learning step, both questions, "is brighter" and "is darker", are applied to all remaining pixels and the one with the maximum information gain is chosen. Hence, the state of each pixel can be one of four possibilities: unknown (u), darker (d), brighter (b), or similar (s). In the following, we call a combination of N states a pixel configuration. The size of the configuration space is therefore $4^N$, which yields $4^{16} \approx 4 \cdot 10^9$ possible configurations for N = 16. For the rest of this thesis, we refer to this model as the restricted or four-states configuration space. FAST-ER, the most recent FAST derivative, has an even slightly higher repeatability compared to FAST-9, at the cost of computational performance [127]. The main difference is the thickness of the Bresenham's circle, which has been increased to 3 pixels. This results again in a more SUSAN-like algorithm, which spans a circular area of 56 pixels, disregarding the inner 3x3 pixels. Again, ID3 is used to build the decision tree, restricting the evaluation to only a small part of the 47 pixels.
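To make the plain AST criterion explicit, the following sketch tests a single pixel without any decision tree, using the commonly quoted 16-pixel Bresenham circle offsets. It is a naive, unoptimized reference of the criterion only, not the FAST implementation nor the detector proposed in this thesis.

```python
import numpy as np

# Offsets of the 16 pixels on the Bresenham's circle used by the segment test.
CIRCLE = [(0, 3), (1, 3), (2, 2), (3, 1), (3, 0), (3, -1), (2, -2), (1, -3),
          (0, -3), (-1, -3), (-2, -2), (-3, -1), (-3, 0), (-3, 1), (-2, 2), (-1, 3)]

def is_ast_corner(img, y, x, t=20, S=9):
    """Naive accelerated segment test: (y, x) is a corner if at least S contiguous
    circle pixels are all brighter than nucleus + t or all darker than nucleus - t."""
    nucleus = int(img[y, x])
    vals = np.array([int(img[y + dy, x + dx]) for dx, dy in CIRCLE])
    brighter = vals > nucleus + t
    darker = vals < nucleus - t
    for states in (brighter, darker):
        # Duplicate the circle so that runs wrapping around the start are counted.
        run, best = 0, 0
        for s in np.concatenate([states, states]):
            run = run + 1 if s else 0
            best = max(best, run)
        if best >= S:
            return True
    return False
```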

3.2 Feature Trackers

In the previous section, we discussed image cues which are salient enough to be re-detected in different images. Feature trackers describe techniques for how features can be followed from image to image and how the data association problem between the

feature sets of two images can be solved. There is a large number of approaches available in the literature, which can be categorized into dense and sparse optical flow methods. Similar concepts arise for both types. The only difference is that the dense approaches, e.g. [130], can use the proximity constraint to solve ambiguities and/or to smooth the result, e.g. by using total variation [131]. Further, feature trackers can be divided into

• global trackers, which consist of a detector and a descriptor. They can find features in images which have been acquired from two quite different viewpoints (e.g. SIFT [120, 121], SURF [122, 123], BRISK [132], etc.), and

• local trackers, which are much more efficient and use gradient descent techniques to find the new feature location (KLT [133], SSD [134], etc.), or tracking-by-matching methods, which, similar to global trackers, provide a detection and a description stage, but whose descriptor can be significantly relaxed, because they also exploit the local proximity condition of the features (e.g. PTAM [135]).

Global tracking methods use computationally expensive descriptors and, hence, their matching is in general not real-time capable. Only recently, an efficient global matching algorithm, BRISK [132] (binary robust invariant scalable keypoints), has been proposed, which uses an efficient corner detector, namely AGAST (which we will present in Section 7.2.1), and thus allows online feature tracking at 25 Hz on a conventional computer. The descriptor is a smart combination of the rotation-invariant BRIEF [136] (binary robust independent elementary features) and the multi-scale DAISY [137]. BRIEF is the most efficient global feature descriptor, a derivative of LBP [138] (local binary patterns), which allows an efficient matching due to binary operations on gray-scale comparisons. It is not rotation invariant, but has proven to yield results comparable to upright SURF (U-SURF), a variant of SURF without rotation invariance, while being about 25 times faster than U-SURF. However, even though BRISK allows for image processing at video rate, it is not feasible for high frame rates on embedded systems. For the sake of efficiency and generality, we are aiming for sparse image processing of subsequent images of a sequence, focusing only on salient pixels or pixel groups. Hence, we do not require global features and, thus, we will not further discuss global tracking algorithms and their descriptors but focus on local tracking approaches, which are relevant for our work. The Kanade-Lucas-Tomasi (KLT) tracker uses gradient images to approach the corresponding image patch by an iterative gradient descent method. For the derivation of the original formulas refer to [133, 139]. In general, Shi-Tomasi features are used for KLT tracking, which are Harris corners with the smallest eigenvalue larger than a

specified threshold [118]. These features have been derived to optimally fit the KLT tracker. The so-called Bouguet implementation of KLT is a more efficient, pyramidal implementation of the tracker, which is based on a few assumptions and modifications of the derivation [140]. Davison's MonoSLAM1 algorithm uses Shi-Tomasi features to implement tracking by matching [141], while Klein's parallel tracking and mapping (PTAM)1 algorithm uses FAST features for this purpose [135]. FAST features are more efficient to compute, which allows processing many more features. However, a large number of features worsens the data association problem and, hence, PTAM uses image pyramids with several levels to limit the search areas for the features. In both cases, SSD is used to evaluate whether the features correspond or not.
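For orientation, the combination of Shi-Tomasi corners with the pyramidal (Bouguet-style) KLT tracker is readily available in OpenCV; the sketch below shows such a baseline pipeline. This is not the tracker developed in this thesis, merely an off-the-shelf reference, and the parameter values are arbitrary examples.

```python
import cv2

def track_features(prev_gray, next_gray, max_corners=300):
    """Baseline local tracking: Shi-Tomasi corners followed by pyramidal KLT."""
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=max_corners,
                                  qualityLevel=0.01, minDistance=7)
    nxt, status, err = cv2.calcOpticalFlowPyrLK(
        prev_gray, next_gray, pts, None,
        winSize=(21, 21), maxLevel=3,
        criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 30, 0.01))
    good = status.ravel() == 1
    return pts[good].reshape(-1, 2), nxt[good].reshape(-1, 2)
```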

3.3 Motion Estimation Techniques

Once feature correspondences between two images are known and the environment can be assumed to be rigid, the relative motion of the camera between the locations of image acquisition can be estimated. However, this information is not directly accessible due to the perspective projection from 3D to 2D of a regular camera. As introduced in Section 2.3, one degree of freedom, namely the distance, is lost. An efficient way of computing the motion between two images are closed-form solutions. However, the aforementioned missing information gives room to ambiguities and error sensitivities, which we will evaluate in Section 8.2. Solutions based on nonlinear optimization achieve a higher accuracy, but they are computationally more expensive and in general require a measurement history. If the 3D structure of the landmarks is known for both images, the problem reduces to a registration of two 3D point clouds. If the 3D structure is only available for one image, the task can be solved by iterative methods. Such solutions are much more stable, but require a landmark initialization. There also exist egomotion estimation techniques based on dense optical flow, e.g. [142], which we disregard at this point, because they are computationally too expensive on generic, sequential platforms. In addition, special motion estimation algorithms have been developed for omnidirectional cameras [143, 92, 93]. These algorithms, however, significantly lose performance as soon as the aperture angle is reduced. Even if we aim for camera systems with a large FOV, we cannot guarantee that there is adequate texture in the environment which makes the full view usable. Hence, we will focus on generic solutions, which work for arbitrary aperture angles.

1We will discuss these approaches in more detail in Section 3.3.3.


In the following, we derive the well-known eight-point algorithm and some of its variants, which we will evaluate in Section 8.4.1. Furthermore, we will discuss state-of-the-art optimization and iterative approaches. At the end of this part, we will briefly describe the concept of active matching.

3.3.1 Closed Form Solutions

According to Eq. 2.1, which represents the rigid body transformation, and the camera projection described in Eq. 2.9, the components $Rr$, $r'$, and $t$ must be coplanar and, thus, linearly dependent. Consequently, their scalar triple product must be zero:

$$r'^T \left( t \times R r \right) = r'^T [t]_\times R r = 0 \qquad (3.1)$$

This yields a fundamental linear mapping between two images: the Essential matrix E, which describes the mapping between two sets of points on the unit focal image plane in the camera coordinate frame and the Fundamental matrix F , which denotes the mapping between two sets of points in the image coordinate frame [144]

$$K^T F K = E = [t]_\times R \,, \qquad (3.2)$$

with $K$ introduced in Eq. 2.11. At this point, we want to mention that the mapping $F p = e$ yields the so-called epipolar line $e$, and the point $p'$ needs to lie on this line, resulting in a dot product of zero. An extensive derivation of this mapping and a summary of the properties of the Fundamental and Essential matrices can be found in [108]. We are interested in estimating the relative motion in 3D and, thus, in the following we assume calibrated cameras and undistorted image features, which allows us to estimate the Essential matrix $E$.

By mapping the matrix $E = \{E_{ij}\}$ column-wise into a single column vector

$$E \rightarrow e = \begin{pmatrix} E_{11} & E_{21} & E_{31} & E_{12} & E_{22} & E_{32} & E_{13} & E_{23} & E_{33} \end{pmatrix}^T = \begin{pmatrix} e_1 & \ldots & e_9 \end{pmatrix}^T \qquad (3.3)$$

we get a linear system of equations

$$A = \begin{pmatrix} u_r u_{r'} & u_r v_{r'} & u_r & v_r u_{r'} & v_r v_{r'} & v_r & u_{r'} & v_{r'} & 1 \end{pmatrix} \qquad (3.4)$$

with $A e = 0$. The notation $u_r$ and $v_r$ denotes the x- and y-coordinates in the image corresponding to the image ray $r$. To solve this system of equations, one

needs at least eight point correspondences stacked on top of each other

$$A = \begin{pmatrix}
u_{r_1} u_{r'_1} & u_{r_1} v_{r'_1} & u_{r_1} & v_{r_1} u_{r'_1} & v_{r_1} v_{r'_1} & v_{r_1} & u_{r'_1} & v_{r'_1} & 1 \\
u_{r_2} u_{r'_2} & u_{r_2} v_{r'_2} & u_{r_2} & v_{r_2} u_{r'_2} & v_{r_2} v_{r'_2} & v_{r_2} & u_{r'_2} & v_{r'_2} & 1 \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\
u_{r_N} u_{r'_N} & u_{r_N} v_{r'_N} & u_{r_N} & v_{r_N} u_{r'_N} & v_{r_N} v_{r'_N} & v_{r_N} & u_{r'_N} & v_{r'_N} & 1
\end{pmatrix} \qquad (3.5)$$

As Eq. 3.1 reveals, the Essential matrix has only five degrees of freedom. Hence, fewer than eight points should be necessary to estimate $E$, depending on which and how many constraints of the Essential matrix are used. There exist so-called seven-, six-, and five-point algorithms [145, 146, 147]. These algorithms are, in general, computationally more expensive and more sensitive to measurement errors [148]. Therefore, we will focus on the eight-point algorithm and its variants [149, 150].

Eight-Point Algorithm

The eight-point algorithm basically consists of solving the homogeneous linear system denoted in Eq. 3.5. Due to measurement noise, this system will in general not be satisfied exactly. Thus, we need to solve for

e = \arg\min_e \left( \| A e \| \right) \qquad (3.6)

subject to the constraint \| e \| = 1. The solution to this constrained minimization problem, obtained by means of Lagrangian multipliers, is given by the eigenvector corresponding to the smallest eigenvalue of A^T A [151]. Hence, the solution can be found by a singular value decomposition of A into

\mathrm{SVD}(A) = U_A \Sigma_A V_A^T \qquad (3.7)

and using the last column of V_A

e = V_{A;\cdot,9} \; . \qquad (3.8)
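To make this linear step concrete, a minimal numpy sketch could look as follows; the function name, the (N, 2) array layout and the use of numpy are illustrative assumptions and not part of the original implementation (the data normalization and the singularity constraint discussed next are omitted here).

import numpy as np

def estimate_essential(r, r_prime):
    # Minimal sketch of the linear eight-point step (Eqs. 3.3-3.8).
    # r, r_prime: (N, 2) arrays of normalized image coordinates (u, v)
    # in the first and second view; N >= 8 is assumed.
    u, v = r[:, 0], r[:, 1]
    up, vp = r_prime[:, 0], r_prime[:, 1]
    ones = np.ones_like(u)
    # One row of A per correspondence, column order as in Eq. 3.4.
    A = np.column_stack([u * up, u * vp, u,
                         v * up, v * vp, v,
                         up, vp, ones])
    # e is the right singular vector of A associated with the smallest
    # singular value (Eqs. 3.6-3.8).
    _, _, Vt = np.linalg.svd(A)
    e = Vt[-1]
    # Undo the column-wise stacking of Eq. 3.3 (e holds E column by column).
    return e.reshape(3, 3, order='F')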

Some references include an additional factor of \sqrt{2} in Eq. 3.8. This factor would enforce the Essential matrix to have \| E \|_F = \sqrt{2}, according to

\| E \|_F^2 = \| E^T \|_F^2 = \mathrm{tr}\left( E E^T \right) = \mathrm{tr}\left( [t]_\times R R^T [t]_\times^T \right) = -\mathrm{tr}\left( [t]_\times [t]_\times \right) = 2 \qquad (3.9)

with t having unit length. We will take this normalization into consideration in a few steps.
A major improvement of the conventional eight-point algorithm was first proposed by Hartley [149]. He observed that the image coordinates are usually of a different

magnitude than the focal length, which leads to an ill-posed singular value decomposition. Hence, Hartley suggested, in defense of the eight-point algorithm, to transform the input data before estimating the Essential matrix. Introducing two transformation matrices, S and S', we are able to manipulate the input of the algorithm, which improves the computational conditioning for estimating E:

r'^T E r = 0 \qquad (3.10)
\left( S' r' \right)^T S'^{-T} E S^{-1} \left( S r \right) = 0
\hat{r}'^T \hat{E} \hat{r} = 0 \; . \qquad (3.11)

The matrix E can then be retrieved by

E = S'^T \hat{E} S \; . \qquad (3.12)

In order to make the computation as robust as possible according to total least squares estimation, the following matrices for S and S' are chosen [152]. S provides an anisotropic normalization of r

S = \mathrm{chol}(M)^{-1} \quad \text{with} \quad M = \frac{1}{N} \sum_{i=1}^{N} r_i r_i^T \qquad (3.13)

and the Cholesky decomposition \mathrm{chol}(M) = K, whereas K is an upper triangular matrix with M = K K^T. N denotes the number of available point correspondences. S' describes a shift of the origin and an isotropic scaling for r'

S' = \begin{pmatrix} s & 0 & -s c_1 \\ 0 & s & -s c_2 \\ 0 & 0 & 1 \end{pmatrix} \quad \text{with} \quad c = \frac{1}{N} \sum_{i=1}^{N} r'_{i;1:2} \quad \text{and} \quad s = \frac{\sqrt{2}}{\frac{1}{N} \sum_{i=1}^{N} \left\| r'_{i;1:2} - c \right\|} \; , \qquad (3.14)

thus, c denotes the centroid and s an isotropic scaling factor averaging the length of each point to the length of \left( 1\; 1\; 1 \right)^T.
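As an illustration, the two normalization matrices could be built as sketched below. The function name and array layout are assumptions; note that numpy's Cholesky routine returns the lower-triangular factor, whereas Eq. 3.13 states an upper-triangular convention, but any factor K with M = K K^T yields a valid whitening transform S = K^{-1}.

import numpy as np

def normalization_matrices(r, r_prime):
    # Sketch of the data normalization of Eqs. 3.13 and 3.14.
    # r, r_prime: (N, 3) homogeneous image rays of the first and second view.
    N = r.shape[0]
    # Anisotropic normalization of the first view (Eq. 3.13).
    M = (r.T @ r) / N                       # M = 1/N * sum_i r_i r_i^T
    K = np.linalg.cholesky(M)               # lower-triangular factor, M = K K^T
    S = np.linalg.inv(K)                    # whitening: S M S^T = I
    # Isotropic normalization of the second view (Eq. 3.14).
    c = r_prime[:, :2].mean(axis=0)         # centroid
    s = np.sqrt(2) / np.mean(np.linalg.norm(r_prime[:, :2] - c, axis=1))
    S_prime = np.array([[s, 0.0, -s * c[0]],
                        [0.0, s, -s * c[1]],
                        [0.0, 0.0, 1.0]])
    return S, S_prime

The eight-point solve of the previous sketch is then applied to the transformed rays S r and S' r', and the result is denormalized according to Eq. 3.12.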

Next, we enforce the singularity constraint of the Fundamental matrix by replacing the smallest singular value with zero. Furthermore, the non-zero singular values of the


Essential matrix have to be one, thus, the modifications would look like

\bar{F} = U_F\, \mathrm{diag}(\sigma_{F;1}, \sigma_{F;2}, 0)\, V_F^T \quad \text{with} \quad \mathrm{SVD}(F) = U_F\, \mathrm{diag}(\sigma_{F;1}, \sigma_{F;2}, \sigma_{F;3})\, V_F^T \qquad (3.15)
\bar{E} = U_E \Sigma_{\bar{E}} V_E^T = U_E\, \mathrm{diag}(1, 1, 0)\, V_E^T \quad \text{with} \quad \mathrm{SVD}(E) = U_E \Sigma_E V_E^T \qquad (3.16)

where \mathrm{diag}(d_1, \ldots, d_n) denotes a diagonal matrix with the entries d_1, \ldots, d_n. This modification allows us to neglect the aforementioned factor \sqrt{2} in Eq. 3.8.
In [152], a further concept for robustification is proposed. The SVD based solution of the overdetermined linear system in Eq. 3.7 minimizes the error for each entry of the matrix A. In Section 6.1, we will derive the first order perturbation of the matrix A^T. To ease understanding at that point, we will solve for the transposed Essential matrix, which leads to a permuted matrix \Delta_{\mathring{A}}^T of \Delta_A^T. This matrix looks like

\Delta_{\mathring{A}}^T = \mathrm{perm}_{\mathrm{row}\,(1,4,7,2,5,8,3,6,9)}\left( \Delta_A^T \right)
\cong \begin{pmatrix}
\delta_{u_1} u'_1 + \delta_{u'_1} u_1 & \delta_{u_2} u'_2 + \delta_{u'_2} u_2 & \ldots & \delta_{u_N} u'_N + \delta_{u'_N} u_N \\
\delta_{v_1} u'_1 + \delta_{u'_1} v_1 & \delta_{v_2} u'_2 + \delta_{u'_2} v_2 & \ldots & \delta_{v_N} u'_N + \delta_{u'_N} v_N \\
\delta_{u'_1} & \delta_{u'_2} & \ldots & \delta_{u'_N} \\
\delta_{u_1} v'_1 + \delta_{v'_1} u_1 & \delta_{u_2} v'_2 + \delta_{v'_2} u_2 & \ldots & \delta_{u_N} v'_N + \delta_{v'_N} u_N \\
\delta_{v_1} v'_1 + \delta_{v'_1} v_1 & \delta_{v_2} v'_2 + \delta_{v'_2} v_2 & \ldots & \delta_{v_N} v'_N + \delta_{v'_N} v_N \\
\delta_{v'_1} & \delta_{v'_2} & \ldots & \delta_{v'_N} \\
\delta_{u_1} & \delta_{u_2} & \ldots & \delta_{u_N} \\
\delta_{v_1} & \delta_{v_2} & \ldots & \delta_{v_N} \\
0 & 0 & \ldots & 0
\end{pmatrix} \qquad (3.17)

and for the case where no error in the first image can be assumed, the perturbation matrix looks like

\Delta_{\mathring{A}}^T \cong \begin{pmatrix}
\delta_{u'_1} u_1 & \delta_{u'_2} u_2 & \ldots & \delta_{u'_N} u_N \\
\delta_{u'_1} v_1 & \delta_{u'_2} v_2 & \ldots & \delta_{u'_N} v_N \\
\delta_{u'_1} & \delta_{u'_2} & \ldots & \delta_{u'_N} \\
\delta_{v'_1} u_1 & \delta_{v'_2} u_2 & \ldots & \delta_{v'_N} u_N \\
\delta_{v'_1} v_1 & \delta_{v'_2} v_2 & \ldots & \delta_{v'_N} v_N \\
\delta_{v'_1} & \delta_{v'_2} & \ldots & \delta_{v'_N} \\
0 & 0 & \ldots & 0 \\
0 & 0 & \ldots & 0 \\
0 & 0 & \ldots & 0
\end{pmatrix} \; . \qquad (3.18)

This is, e.g., the case when using KLT (see Section 7.1), which looks for matches of a specific patch and, thus, is error-free by definition. The conventional SVD does not consider that the last or the last three rows of the error

matrix are zero, respectively. A generalization of the Eckart-Young-Mirsky approximation theorem, first mentioned in [153], allows us to account for this and to find the smallest perturbation of a submatrix which lowers the rank of a matrix [154]. Let C \in \mathbb{R}^{N \times (9-z)} and D \in \mathbb{R}^{N \times z} denote the error-prone and error-free submatrices of \mathring{A}, respectively, with z being the number of error-free columns in \mathring{A}

\mathring{A} = \left( C \;\; D \right) \; . \qquad (3.19)

Then, by row compression, we obtain

U_D^T \mathring{A} = \begin{pmatrix} C_1 & D_1 \\ C_2 & 0 \end{pmatrix} \quad \text{with} \quad U_D \Sigma_D V_D^T = \mathrm{SVD}(D) \qquad (3.20)

which allows us to estimate the matrix \Delta_C, the smallest perturbation which lowers the rank of the matrix \mathring{A} to \mathrm{rk}(\tilde{A}) = 8 and

\tilde{A} = \left( (C + \Delta_C) \;\; D \right) \; , \qquad (3.21)

whereas

\Delta_C = U_D \begin{pmatrix} 0 \\ -\Delta_{C_2} \end{pmatrix} \quad \text{with} \quad \Delta_{C_2} = U_{C_2}\, \mathrm{diag}\left( 0, \ldots, 0, \sigma_{8-\mathrm{rk}(D)+1}, \ldots, \sigma_{9-z} \right) V_{C_2}^T
\quad \text{and} \quad U_{C_2}\, \mathrm{diag}(\sigma_1, \ldots, \sigma_{9-z})\, V_{C_2}^T = \mathrm{SVD}(C_2) \; , \qquad (3.22)

assuming that N \geq 9. In case the features are tracked throughout an image sequence, the initial features are error-free and, thus, z = 3. In case of tracking-by-matching, as described in Section 7.2.1, the number of error-free columns is reduced to the last one and, hence, z = 1.

Decomposition of the Essential Matrix

Once the Essential matrix has been computed, it can be decomposed into translation and rotation. By modifying the polar decomposition of \bar{E} we obtain

\bar{E} = U_E \Sigma_{\bar{E}} V_E^T = \left( U_E \Sigma_{\bar{E}} W Z U_E^T \right) \left( U_E Z W V_E^T \right) = [t]_\times R \qquad (3.23)

with

W = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & \det\left( U_E V_E^T \right) \end{pmatrix}, \quad \tilde{Z} = \begin{pmatrix} 0 & 1 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix} \qquad (3.24)

and

Z = \begin{cases} \tilde{Z}^T & \text{if } \sum_i \left( A_i^+ t \right)_{1,2} < 0 \\ \tilde{Z} & \text{else} \end{cases} \quad \text{with} \quad A_i = \left( R r_i \;\; -r'_i \right) \; . \qquad (3.25)

Furthermore, A_i^+ denotes the Moore-Penrose pseudoinverse of A_i. This case distinction disambiguates whether the points lie in front of or behind the image plane. Again, only one correspondence has to be checked in the noise-free case, but otherwise, several feature correspondences should be tested to provide a reliable result. Assuming only a small rotation, the following more efficient test for Z can be used, where only the first two diagonal elements are checked to be positive:

Z = \begin{cases} \tilde{Z}^T & \text{if } R_{1,1} + R_{2,2} < 0 \\ \tilde{Z} & \text{else} \end{cases} \qquad (3.26)

The translation vector can be easily found as the unit vector minimizing

\tilde{t} = \arg\min_t \left\| \bar{E}^T t \right\| \; , \qquad (3.27)

which corresponds to finding the unit eigenvector of \bar{E} \bar{E}^T associated with the smallest eigenvalue

\tilde{t} = U_{E;\cdot,3} \; . \qquad (3.28)

It remains to solve for the ambiguity of sign:

t = \begin{cases} -\tilde{t} & \text{if } \sum_i \left( R r_i \times r'_i \right)^T \bar{E} r_i < 0 \\ \tilde{t} & \text{else} \end{cases} \qquad (3.29)

Actually, only one correspondence has to be checked in the noise-free case, but with noise, far distant features, which are translation invariant, would cause this test to be more or less random. Thus, in practice one needs to use a few correspondences to ensure a reliable result, or just one close feature which has a large measurable translational component.
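A hedged sketch of this decomposition step is given below. It follows one reading of Eqs. 3.23-3.29; the function name, the array layout and the order in which the simplified test of Eq. 3.26 and the sign test of Eq. 3.29 are applied are our own choices.

import numpy as np

def decompose_essential(E_bar, r, r_prime):
    # Sketch of the decomposition of a rank-2 Essential matrix with singular
    # values (1, 1, 0) into rotation and translation (Eqs. 3.23-3.29).
    # r, r_prime: (N, 3) unit-focal image rays used for the sign checks.
    U, _, Vt = np.linalg.svd(E_bar)
    W = np.diag([1.0, 1.0, np.linalg.det(U @ Vt)])
    Z = np.array([[0.0, 1.0, 0.0],
                  [-1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0]])
    # Translation direction: left singular vector of the zero singular value
    # (Eqs. 3.27/3.28).
    t = U[:, 2]
    # Candidate rotation; the simplified test of Eq. 3.26 flips Z if the
    # first two diagonal elements of R are not positive.
    R = U @ Z @ W @ Vt
    if R[0, 0] + R[1, 1] < 0:
        R = U @ Z.T @ W @ Vt
    # Disambiguate the sign of t over several correspondences (Eq. 3.29).
    s = np.sum([np.cross(R @ ri, rpi) @ (E_bar @ ri)
                for ri, rpi in zip(r, r_prime)])
    if s < 0:
        t = -t
    return R, t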

3.3.1.1 Implementation Variants

The aforementioned variation concepts are evaluated in four different implementation variants, which are described in the following and named after their inventors:

• original eight-point (Longuet-Higgins variant): this implementation consists of the unmodified image rays of Eq. 3.10 and, thus, uses the A-matrix as


described in Eq. 3.5.

• Hartley variant: this variant uses isotropic scaling for both sets of image rays, S and S', as denoted in Eq. 3.14.

• Mühlich variant: this variant uses anisotropic scaling for the first image rays and isotropic scaling for the rays of the second image, as introduced in Eqs. 3.13 and 3.14.

• Mühlich TLS-FC variant: this variant uses the same normalization as the one before, but also implements the fixed-column perturbation estimation as seen in Eq. 3.21.

3.3.2 Iterative Solutions

Other approaches exist which solve for the rotation and translation in an iterative way [143, 155]. In the following, we will sketch one such iterative method, Visual-GPS (VGPS), in more detail, because we will describe an enhanced version of it in Section 5.2.1. VGPS estimates the camera motion from a set of tracked landmarks, knowing the distance of the features up to scale in the reference frame [155, 156]. The method assumes a reference image C_0 and, thereby, solves the relative orientation problem of determining the pose of the camera ^{C_0}T_t at time instance t with respect to C_0. For that it requires an internal 3D model (a set of N points, P_i, i \in \{1..N\}, in the scene) attached to C_0. This model can be constructed in an arbitrary way (e.g. stereo triangulation, see Section 7.1.1). The exterior orientation between the current frame and the reference 3D model is computed as follows: an additional tentative 3D model \hat{P}'_i is generated from the 2D projections r'_i in the images by using approximated ranges only. These ranges are estimated from the preceding pose estimation at t-1. Thus, the problem is reduced to finding the relative orientation between these two sets of points P_i and \hat{P}'_i. This can be solved in closed form using the singular value decomposition.
In a nutshell: relative translation and rotation are estimated separately. The origins of the sets of points are set to their respective centroids without modifying their orientations

C = \frac{1}{N} \sum_{i=1}^{N} P_i \qquad \hat{C}' = \frac{1}{N} \sum_{i=1}^{N} \hat{P}'_i
P_i^{*} = P_i - C \qquad \hat{P}_i'^{*} = \hat{P}'_i - \hat{C}' \; . \qquad (3.30)


The camera rotation between these sets of points of the same model corresponds to the relative rotation between the camera and the reference frames and can be calculated by maximizing the trace of the inertia matrix of the matched set:

{}^{C_0}R = \arg\max_R \mathrm{trace}\left( R^T M \right) \quad \text{with} \quad M = \sum_{i=1}^{N} \hat{P}_i'^{*} P_i^{*T} \; . \qquad (3.31)

The solution to this equation can be found in [157] by

{}^{C_0}R = V_M U_M^T \quad \text{with} \quad \mathrm{SVD}(M) = U_M \Sigma_M V_M^T \; . \qquad (3.32)

The camera translation can also be computed in closed-form as the distance between the rotation compensated centroids of the two 3D point sets:

{}^{C_0}t = C - {}^{C_0}R\, \frac{1}{N} \sum_{i=1}^{N} \hat{P}'_i \; . \qquad (3.33)

Since the tentative 3D model \hat{P}'_i may differ from the reference one, the final solution is found iteratively by optimizing the unknown ranges of the tentative model at the same time. The algorithm terminates whenever sufficient consistency between the reference and the original set of points is achieved. The convergence rate depends on the measurement errors, the location of the features in 3D space and the quality of the initial parameters.
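The closed-form inner step of this iteration can be sketched as follows; the function name and the explicit reflection guard (a standard addition for SVD-based registration, not stated in the text) are assumptions.

import numpy as np

def register_point_sets(P, P_hat):
    # One closed-form step of the VGPS iteration (Eqs. 3.30-3.33).
    # P:     (N, 3) reference model points attached to C0.
    # P_hat: (N, 3) tentative model points built from the current image rays.
    C = P.mean(axis=0)
    C_hat = P_hat.mean(axis=0)
    P_c = P - C                      # centered point sets (Eq. 3.30)
    P_hat_c = P_hat - C_hat
    M = P_hat_c.T @ P_c              # M = sum_i P_hat_i^* P_i^{*T} (Eq. 3.31)
    U, _, Vt = np.linalg.svd(M)
    R = Vt.T @ U.T                   # Eq. 3.32
    if np.linalg.det(R) < 0:         # guard against a reflection solution
        R = Vt.T @ np.diag([1.0, 1.0, -1.0]) @ U.T
    t = C - R @ C_hat                # Eq. 3.33
    return R, t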

3.3.3 Least Squares and Maximum Likelihood Solutions

Optimization-based solutions are in general computationally complex and suffer from limited workspaces. However, these kinds of approaches become more and more popular due to the increasing computational power. We, however, do not aim to solve the task with the most powerful processing platforms available; we want to estimate the motion reliably with as few resources as possible, to allow for other parallel processes and to be feasible also for smaller, low-current, and resource-limited platforms. Nevertheless, the approaches we are going to mention in this section are real-time state-of-the-art solutions, which solve the task of motion and scene structure estimation in a highly robust way.
MonoSLAM uses an extended Kalman filter (EKF) to predict and fuse the measurements [141]. While this approach works fine for a small number of features, it does not scale well. Updating the filter state vector and the covariance matrices is no longer real-time feasible as soon as too many features are used.


In contrast, PTAM is a keyframe-based approach, which looks for feature correspondences between the current image and the keyframe image [135]. As soon as the overlap drops below a certain threshold, a new keyframe is acquired. The relative locations of neighboring keyframes are estimated in an optimization framework based on the common features. Thus, two processes, real-time feature tracking and online1 keyframe mapping, run in parallel and asynchronously. In a later work, this algorithm has been extended to also support so-called edgelets (pieces of lines) [158]. In this way, it became more robust against motion blur. However, the huge map sizes limit the algorithm to small rooms. To overcome this, the maps can be swapped from the physical memory to the hard drive to allow for larger environments [159].
The question whether filter or batch-optimization based approaches should be preferred has been addressed in [160]. The outcome of the experiments shows that keyframe optimization yields higher accuracy per processing unit than filter-based approaches. Only if small processing budgets are available should filters be preferred to bundle adjustment methods.
Most recently, a bundle adjustment based motion estimation and scene reconstruction has been proposed which processes the dense optical flow of the full image [161]. This became feasible by implementing the computationally expensive parts of the optimization framework on a GPU. No tracking is required due to the use of every pixel in the image.

3.3.4 Active Matching

When thinking of efficient feature tracking and pose estimation, the concept of active matching (AM) cannot be neglected [162, 163]. The idea is not to search for all feature correspondences before estimating the motion, but to track feature by feature and use the gained information to estimate the possible motion, which reduces the search space for the remaining ones. By sequentially choosing the features with the highest innovation gain, the search space can be narrowed down drastically in each iteration, reducing the tracking time significantly. However, the question remains whether this speed-up compensates for the overhead of computing the search spaces and the innovation gains for the remaining features in each step. Nevertheless, the concept is highly interesting and we will make use of it in an application presented in Section 9.1.

1With the word online we denote a process which does not necessarily run in real-time, but processes the data at run-time.


3.4 Fusion of IMU and Camera Measurements

Each sensor has its drawbacks and weaknesses. As motivated in Section 1.1, one sensor alone is not reliable enough for a general application. For instance, a camera facing a featureless wall cannot provide navigation information, a laser does not detect glass, an IMU drifts and suffers from biases, sonar is not accurate due to echoes, GPS and magnetometers do not work indoors, and so forth. Hence, only by fusing various sensors will we achieve a robust and reliable pose estimation for a variety of environments and, thus, applications. A requirement for any sensor fusion is that the relative pose of the sensors is calibrated online or in an offline registration step.
In this work, we focus on the combination of cameras and IMUs. Before we list different fusion techniques for IMU-camera setups, we will first discuss sensor registration methods available in the literature, with a focus on IMU-camera setups.

3.4.1 IMU-Camera Registration

For a proper fusion of the measurements, the spatial alignment of the sensors has to be known. However, most IMU-camera calibration approaches neglect a further important registration issue: the temporal alignment. Especially for highly dynamic motions, where the signal to noise ratio of the IMU becomes usable, an accurate alignment is crucial. If the sensors contain a clock and provide a time-stamp for their samples, one can simply synchronize the clocks or the time-stamps to temporally align the measurements. Several solutions exist to synchronize clocks (an overview is given in [164]) or the time-stamps (a survey is given in [165]). Unfortunately, cameras and IMUs in general provide just the sensor data and sometimes also a sample counter, but the first real time-stamp is usually only set by the driver on the host computer. Hence, a different synchronization concept must be applied.

Temporal Alignment

Various approaches exist which describe how to fuse measurements which suffer from time delays in filters. Some of them assume that the delay is known a priori [166, 167, 168]. Li and Leung estimate the time delay in an unscented Kalman filter [169]. Julier and Uhlmann show how measurements with random and unknown delays can be fused using the covariance union algorithm [170]. Tungadi and Kleeman estimate the time delay between a laser-range sensor and the odometry measurements of a mobile robot by computing the phase shift of a periodic motion [171]. They do not consider any

error introduced by slippage or outliers of the range samples. Furthermore, their method requires the spatial alignment of the measurements.
Recently, Kelly and Sukhatme presented an approach to estimate the time delay between a proprioceptive and an exteroceptive sensor [172]. The so-called Time Delay Iterative Closest Point (TD-ICP) algorithm tries to minimize the distance between the orientation trajectories in 3D space by estimating the angular and temporal alignment between the sensors. Estimating the angular trajectory of the IMU requires an error-prone strap-down computation, and also the camera orientation would suffer from drift if no global landmarks, e.g., a checkerboard, are provided. Moreover, in this work, any jitter of the time-stamps is neglected.
The time delay could also be estimated within an optimization framework, as, e.g., the one we will use for the spatial alignment, which is introduced in Section 4.2. For that, the batch-based optimization algorithm has to be enhanced by a further optimization parameter which denotes the time delay. However, the computational complexity would increase significantly because a temporal dependency prevents several assumptions which speed up computation. In the case at hand, the B-spline matrices representing the trajectories would not be constant anymore and the convergence capability would probably be reduced due to the additional degree of freedom.

Spatial Alignment

Two sensors can be spatially aligned by relating their relative measurements of at least two motions. Such a relation of a motion in two different coordinate systems is a well-known problem in robotics. It appears every time a flange-mounted sensor has to be calibrated relative to the robot's wrist. This problem is known as the "AX = XB" problem, where A and B are the known motions and X the unknown transformation between the respective coordinate systems. It was first motivated in the 1980's; today, several closed form solutions exist [173, 174, 106, 175, 176]. Especially the registration of a camera relative to a robot, the so-called hand-eye calibration, has been intensively discussed [177, 178, 179, 180].
The problem of IMU-camera registration1 is a little more tricky. A camera can estimate its proper motion as long as the scale of some objects in the scene is provided. IMUs, in contrast, do not directly measure the motion, but only the first derivative of the angular motion, the angular velocity, and the second derivative of the translational motion, the acceleration. Furthermore, their measurements are rather noisy and, hence, high dynamics are necessary to provide an appropriate signal to noise ratio (S/N) for

1In the style of “hand-eye calibration” this problem would be called “vestibule-eye calibration”.

an accurate registration. Both the gyroscopes and the accelerometers suffer from a bias, which can only be estimated if properly fused with another sensor. Finally, the accelerometer also measures the Earth's gravitation, which makes it generally impossible to process its measurements without knowing the current orientation relative to the Earth's center. Hence, without any assumptions, there is no other way than to estimate the trajectories of the IMU and, thus, to fuse IMU and camera already at the registration stage. However, by special setups and measurement arrangements the registration problem can be reduced to allow for a direct estimation of the rotation and translation.
Several inertial-visual calibration techniques have come up in the last years. They can be categorized into three different classes: closed-form solutions which reduce the system's complexity, Kalman filter based approaches, and methods which make use of optimization techniques.
Lobo and Dias make use of the gravitation vector, measured by the accelerometers, and a vertical calibration pattern to estimate the rotational alignment of the inertial-visual system [181, 182]. The translation between the devices is then estimated by using a turntable. The accelerometers have to lie in the center of the turntable, so that they become the center of rotation. In this way, simplified equations known from hand-eye calibration can be used to solve for the translation. While this approach has the advantage of being static and not needing dynamic motions, the necessary system setup is rather cumbersome and error-prone.
The most popular solution is to use an extended or unscented Kalman filter (EKF, UKF) to estimate the sensor alignment. Approaches of this category implement the following states in their filter: position, orientation, linear and angular velocity, bias of the gyroscopes, bias of the accelerometer and, finally, translation and rotation between IMU and camera. Kelly and Sukhatme present a UKF based calibration algorithm which can be based on artificial or natural landmarks [183, 184]. The idea is to allow a robot to simultaneously explore the environment while calibrating its sensors. If the calibration step is done offline, the computational overhead of the UKF becomes irrelevant, while it provides a higher order approximation than the EKF and should, thus, be preferred.
Mirzaei and Roumeliotis register the camera and IMU by fusing the corner locations of a checkerboard in the camera images with the measurements of the gyroscopes and the accelerometers [185]. For that they use an error-state (indirect) Kalman filter. An initialization stage to find good start values precedes the filtering. Moreover, they also analyse the observability of the Kalman filter, with the result that only

rotations in two degrees of freedom are necessary to estimate the IMU-camera transformation [186]. As benchmark for their experiments, they use an optimization framework where they minimize the error between measurements and state estimates as described in [187]. The states and the covariance matrices are linearly propagated as within an EKF, which is similar to the approach described next.
Hol et al. estimate the spatial alignment using a common system identification technique [188]. The innovation in an EKF is minimized by standard gradient descent methods. Their experiments are promising, but one drawback is that the EKF linearizes the motion model, which yields significant errors, especially with large rotations.
Lang and Pinz use a constrained nonlinear optimization method to calculate the angular misplacement between gyroscopes and a camera [189]. For that they compare the angles between two camera frames with the integrated gyroscope samples.

3.4.2 Fusion Concepts

There are several approaches known in the literature for fusing inertial and visual information. These approaches can be split into two groups. The first are batch-optimization approaches, where the inertial information and the image locations of visual landmarks are fused by bundle adjustment methods [190]. We will make use of such techniques in our novel IMU-to-camera registration approach, described in Section 4.2. However, these methods are only feasible for offline processing and, thus, are not discussed further. Almost all real-time solutions known in the literature rely on Kalman filter concepts. Some solutions make use of the more accurate but also more computationally complex UKF [184, 191, 192], while the more common, because more efficient, framework is the EKF [193, 194, 195]. Depending on the application, other derivatives might also be of advantage [196]. There are two different concepts for how the visual information can be fused with the inertial measurements, which lets us distinguish between loosely and tightly coupled systems. Tightly coupled systems model the cross-relations of all the observed landmarks in the video stream, resulting in a SLAM framework, together with the inertial measurements [193, 197]. For that they add the landmark locations to the state vector, which limits the workspace of these approaches due to a quickly increasing complexity of O(n^2), with n being the number of landmarks. Loosely coupled systems, on the other hand, fuse the inertial measurements with the visual odometry computed from the feature correspondences. Some of them only use the information of the gravity vector [198]. In any case, the state vector remains constant and does not grow with the number of features, which results in a constant complexity of O(n).


An interesting approach in between loosely and tightly coupled techniques is the one of Mourikis and Roumeliotis which, similar to the aforementioned PTAM of Klein, adds the camera viewpoints of selected keyframes to the state vector [194]. Due to an intelligent formulation of the filter problem they achieve high accuracy while still being real-time capable.
However, for an online sensor fusion on resource-limited platforms a loosely coupled measurement fusion seems to be the only alternative. Independent of the fusion framework and the measurement and motion model formulation, one has to tackle two major issues when designing such a filter. One is to estimate the proper scale from the acceleration measurements of the IMU and the other is to model the measurement uncertainties properly. Nützi et al. implemented a window based spline fitting for scale estimation [199]. They model the scale parameter already in the motion model, which yields redundancies in the state vector due to the representation of the scale as the norm of the position and the extra scale variable. Further, the spline fitting is not real-time capable. Weiss and Siegwart modeled the scale as a measurement variable [195]. They use PTAM for motion estimation, which allows them to estimate absolute poses, while it delimits the workspace. They model the visual motion sensor as a black box and, hence, they do not take care of a proper model for the measurement uncertainties in the camera domain, where they actually occur.
We are interested in long distance navigation and do not want to restrict the robot's workspace. We aim for fast motions in an arbitrarily large application area and accept small drifts while moving, as is the case for any odometry based approach. Hence, we have relative visual motion measurements. Such information can be modeled by so-called stochastic cloning as introduced by Roumeliotis in [200]. With a camera we can measure the direction of translation and we can try to keep the relative translational scale.
In [201], an overview of different measurement models is given. According to that, there are four ways to set up a filter. One can use the sensor coordinate frame to model also the motion, which results in pseudo-accelerations and highly nonlinear transition matrices. Another way is to transform the measurements into the Cartesian frame, which results in tricky uncertainty correlations [202]. The most common approach is a hybrid one, mixing the Cartesian and measurement coordinate frames, e.g. using pseudo-measurements [203]. The last and most rare solution in the literature is to use a completely different coordinate frame which depends on the application.

Chapter 4

Fusion of IMU and Camera

As motivated in Section 1.1, multimodal perceptions should be combined to compensate for the drawbacks of single sensors and, thus, to achieve a reliable estimation. The same is valid for cameras and IMUs. The former depend on the environment, requiring textured scenes and proper lighting conditions, while the latter suffer from long-term drift. These characteristics can be overcome by combining the sensors. Cameras provide drift-free references and IMUs are fail-safe sensors, providing measurements at equidistant time instances. Hence, IMUs and cameras are rather complementary sensors which enrich each other if properly fused. However, this complementarity also makes the registration of these sensors problematic and complex.
Before we describe the fusion of the measurements, we will focus on the temporal and spatial registration, which is crucial for an accurate fusion of any sensor pair. Estimating the temporal and spatial alignment simultaneously allows the delay estimate to compensate for the spatial alignment error. It seems reasonable to avoid such a compensation and to assume that the temporal and the spatial alignment are not correlated. Computing the time delay in a pre-processing step of the spatial registration, as shown in Section 4.1, rejects such a correlation and prevents any influence of the spatial alignment error on the time lag estimation.
In the following, the letters C, G, I, and W denote quantities relative to the camera (C), gyroscopes (G), IMU (I; in general I = G, but G is more specific) and the global (world) reference frame (W), respectively, or measurements of a specific sensor.

4.1 Temporal Alignment

Any calibration step starts with the data acquisition. While the calibration approaches for most sensors are based on static acquisition, gyroscopes need some dynamics to

provide a reasonable signal to noise ratio. Hence, the temporal alignment of the sensors becomes crucial. Most sensors do not provide any time-synchronization mechanism (a prototype for a hardware-synchronized IMU-camera setup is presented in Appendix A.5), nor do they provide a time-stamp in their data packages. Thus, the only hint to "guess" the measurement time is the time-stamp set by the driver on the host PC when the sample is received. However, this time-stamp suffers from

• delay, due to the acquisition time, buffers on the sensors or/and on the host interface, and due to the bus used for transmission,

• jitter, since the driver on the host side has to be scheduled in order to set the time-stamp,

• data jams, when the processor is busy with tasks of higher priority and cannot process the arriving samples; these are buffered and all get almost the same time-stamp as soon as the driver becomes active again.

Thus, the problem arises how to interpret these time-stamps and whether it is reasonable at all to make use of them. In three steps we correct the time-stamp sequences of both sensors and align them: first, the sampling period has to be estimated, then missing samples and data jams have to be detected and fixed, so that, finally, the sample sequences can be aligned.
One assumption can be made which holds for most sensors, namely that the sensor itself acquires the measurements with a constant frequency. Calibration sequences are usually rather short, thus temperature dependent variations can be neglected at this point. Hence, the first thing to do is to estimate the sampling period. For that, we compute the median \mathrm{med}_{\Delta t} of all differences between two consecutive time-stamps. This rough, but robust, estimate of the sample period is then used to detect all valid sample differences, where valid is defined as a time interval \Delta_i between the time-stamps t_{i-1} and t_i which varies only up to half the sampling period. Thus, we define the indicator function v(\Delta_i) which detects valid samples as

v(\Delta_i) = \begin{cases} 1, & \text{if } \frac{1}{2} \mathrm{med}_{\Delta t} < \Delta_i < \frac{3}{2} \mathrm{med}_{\Delta t} \\ 0, & \text{else} \end{cases} \qquad (4.1)

Doing so, excessively long and short acquisition times, resulting from missing samples and data jams respectively, are rejected while allowing for up to 50 % jitter. The acquisition period \bar{m}_T can then be computed as the mean of all valid time differences.
Now that we know the acquisition period, we can remove jitter, look for missing samples and sample jams and, thus, provide equidistant time-stamps. Data jams are

characterized by long gaps between the time-stamps followed by a sequence of extremely short time differences. A sample jam can only be recovered if the number of samples after the gap until the first valid sample interval corresponds to the number necessary to fill the gap. In case there are too few samples to fill the gap, all samples from the jam have to be rejected, because it is not possible to figure out which samples have not been buffered. This follows the strategy of rejecting samples rather than using false measurements. In case the sensor provides a counter, it can be used to detect missing samples and to partially recover data jams even in the absence of some measurements. The residual gaps in the time-stamp sequence represent missing samples and both the time and the measurement sequence can be filled with "NA" values at these points, denoting unavailable samples.
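A minimal sketch of the period estimation and the validity test is given below; the function name and the way the gap lengths are reported are our own choices, not part of the original implementation.

import numpy as np

def sample_period_and_validity(t):
    # Sketch of the robust period estimation and the indicator of Eq. 4.1.
    # t: raw host-side time-stamps of one sensor (1D numpy array).
    dt = np.diff(t)
    med = np.median(dt)                              # robust period guess
    valid = (dt > 0.5 * med) & (dt < 1.5 * med)      # indicator v(Delta_i), Eq. 4.1
    m_T = dt[valid].mean()                           # mean of all valid intervals
    # Long invalid gaps hint at missing samples or data jams; the estimated
    # number of samples needed to fill each gap supports the jam recovery
    # described above.
    missing = np.round(dt[~valid] / m_T).astype(int) - 1
    return m_T, valid, missing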

The sequences now have temporally equidistant samples with period \bar{m}_T. We can estimate the jitter-free time-stamp of the first sample, \bar{t}_0, by computing the mean of all deviations

\bar{t}_0 = \frac{1}{|S_v|} \sum_{i \in S_v} \left( t_i - i\, \bar{m}_T \right) \; , \qquad (4.2)

with S_v = \{ i \mid i \in \{1..N-1\} : v(t_i - t_{i-1}) = 1 \} denoting the set of all valid time-stamps and N being the number of samples. Finally, we can align the sequences.
In the following, {}^C R_{\Delta_i} denotes the inter-frame DCM resulting from the image based motion estimation, and {}^G\dot{\phi}_t, {}^G\dot{\chi}_t and {}^G\dot{\psi}_t represent the angular velocity measured by the gyroscopes at time instance t. While the frames in which the camera and the gyroscopes measure the rotations may be different, the measured absolute angular velocities \dot{\theta}_t are frame independent and, hence, equal up to the unknown measurement errors e_G and e_C

\dot{\theta}^G_t + e_G = \dot{\theta}^C_t + e_C \; . \qquad (4.3)

The absolute rotational velocity of the camera at the temporal midpoint between two images, t_{i-0.5}, can be computed by

\dot{\theta}^C_{t_{i-0.5}} = \frac{\theta^C_{\Delta_i}}{\bar{m}_{T;C}} \; , \quad \text{with} \quad t_{i-0.5} = t_i - \frac{\bar{m}_{T;C}}{2} \; . \qquad (4.4)

In the following, we present two different approaches for the temporal alignment of a camera and an IMU. Both methods are based only on the measurements of the gyroscopes and, thus, could also be used for systems which do not provide accelerometers.


4.1.1 Temporal Alignment by Cross-Correlation

One way to find the time delay between the sensors is to solve the following problem:

\delta t_{G2C} = \arg\max_{\delta t} \left( \sum_{i \in S_v} \dot{\theta}^G_{t_i + \delta t}\, \dot{\theta}^C_{t_i} \right) \qquad (4.5)

A brute-force solution to this problem is to use cross-correlation, which has the nice property of being robust against white noise. The conventional cross-correlation allows sequences to be aligned only up to sampling accuracy. To overcome this problem, a higher sampling resolution with sampling interval \Delta_t can be used by interpolating the sequences. In the following, we assume, without loss of generality, that the time interval of the acquired gyro measurements is longer than the one of the camera. The problem to solve becomes

\delta t_{G2C} = \Delta_t\, \arg\max_k \left\{ \mathrm{xcorr}(k) \right\}_{k=-N_x}^{N_x} \qquad (4.6)

with N_x = N_C \frac{\bar{m}_{T_C}}{\Delta_t} and N_C denoting the total number of valid and invalid camera samples. The cross-correlation function is defined as

\mathrm{xcorr}(k) = \begin{cases} \frac{1}{N_x} \sum_{i=0}^{K} \dot{\theta}^G_{i \Delta_t + k \Delta_t}\, \dot{\theta}^C_{i \Delta_t} & \text{if } k \geq 0 \\ \frac{1}{N_x} \sum_{i=0}^{K} \dot{\theta}^G_{i \Delta_t}\, \dot{\theta}^C_{i \Delta_t - k \Delta_t} & \text{else} \end{cases} \; , \qquad (4.7)

whereas K = \frac{\bar{m}_{T_C} (N_C - 1) - |k| \Delta_t}{\Delta_t}, which clips the overlapping sequences. The interval for k can also be reduced to speed up processing. The interpolation causes an error which is proportional to the kind of the chosen approximation. Furthermore, if the temporal displacement is too large, cross-correlation may not find the optimal fit anymore, because the overlap of the sequences becomes too small.
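A minimal numpy sketch of this procedure is shown below. The linear interpolation and the mean removal (which suppresses the gyroscope bias in the correlation) are our own simplifying choices; the function name and parameters are illustrative.

import numpy as np

def delay_by_xcorr(t_gyro, w_gyro, t_cam, w_cam, dt=0.001):
    # Sketch of the cross-correlation alignment of Eqs. 4.5-4.7.
    # t_gyro, w_gyro: time-stamps and absolute angular rates of the gyroscopes.
    # t_cam, w_cam:   mid-frame time-stamps and absolute angular rates of the camera.
    # dt: resampling interval used for sub-sample resolution.
    grid_g = np.arange(t_gyro[0], t_gyro[-1], dt)
    grid_c = np.arange(t_cam[0], t_cam[-1], dt)
    g = np.interp(grid_g, t_gyro, w_gyro)
    c = np.interp(grid_c, t_cam, w_cam)
    g = g - g.mean()
    c = c - c.mean()
    # Full cross-correlation; the lag of the maximum gives the delay.
    xc = np.correlate(g, c, mode='full')
    lags = np.arange(-(len(c) - 1), len(g))
    k = lags[np.argmax(xc)]
    # Convert the lag into a time offset, accounting for the different grid starts.
    return k * dt + (grid_g[0] - grid_c[0])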

4.1.2 Temporal Alignment by Phase Congruency

Another way to address this problem is to evaluate the measurement sequences in the frequency domain. For that, the measurements have to be transformed into the frequency domain, F(\omega). The amplitudes and the phases of the signals can be computed by

A(\omega) = |F(\omega)| \; , \qquad \varphi(\omega) = \arg\left( F(\omega) \right) \qquad (4.8)

The phase shift between the common frequencies reveals the temporal alignment \delta t_{G2C}. Considering the amplitude, high frequencies consist mainly of the measurement noise and the lower frequencies contain the bias of the gyroscopes. Furthermore, frequency

bin 0 has a large spectral leakage because the absolute angles are defined to be only positive. Therefore, we ignore all frequencies before the first minimum in the spectrum. To suppress outliers and noise, we introduce the following normalized weighting function, which amplifies similar measurements with large amplitude

w(\omega) = \frac{1}{\sum_{\omega \in F_v} w'(\omega)}\, w'(\omega) \quad \text{with} \qquad (4.9)

w'(\omega) = \left( 1 - \frac{\max_{A(\omega)} - \min_{A(\omega)}}{\max_{A(\omega)}} \right) \min_{A(\omega)} = \frac{\min_{A(\omega)}^2}{\max_{A(\omega)}} \; ,
\max_{A(\omega)} = \max\left( A^G(\omega), A^C(\omega) \right) \; , \quad \min_{A(\omega)} = \min\left( A^G(\omega), A^C(\omega) \right) \qquad (4.10)

and F_v being the set of all valid frequencies. Of course, to prevent ambiguities it is only possible to compute delays up to \frac{\pi}{\omega}, which is half the period of the respective frequency. Converting the difference of the sensor specific phases to time, we obtain the following weighted time difference

\delta t_{G2C} = \sum_{\omega \in F_v} w(\omega)\, \frac{k_\omega 2\pi + \varphi^G(\omega) - \varphi^C(\omega)}{\omega} \; , \qquad (4.11)

where k_\omega is the factor which brings the respective frequency into the range of the delay. These factors have to be computed in a second iteration. A manually chosen maximum delay is used to pick all the valid frequencies with a period smaller than this threshold. Based on the phases and amplitudes of these frequencies, the time delay is computed and the factors k_\omega for the rest of the frequencies are estimated. Now all frequencies can be used to refine the estimate. In our experiments in Section 8.5.1, we also compare an alternative where we rely only on the most significant phase shift, corresponding to the frequency bin with the maximum weight \max(w(\omega)). This method is based on the assumption that less noise is involved in the estimate, even though it should be less robust.
A conventional Fast or Discrete Fourier Transformation (FFT, DFT) does not allow for gaps in the signal. A generalization of the DFT which can also deal with such gaps is presented in [204] and is called Extended Discrete Fourier Transform (EDFT):


F_\alpha(\omega) = \sum_{k=0}^{K-1} x(kT)\, \alpha(\omega, kT) \qquad (4.12)

where, in general, \alpha(\omega, kT) \neq e^{-j\omega kT}. It is an iterative algorithm which tries to find a transform basis function which is applicable to a band-limited signal registered in a finite time interval, providing results as close as possible to the Fourier transform. Based on this transformation, we can compute the magnitudes and the phases even for sequences with gaps.
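For illustration, a hedged sketch of the single-bin alternative mentioned above is given below. It assumes gap-free sequences resampled to a common rate (the EDFT of [204] would be required otherwise); the function name, the max_delay parameter and the use of a standard FFT are our own choices.

import numpy as np

def delay_by_phase_single_bin(theta_g, theta_c, fs, max_delay=0.1):
    # Phase-based delay estimate using only the most significant frequency bin.
    # theta_g, theta_c: gap-free absolute-angle sequences of gyroscope and camera,
    # both sampled at rate fs; max_delay: assumed upper bound on the delay.
    n = min(len(theta_g), len(theta_c))
    G = np.fft.rfft(theta_g[:n])
    C = np.fft.rfft(theta_c[:n])
    omega = 2.0 * np.pi * np.fft.rfftfreq(n, d=1.0 / fs)
    a_min = np.minimum(np.abs(G), np.abs(C))
    a_max = np.maximum(np.abs(G), np.abs(C))
    w = np.zeros_like(a_min)
    # Keep only unambiguous bins (half period longer than the assumed delay).
    ok = (a_max > 0) & (omega > 0)
    ok &= np.pi / np.where(omega > 0, omega, np.inf) > max_delay
    w[ok] = a_min[ok] ** 2 / a_max[ok]               # weighting of Eq. 4.10
    k = int(np.argmax(w))                            # most significant bin
    dphi = np.angle(G[k]) - np.angle(C[k])
    dphi = np.angle(np.exp(1j * dphi))               # wrap to [-pi, pi]
    return dphi / omega[k]                           # Eq. 4.11 with k_omega = 0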

4.1.3 Online Adaption

As already mentioned earlier, changes or drifts of the sensor clocks may be neglected for the registration because of the short time span of the sequence. In case an application has to run for hours or even longer, both sample periods have to be synchronized in order to ensure proper temporal alignment and, thus, accurate data fusion, e.g., in a

Kalman filter. This can be done by adapting the assumed sample rates \bar{m}_{T;I} = \bar{m}_{T;G} and \bar{m}_{T;C} online, e.g., by a low-pass filter on the valid sample times. In this way, one can always provide equidistant time-stamps, whereas the period is adapted by the low-pass filter. Furthermore, it is crucial to always read the complete buffer to detect data jams and, thus, to avoid reading deprecated samples instead of current ones. We compare the two presented approaches in Section 8.5.1 and evaluate the importance of proper temporal alignment.

4.2 Spatial Alignment

Once the time-line is fixed and the offset between the time-stamps of the IMU and the camera measurements is known, one can estimate the spatial alignment between the sensors. We propose to model the problem of determining the relative pose and orientation of camera and IMU as a nonlinear batch-optimization problem. Contrary to the EKF-based methods, we are not propagating the states and covariances throughout a measurement sequence, but we optimize based on the real nonlinear motion by modeling the trajectory of the sensors. Any filter-based estimation is performed sequentially instead of using the whole batch of data as is the case in our solution. Hence, our registration considers all available information at the same time and does not estimate the constant IMU-camera alignment on a sample-by-sample basis. Within this optimization framework, we seek to determine the trajectory of the unit together with the calibration parameters for some calibration data sequence. Since the problem is of very

high complexity and many parameters and characteristics of the employed sensors are unknown, it is necessary to introduce reasonable assumptions.
In our case, the assumption is that the trajectory associated with the calibration sequence can be modeled by means of twice differentiable smooth parametric curves {}^I p(t) and {}^I r(t), describing position and orientation in the IMU coordinate frame at time t, respectively. The IMU coordinate frame has been chosen for the sake of computational efficiency, as we will see later. Quaternions will be used to represent orientations, thus {}^I r(t) is a four-dimensional curve, while {}^I p(t) is a three-dimensional one.
The basic idea of our method is as follows: IMU and camera can be seen as sensors that deliver measurements that are a result of the same motion, observed from different coordinate systems. Given accurate measurements as well as accurate calibration data, we would be able to align the measurements based on the different base coordinate frames of the sensors. Conversely, we can evaluate the quality of calibration values by measuring the alignment between measurements. This leads directly to the idea of optimizing the calibration parameters by maximizing the alignment.
The measurements delivered by the IMU are the rotational velocity vector \omega^I \in \mathbb{R}^3 and the linear acceleration a^I \in \mathbb{R}^3. Depending on whether the camera observes unknown or known landmarks, it can provide relative or even absolute measurements. For the sake of generality, we assume relative translational and rotational measurements, which we interpret as linear and rotational velocities, v^C \in \mathbb{R}^3 and \omega^C \in \mathbb{R}^3, by knowing the sampling period.

4.2.1 Objective Function

Relationships between the measurements of the different sensors can be established as follows [105, 111]:

\hat{\omega}^I_t = \omega^I_t - e_{\omega^I\!,t} = {}^I R_C \left( \omega^C_t - e_{\omega^C\!,t} \right) \qquad (4.13)

a^I_t - e_{a^I\!,t} + {}^I R_W\, {}^W g = {}^I R_C \left( \dot{v}^C_t - \dot{e}_{v^C\!,t} \right) - \hat{\omega}^I_t \times \left( \hat{\omega}^I_t \times t_{IC} \right) - \dot{\hat{\omega}}^I_t \times t_{IC} \qquad (4.14)

In the above formulæ, \hat{\omega}^I_t denotes the real, error-free angular velocity, e_x represents the error of a specific measurement x, and {}^W g is the gravitation vector (0, 0, -9.81)^T. Measurements of cheap MEMS gyroscopes and accelerometers are not only affected by white noise, but also by a bias, which has to be modeled explicitly. We will denote the bias values for gyroscope and accelerometer by b_{\omega^I} and b_{a^I}, respectively. Since the

bias values change extremely slowly over time, we adopt the common assumption that they are constant during the registration sequence.
Unfortunately, a direct comparison as outlined above is not possible, because the sensor measurements occur at different time instances. Hence, we need a model which allows us to interpolate the motion at arbitrary points in time. In our optimization approach, we want to minimize the differences between the measurements and, thus, we aim to minimize the error between all samples and a motion model which inherently interpolates these measurements. The measurement errors \delta_{x;i} at time instance i can be formulated by the following equations:

\delta_{\omega^I;i} = \omega^I_i - b_{\omega^I} - {}^I\omega(t_i) \qquad (4.15)
\delta_{a^I;i} = a^I_i - b_{a^I} + {}^I r(t_i) \odot I_{vq}^T\, {}^W g \odot {}^I \bar{r}(t_i) - {}^I \ddot{p}(t_i) \qquad (4.16)
\delta_{\omega^C;i} = {}^I R_C\, \omega^C_i - {}^I\omega(t_i) \qquad (4.17)
\delta_{v^C;i} = {}^I R_C\, v^C_i + {}^I\omega(t_i) \times t_{IC} - {}^I \dot{p}(t_i) \qquad (4.18)

with

{}^I\omega(t) = 2\, I_{vq} \left( {}^I\bar{r}(t) \odot {}^I\dot{r}(t) \right) \; , \qquad (4.19)
I_{vq} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} \; , \qquad (4.20)
{}^I\bar{r}(t) = \mathrm{diag}(-1, -1, -1, 1)\, {}^I r(t) \qquad (4.21)

and t_i representing the measurement time of sample i. The operator \odot denotes the quaternion multiplication. Please note that the equations are simpler if the trajectory is modeled relative to the IMU frame. The only measurement revealing information about the real orientation of the IMU relative to the world coordinate frame is the gravity vector. However, this becomes irrelevant if we also estimate the orientation of the gravity vector relative to the initial pose of the IMU, {}^{I_0}g. Hence, the trajectory starts at the origin of the IMU coordinate frame, aligned to its axes, and can be estimated independently of the world coordinate frame. That way, {}^W g in Eq. 4.16 can be replaced by {}^{I_0}g.
The errors are weighted with the pseudo Huber cost function h(\delta) [108], which is differentiable and robust against outliers. It is defined as

h(\delta_{x,i}) = 2 b_x^2 \left( \sqrt{ \frac{\delta_{x,i}^T \delta_{x,i}}{b_x^2} + 1 } - 1 \right) \; . \qquad (4.22)


A common choice for b_x is 3\sigma_x, where \sigma_x denotes the standard deviation of the noise and x the kind of measurement, i.e., \omega^I, a^I, \omega^C or v^C, respectively.

Concatenating the matrices of all the error vectors \Delta_x = (\delta_{x,1} \ldots \delta_{x,N_x}), with N_x the respective number of measurements, yields the total error matrix

\Delta = \left( \Delta_{\omega^I} \;\; \Delta_{a^I} \;\; \Delta_{\omega^C} \;\; \Delta_{v^C} \right). Thus, the objective function g(\Delta) can be formulated as

g(\Delta) = \sum_{x \in \{\omega^I\!, a^I\!, \omega^C\!, v^C\}} \frac{1}{\sigma_x^2} \left( \sum_{i=1}^{N_x} h(\delta_{x,i}) \right) \; . \qquad (4.23)
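For reference, the cost function and the objective of Eqs. 4.22 and 4.23 can be written down directly; the dictionary-based interface below is purely an illustrative assumption.

import numpy as np

def pseudo_huber(delta, b):
    # Pseudo Huber cost of Eq. 4.22 for one residual vector delta.
    return 2.0 * b**2 * (np.sqrt(delta @ delta / b**2 + 1.0) - 1.0)

def objective(residuals, sigmas, bs):
    # Sketch of the objective of Eq. 4.23. residuals maps the measurement
    # types ('w_I', 'a_I', 'w_C', 'v_C') to (N_x, dim) arrays of residuals
    # delta_{x,i}; sigmas and bs map the same keys to the noise standard
    # deviation sigma_x and the threshold b_x (e.g. 3 * sigma_x).
    total = 0.0
    for x, res in residuals.items():
        total += sum(pseudo_huber(d, bs[x]) for d in res) / sigmas[x]**2
    return total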

4.2.2 B-Spline Based Trajectory Modeling

The model for our trajectory representation needs to be twice differentiable to provide the second derivative of the position for the acceleration estimation {}^I\ddot{p}(t) (see Eq. 4.16). Therefore, we use a cubic B-spline curve in our approach as a continuous, twice differentiable trajectory model. Using this curve model certainly constitutes a restriction, since it imposes a number of constraints on the trajectory. Note, however, that it is still more general than the commonly encountered assumption of piecewise linearity of motion, which is usually made in filter-based approaches. We are now going to introduce our notation for B-spline curves. For more detailed information about B-splines, see [205] or [206]. We define a B-spline curve as

s(t) = \sum_{i=1}^{M} c_i\, b_i^k(t) \; , \qquad (4.24)

where c_i \in \mathbb{R}^d are the d-dimensional control point values, b_i^k(t) denotes the i-th basis function of order k and M is the number of knots. The B-spline order k can be chosen arbitrarily as long as it is at least four. This is required because the trajectory curve has to be at least twice continuously differentiable, since we also need to compute the acceleration from the position spline. We use a cubic B-spline, that is of order k = 4, which is equivalent to a piecewise linear approximation of the acceleration measurements. With c, we denote the vector \left( c_1^T\; c_2^T \ldots c_M^T \right)^T \in \mathbb{R}^{Md} of concatenated control point values. Furthermore, we assume that an equidistant knot sequence is used.
It is well-known that B-splines are linear in parameters, which means that the evaluation of the above equation at several parameter locations t = (t_0\; t_1 \ldots t_n) is equivalent to computing the matrix-vector product B(t)\, c for a suitable basis matrix B(t). More

formally, this is expressed as

\left( s(t_1)^T \;\; s(t_2)^T \ldots s(t_n)^T \right)^T = B(t)\, c \; . \qquad (4.25)

B(t) = \begin{pmatrix} b_1^k(t_1) & b_2^k(t_1) & \ldots & b_M^k(t_1) \\ b_1^k(t_2) & b_2^k(t_2) & \ldots & b_M^k(t_2) \\ \vdots & \vdots & \ddots & \vdots \\ b_1^k(t_n) & b_2^k(t_n) & \ldots & b_M^k(t_n) \end{pmatrix} \otimes I_d \; , \qquad (4.26)

where \otimes denotes the Kronecker matrix product and I_d is the d \times d identity matrix. It is obvious that if the vector of parameter locations t remains constant, so does the matrix B = B(t). The time-stamps of the measurements are constant due to the previous temporal alignment and, hence, the matrix B has to be computed only once. Furthermore, it is well-known that B-spline derivatives are again B-splines, and as such are again linear in parameters. In our optimization process, we are going to evaluate the spline and its derivatives at the respective measurement time stamps. This means that spline derivatives can also be computed by simply evaluating B'c for some appropriate matrix B' representing the basis matrix of the derived spline.
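As an illustration, the scalar part of such a basis matrix (and of its derivative counterparts B' and B'') could be assembled as sketched below, using SciPy's B-spline evaluation; the function name and the one-hot evaluation of the basis functions are our own choices.

import numpy as np
from scipy.interpolate import BSpline

def basis_matrix(knots, M, t, k=4, deriv=0):
    # Sketch of the scalar basis matrix of Eq. 4.26 (before the Kronecker
    # product with I_d). knots: full knot vector with M + k entries for a
    # spline of order k with M control points; t: measurement time-stamps;
    # deriv: 0, 1 or 2 for B, B' and B''.
    B = np.empty((len(t), M))
    for i in range(M):
        coeffs = np.zeros(M)
        coeffs[i] = 1.0                         # i-th basis function b_i^k
        spline = BSpline(knots, coeffs, k - 1)  # SciPy uses the degree k - 1
        B[:, i] = spline.derivative(deriv)(t) if deriv else spline(t)
    return B

The full matrix of Eq. 4.26 is then obtained as np.kron(B, np.eye(d)), so that B(t) c stacks the d-dimensional curve values at all time-stamps.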

In our implementation, we need a B-spline of dimension d = 7 and, thus, c_i \in \mathbb{R}^7, to model the IMU pose

{}^I s(t) = \left( {}^I p(t)^T \;\; {}^I r(t)^T \right)^T \; . \qquad (4.27)

Note that the quaternions are constrained to be of unit length, as described in Section 4.2.3, which yields the expected six degrees of freedom for rigid body motion.
The control point vector of the B-spline is part of the parameter vector \theta subject to optimization. Furthermore, this vector contains the two IMU bias terms, the initial direction of the gravity vector, the quaternion and the vector describing the translation between the IMU and camera coordinate systems. For the sake of generality, we also model the scale factor \alpha of the measured camera velocity, assuming that natural landmarks are used and the scale of translation is unknown and, thus, is also estimated in our optimization:

\theta = \left( c_1^T\; c_2^T \ldots c_M^T \;\; b_{\omega^I}^T \;\; b_{a^I}^T \;\; {}^I q_C^T \;\; {}^{I_0} g^T \;\; t_{IC}^T \;\; \alpha \right)^T \; . \qquad (4.28)


4.2.3 Constraints and Optimization Details

There are some constraints on a few parameters which have to be satisfied during the optimization. The unit property of the control points of the B-spline and of {}^I q_C has to be ensured. Furthermore, the gravity vector has to be of length 9.81 and the first control point of the spline is constrained to represent zero rotation and zero translation to avoid dependencies in the optimization parameters. This is because both the direction of the gravity vector and the pose of the IMU are going to be optimized; at the beginning of the trajectory, one of these parameters has to be fixed to prevent redundant degrees of freedom. All these requirements can be formulated as equality constraints which have to be zero:

r_{\mathrm{eq};1} = \left( \sum_{i=1}^{M} \| c_i \| \right) - M \qquad (4.29)
r_{\mathrm{eq};2} = \left\| {}^I q_C \right\| - 1 \qquad (4.30)
r_{\mathrm{eq};3} = \left\| {}^{I_0} g \right\| - 9.81 \qquad (4.31)
r_{\mathrm{eq};4} = c_1 - \begin{pmatrix} t_0 \\ q_0 \end{pmatrix} \qquad (4.32)

with t_0 = 0_3 and q_0 = \left( 0\; 0\; 0\; 1 \right)^T.
We use sequential quadratic programming (SQP) as the optimization algorithm, because it is known to work well for nonlinear optimization problems and it allows us to implement equality constraints [207]. For optimization purposes it is generally necessary to compute the gradient of the objective function, as well as an appropriate Hessian approximation. The gradient of the objective function can be computed by applying the chain rule to Eq. 4.23 as

\frac{\partial g(\Delta)}{\partial \theta} = \sum_{x \in \{\omega^I\!, a^I\!, \omega^C\!, v^C\}} \frac{1}{\sigma_x^2} \left( \sum_{i=1}^{N_x} J\, \frac{\partial h(\delta_{x;i})}{\partial \Delta} \right) \; , \qquad (4.33)

with J being the Jacobian of the error matrix \Delta relative to the parameter vector \theta. The derivative of the pseudo Huber cost function is stacked by

\frac{\partial h(\delta_{x;i})}{\partial \Delta} = \left( \frac{\partial h(\delta_{x;i})}{\partial \Delta_{\omega^I}}^T \;\; \frac{\partial h(\delta_{x;i})}{\partial \Delta_{a^I}}^T \;\; \frac{\partial h(\delta_{x;i})}{\partial \Delta_{\omega^C}}^T \;\; \frac{\partial h(\delta_{x;i})}{\partial \Delta_{v^C}}^T \right)^T \qquad (4.34)

with

\frac{\partial h(\delta_{x;i})}{\partial \Delta_x} = \left( \frac{\partial h(\delta_{x;i})}{\partial \delta_{x,1}}^T \ldots \frac{\partial h(\delta_{x;i})}{\partial \delta_{x,N_x}}^T \right)^T \qquad (4.35)

and

\frac{\partial h(\delta_{x;i})}{\partial \delta_{x,i}} = \frac{-2}{\sqrt{\frac{\delta_{x,i}^T \delta_{x,i}}{b_x^2} + 1}}\, \delta_{x,i} \; . \qquad (4.36)

The Jacobian J can be described by the following components

J = \frac{\partial \Delta}{\partial \theta} = \left( J_c \;\; J_{b_{\omega^I}} \;\; J_{b_{a^I}} \;\; J_{{}^I q_C} \;\; J_{{}^{I_0} g} \;\; J_{t_{IC}} \;\; J_\alpha \right) \; , \qquad (4.37)

which can be computed in a straight-forward manner. The main part of the Jacobian consists of the control points J_c = \left( J_{c;\omega^I}^T \;\; J_{c;a^I}^T \;\; J_{c;\omega^C}^T \;\; J_{c;v^C}^T \right)^T where each J_{c;x} looks like

J_{c;x} = \begin{pmatrix} \frac{\partial \delta_{x;1}}{\partial c_1} & \frac{\partial \delta_{x;1}}{\partial c_2} & \ldots & \frac{\partial \delta_{x;1}}{\partial c_M} \\ \frac{\partial \delta_{x;2}}{\partial c_1} & \frac{\partial \delta_{x;2}}{\partial c_2} & \ldots & \frac{\partial \delta_{x;2}}{\partial c_M} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial \delta_{x;N_x}}{\partial c_1} & \frac{\partial \delta_{x;N_x}}{\partial c_2} & \ldots & \frac{\partial \delta_{x;N_x}}{\partial c_M} \end{pmatrix} \qquad (4.38)

and each \frac{\partial \delta_{x;i}}{\partial c_j} has two parts:

\frac{\partial \delta_{x;i}}{\partial c_j} = \left( \frac{\partial \delta_{x;i}}{\partial c_{p;j}} \;\; \frac{\partial \delta_{x;i}}{\partial c_{r;j}} \right) \qquad (4.39)

with c_{p;j} and c_{r;j} the position and rotation components of the control points, modeling p(t_j) and r(t_j), respectively. For approximating the Hessian of the system, the popular Broyden-Fletcher-Goldfarb-Shanno (BFGS) method [207] is used.
Nonlinear optimization does not guarantee finding the global optimum. Whether a local or the global optimum is found depends on the convexity of the problem and the initial values of the parameters to optimize. We do not want to relax the problem to become convex, which comes with relaxation errors; we rather try to find good starting points for the parameters, which also speeds up the optimization process. The initialization of the optimization parameters is a three-step procedure. First of all, we find the time offset between the IMU and camera measurements as described in Section 4.1 and an initial estimate {}^I\hat{q}_C of the orientation between the IMU and camera coordinate frames, as shown in the next block, Section 4.2.4. The second step is to determine the number of B-spline control points to be used and good starting points for them. Finally, the gravity vector and the bias terms have to be initialized. Both

will be described in Section 4.2.5.

4.2.4 Initialization of the Spatial Alignment

In the following, we present a closed form initialization for the gyroscope-camera alignment and briefly discuss the translation initialization. The orientation estimation can also be used as the only registration step, without optimization, in case a less accurate alignment is sufficient.

Angular Alignment

The angular registration of a camera and a gyroscope can be seen as solving the rotational part of the well-known hand-eye calibration problem. To compute the rotational alignment {}^G R_C between the gyroscopes and the camera, we compute the relative rotations {}^G R_{\Delta_i} and {}^C R_{\Delta_i} between two consecutive images i-1 and i. The rotations of the gyroscopes, measured in their coordinate frame G, are simply integrated between two camera time-stamps.
The task is now to find the rotation {}^G R_C which rotates the measurements from frame C to frame G according to the well-known hand-eye calibration equation

{}^G R_{\Delta_i} = {}^G R_C\, {}^C R_{\Delta_i}\, {}^G R_C^T \; . \qquad (4.40)

To solve for the best fitting spatial alignment {}^G\tilde{R}_C, the following least-squares problem over all N camera measurements has to be solved:

{}^G \tilde{R}_C = \arg\max_R \sum_{i=1}^{N} \mathrm{tr}\left( {}^G R_{\Delta_i}\, R\, {}^C R_{\Delta_i}^T\, R^T \right) \; , \quad R \in SO(3) \qquad (4.41)

A closed form solution for the rotation estimation is described and derived in [106] and can be computed as follows. Let {}^C\hat{p}_{\Delta_i}, {}^G\hat{p}_{\Delta_i} and {}^G\hat{p}_C denote the real eigenvectors to the eigenvalue 1 of {}^C R_{\Delta_i}, {}^G R_{\Delta_i} and {}^G R_C, respectively. Hence, they represent the rotation axes of these DCMs. Using this representation, the system of linear equations to solve for {}^G p_C, consisting of {}^G\hat{p}_C and {}^G\theta_C according to Eq. 2.4, can be set up from all measurement pairs, such that

\left[ {}^C\hat{p}_{\Delta_i} + {}^G\hat{p}_{\Delta_i} \right]_\times {}^G p'_C = {}^C\hat{p}_{\Delta_i} - {}^G\hat{p}_{\Delta_i} \; , \qquad (4.42)

with [\,\cdot\,]_\times denoting the skew symmetric matrix of a vector.


The angle-axis form of the rotation {}^G R_C can then be computed from {}^G p'_C by

{}^G\hat{p}_C = \frac{{}^G p'_C}{\left\| {}^G p'_C \right\|} \quad \text{and} \quad {}^G\theta_C = 2 \tan^{-1}\left( \left\| {}^G p'_C \right\| \right) \; . \qquad (4.43)
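A minimal sketch of this closed-form step is given below. The function name, the array layout and the optional per-equation weights w_i (such as the angle-based weights introduced later in this section) are our own assumptions.

import numpy as np

def skew(v):
    # Skew symmetric matrix [v]_x of a 3-vector.
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def gyro_camera_rotation(p_cam, p_gyro, weights=None):
    # Sketch of the closed-form angular alignment of Eqs. 4.42 and 4.43.
    # p_cam, p_gyro: (N, 3) rotation axes of the camera and gyroscope
    # inter-frame rotations; weights: optional per-equation weights w_i.
    N = p_cam.shape[0]
    w = np.ones(N) if weights is None else np.asarray(weights)
    A = np.vstack([w[i] * skew(p_cam[i] + p_gyro[i]) for i in range(N)])
    b = np.concatenate([w[i] * (p_cam[i] - p_gyro[i]) for i in range(N)])
    p, *_ = np.linalg.lstsq(A, b, rcond=None)      # G^p'_C from Eq. 4.42
    axis = p / np.linalg.norm(p)
    angle = 2.0 * np.arctan(np.linalg.norm(p))     # Eq. 4.43
    return axis, angle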

If all rotations are measured about the same axis, or the sensor coordinate frames are rotated by 0° or 180°, this system is singular and there is no unique solution. However, if there are at least two rotations about different axes and these axes are not collinear, a unique solution exists. We detect the collinear exception by inspecting the singular values \sigma_1, \sigma_2 and \sigma_3 of \left[ {}^C\hat{p}_{\Delta_i} + {}^G\hat{p}_{\Delta_i} \right]_\times, where \sigma_1 > \sigma_2 > \sigma_3. If \frac{\sigma_3}{\sigma_2} < k \ll 1, where k denotes a threshold which defines the stack of skew symmetric matrices to be rank-deficient, a rotation close to 0° or 180° between the sensor frames is expected. In this case, the angular alignment is estimated using SQP optimization, where the objective function is defined as

c_{\mathrm{SQP}}\left( {}^G R_C \right) = \sum_{i=1}^{N_C} \left\| {}^G R_C\, {}^C\hat{p}_{\Delta_i} - {}^G\hat{p}_{\Delta_i} \right\| \; . \qquad (4.44)

As starting point for the estimation we use

{}^G R_C = \mathrm{DCM}\left( \tilde{p}_{\Delta_i}, 0° \right) \quad \text{or} \quad {}^G R_C = \mathrm{DCM}\left( \tilde{p}_{\Delta_i}, 180° \right) \; , \qquad (4.45)

depending on whether the signs of the delta rotation axes correspond or not, where

\tilde{p}_{\Delta_i} = \frac{\tilde{p}'_{\Delta_i}}{\left\| \tilde{p}'_{\Delta_i} \right\|} \quad \text{and} \quad \tilde{p}'_{\Delta_i} = \mathrm{med}\left\{ {}^C\hat{p}_{\Delta_i} + {}^G\hat{p}_{\Delta_i} \right\}_{i=1}^{N_C} \; . \qquad (4.46)

Hence, \tilde{p}_{\Delta_i} denotes the normalized element-wise median of the sum of rotation axes, which should be quite close to the global optimum and, thus, prevents SQP from finding local ones.
An alternative approach would be to modify the measurements of the camera or IMU by a constant rotation, e.g., 90°, so that the closed-form solution is non-singular again, and to consider this rotational offset in the computed angular alignment. However, in this case it is crucial to choose an axis which does not correspond to the actual axis of the rotational alignment and, therefore, we propose to choose an axis perpendicular to \tilde{p}_{\Delta_i}.
In the context of gyroscope-camera calibration, the closed-form solution needs some adaptions to achieve adequate robustness. In [106], critical factors affecting the accuracy and robustness have been discussed and it has been observed that the accuracy is

proportional to the magnitude of the measured rotations. In the case of camera-to-gyroscope calibration, the sensitivity to errors caused by relatively small inter-frame rotations becomes crucial and may lead to wrong results. Furthermore, Eq. 4.42 minimizes the steady measurement error of a conventional robot-camera setup, but it does not consider any time correlation between the sensors. It is necessary to overcome this problem if we want to apply this method to bias-prone gyroscopes.

If Eq. 4.42 is applied without any modification, each inter-frame rotation affects the outcome equally, independent of whether there is no rotation and the measurement consists only of noise, or an outlier is present. Therefore, the absolute angles of the inter-frame rotations should be used to weight the respective equations, based on the following observations: small rotations suffer from a low signal-to-noise ratio, and a large discrepancy between the sensor measurements implies an erroneous sample pair. This leads to the following weights for both sides of Eq. 4.42 and for the summands in Eq. 4.44:
\[
w_i = \left( 1 - \frac{\max\theta_i - \min\theta_i}{\max\theta_i} \right) \min\theta_i = \frac{\min^2\theta_i}{\max\theta_i} , \qquad (4.47)
\]
with $\max\theta_i$ and $\min\theta_i$ defined analogously to Eq. 4.10.

Calibration sequences are usually short and, therefore, a common assumption is that the IMU biases are constant. The bias estimation is free of constraints and, thus, we use a Levenberg-Marquardt optimization for its computation. In the experiments of Section 8.5.2, we compare the following two cost functions. In the first approach, we simply try to find the bias by making the absolute angles as similar as possible while disregarding outliers. This is achieved by applying the Blake-Zisserman cost function [208], which has been chosen due to its outlier handling:
\[
c_{\mathrm{BZ}}(\delta) = -\ln\!\left( e^{-\delta^2} + \epsilon \right) . \qquad (4.48)
\]
The crossover point from inliers to outliers is given by the threshold $\alpha$ in $\epsilon = e^{-\alpha^2}$. Thus, this optimization problem may be written as
\[
\tilde{b} = \arg\min_{b} \sum_{i=1}^{N} c_{\mathrm{BZ}}\!\left( {}^G\theta'_{\Delta i} - {}^C\theta_{\Delta i} \right) \qquad (4.49)
\]


with ${}^G\theta'_{\Delta i}$ being the absolute angle corresponding to
\[
{}^G R'_{\Delta i} = \prod_{t = t_{i-1}}^{t_i} {}^G R'_t , \qquad (4.50)
\]
where
\[
{}^G R'_t = \mathrm{DCM}\!\left( \bar{m}_T \left( \begin{bmatrix} {}^G\dot{\phi}_t \\ {}^G\dot{\chi}_t \\ {}^G\dot{\psi}_t \end{bmatrix} - b \right) \right) \qquad (4.51)
\]
and $\tilde{b}$ being the estimated bias.

Another approach is to minimize the trace of the covariance matrix, $P_{{}^G p_C}$, of ${}^G p_C$. The covariance matrix for ${}^G p'_C$ results from the over-constrained linear system (see Eq. 4.42) as
\[
P_{{}^G p'_C} = V_{[\cdot]_\times}\, \Sigma_{[\cdot]_\times}^{-2}\, V_{[\cdot]_\times}^T \quad \text{with} \quad U_{[\cdot]_\times} \Sigma_{[\cdot]_\times} V_{[\cdot]_\times}^T = \mathrm{SVD}\!\left( B^{C+G}_{[\cdot]_\times} \right) \qquad (4.52)
\]
and $B^{C+G}_{[\cdot]_\times}$ representing the vertical stack of the skew matrices of all the vector sums, $\left[ {}^C\hat{p}_{\Delta i} + {}^G\hat{p}_{\Delta i} \right]_\times$. Depending on the length of the calibration sequence, this matrix may become quite large. To reduce the processing time, one can also use
\[
P_{{}^G p'_C} = V_T \left( \operatorname{tr}\!\left( \Sigma_T^2 \right) I - \Sigma_T^2 \right)^{-1} V_T^T \qquad (4.53)
\]
with $U_T \Sigma_T V_T^T = \mathrm{SVD}\!\left( B_T^{C+G} \right)$ and $B_T^{C+G}$ being the row-wise stack of the vectors $\left( {}^C\hat{p}_{\Delta i} + {}^G\hat{p}_{\Delta i} \right)^T$. In this way, one only has to compute the SVD of $B_T^{C+G}$ and not of $B^{C+G}_{[\cdot]_\times}$, which is three times larger. A derivation of this equation is shown in Appendix A.4.

Minimizing the covariance matrix of ${}^G p'_C$ does not necessarily mean minimizing the covariance of ${}^G p_C$. Therefore, we propagate the covariance and the estimate for ${}^G p'_C$ through the nonlinear equation
\[
{}^G p_C = \frac{2\, {}^G p'_C}{\sqrt{1 + \left\| {}^G p'_C \right\|^2}} \qquad (4.54)
\]
by using sigma-points, like in the Unscented Kalman filter [209]. Sticking to the notation of that paper, the weight $W^{(0)}$ is set according to the formula which computes

the optimum in the case of a Gaussian distribution,

\[
W^{(0)} = 1 - \frac{N_S}{3} , \qquad (4.55)
\]
with $N_S$ denoting the vector length, which is three in our case. Thus, the objective function this time is
\[
\tilde{b} = \arg\min_{b} \operatorname{tr}\!\left( P_{{}^G p_C} \right) . \qquad (4.56)
\]

Translational Alignment

An initialization of the translation is rather complicated to achieve. Either a first estimate for ${}^I t_{IC}$ can be extracted from a CAD drawing, or a complex setup as described in [182] has to be used. However, experiments have shown that, once the angular alignment is properly initialized, the translation converges reliably. Hence, it does not have to be initialized accurately and can also be set to zero. The initialization of the remaining parameters for the batch optimization of the spatial alignment is described in the following.

4.2.5 Optimization Parameter Initialization

The next step is to determine the number of B-spline control points and to initialize them. A high number of control points means less smoothing and higher flexibility, but it also increases the computational complexity and the risk of over-fitting. For our experiments in Section 8.5.2, we use $0.6\,N_C$ control points. This amount provides a good compromise between computational efficiency and accuracy, as we will show there. Nevertheless, a more general solution would be to evaluate the measurements of the accelerometers and the gyroscopes in order to adapt the knot vector and the number of control points to the requirements given by the motion dynamics. The final step is the initialization of the spline control parameter vector $c$. A good initial estimate can be found by calculating ${}^I\hat{r}(t) = {}^I\hat{q}_W(t)$ and ${}^I\hat{p}(t) = {}^I\hat{t}_{IW}(t)$ from the camera measurements according to the following equations

\[
{}^W\hat{q}_C(t_i) = \bigodot_{j=1}^{i} Q\!\left( {}^C\omega_j\, \bar{m}_C \right) \qquad (4.57)
\]
\[
{}^I\hat{q}_W(t_i) = {}^I\hat{q}_C \odot {}^W\bar{\hat{q}}_C(t_i) \qquad (4.58)
\]
\[
{}^I\hat{t}_{IW}(t_i) = {}^I\hat{t}_{IC} - \mathrm{DCM}\!\left( {}^I\hat{q}_W(t_i) \right) \sum_{j=1}^{i} \mathrm{DCM}\!\left( {}^W\hat{q}_C(t_j) \right) {}^C v_j\, \bar{m}_C , \qquad (4.59)
\]


whereas $i \in \{1 \dots N_C\}$. Subsequently, the B-spline can be approximated by the following least-squares fitting equation:
\[
\hat{c} = \left( B(t_C)^T B(t_C) \right)^{-1} B(t_C)^T \begin{bmatrix} \hat{p}(t_C) \\ \hat{r}(t_C) \end{bmatrix} \qquad (4.60)
\]
with $t_C = \begin{bmatrix} t_1 & \dots & t_{N_C} \end{bmatrix}$.

The last step is the initialization of the bias terms $b_{\omega I}$ and $b_{a I}$, the gravity vector ${}^{I_0}g$ and the translation scale $\alpha$. Both bias values are initialized with zero vectors. In general, there is no prior knowledge of the initial orientation of the gravity vector ${}^{I_0}g$ and, thus, we assume that the acceleration during the first few IMU measurements is negligibly small, so that the sensor measurements consist mainly of the gravity force. Hence, an initial estimate can be obtained by computing the negative mean of the first few, $L$, samples:
\[
{}^{I_0}\hat{g} = -\frac{1}{L} \sum_{i=1}^{L} {}^I a_{I;i} . \qquad (4.61)
\]
If there is no knowledge about the used landmarks, the scale factor can be of any size. One approach to find a proper initial guess is to use the maximum measured velocity $v_C$ of unknown scale and to choose an $\alpha$ which yields a reasonable maximum velocity, as assumed to be applied during the data acquisition. If the scale of the landmarks is known, the translational scale is also known ($\alpha = 1$) and, hence, can be removed from the parameter vector.
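A minimal sketch of these two initializations, assuming the sensor is (nearly) static for the first samples; the function names and the choice of $L$ are illustrative, not prescribed by the thesis.

```python
import numpy as np

def init_gravity(accel_samples, L=50):
    """Initial gravity estimate as the negative mean of the first L accelerometer
    samples (cf. Eq. 4.61); accel_samples is an (N, 3) array in the IMU frame."""
    a = np.asarray(accel_samples[:L])
    return -a.mean(axis=0)

def init_scale(max_visual_velocity, assumed_max_velocity_mps):
    """Heuristic initial translation scale alpha: ratio between an assumed physical
    peak velocity and the peak of the unscaled visual velocity estimate."""
    return assumed_max_velocity_mps / max_visual_velocity
```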

4.3 IMU Supported Tracking

Once the IMU-camera setup is registered, the measurements can be fused. In the following, we describe two fusion concepts: in this section, we discuss IMU-supported tracking, and in the next section we describe a loose Kalman fusion. These methods can be applied together or separately. For the sake of efficiency, one should use local feature trackers if a dense image sequence is provided. However, these will in general fail if the translational or, especially, the rotational velocities become too high. Hence, tracking should be supported by the motion estimates of an IMU, which provides reliable estimates at such high dynamics. The IMU rotation expressed in the camera frame, ${}^C R_I$, can easily be applied to landmark propagation by rotating each feature vector $r$, representing the point on the unit-focal image plane in the direction of the tracked point,
\[
\hat{r}_{t+1} = \frac{\hat{r}'_{t+1}}{\hat{r}'_{t+1;z}} \quad \text{with} \quad \hat{r}'_{t+1} = {}^C R_{I;t+1}\, r_t . \qquad (4.62)
\]

If the scale of the features were known, we could also apply the translation integrated from the acceleration measurements of the IMU after gravity subtraction. However, experiments have shown that the double integration of the noise- and bias-prone acceleration measurements does not yield translation estimates which are accurate enough for landmark propagation. Hence, we propose to use a 2D linear translation prediction $d_t$ for each feature, as we will describe in Section 7.1. To obtain only the translational part of the optical flow, we need to subtract the rotational component from it, which we can simply compute by applying ${}^C R_I$ to the reference location:
\[
\hat{d}_{t;t+1} = d_{t;t} = d_t - d_{R;t} \quad \text{with} \quad d_{R;t} = \hat{r}_{t+1} - r_t , \qquad (4.63)
\]
where $d_t$ is the measured optical flow vector. Finally, the landmark propagation consists of
\[
\hat{p}_{t+1} = p_t + d_{R;t+1} + \hat{d}_{t;t+1} . \qquad (4.64)
\]

Thus, the rotation is propagated based on the IMU measurement, while the translation is predicted in 2D, as illustrated in Fig. 4.1. The camera measurements and the IMU motion estimates consequently remain completely separated, which prevents an accumulation of the estimation errors.
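A minimal sketch of this feature prediction, assuming the per-frame IMU delta rotation has already been transformed into the camera frame; variable names are illustrative.

```python
import numpy as np

def propagate_feature(r_t, d_trans_prev, R_CI):
    """Predict the unit-focal image location of a feature in frame t+1.

    r_t          : feature ray at time t on the unit-focal plane, shape (3,), z = 1
    d_trans_prev : previously observed 2D translational flow (linear prediction)
    R_CI         : IMU delta rotation expressed in the camera frame for t -> t+1
    (cf. Eqs. 4.62-4.64)
    """
    r_rot = R_CI @ r_t
    r_hat = r_rot / r_rot[2]                 # Eq. 4.62: re-normalize to the unit-focal plane
    d_rot = (r_hat - r_t)[:2]                # rotational component of the optical flow
    return r_t[:2] + d_rot + d_trans_prev    # Eq. 4.64: rotation + 2D translation prediction
```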

Figure 4.1: Two corresponding features in consecutive images acquired during highly dynamic motions at a frame rate of 25 Hz. The feature drifts by 40.2 pixels, whereas 37 pixels result from rotation and 17 pixels from translation (some pixels cancel out). The red circle shows the feature propagation by the IMU and the translation prediction in 2D. The yellow square illustrates the tracking result. The estimated location of the feature corresponds well enough to allow for local tracking.

Using such a prediction step, the bottleneck of successful tracking during fast motions (in the experiments we measured successful tracking up to 75 °/s and 0.5 m/s) is no longer the large feature displacement between two images (up to 50 pixels in the experiments), but mainly the motion blur¹. The shutter time of the cameras cannot be reduced arbitrarily, because short exposure times force the camera gain to be increased, which amplifies the image noise and impedes proper tracking. In Section 8.5.3, we will illustrate a trajectory of a highly dynamic motion and the successful prediction achieved by fusing IMU measurements.

4.4 Kalman Filter Based Fusion

While IMU-aided tracking improves the robustness of the image processing significantly, it does not directly improve the motion estimation. One way to fuse both motion estimates is by means of a Kalman filter. Such filters are attractive due to their computational efficiency, while being optimal for linear problems with zero-mean, white Gaussian noise. Unfortunately, our problem is highly nonlinear. These nonlinearities can be approximated by making use of Kalman filter derivatives, e.g., the EKF or the UKF. We chose to realize our fusion as a loosely coupled EKF for the sake of an efficient solution which can run at high rates on a resource-limited platform. To improve the linearization properties, we make use of an error-state EKF [210, 111]. In the following filter, we aim at combining relative measurement updates with the camera uncertainties modeled in the respective domain, as well as a fast convergence of the translational scale.

4.4.1 Dynamic Model

Similar to most solutions presented in the literature, our dynamic model for an inertial-visual fusion consists of the following variables: position ${}^W t_{WI}$, velocity ${}^W v_{WI}$, attitude (as quaternion) ${}^W q_I$, the bias of the accelerometers ${}^I b_{aI}$, and the bias of the gyroscopes ${}^I b_{\omega I}$.

The uncertainties $n_x$ in the pose estimation are modeled as additive zero-mean, white Gaussian noise on the translational acceleration, ${}^W n_a$, and on the rotational velocity, ${}^W n_\omega$. The biases are modeled as random-walk processes which are also driven by additive white Gaussian noise, ${}^I n_{b_a}$ and ${}^I n_{b_\omega}$. The pose and the velocity are estimated as the motion of the IMU frame $I$ relative to the coordinate frame $W$, expressed in $W$. The biases are estimated in $I$. For the sake of readability, we will drop the frame indices from now on and only mention them where they are required for understanding.

1A short video is available at http://www6.in.tum.de/∼maire/videos/3dMo imuTracking.avi


To reduce the linearization errors of the EKF, we represent the dynamic model in the error state space with the error state vector δx, such that

\[
\delta_x = \begin{bmatrix} \delta_t^T & \delta_v^T & \delta_{\tilde q}^T & \delta_{b_a}^T & \delta_{b_\omega}^T \end{bmatrix}^T . \qquad (4.65)
\]
This representation also allows representing a quaternion by the vector of its three imaginary elements, $\delta_{\tilde q}$, by applying a small-angle approximation according to Eq. 2.6. The real part of the quaternion is approximated as one and can, thus, be neglected (see Eq. 2.7). As a further advantage, we obtain a minimal representation of the rotational DOF, which results in $\delta_x \in \mathbb{R}^{15}$. The continuous-time transition can, thus, be modeled as follows:
\[
\dot{\delta}_x = F\, \delta_x + G\, n_x , \qquad (4.66)
\]
whereas
\[
F = \begin{bmatrix}
0_{3\times3} & I_3 & 0_{3\times3} & 0_{3\times3} & 0_{3\times3} \\
0_{3\times3} & 0_{3\times3} & -2\left[{}^W a\right]_\times & -{}^W R_I & 0_{3\times3} \\
0_{3\times3} & 0_{3\times3} & 0_{3\times3} & 0_{3\times3} & -\tfrac{1}{2}\,{}^W R_I \\
0_{3\times3} & 0_{3\times3} & 0_{3\times3} & 0_{3\times3} & 0_{3\times3} \\
0_{3\times3} & 0_{3\times3} & 0_{3\times3} & 0_{3\times3} & 0_{3\times3}
\end{bmatrix} , \qquad (4.67)
\]
\[
G = \begin{bmatrix}
0_{3\times3} & 0_{3\times3} & 0_{3\times3} & 0_{3\times3} \\
{}^W R_I & 0_{3\times3} & 0_{3\times3} & 0_{3\times3} \\
0_{3\times3} & \tfrac{1}{2}\,{}^W R_I & 0_{3\times3} & 0_{3\times3} \\
0_{3\times3} & 0_{3\times3} & I_3 & 0_{3\times3} \\
0_{3\times3} & 0_{3\times3} & 0_{3\times3} & I_3
\end{bmatrix} \qquad (4.68)
\]
and
\[
Q = \mathrm{Diag}\!\left( Q_a, Q_\omega, Q_{b_a}, Q_{b_\omega} \right) . \qquad (4.69)
\]

The noise matrices $Q_a$, $Q_\omega$, $Q_{b_a}$, and $Q_{b_\omega}$ correspond to the expectation values of the noise variables,
\[
Q_x = E\!\left[ n_x \right] \quad \big| \quad x \in \{ a, \omega, b_a, b_\omega \} . \qquad (4.70)
\]

In a real system we do not have continuous but discrete time. Hence, we compute the discrete-time transition and introduce the prior and posterior index to get

\[
x_{k+1}^{-} = \Phi_{k+1}\, x_k^{+} + G_{k+1}\, n_{x;k} \qquad (4.71)
\]

with

\[
\Phi = e^{\Delta_t F} = \begin{bmatrix}
I_3 & \Delta_t I_3 & -\Delta_t^2 \left[{}^W a\right]_\times & -\tfrac{1}{2}\Delta_t^2\, {}^W R_I & \tfrac{1}{6}\Delta_t^3 \left[{}^W a\right]_\times {}^W R_I \\
0_{3\times3} & I_3 & -2\Delta_t \left[{}^W a\right]_\times & -\Delta_t\, {}^W R_I & \tfrac{1}{2}\Delta_t^2 \left[{}^W a\right]_\times {}^W R_I \\
0_{3\times3} & 0_{3\times3} & I_3 & 0_{3\times3} & -\tfrac{1}{2}\Delta_t\, {}^W R_I \\
0_{3\times3} & 0_{3\times3} & 0_{3\times3} & I_3 & 0_{3\times3} \\
0_{3\times3} & 0_{3\times3} & 0_{3\times3} & 0_{3\times3} & I_3
\end{bmatrix} \qquad (4.72)
\]
and $G_k \approx \Delta_t\, G$. The uncertainty of the estimated states is expressed by the covariance matrix $P$, which is propagated according to Eq. 2.19. At the beginning, this matrix is initialized as a diagonal matrix, where the entries represent the estimated variances of the initial guess, $\sigma_{x;0}^2$, of the corresponding state:
\[
P_0^{-} = \mathrm{diag}\!\left( \sigma_{t_x;0}^2,\ \sigma_{t_y;0}^2,\ \sigma_{t_z;0}^2,\ \sigma_{v_x;0}^2,\ \dots,\ \sigma_{b_\omega;z;0}^2 \right) . \qquad (4.73)
\]

4.4.2 EKF Propagation and Update

The IMU measurements are now used to propagate the state in time by applying the strapdown computation described in Section 2.6. The trickier part is how to fuse the camera measurements. In order to prevent drifting while being static, our visual motion estimation should stick to a keyframe as long as enough features are available. To deal with such relative motion measurements, we augment the state vector by stochastic cloning of the current state [200, 211]. Furthermore, we use a monocular camera, which does not allow computing the proper scale from natural landmarks. Hence, we also need to estimate the camera scale $s$. This yields the following augmented and extended state vector $\delta_{\breve{x};r,k} \in \mathbb{R}^{31}$, with
\[
\delta_{\breve{x};r,k} = \begin{bmatrix} \delta_{x;r} \\ \delta_{x;k} \\ s \end{bmatrix} \qquad (4.74)
\]
and $\delta_{x;r}$ denoting the estimated error state at the reference time of the keyframe acquisition and $\delta_{x;k}$ the currently estimated error state. As long as we reference a specific keyframe, we assume a constant scale and, thus, $n_s = 0$. Hence, the matrices for the state propagation become
\[
\breve{\Phi}_k = \begin{bmatrix} I_{15} & 0_{15\times15} & 0_{15\times1} \\ 0_{15\times15} & \Phi_k & 0_{15\times1} \\ 0_{1\times15} & 0_{1\times15} & 1 \end{bmatrix} , \qquad (4.75)
\]


\[
\breve{G}_k = \begin{bmatrix} 0_{15\times12} \\ G_k \\ 0_{1\times12} \end{bmatrix} \qquad (4.76)
\]
and
\[
\breve{P}_0^{-} = \begin{bmatrix} P_0^{-} & P_0^{-} & 0_{15\times1} \\ P_0^{-} & P_0^{-} & 0_{15\times1} \\ 0_{1\times15} & 0_{1\times15} & \sigma_{s;0}^2 \end{bmatrix} . \qquad (4.77)
\]
Every time a keyframe switch occurs and the scale of the visual odometry accumulates an error, we model that uncertainty as additional noise on the scale, $n_s$, by the modified model noise matrices $\breve{G}_{s;k}$ and $Q_{s;k}$, such that
\[
\breve{G}_{s;k} = \begin{bmatrix} 0_{15\times12} & 0_{15\times1} \\ G_k & 0_{15\times1} \\ 0_{1\times12} & 1 \end{bmatrix} \quad \text{and} \quad Q_{s;k} = \mathrm{Diag}\!\left( Q_a, Q_\omega, Q_{b_a}, Q_{b_\omega}, q_s \right) \qquad (4.78)
\]
with $q_s = E\!\left[ n_s \right]$. The second part of a Kalman filter consists of the update of the state by a measurement. To provide the relation between the sensor and our state representation, we have to derive the linearized mapping $H_{r,k}$ between the error state and the measurement $z_{r,k}$,
\[
z_{r,k} = H_{r,k}\, \delta_{\breve{x};r,k} + n_{z;r,k} , \qquad (4.79)
\]
which can be computed analogously to [211]. The relation between the rotation error and the rotation measurement can be derived as follows. Let the estimated rotation ${}^{I_o}\hat{q}_{I_r}$ denote the rotation from the IMU frame at time instance $r$, the keyframe acquisition, to the IMU reference frame at time instance $o = 0$. Then the error at time instance $r$, $\delta_{q;r}$, can be formulated as

\[
{}^{I_o}q_{I_r} = \delta_{q;r} \odot {}^{I_o}\hat{q}_{I_r} . \qquad (4.80)
\]

The camera measurement relative to the keyframe is described as the real state corrupted by noise $n_{z_q;r,k}$,
\[
{}^{C_r}z_{q;C_k} = {}^{C_r}q_{C_k} + n_{z_q;r,k} . \qquad (4.81)
\]

Thus, the measurement error $\delta_{z_q;r,k}$ can be computed by
\[
\begin{aligned}
\delta_{z_q;r,k} &= {}^{C_r}\hat{q}_{C_k} \odot {}^{C_r}z_{q;C_k}^{-1} = {}^{C_r}\hat{q}_{C_k} \odot \left( {}^{C_r}q_{C_k} + n_{z_q} \right)^{-1} \\
&= {}^{C_r}\hat{q}_{C_k} \odot \left( {}^{C_k}q_{C_r} + \bar{n}_{z_q} \right) = {}^{C_r}\hat{q}_{C_k} \odot {}^{C_k}q_{C_r} + {}^{C_r}\hat{q}_{C_k} \odot \bar{n}_{z_q;r,k} ,
\end{aligned} \qquad (4.82)
\]


where ${}^X q_Y^{-1} = {}^Y q_X$ denotes the inverse rotation, and with
\[
\bar{n}_{z_q;r,k} = \mathrm{diag}(1, -1, -1, -1)\, n_{z_q;r,k} . \qquad (4.83)
\]
Assuming that the sensor suffers from zero-mean, white Gaussian noise, we can neglect Eq. 4.83 and say that $n_{z_q;r,k} = \bar{n}_{z_q;r,k}$. The rotation between the IMU and the camera frame, ${}^I q_C$, is assumed to be computed in an offline calibration step, e.g., according to Section 4.2. Hence, we ignore the noise for a moment, with $\delta'_{z_q;r,k} = \delta_{z_q;r,k} - {}^{C_r}\hat{q}_{C_k} \odot n_{z_q;r,k}$, and extend Eq. 4.82, such that we get a direct relation between the measurement and the estimated error:
\[
\begin{aligned}
\delta'_{z_q;r,k} &= {}^{C_r}\hat{q}_{C_k} \odot {}^{C_k}q_{C_r} \\
&= {}^I q_C^{-1} \odot {}^{I_o}\hat{q}_{I_r}^{-1} \odot {}^{I_o}\hat{q}_{I_k} \odot {}^I q_C \odot {}^I q_C^{-1} \odot {}^{I_o}\hat{q}_{I_k}^{-1} \odot \delta_{q;k}^{-1} \odot \delta_{q;r} \odot {}^{I_o}\hat{q}_{I_r} \odot {}^I q_C \\
&= {}^I q_C^{-1} \odot {}^{I_o}\hat{q}_{I_r}^{-1} \odot {}^{I_o}\hat{q}_{I_k} \odot {}^{I_o}\hat{q}_{I_k}^{-1} \odot \delta_{q;k}^{-1} \odot \delta_{q;r} \odot {}^{I_o}\hat{q}_{I_r} \odot {}^I q_C \\
&= {}^I q_C^{-1} \odot {}^{I_o}\hat{q}_{I_r}^{-1} \odot \delta_{q;k}^{-1} \odot \delta_{q;r} \odot {}^{I_o}\hat{q}_{I_r} \odot {}^I q_C . 
\end{aligned} \qquad (4.84)
\]

Next, we make use of the fact that the rotation of small angles can be approximated by the sum of the quaternions and by the approximation defined in Eq. 2.6, such that
\[
\begin{aligned}
\delta_{z_q;r,k} &= {}^I q_C^{-1} \odot {}^{I_o}\hat{q}_{I_r}^{-1} \odot \left( \delta_{q;k}^{-1} + \delta_{q;r} \right) \odot {}^{I_o}\hat{q}_{I_r} \odot {}^I q_C + {}^{C_r}\hat{q}_{C_k} \odot n_{z_q;r,k} \\
&= -{}^I R_C^T\, {}^{I_o}\hat{R}_{I_r}^T\, \tfrac{1}{2}\,\delta_{\tilde q;k} + {}^I R_C^T\, {}^{I_o}\hat{R}_{I_r}^T\, \tfrac{1}{2}\,\delta_{\tilde q;r} + {}^{C_r}\hat{R}_{C_k}\, \tfrac{1}{2}\,\tilde{n}_{z_q;r,k} \\
&= \left( {}^{I_o}\hat{R}_{I_r}\, {}^I R_C \right)^T \delta_{\tilde q;r} - \left( {}^{I_o}\hat{R}_{I_r}\, {}^I R_C \right)^T \delta_{\tilde q;k} + 2\, {}^{C_r}\hat{R}_{C_k}\, \tilde{n}_{z_q;r,k} ,
\end{aligned} \qquad (4.85)
\]
with $\tilde{n}_{z_q;r,k} = n_{z_q;r,k;1:3}$. Let us now consider the translational part of the measurement matrix. The relation between the estimated translation, its error, and the real translation is provided by
\[
{}^{I_o}t_{I_o,I_k} = {}^{I_o}\hat{t}_{I_o,I_k} - \delta_{t;k} . \qquad (4.86)
\]
An analogous relation applies for the scale, such that
\[
s = \hat{s} - \delta_s . \qquad (4.87)
\]
The camera measurement for the translation is again described in terms of the real state corrupted by some noise $n_{z_t}$,
\[
{}^{C_r}z_{t;C_r,C_k} = s\, {}^{C_r}t_{C_r,C_k} + n_{z_t} . \qquad (4.88)
\]


The measurement error δzt can be computed by

\[
\begin{aligned}
\delta_{z_t} &= \hat{s}\, {}^{C_r}\hat{t}_{C_r,C_k} - {}^{C_r}z_{t;C_r,C_k} \\
&= \hat{s}\, {}^{C_r}\hat{t}_{C_r,C_k} - s\, {}^{C_r}t_{C_r,C_k} - n_{z_t} \\
&= \hat{s}\, {}^{C_r}\hat{t}_{C_r,C_k} - (\hat{s} - \delta_s)\, {}^{C_r}t_{C_r,C_k} - n_{z_t} \\
&= \hat{s}\, {}^{C_r}\hat{t}_{C_r,C_k} - \hat{s}\, {}^{C_r}t_{C_r,C_k} + \delta_s\, {}^{C_r}t_{C_r,C_k} - n_{z_t} \\
&= \hat{s} \left( {}^{C_r}\hat{t}_{C_r,C_k} - {}^{C_r}t_{C_r,C_k} \right) + {}^{C_r}t_{C_r,C_k}\, \delta_s - n_{z_t} .
\end{aligned} \qquad (4.89)
\]
For the sake of readability, the terms are extended separately, resulting in
\[
\begin{aligned}
{}^{C_r}\hat{t}_{C_r,C_k} &= {}^{C_r}\hat{t}_{I_o,C_k} - {}^{C_r}\hat{t}_{I_o,C_r} \\
&= {}^I R_C^T\, {}^{I_o}\hat{R}_{I_r}^T \left( {}^{I_o}\hat{t}_{I_o,I_k} + {}^{I_o}\hat{R}_{I_k}\, {}^I t_{I,C} \right) - {}^I R_C^T\, {}^{I_o}\hat{R}_{I_r}^T \left( {}^{I_o}\hat{t}_{I_o,I_r} + {}^{I_o}\hat{R}_{I_r}\, {}^I t_{I,C} \right) \\
&= {}^I R_C^T\, {}^{I_o}\hat{R}_{I_r}^T \left( {}^{I_o}\hat{t}_{I_o,I_k} + {}^{I_o}\hat{R}_{I_k}\, {}^I t_{I,C} - {}^{I_o}\hat{t}_{I_o,I_r} - {}^{I_o}\hat{R}_{I_r}\, {}^I t_{I,C} \right) ,
\end{aligned} \qquad (4.90)
\]
whereas the translation between the IMU and the camera, ${}^I t_{I,C}$, is assumed to be computed in an offline step, e.g., according to the method presented in Section 4.2. By applying Eq. 4.80 and the small-angle approximation described in Eq. 2.8, we get
\[
\begin{aligned}
{}^{C_r}t_{C_r,C_k} &= {}^{C_r}t_{I_o,C_k} - {}^{C_r}t_{I_o,C_r} \\
&= {}^I R_C^T\, {}^{I_o}R_{I_r}^T \left( {}^{I_o}t_{I_o,I_k} + {}^{I_o}R_{I_k}\, {}^I t_{I,C} - {}^{I_o}t_{I_o,I_r} - {}^{I_o}R_{I_r}\, {}^I t_{I,C} \right) \\
&= {}^I R_C^T \left( \Delta_{R;r}\, {}^{I_o}\hat{R}_{I_r} \right)^T \Big( {}^{I_o}\hat{t}_{I_o,I_k} - \delta_{t;k} + \Delta_{R;k}\, {}^{I_o}\hat{R}_{I_k}\, {}^I t_{I,C} - {}^{I_o}\hat{t}_{I_o,I_r} + \delta_{t;r} - \Delta_{R;r}\, {}^{I_o}\hat{R}_{I_r}\, {}^I t_{I,C} \Big) \\
&= {}^I R_C^T\, {}^{I_o}\hat{R}_{I_r}^T\, \Delta_{R;r}^T \Big( {}^{I_o}\hat{t}_{I_o,I_k} - \delta_{t;k} + \Delta_{R;k}\, {}^{I_o}\hat{R}_{I_k}\, {}^I t_{I,C} - {}^{I_o}\hat{t}_{I_o,I_r} + \delta_{t;r} \Big) - {}^I R_C^T\, {}^{I_o}\hat{R}_{I_r}^T\, {}^{I_o}\hat{R}_{I_r}\, {}^I t_{I,C} \\
&= {}^I R_C^T\, {}^{I_o}\hat{R}_{I_r}^T \Big( -{}^{I_o}\hat{R}_{I_r}\, {}^I t_{I,C} + \left( I_3 - 2\left[\delta_{\tilde q;r}\right]_\times \right) \Big( {}^{I_o}\hat{t}_{I_o,I_k} - \delta_{t;k} + \left( I_3 + 2\left[\delta_{\tilde q;k}\right]_\times \right) {}^{I_o}\hat{R}_{I_k}\, {}^I t_{I,C} - {}^{I_o}\hat{t}_{I_o,I_r} + \delta_{t;r} \Big) \Big) \\
&= {}^I R_C^T\, {}^{I_o}\hat{R}_{I_r}^T \Big( -{}^{I_o}\hat{R}_{I_r}\, {}^I t_{I,C} + {}^{I_o}\hat{t}_{I_o,I_k} - \delta_{t;k} + {}^{I_o}\hat{R}_{I_k}\, {}^I t_{I,C} + 2\left[\delta_{\tilde q;k}\right]_\times {}^{I_o}\hat{R}_{I_k}\, {}^I t_{I,C} - {}^{I_o}\hat{t}_{I_o,I_r} + \delta_{t;r} \\
&\qquad - 2\left[\delta_{\tilde q;r}\right]_\times {}^{I_o}\hat{t}_{I_o,I_k} + 2\left[\delta_{\tilde q;r}\right]_\times \delta_{t;k} - 2\left[\delta_{\tilde q;r}\right]_\times {}^{I_o}\hat{R}_{I_k}\, {}^I t_{I,C} - 4\left[\delta_{\tilde q;r}\right]_\times \left[\delta_{\tilde q;k}\right]_\times {}^{I_o}\hat{R}_{I_k}\, {}^I t_{I,C} \\
&\qquad + 2\left[\delta_{\tilde q;r}\right]_\times {}^{I_o}\hat{t}_{I_o,I_r} - 2\left[\delta_{\tilde q;r}\right]_\times \delta_{t;r} \Big) , 
\end{aligned} \qquad (4.91)
\]
with $\Delta_R = \mathrm{DCM}(\delta_q)$. We make the assumption that the products of two error terms

are negligible,

\[
\left[\delta_{\tilde q;r}\right]_\times \delta_{t;r} \approx 0 \qquad (4.92)
\]
\[
\left[\delta_{\tilde q;r}\right]_\times \delta_{t;k} \approx 0 \qquad (4.93)
\]
\[
\left[\delta_{\tilde q;r}\right]_\times \left[\delta_{\tilde q;k}\right]_\times {}^{I_o}\hat{R}_{I_k}\, {}^I t_{I,C} \approx 0 , \qquad (4.94)
\]
such that
\[
\begin{aligned}
{}^{C_r}t_{C_r,C_k} \approx {}^I R_C^T\, {}^{I_o}\hat{R}_{I_r}^T \Big( & {}^{I_o}\hat{t}_{I_o,I_k} + {}^{I_o}\hat{R}_{I_k}\, {}^I t_{I,C} - {}^{I_o}\hat{t}_{I_o,I_r} - {}^{I_o}\hat{R}_{I_r}\, {}^I t_{I,C} \\
& - \delta_{t;k} + \delta_{t;r} + 2\left[\delta_{\tilde q;k}\right]_\times {}^{I_o}\hat{R}_{I_k}\, {}^I t_{I,C} - 2\left[\delta_{\tilde q;r}\right]_\times {}^{I_o}\hat{R}_{I_k}\, {}^I t_{I,C} \\
& - 2\left[\delta_{\tilde q;r}\right]_\times {}^{I_o}\hat{t}_{I_o,I_k} + 2\left[\delta_{\tilde q;r}\right]_\times {}^{I_o}\hat{t}_{I_o,I_r} \Big) . 
\end{aligned} \qquad (4.95)
\]
By combining Eq. 4.95 and Eq. 4.90 as well as the rule $[a]_\times b = -[b]_\times a$, we get

\[
\begin{aligned}
{}^{C_r}\hat{t}_{C_r,C_k} - {}^{C_r}t_{C_r,C_k} \approx\ & {}^I R_C^T\, {}^{I_o}\hat{R}_{I_r}^T \Big( \delta_{t;k} - \delta_{t;r} - 2\left[\delta_{\tilde q;k}\right]_\times {}^{I_o}\hat{R}_{I_k}\, {}^I t_{I,C} \\
& + 2\left[\delta_{\tilde q;r}\right]_\times {}^{I_o}\hat{R}_{I_k}\, {}^I t_{I,C} + 2\left[\delta_{\tilde q;r}\right]_\times {}^{I_o}\hat{t}_{I_o,I_k} - 2\left[\delta_{\tilde q;r}\right]_\times {}^{I_o}\hat{t}_{I_o,I_r} \Big) \\
\approx\ & {}^I R_C^T\, {}^{I_o}\hat{R}_{I_r}^T \Big( \delta_{t;k} - \delta_{t;r} + 2\left[ {}^{I_o}\hat{R}_{I_k}\, {}^I t_{I,C} \right]_\times \delta_{\tilde q;k} \\
& - 2\left[ {}^{I_o}\hat{t}_{I_o,I_k} - {}^{I_o}\hat{t}_{I_o,I_r} + {}^{I_o}\hat{R}_{I_k}\, {}^I t_{I,C} \right]_\times \delta_{\tilde q;r} \Big) .
\end{aligned} \qquad (4.96)
\]
Extending the third term of Eq. 4.89,
\[
\begin{aligned}
{}^{C_r}t_{C_r,C_k}\, \delta_s \approx {}^I R_C^T\, {}^{I_o}\hat{R}_{I_r}^T \Big( & {}^{I_o}\hat{t}_{I_o,I_k}\, \delta_s + {}^{I_o}\hat{R}_{I_k}\, {}^I t_{I,C}\, \delta_s - {}^{I_o}\hat{t}_{I_o,I_r}\, \delta_s - {}^{I_o}\hat{R}_{I_r}\, {}^I t_{I,C}\, \delta_s \\
& - \delta_{t;k}\, \delta_s + \delta_{t;r}\, \delta_s + 2\left[\delta_{\tilde q;k}\right]_\times {}^{I_o}\hat{R}_{I_k}\, {}^I t_{I,C}\, \delta_s - 2\left[\delta_{\tilde q;r}\right]_\times {}^{I_o}\hat{R}_{I_k}\, {}^I t_{I,C}\, \delta_s \\
& - 2\left[\delta_{\tilde q;r}\right]_\times {}^{I_o}\hat{t}_{I_o,I_k}\, \delta_s + 2\left[\delta_{\tilde q;r}\right]_\times {}^{I_o}\hat{t}_{I_o,I_r}\, \delta_s \Big) \qquad (4.97)
\end{aligned}
\]
and, assuming again that the products of error terms are negligible, we get
\[
\begin{aligned}
{}^{C_r}t_{C_r,C_k}\, \delta_s &\approx {}^I R_C^T\, {}^{I_o}\hat{R}_{I_r}^T \left( {}^{I_o}\hat{t}_{I_o,I_k} + {}^{I_o}\hat{R}_{I_k}\, {}^I t_{I,C} - {}^{I_o}\hat{t}_{I_o,I_r} - {}^{I_o}\hat{R}_{I_r}\, {}^I t_{I,C} \right) \delta_s \\
&\approx {}^I R_C^T\, {}^{I_o}\hat{R}_{I_r}^T\, {}^{C_r}\hat{t}_{C_r,C_k}\, \delta_s .
\end{aligned} \qquad (4.98)
\]


Now we can insert Eqs. 4.96 and 4.98 into Eq. 4.89 and obtain

\[
\begin{aligned}
\delta_{z_t} \approx\ & \hat{s}\, {}^I R_C^T\, {}^{I_o}\hat{R}_{I_r}^T \Big( \delta_{t;k} - \delta_{t;r} + 2\left[ {}^{I_o}\hat{R}_{I_k}\, {}^I t_{I,C} \right]_\times \delta_{\tilde q;k} \\
& - 2\left[ {}^{I_o}\hat{t}_{I_o,I_k} - {}^{I_o}\hat{t}_{I_o,I_r} + {}^{I_o}\hat{R}_{I_k}\, {}^I t_{I,C} \right]_\times \delta_{\tilde q;r} \Big) \\
& + {}^I R_C^T\, {}^{I_o}\hat{R}_{I_r}^T\, {}^{C_r}\hat{t}_{C_r,C_k}\, \delta_s - n_{z_t} .
\end{aligned} \qquad (4.99)
\]

The measurement matrix can then be determined by combining Eq. 4.85 and Eq. 4.99:

\[
z'_{r,k} = \begin{bmatrix} H_{1,1} & 0_{3\times3} & H_{1,3} & 0_{3\times6} & H_{1,6} & 0_{3\times3} & H_{1,8} & 0_{3\times6} & h_{1,11} \\ 0_{3\times3} & 0_{3\times3} & H_{2,3} & 0_{3\times6} & 0_{3\times3} & 0_{3\times3} & H_{2,8} & 0_{3\times6} & 0_{3\times1} \end{bmatrix}
\begin{bmatrix} \delta_{t;r} \\ \delta_{v;r} \\ \delta_{\tilde q;r} \\ \delta_{b_a;r} \\ \delta_{b_\omega;r} \\ \delta_{t;k} \\ \delta_{v;k} \\ \delta_{\tilde q;k} \\ \delta_{b_a;k} \\ \delta_{b_\omega;k} \\ \delta_s \end{bmatrix}
\stackrel{\mathrm{def}}{=} H_{r,k}\, \delta_{\breve{x};r,k} , \qquad (4.100)
\]
with
\[
\begin{aligned}
H_{1,1} &= -\hat{s}\, {}^{C_r}\hat{R}_{I_o} \\
H_{1,3} &= -2\,\hat{s}\, {}^{C_r}\hat{R}_{I_o} \left[ {}^{I_o}\hat{t}_{I_o,I_k} - {}^{I_o}\hat{t}_{I_o,I_r} + {}^{I_o}\hat{R}_{I_k}\, {}^I t_{I,C} \right]_\times \\
H_{1,6} &= \hat{s}\, {}^{C_r}\hat{R}_{I_o} \\
H_{1,8} &= 2\,\hat{s}\, {}^{C_r}\hat{R}_{I_o} \left[ {}^{I_o}\hat{R}_{I_k}\, {}^I t_{I,C} \right]_\times \\
h_{1,11} &= {}^{C_r}\hat{R}_{I_o}\, {}^{C_r}\hat{t}_{C_r,C_k} \\
H_{2,3} &= {}^{C_r}\hat{R}_{I_o} \\
H_{2,8} &= -{}^{C_r}\hat{R}_{I_o} ,
\end{aligned}
\]

whereas $z'_{r,k} = z_{r,k} - n_{z;r,k}$ and ${}^{C_r}\hat{R}_{I_o} = {}^I R_C^T\, {}^{I_o}\hat{R}_{I_r}^T$. The corresponding measurement noise covariance matrix $L_{r,k}$ is computed by
\[
L_{r,k} = N'_{r,k}\, \mathrm{Diag}\!\left( L_{n_z;t;r,k},\ L_{\tilde n_z;q;r,k} \right) N'^{\,T}_{r,k} \qquad (4.101)
\]

with

\[
N'_{r,k} = \begin{bmatrix} -I_3 & 0_{3\times3} \\ 0_{3\times3} & 2\, {}^{C_r}\hat{R}_{C_k} \end{bmatrix} \quad \text{and} \quad L_{x;r,k} = E\!\left[ x \right] \ \big| \ x \in \{ n_{z_t}, \tilde{n}_{z_q} \} . \qquad (4.102)
\]

4.4.3 Pseudo-Measurement Model

Up to now, we considered the translation measurements and their noise to be provided in the Cartesian coordinate frame of our system model. Thinking of a camera as a goniometer, we measure the epipole and try to keep the translational scale constant. These measurements are actually not expressed in a Cartesian frame, but in a polar coordinate frame, where the epipole determines the bearing (or azimuth) $\phi$ and the elevation $\psi$. The range $r$ is then computed by keeping the relative scales of the landmarks constant. This yields a measurement vector of the form
\[
{}^P z = \begin{bmatrix} {}^P z_r \\ {}^P z_\phi \\ {}^P z_\psi \end{bmatrix} . \qquad (4.103)
\]

Similar to [203], we apply a pseudo-measurement model which allows representing the measurements and the noise in the line-of-sight (LOS) coordinate system. The x-axis of this coordinate frame is always along the direction of translation. Thus, any translation measurement in LOS coordinates is of the form $\mathring{z} = \begin{bmatrix} {}^P z_r & 0 & 0 \end{bmatrix}^T$. The mapping between Cartesian and LOS coordinates is defined by
\[
\mathring{z} = \begin{bmatrix} {}^P z_r \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} \cos{}^P z_\phi \cos{}^P z_\psi & \sin{}^P z_\phi \cos{}^P z_\psi & \sin{}^P z_\psi \\ -\sin{}^P z_\phi & \cos{}^P z_\phi & 0 \\ -\cos{}^P z_\phi \sin{}^P z_\psi & -\sin{}^P z_\phi \sin{}^P z_\psi & \cos{}^P z_\psi \end{bmatrix} z \stackrel{\mathrm{def}}{=} M_z\, z \qquad (4.104)
\]
and
\[
z = M_z^{-1}\, \mathring{z} = M_z^T\, \mathring{z} . \qquad (4.105)
\]
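A minimal sketch of this LOS mapping; the function name is illustrative.

```python
import numpy as np

def los_mapping(z_phi, z_psi):
    """Rotation M_z that maps a Cartesian translation measurement into line-of-sight
    (LOS) coordinates whose x-axis points along the measured translation (Eq. 4.104)."""
    c_phi, s_phi = np.cos(z_phi), np.sin(z_phi)
    c_psi, s_psi = np.cos(z_psi), np.sin(z_psi)
    return np.array([
        [ c_phi * c_psi,  s_phi * c_psi,  s_psi],
        [-s_phi,          c_phi,          0.0  ],
        [-c_phi * s_psi, -s_phi * s_psi,  c_psi],
    ])

# Since M_z is orthonormal, the inverse mapping of Eq. 4.105 is simply its transpose:
# z = los_mapping(z_phi, z_psi).T @ z_los
```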

Recapitulating the Kalman state transition, we formulate the new representation as

\[
\begin{aligned}
x_{k+1}^{+} &= \Phi_{k+1}\, x_k^{+} + K_{k+1}\, M_{z_{k+1}}^T \left( \mathring{z}_{k+1} - M_{z_{k+1}} H_{k+1}\, x_{k+1}^{-} \right) \\
&= \Phi_{k+1}\, x_k^{+} + \mathring{K}_{k+1} \left( \mathring{z}_{k+1} - \mathring{H}_{k+1}\, x_{k+1}^{-} \right) .
\end{aligned} \qquad (4.106)
\]
This results in the following error state transition:
\[
\delta_{x;k+1}^{+} = \delta_{x;k+1}^{-} + \mathring{K}_{k+1}\, \mathring{r}_{k+1} . \qquad (4.107)
\]


The measurement covariance matrix $\mathring{L}$ can then be computed, analogously to Eq. 4.101, by
\[
\mathring{L}_{r,k} = N'_{r,k}\, \mathrm{Diag}\!\left( \mathring{L}'_{n_z;t;r,k},\ L_{\tilde n_z;q;r,k} \right) N'^{\,T}_{r,k} , \qquad (4.108)
\]
with
\[
\mathring{L}'_{n_z;t;r,k} = \begin{bmatrix} {}^P n_{z_r} \cos\!\left({}^P n_{z_\phi}\right) \cos\!\left({}^P n_{z_\psi}\right) \\ \sqrt{x_z^2 + y_z^2}\, \sin\!\left({}^P n_{z_\phi}\right) \\ \sqrt{x_z^2 + y_z^2 + z_z^2}\, \cos\!\left({}^P n_{z_\phi}\right) \sin\!\left({}^P n_{z_\psi}\right) \end{bmatrix} \qquad (4.109)
\]
and ${}^P n_z = \begin{bmatrix} {}^P n_{z_r} & {}^P n_{z_\phi} & {}^P n_{z_\psi} \end{bmatrix}^T$. For a derivation, please refer to [203].

4.4.4 The Filter-Steps in a Nutshell

In the following we summarize our Kalman fusion, which is in general described in Section 2.7:

1. compute the LOS mapping $M_{z_{k+1}}$ according to Eq. 4.104

2. compute the residual $\mathring{r}_{k+1} = \mathring{z}_{k+1} - \mathring{H}_{k+1}\, x_{k+1}^{-}$

3. compute the error state covariance prior $P_{k+1}^{-}$ according to Eq. 2.19

4. compute the innovation $\mathring{S}_{k+1} = \mathring{H}_{k+1} P_{k+1}^{-} \mathring{H}_{k+1}^T + \mathring{L}_{r,k}$

5. compute the Kalman gain $\mathring{K}_{k+1} = P_{k+1}^{-} \mathring{H}_{k+1}^T \mathring{S}_{k+1}^{-1}$

6. compute the error state covariance posterior $P_{k+1}^{+}$ according to Eq. 2.23

7. compute the correction $\delta_{x;k+1}^{+}$ according to Eq. 4.107

8. update the state by the following equations and reset the error to zero

\[
\begin{aligned}
t_{k+1} &= t_{k+1} - \delta_{t;k+1} \\
v_{k+1} &= v_{k+1} - \delta_{v;k+1} \\
q_{k+1} &= \begin{bmatrix} \delta_{\tilde q;k+1} \\ 1 \end{bmatrix} \odot q_{k+1} \\
b_{a;k+1} &= b_{a;k+1} - \delta_{b_a;k+1} \\
b_{\omega;k+1} &= b_{\omega;k+1} - \delta_{b_\omega;k+1}
\end{aligned}
\]
In Section 8.5.4, we will show some first experiments on synthetic data as a proof of concept.
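As shown in the step list above, a single pseudo-measurement update can be written in a few lines. The following is a minimal sketch, assuming the stacked measurement matrix and noise covariance are already available; it uses the simple covariance update instead of the exact form of Eq. 2.23, and all names are illustrative.

```python
import numpy as np

def ekf_pseudo_measurement_update(x_prior, P_prior, z_los, H, L_los, M_z):
    """One loosely coupled error-state EKF update with the LOS pseudo-measurement,
    mirroring steps 1-7 of the summary above."""
    H_ring = M_z @ H                                   # step 1: measurement matrix in LOS coordinates
    r_ring = z_los - H_ring @ x_prior                  # step 2: residual
    S = H_ring @ P_prior @ H_ring.T + L_los            # step 4: innovation covariance
    K = P_prior @ H_ring.T @ np.linalg.inv(S)          # step 5: Kalman gain
    P_post = (np.eye(P_prior.shape[0]) - K @ H_ring) @ P_prior   # step 6 (simplified form)
    delta_x = K @ r_ring                               # step 7: error-state correction
    return delta_x, P_post
```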


Chapter 5

Image-Based Pose Estimation

In this chapter, we introduce two image-based pose estimation techniques. The Z∞-algorithm is a biologically motivated method which estimates the pose based on optical flow. We mentioned in Section 1.1 that insects are observed to behaviorally separate rotation and translation, stabilizing their head during motion so that they only measure translational optical flow. It is difficult to realize such a stabilization on technical systems but, nonetheless, we use features with certain projection properties to compensate for the rotational motion and, thus, allow for a pure translation estimation. The other approach we present is a keyframe-based VSLAM technique. It estimates the pose based on the 3D structure of a landmark set, which has to be initialized. Both approaches work for arbitrary goniometers – the methods are only based on the angular direction of landmarks. However, in our work we focus on inertial and visual sensors and, hence, we discuss the algorithms in the context of cameras.

5.1 Z∞-Algorithm – Efficient Monocular Motion Estimation

One problem in estimating an arbitrary motion in 3D from noisy sensor data is to quantify the error and to suppress wrong measurements that deteriorate the result. Most known approaches, like the eight-point algorithm introduced in Section 3.3.1, compute all degrees of freedom simultaneously. An error in one dimension can be compensated by the other dimensions, which makes it difficult to weight or isolate bad measurements in the data set. Erroneous data can be detected and rejected more effectively if fewer parameters are estimated at once. Thus, only the parameters which belong together should be estimated simultaneously. The effects of a combined and a separated estimation of translation and rotation are depicted in Section 8.2.1.


The Z∞-algorithm uses the usually undesirable quantization effects of the camera projection to separate translation-invariant from translation-dependent landmarks in the image. In fact, we determine the rotational component of the present motion without ever considering the translational one. Fast closed-form solutions exist which provide an efficient and globally optimal rotation and translation computation from optical flow without any a-priori information. Hence, the separation of the rotation estimation from the translation drastically simplifies the computation and the suppression of error-prone data, which are the major advantages of the Z∞-algorithm. The key problem is to identify translation-invariant landmarks for the rotation estimation. The remaining features can then be used to compute the translation.

5.1.1 Z∞-Principle

The projective transformation and the pixel discretization of digital cameras are usually considered to be the accuracy barriers for image-based motion estimation. However, splitting the motion into a rotational and a translational part is based on just these effects: Z∞ uses the projective transformation and the pixel discretization to split the tracked features into a translation-invariant and a translation-dependent set. We see from the basic camera projection equation (Eq. 2.9) that the X- and Y-axis offsets of the 3D world coordinates, $x_P$ and $y_P$, are both divided by the landmark's distance $z_P$. Hence, according to the motion formula (Eq. 2.3), the landmarks mapped onto the image plane move as described in the following. Each image feature of a calibrated camera can be interpreted as a ray from the optical center to the 3D landmark, intersecting the image plane:
\[
P_i = \lambda_i\, \bar{r}_i \quad \text{with} \quad \bar{r} = \frac{r}{\| r \|} . \qquad (5.1)
\]

The length $\lambda_i$ of the normalized vector $\bar{r}_i$ to the $i$-th landmark $P_i$ is unknown due to the camera projection. While the rotation of a feature is not affected by the distance of the respective landmark, this is not the case for translations. A translational motion of the camera results in a different motion in the image for each landmark, depending on its distance to the camera. The motion transformation of a landmark can be described by
\[
\lambda'_i\, \bar{r}'_i = R^T \left( \lambda_i\, \bar{r}_i - t \right) . \qquad (5.2)
\]
Due to the pixel discretization, the translation cannot be measured anymore if a landmark exceeds a certain distance to the camera. This distance can be computed as


follows:
\[
z_\infty = \frac{f}{s_m}\, t_m \quad \text{with} \quad m \in \{x, y\} . \qquad (5.3)
\]

Table 5.1: Concrete examples illustrating the relation between the distance threshold $z_\infty$ and the camera and motion parameters. The upper values denote the focal length in pixel size, with 900 px (∼37° aperture angle) being a quite conventional industrial camera (e.g., with f = 8.6 mm and s = 9.6 µm), going down to 300 px (∼90° aperture angle) representing a wide-angle camera and 100 px (∼143° aperture angle) a fisheye lens. The left values represent the measurable motion per frame, where r is the frame-rate of the camera. Please note again that motion in the direction of the landmark does not affect $z_\infty$. A measured motion of 0.5 m per frame is equal to a translational motion perpendicular to the feature ray of 27 km/h at a frame-rate of only 15 Hz, while 0.005 m per frame would still correspond to a perpendicular motion component of 2.16 km/h at 120 Hz.

    f/s [px]                  900        600        300        100
    max(t_m) [m/frame]
    0.5                    450.0 m    300.0 m    150.0 m     50.0 m
    0.1                     90.0 m     60.0 m     30.0 m     10.0 m
    0.01                     9.0 m      6.0 m      3.0 m      1.0 m
    0.005                    4.5 m      3.0 m      1.5 m      0.5 m

Hence, $z_\infty$ depends on the ratio between the focal length $f$ and the pixel sizes $s_x$ and $s_y$, and on the translational components parallel to the camera plane, $t_x$ and $t_y$, respectively. A small lateral translational component, a large aperture angle, a low resolution, or a high frame-rate reduces the distance threshold for translation-invariant landmarks. Table 5.1 and Fig. 5.1 illustrate the relation of these factors and the translation-invariant distance by concrete examples. If the camera is not mounted parallel to the main direction of motion, the measurable component of the translational motion becomes significantly smaller than the values assumed in the examples. As illustrated in Fig. 5.2, the observed translational component is shortened by two angles:
\[
\Delta t_{s_m} = \cos(\psi_m)\, \Delta t
\qquad
\Delta t_m = \frac{\Delta t_{s_m}}{\cos(\varphi_m)} = \frac{\cos(\psi_m)}{\cos(\varphi_m)}\, \Delta t \quad \text{with} \quad m \in \{x, y\} , \qquad (5.4)
\]
whereas the angle $\psi$ lies between the direction of translation and the plane perpendicular to the projection ray, and the angle $\varphi$ lies between the optical axis and the projection ray. Hence, all points which lie exactly in the direction of translation become translation-invariant, independent of their distance. The further the points are off this axis, the more sensitive to translation they become.
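A minimal sketch of the threshold of Eq. 5.3, reproducing one entry of Table 5.1; the function name is illustrative.

```python
def z_infinity(focal_px, translation_per_frame_m):
    """Distance beyond which a landmark appears translation-invariant (Eq. 5.3):
    the lateral translation per frame projects to less than one pixel.
    focal_px is the focal length expressed in pixels (f / s_m)."""
    return focal_px * translation_per_frame_m

# Example: a 900 px focal length and 0.1 m of lateral translation per frame
# give a threshold of 90 m, matching Table 5.1.
print(z_infinity(900, 0.1))   # -> 90.0
```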



Figure 5.1: Optical flow vector 1 (P1, P′1) is shorter than vector 2, but it is closer to the optical center. Therefore, both vectors result in the same projection. Vector 3 has the same length as vector 1, but it is farther from the image plane; hence, the projection is much shorter. It is so small that the projections of P3 and P′3 lie within the same pixel. If such a projection results only from a translation, this feature is translation-invariant, because the translation does not yield a measurable motion.

We have shown that image features can be split into translation-invariant and translation-dependent features, depending on their distance to the camera. Indoor scenes do, in general, not provide enough depth range to allow for translation-invariant landmarks. Only if really slow translations are applied to the camera is a short distance to the landmarks sufficient. Experiments have shown that in outdoor pictures the contrast of the sky against the ground at the horizon generates many good features to track. Indeed, an average of 60 % of the selected features lies at the horizon. But how can we identify which feature corresponds to which set? We have no information about the rotation, the translation, or the distance of the landmarks visible in the image. The solution is provided by the algorithm described in the next section.

5.1.2 RANSAC Revisited

The general approach of the Random Sample Consensus (RANSAC) algorithm is described in Section 2.5. In our special case, the estimated model is a rotation matrix whose entries have to be calculated. Therefore, we take a set of three random



optical flow vectors $R_{\mathrm{rand}}$ and estimate the rotation matrix based on them, $R = R(R_{\mathrm{rand}})$, as will be explained in Section 5.1.3. The estimated rotation is applied to the remaining data set. If the initial three correspondences stem from points in the world at a distance farther than $z_\infty$, and thus represent only the rotational part, the true rotation is estimated. Otherwise, the influence of the translational component of the optical flow vectors results in a wrong rotation matrix. Applying the computed rotation $R$ to the remaining data set shows whether a minimum percentage $\Xi_{\#R}$ of the optical flow vectors, representing the residual translation-invariant features, fits the estimate up to a specified threshold $\Xi_R$. This proves that the initial points were truly translation-invariant:
\[
R_{\mathrm{inl}} = \left\{ \left( r_i, r'_i \right) : \left\| \bar{r}_i - R\, \bar{r}'_i \right\| < \Xi_R \right\} . \qquad (5.5)
\]

Based on whether the number of elements in $R_{\mathrm{inl}}$ is sufficient,
\[
\frac{\left| R_{\mathrm{inl}} \right|}{N} > \Xi_{\#R} , \qquad (5.6)
\]
one either has to choose a new landmark set for another rotation estimation iteration, or a more accurate rotation can be computed using all identified translation-invariant inliers, $R = R(R_{\mathrm{inl}})$.

Figure 5.2: This drawing visualizes how the measurable translational component $\Delta t_x$ can be calculated. In general, the optical flow vectors become longer the more parallel the translation is to the image plane and the closer the features are to the image border (due to a higher angular resolution). The orange auxiliary line illustrates the projections onto the X-Z plane.


Once the rotation has been successfully calculated, the back-rotated outlier set consists of the translation vectors $T_{\mathrm{inl}}$ and the measurement outliers $S_{\mathrm{outl}}$:
\[
R_{\mathrm{outl}} = \left\{ T_{\mathrm{inl}}, S_{\mathrm{outl}} \right\} . \qquad (5.7)
\]

Again, we use RANSAC to robustly estimate the translation $T_{\mathrm{rand}}$, as described in Section 5.1.4, and to identify wrong measurements. The deviation of the vectors from the estimated epipolar line and the percentage of found inliers provide an accuracy measure and allow comparing the RANSAC iterations. The probability of finding an initial set of three translation-invariant optical flow elements depends on the percentage of such vectors in the data set. As described above, an evaluation of real outdoor images has shown that an average of 60 % of the features are at the horizon and, thus, far enough away to be translation-invariant. The lower 5 % quantile was about 30 %. This results in approximately 37 iterations necessary to assure that in 95 % of the cases RANSAC can determine the rotation and divide the data set. Usually, several rotation inliers can also be tracked in the next image. Such features are then preferred for rotation estimation, because it can be assumed that the corresponding landmarks are still farther away than the $z_\infty$ threshold. Thus, the number of RANSAC iterations for rotation estimation is reduced to one – in practice, a few iterations are required to ensure a robust result. At this point, the trade-off one has to find when applying this concept should be mentioned: on the one hand, we look for a camera setup which reduces the measurable translational component of landmarks to allow for translation-invariant points; on the other hand, if we do not measure the translation, we cannot estimate it, or only poorly. Furthermore, an erroneous rotation estimation due to translation influence yields a bad translation estimation. Hence, the crucial part of this algorithm is a proper and reliable identification of translation-invariant landmarks, as sketched below.
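The following is a minimal RANSAC sketch of this rotation hypothesis test, assuming the feature rays of both frames are given as $(n, 3)$ arrays and that `estimate_R` is any three-point absolute orientation solver (e.g., the Umeyama step of Section 5.1.3); the names, thresholds and iteration count are illustrative.

```python
import numpy as np

def ransac_rotation(rays_prev, rays_curr, estimate_R, thresh=1e-3,
                    min_inlier_ratio=0.3, iterations=37):
    """RANSAC over feature rays to find a rotation supported by translation-invariant
    features (cf. Eqs. 5.5/5.6)."""
    n = len(rays_prev)
    best_R, best_inliers = None, np.array([], dtype=int)
    for _ in range(iterations):
        idx = np.random.choice(n, 3, replace=False)
        R = estimate_R(rays_prev[idx], rays_curr[idx])              # rotation hypothesis
        residuals = np.linalg.norm(rays_prev - rays_curr @ R.T, axis=1)   # cf. Eq. 5.5
        inliers = np.flatnonzero(residuals < thresh)
        if len(inliers) > n * min_inlier_ratio and len(inliers) > len(best_inliers):
            best_R, best_inliers = R, inliers
    if best_R is not None:
        # refine on all identified translation-invariant inliers
        best_R = estimate_R(rays_prev[best_inliers], rays_curr[best_inliers])
    return best_R, best_inliers
```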

5.1.3 Rotation Estimation

The rotation of a point cloud can be estimated in closed form based on the direction vectors to the different landmarks, e.g., based on quaternions [212] or on direction cosine matrices using an eigenvalue decomposition [157] or a singular value decomposition [213, 214] (see also Section 3.3.2). We use the so-called Umeyama algorithm described in [214], which is an extension of Arun's algorithm [213], fixing a possible ambiguity in the result.


The corresponding points used for the rotation estimation all belong to translation-invariant points. Hence, Eq. 5.2 can be abbreviated to
\[
P'_i = \lambda'_i\, \bar{r}'_i = R^T P_i = R^T \lambda_i\, \bar{r}_i = \lambda_i R^T \bar{r}_i \quad \Rightarrow \quad \bar{r}'_i = R^T \bar{r}_i . \qquad (5.8)
\]
Thus, we need to solve the following least-squares problem:
\[
R(S): \quad R = \arg\min_{R} \sum_{(\bar{r}_i, \bar{r}'_i) \in S} \left\| R^T \bar{r}_i - \bar{r}'_i \right\| . \qquad (5.9)
\]
In the following, we assume that we have a set of corresponding rays $\{ (r_i, r'_i) \}$ which are only rotated and whose rotation we want to estimate. First, the origin of the coordinate frame has to be moved to the center of the point cloud:
\[
c = \frac{1}{|S|} \sum_{\bar{r}_i \in S} \bar{r}_i \qquad c' = \frac{1}{|S|} \sum_{\bar{r}'_i \in S} \bar{r}'_i
\qquad
r_i^{*} = \bar{r}_i - c \qquad r_i'^{*} = \bar{r}'_i - c' . \qquad (5.10)
\]
Now, the non-scaled sample cross-covariance matrix for these point clouds is calculated from the set $S^* = \{ (r_i^*, r_i'^*) \}$, such that
\[
M = \sum_{(r_i^*, r_i'^*) \in S^*} r_i'^{*}\, r_i^{*T} . \qquad (5.11)
\]

Therefore, $\frac{1}{|S|} M$ is the sample cross-covariance matrix between $\{\bar{r}_i\}$ and $\{\bar{r}'_i\}$. It can be shown that the rotation matrix which minimizes Eq. 5.9 also fulfills
\[
R = \arg\max_{R} \left( \operatorname{tr}(R M) \right) . \qquad (5.12)
\]

Consequently, R can be calculated as

\[
R = V_M\, W\, U_M^T \quad \text{with} \quad \mathrm{SVD}(M) = U_M \Sigma_M V_M^T \qquad (5.13)
\]
with
\[
W = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & \det\!\left( V_M U_M^T \right) \end{bmatrix} . \qquad (5.14)
\]
The uncertainty measure of the rotation estimation consists of the average rotation error and the percentage of rotation inliers in the data set. We propose the following


formula to compute the quality measure qR:

\[
q_R = \frac{\sum_{\bar{r}_i \in R_{\mathrm{inl}}} \left\| R^T \bar{r}_i - \bar{r}'_i \right\|}{\left| R_{\mathrm{inl}} \right|^2} . \qquad (5.15)
\]
This measure results from the mean feature error weighted by the inverse of the inlier percentage,
\[
\frac{\sum_{\bar{r}_i \in R_{\mathrm{inl}}} \left\| R^T \bar{r}_i - \bar{r}'_i \right\|}{\left| R_{\mathrm{inl}} \right|} \cdot \frac{N}{\left| R_{\mathrm{inl}} \right|} ,
\]
with $N$ being the overall number of feature correspondences, which is constant and can, thus, be neglected for the quality measure. The weighting term is steep for small inlier percentages and flattens out for large ones. Hence, the number of inliers is crucial if only a small percentage of inliers could be found, and becomes insignificant for large inlier sets.
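A minimal sketch of this closed-form rotation estimation (Eqs. 5.10-5.14); it can also serve as the `estimate_R` hypothesis solver of the RANSAC sketch in Section 5.1.2. The function name is illustrative.

```python
import numpy as np

def estimate_rotation(rays_a, rays_b):
    """Closed-form rotation between two sets of corresponding unit rays,
    given as (n, 3) arrays with rays_b ~ R^T rays_a."""
    a = rays_a - rays_a.mean(axis=0)           # Eq. 5.10: center both point clouds
    b = rays_b - rays_b.mean(axis=0)
    M = b.T @ a                                # Eq. 5.11: (unscaled) cross-covariance
    U, _, Vt = np.linalg.svd(M)
    W = np.diag([1.0, 1.0, np.linalg.det(Vt.T @ U.T)])   # Eq. 5.14: fix reflection ambiguity
    return Vt.T @ W @ U.T                      # Eq. 5.13
```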

5.1.4 Translation Estimation

All the features in $R_{\mathrm{outl}}$ are rotated back by $R$ and afterwards consist only of the translational part of the motion,
\[
\mathring{r}' = R\, r' = R\, R^T (r - t) = r - t \qquad (5.16)
\]
with $t \neq 0$, because only translation-dependent features are used. Projecting the rays onto the unit-focal image plane, we get
\[
\check{r}' = \frac{\mathring{r}'}{\mathring{r}'_z} . \qquad (5.17)
\]

The back-rotated optical flow vectors $\{ (r_i, \check{r}'_i) \}$ should all meet in the epipole, which means that the epipole should be the intersection of all the lines defined by the translational flow vectors. This constraint defines a linear system of equations. Let
\[
n_i = \begin{bmatrix} y_{\check{r}'_i} - y_{r_i} \\ x_{r_i} - x_{\check{r}'_i} \end{bmatrix} \qquad (5.18)
\]
be the normal of the translational flow vector; then the scalar product $d_i$ of $n_i$ and an arbitrary point on the line needs to be the same, where $d_i$ can be computed as follows:
\[
d_i = n_i^T\, r_{i;1:2} . \qquad (5.19)
\]


Thus, the linear system of equations for the epipole e looks like

\[
A\, e = \begin{bmatrix} n_1^T \\ n_2^T \\ \vdots \\ n_{|R_{\mathrm{outl}}|}^T \end{bmatrix} \begin{bmatrix} x_e \\ y_e \end{bmatrix} = \begin{bmatrix} d_1 \\ d_2 \\ \vdots \\ d_{|R_{\mathrm{outl}}|} \end{bmatrix} \stackrel{\mathrm{def}}{=} d . \qquad (5.20)
\]
This linear system could be solved using the Moore-Penrose pseudoinverse $A^+$ of $A$, which, however, leads to poor results if $A$ is close to singular, as in the case of a motion parallel to the image plane. Therefore, we use the SVD to compute the solution. Let the linear system be
\[
A\, e = d \quad \Rightarrow \quad A\, e - d = 0 \quad \Rightarrow \quad \begin{bmatrix} A & -d \end{bmatrix} \begin{bmatrix} e \\ 1 \end{bmatrix} = 0 \stackrel{\mathrm{def}}{=} B\, g . \qquad (5.21)
\]
Thus, the direction of translation, up to sign, $\tilde{t}$, can be computed by
\[
\tilde{t} = \frac{\tilde{t}'}{\| \tilde{t}' \|} \quad \text{with} \quad \tilde{t}' = \begin{bmatrix} e \\ 1 \end{bmatrix} , \quad e = -\frac{V_{B;1:2,3}}{V_{B;3,3}} \quad \text{and} \quad \mathrm{SVD}(B) = U_B \Sigma_B V_B^T . \qquad (5.22)
\]

Analogously to Eq. 3.29, the ambiguity of the sign of the translation vector can be resolved by the following formula:
\[
t = \begin{cases} -\tilde{t} & \text{if } \sum_i \left( r_i \times \check{r}'_i \right)^T \left( \tilde{t} \times r_i \right) < 0 \\ \tilde{t} & \text{else} \end{cases} . \qquad (5.23)
\]

As the uncertainty measure for the translation, we use the Euclidean distance between the calculated epipole and the optical flow vectors, $\| A e - d \|$, together with the percentage of translation inliers, which are defined by the threshold $\Xi_t$ and
\[
T_{\mathrm{inl}} = \left\{ \left( r_i, r'_i \right) : \left| n_i^T e - d_i \right| < \Xi_t \right\} . \qquad (5.24)
\]

Similar to the quality measure for the rotation estimation, as denoted in Eq. 5.15, we propose the following quality measure $q_t$ for the translation estimation:
\[
q_t = \frac{\sum_{r_i \in T_{\mathrm{inl}}} \left| n_i^T e - d_i \right|}{\left| T_{\mathrm{inl}} \right|^2} . \qquad (5.25)
\]
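A minimal sketch of the epipole computation from the back-rotated flow (Eqs. 5.18-5.22), assuming the start and end points of each flow vector are given as $(n, 2)$ arrays of unit-focal plane coordinates; the sign disambiguation of Eq. 5.23 and the inlier bookkeeping are omitted, and the name is illustrative.

```python
import numpy as np

def estimate_epipole(flow_start, flow_end):
    """Least-squares epipole of purely translational flow vectors via the
    homogeneous system of Eqs. 5.20/5.21."""
    n = np.column_stack([flow_end[:, 1] - flow_start[:, 1],
                         flow_start[:, 0] - flow_end[:, 0]])   # Eq. 5.18: line normals
    d = np.sum(n * flow_start, axis=1)                         # Eq. 5.19
    B = np.column_stack([n, -d])                               # Eq. 5.21: [A  -d]
    _, _, Vt = np.linalg.svd(B)
    g = Vt[-1]                        # right singular vector of the smallest singular value
    return g[:2] / g[2]               # epipole e from the null-space direction (cf. Eq. 5.22)
```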


5.2 Keyframe Based VSLAM

For applications where not only efficiency but also high accuracy is required, we need to estimate and model also the depth of the features which we use for pose estimation. In that way, we can further improve the accuracy of image-based pose estimation, but this requires an initialization phase. Other keyframe-based algorithms, e.g., PTAM, require an initialization step with a long translational motion in front of a planar surface to initialize the detected features by planar homography decomposition [108]. In the following, we introduce a keyframe-based VSLAM approach based on a modified version of the VGPS algorithm as introduced in Section 3.3.2. In this algorithm, we treat the feature initialization sequentially. The algorithm tracks the features in one camera, but uses a second one to initialize the features of every new keyframe by sub-pixel stereo triangulation, as described in Section 7.1.1. This initialization step runs in parallel and yields highly accurate feature depths. Hence, we require neither an initialization phase when setting up the system, nor a bundle adjustment to preserve the congruency of the keyframes' scales.

Independent of the tracker, its result can contain bad features due to occlusions, repetitive patterns, virtual features, and reflections. Moreover, the stereo initialization can provide false matches, resulting in an incorrect depth of the respective points. Occlusions occur when background objects are not present in the image anymore because they are overlaid by an object in the foreground. In the opposite case, such background objects can also suddenly appear in the image at the edge of a foreground object and prevent proper tracking. Another problem are virtual features. These do not correspond to real points in the world, but appear at the projection edges of a foreground and a background object. Such features do not fit the rigid-body motion corresponding to a certain 3D point, but follow an arbitrary, misleading motion model. The same applies to reflections. Almost no surface has only Lambertian reflection, but also a specular part. Such reflections have a large intensity gradient and are, therefore, likely to be selected for tracking. However, the location of the reflection depends not only on the position of the object, but also on the location of the light source. This leads to a moving feature with a misleading motion model. Within the tracking layer, such problematic cases cannot be detected. Therefore, the localization algorithm has to be able to detect bad features and reject them from processing and future tracking.


5.2.1 Robustified VGPS

We rely on a highly accurate feature initialization and do not want to modify the estimated depths by some bundle adjustment approach as seen in Section 3.3.3. Hence, we use VGPS (see Section 3.3.2) for the motion estimation, which does not change the depth estimates. Due to its iterative character, maximum likelihood-type estimators (M-estimators) suggest themselves for outlier rejection. The M-estimators are a generalization of maximum likelihood (ML) methods. They are more robust to outliers than ML methods due to a special weighting. In most cases, an iteratively re-weighted least-squares fitting method is used for the M-estimation. The advantage of using M-estimators instead of a simple outlier rejection method like RANSAC is that each measurement can be weighted according to its accuracy contribution. The redescending M-estimator is applied to the residual Euclidean distances between matched and reprojected points. This is done to disregard gross outliers without compromising the estimation convergence, given the naturally noisy nature of the problem. Such outliers may correspond to either a faulty internal 3D model, false matching correspondences, virtual features (e.g., features from occlusions), or, more infrequently, to wrong tentative ranges, and they are fatal to unbiased pose estimation. Therefore, in case of outliers, the robustified VGPS not only disregards this data but also sends a signal to the features' database in order to remove those landmarks. In particular, we use the biweight function of Tukey because of its continuous derivatives and its handy weights [215]. Both the initial estimation and the chosen scale are much more influential parameters for fast global convergence than the nature of the employed function. Thus, the modification consists in weighting the contribution of each point to the inertia matrix of the matched set of points with the weight
\[
w_i = \begin{cases} \left( 1 - \frac{e_i^T e_i}{s^2} \right)^2 & \text{if } \| e_i \| \leq s \\ 0 & \text{else} \end{cases} , \qquad (5.26)
\]
where $e_i = P_i - \hat{R} P'_i - t$ is the estimated normalized matching residual for an object point and $s$ is the scale of the inlier noise. This yields the following modification of Eq. 3.31:
\[
R = \arg\max_{R} \operatorname{tr}\!\left( R^T M \right) \quad \text{with} \quad M = \sum_{i=1}^{N} w_i\, P_i'^{*}\, P_i^{*T} . \qquad (5.27)
\]
Finally, we use an efficient termination policy determined by a threshold on the absolute orientation correction over the course of the iterations.
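A minimal sketch of the Tukey biweight weighting and the weighted inertia matrix (Eqs. 5.26/5.27), assuming the residuals and the centered point sets are already available as $(n, 3)$ arrays; the function names are illustrative.

```python
import numpy as np

def tukey_biweights(residuals, scale):
    """Tukey biweight factors for the robustified VGPS (Eq. 5.26)."""
    sq = np.sum(residuals**2, axis=1)
    w = (1.0 - sq / scale**2)**2
    w[np.sqrt(sq) > scale] = 0.0       # residuals beyond the inlier scale get zero weight
    return w

def weighted_inertia_matrix(P_star, P_prime_star, w):
    """Weighted inertia matrix M of Eq. 5.27 from centered point sets."""
    return (P_prime_star * w[:, None]).T @ P_star
```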


Experiments in Section 8.2.3 compare RVGPS to the conventional VGPS algorithm and illustrate the improvements by using M-estimators.

Chapter 6

Uncertainty Prediction for Image-Based Motion Estimation

To allow for a proper fusion of different sensors, it is crucial to know the respective accuracy of each measurement. For optical flow based motion estimation techniques, the accuracy of the estimate depends on the tracked feature set and the applied algorithm. While no error propagation for the Z∞-algorithm is known in the literature, it has already been derived for the regular, original eight-point algorithm [150]. Applying those formulas, one obtains the uncertainties for the rotation quaternion and the translation estimation. However, this does not cover the modifications introduced in Section 3.3.1, which make the algorithm more stable. Furthermore, the error propagation for the rotation matrix estimation, which is more robust than the quaternion computation, has also not been discussed in the literature. In this chapter, we tackle these missing points and derive a first-order error propagation for all the discussed eight-point algorithm variants and also for the Z∞-algorithm, as described in Section 5.1. With these equations, we can approximate the expected error in the computed translation vector and rotation matrix, given a set of feature correspondences and the provided tracking accuracy.

6.1 Error Propagation of the Eight-Point Algorithm

In the following, we will present a first order perturbation analysis of the eight-point algorithm, adapted from [150]. First, we will introduce the notation of errors. Let us assume a measurement matrix A˜, consisting of the matrix A denoting the real values


and the perturbation matrix ∆A, such that

A˜ = A + ∆A . (6.1)

The following theorem will be used in several steps of the derivation.

Theorem: Let $A$ be an $n \times n$ symmetric matrix with the eigenvalues $\lambda_1, \dots, \lambda_n$ and $H$ be an orthonormal matrix consisting of the eigenvectors $h_1, \dots, h_n$, such that
\[
A = H\, \mathrm{diag}(\lambda_1, \dots, \lambda_n)\, H^T . \qquad (6.2)
\]
$x$ is then a vector in the span of an eigenvector $h_j$ with scaling factor $k \in \mathbb{R}$, resulting in

x = khj . (6.3)

The perturbation of x can then be described up to first order by the perturbation in

A, denoted as ∆A, according to

\[
\delta_x \cong H D_i H^T \Delta_A\, x , \qquad (6.4)
\]
with
\[
D_i = \mathrm{diag}(l_1, \dots, l_n) \quad \text{and} \quad l_i = \begin{cases} 0 & \text{if } i = j \\ \frac{1}{\lambda_j - \lambda_i} & \text{else} \end{cases} . \qquad (6.5)
\]
Proof: See [150].

The starting point of the derivation is the error $\delta_A$ of the matrix $A$ of Eq. 3.5, which can be linearly approximated by
\[
\delta_{A^T} \cong \mathrm{col}\!\left( \Delta_{A^T} \right) \quad \text{with} \quad
\Delta_{A^T} = \begin{bmatrix}
\delta_{u_1} u'_1 + \delta_{u'_1} u_1 & \delta_{u_2} u'_2 + \delta_{u'_2} u_2 & \dots & \delta_{u_N} u'_N + \delta_{u'_N} u_N \\
\delta_{u_1} v'_1 + \delta_{v'_1} u_1 & \delta_{u_2} v'_2 + \delta_{v'_2} u_2 & \dots & \delta_{u_N} v'_N + \delta_{v'_N} u_N \\
\delta_{u_1} & \delta_{u_2} & \dots & \delta_{u_N} \\
\delta_{v_1} u'_1 + \delta_{u'_1} v_1 & \delta_{v_2} u'_2 + \delta_{u'_2} v_2 & \dots & \delta_{v_N} u'_N + \delta_{u'_N} v_N \\
\delta_{v_1} v'_1 + \delta_{v'_1} v_1 & \delta_{v_2} v'_2 + \delta_{v'_2} v_2 & \dots & \delta_{v_N} v'_N + \delta_{v'_N} v_N \\
\delta_{v_1} & \delta_{v_2} & \dots & \delta_{v_N} \\
\delta_{u'_1} & \delta_{u'_2} & \dots & \delta_{u'_N} \\
\delta_{v'_1} & \delta_{v'_2} & \dots & \delta_{v'_N} \\
0 & 0 & \dots & 0
\end{bmatrix} , \qquad (6.6)
\]
where $u_i = u_{r_i}$, $u'_i = u_{r'_i}$, $v_i = v_{r_i}$ and $v'_i = v_{r'_i}$ for the sake of clarity. We use the transposed matrix $A^T$ because it eases the computation of the covariance matrix and the understanding of its shape.


Our aim is to compute the variance of the estimated t and R. Hence, we have to compute the linear transitions of each step of the eight-point algorithm, as explained in Section 3.3.1.

6.1.1 Error Propagation for the Essential Matrix

According to Eq. 3.8, the entries of the Essential matrix correspond to the ninth eigenvector of $A^TA$, associated with the smallest eigenvalue. Thus, we first have to compute the error $\delta_{A^TA}$ from $\delta_{A^T}$. Using a first-order approximation, we get
\[
A^T A + \Delta_{A^TA} = (A + \Delta_A)^T (A + \Delta_A) = A^T A + A^T \Delta_A + \Delta_A^T A + \Delta_A^T \Delta_A
\quad \Rightarrow \quad \Delta_{A^TA} \cong A^T \Delta_A + \Delta_A^T A , \qquad (6.7)
\]
which corresponds to
\[
\delta_{A^TA} \cong \left( A^T \otimes I_9 + \mathrm{row}\!\left( I_9 \otimes A_{i,-} \right)_{i=1}^{N} \right) \delta_{A^T} \stackrel{\mathrm{def}}{=} G_{A^TA}\, \delta_{A^T} . \qquad (6.8)
\]
The perturbation of the ninth eigenvector of $A^TA$ can then be computed by the theorem introduced at the beginning of this chapter, such that
\[
\begin{aligned}
\delta_{\hat{e}} &\cong V_A D_{A^TA;9} V_A^T\, \Delta_{A^TA}\, \hat{e} \\
&= V_A D_{A^TA;9} V_A^T \left( \hat{e}^T \otimes I_9 \right) \delta_{A^TA} \\
&\stackrel{\mathrm{def}}{=} G_{\hat{E}}\, \delta_{A^TA} 
\end{aligned} \qquad (6.9)
\]

with $\mathrm{SVD}(A) = U_A \Sigma_A V_A^T$, because $V_A$ corresponds to the eigenvector matrix of $A^TA$ and the singular values of $A$ are the square roots of the eigenvalues of $A^TA$: $\lambda_{A^TA} = \sigma_A^2$. As noted in the last equation, depending on the variant we do not obtain $e$ but $\hat{e}$ and, thus, we have to recover the Essential matrix according to Eq. 3.12 by
\[
E = S^T \left( \hat{E} + \Delta_{\hat{E}} \right) S' \quad \Rightarrow \quad \Delta_E = S^T \Delta_{\hat{E}}\, S'
\]
\[
\Rightarrow \quad \delta_e = \left( S^T \otimes \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 1 & 1 & 1 \end{bmatrix} \right) \circ \left( \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \end{bmatrix} \otimes S'^T \right) \delta_{\hat{e}} \stackrel{\mathrm{def}}{=} G_E\, \delta_{\hat{e}} . \qquad (6.10)
\]

Now, that we know the propagation of the error from the input matrix A to the


Essential matrix, we can estimate the covariance matrix of $E$ by the following formula:
\[
\Gamma_E = H_E\, \Gamma_{A^T} H_E^T \quad \text{with} \quad H_E = G_E\, G_{\hat{E}}\, G_{A^TA} .
\]

Assuming that the errors of different points and different components are uncorrelated and of equal magnitude $\sigma$, $\delta_{\tilde r} = \delta_{\tilde r'} = \begin{bmatrix} \sigma & \sigma & 0 \end{bmatrix}^T$, the covariance matrix $\Gamma_{\tilde{A}^T}$ can be computed based on $\Gamma_x = E\!\left[ \delta_x \delta_x^T \right]$ from $\delta_{\tilde{A}^T}$. Hence, for the basic eight-point algorithm it has the form
\[
\Gamma_{\tilde{A}^T} = \mathrm{Diag}\!\left( \tilde{P}_1, \tilde{P}_2, \dots, \tilde{P}_N \right)
\]
with
\[
\tilde{P}_i = \tilde{r}\tilde{r}^T \otimes \Gamma_{\tilde r'} + \Gamma_{\tilde r} \otimes \tilde{r}'\tilde{r}'^T \quad \text{and} \quad \Gamma_{\tilde r} = \Gamma_{\tilde r'} = \begin{bmatrix} \sigma^2 & 0 & 0 \\ 0 & \sigma^2 & 0 \\ 0 & 0 & 0 \end{bmatrix} , \qquad (6.11)
\]
where $\sigma$ denotes the standard deviation of the feature detector and with $\tilde{r}$ as introduced in Eq. 2.9. However, because of robustness issues we do not use $\tilde{r}$ or $r$ but $\hat{r}$, which is a modification of $r$ according to Eq. 3.11. Hence, we need to adapt the measurement variance for each point by

\[
\hat{r} = S\, r \ \Rightarrow\ \delta_{\hat r} = S\, \delta_r \qquad \hat{r}' = S'\, r' \ \Rightarrow\ \delta_{\hat r'} = S'\, \delta_{r'}
\]
\[
\Gamma_{\hat r} = \mathrm{diag}\!\left( \tfrac{1}{f}\, S\, \delta_{\tilde r} \right)^2 \qquad \Gamma_{\hat r'} = \mathrm{diag}\!\left( \tfrac{1}{f}\, S'\, \delta_{\tilde r'} \right)^2 , \qquad (6.12)
\]
which results in
\[
\Gamma_{\hat{A}^T} = \mathrm{Diag}\!\left( \hat{P}_1, \hat{P}_2, \dots, \hat{P}_N \right) \quad \text{with} \quad \hat{P}_i = \hat{r}\hat{r}^T \otimes \Gamma_{\hat r'} + \Gamma_{\hat r} \otimes \hat{r}'\hat{r}'^T . \qquad (6.13)
\]
Please note that if the robustification transformations $S$ and $S'$ are used and $E$ is computed according to Eq. 3.12, the Frobenius norm of the Essential matrix is in general no longer two, $\| E \|_F \neq 2$. In the computation of $\bar{E}$ the norm is fixed by Eq. 3.16. This step also has to be considered in the accuracy estimation. It is approximated by adjusting the Frobenius norm of the covariance matrix to the Frobenius norm of $\bar{E}$, which yields
\[
\Gamma_{\bar E} = \frac{2}{\sum_{i=1}^{3} \sigma_{\hat A;i}^2}\; H_E\, \Gamma_{\hat{A}^T} H_E^T \quad \text{with} \quad U_{\hat A}\, \mathrm{diag}\!\left( \sigma_{\hat A;1}, \sigma_{\hat A;2}, \sigma_{\hat A;3} \right) V_{\hat A}^T = \mathrm{SVD}\!\left( \hat{A} \right) . \qquad (6.14)
\]
The square root of the factor $\frac{2}{\sum_{i=1}^{3} \sigma_{\hat A;i}^2}$ vanishes because it has to be squared for the

covariance matrix. Please note that this is only an approximation and does not exactly correspond to the counterpart of Eq. 3.16, because it is not assured that the singular values of $\bar{E}$ are $\mathrm{diag}(1, 1, 0)$, but only that the Frobenius norm is 2.

6.1.2 Error Propagation for the Rotation Matrix

According to Eq. 3.23, the rotation matrix is estimated by $R = U_E Z W V_E^T$. Consequently, the error of $R$ propagates as follows:
\[
R + \Delta_R = (U_E + \Delta_{U_E})\, Z W\, (V_E + \Delta_{V_E})^T \quad \Rightarrow \quad \Delta_R \cong \Delta_{U_E} Z W V_E^T + U_E Z W \Delta_{V_E}^T . \qquad (6.15)
\]

Hence, we need to propagate the error for $\Delta_{U_E}$ and $\Delta_{V_E}$. The vectors of $U_E$ and $V_E$ are eigenvectors of $\bar{E}\bar{E}^T$ and $\bar{E}^T\bar{E}$, respectively. Thus, we first need to compute the transition matrix $G_{\bar{E}^T\bar{E}}$ by
\[
\delta_{\bar{E}^T\bar{E}} \cong \mathrm{col}\!\left( I_3 \otimes \bar{E}_{-,i}^T + I_3 \otimes \bar{E}^T \right)_{i=1}^{3} \delta_e \stackrel{\mathrm{def}}{=} G_{\bar{E}^T\bar{E}}\, \delta_e . \qquad (6.16)
\]
Now, the error in $U_E$ and $V_E$ can be computed by using again the aforementioned theorem,
\[
\delta_{U_{\bar E}} \cong \mathrm{col}\!\left( U_E D_{\bar E\bar E^T;i} U_E^T \left( U_{E;-,i}^T \otimes I_3 \right) \right)_{i=1}^{3} \delta_{\bar E\bar E^T} \stackrel{\mathrm{def}}{=} G_{U_{\bar E}}\, \delta_{\bar E\bar E^T} \quad \text{and} \qquad (6.17)
\]
\[
\delta_{V_{\bar E}} \cong \mathrm{col}\!\left( V_E D_{\bar E^T\bar E;i} V_E^T \left( V_{E;-,i}^T \otimes I_3 \right) \right)_{i=1}^{3} \delta_{\bar E^T\bar E} \stackrel{\mathrm{def}}{=} G_{V_{\bar E}}\, \delta_{\bar E^T\bar E} , \qquad (6.18)
\]
where we choose

\[
D_{\bar E\bar E^T;1} = D_{\bar E\bar E^T;2} = D_{\bar E^T\bar E;1} = D_{\bar E^T\bar E;2} = \mathrm{diag}(0, 0, 1) \qquad (6.19)
\]
as proposed by [108], to avoid a division by zero. Finally, the perturbation in the rotation matrix can be computed by
\[
\delta_R = \begin{bmatrix} G_{R_U} & G_{R_V} \end{bmatrix} \begin{bmatrix} \delta_{U_{\bar E}} \\ \delta_{V_{\bar E}} \end{bmatrix} \qquad (6.20)
\]

with

\[
G_{R_U} = \mathrm{col}\!\left( \begin{bmatrix} Z_{1,2}\, V_{E;i,2} & Z_{2,1}\, V_{E;i,1} & \det\!\left( U_E V_E^T \right) V_{E;i,3} \end{bmatrix} \otimes I_3 \right)_{i=1}^{3} \quad \text{and} \qquad (6.21)
\]
\[
G_{R_V} = \begin{bmatrix} I_3 \otimes Z_{2,1}\, U_{E;i,2} & I_3 \otimes Z_{1,2}\, U_{E;i,1} & I_3 \otimes \det\!\left( U_E V_E^T \right) U_{E;i,3} \end{bmatrix} . \qquad (6.22)
\]
This yields the following covariance matrix for the rotation matrix:
\[
\Gamma_R = \begin{bmatrix} G_{R_U} G_{U_{\bar E}} & G_{R_V} G_{V_{\bar E}} \end{bmatrix} \begin{bmatrix} G_{\bar E\bar E^T} \\ G_{\bar E^T\bar E} \end{bmatrix} \Gamma_{\bar E} \begin{bmatrix} G_{\bar E\bar E^T}^T & G_{\bar E^T\bar E}^T \end{bmatrix} \begin{bmatrix} G_{U_{\bar E}}^T G_{R_U}^T \\ G_{V_{\bar E}}^T G_{R_V}^T \end{bmatrix} . \qquad (6.23)
\]
Actually, the error of a rotation matrix has to meet some requirements in order to fulfill the orthogonality constraint of a DCM [216]:

T I = (R + ∆R)(R + ∆R) T T T = RR + ∆RR + R∆R + O ∆R∆R T ∼= I + ∆RR + R∆R  T T T def ′ ′ ∆RR = ∆RR = ∆ = d . (6.24) ⇒ − R R ×    where the operator [ ] denotes the cross-product skew matrix. It follows that · ×

T ′ def R + ∆ = I + ∆ R R = I + ∆ R = R ′ R . (6.25) R R R ∆R   Thus, geometric insights of the rotation error can be gained computing the error rota- ′ ′ tion axis d R. It can be derived by finding the kernel of ∆R, which is the eigenvector of ′ ∆R corresponding to its smallest eigenvalue. However, we do not know the error-free R and, thus, we cannot perform the eigenvalue decomposition directly. By multiplying the error with its transposed we can eleminate R, such that

\[
Q = \Delta_R'\, \Delta_R'^T = \Delta_R \Delta_R^T = -\Delta_R'\, \Delta_R' .
\qquad (6.26)
\]

Taking the power of a matrix does not change its eigenvectors but applies only to its eigenvalues. Hence, we can compute the kernel of the matrix $\Delta_R'$ as the left singular vector of Q corresponding to the smallest singular value. The magnitude of the error rotation $d_R'$ corresponds to the two largest singular values of $\Delta_R'$. Actually, the singular values of the skew-symmetric matrix $\Delta_R'$ should look like $\{\|d_R'\|, \|d_R'\|, 0\}$. We compute the magnitude of the error rotation as the mean of the square roots of the two largest singular values, such that

\[
d_R' = \sigma_m\, U_{Q;-,3}
\quad\text{with}\quad
\sigma_m = 0.5\,\big(\sqrt{\sigma_{Q;1}} + \sqrt{\sigma_{Q;2}}\,\big)
\qquad (6.27)
\]
and $U_Q \operatorname{diag}(\sigma_{Q;1},\sigma_{Q;2},\sigma_{Q;3})\, V_Q^T = \mathrm{SVD}(Q)$. The matrix Q can be composed from the elements of $\Gamma_R$ as follows:

\[
Q = \begin{pmatrix}
\Gamma_{R;1,1}+\Gamma_{R;4,4}+\Gamma_{R;7,7} & \Gamma_{R;1,2}+\Gamma_{R;4,5}+\Gamma_{R;7,8} & \Gamma_{R;1,3}+\Gamma_{R;4,6}+\Gamma_{R;7,9} \\
\Gamma_{R;2,1}+\Gamma_{R;5,4}+\Gamma_{R;8,7} & \Gamma_{R;2,2}+\Gamma_{R;5,5}+\Gamma_{R;8,8} & \Gamma_{R;2,3}+\Gamma_{R;5,6}+\Gamma_{R;8,9} \\
\Gamma_{R;3,1}+\Gamma_{R;6,4}+\Gamma_{R;9,7} & \Gamma_{R;3,2}+\Gamma_{R;6,5}+\Gamma_{R;9,8} & \Gamma_{R;3,3}+\Gamma_{R;6,6}+\Gamma_{R;9,9}
\end{pmatrix} .
\qquad (6.28)
\]
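The composition of Q and the extraction of the error-rotation axis and magnitude amount to only a few lines. The following is a minimal numpy sketch of Eqs. 6.26 to 6.28; the function name and interface are illustrative, not taken from the thesis implementation.

import numpy as np

def error_rotation_axis(gamma_R):
    """Error-rotation axis and magnitude from the 9x9 covariance of R.

    Q is the sum of the three 3x3 diagonal blocks of Gamma_R (Eq. 6.28);
    its weakest left singular vector gives the axis and the mean of the
    square roots of the two largest singular values the magnitude (Eq. 6.27).
    """
    Q = sum(gamma_R[3 * i:3 * i + 3, 3 * i:3 * i + 3] for i in range(3))
    U_Q, s_Q, _ = np.linalg.svd(Q)
    sigma_m = 0.5 * (np.sqrt(s_Q[0]) + np.sqrt(s_Q[1]))
    return sigma_m * U_Q[:, 2]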

6.1.3 Error Propagation for the Translation Vector

We know from Eq. 3.28 that the translation vector equals, up to sign, the eigenvector corresponding to the smallest eigenvalue of $\bar{E}\bar{E}^T$. Hence, we first need to estimate the error transition similarly to Eq. 6.8

\[
\delta_{\bar{E}\bar{E}^T} \cong \operatorname{row}\!\Big[\, I_3 \otimes \bar{E}_{-,i} + \bar{E} \otimes I_3 \,\Big]_{i=1}^{3}\, \delta_e
\overset{\text{def}}{=} G_{\bar{E}\bar{E}^T}\, \delta_e .
\qquad (6.29)
\]
Analogously to Eq. 6.9, the first order perturbation propagation of t̃ can then be computed using the above theorem by

\[
\delta_{\tilde{t}} \cong U_E\, D_{\bar{E}\bar{E}^T;3}\, U_E^T \big( \tilde{t}^T \otimes I_3 \big)\, \delta_{\bar{E}\bar{E}^T}
\overset{\text{def}}{=} G_{\tilde{t}}\, \delta_{\bar{E}\bar{E}^T}
\qquad (6.30)
\]

with $\mathrm{SVD}(\bar{E}) = U_E \Sigma_{\bar{E}} V_E^T$. This yields the following covariance matrix, which is equal to the one for t: t and t̃ differ only in sign, and since the transition matrix $G_{\tilde{t}}$ is applied twice, the sign cancels out:

\[
\Gamma_t = \Gamma_{\tilde{t}} = G_{\tilde{t}}\, G_{\bar{E}\bar{E}^T}\, \Gamma_{\bar{E}}\, G_{\bar{E}\bar{E}^T}^T\, G_{\tilde{t}}^T .
\qquad (6.31)
\]

In Section 8.4.1, we will evaluate the quality of the first order error propagation using synthetic data.


6.2 Error Propagation of the Z∞-Algorithm

The motion estimation of the Z∞-algorithm is split into two parts, the rotation and the translation estimation. We are going to analyze the first order error propagation without considering the correlation of these two components, because an exhaustive analysis is tedious and beyond the scope of this work. However, the mutual influence of rotation and translation will be discussed in the experiment Section 8.4.2.

6.2.1 Error Propagation for the Rotation Matrix

The pure rotation estimation of a point cloud is part of the so-called orthogonal Procrustes problem. Its first order accuracy analysis has been presented and discussed in [216]. We adapt this approach to provide a first order error propagation for the Z∞ rotation estimation.

Using the same rotation error definition as in Eq. 6.25, the column vector $\delta_R$ of the matrix $\Delta_R$ can be computed from $[d_R']_\times$ by

\[
\delta_R = \operatorname{col}\!\Big[ -\big[ R_{-,i} \big]_\times \Big]_{i=1}^{3}\, d_R'
\overset{\text{def}}{=} G_R\, d_R' .
\qquad (6.32)
\]
According to the error propagation of the Procrustes problem in [216], the covariance matrix of $d_R'$ can be estimated by

\[
\Gamma_{d_R'} = R H R^T \left( \sum_{i=1}^{N} \big[ R r_i^* \big]_\times \Gamma_{r_i'^*} \big[ R r_i^* \big]_\times^T
+ \sum_{i=1}^{N} \big[ r_i'^* \big]_\times R\, \Gamma_{r_i^*}\, R^T \big[ r_i'^* \big]_\times^T \right) R H R^T
\qquad (6.33)
\]
with N being the number of features and

\[
H = \big( \operatorname{tr}(S)\, I - S \big)^{-1}
\quad\text{where}\quad
\mathrm{EVD}(S) = V_M \operatorname{diag}(\sigma_1, \sigma_2, \sigma_3)\, V_M^T
\qquad (6.34)
\]

and $\mathrm{SVD}(M) = U_M \operatorname{diag}(\sigma_1, \sigma_2, \sigma_3)\, V_M^T$ according to Eq. 5.11. The vectors $r_i^*$ and $r_i'^*$ were introduced in Eq. 5.10 and the singular values $\sigma_1$, $\sigma_2$ and $\sigma_3$ of M are in decreasing order. Please note that H might also be written as

\[
H = V_M \operatorname{diag}\!\left( \frac{1}{\sigma_2 + \sigma_3},\; \frac{1}{\sigma_1 + \sigma_3},\; \frac{1}{\sigma_1 + \sigma_2} \right) V_M^T .
\qquad (6.35)
\]
Hence, the weighting matrix H has a similar ordering as the inertial shape of the cloud S (which is the Hermitian component of the polar decomposition), with the terms muting their ratios, which is reminiscent of the weighting term of the first order error propagation of an eigenvector in Eq. 6.4.
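A minimal numpy sketch of the weighting matrix of Eqs. 6.34/6.35; it assumes the 3×3 correlation matrix M of Eq. 5.11 is available, and the function name is not from the thesis code.

import numpy as np

def procrustes_weight_matrix(M):
    """Weighting matrix H of Eq. 6.35 from the SVD of M (Eq. 5.11)."""
    _, s, V_Mt = np.linalg.svd(M)        # singular values in decreasing order
    V_M = V_Mt.T
    # The pairwise sums of the singular values enter as reciprocals.
    d = np.array([1.0 / (s[1] + s[2]),
                  1.0 / (s[0] + s[2]),
                  1.0 / (s[0] + s[1])])
    return V_M @ np.diag(d) @ V_M.T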

It remains to compute the covariance matrices $\Gamma_{r_i^*}$ and $\Gamma_{r_i'^*}$ from the feature detector accuracy $\sigma^2$. As in Eq. 6.11 we start from

\[
\Gamma_{\tilde{r}} = \Gamma_{\tilde{r}'} = \begin{pmatrix} \sigma^2 & 0 & 0\\ 0 & \sigma^2 & 0\\ 0 & 0 & 0 \end{pmatrix} ,
\]
but for our rotation estimation we are using unit-length rays. Hence, according to the intercept theorem, the deviation changes, such that

\[
\Gamma_{r_i^*} = \frac{z_{r_i}^2}{f^2} \begin{pmatrix} \sigma^2 & 0 & 0\\ 0 & \sigma^2 & 0\\ 0 & 0 & 0 \end{pmatrix}
\quad\text{and}\quad
\Gamma_{r_i'^*} = \frac{z_{r_i'}^2}{f^2} \begin{pmatrix} \sigma^2 & 0 & 0\\ 0 & \sigma^2 & 0\\ 0 & 0 & 0 \end{pmatrix} ,
\qquad (6.36)
\]
where $z_{r_i}$ is the z-coordinate of the unit ray $r_i$ and f is the focal length as defined in Eq. 2.9. The factor $z_{r_i}/f$ simply corresponds to the reciprocal of the vector length, $1/\|\tilde{r}\|$. The shift of the origin to the centroid does not affect the variance. Finally, putting together Eqs. 6.33 and 6.32 we get

\[
\Gamma_R = G_R\, \Gamma_{d_R'}\, G_R^T .
\qquad (6.37)
\]

6.2.2 Error Propagation for the Translation Vector

The error in the translation estimation has to be propagated through Eq. 5.21. The initial error ∆B of B can be estimated according to

\[
\Delta_B = \big( \Delta_A \;\; \delta_d \big)
\qquad (6.38)
\]
with

\[
n + \delta_n = \begin{pmatrix}
\big(y_{\check{r}_i'} + \delta_{y_{\check{r}_i'}}\big) - \big(y_{r_i} + \delta_{y_{r_i}}\big) \\
\big(x_{r_i} + \delta_{x_{r_i}}\big) - \big(x_{\check{r}_i'} + \delta_{x_{\check{r}_i'}}\big)
\end{pmatrix}
\;\Rightarrow\;
\delta_n = \begin{pmatrix}
\delta_{y_{\check{r}_i'}} - \delta_{y_{r_i}} \\
\delta_{x_{r_i}} - \delta_{x_{\check{r}_i'}}
\end{pmatrix}
\;\Rightarrow\;
\Delta_A = \begin{pmatrix} \delta_{n_1} & \delta_{n_2} & \ldots & \delta_{n_{|R_{out}|}} \end{pmatrix}^T
\qquad (6.39)
\]

and

\[
d + \delta_d = (n + \delta_n)^T \begin{pmatrix} x_r + \delta_{x_r} \\ y_r + \delta_{y_r} \end{pmatrix}
= (x_n + x_{\delta_n})(x_r + \delta_{x_r}) + (y_n + y_{\delta_n})(y_r + \delta_{y_r})
\]
\[
\Rightarrow\; \delta_d \cong x_n \delta_{x_r} + x_{\delta_n} x_r + y_n \delta_{y_r} + y_{\delta_n} y_r
\;\Rightarrow\;
\delta_d \cong \begin{pmatrix} \delta_{d_1} & \delta_{d_2} & \ldots & \delta_{d_{|R_{out}|}} \end{pmatrix}^T .
\qquad (6.40)
\]

In the following, we will use the transposed error matrix $\delta_{B^T}$ instead of $\delta_B$ because it eases the computation of the covariance matrix later on.

The error propagation from the feature error to $\delta_{B^T}$ can be computed by the following transition matrices:

\[
\delta_{r,n} = \begin{pmatrix} \delta_{r_1} \\ \delta_{n_1} \\ \delta_{r_2} \\ \delta_{n_2} \\ \vdots \\ \delta_{r_{|R_{out}|}} \\ \delta_{n_{|R_{out}|}} \end{pmatrix}
= \left( I_{|R_{out}|} \otimes
\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & -1 & 0 & 1 \\ 1 & 0 & -1 & 0 \end{pmatrix} \right)
\begin{pmatrix} \delta_{r_1} \\ \delta_{\check{r}_1'} \\ \delta_{r_2} \\ \delta_{\check{r}_2'} \\ \vdots \\ \delta_{r_{|R_{out}|}} \\ \delta_{\check{r}_{|R_{out}|}'} \end{pmatrix}
\overset{\text{def}}{=} G_{r,n}\, \delta_{r,\check{r}'}
\qquad (6.41)
\]
and
\[
\delta_{B^T} \cong \operatorname{Diag}\big( P_1, P_2, \ldots, P_{|R_{out}|} \big)\, \delta_{r,n}
\overset{\text{def}}{=} G_{B^T}\, \delta_{r,n}
\qquad (6.42)
\]
with
\[
P_i = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ x_{n_i} & y_{n_i} & x_{\tilde{r}_i} & y_{\tilde{r}_i} \end{pmatrix} .
\qquad (6.43)
\]
The error of t̂, which is derived from the least-squares solution of Eq. 5.21 according to Eq. 5.22, can be propagated using the span-of-eigenvector error propagation theorem introduced at the beginning of Section 6.1 and specified by Eq. 6.4. Thus, we first need to compute the transition matrix from the error vector $\delta_{B^T}$ to $\delta_{B^TB}$, similar to Eq. 6.8

\[
\delta_{B^T B} \cong \operatorname{row}\!\Big[\, I_3 \otimes B_{i,-}^T + B^T \otimes I_3 \,\Big]_{i=1}^{|R_{out}|}\, \delta_{B^T}
\overset{\text{def}}{=} G_{B^T B}\, \delta_{B^T} .
\qquad (6.44)
\]


Now we can propagate the perturbation of the eigenvector by

\[
\delta_{\bar{e}} \cong V_B\, D_{B^TB;3}\, V_B^T\, \Delta_{B^TB} \begin{pmatrix} e \\ -1 \end{pmatrix}
= V_B\, D_{B^TB;3}\, V_B^T \left( \begin{pmatrix} e \\ -1 \end{pmatrix}^{\!T} \otimes I_3 \right) \delta_{B^TB}
\overset{\text{def}}{=} G_{\bar{e}}\, \delta_{B^TB} ,
\qquad (6.45)
\]
where e has been introduced in Eq. 5.22. According to these formulas, the error $\delta_{\tilde{t}}$ of the translation direction (up to sign) t̃ can be estimated from $\delta_{\bar{e}}$ according to

\[
\delta_{\tilde{t}} = \frac{1}{\|\tilde{t}'\|} \begin{pmatrix} \delta_{\bar{e}} \\ 0 \end{pmatrix}
\quad\text{and, thus,}\quad
\delta_{\tilde{t}} = G_t\, \delta_{\bar{e}}
\quad\text{with}\quad
G_t = \begin{pmatrix} z_{\tilde{t}} & 0 & 0 \\ 0 & z_{\tilde{t}} & 0 \\ 0 & 0 & 0 \end{pmatrix}
\qquad (6.46)
\]
and $z_{\tilde{t}}$ denoting the third component of t̃. The errors for t̃ and t̂ differ only in sign and, hence, result in the same covariance propagation matrix.

It remains to compute the initial covariance matrix $\Gamma_{r,\check{r}'}$. Assuming again that the errors of different points and different components are uncorrelated and equally $\delta_{r_i,\check{r}_i'} = \frac{\sigma}{f} I_4$, the covariance matrix $\Gamma_{r,\check{r}'}$ can be computed based on $\Gamma_{r,\check{r}'} = E\big[ \delta_{r,\check{r}'} \delta_{r,\check{r}'}^T \big]$. As a result we get
\[
\Gamma_{r,\check{r}'} = \left( \frac{\sigma}{f} \right)^{2} I_{4|R_{out}|} .
\qquad (6.47)
\]
Putting the pieces together, we can compute the covariance matrix of the direction of the translation vector, $\Gamma_{\hat{t}}$, such that

\[
\Gamma_t \cong H_t\, \Gamma_{r,\check{r}'}\, H_t^T
\quad\text{with}\quad
H_t = G_t\, G_{\bar{e}}\, G_{B^TB}\, G_{B^T}\, G_{r,n} .
\qquad (6.48)
\]
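The chained propagation of Eq. 6.48 is straightforward once the individual transition matrices are assembled. A minimal numpy sketch, assuming the G matrices of Eqs. 6.41 to 6.46 are already available as arrays of compatible sizes (all names are ours):

import numpy as np

def translation_direction_covariance(G_t, G_e_bar, G_BtB, G_Bt, G_rn,
                                     sigma, f, n_out):
    """First-order covariance of the translation direction (Eqs. 6.47/6.48)."""
    # Eq. 6.47: uncorrelated, isotropic feature noise for all |R_out| ray pairs.
    Gamma_r = (sigma / f) ** 2 * np.eye(4 * n_out)
    # Eq. 6.48: chain the first-order transition matrices.
    H_t = G_t @ G_e_bar @ G_BtB @ G_Bt @ G_rn
    return H_t @ Gamma_r @ H_t.T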

We will evaluate the quality of the presented first order error propagation using synthetic data in Section 8.4.2.


Chapter 7

Feature Tracking

The accuracy in estimating the mean value of a random variable increases with the number of its occurrences considered in the computation. Hence, it is possible to tune a tracking algorithm in two ways to increase the motion estimation accuracy: either the tracking accuracy is improved, lowering the variance of the random variable, or the algorithm is tuned to be more efficient and, thus, allows the use of more features1. In this chapter, we describe two tracking techniques. The first one aims for high tracking accuracy and is based on the Kanade-Lucas-Tomasi (KLT) feature tracker, which we enhanced to increase efficiency and to allow for sub-pixel accurate stereo triangulation. The second one is a tracking-by-matching approach based on AGAST features. It does not require any image pre-processing and is, thus, highly efficient.

7.1 Extended KLT Tracking

One of the most popular and efficient local tracking methods currently is KLT (see Section 3.2). In the following, we will point out a few bottlenecks and drawbacks of the algorithm and how we bypassed them in our implementation. KLT tracks the features in the gradient image. For that, conventional implementations first compute the X- and Y-axis gradients for the whole image. Furthermore, if larger pixel displacements have to be detected, image pyramids are used, which require a low-pass filtering of the image to prevent aliasing. This is done by convolving the image with a Gaussian smoothing filter. In Appendix A.1, we describe how the appropriate smoothing kernel for a specific image pyramid can be computed. For each pyramid level and each axis, one convolution for smoothing and one for the computation of the gradient is required. For the sake of efficiency one should try to

1Some experimental results of this effect are shown in Fig. 8.17.

avoid the computation and the processing of too many image pyramid layers, not only because of the large processing costs, but also because image pyramids may lead to wrong matches. If in an upper level, where less detail about the feature is available, a wrong correspondence (e.g. of a repetitive pattern) is found, there is no possibility to fix that in the lower levels. Furthermore, considering the large number of pixels per image, one should try to avoid full-image processing steps and reduce the search area as much as possible, while of course still allowing for the necessary dynamics.

KLT generally uses the Shi-Tomasi features (see Section 3.1) with a specified non-maximum distance threshold. However, this does not prevent all features from being found in a small, texture-rich region of the image. Such a local limitation of the features leads to an ill-posed motion estimation, independent of the algorithm. Moreover, we can assume that we are going to track the same features for quite a while, which leads to a further problem. Let us briefly recapitulate the functional principle of KLT. First, good features to track have to be chosen. The selection criterion for this is based on the horizontal and vertical gradients in a surrounding region. The tracking itself, however, is a simple iterative comparison of the gradient images. The position is estimated with sub-pixel precision up to a specified accuracy; a residual error will always remain due to the linearization of the tracking step. Thus, in each step the new feature location is estimated and defines the new template for the next tracking step. This image-by-image tracking is highly efficient but results in an accumulation of the tracking error. There are two ways to overcome this problem. Either we always use the keyframe patches as reference, which requires a computationally expensive patch-normal estimation and warping, or we try to avoid the tracking bias by binding the feature location to a drift-free point.

To overcome the stated problems we implemented the following modifications of the conventional KLT:

• KLT is a local tracker, which assumes that the features do not move too much. Hence, the KLT preprocessing (gradient computation and smoothing) is reduced to small patches around the tracked points. In this way, not the entire image but only the small feature patches have to be processed and stored. Thus, tracking on high-resolution images also becomes feasible.

• A linear motion model allows not only for larger feature displacements between the images, but also for smaller search areas. The results are fewer tracking iterations and a reduced size of the image patches. The motion is modeled based on the 2D feature displacements in the image and provides a strict separation of


the tracking and the pose estimation routines, which prevents a build-up of the error.

• It is often the case that a scene captured by a camera is composed of differently structured regions. Patterns with rich contrast are preferred by feature detectors because they allow a good discrimination during tracking. Splitting the image into commensurate sub-images for feature selection leads to a better landmark dispersion. A feature set which covers a wider cone of view allows a better-conditioned pose estimation.

• To prevent feature drift, we bind the feature to the response of its detector (see the sketch after this list). KLT uses the Shi-Tomasi detector, which selects pixels with the largest second eigenvalue of the moment matrix of the surrounding patch and, thus, not necessarily a real corner, but in general a pixel close to it. Hence, we assume that the feature detector would always extract the same interesting point in the close pixel neighborhood of the tracked feature. Let $v_d$ denote the vector from the sub-pixel accurate tracking result $p_t$ to the pixel-accurate feature detector response. Then the feature gets attracted by a weight resulting from the ratio between the detector response value of the tracked location, $s_{p_t}$, and the maximum one, $s_{v_d}$. Because the tracked location is not a single pixel, its value has to be interpolated from the surrounding pixels. In the end we get a novel sub-pixel accurate and drift-compensated feature location p':
\[
p' = p_t + \left( 1 - \frac{s_{p_t}}{s_{v_d}} \right) v_d .
\qquad (7.1)
\]
This yields a mixture between gradient-descent tracking and tracking by matching as used in Section 7.2.3.

Some experimental results are shown in Section 8.1.1. The computational costs can be significantly reduced by tracking the features in only one instead of both cameras. However, to increase the motion estimation accuracy, the features' 3D structure has to be initialized. There are three different ways to initialize the scale from images: by using the dimensions of some known objects (structure from reference), by moving a camera (structure from motion) or by using a stereo camera system with stereo triangulation (structure from stereo). In the next section, we describe a method which allows finding sub-pixel accurate feature correspondences for an accurate stereo triangulation.


7.1.1 KLT-Based Stereo Initialization

The default way to find stereo correspondences is to search for a corresponding patch on the epipolar line [108]. For that, the neighborhood of each pixel on this line is compared to the pixel neighborhood of the respective feature. This results in a pixel-accurate stereo matching. Even if the epipolar line is restricted to a small area due to the distance limitation of the environment, all the pixels along that line have to be checked for matching. In practice, that line even grows to a small band due to calibration inaccuracies, and if one wants to achieve sub-pixel accuracy, all possible sub-pixel locations in this band have to be interpolated. The fact that the current problem is in principle the same as in tracking, namely finding pixel correspondences, leads to the following algorithm. If the cameras are mounted parallel on the stereo rig and the baseline is short, the affine component of the stereo transformation can be neglected. Hence, we assume that the same features are found by a feature detector. We extract features from both images, the reference as well as the supporting image which is only used for stereo initialization, where for the latter we use a slightly lower threshold for feature detection, which yields more responses. Now, for each feature in the reference image we check all detected features on the epipolar line for correspondence. The features are selected according to the following constraints: the coordinate of the horizontal axis (in general the X-axis) of a possible correspondence has to be smaller or larger (denoted by ≶) than the one of the reference camera, depending on whether the left or the right camera has been chosen as reference camera. Furthermore, the distance from the epipolar line cannot exceed a predefined threshold Ξd, which has to be chosen according to the feature detector and the accuracy of the camera calibration. The distance from the epipolar line is computed as follows. The relation between the fundamental matrix F, the image point in the reference camera $p_r$ and the epipolar line l in the image coordinate frame of the second camera is

F pr = l . (7.2)

We compute the projection factor fp of the epipolar line on the image plane by

\[
f_p' = f\, \bar{l}^T \bar{l}_p
\quad\text{with}\quad
\bar{l}_p = \frac{1}{\sqrt{x_{\bar{l}}^2 + y_{\bar{l}}^2}} \begin{pmatrix} x_{\bar{l}} \\ y_{\bar{l}} \\ 0 \end{pmatrix} ,
\qquad
\bar{l} = \frac{l}{\|l\|}
\qquad (7.3)
\]

and f being the focal length in pixels, which leads to

\[
f_p = f\, \frac{x_l^2 + y_l^2}{\|l\|\, \sqrt{\|l\|^2 - z_l^2}}
\qquad (7.4)
\]
and we finally get the pixel distance $d_{p_t}$ of the feature to test, $p_t$, from the epipolar line by

\[
d_{p_t} = f_p'\, \bar{l}^T p_t = f_p\, l^T p_t .
\qquad (7.5)
\]

This yields the usually relatively small set of possible correspondences $C_{p_r}$ from the set of detected features in the second image, T, where

\[
C_{p_r} = \big\{\, p_t \in T \;\big|\; x_{p_t} \lessgtr x_{p_r} \;\wedge\; |d_{p_t}| < \Xi_d \,\big\} .
\qquad (7.6)
\]

We apply the KLT tracker to each of these matches to find the sub-pixel accurate correspondence. Hence, not every feature or even pixel, but only some interesting points within the epipolar band are used as starting points for the KLT tracker. In case of a match, a sub-pixel accurate feature correspondence is found. If the tracker finds more than one match, we use the one with the smallest pixel difference of the gradient patches. By limiting the search range for stereo matches to the displacement corresponding to an object's distance, only features in the specified range are found. With this restriction, we can localize with respect to an object even if it is moved, because we do not refer to landmarks outside the specified range, e.g. on the scene background. This allows tracking of a moving object and, in the 3D modeling scenario, the reconstruction of dynamic subjects (see Section 9.1). Some results of this initialization technique are presented in Section 8.1.1.
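To make the candidate selection concrete, the following sketch gathers the detector responses of the second image that satisfy the constraints of Eq. 7.6. For brevity it uses the familiar point-to-line distance in pixels, which serves the same purpose as the projection factor of Eqs. 7.3 to 7.5; all names, and the choice of the disparity sign for a left reference camera, are assumptions for illustration.

import numpy as np

def epipolar_candidates(p_r, features_t, F, xi_d, left_is_reference=True):
    """Candidate set C_{p_r} of Eq. 7.6 (sketch).

    p_r        : homogeneous pixel coordinates (3,) of the reference feature.
    features_t : iterable of homogeneous detector responses in the 2nd image.
    F          : fundamental matrix expressed in pixel coordinates.
    xi_d       : maximum allowed distance from the epipolar line in pixels.
    """
    l = F @ p_r                                  # epipolar line (Eq. 7.2)
    norm_xy = np.hypot(l[0], l[1])
    candidates = []
    for p_t in features_t:
        d = abs(l @ p_t) / norm_xy               # point-to-line distance in px
        # Sign of the admissible disparity (the <> of Eq. 7.6); assumed here.
        disparity_ok = p_t[0] < p_r[0] if left_is_reference else p_t[0] > p_r[0]
        if disparity_ok and d < xi_d:
            candidates.append(p_t)
    return candidates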

7.2 AGAST-Based Tracking by Matching

In the following, we present an approach which is less accurate but much more efficient. Due to a highly efficient corner detection we can look for good features to track in each image and match them. Thus, we first introduce the adaptive and generic accelerated segment test (AGAST) and then describe which matching policy for feature tracking we use in combination with this feature detector and when it is applicable.


7.2.1 Adaptive and Generic Accelerated Segment Test

AGAST is a corner detector which, analogously to FAST, is based on the AST (see Section 3.1.1), but which is more efficient while also being more generic. We introduce the reader step by step to the different concepts underlying the algorithm: the binary search tree, the optimal decision tree, and the automatic scene adaptation.

Configuration Space for a Binary Search Tree

Instead of only considering a restricted configuration space, as in FAST, we propose to use a more detailed configuration space in order to provide a more efficient solution. Therefore, we do not consider two questions of the AST at a time as in FAST, but evaluate only one. The idea is as follows: choose one of the pixels to test and one question to pose. The question is then evaluated for this given pixel, and the response is used to decide the following pixel and question to query. Searching for a corner hence reduces to traversing a binary decision tree. For this, it is required to specify which pixel to query and the type of question to use. Consequently, the configuration space increases by the addition of two more states: “not brighter” (b̄) and “not darker” (d̄). Using a similar notation as in [128], the state of a pixel relative to the nucleus n, denoted by n→x, is assigned as follows:

\[
S_{n\rightarrow x} =
\begin{cases}
d, & I_{n\rightarrow x} < I_n - t & \text{(darker)}\\
\bar{d}, & I_{n\rightarrow x} \not< I_n - t \;\wedge\; S'_{n\rightarrow x} = u & \text{(not darker)}\\
s, & I_{n\rightarrow x} \not< I_n - t \;\wedge\; S'_{n\rightarrow x} = \bar{b} & \text{(similar)}\\
s, & I_{n\rightarrow x} \not> I_n + t \;\wedge\; S'_{n\rightarrow x} = \bar{d} & \text{(similar)}\\
\bar{b}, & I_{n\rightarrow x} \not> I_n + t \;\wedge\; S'_{n\rightarrow x} = u & \text{(not brighter)}\\
b, & I_{n\rightarrow x} > I_n + t & \text{(brighter)}
\end{cases}
\qquad (7.7)
\]
where $S'_{n\rightarrow x}$ is the preceding state, I is the brightness of a pixel and u means that the state is still unknown (a small sketch of this state update follows the cost list below). This results in a binary tree representation, as opposed to a ternary tree, allowing a single evaluation at each node. Please note that this increases the configuration space size to $6^N$, which yields $6^{16} \approx 2 \cdot 10^{12}$ possible nodes for N = 16. Associated with each branch of our tree is a processing cost, which represents the computational cost on the target machine. These costs vary due to different memory access times. We specify them as follows:

• cR: register access cost (second comparison of the last tested pixel),

• cC: cache access cost (testing of a pixel in the same row),


• cM: memory access cost (testing of any other pixel).

Furthermore, for each of these, an additional cost equivalent to evaluating a greater-than operation is required.
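Returning to Eq. 7.7, each visit of a pixel poses exactly one brightness question and combines the answer with the preceding state. The following minimal sketch illustrates the rule; it is not the released AGAST source code and all names are illustrative.

# Pixel states of Eq. 7.7 ("nd"/"nb" stand for the barred states).
DARKER, NOT_DARKER, SIMILAR, NOT_BRIGHTER, BRIGHTER, UNKNOWN = \
    "d", "nd", "s", "nb", "b", "u"

def update_state(I_n, I_x, t, prev_state, ask_darker):
    """One evaluation of the accelerated segment test per Eq. 7.7.

    Exactly one question is posed per call, either "is the pixel darker?"
    or "is it brighter?", and the answer is combined with the preceding
    state S' (prev_state).
    """
    if ask_darker:
        if I_x < I_n - t:
            return DARKER
        # not darker: a previously "not brighter" pixel is now known similar
        return SIMILAR if prev_state == NOT_BRIGHTER else NOT_DARKER
    else:
        if I_x > I_n + t:
            return BRIGHTER
        # not brighter: a previously "not darker" pixel is now known similar
        return SIMILAR if prev_state == NOT_DARKER else NOT_BRIGHTER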

Building the Optimal Decision Tree

It is well known that a greedy algorithm, such as ID3, may perform rather poorly when finding the optimal decision tree [217]. However, the issue of finding such a tree is a well-studied problem, and it has been shown that finding the global optimum is NP-complete [218]. There are several solutions towards finding the optimal tree [219, 220, 221], but they are either approximations to the global optimum or restricted to special cases, making them ill-suited for this application. In order to find the optimal decision tree, an algorithm similar to the backward induction method described in [219] has been implemented. We explore the whole configuration space starting at the root of the decision tree, where none of the pixels is known. Nodes of the tree are formed by recursively evaluating a possible question at a given pixel. We explore the configuration space (using depth-first search) until a leaf is found, where a leaf is defined as the first node on the path which fulfills, or can no longer fulfill, the AST corner criterion. The cost at a given leaf is zero, while the cost at any internal node, cP, is determined by picking the minimum cost computed for each child pair C+ and C−, representing the positive and negative results of a test, by

\[
c_P = \min_{\{(C^+,\,C^-)\}} \big( c_{C^+} + p_{C^+} c_T + c_{C^-} + p_{C^-} c_T \big)
    = \min_{\{(C^+,\,C^-)\}} \big( c_{C^+} + c_{C^-} + p_P\, c_T \big) ,
\qquad (7.8)
\]
where $c_T$ represents the cost of the pixel evaluation with $c_T \in \{c_R, c_C, c_M\}$, and $p_P$, $p_{C^+}$ and $p_{C^-}$ are the probabilities of the pixel configurations at the parent and child nodes, respectively. Using this dynamic programming technique allows us to find the decision tree for an optimal AST (OAST) efficiently. The resulting decision tree can, hence, be optimized for different cR, cC and cM, but also for arbitrary probabilities of each pixel configuration, which is necessary for our approach described in the following section. The binary configuration space allows for decision trees which reduce the entropy more quickly than a ternary tree, as questions which contain little information gain are deferred to later stages of the decision process. Please note that the additional cost of re-evaluating the same pixel at a subsequent point in time is taken into account when computing the optimal tree.
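The cost recursion of Eq. 7.8 can be sketched as a plain backward induction over an abstract configuration tree. The node structure below is a toy stand-in for the full 6^N configuration space and is only meant to illustrate the recursion:

from dataclasses import dataclass, field
from typing import Dict, Tuple

@dataclass
class Node:
    """Toy configuration node: a leaf if it has no candidate tests; otherwise
    each test maps to its (positive outcome, negative outcome) child pair."""
    prob: float
    tests: Dict[str, Tuple["Node", "Node"]] = field(default_factory=dict)

def optimal_cost(node: Node, cost_of_test) -> float:
    """Backward induction of Eq. 7.8: leaves cost nothing; an inner node picks
    the test minimizing the children's costs plus its own weighted test cost."""
    if not node.tests:                            # leaf: corner criterion decided
        return 0.0
    return min(
        optimal_cost(pos, cost_of_test) + optimal_cost(neg, cost_of_test)
        + node.prob * cost_of_test(test)
        for test, (pos, neg) in node.tests.items()
    )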


Adaptive Tree Switching

Every image has, independent of the scene, homogeneous and/or cluttered areas representing uniform surfaces or structured regions with texture. Hence, instead of learning the distribution of the pixel configurations from training images, like FAST, a first generalization would be to learn the probability of structured and homogeneous regions and optimize the decision tree according to this distribution. The resulting tree is complete and optimized for the trained scene, while being invariant to any motion. The probability of an image region being uniform can be modeled by the probability $p_s$ of a pixel state being similar to the nucleus. The “brighter” and “darker” states are mirrored states, which means that, e.g., a brighter pixel on the test pattern will evaluate the current nucleus pixel as darker as soon as it becomes the center pixel. Due to this mirroring, the states “brighter” and “darker” are assumed to have the same probability $p_{bd}$, which is chosen to sum up to one with $p_s$ ($p_s + 2p_{bd} = 1$). Thus, the probability of a pixel configuration, $p_X$, can be computed as follows:

\[
p_X = \prod_{i=1}^{N} p_i
\quad\text{with}\quad
p_i = \begin{cases}
1 & \text{for } S_{n\rightarrow i} = u\\
p_s & \text{for } S_{n\rightarrow i} = s\\
p_{bd} & \text{for } S_{n\rightarrow i} = d \,\vee\, S_{n\rightarrow i} = b\\
p_{bd} + p_s & \text{for } S_{n\rightarrow i} = \bar{d} \,\vee\, S_{n\rightarrow i} = \bar{b}
\end{cases}
\qquad (7.9)
\]
The probability distribution of the pixel configuration is, therefore, a trinomial distribution with the probabilities $p_s$ and twice $p_{bd}$. Please note that the states d̄, b̄ and u are not single samples of this distribution but represent a set of two and three states, respectively. While this approach provides a good solution for the trained environment, it is not generic and, like FAST, it has to be learned for each specific scene where it is applied. A more efficient and generic solution is achieved if the algorithm automatically adapts to the area which is currently processed, i.e. it switches between decision trees which are optimized for the specific area. The idea is to build two or more trees and specialize them for a certain structuredness: in the case of two trees, one would be ideal for homogeneous and one for structured regions, realized by a small and a large value for $p_s$. At the end of each decision path, where the corner criterion is met or cannot be fulfilled anymore, a jump to the appropriate specialized tree is performed, based on the pixel configuration of this leaf (see Fig. 7.1). This switch between the specialized decision trees comes at no additional cost, because the evaluation of the leaf node is done offline when generating the specialized tree. In this way, the AST

is adapted to each image section dynamically and its performance is increased for an arbitrary scene. Any learning becomes needless.
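The configuration probability of Eq. 7.9 reduces to a product over the per-pixel states. A minimal sketch with a simple string encoding of the states (ours, for illustration):

def configuration_probability(states, p_s):
    """Probability of a pixel configuration per Eq. 7.9.

    `states` is an iterable of per-pixel states encoded as 'u' (unknown),
    's' (similar), 'd'/'b' (darker/brighter) and 'nd'/'nb' (the barred
    "not darker"/"not brighter" states); p_s + 2*p_bd = 1 is assumed.
    """
    p_bd = (1.0 - p_s) / 2.0
    p = 1.0
    for s in states:
        if s == 'u':
            p *= 1.0
        elif s == 's':
            p *= p_s
        elif s in ('d', 'b'):
            p *= p_bd
        elif s in ('nd', 'nb'):
            p *= p_bd + p_s
        else:
            raise ValueError(f"unknown state {s!r}")
    return p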

Figure 7.1: Principle of the adaptive and generic accelerated segment test. The AGAST switches between two (or more) specialized trees as soon as the pixel neighborhood changes. The lighter the gray of a leaf, the more similar pixels the configuration contains. The left tree achieves fewer pixel evaluations (shorter decision paths) in a homogeneous pixel neighborhood, while the right one is optimized for textured regions.

Because a cost-free switch between the trees can only be performed at a leaf, the adaptation is delayed by one test. Therefore, the only case where the adaptive tree switching of AGAST would worsen its performance is if the environment switches from homogeneous to structured and vice versa at consecutive pixels. This is practically not possible, due to the mirroring effect of dissimilar pixels as described earlier. However, natural images usually do not have a random brightness distribution, but are rather split into cluttered and uniform regions. If the decision trees can be strongly balanced by varying $p_s$, more than two differently weighted trees can also be used. At this point it is also worth mentioning that the proper model of the probability distribution would be more complex, because the configuration states are not fully independent due to the aforementioned mirroring effect. This can be exemplified by a single outlier in a homogeneous region, which is measured N times as an unequal pixel on the test pattern, but once it is also the center pixel. This means that the probability P(0) is the same as P(N−1) in cases where the rest of the pixels are either darker or brighter. However, P(1) is not equal to P(N−2), because otherwise the two outliers would have to be within the AST threshold. Hence, a complete model would have to account for the likelihood of outliers being similar. Experiments have shown that only the spike of size P(N−1) (for the configurations where all the pixels are brighter or all are darker) yields measurable results when overlaid on the trinomial distribution. However, the results in Section 8.1.2 will show that this change influences

the corner response only slightly. This is due to the fact that balancing of the tree is rather limited, as the experiments in Section 8.1.2 will show.

7.2.2 Minimal Rotation-Invariant Descriptor

For an efficient tracking-by-matching approach, not only a fast feature detector is needed, but also an efficient feature descriptor which allows for quick matching. But what are the constraints for such a descriptor? First, we expect to have a rather narrow search area (due to a high frame-rate to motion ratio as motivated in Section 1.1) and, thus, only a few AGAST responses have to be considered for matching. In most cases none or one feature will be detected within the locality radius which delimits possible correspondences. However, descriptors like SSD as used in PTAM are not rotation invariant, and a slight rotation about the optical axis may cause the matching to fail. As also mentioned in [136], rotation invariance comes at the price of an increased percentage of false-positive matches. In the case at hand, we disregard this issue, which is of great relevance for global feature trackers, where all responses of a detector have to be considered, but not for local ones. Hence, a minimal rotation-invariant descriptor (MRID) for AGAST features is required. It is preferable to spend as little processing time as possible on the descriptor computation. Hence, we should make use of already computed characteristics of AGAST features, which results in the following three measures for MRID:

• Non-Maximum Suppression Measure (NMSM): For the non-maximum suppression of AST features one computes the maximum threshold applicable to still yield a positive response. This measure reflects a rotation-invariant property of the feature neighborhood.

• Nucleus Value: The nucleus pixel itself is best represented by its brightness value, which is also rotation-invariant.

• Feature Distance: The distance between the expected and the detected location of a feature is also a plausible measure for a match.

7.2.3 Tracking By Matching

The absolute differences of the NMSM values $s_r$ and $s_t$ and of the nucleus values $n_r$ and $n_t$ of the reference and the test feature, respectively, have to be below specified thresholds $\Xi_s$ and $\Xi_n$. Features which are not within a certain distance $\Xi_d$ of the propagated feature location are not even considered for matching. If these three constraints are


met, a weighted sum of the measures is used to find the most plausible match mr for a reference feature fr:

\[
m_r = \underset{f_t \in F'}{\arg\min}\; \big( w_s\, |s_t - s_r| + w_n\, |n_t - n_r| + w_d\, \|p_t - p_r\| \big)
\quad\text{with}
\qquad (7.10)
\]
\[
F' = \big\{\, f_t \in F \;\big|\; |s_t - s_r| < \Xi_s \;\wedge\; |n_t - n_r| < \Xi_n \;\wedge\; \|p_t - p_r\| < \Xi_d \,\big\} ,
\]
where F is the set of all features, $w_s$, $w_n$ and $w_d$ denote the weights of the measures, and $p_r$ and $p_t$ are the image locations. As for the extended KLT tracker (see Section 7.1), we suggest linearly propagating the features in 2D to allow for more dynamic motions. Furthermore, one should aim to keep the same reference corners as long as possible to reduce drifting. A bias is only accumulated if the reference changes and, hence, sticking to the same reference set allows for a drift-free stabilization in 3D space. Compared to, e.g., gradient-descent methods, where features get adapted from image to image to compensate for the affine transformation, tracking-by-matching methods stick in general to the same physical point. Hence, as long as valid corners are detected, their tracking is drift-free.
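The matching rule of Eq. 7.10 can be sketched directly: gate the candidates by the three thresholds and pick the one with the smallest weighted sum. The function and the tuple layout are our own illustration:

import numpy as np

def match_feature(ref, candidates, w_s, w_n, w_d, xi_s, xi_n, xi_d):
    """Tracking-by-matching rule of Eq. 7.10 (sketch).

    Each feature is a tuple (s, n, p): NMSM value, nucleus value and 2D
    image location (the propagated location for the reference feature).
    Returns the best candidate or None if no candidate passes the gates.
    """
    s_r, n_r, p_r = ref
    best, best_cost = None, float("inf")
    for s_t, n_t, p_t in candidates:
        d_p = np.linalg.norm(np.asarray(p_t) - np.asarray(p_r))
        # hard gates of Eq. 7.10
        if abs(s_t - s_r) >= xi_s or abs(n_t - n_r) >= xi_n or d_p >= xi_d:
            continue
        cost = w_s * abs(s_t - s_r) + w_n * abs(n_t - n_r) + w_d * d_p
        if cost < best_cost:
            best, best_cost = (s_t, n_t, p_t), cost
    return best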


Chapter 8

Experiments

In this section, we provide extensive experiments to assess and validate the algorithms discussed in Chapters 4, 5, 6 and 7. We start with the feature detection and tracking and continue with the evaluation and comparison of the image-based pose estimation methods. Next, we validate the error propagation from the detection accuracy to the pose estimation and verify the temporal and spatial alignment of an IMU-camera setup. Finally, we run some experiments on the fusion of these two sensors.

8.1 Evaluation of the Feature Tracking Methods

To increase the accuracy of motion estimation methods one can aim to improve the tracking accuracy or tracking efficiency. In Sections 7.1 and 7.2, we presented two tracking approaches where each focuses on one of these two properties. We will first evaluate the extensions made to KLT and after that we will assess AGAST-based tracking.

8.1.1 Evaluation of the Extended KLT

As proposed in Section 7.1, one way to reduce the processing load of the KLT tracking is to restrict the computation of the gradients and the image pyramids to the areas surrounding the features to track. In the following, we provide an experimental evaluation of two different approaches. The first extracts the image patches around the features and treats them as separate images. The second one processes only the regions of interest (ROI) around the feature locations but maintains the image as it is. While the patch-based approach suffers from the overhead of copying the regions, the second one requires more memory jumps for processing. Fig. 8.1 shows some experimental results

for the original KLT, the patch-based implementation and a method which uses ROIs. The experiments show that the latter approach is always less efficient than the whole-image- or patch-based variant and, thus, it can be neglected in further discussions. If there are too many features to track, the overlapping areas of the patches result in even more pixels to be processed than the image actually has, making the patch-based variant less efficient. Which alternative to prefer depends, therefore, on the feature search range and the number of features. However, for a reasonable number of features the patch-based alternative allows for a significant speed-up.

[Figure 8.1 plots: processing time [ms] over the number of features, for the whole-image-, ROI- and patch-based variants with 3 px, 5 px and 10 px search ranges]

Figure 8.1: These figures compare the processing times of the KLT variants for different numbers of features to track. The red values represent the whole-image-based implementation, the blue values the ROI-based version and green are the times for the patch-based variant. On the left, different sizes for the search window are chosen. While the processing time for the whole image almost only depends on the number of pyramid layers (a 3 px search range results in one, 5 px and 10 px in two pyramid layers), the patch-based approach is also related to the search range and the number of features. The right image shows the intersection of the lines for a 10 px search range.

In Section 7.1, an extension of the KLT tracking routine to bind a feature to a corner has been explained. This modification prevents an error accumulation while tracking. Figure 8.2 illustrates the improvement in tracking accuracy provided by this new functionality. Fig. 8.2 makes apparent that within an image sequence the KLT patch may change its appearance significantly. Due to the step-wise procedure, the affine transformation is ignored and gets adapted from image to image. Especially if some occlusions or disocclusions occur, as in Fig. 8.2(e), the tracking patch is massively changed. Due to the slow alteration, the modification is below the difference threshold and is not detected by KLT. The patch becomes a virtual feature and follows the


[Figure 8.2 panels: (a) images 1, 620, 720 and 900; (b) feature-patch 15 with conventional KLT tracking; (c) feature-patch 15 with corner binding; (d) feature-patch 49 with conventional KLT tracking; (e) feature-patch 49 with corner binding]

Figure 8.2: Two different patches are tracked in an image sequence with and without corner binding. Fig. 8.2(a) shows the images from which the features are extracted. While patch 15 differs only by one pixel from the original location after 900 tracking steps, patch 49 has drifted much further due to partial occlusion. For these two patch sequences, the corner-binding variant nullifies the accumulated tracking residues. The green square illustrates the integral window and the green point its center and, thus, the feature location.

occlusion. This effect can be suppressed by the corner-binding method, which ties a feature to an interesting point and, thus, suppresses drifting.

Stereo Initialization Analysis

The results of the fast and sub-pixel-precise stereo initialization method, as described in Section 7.1.1, are depicted in Fig. 8.3. The stereo pairs of two different scenes with the initialization result are illustrated. The red dots in both shots are the extracted points of interest which are used as matching candidates. The green lines link corresponding features of the left camera (upper image) and the right one (lower image). The green dots at the end of the lines in the lower image are the sub-pixel accurate correspondences. Blue lines are matches where the KLT tracker could not find a minimum due to the limited iteration number. Nevertheless, experiments have shown that these correspondences are mostly as accurate as regular matches (as can be seen in the small figures). The lower small images show enlargements of the putto-scene (left) which depict in detail two corresponding parts of the large pictures. The lines illustrate the correspondences, which shows that the proper matches were found.

8.1.2 Efficiency Evaluation of AGAST

The speed and the repeatability of FAST have already been compared to state-of-the-art corner detection algorithms in [127]. In those experiments FAST-9 has demonstrated better performance than, e.g., Harris, DoG, or SUSAN. Thus, we restrict the comparison of our AST variant to FAST-9. Please note that our approach is also based on the AST and, therefore, it detects the same corners and, thus, provides the same repeatability as FAST. First, we show and discuss an experiment where we compare the performance of different AST masks on noisy and blurry images. Then we evaluate the corner response of differently balanced decision trees and, finally, we compare the performance of FAST with our approach.

Evaluation of the Various AST Patterns

As already mentioned in Section 3.1, SUSAN as well as FAST use a circle radius of 3.4 pixels. In [125] it is noted that the mask size does not influence the feature detection as long as there is no more than one feature within the mask. The effect of the mask size of an image operator is well studied for filters with dense masks. Their size affects the smoothing behavior so that larger filters are more robust against noise. The corresponding effect for the AST pattern size has so far not been discussed in


(a) KLT based stereo initialization

(b) details of the putto-scene above

Figure 8.3: The KLT based stereo initialization allows fast sub-pixel-accurate stereo matching. The lower small images show corresponding parts of the putto-scene and show some details of the upper image (left camera) and the lower image (right camera). Blue lines denote matches which are of less confidence due to a failed tracking-convergence.

the literature. While for the dense mask of SUSAN the same smoothing criteria as mentioned above apply, it is not obvious that large circles have a similar smoothing effect for the AST. Therefore, we use eight checkerboard pictures acquired from different viewpoints (see Fig. 8.5) to evaluate the corner response of the AST patterns shown in Fig. 8.4. A checkerboard provides many bright and dark corners of different sizes if viewed from different angles. Further, we add Gaussian blur and noise to determine the performance of these patterns on images of poor quality. For all these tests the same threshold for the brighter- and darker-questions is applied.

Figure 8.4: Different mask sizes for the AST: a 4-pixel mask (red), a square- and a diamond-shaped 12-pixel mask (blue, left and right figure) and a 16-pixel mask (green). The black pixel represents the nucleus.

Figure 8.5: Checkerboard data set.

For pattern sizes up to 12 pixels it is possible to compute the optimal path by exploring the six-state configuration space as described in Section 7.2.1. The computational resources of conventional computers are not sufficient to find the optimal tree for a 16 pixel pattern within the extended configuration space in reasonable time. Thus, for this size we compute the optimal tree based on the four-state space, yielding a ternary decision tree. Before generating the machine code, the tree is splatted as described in [127] to cut off equal branches. Fig. 8.6 shows the corner response of a 16 pixel pattern with arc lengths of 9, 10 and 11 (arc length 12 is omitted because it does not find any features at these corners), the 12 pixel pattern with a square and a diamond shape, as well as the 8 pixel pattern. The larger the mask and the larger the arc threshold S, the more features are found.


(a) 9 of 16 (b) 10 of 16 (c) 11 of 16

(d) 7 of 12 diamond (e) 7 of 12 square (f) 5 of 8

Figure 8.6: The corner response for different AST patterns. Detected features are colored in red. The corners after non-maximum suppression are green.

A small arc is more discriminating and yields features only close to the real corner location, which is apparent in Fig. 8.6(c). Large patterns result in multiple responses around a corner location, but they may lie at a distance of about the radius of the mask from the real corner (see Fig. 8.6(a)). Thus, they do not preserve the corner location. They are, hence, slower for two reasons: 1) the processing of a large pattern is of course computationally more expensive, and 2) they need to evaluate many features for non-maximum suppression. Smaller patterns preserve better the locality constraint of a corner. However, in the case of the pattern of size 8, the features are too close, so that a part of them gets lost after non-maximum suppression. Thus, for this size such a post processing is not necessary and should even be avoided to prevent the loss of features, because only single responses are observed at a corner. For the next experiment the original checkerboard images were modified by adding Gaussian noise (5% and 10%) and Gaussian blur (σ = 0.5, 1.0, 1.5). Fig. 8.7 shows the performance of the different pattern on these images. Here the advantage of the 16 pixel mask with arc length 9, 9(16), becomes apparent. It is more robust against noise and blur. However, the same mask sizes but with larger arcs show a similar drop-off on blurry images as the smaller pattern with similar segment angle. The 16 pixel mask with arc length 12, 12(16), has a similar arc angle as 5(8), while 7(12) has a similar angle as 10(16). The size of the arc angle controls the repeatability, as shown in [127], and the robustness against blur. The arc length and, thus, the radius of the mask influences the robustness against noise.


Figure 8.7: These charts compare the corner response of different patterns for blurry and noisy images. To preserve the comparability we use the arc length S and in brackets the mask size for labeling. The red bars show the total amount of features found, while the green bars represent the number of corners after non-maximum suppression. Note that the scales of the charts are not the same.


Corner Response Time

The corner response time of a certain decision tree is evaluated by computing the number of tests (greater-than or less-than evaluations) for all the possible pixel configurations of a mask. To compare the weighting effects of different probabilities of a pixel being similar (ps), as described in Section 7.2.1, the pixel configurations are divided into classes representing the number of similar pixels. Fig. 8.8 shows the deviation of the mean and the standard deviation of the corner response time from the minimum of all tests on a class. The trees are built for 12 pixel masks exploring the six-state configuration space. For zero or one similar pixels, the trees with weight ps = 0.1 and ps = 0.01 perform fewer tests than trees with larger values for ps. Also the standard deviation of the classes is smaller for these trees. It is apparent that the decision trees cannot be balanced significantly due to the strong symmetry of this specially constrained twenty-questions problem. Besides, the balance of the classes with a large number of similar pixels cannot be modified notably, because the number of possible configurations decreases drastically for them. Nevertheless, the performance of the adaptive tree is better than if only one tree is used, as we will see in the next section.

Figure 8.8: This chart illustrates the performance of various decision trees, based on different probabilities for a pixel to be similar (ps). Each decision tree was tested with all possible pixel combination for the mask.

No performance increase can be achieved for different ps by exploring only the restricted configuration space, due to the reduced degrees of freedom compared to the full six-state configuration space. The limitations of the restricted space are also apparent in the performance of the tree M12 (4st), which shows a significantly higher average number of tests than M12 (6st) in Table 8.1. This table compares the corner response time for trees of various mask sizes which were built using different methods. The second data row, M12 (6st), shows the minimum time of the trees compared in Fig. 8.8, which were


Table 8.1: This table compares the average number of tests performed for each class of configuration using various mask sizes and different methods to find the best decision tree. From left to right: mask size 8 exploring the six-state configuration space, mask size 12 exploring the six-state space, mask size 12 exploring the four-state space, mask size 16 exploring the four-state space and the ID3-learned decision tree trained on all possible configurations. The probability of all configurations was assumed to be equal for all trees except for M12 (6st), which represents the minimum number of tests over all decision trees of Fig. 8.8. These trees were built by exploring the six-state configuration space for a mask size of 12 pixels using different weights.

ns   M8 (6st)   M12 (6st)   M12 (4st)   M16 (4st)   M16 (ID3)
 0   5.54       6.53        7.80        8.1528      8.3651
 1   5.32       6.17        7.27        7.6485      7.8073
 2   5.07       5.82        6.77        7.1948      7.3094
 3   4.81       5.48        6.33        6.7893      6.8692
 4   4.59       5.19        5.93        6.4277      6.4812
 5   4.41       4.94        5.59        6.1044      6.1388
 6   4.26       4.74        5.28        5.8144      5.8354
 7   4.13       4.56        5.01        5.5529      5.5649
 8   4.00       4.41        4.77        5.3160      5.3223
 9   -          4.29        4.55        5.1003      5.1033
10   -          4.18        4.35        4.9031      4.9043
11   -          4.08        4.17        4.7221      4.7225
12   -          4.00        4.00        4.5554      4.5555
13   -          -           -           4.4013      4.4013
14   -          -           -           4.2583      4.2583
15   -          -           -           4.1250      4.1250
16   -          -           -           4.0000      4.0000

specialized for different ps. Thus, these values are achieved using the AGAST, switching between two trees which are optimized for ps = 0.1 and ps = 1/3. Experiments have shown that by learning a decision tree based on 120 outdoor images as proposed in [128], only about 87000 pixel configurations out of over 43 million possible ones could be found. Any learned decision tree should, therefore, be enhanced by the missing configurations to prevent false positive and false negative responses. The ID3-based decision tree, learned from all possible configurations with equal weights, has been shown to achieve the best corner response of all trees which were optimized using ID3 and various ps. Indeed, it yields the identical corner response as the code provided in the FAST sources.1

1http://svr-www.eng.cam.ac.uk/∼er258/work/fast.html


Performance Experiments

All the timing experiments are run on one core of an Intel Core2 Duo (P8600) processor at 2.40 GHz. We are using five images from different scenes1, shown in Fig. 8.9.

(a) Lab (b) Outdoor (c) Indoor (d) Aerial (e) Medical

Figure 8.9: The scenes used for the performance test are a lab scene (768x288), an outdoor image (640x480), an indoor environment (780x580), an aerial photo (780x582) and an image from a medical application (370x370).

Table 8.2 shows the performance of various AST-decision trees with different mask sizes and built by different methods. Please note that the achieved speed-ups do not only affect the corner detection step, but also the computation of the pixel-score for the non-maximum suppression.

Table 8.2: This table shows the computational time of various AST-decision trees. The value in parentheses, close to the tree names, stands for the mask size which the tree is based on. The specified speedup is relative to the FAST performance. The first value represents the mean speedup for all five images while the value in parentheses shows the maximum speedup measured.

Image          Lab      Outdoor   Indoor   Aerial   Medical   Speed-Up [%]
FAST-9 (16)    1.8867   2.4242    1.8516   2.2798   1.1106    -
OAST-9 (16)    1.5384   2.2970    1.6197   1.9225   0.9413    13.4 (18.5)
AGAST-7 (12)   1.2686   1.9416    1.4405   1.8865   0.8574    23.0 (32.8)
AGAST-5 (8)    0.9670   1.4582    1.3330   1.8742   0.7727    33.0 (48.7)

To compare the performance of our decision trees with the conventional FAST-9 algorithm, we use the code from the FAST sources mentioned above. The FAST and optimal AST (OAST) trees are built based on a uniform probability distribution, which means that the probability for any pixel configuration is the same. This probability distribution yields the trees with the best overall corner response and, thus, the best performance.

1The lab scene is provided in the FAST Matlab package at http://svr-www.eng.cam.ac.uk/∼er258/work/fast-matlab-src-2.0.zip.


As mentioned in Section 7.2, it is not possible to search for the optimal decision tree for a 16 pixel mask within the complete configuration space in reasonable time on a conventional computer. Therefore, the tree is optimized in the four-state configuration space and achieves an average speed-up of about 13% relative to FAST-9. For the 12 pixel mask the ideal tree can be found in the six-state space, and by combining the trees specialized for ps = 1/3 and ps = 0.1 a mean speed-up of about 23% and up to more than 30% can be gained. Using the AGAST-5 decision tree on the 8 pixel mask results in a performance increase of up to almost 50%, of course with the drawback of its sensitivity to noise and blur as discussed earlier in this section. The C-sources for OAST-9, AGAST-7 and AGAST-5 are available for download at http://www6.cs.tum.edu/Main/ResearchAgast. The trees have been optimized according to standard ratios of memory access times.


8.2 Evaluation of the Pose Estimation Algorithms

We have introduced the Z∞-algorithm in Section 5.1 and a keyframe-based SLAM method in Section 5.2. The following experiments will analyze and compare these algorithms.

8.2.1 Evaluation of Closed-Form Pose Estimation Algorithms

Before we run experiments on real data, we first compare the various variants of the eight-point algorithm as introduced in Section 3.3.1. We also evaluate the motion estimation accuracy for different camera setups, motion directions and rotation angles. We want to get some insight into which algorithm should be preferred for a specific motion, which aperture angle should be used and how the cameras should be mounted on a robot to allow for the most accurate and robust motion estimation. A more extensive evaluation regarding noise sensitivity, number of landmarks and aperture angle is provided together with the validation of the error propagation in Section 8.4. The simulated camera parameters are: 20 features, noise in both images (if not mentioned otherwise) due to pixel-discretization of the features (σ² = 1/12) and 600 × 600 pixel resolution.

Evaluation of Eight-Point Variants

In the following plots (Fig. 8.10 to Fig. 8.13), we compare the different eight-point variants as introduced in Section 3.3.1. Each of these figures evaluates a certain camera aperture angle (∼143°, ∼100°, ∼53°, ∼37°) and consists of the following plots: the first four plots show the effect of the direction of translation on the translation and rotation estimation (only translational motion is simulated). The next two plots show how the rotation estimation error depends on the direction of the rotation axis at a constant absolute rotation of 10° (only rotational motion is simulated). For that, the direction of translation and the rotation axis direction vary in 10° steps between the optical axis (Z-axis) and the X-axis. The rotation estimation error is specified as the absolute angle error, which can be computed from the error DCM, $R_\Delta = R_S^T R$, by

\[
\theta = \arccos\!\left( \frac{R_{\Delta;1,1} + R_{\Delta;2,2} + R_{\Delta;3,3} - 1}{2} \right) .
\qquad (8.1)
\]
The left column of these six plots is generated by simulating only noise in the second image (as is the case in feature tracking) and the right column assumes noise in both images (as in tracking-by-matching approaches).
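The angle error of Eq. 8.1 from a simulated and an estimated rotation can be computed in a few lines (a short numpy sketch; the clipping only guards against round-off just outside [-1, 1]):

import numpy as np

def absolute_angle_error(R_sim, R_est):
    """Absolute rotation error of Eq. 8.1 from R_delta = R_sim^T R_est, in degrees."""
    R_delta = R_sim.T @ R_est
    c = (np.trace(R_delta) - 1.0) / 2.0
    return np.degrees(np.arccos(np.clip(c, -1.0, 1.0)))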


The last two plots illustrate the median condition number and stability number of the A-matrix of the eight-point algorithm (see 3.5) for the direction-of-translation simulation. The condition number is the quotient of the largest and the second smallest singular value, $\sigma_{A;1}/\sigma_{A;8}$, and represents the influence distribution between the strongest and the weakest vector of the nullspace which generates the solution; the second smallest and not the smallest singular value is used because we have a homogeneous system of equations. Hence, a small condition number implies a well-conditioned problem, where all vectors in the nullspace have approximately equal influence on the solution. The stability number is the ratio of the second smallest and the smallest singular value, $\sigma_{A;8}/\sigma_{A;9}$, and describes how stable the solution is. A small value means that the residuum vector and the smallest nullspace vector have similar weights and, hence, the solution can be regarded as strongly affected by the error. A small numerical sketch of both measures follows the list below. For the last two plots there is no significant difference whether we simulate noise in one or in both images; thus, we only show one plot for each measure. Further, the Mühlich TLS-FC variant fixes the rank of A to be eight, which means that the condition number is infinitely large and, therefore, it is not plotted for this variant. Evaluating the simulation plots 8.10 to 8.13, we can establish the following statements for the eight-point algorithm:

• A significant difference between the four eight-point variants becomes notable only for small aperture angles, where the Mühlich variants slightly outperform the Hartley variant. The un-normalized, original algorithm proves to have more difficulties in translation estimation but performs much better in estimating the rotation if no translation is present.

• The advantage of the Mühlich TLS-FC variant becomes apparent for the simulations with noise in only one image (see Fig. 8.13(e)). While the translation estimation is not affected, the rotation estimation is significantly improved for small aperture angles. Small focal lengths, on the contrary, seem to yield a slightly reduced accuracy compared to the simpler Mühlich variant.

• In the case of noise in both images, e.g. if the correspondences are found by a tracking-by-matching algorithm, the TLS-FC step in the Mühlich approach can be neglected.

• The normalization of the Hartley and Mühlich variants provides a much better-conditioned motion estimation, especially for large aperture angles, as intended by the inventors.
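The two conditioning measures discussed above are simple ratios of the singular values of A. A minimal numpy sketch (names are ours):

import numpy as np

def conditioning_measures(A):
    """Condition and stability number of the eight-point A-matrix.

    condition: sigma_1 / sigma_8 (largest over second smallest),
    stability: sigma_8 / sigma_9 (second smallest over smallest).
    A is the Nx9 measurement matrix with N >= 9 correspondences assumed.
    """
    s = np.linalg.svd(A, compute_uv=False)   # descending order, 9 values
    return s[0] / s[7], s[7] / s[8]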


[Figure 8.10, panels (a)–(h): translation and rotation estimation errors over the direction of translation and of the rotation axis (noise in 1 or 2 images), condition number (σ_A;1/σ_A;8) and solution stability number (σ_A;8/σ_A;9)]

Figure 8.10: This graphic shows the error of the translation and rotation estimation of the eight-point algorithm for different directions of translation and of the rotation axis, simulating a focal length of 100 px, which is equivalent to an aperture angle of ∼143°. The lower two plots illustrate the condition and stability number as described in the text. The simulation parameters used are: 10 features, noise in both images due to pixel-discretization of the features (σ² = 1/12) and 600 × 600 pixel resolution.


[Figure 8.11, panels (a)–(h): same panel layout as Fig. 8.10]

Figure 8.11: This graphic shows the error of the translation and rotation estimation of the eight-point algorithm for different directions of translation and of the rotation axis, simulating a focal length of 250 px, which is equivalent to an aperture angle of ∼100°. The lower two plots illustrate the condition and stability number as described in the text. The simulation parameters used are: 10 features, noise in both images due to pixel-discretization of the features (σ² = 1/12) and 600 × 600 pixel resolution.


[Figure 8.12, panels (a)–(h): same panel layout as Fig. 8.10]

Figure 8.12: This graphic shows the error of the translation and rotation estimation of the eight-point algorithm for different directions of translation and of the rotation axis, simulating a focal length of 600 px, which is equivalent to an aperture angle of ∼53°. The lower two plots illustrate the condition and stability number as described in the text. The simulation parameters used are: 10 features, noise in both images due to pixel-discretization of the features (σ² = 1/12) and 600 × 600 pixel resolution.


[Figure 8.13, panels (a)–(h): same panel layout as Fig. 8.10]

Figure 8.13: This graphic shows the error of the translation and rotation estimation of the eight-point algorithm for different directions of translation and of the rotation axis, simulating a focal length of 900 px, which is equivalent to an aperture angle of ∼37°. The lower two plots illustrate the condition and stability number as described in the text. The simulation parameters used are: 10 features, noise in both images due to pixel-discretization of the features (σ² = 1/12) and 600 × 600 pixel resolution.


• The normalization switches the stability of estimating translations. The un-normalized variant has a stable estimation of translations in the direction of the optical axis for short focal lengths, while the normalized ones have a more stable estimation for translations parallel to the image plane. This effect flips for large aperture angles (similar to the conditioning).

• The feature normalization, as provided by all the eight-point derivatives, is crucial for large focal lengths, but becomes unnecessary for short ones.
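As announced above, the two conditioning measures can be sketched as follows; the N×9 measurement matrix A of the eight-point algorithm is assumed to be given, and the function name is ours.

    import numpy as np

    def conditioning_measures(A):
        # Singular values of the N x 9 eight-point matrix, returned in descending order.
        s = np.linalg.svd(A, compute_uv=False)
        condition_number = s[0] / s[7]   # sigma_A;1 / sigma_A;8
        # A matrix whose rank is fixed to eight has sigma_A;9 = 0, so this ratio becomes infinite.
        stability_number = s[7] / s[8]   # sigma_A;8 / sigma_A;9
        return condition_number, stability_number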

Comparison of the Eight-Point and the Z∞-Algorithm

We compare the eight-point algorithm with the Z∞-algorithm in Fig. 8.14. For that we chose two variants, the original and the Mühlich variant, and performed 100 random runs per test case. The simulated motion is a 5 m translation or a 10° absolute rotation per frame, and the simulated landmarks have a distance of 1 to 50 m. Some common effects are notable which vary depending on the optical system (these effects will also be apparent in Fig. 8.31 and Fig. 8.35): a large aperture angle lessens the accuracy of the translation estimation due to a reduced angular resolution. The rotation estimation has a flat optimum around 90° and performs worse at small or large focal lengths. Furthermore, the direction of translation becomes more significant for small aperture angles, which provide less accuracy in estimating translations perpendicular to the optical axis. The estimation error also becomes visible in the rotation estimation – apparently, a translational vector field which is parallel to the image plane is much more easily misinterpreted as rotation. For large aperture angles the opposite trend is notable. On the other hand, the direction of the rotation axis does not seem to influence the rotation estimation significantly.

The Z∞-algorithm follows the same trends as the eight-point algorithm, but it always outperforms it. This can also be seen in Fig. 8.15, where the condition numbers of all variants are compared. The Z∞-algorithm provides a much better conditioned estimation for any type of motion. This can be explained by the reduced number of degrees of freedom for the motion estimation. The closed-form computation of the rotation and the translation always yields the global least-squares optimum, while an outlier rejection is provided by the RANSAC framework. However, the eight-point algorithm or nonlinear optimization techniques also have advantages over the Z∞-algorithm.

They do not require translation-invariant features because rotation and translation are estimated simultaneously. This also allows disambiguating small translational portions in the optical flow vectors which would be interpreted as (biased) noise by the Z∞-algorithm. Moreover, if the feature distance is stored and optimized along the image sequence, the estimation accuracy will be increased.


[Figure 8.14, panels (a)–(h): translation and rotation estimation errors for f = 100, 250, 600 and 900 px]

Figure 8.14: This graphic shows the error of the translation and rotation estimation for various directions of translation and of the rotation axis and different aperture angles (top to bottom: f = 100, 250, 600 and 900 px). The following algorithms are compared (from left to right): the eight-point algorithm (original and Mühlich variant) and the Z∞-algorithm. Some boxplots of the eight-point algorithm might be cropped for the sake of visibility of the Z∞-boxplots. However, the full boxplots are shown in Fig. 8.10 to Fig. 8.13.

The drawback of nonlinear estimation techniques is that they are computationally rather expensive and may get stuck in local optima. In addition, experiments have shown that the convergence of such bundle adjustment methods can be rather slow if the distance of the features also has to be estimated.

[Figure 8.15, panels (a)–(d): condition numbers of the translation and rotation estimation for f = 100 px and f = 900 px]

Figure 8.15: These plots compare the median condition numbers for the eight-point variants and the Z∞-algorithm. The left column shows the condition numbers for the translation estimation and the right one for the rotation estimation. The upper two plots were generated by simulating a wide-angle camera (f = 100 px) and the lower ones a narrow field of view (f = 900 px).

Fig. 8.16 illustrates how the error of the rotation estimation distributes over the different axes. For that we plot the Euler angle errors, where we assume the small error approximation as introduced in Eq. 2.8, with the angle-axis vector σ replaced by the Euler angles (φ_x, φ_y, φ_z)^T, such that

ΔR ≈ I + [φ]_× = [ 1  −φ_z  φ_y ;  φ_z  1  −φ_x ;  −φ_y  φ_x  1 ]   ⇒   φ_x ≈ ΔR_{3,2} = −ΔR_{2,3} ,  φ_y ≈ ΔR_{1,3} = −ΔR_{3,1} ,  φ_z ≈ ΔR_{2,1} = −ΔR_{1,2} .    (8.2)
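A minimal Python sketch of this small-angle extraction is given below; it averages the two equivalent off-diagonal entries of the error DCM, which is a common but here merely illustrative choice, and all names are ours.

    import numpy as np

    def euler_angle_errors_deg(R_sim, R_est):
        # Error DCM between simulated and estimated rotation.
        dR = R_sim.T @ R_est
        # Small-angle approximation (Eq. 8.2): the Euler angle errors are read off the
        # skew-symmetric part of the error DCM (averaged over both off-diagonal entries).
        phi_x = 0.5 * (dR[2, 1] - dR[1, 2])
        phi_y = 0.5 * (dR[0, 2] - dR[2, 0])
        phi_z = 0.5 * (dR[1, 0] - dR[0, 1])
        return np.degrees(np.array([phi_x, phi_y, phi_z]))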


[Figure 8.16, panels (a)–(d): Euler angle errors about the X-, Y- and Z-axis for f = 100, 250, 600 and 900 px]

Figure 8.16: This graphic shows the Euler angle error of all three axes, resulting from the eight-point variants and the Z∞-algorithm for various aperture angles (top to bottom: f =100, 250, 600 and 900 px).

While the Mühlich eight-point variant does not show a significant difference between the axes and the various focal lengths, the two other variants (the original eight-point and the Z∞-algorithm) provide a better performance for the X- and Y-axis, especially for small aperture angles. For large fields of view the difference between the axes becomes negligible. The rotation estimation about the Z-axis is also not influenced by the focal length.

The last graphic of this section, Fig. 8.17, illustrates the relation between the number of features and the detection accuracy. These two tracking properties are plotted against each other for the eight-point Mühlich variant and the Z∞-algorithm. The plots show that the detection accuracy becomes insignificant if enough features are used, so that the tracking errors cancel each other out (keep in mind that we assume an unbiased tracking).

[Figure 8.17, panels (a)–(d): translation and rotation estimation errors of the eight-point (Mühlich) and the Z∞-algorithm over the number of features and the noise level]

Figure 8.17: This graphic plots the influence of the detection accuracy and the number of features against each other. The eight-point algorithm used is the Mühlich variant. The simulation parameters are: noise in both images, 600 × 600 pixel resolution and 250 px focal length (∼100° aperture angle).


8.2.2 Experimental Evaluation of the Z∞-Algorithm

In the previous section, we have seen that the Z∞-algorithm should be preferred in case a separation of the landmarks into translation-variant and -invariant features is feasible. In the following section, we present a comparison with an IMU to validate the approach. Fig. 8.18 illustrates the Z∞-steps. We use the KLT implementation as introduced in Section 7.1 to find feature correspondences for the evaluation of our algorithm. Fig. 8.18(a) shows an example output of the tracker with 500 features. However, global trackers like SURF have also been successfully tested, as described later in this section.

To evaluate the Z∞-based motion estimation we compare it to the measurements of an IMU. For that we use an "MTi" IMU from Xsens¹ which is mounted on a Guppy F046C camera from Allied Vision. The IMU provides the absolute orientation, based on the Earth's magnetic field and three gyroscopes, as well as the acceleration of the device. The camera images and the IMU data are acquired at 25 Hz. Before the data can be evaluated, the IMU-camera alignment has to be determined as described in Section 4.2. Fig. 8.19² visualizes two data sets acquired while holding the camera-IMU system out of a car. The left image shows a sinuous trajectory on a street bending slightly to the left. The right image shows the path through a turnaround. Even though we compute the relative motion from image to image and do not refer to a fixed reference as in keyframe techniques, the Z∞-method shows less drift than the Kalman-filtered output of the Xsens IMU. The IMU apparently suffers from the centrifugal forces which, once integrated, are not removed from the velocity vector anymore and cause the estimated pose to drift. Fig. 8.20 shows the angles measured about the axis perpendicular to the street when driving the sinuous line illustrated in the left image of Fig. 8.19. The measurements are consistent, which validates the Z∞-based rotation estimation.

SURF-Based Z∞ Pose Estimation

The Z∞-algorithm has also been successfully tested with the global tracker SURF. Like every global tracker, SURF provides many more outliers and its accuracy is in general worse than that of local trackers. We ran SURF on 10-megapixel pictures of the church in Seefeld, Germany, which were acquired by an octocopter. Fig. 8.21 shows the rotation and translation estimation result for one such image pair. The images were acquired with large intermediate distances. Hence, SURF features have been extracted and the rotation as well as the translation was computed for each feature set.

¹ http://www.xsens.com
² The pictures at the right-hand side are "Google Maps" screenshots, http://maps.google.com.


[Figure 8.18, panels: (a) tracking output, (b) rotation estimation, (c) translation estimation, (d) obstacle detection]

Figure 8.18: Visualization of the Z∞-steps. The landmarks are marked green if they could be tracked from the last image, or red if they were lost and replaced by new features to track (a). The optical flow is represented as green vectors, where the original position of the feature is marked by a circle and the tracked location by a cross. The results of the rotation estimation are colored red. The endpoints of the optical flow vectors are rotated back according to the computed rotation – they are marked as crosses too. Thereby, translation-invariant features are represented by red vectors, while the outliers of the rotation computation are magenta (b). The translation estimation is illustrated in blue. Similar to the rotation visualization, the vectors which were used for the epipole estimation are dark blue, while the outliers and, therefore, wrongly tracked features are light blue. The black star (circle and cross) represents the computed epipole (c). A result of the obstacle avoidance is shown in (d) and consists of two parts: the main image shows the optical flow vectors which are used for obstacle detection and the small sub-image shows the computed obstacle map. The vectors are mapped according to their angle to the calculated epipole and a red line illustrates the suggested swerve direction (swerve angle), as explained in the text.


[Figure 8.19, panels: (a) driving a sinuous line, (b) passing a turnaround; trajectories in X/Y [m]]

Figure 8.19: Two trajectories measured by the Z∞-algorithm (red) and an IMU (blue). In the pictures at the right-hand side the driven trajectories are sketched in black.

Despite the high percentage of outliers and the poor accuracy, the Z∞-algorithm has successfully processed the image sequence. No ground truth could be provided for this experiment, but the estimated motion is plausible at an inlier percentage similar to that of the KLT matches, which implies a reliable motion estimation.

Obstacle Avoidance

Potential obstacles can be detected by evaluating the translational component of the optical flow, d_t. The closer or the faster an object is moving relative to the camera, the larger the optical flow vectors become [107]:

‖d_t‖ ∝ ‖v‖ / z² .    (8.3)


[Figure 8.20: relative angle [°] over image number]

Figure 8.20: This image shows the relative angle measured during the same run as in the left image of Fig. 8.19. The Z∞-rotation estimation is red, while the IMU data is blue.

Figure 8.21: Motion estimation based on SURF features.

The velocity and the distance cannot be separated without any knowledge about the environment. Hence, the length of a translational optical flow vector expresses the time-to-impact (or time-to-collision), and a specific threshold on the length of the optical flow vectors has to be specified which defines a dangerous object. This threshold depends on the camera characteristics (resolution, focal length, etc.) and on the application (how much time is required to stop or to plan a new trajectory). This concept also works in dynamic environments, where fast-moving objects are detected as dangerous obstacles earlier than slow ones. To allow for a reactive behavior which can be directly coupled to the robot control, we use the calculated epipole and, thus, the direction of translation as reference for the obstacle avoidance strategy. Once the directions in which the dangerous objects lie are computed, the middle of the largest free segment is proposed as swerve direction.


For a robot operating in 3D space, this simply means steering in the suggested direction in order to avoid the objects. The obstacle maps in Fig. 8.22 show the angle of the dangerous objects relative to the epipole, their level of danger as vector length (inverse time-to-impact), and a suggested swerve direction for obstacle avoidance. The images were acquired by a Pioneer3-DX equipped with a Marlin F046C camera, as shown in the experiments of Section 8.3.
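The following Python sketch illustrates this reactive strategy under simplifying assumptions: the angle of each feature relative to the epipole and the length of its translational flow vector are assumed to be given, and the function name, the threshold handling, and the return convention are ours.

    import numpy as np

    def swerve_direction_deg(angles_deg, flow_lengths, danger_threshold):
        # angles_deg: direction of each feature relative to the computed epipole [deg].
        # flow_lengths: length of the translational optical flow vector per feature; a large
        # value corresponds to a small time-to-impact, i.e. a dangerous object.
        angles = np.asarray(angles_deg, dtype=float)
        dangerous = np.sort(angles[np.asarray(flow_lengths) > danger_threshold])
        if dangerous.size == 0:
            return None  # no dangerous object detected, no swerve required
        # Circular gaps between neighboring dangerous directions.
        gaps = np.diff(np.concatenate([dangerous, [dangerous[0] + 360.0]]))
        i = int(np.argmax(gaps))
        # Propose the middle of the largest free segment as swerve direction.
        return (dangerous[i] + 0.5 * gaps[i]) % 360.0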

(a) Input images for the obstacle detection: the black star is the epipole and the green vectors are the optical flow.

(b) The corresponding obstacle maps for the images above; the red line is the suggested swerve direction for obstacle avoidance.

Figure 8.22: Within the Z∞-algorithm the rotational component of the optical flow is removed and translation-dependent features are identified. These can be used to realize a reactive obstacle avoidance.

Processing Times

Fig. 8.23 shows the processing times of a C++ implementation of the algorithm on a 1.6 GHz Intel Core Duo T2300 processor.¹ The most important parameters influencing the processing performance are the maximum number of iterations for the rotation estimation (40), the maximum number of iterations for the translation estimation (20) and the maximum number of steps per translation estimation (5). The processing is single-threaded only and the code still leaves room for significant performance optimization.

¹ The time was measured with the gprof profiling tool.


Nevertheless, the algorithm requires on average only 4.13 ms for all Z∞-steps. The rotation estimation processes the most features and, hence, also requires the most time.

Figure 8.23: The computation time for each module of the Z∞-algorithm in milliseconds.


8.2.3 Experimental Evaluation of the Keyframe-Based VSLAM

We mounted a stereo camera system on the wrist of a robotic manipulator, a KUKA KR16¹, to compare the accuracy improvement of the RVGPS method, introduced in Section 5.2.1, relative to the conventional VGPS. The robot's kinematics is used as ground truth with a relative error of 0.1 mm in position and 0.1° in rotation. Based on this reference, the accuracy of the visual navigation algorithm can be evaluated.

Figure 8.24: The motion is from one side of the castle (image #1) to the other (image #350), and then back again to the front in image #710.

In the experiment, the sensor is moved halfway around a model of Neuschwanstein castle, shown in Fig. 8.24, and back again. 710 images were processed on a total path of 125 cm. The orientation changed by up to 55°. Fig. 8.25 shows the executed trajectory and compares the motion estimation result with the inverse kinematics of the KUKA robot. The images have been processed twice: with the robustification enabled and with it disabled. The errors of the rotation and translation are illustrated in Fig. 8.26. The blue dots represent the RVGPS output and the red dots VGPS without robustification. Disregarding the outliers in the performance charts, the translational error increases up to 4 mm and the maximum accumulated rotational error is 0.5°. The outliers are the result of bad, drifting features which are detected by RVGPS and removed from further processing. In the run with standard VGPS, the orientation error remains almost the same, while the translation estimation accumulates an error of up to 20 mm. This can be explained by the nature of an iterative gradient descent method such as VGPS: a difference between the measured landmarks and the reprojection can often be lessened by zooming out the camera.

¹ http://www.kuka.com


Figure 8.25: This figure shows the executed trajectory and compares the result of the keyframe-based SLAM (red) with the inverse-kinematics of the KUKA robot (black).

[Figure 8.26, panels: (a) rotation estimation, (b) translation estimation; absolute errors over image number with set switches and point-rejection events annotated]

Figure 8.26: Residual orientation and translation error with respect to the KUKA manipulator. The vertical lines mark the change of sets and "Sx" denotes the current set number. Blue: using RVGPS. Pink: using VGPS. VGPS accumulates an error of up to 20 mm at some point, which is not shown for the sake of visibility.

The algorithm simply follows the gradient along the optical axis of the camera to diminish the error. Moreover, the effect of the keyframe management becomes apparent in both runs. The error which is accumulated at each set switch on the way to the end of the castle is removed again when moving back to the initial position. The old feature sets are re-found and used for the pose estimation on the way back. The error accumulated between the changes is thus removed, and the final residual error can be explained by the inaccuracies in the robot kinematics and by tracking drifts. Application-related ground-truth experiments are shown in Section 9.1.


8.3 Comparison of the Z∞-Algorithm and Keyframe-Based VSLAM

In this experiment, Z∞-based visual odometry estimation is compared to stereo-initialized keyframe-based VSLAM. The latter method has been designed for short-distance applications such as 3D modeling (see Section 9.1), but it has also been shown to be accurate and reliable for mobile robot navigation (see Section 9.2). In contrast, the Z∞-algorithm requires far distant landmarks which are translation-invariant in order to allow for rotation estimation. In this experiment, two Marlin F046C cameras from Allied Vision¹ have been mounted on a stereo rig with a 9 cm baseline on top of a Pioneer3-DX from MobileRobots Inc.² (Fig. 8.27). The robot has been programmed to follow a circular path with a 3 m diameter. During this run 1920 stereo image pairs were acquired. The cameras have been tilted slightly towards the ground to increase the close areas in the images, which are required for the translation estimation, while keeping the horizon and the structures at infinity well visible. Although robot odometry is in general rather imprecise, the Pioneer has been able to close the circle with only a few centimeters of error. Based on the result of this experiment we will discuss some drawbacks and advantages of the motion estimation algorithms.

Figure 8.27: Experimental setup: a Pioneer3-DX equipped with two Marlin cameras.

Fig. 8.29(a) shows the trajectory computed by the VSLAM approach. It is apparent that no bundle adjustment is applied: wrongly initialized features lead to a wrong pose estimate, but are usually immediately removed by the M-estimator in RVGPS. However, if there are too many wrong features, it can take several images until they are removed from the current feature set.

¹ http://www.alliedvisiontec.com
² http://www.mobilerobots.com

Such a behavior is visible in the lower half of the circle, where the computed trajectory leaves the circular path and jumps back again after a while. Such outliers could also be suppressed by a probabilistic filter, e.g., a Kalman filter. To provide an accurate stereo initialization with this baseline, all landmarks have to lie within a range of approximately 10 m. Therefore, this image sequence is ill-conditioned for RVGPS-based localization for several reasons: only features in the lower half of the image can be used for pose estimation, the feature sets leave the field of view quickly due to the large rotation, and the ground on which the robot is operating has only little structure, which favors mismatches and the loss of features during tracking (see Fig. 8.28(a)).

(a) VSLAM-screenshot (b) Z∞-screenshot

Figure 8.28: The screenshots illustrate the difficulties for both methods in estimating the robot's motion in this environment. VSLAM only uses close landmarks which can be initialized by stereo triangulation. Such features are rare on the homogeneously textured ground. This is also a problem for the translation estimation of the Z∞-algorithm. The tiny translational motion component is often misleadingly interpreted as rotation. This becomes visible through the red features on the ground.

The large rotational component of the motion compared to the amount of translation prevents the Z∞-algorithm from determining the direction of translation accurately.

The small translation results in a z∞-range of 10 cm per image. Therefore, the translation can only be evaluated after several images, once it has accumulated to a measurable length. Furthermore, the small translational component in the optical flow vectors causes the features to be classified as translation-invariant and, therefore, to be misleadingly used for the rotation estimation and rejected from further translation estimation (see Fig. 8.28(b)). A large number of images for translation accumulation also increases the probability that features get lost by the tracker – this is further favored by the poorly structured ground.


[Figure 8.29, panels: (a) trajectory from VSLAM, (b) absolute in-plane rotation, (c) relative in-plane rotation, (d) difference between rotation estimates]

Figure 8.29: a) The trajectory computed by the VSLAM algorithm. The red square is the starting point and the red circle the endpoint of the path. b) Comparison of the absolute angle estimated by the Z∞ (red) and the VSLAM (blue) algorithm. c) The relative angles about the Y-axis computed by the Z∞- (red) and by the VSLAM-algorithm (blue). d) The difference between the Z∞ and VSLAM rotation estimates about the Y-axis. Only a few outliers can be identified, which are detected by the M-estimator in RVGPS. Hence, they have no effect on the final result.


Fig. 8.29(b) illustrates the angles computed by the Z∞ and the VSLAM algorithm, and Table 8.3 shows some key values of the result. The VSLAM method proves its accuracy with only 3.8° absolute error, despite the ill-conditioned images. On the other hand, Z∞ accumulates an absolute error of 16° during the sequence. This seems to be a large error before taking two facts into account: the difference between both rotation estimates is not zero-mean, as one would expect. This can easily be explained by recalling the problem of detecting translation-variant landmarks at such a small translational motion. Close landmarks are also used for the rotation estimation, as shown in Fig. 8.28(b), and because the translation always points in the same direction on the circular path, the estimation error corresponds to a measurement bias.

Table 8.3: Key values of the comparison between Z∞ and VSLAM for the circle experiment. Listed are the absolute rotation error for all three axes, the number of error accumulations of each method, the average rotational error about the Y-axis for each error accumulation, and the average rotation per frame.

           abs. X-rot.   abs. Y-rot.   abs. Z-rot.   no. of error    aver. Y-rot. error   average rotation
           error (°)     error (°)     error (°)     accumulations   per acc. (°)         per frame (°)
  VSLAM    3.8           2.87          -0.01         116             0.03                 0.21
  Z∞       16.0          -1.14         1.37          1920            0.008                0.19

The other advantage of the VSLAM approach is its bias reduction strategy. As explained in Section 5.2, the pose estimation is based on initialized feature sets. The distance of each feature is estimated by stereo triangulation and is not updated anymore. The poses are always calculated relative to the image where the landmarks were initialized, and a bias is accumulated only at keyframe switches. However, the memory consumption grows continuously, as with all VSLAM approaches. That prevents the VSLAM algorithm from being used on platforms operating in large workspaces, like outdoor mobile robots. In this experiment, 116 keyframes were necessary.

In contrast, the Z∞-method computes only a vision-based odometry from image to image and, hence, does not require any keyframe management. This makes it adequate for outdoor applications without environment restrictions and applicable to resource-limited platforms. As a drawback, it suffers from error accumulation, like all non-map-based algorithms without a global reference. Considering the relative error in the sense of error per accumulation step, we achieve a much smaller error for the Z∞-algorithm than for the VSLAM approach. For the Z∞-method with 1920 steps, only an average error of 0.008° per step arises despite the bias-prone conditioning of the experiment. This is almost four times less than for the VSLAM solution, where 116 steps yield an error of 0.03° each. The larger error can be explained by the KLT-tracking drifts between the keyframes.

The computed angles for each frame are plotted in Fig. 8.29(c). Only a few outliers can be identified; VSLAM seems to have more outliers than Z∞, but they do not worsen its overall performance due to its keyframe-based nature. Fig. 8.29(d) shows the differences between the estimated rotations of both methods for each image. The mean difference corresponds to the explained bias of −0.02° with a standard deviation of 0.14°.
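As a small arithmetic check of the per-step errors discussed above (the values are taken from Table 8.3 and the text; the variable names are ours):

    # Absolute accumulated error divided by the number of error accumulations.
    vslam_error_per_step = 3.8 / 116    # ~0.033 deg per keyframe switch
    zinf_error_per_step = 16.0 / 1920   # ~0.0083 deg per frame-to-frame step
    print(vslam_error_per_step / zinf_error_per_step)  # ~3.9, i.e. almost four times larger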


8.4 Evaluation of the First Order Error Propagation

We derived a first order accuracy prediction for the eight-point algorithm and the Z∞-algorithm in Section 6. The following experiments evaluate and compare these error propagation formulas on synthetic data. We use MATLAB simulations to assess the quality of the first order error estimation and to show how it correlates with the real error when varying the simulation parameters. The following parameters are evaluated here: the aperture angle, the number of features N, and the feature detection accuracy (noise), which is equivalent to varying the resolution. If not mentioned differently, we use 100 runs per experiment, the landmarks lie within a range of 50 m, and we simulate a translation of 5 m and a rotation of up to 5°, both in random directions.
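To make this setup concrete, the following Python sketch generates one such synthetic test case; the sampling choices, the pinhole projection convention, and all names are ours and only illustrative of the stated parameters.

    import numpy as np

    def synthetic_test_case(n_features=10, f=250.0, res=600, rng=np.random.default_rng(0)):
        # Landmarks in front of the first camera, within a range of up to 50 m.
        P = rng.uniform([-50.0, -50.0, 1.0], [50.0, 50.0, 50.0], (n_features, 3))
        # Random motion: 5 m translation and up to 5 deg rotation, both in random directions.
        t = rng.normal(size=3)
        t *= 5.0 / np.linalg.norm(t)
        axis = rng.normal(size=3)
        axis /= np.linalg.norm(axis)
        a = np.radians(rng.uniform(0.0, 5.0))
        K = np.array([[0.0, -axis[2], axis[1]],
                      [axis[2], 0.0, -axis[0]],
                      [-axis[1], axis[0], 0.0]])
        R = np.eye(3) + np.sin(a) * K + (1.0 - np.cos(a)) * (K @ K)  # Rodrigues formula
        def project(X):
            # Pinhole projection (principal point at the image center) with
            # pixel-discretization noise of variance 1/12 in both images.
            uv = f * X[:, :2] / X[:, 2:3] + res / 2.0
            return uv + rng.normal(0.0, np.sqrt(1.0 / 12.0), uv.shape)
        return project(P), project((R @ P.T).T + t), R, t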

8.4.1 Error Propagation for the Eight-Point Variants

In Section 6.1, we derived formulas to predict the accuracy of the various introduced eight-point variants based on the tracking accuracy and/or noise assumption of the used feature correspondences. In the following, we evaluate these formulas on synthetic data, where we also vary some crucial imaging and feature-tracking parameters. All four eight-point implementations introduced in Section 3.3.1 are used.

For the error propagation analysis we used A as computed in Eq. 3.5. In practice, however, we can only use the error-corrupted Ã, because it is the value we measure. From a slightly different point of view, we can regard A as the noise-corrupted matrix obtained by adding −∆A to the matrix Ã. Thus, the error is the deviation of the true solution from the noise-corrupted one. This observation justifies the use of Ã to estimate the errors.

To compare the real and the estimated error as a scalar, we compute the errors of the estimates f_E, f_R, f_t and the estimated errors f̂_E, f̂_R, f̂_t by normalizing the Frobenius norm of the error by the Frobenius norm of the respective matrix

f_E = (1/√2) ‖E_S − Ē‖_F ,    f_R = (1/√3) ‖R_S − R‖_F ,    f_t = ‖t_S − t‖_F    (8.4)

with the subscript S denoting the simulated values. Furthermore, an equivalent to the normalized Frobenius norm can be computed from the covariance matrices by

f̂_E = √(tr(Γ_E)/2) ,    f̂_R = √(tr(Γ_R)/3)    and    f̂_t = √(tr(Γ_t)) .    (8.5)
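A minimal Python sketch of these two scalar measures (Eq. 8.4 and Eq. 8.5) could look as follows; the simulated and estimated quantities as well as the propagated covariance matrices Γ are assumed to be available from elsewhere, and the function names are ours.

    import numpy as np

    def measured_errors(E_sim, E_est, R_sim, R_est, t_sim, t_est):
        # Normalized Frobenius-norm errors between simulated and estimated values (Eq. 8.4).
        f_E = np.linalg.norm(E_sim - E_est) / np.sqrt(2.0)
        f_R = np.linalg.norm(R_sim - R_est) / np.sqrt(3.0)
        f_t = np.linalg.norm(t_sim - t_est)
        return f_E, f_R, f_t

    def predicted_errors(Gamma_E, Gamma_R, Gamma_t):
        # Equivalent scalar errors from the propagated covariance matrices (Eq. 8.5).
        return (np.sqrt(np.trace(Gamma_E) / 2.0),
                np.sqrt(np.trace(Gamma_R) / 3.0),
                np.sqrt(np.trace(Gamma_t)))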


Similar results were achieved by comparing the simulated and estimated absolute error angles ‖d_R′‖ as proposed in Eq. 6.27.

Fig. 8.30 illustrates the error between the computed and the simulated motion parameters (continuous lines) and the estimated error (dashed lines) for 100 random runs. The plots prove that the estimated errors correlate with the actual ones.

The quality of the first order error estimation with respect to the aperture angle is shown in Fig. 8.31. It shows that the error can be robustly propagated throughout the whole range of aperture angles of a projective camera and that the relative error remains approximately the same. The boxplots on the right hand side show the relative difference of the estimated error f̂ and the measured error f in percentage of f according to the formula (f̂ − f)/f · 100. The boxplots are grouped in the order: original eight-point, Hartley, Mühlich, and Mühlich TLS-FC variant. However, the Mühlich variant and the Mühlich TLS-FC variant overlap, so that only the latter one is visible. Please note that, for the sake of visibility, not all outliers are shown in the boxplots – e.g., for the aperture angle plots the translation estimation has outliers down to approximately −8000% for wide aperture angles and, thus, seems to be more outlier-prone than the rotation estimation with only about −2500%. This can be explained by the fact that landmark sets which are spread over only a small region yield even worse estimates than a narrow field of view, due to the lower pixel resolution in wide-angle cameras.

The influence of the number of features on the performance of the algorithm is illustrated in Fig. 8.32. It shows that, even though the Essential matrix cannot be estimated more accurately by increasing the number of features, the prediction of the rotation matrix and translation vector error becomes more reliable. This result can be explained by the normalization step in Eq. 3.16, where the singular values of E are fixed, which results in a non-predictive error.

Finally, Fig. 8.33 illustrates the performance of the error propagation for varying noise magnitudes. Again, we can see that the error of the estimated Essential matrix does not go below a certain threshold, which is around 0.03, as we have seen in the previous plots. However, the rotation matrix and translation vector errors correlate linearly with the standard deviation of the noise and are predicted properly.

While the Essential matrix shows large discrepancies in the error prediction, the rotational and translational error estimation works sufficiently well. Since for a fusion, e.g., in a Kalman filter as in Section 4.4, only the motion parameters and not the Essential matrix are used, this is not a problem. Hence, the first order error propagation is accurate enough to compute the proper error magnitude, which allows for the fusion with other sensors.
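The relative-error statistic behind the boxplots discussed above can be sketched as follows, assuming arrays of measured errors f and predicted errors f̂ collected over the random runs of one test case (all names are ours):

    import numpy as np

    def relative_prediction_error_percent(f_measured, f_predicted):
        # Relative difference of predicted and measured error in percent of the measured
        # error, (f_hat - f) / f * 100, as used for the boxplots.
        f = np.asarray(f_measured, dtype=float)
        f_hat = np.asarray(f_predicted, dtype=float)
        return (f_hat - f) / f * 100.0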


[Figure 8.30, panels: (a) Essential matrix (Ē), (b) rotation matrix (R), (c) translation vector (t); normalized error over 100 runs]

Figure 8.30: This graphic shows the error between the simulated and computed Essential matrix Ē, rotation matrix R and translation vector t for 100 different runs (continuous lines). Moreover, the dashed lines denote the estimated error of the presented accuracy propagation. The simulated camera parameters are: 10 features, noise in both images due to pixel-discretization of the features (σ² = 1/12), 600 × 600 pixel resolution and 250 px focal length (∼100° aperture angle).


[Figure 8.31, panels: (a) Essential matrix (Ē), (b) rotation matrix (R), (c) translation vector (t); median error and relative error boxplots over the aperture angle]

Figure 8.31: This graphic shows on the left hand side the median error of the computed and the simulated Essential matrix Ē, rotation matrix R and translation vector t for different aperture angles and for all four introduced eight-point implementation variants. On the right hand side the relative errors of the uncertainty propagation are illustrated as boxplots. The simulated camera parameters are: 10 features, noise in both images due to pixel-discretization of the features (σ² = 1/12) and 600 × 600 pixel resolution.


[Figure 8.32, panels: (a) Essential matrix (Ē), (b) rotation matrix (R), (c) translation vector (t); median error and relative error boxplots over the number of features]

Figure 8.32: This graphic shows on the left hand side the median error of the computed and the simulated Essential matrix Ē, rotation matrix R and translation vector t for a different number of features and for all four introduced eight-point implementation variants. On the right hand side the relative errors of the uncertainty propagation are illustrated as boxplots. The simulated camera parameters are: noise in both images, 600 × 600 pixel resolution and 250 px focal length (∼100° aperture angle).


[Figure 8.33, panels: (a) Essential matrix (Ē), (b) rotation matrix (R), (c) translation vector (t); median error and relative error boxplots over the noise level]

Figure 8.33: This graphic shows on the left hand side the median error of the computed and the simulated Essential matrix Ē, rotation matrix R and translation vector t for different noise levels and for all four introduced eight-point implementation variants. On the right hand side the relative errors of the uncertainty propagation are illustrated as boxplots. The same noise is simulated in both images. The value 0.2887 denotes noise by pixel-discretization as assumed for the former plots. The simulated camera parameters are: 10 features, 600 × 600 pixel resolution and 250 px focal length (∼100° aperture angle).


8.4.2 Error Propagation for the Z∞-Algorithm

Like for the eight-point algorithm in the last section, we cannot use the real R and t for the Z∞-error propagation, but only the error-corrupted estimates R̃ and t̃ when propagating the error as described in Section 6.2. Analogous to before, we can regard the real values as deviations of the same magnitude from the noise-corrupted estimates, which again justifies the use of the estimation results for the error propagation. We evaluate the rotation and the translation estimation separately and together to show possible error correlations. As before, we compare the real and the estimated error as scalars: we compute the errors of the calculated results f_R, f_t and the estimated errors f̂_R, f̂_t as in Eq. 8.4 and Eq. 8.5.

Fig. 8.34 illustrates the error between the computed and the simulated motion parameters (blue continuous lines) and the estimated error (red dashed lines). The plots prove the correlation of the propagated and the effective errors. The first four plots in Fig. 8.35 to Fig. 8.37 show on the left hand side the median rotation and translation estimation errors and on the right hand side boxplots of the relative error according to the formula (f̂ − f)/f · 100. Please note that not all outliers are shown in the boxplots for the sake of visibility. These four plots are generated by simulating only rotation or only translation. The last two plots are created by simulating a mixed motion, and the dashed lines represent the prediction based on all features (green) or on the detected inliers only (red).

The first order error prediction of only rotation or only translation for a varying aperture angle (Fig. 8.35), a varying number of features (Fig. 8.36) or different levels of noise (Fig. 8.37) is quite accurate. However, the prediction differences for the rotation increase if the motion is mixed. This is an obvious consequence of small translational portions in the optical flow vectors of features which are identified as translation-invariant. The effect can be lowered if the threshold for detecting far landmarks is decreased – a trade-off between estimation accuracy and noise tolerance has to be found. Interestingly, even though the rotation is estimated worse than predicted, the translation estimation is not much affected by it. This suggests that the error in the estimated rotation is either tolerable or correlates with the direction of translation. The latter assumption is highly probable, because the rotation error arises from the translational component and, apparently, the wrong rotation estimation does not significantly affect the estimation of the direction of translation.


[Figure 8.34, panels: (a) rotation matrix (R), (b) translation vector (t); normalized error over 100 runs]

Figure 8.34: This graphic shows the error between simulated and computed rotation matrix R and translation vector t for 100 different runs with only rotation or only translation (continuous lines). Furthermore, the dashed lines denote the estimated error of the presented accuracy propagation. The simulated camera parameters are: 10 features, noise in both images due to pixel-discretization of the features (σ² = 1/12), 600 × 600 pixel resolution and 250 pixel focal length (∼ 100° aperture angle).


(a) rotation matrix (R)
(b) translation vector (t)
(c) mixed motion - rotation (left), translation (right)

Figure 8.35: This graphic shows on the upper two left plots the median error of the computed and the simulated rotation matrix R and translation vector t for different aperture angles. On the right hand side the corresponding relative errors of the error-propagation are illustrated as boxplots. The lower two images show the performance if both motions, rotation and translation, are simulated and estimated at the same time. The simulated camera parameters are: 10 or 20 features depending on whether only one motion component (rotation or translation) or both components are simulated, noise in both images due to pixel-discretization of the features (σ² = 1/12) and 600 × 600 pixel resolution.


(a) rotation matrix (R)
(b) translation vector (t)
(c) mixed motion - rotation (left), translation (right)

Figure 8.36: This graphic shows on the upper two left plots the median error of the computed and the simulated rotation matrix R and translation vector t for a varying number of features. On the right hand side the corresponding relative errors of the error-propagation are illustrated as boxplots. The lower two images show the performance if both motions, rotation and translation, are simulated and estimated at the same time. The simulated camera parameters are: noise in both images due to pixel-discretization of the features (σ² = 1/12), 600 × 600 pixel resolution and 250 px focal length (≈ 100° aperture angle).


(a) rotation matrix (R)
(b) translation vector (t)
(c) mixed motion - rotation (left), translation (right)

Figure 8.37: This graphic shows on the upper two left plots the median error of the computed and the simulated rotation matrix R and translation vector t for various levels of noise. On the right hand side the corresponding relative errors of the error-propagation are illustrated as boxplots. The lower two images show the performance if both motions, rotation and translation, are simulated and estimated at the same time. The simulated camera parameters are: 10 or 20 features depending on whether only one motion component (rotation or translation) or both components are simulated, 600 × 600 pixel resolution and 250 px focal length (≈ 100° aperture angle).


8.5 Evaluation of Sensor Registration and Fusion Techniques

We presented different techniques for temporal and spatial alignment of an IMU-camera setup as well as concepts for the fusion of the measurements in Chapter 4. First, we evaluate the temporal and spatial sensor registration on synthetic and real data and then we show experiments where the motion measurements of the sensors are combined. For the temporal and spatial alignment we use the same real data, acquired by the IMU-camera pair illustrated in Fig. 8.38.

Figure 8.38: A PointGrey Flea2G camera with wide angle lens and an Xsens-MTi IMU (orange box) as used in our experiments.

The PointGrey Flea2G camera1 is equipped with a wide angle lens, providing an approximately 120° aperture angle. It is mounted on an Xsens-MTi IMU. Eight runs have been recorded, in which the camera is moved in front of a checkerboard for 10–40 s each, gathering images and inertial data. The frame rate of the camera is 15 Hz, while the IMU has a sampling frequency of 120 Hz. The camera rotation is computed by extracting the corners of the checkerboard and estimating the camera position in an optimization framework using Calde and Callab [109]. We neglect the correlation of the image-based rotation estimation with the translation in the experiments where only the rotation is required. However, this correlation is considered in the batch-optimization where a full spatial registration is provided. Alternatively, one can also use translation-invariant (far distant) landmarks to estimate the rotation uncoupled from the translation, as in the Z∞-algorithm explained in Section 5.1. In the case at hand, the scale factor α for the translation measurements of the monocular camera is known due to the available geometry information of the checkerboard. Nevertheless, we estimate it in our batch-optimization, because the resulting value of α has turned out to be a good indicator for the success of an optimization run: a value close to 1.0 usually

1http://www.ptgrey.com

means that a reasonable data alignment could be achieved, while large deviations from the optimal value indicate a failure of the algorithm. Errors in the intrinsic calibration, the accelerometer calibration or the checkerboard dimensions typically also cause slight deviations of α from the ideal value.

8.5.1 IMU-Camera Temporal Alignment

Before the measurements can be aligned, the time-stamps have to be fixed as described in Section 4.1. Therefore, the sampling period has to be estimated and sample jams and gaps have to be detected as illustrated in Fig. 8.39.

Figure 8.39: The blue dots represent the time-stamps of a clipped image sequence. The red line shows the fixed-time line without data jams and gaps.
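The time-line repair can be illustrated by the following minimal Python sketch. It is not the exact implementation of Section 4.1, but it shows the underlying idea: the nominal sampling period is estimated robustly from the median time difference, every sample is assigned to an integer slot on a regular grid, and jammed samples are pushed to the next free slot.

import numpy as np

def fix_timestamps(stamps):
    # stamps: measured time-stamps [s], possibly containing sample jams
    # (several samples with almost identical stamps) and gaps (missing samples).
    stamps = np.asarray(stamps, dtype=float)
    diffs = np.diff(stamps)
    # The median difference is robust against jams (diff ~ 0) and gaps
    # (diff ~ k * period) and serves as an estimate of the sampling period.
    period = np.median(diffs)
    # Assign every sample to an integer slot on the regular time grid.
    slots = np.round((stamps - stamps[0]) / period).astype(int)
    # Samples caught in a jam fall onto the same slot; shift them to the next
    # free slot so that the recovered time line is strictly monotonic.
    for i in range(1, len(slots)):
        if slots[i] <= slots[i - 1]:
            slots[i] = slots[i - 1] + 1
    recovered = stamps[0] + slots * period
    return period, recovered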

In the following, we want to show the advantages and drawbacks of the temporal methods discussed in Section 4.1. We tested the delay estimation methods with simulated data based on a weighted overlap of nine sine frequencies on all axes, where the frequencies lie between 0.1 Hz and 90 Hz. The sampling period of the simulated gyroscopes is 10 ms and the one of the camera 40 ms. As window function for the EDFT in the phase congruency method we chose a Hamming window, which is a good trade-off between high dynamic range and sensitivity. For the maximum weight phase shift estimation we used the rectangular window, because it provides the highest sensitivity. Fig. 8.40 illustrates a simulated delay between IMU and camera of 0.5 s and the corresponding cross correlation result. The delay is rather long compared to real-world cases, but it has been chosen to show the limits of the correlation method.



Figure 8.40: The left figure illustrates the unaligned simulated gyroscope (red) and camera data (blue), while the right figure shows the cross correlation of the data.

Fig. 8.41 depicts the estimated magnitudes and the estimated phases for both sensors. The phases of corresponding peaks are used for temporal alignment.


Figure 8.41: These figures illustrate the phase-shift-based temporal alignment for the simulated data. The left images show the amplitude spectrum and the right ones the phase spectrum of the gyroscope (red) and camera (blue) data.

The result of the average phase shift based approach is 0.5193 s, while the phase shift of the most significant frequency yields 0.5344 s. The correlation-based approach estimates a 0.14 s time lag. The explanation for this can be found in Fig. 8.42, where we evaluated the performance for various delays. While the cross-correlation works highly accurately and reliably for small delays, it finds a wrong optimum if the delay is large relative to the sample length. The average and maximum phase shift methods seem to be less accurate but more robust as the delay increases. The difference between these two methods becomes apparent when comparing the results with the real data, as done in Fig. 8.43. Table 8.4 summarizes some statistical quantities of the outcome.



Figure 8.42: The blue line represents the simulated delay. The green dots are the correlation-based results. The red crosses and the magenta circles are the average and the most significant phase shift output, respectively.

Table 8.4: This table lists the statistical quantities, mean, median, standard deviation, and range (maximum minus minimum), for the results of the presented methods.

method - unit [s]               mean     median   std. dev.   range
cross correlation               0.1195   0.1179   0.0047      0.0115
average phase shift             0.1831   0.1246   0.1841      0.5659
most significant phase shift    0.4870   0.2348   0.8704      2.7979

While the cross correlation method achieves coherent results in all eight runs, the phase-based approaches prove to be not as reliable for real applications. This is because white noise affects all frequencies and, hence, the phase shift cannot be used for an accurate temporal alignment of noisy data.
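The cross-correlation approach can be sketched as follows. This is a minimal illustration, not the thesis implementation; it assumes that the magnitudes of the angular velocities of both sensors have already been resampled to a common sampling period dt.

import numpy as np

def estimate_delay(gyro_rate, cam_rate, dt):
    # Estimate the time lag between the sensors by cross-correlating the
    # magnitudes of the angular velocities measured by the gyroscopes and
    # derived from the camera poses (both resampled to the period dt).
    g = gyro_rate - np.mean(gyro_rate)
    c = cam_rate - np.mean(cam_rate)
    corr = np.correlate(c, g, mode="full")
    # Index 0 of the full correlation corresponds to a shift of -(len(g) - 1).
    lag = np.argmax(corr) - (len(g) - 1)
    # A positive lag means the camera data lags the gyroscope data by lag * dt.
    return lag * dt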

8.5.2 IMU-Camera Spatial Registration

Before we assess the batch-optimization based registration introduced in Section 4.2 we evaluate the rotational alignment as described in Section 4.2.4, which is used as rough registration or as initialization for the optimization.



Figure 8.43: The cross-correlation reliably estimates the time delay, while the phase congruency methods are not as robust. Some of their estimates are not even visible because of the scaling which has been chosen for the sake of detail.

Initialization of Rotational Alignment

We added white Gaussian noise to the synthetic data used for the evaluation of the temporal alignment to evaluate the noise sensitivity of the rotation initialization. Fig. 8.44 shows the performance of the weighted and the unweighted rotation estimation. The weighted rotation estimation always outperforms the unweighted method. It is difficult to determine an accurate ground truth for the rotational alignment between a camera and an IMU without having a CAD drawing of the setup. Therefore, we performed a brute force search with a resolution of 0.5° to find a sort of “ground truth”. As objective function we used Eq. 4.41. Unfortunately, in one of the runs a local minimum far off any possible solution has been found, which proves the non-convexity of the problem and, hence, that filtering and optimization techniques may easily find a local instead of the global minimum if the starting point is far off the actual solution. This run has not been considered in Fig. 8.45, which shows the deviation of the various methods described in Section 4.2.4 relative to the brute-force estimate. The estimates of the unweighted (original) and the weighted rotation are optimized according to Eq. 4.42 and not according to the nonlinear Eq. 4.54 as they should be. Thus, we propagate the results by the unscented transformation described in Section 4.2.4, with the outcome that the difference between the propagated and the original values is negligible and, thus, the unscented transformation can be spared. Such a transformation becomes only necessary if the covariance matrix is large, which is not the case for our data. The weighted approach proves to be more reliable and accurate than the original hand-eye calibration technique. Moreover, the bias estimates are more coherent with the weighting method than with the covariance minimization approach.



Figure 8.44: The left image shows the results for the absolute rotation angle; the right picture illustrates the angular error between the estimated and the simulated rotation axis. The blue line represents the simulated absolute angle and the zero-error line, respectively. By increasing the signal-to-noise ratio the performance of both the weighted (green) and the unweighted (red) rotation estimation increases.

In the next experiments, we want to underline the importance of a proper temporal alignment. It is difficult to argue based on simulations which temporal accuracy is relevant for the spatial alignment, because it depends strongly on the dynamics of the registration run, the noise of the sensors, the accuracy of the camera-based pose estimation, and so on. Therefore, we run the rotation estimation on misaligned real data. The time delays have been chosen between the mean estimate of all runs and a delay of zero, which means that no temporal alignment is provided. Fig. 8.46 compares the different runs. While the median changes only a little for the different time delays, the variance increases significantly for badly aligned data. Furthermore, the plot shows that the weighting of the data samples also increases the robustness against temporal misalignment.

The batch-based nonlinear optimization also proves to be sensitive to a proper temporal alignment. In this experiment, we varied the estimated time delay from 0 % to 200 % and plotted the box plots of the estimates of the translational scale to evaluate the effect of a time lag between the measurements. The result worsens significantly if the measurements are not aligned correctly (100 %), as shown in Fig. 8.47. This experiment stresses the importance of proper temporal alignment also for complex registration methods. Adding the delay between the IMU and camera measurements as a


[Boxplots of the deviations in the X-, Y- and Z-axis Euler angles for the methods: unweighted, weighted, BZ-bias, cov-bias.]

Figure 8.45: These box plots show the performance of the presented rotational alignment methods. Therefore, the estimated Euler angles between IMU and camera have been subtracted from a “ground truth” estimate gathered by a brute force search with a resolution of 0.5°. The leftmost box plot of each set corresponds to the four short runs, the rightmost to three of the long runs and the center box plot represents all seven runs.

Figure 8.46: These boxplots show the performance of the unweighted and the weighted rotation estimation on real data which is not correctly temporally aligned. The black dotted line corresponds to the mean of the estimated delays between camera and gyros of all runs.

parameter to θ would worsen the convergence properties of the optimization and make the analytical calculation of the Jacobian significantly more complex.


Figure 8.47: Estimated scale factor α with respect to the temporal alignment: 0 %, no temporal alignment; 100 %, correct temporal alignment; 200 %, twice the estimated delay between camera and IMU.

Evaluation of the Batch-Optimization

Let us now evaluate the spatial registration as introduced in Section 4.2. In the following plots, the X-axis components are colored blue and the Y- and Z-components are in green and red, respectively. The IMU measurements are marked with dots and the ones of the camera with crosses. First, we will show the results of our approach on simulated data to evaluate its performance based on ground truth. In our simulation we assume that the camera provides images with a frame rate of 15 Hz, while the IMU measurements arrive with a rate of 120 Hz. The length of the simulated registration sequence is about 10 s, which is similar to our real registration sequences. The simulated rotational and translational motion corresponds to three superimposed sine waves with varying frequencies and amplitudes for each degree of freedom. The signals are corrupted by noise with the following standard deviations, which have been chosen to be similar to the ones measured in the real data: σ_ωI = 28.6 °/s, σ_aI = 0.5 m/s², σ_ωC = 4.3 °/s and σ_vC = 0.05 m/s.
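Such a registration signal can be synthesized, for instance, as in the following sketch. The frequencies, amplitudes and phases below are placeholders and not the values used for the experiments; only the sampling rates and noise levels follow the text above.

import numpy as np

def simulate_axis(t, freqs, amps, phases, sigma, rng):
    # One degree of freedom as a superposition of three sine waves plus
    # white Gaussian noise with standard deviation sigma.
    clean = sum(a * np.sin(2.0 * np.pi * f * t + p)
                for f, a, p in zip(freqs, amps, phases))
    return clean + rng.normal(0.0, sigma, size=t.shape)

rng = np.random.default_rng(42)
t_imu = np.arange(0.0, 10.0, 1.0 / 120.0)   # 120 Hz IMU samples
t_cam = np.arange(0.0, 10.0, 1.0 / 15.0)    # 15 Hz camera frames

# Example: angular velocity [deg/s] around one axis as seen by both sensors,
# corrupted with their respective noise levels.
freqs, amps, phases = (0.3, 0.9, 1.7), (20.0, 10.0, 5.0), (0.0, 0.5, 1.0)
omega_imu = simulate_axis(t_imu, freqs, amps, phases, sigma=28.6, rng=rng)
omega_cam = simulate_axis(t_cam, freqs, amps, phases, sigma=4.3, rng=rng)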

The covariance matrix Σ_P of the estimated parameters is calculated according to [108] by

Σ_P = (J^T Σ_X^{−1} J)^+    (8.6)


with Σ_X = diag(σ²_ωI, σ²_aI, σ²_ωC, σ²_vC). A comparison of the estimated values and their standard deviations with the simulated values shows that the results are consistent. Fig. 8.48 and Table 8.5 illustrate the result of the presented optimization approach – ^I φ_C denotes the Euler angles corresponding to ^I q_C.

Table 8.5: This table lists the simulation results for the distance between IMU and camera, ^I r_IC, and the corresponding standard deviations σ_r.

run            ^I t_IC [mm]             ^I φ_C [°]
               x      y      z          x      y      z
simulated      10     0      -5.0       90     0      0
estimated      10.4   3.0    -5.0       89.7   0.8    -0.8
std. dev. σ    5.3    5.5    5.6        0.4    0.4    0.4
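Eq. 8.6 translates directly into code. The sketch below assumes that the stacked residual Jacobian J of the batch optimization and a vector of per-residual standard deviations are available; the structure of J is specific to the optimization of Section 4.2 and is not reproduced here.

import numpy as np

def parameter_covariance(J, sigmas):
    # Sigma_P = (J^T Sigma_X^-1 J)^+  with  Sigma_X = diag(sigma_i^2).
    Sigma_X_inv = np.diag(1.0 / np.square(sigmas))
    information = J.T @ Sigma_X_inv @ J
    return np.linalg.pinv(information)   # pseudo-inverse as in Eq. 8.6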

The optimization parameters for the experiments on the real data are initialized as described in Section 4.2.4. The results of the translation estimation and its computed uncertainty for all eight runs are illustrated in Table 8.6. It shows the estimated values for ^I t_IC and the corresponding standard deviations σ(^I t_IC).

Table 8.6: This table lists the estimated distances between IMU and camera, ^I r_IC, and the corresponding standard deviations σ_r for the real data sets.

run     ^I t_IC [mm]             σ(^I t_IC) [mm]
        x       y       z        x      y      z
1       -20.6   69.6    -30.7    37.9   49.6   60.1
2       -3.0    62.9    -31.3    25.9   33.7   38.2
3       -18.7   64.6    -39.4    28.2   32.2   36.9
4       -18.9   68.5    -33.2    18.4   24.5   22.4
5       -21.3   52.5    -33.4    6.5    9.0    8.2
6       -17.9   58.1    -33.5    4.7    12.3   4.3
7       -25.8   60.4    -30.9    6.7    8.5    6.7
8       -20.5   46.9    -25.1    13.9   41.4   16.7

We will now compare our results with the system identification (grey box) approach described in [188] and a filter approach using a UKF as presented in [183]. The only modification for the grey box approach is that we do not use an EKF but a more accurate UKF instead to propagate and update the system state. The parameter vector θ_UKF of the optimization consists of:

θ_UKF = (b_ωI^T  b_aI^T  ^{I0}g^T  ^I q_C^T  ^I t_IC^T)^T    (8.7)

The boxplots in Fig. 8.49 illustrate the calibration results of the three different approaches. The estimation of the relative pose ^I t_IC by the B-spline clearly outperforms



0 0 1 2 3 4 5 6 7 8 9 10 time [s] Figure 8.48: Error between the simulated measurements and the estimated trajectories. Please note that the crosses represent the camera measurements which are closer to zero due to less noise assumed for the camera. The upper plot illustrates both angular velocities ωI and ωC , the middle plot shows aI and the lower one vC . The errors of the estimation correspond to the simulated noises.


the filter-based methods, while the angular alignment ^I q_C yields comparable results. A closer look at the results shows that especially the runs with less dynamics account for this difference. The B-spline approach seems to be more sensitive and, thus, it achieves adequate results even though the dynamics of the calibration motion are low.


Figure 8.49: Estimation of ^I t_IC and the Euler angles ^I φ_C corresponding to ^I q_C (boxplots). 1-left: B-spline, 2-middle: grey box, 3-right: UKF.

To evaluate the sensitivity of the presented algorithm regarding the quality of the initialization, we used a bad initialization to run the optimization. For that, we assumed no knowledge about the relative and the absolute orientation, setting ^I q_C to zero and ^{I0}g to (0 0 −9.81). The errors between the trajectories and the measurements resulting from the two different initializations are illustrated in Fig. 8.50(a) and Fig. 8.50(b). The wrong initialization of the rotation between camera and IMU becomes apparent in the angular velocity plots. The bad initialization of the gravity vector becomes obvious in the much larger scale of the acceleration plot in Fig. 8.50(b) - the gravitation component in the acceleration measurements cannot be compensated properly. Nevertheless, in our experiments the optimization always converges to the same result, shown in Fig. 8.50(c), which indicates good convergence properties of this problem. The same experiment is performed using the grey box and the UKF approach. Both can handle the bad initial orientation, but run into serious problems with the bad initial gravity ^{I0}g and cannot properly estimate the relative pose anymore. The last two experiments shall illustrate the sensitivity of the optimization with respect to the number of control points and the quality of the temporal alignment. The relative pose ^I t_IC and the scale factor α react most sensitively to changes of these parameters and, thus, they are used to evaluate the quality of the estimation. Fig. 8.51 shows the convergence of the optimization results with an increasing number of knots.


(a) good initialization
(b) bad initialization
(c) estimation result

Figure 8.50: The first two plot-columns show the errors of ω_I and ω_C (both in the upper row), a_I (middle row) and v_C (lower row) relative to a good and a bad initialization of the B-spline as described in the text. Note the difference in the scale of the acceleration plots. The rightmost plot denotes the estimation result, which is the same for both initializations.



Figure 8.51: Estimated relative pose ^I t_IC (above) and scale α (below) with respect to the number of knots.

According to this experiment, no significant improvements are to be expected when using more than 0.6 knots per camera frame for our registration sequences. If the spline has too few knots, it will not allow an accurate modeling of the acceleration measurements and, thus, the velocity measured by the camera does not fit the approximated acceleration. A large number of knots increases the number of control points and, consequently, the overall optimization duration. The accuracy of the spatial registration depends strongly on the measured motion. No calibration framework will be able to improve the result beyond the calibration errors which affect the fusion. Thus, assuming that the observability conditions mentioned in [186] are met, the accuracy of the spatial alignment strongly depends on the signal-to-noise ratio. In general, the noise of the sensors cannot be reduced further; thus, one should aim to provide a calibration run with highly dynamic motions. If at least the same dynamics as required by a specific application are provided, the residual calibration error can be neglected because the resulting fusion error is not significant compared to the measured motion.



Figure 8.52: Estimated scale factor α with respect to the temporal alignment: 0 %, no temporal alignment; 100 %, correct temporal alignment; 200 %, twice the estimated delay between camera and IMU.

8.5.3 IMU Aided KLT Tracking

Significantly faster rotations are feasible if any local tracking is supported by an IMU, as explained in Section 4.3. Figure 8.53 depicts a fast sweep of the camera with an average rotation of 11° and a maximum of 14° per frame. A maximum feature displacement of about 45 pixels has been measured. Nevertheless, the landmark locations can be estimated accurately: the yellow squares (feature propagations) match the red circles (KLT tracking results). The exact feature propagation also makes a change of the active feature set during such motions feasible. Good matches of the red circles and the yellow circles (projections of the landmarks according to their 3D location and the camera pose) confirm an accurate stereo-based initialization (see Section 7.1.1).
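The IMU-supported propagation can be illustrated by the following sketch. It only uses the inter-frame rotation measured by the gyroscopes (rotation-only prediction via the infinite homography), whereas the actual system can additionally exploit the known 3D landmark positions; the function and parameter names are illustrative.

import numpy as np

def propagate_features(pts, K, R_delta):
    # Predict feature positions in the next frame from the inter-frame
    # rotation R_delta, which maps coordinates from the current to the next
    # camera frame (obtained by integrating the gyroscopes and applying the
    # IMU-camera extrinsics). Translation is neglected here.
    H = K @ R_delta @ np.linalg.inv(K)                    # infinite homography
    pts_h = np.hstack([pts, np.ones((pts.shape[0], 1))])  # homogeneous pixels
    pred = (H @ pts_h.T).T
    return pred[:, :2] / pred[:, 2:3]                     # back to pixel coords

# The predicted positions serve as starting points (search-window centers) for
# the KLT tracker, which then only has to refine small residual displacements.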

8.5.4 EKF Based IMU-Camera Fusion

In the following, we want to show some first results on synthetic data for the EKF based fusion of a camera and an IMU, as described in Section 4.4. We simulate an oscillating motion with translational velocities up to 2 m/s and angular velocities up to 114 °/s as shown in Fig. 8.54. The parameters for the sensors were chosen according to our experiences with real sensors. The IMU samples are acquired at 500 Hz and the camera images at 120 Hz. We simulate a measurement sequence of 60 seconds and, thus, in total 3000 IMU measurements and 620 image-based motion estimates, to test the EKF concept. The biases on the accelerometers and gyroscopes drift by a random process with standard deviations of 10^−5 and 10^−4, respectively. Each axis of the gyros suffers from noise of the magnitude σ = 10^−3 and each axis of the accelerometers of σ = 0.1.
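The simulated bias drift can be generated, for instance, as a simple random walk. The sketch below uses the standard deviations stated above per time step and axis; it is an illustration of the simulation setup, not the thesis code.

import numpy as np

def random_walk_bias(n_steps, sigma_rw, n_axes=3, rng=None):
    # Bias trajectory b[k+1] = b[k] + w[k] with w[k] ~ N(0, sigma_rw^2),
    # starting at zero for each axis.
    rng = np.random.default_rng() if rng is None else rng
    steps = rng.normal(0.0, sigma_rw, size=(n_steps, n_axes))
    return np.cumsum(steps, axis=0)

rng = np.random.default_rng(1)
b_acc = random_walk_bias(3000, 1e-5, rng=rng)    # accelerometer biases
b_gyro = random_walk_bias(3000, 1e-4, rng=rng)   # gyroscope biases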



Figure 8.53: The left two images show two screenshots acquired at the beginning and at the end of a fast camera motion (up to 14◦ per frame). The yellow squares mark the IMU- supported propagation of the landmarks, while the red circles are the tracked features. The yellow circles are the reprojection of the features according to the 3D structure. The right picture illustrates the trajectory (blue) and the camera viewing directions (red).

The noise on the image-based motion estimation is simulated as σ = 0.01 m and σ = 0.1° for each component of the translation and rotation, respectively. For every new keyframe we simulate a random switching error with σ = 10 % of the estimated scale. In our first experiment, we assume no translation between IMU and camera and an exact initialization of the camera scale. Furthermore, we simulate that every 10 s a new keyframe is used. Fig. 8.55 illustrates the result. The blue line represents the error of the estimates and the red lines denote the 3σ-bounds of the calculated uncertainty. The continuous growth of the uncertainty is clearly notable, which is due to the relative measurements between the keyframes. However, the estimation error remains small. Especially for the attitude, the jumps of the uncertainty when switching between two keyframes become apparent. Nonetheless, the filter is able to correctly level off the new scale after each switch of the keyframe. The real trajectories, with ground truth, filter estimate and 3σ-bounds, are shown in Fig. 8.56. We also draw the time instances of camera measurements and the keyframe switches in these plots, showing that the motion can be smoothly estimated despite these events. The question of how stable the feature tracking has to be is evaluated in Fig. 8.57. We ran the simulation for different time-spans between keyframe switches, with the expected result that, if the features get lost too quickly, the filter does not have enough time to converge. Hence, a large number of features and a high frame rate should be preferred, which allow for a more robust tracking.



Figure 8.54: These plots denote the simulated motion for the filter experiments. We implemented an oscillating translational acceleration and rotational velocity, together with white Gaussian noise of the same magnitude as seen for real sensors.


(a) motion and biases
(b) scale of camera translation

Figure 8.55: This figure illustrates the errors between estimate and simulation (blue) as well as the 3σ-bounds of the calculated uncertainty (red). The errors are always within these bounds, which proves the proper operation of the filter.


(a) translation estimation
(b) velocity estimation
(c) attitude estimation

Figure 8.56: These plots show the full motion estimation for the X-axis on the left and a small clipping in detail on the right.


(a) 5 seconds for each keyframe
(b) 2 seconds for each keyframe
(c) 1 second for each keyframe
(d) 0.5 seconds for each keyframe
(e) 0.1 seconds for each keyframe

Figure 8.57: The effect on the convergence of the EKF for different keyframe time-spans is illustrated in these plots. Please note the different scaling for the last plot.


Another aspect which we evaluated is the influence of the lever arm between camera and IMU on the filter convergence and stability, as suggested in [211]. For that, we plot the scale estimation results for different lever lengths against various values for the scale initialization and the time-spans between keyframes in Fig. 8.58. A large lever stabilizes the scale estimation significantly due to the lever effect in H1,8 of Eq. 4.100. Wrong scale initializations can be compensated for by a large distance between IMU and camera (provided that there are significant rotations). Moreover, the scale converges much faster, which can be seen from the small standard deviation of the scale estimation also for short time-spans between two keyframes.

(a) median scale error: lever vs. initial scale
(b) σ of scale error: time-span vs. lever

Figure 8.58: The left plot illustrates the relation between the IMU-camera lever and the quality of the scale initialization. It shows that a large lever can compensate for a bad initialization. The right plot illustrates the standard deviation of the scale estimation for different time-spans between keyframes and changing IMU-camera lever sizes. It confirms that a large lever size leads to a much quicker scale convergence than a short one.

Chapter 9

Applications

The presented methods and algorithms were also integrated in more complex applications than the experimental validations discussed in Chapter 8. In the following, we will present three of them: the self-referenced 3D-modeling with the DLR 3D-Modeler, virtual environment modeling for surprise detection and a fully autonomous quadrocopter flight.

9.1 Self-Referenced 3D-Modeling

The DLR 3D-Modeler is a multi-purpose platform for geometric and visual perception [222]. It combines multiple sensors in a compact way. Current applications comprise 3D-modeling, tracking, visual servoing, exploration, path planning, and object recognition, e.g., as the perception system of the humanoid robot Justin [223]. The system is equipped with the following main sensors, as illustrated in Fig. 9.1:

• The DLR Laser-Range Scanner (LRS) [224] is based on laser light triangulation. A visible, harmless laser ray is continuously rotated and its reflection is recorded and used for triangulation of the distance.

• The DLR Laser Stripe Profiler (LSP) [225] includes one or two (depending on the configuration) laser beams that project stripes onto a surface. These stripes are recorded by cameras, which allows triangulating the distance. The laser stripe detection is done without optical filters to allow for other image processing methods, such as stereo vision, visual egomotion or texturing, to be performed simultaneously.

• The stereo camera consists of two AVT Marlin F-046C progressive scan cameras with 6 mm Sony VCL-06S12XM objectives mounted on them. They provide a resolution of 780 × 582 at a baseline of 50 mm1. The implemented stereo algorithm is detailed in [226, 227].

• The rigidly attached IMU is the AscTec AutoPilot from Ascending Technologies GmbH2. It estimates the 6-DoF pose based on three gyroscopes, three accelerometers, and three magnetometers at 1 kHz.


Figure 9.1: The major sensors of the DLR 3D-Modeler are the stereo cameras, the LSP, the LRS and an IMU.

One main application of the DLR 3D-Modeler is as a hand-held 3D-modeling system for close-range applications. Several close-range 3D-modeling devices are currently offered - for a large review please refer to [228, 229]. However, the DLR 3D-Modeler is the first extensively multisensory, passively self-referenced high-rate modeling device. In general, it is not sufficient to reconstruct an object from a single view, due to object self-occlusions, the object size, or the limited field of operation of the used sensor. Hence, several measurements from different locations have to be fused, which results in two tasks for a 3D-modeling device: the distance to the object’s surface has to be measured (range measurement), while for all the different viewpoints a relative alignment in terms of rotation and translation has to be known (relative localization) to allow for proper data fusion. The 3D-modeling solutions available in the literature commonly use external devices to estimate the sensor pose: external tracking systems, turntables, active or passive robotic manipulators, electromagnetic devices, and so on. They all share the drawbacks that they limit the user’s workspace, that they require an error-prone registration of the spatial and temporal

1The base distance and the focal length allow for a range sensing from approx. 30 cm up to 2 m. 2www.asctec.de

alignment and, lastly, that they are in general the most expensive part of the modeling system. The DLR 3D-Modeler provides three solutions to measure the distance to an object: the LRS, stereo triangulation or the LSP (as illustrated in Fig. 9.2).

Figure 9.2: This figure illustrates the modules provided within the DLR 3D-Modeler. Due to the visual localization module, there is no external device necessary anymore.

The pose of the device, which is required for measurement fusion, can be measured by an external tracking system1 or by the forward kinematics when mounted on the wrist of a robot arm as illustrated in Fig. 9.3.

Figure 9.3: The upper left-hand side picture shows the DLR 3D-Modeler with the IMU module and the PC running the software. The other two images demonstrate each a global registration variant for the scan device: a robot arm (lower left), and an external tracking system (right).

1ARTtrack1 cameras from ART, www.ar-tracking.com


By applying image-based motion estimation techniques which are based on the on-system stereo cameras, we do not depend on external devices anymore, but can simply rely on the compact 3D-Modeler to solve the two necessary tasks for 3D-modeling. The 3D-model is generated online; hence, the depth measurements are fused immediately, which requires the poses in real-time. The pose estimates cannot be corrected afterwards by optimizing over all measurements, because not all the information is available anymore due to the online fusion and mesh generation, which makes it difficult to realize an offline bundle-adjustment. Hence, the estimated poses have to be highly accurate and reliable. Furthermore, it is desirable that the localization is not only valid within one scan, but that it provides a global reference, so that repeated scans, which are acquired at different time instances, can be merged. This results in three major requirements for visual localization algorithms when applied to online 3D-modeling: real-time pose estimation, high positioning accuracy and a global reference. Our solution for the real-time image-based egomotion estimation of the 3D-Modeler consists of the following modules (as illustrated in Fig. 9.4):

• Shi and Tomasi features are extracted from both images of the stereo camera system and initialized as described in Section 7.1.1.

• The features are tracked by the sped-up extended KLT tracker implementation introduced in Section 7.1. The tracking is supported by the IMU as denoted in Section 4.3.

• A feature-set management system allows for keyframe-based VSLAM as mentioned in Section 5.2.

• The pose is estimated using RVGPS as explained in Section 5.2.1.

• To provide a global reference we use SURF features as described in more detail in Appendix A.3.

Fig. 9.5 shows modeling results acquired by a two-turn sweep of the DLR 3D-Modeler. Because the device is manually guided, the points are not uniformly distributed on the object’s surface.1

1An example of the operation is available at http://www6.in.tum.de/∼maire/videos/3dMo putto.mp4.


Figure 9.4: This block diagram illustrates the modules which are used for the image-based localization.

(a) 3D point-cloud: statue in front of a box. (b) 3D point-cloud of a putto.

Figure 9.5: Two 3D point clouds acquired by the DLR 3D-Modeler using the presented visual navigation algorithm.


9.1.1 Accuracy Evaluation

In the following experiment, the ART-system, a robot arm (KUKA KR16) and our visual navigation approach are compared (see Fig. 9.3). For this, the 3D-Modeler is mounted on the flange of the KR16 and, while measuring the poses with the ART-system and the robot’s kinematics, the images for the visual localization algorithm are acquired. To simulate an ordinary scan sweep, the robot arm was commanded to move down 40 cm and up again in front of an object. Fig. 9.6 illustrates the estimated trajectories of the scanner and the absolute rotational and translational differences of the ART-system and the visual navigation with respect to the KUKA robot.

(a) trajectories

(b) rotational error (c) translational error

Figure 9.6: a) The scan trajectories computed by three different references. By zooming in on the x-axis (small plot) the error becomes evident. The absolute rotational error (b) and the absolute translational error (c) are referenced to the KUKA poses.

While with equally scaled axes the differences in the trajectories are hardly recognizable, the zoom in the left figure makes them visible. Due to the inaccuracies of the robot-to-camera (hand-eye) and the ART-to-camera calibration, a small discrepancy between the results of the robot and the ART exists. This error is also illustrated by the blue dots in the right plots, where the KUKA robot estimates are used as ground truth due to its high accuracy. The visual localization result is noisier, but the average error is comparable to the ART-system. Five keyframes are acquired during the sweep and refound on the way back. During longer sweeps without loop closure, the visual navigation results drift, like in every SLAM-based method, while the ART-system provides drift-less results within its workspace. This is the most problematic drawback regarding external registration devices and the price for compactness, low costs and especially arbitrary workspace location and size. Fig. 9.7 shows online computed triangle-meshes for the statue in Fig. 9.7(a), once acquired with the ART-based global registration (Fig. 9.7(b)) and once with the presented visual navigation approach (Fig. 9.7(c)). The 3D-Modeler was hand-guided in front of the putto, in the same way as in the experiment with the KUKA1. The distance to the object is measured by the laser stripe profiler (LSP) and the acquired data is fused, together with the global pose, into 3D-meshes [230]. Due to the smoothing by the data fusion process, the noise in the visual navigation estimates is removed and the results are comparable. The differences in the meshes result from the fact that two hand-held runs were necessary to evaluate the online processing performance, which provides unequal views for the LSP. Regarding the quality of the pose estimation, the following points are worth mentioning: The visual navigation is well-conditioned if the features are spread out over the whole image. Furthermore, accurate results are provided if the landmarks are accurately initialized, which depends on the features’ distance and the baseline of the cameras. The ART-system has the systematic drawback of its inaccurate calibration capabilities. It is difficult to calibrate the setup only by the six small and tight spheres, which are quite far away from the cameras. If the markers are occluded by the sensor device or by the object to scan, the accuracy of the pose estimation is massively reduced. Moreover, the errors of the LSP and the ART do not rely on the same images and, thus, the error correlation is worse compared to the visual navigation. The robot arm is the most precise of the three variants, provided an accurate hand-eye calibration.

1An example of the operation is available at http://www6.in.tum.de/∼maire/videos/3dMo liveModeling.avi.


(a) putto statue (b) mesh based on ART (c) mesh based on visual egomotion estimation

Figure 9.7: a) the scanned putto statue. b) triangle-meshes based on the ART-system. c) triangle-meshes based on the visual localization. All measurements were done in real-time and also the meshes were computed online.


However, it suffers from the same drawbacks as the ART-system and, in addition, the degrees of freedom of the scan device are reduced at the workspace border. In Section 3.3.4, we explained the concept of active matching (AM). While related approaches use this technique for monocular motion estimation with unknown structure, we can measure the features’ depth, which drastically reduces the search area for features when applying AM. Non-stochastic AM with known structure can be seen as the best-case scenario for AM, because it achieves an especially good performance if the priors on the relative location are strong - we actually assume to have an error-free 3D structure. On the other hand, the performance of AM is not so much influenced by weak priors on the absolute location. Again, we assume a linear translational motion model in 2D and propagate the features according to this model. As a next step, we look for a strong feature in an image corner by exhaustive template matching over a large search area. Based on this match we can solve for two DoFs of the rotation. This reduces the search area of the second feature significantly, which is chosen from the opposite image corner. Using the second match, one can solve for the total rotation and propagate the rest of the features. The real features are in general found within a few pixels of the estimated location. By applying the AM concept, we can track features at similarly high dynamic motions, solely based on the camera, as by using the IMU-based feature propagation1.

1An example of operation is available at http://www6.in.tum.de/∼maire/videos/3dmo AM iros10 video.avi.


9.2 Visual Environment Modeling

Geometric environment models are still the preferred space representations in robotics. They are intuitive and allow for a reliable navigation and manipulation of mobile robots. Laser-range devices are the preferred sensors to generate such models. However, there are many objects which cannot be reconstructed reliably, e.g. glasses, specular objects or complex structured plants. To deal with such objects it is often of advantage to consider them not in the geometric but in the visual domain. One type of image-based scene representation that recently has become very popular uses view-dependent geometry and texture. Instead of computing a global geometry model which is valid for any viewpoint and viewing direction, the geometry of the scene is estimated locally and holds only for a small region in the viewpoint space. It has been shown that this approach is suitable especially when the scene contains specular and translucent objects. Hence, per-pixel depth maps are calculated for each reference image. While this is done off-line, the view selection and view synthesis are performed on-line. The view selection chooses only a small subset of all reference images which contribute to the view interpolation, each time a new frame is rendered. In real-world environments, most surfaces are non-Lambertian, which means that the reflected intensity depends on the position of the viewer. Hence, in our approach, the reference cameras are ranked in terms of their orthogonal distance with respect to predefined rays within the viewing frustum of the virtual camera. Novel virtual views are synthesized in a two-pass procedure: the color data of the selected reference images is warped into the virtual view, where afterwards the final color of each pixel is determined using probabilistic models. This virtual environment modeling has been used within the elite cluster “Cognition for Technical Systems” (CoTeSys1) to develop an image-based surprise trigger which can detect missing objects or changes on objects that are difficult to detect by LRS sensors and/or to model in geometric space. Three steps are necessary to achieve this:

• Scene acquisition and pre-processing: An image sequence of the scene for which the virtual environment model needs to be generated has to be acquired. The camera pose of each viewpoint has to be known. For that we use the same keyframe-based visual localization algorithm as described in Section 9.1. Once the images are registered, image depth maps are generated by stereo triangulation. If several runs are performed at varying lighting conditions, it is possible to compute an illumination-invariant scene representation, which allows rendering lighting-independent views of a scene (see [5] for more details).

1http://www.cotesys.org/


• Relative localization and virtual view rendering: Once a current view is to be compared with the previously acquired data set, the actual position relative to the scene acquisition origin has to be determined. For that we use global features from natural landmarks as described in Appendix A.3. Knowing the spatial relation, the virtual image can be rendered.

• Surprise detection: We propose a scheme for Bayesian visual surprise detection based on the probabilistic concept for view synthesis. Similar to the processing of color information in the human visual system, we compute from each RGB reference image a luminance signal and two color opponency signals (red-green and blue-yellow), respectively. Thus, surprise detection does not have to be performed jointly in RGB-space but can be done independently in three decoupled pathways. The luminance samples warped into the virtual view from the reference images are assumed to be drawn from a Gaussian distribution. We put a prior distribution over the precision, which is equivalent to the reciprocal variance of the Gaussian distribution. The Kullback-Leibler divergence (KLD) between the posterior distribution and the prior distribution serves as a quantitative measure for surprise. The KLD is evaluated for each pixel in the virtual image, which yields a pixel-wise surprise trigger (a minimal sketch of this computation is given after the list).
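The per-pixel surprise computation can be sketched as follows. This is a minimal illustration under the assumption that the pixel mean is treated as known and that a conjugate Gamma prior is placed on the precision of the Gaussian; the actual prior and update used in the thesis may differ, and the prior parameters a0, b0 below are placeholders.

import numpy as np
from scipy.special import gammaln, digamma

def gamma_kld(a_p, b_p, a_q, b_q):
    # KL divergence KL( Gamma(a_p, b_p) || Gamma(a_q, b_q) ),
    # shape/rate parameterization.
    return ((a_p - a_q) * digamma(a_p) - gammaln(a_p) + gammaln(a_q)
            + a_q * (np.log(b_p) - np.log(b_q)) + a_p * (b_q - b_p) / b_p)

def pixel_surprise(samples, mu, a0=2.0, b0=1.0):
    # samples: luminance values warped into the virtual view from the selected
    # reference images for one pixel; mu: assumed (known) mean.
    # Prior Gamma(a0, b0) on the precision of the Gaussian; the conjugate
    # posterior is Gamma(a0 + n/2, b0 + 0.5 * sum((x - mu)^2)).
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    a_n = a0 + 0.5 * n
    b_n = b0 + 0.5 * np.sum((samples - mu) ** 2)
    # Surprise = KLD between posterior and prior.
    return gamma_kld(a_n, b_n, a0, b0)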

In the following, we describe two CoTeSys scenarios in which we applied the presented surprise trigger.

9.2.1 Cognitive Household

The episode we envisioned within the assistive household scenario is the acquisition of an image-based model of a typical household environment, which is the basis for cognitive processes. With our module, a cognitive robot in the household should be able to update its environment model at any time by surprise detection and to classify the objects around it into static or dynamic ones. Moreover, surprise about unexpected events should influence the robot’s action plans. Robust visual localization should enable it to retrieve its current position in the environment and to evaluate its current observation with respect to the already acquired reference model. Fig. 9.8 illustrates the localization results on an image sequence acquired while performing a quarter circle with 0.9 m diameter. The odometry of the Pioneer 3-DX is not accurate enough to build a model of the environment from the measurements. The results of the different processing steps on the acquired image sequence are shown in Fig. 9.9. The glasses, which have been removed in a second run, are clearly detected by the surprise trigger, as shown in Fig. 9.9(c).



Figure 9.8: Localization results of a Pioneer 3-DX performing a quarter circle as illus- trated in the left image. The blue lines show the viewing direction of the capturing camera. The white gaps in the trajectory are due to swapping on the hard disk. The cameras were slightly tilted to the floor and so the trajectory is not only within the X-Z plane.

9.2.2 Cognitive Factory

This application was a joint project with JAHIR (“Joint-Action for Humans and Industrial Robots”). The goal is that a robot mounts a product while communicating with a human worker by speech and virtual buttons which are projected onto the working table. We are going to integrate the algorithms described in this work into the quality assessment step after the product is mounted. To this end, an image sequence of the mounted, error-free product is first captured and processed into a reference image-based model. Our algorithm for surprise detection detects unforeseen changes in the product which might be due to failures in the single production steps. It is desirable that the inspecting camera is localized with respect to the product and not with respect to the robot’s coordinate system, since this does not require that the product is exactly at the same position as during the acquisition of the reference model. The cognitive robot uses this surprise trigger for making decisions about repair strategies and the next production steps. The results of the VEM-based surprise trigger on the JAHIR sequence are shown in Fig. 9.10. As one can easily recognize, the visual pose estimation is not always accurate. This is due to the highly specular surface of the workpiece and the fact that features can only be found in a limited field of view, which reduces the localization accuracy significantly, as seen in Section 8.2.1. Nevertheless, the virtual rendering works fine, because the motion estimation and the rendering rely on the image domain and, hence, the errors correlate. The camera cannot be localized accurately if there is not much difference between two views, but this also means that one cannot find any visible error in the rendered image either.


(a) preprocessing: visual localization, depth map generation and virtual rendering.

(b) virtual view rendering: left: observation, middle: rendered view and right: reference image for comparison (closest view - manually chosen)

(c) surprise detection: left: image differences (manual sub-pixel accurate alignment using KLT), middle: Bayesian inference (manual sub-pixel accurate alignment using KLT) and right: Bayesian inference (automatic global referencing using SURF)

Figure 9.9: This image sequence visualizes the various processing steps of the VEM-based surprise trigger. In order to test our algorithm for surprise detection, we captured another image sequence on a trajectory which was close to the first one but not identical. We changed the scene by removing the two glasses. The task of the cognitive system is to detect these changes. This image is depicted in the middle row on the left, together with a photorealistic virtual image (middle) rendered from reference images which were selected only from the first image sequence. The right picture shows a view of the reference sequence which has been manually picked for comparison with the virtually rendered image. Please note that there is no real camera image which was acquired exactly at the position of the observation.


(a) left: estimated camera locations, middle: observation, right: rendered view

(b) surprise detection: left: image differences and right: Bayesian inference

Figure 9.10: This image sequence visualizes the results of the VEM-based surprise trigger on the JAHIR sequence. The upper left image shows the acquired camera views; the upper middle and right pictures illustrate the observation and the virtual rendering. The bottom row shows the surprise trigger result based on image differences and on Bayesian inference.


9.3 Autonomous Quadrocopter Flight

A first step towards fully autonomous flying micro aerial vehicles (MAVs) has been shown at the Automatica 2010¹. We equipped a Hummingbird quadrocopter from Ascending Technologies GmbH² with a Beagleboard³ (revision C3) and a Philips webcam SPC900NC⁴. The demonstration was intended to show that it is possible to process camera images in real-time for autonomous flight control on a resource-limited system, as available on MAVs. In the case at hand, we were limited to the ARMv7 core of the Beagleboard running at 600 MHz. The quadrocopter platform is equipped with an IMU and a low-level controller which stabilizes the quadrocopter in flight but cannot prevent its drifting. Our image-based motion estimation provides a local reference and, thus, suppresses the drift.

(a) system setup (b) in-flight

Figure 9.11: The left picture illustrates the bottom view of the system setup, showing the Beagleboard and the Philips webcam. On the right-hand side, the quadrocopter is shown while hovering fully autonomously above a hand-held board with the artificial markers printed on top.

In this scenario, five blue blobs are chosen as artificial markers, defining a coordinate frame. A highly efficient blob detection algorithm is used to identify the sub-pixel accurate locations of the blobs in the image. We use RVGPS, as described in Section 5.2.1, to estimate the image-based pose relative to the blob frame. This pose is then fused with the IMU measurements in an EKF and the result is fed into the non-linear controller. Due to the artificial markers, we can move the reference frame while the quadrocopter follows it⁵. The setup of the system and an image of the flight are illustrated in

¹ http://www.automatica-munich.com/
² www.asctec.de
³ http://beagleboard.org/
⁴ http://www.philips.com/
⁵ An example of operation is available at http://www6.in.tum.de/~maire/videos/quadro automatica.wmv.


Fig. 9.11. The images are processed at 25 Hz, but due to a long delay between image acquisition and availability at the host (up to 100 ms), the artificial landmarks left the FOV of the quadrocopter several times. If the blobs cannot be seen for a long time, the drift of the strapdown computation leads to a divergence of the pose estimation and the quadrocopter cannot find its way back anymore. Due to the lack of robustness of this concept, no further experiments with artificial markers have been made and we have been focusing on a solution with natural landmarks. The required modules for such an approach have been presented in this thesis. The integration of the motion estimation on a flying platform is part of current research.
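The blob detector that ran on the Beagleboard is not detailed here; the snippet below is only a rough sketch of one common way to obtain sub-pixel blob centres, via connected-component labelling and intensity-weighted centroids (scipy.ndimage is assumed; the colour mask, weights, and min_area parameter are illustrative).

```python
import numpy as np
from scipy import ndimage

def blob_centroids(mask, weights=None, min_area=20):
    """Sub-pixel blob centres from a binary mask (e.g. a thresholded blue channel).

    mask:     boolean image, True where the marker colour was detected
    weights:  optional per-pixel weights (e.g. a colour-match score) for the centroid
    min_area: blobs smaller than this are discarded as noise
    """
    if weights is None:
        weights = mask.astype(float)
    labels, num = ndimage.label(mask)
    centroids = []
    for idx in range(1, num + 1):
        ys, xs = np.nonzero(labels == idx)
        if ys.size < min_area:
            continue
        w = weights[ys, xs]
        # The intensity-weighted centroid yields sub-pixel accuracy
        centroids.append((np.sum(xs * w) / np.sum(w),
                          np.sum(ys * w) / np.sum(w)))
    return centroids
```

The resulting image coordinates of the five markers would then be passed to the pose estimation and, subsequently, to the EKF described above.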

Chapter 10

Conclusion

In this work, we presented technical solutions for the biologically motivated navigation approach introduced in Chapter 1. Solutions for several parts of the proposed navigation flow, illustrated in Fig. 10.1, are provided, together with experimental validations and comparisons of the algorithms. Starting at the low-level processing, we worked on feature detection and tracking solutions, motion estimation and its error propagation, as well as sensor registration and sensor fusion methods. Assuming feasible state-of-the-art camera and IMU calibration techniques as well as a valid strapdown computation, we are able to provide efficient inertial-visual odometry which also works for highly dynamic motions, while limiting its drift. In the following sections, we discuss the presented algorithms in a methodic and task-specific context, summarize our contributions, and discuss future work.

10.1 Discussion

In this work, we aim for an efficient and robust navigation concept which also works with highly dynamic motions. A sensor and navigation framework based on the physiology of insects and on behavioral experiments on them has been derived in the introduction. Based on this proposal, we extracted constraints for the image-based motion estimation and attempted to realize them in our algorithms. We achieved efficient feature detection and tracking solutions, like AGAST, which allow for the processing of a large number of features at high frame-rates, similar to the visual system of insects. Furthermore, we mentioned that insects behaviorally separate rotational and translational motion. This is a restriction which is, in general, difficult to realize on technical systems. Nevertheless, we presented the Z∞-algorithm, a method which implements a separated translation and rotation estimation under certain circumstances, which are in general provided for



Figure 10.1: In this figure, we illustrate again the flow diagram which we proposed in the motivation and already depicted in Fig. 1.3. However, this time we highlight the modules for which we presented contributions in this thesis and grey out the ones which are in the focus of future work. At this point we are able to provide efficient inertial-visual odometry. The next major steps consist in providing a sort of topological map as global reference and in implementing efficient homing algorithms.

outdoor applications. In simulations we have shown that such a motion estimation yields a more accurate pose computation than state-of-the-art closed-form solutions which estimate rotation and translation together¹. A crucial aspect is also the identification of a camera system which yields the most accurate motion estimation. Thus, we evaluated the performance of motion estimation algorithms with respect to tracking accuracy, number of features, and aperture angle. The compound eye of insects offers a large FOV which covers almost the whole sphere, while providing only a rather small angular resolution compared to conventional cameras. According to our experiments, the accuracy of the motion estimation does not increase continuously with the FOV. The results reveal that a trade-off between the conditioning of the landmark set and the accuracy of the feature correspondences has to be found. An unduly small aperture angle yields bad results since the motion estimation is ill-posed, while an inadequately large FOV for the same number of pixels also yields bad results due to the decreasing accuracy. There is an optimum around a 90° aperture angle where the conditioning is saturated and cannot get better. The large FOV of insects could be explained by the fact that their visual sensor does not only have to provide odometry

¹ This is, of course, only valid under the premise that sufficient translation-invariant features are available.

measurements, but it also has to allow for reliable scene monitoring and other vision-related tasks (e.g., the visual compass). Furthermore, the accuracy of the visual motion estimation does not only depend on the FOV and the tracked landmarks, but also on the computation method. As already mentioned in Section 3.3, several motion computation techniques are known which require an omnidirectional view. It is not yet clear how the visual odometry is determined in the insect's brain and, thus, a large FOV could also be a stringent condition for insects, even though it is not a general requirement for motion calculation. The simulations have also shown that the accuracy of the pose estimation can be improved by increasing the number of features, the tracking accuracy, or, of course, both. However, it is obvious that the former yields a more robust solution in the presence of outliers. For a proper fusion of measurements, it is essential to provide an estimate of the expected motion accuracy based on a given feature set. The presented first-order propagation has been shown to provide adequate error estimates. However, we still needed to provide the embodiment of the camera and the IMU in terms of a spatial and temporal registration. We have shown that a proper temporal alignment is crucial if high dynamics are applied. Once the registration is known, it is still not trivial to fuse both sensors due to the lack of translational scale of the camera. This estimate suffers from long convergence times, which can be reduced by high frame-rates or by a large distance between IMU and camera, assuming that a significant rotation is provided. We proposed an indirect Kalman filter where the state transition and the measurement uncertainties can be taken into account in their respective coordinate frames. Moreover, it is possible to fuse relative pose measurements, which is required for methods which do not rely on global references, like a map. We also suggest supporting the feature tracking with the IMU measurements, which allows local trackers to be used also for highly dynamic motions.
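The closed-form error propagations derived in this thesis are not repeated here; the sketch below merely illustrates the underlying first-order principle, Σ_y = J Σ_x Jᵀ, with a numerical Jacobian. The function propagate_covariance and its arguments are illustrative and do not correspond to a specific routine of this work.

```python
import numpy as np

def propagate_covariance(f, x, cov_x, eps=1e-6):
    """First-order (linearized) propagation of the covariance of x through f.

    f:     function mapping the stacked feature measurements to a pose parameter vector
    x:     nominal measurement vector (e.g. stacked image coordinates of tracked features)
    cov_x: covariance of the measurements (e.g. isotropic tracking noise)
    """
    x = np.asarray(x, dtype=float)
    y0 = np.atleast_1d(f(x))
    # Numerical Jacobian of f at x (forward differences)
    J = np.empty((y0.size, x.size))
    for j in range(x.size):
        dx = np.zeros_like(x)
        dx[j] = eps
        J[:, j] = (np.atleast_1d(f(x + dx)) - y0) / eps
    # Sigma_y = J Sigma_x J^T
    return J @ cov_x @ J.T
```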

10.2 Contributions

The three most significant contributions of this thesis are:

• AGAST-based tracker: The currently most efficient corner detector and feature tracker. At the time of writing this thesis, the open-source package had been downloaded about one thousand times. Further, it is planned to be released as part of the next OpenCV¹ and PCL² releases. Nonetheless, it is already used in other algorithms with high impact, such as BRISK. [4]

¹ http://opencv.willowgarage.com
² http://openpcl.sourceforge.net


• Batch-optimization based IMU-camera registration: Our presented spline-based spatial alignment achieves more consistent results than state-of-the-art solutions, due to the batch optimization and a more accurate motion representation. [2]

• DLR-3dMo egomotion estimation: The VSLAM-based egomotion estimation reduces the DLR 3D-Modeler to a compact, handy device without physical workspace limitations. As a result, the system is the first extensively multisensory, passively self-referenced high-rate modeling device. [8, 6]

Another major contribution is the Z∞-algorithm, an accurate and efficient alternative for motion estimation in outdoor applications [7]. We derived an error propagation for this algorithm as well as for the eight-point algorithm. For the latter, we also consider the different implementation variants and derive the uncertainty propagation for the rotation matrix. Moreover, we proposed an EKF-based fusion of IMU and camera which allows for relative motion updates and considers the measurement and state uncertainties in their respective spatial dimensions. Besides this, we were among the first to show a fully autonomous MAV processing images for pose estimation on-board at high frame-rates. Our contributions have been published in multiple publications listed in the publications section.

10.3 Future Work

Before discussing future work towards the goal of efficient and reliable mobile robot navigation, we treat each of the presented algorithms independently.

10.3.1 Methods Specific Future Work

Spatial Alignment

To allow for higher dynamics, we want to evaluate sequences with natural landmarks in the future. However, without any knowledge about the environment we can only estimate the camera translation up to scale. The presented framework already provides such a scale parameter, which makes it easy to adapt. It would be interesting to evaluate whether the higher signal-to-noise ratio of the IMU would still allow for a more accurate calibration even with less precise landmark tracking. The number and the location of the B-spline knots have been determined empirically in this work. Estimating the optimal number and location of the knots would speed

up the processing and make it adaptive to arbitrary dynamics. First experiments in this direction are promising. Another possible improvement is a direct fusion of the landmark locations with the IMU motion, avoiding the currently preceding camera pose estimation. While this will increase the complexity of the optimization framework, we expect a higher accuracy.

Sensor Fusion

The presented sensor fusion framework is going to be tested and evaluated on real data, combining efficient tracking, pose estimation, and fusion. Furthermore, it will be interesting to see what effect the cross-correlation introduced by the pseudo-measurement update has on the stability of the fusion. We do not exclude that one gets a more stable filter with better convergence properties by keeping the unmodeled cross-correlations in the measurement uncertainties. Furthermore, we would like to address the cross-correlation between the camera-based scale and pose estimation as well as the bias of the accelerometers to improve the stability of the filter.

Motion Estimation

A combination of the eight-point and the Z∞-algorithm could overcome the limitation of the Z∞-approach to outdoor environments, while still yielding a higher accuracy than the eight-point algorithm alone. A nonlinear optimization step at the end, which uses the closed-form solution as a starting point, could further improve the estimate. However, it depends on the application whether such a high accuracy is required at the cost of processing time.

An intelligent parameter adaption for the Z∞-algorithm would simplify and gen- eralize its application. The parameters could, e.g., be adapted based on the previous estimation step.

Feature Tracking

It would be interesting to improve the tracking of AGAST to sub-pixel precision. Thus, one could adapt the accuracy of the motion estimation depending on the application and the processing resources. Besides that, image pyramids can be used to preserve the local tracking assumptions even at low frame-rates.


10.3.2 Task Specific Future Work

Considering again the proposed navigation concept as illustrated in Figs. 1.3 and 10.1, we presented solutions for the modules IMU-camera registration, measurement fusion, optical flow estimation, motion estimation, and its error propagation. Moreover, we assume that the state of the art provides feasible approaches for camera and IMU calibration and for strapdown computation. Hence, the next steps will be to implement a topological map based on image features and to come up with efficient homing strategies which allow the robot to find its way back to an exact position. Which features to use and how the map should be organized are still open questions and part of current research. However, we are of the opinion that not only one kind of feature, but several different and complementary features should be used. Moreover, the idea of a visual compass based on omnidirectional cameras is also highly interesting and in the focus of current work.

Publications

[1] Mair, E., M. Fleps, M. Suppa, and D. Burschka. Spatio-Temporal Ini- tialization for IMU to Camera Registration. In International Conference on Robotics and Biomimetics (ROBIO). IEEE, 2011. Best Paper Finalist.

[2] Mair, E., M. Fleps, O. Ruepp, M. Suppa, and D. Burschka. Optimization Based IMU Camera Calibration. In International Conference on Intelligent Robots and Systems (IROS). IEEE/RSJ, 2011.

[3] K.H. Strobl, Mair, E., and G. Hirzinger. Image-Based Pose Estimation for 3-D Modeling in Rapid, Hand-Held Motion. In International Conference on Robotics and Automation (ICRA). IEEE, 2011.

[4] Mair, E., G. Hager, D. Burschka, M. Suppa, and G. Hirzinger. Adap- tive and generic corner detection based on the accelerated segment test. European Conference on Computer Vision (ECCV), pages 183–196, 2010.

[5] W. Maier, F. Bao, Mair, E., E. Steinbach, and D. Burschka. Illumination-invariant image-based novelty detection in a cognitive mo- bile robot’s environment. In International Conference on Robotics and Au- tomation (ICRA), pages 5029–5034. IEEE, 2010.

[6] Mair, E., K.H. Strobl, T. Bodenmüller, M. Suppa, and D. Burschka. Real-time Image-based Localization for Hand-held 3D-modeling. KI - Künstliche Intelligenz, 24(3):207–214, 2010.

[7] Mair, E. and D. Burschka. Mobile Robots Navigation, chapter Z∞ - Monocular Localization Algorithm with Uncertainty Analysis for Outdoor Applications, pages 107 – 130. In-Tech, March 2010.


[8] Mair, E., K.H. Strobl, M. Suppa, and D. Burschka. Efficient camera- based pose estimation for real-time applications. In International Con- ference on Intelligent Robots and Systems (IROS), pages 2696–2703. IEEE/RSJ, 2009.

[9] K.H. Strobl, Mair, E., T. Bodenmüller, S. Kielhöfer, W. Sepp, M. Suppa, D. Burschka, and G. Hirzinger. The self-referenced DLR 3D-modeler. In International Conference on Intelligent Robots and Systems (IROS), pages 21–28. IEEE/RSJ, 2009. Best Paper Finalist.

[10] W. Maier, Mair, E., D. Burschka, and E. Steinbach. Visual homing and surprise detection for cognitive mobile robots using image-based environment representations. In International Conference on Robotics and Automation (ICRA), pages 807–812. IEEE, 2009.

[11] D. Burschka and Mair, E. Direct Pose Estimation with a Monocular Camera. In 2nd International Workshop on Robot Vision, pages 440–453, 2008.

[12] D. Burschka and Mair, E. Biologically Motivated Optical Flow-Based Navigation. In Workshop Biological Models in Spatial Cognition, 2008.

[13] W. Maier, Mair, E., D. Burschka, and E. Steinbach. Surprise Detection and Visual Homing in Cognitive Technical Systems. In 1st International Workshop on Cognition for Technical Systems, 2008.

[14] Mair, E., W. Maier, D. Burschka, and E. Steinbach. Image-Based Envi- ronment Perception for Cognitive Technical Systems. In 1st International Workshop on Cognition for Technical Systems, 2008.

Appendix A

Appendix

A.1 Image Pyramids

The concept of image pyramids is a powerful way to reduce processing time by iteratively refining an image processing task. For that, the image is downsampled into one or more images of lower resolution; the required task is executed first on the smallest image and its result is used as the starting point on the next higher level of resolution. This process is repeated until the original image resolution is reached. Most processing libraries support image pyramids, but most of them only provide prebuilt methods which reduce the image size by a factor of four in each step. There is no way to define other subsampling step-sizes or to change the smoothing factor before subsampling. In this section, we want to show how a proper filter length for a certain step-size is calculated. Subsampling suffers from aliasing if the signal, in our case the image, is not lowpass-filtered (smoothed) in advance. In general, Binomial filters are used to allow for an efficient implementation. This filter type is an approximation of the Gaussian filter and converges to a Gaussian for large filter orders. The coefficients of a Binomial filter can be computed by the binomial formula

\binom{n}{k} = \frac{n!}{k!\,(n-k)!}   (A.1)

and correspond to the respective level n of Pascal's triangle, as illustrated in Table A.1. The question we want to answer here is which filter mask should be applied for a certain downsampling. To prevent aliasing in the downsampled pyramid levels, all frequencies larger than the cut-off frequency f_c should be suppressed. The cut-off


Table A.1: Binomial coefficients (Pascal's triangle) as used for the Binomial filter of order n, with the normalization factor s = 2^{-n} and the filter variance σ² = n/4. To compute an image pyramid, only the masks of even order n should be used, due to the symmetry characteristic required for image pyramids.

n    s       filter coefficients            σ²
0    1       1                              0
1    1/2     1 1                            1/4
2    1/4     1 2 1                          1/2
3    1/8     1 3 3 1                        3/4
4    1/16    1 4 6 4 1                      1
5    1/32    1 5 10 10 5 1                  5/4
6    1/64    1 6 15 20 15 6 1               3/2
7    1/128   1 7 21 35 35 21 7 1            7/4
8    1/256   1 8 28 56 70 56 28 8 1         2

frequency has to be chosen according to the Shannon-Nyquist theorem, which says that only frequencies up to half the sampling frequency f_s can be represented:

f < \frac{f_s}{2} \quad\Rightarrow\quad f_c \overset{!}{=} \frac{f_s}{2\,p_{ss}},   (A.2)

with p_{ss} denoting the number of pixels skipped by the subsampling.

The attenuation of a certain frequency can be obtained from the frequency response of the 1D Binomial filter {}^nB_1, which is

{}^nB_1 \;\longleftrightarrow\; {}^n b_1 = \cos^n\!\left(\frac{\pi}{2}\,k\right),   (A.3)

where k is the normalized wavenumber with the following relation to the sampling frequency:

f = k\,\frac{f_s}{2}.   (A.4)

The relation to the subsampling can be determined by inserting Eq. A.4 and the cut-off frequency into Eq. A.3:

{}^n b_{1,c} = \cos^n\!\left(\pi\,\frac{f_c}{f_s}\right) = \cos^n\!\left(\frac{\pi}{2\,p_{ss}}\right).   (A.5)


Solving the above equation for n, we obtain

n = \frac{\log(b_{1,c})}{\log\!\left(\cos\!\left(\frac{\pi}{2\,p_{ss}}\right)\right)}.   (A.6)

In general, 3 dB are used as cut-off attenuation (b_{1,c} = 0.5). Finally, the next larger even integer \hat{n} should be used to fulfill the symmetry condition of the filter, which is required to preserve the pixel correspondences between the pyramid levels:

\hat{n} = \lceil n \rceil_{\mathrm{even}}.   (A.7)

The transfer function of a 2D Binomial filter can easily be derived from the one-dimensional one (see [231]) and yields

{}^nB_2 \;\longleftrightarrow\; {}^n b_2 = {}^n b_{1,x}\, {}^n b_{1,y} = \cos^n\!\left(\frac{\pi}{2}\,k_x\right)\cos^n\!\left(\frac{\pi}{2}\,k_y\right).   (A.8)

A Taylor series evaluation in cylinder coordinates (k, θ) shows that the resulting filter is isotropic up to the second order:

{}^n b_2 \approx 1 - \frac{n}{8}(\pi k)^2 + \frac{2n^2 - n}{256}(\pi k)^4 - \frac{n\cos(4\theta)}{768}(\pi k)^4.   (A.9)

The fourth-order term contains an isotropic and an anisotropic part. However, the influence of the anisotropic part is reduced by increasing the filter order, because it grows only linearly, as opposed to the quadratic growth of the isotropic term. Frequencies which are diagonal to the image axes are sampled more sparsely (by a factor of √2), but, due to the sparser sampling, they are also attenuated more strongly, which guarantees aliasing suppression in all directions.
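As a small numerical illustration of Eqs. A.6 and A.7 (not part of the original text), the required even filter order and the corresponding mask from Table A.1 could be computed as follows; binomial_order and binomial_kernel are hypothetical helper names.

```python
import math
import numpy as np

def binomial_order(p_ss, b_cut=0.5):
    """Filter order for a subsampling step p_ss and cut-off attenuation b_cut (Eqs. A.6/A.7)."""
    n = math.log(b_cut) / math.log(math.cos(math.pi / (2.0 * p_ss)))
    n_hat = int(math.ceil(n - 1e-9))      # small tolerance against floating-point fuzz
    return n_hat if n_hat % 2 == 0 else n_hat + 1

def binomial_kernel(n):
    """Normalized 1D Binomial filter mask of order n (row n of Pascal's triangle)."""
    coeffs = np.array([math.comb(n, k) for k in range(n + 1)], dtype=float)
    return coeffs / (2.0 ** n)

# Example: skipping every second pixel per pyramid level
n_hat = binomial_order(p_ss=2)            # -> 2, i.e. the mask 1/4 * [1 2 1]
kernel = binomial_kernel(n_hat)
```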


A.2 Boxplots

A box-and-whisker diagram, or simply boxplot, is a convenient way of graphically depicting a set of samples. Plotting only the median or mean of a set of samples does not reveal much information about its distribution. To compare the robustness of different algorithms, the variance and the amount of outliers should also be visualized. All this is provided by boxplots. Boxplots depict the following characteristics of a population, which are illustrated in Fig. A.1:

• Bottom and top of the box are the 1st and 3rd quartiles (25th and 75th percentiles).

• The red line within the box is the median (50th percentile).

• Points are interpreted as outliers and drawn as red plusses if they lie more than 1.5 times the interquartile range (IQR = 3rd quartile minus 1st quartile) below the first or above the third quartile, respectively. Hence, the lower and upper limits are 1st quartile − 1.5 IQR and 3rd quartile + 1.5 IQR. The default factor of 1.5 corresponds to approximately ±2.7σ and 99.3% coverage if the data is normally distributed.

• The whiskers denote the adjacent values, i.e. the most extreme data values that are not identified as outliers according to the above criteria.


Figure A.1: A boxplot is a convenient way to illustrate the distribution characteristics of a statistical population. The median, the 1st and 3rd quartiles, the extent of the distribution, and the outliers are visualized by this representation. This plot has been generated from 1000 normally distributed random samples.
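As a small illustration (not from the original text), the quantities listed above can be computed with numpy as follows; boxplot_stats is a hypothetical helper name and the whisker factor of 1.5 is the default discussed above.

```python
import numpy as np

def boxplot_stats(samples, whisker=1.5):
    """Quartiles, whisker limits and outliers as used for the boxplots in this work."""
    x = np.asarray(samples, dtype=float)
    q1, med, q3 = np.percentile(x, [25, 50, 75])
    iqr = q3 - q1
    lo_limit, hi_limit = q1 - whisker * iqr, q3 + whisker * iqr
    inliers = x[(x >= lo_limit) & (x <= hi_limit)]
    return {
        "median": med,
        "q1": q1,
        "q3": q3,
        # Whiskers end at the most extreme data values that are not outliers
        "lower_whisker": inliers.min(),
        "upper_whisker": inliers.max(),
        "outliers": x[(x < lo_limit) | (x > hi_limit)],
    }

# Example with 1000 normally distributed samples, as in Fig. A.1
stats = boxplot_stats(np.random.randn(1000))
```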


A.3 Image-Based Global Referencing

The localization introduced in Section 9.1 uses natural landmarks. However, based on local tracking alone it is not possible to set up the pose estimation with the same origin and coordinate frame again. Hence, a global reference is required. This can be achieved by using global feature trackers which allow natural landmarks to be found even after larger affine distortions. The most popular examples are SIFT [120, 121] and SURF [122, 123]. The latter is more efficient while providing similar performance and, thus, we use it in our implementation. The SURF features can be initialized by structure-from-motion or by stereo. Experiments have shown that the accuracy is higher if the feature depth is computed by stereo triangulation with a shorter baseline instead of a larger but error-prone baseline. One has to consider that the matching accuracy of SURF is also reduced if the affine transformation increases at large baselines. Further, it is possible to estimate the pose in 3D space or by using RVGPS (see Section 5.2.1). The former method needs four images while the latter requires only three. Fig. A.2 depicts the principle of the algorithms. In the following we sketch the two methods: The 3-images global referencing approach consists of two steps: First, the 3D structure of the SURF features of the image at the origin has to be initialized. Then, an arbitrary image with enough SURF correspondences can be spatially related to the reference landmark set using RVGPS. The 4-images global referencing approach, on the other hand, consists of three steps: Not only the 3D structure of the SURF feature set at the origin has to be computed, but also that of the landmarks which shall be spatially aligned. Then, one can simply align the two 3D point clouds using, e.g., the Umeyama algorithm described in [214]; a sketch of this alignment step is given at the end of this section. The 4-images approach is supposed to be more accurate, because we do not estimate the transformation matrix and the structure of the point set at the same time, as in the 3-images approach. The drawback is that we need to find SURF matches in four images, which is more problematic than with three due to the smaller intersection of common features. Which algorithm to use, therefore, strongly depends on the application and the scene. Since the errors do not vary much (compare the subfigures in Fig. A.3), the more robust but less accurate 3-images algorithm should be preferred in most cases. One problem arises when using an image-based global reference for 3D-modeling: a drawback of global feature trackers is that they, in general, are not real-time capable


(a) global referencing using three images (b) global referencing using four images

Figure A.2: These drawings sketch the principles of the 3- and the 4-images global referencing methods.

(a) (b)

Figure A.3: These pictures show the output of the Bayesian surprise trigger as described in Section 9.2, where the left image is generated using the 3-images global referencing algorithm and the right one using the 4-images method. Even if the results of the 4-images approach seem to be more accurate, the one based on 3 images is preferable due to its improved robustness.

on conventional processors. Hence, three tasks run in parallel: feature tracking, feature initialization, and global referencing. This requires a state machine to synchronize the different image processing modules, as sketched in Fig. A.4.
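As a sketch of the point-cloud alignment step of the 4-images approach, the following shows the rigid (scale-free) closed-form alignment in the spirit of the Umeyama method [214]; the function name rigid_align and the array layout are illustrative assumptions, not the implementation used in this work.

```python
import numpy as np

def rigid_align(src, dst):
    """Least-squares rigid transform (R, t) such that dst ≈ R @ src + t.

    src, dst: corresponding 3D points as (N, 3) arrays (e.g. triangulated SURF features)
    """
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    # Cross-covariance of the centred point sets
    H = (src - mu_s).T @ (dst - mu_d)
    U, _, Vt = np.linalg.svd(H)
    # Reflection handling as in Umeyama's method
    S = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ S @ U.T
    t = mu_d - R @ mu_s
    return R, t
```

The resulting transform relates the newly initialized landmark set to the reference set at the origin.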


Figure A.4: The image-processing modules used for localization and global referencing. The synchronization triggers which ensure that the images of the different modules are synchronized are not illustrated for the sake of clarity.


A.4 Linear System Reduction

The skew-symmetric matrices in Eq. 4.42 represent cross-products. By using a dot-product at this point and, thus, stacking all \left({}^{C}\hat{p}_{\Delta i} + {}^{G}\hat{p}_{\Delta i}\right)^T row-wise, the size of the matrix for the SVD, B_T^{C+G}, would be reduced to a third:

U_T\,\Sigma_T\,V_T^T = \mathrm{SVD}\!\left(B_T^{C+G}\right).   (A.10)

In the following we show how the results of Eq. A.10 can be used to calculate the covariance of P_{{}^{G}p'_{C}}. The right singular vectors of a matrix B correspond to the eigenvectors of B^T B and the singular values of B are the square roots of the eigenvalues of B^T B. Let us compare the eigenvalues and eigenvectors of B_T^{C+G\,T} B_T^{C+G} and B_{[\cdot]_\times}^{C+G\,T} B_{[\cdot]_\times}^{C+G}:

B_T^{C+G\,T} B_T^{C+G} = \sum_{i=1}^{N} a_i a_i^T   (A.11)

B_{[\cdot]_\times}^{C+G\,T} B_{[\cdot]_\times}^{C+G} = -\sum_{i=1}^{N} [a_i]_\times^2 = \sum_{i=1}^{N} \left( a_i^T a_i\, I - a_i a_i^T \right)   (A.12)

with a_i being the single vector sums {}^{C}\hat{p}_{\Delta i} + {}^{G}\hat{p}_{\Delta i}. In the following, let S = \sum_{i=1}^{N} a_i a_i^T and d = \sum_{i=1}^{N} \|a_i\|^2. Eqs. A.11 and A.12 can then be rewritten as

B_T^{C+G\,T} B_T^{C+G} = S \quad\text{and}\quad B_{[\cdot]_\times}^{C+G\,T} B_{[\cdot]_\times}^{C+G} = dI - S.   (A.13)

The eigenvalues can now be calculated by solving the equation \det(A - \lambda I) = 0, which yields for the dot-product

\det(S - \lambda_T I) = 0   (A.14)

and for the cross-product

\det\!\left((dI - S) - \lambda_{[\cdot]_\times} I\right) = 0   (A.15)

\det\!\left(S - (d - \lambda_{[\cdot]_\times}) I\right) = 0,   (A.16)

resulting in the relations

\lambda_T = d - \lambda_{[\cdot]_\times} \quad\text{and}   (A.17)

\sigma_T = \sqrt{d - \sigma_{[\cdot]_\times}^2} = \sqrt{\sum_{i=1}^{3} \sigma_{[\cdot]_\times;i}^2 - \sigma_{[\cdot]_\times}^2}   (A.18)

for the singular values σ. This corresponds to the sine and cosine relation between the magnitudes of the cross- and dot-product

\|a_i\|\,\sin(\varphi) = \sqrt{\|a_i\|^2 - \|a_i\|^2 \cos^2(\varphi)}   (A.19)

and leads to

\Sigma_{[\cdot]_\times} = I_M \sqrt{\mathrm{tr}\!\left(\Sigma_T^2\right) I - \Sigma_T^2}\; I_M   (A.20)

with the mirrored identity matrix I_M, which switches the rows and columns to preserve the descending order of the singular values:

I_M = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix}.

The eigenvectors v_T and v_{[\cdot]_\times} and, hence, V_T and V_{[\cdot]_\times} of B_T^{C+G\,T} B_T^{C+G} and B_{[\cdot]_\times}^{C+G\,T} B_{[\cdot]_\times}^{C+G} are equal, which can be shown by applying Eq. A.17:

S v_T = \lambda_T v_T   (A.21)

(dI - S)\, v_{[\cdot]_\times} = \lambda_{[\cdot]_\times} v_{[\cdot]_\times}
d\, v_{[\cdot]_\times} - S v_{[\cdot]_\times} = d\, v_{[\cdot]_\times} - \lambda_T v_{[\cdot]_\times}
S v_{[\cdot]_\times} = \lambda_T v_{[\cdot]_\times}.   (A.22)

However, the SVD may yield different signs for the eigenvectors, but since we multiply V_T with its transpose to compute the covariance matrix P_{{}^{G}p'_{C}}, the signs of the eigenvectors are insignificant for our application in Section 4.2.4. Further, the SVD yields the singular values in descending order and, hence, the columns of the V's and the rows of the Σ's will be inverted. Combining the matrices to compute P_{{}^{G}p'_{C}}, one will see that the permutations are neutralized, such that

P_{{}^{G}p'_{C}} = V_T \left( \mathrm{tr}\!\left(\Sigma_T^2\right) I - \Sigma_T^2 \right)^{-1} V_T^T.   (A.23)
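The relations of Eqs. A.11 to A.17 can be checked numerically; the following snippet (not part of the original derivation) builds S and dI − S from random vectors a_i and verifies that the eigenvalues differ by d while the eigenvectors coincide.

```python
import numpy as np

def skew(a):
    """Maps a 3-vector to the skew-symmetric matrix [a]_x."""
    return np.array([[0.0, -a[2], a[1]],
                     [a[2], 0.0, -a[0]],
                     [-a[1], a[0], 0.0]])

rng = np.random.default_rng(0)
a = rng.normal(size=(10, 3))                 # stand-ins for the vectors a_i

S = sum(np.outer(ai, ai) for ai in a)        # B_T^T B_T (dot-product formulation)
C = sum(-skew(ai) @ skew(ai) for ai in a)    # B_x^T B_x (cross-product formulation)
d = sum(ai @ ai for ai in a)

# C equals d*I - S, so its eigenvalues are d - lambda_T and the eigenvectors coincide
assert np.allclose(C, d * np.eye(3) - S)
lam_T = np.linalg.eigvalsh(S)
lam_X = np.linalg.eigvalsh(C)
assert np.allclose(np.sort(lam_T), np.sort(d - lam_X))
```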


A.5 Electronically Synchronized IMU-Camera Setup

We have seen in Section 8.5.1 that it is crucial to provide a proper temporal alignment of the sensors, especially if highly dynamic motions are applied. In general, IMUs do not allow their measurements to be triggered; thus, the synchronization concepts are not trivial anymore. In a student project, we tried to realize an electronically synchronized embedded sensor system. As embedded processing platform we chose the Beagleboard. To allow for a fast and DMA (direct memory access) capable image transmission without blocking the USB bus, we designed a frame-grabber which is based on the MMC (multi media card) bus. The major components of the frame-grabber are an FPGA, a video converter, and two fast SRAM units. For more details please refer to [232]. The vertical synchronization signal of the video codec is used to trigger a self-made IMU (consisting of 3 gyroscopes and 3 accelerometers). An on-board RISC processor (ARM) pre-processes and stores the high-rate measurements of the sensors. Each time a trigger arrives, the samples are sent as a bundle to the Beagleboard, where the image and the corresponding IMU information can be fused [233]. The Linux kernel on the Beagleboard is extended by respective modules which allow for a regular communication with the new devices [234].


Figure A.5: This graphic illustrates the components of the hardware-synchronized IMU- camera setup. The IMU generates image specific inertial measurement bundles, triggered by the vertical synchronization signal of the camera.

Fig. A.5 sketches a block diagram for the hardware modules and communication

interfaces of the synchronized IMU-camera setup, and Fig. A.6 shows the first iteration of the system's prototype.

Figure A.6: This picture shows the first iteration of a prototypical hardware-synchronized IMU-camera setup. The processing platform is the Beagleboard (red bottom board on the left-hand side). The MMC-based frame-grabber is on top of it, and the IMU board, which would be the top layer of the stack, is put aside for the sake of visibility.


Mathematical Notation

Table A.2: Explanation of the mathematical notation.

Symbols / Examples Description

n: Scalar variables are italic lower case letters.
N: Scalar constants are italic upper case letters.
v: Vectors are bold, italic lower case letters.
M: Matrices are bold, italic upper case letters. This notation is also used for points in 3D space.
f(x), cos(x), R(x): Functions are one or more regular lower case letters or upper case letters, depending on whether scalars or matrices are returned.
S, W: Sets and coordinate frames are represented by upper case calligraphic letters.
|S|, |x|: Represents the number of elements in the set S or the absolute value of a scalar x.
{\cdot}_{i=A}^{B}: Denotes a set which contains all elements for i \in [A \ldots B] \subset \mathbb{Z}.
M_{2,3}, M_{-,3}: Elements of matrices are accessed by index tuples as subscripts in the order row then column, whereas a dash represents the whole row or column, respectively. Counting starts at one.

M_{3\times4}: The dimensions of a matrix are described as a subscript, e.g., a matrix with 3 rows and 4 columns would be written as shown on the left.



I_N: The N-dimensional identity matrix is denoted as shown on the left.

0_N and 1_N: Column vectors of all 0 or all 1, respectively. The N may also be omitted if the size of the vector is clear from the context.

0_{R\times C} and 1_{R\times C}: Matrices of all 0 or all 1, respectively, where R denotes the number of rows and C the number of columns.
tr(M): The trace of a matrix M is expressed as shown on the left.
diag(x): This function is used twofold: applied to a vector, it transforms it into a diagonal matrix; applied to a matrix, it extracts the diagonal elements of the matrix as a column vector.
Diag(x): This function transforms a matrix into a block-diagonal one.
\otimes: This operator denotes the Kronecker product.
\circ: This symbol is used for the Hadamard product (entry-wise matrix multiplication).

\cong: Linear approximations are expressed by this symbol.
\overset{\mathrm{def}}{=}: If a new variable is introduced at the end of an equation, it is marked by this symbol, so that the reader is not surprised by an unknown variable.
\overset{!}{=}: To emphasize that an equation is valid by definition, we use this representation.
E[x]: This function computes the expectation value of a random variable x.
x^{-}: A minus as trailing superscript denotes the prior estimate, e.g., before applying the update step of a Kalman filter.
x^{+}: A plus as trailing superscript denotes the posterior estimate, e.g., after applying the update step of a Kalman filter.



[\cdot]_{col}, [\cdot]_{row}: For some derivations it is necessary to reshape a matrix into a single column or row vector. For that we introduce these operators, which stack all columns or rows on top of or beside each other, respectively, e.g., a = [A_{3\times3}]_{col} = (A_{1,1}, A_{2,1}, A_{3,1}, A_{1,2}, \ldots, A_{3,3})^T.
col[\cdot]_{i=1}^{N}, row[\cdot]_{i=1}^{N}: If indices are required for the expression to stack, it may also be denoted as shown on the left.
perm_{(1,3,2,4)}: This function permutes the rows of a matrix according to the row indices in the subscript.
\|\cdot\|, \|\cdot\|_2: If no matrix norm is specified, the Euclidean norm is assumed.
\|\cdot\|_F: This operator represents the Frobenius norm.
rk(M): This function returns the rank of a matrix M.
SVD(M): This function returns three matrices as the result of the singular value decomposition of the matrix M (M = U\Sigma V^T). The singular values are in decreasing order.
EVD(M): This function computes the eigenvalue decomposition of the matrix M and returns two matrices (M = Q\Lambda Q^{-1}). The eigenvalues are in decreasing order.
[\cdot]_\times: This operator denotes the mapping from a vector to the skew-symmetric matrix which replaces the cross product: [\cdot]_\times : (x, y, z)^T \mapsto \begin{pmatrix} 0 & -z & y \\ z & 0 & -x \\ -y & x & 0 \end{pmatrix}.



SO: This abbreviation denotes the special orthogonal group (the space of rotation matrices).
F(\omega): This function transforms a signal into the frequency domain.
\longleftrightarrow: This sign denotes the same signal in the time and in the frequency domain.
arg(c): This function computes the argument of a complex number c, which corresponds to its phase.
{}^{I}x: A leading superscript denotes the frame of reference in which a measure is expressed.
x^{C}: A trailing superscript denotes the sensor with which a measure has been gathered.

x_i, x_t: Trailing subscripts are in general indices or time references.

x_{A;i}: If more subscripts are necessary to specify a variable, they are separated by a semicolon.
{}^{B}R_{A}: This expression denotes a rotation from the coordinate frame A to B, represented by a DCM.
{}^{C}t_{AB}: This term represents the translation from coordinate frame A to B, expressed in the coordinate frame C.
DCM(...): This function converts an arbitrary rotation representation into a DCM (see [111]).
Q(...): This function converts an arbitrary rotation representation into a quaternion (see [111]).
\bar{q}: The bar over a quaternion denotes the inverse rotation.
med(S): Computes the median of the set S.
\odot: This operator denotes quaternion multiplication.
\prod_{i=1}^{N}{}^{\odot}: This symbol represents the quaternion product operator, which computes the product of successive rotations.



List of Abbreviations

Table A.3: List of abbreviations used within this work.

Abbreviation Description

AGAST: adaptive and generic AST
AM: active matching
AST: accelerated segment test
ARM: advanced RISC machine
BFGS: Broyden-Fletcher-Goldfarb-Shanno (method)
BRIEF: binary robust independent elementary features
BRISK: binary robust invariant scalable keypoints
CenSurE: center surround extrema
CoTeSys: cognition for technical systems (elite cluster)
CPU: central processing unit
DCM: direction cosine matrix (rotation matrix)
DMA: direct memory access
DOF, DOFs: degree(s) of freedom
DoG: difference of Gaussians
DRA: dorsal rim area
EKF: extended Kalman filter
FAST: features from AST
FOC: focus of contraction



FOE: focus of expansion
FOV: field of view
FPGA: field-programmable gate array
GPS: global positioning system
GPU: graphics processing unit
IMU: inertial measurement unit
IR: infrared
IQR: interquartile range
KLD: Kullback-Leibler divergence
KLT: Kanade-Lucas-Tomasi (feature tracker)
LADAR: laser detection and ranging
LIDAR: light detection and ranging
LOS: line-of-sight (coordinate system)
LRS: laser range scanner
LSP: laser stripe profiler
MEMS: micro-electro-mechanical systems
ML: maximum likelihood
MMC: multi media card
MRID: minimal rotation-invariant descriptor
NMSM: non-maximum suppression measure
OAST: optimal AST
PTAM: parallel tracking and mapping
RGB: red-green-blue
RISC: reduced instruction set computer
ROI: region of interest
RVGPS: robustified VGPS



S/N: signal-to-noise ratio
SIFT: scale-invariant feature transform
SLAM: simultaneous localization and mapping
SRAM: static random-access memory
SSD: sum of squared differences
SURF: speeded up robust features
SUSAN: smallest USAN
TOF: time of flight
UKF: unscented Kalman filter
USAN: uni-value segment assimilating nucleus
USB: universal serial bus
VEM: visual environment modeling
VGPS: visual GPS


References

[15] G.K. Taylor and H.G. Krapp. Sensory systems and flight stability: What do insects measure and why? Advances in insect physiology, 34:231– 316, 2007.

[16] R.J. Greenspan. Invertebrate neurobiology. Number 49. Cold Spring Harbor Laboratory Pr, 2007.

[17] E. Warrant and D.E. Nilsson. Invertebrate vision. Cambridge Univ Pr, 2006.

[18] R. Wehner, R.D. Harkness, and P. Schmid-Hempel. Foraging strategies in individually searching ants, Cataglyphis bicolor (Hymenoptera: Formicidae), 1. G. Fischer, 1983.

[19] Y. Camlitepe and D.J. Stradling. Wood ants orient to magnetic fields. Proceedings: Biological Sciences, 261(1360):37–41, 1995.

[20] T. Ritz, D.H. Dommer, and J.B. Phillips. Shedding light on vertebrate magnetoreception. Neuron, 34(4):503–506, 2002.

[21] K.J. Lohmann, C.M.F. Lohmann, L.M. Ehrhart, D.A. Bagley, and T. Swing. Geomagnetic map used in sea-turtle navigation. Nature: Inter- national Weekly Journal of Science, 2004.

[22] K.J. Lohmann, C.M.F. Lohmann, and N.F. Putman. Magnetic maps in animals: nature’s GPS. Journal of Experimental Biology, 210(21):3697, 2007.

[23] P. Graham and K. Cheng. Ants use the panoramic skyline as a visual cue during navigation. Current Biology, 19(20):R935–R937, 2009.

[24] S.F. Reid, A. Narendra, J.M. Hemmi, and J. Zeil. Polarised skylight and the landmark panorama provide night-active bull ants with com- pass information during route following. Journal of Experimental Biology, 214(3):363, 2011.


[25] K. Fent and R. Wehner. Oceili: A Celestial Compass in the Desert Ant Cataglyphis. Science, 228(4696):192, 1985.

[26] R. Wehner and M. Müller. The significance of direct sunlight and polarized skylight in the ant's celestial system of navigation. Proceedings of the National Academy of Sciences, 103(33):12575, 2006.

[27] T. Labhart and E.P. Meyer. Detectors for polarized skylight in insects: a survey of ommatidial specializations in the dorsal rim area of the compound eye. Microscopy research and technique, 47(6):368–379, 1999.

[28] SB Laughlin and GA Horridge. Angular sensitivity of the retinula cells of dark-adapted worker bee. Journal of Comparative Physiology A: Neuroethol- ogy, Sensory, Neural, and Behavioral Physiology, 74(3):329–335, 1971.

[29] R. Seidl and W. Kaiser. Visual field size, binocular domain and the ommatidial array of the compound eyes in worker honey bees. Jour- nal of Comparative Physiology A: Neuroethology, Sensory, Neural, and Behavioral Physiology, 143(1):17–26, 1981.

[30] W. Stürzl, N. Boeddeker, L. Dittmar, and M. Egelhaaf. Mimicking honeybee eyes with a 280° field of view catadioptric imaging system. Bioinspiration & Biomimetics, 5:036002, 2010.

[31] W. Livingston. Landscape as viewed in the 320-nm ultraviolet. JOSA, 73(12):1653–1657, 1983.

[32] R. Möller. Insects could exploit UV-green contrast for landmark navigation. Journal of theoretical biology, 214(4):619–631, 2002.

[33] L.G. Bishop and D.G. Keehn. Neural correlates of the optomotor re- sponse in the fly. Biological Cybernetics, 3(6):288–295, 1967.

[34] M.V. Srinivasan and RL Gregory. How Bees Exploit Optic Flow: Be- havioural Experiments and Neural Models [and Discussion]. Philosoph- ical Transactions of the Royal Society of London. Series B: Biological Sciences, 337(1281):253–259, 1992.

[35] J. Zeil, R. Sandeman, and D. Sandeman. Tactile localisation: the function of active antennal movements in the crayfish Cherax destructor. Journal of Comparative Physiology A: Neuroethology, Sensory, Neural, and Behavioral Physiology, 157(5):607–617, 1985.


[36] D.C. Sandeman. Angular acceleration, compensatory head movements and the halteres of flies (Lucilia serricata). Journal of Comparative Physi- ology A: Neuroethology, Sensory, Neural, and Behavioral Physiology, 136(4):361– 367, 1980.

[37] MA Gibbs and DPM Northmore. The role of torus longitudinalis in equilibrium orientation measured with the dorsal light reflex. Brain, Behavior and Evolution, 48(3):115–120, 1996.

[38] E. Balkovsky and B.I. Shraiman. Olfactory search at high Reynolds number. Proceedings of the National Academy of Sciences of the United States of America, 99(20):12589, 2002.

[39] H.G. Wallraff and M.O. Andreae. Spatial gradients in ratios of atmo- spheric trace gases: a study stimulated by experiments on bird naviga- tion. Tellus B, 52(4):1138–1157, 2000.

[40] G. Wallraff et al. Zur olfaktorischen Navigation der Vögel. Journal für Ornithologie, 144(1):1–32, 2003.

[41] H.G. Wallraff. Avian olfactory navigation: its empirical foundation and conceptual state. Animal Behaviour, 67(2):189–204, 2004.

[42] A. Stumpner and D. Von Helversen. Evolution and function of auditory systems in insects. Naturwissenschaften, 88(4):159–170, 2001.

[43] A. Michelsen. Biophysics of sound localization in insects. Springer Hand- book of Auditory Research, 10:18–62, 1998.

[44] A. Moiseff, G.S. Pollack, and R.R. Hoy. Steering responses of flying crickets to sound and ultrasound: mate attraction and predator avoid- ance. Proceedings of the National Academy of Sciences, 75(8):4052, 1978.

[45] M.E. Jensen, C.F. Moss, and A. Surlykke. Echolocating bats can use acoustic landmarks for spatial orientation. Journal of experimental biology, 208(23):4399, 2005.

[46] S. ROSSEL. Binocular spatial localization in the praying mantis. Journal of experimental biology, 120(1):265–281, 1986.

[47] J.M. Hemmi and J. Zeil. Burrow surveillance in fiddler crabs. I. De- scription of behaviour. Journal of experimental biology, 206(22):3935–3950, 2003.


[48] J.M. Hemmi and J. Zeil. Burrow surveillance in fiddler crabs. II. The sensory cues. The Journal of experimental biology, 206(Pt 22):3951, 2003.

[49] S. Sommer and R. Wehner. The ant’s estimation of distance travelled: experiments with desert ants, Cataglyphis fortis. Journal of Compara- tive Physiology A: Neuroethology, Sensory, Neural, and Behavioral Physiology, 190(1):1–6, 2004.

[50] M. Müller and R. Wehner. The hidden spiral: systematic search and path integration in desert ants, Cataglyphis fortis. Journal of Comparative Physiology A: Neuroethology, Sensory, Neural, and Behavioral Physiology, 175(5):525–530, 1994.

[51] F. Papi, editor. Animal Homing, chapter Arthropods, pages 44–144. Chapman and Hall, 1992.

[52] G. Portelli, F. Ruffier, and N. Franceschini. Honeybees change their height to restore their optic flow. Journal of Comparative Physiology A: Neu- roethology, Sensory, Neural, and Behavioral Physiology, 196(4):307–313, 2010.

[53] A.J. Rutkowski, M.M. Miller, R.D. Quinn, and M.A. Willis. Egomotion estimation with optic flow and air velocity sensors. Biological Cybernetics, pages 1–17, 2011.

[54] J. Zeil, M.I. Hofmann, and J.S. Chahl. Catchment areas of panoramic snapshots in outdoor scenes. JOSA A, 20(3):450–469, 2003.

[55] R. Möller and A. Vardy. Local visual homing by matched-filter descent in image distances. Biological cybernetics, 95(5):413–430, 2006.

[56] F. Labrosse. Short and long-range visual navigation using warped panoramic images. Robotics and Autonomous Systems, 55(9):675–684, 2007.

[57] J. Zeil, A. Kelber, and R. Voss. Structure and function of learning flights in bees and wasps. J Exp Biol, 199:245–252, 1996.

[58] T.S. Collett, E. Dillmann, A. Giger, and R. Wehner. Visual landmarks and route following in desert ants. Journal of Comparative Physiology A: Neu- roethology, Sensory, Neural, and Behavioral Physiology, 170(4):435–442, 1992.

[59] R. Möller. Do insects use templates or parameters for landmark navigation? Journal of Theoretical Biology, 210(1):33–45, 2001.


[60] T.S. Collett, P. Graham, R.A. Harris, and N. Hempel-de Ibarra. Nav- igational memories in ants and bees: memory retrieval when selecting and following routes. Advances in the Study of Behavior, 36:123–172, 2006.

[61] J. Zeil, N. Boeddeker, and W. Stürzl. Visual homing in insects and robots. Flying insects and robots, pages 87–100, 2009.

[62] P. Graham, K. Fauria, and T.S. Collett. The influence of beacon- aiming on the routes of wood ants. Journal of experimental biology, 206(3):535–541, 2003.

[63] P. Graham, V. Durier, and T.S. Collett. The binding and recall of snapshot memories in wood ants (Formica rufa L.). Journal of experimental biology, 207(3):393–398, 2004.

[64] W. Junger. Waterstriders (Gerris paludum F.) compensate for drift with a discontinuously working visual position servo. Journal of Compar- ative Physiology A: Neuroethology, Sensory, Neural, and Behavioral Physiology, 169(5):633–639, 1991.

[65] P. Graham and T.S. Collett. View-based navigation in insects: how wood ants (Formica rufa L.) look at and are guided by extended land- marks. Journal of experimental Biology, 205(16):2499–2509, 2002.

[66] N. Boeddeker and J.M. Hemmi. Visual gaze control during peering flight manoeuvres in honeybees. Proceedings of the Royal Society B: Biological Sci- ences, 277(1685):1209, 2010.

[67] N. Boeddeker, L. Dittmar, W. Stürzl, and M. Egelhaaf. The fine structure of honeybee head and body yaw movements in a homing task. Proceedings of the Royal Society B: Biological Sciences, 277(1689):1899, 2010.

[68] L. Dittmar, W. Stürzl, E. Baird, N. Boeddeker, and M. Egelhaaf. Goal seeking in honeybees: matching of optic flow snapshots? Journal of Experimental Biology, 213(17):2913, 2010.

[69] N. Hempel de Ibarra, M. Giurfa, and M. Vorobyev. Discrimination of coloured patterns by honeybees through chromatic and achromatic cues. Journal of Comparative Physiology A: Neuroethology, Sensory, Neural, and Behavioral Physiology, 188(7):503–512, 2002.


[70] M. Lehrer. Dorsoventral asymmetry of colour discrimination in bees. Journal of Comparative Physiology A: Neuroethology, Sensory, Neural, and Be- havioral Physiology, 184(2):195–206, 1999.

[71] R. Menzel and E. Lieke. Antagonistic color effects in spatial vision of honeybees. Journal of Comparative Physiology A: Neuroethology, Sensory, Neural, and Behavioral Physiology, 151(4):441–448, 1983.

[72] MV Srinivasan, SW Zhang, and K. Witney. Visual discrimination of pattern orientation by honeybees: Performance and implications for 'cortical' processing. Philosophical Transactions of the Royal Society of London. Series B: Biological Sciences, 343(1304):199, 1994.

[73] G.A. Horridge. Spatial coincidence of cues in visual learning by the hon- eybee (Apis mellifera). Journal of Insect Physiology, 44(3-4):343–350, 1998.

[74] V. Durier, P. Graham, and T.S. Collett. Snapshot memories and land- mark guidance in wood ants. Current biology, 13(18):1614–1618, 2003.

[75] T.S. Collett, K. Fauria, K. Dale, and J. Baron. Places and patterns–a study of context learning in honeybees. Journal of Comparative Physiology A: Neuroethology, Sensory, Neural, and Behavioral Physiology, 181(4):343–353, 1997.

[76] M. Collett, T.S. Collett, S. Bisch, and R. Wehner. Local and global vectors in desert ant navigation. Nature, pages 269–271, 1998.

[77] M. Collett, TS Collett, S. Chameron, and R. Wehner. Do familiar landmarks reset the global path integration system of desert ants? Jour- nal of experimental biology, 206(5):877–882, 2003.

[78] M. Knaden and R. Wehner. Nest mark orientation in desert ants Cataglyphis: what does it do to the path integrator? Animal behaviour, 70(6):1349–1354, 2005.

[79] A. Narendra, K. Cheng, D. Sulikowski, and R. Wehner. Search strate- gies of ants in landmark-rich habitats. Journal of Comparative Physiology A: Neuroethology, Sensory, Neural, and Behavioral Physiology, 194(11):929–938, 2008.


[80] R. Wehner, M. Boyer, F. Loertscher, S. Sommer, and U. Menzi. Ant navigation: one-way routes rather than maps. Current Biology, 16(1):75– 79, 2006.

[81] M. Collett and T.S. Collett. Insect navigation: no map at the end of the trail? Current biology, 16(2):R48–R51, 2006.

[82] K. Geiger. Long-Distance Orientation in Honeybees. PhD thesis, Free University of Berlin, 1994.

[83] S. van der Zwaan and J. Santos-Victor. An insect inspired visual sensor for the autonomous navigation of a mobile robot. Proc. of the Seventh International Sysposium on Intelligent Robotic Systems (SIRS), 1999.

[84] R. Hornsey, P. Thomas, W. Wong, S. Pepic, K. Yip, and R. Krish- nasamy. Electronic compound eye : construction and cali- bration. In Proc. SPIE, 5301, pages 13–24, 2004.

[85] A.A. Stocker. Analog integrated 2-D optical flow sensor. Analog Integrated Circuits and Signal Processing, 46(2):121–138, 2006.

[86] W. Reichardt. Autocorrelation, a principle for the evaluation of sensory information by the central nervous system. Sensory communication, pages 303–317, 1961.

[87] J. Díaz, E. Ros, F. Pelayo, E.M. Ortigosa, and S. Mota. FPGA-based real-time optical-flow system. Circuits and Systems for Video Technology, IEEE Transactions on, 16(2):274–279, 2006.

[88] Z. Wei, D.J. Lee, and B.E. Nelson. FPGA-based real-time optical flow algorithm design and implementation. Journal of Multimedia, 2(5):38–45, 2007.

[89] T. Zhang, H. Wu, A. Borst, K. Kuhnlenz, and M. Buss. An fpga im- plementation of insect-inspired motion detector for high-speed vision systems. In Robotics and Automation, 2008. ICRA 2008. IEEE International Conference on, pages 335–340. IEEE, 2008.

[90] F. Ruffier and N. Franceschini. OCTAVE, a bioinspired visuo-motor control system for the guidance of Micro-Air-Vehicles. In Proceedings of SPIE, 5119, pages 1–12, 2003.


[91] W. Stürzl, M. Suppa, and D. Burschka. Light-weight panoramic mirror design for visual navigation. Omnidirectional Robot Vision, pages 218–229, 2008.

[92] D.J. Heeger and A.D. Jepson. Subspace methods for recovering rigid motion I: Algorithm and implementation. International Journal of Computer Vision, 7(2):95–117, 1992.

[93] H. Dahmen, R.M. Wüst, and J. Zeil. Extracting egomotion parameters from optic flow: principal limits for animals and machines. From Living Eyes to Seeing Machines, pages 174–98, 1997.

[94] A. Albu-Schäffer, C. Ott, U. Hagn, and T. Ortmaier. Method for controlling a robot arm, and robot for implementing the method, January 12, 2010. US Patent 7,646,161.

[95] R. Bischoff, J. Kurth, G. Schreiber, R. Koeppe, A. Stemmer, A. Albu-Schäffer, O. Eiberger, A. Beyer, G. Grunwald, and G. Hirzinger. Aus der Forschung zum Industrieprodukt: Die Entwicklung des KUKA Leichtbauroboters. at-Automatisierungstechnik, 58(12):670–680, 2010.

[96] A. Albu-Schäffer, S. Haddadin, C. Ott, A. Stemmer, T. Wimböck, and G. Hirzinger. The DLR lightweight robot: design and control concepts for robots in human environments. Industrial Robot: An International Journal, 34(5):376–385, 2007.

[97] J. Ulmen and M. Cutkosky. A robust, low-cost and low-noise artificial skin for human-friendly robots. In Robotics and Automation (ICRA), 2010 IEEE International Conference on, pages 4836–4841. IEEE, 2010.

[98] A. Schmitz, M. Maggiali, L. Natale, and G. Metta. Touch sensors for humanoid hands. In RO-MAN, 2010 IEEE, pages 691–697. IEEE, 2010.

[99] G. Kowadlo and R.A. Russell. Robot odor localization: a taxonomy and survey. The International Journal of Robotics Research, 27(8):869, 2008.

[100] M.A. Willis. Chemical Plume Tracking Behavior in Animals and Mobile Robots. Navigation, 55(2), 2008.

[101] M.M. Hunt, W.M. Marquet, D.A. Moller, K.R. Peal, W.K. Smith, and R.C. Spindel. An acoustic navigation system. Technical report, Woods Hole Oceanographic Institution, 1974.

[102] S.T. Birchfield. A unifying framework for acoustic localization. In Proceedings of the 12th European Signal Processing Conference (EUSIPCO), pages 1127–1130. Citeseer, 2004.

[103] T. Goedemé, M. Nuttin, T. Tuytelaars, and L. Van Gool. Omnidirectional vision based topological navigation. International Journal of Computer Vision, 74(3):219–236, 2007.

[104] Y. Ma. An invitation to 3-d vision: from images to geometric models, 26. Springer Verlag, 2004.

[105] J.J. Craig. Introduction to robotics: mechanics and control. Addison-Wesley New York, 3 edition, 2006.

[106] R.Y. Tsai and R.K. Lenz. A new technique for fully autonomous and efficient 3D robotics hand/eye calibration. Robotics and Automation, IEEE Transactions on, 5(3):345–358, 1989.

[107] E. Trucco and A. Verri. Introductory techniques for 3-D computer vision, 201. Prentice Hall New Jersey, 1998.

[108] R. Hartley and A. Zisserman. Multiple view geometry in computer vision. Cambridge university press, 2003.

[109] K.H. Strobl, W. Sepp, S. Fuchs, C. Paredes, and K. Arbter. DLR CalLab and DLR CalDe - http://www.robotic.dlr.de/callab/.

[110] M.A. Fischler and R.C. Bolles. Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 24(6):381–395, 1981.

[111] J. Wendel. Integrierte Navigationssysteme: Sensordatenfusion, GPS und Inertiale Navigation. Oldenbourg Wissenschaftsverlag, 2007.

[112] R.E. Kalman. A new approach to linear filtering and prediction prob- lems. Journal of basic Engineering, 82(Series D):35–45, 1960.

[113] M.S. Grewal and A.P. Andrews. Kalman filtering: theory and practice using MATLAB, 2. Wiley Online Library, 2001.

[114] P. Kovesi. Image features from phase congruency. Videre: Journal of Computer Vision Research, 1(3):1–26, 1999.

[115] P. Kovesi. Phase congruency detects corners and edges. In The Australian Pattern Recognition Society Conference, pages 309–318. Citeseer, 2003.

[116] T. Tuytelaars and K. Mikolajczyk. Local invariant feature detectors: a survey. Foundations and Trends in Computer Graphics and Vision, 3(3):177–280, 2008.

[117] C. Harris and M. Stephens. A combined corner and edge detector. In Alvey vision conference, 15, page 50. Manchester, UK, 1988.

[118] J. Shi and C. Tomasi. Good features to track. In Computer Vision and Pattern Recognition, 1994. Proceedings CVPR’94, 1994 IEEE Computer Society Conference on, pages 593–600. IEEE, 1994.

[119] J.A. Noble. Descriptions of image surfaces. University of Oxford, 1989.

[120] D.G. Lowe. Object recognition from local scale-invariant features. In Computer Vision, 1999. The Proceedings of the Seventh IEEE International Conference on, 2, pages 1150–1157. IEEE, 1999.

[121] D.G. Lowe. Distinctive image features from scale-invariant keypoints. International journal of computer vision, 60(2):91–110, 2004.

[122] H. Bay, T. Tuytelaars, and L. Van Gool. Surf: Speeded up robust features. Computer Vision–ECCV 2006, pages 404–417, 2006.

[123] H. Bay, A. Ess, T. Tuytelaars, and L. Van Gool. Speeded-up robust features (SURF). Computer Vision and Image Understanding, 110(3):346–359, 2008.

[124] M. Agrawal, K. Konolige, and M. Blas. Censure: Center surround extremas for realtime feature detection and matching. Computer Vision– ECCV 2008, pages 102–115, 2008.

[125] S.M. Smith and J.M. Brady. SUSAN–A new approach to low level image processing. International journal of computer vision, 23(1):45–78, 1997.

[126] E. Rosten and T. Drummond. Fusing points and lines for high performance tracking. IEEE International Conference on Computer Vision (ICCV), 2005.

[127] E. Rosten, R. Porter, and T. Drummond. Faster and better: a machine learning approach to corner detection. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 32(1):105–119, 2010.

[128] E. Rosten and T. Drummond. Machine learning for high-speed corner detection. Computer Vision–ECCV 2006, pages 430–443, 2006.

[129] J.R. Quinlan. Induction of decision trees. Machine learning, 1(1):81–106, 1986.

[130] B.K.P. Horn and B.G. Schunck. Determining optical flow. Artificial intelligence, 17(1-3):185–203, 1981.

[131] A. Wedel, T. Pock, C. Zach, H. Bischof, and D. Cremers. An improved algorithm for TV-L1 optical flow. Statistical and Geometrical Approaches to Visual Motion Analysis, pages 23–45, 2009.

[132] S. Leutenegger, M. Chli, and R.Y. Siegwart. BRISK: Binary robust invariant scalable keypoints. In Proc. IEEE Int. Conf. on Computer Vision, 2011.

[133] B.D. Lucas, T. Kanade, et al. An iterative image registration tech- nique with an application to stereo vision. In International joint conference on artificial intelligence, 3, pages 674–679. Citeseer, 1981.

[134] G.D. Hager and P.N. Belhumeur. Real-time tracking of image regions with changes in geometry and illumination. In Computer Vision and Pattern Recognition, 1996. Proceedings CVPR’96, 1996 IEEE Computer Society Conference on, pages 403–410. IEEE, 1996.

[135] G. Klein and D. Murray. Parallel tracking and mapping for small AR workspaces. In Proceedings of the 2007 6th IEEE and ACM International Symposium on Mixed and Augmented Reality, pages 1–10. IEEE Computer Society, 2007.

[136] M. Calonder, V. Lepetit, C. Strecha, and P. Fua. Brief: Binary robust independent elementary features. Computer Vision–ECCV 2010, pages 778–792, 2010.

[137] E. Tola, V. Lepetit, and P. Fua. Daisy: An efficient dense descriptor applied to wide-baseline stereo. IEEE transactions on pattern analysis and machine intelligence, pages 815–830, 2009.

[138] T. Ojala, M. Pietikäinen, and T. Mäenpää. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence, pages 971–987, 2002.

[139] S. Birchfield. Derivation of Kanade-Lucas-Tomasi tracking equation. Unpublished, 1997.

[140] J.Y. Bouguet et al. Pyramidal implementation of the Lucas Kanade feature tracker: description of the algorithm. Intel Corporation, Microprocessor Research Labs, OpenCV Documents, 3, 1999.

[141] A.J. Davison, I.D. Reid, N.D. Molton, and O. Stasse. MonoSLAM: Real-time single camera SLAM. IEEE Transactions on Pattern Analysis and Machine Intelligence, pages 1052–1067, 2007.

[142] M.V. Srinivasan. An image-interpolation technique for the computation of optic flow and egomotion. Biological Cybernetics, 71(5):401–415, 1994.

[143] J.J. Koenderink and A.J. van Doorn. Facts on optic flow. Biological Cybernetics, 56(4):247–254, 1987.

[144] H.C. Longuet-Higgins. A computer algorithm for reconstructing a scene from two projections. Readings in computer vision: issues, problems, principles, and paradigms, page 61, 1987.

[145] P.H.S. Torr and D.W. Murray. Outlier detection and motion segmentation. Sensor Fusion VI, 2059:432–443, 1993.

[146] L. Quan. Invariants of six points and projective reconstruction from three uncalibrated images. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 17(1):34–46, 1995.

[147] D. Nistér. An efficient solution to the five-point relative pose problem. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 26(6):756–770, 2004.

[148] S. Šegvić, G. Schweighofer, and A. Pinz. Performance evaluation of the five-point relative pose with emphasis on planar scenes. Technical report, EMT, TU Graz, 2009.

[149] R.I. Hartley. In defense of the eight-point algorithm. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 19(6):580–593, 1997.

[150] J. Weng, T.S. Huang, and N. Ahuja. Motion and structure from two perspective views: Algorithms, error analysis, and error estimation. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 11(5):451–476, 1989.

[151] G. Strang. Linear algebra and its applications. Harcourt Brace Jovanovich College Publishers, 1988.

[152] M. Mühlich and R. Mester. The role of total least squares in motion analysis. Computer Vision-ECCV, page 305, 1998.

[153] G.H. Golub, A. Hoffman, and G.W. Stewart. A generalization of the Eckart-Young-Mirsky matrix approximation theorem. Linear Algebra and Its Applications, 88:317–327, 1987.

[154] J.W. Demmel. The smallest perturbation of a submatrix which lowers the rank and constrained total least squares problems. SIAM Journal on Numerical Analysis, 24(1):199–206, 1987.

[155] D. Burschka and G.D. Hager. V-GPS-image-based control for 3D guidance systems. In Intelligent Robots and Systems, 2003. (IROS 2003). Proceedings. 2003 IEEE/RSJ International Conference on, 2, pages 1789–1795. IEEE, 2003.

[156] D. Burschka and G.D. Hager. V-GPS (SLAM): Vision-based inertial system for mobile robots. In Robotics and Automation, 2004. Proceedings. ICRA’04. 2004 IEEE International Conference on, 1, pages 409–415. IEEE, 2004.

[157] B.K.P. Horn, H.M. Hilden, and S. Negahdaripour. Closed-form solution of absolute orientation using orthonormal matrices. JOSA A, 5(7):1127–1135, 1988.

[158] G. Klein and D. Murray. Improving the agility of keyframe-based SLAM. Computer Vision–ECCV 2008, pages 802–815, 2008.

[159] R.O. Castle, G. Klein, and D.W. Murray. Wide-area Augmented Reality using Camera Tracking and Mapping in Multiple Regions. Computer Vision and Image Understanding, 115(6):854–867, 2011.

[160] H. Strasdat, J.M.M. Montiel, and A.J. Davison. Real-time monocular SLAM: Why filter? In Robotics and Automation (ICRA), 2010 IEEE International Conference on, pages 2657–2664. IEEE, 2010.

[161] R.A. Newcombe and A.J. Davison. Live dense reconstruction with a single moving camera. In Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on, pages 1498–1505. IEEE, 2010.

[162] M. Chli and A. Davison. Active matching. Computer Vision–ECCV 2008, pages 72–85, 2008.

[163] M. Chli and A.J. Davison. Active Matching for visual tracking. Robotics and Autonomous Systems, 57(12):1173–1187, 2009.

[164] B. Simons. An overview of clock synchronization. Fault-Tolerant Distributed Computing, pages 84–96, 1990.

[165] F. Sivrikaya and B. Yener. Time synchronization in sensor networks: a survey. Network, IEEE, 18(4):45–50, 2004.

[166] S. Bruder, M. Farooq, and M. Bayoumi. Robotic heterogeneous multi-sensor fusion with spatial and temporal alignment. In Decision and Control, 1991. Proceedings of the 30th IEEE Conference on, pages 506–511. IEEE, 1991.

[167] T. Lei and CA Stelios. Decentralized filtering with random sampling and delay. Information Sciences, 81(1-2):117–131, 1994.

[168] T.D. Larsen, N.A. Andersen, O. Ravn, and N.K. Poulsen. Incorporation of time delayed measurements in a discrete-time Kalman filter. In Decision and Control, 1998. Proceedings of the 37th IEEE Conference on, 4, pages 3972–3977. IEEE, 1998.

[169] W. Li and H. Leung. Simultaneous registration and fusion of multiple dissimilar sensors for cooperative driving. Intelligent Transportation Systems, 2004.

[170] S.J. Julier and J.K. Uhlmann. Fusion of time delayed measurements with uncertain time delays. In American Control Conference, 2005. Proceedings of the 2005, pages 4028–4033. IEEE, 2005.

[171] F. Tungadi and L. Kleeman. Time Synchronisation and Calibration of Odometry and Range Sensors for High-Speed Mobile Robot Mapping. In ACRA’08, 2008.

[172] J. Kelly and G.S. Sukhatme. A General Framework for Temporal Calibration of Multiple Proprioceptive and Exteroceptive Sensors. In IFRR ISER’10, 2010.

[173] R.Y. Tsai and R.K. Lenz. Real time versatile robotics hand/eye calibration using 3D machine vision. In Robotics and Automation, 1988. Proceedings, 1988 IEEE International Conference on, pages 554–561. IEEE, 1988.

[174] Y.C. Shiu and S. Ahmad. Calibration of wrist-mounted robotic sensors by solving homogeneous transform equations of the form AX=XB. Robotics and Automation, IEEE Transactions on, 5(1):16–29, 1989.

[175] J.C.K. Chou and M. Kamel. Finding the position and orientation of a sensor on a robot manipulator using quaternions. The international journal of robotics research, 10(3):240, 1991.

[176] F.C. Park and B.J. Martin. Robot sensor calibration: solving AX=XB on the Euclidean group. Robotics and Automation, IEEE Transactions on, 10(5):717–721, 1994.

[177] R. Horaud and F. Dornaika. Hand-eye calibration. The international journal of robotics research, 14(3):195, 1995.

[178] K. Daniilidis and E. Bayro-Corrochano. The dual quaternion approach to hand-eye calibration. In Pattern Recognition, 1996. Proceedings of the 13th International Conference on, 1, pages 318–322. IEEE, 1996.

[179] K. Daniilidis. Hand-eye calibration using dual quaternions. The International Journal of Robotics Research, 18(3):286, 1999.

[180] K.H. Strobl and G. Hirzinger. Optimal hand-eye calibration. In Intelligent Robots and Systems, 2006 IEEE/RSJ International Conference on, pages 4647–4653. IEEE, 2006.

[181] J. Lobo and J. Dias. Relative Pose Calibration Between Visual and Inertial Sensors. In ICRA Workshop on Integration of Vision and Inertial Sensors - 2nd InerVis. IEEE, 2005.

[182] J. Lobo and J. Dias. Relative pose calibration between visual and inertial sensors. The International Journal of Robotics Research, 26(6):561, 2007.

[183] J. Kelly and G. Sukhatme. Fast relative pose calibration for visual and inertial sensors. In Experimental Robotics, pages 515–524. Springer, 2009.

[184] J. Kelly and G.S. Sukhatme. Visual-inertial simultaneous localization, mapping and sensor-to-sensor self-calibration. In Computational Intelligence in Robotics and Automation (CIRA), 2009 IEEE International Symposium on, pages 360–368. IEEE, 2009.

[185] F.M. Mirzaei and S.I. Roumeliotis. A Kalman filter-based algorithm for IMU-camera calibration. In Intelligent Robots and Systems, 2007. IROS 2007. IEEE/RSJ International Conference on, pages 2427–2434. IEEE, 2007.

[186] F.M. Mirzaei and S.I. Roumeliotis. A Kalman filter-based algorithm for IMU-camera calibration: Observability analysis and performance evaluation. Robotics, IEEE Transactions on, 24(5):1143–1156, 2008.

[187] F.M. Mirzaei and S.I. Roumeliotis. IMU-Camera Calibration: Bundle Adjustment Implementation. Technical report, Department of Computer Science, University of Minnesota, 2007.

[188] J.D. Hol, T.B. Schön, and F. Gustafsson. Modeling and Calibration of Inertial and Vision Sensors. The International Journal of Robotics Research, 29(2-3):231, 2010.

[189] P. Lang and A. Pinz. Calibration of hybrid vision/inertial tracking systems. In Proceedings of the 2nd InerVis: Workshop on Integration of Vision and Inertial Sensors. Citeseer, 2005.

[190] D. Strelow and S. Singh. Optimal motion estimation from visual and inertial measurements. In Applications of Computer Vision, 2002.(WACV 2002). Proceedings. Sixth IEEE Workshop on, pages 314–319. IEEE, 2002.

[191] L. Armesto, J. Tornero, and M. Vincze. Fast ego-motion estimation with multi-rate fusion of inertial and vision. The International Journal of Robotics Research, 26(6):577, 2007.

[192] P. Gemeiner, P. Einramhof, and M. Vincze. Simultaneous motion and structure estimation by fusion of inertial and vision data. The International Journal of Robotics Research, 26(6):591–605, 2007.

[193] D. Strelow and S. Singh. Online motion estimation from image and inertial measurements. In Workshop on Integration of Vision and Inertial Sensors (INERVIS). Citeseer, 2003.

[194] A.I. Mourikis and S.I. Roumeliotis. A multi-state constraint Kalman filter for vision-aided inertial navigation. In Robotics and Automation, 2007 IEEE International Conference on, pages 3565–3572. IEEE, 2007.

[195] S. Weiss and R. Siegwart. Real-time metric state estimation for modular vision-inertial systems. In Robotics and Automation (ICRA), 2011 IEEE International Conference on, pages 4531–4537. IEEE, 2011.

[196] A. Chilian, H. Hirschmüller, and M. Görner. Multisensor data fusion for robust pose estimation of a six-legged walking robot. In Intelligent Robots and Systems (IROS), 2011 IEEE/RSJ International Conference on, pages 2497–2504. IEEE, 2011.

[197] P. Piniés, T. Lupton, S. Sukkarieh, and J.D. Tardós. Inertial aiding of inverse depth SLAM using a monocular camera. In Robotics and Automation, 2007 IEEE International Conference on, pages 2797–2802. IEEE, 2007.

[198] K. Konolige, M. Agrawal, and J. Sola. Large-scale visual odometry for rough terrain. Robotics Research, pages 201–212, 2011.

[199] G. Nützi, S. Weiss, D. Scaramuzza, and R. Siegwart. Fusion of IMU and vision for absolute scale estimation in monocular SLAM. Journal of Intelligent & Robotic Systems, pages 1–13, 2010.

[200] S.I. Roumeliotis and J.W. Burdick. Stochastic cloning: A generalized framework for processing relative state measurements. In Robotics and Automation, 2002. Proceedings. ICRA’02. IEEE International Conference on, 2, pages 1788–1795. IEEE, 2002.

[201] X.R. Li and V.P. Jilkov. A survey of maneuvering target tracking. Part III: Measurement models. In Proc. 2001 SPIE Conf. on Signal and Data Processing of Small Targets, 4473, pages 423–446, 2001.

[202] M.S. Schlosser and K. Kroschel. Limits in tracking with extended Kalman filters. Aerospace and Electronic Systems, IEEE Transactions on, 40(4):1351–1359, 2004.

[203] T.L. Song, J.Y. Ahn, and C. Park. Suboptimal filter design with pseudomeasurements for target tracking. Aerospace and Electronic Systems, IEEE Transactions on, 24(1):28–39, 1988.

[204] V.Y. Liepin’sh. An algorithm for evaluation a discrete Fourier transform for incomplete data. Automatic control and computer sciences, 30(3):27–40, 1996.

[205] H. Prautzsch, W. Boehm, and M. Paluszny. Bézier and B-spline techniques. Springer Verlag, 2002.

[206] C. De Boor. A practical guide to splines, 27. Springer Verlag, 2001.

[207] J. Nocedal and S.J. Wright. Numerical Optimization. Springer, 2000.

[208] A. Blake and A. Zisserman. Visual reconstruction. MIT Press, 1987.

[209] S.J. Julier and J.K. Uhlmann. Unscented filtering and nonlinear esti- mation. Proceedings of the IEEE, 92(3):401–422, 2004.

[210] N. Trawny and S.I. Roumeliotis. Indirect Kalman filter for 3D attitude estimation. University of Minnesota, Dept. of Comp. Sci. & Eng., Tech. Rep, 2, 2005.

[211] K. Schmid. Kalman filter framework for multi-sensor fusion with mea- surement time consideration. Technical report, German Aerospace Center (DLR), 2011. to be submitted.

[212] B.K.P. Horn. Closed-form solution of absolute orientation using unit quaternions. JOSA A, 4(4):629–642, 1987.

[213] K.S. Arun, T.S. Huang, and S.D. Blostein. Least-squares fitting of two 3-D point sets. Pattern Analysis and Machine Intelligence, IEEE Transactions on, (5):698–700, 1987.

[214] S. Umeyama. Least-squares estimation of transformation parameters between two point patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence, pages 376–380, 1991.

[215] P.J. Huber. Robust statistical procedures. Number 68. Society for Industrial Mathematics, 1996.

[216] L. Dorst. First order error propagation of the procrustes method for 3D attitude estimation. IEEE transactions on pattern analysis and machine intelligence, pages 221–229, 2005.

[217] M.R. Garey and R.L. Graham. Performance bounds on the splitting algorithm for binary testing. Acta Informatica, 3(4):347–355, 1974.

[218] L. Hyafil and R.L. Rivest. Constructing optimal binary decision trees is NP-complete. Information Processing Letters, 5(1):15–17, 1976.

[219] M.R. Garey. Optimal binary identification procedures. SIAM Journal on Applied Mathematics, pages 173–186, 1972.

[220] D. Geman and B. Jedynak. Model-based classification trees. Information Theory, IEEE Transactions on, 47(3):1075–1082, 2001.

[221] S.S. Kislitsyn. On discrete search problems. Cybernetics and Systems Analysis, 2(4):52–57, 1966.

[222] M. Suppa, S. Kielhofer, J. Langwald, F. Hacker, K.H. Strobl, and G. Hirzinger. The 3d-modeller: A multi-purpose vision platform. In Robotics and Automation, 2007 IEEE International Conference on, pages 781–787. IEEE, 2007.

[223] C. Borst, T. Wimböck, F. Schmidt, M. Fuchs, B. Brunner, F. Zacharias, P.R. Giordano, R. Konietschke, W. Sepp, S. Fuchs, et al. Rollin’ Justin - Mobile platform with variable base. In Robotics and Automation, 2009. ICRA’09. IEEE International Conference on, pages 1597–1598. IEEE, 2009.

[224] F. Hacker, J. Dietrich, and G. Hirzinger. A laser-triangulation based miniaturized 2-d range-scanner as integral part of a multisensory robot-gripper. In EOS Topical Meeting on Optoelectronic Distance/Displacement Measurements and Applications, Nantes, France, 1997.

[225] K.H. Strobl, W. Sepp, E. Wahl, T. Bodenmüller, M. Suppa, J.F. Seara, and G. Hirzinger. The DLR multisensory hand-guided device: The laser stripe profiler. In Robotics and Automation, 2004. Proceedings. ICRA’04. 2004 IEEE International Conference on, 2, pages 1927–1932. IEEE, 2004.

[226] H. Hirschmüller. Accurate and efficient stereo processing by semi-global matching and mutual information. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2005.

[227] H. Hirschmüller. Stereo processing by semiglobal matching and mutual information. IEEE Transactions on Pattern Analysis and Machine Intelligence, pages 328–341, 2008.

[228] F. Chen, G.M. Brown, and M. Song. Overview of three-dimensional shape measurement using optical methods. Optical Engineering, 39:10, 2000.

[229] F. Blais. Review of 20 years of range sensor development. Journal of Electronic Imaging, 13(1), 2004.

[230] T. Bodenmüller and G. Hirzinger. Online surface reconstruction from unorganized 3D-points for the DLR hand-guided scanner system. In 3D Data Processing, Visualization and Transmission, 2004. 3DPVT 2004. Proceedings. 2nd International Symposium on, pages 285–292. IEEE, 2004.

[231] B. Jähne. Digitale Bildverarbeitung. Springer Verlag, 2005.

[232] K. Werner. Development of an FPGA-Based MMC Interface for Analog Cameras. Term Paper – Advisors: Mair, E. and Burschka, D., 2011.

[233] F. Wilde. Development of an IMU-Board, Efficient Strapdown Computation and Fusion of the Gravitation Vector. Bachelor Thesis, ongoing – Advisors: Mair, E. and Burschka, D., 2011.

[234] J. Zaddach. Linux Link-Up of an IMU and an MMC-Framegrabber on an Embedded Platform. Interdisciplinary Project – Advisors: Mair, E. and Burschka, D., 2011.
