From the Institut für Neuro- und Bioinformatik of the Universität zu Lübeck. Director: Prof. Dr. rer. nat. Thomas Martinetz

Gesture-Based Interaction with Time-of-Flight Cameras

Inaugural dissertation for the conferral of the doctoral degree of the Universität zu Lübeck – From the Faculty of Technology and Natural Sciences –

submitted by Martin Haker from Reinbek

Lübeck 2010. First referee: Prof. Dr.-Ing. Erhardt Barth

Second referee: Prof. Dr.-Ing. Achim Schweikard

Date of the oral examination: 20 December 2010

Approved for printing. Lübeck, 10 January 2011

Contents

Acknowledgements

I Introduction and Time-of-Flight Cameras

1 Introduction

2 Time-of-Flight Cameras
   2.1 Introduction
   2.2 State-of-the-art TOF Sensors
   2.3 Alternative Optical Range Imaging Techniques
      2.3.1 Triangulation
      2.3.2 Photometry
      2.3.3 Interferometry
      2.3.4 Time-of-Flight
      2.3.5 Summary
   2.4 Measurement Principle
   2.5 Technical Realization
   2.6 Measurement Accuracy
   2.7 Limitations
      2.7.1 Non-Ambiguity Range
      2.7.2 Systematic Errors
      2.7.3 Multiple Reflections
      2.7.4 Flying Pixels
      2.7.5 Motion Artefacts
      2.7.6 Multi-Camera Setups

II Algorithms

3 Introduction

4 Shading Constraint
   4.1 Introduction
   4.2 Method
      4.2.1 Probabilistic Image Formation Model
      4.2.2 Lambertian Reflectance Model


      4.2.3 Computation of Surface Normals
      4.2.4 Application to Time-of-Flight Cameras
   4.3 Results
      4.3.1 Synthetic Data
      4.3.2 Real-World Data
   4.4 Discussion

5 Segmentation
   5.1 Introduction
      5.1.1 Segmentation Based on a Background Model
      5.1.2 Histogram-based Segmentation
      5.1.3 Summary

6 Pose Estimation
   6.1 Introduction
   6.2 Method
   6.3 Results
      6.3.1 Qualitative Evaluation
      6.3.2 Quantitative Evaluation of Tracking Accuracy
   6.4 Discussion

7 Features
   7.1 Geometric Invariants
      7.1.1 Introduction
      7.1.2 Geometric Invariants
      7.1.3 Feature Selection
      7.1.4 Nose Detector
      7.1.5 Results
      7.1.6 Discussion
   7.2 Scale Invariant Features
      7.2.1 Introduction
      7.2.2 Nonequispaced Fast Fourier Transform (NFFT)
      7.2.3 Nose Detection
      7.2.4 Experimental Results
      7.2.5 Discussion
   7.3 Multimodal Sparse Features for Object Detection
      7.3.1 Introduction


      7.3.2 Sparse Features
      7.3.3 Nose Detection
      7.3.4 Results
      7.3.5 Discussion
   7.4 Local Range Flow for Human Gesture Recognition
      7.4.1 Range Flow
      7.4.2 Human Gesture Recognition
      7.4.3 Results
      7.4.4 Discussion

III Applications

8 Introduction

9 Facial Feature Tracking
   9.1 Introduction
   9.2 Nose Tracking
   9.3 Results
   9.4 Conclusions

10 Gesture-Based Interaction
   10.1 Introduction
   10.2 Method
      10.2.1 Pointing Gesture
      10.2.2 Thumbs-up Gesture
      10.2.3 System Calibration
   10.3 Results
   10.4 Discussion

11 Depth of Field Based on Range Maps
   11.1 Introduction
   11.2 Method
      11.2.1 Upsampling the TOF Range Map
      11.2.2 Preprocessing the Image Data
      11.2.3 Synthesis of Depth of Field
      11.2.4 Dealing with Missing Data
   11.3 Results


   11.4 Discussion

12 Conclusion

References

Acknowledgements

There are a number of people who have contributed to this thesis, both in form and content as well as on a moral level, and I want to express my gratitude for this support. First of all, I want to thank Erhardt Barth. I have been under his supervision for many years and I feel that I have received a lot more than supervision, that is, earnest support on both a technical and the above-mentioned moral level. Thank you!

At the Institute for Neuro- and Bioinformatics I am grateful to Thomas Martinetz for giving me the opportunity to pursue my PhD. I thank Martin Böhme for a fruitful collaboration on the work presented in this thesis. Special gratitude goes to Michael Dorr, who was always around when advice or a motivating chat was needed. I also want to thank all my other colleagues for providing such a healthy research environment – not only on a scientific level but also at the foosball table. I also need to thank the undergraduate students who worked with me over the years. Special thanks go to Michael Glodek and Tiberiu Viulet, who significantly contributed to the results of this thesis.

Major parts of this thesis were developed within the scope of the European project ARTTS (www.artts.eu). I thank both the ARTTS consortium for the collaboration as well as the European Commission for funding the project (contract no. IST-34107) within the Information Society Technologies (IST) priority of the 6th Framework Programme. This publication reflects the views only of the authors, and the Commission cannot be held responsible for any use which may be made of the information contained therein.


Part I

Introduction and Time-of-Flight Cameras


1 Introduction

Nowadays, computers play a dominant role in various aspects of our lives and it is difficult, especially for younger generations, to imagine a world without them. In their early days, computers were mainly used to perform the tedious work of solving complex mathematical evaluations. Only universities and large companies were able to afford computers and computing power was expensive. This changed with the advent of personal computers in the 1970s, such as the Altair and the Apple II. These small, yet powerful, machines for individual usage brought the technology into both offices and homes. In this period, computers were also introduced to the gaming market.

As computers were intended for a wider range of users, they required suitable forms of user interaction. While punched cards provided a means for computer specialists to feed simple programs into early computer models even up to the 1960s, the general user required a graphical user interface and intuitive input devices, such as mouse and keyboard.

In the years that followed, the computer industry experienced enormous technological progress; computing power could be increased significantly while both the cost of production and the size of the devices were reduced. This development led to the emergence of computers in more and more aspects of our lives. Hand-held devices allow us to access the internet wherever we are. Complex multimedia systems for viewing digital media, such as movies or photos, appear in our living rooms. And computer-aided systems provide support in automobiles, production facilities, and hospitals.

These new applications of computer technology require novel forms of human-computer interaction; mouse and keyboard are not always the ideal means for providing user input. Here, a lot of progress has been achieved in the areas of speech
recognition systems and touch screens in recent years. In the former case, however, it has proven difficult to devise systems that perform with sufficient reliability and robustness, i.e. apart from correctly recognizing speech it is difficult to decide whether an incoming signal is intended for the system or whether it belongs to an arbitrary conversation. So far, this has limited the widespread use of speech recognition systems. Touch screens, on the other hand, have experienced broad acceptance in information terminals and hand-held devices, such as Apple's iPhone. Especially in the latter case one can observe a new trend in human-machine interaction – the use of gestures.

The advantage of touch-sensitive surfaces in combination with a gesture recognition system is that a touch onto the surface can have more meaning than a simple pointing action to select a certain item on the screen. For example, the iPhone can recognize a swiping gesture to navigate between so-called views, which are used to present different content to the user, and two-finger gestures can be used to rotate and resize images.

Taking the concept of gestures one step further is to recognize and interpret human gestures independently of a touch-sensitive surface. This has two major advantages: (i) it comes closer to our natural use of gestures as a body language and (ii) it opens a wider range of applications where users do not need to be in physical contact with the input device. The success of such systems has first been shown in the gaming market by Sony's EyeToy. In case of the EyeToy, a camera is placed on top of the display, which monitors the user. Movement in front of the camera triggers certain actions and, thus, allows the gamer to control the system.

A similar, yet more realistic, approach was taken by Nintendo to integrate gesture recognition into their gaming console Wii. Hand-held wireless motion sensors capture the movement of the hands and, thus, provide natural motion patterns to the system, which can be used to perform virtual actions, such as the swing of a virtual tennis racket.

The next consequent step is to perform complex action and gesture recognition in an entirely contact-less fashion – for example with a camera-based system. This goal seems to be within reach since the appearance of low-cost 3D sensors, such as the time-of-flight (TOF) camera. A TOF camera is a new type of image sensor which provides both an intensity image and a range map, i.e. it can measure the distance to the object in front of the camera at each pixel of the image at a high temporal resolution of 30 or more frames per second. TOF cameras operate by actively illuminating
the scene (typically with light near the infrared spectrum) and measuring the time the light takes to travel to the object and back into the camera. Although TOF cameras emerged on the market only recently, they have been successfully applied to a number of image processing applications, such as shape from shading (Böhme et al., 2008b), people tracking (Hansen et al., 2008), gesture recognition (Kollorz et al., 2008), and stereo vision (Gudmundsson et al., 2007a). Kolb et al. (2010) give an overview of publications related to TOF cameras.

The importance of this technology with respect to commercial gesture recognition systems is reflected by the interest of major computer companies. To give an example, in the fourth quarter of 2009 Microsoft bought one of the major TOF camera manufacturers in the scope of Project Natal, which is intended to integrate gesture recognition into Microsoft's gaming platform Xbox 360.

This thesis will focus on the use of TOF cameras for human-computer interaction. Specifically, I will give an overview on TOF cameras and will present both algorithms and applications based on this type of image sensor. While Part I of this thesis will be devoted to introducing the TOF technology, Part II will focus on a number of different computer vision algorithms for TOF data.

I will present an approach to improving the TOF range measurement by enforcing the shading constraint, i.e. the observation that a range map implies a certain intensity image if the reflectance properties of the surface are known allows us to correct measurement errors in the TOF range map.

The range information provided by a TOF camera enables a very simple and robust identification of objects in front of the camera. I will present two approaches that are intended to identify a person appearing in the field of view of the camera. The identification of a person yields a first step towards recognizing human gestures.

Given the range information, it is possible to represent the posture of the person in 3D by inverting the perspective projection of the camera. Assuming that we have identified all pixels depicting the person in the image, we thus obtain a 3D point cloud sampling the surface of the person that is visible to the camera. I will present an algorithm that fits a simple model of the human body into this point cloud. This allows the estimation of the body pose, i.e. we can estimate the location of body parts, such as the hands, in 3D.

To obtain further information about the imaged object, one can compute features describing the local geometry of the range map. I will introduce a specific type of geometric features referred to as generalized eccentricities, which can be used to distinguish between different surface types. Furthermore, I will present an extension
of these features to obtain scale-invariance by computing them on the 3D surface of the object instead of on the image grid.

An alternative approach to computing features describing the image is the sparse coding principle. Sparse codes have previously been used for the representation of conventional image data. Here, I extend the concept to multimodal data, i.e. a sparse code is computed simultaneously for range and intensity data, thus yielding perfectly registered basis functions for both types of data.

Another type of features often used in image processing aims at describing the motion pattern visible in a sequence of images. For conventional image data one usually estimates the 2D projection of the true motion, generally referred to as optical flow. In case depth information is available, one can compute the so-called range flow to estimate object motion in 3D. I review the computation of range flow and will demonstrate how this feature can be used to reliably detect and recognize hand gestures.

Part III of this thesis will focus on applications for TOF cameras based on the described algorithms. The first application will be a system that tracks the human nose and thus allows a user to control the mouse cursor with the movement of his head. This system can be used for an alternative form of text input.

A second application focuses on the use of deictic gestures to control a slide show presentation. The user may point at the screen to initiate the appearance of a virtual laser pointer. Hand gestures made towards the left and the right of the screen enable the navigation between slides.

A third application targets digital photography. Based on the range map it is possible to create the effect of depth of field in digital photos in a post-processing step. Depth of field is one of the most important stylistic devices in photography and allows the artist to let parts of the image appear sharp while others become blurred, thus guiding the attention of the spectator towards the intended content of the scene. Although this topic does not fall into the scope of gesture-based interaction, it aptly demonstrates the diversity of applications in which TOF cameras can be employed and has thus been added to this work.

Many of the results in this thesis were obtained in collaboration with others. In these cases, I will identify which contributions are my own at the beginning of the corresponding chapter. To reflect this, I will use the personal pronoun "we" and will do so throughout the thesis for consistency, even when the results were obtained solely by myself.

2 Time-of-Flight Cameras

2.1 Introduction

The TOF camera is a combined range and image sensor that delivers both a distance and an intensity measurement at every pixel of the sensor array. The camera works by emitting light from an active illumination unit and measuring the time the light travels from the camera to an object and back to the sensor. Typically, the illumination unit operates near the infrared spectrum so that the active illumination does not interfere with the human visual system, which is relatively insensitive to infrared light.

There exist two main technical approaches to measuring the time the light takes to travel from the camera to the object and return to the sensor. The first approach emits pulsed light and uses an electronic shutter to control the exposure time of the sensor with high accuracy (Iddan and Yahav, 2001; Yahav et al., 2007). Depending on the distance of an object from the camera, the amount of light that reaches a pixel before the shutter closes varies, i.e. objects close to the camera appear brighter than objects at larger distances. In order to be independent of the reflectivity of the object, the camera takes a second image as a reference for which the exposure time is extended such that the entire light pulse can return to the camera independently of the distance of the object.

The second approach modulates the intensity of the active illumination by a periodic signal over time (Schwarte et al., 1995; Lange, 2000). The signal that returns to the camera is a shifted version of the emitted signal, where the phase shift between the two signals depends linearly on the distance of the object. The sensor, which is synchronized with the illumination unit, integrates over a number of subintervals of the periodic signal and the phase shift can be reconstructed from these measurements. The duration of the integration time is related to the exposure time of a
conventional camera. This second approach will be discussed in more detail throughout the remainder of this chapter.

An important property of TOF cameras that operate with modulated illumination is that they have a so-called non-ambiguity range. Beyond this range distances cannot be reconstructed unambiguously. The non-ambiguity range depends on the modulation frequency of the periodic signal. If the modulated signal passes through more than an entire period during the time the light requires to reach the object and return to the camera, the distance measurement becomes ambiguous.

In the case of pulsed light the problem of the non-ambiguity range can be avoided. With these cameras, one can specify a certain range interval in which the camera is to deliver range estimates. The length of the pulse and the timing of the shutter determine this range interval in which the amount of sensed light corresponds to the distance. A drawback is, however, that this interval has to be specified beforehand; i.e. if objects in the scene are very close to the camera and the shutter closes only after the entire light pulse has returned to the sensor from every object in the scene, all objects appear at the same distance to the camera.

As mentioned above, this chapter will give an overview of the measurement principles and properties of TOF cameras with a special focus on cameras using the approach of modulated illumination. For a more detailed review of the topic refer to the work of Lange (2000).

2.2 State-of-the-art TOF Sensors

Over the years, a number of manufacturers offering TOF cameras have emerged on the market. For several years, the market was mainly dominated by CSEM/MESA Imaging, PMD Technologies, Canesta, and 3DV Systems. Only recently, other companies started to develop their own sensor technology or to integrate existing TOF sensors into their own cameras.

A typical state-of-the-art TOF camera is the SR4000 (see Figure 2.1a), which was developed by the Centre Suisse d'Electronique et de Microtechnique (CSEM) and is manufactured and distributed by the Swiss company MESA Imaging. It is based on the phase measurement principle (Oggier et al., 2004, 2005b) and the sensor has a resolution of 176 by 144 pixels. A typical modulation frequency is 30 MHz. With this modulation frequency the camera achieves a non-ambiguity range of 5 meters. The accuracy of the range measurement is given as roughly 0.5% at a distance of 2 meters from the camera. This corresponds to an accuracy of 5 millimeters. This error is with
respect to the variation of the range measurement between two successive frames, i.e. it refers to the repeatability of the measurement. The absolute error is specified at 10 millimeters over the calibrated range of the camera. The SR4000 also supports modulation frequencies of 14.5 MHz, 15 MHz, 15.5 MHz, 29 MHz, and 31 MHz. This yields non-ambiguity ranges from roughly 5 to 10 meters. The camera achieves frame rates of up to 54 fps. The view angle is specified at 43.6°.

The CamCube (see Figure 2.1b) by the German company PMD Technologies offers a slightly higher resolution of 204 by 204 pixels. The CamCube is also based on the phase measurement principle (Schwarte et al., 1995; Xu et al., 1998). Furthermore, it is fitted with a CS-mount, which allows the use of different optics. The camera comes with two illumination units mounted on both sides of the sensor. Each illumination unit is equipped with a fan, which is required for cooling. In contrast to this, the SR4000 is cooled passively.

The ZCam (see Figure 2.1c) is a low-cost TOF camera developed by the Israeli company 3DV Systems and is based on the pulsed light approach using an electronic shutter (Yahav et al., 2007). It delivers both a range and a color image at a resolution of 320 by 280 pixels. Despite its higher resolution, it delivers range images of poorer quality than both the SR4000 and the CamCube. 3DV Systems was bought by Microsoft in the scope of Project Natal in the third quarter of 2009. It can be speculated that Microsoft sees the value of the TOF technology for the gaming market.

Canesta is a TOF camera manufacturer from the United States. Their TOF sensor technology CanestaVision (see Figure 2.1d) is also based on the phase measurement principle (Gokturk et al., 2004).

Other manufacturers include the companies ifm electronic GmbH from Germany (see Figure 2.1e), IEE S.A. from Luxembourg, Optrima from Belgium, and TriDiCam from Germany. The first two companies also employ the phase measurement principle (Nieuwenhove, 2009; Devarakota et al., 2005) whereas the latter bases their camera on the pulsed light approach (König et al., 2007).

The world’s currently smallest TOF camera was developed by CSEM in the scope of the EU project ARTTS (see Figure 2.1f). Similarly to the SR4000, it comes with a resolution of 176 by 144 pixels but measures only 38 by 38 by 35.4 millimeters. Furthermore, the camera supports modulation frequencies of up to 80 MHz and has a low power consumption, which allows the camera to draw its power directly from the USB port, i.e. it does not require an additional external power supply.

Figure 2.1: Examples of current TOF cameras. Figure 2.1a shows the SR4000 by MESA Imaging. Figure 2.1b shows the CamCube by PMD Technologies. Figure 2.1c shows the ZCam by 3DV Systems. Figure 2.1d shows the DP200 development platform by Canesta. Figure 2.1e shows the efector PMD by ifm electronic. Figure 2.1f shows the ARTTS camera developed by CSEM.


2.3 Alternative Optical Range Imaging Techniques

TOF cameras are the main focus of this work; however, there exist a number of alternative range imaging techniques. These techniques can be grouped into different categories with respect to the type of signal they detect, i.e. such systems can rely on microwaves, visible light, or ultrasonic waves. In the scope of this work, we will only focus on optical systems that operate by detecting electromagnetic radiation fields that lie near the spectrum of visible light. Furthermore, one can differentiate between systems that actively emit a signal into the scene or passively rely on an ambient signal present in the scene, such as daylight. To give an overview of existing technologies, we now briefly discuss triangulation, photometry, interferometry, and time-of-flight methods.

2.3.1 Triangulation

Triangulation methods owe their name to the basic principle that the target point in the scene and two known points that are placed at a fixed baseline distance on the measurement device form a triangle. The range measurement is inferred from the angles of the triangle. A major restriction of triangulation methods is that the accuracy of the range measurement depends on the baseline distance, i.e. the accuracy is increased if the two known points are placed further apart, which implies that there is a limit to the miniaturization of range imaging systems based on this approach. Generally, triangulation methods can be categorized into active and passive systems, and there exist different implementations for both categories. A minimal numerical example of the stereoscopic case is given at the end of this subsection.

Stereoscopy is an example of a passive system where the baseline is defined by two or more calibrated cameras that image the scene from different angles. The distance to a point in the scene is determined from the difference of the position in which the point appears in the individual images. This is achieved by establishing point-to-point correspondences in the individual images. Naturally, correct range estimates can only be computed if these point-to-point correspondences can be established, i.e. stereoscopy fails if the objects in the scene are not sufficiently textured. Apart from this restriction, the computational cost of the procedure must not be underestimated, even if the search space is restricted by taking the orientation of the baseline into account.

Laser scanners are a typical example of an active triangulation system. In such scanners an image sensor and a laser are placed on the baseline. Early scanners swept a laser beam across the scene, which results in a dot on the image. Given the
direction of the laser beam and the position in which the dot appears in the image, the distance to the target can be computed. More advanced scanners project a line (or a light sheet) onto the scene, which speeds up the acquisition process because the scene has to be scanned only in the direction orthogonal to the projected line. Nevertheless, laser scanners typically have a low temporal resolution in comparison to other range imaging techniques. On the other hand, they allow a robust range estimation with high accuracy.

Structured light refers to an extension of the laser scanning approach in the sense that a two-dimensional light pattern is projected onto the scene. Such patterns can e.g. be composed of parallel lines, random textures, or color-coded light. The appearance of the pattern is imaged using a camera and the shape of the scene can be reconstructed based on the distortions of the pattern. Like laser scanners, the structured light approach is able to deliver range maps of high accuracy, but usually at the cost of lower temporal resolution because often such systems use several different patterns to obtain a single range image.
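As a concrete illustration of the triangulation principle in the stereoscopic case (an added sketch, not part of the original text), the following minimal Python fragment converts a disparity d between corresponding pixels of two rectified cameras into a depth estimate using the standard pinhole relation Z = f·b/d; the focal length, baseline, and disparity below are made-up example values.

def depth_from_disparity(disparity_px, focal_px, baseline_m):
    # Standard pinhole relation Z = f * b / d; undefined for zero disparity.
    if disparity_px <= 0:
        return float("inf")
    return focal_px * baseline_m / disparity_px

# Example: 10 px disparity with f = 800 px and a 10 cm baseline -> 8.0 meters.
print(depth_from_disparity(10.0, 800.0, 0.10))

The relation also makes the baseline restriction explicit: for a fixed disparity resolution, halving the baseline b halves the depth that can be resolved with the same accuracy.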

2.3.2 Photometry

Photometric approaches aim at exploiting knowledge of the image formation process to recover the imaged object surface. One approach is to invert the image formation process to recover the local surface orientation of the imaged objects. The shape of the object can then be reconstructed from these local estimates. The estimation of the local surface orientation is based on the fact that the intensity with which a surface point of an object appears in the image depends on the orientation of the surface normal towards the light source, the distance of the surface from the light source, and the reflectivity of the object. Under known lighting conditions and accurate assumptions on the reflectivity, the shape of an object can thus be reconstructed from its appearance in the image.

Shape from shading is a straightforward implementation of the above procedure that infers the shape of the object from a single image. Usually, shape from shading is restricted to a single point-light source and assumes a uniform reflectivity of the object. These strong constraints are generally fulfilled only under very well defined conditions, such as in microscopy, where shape from shading can yield highly accurate results.

Photometric stereo is a generalization of the shape from shading approach that uses several images of the scene taken from the same viewing angle but under different lighting conditions. This allows photometric stereo to infer the shape of objects
with non-uniform reflectivity properties. However, the method is sensitive to shadows that are cast under the utilized lighting conditions.

Shape from focus is a third example of a photometric method. Here, the distance to the target is determined from the focal length of the optical system when the object appears sharp in the image. Intuitively speaking, an optical system with a large aperture has a small depth of field, which in turn means that the optimal focal length at which the object appears sharp can be determined with higher accuracy. Similarly to stereoscopy, the shape from focus approach only works if the object is sufficiently textured. Otherwise, the blur caused by defocusing the image cannot be detected.

2.3.3 Interferometry

Interferometry is an active range imaging procedure where coherent monochromatic light is emitted towards the imaged object and the reflected signal is superimposed with a reference signal of the same light source. Due to the use of coherent monochromatic light, both signals can be viewed as wavefronts of the same frequency. The distance of the object induces a phase shift between these two wavefronts, which results in constructive or destructive interference. Thus, the amplitude of the superimposed signal can be used to estimate the distance of the object.

The Michelson interferometer implements this approach using a beam splitter to split a laser beam of coherent monochromatic light into two beams. One beam is directed at the object whereas the other is targeted at a mirror at a fixed distance to generate the reference signal. Both reflected signals are superimposed by the beam splitter and projected onto an integrating detector. While this setup allows range measurements with an accuracy of fractions of the laser's wavelength, it is not possible to obtain absolute range estimates because the non-ambiguity range is only one half of a wavelength.

Multiple-wavelength interferometers overcome the limitation of low non-ambiguity ranges by using two laser beams of very similar wavelengths. This allows the generation of low-frequency beat signals with a long synthetic wavelength, which in turn yields absolute range measurements up to several tens of centimeters.

2.3.4 Time-of-Flight

The time-of-flight principle can be used for optical range measurement systems because we know the speed of light. A range estimate can thus be obtained by measuring the time the light requires to travel from the imaging system to the object and
back to the detector. This implies that time-of-flight systems fall into the category of active range imaging techniques.

Like TOF cameras, laser scanners can be used to implement this approach. A major disadvantage of laser scanners is, however, that they can only measure the distance to a single point in the scene at a time. Thus, they sweep the laser beam across the scene to obtain an entire range image, which reduces the temporal resolution of the imager.

2.3.5 Summary

In contrast to the above-mentioned approaches to measuring range, a number of advantages of the TOF sensor technology can be highlighted. These advantages can roughly be divided into three categories: the robustness of the measurement, the reliability of the hardware, and the cost of production.

As mentioned above, the most competitive alternative to TOF cameras are stereo systems. However, one has to note that stereo systems only work under very restrictive constraints on the scene, i.e. the surfaces of the measured objects have to provide sufficient texture for a robust estimate of the point-to-point correspondences. Here, TOF cameras are less dependent on the scene and their main limitations in this context are objects with specular surfaces. At the same time, TOF cameras are widely independent of the ambient light conditions as they rely on their own active illumination.

Although TOF cameras cannot match the accuracy of laser scanners, they offer the advantage that they use solid-state sensor technology, i.e. the cameras are not composed of moving parts, which makes them very reliable and reduces maintenance costs. Furthermore, they can achieve a higher temporal resolution as the range measurement is computed in hardware in each pixel for an entire image at once instead of scanning the scene.

Finally, TOF cameras can be mass-produced at a very low cost because they are based on standard CMOS technology. Thus, the sensor itself lies in the same price range as regular 2D image sensors. Although additional costs arise due to the illumination and control units, TOF systems are still cheaper than stereo cameras, which generally involve two high-quality imagers. Here, they also have the advantage of being smaller in size.


2.4 Measurement Principle

The most common technical realization of current TOF cameras uses an illumination unit that emits light whose intensity is modulated by a periodic signal over time. The time the light requires to travel from the camera to the object and back to the sensor induces a delay between the emitted and received signal. Because of the periodicity of the signal, this delay corresponds to a phase shift, which encodes the distance of the object. The camera estimates this phase shift and thus determines the distance.

This measurement principle is illustrated in Figure 2.2. Here, the emitted signal e(t) is an ideal sinusoidal function. The intensity of the active illumination is modulated according to such a periodic signal over time. The received signal s(t) is composed of a shifted and attenuated version of e(t) and an additive constant component. The phase shift between the signals s(t) and e(t) is denoted by φ. The distance to the object can be inferred from φ given the modulation frequency and the speed of light. The attenuation is due to the fact that only a small portion of the emitted light is actually reflected back towards the optical system of the camera, while a large portion of the light is simply scattered into the scene. Hence, the amplitude A of the received signal is smaller than that of the emitted signal. The constant additive offset I0 has its origin in ambient infrared light that is emitted by other light sources illuminating the scene, such as the sun. For convenience, we also introduce the DC component B of the received signal, which corresponds to B = I0 + A. Thus the received signal takes the form

    s(t) = B + A · sin(ωt − φ)                                   (2.1)

where ω denotes the angular frequency, which relates to the modulation frequency fmod by ω = 2π fmod.

As demonstrated by Lange (2000), the parameters φ, A, and B can be reconstructed by sampling the received signal s(t) at four points over one modulation period Tmod = 1/fmod. This yields samples A0,...,A3, where Ak = s(k/4 · Tmod) = B + A · sin(k · π/2 − φ). Given these four samples, the parameters of the received signal can be reconstructed as follows:


Figure 2.2: Measurement principle of TOF cameras based on an intensity-modulated active illumination unit. The phase difference between the emitted signal e(t) and the received signal s(t) is denoted by φ, the amplitude of the received signal corresponds to A, the offset (or DC component) of the received signal is given by B, and I0 represents the ambient light present in the scene.

    Phase:      φ = atan((A0 − A2) / (A1 − A3))                  (2.2)

    Amplitude:  A = √((A0 − A2)² + (A1 − A3)²) / 2               (2.3)

    Offset:     B = (A0 + A1 + A2 + A3) / 4                      (2.4)

This concept is illustrated in Figure 2.3. Note that the sampling is done in synchrony with the modulation frequency of the illumination unit, i.e. the first sample is taken at the beginning of a new period.

Given the phase shift φ and the modulation frequency fmod we can compute the distance of the object. To this end, we consider the time delay between the emitted and received signal, which corresponds to φ/ω with ω = 2πfmod. During this time delay, the light travels a distance of (φ/ω) · c, where c denotes the speed of light. Because the light has to travel from the camera to the object and back to the sensor, the distance of the object is given by

    R = (φ/2) · (c/ω).                                           (2.5)
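To make the reconstruction of Equations (2.2)–(2.5) concrete, the following minimal Python sketch (an added illustration, not part of the original text) computes phase, amplitude, offset, and distance from four samples; the sample values and the 30 MHz modulation frequency are made-up numbers, and a four-quadrant arctangent is used in place of the plain atan of Equation (2.2).

import math

def reconstruct(a0, a1, a2, a3, f_mod):
    # Phase, amplitude, and offset as in Equations (2.2)-(2.4).
    phi = math.atan2(a0 - a2, a1 - a3) % (2 * math.pi)
    amplitude = math.hypot(a0 - a2, a1 - a3) / 2.0
    offset = (a0 + a1 + a2 + a3) / 4.0
    # Distance according to Equation (2.5): R = (phi / 2) * (c / omega).
    c = 299792458.0
    omega = 2 * math.pi * f_mod
    return phi, amplitude, offset, 0.5 * phi * c / omega

# Made-up sample values for illustration.
print(reconstruct(180.0, 240.0, 260.0, 200.0, 30e6))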



Figure 2.3: The received signal s(t) is sampled at four points over one modulation period of the emitted signal e(t), i.e. the sampling is synchronized with the modulation of the emitted signal. This yields the samples A0,...,A3, which enable the reconstruction of the parameters of the received signal.

2.5 Technical Realization

In practice, an ideal sampling of the received signal cannot be realized. Instead, the signal has to be integrated over certain interleaving time intervals to approximate the samples A0,...,A3. Before going into more detail on how this is done on the sensor, let us consider how the length of the integration interval ∆t affects the reconstruction of the signal s(t). From a signal processing point of view, the integration theoretically equals a convolution of the signal s(t) with a rect function followed by an ideal sampling. Thus, it suffices here to investigate the effect of the convolution on the signal s(t). Figure 2.4 visualizes this effect for three different lengths of the integration interval. In the first case, an infinitesimal interval is used, which corresponds to the ideal sampling with a Dirac δ. When a signal is convolved with a Dirac in the time domain, each frequency component is multiplied with a constant one in the Fourier domain. Naturally, the frequency components of the sine function are not altered and the signal s(t) can be reconstructed perfectly.

Extending the length of the integration interval ∆t results in the convolution with a rect function h(t) = (1/∆t) · rect(t/∆t), which corresponds to a multiplication with the transfer function H(f) = sinc(πf∆t) in the Fourier domain, where we define sinc(x) = sin(x)/x. At the modulation frequency fmod, the transfer function takes the form H(fmod) = sinc(π∆t/Tmod).



Figure 2.4: Influence of the physical realization of an ideal sampling by integrating over a certain time interval ∆t. The left column shows the integration in the time domain with intervals ∆t = 0 (top), ∆t < Tmod (middle), and ∆t = Tmod (bottom). The right column shows the effect in the Fourier domain. The signal is either not affected (top), attenuated (middle), or lost (bottom).


Thus, the frequency components of s(t) are attenuated as depicted in Figure 2.4. This results in an attenuation of the amplitude of the received signal, which has a negative effect on the reconstruction of the signal parameters φ, A, and B in the presence of noise.

In case we further elongate the integration interval to an entire period of the modulation frequency (∆t = Tmod), the zero-crossings of the transfer function H(f) coincide with the frequency components of s(t), i.e. H(fmod) = 0, and we lose all information on the signal.

This effect leads us to the concept of the demodulation contrast. In the case that ∆t < Tmod, the transfer function H(f) does not affect the phase of the signal s(t) and the phase shift φ can be estimated as defined in Equation (2.2). The illustration above indicates, however, that an increase of the integration interval ∆t leads to an attenuation of the received signal's amplitude A relative to the amplitude
Asig of the emitted signal. The ratio between the received amplitude and the emitted amplitude is referred to as the demodulation contrast, which is defined as

    cdemod = A / Asig.                                           (2.6)

Thus, integrating over a time window of length ∆t yields a demodulation contrast of cdemod = sinc(π∆t/Tmod). Again, we can observe that the demodulation contrast and thus the robustness of the reconstruction increase for small ∆t.

We now consider the case where the length of the integration interval is one
fourth of the modulation period (∆t = Tmod/4). In this case, we obtain a demodulation contrast of cdemod = √8/π ≈ 0.90 and the samples A0,...,A3 can be estimated consecutively during one period, as illustrated in Figure 2.5. In practice, this is done by collecting and storing the photoelectrons generated on the sensor during one of these time intervals in corresponding potential wells – one for each sample Ak. This is achieved by applying an electric field over the light-sensitive area of the pixel to guide the photoelectrons into one of the potential wells. This electric field is actuated in synchrony with the illumination unit so that the photoelectrons are driven cyclically into one of the four potential wells corresponding to A0,...,A3 during one cycle of the modulation period. This approach is known as the 4-tap pixel.
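As a numerical check of the demodulation contrast discussed above (an added illustration, not from the original text), the following Python sketch evaluates cdemod = sinc(π∆t/Tmod) for the ideal sampling, the 4-tap case, and the 2-tap case described further below.

import math

def demodulation_contrast(dt_over_tmod):
    # cdemod = sinc(pi * dt / Tmod) with sinc(x) = sin(x) / x.
    x = math.pi * dt_over_tmod
    return 1.0 if x == 0 else math.sin(x) / x

for label, ratio in [("ideal", 0.0), ("4-tap (dt = Tmod/4)", 0.25),
                     ("2-tap (dt = Tmod/2)", 0.5), ("dt = Tmod", 1.0)]:
    print(label, round(demodulation_contrast(ratio), 2))
# Prints roughly 1.0, 0.90, 0.64, and 0.0, matching the values given in the text.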

In order to obtain reliable estimates Ak, the total exposure time for one image is extended over several periods of the modulation signal and the above procedure of sampling s(t) is repeated for each period during the entire duration of the exposure time. Apparently, an increased exposure time reduces the frame rate of the camera while increasing the signal-to-noise ratio of the measurement.



Figure 2.5: Illustration of the measurement sequence for the samples A0,...,A3 for both the 4-tap pixel and the 2-tap pixel.

While the 4-tap pixel represents a straightforward implementation of the measurement principle introduced in Section 2.4, it has a major disadvantage. The four potential wells for storing the photoelectrons take up a large portion of the total space of one pixel. Thus, the actual photo-sensitive area of a pixel is very small, i.e. the 4-tap pixel has a low so-called fill factor. As a result, either the size of a pixel has to be very large or the exposure time of the camera has to be increased. Both approaches have their drawbacks. Large pixels reduce the effective spatial resolution of the camera. At the same time, the extension of a pixel is limited because the photoelectrons travel to the potential wells by diffusion and the expected amount of time for this diffusion process increases quadratically with the mean distance the photoelectrons have to travel before reaching the potential wells. Longer exposure times reduce the temporal resolution and also introduce so-called motion artefacts, which we will discuss in more detail at a later point.

Instead of implementing the 4-tap pixel, most recent cameras rely on the so-called 2-tap pixel. Instead of having four potential wells for storing the photoelectrons during the estimation of A0,...,A3, the samples are estimated in two consecutive steps: First, the sensor integrates over two intervals of the periodic signal to estimate A0 and A2. For this operation only two potential wells are required; hence the name 2-tap pixel.
After A0 and A2 have been estimated over a number of periods of the modulation signal, the two values are read out and stored in an internal register of the camera. Once the readout is completed, the sensor shifts to measuring A1 and A3. In principle, the camera performs two independent measurements and the distance can only be computed after these two measurements have been completed. This approach is also depicted in Figure 2.5.

A main drawback of the 2-tap implementation originates from the fact that the sensor makes two independent measurements at different points in time to obtain one range estimate in a pixel. Imagine that an object boundary moves across the pixel at the moment the sensor switches between the two measurements. In that case, A0 and A2 correspond to a different phase shift than A1 and A3. Thus, both the computation of φ and the range value are erroneous.

Furthermore, in the 2-tap pixel we integrate over a time window of length ∆t = Tmod/2 to estimate A0,...,A3. This results in a lower demodulation contrast of cdemod = 2/π ≈ 0.64 in comparison to cdemod ≈ 0.90 for the 4-tap pixel. Although the 4-tap pixel seems to be the design of choice from this point of view, the 2-tap pixel has the major advantage that the fill factor is increased significantly. At the same time, the pixel is easily scalable, i.e. the potential wells are located on opposite sides of the pixel and the distance between the containers can be set to the optimal value. Here, the optimal value refers to a tradeoff between a low mean distance for the photoelectrons to travel to the potential wells and a large light-sensitive area. The scalability results from the fact that the pixels can be extended easily along the axis that is perpendicular to the potential wells to further increase the light-sensitive area of the pixel. This is the reason why the majority of TOF cameras are based on the 2-tap pixel.

2.6 Measurement Accuracy

The accuracy of the TOF range measurement depends mainly on the signal-to-noise ratio of the samples A0,...,A3. The signal-to-noise ratio is influenced by two different factors. The first is related to the conditions of the background illumination, i.e. the intensity of the ambient light illuminating the scene. The second depends on the actual process of generating the emitted signal e(t) by the active illumination unit.

The influence of the ambient light can be reduced to a large extent through the use of a bandpass filter tuned to the wavelength of the infrared light of the active illumination and a circuit that actively suppresses the contributions of the ambient
light to the measurements of A0,...,A3.

In the second case, the generation of the signal e(t), the noise sources can only be partially compensated. One deviation of a technical implementation from the measurement principle introduced in Section 2.4 is that the signal e(t) will not be ideally sinusoidal. This has the effect that the range measurement deviates depending on the distance of the object from the camera. Since this error is systematic, it can be corrected (Lange, 2000; Rapp, 2007).

Another noise source related to the illumination unit is the so-called photon shot noise, which is a principal physical limitation of sensors based on active illumination. It refers to the fact that light sources do not emit a steady stream of photons at well defined time intervals, but that the generation of photons happens according to a Poisson distribution, i.e. the time interval between the emission of photons represents a Poisson-distributed random variable. Given that the standard deviation of such a Poisson-distributed random variable is the square root of the mean, the same is true for the standard deviations of the samples A0,...,A3, i.e. σAi = √Ai for all i = 0,...,3.

According to Lange (2000) this yields a standard deviation of the phase measurement

    ∆φ = √B / (A · √2)                                           (2.7)

and, hence, a standard deviation of the range measurement

    ∆R = c / (4π fmod) · √B / (cdemod · Asig · √2).              (2.8)

Equation (2.8) shows that the measurement accuracy is governed by three factors. As already mentioned above, a high amount of background illumination increases the measurement error. At the same time, the measurement error is reduced if we increase the optical power of the active illumination, i.e. if we increase the amplitude of the signal e(t). Here, care has to be taken in situations where the TOF system has to comply with standards of eye safety. Finally, increasing the modulation frequency also reduces the measurement error. Note, however, that a higher modulation frequency also reduces the non-ambiguity range and that there is a physical limitation in terms of the circuitry on the sensor that toggles between the samples A0,...,A3. Generally speaking, Equation (2.8) reflects a principal physical limitation of the measurement accuracy, and TOF cameras currently on the market are already close to reaching this limit (Büttgen et al., 2005).
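As an added numerical illustration of Equation (2.8) (not part of the original text), the sketch below evaluates the range standard deviation for made-up values of the background level B, the emitted amplitude Asig, the demodulation contrast, and the modulation frequency; it only transcribes the formula and makes no claim about realistic operating points.

import math

def range_std(b, a_sig, c_demod, f_mod):
    # Equation (2.8): dR = c / (4*pi*f_mod) * sqrt(B) / (c_demod * A_sig * sqrt(2))
    c = 299792458.0
    return c / (4 * math.pi * f_mod) * math.sqrt(b) / (c_demod * a_sig * math.sqrt(2))

# Hypothetical numbers; increasing f_mod or a_sig reduces the error, increasing b raises it.
print(range_std(b=2000.0, a_sig=500.0, c_demod=0.64, f_mod=30e6))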


2.7 Limitations

As the previous sections have shown, TOF cameras constitute very appealing image sensors. Throughout this work we will see that they can ease numerous computer vision applications because they provide both range and intensity data. However, there also exist a number of limitations of the TOF sensor technology that need to be taken into account when designing applications for such cameras. Apart from the above-mentioned non-ambiguity range, these limitations entail measurement errors which can be put into the categories of systematic errors, errors due to multiple reflections in the scene, wrong range estimates at jump edges (so-called flying pixels), motion artefacts, and problems that arise when multiple cameras are used together. In this section, we will review these sources of measurement errors and discuss solutions that avoid or overcome these limitations.

2.7.1 Non-Ambiguity Range

As mentioned earlier in this chapter, TOF cameras based on the modulation principle can only yield non-ambiguous range measurements within a certain distance from the camera. This effect depends on the modulation frequency of the active illumination unit. To further investigate this effect, let us recall Equation (2.5) as it was introduced in Section 2.4:

    R = (φ/2) · (c/ω).                                           (2.9)

As before, φ denotes the phase shift between the periodic signal that was emitted by the camera and the delayed version that is reflected back into the camera from the scene. The modulation frequency fmod contributes to the denominator via the relation ω = 2πfmod.

A TOF camera can only disambiguate distances of objects when the signal does not pass through more than one period of the modulation signal while the light travels to the object and back to the camera. Thus, the non-ambiguity range Rmax is defined as:

    Rmax = (2π/2) · (c/ω) = c / (2 fmod).                        (2.10)

Given a typical modulation frequency of fmod = 30 MHz, the non-ambiguity range limits well-defined range measurements to up to 5 meters distance from the
camera.

There exists, however, a simple approach to increase the non-ambiguity range. If two images are taken of the same scene with different modulation frequencies, one can reconstruct the distance unambiguously up to a range that corresponds to the least common multiple of the two corresponding non-ambiguity ranges. This procedure is limited to cases where the objects undergo no or only limited motion in front of the camera. Otherwise, wrong distances will be estimated for pixels that move across object borders from one frame to the other and thus depict different objects at different distances in the two images.
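To make the numbers above concrete (an added sketch, not from the original text), the following Python fragment computes the non-ambiguity range of Equation (2.10) and the extended range obtained from two modulation frequencies as the least common multiple of the two individual ranges; the 29/31 MHz pair is simply one of the frequency pairs supported by the SR4000 mentioned in Section 2.2.

from math import gcd

C = 299792458.0  # speed of light in m/s

def non_ambiguity_range(f_mod_hz):
    # Equation (2.10): R_max = c / (2 * f_mod)
    return C / (2.0 * f_mod_hz)

def combined_range(f1_hz, f2_hz):
    # The least common multiple of the two ranges c/(2*f1) and c/(2*f2)
    # equals c / (2 * gcd(f1, f2)), computed here with integer frequencies.
    return C / (2.0 * gcd(int(f1_hz), int(f2_hz)))

print(non_ambiguity_range(30e6))   # roughly 5 meters
print(combined_range(29e6, 31e6))  # roughly 150 meters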

2.7.2 Systematic Errors

As already mentioned in Section 2.6, TOF cameras can produce systematic errors, i.e. range measurement errors that are induced by deviations of the illumination unit's signal e(t) from an ideal sinusoid. Theoretically, these errors can be compensated by using a higher number of samples Ak to estimate the phase shift φ (Lange, 2000; Rapp, 2007). In practice, however, this is impractical as it would introduce higher computational cost and promote motion artefacts (Kolb et al., 2009).

An alternative is to correct systematic errors in a post-processing step. This can, for example, be done using look-up tables (Kahlmann et al., 2007) or correction functions such as B-splines (Lindner and Kolb, 2006).
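As a rough illustration of look-up-table-based correction (an added sketch under assumed calibration data, not a reproduction of the cited methods), one can record the systematic error at a set of reference distances and subtract an interpolated correction from each new measurement:

import numpy as np

# Hypothetical calibration: measured ranges (m) and the systematic error (m)
# observed at those ranges against a ground-truth target.
calib_range = np.array([0.5, 1.0, 2.0, 3.0, 4.0, 5.0])
calib_error = np.array([0.010, 0.004, -0.006, 0.002, 0.008, -0.003])

def correct_range(measured):
    # Interpolate the systematic error at the measured distance and remove it.
    error = np.interp(measured, calib_range, calib_error)
    return measured - error

print(correct_range(np.array([1.5, 2.5, 4.2])))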

2.7.3 Multiple Reflections

The working principle of TOF cameras is based on the assumption that the modulated light that reaches a pixel is reflected from a single object at a well defined distance and that the light travels directly back into the camera. In this case, the camera receives a single signal s(t) that has a phase shift φ with respect to the emitted signal e(t) that corresponds to the distance of the object. This assumption may be violated due to multiple reflections that can occur both in the scene and inside the camera.

Multiple reflections in the scene refer to the situation when light of the active illumination travels not solely between the camera and the imaged object, but when it is reflected at additional objects in the scene along its path from the camera to the object and back to the sensor. In such a case, the distance travelled by the light is longer, which results in an increased time of flight and, thus, a false range estimate (Gudmundsson et al., 2007b).

Multiple reflections inside the camera have the effect that stray light from other
objects than the imaged one reaches a pixel. This stray light is due to unwanted reflections in the lens and the body of the camera. Again, this results in a false range measurement (Falie and Buzuloiu, 2007).

While the effect of multiple reflections inside the camera can be attenuated by coating the inside of the camera with a non-reflective substance, the problem of multiple reflections in the scene is a fundamental one of the TOF principle. Currently, there exists no general solution to this problem other than arranging the scene in a way that avoids indirect reflections.

2.7.4 Flying Pixels

A related phenomenon is that of the so-called flying pixels. Again, the effect occurs because light from different objects at different distances reaches the same pixel, which results in a false range measurement. In this case, the border between the objects runs through a single pixel. Thus, the range measurement is composed of the distances of both objects. The contribution of each object to the distance measurement depends on the relative amount of light it sends to the pixel. In case the scene is shifted slightly in front of the camera, this relative amount changes and, thus, the corresponding pixel appears to be flying between the objects in the 3D scene.

2.7.5 Motion Artefacts

Motion artefacts occur mainly in implementations of the measurement principle described in Section 2.4 that are based on a 1-tap or a 2-tap pixel. These artefacts are due to the fact that the samples A0,...,A3 are not taken simultaneously but subsequently. As described above, the 2-tap pixel first estimates A0 and A2 and then obtains A1 and A3 in a second measurement. In case an object moves between these two measurements, they will be inconsistent. As a result, the range estimate will be erroneous. This effect is most severe if a jump edge moves across a pixel between the two measurements.

Note, however, that the resulting range measurement does not reflect the mean of the distances to the object seen during the two measurements. Instead, the estimated range value can be both smaller than the smallest and greater than the largest range value imaged by a pixel during the time all samples A0,...,A3 are taken. For more detailed reading on this topic refer to Lindner and Kolb (2009).


2.7.6 Multi-Camera Setups

The use of TOF cameras operating at the same modulation frequency in a configuration where light from one camera's illumination unit can reach the sensor of the other camera corrupts the range measurement. This is due to the fact that the sensor cannot disambiguate between the phase shift induced by its own illumination unit and that of another camera.

A straightforward way of circumventing this problem is to run the cameras at different modulation frequencies. The demodulation approach to measuring the phase shift responds only to a very narrow band of frequencies near the modulation frequency; all other frequencies will merely appear as an increase in background illumination (Lange, 2000).

A second solution for using several cameras in the same environment is to modulate the illumination by a pseudo-noise signal. In case each camera uses a different code, the sensor will only detect the phase shift corresponding to its own emitted signal (Heinol, 2001).

Part II

Algorithms


3 Introduction

The second part of this thesis is devoted to computer vision algorithms designed for TOF camera data. In this context, we will demonstrate the advantage of a combined image sensor that delivers both range and intensity data, i.e. we will explicitly show how the combination of both types of data significantly improves the performance of algorithms in contrast to using either type of data alone. The first section of this chapter will focus on the improvement of the range measurement by exploiting the intensity image under the well-defined lighting conditions provided by the TOF camera illumination. Secondly, we will address the topic of image segmentation. Here, the goal is to identify connected image regions that depict an object or a person present in the scene. These results will then be used in an algorithm for the estimation of human pose that fits a simple model of the human body in 3D. Finally, we will turn to the discussion of suitable image features for encoding relevant properties of TOF images. In this context, we will first discuss geometrically motivated features that are related to the Gaussian curvature. We will then reformulate the computation of these features in 3D to achieve scale invariance. A third type of features is obtained using the sparse coding principle. These three types of features will be evaluated in the context of detecting facial features, such as the nose. Finally, we will turn to the computation of range flow and will use the resulting 3D motion vectors for the recognition of human gestures. In the following, we give a brief overview of the four topics discussed in the scope of Part II of this thesis in Chapter 4 through Chapter 7.

Shading Constraint Improves TOF Measurements

In Chapter 4, we describe a technique for improving the accuracy of range maps measured by a TOF camera. Our technique is based on the observation that the range map and intensity image are not independent but are linked by the shading constraint: If the reflectance properties of the surface are known, a certain range map implies a corresponding intensity image. In practice, a general reflectance model (such as Lambertian reflectance) provides a sufficient approximation for a wide range of surfaces. We impose the shading constraint using a probabilistic model of image formation and find a maximum a posteriori estimate for the true range map. We present results on both synthetic and real TOF camera images that demonstrate the robust shape estimates achieved by the algorithm. We also show how the reflectivity (or albedo) of the surface can be estimated, both globally for an entire object and locally for objects where albedo varies across the surface.

Image Segmentation

In image analysis, a very helpful step towards interpreting the content of an image is to assign the pixels of the image to different categories. Often, one uses a category for the background of the scene and one category for each object appearing in front of the background. In Chapter 5, we describe how the available range map can be used effectively in combination with the intensity data to devise two very efficient algorithms that reliably identify the pixels belonging to a person in front of the camera. One method relies on a previously captured model of the background and determines the person as a deviation from this background model. The second method operates on a histogram of the range values to identify the object closest to the camera. In both cases, the range data of the TOF camera makes it possible to obtain very robust segmentation results even in complex scenes.

Pose Estimation

In Chapter 6, we describe a technique for estimating human pose from an image sequence captured by a TOF camera. The pose estimation is derived from a simple model of the human body that we fit to the data in 3D space. The model is represented by a graph consisting of 44 vertices for the upper torso, head, and arms. The anatomy of these body parts is encoded by the edges, i.e. an arm is represented by a chain of pairwise connected vertices whereas the torso consists of a 2-dimensional grid. The model can easily be extended to the representation of legs by adding further chains of pairwise connected vertices to the lower torso. The model is fit to the data in 3D space by employing an iterative update rule common to self-organizing maps. Despite the simplicity of the model, it captures the human pose robustly and can thus be used for tracking the major body parts, such as arms, hands, and head. The accuracy of the tracking is around 5–6 cm root mean square (RMS) for the hands and shoulders and around 2 cm RMS for the head. The implementation of the procedure is straightforward and real-time capable.

Features

The discussion of TOF image features in Chapter 7 is divided into four individual parts.

The first part, in Section 7.1, will discuss the so-called generalized eccentricities, a kind of feature that can be used to distinguish between different surface types, i.e. one can, for example, distinguish between planar surface regions, edges, and corners in 3D. These features were employed for detecting the nose in frontal face images, and we obtained an equal error rate of 3.0%.

Section 7.2 will focus on a reformulation of the generalized eccentricities such that the resulting features become invariant to scale. This is achieved by computing the features not on the image grid but on the sampled surface of the object in 3D. This becomes possible by using the range map to invert the perspective camera projection of the TOF camera. As a result, one obtains irregularly sampled data. Here, we propose the use of the Nonequispaced Fast Fourier Transform to compute the features. As a result, one can observe a significantly improved robustness of the nose detection when the person is moving towards and away from the camera. An error rate of zero is achieved on the test data.

The third category of image features is computed using the sparse coding principle, i.e. we learn an image basis for the simultaneous representation of TOF range and intensity data. We show in Section 7.3 that the resulting features outperform features obtained using Principal Component Analysis in the same nose detection task that was evaluated for the geometric features. In comparison to the generalized eccentricities, we achieve a slightly reduced performance. On the other hand, in this scenario the features were simply obtained under the sparse coding principle without incorporating prior knowledge of the data or properties of the object to be detected.

The fourth type of features, presented in Section 7.4, aims at the extraction of the 3D motion of objects in the scene. To this end, we rely on the computation of range flow. The goal is the recognition of human gestures. We propose to combine the computation of range flow with the previously discussed estimation of human pose, i.e. we explicitly compute the 3D motion vectors for the hands of the person performing a gesture. These motion vectors are accumulated in 3D motion histograms. We then apply a learned decision rule to assign a gesture to each frame of a video sequence. Here, we specifically focus on the problem of detecting that no gesture was performed, i.e. each frame is either assigned to one of the predefined gestures or to the class indicating that no gesture was performed. We achieve detection rates above 90% for the three investigated hand gestures.

4 Shading Constraint Improves TOF Measurements

4.1 Introduction

In this chapter, we present a technique for improving the accuracy of the TOF camera's range measurements, based on the insight that the range and intensity measurements are not independent, but are linked by the shading constraint: Assuming that the reflectance properties of the object surface are known, we can deduce the intensity image that should be observed. In practice, a general reflectance model (such as Lambertian reflectance) will provide an acceptable approximation to the properties of a wide range of objects.

In theory, the shading constraint can be used to reconstruct the range map from an intensity image alone; this idea has been exploited in a wide range of shape from shading (SfS) algorithms (see (Zhang et al., 1999; Durou et al., 2008) for surveys). A principal limitation of these algorithms, however, is that they cannot determine whether intensity changes are caused by the object's shape or by changes in the object's reflectivity (or albedo). Because of this, the object is usually assumed to have constant albedo; this limits the applicability of SfS methods.

The range map measured by the TOF camera places a strong additional constraint on the shape of the object, allowing ambiguities that may exist in the pure SfS setting (Durou and Piau, 2000) to be resolved and enabling the albedo of the surface to be estimated, both globally for an entire object as well as locally for objects where albedo varies across the surface.

This chapter describes results obtained in collaboration with Martin Böhme, who formulated the probabilistic model of image formation used to enforce the shading constraint. Martin Böhme and I contributed approximately equally to refining and implementing the method. The work described here has previously been published in (Böhme et al., 2008b, 2010).


Besides the shading constraint, there are also other ways of fusing range and intensity data. A number of authors exploit the fact that an edge in the intensity data often co-occurs with an edge in the range data. Nadabar and Jain (1995) use a Markov random field (MRF) to identify different types of edges. Diebel and Thrun (2006) use edge strengths estimated on a high-resolution color image to increase the resolution of a low-resolution depth map.

The idea of integrating the shading constraint with other range information has been investigated by a number of researchers. Most of this work focuses on the integration of SfS with stereo. These two techniques complement each other well because SfS works well on uniformly colored areas whereas stereo requires surface texture to find stereo correspondences. Because of this, the fusion of SfS with stereo has a slightly different focus than the fusion of SfS with a TOF range map. In the stereo case, we only have range information in textured areas and need to rely on shading cues in untextured areas. In the TOF case, we have a dense range map and wish to fuse information from TOF and shading at the same pixel.

Many approaches to the fusion of SfS and stereo (see for example the work of Thompson (1993), Fua and Leclerc (1995), and Hartt and Carlotto (1989)) use an objective function that depends directly on the two or more images obtained from a multi-camera setup; for this reason, they do not generalize to settings where a range map has been obtained in some other way than through stereo. Samaras et al. (2000) combine stereo with SfS by using stereo in textured areas and SfS in untextured areas, but they do not perform a fusion of stereo and SfS at the same location. Haines and Wilson (2007, 2008) fuse stereo and SfS in a probabilistic approach based on a disparity map and the shading observed in one of the stereo images. Because there is a one-to-one correspondence between disparity and range, the approach could also be used with range maps obtained by arbitrary means. However, since color is used to segment areas of different albedo, the approach is not suitable for use with TOF cameras, which typically only deliver a grayscale image.

There are several other approaches that combine shading with a range map obtained by arbitrary means; stereo may be used, but it is not essential to the formulation of the algorithm. Leclerc and Bobick (1991) use a stereo range map to initialize an iterative SfS method. Cryer et al. (1995) use a heuristic that combines low-frequency components from the stereo range map with high-frequency components from the SfS range map. Mostafa et al. (1999) use a neural network to interpolate the difference between the SfS result and a more coarsely sampled range map from a range sensor; the SfS result is corrected using this error estimate. These approaches allow arbitrary range maps to be used, but they are all somewhat ad hoc.

Our approach to improving the accuracy of the range map using the shading constraint is based on a probabilistic model of the image formation process. We obtain a maximum a posteriori estimate for the range map using a numerical minimization technique. The approach has a solid theoretical foundation and incorporates the sensor-based range information and the shading constraint in a single model; for details, see Section 4.2. The method delivers robust estimation results on both synthetic and natural images, as we show in Section 4.3.

4.2 Method

4.2.1 Probabilistic Image Formation Model

We seek to find the range map R that maximizes the posterior probability
\[
p(X^R, X^I \mid R, A)\, p(R)\, p(A). \tag{4.1}
\]
Here, p(X^R, X^I | R, A) is the probability of observing a range map X^R and an intensity image X^I given that the true range map describing the shape of the imaged object is R and that the parameters of the reflectance model are A. Typically, A is the albedo of the object – we will discuss this in more detail below. p(R) is a prior on the range map, and p(A) is a prior on the reflectance model parameters.

The conditional probability p(X^R, X^I | R, A) is based on the following model of image formation: First of all, we assume that p(X^R, X^I | R, A) can be written as follows:
\[
p(X^R, X^I \mid R, A) = p(X^R \mid R, A)\, p(X^I \mid R, A). \tag{4.2}
\]

In other words, the observations X^R and X^I are conditionally independent given R and A. We now assume that the observed range map X^R is simply the true range map R with additive Gaussian noise, i.e.
\[
p(X^R \mid R, A) = \mathcal{N}\big(X^R - R \mid \mu = 0, \sigma_R(R, A)\big). \tag{4.3}
\]

Note that the standard deviation σ_R is not constant but can vary per pixel as a function of range and albedo. As we will see in Section 4.2.4, the noise in the range measurement of a TOF camera depends on the amount of light that returns to the camera.

The shading constraint postulates that a given range map R is associated with an intensity image I(R, A), where the function expressed by I depends on the reflectance model. We generally use the Lambertian reflectance model, see Section 4.2.2; in this case, A is the albedo of the object, which may vary from pixel to pixel. Again, we assume that the intensity image is corrupted by additive Gaussian noise, i.e.
\[
p(X^I \mid R, A) = \mathcal{N}\big(X^I - I(R, A) \mid \mu = 0, \sigma_I\big). \tag{4.4}
\]

For the range map prior p(R), we use the shape prior introduced by Diebel et al. (2006), which favors surfaces with smoothly changing surface normals. We tessellate the range map into triangles (see Section 4.2.3) and compute the surface normal n_j for each triangle. The shape prior is then given by the energy function
\[
E^R(R) = w_R \sum_{\substack{\text{triangles } j,k \\ \text{adjacent}}} \| n_j - n_k \|^2, \tag{4.5}
\]
which implies the distribution p(R) = (1/Z) exp(−E^R(R)), where Z is a normalization constant. w_R is a constant that controls the dispersion of the distribution.

We now turn to the prior p(A) for the parameters A of the reflectance model. In the Lambertian reflectance model, these are the albedo values at each pixel location. We will investigate several alternatives for the prior p(A): (i) "Fixed albedo": A single albedo value, specified beforehand, is used for all pixels. (ii) "Global albedo": The same global albedo is used for all pixels, but its value is allowed to vary; we assume a uniform distribution for this global albedo. (iii) "Local albedo": Each pixel location may have a different albedo, and the prior p(A) favors smooth albedo changes. In this latter case, we use an energy function
\[
E^A(A) = w_A \sum_{\substack{\text{pixels } j,k \\ \text{adjacent}}} | a_j - a_k |, \tag{4.6}
\]
which implies the prior p(A) = (1/Z) exp(−E^A(A)), in analogy to the shape prior defined above.

As usual, we take the negative logarithm of the posterior and eliminate constant additive terms to obtain an energy function
\[
E(R, A) = \sum_j \frac{(X^R_j - R_j)^2}{2\sigma_R^2}
        + \sum_j \frac{(X^I_j - I_j(R, A))^2}{2\sigma_I^2}
        + E^R(R) + E^A(A), \tag{4.7}
\]
where the index j runs over all pixels. (For the "fixed albedo" and "global albedo" models, the term E^A(A) is omitted.) Note that all the terms in the energy function are unitless due to multiplication or division by the constants σ_R, σ_I, w_R and w_A.

We find the maximum a posteriori estimate for the range map by minimizing E(R, A) using the Polak-Ribière variant of the nonlinear conjugate gradient algorithm (see for example (Press et al., 1992)). As the starting point for the minimization, we use the observed range map X^R, smoothed using a median filter, and an albedo guess (see Section 4.2.4). The gradient of E(R, A) is computed numerically using a finite differences approximation. The parameters σ_R, σ_I, w_R and w_A should be set to reflect the noise characteristics of the sensor and the statistical properties of the scene.
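To make the structure of Equation (4.7) concrete, the following is a minimal Python sketch, not the thesis implementation: predict_intensity is a placeholder for the shading model of Sections 4.2.2 and 4.2.3, and simple finite-difference smoothness terms stand in for the triangle-based priors. SciPy's method='CG' is a Polak-Ribière-type nonlinear conjugate gradient and approximates the gradient by finite differences when no analytic gradient is supplied.

```python
import numpy as np
from scipy.ndimage import median_filter
from scipy.optimize import minimize

def predict_intensity(R, albedo):
    """Placeholder for the shading model I(R, A); here a simplified
    fronto-parallel Lambertian term I = a / R^2."""
    return albedo / np.maximum(R, 1e-6) ** 2

def energy(params, X_R, X_I, sigma_R, sigma_I, w_R, w_A, shape):
    """Energy of Eq. (4.7) with finite-difference smoothness priors."""
    n = X_R.size
    R = params[:n].reshape(shape)
    A = params[n:].reshape(shape)
    data_range = np.sum((X_R - R) ** 2 / (2.0 * sigma_R ** 2))
    data_shading = np.sum((X_I - predict_intensity(R, A)) ** 2 / (2.0 * sigma_I ** 2))
    prior_shape = w_R * (np.sum(np.diff(R, axis=0) ** 2) + np.sum(np.diff(R, axis=1) ** 2))
    prior_albedo = w_A * (np.sum(np.abs(np.diff(A, axis=0))) + np.sum(np.abs(np.diff(A, axis=1))))
    return data_range + data_shading + prior_shape + prior_albedo

def map_estimate(X_R, X_I, sigma_R, sigma_I, w_R=1.0, w_A=50.0, albedo_init=0.3):
    """MAP estimate of range and albedo maps, starting from the
    median-filtered measured range map and a constant albedo guess."""
    R0 = median_filter(X_R, size=3)
    A0 = np.full_like(X_R, albedo_init)
    x0 = np.concatenate([R0.ravel(), A0.ravel()])
    res = minimize(energy, x0,
                   args=(X_R, X_I, sigma_R, sigma_I, w_R, w_A, X_R.shape),
                   method='CG', options={'maxiter': 200})
    n = X_R.size
    return res.x[:n].reshape(X_R.shape), res.x[n:].reshape(X_R.shape)
```

Evaluating the numerical gradient over all pixels is expensive, which is consistent with the runtimes of several minutes per image reported in Section 4.4.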

4.2.2 Lambertian Reflectance Model

Under the Lambertian model of diffuse reflection (Trucco and Verri, 1998), the intensity I with which a point on an object appears in the image is obtained as follows:
\[
I = a\, \frac{n \cdot l}{r^2}, \tag{4.8}
\]
where n is the surface normal, l is the unit vector from the surface point towards the light source, r is the distance of the surface point to the light source, and a is a constant that depends on the albedo of the surface, the intensity of the light source, and properties of the camera such as aperture and exposure time. For brevity, we will refer to a simply as the albedo, because any changes to a across the scene are due to albedo changes, while the properties of the light source and camera remain constant. On a TOF camera, the light source can be assumed to be co-located with the camera, and so r is simply the range value for the surface point, and l is the unit vector from the surface point to the camera.


4.2.3 Computation of Surface Normals

Both the Lambertian reflectance model and the shape prior for smooth surfaces require the normals of the surface to be known; some care needs to be taken when computing these normals on a discretely sampled range map. An obvious way is to compute the cross product of the two tangent vectors p(i + 1, j) − p(i − 1, j) and p(i, j + 1) − p(i, j − 1) (where p(i, j) are the three-dimensional coordinates of the point corresponding to pixel (i, j)), but surface normals calculated in this way can lead the minimizer astray: Because the normals of pixels with even indices depend only on the positions of pixels with odd indices, and vice versa, neighboring pixels are not constrained to have similar range values, and the minimizer may happily compute a surface with a "checkerboard" pattern, where neighboring pixels are alternately displaced upwards and downwards by a certain offset, instead of forming a smooth surface. For this reason, care needs to be taken when evaluating the reflectance model and shape prior to obtain a formulation that is physically realistic and does not lead the minimizer astray.

For any given pixel, the range and intensity measured by the camera are averages over the area covered by the pixel. Nevertheless, we will assume that these values correspond to the range and intensity of an individual point on the object's surface. To obtain a continuous surface between this grid of points, we tessellate the grid into triangles. Of the many possible tessellations, we choose one where the diagonals dividing a quad of pixels into two triangles run parallel.

To compute the intensity of a pixel, given a certain range map, we compute the average intensity over all triangles adjacent to it. (All triangles have the same projected area in the image, hence they are all weighted equally.) Because the triangles in the mesh are small compared to their distance from the camera, we can assume that intensity is constant across a triangle. The intensity I_j for pixel j is thus obtained as follows:
\[
I_j = \frac{a_j \sum_{k \in N_j} n_k \cdot l_j}{R_j^2\, |N_j|}, \tag{4.9}
\]
where a_j is the albedo of the pixel, R_j is the range value of the pixel, l_j is the unit vector from the surface point to the light source, n_k is the surface normal of triangle k, and N_j is the set of triangles that are adjacent to pixel j; see Figure 4.1 for an illustration of the triangles that are adjacent to a pixel.

When computing the shape prior, we must take care to count each edge exactly once. We do this by iterating over all pixels and, for each pixel, evaluating the shape prior only for those edges in the tessellation that lie below and to the right of the pixel in the grid.


Figure 4.1: The intensity of a pixel is computed by averaging over the intensity of all triangles that are adjacent to it. Triangles that are adjacent to the central pixel are shaded in gray.


Figure 4.2: To avoid directional artefacts, the shape prior is evaluated over two different tessellations, with the diagonals running in opposite directions. Figure 4.2a shows a tessellation with diagonals running top left to bottom right. For each pixel, the shape prior is evaluated for the edges (shown in bold) below and to the right of the pixel. Figure 4.2b shows a tessellation with diagonals running bottom left to top right. The shape prior is evaluated for the edges below and to the left of the pixel.

This is illustrated in Figure 4.2a, where the bold edges are the ones for which the shape prior is evaluated when considering the central pixel. The direction in which the diagonals run in the tessellation introduces an asymmetry, and we have found that this asymmetry can cause the shape prior to generate directional artefacts. For this reason, we evaluate the shape prior for both of the two possible directions of the diagonal and sum over the result. Figure 4.2b shows which edges are evaluated for a given pixel on the tessellation with the alternative direction of the diagonal.
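As an illustration of how Equation (4.9) can be evaluated on a tessellated range map, the following sketch uses a single tessellation direction and assumes that the unit viewing rays of the pixels have been precomputed from the camera intrinsics; it is a simplified stand-in, not the thesis implementation, which additionally evaluates both diagonal directions as described above.

```python
import numpy as np

def shade_pixels(range_map, albedo, rays):
    """Per-pixel Lambertian intensity in the spirit of Eq. (4.9).
    `rays` are the unit viewing rays of the pixels (shape (h, w, 3));
    multiplying them by the range map gives the 3D surface points."""
    P = rays * range_map[..., None]                      # one 3D point per pixel
    a, b, c, d = P[:-1, :-1], P[:-1, 1:], P[1:, :-1], P[1:, 1:]
    n1 = np.cross(c - a, b - a)                          # triangles (a, b, c)
    n2 = np.cross(b - d, c - d)                          # triangles (b, c, d)
    n1 /= np.linalg.norm(n1, axis=2, keepdims=True)
    n2 /= np.linalg.norm(n2, axis=2, keepdims=True)
    normal_sum = np.zeros_like(P)
    count = np.zeros(range_map.shape)
    # Each triangle contributes its normal to its three corner pixels.
    for n, corners in ((n1, ((0, 0), (0, 1), (1, 0))),
                       (n2, ((0, 1), (1, 0), (1, 1)))):
        for di, dj in corners:
            normal_sum[di:di + n.shape[0], dj:dj + n.shape[1]] += n
            count[di:di + n.shape[0], dj:dj + n.shape[1]] += 1
    # Light source co-located with the camera: l_j is the negated viewing ray.
    mean_dot = np.einsum('ijk,ijk->ij', normal_sum, -rays) / np.maximum(count, 1)
    return albedo * mean_dot / np.maximum(range_map, 1e-6) ** 2
```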

4.2.4 Application to Time-of-Flight Cameras

When applying the method to images recorded using a TOF camera, some particular characteristics of this sensor need to be taken into account to obtain optimal results.

First, the noise σ_R in the measured range map is not the same for all pixels but depends on the amount of light collected at each pixel – the more light, the more accurate the measurement. Hence, for each pixel, we set σ_R as a function of intensity. To estimate the functional relationship between intensity and σ_R, we recorded a sequence of images of a static scene and, at each pixel, calculated the standard deviation of the measured range values. We then fitted a power-law function to the calculated standard deviations as a function of intensity and used this function to set σ_R in the surface reconstruction algorithm.

Another important point is that most TOF cameras do not illuminate the scene homogeneously; typically, the illumination falls off towards the edges of the field of view. To measure this effect, we recorded an image, averaged over 100 frames, of a planar object with constant albedo. By comparing the actual image X^I_a to the image X^I_p predicted by our shading model (which assumes homogeneous illumination), we were able to estimate the relative illumination strength at each pixel and use this to compensate for the effect in subsequent recordings X^I via
\[
X^I_{\text{corrected}}(i, j) = \frac{X^I_p(i, j) \cdot X^I(i, j)}{X^I_a(i, j)}. \tag{4.10}
\]

Finally, if the albedo of the measured surface is approximately constant, a good initial albedo estimate can be found as follows: We find the highest-intensity pixel in the image; generally, this pixel will correspond to a part of the object that is perpendicular to the incoming light, because such regions reflect the most light. Hence, at this location, equation (4.8) reduces to I = a/r^2, and we obtain the albedo as a = I r^2. A conventional camera cannot be used to estimate albedo in this way because there, the range r is not known.
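The three sensor-specific steps above translate into a few lines of NumPy; the sketch below fits the power-law noise model, applies the illumination correction of Equation (4.10), and computes the initial albedo guess. Function and variable names are chosen for this illustration and are not taken from the thesis.

```python
import numpy as np

def fit_noise_model(range_stack, intensity_mean):
    """Fit sigma_R = a * intensity^b to per-pixel range standard deviations
    estimated from a sequence of frames of a static scene."""
    sigma = range_stack.std(axis=0).ravel()
    inten = intensity_mean.ravel()
    valid = (sigma > 0) & (inten > 0)
    # A power law becomes a straight line in log-log space.
    b, log_a = np.polyfit(np.log(inten[valid]), np.log(sigma[valid]), 1)
    return lambda intensity: np.exp(log_a) * intensity ** b

def correct_illumination(X_I, X_I_actual, X_I_predicted):
    """Equation (4.10): compensate inhomogeneous illumination using a reference
    recording of a planar, constant-albedo object and its shading-model prediction."""
    return X_I_predicted * X_I / X_I_actual

def initial_albedo(X_I, X_R):
    """Initial albedo guess a = I * r^2 at the brightest pixel."""
    j = np.argmax(X_I)
    return X_I.flat[j] * X_R.flat[j] ** 2
```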


Figure 4.3: Reconstruction results for two synthetic test objects ("wave", top, and "corner", bottom). Columns: ground truth range map, noisy range map, noisy intensity image, 5 × 5 median-filtered range map, and "global albedo" reconstruction. RMS errors: 20.2 mm (noisy), 9.8 mm (median-filtered), and 4.3 mm (reconstruction) for the "wave" object; 20.2 mm, 5.5 mm, and 2.5 mm for the "corner" object. Gaussian noise with a standard deviation of 20 mm was added to the range map; for comparison, the "wave" object has a depth of 100 mm, and the "corner" object has a depth of 120 mm. Gaussian noise was added to the intensity image at a signal-to-noise ratio of 36 dB; the maximum intensity in the images was 0.19 ("corner") and 0.22 ("wave").

4.3 Results

4.3.1 Synthetic Data

To assess the accuracy of the method quantitatively, we first tested it on synthetic data with known ground truth: A rotationally symmetric sinusoid (the "wave" object) and an object composed of two planar surfaces that meet at a sharp edge (the "corner" object); see Figure 4.3. To simulate the measurement process of the TOF camera, we shaded the ground truth surface with a constant albedo, then added Gaussian noise; the observed range map was obtained by adding Gaussian noise to the ground truth surface. For all tests that follow, we set w_R = 1 and w_A = 50; σ_R and σ_I were set to the actual standard deviations of the noise that was added to the range map and intensity image.

Figure 4.3 shows the ground truth range maps for the "wave" and "corner" objects along with the noisy range map and intensity image that were used as input to the reconstruction algorithm, and the reconstruction result. For comparison, the figure also shows the result of filtering the range map with a 5 × 5 median filter.
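A minimal sketch of how such synthetic test data can be generated; the exact profile of the "wave" object and the interpretation of the signal-to-noise ratio are assumptions of this sketch, and the shading is simplified compared to the Lambertian model actually used in the thesis.

```python
import numpy as np

def make_wave(size=64, depth=0.1, pixel_pitch=0.004):
    """Illustrative rotationally symmetric sinusoid with a peak-to-peak depth
    of 100 mm, placed at a base distance of 1 m (not the thesis's exact object)."""
    x = (np.arange(size) - size / 2) * pixel_pitch
    r = np.hypot(*np.meshgrid(x, x))
    return 1.0 + 0.5 * depth * np.cos(2 * np.pi * r / r.max())

def simulate_measurement(true_range, albedo=0.2, range_noise=0.02, snr_db=36.0, seed=None):
    """Shade the surface with constant albedo (simplified, orientation ignored)
    and add Gaussian noise: 20 mm on the range map, 36 dB SNR on the intensity."""
    rng = np.random.default_rng(seed)
    intensity = albedo / true_range ** 2
    noise_std = intensity.max() / (10 ** (snr_db / 20.0))  # assumed SNR definition
    X_R = true_range + rng.normal(0.0, range_noise, true_range.shape)
    X_I = intensity + rng.normal(0.0, noise_std, intensity.shape)
    return X_R, X_I
```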

Figure 4.4: Reconstruction results on synthetic objects with varying albedo. Columns: intensity image, "global albedo" reconstruction, "local albedo" reconstruction, and estimated albedo. Top: "wave" object with an albedo of 0.2 on the left half of the object and 0.4 on the right half (RMS error: 17.1 mm with global albedo, 3.0 mm with local albedo). Middle: "wave" object with albedo varying continuously from 0.2 at the left to 0.4 at the right (RMS error: 12.0 mm versus 3.0 mm). Bottom: "corner" object with albedo varying continuously from 0.2 at the top to 0.4 at the bottom (RMS error: 4.5 mm versus 6.1 mm). In all cases, the noise in the range map had a standard deviation of 5 mm, and noise was added to the intensity image at a signal-to-noise ratio of 36 dB.

The noise in the range map had a standard deviation of 20 mm; for comparison, the "wave" object has a depth of 100 mm, and the "corner" object has a depth of 120 mm. The intensity image noise was set to a signal-to-noise ratio of 36 dB; the maximum intensity in the images was 0.19 ("corner") and 0.22 ("wave"). The "global albedo" algorithm was used to reconstruct the surface; the initial albedo value for the minimization was set to twice the actual value that was used to produce the intensity image. The RMS error in the reconstructed surface is reduced by a factor of over 4 for the "wave" object and around 8 for the "corner" object.

Next, we examine the results of the algorithm on an object with varying albedo. First, we use the "wave" object with albedo set to 0.2 on the left half of the image and 0.4 on the right half (Figure 4.4, top); the noise in the range image was reduced to a standard deviation of 5 mm.

Reconstructions were computed using the "global albedo" and "local albedo" algorithms; the initial albedo value for the minimization was set to 0.3. Note that the "global albedo" algorithm does not yield a satisfactory result, while the "local albedo" version does; the local albedo is estimated almost perfectly. A second test (Figure 4.4, middle) used the same object but with albedo varying continuously from 0.2 at the left to 0.4 at the right. Albedo is overestimated slightly on the left side of the image, and the result is not quite as good as in the first case but still satisfactory. Finally, we show a case where the albedo estimation does not work properly (Figure 4.4, bottom): the "corner" object with albedo varying continuously between 0.2 at the top and 0.4 at the bottom. Here, the result of the "local albedo" algorithm is not satisfactory and, in fact, its RMS error is higher than that of the "global albedo" algorithm. We suspect the reason for the poor performance may be that the range map does not contain enough detail for the algorithm to "latch onto".

Finally, we measured the effect of the various components of the probabilistic model. Figure 4.5 shows the reconstruction error on the "wave" object as a function of the noise σ_I in the intensity image. We compare probabilistic models that use only the shading constraint p(X^I | R, A), only the shape prior p(R), or both together. (The term p(X^R | R, A), which incorporates the information from the measured range map, was used in all cases. Because albedo did not vary across the image, the term p(A) was omitted.)

Since σ_I appears in the denominator of the shading term in the energy function (Equation (4.7)), we need to treat σ_I = 0 (no intensity noise) as a special case. Note that for σ_I → 0, the shading term dominates all the other terms; hence, if σ_I = 0, we omit all other terms from the energy function and can then avoid the scaling by 1/σ_I^2.

The error for the reconstruction obtained using only the shape prior (along with the measured range map p(X^R | R, A)) is, of course, constant for all σ_I because it does not use the intensity image. The shape prior reduces the RMS error in the range map by around a factor of 2.

For the reconstructions obtained using only the shading constraint (along with the measured range map), the error in the reconstruction generally increases with the noise σ_I in the intensity image. However, it is notable that, even for σ_I = 0, the range map is not reconstructed perfectly. Though, in this case, the ground truth range map is obviously the global minimum of the energy function, the algorithm appears to get stuck in a local minimum. Recall that in the case σ_I = 0, the shading term dominates all other terms in the energy function.

43 CHAPTER 4. SHADING CONSTRAINT

0.02 shading constraint 0.018 shape prior 0.016 both

0.014

0.012

0.01

0.008

RMS error (metres) 0.006

0.004

0.002

0 0 0.005 0.01 0.015 0.02 intensity noise

Figure 4.5: Reconstruction error on the "wave" object as a function of noise in the intensity image, for different probabilistic models (shading constraint only, shape prior only, and both combined); the vertical axis shows the RMS error in metres, the horizontal axis the intensity noise σ_I. (For comparison, the maximum intensity in the image was 0.22.) Range noise was fixed at a standard deviation of 20 mm.

As σ_I increases, the energy function begins taking the measured range map into account, and this in fact leads to an initial slight reduction in the reconstruction error; we speculate that the additional constraint imposed by the measured range map makes it easier to minimize the energy function.

This effect is even more pronounced for the full model, which combines the shading constraint, the shape prior, and the measured range map. For σ_I = 0, the shading term again dominates all other terms in the energy function, and so we obtain the same result as for the shading constraint alone. As σ_I begins to increase, the reconstruction error decreases markedly as the shape prior and the measured range map come into play. After a certain point, the reconstruction error begins increasing again; for σ_I → ∞, the reconstruction error will tend to that of the range prior because the shading term in Equation (4.7) tends to zero. Note that, except for very small σ_I, the combined model yields better results than either the shading constraint or the shape prior alone.

4.3.2 Real-World Data

We now apply the algorithm to data obtained using an SR3000 TOF camera (Oggier et al., 2005a), which has a resolution of 176 by 144 pixels. The parameters σ_R and σ_I (standard deviations of range and intensity) were set to values estimated on a sequence of images of a static scene; σ_R was set as a function of intensity for each pixel (see Section 4.2.4), while σ_I was constant across the whole scene, in accordance with the statistical properties of the sensor. The parameters for the shape and albedo prior were again set to w_R = 1 and w_A = 50.

We first demonstrate the algorithm on two terracotta objects, which fulfill the assumption of Lambertian reflectance quite well and can be assumed to have approximately constant albedo. Figure 4.6 shows the input data and reconstruction results for the two terracotta objects. To compare the results with the effect that conventional filtering has on the range map, a 5 × 5 median-filtered version is also shown. The objects were segmented manually, and the reconstruction was performed using the "global albedo" algorithm. The subjective quality of the reconstruction is greatly improved compared to both the raw data and the median-filtered version; note, in particular, how the shading constraint allows us to reconstruct detail in the objects that was drowned out by noise in the measured range map.

Figure 4.7 shows the results of the algorithm on a human face. This is a more challenging object for the algorithm because the reflectance properties of skin are considerably more complex than the Lambertian reflectance assumed by the model; also, albedo variations occur in places such as the eyebrows and the lips. We show the result of both the global and local albedo versions of the algorithm; again, a 5 × 5 median-filtered version is also shown. It is evident that local albedo estimation allows a much more faithful reconstruction than global albedo estimation in areas, such as the lips, where albedo variations occur. Also, despite the fact that skin is not a Lambertian reflector, the shape of the face is reconstructed quite accurately, demonstrating that the algorithm is not very sensitive to violations of the Lambertian reflectance assumption.

Finally, Figure 4.8 shows the results of the algorithm on the upper body of a person. Note how the shading constraint allows the cloth folds to be reconstructed faithfully. This example also illustrates the limitations of the algorithm: The head is reconstructed less well than in the previous example; we believe this is because there is too much albedo variation in an area of only a few pixels. The lowest part of the body is not reconstructed well either, and this is probably due to the low reflectivity of the material in this region, which leads to a large amount of noise in both the range map and the intensity image.

4.4 Discussion

As we have shown, enforcing the shading constraint can substantially improve the quality of range maps obtained using a TOF camera, both in terms of objective measures as well as subjectively perceived quality.

Figure 4.6: Surface reconstructions of two terracotta objects, manually segmented in images taken with an SR3000 TOF camera. Shown are the intensity image, the measured range map, a 5 × 5 median-filtered range map, and the "global albedo" reconstruction. The renderings of the range maps are rotated 30 degrees around the vertical axis.



Figure 4.7: 3D reconstruction of a human face. (a) Manually segmented intensity image, (b) measured range map, (c) 5 × 5 median-filtered range map, (d) "global albedo" reconstruction, (e) "local albedo" reconstruction, (f) "local albedo" reconstruction textured with the intensity image. The renderings of the range maps are rotated 30 degrees around the vertical axis.


Figure 4.8: 3D reconstruction of a person’s upper body. (a) Manually segmented intensity image, (b) measured range map, (c) 5 × 5 median-filtered range map, (d) “local albedo” reconstruction. The renderings of the range maps are rotated 30 degrees around the vertical axis.

The TOF camera is particularly well suited for algorithms that incorporate SfS because it eliminates many sources of variability that are difficult to deal with in the general SfS setting: In the TOF camera, the position of the light source is known (it is co-located with the camera); the camera attenuates all other sources of light; and the albedo of the surface can be estimated robustly because its distance from the light source is known (see Section 4.2.4).

The main limitation of the current algorithm is that it does not cope well with range discontinuities, so-called jump edges. Because the reconstructed surface is always continuous, jump edges lead to surface normals that are almost perpendicular to the incoming light; hence, the corresponding regions are shaded with very low intensity. This disagrees with the observed image, so the algorithm will flatten the edge to compensate. It should be possible to overcome this limitation by ignoring any mesh triangle that straddles a jump edge. Jump edges could be identified either by searching for large jumps in the measured range maps or by incorporating jump edges into the probabilistic image model, as in the work of Nadabar and Jain (1995).

It should also be noted that the algorithm is computationally fairly expensive; the current implementation takes several minutes to process an image on a contemporary PC. Since our main focus was correctness, not performance, we expect that optimization should yield a substantial speedup. However, optimizing the algorithm to the point that it could run at camera frame rates would present a major challenge and would probably require techniques such as computation on the graphics processing unit (GPU). Even in its present form, though, the algorithm is suitable for the post-processing either of individual images or of recorded image sequences.

Of course, other range sensors, such as laser range scanners, still provide far better accuracy than TOF camera data post-processed using our algorithm. The strength of the TOF camera, however, lies in its high temporal resolution and its potential to be manufactured at low cost for mass-market applications. Enforcing the shading constraint allows TOF cameras to provide range maps of considerably enhanced quality, opening up many new application fields.

5 Segmentation

5.1 Introduction

An important component of computer vision algorithms that aim at interpreting the scene in front of the camera is known as segmentation. It refers to segmenting the image into different regions, i.e. defining subsets of pixels in the image that share a common property, such as pixels depicting the same object. One goal of segmentation in image analysis can be to identify the location of an object of interest and to separate it from the background of the scene. Taking gesture recognition as an example, one is interested in the pixels that show the visible surface of the person in front of the camera, while the static background can be neglected if it has no meaning for the interpretation of the gesture. In the scope of this chapter, we will focus on exactly this problem, i.e. we want to assign all pixels in the image either to the background or to one of the persons in the field of view of the camera.

There exist two different approaches to this problem. In the simpler case, one has prior knowledge of the static background and can use this knowledge to determine deviations from the background model in the current image. However, this approach is limited to cases where the assumption of a static background holds. In the second case, where no knowledge about the scene is available beforehand, the segmentation must be based on information inherent in the image itself.

The problem of segmentation has been tackled in a number of ways for conventional 2D images (refer to Stockman and Shapiro (2001) for an overview). However, the task can become arbitrarily difficult based on the complexity of the scene.

The segmentation method using a background model was developed in collaboration with Ann-Kristin Grimm. Ann-Kristin Grimm and I contributed approximately equally to the development and implementation of the method. The histogram-based segmentation method has previously been described in part in (Haker et al., 2007, 2009a).



Figure 5.1: Complex scene showing four persons in front of a cluttered background. Figure 5.1a shows the amplitude image captured with a PMD CamCube and Figure 5.1b shows the corresponding range map.

Figure 5.1 shows an example where four persons stand in front of a complex background scene. On the basis of the intensity information alone, a simple algorithm that operates on a per-pixel basis will fail, because parts of the foreground objects have the same grayvalues as the background. Here, the TOF camera facilitates the disambiguation due to the available range data, i.e. the persons appear at different distances to the camera than the background.

In the following, we will present solutions to both formulations of the problem. In Section 5.1.1, we will focus on the segmentation of the foreground in front of a known static background. Then, in Section 5.1.2, we will consider the case where we want to segment a single person standing in front of the camera before an unknown background.

5.1.1 Segmentation Based on a Background Model

In this section, we will discuss a procedure for segmenting the foreground based on an estimated model of the static background scene. The proposed procedure aims at identifying all pixels that belong to objects that differ from the static background model. Since we assume the background to be static, we can estimate the model beforehand.

The model is estimated by averaging over a fixed number of frames depicting the background scene. This is done for both amplitude and range data, and we obtain the following quantities for each pixel:



Figure 5.2: Background model of the scene shown in Figure 5.1. The model was estimated over 20 frames of the empty scene captured with a PMD CamCube. Figure 5.2a shows the amplitude image and Figure 5.2b the range data. The corresponding standard deviations are given in Figure 5.2c and Figure 5.2d, respectively.

\[
\mu_{\text{amp}}(i, j) = E[x_{\text{amp}}(i, j)] \tag{5.1}
\]
\[
\mu_{\text{rng}}(i, j) = E[x_{\text{rng}}(i, j)] \tag{5.2}
\]
\[
\sigma_{\text{amp}}(i, j) = \sqrt{E[(x_{\text{amp}}(i, j) - \mu_{\text{amp}}(i, j))^2]} \tag{5.3}
\]
\[
\sigma_{\text{rng}}(i, j) = \sqrt{E[(x_{\text{rng}}(i, j) - \mu_{\text{rng}}(i, j))^2]} \tag{5.4}
\]

Here, µ_amp(i, j) and µ_rng(i, j) refer to the mean values of amplitude and range, while σ_amp(i, j) and σ_rng(i, j) denote the corresponding standard deviations. These quantities, estimated for the background of Figure 5.1, are presented in Figure 5.2. Using them, we determine whether a pixel x(i, j) conforms with the background model by checking the following condition:



Figure 5.3: Segmentation result using the rule for classifying background pixels given in Equation (5.5). Figure 5.3a uses only a threshold on the amplitude data. In analogy, a threshold is applied only to the range data in Figure 5.3b. The combination of amplitude and range as specified by Equation (5.5) is depicted in Figure 5.3c.

\[
\sqrt{(x_{\text{amp}}(i, j) - \mu_{\text{amp}}(i, j))^2} < \theta_{\text{amp}} \cdot \sigma_{\text{amp}}(i, j)
\;\wedge\;
\sqrt{(x_{\text{rng}}(i, j) - \mu_{\text{rng}}(i, j))^2} < \theta_{\text{rng}} \cdot \sigma_{\text{rng}}(i, j) \tag{5.5}
\]

Thus, a pixel is considered a foreground pixel if it deviates from the background with respect to either amplitude or range, given the estimated noise level at that pixel.

Here, θ_amp and θ_rng constitute parameters that can be used to control the sensitivity of the thresholding. The outcome of this approach is given in Figure 5.3, where we chose the parameter settings θ_amp = 0.05 and θ_rng = 0.9.

While the method performs well in image regions with low measurement noise, it yields random decisions in those areas of the image that are completely dominated by noise. These regions occur where the background cannot be sufficiently illuminated by the infrared LEDs of the TOF camera. Because the range data tends to take arbitrary values in those regions, background pixels often deviate significantly from the mean value µ_rng(i, j) and are thus classified as foreground.


This effect can only be tackled by ruling out those pixels with low confidence in the range measurement. In the case of the background model, this can be achieved via the standard deviation of the range values. In contrast, the standard deviation cannot be estimated for a single frame that we intend to segment. However, a good approximation can be achieved by thresholding the amplitude data, as it is related to the signal-to-noise ratio (see Chapter 2). Care has to be taken that this approach does not remove foreground objects that have low reflectivity or are at a large distance from the camera and thus appear with low intensity in the amplitude image. Taking these considerations into account, we extend Equation (5.5) and obtain a rule that classifies a pixel as background if the following condition is met:

\[
\Big( \sqrt{(x_{\text{amp}}(i, j) - \mu_{\text{amp}}(i, j))^2} < \theta_{\text{amp}} \cdot \sigma_{\text{amp}}(i, j)
\;\wedge\;
\sqrt{(x_{\text{rng}}(i, j) - \mu_{\text{rng}}(i, j))^2} < \theta_{\text{rng}} \cdot \sigma_{\text{rng}}(i, j) \Big)
\;\vee\;
x_{\text{amp}}(i, j) < \epsilon_{\text{amp}} \tag{5.6}
\]

Here, ε_amp denotes the threshold for the amplitude data, i.e. pixels with low confidence in the range measurement are automatically considered as background. Applying Equation (5.6) to the data of Figure 5.3 yields a significant improvement, reducing the artefacts in the segmentation result, as Figure 5.4a demonstrates. Remaining false positives can easily be treated by removing all connected components that are not sufficiently large to constitute a person. The final outcome of the proposed segmentation procedure is depicted in Figure 5.4b.
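For concreteness, a minimal NumPy sketch of the background model and the decision rule of Equations (5.1)–(5.6) might look as follows; θ_amp, θ_rng, and the minimum component size mirror values quoted in the text, whereas eps_amp is a placeholder, and the connected-component cleanup uses scipy.ndimage.

```python
import numpy as np
from scipy import ndimage

def estimate_background(amp_frames, rng_frames):
    """Per-pixel mean and standard deviation of amplitude and range,
    Equations (5.1)-(5.4), estimated from frames of the empty scene."""
    return (amp_frames.mean(axis=0), rng_frames.mean(axis=0),
            amp_frames.std(axis=0), rng_frames.std(axis=0))

def segment_foreground(amp, rng, model, theta_amp=0.05, theta_rng=0.9,
                       eps_amp=0.02, min_size=100):
    """Equation (5.6): a pixel is background if it matches the model in both
    amplitude and range, or if its amplitude is too low to trust the range.
    Everything else is foreground; small connected components are discarded."""
    mu_amp, mu_rng, sigma_amp, sigma_rng = model
    matches_model = ((np.abs(amp - mu_amp) < theta_amp * sigma_amp) &
                     (np.abs(rng - mu_rng) < theta_rng * sigma_rng))
    foreground = ~(matches_model | (amp < eps_amp))
    # Remove connected components that are too small to constitute a person.
    labels, n = ndimage.label(foreground)
    if n == 0:
        return foreground
    sizes = ndimage.sum(foreground, labels, index=np.arange(1, n + 1))
    return np.isin(labels, 1 + np.flatnonzero(sizes >= min_size))
```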

5.1.2 Histogram-based Segmentation

In this section, we present a method to segment the person closest to the camera from the remainder of the scene. Following this approach, we intend to devise systems for gesture-based single-user interaction using a TOF camera, as described in Chapter 6, Section 7.4, and Chapter 10. The proposed algorithm does not rely on a background model and can thus be used without prior calibration for any scene that meets certain constraints. The algorithm uses combined information from both the range and intensity data of the TOF camera. Previous work of Haker et al. (2007, 2009b) has already shown that the combined use of both range and intensity data can significantly improve results in a number of different computer vision tasks.



Figure 5.4: The segmentation result using the rule for classifying background pixels given in Equation (5.6) is shown in Figure 5.4a. To remove remaining noise pixels, all connected components smaller than 100 pixels have been removed in Figure 5.4b.

The proposed algorithm operates on a per-pixel basis and determines adaptive thresholds for range and intensity based on histograms. These thresholds are then applied to decide for each pixel whether it belongs to the foreground or background.

In case of the intensity data, the threshold discards dark pixels. This has two effects: Firstly, the amount of light that is reflected back into the camera decays proportionally to the squared distance of the object from the camera; thus the background generally appears significantly darker than foreground objects that are close to the camera. Secondly, this procedure discards unreliable pixels from the range measurement, because the intensity can be considered a confidence measure for the depth estimation, as it is related to the signal-to-noise ratio (see Chapter 2).

The value of the threshold is determined adaptively for each frame to increase the robustness of the procedure. For example, a lower threshold is required if the person is at a larger distance to the camera, whereas the person appears brighter in the image if it is closer to the camera. We employ the method of Otsu (1979), which assumes a bimodal distribution in the histogram of grayvalues. This simple assumption is usually sufficient because the amplitude decays quickly with increasing distance due to the above-mentioned quadratic relation between distance and amplitude.

Figure 5.5 shows the amplitude image of a person in a cluttered scene. In addition, we present the histogram of grayvalues and the estimated Otsu threshold. Note that we multiply the resulting Otsu threshold by a constant factor of 0.6 to allow more pixels to pass as foreground. This prevents clothing with low reflectivity in the infrared spectrum from being assigned to the background. Figure 5.5 also depicts the result of the segmentation when only the amplitude threshold is used. Note that the result is not very accurate and many background pixels are still classified as foreground.



Figure 5.5: Segmentation based on amplitude thresholding. Left: Amplitude image. Center: Histogram of amplitudes and estimated Otsu threshold (dotted line). Multiplication with a factor of 0.6 yields the final threshold (solid line). Right: Resulting segmentation of pixels exceeding the threshold.


Figure 5.6: Segmentation based on range thresholding. Left: Range image. Center: Histogram of range values and estimated threshold at the minimum after the first peak (dotted line). A shift by 0.3 meters yields the final threshold (solid line). Right: Resulting segmentation of pixels that fall below the threshold.

In case of the range data, peaks in the histogram can be assumed to correspond to objects at different distances in front of the camera. Thus it is apparent that the assumption of a bimodal distribution does not hold in the general case. The assumption is, for example, violated if two people appear at different distances in the field of view before a constant background. In that case one would obtain three peaks, one for each person and a third for the background. To find the person closest to the camera, we determine the threshold as the one that separates the peak of the closest object from the remaining range values. This is achieved by applying a simple hill-climbing algorithm to the histogram. Once the first peak is detected, we apply the hill-climbing algorithm again to determine the local minimum that follows the peak. All pixels below this threshold should belong to the object closest to the camera. Because the estimation of this threshold is often too aggressive and partly discards pixels belonging to the foreground object, we add a constant offset of 0.3 meters to obtain the final threshold. This procedure is illustrated in Figure 5.6.


Figure 5.7: Segmentation result using combined amplitude and range data. Left: Segmented amplitude image. Center: Segmented range map. Right: Combined segmentation result. Only pixels considered foreground in both amplitude and range data are classified as foreground pixels in the final segmentation result. Furthermore, only the largest connected component of foreground pixels is retained because we are interested in the closest foreground object.

The final segmented image is composed of those pixels that were classified as foreground with respect to both types of data. To ensure that only a single object is considered, only the largest connected component of foreground pixels is retained; all other objects are considered background. The combined result of range- and amplitude-based thresholding is given in Figure 5.7.

A drawback of this simple approach is that the algorithm cannot distinguish between two people who stand side by side at the same distance from the camera. However, there exist applications (see Part III) in which one can assume that a single person stands close to the camera, and in those cases the proposed procedure yields robust results at very low computational cost.
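A compact sketch of this histogram-based segmentation, using Otsu's threshold on the amplitude and a hill-climbing search on the range histogram, could look as follows; the 0.6 factor, the 0.3 m offset, and the largest-connected-component step follow the text, whereas the bin count and helper names are choices made for this illustration.

```python
import numpy as np
from scipy import ndimage
from skimage.filters import threshold_otsu

def range_threshold(rng, n_bins=64, offset=0.3):
    """Local minimum after the first peak of the range histogram (hill climbing),
    shifted by a constant offset of 0.3 m."""
    hist, edges = np.histogram(rng, bins=n_bins)
    i = 0
    while i + 1 < len(hist) and hist[i + 1] >= hist[i]:   # climb to the first peak
        i += 1
    while i + 1 < len(hist) and hist[i + 1] <= hist[i]:   # descend to the minimum
        i += 1
    return edges[i] + offset

def segment_closest_person(amp, rng, otsu_factor=0.6):
    """Combined amplitude/range segmentation of the person closest to the camera."""
    amp_mask = amp > otsu_factor * threshold_otsu(amp)
    rng_mask = rng < range_threshold(rng)
    foreground = amp_mask & rng_mask
    labels, n = ndimage.label(foreground)
    if n == 0:
        return foreground
    sizes = ndimage.sum(foreground, labels, index=np.arange(1, n + 1))
    return labels == (1 + int(np.argmax(sizes)))   # keep the largest component only
```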

5.1.3 Summary

In this chapter we proposed two segmentation algorithms that aim at identifying connected components of pixels depicting a person in the field of view of the camera. The first algorithm uses a model of the background to determine foreground objects as deviations from this model. While this approach operates very robustly, it has the main drawback that the method fails once the camera is moved or the static background changes. For scenarios in which this is likely to happen, or when there are no means to acquire a model of the background, we proposed a second algorithm that does not require prior calibration. Instead, the foreground object is determined by applying adaptive thresholds to both the range and amplitude data.

In both cases, we have not addressed the problem of telling multiple persons in the image apart. An extension to the presented methods, which enables the segmentation and tracking of multiple persons in the field of view of the camera, was proposed by Hansen et al. (2008). The main idea is to project the 3D points corresponding to the foreground objects onto a plane representing the floor.



Figure 5.8: An illustration of the procedure to identify and segment multiple persons standing in the field of view of a TOF camera. After inverting the perspective projection of the camera, the 3D points are projected onto the (x, z)-plane, where persons can be identified through a clustering algorithm.

Thus, one will obtain clusters of points in areas where a person is standing. Two people standing in front of each other will form two clusters which appear at different distances on a line extending from the position of the camera. This situation is depicted in Figure 5.8.

Hansen et al. (2008) propose the use of an Expectation Maximization (EM) algorithm (Dempster et al., 1977) to identify the resulting clusters. An alternative is the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm proposed by Ester et al. (1996). A major advantage of the DBSCAN algorithm is that it does not require the number of expected clusters to be specified, as opposed to many other clustering algorithms, such as the EM algorithm and k-means (Lloyd, 1982). Hansen et al. (2008) avoid the problem of initializing the number of clusters by introducing a scheme that uses the clusters detected in the previous frame to create, delete, or merge existing clusters as required.
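As a sketch of this multi-person extension (not part of the implementation described in this thesis), the foreground points can be projected onto the floor plane and clustered, for example with scikit-learn's DBSCAN; the parameter values below are illustrative choices.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def cluster_persons(points_3d, eps=0.25, min_samples=50):
    """Project 3D foreground points (x, y, z) onto the (x, z) floor plane and
    cluster them; each cluster is assumed to correspond to one person.
    eps (in metres) and min_samples are illustrative parameter choices."""
    floor_points = points_3d[:, [0, 2]]                 # drop the height coordinate y
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(floor_points)
    clusters = [points_3d[labels == k] for k in np.unique(labels) if k != -1]
    return clusters                                     # label -1 marks noise points
```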


6 Pose Estimation

6.1 Introduction

In this chapter, we present a technique for estimating human pose in 3D based on a simple model of the human body. The model consists of a number of vertices that are connected by edges such that the resulting graph structure resembles the anatomy of the human body, i.e. the model represents the torso, the head, and the arms. The model is updated using an iterative learning rule common to self-organizing maps (SOMs) as proposed by Kohonen (1995). The position of certain body parts, such as the hands, can be obtained from the model as the 3D coordinates of the corresponding vertices, i.e. the position of the hands in 3D corresponds to the position of the vertex that terminates the chain representing an arm. Thus, body parts can be tracked in 3D space.

The estimation of 3D human pose has been addressed in a number of different publications. The majority of work focuses on the estimation of pose from single images taken with a regular 2D camera, and a number of different algorithmic approaches have been presented. In the work of Agarwal and Triggs (2006), the pose is recovered from shape descriptors of image silhouettes. Rosales and Sclaroff (2000) map low-level visual features of the segmented body shape to a number of body configurations and identify the pose as the one corresponding to the most likely body configuration given the visual features. An approach based on a large database of example images was presented by Shakhnarovich et al. (2003). The authors learn a set of parameter-sensitive hashing functions to retrieve the best match from the database in an efficient way.

Parts of this chapter are joint work with others. I devised the idea of fitting a SOM to the data cloud in 3D to estimate the human pose and implemented the algorithm. Martin Böhme and I contributed approximately equally to evaluating the accuracy of the method. The work described here has previously been published in (Haker et al., 2009a).

Rehg et al. (2003) describe a method for robustly tracking the human pose in single-camera video sequences. They use both 2- and 3-dimensional models to overcome ambiguities when the body motion is directed along the viewing axis of the camera. Human pose estimation using a multi-camera setup was proposed by Cheung et al. (2003). The motion, shape, and joints of an articulated model of the human body are recovered from a set of images based on the shape-from-silhouette method.

Very accurate 3D reconstruction of human motion from multi-view video sequences was published by Gall et al. (2010). Based on a segmentation of the subject, the authors use a multi-layer framework that combines stochastic optimization, filtering, and local optimization to estimate the pose using a detailed model of the human body. However, the computational cost is relatively high and the system does not operate at camera frame rates.

Pose estimation based on 3D data has been addressed by Weik and Liedtke (2001). The 3D volume of a person is estimated in a multi-camera setup using the shape-from-silhouette method. A skeleton model is then fit to a 2D projection of the volumetric data. The 2D projection is obtained by a virtual camera and the model is fit using certain features of the outer contour. The 3D coordinates of the model are finally reconstructed by inverting the 2D projection of the virtual camera, i.e. the vertices of the skeleton are projected back into 3D space using the intrinsic parameters of the virtual camera.

Another approach to obtaining a skeleton in 3D is to apply a thinning to volumetric data directly in 3D space (Palágyi and Kuba, 1999; Pudney, 1998). The human pose can then be estimated from the skeleton (Arata et al., 2006).

Two related methods based on stereo imaging were presented by Yang and Lee (2006) and Yang and Lee (2007). The authors introduce a hierarchical human body model database. For a given image, the algorithm uses both silhouette and depth information to identify the model pose with the best match. The work of Knoop et al. (2009) fuses 2D and 3D information obtained from a stereo rig and a TOF camera to fit a human body model composed of generalized cylinders. The system models body joints and uses kinematic constraints to reduce the degrees of freedom. The 3D data is obtained using a TOF camera and the system runs at frame rates of 10–14 frames per second.

Another recent approach using TOF cameras was presented by Zhu et al. (2008). The method tracks a number of anatomical landmarks in 3D over time and uses these to estimate the pose of an articulated human model. The model is in turn used to resolve ambiguities of the landmark detector and to provide estimates for undetected landmarks.

The model is in turn used to resolve ambiguities of the landmark detector and to provide estimates for undetected landmarks. The entire approach is very detailed and models constraints such as joint limit avoidance and self-penetration avoidance. Despite its complexity, the method runs at a frame rate of approximately 10 frames per second.

Our approach, in contrast, is a very simple one that demonstrates how effectively TOF cameras can be used to solve relatively complex computer vision tasks. A general advantage of TOF cameras is that they can provide both range and intensity images at high frame rates. The combined use of both types of data was already exploited for tracking (Böhme et al., 2008a; Haker et al., 2009a) and allows a robust segmentation of the human body in front of the camera. The range data, representing a 2-1/2-D image, can then be used to obtain a point cloud in 3D representing the visible surface of the person. Thus, limbs extended towards the camera can still be easily identified, while this proves to be a more difficult task in 2D projections of a scene. Our approach takes advantage of this property and fits a simple model of the human body into the resulting point cloud in 3D. The model fitting algorithm is based on a SOM, can be implemented in a few lines of code, and the method runs at camera frame rates of up to 25 frames per second on a 2 GHz Intel Core 2 Duo. The algorithmic approach to this procedure is discussed in Section 6.2. The method delivers a robust estimation of the human pose, as we show in Section 6.3 for image data that was acquired using a MESA SR4000 TOF camera.

6.2 Method

The first step of the proposed procedure is to segment the human body from the background of the image. We employ the histogram-based segmentation procedure described in Section 5.1.2. The method adaptively determines thresholds for both the range and intensity data. A pixel is retained as foreground in the final segmented image only if it is classified as foreground with respect to both types of data. Furthermore, the largest connected component of foreground pixels is identified and all remaining pixels are classified as background. Thus, in most cases we obtain a clean segmentation of the single person closest to the camera. A sample TOF image and the resulting segmented image are shown in Figure 6.1.

The identified foreground pixels can be assumed to sample the visible surface of the person in front of the camera. Since the intrinsic parameters of the camera, such as focal length and pixel size, are known, the surface pixels can be projected back into 3D space, i.e. one can invert the perspective projection of the camera.


Figure 6.1: Sample image taken with a MESA SR4000 TOF camera. The leftmost image shows the amplitude data. The range image is given in the center and the resulting segmentation is shown on the right.

As a result, one obtains a point cloud in 3D that represents the 3-dimensional appearance of the person. This approach has two major advantages: (i) the representation is scale-invariant due to the fact that the size of the person in 3D space remains the same independently of the size of its image; (ii) body parts that are extended towards the camera in front of the torso can be easily identified due to the variation in distance, whereas this information is lost in 2D projections of the scene obtained with regular cameras.

Our method aims at fitting a simple graph model representing the anatomy of the human body into the resulting point cloud in 3D. To this end, we employ a SOM. We define a graph structure of vertices and edges that resembles a frontal view of the human body. Body parts, such as arms and torso, are modeled by explicitly defining the neighborhood structure of the graph, i.e. an arm is represented by a simple chain of pairwise connected vertices whereas vertices in the torso are connected to up to four neighbors forming a 2D grid. The resulting model structure is depicted in Figure 6.2.

The SOM is updated by an iterative learning rule for each consecutive frame of the video sequence. The first frame uses the body posture depicted in Figure 6.2 as an initialization of the model. During initialization the model is translated to the center of gravity of the 3D point cloud. The scale of the model is currently set manually to a fixed value that corresponds to an average-sized person. We can report that the scale is not a particularly critical parameter and that the same fixed scale works for adults of different height. Once the scale is set to an appropriate value, there is no need to adjust it during run-time due to the above-mentioned scale-invariance of the method. The update of the model for each consecutive frame then depends on the model that was estimated for the previous frame.


Figure 6.2: Graph model of the human body. The edges define the neighborhood structure for the SOM.

The adaptation of the model to a new frame involves a complete training of the SOM, i.e. a pattern-by-pattern learning is performed using the data points of the 3D point cloud. This iterative procedure selects a sample vector $\vec{x}$ from the point cloud at random and updates the model according to the following learning rule:

$$\hat{\vec{v}}^{\,t+1} = \hat{\vec{v}}^{\,t} + \hat{\epsilon}^{\,t} \cdot (\vec{x} - \hat{\vec{v}}^{\,t}) \qquad (6.1)$$

$$\tilde{\vec{v}}^{\,t+1} = \tilde{\vec{v}}^{\,t} + \tilde{\epsilon}^{\,t} \cdot (\vec{x} - \tilde{\vec{v}}^{\,t}). \qquad (6.2)$$

Here, $\hat{\vec{v}}$ denotes the node that is closest to the sample $\vec{x}$ with respect to the distance measure $d(\vec{x}, \vec{v}) = \|\vec{x} - \vec{v}\|_2$. The nodes $\tilde{\vec{v}}$ are the neighbors of $\hat{\vec{v}}$ as defined by the model structure. The learning rates are denoted by $\hat{\epsilon}^{\,t}$ and $\tilde{\epsilon}^{\,t}$ for the closest node and its neighbors, respectively. The learning rate $\hat{\epsilon}^{\,t}$ was set to:

$$\hat{\epsilon}^{\,t} = \epsilon_i \cdot (\epsilon_f / \epsilon_i)^{t/t_{\max}}. \qquad (6.3)$$

Here, $t \in \{0, \ldots, t_{\max}\}$ denotes the current adaptation step for this frame and $t_{\max}$ denotes the total number of adaptation steps performed for this frame. The initial learning rate $\epsilon_i$ and the final learning rate $\epsilon_f$ were set to 0.1 and 0.05, respectively. The learning rate for the neighbors was chosen to be $\tilde{\epsilon}^{\,t} = \hat{\epsilon}^{\,t}/2$. This choice of the learning rate was already proposed in previous work on self-organizing networks (Martinetz and Schulten, 1991). The initial and final learning rates were set to relatively high values in order to allow the network to handle fast movements of the person, i.e. if the limbs are moved quickly the correctional updates for the corresponding nodes have to be large so that the model can accurately follow.

This update rule does not always guarantee that the topology of the model is preserved. Here, we refer to topology with respect to the connectivity of the nodes within body parts such as the arm. Imagine the situation where the subject's hands touch in front of the torso. If the hands are separated again, it is possible that the model uses the last node of the left arm to represent samples that actually belong to the hand of the right arm. It can thus happen that the last node of the left arm continues to be attracted by the right hand although both hands have moved apart, and, thus, the left arm will extend into empty space. In principle, the update rules resolve this problem over time. However, only a small number of updates are performed per frame and this may lead to a wrong estimation of the topology for a small number of frames.

To avoid this, we developed a modification of the update rule that speeds up the learning process by forcing neighboring vertices to stay close together. This is achieved by the following rule that is applied after the actual learning step if the distance $d(\hat{\vec{v}}, \tilde{\vec{v}}_a)$ exceeds a certain threshold $\theta$:

$$\hat{\vec{v}} = \tilde{\vec{v}}_a + \theta \cdot \frac{\hat{\vec{v}} - \tilde{\vec{v}}_a}{\|\hat{\vec{v}} - \tilde{\vec{v}}_a\|_2}. \qquad (6.4)$$

Here, $\tilde{\vec{v}}_a$ is a specific neighbor of $\hat{\vec{v}}$ referred to as an anchor. The rule enforces that the distance between the vertex $\hat{\vec{v}}$ and its anchor is always less than or equal to $\theta$. The threshold $\theta$ depends on the scale of the model. The anchor of each vertex is defined as the neighbor that has minimal distance to the center of the torso with respect to the graph structure of the model, i.e. it is the vertex that is connected to the center of the torso by the smallest number of edges.
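The per-frame adaptation can be written down compactly. The following is a minimal NumPy sketch of the update rules (6.1)–(6.4); the function and variable names are illustrative, and the model initialization, the graph definition, and the anchor of the torso-center node are left to the caller.

```python
import numpy as np

def adapt_som(points, vertices, neighbors, anchor,
              eps_i=0.1, eps_f=0.05, theta=0.2, sample_fraction=0.1):
    """One frame of pattern-by-pattern SOM training (sketch of Eqs. 6.1-6.4).

    points    : (N, 3) point cloud of the segmented foreground in 3D
    vertices  : (V, 3) node positions, taken from the previous frame
    neighbors : list of index lists defining the graph neighborhood of each node
    anchor    : per-node index of the anchor neighbor (-1 for the torso center)
    theta     : maximum allowed distance to the anchor (depends on model scale)
    """
    rng = np.random.default_rng()
    t_max = max(1, int(sample_fraction * len(points)))
    sample_idx = rng.permutation(len(points))[:t_max]

    for t, idx in enumerate(sample_idx):
        x = points[idx]
        eps_hat = eps_i * (eps_f / eps_i) ** (t / t_max)      # Eq. (6.3)
        eps_tilde = eps_hat / 2.0

        # Winner: node closest to the sample (Eq. 6.1)
        win = int(np.argmin(np.linalg.norm(vertices - x, axis=1)))
        vertices[win] += eps_hat * (x - vertices[win])

        # Graph neighbors of the winner (Eq. 6.2)
        for nb in neighbors[win]:
            vertices[nb] += eps_tilde * (x - vertices[nb])

        # Anchor constraint (Eq. 6.4): keep the winner within theta of its anchor
        if anchor[win] >= 0:
            a = vertices[anchor[win]]
            d = np.linalg.norm(vertices[win] - a)
            if d > theta:
                vertices[win] = a + theta * (vertices[win] - a) / d

    return vertices
```

In the running system, such a function would be called once per frame with the vertices estimated for the previous frame, using only a random subsample of the foreground points (about 10%, see Section 6.3.2).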

6.3 Results

6.3.1 Qualitative Evaluation

The proposed method was evaluated using a MESA SR4000 TOF camera. We operate the camera at a modulation frequency of 30 MHz for the active illumination. As a result, the camera can measure distances of up to 5 meters without ambiguity. In the following sample images, the person was at a distance of roughly 2.5 meters from the camera. At that distance the range measurement has an accuracy of approximately 1 cm.


Figure 6.3: Point cloud sampling the visible surface of a human upper torso in 3D. The graph represents the human model that was fitted to the data.

A sample result of the pose estimation is shown in Figure 6.3. The figure depicts the point cloud of samples in 3D that represents the visible surface of the person in front of the camera shown in Figure 6.1. The model that was fitted to the point cloud is imprinted into the data. One can observe that the model captures the anatomy of the person correctly, i.e. the torso is well covered by the 2-dimensional grid, a number of vertices extend into the head, and the 1-dimensional chains of vertices follow the arms. Thus, the position of the major body parts, such as the hands, can be taken directly from the corresponding vertices of the model in 3D.

The data from Figure 6.3 is taken from a sequence of images. Further sample images from this sequence are given in Figure 6.4. Each image shows the segmented amplitude image with the imprinted 2D projection of the model. One can observe that the model follows the movement of the arms accurately, even in difficult situations where the arms cross closely in front of the torso. Note that the procedure does not lose the position of the head even though it is occluded to a large extent in some of the frames. The sample images are taken from a video sequence, which is available under http://www.artts.eu/demonstrations/.

It is important to point out that the method may misinterpret the pose. This can for example be the case if the arms come too close to the torso. In such a case the SOM cannot distinguish between points of the arm and the torso within the 3D point cloud. We can report, however, that in most cases the method can recover the true configuration within a few frames once the arms are extended again.


Figure 6.4: A selection of frames from a video sequence showing a gesture. The model estimated by the pose estimation is imprinted in each frame. The edges belonging to torso and head are colored in white, whereas the arms are colored in black.

We assume that it is possible to detect and avoid such problems by imposing a number of constraints on the model, e.g. that the arms may only bend at the elbows and that the entire model should generally be oriented such that the head is pointing upwards. However, note that the current results were achieved without any such constraints.

6.3.2 Quantitative Evaluation of Tracking Accuracy

To evaluate the accuracy of the tracking quantitatively, we acquired sequences of 5 persons moving in front of the camera; each sequence was around 140 to 200 frames long. In each frame, we hand-labeled the positions of five parts of the body: the head, the shoulders, and the hands. To obtain three-dimensional ground-truth data, we looked up the distance of each labeled point in the range map and used this to compute the position of the point in space. This implies that the sequences could not contain poses where any of the body parts were occluded; however, many poses that are challenging to track, such as crossing the arms in front of the body, were still possible, and we included such poses in the sequences. (Note that the tracker itself can track poses where, for example, the head is occluded; see Figure 6.4.)

The labeled positions can now be compared with the positions of the corresponding nodes in the tracked model. However, when assessing the accuracy of the tracking in this way, we run into the problem that we never define explicitly which part of the body each node should track.

For example, though the last node in each of the arms will typically be located on or near the hand, we do not know in advance exactly which part of the hand the node will track. This means that there may be a systematic offset between the position that is labeled as “hand” and the position that the hand node tracks. To give a realistic impression of tracking accuracy, we should eliminate these systematic offsets. We do this by measuring the average offset between the tracked position and the labeled position on ten “training” frames; this offset is then used to correct the tracked position in the remaining “test” frames, on which the accuracy is measured.

Because the orientation of the respective parts of the body can change, we need to measure the offsets not in the world coordinate system but in a local coordinate system. For the head and shoulders, we use a coordinate system where the x-axis points from the left shoulder (of the tracked model) to the right shoulder, the y-axis is defined so that the head lies in the x-y-plane, and the z-axis is perpendicular to the other two axes to form a right-handed coordinate system. For the hands, it is not as easy to define a full coordinate system because the model only measures the direction in which the forearm is pointing but not the orientation of the hand. For this reason, we estimate and correct the offset between tracked and labeled position only along the direction of the forearm, which we define by the last two nodes in the arm; this is the direction that accounts for most of the offset. Any offset perpendicular to the direction of the forearm is not corrected.

Once the tracked positions have been corrected in this way, we can measure the tracking error. Figure 6.5 shows a plot of tracking error over time for one of the recorded sequences. It is obvious that there is little to no systematic error remaining; instead, most of the error is due to tracking noise.

Table 6.1 shows the root mean square (RMS) tracking error, averaged over all frames and subjects. The average error is around 5 to 6 cm for the hands and shoulders and around 2 cm for the head. While this degree of accuracy is not sufficient for tracking very fine movements, it is more than adequate for determining overall body posture and for recognizing macroscopic gestures. Also, consider that no smoothing of the tracked positions over time was carried out.

A major advantage of the proposed method is that the training of the model converges very fast for each new frame. Thus, only a small number of the samples of the 3D cloud need actually be considered during the update even when the person performs very fast movements in front of the camera.


Figure 6.5: Plots of tracking error over time for one of the sequences. The horizontal axis plots frame number, the vertical axis plots tracking error in meters. For this sequence the RMS errors are 2.6 cm (left hand), 3.4 cm (right hand), 6.0 cm (left shoulder), 4.6 cm (right shoulder), and 1.9 cm (head).

body part        RMS error
left hand        5.90 cm
right hand       5.29 cm
left shoulder    5.32 cm
right shoulder   5.15 cm
head             2.21 cm

Table 6.1: Root mean square (RMS) error between tracked and labeled positions, averaged over all frames and subjects.

The sample image in Figure 6.1 contains roughly 6500 foreground pixels. However, we use only 10% of these samples for updating the model, i.e. we select roughly 650 points in 3D in random order from the point cloud and use these for updating the model by pattern-by-pattern learning. As a result the computational complexity is very low, and we achieve frame rates of up to 25 frames per second on a 2 GHz PC while robustly tracking the human pose in scenarios such as the one depicted in Figure 6.4. Using a larger number of samples for training further increases the robustness, but at the cost of a lower frame rate.

6.4 Discussion

We have presented a simple procedure to estimate human pose from a sequence of range images. The procedure is especially suitable for TOF cameras as they can deliver range data in combination with intensity images at high frame rates. These cameras can be assumed to be available at relatively low cost in the near future. The use of a SOM results in a very simple, yet very efficient implementation. In principle the procedure can be extended easily to any other kind of deformable object.

A major shortcoming of the current implementation is that the method cannot deal with multiple persons in front of the camera, i.e. the system always assumes that the segmented foreground pixels correspond to a single person. This approach fails, for example, if two people are at the same distance in front of the camera and very close together. In that case the segmented foreground pixels sample the visible surface of both persons. Since the SOM attempts to represent all samples equally, the resulting pose estimation fails. Using the current approach, this problem must be solved by an improved method for segmentation that can handle multiple objects in front of the camera. Then, a SOM can be trained for each segmented object and thus multiple people can be tracked.

This in turn can lead to a related problem that occurs when the segmentation fails to detect parts of the body due to occlusion, e.g. when the lower part of an arm is occluded by a second person. In that case the SOM will use the entire chain of arm nodes to represent the upper part of the arm. Thus, the node for the hand will be misplaced. To tackle this problem the system needs to identify the presence of certain body parts based on pose estimates from previous frames. In case occluded body parts have been identified, the corresponding nodes of the SOM must be excluded from training. Instead, their location could be predicted based on the posture of the remaining model. These two issues need to be addressed in future work.

There exist other approaches that compute a more accurate estimate of human pose, but our goal within this work was to develop a simple method that gives a rough but robust estimate of human pose at high frame rates.

We use the proposed pose estimation method for action recognition and gesture-based man-machine interaction in Section 7.4. Generally, the evaluation of certain spatio-temporal features for the analysis of video sequences is computationally expensive. We argue that rough knowledge of the position of landmarks, such as the hands, can greatly improve the runtime of feature-based action recognition systems, because the features do not have to be evaluated over the entire video sequence but only at those locations where certain important landmarks have been detected. Furthermore, these features can be put into a larger context if their relative location to each other is known.

The proposed method of pose estimation using SOMs was filed as a patent by the University of Lübeck in 2009 through the Patent- und Verwertungsagentur für die wissenschaftlichen Einrichtungen in Schleswig-Holstein GmbH. The patent is pending.

7 Features

In image processing, the data, i.e. the images, are generally represented by discrete samples of a function f(x, y) where x and y denote the spatial coordinates of the image plane and f(x, y) encodes properties of the imaged object. Such properties can be the object's irradiance in the form of image intensity or its distance from the image sensor in the form of a range map. In most cases, we use a regular sampling grid and x and y correspond to pixel coordinates of a digital image.

While the entirety of samples (x, y, f(x, y)) encodes the content of the image, one is generally interested in higher-level features that represent certain properties of the image when addressing computer vision tasks, such as object detection or image classification. An apparent reason for this is that neighboring pixels often encode redundant information, as in regions of constant image intensity. Furthermore, the content of an image in terms of pixels is highly sensitive to simple image transformations, e.g. a slight translation of the camera can cause a change of f(x, y) at every pixel location while the essential content of the image as an entity is only subject to a minor change.

Under this perspective, one is interested in features that encode structural information of the imaged scene while being invariant to a number of factors that may influence the imaging process. Such factors include translation, scale, and orientation, which depend on the relative position of the imaging device with respect to the scene.

Parts of this chapter are joint work with others. Martin Böhme and I contributed approximately equally to the implementation of the facial feature detector based on geometric eccentricities. It was my idea to compute the features in 3D to achieve scale-invariance. Martin Böhme proposed the use of the NFFT to obtain a fast and accurate implementation. Martin Böhme and I contributed approximately equally to implementing this concept. I implemented and evaluated the sparse coding principle with TOF image data. The computation of range flow was implemented by Tiberiu Viulet in the scope of his master's thesis (Viulet, 2009). I devised and implemented the gesture recognition framework. The work described here has previously been published in (Böhme et al., 2008a; Haker et al., 2008, 2009b).

In this context, the features should tolerate shifts in local geometry, which arise from the projection of the 3-dimensional scene to 2-dimensional image coordinates under different perspectives. Furthermore, it is desirable to achieve an invariance towards changes in illumination and contrast. Considering the example of object detection, an image of a certain object would yield the same features even if the distance from the imaging device, viewing angle, and lighting conditions varied. Thus, the task of object detection would be invariant towards the process of imaging an object, which in turn eases the problem in many real-world scenarios.

There exist a number of approaches to object detection that utilize the concept of invariance. Scale-invariance can be achieved by operating on an image pyramid (Koenderink, 1984), i.e. by creating scaled versions of the image and searching for the object in each level of the pyramid (Lowe, 1999). The same effect can be achieved by scaling the features (Viola and Jones, 2004; Lienhart et al., 2003). An invariance towards small translations can be obtained by applying a maximum operation over local neighborhoods (Serre et al., 2007; Labusch et al., 2008a). Rotation invariance can for example be based on rotated versions of the filters (Barczak et al., 2006; Serre et al., 2007) or on a transformation into a rotation-invariant feature space (Weinland et al., 2006). Another approach is to assign location, scale, and orientation to a computed feature with respect to the image coordinates and to include this information in the detection algorithm (Lowe, 1999).

In the scope of this work we investigated four different kinds of features. Three of those features are static and computed on single image frames, while the fourth feature captures dynamic properties of an image sequence and estimates the 3D motion of objects in the scene.

The three static features are evaluated for a very simple feature-based nose detector in combined range and amplitude data obtained by a TOF camera. The image data shows frontal face images and the task is to identify the location of the nose. To find a nose in the image, the features are computed per pixel; pixels whose feature values lie inside a certain bounding box in feature space are classified as nose pixels, and all other pixels are classified as non-nose pixels. The extent of the bounding box is learned on a labeled training set. Despite its simplicity this procedure generalizes well, that is, a bounding box determined for one group of subjects accurately detects noses of other subjects. The performance of the detector is demonstrated by robustly identifying the nose of a person in a wide range of head orientations. An important result is that the combination of both range and amplitude data dramatically improves the accuracy in comparison to the use of a single type of data.

The first type of feature is related to the Gaussian curvature and is thus considered a geometric feature, known as the generalized eccentricity. As already mentioned, the feature achieves best results if it is evaluated for both range and intensity data, which is reflected in the equal error rates (EER) obtained on a database of head poses. Using only the range data, we detect noses with an EER of 66.0%. Results on the amplitude data are slightly better with an EER of 42.3%. The combination of both types of data yields a substantially improved EER of 3.1%.

The second type of feature is a generalization of the geometric features to compute scale-invariant features on range maps produced by a range sensor, such as a TOF camera. Scale invariance is achieved by computing the features on the reconstructed three-dimensional surface of the object. This technique is general and can be applied to a wide range of operators. Features are computed in the frequency domain; the transform from the irregularly sampled mesh to the frequency domain uses the Nonequispaced Fast Fourier Transform (NFFT). Here, we apply the procedure to the computation of the generalized eccentricities. On a dataset containing frontal faces at various distances from the camera the EER in the nose detection task is halved for the case of scale-invariant features compared to features computed on the range map in the conventional way. When the scale-invariant range features are combined with intensity features, the error rate on the test set reduces to zero.

In the case of the third type of feature the sparse coding principle is employed for the representation of multimodal image data, i.e. image intensity and range. We estimate an image basis for frontal face images taken with a TOF camera to obtain a sparse representation of facial features, such as the nose. These features are then evaluated in the nose detection task where we estimate the position of the nose by template matching and a subsequent application of appropriate thresholds that are estimated from a labeled training set. The main contribution of this work is to show that the templates can be learned simultaneously on both intensity and range data based on the sparse coding principle, and that these multimodal templates significantly outperform templates generated by averaging over a set of aligned image patches containing the facial feature of interest, as well as multimodal templates computed via Principal Component Analysis (PCA). The system achieves an average EER of 3.7%.

Finally, we discuss a fourth type of feature that differs from the previously mentioned ones in the sense that it is not computed for static images but captures the motion patterns of an image sequence.

In order to describe the 3D motion of objects in the scene we compute the so-called range flow, which aims at estimating a 3D motion vector for each pixel of an image sequence composed of range maps. We follow an approach by Spies et al. (2000) that also exploits the available intensity data. We combine the range flow features with the estimation of human pose discussed in Chapter 6 in the way that we compute the range flow around local neighborhoods of the estimated positions of body parts. Thus, we can for example assign robust 3D motion to both hands. The motion patterns are accumulated in 3D motion histograms which we use to classify the performed gesture of each individual frame. Following this procedure, we achieve detection rates above 90% for three investigated hand gestures. We also consider the problem of detecting the case when no gesture was performed, i.e. we also define a fourth class representing arbitrary hand movements of the user which are not intended as a gesture.

In the following, we will first address the geometric features and evaluate their performance on a database of images in Section 7.1. Then, we will extend the results to obtain scale-invariant features using the NFFT in Section 7.2. The multimodal sparse features will be addressed in Section 7.3 and the features capturing 3D motion will be the subject of Section 7.4.


7.1 Geometric Invariants

7.1.1 Introduction

In this section we focus on object tracking, more precisely, on the task of nose and head tracking. The most common forms of digital images that are utilized in computer vision to solve such tasks are intensity and range images. The former type is by far the most popular, which is mainly due to the low cost of the corresponding image sensors. However, within the last decade the TOF camera has been developed and fuses the acquisition of both intensity and range data into a single device at a relatively low cost. The future pricing of such cameras is expected to be comparable to a standard web-cam. In contrast to web-cams, the TOF camera simplifies the determination of geometrical properties of the 3D scene significantly, thus it is worth investigating methods that make explicit use of the available data.

We will discuss geometrically invariant measures that are suitable for identifying facial features in a TOF camera image. Based on these features, we construct a simple nose detector and test its performance on range and intensity data individually, as well as on the combination of these two types of data. And indeed, the detector performs better on TOF data than on intensity data alone. However, beyond this we have observed an interesting phenomenon in our results: It is not the range data by itself that improves the robustness of the results but rather the combination of range and intensity data; while both types of data perform roughly the same when used individually, the combination of the two yields substantially better results than either type of data alone. This underlines the potential of TOF cameras for machine vision applications.

Previous work has already identified the nose as an important facial feature for tracking, e.g. in (Gorodnichy, 2002) and (Yin and Basu, 2001). In the former approach the location of the nose is determined by template matching, under the assumption that the surface around the tip of the nose is a spherical Lambertian surface of constant albedo. This approach gives very robust results under fixed lighting conditions and at a fixed distance of the user from the camera. The latter approach is based on a geometrical model of the nose that is fitted to the image data.

We also consider the nose as being very well suited for head tracking, because the nose is obviously a distinctive characteristic of the human face.

In terms of differential geometry, the tip of the nose is the point of maximal curvature on the object surface of a face. A benefit of analyzing the 3D surface in terms of differential geometry is that a major portion of differential geometry is concerned with the description of invariant properties of rigid objects. Although curved surface patches have been shown to be unique (Mota and Barth, 2000; Barth et al., 1993), Gaussian curvature is rarely used as a feature because its computation is based on first and second order derivatives, which are sensitive to noise. We propose alternative features that can be related to generalized differential operators. These features, which are computed per pixel of the input image, are used to decide for each input pixel if it corresponds to the tip of the nose based on the simple application of thresholds learned from a set of labeled training data.

We will first review and motivate the geometric features and evaluate them with respect to their suitability for the specific task of nose detection. Then, we will discuss the robustness of the method by presenting results on a database of head pose images acquired using a MESA SR3000 TOF camera (Oggier et al., 2005b).

7.1.2 Geometric Invariants

For the definition of invariant geometric features we will restrict ourselves to a special type of surface known as the Monge patch or the 2-1/2-D image. Such surfaces are defined as a function $f: \mathbb{R}^2 \to \mathbb{R}^3$ in the following manner:

$$(x, y) \mapsto (x, y, f(x, y)). \qquad (7.1)$$

This is the natural definition for digital intensity images, as each pixel value is bound to a fixed position on the image sensor without explicit reference to a coordinate representation in the 3D world. In the case of range data, however, each pixel is associated with explicit 3D coordinates via the geometry of the optical system of the camera. Nevertheless, we will assume a weak-perspective camera model for both range and amplitude data of the TOF camera, because we do not expect a great difference in range for the pixels of interest. (Within a frontal face we do not expect any range differences greater than 5 cm, which would roughly correspond to a relative shift of only 5% in the coordinates x and y at a distance of 1 meter from the camera if we assumed a perspective camera model.) Under the weak-perspective camera model we can treat both types of data as Monge patches with respect to the regular image grid, which results in a simplified mathematical formulation and comparable results for range and amplitude data. An approach that exploits a perspective camera model will be discussed in Section 7.2.


The features derived for the above data model are mainly due to a conceptual framework for image analysis which was introduced by Zetzsche and Barth (1990a,b). Within this framework, image regions are associated hierarchically, with respect to their information content, with 0D (planar), 1D (parabolic), and 2D (elliptic/hyperbolic) regions of a Monge patch. Naturally, the concept of curvature is fundamental to this representation. Within this framework, the authors proposed a set of measures that provide basic and reliable alternatives to the Gaussian curvature K and the mean curvature H for the purpose of surface classification.

Let us first recall the definition of Gaussian curvature for a Monge patch:

$$K = \frac{f_{xx} f_{yy} - f_{xy}^2}{(1 + f_x^2 + f_y^2)^2}. \qquad (7.2)$$

In case only the sign of the curvature is relevant, one can rely on the DET-operator $D$, which can be formulated in terms of the determinant of the Hessian

$$(h_{ij}) = \begin{pmatrix} f_{xx} & f_{xy} \\ f_{xy} & f_{yy} \end{pmatrix}. \qquad (7.3)$$

This amounts to the numerator of Equation (7.2). Thus the DET-operator takes the following form:

$$D = f_{xx} f_{yy} - f_{xy}^2 = \det(h_{ij}) = d_1 d_2. \qquad (7.4)$$

Here, $d_1$ and $d_2$ denote the eigenvalues of the Hessian. Rearranging the first part of the formula (Barth et al., 1993) yields

$$D = \tfrac{1}{4}(f_{xx} + f_{yy})^2 - \tfrac{1}{4}(f_{xx} - f_{yy})^2 - f_{xy}^2 = (\Delta f)^2 - \epsilon^2, \qquad (7.5)$$

where $\Delta f$ denotes the Laplacian and $\epsilon$ is referred to as the eccentricity, which is defined as

$$\epsilon^2 = \tfrac{1}{4}(f_{xx} - f_{yy})^2 + f_{xy}^2. \qquad (7.6)$$
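As a quick consistency check (not part of the original derivation), expanding the squares shows that (7.5) indeed reproduces the DET-operator of (7.4):

$$\tfrac{1}{4}(f_{xx}+f_{yy})^2 - \tfrac{1}{4}(f_{xx}-f_{yy})^2 = f_{xx}f_{yy}, \qquad \text{so} \qquad \tfrac{1}{4}(f_{xx}+f_{yy})^2 - \tfrac{1}{4}(f_{xx}-f_{yy})^2 - f_{xy}^2 = f_{xx}f_{yy} - f_{xy}^2 = D.$$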

The above formulation yields a relationship of the curvature to the Laplacian and the eccentricity.



Figure 7.1: Discrimination of the six surface types pit, peak, saddle, valley, ridge, and planar within the feature space spanned by $\epsilon_0$ ($\Delta f$) and $\epsilon_2$ ($\epsilon$).

A generalized representation of the operators $\Delta f$ and $\epsilon$ can be achieved in the Fourier domain by defining the generalized eccentricity $\epsilon_n$ via the following filter functions in polar coordinates $\rho$ and $\theta$, where $A(\rho)$ represents the radial filter tuning function:

$$C_n = i^n A(\rho) \cos(n\theta), \qquad S_n = i^n A(\rho) \sin(n\theta). \qquad (7.7)$$

Recall that the transfer functions of partial derivatives are of the form $(i f_x)^n$ and $(i f_y)^n$, where $f_x$ and $f_y$ represent the spatial frequencies and $n$ denotes the order of differentiation. Even-order partial derivatives correspond to real transfer functions, whereas odd-order partial derivatives correspond to imaginary transfer functions. The two transfer functions in Equation (7.7) correspond to convolution kernels $c_n(x, y)$ and $s_n(x, y)$ in the image domain. Using these, we obtain the generalized eccentricity

$$\epsilon_n^2 = (c_n(x, y) * l(x, y))^2 + (s_n(x, y) * l(x, y))^2 \qquad (7.8)$$

for $n = 0, 1, 2, \ldots$, which corresponds to $|\Delta f|$ for $n = 0$ and to the eccentricity $\epsilon$ for $n = 2$, when $A(\rho) = (2\pi\rho)^2$.

The gradient is defined by $\epsilon_n$ for $n = 1$ and $A(\rho) = 2\pi\rho$. In a purely geometrical interpretation, all measures $\epsilon_n$ are positive, and as a result one cannot distinguish between convex and concave curvature using $\epsilon_0$ and $\epsilon_2$. An extension to this formulation in (Barth et al., 1993) justifies the use of $\epsilon_0$ with the sign of $\Delta f$, i.e. $\epsilon_0 = -c_0 * l$. For practical applications, the radial filter tuning function $A(\rho)$ can be combined with a low-pass filter, e.g. Gaussian blurring of the form $G(\rho, \sigma) = \exp(-\pi\rho^2/4\sigma^2)$. Ideally, the low-pass filter should be adapted to the distribution of noise inherent in the data.
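The Fourier-domain formulation translates almost directly into code. The following NumPy sketch computes $\epsilon_0$ and $\epsilon_2$ for an image sampled on a regular grid; the frequency normalization and the exact filter scaling are implementation choices and not prescribed by the text.

```python
import numpy as np

def generalized_eccentricities(img, sigma=0.3):
    """Per-pixel eps_0 and eps_2 via Fourier-domain filtering (Eqs. 7.7/7.8)."""
    h, w = img.shape
    # Frequency coordinates (cycles per pixel) in FFT ordering.
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    rho = np.sqrt(fx ** 2 + fy ** 2)
    theta = np.arctan2(fy, fx)

    # Radial tuning A(rho) = (2*pi*rho)^2 combined with a Gaussian low-pass.
    A = (2 * np.pi * rho) ** 2 * np.exp(-np.pi * rho ** 2 / (4 * sigma ** 2))

    L = np.fft.fft2(img)

    def response(n):
        # C_n = i^n A(rho) cos(n*theta), S_n = i^n A(rho) sin(n*theta)
        Cn = (1j ** n) * A * np.cos(n * theta)
        Sn = (1j ** n) * A * np.sin(n * theta)
        c = np.real(np.fft.ifft2(Cn * L))
        s = np.real(np.fft.ifft2(Sn * L))
        return c, s

    c0, _ = response(0)
    c2, s2 = response(2)

    eps0 = -c0                         # signed version, eps_0 = -c_0 * l
    eps2 = np.sqrt(c2 ** 2 + s2 ** 2)  # eccentricity, eps_2 >= 0
    return eps0, eps2
```

These two maps provide the per-pixel features used by the nose detector in the following sections.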

The measures $\epsilon_n$ for $n = 0$ and $n = 2$ can be used to distinguish between the six well-known surface types in the feature space spanned by $\epsilon_0$ and $\epsilon_2$.



Figure 7.2: (Top) Distribution of feature points for pixels taken from range data of the SR3000 camera projected into the 2D feature space spanned by $\epsilon_0$ ($\Delta f$) and $\epsilon_2$ ($\epsilon$). (Bottom) The feature space around the origin at a higher resolution. The black crosses represent feature points corresponding to the nose tip of various subjects and clearly cluster in the region associated with the surface type pit as expected. The grey dots represent randomly chosen non-nose pixels.

Figure 7.1 shows where the different surface types lie in feature space. Because the nose is a local minimum in the range data, we would expect the corresponding pixels to lie in the region labeled pit. Conversely, since the nose tends to be a local maximum in the intensity data, we would expect to find the corresponding pixels in the region labeled peak.

7.1.3 Feature Selection

The interpretation of Figure 7.1 not only demonstrates how the features $\epsilon_n$ can be interpreted intuitively, but it also shows how the interpretation can be used to select meaningful features for the task at hand. In other words, the goal of feature extraction should be that all data points belonging to the same class, e.g. pixels corresponding to the tip of the nose, form clusters in the feature space while being easily separable from points of other classes.


The top plot in Figure 7.2 displays feature points for pixels computed on range data in the feature space spanned by $\epsilon_0$ and $\epsilon_2$. Only pixels with high amplitude, i.e. pixels belonging to an object that is close to the camera, were considered. First, it is noticeable that the frequency of occurrence decreases from 0D structures around the origin, to 1D structures along the diagonals, to 2D structures. Thus, only a small portion of pixels belongs to curved regions. However, in the bottom plot in Figure 7.2, one can observe that the feature points associated with the tip of the nose cluster nicely and correspond to the surface type pit, as one would expect. Qualitatively similar results can be observed on the amplitude data, the major difference being that nose pixels cluster in the region associated with the surface type peak.

7.1.4 Nose Detector

We use the geometric invariants $\epsilon_0$ and $\epsilon_2$, as introduced in Section 7.1.2, to construct a nose detector. It decides per pixel of the input image if it corresponds to the tip of a nose or not. In other words, each pixel of the input image is mapped to a d-dimensional feature space, where d is the number of features considered. Within this feature space, we estimate a bounding box that encloses all pixels associated with the tip of a nose. It is important to mention that the bounding box should be estimated in polar coordinates due to the interpretation of the feature space (see Figure 7.1).

The estimation is done in a supervised fashion based on a set of feature points computed for pixels that were hand-labeled as belonging to the tip of the nose. The extent of the bounding box within each dimension of the feature space is simply taken as the minimum and the maximum value of the corresponding feature with respect to all training samples. A softness parameter can be introduced to control the sensitivity of the detector. To detect a nose within an image, the features are computed for each pixel, and a pixel is classified as the tip of a nose if its feature point lies within the estimated bounding box in the feature space. Despite the simplicity of this approach, we obtain very accurate and robust results, as we will show below.

The input for each feature computation was either the range or the amplitude data of an image from the SR3000 camera. The raw camera data was preprocessed by scaling it to the interval [0, 1]. Then, a threshold computed using the method by Otsu (1979) was applied to the amplitude data to separate the foreground from the background. The background was set to a fixed value in both range and amplitude data.


Figure 7.3: Sample images from the database of head poses. The amplitude data (left column) and the range data (right column) are given for four subjects. All pixels identified as nose pixels by our detector are marked in each image; the cross simply highlights the locations.

This was mainly done to avoid unwanted spatial filter responses on the range data due to high levels of noise in regions with low confidence. The radial filter tuning function was set to $A(\rho) = (2\pi\rho)^2 \cdot \exp(-\pi\rho^2/4\sigma^2)$ with $\sigma = 0.3$ for all feature computations. We expect that filter optimization will further improve the results.
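For illustration, the bounding-box classifier itself reduces to a per-dimension min/max estimate and a containment test. The sketch below uses hypothetical names and, for brevity, applies the box directly along the feature axes, whereas the text above notes that the box should be estimated in polar coordinates of the feature space.

```python
import numpy as np

def train_bounding_box(nose_features):
    """Per-dimension min/max over feature vectors of hand-labeled nose pixels.
    nose_features : (N, d) array of training samples."""
    return nose_features.min(axis=0), nose_features.max(axis=0)

def classify_nose_pixels(features, lo, hi, softness=0.0):
    """Return a boolean mask of pixels whose feature vectors lie inside the
    (optionally enlarged) bounding box.
    features : (H, W, d) array of per-pixel feature vectors."""
    margin = softness * (hi - lo)                  # softness widens the box
    inside = (features >= lo - margin) & (features <= hi + margin)
    return inside.all(axis=-1)
```

In the combined experiment below, the per-pixel feature vector consists of $\epsilon_0$ and $\epsilon_2$ computed on both the range data and the amplitude data, i.e. d = 4; for the single-modality experiments d = 2.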

7.1.5 Results

The procedure was evaluated on a database of images taken of people at different head poses. A sample of such images is shown in Figure 7.3, where both amplitude and range data are given for four subjects. Our database consists of a total of 13 subjects; for each subject, nine images were taken at roughly the same distance from the camera for different orientations of the head. The extent of the bounding box was estimated on a training set of three subjects, and the method was evaluated on the remaining ten subjects. The results presented in the following show that the method generalizes very well when using the combination of range and amplitude data.

Figure 7.4 shows the ROC curves for different combinations of input data. For Figure 7.4a only the range data was used from each image, whereas Figure 7.4b shows the results for the amplitude data. The features $\epsilon_0$ and $\epsilon_2$ were used in both cases. The method achieves an equal error rate (EER) of 66.0% on the range data and 42.3% on the amplitude data. Even though the range data seems to be better suited for the proposed geometric features, the amplitude data gives slightly better results. We cannot give a final explanation for this effect, but we assume that it is due to a higher level of noise in the range data. Although the EER is not satisfactory in either case, the results are quite good considering the simplicity of the classifier.



Figure 7.4: ROC curve of detection rate vs. false positive rate on range data (a), amplitude data (b), and the combination of both (c). The detection rate gives the percentage of images in which the nose has been identified correctly, whereas the false positive rate denotes the percentage of images where at least one non-nose pixel has been misclassified. Thus, strictly speaking, the curves do not represent ROC curves in the standard format, but they convey exactly the information one is interested in for this application, that is, the accuracy with which the detector gives the correct response per image.


We were able to improve the performance dramatically by using a combination of features on range and amplitude data. We used the two features $\epsilon_0$ and $\epsilon_2$ for both types of data, which amounts to a total of four features. The corresponding ROC curve is shown in Figure 7.4c, and we can report an EER of 3.1% for this method.

7.1.6 Discussion

In this section, we presented a very simple detector for the human nose based on the TOF camera SR3000. The method yields very accurate and robust detection rates. However, we can point out three aspects that have the potential to increase the performance of the detector: (1) the use of a more sophisticated classifier, (2) an adaptation of the low-pass filter to the noise distribution, and (3) the use of additional features. Also, we point out that when the detector is used to track the nose over several frames, as opposed to performing detection on individual frames, robustness can be improved by exploiting the fact that, in general, the position of the nose does not change much from frame to frame. In Chapter 9 we show that a very robust tracking of the nose can be achieved using this feature-based approach.


7.2 Scale Invariant Features

7.2.1 Introduction

The fact that the apparent size of an object in a camera image changes with the distance of the object from the camera leads to one of the fundamental problems in computer vision: finding scale-invariant image features, i.e. features that, by their mathematical formulation, are unaffected by image scale (for an example of a recent approach, see (Lowe, 1999)). Achieving scale invariance usually requires increased algorithmic complexity and additional computation. For example, the image can either be scanned for objects of different sizes, or it can be transformed into scale-space (Koenderink, 1984), where the feature extraction is computed individually at different levels of scaling. In both cases, the treatment of objects at different scales has to be made explicit within the algorithm.

In this section, we suggest a novel approach to the problem of scale-invariance: If we use a range sensor – such as a TOF camera – to image the object, we can compute features directly on the reconstructed surface of the object in 3D space. In this way, the features become scale-invariant, because the 3D reconstruction – unlike the image of the object – does not undergo scale changes as the object moves towards or away from the camera.

We are particularly interested in the TOF camera as a basis for this type of approach because it provides a range map that is perfectly registered with an intensity image in a single device, making it easy to create detectors based on a combination of range and intensity features.

Naturally, one can compute image features on the regular image grid of both range and amplitude images directly as we have demonstrated in the previous section (Haker et al., 2007). Note, however, that interpreting the range map as an array of height values measured over a regular grid is equivalent to the weak-perspective assumption, i.e. to assuming that the total depth variation within the object is small compared to the distance of the object from the camera. If this assumption is violated, the geometry of the surface reconstructed using weak perspective will differ markedly from the true geometry of the object. Furthermore, the size of an object in the image changes with its distance to the camera.

Alternatively, if the intrinsic camera parameters are known, we can apply the inverse of the camera projection to the range data, thus obtaining an actual sampling of the object's surface in three-dimensional Cartesian coordinates.

Obviously, the measured object does not undergo any scaling in the 3D scene if it is moved towards or away from the camera; instead, only the spatial sampling frequency decreases as the distance to the camera is increased (see Figure 7.5). It is important to note, however, that the sampling grid in this representation is no longer regular; the spacing between two samples depends on the distance of the relevant part of the object to the camera. Many techniques for extracting image features, such as convolution with a filter kernel, require a regular sampling and can thus no longer be used.

To overcome this problem, we compute the features not in the spatial domain, but in the frequency domain. We transform the sampled object shape to a frequency domain representation using the Nonequispaced Fast Fourier Transform (NFFT, see Section 7.2.2), an efficient algorithm for computing the Fourier transform of a signal sampled on an irregular grid. Thus, any feature computation that can be performed in the Fourier domain can now be evaluated efficiently. The main advantage of this approach is that any filter operation has, in theory, the same effect on an object independently of the distance of the object to the camera. The NFFT has been used for image processing tasks such as CT and MRI reconstruction (see e.g. (Potts and Steidl, 2001)), and it has also been used to extract features from vector fields for visualization (Schlemmer et al., 2005). However, to our knowledge, the approach of using the NFFT to compute scale-invariant features for classification is novel.

We demonstrate this approach using the set of geometric features introduced in Section 7.1, which are related to mean and Gaussian curvature. When evaluated on the image in the conventional way, these features are sensitive to scale changes; however, when evaluated on the object surface using the NFFT, the features are invariant to scale. We verify this using a synthetic test object in Section 7.2.4.

Finally, we use these features for a facial feature tracking problem. In previous work by Böhme et al. (2008a) (as described in Section 7.1), we tackled this problem using features evaluated conventionally on the camera image; this solution had limited robustness towards scale variations. The new scale-invariant features yield greatly improved detection at varying distances from the camera, as we show in Section 7.2.4.

7.2.2 Nonequispaced Fast Fourier Transform (NFFT)

Definition

As noted in the introduction, we need to compute the Fourier transform of a function sampled on a nonequispaced grid.



Figure 7.5: Top: Sampling of a face 35 cm from the camera. Bottom: Sampling of a face at 65 cm from the camera. Note that the spatial sampling frequency is significantly lower compared to the face at 35 cm, but the physical size of the face in 3D space is still the same.

86 7.2. SCALE INVARIANT FEATURES an algorithm for the fast evaluation of sums of the form ∑ § ¤ − ˆ 2πikxj f(xj) = fke , ¦7.9 ¥

k∈IN

where the $x_j \in [-\frac{1}{2}, \frac{1}{2})^d$, $j = 1, \ldots, M$ are arbitrary nodes in the spatial domain (of dimension $d$), the $k$ are frequencies on an equispaced grid, and the $\hat{f}_k$ are the corresponding Fourier coefficients. The equispaced frequency grid $I_N$ is defined as

$$I_N := \left\{ k = (k_t)_{t=1,\ldots,d} \in \mathbb{Z}^d : -\frac{N_t}{2} \le k_t < \frac{N_t}{2}, \; t = 1, \ldots, d \right\}, \qquad (7.10)$$

where $N = (N_1, \ldots, N_d)$ is the so-called multibandlimit, which specifies the band limit along each dimension. (Note that all $N_t$ must be even.) We refer to Equation (7.9) as the Nonequispaced Discrete Fourier Transform (NDFT). From this equation the Discrete Fourier Transform (DFT) (with equispaced nodes in the spatial domain) can be obtained by setting the $x_j$ to the nodes of the grid $x = \left(\frac{k_t}{N_t}\right)_{t=1,\ldots,d}$, $k_t \in \{-\frac{N_t}{2}, \ldots, \frac{N_t}{2} - 1\}$.

Equation (7.9) describes the transform from the frequency domain to the spatial domain. In the case of the equispaced DFT, because the matrix that describes the transform is unitary, the same algorithm can be used for the opposite transform (from the spatial to the frequency domain). This is not true in the nonequispaced case; here, to transform from the spatial to the frequency domain, i.e. to find Fourier coefficients $\hat{f}_k$ such that evaluation of Equation (7.9) will yield certain given values $f(x_j)$, we need a second algorithm (see (Kunis and Potts, 2007)), which is based on a combination of the conjugate gradient method with the NFFT. Note that this algorithm (which transforms from the spatial to the frequency domain) is sometimes referred to as the “inverse NFFT”, whereas the term “inverse” is otherwise usually applied to a transform from the frequency to the spatial domain.
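For reference, the NDFT sum (7.9) can be evaluated directly for $d = 2$ in a few lines; this naive version costs O(M N1 N2) and only serves to fix the conventions, whereas an NFFT implementation approximates the same sums much faster. The code is a sketch and not taken from the NFFT library.

```python
import numpy as np

def ndft_2d(f_hat, nodes):
    """Direct evaluation of Eq. (7.9) for d = 2 (slow reference implementation).

    f_hat : (N1, N2) complex Fourier coefficients on the grid I_N (N1, N2 even),
            where f_hat[a, b] belongs to the frequency (k1[a], k2[b])
    nodes : (M, 2) spatial nodes x_j in [-1/2, 1/2)^2
    """
    N1, N2 = f_hat.shape
    k1 = np.arange(-N1 // 2, N1 // 2)
    k2 = np.arange(-N2 // 2, N2 // 2)
    f = np.empty(len(nodes), dtype=complex)
    for j, (x, y) in enumerate(nodes):
        phase = np.exp(-2j * np.pi * (k1[:, None] * x + k2[None, :] * y))
        f[j] = np.sum(f_hat * phase)
    return f
```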

Applying the NFFT to Range Data

We assume that the object surface, reconstructed from the range data by inverting the camera projection, is given in Cartesian coordinates x, y, z, where the x-y-plane is parallel to the image plane and the z-axis is parallel to the camera’s optical axis. To apply the NFFT to this data, we interpret the z-coordinate as a function z(x, y) of the x-y-coordinates; hence, the x-y-coordinates define the grid nodes for the NFFT.
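A possible back-projection step is sketched below. It assumes a pinhole model with focal lengths and principal point in pixel units and a range map that already stores the z-coordinate along the optical axis; if the camera reports radial distances instead, an additional conversion is needed. The names are illustrative.

```python
import numpy as np

def backproject(range_map, fx, fy, cx, cy):
    """Invert the perspective projection to obtain Cartesian surface points."""
    h, w = range_map.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))   # pixel coordinates
    z = range_map                                    # distance along the optical axis
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return x, y, z                                   # each of shape (h, w)
```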


As noted in the previous section, these nodes need to lie in the interval $[-\frac{1}{2}, \frac{1}{2}) \times [-\frac{1}{2}, \frac{1}{2})$. Generally, this means that the x-y-coordinates of the surface points need to be scaled to this interval. We wish to use the same scaling for all images so that the interpretation of the Fourier domain remains the same. To define this scaling, we will introduce the concept of an equivalence range; we want to choose such a scaling that, for an object at the equivalence range e, the effect of applying a particular transfer function to the FFT spectrum and to the NFFT spectrum is the same. The correct scaling is computed by intersecting the camera's field of view with a plane perpendicular to the view direction at distance e from the camera, yielding a rectangle; the x-y-plane is then scaled such that this rectangle fits exactly within the interval $[-\frac{1}{2}, \frac{1}{2}) \times [-\frac{1}{2}, \frac{1}{2})$.

Note that the x-y-coordinates of points beyond the equivalence range may lie outside the interval $[-\frac{1}{2}, \frac{1}{2}) \times [-\frac{1}{2}, \frac{1}{2})$; these points are discarded. The equivalence range thus needs to be chosen such that the resulting clipping volume is large enough to contain the objects of interest. The centroid of these objects should be shifted to x = 0, y = 0 to ensure they are not clipped.

Another point of note is that it is advisable, if possible, to segment the foreground object of interest and apply the NFFT only to the points belonging to that object. There are various reasons for doing this: (i) Steep edges between the foreground and background can lead to ringing artefacts. (ii) The grid nodes in the background region are spaced further apart; the greater the spacing between grid nodes, the lower the frequency where aliasing begins. (iii) Passing fewer points to the NFFT reduces the computational requirements.

Finally, note that the transform from the spatial domain to the frequency domain is often an underdetermined operation. In this case, the NFFT computes the solution with minimal energy, meaning that the background region, where there are no grid nodes, is implicitly set to zero. To avoid steep edges between the foreground and the background, we subtract a constant offset from the z values so that the maximum z value becomes zero.

7.2.3 Nose Detection

In Section 7.1, we have used the geometric features for the task of nose detection and tracking. Here we will evaluate the benefit of the scale-invariant feature computation by comparing the results in the nose detection task to those obtained using the feature computation in image coordinates. As described in the previous section, the geometric features can be computed conveniently in the Fourier domain.

the orthographic projection, the spectrum can simply be computed using the FFT. However, we leverage the depth information using the perspective camera model to obtain a 3D object representation that is independent of the object’s distance to the camera. As a consequence, we have to rely on the NFFT for the computation of the spectrum. There exist a number of technical details regarding the application of the NFFT, which will be the focus of this section. Some of the details are best explained with respect to the nose detector; thus, we will briefly review the algorithm as described in Section 7.1.4. For each pixel in an image we compute the generalized eccentricities ϵ0 and ϵ2 and, thus, map it to the feature space given in Figure 7.1. Within feature space, noses are characterized by a bounding box which is learned from a set of labeled training data. During classification, a given pixel is said to belong to the tip of a nose iff it is mapped into this bounding box in feature space. In correspondence to Section 7.1, we segmented the foreground from the background in the TOF range images to discard noise in image regions with a low signal-to-noise ratio. All background pixels were set to the maximum distance in the foreground region to obtain a smooth continuation from the foreground to the constant background region. To further improve the smoothness, we also discarded the 20% of the foreground pixels with the largest distance to the camera. As a result, the border of the foreground is roughly at the same distance to the camera and filling the background with a constant does not induce any discontinuities. In the orthographic case, the resulting range image was then passed directly to the FFT. In the case of the NFFT-based algorithm, we first inverted the perspective projection to obtain xyz-coordinates. Note that the background noise should be removed prior to this step, because noise pixels having very low distance values will project to points that lie between the object in the foreground and the camera, resulting in spurious peaks in the Monge patch. In a second step, the coordinates x and y were shifted so that the foreground object was centered on the optical axis of the camera. This was achieved by a simple subtraction of the mean of all xy-coordinates of foreground pixels. The final step in preparing the data for the NFFT is to select the relevant samples for processing and to scale the sampling nodes to the correct interval. Clearly, the sampling nodes have to lie within a constant interval for all images so that results relate to the same scale. Thus, we determined upper and lower bounds for x and y which correspond to a square frame of 50 cm side length in the 3D scene. The resulting interval is scaled to (−1/2, 1/2) × (−1/2, 1/2) for processing with the NFFT.



Figure 7.6: Generalized eccentricity ϵ0, computed on a synthetic test image of a sphere, as a function of distance from the camera.

Furthermore, only foreground samples were passed to the NFFT to reduce the computational cost. Note that it makes sense to omit background pixels that are too distant because, intuitively speaking, the usable Nyquist frequency is limited by the most coarsely sampled part of the data, and the sample spacing grows with the distance to the camera. Finally, the NFFT assumes the input function to be zero in all regions in (−1/2, 1/2) × (−1/2, 1/2) where no samples are provided. As a consequence, the maximum input value was subtracted from all input samples to avoid discontinuities. An example of the samples passed to the NFFT for two specific images is given in Figure 7.5.
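The preprocessing chain just described can be summarized in a short sketch (illustrative only; the segmentation and the inversion of the perspective projection are assumed to have produced the foreground points already, and the NFFT call itself is omitted):

import numpy as np

def prepare_nfft_samples(points, frame_size=0.5):
    """points: (N, 3) foreground surface points (x, y, z) in metres.
    Returns sampling nodes in [-1/2, 1/2)^2 and the z values that serve as
    function samples for the NFFT."""
    xy = points[:, :2] - points[:, :2].mean(axis=0)   # center on the optical axis
    nodes = xy / frame_size                           # 50 cm frame -> unit square
    keep = np.all((nodes >= -0.5) & (nodes < 0.5), axis=1)
    z = points[keep, 2] - points[keep, 2].max()       # maximum z value becomes zero
    return nodes[keep], z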

7.2.4 Experimental Results

The algorithms were implemented in Matlab; the NFFT 3.0 library (Keiner et al., 2006), which is implemented in C, was used to compute the NFFT.

Synthetic Data

To begin, we will examine the feature values computed on a synthetic test object us- ing both the classical scale-dependent FFT-based approach and the scale-independent NFFT-based approach. We synthesized range images of a sphere at various distances from the virtual camera and computed the generalized eccentricity ϵ0 using the FFT- and NFFT-based approaches.

Figure 7.6 shows the value of ϵ0 at the apex of the sphere as a function of dis- tance. (ϵ2 is not shown because it is identically zero for objects that, like the sphere,

exhibit the same curvature in all directions.) It is apparent that while the feature value changes noticeably with distance for the FFT-based approach, it remains essentially constant for the NFFT-based approach. Note also that at the equivalence range of e = 0.5 m, both approaches compute the same feature value.

Real-World Data

To evaluate the performance of the features on a real-world problem, we compare the detection rates of the nose detector (Haker et al., 2007) on images of an SR3000 TOF camera using the NFFT-based algorithm and the original FFT-based version, respectively. In both cases, the training was done on a database of three subjects who were imaged at a fixed distance of 60 cm from the camera. During evaluation, the detector had to generalize to a dataset of 87 face images showing a different subject. The test images were taken at distances ranging from 35 to 70 cm. Figure 7.7 shows two examples of range maps from the test set along with the scale-invariant features. In the case where only the range data of the TOF images is considered, the results are given in Figure 7.8. Here, the NFFT-based algorithm achieves an EER of 20% in comparison to 39% in case of the FFT-based version and, thus, clearly yields a significant improvement in detection performance. We have shown the detection results on features extracted from the range data alone to make the effect more clearly visible; for optimal detection performance, we would additionally use the same type of features computed on the intensity data. As we will discuss in Section 7.2.5, we still compute the intensity features in the conventional way using the FFT, and they are thus not scale-invariant. Nevertheless, when they are combined with the FFT-based range features, we obtain an EER of 4%; when NFFT-based range features are used, the EER drops to 0%, i.e. there are no errors on the test set – a larger test set would be required to measure a more meaningful EER. (As a point of note, the EER on the intensity features was 78%; it is only the combination of range and intensity features that yields low error rates.) It should be mentioned that, while the NFFT has the same asymptotic running time as the FFT, it is slower by a relatively large constant factor. Our Matlab implementation, running on a 2.66 GHz E6750 Intel CPU, requires 0.1 s to compute the FFT-based features, versus 5 s for the NFFT-based features. A C implementation of the FFT-based detector runs at camera frame rates (Böhme et al., 2008a); the NFFT-based detector is currently too slow for this type of application. However, we believe there are ways of achieving the same effect at lower computational cost; in the meantime, we see NFFT-based scale-invariant features as an attractive technique



Figure 7.7: Two different range image samples from the test set, taken at 35 cm (left col- umn) and 65 cm distance (right column). The corresponding features ϵ0 and ϵ2 computed at foreground pixels via the NFFT are shown, respectively.



Figure 7.8: Top: ROC curve showing detection rate vs. false positive rate for the nose detection task using the NFFT. Bottom: The ROC curve obtained on the same database using the FFT. The detection rate indicates the percentage of images in which the nose has been identified correctly, whereas the false positive rate denotes the percentage of images where at least one non-nose pixel has been misclassified. Thus, strictly speaking, the curves do not represent ROC curves in the standard format, but they convey exactly the information one is interested in for this application, that is, the accuracy with which the detector gives the correct response per image.

for non-interactive applications.

7.2.5 Discussion

Features computed directly on the three-dimensional geometry of the object are, by their nature, scale-invariant. As we have shown, this allows for a more robust classi- fication than when the same features are computed directly on the range map, where they are sensitive to scale variations. We have demonstrated this using a specific set of features, the generalized eccentricities, but the method itself is very general and can be applied to a wide range of operators. We have used our technique to implement a facial feature detector using a TOF camera. The detector generalizes from faces presented at a fixed training distance to a test set containing different faces at different distances. It achieves good de- tection performance, making no errors on the test set. Two important factors that help us achieve this result are the scale-invariant features and the fact that the TOF camera provides a perfectly registered intensity image in addition to the range map. The combination of range and intensity data yields substantially better classification results than either type of data alone. Currently, we still compute the intensity features on the image, where they are sensitive to scale variations. Ideally, we would like to compute these features on the object surface, too. This is, however, a slightly more complicated problem, because intensity is a function of the position on a two-dimensional sub-manifold (the ob- ject surface) in three-dimensional space; the geometry of this sub-manifold must be taken into account when computing the intensity features. This is an avenue for fu- ture work. On a general level, our key point is this: The perspective transformation that is inherent in the image formation process causes scale variations, which present additional difficulties in many computer vision tasks. This is why, in our view, the TOF camera is an attractive tool for computer vision: In effect, it can act as a digital orthographic camera, thereby simply eliminating the problem of scale variations.


7.3 Multimodal Sparse Features for Object Detection

7.3.1 Introduction

In recent years there has been a lot of interest in learning sparse codes for data representation, and favorable properties of sparse codes with respect to noise resistance have been investigated (Donoho et al., 2006). Olshausen and Field (1997) applied sparse coding to natural images and showed that the resulting features resemble receptive fields of neurons in the visual cortex. Thus, it stands to reason that the basis functions computed by sparse coding can be used effectively in pattern recognition tasks in the fashion introduced by Serre et al. (2007), who model a recognition system that uses cortex-like mechanisms. They use Gabor jets, which also roughly correspond to receptive fields of V1 simple cells, to extract image features for object recognition. Sparse coding has also been successfully applied to the recognition of handwritten digits. Labusch et al. (2008a) learn basis functions for representing patches of handwritten digits and use these to extract local features for classification. In this work, we aim to learn a sparse code for multimodal image data, i.e. we simultaneously learn basis functions for representing corresponding intensity and range image patches. As a result, we obtain aligned pairs of basis functions that encode prominent features that co-occur consistently in both types of data. Thus, a corresponding pair of basis functions can be used to consistently extract features from intensity and range data. To our knowledge, sparse representations have not yet been learned for multimodal signals. The considered image data was obtained by a TOF camera, which provides a range map that is perfectly registered with an intensity image. We already showed in Section 7.1 and Section 7.2 that using both intensity and range data of a TOF camera in an object detection task can significantly improve performance in comparison to using either data alone (Haker et al., 2007, 2008). The fact that a sparse code learned simultaneously on both intensity and range data yields perfectly aligned basis functions allows us to extract relevant features from both types of data. Here, we aim to learn a set of basis functions that encode structural information of frontal face images in a component-based fashion. As a result, the basis functions estimated by sparse coding can be regarded as templates for facial features, such as the nose. We evaluate the resulting templates on a database of TOF images and use

simple template matching to identify the presence and position of the nose in frontal face images. The importance of the nose as a facial feature for problems such as head tracking was already mentioned by Yin and Basu (2001) and Gorodnichy (2002). The fact that we use a single basis function for template matching yields a very efficient algorithm for detecting the nose that can be implemented at camera frame rates. The common alternative to using only a single basis function in the nose detector would be to compute a sparse coefficient vector for every image patch that may contain a nose based on the estimated basis functions. The coefficient vector can then serve as an input to a high-dimensional classification algorithm, such as a support vector machine (SVM). While this procedure is computationally more expensive than template matching and cannot be implemented at camera frame rates, it yields improved detection rates. Section 7.3.2 will discuss the computation of a set of basis functions under the constraint of the sparse coding principle. In Section 7.3.3 we discuss the procedure of determining the basis function that yields the optimal equal error rate (EER) in the template matching approach for the nose detection task. Furthermore, we discuss the training of an SVM on the sparse coefficient vectors. Section 7.3.4 presents the results and shows that templates generated via sparse coding yield significantly better detection rates than templates obtained by PCA or by computing an average nose from a set of aligned nose samples. In addition, we compare the results of an SVM classifier trained on coefficient vectors computed under the sparse coding principle in comparison to coefficient vectors obtained via PCA.

7.3.2 Sparse Features

The investigated database of frontal face images (art) was obtained using an SR3000 TOF camera (Oggier et al., 2005a). The subjects were seated at a distance of about 60 cm from the camera and were facing the camera with a maximum horizontal and/or vertical head rotation of approximately 10 degrees. As a result, the facial feature of interest, i.e. the nose, appears at a size of roughly 10 × 10 pixels in the image. A number of sample images are given in Figure 7.9. As a TOF camera provides a range map that is perfectly registered with an inten- sity image, we aim to learn an image basis for intensity and range simultaneously. To this end, the input data for the sparse coding algorithm are vectors whose first half is composed of intensity data and the second half of range data, i.e. in case we con- sider image patches of size 13 × 13, each patch is represented by a 338-dimensional vector (d = 338 = 2 · 13 · 13) where the first 169 dimensions encode intensity and


Figure 7.9: Three sample images of frontal face images taken by an SR3000 TOF camera. The top row shows the intensity and the bottom row the range data.

the remaining 169 dimensions encode range. In order to speed up the training process, we only considered training data that originated from an area of 40 × 40 pixels centered around the position of the nose. The position of the nose was annotated manually beforehand. By this procedure we prevent the basis functions from being attracted by irrelevant image features, and a number of 72 basis functions proved to be sufficient to represent the dominant facial features, such as the nose or the eyes. A common difficulty with TOF images is that the range data is relatively noisy and that both intensity and range can contain large outliers due to reflections of the active illumination (e.g. if subjects wear glasses). These outliers violate the assumed level of Gaussian additive noise in the data and can lead the sparse coding algorithm astray. To compensate for this effect, we applied a 5 × 5 median filter to both types of data. To ensure the conservation of detailed image information while effectively removing only outliers, pixel values in the original image I_o were only substituted by values of the median-filtered image I_f if the absolute difference between the values exceeded a certain threshold:

\[
I_o(i, j) =
\begin{cases}
I_f(i, j) & \text{if } |I_o(i, j) - I_f(i, j)| \geq \theta \\
I_o(i, j) & \text{otherwise.}
\end{cases}
\]
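The selective filtering can be written directly from this rule; the following is a small NumPy/SciPy sketch (the threshold value and the function name are illustrative, not the exact values used in the experiments):

import numpy as np
from scipy.ndimage import median_filter

def remove_outliers(image, size=5, theta=0.1):
    """Replace a pixel by its 5x5 median only where it deviates from that
    median by at least theta; elsewhere the original value is kept."""
    filtered = median_filter(image, size=size)
    outliers = np.abs(image - filtered) >= theta
    return np.where(outliers, filtered, image)

# The filter is applied separately to the intensity and the range image.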

There exist a number of different sparse coding approaches, see for example (Olshausen and Field, 1997; Lewicki et al., 2000; Labusch et al., 2009). We employed the Sparsenet algorithm proposed by Olshausen and Field (1997) for learning the sparse code. The basic principle aims at finding a basis W for representing vectors ⃗x as a linear combination of the basis vectors using coefficients ⃗a under the

assumption of Gaussian additive noise: ⃗x = W⃗a + ⃗ϵ. To minimize the reconstruction error while enforcing sparseness, i.e. the property that the majority of coefficients a_i are zero, the Sparsenet algorithm solves the following optimization problem:

\[
\min_{W} \; E\!\left( \min_{\vec{a}} \left( \lVert \vec{x} - W\vec{a} \rVert + \lambda S(\vec{a}) \right) \right) \qquad (7.11)
\]

Here, E denotes the expectation and S(⃗a) is an additive regularization term that favors model parameters W that lead to sparse coefficients ⃗a. The parameter λ balances the reconstruction error ⃗ϵ against the sparseness of the coefficients. In order to apply the method, the input data has to be whitened beforehand, as indicated by Olshausen and Field (1997). We applied the whitening to both types of data individually. Only after this preprocessing step was the training data generated by selecting random image patches of the template size, i.e. for a patch in a given image the corresponding intensity and range data were assembled in a single vector. The resulting features for 19 × 19 image patches are given in Figure 7.10. Facial features, e.g. nose, eyes, and mouth, can readily be distinguished. We set the parameter λ to a relatively high value (λ = 0.1), i.e. we enforce high sparseness, in order to obtain this component-based representation. However, we can report that the results are not particularly sensitive to minor changes of this parameter.
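To make the construction of the training vectors concrete, the following sketch stacks corresponding (pre-whitened) intensity and range patches into a single vector, as used as input to the Sparsenet algorithm; the function name and the sampling of patch positions are illustrative:

import numpy as np

def multimodal_patch(intensity, range_map, top, left, size=13):
    """Stack a whitened intensity patch and the corresponding range patch into
    one training vector of dimension 2 * size * size (338 for 13 x 13 patches)."""
    i_patch = intensity[top:top + size, left:left + size].ravel()
    r_patch = range_map[top:top + size, left:left + size].ravel()
    return np.concatenate([i_patch, r_patch])

# Training vectors are drawn at random positions inside a 40 x 40 window around
# the annotated nose position (bounds checking omitted in this sketch).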

7.3.3 Nose Detection

Template Matching Approach

Since the basis functions computed by sparse coding in Section 7.3.2 represent facial features, it stands to reason that they can be used for object detection via template matching. At this point two questions arise: (i) Which basis function represents the best template, and (ii) what is the actual position of the facial feature with respect to the center of the image patch corresponding to this basis function. A straightforward solution would be to select the most promising feature by visual inspection and to an- notate the position of the facial feature within the image patch manually. Obviously though, this procedure is not generally applicable and is likely to yield suboptimal results. Thus, we decided to follow a computationally more intensive procedure that, in contrast, is fully automatic and operates in a purely data-driven fashion. For each of the 72 basis functions we trained and evaluated a nose detector for every possible position of the nose in a certain neighborhood around the center of the image patch.


Figure 7.10: Basis functions learned for frontal face images via the Sparsenet algorithm. The upper and lower part of the figure show the basis functions representing intensity data and range data, respectively. The basis functions for both types of data were learned si- multaneously and correspond pairwise, i.e. the top-left intensity feature is aligned with the top-left range feature.


In the case of 13 × 13 image patches we chose this neighborhood to be 11 × 11. As a result, a total of 8712 = 72 · 11 · 11 detectors were trained. The final detector uses the basis function and the position of the nose out of the 8712 configurations that produced the best EER on the training set. The thresholds of each detector were simply determined by taking the minimum and the maximum of the filter responses at the annotated positions of the nose on a set of training images, i.e. upper and lower bounds for the filter responses that identify a nose were determined for both intensity and range data. In order to identify a nose in a new image, both intensity and range were filtered with the corresponding template images and each pixel whose filter responses complied with the identified bounds was classified as a nose pixel. To obtain an EER, these bounds were relaxed or tightened. The procedure was evaluated on a data set of 120 TOF images of frontal faces taken from 15 different subjects. To double the amount of data, mirrored versions of the images were also added to the data set. From the total of 240 images one half was chosen at random as a training set to determine the bounds of each classifier. These bounds were then adjusted to yield an EER on the training set. Finally, the optimal classifier, i.e. the one out of the 8712 candidates yielding the best EER, was evaluated on the remaining 120 images that were not used during the training process. In order to assess the performance of the learned templates, we also evaluated two other types of templates – “average” and “eigen” templates. The former were generated by computing an average nose from a set of image patches containing aligned sample noses. The latter were obtained as the principal components of these aligned image patches. Again, we generated corresponding pairs of templates for both intensity and range data. The same training images, including the preprocessing, were used as in Section 7.3.2 for the Sparsenet algorithm. A fundamental difference between these two approaches to generating the average and eigen templates and the sparse coding method is that the former only yields templates in which the nose is centered in the image patch, whereas the latter also produces translated versions of the nose (see Figure 7.10). To guarantee a fair comparison between the different templates we applied the following procedure: Since the optimal position of the nose within the template is not known a priori, we generated a total of 121 templates of size 13 × 13 centered at all possible positions in an 11 × 11 neighborhood around the true position of the nose, i.e. the templates were shifted so that the nose was not positioned in the center of the template. In correspondence to the procedure described above for the sparse-coding templates,

each shifted template was then evaluated for every possible position of the nose in an 11 × 11 neighborhood around the center of the image patch. For the average templates the resulting number of possible detectors amounts to 14641. In the case of the eigen templates, it is not apparent which principal component should be used as a template. To constrain the computational complexity, we considered only the first three principal components. Nevertheless, this resulted in 43923 possible detectors. Again, the optimal average and eigen templates were determined as the ones yielding the best EER on the training set according to the procedure described above.
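For illustration, the bounds-based template detector can be sketched as follows (a hypothetical implementation using plain 2D correlation; the bounds are the minimum and maximum filter responses observed at the annotated nose positions during training):

import numpy as np
from scipy.signal import correlate2d

def nose_pixels(intensity, range_map, template_int, template_rng, bounds):
    """bounds = (lo_int, hi_int, lo_rng, hi_rng): filter-response bounds
    learned on the training set. Returns a boolean mask of nose pixels."""
    resp_i = correlate2d(intensity, template_int, mode='same')
    resp_r = correlate2d(range_map, template_rng, mode='same')
    lo_i, hi_i, lo_r, hi_r = bounds
    return ((resp_i >= lo_i) & (resp_i <= hi_i) &
            (resp_r >= lo_r) & (resp_r <= hi_r))

Relaxing or tightening the four bounds moves the detector along the curve from which the EER is read off.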

Coefficient Vector Approach

In contrast to the procedure described above, where only a single basis function was used for template matching to detect the nose, we will now discuss the procedure that utilizes the entire set of basis functions W. Again, the basis functions W were computed according to Equation (7.11) using the Sparsenet algorithm. In this setting, we compute a coefficient vector ⃗a for each image patch of a new image under the sparse coding principle:

\[
\vec{a} = \min_{\vec{a}} \left( \lVert \vec{x} - W\vec{a} \rVert + \lambda S(\vec{a}) \right) \qquad (7.12)
\]

As before, the vector ⃗x is composed of both the range and intensity data of an image patch. Based on the resulting vector ⃗a, a classifier decides for each image patch if it depicts a nose or not. Again, we rely on the annotated data set of frontal face images described above to extract training vectors representing human noses. As we intend to train an SVM we also require samples from the non-nose class, which we extracted randomly from the training images. Based on the resulting vectors we train an SVM using the iterative SoftDoubleMaxMinOver algorithm proposed by Martinetz et al. (2009). The hyper-parameters are optimized via 10-fold cross-validation. To evaluate the detection performance of this approach, we compare it to basis functions U computed via PCA. To obtain the coefficient vector ⃗a with respect to the PCA basis functions, we followed the standard procedure of projecting the vector ⃗x onto the orthonormal basis U:

\[
\vec{a} = U^{T} \vec{x} \qquad (7.13)
\]
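A minimal sketch of this PCA baseline is given below (illustrative; the sparse coefficients of Equation (7.12) would instead be obtained with a sparse-coding solver such as Sparsenet):

import numpy as np

def pca_basis(X):
    """X: (n_samples, d) matrix of training patch vectors.
    Returns an orthonormal basis U whose columns are the principal components
    (assumes n_samples >= d so that U is d x d)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt.T

def pca_coefficients(x, U):
    """Coefficient vector a = U^T x for a single patch vector x."""
    return U.T @ x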

For the results to be comparable, we used the same number of basis functions for both approaches. The number equals the dimensionality of the input vectors ⃗x



Figure 7.11: ROC curves of detection rate vs. false positive rate. The curves were generated using the sparse-coding templates, the eigen templates, the average templates, and the sparse-coding templates using only the intensity data (sparse int) or only the range data (sparse rng). The detection rate gives the percentage of images in which the nose was identified correctly, whereas the false positive rate denotes the percentage of images where at least one non-nose pixel was misclassified. Thus, strictly speaking, the curves do not represent ROC curves in the standard format, but they convey exactly the information one is interested in for this application, that is, the accuracy with which the detector gives the correct response per image.

representing an image patch. Thus, for a patch size of 13 × 13 both ⃗x and ⃗a were of dimension 338.

7.3.4 Results

Template Matching Approach

The results of the training for the nose detection task using 13 × 13 templates are given in Figure 7.11. The EER on the training set using the sparse-coding templates is 3.9%. The eigen templates achieve an EER of 6.6%, and the average templates yield an EER of 22.5%, i.e. the EERs for these two procedures are higher by roughly 50% and 500%, respectively. The EERs prove to be largely independent of the training set. We ran 100 evaluations of the procedure with random configurations of the training set and recomputed both the templates and the classifiers in each run. The standard deviations for the three EERs over the 100 evaluations were σ = 0.9%, σ = 1.6%, and σ = 2.3%, respectively. Figure 7.11 also shows the ROC curves for detectors that use the sparse-coding

templates computed on either intensity or range data of the TOF images alone. Note that the EERs are dramatically higher in comparison to the detector that uses both types of data together. This confirms results reported by Haker et al. (2007), where the combination of intensity and range data also yielded markedly better results in the detection of the nose based on geometrical features. The error rates on the test set are only slightly higher than the EERs on the training set, which shows that the method generalizes well to new data. The false positive rates (FPR) amount to 5.3%, 9.3%, and 24.4% for the sparse-coding, eigen, and average templates. The evaluation above considered only templates of a fixed size of 13 × 13 pixels. However, varying the template size reveals some interesting properties of the different approaches. To this end, we computed templates of size n × n, where n = 1, 3, . . . , 19, for each approach and estimated the optimal detector according to the same procedure outlined above. To reduce the number of possible detectors to be evaluated, the neighborhood sizes for positioning the nose and shifting the template were reduced to 7 × 7 pixels for templates with size n smaller than 13. Again, we considered 100 random configurations of training and test set. Figure 7.12 shows the configurations of template and position of the nose within the template that yielded the best EERs on the training set for each approach with respect to the different template sizes. Note that the sparse-coding templates (first two rows) exhibit a more appropriate structure and convey higher frequency information in comparison to the average templates (rows three and four), especially for larger sizes of the template. This explains the bad performance of the average templates, because an increase in size does not add more information to the template. This effect becomes clearly visible in Figure 7.13. The plot shows the FPRs on the test set for the different approaches over varying template sizes. One can observe that the FPR of the average template starts to increase for templates of size five. In comparison, the FPR of the sparse-coding templates continues to decrease up to a template size of 19. A decrease of the FPR can also be observed for the eigen templates up to size 11 of the template. For larger template sizes the FPR also starts to increase, whereas the sparse-coding templates continue to achieve low FPRs, as already mentioned above. It seems that sparse coding can exploit further reaching dependencies. A comparison of the FPRs with respect to the optimal template size for each method reveals that the average template achieves the worst overall performance with an FPR of 9.6% (σ = 3.5) for a 7 × 7 template. The best results for the eigen


Figure 7.12: Optimal templates for different template sizes, where each column shows tem- plates of odd pixel sizes ranging from 3 to 19. The first row shows the sparse-coding tem- plates for intensity data and the second row shows the corresponding features for range data. Rows three and four give the average templates and rows five and six show eigen templates. The crosses mark the estimated position of the nose within the templates.



Figure 7.13: The graph shows the FPRs and standard deviations on the different test sets for the different templates at different template sizes. The dotted lines show the corresponding false negative rates.

templates were obtained with templates of size 11, yielding an FPR of 7.9% (σ = 3.2). The sparse-coding templates of size 19 had the best overall performance (FPR 3.7%, σ = 2.3), and the FPR improved roughly by a factor of 2.5 in comparison to the best eigen template. Note that the false negative rate for the different approaches lies well within the error bars of the FPR in Figure 7.13, as one would expect, since the classifier was set to achieve an EER during training.

Coefficient Vector Approach

The results of the classification with an SVM trained on the full coefficient vectors ⃗a are presented in Figure 7.14 and Figure 7.15. Again, we evaluated the classification of image patches of different sizes. Here, we considered patches of size n × n, where n = 7, 9,..., 19. We ran ten different configurations of training and test set to obtain representative results. Both Figure 7.14 and Figure 7.15 show an effect that is similar to the one observed previously for the template matching approach. For small sizes of the image patch the PCA yields better results in comparison to sparse coding. However, for large tem- plate sizes the sparse coding method achieves similar FPRs at slightly better FNRs. Both approaches perform at an average FPR of 0.3% and an FNR of 2.0%. While



Figure 7.14: The graph shows the FPRs and standard deviations on the different test sets for the different sizes of the classified image patches.


Figure 7.15: The graph shows the FNRs and standard deviations on the different test sets for the different sizes of the classified image patches.

this result is obtained at a patch size of 11 × 11 in the case of the PCA, the sparse coding approach achieves these error rates at a patch size of 19 × 19. Note that the differences between both methods are not significant. We point out that the classifier was not adjusted to an EER in this setting. The disparity between the FPRs and FNRs is due to the fact that the two classes of nose pixels and non-nose pixels were not represented equally among the training set, i.e. we extracted approximately seven times more non-nose samples than nose samples for training to achieve the best overall performance. It remains a task for future work to retrain the classifier at an EER.

7.3.5 Discussion

We have demonstrated how a sparse code can be learned for multimodal image data, i.e. we computed an image basis for the simultaneous representation of both range and intensity data of a TOF camera. As a result, the basis functions capture prominent characteristics of the training images that consistently co-occur in both types of data. We have shown how the resulting basis functions can be used effectively for template matching in detecting the nose in frontal face images. On a labeled set of TOF images we obtained an FPR of 3.7%. The sparse-coding templates yield significantly better results in comparison to templates obtained by averaging over a number of aligned sample images of noses. Templates resembling the principal components of these aligned sample images were also outperformed, especially for large sizes of the template. We could also confirm previous observations (Haker et al., 2007, 2008) that the combination of intensity and range data yields a greatly improved detector compared to using either type of data alone. The use of the complete set of basis functions for the feature representation of an image patch yields significantly improved results in comparison to using a single basis function as a template, lowering the FPR to 0.3% at an FNR of 2.0%. However, the computational complexity of this approach prevents an implementation at interactive frame rates.


7.4 Local Range Flow for Human Gesture Recognition

The features introduced in Sections 7.1, 7.2, and 7.3 were all computed with respect to a single image, i.e. they capture static properties of the image data. For a number of computer vision problems it is desirable to also consider dynamic properties of a sequence of images. For example, it is possible to learn a sparse code in analogy to Section 7.3 for the representation of motion patterns (Olshausen, 2003). However, the most commonly used representative of dynamic features in image analysis is the optical flow. A typical computer vision problem that requires the use of dynamic features is human action and gesture recognition. It has been tackled by a number of different approaches in recent years. Approaches that are based on the estimation of optical flow were for example proposed by Polana and Nelson (1994) and Efros et al. (2003). Other approaches aim at extracting meaningful features that describe the characteristic evolution of the human pose over time for a specific action. Shechtman and Irani (2007) compute a correlation measure for a space-time patch showing the performed action with templates of the actions to be recognized. A differential geometric approach by Yilmaz and Shah (2008) computes features on a 3D volume in spatio-temporal space that is composed of the 2D contours of the performing person. Other approaches, such as the one by Sun et al. (2009), determine trajectories of keypoints and use them to recognize actions. The general interest in the topic of action and gesture recognition is also reflected by an increasing number of companies that are active in this area. The Belgian company Softkinetic S.A. provides a middleware for human pose tracking and gesture recognition based on 3D range sensors. The Israeli company 3DV Systems Ltd. developed a similar system that was mainly targeted at the gaming market. In late 2009, 3DV Systems was purchased by Microsoft in the scope of the Project Natal, which aims at providing a gesture interface for the XBox 360. In this work, we propose a gesture recognition system that decides on a per-frame basis whether a gesture out of a set of predefined gestures was performed by a user within a video stream captured by a TOF camera. The gestures investigated here are performed with the user’s hands. The challenges in this scenario are twofold: the system is not only required to distinguish between the predefined gestures, but it also needs to decide whether the user is actually performing a gesture or if the movements of the hands are arbitrary and not intended for the gesture recognition system. We propose a method that combines the pose estimation procedure presented

in Chapter 6 with 3D motion features, i.e. we compute the features only at keypoints identified by the estimated pose model. In this way, we can directly assign the motion vectors to certain parts of the body, such as the hands. The utilized features are referred to as range flow. For their computation we use the algorithm proposed by Spies et al. (2000, 2002). The resulting 3D motion vectors from local space-time neighborhoods around the keypoints are accumulated in 3D motion histograms. These histograms are used as input for a classification algorithm that decides on a per-frame basis if one of the predefined gestures was performed. The proposed feature vectors will then be evaluated in a gesture recognition task, i.e. we train and test a gesture recognition system on features extracted from a labeled database of video sequences. We demonstrate the benefit of using the proposed range flow features by comparing them against motion vectors computed directly from the estimated models of human pose of two successive frames. In Section 7.4.1 we will briefly review the computation of range flow. Section 7.4.2 will be devoted to the computation of suitable feature vectors using range flow histograms that yield a distinctive representation of the local motion patterns. We will then present the results of the gesture recognition task in Section 7.4.3.

7.4.1 Range Flow

The goal of estimating the range flow for an image sequence of range maps is to estimate a 3D motion vector (u, v, w)^T ∈ R^3 at every pixel to characterize the 3D motion of the observed surface. To this end, one considers the range map as a function Z = Z(x, y, t) ∈ R of space and time. Based on this function one can formulate the range flow motion constraint equation (RFMC) (Horn and Harris, 1991; Yamamoto et al., 1993) as follows:

\[
Z_x u + Z_y v + w + Z_t = 0 \qquad (7.14)
\]

A similar equation, generally used for the estimation of optical flow, can be formulated for the intensity data of the TOF camera. Assuming intensity as a function I(x, y, t) ∈ R of space and time, the well-known brightness change constraint equation (BCCE) (Horn and Schunck, 1981) takes the following form:

\[
I_x u + I_y v + I_t = 0 \qquad (7.15)
\]

As the TOF camera delivers both range and intensity data, we can use both the RFMC and the BCCE in combination to estimate the range flow. Naturally, the BCCE

can only contribute to the components u and v of the motion vector. When combining the two constraints, care has to be taken that neither of the terms dominates the numerical calculation due to a different scaling of the two types of data. Spies et al. (2000) propose to use a scaling factor β² based on the average gradient magnitudes computed on a representative data set of range and intensity data:

\[
\beta^2 = \frac{\langle \lVert \nabla Z \rVert^2 \rangle}{\langle \lVert \nabla I \rVert^2 \rangle} \qquad (7.16)
\]

According to Spies et al. (1999), the RFMC and BCCE lead to a total least squares framework of the form:

\[
u_1 Z_x + u_2 Z_y + u_3 + u_4 Z_t = 0 \qquad (7.17)
\]
\[
u_1 I_x + u_2 I_y + u_4 I_t = 0 \qquad (7.18)
\]

Solving this system yields the desired range flow in the form:

\[
(u, v, w)^T = \frac{1}{u_4} \, (u_1, u_2, u_3)^T \qquad (7.19)
\]

In order to obtain enough constraints to solve the system of equations, one assumes constant range flow in a local neighborhood of the considered pixel, often referred to as an aperture. As a result, one obtains an overdetermined system of equations consisting of Equation (7.17) and Equation (7.18) for every pixel in this local neighborhood. It can be shown that the solution in a total least squares sense corresponds to the eigenvector corresponding to the smallest eigenvalue of the extended structure tensor:

\[
\begin{pmatrix}
ZI_{xx} & ZI_{xy} & \langle Z_x \rangle & ZI_{xt} \\
ZI_{yx} & ZI_{yy} & \langle Z_y \rangle & ZI_{yt} \\
\langle Z_x \rangle & \langle Z_y \rangle & \langle 1 \rangle & \langle Z_t \rangle \\
ZI_{tx} & ZI_{ty} & \langle Z_t \rangle & ZI_{tt}
\end{pmatrix} \qquad (7.20)
\]

Here, the operator ⟨·⟩ = ∫ · denotes the integration over the aperture. We introduced the abbreviation ZI_ab = ⟨Z_a Z_b⟩ + β² ⟨I_a I_b⟩ for a more compact notation.
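To make the estimation step concrete, the following sketch builds the extended structure tensor from the derivative values inside one aperture and recovers the full flow from the eigenvector of the smallest eigenvalue (a simplified illustration assuming full flow is available, i.e. u_4 ≠ 0; the spatio-temporal derivatives and the weighting factor β² are taken as given):

import numpy as np

def range_flow_full(Zx, Zy, Zt, Ix, Iy, It, beta2):
    """All derivative arguments are 1D arrays holding the values inside one
    aperture. Returns the full range flow (u, v, w) in the total least squares
    sense, following Equations (7.17)-(7.20)."""
    ones = np.ones_like(Zx)
    zeros = np.zeros_like(Zx)
    # One constraint per row: RFMC rows and beta-weighted BCCE rows.
    rfmc = np.stack([Zx, Zy, ones, Zt], axis=1)
    bcce = np.sqrt(beta2) * np.stack([Ix, Iy, zeros, It], axis=1)
    D = np.concatenate([rfmc, bcce], axis=0)
    J = D.T @ D                          # 4x4 extended structure tensor (7.20)
    eigvals, V = np.linalg.eigh(J)       # eigenvalues in ascending order
    u1, u2, u3, u4 = V[:, 0]             # eigenvector of the smallest eigenvalue
    return np.array([u1, u2, u3]) / u4   # (u, v, w), cf. Equation (7.19)

In the line-flow and plane-flow cases, u_4 can vanish and the minimum norm solution described below would have to be computed instead.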

Nevertheless, this approach cannot always retrieve the full range flow if the aper- ture does not provide sufficient information to uniquely determine the underlying 3D motion vector. This circumstance is referred to as the aperture problem. To reflect

this situation one distinguishes between full flow, line flow, and plane flow. In case three linearly independent constraints are available, all but the smallest eigenvalue differ from zero and one can compute the full flow as described above. This is for example the case if the aperture encompasses a 3D corner. In case of linear structures, such as a 3D edge, one obtains two independent constraints and can determine the so-called line flow as the minimum norm solution. Plane flow refers to the minimum norm solution of depth planar structures, which merely yield a single constraint. Note that one obtains a full-rank tensor if the assumption of locally constant range flow is violated. In such a case the range flow cannot be estimated.

7.4.2 Human Gesture Recognition

Given the range flow computed on a sequence of TOF images, we aim at recogniz- ing human gestures. To this end, we need to integrate the estimated motion vectors into an appropriate feature representation, which can serve as an input to a decision rule that classifies gestures. For the practical applicability of such a gesture recogni- tion system, special attention has to be paid to the problem that unintentional hand movements should not be interpreted as gestures. Thus, the extracted features need to be highly distinctive of the corresponding gesture. In the following, we will first discuss the procedure of extracting the features and will then turn to the utilized classification procedure for recognizing the gestures.

Feature Extraction

In this work, we investigate gestures performed with the user’s hands. Thus, the considered feature representation aims at characterizing the 3D motion of the subject’s hands. To this end, we employ the pose estimation procedure described in Chapter 6, i.e. we compute the range flow for every pixel in a local neighborhood around the estimated positions of the hands according to the procedure described in Section 7.4.1. For the hand position within a specific frame, this neighborhood includes every pixel in an 11 × 11 window centered around the position of the hand. In order to capture only the motion of the subject, we first segment the person and compute the range flow only for those pixels within the window that belong to the person. For the utilized segmentation algorithm refer to Section 5.1.2. An illustration of this procedure is given in Figure 7.16. To obtain a compact representation of the local motion field around a hand, we accumulate the resulting 3D motion vectors in a 3D motion histogram, i.e. we


Figure 7.16: Sample image of a person performing a gesture. The estimated model of the human pose is imprinted into the image. Furthermore, the range flow estimated in a local neighborhood around the hands is depicted.

transform the motion vectors (u, v, w)^T into spherical coordinates (θ, ϕ, r)^T and estimate the frequency of the vectors in different sectors of the sphere around the considered pixel. We use three bins for each of the spherical coordinates, which yields a total number of 27 bins for the 3D motion histogram. This concept is depicted in Figure 7.17. The above procedure yields a 3D motion histogram describing the range flow in a local spatial neighborhood around the estimated positions of both hands for every frame of the video sequence. These two histograms are composed into a feature vector ⃗x_i ∈ R^54 to characterize the motion of both hands with respect to a given frame i. Our goal is to use these feature vectors to classify the performed gesture on a per-frame basis. Note, however, that a complete gesture is typically performed over a number of successive frames. To account for this, the final feature vector x̂_i for a given frame i is obtained by averaging the feature vectors of the frames j ∈ {i − N, . . . , i + N}:

\[
\hat{\vec{x}}_i = \frac{1}{2N + 1} \sum_{j=i-N}^{i+N} \vec{x}_j \qquad (7.21)
\]

Here, the temporal extension of a gesture in terms of frames is defined by 2N + 1, i.e. we assume that it takes a subject a certain number of frames to perform a gesture and we aim at detecting the frame that lies in the middle between the first and last frame of the gesture.
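A sketch of the 27-bin motion histogram is given below (illustrative; the bin boundaries, in particular the radial limit r_max, are assumed values rather than the ones used in the experiments):

import numpy as np

def motion_histogram(flow, r_max=0.25):
    """flow: (N, 3) array of range-flow vectors (u, v, w) around one hand.
    Returns a normalized 27-dimensional histogram over (theta, phi, r)."""
    u, v, w = flow[:, 0], flow[:, 1], flow[:, 2]
    norm = np.linalg.norm(flow, axis=1)
    theta = np.arccos(np.clip(w / np.maximum(norm, 1e-9), -1.0, 1.0))  # [0, pi]
    phi = np.arctan2(v, u)                                             # [-pi, pi]
    r = np.minimum(norm, r_max)          # clip large motions into the outer bin
    hist, _ = np.histogramdd(
        np.stack([theta, phi, r], axis=1),
        bins=(3, 3, 3),
        range=((0, np.pi), (-np.pi, np.pi), (0, r_max)))
    return hist.ravel() / max(len(flow), 1)

The histograms of the left and the right hand are concatenated into the 54-dimensional vector ⃗x_i and then averaged over 2N + 1 frames as in Equation (7.21).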



Figure 7.17: Illustration of a 3D motion histogram. The motion vectors of a local neigh- borhood are transformed into spherical coordinates (θ, ϕ, r)T . The vectors are then accu- mulated in a motion histogram. Intuitively speaking, the sphere is divided into sectors and we estimated the frequency of the motion vectors within each sector. An example of such a sector is depicted as a shaded volume.

Training and Classification

Given the feature vectors described above, we trained a classifier based on a labeled training set. To this end, we recorded a number of video sequences in which a subject performed ten instances of a predefined gesture. Between each instance the subject was asked to return to a normal standing position; however, the length of the interval between different instances of the gesture was arbitrary. We then labeled each frame of the video sequence, i.e. we assigned each frame either to one of the corresponding gestures or it was labeled as belonging to a non-gesture sequence. We investigated a total of three different gestures: (i) a push gesture where a subject extends the palm in front of the body, (ii) a left-right gesture where the subject sweeps a hand in front of his body from the left to the right, and (iii) a corresponding right-left gesture. Sample frames of these gestures are depicted in Figure 7.18. The data was recorded using an SR4000 TOF camera from MESA Imaging and we recorded at a frame rate of approximately ten frames per second. It took a subject less than a second to perform the core part of the gestures we aim to recognize. Thus, a typical gesture extends over roughly seven successive frames. By the core part of the gesture, we refer to those frames that actually identify the gesture, excluding the frames that the user requires to move the hands to the starting position of the gesture.


Figure 7.18: Sample frames of the three investigated gestures. The top row shows the ges- ture where the subject moves a hand from right to left. The center row shows a push gesture. The bottom row depicts the user sweeping a hand from left to right. In each frame the esti- mated model of the pose is imprinted into the image.

Taking the left-right gesture as an example, we consider only those frames as the core part in which the movement from the left to the right takes place. Given this observation, we evaluated Equation (7.21) for N = 3 to obtain a feature vector for every frame. To increase the number of training samples, we extracted five feature vectors for each performed instance of a gesture from the training sequences – the second through the sixth frame of the core part of an instance. An efficient way to avoid high false positive rates was to also extract samples from those frames where no gesture was performed. These frames were selected at random. The resulting feature vectors were used to train a classification algorithm. After training, each frame of a new video sequence was classified by computing the corresponding feature vector and applying the previously learned decision rule. In this context, we evaluated three different classification algorithms: (i) the k-Nearest-Neighbor classifier (kNN), (ii) the Nearest Mean classifier (NM), and (iii) the One-Class MaxMinOver classifier (OMMO) (Labusch et al., 2008b). In case of the kNN, we assigned a frame to the gesture with the largest number of feature vectors in the neighborhood of the feature vector to be classified. As we

are dealing with a multi-class problem, it is possible that two classes perform equally well. In such a case, the frame was classified as a no-gesture frame. As mentioned above, we also sampled the no-gesture class to reduce the number of false positives. However, this class is rather difficult to sample. Thus, we introduced another rule to improve the identification of frames in which no gesture was performed. To each class c we assigned a margin m_c in terms of a minimum distance to the closest feature vector among the nearest neighbors of that class, i.e. let ⃗x be a feature vector to be classified and ⃗f_c the closest feature vector of the winning class; then

\[
\lVert \vec{x} - \vec{f}_c \rVert < m_c \qquad (7.22)
\]

must hold for ⃗x to be assigned to c. Otherwise, ⃗x is assigned to the no-gesture class. Intuitively speaking, we avoid assigning a feature vector to a gesture class if it differs too much from the representatives of that class. The margins can be viewed as hyperparameters of the kNN algorithm. We adjusted these parameters to achieve a 100% detection rate on the training set while maintaining the lowest possible false positive rate. The same procedure was also applied to the NM classifier. In case of the third type of decision rule, we trained an OMMO for each class including the no-gesture class. For a new frame, the OMMO classifiers for each class were evaluated. The frame was only assigned to a gesture class if the classifiers of the other gesture classes rejected the frame. Otherwise, the frame was assigned to the no-gesture class to avoid false positives. We trained a hard-margin classifier with a Gaussian kernel using the OMMO algorithm and thus had to adjust the kernel parameter σ. Again, the parameter was chosen to achieve high detection rates on the training set at the lowest possible false positive rate.
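The kNN decision rule with the additional margin criterion can be sketched as follows (illustrative; k, the per-class margins m_c, and the label encoding are hypothetical parameters):

import numpy as np

def classify_frame(x, train_X, train_y, k, margins, no_gesture=-1):
    """train_X: (N, 54) training feature vectors, train_y: array of class labels,
    margins: dict mapping class label -> maximum allowed distance m_c."""
    d = np.linalg.norm(train_X - x, axis=1)
    nn = np.argsort(d)[:k]                      # indices of the k nearest neighbors
    labels, votes = np.unique(train_y[nn], return_counts=True)
    best = labels[votes == votes.max()]
    if len(best) != 1:                          # tie between classes -> no gesture
        return no_gesture
    c = best[0]
    closest = d[nn][train_y[nn] == c].min()     # closest neighbor of the winning class
    return c if closest < margins[c] else no_gesture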

7.4.3 Results

In total, we recorded five data sets with different persons and evaluated three different gestures: the push gesture, the left-right gesture, and the right-left gesture. Each person performed each gesture ten times. Thus, the data set consisted of 150 instances of gestures – 50 per individual gesture. For the evaluation of the method we applied the leave-one-out procedure with respect to the subjects, i.e. we trained a classifier for each gesture on the feature vectors computed for four subjects and did the evaluation on the data set of the fifth person. The hyperparameters of the considered classification algorithms (k for the kNN, the margins in case of both the kNN and the NM, and the kernel parameter


        PU       RL       LR       NG
PU    90.48     4.54     2.81     9.52
RL     0.00    95.65     0.00     4.35
LR     0.20     0.00    95.83     4.17
NG     0.77     0.73     0.17    97.68

Table 7.1: Confusion matrix for the kNN classifier. The gesture classes are denoted as PU for the push gesture, RL for the right-left gesture, LR for the left-right gesture, and NG for the no-gesture class. Note that the entries on the diagonal and the last column, excluding the bottom right entry, were computed with respect to instances of a gesture while all other entries were computed with respect to individual frames. By this procedure we emphasize that we are interested in two different measures: (i) that each performed gesture is detected and (ii) that each individual occurrence of an erroneously classified frame counts as a false positive. Note that this implies that the rows of the matrix do not automatically sum to one.

        PU       RL       LR       NG
PU    76.19     4.10     0.43    23.81
RL     5.63   100.00     0.00     0.00
LR     0.00     0.00    91.67     8.33
NG     1.50     0.73     0.27    96.52

Table 7.2: Confusion matrix for the NM classifier.

σ for the OMMO) were adjusted on one of the leave-one-out configurations. This configuration was excluded from the test results presented here, i.e. the confusion matrices were averaged over the remaining four leave-one-out configurations. The results of the three classification procedures in the proposed hand gesture recognition task are presented in Table 7.1 through Table 7.3. One can observe that the method detects the predefined gestures with high accuracy in most cases, typically with detection rates above 90%. At the same time, the procedure successfully distinguishes those frames in which a gesture occurred from those where the user did not perform a gesture. Here, detection rates are above 95% for the kNN and the NM classifier and above 90% for the OMMO classifier. The overall performance appears to be best for the kNN classifier, i.e. it achieves the highest detection rates

        PU       RL       LR       NG
PU    63.64     7.83     1.30    36.36
RL     1.92    91.67     0.00     8.33
LR     0.00     0.00   100.00     0.00
NG     1.38     4.00     0.67    91.58

Table 7.3: Confusion matrix for the OMMO classifier.


        PU       RL       LR       NG
PU    61.90     0.00     0.00    38.10
RL     0.26    95.65     0.00     4.35
LR     0.40     0.00    83.33    16.67
NG     2.52     4.00     0.19    96.14

Table 7.4: Confusion matrix for the kNN classifier based on 3D motion vectors extracted directly from the human pose model.

in most cases and especially for the push gesture while maintaining low confusion rates between the individual gestures. In order to assess the advantage of using range flow as a feature for 3D motion, we also used the estimated model of the human pose to directly extract the motion vectors of the hands by considering their 3D coordinates from two successive frames. We followed the same procedure as above and accumulated the motion vectors from seven successive frames in a 3D motion histogram to obtain a feature vector. The classification results using the kNN classifier are given in Table 7.4. One can observe that the overall performance is significantly reduced in comparison to the results based on range flow presented above. Thus, we can deduce that range flow serves as a robust feature for characterizing object motion in TOF image data.

7.4.4 Discussion

In this section, we presented an approach to the classification of hand gestures. The main goal was to achieve high detection rates while also handling the situation where a user is not performing any gesture at all. To this end, we proposed highly discriminative features that combine knowledge of the human pose with 3D motion estimation, i.e. we extracted 3D motion histograms based on the computation of range flow at the previously estimated image locations of the hands. This approach has two advantages. First, the computationally intensive estimation of range flow can be kept to a minimum for each frame because we require the range flow only in local neighborhoods around the estimated positions of the hands. Furthermore, this approach allows us to assign the motion information directly to body parts instead of to an entire image or fixed image region. Although the database for evaluating the proposed procedure is very limited and this section presents only preliminary work, we obtained quite promising results. We evaluated three different classification algorithms. The best performance was achieved by the kNN algorithm, i.e. it yielded detection rates ranging from 90.48% to


95.83%, while the situation where no gesture was performed was detected with an accuracy of 97.68%. This was achieved at confusion rates with the other gestures ranging from 0.00% to 4.54%. Future work should aim at investigating a larger database, because a larger training set can be expected to yield more accurately trained decision rules. This is especially the case for the OMMO classifier, where we assume the poor performance to be due to the limited amount of training data. At the same time, the results could then be evaluated in a more representative fashion. Furthermore, we expect an increase in detection performance if one does not classify single frames in isolation. Instead, the method should be extended to consecutive sequences of frames in which a gesture must be detected before it is recognized.

Part III

Applications


8 Introduction

In Part II we have introduced a collection of algorithms for TOF camera data. These include the segmentation of foreground objects (see Section 5), the estimation of human pose (see Section 6), and the tracking of facial features (see Section 7). In the scope of the third part of this thesis, we will discuss two applications for human-computer interaction that make use of these results. First we will investigate how the nose can be used to control a mouse cursor for an alternative text input application. Then we will discuss the use of deictic gestures for the control of a slide show presentation. Finally, we will turn to the topic of digital photography, where we will demonstrate how a TOF camera can be used to generate depth of field in digital images in a post-processing stage.

Facial Feature Tracker

We describe a facial feature tracker based on the combined range and amplitude data provided by a TOF camera. We use this tracker to implement a head mouse, an alternative input device for people who have limited use of their hands. The facial feature tracker is based on the geometric features introduced in Section 7.1 that are related to the intrinsic dimensionality of multidimensional signals. In Section 7.1 we also showed how the position of the nose in the image can be determined robustly using a very simple bounding-box classifier, trained on a set of labelled sample images. Despite its simplicity, the classifier generalizes well to subjects that it was not trained on. An important result is that the combination of range and amplitude data dramatically improves robustness compared to a single type of data. The resulting facial feature tracker runs in real time at around 30 frames per second. We demonstrate its potential as an input device by using it to control Dasher, an alternative text input tool.

Deictic Gestures

We present a robust detector for deictic gestures based on a TOF camera. Pointing direction is used to determine whether the gesture is intended for the system at all and to assign different meanings to the same gesture depending on pointing direction. We use the gestures to control a slideshow presentation: Making a “thumbs-up” gesture while pointing to the left or right of the screen switches to the previous or next slide. Pointing at the screen causes a “virtual laser pointer” to appear. Since the pointing direction is estimated in 3D, the user can move freely within the field of view of the camera after the system has been calibrated. Near the center of the screen the pointing direction is measured with an absolute accuracy of 0.6 degrees and a measurement noise of 0.9 degrees.

Depth of Field Based on Range Maps

Depth of field is one of the most important stylistic devices in photography. It allows the photographer to guide the spectator's attention to certain features in the image by tuning the camera parameters in such a way that only the important portions of the scene at the focused distance appear sharp while other objects appear blurred. This effect requires certain properties of the optical system, such as a large aperture. Modern digital cameras, however, become smaller and smaller in size and are thus incapable of creating the effect of depth of field. We propose an algorithm based on Distributed Raytracing that uses a range map acquired with a TOF camera to simulate the effect of depth of field in a post-processing step. We compare our approach, referred to as 2½D Distributed Raytracing, to previously published approaches and apply the algorithm to both synthetic and real image data.

9 Facial Feature Tracking

9.1 Introduction

In this chapter, we use a TOF camera to implement a facial feature tracker and show how this tracker can be used as an alternative means of controlling a mouse cursor. In contrast to conventional cameras, TOF cameras have the obvious benefit of providing information about the three-dimensional structure of the scene. Computer vision applications that use this information can reasonably be expected to be more robust than if they used only a conventional intensity image, because geometric structure is typically more invariant than appearance. Previous work has already identified the nose as an important facial feature for tracking, see for example (Gorodnichy, 2002) and (Yin and Basu, 2001). The former approach determines the location of the nose by template matching and gives very robust results under fixed lighting conditions and at a fixed distance of the user from the camera. The latter approach works by fitting a geometrical model of the nose to the image data. We also consider the nose as being well suited for head tracking because, in terms of differential geometry, the tip of the nose is the point of maximal curvature on the surface of the face. This observation allows the nose to be tracked robustly in range maps provided by a TOF camera. In the scope of differential geometry, the Gaussian curvature would seem to be a good choice as an image feature for tracking the nose, but it has the disadvantage that it is based on first and second order derivatives, which are sensitive to noise. In Section 7.1 we proposed alternative features that can be related to generalized differential operators. In Section 7.1.4, we described a nose detector based on these features that achieves good error rates using a simple threshold-based classifier.

Parts of this chapter are joint work with others. Martin Böhme and I contributed approximately equally to the implementation of the nose tracker. The work described here has previously been published in (Böhme et al., 2008a).

The nose detector described previously in Section 7.1 was a Matlab implementation that did not run in real time. In this chapter, we describe a real-time C++ implementation of the nose detector, which we then use as the basis for a nose tracker. Because of the low error rate of the detector, the tracker requires only simple post-processing and tracking algorithms. We envisage that this type of tracker can be used for human-computer interaction and, particularly, for Augmentative and Alternative Communication. To demonstrate this type of application, we use the tracker to control the alternative text entry tool Dasher proposed by Ward and MacKay (2002).

9.2 Nose Tracking

To track the nose and use it to control the position of the mouse pointer, we need to choose one of the potentially several nose locations that were detected, and we need to translate this nose location into a mouse cursor position. First, we find all of the connected components in the image; each connected component yields one candidate location for the position of the nose, which we determine as follows: We first find the pixel whose feature vector is closest to the center of the bounding box, i.e. for which ∥F − Fcentre∥2 is minimal. We then refine this location with subpixel precision by taking a small window around the candidate location and computing a weighted centroid of the pixels in the window, where the weight of a pixel depends on the distance of its feature vector to Fcentre; specifically, the weight function is a Gaussian centered on Fcentre. Since the raw results delivered by the nose detector are already quite good (see Section 9.3), most camera frames contain only a single connected component and thus a single candidate location, placed correctly on the nose. If the current frame contains more than one candidate location, we choose the candidate that is closest to the position of the nose in the previous frame. For the very first frame, or if no nose was found in the previous frame, we simply choose one of the candidates arbitrarily. Even if we choose the wrong candidate, it is quite likely that the next frame will not contain any misdetections, and from that point on, we will be tracking the correct position. This is a simple strategy for eliminating misdetections, but it works quite well in practice.
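The candidate selection and sub-pixel refinement described above can be sketched as follows in Python with NumPy and SciPy. The function names, the window size, and the Gaussian width sigma are illustrative assumptions; this is not the C++ implementation used in the thesis.

```python
import numpy as np
from scipy import ndimage

def nose_candidates(nose_mask, features, f_centre, sigma=1.0, win=3):
    """One sub-pixel candidate per connected component of detected nose pixels:
    start at the pixel whose feature vector is closest to the bounding-box
    centre f_centre, then take a centroid weighted by a Gaussian on the
    feature distance in a small window around that pixel."""
    labels, n = ndimage.label(nose_mask)
    candidates = []
    for k in range(1, n + 1):
        ys, xs = np.nonzero(labels == k)
        dist = np.linalg.norm(features[ys, xs] - f_centre, axis=1)
        cy, cx = ys[dist.argmin()], xs[dist.argmin()]
        y0, y1 = max(cy - win, 0), min(cy + win + 1, nose_mask.shape[0])
        x0, x1 = max(cx - win, 0), min(cx + win + 1, nose_mask.shape[1])
        patch = features[y0:y1, x0:x1]
        w = np.exp(-np.sum((patch - f_centre) ** 2, axis=2) / (2 * sigma ** 2))
        yy, xx = np.mgrid[y0:y1, x0:x1]
        candidates.append((np.sum(w * yy) / w.sum(), np.sum(w * xx) / w.sum()))
    return candidates

def select_candidate(candidates, prev_pos=None):
    """Keep the candidate closest to the previous nose position; fall back to
    the first candidate if there is no history."""
    if not candidates:
        return None
    if prev_pos is None:
        return candidates[0]
    return min(candidates,
               key=lambda c: (c[0] - prev_pos[0]) ** 2 + (c[1] - prev_pos[1]) ** 2)
```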

124 9.2. NOSE TRACKING


Figure 9.1: a) Illustration of the mapping between the position of the nose in the camera image and the position of the mouse cursor on the screen. The shaded rectangle is the active region, the cross marks the position of the nose, and the arrow symbolizes the mouse cursor. For clarity, the horizontal flip contained in the mapping is not shown. b) Adjustment made when the nose moves outside the active region. Top: Before the adjustment of the active region. Bottom: After the adjustment.

Once the position of the nose has been identified, we need to convert it into a position on the screen. We define a rectangular “active region” in the camera image, sized so that users can place their nose anywhere within this region by rotating their head. We then convert the nose position to a mouse cursor position by defining a mapping from camera image coordinates to screen coordinates such that the borders of the active region are mapped to the borders of the screen (see Figure 9.1a). The mapping has to flip the horizontal axis (for clarity, this is not shown in the figure) because a head movement to the left causes the position of the nose in the camera image to move to the right.

Two questions remain to be solved: Where in the image should we place the active region? And what should be done if the user’s seating position changes such that the nose leaves the active region? We solve these problems using an implicit calibration procedure that continually adjusts the position of the active region. For example, if the nose moves beyond the left border of the active region, we shift the active region to the left until the nose is just inside the active region again (see Figure 9.1b). Effectively, users “drag” the active region along as they change their seating position. This procedure works equally well for setting the initial position of the active region: A quick rotation of the head left, right, up, and down is sufficient to put the active region in a suitable position. If desired, the size and initial position of the active region can also be determined during an explicit calibration phase.
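A minimal sketch of this mapping and of the implicit calibration, again in Python; the representation of the active region as a list [x0, y0, x1, y1] and the function names are assumptions made only for illustration.

```python
def update_active_region(nose_xy, active):
    """Shift the active region [x0, y0, x1, y1] so that the nose lies just
    inside it again; effectively the user drags the region along."""
    x, y = nose_xy
    x0, y0, x1, y1 = active
    w, h = x1 - x0, y1 - y0
    if x < x0:
        x0 = x
    elif x > x1:
        x0 = x - w
    if y < y0:
        y0 = y
    elif y > y1:
        y0 = y - h
    return [x0, y0, x0 + w, y0 + h]

def nose_to_cursor(nose_xy, active, screen_w, screen_h):
    """Map the nose position inside the active region to screen coordinates,
    flipping the horizontal axis (a head movement to the left moves the nose
    to the right in the camera image)."""
    x, y = nose_xy
    x0, y0, x1, y1 = active
    u = (x - x0) / (x1 - x0)      # relative position inside the region, 0..1
    v = (y - y0) / (y1 - y0)
    return (1.0 - u) * screen_w, v * screen_h
```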


9.3 Results

The algorithms were implemented in GNU C++ and tested on a 2.8 GHz Pentium 4 under Linux 2.6. The FFTW library (Frigo and Johnson, 2005) was used for the FFT transforms needed to calculate the generalized eccentricity measures. Images were acquired using a MESA SR3000 camera (Oggier et al., 2005a), which has a resolution of 176 by 144 pixels. To evaluate the error rate of the nose detector, we first tested it on a set of static images, consisting of images from 13 subjects, with nine head orientations per subject. The position of the nose was hand-labelled in each image. The data from three subjects was used to estimate the bounding box for the nose detector, which was then tested on the remaining ten subjects. We calculated the detection rate as the percentage of images in which the nose was identified correctly (to within five pixels distance from the hand-labelled position) and the false positive rate as the percentage of images where at least one non-nose pixel was falsely classified as “nose”. We adjusted the softness parameter until the false negative rate (one minus the detection rate) and the false positive rate were equal, thus obtaining an equal error rate (EER). When nose detection was performed on the range and amplitude data individually, we obtained EERs of 0.64 and 0.42, respectively. When nose detection was performed on the combination of both types of data, we obtained an EER of 0.03 (see (Haker et al., 2007) for details). Figure 9.2 shows some examples of detection results. The softness parameter that corresponded to the EER was chosen to set the bounding box for the nose tracker. On the combined range and amplitude data, the tracker ran at a rate of 27 frames per second. This is adequate for the frame rate we run the camera at but could be improved further (for example, we currently use a complex-to-complex FFT for simplicity of implementation, which could be replaced with a real-to-complex FFT). A video showing the nose tracker in action is available on the web at http://www.artts.eu/demonstrations. To evaluate the usefulness of the nose tracker for interaction tasks, we used it to control the alternative text input tool Dasher proposed by Ward and MacKay (2002). Dasher can be controlled using a variety of pointing devices, which are used to “steer” towards the letters to be input. A test subject using the nose tracker with Dasher to input text from an English language novel achieved 12 words per minute (wpm). For comparison, the rate that can be achieved using an eye tracker is between 15 and 25 wpm (Ward and MacKay, 2002).


Figure 9.2: Examples of detection results. The amplitude data (left column) and the range data (right column) are given for four subjects. All pixels identified as nose pixels by our detector are marked in each image; the cross simply highlights the locations.

9.4 Conclusions

We have demonstrated how a very simple classifier based on geometric features can be used to detect a user’s nose robustly in combined range and amplitude data obtained using a TOF camera. A particularly interesting result is that markedly better results were obtained on the combination of range and amplitude data than on either type of data alone. Based on this classifier, we implemented a real-time nose tracker. To demonstrate the usefulness of this tracker for human-computer interaction, we have shown that it can be used effectively to control the alternative text entry tool Dasher.


10 Gesture-Based Interaction

10.1 Introduction

In this chapter, we use gestures recognized using a TOF camera to control a slideshow presentation, similar to (Baudel and Beaudouin-Lafon, 1993) where, however, a data glove was used to recognize the gestures. Another idea we adapt from Baudel and Beaudouin-Lafon (1993) is to recognize only gestures made towards an “active area”; valid gestures made with the hand pointing elsewhere are ignored. This solves the problem (also known as the “immersion syndrome”) that unintentional hand movements or gestures made towards other people may erroneously be interpreted as commands. We expand this idea by allowing the same gesture to mean different things when made towards different active areas. Specifically, the slideshow is controlled in the following way: To go to the next slide, point to the right of the screen and make a thumbs-up gesture with the hand; to go to the previous slide, point to the left of the screen and make a thumbs-up gesture. Point at the screen and a dot appears at the location you are pointing to, allowing you to highlight certain elements of the slide. This scenario is depicted in Figure 10.1. To determine where the user is pointing on the screen, we need to know the position of the screen relative to the camera. This is determined in a calibration procedure where the user points at the four corners of the screen from two different locations; this information is sufficient to compute the position of the screen. After calibration the user is allowed to move freely within the field of view of the camera, as the system estimates both the screen position and the pointing direction in 3D with respect to the camera coordinate system.

Parts of this chapter are joint work with others. I devised and implemented the concept of deictic gestures. Martin Böhme and I contributed approximately equally to the evaluation of the accuracy of the pointing gesture. The work described here has previously been published in (Haker et al., 2009a).


Figure 10.1: The application scenario where a user controls a slideshow presentation using deictic gestures. The gestures include switching between the slides and pointing at the screen using a virtual laser pointer.

The “thumbs-up” gesture is recognized using a simple heuristic on the silhouette of the hand. This simple technique is sufficient because hand gestures are only recognized when the user is pointing at one of the two active regions; when pointing elsewhere, the user need not be concerned that hand movements might be misinterpreted as gestures. One important advantage of the TOF camera in this setting is that it directly measures the three-dimensional position of objects in space, so that the pointing direction can easily and robustly be obtained as a vector in space. This is much more difficult for approaches that attempt to infer pointing direction using a single conventional camera. One solution is to restrict oneself to pointing directions within the camera plane (see e.g. (Hofemann et al., 2004; Moeslund and Nøregaard, 2006)), but this places restrictions on the camera position and the type of gestures that can be recognized. A physical arm model with kinematic constraints (see e.g. (Moeslund and Granum, 2003)) allows arm pose to be estimated from a single camera image, but the depth estimation can be unreliable for some poses of the arm. In contrast, the approach we will present here is simple but at the same time accurate and robust. In the remainder of the chapter we will first discuss the detection of the pointing and the “thumbs-up” gesture. We will then describe the calibration of the system. Finally, the accuracy of the virtual laser pointer will be evaluated in an experimental setup where users had to point at given targets. This evaluation is conducted in two scenarios: one where the user does not receive visual feedback, and another where the estimated pointing position is indicated by the virtual laser pointer.


10.2 Method

Our method can be divided into three individual components: (i) the detection and interpretation of the pointing gesture, (ii) the recognition of the thumbs-up gesture used for navigating between slides, and (iii) the calibration of the system. This section will cover each component in the order mentioned above. For simplicity, we assume throughout this section that the user always points towards the left as seen from the camera, although this is not a restriction of the system.

10.2.1 Pointing Gesture

The algorithm for detecting pointing gestures can be subdivided into four main stages. The first stage segments the person in front of the camera from the background. The second stage uses the segmented image to identify both the head and the extended hand that is used for pointing. During the third stage, the 3D coordinates of head and hand in space are estimated, which are then used to determine the location on the screen the user is pointing to during the fourth stage. In the following, we will discuss each step of this procedure individually in more detail.

Stage 1: The segmentation of the person in front of the camera uses combined information from both the range and intensity data of the TOF camera. Previous work by Haker et al. (2007, 2009b) has shown that the combined use of both range and intensity data can significantly improve results in a number of different computer vision tasks. We determine adaptive thresholds for range and intensity based on histograms as described in Section 5.1.2. The final segmented image is composed of those pixels that were classified as foreground pixels with respect to both types of data. To ensure that only a single object is considered, only the largest connected component of foreground pixels is retained; all other objects are considered background. A sample TOF image showing both range and intensity with the resulting segmented foreground is given in Figure 10.2.
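The following Python/NumPy sketch illustrates this stage under the assumption that the two thresholds have already been derived from the histograms of Section 5.1.2 and that a foreground pixel is closer than the range threshold and brighter than the intensity threshold; the threshold directions and the function name are illustrative assumptions.

```python
import numpy as np
from scipy import ndimage

def segment_user(range_img, amp_img, range_thresh, amp_thresh):
    """Combine range and amplitude thresholds (assumed: foreground pixels are
    closer than range_thresh and brighter than amp_thresh) and keep only the
    largest connected component of foreground pixels."""
    fg = (range_img < range_thresh) & (amp_img > amp_thresh)
    labels, n = ndimage.label(fg)
    if n == 0:
        return fg                                     # nothing segmented
    sizes = ndimage.sum(fg, labels, index=np.arange(1, n + 1))
    return labels == (1 + int(np.argmax(sizes)))      # largest component only
```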

Stage 2: In the second stage, the segmented image is used to determine an initial guess for the location of the head and hand in the image. We employ a simple heuristic based on the number of foreground pixels in each column of the segmented image.


Figure 10.2: Sample image taken with a MESA SR4000 TOF camera. The leftmost image shows the intensity data. The range image is given in the center, and the resulting segmentation is shown on the right.

The initial guess for the hand is the topmost pixel in the leftmost pixel column of the silhouette; the head is the topmost pixel in the tallest pixel column. This procedure is extremely simple to implement, yet reliable. We use a single parameter θ to determine whether we have a valid initial estimate, i.e. whether the hand is actually extended and the person is performing a pointing gesture:

$$|i_{\text{head}} - i_{\text{hand}}| \geq \theta \tag{10.1}$$

Here, $i_{\text{head}}$ and $i_{\text{hand}}$ denote the indices of the pixel columns corresponding to the initial guess for the head and hand, respectively, where indices of pixel columns increase from left to right.
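A compact sketch of this stage, assuming as above that the user points to the left as seen from the camera; the function name and return convention are illustrative.

```python
import numpy as np

def head_and_hand_columns(fg, theta):
    """Column-profile heuristic on the segmented image fg (boolean array):
    the hand is the topmost foreground pixel of the leftmost occupied column,
    the head the topmost pixel of the tallest column. A pointing gesture is
    only assumed if the two columns are at least theta apart (Eq. 10.1)."""
    counts = fg.sum(axis=0)                    # foreground pixels per column
    occupied = np.nonzero(counts)[0]
    if occupied.size == 0:
        return None
    i_hand = int(occupied[0])                  # leftmost occupied column
    i_head = int(np.argmax(counts))            # tallest column
    if abs(i_head - i_hand) < theta:
        return None                            # hand not extended
    head = (int(np.nonzero(fg[:, i_head])[0][0]), i_head)   # (row, column)
    hand = (int(np.nonzero(fg[:, i_hand])[0][0]), i_hand)
    return head, hand
```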

Stage 3: During the third stage of the method, the initial guesses are refined to more accurate pixel positions in the image. Once these positions are determined, the corresponding range values are estimated, and finally the coordinates of both the head and hand can be computed in 3D by inverting the perspective camera projection using the known intrinsic camera parameters. In order to refine the pixel positions of the head and hand in the image, we define rectangular regions of interest (ROIs) around their estimated locations from the previous stage and compute the centroids of the foreground pixels in the ROIs to find the centers of the head and hand blobs; these refined positions are marked by crosses in Figure 10.3. To invert the camera projection we require the actual distance of the head and hand from the camera. Again, we define ROIs around the estimated pixel coordinates and take the average range value of the foreground pixels within the ROI to obtain estimates for the two range values.


Figure 10.3: Segmented image of the user with the detected locations of the head and hand marked by crosses. The time-of-flight camera measures the three-dimensional positions of these points, which are then used to compute the pointing direction.

Finally, from the pixel coordinates (x, y), the distance r from the camera, and the intrinsic parameters of the camera, one can infer the 3D coordinates of the corresponding point ⃗x in camera coordinates using the following formula:

$$\vec{x} = r \cdot \frac{\left((c_x - x) \cdot s_x,\; (c_y - y) \cdot s_y,\; f\right)^T}{\left\|\left((c_x - x) \cdot s_x,\; (c_y - y) \cdot s_y,\; f\right)^T\right\|_2} \tag{10.2}$$

Here, (cx, cy) denotes the principal point, i.e. the pixel coordinates of the point where the optical axis intersects the image sensor. The width and height of a pixel are defined by sx and sy, and the focal length is given by f. To obtain a more stable estimate, a Kalman filter (Kalman, 1960) tracks the 3D coordinates of the head and hand from frame to frame.
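A sketch of the back-projection of Eq. (10.2) in Python/NumPy; the intrinsic parameters are passed explicitly and their names mirror the notation above.

```python
import numpy as np

def backproject(x, y, r, cx, cy, sx, sy, f):
    """Invert the perspective projection (Eq. 10.2): a pixel (x, y) at measured
    distance r from the camera is mapped to a 3D point in camera coordinates."""
    v = np.array([(cx - x) * sx, (cy - y) * sy, f])
    return r * v / np.linalg.norm(v)
```

In the running system this point estimate would additionally be smoothed by the Kalman filter mentioned above.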

Stage 4: Because the TOF camera allows us to determine the position of the head and hand in space, we directly obtain an estimate for the pointing direction from the ray that emanates from the head and passes through the hand. As Nickel and Stiefelhagen (2003) show, the line connecting the head and hand is a good estimate for the pointing direction. This ray can be represented in camera coordinates by the following line equation:

$$\vec{r} = \vec{o} + \lambda \vec{d} \tag{10.3}$$

Here, ⃗o denotes the origin of the ray and corresponds to the 3D position of the head. The direction of the ray is given by ⃗d = ⃗p − ⃗o, where ⃗p denotes the position of the hand. The parameter λ ≥ 0 defines a point ⃗r in front of the person along the pointing direction.

We now intersect this ray with the screen used for projecting the slides. To this end we represent the screen by its center ⃗c and the normal ⃗n of the screen plane. Assuming that both ⃗c and ⃗n are also given in camera coordinates, the intersection ⃗x of the ray and the screen is given by:

$$\vec{x} = \vec{o} + \frac{\langle \vec{c} - \vec{o}, \vec{n} \rangle}{\langle \vec{d}, \vec{n} \rangle} \, \vec{d} \tag{10.4}$$

The intersection is only valid if the scalar product ⟨⃗c − ⃗o, ⃗n⟩ is positive; otherwise the user is most likely pointing away from the screen plane.

What remains to be determined is whether the intersection lies within the limits of the screen. In that case, the intersection can be converted to pixel coordinates on the screen in order to display the virtual laser pointer.
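A minimal sketch of the ray-plane intersection of Eqs. (10.3) and (10.4); the tolerance used to reject rays running parallel to the screen plane is an assumption.

```python
import numpy as np

def screen_intersection(head, hand, screen_center, screen_normal):
    """Intersect the pointing ray o + lambda*d (origin = head, direction =
    hand - head, Eq. 10.3) with the screen plane (Eq. 10.4). Returns None if
    the user points away from the plane."""
    o = np.asarray(head, dtype=float)
    d = np.asarray(hand, dtype=float) - o
    num = np.dot(np.asarray(screen_center) - o, screen_normal)
    den = np.dot(d, screen_normal)
    if num <= 0.0 or abs(den) < 1e-9:   # pointing away from or parallel to the plane
        return None
    return o + (num / den) * d
```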

The location and size of the screen are determined by the calibration procedure introduced in Section 10.2.3. Since the procedure determines the 3D position of the four corners of the screen independently, the screen is generally not represented by a perfect rectangle. Thus, we determine the intersection by considering two triangles that are obtained by dividing the screen diagonally along the line from the bottom left corner to the top right corner. Assume that the triangles are defined by their three corners ⃗a, ⃗b, and ⃗c in counter-clockwise order such that either the top left or the bottom right corner is represented by ⃗a. For both triangles one can solve the following equation under the constraint that d1 = 1:

$$\vec{x} = \begin{pmatrix} a_1 & b_1 - a_1 & c_1 - a_1 \\ a_2 & b_2 - a_2 & c_2 - a_2 \\ a_3 & b_3 - a_3 & c_3 - a_3 \end{pmatrix} \begin{pmatrix} d_1 \\ d_2 \\ d_3 \end{pmatrix} \tag{10.5}$$

Intuitively, we check if the intersection ⃗x, represented as a linear combination of the two sides of the triangle given by ⃗b − ⃗a and ⃗c − ⃗a, lies within the bounds of the triangle. Thus, if d2 + d3 ≤ 1 holds for the upper triangle, the intersection ⃗x lies above the diagonal through the screen. Correspondingly, ⃗x lies below the diagonal if d2 + d3 ≤ 1 holds for the lower triangle.

We now convert the coefficients d2 and d3 to coordinates x and y on the screen in such a way that the top left corner corresponds to (x, y) = (0, 0) and the bottom right corner corresponds to (x, y) = (1, 1). This is achieved by setting (x, y) = (d2, d3) if ⃗x was above the diagonal through the screen and setting (x, y) = (1 − d2, 1 − d3) otherwise. As a result one obtains, for example, the four different interpretations of the pointing gesture listed in Table 10.1.


Table 10.1: Interpretation of pointing gesture

on screen          0.0 ≤ x ≤ 1.0    ∧   0.0 ≤ y ≤ 1.0
left of screen    −0.05 ≤ x < 0.0   ∧   0.0 ≤ y ≤ 1.0
right of screen    1.0 < x ≤ 1.05   ∧   0.0 ≤ y ≤ 1.0
off screen         otherwise

These interpretations correspond to the scenario depicted in Figure 10.1. In the “on screen” case, the virtual laser pointer is displayed on the screen at the location (x, y) the user is pointing to. If the user is pointing to one of the two active areas “left of screen” or “right of screen”, a small triangle is displayed at the corresponding edge of the screen to indicate that the system is now expecting input in the form of the “thumbs-up” gesture to navigate between the slides of the presentation. In all other cases, any detected pointing gesture is ignored, which avoids the so-called immersion syndrome (Baudel and Beaudouin-Lafon, 1993). Although the estimation of the head and hand is quite robust and we apply a Kalman filter to the approximated 3D coordinates for temporal smoothing, the estimated intersection of the pointing direction and the screen in the “on screen” case is not entirely free of noise. This is dealt with by applying a smoothing filter with an exponential impulse response. The strength of the smoothing is adaptive and depends on the amount by which the pointing position changed: The greater the change, the less smoothing is applied. In this way, we suppress “jitter” in the virtual laser pointer when the user’s hand is stationary but allow the pointer to follow large hand movements without the lag that would be caused by a non-adaptive smoothing filter.
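The adaptive smoothing can be sketched as a recursive filter whose blending factor grows with the distance the pointer moved; the exact mapping from movement to smoothing strength (here a simple linear ramp controlled by scale) is an assumption, as the thesis does not specify it.

```python
import numpy as np

class AdaptivePointerFilter:
    """Exponential smoothing of the on-screen pointer position whose strength
    depends on how far the position moved: small movements are smoothed
    strongly (suppressing jitter), large movements pass through almost
    unfiltered."""

    def __init__(self, scale=50.0):
        self.scale = scale        # movement (pixels) at which smoothing fades out
        self.state = None

    def update(self, pos):
        pos = np.asarray(pos, dtype=float)
        if self.state is None:
            self.state = pos
            return pos
        dist = np.linalg.norm(pos - self.state)
        alpha = min(1.0, dist / self.scale)   # larger movement -> less smoothing
        self.state = (1.0 - alpha) * self.state + alpha * pos
        return self.state
```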

10.2.2 Thumbs-up Gesture

The detection of the thumbs-up gesture is only triggered when a pointing gesture made towards one of the two active areas was detected for the current frame according to the procedure described above. The thumbs-up detector uses the segmented image and the pixel coordinates that were estimated for the hand. The main idea of the algorithm is that the silhouette of an extended thumb that points upwards is significantly narrower along the horizontal axis than a fist. Thus, we define an ROI around the position of the hand and count the number of foreground pixels in each row.


Next, we estimate wfist, which denotes the width of the fist, by taking the maximum number of foreground pixels counted per row.

The parameter wfist is then used to determine the presence of both the fist and the thumb. We count the number cfist of rows containing at least 0.8 · wfist foreground pixels and the number cthumb of rows containing at least one and at most 0.3 · wfist foreground pixels. A thumb is detected in the current frame if both cfist and cthumb exceed a threshold of two. Due to the fact that the thresholds for detecting the fist and thumb depend on the estimated width of the fist wfist in the current image, the procedure is relatively independent of the distance of the user from the camera, i.e. the algorithm is scale-invariant. To avoid misdetections due to noise, we keep track of the detections of the thumb per frame over a certain time window, i.e. the command for switching to the next slide is only issued if the thumbs-up gesture was detected in four out of six consecutive frames. At the same time, we want to avoid multiple activations of the command for switching to the next slide if the above criterion is fulfilled in a number of consecutive frames. Otherwise, the user would not be able to go from one slide to the next in a controlled fashion without unintentionally skipping slides. Thus, we ignore any detections of the gesture for a total of 50 frames once a switch-to-next-slide command was issued. Since our system operates at roughly 25 Hz, a new command can only be issued every two seconds. This gives the user sufficient time to end the thumbs-up gesture once the gesture takes effect in order to prevent the system from switching directly to the next slide.
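The per-frame part of this heuristic translates into a few lines of Python/NumPy; the ROI half-size of 25 pixels is a guessed value, and the temporal debouncing (four detections out of six frames, followed by a 50-frame lockout) would be layered on top of this function.

```python
import numpy as np

def thumbs_up(fg, hand_rc, roi=25):
    """Per-frame thumbs-up test on the segmented image fg around the hand
    position hand_rc = (row, col): w_fist is the widest row of the ROI, rows
    at least 0.8*w_fist wide count towards the fist, rows between 1 and
    0.3*w_fist pixels wide towards the thumb; both counts must exceed two."""
    r, c = hand_rc
    h, w = fg.shape
    window = fg[max(r - roi, 0):min(r + roi, h), max(c - roi, 0):min(c + roi, w)]
    widths = window.sum(axis=1)              # foreground pixels per row
    w_fist = widths.max() if widths.size else 0
    if w_fist == 0:
        return False
    c_fist = int(np.sum(widths >= 0.8 * w_fist))
    c_thumb = int(np.sum((widths >= 1) & (widths <= 0.3 * w_fist)))
    return c_fist > 2 and c_thumb > 2
```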

10.2.3 System Calibration

The third component of our method deals with the calibration of the system. To determine where the user is pointing on the screen, we need to know the position of the screen relative to the camera. This is determined in a calibration procedure where the user points at the four corners of the screen from two different locations; this information is sufficient to compute the position of the screen, as we will demonstrate in more detail in the following. During calibration the user is asked to point continuously at one of the four corners of the screen for a total of 50 frames. This allows us to obtain a robust estimate for the position of the head and hand for the given pointing direction. Again, the pointing direction can be represented by a ray ⃗r = ⃗o + λ⃗d that emanates from the head and passes through the hand. We can assume that this ray passes through the corner the user was pointing at. However, we do not know the exact location of the corner along the ray.

To obtain this information, the user is asked to move to a different position in the field of view of the camera and to point again at the same corner for a total of 50 frames. By this procedure we estimate a second ray that should also pass through the corner of the screen. Ideally, the two rays intersect at the position of the corner; however, this assumption does not generally hold due to measurement noise. Nevertheless, a good estimate for the position of the corner can be obtained from the point that is closest to both rays in 3D space. Assuming that the two estimated pointing directions are represented by rays

$\vec{r}_i = \vec{o}_i + \lambda_i \vec{d}_i$ with $i \in \{1, 2\}$, one can obtain this point by minimizing the squared distance $\|\vec{o}_1 + \lambda_1 \vec{d}_1 - \vec{o}_2 - \lambda_2 \vec{d}_2\|_2^2$ between the two rays with respect to $\lambda_1$ and $\lambda_2$. This leads to the following linear system of equations, where we assume $\|\vec{d}_i\|_2 = 1$ without loss of generality:

$$\begin{pmatrix} 1 & -\langle \vec{d}_1, \vec{d}_2 \rangle \\ \langle \vec{d}_1, \vec{d}_2 \rangle & -1 \end{pmatrix} \begin{pmatrix} \lambda_1 \\ \lambda_2 \end{pmatrix} = \begin{pmatrix} -\langle \vec{o}_1 - \vec{o}_2, \vec{d}_1 \rangle \\ -\langle \vec{o}_1 - \vec{o}_2, \vec{d}_2 \rangle \end{pmatrix} \tag{10.6}$$

Solving Equation (10.6) yields the parameters λ1 and λ2, which specify the closest point on one ray with respect to the other ray, respectively. Taking the arithmetic mean of both solutions as specified by Equation (10.7) yields the approximation of the intersection of both rays and, thus, an estimate for the position of the corner in camera coordinates:

$$\vec{x} = 0.5 \cdot \left( \vec{o}_1 + \lambda_1 \vec{d}_1 + \vec{o}_2 + \lambda_2 \vec{d}_2 \right) \tag{10.7}$$
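A sketch of Eqs. (10.6) and (10.7) in Python/NumPy; the function name is illustrative and no special handling of (nearly) parallel rays is included.

```python
import numpy as np

def ray_midpoint(o1, d1, o2, d2):
    """Approximate intersection of two pointing rays r_i = o_i + lambda_i*d_i:
    solve the 2x2 system (10.6) for the parameters of the mutually closest
    points and return their mean (Eq. 10.7)."""
    d1 = d1 / np.linalg.norm(d1)
    d2 = d2 / np.linalg.norm(d2)
    d12 = np.dot(d1, d2)
    A = np.array([[1.0, -d12],
                  [d12, -1.0]])
    b = np.array([-np.dot(o1 - o2, d1),
                  -np.dot(o1 - o2, d2)])
    lam1, lam2 = np.linalg.solve(A, b)
    return 0.5 * ((o1 + lam1 * d1) + (o2 + lam2 * d2))
```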

This procedure can be repeated for the remaining three corners of the screen. The approach does not guarantee, however, that all four corners lie in one plane. Thus, we fit a plane through the four corners by least squares and project the corners onto this plane to obtain their final estimates. The normal to this plane and the four corners are used to determine where the user is pointing on the screen, as described in Section 10.2.1. Obviously, this calibration procedure does not generally yield a screen that resembles a perfect rectangle in 3D space. How this problem can be treated by dividing the screen into two triangles along its diagonal was discussed in Section 10.2.1. We consider the fact that the screen is not forced to be rectangular as an advantage, because it provides an implicit way of correcting systematic errors. Such errors may for example be caused by measurement errors or a simplified approximation of the camera parameters.

The former can, for example, be due to multiple reflections in the scene (Gudmundsson et al., 2007b). The latter can occur if the optical system is not modelled accurately in the process of inverting the camera projection, e.g. if radial distortions of the lens or other effects are not taken into account.

10.3 Results

The method was implemented in C++ under the Windows operating system. On a 2 GHz Intel Core 2 Duo, it requires 40 ms per frame, achieving a frame rate of 25 frames per second. To assess the accuracy with which the pointing direction is measured, we performed a test with 10 users. Each user was first given a few minutes to practice using the system. We then presented a sequence of nine targets at predefined positions on the screen; users were instructed to point at a target as soon as it appeared. Once a pointing gesture towards the screen was detected, each target was presented for a total of 50 frames, which corresponds to a time interval of roughly two seconds, before it disappeared. Users were asked to return to a normal standing position after the target had disappeared. Before presenting the next target, the system waited for four seconds to allow the user to rest the arm. The order in which the targets appeared was chosen in such a way that the average distance between successive targets on the screen was maximized. For each user, we performed this test under two different conditions: Under the first condition, the virtual laser pointer was switched off, i.e. the users did not receive any feedback about the measured pointing direction. This gives an impression of the overall accuracy of the system. For the second test condition, the virtual laser pointer was switched on, allowing users to compensate for systematic calibration and measurement errors. This test condition therefore gives an impression of the residual measurement noise after the temporal smoothing described in Section 10.2.1.

Figure 10.4a shows the results of the first test condition (without visual feedback) to assess the overall accuracy of the system. Here, the measured error in pointing direction can have two sources: (i) systematic errors due to measurement noise and inaccurate calibration and (ii) errors induced by the assumption that the ray emanating from the head across the hand corresponds to the natural human pointing direction (Kranstedt et al., 2006). The horizontal axis plots the frame number after the pointing gesture was detected, and the vertical axis plots the distance between the target and the measured pointing position in pixels. In the test setup the screen had a size of 1.71 m × 1.29 m and the user was standing at a distance of roughly 3.2 m from the screen. As a result, an offset of 20 pixels corresponds to approximately one degree. The solid line gives the mean distance, averaged over all users and pointing targets, and the shaded area indicates an interval of two standard deviations above and below the mean, i.e. 95% of the errors fell within this range. From the plot, we can see that users took around 10 frames or 400 ms to point at a target; after this time, the average error stabilizes at around 106 pixels (or 5.3 degrees), with 95% of the errors falling between 0 and 271 pixels.

Figure 10.4b shows the results of the second test condition (with visual feedback). As expected, the error stabilizes at a lower value of 23 pixels (or 1.2 degrees) on average but also takes longer to do so – around 1600 ms. This is because, after pointing at the target as in the first test, users need to correct their hand position to compensate for the systematic measurement error and bring the virtual laser pointer onto the target.



Figure 10.4: Measurement error in pointing direction (a) without visual feedback (the virtual laser pointer was switched off) and (b) with visual feedback (virtual laser pointer switched on). The horizontal axis plots the frame number after the target appeared, and the vertical axis plots the distance between the target and the measured pointing position in pixels. In our setup, an offset of 20 pixels corresponds to approximately one degree.


Figure 10.5: Measurement error in pointing direction (a) without visual feedback (the virtual laser pointer was switched off) and (b) with visual feedback (virtual laser pointer switched on). Only targets near the center of the screen were considered.

A closer look at the data reveals that the largest measurement errors occur for targets that are close to the corners of the screen. This is mainly due to shortcomings of the rather simple calibration procedure. However, the system is well calibrated for the center of the screen, where most of the content of a presentation is likely to be placed. Thus, the impact of calibration errors near the corners on the usability is rather low. This becomes clear by looking at Figure 10.5a and Figure 10.5b. Again, the plots show the distance between the target and the measured pointing position without and with visual feedback, respectively. This time, however, only targets near the center were considered, i.e. targets with an offset from the center of the screen corresponding to less than 15 degrees. In the first test condition without visual feedback the average error amounts to 91 pixels (4.6 degrees). In the second test condition with visual feedback the average error was halved to 12 pixels, which corresponds to 0.6 degrees. Note also that the standard deviation decreased significantly to 17 pixels (0.9 degrees) in the test with visual feedback, which indicates that the system is very robust and hence intuitive to use near the center of the screen.

10.4 Discussion

We have presented a framework that implements simple and robust gesture recognition in the context of a slideshow presentation. The system is based on a TOF camera that allows us to detect and interpret pointing gestures in an intuitive and effective way, because the provided range data facilitates the localization of the user in front of the camera and allows the estimation of the pointing direction in 3D space once the head and hand have been identified. We believe pointing is a powerful way to determine whether a gesture is intended for the system at all and to assign different meanings to the same gesture depending on where the user is pointing. Because the meaning of the gesture is strongly tied to the direction it is made in, we avoid the immersion syndrome and thus greatly increase the usability of the system. Thus, we have developed an intuitive system that allows the user to control a slideshow by switching between slides through a simple thumbs-up gesture that is made towards one of the sides of the screen. Alternatively, we could have implemented an additional thumbs-down gesture using a single active area, but our intention here was to demonstrate the use of multiple active areas: the number of active areas multiplies the number of actions that can be triggered with a given set of gestures.

Finally, the user may highlight certain details on the slides simply by pointing at them; a virtual laser pointer is displayed at the location the user is pointing to. This virtual laser pointer has three advantages: (i) the user is not required to hold an additional physical device, (ii) its size and appearance can be chosen depending on the context, and (iii) the built-in smoothing of the pointer can mask a tremor of the user's hand that may originate from an impairment or excitement.


11 Generation of Depth of Field Based on Range Maps

11.1 Introduction

In this chapter, we present a method to synthesize the effect of depth of field in digital color images. In photography, depth of field describes the portion of the scene that appears sharp in the image. The extent of the depth of field depends on the optics of the imaging system. While a lens can focus objects precisely only at a specific distance, the sharpness will decrease gradually for objects that are moved away from the focused distance. The amount of decrease in sharpness depends on a number of parameters, such as the aperture of the lens. For example, a pinhole camera has infinite depth of field. Depth of field is one of the most important stylistic devices in photography. As objects within the depth of field appear sharp in the image while objects at other distances become blurred, the objects in focus can be accentuated in the image and attention can thus be directed towards them. However, we can observe a trend that digital compact cameras become smaller and smaller in size. From a technical perspective, this miniaturization implies that the depth of field for such cameras becomes larger and larger and images often appear completely sharp. Thus, the stylistic device of depth of field is available only in a rather limited form to photographers using compact cameras. The proposed method is targeted at photographers who want to add the effect of depth of field in a post-processing step. A digital compact camera that is also equipped with a range sensor, such as a TOF sensor, can capture a range map of the imaged scene. The resulting range map can then be used to generate the effect of depth of field.

This task was first addressed by Potmesil and Chakravarty (1981, 1982), who proposed the algorithm known as the Forward-Mapped Z-Buffer. The main concept is to use the range map to estimate the so-called circle of confusion for each pixel. The concept of the circle of confusion is depicted in Figure 11.1. In analogy to the circle of confusion, one can define the set of points on the object surface that are mapped by the optical system in such a way that they contribute to a certain pixel. Then, the pixel value of the post-processed image is computed as a linear mixture of pixel values belonging to this set. For a given pixel, Potmesil et al. approximate this set using the circle of confusion of the pixel itself. They always assume a circular circle of confusion whose size is chosen based on the range value of the considered pixel. This assumption is violated if partial occlusions occur, i.e. at the borders of an object. The theoretical solution to this problem was introduced by Cook et al. (1984) under the name of Distributed Raytracing. For each pixel a number of rays are sent into the scene to sample the circle of confusion. The algorithm determines where these rays intersect with the scene, and the post-processed pixel is composed of a linear mixture of intensities at these intersections.

Parts of this chapter are joint work with Michael Glodek, who conducted his diploma thesis (Glodek, 2009) under my supervision. In the scope of his thesis Michael Glodek implemented and evaluated the proposed method.



Figure 11.1: The circle of confusion. An optical system with a focal length f and an aperture D images a point at distance g. The camera constant fc is chosen such that the point appears in focus on the sensor. A second point at a distance g′ is also in the field of view, but its image is spread over the circle of confusion with diameter C. The depth of field is characterized by the interval of distances g for which the circle of confusion is sufficiently small for objects to be perceived as sharp in the image. For large apertures this interval becomes smaller and the effect stronger.

This approach is only applicable if a complete 3D model of the scene is available, which is rarely the case in real-life scenarios. In addition to Potmesil, a number of authors proposed approximations of Distributed Raytracing. Haeberli and Akeley (1990) suggest the so-called Accumulation Buffer to simulate depth of field. They use a model of a pinhole camera to render the scene from different perspectives and generate the final image as a mixture of the different views. While the results are similar to Distributed Raytracing, the method can be implemented more efficiently. Variations of the Forward-Mapped Z-Buffer algorithm by Potmesil et al. were proposed by Chen (1987) and Rokita (1993, 1996). These methods aim at a reduction of the processing time. A variant of the algorithm that addresses the problem of partial occlusions was proposed by Shinya (1994). A very different approach is known as Layered Depth of Field (Scofield, 1992). The method creates different depth layers using the range map, and each layer is blurred according to its distance from the camera. The final image is created by combining the different layers. A disadvantage of the method is that the amount of blur introduced is not continuous but is discretized by the number of utilized layers. Propositions to circumvent the resulting artefacts were made by Barsky et al. (2005) and Kraus and Strengert (2007). The Reverse-Mapped Z-Buffer Technique was proposed by Demers (2003). The algorithm is optimized for an implementation on a graphics card and yields results that are similar to the Forward-Mapped Z-Buffer and are obtained at very high frame rates. An approach based on the heat equation computes the depth of field using Anisotropic Diffusion (Bertalmío et al., 2004). The heat equation is used to avoid artefacts at discontinuities in the range map that are due to partial occlusions. While the algorithm can be computed efficiently, it is not very realistic because the circle of confusion is approximated by a Gaussian and the depth of field cannot be tuned with respect to a specific aperture.

In this work, we introduce an algorithm called 2½D Distributed Raytracing that uses the TOF range map to simulate the depth of field. Based on the information about the distance of the object that is depicted at a pixel, we determine how sharp the image should be at that pixel location with respect to the desired depth of field. As TOF cameras usually have a significantly lower resolution than standard color imaging sensors, we also rely on a method that aligns the high resolution color image with the low resolution range map based on a Markov random field.

Cook et al. (1984) already noticed that a camera with a small aperture, which is generally the case for compact cameras and the reason why they lack the effect of depth of field, cannot capture all the information that is required to generate realistic depth of field. Going by the example of the pinhole camera, the reason for this is that the scene is rendered from only a single viewpoint. Cameras with larger apertures have more viewpoints, depending on where the corresponding light ray from the object enters the lens. This lack of information leads to artefacts in the synthesis of depth of field. We propose a heuristic that extends the 2½D Distributed Raytracing algorithm to attenuate these artefacts. The remainder of this chapter is organized as follows: In Section 11.2.1 we will discuss the procedure to upsample the range map and align it with the high resolution color image. Section 11.2.2 will discuss preprocessing steps of the image data to obtain more natural results. The 2½D Distributed Raytracing algorithm and a heuristic to improve the visual outcome will be introduced in Section 11.2.3 and Section 11.2.4. Finally, we will demonstrate results on both artificial and real data in Section 11.3.

11.2 Method

11.2.1 Upsampling the TOF range map

Before the generation of depth of field in a color image, it is important to preprocess the range map. This has two objectives: (i) We require a range value for every pixel in the color image to determine the extension of the circle of confusion and (ii) the range map must be properly aligned with the color image, i.e. edges in the range map must coincide with edges in the color image. Otherwise, parts of the image that should be in focus appear blurred or vice verse. This can lead to disturbing artefacts; especially near object borders. An approach to this problem was introduced by Andreasson et al. (2006) and is referred to as Nearest Range Reading Considering Color. For a given pixel in the color image, one selects the range value in a corresponding neighborhood that max- imizes a certain likelihood function. An alternative procedure is the Multi-Linear Interpolation Considering Color which was also suggested by Andreasson et al. (2006). The interpolated range value

We employ a method by Diebel and Thrun (2006) that is based on a Markov random field. The Markov random field is organized in three layers; two layers represent the color image and the measured range map. A third layer represents the upsampled range map, and it is optimized with respect to the known data. The Markov random field is composed of two potentials. The first potential – the depth measurement potential – aims at reconstructing the measured range samples and takes the following form:

$$\Psi = \sum_{i \in L} k \, (y_i - z_i)^2 \tag{11.1}$$

Here, the set L denotes the indices of pixels for which a range measurement zi is provided by the TOF range map, and yi denotes a range value in the upsampled range map. The parameter k depends on the variance of zi with respect to the true 3D scene. The second potential – the depth smoothness prior – aims at creating a smooth range map:

$$\Phi = \sum_i \sum_{j \in N(i)} w_{ij} \, (y_i - y_j)^2 \tag{11.2}$$

Here, N(i) denotes the set of direct neighbors of the pixel i, and the deviation of neighboring range values is penalized. The weights wij are influenced only by the color image:

$$w_{ij} = \exp(-c \, u_{ij}) \tag{11.3}$$

$$u_{ij} = \|\vec{x}_i - \vec{x}_j\|_2^2 \tag{11.4}$$

Here, ⃗xi and ⃗xj denote the RGB color values of neighboring pixels. Deviations in the range map get a lower weight if the neighboring pixels differ with respect to color, i.e. if the two pixels lie across an edge in the color image. The parameter c denotes a constant that quantifies the amount of smoothing we allow across edges in the image. Given these two potentials, we can now define the Markov random field. The conditional distribution over the target range values y is given by:

$$p(y \mid \vec{x}, z) = \frac{1}{Z} \exp\!\left( -\frac{1}{2} (\Psi + \Phi) \right) \tag{11.5}$$

where Z is a normalization constant.


The computation of the full posterior is intractable for high resolution images. Thus, Diebel and Thrun (2006) propose to compute the solution via a conjugate gradient descent on the log-posterior.
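The sketch below minimizes the energy Ψ + Φ of Eqs. (11.1)–(11.2) with plain gradient descent instead of the conjugate gradient method used by Diebel and Thrun, only to keep the example short; the parameter values, the initialization, and counting each neighboring pair once are illustrative choices.

```python
import numpy as np

def upsample_range_map(z, mask, color, k=1.0, c=10.0, steps=500, lr=0.1):
    """Minimise Psi + Phi (Eqs. 11.1-11.2) by plain gradient descent.
    z     -- range map on the target grid, valid only where mask is True
    color -- RGB image (values in [0, 1]) at the target resolution
    Each neighbouring pixel pair enters the smoothness term once."""
    y = np.where(mask, z, z[mask].mean()).astype(float)    # crude initialisation
    def edge_w(a, b):                                      # Eqs. (11.3)-(11.4)
        return np.exp(-c * np.sum((a - b) ** 2, axis=-1))
    w_right = edge_w(color[:, :-1], color[:, 1:])
    w_down = edge_w(color[:-1, :], color[1:, :])
    for _ in range(steps):
        grad = np.zeros_like(y)
        grad[mask] += 2.0 * k * (y[mask] - z[mask])        # data term (Eq. 11.1)
        dx = y[:, :-1] - y[:, 1:]                          # smoothness term (Eq. 11.2)
        dy = y[:-1, :] - y[1:, :]
        grad[:, :-1] += 2.0 * w_right * dx
        grad[:, 1:] -= 2.0 * w_right * dx
        grad[:-1, :] += 2.0 * w_down * dy
        grad[1:, :] -= 2.0 * w_down * dy
        y -= lr * grad
    return y
```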

11.2.2 Preprocessing the Image Data

Before we can synthesize the effect of depth of field, two preprocessing steps are required to obtain optimal results: (i) The inversion of the gamma correction of the utilized file format and (ii) an increase of the contrast ratio.

An inversion of the gamma correction is required for certain file formats, such as JPEG. As almost all displays have a non-linear tonal response curve, such file formats store the gamma-corrected image data with standard correction parameters, so that no further correction is needed before the image is displayed. Since we assume a linear relation between pixel intensities, we need to invert the gamma correction prior to the synthesis of depth of field.

The second preprocessing step is due to the fact that most formats store low dynamic range images, i.e. the maximal contrast between low and high pixel intensities is limited to optimize data compression. As a result, high intensity values are often clipped. The blur induced by the depth of field effect spreads the intensity of pixels that are out of focus over a larger area in the output image. Thus, clipped pixels are likely to appear darker than they should because their intensities were already artificially reduced through the low dynamic range. The optimal solution would be to use high dynamic range images. As these are often not available, one can simulate high dynamic range by increasing the values of pixels at the maximal intensity.
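A sketch of both preprocessing steps for an 8-bit image; the plain power-law decoding (instead of the exact piecewise sRGB curve) and the highlight boost factor of three (taken from the caption of Figure 11.2) are simplifying assumptions.

```python
import numpy as np

def prepare_for_dof(img8, gamma=2.2, highlight_boost=3.0):
    """Preprocessing before the depth-of-field synthesis: undo the file
    format's gamma correction (approximated here by a simple power law) and
    push clipped highlights up to simulate a higher dynamic range."""
    img = (img8.astype(np.float32) / 255.0) ** gamma   # back to linear light
    clipped = img8 >= 255                              # saturated channels
    img[clipped] *= highlight_boost
    return img
```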

The effect of these two preprocessing steps is demonstrated in Figure 11.2. The figure shows a treetop where the sunlight is falling through the leaves. In Figures 11.2a and 11.2b the depicted tree is shown once in focus and once defocused. Both images were taken with a real camera. The remaining images show the result of simulating depth of field without preprocessing (Figure 11.2c), with gamma correction (Figure 11.2d), and with additional simulation of high dynamic range (Figure 11.2e). The version that uses both preprocessing methods appears more lively and comes closest to the natural effect in Figure 11.2b.



Figure 11.2: The effect of preprocessing for the synthesis of depth of field. A treetop with sunlight glittering through the leaves, in focus, is shown in 11.2a. The defocused treetop imaged with a real camera is given in 11.2b. The remaining three figures simulate the effect of depth of field on 11.2a with a simple blur of 30 × 30 pixels. The effect on the unprocessed JPEG image is depicted in 11.2c. In 11.2d a gamma correction was applied, and in 11.2e high dynamic range was simulated by increasing the maximum pixel values by a factor of three.


11.2.3 Synthesis of Depth of Field

The goal of this work is to devise an algorithm for the synthesis of depth of field that comes as close as possible to the natural effect. Thus, we decided to follow the approach by Cook et al. (1984) and use Distributed Raytracing. Since a TOF camera does not provide a complete model of the 3D scene and instead only gives range measurements at those points that are directly visible to the camera, we refer to our algorithm as 2½D Distributed Raytracing. The main idea of Distributed Raytracing is to send a number of rays from each pixel into the scene and to determine the color value of the pixel as a mixture of the color values that occur at the intersections of the rays with the scene. In the simplest case of a pinhole camera model only a single ray is considered for each pixel. If a camera with a lens is simulated, one considers a distribution of sample points

⃗ul over the aperture of the lens and sends a ray from a given pixel (i, j) to each sample. Suitable distributions can for example be found in the book of Pharr and Humphreys (2004). The effective size of the aperture can be controlled by scaling the distribution. The shape of the utilized diaphragm can be modelled by discard- ing those sample points that are obscured by it. In the following, we will restrict ourselves to a circular diaphragm and thin lenses.
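As an illustration, aperture samples ⃗u_l for a circular diaphragm of a thin lens could be generated as follows; the uniform rejection sampling shown here is only one possible distribution, and placing the lens in the plane z = 0 is an assumption of this sketch:

```python
import numpy as np

def sample_circular_aperture(num_samples, aperture_radius, rng=None):
    """Draw sample points u_l on a circular aperture centered at the optical center.

    Returns an array of shape (num_samples, 3); the lens is assumed to lie in the
    plane z = 0, so each sample has the form (x, y, 0).
    """
    rng = np.random.default_rng() if rng is None else rng
    samples = []
    while len(samples) < num_samples:
        x, y = rng.uniform(-1.0, 1.0, size=2)
        if x * x + y * y <= 1.0:            # keep only points inside the unit disk
            samples.append((aperture_radius * x, aperture_radius * y, 0.0))
    return np.array(samples)
```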

In this case, all rays through the sample points ⃗u_l will intersect at a 3D point ⃗q_{i,j} in the scene. We will now determine this point ⃗q_{i,j}. Let us assume a camera with a fixed focal length f. Furthermore, we want to focus the camera such that objects at a distance g appear sharp in the image. Given these parameters, we can determine the camera constant f_c, i.e. the distance at which the image sensor must be placed behind the lens for objects at distance g to be in focus, as:

$$f_c = \frac{f \cdot g}{g - f} \,. \qquad (11.6)$$

Given the camera constant f_c, we can send a ray from each pixel (i, j) through the center of projection, which will pass the lens unrefracted. Following this ray for the distance g will yield the 3D point ⃗q_{i,j}, where all rays from the pixel (i, j) passing through the lens at sample points ⃗u_l will intersect:

$$\vec{q}_{i,j} = g \cdot \frac{\left((c_x - i) \cdot s_x,\ (c_y - j) \cdot s_y,\ f_c\right)^T}{\left\lVert \left((c_x - i) \cdot s_x,\ (c_y - j) \cdot s_y,\ f_c\right)^T \right\rVert_2} \,. \qquad (11.7)$$

Here, (c_x, c_y) denotes the principal point, i.e. the pixel coordinates of the point where the optical axis intersects the image sensor. The width and height of a pixel are defined by s_x and s_y.
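A small sketch of Equations (11.6) and (11.7); the variable names mirror the symbols in the text, and the numeric values in the usage example (focal length, pixel pitch, principal point) are purely illustrative:

```python
import numpy as np

def camera_constant(f, g):
    """Camera constant f_c for focal length f and focused distance g (Eq. 11.6)."""
    return f * g / (g - f)

def focal_point(i, j, cx, cy, sx, sy, f_c, g):
    """3D point q_{i,j} at which all rays of pixel (i, j) intersect (Eq. 11.7)."""
    direction = np.array([(cx - i) * sx, (cy - j) * sy, f_c])
    return g * direction / np.linalg.norm(direction)

# Example: focus at 2 m with a 10 mm lens and 40 µm pixel pitch (all values illustrative).
f_c = camera_constant(f=0.01, g=2.0)
q = focal_point(i=120, j=80, cx=88, cy=72, sx=40e-6, sy=40e-6, f_c=f_c, g=2.0)
```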

Thus, a ray R_l passing from pixel (i, j) through the lens at ⃗u_l can be expressed as follows:

$$R_l = \vec{u}_l + t \cdot \vec{r}_{i,j,l} \qquad (11.8)$$
$$\vec{r}_{i,j,l} = \vec{q}_{i,j} - \vec{u}_l \,. \qquad (11.9)$$

The next step is to determine where the ray R_l intersects the surface of the scene characterized by the range map of the TOF camera. To this end, we invert the perspective projection of the TOF camera and obtain the 3D coordinates ⃗p of each pixel as a sample on the surface of the scene. We now tessellate the range map into triangles to obtain a smooth surface and compute the surface normal ⃗n and the bias d for each triangle. Assuming that a triangle is characterized by its three corners

P_1 = ⃗p_1, P_2 = ⃗p_2, and P_3 = ⃗p_3, we obtain:

$$\tilde{\vec{n}} = (P_2 - P_1) \times (P_3 - P_1) \qquad (11.10)$$
$$\vec{n} = \alpha \cdot \frac{\tilde{\vec{n}}}{\lVert \tilde{\vec{n}} \rVert} \qquad (11.11)$$
$$d = -\langle P_1, \vec{n} \rangle \,, \qquad (11.12)$$

where α = sign(⟨P_1, ñ⟩), i.e. α enforces that the normal ⃗n always points towards the camera. Using Equation (11.8), we can now determine the intersection ⃗x of the ray R_l with the plane defined by the triangle. To this end we determine t as

$$t = -\frac{\langle \vec{u}_l, \vec{n} \rangle + d}{\langle \vec{r}_{i,j,l}, \vec{n} \rangle} \qquad (11.13)$$

and evaluate Equation (11.8). Finally, we need to determine whether the intersection ⃗x lies within the triangle defined by P_1, P_2, and P_3. To this end we compute the barycentric coordinates u, v, and w that represent ⃗x as a linear combination of the corners of the triangle:

$$\vec{x} = u \cdot P_1 + v \cdot P_2 + w \cdot P_3 \,. \qquad (11.14)$$

In case the four constraints w = 1 − (u + v), 0 ≤ u ≤ 1, 0 ≤ v ≤ 1, and u + v ≤ 1 hold, the point ⃗x lies within the triangle and the barycentric coordinates fulfil u + v + w = 1.

Now that we have determined where a ray hits the scene, we can compute its contribution to the pixel (i, j) using the coefficients u, v, w and the color values ⃗x_{P_k} corresponding to the corners of the triangle. Thus, we compute the contribution of ray R_l to the color value ⃗z_{i,j,l} of pixel (i, j) in the output image by:

$$\vec{z}_{i,j,l} = u \cdot \vec{x}_{P_1} + v \cdot \vec{x}_{P_2} + w \cdot \vec{x}_{P_3} \,. \qquad (11.15)$$
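A compact sketch of the intersection test of Equations (11.10)–(11.14) and the color interpolation of Equation (11.15); solving a small linear system for the barycentric coordinates is one possible realization, not necessarily the implementation used here, and all inputs are assumed to be NumPy vectors:

```python
import numpy as np

def ray_triangle_color(u_l, r, P1, P2, P3, c1, c2, c3, eps=1e-9):
    """Intersect the ray x(t) = u_l + t * r with the triangle (P1, P2, P3).

    Returns the interpolated color u*c1 + v*c2 + w*c3 at the hit point,
    or None if the ray misses the triangle or runs parallel to its plane.
    """
    n = np.cross(P2 - P1, P3 - P1)          # unnormalized triangle normal (Eq. 11.10)
    denom = np.dot(r, n)
    if abs(denom) < eps:                    # ray parallel to the triangle plane
        return None
    t = np.dot(P1 - u_l, n) / denom         # plane intersection (cf. Eq. 11.13)
    if t <= 0:
        return None
    x = u_l + t * r
    # Solve x = u*P1 + v*P2 + w*P3 with u + v + w = 1 for the barycentric coordinates.
    A = np.column_stack([P1 - P3, P2 - P3])
    uv, *_ = np.linalg.lstsq(A, x - P3, rcond=None)
    u, v = uv
    w = 1.0 - u - v
    if u < 0 or v < 0 or w < 0:             # hit point lies outside the triangle
        return None
    return u * c1 + v * c2 + w * c3
```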

The color value of pixel (i, j) is a linear mixture of all rays R_l defined by the distribution of samples ⃗u_l over the aperture. In the simplest case, all rays get equal weights ω_l. Generally, we can compute the final pixel value ⃗z_{i,j} as follows:

$$\vec{z}_{i,j} = \frac{\sum_l \omega_l \cdot \vec{z}_{i,j,l}}{\sum_l \omega_l} \,. \qquad (11.16)$$

11.2.4 Dealing with Missing Data

Choosing equal weights for all rays R_l does not always yield the optimal visual experience because it can produce artefacts, especially near the borders of objects. This is due to the fact that the proposed method cannot access the complete 3D model of the scene. Instead we must rely on the range map provided by the TOF camera, i.e. we have to deal with missing data where an object that is close to the camera partially occludes another object. Hence, we refer to our algorithm as 2½D Distributed Raytracing.

The reason why this situation causes problems is depicted in Figure 11.3. The first observation we have to make is that the effect of depth of field is strongest for cameras with a large aperture, i.e. the circle of confusion for a defocused point in the scene grows with the diameter of the aperture (see Figure 11.1). For simplicity, let us assume that the TOF sensor is based on a pinhole camera. In order to create the effect of depth of field, we have to simulate an optical system with a large aperture – larger than the one used for acquiring the range map. As a result, the simulated camera has different view angles on the scene for rays emanating from the boundary of the aperture and is able to see parts of objects that are occluded to the TOF sensor. Figure 11.3 shows two rays that intersect with triangles that do not exist in the 3D scene. Nevertheless, the algorithm described in the previous section assigns a color value to these rays based on the barycentric coordinates. This causes two kinds of problems:


Figure 11.3: 2½D Distributed Raytracing at the borders of objects that are not at the focused distance g. For both objects at distances g′ and g″ some of the rays intersect with a triangle that does not represent the true scene, i.e. we have an incomplete model of the 3D scene with missing data.

(i) In the case of the ray that passes between the objects at distances g and g′, the true color value should correspond to the color value of the object at distance g. This would cause the focused object to shine through the object in front, i.e. the object at g′ appears strongly defocused. However, this effect is attenuated because the ray also gets a color contribution from the object at distance g′. (ii) In the second case, the ray passing between the focused object and the object at g″ gets a color contribution from the former one. This causes the focused object to blur although it should be in focus.

We propose an approach to avoid this problem and devise a strategy to attenuate rays with erroneous color values. This comes at the disadvantage of discarding information for rays that have already been computed; however, we avoid disturbing artefacts. We refer to this procedure as angle-of-incidence-based weight adaptation. Missing data often co-occurs with jump edges in the range data. Thus, the angle between the normals of the corresponding triangles and the incoming rays is large for missing data, and we can define the weights ω_l as follows:

$$\omega_l = (h - 1) \cdot \frac{\cos(\angle(\vec{n}, \vec{r}_{i,j,l})) + (\beta - 1)}{\beta} + h \,, \qquad (11.17)$$

$$\text{where} \quad h = \frac{\min\left(|g - y_{P_1}|,\ |g - y_{P_2}|,\ |g - y_{P_3}|\right)}{\Delta z_g} \,. \qquad (11.18)$$
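A sketch of the weight of Equations (11.17) and (11.18); it assumes that the corner depths y_{P_k} are the z-components of the triangle corners and that the normalization constant Δz_g is supplied by the caller, and the default β = 2 is chosen purely for illustration:

```python
import numpy as np

def adapted_weight(n, r, corners, g, delta_z_g, beta=2.0):
    """Weight w_l for a ray with direction r hitting a triangle with normal n (Eqs. 11.17, 11.18)."""
    cos_angle = np.dot(n, r) / (np.linalg.norm(n) * np.linalg.norm(r))
    # h is small when one corner of the triangle lies close to the focused distance g,
    # so the cosine term then dominates and attenuates grazing rays at jump edges.
    h = min(abs(g - p[2]) for p in corners) / delta_z_g
    return (h - 1.0) * (cos_angle + (beta - 1.0)) / beta + h
```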


Figure 11.4: Behavior of the weight ω_l for different angles ∠(⃗n, ⃗r_{i,j,l}) and deviations from the focused distance g.

We take the cosine of the measured angle between the normal ⃗n and the incoming ray R_l. We introduce the parameter h to ensure that the weight adaptation takes effect only if one of the corners of the triangle is near the focused distance g. Here, Δz_g is a normalization constant representing the largest possible range value in the image to enforce h ∈ [0, 1]. The additional parameter β controls how much influence h has on the weights ω_l. Figure 11.4 shows the behavior of ω_l with respect to different distances and angles.

11.3 Results

We have evaluated the proposed method using both synthetic and real data. The basis for the evaluation with synthetic data is an image rendered by PBRT (Pharr and Humphreys, 2004) which is shown in Figure 11.5. The image shows three spheres which partially occlude each other, i.e. the blue sphere on the right occludes the green sphere in the center, which in turn occludes the red sphere on the left. We synthesized the effect of depth of field using a number of different methods for a qualitative comparison in Figure 11.6. In all images the focus is on the green sphere in the center of the image. The image in Figure 11.6a was generated by PBRT using the Distributed Raytracing algorithm by Cook et al. (1984). Since the algorithm has access to the full 3D model of the scene in PBRT it can be considered ground truth and serves as a comparison for the other methods.


Figure 11.5: Synthetic raw data showing three spheres at different distances. The images were rendered by PBRT at a resolution of 512 × 256 pixels. The range map has values in the interval [7.8 m, 43.4 m]. A camera with a focal length of 10 mm, an opening angle of 30°, and a sensor of size 3.2 mm × 1.7 mm was simulated.


Figure 11.6: Synthesis of depth of field on the data of Figure 11.5 using different algorithms. The following algorithms were used for the generation: 11.6a Distributed Raytracing using the full 3D model of the scene. 11.6b Forward-Mapped Z-Buffer. 11.6c Anisotropic Diffusion. 11.6d Layered depth of field. 11.6e 2½D Distributed Raytracing. 11.6f 2½D Distributed Raytracing with angle-of-incidence-based weight adaptation.


The Forward-Mapped Z-Buffer algorithm using the implementation proposed by Chen (1987) was used to generate the image in Figure 11.6b. One can observe that the method fails to maintain sharp edges around the green sphere, and the blurred region around the blue sphere shows a strong contour. Figure 11.6c shows the results of the anisotropic diffusion approach by Bertalmío et al. (2004). While the green sphere is still blurred, the sharp contour around the blue sphere disappeared. The same is true for the Layered depth of field approach by Scofield (1992), which is depicted in Figure 11.6d. One can observe, however, that the effect of depth of field is stronger than in the case of the anisotropic diffusion method. Figure 11.6e and Figure 11.6f show the results obtained with our method. In the latter case, the angle-of-incidence-based weight adaptation was used, which visibly improves the sharpness of the green sphere.

Results for a real image captured with the ZCam by 3DV Systems are given in Figure 11.7. The original images are available under http://inb.uni-luebeck.de/mitarbeiter/haker/dof. Figure 11.7a shows the color image captured by the ZCam. The corresponding range map is given in Figure 11.7b. One can observe that both the color image and the range map have limited quality. The synthesis of depth of field with 2½D Distributed Raytracing and angle-of-incidence-based weight adaptation is shown in Figure 11.7c. The simulated camera focuses on the third person from the left, and the effect of depth of field is clearly visible. However, a number of disturbing artefacts occur that are mainly due to the flawed range map. Notice for example the hair of the foremost person; the strands that stand out have erroneous range values and are thus not sufficiently blurred.

To demonstrate the capabilities of the algorithm, we manually created a range map that aims at assigning correct range values to the pixels at the borders of objects. The resulting image is shown in Figure 11.7e, and one can observe a significant reduction of artefacts and a generally more consistent experience of the effect. From this we can conclude that it is highly important to have range maps that are consistent with the content of the color image, especially at the borders of objects.

11.4 Discussion

In this chapter we have presented an algorithm that can be applied as a post-processing method to create the effect of depth of field in digital images. The basis of this approach is a range map of the scene depicted by the color image.



Figure 11.7: Synthesis of depth of field on the data captured by a ZCam by 3DV Systems. 11.7a shows the color image at a resolution of 320 × 240 pixels. The range map captured by the ZCam and the resulting synthesis of depth of field are given in 11.7b and 11.7c. An artificially created range map and the resulting output image are depicted in 11.7d and 11.7e.

Such a range map can for example be obtained using a TOF camera. We have presented a technique for upsampling and smoothing the range map such that it has the same resolution as the color image. Using the range data, we create a mesh of triangles representing the surface of the scene and use Distributed Raytracing to assign a color value to each pixel. The color value is computed as a linear mixture of color values that are computed for rays intersecting the mesh. Since the camera setup does not acquire a complete model of the scene, we have to deal with missing data. We have proposed a heuristic that avoids artefacts that are induced by missing data.

Finally, we have compared the method, which we refer to as 2½D Distributed Raytracing, to a number of alternative algorithms on synthetic data. The algorithm was also tested on a real image captured by a TOF camera. We can conclude that the method achieves realistic results if the range map is consistent with the color image. Thus, future work should focus on improving the procedure for upsampling and aligning the range map with the color image. Otherwise, artefacts can impede the visual experience.

12 Conclusion

The results presented in this thesis were already discussed individually at the end of the corresponding chapters. The main achievements on the algorithmic side include the exploitation of the shading constraint to dramatically improve the range maps measured by a TOF camera, a robust algorithm for the estimation of human pose that is based on self-organizing maps, and a variety of different image features that can be employed for feature tracking and action recognition. These algorithms play an important role in three potential applications for TOF cameras: a nose tracker that enables an alternative form of text input, a framework for controlling a slideshow presentation with pointing gestures, and a method for photographers to generate the stylistic device of depth of field in digital images. I will now conclude this thesis by discussing where I see the potential of computer vision systems in the context of human-computer interfaces from a more general perspective.

The IT sector has seen a fascinating development since the invention of the first computers. We can observe that the role of computer systems is changing in the sense that computers are no longer only stationary terminals with the standard technology of display, mouse, and keyboard for user interaction. On the contrary, powerful computer systems are becoming integrated into various devices we continuously use throughout the day, such as mobile phones. These devices often provide some form of assistance while the user does not even notice that he is dealing with a computer in the conventional sense.

This concept of the embedded computer was already envisioned by Mark Weiser in the early 1990s (Weiser, 1991). In this context, Weiser coined the term ubiquitous computing and recast the role of computers in human-machine interaction: “The most profound technologies are those that disappear. They weave themselves into the fabric of everyday life until they are indistinguishable from it.”

The technical realization of this vision is commonly known by the term information appliance, which was coined by Jef Raskin in the late 1970s. In the scope of information appliances, Donald A. Norman sees computers as enhancements of very common devices that are dedicated to certain very specific tasks (Norman, 1998). While the user interface of such devices is kept simple and easy to use even for untrained people, the computer operates invisibly in the background providing additional functionality, possibly through an exchange of information with other information appliances.

While some of these computer systems will simply provide information or operate without intentional input of the user, the total number of human-computer interfaces will nevertheless increase. And for most systems in the category of ubiquitous computing, conventional user interfaces consisting of display, mouse, and keyboard will not be an option. Thus, we will require new means of user interaction.

Under this perspective, I see a lot of potential for gesture-based interaction. Currently, the most dominant market for gesture-based interaction is the gaming market. An obvious reason for this development is that the targeted consumers are generally young and open to new concepts involving computer technology. New forms of human-computer interaction can be explored in a playful manner, and the level of accuracy and reliability required in games is not critical, i.e. a failure of immature technology does not endanger anyone.

However, I can also imagine a number of appliances for human action and gesture recognition as well as gesture-based interaction beyond the gaming market. An important sector is the health industry. Imagine a medical doctor in an operating room filled with high-tech equipment ranging from life-sustaining devices to systems providing the clinician with important patient data. Gestures may enable doctors to have all these devices within reach while adhering to hygiene standards. Conventionally, doctors have to access this information through the assistance of a surgical nurse, or they have to abandon their operating equipment while actuating the device.

Another field of application is health monitoring, either at home or in the hospital. Disabled people can be monitored continuously through a camera-based system which signals an alarm when an unexpected event occurs, for example when an elderly person has an accident at home and is unable to reach for the phone to call for assistance.

The use of gestures and action recognition can also be applied to the automotive sector. Navigation systems are available in almost every high-class car. The use of simple gestures for the control of such systems would both ease their use and increase safety while driving. In addition, camera-based systems can be used to monitor

the traffic. Thus, potential hazards can be detected, and the driver can be warned or pre-crash countermeasures can be activated.

The technology and algorithms described in this thesis provide a stepping stone towards the realization of the above-mentioned scenarios. During the years of my research in this area and numerous discussions with camera manufacturers and companies interested in gesture-based interaction, I also began to see the potential of commercializing the acquired knowledge. The first serious step in this direction was taken in late 2009, when we applied for funding in the scope of the EXIST Forschungstransfer at the German Federal Ministry of Economics and Technology. Since the beginning of 2010, we have been heading straight towards realizing this idea, preparing a start-up company that targets human action and gesture recognition by means of computer vision based on 3D imaging sensors.


Bibliography

ARTTS 3D TOF Database. http://www.artts.eu/publications/3d_tof_db.

Ankur Agarwal and Bill Triggs. Recovering 3D human pose from monocular images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(1):44–58, 2006.

Henrik Andreasson, Rudolph Triebel, and Achim J. Lilienthal. Vision-based inter- polation of 3D laser scans. In Proceedings of the International Conference on Au- tonomous Robots and Agents (ICARA), 2006.

Miyamoto Arata, Sumi Kazuhiko, and Matsuyama Takashi. Human pose estima- tion from 3D object skeleton using articulated cylindrical human model. IPSJ SIG Technical Reports, 51:133–144, 2006.

Andre L. C. Barczak, Martin J. Johnson, and Chris H. Messom. Real-time compu- tation of Haar-like features at generic angles for detection algorithms. Research Letters in the Information and Mathematical Sciences, 9:98–111, 2006.

Brian A. Barsky, Michael J. Tobias, Derrick P. Chu, and Daniel R. Horn. Elimination of artifacts due to occlusion and discretization problems in image space blurring techniques. Graph. Models, 67(6):584–599, 2005.

Erhardt Barth, Terry Caelli, and Christoph Zetzsche. Image encoding, labeling, and reconstruction from differential geometry. CVGIP: Graphical Models and Image Processing, 55(6):428–46, November 1993.

Thomas Baudel and Michel Beaudouin-Lafon. CHARADE: Remote control of objects using free-hand gestures. Communications of the ACM, 36:28–35, 1993.

Marcelo Bertalmío, Pere Fort, and Daniel Sánchez-Crespo. Real-time, accurate depth of field using anisotropic diffusion and programmable graphics cards. 3D Data Processing, Visualization, and Transmission (3DPVT), pages 767–773, 2004.

Martin Böhme, Martin Haker, Thomas Martinetz, and Erhardt Barth. A facial feature tracker for human-computer interaction based on 3D Time-of-Flight cameras. International Journal of Intelligent Systems Technologies and Applications, 5(3/4):264–273, 2008a.


Martin Böhme, Martin Haker, Thomas Martinetz, and Erhardt Barth. Shading constraint improves accuracy of time-of-flight measurements. In CVPR 2008 Workshop on Time-of-Flight-based Computer Vision (TOF-CV), 2008b.

Martin Böhme, Martin Haker, Thomas Martinetz, and Erhardt Barth. Shading constraint improves accuracy of time-of-flight measurements. CVIU - International Journal of Computer Vision and Image Understanding, 2010.

Bernhard Büttgen, Thierry Oggier, Michael Lehmann, Rolf Kaufmann, and Felix Lustenberger. CCD/CMOS lock-in pixel for range imaging: Challenges, limitations, and state-of-the-art. In 1st Range Imaging Research Day, pages 21–32, ETH Zürich, Switzerland, 2005.

Yong C. Chen. Lens effect on synthetic image generation based on light particle theory. In CG International ’87 on Computer graphics 1987, pages 347–366, Springer-Verlag New York, Inc., 1987.

German K.M. Cheung, Simon Baker, and Takeo Kanade. Shape-from-silhouette of articulated objects and its use for human body kinematics estimation and motion capture. Computer Vision and Pattern Recognition, IEEE Computer Society Con- ference on, 1:77–84, 2003.

Robert L. Cook, Thomas Porter, and Loren Carpenter. Distributed ray tracing. In SIGGRAPH Computer Graphics, volume 18, pages 137–145, ACM, 1984.

James Edwin Cryer, Ping-Sing Tsai, and Mubarak Shah. Integration of shape from shading and stereo. Pattern Recognition, 28(7):1033–1043, 1995.

Joe Demers. Depth of field in the ’toys’ demo. In Ogres and Fairies: Secrets of the NVIDIA Demo Team. GDC 2003, 2003.

Arthur P. Dempster, Nan M. Laird, and Donald B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39(1):1–38, 1977.

Pandu R. Rao Devarakota, Bruno Mirbach, Marta Castillo-Franco, and Björn Ottersten. 3-D vision technology for occupant detection and classification. In 3DIM ’05: Proceedings of the Fifth International Conference on 3-D Digital Imaging and Modeling, pages 72–79, IEEE Computer Society, 2005.


James R. Diebel and Sebastian Thrun. An application of Markov random fields to range sensing. In Y. Weiss, B. Schölkopf, and J. Platt, editors, Advances in Neural Information Processing Systems 18, pages 291–298. MIT Press, 2006.

James R. Diebel, Sebastian Thrun, and Michael Brünig. A Bayesian method for probable surface reconstruction and decimation. ACM Transactions on Graphics, 25(1):39–59, 2006.

David L. Donoho, Michael Elad, and Vladimir N. Temlyakov. Stable recovery of sparse overcomplete representations in the presence of noise. IEEE Transactions on Information Theory, 52(1):6–18, 2006.

Jean-Denis Durou and Didier Piau. Ambiguous shape from shading with critical points. Journal of Mathematical Imaging and Vision, 12(2):99–108, 2000.

Jean-Denis Durou, Maurizio Falcone, and Manuela Sagona. Numerical methods for shape-from-shading: A new survey with benchmarks. Computer Vision and Im- age Understanding, 109(1):22–43, 2008.

Alexei A. Efros, Alexander C. Berg, Greg Mori, and Jitendra Malik. Recognizing ac- tion at a distance. In ICCV ’03: Proceedings of the Ninth IEEE International Con- ference on Computer Vision, pages 726–733, IEEE Computer Society, 2003.

Martin Ester, Hans-Peter Kriegel, J¨org Sander, and Xiaowei Xu. A density-based al- gorithm for discovering clusters in large spatial databases with noise. In Proceed- ings of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96), pages 226–231. AAAI Press, 1996.

Dragos Falie and Vasile Buzuloiu. Noise characteristics of 3D time-of-flight cam- eras. In Proceedings of the IEEE International Symposium on Signals, Circuits & Systems (ISSCS), volume 1, pages 229–232, Iasi, Romania, 2007.

Matteo Frigo and Steven G. Johnson. The design and implementation of FFTW. Proceedings of the IEEE, 93(2):216–231, 2005.

Pascal V. Fua and Yvan G. Leclerc. Object-centered surface reconstruction: Combin- ing multi-image stereo and shading. International Journal of Computer Vision, 16(1):35–56, 1995.

Jürgen Gall, Bodo Rosenhahn, Thomas Brox, and Hans-Peter Seidel. Optimization and filtering for human motion capture. International Journal of Computer Vision, 87(1):75–92, 2010.


Michael Glodek. Modellierung von Schärfentiefe anhand von Tiefenkarten. Diploma thesis, Universität zu Lübeck, 2009.

S. Burak Gokturk, Hakan Yalcin, and Cyrus Bamji. A time-of-flight depth sensor - system description, issues and solutions. In CVPRW ’04: Proceedings of the 2004 Conference on Computer Vision and Pattern Recognition Workshop (CVPRW’04) Volume 3, page 35, IEEE Computer Society, 2004.

Dimitry O. Gorodnichy. On importance of nose for face tracking. In Proc. IEEE Intern. Conf. on Automatic Face and Gesture Recognition (FG’2002), Washington, D.C., 2002.

Sigurjón Árni Gudmundsson, Henrik Aanæs, and Rasmus Larsen. Fusion of stereo vision and time-of-flight imaging for improved 3D estimation. In Dynamic 3D Imaging – Workshop in Conjunction with DAGM, volume 5, pages 425–433, Inderscience Publishers, 2007a.

Sigurjón Árni Gudmundsson, Henrik Aanæs, and Rasmus Larsen. Effects on measurement uncertainties of time-of-flight cameras. In Proceedings of the IEEE International Symposium on Signals, Circuits & Systems (ISSCS), volume 1, pages 1–4, 2007b.

Paul Haeberli and Kurt Akeley. The accumulation buffer: hardware support for high- quality rendering. In SIGGRAPH Computer graphics, pages 309–318, ACM, 1990.

Tom S. F. Haines and Richard C. Wilson. Integrating stereo with shape-from-shading derived orientation information. In British Machine Vision Conference, volume 2, pages 910–919, 2007.

Tom S. F. Haines and Richard C. Wilson. Combining shape-from-shading and stereo using Gaussian-Markov random fields. In International Conference on Pattern Recognition, pages 1–4, 2008.

Martin Haker, Martin Böhme, Thomas Martinetz, and Erhardt Barth. Geometric invariants for facial feature tracking with 3D TOF cameras. In Proceedings of the IEEE International Symposium on Signals, Circuits & Systems (ISSCS), volume 1, pages 109–112, Iasi, Romania, 2007.

Martin Haker, Martin Böhme, Thomas Martinetz, and Erhardt Barth. Scale-invariant range features for time-of-flight camera applications. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Workshop on Time-of-Flight-based Computer Vision (TOF-CV), 2008.

Martin Haker, Martin Böhme, Thomas Martinetz, and Erhardt Barth. Deictic gestures with a time-of-flight camera. In Stefan Kopp and Ipke Wachsmuth, editors, Gesture in Embodied Communication and Human-Computer Interaction – International Gesture Workshop GW 2009, volume 5934 of LNAI, pages 110–121. Springer, 2009a.

Martin Haker, Thomas Martinetz, and Erhardt Barth. Multimodal sparse features for object detection. In Artificial Neural Networks - ICANN 2009, 19th International Conference, Limassol, Cyprus, September 14–17, 2009, Proceedings, volume 5769 of Lecture Notes in Computer Science, pages 923–932. Springer, 2009b.

Dan Witzner Hansen, Mads Hansen, Martin Kirschmeyer, Rasmus Larsen, and Davide Silvestre. Cluster tracking with time-of-flight cameras. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Workshop on Time-of-Flight-based Computer Vision (TOF-CV), 2008.

Keith Hartt and Mark Carlotto. A method for shape-from-shading using multiple images acquired under different viewing and lighting conditions. In Proceedings of Computer Vision and Pattern Recognition, pages 53–60, 1989.

Horst G. Heinol. Untersuchung und Entwicklung von modulationslaufzeitbasierten 3D-Sichtsystemen. PhD thesis, University of Siegen, Germany, 2001.

Nils Hofemann, Jannik Fritsch, and Gerhard Sagerer. Recognition of deictic ges- tures with context. In Proceedings of the 26th Annual Symposium of the German Association for Pattern Recognition (DAGM 2004), volume 3175 of Lecture Notes in Computer Science, pages 334–341, 2004.

Berthold K. P. Horn and John G. Harris. Rigid body motion from range image se- quences. CVGIP: Image Understanding, 53(1):1–13, 1991.

Berthold K. P. Horn and Brian G. Schunck. Determining optical flow. Artificial Intelligence, 17:185–203, 1981.

Gabi J. Iddan and Giora Yahav. 3D imaging in the studio. In Proceedings of SPIE, volume 4298, pages 48–56, 2001.


Timo Kahlmann, Fabio Remondino, and Sébastien Guillaume. Range imaging technology: new developments and applications for people identification and tracking. In Proc. of Videometrics IX - SPIE-IS&T Electronic Imaging, volume 6491, 2007.

Rudolf E Kalman. A new approach to linear filtering and prediction problems. Trans- actions of the ASME, Series D, Journal of Basic Engineering, 82:35–45, 1960.

Jens Keiner, Stefan Kunis, and Daniel Potts. NFFT 3.0, C subroutine library. http://www.tu-chemnitz.de/~potts/nfft, 2006.

Steffen Knoop, Stefan Vacek, and Rüdiger Dillmann. Fusion of 2D and 3D sensor data for articulated body tracking. Robotics and Autonomous Systems, 57(3):321–329, 2009.

Jan J. Koenderink. The structure of images. Biological Cybernetics, 50:363–70, 1984.

Teuvo Kohonen. Self-Organizing Maps. Springer-Verlag, Berlin, Heidelberg, 1995.

Andreas Kolb, Erhardt Barth, Reinhard Koch, and Rasmus Larsen. Time-of-Flight Sensors in Computer Graphics. Eurographics State of the Art Reports, pages 119–134, 2009.

Andreas Kolb, Erhardt Barth, Reinhard Koch, and Rasmus Larsen. Time-of-flight cameras in computer graphics. Computer Graphics Forum, 29(1):141–159, 2010.

Eva Kollorz, Jochen Penne, Joachim Hornegger, and Alexander Barke. Gesture recognition with a Time-Of-Flight camera. International Journal of Intelligent Systems Technologies and Applications, 5(3/4):334–343, 2008.

Bernhard König, Bedrich J. Hosticka, Peter Mengel, and Ludwig Listl. Advanced time-of-flight range camera with novel real-time 3D image processing. Optics and Photonics for Information Processing - SPIE, 2007.

Alfred Kranstedt, Andy Lücking, Thies Pfeiffer, Hannes Rieser, and Ipke Wachsmuth. Deixis: How to Determine Demonstrated Objects Using a Pointing Cone. In S. Gibet, N. Courty, and J.-F. Kamp, editors, Gesture in Human-Computer Interaction and Simulation, LNAI 3881, pages 300–311, Springer, 2006.

Martin Kraus and Magnus Strengert. Depth-of-field rendering by pyramidal image processing. In Proceedings Eurographics 2007, pages 645–654, 2007.


Stefan Kunis and Daniel Potts. Stability results for scattered data interpola- tion by trigonometric polynomials. SIAM Journal on Scientific Computing, 29: 1403–1419, 2007.

Kai Labusch, Erhardt Barth, and Thomas Martinetz. Simple Method for High- Performance Digit Recognition Based on Sparse Coding. IEEE Transactions on Neural Networks, 19(11):1985–1989, 2008a.

Kai Labusch, Fabian Timm, and Thomas Martinetz. Simple incremental one-class Support Vector classification. In Gerhard Rigoll, editor, Pattern Recognition - Pro- ceedings of the DAGM, Lecture Notes in Computer Science, pages 21–30, 2008b.

Kai Labusch, Erhardt Barth, and Thomas Martinetz. Sparse Coding Neural Gas: Learning of Overcomplete Data Representations. Neurocomputing, 72(7–9): 1547–1555, 2009.

Robert Lange. 3D Time-of-Flight Distance Measurement with Custom Solid-State Sensors in CMOS/CCD-Technology. PhD thesis, University of Siegen, Germany, 2000.

Yvan G. Leclerc and Aaron F. Bobick. The direct computation of height from shading. In Computer Vision and Pattern Recognition (CVPR ’91), pages 552–558, 1991.

Michael S. Lewicki, Terrence J. Sejnowski, and Howard Hughes. Learning overcom- plete representations. Neural Computation, 12:337–365, 2000.

Rainer Lienhart, Alexander Kuranov, and Vadim Pisarevsky. Empirical analysis of detection cascades of boosted classifiers for rapid object detection. In Proceedings of the 25th Annual Symposium of the German Association for Pattern Recognition (DAGM), pages 297–304, 2003.

M. Lindner and A. Kolb. Compensation of Motion Artifacts for Time-of-Flight Cam- eras. In Proc. Dynamic 3D Imaging, LNCS, pages 16–27. Springer, 2009.

Marvin Lindner and Andreas Kolb. Lateral and depth calibration of PMD-distance sensors. In Proc. Int. Symp. on Visual Computing, LNCS, pages 524–533. Springer, 2006.

Stuart P. Lloyd. Least Squares Quantization in PCM. IEEE Transactions on Infor- mation Theory, 28:129–137, 1982.


David G. Lowe. Object recognition from local scale-invariant features. In Proc. of the IEEE International Conference on Computer Vision ICCV, Corfu, pages 1150–1157, 1999.

Thomas Martinetz and Klaus Schulten. A “Neural-Gas” Network Learns Topologies. Artificial Neural Networks, I:397–402, 1991.

Thomas Martinetz, Kai Labusch, and Daniel Schneegaß. SoftDoubleMaxMinOver: Perceptron-like Training of Support Vector Machines. IEEE Transactions on Neu- ral Networks, 20(7):1061–1072, 2009.

Thomas B. Moeslund and Erik Granum. Modelling and estimating the pose of a human arm. Machine Vision and Applications, 14:237–247, 2003.

Thomas B. Moeslund and Lau Nøregaard. Recognition of deictic gestures for wear- able computing. In 6th International Gesture Workshop, GW 2005, Revised Se- lected Papers, volume 3881 of Lecture Notes in Computer Science, pages 112–123, 2006.

Mostafa G.-H. Mostafa, Sameh M. Yamany, and Aly A. Farag. Integrating shape from shading and range data using neural networks. In Computer Vision and Pattern Recognition (CVPR ’99), volume 2, page 2015, 1999.

Cicero Mota and Erhardt Barth. On the uniqueness of curvature features. In G. Baratoff and H. Neumann, editors, Dynamische Perzeption, volume 9 of Pro- ceedings in Artificial Intelligence, pages 175–178, Infix Verlag, 2000.

Sateesha G. Nadabar and Anil K. Jain. Fusion of range and intensity images on a Connection Machine (CM-2). Pattern Recognition, 28(1):11–26, 1995.

Kai Nickel and Rainer Stiefelhagen. Pointing gesture recognition based on 3D- tracking of face, hands and head orientation. In International Conference on Mul- timodal Interfaces, pages 140–146, 2003.

Daniël Van Nieuwenhove. CMOS Circuits and Devices for 3D Time-of-Flight Cameras. PhD thesis, Vrije Universiteit Brussel, Belgium, 2009.

Donald A. Norman. The Invisible Computer: Why Good Products Can Fail, the Per- sonal Computer Is So Complex, and Information Appliances Are the Solution. The MIT Press, Cambridge, Massachusetts, 1998.


Thierry Oggier, Michael Lehmann, Rolf Kaufmann, Matthias Schweizer, Michael Richter, Peter Metzler, Graham Lang, Felix Lustenberger, and Nicolas Blanc. An all-solid-state optical range camera for 3D real-time imaging with sub-centimeter depth resolution (SwissRanger). In Proceedings of SPIE, volume 5249, pages 534–545, 2004.

Thierry Oggier, Bernhard Büttgen, Felix Lustenberger, Guido Becker, Björn Rüegg, and Agathe Hodac. SwissRanger™ SR3000 and first experiences based on miniaturized 3D-TOF cameras. In Proceedings of the 1st Range Imaging Research Day, pages 97–108, Zürich, Switzerland, 2005a.

Thierry Oggier, Bernhard Büttgen, Felix Lustenberger, Guido Becker, Björn Rüegg, and Agathe Hodac. SwissRanger™ SR3000 and first experiences based on miniaturized 3D-TOF cameras. In Kahlmann Ingensand, editor, Proc. 1st Range Imaging Research Day, pages 97–108, Zürich, 2005b.

Bruno A. Olshausen. Learning sparse, overcomplete representations of time-varying natural images. In Proc IEEE International Conference on Image Processing, vol- ume 1, pages 41–44. IEEE Computer Society, 2003.

Bruno A Olshausen and David J Field. Sparse coding with an overcomplete basis set: A strategy employed by V1? Vision Research, 37:3311–3325, 1997.

Nobuyuki Otsu. A threshold selection method from gray-level histograms. IEEE Transactions on Systems, Man and Cybernetics, 9(1):62–66, 1979.

Kálmán Palágyi and Attila Kuba. A parallel 3D 12-subiteration thinning algorithm. Graphical Models and Image Processing, 61(4):199–221, 1999.

Matt Pharr and Greg Humphreys. Physically Based Rendering: From Theory to Im- plementation. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2004.

Ramprasad Polana and Al Nelson. Recognizing activities. In IEEE Conference on Computer Vision and Pattern Recognition, pages 815–818, 1994.

Michael Potmesil and Indranil Chakravarty. A lens and aperture camera model for synthetic image generation. In SIGGRAPH Computer graphics, pages 297–305, ACM, 1981.

Michael Potmesil and Indranil Chakravarty. Synthetic image generation with a lens and aperture camera model. ACM Transactions on Graphics, 1(2):85–108, 1982.


Daniel Potts and Gabriele Steidl. A new linogram algorithm for computerized to- mography. IMA Journal of Numerical Analysis, 21(3):769–782, 2001.

Daniel Potts, Gabriele Steidl, and Manfred Tasche. Fast Fourier transforms for nonequispaced data: A tutorial. In John J. Benedetto and Paulo J. S. G. Ferreira, editors, Modern Sampling Theory: Mathematics and Applications, pages 247–270. Birkhäuser, Boston, 2001.

William H. Press, Brian P. Flannery, Saul A. Teukolsky, and William T. Vetterling. Numerical Recipes in C. Cambridge University Press, Cambridge, UK, second edi- tion, 1992.

Chris Pudney. Distance-ordered homotopic thinning: A skeletonization algorithm for 3D digital images. Computer Vision and Image Understanding, 72:404–413, 1998.

Holger Rapp. Experimental and theoretical investigation of correlating ToF-camera systems. Master’s thesis, University of Heidelberg, Germany, 2007.

Jim Rehg, Daniel D. Morris, and Takeo Kanade. Ambiguities in visual tracking of ar- ticulated objects using two- and three-dimensional models. International Journal of Robotics Research, 22(1):393–418, June 2003.

Przemyslaw Rokita. Fast generation of depth of field effects in computer graphics. Computers and Graphics, 17(5):593–595, 1993.

Przemyslaw Rokita. Generating depth-of-field effects in virtual reality applications. IEEE Computer Graphics and Applications, 16(2):18–21, 1996.

Rómer Rosales and Stan Sclaroff. Inferring body pose without tracking body parts. In Proceedings of Computer Vision and Pattern Recognition, pages 721–727, 2000.

Dimitrios Samaras, Dimitris Metaxas, Pascal V. Fua, and Yvan G. Leclerc. Variable albedo surface reconstruction from stereo and shape from shading. In Proceedings of Computer Vision and Pattern Recognition, volume 1, pages 480–487, 2000.

Michael Schlemmer, Ingrid Hotz, Vijay Natarajan, Bernd Hamann, and Hans Ha- gen. Fast Clifford Fourier transformation for unstructured vector field data. In Proceedings of the Ninth International Conference on Numerical Grid Generation in Computational Field Simulations, pages 101–110, 2005.


Rudolf Schwarte, Horst G. Heinol, Zhanping Xu, and Klaus Hartmann. New active 3D vision system based on rf-modulation interferometry of incoherent light. In Intelligent Robots and Computer Vision XIV, volume 2588 of Proceedings of SPIE, pages 126–134, 1995.

Cary Scofield. 2½D depth-of-field simulation for computer animation. In Graphics Gems III, pages 36–38. Academic Press Professional, Inc., San Diego, CA, USA, 1992.

Thomas Serre, Lior Wolf, Stanley Bileschi, Maximilian Riesenhuber, and Tomaso Poggio. Robust object recognition with cortex-like mechanisms. IEEE Transac- tions on Pattern Analysis and Machine Intelligence, 29(3):411–426, 2007.

Gregory Shakhnarovich, Paul Viola, and Trevor Darrell. Fast pose estimation with parameter-sensitive hashing. In Proceedings of International Conference on Com- puter Vision, pages 750–757, 2003.

Eli Shechtman and Michal Irani. Space-time behavior-based correlation—or—how to tell if two underlying motion fields are similar without computing them? IEEE Trans. Pattern Anal. Mach. Intell., 29(11):2045–2056, 2007.

Mikio Shinya. Post-filtering for depth of field simulation with ray distribution buffer. In Graphics Interface, pages 59–66. Graphics Interface ’94, 1994.

Hagen Spies, Horst Haußecker, Bernd Jähne, and John L. Barron. Differential range flow estimation. In Mustererkennung 1999, 21. DAGM-Symposium, pages 309–316, London, UK, 1999. Springer-Verlag. ISBN 3-540-66381-9.

Hagen Spies, Bernd Jähne, and John L. Barron. Dense range flow from depth and intensity data. In ICPR00, volume 1, pages 131–134, 2000.

Hagen Spies, Bernd Jähne, and John L. Barron. Range flow estimation. Computer Vision and Image Understanding, 85(3):209–231, 2002.

George Stockman and Linda G. Shapiro. Computer Vision. Prentice Hall PTR, Upper Saddle River, NJ, USA, 2001.

Ju Sun, Xiao Wu, Shuicheng Yan, Loong-Fah Cheong, Tat-Seng Chua, and Jintao Li. Hierarchical spatio-temporal context modeling for action recognition. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 0: 2004–2011, 2009.


Clay Matthew Thompson. Robust photo-topography by fusing shape-from-shading and stereo. AI Technical Report 1411, Massachusetts Institute of Technology, 1993.

Emanuele Trucco and Alessandro Verri. Introductory Techniques for 3-D Computer Vision. Prentice Hall, 1998.

Paul Viola and Michael Jones. Robust real-time face detection. International Jour- nal of Computer Vision, 57(2):137–154, 2004.

Tiberiu Viulet. Range flow computation using time-of-flight cameras. Diploma the- sis, Universitatea Politehnica Bucuresti, 2009.

David J. Ward and David J.C. MacKay. Fast hands-free writing by gaze direction. Nature, 418(6900):838, 2002.

Sebastian Weik and C.-E. Liedtke. Hierarchical 3D pose estimation for articulated human body models from a sequence of volume data. In Robot Vision, pages 27–34, 2001.

Daniel Weinland, Remi Ronfard, and Edmond Boyer. Free viewpoint action recog- nition using motion history volumes. Comput. Vis. Image Underst., 104(2): 249–257, 2006.

Mark Weiser. The computer for the 21st century. Scientific American, 1991.

Zhanping Xu, Rudolf Schwarte, Horst Heinol, Bernd Buxbaum, and Thorsten Ring- beck. Smart pixel – photonic mixer device (PMD). In Proceedings of the In- ternational Conference on Mechatronics and Machine Vision in Practice, pages 259–264, Nanjing, China, 1998.

Giora Yahav, Gabi J. Iddan, and David Mandelboum. 3D imaging camera for gaming application. In International Conference on Consumer Electronics, pages 1–2, 2007.

M. Yamamoto, P. Boulanger, J. A. Beraldin, and M. Rioux. Direct estimation of range flow on deformable shape from a video rate range camera. IEEE Transactions on Pattern Analysis and Machine Intelligence, 15(1):82–89, 1993.

Hee-Deok Yang and Seong-Whan Lee. Reconstructing 3D human body pose from stereo image sequences using hierarchical human body model learning. In ICPR ’06: Proceedings of the 18th International Conference on Pattern Recognition, pages 1004–1007, IEEE Computer Society, 2006.

Hee-Deok Yang and Seong-Whan Lee. Reconstruction of 3D human body pose from stereo image sequences based on top-down learning. Pattern Recognition, 40(11): 3120–3131, 2007.

Alper Yilmaz and Mubarak Shah. A differential geometric approach to representing the human actions. CVIU - International Journal of Computer Vision and Image Understanding, 109(3):335–351, 2008.

Lijun Yin and Anup Basu. Nose shape estimation and tracking for model-based cod- ing. In Proceedings of IEEE International Conference on Acoustics, Speech, Signal Processing, pages 1477–1480, 2001.

Christoph Zetzsche and Erhardt Barth. Fundamental limits of linear filters in the vi- sual processing of two-dimensional signals. Vision Research, 30:1111–1117, 1990a.

Christoph Zetzsche and Erhardt Barth. Image surface predicates and the neural en- coding of two-dimensional signal variation. In Bernice E Rogowitz, editor, Human Vision and Electronic Imaging: Models, Methods, and Applications, volume SPIE 1249, pages 160–177, 1990b.

Ruo Zhang, Ping-Sing Tsai, James Edwin Cryer, and Mubarak Shah. Shape from shading: A survey. IEEE Transactions on Pattern Analysis and Machine Intelli- gence, 21(8):690–706, 1999.

Youding Zhu, Behzad Dariush, and Kikuo Fujimura. Controlled human pose esti- mation from depth image streams. Computer Vision and Pattern Recognition Workshops, 2008. CVPRW ’08. IEEE Computer Society Conference on, pages 1–8, 2008.

Index

2-tap pixel, 20
3DV Systems, 8, 9, 108
4-tap pixel, 19
accumulation buffer, 145
active illumination, 7
albedo, 33, 36
Altair, 3
anisotropic diffusion, 145
aperture, 110, 143
aperture problem, 110
Apple II, 3
ARTTS, 9
body language, 4
brightness change constraint equation, 109
calibration, 136
CamCube, 9, 50
Canesta, 8, 9
center of projection, 150
circle of confusion, 144
confusion matrix, 116
cross-validation, 101
CSEM, 8, 9
DBSCAN, 57
deictic gesture, 129
demodulation contrast, 19
depth measurement potential, 147
depth of field, 143
depth smoothness prior, 147
directional artefacts, 40
distributed raytracing, 144
electronic shutter, 7
EM algorithm, 57
exposure time, 7
EyeToy, 4
flying pixels, 25
forward-mapped z-buffer, 144
frame rate, 69, 145
gaming, 4
generalized eccentricity, 126
gesture recognition, 108
human pose, 59
hyper parameter, 115
IEE S.A., 9
ifm electronic, 9
illumination unit, 7
immersion syndrome, 129
infrared light, 7
integration time, 7
interferometry, 13
iPhone, 4
jump edge, 48, 153
Kalman filter, 133
kNN classifier, 114
Lambertian reflectance, 36
Lambertian reflectance model, 37
laser scanner: time-of-flight, 14; triangulation, 11
learning rate, 63
learning rule, 63
leave-one-out, 115
Markov random field, 34, 147
MESA Imaging, 8
Michelson interferometer, 13
Microsoft, 5, 9
modulation frequency, 8, 23, 26
Monge patch, 76
motion artefacts, 25
motion histogram, 111
motion vector, 111
multiple reflections, 24
multiple-wavelength interferometer, 13
nearest mean classifier, 114
Nintendo, 4
non-ambiguity range, 8, 23
one-class MaxMinOver, 114
Optrima, 9
Otsu threshold, 54
pattern-by-pattern, 63
perspective, 76
perspective projection, 57, 61, 132, 151
phase measurement principle, 8, 9
phase shift, 7
photography, 143
photometric stereo, 12
photometry, 12
photon shot noise, 22
pinhole camera, 143
PMD Technologies, 8, 9
principal point, 150
project natal, 5, 9
pulsed light, 7, 9
punched card, 3
range flow motion constraint, 109
reflectance model, 35
reverse-mapped z-buffer technique, 145
segmentation, 49
self-organizing maps, 59
shading constraint, 33
shape from focus, 13
shape from shading, 12, 33
shape prior, 36
signal-to-noise ratio, 21, 42, 53, 54
slideshow presentation, 129
SoftDoubleMaxMinOver, 101
Softkinetic, 108
Sony, 4
speech recognition, 4
spherical coordinates, 112
SR3000, 44, 76, 91, 96, 126
SR4000, 8, 61, 64, 113, 132
stereoscopy, 11, 34
structure tensor, 110
structured light, 12
support vector machine, 96
surface normal, 37, 151
temporal resolution, 4
tessellation, 38, 151
time-of-flight camera, 4, 7: accuracy, 21; limitations, 23; multi-camera setups, 26; systematic errors, 24
TOF camera, see time-of-flight camera
topology, 63
touch screen, 4
triangulation, 11
TriDiCam, 9
weak-perspective, 76, 84
Wii, 4
XBox 360, 5
ZCam, 9