3D Gesture Recognition and Tracking for Next Generation of Smart Devices
3D Gesture Recognition and Tracking for Next Generation of Smart Devices: Theories, Concepts, and Implementations

SHAHROUZ YOUSEFI

Department of Media Technology and Interaction Design (MID)
School of Computer Science and Communication (CSC)
KTH Royal Institute of Technology
SE-100 44, Stockholm, Sweden
Author's e-mail: [email protected]

Doctoral Thesis in Media Technology
Stockholm, February 2014

Academic dissertation which, with the permission of KTH Royal Institute of Technology, is submitted for public examination for the degree of Doctor of Technology in Media Technology, on Monday 17 March 2014 at 13:15 in hall F3, Lindstedsvägen 26, KTH Royal Institute of Technology, Stockholm.

TRITA-CSC-A-2014-02
ISSN-1653-5723
ISRN-KTH/CSC/A--14/02-SE
ISBN-978-91-7595-031-0

Copyright © 2014 by Shahrouz Yousefi. All rights reserved.
Typeset in LaTeX by Shahrouz Yousefi
E-version available at http://kth.diva-portal.org
Printed by E-print AB, Stockholm, Sweden, 2014
Distributor: KTH School of Computer Science and Communication

Abstract

The rapid development of mobile devices during the past decade has been driven largely by interaction and visualization technologies. Although touchscreens have significantly enhanced interaction technology, it is foreseeable that with future mobile devices, e.g., augmented reality glasses and smart watches, users will demand more intuitive inputs such as free-hand interaction in 3D space. Specifically, for manipulation of digital content in augmented environments, 3D hand/body gestures will be essential. Therefore, 3D gesture recognition and tracking are highly desired features for interaction design in future smart environments.
Due to the complexity of hand/body motions and the limited capacity of mobile devices for expensive computations, 3D gesture analysis is still an extremely difficult problem to solve. This thesis aims to introduce new concepts, theories, and technologies for natural and intuitive interaction in future augmented environments. The contributions of this thesis support the concept of bare-hand 3D gestural interaction and interactive visualization on future smart devices. The introduced technical solutions enable effective interaction in the 3D space around the smart device. Highly accurate and robust 3D motion analysis of hand/body gestures is performed to facilitate 3D interaction in various application scenarios. The proposed technologies enable users to control, manipulate, and organize digital content in 3D space.

Keywords: 3D gestural interaction, gesture recognition, gesture tracking, 3D visualization, 3D motion analysis, augmented environments.

Shahrouz Yousefi
February 2014

Summary (Sammanfattning)

The rapid development of mobile devices during the past decade has largely been driven by interaction and visualization technology. Although touchscreens have considerably improved interaction technology, it is foreseeable that with future mobile devices, e.g., augmented reality glasses and smart watches, users will demand more intuitive ways of interacting, such as free-hand interaction in 3D space. This will be especially important for manipulation of digital content in augmented environments, where 3D hand/body gestures will be essential. Therefore, 3D gesture recognition and tracking are highly desired features for interaction design in future smart environments. Due to the complexity of hand/body movements and the limitations of mobile devices in expensive computations, 3D gesture analysis is still a very difficult problem to solve.
The thesis aims to introduce new concepts, theories, and techniques for natural and intuitive interaction in future augmented environments. The contributions of this thesis support the concept of bare-hand 3D gestural interaction and interactive visualization on future smart devices. The introduced technical solutions enable effective interaction in the 3D space around the smart device. Highly accurate and robust 3D motion analysis of hand/body gestures is performed to facilitate 3D interaction in various application scenarios. The proposed technologies enable users to control, manipulate, and organize digital content in 3D space.

Keywords: 3D gestural interaction, gesture recognition, gesture tracking, 3D visualization, 3D motion analysis, augmented environments.

Shahrouz Yousefi
February 2014

Acknowledgements

First of all, I wish to express my sincere gratitude to my main advisor, Prof. Haibo Li, for providing me with this research opportunity. Thank you for your motivation, enthusiasm, and support during these years. Without your supervision and mentoring this thesis would not have been possible. You inspired me to be more adventurous in research.

I would like to thank my second advisor, Dr. Li Liu, for all the motivational and fruitful discussions.

Special thanks to my dear friend and colleague, Farid Kondori. We had many collaborations, interesting discussions, and enjoyable moments during these years.

I would like to thank my former colleagues at Digital Media Lab, Umeå University, for their helpful suggestions and comments on my research projects. Special thanks to Annemaj Nilsson, Mona-Lisa Gunnarsson, and the friendly staff of the Department of Applied Physics and Electronics, Umeå University.

My time at KTH was really enjoyable thanks to the friendly colleagues of the Department of Media Technology and Interaction Design. I am grateful for the time spent with them at work meetings, seminars, and social events. I must especially thank Prof.
Ann Lantz for providing an excellent research environment at the MID department. Thanks for your support, encouragement, and kindness. I would also like to thank Henrik Artman, Cristian Bogdan, Ambjörn Naeve, Olle Bälter, Eva-Lotta Sallnäs, and other senior researchers at the MID department for their support and guidance.

Many thanks go to Dr. Roberto Bresin and Prof. Yngve Sundblad for reviewing my thesis. Your constructive ideas, insightful comments, and suggestions greatly improved the quality of my PhD thesis.

Winning the first prize in the KTH Innovation Idea Competition, the award for best project work in the Uminova Academic Business Challenge, and being selected as one of the top PhD works at the ACM Multimedia Doctoral Symposium motivated me to work harder on the development of my research ideas. I would especially like to thank Håkan Borg and Cecilia Sandell from KTH Innovation for their great support on patentability analysis, business development, and commercialization of my research results.

Finally, and most importantly, I am grateful to my loving parents, my brother, and his family for giving me endless intellectual support and encouragement to pursue my studies during these years. I would especially like to thank my best friend and companion Shora. Thanks for the wonderful and precious moments we shared together.

Shahrouz Yousefi
February 2014

Contents

1 Introduction
  1.1 Motivation
  1.2 Research Problem
    1.2.1 Future Mobile Devices
    1.2.2 Experience Design
    1.2.3 Limitations in Interaction Facilities
    1.2.4 Limitations in Visualization
    1.2.5 Technical Challenges in 3D Gestural Interaction
  1.3 Future Trends in Multimedia Context
    1.3.1 3D Interaction Technology
    1.3.2 3D Visualization
    1.3.3 Passive Vision to Active/Interactive Vision
    1.3.4 Gesture Analysis: from Computer Vision Methods to Image-based Search Methods
  1.4 Research Strategy
2 Related Work
  2.1 Terminology
  2.2 Related Work
    2.2.1 3D Motion Capture Technologies in Available Interactive Systems
      2.2.1.1 Passive Motion Tracking and Its Applications
      2.2.1.2 Active Motion Tracking and Its Applications
      2.2.1.3 Comparison Between Active and Passive Methods
    2.2.2 3D Motion Estimation for Mobile Interaction
    2.2.3 3D Gesture Recognition and Tracking
    2.2.4 3D Visualization on Mobile Devices
3 General Concept and Methodology
  3.1 General Concept
    3.1.1 Interaction/Visualization Space
    3.1.2 Sharing the Interaction/Visualization Space
      3.1.2.1 Single-user, Single-device
      3.1.2.2 Multi-user, Multi-device with Shared Interaction Space
      3.1.2.3 Multi-user, Single-device with Shared Visualization Space
      3.1.2.4 Interaction from Different Locations for Multi-user Multi-device
  3.2 Evolution of Interaction/Visualization Spaces
  3.3 Enabling Media Technologies
    3.3.1 Vision-based Motion Tracking in 3D Space
    3.3.2 3D Visualization
  3.4 Methodology Overview
  3.5 Gesture Analysis through the Pattern Recognition Methods
  3.6 Gesture Analysis through the Large-scale Image Retrieval
4 Enabling Media Technologies
  4.1 Gesture Detection and Tracking Based on Low-level Pattern Recognition
    4.1.1 3D Motion Analysis
  4.2 Gesture Detection and Tracking Based on Gesture Search Engine
    4.2.1 Providing the Database of Gesture Images
    4.2.2 Query Processing and Matching
    4.2.3 Scoring System
    4.2.4 Quality of Hand Gesture Database
  4.3 Interactive 3D Visualization
  4.4 Methods for 3D Visualization
    4.4.1 Depth Recovery and 3D Visualization from a Single View
    4.4.2 3D Visualization from Multiple 2D Views
  4.5 3D Channel Coding
5 Experimental Results
  5.1 Experiments on Gesture Detection, Tracking and 3D Motion Analysis
    5.1.1 Camera and Experiment Condition
    5.1.2 Algorithm
    5.1.3 Programming Environment and Results
  5.2 Experiments on Gesture Search Framework
    5.2.1 Constructing the Database
    5.2.2 Forming the Vocabulary Table