
3D Gesture Recognition and Tracking for Next Generation of Smart Devices

Theories, Concepts, and Implementations

SHAHROUZ YOUSEFI
Department of Media Technology and Interaction Design
School of Computer Science and Communication
KTH Royal Institute of Technology

Doctoral Thesis in Media Technology

Stockholm, February 2014

3D Gesture Recognition and Tracking for Next Generation of Smart Devices: Theories, Concepts, and Implementations
Shahrouz Yousefi
Department of Media Technology and Interaction Design (MID)
School of Computer Science and Communication (CSC)
KTH Royal Institute of Technology
SE-100 44, Stockholm, Sweden
Author’s e-mail: [email protected]

Akademisk avhandling som med tillstånd av Kungliga Tekniska Högskolan framläggs till offentlig granskning för avläggande av Teknologie Doktorsexamen i Medieteknik, måndagen den 17 mars 2014 kl 13:15 i sal F3, Lindstedsvägen 26, Kungliga Tekniska Högskolan, Stockholm.

TRITA-CSC-A-2014-02 ISSN-1653-5723 ISRN-KTH/CSC/A–14/02-SE ISBN-978-91-7595-031-0

Copyright © 2014 by Shahrouz Yousefi, All rights reserved.
Typeset in LaTeX by Shahrouz Yousefi
E-version available at http://kth.diva-portal.org
Printed by E-print AB, Stockholm, Sweden, 2014
Distributor: KTH School of Computer Science and Communication

Abstract

The rapid development of mobile devices during the recent decade has been greatly driven by interaction and visualization technologies. Although touchscreens have significantly enhanced the interaction technology, it is predictable that with future mobile devices, e.g., augmented reality glasses and smart watches, users will demand more intuitive inputs such as free-hand interaction in 3D space. Specifically, for manipulation of the digital content in augmented environments, 3D hand/body gestures will be essential. Therefore, 3D gesture recognition and tracking are highly desired features for interaction design in future smart environments. Due to the complexity of hand/body motions, and the limitations of mobile devices in performing expensive computations, 3D gesture analysis is still an extremely difficult problem to solve.
This thesis aims to introduce new concepts, theories and technologies for natural and intuitive interaction in future augmented environments. The contributions of this thesis support the concept of bare-hand 3D gestural interaction and interactive visualization on future smart devices. The introduced technical solutions enable effective interaction in the 3D space around the smart device. Highly accurate and robust 3D motion analysis of hand/body gestures is performed to facilitate 3D interaction in various application scenarios. The proposed technologies enable users to control, manipulate, and organize digital content in 3D space.

Keywords: 3D gestural interaction, gesture recognition, gesture tracking, 3D visualization, 3D motion analysis, augmented environments.

Shahrouz Yousefi
February 2014

Sammanfattning

Den snabba utvecklingen av mobila enheter under det senaste decenniet har i stor utsträckning drivits av interaktions- och visualiseringsteknologi. Även om pekskärmar avsevärt har förbättrat interaktionstekniken är det förutsägbart att med framtida mobila enheter, t.ex. augmented reality-glasögon och smarta klockor, kommer användare kräva mer intuitiva sätt att interagera, såsom t.ex. frihandsinteraktion i 3D-rymden. Speciellt viktigt blir det vid manipulation av digitalt innehåll i utökade miljöer där 3D hand-/kroppsgester kommer att vara ytterst nödvändiga. Därför är 3D-gestigenkänning och spårning högt önskade egenskaper för interaktionsdesign i framtida smarta miljöer. På grund av komplexiteten i hand-/kroppsrörelser, och begränsningar hos mobila enheter vid dyra beräkningar, är 3D-gestanalys fortfarande ett mycket svårt problem att lösa.
Avhandlingen syftar till att införa nya begrepp, teorier och tekniker för naturlig och intuitiv interaktion i framtida utökade miljöer. Bidragen från denna avhandling stöder begreppet 3D-gestinteraktion med bara händer och interaktiv visualisering på framtida smarta enheter. De införda tekniska lösningarna möjliggör effektiv interaktion i 3D-rymden runt den smarta enheten. Hög noggrannhet och robust 3D-rörelseanalys av hand-/kroppsgester utförs för att underlätta 3D-interaktion i olika tillämpningsscenarier. De föreslagna teknikerna gör det möjligt för användare att kontrollera, manipulera och organisera digitalt innehåll i 3D-rymden.

Nyckelord: 3D-gestinteraktion, gestigenkänning, gestspårning, 3D-visualisering, 3D-rörelseanalys, utökade miljöer.

Shahrouz Yousefi
February 2014

Acknowledgements

First of all, I wish to express my sincere gratitude to my main advisor, Prof. Haibo Li, for providing me this research opportunity. Thank you for your motivation, enthusiasm, and support during these years. Without your supervision and mentoring this thesis would not have been possible. You inspired me to be more adventurous in research. I would like to thank my second advisor, Dr. Li Liu, for all the motivational and fruitful discussions.
Special thanks to my dear friend and colleague, Farid Kondori. We had many collaborations, interesting discussions and enjoyable moments during these years. I would like to thank my former colleagues at Digital Media Lab, Umeå University, for their helpful suggestions and comments on my research projects. Special thanks to Annemaj Nilsson, Mona-Lisa Gunnarsson, and the friendly staff of the department of Applied Physics and Electronics, Umeå University.
My time at KTH was really enjoyable due to the friendly colleagues of the department of Media Technology and Interaction Design. I am grateful for the time spent with them at work meetings, seminars and social events. I must especially thank Prof. Ann Lantz for providing an excellent research environment at the MID department. Thanks for your support, encouragement and kindness. I would also like to thank Henrik Artman, Cristian Bogdan, Ambjörn Naeve, Olle Bälter, Eva-Lotta Sallnäs, and other senior researchers at the MID department for their support and guidance.
Many thanks should go to Dr. Roberto Bresin and Prof. Yngve Sundblad for reviewing my thesis. Your constructive ideas, insightful comments, and suggestions made a great improvement in the quality of my PhD thesis. Winning the first prize in the KTH Innovation Idea Competition, best project work in the Uminova Academic Business Challenge, and being selected as one of the top PhD works at the ACM Multimedia Doctoral Symposium motivated me to work harder on the development of my research ideas.
I would especially like to thank Håkan Borg and Cecilia Sandell from KTH Innovation for their great support on patentability analysis, business development and commercialization of my research results.
Finally, and most importantly, I am grateful to my loving parents, my brother, and his family for giving me endless intellectual support and encouragement to pursue my studies during these years. I would especially like to thank my best friend and companion Shora. Thanks for the wonderful and precious moments we shared together.

Shahrouz Yousefi
February 2014

Contents

Contents v

1 Introduction 1
1.1 Motivation ...... 1
1.2 Research Problem ...... 3
1.2.1 Future Mobile Devices ...... 4
1.2.2 Experience Design ...... 4
1.2.3 Limitations in Interaction Facilities ...... 6
1.2.4 Limitations in Visualization ...... 8
1.2.5 Technical Challenges in 3D Gestural Interaction ...... 8
1.3 Future Trends in Multimedia Context ...... 10
1.3.1 3D Interaction Technology ...... 10
1.3.2 3D Visualization ...... 10
1.3.3 Passive Vision to Active/Interactive Vision ...... 10
1.3.4 Gesture Analysis: from Pattern Recognition Methods to Image-based Search Methods ...... 11
1.4 Research Strategy ...... 11

2 Related Work 13
2.1 Terminology ...... 13
2.2 Related Work ...... 14
2.2.1 3D Technologies in Available Interactive Systems ...... 15

v CONTENTS

2.2.1.1 Passive Motion Tracking and Its Applications ...... 15
2.2.1.2 Active Motion Tracking and Its Applications ...... 16
2.2.1.3 Comparison Between Active and Passive Methods ...... 16
2.2.2 3D Motion Estimation for Mobile Interaction ...... 18
2.2.3 3D Gesture Recognition and Tracking ...... 19
2.2.4 3D Visualization on Mobile Devices ...... 21

3 General Concept and Methodology 23
3.1 General Concept ...... 23
3.1.1 Interaction/Visualization Space ...... 25
3.1.2 Sharing the Interaction/Visualization Space ...... 27
3.1.2.1 Single-user, Single-device ...... 27
3.1.2.2 Multi-user, Multi-device with Shared Interaction Space ...... 28
3.1.2.3 Multi-user, Single-device with Shared Visualization Space ...... 28
3.1.2.4 Interaction from Different Locations for Multi-user Multi-device ...... 28
3.2 Evolution of Interaction/Visualization Spaces ...... 28
3.3 Enabling Media Technologies ...... 31
3.3.1 Vision-based Motion Tracking in 3D Space ...... 32
3.3.2 3D Visualization ...... 33
3.4 Methodology Overview ...... 36
3.5 Gesture Analysis through the Pattern Recognition Methods ...... 38
3.6 Gesture Analysis through the Large-scale Image Retrieval ...... 40

4 Enabling Media Technologies 43
4.1 Gesture Detection and Tracking Based on Low-level Pattern Recognition ...... 44


4.1.1 3D Motion Analysis ...... 46
4.2 Gesture Detection and Tracking Based on Gesture Search Engine ...... 48
4.2.1 Providing the Database of Gesture Images ...... 49
4.2.2 Query Processing and Matching ...... 50
4.2.3 Scoring System ...... 50
4.2.4 Quality of Hand Gesture Database ...... 52
4.3 Interactive 3D Visualization ...... 54
4.4 Methods for 3D Visualization ...... 56
4.4.1 Depth Recovery and 3D Visualization from a Single View ...... 56
4.4.2 3D Visualization from Multiple 2D Views ...... 57
4.5 3D Channel Coding ...... 57

5 Experimental Results 59
5.1 Experiments on Gesture Detection, Tracking and 3D Motion Analysis ...... 59
5.1.1 Camera and Experiment Condition ...... 59
5.1.2 ...... 60
5.1.3 Programming Environment and Results ...... 62
5.2 Experiments on Gesture Search Framework ...... 63
5.2.1 Constructing the Database ...... 63
5.2.2 Forming the Vocabulary Table ...... 65
5.2.3 Gesture Search Engine and Neighborhood Analysis ...... 66
5.2.4 Gesture Search Results ...... 66
5.3 Technical Comparison between the Prior Art and the Proposed Solutions ...... 68
5.4 3D Rendering and Graphical Interface ...... 69
5.5 Research Scenarios ...... 71
5.5.1 Implementation of the 3D Gestural Interaction on Mobile Platform ...... 71
5.5.2 Implementation of the Interactive 3D Vision on a Wall-sized Display ...... 72


5.5.3 3D Rendering and Visualization of 2D Content ...... 73
5.6 Potential Applications ...... 74
5.6.1 3D Photo Browsing ...... 75
5.6.2 Virtual/Augmented Reality ...... 75
5.6.3 Interactive 3D Display ...... 76
5.6.4 Medical Applications ...... 76
5.6.5 3D Games ...... 76
5.6.6 and Reconstruction ...... 77
5.6.7 Wearable AR Displays ...... 77
5.7 Usability Analysis in Object Manipulation: Interaction vs. 3D Gestural Interaction ...... 77
5.7.1 User Test ...... 79
5.7.2 Usability Results ...... 80

6 Concluding Remarks and Future Direction 83
6.1 Contributions ...... 83
6.1.1 Conceptual Models for Future Human Mobile Device Interaction ...... 84
6.1.2 Technical Contributions for 3D Gestural Interaction and 3D Interactive Visualization ...... 84
6.1.3 Implementations ...... 85
6.2 Concluding Remarks and Future Direction ...... 86
6.2.1 Technical Challenges ...... 88
6.2.1.1 Active vs. Passive Motion Capture ...... 88
6.2.1.2 Gesture Detection and Tracking without Intelligence ...... 89
6.2.1.3 Adaptability of the Contributions to Future Hardware Evolution ...... 89
6.2.1.4 Contributions of other Research Areas to Computer Vision ...... 90
6.2.2 Further Development ...... 90
6.2.2.1 Concept of Collaborative 3D Interaction ...... 91


6.2.2.2 Concept of Interaction in the Space using Body Gestures ...... 91
6.2.2.3 Extension of the Gesture Search Framework to Extremely Large Scale ...... 91
6.2.3 Future of Mobile Interaction and Visualization ...... 92

7 Summary of the Selected Articles 95
7.1 List of Publications ...... 97

8 Paper I:

Experiencing Real 3D Gestural Interaction with Mobile Devices 105
8.1 Abstract ...... 106
8.2 Introduction ...... 106
8.3 Related Work ...... 108
8.4 System Description ...... 110
8.4.1 Gesture Detection and Tracking ...... 111
8.4.2 Local Orientation and Double-angle Representation ...... 111
8.4.3 Rotational Symmetries Detection ...... 113
8.4.4 3D Structure from Motion ...... 115
8.4.5 Finger Detection and Tracking ...... 116
8.4.5.1 Fingertip Detection ...... 117
8.4.5.2 Localization by Clustering ...... 118
8.4.5.3 ...... 119
8.4.6 3D Coding and Visualization ...... 119
8.5 Experimental Results ...... 121
8.6 Usability of the Proposed System ...... 124
8.7 Conclusion ...... 125

9 Paper II:


3D Photo Browsing for Future Mobile Devices 133
9.1 Abstract ...... 134
9.2 Motivation ...... 134
9.3 Challenges ...... 135
9.4 Enabling Media Technologies ...... 137
9.4.1 Vision-based Motion Tracking in 3D Space ...... 137
9.4.2 3D Visualization ...... 138
9.5 Design of the 3D Photo Browser ...... 139
9.6 Technical Contributions ...... 142
9.6.1 Gesture Detection and Tracking ...... 142
9.6.2 3D Motion Analysis ...... 143
9.6.3 Methods for 3D Visualization ...... 143
9.7 Concluding Remarks ...... 144
9.8 Future Work ...... 144

10 Paper III:

Bare-hand Gesture Recognition and Tracking through the Large-scale Image Retrieval 147
10.1 Abstract ...... 148
10.2 Introduction ...... 148
10.3 Related Work ...... 150
10.4 System Description ...... 152
10.4.1 Pre-processing on the Database ...... 152
10.4.1.1 Position/Orientation Tagging to the Database ...... 152
10.4.1.2 Defining and Filling the Edge-orientation Table ...... 155
10.4.2 Query Processing and Matching ...... 156
10.4.2.1 Direct Scoring ...... 156
10.4.2.2 Reverse Scoring ...... 159
10.4.2.3 Weighting the Second Level Top Matches ...... 160
10.4.2.4 Dimensionality Reduction for Motion Path Analysis ...... 160


10.4.2.5 Motion Averaging ...... 161
10.5 Experimental Results ...... 162
10.5.1 Dimensionality Reduction for Selective Search ...... 166
10.6 Conclusion and Future Work ...... 167

11 Paper IV:

Interactive 3D Visualization on a 4K Wall-Sized Display 173
11.1 Abstract ...... 174
11.2 Introduction and Related Work ...... 174
11.3 3D Motion Analysis ...... 178
11.4 Error Analysis in 3D Motion Estimation ...... 180
11.5 Experimental Results ...... 182
11.5.1 Visualization on a 4K Wall-sized Display ...... 183
11.6 Conclusion and Future Work ...... 183

12 Paper V:

3D Visualization of Single Images Using Patch Level Depth 187
12.1 Abstract ...... 188
12.2 Introduction ...... 188
12.3 Related Work ...... 189
12.4 Monocular Features for Depth Estimation ...... 190
12.5 Feature Vector ...... 191
12.6 MRF and Depth Map Recovery ...... 193
12.7 Depth Normalization and Pixel Level Translation ...... 194
12.8 Anaglyph 3D Coding ...... 195
12.9 Experimental Results ...... 196
12.10 Conclusion ...... 198

13 Paper VI:


Stereoscopic Visualization of Monocular Images in Photo Collections 201
13.1 Abstract ...... 202
13.2 Introduction ...... 202
13.3 Related Work ...... 203
13.4 System Description ...... 204
13.4.1 SIFT Feature Detection and Matching ...... 204
13.4.2 Image Transformation ...... 205
13.4.3 Image Projection and Stereoscopic Adjustment ...... 206
13.4.4 3D Coding and Visualization ...... 207
13.5 Experimental Results ...... 208
13.6 Conclusion ...... 210

14 Paper VII:

Robust Correction of 3D Geo-Metadata in Photo Collections by Forming a Photo Grid 213
14.1 Abstract ...... 214
14.2 Introduction ...... 214
14.3 Related Work ...... 216
14.4 System Overview ...... 216
14.5 System Description ...... 218
14.5.1 Pre-processing ...... 218
14.5.2 Structure from Motion ...... 219
14.5.3 Uncertainty Analysis ...... 221
14.5.4 Signal Model ...... 223
14.5.5 Measurement Model ...... 224
14.5.6 Data Fusion ...... 225
14.6 Experimental Results ...... 225
14.7 Discussion and Conclusion ...... 226

Bibliography 235


Chapter 1

Introduction

1.1 Motivation

Mobile devices play an important role in the modern world. Beyond the ordinary use in daily life, they are being used by people for various advanced purposes in scientific areas, entertainment, education, medical applications, communication, gaming, etc. The fast-growing market of mobile devices reveals that the sales of mobile devices are overtaking PCs. Recent statistics on the mobile device market indicate that total smartphone sales reached 490 million units in 2011 and 700 million units in 2012 across the globe [1, 2]. With the current rate of growth, the sales of smartphones will exceed 1.5 billion in 2017 [3]. In addition to this enormous number, we should take into account the other types of mobile devices such as tablets, advanced portable gaming devices, digital cameras, camcorders, multimedia players, smart watches, and augmented reality glasses.

The capability of mobile devices in capturing, storing, processing, and visualization of multimedia content has increased significantly during recent years. In addition to high-resolution cameras, other embedded sensors such as GPS, accelerometer, gyroscope and magnetometer provide a chance to collect extra metadata and integrate them in a wide range of application scenarios. Moreover, a variety of sensors might be considered as alternative input facilities for interaction between users and their mobile devices.

The introduction of smartphones has changed the way we interact with mobile phones. Nowadays, people interact with their mobile devices through touchscreen displays. The current technology offers single- or multi-touch gestural interaction on 2D touchscreens. This approach is designed to provide a more natural interaction when users are operating their mobile devices. On the touchscreen of a smartphone, users can manipulate a soft keyboard and virtual objects, and perform actions just by moving their fingers. Although this technology has solved many limitations in human mobile device interaction, the recent trend in the digital world reveals that people always prefer intuitive experiences with their digital devices. For instance, the popularity of the Microsoft Kinect demonstrates that people enjoy experiences that give them the freedom to act like they would in the real world.

Rapid development and wide adoption of smartphones have greatly changed our lives. Nowadays, we rely more and more on our smartphones. It is a strong trend that the smartphone will become a part of our body. An indicative example is Google Glass, which can be seen as a version of the next generation of smartphones. Most probably, for next generation smartphones, users will no longer be satisfied with just performing interaction over a 2D touchscreen; they will demand more natural interactions performed by the bare hands in 3D free space, at the back of the phone, or in front of the smart device, for instance. Thus, the next generation of smart devices will need a gesture interface that lets the bare hands manipulate digital objects directly, for instance, playing Spotify, scanning photo collections, and reading emails. Due to the strong indications and current trends, mobile devices will be an essential, inseparable part of our body in the near future.

In fact, smartphones, tablets or wearable augmented reality glasses will not be just ordinary devices. They will bring any experience to the personalized visualization space from the huge sea of information. For instance, a mobile device might be a guitar, fitness trainer, home theater, shopping center, navigation system, game console or thousands of other possible scenarios.


Currently the major discussion is how we interact with the mobile device, while in the near future we should also consider how we interact through the mobile device with the physical space, objects, information, etc. However, when we discuss the next generation of mobile devices, we should consider the next generation of interaction facilities too. The important question is: in which space, and how, will we interact with and through our future mobile devices?

1.2 Research Problem

Design of the interaction experience for future mobile devices incorporates many challenging problems. The rapid growth in the technology of mobile devices shows that in the near future we will have extremely powerful handheld or wearable devices. Although it is rather hard to exactly predict the hardware capabilities and features of future mobile devices, current trends in multimedia technology indicate that interaction with future devices should happen in a more intuitive and natural manner. Here, some important scientific questions might be considered. First, in which space and how should the intuitive interaction happen? Will touchscreens and track pads be replaced by other input facilities? And how should we design a new space for intuitive interaction with future mobile devices? Intuitive interaction is highly related to the mental connection of humans to their natural experiences. Since humans interact with their environment through physical gestures, 3D hand/body gestures might be effective alternatives to existing interaction facilities.
Assume that the new interaction space is designed and introduced. Now the main challenge is how to technically support this concept. What types of technologies are required to perform this significant change? What are the limitations of media technologies in performing 3D gestural interaction? How can we detect, recognize and track complex hand gestures, head motions and body movements in 3D space? And how can enabling media technologies solve the technical problems?


However, from both design and technical perspectives, introducing new ways of interacting with future mobile devices seems to be an extremely challenging task. This thesis aims to tackle these challenges and introduce new concepts, designs and technical solutions for the mentioned issues. These challenges are discussed in detail in the following sections.

1.2.1 Future Mobile Devices

In the discussion of future mobile devices we have to consider some important points. Five to ten years from now, we will most probably be faced with substantial changes in mobile technology. From a hardware point of view, future mobile devices will feature more advanced and powerful components such as various types of sensors, high-speed processors, 3D displays, and huge memories. It seems that the user experience in interaction with mobile devices will be quite different from now. Therefore, designing any interactive application for future mobile devices needs extensive investigation and research. From a design point of view, the interaction environment and visualization quality will change substantially in the near future. Here, the main challenge is how to design a usable system to enhance the user experience in interaction with future mobile devices.

1.2.2 Experience Design

In the multimedia context, experience is defined as the sensation of interaction with a product, service or event [4]. Therefore, in experience design for mobile users, the quality and sensation of interaction on physiological, affective, and cognitive levels should be taken into account. In fact, for a more convenient and desired experience, interaction between user and device should happen in a natural and effective manner. Unlike interaction with the physical world, where people use their body gestures, the best available technologies in smartphones and tablets rely on interaction on limited 2D touchscreens. This limitation stops users from having a natural interaction in a wide range of applications where using physical gestures for 3D manipulations might be unavoidable. For instance, picking, placing, grabbing, moving, pushing, zooming, and, in general, manipulating virtual menus, objects and graphics in 3D environments require physical hand gestures. In addition, because the interaction happens on the display, in practice, users' fingers or hands cover a large area or some parts of the screen while they are operating the device. As a result, they lose visibility of the display during the interaction. Since the hardware capability of mobile devices is increasing rapidly, the complexity of applications will increase as well. This means that in the near future we will interact with our digital devices in a quite different manner. Another important point to consider is visualization quality. For high quality user perception, realistic visualization is needed. This is the main idea behind the development of 3D display technologies such as 3D cinemas or TVs. It is predictable that in the future, multimedia content will be displayed in 3D format. Therefore, adaptation of old content to future visualization technologies should be considered. For instance, we need to find an effective way to convert our old multimedia collections, such as 2D photo albums and videos, to 3D, which is quite a challenging problem. However, experience design for future mobile devices seems to be a difficult task from both interaction and visualization perspectives.

Quality of user experience is rather a difficult concept to define, measure and evaluate. Although substantial research has been done on this subject, finding a straightforward method to measure the quality of experience (QoE) is still challenging. Usability is an important criterion to consider when we investigate the QoE concept. Usability might be perceived from three angles: efficiency, effectiveness and user satisfaction [5]. From a technical point of view, these three factors have been found to be more practical to measure and evaluate. Therefore, improving the usability factors might significantly enhance the quality of user experience in the multimedia context.


1.2.3 Limitations in Interaction Facilities

Designing interactive applications for mobile devices is still a challenging problem. Although from the processing point of view new devices are quite powerful, due to the limitations in size and weight for portability purposes, many problems remain unsolved. One major problem is how users can effectively communicate with their devices at the hardware level. The current technology provides several solutions to this problem. The commonly used hardware facilities for communicating with mobile devices are miniature keyboards, tiny joysticks and touchscreen displays [6].

Keyboards allow users to perform tasks through the menus, type, search, navigate, etc., but in reality, even a small-sized keyboard occupies a large space and limits the area for the display. Secondly, the usability of those keyboards is questionable for users with large fingers selecting tiny buttons. Substantial research has been done to reduce the size of keyboards, for example simplifying the devices by mapping the QWERTY keyboard to other formats or using few or single buttons [7, 8]. Swype is another known technique for enhancing interaction with a mobile device through the virtual keyboard. With a Swype virtual keyboard, the user enters words by sliding a finger from the first letter of a word to its last letter, lifting only between words.

Although joysticks are useful in some applications for scrolling up and down and selecting menus, they are very limited and difficult to work with, especially on small screens. Nowadays, touchscreen displays are used by most smartphones and tablet PCs, and the trend shows that buttonless devices are becoming more popular. This indicates that users prefer to work on larger screens, and designers allocate the whole device surface to the touchscreen display. On the other hand, touchscreen displays have several drawbacks. First, for typing scenarios a virtual keyboard is rendered on the screen, which occupies a large space for user convenience. Second, in most applications at least one hand works on the surface, which brings the occlusion problem, and in many cases both hands are involved. Therefore, the occlusion limits the visualization and the quality of experience is degraded.


In some contributions, a novel approach of touching the back of the device is presented [9]. Although this solution might work in limited scenarios, users generally lose the matching between visual perception and touch. The commonly used technology in interaction design for mobile devices is a 2D touchscreen display with a single physical button or without any physical button. Since humans interact with the physical world in 3D space, the quality of interaction is degraded when it is mapped to 2D surfaces. Technically speaking, in 3D space the motion is represented by six degrees of freedom (three rotation parameters and three translation parameters), while in the mapping from the real world to 2D screens, the motion parameters are reduced to two. On single-touch displays, motion is limited to translation in 2D (x and y) coordinates, but new products in the market use multi-touch gestures to simulate rotations and translations along the z-axis. However, even with the best interactive devices in the market, the motion parameters are limited on 2D screens. This means that without using extra buttons, 2D gestures, or the aid of embedded orientation sensors, manipulation around the x and y axes on 2D screens is not possible. Due to the fact that in magnetometer-aided applications the device itself should move, the visual content such as graphics, photos and videos might move out of the user's sight, so this approach is not applicable in most cases.
Another important group of mobile devices is the forthcoming augmented reality glasses, such as Google Glass. Since this type of gadget might be the future of smartphones, it is quite important to investigate how conveniently users can interact with them. Voice commands are one effective solution for commanding the device to take a set of actions. For instance, for dialing a number, searching a phrase, or capturing a photo, voice commands might be really useful.
For more complex tasks, such as writing a text, skimming emails or browsing photos, users definitely need more input facilities. Google has introduced a touchscreen bar on the side frame of the Glass to solve this problem. Although this small surface provides more capability for user interaction, it is obviously weaker than current smartphone touchscreens due to its size and invisibility to the user's sight.
However, designing usable and convenient input facilities for future mobile devices requires deep research and investigation. 3D gestural interaction in free space might be an effective alternative to the current interaction technology. Enabling media technologies can support user device interaction and enhance the user experience.
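The reduction from six degrees of freedom to two, noted above, can be made concrete with a minimal sketch. The code below is illustrative only (it is not from the thesis; NumPy and all function names are my own): it contrasts a full 3D rigid motion, parameterized by three rotation angles and three translations, with the two in-plane translation parameters a single-touch drag can express.

```python
import numpy as np

def rigid_transform(points, rotation_xyz, translation_xyz):
    """Apply a full 6-DOF rigid motion (3 rotations + 3 translations)
    to an (N, 3) array of 3D points. Angles are in radians."""
    ax, ay, az = rotation_xyz
    # Elementary rotations about the x, y and z axes.
    rx = np.array([[1, 0, 0],
                   [0, np.cos(ax), -np.sin(ax)],
                   [0, np.sin(ax),  np.cos(ax)]])
    ry = np.array([[ np.cos(ay), 0, np.sin(ay)],
                   [0, 1, 0],
                   [-np.sin(ay), 0, np.cos(ay)]])
    rz = np.array([[np.cos(az), -np.sin(az), 0],
                   [np.sin(az),  np.cos(az), 0],
                   [0, 0, 1]])
    r = rz @ ry @ rx
    return points @ r.T + np.asarray(translation_xyz)

def single_touch_transform(points, dx, dy):
    """A single-touch drag exposes only 2 of those 6 parameters:
    translation in the screen plane (x, y); depth and all rotations
    are out of reach without extra gestures or sensors."""
    return points + np.array([dx, dy, 0.0])
```

The four remaining parameters (translation in z and the three rotations) are exactly what multi-touch gestures or embedded orientation sensors try to simulate, as discussed above.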

1.2.4 Limitations in Visualization

In order to improve the user experience in multimedia applications, in addition to effective interaction, high quality and realistic visualization is required. The main reason behind manufacturing mobile devices with larger displays is to enhance the visual output and the quality of experience. Although size and mobility have been in a design trade-off for many years, due to the importance of visual interaction, mobile devices have become larger in screen size. Substantial experiments have been done to find the optimal and most effective size for different mobile devices [6]. A mobile device, as it is named, should be portable and easy to hold by its user, and this criterion brings the most challenging task of keeping the balance between portability and size, besides the power consumption issue, which is outside our discussion [10]. However, today's smartphones offer only a limited surface for visualization. If wearable smart glasses provide high quality visualization, they might significantly enhance the visual experience. 3D display technology is another feature that might improve the perception quality of future mobile devices.

1.2.5 Technical Challenges in 3D Gestural Interaction

3D gestural interaction is a rather new trend in the multimedia context, and substantial efforts have been made in this area. Specifically, 3D gestural interfaces are used in gaming and entertainment applications. One of the enabling technologies for building such gesture interfaces is hand tracking and gesture recognition. The major technology bottleneck lies in the difficulty of capturing and analyzing articulated hand motions. One existing solution is to employ glove-based devices, which directly measure the finger joint angles and spatial positions of the hand using a set of sensors (e.g., electromagnetic or fiber-optical sensors). Although such applications exist in human-computer interaction and 3D games, glove-based solutions are too intrusive, cumbersome, and expensive for natural interaction with mobile devices. To overcome this, vision-based hand motion capture and tracking solutions need to be developed. Capturing hand and finger motions in video sequences is a highly challenging task due to the large number of degrees of freedom (DOF) of the hand kinematics. Tracking articulated objects through sequences of images is one of the grand challenges in computer vision. Recently, Microsoft demonstrated how to capture full-body motions by means of its newly developed depth camera, the Kinect. The question is whether the problem of 3D hand tracking and gesture recognition can be solved by using 3D depth cameras. Of course, this problem has been greatly simplified by the introduction of real-time depth cameras. However, technologies based on 3D depth information for hand tracking and gesture recognition still face major challenges in mobile applications.

Mobile applications have at least two critical requirements: computational efficiency and robustness. For mobile applications, timely feedback and interaction are assumed: any latency should not be perceived as unnatural by the human participant. Therefore, the maximum time between the completion of a gestural action by a person and the response from the device must be no longer than 100 ms (at least 10 frames per second should be processed in real-time vision-based systems). This requires an extremely fast solution for hand tracking and gesture recognition.
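The 100 ms bound translates directly into a per-frame processing budget for the whole pipeline. The sketch below illustrates this budgeting; the stage names and timings are illustrative assumptions, not measured values from any real system:

```python
# Check whether a vision pipeline meets the 100 ms end-to-end latency
# bound, i.e. sustains at least 10 processed frames per second.
# Stage timings (milliseconds) are illustrative assumptions.
stage_ms = {
    "capture": 15,          # grab a frame from the camera
    "detection": 40,        # locate the hand gesture in the frame
    "pose_estimation": 25,  # recover the 3D motion parameters
    "render": 10,           # update the displayed content
}

total_ms = sum(stage_ms.values())  # end-to-end latency per frame
fps = 1000.0 / total_ms            # achievable frame rate

print(f"latency: {total_ms} ms, {fps:.1f} fps")
assert total_ms <= 100, "pipeline exceeds the 100 ms interaction budget"
```

With these assumed timings the budget is met with 10 ms to spare; doubling the detection stage would already violate the constraint, which is why detection dominates the design of mobile gesture pipelines.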
It is doubtful whether most existing technical approaches, including the one used in the Kinect body-tracking system, will be the direction that leads the technical development of future mobile devices, due to their inherently resource-intensive nature. Another issue is robustness. Solutions for mobile applications should work reliably both indoors and outdoors. This may well exclude the possibility of using Kinect-type depth sensors in the next generation of mobile devices. Therefore,

we come back to our original problem again: how to solve hand tracking and gesture recognition with video cameras. A critical question is whether we can develop alternative video-based solutions to hand tracking and gesture recognition that fit future mobile applications better. Obviously, this question is important to address, since it is not only one of the fundamental problems in computer vision, but it would also have a potential impact on the mobile industry and, above all, on the interaction with mobile devices in the future.

1.3 Future Trends in Multimedia Context

Contributions of this thesis are highly inspired by the following key trends in the multimedia context.

1.3.1 3D Interaction Technology

The major direction of interaction technology is toward more intuitive and natural interaction between users and digital devices. This is the main reason that keyboards, joysticks, and other traditional input facilities have largely been replaced by track pads and touchscreens. The rapid development of new sensors for 3D interaction, such as the Microsoft Kinect, is another indication of this trend.

1.3.2 3D Visualization

Visualization quality has improved significantly during the recent decade. The major trend indicates that 2D displays are being replaced by 3D technology. Realistic perception and quality of experience are important features introduced by 3D display technology.

1.3.3 Passive Vision to Active/Interactive Vision

The introduction of wearable smart displays such as Google Glass reveals a new trend in the multimedia world. Augmented reality glasses and similar products will change the user's perception from passive to active or interactive vision. In fact, users might interact with the environment through the wearable display, receive information from various channels, and command the display.

1.3.4 Gesture Analysis: from Computer Vision Methods to Image-based Search Methods

Gesture detection, recognition, and tracking have mainly been treated as classical computer vision and pattern recognition problems. The capability of new devices to store and process large databases motivates the idea of solving these problems using image-based search approaches. Therefore, the development of search methods for visual content might be the future approach to gesture analysis.

1.4 Research Strategy

The main objective behind this research is to develop concepts and technologies for effective interaction with future mobile devices. In order to fulfill this objective, challenges from both design and technical aspects should be considered. This thesis aims to cover the concept of human-mobile device interaction, future challenges, and possible new frameworks to improve the quality of user experience. Afterwards, technical solutions to overcome these challenges are introduced and experimental results are demonstrated. The main research strategies towards achieving these goals can be summarized in the following items:

• Concepts of interaction and visualization spaces have been deeply investigated during this work. The idea of extending the interaction and visualization spaces to real 3D space, sharing the interaction/visualization spaces, and potential application scenarios are introduced.

• In order to support the future interaction/visualization concept, enabling media technologies have been deeply studied and widely used during this research.

• 3D gestural interaction is suggested as a powerful tool for future interaction technology. Different methods for gesture detection, recognition, tracking, and 3D motion analysis have been studied, and new methods supporting this concept have been developed during this research.

• Concepts of active and passive vision have been investigated and studied during this work. An interactive framework for 3D displays has been introduced. Moreover, new methods for 3D visualization of multimedia content have been developed.

• Various implementations of gestural interaction and 3D visualization, and different experiments, have been conducted on stationary and mobile platforms. Experimental results have been compared and final conclusions have been drawn.

• The future direction of mobile multimedia, potential application scenarios, enabling technologies, and new frameworks have been investigated.

Chapter 2

Related Work

2.1 Terminology

Nowadays, gesture-based interaction is a strong trend in the multimedia context. In general, effective 3D gestural interaction can be achieved by combining technical solutions in gesture analysis with the design of usable and efficient applications. Since the main focus of this thesis is on the technical aspects of gesture analysis, it is crucial to provide a comprehensive definition of the technical keywords and expressions that are frequently used in the discussions.

Gesture recognition: gesture recognition is the process of interpreting human gestures using mathematical models or computer vision algorithms. Gesture recognition is widely considered for communication between users and computers. Various hand gestures might be used for commanding digital devices in different tasks. In this context, gesture recognition is the process of differentiating between various hand gestures and assigning different labels to them. For instance, all variations of the grab gesture in different poses and orientations should be recognized as the grab gesture.

Gesture detection: the process of detecting the presence of a gesture pattern in an image frame is known as gesture detection. In this context, for a specific hand gesture, the detection output indicates the presence or absence of the gesture pattern in an image frame.

Gesture localization: the process of returning the estimated position of the detected gesture in an image frame is known as gesture localization. The location of the gesture might be returned through different parameters such as a bounding box, ellipse axes, or the center of mass.

Gesture tracking: the process of gesture localization in a video sequence is known as gesture tracking. Gesture tracking might be performed by localizing the gesture in each single frame of an image sequence. Alternatively, following the motion of the localized gesture across consecutive frames might be considered gesture tracking.
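The localization and tracking definitions above can be sketched in a few lines. This is a minimal pure-Python illustration on hand-made binary masks (1 = gesture pixel); a real system would, of course, obtain the masks from a detector running on camera frames:

```python
# Gesture localization (bounding box + center of mass) on a binary mask,
# and gesture tracking as per-frame localization over a sequence.

def localize(mask):
    """Return ((top, left, bottom, right), center_of_mass) or None."""
    pts = [(r, c) for r, row in enumerate(mask)
                  for c, v in enumerate(row) if v]
    if not pts:
        return None  # gesture absent in this frame
    rows = [p[0] for p in pts]
    cols = [p[1] for p in pts]
    bbox = (min(rows), min(cols), max(rows), max(cols))
    center = (sum(rows) / len(pts), sum(cols) / len(pts))
    return bbox, center

def track(frames):
    """The trajectory is the sequence of centers in consecutive frames."""
    return [loc[1] for f in frames if (loc := localize(f)) is not None]

frame1 = [[0, 1, 1],
          [0, 1, 1],
          [0, 0, 0]]
frame2 = [[0, 0, 0],
          [0, 1, 1],
          [0, 1, 1]]
print(track([frame1, frame2]))  # → [(0.5, 1.5), (1.5, 1.5)]
```

The gesture's center moves down by one row between the two frames, which is exactly the motion a tracker would report.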

Gesture pose estimation: estimating the position and orientation of the detected gesture with respect to the camera origin is pose estimation. In this thesis, 3D pose refers to position (three parameters) and orientation (three parameters) with respect to the camera coordinate system.
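The six pose parameters can be made concrete as a rigid transform: three Euler angles composed into a rotation matrix, plus a translation vector. The following is a schematic illustration of this convention (the z-y-x rotation order and the sample values are assumptions for illustration, not the thesis implementation):

```python
import math

# A 6-DOF pose: three rotation parameters (Euler angles about x, y, z)
# and three translation parameters, mapping a point from gesture
# coordinates into the camera coordinate system.

def rot_x(a):
    c, s = math.cos(a), math.sin(a)
    return [[1, 0, 0], [0, c, -s], [0, s, c]]

def rot_y(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, 0, s], [0, 1, 0], [-s, 0, c]]

def rot_z(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def apply_pose(point, rx, ry, rz, t):
    """Transform a 3D point by the pose (rx, ry, rz, t): R p + t."""
    R = matmul(rot_z(rz), matmul(rot_y(ry), rot_x(rx)))
    return [sum(R[i][k] * point[k] for k in range(3)) + t[i]
            for i in range(3)]

# Rotate 90 degrees about z and translate 10 cm along the camera axis.
p = apply_pose([1.0, 0.0, 0.0], 0.0, 0.0, math.pi / 2, [0.0, 0.0, 0.1])
print([round(v, 6) for v in p])  # → [0.0, 1.0, 0.1]
```

Recovering these six parameters from the camera image is precisely the pose estimation problem discussed throughout the thesis.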

3D gestural interaction: interaction between users and digital devices by means of hand/body gestures in 3D space is regarded as 3D gestural interaction. Gesture detection, localization, recognition, tracking, and 3D pose estimation are the essential components of gestural interaction.

2.2 Related Work

3D technology and the research areas around it have developed rapidly in recent years. Although substantial efforts have been made to equip devices with 3D technology, numerous problems and challenges remain. Nowadays, the main focus of 3D research is on 3D visualization. For instance, in 3D cinemas, 3D TVs, 3D digital cameras, and even 3D mobile phones, the main goal is to add 3D features to the visualization part. In this context, 3D technology is considered from two aspects: first, how the interaction between user and device happens in 3D space, and second, the visualization technology by which digital output is displayed in 3D format.
In the following sections, current motion capture and tracking technologies in interactive products and environments are reviewed first. Afterwards, the applicability of current technologies to mobile applications, as well as the available solutions and related works, are discussed.

2.2.1 3D Motion Capture Technologies in Available Interactive Systems

In recent years, 3D motion analysis has been found useful in various scenarios such as entertainment, virtual/augmented reality, and medical applications [11]. Different methods have been studied and introduced to effectively retrieve the motion parameters. Since the aim is to interact with the mobile device through the user's gestures in 3D space, different techniques for 3D motion tracking and analysis must be studied. If we successfully retrieve the 3D motion parameters with high accuracy, we will be able to design an effective interaction environment for manipulation of digital content. In the following sections, different approaches for analyzing the 3D motion captured by various sensors in different setups are introduced and compared.

2.2.1.1 Passive Motion Tracking and Its Applications

The most common method of motion analysis uses static cameras. This tracking method is known as passive: the cameras are static and the subjects move. The Microsoft Kinect is one example of passive motion analysis, where the sensor is mounted somewhere in the room and users move in front of it. The Kinect features an RGB camera and a depth sensor that provide full-body 3D motion capture and facial recognition [12]. Sony has also added motion capture to its game console [13]. The PlayStation Move performs motion capture through a handheld controller. This controller features a spherical glowing part that can shine in the full range of RGB colors. Based on the size and position of the glowing sphere as captured by the PlayStation camera, the 3D motion is accurately estimated [14, 15, 16]. The passive approach is widely used in medical applications to analyze patients' motions for diagnosing different types of physical disorders. Such systems usually use several expensive cameras mounted at different positions in the room, and wearable markers or special clothes with visible markers on the body joints, detected by the cameras from a distance [17]. In systems that work without any markers or wearable devices, additional sensors such as 3D cameras or depth/distance sensors are usually added to the installation or capturing device [17].
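The way the PlayStation camera infers depth from the glowing sphere's apparent size can be understood with a simple pinhole camera model: a sphere of known radius R that appears r pixels wide in an image taken with focal length f (in pixels) lies at distance Z = f R / r. The sketch below illustrates this relation; the focal length and sphere radius are illustrative assumptions, not the actual PlayStation Eye parameters:

```python
# Depth from apparent size under a pinhole camera model:
# image radius r = f * R / Z  =>  distance Z = f * R / r.

def distance_from_radius(f_px, sphere_radius_m, image_radius_px):
    """Estimate the distance (meters) of a sphere of known radius."""
    return f_px * sphere_radius_m / image_radius_px

f_px = 800.0  # focal length in pixels (assumed)
R = 0.02      # 2 cm sphere radius (assumed)
print(distance_from_radius(f_px, R, 40.0))  # ~0.4 m
print(distance_from_radius(f_px, R, 20.0))  # half the size -> ~0.8 m
```

Halving the apparent radius doubles the estimated distance, which is why a single RGB camera suffices for coarse depth once the marker's physical size is known.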

2.2.1.2 Active Motion Tracking and Its Applications

Active motion capture configurations estimate the 3D motion by using wearable devices. These wearable devices might be different types of sensors that measure 3D motion parameters such as orientation and acceleration changes, and they transmit this information to a base station for processing. The Wii MotionPlus game controller performs motion analysis in an active configuration: the device incorporates a gyroscope and an accelerometer to accurately capture and report the 3D motion [18]. In high-accuracy virtual reality and medical applications, similar types of sensors are extensively used on body joints [17].

2.2.1.3 Comparison Between Active and Passive Methods

Active motion analysis usually provides more accurate results due to its higher resolution in measuring the motion parameters. Since the sensors are mounted on body parts, they can measure body motions at a higher resolution compared with a passive installation, where motion is captured from a distance. Based on the research conducted in [19], the accuracy of active motion capture is about 10 times higher than that of passive motion capture where a single RGB


Figure 2.1: Passive and active motion tracking in gaming consoles; Left: The Microsoft Kinect; Middle: PlayStation 3 Move; Right: Wii MotionPlus.

camera is used for capturing and measuring the 3D motion. Although the measurement is highly dependent on the motion analysis techniques, this substantial difference shows that, for accuracy reasons, body-mounted sensors are preferred to a passive configuration. A major drawback of active motion capture is that wearable devices are usually uncomfortable for users. Moreover, many active motion capture systems use specific installations, expensive materials, and many sensors, which substantially increase the total cost. On the other hand, passive systems suffer from lower accuracy due to the error caused by distant motion estimation. Apart from passive systems with wearable markers, marker-less systems use natural gesture analysis and are convenient for users.
Since mobile devices are equipped with different types of sensors, it is possible to use them in either active or passive configurations. On one hand, when they are in motion, they can be used as active sensors (for instance, by reading the orientation sensor or analyzing the video input). On the other hand, in a static setup, they might be considered passive sensors; motion analysis of a moving object from the video input captured by the device's camera is an example of this configuration. This thesis focuses on gestural interaction behind the mobile device's camera. Basically, this scenario is similar to passive motion tracking using a vision sensor, but due to the close distance between the vision sensor and the user's gesture, it also presents features of active motion tracking.

2.2.2 3D Motion Estimation for Mobile Interaction

Currently, the most popular way to interact with mobile devices is through 2D touchscreens. As mentioned before, touchscreen displays have limitations in 3D application scenarios, and the idea of employing other types of sensors, such as orientation or vision sensors, provides an opportunity to enhance the quality of interaction. Generally, in HCI applications, different solutions have been used to analyze human body or gesture motion, and the information retrieved from motion analysis might be used to facilitate the interaction. Many solutions are based on marked gloves or markers on body joints (see Fig. 2.2) [20, 21, 22, 23, 24, 25, 26]. Some perform gesture analysis using depth sensors [27, 28, 29]. Model-based approaches have been used in many applications [30, 31]. Other solutions analyze the motion by means of shape or temperature sensors [32, 33, 34], etc. Almost all of these solutions are developed for stationary systems with powerful components; due to the limitations of mobile devices, most of them are not practical there. Limited power resources, cost, mobility, and size are important factors that make the design process for 3D interaction really difficult. New devices are equipped with different types of integrated sensors (orientation sensor, optical cameras, GPS, etc.). The question is whether it is possible to use them in an effective way to analyze 3D motion. Generally, the answer is yes. In many virtual reality and augmented reality applications, integrated sensors are used to control the motion. In [35], rendered graphics are controlled by the orientation sensor. In [36, 37], vision sensors are employed to detect hands, gestures, or different types of objects. The major weakness of all current technologies is their limitation in 3D motion analysis: most of them are limited to object detection algorithms for augmenting graphics or manipulating virtual objects.
The problem to be tackled is to analyze the full six-DOF motion in 3D space. Therefore, when real 3D interaction with mobile devices is discussed, it means


that all motion parameters in 3D space must be considered.

Figure 2.2: Motion-based interaction using wearable markers and gloves. Left: visual markers; Middle: T(ether), motion tracking glove; Right: ShapeHand, motion capture device.

2.2.3 3D Gesture Recognition and Tracking

Existing algorithms for hand tracking and gesture recognition can be grouped into two categories: appearance-based approaches and 3D hand model-based approaches. Appearance-based approaches rely on a direct comparison of hand gestures with 2D image features. Popular image features used to detect human hands and recognize gestures include hand colors and shapes, local hand features, optical flow, and so on. The earlier works on hand tracking belong to this type of approach [38, 39]. The drawback of these feature-based approaches is that clean image segmentation is generally required in order to extract the hand features, which is not a trivial task when the background is cluttered, for instance. Furthermore, human hands are highly articulated: it is often difficult to find local hand features due to self-occlusion, and some kind of heuristics is needed to handle the large variety of hand gestures. Instead of employing 2D image features to represent the hand directly, 3D hand model-based approaches use a 3D kinematic hand model to render hand poses. An analysis-by-synthesis (ABS) strategy is employed to recover the hand motion parameters by aligning the appearance projected by the 3D hand model with the observed image from the camera, and minimizing the discrepancy between them.

Generally, it is easier to achieve real-time performance with appearance-based approaches owing to their simpler 2D image features. However, these approaches can only handle simple hand gestures, like detection and tracking of fingertips. In contrast, 3D hand model-based approaches offer a rich description that potentially allows a wide class of hand gestures. The bad news is that the 3D hand model is a complex articulated deformable object with 27 DOF. To cover all the characteristic hand images under different views, a very large image database is required, and matching query images from the video input against all hand images in the database is time-consuming and computationally expensive. This is why most existing 3D hand model-based approaches focus on real-time tracking of global hand motions under restricted lighting and background conditions.

For general mobile applications, we need to cover the full range of hand gestures, so 3D hand model-based approaches seem more promising. To handle the challenging problem of exhaustive search in the high-dimensional space of hand configurations, efficient indexing technologies from the information retrieval field have been tested. Zhou et al. proposed an approach that integrates powerful text retrieval tools with computer vision techniques in order to improve the efficiency of hand image retrieval [40]; an Okapi-Chamfer matching algorithm based on the inverted index technique is used in their work. Athitsos et al. proposed a method that generates a ranked list of three-dimensional hand configurations that best match an input image [41]. Hand pose estimation is achieved by searching for the closest matches to an input hand image in a large database of synthetic hand images; the novelty of their system is its ability to handle the presence of clutter. Imai et al. proposed a 2D appearance-based method that uses hand contours to estimate 3D hand posture [42].
In their method, the variations of possible hand contours around the registered typical appearances are trained from a number of graphical images generated from a 3D hand model. A low-dimensional embedded manifold is created to overcome the high computational cost of the large number of appearance variations.

Although the methods based on text retrieval are very promising, they are too few to be visible in the field. The reason might be that the approach is still too preliminary, or that the results are not impressive because the tests have only been run on databases of very limited size. It might also be a consequence of the success of the Kinect in real-time human body gesture recognition and tracking: the statistical approaches adopted in the Kinect (random forests, for example) have started to dominate mainstream gesture recognition. This effect is reinforced by the introduction of a new type of depth sensor from the Leap Motion company. This sensor can run at interactive rates (processing at least 10 frames per second) on consumer hardware and interact with moving objects in real time. Despite its impressive demos, the Leap Motion sensor cannot handle the full range of human hand shapes and sizes; the main reason is that such sensors usually detect and track the presence of fingertips or points in free space when the user's hands enter the sensor's field of view. In effect, they can only be used for general hand motion tracking.

Regarding the special requirements of mobile applications, such as real-time processing, low complexity, and robustness, a promising way to handle the problem of hand tracking and hand gesture recognition is to use text retrieval technologies for search. In order to apply this technology to the next generation of mobile devices, a systematic study is needed of how text retrieval tools should be applied to gesture recognition and, in particular, of how to integrate advanced image search technologies [43]. Surely, there exist many powerful tools to overcome this problem. The key issue is how to relate vision-based gesture analysis to the large-scale search framework and define the right problem. Once the right problem is defined, we can identify and integrate the right tools to form a powerful solution.
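The core mechanism borrowed from text retrieval can be sketched in a few lines: each database hand image is represented as a "bag of visual words" (quantized local features), and an inverted index maps each word to the images containing it, so a query only touches images that share at least one word instead of scanning the whole database. The toy example below illustrates this indexing idea with made-up word IDs and image names; it is not the Okapi-Chamfer algorithm of Zhou et al., which additionally uses Chamfer distance and Okapi weighting:

```python
from collections import defaultdict

# Toy bag-of-visual-words retrieval over a gesture database using an
# inverted index. Word IDs stand in for quantized image features.
database = {
    "grab_pose_01":  {3, 17, 42, 88},
    "grab_pose_02":  {3, 17, 40, 90},
    "point_pose_01": {5, 21, 42, 63},
}

# Inverted index: visual word -> set of images containing that word.
index = defaultdict(set)
for image_id, words in database.items():
    for w in words:
        index[w].add(image_id)

def query(words):
    """Rank database images by the number of shared visual words."""
    scores = defaultdict(int)
    for w in words:
        for image_id in index[w]:  # only images sharing a word are scored
            scores[image_id] += 1
    return sorted(scores.items(), key=lambda kv: -kv[1])

print(query({3, 17, 42}))
# → [('grab_pose_01', 3), ('grab_pose_02', 2), ('point_pose_01', 1)]
```

Because scoring work is proportional to the posting lists touched rather than to the database size, this structure scales to the very large synthetic hand-image databases that 3D model-based approaches require.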

2.2.4 3D Visualization on Mobile Devices

3D visualization or 3D imaging refers to techniques for conveying the illusion of depth to the viewer's eyes. The first efforts in 3D imaging started around the mid-1800s [44, 45]. Most 3D vision systems are based on stereoscopic vision. Stereoscopic vision, or stereopsis, is the process of conveying the 3D illusion by means of stereo images, and various techniques have been introduced to this end. Older techniques such as parallel or cross-eyed viewing [46] without eyeglasses, and color anaglyphs and color codes using eyeglasses, are widely used for 3D photography [47, 48]. In recent years, 3D technology has become popular in the cinema industry and TV production: passive technology using polarized glasses [49] and active shutter glasses [50] are the common approaches in 3D cinemas and 3D TVs, respectively. Autostereoscopic 3D is another technology, which requires no glasses; in this method, stereo images are transmitted separately to each eye from the light source. Some advanced 3D displays also provide a limited number of views of a scene for a more realistic 3D perception while the user moves his/her head [51].

The popularity of 3D displays has increased since 3D cinemas and 3D TVs attracted public attention. The current trend in manufacturing 3D devices suggests that we should expect tremendous growth in this market over the coming years. Mobile device manufacturers have started releasing smartphones with 3D capabilities, and a few 3D mobile phones are available in the market (see Fig. 2.3). Among them, two famous mobile phone manufacturers, LG and HTC, have introduced 3D smartphones. Both devices use autostereoscopic technology and dual cameras for recording and displaying stereoscopic images and videos [52, 53].

Figure 2.3: Examples of available 3D mobile devices; Left: HTC EVO 3D; Right: LG Optimus 3D.

Chapter 3

General Concept and Methodology

3.1 General Concept

Intuitive interaction between multimedia users and digital devices is a desired feature of future technology. Although the introduction of breakthrough technologies such as the iPhone dramatically changed the way users interact with their phones, users always demand more realistic ways to communicate with their devices. Currently, the limitations of 2D touchscreens prevent users from interacting naturally in a wide range of applications where physical gestures are unavoidable. For instance, 3D manipulation of graphical content, such as 3D rotation, zooming in/out, grabbing, pushing, and moving, requires physical 3D space for intuitive interaction. Even the latest smartphones and tablets are limited to touch interaction on a restricted 2D surface. Moreover, occlusion caused by fingers and hands might degrade the quality of interaction and visualization. Furthermore, various entertainment and gaming applications are limited to rendered virtual buttons on the touchscreen when they really need 3D manipulation in free space. For instance, playing musical instruments such as a virtual piano, guitar, or drums requires pushing, moving, and tapping in 3D space. Obviously, restricting the same tasks to a 2D surface degrades the natural interaction, and rendered virtual controllers on the 2D surface limit the visualization space and affect the quality of user experience. 3D gestural interaction might be even more vital for the forthcoming augmented reality glasses, since their dedicated surface for interaction is shrunk and shifted to the frame.

This thesis aims to introduce a new space for interaction between user and mobile device. The main idea is to shift the interaction space from the 2D surface to the real 3D space around the device, where the vision sensor can capture the hand/body gestures. In other words, performing gestural interaction in 3D space is proposed to overcome the limitations of 2D interaction technology. Delivering the experience of free-hand interaction to mobile device users could transform the mobile industry. Bare-hand 3D interaction enables users to communicate with their mobile devices in exactly the same manner as they do in the physical world with people and objects (in the same way they push a button, or pick up and rotate an object, in physical space). Analysis of the user's gestures from the video input might be used to control the ongoing operation on the device: users might perform a set of actions using different hand gestures, or they might control and manipulate virtual objects and buttons through their hand movements.

From the design point of view, the main goal is to enhance the quality of user experience by designing a new interactive environment. Since the quality of user experience is highly affected by the interaction design, introducing a new interaction space aims to solve the current limitations of interaction technology. From the technical point of view, the aim is to introduce enabling technologies for gesture recognition, tracking, and 3D motion analysis that can effectively facilitate interaction design for mobile devices and multimedia applications.
In addition, novel techniques for 3D visualization of multimedia collections, such as single images and photo albums, are considered. Finally, other related implemented techniques that might support the main contributions are included in this study.


Figure 3.1: For a better experience design, interaction and visualization spaces can be extended to 3D space.

From the practical point of view, this work aims to demonstrate the conducted experiments and implementations of the main contributions in real applications. Different application scenarios, such as photo browsing, graphical manipulation, and 3D motion control, are included in this work.

3.1.1 Interaction/Visualization Space

Miniature keyboards, tiny joysticks, and especially touchscreen surfaces are among the various input facilities designed for mobile devices. Considering today's mobile devices, it is clearly observable that the interaction and visualization spaces are located on one side of the device. Since physical buttons have gradually been removed from mobile devices, the surface allocated to the display

has been significantly increased. The major concern about today's mobile devices is the overlap between the interaction surface and the display, due to the common surface designed for the input and output modules. Since users prefer to keep visual contact with the input facilities, it makes sense to design them this way; for instance, users want to see which button they push or where they touch. On the other hand, this configuration might cause problems in visualization. Obviously, when we work on touchscreen displays, occlusion might occur: users lose visibility, and the quality of experience is degraded. A novel solution to this problem is to extend the interaction and visualization spaces from the 2D surface to 3D space (see Fig. 3.1). This extension should be done in a way that preserves the user's visual contact with the ongoing operation during interaction with the device. Since mobile devices have at least one embedded camera on the back side, it is possible to see the space behind the device through that vision sensor. If we manage to interact with the device in the 3D space behind the camera, we can successfully extend the interaction space. Furthermore, interaction in 3D space offers substantial capabilities that can facilitate a wide range of applications. For effective interaction in 3D space, advanced technologies for 3D motion analysis should be developed. From the technical perspective, interaction in 3D space can overcome the limitations of 2D interaction on touchscreen displays. On the other hand, 3D visualization technology, such as 3D coding and 3D displays, might help to extend the visualization space from the 2D surface to 3D. 3D visualization technology conveys the illusion of depth and 3D perception to users (see Fig. 3.2).

Figure 3.2: Bare-hand 3D interaction with a mobile device in the extended interaction/visualization space.

3.1.2 Sharing the Interaction/Visualization Space

One of the great advantages that 3D interaction offers is the possibility of sharing the physical space for collaboration. In fact, by turning the interaction space from a limited surface into free space, users become able to collaborate within the common physical space provided between the mobile devices. Therefore, the single-user, single-device concept can be extended to collaborative multi-user, multi-device scenarios using the shared space. In general, different configurations for interaction between users and mobile devices can be considered. Depending on the desired purpose, the number of users, the number of devices, and the interaction/visualization spaces may vary. Here, several possible scenarios for single and shared interactive applications are introduced (see Fig. 3.3, 3.4).

3.1.2.1 Single-user, Single-device

In this scenario, the user holds the mobile device in one hand while the other hand controls the interaction in the 3D space behind the device. The 3D space between the display and the user’s eyes is allocated to 3D visualization. The user can control and manipulate the content within the allocated interaction and visualization spaces.


3.1.2.2 Multi-user, Multi-device with Shared Interaction Space

In this configuration, two or more users share a common interaction space to manipulate the content. They may sit in front of or next to each other. Each user holds a device in one hand and interacts with the content using the other hand. Interaction happens in the common space between the devices, where users can share content, pass it around, or manipulate it together.

3.1.2.3 Multi-user, Single-device with Shared Visualization Space

In this setup, users share a single device for collaboration. They use the space behind the device for 3D interaction, while visualization happens on a single display. This configuration is suitable for two-user collaboration.

3.1.2.4 Interaction from Different Locations for Multi-user, Multi-device

In this configuration, each user has his/her own location, device, and space for interaction with the digital content, but all users share the same virtual space. In other words, they interact with common content from different locations through a network connection. This model can be extended to the case where the visualization space of one user is affected by the interaction of the other users, and vice versa.

3.2 Evolution of Interaction/Visualization Spaces

Mobile phones have evolved substantially since the 1980s, in both design and functionality. Before smartphones came to the market, the challenge was to reduce device size for portability. After smartphones attracted users’ attention, devices grew in display size with fewer physical buttons, for a better quality of experience. During this evolution, many devices have hit the market with their special features. Generally, considering the evolution of mobile devices during the recent decade, a gradual change can be distinguished in both interaction and visualization. In the earlier


Figure 3.3: Different configurations of single and collaborative interaction. 1: Single-user, single-device; 2: Multi-user, multi-device with shared interaction space; 3: Multi-user, single-device with shared visualization space; 4: Interaction from different locations for multi-user, multi-device.

generation of mobile phones, the device’s surface was allocated to both interaction and visualization facilities (keypad and display). In that configuration, visualization quality was quite poor due to the very limited display area. Later, some manufacturers proposed an innovative solution: allocating the whole surface to the display and designing a physical keypad layer under the display layer. In recent years, most smartphone manufacturers have introduced products with touchscreen displays, in which both interaction and visualization spaces are located on the same area. With the introduction of wearable displays such as Google Glass, the way users interact with the device may change completely due to the removal of the hand-held module. This significant change enables users to benefit from 3D interaction using both hands. If a technical solution for bare-hand interaction is provided, users may perform different actions in 3D instead of relying on weaker input facilities such as touch frames or voice commands. The solutions proposed in this thesis to the limitations of today’s technology extend the interaction to the physical 3D space. Interaction happens in


Figure 3.4: In 3D interaction, users might share the 3D space for collaborative tasks in different applications.

the 3D space behind the mobile device, and 3D visualization shows its effect in the 3D space between the user and the display. If other body parts, such as the head or foot, are taken into account as interaction facilities, the interaction space can be extended to the 3D space around the device. This conceptual model might be considered for future interactive smart devices (see Fig. 3.5). The evolution trend in mobile interaction is towards designing simpler and more intuitive input facilities. Clearly, advanced media technologies combined with powerful hardware are required to make this evolution happen.


Figure 3.5: Evolution of the interaction/visualization spaces in mobile devices.

3.3 Enabling Media Technologies

As discussed before, the main objective of this thesis is to provide technical solutions that enable users to experience realistic interaction with future smart devices in entertainment, communication, and information contexts. In order to support the main concept, this thesis focuses on two major problems: first, interaction design for mobile devices based on motion analysis in 3D space [38, 39, 54, 55, 56, 57], and second, 3D visualization on ordinary 2D displays [58, 59, 60, 61]. The proposed interactive systems are based on the detection, tracking, and analysis of the user’s 3D motion from visual input. This visual input might be received from the mobile device’s camera, a body-mounted camera, a webcam, or, in general, any type of vision sensor. The main focus of this thesis is on the real-time analysis of the user’s gestures captured by the vision sensor. Specifically, hand gestures are considered, because hands are directly involved in real-life gestural activities.


Figure 3.6: Enabling media technologies support the concept of 3D interaction and 3D visualization.

The 3D motion parameters retrieved from the detected gestures are used to drive real-time interaction in various applications. Other technical contributions of this thesis focus on providing realistic visualization. They mainly include technical solutions for converting 2D content to 3D and for interactive visualization of the content based on the user’s head motion (see Fig. 3.6).

3.3.1 Vision-based Motion Tracking in 3D Space

In this thesis, 3D gestural interaction and its significant advantages over the current 2D technology are introduced. As technically discussed in paper I [57], the efficiency and effectiveness of interaction with mobile devices in 3D space are substantially higher than in 2D space. This interaction might happen by detecting and tracking specific hand gestures that play important roles in interactive applications. Alternatively, other body parts such as the head or foot might be employed to perform the 3D interaction. However, the main idea is to use the physical space for 3D manipulation on 2D devices. By using six-DOF motion analysis, the problems of 2D interaction can be handled in most cases [38, 39, 54, 55]. The important features of the proposed systems are bare-hand, marker-less gesture detection, recognition, and tracking from 2D video input. This technology enables users to efficiently interact with their devices in real-time applications (see Fig. 3.7). Since vision-based interaction with hand-held mobile devices happens in the space close to the camera, the distance between the moving subject and the capturing sensor has some physical limitations. For instance, the user cannot move his/her hand more than 35-40 cm away from the body.

Figure 3.7: 3D gestural interaction with mobile device.

This limited interaction space preserves high-resolution motion analysis: when the vision sensor and the moving subject are relatively close to each other, the configuration is more accurate for measuring the 3D motion parameters, so the resolution of the motion analysis increases. Another proposed configuration for 3D spatial interaction is interactive vision. This setup is proposed for interactive 3D displays, where the user interacts with the content of the display based on head motion. Head-mounted or static vision sensors might be used to measure and report the head movements. Using the measured 3D motion parameters, users control the angle and viewpoint of the digital content in real time.

3.3.2 3D Visualization

The amount of multimedia content on digital devices has increased enormously in recent years. Due to the substantial improvement of the


Figure 3.8: 3D processing on 2D query images.

quality of cameras in smart devices, users can capture large amounts of photos and videos with their smartphones. Besides the challenges of ordering, organizing, and interacting with huge collections, visualization quality is another issue that should be taken into account. Since most devices capture and store visual content with 2D technology, 3D visualization of 2D content becomes a challenging task. While today’s 3D technology has attracted users’ attention, it is quite important to find an effective way to visualize the content in a more realistic fashion. Fig. 3.8 shows the system overview for 3D processing and visualization of 2D content on mobile devices. This thesis aims to tackle several challenges in visualization and improve the quality of visual perception in interaction with multimedia content. Specifically, the following items are considered in the discussions:

• First, is there any way to convey the experience of real 3D visualization (similar to what people experience in watching a real-world scene) to users by measuring the users’ dynamic position/orientation in real time (paper IV [61])?
• Second, is it possible to display stereoscopic 3D content on normal 2D displays (papers I, II, IV, V, VI [56, 57, 58, 59, 62])?
• Third, how can we recover the 3D information from a single 2D image and visualize it in 3D format on an ordinary 2D display (paper V [59])?
• Fourth, is it possible to make use of photo/video collections, captured by 2D devices, to convert and display the content in 3D (paper VI [58])?
• Fifth, is there any efficient way to correct 3D modeling, positioning, and localization errors by integrating the metadata from position/orientation sensors with computer vision techniques (paper VII [60])?

Figure 3.9: Active 3D vision for head motion-based user-device interaction.
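Several of the questions above revolve around synthesizing stereoscopic views from 2D content. As a rough, self-contained illustration of the underlying idea (a naive depth-image-based rendering sketch on synthetic data, not the pipeline of the cited papers), pixels can be shifted horizontally in proportion to an assumed per-pixel depth to form a left/right pair:

```python
import numpy as np

def synthesize_stereo_pair(image, depth, max_disparity=8):
    """Naive depth-image-based rendering (DIBR) sketch.

    image: (H, W) grayscale array; depth: (H, W) in [0, 1], 1 = near.
    Each pixel is shifted horizontally by +/- disparity/2 to form a
    left/right view. Holes (disoccluded pixels) are left at 0 here;
    a real system would inpaint them.
    """
    H, W = image.shape
    disparity = (depth * max_disparity).astype(int)
    left = np.zeros_like(image)
    right = np.zeros_like(image)
    cols = np.arange(W)
    for r in range(H):
        lc = np.clip(cols + disparity[r] // 2, 0, W - 1)
        rc = np.clip(cols - disparity[r] // 2, 0, W - 1)
        left[r, lc] = image[r]
        right[r, rc] = image[r]
    return left, right

# Toy frame: with constant depth the pair is just a pure horizontal shift.
img = np.tile(np.arange(16.0), (4, 1))
L, R = synthesize_stereo_pair(img, np.ones((4, 16)), max_disparity=4)
```

With a constant depth map the synthesized pair reduces to a pure horizontal shift; a real depth map produces per-pixel parallax, and the disocclusion holes left at zero here would need inpainting before stereoscopic coding.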

The main focus of all 3D display technologies is to convey the illusion of depth to the viewer’s eyes, although 3D visualization can be seen from other perspectives as well. In fact, real 3D perception, in the way we observe the real world, is motion-based 3D manipulation plus depth perception. An example may clarify this idea. Imagine a box in front of a user. Each

side has a different color and pattern, and from the front view the user can only see the top and front sides. In a natural manner, if the user wants to see the left or right side, he/she moves to the left or right, and in the same way users observe any scene by moving in different directions. In another approach, users can pick up the box and rotate it to see any side they desire. One way to observe the 3D space is thus to manipulate the scene through the users’ own motion; in other words, they should be able to control what they want to see. This type of visualization can be realized by analyzing the user’s motion in front of the vision sensor and transmitting the motion information to the rendering system for 3D visualization. Of course, this process should be performed in real time, without any noticeable delay, to deliver a realistic experience to the user’s eyes. Moreover, the output might be rendered with stereoscopic techniques to convey the illusion of depth. The concept of interactive vision, or interactive 3D displays, can be formed based on this idea (see Fig. 3.9).

3.4 Methodology Overview

The technical contributions of this thesis are mainly focused on the development of enabling technologies for 3D gestural interaction. Therefore, 3D gesture detection, recognition, and tracking are the technical features used extensively in the proposed solutions. Generally, gesture analysis is considered a classical computer vision and pattern recognition problem, and a substantial part of the technical discussion of this work is therefore devoted to these challenges from the classical approach. Low-level feature/pattern detection, global model-based detection, motion estimation from tracking robust features, and other computer vision methods are employed to find novel solutions to the challenges of gesture analysis. In addition to the common computer vision methods, a new framework for gesture analysis is introduced. Since the capability of modern computers to store and process extremely large databases has increased substantially, shifting the complexity of the methods from pattern recognition algorithms to a large-scale retrieval approach might be the new trend for tackling

the gesture analysis problems. The introduced method is therefore based on collecting an extremely large database of gesture images and retrieving the best match from the provided data. In the ideal scenario for gesture analysis, the database should include all possible articulated hand gestures and the corresponding metadata, including the relative spatial position and orientation with respect to the camera. The major methodology is based on direct retrieval of the best match for any query gesture. The retrieval process should be performed in a way that preserves smooth motion in a continuous gestural interaction. This step might be done by analyzing the gesture patterns in a high-dimensional space.

The main methodology for improving visual perception is based on 3D visualization of today’s multimedia content on current display devices. The whole process can be divided into two steps. In the first step, 3D motion analysis of the user’s head is performed for real-time manipulation of the content. Here, a vision sensor is used to track visual features in the environment, and motion analysis over consecutive frames is used to measure the 3D motion parameters. The parameters, measured in real time, help users interact with the content and manipulate it in a natural manner. The methodology for visualization of the content is based on processing the images and videos captured by current 2D devices. The conversion methods from 2D to 3D are based on direct analysis of single views or on multiple-view analysis in photo collections. The main strategy is to convert the 2D multimedia to 3D and use stereoscopic coding. This approach adds value to the visual experience without requiring extra hardware. In other words, besides the user-manipulated content, the output can be visualized with stereoscopic techniques to convey the illusion of depth to the user’s eyes.


3.5 Gesture Analysis through Pattern Recognition Methods

Basically, a common vision-based system for real-time gestural interaction is composed of four main elements: user, vision sensor, gesture analysis component, and visualization component. The real-time query input from the user is a continuous set of hand/body gestures. In this context, bare-hand gestural performance in free space is considered for most of the proposed scenarios, and in a few cases head movements are used as query input. Ordinary vision sensors can be divided into two groups: 2D cameras, such as normal RGB webcams, and 3D depth sensors, such as the Microsoft Kinect. Since most ordinary devices are equipped with normal RGB cameras, the main focus of this work is on using that type of sensor in the different research scenarios (embedding depth sensors in mobile devices does not seem feasible in the near future). The gesture analysis step usually includes feature extraction, gesture detection, motion analysis, and tracking. Pattern recognition methods for detecting and analyzing hand gestures are mainly based on local or global image features. Simple features such as edges, corners, and lines, and more complex features such as symmetry patterns, SIFT, SURF, and FAST features, are widely used in computer vision applications [63, 64]. If the desired goal is to detect a specific pattern, a combination of image features might be used. For dynamic hand gestures, it is quite challenging to define a single pattern for detection due to the complex configurations of the hand joints. Therefore, a combination of local/global image features might be useful to detect and localize the hand gestures. Distinctive features are extremely useful for robust tracking and 3D motion analysis. If the hand gesture is correctly detected and localized, robust features such as SIFT or SURF can be used to analyze the 3D motion parameters in a sequence of image frames.
One way to track the gesture in consecutive frames is to run the detection algorithm on every single frame of the sequence. Another way is to detect and localize the gesture in a single frame and follow the detected


Figure 3.10: Overview of the 3D gesture analysis process based on computer vision methods.

pattern in the following frames using common tracking methods such as optical flow. Depending on the application scenario, if recognition of different types of gestures is required, different gesture patterns should be analyzed. If the goal is to track a special gesture, the specific pattern might be detected in consecutive frames, and if the 3D motion of the hand gesture is required, gesture localization and 3D motion analysis over the sequence of frames should be performed. Finally, the gesture analysis output provides the required information about the type of gesture and the position/orientation of the detected gesture with respect to the vision sensor. The retrieved information is then sent to the real-time applications, and the final output might be rendered in 2D/3D for visualization on the display. Fig. 3.10 shows the block diagram of 3D gesture analysis based on computer vision methods.
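The detect-then-track strategy above can be illustrated with a minimal, single-window Lucas–Kanade step (a NumPy sketch on synthetic frames, not the thesis implementation): once a gesture region has been localized in one frame, its dominant translation to the next frame follows from the spatial and temporal image gradients inside that window.

```python
import numpy as np

def lucas_kanade_shift(frame1, frame2, center, half=10):
    """Estimate the (dx, dy) translation of a window between two frames.

    Solves the standard Lucas-Kanade 2x2 normal equations G d = -b,
    built from spatial gradients (Ix, Iy) and the temporal difference
    It inside the window around `center` = (row, col).
    """
    r0, c0 = center
    win = (slice(r0 - half, r0 + half + 1), slice(c0 - half, c0 + half + 1))
    Iy, Ix = np.gradient(frame1)          # axis 0 = rows (y), axis 1 = cols (x)
    It = (frame2 - frame1)[win]
    ix, iy = Ix[win], Iy[win]
    G = np.array([[np.sum(ix * ix), np.sum(ix * iy)],
                  [np.sum(ix * iy), np.sum(iy * iy)]])
    b = np.array([np.sum(ix * It), np.sum(iy * It)])
    return np.linalg.solve(G, -b)         # (dx, dy)

# Synthetic smooth frames: frame2 is frame1 translated by (0.4, 0.25) px.
r, c = np.mgrid[0:64, 0:64].astype(float)
frame1 = np.sin(c / 5.0) * np.cos(r / 6.0)
frame2 = np.sin((c - 0.4) / 5.0) * np.cos((r - 0.25) / 6.0)
dx, dy = lucas_kanade_shift(frame1, frame2, center=(32, 32))
```

The linearization behind the 2x2 system only holds for small, smooth displacements; practical trackers use image pyramids and many windows, but the core computation per window is the one above.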


3.6 Gesture Analysis through Large-scale Image Retrieval

In addition to the computer vision methods for gesture analysis, this thesis introduces a new framework and methodology for tracking articulated hand motions in video sequences based on search technologies. The innovative solution is to define the problem of hand tracking and gesture recognition as a general image search problem. The idea is to build a large database that contains at least thousands of hand gesture images. Ideally, these images should emulate all possible hand gestures. Furthermore, the images are tagged with hand motion parameters, including the 3D position and orientation of the gestures. When the hand of a mobile device user is captured by the video camera in the space around the device, the captured hand image is used to retrieve the most similar hand gesture image stored in the database. The motion parameters tagged to the matched image are then assigned to the captured hand image. Thus, 3D hand tracking and gesture recognition can be achieved. The key to this approach is how to quickly find the best match in the database. The proposed solution is to treat each image as a document, convert shape features into a huge visual vocabulary table, and employ inverted indexing as a powerful retrieval tool to perform the search. The developed framework might have a big impact on gesture analysis where high-resolution hand/gesture tracking is required. In fact, unlike the classical pattern recognition methods, in the search framework the entries of the database are not analyzed with shape-based or model-based methods. The main idea is to include every possible hand gesture image regardless of its shape or model. The entries of the database might be real images of articulated hand gestures or computer-generated graphics. The important point is to annotate the database entries with the position and orientation information of the recorded hand gestures.
The vocabulary of hand gestures integrates the visual features of the gesture images and their pose information in an extremely large table. The query frame, captured by the vision sensor, will be


pre-processed, and its visual features will be extracted for analysis in the gesture search block. The core of the system is the gesture search engine, which analyzes the similarity of the query input to the database entries in several steps and retrieves the best match. The output of the system is the gesture image most similar to the query input; in the ideal case, it is identical to the query. Finally, the retrieved image and its annotated pose information are employed in the application. Fig. 3.11 shows the block diagram of the 3D gesture analysis system based on large-scale image retrieval.

Figure 3.11: Overview of the 3D gesture analysis system based on the large-scale image search method.


Chapter 4

Enabling Media Technologies

From a technical point of view, numerous challenges must be considered in order to enhance the usability of an interactive system. Specifically, interaction design for mobile devices using hand gestures involves technical issues in computer vision, such as detection, tracking, 3D motion estimation, and visualization. Basically, the technical discussion of the proposed methods can be divided into the following categories: low-level pattern recognition for gesture analysis, search-based gesture analysis, and interactive 3D visualization. In order to implement a gesture-based interactive system, various hand gestures should be considered. Fig. 4.1 shows the most common hand gestures for 3D interaction and manipulation of objects in different digital environments. Although the collected gestures can be used for different actions such as pick, place, move, grab, zoom, and rotate, they can all be seen as variations of basic hand poses such as the grab or pinch gestures. This is the main reason that, in this context, gesture detection and recognition based on computer vision methods mainly focus on the grab gesture and its variations, such as deformations, scaling, and rotations. Clearly, these gestures can cover the majority of the required actions in 3D interaction. Moreover, the proposed search-based method for gesture analysis can be used for an extremely large number of hand gestures.


Figure 4.1: Most common hand gestures in 3D interaction scenarios.

4.1 Gesture Detection and Tracking Based on Low-level Pattern Recognition

Low-level pattern recognition algorithms can be extremely useful in gesture analysis. Although low-level features do not represent complex patterns independently, their extremely fast processing and low complexity make them highly suitable for real-time applications. The main challenge here is how to combine low-level features in an effective way to retrieve a global meaning, such as detecting a gesture pattern in a video sequence. In the contributions of this thesis towards hand gesture detection and tracking, low-level features are used extensively [38, 39, 54]. Specifically, gesture tracking based on low-level operators known as rotational symmetry patterns is considered. As discussed in paper I [57], rotational symmetries are specific curvature patterns derived from the local orientation image [65]. The main idea behind rotational symmetries is to use local orientation to detect complex curvatures in the double-angle representation. The double-angle representation,

z, of an orientation with direction θ is defined as a complex number whose argument (angle) is double the local orientation, z = c·e^{i2θ}, where the magnitude c represents the signal energy, or confidence. Rotational symmetries can be categorized into different orders and phases. By applying a set of specific filters to the orientation image, it is possible to detect different members of the rotational symmetry family, such as curvature, circular, and star patterns. Using rotational symmetries for gesture detection may seem rather general and complex, but modeling a gesture by a choice of rotational symmetry patterns of different classes makes it possible to differentiate it from other features, even in cluttered backgrounds. The theory and mathematical definitions of local orientation, rotational symmetries, detection of symmetry patterns, etc., are fully discussed in paper I [57]. Experiments demonstrate that hand gestures and fingertips show high responses when searching for specific group members of rotational symmetry patterns in the orientation image. For example, fingertips respond to first-order rotational symmetries (curvature patterns) [38], and the grab gesture responds to second-order rotational symmetries (circular patterns) [39, 54]. Therefore, depending on the application scenario, a proper detector for different hand gestures can be introduced, and the hand gesture can be localized in a sequence of frames captured by the device’s camera. The selectivity of the desired patterns can be increased by removing the noisy responses caused by complex backgrounds [38]. For instance, if detecting fingertips is desired, first-order symmetry pattern detection returns the positions of the fingertips as well as noisy features from the background.
In further processing, the magnitude, phase, and color properties of the responses can be used to differentiate between correct detections and noisy points [66] (see Fig. 4.2). Second-order rotational symmetries correspond to more specific patterns. For instance, the grab gesture responds to the circular pattern from this group of


Figure 4.2: Gesture detection, tracking, and 3D motion analysis based on rotational symmetry patterns.

symmetries. It is possible to enhance the detection by controlling the phase of the pattern with a simple threshold. Thus, the grab gesture can be properly localized in a video sequence [57]. This processing results in a proper detection and rejects the noisy responses. From a technical perspective, gesture detection and tracking are the first steps in 3D gesture-based interaction. The core of the system is the 3D motion analysis, where the 3D motion parameters are recovered from the video sequence (see Fig. 4.2). In many interactive applications, 2D gesture detection and tracking are enough to perform the task, and further 3D motion analysis is not required. For real 3D (six-DOF) interaction, extra information about the 3D position and orientation must be recovered.
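A toy illustration of the double-angle machinery (a NumPy sketch on a synthetic blob, not the filter sets of paper I): the double-angle image z = c·e^{i2θ} is built from image gradients and correlated with a second-order (circular) symmetry template. For a radially symmetric pattern, whose orientation field is radial everywhere, the normalized response peaks at the pattern's center.

```python
import numpy as np

# Synthetic radially symmetric blob; its gradient orientation field is
# radial, so the double-angle image has phase exp(i*2*phi) around the center.
H = W = 64
rr, cc = np.mgrid[0:H, 0:W].astype(float)
f = np.exp(-((rr - 32) ** 2 + (cc - 32) ** 2) / (2.0 * 4.0 ** 2))

gy, gx = np.gradient(f)
z = (gx + 1j * gy) ** 2          # double-angle representation z = c * e^{i2theta}

# Second-order rotational symmetry template b = exp(i*2*phi) on a small window.
R = 7
wy, wx = np.mgrid[-R:R + 1, -R:R + 1].astype(float)
phi = np.arctan2(wy, wx)
w = np.exp(-(wx ** 2 + wy ** 2) / (2.0 * 3.0 ** 2))   # Gaussian window weights
template = np.exp(1j * 2 * phi) * w

# Normalized correlation: magnitude near 1 where the local orientation
# field matches the circular pattern, near 0 elsewhere.
resp = np.zeros((H, W))
for i in range(R, H - R):
    for j in range(R, W - R):
        patch = z[i - R:i + R + 1, j - R:j + R + 1]
        num = abs(np.sum(patch * np.conj(template)))
        den = np.sum(np.abs(patch) * w) + 1e-12
        resp[i, j] = num / den

peak = np.unravel_index(np.argmax(resp), resp.shape)   # expected near (32, 32)
```

The real detectors use dedicated filter sets and phase control rather than this brute-force normalized correlation, but the same principle applies: a circular pattern such as a closed grab gesture produces a coherent e^{i2φ} structure that the second-order template picks out.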

4.1.1 3D Motion Analysis

In computer vision and image processing, a common way to retrieve and estimate the motion between image frames is to analyze the motion between extracted feature points. The 3D structure can be studied by finding and matching corresponding feature points in consecutive frames [67]. Various types of feature detectors and descriptors have been introduced in computer vision. Generally, feature detectors can be divided into edge,


Figure 4.3: 3D motion analysis steps. Retrieving the 3D structure from motion between image frames.

corner, and blob detectors, or any combination of them [68]. In applications where robustness and accuracy have higher priority, more complex feature descriptors are required. SIFT, SURF, and CHoG [63, 64, 69] are examples of robust feature descriptors which have been found useful in many multimedia applications. In the contributions of this thesis, the scale-invariant feature transform (SIFT) is widely used as a robust scale- and rotation-invariant feature descriptor. Once the hand gesture is localized, SIFT features are extracted in the desired region (the user’s gesture) of the image frame. The extracted features are tracked in consecutive frames, and the structure of the 3D motion can be derived by finding the transformation between two frames. This transformation might be in the form of a planar homography [67], as discussed in [58], or a fundamental or essential matrix [67], as suggested in [39, 54, 60]. In order to remove the outliers among the matched feature points and find the best transformation matrix consistent with the true matches, a robust iterative method such as RANSAC [70, 71] is performed. As a result, the best motion transformation between the two frames is estimated (see Fig. 4.3). Paper I [57] explains extensively how the 3D motion parameters can be retrieved by decomposing the estimated transformation [39, 54]. In paper I [57], gesture detection, tracking, and 3D motion analysis from rotational symmetry patterns are explained in detail. Moreover, the effect of applying the 3D motion parameters in different applications is demonstrated [57, 62].
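The matching-plus-RANSAC step can be sketched in a self-contained way (standard DLT homography estimation on synthetic correspondences; the function names and data are illustrative, not taken from the papers): random 4-point samples vote for the transformation with the largest inlier set, which is then refit on all inliers.

```python
import numpy as np

def homography_dlt(src, dst):
    """Direct Linear Transform: fit H (3x3) with dst ~ H @ src, >= 4 points."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]

def project(H, pts):
    p = (H @ np.column_stack([pts, np.ones(len(pts))]).T).T
    return p[:, :2] / p[:, 2:3]

def ransac_homography(src, dst, iters=200, thresh=1.0, rng=None):
    if rng is None:
        rng = np.random.default_rng(0)
    best_inliers = np.zeros(len(src), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(src), size=4, replace=False)
        H = homography_dlt(src[idx], dst[idx])
        err = np.linalg.norm(project(H, src) - dst, axis=1)
        inliers = err < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Refit on the consensus set for the final estimate.
    return homography_dlt(src[best_inliers], dst[best_inliers]), best_inliers

# Synthetic matches: 30 correct correspondences under a known H, 10 outliers.
rng = np.random.default_rng(1)
H_true = np.array([[1.1, 0.05, 3.0], [-0.02, 0.95, -2.0], [5e-4, 3e-4, 1.0]])
src = rng.uniform(0, 100, size=(40, 2))
dst = project(H_true, src)
dst[30:] += rng.uniform(20, 50, size=(10, 2))     # corrupt the last 10 matches
H_est, inliers = ransac_homography(src, dst)
```

In the thesis's setting the correspondences come from SIFT matches between consecutive frames rather than synthetic points, and the estimated homography (or essential matrix) is subsequently decomposed into the six-DOF motion parameters.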


4.2 Gesture Detection and Tracking Based on Gesture Search Engine

In this thesis, a new framework and algorithms for tracking hands in cluttered images and recognizing the underlying gestures are introduced. To better specify hand gestures and hand motions, two concepts should be distinguished.
Hand Posture: a static hand pose and its current position, without any movements involved.
Hand Gesture: a sequence of hand postures connected by continuous hand or finger movements over a short period of time.
For real-world hand tracking applications, the problems of initialization and recovery have to be addressed. In order to develop robust solutions, we can adopt a static approach, that is, localize and recognize the hand posture in individual frames. Hand gesture recognition can then be achieved by reading individual posture images. The goal is that the new framework and algorithms lead to solutions with such high tracking and recognition accuracy that they can be used as a stand-alone module for 3D hand tracking. Such solutions will also be useful for providing single-frame estimates to a 3D hand tracker, and consequently for achieving automatic initialization and error recovery. The proposed technical approach is to redefine the problem of hand tracking and gesture recognition as a text search problem. The framework is based on the idea of building a large database which, in the best case, emulates all possible articulated hand motions. Furthermore, these images are tagged with 3D hand motion parameters, including the joint angles of the articulated fingers. When the hand of a user is captured by the device’s camera, the captured hand image is used to retrieve the most similar image from the database. The ground-truth labels of the retrieved matches are used as hand pose estimates for the input. The approach works even under poor segmentation conditions: all that is required as input is a bounding box around the hand gesture.
The bounding box is allowed to include arbitrary amounts of clutter in addition to the hand region.

48 Enabling Media Technologies

The key issue in this approach is how to quickly find the best match in a database of gesture images. The proposed solution is based on treating each image as a document, converting shape features into words, and employing a powerful text retrieval tool, inverted indexing, to perform the fast search.

4.2.1 Providing the Database of Gesture Images

The core of the gesture search system is how to represent gesture contours. To enable the formulation of the gestural interaction problem as a search framework, two particular properties should be considered: first, shape sensitivity, meaning that the matched hand gesture shape should be as close as possible to the one in the input frame; second, position sensitivity, meaning that the matched gesture should be at a similar position to the input gesture. In this work a new type of shape vocabulary is defined. The introduced technique is based on dividing the contour into segments, or edge features, and an individual segment is considered a word in the search table. In order to form the search table, all the database images are normalized and their corresponding edge images computed. Each edge pixel is represented by its position and orientation. To give the low-level edge orientation features a global structure, a large table can be formed that represents every case in which each edge feature might occur. Considering the whole database with respect to the positions and orientations of the edges, an extremely large table can represent the whole vocabulary of hand gestures in edge pixel format. For instance, for an image size of 640x480 with 8 orientation bins and a database of 10000 hand gesture images, the gesture vocabulary table has the dimension 2457600x10000. After forming this huge table, each block is filled with the indices of all database images that have a feature at that specific point. The table thus collects the required information from the whole database, which is essential for the online gesture search.
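The table construction described above can be sketched as an inverted index over quantized edge words. The function names and input format below are illustrative assumptions, not the thesis implementation; a real system would compute the edge masks and orientations from the normalized database images:

```python
import numpy as np
from collections import defaultdict

N_THETA = 8                      # number of orientation bins, as in the text

def edge_features(edge_mask, orientation):
    """Yield one (x, y, theta_bin) word per edge pixel of an image.
    edge_mask: (H, W) bool array; orientation: (H, W) angles in radians."""
    ys, xs = np.nonzero(edge_mask)
    bins = ((orientation[ys, xs] % np.pi) / np.pi * N_THETA).astype(int) % N_THETA
    return zip(xs, ys, bins)

def build_vocabulary_table(database):
    """Inverted index: (x, y, theta_bin) -> indices of the database images
    that have an edge with that orientation at that pixel position."""
    table = defaultdict(list)
    for idx, (edge_mask, orientation) in enumerate(database):
        for word in edge_features(edge_mask, orientation):
            table[word].append(idx)
    return table
```

Storing only the occupied blocks (a dictionary rather than the full 2457600x10000 array) is what makes the table tractable in practice, since most blocks are empty.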


In addition to processing the database images to form the search table, for each gesture image in the database the 3D motion parameters are calculated and tagged to that image. This is done by mounting a motion capture sensor on the hand while the database images are being recorded. For the database, an active vision sensor (a hand-mounted camera) is used to measure the gesture movements and annotate the gesture images.

4.2.2 Query Processing and Matching

A query hand gesture is any type of hand gesture with its specific position and orientation. The first step in the retrieval and matching process is edge detection. This process is the same as the edge detection in the database processing, but the result will be totally different, because for the query gesture the presence of edge features from the cluttered background and other irrelevant objects is expected.

4.2.3 Scoring System

Assume that each query edge image, Qi, contains a set of edge points, each represented by its row and column position and a specific direction. During the first step of the scoring process, for each query edge pixel Qi(xu, yv), a similarity function to the database images at that specific position is computed: Sim(Qi, Dj). If the required condition is satisfied for the edge pixel in the query image and the corresponding database images, the first level of scoring starts, and all the database images that have an edge with a similar direction at that specific coordinate receive +3 points in the scoring table. The same process is performed for all the edge pixels in the query image, and the corresponding database images receive their +3 points.
Here, an important issue that can arise during scoring should be considered. The first step of the scoring system covers the case where two edge patterns from the query and database images exactly overlap, whereas in most real cases two similar patterns are extremely close in position but do not overlap much. For these cases, which happen regularly, first- and second-level neighbor scoring is introduced. A very probable case is that two extremely similar patterns do not overlap but fall on neighboring pixels of each other. To cover these cases, besides the first-step scoring, for each pixel the first-level 8 neighboring and the second-level 16 neighboring pixels in the database images are checked. All the database images that have an edge with a similar direction in the first-level and second-level neighbors receive +2 and +1 points, respectively. In short, scoring is performed for all the edge pixels in the query with respect to the similarity to the database images at three levels with different weights. The accumulated score of each database image is calculated and normalized, and the maximum scores are selected as the best top matches; the proposed algorithm selects the top ten matches from the database.
In order to find the closest match among the top matches, a reverse comparison is required. Reverse scoring means that besides finding the similarity of the query gesture to the database images, Sim(Qi, D), the reverse similarity of the selected top database images to the query gesture is computed. The combination of the direct and reverse similarity functions results in much higher accuracy in finding the closest match from the database. The final scoring function is computed as S = [Sim(Qi, D) x Sim(D, Qi)]^0.5. The highest value of this function returns the best match from the database for the given query gesture. Afterwards, the motion parameters tagged to the best match can be used immediately in various application scenarios. An additional consideration in a sequence of gestural interaction is the smoothness of the gesture search: the retrieved best matches in a sequence should represent a smooth motion.
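A minimal sketch of the three-level scoring and the combined direct/reverse similarity, with images represented simply as sets of (x, y, orientation-bin) edge words; this set representation is an illustrative stand-in for the inverted-index implementation:

```python
# First-level 8-neighborhood and second-level 16-neighborhood offsets.
RING1 = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]
RING2 = [(dx, dy) for dx in range(-2, 3) for dy in range(-2, 3)
         if max(abs(dx), abs(dy)) == 2]

def similarity(query, image):
    """Sim(query, image): +3 for an edge with the same orientation at the
    same pixel, +2 / +1 if one is found among the first- / second-level
    neighbors. Both arguments are sets of (x, y, theta_bin) edge words."""
    score = 0
    for x, y, t in query:
        if (x, y, t) in image:
            score += 3
        elif any((x + dx, y + dy, t) in image for dx, dy in RING1):
            score += 2
        elif any((x + dx, y + dy, t) in image for dx, dy in RING2):
            score += 1
    return score / max(len(query), 1)        # normalize by query edge count

def combined_score(query, image):
    """S = [Sim(Q, D) x Sim(D, Q)]^0.5 -- the direct/reverse combination."""
    return (similarity(query, image) * similarity(image, query)) ** 0.5
```

The reverse term penalizes database images that contain the query's edges but also many extra edges the query lacks, which the direct score alone cannot detect.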
In order to perform smooth retrieval, the database gesture images should be analyzed in a high-dimensional space to detect the motion maps. Motion maps indicate which gestures are close to each other and fall in the same neighborhood in the high-dimensional space. Therefore, for a query gesture image in a sequence, after the top-ten selection, the reverse similarity is computed and the top four matches are selected. The algorithm then searches the motion paths to check which of these top matches is closest to the previous frame's match, and that image is selected as the final best match. Fig. 4.4 shows the block diagram of the gesture search engine. In paper III [56], the whole process is explained in detail.

Figure 4.4: Overview of the gesture search engine.

4.2.4 Quality of Hand Gesture Database

In general, two main issues must be considered for the database: how large should it be, and how should it be built? The human hand is a complex articulated structure consisting of many connected links and joints. Including 6 DOF for global orientation and position, the human hand has 27 DOF in total [72]. Rendering all possible combinations of joints and poses would generate a huge number of hand images, and it is impossible to store all of them on a mobile device. Fortunately, there is a strong correlation between joint angles, so the state space of the joints has substantially lower dimensionality. In [73], Wu et al. show that the state space of the joints can be approximated with 7 DOF. Thus, 7 is a rather good estimate of the embedded dimension of hand postures. If each DOF is quantized and represented with 3 bits, there is a total of 8^7, approximately 2 million, states. This gives a rough estimate of the required size of the database of hand gesture images: at least around 2 million.
The second issue is how to build such a database. One solution is to use a 3D hand model to render all possible hand postures with computer graphics technology, and to convert the generated gesture images into binary shape images through edge and boundary detection. The major problem with this approach is that the extracted edges are not natural, which directly affects the search for the best-matched hand shape. In this thesis, a bare hand is instead used against a uniform background to perform all sorts of gestures. The hand gestures are recorded and converted into binary hand shape images, and motion sensors or video cameras are attached to the hand to measure the exact position and orientation of the gestures. Thus, the ground-truth hand motion parameters are tagged to the gesture images.

Figure 4.5: Interactive 3D vision overview.
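The size estimate follows directly from quantizing each of the 7 effective DOF into 3 bits:

```latex
8^{7} = \left(2^{3}\right)^{7} = 2^{21} = 2\,097\,152 \approx 2 \times 10^{6}
\ \text{distinct hand states.}
```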


4.3 Interactive 3D Visualization

The main idea behind interactive 3D visualization is to enable users to interact with the content of a display based on their motion in free space. In fact, this technology helps them perceive the content in a realistic manner by controlling the angle and viewpoint in real-time, turning the normal screen into an interactive digital window. For accurate 3D motion tracking, active technology requires mounting the vision sensor on the user's body. Since the aim is to manipulate the content based on the user's viewpoint, the sensor is mounted on the user's head, so the video sequence can be captured in real-time. In order to estimate the head motion parameters from the visual input, the image frames extracted from the video sequence are processed in the motion analysis step. The proposed motion analysis technology is based on analyzing the 3D head motion between consecutive frames captured by the camera. For each pair of consecutive image frames, a robust feature detector is employed to extract and track important feature points from the environment; in most cases, due to its robustness and scale invariance, the SIFT feature detector is used. Afterwards, the relation between the two sets of corresponding feature points is represented by a transformation matrix: a planar homography or, for a more accurate representation, the Fundamental or Essential matrix. The transformation matrix contains the information about the motion between the two image planes. In the next step, a decomposition is applied to the transformation matrix to retrieve the 3D motion parameters. This process is performed on every pair of consecutive frames, and the relative 3D position and orientation are estimated. The motion analysis block provides six outputs for the rendering block: three orientation parameters and three position parameters in the x, y, z coordinate system.
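As a concrete sketch of the transformation step, the following estimates a planar homography from matched point pairs with the standard DLT (direct linear transform) algorithm in plain NumPy. The synthetic matrix and points are illustrative stand-ins for real SIFT correspondences; decomposing the resulting matrix (or the Essential matrix) then yields the rotation and translation:

```python
import numpy as np

def estimate_homography(src, dst):
    """Estimate a 3x3 homography H with dst ~ H @ src via DLT.
    src, dst: (N, 2) arrays of matched feature points, N >= 4."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # the homography is the right null vector of A (last row of V^T)
    _, _, Vt = np.linalg.svd(np.asarray(A, float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]                    # fix the scale ambiguity

# synthetic check: points related by a known homography are recovered
H_true = np.array([[1.0,   0.02, 5.0],
                   [-0.01, 1.0, -3.0],
                   [1e-4,  2e-4, 1.0]])
src = np.random.default_rng(0).uniform(0, 100, (8, 2))
h = np.c_[src, np.ones(8)] @ H_true.T
dst = h[:, :2] / h[:, 2:]                 # perspective division
H_est = estimate_homography(src, dst)
```

With noisy real correspondences, the estimation would normally be wrapped in a RANSAC loop to reject mismatched features.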
Note that SIFT feature detection at each single frame requires rather heavy processing, which is not a problem in stationary systems. For faster processing, especially on mobile platforms, faster detectors such as SURF or FAST features can be used. Another approach is to perform feature detection in the first frame and track the detected features with common tracking methods such as optical flow in the consecutive frames; feature detection can then be repeated when the number of tracked features drops below a certain value. The rendering block generates and updates the scene based on the motion information provided at each moment. The rendered scene is based on a pre-defined graphical model or an augmented reality environment. The result is displayed on a screen while the user interacts with and manipulates the content in real-time. Fig. 4.5 demonstrates the system overview of interactive 3D vision. Fig. 4.6 shows how the user controls the viewpoint and position in the rendered scene by moving in 3D space. For capturing and measuring the 3D position and orientation of the user, an ordinary webcam is mounted on the user's head. The graphical content is updated according to the translation and rotation of the head at each moment. The perception effect is similar to looking at a real scene through a window: the view of the scene is adjusted based on the angle and position of the viewer. In paper IV [61], interactive 3D visualization is discussed in detail.

Figure 4.6: Real-time interaction with the graphical content using the interactive 3D vision system.

4.4 Methods for 3D Visualization

As mentioned before, in order to enhance the quality of experience in multimedia applications, the aim is to visualize the output in 3D format. The following scenarios are considered.

- First, 3D visualization of a graphical model with a known geometry [39, 54];
- Second, 3D visualization of single images using the image itself [59];
- Third, 3D visualization of monocular images by analysis of multiple views in 2D digital photo collections [58].

A common way to visualize content in 3D format is to produce stereo views. As fully discussed in [54, 57, 58, 59], stereoscopic systems transmit two views of a scene captured from viewpoints separated by a slight horizontal translation. For rendering graphical models in different applications, the geometry of the scene is known, so it is rather simple to render a second view that satisfies the required geometry for stereoscopic viewing. The task of stereoscopic visualization becomes more challenging when the content is not recorded with stereo cameras and no prior knowledge about the geometry or structure of the 3D scene is provided. Considering single views or randomly captured views of a scene, an efficient way to generate stereo views must be found. 3D visualization from single and multiple 2D views is briefly described in the following sections. In papers V and VI [58, 59], the whole process is explained in detail.

4.4.1 Depth Recovery and 3D Visualization from a Single View

Making stereo views from a single monocular image is one of the most challenging tasks in computer vision. The first step in making 3D from single images is to recover the depth map. This is done by applying supervised learning algorithms to a set of images and their corresponding ground-truth depth maps. Statistical image modeling and estimation techniques such as Markov Random Fields (MRF) are used to train the system [74]. After training, the depth map of a query image can be recovered. Once the depth map is estimated, the information required for generating stereo views is calculated as suggested in [59].
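Once a depth map is available, a second view can be synthesized by shifting pixels horizontally with a disparity inversely proportional to depth. The following is a deliberately naive grayscale sketch (no occlusion ordering, simple left-neighbor hole filling); the baseline constant is an illustrative assumption:

```python
import numpy as np

def render_right_view(img, depth, baseline=8.0):
    """Synthesize a right-eye view from an image and its depth map by
    horizontal pixel shifts: nearer pixels get larger disparities.
    img: (H, W) grayscale array; depth: (H, W) positive depths."""
    H, W = img.shape
    disparity = np.round(baseline / depth).astype(int)
    right = np.zeros_like(img)
    filled = np.zeros((H, W), bool)
    for y in range(H):
        for x in range(W):
            xr = x - disparity[y, x]          # shift left in the right view
            if 0 <= xr < W:
                right[y, xr] = img[y, x]
                filled[y, xr] = True
    # naive hole filling: propagate the left neighbor into disoccluded gaps
    for y in range(H):
        for x in range(1, W):
            if not filled[y, x]:
                right[y, x] = right[y, x - 1]
    return right
```

A production renderer would traverse pixels in depth order so that nearer surfaces correctly occlude farther ones, and would use more careful inpainting for the disocclusion holes.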

4.4.2 3D Visualization from Multiple 2D Views

Although 2D digital photo galleries and collections do not contain any explicit 3D information, interesting 3D images and videos can be generated from them with computer vision techniques. Basically, in many photo collections there are many hidden connections between images, and these connections can be represented by a transformation matrix. In fact, any two, three, or more unstructured photos of a scene might capture overlapping areas. This means that by finding the geometric transformation between the overlapping images, the 3D information of the real scene can be inferred. In paper VI, the process of generating stereo views for 3D visualization by matching feature points and finding the homography transformation between overlapping frames is discussed in detail [58].

4.5 3D Channel Coding

The final step in visualizing the content in 3D is to encode the stereo channels. The coding techniques vary with the display technology. For instance, in passive 3D systems with polarized glasses, the stereoscopic output is transmitted in channels with different polarities [75], while with active shutter glasses, stereo frames are transmitted at twice the original rate (60 x 2 = 120 frames/sec) [75]. In the implementations, ordinary 2D displays are considered for rendering the 3D output. A common group of stereoscopic techniques that do not require 3D displays are color anaglyphs [76, 77]. In anaglyph methods, the stereo frames are encoded into two different colors for the left and right eyes. The color-coded stereo frames are merged and displayed as a single layer on the display. Depending on the coding method, appropriate low-cost glasses are used to decode the displayed output: the glasses feature two different color filters for the left and right lenses, each filtering the corresponding layer from the output image. In the implementations, two enhanced techniques for generating more realistic outputs are used, known as Optimized Anaglyph and Color-code 3D [47, 76].

Figure 4.7: Contributions in 3D visualization.
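A minimal sketch of the simplest anaglyph encoding, plain red-cyan (not the Optimized Anaglyph or Color-code 3D variants used in the implementations):

```python
import numpy as np

def red_cyan_anaglyph(left, right):
    """Encode a stereo pair as a red-cyan anaglyph: the red channel comes
    from the left view, green and blue from the right view.
    left, right: (H, W, 3) uint8 RGB images of equal shape."""
    out = np.empty_like(left)
    out[..., 0] = left[..., 0]      # red   <- left eye
    out[..., 1] = right[..., 1]     # green <- right eye
    out[..., 2] = right[..., 2]     # blue  <- right eye
    return out
```

The red lens then passes only the left-view layer to the left eye, and the cyan lens only the right-view layer to the right eye, producing the depth illusion on an ordinary 2D display.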

Chapter 5

Experimental Results

5.1 Experiments on Gesture Detection, Tracking and 3D Motion Analysis

Basically, the implemented system for gesture detection and tracking based on low-level patterns consists of the gesture input from the user, the vision sensor, and the algorithms for 3D gesture analysis. The target gesture for the detector based on low-level patterns is the grab gesture, selected because of studies of intuitive human hand gestures for daily tasks such as picking, placing, and object manipulation [57]. In the experiments, the grab gesture is not treated as a rigid object; the implemented system is designed to tolerate deformation and rotation of the gesture up to the limit at which the global shape is preserved in the captured frames.

5.1.1 Camera and Experiment Condition

Experiments on gesture detection using rotational symmetry patterns are generally conducted in a lab environment with normal lighting conditions and different backgrounds. For all the experiments a single RGB webcam is used, in both static and semi-dynamic (held in one hand) setups, to simulate stationary and mobile configurations. The distance between the camera and the user's gesture is normally between 15 and 40 cm. To test the robustness of the system, various backgrounds with different colors and patterns are used. In addition, a number of users with different skin colors and hand sizes are considered in the tests.

Figure 5.1: Sample variations of the Grab gesture.

5.1.2 Algorithm

In the experiments of this thesis, rotational symmetry patterns are used in two different approaches for detecting the grab gesture. The first approach is based on first-order symmetry patterns, which represent curvature patterns with different orientations. Observations reveal that fingertips respond strongly to this group of symmetry patterns. On the other hand, curvature patterns are rather general, and noisy points from the background might show a similar response to the first-order symmetry detector. Therefore, in order to differentiate between noisy points and fingertips, more features should be integrated into the algorithm. The first criterion is the magnitude of the responses: responses at fingertips are normally much stronger than at noisy points. Another feature is phase; since an intuitive hand gesture will not rotate freely through every angle, a threshold on the phase limits the responses to natural observations. The third cue is skin color; a further threshold on the color of the responses helps remove more noise. Finally, by combining all these conditions, the best responses, representing the fingertips, are detected. Although this approach requires further processing for detecting the fingertips, it provides more flexibility for detecting deformed gesture patterns. By detecting the fingertips and measuring the distance between them, it is possible to model various hand gestures.
The other developed method for detecting the grab gesture is based on second-order symmetry patterns, which represent circular patterns. With some constraints on the phase of the detected patterns, the circular form of the grab gesture can be detected properly. The constraints on the phase can be set based on the restriction of the wrist joints, which rotate only within a limited angle. Therefore, the grab gesture can be detected by searching for circular patterns with a phase variation between +/- 45 degrees (see Fig. 5.1). To improve the robustness of the system, after the first detections a region of interest is defined around the localized gesture to secure correct detection in consecutive frames and automatically remove noisy points. Since the second-order rotational symmetries represent more complex patterns, there are significantly fewer noisy points than in the previous case; on the other hand, the flexibility of the user's gesture is lower than with fingertip detection. The center of the patterns detected with second-order symmetries returns a point near the center of the grab gesture.
In order to retrieve the 3D motion of the localized gesture between the image frames, SIFT feature detection and tracking is performed. For faster processing, the feature points in the first frame are detected and then tracked in consecutive frames to retrieve the 3D motion parameters. In the PC implementation, full SIFT feature matching between all frames was also tested. In the tracking case, when the number of features drops below 35 points, feature detection is restarted to guarantee robust motion analysis.

Figure 5.2: 3D model manipulation using second-order rotational symmetry patterns. The graphical model follows the exact motion of the user's hand gesture in 3D space.
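A simplified sketch of second-order rotational symmetry detection: the gradient field is squared into a double-angle orientation image and correlated with the conjugated second-order basis exp(i2φ), so circular patterns produce a coherent, large-magnitude response at their centers. The window radius and the synthetic ring test are illustrative assumptions; the thesis implementation is in C/C++:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def second_order_symmetry(img, radius=12):
    """Response map of a second-order rotational symmetry (circular
    pattern) detector: correlate the double-angle orientation image with
    the conjugated basis exp(i*2*phi) over a disc-shaped window."""
    gy, gx = np.gradient(img.astype(float))
    z = (gx + 1j * gy) ** 2                 # double-angle representation
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    phi = np.arctan2(ys, xs)
    disc = (xs ** 2 + ys ** 2) <= radius ** 2
    kernel = np.where(disc, np.exp(-2j * phi), 0)
    zp = np.pad(z, radius)
    windows = sliding_window_view(zp, kernel.shape)
    return np.abs((windows * kernel).sum(axis=(-2, -1)))

# synthetic test: a radial ring pattern peaks at its own center
yy, xx = np.mgrid[:64, :64]
r = np.hypot(yy - 32, xx - 32)
ring = np.exp(-(r - 10.0) ** 2 / 8.0)
resp = second_order_symmetry(ring)
cy, cx = np.unravel_index(np.argmax(resp), resp.shape)
```

At the center of a circular pattern every gradient is radial, so the double-angle phases cancel the kernel phases exactly and the contributions add coherently; elsewhere they partially cancel.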

5.1.3 Programming Environment and Results

The first version of the gesture analysis system, based on rotational symmetries, was developed in Matlab. After the preliminary results were approved, the program was re-implemented in a C/C++ environment, which significantly improved the efficiency of the system in processing the video sequence in real-time. Since rotational symmetries are low-level patterns, the detection process is computationally extremely fast. This is the major advantage that improves the quality of interaction in real-time applications; even with the further processing for retrieving the 3D motion parameters, the performance required for efficient interaction is achieved. As reflected in [57, 66], the measured detection accuracy shows the effectiveness of the algorithm. The system based on the first-order rotational symmetry detector returns the fingertip positions; to localize the grab gesture, the middle position between the detected thumb and index finger is taken as the output. The system based on the second-order symmetry detector returns the center of the circular pattern, so for the grab gesture the position of the response is always a point close to the gesture center. In both cases, the detected gesture points are used for manipulation of graphical objects. The tested scenarios are based on 3D tracking and rotation with six-DOF motion analysis (see Fig. 5.2).


Figure 5.3: Capturing the motion parameters for tagging the pose information to the database images.

5.2 Experiments on Gesture Search Framework

Experiments on the gesture search framework can be divided into two steps: first, an offline step that includes constructing the database entries, tagging the motion parameters, and forming the vocabulary table of the gestures; second, the online gesture search process for a query input, which includes the scoring process and the neighborhood analysis for finding the best match for the query.

5.2.1 Constructing the Database

The main strategy behind constructing the database is to record and store all possible hand gestures, including deformation, scaling, and translation variations. Moreover, the stored gesture frames should carry the 3D motion information for instant retrieval after the matching step. For this reason, an active vision system is used to immediately retrieve and tag the 3D motion parameters (six parameters comprising the 3D position and orientation) to each image frame while the database images are recorded. The whole database is recorded in a lab environment with stable lighting conditions and a plain green background. In order to easily obtain a clear image of the gesture and eliminate the rest of the image, extra green paper covers the arm and the hand-mounted camera. The active camera is mounted on the back of the hand, and a second camera captures the video sequence while the user performs different gestures. The active camera thus captures frames from the environment for online 3D motion analysis, while the second camera simultaneously captures the gesture sequence. Finally, the retrieved 3D orientation, based on the hand motion, is tagged to the synchronized frame from the second camera, and this process continues until the construction of the database is complete (see Fig. 5.3 and Fig. 5.4). The process of generating the database images and retrieving the orientation parameters is conducted in a C++ environment and performed in real-time.
Another reason to cover the arm with the background color and provide a clear image of the hand is to calculate the 3D position of the gesture in each database image. In this step, the database images are first converted to edge images. Afterwards, the average position of the edges in the image coordinate system is calculated for each frame, and the bounding box around each gesture is defined. The size of the bounding box reflects the scaling factor, or depth, of the gesture with respect to the camera position. Finally, these three parameters representing the 3D gesture position are retrieved. At this point the database of hand gestures, consisting of the gesture images, the converted gesture edge images, and the corresponding text files containing the six motion parameters, is constructed.

Figure 5.4: Active motion analysis for tagging the orientation information to the database images. The vision sensor is attached to the back of the hand for measuring the 3D motion parameters. The retrieved motion parameters are applied to a 3D model to validate the accuracy.
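The three position parameters described above can be sketched directly from a binary edge image; taking the larger bounding-box side as the scale proxy is an illustrative choice:

```python
import numpy as np

def gesture_position(edge_img):
    """Estimate the three position parameters of a gesture from its binary
    edge image: (x, y) as the mean edge position, plus a depth/scale proxy
    from the bounding-box size (a larger box means a closer hand)."""
    ys, xs = np.nonzero(edge_img)
    x_mean, y_mean = xs.mean(), ys.mean()
    box_w = xs.max() - xs.min() + 1
    box_h = ys.max() - ys.min() + 1
    scale = max(box_w, box_h)       # proxy for depth w.r.t. the camera
    return x_mean, y_mean, scale
```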

5.2.2 Forming the Vocabulary Table

The implemented algorithm for finding the best match for query frames is based on the low-level edge orientation features. Thus, the vocabulary table contains the indices of the relevant database images at different locations. The table has the number of database images as its row size and m x n x n_theta as its column size, where m and n are the width and height of each image and n_theta is the number of angle intervals. For most of the conducted tests, with an image size of 320x240, eight angle intervals, and 6000 images in the database, the vocabulary table has 6000 rows and 614400 columns. Each block in the vocabulary table stores the indices of the database images that have an edge at that position with a similar orientation. The conducted experiments reveal that with a database size of around 6000, the maximum number of indices in each block does not exceed 100. The whole process of forming the vocabulary table is performed in Matlab, and the final table is stored in text format for the online retrieval step.


5.2.3 Gesture Search Engine and Neighborhood Analysis

The online search system is implemented in a C++ environment for efficient real-time interaction. First, the vocabulary table is loaded into memory for fast retrieval. Afterwards, each frame from the real-time video input is sent to the gesture search engine. After the direct and reverse scoring steps, the top four matches are sent to the neighborhood analysis step and the best match is selected.
Different methods for analyzing and mapping the gesture images from the high-dimensional space to 3D space are introduced in paper III [56]. The main idea is to analyze the distances between the gesture patterns and construct a meaningful structure for the neighborhood search. Since gestural interaction represents a smooth motion in 3D space, neighborhood analysis for selecting or predicting the closest database match for the query inputs is quite important. In the implementations, the Laplacian method is selected for mapping the gesture vectors from the high-dimensional space to 3D space. The Laplacian is chosen over other methods such as PCA and LLE because of the visible pattern in the 3D representation of the image vectors. As demonstrated in Fig. 5.5, each branch in the graph indicates a clear change in the positioning of the gesture patterns within the database images. Basically, the dense center mostly represents gestures around the center point of the image, and each branch shows a direction towards a corner of the image frame. In the process of selecting the best match from the top matches, neighborhood analysis returns the closest gesture match based on the previously selected match in the video sequence. This step smooths the motion of the retrieved sequence.
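A compact sketch of a Laplacian (eigenmap) embedding of the kind used for the neighborhood analysis, in its simplest unnormalized form (symmetric kNN graph, L = D - W, eigenvectors of the smallest nonzero eigenvalues); the neighborhood size and toy data are illustrative assumptions:

```python
import numpy as np

def laplacian_eigenmap(X, n_neighbors=5, n_components=3):
    """Map the row vectors of X to n_components dimensions with a basic
    Laplacian eigenmap. (The standard method solves the generalized
    problem L v = lambda D v; the plain L is used here for brevity.)"""
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise sq. dists
    W = np.zeros((n, n))
    for i in range(n):
        for j in np.argsort(d2[i])[1:n_neighbors + 1]:   # skip self
            W[i, j] = W[j, i] = 1.0                      # symmetric kNN graph
    L = np.diag(W.sum(1)) - W                            # graph Laplacian
    vals, vecs = np.linalg.eigh(L)
    return vecs[:, 1:n_components + 1]                   # drop constant vector

# toy usage: 40 gesture images flattened to vectors, embedded into 3D
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 100))
Y = laplacian_eigenmap(X)
```

Nearby rows of Y can then serve as the motion-map neighborhood when choosing among the top matches for consecutive frames.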

5.2.4 Gesture Search Results

During the experiments, various databases were provided for testing the performance of the system. The earliest database contained about 1500 images of the grab gesture. Later, another database with 3000 images, including the grab gesture and other types of hand gestures, was captured. Afterwards, the database was extended to more than 6000 images; at this step non-gesture images were also included, to analyze the performance on a larger database with noisy entries. At each step, system tests were also conducted on resized images. In general, both 320x240 and 160x120 images show quite promising results in the tests. Most of the tests are based on 320x240 images, but if the database size grows beyond 10000 entries, the image size can be set to 160x120 to improve the efficiency of the retrieval. Among the steps of the online retrieval system, the reverse scoring consumes most of the processing time; this is the major reason that the reverse scoring is conducted only for the top ten matches retrieved by the direct scoring step. For instance, with 6000 images in the database and an image size of 320x240, the retrieval system processes 25 frames/second with the direct scoring step alone, which drops to 15 frames/second when the reverse scoring is applied. Thus, the reverse scoring has a stronger effect on the processing time than the database size. Fig. 5.6 shows sample gesture inputs and the corresponding best matches from the database of hand gestures.

Figure 5.5: Left: Gesture images mapped to the three-dimensional space by the Laplacian method. Right: Gesture and non-gesture images mapped to the 3D space by PCA.


Figure 5.6: Output of the gesture search engine for a number of sample query gestures.

5.3 Technical Comparison between the Prior Art and the Proposed Solutions

Basically, the majority of hand gesture recognition and tracking systems employ vision-based approaches to handle the technical challenges. RGB and depth cameras, or combinations of the two (e.g., Kinect), are the widely used hardware for capturing body gestures. Since the contributions of this thesis target current and future mobile devices, the introduced methods are based on ordinary RGB cameras such as webcams and mobile cameras. Although various algorithms have been introduced in the computer vision and pattern recognition literature, the proposed solutions of this thesis can be compared with the common approaches for hand detection, gesture recognition, gesture tracking, and 3D motion analysis. 2D and 3D features, 3D models, skeletal models, appearance, color, and depth information are among the best-known properties used for gesture detection and tracking. In fact, the majority of the prior art can be grouped into these categories or combinations of them.
As discussed before, the proposed gesture analysis system based on rotational symmetry patterns can be considered a combination of the mentioned computer vision approaches. On the other hand, the introduced gesture analysis system based on large-scale search is not a classical computer vision approach. Nevertheless, the technical contributions of this thesis can be compared with the prior art from different aspects. Table 5.1 provides a comprehensive comparison between the prior art and the proposed solutions; Method 1 and Method 2 represent the rotational symmetries and the gesture search method, respectively. Ratings are estimated based on reviews and surveys of current vision-based technologies [78].

5.4 3D Rendering and Graphical Interface

3D rendering is the process of generating a graphical view from three-dimensional models. 3D models might contain various properties such as geometry and texture. The 3D rendering process depicts the 3D scene as a picture taken from a particular perspective, which may change according to the desired viewpoint in a continuous sequence. Various features such as lighting, shadow, atmosphere, refraction of light, or motion blur on moving objects can enhance the realism of the 3D rendering. With the development of modern computers, 3D rendering has become a major step in many applications such as video games, simulators, movies, augmented reality and virtual reality. In order to convey a realistic 3D experience to users, two possible approaches, or a combination of them, might be considered. First, the generated graphical view might be understood through various noticeable features such as perspective, shading, texture-mapping, reflection, depth of field, transparency, translucency and refraction. Second, the generated scene might be rendered using stereoscopic techniques to convey the illusion of depth, with the final result visualized on a 3D display. Nowadays, due to the popularity of 3D displays, both techniques are combined to enhance the quality of the user experience. Since the interaction between users and digital devices happens at the interface level, the effect of the provided technical solutions might be visualized in a graphical interface. Basically, two scenarios are considered for graphical


Property                      shape-  color-  depth-  3D model-  Method 1      Method 2
                              based   based   based   based      (Rot. sym.)   (Gesture search)
Efficiency of detection         3       5       5        4            5              5
Accuracy of detection           4       3       5        4            4              5
Tracking quality                4       3       5        4            4              5
Gesture recognition             4       3       4        4            4              5
Robustness to environmental
  conditions                    3       1       2        3            3              5
3D motion                       3       2       4        4            4              4
Large-scale gesture             2       3       4        4            3              5
Cluttered background            3       2       5        4            4              5
Occlusion                       2       1       1        3            1              3
Scale-invariance                2       3       4        3            3              5
Rotation-invariance             2       3       4        3            3              5
Deformation-invariance          2       3       3        3            3              5
Mobile platform                 3       3       0        2            4              5
Multi-gesture                   4       3       5        4            3              5

Table 5.1: Properties of the different methods in gesture analysis are compared with the proposed solutions of this thesis. Methods 1 and 2 represent the discussed methods based on rotational symmetries and the gesture search engine, respectively. The quality of the different properties is scaled between zero and five. 0: not applicable. 1: very weak. 2: weak. 3: average. 4: strong. 5: very strong.

interface design in this thesis: first, manipulation of graphical objects using hand gestures, and second, manipulation of the graphical scene in interactive 3D vision. In both cases the graphics are rendered in an OpenGL environment. Perspective projection, lighting, color, reflection and other features are used to provide a realistic 3D experience. Moreover, in order to convey the illusion of depth, the rendered output is provided in color-coded stereoscopic 3D. With this technology, users can experience the illusion of depth on any 2D screen using simple color-coded glasses. In most environments designed for 3D gestural interaction, manipulation of the graphical objects in an augmented environment is considered. Normally, the rendered objects are shown on the live camera view while the user can pick, rotate, move, zoom in/out, or even reshape the objects in real time. For interactive 3D vision, the main goal is to place users in a virtual reality environment, enabling them to move and perceive the rendered scene in an interactive manner. Thus, the recommended setup for this scenario is a rather large, or possibly wall-sized, screen.

5.5 Research Scenarios

Conceptual and technical contributions of this thesis have been tested and used for implementation in different research scenarios. The major research scenarios can be summarized in the following items.

5.5.1 Implementation of the 3D Gestural Interaction on Mobile Platform

Since one of the main target areas for applying the proposed technologies is future mobile devices, implementation on mobile platforms is an essential part of this work. The Android platform is selected for mobile implementation of the gesture-based interaction. The core of the system for detecting, tracking and analyzing the gestures is developed in native C/C++ in an OpenCV environment. The graphical part is mainly handled by OpenGL (Open Graphics Library [79]). In some earlier versions, Min3D (a 3D library for Android using


Figure 5.7: Graphical interface in mobile application. Implementation of the proposed systems in photo browsing and 3D manipulation.

Java and OpenGL ES) has been used for rendering different graphical objects (see Fig. 5.7).

5.5.2 Implementation of the Interactive 3D Vision on a Wall-sized Display

The proposed interactive 3D vision is tested in three different setups. The first test is performed on a normal computer display with both 2D and stereoscopic 3D rendering. In the second test, the output is displayed on a wall using a video projector. The third test is performed on the 4K wall-sized display in the KTH VIC lab (visualization studio). In all three cases the graphical scene is rendered in an OpenGL environment. In the stereoscopic case, passive 3D glasses are used for depth perception (see Fig. 5.8).

An important point to mention here is that interactive 3D vision on personal devices might be set up in both active and passive configurations. As discussed before, in the active configuration, where the vision sensor is mounted on the user's head, the resolution of the 3D motion analysis is significantly higher than in the passive configuration. The accuracy level is highly dependent on the relative distance between the moving subject and the vision sensor. Thus, if we remove the body-mounted camera and simply use the device's camera for motion tracking, decreasing the distance between user and device can restore the accuracy to a proper level. This is usually the case when users interact with their devices at closer range, such as when operating laptops, smartphones and tablets. In these cases, for the simplicity and comfort of the users, the passive configuration is more practical. Although it does not provide the same level of accuracy as active vision, it is generally acceptable for natural interaction (see Fig. 5.9). In the conducted experiments on a MacBook Pro, the device's camera is used for tracking the head motion. In order to improve the quality of tracking, in the first step, face detection is applied to immediately separate the moving part from the rest of the image. Afterwards, the discussed technology for tracking and estimating the 3D head motion is used to provide the required data for 3D interaction with the content. In larger interaction spaces, such as visualization on wall-sized displays, active motion estimation is unavoidable for accurate and high-resolution interaction, since the quality of motion tracking with a passive installation is quite weak at large distances between the sensor and the moving subject.
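The distance dependence above follows directly from the pinhole camera model. A back-of-the-envelope sketch (all numbers hypothetical, not measurements from the thesis) of how many pixels of image displacement one centimeter of lateral motion produces at different working distances:

```python
# Pinhole-camera estimate: image-plane displacement (in pixels) caused
# by 1 cm of lateral subject motion at a given distance.
def pixels_per_cm(focal_px: float, distance_cm: float) -> float:
    """delta_pixels = focal_length_px * delta_X / Z, with delta_X = 1 cm."""
    return focal_px / distance_cm

# Hypothetical webcam with an ~800 px focal length:
close = pixels_per_cm(800, 50)    # laptop use, ~0.5 m -> 16 px per cm
far   = pixels_per_cm(800, 400)   # wall display, ~4 m ->  2 px per cm
```

An eightfold drop in trackable resolution between the two setups illustrates why the passive configuration is acceptable at close range but active (head-mounted) sensing becomes unavoidable at wall-display distances.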

5.5.3 3D Rendering and Visualization of 2D Content

Unlike 3D graphics, where the geometry of the scene and objects is known, 3D visualization of 2D content such as images and videos is quite a challenging task. Single-view and multiple-view analysis for retrieving 3D information from 2D content are introduced in the contributions of papers V and VI [58, 59]. The main idea is to provide a supportive technology that enhances the user experience in interactive applications. This technology enables users to see the content in 3D while they operate the application or manipulate the content. The 3D visualization is based on stereoscopic techniques using passive glasses. Due to the simplicity of the tests and applicability to any


Figure 5.8: Active 3D vision tests in different setups.

type of display, anaglyph glasses are used in all the conducted tests [58, 59]. The 3D channel coding is performed based on the selected glasses.

5.6 Potential Applications

Contributions of this thesis in 3D motion analysis and visualization can be used in a wide range of multimedia applications on mobile devices and stationary systems. Virtual reality, augmented reality, medical imaging, motion-based interactive systems, 3D games, 3D displays, motion-based localization and positioning systems, visual search and many other applications might take advantage of the proposed methods. Here, several implemented and potential applications based on the contributions of this thesis are briefly explained.


Figure 5.9: The passive configuration shows a similar effect to the active one in close-range interaction.

5.6.1 3D Photo Browsing

The interactive photo browser enables users to manipulate their photo collections in 3D space. Unlike 2D interaction, where only one user can operate the device (due to the limited area for interaction), in 3D interaction two or more users might share both the interaction and visualization spaces for collaborative tasks. Users might sit together to share their photo collections and manipulate them in 3D space while each has their own device; they can use a single device and share the interaction and visualization spaces; or they might share the virtual space while present at different locations, etc.

5.6.2 Virtual/Augmented Reality

In [39, 54], gestural interaction techniques are applied to render graphical objects in augmented environments. The analysis of the hand gesture motion behind the mobile phone's camera is used to manipulate graphical models.


The six-DOF motion control with a high level of accuracy enables users to experience a realistic interaction with their mobile devices. The proposed gestural interaction for manipulation of graphical models is also implemented on the Android platform. Efficiency and performance of the system are tested and validated on different devices such as Samsung and HTC smartphones. The visual outputs are rendered in both 2D and 3D formats.

5.6.3 Interactive 3D Display

Human motion tracking might be used to interact with the display device. In the implemented interactive systems introduced in [57, 61], the user controls the content of the display using head or gesture motion. The retrieved motion parameters (rotations and translations along three axes) between consecutive frames captured by the device's camera apply the motion control to the application.
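One way such per-frame motion parameters can drive an application is to accumulate them into a viewing pose. The sketch below is a hypothetical simplification (translations plus a single yaw angle instead of full six-DOF pose composition) to show the idea of folding frame-to-frame deltas into the view state:

```python
import math

# Accumulating per-frame motion deltas (tx, ty, tz and a yaw rotation,
# for brevity) into a view pose that steers the rendered scene.
class ViewPose:
    def __init__(self):
        self.x = self.y = self.z = 0.0
        self.yaw = 0.0   # radians

    def apply_frame_delta(self, dx, dy, dz, dyaw):
        """Fold the motion estimated between two consecutive camera
        frames into the current viewing pose."""
        self.x += dx
        self.y += dy
        self.z += dz
        self.yaw = (self.yaw + dyaw) % (2 * math.pi)

pose = ViewPose()
for delta in [(0.1, 0.0, -0.2, 0.05)] * 10:   # ten identical frame deltas
    pose.apply_frame_delta(*delta)
# pose.x ~ 1.0, pose.z ~ -2.0, pose.yaw ~ 0.5 rad
```

A full implementation would compose rotation matrices (or quaternions) rather than summing a single angle, but the control loop, one pose update per captured frame, is the same.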

5.6.4 Medical Applications

The proposed technologies might be widely used in medical applications. 3D motion tracking and analysis of patients helps physicians diagnose and treat physical disorders in various types of diseases. Furthermore, 3D imaging and visualization of the body organs, together with interactive 3D manipulation on display devices, help experts analyze and diagnose physical problems and select the required treatment.

5.6.5 3D Games

One of the most exciting areas that can benefit from efficient 3D motion analysis is 3D gaming. Bare-hand, marker-less gesture analysis using ordinary 2D cameras provides a great chance to experience realistic interaction with the graphical environment in 3D games. Head and gesture detection and tracking, using the techniques discussed in the previous chapters, provide an effective way of playing in 3D environments.


5.6.6 3D Modeling and Reconstruction

Many digital photo/video capturing devices, in addition to a vision sensor, provide other types of embedded sensors such as GPS and orientation sensors. Therefore, extra information such as position and orientation can be tagged to the captured photos. This geo-tagging has been found to be useful in many applications such as 3D digital photo albums, photo-tagged maps and visual navigation. In many cases, however, the geo-tagged metadata are corrupted by noise or missing due to the unavailability of the GPS signal or magnetic sensors. In paper VII [60], we discuss how 3D motion analysis can help to form a signal model and significantly correct this noisy data.
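For illustration only (the actual signal model is developed in paper VII), a generic exponential smoother shows how a motion prior can suppress jitter in geo-tagged metadata such as a compass heading:

```python
# Generic exponential smoothing of a noisy metadata sequence (e.g.,
# compass headings in degrees). This is a stand-in illustration, not
# the signal model from paper VII.
def smooth(measurements, alpha=0.3):
    """Blend each noisy measurement with the estimate carried over
    from previous samples; a small alpha trusts the prior more."""
    estimate = measurements[0]
    out = [estimate]
    for z in measurements[1:]:
        estimate = (1 - alpha) * estimate + alpha * z
        out.append(estimate)
    return out

noisy = [10.0, 30.0, 12.0, 28.0, 11.0]   # jittery headings
smoothed = smooth(noisy)                 # a much flatter sequence
```

A model-based corrector such as the one in paper VII can do better than this, because it predicts the next value from the estimated 3D camera motion instead of from the previous sample alone.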

5.6.7 Wearable AR Displays

Contributions of this thesis fit the area of mobile augmented reality particularly well. AR glasses such as Google Glass, which integrate information into augmented environments, require intuitive interaction technology. Since in wearable AR glasses the touchscreen will be removed or reduced to a smaller scale, convenient 3D gestural interaction can definitely enhance the interaction experience.

5.7 Usability Analysis in Object Manipulation: Touchscreen Interaction vs. 3D Gestural Interaction

In order to evaluate the user experience in 3D gestural interaction, a comparative user study was conducted. In this study, manipulation of graphical objects in 3D space using bare-hand gestures is considered. Learnability, user experience and interaction quality are evaluated and compared with the same task in 2D touchscreen interaction. Four students from the course Evaluation Methods in HCI (DH2408) assisted this study by selecting this case as their course project. In order to provide a comparative scenario for evaluating the 3D gestural interaction, two sets of designed interfaces and tasks for 2D touchscreen and 3D gesture-based interaction, the required usability tests, and questionnaires for the user interviews were provided to the students. In this task, they were supposed to invite users, test the learnability and usability of both systems, and collect and report the required information based on the given instructions. Here, the whole process is explained in detail.

Touchscreen interaction: Two smartphones are considered for this case, positioned side-by-side on a table. Smartphone 1 plays a pre-recorded video of the rendered graphical model. On smartphone 2, the same graphical model is rendered, and the user can manipulate it through the touchscreen, controlling the position, zooming and viewpoint along the x, y, and z axes. During the task, the user should follow and mimic the exact motion of the graphical model on smartphone 1 through real-time manipulation of the model on smartphone 2 using touchscreen interaction. A webcam mounted above both smartphones records the touchscreen interaction for further study.

3D gestural interaction: In this case, the user can control and manipulate the same graphical model in 3D space using bare-hand interaction. A Kinect depth sensor is used to detect and measure the user's hand motion in 3D space. As in the previous case, the same pre-recorded motion tasks are displayed on the computer screen. The user should follow and mimic the motion of the graphical model in free space through real-time 3D interaction. A camera records the whole task for further study.

Task: Both the 2D and 3D interaction tasks are divided into different parts. In each part the graphical model moves with a specific motion sequence to reach a certain position/orientation. Afterwards, the user should follow the same motion to reach a similar position/orientation. These pre-recorded tasks are divided into 10 parts. The first two videos are used for the learnability step, where new users learn how to work with both the 2D and 3D tasks. In the main part, 8


videos (2 easy, 4 normal, and 2 hard) are considered (see Fig. 5.10).

Figure 5.10: User test in 2D touchscreen and 3D gestural interaction.

5.7.1 User Test

For this study, ten users were selected: one pilot user, seven in the primary target group (experienced in using touchscreens) and two in the secondary target group (very little or no experience in using touchscreens). According to Nielsen [80, 81], five people find 85% of the problems; therefore, ten users (mostly between the ages of 20 and 30) were enough to provide proper results. As mentioned before, the goals are to test the learnability of the 3D gestural interaction system and to compare the user experience of 3D gestural interaction with the touchscreen interface. For the comparative analysis, efficiency, effectiveness and user satisfaction are considered the main criteria. In order to increase the reliability of the tests, subjective data based on the experience of the participants during and after the test sessions were gathered, through filling in scale-based forms and answering predefined questions. In addition, user performance was observed during the comparative tests and the manual data were collected. Together, these two steps provide access to both quantitative and qualitative data for the final evaluation.
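The 85% figure cited from Nielsen comes from the Nielsen-Landauer problem-discovery model, which estimates the share of usability problems found by n evaluators as 1 - (1 - L)^n, with a typical per-user detection rate of L = 0.31:

```python
# Nielsen & Landauer's problem-discovery model: the proportion of
# usability problems found by n evaluators, given a per-user
# detection probability L (0.31 is the commonly cited average).
def problems_found(n: int, L: float = 0.31) -> float:
    return 1 - (1 - L) ** n

five = problems_found(5)   # ~0.84, i.e. roughly 85% of the problems
ten  = problems_found(10)  # diminishing returns beyond the first few users
```

The curve flattens quickly, which is why ten users comfortably cover the study's needs.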


Figure 5.11: Average score of the 2D vs. 3D user performance.

5.7.2 Usability Results

Since the tests are based on following the movements in a video, the following method is considered for the quantitative measurement of user performance; it is motivated by the MUSiC user performance method [82]. All observers in the student group watched the recordings of each and every user test and scored them from 1 to 7 according to how well the user performed relative to the instructions in the video (1 = no coherence at all; 7 = no difference between video and performance). After all four students had scored the user tests, the average scores were calculated. Fig. 5.11 shows the measured comparative performance between the 2D and 3D scenarios. Since tasks 1 and 2 are used for the learnability step, they are not included in the chart.

Analysis of the collected data highlights some important points. Firstly, it is clear that the touchscreen interface works well as long as the task is limited to spinning/turning around the x and y axes without any translation or zooming. As soon as movements in 3D or zooming come into play, the 3D gestural interface clearly shows its strength. Scoring on tasks 3, 4, 6, 8 and 9 supports this observation (task 3 was primarily limited to turning the object). This is also confirmed by the comments of the users in the interviews, where five users mentioned that turning objects was easier than turning and moving on the touchscreen, whereas a combination of rotations, translations and zooming is simpler with the 3D gestural interaction. It may be too early to tell without further studies with larger user groups, but according to the collected data, the 3D gestural interaction seems overall to be the preferred system, since all users (except in one case) scored higher on it than on the touchscreen interface.

Two of the test users had little to no experience using touchscreens (two male teachers in their sixties); they formed the secondary group in the usability evaluation. Although neither of them owned a smartphone or regularly used touchscreens, one of them had a little experience with a Nintendo Wii, which might be why he scored substantially higher in the 3D interaction part, whereas the second user scored only a few points higher than with the touchscreen interface. It would seem that 3D gestural interaction is the preferred system for people who learn both systems at the same time: both of them scored higher on the 3D gestural interaction, and during the interviews both said that they strongly prefer the 3D interface over the touchscreen. However, they both mentioned that large hands and fingers might cause problems on touchscreens. Findings from the interviews reveal that users truly believe 3D gestural interaction will become a standard interaction tool for future applications, although opinions differ on how widely it will be used in future interactive scenarios. During the interviews, the users also had to rate a few statements on a scale of 1 to 5 (1 = I do not agree at all; 5 = I fully agree). The results of this interview are reflected in Fig. 5.12.
In the learnability step, the main idea was to let users watch the videos and intuitively start following the recorded motions instead of giving them specific instructions. Based on the responses gathered from these questions, it is clear that users find the 3D system easy to learn and think that most people would learn to use a similar interface quite easily. They mainly believe that with a bit more time in front of the 3D


interface, they would master it. 3D gestural interaction has a quicker learning curve than the touchscreen interface and is clearly better for performing more complex movements. However, interfaces using 3D gestural interaction must be specifically designed for gesture-based inputs, in the same way as applications for touchscreens are developed differently from those for a desktop computer where mouse and keyboard are available. This is already done in games designed for Kinect and similar products.

Figure 5.12: Qualitative results of 3D gestural interaction.

Chapter 6

Concluding Remarks and Future Direction

6.1 Contributions

Today's multimedia technology is highly inspired by two strong trends: technologies towards intuitive interaction and technologies towards augmented visualization. The former trend provides natural interaction technology for effective communication between users and smart devices. The latter trend, which is considered the direction towards the fifth screen, or augmented reality visualization, combines the interactive experience on the personalized screen with augmented information from the Internet. Technical contributions of this thesis can support the development of both trends. In fact, 3D interaction through intuitive gestures is an unavoidable part of future AR applications. Therefore, defining new frameworks for effective interaction in augmented environments will improve the quality of user experience in future mobile applications. Basically, the contributions of this thesis can be divided into three main categories: conceptual models for future human mobile device interaction; technical contributions towards 3D interaction design and interactive visualization; and implementation of the proposed concepts and methods for different application scenarios.


6.1.1 Conceptual Models for Future Human Mobile Device Interaction

This thesis proposes new concepts and frameworks for future human mobile device interaction. The main features of the proposed ideas can be summarized as follows.

- Current and future trends in multimedia technology, especially mobile multimedia, together with future demands, challenges, limitations and directions, are discussed in detail.
- The evolution of interaction and visualization facilities on mobile devices and the corresponding future trends are investigated.
- The concept of extending the interaction and visualization spaces to 3D on mobile devices is introduced and its advantages are discussed.
- The concept of 3D gestural interaction on mobile devices is introduced and its significant impacts are discussed.
- The concept of collaborative tasks on mobile devices using bare-hand interaction in 3D, sharing the interaction and visualization spaces, is discussed.
- Potential application scenarios based on 3D gestural interaction are introduced.
- The concepts of user-manipulated content and interactive 3D vision are introduced and discussed.

6.1.2 Technical Contributions for 3D Gestural Interaction and 3D Interactive Visualization

Technical contributions are the main focus of this thesis. New methods and frameworks for 3D gesture analysis and interactive visualization have been introduced. Specifically, the technical contributions focus on two major problems: first, interaction with mobile devices based on motion analysis in 3D space [38, 39, 54, 57], and second, 3D visualization on ordinary 2D displays [58, 59, 60, 61]. The introduced interactive systems are based on detection, tracking, and analysis of the users' 3D motion from the visual input. This visual input might come from the mobile device's camera, a body-mounted camera, a webcam, and in general, any type of vision sensor. The technical contributions can be listed as below.

- The concept of 3D gesture recognition and tracking, with new methods and algorithms based on low-level operators, is discussed.
- Novel methods and algorithms for gesture recognition and tracking based on a large-scale search framework are introduced.
- The proposed methods for 3D gesture analysis are compared and evaluated.
- Technical solutions for the motion-based interactive 3D display are introduced and compared in different configurations.
- Different configurations for the interaction between users and multimedia content in various scenarios and platforms are discussed.
- New methods for 3D visualization of monocular images, photo collections, and videos are investigated and discussed.

6.1.3 Implementations

Implemented scenarios based on the conceptual and technical contributions can be summarized in the following items.

- New methods for gesture analysis based on low-level patterns are implemented.
- A new framework for 3D gesture analysis based on large-scale retrieval and search methods is implemented.
- 3D gesture detection and tracking are implemented on different platforms (Windows, Mac OS X, Android).
- Interactive 3D vision is implemented and tested in different scenarios, from personal smart devices to a wall-sized display.
- 3D visualization of monocular images, photo collections and videos is implemented and tested.


6.2 Concluding Remarks and Future Direction

Although today's media industry is highly inspired by 3D technology, realistic interaction and visualization are still at an early stage of development. Realistic visualization has attracted a lot of attention during the recent decade. The introduction of 3D display technology in TVs, projectors and even mobile devices is an indication of the fast-growing 3D market. Strong efforts towards moving from stereoscopic 3D to glasses-free 3D displays are another indication of the general trend towards intuitive and realistic visualization. However, the current technology of 3D displays is quite different from real human observation of the 3D world, and significant improvements are required to fulfill the objective of realistic visualization. Contributions of this thesis in 3D visualization, especially the introduced concept and technology for interactive 3D display, support realistic and intuitive visualization. In fact, the main idea behind interactive visualization is to enable users to observe the content and control the angle and viewpoint in a manner similar to real-world observation. The introduction of 3D interaction facilities such as Microsoft Kinect has significantly changed the way people interact with digital content, especially in the entertainment area. Since real 3D interaction requires extremely high accuracy in 3D motion estimation and tracking of the body joints, there are still many unsolved issues and challenges. However, strong indications reveal that future human mobile interaction will be highly affected by intuitive 3D interaction. The contributions of this thesis aimed to tackle the fundamental issues and propose novel ideas towards solving them. In this thesis, 3D gestural interaction is deeply investigated as an effective tool for future human mobile device interaction. Computer vision, pattern recognition, and machine learning methods are widely used in this area.
Observations and experimental results of this thesis indicate that although these methods can be extremely useful for solving individual challenges in 3D gesture recognition, 3D motion analysis, etc., they are not adequate for the generalized problem formulation. Therefore, new methods for 3D gesture analysis through a large-scale retrieval system have been introduced. Given the possibility of storing and processing extremely large databases and the corresponding metadata, future methodologies for solving the discussed problems will be mostly centered around metadata retrieval and search methods instead of processing the low-level data. Thus, the preparation of rich and comprehensive databases can formulate the classical problems in a totally new way. For instance, the challenges of gesture recognition and tracking can gradually be shifted from the signal processing level to large-scale search and matching frameworks. Although image-based retrieval and template matching are well-known concepts in media technology, a large-scale search framework for gesture analysis is a rather new concept and needs further development. This thesis has introduced and investigated this framework for high-accuracy 3D motion retrieval and gesture tracking. Experimental results indicate that the search framework is extremely powerful, especially when recognition, tracking and 3D motion retrieval are required all together, in large scale and in real time. On the other hand, if we target specific patterns and models for recognition and tracking, computer vision methods can handle the complexity of the problems.

Figure 6.1: Different approaches for solving the technical challenges in media technology. The current trend shows the gradual move from low-level features and high-level algorithms towards metadata retrieval from large databases.

6.2.1 Technical Challenges

During this research, various methodologies, algorithms and approaches towards solving the current and future challenges in media technology have been considered. Some of the important technical challenges and findings tackled during this research work are highlighted in the following items.

6.2.1.1 Active vs. Passive Motion Capture

A common discussion in human motion analysis is where the motion capture sensor should be mounted. In order to enhance the accuracy of the tracking and the convenience of the users, various configurations have been introduced in different application scenarios. As discussed before, for a more intuitive and natural interaction design, marker-less bare-hand solutions are preferred to wearable sensors such as motion capture gloves or body-mounted devices. Although the current motion analysis setups can be divided into passive and active systems, in mobile devices or augmented reality glasses motion analysis might be performed in both passive and active configurations. In fact, the mobile sensor can be used in static or dynamic modes. This possibility provides a great chance to take the technical advantages of both configurations. For instance, the camera of AR glasses offers the advantages of active motion analysis for a moving head, while it can be used as a passive sensor for hand gesture tracking. This thesis has demonstrated the practical scenarios where each configuration shows its advantages. For instance, the proposed interactive 3D visualization employs active vision for manipulation of the content from larger distances (wall-sized display and projection), while the same system is introduced with a passive configuration for close-range interaction with mobile devices or laptops. As discussed before, in order to design a realistic experience for hand gesture interaction, the passive configuration is preferred: intuitive interaction should happen using bare hands in free space. The proposed technical solutions thus provide flexible designs for different application scenarios.

6.2.1.2 Gesture Detection and Tracking without Intelligence

The majority of the available computer vision and pattern recognition methods employ complex algorithms for gesture detection, recognition, and tracking. These types of solutions usually involve heavy computation and large training sets. Obviously, for mobile systems with hardware and power limitations, the majority of the common solutions are not applicable. The idea of employing low-level operators for detecting and tracking hand gestures is to ensure efficient detection without intelligence. Although the implementation of an effective gesture analysis system without high-level detection algorithms is quite challenging, for efficiency reasons this important goal should be achieved. Employing rotational symmetries for detecting and tracking bare-hand gestures is based on this idea.
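The low-level idea can be illustrated with a hedged, minimal sketch (the thesis implementation, filters and parameters may differ): gradient orientations are mapped to a double-angle representation and correlated with an n-th order complex basis filter, so that circular patterns respond strongly to order n = 2 without any trained detector.

```python
import numpy as np

def rotational_symmetry_response(img, order=2, radius=10):
    """Magnitude of the n-th order rotational-symmetry response (sketch)."""
    gy, gx = np.gradient(img.astype(float))
    # Double-angle representation: opposite gradient directions coincide.
    z = (gx + 1j * gy) ** 2
    z = z / (np.abs(z) + 1e-9)

    # n-th order basis filter b = exp(i * n * phi) over a disc mask.
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    phi = np.arctan2(ys, xs)
    basis = np.exp(1j * order * phi) * ((xs**2 + ys**2) <= radius**2)

    # Direct valid-region correlation: coherent phases -> large response.
    h, w = img.shape
    out = np.zeros((h, w))
    for y in range(radius, h - radius):
        for x in range(radius, w - radius):
            patch = z[y - radius:y + radius + 1, x - radius:x + radius + 1]
            out[y, x] = np.abs(np.sum(patch * np.conj(basis)))
    return out

# Usage: the response peaks at the centre of a circular (Gaussian) blob.
ys, xs = np.mgrid[0:33, 0:33]
blob = np.exp(-((xs - 16.0)**2 + (ys - 16.0)**2) / (2 * 5.0**2))
resp = rotational_symmetry_response(blob, order=2, radius=10)
peak = np.unravel_index(np.argmax(resp), resp.shape)
```

Because only gradients and one fixed correlation are involved, such an operator has a small, predictable cost, which is the property that matters on hardware-limited mobile devices.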

6.2.1.3 Adaptability of the Contributions to Future Hardware Evolution

Obviously, with the current rate of technology development, new types of sensors will be introduced and embedded in smart devices. Although the proposed solutions of this thesis are mainly designed and tested based on current technology, they can in fact perfectly fit future environments. The development of new sensors and extra hardware-related features can additionally support the contributions and enhance the quality of the achieved results. For instance, the release of the Kinect sensor provided more flexibility to the proposed concepts, designs and technologies due to its capability to provide additional depth information. Clearly, the integration of RGB images and depth information can substantially improve the quality of detection, tracking, noise removal, etc.

Another example is the development of wearable AR glasses. Although presenting information through AR glasses is not a new concept in media technology, the technical development of recent years has made this concept feasible to implement. The combination of a lightweight wearable display and different types of sensors is the ideal scenario for gesture-based interaction technology. The technical contributions of this thesis perfectly fit this area.

6.2.1.4 Contributions of other Research Areas to Computer Vision

The rapid development of other research areas can make strong contributions to the computer vision field. Solving gesture analysis problems through search methods is based on this idea. Since search algorithms are extensively used for text and document retrieval, modeling the gesture recognition and tracking problems with common search methods such as indexing could effectively improve the research results. These findings reveal that breakthrough technologies from other research areas can be successfully adapted to similar concepts with totally different application scenarios. Basically, retrieving the best gesture entry from a huge database of images is a similar concept to finding the most relevant document for a searched text phrase. Thus, the integration of classical computer vision and pattern recognition methods with enabling technologies from other research fields can provide extremely powerful tools for solving the technical challenges.
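The analogy can be made concrete with a hypothetical sketch (not the thesis implementation; the class and names below are illustrative): gesture images described by quantized local features ("visual words") are indexed the way a text search engine indexes documents by terms, using an inverted index that maps each word to the entries containing it.

```python
# Illustrative inverted index for gesture retrieval: each database entry
# is a set of quantized feature ids plus its annotated 3D pose.
class GestureIndex:
    def __init__(self):
        self.index = {}   # visual word -> set of entry ids
        self.poses = {}   # entry id -> annotated 3D pose parameters

    def add(self, entry_id, words, pose):
        """Index one annotated database entry."""
        self.poses[entry_id] = pose
        for w in words:
            self.index.setdefault(w, set()).add(entry_id)

    def query(self, words):
        """Vote over shared words; return the best entry and its pose."""
        votes = {}
        for w in words:
            for eid in self.index.get(w, ()):
                votes[eid] = votes.get(eid, 0) + 1
        best = max(votes, key=votes.get)
        return best, self.poses[best]

# Usage: a query sharing most features with "open_hand" retrieves its pose.
idx = GestureIndex()
idx.add("open_hand", {1, 5, 9, 12}, {"pitch": 0.0, "yaw": 0.1})
idx.add("fist", {2, 5, 7}, {"pitch": 0.4, "yaw": 0.0})
best, pose = idx.query({1, 9, 12, 7})
```

The retrieved pose annotation is what turns a recognition result into tracking output: the best match carries the 3D motion parameters directly, with no per-frame pose estimation.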

6.2.2 Further Development

There are quite a large number of application scenarios and configurations that might benefit from the proposed technologies of this thesis. Since this research was conducted during a limited period of time, it was not possible to deeply investigate all aspects of the proposed methods, such as user studies and design features. Evaluations and experiments were mainly performed on the technical aspects of the contributions. However, for the systems implemented based on the proposed methods, user experience and design aspects were considered and studied in most cases. Here, some interesting directions for further research and development are mentioned.

6.2.2.1 Concept of Collaborative 3D Interaction

The development of 3D interaction in collaborative scenarios is an interesting line for further research. From a technical perspective, collaborative 3D interaction can be implemented based on the proposed solutions of this thesis. As discussed in the previous chapters, sharing the interaction/visualization spaces among several users opens up numerous application scenarios. Exchanging digital information such as documents, photos, audio and video tracks between different users is an example of collaborative sharing based on 3D gestural interaction. In fact, users might grab, move and pass multimedia content in a shared 3D space using physical hand gestures.

6.2.2.2 Concept of Interaction in the Space using Body Gestures

Interaction between humans and future smart devices can be extended to the whole space. Specifically, with the introduction of AR glasses, hand-held devices will disappear and the whole space in front of the user can be dedicated to the interaction. The contributions of this thesis have focused on hand gesture technology for interaction in front of the smart device and on 3D head motion estimation for interactive displays. Since the interaction space can be extended to a larger space, whole-body motion for action recognition and other body parts such as the feet might be employed to design interactive application scenarios.

6.2.2.3 Extension of the Gesture Search Framework to Extremely Large Scale

The proposed search framework for gesture recognition and tracking has been implemented and tested with different databases. The largest database contains 10,000 gesture entries. Although this number seems to cover quite a large set of gesture poses for handling the gesture analysis problem, according to the estimations, for extremely high-resolution tracking the database should be extended. One important line of research for further development is to generalize the retrieval system to an extremely large database. Certainly, the real-time matching process for such an extended database will be quite a challenging problem.
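One standard way to keep matching real-time as the database grows far beyond 10,000 entries is approximate search. The following hedged sketch (a generic technique, not the thesis method; all names are illustrative) uses locality-sensitive hashing over binary gesture descriptors: entries that agree on a random subset of bits share a bucket, so a query scans only a few buckets instead of the whole database.

```python
import random

def build_tables(descriptors, n_tables=4, n_bits=8, dim=64, seed=0):
    """Build LSH tables: each table keys entries by a random bit subset."""
    rng = random.Random(seed)
    tables = []
    for _ in range(n_tables):
        bits = rng.sample(range(dim), n_bits)      # random bit positions
        buckets = {}
        for eid, d in descriptors.items():
            key = tuple(d[b] for b in bits)
            buckets.setdefault(key, []).append(eid)
        tables.append((bits, buckets))
    return tables

def query(tables, descriptors, q):
    """Collect bucket candidates, then rank by exact Hamming distance."""
    candidates = set()
    for bits, buckets in tables:
        candidates.update(buckets.get(tuple(q[b] for b in bits), ()))
    if not candidates:
        return None
    return min(candidates,
               key=lambda e: sum(a != b for a, b in zip(descriptors[e], q)))

# Usage: a query identical to a stored descriptor retrieves that entry.
db = {"open_hand": [0] * 64, "fist": [1] * 64}
tables = build_tables(db)
match = query(tables, db, [0] * 64)
```

The trade-off is the usual one: more tables raise recall for near-but-not-identical queries at the cost of memory, while bucket lookup keeps the per-query cost nearly independent of the database size.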

6.2.3 Future of Mobile Interaction and Visualization

It is quite difficult to predict the evolution of smart devices within the next ten years. From a hardware point of view, the current trend shows that displays might be presented with 3D technology, in transparent, flexible, or wearable formats. Future devices will definitely be equipped with numerous sensors, larger storage, and faster processors. High-speed mobile network connections might totally change the design of future smartphones. Storage and processing can be shifted to the infrastructure, and smart devices might act as a set of sensors and a screen for visualization. A huge network of connected devices will provide the chance to share digital content in a virtual space. Collaborative interaction might be an important part of future mobile multimedia. The concept of personalized environments and screens can totally change the future of visualization. In fact, with the introduction of AR glasses, any space can be dedicated to a personalized environment. The whole space and the augmented information can be designed based on the user's demands.

The future of interaction technology on mobile devices might be highly affected by multimodal inputs. 3D technology for intuitive interaction will definitely be an essential part of that. Bare-hand interaction in 3D space can be used to perform various tasks. Other input modalities such as voice, motion or orientation will be complementary.

An important point to mention here is that natural interaction requires the sense of touch. This is probably the essential feature that free-space interaction might need. Ultrasound 3D rendering is an enabling technology that might be useful for implementing the sense of touch in free space. Since this research is still at an early stage [83], the availability of this technology for mobile devices in the near future is quite questionable at this moment. This technology might offer the rendering of virtual objects in free space.


Chapter 7

Summary of the Selected Articles

This thesis reflects the results of the research conducted by Shahrouz Yousefi during his PhD studies. The first part of this thesis (the introduction) is based on the results of more than 15 papers published in international conferences and journals. The publication list is included at the end of this chapter. In the second part of this thesis, 7 papers are included. In all selected papers, Shahrouz Yousefi (the author of this thesis) is the first/corresponding author. The major contributions of these seven papers, including concepts, theories, experiments, implementations and writing, are from Shahrouz Yousefi. Prof. Haibo Li has supervised Shahrouz Yousefi as the main supervisor during this PhD study. The third author has assisted Shahrouz Yousefi in some experiments or has participated in the discussions.

Chapter 8 introduces gesture detection, tracking and 3D motion analysis based on first-order and second-order rotational symmetry patterns. Rotational symmetries have been used for gesture localization and fingertip detection. Feature detection, feature tracking, and 3D motion retrieval have been performed. The computed motion parameters have been used to control and manipulate virtual objects on the screen. Various application scenarios that might benefit from the proposed technology have been introduced. The content of this chapter has been published as a journal article in Pattern Recognition Letters (PRL).

The content of chapter 9 is reprinted from the paper published at the ACM International Conference on Multimedia (ACMMM12). This work was presented at the conference in both oral and poster sessions. It was selected as one of the top eight papers for the Doctoral Symposium track. The content of this work was evaluated by an opponent and committee members at the conference. This paper reflects the substantial part of this PhD thesis in brief. It introduces the concept of 3D gestural interaction, potential applications, the enabling media technologies that support this concept, and the implemented photo browsing system.

Chapter 10 introduces the concept of gesture analysis based on a large-scale gesture retrieval and search engine. The introduced technology is based on a database of annotated gesture images with the corresponding 3D pose information, and a search engine for similarity analysis between the query gesture and the database entries. The output provides the best match from the database, and the annotated motion parameters are used in real-time interaction. This paper is accepted for publication in the 9th International Conference on Computer Vision Theory and Applications (VISAPP2014). This work has successfully passed the novelty analysis step through KTH Innovation. Due to patent application restrictions, the full version of this work with technical details has not been submitted for publication in conferences or journals. The extended version of this work is filed as a U.S. patent application.

Chapter 11 introduces interactive 3D visualization, the proposed technology for real-time interaction between users and the content of the display.


This technology enables users to control and manipulate the content of the screen based on their position/orientation in 3D space. A head-mounted vision sensor is employed to measure and report the 3D motion parameters. The real-time motion parameters are sent to the rendering block for visualization on the screen. This paper is submitted to the International Conference on Image Processing (ICIP2014).

The content of chapter 12 is reprinted from the paper published at the International Conference on Signal Processing and Multimedia Applications (SIGMAP 2011). This paper introduces the technology for 3D visualization of monocular images based on patch-level depth retrieval. Stereoscopic techniques have been used for 3D visualization on a normal 2D display.

The content of chapter 13 is the reprinted version of the paper published at the IEEE International Conference on Wireless Communications and Signal Processing (WCSP2011). This paper discusses the technology for converting 2D monocular photo and video collections to 3D and visualizing them on 2D displays using stereoscopic technology.

Chapter 14 introduces a vision-based technique for robust correction of 3D geo-metadata in photo collections. The proposed technology efficiently improves the accuracy of the position/orientation information in photo collections. Consequently, this approach enhances the 3D visualization, navigation, and exploration of large data sets. The content of chapter 14 is reprinted from the paper published at the IEEE International Conference on Wireless Communications and Signal Processing (WCSP2011).

7.1 List of Publications

This thesis is based on the contributions of the following articles, although not all of them are included:


Journal articles:

• Shahrouz Yousefi, Farid Abedan Kondori, Haibo Li, Experiencing Real 3D Gestural Interaction with Mobile Devices, published in Pattern Recognition Letters (PRLetters), 2013.

• Shahrouz Yousefi, Farid Abedan Kondori, Haibo Li, Gesture Tracking for Real 3D Interaction Behind Mobile Devices, published in the International Journal of Pattern Recognition and Artificial Intelligence (IJPRAI), 2013.

• Farid Abedan Kondori, Shahrouz Yousefi, Haibo Li, Direct Head Pose Estimation Using Kinect-type Sensors, published in Electronics Letters, 2014.

Licentiate thesis:

• Shahrouz Yousefi, Enabling Media Technologies for Mobile Photo Browsing, Licentiate Thesis, Digital Media Lab (DML), Department of Applied Physics and Electronics, Umeå University, SE-901 87, Umeå, Sweden, ISSN: 1652-6295:16, ISBN: 978-91-7459-426-3, 2012.

Conference papers:

• Shahrouz Yousefi, Farid Abedan Kondori, Haibo Li, Bare-hand Gesture Recognition and Tracking through the Large-scale Image Retrieval, accepted for publication in the 9th International Conference on Computer Vision Theory and Applications (VISAPP), January 2014.

• Shahrouz Yousefi, 3D Photo Browsing for Future Mobile Devices, In Proceedings of the 20th ACM International Conference on Multimedia (ACMMM12), October 29-November 2, Nara, Japan, 2012.


• Farid Abedan Kondori, Shahrouz Yousefi, Haibo Li, Real 3D Interaction Behind Mobile Phones for Augmented Environments, In Proceedings of the IEEE International Conference on Multimedia and Expo (ICME), Barcelona, Spain, July 2011.

• Shahrouz Yousefi, Farid Abedan Kondori, Haibo Li, 3D Gestural Interaction for Stereoscopic Visualization on Mobile Devices, In Proceedings of the 14th International Conference on Computer Analysis of Images and Patterns (CAIP), Seville, Spain, CAIP (2), Vol. 6855, Springer, pp. 555-562, 29-31 August 2011.

• Shahrouz Yousefi, Farid Abedan Kondori, Haibo Li, 3D Visualization of Single Images using Patch Level Depth, In Proceedings of the International Conference on Signal Processing and Multimedia Applications (SIGMAP), Seville, Spain, 18-21 July 2011.

• Shahrouz Yousefi, Farid Abedan Kondori, Haibo Li, Stereoscopic Visualization of Monocular Images in Photo Collections, In Proceedings of the IEEE International Conference on Wireless Communications and Signal Processing (WCSP), Nanjing, China, pp. 1-5, 9-11 Nov. 2011.

• Shahrouz Yousefi, Farid Abedan Kondori, Haibo Li, Robust Correction of 3D Geo-Metadata in Photo Collections by Forming a Photo Grid, In Proceedings of the IEEE International Conference on Wireless Communications and Signal Processing (WCSP), Nanjing, China, pp. 1-5, 9-11 Nov. 2011.

• Shahrouz Yousefi, Farid Abedan Kondori, Haibo Li, Tracking Fingers in 3D Space for Mobile Interaction, In Proceedings of the 20th International Conference on Pattern Recognition (ICPR), Istanbul, Turkey, 2010.


Under-review articles:

• Shahrouz Yousefi, Farid Abedan Kondori, Haibo Li, 3D Hand Gesture Recognition and Tracking through the Large-scale Gesture Search Engine, Submitted to the IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2014.

• Shahrouz Yousefi, Farid Abedan Kondori, Haibo Li, Interactive 3D Visualization on a 4K Wall-Sized Display, Submitted to the IEEE International Conference on Image Processing (ICIP 2014), Paris, France, 2014.

• Farid Abedan Kondori, Shahrouz Yousefi, Li Liu, Haibo Li, Direct Hand Pose Estimation for Immersive Gestural Interaction, Submitted to Pattern Recognition Letters (PRLetters), 2014.

• Farid Abedan Kondori, Shahrouz Yousefi, Ahmad Ostovar, Li Liu, Haibo Li, A Direct Method for 3D Hand Pose Recovery, Submitted to the 22nd International Conference on Pattern Recognition (ICPR 2014), Stockholm, Sweden, 2014.

Other related publications:

• Farid Abedan Kondori, Shahrouz Yousefi, Li Liu, Haibo Li, Head Operated Electric Wheelchair, accepted for publication in the IEEE Southwest Symposium on Image Analysis and Interpretation (SSIAI 2014), San Diego, USA, 2014.

• Farid Abedan Kondori, Shahrouz Yousefi, Haibo Li, Samuel Sonning, Sabina Sonning, 3D Head Pose Estimation Using the Kinect, In Proceedings of the 2011 IEEE International Conference on Wireless Communications and Signal Processing (WCSP), Nanjing, China, Nov. 2011.

• Farid Abedan Kondori, Shahrouz Yousefi, Smart Baggage In Aviation, In Proceedings of the 2011 IEEE International Conference on Internet of Things (iThings-11), Dalian, China, 2011.

• Shahrouz Yousefi, Farid Abedan Kondori, Haibo Li, 3D Visualization of Monocular Images in Photo Collections, In Proceedings of the Swedish Symposium on Image Analysis (SSBA), Linköping, Sweden, 2011.

• Farid Abedan Kondori, Shahrouz Yousefi, Haibo Li, Gesture Tracking for 3D Interaction in Augmented Environments, In Proceedings of the Swedish Symposium on Image Analysis (SSBA), Linköping, Sweden, 2011.

• Farid Abedan Kondori, Shahrouz Yousefi, Haibo Li, , In Proceedings of the Swedish Symposium on Image Analysis (SSBA), Gothenburg, Sweden, 2013.

Patents:

• Shahrouz Yousefi, Haibo Li, Farid Abedan Kondori, Real-time 3D Gesture Recognition and Tracking System for Mobile Devices, U.S. Patent Application, filed January 2014. Patent pending.
