UNIVERSITY OF THE WESTERN CAPE

Independent hand-tracking from a single two-dimensional view and its application to South African sign language recognition

by Imran Achmed

A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy

in the Faculty of Science Department of Computer Science

February 2014

Declaration of Authorship

I declare that Independent hand-tracking from a single two-dimensional view and its application to South African sign language recognition is my own work, that it has not been submitted for any degree or examination in any other university, and that all the sources I have used or quoted have been indicated and acknowledged by complete references.

Signed:

Date:


UNIVERSITY OF THE WESTERN CAPE

Abstract

Faculty of Science Department of Computer Science

Doctor of Philosophy

by Imran Achmed

Supervisor: I. M. Venter
Co-supervisor: P. Eisert

Hand motion provides a natural way of interaction that allows humans to interact not only with the environment, but also with each other. The effectiveness and accuracy of hand-tracking is fundamental to the recognition of sign language. Any inconsistencies in hand-tracking result in a breakdown in sign language communication. Hands are articulated objects, which complicates the tracking thereof. In sign language communication the tracking of hands is often challenged by the occlusion of the other hand, other body parts and the environment in which they are being tracked. The thesis investigates whether a single framework can be developed to track the hands of an individual independently from a single 2D camera in constrained and unconstrained environments without the need for any special device. The framework consists of a three-phase strategy, namely, detection, tracking and learning phases. The detection phase validates whether the object being tracked is a hand, using extended local binary patterns and random forests. The tracking phase tracks the hands independently by extending a novel data-association technique. The learning phase exploits contextual features, using the scale-invariant feature transform (SIFT) algorithm and the fast library for approximate nearest neighbours (FLANN) algorithm, to assist tracking and the recovery of hands from any form of tracking failure. The framework was evaluated on South African sign language phrases that use a single hand, both hands without occlusion, and both hands with occlusion. These phrases were performed by 20 individuals in constrained and unconstrained environments. The experiments revealed that integrating all three phases to form a single framework is suitable for tracking hands in both constrained and unconstrained environments, where high average accuracies of 82.08% and 79.83% were achieved, respectively.

Acknowledgements

One of the joys of completing my thesis is looking back at the long journey that has passed, the uphill battles, the endless late nights, the frustration, the hard work and remembering all of those who supported and helped me along this long but fulfilling road.

First and foremost, I thank the Almighty God for blessing me throughout my studies and granting me the knowledge, wisdom and guidance to reach this level of success.

My heartfelt thanks and appreciation are conveyed to my supervisors, Prof. Isabella M. Venter and Prof. Peter Eisert. I feel fortunate and privileged to have been guided by these remarkable academics. I value their insights, unparalleled knowledge, and the valuable time they spent guiding me in my research. Without their constructive critiques and advice, this thesis would not have been the same.

Throughout my research, Mr. Dodds contributed much of his time proof-reading my articles and providing me with sound advice and constructive comments that were of tremendous benefit. An enormous thanks goes to him for his generous assistance.

I am sincerely grateful and would like to thank Prof. Richard Madsen, Professor in Statistics, for his willingness to assist and provide insightful guidance on my statistical analysis from the research planning stages until the very end. His statistical expertise, especially in this thesis, is greatly appreciated.

I would also like to thank Prof. Alex Potter, Professor in English, for professionally editing the thesis and for his thoughtful comments and suggestions.

Many thanks to the Computer Science Department staff, especially, Ms. Abbott, Mrs. Jacobs-Samsodien and Mr. Leenderts for their selfless assistance in seeing to my (at times urgent) needs.

A word of thanks to my sponsor, Telkom, for their financial support throughout the entire duration of my studies. Without their support, it would not have been possible to do this research. A further thanks is extended to all the individuals who volunteered and offered their time to participate in this research.

To my lab colleagues, Pei-Li, Dane, Diego, Warren, Ibraheem, Kenzo and Roland, thank you for providing a stimulating environment to work in. I had a great time working alongside you all, sharing ideas and the many laughs we had. It was truly enjoyable.

I am forever indebted to my parents who instilled in me the values of hard work and dedication. Even though they did not complete their school education, their enormous support in the furthering of my studies has pushed me to excel. It was always difficult for them to understand why I wanted to pursue my Ph.D., but my passion for research and seeing the research transform into a thesis has made them realise its worth.

A special mention is given to my friends and family who have supported me through my studies: Arnold, Andrew, Ashley, Fozia, Chiara, Riedewaan, Mrs. Yvonne Meyer (Ma), Toshca, Jody, Latasha, Isaac, Lindy and Jason, thank you for your genuine concern for my well-being and for always motivating me to succeed.

Above all, I would like to convey a special thanks to the most important person in my life, my fiancée, and soon-to-be wife, Tamlynne Meyer. Her love, support, patience and understanding were just a few of the qualities that kept me sane throughout my studies. There were times when I was ready to throw in the towel, when no-one understood what I was going through, but she was always beside me, listening to me and encouraging me. She kept me motivated and gave me the momentum to see each year through. No-one will understand the extent to which she has supported me, not only through my Ph.D., but throughout my studies since undergraduate level. To her I owe everything. There are no words to convey how much I love her. She is my source of inspiration.

Tamlynne, to you I dedicate this thesis.

Contents

Declaration of Authorship i

Abstract ii

Acknowledgements iii

List of Figures ix

List of Tables xiii

Glossary xv

1 Introduction 1
1.1 Background and motivation ...... 1
1.2 Research problem ...... 2
1.3 Research questions ...... 5
1.4 Research objectives ...... 5
1.5 Premises ...... 6
1.6 Contributions ...... 7
1.7 Thesis outline ...... 7

2 Related Work 9
2.1 Hand detection ...... 10
2.1.1 Three-dimensional hand detection ...... 10
2.1.2 Two-dimensional hand detection ...... 11
2.2 Context-based object tracking ...... 14
2.2.1 Knowledge-based context ...... 14
2.2.2 Temporal context ...... 15
2.2.3 Spatial context ...... 16
2.3 Hand-tracking ...... 17
2.3.1 Auxiliary approaches ...... 17
2.3.1.1 Two-dimensional input ...... 18
2.3.1.2 Three-dimensional input ...... 19
2.3.2 Passive approaches ...... 21
2.3.2.1 Model-based methods ...... 21
2.3.2.2 Appearance-based methods ...... 25
2.3.2.2.1 Three-dimensional input ...... 26
2.3.2.2.2 Two-dimensional input ...... 27
2.4 Discussion and summary ...... 32

3 Design Science Research 35
3.1 Research philosophy ...... 35
3.2 Epistemology ...... 37
3.3 Theoretical perspective ...... 38
3.4 Methodology ...... 39
3.4.1 Design science research ...... 39
3.5 Methods ...... 42
3.6 Summary ...... 43

4 Detection-Tracking-Learning 44
4.1 Detection Phase ...... 44
4.1.1 Local binary patterns ...... 44
4.1.1.1 Uniform LBP ...... 47
4.1.1.2 Rotational-invariant LBP ...... 48
4.1.1.3 Global versus local LBP histograms ...... 48
4.1.2 Random forests ...... 49
4.1.2.1 Error, strength and correlation using out-of-bag-error ...... 52
4.1.2.2 Variable importance ...... 53
4.1.2.3 Testing ...... 53
4.1.2.4 Parameter selection ...... 55
4.1.2.5 Strengths and weaknesses of random forests ...... 55
4.1.3 Support vector machines ...... 56
4.1.3.1 Kernel functions ...... 61
4.2 Tracking phase ...... 62
4.2.1 Skin segmentation ...... 63
4.2.1.1 Face detection ...... 64
4.2.1.2 Does a suitable colour space exist? ...... 65
4.2.1.3 Dynamic skin model ...... 66
4.2.2 Connected-components labelling ...... 67
4.2.3 Independent hand-tracking ...... 68
4.2.3.1 Generating object hypotheses ...... 69
4.2.3.2 Tracking-object hypotheses ...... 71
4.2.3.3 Predicting of object hypotheses ...... 72
4.3 Learning phase ...... 73
4.3.1 Feature detectors ...... 74
4.3.1.1 Scale-invariant feature transform ...... 74
4.3.2 Feature descriptors ...... 75
4.3.2.1 Scale-invariant feature transform descriptor ...... 75
4.3.3 Image matching ...... 76
4.3.3.1 FLANN-based matcher ...... 76
4.3.4 Comparison of feature detectors and descriptors ...... 76
4.4 Summary ...... 78

5 Hand-Tracking Framework 80
5.1 Prototype T ...... 80
5.2 Prototype DT ...... 86
5.3 Prototype DTL ...... 90
5.4 Summary ...... 95

6 Experimental Results and Analysis 96
6.1 Introduction ...... 96
6.2 Dataset generation ...... 97
6.2.1 Sign language data ...... 98
6.2.2 SASL dataset ...... 98
6.2.3 Hand dataset ...... 99
6.3 Experimental setup ...... 102
6.4 Ground-truth annotation ...... 102
6.5 Performance evaluation metrics ...... 104
6.5.1 Euclidean distance ...... 104
6.5.2 Tracking ratio ...... 106
6.5.3 Accuracy, precision and recall ...... 106
6.6 Results and discussion ...... 107
6.6.1 Tracking evaluation ...... 108
6.6.1.1 Prototype T ...... 108
6.6.2 Detection and tracking evaluation ...... 112
6.6.2.1 Hand detection ...... 112
6.6.2.1.1 Random forests ...... 113
6.6.2.1.2 Support vector machines ...... 116
6.6.2.1.3 Comparison between hand-detection systems ...... 119
6.6.2.2 Prototype DT ...... 122
6.6.3 Detection, tracking and learning evaluation ...... 126
6.6.3.1 Prototype DTL ...... 126
6.6.4 Comparing prototypes T, DT and DTL ...... 130
6.6.4.1 Statistical analysis of the comparison between prototypes T, DT and DTL ...... 133
6.7 Summary ...... 140

7 Conclusion and Directions for Future Research 141
7.1 Contributions ...... 142
7.1.1 Articulated hand detection using extended local binary patterns with random forests ...... 143
7.1.2 Pixel-level data association for independent hand-tracking in unconstrained environments ...... 144
7.1.3 Exploiting contextual features towards assisting hand-tracking ...... 145
7.1.4 A framework for independent hand-tracking ...... 146
7.2 Limitations of the research study ...... 146
7.3 Future work ...... 147
7.3.1 Arm-based hand-tracking ...... 148
7.3.2 Enhancing skin-colour segmentation ...... 148
7.3.3 Improved hand detection ...... 149
7.3.4 Optimised hand-tracking framework ...... 150
7.4 Reflecting on the study ...... 150
7.4.1 Current interest of the study ...... 151
7.4.2 Significance of the study ...... 151
7.4.3 Relation to previous studies ...... 151

A State-of-the-art feature detectors and descriptors 153
A.1 Feature detectors ...... 153
A.1.0.1 Harris ...... 153
A.1.0.2 Good-features-to-track ...... 154
A.1.0.3 Feature from accelerated segment test ...... 154
A.1.0.4 Speeded up robust feature ...... 155
A.1.0.5 STAR ...... 156
A.1.0.6 Maximally stable extremal regions ...... 156
A.2 Feature descriptors ...... 157
A.2.0.7 Speeded up robust feature descriptor ...... 157
A.2.0.8 Binary robust independent elementary features descriptor ...... 157
A.2.0.9 Oriented FAST and rotated BRIEF descriptor ...... 158
A.2.0.10 Binary robust scalable keypoints descriptor ...... 158
A.2.0.11 Fast retina keypoint descriptor ...... 158
A.3 Image matching ...... 159
A.3.0.12 Brute-force matcher ...... 159

B Consent form 160

C Dataset compilation 161

D Additional results 169
D.1 Tracking ...... 170
D.2 Detection ...... 173
D.3 Learning ...... 178

E Publications 190

Bibliography 192

List of Figures

1.1 The South African sign language system ...... 3
1.2 The human-computer interaction gesture taxonomy [33] ...... 3

2.1 Using depth data (bottom left and bottom middle) and skin-colour information (top middle and top right) to find a hand (bottom right) [156] (p. 68) ...... 11
2.2 Skin-coloured distribution and geometric constraints were used to determine the position of a hand [125] (p. 112) ...... 11
2.3 The red box (top left of image) represents the ROI for a HOG patch and each circle represents the center of the ROI for each HOG patch [151] (p. 92) ...... 12
2.4 Hand detection applied to three simple hand postures [166] (p. 286) ...... 12
2.5 (a) The original image; (b) the selected ROI used for training; (c) the classification of AdaBoost using a single positive sample only; (d) the output of AdaBoost using positive and negative samples [116] ...... 13
2.6 Tracking multiple targets that exhibit a common motion pattern [89] (p. 812) ...... 15
2.7 A global temporal context constraint applied to tracking [163] (p. 719) ...... 15
2.8 Each feature point is classified as either object, background or companion model [25] (p. 2127) ...... 16
2.9 The yellow box represents the target. The green, blue and red dots represent the features that belong to the object itself, the uncorrelated features that are discarded and the features that are used to vote for the position of the object, respectively [64] (p. 1286) ...... 17
2.10 The gloves were specially designed using colour patches to track the fingers and hand while dealing with occlusion [161] (p. 63) ...... 18
2.11 The Kinect sensor is used to track the main points on hands [55] (p. 319) ...... 19
2.12 The images correspond to the (a) original image; (b) depth image; (c) segmented area; and (d) the output image [167] (p. 151) ...... 19
2.13 The output after using the depth threshold to find the hand and the Kalman filter to track the hand [124] ...... 20
2.14 The depth data are used to track the hands, where the grey and white clusters represent the right and left hand, respectively [27] ...... 20
2.15 A two-step minimisation algorithm was applied to match the model to a hand [70] ...... 22
2.16 Hierarchical detection was employed to find the hand and a dynamic model guided the search while approximating the optimal filtering equations [146] (p. 1381) ...... 23
2.17 The rows correspond to the original image, the final synthetic image, the final residual image and the side view of the synthetic image [42] ...... 24
2.18 The left and right camera views of the Bumblebee stereo camera system were used to determine the depth correspondences [49] (p. 26) ...... 26
2.19 The output of the tracking system using skin-colour detection only [128] (p. 31) ...... 28
2.20 The modified flocks-of-features algorithm still fails due to rapid movements [54] ...... 29
2.21 A sample output of the improved flocks-of-features algorithm [97] (p. 466) ...... 29
2.22 An example of the output obtained from the framework used by Spruyt et al. [144] (p. 3120) ...... 30
2.23 The tracking of both hands, but not distinguishing between them, using the framework of Ongkittikul et al. [120] (p. 205) ...... 31
2.24 The output of the framework introduced by Argyros and Lourakis [10] (p. 1190) ...... 32

3.1 The decision-making process as four elements that inform one another ...... 36
3.2 An illustration of the link between epistemology and the research process ...... 37
3.3 The iterative process of a general DSR cycle consists of six distinctive stages [159] ...... 40
3.4 The DSR cycle applied to each of the phases ...... 41
3.5 A taxonomy of objective evaluation methods for tracking algorithms ...... 42

4.1 The first DSR cycle consisting of the detection phase ...... 45
4.2 An example of the LBP process that determines the decimal-label value ...... 45
4.3 Optional caption for list of figures ...... 46
4.4 Examples of the different micro-feature information that are extracted using LBPs ...... 47
4.5 An example of the concatenation of a spatially enhanced feature histogram ...... 49
4.6 Optional caption for list of figures ...... 56
4.7 Linear classification of a hyperplane [170] ...... 57
4.8 The second DSR cycle, consisting of the tracking phase ...... 62
4.9 The probability map where lighter pixels have higher probabilities and darker pixels have lower probabilities ...... 67
4.10 Optional caption for list of figures ...... 72
4.11 The third DSR cycle, consisting of the learning phase ...... 74

5.1 An overview of the systematic design of prototype T ...... 82
5.2 Optional caption for list of figures ...... 83
5.3 Optional caption for list of figures ...... 83
5.4 Optional caption for list of figures ...... 84
5.5 Association of the right and left hand in the initial frames ...... 84
5.6 An example of the updating process of the foreground-skin image ...... 85
5.7 An example of the convex hull of a hand ...... 86
5.8 An overview of the systematic design of prototype DT ...... 87
5.9 Optional caption for list of figures ...... 88
5.10 An example of the global representation of a LBP image in the form of a single histogram ...... 89
5.11 An example of a feature vector directly formed from the labelled pixel values of a LBP image ...... 89
5.12 An overview of the systematic design of prototype DTL ...... 92
5.13 The bounding region of the hand used to extract contextual feature information ...... 93
5.14 The features for each hand are stored in two separate databases ...... 93
5.15 The voting strategy used in the learning phase, where the cluster with the highest number of votes is selected ...... 94

6.1 The South African sign language translation system ...... 97
6.2 Optional caption for list of figures ...... 99
6.3 Optional caption for list of figures ...... 100
6.4 Optional caption for list of figures ...... 101
6.5 Optional caption for list of figures ...... 101
6.6 A frame in which the right hand is occluded by the participant's body ...... 103
6.7 A frame in which the ground-truth coordinates have been verified, where the blue circle represents the right hand and the red circle represents the left hand ...... 104
6.8 An illustration of a radius of 10 pixels (red circle), 15 pixels (green circle) and 20 pixels (orange circle) from the center of the hand ...... 105
6.9 The average hand-tracking success rates of prototype T for each signed gesture in both environments ...... 109
6.10 A comparison of the average hand-tracking success rates of prototype T for each signed gesture in constrained and unconstrained environments ...... 111
6.11 Evaluating different random forest parameters ...... 114
6.12 Evaluating different LBP algorithms on the original images in the dataset (experiment 1) and compared to the same images that were filtered based on skin-coloured pixels only (experiment 2), using a random forest model ...... 115
6.13 Evaluating and comparing different SVM kernels ...... 117
6.14 Evaluating different LBP algorithms on the original images in the dataset (experiment 3) and compared to the same images that were filtered based on skin-coloured pixels only (experiment 4), using an SVM model ...... 118
6.15 The average hand-tracking success rates of prototype DT for each signed gesture in both environments ...... 123
6.16 A comparison of the average hand-tracking success rates of prototype DT for each signed gesture in constrained and unconstrained environments ...... 125
6.17 The average hand-tracking success rates of prototype DTL for each signed gesture in both environments ...... 127
6.18 A comparison of the average hand-tracking success rates of prototype DTL for each signed gesture in constrained and unconstrained environments ...... 129
6.19 A comparison between the hand-tracking accuracies of each prototype ...... 131

A.1 Optional caption for list of figures ...... 156

D.1 The average hand-tracking success rates of prototype T for each participant in both environments ...... 170
D.2 A comparison of the average hand-tracking success rates of prototype T for each participant in constrained and unconstrained environments ...... 170

D.3 The average hand-tracking success rates of prototype DT for each participant in both environments ...... 174
D.4 A comparison of the average hand-tracking success rates of prototype DT for each participant in constrained and unconstrained environments ...... 175

D.5 The average hand-tracking success rates of prototype DTL for each participant in both environments ...... 178
D.6 A comparison of the average hand-tracking success rates of prototype DTL for each participant in constrained and unconstrained environments ...... 178
D.7 A comparison between the hand-tracking accuracies of each prototype in the constrained environments ...... 181
D.8 A comparison between the hand-tracking accuracies of each prototype in the unconstrained environments ...... 182

List of Tables

6.1 The hand dataset setup ...... 102
6.2 The confusion matrix of a binary classification problem ...... 107
6.3 Summary of the average hand-tracking success rates of prototype T for each signed gesture in both environments ...... 108
6.4 Comparison between different hand-detection methods considering all four factors ...... 120
6.5 Summary of the average hand-tracking success rates of prototype DT for each signed gesture in both environments ...... 124
6.6 Summary of the average hand-tracking success rates of prototype DTL for each signed gesture in both environments ...... 128
6.7 Analysis on sign 17 representing an interaction between the prototype and the environment ...... 134
6.8 Analysis between prototypes in which the results were statistically significant with an adjusted p-value less than 0.05 ...... 135
6.9 Evaluation to determine if the results are statistically significant between the environment and the sign being performed ...... 138
6.10 Analysis to determine if there are any significant differences between the signs being performed on prototype DTL in a constrained environment ...... 139

C.1 Group 1: These sign gestures contain single hand movements only ...... 161
C.2 Group 2: These sign gestures contain the movement of both hands without any interaction or occlusion ...... 163
C.3 Group 3: These sign gestures contain the movement of both hands that includes the interaction and occlusion between hands ...... 165
C.4 The selected groups of signs used to evaluate the prototypes ...... 168

D.1 Evaluation results using a threshold of 10, 15 and 20 pixels from the center of the hand ...... 169
D.2 Compare sign gestures using prototype T in constrained and unconstrained environments ...... 171


D.3 Compare participants using prototype T in constrained and unconstrained environments ...... 172
D.4 Results on different random forest parameters using the original LBP with global histogram features ...... 173

D.5 Results on random forests using different LBP algorithms on pixels with and without filtering ...... 173
D.6 Results on different SVM kernel parameters using the original LBP with global histogram features ...... 174
D.7 Results on SVMs using different LBP algorithms on pixels with and without filtering ...... 174
D.8 Compare sign gestures using prototype DT in constrained and unconstrained environments ...... 176
D.9 Compare participants using prototype DT in constrained and unconstrained environments ...... 177
D.10 Compare sign gestures using prototype DTL in constrained and unconstrained environments ...... 179
D.11 Compare participants using prototype DTL in constrained and unconstrained environments ...... 180
D.12 A list of the non-significant differences between the results of the different hand-detection methods. The differences between the results for the methods not listed in the table are all significant (this list was excluded as it is a much longer list). In the table, the combination and combination column values can be referred to in Table 6.4 ...... 183

Glossary

2D - Two-dimensional space - A space in the physical world, e.g. an image, that has two dimensions (width and height).
3D - Three-dimensional space - A space in the physical world that has three dimensions (width, height and depth).
ASL - American Sign Language - The form of sign language developed by the Deaf and hard-of-hearing individuals in the United States of America.
DSR - Design science research - A methodological framework that is used to develop a solution and provide general knowledge that can later be used by professionals in a respective field to design solutions to specific problems.
ETH - ETH Zurich - The ETH Zurich, Department of Information Technology and Electrical Engineering video sequence.
ELBP - Extended local binary pattern - A generic form of LBP based on circular neighbourhoods and linear interpolation of the pixel values, which is not restricted by the size of the neighbourhood or the number of sampling points.
FLANN - Fast library for approximate nearest neighbours - A library that provides an efficient search method used to find correspondences between feature vector sets.
GPU - Graphics processing unit - An electronic circuit that is specially designed to accelerate computations for image processing.
Haar - Haar features - A set of features that originated from Haar wavelets.
HCI - Human-Computer Interaction - The planning, design and analysis of the interaction between people and computers.
HMM - Hidden Markov Model - A statistical Markov model that assumes the system being modelled is a Markov process with hidden states.
HOG - Histogram of Oriented Gradients - Feature descriptors that count the occurrences of gradient orientations in local regions of an image.
HSV - Hue, saturation and value - An alternative colour space that is computed from the red-green-blue colour space and is based on human colour perception.


LAB - Luminance, red-green and blue-yellow colour space - L represents luminance, A represents the red-green colour components and B represents the blue-yellow colour components. The LAB colour space is a colour model that is based on the nonlinearly compressed CIE-XYZ colour space coordinates.
LBP - Local Binary Patterns - Describes the neighbouring area of a pixel by generating a binary code from the binary derivatives of that pixel.
OOB - Out-of-bag - The samples from the original training set that are left out of the bootstrap sample drawn, with replacement, for a particular tree.
OOBE - Out-of-bag error - An error estimate of a tree and of the forest that is computed on its out-of-bag samples.
RGB - Red, Green and Blue - The primary colour space, defined by the three chromaticities of the red, green and blue primary colours.
SASL - South African Sign Language - The form of sign language developed by the Deaf and hard-of-hearing individuals in South Africa.
SIR - Sequential Importance Resampling - The recursive version of importance sampling used in particle filters.
SiGML - Signing Gesture Markup Language - A markup language for signing gestures that is based on HamNoSys.
SURF - Speeded Up Robust Features - A high-performance scale-invariant and rotation-invariant interest point detector and descriptor.
SVM - Support Vector Machine - A supervised learning algorithm, used for classification and regression analysis, that analyses data and recognises patterns.
ToF - Time-of-flight - The time taken for a signal emitted by the camera to travel to an object and back, used to measure depth.
TSL - Taiwanese Sign Language - The form of sign language commonly used by the Deaf and hard-of-hearing individuals in Taiwan.

Chapter 1

Introduction

1.1 Background and motivation

The symbolic value of sign language remains at the heart of the identity of the hearing impaired. Sign language is used as a primary means of communication by Deaf¹ people and people who are hard of hearing all over the world, whereas oral communication is used by the hearing community. Over the past decade mobile communication has grown exponentially to allow millions of people to communicate daily [7], improving social interaction, educational services and socio-economic opportunities to a great extent. Sadly, the hearing impaired have not been able to fully reap the benefits of this rich form of information exchange and social communication, thus marginalising them from the broader society [2].

Despite this, the Constitution of the Republic of South Africa recognises South African Sign Language (SASL) as the official language for Deaf South African communities [43]. The communication problem experienced by the Deaf is partially alleviated by using skilled interpreters; however, their services need to be arranged ahead of time, they are expensive and there is unfortunately a lack of skilled interpreters to assist the hearing impaired in South Africa [61][63]. In cases where privacy is required, such as medical consultations, a Deaf person may not be comfortable having an interpreter present.

¹ Deaf with a capital D refers to people who use SASL as their primary language, while deaf with a lower-case d refers to people who are hard of hearing.


To bridge this communication gap, an automated mobile translation system that trans- lates from sign language to a spoken language and vice versa would benefit the Deaf community immensely. It would, in effect, improve communication between the hearing and Deaf communities. It would also be a solution to the shortage of skilled interpreters and the privacy issue. Such a system is complex and encompasses a multi-disciplinary research area that involves image processing, natural language processing, linguistics and artificial intelligence.

This research forms part of a broader research project at the University of the Western Cape that deals with the development of an integrated sign and verbal communication mobile translation system, depicted in Figure 1.1. One aspect of the translation system is concerned with recognising SASL, which is challenging due to the complexities involved in the visual interpretation of signed gestures. These gestures are collectively represented by facial expressions, hand trajectory, hand location and hand shapes, each of which forms a different component. This research focuses on two of these components, namely hand trajectory and hand location, which together fall under hand-tracking. Fundamental to the recognition of sign language is the effectiveness and accuracy of hand-tracking. Any failure in hand-tracking would result in a breakdown in communication. This can be compared to voice communication, where a breakdown results in the message not being correctly conveyed. According to the gesture taxonomy [33], depicted in Figure 1.2, hand movements form the basis of human–computer interaction. They provide a natural way of interaction that allows us to interact not only with the environment, but with each other. This illustrates the importance of tracking hands accurately.

1.2 Research problem

One of the principal aims of the SASL project is to implement the entire translation system as a service on a mobile device. According to a study by Ghaziasgar [60], mobile devices are well suited for capturing audio and video; however, the video material captured by most mobile devices is two-dimensional (2D). The aim is to allow users to use the system freely in an unconstrained environment.

Figure 1.1: The South African sign language translation system.

Figure 1.2: The human-computer interaction gesture taxonomy [33].

Two-dimensional video data captured in unconstrained environments contain much less information than three-dimensional (3D) video data [142]. This is a major drawback in extracting information accurately for sign language translation, since depth cannot be used to distinguish between the hands or between the hands and the face. Naturally, the hands form a major part of transferring information among the hearing impaired in their everyday lives and should not be limited to a specific environment.

The problem of hand-tracking has been explored for more than a decade [103, 107, 173] and continues to attract research interest, not only for sign language applications, but also for human–computer interfaces, human–robot interactions, and a number of applications in the recent and developing advances in gaming, such as the Sony Move, Nintendo Wii and Microsoft Kinect, each with their individual set of assumptions [111].

To date, hand-tracking from a single 2D view is still a largely unsolved problem [23], especially in unconstrained environments, and therefore an active research area [111]. Hand-tracking is a broad term that is used when determining the hand trajectory and hand location. Before higher-level information can be extracted and used for further tasks, the hands need to be localised in an environment. The position of the hands throughout an image sequence should therefore be found accurately and reliably.

The importance of hand-tracking and finding the location of the hands is demonstrated in other components within the translation system, such as hand-shape recognition, which includes tracking the individual fingers. Hand-tracking therefore serves as a first step in hand-shape recognition for SASL, where detailed information on the hand configuration can be extracted. In addition, it serves to indicate when the face is occluded by the hand(s), a scenario that requires information from hand location, hand-shape recognition and facial expression recognition.

The difficulties of hand-tracking arise primarily from the high dimensionality of the hand configuration space, high appearance variation, self-occlusions of the body parts, motion blur caused by the speed of signing, camera characteristics and complex backgrounds. Hands are articulated objects, which therefore complicates tracking as a result of the high dimensionality of possible space configurations. With respect to the appearance, the variation between one person and another may be significant, e.g. the skin may vary from a very light to a very dark skin-colour tone, and the geometry of the hands may vary in terms of thickness, length and width.

In sign language gestures, as well as other forms of hand motions, the hands are often occluded by a different body part or the opposite hand. In addition, the speed at which the sign or gesture is performed is often accompanied by motion blur and is different for each person. Some may move their hands slowly, while others move them faster.

One of the main challenges confronting hand-tracking and the tracking of other objects is the environment in which it is being tracked. An environment that contains too much or too little light, too many objects (especially objects that are similar), and too much background movement may negatively affect the tracking capabilities. While many researchers rule out these challenges by making several assumptions, this research directly addresses such challenges in a single framework.

1.3 Research questions

The hypothesis driving this research is: A single framework can be developed to track the hands of an individual independently from a single 2D camera in constrained and unconstrained environments. Thus, the question is: How should a framework be developed to detect, learn and track the hands? This research question can be translated into three phases:

1. How should hands be detected in an image sequence?

2. How should hands be tracked independently while effectively handling occlusion?

3. How should features be learnt to assist hand-tracking and avoid tracking failure?

1.4 Research objectives

These questions will be addressed by means of the development of a proof-of-concept framework. The framework will be developed around the tracking-learning-detection concept introduced by Kalal [81] as a strategy to track single objects in any environment or condition. This concept is adapted in this research to track two similar objects, such as the hands, by applying the detection phase before the tracking phase, in the order detection-tracking-learning. Here, learning refers to storing information in memory over time.

The same order is used to address the research questions as follows:

1. While tracking the hands, a detection phase should be implemented that validates that the object being tracked is a hand. To detect whether the object is a hand, a comparison should be made between different kinds of local binary patterns (LBPs) to determine the best method for extracting hand features, as well as between different support vector machine kernels and random forests with different parameters to determine the best way to model these features.

2. In the tracking phase, the hands should be localised in the form of skin clusters and identified by applying the connected-components-labelling algorithm to a skin-detected image. This image should be generated dynamically by using an individual's facial information to determine his or her skin colour and labelling the image pixels as skin or non-skin. To track the hands independently, a unique algorithm that uses data association should be applied to distinguish the right hand from the left and vice versa in an image sequence while dealing with occlusion.

3. The tracking of any object at a particular point in time is subject to tracking failure. To recover from tracking failure, a learning phase should be implemented. This phase should continuously extract additional features from objects around the hands that could be used to recover the tracking process if tracking failure should occur. These features will be extracted using the scale-invariant feature transform (SIFT) algorithm, then stored in a database and matched using the fast library for approximate nearest neighbours (FLANN) algorithm (a minimal sketch of these two building blocks follows this list).

4. To effectively and accurately track the hands independently in constrained and unconstrained environments, the detection, tracking and learning phases should be integrated into a single framework.
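As an illustration of the learning-phase building blocks named in point 3, the sketch below extracts SIFT descriptors from a cropped hand region and matches them against previously stored descriptors with FLANN, using standard OpenCV calls. It is a minimal sketch only: the ratio-test threshold, the FLANN index settings and the way candidate regions are chosen are illustrative assumptions, not parameters of the framework developed in this thesis.

    import cv2

    sift = cv2.SIFT_create()
    # A KD-tree index (algorithm=1) is the usual FLANN choice for float descriptors.
    flann = cv2.FlannBasedMatcher(dict(algorithm=1, trees=5), dict(checks=50))

    def describe(hand_patch_bgr):
        """SIFT keypoints and descriptors for a cropped hand region."""
        gray = cv2.cvtColor(hand_patch_bgr, cv2.COLOR_BGR2GRAY)
        return sift.detectAndCompute(gray, None)

    def good_matches(stored_desc, query_desc, ratio=0.7):
        """FLANN k-NN matching followed by Lowe's ratio test (threshold assumed)."""
        if stored_desc is None or query_desc is None:
            return []
        pairs = flann.knnMatch(stored_desc, query_desc, k=2)
        return [m for m, n in (p for p in pairs if len(p) == 2)
                if m.distance < ratio * n.distance]

In the full framework the surviving matches would feed the voting strategy of the learning phase described in Chapter 5; here they are simply returned as candidate correspondences.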

1.5 Premises

Sign language communication does not involve the entire body and requires only the upper body to convey the message [51][135]. Therefore, only information from the upper body is used for SASL recognition. Furthermore, the focus is on sign language gestures performed in long-sleeved clothing, because this is more challenging. Recognising gestures with short-sleeved clothing is a simpler task, because the skin of the arms can be used to find and distinguish between the hands.

In addition, the focus will exclude hand-shape recognition, which forms another component in the SASL translation system. The lack of hand configuration information imposes additional barriers on hand-tracking and distinguishing between hands.

1.6 Contributions

In addition to sign language recognition, the proposed contributions in this thesis have the potential to be applied to different problems such as human–computer and human–robot interaction. The main contribution of this thesis is a novel solution to independently track the right and left hand of an individual in any type of environment from a single 2D camera within a framework.

Additional contributions of this framework include:

1. An algorithm that determines whether a region of interest is a hand or not: global local binary patterns are used to extract features from a region of interest that has been filtered using a novel skin-detection algorithm, and the region is then classified using a random forest model;

2. A unique algorithm that applies data association to distinguish and track the right hand from the left and vice versa in both constrained and unconstrained environments from a single 2D camera. The algorithm is able to effectively handle occlusion in situations where the hands are stationary or moving; and

3. An algorithm that exploits contextual information from strong and reliable features on the hand and on objects in close proximity to the hand using SIFT. These features are used to recover from tracking failure and assist in distinguishing hands using a novel voting strategy.

1.7 Thesis outline

The thesis is laid out as follows:

Chapter 2: Related work: Existing literature on hand detection, context-based tracking and hand-tracking algorithms will be reviewed. The algorithms of the various approaches that have significantly impacted the development of these research areas will be compared and their benefits and trade-offs highlighted.

Chapter 3: Design science research: The philosophical grounding that underpins the research will be discussed to ensure the consistency of the study. The methodology that is informed by the philosophical stance will be described and the methods selected to perform the data analysis will be motivated.

Chapter 4: Detection-tracking-learning: The chapter will explore the components that form part of the detection, tracking and learning phases of the proposed framework. The various algorithms included in the components of each phase will be compared and their applicability to addressing the research problems will be explained.

Chapter 5: Hand-tracking framework: The chapter will describe the systematic design and procedures that were implemented to investigate the research questions. It will explain the way in which the various algorithms were linked and integrated in a particular phase and collectively used to form the final framework.

Chapter 6: Experimental results and analysis: In this chapter the experimental setup used to evaluate the proposed framework will be discussed and its effectiveness in terms of hand-tracking in SASL will be analysed. The chapter will present the results of the experiments as well as the analysis on the effectiveness of the skin detection algorithm, the accuracy of hand detection in images and the accuracy of tracking both hands independently throughout an image sequence.

Chapter 7: Conclusion and future work: The final chapter will highlight the main contributions of the research, recommend directions for future work and reflect back on the study.

Chapter 2

Related Work

Traditionally, most hand-tracking systems depended on complex devices or markers that were pre-attached to an individual's body. These systems have several disadvantages: they are expensive, conspicuous and impractical in a day-to-day sign language translation application. Hand-tracking systems that rely on image-processing and machine-learning techniques to provide markerless solutions have increasingly become a focus of attention in the past decade. With the advances in image processing and machine learning, new efforts are constantly being made to create novel systems or improve on existing ones.

In this chapter, existing literature that relates to hand-tracking is discussed. It provides an overview of the approaches and aspects others have used in this field. Many contributions have been made, but only the most recent and interesting studies directly related to this research are discussed. These studies not only present the latest developments, but also reflect the current interest in and relevance of this research area. Furthermore, focusing on the most recent studies allows one to identify existing research groups and potential contributions that can be made in the field.

To give an overview of these studies, the chapter is divided into three parts, relating to the detection, learning and tracking phases. Section 2.1 reviews approaches for the detection of hand instances in images. In Section 2.2, studies that have used context-based tracking to improve object-tracking methods are discussed. Section 2.3 gives an overview of the different hand-tracking techniques that exist, ranging from those using an auxiliary means to those using a purely passive means. The chapter concludes with a summary of each section, the challenges that still remain and suggestions as to what future directions could be investigated as contributions to a hand-tracking framework.

2.1 Hand detection

Hand detection is the process of determining the presence of a hand in an image. Several algorithms have been proposed, employing cues such as shape, colour, texture, context and depth. These cues have been successful in face-detection algorithms, since the appearance of faces has less variation, e.g. the nose and mouth can always be found below the eyes. However, hands are articulated objects and have a greater degree of freedom, resulting in a larger appearance variation. Identifying hands in images, despite the large variation, is the foremost challenge in hand-detection or -tracking tasks. This is further complicated by occlusion, complex environments and different illumination conditions. In this section algorithms proposed for hand detection in images have been grouped into 3D and 2D methods.

2.1.1 Three-dimensional hand detection

In 3D views the hand detection problem is simplified by the assumption that the hands will always be in front of the body. Van den Bergh and Van Gool [156] proposed to use depth from a time-of-flight (ToF) camera and colour from a red-green-blue (RGB) camera to detect the hands. Firstly, they calibrated the cameras by mapping the depth data to the colour data. They then used the distance of the face and skin colour to determine which objects are likely to be the hands. They therefore assumed that any skin-coloured object that has a shorter distance to the camera will be a hand, as shown in Figure 2.1. They evaluated their system on 362 frames and obtained a correct detection rate of 99.3%.
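A minimal sketch of this kind of decision rule is shown below, assuming the depth map and the skin mask have already been aligned to the same image grid (the calibration step described above); the depth margin and minimum blob area are arbitrary placeholder values, not those of the cited system.

    import cv2
    import numpy as np

    def hand_candidates(depth_mm, skin_mask, face_depth_mm,
                        margin_mm=150.0, min_area=400):
        """Return centroids of skin-coloured blobs closer to the camera than the face."""
        closer = (depth_mm < face_depth_mm - margin_mm).astype(np.uint8)
        candidates = cv2.bitwise_and(skin_mask, skin_mask, mask=closer)
        n, _, stats, centroids = cv2.connectedComponentsWithStats(candidates)
        # Label 0 is the background; keep blobs above a minimum area.
        return [tuple(centroids[i]) for i in range(1, n)
                if stats[i, cv2.CC_STAT_AREA] >= min_area]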

The same segmentation approach was taken by Ren et al. [130] and Hadfield and Bowden [68], who used the MS Kinect camera and stereo cameras, respectively, to extract depth. They followed the same assumption that any skin-coloured object that is closer to the camera than the face can be assumed to be a hand.

Figure 2.1: Using depth data (bottom left and bottom middle) and skin-colour infor- mation (top middle and top right) to find a hand (bottom right) [156] (p. 68).

2.1.2 Two-dimensional hand detection

In 2D views, the hand-detection problem is categorised as the detection of articulated objects due to the large number of degrees of freedom of the hands. The most common cues used to detect hands are colour-based features.

Figure 2.2: Skin-coloured distribution and geometric constraints were used to deter- mine the position of a hand [125] (p. 112).

As an alternative to colour-based features, Petersen and Stricker [125] suggested that geometric, posture-invariant local constraints found on finger appearances could be used to detect hands in images, as shown in Figure 2.2. They detected fingers by applying a further constraint that bounds the maximum angle between an outstretched finger and the principal hand axis by an upper limit. They subjectively evaluated their method on a sequence of 140 images and showed a good detection rate for an open-hand sequence. Their method, however, is limited to open-hand postures where the fingers are visible.

Figure 2.3: The red box (top left of image) represents the ROI for a HOG patch and each circle represents the center of the ROI for each HOG patch [151] (p. 92).

In Thangali and Sclaroff [151], a histogram of oriented gradients (HOG) was used. They further proposed an alignment step to allow for non-rigid deformations between image-region pairs. This step measured an aligned distance between image regions computed using the best-matched HOG feature vector in the local neighbourhood for each HOG feature location, as depicted in Figure 2.3. The aligned distance was used with a support vector machine to classify hand-image regions. Their training and testing set consisted of 4 000 hand and 8 000 non-hand image regions extracted from a set of American Sign Language videos. They showed that improved results are obtained using their approach compared to the rigid matching in the traditional HOG and the vocabulary-guided pyramid-matching kernel.

Similarly, Zondag et al. [175] proposed an algorithm employing HOG, but combined it with stump-and-tree weak classifiers and two variations of the AdaBoost algorithm, namely, Discrete AdaBoost and Gentle AdaBoost. They provided a comparison between using HOG features and Haar-like features for the hand-detection task. They evaluated their algorithm on a dataset of 40 000 positive and 100 000 negative samples, where two-thirds of the samples were used for training and one-third for testing. Their results indicated that although Haar-like features performed faster than HOG features, the HOG features obtained better average false-positive rates.
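The basic recipe shared by these detectors, HOG features fed to boosted decision stumps, can be sketched as follows with scikit-image and scikit-learn; the patch size, HOG parameters and number of weak classifiers are placeholders rather than the settings used in the cited papers.

    import numpy as np
    from skimage.feature import hog
    from sklearn.ensemble import AdaBoostClassifier

    def hog_vector(gray_patch):
        """HOG descriptor for a fixed-size grey-scale patch (e.g. 64x64 pixels)."""
        return hog(gray_patch, orientations=9, pixels_per_cell=(8, 8),
                   cells_per_block=(2, 2), block_norm='L2-Hys')

    def train_hand_detector(patches, labels):
        """Boosted decision stumps (scikit-learn's default weak learner) over HOG features."""
        X = np.array([hog_vector(p) for p in patches])
        clf = AdaBoostClassifier(n_estimators=200)  # 1 = hand, 0 = non-hand labels
        clf.fit(X, np.asarray(labels))
        return clf

Swapping the HOG descriptor for a Haar-like feature extractor would reproduce the other branch of the comparison.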

Figure 2.4: Hand detection applied to three simple hand postures [166] (p. 286).

The same comparison between AdaBoost algorithms was performed by Xiao et al. [166].

However, they applied an LBP feature-extraction method. Their comparison was performed between the Discrete, Real and Gentle AdaBoost algorithms. They trained their classifier on a set of 2 577 positive and 2 000 negative samples containing only three hand postures, shown in Figure 2.4. Their test set consisted of 482 images. Their results indicated an average accuracy of 92.7%, 89.8% and 84.5% for the Gentle, Discrete and Real AdaBoost algorithms, respectively.

Figure 2.5: (a) The original image; (b) the selected ROI used for training; (c) the clas- sification of AdaBoost using a single positive sample only; (d) the output of AdaBoost using positive and negative samples [116].

A framework using an on-line version of AdaBoost was proposed by Nguyen et al. [116]. In their framework, three types of features were combined, namely local orientation histograms, Haar wavelets and a simplified version of the LBP that only uses a four-neighbourhood. The motivation behind their framework was to update the classifier during the training process when new samples were provided; however, the initial samples still needed to be supervised, as illustrated in Figure 2.5. They evaluated their framework on two datasets, the first containing three videos and the second containing a total of 673 frames. Their results indicated an average of 99.9% on the two small datasets used.

Multiple features with a single classifier were similarly used by Mittal et al. [109]. They proposed a two-stage approach for detecting hands in images. In the first stage, their approach operates by identifying hand hypotheses from three complementary methods, namely, a part-based deformable model based on HOG features, a skin-colour detector and a context-based detector. In the second stage, a support vector machine (SVM) classifier was trained to verify the hand hypotheses based on the confidence scores obtained from the first stage. They further introduced a publicly available dataset consisting of 5 628 images. Their results indicated a recall rate of 85.3% when using all three methods, compared to a recall rate of 74.1% when employing the part-based detector only. They further evaluated their method on two external datasets and achieved good results.

Mattheij and Postma [106] followed the same two-stage approach of feature extraction followed by classification. Features were extracted using integral images and Haar-like features, and were classified using random forests. They evaluated their detector using 10-fold cross-validation, where 2 880 training and 320 testing samples were used for each fold. Their results indicated an accuracy, recall and precision rate of 69.5%, 78.9% and 66.6%, respectively.
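In the same spirit as the LBP- and random-forest-based detectors above, the sketch below pairs a global uniform-LBP histogram with a random forest classifier. It is a generic illustration built on scikit-image and scikit-learn defaults, not a reconstruction of any of the cited systems or of the detector developed later in this thesis.

    import numpy as np
    from skimage.feature import local_binary_pattern
    from sklearn.ensemble import RandomForestClassifier

    P, R = 8, 1   # sampling points and radius of the circular LBP neighbourhood

    def lbp_histogram(gray_patch):
        """Normalised global histogram of uniform LBP codes (P + 2 bins)."""
        codes = local_binary_pattern(gray_patch, P, R, method='uniform')
        hist, _ = np.histogram(codes, bins=P + 2, range=(0, P + 2), density=True)
        return hist

    def train_hand_classifier(patches, labels, n_trees=100):
        """Random forest over LBP histograms; labels: 1 = hand, 0 = non-hand."""
        X = np.array([lbp_histogram(p) for p in patches])
        forest = RandomForestClassifier(n_estimators=n_trees, oob_score=True)
        forest.fit(X, labels)
        return forest   # forest.oob_score_ gives the out-of-bag accuracy estimate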

2.2 Context-based object tracking

In general, context-based object tracking is the process of tracking an object using contextual information about the object [160]. In temporal-context tracking, temporal information is used to track an object, for example by employing the Kalman filter [160]. Knowledge-based context has also been used, where knowledge about the tracked objects and their mutual relationships is utilised [114]. Spatial-context tracking is defined as using a set of objects or features within a local region surrounding the object being tracked [160].
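As a concrete example of temporal context, the snippet below sets up a constant-velocity Kalman filter over a 2D hand position with OpenCV; the noise covariances are arbitrary assumptions and would need tuning for real footage.

    import numpy as np
    import cv2

    # State (x, y, vx, vy) with a constant-velocity model; measurement is (x, y).
    kf = cv2.KalmanFilter(4, 2)
    kf.transitionMatrix = np.array([[1, 0, 1, 0],
                                    [0, 1, 0, 1],
                                    [0, 0, 1, 0],
                                    [0, 0, 0, 1]], dtype=np.float32)
    kf.measurementMatrix = np.array([[1, 0, 0, 0],
                                     [0, 1, 0, 0]], dtype=np.float32)
    kf.processNoiseCov = np.eye(4, dtype=np.float32) * 1e-2
    kf.measurementNoiseCov = np.eye(2, dtype=np.float32) * 1e-1

    def track_step(measured_xy=None):
        """Predict the hand position; fold in a measurement when one is available."""
        predicted = kf.predict()
        if measured_xy is not None:
            kf.correct(np.array(measured_xy, dtype=np.float32).reshape(2, 1))
        return float(predicted[0, 0]), float(predicted[1, 0])   # predicted (x, y)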

2.2.1 Knowledge-based context

Multiple-object tracking was proposed by Duan et al. [46] by modelling mutual context among objects. They introduced a framework that combined individual-object models with mutual-relational models. The mutual-relational models consisted of three components, namely, a relational graph that indicates objects that are related, mutual-relation vectors calculated within the relational graph to measure the impact one object has on the other, and relational weights that balance all interactions using an on-line latent SVM. They evaluated their framework on ten object-tracking sequences that included occlusion, appearance changes, rapid motions and varying illumination. They compared their results to several other object-tracking methods and showed an improved detection rate. However, their method ultimately depends on the tracking of the individual object models in order to build relatively strong connections.

Similarly, in Lao and Zheng [89], multi-target tracking was applied to a group of objects that exhibit a common motion pattern with individual variation, as seen in Figure 2.6. Their aim was to learn motion correlation from the trajectories of multiple targets while integrating the learned correlation into a sampling process to improve the tracking efficiency. They proposed a statistical framework that embedded the learned correlation among targets and the most recent observations into a proposal distribution based on Markov Chain Monte Carlo particle filters. They suggested that the position of a target can therefore be predicted based on its own local observation, its previous motion and the correlated observations of other targets. They compared their framework with multi-target tracking and showed a lower tracking-error rate.

Figure 2.6: Tracking multiple targets that exhibit a common motion pattern [89] (p. 812).

2.2.2 Temporal context

Figure 2.7: A global temporal context constraint applied to tracking [163] (p. 719).

A single framework that combined spatial and temporal contexts to predict the location of a target was proposed by Wen et al. [163]. Their temporal context model was constructed by updating the linear subspace method with continuous positive samples and considering the correlation between them, as illustrated in Figure 2.7. The spatial-context model was constructed from contributors around the target that have the same region size and consistent motion correlation with the target. Furthermore, weak contextual contributors were selected by a boosting method to form a strong supporting field. They evaluated their system on ten sequences and showed improved accuracies compared to several object-tracking methods.

2.2.3 Spatial context

In Cerman and Hlaváč [25] it was shown that a Markov random field can be used with contextual information for object tracking. They proposed a two-step process for object tracking, where the first step involved representing an image by a set of feature points tracked by a standard tracker, as depicted in Figure 2.8. In the second step a semi-supervised learning algorithm was proposed to label the feature points as object, background or companion model. The companion model represented the spatial context of the object and contained non-object feature points that have a similar motion to the target object. Furthermore, instead of using a voting scheme, they proposed to use a 3D graph of tracked feature points and automatically labelled the points such that the energy of the Markov random field on the graph was minimised. They evaluated their system on a few video sequences and showed a positive result for tracking objects through short occlusions.

Figure 2.8: Each feature point is classified as either object, background or companion model [25] (p. 2127).

A layer-based approach was proposed by Cerman et al. [26] where the foreground layer included the target object and other image regions (the companion model) that move coherently with the object, whether temporarily or permanently. The background layer consisted of all other objects not connected to the object. While tracking, the companion model was adapted on-line and gradually extended by neighbouring image regions in the foreground layer. They demonstrated their tracking system on several videos and showed a better result than tracking without context. Their method, however, is unable to reinitialise after failure and is dependent on a static-background assumption.

A similar approach was used by Grabner et al. [64]. They exploited a model that learns features in an image and used them to predict the position of a target.

Figure 2.9: The yellow box represents the target. The green, blue and red dots represent the features that belong to the object itself, the uncorrelated features that are discarded and the features that are used to vote for the position of the object, respectively [64] (p. 1286).

To associate and disassociate features with the tracked object, they proposed a method based on the Generalised Hough Transform. In contrast to the companion model of Cerman et al. [26], multiple features could be associated at the same time and are not confined to regions surrounding the target. They only focused on features that were coupled with motion (red and green dots) and disregarded those that were stationary (blue dots), as illustrated in Figure 2.9. Furthermore, they adopted a voting strategy on the features where the highest votes were given to features lying on the object. They evaluated their approach on the ETH-Cup sequence (ETH Zurich, Department of Information Technology and Electrical Engineering) and obtained a precision and recall rate of 97% and 89%, respectively. Furthermore, their subjective evaluation showed positive results. Their system, however, is limited by rapid motion that abruptly changes the features.

2.3 Hand-tracking

Hand-tracking is the process that continuously estimates the hand movement and spatial locations throughout an image sequence [34]. A number of hand-tracking approaches have been proposed and vary from those using an auxiliary approach to those using a passive approach. In the following subsections these approaches are discussed.

2.3.1 Auxiliary approaches

The use of auxiliary approaches in hand-tracking employs intricate devices such as data gloves, data suits, or position markers to measure the spatial positions and joint angles of the hands [90]. Although these devices usually offer near to real-time performance and more accurate information, they are an inconvenient and impractical solution because they may hinder the naturalness and ease of signing. Furthermore, these devices require

calibration according to an individual’s needs. Recent studies following an auxiliary approach have rather focused on tracking finger movements using these special devices.

2.3.1.1 Two-dimensional input

Pamplona et al. [122] proposed the use of markers displaying QR codes on the fingertips. To estimate the hand and finger movements, they attached a camera to the wrist to recognise the codes and described their prototype as an image-based data glove.

Figure 2.10: The gloves were specially designed using colour patches to track the fingers and hand while dealing with occlusion [161] (p. 63).

More recently, coloured glove-based methods have also been used. Wang and Popović [161] used a multi-coloured glove imprinted with a custom pattern together with a nearest-neighbour approach to track the individual articulation of the fingers and the global hand pose, as illustrated in Figure 2.10.

Auxiliary methods have also been combined with passive methods such as in the work of Prisacariu and Reid [127]. They proposed a 3D hand-pose tracker that uses an accelerometer to deal with ambiguities from hand silhouettes. In addition, they employed a region-based energy function that uses the contour of a 3D hand model by embedding it inside a level-set function where the region statistics were represented by a variable bin size, colour histograms and on-line adaptation.

2.3.1.2 Three-dimensional input

Various devices have been used to extract depth from image sequences. In 2009, the ZCam (a depth camera based on the ToF principle) was sold to Microsoft (MS), where it was improved and reintroduced commercially as the MS Kinect Camera in 2010. The primary intended use of the MS Kinect Camera is as an accessory to the Microsoft XBOX 360 gaming device. Unlike other ToF cameras, the MS Kinect uses an infrared projector, an RGB colour camera and an infrared sensor.

Figure 2.11: The Kinect sensor is used to track the main points on hands [55] (p. 319).

In Frati and Prattichizzo [55], wearable haptic devices were used with a Kinect to provide feedback on finger movements, as shown in Figure 2.11. In Ma et al. [102] magnetic signals were provided by small magnets fixed to the fingernails and linked to magnetic sensors on a wristband to track the movement of the fingers. Similarly, Aristidou and Lasenby [11] placed markers on the fingertips and wrist, and an inverse kinematics solver was used to fit the joint movements to a hand model.

Figure 2.12: The images correspond to the (a) original image; (b) depth image; (c) segmented area; and (d) the output image [167] (p. 151).

In Xu and Lee [167] an improved CAMShift algorithm combined with depth information was proposed for tracking hand motion using the MS Kinect. They showed that by using this device, the hands can be isolated from the background, as illustrated in Figure 2.12.

Their results showed an 88.75% accuracy on eight very basic gestures.

Similarly, Chai et al. [28] proposed a system that combined skin-colour information and depth information extracted using the MS Kinect. To address the hand-tracking problem, they proposed an energy-function-optimisation strategy by integrating the appearance, location and depth cues. They found that synchronisation between the colour and depth data was inconsistent. Therefore, their main challenge was to determine when the two streams of data were synchronised in their tracking algorithm. They evaluated their algorithm on 300 sign language videos and showed a correct detection rate of 91%.

Figure 2.13: The output after using the depth threshold to find the hand and the Kalman filter to track the hand [124].

A different type of depth sensor, known as the PrimeSensor, was used by Park et al. [124]. They proposed a hand-tracking system using depth information and the Kalman filter. They generated a motion image from the depth image and detected the hands using spatial filtering, morphological processing and motion clustering. To track the hands, they used a simple Kalman filter, as shown in Figure 2.13. They evaluated their algorithm on six gestures and showed a positive result.

Figure 2.14: The depth data are used to track the hands, where the grey and white clusters represent the right and left hand, respectively [27].

ToF range cameras have also been used to infer depth from the scenes being viewed. These cameras determine depth data by measuring the time required for light to travel to an object and back to the camera. In Cespi et al. [27] the intensity and depth data from a ToF camera were used to segment and track the hands, as illustrated in Figure 2.14. The hands were segmented using a hierarchical clustering technique implemented as a parallel merging algorithm on the graphics processing unit (GPU). The hands were then tracked based on the distance between the two hand clusters. Their results showed that although depth can successfully be used to track both hands, when both hands have the same depth measurement while crossing, it leads to tracking failure, since the hands merge into a single-hand cluster.

Suau et al. [147] also used a ToF range camera to resolve ambiguities and overlaps between the head and hands while tracking. Using the same hierarchical clustering technique used by Cespi et al. [27], Suau et al. [147] used depth data to select a maximum of two clusters representing each hand. They tracked the hands by measuring the Hausdorff distance between the current and previous hand position. However, they assumed that a cluster detected further right or left would correspond to the right or left hand, respectively. When the hands crossed each other they considered it as a single hand and could not distinguish between the two. Furthermore, when the hands had the same depth, it led to tracking failure.

2.3.2 Passive approaches

Hand-tracking using passive approaches is able to determine the spatial locations of the hands by using various image processing algorithms in non-invasive ways. These methods offer more practical solutions and are capable of achieving near to real-time performance. Studies following a purely passive means can be further grouped into model-based and appearance-based methods.

2.3.2.1 Model-based methods

In model-based approaches the current state of the hand is estimated by matching a 3D hand model to the observed image features. The approach can therefore be described as a search problem in a high-dimensional space [44]. Recent work in model-based approaches uses multiple constraints such as object textures, shading, edges or lighting that are incorporated into a hand model, along with a variational framework [42] or a probabilistic graphical model [100] that is adapted to the task of hand-tracking. These methods are more popular for single-hand tracking due to the complexity involved in minimising the error cost between the image features and the model projection.

A 3D hand model constructed with truncated quadrics for the recovery of 3D hand motion was used by Kerdvibulvech and Saito [83]. They extracted corresponding features between the 3D hand model and the input image, which consisted of edges and a silhouette. The edges were extracted using the Canny edge detection method, and the edge likelihood was determined using the Chamfer distance function. The silhouette was determined using a Bayesian classifier and an on-line adaptation of skin-colour probabilities. To predict the next state of the 3D hand model, particle filters were used to track the hands. They further proposed the use of particle filters to automatically recover from tracking failures. In their application’s context this was successful, but in general particle filters alone cannot be used for the automatic tracking recovery of hands. Their results were subjectively evaluated and showed satisfactory performance.

Figure 2.15: A two-step minimisation algorithm was applied to match the model to a hand [70].

In contrast to the work of Kerdvibulvech and Saito [83], Henia et al. [70] proposed to combine a non-overlapping surface function with the directed Chamfer distance function to form a new dissimilarity function. They showed that their new dissimilarity function provided better results compared to the directed Chamfer and Hausdorff distance functions. Their algorithm operates in two steps: the first step provides the position and orientation of the palms, which are the global parameters of the hands, and the second step provides the finger-joint angles, which are the local parameters of the hands. This two-step algorithm is more robust to local minima than a one-step minimisation algorithm and also reduces the complexity of the minimisation problem. Their results were subjectively evaluated and showed a satisfactory result, as seen in Figure 2.15. Although the minimisation problem was improved, they failed to handle rotations and self-occlusions.

Figure 2.16: Hierarchical detection was employed to find the hand and a dynamic model guided the search while approximating the optimal filtering equations [146] (p. 1381).

To overcome the high computational cost of most model-based hand-tracking approaches, researchers have opted to use hierarchical data structures within the framework, as shown by the work of Stenger et al. [146]. They combined hierarchical detection and probabilistic tracking in a single framework. They generated a large number of templates from a 3D hand model and constructed a hierarchy of the templates off-line by partitioning the parameter space. The posterior distribution of the state parameters for each time instant was estimated over these partitions. At initialisation, hierarchical detection was employed to find the hand. In the frames that followed, a dynamic model guided the search while approximating the optimal filtering equations. This model was given transition probabilities between parameter-space regions and was trained on data captured from articulated motion. They showed that their framework was able to recover 3D motion even when there was self-occlusion; however, this was only for a single hand. Their results were subjectively evaluated and showed a positive result, as shown in Figure 2.16.

Hierarchical data structures were also used in a multi-camera model-based hand-tracking approach by Mohr and Zachmann [110]. In their approach they generated a large number of templates using a 3D hand model. They generated a confidence map by applying continuous-edge gradient detection to an input image and the set of templates. This confidence map was combined with a confidence map produced by their skin segmentation and silhouette-area comparison technique. They finally performed hierarchical template matching to handle the large computational cost of their framework.

In Lu et al. [100], a dynamic hand formulation was proposed that integrated multiple cues, such as the gradient-based optical flow, edges and shading, within a deformable model framework to track articulated hand motions. The integration of multiple cues allowed the hand model to be refined during tracking, where the error in the generalised optical flow function was used to compute the generalised 3D forces that corrected the model shape at each time step. Their framework showed a positive result for different single-hand motions with shading changes, self-occlusions of parts of the hand and rotations.

Figure 2.17: The rows correspond to the original image, the final synthetic image, the final residual image and the side view of the synthetic image [42].

Similar to Lu et al. [100], de La Gorce et al. [42] introduced a variational framework integrating the geometry of a 3D hand model with scene and object texture, shading, lighting and handling of self-occlusions of the fingers. During each frame in the tracking process they determined the hand posture and illumination parameters of a single hand by minimising the objective function and updating the texture model that was used for registration in the next frame. In contrast to the optical flow method in Lu et al. [100], the similarity measurement used by de La Gorce et al. [42] did not assume limits on the range of displacements, but rather allowed large displacements and discontinuities.

Using this framework, they were able to determine the optimal configuration of the hand model through a quasi-Newton descent using the exact gradient of their objective function. However, their method required rough initialisation and therefore assumed the hand to be parallel to the image plane at initialisation. This allowed for a texture estimate to be obtained in the first frame. Their results were subjectively evaluated and produced overall good results, as shown in Figure 2.17. However, their approach failed to recover the precise location of the fingers when the palm was occluded.

In Sudderth et al. [148] it was shown that a non-parametric belief propagation algorithm can be used to track a 3D geometric model of the hand, where a redundant local representation was considered to describe a hand’s 3D position and orientation. They showed that the model’s kinematic constraints take a simple form in this local representation. Furthermore, the representation of the model allowed colour and edge-based similarity measures, such as the Chamfer distance, to be used in situations where significant self-occlusion did not take place. Similar to particle filters, the non-parametric belief propagation approximates the hand configuration. However, compared to particle filters, the non-parametric belief propagation greatly reduces the dimensionality of the distributions using the graphical structure. To improve the computational efficiency of the non-parametric belief propagation, several methods were used, including a novel k-dimensional tree-based method for efficient Chamfer distance computations. Their experiments indicated that non-parametric belief propagation can refine initialisation in single frames that contain noise as well as track the hand over two extended sequences. They suggested that local hand-feature detectors would improve the robustness of their method.

2.3.2.2 Appearance-based methods

In appearance-based methods, image features are analysed in order to determine the position of the hand [44]. These approaches can be classified into 3D-based and 2D-based methods [147].

2.3.2.2.1 Three-dimensional input

Hand-tracking using a 3D-based method allows for depth information to be exploited.

This information can be used to help distinguish between the right and left hand when there is a difference in depth between the two. In addition, it can be used to distinguish between the hands when occlusion occurs. However, when the depth information is the same for both hands, the problem relates to a 2D hand-tracking problem.

Binocular cues provide a rich source of information about the structure of 3D objects and often involve two or more views. One of the most important sources in humans is binocular disparity, where each eye receives a slightly different view that is combined with the other eye’s view to perceive the depth of objects. This process is referred to as stereopsis. When using a pair of stereo cameras, an object is projected onto different locations depending on the distance of the object [136]. The disparity from stereo cameras varies with the object’s distance and is inversely proportional to the distance of the object. Thus, it is not effective for obtaining small depth differences for objects at large distances [136].
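For reference, the standard relation between depth and disparity for a rectified stereo pair (a textbook relation, not a formula taken from the cited works) is

\[
Z = \frac{fB}{d}, \qquad \Delta Z \approx \frac{Z^2}{fB}\,\Delta d,
\]

where $Z$ is the depth of the object, $f$ the focal length, $B$ the baseline between the cameras and $d$ the disparity. A fixed disparity error $\Delta d$ therefore produces a depth error that grows quadratically with distance, which is why small depth differences are difficult to resolve for distant objects.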

Figure 2.18: The left and right camera views of the Bumblebee stereo camera system were used to determine the depth correspondences [49] (p. 26).

In Elmezain et al. [49] mean-shift analysis and the Kalman filter were used with stereo cameras to track the hands in complex environments. Based on cross-correlation and the known calibration data of the cameras, the depth information (as seen in Figure 2.18) was retrieved with skin-colour information and used to segment the hands. After segmenting the hands, the mean-shift analysis was applied and the gradient of the Bhattacharyya coefficient was used as a similarity function to track a given hand between frames and find the centroid for each hand target. The Kalman filter was then used to estimate the position of the hand in the next frame. They evaluated their method on some video samples and showed a positive result.

A similar framework to extract depth information from stereo cameras was used by Park and Lee [123]. They proposed 3D particle filters for the hand-tracking task in a 3D pointing-gesture-recognition system. They used depth information to find a human in the scene and applied skin detection to segment the face and hands. Since they suggested that 2D particle filters often fail, they employed 3D particle filters to track the hands, based on 2D information and depth information. As part of their framework they used a cascade of two hidden Markov models (HMMs) to estimate the pointing direction. The first-stage HMM mapped the hand position and the second-stage HMM involved gesture spotting. They evaluated their system on 12 individuals for pointing gestures, moving gestures and non-gestures. Their experimental results showed an accuracy of 89% and 99% for gesture recognition and selection rates, respectively.

In Xu et al. [168] colour and depth maps using stereo cameras were employed in a multi-cue-based tracking system for tracking the face and hands. They combined colour histogram-based particle filtering with the mean-shift algorithm to handle rapid and complex motions of the hands by increasing the hit ratio of samples of the particle- filtering method. The position of the hands was then used to estimate the other upper body parts using human kinematics. Their system was evaluated on a music-playing human–computer interaction scenario and achieved an accuracy of 98.95% and 90.12% for the left and right hand, respectively.

2.3.2.2.2 Two-dimensional input

Tracking hands from a 2D view is a more challenging task, since depth cannot be used to deal with occlusion. It is also challenging to segment the hands in complex environments and track them over long image sequences.

The simplest means to track a hand from a 2D view is to employ colour-based methods, because skin has a distinctive colour compared to many objects. In Rautaray and Agrawal [128] a hand-tracking algorithm was proposed based on skin-colour detection using randomised lists and the LAB colour space. They subjectively evaluated their algorithm and demonstrated it using a single hand, as illustrated in Figure 2.19.

Figure 2.19: The output of the tracking system using skin-colour detection only [128] (p. 31).

Methods employing a single cue to track hands from a 2D view have a very high probability of failing in unconstrained environments and when occlusion occurs near the face or opposite hand.

Instead of using a single feature, Coogan et al. [36] proposed to use three features, namely motion, colour and position. They also proposed to use a Kalman filter to detect, but not handle, occlusion between the face and hands, and to reduce the search space.

In Wen and Zhan [162] the hands were adaptively tracked based on skin colour using the CAMShift algorithm. They proposed a two-step algorithm where the first step consisted of detecting the hands using a contour-based method to extract the five fingertips of each hand. In the second step an improved Grey model that used historical data to make a tracking prediction was combined with the CAMShift algorithm. They suggested that this combination would address the problem of distractors and occlusion handling. They subjectively evaluated their results on a limited set of four videos in a simple environment and showed a positive result. However, their method operates in a similar way to the combination of the CAMShift algorithm and the Kalman filter in Luo et al. [101]. When the hands overlap each other and are stationary at the same time, this method fails and is unable to distinguish and track the hands correctly. This problem was addressed by Elmezain et al. [48] using a Kalman filter and CAMShift combination with depth information from stereo cameras.

To track a single hand continuously during various pose variations and movements, Kölsch and Turk [86] introduced the flocks-of-features algorithm. This algorithm was inspired by the way birds flock together during flight, exhibiting variability and local individualism while maintaining their clustered form.

Figure 2.20: The modified flocks-of-features algorithm still fails due to rapid movements [54].

Fogelton [54] proposed to modify this algorithm by applying it to a histogram back-projected image based on skin colour, instead of the greyscale image used in the original algorithm. An evaluation on several videos showed that this modified algorithm performed twice as well as the original algorithm. Similar to the original algorithm, the modified algorithm still failed to deal with fast movements in front of other skin-coloured objects, as seen in Figure 2.20.

Figure 2.21: A sample output of the improved flocks-of-features algorithm [97] (p. 466).

To improve the flocks-of-features algorithm further against distractors and background clutter, Liu et al. [97] proposed a multi-cue flocks-of-features algorithm based on the Boids algorithm [131], which was used to simulate flock behaviour in computer graphics. They furthermore integrated the algorithm with an on-line Hough forest framework that combines Hough voting with the random forest. This integration allowed the flocks-of-features tracker to refine the tracking and influence the updating of the Hough forest tracker. They evaluated their framework on six videos and compared it to the CAMShift, original flocks-of-features and Hough trackers on the second video only. They showed that their framework yields a more accurate result than the compared trackers, but still fails when the tracked hand is moved in front of the face. Furthermore, their algorithm was able to track a single hand only, as seen in Figure 2.21.

In Liu and Zhang [98] LBPs and colour cues were combined in a particle-filter framework and used to track the hands. The similarity measurement used in the particle-filter framework was based on the Bhattacharyya distance. They evaluated their results subjectively and showed that combining LBPs with skin-colour information achieves better hand-tracking than using either cue alone.

Figure 2.22: An example of the output obtained from the framework used by Spruyt et al. [144] (p. 3120).

In addition, a particle-filter framework was proposed by Spruyt et al. [144] to track the hands; however, they combined it with motion and colour cues. They suggested that combining motion detection, edge detection, skin-colour detection and colour clustering increases the robustness of their framework to illumination variation. They also suggested that by combining these cues in their framework, their system automatically recovers from failure and does not need an initialisation phase. Their results were presented visually and showed a satisfactory result, as illustrated in Figure 2.22.

In Ongkittikul et al. [120] only the skin-colour cue was used in a particle-filter framework. However, they used two particle filters to track the two hands individually, with the likelihood function based on skin-colour classification. While tracking, they employed the reliability measurement from the particle distribution to adaptively weigh the colour classification. They also embedded the K-means algorithm in the particle-filter framework to discriminate the particle-filter weights when the two hands were close to or moving away from each other. The system was evaluated on 45 videos and compared with the mean-shift algorithm. The results were subjectively evaluated and showed that the framework yielded better results than the mean-shift algorithm, as depicted in Figure 2.23. Although two particle filters were used, their method was unable to distinguish between the left and right hand.

Figure 2.23: The tracking of both hands, but not distinguishing between them using the framework of Ongkittikul et al. [120] (p. 205).

Multiple cues that include motion-history images, selected colour-weighted images and gradient orientation templates, integrated in a sequential importance resampling (SIR) particle filter, were proposed by Chen et al. [32]. The SIR particle filter was used to track the head and hands when they were apart. However, when the head and hands were close to or overlapping one another, they employed the multiple-importance-sampling particle filter with depth-order reasoning to generate multiple tracking targets for the merged hypothesis. They separated the merged hypothesis into several targets based on the hand-shape orientation, motion continuity and occluded face-template cues. They evaluated their framework on only two videos and demonstrated the tracking of both hands.

To track both hands and handle occlusion, many studies suggest using multiple cues or one of the algorithms belonging to the set of sequential Monte Carlo methods such as particle filters or Kalman filters. Some studies have also used data-association strategies to address the tracking problem while handling occlusion. Data association involves linking the detected object or features over time and using the detections to estimate its trajectory [104].

Argyros and Lourakis [10] implemented an approach that applies a non-parametric method for skin-colour detection and tracking in a non-Bayesian framework. They employed a data-association strategy to track multiple skin-coloured objects over time.

Figure 2.24: The output of the framework introduced by Argyros and Lourakis [10] (p. 1190).

Their algorithm operates by intelligently associating each new skin-coloured cluster that has entered the scene with an object hypothesis and assigning it a unique label, thereby treating each object as a separate entity. They evaluated their system on videos containing different scenarios, including moving the camera while capturing, and it showed a positive result when tracking the hands and face, as shown in Figure 2.24. The limitations of their method are that it is restricted to constrained environments, unable to discriminate between the different skin areas and unable to recover from tracking failure. However, their method is a good starting point for developing an independent hand-tracking framework in a sign-language-recognition application.

2.4 Discussion and summary

In the preceding sections, approaches on hand detection, context-based tracking, and hand-tracking were described and analysed. Because each of the approaches has its relative strengths and weaknesses, many challenges remain.

In hand-detection approaches using 3D views, it is assumed that any skin-coloured object that is closer to the camera than the body is the hand. These methods would therefore fail when the body and hands have the same depth. Furthermore, obtaining 3D views from specialised cameras is not always a straightforward task and many challenges remain, such as synchronisation, bright light and calibration. On the other hand, several hand-detection algorithms from 2D views have been proposed. The difficulties of detecting hands from 2D views arise from the high configuration space and the geometry of hands. Many algorithms based on 2D input, such as HOG, Haar and LBPs, have been inspired by their success in face detection and recognition. Although some algorithms have shown potential, they do not provide a complete solution to hand detection, due to the articulated nature of hands. Recent advances in face-detection applications have also demonstrated the use of the powerful texture-based features of LBPs. LBPs have, however, not been extensively used for hand detection. Nguyen et al. [116] only used a simplified version of the LBP and Xiao et al. [166] used the original LBP, but with the focus more on the comparison between different AdaBoost algorithms. These methods often extract local features from the entire image region, thereby extracting many irrelevant features. The current study therefore investigates whether an algorithm that filters the image region to only extract features from the hands would improve hand detection using LBPs. This approach would effectively deal with the configuration space and geometry of hands.

Different types of context-based object tracking have successfully been applied to increase the accuracy of object-tracking methods. Although knowledge about objects in a scene has been applied to multiple-object tracking, it is more applicable to objects exhibiting a common motion pattern. On the other hand, spatial- and temporal-context information is more often used in single-object tracking; however, context-based object tracking has not been applied to hand-tracking. Inspired by Wen et al. [163], Cerman et al. [26] and Grabner et al. [64], the current research investigates whether spatial and temporal context-based object tracking would improve the accuracy of hand-tracking.

In order to conform to the requirements of a mobile SASL translation system, this study is restricted to using 2D video input streams in both constrained and unconstrained environments, thus making it challenging to deal with occlusion and to distinguish between the right and left hands. Various methods have been proposed for hand-tracking using 2D input; however, most methods experience difficulties in tracking both hands [36, 128, 162]. Those that do track both hands have difficulties in distinguishing between the left and right hand or dealing with occlusion. To allow for various pose variations and movements, flocks-of-features were introduced [86] and improved [54, 97]. However, the method is prone to distractions near other skin-coloured areas. To deal with such distractions, particle filters are often used. A number of algorithms employing particle filters combined with other cues have been proposed [32, 120, 144]. Although these algorithms show potential for tracking both hands, they are limited by the likelihood function they employ. As an alternative, data-association strategies have been applied, as demonstrated by the work of Argyros and Lourakis [10]. However, their method is restricted to constrained environments, unable to discriminate between the objects being tracked and unable to recover from tracking failure. This research therefore investigates whether the algorithm of Argyros and Lourakis [10] can be extended and applied to deal with occlusion and distinguish between hands while tracking in constrained and unconstrained environments from 2D video data. Skin-colour appearance varies significantly between people. Successfully segmenting an individual’s skin pixels would provide a valuable grounding on which to apply this data-association method. In addition, this research investigates whether dynamic skin segmentation would provide a suitable foundation on which to base the extended data-association method.

In the following chapter the development of a framework will be explained using the methodology adopted for this research.

Chapter 3

Design Science Research

In the previous chapter an overview was given of the approaches and aspects that researchers have considered when analysing hand detection, context-based object tracking and hand-tracking. In this chapter the philosophical grounding that underpins the research will be discussed to ensure the consistency of the study. The methodology that is informed by the philosophical stance will be described in Section 3.4. In Section 3.5 the methods selected to perform the data analysis will be discussed and motivated. The chapter will be concluded with a summary in Section 3.6.

3.1 Research philosophy

Research philosophy can be defined as a belief about the way in which information about a phenomenon should be collected, analysed and used. All research is based on latent assumptions about what constitutes valid research, whether it is quantitative, qualitative or both, and which research methodologies and methods are appropriate. It should be noted that some researchers use the terms ‘methods’ and ‘methodology’ interchangeably [35]. In this research methodology refers to the overall approach or design that lies behind the selection of specific methods, and that links the selection of methods to the desired end result [38]. The methods refer to the various techniques used to collect and analyse data with respect to a particular research question or hypothesis [38]. Therefore, the research approach, the data collection methods and the means of analysis all form part of the research process and are considered under the overall

structure of methodology. The methodology, however, is underpinned by the theoretical perspective, which refers to the philosophical stance that informs the methodology and thereby provides a context for the research process and grounds its logic and criteria [38]. Embedded in the theoretical perspective is the epistemology, which refers to the theory of knowledge, i.e. a way of understanding and explaining how knowledge comes to be known.

To ensure the consistency of the study, it is necessary to consider the philosophical basis that informs the decision-making process. Crotty [38] structures this process as four elements that inform one another, as depicted in Figure 3.1.

Figure 3.1: The decision-making process as four elements that inform one another.

According to Crotty [38], the decision-making process in the study can be defined by posing four questions that relate to the four basic elements:

1. What methods will be used?

2. What methodology guides the choice and use of the proposed methods?

3. What theoretical perspective underpins the preferred methodology?

4. What epistemology informs the suggested theoretical perspective?

Postulating the research process in terms of these four elements ensures the soundness of the research and maintains consistency within the study. It also provides an incisive analysis of the process, points out the theoretical assumptions that underpin it and determines the quality of its findings. For each research question or research problem a relevant set of methods should be chosen and used [62]. Hughes [77] states that ‘every research tool or procedure is inextricably embedded in commitments to particular versions of the world and to knowing that world’. This implies that the effectiveness of a method is fundamentally dependent on the justification of the epistemology, as illustrated in Figure 3.2. In the following subsections these four elements will be discussed in more detail, beginning with a justification of the epistemology.

Figure 3.2: An illustration of the link between epistemology and the research process.

3.2 Epistemology

Epistemology deals with the theory or nature of knowledge. It is combined with ontology (what things exist) to form the branch of philosophy known as metaphysics. It provides a philosophical grounding for the decision regarding the kind of knowledge that is possible and ensures that it is both adequate and legitimate. Although a range of epistemologies exist, in general epistemological beliefs are seen as lying on a continuum ranging from objectivism to subjectivism. Objectivism holds that a meaningful reality exists independently of the mind and that it can be described by measurable properties that are independent of the observer and subject [38]. On the other hand, subjectivism holds that meaning is imposed by the subject’s mind without the contribution of the world, where there is no meaning independent of the mind and where the observer is not independent of the subject [38].

An objectivist epistemological stance holds that one’s feelings about objective knowledge should be separated from that knowledge [78, 41]. The objectivist perspective provides a theoretical basis for quantitative research, whereby researchers will often be more inclined to conduct quantitative, rather than qualitative, research [78].

There has been much debate on whether it is possible to combine objectivism and subjectivism, the two poles of epistemology [52]. Some researchers believe that objectivism and subjectivism should be regarded as total opposites [117, 18], while others believe they should be regarded as lying at opposite ends of a continuum, thereby complementing one another.

This research takes on more of an objectivist stance and is thus inclined to quantitative research.

3.3 Theoretical perspective

A distinction can be made between two major research philosophies, namely positivism (also referred to as scientific) and interpretivism (also referred to as anti-positivism).

A central tenet of positivism is that one can objectify the phenomena under investigation and thus view reality from an objective viewpoint [132]. Furthermore, positivists observe social behaviour by taking a ‘scientific’ perspective with an objective analysis [153] and thus separate the object and the subject in the study. Positivists focus on the facts or the given and ignore everything else. They also argue that phenomena should be isolated and observations should be repeatable. However, Bryman and Bell [22] caution against assuming that the positivist philosophy and scientific approach are synonymous. They note that although there are some instances where an inductive strategy is employed in positivist research, generally such research tends to be based on deductive theorising [13].

Gall et al. [57] state that positivist research is essentially synonymous with quantitative research. They also state that researchers adopting a positivist stance develop knowledge by applying numerical analysis to numerical data collected from the observable behaviours of samples.

Interpretivism or anti-positivism denies any objective reality and assumes that any social reality can only be studied from the perspective of the individuals that are directly involved, through social constructions such as consciousness, language and shared meanings [132]. Interpretive research generally attempts to understand phenomena through the subjective meanings that individuals assign to them and contends that phenomena should be studied in their natural environment. Interpretivism therefore focuses on the domain of meaning and the methods of studying it [62]. Furthermore, interpretivists acknowledge that there may be many interpretations of reality; however, they hold that these interpretations from an anti-positivistic stance form part of the scientific knowledge they are pursuing.

The research considerations with regard to the current study followed a positivist theoretical perspective, since the study has a more objective approach. This particular paradigm offers a range of methodological choices where researchers can apply quantitative methods.

3.4 Methodology

The theoretical perspective that underpins this research is mostly of a positivist stance and thus the methodology best suited to manage it was design science research (DSR).

3.4.1 Design science research

The DSR methodology has historical origins in engineering and architecture, where much of the literature is based on a positivistic epistemology. Gregory [65] defines DSR as a ‘general research approach with a set of defining characteristics that can be used in combination with different research methods’. In the current research, DSR will be used, thereby addressing the objective aspects of the stated research questions.

An integral part of the DSR framework is searching for and solving practically relevant real-world problems [65]. The iterative process of a general DSR cycle consists of six distinctive stages, displayed in Figure 3.3 [159].

Figure 3.3: The iterative process of a general DSR cycle consists of six distinctive stages [159].

These six stages can be further elaborated as follows:

1. The needs and foundation of the requirements are identified in Chapter 1.

2. Using the requirements, an artifact or software system is developed that also delivers functionality, as discussed in Chapter 5.

3. The software system is represented and documented (discussed in Chapter 5), which includes the documentation of the limitations (discussed in Chapter 7).

4. The evaluation criteria and methods are selected, as discussed in Chapter 3 and 6.

5. The software system is evaluated using the selected evaluation criteria and methods, as discussed in Chapter 6.

6. Additional requirements are communicated for another iteration of the DSR cycle, as discussed in Chapter 6.

To ensure that the research addresses the key aspects of the DSR methodology, Hevner and Chatterjee [71] proposed a list of questions:

1. What is the research question (design requirements)?

2. What is the artifact? How is the artifact represented?

3. What design processes (search heuristics) will be used to build the artifact?

4. How are the artifact and the design processes grounded by the knowledge base? What, if any, theories support the artifact design and the design process?

5. What evaluations are performed during the internal design cycles? What design improvements are identified during each design cycle?

6. How is the artifact introduced into the application environment and how is it field tested? What metrics are used to demonstrate artifact utility and improvement over previous artifacts?

7. What new knowledge is added to the knowledge base and in what form (e.g. peer-reviewed literature, meta-artifacts, new theory, new method)?

8. Has the research question been satisfactorily addressed?

In addition, Hevner et al. [72] state that the list ‘provides a structured path for doctoral students interested in using this methodology in their research, structuring and legitimising their research’. In order to answer the research questions, the DSR cycle was applied to a detection, tracking and learning phase, as depicted in Figure 3.4.

Figure 3.4: The DSR cycle applied to each of the phases.

3.5 Methods

In computer vision, the objective evaluation of tracking algorithms may either focus on analysing the tracking algorithm itself or analysing the results thereof. In the context of vision-based tracking algorithms, these approaches are clearly expressed by Maggio and Cavallaro [104] as either analytical or empirical methods.

Figure 3.5: A taxonomy of objective evaluation methods for tracking algorithms.

Analytical methods evaluate tracking algorithms based on their principles, complexity and requirements. These methods have the benefit of not requiring the implementation of the tracking algorithm; however, since tracking algorithms may form complex systems consisting of several subsystems, the disadvantage is that not all the properties of the systems may be easily evaluated [104].

On the other hand, empirical methods evaluate tracking algorithms based on their results. These methods have the benefit of comparing several tracking algorithms that are applied to a dataset that is relevant to a given application. The empirical methods can further be divided into stand-alone methods and discrepancy methods [104]. The difference between the methods is that discrepancy methods require the use of ground-truth data, whereas stand-alone methods do not. Empirical stand-alone methods measure the ‘goodness of fit’ of a tracking algorithm’s result based on some quality criteria, such as the smoothness or consistency of a tracking trajectory. Empirical discrepancy methods measure the variation between the tracking algorithm result and the ground-truth result, and can further be divided into low-level and high-level methods [104].

Low-level methods evaluate the results of a tracking algorithm independently from an application, whereas high-level methods evaluate the results in the context of the application [104]. This research therefore adopts a high-level empirical evaluation protocol.

A taxonomy of objective evaluation methods for tracking algorithms is shown in Figure 3.5.

3.6 Summary

The chapter described the philosophical grounding that underpins the research in order to ensure the consistency of the study. It described how the epistemology and theoretical perspectives inform the research process. It was concluded that in order to follow a more objective approach, a positivist theoretical perspective should be taken. It was also concluded that the DSR methodology is relevant to managing this philosophical stance. In addition, the six-stage cycle of the DSR framework that structures the research was discussed. Moreover, methods used to evaluate tracking algorithms objectively were described, where it was shown that a high-level empirical evaluation protocol would be applicable. In the following chapter the components that form part of the detection, tracking and learning phases will be explored.

Chapter 4

Detection-Tracking-Learning

In this chapter the components that form part of the detection, tracking and learning phases of the proposed framework will be explored. The different algorithms included in these components will be compared and their applicability to addressing the research problems will be explained. In Section 4.1 the components included in the detection phase will be discussed. These components consist of LBPs, random forests and SVMs. In Section 4.2 the tracking phase, which includes a unique skin-segmentation algorithm and a data-association method that allows for multiple skin-coloured object tracking in both constrained and unconstrained environments, will be described. In Section 4.3 the components used to form a context-based tracking approach to recover the hands from tracking failure will be explored.

4.1 Detection Phase

The detection phase was employed in the first DSR cycle (see Figure 4.1). The phase consists of LBPs, random forests and SVMs with the aim of developing a suitable hand- detection algorithm.

4.1.1 Local binary patterns

The LBP was introduced by Ojala et al. [119], who originally applied it to texture patterns. Since then it has become increasingly popular and has been widely used to encode the appearance of various objects, such as people, food types, cars and faces [115].


Figure 4.1: The first DSR cycle consisting of the detection phase.

The LBP is often applied to greyscale images and the derivative of the intensities [94]. The success of the LBP operator is attributed to its invariance against illumination changes, its discriminative power and its computational simplicity. It describes the neighbourhood of a given pixel by generating a binary code from the binary derivatives of that pixel.

In its basic form, the LBP operates in a 3×3 neighbourhood of each pixel. The pixels in this neighbourhood are thresholded according to their centre pixel value. A binary code is obtained by generating a zero or a one if the neighbouring pixel is smaller or larger than the centre pixel, respectively. Each binary value is multiplied by a power of two and then summed to obtain a label that is assigned to the centre pixel. In this neighbourhood, a total of 256 ($2^8$) different labels exist. In Figure 4.2, an example of the basic LBP operator is shown.

Figure 4.2: An example of the LBP process that determines the decimal-label value.
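As an illustration of the basic operator described above, the following sketch (an illustrative assumption of this chapter's description, not code taken from the original study) computes the 3×3 LBP label for every interior pixel of a greyscale image:

```python
import numpy as np

def basic_lbp(image):
    """Compute the basic 3x3 LBP label for each interior pixel.

    Minimal illustration of the operator described above; `image` is a
    2D greyscale array.
    """
    img = image.astype(np.int32)
    centre = img[1:-1, 1:-1]
    labels = np.zeros_like(centre)
    # Neighbour offsets in a fixed order; each contributes one bit.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = img[1 + dy:img.shape[0] - 1 + dy,
                        1 + dx:img.shape[1] - 1 + dx]
        # Threshold: 1 if the neighbour is >= the centre pixel, else 0,
        # weighted by the corresponding power of two.
        labels += (neighbour >= centre).astype(np.int32) << bit
    return labels
```

For an eight-pixel neighbourhood this produces the 256 possible labels mentioned above.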

In contrast to the basic form, Ojala et al. [119] derived a generic form of the LBP based on circular neighbourhoods and linear interpolation of the pixel values that is not restricted by the size of the neighbourhood or the number of sampling points. This form allows LBPs to be used at different scales and is referred to as the extended LBP (or the circular LBP).

In the extended LBP (ELBP) the decimal label for each centre pixel, c, in an image is given by [126]:

\[
LBP_{P,R}(x_c, y_c) = \sum_{p=0}^{P-1} s(i_p - i_c)\, 2^p \tag{4.1}
\]

where $P$ is the number of sample points or neighbours of $c$, $R$ is the radius of the neighbourhood, $i_p$ is the value of neighbour $p$, $i_c$ is the value of $c$ and the threshold function $s$ is defined as:

\[
s(x) =
\begin{cases}
1 & \text{if } x \geq 0 \\
0 & \text{if } x < 0
\end{cases} \tag{4.2}
\]

In Figure 4.3, examples of LBPs with different sampling points and radii are shown:


Figure 4.3: (a) An LBP with eight sample points and a radius of 1; (b) an LBP with eight sample points and a radius of 2.
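A minimal sketch of the circular operator of equations 4.1 and 4.2 is shown below; the function name and the use of bilinear interpolation for off-grid sample points are assumptions made for illustration, not the implementation used in this work:

```python
import numpy as np

def extended_lbp(image, x, y, P=8, R=1.0):
    """Extended (circular) LBP label for the pixel at (x, y).

    Illustrative sketch of equation 4.1: P sample points on a circle of
    radius R around the centre, bilinearly interpolated where they do not
    coincide with pixel centres. Assumes (x, y) lies at least R + 1 pixels
    inside the image border.
    """
    img = image.astype(np.float64)
    centre = img[y, x]
    label = 0
    for p in range(P):
        angle = 2.0 * np.pi * p / P
        sx = x + R * np.cos(angle)
        sy = y - R * np.sin(angle)
        x0, y0 = int(np.floor(sx)), int(np.floor(sy))
        tx, ty = sx - x0, sy - y0
        # Bilinear interpolation of the four surrounding pixels.
        value = ((1 - tx) * (1 - ty) * img[y0, x0] +
                 tx * (1 - ty) * img[y0, x0 + 1] +
                 (1 - tx) * ty * img[y0 + 1, x0] +
                 tx * ty * img[y0 + 1, x0 + 1])
        # Threshold function s() of equation 4.2.
        label += (1 if value - centre >= 0 else 0) << p
    return label
```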

Given an image or region containing LBP codes, a histogram can be used to describe the entire image or region, with the advantage of being invariant to translation. Invariance to the image or region size is achieved by normalising the histogram. This histogram can be determined as [66]:

\[
H(k) = \sum_{x,y} f\big(LBP_{P,R}(x, y),\, k\big), \qquad k \in [0, \ldots, 2^P - 1] \tag{4.3}
\]

where the function $f$ is defined as:

\[
f(x, y) =
\begin{cases}
1 & \text{if } x = y \\
0 & \text{if } x \neq y
\end{cases} \tag{4.4}
\]
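Equations 4.3 and 4.4 amount to counting the occurrences of each label over the region; a minimal sketch (assuming the LBP codes have already been computed, for example with the functions above) is:

```python
import numpy as np

def lbp_histogram(lbp_labels, P=8, normalise=True):
    """Histogram of LBP labels over an image region (equations 4.3 and 4.4).

    Sketch only; `lbp_labels` is a 2D integer array of LBP codes in
    [0, 2**P - 1]. Normalising the histogram makes it invariant to the
    size of the region.
    """
    n_bins = 2 ** P
    hist = np.bincount(lbp_labels.ravel(), minlength=n_bins).astype(np.float64)
    if normalise:
        hist /= hist.sum()
    return hist
```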

The histogram contains information about the distribution of primitive micro-features that include curves, edges, lines, spots, flat areas and corners, as shown in Figure 4.4.

Figure 4.4: Examples of the different micro-feature information that are extracted using LBPs.

4.1.1.1 Uniform LBP

If $P$ neighbouring pixels are considered, then from equation 4.3 it can be deduced that the number of histogram bins will be $2^P$. To reduce the dimension of the histogram, another extension to the original LBP, called the uniform LBP, was introduced. The uniform LBP is denoted as $LBP^{U2}_{P,R}$, where the superscript "U2" means that LBPs are considered to be uniform when at most two bitwise transitions ($U$) from zero to one and vice versa are present in their circular binary representation [66]. For example, the LBP codes 11111111 ($U = 0$), 11110000 ($U = 1$) and 00111100 ($U = 2$) are uniform patterns, since there are at most two transitions from zero to one and/or one to zero. The LBP codes 00110001 ($U = 3$) and 00110110 ($U = 4$) are non-uniform patterns, since there are more than two transitions from zero to one and one to zero. In mapping from $LBP_{P,R}$ to $LBP^{U2}_{P,R}$, a different label is assigned to each uniform LBP and all non-uniform LBPs are assigned the same label. Therefore there are $P(P-1) + 3$ different labels assigned to pixels when using uniform LBPs. For example, a neighbourhood of eight sampling points produces 58 uniform labels and 1 non-uniform label; thus only 59 bins are needed in the LBP histogram. This reduces the feature vector of an eight sampling-point neighbourhood by 76.95% [6]. The motivation behind uniform LBPs is two-fold. Firstly, Ojala et al. [119] pointed out in experiments with different textured images that uniform LBPs account for approximately 90% of all LBPs in an (8,1) neighbourhood and 70% in a (16,2) neighbourhood. Furthermore, Ahonen et al. [6] observed that in experiments with facial images, uniform LBPs account for 90.6% and 85.2% of all LBPs in an (8,1) and (8,2) neighbourhood, respectively. Secondly, it has been indicated that uniform LBPs are less prone to noise and that fewer samples offer a reliable estimation of the distribution, thereby producing better recognition results in many applications [126].
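A sketch of the mapping from the $2^P$ LBP codes to the $P(P-1)+3$ uniform labels is given below; the helper names are illustrative and not taken from the original work:

```python
def transitions(code, P=8):
    """Number of 0/1 transitions in the circular binary representation."""
    return sum(((code >> i) & 1) != ((code >> ((i + 1) % P)) & 1)
               for i in range(P))

def uniform_mapping(P=8):
    """Map each of the 2**P LBP codes to a uniform label.

    Uniform codes (at most two transitions) each receive their own label;
    all non-uniform codes share a single label. For P = 8 this gives
    58 + 1 = 59 bins, as noted above.
    """
    table = {}
    next_label = 0
    for code in range(2 ** P):
        if transitions(code, P) <= 2:
            table[code] = next_label
            next_label += 1
        else:
            table[code] = -1            # placeholder for "non-uniform"
    non_uniform_label = next_label      # one shared label for the rest
    for code, label in table.items():
        if label == -1:
            table[code] = non_uniform_label
    return table
```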

4.1.1.2 Rotational-invariant LBP

When rotating an image in-plane, the neighbouring pixels around the centre pixel, $c$, rotate in the same direction. However, this rotation results in a different LBP value for the centre pixel. To minimise this rotational effect, each LBP code is circularly rotated to obtain its minimum value [29]. This rotational-invariant LBP operator can therefore be defined as [29]:

\[
LBP^{ri}_{P,R}(x, y) = \min\{ROR(LBP_{P,R}(x, y), k) \mid k \in [0, P-1]\} \tag{4.5}
\]

where $ROR(z, k)$ denotes the circular bit-wise right-shift rotation of the bit sequence $z$ by $k$ steps. For example, the binary patterns 01011000, 00010110 and 01100001 all map to the minimum pattern 00001011.

One disadvantage of this operation is that the histogram of the rotational-invariant LBPs is only invariant to image rotations by angles of $a\,\frac{360^\circ}{P}$, $a = 0, 1, \ldots, P-1$. Thus, for an 8-bit code, it would only be invariant to $45^\circ$ image rotations. Another disadvantage is that it loses directional information, which would be crucial in some applications.
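A minimal sketch of the circular right rotation and the minimum-value mapping of equation 4.5 (illustrative only):

```python
def ror(code, k, P=8):
    """Circular bit-wise right rotation of a P-bit code by k steps."""
    return ((code >> k) | (code << (P - k))) & ((1 << P) - 1)

def rotation_invariant_lbp(code, P=8):
    """Map an LBP code to its rotation-invariant form (equation 4.5)."""
    return min(ror(code, k, P) for k in range(P))

# Example from the text: all three patterns share the same minimum rotation.
# rotation_invariant_lbp(0b01011000) == rotation_invariant_lbp(0b00010110) == 0b00001011
```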

4.1.1.3 Global versus local LBP histograms

In texture analysis the LBP histogram is often used to represent the global texture information [66]. Face images, as well as hand images, consist of a composition of micro-patterns that can effectively be described by LBP histograms. When computing an LBP histogram over an entire face image, only the occurrences of the micro-patterns are encoded, without any indication of their locations [139]. The histogram therefore only represents the global information. To consider the shape information and include the locations of the micro-patterns, Shan et al. [139] divided face images into several equal regions, $R_0, R_1, \ldots, R_m$, where $R$ denotes a region and $m$ is the total number of regions. For each region an LBP histogram is computed that represents the local information. The LBP histograms are then concatenated to form a single spatially enhanced feature histogram, as shown in Figure 4.5.
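The spatially enhanced representation can be sketched as follows; the grid size and bin count are illustrative choices rather than the parameters used by Shan et al. [139] or in this thesis:

```python
import numpy as np

def spatially_enhanced_histogram(lbp_labels, grid=(4, 4), n_bins=256):
    """Concatenate per-region LBP histograms into one feature vector.

    Sketch of the spatially enhanced representation described above: the
    label image is divided into a grid of equal regions, a normalised
    histogram is computed for each region and the histograms are
    concatenated. Assumes all labels are smaller than `n_bins`.
    """
    rows, cols = grid
    h, w = lbp_labels.shape
    feature = []
    for r in range(rows):
        for c in range(cols):
            region = lbp_labels[r * h // rows:(r + 1) * h // rows,
                                c * w // cols:(c + 1) * w // cols]
            hist = np.bincount(region.ravel(), minlength=n_bins).astype(np.float64)
            feature.append(hist / max(hist.sum(), 1.0))
    return np.concatenate(feature)
```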

Shan et al. [139] and Yang and Chen [169] have shown that spatially enhanced histogram features are more effective than global histogram features in facial-expression recognition and facial recognition, respectively. Yang and Chen [169] further compared the LBP image (which consists of the LBP codes only) to the LBP histogram method and found the LBP images to be a good candidate for facial recognition with large variation.

Figure 4.5: An example of the concatenation of a spatially enhanced feature histogram.

Hrúz et al. [74] and Xiao et al. [166] applied spatially enhanced histogram features to the hand-shape recognition problem. However, they do not motivate why these features were used over global histogram features, nor whether LBP images could be used to address the problem. The current research therefore investigates whether LBP images can be used for hand detection in images. It also assesses how well spatially enhanced histogram features compare to global histogram features on the hand-detection problem.
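For completeness, a sketch of how the two competing representations can be computed is given below, using scikit-image's local_binary_pattern; the 4x4 grid is an arbitrary illustrative choice and not a parameter prescribed by this research:

import numpy as np
from skimage.feature import local_binary_pattern

def spatially_enhanced_lbp(gray, grid=(4, 4), P=8, R=1):
    """Concatenate per-region uniform LBP histograms into one feature vector."""
    codes = local_binary_pattern(gray, P, R, method="nri_uniform")  # 59 labels
    rows, cols = grid
    h, w = codes.shape
    features = []
    for r in range(rows):
        for c in range(cols):
            region = codes[r * h // rows:(r + 1) * h // rows,
                           c * w // cols:(c + 1) * w // cols]
            hist, _ = np.histogram(region, bins=59, range=(0, 59), density=True)
            features.append(hist)
    return np.concatenate(features)   # grid[0] * grid[1] * 59 values

def global_lbp(gray, P=8, R=1):
    """Single global histogram, for comparison with the spatial variant."""
    codes = local_binary_pattern(gray, P, R, method="nri_uniform")
    hist, _ = np.histogram(codes, bins=59, range=(0, 59), density=True)
    return hist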

The success of LBP methods applied to various problems has inspired new research on several variants in the past decade. The local Gabor binary pattern was introduced by Zhang et al. [172]. Tan and Triggs [150] extended the LBP by allowing a three- valued quantisation of intensity difference and named it local ternary patterns. Fu and Wei [56] addressed the problem of noise by proposing the centralised LBP and Liao et al. [93] proposed the dominant LBP. Many more LBP variants have been proposed; however, the choice of a suitable variant depends on many factors such as the application, computational efficiency, discriminative power and robustness to illumination.

4.1.2 Random forests

Single decision trees have a high variance and low prediction accuracy. To maintain the advantages of decision trees while increasing the accuracy, random forests were proposed by Breiman [21] to decrease correlation among the trees by growing trees that employ a random-split selection on the input variables [91]. They consist of an ensemble of unpruned decision trees learned independently from a randomly selected subset of the training data [155].

By using an ensemble of decision trees, random forests have the ability to learn more than one class at a time by allowing each decision tree to vote on a class. In a classification approach the class receiving the maximum number of votes is determined to be the output class [20]. Regression in random forests is achieved by taking the average of the values across the leaves of all the trees. The random forest algorithm [21] is constructed as outlined in Algorithm 1.

Random forests are formally defined as follows [21]:

$F = \{f_1, f_2, \ldots, f_M\}$   (4.6)

where M is the total number of trees in a forest and the m-th tree is denoted as:

$f_m(x) = f(x, \theta_m) : X \rightarrow Y$   (4.7)

where the $\theta_m$ are independent, identically distributed random vectors and each tree casts a vote for the most popular class at the input variable x. While training the trees, random tests, $n \le p$, are created for each decision node of a tree, where n is often set to the square root of p and p is the number of input variables [91].

The best split is then selected for each node, j, of the tree, according to a certain quality measurement score, such as the entropy or Gini score that scores the potential information gain, △H, as follows [91]:

$\triangle H = -\frac{|I_l|}{|I_l| + |I_r|} H(I_l) - \frac{|I_r|}{|I_l| + |I_r|} H(I_r)$   (4.8)

where H(I) is the node score, $I_r$ is the right and $I_l$ is the left subset of the training data. Furthermore, the following measurement is used for the entropy for node j:

$H(I) = -\sum_{i=1}^{k} p_i^j \log(p_i^j)$   (4.9)

Given:

1. The training feature set $V = \{(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)\}$, where $y_i \in Y = \{-1, 1\}$;

2. The forest size, M;

3. The maximum depth, D, of the decision trees.

Then: for m ← 1 to M do

1. Construct a bootstrap sample $\{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$ by sampling with replacement from the training set V;

2. Grow a random forest tree $T_m$ on $\{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$ by recursively repeating the following steps at each node;

while there exist non-pure nodes and depth d < D do

1. Randomly select n variables from the input variables to use as possible split variables;

2. Calculate potential information gain for each of the selected variables;

3. Pick the best split position and split the node into two child nodes;

end
end
Output the random forest, F. F is used to make a prediction as follows:

1. The prediction of each tree fm is calculated.

2. In a classification strategy, the output is the majority vote over all trees.

3. In a regression strategy, the output is the average response value over all trees.

Algorithm 1: The algorithm for constructing a random forest

Alternatively, the Gini score can be used as the measurement for node j:

$H(I) = \sum_{i=1}^{k} p_i^j (1 - p_i^j)$   (4.10)

where $p_i^j$ is the label density of class i in node j [91]. The training continues recursively until no further information gain is obtained or a maximum depth is reached.
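The training procedure described above corresponds closely to what off-the-shelf implementations provide. As an illustrative sketch (the parameter values and the synthetic data are placeholders, not those used in this research), a forest using √p random split variables and entropy- or Gini-based splits can be built with scikit-learn:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data; in this work the features would be LBP histograms of image patches.
X, y = make_classification(n_samples=500, n_features=59, random_state=0)

# max_features="sqrt" selects n = sqrt(p) random split variables per node,
# criterion chooses the node score H(I): "entropy" (eq. 4.9) or "gini" (eq. 4.10).
forest = RandomForestClassifier(
    n_estimators=100,      # M, the forest size
    max_depth=None,        # D, grow until nodes are pure (or set a limit)
    max_features="sqrt",   # n <= p random split variables per node
    criterion="entropy",
    bootstrap=True,        # each tree sees a bootstrap sample of the data
    random_state=0,
)
forest.fit(X, y)
probs = forest.predict_proba(X[:5])   # averaged class densities over all trees
labels = forest.predict(X[:5])        # majority vote over all trees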

4.1.2.1 Error, strength and correlation using out-of-bag-error

During training, a new bootstrap sample is constructed for each tree by sub-sampling with replacement1 from the original training set. The samples that are not included during the training of a tree are referred to as the out-of-bag (OOB) samples of that tree [91]. In addition, the OOB samples are used to estimate the performance of the split and compute what is called the OOB error (OOBE) of the tree and the forest [91].

When computing the OOBE, bagging (bootstrap aggregation) is used in tandem with random feature selection [21]. Bagging provides two advantages: first, it improves accuracy when random features are used; second, it can be used to return ongoing estimates of the generalisation error of the random forest as well as estimates for the strength and correlation between trees [21]. Empirical estimates have shown that OOBE estimates have the same accuracy as using a test set that is the same size as the training set [21]. Furthermore, the estimates on strength and correlation are helpful in understanding the classification accuracy and ways to improve the random forest.

This error calculation is an unbiased estimate of the generalisation error and can therefore be used to estimate how well the forest will perform on unseen data [20]. If the training and testing data have a similar distribution, then the OOB prediction can be very accurate [20]. Unlike n-fold cross-validation, this estimate is part of the learning strategy of random forests and is therefore provided inherently [91].

1It means that some data points may be randomly repeated.

4.1.2.2 Variable importance

In random forests, all variables are ranked according to their importance, referred to as variable importance. The measures of variable importance are returned by the random forest and can be used to perform variable selection. These measures are based on the misclassification rate when the variable values in the OOB samples are randomly permuted.

When using the Gini index to determine the node purity, the measure is known as the Gini importance. At every split on a given variable, the decrease in the Gini index of the two descendant nodes relative to the parent node is recorded. The sum of all these Gini decreases for a given variable, normalised by the number of trees, provides a fast variable importance measure that is consistent with the permutation-based variable importance.
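Both the OOB error and the Gini-based variable importance discussed above are returned directly by common implementations; a brief scikit-learn sketch on synthetic stand-in data:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=59, random_state=0)
forest = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)
forest.fit(X, y)

oob_error = 1.0 - forest.oob_score_            # OOBE: estimate of the generalisation error
gini_importance = forest.feature_importances_  # mean decrease in impurity per variable

# Rank the input variables by their Gini importance.
ranked = sorted(enumerate(gini_importance), key=lambda t: t[1], reverse=True)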

4.1.2.3 Testing

The estimated probability for predicting class c for a sample can be defined as:

$p(c|x) = \frac{1}{M} \sum_{m=1}^{M} p_m(c|x)$   (4.11)

where $p_m(c|x)$ is the estimated density of class labels in the leaf nodes of the m-th tree [91]. The final decision function of the forest is thus defined as:

$D(x) = \arg\max_{c \in Y} p(c|x)$   (4.12)

In addition, the classification margin of a labelled sample (x,y) is defined as:

$mg(x, y) = p(y|x) - \max_{c \in Y,\, c \neq y} p(c|x)$   (4.13)

The margin measures the extent to which the average number of votes at (x, y) for the correct class exceeds the average vote for any other class [21]. Thus, the larger the margin, the higher the confidence in the classification. Derived from this, the generalisation error is given by [21]:

$GE = P_{(x,y)}(mg(x, y) < 0)$   (4.14)

where (x, y) indicates that the probability is taken over the (x, y) space. The error of a forest depends on the correlation between pairs of trees in the forest, $\bar{p}$, and the strength of the individual trees in the forest, s. Breiman [21] states that although random forests do not overfit as more trees are added, the generalisation error converges to a limit that is upper bounded by [21]:

$GE \le \bar{p}\,\frac{1 - s^2}{s^2}$   (4.15)

where

$s = E_{(x,y)}[mg(x, y)]$   (4.16)

The strength, s, relates to the average accuracy of each tree and the correlation is the measurement of tree diversity [12]. Increasing the correlation among trees lowers their diversity and increases the forest error rate. However, increasing the strength of the individual trees decreases the forest error rate. Thus, the lower the error rate, the stronger the classifier would be [121].

An additional property of random forests is that they can be used to determine the proximity between any two data points [20]. In this context, the proximity refers to the similarity between the points. To obtain these proximities, the random forest algorithm passes both points down every tree, counts the number of times the points end up at the same leaf node, and divides this count by the total number of trees [20]. This results in values between zero and one, where one represents highly similar points and zero represents highly dissimilar points. The proximity is useful for identifying outliers as well as for clustering points together.
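This proximity computation can be sketched with scikit-learn's apply() method, which returns the leaf index that a sample reaches in every tree (an illustrative sketch, not part of the original framework):

import numpy as np

def forest_proximity(forest, a, b):
    """Fraction of trees in which samples a and b land in the same leaf node."""
    leaves = forest.apply(np.vstack([a, b]))   # shape (2, n_trees)
    return float(np.mean(leaves[0] == leaves[1]))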

4.1.2.4 Parameter selection

When constructing random forests, only two parameters are required, namely the num- ber of random split variables, L, and the number of trees, M [12].

The random split variables should be chosen according to a purity measure, such as the deviance for classification. For a smaller number of split variables, the diversity of the trees will increase; however, when the number of split variables is too small, this may decrease the accuracy of a tree [12]. Breiman [21] therefore suggests that, for classification tasks, L should be set equal to the square root of the number of inputs.

When increasing the number of trees, the generalisation error would approach a limit asymptotically [21]. Therefore, the number of trees should be selected and increased until the OOB error no longer improves significantly [12]. Oshiro et al. [121] showed that beyond this point only the computational cost would increase with no performance gain. They also suggest that the more features there are, the more trees would be needed.
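This selection strategy can be expressed as a simple loop that grows the forest in steps and stops once the OOB error no longer improves; the step size, improvement threshold and synthetic data below are illustrative only:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=59, random_state=0)
best_error = 1.0
for m in range(50, 501, 50):
    forest = RandomForestClassifier(n_estimators=m, oob_score=True,
                                    max_features="sqrt", random_state=0)
    forest.fit(X, y)
    oob_error = 1.0 - forest.oob_score_
    if best_error - oob_error < 1e-3:   # no significant improvement: stop adding trees
        break
    best_error = oob_error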

4.1.2.5 Strengths and weaknesses of random forests

Random forests are powerful classifiers that inherit the many benefits of decision trees, such as the ability to handle missing data; the ability to handle categorical and numerical values without the need to normalise these values; and methods to find variables that are important for prediction. Random forests also overcome the limitations of decision trees such as: not overfitting the data; not needing to prune the trees in order to generalise; and an improved prediction accuracy compared to single-decision trees.

However, Genuer et al. [59] showed that the performance of random forests may be limited when a small number of random-split variables is used or a large number of noisy variables is present.

Overall, random forests have been used for a diverse range of applications and have displayed high levels of predictive accuracy compared to other classification techniques. Furthermore, they are fast to train, fast to predict and require only a few parameters to tune [21].

4.1.3 Support vector machines

There has been growing interest in using SVMs for pattern-recognition problems. SVMs are derived from statistical learning theory and are applied as a machine learning tool that inherently classifies data into two classes. SVMs offer several advantages such as dealing with high-dimensional feature vectors without affecting the training time; being memory efficient by only using a subset of training points in the decision function; and using different kernel functions that offer both power and flexibility [164]. The default kernel is the linear kernel that can easily be substituted with the radial basis function, sigmoid, polynomial or other more recent kernels that allow features to be separated more clearly in a given classification problem. Alternative kernels allow SVMs to solve non-linear classification problems using linear classification techniques.

In principle, an SVM is a mathematical algorithm that aims to maximise a particular mathematical function over a given set of data points [118]. Consider a set of data points that consists of two classes: the theory of SVMs suggests that it is possible to find a boundary that separates the two classes. Furthermore, consider a training set of N data points represented by $S = \{(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)\}$, where each $x_i$, with $i = 1, 2, \ldots, N$, is a data point in $\mathbb{R}^n$ and each $y_i \in \{-1, 1\}$ is the corresponding classification label for $x_i$, such that the data points are divided into a positive and a negative class. Moreover, suppose that the positive class, $S^+ = \{x_i \mid y_i = 1\}$, and the negative class, $S^- = \{x_i \mid y_i = -1\}$, are linearly separable in $\mathbb{R}^n$, such that at least one boundary can be formed between the two [118]; then this boundary (see Figure 4.6(a)), also referred to as the decision boundary, is found by training an SVM on S.


Figure 4.6: (a) Linear classification; (b) non-linear classification [170].

Figure 4.7: Linear classification of a hyperplane [170].

In a higher dimensional space, the boundary is regarded as a geometrical concept of a plane and is generally referred to as a hyperplane. The hyperplane, as depicted in Figure 4.7, is defined by the following equation:

$f(x) = w \cdot x + b = 0; \quad w \in \mathbb{R}^n,\ b \in \mathbb{R}$   (4.17)

where w is the normal vector, x is the feature vector and b is the intercept (bias) term. The normal vector, w, of the hyperplane is defined as a linear combination of the data points, $x_i$, with weights, $\alpha_i$, and is formally expressed as:

$w = \sum_{1 \le i \le N} \alpha_i y_i x_i$   (4.18)

If the data points are linearly separable, then two hyperplanes can be selected such that they separate the data and there are no points between them. These hyperplanes can be described by the following equations:

w · xi + b = +1

w · xi + b = −1 (4.19)

Midway between these hyperplanes, a decision boundary can be defined as

w · xi + b = 0 (4.20)

Moreover, the distance, d, between the margin and the decision boundary can be expressed as:

$d = \frac{2}{\|w\|}$   (4.21)

Two factors determine the selection of a hyperplane: firstly, it should separate the data points clearly, and secondly, it should have the maximum distance to the nearest data point from both classes. This distance is referred to as the margin and the data points that are situated closest to the hyperplane are referred to as the support vectors. When the separation between the two classes is greater, it is necessary to find the maximum margin, because it would allow the SVM to classify new data points more accurately. To determine this hyperplane, two requirements need to be met:

1. The first is that all training data points should be classified correctly [170]. Thus, w and b should be estimated such that:

w · xi + b ≤ −1 for yi = −1 (4.22)

and

w · xi + b ≥ +1 for yi = +1 (4.23)

Combining these two equations gives:

yi(w · xi + b) − 1 ≥ 0, ∀ i = 0, 1, 2, ..., N (4.24)

2. The second requirement is that the margin should be as large as possible. Maximising the distance in equation 4.21 is the same as minimising $\frac{\|w\|}{2}$. This results in minimising $f(w) = \frac{1}{2}\|w\|^2$. The optimal hyperplane can therefore be found by solving the optimisation problem, which is defined as:

Minimise $\frac{1}{2}\|w\|^2$   (4.25)

Subject to

yi(w · xi + b) − 1 ≥ 0, ∀ i = 0, 1, 2, ..., N (4.26)

Given the Lagrange multipliers $\alpha_1, \alpha_2, \ldots, \alpha_N \ge 0$ and the saddle point of the Lagrange function:

$L(w, b, \alpha) = \frac{1}{2}\|w\|^2 - \sum_{i=1}^{N} \alpha_i \left( y_i(w \cdot x_i + b) - 1 \right)$   (4.27)

The optimisation problem can be translated to:

Maximise

$\sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{N} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j)$   (4.28)

Subject to

$\sum_{i=1}^{N} \alpha_i y_i = 0$ and $\alpha_i \ge 0$, $i = 1, 2, \ldots, N$   (4.29)

The selected hyperplane is referred to as the maximum margin hyperplane or the optimal hyperplane. Under this formulation, the optimal hyperplane discriminant function can be defined as:

$f(x) = \sum_{i \in S} \alpha_i y_i (x_i \cdot x) + b$   (4.30)

where S is the subset of support vectors that corresponds to positive Lagrange multipliers.

In the classification of linear problems, a linear hyperplane is determined with a maxi- mum margin that separates the data points. In the classification of non-linear problems, the data points are instead mapped onto a higher-dimensional space, referred to as the feature space. In this space, an optimal hyperplane that separates the data points linearly can be determined.

In reality, more complex structures are required to find a decision hyperplane in non- linear problems. As illustrated in Figure 4.6(b), the data points are unevenly distributed in these cases and non-separable compared to those in Figure 4.6(a).

In these problems, the classes are not linearly separable and therefore the first constraint (see equation 4.24) cannot be satisfied. To overcome this, a cost function can be formulated that combines margin maximisation with the minimisation of an error criterion. The formulation is achieved using a set of slack variables, $\xi_i$, where the cost function can be formulated as:

$\text{Minimise}_{w,b,\xi} \quad \frac{1}{2}\|w\|^2 + C \cdot \sum_{i=1}^{N} \xi_i$   (4.31)

Subject to

$y_i(w \cdot x_i + b) \ge 1 - \xi_i$   (4.32)

where

$\xi_i \ge 0$ and C is a constant   (4.33)

The parameter, C, determines the trade-off between the margin maximisation and the amount of error to be tolerated. According to Mercer's theorem [152], the dot product of vectors in the mapped feature space can be expressed as a kernel function of the dot products of the corresponding vectors in the current space [154]. This equivalence can be expressed as follows (for example, with the mapping $\phi(x) = (x, x^2)$):

$K(x_i, x_j) = \phi(x_i) \cdot \phi(x_j) = (x_i, x_i^2) \cdot (x_j, x_j^2) = x_i x_j + x_i^2 x_j^2 = x_i \cdot x_j + (x_i x_j)^2$   (4.34)

where $K(x_i, x_j)$ represents the kernel function. This equation holds if and only if the following condition holds for any function, h:

$\int h(x)^2\,dx \text{ is finite} \;\Rightarrow\; \iint K(x, y)\,h(x)\,h(y)\,dx\,dy \ge 0$   (4.35)

By selecting an appropriate kernel, any set of data points can be linearly separated in a higher-dimensional space without knowing the explicit form of φ. The dual optimisation problem can therefore be formulated as:

Maximise

$\sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{N} \alpha_i \alpha_j y_i y_j K(x_i, x_j)$   (4.36)

Subject to

$\sum_{i=1}^{N} \alpha_i y_i = 0$ and $\alpha_i \ge 0$, where $i = 1, 2, \ldots, N$   (4.37)

Hence, rather than fitting a complex curve to separate the data in the input space, an optimal hyperplane can be found in the feature space that separates the data clearly and allows the SVM to classify new test data accurately. The decision function thus becomes:

$f(x) = \sum_{i \in S} \alpha_i y_i K(x_i, x) + b$   (4.38)

where S is the set of support vectors, K is the kernel function and b is the intercept (bias) term.

4.1.3.1 Kernel functions

In non-linear problems, an optimal hyperplane that separates the classes is required. This is achieved by using one of several kernel functions that map data from the current space onto a higher-dimensional feature space. Following Mercer's theorem [152], the SVM makes use of four basic kernels for training and classification, where γ, d and r are kernel parameters and γ > 0 [75]:

• Linear: $K(x_i, x_j) = x_i^T x_j$

• Polynomial: $K(x_i, x_j) = (\gamma\, x_i^T x_j + r)^d$

• Sigmoid: $K(x_i, x_j) = \tanh(\gamma\, x_i^T x_j + r)$

• Radial basis function: $K(x_i, x_j) = \exp(-\gamma\, \|x_i - x_j\|^2)$

Selecting a suitable kernel is essential because it influences the prediction capabilities of SVMs. Although research over the past few years has aimed at determining an appropriate kernel for a given problem [96], no standard method exists for finding the most appropriate kernel [170]. Selecting an appropriate kernel is therefore based on trial and error [37].
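In practice, this trial-and-error selection is usually organised as a cross-validated grid search over candidate kernels and their parameters. A hedged scikit-learn sketch, in which the parameter grid and the synthetic data are illustrative only:

from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=59, random_state=0)
param_grid = [
    {"kernel": ["linear"], "C": [0.1, 1, 10]},
    {"kernel": ["rbf"], "C": [0.1, 1, 10], "gamma": [1e-3, 1e-2, 1e-1]},
    {"kernel": ["poly"], "C": [1], "degree": [2, 3], "gamma": ["scale"]},
]
search = GridSearchCV(SVC(), param_grid, cv=5)   # 5-fold cross-validation
search.fit(X, y)
print(search.best_params_)                       # kernel and parameters that generalise best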

By employing the detection phase, the object to be tracked can be validated to determine if it is a hand. In Chapter 6 it will be determined which of the components will be the most suitable in validating the object.

4.2 Tracking phase

In this section the components of the tracking phase, implemented in the second DSR cycle (see Figure 4.8), are discussed. These components form part of the algorithm that tracks the right and left hand independently in both constrained and unconstrained environments.

Figure 4.8: The second DSR cycle, consisting of the tracking phase.

In order to successfully track hand motion, the hands need to be clearly segmented from the rest of the scene. Since hands are articulated objects, it is a challenging task to employ geometric models to track hands due to the complexity involved in minimising the error cost between the hand and the model projection. A more cost-effective approach to hand-tracking that is invariant to the articulated nature of hands is to use skin colour. Combining a reliable skin segmentation with an effective data-association technique results in an efficient and robust hand-tracking algorithm. This hand-tracking algorithm forms a prerequisite for distinguishing different hand gestures in SASL using hand-shape recognition. In the following subsections the components that were utilised to form the proposed tracking algorithm are discussed.

4.2.1 Skin segmentation

Scientific studies [129] have shown that skin-colour diversity in South African and sub-Saharan African populations is the highest in the world. This diverse range of skin-colour tones and its variations under different illumination conditions requires that reliable skin segmentation should be dynamically adaptive. Although many skin-colour segmentation algorithms have been proposed [19, 82, 149], they can only handle slight changes in illumination and are negatively affected by background objects, resulting in a higher false-positive rate.

The main advantages of using colour to segment skin are that it is computationally inexpensive as a low-level feature and robust against variations in rotation, scaling and partial occlusion [80].

The aim of skin-colour segmentation is to separate pixels in an image into either skin- coloured or non-skin-coloured pixels. The simplest method to classify pixels as skin colour is to apply a threshold, representing a confined neighbourhood in a given colour space, in which skin colour is considered to be clustered. However, this method is often not sufficient, because skin colour may vary under different lighting conditions.

Skin segmentation can be regarded as a classification problem. To identify skin colour, three basic steps are involved [47, 80, 113]:

1. A suitable colour space should be selected to represent the pixels in an image.

2. Skin and non-skin pixels should be modelled using an appropriate classification algorithm.

3. Pixels should then be classified individually as either skin or non-skin pixels.

Many existing skin segmentation algorithms that follow these steps perform well for only a limited set of skin-colour tones and illumination conditions. In order to handle environmental changes and changes in illumination conditions, an additional step is required between steps 2 and 3:

• Before classifying pixels, colour constancy should be applied to the image or a dynamic adaptation technique should be used.

In colour-constancy techniques, colours in an image are corrected and the illumination is estimated [3]. In dynamic-adaptation techniques, skin-colour models are adapted to changes in a scene. A popular dynamic-adaptation technique is to sample skin colour from the face of an individual, based on the fact that the face and body share a similar colour distribution [3]. However, this approach suffers from several drawbacks, as it is negatively affected by the lips, eyes, spectacle frames, facial hair and shadows [3].

As an alternative, a unique dynamic skin-invariant modelling framework is proposed in Section 4.2.1.3 that segments any skin-colour tone under varying backgrounds and illumination conditions. The framework presents two important properties: firstly, it does not require any training beforehand and, secondly, it samples a closer skin-colour distribution by using the area around the nose. Given that the framework was aimed at segmenting skin-coloured areas in a SASL application, it is restricted to images containing a face. It can, however, be easily extended to images that do not contain a face by training models such as Gaussian, Bayesian or neural networks using the sampled skin-colour distributions.

In summary, the algorithm consists of four steps: (i) face detection is applied using the well-known Haar-like features and the AdaBoost algorithm developed by Viola and Jones [158]; (ii) the default RGB colour space is transformed to the hue, saturation and value (HSV) colour space for efficiency and simplicity; (iii) the region around the nose is selected to sample a skin-colour distribution that is used to compile a colour histogram model; and (iv) the model is used to segment skin-coloured pixels in an image. In the following subsections, this algorithm will be further explored and explained.

4.2.1.1 Face detection

The Viola and Jones face-detection algorithm [158] is a tree-based technique that employs Haar-feature classifiers to build a boosted rejection cascade using AdaBoost. In this way, a positive detection rate is achieved by using a low rejection rate multi-tree classifier at every node in the cascade [79]. Their algorithm is designed to remove less-promising regions and focus more on areas that are likely to contain a face. A more comprehensive breakdown of their algorithm can be found in Viola and Jones [158].

To determine the effectiveness of this algorithm, Achmed [1] evaluated the method on 1 047 randomly selected images acquired from the internet and other sources. The results demonstrated an 89% detection rate on frontal-view faces, which is comparable to the results achieved by Viola and Jones [158].

4.2.1.2 Does a suitable colour space exist?

Many researchers [31, 45, 84] use colour-space transformations to segment skin colour for the following reasons: firstly, it is assumed that by applying a certain colour-space transformation, the separation between skin-colour and non-skin-colour pixels can be defined more clearly and, secondly, it is assumed that illumination invariance can be achieved.

In order to ascertain whether these assumptions hold, a number of studies [80, 141, 171] have investigated the effectiveness of colour-space transformation for the purpose of skin-colour detection. Based on the above assumptions, these studies have shown that colour-space transformations do not provide significant performance improvements when detecting skin-coloured pixels [80, 141, 171]. Furthermore, it was shown that by discarding the intensity component in a given colour space, the discrimination between skin-colour and non-skin-colour pixels does not improve. However, Kakumanu et al. [80] suggest that the selection of colour-space transformation should depend on the methodology and application for skin segmentation. Moreover, it is also suggested by Kakumanu et al. [80] and Vezhnevets et al. [157] that it should rely on the format in which an image is obtained, as well as the need for a certain colour space in post-processing steps. In addition, Shin et al. [141] and Zarit et al. [171] have suggested that it may support a better generalisation of training data in a classification process. Therefore, the question of whether a suitable colour space should be employed is left unanswered. Although many researchers [31, 45, 84] employ a particular colour space, their choice is not justified in their research. In addition, they do not justify that their chosen colour space is the optimal one for identifying skin-colour pixels.

Fleck et al. [53] argued that in the HSV colour space the hue component has a restricted range for human skin colour, which is made up of carotene, haemoglobin and melanin. Carotene has a distinctive yellow-orange colour that is mostly found in the palms and soles. Haemoglobin is responsible for carrying oxygen in red blood cells and forms the pink-red colour in skin. The amount and type of melanin is the primary determinant of skin colour. It consists of two types, namely eumelanin and pheomelanin. Eumelanin has a very dark-brown colour, whereas pheomelanin has a red colour. According to Fleck et al. [53] and other researchers [39, 40, 47], a combination of these colours is easily distinguishable by the hue constituent of the HSV colour space.

It is beyond the scope of this research to prove that an optimal colour space exists; however, it can be concluded that the HSV colour space is suitable to segment different skin-colour tones.

4.2.1.3 Dynamic skin model

Considering that the face is elliptic, once the face is located its dimensions are retrieved, providing the minor axis, $M_1$, and the major axis, $M_2$. Given that the proportions of the face exhibit a symmetrical property with regard to the dimensions of the face, the position of the nose can be determined by halving both $M_1$ and $M_2$. Next, the image is transformed from the RGB colour space to the HSV colour space. The area proposed as the best area to provide a closer skin-colour approximation is obtained by applying a radius of $M_1/5$ around the nose.

This method however only applies to frontal-view faces; for non-frontal-view faces a trained model should be used. This area is used as the region of interest to generate a 2D colour histogram model. The region of interest is used to extract the skin-colour distribution, which is quantised into a set number of bins, where each bin in the histogram holds the number of pixels per colour. Normalising the histogram results in a probability of a pixel belonging to an individual's skin colour. This is followed by using the histogram model to apply a back projection that operates by assigning a probability value to each pixel in an image.

Given that P(S) is the probability that a pixel is skin and C is the colour of a pixel, then the probability that a pixel is a skin colour is given by [157]:

$P(S|C) = \frac{P(C|S)\,P(S)}{P(C)}$   (4.39)

where P(C|S) is the probability of assigning a value to C if the pixel is skin. This back projection method results in a probability map in which lighter pixels have higher probabilities and darker pixels have lower probabilities, as shown in Figure 4.9.

Figure 4.9: The probability map where lighter pixels have higher probabilities and darker pixels have lower probabilities.

To retain the most probable skin-coloured pixels and reduce pixels with lower-probability values, a threshold equal to 200 is applied, resulting in a binary image where skin-colour and non-skin-colour pixels are represented by a value of 255 and 0, respectively.
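The complete chain described above (face detection, HSV conversion, sampling around the nose, histogram back projection and thresholding at 200) can be sketched with OpenCV. The code below is an illustrative reconstruction; the cascade file, histogram bin counts and region handling are assumptions rather than details taken from the thesis implementation:

import cv2
import numpy as np

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def skin_mask(bgr):
    """Segment skin pixels by back-projecting a histogram sampled around the nose."""
    gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, 1.1, 5)
    if len(faces) == 0:
        return None
    x, y, w, h = faces[0]
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)

    # Nose region: centre of the detected face, radius roughly M1/5.
    cx, cy, r = x + w // 2, y + h // 2, max(w // 5, 1)
    nose = hsv[cy - r:cy + r, cx - r:cx + r]

    # 2D hue-saturation histogram of the sampled skin-colour distribution.
    hist = cv2.calcHist([nose], [0, 1], None, [30, 32], [0, 180, 0, 256])
    cv2.normalize(hist, hist, 0, 255, cv2.NORM_MINMAX)

    # Back projection gives a probability map; thresholding at 200 keeps likely skin.
    prob = cv2.calcBackProject([hsv], [0, 1], hist, [0, 180, 0, 256], scale=1)
    _, mask = cv2.threshold(prob, 200, 255, cv2.THRESH_BINARY)
    return mask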

4.2.2 Connected-components labelling

One of the goals of detecting objects in an image is to first identify possible object locations. These locations can be identified by segmenting object regions using connected-components labelling. Connected-components labelling is a sequential two-pass algorithm that connects pixels in either binary or greyscale images to form segmented regions, referred to as components. In this section, connected-components labelling on binary images is discussed.

The algorithm operates by assigning pixels to components according to their pixel connectivity, and thereafter labelling each pixel according to the component to which it was assigned. The algorithm supports two types of labelling, namely 4-connectivity and 8-connectivity. The 4-connectivity labelling mask only evaluates the horizontal and vertical neighbouring pixels around a given point, whereas the 8-connectivity labelling mask also includes the diagonal neighbouring pixels. In this research, the 8-connectivity labelling mask will be used, thereby searching for connected pixels in each direction and reducing the amount of noise.

Given a binary image consisting of skin-coloured and non-skin-coloured pixels, in the first pass the mask moves along the rows from the top left to the bottom right of the image. For each skin-coloured pixel a temporary label is assigned based on the values of the neighbouring pixels that have already been processed. If none of the four top-left neighbouring pixels (pixels that have already been passed) is a skin-coloured pixel, then the centre pixel is assigned a new label; however, if one of the four neighbouring pixels is a skin-coloured pixel, then its label is assigned to the centre pixel. Furthermore, if the centre pixel is a skin-coloured pixel and has two or more neighbouring skin-coloured pixels with different labels, then the labels of these neighbouring pixels are recorded as being equivalent. After the first pass, equivalence classes are determined from these equivalences, and a unique label is assigned to each class. During the second pass, each temporary label is replaced by the label of its corresponding equivalence class [4].

Once the two-pass algorithm has completed, the connected components can be analysed. This additional step makes it possible to use prior knowledge and the components’ size to eliminate components considered to be noise.
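OpenCV exposes this labelling directly; the sketch below labels the skin mask with 8-connectivity and discards small components as noise (the minimum area is an arbitrary illustrative value):

import cv2
import numpy as np

def skin_components(mask, min_area=500):
    """Label 8-connected skin regions and keep only sufficiently large components."""
    n, labels, stats, centroids = cv2.connectedComponentsWithStats(mask, connectivity=8)
    clusters = []
    for i in range(1, n):                       # label 0 is the background
        if stats[i, cv2.CC_STAT_AREA] >= min_area:
            clusters.append((labels == i).astype(np.uint8))
    return clusters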

4.2.3 Independent hand-tracking

A novel data-association method that enables the tracking of multiple skin-coloured objects entering a scene over time was proposed by Argyros and Lourakis [9]. This section discusses how this data-association method can be adapted and extended in a novel algorithm. In contrast to their data-association method, this algorithm can be used to track two objects with a similar appearance in unconstrained environments.

Traditionally, multiple-object-tracking methods often rely on Kalman filters and their extensions. This is particularly the case if the object dynamics or observations are of a Gaussian nature; however, in the case of a non-Gaussian distribution, the Kalman filter assumptions would not suffice [9]. Object tracking is often categorised as either a Bayesian or a non-Bayesian method. Single-object tracking usually relies on powerful object models. In multiple-object tracking, in some cases multiple instances of single-object tracking are applied [76], while in other cases particle filters are used to track multiple objects simultaneously [144]. The data-association method by Argyros and Lourakis [9] was proposed to apply multiple-object tracking in a non-Bayesian framework and is modified as follows:

Assuming that a binary image is obtained in which the right and left hand have been isolated as skin-coloured clusters, the method assigns a set of object hypotheses corresponding to these skin-coloured clusters and associates them over time. The aim of this association is to assign a unique label to the right and left hand, and to associate pixel information with each hand across all frames. In the following subsections a more detailed description of the data-association method is given.

4.2.3.1 Generating object hypotheses

Assume that at time t, N skin-coloured clusters exist, where each $s_i$, with $1 \le i \le N$, corresponds to a skin-coloured cluster. In addition, let the number of hands, M, at time t be $M \le 2$ and let the set of skin pixels that image the j-th object be $o_j$, with $1 \le j \le M$. Suppose that the spatial distribution of the pixels representing a skin-coloured cluster is approximated by an ellipse; then the ellipse model of the object can be denoted by $h_i = h_i(c_{x_i}, c_{y_i}, \alpha_i, \beta_i, \theta_i)$, where the point $(c_{x_i}, c_{y_i})$ is its centroid, $\alpha_i$ is the length of the major axis, $\beta_i$ is the length of the minor axis and $\theta_i$ is its orientation on the image plane. Moreover, let $S = \cup_{i=1}^{N} s_i$, $O = \cup_{j=1}^{M} o_j$ and $H = \cup_{j=1}^{M} h_j$ denote the union of skin-coloured clusters, object hypotheses and ellipses, respectively. The tracking of the hands is therefore determined by the relationship between the skin-coloured clusters ($s_i$) and the object hypothesis models ($h_j$) in time. Therefore, by associating object hypotheses with skin-coloured clusters across time, multiple clusters can be tracked even when occlusion occurs.

It should however be noted that the correspondence between skin-coloured clusters and object hypotheses is not necessarily one-to-one. For example, when the right and left hand cross and occlude each other, each hand is a different object, but appears as a single skin-coloured cluster. This data-association method therefore assumes that an object hypothesis may correspond to one or more clusters and, in a similar manner, one cluster may correspond to one or more object hypotheses.

When associating an object hypothesis with a skin-coloured cluster, represented by an ellipse, the distance of a pixel from the cluster to the ellipse is used to determine if the ellipse is associated with the cluster or not. The distance, D(p, h), from a point, p = p(x, y), to an ellipse, h(cx, cy, α, β, θ), is defined as follows [9]:

$D(p, h) = \sqrt{\vec{V}^T \vec{V}}$   (4.40)

where

$\vec{V} = \begin{bmatrix} \frac{\cos\theta}{\alpha} & \frac{-\sin\theta}{\alpha} \\ \frac{\sin\theta}{\beta} & \frac{\cos\theta}{\beta} \end{bmatrix} \begin{pmatrix} x - c_x \\ y - c_y \end{pmatrix}$ and $\vec{V}^T$ denotes the transpose of the matrix $\vec{V}$.

It then follows that if the distance is less than one, equal to one or greater than one, then the given point exists inside, on or outside the ellipse, respectively. Thus, if the distance is less than or equal to one, then all pixels belonging to a cluster support the existence of the object hypothesis represented by the ellipse at time t and that the object hypothesis predicts the cluster at time t + 1. If the distance of all pixels belonging to a cluster is greater than one, then none of the object hypotheses supports the existence of this cluster.

When an individual communicates in sign language the default pose would be to naturally hold the right and left hand on the right and left side of the body, respectively. In the initial frame of the video, an object hypothesis is assigned to each hand, one for the right hand and one for the left. To directly derive the parameters for these initial object hypotheses, the statistics of the distribution of pixels belonging to a given cluster are used, where the centre of the ellipse is equal to the centre of the skin-coloured cluster and the rest of the parameters are computed from the covariance matrix, Σ, of the bivariate distribution of the cluster pixels' location.

It can therefore be shown that if the pixel distribution is represented by

$\Sigma = \begin{bmatrix} \sigma_{xx} & \sigma_{xy} \\ \sigma_{xy} & \sigma_{yy} \end{bmatrix}$   (4.41)

then the rest of the ellipse parameters can be defined as [9]:

$\alpha = \sqrt{\lambda_1}, \quad \beta = \sqrt{\lambda_2} \quad \text{and} \quad \theta = \tan^{-1}\!\left(\frac{-\sigma_{xy}}{\lambda_1 - \sigma_{yy}}\right)$   (4.42)

where

$\lambda_1 = \frac{\sigma_{xx} + \sigma_{yy} + \Lambda}{2}, \qquad \lambda_2 = \frac{\sigma_{xx} + \sigma_{yy} - \Lambda}{2}$   (4.43)

and

$\Lambda = \sqrt{(\sigma_{xx} - \sigma_{yy})^2 + 4\sigma_{xy}^2}$   (4.44)
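The initial hypothesis parameters can be derived directly from a cluster's pixel coordinates; the NumPy sketch below follows equations 4.41 to 4.44 and the distance test of equation 4.40, and is an illustrative reconstruction rather than the thesis code:

import numpy as np

def ellipse_from_cluster(xs, ys):
    """Fit the hypothesis ellipse (cx, cy, alpha, beta, theta) to cluster pixel coordinates."""
    cx, cy = xs.mean(), ys.mean()
    sxx, syy = xs.var(), ys.var()
    sxy = ((xs - cx) * (ys - cy)).mean()
    lam = np.sqrt((sxx - syy) ** 2 + 4 * sxy ** 2)
    l1, l2 = (sxx + syy + lam) / 2, (sxx + syy - lam) / 2
    theta = np.arctan2(-sxy, l1 - syy)          # eq. 4.42, using arctan2 for stability
    return cx, cy, np.sqrt(l1), np.sqrt(max(l2, 0.0)), theta

def ellipse_distance(x, y, cx, cy, alpha, beta, theta):
    """D(p, h) of eq. 4.40: a value <= 1 means the pixel lies inside or on the ellipse."""
    m = np.array([[np.cos(theta) / alpha, -np.sin(theta) / alpha],
                  [np.sin(theta) / beta,  np.cos(theta) / beta]])
    v = m @ np.array([x - cx, y - cy])
    return float(np.sqrt(v @ v))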

4.2.3.2 Tracking-object hypotheses

After generating the initial object hypotheses and assigning an ellipse model to each hand, the tracking of the hands involves associating the hand pixels to the respective object hypothesis. This association is governed by three rules [9]:

1. If a skin-coloured pixel of a cluster exists within an ellipse for a given object hypothesis, then that pixel is considered to belong to this object hypothesis.

2. If a skin-coloured pixel of a cluster exists outside both ellipses corresponding to each object hypothesis, then that pixel is assigned to the object hypothesis that is closest to it.

To deal with scenarios where an object hypothesis belongs to more than one cluster, the following rule is applied:

3. If an object hypothesis exists that is associated with only one cluster and at the same time is not associated with any other cluster, then the object hypothesis is assigned to that cluster. Otherwise the object hypothesis is assigned to the cluster with which it shares the largest number of skin-coloured pixels.

Figure 4.10: Examples of object-hypotheses association. (a) An example of more than one object hypothesis associated to more than one cluster. (b) An example of one object hypothesis associated to more than one cluster.

An illustrative example of these rules is presented in Figure 4.10. If, for example, in Figure 4.10(a), both hands (clusters s1 and s2) cross each other and occlusion occurs, then it may appear that a single cluster is formed, s3 (s3 = s1 + s2). In this case, according to rule (1), all skin-coloured pixels located inside ellipse model h1 will be assigned to it. In a similar way, all skin-coloured pixels located inside ellipse model h2 will be assigned to it. It should, however, be noted that those pixels that are located at the intersection of h1 and h2 will be assigned to both object hypotheses represented by the ellipse models. According to rule (2), all skin-coloured pixels that are not located inside either ellipse model but form part of s3 will be assigned to the closest object hypothesis. These association rules allow for the hands to be tracked correctly after occlusion occurs. In another example (see Figure 4.10(b)), if the hands are apart, forming two separate clusters, s4 and s5, but have an object hypothesis, h3, that is associated with both clusters, then according to rule (3), h3 will be assigned to cluster s4, with which it shares the largest number of skin-coloured pixels.

After associating all skin pixels with their respective object hypotheses, the parameters for the ellipse models are re-estimated based on the statistics of the skin-coloured pixels assigned to them.

4.2.3.3 Prediction of object hypotheses

The way in which multiple objects are tracked over a period of time is the fundamental property of the data-association method. Under the assumption that the immediate future can be predicted using the immediate past, the location of the hands at time t can be predicted using a simple linear function based on their locations at time t − 1 and t − 2. Therefore, by using the centre points of the ellipse models in the previous two frames and keeping all other parameters the same, the location of the hands can be predicted in the current frame. This can be formally expressed as [9]:

$\hat{h}_i = h_i(\hat{c}_{x_i}, \hat{c}_{y_i}, \alpha_i, \beta_i, \theta_i)$   (4.45)

where

$(\hat{c}_{x_i}(t), \hat{c}_{y_i}(t)) = 2C_i(t - 1) - C_i(t - 2)$   (4.46)

and

$C_i(t) = (c_{x_i}(t), c_{y_i}(t))$   (4.47)

However, this function assumes that by keeping all other ellipse model parameters the same, the predicted object hypothesis will maintain the same direction and magnitude of translation on the image plane. When associating the skin pixels with the predicted object hypothesis, these parameters are updated. The updated parameters can be then used as a good indication of the angle and size of each hand in the current frame.
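The constant-velocity prediction of equations 4.45 to 4.47 reduces to one line of arithmetic per hand; a small illustrative helper:

def predict_centre(c_prev, c_prev2):
    """Predict the ellipse centre at time t from the centres at t-1 and t-2 (eq. 4.46)."""
    return (2 * c_prev[0] - c_prev2[0], 2 * c_prev[1] - c_prev2[1])

# e.g. predict_centre((120, 80), (110, 78)) -> (130, 82): same translation as the last frame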

The components discussed in the tracking phase form the foundation of the hand-tracking framework. In Chapter 6 it will be determined how well these components perform in tracking the hands accurately.

4.3 Learning phase

The learning phase was implemented in the third DSR cycle (see Figure 4.11). The idea of selecting special points relating to an object and performing a local analysis on these points could be advantageous, since these points contain much more contextual information about an object than the rest of the image. These points are referred to as feature points and are synonymous with the terms ‘interest points’ or ‘keypoints’. Pixels or regions that have distinctive or discriminative properties are often selected as feature points. As long as a sufficient number of these points can be found in an image, it can be accurately localised and used to establish correspondences between images or image regions.

Figure 4.11: The third DSR cycle, consisting of the learning phase.

In order to match feature points, the features need to be uniquely detected and described. Feature-point processing can therefore be divided into three groups: feature detectors, feature descriptors and feature matching. In the following sections each of these groups will be discussed and the best methods will be identified.

4.3.1 Feature detectors

Over the past two decades several feature detectors have been proposed with the aim of improving accuracy and efficiency [15]. Feature detectors are used to find feature points in images that can later be extracted by a feature descriptor. In this section only the SIFT detector, which is the detector used in this research, is discussed. Other state-of-the-art feature detectors are described in Appendix A.

4.3.1.1 Scale-invariant feature transform

The SIFT detector was proposed by Lowe [99] as a method of locating scale, rotation and translation invariant features. These features are considered to be highly distinctive, thus allowing a feature to match correctly with a high probability against a large number of features. The algorithm takes a cascade filtering approach thereby applying the expensive operations only to the locations that pass an initial test. It operates by following a three-step strategy to determine the set of feature points:

1. Scale-space extrema detection: This step searches over all image scales and locations to identify potential feature points that are invariant to scale and orientation by determining the maxima or minima of a difference-of-Gaussian function, D. To detect the local maxima or minima, each candidate point is compared with its eight neighbours at the same scale and with its nine neighbours one scale up and one scale down. If the value of D is the maximum or minimum of all the compared points, then this point is an extremum and is considered to be a feature point.

2. Feature-point localisation: In this step, feature points are selected based on measures of their stability, where unstable features are eliminated from the final list of feature points by locating those that are poorly localised on an edge or have a low contrast. This is achieved by determining the Laplacian value for each feature point.

3. Orientation assignment: For each feature-point location, one or more orientations are assigned based on the local image gradient directions. This ensures invariance to orientation, scale and location transformations of image data in all future operations.

4.3.2 Feature descriptors

After feature points have been located, the features are described by means of feature vectors that are referred to as feature descriptors. To complement the detectors, many descriptors have been proposed. In the following section, only the SIFT descriptor will be discussed. Other state-of-the-art feature descriptors are described in Appendix A.

4.3.2.1 Scale-invariant feature transform descriptor

Using the selected scale of the SIFT detector, local image gradients are computed in the region around each feature point. The gradients’ magnitude and orientation are then quantised into eight orientation histogram bins summarising the contents over 4x4 subregions. This results in a feature vector consisting of 128 elements. This is followed by normalising the vector, resulting in illumination invariance that allows for significant levels of local shape distortion [99].

Similar to the feature detectors, feature descriptors have their relative strengths and weaknesses. In Section 4.3.4 a comparison is made to determine a feature descriptor that is suitable.
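Assuming an OpenCV build in which SIFT is available, detection and description reduce to a single call; the sketch below is illustrative of the 128-element descriptors discussed above:

import cv2

sift = cv2.SIFT_create()

def sift_features(gray):
    """Detect SIFT keypoints and compute their 128-element descriptors."""
    keypoints, descriptors = sift.detectAndCompute(gray, None)
    return keypoints, descriptors   # descriptors: (num_keypoints, 128) float32 array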

4.3.3 Image matching

When matching distinctive characteristics between images, descriptors are very useful.

For example, suppose two images are to be matched: feature points are first detected and then extracted into feature vectors using a descriptor. Given two feature vectors, a match can be found by measuring the similarity between them. The metric used to define the level of similarity is often the Euclidean distance or the Hamming distance, depending on the feature descriptor used. Each feature vector in the first image is compared to all feature vectors in the second image. The feature vector pair that obtains the best similarity score, the one with the shortest distance, is selected as the best match.

Two main algorithms, namely the brute-force matcher and the FLANN-based matcher, can be used for feature matching. In the following section, the FLANN-based matcher will be discussed. The brute-force matcher is given in Appendix A.

4.3.3.1 FLANN-based matcher

The FLANN matcher [112] provides an efficient search method that is used to find correspondences between feature vector sets. The library was developed with the aim of minimising the predicted search cost while maintaining a high accuracy. This is achieved by using a cross-validation approach to automatically determine the best nearest neighbour algorithm and optimal parameter values for any given dataset.

Therefore, to optimise the performance, the FLANN-based method applies a training stage in which index trees are constructed for the query sets. Furthermore, experiments by Muja and Lowe [112] have shown that by using FLANN, a performance increase of up to 1 000 times over a linear search can be achieved, while retaining the ability to identify 95% of the correct nearest neighbours.
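Matching SIFT descriptors with a FLANN KD-tree index and filtering the results with Lowe's ratio test can be sketched as follows; the tree count, check count and the 0.7 ratio are common illustrative defaults, not values prescribed by this research:

import cv2

def flann_match(desc1, desc2, ratio=0.7):
    """Return putative SIFT correspondences that pass the nearest-neighbour ratio test."""
    index_params = dict(algorithm=1, trees=5)      # 1 = FLANN_INDEX_KDTREE
    search_params = dict(checks=50)
    flann = cv2.FlannBasedMatcher(index_params, search_params)
    matches = flann.knnMatch(desc1, desc2, k=2)    # two nearest neighbours per descriptor
    return [m for m, n in matches if m.distance < ratio * n.distance]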

4.3.4 Comparison of feature detectors and descriptors

Several studies [15, 58, 108] have evaluated different feature detectors and descriptors with the aim of identifying the best one. The classic Harris detector has been used for its distinctive corner detection abilities; however, it is limited by its lack of scale invariance and computational complexities [16]. As an extension, the good-features-to-track detector locates corners more uniformly distributed across the image. Another extension is the FAST detector, which was designed to be more efficient; however, it does not calculate the size or orientation of the feature points [14]. The feature-point orientation was later addressed by the ORB detector. At the same time the BRISK detector aimed to improve the scale parameter of FAST by computing the score over several scales in a scale-space representation; however, it was shown to be sensitive to light-intensity changes [15].

Similarly, STAR is a multi-scale detector that preserves rotational invariance and improves efficiency with the use of integral images [137]. As a region detector, MSER does not extract feature points, but determines distinctive regions in images [138]; however, it generates few detections since it requires well-defined homogeneous regions.

The SIFT detector [99] was introduced to locate features that are invariant to scale, rotation and translation. The advantage of SIFT is that it has the ability to extract highly distinctive features, thereby enabling one to match features with a high probability [88]. To improve the computational efficiency of SIFT, the SURF detector was introduced [17].

To complement these detectors, a number of descriptors have been proposed. The SIFT descriptor extracts the feature points while describing the scale, rotational and translational properties of the features in a unique feature vector. The SURF descriptor extracts similar features to SIFT, but aims at being computationally faster; however, it is limited by its sensitivity to rotation [15]. BRIEF is a stand-alone descriptor that takes a different approach to feature vectors by describing features in the form of binary strings. Due to the nature of its design, it is not rotation or scale invariant [137]. It has, however, inspired more recent binary-based descriptors such as ORB and BRISK. The ORB descriptor extends BRIEF by adding rotational invariance, but is limited by a fixed scale. The BRISK descriptor, however, adds both rotation and scale invariance. FREAK is another stand-alone descriptor that improves the sampling pattern and pair selection of BRISK. The aim of the FREAK descriptor is to shorten the computation time, but this results in a lower accuracy [137].

These studies [15, 58, 108] commonly agree that none of the feature detectors or descriptors outperforms the others. They suggest that the best approach would be to select one that is most suited to the requirements of a particular application. Many of the recent detectors and descriptors are aimed at either improving the computation speed or overcoming the limitations of a previous detector or descriptor. These improvements often come at the expense of a lower accuracy [15, 88, 137].

In this research application, a detector that will result in a larger number of potential feature points that are scale, rotation and translation invariant is required. Furthermore, a descriptor that uniquely identifies feature points to improve accuracy is also needed. Several studies [15, 16, 88] agree that SIFT is considered to be one of the best-performing algorithms; it generates the most robust feature points and uniquely describes them for feature matching. Therefore, in the learning phase, the SIFT feature detector and descriptor will be used to learn uniquely identified features to help recover from tracking failure. In Chapter 6 it will be determined whether these features can exploit contextual information and result in more accurate hand-tracking.

4.4 Summary

In the preceding sections the components that form part of the detection, tracking and learning phases were explored. The detection phase includes algorithms such as LBPs, random forests and SVMs. Different LBP variants were described and a comparison was made between employing global versus local LBP histograms. It was concluded that the choice of a suitable LBP variant is dependent on many factors such as the type of application, computational efficiency and robustness to illumination. Furthermore, random forests were described and their strengths and weaknesses were compared. Similarly, the SVM and different kernels were explained.

The tracking phase includes a unique skin-segmentation algorithm and extends the data association method by Argyros and Lourakis [9]. The skin-segmentation algorithm operates by detecting the face and using the skin-colour distribution around the nose to determine an individual's skin colour. To cluster the skin pixels together, a connected-components labelling algorithm was described. Furthermore, the algorithm that extends the data association method to allow for multiple skin-coloured object tracking in both constrained and unconstrained environments was explained.

The learning phase includes algorithms that detect, describe and match distinct features in images. Several studies [15, 58, 108] confirmed that none of the feature detectors and descriptors outperforms the rest in all cases. They therefore suggest that the best approach would be to select one that is most suited to the requirements of a particular application. In the current application SIFT is well suited since it generates the most robust feature points and uniquely describes them for feature matching.

The following chapter addresses the way in which the three phases are integrated and linked to form three prototypes.

Chapter 5

Hand-Tracking Framework

In this chapter the systematic design and procedures that were implemented to investigate the research questions will be explained: the way in which the different algorithms were linked and integrated in a particular phase and collectively used to form the final hand-tracking framework. The high-level conceptual view of this framework can be described by the three phases identified in the previous sections.

Following the DSR methodology, three hand-tracking frameworks, represented by three prototypes,1 were developed; namely, prototype T, prototype DT and prototype DTL. Fundamental to the framework is the tracking phase. It forms the basis of this research and is therefore engineered in the first design cycle by prototype T. This was followed by prototype DT in the second design cycle in which both the detection and tracking phase were integrated into the framework. The last design cycle culminated in prototype DTL — an integration of the detection, tracking and learning phases. In the subsequent sections each prototype is further explored.

5.1 Prototype T

Prototype T was developed with the aim of evaluating the tracking phase. This phase consists of an algorithm that continuously tracks both hands independently in constrained and unconstrained environments.

1 The prototypes were named according to the phase that was included in the framework, where the framework in prototype T consists of the tracking phase only and the framework in prototype DTL consists of all three phases.


In sign language, the face and hands form an integral part in the communication amongst those who are hard-of-hearing or Deaf. While communicating in sign language it is important that both the face and hands are visible throughout the conversation. As part of a translation system, it will be necessary to capture the upper body of an individual, face forward. The tracking phase therefore builds on this fact and begins by locating an individual’s face using the face-detection algorithm. An overview of the systematic design of the prototype is shown in Figure 5.1.

Face detection is implemented for two reasons: firstly, to determine if an individual is present and, secondly, to extract an individual’s skin colour. Using the method described in Section 4.2.1.3, the skin-colour distribution is extracted from the region around the nose, since it is less likely to be negatively affected by any non-skin areas. This colour distribution is back projected to provide a probability map on which a threshold is applied to produce a skin-segmented image. In unconstrained environments this may lead to several other areas in the background that may fall within the identified colour distribution and may incorrectly be identified as an individual’s body part, as illustrated in Figure 5.2. To avoid these areas and highlight the regions of interest such as the hands, a background subtraction method is applied, resulting in a segmentation as shown in Figure 5.4.

In constrained environments a simple background subtraction technique will meet the requirements of segmenting an individual from the background, as long as the background remains static. Everything else will be considered as foreground, as shown in Figure 5.3.

This technique, however, would not be suitable for unconstrained environments. In these environments the individual alone should be considered as foreground and everything else as background. This is a challenging task and is not achievable using any of the existing background subtraction techniques. Many of the recent background subtraction techniques2 were developed with the aim of training a background model and continuously updating the model to determine the foreground. These techniques form different variants or improvements of the adaptive non-parametric Gaussian mixture model used by Stauffer and Grimson [145]. One of these techniques that attracts attention and that was used in the development of prototype T was that of Zivkovic and van der Heijden [174].

2 For a list of recently implemented background subtraction algorithms, see BGSLibrary: An OpenCV C++ Background Subtraction Library [143].

Figure 5.1: An overview of the systematic design of prototype T.

(a) The original image. (b) The skin-segmented image.

Figure 5.2: An example of the skin segmentation output on an unconstrained environment.

(a) The original image. (b) The foreground image.

Figure 5.3: An example of the foreground segmented from a constrained environment with a static background.

Their background subtraction algorithm differs from that of Stauffer and Grimson [145] by enabling the selection of the number of Gaussian components and adapting these components per pixel.

By employing this background subtraction algorithm and updating the model every two frames, the hands can be successfully segmented as the foreground while labelling everything else as the background. In unconstrained environments any new objects entering the scene may also be considered as the foreground. To prevent these objects from appearing as foreground objects, the foreground image is combined with the skin-segmented image to result in a combined image that only contains foreground objects that are skin coloured, as shown in Figure 5.4. This combined image will hereafter be referred to as the foreground-skin image.
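To make the combination of the two masks concrete, the following is a minimal sketch in Python, using OpenCV's MOG2 background subtractor as a stand-in for the Zivkovic and van der Heijden [174] model described above; the function and variable names, as well as the parameter values, are illustrative and not taken from the thesis implementation.

```python
import cv2

# Adaptive Gaussian-mixture background subtractor (OpenCV's MOG2 implementation
# of Zivkovic and van der Heijden's method); parameter values are illustrative.
bg_subtractor = cv2.createBackgroundSubtractorMOG2(history=100, varThreshold=16,
                                                   detectShadows=False)

def foreground_skin_image(frame, skin_mask, frame_index):
    """Combine the foreground mask with the skin-segmented mask.

    `skin_mask` is assumed to be the binary image obtained by back-projecting
    the skin-colour distribution sampled around the nose."""
    # Update the background model only on every second frame, as described above;
    # a learning rate of 0 freezes the model, -1 lets OpenCV choose it.
    learning_rate = -1 if frame_index % 2 == 0 else 0
    fg_mask = bg_subtractor.apply(frame, learningRate=learning_rate)

    # Keep only pixels that are both foreground and skin coloured.
    _, fg_mask = cv2.threshold(fg_mask, 127, 255, cv2.THRESH_BINARY)
    return cv2.bitwise_and(fg_mask, skin_mask)
```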

The pixels in the foreground-skin image therefore represent the pixels that are skin coloured and that are labelled to be in the foreground. These pixels are formed into clusters using the connected-components labelling algorithm. Pixels considered to be noise are removed using a threshold applied by the connected-components labelling. Moreover, clusters that are smaller than a quarter of the face (the size of a fist) are also removed. These clusters are assumed to not belong to a hand, since the smallest hand form is a fist.

(a) The original image. (b) The foreground-skin image.

Figure 5.4: An example of the skin segmentation output combined with the background subtraction output on an unconstrained environment.

Under the assumption that an individual would be facing the camera and given the list of clusters, it is assumed that a cluster in the bottom left and bottom right of the image would represent the right and left hands, respectively, as shown in Figure 5.5.

Figure 5.5: Association of the right and left hand in the initial frames.
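A possible realisation of this clustering and initial left/right assignment is sketched below, using OpenCV's connected-components labelling. The quarter-of-the-face threshold follows the description above, while the scoring used to pick the bottom-left and bottom-right clusters is one plausible choice rather than the exact rule used in the thesis.

```python
import cv2

def initial_hand_clusters(fg_skin_mask, face_area):
    """Cluster the foreground-skin pixels and pick the initial hand clusters."""
    num, labels, stats, centroids = cv2.connectedComponentsWithStats(fg_skin_mask,
                                                                     connectivity=8)
    clusters = []
    for i in range(1, num):                       # label 0 is the background
        if stats[i, cv2.CC_STAT_AREA] < face_area / 4:
            continue                              # smaller than a fist: assumed not a hand
        clusters.append((i, centroids[i]))        # (label, (x, y) centroid)

    if not clusters:
        return None, None

    # Assuming the signer faces the camera, the bottom-left cluster is taken as
    # the right hand and the bottom-right cluster as the left hand.
    right_hand = min(clusters, key=lambda c: c[1][0] - c[1][1])
    left_hand = max(clusters, key=lambda c: c[1][0] + c[1][1])
    return right_hand, left_hand
```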

If an object hypothesis has not been associated with the clusters, then an object hypothesis is initialised and an ellipse model is assigned to the respective clusters, based on the side of the body where the cluster lies. Here, only two object hypotheses are required, a left and right hypothesis for each hand. According to the rules in Section 4.2.3.2, all skin pixels are associated with either object hypothesis. If an object hypothesis has already been initialised and an ellipse model has been assigned to a respective hand in at least two previous frames, at time t − 1 and t − 2, then the location of that hand can be predicted with an ellipse model in the current frame, at time t, using equation 4.45. Skin-coloured pixels in the current frame are then associated with the predicted-object hypothesis using the distance metric in equation 4.40. By applying the connected-components labelling to the associated pixels in the current frame, the parameters for the current ellipse model can be updated.

When associating skin-coloured pixels in the current frame with the predicted-object hypothesis, if the number of pixels located inside the ellipse model is less than half of the ellipse area, where area = π · M1 · M2, then it is assumed that the hand is moving towards a stationary position and a flag is set to update the foreground image. This is motivated by the fact that when the hands do not move, the updated-background model labels the hand pixels as part of the background and the foreground image yields either very little information or no information at all, as shown in Figure 5.6. To address this limitation, the hand area in the foreground image is updated by selecting pixels from the same hand area at t − 1, on the current skin-segmented image, as illustrated in Figure 5.6.

Figure 5.6: An example of the updating process of the foreground-skin image.
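The stationary-hand rule and the refresh of the foreground image can be summarised as follows. This is a sketch under the assumption that the associated pixel count and the ellipse semi-axes M1 and M2 of the predicted hypothesis are available; the helper names are hypothetical.

```python
import numpy as np

def hand_becoming_stationary(pixels_in_ellipse, m1, m2):
    """Flag an update of the foreground image when fewer than half of the
    predicted ellipse (area = pi * M1 * M2) is covered by associated pixels."""
    return pixels_in_ellipse < 0.5 * np.pi * m1 * m2

def refresh_foreground_region(fg_mask, skin_mask, prev_hand_mask):
    """Copy the skin pixels at the previous hand position into the foreground
    image, so a stationary hand is not absorbed into the background model."""
    refreshed = fg_mask.copy()
    refreshed[prev_hand_mask > 0] = skin_mask[prev_hand_mask > 0]
    return refreshed
```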

An approximation of the hand area is extracted by determining the convex hull from the set of points on the border of a given cluster. The convex hull can be visualised as the shape formed by stretching an elastic band around the cluster, as depicted in Figure 5.7.

This operation is based on the assumption that if the hand is stationary, then the position at which the hand is at time t−1 should contain skin pixels in the same position at time t. After the foreground image has been updated, it is combined with the skin-segmented image to form the updated foreground-skin image. This is followed by recomputing the list of clusters and associating skin pixels with the predicted-object hypothesis.

Figure 5.7: An example of the convex hull of a hand.

This process allows the hands to be tracked in both constrained and unconstrained environments.

5.2 Prototype DT

The aim of prototype DT is to add a hand-detection algorithm to the hand-tracking framework. This algorithm avoids the assumption made in prototype T, where clusters being tracked are assumed to be a hand. Instead, each cluster in prototype DT is queried to be a hand or not before it is tracked.

In the initial frames, instead of assuming that clusters on the side of the body are hands, the hand-detection algorithm is applied. If the cluster is determined to be a hand, an ellipse model is assigned to it. After associating skin pixels to a predicted-object hypothesis in every frame, new clusters are formed using the connected-components labelling. This is followed by determining if the cluster is a hand. These additional steps are integrated into the framework; their integration is shown in Figure 5.8.

To detect whether a cluster is a hand, features corresponding to the cluster area should be extracted and classified using a trained model. The success of LBPs used in encoding various object appearances such as people, food types, cars and faces is further extended to hands. This is motivated by the discriminative power of LBPs and their ability to uniquely represent image features. LBP features in the form of local histograms, global histograms and LBP images have previously been used in other applications [139, 169]. Their applicability to hand detection in these forms is therefore investigated in Section 6.6.2.1.

Figure 5.8: An overview of the systematic design of prototype DT.

Since LBPs are based on pixel intensity values, the default RGB image is converted to greyscale. This is followed by applying a Gaussian blur to reduce image noise and ensure that spurious high-frequency information does not appear when resizing the image. In this research three LBP variants were used, namely, OLBP, ELBP and ULBP. In the OLBP a neighbourhood of eight and a radius of one were used. In the ELBP a neighbourhood of eight and a radius of two were used. This (8,2) neighbourhood was shown to produce distinct features [6]. In the ULBP a neighbourhood of eight and a radius of two were also used, with the additional step of using uniform LBP values. The uniform LBP values were obtained by employing a look-up table: given an LBP decimal value, the corresponding value was looked up from one of the 59 uniform values, as described in Section 4.1.1.1. For each image, the LBPs were computed using the respective methods described in Section 4.1.1, resulting in LBP images as illustrated in Figure 5.9. To represent the LBPs visually, a normalisation method with a range of 0 to 255 was used.

(a) The original LBP image. (b) The extended LBP image. (c) The uniform LBP image.

Figure 5.9: A visual representation of the outputs of the different LBP variants, described in Section 4.1.1.
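As an illustration of the three variants, the sketch below uses scikit-image's local_binary_pattern as a stand-in for the OLBP (8, 1), ELBP (8, 2) and ULBP (8, 2, uniform) operators described above; the preprocessing mirrors the greyscale conversion, Gaussian blur and resizing mentioned in the text, but the exact blur kernel size is an assumption.

```python
import cv2
from skimage.feature import local_binary_pattern

def lbp_images(bgr_image):
    """Compute the three LBP variants used for hand detection."""
    grey = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2GRAY)
    grey = cv2.GaussianBlur(grey, (5, 5), 0)      # suppress noise before resizing
    grey = cv2.resize(grey, (40, 30))             # fixed image size used for the hand dataset

    olbp = local_binary_pattern(grey, P=8, R=1, method='default')      # original LBP (8, 1)
    elbp = local_binary_pattern(grey, P=8, R=2, method='default')      # extended LBP (8, 2)
    ulbp = local_binary_pattern(grey, P=8, R=2, method='nri_uniform')  # uniform LBP, 59 labels
    return olbp, elbp, ulbp
```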

The images contained in the hand dataset (discussed in Section 6.2.3) consist of various image sizes. In order to maintain consistency among the images and reduce the computational load, each image was resized to 40x30 pixels. Given the set of positive and negative hand samples, a feature set with a size equal to the total number of positive and negative hand samples was used.

From the LBP images, both spatially enhanced and global LBP histograms were computed. In spatially enhanced LBP histograms the shape information and location of LBP occurrences within an image are retained. This was achieved by dividing each LBP image into four equal regions. For each region a histogram with a size of 256 and a range of 0 to 255 was used. This represents the local information for each region. The histograms for each region in an image were then concatenated to form a single spatially enhanced feature histogram, as shown in Figure 4.5.

In global LBP histograms the LBP occurrences are encoded without any indication of their location. In this method, each image was not divided, but instead a histogram was computed over the entire image, as shown in Figure 5.10.

Figure 5.10: An example of the global representation of a LBP image in the form of a single histogram.

After completing the spatially enhanced and global histograms, the feature vectors for each image were created. These feature vectors consisted of 1024 and 256 values for the spatially enhanced and global histograms, respectively. In addition, each feature vector was assigned a label of one or zero if the image was a positive or negative hand sample, respectively.
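The two histogram representations can be computed as in the following sketch, which assumes a 40x30 LBP image and yields the 256- and 1024-dimensional feature vectors referred to above; it is illustrative rather than the exact implementation.

```python
import numpy as np

def global_histogram(lbp_image):
    """A single 256-bin histogram over the whole LBP image (location discarded)."""
    hist, _ = np.histogram(lbp_image, bins=256, range=(0, 256))
    return hist.astype(np.float32)                    # 256 values

def spatially_enhanced_histogram(lbp_image):
    """Concatenated 256-bin histograms of four equal regions (4 x 256 = 1024 values)."""
    h, w = lbp_image.shape
    regions = [lbp_image[:h // 2, :w // 2], lbp_image[:h // 2, w // 2:],
               lbp_image[h // 2:, :w // 2], lbp_image[h // 2:, w // 2:]]
    return np.concatenate([global_histogram(r) for r in regions])
```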

When the LBP images were used without histograms, the LBP values were directly used to compile the feature vectors, where the feature vector size was 1200, as depicted in Figure 5.11. After computing the feature vector for each image, a label of one or zero was assigned if the image was a positive or negative hand sample, respectively.

Figure 5.11: An example of a feature vector directly formed from the labelled pixel values of a LBP image.

The feature vectors described relate to the training set. In the testing set, however, each feature vector was assigned a label of zero by default, where the label was subject to change based on the outcome of a trained model.

In order to generate a trained model, the feature set was trained using random forests and SVMs; the two classifiers are compared in Section 6.6.2.1.3.

Random forests: For random forests, the feature set was trained using the method described in Section 4.1.2. According to Breiman [21], only two parameters need to be evaluated when constructing the random forest, namely, the number of random split variables and the number of trees [12]. To determine the optimal parameters for training the random forest, different parameters are evaluated in Section 6.6.2.1.1.
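A minimal training sketch using OpenCV's random forest implementation is given below, with the default parameters mentioned above (100 trees, depth 25, 4 random split variables); the data layout is an assumption, and the optimal parameter values are those determined experimentally in Section 6.6.2.1.1.

```python
import cv2
import numpy as np

def train_random_forest(features, labels, n_trees=100, max_depth=25, n_split_vars=4):
    """Train a random forest on LBP feature vectors.

    `features` is an (N, D) float32 array and `labels` an (N,) int32 array of
    ones (hand) and zeros (non-hand)."""
    rf = cv2.ml.RTrees_create()
    rf.setMaxDepth(max_depth)
    rf.setActiveVarCount(n_split_vars)                    # random split variables per node
    rf.setTermCriteria((cv2.TERM_CRITERIA_MAX_ITER, n_trees, 0.0))
    rf.train(features.astype(np.float32), cv2.ml.ROW_SAMPLE,
             labels.astype(np.int32))
    return rf

# Predicting a single feature vector:
# _, label = rf.predict(feature_vector.reshape(1, -1).astype(np.float32))
```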

Support vector machines: For SVMs, before the feature set can be trained or tested, the feature values need to be scaled. The advantage of scaling the features is to prevent values with a greater numeric range dominating those in a smaller numeric range [30]. Thus, for both the training and testing feature set, the values were scaled to the range [−1, +1]. To effectively train an SVM model and accurately predict the test data, an appropriate kernel and kernel parameters, C and γ, are required for the given problem. These parameters can be found using an exhaustive approach by trying each C and γ combination where each parameter is an exponentially growing sequence [75]. An alternative approach is to use a cross-validation strategy that divides the training set into n equal-sized subsets, where the SVM is trained on the n − 1 subsets and tested on the remaining subset for each parameter combination [75]. Therefore, the combination with the best cross-validation accuracy was chosen as the optimal parameters for the given problem. These parameters are determined in Section 6.6.2.1.2.
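The scaling and grid-search procedure can be sketched as follows, here using scikit-learn as a stand-in for the SVM implementation; the exponent ranges for C and γ are illustrative values for an exponentially growing sequence and are not those reported in the thesis.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

def train_svm_with_grid_search(features, labels, n_folds=5):
    """Scale features to [-1, +1] and cross-validate C and gamma for an RBF SVM."""
    scaler = MinMaxScaler(feature_range=(-1, 1)).fit(features)
    scaled = scaler.transform(features)

    # Exponentially growing parameter sequences, as described above [75].
    param_grid = {'C': 2.0 ** np.arange(-5, 16, 2),
                  'gamma': 2.0 ** np.arange(-15, 4, 2)}
    search = GridSearchCV(SVC(kernel='rbf'), param_grid, cv=n_folds)
    search.fit(scaled, labels)
    return search.best_estimator_, scaler    # best C/gamma combination and the fitted scaler
```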

5.3 Prototype DTL

When tracking hands in unconstrained environments, objects in the background may negatively affect the tracking process, which may lead to tracking failure. For example, objects that are near to the hand and possess the same skin-colour distribution as the individual may be incorrectly tracked. The aim of this prototype is to recover from such failures by adding a learning phase algorithm to the hand-tracking framework. The prototype is based on the concept of context-based tracking, where objects surrounding a tracked object may possess as much information about the tracked object as the object itself. This information is retrieved from the surrounding object features, where spatial and temporal context information is exploited.

These features, however, are only extracted if the cluster being tracked is indeed a hand.

The algorithm is therefore integrated after the hand-detection algorithm, as illustrated in Figure 5.12.

Thus, for every image, if a hand has been identified, distinctive features are extracted using SIFT. These features correspond to the spatial context information. Features extracted using SIFT are considered to be highly distinctive, thus allowing a feature to be matched correctly with a high probability against a large number of previously stored features.

In the initial frames the algorithm operates by first detecting a hand. This is to ensure that incorrect features will not be stored. If only a single hand is detected, then feature points on and around the hand are generated and their descriptors are extracted. These feature points and descriptors are extracted using the methods described in Section 4.3 within a region around a skin-coloured cluster. A boundary region of the cluster is determined by finding the extreme points on that cluster. To allow for more features to be extracted that may be significant in linking their existence to a hand, the region around the hand is enlarged by one-fourth of the bounding region, as shown in Figure 5.13. In addition, features from objects such as wristwatches, bracelets, rings and various other objects assist in distinguishing one hand from the other.
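The extraction of these contextual features around a cluster could look as follows; the sketch uses OpenCV's SIFT implementation, enlarges the bounding region by one-fourth on each side (one reading of the description above), and keeps the keypoint coordinates relative to the region of interest.

```python
import cv2
import numpy as np

sift = cv2.SIFT_create()

def hand_context_features(grey_frame, cluster_points):
    """Extract SIFT keypoints and descriptors in an enlarged region around a cluster.

    `cluster_points` is an (N, 2) array of the cluster's pixel coordinates."""
    x, y, w, h = cv2.boundingRect(cluster_points.astype(np.int32))

    # Enlarge the bounding region to capture contextual objects (e.g. a wristwatch).
    pad_x, pad_y = w // 4, h // 4
    x0, y0 = max(0, x - pad_x), max(0, y - pad_y)
    x1 = min(grey_frame.shape[1], x + w + pad_x)
    y1 = min(grey_frame.shape[0], y + h + pad_y)

    roi = grey_frame[y0:y1, x0:x1]
    keypoints, descriptors = sift.detectAndCompute(roi, None)
    return keypoints, descriptors        # keypoint coordinates are relative to (x0, y0)
```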

Depending on whether the detected hand is the right or the left hand, the generated feature points and descriptors are stored in one of two separate databases: one for the right hand and one for the left hand, as illustrated in Figure 5.14.

Separate databases are used since the features in the right- and left-hand databases are used not only to recover from tracking failure, but also to help distinguish between the right and left hand by matching the features separately.

In the subsequent frames following the initialisation step, pixels are associated based on a predicted-object hypothesis position for either or both the right and left hand in the current frame. This allows the system to identify pixels that are closer to the predicted-object hypothesis position. These pixels are formed into clusters using connected-components labelling. This results in a list of potential clusters that can be associated with a hand. When both hands are near to each other, then clusters near to both hands can be associated with each other.

Figure 5.12: An overview of the systematic design of prototype DTL.

Figure 5.13: The bounding region of the hand used to extract contextual feature information.

Figure 5.14: The features for each hand are stored in two separate databases.

When more than one cluster has been identified for a given hand, then features should be extracted for each cluster and stored in a query list. For example, if two clusters are associated with the right hand and only one cluster with the left hand, then features should only be stored in the query list for those features extracted from the two clusters associated with the right hand.

The features in the query list are then compared with the respective database, in this example, the database for the right hand. These features are compared by finding a match between descriptors using the FLANN-based matcher, described in Section 4.3.3.1. For each matched feature the distance between the feature in the query list and the database for the right or left hand is obtained. From the set of matched features, only the best matches are selected using the minimum distance from the set of distances for each feature point, where the best matches are those features that have a distance less than twice the minimum distance.
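A sketch of this matching and filtering step is given below, using OpenCV's FLANN-based matcher with a KD-tree index (a common choice for SIFT descriptors); the index and search parameters are illustrative.

```python
import cv2

# KD-tree index for SIFT descriptors; parameter values are illustrative.
flann = cv2.FlannBasedMatcher({'algorithm': 1, 'trees': 5}, {'checks': 50})

def best_matches(query_descriptors, database_descriptors):
    """Match query features against a hand database and keep only the best matches,
    i.e. those within twice the minimum matching distance."""
    matches = flann.match(query_descriptors, database_descriptors)
    if not matches:
        return []
    min_dist = min(m.distance for m in matches)
    return [m for m in matches if m.distance < 2 * min_dist]
```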

Similar to Grabner et al. [64], a voting strategy was adopted. For each feature that has been selected as a best matched feature, the cluster to which the feature belongs will be given a vote based on two rules:

1. If the feature shares the same location with a foreground pixel, then the cluster to which the feature belongs will be given two votes.

2. If the feature shares the same location with a background pixel, then the cluster to which the feature belongs will be given a single vote.

This voting strategy exploits the temporal context information and is based on the fact that objects or features that move with a hand support the existence of that hand more than objects or features that are stationary. The total number of votes for each cluster in the query list is then obtained by adding the number of votes given to each cluster for each of the best-matched features. Therefore, given the total number of votes for each cluster, the cluster with the most votes is assigned as the respective hand. Referring back to the example of the two clusters, A and B, associated with the right hand, if cluster A and cluster B obtained a total vote of 23 and 47, respectively, then cluster B will be assigned as the right hand. The features belonging to cluster B will then be added to the database for the right hand. Furthermore, referring to the same example, if only one cluster, C, is associated with the left hand, then no comparison is performed and cluster C will be assigned as the left hand. This process is shown in Figure 5.15.

Figure 5.15: The voting strategy used in the learning phase, where the cluster with the highest number of votes is selected.
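The vote tally itself reduces to a few lines. The sketch below assumes the best-matched features are available as (cluster, location) pairs together with the current foreground mask; the data layout is hypothetical.

```python
def vote_for_hand(matched_features, foreground_mask):
    """Tally votes per candidate cluster: two votes if a matched feature lies on a
    foreground pixel, one vote if it lies on a background pixel."""
    votes = {}
    for cluster_id, (x, y) in matched_features:
        weight = 2 if foreground_mask[int(y), int(x)] > 0 else 1
        votes[cluster_id] = votes.get(cluster_id, 0) + weight
    # The cluster with the most votes is assigned as the respective hand.
    return max(votes, key=votes.get) if votes else None
```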

In the above example, both cluster A and B were assumed to have been predicted by the hand-detection algorithm as non-hands. The same procedure applies if both cluster A and B were predicted as hands. However, if cluster B was predicted as a hand and cluster A as a non-hand, then the voting strategy will not be applied and cluster B will be assigned as the right hand, with features belonging to cluster B added to the database. The same applies if cluster A has been predicted as a hand and cluster B as a non-hand.

This algorithm ensures that if an object is incorrectly tracked as a hand, then it would automatically recover the tracking of the ‘lost’ hand in the subsequent frames. It furthermore ensures that the right and left hands will be distinguishable from each other.

5.4 Summary

The chapter addressed the systematic design and procedures in which the hand-tracking frameworks were implemented and used. It proposed several new algorithms: (1) articulated hand detection using local binary patterns with either random forests or SVMs; (2) pixel-level data association for independent hand-tracking in unconstrained environments; and (3) exploring contextual features towards assisting hand-tracking. Moreover, it described the way in which the different algorithms were linked and integrated for each phase and collectively used to develop three prototypes: prototypes T, DT and DTL. Each prototype followed a new design cycle and illustrated the way in which an additional phase was added to the framework. In the following chapter the datasets used are described, the prototypes are evaluated and the results are discussed.

Chapter 6

Experimental Results and Analysis

6.1 Introduction

In this chapter the performance of the three prototypes (T, DT and DTL) will be evaluated. This involves comparing the three prototypes using a dataset of SASL videos. To perform these evaluations an evaluation protocol was designed entailing a pre-defined procedure for the implementation and evaluation of the proposed experiments. This evaluation protocol defines four main elements: the experimental setup, dataset, ground-truth criteria and metrics of accuracy.

The accepted practice for evaluating tracking algorithms is either subjective or objective [104]. The preferred analysis is objective, since it allows the prototypes to be compared and allows the experiments to be repeated. Furthermore, it allows one to find the best parameters for the hand-detection algorithm used in prototypes DT and DTL.

Before discussing the evaluation of the three prototypes, it is important to explain sign language data, not only to understand how it is linguistically used, but also how it influenced the creation of the dataset. Furthermore, the criteria used to provide the ground-truth data will be explained.

All experiments evaluated are aimed at answering the research questions and determining the suitability of the hand-tracking system as a component of a larger SASL translation system (see Figure 6.1). The SASL translation system consists of several components such as facial expressions, hand trajectory, hand location and hand shapes, which are important to recognise SASL. Hand-tracking provides two of these components, namely hand trajectory and hand location. Fundamental to the recognition of sign language is the effectiveness and accuracy of hand-tracking. This chapter therefore analyses the effectiveness and accuracy of hand-tracking toward SASL translation.

Figure 6.1: The South African sign language translation system.

In the following sections the elements of the evaluation protocol, which consist of the datasets, experimental setup, ground-truth criteria and metrics of accuracy, will be discussed in more detail, and the results and analysis of the experiments will be provided.

6.2 Dataset generation

In order to evaluate the prototypes, an appropriate dataset had to be selected with enough variation to cover the application of interest [104]. Publicly available datasets, especially large ones, are limited, due to factors such as the cost of computational resources of recording and the cost of their storage. Privacy and ethical concerns have further limited the recording and distribution of application-specific scenarios.

6.2.1 Sign language data

Sign languages differ in each country in the same way that spoken languages differ, and they have no resemblance to the spoken languages used in the country. Therefore SASL is unique to South Africa and a relationship cannot be drawn between SASL and the spoken languages used in South Africa. A large variation of signed gestures is used in SASL. Each gesture has a distinct meaning and is an important component of the conversation. The proposed hand-tracking framework should therefore deal with this variation and track the hands accurately to provide effective translation of the finer nuances of SASL. A standard set of signed gestures that covers all variations does not exist in the public sphere and thus had to be created.

In an effort to select signed gestures that cover the full variation of signs, three groups were identified from the Fulton School for the Deaf SASL Dictionary [73]. These groups consisted of signs that use a single hand; signs that use both hands without occlusion (i.e. the hands are always apart); and signs that use both hands with occlusion (i.e. they do sometimes obstruct each other). The groupings made provision for the fact that some signed gestures perform the same hand movement, but differ in terms of the hand shape.

6.2.2 SASL dataset

All signs in the vocabulary were grouped according to their hand movements and then allocated to one of three classes, as listed in Appendix C. From each class, ten signed gesture movements were selected, resulting in a total of 30 signs. These sign movements were carefully selected in order to evaluate the effectiveness of each prototype. All signed gestures were performed with an open hand, since hand shapes are beyond the scope of this research. To compile the dataset, 20 individuals were asked to perform these 30 signs. Before taking part in the research, each individual was given a consent form to complete, in order to adhere to the ethical standards set for constructing the dataset (see Appendix B). The individuals consisted of six females and fourteen males of different body types. This offers the opportunity to determine how well the prototypes performed in terms of these body types. In addition, the selected individuals represented a wide range of skin-colour tones in order to determine whether certain skin-colour tones affect the hand-tracking capabilities of each prototype.

For each sign, each individual began and ended in the neutral state, with the hands at the side of the body. This ensures consistency in the video dataset and the subsequent analysis. The signs were furthermore captured at different times of the day due to the lengthy duration of the video-capturing process.

The signs were captured in constrained and unconstrained environments in order to evaluate the effectiveness of each prototype in different environments. In total, the dataset consists of 1 200 sign videos, where each video has an average of 90 frames, giving a total of 108 015 frames.

The constrained environments include scenes that do not contain any background or foreground interference and where the background remains static. In this dataset a static white background was used where the individual was video recorded indoors. The unconstrained environments include scenes that allow for background and foreground interference, such as people walking, moving tree leaves, cars passing by and any object other than the individual performing sign language, and were recorded at different active outdoor locations. The differences between these environments are illustrated in Figure 6.2.


Figure 6.2: Comparison between the (a) constrained and (b) unconstrained environments.

6.2.3 Hand dataset

In addition to the tracking algorithm used in prototype T, in both prototypes DT and DTL a hand-detection algorithm was integrated into the framework to classify whether the object that was being tracked was a hand. To evaluate the effectiveness of this detection algorithm, a further dataset (consisting of hand and non-hand images) was constructed.

Datasets are often designed to be application specific. Therefore, to find an appropriate dataset in the public domain is no easy task. There are a few available datasets on hands; however, one that is appropriate for this application context did not exist. For example, a dataset that is publicly available is the Visual Geometry Group Hand Dataset [109]. The images in this dataset are, however, not suitable as the dataset contains images in unconstrained environments in which people and their hands are visible, as shown in Figure 6.3.


Figure 6.3: Sample images from the Visual Geometry Group Hand Dataset [109], which contain images in unconstrained environments in which people and their hands are visible.

To evaluate the hand-detection algorithm (in the DT and DTL prototypes), image regions that contain a hand, as depicted in Figure 6.4, were required. Two datasets that contain hand images in this form are the Cambridge Hand Gesture Dataset [85] and the NUS Hand Posture Dataset [87]. These datasets, however, have a plain background or have specific hand poses related to their application context, as shown in Figure 6.5, and thus were not suitable.


Figure 6.4: Sample hand images (a–h) and non-hand images (i–p) of the hand dataset used to evaluate the hand-detection algorithm.


Figure 6.5: Datasets that contain hand images with a plain background and have specific hand poses: (a) Cambridge Hand Gesture Dataset [85]; (b) NUS Hand Posture Dataset [87].

Since none of the datasets was appropriate, a hand dataset was compiled from the following sources: hand images were selected from the Visual Geometry Group Hand Dataset [109] (the images were cropped around the hand) and from sign language videos not used in the SASL dataset. The non-hand images were selected from images of non-hand objects in the Caltech 101 dataset [50] and from non-hand regions extracted from the Visual Geometry Group Hand Dataset [109]. The dataset is summarised in Table 6.1.

This dataset consists of a training and testing set, where the training set is divided into a positive and negative set. The positive and negative training sets consist of 8 000 hand images and 8 000 non-hand images, respectively. In total the training set consists of 16 000 images and the testing set consists of 4 706 images.

Table 6.1: The hand dataset setup

Datasets                                      Hand Images    Non-hand Images
SASL videos                                        X
Visual Geometry Group Hand Dataset [109]           X                 X
Caltech 101 [50]                                                     X

Hand Dataset           Training                  8 000             8 000
                       Testing                   2 222             2 484

6.3 Experimental setup

In the experimental setup, a single Logitech K190 webcam and a notebook with the following specifications were used:

• Operating system: Ubuntu Karmic Koala

• Memory: RAM 2GB

• Processor: Intel Dual Core

This setup was only used to capture the video data and provided portability to capture video in the different environments. The evaluations of the prototypes were performed on an Intel i7 desktop computer with the following specifications:

• Operating system: Ubuntu Karmic Koala

• Memory: RAM 3GB

• Processor: Intel Core i7 CPU @ 3.20GHz

6.4 Ground-truth annotation

In order to provide an objective comparison between prototypes, a standard ground- truth tracking trajectory was used. In general, generating ground-truth data manually is a time-consuming process that has limited the amount of available datasets on which objective comparisons can be made.

In video-tracking analysis, three types of ground-truth exist, based on the accuracy of tracking results [104]:

1. Pixel-level ground-truth – The projection of a target in a video frame is based on selecting a particular pixel at a particular time.

2. State-level ground-truth – The state of a target in a video frame is based on assigning a value to the target state or on a subset of the target state parameters at a particular time.

3. Life-span ground-truth – The target accuracy in a video is based on assigning a time interval to a target that corresponds to the life span of the target.

To provide ground-truth data for the SASL dataset, pixel-level ground-truth was employed. The frames in each video were annotated by selecting the centre of each hand, first the right hand and then the left. The location of each hand was stored in a text file in the same directory as the frames for each signed gesture. This location was represented by the x and y coordinates for the centre of each hand.

If the hand was partially or totally occluded, as illustrated in Figure 6.6, the location of the hand, in the occluded state, was selected as the ground-truth position.

Figure 6.6: A frame in which the right hand is occluded by the participant’s body.

To simplify the time-consuming process of generating ground-truth data, a common approach would be to record ground truth on a selected subset of frames in each image sequence that is of interest [104]. However, this approach was not implemented, since the SASL dataset provides ground-truth data for each frame in the dataset, which allowed for consistency and a comparison between signed gestures.

As with any ground-truth data, accuracy is especially important in the case of manual annotation. In order to verify its correctness, a blue circle for the right hand and a red circle for the left hand were drawn on each frame using the ground-truth coordinates, as depicted in Figure 6.7.

Figure 6.7: A frame in which the ground-truth coordinates have been verified where the blue circle represents the right hand and the red circle represents the left hand.

6.5 Performance evaluation metrics

In the following subsections the evaluation metrics are explained, where distance is used to determine the accuracy within a frame; tracking ratio is used to determine accuracy within a video; and the accuracy, precision and recall metrics are used to determine the correctness of hand classification in images.

6.5.1 Euclidean distance

For each prototype, the evaluation metrics were used to measure the accuracy of the tracking results. Different metrics evaluate different properties of a tracking system. In SASL it is important that the upper body of an individual can be seen, therefore each individual should be at a similar distance from the camera. For example, if an individual stands further away from the camera, the facial features that are very important in SASL would not be clearly visible. The tracking result does not depend on the target size, but rather on a point in the centre of the target. To objectively compare the tracking results between the prototypes, one solution would be to compute the distance between the centre point of the target and the ground-truth for the hands in each frame [104].

When evaluating the performance of multiple objects, an association should be drawn between the ground-truth states and the tracking result states [104]. Given that the ground-truth data in this study was structured by first stating the right-hand coordinates, followed by the left-hand coordinates, the tracking results can be computed as two single objects evaluated in that order.

Therefore, to measure the distance between the estimated target point, (x1, y1), and the ground-truth point, (x2, y2), of each hand, the Euclidean distance was used, defined as:

d(x, y) = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2} \qquad (6.1)

In order to determine when a hand has been correctly located, a radius of 20 pixels from the centre of a hand was used. When comparing the average accuracy of a single sign performed by 10 individuals when using a radius of 10, 15 and 20 pixels, a standard deviation of 2.91% was achieved (refer to Table D.1 in Appendix D). Furthermore, as illustrated in Figure 6.8, a radius of 20 pixels (orange circle) is within the boundaries of a hand and therefore considered to be a valid threshold. Hence, for a distance greater than 20 pixels, it is assumed that the location of a hand has been incorrectly estimated and a lost target is recorded. When the target is lost for a long period of time, tracking is considered to have failed.

Figure 6.8: An illustration of a radius of 10 pixels (red circle), 15 pixels (green circle) and 20 pixels (orange circle) from the centre of the hand.

6.5.2 Tracking ratio

After determining the average of the tracking accuracies, the results can be used to compare the performance between the prototypes, the signs, the environments and the participants. When comparing, one can consider the tracking life span and the significance of the results [104].

The tracking life span can be determined by calculating the tracking ratio, λ, which is defined as:

\lambda = \frac{F_S}{F_T} \qquad (6.2)

where F_S is the number of frames that were correctly tracked and F_T is the total number of frames in a video sequence. The tracking is therefore considered a success when λ is greater than 70%.
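Both metrics are straightforward to compute; the following sketch applies the 20-pixel threshold of equation 6.1 per frame and the tracking ratio of equation 6.2 per video, with hypothetical input structures.

```python
import numpy as np

def frame_correct(estimate, ground_truth, radius=20):
    """A hand is correctly located when the Euclidean distance (equation 6.1)
    between the estimated and ground-truth centres is at most 20 pixels."""
    return np.hypot(estimate[0] - ground_truth[0],
                    estimate[1] - ground_truth[1]) <= radius

def tracking_ratio(estimates, ground_truths):
    """Tracking ratio lambda = F_S / F_T (equation 6.2); success when lambda > 0.7."""
    correct = sum(frame_correct(e, g) for e, g in zip(estimates, ground_truths))
    return correct / len(ground_truths)
```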

If the tracking performance results are not consistent over the entire dataset, a comparison of the average tracking results may be insufficient to reach a meaningful conclusion as to which prototype would be preferred. In this case, statistical hypothesis testing to determine the statistical significance between the results will be used.

6.5.3 Accuracy, precision and recall

In order to measure the accuracy of the hand-detection algorithm, which can be viewed as a binary classification problem, the results can be assessed based on the number of true positives (TP), false positives (FP), false negatives (FN) and true negatives (TN).

The number of successful and unsuccessful hand classifications is represented by true positives and false negatives, respectively, whereas the number of successful and unsuccessful non-hand classifications is represented by true negatives and false positives, respectively. These values can be summarised in a confusion matrix as depicted in Table 6.2.

In order to compare the different models used, the accuracy, precision and recall can be determined based on the TP, FP, TN and FN.

Table 6.2: The confusion matrix of a binary classification problem

                          Predicted class
                          1          0
Actual class     1        TP         FN
                 0        FP         TN

The accuracy, A, is the ratio of the number of correct predictions with respect to the total number of predictions and is defined as:

A = \frac{TP + TN}{TP + FP + TN + FN} \qquad (6.3)

The precision, P, is the ratio of the number of correct hand classifications with respect to the number of predicted hand classifications.

P = \frac{TP}{TP + FP} \qquad (6.4)

The recall, R, is the ratio of the number of correct hand classifications with respect to the number of ground-truth hand images.

R = \frac{TP}{TP + FN} \qquad (6.5)
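For completeness, a small helper that evaluates equations 6.3 to 6.5 from the confusion-matrix counts is sketched below; the counts themselves are hypothetical inputs.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision and recall from the confusion matrix (equations 6.3-6.5)."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return accuracy, precision, recall
```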

6.6 Results and discussion

Each prototype was evaluated using the SASL dataset evaluation protocol. The aim of the analysis was to identify patterns and trends in the results and determine how they relate to factors such as the type of signs, people and environment that affect these prototypes. In the following subsections the evaluation of each prototype is analysed, where prototype T consisted of the tracking phase only; prototype DT consisted of both the tracking and detection phases; and prototype DTL consisted of the tracking, detection and learning phases.

6.6.1 Tracking evaluation

In the following section, prototype T is evaluated to determine how well the tracking phase performs as a hand-tracking framework.

6.6.1.1 Prototype T

In Figure 6.9 a graphical representation of the results of the tracking success rates of prototype T for each video of a signed gesture performed by the participants is shown (see Appendix D, Table D.2 for the detailed results).

Table 6.3: Summary of the average hand-tracking success rates of prototype T for each signed gesture in both environments.

Single hand                  Both hands without occlusion    Both hands with occlusion              Success rate
Sign 3                       Sign 11                         –                                      Result ≥ 80%
Sign 1, 4, 5, 6, 7, 8, 10    Sign 14, 15, 19                 –                                      70% ≤ Result < 80%
Sign 2, 9                    Sign 13, 16, 17, 18             Sign 21, 22, 23, 24, 25, 26, 27, 29    60% ≤ Result < 70%
–                            Sign 12, 20                     Sign 28, 30                            Result < 60%

The results in Figure 6.9 represent the average hand-tracking success rate of each signed gesture in both environments. In Table 6.3 the results indicate that 12 signs obtained an average success rate greater than 70% with sign 3 and sign 11 obtaining success rates greater than 80%. These results were the set of signs that involved a single hand and both hands without occlusion. In addition, 14 signs obtained an average success rate of between 60% and 70%, with six of those signs obtaining an average success rate very close to 70%. These results were the set of signs that involved both hands with and without occlusion, in which more than 50% of these signs involved occlusion. Therefore 60% of the signs obtained an average success rate close to or more than 70%. Furthermore, only four signs obtained an average success rate below 60%.

Since prototype T forms the foundation of the hand-tracking framework, the results presented were very encouraging. Overall, the comparison between the different signed gestures indicates that tracking two hands while distinguishing between them increases the complexity of hand-tracking. Furthermore, it also indicates that occlusion between the hands affects hand-tracking and is one of the main reasons for hand-tracking failure.

Figure 6.9: The average hand-tracking success rates of prototype T for each signed gesture in both environments.

In addition to these factors, the assumption in prototype T that the object being tracked is a hand could be a factor resulting in lower results. This assumption is investigated in the following section.

When comparing how well the hands were tracked in the different environments using prototype T (as seen in Figure 6.10), the results indicate that 17 signs obtained a higher result in constrained environments when compared to unconstrained environments, which suggests that the background in unconstrained environments negatively affects hand-tracking. The results also indicate that 70% of the signs using a single hand were higher in a constrained environment than an unconstrained environment in terms of the average success rate, while half of the signs using both hands were the same in either environment in terms of average tracking success rate (refer to Table D.2 in Appendix D). This signifies that the movement of two hands is more likely to be affected by moving objects in the background when compared to single hand movements.

In terms of accuracy when considering the signing of the participants, for more than half of the participants the prototype’s average tracking success rate was greater than 70% (see Figure D.1 in Appendix D) and for four of the participants the average success rate of prototype T was greater than 80%. Furthermore, four participants obtained an average success rate of between 60% and 70%, and five participants obtained an average tracking success rate of below 60%. Overall, the results show an equal variation between the different body types and skin-colour tones. The hand-tracking results can therefore not be attributed to the participants that performed the gestures, but rather to other factors such as the moving objects in the background, light inconsistencies in a frame and the complexities of gestures using both hands.

When comparing related systems (with respect to hand-tracking) to prototype T, the comparison is restricted to passive approaches that use 2D video input streams. It is also restricted to those that followed an appearance-based method. Many of these related systems have opted for a subjective evaluation; however, those that have pursued an objective evaluation have evaluated their systems using their personal datasets.

In comparison to the single hand-tracking systems (in simple environments) of Rautaray and Agrawal [128], Coogan et al. [36] and Fogelton [54], prototype T provides a more accurate approach. It achieves above 70% for 80% of the total number of signs using a single hand and is able to handle occlusion between the face and hands.

Figure 6.10: A comparison of the average hand-tracking success rates of prototype T for each signed gesture in constrained and unconstrained environments.

Even though Liu et al. [97] improved the flocks-of-features algorithm of Fogelton [54] in terms of distractors in the background, their hand-tracking is still negatively affected when moving the tracked hand in front of the face, whereas this scenario is successfully handled using prototype T.

When comparing prototype T to the particle-filter frameworks of Liu and Zhang [98], Spruyt et al. [144], Ongkittikul et al. [120] and Chen et al. [32], it shows a comparable performance in terms of tracking both hands. In addition, it is able to distinguish between the right and left hand when compared to Liu and Zhang [98] and Ongkittikul et al. [120]. Moreover, prototype T extended the algorithm of Argyros and Lourakis [10], but with several improvements: it is able to distinguish between the right and left hand, it does not regard the hand as a different object when it recovers from tracking failure, and it is able to track both hands in unconstrained environments.

6.6.2 Detection and tracking evaluation

In order to determine how well a hand-detection algorithm would perform in a hand-tracking framework, different hand-detection algorithms are compared, with the aim of identifying the most suitable one. This is followed by an evaluation of prototype DT to determine how well the integration of the detection and tracking phases performs as a hand-tracking framework.

6.6.2.1 Hand detection

In order to determine an appropriate hand-detection algorithm for prototypes DT and DTL, a comparison is made between using a random forest model and an SVM model. It investigates whether an algorithm that filters the image region to only extract features from the skin-coloured pixels on the hands would improve hand detection using LBPs. It also compares the original LBP, the extended LBP and the uniform LBP, computed with either LBP image features, spatially enhanced histogram features or global histogram features. The algorithms are evaluated on the hand dataset using the classification metrics: accuracy, precision and recall.

6.6.2.1.1 Random forests

Before evaluating the random forest approach, three parameters, namely, the maximum number of trees, tree depth and the number of random-split variables, need to be determined. To determine the best parameters, different combinations of these parameters were used. The experiment began using the default parameters used for random forests in OpenCV2: a maximum number of 100 trees, a tree depth of 25 and a random split variable of 4. These parameters were then incrementally tuned by multiplying the maximum number of trees and the number of random-split variables by the power of two, and by multiplying the tree depth by two for each increment. The parameters were evaluated on the original LBP using global histogram features (see Figure 6.11).

The different combinations illustrate that the optimal parameters are reached when the maximum number of trees, the tree depth and the number of variables are 100, 50 and 64, respectively. Increasing the parameters any further results in a decrease in accuracy.

Using these parameters, the different LBP algorithms were evaluated on the original images in the dataset (experiment 1) and compared to the same images that were filtered based on skin-coloured pixels only (experiment 2). Figure 6.12 presents the results of both experiments using the different algorithms. The results show that extracting LBPs using global and spatially enhanced histogram features on skin-coloured pixels only achieves a consistently higher result than extracting these features on all pixels in the image. On the other hand, extracting LBPs using LBP image features on skin-coloured pixels in the image achieves a lower result than extracting these features on all the pixels in an image, except when applying the extended LBP algorithm. It can therefore be determined that extracting LBP features from skin-coloured pixels in an image to detect hands provides a higher accuracy. The comparison between LBP histogram features and LBP image features also shows that a higher accuracy is achieved when using LBP histogram features.

When comparing extracting LBPs using global and spatially enhanced histogram features, the results show that a consistently higher result is obtained using global histogram features. In order to determine a suitable LBP variant, a comparison is made between the original, extended and uniform LBP using global histogram features. This comparison reveals that a higher accuracy is obtained using the extended LBP, with an accuracy of 72,25%, a precision of 0,658 and a recall of 0,855.

2 OpenCV is an open source computer vision and machine-learning software library, see www.opencv.org

Figure 6.11: Evaluating different random forest parameters.

Figure 6.12: Evaluating different LBP algorithms on the original images in the dataset (experiment 1) compared to the same images that were filtered based on skin-coloured pixels only (experiment 2), using a random forest model.

6.6.2.1.2 Support vector machines

As with the random forest approach, an appropriate kernel with optimal parameters is required, since the kernel greatly influences the predictive capabilities of an SVM [67, 75]. However, a standard method to find the most appropriate kernel does not exist [170], and finding an appropriate kernel is therefore a process of trial and error [37]. The four basic kernels, namely, linear, RBF, polynomial and sigmoid kernels, were compared to find the most appropriate one for further experimentation.

Each kernel was trained on the hand dataset discussed in Section 6.2.3 and the optimal parameters for each kernel were determined using a cross-validation strategy. These kernels were evaluated on the original LBP using global histogram features (see Figure 6.13).

A comparison between the different kernels shows that the RBF kernel outperforms the other kernels with an accuracy of 67,40% and is therefore the most appropriate kernel to use.

Using the RBF kernel, the SVM models were trained and evaluated on the original images in the dataset (experiment 3) and compared to the same images that were filtered based on skin-coloured pixels only (experiment 4). Features used to train and test the models from these images were extracted using different LBP algorithms (see Figure 6.14).

The results show that using LBPs combined with an SVM on skin-coloured pixels achieves a higher result than extracting these features on all pixels in an image, except when using LBP image features and when using the uniform LBP with global histogram features. Therefore, similar to the outcome of random forests, extracting LBP features from skin-coloured pixels in an image to detect hands yields a higher accuracy. When comparing LBP histogram features and LBP image features, a comparable performance is obtained when using the spatially enhanced LBP histogram features and LBP image features, and a higher accuracy is achieved when using the global LBP histogram features. The latter comparison is consistent with the results of the random forest approach and it is therefore concluded that global LBP histogram features should be used for hand detection.

Figure 6.13: Evaluating and comparing different SVM kernels.

Figure 6.14: Evaluating different LBP algorithms on the original images in the dataset (experiment 3) compared to the same images that were filtered based on skin-coloured pixels only (experiment 4), using an SVM model.

In order to determine a suitable LBP algorithm, a comparison was made between the original, the extended and the uniform LBPs, using global histogram features. This comparison revealed that a higher accuracy is obtained using the extended LBP, with an accuracy of 69,04%, a precision of 0,63 and a recall of 0,81, which again is consistent with the comparison made using random forests. It can thus be concluded that when using SVMs, the extended LBP with global histogram features would be more suitable.

6.6.2.1.3 Comparison between hand-detection systems

In order to determine the best method to use for hand detection, a statistical analysis was performed to compare the methods. The comparison considered four factors, namely:

1. A comparison between random forests or SVMs;

2. A comparison between extracting features from skin-coloured pixels only (filtering) or extracting features from all pixels in an image (no filtering);

3. A comparison between global LBP histogram features, spatially enhanced LBP histogram features and LBP image features; and

4. A comparison between the original, uniform and extended LBP.

When taking these factors into consideration, 36 different hand-detection methods were created. The statistical analysis was performed using a generalized linear model accounting for the dependencies inherent in having the same image classified by each of the 36 different methods. A compound symmetric structure was used to model the dependencies. The analysis was done using the GENMOD procedure in SAS3 version 9.

3 SAS Institute Inc., Cary, NC, USA.
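For readers without access to SAS, a rough analogue of this analysis can be expressed as a generalized estimating equation, where the compound symmetric dependency is represented by an exchangeable working correlation. The sketch below uses Python's statsmodels and assumes hypothetical column names (correct, classifier, filtering, features, lbp, image_id); it illustrates the modelling idea rather than reproducing the GENMOD procedure.

```python
# Rough analogue of the GENMOD analysis: a binomial GEE with an exchangeable
# (compound symmetric) working correlation, clustering on the image so that the
# 36 classifications of the same image are treated as dependent observations.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

def fit_method_comparison(df: pd.DataFrame):
    """Fit a GEE relating correct/incorrect prediction to the four factors."""
    model = smf.gee(
        "correct ~ C(classifier) + C(filtering) + C(features) + C(lbp)",
        groups="image_id",
        data=df,
        family=sm.families.Binomial(),
        cov_struct=sm.cov_struct.Exchangeable(),
    )
    return model.fit()
```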

Table 6.4: Comparison between different hand-detection methods considering all four factors.

Observation  Factor 1       Factor 2      Factor 3  Factor 4  Combination  Estimated probability
1            Random forest  Filtering     Global    Extended  1212         0.723
2            Random forest  Filtering     Global    Uniform   1213         0.70578
3            Random forest  Filtering     Global    Original  1211         0.70302
4            SVM            Filtering     Global    Extended  2212         0.69048
5            SVM            Filtering     Global    Original  2211         0.67389
6            SVM            No filtering  Global    Extended  2112         0.66922
7            Random forest  Filtering     Local     Extended  1222         0.65965
8            Random forest  Filtering     Local     Original  1221         0.65838
9            Random forest  No filtering  Global    Extended  1112         0.65668
10           Random forest  Filtering     Local     Uniform   1223         0.65625
11           SVM            No filtering  Global    Uniform   2113         0.6469
12           Random forest  No filtering  Global    Original  1111         0.64094
13           Random forest  No filtering  Global    Uniform   1113         0.64052
14           Random forest  No filtering  Local     Uniform   1123         0.63159
15           Random forest  No filtering  Local     Extended  1122         0.62946
16           Random forest  No filtering  Local     Original  1121         0.6233
17           SVM            Filtering     Global    Uniform   2213         0.62245
18           SVM            No filtering  Global    Original  2111         0.61777
19           SVM            Filtering     Local     Extended  2222         0.60374
20           SVM            No filtering  Image     Uniform   2133         0.60332
21           Random forest  No filtering  Image     Extended  1132         0.60098
22           SVM            Filtering     Local     Uniform   2223         0.59949
23           Random forest  Filtering     Image     Extended  1232         0.59779
24           Random forest  No filtering  Image     Original  1131         0.59609
25           Random forest  Filtering     Image     Original  1231         0.59247
26           SVM            Filtering     Image     Uniform   2233         0.59056
27           Random forest  No filtering  Image     Uniform   1133         0.58503
28           SVM            Filtering     Local     Original  2221         0.57207
29           SVM            No filtering  Image     Original  2131         0.56611
30           SVM            No filtering  Local     Extended  2122         0.56526
31           SVM            No filtering  Image     Extended  2132         0.56441
32           SVM            No filtering  Local     Uniform   2123         0.56144
33           Random forest  Filtering     Image     Uniform   1233         0.55527
34           SVM            Filtering     Image     Original  2231         0.51892
35           SVM            No filtering  Local     Original  2121         0.51658
36           SVM            Filtering     Image     Extended  2232         0.51594

When sorting the methods according to the estimated probability of predicting a hand correctly, several patterns emerged, as shown in Table 6.4. The results show that the top six best-performing methods use the global LBP histogram features, followed largely by methods that use spatially enhanced LBP histogram features, while 12 of the last 17 methods employ LBP image features. The results also show that 8 of the top 10 methods extract features from skin-coloured pixels only, while 6 of the last 10 methods extract features from all pixels in the image. In addition, the results show that 12 of the top 16 methods use random forests, while 10 of the last 16 methods use SVMs. Moreover, 5 of the top 10 methods use the extended LBP when compared to the original and uniform LBPs.

In terms of statistical significance, the accuracies of the top three methods do not differ significantly from each other; however, using the extended LBP achieves a higher accuracy. In Appendix D a list of the non-significant differences between the different methods is given. When comparing the accuracy between random forests and SVMs using the extended LBP with global histogram features on skin-coloured pixels, the results were statistically significant. It was therefore established that random forests using the extended LBP with global histogram features on skin-coloured pixels is a better approach than SVMs for the hand-detection algorithm. Hereafter, the hand-detection algorithm will be referred to as the ELBP-RF hand-detection algorithm.

When comparing related systems to the ELBP-RF hand-detection algorithm used in prototypes DT and DTL, the comparison is restricted to 2D views. For the subjective evaluation, the output was described and compared. For the objective evaluation, the output was compared in terms of its accuracy relative to the dataset.

When compared to the hand-detection system of Petersen and Stricker [125], which is limited to open hand postures where the fingers are visible, the ELBP-RF hand-detection algorithm successfully detects hands in any hand posture. According to Thangali and Sclaroff [151], extracting HoG features from the entire image region can be used to detect hands in images. The ELBP-RF hand-detection algorithm, however, was used to compare features extracted from the entire image region with features extracted from the hands only, and this comparison showed that better results are obtained by only using features on the hand (see Tables D.5 and D.7).

In studies that have used LBPs for hand detection, for example Xiao et al. [166], who used the original LBP on a small dataset of 482 images, and Nguyen et al. [116], who used a simplified version of the LBP on a small dataset of approximately 673 frames, a higher average accuracy was achieved on these limited datasets. Furthermore, their systems were trained and tested on a limited set of hand postures, which also contributed to the higher accuracy.

In the hand-detection system of Mittal et al. [109], a recall rate of 85,3% was achieved on a larger dataset of 5 628 images, which is comparable to the recall rate of 85,5% achieved using the ELBP-RF hand-detection algorithm. Moreover, Mattheij and Postma [106] used a random forest model based on Haar-like features where they obtained an accuracy, recall and precision rate of 69,5%, 78,9% and 66,6%, respectively, on a testing sample of 320 images. In comparison to their results, the ELBP-RF hand-detection algorithm achieved a higher accuracy and recall rate of 72,25% and 85,5%, respectively, and a comparable precision rate of 65,8%.

Therefore, compared to related studies, the ELBP-RF hand-detection algorithm can be considered state-of-the-art.

6.6.2.2 Prototype DT

Prototype DT builds on the framework of prototype T and includes the ELBP-RF hand-detection algorithm. This prototype was used to investigate how well the ELBP-RF hand-detection algorithm would perform in a hand-tracking framework.

For a complete list of tracking success rates for each video of a signed gesture performed by the participants, refer to Table D.8 in Appendix D. In Figure 6.15 a graphical representation of the average tracking success rates for each sign in both environments is shown.

These results, listed in Table 6.5, indicate that 13 signs, which make up almost half of the total number of signs, obtained an average success rate greater than 80%, where 8 of these signs were performed with a single hand and 5 signs were performed with both hands without occlusion. In addition, 6 of the signs obtained an average success rate of between 70% and 80%, with 3 of the signs belonging to the set of signs involving both hands without occlusion.

Figure 6.15: The average hand-tracking success rates of prototype DT for each signed gesture in both environments.

Table 6.5: Summary of the average hand-tracking success rates of prototype DT for each signed gesture in both environments.

Success rate          Single hand                Both hands without occlusion   Both hands with occlusion
Result ≥ 80%          Sign 1, 3, 4, 5, 6, 7, 10  Sign 11, 14, 15, 18, 19        None
70% ≤ Result < 80%    Sign 2, 8                  Sign 13, 16, 17                Sign 21
60% ≤ Result < 70%    Sign 9                     Sign 12, 20                    Sign 22, 23, 24, 25, 26, 27
Result < 60%          None                       None                           Sign 28, 29, 30

Moreover, 8 signs obtained an average success rate of between 60% and 70%, with 6 of the signs belonging to the set of signs involving both hands with occlusion. Furthermore, only 3 signs obtained an average tracking success rate below 60%.

Overall, the results showed a higher accuracy on signs where the right and left hand were separated from each other and where occlusion did not occur. This indicates that the hand-detection component may have been negatively affected when the hands crossed one another and when occlusion occurred, which can be attributed to the fact that samples where the hands cross each other were not part of the hand dataset. Furthermore, the comparison between the different signed gestures using prototype DT shares the same outcome as prototype T, in that the tracking of both hands while distinguishing between them increases the complexity of hand-tracking. Moreover, it shares the outcome that occlusion between the hands has a negative impact on hand-tracking, especially when distinguishing between the hands after occlusion has occurred. In addition to these factors, the variation between the tracking accuracies of the signs is indicative of occurrences when the tracking of a hand is ‘lost’ and not recovered. The following section investigates whether recovery from such failures would improve the hand-tracking framework.

When comparing the accuracy of hand-tracking in different environments using prototype DT (as seen in Figure 6.16), the results show that 18 signs obtained a higher result in unconstrained environments compared to constrained environments. Furthermore, the results show that an equal number of signs using a single hand were successfully tracked in both environments, while results for signs using both hands with and without occlusion were higher in unconstrained environments than constrained environments. This indicates that these results are not attributed to the environment in which the hands were tracked, but rather to the participants who performed the gestures.

Figure 6.16: A comparison of the average hand-tracking success rates of prototype DT for each signed gesture in constrained and unconstrained environments.

In terms of accuracy according to the participants, 14 participants obtained an average tracking success rate greater than 70%, while half of these participants obtained an average success rate greater than 80%. Furthermore, 3 participants obtained an average tracking success rate between 60% and 70%, and 3 participants had an average success rate below 60%. Figure D.3 shows that only participants B, K, O and R, who obtained accuracies below 70%, have much lower accuracies in a constrained environment when compared to unconstrained environments. These accuracies are consistent with those achieved using prototype T and therefore suggest that factors such as motion blur caused by these participants or colour and motion differences relating to camera properties could have contributed to predicting the hands as non-hands at these specific times, thus leading to the hands not being tracked.

6.6.3 Detection, tracking and learning evaluation

In the following section, prototype DTL is evaluated to determine how well the integration between the detection, tracking and learning phases performs as a hand-tracking framework.

6.6.3.1 Prototype DTL

In order to assist hand-tracking and recover from tracking failure, a learning algorithm was included in the framework to form prototype DTL. The prototype was used to investigate how well a hand-tracking framework would perform when including a learning algorithm.

For a complete list of tracking success rates for each video of a signed gesture performed by the participants, refer to Table D.2 in Appendix D. In Figure 6.17 a graphical representation of the results of the average tracking success rates for each sign is shown.

The results listed in Table 6.6 indicate that 18 signs obtained an average tracking success rate greater than 80%, with sign 14 and sign 19 greater than or equal to 90%. Nine signs obtained an average success rate between 70% and 80%, and only 3 signs obtained an average tracking success rate between 60% and 70%. This indicates that 90% of the total number of signs obtained an average success rate greater than 70%.

Figure 6.17: The average hand-tracking success rates of prototype DTL for each signed gesture in both environments.

Table 6.6: Summary of the average hand-tracking success rates of prototype DTL for each signed gesture in both environments.

Success rate          Single hand                          Both hands without occlusion           Both hands with occlusion
Result ≥ 80%          Sign 1, 2, 3, 4, 5, 6, 7, 8, 9, 10   Sign 11, 13, 14, 15, 16, 17, 18, 19    None
70% ≤ Result < 80%    None                                 Sign 12, 20                            Sign 21, 22, 23, 24, 26, 27, 29
60% ≤ Result < 70%    None                                 None                                   Sign 25, 28, 30
Result < 60%          None                                 None                                   None

Furthermore, the results show that these accuracies were obtained in all three categories: signs performed with a single hand, both hands without occlusion and both hands with occlusion.

Moreover, the results showed a consistent accuracy greater than 80% for signs using a single hand and for signs using both hands without occlusion. This suggests that features belonging to each hand are extracted more distinctively when the hands are separated. The results also showed a consistent accuracy greater than 70% for signs using both hands with occlusion, which is slightly lower than the other signs. This suggests that when the hands cross each other, features are extracted from the area in which the hands overlap, thereby storing these features in both the right and left hand databases. This may affect the system when distinguishing between each hand after occlusion has occurred; however, it remains effective when recovering from tracking failure, leading to consistently higher accuracies compared to prototypes T and DT.

When comparing the accuracies according to the different environments in which the gestures were performed (see Figure 6.18), the results indicate that 18 signs obtained a higher result in constrained environments when compared to unconstrained environments. These results confirm that the background in unconstrained environments negatively affects hand-tracking; however, hand-tracking is improved across all signs when including a learning algorithm and the ELBP-RF hand-detection algorithm. Furthermore, the results suggest that features on the body assist hand-tracking better than features around the body, especially when the features around the body are extracted in unconstrained environments.

When comparing the accuracy according to the participants (see Figure D.5), 18 of the participants obtained an average tracking success rate greater than 70%, with 13 of those participants obtaining an average tracking success rate greater than 80%.

Figure 6.18: A comparison of the average hand-tracking success rates of prototype DTL for each signed gesture in constrained and unconstrained environments.

Participant D obtained the highest accuracy of 91,18%. Overall, using prototype DTL, the results show an equal variation between different body types and skin-colour tones. They furthermore show that when tracking failure occurs due to motion blur caused by the participant, the hand-tracking system effectively recovers from such failures.

Although the learning phase in prototype DTL employs context-based tracking, a comparison with related work in this study cannot be made, since related studies employed context-based tracking with a different approach.

Applying context-based information in hand-tracking is therefore a unique approach. Furthermore, applying it to multiple objects, such as hands, has not yet been explored and is therefore a definite contribution. In general, the spatial and temporal contextual information used has been shown to considerably improve hand-tracking, similar to using contextual information in single-object tracking, as shown in the work of Wen et al. [163], Cerman et al. [26] and Grabner et al. [64].

6.6.4 Comparing prototypes T, DT and DTL

In order to determine whether any improvements were gained by integrating additional phases into the framework, the three prototypes are compared.

The comparison between prototypes T and DT, as illustrated in Figure D.8, reveals that on the signs using a single hand, signs using both hands without occlusion, and some signs using both hands with occlusion, an improvement is shown when integrating the detection phase into the framework. These results indicate that validating the object as a hand improves the hand-tracking system, thereby only tracking what is considered to be a hand. However, on five signs (signs 24, 25, 27, 28 and 29) that involve the use of both hands with occlusion, using prototype T has a higher accuracy than using prototype DT. These signs include the interaction between the two hands, and therefore a lower accuracy is expected, since the interaction between hands was not included in the hand dataset. However, this accuracy is improved significantly in prototype DTL when integrating the learning phase, which shows that the tracking of the hands is recovered after it has failed. The comparison between prototypes T and DT also shows that for 83% of all signs, a higher tracking accuracy was obtained using prototype DT.

Figure 6.19: A comparison between the hand-tracking accuracies of each prototype.

The comparison between prototypes T, DT and DTL reveals that for each sign a significantly higher tracking accuracy is shown using prototype DTL compared to the other prototypes, as depicted in Figure D.8. Furthermore, the average of the total accuracy across all signs is 68,05%, 73,89% and 80,96% for prototypes T, DT and DTL respectively, thus indicating that a significant increase in hand-tracking accuracy is obtained by integrating the detection and learning phases with the tracking phase in the hand-tracking framework. The results therefore suggest that in order to obtain a high hand-tracking accuracy across all signs, it would be best to use prototype DTL.

The comparison across prototypes in a constrained environment (refer to Appendix D) reveals that prototype T obtained a higher accuracy than prototype DT in only six signs, thereby indicating that the detection phase does improve hand-tracking accuracy in constrained environments. In addition, prototype DTL obtained a higher accuracy than both prototypes T and DT for all the signs, thus showing that the integration of all three phases greatly improves hand-tracking accuracy in constrained environments.

The comparison across prototypes in an unconstrained environment (refer to Appendix D) reveals that prototype T achieved a higher accuracy than prototype DT for only five signs. This indicates that on these signs the ELBP-RF hand-detection algorithm may have incorrectly identified certain background objects as hands or on some occasions identified a hand as a non-hand object. This occurrence appears most often on signs involving interaction between the hands. However, on most signs it is shown that validating an object using the ELBP-RF hand-detection algorithm improves hand-tracking accuracy.

Furthermore, the results also reveal that on five signs, a slightly higher accuracy is obtained using prototype DT when compared to prototype DTL and on only one sign a slightly higher accuracy is obtained using prototype T when compared to using prototype DTL in unconstrained environments. This suggests that background features, especially those that have moved and which provided no contextual information about the hands, may have been incorrectly stored in the hand-feature database, leading to hand-tracking failures. On the other hand, 24 signs, which form 80% of all signs, obtained a higher accuracy using prototype DTL when compared to prototypes T and DT. Moreover, this accuracy was demonstrated for 90% of the signs using both hands with occlusion, suggesting that the contextual information learnt from features on and around the hand has assisted recovery from tracking failure and helped to distinguish between the right and left hand after occlusion has occurred.

The average of the total accuracy across all signs reveals that a higher tracking accuracy of 82,08% is achieved using prototype DTL in a constrained environment than using prototype DTL in an unconstrained environment, which had an average of 79,83%.

This suggests that although it would be better to track hands using the hand-tracking framework in a constrained environment, a significantly high hand-tracking accuracy can still be achieved using the framework in an unconstrained environment.

6.6.4.1 Statistical analysis of the comparison between prototypes T, DT and DTL

The statistical analysis supports the conclusion that prototype DTL generally performs better than either prototype T or DT. The primary analysis used logistic regression with a compound symmetric correlation structure to account for dependencies due to multiple observations on each individual performing the signs.

Initially, models with an interaction term between the prototype and the environment in which the signed gesture was performed were considered. Interestingly, this term was significant on sign 17 only, as illustrated in Table 6.7. Therefore, simpler main effects models were used on all other signs. Table 6.7 illustrates that for sign 17 the effect of the prototype depends on the environment in which the sign is performed, and vice versa. Furthermore, it shows that on sign 17 there is a difference between the prototypes used in a constrained environment, whereas no difference is found between the prototypes used in an unconstrained environment.

For all other signs the results of the analysis between prototypes for each sign are shown separately in Table 6.8. Only results of the comparison between prototypes that were significantly different, based on an adjusted p-value of less than 0.05, are listed. This analysis shows that on most signs the results of prototype DTL are significantly different from both prototypes T and DT, which confirms that prototype DTL is more suitable as a hand-tracking framework.

Table 6.7: Analysis on sign 17 representing an interaction between the prototype and the environment.

Table 6.8: Analysis between prototypes in which the results were statistically significant with an adjusted p-value less than 0.05.

When determining if the environment has a significant effect on the signs being performed, the results (as listed in Table 6.9) show that it does not appear to, since the smallest p-value for all signs is 0.0211, which would not be considered significant using an adjustment for multiple testing. Although there were no significant differences between the environment and the sign, there were differences between the environment and the prototype. The results show that higher accuracies are achieved using prototype DTL in a constrained environment than in an unconstrained environment.

Table 6.9: Evaluation to determine if the results are statistically significant between the environment and the sign being performed.

Observation  Sign  Environment  Statistic  P-value  Method
1            1     Any          0          0.9723   Score
3            2     Any          0.1        0.7558   Score
5            3     Any          1.43       0.2325   Score
7            4     Any          0.74       0.3904   Score
9            5     Any          2.04       0.1536   Score
11           6     Any          0.83       0.363    Score
13           7     Any          4.98       0.0256   Score
15           8     Any          5.32       0.0211   Score
17           9     Any          0.79       0.3736   Score
19           10    Any          0.45       0.5045   Score
21           11    Any          0.33       0.5672   Score
23           12    Any          0.2        0.6578   Score
25           13    Any          2.43       0.1193   Score
27           14    Any          0          0.9879   Score
29           15    Any          0.41       0.5195   Score
31           16    Any          0.44       0.5049   Score
33           18    Any          1.07       0.301    Score
35           19    Any          0.05       0.8203   Score
37           20    Any          0.22       0.6406   Score
39           21    Any          0.39       0.5303   Score
41           22    Any          0.59       0.4416   Score
43           23    Any          2.03       0.1539   Score
45           24    Any          0.56       0.4561   Score
47           25    Any          2.61       0.1061   Score
49           26    Any          1.22       0.2701   Score
51           27    Any          0.26       0.611    Score
53           28    Any          0          0.9751   Score
55           29    Any          3.18       0.0747   Score
57           30    Any          0          0.9652   Score

In order to determine if there are any significant differences between the signs being performed, one approach is to focus on one prototype and one environment. Since prototype DTL has the best overall performance and the environment does not seem to have a significant effect on the sign being performed, prototype DTL and the constrained environment were used. The results in terms of the proportion of frames tracked correctly across all signs are listed in decreasing order in Table 6.10. They show that sign 19 has the highest proportion of correctly tracked frames and does not differ significantly from those highlighted below it, whereas sign 29 has the lowest proportion of correctly tracked frames and does not differ significantly from those highlighted above it. Furthermore, the results show that the last 10 signs were performed using both hands with occlusion, which suggests that occlusion between the hands does affect hand-tracking.

Table 6.10: Analysis to determine if there are any significant differences between the signs being performed on prototype DTL in a constrained environment.

Observation  Sign  Estimate  Standard error  Z-value  P-value   Proportion
1            19    2.6362    0.4102          6.43     < 0.0001  0.93316
2            5     2.5954    0.4801          5.41     < 0.0001  0.93057
3            8     2.5429    0.4258          5.97     < 0.0001  0.9271
4            6     2.5247    0.4434          5.69     < 0.0001  0.92586
5            14    2.4354    0.319           7.63     < 0.0001  0.91949
6            13    2.3947    0.4857          4.93     < 0.0001  0.91642
7            7     2.3571    0.3797          6.21     < 0.0001  0.9135
8            4     2.2789    0.3056          7.46     < 0.0001  0.90711
9            1     2.2757    0.4431          5.14     < 0.0001  0.90685
10           18    2.2609    0.4402          5.14     < 0.0001  0.90558
11           10    2.145     0.4701          4.56     < 0.0001  0.8952
12           15    2.0764    0.3575          5.81     < 0.0001  0.88859
13           2     1.9754    0.3327          5.94     < 0.0001  0.87819
14           11    1.9568    0.4183          4.68     < 0.0001  0.87619
15           17    1.8916    0.3677          5.14     < 0.0001  0.86893
16           3     1.6508    0.2929          5.64     < 0.0001  0.83899
17           16    1.6006    0.3248          4.93     < 0.0001  0.83211
18           12    1.5739    0.2803          5.62     < 0.0001  0.82834
19           9     1.5495    0.2623          5.91     < 0.0001  0.82485
20           24    1.2635    0.2573          4.91     < 0.0001  0.77963
21           26    1.1657    0.2499          4.66     < 0.0001  0.76236
22           22    1.1118    0.2367          4.7      < 0.0001  0.75246
23           20    1.1041    0.1782          6.2      < 0.0001  0.75104
24           21    1.0298    0.2118          4.86     < 0.0001  0.73688
25           23    0.9887    0.2686          3.68     0.0002    0.72883
26           28    0.946     0.2266          4.18     < 0.0001  0.72031
27           27    0.8111    0.2273          3.57     0.0004    0.69235
28           30    0.7655    0.1895          4.04     < 0.0001  0.68255
29           25    0.7053    0.2257          3.13     0.0018    0.66937
30           29    0.6345    0.2144          2.96     0.0031    0.6535

6.7 Summary

In this chapter the design and setup of the SASL video dataset and the hand dataset were discussed. Along with the datasets, the criteria used to provide the ground truth were explained. In addition, the performance across the three prototypes was evaluated and compared on the SASL video dataset, and the ELBP-RF hand-detection algorithm was evaluated on the hand dataset, where a comparison was made between using LBPs with random forests and with SVMs.

To evaluate the algorithms, the evaluation protocol, which includes the metrics of accuracy and comparison, was discussed.

The comparison between the hand-detection algorithms analysed in Section 6.6.2.1 showed that it would be more suitable to employ the extended LBP with global histogram features when using either random forests or SVMs. It furthermore showed that random forests have a better performance compared to SVMs. Based on these results, the random forest approach was used in the detection phase in prototypes DT and DTL.

The analysis of the signs using prototype T showed encouraging results and it was thus the foundation of the hand-tracking framework. It also showed that tracking two hands while distinguishing between them increases the complexity of the algorithm and indicated that occlusion between the hands affects hand-tracking.

The analysis of prototype DT showed that hand-tracking accuracy increases when integrating the detection phase in the hand-tracking framework. On the other hand, it indicated that the detection phase may provide incorrect predictions when the hands cross one another, since images in which the hands cross one another were not included in the hand-detection dataset. These incorrect predictions led to lower accuracies, which represented tracking failures.

These tracking failures were addressed using prototype DTL, where improved hand-tracking accuracies were obtained by introducing a learning phase to the framework. Using prototype DTL demonstrated a high average accuracy of 82,08% in constrained environments and a high average accuracy of 79,83% in unconstrained environments, thus suggesting that prototype DTL can be used as a suitable framework for tracking hands in both constrained and unconstrained environments.

Chapter 7

Conclusion and Directions for Future Research

Advances in technology are rapidly progressing towards low-cost vision-based interfaces for human-computer interaction that would eliminate the need to wear cumbersome equipment. One such advancement is a framework towards effective and accurate hand-tracking in any type of environment for the recognition of sign language from a single 2D view. The framework was developed around the same concept introduced by Kalal [81] that focused on a three-phase strategy, which in this study was detection-tracking-learning. Throughout the thesis these three phases are constantly referred to in order to provide a consistent and logical flow.

Existing literature relating to the approaches and aspects others have used on hand detection, hand-tracking and context-based object tracking was discussed in Chapter 2. It highlighted the relative strengths and weaknesses of each approach and emphasised the challenges that remain. The philosophical grounding that underpinned the research, the DSR methodology that guided it and the methods that were used to objectively evaluate the algorithms were described in Chapter 3. In Chapter 4 the different components that form part of the detection, tracking and learning phases were compared and their applicability to addressing the research problems was explained. In addition, it was concluded that an LBP variant combined with either random forests or SVMs is suitable for the detection phase. Chapter 4 also described the unique skin segmentation algorithm and data association method used in the tracking phase.

In the learning phase it was suggested that SIFT is well-suited to extract and describe distinct features, since it generates the most robust feature points and uniquely describes them for feature matching. The systematic design and procedures used to link and integrate the algorithms to form three different hand-tracking frameworks were described in Chapter 5.

The hand-tracking frameworks were represented by a progression of three prototypes. These prototypes were evaluated in Chapter 6. This involved describing the compilation of the SASL dataset and the evaluation metrics, a comparison of the hand-detection algorithms using the hand dataset, and a comparison between the three prototypes using the SASL dataset.

The experiments were aimed at evaluating the algorithms and answering the research questions posed in the thesis. Restating the hypothesis: a single framework can be developed to track the hands of an individual independently from a single 2D camera in constrained and unconstrained environments. The main research question was thus: How should a framework be developed to detect, learn and track the hands? To answer the research question, prototype DTL defines a framework that can be used to detect, track and learn the hands. It was shown that the ELBP-RF hand-detection algorithm is well-suited to detect hands in image sequences. To track the hands, it was shown that the pixel-level data-association algorithm performs very well in tracking the hands independently while handling occlusion. To learn features from the hand, it was shown that contextual features can be exploited to assist hand-tracking after tracking failure.

The experiments revealed that integrating all three phases in the framework, as was done in the final prototype, is suitable for tracking hands in both constrained and unconstrained environments, with a high average accuracy of 82,08% and 79,83%, respectively.

This chapter highlights the contributions of this study, states the limitations surrounding the research and recommends directions for future work. The chapter is concluded by reflecting on the study.

7.1 Contributions

In this thesis three fundamental contributions were made to the area of hand-tracking. These contributions originate from each of the phases that were collectively used to formulate a novel hand-tracking framework.

• Articulated hand detection using extended local binary patterns with random forests;

• Pixel-level data association for independent hand-tracking in unconstrained environments; and

• Exploiting contextual features towards assisting hand-tracking.

7.1.1 Articulated hand detection using extended local binary patterns with random forests

An integral part of detecting hands in video frames, especially in a sign language recognition application, is the ability to detect hands independently of the different hand configurations used in sign language. The detection of hands is further complicated by occlusion, complex environments and different illumination conditions.

The first contribution of this thesis is a unique hand-detection algorithm that addresses these challenges. The algorithm employs the extended LBPs with a random forest approach. When presented with a region of interest in an image, the region is filtered to only identify skin-coloured pixels using a new dynamic skin-colour model. The extended LBPs with a global histogram feature approach are applied to these skin pixels and used to compile a feature vector, which is then classified by a trained random forest model to determine whether the region of interest is a hand or not.
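The stages of this detection phase can be summarised in a short sketch. The code below is only an approximation: the thesis's extended LBP and dynamic skin-colour model are replaced here by scikit-image's LBP and a fixed YCrCb skin range, so it illustrates the skin filtering, global histogram and random forest stages rather than the exact ELBP-RF algorithm.

```python
# Simplified sketch of the detection phase: skin filtering, a global LBP
# histogram and a random forest classifier. The fixed YCrCb skin range and the
# 'uniform' LBP are stand-ins for the dynamic skin model and extended LBP.
import cv2
import numpy as np
from skimage.feature import local_binary_pattern
from sklearn.ensemble import RandomForestClassifier

P, R = 8, 1          # LBP neighbourhood size and radius
N_BINS = P + 2       # number of codes produced by the 'uniform' mapping

def lbp_histogram(region_bgr):
    """Global LBP histogram computed on skin-coloured pixels only."""
    ycrcb = cv2.cvtColor(region_bgr, cv2.COLOR_BGR2YCrCb)
    skin = cv2.inRange(ycrcb, (0, 133, 77), (255, 173, 127)) > 0  # crude skin mask
    gray = cv2.cvtColor(region_bgr, cv2.COLOR_BGR2GRAY)
    lbp = local_binary_pattern(gray, P, R, method="uniform")
    values = lbp[skin] if skin.any() else lbp.ravel()
    hist, _ = np.histogram(values, bins=N_BINS, range=(0, N_BINS), density=True)
    return hist

def train_hand_detector(regions, labels):
    """Train a random forest on LBP histograms of hand / non-hand regions."""
    X = np.array([lbp_histogram(r) for r in regions])
    forest = RandomForestClassifier(n_estimators=100, random_state=0)
    forest.fit(X, labels)
    return forest
```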

The evaluation of the algorithm on the hand dataset showed that by extracting features using the extended local binary patterns and the global histogram feature approach on skin pixels only, a hand-detection accuracy 6,82% higher is achieved than when extracting these features from all the pixels in an image region. The evaluation also showed that the extended LBPs outperform both the original and uniform LBPs using global histogram features.

When compared to other hand-detection systems described in Section 4.1, the advantages of using this hand-detection algorithm are that it successfully detects hands independently of the hand posture formed and that a better detection accuracy is attained by only extracting features of the hand. In comparison to related hand-detection systems in terms of accuracy, the proposed hand-detection algorithm compares favourably and therefore should be considered state-of-the-art.

These noteworthy advantages make the algorithm well-suited for use in a hand-tracking framework for sign language recognition. Integrating the hand-detection algorithm in the hand-tracking framework showed that validating that an object is a hand improves the hand-tracking system, thereby only tracking what is considered to be a hand.

7.1.2 Pixel-level data association for independent hand-tracking in unconstrained environments

Tracking hands from a 2D view is challenging when depth cannot be used to deal with occlusion. As an application towards SASL recognition, a hand-tracking framework is required that tracks the hands in both simple and complex environments. It is also required that a distinction can be made between the right and left hand.

The second contribution of this thesis is a data-association method that associates skin pixels according to whether they belong to either the right or left hand while discarding all other skin pixels not related to the hands. In this way it is able to distinguish between the right and left hand while tracking. The skin pixels are identified using a new dynamic skin model that selectively identifies an individual’s skin colour. In addition, the method distinguishes between the hands even when they are stationary or moving in an occluded state. It was also shown that the dynamic skin model combined with the data-association method allows for the hands to be effectively tracked in both constrained and unconstrained environments.
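The association idea can be conveyed with a deliberately simplified sketch, in which each skin pixel is assigned to the right- or left-hand hypothesis whose previous centroid lies nearest, after which the centroids are updated. The actual method, built on Argyros and Lourakis [10] together with the dynamic skin model, is considerably richer; the function name and inputs below are illustrative only.

```python
# Simplified illustration of pixel-level data association between two hands:
# skin pixels are split according to the nearest previous hand centroid, and the
# centroids are then re-estimated from the pixels assigned to them.
import numpy as np

def associate_skin_pixels(skin_mask, right_centroid, left_centroid):
    """Split skin pixels between the two hand hypotheses and update the centroids."""
    ys, xs = np.nonzero(skin_mask)
    pixels = np.stack([xs, ys], axis=1).astype(float)
    if len(pixels) == 0:
        return right_centroid, left_centroid
    d_right = np.linalg.norm(pixels - right_centroid, axis=1)
    d_left = np.linalg.norm(pixels - left_centroid, axis=1)
    right_pixels = pixels[d_right <= d_left]
    left_pixels = pixels[d_left < d_right]
    if len(right_pixels):
        right_centroid = right_pixels.mean(axis=0)
    if len(left_pixels):
        left_centroid = left_pixels.mean(axis=0)
    return right_centroid, left_centroid
```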

The evaluation of the method on the SASL dataset showed that the method is suitable for tracking both hands across a variation of signs and distinguishing between them while tracking. Distinguishing between them, however, increases the complexity of hand-tracking, which is the primary reason why some researchers [98, 120, 162] either do not distinguish between the hands or have difficulty in distinguishing between them. Furthermore, the method tracks the hands independently of the individual performing sign language, as the results show an equal variation between the different body types and skin-colour tones.

In comparison to related hand-tracking systems [36, 86, 97], the method successfully tracks the hand in the presence of background distractors and is not affected when the hands move in front of the face. Furthermore, it compares favourably in terms of tracking both hands against systems that employ particle-filter frameworks [32, 120, 144] and it is able to successfully distinguish between the right and left hand when compared to other related systems [98, 120, 162].

In addition, complementing the data-association method with the new dynamic skin model contributed to several improvements to the novel algorithm of Argyros and Lourakis [10]: it distinguishes between the right and left hand, it does not regard the hand as a different object when it recovers from tracking failure, and it is able to track both hands in both constrained and unconstrained environments.

7.1.3 Exploiting contextual features towards assisting hand-tracking

In sign language translation, if hand-tracking failure occurs, the hands need to be recovered and tracking should continue in order to provide continuous and accurate sign language translation. A few recent studies [26, 64, 163] have shown that context-based object tracking can be used to improve the accuracy of single object-tracking methods. To date, research on hand-tracking has not employed context-based strategies in the form of pixel-based features.

The third and perhaps most significant contribution of this thesis is a context-based algorithm that explores whether features on or around the hand can be used to assist hand-tracking and recover from tracking failure. These features are extracted from objects such as rings, wrist watches, bangles, bracelets, etc. that move with the hand and are assigned a higher confidence vote compared to features from objects that do not move with the hand, which have a lower confidence vote. These objects provide reliable features that can be used to recover hand-tracking failure and assist in distinguishing between hands. To provide reliable features, the algorithm employs SIFT detectors and descriptors, since they extract and uniquely describe the largest number of robust feature points that are scale, rotation and translation invariant. These features are stored in a database for each hand and are matched against newly extracted features. A voting strategy is then employed to recover the tracking of a ‘lost’ hand. Furthermore, it ensures that the right and left hands are the respective hands that are being tracked.
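The matching and voting step of the learning phase can be sketched as follows, using OpenCV's SIFT and FLANN implementations. The confidence weighting of on-hand versus around-hand features is omitted, and the database variables are placeholders, so this is a simplified illustration rather than the algorithm as implemented in the framework.

```python
# Condensed sketch of the learning-phase matching step: SIFT features from the
# current frame are matched against stored right- and left-hand descriptor
# databases with FLANN, and the hand receiving the most ratio-test matches wins.
import cv2

sift = cv2.SIFT_create()
flann = cv2.FlannBasedMatcher(dict(algorithm=1, trees=5), dict(checks=50))

def vote_for_hand(frame_gray, right_db, left_db, ratio=0.7):
    """Return 'right', 'left' or None according to which database matches best."""
    _, descriptors = sift.detectAndCompute(frame_gray, None)
    if descriptors is None:
        return None

    def count_matches(db):
        # Count matches that pass Lowe's ratio test against one hand database.
        if db is None or len(db) < 2:
            return 0
        matches = flann.knnMatch(descriptors, db, k=2)
        return sum(
            1 for pair in matches
            if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance
        )

    right_votes, left_votes = count_matches(right_db), count_matches(left_db)
    if right_votes == left_votes:
        return None
    return "right" if right_votes > left_votes else "left"
```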

To determine how well the contextual information improves hand-tracking, the algorithm was integrated into the hand-tracking framework and evaluated using the SASL dataset. The evaluation showed that by using the context-based algorithm, a 7,07% increase in hand-tracking accuracy is achieved when compared to a system that excludes the algorithm from the hand-tracking framework. This indicates that the algorithm does recover from hand-tracking failure and assists in distinguishing between the right and left hand while tracking. The algorithm therefore demonstrates that features on or around the hand should be used to improve hand-tracking accuracies.

7.1.4 A framework for independent hand-tracking

The integration of the three main contributions of this thesis forms the fundamental components of the novel hand-tracking framework proposed in this study. The framework has addressed some of the major challenges faced by hand-tracking and demonstrated that hands can be tracked in a sign language translation application with an average hand-tracking accuracy of 81% in both constrained and unconstrained environments.

The hand-tracking framework forms an integral component in the SASL translation system, since it serves as a prerequisite for hand-shape recognition by finding the location of hands and indicates when the face is occluded by the hands, a scenario that requires information from the hand location, hand-shape recognition and facial expression recognition.

7.2 Limitations of the research study

The limitations and scope of this study define a number of boundaries encompassed by the South African sign language translation research project.

Translating SASL gestures is specific to South Africa as gestures used to communicate using sign language differ in each country in the same way that languages differ from one country to another. The hand-tracking framework therefore focused on SASL gestures, but can be applied to the tracking of hands in general.

One of the aims of the SASL project is to provide translation using inexpensive hardware. The research therefore evaluated the hand-tracking frameworks with the focus on SASL videos captured from a 2D view using an inexpensive webcam. The analysis showed that by integrating the detection, tracking and learning phases, the hands can be tracked accurately and efficiently from a 2D view. It should be noted that while capturing the videos in unconstrained outdoor environments, windy conditions continuously caused the webcam to shake, thereby replicating the capturing of videos from a mobile phone camera. The results thus establish grounds that, by using the proposed hand-tracking framework, the hands can be tracked when SASL videos are captured using mobile phone cameras. This was, however, not tested.

In addition, the videos were captured in uncontrolled light conditions in both constrained and unconstrained environments. This had the advantage of providing real-world scenarios of the type that would be encountered when using a mobile phone camera.

When communicating in SASL, it is considered disrespectful to not face the person with whom you are communicating. It was therefore required that the participant performing sign language faced the camera.

Furthermore, the participants were required to wear long-sleeved clothing, since it is more challenging than short-sleeved clothing, where the skin of the arms can be used to find and distinguish between the hands.

Sign language communication does not involve the entire body and requires only the upper body to convey the message; thus video capturing was restricted to the upper body. This required the participants to be closer to the camera compared to video capturing the entire body, which contributed to richer face and hand features, which are important in SASL. The hand-tracking framework is not dependent on the participants’ distance from the camera; however, the distance may influence the features extracted in the learning phase. Determining whether these features are influenced by distance in the hand-tracking framework is an area for possible future research.

7.3 Future work

Although substantial contributions to hand-tracking were made in this thesis, and the research objectives were achieved, directions for further improvement remain. In this section, several directions to further improve and extend the approaches presented in this thesis will be discussed. This includes suggestions that could increase the overall performance of the hand-tracking framework.

7.3.1 Arm-based hand-tracking

As an extension to the hand-tracking framework, a method could be integrated that would allow for the hands to be tracked when short-sleeved clothing is worn. One way to address this problem is to detect skin pixels from both the hands and arms, and create a binary image of skin and non-skin pixels. When the hands move from one location to another, the arms move with them at all times, and therefore the hands and arms can be isolated using a background subtraction technique. Combining the foreground image representing the moving objects with the skin-pixel binary image would result in only skin-coloured foreground objects represented by silhouettes. Segmenting elongated silhouettes would be a means to identify the hands and arms. Finding the skeleton of these silhouettes and using the skeletons to identify the end at which the hand is situated is a way to segment the hands from the arms. Since the arms are bound by physical constraints, they have a limited area in which they can move. This fact can then be used to track and distinguish between hands.
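A minimal sketch of the proposed combination step is given below: a background-subtraction foreground mask is ANDed with a skin-pixel mask so that only moving, skin-coloured silhouettes (hands and arms) remain. The MOG2 subtractor and the fixed YCrCb range are stand-ins for whichever models a full implementation would use.

```python
# Combine a foreground (motion) mask with a skin mask to keep only moving,
# skin-coloured silhouettes such as hands and arms.
import cv2

subtractor = cv2.createBackgroundSubtractorMOG2(detectShadows=False)

def moving_skin_silhouettes(frame_bgr):
    """Return a binary mask of skin-coloured foreground (moving) pixels."""
    foreground = subtractor.apply(frame_bgr)
    ycrcb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2YCrCb)
    skin = cv2.inRange(ycrcb, (0, 133, 77), (255, 173, 127))
    return cv2.bitwise_and(foreground, skin)
```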

7.3.2 Enhancing skin-colour segmentation

A fundamental function of the tracking algorithm is to identify the hands based on the skin colour of the individual, since it provides shape and scale invariance. In the current research, the individual’s skin colour was identified based on a selected area around the nose. This requires the individual to face forward towards the camera, which is a valid assumption, since facial expressions are the non-manual features that are used to distinguish between signs.

If the individual turns his/her face or looks away from the camera, a Gaussian model, for example, should be used. Therefore, an extension to the skin-segmentation algorithm in the hand-tracking framework would be to continue segmenting skin-coloured regions even when an individual happens not to face towards the camera. A possible approach would be to train an online Gaussian model using the skin-colour distribution selected from the face at times when the individual is facing towards the camera, and to employ the model when either the individual’s face cannot be detected or the individual does not face the camera. An alternative approach would be to apply an online trained model in every frame and logically ‘AND’ it with the output of the skin-segmentation algorithm of the hand-tracking framework, which would possibly provide a more accurate and enhanced skin-colour segmentation.
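The online Gaussian model could, for example, be realised as sketched below: the Cr and Cb values of a patch sampled from the face fit a single Gaussian, and pixels are later classified by their Mahalanobis distance to that model. The patch selection and the threshold value are illustrative assumptions rather than part of the framework as implemented.

```python
# Fit a single Gaussian to the chrominance of a face patch and use it later to
# mark skin-like pixels when the face itself cannot be detected.
import cv2
import numpy as np

def fit_skin_gaussian(face_patch_bgr):
    """Return the mean and inverse covariance of the patch's Cr/Cb distribution."""
    crcb = cv2.cvtColor(face_patch_bgr, cv2.COLOR_BGR2YCrCb)[:, :, 1:3]
    samples = crcb.reshape(-1, 2).astype(np.float64)
    mean = samples.mean(axis=0)
    cov = np.cov(samples, rowvar=False) + 1e-6 * np.eye(2)  # regularised covariance
    return mean, np.linalg.inv(cov)

def gaussian_skin_mask(frame_bgr, mean, inv_cov, threshold=9.0):
    """Mark pixels whose squared Mahalanobis distance to the model is small."""
    crcb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2YCrCb)[:, :, 1:3].astype(np.float64)
    diff = crcb.reshape(-1, 2) - mean
    d2 = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
    return (d2 < threshold).reshape(frame_bgr.shape[:2]).astype(np.uint8) * 255
```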

Another possible approach would be to employ a semantic-segmentation algorithm similar to the work of Liu et al. [95] and use the semantic labels to isolate the individual in the scene. The individual’s skin colour should then be determined or pixels should be grouped together based on their similarity, so more non-hand objects or areas can be excluded from the scene, which would lead to a more accurate and efficient tracking process.

7.3.3 Improved hand detection

The ELBP-RF hand-detection algorithm proposed in this study requires an image region to determine if this region is a hand. Given an image region, the extended LBPs are applied and used to populate a histogram using a global feature approach. The histogram features are then used to train a random forest model, and the model is used to classify the image region. As an extension to this algorithm and an improvement to the hand-tracking framework, the algorithm should include a ‘sliding window’ mask that scans the entire image from the top-left to the bottom-right corner and assigns probability values to each region in the image using the trained random forest model. These probability values will define each image region’s likelihood of containing a hand. Based on these likelihoods, regions with lower probability values can be excluded, which would decrease the probability of hand-tracking failure. In SASL, hand shapes are an additional feature, together with hand-tracking and facial expressions, that are used to describe sign language gestures. Therefore, to further improve hand detection, the hand shape can be used. The hand shape can be determined using several 3D model approaches, where templates are extracted and matched, or appearance-based methods that, for example, predict the hand shape based on extracted low-level features and a trained model. Knowing the hand shape helps to further eliminate areas in an image that do not contain a hand.
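The sliding-window extension might look as follows; a feature function such as the lbp_histogram sketch in Section 7.1.1 and a trained forest are passed in, and the window size, stride and the assumption that class 1 of the forest corresponds to 'hand' are illustrative choices.

```python
# Slide a fixed-size window over the image and let a trained random forest
# assign each window position a probability of containing a hand.
import numpy as np

def hand_probability_map(image_bgr, forest, feature_fn, window=64, stride=16):
    """Return window origins and their predicted probabilities of being a hand."""
    h, w = image_bgr.shape[:2]
    origins, features = [], []
    for y in range(0, h - window + 1, stride):
        for x in range(0, w - window + 1, stride):
            origins.append((x, y))
            features.append(feature_fn(image_bgr[y:y + window, x:x + window]))
    # Column 1 is assumed to be the 'hand' class of the trained forest.
    probs = forest.predict_proba(np.array(features))[:, 1]
    return origins, probs
```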

7.3.4 Optimised hand-tracking framework

The current hand-tracking framework was aimed at improving hand-tracking accuracy using several algorithms; however, these algorithms were not optimised to achieve real-time performance. A performance evaluation was therefore not done; however, several areas can be optimised to increase hand-tracking performance. To improve the face-detection algorithm, rather than scanning the entire image for a face, only the top half of the image should be scanned. Another improvement would be to determine the foreground objects and skin-coloured objects in parallel, followed by combining these areas using a logical ‘AND’ function. To identify the skin-coloured clusters in an image, the connected-components labelling algorithm was used. By applying the optimised connected-components labelling algorithm of Wu et al. [165], the time required to scan an image can be reduced by half and the total execution time can be reduced by a factor of five or more.
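The first of these optimisations is easy to sketch: restricting the face detector to the top half of the frame halves the area it has to scan. The Haar cascade used below is a stand-in for whichever face detector the framework actually employs.

```python
# Run face detection on the upper half of the frame only. Because the crop
# starts at row 0, the returned rectangles are already in full-frame coordinates.
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)

def detect_face_top_half(frame_bgr):
    """Detect faces in the upper half of the frame."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    top = gray[: gray.shape[0] // 2, :]
    return cascade.detectMultiScale(top, scaleFactor=1.1, minNeighbors=5)
```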

Furthermore, the time required to compare features in and around the hand with features from the respective databases can be reduced by excluding features that are identical in both the right- and left-hand databases. These features are often extracted from the region where the hands overlap and are then stored in both databases. Only storing and comparing the features that are unique to each hand will not only reduce the computational time, but might also improve the accuracy of the hand-tracking framework.

In order to further improve the performance of the hand-tracking framework, the proposed algorithms can be ported to run on a GPU, thereby exploiting the GPU’s parallel processing capabilities. These capabilities are attributed to the GPU’s architecture, which contains hundreds of cores, allowing it to run millions of threads concurrently.

7.4 Reflecting on the study

In this section, the current interest of the study, the significance of the study and its relation to previous studies are considered.

7.4.1 Current interest of the study

With the current interest in vision-based gesture interfaces, this research is timely. It addressed the challenging process of hand tracking and recommended a framework that aims to track the hands from a single 2D-view camera. This framework can be easily adapted to several application areas such as virtual controllers, alternative computer interfaces and immersive gaming technology. The fact that the framework applies to single 2D-views, even 2D-views from mobile phone cameras, makes it more appealing than tracking hands using multiple cameras or specialised 3D-view cameras.

The study will appeal to researchers interested in hand-tracking, body-part tracking, body-pose estimation and multiple-object tracking, as the research can be used as a source of reference in each of these areas.

7.4.2 Significance of the study

This thesis provides several significant and value-added contributions to the knowledge base. It provides novel algorithms that have been shown to significantly improve hand-tracking accuracy. Although these algorithms have been applied to hand-tracking, they can also be applied to the general case of tracking multiple objects with a similar appearance. These algorithms have been combined to form a novel hand-tracking framework that effectively tracks the hands while distinguishing between the right and left hand. The framework is an important contribution to the translation of South African sign language and will therefore change the way in which the hearing-enabled and Deaf communicate with each other.

7.4.3 Relation to previous studies

The research highlighted the challenges and shortcomings that previous studies faced in terms of hand-tracking systems. It presented arguments explaining these challenges and highlighted the areas where the algorithms in these studies may have been flawed. For example, some studies [32, 120, 144] believe that a particle-filter approach would address the problem of distinguishing between the hands after occlusion has occurred; however, this approach assumes that the hands will follow the same direction and would be feasible only if the likelihood function of the particle filter can distinguish between two similar objects such as hands. Alternatively, Argyros and Lourakis [10] proposed a more effective approach that associates pixel-level information from multiple objects and does not assume that the hands will always follow the same direction. This approach was therefore adopted in this research and forms the underlying structure of the hand-tracking framework. The research confirmed that associating pixel-level information should be used for hand-tracking systems. In addition, the research showed that detecting whether an object is a hand is a better approach than blindly tracking an object without knowing whether it is a hand or not. Moreover, the research demonstrated that extracting contextual information from the hands and from objects on the hand significantly improves hand-tracking accuracy. This approach has never been explored in any hand-tracking system. It is therefore recommended that future research should adopt this approach not only for hand-tracking, but also for other objects that may have a similar appearance.

Appendix A

State-of-the-art feature detectors and descriptors

A.1 Feature detectors

Several feature detectors have been proposed with the aim of improving accuracy and efficiency [15]. Feature detectors are used to find feature points in an image that can later be described by a feature descriptor. In the following sections, state-of-the-art feature detectors will be described.

A.1.0.1 Harris

The Harris feature detector is a classical approach commonly used to detect corners in an image [69]. It considers the change in average directional intensity in a small window around a candidate pixel by computing the auto-correlation matrix, A, which is represented as [88]:

A = \begin{bmatrix} u & v \end{bmatrix}
    \begin{bmatrix}
      \sum \left( \frac{\partial I}{\partial x} \right)^2 & \sum \frac{\partial I}{\partial x}\frac{\partial I}{\partial y} \\
      \sum \frac{\partial I}{\partial x}\frac{\partial I}{\partial y} & \sum \left( \frac{\partial I}{\partial y} \right)^2
    \end{bmatrix}
    \begin{bmatrix} u \\ v \end{bmatrix}    (A.1)

where (u, v) is the displacement vector and ∂I/∂x and ∂I/∂y are the first derivatives of the image I(x, y). From this matrix, two eigenvalues are determined that represent the maximum change in average intensity and the change in average intensity in the orthogonal direction. The eigenvalues are then compared to a given threshold with the following outcome: if both eigenvalues are low, a relatively homogeneous area is defined; if one eigenvalue is low and the other is high, the point is an edge; and if both eigenvalues are high, the point is a corner.

It then follows that the intensity strength can be computed with the following score [88]:

S = det(A) − λ · trace(A)²    (A.2)

where λ is selected to be in the range of 0.05 to 0.5 [88]. To determine the final detection result, a local non-maxima suppression step is performed. A point is thus considered to be a corner only if it is a local maximum and has an intensity strength higher than a specified threshold.
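A minimal sketch of the Harris response in Equation A.2 using OpenCV, whose cornerHarris function computes the same det − k·trace² score; the block size, Sobel aperture, k value, threshold and file name are illustrative, not the settings used in this work.

```python
import cv2
import numpy as np

# Minimal sketch of the Harris corner score (Equation A.2); parameter values are illustrative.
gray = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)
response = cv2.cornerHarris(np.float32(gray), blockSize=2, ksize=3, k=0.05)
# Keep points whose corner strength exceeds a threshold
# (local non-maxima suppression is omitted for brevity).
corners = np.argwhere(response > 0.01 * response.max())
```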

A.1.0.2 Good-features-to-track

The Good-Features-To-Track method was proposed by Shi and Tomasi [140] as an extension of the Harris feature detector that makes its corners more uniformly distributed across an image. This is achieved by imposing a minimum distance between two feature points. Starting from the feature point with the strongest Harris score, S, feature points are accepted only if they are situated at least the given minimum distance away from the points that have already been accepted. The score is therefore determined as follows [138]:

S = min(λ1, λ2) (A.3)

where λ1 and λ2 represent the eigenvalues.
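A minimal sketch of this corner selection using OpenCV's goodFeaturesToTrack, which applies the Shi–Tomasi score and the minimum distance between accepted corners described above; the parameter values and file name are illustrative.

```python
import cv2

# Sketch of Shi-Tomasi corner selection; qualityLevel and the minimum distance
# between accepted corners are illustrative values.
gray = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)
corners = cv2.goodFeaturesToTrack(gray, maxCorners=200,
                                  qualityLevel=0.01, minDistance=10)
```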

A.1.0.3 Feature from accelerated segment test

The Feature from Accelerated Segment Test (FAST) detector [133] is another extension of the Harris feature detector and is designed to be a quick corner detector. The algorithm operates by examining the intensity values of pixels on a circle of radius r around a candidate point p. Each pixel on the circle is considered either bright or dark if its intensity value is brighter or darker by at least t when compared to the intensity value of p, where t is an arbitrary threshold. A segment test then classifies the candidate pixel as a feature point if a contiguous arc of at least n bright or dark pixels is found on the circle perimeter. Similar to the Harris detector, non-maxima suppression is applied only to those feature points that have successfully passed the segment test, where the corner strength is determined by the sum of the absolute differences between the pixels identified on the arc and the candidate pixel.
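A minimal sketch of FAST detection with OpenCV; the threshold t and the non-maxima suppression flag correspond to the description above, with illustrative values, and the file name is a placeholder.

```python
import cv2

# Sketch of FAST corner detection with an illustrative threshold value.
gray = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)
fast = cv2.FastFeatureDetector_create(threshold=20, nonmaxSuppression=True)
keypoints = fast.detect(gray, None)
```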

As an extension to FAST, Rublee et al. [134] proposed the Oriented FAST and Rotated BRIEF (ORB) feature detector. The aim of the detector is to estimate the feature point orientation where the FAST detector does not. This is achieved by efficiently computing the orientations based on the intensity centroid moments.

Another extension to FAST was proposed by Leutenegger et al. [92], which they referred to as the Binary Robust Invariant Scalable Keypoints (BRISK) detector. It improves the score computation of the FAST detector by using a binary decision tree instead of the original classifier. Furthermore, it improves the scale parameter of FAST by computing the score over several scales in a scale-space representation.

A.1.0.4 Speeded up robust feature

Inspired by the SIFT detector, Bay et al. [17] proposed the Speeded Up Robust Feature (SURF) detector. Their main motivation was to overcome the computational complexity and slow execution speed of SIFT. In contrast to SIFT that uses the Laplacian filter response, SURF operates by searching for the local maxima of the Hessian determinant in the scale space. For each pixel in the image, the Hessian matrix is computed as follows [88]:

H(x, y) = \begin{bmatrix}
            \frac{\partial^2 I}{\partial x^2} & \frac{\partial^2 I}{\partial x \partial y} \\
            \frac{\partial^2 I}{\partial x \partial y} & \frac{\partial^2 I}{\partial y^2}
          \end{bmatrix}    (A.4)

The matrix is used to measure the local curvature of a function and the determinant measures the strength of this curvature. The aim is to find image points with a high local curvature. In order to improve the efficiency of the detector, it uses integral images and calculates the Hessian determinants by using approximated Gaussian kernels of different scales. Using SIFT, scale estimation is obtained by decreasing the size of the image after smoothing; using SURF, however, it is obtained by increasing the size of the kernels. Once the local maxima are identified, the locations of the feature points are obtained through interpolation in both image and scale space, and a scale value is associated with each feature point [88].
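A minimal sketch of SURF detection with OpenCV; note that SURF is patented and is only available in opencv-contrib builds, and the Hessian threshold and file name below are illustrative.

```python
import cv2

# Sketch of SURF detection (requires an opencv-contrib build with non-free modules).
gray = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)
surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)
keypoints, descriptors = surf.detectAndCompute(gray, None)
```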

A.1.0.5 STAR

The STAR feature detector was proposed by Agrawal et al. [5] as a derivative of the Center Surround Extrema (CenSurE) detector. It approximates the Laplacian using a bi-level centre-surrounded filter consisting of two rotated squares as shown in Figure A.1.

Figure A.1: The CenSurE and STAR detectors. (a) CenSurE; (b) STAR.

The star-shaped filter allows the detector to preserve rotational invariance. Moreover, the use of integral images allows for efficient detection of feature points in scale-space, where the scale-space is constructed without interpolation, by applying filters of different sizes.

A.1.0.6 Maximally stable extremal regions

The Maximally Stable Extremal Regions (MSER) detector was proposed by Matas et al. [105] to detect regions in an image. This is achieved by using local intensity extrema obtained by iteratively applying a watershed-like segmentation. The regions are detected as connected components in an image in which a threshold is applied from the maximum to the minimum image intensity value and the stability of the connected components is evaluated. Thus, all pixels inside a region will have either a higher or a lower intensity value than all other pixels on its outer boundary.
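A minimal sketch of MSER detection with OpenCV; detectRegions returns the pixel lists of the stable regions together with their bounding boxes, although the exact return signature varies between OpenCV versions, and the file name is a placeholder.

```python
import cv2

# Sketch of MSER region detection with default parameters.
gray = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)
mser = cv2.MSER_create()
regions, bboxes = mser.detectRegions(gray)
```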

Each of the feature detectors has its relative strengths and weaknesses. In Section 4.3.4 a comparison is made to determine the one that is most suitable.

A.2 Feature descriptors

After feature points have been located, the features can be described by means of feature vectors, referred to as feature descriptors. To pair with the detectors, many descriptors have been proposed. In the following sections, state-of-the-art feature descriptors will be described.

A.2.0.7 Speeded up robust feature descriptor

First, the orientation is assigned to the feature point. Similar to the SIFT descriptor, local image gradients are computed within a 4×4 grid rotated according to the orientation around the feature point. Each grid block is further divided into sub-regions. At each sub-grid block, the gradient is computed and accumulated by angle into histogram bins, whose counts are increased by the magnitude of the gradient weighted by a Gaussian. These gradients are calculated from first-order Haar wavelet responses, in the horizontal and vertical directions, that are computed efficiently at different scales using integral images. These histograms of gradients result in a feature vector consisting of 64 elements. Furthermore, by normalising the vector, illumination invariance is achieved [16].

A.2.0.8 Binary robust independent elementary features descriptor

The Binary Robust Independent Elementary Features (BRIEF) descriptor was proposed by Calonder et al. [24] to describe features using binary strings. This allows the Hamming distance to be used for matching, which is faster than using the Euclidean distance. Before extracting features, the image is smoothed with an averaging filter, owing to the descriptor’s sensitivity to noise. The value of each bit in the binary string is determined by comparing the intensity values of two points inside an image region centred on a given feature point. If the intensity value of the first point is greater than that of the second point, the bit is set to one; otherwise it is set to zero. Calonder et al. [24] have shown that the best performance is achieved when the points are chosen with a Gaussian distribution centred on the feature point.

A.2.0.9 Oriented FAST and rotated BRIEF descriptor

The ORB descriptor extends the BRIEF descriptor by making two improvements [134]. First, it applies orientation information from the ORB detector to the descriptor that enables rotation invariance. This is achieved by using the detected orientation angle to rotate the coordinates of the point pairs in the binary tests around a given feature point. Second, instead of using random sampling to select the point pairs as in the BRIEF descriptor, a sampling scheme that employs machine learning for de-correlating BRIEF features under rotational invariance is used. This improves the matching accuracy when using the nearest neighbour search [137].
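A minimal sketch of ORB detection and description with OpenCV, combining oriented FAST keypoints with rotation-aware BRIEF descriptors; the number of requested features and the file name are illustrative.

```python
import cv2

# Sketch of ORB: oriented FAST keypoints described with rotation-aware BRIEF.
gray = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)
orb = cv2.ORB_create(nfeatures=500)
keypoints, descriptors = orb.detectAndCompute(gray, None)  # binary descriptors
```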

A.2.0.10 Binary robust scalable keypoints descriptor

In contrast to the random or learned patterns of the BRIEF and ORB descriptors, the Binary Robust Invariant Scalable Keypoints (BRISK) descriptor employs a symmetrical pattern [92]. The sample points are positioned around a feature point in concentric circles, where each sample point represents a Gaussian blur of its surrounding pixels. The orientation of the feature point is determined by comparing the sample points on opposite sides of the descriptor pattern. The pairs of points are separated into two subsets, long-distance and short-distance pairs. The vector displacements between the compared long-distance pairs are stored and weighted by the relative gradient. The weighted vectors are then averaged to obtain the dominant gradient direction of the image patch. Finally, the short-distance pairs are used to build the binary descriptor after rotating the sampling pattern.

A.2.0.11 Fast retina keypoint descriptor

The Fast Retina Keypoint (FREAK) descriptor proposed by Alahi et al. [8] is based on the basic concept of BRIEF. In a similar manner to BRIEF, it obtains the feature point orientation by taking the sum of the estimated gradients over selected point pairs, with the additional step of using a cascade approach. In contrast to the BRIEF, ORB and BRISK descriptors, the FREAK descriptor uses a specific sampling point pattern that is biologically inspired by the retinal pattern of the eye. This sampling pattern uses a coarse-to-fine approach in which the averaging areas overlap and become more concentrated near the feature point, while becoming less concentrated further away from it. The authors suggest that this approach leads to a more accurate feature point descriptor. Furthermore, they suggest that the computation time is decreased by first comparing the point pairs with the most distinctive characteristics in the neighbourhood.
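A minimal sketch of FREAK description with OpenCV (opencv-contrib); the keypoints can come from any detector, and the use of FAST here, as well as the file name, is purely illustrative.

```python
import cv2

# Sketch of FREAK description; keypoints are supplied by a separate detector.
gray = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)
keypoints = cv2.FastFeatureDetector_create().detect(gray, None)
freak = cv2.xfeatures2d.FREAK_create()
keypoints, descriptors = freak.compute(gray, keypoints)
```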

A.3 Image matching

When matching distinctive characteristics between images, descriptors are very useful. To match these descriptors, as an alternative to the FLANN-based matcher, the Brute-force matcher can be used. In the following section, this feature matcher will be described.

A.3.0.12 Brute-force matcher

When searching for a pattern in an image, the set of features belonging to the pattern can be referred to as the query set and the set of features belonging to the image can be referred to as the test set. The more correspondences between the pattern and the image, the higher the probability will be that the pattern can be found in the image.

The brute-force algorithm is an exhaustive search method that, for each feature vector in set A, searches for a corresponding feature vector by comparing it with every vector in set B [88]. The result is a list of correspondences between the two sets.
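A minimal sketch of brute-force matching with OpenCV; the Hamming norm suits binary descriptors such as ORB, whereas an L2 norm would be used for SURF or SIFT, and the image file names are placeholders.

```python
import cv2

# Sketch of brute-force matching between a query set and a test set of descriptors.
orb = cv2.ORB_create()
query = cv2.imread("pattern.png", cv2.IMREAD_GRAYSCALE)
test = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)
_, des_query = orb.detectAndCompute(query, None)
_, des_test = orb.detectAndCompute(test, None)
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des_query, des_test), key=lambda m: m.distance)
```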

Appendix B

Consent form

I, , understand that my participation in this sign language recognition research is solely for the collection of data, to improve the quality of software in general, and I agree to participate. I understand that images in which I can be seen will be used in the documentation of this research, to illustrate the sign language performed and additional information, and that my name will not be attached to the images. I am also aware that I am allowed to withdraw from participating in this research at any time. I furthermore give permission that the video data can be shared in a repository and made available to other researchers on their request for the purpose of academic research only.

For further information, please do not hesitate to contact:
Imran Achmed
Dept of Computer Science
University of the Western Cape
Bellville 7535
Email: [email protected]
Cell: 073205 5114

Video capturer                          Participant
Name:                                   Name:
Signature:                              Signature:
Date:                                   Date:


Appendix C

Dataset compilation

The signs listed in the tables below contain the full vocabulary of the “Fulton School for the Deaf SASL Dictionary” [73]. Groups of signs were identified based on the similarity between signs. Each group of signs was then assigned to one of three classes, namely, gestures containing a single hand, gestures containing both hands without occlusion, and gestures containing both hands with occlusion.

Table C.1: Group 1: These sign gestures contain single hand movements only.

Gestures Containing One Hand Only
Group 1 Yes, bye-bye, goose, spider, beautiful, aeroplane, cut (using scissors), animal, dolphin, salt, another, other, to flush, scissors, kill, Joseph, spider, firefly, bell, borril, mirror, wand, what for, handbag, aeroplane, daughter, wife, boy, child, juice, fried egg, plum, naartjie, gem squash, sponge, sore (pain), money, pen, Christmas, our, he, she, they, this microwave, bank
Group 2 Laugh, telephone, man, king, happy, sad, laugh, dirty, full, cruel, kind, rich, truth, black, Saturday, birthday, pill/tablets, thermometer, pencil, lunch, bull, goat, hen, duck, oil, park, sweet, quiet, red, pink, doesn’t matter, feather, wolf, fox, chicken, oil, frog, ant, nose-stud, sour, sweet, ugly, lie, news, ice-cream, ice-cream cone, peanut butter, fruit, apple, orange, cherry, chemist, flower, sneeze, baptize
Group 3 Voice, who, vinegar, vegetable, tea spoon, prince, candle, dentist, dog, bird, neck, while, coke, cough, camel



Group 4 Mane, rooster, lion, shark, cap/peak, shower, hat, giraffe, whale, fly, Hindu, Muslim, wizard

Group 5 Go, Sunday, bakkie, lorry, tennis, fish, tadpole, work, snake, pan, which, what, fish, fry, plug-basin, young, hand spade, cricket, staff, button, money, chest, powder, hungry, stomach, belly button, sugar, flour, to lock, good, stomach-ache, me, mine, you, your, myself
Group 6 Sorry, name, talk, oral, drink, come, tomorrow, sweet (to eat), sweet (cute), when, beef, God, parents, grandparents, grandmother, grandfather, niece, granddaughter, bubble-gum, chewing gum, strawberry, lemon, breakfast, lunch, supper, carrot, facecloth, make-up, lipstick, uncle, aunt, all, key, kettle, pour, down
Group 7 Why, we, feather duster, Friday, hospital, clinic, doctor, Christian, jewellery, necklace, mother, princess, seat-belt, fast
Group 8 Plug-wall plug, short
Group 9 Go (using a flag), water, a drill, vomit
Group 10 Hearing person, deaf person, hearing aid, sand, to show, shell, ear, nephew, naughty, morning
Group 11 Hair, head, television, music, radio (2), interesting, easy, boring, eye, eyebrow, eyelashes, father, granny, grandson, onion, evening
Group 12 Girl, first, yellow, orange, Monday, Tuesday, Wednesday, Thursday, stable
Group 13 Please, thank you, teeth, tomato sauce, apple juice, tomato, right, spit out, powder, hair brush, comb, coat hanger, sun, sunny, lightning
Group 14 Hello, hair dryer, temperature, headache
Group 15 Elephant
Group 16 That
Group 17 Bucket
Group 18 Fish hook

Table C.2: Group 2: These sign gestures contain the movement of both hands without any interaction or occlusion.

Gestures Containing Two Hands and No Occlusion

Group 1 Wake up, queen, devil, cry, visit, glasses, snorkel, owl, cry, awake

Group 2 Give, crab, clothes, dress, pull-over, oven, pot, stool, taps, sheets, blanket, duvet, piano, vase (1), different, today, grass, bush, garage, soccer, trophy, swim, present (gift), wrapping paper, biscuit, happy (as in birthday song), material, fine/feeling well, coffin, dentist chair, box, sticky tape, bless, puppy, lizard, feet, toys, bake, boil, cold, tackles, vest, gumboots, untidy (2), bitter, wheel, train, act, dance, to pass away, rinse
Group 3 Kiss, balloon, cat, kitten, loud
Group 4 Tracksuit-top, tie, milk, camping
Group 5 Hat (with brim), loud, rain, earrings, bus, helmet, rabbit
Group 6 Run, play, grated cheese, grape juice, egg, popcorn, bathroom
Group 7 Any, plate, dustpan, hand broom, kitchen, bubbles, curtains, mattress, praise
Group 8 To sign, soft, die, cold water, now, water, splash, foam, disappear, tent, mayor
Group 9 Purse, nipples, breasts, children, jelly, grapes, watermelon, fire, couch, lounge, radio (1), people, narrow, many, difficult, hard, soft, transport, bicycle, scooter, motorbike, fireman, fingers, light, number, count, love, assault, lord, “kraal”
Group 10 Scrambled eggs, boiled egg, macaroni, avocado, bowl, colour, small, short
Group 11 Dress, skirt, bath mat, bath, arm chair, beach, sand, costume (swimming), bikini, born, funeral, fat, tractor, swimming, bench, a plank
Group 12 Hail, snow
Group 13 Opposites, thin, wet, dry, waves, foam, life, wind, windy, earthquake, road
Group 14 Change, heavy, tidy, untidy, car, car lights, fork, watering can, bucket, spade, basket, station
Group 15 Fishing rod, reel, paddle-ski, concert
Group 16 Big, wide, bright, fairy, sea gull, angel
Group 17 Finish (with determination), milkshake, week-end
Group 18 Boxing, wrestling, deck-chair, holly
Group 19 Cow, calf, donkey, rabbit, bear, snail, teddy bear, pillow, loud, weet-bix, rice-crispies, Worcester sauce, hut


Group 20 Pot lid, jug, broom

Group 21 Themes, weather, clouds, rain, misty, snowing, stars, sky, morning, light, thunder, bus, flashing lights, weight lifting, gym, party, board, holiday, religious celebrations
Group 22 Pope, ambulance, reindeer, decorations, christmas tree
Group 23 Toast, grill, toilet paper, above, below, under, rough, full, shallow, prize, bandage, Easter, on, smooth, brown
Group 24 Hamburger, cabbage, shampoo, card (greeting), mealie, dinner, sour, restaurant, eat, build
Group 25 Photograph, hairdresser, camera, magic, paint brush, video-camera
Group 26 Razor, light the fire, injection
Group 27 Put on clothes, take off clothes, cook, stop, colour, finish, pyjamas, apron, pegs, piano, newspaper, bridge, double garage, gala, castle
Group 28 What are you doing?
Group 29 Went, golf, a hammer
Group 30 Crocodile, hippo, chisel, ladder
Group 31 Ostrich, octopus
Group 32 Seal, penguin, where, girl panties, boys underpants, shorts, tracksuit pants, pockets, train, motorbike, toy car, scooter, lawnmower
Group 33 Dressing gown, dust (2)
Group 34 Shoulders, dark, light, night, breeze, nurse, sister, stage, Mary
Group 35 Skateboard, blunt, sharp, spade, dig, skate-boarding, surf
Group 36 Hot chips, hot plates, nurse/sister
Group 37 Basket
Group 38 Electricity, dust (1)
Group 39 Towel
Group 40 Prime minister
Group 41 Minister/priest
Group 42 Traffic policeman, teacher, Koki-pen
Group 43 Over


Group 44 Old

Group 45 Storm

Group 46 Fish pond, sewing, sewing machine, operation

Group 47 Sea foam
Group 48 Scuba-dive
Group 49 Sick
Group 50 Stethoscope
Group 51 Fire-station
Group 52 Story, paper, braai
Group 53 Baptize (full immersion in a pool or river)
Group 54 Sleigh

Table C.3: Group 3: These sign gestures contain the movement of both hands that includes the interaction and occlusion between hands.

Gestures Containing Two Hands With Occlusion
Group 1 Stand, sit, lie down, hop, tidy up, to colour, finish, paint, socks, long-sleeved shirt, stamp, principle, glue, jump, fall, toothpaste, clean, purple, a screw, bee, grasshopper, mosquito, credit card, jam, pepper, chocolate, pear, tea-time, this (very specific), soap, cream, chair, silver, seed, petrol, tea, butternut, slow, in, inside, blue, bottle, Christmas cake, x-ray
Group 2 No, wrong, engine, taxi, dear (as in birthday song), love, change, cotton, tape measure, porcupine, squirrel, bat, butterfly, beetle, dragonfly, short-sleeved shirt
Group 3 Web, few, dark, jersey
Group 4 Porridge, cereal, letter, soup, yoghurt, cup, saucer, fun, to sew, medicine, crayons
Group 5 Sheep, bracelet, arm, doll, baby, coke (1), lamb, suntan cream, computer, bones


Group 6 A rake, hosepipe, cake, fire-engine, flowers (as a gift)

Group 7 Painting, river, paint-2 (to paint), Jesus

Group 8 Policeman, jail (2)

Group 9 Cheese, sandwich, baby Jesus
Group 10 Begin, pony, umbrella, coffee, stove, wash, washing machine, basin, cupboard, carpet, rug, a year, police, games, match, sangoma, cracker, rat, family, turtle
Group 11 Hand, thumb, toe, juice, hot dog, nail brush, afternoon, a month, calendar, van, cousin, husband, knit
Group 12 House, home, vase (2), team, wool, window, room, lift, dining room, garden, envelope, ruler
Group 13 Wallet, rice, poached egg, omelette, peach, dishwasher, knife, fork, a shelf, few, sports, woodwork, wood, jacket, cross, bun, bicycle, ship, pie, chips, floor, fridge, deep freeze, toilet, to poo, blue, green, gate, race, caravan
Group 14 Sleep, bed, asleep
Group 15 Pray, wash, shoes, boat, lettuce, book, church, jail (1), same, rocket, sport, needle, plaster, school, school bag, temple, mosque, valentine
Group 16 Cut (using a knife), potato, pea, bean, tow truck, boat, fishing
Group 17 Farm
Group 18 Wild, tortoise, peacock, how many, how, brother, marmite, puffed wheat, sausage, ceiling, brickets/coal, wood, shelf, strong, long, short, secateurs, ship
Group 19 Rhino, warthog, handkerchief, ball
Group 20 Mole, mouse
Group 21 Stockings, leg, knee
Group 22 Glove
Group 23 Slippers, sandals, slops, ring, bread, butter, meat
Group 24 Yacht, flag
Group 25 Banana
Group 26 Pineapple, cauliflower
Group 27 Broccoli, pumpkin, lettuce


Group 28 Door, open door, close door, friend

Group 29 Corner, wall

Group 30 Escalator, branch, sea-weed, card (birthday)

Group 31 Chimney, light, standing lamp, smoke, switch on light, switch off, tall, up, gold, yesterday, basketball, go (using a gun)
Group 32 Lamp, week, helicopter
Group 33 Rubbish bin, table, a week
Group 34 Day, post office, palm tree
Group 35 Empty
Group 36 Open, close, volcano
Group 37 Deep, leaf
Group 38 Poor, tree
Group 39 Next week
Group 40 Moon, rainbow
Group 41 Rugby, hockey, blood, bleeding
Group 42 Table tennis
Group 43 To score a goal
Group 44 Goal posts
Group 45 Win, kick
Group 46 Sea, chain
Group 47 Bait, to saw, a nail
Group 48 Save, a saw, a spanner
Group 49 Bible
Group 50 Bag
Group 51 Manger/crib
Group 52 Lamb chop
Group 53 Prize

Table C.4: The selected groups of signs used to evaluate the prototypes.

Gestures containing a single hand | Gestures containing both hands without occlusion | Gestures containing both hands with occlusion
Group 1, Sign 1 | Group 2, Sign 11 | Group 1, Sign 21
Group 2, Sign 2 | Group 3, Sign 12 | Group 2, Sign 22
Group 5, Sign 3 | Group 5, Sign 13 | Group 10, Sign 23
Group 7, Sign 4 | Group 7, Sign 14 | Group 12, Sign 24
Group 9, Sign 5 | Group 8, Sign 15 | Group 16, Sign 25
Group 12, Sign 6 | Group 20, Sign 16 | Group 18, Sign 26
Group 13, Sign 7 | Group 23, Sign 17 | Group 21, Sign 27
Group 14, Sign 8 | Group 26, Sign 18 | Group 25, Sign 28
Group 16, Sign 9 | Group 28, Sign 19 | Group 28, Sign 29
Group 17, Sign 10 | Group 49, Sign 20 | Group 53, Sign 30

Appendix D

Additional results

Table D.1: Evaluation results using a threshold of 10, 15 and 20 pixels from the center of the hand.

Person | 10 pixels | 15 pixels | 20 pixels | Standard deviation
Person A | 89.09% | 98.18% | 100.00% | 4.77%
Person B | 64.38% | 76.71% | 79.45% | 6.55%
Person C | 96.43% | 99.11% | 100.00% | 1.52%
Person D | 38.89% | 40.28% | 38.89% | 0.65%
Person E | 37.80% | 43.90% | 47.56% | 4.02%
Person F | 97.67% | 98.84% | 100.00% | 0.95%
Person G | 81.25% | 95.83% | 100.00% | 8.04%
Person H | 100.00% | 100.00% | 100.00% | 0%
Person J | 98.78% | 98.78% | 100.00% | 0.57%
Person K | 92.42% | 98.48% | 98.48% | 2.86%
Average | 79.67% | 85.01% | 86.44% | 2.91%


D.1 Tracking

Figure D.1: The average hand-tracking success rates of prototype T for each participant in both environments.

Figure D.2: A comparison of the average hand-tracking success rates of prototype T for each participant in constrained and unconstrained environments.

Table D.2: A comparison of sign gestures using prototype T in constrained and unconstrained environments.

Constrained Unconstrained Average

Sign 1 76.42% 75.46% 75.94%

Sign 2 63.97% 69.72% 66.84% Sign 3 76.26% 86.69% 81.48% Sign 4 75.47% 72.72% 74.09% Sign 5 76.00% 72.66% 74.33% Sign 6 76.84% 74.42% 75.63% Sign 7 81.78% 60.64% 71.21% Sign 8 80.99% 62.41% 71.70% Sign 9 66.07% 65.41% 65.74% Sign 10 72.85% 82.14% 77.49% Sign 11 74.62% 86.17% 80.40% Sign 12 53.47% 60.19% 56.83% Sign 13 66.07% 58.33% 62.20% Sign 14 76.08% 78.11% 77.09% Sign 15 73.72% 77.84% 75.78% Sign 16 71.54% 62.86% 67.20% Sign 17 64.41% 65.96% 65.18% Sign 18 76.54% 62.68% 69.61% Sign 19 76.37% 71.40% 73.88% Sign 20 54.73% 48.83% 51.78% Sign 21 62.38% 66.01% 64.20% Sign 22 67.36% 62.96% 65.16% Sign 23 54.53% 75.26% 64.89% Sign 24 68.64% 64.33% 66.49% Sign 25 57.45% 71.62% 64.54% Sign 26 64.25% 57.10% 60.68% Sign 27 62.68% 62.44% 62.56% Sign 28 58.88% 60.29% 59.58% Sign 29 59.78% 68.17% 63.98% Sign 30 56.77% 53.22% 54.99% Appendix D. Additional results 172

Table D.3: A comparison of participants using prototype T in constrained and unconstrained environments.

Constrained Unconstrained Average

Person A 55.40% 70.71% 63.06%

Person B 33.77% 82.20% 57.98% Person C 90.70% 57.77% 74.24% Person D 92.56% 81.18% 86.87% Person E 91.30% 75.34% 83.32% Person F 94.91% 66.84% 80.87% Person G 82.75% 62.07% 72.41% Person H 73.68% 69.58% 71.63% Person I 64.99% 75.79% 70.39% Person J 55.37% 65.34% 60.36% Person K 34.26% 72.34% 53.30% Person L 72.36% 86.54% 79.45% Person M 76.70% 76.67% 76.69% Person N 58.62% 71.13% 64.88% Person O 38.59% 73.01% 55.80% Person P 86.67% 61.06% 73.86% Person Q 33.61% 33.50% 33.56% Person R 53.60% 50.10% 51.85% Person S 87.35% 50.20% 68.77% Person T 87.43% 75.99% 81.71% Appendix D. Additional results 173

D.2 Detection

Table D.4: Results on different random forest parameters using the original LBP with global histogram features.

Parameters Accuracy

(100,25,4) 68.38%
(1000,100,4) 68.59%
(100,100,8) 69.33%
(100,100,16) 69.76%
(1000,25,16) 69.66%
(100,25,16) 69.76%
(100,25,32) 70.02%
(100,25,64) 70.19%
(100,25,128) 70.10%
(100,50,64) 70.29%
(100,100,64) 70.29%
(1000,50,64) 70.23%
(1000,25,64) 70.23%

Table D.5: Results on random forests using different LBP algorithms on pixels with and without filtering.

Random Forests | Experiment 1 (without filtering) | Experiment 2 (with filtering)
Global LBP, Original LBP | 64.58% | 70.29%
Global LBP, Extended LBP | 65.43% | 72.25%
Global LBP, Uniform LBP | 64.45% | 70.72%
Local LBP, Original LBP | 62.64% | 65.85%
Local LBP, Extended LBP | 63.07% | 66.02%
Local LBP, Uniform LBP | 63.20% | 65.62%
Image LBP, Original LBP | 59.46% | 59.26%
Image LBP, Extended LBP | 59.39% | 59.77%
Image LBP, Uniform LBP | 58.37% | 56.01%

Table D.6: Results on different SVM kernel parameters using the original LBP with global histogram features.

Kernel(Parameters) Accuracy

Linear (gamma=1,0; c=0,1) 47.22%

RBF (gamma=0,000150; c=2,5) 67.40%
Polynomial (degree=2; gamma=0.00001; coef=274,4; c=0,1) 50.17%
Sigmoid (gamma=0,00001; coef=1,4; c=312,5) 55.42%

Table D.7: Results on SVMs using different LBP algorithms on pixels with and without filtering.

Support Vector Machines | Experiment 3 (without filtering) | Experiment 4 (with filtering)
Global LBP, Original LBP | 61.77% | 67.40%
Global LBP, Extended LBP | 66.91% | 69.04%
Global LBP, Uniform LBP | 64.68% | 62.26%
Local LBP, Original LBP | 51.68% | 57.22%
Local LBP, Extended LBP | 56.52% | 60.37%
Local LBP, Uniform LBP | 56.14% | 59.97%
Image LBP, Original LBP | 56.63% | 51.87%
Image LBP, Extended LBP | 56.46% | 51.57%
Image LBP, Uniform LBP | 60.35% | 59.07%

Figure D.3: The average hand-tracking success rates of prototype DT for each participant in both environments.

Figure D.4: A comparison of the average hand-tracking success rates of prototype DT for each participant in constrained and unconstrained environments.

Table D.8: A comparison of sign gestures using prototype DT in constrained and unconstrained environments.

Constrained Unconstrained Average

Sign 1 84.28% 87.15% 85.71%

Sign 2 78.35% 81.23% 79.79% Sign 3 79.70% 84.34% 82.02% Sign 4 81.76% 79.62% 80.69% Sign 5 90.12% 80.24% 85.18% Sign 6 89.87% 84.66% 87.27% Sign 7 88.32% 72.52% 80.42% Sign 8 89.07% 70.02% 79.54% Sign 9 61.65% 71.07% 66.36% Sign 10 87.06% 89.54% 88.30% Sign 11 81.28% 87.98% 84.63% Sign 12 70.49% 66.82% 68.66% Sign 13 78.17% 68.80% 73.48% Sign 14 83.38% 87.32% 85.35% Sign 15 78.45% 88.67% 83.56% Sign 16 78.64% 77.46% 78.05% Sign 17 66.75% 81.25% 74.00% Sign 18 80.81% 82.49% 81.65% Sign 19 86.16% 87.45% 86.80% Sign 20 57.29% 63.19% 60.24% Sign 21 72.18% 71.16% 71.67% Sign 22 68.62% 63.36% 65.99% Sign 23 57.66% 72.46% 65.06% Sign 24 67.52% 65.42% 66.47% Sign 25 59.66% 67.48% 63.57% Sign 26 66.18% 60.95% 63.57% Sign 27 60.53% 61.54% 61.04% Sign 28 49.11% 60.51% 54.81% Sign 29 46.79% 66.14% 56.46% Sign 30 52.37% 60.48% 56.42% Appendix D. Additional results 177

Table D.9: A comparison of participants using prototype DT in constrained and unconstrained environments.

Constrained Unconstrained Average

Person A 62.71% 62.31% 62.51%

Person B 47.93% 81.43% 64.68% Person C 93.59% 69.80% 81.69% Person D 92.43% 86.37% 89.40% Person E 92.06% 75.82% 83.94% Person F 90.95% 74.63% 82.79% Person G 83.64% 73.72% 78.68% Person H 79.13% 76.01% 77.57% Person I 77.08% 80.85% 78.96% Person J 71.42% 82.16% 76.79% Person K 37.12% 85.67% 61.40% Person L 83.51% 88.22% 85.87% Person M 82.04% 79.07% 80.56% Person N 71.83% 83.95% 77.89% Person O 45.17% 66.30% 55.73% Person P 83.23% 76.71% 79.97% Person Q 41.81% 49.77% 45.79% Person R 60.08% 58.88% 59.48% Person S 88.60% 60.89% 74.75% Person T 77.13% 81.66% 79.39% Appendix D. Additional results 178

D.3 Learning

Figure D.5: The average hand-tracking success rates of prototype DTL for each participant in both environments.

Figure D.6: A comparison of the average hand-tracking success rates of prototype DTL for each participant in constrained and unconstrained environments.

Table D.10: A comparison of sign gestures using prototype DTL in constrained and unconstrained environments.

Constrained Unconstrained Average

Sign 1 90.06% 86.94% 88.50%

Sign 2 86.93% 84.95% 85.94% Sign 3 84.62% 93.20% 88.91% Sign 4 88.89% 89.28% 89.09% Sign 5 91.15% 80.63% 85.89% Sign 6 91.40% 83.38% 87.39% Sign 7 92.39% 81.21% 86.80% Sign 8 91.84% 75.79% 83.82% Sign 9 80.94% 90.28% 85.61% Sign 10 89.14% 89.23% 89.19% Sign 11 86.49% 89.59% 88.04% Sign 12 81.07% 73.80% 77.44% Sign 13 89.64% 74.44% 82.04% Sign 14 90.55% 89.07% 89.81% Sign 15 87.52% 88.20% 87.86% Sign 16 81.47% 85.22% 83.34% Sign 17 87.16% 76.29% 81.73% Sign 18 89.47% 83.13% 86.30% Sign 19 92.82% 89.37% 91.10% Sign 20 73.55% 71.31% 72.43% Sign 21 75.48% 78.48% 76.98% Sign 22 73.46% 70.12% 71.79% Sign 23 72.66% 77.88% 75.27% Sign 24 77.89% 68.79% 73.34% Sign 25 67.31% 68.27% 67.79% Sign 26 74.23% 69.36% 71.79% Sign 27 71.14% 76.55% 73.85% Sign 28 69.15% 66.39% 67.77% Sign 29 65.86% 77.48% 71.67% Sign 30 68.19% 66.25% 67.22% Appendix D. Additional results 180

Table D.11: A comparison of participants using prototype DTL in constrained and unconstrained environments.

Constrained Unconstrained Average

Person A 74.01% 72.97% 73.49%

Person B 69.55% 78.64% 74.10% Person C 92.37% 71.29% 81.83% Person D 92.91% 89.45% 91.18% Person E 89.27% 77.87% 83.57% Person F 91.34% 86.31% 88.83% Person G 90.45% 78.43% 84.44% Person H 89.95% 79.19% 84.57% Person I 81.21% 84.62% 82.92% Person J 76.17% 83.65% 79.91% Person K 70.88% 85.34% 78.11% Person L 88.21% 89.48% 88.84% Person M 79.84% 83.58% 81.71% Person N 75.57% 88.30% 81.93% Person O 64.87% 74.12% 69.49% Person P 88.72% 81.11% 84.91% Person Q 68.15% 67.76% 67.96% Person R 78.95% 67.13% 73.04% Person S 90.82% 75.47% 83.14% Person T 88.40% 81.88% 85.14% Appendix D. Additional results 181

Figure D.7: A comparison between the hand-tracking accuracies of each prototype in the constrained environments.

Figure D.8: A comparison between the hand-tracking accuracies of each prototype in the unconstrained environments.

Table D.12: A list of the non-significant differences between the results of the different hand-detection methods. The differences between the results for the methods not listed in the table are all significant (the full list was excluded as it is much longer). The values in the two combination columns refer to the combinations defined in Table 6.4.

Observation Combination Combination Estimate P-value Adjustment Adjusted p-value

1 1111 1112 -0.06905 0.0066 Tukey-Kramer 0.704 2 1111 1113 0.001847 0.8736 Tukey-Kramer 1 3 1111 1121 0.07589 0.0014 Tukey-Kramer 0.3181 4 1111 1122 0.04954 0.1122 Tukey-Kramer 0.9999 5 1111 1123 0.04042 0.0943 Tukey-Kramer 0.9997 12 1111 1221 -0.0766 0.0746 Tukey-Kramer 0.9989 13 1111 1222 -0.08228 0.0589 Tukey-Kramer 0.9969 14 1111 1223 -0.06716 0.1172 Tukey-Kramer 0.9999 18 1111 2111 0.09936 0.031 Tukey-Kramer 0.9746 19 1111 2112 -0.1252 0.0056 Tukey-Kramer 0.6636 20 1111 2113 -0.02596 0.4798 Tukey-Kramer 1 27 1111 2211 -0.1464 0.0002 Tukey-Kramer 0.0664 29 1111 2213 0.07951 0.0478 Tukey-Kramer 0.9931 36 1112 1113 0.0709 0.0053 Tukey-Kramer 0.6472 39 1112 1123 0.1095 0.0001 Tukey-Kramer 0.053 46 1112 1221 -0.00755 0.8615 Tukey-Kramer 1 47 1112 1222 -0.01323 0.762 Tukey-Kramer 1 48 1112 1223 0.001885 0.9651 Tukey-Kramer 1 52 1112 2111 0.1684 0.0002 Tukey-Kramer 0.0726 53 1112 2112 -0.05614 0.2141 Tukey-Kramer 1 54 1112 2113 0.04309 0.2534 Tukey-Kramer 1 61 1112 2211 -0.07734 0.0491 Tukey-Kramer 0.9937 62 1112 2212 -0.1538 0.0002 Tukey-Kramer 0.0733 63 1112 2213 0.1486 0.0002 Tukey-Kramer 0.0816 70 1113 1121 0.07404 0.0018 Tukey-Kramer 0.3749 71 1113 1122 0.0477 0.1238 Tukey-Kramer 1 72 1113 1123 0.03857 0.1077 Tukey-Kramer 0.9999 Appendix D. Additional results 184

79 1113 1221 -0.07845 0.0685 Tukey-Kramer 0.9984

80 1113 1222 -0.08412 0.0537 Tukey-Kramer 0.9955

81 1113 1223 -0.06901 0.1085 Tukey-Kramer 0.9999 85 1113 2111 0.09751 0.0348 Tukey-Kramer 0.9813

86 1113 2112 -0.127 0.005 Tukey-Kramer 0.633 87 1113 2113 -0.02781 0.4515 Tukey-Kramer 1 94 1113 2211 -0.1482 0.0002 Tukey-Kramer 0.0599 96 1113 2213 0.07766 0.0532 Tukey-Kramer 0.9953 103 1121 1122 -0.02634 0.3603 Tukey-Kramer 1 104 1121 1123 -0.03547 0.0081 Tukey-Kramer 0.7566 105 1121 1131 0.1144 0.0007 Tukey-Kramer 0.1874 106 1121 1132 0.09403 0.005 Tukey-Kramer 0.6334 111 1121 1221 -0.1525 0.0004 Tukey-Kramer 0.122 112 1121 1222 -0.1582 0.0003 Tukey-Kramer 0.0944 113 1121 1223 -0.1431 0.0009 Tukey-Kramer 0.2267 114 1121 1231 0.1294 0.001 Tukey-Kramer 0.25 115 1121 1232 0.1073 0.0069 Tukey-Kramer 0.7169 117 1121 2111 0.02347 0.6073 Tukey-Kramer 1 119 1121 2113 -0.1018 0.0061 Tukey-Kramer 0.683 125 1121 2133 0.08427 0.0169 Tukey-Kramer 0.909 128 1121 2213 0.00362 0.9282 Tukey-Kramer 1 130 1121 2222 0.0825 0.0367 Tukey-Kramer 0.9839 131 1121 2223 0.1002 0.0247 Tukey-Kramer 0.9561 134 1121 2233 0.1373 0.0012 Tukey-Kramer 0.2902 135 1122 1123 -0.00913 0.7525 Tukey-Kramer 1 137 1122 1132 0.1204 0.0007 Tukey-Kramer 0.2046 142 1122 1221 -0.1261 0.0031 Tukey-Kramer 0.5012 143 1122 1222 -0.1318 0.0024 Tukey-Kramer 0.4376 144 1122 1223 -0.1167 0.0062 Tukey-Kramer 0.6889 146 1122 1232 0.1337 0.0008 Tukey-Kramer 0.2208 148 1122 2111 0.04982 0.2687 Tukey-Kramer 1 150 1122 2113 -0.0755 0.0506 Tukey-Kramer 0.9944 Appendix D. Additional results 185

156 1122 2133 0.1106 0.0028 Tukey-Kramer 0.4755

159 1122 2213 0.02996 0.4545 Tukey-Kramer 1

161 1122 2222 0.1088 0.0065 Tukey-Kramer 0.702 162 1122 2223 0.1266 0.0039 Tukey-Kramer 0.5618

172 1123 1221 -0.117 0.0064 Tukey-Kramer 0.6971 173 1123 1222 -0.1227 0.0048 Tukey-Kramer 0.6194 174 1123 1223 -0.1076 0.0122 Tukey-Kramer 0.8508 176 1123 1232 0.1428 0.0004 Tukey-Kramer 0.119 178 1123 2111 0.05894 0.1939 Tukey-Kramer 1 179 1123 2112 -0.1656 0.0002 Tukey-Kramer 0.0753 180 1123 2113 -0.06638 0.0764 Tukey-Kramer 0.999 186 1123 2133 0.1197 0.0008 Tukey-Kramer 0.2107 189 1123 2213 0.03909 0.3332 Tukey-Kramer 1 191 1123 2222 0.118 0.0028 Tukey-Kramer 0.4787 192 1123 2223 0.1357 0.0024 Tukey-Kramer 0.4352 196 1131 1132 -0.02035 0.1819 Tukey-Kramer 1 204 1131 1231 0.01499 0.6851 Tukey-Kramer 1 205 1131 1232 -0.00707 0.8508 Tukey-Kramer 1 207 1131 2111 -0.09091 0.0511 Tukey-Kramer 0.9946 213 1131 2131 0.1232 0.0218 Tukey-Kramer 0.9429 214 1131 2132 0.1301 0.0155 Tukey-Kramer 0.8946 215 1131 2133 -0.03011 0.3227 Tukey-Kramer 1 218 1131 2213 -0.1108 0.0039 Tukey-Kramer 0.5647 219 1131 2221 0.09891 0.0425 Tukey-Kramer 0.9898 220 1131 2222 -0.03189 0.3842 Tukey-Kramer 1 221 1131 2223 -0.01415 0.7559 Tukey-Kramer 1 224 1131 2233 0.02291 0.5863 Tukey-Kramer 1 232 1132 1231 0.03534 0.3409 Tukey-Kramer 1 233 1132 1232 0.01328 0.7249 Tukey-Kramer 1 235 1132 2111 -0.07056 0.1269 Tukey-Kramer 1 241 1132 2131 0.1435 0.007 Tukey-Kramer 0.7209 242 1132 2132 0.1504 0.0047 Tukey-Kramer 0.6174 Appendix D. Additional results 186

243 1132 2133 -0.00976 0.7537 Tukey-Kramer 1

246 1132 2213 -0.09041 0.0188 Tukey-Kramer 0.9243

247 1132 2221 0.1193 0.0141 Tukey-Kramer 0.8787 248 1132 2222 -0.01154 0.7563 Tukey-Kramer 1

249 1132 2223 0.006202 0.8915 Tukey-Kramer 1 252 1132 2233 0.04326 0.3021 Tukey-Kramer 1 259 1133 1231 -0.03073 0.4005 Tukey-Kramer 1 260 1133 1232 -0.05279 0.1577 Tukey-Kramer 1 261 1133 1233 0.1215 0.0006 Tukey-Kramer 0.1661 262 1133 2111 -0.1366 0.0033 Tukey-Kramer 0.5217 266 1133 2122 0.08092 0.0102 Tukey-Kramer 0.8118 267 1133 2123 0.09648 0.0037 Tukey-Kramer 0.5528 268 1133 2131 0.07746 0.1509 Tukey-Kramer 1 269 1133 2132 0.08438 0.118 Tukey-Kramer 0.9999 270 1133 2133 -0.07583 0.0106 Tukey-Kramer 0.8197 274 1133 2221 0.05319 0.2761 Tukey-Kramer 1 275 1133 2222 -0.07761 0.0324 Tukey-Kramer 0.9772 276 1133 2223 -0.05987 0.1878 Tukey-Kramer 1 279 1133 2233 -0.02281 0.5859 Tukey-Kramer 1 280 1211 1212 -0.09769 0.0003 Tukey-Kramer 0.0956 281 1211 1213 -0.01327 0.3468 Tukey-Kramer 1 289 1211 2112 0.1571 0.0003 Tukey-Kramer 0.1008 298 1211 2212 0.05937 0.0865 Tukey-Kramer 0.9995 306 1212 1213 0.08441 0.0016 Tukey-Kramer 0.3468 347 1213 2212 0.07264 0.036 Tukey-Kramer 0.9831 355 1221 1222 -0.00568 0.8266 Tukey-Kramer 1 356 1221 1223 0.009438 0.5303 Tukey-Kramer 1 361 1221 2112 -0.04858 0.2617 Tukey-Kramer 1 362 1221 2113 0.05064 0.2147 Tukey-Kramer 1 369 1221 2211 -0.06979 0.0933 Tukey-Kramer 0.9997 370 1221 2212 -0.1463 0.0007 Tukey-Kramer 0.2048 378 1222 1223 0.01511 0.5591 Tukey-Kramer 1 Appendix D. Additional results 187

383 1222 2112 -0.04291 0.3274 Tukey-Kramer 1

384 1222 2113 0.05632 0.1902 Tukey-Kramer 1

391 1222 2211 -0.06411 0.1363 Tukey-Kramer 1 392 1222 2212 -0.1406 0.0013 Tukey-Kramer 0.2949

404 1223 2112 -0.05802 0.1771 Tukey-Kramer 1 405 1223 2113 0.0412 0.3125 Tukey-Kramer 1 412 1223 2211 -0.07923 0.0572 Tukey-Kramer 0.9965 413 1223 2212 -0.1557 0.0003 Tukey-Kramer 0.107 421 1231 1232 -0.02206 0.2317 Tukey-Kramer 1 423 1231 2111 -0.1059 0.0163 Tukey-Kramer 0.9035 427 1231 2122 0.1117 0.0028 Tukey-Kramer 0.4759 428 1231 2123 0.1272 0.0008 Tukey-Kramer 0.209 429 1231 2131 0.1082 0.0212 Tukey-Kramer 0.9397 430 1231 2132 0.1151 0.0142 Tukey-Kramer 0.88 431 1231 2133 -0.0451 0.1757 Tukey-Kramer 1 435 1231 2221 0.08392 0.0427 Tukey-Kramer 0.9899 436 1231 2222 -0.04688 0.2737 Tukey-Kramer 1 437 1231 2223 -0.02914 0.4242 Tukey-Kramer 1 440 1231 2233 0.007918 0.7897 Tukey-Kramer 1 442 1232 2111 -0.08384 0.055 Tukey-Kramer 0.9959 446 1232 2122 0.1337 0.0004 Tukey-Kramer 0.135 448 1232 2131 0.1302 0.0049 Tukey-Kramer 0.6241 449 1232 2132 0.1372 0.003 Tukey-Kramer 0.4973 450 1232 2133 -0.02304 0.4977 Tukey-Kramer 1 453 1232 2213 -0.1037 0.0004 Tukey-Kramer 0.1286 454 1232 2221 0.106 0.009 Tukey-Kramer 0.7841 455 1232 2222 -0.02482 0.5635 Tukey-Kramer 1 456 1232 2223 -0.00708 0.8449 Tukey-Kramer 1 459 1232 2233 0.02998 0.3319 Tukey-Kramer 1 463 1233 2121 0.1556 0.0004 Tukey-Kramer 0.1304 464 1233 2122 -0.04056 0.2594 Tukey-Kramer 1 465 1233 2123 -0.025 0.4958 Tukey-Kramer 1 Appendix D. Additional results 188

466 1233 2131 -0.04402 0.3579 Tukey-Kramer 1

467 1233 2132 -0.0371 0.4385 Tukey-Kramer 1

472 1233 2221 -0.06829 0.1051 Tukey-Kramer 0.9998 479 2111 2113 -0.1253 0.0023 Tukey-Kramer 0.4242

485 2111 2133 0.0608 0.1835 Tukey-Kramer 1 488 2111 2213 -0.01985 0.6431 Tukey-Kramer 1 490 2111 2222 0.05902 0.185 Tukey-Kramer 1 491 2111 2223 0.07676 0.057 Tukey-Kramer 0.9964 494 2111 2233 0.1138 0.0056 Tukey-Kramer 0.6608 495 2112 2113 0.09922 0.0166 Tukey-Kramer 0.9057 502 2112 2211 -0.0212 0.6158 Tukey-Kramer 1 503 2112 2212 -0.0977 0.0189 Tukey-Kramer 0.9252 517 2113 2211 -0.1204 0.0005 Tukey-Kramer 0.1548 519 2113 2213 0.1055 0.0044 Tukey-Kramer 0.5991 537 2121 2231 -0.00937 0.8414 Tukey-Kramer 1 538 2121 2232 0.002554 0.9567 Tukey-Kramer 1 540 2122 2123 0.01556 0.6265 Tukey-Kramer 1 541 2122 2131 -0.00346 0.9462 Tukey-Kramer 1 542 2122 2132 0.00346 0.9463 Tukey-Kramer 1 547 2122 2221 -0.02773 0.5609 Tukey-Kramer 1 549 2122 2223 -0.1408 0.0017 Tukey-Kramer 0.3514 552 2122 2233 -0.1037 0.0131 Tukey-Kramer 0.865 553 2123 2131 -0.01902 0.7021 Tukey-Kramer 1 554 2123 2132 -0.0121 0.8081 Tukey-Kramer 1 559 2123 2221 -0.04329 0.3452 Tukey-Kramer 1 561 2123 2223 -0.1563 0.0004 Tukey-Kramer 0.1411 564 2123 2233 -0.1193 0.0045 Tukey-Kramer 0.6018 565 2131 2132 0.006921 0.3827 Tukey-Kramer 1 566 2131 2133 -0.1533 0.0039 Tukey-Kramer 0.5619 570 2131 2221 -0.02427 0.4146 Tukey-Kramer 1 571 2131 2222 -0.1551 0.0011 Tukey-Kramer 0.2699 572 2131 2223 -0.1373 0.0001 Tukey-Kramer 0.0567 Appendix D. Additional results 189

573 2131 2231 0.1903 0.0006 Tukey-Kramer 0.1839

574 2131 2232 0.2022 0.0003 Tukey-Kramer 0.0982

575 2131 2233 -0.1003 0.015 Tukey-Kramer 0.8889 576 2132 2133 -0.1602 0.0026 Tukey-Kramer 0.4565

580 2132 2221 -0.03119 0.2942 Tukey-Kramer 1 581 2132 2222 -0.162 0.0007 Tukey-Kramer 0.1876 583 2132 2231 0.1834 0.001 Tukey-Kramer 0.2554 584 2132 2232 0.1953 0.0005 Tukey-Kramer 0.1453 585 2132 2233 -0.1072 0.0094 Tukey-Kramer 0.7947 588 2133 2213 -0.08065 0.0326 Tukey-Kramer 0.9776 589 2133 2221 0.129 0.0078 Tukey-Kramer 0.7476 590 2133 2222 -0.00178 0.9597 Tukey-Kramer 1 591 2133 2223 0.01596 0.7254 Tukey-Kramer 1 594 2133 2233 0.05302 0.184 Tukey-Kramer 1 595 2211 2212 -0.07649 0.0144 Tukey-Kramer 0.8816 611 2213 2222 0.07888 0.0539 Tukey-Kramer 0.9956 612 2213 2223 0.09662 0.0093 Tukey-Kramer 0.7913 615 2213 2233 0.1337 0.0002 Tukey-Kramer 0.0714 616 2221 2222 -0.1308 0.0052 Tukey-Kramer 0.6432 617 2221 2223 -0.1131 0.001 Tukey-Kramer 0.2456 620 2221 2233 -0.076 0.0435 Tukey-Kramer 0.9905 621 2222 2223 0.01774 0.6958 Tukey-Kramer 1 624 2222 2233 0.05479 0.2217 Tukey-Kramer 1 627 2223 2233 0.03705 0.3181 Tukey-Kramer 1 628 2231 2232 0.01192 0.3661 Tukey-Kramer 1

Appendix E

Publications

This section lists publications originating from parts of work in this thesis:

• I. Achmed, I.M. Venter and P. Eisert, “A Skin-Invariant Modeling Framework”, in Journal of Visual Communication and Image Representation, May 2013. (Sub- mitted)

• I. Achmed, I.M. Venter and P. Eisert, “Improved Hand-Tracking Framework with a Recovery Mechanism”, in Proceedings of the Southern Africa Telecommunication Networks and Applications Conference, Stellenbosch, South Africa, 1-4 September , pp. 185-190, 2013. (Best Paper Award)

• I. Achmed, I.M. Venter and P. Eisert, “A Framework for Independent Hand Tracking in Unconstrained Environments”, in Proceedings of the Southern Africa Telecommunications, Networks and Application Conference, George, South Africa, 2-5 September, pp. 159-164, 2012.

• I. Achmed and I.M. Venter, “A Discriminative Approach to South African Sign Language Recognition from a Monocular 2D View”, in Proceedings of the Southern Africa Telecommunications, Networks and Application Conference, East London, South Africa, 4-7 September, 2011.

The following lists presentations resulting from parts of work in this thesis:


• I. Achmed, I.M. Venter and P. Eisert, “Independent hand-tracking from a single 2D view: its application to South African sign language”, Post-graduate Research Open Day, New Science Auditorium, University of the Western Cape, Cape Town, South Africa, 30 October, 2013.

• I. Achmed, I.M. Venter, “A discriminative approach in hand-tracking towards the recognition of South African sign language”, in Design, Development and Research Conference Colloquium, Cape Peninsula University of Technology, Cape Town, South Africa, 26–27 September, 2011.

Bibliography

[1] I. Achmed, “Upper body pose recognition and estimation towards the translation of South African sign language,” Master’s Thesis, Computer Science, University of the Western Cape, Cape Town, South Africa, December 2010.

[2] I. Achmed and J. Connan, “Upper body pose estimation towards the translation of South African sign language,” in Proceedings of the Southern Africa Telecom- munication Networks and Applications Conference, Stellenbosch, South Africa, September 2010, pp. 427–432.

[3] I. Achmed, I. Venter, and P. Eisert, “A dynamic skin-invariant modelling framework,” May 2013, submitted for publication. [Online]. Available: http://www.cs.uwc.ac.za/∼iachmed/Imran Achmed Journal Article.pdf

[4] I. Achmed, I. M. Venter, and P. Eisert, “A framework for independent hand tracking in unconstrained environments,” in Proceedings of the Southern Africa Telecommunication Networks and Applications Conference, George, South Africa, September 2012, pp. 159–164.

[5] M. Agrawal, K. Konolige, and M. R. Blas, “Censure: Center surround extremas for realtime feature detection and matching,” in European Conference on Computer Vision. Marseille, France: Springer, 2008, pp. 102–115.

[6] T. Ahonen, A. Hadid, and M. Pietikäinen, “Face recognition with local binary patterns,” in European Conference on Computer Vision. Prague, Czech Republic: Springer, May 2004, pp. 469–481.

[7] J. Aker and I. Mbiti, “Mobile phones and economic development in Africa,” The Journal of Economic Perspectives, vol. 24, no. 3, pp. 207–232, 2010.


[8] A. Alahi, R. Ortiz, and P. Vandergheynst, “Freak: Fast retina keypoint,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), USA, June 2012, pp. 510–517.

[9] A. A. Argyros and M. I. Lourakis, “Real-time tracking of multiple skin-colored objects with a possibly moving camera,” in European Conference on Computer Vision. Prague, Czech Republic: Springer, May 2004, pp. 368–379.

[10] A. A. Argyros and M. I. Lourakis, “Tracking multiple colored blobs with a moving camera,” in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2. San Diego, CA, USA: IEEE, June 2005, p. 1178.

[11] A. Aristidou and J. Lasenby, “Motion capture with constrained inverse kinematics for real-time hand tracking,” in Fourth International Symposium on Communica- tions, Control and Signal Processing (ISCCSP). IEEE, 2010, pp. 1–5.

[12] L. Auret, “Process monitoring and fault diagnosis using random forests,” Ph.D. dissertation, University of Stellenbosch, South Africa, December 2010.

[13] E. Babbie, The basics of social research. Cengage Learning, 2008.

[14] D. L. Baggio, S. Emami, D. M. Escrivá, K. Ievgen, N. Mahmood, J. Saragih, and R. Shilkrot, Mastering OpenCV with Practical Computer Vision Projects. Packt Publishing, Limited, 2012.

[15] I. Barandiaran, M. Graña, and M. Nieto, “An empirical evaluation of interest point detectors,” Cybernetics and Systems, vol. 44, no. 2-3, pp. 98–117, 2013.

[16] M. Bauml and R. Stiefelhagen, “Evaluation of local features for person re- identification in image sequences,” in Eight IEEE International Conference on Advanced Video and Signal-Based Surveillance. IEEE, 2011, pp. 291–296.

[17] H. Bay, T. Tuytelaars, and L. Van Gool, “Surf: Speeded up robust features,” in European Conference on Computer Vision. Graz, Austria: Springer, May 2006, pp. 404–417.

[18] J. Becker and B. Niehaves, “Epistemological perspectives on IS research: a framework for analysing and systematizing epistemological assumptions,” Information Systems Journal, vol. 17, no. 2, pp. 197–214, 2007.

[19] M. Berbar, “Novel colors correction approaches for natural scenes and skin detection techniques,” International Journal of Video and Image Processing and Network Security, vol. 11, no. 2, 2011.

[20] G. Bradski and A. Kaehler, Learning OpenCV: Computer vision with the OpenCV library. O’Reilly Media, 2008.

[21] L. Breiman, “Random forests,” Machine learning, vol. 45, no. 1, pp. 5–32, 2001.

[22] A. Bryman and E. Bell, Business research methods. Oxford University Press, USA, 2007.

[23] P. Buehler, “Automatic learning of British sign language from signed tv broad- casts,” PhD Dissertation, University of Oxford, Oxford, 2010.

[24] M. Calonder, V. Lepetit, C. Strecha, and P. Fua, “BRIEF: Binary robust indepen- dent elementary features,” in European Conference on Computer Vision. Crete, Greece: Springer, 2010, pp. 778–792.

[25] L. Cerman and V. Hlaváč, “Tracking with context as a semi-supervised learning and labeling problem,” in 21st International Conference on Pattern Recognition. IEEE, 2012, pp. 2124–2127.

[26] L. Cerman, J. Matas, and V. Hlaváč, “Sputnik tracker: Having a companion improves robustness of the tracker,” in Image Analysis. Springer, 2009, pp. 291–300.

[27] R. Cespi, A. Kolb, and M. Lindner, “Hand tracking based on hierarchical clustering of range data,” Centre for Telematics and Information Technology, University of Twente, Technical Report arXiv:1110.5450, October 2011.

[28] X. Chai, Z. Xu, Q. Li, B. Ma, and X. Chen, “Robust hand tracking by integrating appearance, location and depth cues,” in Proceedings of the Fourth International Conference on Internet Multimedia Computing and Service. ACM, 2012, pp. 60–65.

[29] C. H. Chan, “Multi-scale local binary pattern histogram for face recognition,” PhD Dissertation, Centre for Vision, Speech and Signal Processing, School of Electronics and Physical Sciences, University of Surrey, September 2008.

[30] C. Chang and C.-J. Lin, LIBSVM: A library for support vector machines, 2001, [Online] Software Available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.

[31] A. Cheddad, J. Condell, K. Curran, and P. McKevitt, “A new colour space for skin tone detection,” in Proceedings of the 16th IEEE International Conference on Image Processing. IEEE, 2009, pp. 497–500.

[32] B.-J. Chen, C.-M. Huang, T.-E. Tseng, and L.-C. Fu, “Robust head and hands tracking with occlusion handling for human machine interaction,” in IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, 2012, pp. 2141–2146.

[33] Q. Chen, “Real-Time Vision-Based Hand Tracking and Gesture Recognition,” PhD Dissertation, University of Ottawa, Canada, 2008.

[34] C.-S. Chua, H. Guan, and Y.-K. Ho, “Model-based 3D hand posture estimation from a single 2D image,” Image and Vision Computing, vol. 20, no. 3, pp. 191–202, 2002.

[35] J. Collis and R. Hussey, Business research: A practical guide for undergraduate and postgraduate students. Palgrave Macmillan, 2009.

[36] T. Coogan, G. Awad, J. Han, and A. Sutherland, “Real time hand gesture recogni- tion including hand segmentation and tracking,” in Advances in Visual Computing. Springer, 2006, pp. 495–504.

[37] N. Cristianini, “Support vector and kernel machines,” Tutorial at the 18th Inter- national Conference on Machine Learning, 2001.

[38] M. Crotty, The foundations of social research: Meaning and perspective in the research process. Sage Publications Ltd, 1998.

[39] F. Dadgostar and A. Sarrafzadeh, “A fast real-time skin detector for video se- quences,” Image Analysis and Recognition, pp. 804–811, 2005.

[40] F. Dadgostar and A. Sarrafzadeh, “An adaptive real-time skin detector based on Hue thresholding: A comparison on two motion tracking methods,” Pattern Recognition Letters, vol. 27, no. 12, pp. 1342–1352, 2006.

[41] S. Davies, “Exploring the views and experiences of early years practitioners with regard to consultation with children under five,” Masters Thesis, University of Liverpool (University of Chester), 2006.

[42] M. de La Gorce, N. Paragios, and D. Fleet, “Model-based hand tracking with texture, shading and self-occlusions,” in IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Anchorage, Alaska, USA: IEEE, June 2008, pp. 1–8.

[43] DeafSA, “Deaf federation of South Africa,” January 2010, [Online] Available at http://www.deafsa.co.za/index-2.html.

[44] M. Donoser and H. Bischof, “Real time appearance based hand tracking,” in Pro- ceedings of International Conference on Pattern Recognition. Tampa, Florida, USA: Citeseer, December 2008.

[45] C. Doukim, J. Dargham, and A. Chekima, “Comparison of three colour spaces in skin detection,” Borneo Science, vol. 24, pp. 75–81, March 2009.

[46] G. Duan, H. Ai, S. Cao, and S. Lao, “Group tracking: exploring mutual rela- tions for multiple object tracking,” in European Conference on Computer Vision. Firenze, Italy: Springer, October 2012, pp. 129–143.

[47] A. Elgammal, C. Muang, and D. Hu, “Skin detection—a short tutorial,” Encyclopedia of Biometrics. Springer-Verlag, 2009.

[48] M. Elmezain, A. Al-Hamadi, and B. Michaelis, “A robust method for hand gesture segmentation and recognition using forward spotting scheme in conditional random fields,” in 20th International Conference on Pattern Recognition. IEEE, 2010, pp. 3850–3853.

[49] M. Elmezain, A. Al-Hamadi, R. Niese, and B. Michaelis, “A robust method for hand tracking using mean-shift algorithm and Kalman filter in stereo color image sequences,” in Proceedings of the International Conference on Computer Vision, Image and Signal Processing, 2009, pp. 24–28.

[50] L. Fei-Fei, R. Fergus, and P. Perona, “Learning generative visual models from few training examples: An incremental Bayesian approach tested on 101 object categories,” Computer Vision and Image Understanding, vol. 106, no. 1, pp. 59–70, 2007.

[51] M. Filhol, “Zebedee: A lexical description model for sign language synthesis,” Laboratory for Mechanics and Engineering Sciences, Technical Report, 2009.

[52] B. Fitzgerald and D. Howcroft, “Towards dissolution of the IS research debate: from polarization to polarity,” Journal of Information Technology, vol. 13, no. 4, pp. 313–326, 1998.

[53] M. Fleck, D. Forsyth, and C. Bregler, “Finding naked people,” in Proceedings of the Fourth European Conference on Computer Vision, vol. 2. Springer-Verlag, 1996, pp. 593–602.

[54] A. Fogelton, “Real-time hand tracking using modificated flocks of features algorithm,” Information Sciences and Technologies Bulletin of the ACM Slovakia, Special Section on Student Research in Informatics and Information Technologies, vol. 3, no. 2, pp. 1–5, 2011.

[55] V. Frati and D. Prattichizzo, “Using Kinect for hand tracking and rendering in wearable haptics,” in World Haptics Conference. IEEE, 2011, pp. 317–321.

[56] X. Fu and W. Wei, “Centralized binary patterns embedded with image Euclidean distance for facial expression recognition,” in Fourth International Conference on Natural Computation, vol. 4. Jinan, Shandong, China: IEEE, October 2008, pp. 115–119.

[57] M. D. Gall, W. R. Borg, and J. P. Gall, Educational research: An introduction. Longman Publishing, 1996.

[58] S. Gauglitz, T. Höllerer, and M. Turk, “Evaluation of interest point detectors and feature descriptors for visual tracking,” International Journal of Computer Vision, vol. 94, no. 3, pp. 335–360, 2011.

[59] R. Genuer, J.-M. Poggi, and C. Tuleau-Malot, “Variable selection using random forests,” Pattern Recognition Letters, vol. 31, no. 14, pp. 2225–2236, 2010.

[60] M. Ghaziasgar, “The use of mobile phones as service-delivery devices in a sign language system,” Master’s Thesis, Computer Science, University of the Western Cape, Cape Town, South Africa, June 2010.

[61] M. Glaser and W. Tucker, “Telecommunications bridging between Deaf and hearing users in South Africa,” in Proceedings of the Conference and Workshop on Assistive Technologies for People with Vision and Hearing Impairments, 2004.

[62] R. Goede, “A framework for the explicit use of specific systems thinking methodologies in data-driven decision support system development,” PhD Dissertation, University of Pretoria, November 2005.

[63] R. Gordon Jr, “Ethnologue: Languages of the world,” [Online] Available at http://www.ethnologue.com, 2005.

[64] H. Grabner, J. Matas, L. Van Gool, and P. Cattin, “Tracking the invisible: Learning where the object might be,” in IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2010, pp. 1285–1292.

[65] R. Gregory, “Design science research and the grounded theory method: Characteristics, differences, and complementary uses,” Proceedings of the 18th European Conference on Information Systems, 2010.

[66] Z. Guo, L. Zhang, and D. Zhang, “Rotation invariant texture classification using LBP variance (LBPV) with global matching,” Pattern Recognition, vol. 43, no. 3, pp. 706–719, 2010.

[67] S. Gupta, “Support vector machines based modelling of concrete strength,” International Journal of Computer Systems Science and Engineering, vol. 3, no. 1, 2008.

[68] S. Hadfield and R. Bowden, “Generalised pose estimation using depth,” in Trends and Topics in Computer Vision. Springer, 2012, pp. 312–325.

[69] C. Harris and M. Stephens, “A combined corner and edge detector,” in Fourth Alvey Vision Conference, vol. 15. Manchester, UK, 1988, p. 50.

[70] O. Henia, M. Hariti, and S. Bouakaz, “A two-step minimization algorithm for model-based hand tracking,” in 18th International Conference on Computer Graphics, Visualization and Computer Vision, Czech Republic, February 2010.

[71] A. Hevner and S. Chatterjee, Design research in information systems: theory and practice. Springer, 2010, vol. 22.

[72] A. R. Hevner, S. T. March, J. Park, and S. Ram, “Design science in information systems research,” Management Information Systems Quarterly, vol. 28, no. 1, pp. 75–105, 2004.

[73] S. Howard, Finger talk-South African sign language dictionary. South Africa: Mondi, 2008.

[74] M. Hrúz, J. Trojanová, and M. Železný, “Local binary pattern based features for sign language recognition,” Pattern Recognition and Image Analysis, vol. 22, no. 4, pp. 519–526, 2012.

[75] C.-W. Hsu, C.-C. Chang, and C.-J. Lin, “A practical guide to support vector classification,” National Taiwan University, Technical Report, 2003.

[76] C. Hue, J.-P. Le Cadre, and P. Pérez, “Sequential Monte Carlo methods for multiple target tracking and data fusion,” IEEE Transactions on Signal Processing, vol. 50, no. 2, pp. 309–325, 2002.

[77] J. Hughes, The philosophy of social research. Addison-Wesley Longman Ltd, 1990.

[78] L. M. Huglin, “The relationship between personal epistemology and learning style in adult learners,” PhD Dissertation, University of Idaho, 2003.

[79] M. Jones and J. Rehg, “Statistical color models with application to skin detection,” International Journal of Computer Vision, vol. 46, no. 1, pp. 81–96, 2002.

[80] P. Kakumanu, S. Makrogiannis, and N. Bourbakis, “A survey of skin-color modeling and detection methods,” Pattern Recognition, vol. 40, no. 3, pp. 1106–1122, 2007.

[81] Z. Kalal, “Tracking-learning-detection,” PhD Dissertation, University of Surrey, Guildford, Surrey, September 2010.

[82] W. Kelly, A. Donnellan, and D. Molloy, “A review of skin detection techniques for objectionable images,” Information Technology and Telecommunications 2007 General Chairs Letter, p. 40, 2007.

[83] C. Kerdvibulvech and H. Saito, “Model-based hand tracking by chamfer distance and adaptive color learning using particle filter,” Journal on Image and Video Processing, vol. 2009, p. 6, 2009.

[84] R. Khan, Z. Khan, M. Aamir, and S. Sattar, “Static filtered skin detection,” International Journal of Computer Science Issues, vol. 9, no. 3, pp. 257–261, March 2012.

[85] T.-K. Kim, S.-F. Wong, and R. Cipolla, “Tensor canonical correlation analysis for action classification,” in IEEE Conference on Computer Vision and Pattern Recognition. Minneapolis, Minnesota, USA: IEEE, June 2007, pp. 1–8.

[86] M. Kölsch and M. Turk, “Flocks of features for tracking articulated objects,” in Real-Time Vision for Human-Computer Interaction. Springer, 2005, pp. 67–83.

[87] P. P. Kumar, P. Vadakkepat, and A. P. Loh, “Hand posture and face recognition using a fuzzy-rough approach,” International Journal of Humanoid Robotics, vol. 7, no. 3, pp. 331–356, 2010.

[88] R. Laganière, OpenCV 2 computer vision application programming cookbook. Packt Publishing, 2011.

[89] Y. Lao and Y. F. Zheng, “Tracking highly correlated targets through statistical multiplexing,” Image and Vision Computing, vol. 29, no. 12, pp. 803–817, 2011.

[90] M. Lee and I. Cohen, “Human upper body pose estimation in static images,” in Proceedings of the Eighth European Conference on Computer Vision, pp. 126–138, May 2004.

[91] C. Leistner, “Semi-supervised ensemble methods for computer vision,” PhD Dissertation, Institute for Computer Graphics and Vision, Graz University of Technology, June 2010.

[92] S. Leutenegger, M. Chli, and R. Y. Siegwart, “BRISK: Binary robust invariant scalable keypoints,” in IEEE International Conference on Computer Vision. Barcelona, Spain: IEEE, November 2011, pp. 2548–2555.

[93] S. Liao, M. W. Law, and A. C. Chung, “Dominant local binary patterns for texture classification,” IEEE Transactions on Image Processing, vol. 18, no. 5, pp. 1107–1118, January 2009.

[94] T. Lindahl, “Study of local binary patterns,” Master’s Thesis, Department of Science and Technology, Linköpings University, 2007.

[95] B. Liu, S. Gould, and D. Koller, “Single image depth estimation from predicted semantic labels,” in IEEE Conference on Computer Vision and Pattern Recognition. San Francisco, USA: IEEE, June 2010, pp. 1253–1260.

[96] H. Liu, Y. Wang, and X. Lu, “A method to choose kernel function and its parameters for support vector machines,” in Proceedings of the 2005 International Conference on Machine Learning and Cybernetics, vol. 7. Guangzhou, China: IEEE, August 2005, pp. 4277–4280.

[97] H. Liu, W. Cui, and R. Ding, “Robust hand tracking with Hough forest and multi-cue flocks of features,” in Advances in Visual Computing. Springer, 2012, vol. 7432, pp. 458–467.

[98] Y. Liu and P. Zhang, “Hand gesture tracking using particle filter with multiple features,” in Proceedings of the International Symposium on Intelligent Information Systems and Applications, Qingdao, China, October 2009, pp. 28–30.

[99] D. G. Lowe, “Object recognition from local scale-invariant features,” in Proceedings of the Seventh IEEE International Conference on Computer Vision, vol. 2. Kerkyra, Corfu, Greece: IEEE, September 1999, pp. 1150–1157.

[100] S. Lu, D. Metaxas, D. Samaras, and J. Oliensis, “Using multiple cues for hand tracking and model refinement,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2. Madison, Wisconsin: IEEE, June 2003, pp. 443–450.

[101] Y. Luo, L. Li, B. Zhang, and H. Yang, “Video hand tracking algorithm based on hybrid Camshift and Kalman filter,” Application Research of Computers, vol. 26, no. 3, pp. 1163–1165, 2009.

[102] Y. Ma, W. Jia, C. Li, J. Yang, Z.-H. Mao, and M. Sun, “Magnetic hand motion tracking system for human-machine interaction,” Electronics Letters, vol. 46, no. 9, pp. 621–623, 2010.

[103] J. MacCormick and M. Isard, “Partitioned sampling, articulated objects, and interface-quality hand tracking,” in Proceedings of the 6th European Conference on Computer Vision. Dublin, Ireland: Springer, June 2000, pp. 3–19.

[104] E. Maggio and A. Cavallaro, Video tracking: theory and practice. Wiley, 2011.

[105] J. Matas, O. Chum, M. Urban, and T. Pajdla, “Robust wide-baseline stereo from maximally stable extremal regions,” Image and Vision Computing, vol. 22, no. 10, pp. 761–767, 2004.

[106] R. Mattheij and E. Postma, “Feature-based hand detection in visual images,” in Proceedings of TiGeR 2013: The Combined Meeting of the 10th International Gesture Workshop and the 3rd Gesture and Speech in Interaction Conference, Tilburg, The Netherlands, June 2013.

[107] G. McAllister, S. J. McKenna, and I. W. Ricketts, “Hand tracking for behaviour understanding,” Image and Vision Computing, vol. 20, no. 12, pp. 827–840, 2002.

[108] O. Miksik and K. Mikolajczyk, “Evaluation of local detectors and descriptors for fast feature matching,” in 21st International Conference on Pattern Recognition. Tsukuba Science City, Japan: IEEE, November 2012, pp. 2681–2684.

[109] A. Mittal, A. Zisserman, and P. H. S. Torr, “Hand detection using multiple proposals,” in Proceedings of the 22nd British Machine Vision Conference, Scotland, United Kingdom, August 2011.

[110] D. Mohr and G. Zachmann, “Real-time hand tracking for natural and direct interaction,” in Proceedings of the 28th ACM Conference on Human Factors in Computing Systems, Atlanta, GA, USA, April 2010, pp. 1–6.

[111] D. Mohr, “Model-based high-dimensional pose estimation with application to hand tracking,” PhD Dissertation, University of Bremen, Bremen, 2012.

[112] M. Muja and D. G. Lowe, “Fast approximate nearest neighbors with automatic algorithm configuration,” in Proceedings of the International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, Lisboa, Portugal, February 2009, pp. 331–340.

[113] K. Nallaperumal, S. Ravi, C. N. K. Babu, R. K. Selvakumar, A. L. Fred, S. Christopher, and S. S. Vinsley, “Skin detection using color pixel classification with application to face detection: A comparative study,” in Proceedings of the International Conference on Computational Intelligence and Multimedia Applications, vol. 3. Sivakasi, Tamilnadu, India: IEEE, December 2007, pp. 436–441.

[114] P. Neskovic, D. Schuster, and L. N. Cooper, “Biologically inspired recognition system for car detection from real-time video streams,” in Neural Information Processing: Research and Development. Springer-Verlag, 2004, vol. 152, no. 3, pp. 320–333.

[115] D. T. Nguyen, “Human detection from images and videos,” PhD Dissertation, School of Computer Science and Software Engineering, University of Wollongong, 2012.

[116] T. T. Nguyen, N. D. Binh, and H. Bischof, “An active boosting-based learning framework for real-time hand detection,” in Proceedings of the 8th IEEE International Conference on Automatic Face and Gesture Recognition. Amsterdam, The Netherlands: IEEE, September 2008, pp. 1–6.

[117] B. Niehaves, “Epistemological perspectives on multi-method information systems research,” Proceedings of the 13th European Conference on Information Systems, pp. 1584–1595, May 2005.

[118] W. Noble, “What is a support vector machine?” Nature Biotechnology, vol. 24, no. 12, pp. 1565–1567, 2006.

[119] T. Ojala, M. Pietikäinen, and T. Mäenpää, “Multiresolution gray-scale and rotation invariant texture classification with local binary patterns,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 7, pp. 971–987, 2002.

[120] S. Ongkittikul, S. Worrall, and A. Kondoz, “Two hand tracking using colour statistical model with the k-means embedded particle filter for hand gesture recognition,” in 7th Computer Information Systems and Industrial Management Applications. Ostrava, Czech Republic: IEEE, June 2008, pp. 201–206.

[121] T. M. Oshiro, P. S. Perez, and J. A. Baranauskas, “How many trees in a random forest?” in Proceedings of the Eighth International Conference on Machine Learning and Data Mining in Pattern Recognition, vol. 7376. Berlin, Germany: Springer, July 2012, pp. 154–168.

[122] V. F. Pamplona, L. A. Fernandes, J. L. Prauchner, L. P. Nedel, and M. M. Oliveira, “The image-based data glove,” in Proceedings of the 10th Symposium on Virtual and Augmented Reality, João Pessoa, Brazil, May 2008, pp. 204–211.

[123] C. Park and S. Lee, “Real-time 3D pointing gesture recognition for mobile robots with cascade HMM and particle filter,” Image and Vision Computing, vol. 29, no. 1, pp. 51–63, 2011.

[124] S. Park, S. Yu, J. Kim, S. Kim, and S. Lee, “3D hand tracking using Kalman filter in depth space,” EURASIP Journal on Advances in Signal Processing, vol. 2012, no. 1, pp. 1–18, 2012.

[125] N. Petersen and D. Stricker, “Fast hand detection using posture invariant constraints,” in KI 2009: Advances in Artificial Intelligence. Paderborn, Germany: Springer, September 2009, vol. 5803, pp. 106–113.

[126] M. Pietikäinen, A. Hadid, G. Zhao, and T. Ahonen, Computer vision using local binary patterns. Springer, 2011, vol. 40.

[127] V. A. Prisacariu and I. Reid, “3D hand tracking for human computer interaction,” Image and Vision Computing, vol. 30, no. 3, pp. 236–250, 2012.

[128] S. S. Rautaray and A. Agrawal, “A real time hand tracking system for interactive applications,” International Journal of Computer Applications, vol. 18, no. 6, pp. 28–33, 2011.

[129] J. Relethford, “Human skin color diversity is highest in sub-Saharan African populations,” Human Biology: An International Record of Research, vol. 72, no. 5, pp. 773–780, October 2000.

[130] Z. Ren, J. Meng, and J. Yuan, “Depth camera based hand gesture recognition and its applications in human-computer-interaction,” in Proceedings of the Eighth International Conference on Information, Communications and Signal Processing. Singapore: IEEE, December 2011, pp. 1–5.

[131] C. W. Reynolds, “Flocks, herds and schools: A distributed behavioral model,” in ACM SIGGRAPH Computer Graphics, vol. 21, no. 4. ACM, 1987, pp. 25–34.

[132] J. D. Roode, “Implications for teaching of a process-based research framework for information systems,” Orlando, Florida, December 1993.

[133] E. Rosten and T. Drummond, “Machine learning for high-speed corner detection,” in Ninth European Conference on Computer Vision. Graz, Austria: Springer, May 2006, pp. 430–443.

[134] E. Rublee, V. Rabaud, K. Konolige, and G. Bradski, “ORB: An efficient alternative to SIFT or SURF,” in 13th IEEE International Conference on Computer Vision. Barcelona, Spain: IEEE, November 2011, pp. 2564–2571.

[135] D. Rybach, “Appearance-based features for automatic continuous sign language recognition,” Master’s Thesis, Human Language Technology and Pattern Recognition Group, RWTH Aachen University, Aachen, Germany, June 2006.

[136] A. Saxena, J. Schulte, and A. Ng, “Depth estimation using monocular and stereo cues,” in 20th International Joint Conference on Artificial Intelligence, Hyderabad, India, January 2007, pp. 2197–2203.

[137] A. Schmidt, M. Kraft, F. Michal, and Z. Domagala, “Comparative assessment of point feature detectors and descriptors in the context of robot navigation,” Journal of Automation, Mobile Robotics and Intelligent Systems, vol. 7, no. 1, pp. 11–20, 2013.

[138] T. Senst, B. Unger, I. Keller, and T. Sikora, “Performance evaluation of feature detection for local optical flow tracking,” in Proceedings of the International Conference on Pattern Recognition Applications and Methods, Vilamoura, Algarve, Portugal, February 2012, pp. 303–309.

[139] C. Shan, S. Gong, and P. W. McOwan, “Facial expression recognition based on local binary patterns: A comprehensive study,” Image and Vision Computing, vol. 27, no. 6, pp. 803–816, 2009.

[140] J. Shi and C. Tomasi, “Good features to track,” in IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Seattle, Washington: IEEE, June 1994, pp. 593–600.

[141] M. Shin, K. Chang, and L. Tsap, “Does colorspace transformation make any difference on skin detection?” in Proceedings of the Sixth IEEE Workshop on Applications of Computer Vision. Orlando, FL, USA: IEEE Computer Society, December 2002, pp. 275–279.

[142] H. S. Smallman, M. St John, H. M. Oonk, and M. B. Cowen, “Information availability in 2D and 3D displays,” IEEE Computer Graphics and Applications, vol. 21, no. 5, pp. 51–57, 2001.

[143] A. Sobral, “BGSLibrary: An OpenCV C++ background subtraction library,” in IX Workshop de Visão Computacional, Rio de Janeiro, Brazil, June 2013.

[144] V. Spruyt, A. Ledda, and S. Geerts, “Real-time multi-colourspace hand segmentation,” in 17th IEEE International Conference on Image Processing. Hong Kong: IEEE, June 2010, pp. 3117–3120.

[145] C. Stauffer and W. E. L. Grimson, “Adaptive background mixture models for real-time tracking,” in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2. Fort Collins, CO, USA: IEEE, June 1999.

[146] B. Stenger, A. Thayananthan, P. Torr, and R. Cipolla, “Model-based hand tracking using a hierarchical Bayesian filter,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 9, pp. 1372–1384, July 2006.

[147] X. Suau, J. Ruiz-Hidalgo, and J. R. Casas, “Real-time head and hand tracking based on 2.5D data,” IEEE Transactions on Multimedia, vol. 14, no. 3, pp. 575–585, 2012.

[148] E. Sudderth, M. Mandel, W. Freeman, and A. Willsky, “Visual hand tracking using nonparametric belief propagation,” in Conference on Computer Vision and Pattern Recognition Workshop. IEEE, June 2004, pp. 189–189.

[149] W. Tan, C. Chan, P. Yogarajah, and J. Condell, “A fusion approach for efficient human skin detection,” IEEE Transactions on Industrial Informatics, vol. 8, no. 1, pp. 138–147, 2012.

[150] X. Tan and B. Triggs, “Enhanced local texture feature sets for face recognition under difficult lighting conditions,” IEEE Transactions on Image Processing, vol. 19, no. 6, pp. 1635–1650, 2010.

[151] A. Thangali and S. Sclaroff, “An alignment based similarity measure for hand detection in cluttered sign language video,” in IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops. Miami, Florida, USA: IEEE, June 2009, pp. 89–96.

[152] S. Theodoridis and K. Koutroumbas, Pattern recognition, 2nd ed. Academic Press-Elsevier, 2003.

[153] M. Travers, Qualitative research through case studies. SAGE Publications, Ltd, 2001.

[154] A. Tzotsos, “A support vector machine approach for object based image analysis,” in Proceedings of the First International Conference on Object-Based Image Analysis, July 2006.

[155] C. M. Utz, “Learning ensembles of Bayesian network structures using random forest techniques,” Master’s Thesis, School of Computer Science, University of Oklahoma, Oklahoma, USA, 2010.

[156] M. Van den Bergh and L. Van Gool, “Combining RGB and ToF cameras for real-time 3D hand gesture interaction,” in IEEE Workshop on Applications of Computer Vision. Kona, Hawaii: IEEE, January 2011, pp. 66–72.

[157] V. Vezhnevets, V. Sazonov, and A. Andreeva, “A survey on pixel-based skin color detection techniques,” in Proceedings of the Graphicon, vol. 85, no. 1, 2003, pp. 85–92.

[158] P. Viola and M. J. Jones, “Robust real-time face detection,” International Journal of Computer Vision, vol. 57, no. 2, pp. 137–154, 2004.

[159] J. Vom Brocke and C. Buddendick, “Reusable conceptual models – requirements based on the design science research paradigm,” in Proceedings of the First International Conference on Design Science Research in Information Systems and Technology, Claremont, California, February 2006.

[160] J. Wang, P. Neskovic, and L. N. Cooper, “Context-based tracking of object features,” in Proceedings of the IEEE International Joint Conference on Neural Networks, vol. 3. IEEE, July 2004, pp. 1775–1779.

[161] R. Y. Wang and J. Popović, “Real-time hand-tracking with a color glove,” in ACM Transactions on Graphics, vol. 28, no. 3. ACM, 2009, p. 63.

[162] J. Wen and Y. Zhan, “Vision-based two hand detection and tracking,” in Proceedings of the Second International Conference on Interaction Sciences: Information Technology, Culture and Human. Seoul, Korea: ACM, November 2009, pp. 1253–1258.

[163] L. Wen, Z. Cai, Z. Lei, D. Yi, and S. Z. Li, “Online spatio-temporal structural context learning for visual tracking,” in European Conference on Computer Vision. Firenze, Italy: Springer, October 2012, pp. 716–729.

[164] J. Whitehill, “Automatic real-time facial expression recognition for signed language translation,” Master’s Thesis, Computer Science, University of the Western Cape, Cape Town, South Africa, May 2006.

[165] K. Wu, E. Otoo, and A. Shoshani, “Optimizing connected component labeling algorithms,” in Medical Imaging 2005: Image Processing, vol. 5747. SPIE, April 2005, pp. 1965–1976.

[166] B. Xiao, X.-m. Xu, and Q.-P. Mai, “Real-time hand detection and tracking using LBP features,” in Proceedings of the Sixth International Conference on Advanced Data Mining and Applications. Chongqing, China: Springer, November 2010, pp. 282–289.

[167] W. Xu and E.-J. Lee, “A new NUI method for hand tracking and gesture recognition based on user experience,” International Journal of Security and Its Applications, vol. 7, no. 2, pp. 149–158, 2013.

[168] Y. Xu, X. Deng, and Y. Jia, “A multi-cue-based human body tracking system,” The 5th International Conference on Computer Vision Systems, March 2007.

[169] B. Yang and S. Chen, “A comparative study on local binary pattern (LBP) based face recognition: LBP histogram versus LBP image,” Neurocomputing, vol. 120, pp. 365–379, 2013.

[170] L. Yi, “KernTune: Self-tuning Linux kernel performance using support vector machines,” Master’s Thesis, Computer Science, University of the Western Cape, Cape Town, South Africa, November 2006.

[171] B. Zarit, B. Super, and F. Quek, “Comparison of five color models in skin pixel classification,” in Proceedings of the International Workshop on Recognition, Analysis, and Tracking of Faces and Gestures in Real-Time Systems. Corfu, Greece: IEEE Computer Society, September 1999, pp. 58–63.

[172] W. Zhang, S. Shan, W. Gao, X. Chen, and H. Zhang, “Local Gabor binary pattern histogram sequence (LGBPHS): A novel non-statistical model for face representation and recognition,” in Proceedings of the Tenth IEEE International Conference on Computer Vision, vol. 1. Beijing, China: IEEE, October 2005, pp. 786–791.

[173] H. Zhou and T. S. Huang, “Tracking articulated hand motion with eigen dynamics analysis,” in Proceedings of the Ninth IEEE International Conference on Computer Vision. Nice, France: IEEE, October 2003, pp. 1102–1109.

[174] Z. Zivkovic and F. van der Heijden, “Efficient adaptive density estimation per image pixel for the task of background subtraction,” Pattern Recognition Letters, vol. 27, no. 7, pp. 773–780, 2006.

[175] J. A. Zondag, T. Gritti, and V. Jeanne, “Practical study on real-time hand detection,” in Proceedings of the Third International Conference on Affective Computing and Intelligent Interaction and Workshops. Amsterdam, The Netherlands: IEEE, September 2009, pp. 1–8.