Fingerspelling Gesture Recognition Using Principal Components Analysis
KWAME NKRUMAH UNIVERSITY OF SCIENCE AND TECHNOLOGY, KUMASI
COLLEGE OF SCIENCE
DEPARTMENT OF MATHEMATICS

FINGERSPELLING GESTURE RECOGNITION USING PRINCIPAL COMPONENTS ANALYSIS

A THESIS SUBMITTED TO THE DEPARTMENT OF MATHEMATICS THROUGH THE NATIONAL INSTITUTE FOR MATHEMATICAL SCIENCES IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE AWARD OF MASTER OF PHILOSOPHY DEGREE (SCIENTIFIC COMPUTING & INDUSTRIAL MODELING)

BY RUDOLPH ELIKEM KLU
JUNE, 2017

Declaration

I hereby declare that this thesis is the result of my own original research and that no part of it has been submitted to any institution or organization anywhere for the award of a degree. All inclusions of the work of others have been duly acknowledged.

Rudolph Elikem Klu (Student, PG3325414)

Certified by:
Dr. Peter Amoako-Yirenkyi (Member, Supervisory Committee)
Dr. Akoto Yaw Omari-Sasu (Member, Supervisory Committee)
Dr. Peter Romeo Nyarko (Member, Supervisory Committee)
Dr. R. K. Avuglah (Head of Department)

Abstract

Sign language is one of the most natural and raw forms of language and communication, dating back to as early as the advent of human civilization, when the first accounts of sign languages appeared in history. This thesis presents an approach to recognizing hand-spelling gestures in order to help deaf and mute people communicate with non-signers. Images were captured with a computer camera. Skin/hand areas were segmented out of the images using color, rotated onto their principal axis by the method of moments, and transformed into a PCA feature space for gesture feature identification and characterization. The system was trained on this data, and test data was subsequently classified with a single-space Euclidean classifier and a single-space Mahalanobis classifier, which used the Euclidean and Mahalanobis distances respectively. The two classifiers were compared for accuracy and efficiency. The results indicated that the single-space Mahalanobis classifier performed better than the single-space Euclidean classifier, especially as the number of principal components increased. The number of principal components selected also greatly affected the accuracy of classification, with more principal components introducing noise in the images.
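To make the classification step summarized above more concrete, the sketch below shows, in Python, how flattened hand-image vectors might be projected into a PCA feature space and how a test image could then be assigned the label of its nearest training projection under either the Euclidean or the Mahalanobis distance. This is a minimal illustration under stated assumptions, not the code used in the thesis: the nearest-neighbour decision rule, the function names (fit_pca, classify) and the array names (train_vectors, labels, n_components) are assumptions made for the example, and the thesis classifiers may, for instance, compare against class means rather than individual training samples.

import numpy as np

def fit_pca(train_vectors, n_components):
    # train_vectors: (n_samples, n_pixels) array of flattened, preprocessed hand images.
    mean = train_vectors.mean(axis=0)
    centered = train_vectors - mean
    # The right singular vectors of the centered data are the principal components.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]            # shape (k, n_pixels)
    projections = centered @ components.T     # training images in the PCA feature space
    return mean, components, projections

def classify(test_vector, mean, components, projections, labels, metric="euclidean"):
    # Project the test image into the same PCA feature space.
    q = (test_vector - mean) @ components.T
    diff = projections - q
    if metric == "euclidean":
        d = np.linalg.norm(diff, axis=1)
    else:
        # Mahalanobis distance, using the covariance of the training projections.
        cov_inv = np.linalg.inv(np.cov(projections, rowvar=False))
        d = np.sqrt(np.einsum("ij,jk,ik->i", diff, cov_inv, diff))
    # Assumed decision rule: the nearest training projection decides the gesture label.
    return labels[int(np.argmin(d))]

In this setting the Mahalanobis variant differs from the Euclidean one only in that it rescales each principal-component direction by the spread of the training data along it, which is one way to see why the two classifiers can diverge as more principal components are included.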
Acknowledgements

Glory be to the Almighty God who made this work possible. It is with great pleasure that I acknowledge my supervisor, Dr. Peter Amoako-Yirenkyi, for his immense contribution, advice and assistance. I would also like to acknowledge my parents and siblings for their prayers and moral support, and my friends, course mates and the countless others who in one way or another helped to make this project a success. May the Almighty God richly bless you all.

Dedication

This thesis is dedicated to my father, Mr. Rudolph Klu, my mother, Mrs. Patricia Klu, and my siblings, Yaa Klu, Aku Brocke, Adjoa Klu, Emmanuel Klu, Ostwin and JNOKO.

Contents

Declaration
Abstract
Acknowledgements
Dedication
Contents
List of Figures
1 INTRODUCTION
  1.1 Background
  1.2 Motivation
  1.3 Problem Statement
  1.4 Objective
  1.5 Outline of the Methodology
  1.6 Justification of the Study
  1.7 Organization of Chapters
2 LITERATURE REVIEW
  2.1 Introduction
  2.2 Detection
    2.2.1 Color
    2.2.2 Shape
    2.2.3 Detectors that learn from pixel values
    2.2.4 3D model-based detection
    2.2.5 Motion
  2.3 Tracking
    2.3.1 Template Based Tracking
    2.3.2 Optimal Estimation Techniques
  2.4 Recognition
    2.4.1 Template Matching
    2.4.2 Methods based on Principal Component Analysis
  2.5 Complete Gesture Recognition Systems
3 METHODOLOGY
  3.1 Introduction
  3.2 Segmentation
    3.2.1 Skin Detection (Colour)
    3.2.2 RGB and Normalized RGB Color Space
    3.2.3 YCbCr Color Space
    3.2.4 HSV Color Space
    3.2.5 Image Rotation
    3.2.6 Principal Component Analysis (PCA)
    3.2.7 Construction of Image Vectors
    3.2.8 Training Phase
    3.2.9 Classification
4 Analysis and Recognition Results
  4.1 Pre-processing (Skin Detection and Segmentation)
5 CONCLUSION and RECOMMENDATIONS
  5.1 Summary of Results
  5.2 Conclusion
  5.3 Recommendation for Further Studies
    5.3.1 Recommendation
    5.3.2 Further Studies
REFERENCES

List of Figures

3.1 The RGB color space (RGB Cube)
3.2 The YCbCr color space (YCbCr Cube)
3.3 The HSV color space (HSV Cone)
3.4 An image in the RGB and HSV Color Space
3.5 Histogram Plot of the HSV Color Space
3.6 Histogram Plot of the RGB Color Space
3.7 Skin Locus Proposed by [Soriano et al. (2003)]
3.8 Skin Color Distribution Proposed by [Soriano et al. (2003)]
3.9 Coarse Skin Region Proposed by [Soriano et al. (2003)]
3.10 Feature space transformation
3.11 An example of a plot of eigenvalues, extracted from a data set of human hand images
3.12 The first three PCs; σi is the standard deviation along the ith PC, and σi = √λi
3.13 Image reconstruction using PCA
4.1 The 6 hand spelling gestures that were trained and recognized
4.2 Hand region segmentation for gesture "A"
4.3 Hand region segmentation for gesture "B"
4.4 Hand region segmentation for gesture "C"
4.5 Hand region segmentation for gesture "Five"
4.6 Hand region segmentation for gesture "Point"
4.7 Hand region segmentation for gesture "V"
4.8 Testing Classification for gesture "B"
4.9 Testing Classification for gesture "C"
4.10 Testing Classification for gesture "Point"
4.11 Testing Classification for gesture "V"
4.12 Testing Classification for gesture "V"

Abbreviations/Acronyms

HMM   Hidden Markov Models
SLR   Sign Language Recognition
ASL   American Sign Language
GSL   Ghanaian Sign Language
AdaSL Adamorobe Sign Language
PCA   Principal Components Analysis
PC    Principal Components

Chapter 1
INTRODUCTION

1.1 Background

Human gestures constitute a space of motion expressed by the body, face and/or hands. Among the variety of gestures, hand gestures are the most expressive and the most frequently used. A gesture is defined as a movement of part of the body, especially a hand or the head, to express an idea or meaning [The Oxford Dictionary]. It can also be described as an action performed to convey a feeling or intention. As humans, we communicate with our voices, while other parts of the body, such as our limbs and face, are used to make various gestures. Over the past few decades many attempts have been made to build systems, based on computer vision, that understand and interpret gestures. Sign language in the form of hand gestures is the main form of communication among the deaf and the hearing-impaired, which can perhaps be attributed to the fact that there are special rules of context and grammar that support each expression.

Sign language is one of the most natural and primitive forms of language and communication. It dates back to the early stages of human civilization; the first written records of signed communication appear from the fifth century BC, and signing was in use even before spoken language developed. Since then, sign language has evolved and been adopted as an essential part of our everyday communication. Today, sign languages are used widely by the deaf around the world, in the world of sports, in religious practices and also at workplaces [Rockett (2003)].

Gestures are one of the primary forms of communication when a baby learns to express its need for food, warmth and comfort. Gesturing reinforces the emphasis of spoken language and helps in expressing thoughts and feelings effectively. A simple gesture with one hand has the same meaning all over the world and means either "hello" or "goodbye". Many people travel to foreign countries without knowing the official language of the country they visit and still manage to communicate