Cooperative Gesture Recognition: Learning Characteristics of Classifiers and Navigating the User to an Ideal Situation
Hiromasa Yoshimoto and Yuichi Nakamura
Academic Center for Computing and Media Studies, Kyoto University, Sakyo-ku, Kyoto 606-8501, Japan

Keywords: Cooperative Recognition, Gesture Interface, Usability.

Abstract: This paper introduces a novel gesture interface scheme that guides the user toward better performance and usability. The accuracy of gesture recognition is heavily affected by how the user poses and moves, as well as by environmental conditions such as lighting. The usability of a gesture interface can therefore be improved by notifying the user of when and how better accuracy can be obtained. For this purpose, we propose a method for estimating the performance of gesture recognition under the current conditions, and a method for suggesting to the user possible ways to improve performance. In performance estimation, the accuracy under the current conditions is estimated by supervised learning from a large number of samples and their corresponding ground truths. If the estimated accuracy is insufficient, the module searches for better conditions that can be reached with the user's cooperation. If a good improvement is possible, the way to achieve it is communicated to the user through visual feedback that shows how to avoid, or how to recover from, the undesirable condition. In this way, the user gains better accuracy and usability by cooperating with the gesture interface.

1 INTRODUCTION

As the success of Microsoft's Kinect sensor shows, the gesture-based interface is a practical solution for a natural user interface (Shotton and Sharp, 2011; OpenNI organization, 2010; PrimeSense Inc., 2010). Its fundamental technologies, such as motion tracking and gesture classification, however, still face many difficulties and challenges (Pavlovic et al., 1997; Moeslund et al., 2006; Ren et al., 2013), owing to the ambiguity of human behavior, self-occlusion, sensor measurement noise, and so on. Moreover, a gesture-based interface carries additional usability requirements: it must not only maintain good recognition performance, but also indicate its internal status and offer easy ways to recover from failures (Nielsen, 2009; Jacko and Sears, 2003).

In this paper, we focus on a framework that addresses both of these goals: improving the performance of gesture recognition, and improving the usability of the gesture interface. For these two goals, we propose a cooperative gesture recognition scheme, which navigates the user into changing his/her behavior so that both performance and usability are improved at the same time.

For this navigation, we introduce a condition vector and a condition space that represent how easily a gesture is recognized by the gesture recognition module. The condition vector is defined by parameters measured by the sensors, such as the user's standing position, upper-body posture, and hand movement speed.

We then use recall and precision as measures of recognition performance, and calculate their distribution over the condition space. Here we assume that if the condition space has enough dimensions, i.e., enough parameters, performance at neighboring positions is similar: if the recognition performance is satisfactory for one condition vector, it is expected to be satisfactory in that vector's neighborhood. Covering the condition space, then, we assume it can be separated into two portions: one in which the recognition performance is satisfactory, and another in which it is unsatisfactory.
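To make the condition vector concrete, the following minimal sketch packs a few of the measurements named above into a single vector. The specific parameterization (a 2-D standing position, a torso tilt angle, and a hand speed) and its units are our own illustrative assumptions, not a specification from the paper.

```python
import numpy as np

def make_condition_vector(standing_xy, torso_tilt_deg, hand_speed_mps):
    """Pack sensor measurements into one condition vector."""
    x, y = standing_xy
    return np.array([x, y, torso_tilt_deg, hand_speed_mps], dtype=float)

# Example: a user standing 0.3 m left of center, 2.1 m from the sensor,
# leaning 5 degrees, moving a hand at 0.8 m/s.
c = make_condition_vector((-0.3, 2.1), 5.0, 0.8)
```

Any other measurable quantity that influences recognition difficulty (e.g., a lighting estimate) could be appended as a further dimension, subject to the dimensionality concerns discussed in Section 3.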
With the above assumptions, our cooperative recognition scheme is composed by adding two modules to a common gesture interface scheme: a performance estimation module and a feedback module. These two modules work as follows. The performance estimation module estimates recall and precision from the current condition vector, and the feedback module advises the user to change his/her behavior toward a more desirable condition, in which recall and precision become higher.

We model the performance estimation module as functions that map condition vectors to recall and precision. If enough labeled training samples are available, these functions can be constructed by supervised machine learning. With these functions, the module can not only estimate the performance under the current conditions, but also predict the performance for any condition.

The feedback module uses the estimated functions to search around the current condition for a more suitable condition, where the predicted performance increases. If such a condition is found, the feedback module synthesizes visual hints that navigate the user into it. For example, if a position to the left is predicted to provide more suitable conditions for gesture recognition, the feedback module draws an arrow-shaped computer graphics (CG) object that induces the user to move a little to the left.
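The sketch below shows one way the two modules could be realized, assuming labeled samples (condition vectors paired with measured recall and precision) are available. The choice of a random-forest regressor, the random neighborhood search, and the use of min(recall, precision) as a single score are our assumptions; the paper specifies only supervised learning and a search around the current condition.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

class PerformanceEstimator:
    """Maps a condition vector to estimated (recall, precision)."""
    def __init__(self, conditions, performance):
        # conditions: (n, d) array of condition vectors;
        # performance: (n, 2) array of measured (recall, precision).
        self.model = RandomForestRegressor(n_estimators=100, random_state=0)
        self.model.fit(conditions, performance)

    def predict(self, condition):
        # Predicted (recall, precision) for a single condition vector.
        return self.model.predict(condition.reshape(1, -1))[0]

def suggest_move(estimator, current, radius=0.2, n_trials=200, margin=0.05):
    """Randomly probe the neighborhood of the current condition and
    return the displacement toward the best predicted condition,
    or None if no clearly better condition is found."""
    rng = np.random.default_rng(0)
    best = None
    best_score = min(estimator.predict(current)) + margin
    for _ in range(n_trials):
        candidate = current + rng.uniform(-radius, radius, size=current.shape)
        score = min(estimator.predict(candidate))  # worst of recall/precision
        if score > best_score:
            best, best_score = candidate, score
    return None if best is None else best - current
```

Taking the minimum of recall and precision prevents the search from trading one measure away for the other; a returned displacement with a negative first component would then be rendered as the leftward arrow described above.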
The main contribution of this work is not a specialized application but a practical, generalized scheme: it can improve the performance of an existing gesture interface merely by adding these two modules, with no modification of the existing gesture classifiers.

This paper is structured as follows. In Section 2, an overview of related work is given. The proposed scheme is described in Section 3. In Section 4, we present the experimental evaluation, and we conclude in Section 5.

2 RELATED WORK

In the fields of amusement and entertainment, the gesture-based interface is already a practical technology, with many commercial products such as Microsoft's Kinect sensor (Ren et al., 2013), the Nintendo Wii, and Sony's Eye camera. However, despite these latest technologies, challenging problems remain in gesture recognition.

From the viewpoint of computer vision (CV) and pattern recognition (PR), gesture recognition still faces challenges in the accuracy and robustness of measurement and recognition. For example, in hand gesture recognition, difficulties arise from the complex structure of the hand and the wide variation in its appearance. Moreover, ambiguities in the sensor's observations cause further difficulties; e.g., when an RGB-D camera is used as the sensor, pixel noise and occlusions in the captured color and depth images introduce ambiguities into the image analysis. Figure 1 shows actual hand depth images captured by a Kinect sensor; the images range from "easy" to "difficult" to recognize. Although various approaches have been proposed to overcome these difficulties in terms of CV and PR, robust and accurate gesture recognition remains a difficult task (Pavlovic et al., 1997). In other words, no approach can guarantee 100% accuracy in gesture recognition.

Figure 1: Example of "easy" and "difficult" images for gesture/sign recognition; the difficulties arise from image noise, occlusions, motion blur, etc.

However, from the viewpoint of usability engineering, a user interface should be designed to be efficient to use and easy to learn. Nielsen's usability engineering guidelines (Nielsen, 2009) call for a gesture-based user interface to provide functions that help the user diagnose how accurate a recognized result is, and to rectify the error when gesture recognition fails. Such features have already been proposed in many studies and are used in many commercial products; however, this pattern of application design has not yet been well discussed or formalized.

3 COOPERATIVE GESTURE RECOGNITION

In general, each algorithm in gesture recognition, such as a feature detector or a gesture classifier, uses [...] computational problems in handling high-dimensional features, e.g., the curse of dimensionality; therefore, we need to design the instances of condition vectors carefully.

Figure 2: (a) Condition space: "easy" and "difficult" conditions for the classifier, characterized by regions of TP, FP, FN, and TN outcomes. (b) User navigation: guiding the user from a difficult condition toward an easy one.

3.1 Predicting Gesture Recognition Performance

Once the condition space is composed, the next problem is to decide what kind of measure is appropriate for this "appropriateness". Our suggestion is that recognition accuracy is simple and sufficient.

Let us consider the gesture recognition process as a multi-class classifier of an input gesture. More specifically, we assume a one-versus-the-rest configuration of binary classifiers: for each gesture class C_X, the gesture recognizer works as a binary classifier F_X that outputs X or X̄, indicating whether or not the input belongs to gesture class C_X. Next, we categorize each classifier's outputs against the ground truth into the four categories shown in Table 1.

Table 1: Four categories of the classifier's output.

                               | Ground truth is X                  | Ground truth is X̄
  Output of classifier is X    | TP: True positive (Correct)        | FP: False positive (Type I error)
  Output of classifier is X̄    | FN: False negative (Type II error) | TN: True negative (Correct)
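As a concrete reading of Table 1, the sketch below counts the four categories for one binary classifier F_X and derives from them the recall and precision used by the performance estimation module. Here `classify`, `samples`, and `truths` are hypothetical stand-ins for an existing classifier and a labeled evaluation set.

```python
def evaluate_classifier(classify, samples, truths):
    """Count Table 1's four categories for one binary classifier F_X,
    where classify(sample) -> True means "belongs to class C_X",
    and derive recall and precision from the counts."""
    tp = fp = fn = tn = 0
    for sample, truth in zip(samples, truths):
        output = classify(sample)
        if output and truth:
            tp += 1   # true positive (correct)
        elif output and not truth:
            fp += 1   # false positive (Type I error)
        elif not output and truth:
            fn += 1   # false negative (Type II error)
        else:
            tn += 1   # true negative (correct)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, precision
```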