Gesture Recognition on Few Training Data Using Slow Feature Analysis and Parametric Bootstrap
Total Page:16
File Type:pdf, Size:1020Kb
Gesture Recognition on Few Training Data using Slow Feature Analysis and Parametric Bootstrap Patrick Koch, Wolfgang Konen, and Kristine Hein Abstract— Slow Feature Analysis (SFA) has been established clearly differs from traditional game controllers. But even as a robust and versatile technique from the neurosciences to though there exist a large number of games for the Nintendo learn slowly varying functions from quickly changing signals. Wii, recognition of complex gestures is still a challenging Recently, the method has been also applied to classification tasks. Here we apply SFA for the first time to a time series task. In spite of the large number of classification approaches classification problem originating from gesture recognition. The using the Wiimote controller with the infrared device, e.g. the gestures used in our experiments are based on acceleration work from Lee [Lee08], we focus here on approaches where signals of the Bluetooth Wiimote controller (Nintendo). We no infrared device is used. Although the task becomes more show that SFA achieves results comparable to the well-known difficult then, we are not so dependent on the Wii structure Random Forest predictor in shorter computation time, given a sufficient number of training patterns. However – and this and can easily transfer our approach to other applications. is a novelty to SFA classification – we discovered that SFA This can be of high importance when the use of infrared requires the number of training patterns to be strictly greater is not possible for any reason and only acceleration based than the dimension of the nonlinear function space. If too few sensors are available. Similar works exist, e.g., a recent study patterns are available, we find that the model constructed by by Schlomer¨ et al. [SPHB08], who present a classification SFA severely overfits and leads to high test set errors. We analyze the reasons for overfitting and present a new solution tool based on a Hidden Markov Chain approach and Liu et + based on parametric bootstrap to overcome this problem. al. [LW 09] who apply personalized gesture recognition to a user authentication problem. Rehm et al. [RBA08] describe I. INTRODUCTION classification of Wii gestures with different methods and Slow Feature Analysis (SFA) is a new learning algorithm expose the influence of cultural behaviours in gestures. Mal- emerging from neuroscience which is capable of learning mestig and Sundberg [MS08] use the Wiimote to recognize unsupervised new features or ’concepts’ from multidimen- gestures for sign language recognition with good results. The sional time signals. SFA has been originally developed in commercial product LiveMove2 [Inc09] integrates features of context of an abstract model of unsupervised learning of the Nintendo Motion Plus addon but no benchmark results invariances in the visual system of vertebrates [Wis98] and are available for publicity. is described in detail in [WS02], [Wis03]. Although SFA is The Slow Feature Analysis and Random Forest algorithms inspired from neuroscience, it does not have the drawbacks are briefly introduced in the next section. We also give of conventional ANNs such as long training times or strong detailed information about the acquisition and preparation of dependencies on initial conditions. Instead, SFA is fast in the gathered data. The experimental results are described in training and it has the potential to find hidden features out of Sec. 3 and discussed in Sec. 4. We finish with a conclusion of multidimensional signals, as has been shown impressively by the results and give an outlook of our future work in Sec. 5. [Ber05] for handwritten-digit recognition. For these reasons II. METHODS we apply SFA for the first time to a complex time series A. Slow Feature Analysis classification problem originating from gesture recognition, namely acceleration signals of the Nintendo Bluetooth Wi- Slow Feature Analysis has been invented by Wiskott imote controller and observe several interesting facts when and Sejnowski [WS02], [Wis03] and later extended by applying this method to small data sets. Berkes [Ber05] for classification. The original SFA approach The inclusion of accelerometers in electronic devices such for time series analysis is defined as follows: For a multivari- as mobile phones or game controllers for gesture recognition ate time series signal ~x(t) where t indicates time, find the set has become more and more popular in consumer electronics. of real-valued output functions g1(~x); g2(~x); :::; gM (~x), such While in the nineties, the recognition of gestures placed a that each output function high demand on hardware and was only present in the labs yj(t) = gj(~x(t)) (1) of research institutes, gesture recognition has now made its 1 way into the homes. With over 50 million units sold [Shi09] minimally changes in time : the reason for the success of the Nintendo Wii console can 2 ∆yj(t) = hy_j it is minimal (2) be connected to the novel type of controller handling, which To exclude trivial solutions we add some constraints: Patrick Koch, Wolfgang Konen, and Kristine Hein are with the Depart- ment of Computer Sciences and Engineering Science, Cologne University hyjit = 0 (zero mean) (3) of Applied Sciences, 51643 Gummersbach, Germany (email: fpatrick.koch, 1 wolfgang.konen, [email protected]) h·it means average over time and y_ indicates the time derivative. TABLE I 2 ABSOLUTE EXECUTION TIMES (IN SECONDS) FOR FIVE EXAMPLE hy it = 1 (unit variance) (4) j GESTURES PERFORMED BY TEN TEST PERSONS.EACH GESTURE WAS RECORDED TEN TIMES. hykyjit = 0 (decorrelation for k > j) (5) Gesture Minimum Median Maximum Std.Dev. The third equation is only relevant from the second slow Circle 0.773 1.607 4.914 0.343 Throw 0.416 0.957 2.306 0.471 signal on and higher ones to prevent higher signals from Frisbee 0.376 0.883 2.036 0.353 learning features already represented by slower signals. Bowling 0.530 1.325 2.303 0.377 For arbitrary functions this problem is difficult to solve, Z 0.980 1.607 4.280 0.453 but SFA finds a solution by expanding the input signal into a nonlinear function space by applying certain basis R-implementation of Random Forest [LW02]. It has been functions, e.g. monomials of degree d. This expanded signal shown in the past that the algorithm is quite robust to is sphered to fulfill the constraints of equations (3), (4) and overfitting and noise, that is why we use it as a comparison (5). Then SFA calculates the time derivative of the sphered to the SFA approach. expanded signal and determines from its covariance matrix the normalized eigenvector with the smallest eigenvalue. Finally the sphered expanded signal is projected onto this eigenvector to obtain the slowest output signal y1(t). Berkes [Ber05] has extended this approach to classify a set of handwritten digits. The main idea of this extension is to create many small time series out of the class patterns. Let us assume that for a K-class problem each class cm 2 fc1; :::; cK g has got Nm patterns. We then reformulate the ∆-objective function (2) for SFA with distinct indices k and l as the mean of the difference over all possible pairs: N K m 2 1 X X (m) (m) Fig. 1. Wiimote Controller ∆(yj) = · gj(pk ) − gj(pl ) (6) npair m=1 k;l=1; k<l C. Data Acquisition and Data Preparation (m) where npair denotes the total count of all pairs and pk Nintendo has invented the Wiimote Controller (Fig. 1) (m) and pl represent the k-th and l-th class pattern of class as a Bluetooth compliant game controller for the Nintendo m. In our case such a pattern can be understood as a Wii console. In its basic version an acceleration sensor, a gesture, consisting of a concatenated signal from the 3 axis so called accelerometer, is implemented inside the device, accelerometer of the Wiimote controller. Constraints (3), (4) as well as one infrared device for position determination and (5) can be reformulated then by substituting the average of the controller during games. Additionally there exists over time with the average over all patterns, such that the an extension to the Wiimote, the Wii Motion Plus which learned functions are going to have zero mean, unit variance contains two gyrometers for calculation of angular velocities. and be decorrelated [Ber05]. In order to keep things as simple as possible with respect As [Ber05] has shown, the (K − 1) slowest SFA output to the sensor and hardware side we neither use the sensor signals are expected to have a low intra-class variation, but information from the Wii infrared sensor (sensor bar) nor usually a high inter-class variation. Therefore it is natural from the Wii Motion Plus sensors. As observed in other to train a standard Gaussian classifier on the slowest (K − studies by Schlomer¨ et al. [SPHB08] and Rehm et al. 1) SFA outputs produced from the training records. The [RBA08] we expect to get comparably good and accurate Gaussian classifier will seek an optimal position and shape of recognition rates with the sensor data solely taken from the a Gauss function for each class in this (K − 1)-dimensional accelerometer. space. We recorded five different gestures from ten test per- All SFA calculations were performed with the extended sons with the Avetana Bluetooth library together with a MATLAB toolkit sfa-tk V2.0 from [Kon09], [Ber05]. modified implementation of the Wiigee framework by Pop- pinga [Pop07b]. Accelerometer data were recorded with B. Random Forest their timestamps at a rate of approximately 100 Hz, not The Random Forest (RF) predictor by Breiman [Bre01] is equidistantly delivered from the Bluetooth interface. In a an extension of the well-known classification and regression preliminary data preparation step we interpolated the ac- trees (CART) [BFSO84].