Deep, Convolutional, and Recurrent Models for Human Activity Recognition Using Wearables
Total Page:16
File Type:pdf, Size:1020Kb
Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence (IJCAI-16) Deep, Convolutional, and Recurrent Models for Human Activity Recognition Using Wearables Nils Y. Hammerla1,2, Shane Halloran2, Thomas Plotz¨ 2 1babylon health, London, UK 2Open Lab, School of Computing Science, Newcastle University, UK [email protected] Abstract these relatively simple means suffice to obtain impressive recognition accuracies. However, more elaborate behaviours Human activity recognition (HAR) in ubiquitous which are, for example, of interest in medical applications, computing is beginning to adopt deep learning to pose a significant challenge to this manually tuned approach substitute for well-established analysis techniques [Hammerla et al., 2015]. Furthermore, some work has sug- that rely on hand-crafted feature extraction and gested that the dominant technical approach in HAR benefits classification methods. However, from these iso- from biased evaluation settings [Hammerla and Plotz,¨ 2015], lated applications of custom deep architectures it is which may explain some of the apparent inertia in the field difficult to gain an overview of their suitability for toward the adoption of deep learning techniques. problems ranging from the recognition of manipu- Deep learning has the potential to have significant impact lative gestures to the segmentation and identifica- on HAR in ubicomp. It can substitute for manually designed tion of physical activities like running or ascending feature extraction procedures which lack the robust physio- stairs. In this paper we rigorously explore deep, logical basis that benefits other fields such as speech recog- convolutional, and recurrent approaches across nition. However, for the practitioner it is difficult to select three representative datasets that contain movement the most suitable deep learning method for their application. data captured with wearable sensors. We describe Work that promotes deep learning almost always provides how to train recurrent approaches in this setting, in- only the performance of the best system, and rarely includes troduce a novel regularisation approach, and illus- details on how its seemingly optimal parameters were dis- trate how they outperform the state-of-the-art on a covered. As only a single score is reported it remains unclear large benchmark dataset. We investigate the suit- how this peak performance compares with the average during ability of each model for HAR, across thousands parameter exploration, and how likely a practitioner would be of recognition experiments with randomly sampled to discover a system that performs similarly well. model configurations, explore the impact of hy- In this paper we provide the first unbiased and systematic perparameters using the fANOVA framework, and exploration of the performance of state-of-the-art deep learn- provide guidelines for the practitioner who wants ing approaches on three different recognition problems typi- to apply deep learning in their problem setting. cal for HAR in ubicomp. The training process for deep, con- volutional, and recurrent models is described in detail, and 1 Introduction we introduce a novel approach to regularisation for recur- Deep learning represents the biggest trend in machine learn- rent networks. In more than 4,000 experiments we investigate ing over the past decade. Since the inception of the umbrella the suitability of each model for different tasks in HAR, ex- term the diversity of methods it encompasses has increased plore the impact each model’s hyper-parameters have on per- rapidly, and will continue to do so driven by the resources of formance, and conclude guidelines for the practitioner who both academic and commercial interests. Deep learning has wants to apply deep learning to their application scenario. become accessible to everyone via machine learning frame- Over the course of these experiments we discover that re- works like Torch7 [Collobert et al., 2011], and has had sig- current networks outperform the state-of-the-art and that they nificant impact on a variety of application domains [LeCun et allow novel types of real-time application of HAR through al., 2015]. sample-by-sample prediction of physical activities. One field that has yet to benefit from deep learning is Hu- man Activity Recognition (HAR) in Ubiquitous Computing 2 Deep Learning in Ubiquitous Computing (ubicomp). The dominant technical approach in HAR in- Movement data collected with body-worn sensors are multi- cludes sliding window segmentation of time-series data cap- variate time-series data with relatively high spatial and tem- tured with body-worn sensors, manually designed feature ex- poral resolution (e.g. 20Hz - 200Hz). Analysis of this data traction procedures, and a wide variety of (supervised) clas- in ubicomp is typically follows a pipeline-based approach sification methods [Bulling et al., 2014]. In many cases, [Bulling et al., 2014]. The first step is to segment the time- 1533 Figure 1: Models used in this work. From left to right: (a) LSTM network hidden layers containing LSTM cells and a final softmax layer at the top. (b) bi-directional LSTM network with two parallel tracks in both future direction (green) and to the past (red). (c) Convolutional networks that contain layers of convolutions and max-pooling, followed by fully-connected layers and a softmax group. (d) Fully connected feed-forward network with hidden (ReLU) layers. series data into contiguous segments (or frames), either driven been explored in two settings. First, [Neverova et al., 2015] by some signal characteristics such as signal energy [Plotz¨ investigated a variety of recurrent approaches to identify in- et al., 2012], or through a sliding-window segmentation ap- dividuals based on movement data recorded from their mo- proach. From each of the frames a set of features is ex- bile phone in a large-scale dataset. Secondly, [Ordo´nez˜ and tracted, which most commonly include statistical features or Roggen, 2016] compared the performance of a recurrent ap- stem from the frequency domain. proach to CNNs on two HAR datasets, representing the cur- The first deep learning approach explored to substitute for rent state-of-the-art performance on Opportunity, one of the this manual feature selection corresponds to deep belief net- datasets utilised in this work. In both cases, the recurrent net- works as auto-encoders, trained generatively with Restricted work was paired with a convolutional network that encoded Boltzmann Machines (RBMs) [Plotz¨ et al., 2011]. Results the movement data, effectively employing the recurrent net- were mixed, as in most cases the deep model was outper- work to only model temporal dependencies on a more ab- formed by a combination of principal component analysis stract level. Recurrent networks have so far not been applied and a statistical feature extraction process. Subsequently to model movement data at the lowest possible level, which is a variety of projects have explored the use of pre-trained, the sequence of individual samples recorded by the sensor(s). fully-connected networks for automatic assessment in Parkin- son’s Disease [Hammerla et al., 2015], as emission model 3 Comparing deep learning for HAR in Hidden Markov Models (HMMs) [Zhang et al., 2015; While there has been some exploration of deep models for Alsheikh et al., 2015], and to model audio data for ubicomp a variety of application scenarios in HAR there is still a applications [Lane et al., 2015]. lack of a systematic exploration of deep learning capabili- The most popular approach so far in ubicomp relies on con- ties. Authors report to explore the parameter space in prelim- volutional networks. Their performance for a variety of ac- inary experiments, but usually omit the details. The overall tivity recognition tasks was explored by a number of authors process remains unclear and difficult to replicate. Instead, [Zeng et al., 2014; Ronao and Cho, 2015; Yang et al., 2015; single instantiations of e.g. CNNs are presented that show Ronaoo and Cho, 2015]. Furthermore, convolutional net- good performance in an application scenario. Solely report- works have been applied to specific problem domains, such as ing peak performance figures does, however, not reflect the the detection of stereotypical movements in Autism [Rad et overall suitability of a method for HAR in ubicomp, as it re- al., 2015], where they significantly improved upon the state- mains unclear how much effort went into tuning the proposed of-the-art. approach, and how much effort went into tuning other ap- Individual frames of movement data in ubicomp are usu- proaches it was compared to. How likely is a practitioner to ally treated as statistically independent, and applications of find a parameter configuration that works similarly well for sequential modelling techniques like HMMs are rare [Bulling their application? How representative is the reported perfor- et al., 2014]. However, approaches that are able to exploit mance across the models compared during parameter explo- the temporal dependencies in time-series data appear as the ration? Which parameters have the largest impact on perfor- natural choice for modelling human movement captured with mance? These questions are important for the practitioner, sensor data. Deep recurrent networks, most notably those that but so far remain unanswered in related work. rely on Long Short-Term Memory cells (LSTMs) [Hochreiter In this paper we provide the first unbiased comparison of and Schmidhuber,