Motion Primitive-Based Human Activity Recognition Using a Bag-of-Features Approach

Mi Zhang
Signal and Image Processing Institute
Ming Hsieh Department of Electrical Engineering
University of Southern California
Los Angeles, CA 90089 USA
[email protected]

Alexander A. Sawchuk
Signal and Image Processing Institute
Ming Hsieh Department of Electrical Engineering
University of Southern California
Los Angeles, CA 90089 USA
[email protected]

ABSTRACT

Human activity modeling and recognition using wearable sensors is important in pervasive healthcare, with applications including quantitative assessment of motor function, rehabilitation, and elder care. Previous human activity recognition techniques use a "whole-motion" model in which continuous sensor streams are divided into windows with a fixed time duration whose length is chosen such that all the relevant information in each activity signal can be extracted from each window. In this paper, we present a statistical motion primitive-based framework for human activity representation and recognition. Our framework is based on Bag-of-Features (BoF), which builds activity models using histograms of primitive symbols. We experimentally validate the effectiveness of the BoF-based framework for recognizing nine activity classes and evaluate six factors which impact the performance of the framework. These factors include window size, choice of features, methods to construct motion primitives, motion vocabulary size, weighting schemes of motion primitive assignments, and learning machine kernel functions. Finally, we demonstrate that our statistical BoF-based framework achieves much better performance compared to a non-statistical string-matching-based approach.

Categories and Subject Descriptors

I.5.4 [Pattern Recognition]: Applications; J.3 [Computer Applications]: Life and Medical Sciences

General Terms

Algorithms, Design, Experimentation, Performance

Keywords

Pervasive Healthcare, Wearable Sensing Technologies, Human Activity Recognition, Motion Primitives, Pattern Recognition, Bag-of-Features

1. INTRODUCTION

In pervasive healthcare, human activity analysis and recognition plays a central role because the specific activities people perform in their daily lives can be used to assess the fitness of the human body and quality of life. Traditionally, activity analysis is studied as a computer vision problem where human activities are captured by cameras deployed in the infrastructure. The major drawback of the vision-based platform is its inability to track people beyond the reach of the cameras. The emergence of wearable sensor systems attempts to address this problem. These sensors are miniaturized such that they can be worn on the human body and continuously capture people's activity signals unobtrusively.

Most wearable sensor-based activity recognition techniques represent activities using a "whole-motion" model in which continuous sensor streams are divided into fixed-length windows. The window length is chosen such that all the information of the activity can be extracted from each window. Features are then extracted from the window and used as input to the classifier. Although this "whole-motion" model has proven very effective in existing studies, its performance is highly dependent on the window length [1]. As a possible solution to this problem, motion primitive-based models were proposed and have recently attracted considerable research attention.

Motion primitive-based models are inspired by the similarity between human speech signals and human motion [2]. In human speech recognition, sentences are first divided into isolated words, which are then divided into sequences of phonemes. Models are first built for the approximately 50 phonemes shared by all words (in English). These phoneme models then act as the basic building blocks to build words and sentences in a hierarchical manner [3]. Following the same idea, in the motion primitive-based model, each activity is represented as a sequence of motion primitives which act as the smallest units to be modeled. Different from the "whole-motion" model that examines the global features of human activities, motion primitives capture the invariant aspects of the local features and, more importantly, provide insights for a better understanding of human motion.

The key issues related to the motion primitive-based model are: (1) constructing meaningful motion primitives that contain salient motion information; and (2) representing activities based on the extracted primitives. Most existing approaches construct primitives either using fixed-length windows with identical temporal/spatial duration or through clustering. Each window is then mapped to a symbol according to a specific mapping rule. As a consequence, the continuous activity signal is transformed into a string of symbols where each symbol represents a primitive. Figure 1 shows an example on two activity classes: walking forward (top) and running (bottom). For illustration purposes, a total of five motion primitives are used (labeled A, B, C, D, E in different colors). In this example, walking forward contains five types of motion primitives (A, B, C, D, E) while running contains four (B, C, D, E). For both activities, the first line shows the original sensor signal and the second line shows the primitive mapping of the original sensor signal. Below these are five lines showing the locations of the five motion primitives in the signal. The last line is a sample of the symbol string.

Figure 1: An example of activity representation (walking forward (top) and running (bottom)) using five motion primitives (labeled A, B, C, D, E in different colors).

To build activity models based on these extracted primitives, one common strategy is to adopt a string-matching-based approach. Specifically, in the training stage, for each activity class, a string which minimizes the sum of intra-class distances is created and acts as a template to represent all training instances belonging to that class. Since different strings in general do not have the same length, the distances between them are normally measured by edit distance (Levenshtein distance) [4]. In the recognition stage, the test instance is first transformed into a primitive string, and then classified to the activity class whose template matches the test instance best. Although this string-matching-based strategy shows competitive performance in both vision-based and wearable sensor-based activity recognition tasks [5] [6] [7] [8], its main drawback is its high sensitivity to noise and its poor performance in the presence of high intra-class variation [9]. Under such conditions, it is extremely difficult to extract a meaningful template for each activity class. Therefore, to overcome this problem, we use a statistical approach.

Our statistical motion primitive-based framework is based on the Bag-of-Features (BoF) model, which has been applied in many applications such as text document classification, texture and object recognition, and has demonstrated impressive performance [10]. Different from the string-matching-based strategy, our BoF-based framework takes advantage of state-of-the-art learning machines with the aim of building statistically robust activity models. There are two goals of this work. The first goal is to explore the feasibility of applying a BoF-based framework to human activity recognition and examine whether BoF can achieve better performance compared to the string-matching-based approach. Our second goal is to perform a thorough study of several factors which could impact the performance of the framework. These factors include the size of windows, choices of features, methods to construct motion primitives, size of the motion vocabulary, weighting schemes of motion primitive assignments, and kernel functions of the learning machines.

The rest of this paper is organized as follows. Section 2 gives a brief survey of some recent work on human activity recognition. Section 3 introduces the sensing platform and dataset used for this study. Section 4 describes the basic idea of BoF and outlines the key components of the BoF framework. Section 5 presents our experimental results on the evaluations of these factors and compares the performance between BoF and the traditionally used string-matching-based approach. Finally, Section 6 concludes this paper and establishes directions for future work.
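The string-matching baseline described above compares primitive strings with the Levenshtein edit distance. As a concrete illustration, the following Python sketch (our own, not taken from the paper; the template strings are made up for the example) computes the edit distance and assigns a test string to the class whose template is closest:

```python
# Minimal sketch (not the authors' implementation): Levenshtein edit distance
# between two primitive strings, as used by string-matching-based recognizers
# to compare a test string against per-class templates.
def edit_distance(a: str, b: str) -> int:
    """Number of insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def classify_by_template(test_string, templates):
    """templates: dict mapping activity label -> template primitive string."""
    return min(templates, key=lambda label: edit_distance(test_string, templates[label]))

# Example: primitive strings over the alphabet {A, B, C, D, E} from Figure 1.
templates = {"walk_forward": "ABCDEABCDE", "run": "BCDEBCDE"}
print(classify_by_template("BCDEBBCDE", templates))  # -> "run"
```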

2. RELATED WORK

In keeping with our recognition technique, we broadly group existing activity recognition methods into two categories based on the granularity level at which human activities are modeled: "whole-motion"-based methods and motion primitive-based methods. In this section, we review some recent work from each category.

In the case of the "whole-motion" model, different combinations of features and classifiers have been extensively studied on different sets of activities. In [11], Bao et al. studied statistical and frequency domain features in conjunction with four classifiers including decision trees (C4.5), decision tables, naive Bayes, and nearest-neighbor. Among these classifiers, the decision tree achieved the best performance with an overall recognition accuracy of 84%. Ravi et al. in [12] used similar features as in [11]. They compared the performance of various base-level classifiers with meta-level classifiers including Bagging, Boosting, Plurality Voting, and Stacking. Based on the experimental results, they concluded that using meta-classifiers was in general effective. In particular, combining classifiers using Plurality Voting turned out to be the best classifier.

Recently, motion primitive-based approaches have received considerable research attention due to their capability of capturing local characteristics of activity signals. In [6], motion primitives were constructed by dividing the activity trajectory into fixed-length windows with identical spatial duration, where each window was mapped to a motion primitive based on its trajectory direction in Cartesian space. The problem of activity recognition was then formulated as a standard string-matching problem. Fihl et al. in [5] took a similar approach but replaced the standard deterministic string-matching algorithm with a probabilistic string-matching strategy by using the probabilistic edit distance instead of the standard edit distance. Keogh et al. in [13] tackled the problem from a different angle. They constructed motion primitives based on the shapes of the raw streaming sensor signals, where a subsequence of the raw sensor signal was identified as a motion primitive if its shape was maximally representative of an activity class. Activity recognition was performed by searching for the subsequence which had the most similar shape to the known motion primitive.

In this work, we follow the basic principles of the motion primitive-based model. Different from the existing approaches which formulate activity recognition as a string/shape matching problem, we take advantage of statistical learning machines, with the hope that the statistical approach can remedy the drawbacks of the non-statistical approaches and therefore make the recognition system more robust.

3. SENSING PLATFORM AND DATASET

For this work, data is recorded using an off-the-shelf multimodal sensing platform called MotionNode [14] (see Figure 2). MotionNode is a 6-DOF inertial measurement unit (IMU) specifically designed for human motion sensing applications. It integrates a 3-axis accelerometer, a 3-axis gyroscope, and a 3-axis magnetometer. In this work, only the data sampled from the accelerometer and gyroscope is considered. The measurement range for each axis of the accelerometer and gyroscope is ±6g and ±500 dps respectively. The sampling rates for both accelerometer and gyroscope are set to 100 Hz. This setting is high enough to capture all the details of normal human activities [11].

ID  Activity         Description
1   Walk forward     The subject walks forward along a straight street/corridor
2   Walk left        The subject keeps walking counter-clockwise around the circle
3   Walk right       The subject keeps walking clockwise around the circle
4   Go up stairs     The subject goes up one flight
5   Go down stairs   The subject goes down one flight
6   Run forward      The subject runs forward at his/her normal speed
7   Jump up          The subject stays at the same position and keeps jumping up
8   Sit on a chair   The subject sits on a chair, either working or resting; fidgeting is also considered to belong to this class
9   Stand            The subject stands and talks to somebody

Table 1: Activities and their brief descriptions

To collect data, six participants with different gender, age, height, and weight are selected to perform the nine types of activities listed in Table 1. We select these activities because they correspond to the most basic and common activities in people's daily life and are useful for both elder care and personal fitness applications. During data collection, to extract the maximal information while minimizing the obtrusiveness of the sensing device, a single MotionNode is packed into a mobile phone pouch and attached to the participant's right front hip. Each participant performs five trials for each activity on different days at various indoor and outdoor locations without supervision.

Figure 2: MotionNode sensor and its placement during data collection. (a) MotionNode platform. (b) During data collection, MotionNode is packed firmly into a mobile phone pouch and attached to the subject's front right hip.

4. THE BAG-OF-FEATURES FRAMEWORK

Figure 3 gives a graphical overview of our BoF-based framework for human activity representation and recognition. The framework consists of two stages. In the training stage, the streaming sensor data of each activity is first divided into a sequence of fixed-length window cells whose length is much smaller than the duration of the activity itself. Features are extracted from each window cell to form a local feature vector. The local feature vectors from all training activity classes are then pooled together and quantized through an unsupervised clustering algorithm to construct the motion vocabulary, where the center of each generated cluster is treated as a unique motion primitive in the vocabulary. By mapping the window cells to the motion primitives in the vocabulary, the activity signal is transformed into a string of motion primitives. Here, we assume that activity signals do not follow any grammar and thus information about the temporal order of motion primitives is discarded. Instead, we construct a histogram representing the distribution of motion primitives within the string, and map the distribution into a global feature vector. Finally, this global feature vector is used as input to the classifier to build activity models and learn the classification function. In the recognition stage, we first transform the unknown stream of sensor data into motion primitives and construct the global feature vector based on the distribution of the motion primitives. Then we classify the unknown sensor data to the activity class that has the most similar distribution in the primitive space. In the remainder of this section, we present the details of all the key components of this framework.

4.1 Size of Window Cells

As the first parameter of our BoF framework, the size of window cells is known to have a critical impact on recognition performance [1]. A large size may fail to capture the local properties of the activities and thus dilute the discriminative power of the motion primitive-based model. A small size, on the other hand, is highly sensitive to noise and thus less reliable for generating meaningful results. This trade-off between discrimination and stability motivates the study of the size of window cells. Our survey shows that a wide range of window cell sizes have been used in previous work, leading to difficulties in interpreting and comparing their results. At one extreme, Huynh et al. in [15] and Krause et al. in [16] extracted features from a 4-second window and an 8-second window respectively. At the other extreme, Stiefmeier et al. in [6] adopted a 0.1 second window. In this work, we experiment with window sizes ranging from 0.1 to 2 seconds. The best size is the one at which the classification accuracy reaches its maximum. We did not experiment with window sizes beyond 2 seconds since the "whole-motion" model has exhibited good performance at and beyond such scales in many existing studies.
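As an illustration of the window-cell segmentation just described, the sketch below divides a continuous 100 Hz stream into fixed-length, non-overlapping cells. It is our own example under the paper's stated setup; the 0.2 s cell size is the value found to work well later in the paper, and the array shapes are assumptions for the example.

```python
import numpy as np

# Illustrative sketch (our own): divide a continuous 100 Hz accelerometer/gyroscope
# stream into fixed-length, non-overlapping window cells.
SAMPLING_RATE_HZ = 100

def segment_into_cells(stream: np.ndarray, cell_seconds: float = 0.2) -> np.ndarray:
    """stream: (num_samples, num_channels), e.g. 6 channels = 3-axis accel + 3-axis gyro.
    Returns an array of shape (num_cells, cell_length, num_channels)."""
    cell_length = int(round(cell_seconds * SAMPLING_RATE_HZ))
    num_cells = len(stream) // cell_length
    trimmed = stream[:num_cells * cell_length]
    return trimmed.reshape(num_cells, cell_length, stream.shape[1])

# Example: 10 seconds of synthetic 6-channel data -> 50 cells of 20 samples each.
fake_stream = np.random.randn(10 * SAMPLING_RATE_HZ, 6)
cells = segment_into_cells(fake_stream, cell_seconds=0.2)
print(cells.shape)  # (50, 20, 6)
```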

Figure 3: Block diagram of Bag-of-Features (BoF)-based framework for human activity representation and recognition
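The two-stage flow of Figure 3 can be summarized in code. The skeleton below is our own condensed illustration of that flow (the toy per-cell features, the vocabulary size, and the synthetic data are placeholders, not the authors' configuration); the individual components are detailed in the subsections that follow.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

# Condensed, self-contained sketch of the Figure 3 pipeline (training: segment ->
# local features -> vocabulary -> primitive histogram -> SVM; recognition: reuse
# the vocabulary and the trained SVM). Our own illustration, not the authors' code.
def local_features(cell):
    # toy per-cell features: per-channel mean and standard deviation
    return np.concatenate([cell.mean(axis=0), cell.std(axis=0)])

def histogram(feats, vocab):
    return np.bincount(vocab.predict(feats), minlength=vocab.n_clusters).astype(float)

def train_bof(cells_per_instance, labels, vocab_size=8):
    feats = [np.array([local_features(c) for c in cells]) for cells in cells_per_instance]
    vocab = KMeans(n_clusters=vocab_size, n_init=10, random_state=0).fit(np.vstack(feats))
    X = np.array([histogram(f, vocab) for f in feats])
    return vocab, SVC(kernel="linear").fit(X, labels)

def recognize(cells, vocab, clf):
    feats = np.array([local_features(c) for c in cells])
    return clf.predict(histogram(feats, vocab).reshape(1, -1))[0]

# Tiny synthetic example: two "activities" that differ in signal variance.
rng = np.random.default_rng(0)
make_instance = lambda scale: rng.normal(0.0, scale, size=(50, 20, 6))  # 50 cells x 20 samples x 6 channels
vocab, clf = train_bof([make_instance(1.0), make_instance(1.0), make_instance(3.0), make_instance(3.0)],
                       labels=[0, 0, 1, 1])
print(recognize(make_instance(3.0), vocab, clf))  # expected: 1
```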

4.2 Features

It is well understood that high quality features are essential for improving the classification accuracy of any pattern recognition system. In human activity recognition, a variety of features in both the time and frequency domains have been investigated within the framework of the "whole-motion" model. Popular examples are mean, variance, FFT coefficients, spectral entropy, correlation, etc. However, little work has been done on feature analysis at the primitive level. As a consequence, whether the features that work well at the "whole-motion" level can be extended to the primitive level remains an interesting question to be explored.

In this work, we evaluate two feature sets at the primitive level. The first feature set contains traditionally used statistical features. However, at the primitive level, since the total number of samples within each window cell is much smaller, complex statistical features such as skewness, kurtosis, and spectral entropy may not be reliably calculated. Therefore, we only consider statistical features that can be reliably calculated at the primitive level. Table 2 lists the statistical features we include in this work. These features are extracted from each axis of both the accelerometer and the gyroscope.

Statistical Feature    Description
Mean                   The DC component (average value) of the signal over the window
Standard Deviation     Measure of the spread of the signal over the window
Root Mean Square       The quadratic mean value of the signal over the window
Averaged Derivatives   The mean value of the first-order derivatives of the signal over the window
Mean Crossing Rate     The total number of times the signal changes from below average to above average or vice versa, normalized by the window length

Table 2: Statistical features calculated at the primitive level

The second set of features are called physical features, which are derived based on the physical parameters of human motion [17]. The definitions of the physical features we study in this work are listed below.

• Movement Intensity (MI): MI is defined as

  MI(t) = \sqrt{a_x(t)^2 + a_y(t)^2 + a_z(t)^2},    (1)

  the Euclidean norm of the total acceleration vector after removing the static gravitational acceleration, where a_x(t), a_y(t), and a_z(t) represent the t-th acceleration sample of the x, y, and z axis in each window respectively. This feature is independent of the orientation of the sensing device, and measures the instantaneous intensity of human movement at index t. We do not use MI directly, but compute the mean (AI) and variance (VI) of MI over the window and use them as two features, given by

  AI = \frac{1}{T} \sum_{t=1}^{T} MI(t),    (2)

  VI = \frac{1}{T} \sum_{t=1}^{T} \left( MI(t) - AI \right)^2,    (3)

  where T is the window length.

• Normalized Signal Magnitude Area (SMA): SMA is defined as

  SMA = \frac{1}{T} \left( \sum_{t=1}^{T} |a_x(t)| + \sum_{t=1}^{T} |a_y(t)| + \sum_{t=1}^{T} |a_z(t)| \right),    (4)

  the acceleration magnitude summed over the three axes within each window, normalized by the window length. This feature is used as an indirect estimate of energy expenditure [18].

• Eigenvalues of Dominant Directions (EVA): Dominant directions refer to the directions along which intensive human motion occurs. They are extracted as the eigenvectors of the covariance matrix of the acceleration along the x, y, and z axes in each window. The eigenvalues measure the corresponding relative motion magnitude along those directions. In this work, we use the top two eigenvalues as our features, corresponding to the relative motion magnitude along the vertical direction and the heading direction respectively.

• Correlation between Acceleration along Gravity and Heading Directions (CAGH): CAGH is calculated as the correlation coefficient between the acceleration along the gravity direction and the acceleration along the heading direction.

• Averaged Velocity along Heading Direction (AVH): AVH is computed by averaging the instantaneous velocity along the heading direction over the window. The instantaneous velocity at each time t is calculated by numerical integration of the acceleration along the heading direction.

• Averaged Velocity along Gravity Direction (AVG): AVG is computed by averaging the instantaneous velocity along the gravity direction over the window. The instantaneous velocity at each time t is calculated by numerical integration of the acceleration along the gravity direction.

• Averaged Rotation Angles related to Gravity Direction (ARATG): ARATG captures the rotation movement of the human torso by calculating the cumulative sum of the rotation angles around the gravity direction, normalized by the window length.

• Averaged Acceleration Energy (AAE): AAE is defined as the mean value of the energy over the three acceleration axes, where energy is the sum of the squared discrete FFT component magnitudes of the signal from each sensor axis, normalized by the window length. The DC component of the FFT is excluded from this sum since it is already measured by the mean feature.

• Averaged Rotation Energy (ARE): ARE is defined as the mean value of the energy over the three gyroscope axes.

It is worth noting that the extraction of physical features differs from that of statistical features. For statistical features, each feature is extracted from each sensor axis individually. In comparison, most of the physical features are extracted from multiple sensor axes. In other words, sensor fusion is performed at the feature level for physical features.
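As an illustration of how some of the per-cell features above might be computed, the sketch below implements a few of them (mean, standard deviation, root mean square, averaged derivatives, mean crossing rate, and AI/VI/SMA) with NumPy. It is our own approximation of the definitions in Table 2 and Equations (1)-(4), assuming gravity has already been removed from the acceleration; it is not the authors' code.

```python
import numpy as np

def statistical_features(axis_signal: np.ndarray) -> np.ndarray:
    """A few of the Table 2 features for one sensor axis over one window cell."""
    x = axis_signal
    mean = x.mean()
    std = x.std()
    rms = np.sqrt(np.mean(x ** 2))
    avg_deriv = np.mean(np.diff(x))
    mcr = np.sum(np.diff(np.sign(x - mean)) != 0) / len(x)  # mean crossing rate
    return np.array([mean, std, rms, avg_deriv, mcr])

def physical_features(accel: np.ndarray) -> np.ndarray:
    """AI, VI (Eqs. 1-3) and SMA (Eq. 4) for one window cell.
    accel: (T, 3) linear acceleration with gravity removed (assumption)."""
    T = len(accel)
    mi = np.linalg.norm(accel, axis=1)   # Eq. (1), per sample
    ai = mi.mean()                       # Eq. (2)
    vi = np.mean((mi - ai) ** 2)         # Eq. (3)
    sma = np.abs(accel).sum() / T        # Eq. (4)
    return np.array([ai, vi, sma])

# Example on a 0.2 s cell (20 samples at 100 Hz) of synthetic 3-axis acceleration.
cell = np.random.randn(20, 3)
print(statistical_features(cell[:, 0]), physical_features(cell))
```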

4.3 Primitive Construction

Primitive construction forms the basis of BoF and thus plays an important role in our framework. The extracted primitives are expected to contain salient human motion information and thus can be used to interpret human motion in a more meaningful way. Existing approaches construct motion primitives either using fixed-length windows with identical temporal/spatial duration or through unsupervised clustering. Stiefmeier et al. in [6] first recorded the motion trajectory and divided the trajectory into fixed-length windows with identical spatial duration. Motion primitives were then constructed by quantizing all the fixed-length windows based on their trajectory directions calculated in Cartesian space. Krause et al. in [16], Huynh et al. in [15], and Ghasemzadeh et al. in [8] followed the same procedure as in [6], but used clustering algorithms to group data points with consistent feature values to construct motion primitives. In [16] and [15], the authors used K-means for clustering. In [8], a Gaussian Mixture Model (GMM) was used and was argued by the authors to outperform K-means due to its tolerance to cluster overlap and cluster shape variation. In this work, we evaluate both the K-means and GMM methods.

4.4 Vocabulary Size

The result of primitive construction is a motion vocabulary where each generated cluster is treated as a unique motion primitive in the vocabulary. As a result, the vocabulary size is equal to the total number of clusters. Vocabulary size has a similar effect as the size of window cells mentioned in Section 4.1. Specifically, a small vocabulary may lack discriminative power since two window cells may be assigned to the same cluster even if they are not similar to each other. On the contrary, a large vocabulary is sensitive to noise and thus susceptible to overfitting.

In [15], Huynh et al. experimented with vocabulary sizes ranging from 10 to 800. The best vocabulary size was determined based on the classification accuracy. Ghasemzadeh et al. in [8] selected the vocabulary size with the best Bayesian Information Criterion (BIC) score. In [16], the best vocabulary size was determined by the guidance of the Davies-Bouldin index. In our study, we experiment with vocabularies of 5 to 200 primitives. These vocabulary sizes cover most of the implementation choices in the existing work. The best vocabulary size is determined empirically, similar to our determination of the best window cell size.

4.5 Primitive Weighting

Given the motion vocabulary, the next step is to construct the global feature vector to represent activities based on the distribution of the motion primitives. There are many ways to describe this distribution. In this work, we evaluate three weighting schemes that map the distribution of motion primitives to the global feature vectors.

• Term Weighting: Term weighting originates from text information retrieval, where the counts of occurrences of words in a given text are used as features for text classification tasks. In our case, the local feature vector extracted from each window cell is first mapped to its nearest motion primitive in the feature space. This quantization process generates a primitive histogram which describes the distribution of the motion primitives for each activity. Given the primitive histogram, the feature value of each dimension of the global feature vector is set to the count of the corresponding motion primitive in the histogram. Formally, let x_i be the local feature vector associated with the i-th window cell of the activity signal x, and let P_j denote the j-th primitive (cluster) out of m primitives (clusters) in the vocabulary. The term weighting feature mapping \varphi_{term} is defined as

  \varphi_{term}(x) = [\varphi_1, \ldots, \varphi_m]^T, \quad \text{where } \varphi_j = \sum_{i \in x} \varphi_j^i \text{ and } \varphi_j^i = \delta(x_i \in P_j).    (5)

• Binary Weighting: Binary weighting is similar to term weighting, with the difference that the feature value of each dimension of the global feature vector is either 1 or 0. The value 1 indicates the presence of the corresponding motion primitive in the primitive histogram, while the value 0 indicates its absence. The binary weighting feature mapping \varphi_{binary} is defined as

  \varphi_{binary}(x) = [\varphi_1, \ldots, \varphi_m]^T, \quad \text{where } \varphi_j = \bigvee_{i \in x} \varphi_j^i \text{ and } \varphi_j^i = \delta(x_i \in P_j),    (6)

  where \vee is the logical OR operator.

• Soft Weighting: The two weighting schemes described above are directly migrated from the text information retrieval domain. For text, words are discrete and sampled naturally according to language context. For the human motion signals in our case, the signals are continuous and the motion primitives are the outcome of clustering. Based on this difference, although the harsh quantization that associates each window cell with only its nearest cluster shows good performance in text analysis and categorization tasks, it may not be optimal for continuous, smoothly-varying human motion signals. For example, two window cells assigned to the same motion primitive are not necessarily equally similar to that primitive since their distances to the primitive may be different. Therefore, the significance of motion primitives is weighted more accurately if these distances are taken into consideration. In this work, we propose a soft weighting scheme that takes the distances (similarity) between window cells and motion primitives into account during weight assignment. Formally, let c_j denote the j-th cluster center (primitive prototype), and let K(·, ·) represent the kernel function used as the similarity measure. The soft weighting feature mapping \varphi_{soft} is defined as

  \varphi_{soft}(x) = [\varphi_1, \ldots, \varphi_m]^T, \quad \text{where } \varphi_j = \sum_{i \in x} \varphi_j^i \text{ and } \varphi_j^i = K(x_i, c_j),    (7)

  where K(x_i, c_j) measures the similarity between the i-th window cell x_i and the cluster center c_j. In this work, we use the Laplacian kernel

  K(x_i, c_j) = \exp\left( -\frac{\| x_i - c_j \|}{\sigma_j} \right),    (8)

  where \sigma_j is the standard deviation of primitive P_j. As a consequence, the feature value of the j-th dimension of the global feature vector, \varphi_j, measures the total similarity of all the window cells of the activity signal x to the primitive prototype c_j.
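To make the vocabulary construction and the three weighting schemes concrete, here is a small NumPy/scikit-learn sketch written against the definitions above (Equations 5-8). It is our own illustration; in particular, the K-means settings and the per-cluster sigma estimate are assumptions, not the authors' exact procedure.

```python
import numpy as np
from sklearn.cluster import KMeans

# Sketch of Sections 4.3-4.5: build a motion vocabulary with K-means, then map
# one activity's window-cell features to a global feature vector with term,
# binary, or Laplacian-kernel soft weighting (Eqs. 5-8).
def build_vocabulary(pooled_local_features, m=125):
    vocab = KMeans(n_clusters=m, n_init=10, random_state=0).fit(pooled_local_features)
    dist_to_own_center = np.linalg.norm(
        pooled_local_features - vocab.cluster_centers_[vocab.labels_], axis=1)
    # sigma_j estimated from the spread of the training cells assigned to cluster j (assumption)
    sigma = np.array([max(dist_to_own_center[vocab.labels_ == j].std(), 1e-8)
                      for j in range(m)])
    return vocab, sigma

def global_vector(local_features, vocab, sigma, scheme="soft"):
    m = vocab.n_clusters
    assignments = vocab.predict(local_features)
    if scheme == "term":                                    # Eq. (5): occurrence counts
        return np.bincount(assignments, minlength=m).astype(float)
    if scheme == "binary":                                  # Eq. (6): presence / absence
        return (np.bincount(assignments, minlength=m) > 0).astype(float)
    dists = np.linalg.norm(local_features[:, None, :] - vocab.cluster_centers_[None, :, :], axis=2)
    return np.exp(-dists / sigma).sum(axis=0)               # Eqs. (7)-(8): soft weighting

# Example: 300 pooled training cells with 12-dim local features, 10-primitive vocabulary.
pooled = np.random.randn(300, 12)
vocab, sigma = build_vocabulary(pooled, m=10)
cells_of_one_activity = np.random.randn(40, 12)
print(global_vector(cells_of_one_activity, vocab, sigma, scheme="soft").shape)  # (10,)
```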
4.6 Classifier and Kernels

The choice of classifier is critical to the recognition performance. Since the size of the motion vocabulary can be potentially large, in this work we choose Support Vector Machines (SVMs) as our learning machine. They have proved to be very effective in handling high-dimensional data in a wide range of machine learning and pattern recognition applications [19].

An SVM aims to maximize the margin between different classes, where the margin is defined as the distance between the decision boundary and the nearest training instances. These instances, called support vectors, finally define the classification functions [20]. Mathematically, for a two-class classification scenario, given a training set of instance-label pairs (x_i, y_i), i = 1, ..., l, where x_i \in \mathbb{R}^n represents the n-dimensional feature vector and y_i \in \{1, -1\} represents the class label, the support vector machine requires the solution of the following optimization problem:

  \min_{w, b, \xi} \; \frac{1}{2} w^T w + C \sum_{i=1}^{l} \xi_i
  \text{subject to: } y_i \left( w^T \phi(x_i) + b \right) \geq 1 - \xi_i, \quad \xi_i \geq 0, \; i = 1, \ldots, l,    (9)

where \phi is a function that maps the training instance x_i into a higher (possibly infinite) dimensional space; \xi_i are called slack variables, which measure the degree of misclassification; and C > 0 is the soft-margin constant acting as a regularization parameter to control the tradeoff between training error minimization and margin maximization.

To enable efficient computation in the high-dimensional feature space, a kernel function K(x_i, x_j) \equiv \phi(x_i)^T \phi(x_j) is defined. The choice of the kernel function K(x_i, x_j) is critical for statistical learning. Although a number of general purpose kernels have been proposed, it is unclear which one is the most effective for BoF in the context of human activity classification. In this work, we evaluate the following two kernels, which are both Mercer kernels [20].

• Linear kernel:

  K_{linear}(x_i, x_j) = x_i^T x_j    (10)

• Gaussian RBF kernel:

  K_{Gaussian}(x_i, x_j) = \exp\left( -\gamma \| x_i - x_j \|^2 \right), \quad \gamma > 0    (11)
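For reference, training such a classifier on the global feature vectors can be done with any off-the-shelf SVM implementation. The sketch below uses scikit-learn; the specific C and gamma grids are placeholders to be tuned by cross-validation, as the paper does, and are not values reported by the authors.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

# Illustrative SVM training on BoF global feature vectors (one row per activity
# instance); SVC handles the multi-class case internally.
X_train = np.random.rand(45, 125)        # e.g. 45 instances, 125-primitive vocabulary
y_train = np.repeat(np.arange(9), 5)     # 9 activity classes, 5 instances each

param_grid = [
    {"kernel": ["linear"], "C": [0.1, 1, 10]},                        # Eq. (10)
    {"kernel": ["rbf"], "C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]},  # Eq. (11)
]
search = GridSearchCV(SVC(), param_grid, cv=3)  # 3-fold cross-validation, as in Section 5
search.fit(X_train, y_train)
print(search.best_params_)
```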
5. EVALUATION

In this section, we evaluate the effectiveness of our BoF-based framework. We divide the dataset into a training set and a test set. Since each participant performs five trials for each activity, we use three trials from each participant as the training set to build activity models. Three-fold cross validation is used to determine the corresponding parameters. The vocabulary of motion primitives is learned from half of the training set. The remaining two trials from each participant are used as the test set. A confusion table is built from the test set to illustrate the performance of the framework.

5.1 Impact of Window Cell Sizes

Our first experiment aims to evaluate the effect of different window cell sizes on the classification performance. In this experiment, we use the statistical feature set, K-means for primitive construction, term weighting, and a linear kernel for SVM training. Figure 4 shows the average misclassification rates as a function of window cell sizes 0.1, 0.2, 0.3, 0.4, 0.5, 0.8, 1, 1.5, and 2 seconds. Each line represents one vocabulary size. As shown in the figure, vocabulary size 5 has the worst performance across all window cell sizes. This indicates that using only 5 motion primitives is not sufficient to differentiate nine activities. In comparison, for the other three vocabulary sizes, the performances are 30% better on average, with the misclassification rates ranging from 12.4% to 19.8% across all window cell sizes. If we look at each case individually, vocabulary size 50 reaches its minimum misclassification rate at a 0.2 second window cell size, and the rate starts rising as the size increases. For vocabulary sizes 100 and 150, the misclassification rates reach their first local minimum at 0.2 second, and only vary slightly when the window cell size is less than 0.8 second. The performances start degrading when the size is beyond 1 second. Based on these observations, we conclude that the appropriate window cell size is around 0.2 second. Therefore, we only use the 0.2 second window cell in the remaining experiments.

Figure 4: Impact of Window Cell Sizes

5.2 Impact of Vocabulary Sizes

In this experiment, we study the impact of different vocabulary sizes on the classification performance of our BoF framework. We fix the window size to 0.2 second and keep the other factors the same as in the last experiment. Figure 5 shows the average misclassification rates as a function of vocabulary sizes 5, 10, 25, 50, 75, 100, 125, 150, 175, and 200. The error bars represent the standard deviation in the cross validation. As illustrated in the figure, the misclassification rate drops significantly from vocabulary size 5 and stabilizes starting from vocabulary size 75. The misclassification rate reaches its minimum of 12.0% (88.0% accuracy) when 150 motion primitives are used. When the number of motion primitives is bigger than 150, the misclassification rate increases slightly. This indicates that a vocabulary of 150 primitives is sufficient for our activity set. Another interesting observation, when combining the results in Figure 4 and Figure 5, is that vocabulary size has a more significant impact on the performance than the size of the window cell.

Figure 5: Impact of Vocabulary Sizes

5.3 Comparison of Features

Next, we examine the effects of features. Specifically, we use the statistical feature set and the physical feature set described in the previous section, keeping the other factors the same, to construct motion primitives and build activity models respectively. The results are shown in Figure 6. Similar to the statistical features, the misclassification rate based on the physical features drops significantly from vocabulary size 5 and stabilizes starting from vocabulary size 75. The misclassification rate reaches its minimum of 9.9% (90.1% accuracy) when 100 motion primitives are used.

Figure 6: Comparison of Features

In addition, it is interesting to observe that the physical features outperform the statistical features consistently across all vocabulary sizes, with an improvement of 5.7% for the same vocabulary size on average. This indicates that primitives constructed from physical features contain more salient and meaningful motion information compared to the primitives constructed from statistical features. In order to validate this argument and gain a better understanding of why the physical features perform better, we map the primitives constructed by statistical features and by physical features onto the original sensor signals respectively. Figure 7 shows the primitive mappings based on the physical features (top) and the statistical features (bottom) on the same sensor signal (running in this example). For illustration purposes, a total of five motion primitives are used, with different colors representing different primitives. As illustrated, different feature sets lead to different primitive mappings. For statistical features, it is obvious that primitives are constructed based on the signal's statistical characteristics. For example, the primitive in red corresponds to the data points which have mid-range raw values and a positive derivative. The primitive in blue corresponds to the data points which have high raw values and a small derivative. In comparison, primitives constructed based on the physical features contain useful physical meanings that help discriminate different activities. For example, the primitive in cyan illustrated in Figure 7 only occurs in half of the cycle. This primitive may be a very important primitive for describing the motion of the subject's left/right hip (the subject wears the sensing device at this location; see Figure 2) during running. Since physical features outperform statistical features consistently across all vocabulary sizes, only physical features will be used in the remaining experiments.

Figure 7: The difference of primitive mapping between physical features (top) and statistical features (bottom)

5.4 Comparison of Primitive Construction Algorithms

This section compares the performance of the two primitive construction algorithms: K-means and Gaussian Mixture Model (GMM). As shown in Figure 8, for GMM, the misclassification rate drops significantly from vocabulary size 5. The misclassification rate reaches its minimum of 18.5% (81.5% accuracy) when 150 motion primitives are used. Compared to GMM, it is interesting to see that K-means achieves better performance consistently across all vocabulary sizes, with an improvement of 13.5% for the same vocabulary size on average. Our result contradicts the arguments of the authors in [8], indicating that K-means is powerful enough to handle cluster overlap and shape variations of human motion data as long as the number of clusters is sufficient.

Figure 8: Comparison of Primitive Construction Algorithms

5.5 Comparison of Weighting Schemes

Figure 9 illustrates the performance differences between the three primitive weighting schemes. We first examine the relationship between binary weighting and term weighting. In both cases, the misclassification rates drop and then stabilize as the size of the vocabulary increases. The difference between these two cases is that term weighting outperforms binary weighting by a large margin when the vocabulary size is small and by a small margin when the vocabulary size becomes large. This is because, with a larger vocabulary size, the counts of a large number of motion primitives are either 0 or 1, which makes term weighting and binary weighting similar. Next, we see that the Laplacian kernel-based soft weighting scheme outperforms both term weighting and binary weighting across all vocabulary sizes except vocabulary size 10. In particular, soft weighting achieves the minimum misclassification rate of 7% (93% accuracy) when 125 motion primitives are used. This result indicates that, different from words in text information retrieval which are discrete, motion primitives extracted from continuous human motion signals are smooth, and taking this smoothness into account is significant to the classification performance of the BoF framework.

Figure 9: Comparison of Weighting Schemes

5.6 Comparison of Kernel Functions

In this experiment, we examine the performance of the BoF framework when two different kernel functions are used. The results are shown in Figure 10. As illustrated, neither kernel predominates everywhere, but the linear kernel is preferred when the vocabulary size is equal to or larger than 100. This observation can be attributed to the fact that motion primitives are linearly separable when the dimension of the primitive space is high.

Figure 10: Comparison of Kernel Functions

5.7 Confusion Table

The experimental results presented in the previous subsections demonstrate that all six factors are influential to the final classification performance of our BoF-based framework. Here, we investigate the best possible choices of the six factors with the goal of exploring the upper limit of the performance of the BoF framework for human activity recognition. Based on the results presented earlier, we determine the best combination of factors to be a 0.2 second window cell, the physical feature set, a vocabulary with 125 motion primitives, K-means for primitive construction, Laplacian kernel-based soft weighting for motion primitive assignment, and a linear kernel for SVM training. To evaluate the performance of the BoF-based framework with this best combination of factors, a confusion table is built from the test set and is shown in Table 3. The overall recognition accuracy across all activities is 92.7%. If we examine the recognition performance of each activity individually, jump up and run forward are the two easiest activity classes to recognize. Go up stairs and go down stairs have relatively low recall values since they can be confused with other walking-related activities. Stand has the lowest recall value because it is often confused with sit on a chair. This result makes sense since both stand and sit on a chair are static activities, and we expect difficulty in differentiating different static activity classes, especially when the sensing device is attached to the hip of the subjects. Finally, for walk forward, walk left and walk right are the two dominant activity classes into which walk forward is misclassified. However, walk left and walk right never get confused with each other.

Ground Truth \ Classified   WF   WL   WR   GU   GD   RF   JU   SC   ST   Total  Recall
1 Walk forward              126    6    7    2    1    0    0    0    0    142   88.7%
2 Walk left                   7  159    0    0    0    0    0    0    0    166   95.8%
3 Walk right                  9    0  190    2    0    0    0    1    0    202   94.1%
4 Go up stairs                3    0    0   27    1    0    0    0    0     31   87.1%
5 Go down stairs              2    1    0    0   26    1    0    0    0     30   86.7%
6 Run forward                 0    0    0    0    0   93    0    0    0     93   100%
7 Jump up                     0    0    0    0    0    0   54    1    0     55   98.2%
8 Sit on a chair              0    0    0    0    0    0    0  169    9    178   94.9%
9 Stand                       0    1    1    0    0    0    0   22  134    158   84.8%
Total                       147  167  198   31   28   94   54  193  143
Precision                  85.7% 95.2% 96.0% 87.1% 92.9% 98.9% 100% 87.6% 93.7%

(Classified-activity columns: WF = Walk forward, WL = Walk left, WR = Walk right, GU = Go up stairs, GD = Go down stairs, RF = Run forward, JU = Jump up, SC = Sit on a chair, ST = Stand.)

Table 3: Confusion table for the best factor combination when using a 0.2 second window cell, physical feature set, vocabulary size = 125, K-means for primitive construction, soft weighting for motion primitive assignment, and a linear kernel for SVM training. The entry in the i-th row and j-th column is the count of activity instances from class i classified as class j. Overall classification accuracy is 92.7%.
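The recall and precision values in Table 3 follow directly from the confusion counts (recall = diagonal entry / row total, precision = diagonal entry / column total). The short sketch below, which is only a convenience check and not part of the paper's pipeline, reproduces the per-class values and the 92.7% overall accuracy from the counts in Table 3.

```python
import numpy as np

# Rows are ground-truth classes 1-9, columns are classified classes 1-9 (Table 3).
cm = np.array([
    [126,   6,   7,   2,   1,   0,   0,   0,   0],
    [  7, 159,   0,   0,   0,   0,   0,   0,   0],
    [  9,   0, 190,   2,   0,   0,   0,   1,   0],
    [  3,   0,   0,  27,   1,   0,   0,   0,   0],
    [  2,   1,   0,   0,  26,   1,   0,   0,   0],
    [  0,   0,   0,   0,   0,  93,   0,   0,   0],
    [  0,   0,   0,   0,   0,   0,  54,   1,   0],
    [  0,   0,   0,   0,   0,   0,   0, 169,   9],
    [  0,   1,   1,   0,   0,   0,   0,  22, 134],
])

recall = np.diag(cm) / cm.sum(axis=1)     # e.g. 126/142 = 88.7% for walk forward
precision = np.diag(cm) / cm.sum(axis=0)  # e.g. 126/147 = 85.7% for walk forward
accuracy = np.trace(cm) / cm.sum()        # 978/1055 = 92.7% overall
print(np.round(recall * 100, 1), np.round(precision * 100, 1), round(accuracy * 100, 1))
```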

5.8 Comparison with String-Matching

As our last experiment, we conduct a comparative evaluation with the non-statistical string-matching-based approach. We implement the string-matching method described in [8]. We select this method because the authors in [8] also use a clustering algorithm to construct motion primitives. To make a fair comparison, we use a 0.2 second window cell with statistical features and the K-means primitive construction algorithm for both BoF and string-matching. The results are shown in Figure 11. As illustrated in the figure, the average misclassification rate of the string-matching-based approach ranges from 37% to 54% across all vocabulary sizes. In addition, there is no clear trend of the misclassification rate as the vocabulary size varies. This indicates that the string-matching-based approach is not stable, such that it is extremely difficult to determine a meaningful vocabulary size. Moreover, as expected, the string-matching-based approach performs consistently worse than BoF by a large margin across all vocabulary sizes. As explained in the first section, this is because extracting meaningful string templates for the string-matching-based approach is difficult when the activity data is noisy and has a large intra-class variation.

Figure 11: Performance Comparison with String-Matching-Based Approach

6. CONCLUSION AND FUTURE WORK

In this paper, we have investigated the feasibility of using a Bag-of-Features (BoF)-based framework for human activity representation and recognition. The benefit of this framework is that it is able to identify general motion primitives which act as the basic building blocks for modeling different human activities. We have studied six key factors which govern the performance of the BoF-based framework. Our experimental results validate the effectiveness of this framework and show that all six factors are influential to the classification performance of the BoF framework. In summary, our BoF framework achieves a 92.7% overall classification accuracy with a 0.2 second window cell and a vocabulary of 125 motion primitives constructed from physical features using K-means clustering and soft weighting. This result is 32.3% higher than the corresponding non-statistical string-matching-based approach.

Since we assume that activity signals do not follow any grammar, our baseline BoF framework is based entirely on the primitive distribution. Therefore, it is interesting to explore whether using the temporal order of motion primitives in addition to BoF is beneficial. Hidden Markov Models (HMM) and Conditional Random Fields (CRF) are two powerful models for capturing temporal correlations of sequential signals. Although HMM and CRF models have been applied to the activity recognition problem in recent years, little work has been done on building HMM/CRF on top of BoF. We will work along this direction as our future work.

7. REFERENCES

[1] Tâm Huynh and Bernt Schiele. Analyzing features for activity recognition. In Joint Conference on Smart Objects and Ambient Intelligence: Innovative Context-Aware Services: Usages and Technologies (sOc-EUSAI), pages 159–163, Grenoble, France, 2005.
[2] Hassan Ghasemzadeh, Jaime Barnes, Eric Guenterberg, and Roozbeh Jafari. A phonological expression for physical movement monitoring in body sensor networks. In IEEE International Conference on Mobile Ad Hoc and Sensor Systems (MASS), pages 58–68, Atlanta, Georgia, USA, 2008.
[3] Xuedong Huang, Alex Acero, and Hsiao-Wuen Hon. Spoken Language Processing: A Guide to Theory, Algorithm, and System Development. Prentice Hall PTR, Upper Saddle River, NJ, USA, 1st edition, 2001.
[4] Richard O. Duda, Peter E. Hart, and David G. Stork. Pattern Classification. Wiley-Interscience, 2nd edition, November 2001.
[5] Preben Fihl, Michael B. Holte, Thomas B. Moeslund, and Lars Reng. Action recognition using motion primitives and probabilistic edit distance. In International Conference on Articulated Motion and Deformable Objects (AMDO), pages 375–384, Andratx, Mallorca, Spain, 2006.
[6] Thomas Stiefmeier, Daniel Roggen, and Gerhard Tröster. Gestures are strings: efficient online gesture spotting and classification using string matching. In International Conference on Body Area Networks (BodyNets), pages 16:1–16:8, Florence, Italy, 2007.
[7] Thomas Stiefmeier, Daniel Roggen, and Gerhard Tröster. Fusion of string-matched templates for continuous activity recognition. In IEEE International Symposium on Wearable Computers (ISWC), pages 1–4, Boston, MA, USA, 2007.
[8] Hassan Ghasemzadeh, Vitali Loseu, and Roozbeh Jafari. Collaborative signal processing for action recognition in body sensor networks: a distributed classification algorithm using motion transcripts. In ACM/IEEE International Conference on Information Processing in Sensor Networks (IPSN), pages 244–255, Stockholm, Sweden, 2010.
[9] Anil K. Jain, Robert P. W. Duin, and Jianchang Mao. Statistical pattern recognition: A review. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(1):4–37, 2000.
[10] Jianguo Zhang, Marcin Marszalek, Svetlana Lazebnik, and Cordelia Schmid. Local features and kernels for classification of texture and object categories: A comprehensive study. International Journal of Computer Vision, 73:213–238, June 2007.
[11] Ling Bao and Stephen S. Intille. Activity recognition from user-annotated acceleration data. In International Conference on Pervasive Computing, pages 1–17, Linz/Vienna, Austria, 2004.
[12] Nishkam Ravi, Nikhil Dandekar, Preetham Mysore, and Michael L. Littman. Activity recognition from accelerometer data. In Conference on Innovative Applications of Artificial Intelligence (IAAI), pages 1541–1546, Pittsburgh, Pennsylvania, USA, 2005.
[13] Abdullah Mueen, Eamonn Keogh, and Neal Young. Logical-shapelets: an expressive primitive for time series classification. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1154–1162, San Diego, California, USA, 2011.
[14] MotionNode. http://www.motionnode.com.
[15] Tâm Huỳnh, Ulf Blanke, and Bernt Schiele. Scalable recognition of daily activities with wearable sensors. In International Conference on Location- and Context-Awareness (LoCA), pages 50–67, Oberpfaffenhofen, Germany, 2007.
[16] Andreas Krause, Daniel P. Siewiorek, Asim Smailagic, and Jonny Farringdon. Unsupervised, dynamic identification of physiological and activity context in wearable computing. In IEEE International Symposium on Wearable Computers (ISWC), pages 88–97, White Plains, NY, USA, 2003.
[17] Mi Zhang and Alexander A. Sawchuk. A feature selection-based framework for human activity recognition using wearable multimodal sensors. In International Conference on Body Area Networks (BodyNets), Beijing, China, November 2011.
[18] Dean M. Karantonis, Michael R. Narayanan, Merryn Mathie, Nigel H. Lovell, and Branko G. Celler. Implementation of a real-time human movement classifier using a triaxial accelerometer for ambulatory monitoring. IEEE Transactions on Information Technology in Biomedicine, 10(1):156–167, 2006.
[19] Chih-Wei Hsu, Chih-Chung Chang, and Chih-Jen Lin. A practical guide to support vector classification. Bioinformatics, 1(1):1–16, 2010.
[20] Vladimir N. Vapnik. The Nature of Statistical Learning Theory. Springer-Verlag New York, Inc., New York, NY, USA, 1995.