Feature Engineering for Human Activity Recognition

(IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 12, No. 2, 2021 Feature Engineering for Human Activity Recognition Basma A. Atalaa1*, Ibrahim Ziedan2, Ahmed Alenany3, Ahmed Helmi4 Department of Computer and Systems Engineering Faculty of Engineering Zagazig University, Zagazig, 44519 Egypt Abstract—Human activity recognition (HAR) techniques can relatively limited-resources smart devices. Therefore, significantly contribute to the enhancement of health and life numerous studies in literature have been conducted to look for care systems for elderly people. These techniques, which suitable representative features for activities, as well as good generally operate on data collected from wearable sensors or enough recognition models [9]. Moreover, benchmark datasets those embedded in most smart phones, have therefore attracted available in literature are different in type of activities, number increasing interest recently. In this paper, a random forest-based of recorded examples for each activity, experimental settings, classifier for human activity recognition is proposed. The i.e. controlled procedure [18] whether indoor or outdoor classifier is trained using a set of time-domain features extracted environments [19], used sensors and sensor position on from raw sensor data after being segmented into windows of 5 subject body. According to aforementioned factors, there is a seconds duration. A detailed study of model parameter selection significant variance of available HAR systems accuracy in is presented using the statistical t-test. Several simulation experiments are conducted on the WHARF accelerometer conjunction with different datasets [20]. benchmark dataset, to compare the performance of the proposed HAR recognition techniques can be grouped into two main classifier to support vector machines (SVM) and Artificial Neural categories. The first is based on computer vision [21, 22] and Network (ANN). The proposed model shows high recognition the second is based on data collected from one or more rates for different activities in the WHARF dataset compared to sensors. What makes the latter approach appealing is that other classifiers using the same set of features. Furthermore, it sensors are affordable and are usually found in reasonably achieves an overall average precision of 86.1% outperforming priced smartphones. Another advantage is that computational the recognition rate of 79.1% reported in the literature using Convolution Neural Networks (CNN) for the WHARF dataset. and storage requirements for processing sensor data is less From a practical point of view, the proposed model is simple and than those required for image processing techniques. efficient. Therefore, it is expected to be suitable for In this work, the relatively challenging Wearable Human implementation in hand-held devices such as smart phones with Activity Recognition Folder (WHARF) dataset is extensively their limited memory and computational resources. investigated. This dataset is collected using a tri-axial accelerometer placed on the right wrist of subjects; hence it Keywords—Human activity recognition; random forest; feature emulates a smart watch. It is chanllenging because of its small engineering; sensor signal processing sampling rate, 32 Hz, compared to other datasets collected I. INTRODUCTION using e.g. 50 Hz sampling frequency. Real-time considerations for HAR systems require dealing with segments of data points In daily life, a person performs diverse set of activities with window length between 2 seconds and 10 seconds. such as standing up, sitting down, walking, climbing stairs, Therefore, sensors with small sampling rate will deliver fewer etc. Automatic recognition of human activities has interesting data points complicating the task of HAR system. Moreover, applications in healthcare [1], keeping track of elderly people there are 12 different activities in WHARF with few number [2], and home automation [3]. Also, it has many clinical of examples per activity [13]. The proposed approach here applications for stroke patients [4], Parkinson's disease applies data preporcessing in which signals are filtered using a patients[5], heart rate estimation [6] and in a smart health care low-pass filter and then scaled so that all features lie within environment [7]. the same range. In the second step, data is segmented into The last two decades witnessed increasing interest in windows of length 5 seconds with 50% overlapping. In the Human Activity Recognition (HAR) techniques due to the third step, several effective time-domain functions or features availability of low cost sensors specially those built-in sensors are extracted. The proposed classifier employs the Random available in affordable smartphones [8-10]. Commonly used Forest (RF) algorithm which achieves the best precision and sensor types in HAR applications are accelerometers [11-14], also the best training time compared to other classifiers such heart rate belt sensor [15], gyroscope [16, 17], magnetometer as Artificial Neural Networks (ANN) and Support Vector [17], or three-inertial sensor units mounted on chest, right Machine (SVM). The proposed system is expected to be thigh and left ankle [12]. Such inertia devices operate at low efficient and resource-friendly for smart devices. Besides, frequencies and require low sampling rates. There are several sensitivity analysis of proposed system components such as issues which make HAR task challenging such as noisy sensor RF parameters, some important features and preprocessing data, insufficient training examples due to few participating scaling step is conducted. Also, feature importance is subjects, and the need to implement HAR systems on discussed using the statistical t-test. *Corresponding Author 160 | P a g e www.ijacsa.thesai.org (IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 12, No. 2, 2021 The contribution of this work can be highlighted as On the other hand, classifiers used in HAR studies can be follows: (1) introducing RF-based effective and efficient HAR classified into supervised or unsupervised. Supervised system with average precision of 86.1% and average accuracy classifiers [20] include multilayer neural networks [17, 18, 30, of 84.8% which improves the state-of-the-art rate of 79.1% for 31, 34], support vector machine (SVM) [11, 12], decision WHARF dataset, (2) testing the proposed system on the trees [30, 31], random forest [12], k-Nearest Neighbors (kNN) challenging WHARF datase which is considered in only few [12, 16] and Bayes classifier [16, 25]. Unsupervised studies in literature [23] and [24], (3) discussing the practical technique, on the other hand, include Gaussian mixture model implementation issues of proposed system which is important (GMM) [13], linear-discriminant analysis [27, 28], minimal in case of further system application on smart devices, and (4) learning machine (MLM) [16], k-means clustering, conducting sensitivity analysis of important system convolutional neural networks (CNN) [35-37] and hidden components to determine the optimal settings for proposed Markov model (HMM) [12]. system. III. TIME-DOMAIN AND STATISTICAL FEATURES The rest of this paper is organized as follows. In Section II, relevant related work in the literature is reviewed. The set of In this section, the set of features extracted from pre- features to be employed and the proposed Random Forest- processed raw acceleration signals is listed. It is assumed that based classifier are presented in Sections III and IV, there is a three-dimensional dataset of size N data points respectively. In Section V, a set of experiments are conducted collected from an accelerometer or a gyroscope, ax(i), ay(i), to evaluate the performance of the proposed model and az(i), i =1, 2, · · · , N, for the x, y, and z dimensions. The data is compare it to other machine learning techniques. Sensitivity first filtered using low pass filter to reduce noise and extract analysis is preformed to optimally select the parameters of the the body acceleration bx(i), by(i), bz(i) and gravity acceleration proposed model in Section VI. Finally, conclusions and gx(i), gy(i), gz(i) components [24]. possible future work are drawn in Section VII. The set of features to be employed in classification are derived from both body and gravity acceleration signals as II. RELATED WORK listed in Table I. The body acceleration signal features include The HAR procedure from preprocessed raw sensory data the mean (M) and standard deviation (STD) of filtered signals, can be divided into two steps: (1) extracting relevant key autoregressive model coefficients, signal magnitude area, tilt features from collected data signals (so-called feature angle, mean, standard deviation, entropy of jerk of signals, engineering), and (2) classifying the observed activity based mean, standard deviation, power and entropy of jerk of roll on the extracted features. The reduction of data dimensionality angle. For gravity acceleration component, the signal power may can also be required using e.g. principle component along each axis and the mean of angle of x-axis component are analysis [25]. Due to the diversity of feature types and the used. classifiers that can be used in these two steps, respectively, the literature of HAR problem is wide and extensive. IV. THE PROPOSED MODEL Sensors such as tri-axial accelerometer and gyroscope The proposed classifier consists of three stages as shown provide time domain acceleration

Load more