Human-Activity-Recognition.Pdf

Neurocomputing 444 (2021) 217–225
Contents lists available at ScienceDirect
Journal homepage: www.elsevier.com/locate/neucom

Human activity recognition by manifold regularization based dynamic graph convolutional networks

Weifeng Liu a,⇑, Sichao Fu a, Yicong Zhou b, Zheng-Jun Zha c, Liqiang Nie d

a College of Control Science and Engineering, China University of Petroleum (East China), Qingdao 266580, China
b Faculty of Science and Technology, University of Macau, Macau 999078, China
c School of Information Science and Technology, University of Science and Technology of China, Hefei 230026, China
d School of Computer Science and Technology, Shandong University, Qingdao 266237, China

Article history: Received 12 June 2019; Revised 29 September 2019; Accepted 13 December 2019; Available online 25 November 2020

Keywords: Graph convolutional networks; Semi-supervised learning; Human activity recognition

Abstract

Deep learning has shown its superiority in extracting more representative features from multimedia data in recent years. Recently, the most typical graph convolutional network (GCN) has achieved excellent performance in semi-supervised data representation learning tasks. GCN successfully generalizes traditional convolutional neural networks to encode arbitrary graphs by exploiting graph Laplacian-based sample structure information. However, GCN only fuses static structure information. It is difficult to guarantee that this structure information is optimal during the training process and applicable to all practical applications. To tackle this problem, we propose a manifold regularized dynamic graph convolutional network (MRDGCN). The proposed MRDGCN automatically updates the structure information by manifold regularization until model fitting. In particular, we build an optimized convolution layer formulation to acquire the optimal structure information.
Thus, MRDGCN can automatically learn high-level sample features to improve the performance of data representation learning. To demonstrate the effectiveness of the proposed model, we apply MRDGCN to semi-supervised classification tasks. Extensive experimental results on human activity datasets and citation network datasets validate the performance of MRDGCN compared with GCN and other semi-supervised learning methods.

© 2020 Elsevier B.V. All rights reserved.

⇑ Corresponding author.
E-mail address: [email protected] (W. Liu).
https://doi.org/10.1016/j.neucom.2019.12.150
0925-2312/© 2020 Elsevier B.V. All rights reserved.

1. Introduction

With the advancement of science and technology, and the popularity of smart terminals such as smartphones and notebook computers, large-scale multimedia data (e.g. documents, pictures, audio and video) are generated and uploaded to the Internet every day. Images are among the largest, fastest-growing and most informative carriers of multimedia data in today's society. Therefore, image classification and recognition tasks, such as human activity recognition (HAR) [1–3], face recognition [4–6], pedestrian detection [7–9] and object detection [10–12], have become an important part of computer vision, pattern recognition and machine learning in recent years, since they can effectively analyze the content of digital images and give correct judgments. With the development of virtual reality and augmented reality, HAR has attracted much attention in many areas, including video surveillance and accident warning. However, conventional shallow learning algorithms, including support vector machines [13], kernel least squares [14] and logistic regression [15], cannot extract sufficiently representative sample features or meet development needs, which directly affects the results of image classification tasks.

To acquire high-level sample features from massive images, deep learning (DL) was introduced and has been demonstrated to be an effective method. In practice, a small number of labeled samples are readily available, whereas massive labeled samples cannot be directly obtained because labeling requires a lot of manpower, material resources and financial resources. The most successful method is semi-supervised learning with manifold regularization (MRSSL), which uses the manifold structure information of the distribution of unlabeled and labeled samples by treating it as a regularization term of the objective function. That is to say, any two samples that are close in space generally belong to the same category. Liu et al. [16] presented a kernel logistic regression with Laplacian regularization for web image annotation, employing the graph Laplacian to preserve the local geometry of the potential manifold. Tao et al. [17] proposed a Hessian regularized support vector machine model to improve the performance of image annotation, owing to the richer null space of the Hessian. Liu et al. [18] combined the p-Laplacian with support vector machines and kernel least squares, utilizing the p-Laplacian to express the high-order manifold distribution. Ma et al. [19] utilized the hypergraph p-Laplacian to capture the complex relationships among different samples.

MRSSL methods are only effective for regular Euclidean data. There exist vast amounts of non-Euclidean data, or graph data of arbitrary structure. In recent years, spectral convolution methods have received increasing attention and achieved better performance in tasks including text classification [20–22] and image recognition [23–25]. Each node of a graph gathers its neighbors' information through a convolution operation in the Fourier domain; in other words, these methods do not perform the convolution directly on the graph. Kipf and Welling [26] presented a graph convolutional network that learns sample features by fusing the direct neighbor relationships of each node. Fu et al. [27] considered both direct and indirect neighbor relationships to learn richer sample features, which improved the performance of semi-supervised classification. Yadati et al. [28] proposed a hypergraph-based GCN for document classification, using a hypergraph to describe the multicultural relationships among samples.

However, the above methods depend on a static sample distribution, which limits their range of application. Confronting this challenging problem, it is important to design a dynamic graph structure learning model that automatically optimizes the local geometry of samples. In this paper, we propose a dynamic graph convolutional network based on manifold regularization (MRDGCN) for semi-supervised classification. We introduce a manifold regularization term into the objective function, which can drive the objective function to change over the potential sample distribution manifold. When the objective function value cannot meet a specific threshold, MRDGCN separately updates or optimizes its manifold structure information (except in the first convolution layer) and its network weight matrix until model fitting. After many training iterations, the proposed MRDGCN can acquire optimal structure information. In addition, we optimize and derive the convolution layer formulation of GCN, and then propose a general graph structure learning framework. Finally, MRDGCN can extract more high-level sample features by fusing its dynamic structure information to improve

Laplacian $L$, respectively. $U^{T}X$ represents the frequency-domain signal in the Fourier domain. That is to say, spectral graph convolution converts convolution in the time domain into pointwise multiplication in the frequency domain.

However, this method is not suitable for large graphs and has a very high computation cost. To overcome this problem, Defferrard et al. [20] utilized Chebyshev polynomials of the normalized graph Laplacian to approximate the filter $g$, and then proposed a spectral convolution with $K$-order polynomials on graphs, i.e.

$$g_{\theta}(L) \star X = \sum_{k=0}^{K} \theta_k T_k(\tilde{L}) X \qquad (2)$$

In this method, $L$ is rescaled according to $\tilde{L} = \frac{2}{\lambda_{\max}} L - I_N$. $\lambda_{\max}$ represents the maximum eigenvalue of the normalized graph Laplacian $L$; $L$ is equal to $I_N - D^{-1/2} A D^{-1/2}$; $D_{ii} = \sum_j A_{ij}$. $A$ denotes the similarity matrix among different samples. The Chebyshev polynomials are expressed recursively as $T_0(X) = 1$, $T_1(X) = X$ and $T_k(X) = 2X T_{k-1}(X) - T_{k-2}(X)$.

To further build a linear and deep model, Kipf and Welling limited the order of the Chebyshev polynomials ($K = 1$); in other words, they only considered the direct relationships between any two samples. Finally, they obtained a linear convolution layer formulation, i.e. $H^{(L+1)} = \sigma\left(B H^{(L)} W^{(L)}\right)$. $B$ is equal to $D^{-1/2}(A + I_N)D^{-1/2}$. $W^{(L)}$ is the weight parameter matrix to be learned during the training iterations. The detailed derivation can be found in [26]. During training, GCN continues iterating according to the value of the cross-entropy loss objective function and stops once the model has fit.

3. Manifold regularized dynamic graph convolutional networks

MRDGCN can learn more effective sample features by continuously updating the manifold distribution information of samples, apart from in the first convolution layer, during the convolution process, which yields better classification performance than the GCN model. We first introduce the traditional manifold regularization framework into the original objective function of GCN, and then propose a dynamic graph structure learning (DGSL) method. Following, we give the
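
The Chebyshev recursion of Eq. (2) and the linear convolution layer described in Section 2 can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, ReLU is assumed for the activation $\sigma$, the similarity matrix $A$ is assumed to be symmetric and non-negative with at least one edge, and the degree matrix used in $B$ is assumed to be computed from $A + I_N$, as in Kipf and Welling [26].

```python
import numpy as np


def normalized_laplacian(A):
    """Return L = I_N - D^{-1/2} A D^{-1/2}, with D_ii = sum_j A_ij."""
    deg = A.sum(axis=1)
    d_inv_sqrt = np.zeros_like(deg, dtype=float)
    nonzero = deg > 0  # isolated nodes keep a zero scaling factor
    d_inv_sqrt[nonzero] = deg[nonzero] ** -0.5
    return np.eye(A.shape[0]) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]


def chebyshev_filter(A, X, theta):
    """Evaluate sum_k theta_k T_k(L_tilde) X via the Chebyshev recursion."""
    L = normalized_laplacian(A)
    lam_max = np.linalg.eigvalsh(L).max()  # assumes A is symmetric
    L_tilde = (2.0 / lam_max) * L - np.eye(L.shape[0])

    t_prev = X            # T_0(L_tilde) X = X
    t_curr = L_tilde @ X  # T_1(L_tilde) X = L_tilde X
    out = theta[0] * t_prev
    if len(theta) > 1:
        out = out + theta[1] * t_curr
    for k in range(2, len(theta)):
        # T_k = 2 L_tilde T_{k-1} - T_{k-2}
        t_next = 2.0 * (L_tilde @ t_curr) - t_prev
        out = out + theta[k] * t_next
        t_prev, t_curr = t_curr, t_next
    return out


def gcn_layer(A, H, W):
    """One linear convolution layer: ReLU(B H W), B = D^{-1/2} (A + I_N) D^{-1/2}."""
    A_hat = A + np.eye(A.shape[0])
    deg = A_hat.sum(axis=1)  # assumption: degrees taken from A + I_N
    d_inv_sqrt = deg ** -0.5
    B = d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]
    return np.maximum(B @ H @ W, 0.0)  # assumption: sigma is ReLU


if __name__ == "__main__":
    # Toy 3-node path graph with 2-dimensional node features.
    A = np.array([[0.0, 1.0, 0.0],
                  [1.0, 0.0, 1.0],
                  [0.0, 1.0, 0.0]])
    X = np.arange(6, dtype=float).reshape(3, 2)
    rng = np.random.default_rng(0)
    W = rng.normal(size=(2, 4))
    print(chebyshev_filter(A, X, theta=[0.5, 0.3, 0.2]).shape)
    print(gcn_layer(A, X, W).shape)
```

The recursion only needs matrix products with $\tilde{L}$ once $\lambda_{\max}$ is known, so the full eigendecomposition is not required for the filtering itself; the dense `eigvalsh` call here is a convenience for small graphs.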
