A Video Recognition Method by using Adaptive Structural Learning of Long Short Term Memory based Deep Belief Network

Shin Kamada
Advanced Artificial Intelligence Project Research Center, Research Organization of Regional Oriented Studies, Prefectural University of Hiroshima
1-1-71, Ujina-Higashi, Minami-ku, Hiroshima 734-8558, Japan
E-mail: [email protected]

Takumi Ichimura
Advanced Artificial Intelligence Project Research Center, Research Organization of Regional Oriented Studies, and Faculty of Management and Information System, Prefectural University of Hiroshima
1-1-71, Ujina-Higashi, Minami-ku, Hiroshima 734-8558, Japan
E-mail: [email protected]

arXiv:1909.13480v1 [cs.NE] 30 Sep 2019

© 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

Abstract: Deep learning builds deep architectures such as multi-layered artificial neural networks to effectively represent multiple features of input patterns. The adaptive structural learning method of Deep Belief Network (DBN) can realize a high classification capability while searching for the optimal network structure during training. The method finds the optimal number of hidden neurons of a Restricted Boltzmann Machine (RBM) by a neuron generation-annihilation algorithm while training on the given input data, and then adds a new layer to the DBN by the layer generation algorithm to realize a deep data representation. Moreover, the learning algorithm of Adaptive RBM and Adaptive DBN was extended to time-series analysis by using the idea of LSTM (Long Short Term Memory). In this paper, our proposed prediction method was applied to Moving MNIST, which is a benchmark data set for video recognition. We aim to demonstrate the power of our proposed method in the video recognition research field, since video is a rich source of visual information. Compared with the LSTM model, our method showed higher prediction performance (more than 90% prediction accuracy for test data).

Index Terms: Deep learning, Deep Belief Network, Adaptive structural learning method, Video recognition

I. INTRODUCTION

Recently, Artificial Intelligence (AI) with sophisticated technologies has become an essential technique in our lives [1]. In particular, recent advances in deep learning methods enable higher performance on several kinds of big data compared to traditional methods [2], [3]. For example, CNNs (Convolutional Neural Networks) such as AlexNet [4], GoogLeNet [5], VGG16 [6], and ResNet [7] greatly improved classification and detection accuracy in image recognition [8].

Following the improvement in image recognition, deep learning is also applied to video recognition [9]. Video recognition is a kind of fusion task that needs both image recognition and time-series prediction simultaneously. That is, a recurrent function that classifies a given image or detects an object while predicting the future is required. Understanding of time-series video is expected in various industrial fields, such as human detection, pose or facial estimation from a video camera, autonomous driving systems, and so on [10].

LSTM (Long Short Term Memory) is a well-known method for time-series prediction and is applied to deep learning methods [11]. The method enables the traditional recurrent neural network to recognize not only short-term memory but also long-term memory for given sequential data [12]. For video recognition with LSTM, a convolutional filter can be used instead of a one-dimensional neuron, since one frame of sequential video can be seen as one image [13].

In our research, we proposed the adaptive structural learning method of DBN [14]. The adaptive structural learning can find a network structure of suitable size for the given input space during its training. The neuron generation and annihilation algorithms [15], [16] were implemented on the Restricted Boltzmann Machine (RBM) [17], and the layer generation algorithm [18] was implemented on the Deep Belief Network (DBN) [19]. The adaptive structural learning of DBN (Adaptive DBN) shows the highest classification capability in the research field of image recognition on benchmark data sets such as MNIST [20], CIFAR-10, and CIFAR-100 [21]. Moreover, the learning algorithm of Adaptive RBM and Adaptive DBN was extended to time-series prediction by using the idea of LSTM [22]. Whereas LSTM has often been implemented on a CNN structure, we implemented LSTM on our Adaptive RBM and DBN, and the proposed method showed higher prediction accuracy than the other methods on several time-series benchmark data sets, such as Nottingham (MIDI) and CMU (Motion Capture).

For further improvement of the method, our proposed method was applied in this paper to Moving MNIST [23], which is a benchmark data set for video recognition. We aim to demonstrate the power of our proposed method in the video recognition research field, since video is a rich source of visual information. Compared with the LSTM model [24], our method achieves higher prediction performance.

The remainder of this paper is organized as follows. In Section II, the basic idea of the adaptive structural learning of DBN is briefly explained. Section III describes the extension of Adaptive DBN to time-series prediction. In Section IV, the effectiveness of our proposed method is verified on Moving MNIST. In Section V, we give some discussion to conclude this paper.

II. ADAPTIVE LEARNING METHOD OF DEEP BELIEF NETWORK

This section explains the traditional RBM [17] and DBN [19] to describe the basic behavior of our proposed adaptive learning method of DBN.

A. Neuron Generation and Annihilation Algorithm of RBM

While recent deep learning models have higher classification capability, determining the network structure and the number of parameters remains a difficult task in AI research. To address this problem, we have developed the adaptive structural learning method for the RBM model (Adaptive RBM) [14]. The RBM, shown in Fig. 1, is an unsupervised graphical and energy-based model with two kinds of layers: a visible layer for input and a hidden layer for the feature vector. The neuron generation algorithm of the Adaptive RBM can generate an optimal number of hidden neurons, so that the trained RBM has a suitable structure for the given input space.

[Fig. 1. A network structure of RBM: visible neurons v0, v1, v2, ..., vI are connected to hidden neurons h0, h1, ..., hJ by weights Wij.]

The neuron generation is based on the idea of Walking Distance (WD), which is inspired by the multi-layered neural network in [25]. WD is the difference between the previous and the current variance of the learning parameters. An RBM has three kinds of parameters: those for the visible neurons, those for the hidden neurons, and the weights of the connections between them. The Adaptive RBM monitors these parameters except for the visible one ([14] explains why it is excluded). A WD that does not become small means that the existing hidden neurons alone cannot represent an ambiguous pattern, because there are too few hidden neurons. In order to express the ambiguous patterns, a new neuron is inserted that inherits the attributes of the parent hidden neuron, as shown in Fig. 2(a).

In addition to neuron generation, the neuron annihilation algorithm is applied to the Adaptive RBM after the neuron generation process, as shown in Fig. 2(b). Some unnecessary or redundant neurons may be generated by the neuron generation process. Such hidden neurons are therefore removed according to their output activities.

[Fig. 2. Adaptive RBM. (a) Neuron generation: a new hidden neuron hnew is inserted between the hidden neurons h0 and h1. (b) Neuron annihilation.]

B. Layer Generation Algorithm of DBN

A DBN is a hierarchical model that stacks several pre-trained RBMs. In the building process, the output (hidden neuron activations) of the l-th RBM is used as the input of the (l+1)-th RBM. Generally, a DBN with multiple RBMs has higher data representation power than a single RBM. Such a hierarchical model can represent the specified features from an abstract concept to a concrete object in the direction from the input layer to the output layer. However, the optimal number of RBMs depends on the target data space.

We developed Adaptive DBN, which can automatically adjust to an optimal network structure by self-organization, in a way similar to WD monitoring. If neither WD nor the energy function becomes small, then a new RBM is generated to keep suitable classification power for the data set, since the current RBM lacks the data representation power to draw an image of the input patterns. Therefore, the condition for layer generation is defined by using the total WD and the energy function. Fig. 3 shows the overview of layer generation in Adaptive DBN.

[Fig. 3. Overview of Adaptive DBN. Labels, from the input upward: pre-training between 1st and 2nd layers (Generation); pre-training between 2nd and 3rd layers (Generation); pre-training between 3rd and 4th layers (Annihilation); pre-training between 4th and 5th layers; fine-tuning for supervised learning. A suitable number of hidden neurons and layers is automatically generated.]

III. ADAPTIVE RNN-DBN FOR TIME-SERIES PREDICTION

In time-series prediction, some LSTM methods improve the prediction performance of the traditional recurrent neural network by using several gates, such as the forget gate, the peephole connection gate, and the full and gradient gate [11]. These gates can represent multiple patterns of time-series sequence, that is not