Knowledge Extracted from Recurrent Deep Belief Network for Real Time Deterministic Control

Shin Kamada
Graduate School of Information Sciences, Hiroshima City University
3-4-1, Ozuka-Higashi, Asa-Minami-ku, Hiroshima, 731-3194, Japan
Email: [email protected]

Takumi Ichimura
Faculty of Management and Information Systems, Prefectural University of Hiroshima
1-1-71, Ujina-Higashi, Minami-ku, Hiroshima, 734-8559, Japan
Email: [email protected]

Abstract—Recently, the market for deep learning, including not only software but also hardware, is developing rapidly. Big data is collected through IoT devices, and industry analyzes it to improve manufacturing processes. Deep learning has a hierarchical network architecture that represents the complicated features of input patterns. Although deep learning shows high capability in classification, prediction, and related tasks, implementation on GPU devices is required, so we face a trade-off between the higher precision of deep learning and the higher cost of GPU devices. We succeeded in extracting knowledge from a trained deep network with high classification capability: the knowledge that enables faster inference with the pre-trained deep network is extracted as IF-THEN rules from the network signal flow for given input data. Experimental results on benchmark time series data sets showed the effectiveness of our proposed method with respect to computational speed.

I. INTRODUCTION

Recently, the market for deep learning, including not only software but also hardware, is developing rapidly. According to a new market research report [1], this market is expected to be worth more than USD 1770 billion by 2022, growing at a CAGR (Compound Annual Growth Rate) of 65.3% between 2016 and 2022.

Many new algorithms for various learning structures of deep learning have been reported, and a new methodology of Artificial Intelligence (AI) has permeated the industrial world.
In 2016, the IoT (Internet of Things) Acceleration Consortium was established in Japan with the aim of creating an environment that attracts investment in the IoT through public-private collaboration. The big data collected through IoT technologies is analyzed by AI technologies including deep learning. Deep learning has become a popular and sophisticated technology embedded in measuring machines for which high precision is required [2].

For example, a machine that has a robot arm with closed-loop control should know the precise position data immediately to determine the next move. The speed of the events being controlled is an important and critical factor in real-time operation. Among such events, a machine must move to the determined next position within an absolute limit on response time and predict the following events in its environment. A machine must realize strict control between hard and soft real-time necessities.

Deep learning algorithms are implemented on GPU (Graphics Processing Unit) devices. In a hard real-time system, such as an embedded system with a GPU, the overall system is expensive to build. In a soft real-time system, however, a late response leads to inevitable consequences such as the deterioration of production efficiency. We may thus meet the trade-off between the higher precision of deep learning and the higher cost of GPU devices.

Deep learning has a hierarchical network architecture that represents the complicated features of input patterns [3]. Such an architecture is well known to have higher learning capability than conventional models if the best set of parameters for the optimal network structure is found. We have been developing an adaptive learning method that can discover the optimal network structure of a Deep Belief Network (DBN) [4], [5], [6]. The learning method constructs the network with the optimal number of hidden neurons in each Restricted Boltzmann Machine (RBM) [7] and the optimal number of layers in the DBN during the learning phase. Moreover, we developed a recurrent neural network based Deep Belief Network (RNN-DBN) to build a better predictor for time series data sets [8].

However, the implementation of our deep learning method required expensive GPU devices, because otherwise the feed-forward computation takes too long to realize real-time control in a manufacturing process or an image diagnosis device. To spread our method, an operating method for small embedded systems or smart tablets without GPU devices is required.

In this paper, the knowledge that enables faster inference with a pre-trained deep network is extracted as IF-THEN rules from the network signal flow for given input data. Experimental results on benchmark time series data sets showed the effectiveness of our proposed method with respect to computational speed.

II. ADAPTIVE LEARNING METHOD OF RNN-DBN

The Recurrent Neural Network Restricted Boltzmann Machine (RNN-RBM) [9] is an unsupervised learning algorithm for time series data based on the RBM model [7]. Fig. 1 shows the network structure of the RNN-RBM. The model forms a directed graphical model consisting of a sequence of RBMs, like a recurrent neural network, as do the Temporal RBM (Fig. 2) and the Recurrent Temporal RBM (Fig. 3) [10].

Fig. 1. Recurrent Neural Network RBM [9]

Fig. 2. Temporal RBM [10]

In addition to the visible and hidden neurons of the traditional RBM, the RNN-RBM model has a state u ∈ {0, 1}^K that represents the context of the time series, related to the past input sequence. Let V = {v^{(1)}, ..., v^{(t)}, ..., v^{(T)}} be an input sequence of length T. The parameters b^{(t)} and c^{(t)} for the visible layer and the hidden layer, respectively, are calculated from u^{(t-1)} at time t-1 by Eq. (1) and Eq. (2). The state u^{(t)} at time t is updated by Eq. (3):

b^{(t)} = b + W_{uv} u^{(t-1)},    (1)
c^{(t)} = c + W_{uh} u^{(t-1)},    (2)
u^{(t)} = σ(u + W_{uu} u^{(t-1)} + W_{vu} v^{(t)}),    (3)

where σ(·) is the sigmoid function and u^{(0)} is the initial state, which is given a random value. At each time t, the learning of a traditional RBM can be executed with b^{(t)}, c^{(t)}, and the weights W between the two layers.
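To make the recurrence concrete, the following is a minimal NumPy sketch of Eqs. (1)-(3). The array shapes, variable names, and the random initialization of u^{(0)} are illustrative assumptions, and the contrastive-divergence training of the RBM with parameters (b^{(t)}, c^{(t)}, W) at each step is omitted.

```python
# Minimal sketch of the RNN-RBM recurrence in Eqs. (1)-(3); shapes and
# names are assumptions, and the per-step RBM training itself is omitted.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rnn_rbm_forward(V, b, c, u_bias, W_uv, W_uh, W_vu, W_uu, rng=np.random):
    """Compute b(t), c(t) and the context u(t) for V = [v(1), ..., v(T)]."""
    u = rng.random(W_uu.shape[0])     # u(0): random initial state
    biases = []
    for v_t in V:
        b_t = b + W_uv @ u            # Eq. (1): visible bias at time t
        c_t = c + W_uh @ u            # Eq. (2): hidden bias at time t
        biases.append((b_t, c_t))     # the RBM at time t uses (b_t, c_t, W)
        u = sigmoid(u_bias + W_uu @ u + W_vu @ v_t)  # Eq. (3): context update
    return biases, u
```

Here `u_bias` plays the role of the constant term u inside σ(·) in Eq. (3), kept separate from the state u^{(t)} for clarity.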
After the errors up to time T are calculated, the gradients of θ = {b, c, W, u, W_{uv}, W_{uh}, W_{vu}, W_{uu}} are updated by tracing back from time T to time t with the BPTT (Back Propagation Through Time) method [11], [12].

Fig. 3. Recurrent Temporal RBM [10]

We proposed an adaptive learning method for the RNN-RBM with a self-organizing network structure for given input data [8]. The optimal number of hidden neurons can be determined automatically, according to the variance of the weights and parameters of the hidden neurons during learning, by the neuron generation / annihilation algorithms shown in Fig. 4 [4], [5]. The structure of a general deep belief network, as shown in Fig. 5, is an accumulation of two or more RBMs. For the RNN-RBM, an ingenious contrivance is required to treat time series data: the RNN-RBM was extended to a hierarchical network structure by piling up pre-trained RNN-RBMs. As shown in Fig. 6, the output signal of the hidden neurons h^{(t)} at time t can be seen as the input signal to the visible neurons v^{(t)} of the next RNN-RBM layer. We also proposed an adaptive learning method for the RNN-DBN that can determine the optimal number of hidden layers for given input data [6].

Fig. 4. Adaptive RBM: (a) neuron generation, (b) neuron annihilation

Fig. 5. Deep Belief Network (layer l has parameters b_l, c_l, W_l)

III. KNOWLEDGE DISCOVERY

Knowledge discovery is one of the problems that we should solve in deep learning. Inference with a trained deep neural network is considered to realize high classification capability. However, the trained deep neural network forms a black box, and it is difficult for us [...] extracted by C4.5. However, no teacher signal is given for the input data, and the output patterns are determined by the feed-forward calculation.
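Although the extraction procedure is only introduced above, one plausible reading of the C4.5-based idea is sketched below under stated assumptions: since no teacher signal is given, each input is labeled with the output pattern produced by the trained network's own feed-forward pass, and a decision tree fitted to these (input, output) pairs can be read off as IF-THEN rules. scikit-learn's DecisionTreeClassifier (a CART implementation) stands in for C4.5 here, and `pretrained_net` is a hypothetical callable returning the network's output class for each input; this is not the authors' exact method.

```python
# Hedged sketch: approximate a pre-trained network with a decision tree and
# read its branches as IF-THEN rules. CART stands in for C4.5, and
# `pretrained_net` is a hypothetical feed-forward inference callable.
from sklearn.tree import DecisionTreeClassifier, export_text

def extract_if_then_rules(pretrained_net, X, max_depth=5):
    y_net = pretrained_net(X)                  # labels from feed-forward pass
    tree = DecisionTreeClassifier(max_depth=max_depth)
    tree.fit(X, y_net)                         # tree approximates the network
    return export_text(tree)                   # nested IF-THEN conditions
```

A shallow tree of this kind trades some fidelity to the network for rules that can be evaluated quickly on hardware without a GPU, which matches the real-time motivation of the paper.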