
Extended Hierarchical Extreme Learning Machine with Multilayer Perceptron

Khanittha Phumrattanaprapin1 and Punyaphol Horata2, Non-members

Manuscript received on October 4, 2016; revised on November 24, 2016. Final manuscript received on February 8, 2017.
1,2 The authors are with the Department of Computer Science, Faculty of Science, Khon Kaen University, 123 Mitraphab Road, Nai-Meuang, Meuang, Khon Kaen, 40002 Thailand, E-mail: [email protected] and [email protected]

ABSTRACT

The Deep Learning approach provides high classification performance, especially on image classification problems. However, a shortcoming of the traditional Deep Learning method is its large training time scale. The hierarchical extreme learning machine (H-ELM) framework was based on the hierarchical learning architecture of the multilayer perceptron to address this problem. H-ELM is composed of two parts: the first entails unsupervised multilayer encoding, and the second is the supervised feature classification. H-ELM can give a higher accuracy rate than the traditional ELM. However, there still remains room to enhance its classification performance. This paper therefore proposes a new method, termed the extended hierarchical extreme learning machine (EH-ELM), which extends the number of layers in the supervised portion of the H-ELM from a single layer to multiple layers. To evaluate the performance of the EH-ELM, various classification datasets were studied and compared with the H-ELM and the multilayer ELM, as well as various state-of-the-art deep architecture methods. The experimental results show that the EH-ELM improved the accuracy rates over most other methods.

Keywords: Hierarchical Extreme Learning Machine, Hierarchical Learning

1. INTRODUCTION

The Extreme Learning Machine (ELM) is one of the most popular algorithms for regression and classification. Guang-Bin Huang, et al., 2012 [1] proposed the ELM approach, which is capable of learning from a simple structure called single layer feedforward neuron networks (SLFNs) [2, 3]. Within the ELM learning process, the input weights and biases of the hidden layer are randomly generated and are then used to calculate the output weights. The ELM offers better performance and faster training time scales than traditional algorithms such as back propagation (BP) [4] and the support vector machine (SVM) [5]. Many researchers and application developers have applied the ELM in various applications, such as face classification [6], image segmentation [7], human action recognition [8], and power-load forecasting [9, 10].

Although the ELM is a fast learning method, many researchers have developed numerous ways to improve it. Kasun, et al., 2013 [11] developed an adapted version of the ELM, referred to as the Multilayer Extreme Learning Machine (ML-ELM), which increases the number of layers in the network. ML-ELM learning is achieved through a building-block approach using an ELM-based autoencoder for classification problems. Each layer of the training network is represented as hierarchical learning, stacked layer by layer, which is suitable for learning on large-sized datasets. In ML-ELM training, the input weights and biases are orthogonal, randomly generated hidden parameters. Through this method, the ML-ELM demonstrated superior performance over the traditional ELM on large datasets. The ML-ELM differs from other multilayer schemes, such as the Deep Belief Network (DBN) [12] and the Deep Boltzmann Machine (DBM) [13], because it does not need to fine-tune its parameters. In the decision-making process of the ML-ELM, prior to the supervised least mean square optimization, the encoding outputs are directly fed into the last layer, which calculates the output weights, without random feature mapping. As a result, the ML-ELM might not use all of the learning advantages of the ELM [1], because the universal approximation capability [14] of the ELM requires the feature mapping of the inputs in order to calculate the final results.

Meanwhile, hierarchical learning schemes are designed to be usable with large datasets. Example algorithms include the deep learning (DL) approaches [15, 16], in which the deep architecture [17] learning schemes are represented as multiple levels. DL is used for complicated, multi-feature extraction, in order to learn how to automatically produce features in the higher layers from the lower layers. The extraction process is based on an unsupervised feature extraction approach, in which the output of the unsupervised parts provides the inputs of the supervised learning parts. The DL approach based on the BP algorithm requires multiple rounds of fine-tuning of the network parameters. The complicated DL structure affects the learning speed, which is very slow on large-sized datasets.

The hierarchical ELM (H-ELM) [18, 19] was proposed to address this problem, combining the ELM with the hierarchical learning approach.

The H-ELM consists of two parts: 1) the unsupervised feature extraction part, which encodes the input dataset via the sparse autoencoder (based on the ELM autoencoder), and 2) the supervised feature classification part, a single layer used to compute the final output weights, similar to the original ELM. However, the H-ELM may modify the second part to improve classification performance.

The limitations of the supervised classification part of the single-layered H-ELM have inspired the use of deep architecture concepts such as the two-hidden-layer extreme learning machine (T-ELM), Qu, et al., 2016 [20], which increases the single layer of the original ELM to two layers. The performance and testing accuracy of the T-ELM were higher than those of the batch ELM. Therefore, in order to improve the performance of the H-ELM, we propose herein a modified version of the H-ELM, referred to as the extended hierarchical extreme learning machine (EH-ELM), in which the second part of the proposed method (supervised classification training) extends to two or more layers.

The rest of this paper is organized as follows: the next section introduces preliminary related works, consisting of general information and concepts of the ELM. The third section describes the proposed EH-ELM framework, and the fourth section contains the performance evaluation used to verify the efficiency of the EH-ELM on the various classification datasets, together with the experimental results of other MLP learning algorithms. Eventually, we draw conclusions in the last section.

2. RELATED WORKS

In this section, we describe the related works to prepare some essential background knowledge. They comprise the conventional ELM [1], the H-ELM framework [18], and the T-ELM [20].

2.1 Extreme Learning Machine (ELM)

The ELM is a learning algorithm for SLFNs with an L-dimensional random feature space. The hidden neuron parameters (input weights and biases) are randomly generated. Let (xj, tj), j = 1, ..., N be the N distinct samples, where xj is the j-th input sample and tj is the target vector of xj. The input samples are mapped to the L-dimensional ELM feature space, and the output of the network is defined as follows:

fL(x) = Σ_{j=1}^{L} βj g(wj · x + bj) = h(x)β,   (1)

where wj denotes the input weight vector between the input layer and the j-th hidden node, bj is the bias of the j-th hidden node, β = [β1, ..., βL]^T is the output weight matrix, g(·) is the activation function, x is the input sample, and h(x) = [g1(x), ..., gL(x)] is the hidden layer output vector for the input x.

The ELM resolves the learning problem as follows:

T = Hβ,   (2)

where H is the hidden layer output matrix

H = \begin{bmatrix} h(x_1) \\ \vdots \\ h(x_N) \end{bmatrix}
  = \begin{bmatrix} h_1(x_1) & \cdots & h_L(x_1) \\ \vdots & \ddots & \vdots \\ h_1(x_N) & \cdots & h_L(x_N) \end{bmatrix},   (3)

and T is the target (desired output) matrix

T = \begin{bmatrix} t_1^T \\ \vdots \\ t_N^T \end{bmatrix}
  = \begin{bmatrix} t_{11} & \cdots & t_{1m} \\ \vdots & \ddots & \vdots \\ t_{N1} & \cdots & t_{Nm} \end{bmatrix},   (4)

where m is the dimension of the target vector of each of the N samples.

The ELM computes the output weights as follows:

β = H†T,   (5)

where H† is the Moore-Penrose pseudo-inverse of matrix H.

ELM learning is summarized in the following three steps:
Step 1: Randomly generate the parameters, consisting of the input weights wj and biases bj of the hidden layer.
Step 2: Calculate the hidden layer output matrix H.
Step 3: Compute the output weights β.

2.2 Hierarchical Extreme Learning Machine (H-ELM)

The Hierarchical Extreme Learning Machine (H-ELM) was proposed by Tang et al. [18]. The H-ELM is a multilayer perceptron (MLP) training algorithm. Before learning, the data should be transformed to the ELM random feature space, and the H-ELM algorithm has two parts, as shown in Fig. 1:

1) The first part is the unsupervised learning used to extract features of the input samples as the hierarchical representation. This part consists of N layers that transform the input samples into sparse high-level features, and the output of each layer is computed as

Hi = g(Hi−1 · β),   (6)

where Hi is the random mapping output of the i-th layer, Hi−1 is the random mapping output of the previous layer, g(·) is an activation function, and β is the hidden layer weight matrix. Moreover, each layer extracts features of the input samples using the ELM sparse autoencoder, whose objective function is defined as

Oβ = argmin_β { ‖Hβ − X‖² + ‖β‖ℓ1 },   (7)

where X is the input data, H is the random mapping output matrix, and β is the hidden layer weight. To obtain β, the ℓ1 optimization problem in Eq. (7) is solved [18]. Thus, the aim of the ELM sparse autoencoder is to extract features of X such that X can be reconstructed from the output features without tuning any parameters.
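Since the supervised stages of both the H-ELM (part 2 below) and the proposed EH-ELM reuse the basic ELM solve of Section 2.1 (random feature mapping followed by the pseudo-inverse of Eq. (5)), a minimal sketch of that solve may be helpful. It is written in Python/NumPy; the function names and the sigmoid activation are illustrative choices of ours rather than anything prescribed in [1]:

```python
import numpy as np

def sigmoid(Z):
    return 1.0 / (1.0 + np.exp(-Z))

def train_elm(X, T, L, seed=0):
    """Minimal ELM training following Steps 1-3 of Section 2.1.
    X: (N, d) input samples, T: (N, m) target matrix, L: number of hidden nodes."""
    rng = np.random.default_rng(seed)
    # Step 1: randomly generate input weights w_j and biases b_j.
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], L))
    b = rng.uniform(-1.0, 1.0, size=(1, L))
    # Step 2: hidden layer output matrix H = g(XW + b).
    H = sigmoid(X @ W + b)
    # Step 3: output weights via the Moore-Penrose pseudo-inverse, Eq. (5).
    beta = np.linalg.pinv(H) @ T
    return W, b, beta

def predict_elm(X, W, b, beta):
    # Reapply the same random mapping and the learned output weights.
    return sigmoid(X @ W + b) @ beta
```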

Fig.1: The overall H-ELM two-part structure [18].

Fig.2: The two parts of the proposed learning algorithm: the unsupervised multilayer encoding and the supervised feature classification, respectively.

2) After the unsupervised part, the second part is the supervised learning part, which computes the output weights using the traditional ELM. That is, the input weights and biases are randomly generated and are then used to compute the final output weights.

2.3 Two-hidden-layer Extreme Learning Machine (T-ELM)

The two-hidden-layer feedforward network (TLFN) approach was originally proposed by Tamura and Tateishi [21] and improved by Huang [22]. Qu, et al., 2016 [20] brought the ideas of the ELM into TLFNs, proposing the T-ELM, which utilizes the ELM algorithm for regression and classification but has a two-hidden-layer structure. Assume that the first and second layers have the same number of hidden nodes.

The T-ELM learning process is summarized in the following seven steps:
Step 1: Randomly generate the input weight matrix w and bias b of each layer.
Step 2: Calculate H = g(wIE xE), where wIE is [b w] and xE is [1 x]^T.
Step 3: Obtain the hidden weight of the first layer and the output layer, β = H†T.
Step 4: Calculate the output of the first layer, H1 = Tβ†.
Step 5: Assign the hidden weight and bias of the second layer.
Step 6: Obtain the output of the second layer, H2 = g(wHE HE).
Step 7: Calculate the output weight of the output layer, β = H2†T.
Note that while our proposed learning algorithm extends the layers within the supervised classification part, we have not considered the pseudo-inverse of the output weights, written as β†.

3. PROPOSED LEARNING ALGORITHM

In this section, we outline our proposed framework, which extends the supervised classification part of the H-ELM and is referred to as the EH-ELM. Unlike traditional DL frameworks [12, 17], which consist of two training parts, we focused on how to modify the single-layer supervised classification of the H-ELM into a multilayer one. The EH-ELM therefore outperformed and proved more efficient than the H-ELM and MLP algorithms.

In the supervised part of the proposed method, the single layer is extended into multiple layers. The output weights of each supervised layer are computed similarly to the traditional ELM, where the approximated target matrix of the previous layer is the input of the next supervised layer. The final output weights are computed from the last supervised layer of the supervised part.

According to the EH-ELM, shown in Fig. 2, the input samples are fed into the unsupervised part, and high-level features are extracted from the input data. The results are then input into the multilayer supervised classification part. In the unsupervised part, the inputs are processed by M layers of ELM-based regression, and the input data is thereby extracted and converted into hierarchical features.
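The ℓ1 sparse autoencoder of Eq. (7), which both the H-ELM and the proposed EH-ELM reuse in their unsupervised layers, can be made concrete with a small sketch. The version below uses plain ISTA with a fixed step size, whereas the H-ELM work relies on a fast iterative shrinkage-thresholding scheme [18]; the function names and the regularization weight lam are illustrative assumptions of ours:

```python
import numpy as np

def soft_threshold(Z, tau):
    return np.sign(Z) * np.maximum(np.abs(Z) - tau, 0.0)

def sparse_elm_autoencoder(H, X, lam=1e-3, n_iter=200):
    """ISTA-style sketch for Eq. (7): argmin_beta ||H beta - X||^2 + lam*||beta||_1.
    H: (N, L) random mapping of the layer input, X: (N, d) data to reconstruct."""
    # Fixed step size from the Lipschitz constant of the gradient 2 H^T(H beta - X).
    step = 1.0 / (2.0 * np.linalg.norm(H, 2) ** 2)
    beta = np.zeros((H.shape[1], X.shape[1]))
    for _ in range(n_iter):
        grad = 2.0 * H.T @ (H @ beta - X)
        beta = soft_threshold(beta - step * grad, lam * step)
    return beta  # (L, d) sparse autoencoder weights for this layer
```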

The output of each layer in the supervised part is calculated as:

Hj = g(Hj−1 · β),   (8)

where Hj denotes the output of the j-th layer of the supervised part (j ∈ (M, P]), Hj−1 is the output of the (j−1)-th layer of the supervised part, g(·) represents an activation function for each hidden layer, and β = [β1, ..., βL]^T is the output weight matrix of each layer. However, the parameters of each layer do not need to be fine-tuned, and each layer of the EH-ELM is an independent section.

The learning process of the EH-ELM is summarized as follows:

The unsupervised encoding part:

Input: N training samples (xk, tk), k = 1, ..., N of N distinct samples, where xk is the k-th sample and tk denotes the target vector of xk.

Step 1: Randomly generate the input weights wi of each layer, where i ∈ [1, M].
Step 2: Insert α for the input data, Hi = [xi α].
Step 3: Calculate Hi = (Hi−1 · wi−1).
Step 4: Let Ai be Hi scaled to the range [−1, 1].
Step 5: Calculate the output weights of each layer, βi = autoencoder(Ai, Hi) [18].
Step 6: Calculate the approximated target of each layer, Ti = (Hi βi).
Step 7: Adjust the scale of Ti to [0, 1].
Return to Step 2 until all layers of the unsupervised encoding part are completed.

Output: Finish training the unsupervised encoding part. The result is T.

The supervised classification part:

Input: The result T of the previous part is the input of this part, where S represents a scale factor and C is an ℓ2-penalty value.

Step 8: Randomly generate the input weights wj of each layer, where j ∈ (M, P].
Step 9: Insert α for the input data, Hj−1 = [Tj−1 α].
Step 10: Calculate Tj = g(Hj−1 · wj).
Return to Step 9 until all layers of the supervised classification part are completed.
Step 11: Calculate the expected output weights, β = H†Tj.

The differences between the proposed EH-ELM and the ML-ELM are as follows: 1) The ML-ELM utilizes unsupervised learning on the samples layer by layer. In contrast, the H-ELM, which is modified into the EH-ELM, consists of two parts: the unsupervised encoding part and the supervised classification part. 2) The ML-ELM employs orthogonal random hidden parameters to construct the weight matrix of each unsupervised layer; this is not necessary in the EH-ELM, due to the varied outputs of its randomly generated input nodes. 3) The ML-ELM uses the ℓ2-norm ELM autoencoder, whereas the EH-ELM uses an ℓ1-penalty autoencoder to obtain more sparse hidden input data.
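As an illustration of how the supervised classification part (Steps 8–11) could be realized, the sketch below stacks the additional random layers and then solves the final output weights by ridge-regularized least squares. It is only our reading of the steps under stated assumptions: a bias column stands in for the α term, the scale factor S is omitted, the final solve is interpreted as a fit from the last supervised layer's output to the class targets, and all names are illustrative rather than the authors' code.

```python
import numpy as np

def sigmoid(Z):
    return 1.0 / (1.0 + np.exp(-Z))

def supervised_classification_part(T_enc, Y, layer_sizes, C=1e-3, seed=0):
    """Sketch of Steps 8-11: stack extra random ELM layers on top of the
    encoded features T_enc, then solve the last layer's output weights.
    Y: (N, m) class targets; layer_sizes: hidden nodes per supervised layer."""
    rng = np.random.default_rng(seed)
    T_j, weights = T_enc, []
    for L in layer_sizes:
        # Step 9: augment the current features with a bias column (our
        # stand-in for the alpha term of the algorithm above).
        H_prev = np.hstack([T_j, np.ones((T_j.shape[0], 1))])
        # Steps 8 and 10: random input weights w_j, then T_j = g(H_{j-1} w_j).
        w_j = rng.uniform(-1.0, 1.0, size=(H_prev.shape[1], L))
        weights.append(w_j)
        T_j = sigmoid(H_prev @ w_j)
    # Step 11 (our reading): output weights from the last supervised layer to
    # the targets, solved with an l2 penalty C as in ridge-regularized ELM.
    beta = np.linalg.solve(T_j.T @ T_j + C * np.eye(T_j.shape[1]), T_j.T @ Y)
    return weights, beta
```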

4. PERFORMANCE EVALUATION

In this section, we verify the performance of the proposed EH-ELM in three ways. First, we compared the accuracy rates and the standard deviation values of the EH-ELM and the H-ELM using various benchmark binary and multiple class classification datasets. Next, we compared the two algorithms when the number of hidden neurons in the last layer was varied; and lastly, we compared the accuracy rates of the proposed EH-ELM with various state-of-the-art MLP algorithms. In both the second and third comparisons, two image classification datasets were used, as shown in Tables 1 and 2.

All experiments were conducted on an Intel Core i7 3.50 GHz CPU with 8 GB RAM, Windows 7, and Matlab R2014a.

4.1 Benchmark Datasets

In order to extensively verify the performance of the two comparative methods, the proposed EH-ELM and the traditional H-ELM framework, four binary classification and eight multiple classification datasets were selected. Most of the datasets were obtained from the UCI database (https://archive.ics.uci.edu/ml/datasets.html) [23], as represented in Table 1.

Table 1: Specification of binary and multiple classes classification datasets.

Datasets  | ♯train | ♯test | ♯features | ♯classes
Liver     | 242    | 104   | 6         | 2
Diabetes  | 538    | 231   | 8         | 2
Leukemia  | 38     | 34    | 7129      | 2
Phoneme   | 3800   | 1622  | 5         | 2
Glass     | 150    | 65    | 9         | 6
Led7digit | 350    | 150   | 7         | 10
Vehicle   | 593    | 254   | 18        | 4
Pendigits | 537    | 231   | 256       | 10
Semeion   | 956    | 638   | 256       | 10
Optdigits | 3823   | 1767  | 64        | 10
Thyroid   | 5040   | 2160  | 21        | 3
Shuttle   | 43500  | 14500 | 9         | 6

From Table 1, the 12 classification datasets, comprising the four binary class datasets (Liver, Diabetes, Leukemia [24], and Phoneme) and the eight multiple class datasets (Glass, Led7digit, Vehicle, Pendigits, Semeion, Optdigits, Thyroid, and Shuttle), can be classified into six groups of data, as follows:
1) Datasets of relatively small size and low dimensions: Liver, Diabetes, Glass, Led7digit, and Vehicle.
2) Datasets of relatively small size and medium dimensions: Pendigits and Semeion.
3) Datasets of relatively small size and high dimensions: Leukemia.
4) Datasets of relatively medium size and low dimensions: Phoneme and Thyroid.
5) Datasets of relatively medium size and medium dimensions: Optdigits.
6) Datasets of relatively large size and low dimensions: Shuttle.

Table 2: Specification of classification datasets for comparison of the proposed EH-ELM and the state-of-the-art MLP algorithms.

Datasets | ♯train | ♯test | ♯features | ♯classes
MNIST    | 50000  | 1000  | 784       | 10
NORB     | 24300  | 24300 | 2048      | 5

Table 2 lists the two multiple class image datasets, MNIST and NORB, detailed as follows:
1) MNIST: the MNIST dataset [25] is an image classification dataset of handwritten digits, containing 60,000 training and 10,000 testing samples. The digit images (28 × 28 pixels) have been size-normalized. The original images are fed into the training algorithms without preprocessing, so each MNIST sample has 784 features, and the number of nodes in the network's input layer is assigned accordingly.
2) NORB: the NORB dataset [26] represents a 3D image classification problem. It contains pictures of 50 toys, separated into five classes: four-legged animals, human figures, airplanes, trucks, and cars. We reconstructed the input pixels as N = 2 × 1024 = 2048 features. In this paper, the NORB samples for training and testing were divided into equal groups of 24,300 units.

4.2 Comparison between the Proposed EH-ELM and H-ELM

1) Selecting the Number of Hidden Neurons: This section describes the criteria for selecting the number of hidden nodes of each layer, obtained from an empirical study of the effect of the selected hidden nodes in the unsupervised autoencoder and supervised classification parts of the EH-ELM for each dataset. A case study on the Glass problem is presented to describe how to select the number of hidden nodes properly. Let L1 and L2 be the numbers of hidden nodes in the first and second layers of the unsupervised autoencoder part, and let L3 and L4 be the numbers of hidden nodes in the first and second layers of the supervised classification part of the EH-ELM. L1 and L2 were set in the range [10, 20, ..., 100], while L3 and L4 were set in the range [100, 200, ..., 1500]. For each EH-ELM network in this study, we set L1 equal to L2 and L3 equal to L4.

Fig.3: The testing accuracy of the EH-ELM in terms of the numbers of hidden neurons L3 and L4 in the feature classification part.

Fig. 3 shows the testing accuracy rates when EH-ELMs were trained with the number of hidden nodes in each encoding layer varied in the range of 10 to 100. Each line illustrates the testing accuracy rate for fixed L1 and L2 while L3 and L4 are varied, under the setting condition L1 = L2 and L3 = L4. The graph shows that the EH-ELM network giving the highest testing accuracy rate for the Glass dataset is the one with the structure L1-L2-L3-L4 set to 10-10-200-200, respectively.
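This selection procedure amounts to a simple grid search under the constraints L1 = L2 and L3 = L4. A hedged sketch is shown below, where train_fn and eval_fn are hypothetical stand-ins for training an EH-ELM with the given layer sizes and measuring its testing accuracy; they are not part of the paper's code:

```python
import numpy as np

def select_hidden_sizes(train_fn, eval_fn,
                        enc_sizes=range(10, 101, 10),
                        cls_sizes=range(100, 1501, 100)):
    """Grid search under the constraints L1 = L2 and L3 = L4.
    train_fn(L1, L3) is assumed to train an EH-ELM with encoder layers
    (L1, L1) and supervised layers (L3, L3); eval_fn(model) returns the
    testing accuracy.  Both are hypothetical stand-ins."""
    best_L1, best_L3, best_acc = None, None, -np.inf
    for L1 in enc_sizes:        # L1 = L2: unsupervised autoencoder layers
        for L3 in cls_sizes:    # L3 = L4: supervised classification layers
            acc = eval_fn(train_fn(L1, L3))
            if acc > best_acc:
                best_L1, best_L3, best_acc = L1, L3, acc
    return best_L1, best_L3, best_acc   # e.g. 10 and 200 for the Glass case
```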

2) Performance of Learning Accuracy: For a fair comparison, we set the numbers of hidden neurons in the unsupervised autoencoder and supervised classification parts of the EH-ELM equivalent to those of the H-ELM. In the EH-ELM, two-layer feature encoding is utilized as the input of two-layer feature classification, whereas in the H-ELM, two-layer feature encoding is utilized as the input of one-layer feature classification. Performance verification of each method was carried out on an extensive number of datasets, both binary class and multiple class. The training and testing experiments were repeated fifty times for validity, and the average values of the accuracy rates and standard deviations are reported in Table 3.

Table 3: Performance comparisons of the proposed EH-ELM and H-ELM with binary and multiple class benchmarks.

Datasets  | EH-ELM Hidden neurons | EH-ELM Accuracy (%) | EH-ELM Std. Dev. | H-ELM Std. Dev. | H-ELM Accuracy (%) | H-ELM Hidden neurons
Binary Classes Datasets
Liver     | 20,20,200,200       | 86.22 | 2.45 | 3.05  | 85.19 | 20,20,200
Diabetes  | 50,50,400,400       | 83.85 | 1.73 | 1.61  | 82.96 | 50,50,400
Leukemia  | 200,200,5000,5000   | 84.76 | 9.10 | 10.05 | 82.41 | 200,200,5000
Phoneme   | 100,100,5000,5000   | 91.92 | 0.36 | 0.40  | 90.66 | 100,100,5000
Multiple Classes Datasets
Glass     | 10,10,200,200       | 84.86 | 2.56 | 2.66  | 81.16 | 10,10,200
Led7digit | 50,50,500,500       | 75.10 | 0.80 | 0.82  | 74.66 | 50,50,500
Vehicle   | 50,50,300,300       | 82.54 | 1.63 | 1.93  | 81.99 | 50,50,300
Pendigits | 50,50,300,300       | 97.18 | 0.76 | 0.74  | 97.14 | 50,50,300
Semeion   | 300,300,500,500     | 97.49 | 0.32 | 0.44  | 97.41 | 300,300,500
Optdigits | 50,50,490,490       | 97.77 | 0.19 | 0.19  | 97.82 | 50,50,490
Thyroid   | 200,200,5000,5000   | 96.00 | 0.17 | 0.17  | 95.93 | 200,200,5000
Shuttle   | 100,100,2000,2000   | 99.47 | 0.08 | 0.06  | 99.47 | 100,100,2000

Table 3 reports the EH-ELM classification accuracy rates on the four binary datasets (Liver, Diabetes, Leukemia, and Phoneme), which were 86.22, 83.85, 84.76, and 91.92, respectively, against those of the H-ELM. In the Leukemia case, the EH-ELM improved accuracy by up to 2.3 percentage points over the H-ELM. The EH-ELM and H-ELM maintained comparable standard deviations of the accuracy rates.

In the multiple class datasets (Glass, Led7digit, Vehicle, Pendigits, Semeion, Optdigits, Thyroid, and Shuttle), the EH-ELM produced accuracy rates of 84.86, 75.10, 82.54, 97.18, 97.49, 97.77, 96.00, and 99.47, respectively. Remarkably, the EH-ELM accuracy rates outperformed the H-ELM in four of the eight datasets (the Glass, Led7digit, Vehicle, and Semeion cases). Accuracy rates and standard deviations were comparable in the Pendigits, Optdigits, Thyroid, and Shuttle datasets. In summary, most datasets produced improved accuracy rates using the proposed EH-ELM.

4.3 Comparison of the last number of hidden neurons between the EH-ELM and H-ELM

Comparisons were made between the proposed EH-ELM and H-ELM when the number of hidden neurons in the last supervised layer was varied. The numbers of hidden neurons in the unsupervised autoencoder of both methods were set as equal, and the numbers of hidden neurons in the supervised feature classification part of the EH-ELM were set equal to those of the H-ELM before the comparison. The results of the classification performance on the testing samples are shown in Fig. 4.

Fig.4: Comparison of testing accuracy rates between the EH-ELM and H-ELM: (a) MNIST, (b) NORB.

In the MNIST dataset, the fluctuation of the testing accuracy rates of the H-ELM was less than that of the EH-ELM, as shown in Fig. 4(a). The number of hidden neurons was varied in the range from 9,000 to 10,000 nodes. The EH-ELM yielded its lowest accuracy rate at 9,800 hidden neuron nodes, while its highest accuracy rate was found at 9,400 hidden neuron nodes. In all cases, the accuracy rates of the EH-ELM were higher than those of the H-ELM.

In the NORB dataset, shown in Fig. 4(b), the accuracy of the EH-ELM was greater than that of the H-ELM for every selected number of hidden neurons, ranging from 10,000 to 20,000. Within this range, the best accuracy rate of the H-ELM, at 15,000 nodes, was less than the best accuracy rate of the EH-ELM, at 17,000 nodes. The accuracy rates of the EH-ELM were higher than those of the H-ELM for every quantity of hidden nodes.

4.4 Comparison with State-of-the-Art MLP Algorithms

In this section, we apply the two image classification datasets to compare the EH-ELM with the comparative multilayer algorithms: the Stacked AutoEncoder (SAE) [27], Stacked Denoising Autoencoder (SDA) [28], Deep Belief Network (DBN) [12], MLP-BP [4], ML-ELM [11], and H-ELM [18]. For the BP-based MLP training algorithms (SAE, SDA, DBN, and MLP-BP), the learning rate was set at 0.1 and the decay rate at 0.95 for each learning epoch. In the case of the SDA, the input corruption rate was 0.5, with a dropout rate of 0.2. The ℓ2 penalty parameters of the ML-ELM were set at 10^-1, 10^3, and 10^8 for its three layers, respectively. Within the H-ELM algorithm, the number of layers of the unsupervised encoding part was set at two and three. Within the proposed EH-ELM, the number of supervised layers was specified at two or three, in order to compare the learning performance when changing the number of supervised layers.

Table 4: Learning accuracy performance comparisons on the MNIST dataset.

Algorithm                                                | Accuracy (%) | Training time (s)
SAE [27]                                                 | 98.60        | >17 hours
SDA [28]                                                 | 98.72        | >17 hours
DBN [12]                                                 | 98.87        | 66849
MLP-BP [4]                                               | 97.39        | 214268.10
ML-ELM [11]                                              | 99.04        | 2316.19
H-ELM [18]: two unsupervised layers                      | 98.92        | 281.37
H-ELM [18]: three unsupervised layers                    | 97.81        | 4120.79
EH-ELM: two unsupervised layers, two supervised layers   | 99.05        | 1632.45
EH-ELM: two unsupervised layers, three supervised layers | 98.92        | 3552.62

As shown in Table 4, the testing accuracy of the proposed EH-ELM on MNIST achieves 99.05%, similar to the ML-ELM but with a faster training time. The EH-ELM proved superior to the SAE, SDA, DBN, and MLP-BP in terms of both accuracy rates and training times. Furthermore, with a similar architecture, the EH-ELM's supervised part yielded accuracy rates of 99.05 and 98.92 with two and three supervised layers, which were superior to those of the H-ELM, at 98.92 and 97.81, respectively.

Table 5: Learning accuracy performance comparisons on the NORB dataset.

Algorithm                                                | Accuracy (%) | Training time (s)
SAE [27]                                                 | 86.28        | >2 days
SDA [28]                                                 | 87.62        | >2 days
DBN [12]                                                 | 88.62        | 327960
MLP-BP [4]                                               | 84.20        | 34005.47
ML-ELM [11]                                              | 88.91        | 2344.83
H-ELM [18]: two unsupervised layers                      | 91.28        | 881.43
H-ELM [18]: three unsupervised layers                    | 90.28        | 8279.67
EH-ELM: two unsupervised layers, two supervised layers   | 91.78        | 1341.67
EH-ELM: two unsupervised layers, three supervised layers | 90.72        | 6585.57

Table 5 shows that, on NORB, the EH-ELM outperforms the basic H-ELM and the other MLP algorithms, including the SAE, SDA, DBN, and ML-ELM. The classification performance test of the EH-ELM with two supervised layers obtained up to 91.78% accuracy, which was greater than the best accuracy rate of the H-ELM, at 91.28% [18]. Although the number of unsupervised layers of the H-ELM was increased to three, its classification rate of 90.28 was still less than the best accuracy rate of the EH-ELM.

From Tables 4 and 5, the experimental results demonstrate that the proposed EH-ELM framework (based on the H-ELM approach and having two supervised layers) significantly improved classification rates over the original H-ELM with two or three unsupervised layers (which suffered from overfitting of the input data), as well as over the other multilayer algorithms, on the MNIST and NORB datasets.

Furthermore, the training times of the EH-ELM on both the MNIST and NORB datasets (1,632.45 and 1,341.67 seconds, respectively) were slower than those of the H-ELM with two unsupervised layers, due to the greater number of supervised layers in the proposed algorithm.

5. CONCLUSIONS

In this paper, we propose a new method of high-performance classification by extending the supervised classification part of the H-ELM from one supervised layer to multiple supervised layers, referred to as the Extended H-ELM (EH-ELM). In our experimental setting, we used twelve datasets, divided into two categories: four binary class datasets and eight multiple class datasets, which also varied in size and dimensions. The results show that, within our experimental datasets, the EH-ELM yielded greater average accuracy rates than the traditional H-ELM in eight of the twelve datasets, with a statistically significant difference (p < 0.05) in four of those eight problems: the Phoneme, Glass, Led7digit, and Thyroid datasets. Additional tests were conducted on the MNIST and NORB datasets, in which the number of hidden neurons in the last layer of the supervised classification parts was varied. On the MNIST dataset, the best numbers of hidden neurons of the EH-ELM and H-ELM totaled 9,400 and 9,600 nodes, respectively. While both results were sufficiently high, the accuracy rates of the EH-ELM were slightly superior to those of the H-ELM as the number of hidden neurons varied. On the NORB dataset, the accuracy rates of the EH-ELM were higher than those of the H-ELM for every number of hidden nodes, with 17,000 nodes producing the EH-ELM's best classification accuracy for this problem. The classification accuracy rates of the EH-ELM also proved superior to the state-of-the-art methods (SAE, SDA, DBN, MLP-BP, ML-ELM, and H-ELM) on both the MNIST and NORB datasets. Therefore, the EH-ELM was conclusively deemed the best classification method among all the comparative algorithms. Note that on the MNIST dataset, the EH-ELM and H-ELM yielded 99.05% and 98.92% accuracy, with training times of 1,632.45 and 281.37 seconds, respectively. On the NORB dataset, the classification accuracy of the EH-ELM was 91.78% with a training time of 1,341.67 seconds, while the original H-ELM yielded 91.28% with a training time of 881.43 seconds. Additionally, in comparing the effects of the various numbers of layers, the experimental results indicate that the EH-ELM with two supervised layers outperformed the H-ELM having three or four layers, due to overfitting of the input data. Regarding training times, the EH-ELM proved slightly slower than the learning time scale of the H-ELM, due to its greater number of supervised layers.

ACKNOWLEDGEMENT

This research was partially supported by the Department of Computer Science, Faculty of Science, Khon Kaen University, Thailand.

References

[1] G.-B. Huang, H. Zhou, X. Ding, and R. Zhang, "Extreme learning machine for regression and multiclass classification," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 42, no. 2, pp. 513-529, Apr. 2012.
[2] K. Hornik, M. Stinchcombe, and H. White, "Multilayer feedforward networks are universal approximators," Neural Netw., vol. 2, no. 5, pp. 359-366, 1989.
[3] K. Hornik, "Approximation capabilities of multilayer feedforward networks," Neural Netw., vol. 4, no. 2, pp. 251-257, 1991.
[4] C. Bishop, Pattern Recognition and Machine Learning, New York, NY, USA: Springer-Verlag, 2006.
[5] J. A. Suykens and J. Vandewalle, "Least squares support vector machine classifiers," Neural Processing Letters, vol. 9, no. 3, pp. 293-300, 1999.
[6] A. A. Mohammed, R. Minhas, Q. M. J. Wu, and M. A. Sid-Ahmed, "Human face recognition based on multidimensional PCA and extreme learning machine," Pattern Recognit., vol. 44, nos. 10-11, pp. 2588-2597, 2011.
[7] C. Pan, D. S. Park, Y. Yang, and H. M. Yoo, "Leukocyte image segmentation by visual attention and extreme learning machine," Neural Comput. Appl., vol. 21, no. 6, pp. 1217-1227, 2012.
[8] R. Minhas, A. Baradarani, S. Seifzadeh, and Q. M. J. Wu, "Human action recognition using extreme learning machine based on visual vocabularies," Neurocomputing, vol. 73, no. 10-12, pp. 1906-1917, 2010.
[9] S. Cheng, J. Yan, D. Zhao, et al., "Short-term load forecasting method based on ensemble improved extreme learning machine," J. Xi'an Jiaotong University, 2:029, 2009.
[10] L. Mao, Y. Wang, X. Liu, et al., "Short-term power load forecasting method based on improved extreme learning machine," Power Syst. Prot. Control, vol. 40, no. 20, pp. 140-144, 2012.
[11] L. L. C. Kasun, H. Zhou, G.-B. Huang, and C. M. Vong, "Representational learning with extreme learning machine for big data," IEEE Intell. Syst., vol. 28, no. 6, pp. 31-34, Nov. 2013.
[12] G. Hinton, S. Osindero, and Y. Teh, "A fast learning algorithm for deep belief nets," Neural Comput., vol. 18, no. 7, pp. 1527-1554, Jul. 2006.
[13] R. Salakhutdinov and G. Hinton, "Deep Boltzmann machines," in Proc. 12th Int. Conf. Artif. Intell. Statist., Clearwater Beach, FL, USA, pp. 448-455, Jul. 2009.

[14] G.-B. Huang, L. Chen, and C.-K. Siew, "Universal approximation using incremental constructive feedforward networks with random hidden nodes," IEEE Trans. Neural Netw., vol. 17, no. 4, pp. 879-892, Jul. 2006.
[15] Y. Bengio, "Learning deep architectures for AI," Found. Trends Mach. Learn., vol. 2, no. 1, pp. 1-127, 2009.
[16] Y. Bengio, A. Courville, and P. Vincent, "Representation learning: A review and new perspectives," IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 8, pp. 1798-1828, Aug. 2013.
[17] Y. Bengio, "Learning deep architectures for AI," Found. Trends Mach. Learn., vol. 2, no. 1, pp. 1-127, 2009.
[18] J. Tang, C. Deng, and G.-B. Huang, "Extreme learning machine for multilayer perceptron," IEEE Transactions on Neural Networks and Learning Systems, vol. 5, no. 4, pp. 809-821, 2015.
[19] H.-G. Han, L.-D. Wang, and J.-F. Qiao, "Hierarchical extreme learning machine for feedforward neural network," Neurocomputing, pp. 128-135, 2014.
[20] B. Y. Qu, et al., "Two-hidden-layer extreme learning machine for regression and classification," Neurocomputing, vol. 175, pp. 826-834, Jan. 2016.
[21] S. I. Tamura and M. Tateishi, "Capabilities of a four-layered feedforward neural network: four layers versus three," IEEE Transactions on Neural Networks, vol. 8, no. 2, pp. 251-255, 1997.
[22] G.-B. Huang, "Learning capability and storage capacity of two-hidden-layer feedforward networks," IEEE Transactions on Neural Networks, vol. 14, no. 2, pp. 274-281, 2003.
[23] C. L. Blake and C. J. Merz, "UCI Repository of Machine Learning Databases," Dept. Inf. Comput. Sci., Univ. California, Irvine, CA, 1998. [Online]. Available: http://www.ics.uci.edu/~mlearn/MLRepository.html
[24] T. R. Golub, D. K. Slonim, P. Tamayo, C. Huard, M. Gaasenbeek, J. P. Mesirov, H. Coller, M. L. Loh, J. R. Downing, M. A. Caligiuri, C. D. Bloomfield, and E. S. Lander, "Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring," Science, pp. 531-537, 1999.
[25] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proc. IEEE, vol. 86, no. 11, pp. 2278-2324, Nov. 1998.
[26] Y. LeCun, F. J. Huang, and L. Bottou, "Learning methods for generic object recognition with invariance to pose and lighting," in Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2004), vol. 2, 2004.
[27] G. E. Hinton and R. R. Salakhutdinov, "Reducing the dimensionality of data with neural networks," Science, vol. 313, no. 5786, pp. 504-507, 2006.
[28] P. Vincent, H. Larochelle, Y. Bengio, and P.-A. Manzagol, "Extracting and composing robust features with denoising autoencoders," in Proc. 25th Int. Conf. Mach. Learn., Helsinki, Finland, Jul. 2008, pp. 1096-1103.

Khanittha Phumrattanaprapin received the B.Sc. degree in Computer Science from Khon Kaen University, Khon Kaen, Thailand, in 2014, where she is currently pursuing the M.S. degree with the Department of Computer Science. Her current research interests include hierarchical learning, deep neural networks, and big data analytics.

Punyaphol Horata received the B.Sc. degree in Mathematics from Khon Kaen University, the M.S. degree in Computer Science from Chulalongkorn University, and the Ph.D. degree in Computer Science from Khon Kaen University, Thailand. He is currently an assistant professor in the Department of Computer Science, Faculty of Science, Khon Kaen University, Thailand. His research interests include machine learning, soft computing, software engineering, and pattern recognition.