Utilizing Hierarchical Extreme Learning Machine Based Reinforcement Learning for Object Sorting
International Journal of Advanced and Applied Sciences, 6(1) 2019, Pages: 106-113
Journal homepage: http://www.science-gate.com/IJAAS.html

Nouar AlDahoul *, ZawZaw Htike
Mechatronics Department, International Islamic University Malaysia, Kuala Lumpur, Malaysia

* Corresponding author. Email address: [email protected] (N. AlDahoul)
https://doi.org/10.21833/ijaas.2019.01.015

Article history: Received 22 August 2018; Received in revised form 6 December 2018; Accepted 7 December 2018

Keywords: Object sorting; Reinforcement learning; Hierarchical extreme learning machine; Deep learning; Feature learning

ABSTRACT

Automatic and intelligent object sorting is an important task that sorts different objects without human intervention, using a robot arm to carry each object from one location to another. These objects vary in colour, shape, size and orientation. Many applications, such as fruit and vegetable grading, flower grading, and biopsy image grading, depend on sorting for a structural arrangement. Traditional machine learning methods that extract handcrafted features have been used for this task. Sometimes these features are not discriminative because of environmental factors such as lighting change. In this study, the Hierarchical Extreme Learning Machine (HELM) is utilized for unsupervised feature learning to learn from the object observations directly, and HELM was found to be robust against external change. Reinforcement learning (RL) is used to find the optimal sorting policy that maps each object image to the object's location; RL is chosen because output labels are lacking in this automatic task. The learning is done sequentially over many episodes, and at each episode the sorting accuracy increases until it reaches its maximum level at the end of learning. The experimental results demonstrate that the proposed HELM-RL sorting can provide the same accuracy as the labelled, supervised HELM method after many episodes.

© 2018 The Authors. Published by IASE. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

1. Introduction

Object sorting is one of the most important automatic tasks; its objective is to recognize different objects that vary in colour, size, shape and orientation and to map each object to its specific location. Sorting plays an important role in the production line, which has attracted many researchers to use vision-based techniques to increase productivity through automatic sorting systems (Tho et al., 2016; Tho and Thinh, 2015). Object sorting tasks are common in the agricultural, industrial, and medical sectors. Fruits and vegetables are examples of objects that need to be sorted and graded in smart marketing to increase production. Traditional image processing techniques have been used to grade fruits into different categories by size, shape, colour and texture. Colour-based fruit grading was used to extract colour features that distinguish defective fruits from normal ones (Pandey et al., 2013). In Japan, a former automobile-industry engineer graded cucumbers by size, shape, colour, and other attributes, using deep learning to sort them into nine different classes. Sorting and grading of flowers has also been applied in greenhouses and markets (Sun et al., 2017) using a multi-input convolutional neural network. The variable changes in the visual appearance of fruits and vegetables, as well as in the extracted features, make the sorting task more challenging (Susnjak et al., 2013). However, many efforts are still being made to improve the accuracy of sorting fruit varieties.

Object sorting can be done with different machine learning techniques, such as supervised and unsupervised learning. In supervised learning, many image samples are labelled manually to perform the classification. Moreover, expert knowledge is required to build the input/output pairs, and this knowledge is not always available. Traditional handcrafted features depend on colour, length, blobs, corners or edges. These methods are application dependent (different features for different applications), and the features are not adaptive to environmental changes such as lighting. Feature learning has therefore taken their place as a method that is robust against such external changes.

Different deep models have been used for classification and recognition, but these models require long training because of weight fine-tuning, and a Graphical Processing Unit (GPU) is used to speed up the learning. In contrast, the extreme learning machine with multiple layers has been demonstrated to be a fast deep model that needs no weight fine-tuning (Tang et al., 2016): the input weights are generated randomly, the output weights are calculated analytically, and HELM can be run on a Central Processing Unit (CPU). Moreover, its performance is comparable with other deep models in terms of accuracy and learning time (AlDahoul et al., 2018). The HELM-RL technique was utilized for maze navigation (AlDahoul et al., 2017), where it was found to outperform a gradient-based auto-encoder in terms of learning time and to provide performance comparable with principal component analysis in terms of accuracy.

The objective of this study is to utilize the fast feature learning of HELM in reinforcement learning to find optimal actions after observing high-dimensional visual data for the object sorting task. The novelty of this work is as follows:

- This is the first work that utilizes HELM-based RL as a fast deep reinforcement model for the object sorting task.
- RL is utilized to learn the optimal behaviour automatically without human intervention (no prior knowledge or labels).
- A reward supervised learning approach is proposed to generate rewards as a replacement for a pre-defined reward function.

The paper is structured as follows: In section 2, the HELM feature learning, ELM classification, and reinforcement learning methods are summarized, and the main steps of the proposed HELM-RL agent are explained. Section 3 discusses the experimental results and their analysis; the comparison between HELM multi-labelled supervised classification and the proposed HELM-RL is also demonstrated in terms of testing accuracy. Section 4 summarizes the outcome of this work and demonstrates the efficiency of the proposed system.

2. Methodology

2.1. Hierarchical ELM for feature learning

Instead of using hand-engineered features, deep models automatically extract hierarchical abstract representations from the data. The hierarchical extreme learning machine is a fast deep model that learns features automatically by utilizing an unsupervised sparse ELM auto-encoder (Tang et al., 2016). The sparse ELM encoder utilizes the fast iterative shrinkage-thresholding algorithm (FISTA), and H-ELM does not require the encoder's weights to be fine-tuned iteratively, which reduces the learning/training time significantly. ELM is used in the last layer for classification/regression (Huang et al., 2006), and H-ELM has good generalization and an efficient learning time. Please refer to Tang et al. (2016) for more details concerning H-ELM.
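The H-ELM building block described above can be summarized in a few lines of NumPy. The sketch below is a minimal simplification, assuming flattened grey-level image vectors and illustrative layer sizes; for brevity, a ridge-regularized closed-form solve stands in for the FISTA-based sparse solver that H-ELM actually uses, and the function names (elm_autoencoder_layer, helm_features) are ours, not the paper's.

```python
import numpy as np

def elm_autoencoder_layer(X, n_hidden, C=1e3, rng=None):
    """One unsupervised ELM auto-encoder layer (simplified H-ELM building block).

    Random input weights map X to hidden activations H; the output weights beta
    are then solved in closed form so that H @ beta reconstructs X. A ridge (l2)
    solve is used here in place of H-ELM's FISTA-based sparse (l1) solver.
    """
    rng = np.random.default_rng(rng)
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))   # random input weights
    b = rng.uniform(-1.0, 1.0, size=n_hidden)                 # random biases
    H = np.tanh(X @ W + b)                                    # hidden activations
    # Closed-form output weights: beta = (H^T H + I/C)^-1 H^T X
    return np.linalg.solve(H.T @ H + np.eye(n_hidden) / C, H.T @ X)

def helm_features(X, layer_sizes, C=1e3, seed=0):
    """Stack auto-encoder layers; each layer projects the data with its learned beta^T."""
    rng = np.random.default_rng(seed)
    feats = X
    for n_hidden in layer_sizes:
        beta = elm_autoencoder_layer(feats, n_hidden, C=C, rng=rng)
        feats = np.tanh(feats @ beta.T)                       # transformed representation
    return feats

# usage: 200 object images of 64x64 grey levels, flattened (sizes are illustrative)
X = np.random.rand(200, 64 * 64)
features = helm_features(X, layer_sizes=[1000, 500])
```

Each layer draws its input weights at random, computes the hidden activations once, solves for the reconstruction weights analytically, and uses their transpose to project the data for the next layer; no weight is ever fine-tuned iteratively, which is what keeps training fast.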
2.2. Reinforcement learning

Reinforcement learning, one of the significant learning methods, focuses on how agents perform optimal actions so as to maximize the discounted cumulative reward formulated in Eq. 1 (Sutton and Barto, 2018).

$R = \sum_{T=0}^{\infty} \gamma^{T} r_{T+1}$    (1)

where $0 < \gamma < 1$ is the discount factor.

The RL framework is represented as a Markov decision process (MDP); unlike conventional learning, it does not require prior information about the environment model. The basic blocks of the RL model for the sorting task are:

- Environment observations O: images of objects in the start region.
- Agent actions A: selection of an orientation and a location.
- Reward R: the reward given to the agent after selecting an action; it is +1 for a positive action and -1 for a negative one.

Q-learning is one of the most common and useful RL algorithms. It is a model-free method that updates its value function within a value iteration scheme; the value function is formulated in Eq. 2 and the resulting optimal policy in Eq. 3.

$Q_f(s,a) = Q_f(s,a) + \alpha \big( R + \gamma \max_{a'} Q_f(s',a') - Q_f(s,a) \big)$    (2)

$\pi(s) = \arg\max_{a} Q_f(s,a)$    (3)

where $Q_f$ is the value function and $\alpha$ is the learning rate.
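As an illustration of Eqs. 2 and 3 in the sorting setting, the following minimal tabular sketch treats states as discretized object codes (standing in for HELM feature vectors) and actions as (orientation, location) choices, with the +1/-1 reward above. The state and action counts, the epsilon-greedy exploration and the simulated environment are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

# Minimal tabular Q-learning loop for the sorting setting described above.
n_states, n_actions = 16, 8          # e.g. 16 object codes, 8 orientation/location pairs
alpha, gamma, epsilon = 0.1, 0.9, 0.1
Q = np.zeros((n_states, n_actions))
rng = np.random.default_rng(0)

correct_action = rng.integers(n_actions, size=n_states)   # hidden "true" sorting rule

for episode in range(5000):
    s = rng.integers(n_states)                      # object presented in the start region
    if rng.random() < epsilon:                      # epsilon-greedy exploration
        a = rng.integers(n_actions)
    else:
        a = int(np.argmax(Q[s]))
    r = 1.0 if a == correct_action[s] else -1.0     # +1 / -1 reward as in Section 2.2
    s_next = rng.integers(n_states)                 # next object arrives independently
    # Eq. 2: Q(s,a) = Q(s,a) + alpha * (R + gamma * max_a' Q(s',a') - Q(s,a))
    Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])

policy = np.argmax(Q, axis=1)                       # Eq. 3: greedy policy
print((policy == correct_action).mean())            # fraction of objects sorted correctly
```

In the paper's setting the observation is a high-dimensional image, so the table above would be replaced by a function approximator fed with HELM features, but the update rule itself is unchanged.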
2.3. Classification with extreme learning machine

The extreme learning machine (ELM) is a feed-forward neural network architecture consisting of a single hidden layer. Its generalization ability and efficient learning time are the main reasons for its success (Huang et al., 2006). The weights and biases of the hidden layer are assigned randomly, while the output weights are found analytically.

$f(x) = \sum_{i=1}^{M} F_i(x, W_i, b_i) \cdot \beta_i, \qquad W_i \in \mathbb{R}^{d}, \; b_i, \beta_i \in \mathbb{R}$    (4)

where $F_i(\cdot)$ is the activation function of the i-th hidden neuron, $b_i$ is its bias, $W_i$ is its input weight vector, $\beta_i$ is its output weight, and M is the number of nodes in the hidden layer.

$\beta = G^{\dagger} T, \qquad \beta = G^{T} \left( \frac{I}{C} + G \, G^{T} \right)^{-1} T$    (5)

… finally get a reward. This process is repeated until the optimal performance is achieved. In the testing stage, the image of the object in the start area is mapped directly to the optimal action.
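To make Eqs. 4 and 5 concrete, here is a minimal sketch of ELM training and prediction, assuming a tanh activation and one-hot class targets. Following the standard ELM formulation, G is read as the hidden-layer output matrix, G† as its Moore-Penrose generalized inverse, T as the target matrix, and C as a regularization constant; these symbol readings, the function names and the array sizes are assumptions made for illustration.

```python
import numpy as np

def train_elm(X, T, n_hidden=500, C=1e3, seed=0):
    """Single-hidden-layer ELM (Eqs. 4 and 5): random W and b, analytic output weights."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))   # random input weights
    b = rng.uniform(-1.0, 1.0, size=n_hidden)                 # random biases
    G = np.tanh(X @ W + b)                                    # hidden-layer output matrix
    # Regularized form of Eq. 5: beta = G^T (I/C + G G^T)^-1 T
    beta = G.T @ np.linalg.solve(np.eye(X.shape[0]) / C + G @ G.T, T)
    return W, b, beta

def predict_elm(X, W, b, beta):
    """Eq. 4: f(x) = sum_i F_i(x, W_i, b_i) * beta_i."""
    return np.tanh(X @ W + b) @ beta

# usage with illustrative sizes: 300 feature vectors, 4 object classes
X = np.random.rand(300, 256)
labels = np.random.randint(0, 4, size=300)
T = np.eye(4)[labels]                                         # one-hot targets
W, b, beta = train_elm(X, T)
predicted = predict_elm(X, W, b, beta).argmax(axis=1)
```

Because the only learned parameters are the output weights, training amounts to a single linear solve, which is what gives ELM its efficient learning time.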