Collaborative Learning of Human and Computer: Supervised Actor-Critic based Collaboration Scheme

Ashwin Devanga¹ and Koichiro Yamauchi²
¹Indian Institute of Technology Guwahati, Guwahati, India
²Centre of Engineering, Chubu University, Kasugai-shi, Aichi, Japan

Keywords: Actor-Critic Model, Kernel Machine, Learning on a Budget, Super Neural Network, ColBagging, Collaborative Learning Scheme between Human and Learning Machine.

Abstract: Recent large-scale neural networks show high performance on complex recognition tasks, but to obtain such ability they need a huge number of learning samples and many iterations to optimize their internal parameters. Under unknown environments, however, such learning samples do not exist in advance. In this paper, we aim to overcome this problem and improve the learning capability of the system by sharing data between multiple systems. To accelerate the optimization, the proposed system forms a collaboration between humans and reinforcement-learning machines and shares data between the systems to develop a super neural network.

1 INTRODUCTION

During recent years, high-performance computers, which we could never have imagined before, have been developed. Recent large-scale neural networks and their learning methods rely on this computational ability.

One drawback of the machine learning methods for neural networks is that they require a huge number of learning samples, usually more than the number of internal parameters used in the learning machine. If the problem domain is an unknown new field, we cannot collect such a large number of samples in advance. Moreover, the learning method used to optimize these parameters usually needs a large number of repetitions to reach the optimal parameter values. One solution to this problem is reinforcement learning. However, reinforcement learning also wastes a huge number of trial-and-error tests to obtain an appropriate action (or label) for each situation.

Another possibility is crowd sourcing. In crowd sourcing, many workers in cyberspace collaborate to solve such open problems and yield many solution candidates (e.g. (Konwar et al., 2015)). Although crowd-sourcing techniques are able to collect a large number of solution candidates, the architecture is not designed to obtain the best function for solving the problems. We have developed a solution to this problem. In that system, there is an online learning machine beside each worker (Ogiso et al., 2016). The learning machine learns the corresponding worker's actions in order to imitate them. The learning machines' outputs are integrated by calculating a weighted sum, where the weights are determined by the performances of the workers. By using such an architecture, we can extract the function of each worker through the online learning machines. Note that even if a worker is absent, the learning machine substitutes for the absent worker. Moreover, the integrated solution is fed back to each worker to make them generate better solutions. By using this mechanism, the workers grow smarter.

The literature (Fernández Anta et al., 2015) presents a thorough review of the research related to this field and a mathematical analysis of modern crowd-sourcing methods. Similar to our approach, crowd-sourcing methods try to construct a robust protocol to tackle incorrect solution candidates. The classical models answer with the majority of the solution candidates to realize robustness (Anta and Luis Lopez, 2010) (Konwar et al., 2015). Although these approaches are similar to ours, these systems do not provide feedback about the integrated solution to each worker. On the other hand, crowd-sourcing systems based on the game-theoretic approach partially provide feedback about the output of some of the workers, together with their inputs, to each crowd worker (Golle and Mironov, 2001) (Forges, 1986). These approaches provide each worker with feedback regarding the worker's situation and the evaluation result, which is a reward.




These approaches are similar to our newly proposed method, which improves its ability in a reinforcement-learning manner. However, these models do not provide feedback about the integrated solution to each worker. In contrast, our proposed model feeds the integrated output back to each worker to guide how to revise his/her answer.

This study improves the previous study (Ogiso et al., 2016) by replacing each learning machine with a supervised actor-critic method (Rosenstein and Barto, 2012). This means that each learning machine also has the ability to explore new solutions by itself, without the help of its worker. By using this architecture, each worker does not need to manage the system full time, because the new model realizes semi-automatic learning. In other words, the new method also explores better solutions without our operations.

The learning machine suitable for this situation is a lightweight method. We found that the supervised actor-critic model, combined with kernel methods for learning, is a very lightweight learning machine that can handle one-pass learning. Using this algorithm with the kernel method makes the calculations simpler for a computer, thus increasing efficiency.

The supervised actor-critic model is a state-of-the-art algorithm that runs very light for a reinforcement learning machine. Reinforcement learning methods are often applied to problems involving sequential dynamics and optimization of a scalar performance objective, with online exploration of the effects of actions. Supervised learning methods, on the other hand, are frequently used for problems involving static input-output mappings and minimization of a vector error signal, with no explicit dependence on how training examples are gathered. The key feature distinguishing reinforcement learning and supervised learning is whether the training information from the environment serves as an evaluation signal or as an error signal. In this model, both kinds of feedback are available.

Since applying this environment to real-world problems would take a huge amount of time, it was tested on a T-Rex game similar to http://www.trex-game.skipser.com/, which was developed specifically for this project. In the developed game, the height and width of a part of the cactus were modified so that the game player cannot clear them easily without help. Moreover, one other jumping option was added. These specifications were not announced to the players beforehand. The simulations are explained in Section 4.

2 COLLABORATIVE BAGGING SYSTEM USING SUPERVISED ACTOR-CRITIC MODEL

Bagging is an old concept of creating a super neural network by combining the intelligence of multiple neural networks working on the same problem but with different learning and testing scenarios. The idea is that this could save a lot of computational power and time and could also run on simpler hardware such as smartphones or Raspberry Pis.

A rough sketch of the ColBagging system is illustrated in Fig. 1. The system repeats two phases alternately. The first phase is the training phase, where each worker tries to solve the problem to be solved. Their solutions are emitted by the corresponding online incremental learning machine (MGRNN (Tomandl and Schober, 2001)). At the same time, the performance estimator monitors the solutions from all workers and estimates their quality. This estimation is usually done by the masters or by a pre-determined evaluation function. The performance estimator outputs the results as the weights for all workers.

This idea was proposed in (Ogiso et al., 2016). It is an improved version of the bagging techniques used before, a more sophisticated method that calculates weighted averages of the weights of the input neural networks, which results in more accurate super neural networks.

In this study, the previous system was improved by introducing one variation of the reinforcement learning method, the supervised actor-critic, for the learning machine (see Fig. 2). By introducing the supervised actor-critic method, the solution candidate of each worker is refined automatically by the exploration done by the reinforcement learning. This means that each worker just needs to help the learning machine by partly teaching actions. This not only reduces the work of each worker but also improves the learning speed of each learning machine. A rough sketch of the two alternating phases is given in the code below.
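To make the two alternating phases concrete, the following minimal Java sketch shows one training step and one feedback-teaching step. The Worker and Learner interfaces and all method names are hypothetical placeholders introduced for illustration; they are not taken from the system's actual implementation.

```java
import java.util.List;

// Hypothetical interfaces standing in for a human worker and its online learner.
interface Worker {
    double answer(double[] state);               // the worker's own solution
    void receiveFeedback(double integrated);     // integrated solution fed back
}

interface Learner {
    double predict(double[] state);              // imitated solution
    void train(double[] state, double target);   // one online learning step
}

public class ColBaggingSketch {
    // Training phase: every learner imitates its worker's answer for this state.
    static void trainingPhase(List<Worker> workers, List<Learner> learners, double[] state) {
        for (int i = 0; i < workers.size(); i++) {
            learners.get(i).train(state, workers.get(i).answer(state));
        }
    }

    // Feedback-teaching phase: combine the learners' outputs with the
    // performance weights and return the result to every worker.
    static double feedbackPhase(List<Worker> workers, List<Learner> learners,
                                double[] weights, double[] state) {
        double integrated = 0.0;
        for (int i = 0; i < learners.size(); i++) {
            integrated += weights[i] * learners.get(i).predict(state);
        }
        for (Worker w : workers) {
            w.receiveFeedback(integrated);
        }
        return integrated;
    }
}
```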
To explain the scheme, the next section describes the supervised actor-critic method used in this system.

3 SUPERVISED ACTOR-CRITIC MODEL

The supervised actor-critic model (Rosenstein and Barto, 2012) is a variation of the reinforcement learning algorithm that introduces human input as the supervised signal. It is well known that the reinforcement learning algorithm can be executed effectively by introducing kernel machines, which add new kernels by themselves (Xu et al., 2007)¹.

¹ We do not use the approximated-linear-dependency-based projection method for the kernel manipulation. Instead, the proposed system simply employs the least recently and frequently used estimator for pruning, with replacement of kernels, as described later. This is for restricting the number of kernels to reduce the computational complexity.


Figure 1: Collaborative Bagging system: The system repeats the training and feedback-teaching phases alternately. After repeating them several times, the reasoning is done by integrating the outputs from the learning machines.

Figure 2: Improved collaborative Bagging system: The previous system was improved by replacing the MGRNN with the supervised actor-critic model.

We found that when this model is coupled with kernel regression neural networks, we get a very efficient learning machine. The supervised nature of the algorithm handles human input and learns from it. This reduces the learning time, as human input is more filtered than the raw data usually used by other reinforcement learning algorithms. The algorithm uses two neural networks, one in the actor and one in the critic, and the results calculated by the actor are evaluated by the critic. Also, since the learning works on the reward system, we get more specific learning; in other words, we get a weighted learning system. The added advantage is that there is more data which can be provided to the supervisor for making a choice: data from both the actor and the critic neural networks can be supplied to the user before he makes a call.

3.1 Calculation of TD Error

Referring to Figure 1 and Figure 2, we can understand the entire model. The environment in this case is the T-Rex game in the test-case scenario. The state variables of the game are supplied to the actor model, and the critic model calculates the TD error, also known as the temporal-difference error, defined by the following equation:

δ = r + γ V(s′) − V(s)   (1)

This equation is the definition of the TD error. The reward from the action is given by r, and γ ∈ [0,1] is a constant value. The function V(·) is a value function which takes the state variables as input. In the equation, δ is the current TD error, and s and s′ are the current state variables and the future state variables, respectively.
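As a minimal illustration of (1), the following Java snippet computes the TD error from a reward, a discount factor, and the critic's values of the current and next states; the numbers in main are arbitrary and only show the calculation.

```java
public class TdError {
    /** Equation (1): delta = r + gamma * V(s') - V(s), with gamma in [0, 1]. */
    static double tdError(double reward, double gamma, double vCurrent, double vNext) {
        return reward + gamma * vNext - vCurrent;
    }

    public static void main(String[] args) {
        // Arbitrary illustrative numbers: r = 1.0, gamma = 0.9, V(s) = 0.5, V(s') = 0.8.
        double delta = tdError(1.0, 0.9, 0.5, 0.8);
        System.out.println("TD error = " + delta);   // 1.0 + 0.9 * 0.8 - 0.5 = 1.22
    }
}
```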

3.2 Actor Neural Network

The actor neural network outputs values only between 0 and 1. This is because the actor is supposed to play the role of the supervisor, and the supervisor has to output only whether the T-Rex should jump or not. The ground state is set as 0 and the jump state is set as 1. The value output by the actor is the probability that a jump is required at that position. The input of the neural network is the current state variables, along with the current jump state; the current jump state is used to train the network. The actor uses a kernel-based network, sketched below and made precise in (2) and (3).
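The following Java sketch illustrates the kind of kernel-based actor output described here: a weighted sum of Gaussian kernels passed through a sigmoid so that the result can be read as a jump probability. The array-based representation is an assumption made for brevity, not the authors' implementation.

```java
public class ActorForward {
    /** Gaussian kernel k(s, u) = exp(-||s - u||^2 / (2 sigma^2)), as in (3). */
    static double gaussianKernel(double[] s, double[] u, double sigma) {
        double sq = 0.0;
        for (int d = 0; d < s.length; d++) {
            double diff = s[d] - u[d];
            sq += diff * diff;
        }
        return Math.exp(-sq / (2.0 * sigma * sigma));
    }

    /** Sigmoid squashing function f[x] = 1 / (1 + e^{-x}). */
    static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    /** pi(s) = f( sum_j w_j * k(s, u_j) ), an output in (0, 1), as in (2). */
    static double actorOutput(double[] s, double[][] centers, double[] w, double sigma) {
        double sum = 0.0;
        for (int j = 0; j < centers.length; j++) {
            sum += w[j] * gaussianKernel(s, centers[j], sigma);
        }
        return sigmoid(sum);
    }
}
```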


Algorithm 1: Supervised Actor-Critic.
Require:
  {critic value function} V(s),
  {actor policy} π^A(s),
  {exploration size} σ,
  {actor step size} α,
  {critic step size} β,
  {discount factor} γ ∈ [0,1],
  {eligibility trace decay factor} λ,
  {ratio of supervised signal} k ∈ {1,0}
Initialize θ, w arbitrarily
for all trials do
  e = 0 (clear the eligibility trace)
  s ← initialState
  repeat
    a^A = π^A(s)
    a^E = a^A + n, where n ∼ N(0, σ²)
    a = k a^E + (1 − k) a^A, where k is determined by the supervisor
    Take action a, observe reward r and next state s′
    δ = r + γ V(s′) − V(s)
    Modify the actor parameters by Algorithm 2
    Modify the critic parameters by Algorithm 3
    s = s′
  until s is terminal
end for

Therefore, the output from the actor π^A(s) is

π^A(s) = f( Σ_{j∈S^A_{t−1}} w^a_j k(s, u_j) ),   (2)

where S^A_{t−1} denotes the set of actor kernels after time t − 1 and k(s, u_j) denotes a Gaussian kernel function:

k(s, u_j) ≡ exp( −‖s − u_j‖² / (2σ²) )   (3)

Since the output of (2) is just a value between 0 and 1, we use the sigmoid function f[x] ≡ 1/(1 + e^{−x}). The exploratory actor output a^E is then calculated as follows:

a^E = π^A(s) + n,   (4)

where n denotes a normal random value, n ∼ N(0, σ²). However, if the supervisor gives the action a^S, the actor integrates them as follows:

a^A = k a^S + (1 − k) a^E,   (5)

where k denotes the ratio of introducing the supervised signal a^S, and we set k ∈ {0,1}. The parameter k is occasionally switched between 1 and 0 by the game player, and the mode is changed accordingly. The final action is a^A.

During the learning, if there is no activated kernel, the actor allocates a new kernel. Therefore, if k(s, u_j) < ε for all j ∈ S^A_{t−1}, the actor allocates a new kernel k(·, u_{|S^A_{t−1}|+1}), where u_{|S^A_{t−1}|+1} = s. Note that the set of kernels is updated as S^A_t = S^A_{t−1} ∪ {t}.

However, if such an allocation process continued forever, the computational complexity would also increase endlessly. Under such a situation, the game system described later slows down and it becomes hard to proceed with the game. To overcome this difficulty, the actor restricts the size of the actor network. To this end, if |S^A_{t−1}| reaches an upper bound, the actor replaces the most ineffective kernel with the new kernel. Otherwise, the actor modifies the existing parameters w^A by (8). The summarized actor learning is shown in Algorithm 2.

Algorithm 2: Actor learning.
Require:
  {current status} s,
  {TD error} δ_t,
  {set of kernels (support set)} S^A_{t−1},
  {distance threshold} mind,
  {parameter vector of π^A(s)} w^A,
  {parameters for LRFU estimation} C_j, where j ∈ S_{t−1}
i* = argmin_j ‖s − u_j‖²
MinDistance = ‖s − u_{i*}‖²
Update C_j for all j ∈ S^A_{t−1} by (6)
if |S^A_{t−1}| < B and MinDistance ≥ mind then
  S^A_t = S^A_{t−1} ∪ {t}, where u_{|S^A_t|} = s, w^a_{|S^A_t|} = f^{−1}[a^A]
else
  if MinDistance ≥ mind then
    j* = argmin_j C_j   {j*: most ineffective kernel}
    u_{j*} = s, w^a_{j*} = f^{−1}[a^A]
  else
    Update w^a by (8)
  end if
end if
t = t + 1

Note that the most ineffective kernel is determined by using a least recently and frequently used (LRFU) estimator (Lee et al., 2001), which is a page-replacement algorithm for virtual memory in operating systems. For the LRFU estimator, each Gaussian kernel has an additional parameter C_j, whose initial value is 1. C_j is updated at each learning step as follows:

C_j = { C_j + 1        if j = argmin_k ‖s − u_k‖²
      { (1 − ε) C_j    otherwise                     (6)

Then, the most ineffective kernel is determined as

j* = argmin_j C_j   (7)

The weight update of the existing kernels is

w^a = w^a + α [ k δ (a^E − a^A) + (1 − k)(a^S − a^A) ] ∇_{w^a} π^A(s)   (8)
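Under the same assumptions as the earlier sketches, the code below mirrors one pass of Algorithm 2: update the LRFU counters as in (6), allocate a kernel when the state is far from all centres and the budget B is not yet reached, replace the most ineffective kernel per (7) when the budget is full, and otherwise adjust the weights in the spirit of (8). The class layout is illustrative, and using the kernel activation in place of the full gradient ∇_{w^a} π^A(s) is a simplification.

```java
import java.util.ArrayList;
import java.util.List;

public class ActorLearning {
    final List<double[]> centers = new ArrayList<>();
    final List<Double> weights = new ArrayList<>();
    final List<Double> lrfu = new ArrayList<>();   // C_j, initialised to 1
    final int maxKernels;          // upper bound B
    final double minDist;          // distance threshold "mind"
    final double sigma, alpha, eps;

    ActorLearning(int maxKernels, double minDist, double sigma, double alpha, double eps) {
        this.maxKernels = maxKernels; this.minDist = minDist;
        this.sigma = sigma; this.alpha = alpha; this.eps = eps;
    }

    static double sqDist(double[] a, double[] b) {
        double s = 0.0;
        for (int d = 0; d < a.length; d++) { double t = a[d] - b[d]; s += t * t; }
        return s;
    }

    /** Inverse sigmoid f^{-1}, used to seed the weight of a newly allocated kernel. */
    static double logit(double p) { return Math.log(p / (1.0 - p)); }

    static double clamp(double p) { return Math.min(0.999, Math.max(0.001, p)); }

    void learn(double[] s, double delta, double k, double aE, double aA, double aS) {
        // Nearest kernel and LRFU update (6): increment the winner, decay the rest.
        int nearest = -1; double best = Double.MAX_VALUE;
        for (int j = 0; j < centers.size(); j++) {
            double d = sqDist(s, centers.get(j));
            if (d < best) { best = d; nearest = j; }
        }
        for (int j = 0; j < centers.size(); j++) {
            lrfu.set(j, j == nearest ? lrfu.get(j) + 1.0 : (1.0 - eps) * lrfu.get(j));
        }

        if (centers.size() < maxKernels && (nearest < 0 || best >= minDist)) {
            // Allocate a new kernel centred on the current state.
            centers.add(s.clone()); weights.add(logit(clamp(aA))); lrfu.add(1.0);
        } else if (best >= minDist) {
            // Replace the most ineffective kernel, j* = argmin_j C_j (7).
            int worst = 0;
            for (int j = 1; j < lrfu.size(); j++) if (lrfu.get(j) < lrfu.get(worst)) worst = j;
            centers.set(worst, s.clone()); weights.set(worst, logit(clamp(aA))); lrfu.set(worst, 1.0);
        } else {
            // Weight adjustment in the spirit of (8); the kernel activation
            // stands in for the full gradient of pi(s) with respect to w_j.
            for (int j = 0; j < weights.size(); j++) {
                double kj = Math.exp(-sqDist(s, centers.get(j)) / (2.0 * sigma * sigma));
                weights.set(j, weights.get(j)
                        + alpha * (k * delta * (aE - aA) + (1.0 - k) * (aS - aA)) * kj);
            }
        }
    }
}
```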


3.3 Critic Neural Network

As mentioned in Section 3.1, to calculate the TD error δ we need to calculate V(s). This value function is calculated by the critic neural network. The critic can output a range of values, so there is no need for a sigmoid function; we directly use the kernel function to calculate the value function. Therefore, the output from the critic V(s) is

V(s) ≡ Σ_{j∈S^C_{t−1}} w^c_j k(s, u_j)   (9)

The critic network also learns its output so as to reduce the TD error. If the current status s is far from the closest kernel center of the critic, the critic allocates a new kernel, whose parameter is w^c_{|S^C_t|} = δ. However, if the current status is close to the nearest kernel center of the critic, the critic modifies its parameters by the gradient descent method. Therefore,

w^C = w^C + α δ ∇_{w^C} V(s),   (10)

where δ is calculated by (1). Note that in the case of the critic, the size of the kernel set (support set) S^C_t is not restricted, because |S^C_t| does not become so large. The summarized critic learning algorithm is shown in Algorithm 3.

Algorithm 3: Critic learning.
Require:
  {current status} s,
  {next status} s′,
  {current action} a,
  {TD error} δ
j* = argmin_{j∈S^C_{t−1}} ‖s − u^c_j‖²
minDistance = ‖s − u^c_{j*}‖²
if minDistance ≥ ε then
  S^C_t = S^C_{t−1} ∪ {t}
  w^c_{|S^C_t|} = δ
else
  Update w^C by (10)
end if

3.4 Weight Detection for ColBagging

The training of all supervised actor-critic modules is repeated for several training sessions. Each training session consists of several trials and is continued until the number of trials reaches a maximum number. After each training session, a weighted majority of all actors is calculated and the integrated actor output is evaluated.

The weight w_a for the a-th actor is detected by

w_a = exp(γ Ŝ_a) / Σ_i exp(γ Ŝ_i),   (11)

where Ŝ_a denotes the score of the a-th actor in the last training session, and γ > 1 is a coefficient that determines the sensitivity to the score. The final integrated output is

O_integrated = Σ_a w_a π^a(s) + n,   (12)

where n denotes a noise term, n ∼ N(0, σ). This noise is added to make the condition the same as in the actor-critic learning during k = 0 in (5). In the experiment, the score was determined by the maximum number of cactus that the Dino jumped over successfully. In many cases, w_a is nearly 1 when the a-th agent shows the highest score; otherwise w_a is close to 0.
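A small Java sketch of the weight detection in (11) and the integrated output in (12) is given below. The shift by the maximum score is a standard numerical-stability trick and does not change (11); variable names are illustrative.

```java
import java.util.Random;

public class WeightDetection {
    /** Equation (11): w_a = exp(gamma * S_a) / sum_i exp(gamma * S_i). */
    static double[] actorWeights(double[] scores, double gamma) {
        double max = Double.NEGATIVE_INFINITY;
        for (double s : scores) max = Math.max(max, s);
        double[] w = new double[scores.length];
        double sum = 0.0;
        for (int i = 0; i < scores.length; i++) {
            // Shifting by the maximum score avoids overflow and leaves (11) unchanged.
            w[i] = Math.exp(gamma * (scores[i] - max));
            sum += w[i];
        }
        for (int i = 0; i < w.length; i++) w[i] /= sum;
        return w;
    }

    /** Equation (12): O_integrated = sum_a w_a * pi_a(s) + n, with n ~ N(0, sigma). */
    static double integratedOutput(double[] actorOutputs, double[] w, double sigma, Random rng) {
        double out = 0.0;
        for (int a = 0; a < actorOutputs.length; a++) out += w[a] * actorOutputs[a];
        return out + sigma * rng.nextGaussian();
    }
}
```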
4 SIMULATIONS

The intended application of this learning machine is electrical bidding, but for simulation purposes we had to choose an environment with fast changes and quicker data input from the environment. For this purpose we chose the T-Rex game, also known as the Dino game. It is a small, well-known game found in the Google Chrome browser. We built a simpler version of this game and used it as an environment for testing our learning machine.

The entire learning machine and environment were implemented on the Java platform. We used Eclipse as the IDE for development. All testing was conducted on Windows 10 and Ubuntu 16.04.3.

4.1 Game Environment

The learning machine was built interlinked with the environment to handle data transfer between the two easily. The data transferred between them involved just the state variables and the reward. The game is very simple: there are just three types of cactus, which we must try to avoid by jumping. The speed of the character is also increased exponentially according to the following equation:

speed ← (1 + α) × speed,   (13)

where α is a small positive constant, on the order of 10⁻⁴, which keeps the game in a playable situation. If this value is increased, the character ends up accelerating too fast, which makes the game impossible to play.

The initial speed has to be set to a certain value. We chose this value to be 4, as it seemed to be the most practical value for the start of the game. Also, the maximum speed of the game was set at 450, as it is practically impossible to play the game at such a high speed. All the speeds referred to so far are speeds in the X direction. We also have to maintain the speed in the Y direction; this aspect of the game comes in during jumping, and gravity was implemented in the game as well. To increase the difficulty of the game, we also modified some of the cactus sizes. The height of a part of the cactus was prolonged to about two times that of the original one, and similarly the width of a part of the cactus was prolonged as well.
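For illustration, the speed schedule in (13) with the values quoted above (initial speed 4, cap 450, α on the order of 10⁻⁴) can be written as the following small Java sketch; the field and method names are not taken from the game's source.

```java
public class SpeedSchedule {
    static final double ALPHA = 1e-4;          // small positive constant from the text
    static final double INITIAL_SPEED = 4.0;   // starting horizontal speed
    static final double MAX_SPEED = 450.0;     // cap quoted in the text

    double speed = INITIAL_SPEED;

    /** One frame of (13): speed <- (1 + alpha) * speed, clamped at the cap. */
    void tick() {
        speed = Math.min(MAX_SPEED, (1.0 + ALPHA) * speed);
    }
}
```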


4.2 Learning Machine

The theoretical working of the learning machine was explained in Section 3. We implemented the learning module as a thread together with the other game-environment threads, because we needed to run both the learning machine and the game at the same time. We had two major threads, one for the neural network and the other for the game.

4.3 Approximated Behavior of the Proposed System

The structure of the proposed system is similar to the structure of particle swarm optimization (PSO) (Venter and Sobieszczanski-Sobieski, 2003). That is, if the knowledge of the learning device of ColBagging corresponds to the position of a particle, the behavior of each particle in the PSO can be considered to correspond to the transition of the knowledge acquired by the learning device of ColBagging. For this reason, we first approximately simulate how the knowledge of the learner changes using PSO. Using the PSO, we can easily visualize the whole status of the learning machines.

The PSO has an objective function to be maximized, and each particle aims to move toward the position where the objective function is maximized. Normally, each particle moves randomly, but close to its personal-best and the global-best solutions. In the ColBagging analogy, each worker revises the corresponding particle's moving course according to the worker's prior knowledge. We assume that the worker can only recognize the gradient at the current position and knows how to move from that position according to the gradient information.

Let v_i(t) and P_i(t) be the velocity and position vectors of the i-th particle at time t, respectively. These two vectors are updated at each step as follows:

P_i(t) = P_i(t − 1) + v_i(t − 1),   (14)

v_i(t) = v_i(t − 1) + c_1 r_1 (P_pbest(i,t) − P_i(t)) + γ c_2 r_2 (P_gbest(t) − P_i(t)) + γ S_i(t),   (15)

where P_pbest(i,t), P_gbest(t) and S_i(t) are the i-th particle's personal-best position, the global-best position and the supervised signal, respectively. c_1, c_2 and γ are fixed positive coefficients, and r_1, r_2 are importance weights whose values are varied randomly at every time step. Now, let O(·) be the objective function of this PSO. Then, the global-best particle is

gbest = argmax_j O(P_j(t)).   (16)

The supervised vector is

S_i(t) = ∇_s O(s) |_{s = P_i(t)}.   (17)

We conducted the experiments using this PSO with the following two objective functions, defined on a two-dimensional space X = [x_1 x_2]^T:

O_1(X) = sin(‖X‖) / ‖X‖,   (18)

O_2(X) = ‖X‖ + 3 cos(‖X‖) + 5.   (19)

For each objective function, the experiments were done by changing the number of particles: 10 and 500. In each experiment, the parameter γ was also varied as 0.1, 0.5 and 1. Note that γ > 0 means that the PSO is assisted by the supervised signal generated from the gradient signals. The performances were compared with that of the original PSO, where γ = 0. The performance was measured by the averaged number of particles that reached the optimal point over five trials. In the cases using (18), the number of reached particles is the number of particles whose value of the objective function is larger than 0.9. On the other hand, in the cases using (19), the performance is the averaged number of particles whose value of (19) is less than 8. The results are shown in Figures 5 and 6. From these figures, we can see that, in almost all cases, the particles converged to the optimal solution faster than with the original PSO when γ > 0. However, the convergence speed varies depending on the objective function. This is because the supervised signals, i.e., the gradients of the objective functions, also vary depending on the shape of the surface of the objective functions.
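The following Java sketch shows one update of a single particle under (14), (15) and (17): the position is advanced with the old velocity, and the velocity then receives the personal-best pull, the global-best pull scaled by γ, and the gradient-based supervised term γ S_i(t). The gradient of the chosen objective, e.g. (18) or (19), is passed in as a function; all names are illustrative, and setting γ = 0 recovers the plain PSO baseline.

```java
import java.util.Random;
import java.util.function.Function;

public class SupervisedPsoStep {
    /** One particle update following (14), (15) and (17). */
    static void update(double[] p, double[] v, double[] pBest, double[] gBest,
                       double c1, double c2, double gamma,
                       Function<double[], double[]> gradO, Random rng) {
        for (int d = 0; d < p.length; d++) {
            p[d] += v[d];                              // (14): move with the old velocity
        }
        double r1 = rng.nextDouble();
        double r2 = rng.nextDouble();
        double[] s = gradO.apply(p);                   // (17): supervised signal at the new position
        for (int d = 0; d < p.length; d++) {
            v[d] += c1 * r1 * (pBest[d] - p[d])
                  + gamma * c2 * r2 * (gBest[d] - p[d])
                  + gamma * s[d];                      // (15)
        }
    }
}
```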


Figure 3: The game in action.

Figure 4: Objective functions used for the PSO experiments; left: Sombrero (18), right: Chandelier (19).

Figure 5: PSO using 10 and 500 particles for the objective function (18).

Figure 6: PSO using 10 and 500 particles for the objective function (19).

5 RESULTS

We conducted the test on 16 game players: 10 normal game players and 6 controlled players. The 10 normal players played with the proposed new ColBagging system, and the controlled players played with the ColBagging system without the feedback-teaching process. In the first session, each person played the Dino game normally for 40 rounds. Note that each round ends when the Dino hits a cactus. In each session, each player can switch between two game modes: the supervised or semi-automatic mode. In the supervised mode, the Dino learns how to play by imitating the playing behavior of the player. In the semi-automatic mode, the Dino jumps according to the previous learning results; in this mode, the player's controls were still available to control the Dino's behavior. After the first session, the performance of each actor was evaluated. The score of each actor and player was determined as 20 × the number of jumped cactus.

Before starting the second session, the 10 normal players watched a streamed image of the automatic-playing scene of the integrated actors. The ColBagging system calculates the weighted sum of all actors' outputs, where the weight for each actor was determined by (11). On the other hand, the controlled players did not watch the integrated-actor playing scene before going to the second session. After finishing the second session, each actor's performance was evaluated. The performance was measured by the maximum number of successful jumps over the cactus. The results are plotted in Figures 7 and 8. In each figure, the scores during the learning and the test are plotted; the test scores refer to the score of the actor.

We can see that the performance of each actor did not always reflect the score during the learning. During the training, four normal players' scores after the second session were larger than those of the first session. But five controlled players' scores after the second session were larger than those of the first session.


Figure 7: Score transition between the two sessions of the 10 normal players; left: during the training, right: test (actor's score).

Figure 8: Score transition between the two sessions of the 6 controlled players; left: during the training, right: test (actor's score).

The performances of the controlled players after the first and second sessions did not change largely. On the other hand, five normal players increased their score. We predict that this is due to the long waiting time (about two to three hours) between the first and second sessions; such a long waiting time makes players forget subtle operating patterns. So far, this waiting time is needed to manually create the streaming image of the integrated actor.

Moreover, we also found that there is no guarantee that an actor's performance improves like that of its player. We predict that this is because the actors adjust to the later steps of the game and forget the early steps of the game environment. We believe that if we reduce the waiting time and modify the hyperparameters of the actor to reduce this interference, the normal players' performance will be superior to that of the control players.

The data so far is too limited to provide reliable results. We plan to conduct more tests in the future to obtain enough data.

6 CONCLUSIONS

We proposed our improved collaborative learning scheme between human and machine learning. The system is a variation of a crowd-sourcing system accelerated by machine learning. The system behavior was roughly simulated by particle swarm optimization accelerated by the prior knowledge of each particle. The results suggest that the prior knowledge accelerates the convergence speed. We have also conducted an evaluation of the effectiveness of the collaboration between real humans and machine learning by using the Dino game.

ACKNOWLEDGEMENTS

This research has been supported by Grant-in-Aid for Scientific Research (C) 15K00321.

REFERENCES

Anta, A. F. and Luis Lopez, A. S. (2010). Reliable internet-based master-worker computing in the presence of malicious workers. Parallel Processing Letters, 22(1):1250002.

Fernández Anta, A., Georgiou, C., Mosteiro, M. A., and Pareja, D. (2015). Algorithmic mechanisms for reliable crowdsourcing computation under collusion. PLoS ONE, 10(3):1–22.

Forges, F. (1986). An approach to communication equilibria. Econometrica, 54(6):1375–1385.

Golle, P. and Mironov, I. (2001). Uncheatable distributed computations. In Topics in Cryptology – CT-RSA 2001, volume 2020 of Lecture Notes in Computer Science, pages 425–440. Springer-Verlag.

Konwar, K. M., Rajasekaran, S., and Shvartsman, A. A. (2015). Robust network supercomputing with unreliable workers. Journal of Parallel and Distributed Computing, 75:81–92.

Lee, D., Noh, S., Min, S., Choi, J., Kim, J., Cho, Y., and Sang, K. C. (2001). LRFU: A spectrum of policies that subsumes the least recently used and least frequently used policies. IEEE Transactions on Computers, 50(12):1352–1361.

Ogiso, T., Yamauchi, K., Ishii, N., and Suzuki, Y. (2016). Co-learning system for humans and machines using a weighted majority-based method. International Journal of Hybrid Intelligent Systems, 13(1):63–76.

Rosenstein, M. T. and Barto, A. G. (2012). Supervised Actor-Critic Reinforcement Learning, chapter 14, pages 359–380. Wiley-Blackwell.

Tomandl, D. and Schober, A. (2001). A modified general regression neural network (MGRNN) with a new efficient training algorithm as a robust 'black-box'-tool for data analysis. Neural Networks, 14:1023–1034.

Venter, G. and Sobieszczanski-Sobieski, J. (2003). Particle swarm optimization. AIAA Journal, 41(8):1583–1589.

Xu, X., Hu, D., and Lu, X. (2007). Kernel-based least squares policy iteration for reinforcement learning. IEEE Transactions on Neural Networks, 18(4):973–992.
