Systems Control with Generalized Probabilistic Fuzzy- Reinforcement Learning

Systems control with generalized probabilistic fuzzy- reinforcement learning Citation for published version (APA): Hinojosa, J., Nefti, S., & Kaymak, U. (2011). Systems control with generalized probabilistic fuzzy-reinforcement learning. IEEE Transactions on Fuzzy Systems, 19(1), 51-64. https://doi.org/10.1109/TFUZZ.2010.2081994 DOI: 10.1109/TFUZZ.2010.2081994 Document status and date: Published: 01/01/2011 Document Version: Publisher’s PDF, also known as Version of Record (includes final page, issue and volume numbers) Please check the document version of this publication: • A submitted manuscript is the version of the article upon submission and before peer-review. There can be important differences between the submitted version and the official published version of record. People interested in the research are advised to contact the author for the final version of the publication, or visit the DOI to the publisher's website. • The final author version and the galley proof are versions of the publication after peer review. • The final published version features the final layout of the paper including the volume, issue and page numbers. Link to publication General rights Copyright and moral rights for the publications made accessible in the public portal are retained by the authors and/or other copyright owners and it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights. • Users may download and print one copy of any publication from the public portal for the purpose of private study or research. • You may not further distribute the material or use it for any profit-making activity or commercial gain • You may freely distribute the URL identifying the publication in the public portal. If the publication is distributed under the terms of Article 25fa of the Dutch Copyright Act, indicated by the “Taverne” license above, please follow below link for the End User Agreement: www.tue.nl/taverne Take down policy If you believe that this document breaches copyright please contact us at: [email protected] providing details and we will investigate your claim. Download date: 27. Sep. 2021 IEEE TRANSACTIONS ON FUZZY SYSTEMS, VOL. 19, NO. 1, FEBRUARY 2011 51 Systems Control With Generalized Probabilistic Fuzzy-Reinforcement Learning William M. Hinojosa, Samia Nefti, Member, IEEE, and Uzay Kaymak, Member, IEEE Abstract—Reinforcement learning (RL) is a valuable learning appropriate method than supervised or unsupervised methods method when the systems require a selection of control actions when the systems require a selection of control actions whose whose consequences emerge over long periods for which input– consequences emerge over long periods for which input–output output data are not available. In most combinations of fuzzy systems and RL, the environment is considered to be deterministic. data are not available. In many problems, however, the consequence of an action may be An RL problem can be defined as a decision process where uncertain or stochastic in nature. In this paper, we propose a novel the agent learns how to select an action based on feedback from RL approach to combine the universal-function-approximation ca- the environment. It can be said that the agent learns a policy pability of fuzzy systems with consideration of probability distri- that maps states of the environment into actions. Often, the RL butions over possible consequences of an action. The proposed generalized probabilistic fuzzy RL (GPFRL) method is a modified agent must learn a value function, which is an estimate of the version of the actor–critic (AC) learning architecture. The learning appropriateness of a control action given the observed state. In is enhanced by the introduction of a probability measure into the many applications, the value function that needs to be learned learning structure, where an incremental gradient–descent weight– can be rather complex. It is then usual to use general function updating algorithm provides convergence. Our results show that approximators, such as neural networks and fuzzy systems to the proposed approach is robust under probabilistic uncertainty while also having an enhanced learning speed and good overall approximate the value function. This approach has been the start performance. of extensive research on fuzzy and neural RL controllers. In this paper, our focus is on fuzzy RL controllers. Index Terms—Actor–critic (AC), learning agent, probabilistic fuzzy systems, reinforcement learning (RL), systems control. In most combinations of fuzzy systems and RL, the environment is considered to be deterministic, where the rewards are known, and the consequences of an action are well-defined. In I. INTRODUCTION many problems, however, the consequence of an action may be EARNING agents can tackle problems where prepro- uncertain or stochastic in nature. In that case, the agent deals L grammed solutions are difficult or impossible to design. with environments where the exact nature of the choices is un- Depending on the level of available information, learning agents known, or it is difficult to foresee the consequences or outcomes can apply one or more types of learning, such as unsupervised of events with certainty. Furthermore, an agent cannot simply or supervised learning. Unsupervised learning is suitable when assume what the world is like and take an action according to target information is not available and the agent tries to form those assumptions. Instead, it needs to consider multiple pos- a model based on clustering or association among data. Su- sible contingencies and their likelihood. In order to handle this pervised learning is much more powerful, but it requires the key problem, instead of predicting how the system will respond knowledge of output patterns corresponding to input data. In to a certain action, a more appropriate approach is to predict a dynamic environments, where the outcome of an action is not system probability of response [1]. immediately known and is subject to change, correct target data In this paper, we propose a novel RL approach to combine may not be available at the moment of learning, which implies the universal-function-approximation capability of fuzzy sys- that supervised approaches cannot be applied. In these envi- tems with consideration of probability distributions over possi- ronments, reward information, which may be available only ble consequences of an action. In this way, we seek to exploit sparsely, may be the best signal that an agent receives. For such the advantages of both fuzzy systems and probabilistic systems, systems, reinforcement learning (RL) has proven to be a more where the fuzzy RL controller can take into account the probabilistic uncertainty of the environment. The proposed generalized probabilistic fuzzy RL (GPFRL) method is a modified version of the actor–critic (AC) learning Manuscript received October 12, 2009; revised June 12, 2010; accepted architecture, where uncertainty handling is enhanced by the August 23, 2010. Date of publication September 30, 2010; date of current version February 7, 2011. introduction of a probabilistic term into the actor and critic W. M. Hinojosa is with the Robotics and Automation Laboratory, learning, enabling the actor to effectively define an input–output The University of Salford, Greater Manchester, M5 4WT, U.K. (e-mail: mapping by learning the probabilities of success of performing [email protected]). S. Nefti is with the School of Computing Science and Engineering, The each of the possible output actions. In addition, the final output University of Salford, Greater Manchester, M5 4WT, U.K. (e-mail: s.nefti- of the system is evaluated considering a weighted average of all [email protected]). possible actions and their probabilities. U. Kaymak is with the Econometric Institute, Erasmus School of Eco- nomics, Erasmus University, Rotterdam 1738, The Netherlands (e-mail: The introduction of the probabilistic stage in the controller [email protected]). adds robustness against uncertainties and allows the possibility Digital Object Identifier 10.1109/TFUZZ.2010.2081994 1063-6706/$26.00 © 2010 IEEE 52 IEEE TRANSACTIONS ON FUZZY SYSTEMS, VOL. 19, NO. 1, FEBRUARY 2011 of setting a level of acceptance for each action, providing flex- ibility to the system while incorporating the capability of sup- porting multiple outputs. In the present work, the transition function of the classic AC is replaced by a probability distribu- tion function. This is an important modification, which enables us to capture the uncertainty in the world, when the world is either complex or stochastic. By using a fuzzy set approach, the system is able to accept multiple continuous inputs and to generate continuous actions, rather than discrete actions, as in traditional RL schemes. GPFRL not only handles the uncertainty in the input states but also has a superior performance in comparison with similar fuzzy-RL models. The remainder of the paper is organized as follows. In Section II, we discuss related previous work. Our proposed architecture for GPFRL is discussed in Section III. GPFRL learning is considered in Section IV. In Section V, we discuss three examples that illustrate various aspects of the proposed approach. Finally, conclusions are given in Section VI. II. RELATED WORK Fig. 1. AC architecture. Over the past few years, various RL schemes have been developed, either by designing new learning methods [2] or by de- or AC techniques [10], [11]. Lin and Lin developed RL strategy veloping new hybrid architectures that combine RL with other based on fuzzy-adaptive-learning control network (FALCON- systems, like neural networks and fuzzy logic. Some early ap- RL) method [12], Jouffe’s fuzzy-AC-learning (FACL) method proaches include the work given in [3], where a box system is [10], Lin’s RL-adaptive fuzzy-controller (RLAFC) method [11], used for the purpose of describing a system state based on its and Wang’s fuzzy AC RL network (FACRLN) method [14].

Load more