Intelligent Flapping Wing Control: Reinforcement Learning for the DelFly (M.W. Goedhart)

Total Pages: 16

File Type: PDF, Size: 1020 KB

Intelligent Flapping Wing Control: Reinforcement Learning for the DelFly

M.W. Goedhart
Technische Universiteit Delft

Master of Science Thesis
For obtaining the degree of Master of Science in Aerospace Engineering at Delft University of Technology
M.W. Goedhart, June 7, 2017
Faculty of Aerospace Engineering, Delft University of Technology

Delft University of Technology, Department of Control and Simulation
Dated: June 7, 2017
Readers: Dr. ir. E. van Kampen, Dr. ir. C. C. de Visser, Dipl.-Ing. S. F. Armanini, Dr. O. A. Sharpanskykh, Dr. ir. Q. P. Chu

Preface

This report concludes the work conducted for my thesis. In the graduation phase at the Department of Control and Simulation of the Faculty of Aerospace Engineering of Delft University of Technology, academic research is formalized in the course AE5310 Thesis Control and Operations.

This report is intended to show the knowledge gained during the thesis phase. The main scientific contributions of this project are described in a scientific paper, which is included in this thesis and is intended as a conference paper. It is aimed at readers with a background in intelligent aerospace control, and is therefore written at a more advanced level. It should be readable as a standalone document.

For readers without previous knowledge of Reinforcement Learning control, a literature survey is summarized in this thesis. This should be sufficient to understand the paper. Such readers are advised to read Chapters 6 to 9 before continuing to the paper.

I would like to thank dr. Erik-Jan van Kampen, dr. Coen de Visser and Sophie Armanini, my supervisors on this thesis project. This work would not have been possible without their support and valuable feedback. Also, I would like to thank my family for supporting me during the months I spent on my thesis, and the years of study before that.

M.W. Goedhart
Zwijndrecht, May 2017

Contents

Preface
1 Introduction
2 Research Scope and Framework
  2.1 Scope of the project
  2.2 Research questions
  2.3 Challenges of DelFly control
  2.4 Project procedure
3 Scientific Paper
4 Reflection on Classification Control
  4.1 Model identification algorithms
  4.2 Intermediate models
  4.3 Neural network overfitting
  4.4 Filtering phase shift
5 Assessment of Recommendations
  5.1 Filtered classifier training
  5.2 Combination of algorithms
6 Introduction to Reinforcement Learning
  6.1 Basics
    6.1.1 Origins of Reinforcement Learning
    6.1.2 Important concepts
  6.2 Dynamic Programming
    6.2.1 Bellman equations
    6.2.2 Generalized Policy Iteration
  6.3 Temporal Difference methods
    6.3.1 Temporal Difference Learning
    6.3.2 Sarsa
    6.3.3 Q-learning
7 Reinforcement Learning in Aerospace Control
  7.1 Continuous Reinforcement Learning solutions
    7.1.1 Extending Reinforcement Learning to continuous state-action spaces
    7.1.2 The action selection problem
  7.2 Actor-critic algorithms
    7.2.1 Heuristic Dynamic Programming
    7.2.2 Dual Heuristic Programming
    7.2.3 Action Dependent Heuristic Dynamic Programming
    7.2.4 Comparison of actor-critic algorithms
  7.3 Critic-only algorithms
    7.3.1 Solving the optimization problem
    7.3.2 Including the optimal control
    7.3.3 Single Network Adaptive Critic
    7.3.4 Model-free critic-only control
  7.4 Actor-only algorithms
    7.4.1 Making use of inaccurate models
    7.4.2 Stochastic model identification
  7.5 High-level control methods
    7.5.1 Helicopter hovering competitions
    7.5.2 Q-learning for local controller selection
    7.5.3 Optimistic model predictive control
  7.6 Preselection from literature
8 Implementation and Training of Intelligent Agents
  8.1 Neural networks
  8.2 Solution schemes
9 Algorithm Preselection
  9.1 Complex dynamics
    9.1.1 Problem description
    9.1.2 Results
  9.2 Variability
    9.2.1 Problem description
    9.2.2 Results
  9.3 Local models
    9.3.1 Problem description
    9.3.2 Results
    9.3.3 Discounted learning
  9.4 Step test
    9.4.1 Problem description
    9.4.2 Results
10 Concluding the Project
Bibliography

1 Introduction

The DelFly [11] is a Micro Air Vehicle (MAV) that generates lift by flapping its wings. It has been capable of flight since 2005. Several authors have developed automatic flight control systems for the DelFly by means of PID control [10–12, 23, 48]. However, these controllers are all limited to specific flight conditions. More advanced control concepts have been applied only for flight in a wind tunnel [7, 8].

Due to the small size of the DelFly, manufacturing is very complicated, which causes variations between individual vehicles [21]. In practice, it is found that the gains of the controllers need to be hand-tuned for each individual DelFly, which is time-consuming [21].

The application of more advanced controllers has been limited by the unavailability of accurate aerodynamic models.
The first linear model was published only in 2013 [5]. Research efforts have led to accurate models for specific flight conditions [2], but the identification of a global model has until recently remained challenging.

Classical control uses frequency or Laplace domain techniques to develop controllers for linear systems. In modern control, time domain representations are used, which has made it possible to handle nonlinear systems as well. However, these controllers are fully predefined by the control engineers who designed them. Controllers that are able to adapt their behavior could be seen as a new phase in control engineering [26]. Adaptive controllers contain a parametrized representation of their behavior, and algorithms are used to change the parameters during interaction with the system.

Reinforcement Learning (RL) control tries to mimic the learning methods that humans and animals use [46]. Reinforcement Learning is a machine learning technique in which the intelligent system learns by trial and error. Historically, RL has been used to study the learning behavior of animals [43]. Reinforcement Learning is inherently based on optimal control [43]: the intelligent agent is always trying to optimize some function. From a control engineer's perspective, it provides a trial and error approach to optimal control, where the performance of a poor controller is improved continuously.

The strength of Reinforcement Learning methods is that they can often deal with an inaccurate model of the controlled system. This makes RL suitable for problems where accurate models are not available [21]. The challenges of the limited scope of the models and the variability of the DelFly may therefore be tackled effectively by Reinforcement Learning [21]. Reinforcement Learning has been applied to flapping wings to maximize lift [33, 34, 41, 50], but has never been used for flight control of Flapping Wing MAVs.

In this graduation project, a Reinforcement Learning controller is developed in order to achieve automatic flight of the DelFly. A detailed explanation of the scope, challenges and phases of this project is presented in Chapter 2. The main contributions to science are described in a scientific paper, contained in Chapter 3. Chapter 4 provides some additional results and reflection on the methods and results of the paper. Some of the recommendations from Chapter 3 are analyzed further in Chapter 5. After the results have been presented, the road towards them is explained. Chapter 6 provides an introduction to the field of Reinforcement Learning. A review of Reinforcement Learning literature for flight control applications is given in Chapter 7.
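For readers new to the field, the following minimal sketch illustrates the trial-and-error optimization described above, using the tabular Q-learning rule covered later in Chapter 6. The toy chain environment, state and action counts, and hyperparameters are illustrative assumptions only; this is not the DelFly controller developed in this thesis.

```python
import numpy as np

# Minimal tabular Q-learning sketch (illustrative only; not the DelFly
# controller). The toy 'step' dynamics, state/action counts, and
# hyperparameters below are assumptions made for demonstration.

N_STATES, N_ACTIONS = 10, 2
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1   # learning rate, discount, exploration

rng = np.random.default_rng(0)
Q = np.zeros((N_STATES, N_ACTIONS))      # estimated value of each state-action pair

def step(state, action):
    """Toy chain dynamics: action 1 moves toward the goal state, action 0 away."""
    next_state = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward

for episode in range(500):
    s = 0
    for t in range(50):
        # Epsilon-greedy action selection: mostly exploit, sometimes explore.
        a = rng.integers(N_ACTIONS) if rng.random() < EPSILON else int(np.argmax(Q[s]))
        s_next, r = step(s, a)
        # Q-learning update: nudge Q(s, a) toward the bootstrapped target.
        Q[s, a] += ALPHA * (r + GAMMA * np.max(Q[s_next]) - Q[s, a])
        s = s_next
        if s == N_STATES - 1:
            break

print(Q)  # the greedy policy argmax(Q, axis=1) now reaches the goal state
```

Even on this toy problem, the key property carries over: the update rule never consults a model of the dynamics, it only observes transitions, which is exactly why RL is attractive when accurate models are unavailable.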
Recommended publications
  • Data-Driven Science and Engineering (Russian edition: Анализ данных в науке и технике)
    Steven L. Brunton, J. Nathan Kutz. Data-Driven Science and Engineering: Machine Learning, Dynamical Systems, and Control. Russian edition translated from the English by A. A. Slinkin. Moscow: DMK Press, 2021. 542 pp., ill. ISBN 978-5-97060-910-1. UDC 001.5, 004.6; BBK 20, 32.97. Data-driven discovery has revolutionized the modeling, prediction, and control of complex systems. This book brings together machine learning, engineering mathematics, and mathematical physics to show how the modeling and control of dynamical systems combine with modern methods of data science. It describes many advances in scientific computing that enable data-driven methods to be applied to a wide variety of complex systems, such as turbulence, brain science, climate science, epidemiology, finance, robotics, and autonomous systems. The book is intended for interested senior undergraduate and first-year graduate students in engineering and the physical sciences, and covers a broad range of topics and methods, from introductory material to recent work. Russian language edition copyright © 2021 by DMK Press. All rights reserved. No part of this book may be reproduced in any form or by any means without the written permission of the copyright holders. ISBN 978-1-108-42209-3 (English). © Steven L.
  • Application of Reinforcement Learning Techniques to the Development of Flight Control Systems for Miniature UAV Rotorcraft
    MACHINE LEARNING FOR INTELLIGENT CONTROL: APPLICATION OF REINFORCEMENT LEARNING TECHNIQUES TO THE DEVELOPMENT OF FLIGHT CONTROL SYSTEMS FOR MINIATURE UAV ROTORCRAFT. A thesis submitted in partial fulfilment of the requirements for the Degree of Master of Engineering in Mechanical Engineering in the University of Canterbury, by Edwin Hayes, University of Canterbury, 2013. Abstract: This thesis investigates the possibility of using reinforcement learning (RL) techniques to create a flight controller for a quadrotor Micro Aerial Vehicle (MAV). A capable flight control system is a core requirement of any unmanned aerial vehicle. The challenging and diverse applications in which MAVs are destined to be used mean that considerable time and effort need to be put into designing and commissioning suitable flight controllers. It is proposed that reinforcement learning, a subset of machine learning, could be used to address some of the practical difficulties. While much research has delved into RL in unmanned aerial vehicle applications, this work has tended to ignore low-level motion control, or been concerned only with off-line learning regimes. This thesis addresses an area in which accessible information is scarce: the performance of RL when used for on-policy motion control. Trying out a candidate algorithm on a real MAV is a simple but expensive proposition. In place of such an approach, this research details the development of a suitable simulator environment in which a prototype controller might be evaluated. The inquiry then proposes a possible RL-based control system, utilising the Q-learning algorithm, with an adaptive RBF network providing function approximation. The operation of this prototypical control system is then tested in detail, to determine both the absolute level of performance which can be expected and the effect which tuning critical parameters of the algorithm has on the functioning of the controller.
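    As a rough illustration of the control architecture this abstract names (Q-learning with RBF-network function approximation), the sketch below approximates the Q-function as a linear combination of fixed Gaussian basis functions with one weight vector per action. The centers, widths, learning rate, and semi-gradient update are generic textbook assumptions, not the adaptive RBF network actually developed in that thesis.

```python
import numpy as np

# Hedged sketch of Q-learning with RBF function approximation, in the
# spirit of the abstract above. Centers, widths, and the semi-gradient
# update are generic choices, not the thesis's adaptive RBF network.

CENTERS = np.linspace(-1.0, 1.0, 9)      # fixed Gaussian RBF centers over a 1-D state
WIDTH = 0.25
N_ACTIONS = 3
ALPHA, GAMMA = 0.05, 0.95                # learning rate and discount factor

def features(state):
    """Gaussian RBF activations for a scalar state."""
    return np.exp(-0.5 * ((state - CENTERS) / WIDTH) ** 2)

W = np.zeros((N_ACTIONS, CENTERS.size))  # one linear weight vector per action

def q_values(state):
    """Q(s, a) for all actions: linear in the RBF features."""
    return W @ features(state)

def update(s, a, r, s_next):
    """Semi-gradient Q-learning step on the linear-in-features Q-function."""
    target = r + GAMMA * np.max(q_values(s_next))
    td_error = target - q_values(s)[a]
    W[a] += ALPHA * td_error * features(s)

# Example of a single update on a made-up transition (s, a, r, s'):
update(s=0.1, a=1, r=0.0, s_next=0.15)
```

    The design point this setup makes concrete: the table of the previous sketch is replaced by a smooth parametric function, so the controller can generalize across the continuous states a real vehicle visits.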
  • "Machine Learning in a Dynamic World" (Invited Panel Discussion), M. M. Kokar, Organizer
    P. J. Antsaklis, "Learning in Control," in Panel Discussion on "Machine Learning in a Dynamic World" (Invited), M. M. Kokar, Organizer, Proc. of the 3rd IEEE Intern. Symposium on Intelligent Control, pp. 500-507, Arlington, VA, Aug. 24-26, 1988. MACHINE LEARNING IN A DYNAMIC WORLD. Panel Discussion. Mieczyslaw M. Kokar, Panel Organizer and Editor, Northeastern University, Boston, Massachusetts. Panelists: P. J. Antsaklis, University of Notre Dame; K. A. DeJong, George Mason University; A. L. Meyrowitz, Office of Naval Research; A. Meystel, Drexel University; R. S. Michalski, George Mason University; R. S. Sutton, GTE Laboratories. Machine Learning was established as a research discipline in the 1970s and experienced a growth expansion in the 1980s. One of the roots of machine learning research was in Cybernetic Systems and Adaptive Control. Machine Learning has been significantly influenced by Artificial Intelligence, Cognitive Science, Computer Science, and other disciplines; Machine Learning has developed its own research paradigms, methodologies, and a set of research objectives different from those of control systems. In the meantime, a new field of Intelligent Control has emerged. [...] problems while ignoring common-sense knowledge. How much progress in controlling real-world systems can be made without taking into account all kinds of imprecise information? In the following we present the views of the panelists. This presentation is based on the written material submitted to the panel organizers by the panelists. The presentation has been divided into four sections. The first section is devoted to the trends in the [...]
  • Energy Consumption Prediction Using Machine Learning: A Review
    Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 11 March 2019 | doi:10.20944/preprints201903.0131.v1. Review: Energy Consumption Prediction Using Machine Learning: A Review. Amir Mosavi 1,2,3*, Abdullah Bahmani 1. 1 Department of Computer Science, Norwegian University of Science and Technology, Trondheim, Norway; [email protected]. 2 Kando Kalman Faculty of Electrical Engineering, Obuda University, Budapest, Hungary. 3 School of the Built Environment, Oxford Brookes University, Oxford, UK. Abstract: Machine learning (ML) methods have recently contributed substantially to the advancement of prediction models used for energy consumption. Such models greatly improve the accuracy, robustness, and precision, as well as the generalization ability, of conventional time series forecasting tools. This article reviews the state of the art of machine learning models used in the general application of energy consumption. Through a novel search and taxonomy, the most relevant literature in the field is classified according to the ML modeling technique, energy type, prediction type, and application area. A comprehensive review of the literature identifies the major ML methods and their applications, and discusses the evaluation of their effectiveness in energy consumption prediction. The paper further draws conclusions on the trends and the effectiveness of the ML models. The review reports a marked rise in accuracy and ever-increasing performance of prediction technologies based on novel hybrid and ensemble models. Keywords: energy consumption; prediction; machine learning models; deep learning models; artificial intelligence (AI); computational intelligence (CI); forecasting; soft computing (SC). 1. Introduction: Energy consumption is one of the essential topics in energy systems.