Intelligent Flapping Wing Control for the DelFly
M.W. Goedhart
Technische Universiteit Delft

Intelligent Flapping Wing Control
Reinforcement Learning for the DelFly

Master of Science Thesis

For obtaining the degree of Master of Science in Aerospace Engineering at Delft University of Technology

M.W. Goedhart

June 7, 2017

Faculty of Aerospace Engineering · Delft University of Technology

Delft University of Technology, Department of Control and Simulation

Dated: June 7, 2017

Readers: Dr. ir. E. van Kampen

Dr. ir. C. C. de Visser

Dipl.-Ing. S. F. Armanini

Dr. O. A. Sharpanskykh

Dr. ir. Q. P. Chu

Preface

This report concludes the work conducted for my thesis. In the graduation phase at the Department of Control and Simulation at the Faculty of Aerospace Engineering of Delft University of Technology, a phase of academic research is formalized in the course AE5310 Thesis Control and Operations. This report is intended to show the knowledge gained during the thesis phase. The main scientific contributions of this project are described in a scientific paper, which is included in this thesis. This is intended as a conference paper. It is aimed at readers with a background in intelligent aerospace control, and is therefore written at a more advanced level. It should be readable as a standalone document. For readers without previous knowledge of Reinforcement Learning control, a literature survey is summarized in this thesis. This should be sufficient to understand the paper. Such readers are advised to read Chapters 6 to 9 before continuing to the paper.

I would like to thank dr. Erik-Jan van Kampen, dr. Coen de Visser and Sophie Armanini, my supervisors on this thesis project. This work would not have been possible without their support and valuable feedback. Also, I would like to thank my family for supporting me during the months I spent on my thesis, and the years of study before that.

M.W. Goedhart
Zwijndrecht, May 2017


Contents

Preface
1 Introduction
2 Research scope and framework
  2.1 Scope of the project
  2.2 Research questions
  2.3 Challenges of DelFly control
  2.4 Project procedure
3 Scientific paper
4 Reflection on classification control
  4.1 Model identification algorithms
  4.2 Intermediate models
  4.3 Neural network overfitting
  4.4 Filtering phase shift
5 Assessment of recommendations
  5.1 Filtered classifier training
  5.2 Combination of algorithms
6 Introduction to Reinforcement Learning
  6.1 Basics
    6.1.1 Origins of Reinforcement Learning
    6.1.2 Important concepts
  6.2 Dynamic Programming
    6.2.1 Bellman equations
    6.2.2 Generalized Policy Iteration
  6.3 Temporal Difference methods
    6.3.1 Temporal Difference Learning
    6.3.2 Sarsa
    6.3.3 Q-learning
7 Reinforcement Learning in aerospace control
  7.1 Continuous Reinforcement Learning solutions
    7.1.1 Extending Reinforcement Learning to continuous state-action spaces
    7.1.2 The action selection problem
  7.2 Actor-critic algorithms
    7.2.1 Heuristic Dynamic Programming
    7.2.2 Dual Heuristic Programming
    7.2.3 Action Dependent Heuristic Dynamic Programming
    7.2.4 Comparison of actor-critic algorithms
  7.3 Critic-only algorithms
    7.3.1 Solving the optimization problem
    7.3.2 Including the optimal control
    7.3.3 Single Network Adaptive Critic
    7.3.4 Model-free critic-only control
  7.4 Actor-only algorithms
    7.4.1 Making use of inaccurate models
    7.4.2 Stochastic model identification


  7.5 High-level control methods
    7.5.1 Helicopter hovering competitions
    7.5.2 Q-learning for local controller selection
    7.5.3 Optimistic model predictive control
  7.6 Preselection from literature
8 Implementation and training of intelligent agents
  8.1 Neural networks
  8.2 Solution schemes
9 Algorithm preselection
  9.1 Complex dynamics
    9.1.1 Problem description
    9.1.2 Results
  9.2 Variability
    9.2.1 Problem description
    9.2.2 Results
  9.3 Local models
    9.3.1 Problem description
    9.3.2 Results
    9.3.3 Discounted learning
  9.4 Step test
    9.4.1 Problem description
    9.4.2 Results
10 Concluding the project
Bibliography

1 Introduction

The DelFly [11] is a Micro Air Vehicle (MAV) that generates lift by flapping its wings. It has been capable of flight since 2005. Several authors have been able to develop automatic flight control systems for the DelFly by means of PID control [10–12, 23, 48]. However, these controllers are all limited to specific flight conditions. Only for flight in a wind tunnel have more advanced control concepts been applied [7, 8]. Due to the small size of the DelFly, manufacturing is very complicated, which causes variations between individual vehicles [21]. In practice, it is found that the gains of the controllers need to be hand-tuned for each individual DelFly, which is time-consuming [21].

The application of more advanced controllers has been limited by the unavailability of accurate aerodynamic models. The first linear model was published only in 2013 [5]. Research efforts have led to accurate models for specific flight conditions [2], but the identification of a global model has until recently been challenging.

Classical control uses frequency or Laplace domain techniques to develop controllers for linear systems. In modern control, time domain representations are used. This has made it possible to handle nonlinear systems as well. However, these controllers are fully predefined by the control engineers who designed them. Controllers that are able to adapt their behavior could be seen as a new phase in control engineering [26]. Adaptive controllers contain a parametrized representation of their behavior. Algorithms are used to change the parameters during interaction with the system.

Reinforcement Learning (RL) control tries to mimic the learning methods that humans and animals use [46]. Reinforcement Learning is a technique where the intelligent system learns by trial and error. Historically, RL has been used to study the learning behavior of animals [43]. Reinforcement Learning is inherently based on optimal control [43]: the intelligent agent is always trying to optimize some function. From a control engineer's perspective, it provides a trial and error approach to optimal control, where the performance of a poor controller is improved continuously. The strength of Reinforcement Learning methods is that they can often deal with an inaccurate model of the controlled system. This makes RL suitable for problems where accurate models are not available [21]. The challenges of the limited scope of the models and the variability of the DelFly may therefore be effectively tackled by Reinforcement Learning [21]. Reinforcement Learning has been applied to flapping wings to maximize lift [33, 34, 41, 50], but has never been used for flight control of Flapping Wing MAVs. In this graduation project, a Reinforcement Learning controller will be developed in order to achieve automatic flight of the DelFly.

A detailed explanation of the scope, challenges and phases of this project is presented in Chapter 2. The main contributions to science are described in a scientific paper, contained in Chapter 3. Chapter 4 provides some additional results and reflection on the methods and results of the paper. Some of the recommendations from Chapter 3 are analyzed further in Chapter 5. After the results have been presented, the road towards them will be explained. Chapter 6 provides an introduction to the field of Reinforcement Learning. A review of Reinforcement Learning literature for flight control applications is performed in Chapter 7. Some important implementation considerations are discussed in Chapter 8. Chapter 9 describes how the preselection of algorithms from literature is narrowed down to the algorithms considered in the paper. This chapter also presents some knowledge that was obtained during the preliminary tests. The scientific conclusions and recommendations are contained in the paper in Chapter 3, but the project itself is wrapped up in Chapter 10.


2 Research scope and framework

How does one control a Flapping Wing Micro Air Vehicle (FWMAV) with intelligent control techniques? This question nicely captures the aim of this project, but it is much too broad to be a suitable research question. This chapter discusses the exact content of this thesis project. Section 2.1 explains which aspects will be investigated. A set of research questions is synthesized from this in Section 2.2.

2.1. Scope of the project

The DelFly is a Flapping Wing MAV. It has been designed using a top-down approach [10], meaning that a functioning larger vehicle was miniaturized. As a result, the complex dynamics of the DelFly are still not fully understood. It is known that the behavior is nonlinear, and that the flapping motion induces time-varying components. A successful Reinforcement Learning (RL) controller must be able to handle this complex dynamic system.

The small size of the DelFly makes it difficult to manufacture. This results in large inconsistencies between individual DelFlys. This is the reason why intelligent control is suggested as a feasible option. However, the variability between different vehicles still poses a large challenge to the controller, which this project aims to solve.

The DelFly has been used as a platform for artificial intelligence research [11]. Its focus on autonomous control is reflected by the availability of a camera. Vision-based control algorithms are commonly employed for the DelFly (e.g., [44]), leading to fully autonomous flight outside the laboratory.

In this project, the focus is on automatic control, rather than autonomous control. It was decided that vision-based control should not be the focus of this project, because it is mostly concerned with extracting information from camera data, rather than with flight control. Instead, external tracking equipment is used to determine the state of the DelFly. This is done by using feedback from the Delft University of Technology Cyber Zoo, an optical tracking facility which uses infrared cameras to track the position of objects. Additionally, data from the Inertial Measurement Unit on the DelFly is used. These measurements can be considered more reliable than vision-based measurements, but are still subject to significant noise. Dealing with this noise is one of the challenges of this project.

Using feedback from the Cyber Zoo requires external computer systems to be connected to the DelFly in real time. This opens the possibility of computing the control off-board as well, which would alleviate the challenge of the computational loads of RL algorithms. On-board control is considered preferable, but optional in this research. Nevertheless, the ultimate goal of on-board control is considered in the judgment of the RL algorithms. Computational time is not formally assessed, but rated qualitatively as feasible or infeasible.

This project is about Reinforcement Learning as a control method. Research efforts are focused on RL control; other adaptive control topologies are not considered in this study. Learning from a human example in a supervised setting is not the intention. In an automatic control setting, the controller regulates the state of the vehicle to a certain reference. This reference could be set by a human pilot, such that the controller functions as a control augmentation system. However, manual handling qualities are not considered in this thesis. Rather, the trajectory is a predefined step.


2.2. Research questions

The challenges of automatic flight control of the DelFly are its complex, time-varying dynamics, variability in its dynamics, noisy data and the limited accuracy of aerodynamic models. In this project, Reinforcement Learning is investigated as a control method to tackle these problems. Since the main uncertainty is in the performance of such a controller, the main research question is defined as follows:

What is the performance of a Reinforcement Learning controller for the DelFly, trained using dynamic models, compared to conventional controllers?

This research question highlights the main characteristic of the project, which is the fact that an RL controller will be trained on aerodynamic models. In order to answer the main question, several sub-questions will have to be answered:

• Which measures are suitable to compare the performance of the controllers?

• What is the performance of Proportional-Integral-Derivative (PID) control of the DelFly?

• What is the best method of implementing Reinforcement Learning control for the DelFly?

– What Reinforcement Learning controllers are available for flight control applications?
– How do high-level indirect controllers compare to low-level direct controllers in terms of safety and accuracy?
– Which representation of the flight state in the controller leads to the best performance in terms of learning speed, accuracy and safety?
– Which Reinforcement Learning methods are computationally too expensive for practical use?

• What is the best method of using the existing dynamic models of the DelFly to train the controller of the actual DelFly?

– Which dynamic models of the DelFly are available?
– What is the best way of training a global controller on several local models?
– What is the best method to switch from offline to online learning?

When these questions are answered, the answer to the main research question should be known. The lower-level questions are intended to provide a steering function, whereas the questions on a higher level focus more on reaching the research objective. Note that the term 'performance' in the main research question is currently not well-defined. The first question focuses on finding a performance metric that includes both safety and accuracy. The third question, with its associated sub-questions, attempts to gather a set of possibly feasible algorithms and narrow it down to a small selection. Of course, many algorithms are available, and not every subtle difference should be reflected by the research questions. The trade-off between direct and indirect control should be addressed in any case. Also, it is important to consider the way in which the flight state is represented in the controller. It is possible to use the tracking error, which is common in linear controllers, or the full state. Alternatively, one may use certain features of the state, such as the distance from the walls instead of the vehicle's position, to facilitate processing. This is especially interesting for low-level controllers. For controllers on a higher level, this research question relates to the structure of the underlying low-level controller. A detailed analysis of computational requirements is not part of this project, but algorithms that are clearly too expensive for practical use are disregarded.

The last research questions are indicated in gray. They were originally intended to steer the research into low-level controllers. A preliminary analysis into this area, described in Chapter 9, was inconclusive. Because low-level algorithms were later disregarded, answering these questions was no longer necessary in order to find the answer to the main question. Note that a flight test is not strictly necessary in order to answer the research question. However, a flight test would increase the confidence in the results, and is therefore considered desirable. The research objective is formalized below:

The research objective is to design and test a Reinforcement Learning controller for the DelFly, and to compare it to existing controllers, such that recommendations can be made.

This project is design-oriented, but also serves an evaluative function by making general conclusions on RL control for this application platform. The plan is to perform a rigorous application study, which will highlight solutions to the current problems, but will also highlight new problems. This will be done mostly by applying recent RL solutions. The following sub-goals can be distinguished:

• Make an overview of the current state-of-the-art in Reinforcement Learning control.

• Make a selection of control methods that seem feasible for the DelFly.

• Refine this selection by means of preliminary research.

• Design a set of Reinforcement Learning controllers for the DelFly.

• Determine which existing controllers are to be used as a baseline for the performance.

• Compare the performance of the selected controllers in simulation.

• Compare the performance of a smaller selection of controllers in flight tests.

• Derive recommendations from an analysis of the performance.

The sub-goals show that multiple steps are taken to find the best RL controller. From a literature survey, a set of possible controllers is selected for detailed design. After an analysis in simulation, a new selection is made for the final implementation. The flight test has not been performed during the course of this project, as will be described in Section 2.4. When the research objective is achieved, the result will be the world's first Reinforcement Learning flight controller for a Flapping Wing MAV, with recommendations for performance improvements. Also, a new automatic controller for the DelFly will be available. This will facilitate future research on autonomous control and Flapping Wing aerodynamics.

2.3. Challenges of DelFly control

In this project, the goal is to design a controller for the DelFly. This section will consider the design problem from the controller's point of view. The system to be controlled is a Flapping Wing MAV. It has three independent control inputs: the elevator, rudder and flapping frequency. It was established earlier in this chapter that it will be controlled within a special tracking environment. This removes the need for vision-based control algorithms, and provides the possibility of off-board control, alleviating the challenges of computational requirements. Four important challenges are distinguished:

1. Variability in dynamics
Inconsistencies in manufacturing make every DelFly unique. This makes it difficult to design a controller that works for every DelFly. The first solution to this challenge is an adaptive controller. Such a controller is optimized for one DelFly, and adapts its behavior when it is installed on a new individual. The different DelFlys should have similar dynamics for this to work. An alternative approach is to design a fixed, but robust controller. This single controller is robust enough to deal with variations from the nominal DelFly. In this report, the robust category is defined as a controller that makes use of pre-defined control solutions. The resulting behavior does not have to be constant: it may switch between several solutions. The characteristic of the robust category is that the pre-defined solutions are fixed: selection occurs on a higher control level.

2. Complex dynamics
The dynamics of the DelFly are complex in two aspects. When considering the entire flight envelope, the nonlinearity of the DelFly complicates control. This can be countered with complex controller structures, but it is preferred to use simple structures if those allow good performance. The challenge in this project is to design a controller of suitable complexity. It should be noted that the selected task will influence the required complexity. This is because some tasks, like aggressive maneuvers, may involve a large part of the flight envelope, while simple tasks only consider slight deviations from the trimmed condition. Another point to note is that the flapping behavior causes additional motion. This motion could be considered a disturbance, which is dealt with like any other disturbance from the required flight state.

Figure 2.1: Flowchart describing the phases of the project. Gray boxes are elaborated in the paper.

However, it may be necessary to actively counteract the time-varying motion. It is also possible to disregard the flapping motion, by controlling only the 'time-averaged' state. This can be done by filtering the perceived state of the DelFly. Research is required to ensure that such filters do not influence the time-averaged behavior of the DelFly too much.

3. Noise
Although the optical tracking facility removes the need for vision-based control, it is not perfect. The measurements by the Inertial Measurement Unit (IMU) and the TU Delft Cyber Zoo are subject to noise. The controller should be able to deal with noisy measurements.

4. Disturbances
The TU Delft Cyber Zoo is a laboratory environment. However, it is not a wind-free environment. The controllers must be able to deal with some external disturbances to the state.

The tools available to tackle these challenges are the local models of the dynamics of the DelFly. Several Linear Time-Invariant models are known, which are valid at specific points of the flight envelope. However, a global model of the dynamics is not known. Using the local models to obtain a global controller is a challenge of this project.

2.4. Project procedure

The previous sections listed the goals, challenges and intended steps of this project. As the project progressed, it was found that some steps were unnecessary, and other items were missing. This section describes the steps that have been followed in the project.

A flight test in the Cyber Zoo was originally intended as the final part of this project. Near the intended start of this phase, however, it was decided that continuing the simulated tests would contribute more to the body of science than a flight test. A flight test is left as the main recommendation for validation of this research. This thesis considers only a simulation of such a flight test.

This project was intended to compare the performance of Reinforcement Learning controllers to the performance of PID controllers. Developing new control approaches was considered optional. During the course of the project, the Classification Approach for Machine Learning control (CAML) was developed. This is a control approach relying on classification networks. Traditionally, Machine Learning is subdivided into Supervised Learning, Unsupervised Learning, and Reinforcement Learning. CAML does not fit well in any of these categories. A classifier is trained in a supervised setting, using simulated experience. This is then applied in combination with a model that is learned online. CAML is therefore a hybrid form of Supervised and Reinforcement Learning.

The main steps of the project are shown in Figure 2.1. The project starts with a literature study. A preselection of promising RL algorithms is made from literature. Next, the algorithms are subjected to some preliminary tests, in order to make a final selection. At this stage, the CAML approach was also initiated. The Policy Gradient (PG) algorithm was not suitable for the preliminary tests, but good performance was found in literature, and during informal analyses. It was therefore used during the next tests.

The Step test requires the controllers to track a velocity step by deflecting the elevator. This is not very different from the final test; it is in fact an early version of it. One more algorithm was discarded during this test, and only two algorithms were used for the final test.

Figure 2.2: Simulation steps of the final test.

The steps of the final test are shown in Figure 2.2. The central Tracking test also assesses velocity tracking performance, but additionally punishes poor damping. This test is used as the baseline for a number of other analyses. Three sensitivity analyses are performed, to assess the sensitivity of the algorithms to noise, disturbances and nonlinearities. These tests only consider the time-averaged part of the model. It is later assessed how well the algorithms perform if flapping is added. A low-pass filter is required for effective performance of the controllers. An intermediate step is to add only this filter, without including the flapping motion itself. The last analysis of this project is the Mixed LPV test. This test assesses the performance when flapping, nonlinearity and noise are combined. This test is performed on one global model, which has a Linear Parameter-Varying structure.

The steps of the project are reflected in the structure of this report. The scientific content starts with the most important contributions to the body of science. These are described in a paper, intended for presentation at a scientific conference. This paper is included in Chapter 3. The paper describes the gray blocks in Figure 2.2. The results of the final test and the development of the CAML algorithm are considered important. Some additional results are described in Chapter 4. The road towards these results is described afterwards.

3 Scientific paper

This chapter contains the scientific paper that was written as part of the thesis process. This paper should be readable as a standalone document. It is intended to be presented at a scientific conference.


Machine Learning for Flapping Wing Flight Control

Menno W. Goedhart∗, Erik-Jan van Kampen†, Sophie F. Armanini‡, Coen C. de Visser† and Qiping Chu§ Delft University of Technology, 2629 HS Delft, The Netherlands

Flight control of Flapping Wing Micro Air Vehicles is challenging, because of their complex dynamics and variability due to manufacturing inconsistencies. Machine Learning algorithms can be used to tackle these challenges. A Policy Gradient algorithm is used to tune the gains of a Proportional-Integral controller using Reinforcement Learning. A novel Classification Algorithm for Machine Learning control (CAML) is presented, which uses model identification and a neural network classifier to select from several predefined gain sets. The algorithms show comparable performance when considering variability only, but the Policy Gradient algorithm is more robust to noise, disturbances, nonlinearities and flapping motion. CAML seems to be promising for problems where no single gain set is available to stabilize the entire set of variable systems.

Nomenclature

c   Vertical speed, m/s
F   Force, N
f   Flapping frequency, Hz
g   Gravitational acceleration, m/s2
I   Moment of inertia, kg m2
J   Cumulative reward, -
m   Mass, kg
q   Pitch rate, rad/s
S   Score, -
U   Reward, -
u   Control input, -
u   Out-of-plane velocity, m/s
V   Horizontal speed, m/s
w   Axial velocity, m/s
x   State
α   Angle of attack, rad
δ   Deflection, rad
θ   Pitch angle, rad
φ   Nonlinearity, -

Subscript
d   Disturbance
e   Elevator
n   Noise

∗MSc Student, Control & Simulation Section, Faculty of Aerospace Engineering, Kluyverweg 1, Delft, The Netherlands. †Assistant Professor, Control & Simulation Section, Faculty of Aerospace Engineering, Kluyverweg 1, Delft, The Netherlands. AIAA Member. ‡PhD Student, Control & Simulation Section, Faculty of Aerospace Engineering, Kluyverweg 1, Delft, The Netherlands. AIAA Student Member. §Associate Professor, Control & Simulation Section, Faculty of Aerospace Engineering, Kluyverweg 1, Delft, The Netherlands. AIAA Member.


I. Introduction

The DelFly1 is a Micro Air Vehicle (MAV) that generates lift by flapping its wings. It has been capable of flight since 2005. Several authors have been able to develop automatic flight control systems for the DelFly by means of PID control.1–4 However, these controllers are all limited to specific flight conditions. Only for flight in a wind tunnel, more advanced control concepts have been applied.5 Due to the small size of the DelFly, manufacturing is very complicated, which causes variations between individual vehicles.6 In practice, it is found that the gains of the controllers need to be hand-tuned for each individual DelFly, which is time-consuming.6

The application of more advanced controllers has been limited by the unavailability of accurate aerodynamic models: the first linear model was published only in 2013.7 Research efforts have led to accurate models for specific flight conditions,8 but the identification of a global model until recently has been challenging.9 Adaptive or robust controllers are required to deal with the variability and nonlinearity of the DelFly.

Reinforcement Learning (RL) control tries to mimic the learning methods that humans and animals use.10 Reinforcement Learning is a Machine Learning (ML) technique where the intelligent system learns by trial and error. Historically, RL has been used to study the learning behavior of animals.11 From a control engineer's perspective, it provides a trial and error approach to optimal control, where the performance of a poor controller is improved continuously. The strength of Reinforcement Learning methods is that they can often deal with an inaccurate model of the controlled system. This makes RL suitable for problems where accurate models are not available.6 Reinforcement Learning has been applied to flapping wings to maximize lift,12–15 but has never been used for flight control of Flapping Wing MAVs.

This paper is a step towards a more intelligent controller for the DelFly, using Machine Learning. RL is employed to develop a Proportional-Integral (PI) controller that automatically tunes its gains after gathering flight experience.6, 16 Additionally, the Classification Algorithm for Machine Learning control (CAML) is proposed, a novel classification controller where a Neural Network (NN) is used to select the gains from a predefined set. This is compared to the performance of a conventional PI controller with fixed gains. Simulations are performed to assess the performance of these algorithms.

The following structure is used in this paper. Section II describes the problem of controlling the DelFly, which is used to compare the algorithms. The Policy Gradient (PG) algorithm is described in section III, and CAML in section IV. The results of the comparison are presented in section V. Section VI concludes this work. The appendix describes how the nonlinearity measure used in this research is defined.

II. Problem Description

This research is both a step towards a new controller for the DelFly, and a comparison of Machine Learning control algorithms. For both goals, it is essential to understand the control problem, which is described in this section. The system to be controlled, on which the models used in this research are based, is shown in figure 1. Several challenges were identified for the control of the DelFly. First of all, the variability between different individual vehicles calls for adaptive or robust control methods. Second, the dynamics of the DelFly are nonlinear. Third, the flapping motion of the wings influences control. Finally, noise and disturbances affect the achievable controller performance. This research focuses on automatic control: the vision-based control architecture is not used. As an intermediate step^a, a simulation of flight with feedback from an optical tracking system is performed. This research only considers control of the elevator. The throttle and rudder are not used in the simulations, because accurate models including these actuators are not available.

II.A. Dynamic Models of the DelFly

The dynamic models of the DelFly are the tools available to solve the challenges. Currently, these are mostly local models, which are valid in only one flight condition. The model structure8 considered in this paper splits the dynamics into a time-averaged part, which is a Linear Time-Invariant (LTI) model, and a time-varying part, which is a Fourier series approximating the forces due to the repetitive flapping motion.

^a The DelFly is equipped with a drift-sensitive Inertial Measurement Unit, and the optical tracking system provides drift-free state measurements. Such measurements could be provided outside the laboratory by the on-board camera or satellite navigation, but this is more complex.


Figure 1: Variant of the DelFly II, on which the simulations in this research are based.

These parts are used to match low-pass and high-pass filtered versions of flight data, respectively. The structure of the time-averaged model part^17 is shown in Eq. 1. The bias parameters b1, b2, and b3 are not part of the actual system dynamics, but are only used for model estimation.

$$
\begin{bmatrix} \Delta\dot{q} \\ \Delta\dot{u} \\ \Delta\dot{w} \\ \Delta\dot{\theta} \end{bmatrix}
=
\begin{bmatrix}
\frac{M_q}{I_{yy}} & \frac{M_u}{I_{yy}} & \frac{M_w}{I_{yy}} & 0 \\
\frac{X_q}{m} - w_0 & \frac{X_u}{m} & \frac{X_w}{m} & -g\cos\theta_0 \\
\frac{Z_q}{m} + u_0 & \frac{Z_u}{m} & \frac{Z_w}{m} & -g\sin\theta_0 \\
1 & 0 & 0 & 0
\end{bmatrix}
\begin{bmatrix} \Delta q \\ \Delta u \\ \Delta w \\ \Delta\theta \end{bmatrix}
+
\begin{bmatrix}
\frac{M_{\delta_e}}{I_{yy}} & b_1 \\
\frac{X_{\delta_e}}{m} & b_2 \\
\frac{Z_{\delta_e}}{m} & b_3 \\
0 & b_4
\end{bmatrix}
\begin{bmatrix} \Delta\delta_e \\ 1 \end{bmatrix}
\qquad (1)
$$

3 Fi(t) = [an sin(2πnft) + bn cos(2πnft)] (2) n=1 X The full model is obtained by simply adding the partial models. The time-varying forces are scaled by their respective inertia terms and added to the accelerations in Eq. 1. In this research, discrete-time models are used. A first order (Forward Euler) discretization is used, and the simulations are performed at 100 Hz. When flapping is introduced, however, the simulation time step is reduced to arrive at 500 Hz. The control algorithms are still run at 100 Hz to allow a fair comparison.

II.B. Control Objectives

The Reinforcement Learning algorithm can only learn if a single fixed task is repeated. The available models have only the elevator as an input, which is normally used for speed control3 during hover and slow forward flight. Therefore, the natural control goal for this research is an airspeed controller. Altitude is not yet controlled, because accurate models accounting for throttle variations are not available. The task to be performed by the airspeed controller is a 0.1 m/s increase in horizontal speed, when starting from a trimmed condition.


Figure 2: Structures of the controllers used in this research: (a) single loop PI controller (PI); (b) cascaded PI controller (2PI).

Proportional-Integral (PI) controllers are used for the underlying low-level controllers, because the integrating term facilitates trimming the DelFly. Two variants of the low-level controller are implemented, and shown in figure 2. The simplest, referred to as PI, takes as input the velocity tracking error, and processes this with a PI controller to calculate the required elevator deflection directly. The second variant uses two control loops. The velocity tracking error is fed to the outer PI controller, which specifies a pitch angle reference. The pitch angle tracking error is processed by the inner PI controller, which specifies the elevator deflection. This variant is referred to as cascaded PI, or 2PI.

Reinforcement Learning theory defines a reward function, which presents the controller with a reward on every time step. Within the framework of Markov Decision Processes, rewards can be based on the state and action taken, and may be probabilistic. For a velocity tracking controller, the most straightforward reward function is a quadratic punishment for deviations from the desired velocity. A punishment for control deflections may be added to limit the deflections. If the goal of the controller is to maximize the cumulative reward, optimal controllers can be determined.

Figure 3 shows the velocity V, pitch rate q, and elevator deflection δe for an example DelFly model with the optimal controller for the reward function U = -(V - V_ref)². It can be seen that the pitching motion is poorly damped. This seems to be due to the reward function, which only focuses on velocity deviations. If a penalty for q is added, better damping is obtained. The elevator deflection is sufficiently small, such that a punishment for control deflections is not required.

When flapping is introduced, there is a high-frequency vibration that cannot be fully suppressed by the controller. It was found that more favorable tracking is obtained when the controller is given knowledge of a filtered state. A fourth order Butterworth filter is used for this. Because this filter makes use of previous states, the control problem is not a Markov Decision Process. This may have implications when Reinforcement Learning methodology is applied to this system. Theoretical convergence analysis of the original RL algorithm in section III has only been performed16 on Markov Decision Processes.
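A minimal sketch of the single-loop PI law and of the reward function with pitch-rate penalty discussed above. The class interface, the gain values to be supplied, and the unit weight on the q-term (as in the legend of figure 3) are illustrative assumptions rather than the controllers used in this research.

```python
class PIController:
    """Single-loop PI airspeed controller (the 'PI' variant): velocity error -> elevator deflection."""

    def __init__(self, kp, ki, dt):
        self.kp, self.ki, self.dt = kp, ki, dt
        self.integral = 0.0

    def __call__(self, v_ref, v):
        error = v_ref - v
        self.integral += error * self.dt
        return self.kp * error + self.ki * self.integral   # commanded elevator deflection

def reward(v, v_ref, q, q_weight=1.0):
    """Quadratic punishment of the velocity error plus a pitch-rate penalty to improve damping."""
    return -(v - v_ref) ** 2 - q_weight * q ** 2
```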

II.C. Simulation Methodology

A series of simulations is performed to assess the performance of the algorithms on the DelFly. The goal is to assess the abilities of the algorithms to address the five challenges. The first test considers only the variability: the algorithms are tested on the 46 local linear models that are available, without including flapping or disturbances in the simulation. Noise is fixed to the expected level, because zero noise would be unrealistic. This tracking test is used as a baseline for a number of sensitivity analyses.

The sensitivity to measurement noise is investigated by scaling the noise in steps. The L2-norm of the vector of measurement uncertainties divided by the Root-Mean-Square (RMS) values of the states is used as a noise measure. Next, the sensitivity to disturbances (i.e., process noise) is addressed, by superposing a (Gaussian) random elevator deflection at every time step. The standard deviation of this deflection is varied. During the disturbance test, noise is again fixed, in order to allow a fair comparison with the baseline. A third sensitivity analysis considers the sensitivity to nonlinearities. An artificial nonlinearity, defined in the appendix, is introduced to fulfill this analysis.

The effect of the flapping motion is analyzed by starting from the variability test. The time-varying model part is superposed on the time-averaged part for each of the 46 models. The control algorithms need to be extended with the filter to deal with this.
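The noise measure described above can be written compactly as below. The exact form (each measurement uncertainty divided by the RMS value of the corresponding state before taking the L2-norm) is an interpretation of the wording, not a formula quoted from this paper:

$$
\eta_n = \left\lVert \begin{bmatrix} \dfrac{\sigma_{x_1}}{\mathrm{RMS}(x_1)} & \dfrac{\sigma_{x_2}}{\mathrm{RMS}(x_2)} & \cdots & \dfrac{\sigma_{x_k}}{\mathrm{RMS}(x_k)} \end{bmatrix} \right\rVert_2
$$

where σ_{x_i} denotes the measurement uncertainty of state x_i and k is the number of measured states.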



Figure 3: Effect of adding a pitch rate punishment on the optimal PI controller for an example system. The punishment improves damping, while still leading to effective tracking. Elevator deflections are sufficiently small in both cases.

A final analysis considers the recent global model of the DelFly. By introducing noise and flapping in the simulation, a test with this model can assess the performance of the control algorithms with respect to the challenges of noise, flapping and nonlinearity. Since there is only one such model available, assessing variability is not possible for the LPV model. Elevator disturbances are not introduced, since their expected magnitude is unknown, and the flapping motion is already a large disturbance. The algorithms are assessed on their robustness to variability and tracking accuracy. The robustness to variability, i.e., safety, is defined by the fraction of controllers that is successful, i.e., obtains a cumulative reward higher than -100 (which is the reward obtained without any control) during every episode. Tracking accuracy is determined by the mean cumulative reward over ten episodes of all successful controllers. A combined score is obtained by giving all unsuccessful controllers a cumulative reward of -100, and averaging over all controllers.
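As an illustration of the scoring just described, the sketch below computes the safety fraction, the accuracy of the successful controllers, and the combined score from a matrix of cumulative rewards. The array layout and function name are assumptions; only the -100 failure level and the averaging rules are taken from the text.

```python
import numpy as np

def assess(rewards, failure_level=-100.0):
    """Score a set of controllers from their per-episode cumulative rewards.

    rewards: array of shape (n_controllers, n_episodes). A controller counts as successful
    if every episode beats the no-control reward of -100.
    """
    rewards = np.asarray(rewards, dtype=float)
    successful = np.all(rewards > failure_level, axis=1)
    safety = successful.mean()                                              # fraction of successful controllers
    accuracy = rewards[successful].mean() if successful.any() else np.nan   # mean reward of successful ones
    capped = np.where(successful[:, None], rewards, failure_level)          # unsuccessful controllers -> -100
    combined = capped.mean()
    return safety, accuracy, combined
```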

III. Policy Gradient Algorithm

The first algorithm considered is a Reinforcement Learning algorithm of the actor-only category. This means that value functions, which are very common in RL, are not required. Instead, a parametrized actor is used to perform a repetitive task with a certain reward function. If the goal is to maximize cumulative reward, this is equivalent to a straightforward (stochastic) optimization problem, which can be solved using any optimization method. A gradient-based approach is used in this paper.

III.A. Using an inaccurate model

Determining the gradients is easier if a model is available during online control, because trial runs can be performed in simulation, while only the best settings are used in the real system. This reduces the number of policies that are evaluated online, and thereby reduces the probability of using a dangerous policy. However, the model has to be accurate for this to work correctly.

In this paper, an algorithm16 is used which uses an inaccurate model to suggest parameter changes, and uses the real system to obtain the cumulative reward. Algorithm 1^16 shows an implementation of this Policy Gradient (PG) method.


Algorithm 1: Actor-only approach using inaccurate models16

Input: T, with a locally optimal parameter setting ψ+
Output: ψ
1: repeat
2:   ψ ← ψ+
3:   Execute the policy with ψ in the real system, and record the state-action trajectory x(0), u(0), x(1), u(1), ..., x(n), u(n)
4:   for t = 0, 1, 2, ..., n − 1 do  {Construct a time-varying model f_t(x, u) from T̂(x, u) and the trajectory}
5:     f_t(x, u) ← T̂(x, u) + [x(t + 1) − T̂(x(t), u(t))]
6:   end for
7:   Use a local policy improvement algorithm on f_t(x, u) to find a policy improvement direction d
8:   Determine the new parameters using ψ+ ← ψ + αd. Perform a line search in the real system to find α.
9: until ψ is not improved by the line search

The real system dynamics are represented by Eq. 3. This is approximated by an inaccurate model T̂.

x(t + 1) = T (x(t), u(t)) (3)

This algorithm requires the construction of the time-varying model f_t(x, u). T̂(x, u) provides a time-invariant representation of the system, which will (in general) not match the experienced trajectory exactly. By including the time-varying bias term [x(t + 1) − T̂(x(t), u(t))] at every time step, f_t(x, u) will match the experienced trajectory. The gradient obtained from f_t(x, u) is then the gradient on the experienced trajectory.

For this algorithm, convergence to a local optimum in a Markov Decision Process can be proven if the local improvement algorithm is policy gradient.16 However, this is only applicable if the true dynamics T(x, u) are deterministic. Policy gradient algorithms similar to this one have been flight-proven using optical tracking systems,6, 18 which points towards suitability of the algorithm for the flight vehicle under investigation.
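A minimal sketch of the bias-corrected model construction of lines 4-6 of algorithm 1, assuming NumPy-compatible states and a one-step model callable; the names (`T_hat`, `xs`, `us`) are illustrative.

```python
import numpy as np

def build_time_varying_model(T_hat, xs, us):
    """f_t(x, u) = T_hat(x, u) + [x(t+1) - T_hat(x(t), u(t))]: exact on the recorded trajectory.

    xs: recorded states x(0)..x(n); us: recorded inputs u(0)..u(n-1); T_hat: approximate one-step model.
    """
    biases = [np.asarray(xs[t + 1]) - T_hat(xs[t], us[t]) for t in range(len(us))]

    def f_t(t, x, u):
        # the time-varying bias makes the model match the experienced trajectory at step t
        return T_hat(x, u) + biases[t]

    return f_t
```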

III.B. Variable learning rate

Performing a line search in the real system, as suggested in algorithm 1, results in a large number of policy evaluations. Furthermore, taking a step that is too large may result in an unstable controller. A new algorithm with a virtual line search procedure is proposed in this paper. This is presented in algorithm 2.

This algorithm avoids time-consuming line searches by using a variable step size. An optimal step size is found in the virtual system, and this step size is multiplied by a factor β for use in the real system. The learning rate β is decreased when the performance deteriorates. It has only one tunable parameter, which is the initial learning rate. In this research, the parameter step size is limited in order to increase safety.

When a flapping motion is present, the algorithm is adapted by consistently feeding the filtered state and reward to the controller. The unfiltered control deflection is used, because the deflection is based on the filtered state. The internal representation is left unchanged, i.e., the controller does not have the information that the state is filtered.

III.C. Controller behavior

An example of the behavior of the PG controller is shown in figure 4. Because of the actor-only approach used, the controller can only improve its gains once per episode. Nevertheless, it quickly reaches its optimal gains: the gains hardly change after four episodes. This controller is initialized with β = 0.8, and therefore gets close to the optimum after a single episode. The gains are fine-tuned during the later episodes. The plots of airspeed and pitch rate reveal that the rise time decreases slightly, at the expense of a lower damping. When looking at the elevator deflection, it is found that the deflection signal is slow. The controller is not actively counteracting the oscillations, but relies on the natural damping of the vehicle. A different controller structure, such as the cascaded PI controller, is required for active damping.


Algorithm 2: Policy Gradient algorithm used in this research. This is a variation of algorithm 1.

Input: T, with an initial parameter setting ψ+, and initial learning rate β
Output: ψ
1: Set J_ψ ← −∞
2: repeat
3:   Execute the policy with ψ+ in the real system, and record the state-action trajectory x(0), u(0), x(1), u(1), ..., x(n), u(n), with associated cumulative reward J_ψ+
4:   if ψ+ improves over ψ then
5:     ψ ← ψ+
6:     for t = 0, 1, 2, ..., n − 1 do  {Construct a time-varying model f_t(x, u) from T̂(x, u) and the trajectory}
7:       f_t(x, u) ← T̂(x, u) + [x(t + 1) − T̂(x(t), u(t))]
8:     end for
9:     Use a numerical algorithm on f_t(x, u) to find the gradient ∇_ψ J
10:    Determine new parameters using ψ− ← ψ + α ∇_ψ J. Perform a line search in f_t(x, u) to find the optimal α.
11:    Take a partial step towards ψ− using ψ+ ← ψ + βα ∇_ψ J
12:  else
13:    Halve the learning rate: β ← 0.5β
14:    Take a smaller step towards ψ− using ψ+ ← ψ + βα ∇_ψ J
15:  end if
16: until steps are sufficiently small
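The outer loop of algorithm 2 could be sketched as below, assuming the gradient computation and virtual line search of lines 6-10 are wrapped in a single `virtual_step` callback. The callback signatures, stopping tolerance and iteration cap are assumptions for illustration.

```python
import numpy as np

def policy_gradient(run_episode, virtual_step, psi_plus, beta, tol=1e-4, max_iter=50):
    """Variable-learning-rate Policy Gradient loop sketched after algorithm 2.

    run_episode(psi)              -> (J, trajectory): one episode on the real system.
    virtual_step(psi, trajectory) -> delta: the full virtual step alpha * grad_psi(J), found on the
                                            bias-corrected model f_t (algorithm 2, lines 6-10).
    """
    psi = np.array(psi_plus, dtype=float)
    J_best = -np.inf
    delta = np.zeros_like(psi)
    for _ in range(max_iter):
        J, trajectory = run_episode(psi_plus)
        if J > J_best:                       # candidate improves: accept it and compute a new direction
            psi, J_best = np.array(psi_plus, dtype=float), J
            delta = virtual_step(psi, trajectory)
        else:                                # no improvement: halve the learning rate
            beta *= 0.5
        psi_plus = psi + beta * delta        # partial step towards the virtual optimum
        if np.linalg.norm(beta * delta) < tol:
            break
    return psi
```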

IV. Classification Algorithm for Machine Learning Control

Because the Policy Gradient approach requires one full episode with the initial controller, it can never achieve a lower crash rate than a fixed PI controller. If greater tolerance to variability is required, the controller should be able to change its gains rapidly during an episode. Conventional adaptive Reinforcement Learning algorithms, such as Heuristic Dynamic Programming,10 are expected to be unable to adapt fast enough to recover an initially unstable controller.

The task of controlling an unknown DelFly draws parallels with the 2008 and 2009 RL Competitions involving helicopter hovering. In this competition, the task was to control a helicopter with unknown dynamics.19 The 2008 competition was won by a high-level controller selection approach to RL,19 which requires all of N predefined controllers to be attempted on the real system, and selects the best one. This comes with the risk of attempting an unstable controller and crashing the system. Only safe controllers can be used, which are able to stabilize most DelFly systems. It is expected that the variability between different DelFlys makes it difficult to use the controller selection algorithm effectively. A later algorithm for the RL Competitions uses a stochastic model in combination with Model Predictive Control.20 This leads to accurate control and allows rapid changes in control parameters, but is computationally prohibitive.

This paper proposes a new approach, which combines the strengths of the controller selection approach and the stochastic Model Predictive Control approach. The basis of this approach is that an identified model, even if it has a large uncertainty, can be used to predict the effectiveness of the predefined controllers. Instead of performing time-consuming simulations to optimize a policy, the model is used directly to select one of the predefined controllers. Figure 5 demonstrates the concept of this controller, named Classification Algorithm for Machine Learning control (CAML).

A recursive model identification algorithm is used to identify the model online. Also, the parameter (co)variance of the model is identified, in order to obtain a measure of the uncertainty in the model. The most straightforward approach with knowledge of the model and the parameter variance is to perform simulations with all available controllers for a wide range of models, whose distribution is defined by the identified mean model and covariance. However, performing these computations online is considered computationally prohibitive. Instead, a Neural Network (NN) is trained on a dataset of simulations that were performed offline.


Figure 4: Example behavior of the Policy Gradient controller with PI structure when tracking a 0.1 m/s airspeed step, over four episodes. The increasing cumulative reward per episode (J) shows that the gradient-based learning procedure typically results in increasing performance over time, with changes once per episode.

This NN takes the model and parameter variance as inputs^b, and outputs a prediction of the most suitable controller. This is a classification problem, where the combination of model and parameter variance must be allocated to one of N gain set categories. The softmax layer outputs N numbers, where each number represents the probability p_i that the input belongs to the corresponding gain set category. The gain set with the highest probability is selected on every time step.

CAML is intended to improve safety by allowing the gains to change rapidly. Upon a switch of gain set, the integral term is reset in such a way that the commanded elevator deflection remains constant. The CAML algorithm has a weak inherent protection against instability: if an unstable gain set is selected, the deviations from the trimmed state will become large. The inputs to the online model identification algorithm become large, so the identified model will change rapidly. If the new model is fed to the NN, it may select a different gain set. This mechanism does not result in guaranteed safety, but simulations show that it effectively improves safety.

Conventional gain scheduling methods predefine different linear controllers for a nonlinear system, where the state is used to switch between controllers. CAML can also be used for nonlinear systems, by identifying a linear approximation, on which the gain selection is based. However, CAML performs these computations during operation. This makes the algorithm suitable for unknown systems within the space of expected systems, whereas gain scheduling controllers are only suitable for the system they are designed for.
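One control step of CAML, following the concept of figure 5, might look as sketched below. The object interfaces (`identifier.update`, `controller.switch_gains`), the concatenation of parameters and variances as classifier input, and the state index used for the measured velocity are assumptions for illustration; `switch_gains` is assumed to reset the integral term so the commanded deflection stays constant, and to be a no-op when the gains do not change.

```python
import numpy as np

def caml_step(identifier, classifier, gain_table, controller, z, u_prev, v_ref):
    """One CAML control step: identify the model online, classify, select gains, then control.

    identifier.update(z, u_prev) -> (theta, var): model parameters and their variances.
    classifier(features)         -> softmax probabilities over the N predefined gain sets.
    gain_table[i]                -> (P, I) gains of gain set i.
    """
    theta, var = identifier.update(z, u_prev)
    probabilities = classifier(np.concatenate([theta, var]))  # model (and variance) fed to the classifier
    best = int(np.argmax(probabilities))                      # gain set with the highest probability
    controller.switch_gains(*gain_table[best])
    v_meas = z[1]                                             # measured forward velocity (assumed state index)
    return controller(v_ref, v_meas)                          # commanded elevator deflection
```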

IV.A. Predefined gain set generation

The predefined gains used by CAML are tuned by examining the 46 known models. For every model, an optimal PI gain set is determined by means of an offline optimization routine (gradient descent with multiple starts). These gains optimize the cumulative reward as defined in section II. The closed-loop stability of each of the resulting 46 PI controllers with its model is analyzed, and unstable gain sets are discarded. The resulting gain sets^c are shown in figure 6. For the selectable gain table, it is desired to have fewer than 46 options. A k-means clustering algorithm (k-means++21) is used to obtain a smaller representative set of gains. Figure 6 shows the resulting gain sets for the CAML controller for several k. The same process is followed for the cascaded PI controllers.

Figure 5: Principle of CAML. The parameters and variances of a model, identified online, are fed to a classification Neural Network, which selects the most appropriate gain set for the low-level controller.

Figure 6: Optimal PI controllers for the known models, with k-means selections for (a) k = 2, (b) k = 8, and (c) k = 15. The k-means clustering algorithm makes a smaller selection. It captures the outliers well, but cannot fully match the spread in the main cloud.

^b The computational complexity of CAML is dependent on the number of states. This research only considers four-state models. A formal analysis is required to find a practical limit to the number of states.
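A sketch of the gain-table construction with k-means++ as described above, using scikit-learn's `KMeans`. Whether the raw cluster centres are kept or snapped to the nearest optimized gain set is not stated in the text; the centres are returned here as an assumption.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_gain_table(optimal_gains, k):
    """Reduce the per-model optimal (P, I) gain pairs to k representative sets with k-means++.

    optimal_gains: array of shape (n_models, 2) holding one optimized, stability-checked gain pair per model.
    """
    km = KMeans(n_clusters=k, init="k-means++", n_init=10, random_state=0)
    km.fit(np.asarray(optimal_gains, dtype=float))
    return km.cluster_centers_          # the selectable gain table used by the classifier
```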

IV.B. Training the Neural Network

The classifier must be trained using a large number of examples. These examples are sets of model parameters, variances, and correct selections. The pairs of model parameters and variances are generated with the 46 available models. For every combination of models A and B, algorithm 3 is used. This yields a series of aerodynamic coefficients starting at model B, with suitable initial variance^d, and slowly converging to model A with lower variance. The result is a set of realistic combinations of model parameters and variances.

For a deterministic set of training models, the training targets are determined by simulating all N_c gain sets with all N_t models.

^c There are two models with inverted control effectiveness, which results in negative optimal gains. The model with a negative P-gain but positive I-gain has normal control effectiveness, but is very different from the remaining models.
^d The initial model variance is based on the variance in the aerodynamic coefficients of the 46 models.


Algorithm 3: Generation of pairs of model parameters and variances

Input: 3-2-1-1 response of DelFly A, with n data points x_A(t), u_A(t)
Output: An array V of pairs of model parameters and variances
1: Initialize a model M at the aerodynamic coefficients of DelFly B
2: for t = 0, 1, 2, ..., n − 1 do
3:   Update M using Recursive Least Squares, with data point x_A(t), u_A(t) → x_A(t + 1)
4:   Save the parameters and variances of M to W(t)
5: end for
6: Take random samples of W, resulting in a smaller array V

Figure 7: Error probability in the training data as a function of the number of random models N_mc. The best gain set when testing with N_mc = 100 is considered perfect. The low error rate of 0.009 if variance is disregarded indicates that a variance input is not necessary.

The twelve important aerodynamic coefficients are the training inputs^e, and the training targets are the indices of the most suitable controllers. Training is performed using scaled conjugate gradient training with early stopping (80% training data, 20% validation data). Five independent NNs are trained, and the best performing one in terms of error rate on the complete dataset is selected.

An additional step is required to account for a nonzero variance. For each training model, N_mc new, random models are created, by adding Gaussian random values (scaled by the corresponding variance) to each of the aerodynamic coefficients. The optimal controller is determined by averaging the performance of each gain set over the N_mc random models^f. In this case, the training input consists of twelve aerodynamic parameters and twelve variances. The variance input to the NN ensures that the controller can select a safe gain set when the uncertainty is high. However, it adds twelve input dimensions to the NN and makes training more time-consuming, by a factor N_mc.

The number of random models should be high enough to avoid incorrect gain selections in the training data. Figure 7 is used to analyze the appropriate value of N_mc. In this figure, it is assumed that selection is perfect with N_mc = 100^g, i.e., 100 random models related to each training model. This is repeated for 1000 training models, and the error probability in the training data with lower N_mc is analyzed. For comparison, it is found that the error probability is 0.009 if the variance is disregarded, and only the identified model is analyzed. In this particular analysis, at least 57 random models must be used to match this. The order of magnitude of the error rate achievable by the NNs on such datasets is 0.05. It is therefore decided that the variance input does not contribute effectively, and only NNs without variance input are considered further in this research.

^e A complication is that the A-matrices of the models include both aerodynamic coefficients, which are inputs of the NN, and kinematic terms. The training simulations are performed using models with different aerodynamic coefficients, but equal kinematic terms, corresponding to a trimmed airspeed of approximately 0.8 m/s.
^f If a gain set is unstable for a random model, a fixed penalty is given, in order to avoid excessive punishment of unstable models.
^g This makes this analysis unreliable for N_mc close to 100.
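The Monte Carlo labelling of a single training model could be sketched as below. The `simulate_reward` callback (assumed to apply the fixed penalty of footnote f internally) and the interpretation of "scaled by the corresponding variance" as a standard deviation are assumptions for illustration.

```python
import numpy as np

def label_training_model(theta, variances, gain_table, simulate_reward, n_mc=100, rng=None):
    """Pick the training target (best gain-set index) for one training model with uncertainty.

    For each of n_mc random models drawn around theta (spread set by the identified variances),
    every gain set is simulated; the gain set with the best average reward becomes the target.
    """
    rng = np.random.default_rng(rng)
    theta = np.asarray(theta, dtype=float)
    totals = np.zeros(len(gain_table))
    for _ in range(n_mc):
        random_model = theta + rng.standard_normal(theta.shape) * np.sqrt(variances)
        for i, gains in enumerate(gain_table):
            totals[i] += simulate_reward(random_model, gains)   # unstable cases penalized inside
    return int(np.argmax(totals))
```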


The training models with variance are identified by letting Recursive Least Squares converge from one model to another. This is a rather limited procedure of random model generation, because the resulting models will always be close to the 46 known models. A more general procedure is to generate random training models based on the means and variances of the aerodynamic coefficients of the 46 known models. For each coefficient, the mean and variance among the 46 models is determined. An independent Gaussian distribution for every parameter is assumed to generate new models. This is performed after repeatedly removing outlier models,^h which have an aerodynamic coefficient varying more than 3σ from the mean. The resulting training dataset covers a larger space of possible models, but NN training is more difficult, with resulting error rates in the order of 0.15. It is found that both methods of training model generation result in similar performance of the CAML controller when tested on the 46 known models.
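This second generator amounts to an outlier rejection loop followed by independent Gaussian sampling. A minimal sketch is given below; the (46, 12) coefficient array shape and the function name are assumptions.

```python
import numpy as np

def generate_random_models(known_coeffs, n_models=40000, sigma_cut=3.0, rng=None):
    """Hedged sketch: repeatedly remove outlier models, then draw new models
    from independent Gaussians fitted to the remaining coefficients.
    known_coeffs : (46, 12) array of the important aerodynamic coefficients."""
    rng = np.random.default_rng() if rng is None else rng
    coeffs = known_coeffs.copy()
    # Repeatedly remove models with any coefficient more than 3 sigma from the mean
    while True:
        mu, sigma = coeffs.mean(axis=0), coeffs.std(axis=0)
        keep = np.all(np.abs(coeffs - mu) <= sigma_cut * sigma, axis=1)
        if keep.all():
            break
        coeffs = coeffs[keep]
    # Independent Gaussian per coefficient
    mu, sigma = coeffs.mean(axis=0), coeffs.std(axis=0)
    return mu + rng.standard_normal((n_models, coeffs.shape[1])) * sigma
```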

IV.C. Model Identification

This method is highly dependent on the identification of a model of the dynamics of the DelFly. This section describes how this model is identified during operation. The model structure in Eq. 1 is used. For this application, it is more convenient to use the discrete-time model in Eq. 4, which is obtained by means of a first order (Forward Euler) discretization. Note that several matrix elements are set to zero, and some elements are dependent on the initial state only. Also, the last column of the input matrix does not contribute to the dynamics, but only to the trimmed condition. This results in twelve important aerodynamic parameters to be identified.

$$
\begin{bmatrix} q(t+1) \\ u(t+1) \\ w(t+1) \\ \theta(t+1) \end{bmatrix}
=
\begin{bmatrix}
a_{11} & a_{12} & a_{13} & 0 \\
a_{21} - w_0 \Delta t & a_{22} & a_{23} & -g \cos\theta_0 \, \Delta t \\
a_{31} + u_0 \Delta t & a_{32} & a_{33} & -g \sin\theta_0 \, \Delta t \\
\Delta t & 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix} q(t) \\ u(t) \\ w(t) \\ \theta(t) \end{bmatrix}
+
\begin{bmatrix}
b_{11} & b_{12} \\
b_{21} & b_{22} \\
b_{31} & b_{32} \\
0 & b_{42}
\end{bmatrix}
\begin{bmatrix} \delta_e(t) \\ 1 \end{bmatrix}
\tag{4}
$$

The online model identification problem is to determine the parameters of the matrices A and B from flight data, such that the model error ê is minimized over time. This is done separately for every row of these matrices. As an example, Eq. 5 shows the (noise-free) identification problem for the pitch rate row.
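For reference, the structure of Eq. 4 can be assembled as in the sketch below. The dictionary keys and the function name are illustrative, and the signs of the gravity terms follow the reconstruction of Eq. 4 above, which is an assumption based on standard longitudinal flight dynamics.

```python
import numpy as np

def build_discrete_model(aero, dt, u0, w0, theta0, g=9.81):
    """Minimal sketch of Eq. 4: assemble the discrete A and B matrices from the
    twelve identified aerodynamic parameters and the fixed kinematic terms."""
    A = np.array([
        [aero['a11'],           aero['a12'], aero['a13'], 0.0],
        [aero['a21'] - w0 * dt, aero['a22'], aero['a23'], -g * np.cos(theta0) * dt],
        [aero['a31'] + u0 * dt, aero['a32'], aero['a33'], -g * np.sin(theta0) * dt],
        [dt,                    0.0,         0.0,         1.0],
    ])
    B = np.array([
        [aero['b11'], aero['b12']],
        [aero['b21'], aero['b22']],
        [aero['b31'], aero['b32']],
        [0.0,         aero['b42']],
    ])
    return A, B

# x(t+1) = A @ x(t) + B @ [delta_e(t), 1], with state x = [q, u, w, theta]
```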

$$
q(t+1) = a_q(t)\, w_q + \hat{e}_q(t)
= \begin{bmatrix} q(t) & u(t) & w(t) & \delta_e(t) & 1 \end{bmatrix}
\begin{bmatrix} a_{11} & a_{12} & a_{13} & b_{11} & b_{12} \end{bmatrix}^T + \hat{e}_q(t)
\tag{5}
$$

An online model identification algorithm selects the weight vectors w such that the model errors ê are minimized over time. The most straightforward method of online model identification is the Recursive Least Squares (RLS) algorithm.^22 This algorithm recursively solves for the parameters w in order to minimize measurement error e in problems of the form of Eq. 6. If f(y, u) is linear, this problem is a special case of the auto-regressive moving-average model with exogenous input (ARMAX), which is shown in Eq. 7.

$$ y(t) = \left[ f(y(t-1), u(t-1)) \right]^T w + e(t) \tag{6} $$
$$ y(t) = \begin{bmatrix} y(t-1) & \cdots & y(t-n_a) & u(t-1) & \cdots & u(t-n_b) & e(t-1) & \cdots & e(t-n_c) \end{bmatrix} w \tag{7} $$

The model identification problem can be seen as a vector form of the ARMAX problem, where the state vector x(t) has to be predicted by the model. If there is a measurement noise vector E(t), this problem can be written as in Eq. 8, where the measured state is denoted by z(t) = x(t) + E(t).

$$ x(t) = A\, x(t-1) + B\, u(t-1) $$
$$ z(t) = A\, z(t-1) + B\, u(t-1) - A\, E(t-1) + E(t) \tag{8} $$

Because there is measurement noise at both t and t−1, the underlying assumptions of the RLS estimator are violated. If RLS is used anyway (implicitly assuming that A E(t−1) = 0), the result will be inaccurate.

^h This is required because these outlier models cause unrealistically large variations among the newly generated random models. A disadvantage is that the trained classifier will be unable to perform accurate predictions for the outlier models.


The Recursive Maximum Likelihood algorithm (RML1),^{22, 23} also called Extended Least Squares, is an option to solve this problem. This algorithm extends the regressor in RLS with the previous measurement error e(t−1). Since this measurement error is not known, the prediction error ê(t−1) is used instead. It is straightforward to extend this to the vector case.

Due to the structure of Eq. 8, the correct weights on Ê(t−1) and z(t−1) are directly related, and can be constrained. Since Ê = z − x̂, this is equivalent to using the predicted previous state x̂(t−1) instead of x(t−1) in the conventional RLS estimator. The resulting recursive algorithm is shown in Eqs. 9 to 13, where the regressor a_i(x, u) is an affine vector function of the state and input (like a_q(t) in Eq. 5).

$$ r_i(t) = z_i(t) - w_i^T(t-1)\, a_i(\hat{x}(t-1), u(t-1)) \tag{9} $$
$$ L_i(t) = \frac{P_i(t-1)\, a_i(\hat{x}(t-1), u(t-1))}{a_i^T(\hat{x}(t-1), u(t-1))\, P_i(t-1)\, a_i(\hat{x}(t-1), u(t-1)) + \lambda} \tag{10} $$
$$ w_i(t) = w_i(t-1) + L_i(t)\, r_i(t) \tag{11} $$
$$ P_i(t) = \frac{1}{\lambda} \left[ P_i(t-1) - L_i(t)\, a_i^T(\hat{x}(t-1), u(t-1))\, P_i(t-1) \right] \tag{12} $$
$$ \hat{x}_i(t) = w_i^T(t)\, a_i(\hat{x}(t-1), u(t-1)) \tag{13} $$

Identification of the time-averaged model part was originally done by filtering with a fourth order Butterworth filter.^8 The non-causal zero-phase variant of this filter was used,^8 but this is not possible during online identification. This paper uses the causal variant of the Butterworth filter, which adds a phase shift, affecting model identification. Implementing the filter in the RML1 method is non-trivial, because the previous state x̂(t−1) is internal, while the current state measurement z(t) is external. It is found that the conventional RLS algorithm is more resistant to this filtering effect than RML1. In this research, RML1 is used for all simulations without filtering, whereas RLS is used for simulations with filtering.
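A minimal sketch of the recursion in Eqs. 9 to 13 is given below, assuming a forgetting factor λ and an affine regressor built from the full predicted state; the class name and initialization values are illustrative, and the zero entries of Eq. 4 are not enforced for brevity.

```python
import numpy as np

class PredictedStateRLS:
    """Hedged sketch of the estimator in Eqs. 9-13: a standard RLS update in
    which the regressor uses the predicted previous state x_hat(t-1) instead of
    the noisy measurement z(t-1)."""

    def __init__(self, n_states, n_inputs, lam=0.999, p0=100.0):
        self.n_reg = n_states + n_inputs + 1          # affine regressor [x; u; 1]
        self.W = np.zeros((n_states, self.n_reg))     # one weight row per state
        self.P = np.stack([p0 * np.eye(self.n_reg) for _ in range(n_states)])
        self.lam = lam
        self.x_hat = None

    def update(self, z, u):
        if self.x_hat is None:                        # initialize prediction
            self.x_hat = np.asarray(z, dtype=float).copy()
            return self.x_hat
        a = np.concatenate([self.x_hat, u, [1.0]])    # regressor a_i(x_hat, u)
        for i in range(self.W.shape[0]):
            r = z[i] - self.W[i] @ a                  # Eq. 9
            L = self.P[i] @ a / (a @ self.P[i] @ a + self.lam)            # Eq. 10
            self.W[i] += L * r                        # Eq. 11
            self.P[i] = (self.P[i] - np.outer(L, a) @ self.P[i]) / self.lam  # Eq. 12
        self.x_hat = self.W @ a                       # Eq. 13
        return self.x_hat
```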

IV.D. Algorithm parameter selection

There are three hyperparameters governing the behavior of the classification controller: the number of selectable gain sets, the number of neurons in the hidden layer, and the number of training models. The optima of these parameters are clearly interrelated. Also, it needs to be considered that the gain sets and training models are based on the models that are also used for testing. This will give problems if the previously mentioned numbers are set too high. Due to the time-consuming nature of the computations, a three-dimensional optimization on a fine grid is infeasible. Rather than optimizing on a coarser grid, it is opted to perform manual optimization, using engineering judgment. A sensitivity analysis on the selected near-optimal point is performed for every parameter individually. The baseline tracking test is used for tuning the parameters. All 46 models are used for optimization, because the number of models will also influence the optimal settings. Later, it is tested if splitting the data into a training set and a test set gives significantly different results.

IV.D.1. Number of gain sets

The number of gain sets in the controller should be high enough to cover the space of desirable gain combinations. A higher number should lead to higher possible performance. If the number becomes too large, however, the neural network will have problems making a correct selection, and the performance will decrease. While holding the numbers of neurons and training models fixed, figure 8 shows the performance for different numbers of gain sets.

The figure does not show the expected trend of performance with the number of gain sets. This is a first indication that CAML is not working optimally on this problem. Figure 6 provides an explanation for this behavior. If only two gain sets are used, CAML nearly always picks the gain in the middle of the main cloud in figure 6a. When the number of gain sets is increased, more outliers are captured, but the main cloud is not fully covered. It is found that fixed PI controllers with gains near the center of the cloud all lead to similar scores. Thus, increasing k will lead to more outliers in the gain sets, and more comparable gains. The latter consequence will tend to increase the score, but the former will increase the risk of incorrect gain selection. It is expected that higher performance would be found if the gain sets were spread more evenly over the main cloud.


Figure 8: Tuning of the number of gain sets. Unexpected drops in performance are found for some numbers. Fifteen controllers is regarded as near-optimal for both single and double loop PI controllers.

Figure 9: Tuning of the number of neurons. Overfitting does not seem to be a problem. A number of 100 neurons is considered sufficient.

IV.D.2. Number of neurons

The number of neurons must be large enough to capture the complexity in the training data, but Neural Networks that are too large suffer from overfitting, and require longer computation times. The optimal number of neurons with respect to overfitting is analyzed by generating a new set of 10000 testing models. Neural networks of various sizes are trained on the training data, and tested on the test data. This is repeated ten times, as shown in figure 9. It can be seen that the error rate is steadily decreasing for the numbers of neurons considered. Since the test is performed on a separate dataset, this is an indication that overfitting is not yet problematic for these numbers of neurons. The error rates appear to stabilize at 100 neurons. Increasing the number of neurons further than 100 is not worth the computational time, so a number of 100 neurons is selected for both single and double loop controllers.

IV.D.3. Number of training models

Increasing the number of training models will always result in increased performance, because more training data allows larger neural networks without overfitting, and thereby a more powerful classifier. However, the (offline) computational time limits the number of training models that is feasible in this research. Figure 10 analyzes the error rates on the 10000 testing models for different numbers of training models. As expected, the error rate keeps decreasing for larger numbers of training models. For high numbers, the error rate seems to stabilize. It is decided that 40000 models is sufficient for the purposes of this research.


Figure 10: Tuning of the number of training models. It is decided that 40000 models is sufficient.

Figure 11: Variability results if 32 of the models are used for training, and 14 for testing, repeated ten times with random divisions. The controller trained and tested on the full dataset does not appear to be overly optimistic.

IV.D.4. Internal validation

It is common to split the data into training data and test data when testing tuned controllers, but this would limit the power of the classification controller. The allowable numbers of gain sets and neurons would have to be decreased. In this research, the tuning and testing are both performed on the full dataset. In an additional analysis, the data is split up several times, to investigate if the results with the full dataset are overly optimistic. In this additional analysis, 70% (32) of the 46 models are used for tuning and training, and the remaining 14 are used for testing. The numbers of gain sets, neurons, and training models are left at their previously optimal values. The models in the training set are used to generate 40000 random models (disregarding the same outliers as before) on which new neural networks are trained, and to select appropriate gain sets. This is repeated ten times, with new divisions into training and test data. The resulting scores are shown in figure 11. It is found that the controller trained and tested on the same models is not unreasonably optimistic. There is no reason to assume that using the same models for training and testing is causing any problems.
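The repeated 32/14 split can be expressed as a short loop; evaluate is an assumed helper that regenerates the gain sets, retrains the classifier, and returns the test score S.

```python
import numpy as np

def repeated_validation(models, evaluate, n_repeats=10, train_fraction=0.7, rng=None):
    """Illustrative sketch of the internal validation: split the 46 models into
    a 32-model training set and a 14-model test set, retrain, and repeat with
    new random divisions."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(models)
    n_train = int(round(train_fraction * n))
    scores = []
    for _ in range(n_repeats):
        order = rng.permutation(n)
        train = [models[i] for i in order[:n_train]]
        test = [models[i] for i in order[n_train:]]
        scores.append(evaluate(train, test))   # assumed helper: retrain and score
    return np.array(scores)
```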

IV.E. Controller behavior

Figure 12 shows an example of the behavior of the CAML controller with a PI structure. The example system is the same as in figure 4, and the figures can be compared directly. During the first seconds, the model, which is at the input of the NN, is uncertain and changes rapidly. This causes fast variations in the outputs of the NN, which are the softmax probabilities p_i for each gain set; this is reflected by the varying gains. The number of gain switches decreases during later episodes, due to the slower changes in the model. Gain switches are concentrated in the first seconds of every episode, because the excitation of the system is largest in these periods. A disadvantage of the CAML algorithm is that the performance does not necessarily improve over time.


Figure 12: Example behavior of the CAML controller with PI structure when tracking a 0.1 m/s airspeed step, over four episodes. The softmax output probabilities (p_i) of each gain set, and therefore the gains themselves, change rapidly, especially during the beginning of each episode. Changes occur less often during later episodes. Performance, measured with J, does not necessarily improve over time.

The identified model improves, but the NN classifier has a nonzero error rate, and can therefore select an inappropriate gain set even if the perfect model is known. In spite of this limitation, CAML has a tendency to improve over time. Note that the gains selected during the final episodes in figure 12 are close to the optimal gains found by the Policy Gradient controller in figure 4.

V. Results and Discussion

This section treats the results of the test problems mentioned in section II. Results are shown for the Policy Gradient algorithm and for CAML. Additionally, results for a fixed PI controller are shown. All gain sets from the CAML database are attempted in a fixed PI controller, and the best performing set is shown.

V.A. Baseline tracking test

Figure 13 shows the performance of the algorithms during the baseline tracking test, where the goal of the controller is to track a 0.1 m/s velocity step on all of the 46 available models. The score S is defined in Eq. 14. Wilcoxon's signed rank test^24 is used to assess the statistical significance of the differences in tracking accuracy,^i rather than score. This test works in a pairwise setting; the p-values of several pairs are shown in figure 13. The difference is considered significant if p < 0.05.


Figure 13: Results of the baseline tracking test. The statistical significance is shown for some differences in tracking accuracy. The Machine Learning algorithms perform better overall.

$$ S = -100 \cdot N_\mathrm{unstable} + \sum_{i=1}^{N_\mathrm{stable}} J_i, \qquad J = \sum_t U(t) = -\sum_t \left[ \left( V(t) - 0.1 \right)^2 + q(t)^2 \right] \tag{14} $$

It can be seen that all algorithms manage to stabilize at least 41 out of 46 models. When inspecting the single loop PI variants of the ML algorithms, the score of the PG algorithm is slightly higher, but the increase in tracking accuracy is not significant. Both algorithms outperform the fixed PI controller in terms of tracking accuracy. The initial gains of the PG algorithm are the same as the gains of the fixed PI controller. The fact that the success rates are equal shows that the variable learning rate procedure is effective in avoiding instability.

The single loop variant of CAML has a lower success rate than the two other approaches. Further inspection shows that four of the 46 linear DelFly models cannot be controlled successfully by a single loop PI controller of the given structure. Furthermore, there is a universal set of gains which results in success for the 42 remaining models. Using this gain set as the initial setting allows the PG algorithm to successfully control those 42 models. CAML cannot improve safety with its rapid gain changes, because changes from the universal gain set will not allow control of the four 'uncontrollable' models. It therefore only experiences its disadvantage, which is an occasionally incorrect gain selection. This results in instability of one of the 42 controllable models.

For cascaded PI controllers, the performance of both ML algorithms looks comparable to the performance of the fixed controller in figure 13. However, the tracking accuracy of the fixed controller is actually significantly worse, due to the distribution of the outliers in the figure. Cascaded PI controllers lead to higher performance than single loop PI controllers, for all algorithms considered. All algorithms allow control of 42 out of 46 models. Just as with the single loop variant, there is a universal gain set for those 42 models. Four models (the same four as before) cannot be controlled successfully by cascaded PI controllers.
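For clarity, Eq. 14 amounts to the following computation; the function names are illustrative, and it is assumed that V and q are sampled at the controller rate.

```python
import numpy as np

def episode_reward(V, q, v_ref=0.1):
    """Reward J of Eq. 14 for one episode: negative sum of squared velocity
    tracking error and squared pitch rate."""
    return -np.sum((np.asarray(V) - v_ref) ** 2 + np.asarray(q) ** 2)

def overall_score(rewards, n_unstable):
    """Score S of Eq. 14: a fixed -100 penalty per unstable model plus the
    summed rewards of the stable models."""
    return -100.0 * n_unstable + float(np.sum(rewards))
```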

V.B. Noise

The robustness of the algorithms to measurement noise on the states is presented in figure 14. Overall, it seems that increasing noise does not lead to a dramatic decrease in performance. The expected noise level of the DelFly does not lead to problems.

CAML is more sensitive to noise than the PG algorithm. This can be explained by looking at the way in which noise affects the algorithms. In the PG algorithm, measurement noise changes the time-varying part of the internal model, leading to a disturbance signal in the virtual gradient simulation. Since this disturbance is random, its effect is diminished by using longer measurement times.

^i This test assumes a null hypothesis that all data comes from the same type of continuous distribution, which is clearly false if unstable models are included. All models with one or more unstable controllers are therefore disregarded in this statistical test. This means that the test only considers tracking accuracy, not the combined score S.


Figure 14: Sensitivity of the algorithms to noise. The PG algorithm is more robust to noise, with the single loop variant performing better.

For CAML, noise gives problems with model identification. As explained in section IV, Recursive Least Squares identification cannot deal with noisy measurements of the current state. The RML1 algorithm solves this problem to some extent, but increased noise will still affect the identified model.

Another noteworthy conclusion is that the algorithms with cascaded PI controllers are more sensitive to measurement noise than the algorithms with single loop PI controllers. This seems to be due to the low-level control, because the fixed PI controllers show the same effect. Because cascaded controllers make use of two states, the noise on both states enters the PI controller, lowering the accuracy.

V.C. Disturbances

An unknown disturbance on the elevator will strongly affect the performance of the low-level PI controllers. Also, the algorithms will perform suboptimally. It is expected that the algorithms can deal with disturbances more easily than with noise, because the disturbance actually changes the state. The PG algorithm will sample the disturbance on each step, and will find the correct gradient for a simulation with this disturbance sequence. The model identification algorithm in CAML can deal with disturbances directly, since they result in an actual state change.

Figure 15 shows the performance of the algorithms with a varying level of disturbances. The robustness of the PG algorithm is found to be stronger than the robustness of CAML, because the algorithm can actively deal with the disturbances by moving to a more appropriate gain setting. CAML lacks this option. In this test, it is found that cascaded PI controllers result in better scores than single loop PI controllers. This makes sense, because the double loop controllers can counteract disturbances on two states. An offset in pitch angle will eventually lead to a speed change (which is punished), and only the double loop controllers can counteract this.

V.D. Nonlinearity

When nonlinearities are introduced, the linear controllers will become less effective. Figure 16 shows that the nonlinearities considered are mild enough to allow accurate control with fixed PI controllers. The nonlinearity metric φ is defined in the appendix. This analysis shows that the PG algorithm is more robust to nonlinearities than CAML. This was expected, because CAML requires identification of a linear model, which may be inaccurate on a nonlinear system. For the Policy Gradient approach, the nonlinearity is dealt with like a linear model deviation.

Double loop PI controllers are able to achieve higher performance than single loop controllers. The explanation for this is that cascaded controllers limit oscillations of the pitch angle, and thereby keep the system closer to its trimmed point. This limits the effect of the nonlinearity.


Figure 15: Sensitivity of the algorithms to disturbances. The PG algorithm is more robust to disturbances, with the double loop variant performing better.

Figure 16: Sensitivity of the algorithms to nonlinearity (defined in the appendix). The PG algorithm is more robust to nonlinearity, with the double loop variant performing better.

It is noticed that with a fixed cascaded PI controller, the score increases for high nonlinearity. This is an artifact of the specific nonlinearity that is introduced. If the linearized systems are compared to the systems linearized at a 0.1 m/s higher airspeed, it is found that the higher airspeed typically increases stability. Likewise, a lower airspeed often results in instability: if a -0.1 m/s step is taken, the nonlinear systems become more difficult to control.

V.E. Flapping

When a flapping motion is introduced, the performance of all algorithms suffers, as shown in figure 17. This figure also shows the performance during a simulation where the filter is installed, but flapping is not added. The flapping can be considered a disturbance, which cannot be filtered away completely. This affects the low-level PI controllers, as well as the learning processes. The time-varying model of the PG algorithm will be altered, which has an effect on the gradient. CAML suffers by identifying an incorrect model, leading to an incorrect gain selection. The effect of flapping on CAML seems to be significantly stronger. The PG algorithm has a tracking accuracy comparable to a fixed PI controller.


Figure 17: Results of the flapping test, compared to a test where the states are filtered, but without flapping. The statistical significance is shown for some differences in tracking accuracy. CAML performs worse than the other algorithms if flapping is present. Cascaded intelligent PI controllers perform worse with flapping.

For both algorithms, the performance of the cascaded controllers is degraded severely compared to the non-flapping simulations, which seems to conflict with the high performance of the fixed cascaded PI controller. A closer look at the individual fixed PI controllers in figure 18 provides an explanation. For the single loop PI controllers in figure 18a, the performance of each controller degrades to a comparable extent. The cascaded PI controllers in figure 18b, however, show a varying level of degradation. Most of the gain sets show large deterioration of performance, but some are robust. Because the algorithms do not account for flapping directly, they cannot anticipate this, resulting in incorrect gain selections. This is especially true for CAML. The fact that the performance with filter, but without flapping is also low, proves that this is due to the filtering process. The cascaded PI controllers, operating at higher frequencies, are more sensitive to this. Figure 19 shows the Bode diagram of the inner open loop (pitch angle control) with gain sets 1 or 9. Because the phase margin with gain set 1 is very small, it will become unstable if a phase lag is introduced by the filter.

V.F. Mixed test

In the final test, the noise, nonlinearity, and flapping are considered at the same time. Note that this test does not consider 46 different models, but only one model at 46 trim points. The results shown in figure 20 are therefore not directly comparable to the previous simulations, which also accounted for variability. The single loop PG algorithm performs significantly better than CAML on the mixed test. CAML is affected more by both nonlinearity and flapping, resulting in lower performance. The difference between the PG algorithm and a fixed PI controller is not significant. The cascaded versions of the algorithms are both showing lower performance. This is consistent with the findings of the flapping tests. It is therefore expected that this is caused by the filtering effect. It is again found that this effect is larger for CAML.


Figure 18: Performance of the individual fixed PI controllers on the variability and flapping tests, and with filter only. (a) Single loop PI; (b) Cascaded PI. Some of the cascaded PI controllers are very sensitive to the filtering, but the flapping itself has little influence.

Figure 19: Bode diagrams of the inner loop of an example DelFly with filter-sensitive gain set 1 and insensitive gain set 9. (a) Gain set 1 (sensitive); (b) Gain set 9 (insensitive). The small phase margin with gain set 1 becomes problematic if the filter is added.


Figure 20: Results of the mixed test, where only one model is considered. CAML performs worse than the other algorithms.


VI. Conclusion and Recommendations

The DelFly is a difficult vehicle to control because of its flapping dynamics and nonlinearity. Also, there is variability between different individual vehicles, measurements are corrupted by noise, and disturbances change the state. Two Machine Learning control algorithms were used to tackle these challenges. First, a Policy Gradient algorithm of Reinforcement Learning is used, where the gradient is based on an inaccurate model. This approach is extended with a new variable learning rate technique in order to improve safety. Second, CAML is proposed as a new algorithm to tackle such problems. This approach identifies a model, which is used to select the most appropriate PI gain set with a NN classifier. The selectable gain sets are predefined. This algorithm can change the gains rapidly during flight, which may improve safety. It is shown that identification of a stochastic model is not necessary. The RML1 algorithm is used to lower the effect of state noise during model identification, but this only works without flapping motion.

A baseline test, with 46 DelFly models, a fixed noise level, and no flapping dynamics, shows that the Policy Gradient approach results in more accurate tracking than a fixed PI controller. The PG algorithm has a tracking accuracy comparable to CAML, but the latter results in a lower safety on the test problem. This is because there is a universal gain set, which is able to control 42 out of 46 models. The remaining four models cannot be controlled by PI controllers of the selected structure, which means that CAML is also unable to control these, and cannot obtain a safety increase by its rapid gain changes. The PG algorithm is recommended for a problem with a universal gain set. It is recommended to test CAML on a problem without a universal gain set, and to ensure that all models can be controlled by at least one of the gain sets.

The PG algorithm has a similar robustness to noise as the predefined controller. CAML is more sensitive to noise, because noise complicates model identification. Single loop CAML is more sensitive to disturbances than the other algorithms, and shows a fast decrease in performance. Single loop CAML is therefore not recommended for problems with high disturbance levels. The other algorithms perform comparably.

When flapping is introduced, single loop CAML performs slightly worse than a fixed PI controller, while the PG algorithm obtains a performance similar to a fixed controller. A low-pass filter is required, such that control occurs only at lower frequencies. This results in problems with the cascaded ML algorithms, because many combinations of models and gain sets have a low phase margin and become unstable due to the phase lag of the filter. It is recommended to test other types of filters, in order to improve real-time control. The Policy Gradient approach could be improved by explicitly using the filter to select the gradient direction. If unfiltered states are sampled, and the filter is applied afterwards, one can find a gradient direction accounting for filtering. CAML can be improved by accounting for filtering in the gain set generation, such that the selection takes into account filtering effects.

On a mixed test involving flapping, nonlinearity and noise, it is found that the PG algorithm performs as well as a fixed PI controller. CAML cannot match the performance of a fixed PI controller. Due to the filter, cascaded ML controllers show low performance.
The filtering method should be improved for effective use of cascaded controllers. The CAML algorithm may be improved by letting the experienced reward influence selection. It was found that model identification is difficult in the presence of filtering. It is possible to improve the model after each episode, by storing the states and applying the zero-phase filter offline. The selection process itself may be improved by testing other function approximators, e.g., deep NNs. The computational load may be alleviated by evaluating the classifier at a lower frequency than the low-level PI controller. A thorough analysis of computational time is recommended for future research.

A new algorithm that follows naturally from this research combines the CAML and PG algorithms sequentially. One may imagine a controller that uses CAML for the first instances, until the gain set remains constant for some time. Then, the gains can be fine-tuned using the PG algorithm. Such a method may improve both safety and tracking accuracy. Research into such a method is therefore recommended.

This research was limited to single or double loop PI controllers. However, literature has demonstrated the use of pitch (rate) dampers^3 and dynamic inversion-based concepts.^5 It should be investigated whether the ML algorithms can be implemented for such controllers as well. Also, this paper has only treated elevator control. New models are required to investigate the control of the flapping frequency and rudder.

Proving the effectiveness of the algorithms on simulation models is limited by the validity of the models of the dynamics, noise and disturbances. A final controller should be subjected to a physical flight test for validation. This paper is a step towards such a flight test. Promising results were obtained with the Policy Gradient algorithm. It is suggested to use this algorithm, with the newly developed variable learning rate technique, for further development of a DelFly controller.


Appendix

This section demonstrates how the nonlinearity metric is defined. An artificial nonlinearity is superposed on the linear system. This nonlinearity is based on the Linear Parameter-Varying (LPV) system in Eq. 15.

$$ x(t+1) = (A_0 + A_v V)\, x(t) + (B_0 + B_v V)\, \delta_e(t) \tag{15} $$

The parameters of the A- and B-matrices are assumed to be linearly dependent on the airspeed. Although the nonlinearity is artificial, it is based on the 46 models, by examining the variation of the aerodynamic parameters with the trimmed airspeeds of the models. Figure 21 shows the variation of three aerodynamic parameters with airspeed. The LPV model can be estimated by performing linear regression on the data in figure 21. This is performed with a univariate linear function for all aerodynamic parameters, resulting in the A_0-, A_v-, B_0- and B_v-matrices. These matrices form an unvalidated LPV model of the DelFly. Validation of this model is not the focus of this work. However, the A_v- and B_v-matrices provide an indication of the level of nonlinearity of the model. A scaled version of the nonlinear part of Eq. 15 is superposed on all 46 linear models, as shown in Eq. 16.

$$ x_i(t+1) = \left( A_i + \Lambda A_v (V - V_{0,i}) \right) x(t) + \left( B_i + \Lambda B_v (V - V_{0,i}) \right) \delta_e(t) \tag{16} $$

By varying the scaling parameter Λ, and repeatedly performing the tracking analysis, the sensitivities of the algorithms to nonlinearity can be analyzed. With Λ = 0, the 46 local linear models are obtained, while Λ = 1 corresponds to the expected level of nonlinearity.

While Λ provides an indication of the level of nonlinearity, it is only applicable to models of the form of Eq. 15. A more general metric is required in order to allow the use of other model structures. A generic metric^25 is based on the difference between the linear system and the nonlinear system. It requires the definition of a nonlinear system N, with $y_N = N[u, x_{N,0}]$, and a linear system G, with $y_G = G[u, x_{G,0}]$. The y and u are signals, with a time dimension. The operators N and G map the scalar input signal u to the vector output signal y, starting from initial conditions $x_{N,0}$ and $x_{G,0}$. The spaces $\mathcal{U}_a$ and $\mathcal{Y}_a$ are defined as the allowable spaces of the signals u and y. Furthermore, the spaces $\mathcal{X}_{0,a}$ and $\mathcal{X}_{0,G}$ are the allowable spaces of the initial states of the nonlinear and linear systems, respectively. Finally, $\mathcal{G}$ is the space of allowable linear systems. The nonlinearity measure used in this research^25 is presented in Eq. 17.

$$ \phi(t_f) = \inf_{G \in \mathcal{G}} \; \sup_{(u, x_{N,0}) \in \mathcal{S}} \; \inf_{x_{G,0} \in \mathcal{X}_{0,G}} \frac{\left\| G[u, x_{G,0}] - N[u, x_{N,0}] \right\|_z}{\left\| N[u, x_{N,0}] \right\|_z} \tag{17} $$
$$ \text{with } \mathcal{S} = \left\{ (u, x_{N,0}) : u \in \mathcal{U}_a,\; x_{N,0} \in \mathcal{X}_{0,a},\; N[u, x_{N,0}] \in \mathcal{Y}_a \right\} $$

The measure φ takes a suitable norm of the difference signal between N and G, and normalizes it by the norm of the nonlinear output signal. This is a complex optimization problem, because it requires optimization over all allowable linear models, with all allowable initial linear states.

Figure 21: Generation of the artificial nonlinearity for three aerodynamic parameters. Aerodynamic parameters of the 46 models are shown, together with the estimated trend. (a) a_21; (b) a_22; (c) a_23.


This is done for the worst-case scenario of the input signal and initial nonlinear state, and requires the norm of a signal from t = 0 to t = t_f.^25 Directly calculating the solution of this problem is practically infeasible.

In this research, the problem is solved by limiting the allowable spaces of the signals and states to a single point. For each of the 46 models, there is a trimmed state $x_{0,i}$. The limitation is enforced that $x_{N,0} = x_{G,0} = x_{0,i}$ for each of the models. Furthermore, this paper only considers one allowable input signal $u_s$, which is a 0.1 rad elevator step. If the responses of N and G to this step are similar, it is expected that the models will be similar in general. Finally, the 'best' linear model is fixed; the 46 linear models F provide a reasonable approximation of the optimal linear model.^j The final time is set to t_f = 1 s. This removes all infimum and supremum operators, arriving at Eq. 18 for the nonlinearity metric used in this research.

$$ \phi_i = \frac{\left\| F[u_s, x_{0,i}] - N[u_s, x_{0,i}] \right\|_z}{\left\| N[u_s, x_{0,i}] \right\|_z} \tag{18} $$

The norm that is used represents the RMS values of the signals, where each state is scaled by the reference RMS value used for the noise metric. This norm is presented in Eq. 19.

$$ \| y \|_z = \sqrt{ \sum_{t=0}^{t_f} \sum_{i=1}^{4} \left( \frac{y_i}{y_{i,\mathrm{ref}}} \right)^2 } \tag{19} $$

The metric φ_i is calculated for each of the 46 models. For each Λ, the median of the 46 φ_i is used as the nonlinearity.
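Once the step responses are available, the metric of Eqs. 18 and 19 reduces to a few lines. The function names and array shapes below are assumptions for illustration.

```python
import numpy as np

def scaled_signal_norm(y, y_ref):
    """Scaled norm of Eq. 19: each state channel is divided by its reference
    RMS value before summing squares over time and states.
    y : (T, 4) signal, y_ref : (4,) reference values."""
    return np.sqrt(np.sum((y / y_ref) ** 2))

def nonlinearity_metric(y_linear, y_nonlinear, y_ref):
    """Per-model metric phi_i of Eq. 18: norm of the difference between the
    linear and nonlinear step responses, normalized by the nonlinear norm."""
    return (scaled_signal_norm(y_linear - y_nonlinear, y_ref)
            / scaled_signal_norm(y_nonlinear, y_ref))

# For each scaling Lambda, the overall nonlinearity is the median of the 46
# per-model values, e.g. np.median([nonlinearity_metric(...) for ...]).
```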

Acknowledgments

The authors would like to thank Frank Rijks for developing and demonstrating the DelFly in figure 1.

References

1. De Croon, G. C. H. E., Perçin, M., Remes, B. D. W., Ruijsink, R., and De Wagter, C., The DelFly: Design, Aerodynamics and Artificial Intelligence of a Flapping Wing Robot, Springer Netherlands, 1st ed., 2016.
2. De Croon, G. C. H. E., Groen, M. A., De Wagter, C., Remes, B. D. W., Ruijsink, R., and Van Oudheusden, B. W., "Design, Aerodynamics and Autonomy of the DelFly," Bioinspiration & Biomimetics, Vol. 7, No. 2, May 2012.
3. De Wagter, C., Koopmans, J. A., De Croon, G. C. H. E., Remes, B. D. W., and Ruijsink, R., "Autonomous Wind Tunnel Free-Flight of a Flapping Wing MAV," Advances in Aerospace Guidance, Navigation and Control: Selected Papers of the Second CEAS Specialist Conference on Guidance, Navigation and Control, edited by Q. P. Chu, J. A. Mulder, D. Choukroun, E. van Kampen, C. C. de Visser, and G. Looye, April 2013, pp. 603–621.
4. Verboom, J. L., Tijmons, S., De Wagter, C., Remes, B. D. W., Babuška, R., and De Croon, G. C. H. E., "Attitude and Altitude Estimation and Control on board a Flapping Wing Micro Air Vehicle," IEEE International Conference on Robotics and Automation (ICRA), May 2015, pp. 5846–5851.
5. Cunis, T., Karásek, M., and De Croon, G. C. H. E., "Precision Position Control of the DelFly II Flapping-wing Micro Air Vehicle in a Wind-tunnel," International Micro Air Vehicle Conference and Competition (IMAV), Oct. 2016.
6. Junell, J. L., Mannucci, T., Zhou, Y., and Van Kampen, E., "Self-tuning Gains of a Quadrotor using a Simple Model for Policy Gradient Reinforcement Learning," AIAA Guidance, Navigation, and Control Conference, Jan. 2016.
7. Caetano, J. V., De Visser, C. C., De Croon, G. C. H. E., Remes, B. D. W., De Wagter, C., Verboom, J., and Mulder, M., "Linear Aerodynamic Model Identification of a Flapping Wing MAV based on Flight Test Data," International Journal of Micro Air Vehicles, Vol. 5, No. 4, Dec. 2013, pp. 273–286.
8. Armanini, S. F., De Visser, C. C., De Croon, G. C. H. E., and Mulder, M., "Time-Varying Model Identification of Flapping-Wing Vehicle Dynamics Using Flight Data," Journal of Guidance, Control, and Dynamics, Vol. 39, No. 3, March 2016, pp. 526–541.
9. Chang, J., Armanini, S. F., and De Visser, C. C., "Feasibility of LTI-Model-Based LPV Model of DelFly," Unpublished MSc paper.
10. Van Kampen, E., Chu, Q. P., and Mulder, J. A., "Continuous Adaptive Critic Flight Control Aided with Approximated Plant Dynamics," AIAA Guidance, Navigation, and Control Conference and Exhibit, Aug. 2006.
11. Sutton, R. S. and Barto, A. G., Reinforcement Learning: An Introduction, MIT Press, 1st ed., 1998.

^j This approximation is not perfect, because the step input causes the state to vary. The real optimal linear model would have a different trimmed condition, but is difficult to determine.


12. Motamed, M. and Yan, J., "A Reinforcement Learning Approach to Lift Generation in Flapping MAVs: Experimental Results," IEEE International Conference on Robotics and Automation (ICRA), April 2007, pp. 748–754.
13. Motamed, M. and Yan, J., "A Reinforcement Learning Approach to Lift Generation in Flapping MAVs: Simulation Results," IEEE International Conference on Robotics and Automation (ICRA), May 2006, pp. 2150–2155.
14. Roberts, J. W., Moret, L., Zhang, J., and Tedrake, R., "Motor Learning at Intermediate Reynolds Number: Experiments with Policy Gradient on the Flapping Flight of a Rigid Wing," From Motor Learning to Interaction Learning in Robots, edited by O. Sigaud and J. Peters, Vol. 264 of Studies in Computational Intelligence, Springer Berlin Heidelberg, 2010, pp. 293–309.
15. Wang, J. and Kim, J., "Optimization of Fish-like Locomotion using Hierarchical Reinforcement Learning," International Conference on Ubiquitous Robots and Ambient Intelligence (URAI), Oct. 2015.
16. Abbeel, P., Quigley, M., and Ng, A. Y., "Using Inaccurate Models in Reinforcement Learning," International Conference on Machine Learning (ICML), June 2006.
17. Armanini, S. F., Karásek, M., De Visser, C. C., De Croon, G. C. H. E., and Mulder, M., "Flight Testing and Preliminary Analysis for Global System Identification of Ornithopter Dynamics using On-board and Off-board Data," AIAA Atmospheric Flight Mechanics Conference, Jan. 2017.
18. Lupashin, S., Schöllig, A., Sherback, M., and D'Andrea, R., "A Simple Learning Strategy for High-speed Quadrocopter Multi-flips," IEEE International Conference on Robotics and Automation (ICRA), May 2010, pp. 1642–1648.
19. Koppejan, R. and Whiteson, S., "Neuroevolutionary Reinforcement Learning for Generalized Control of Simulated Helicopters," Evolutionary Intelligence, Vol. 4, No. 4, Dec. 2011, pp. 219–241.
20. Moldovan, T. M., Levine, S., Jordan, M. I., and Abbeel, P., "Optimism-driven Exploration for Nonlinear Systems," IEEE International Conference on Robotics and Automation (ICRA), May 2015, pp. 3239–3246.
21. Arthur, D. and Vassilvitskii, S., "k-means++: The Advantages of Careful Seeding," ACM-SIAM Symposium on Discrete Algorithms, Jan. 2007.
22. Gustavsson, I., Ljung, L., and Söderström, T., "A Comparative Study of Recursive Identification Methods," Tech. Rep. TFRT, Vol. 3085, Department of Automatic Control, Lund Institute of Technology (LTH), Jan. 1974.
23. Friedlander, B., "System Identification Techniques for Adaptive Signal Processing," Circuits, Systems and Signal Processing, Vol. 1, No. 1, 1982, pp. 3–41.
24. Wilcoxon, F., "Individual Comparisons by Ranking Methods," Biometrics Bulletin, Vol. 1, No. 6, Dec. 1945, pp. 80–83.
25. Helbig, A., Marquardt, W., and Allgöwer, F., "Nonlinearity Measures: Definition, Computation and Applications," Journal of Process Control, Vol. 10, No. 2, April 2000, pp. 113–123.



4 Reflection on classification control

This chapter describes some issues regarding the CAML algorithm. Section 4.1 compares the two model identification algorithms considered in Chapter 3. The models obtained during the course of the identification are analyzed in Section 4.2. Section 4.3 contains an extensive analysis of the question whether the NNs are overfitting. The behavior of the filter is the subject of Section 4.4.

4.1. Model identification algorithms

The Recursive Least Squares algorithm allows online identification of a dynamic model. This research considers discrete models that predict the next state from the current state and control input. A major limitation of RLS is the inherent assumption that the current state is noise-free. This cannot be satisfied perfectly during system identification, which introduces inaccuracies into the model. It was found that these model imperfections cause problems with CAML.

The variance-based version of CAML can deal with imperfect models, as long as the imperfection can be estimated. The parameter covariance matrix in the RLS algorithm can be used for this. Eq. 4.1 defines σ_m^2 as a measure of the total variance of the model, namely the sum of all parameter variances. Every parameter variance is the expectation of the squared deviation of the corresponding parameter from its true value. Therefore, σ_m^2 is the expectation of the imperfection measure ε^2 defined in Eq. 4.2.

$$ \sigma_m^2 = \sum_{i=1}^{4} \sum_{j=1}^{4} \sigma_{a_{ij}}^2 + \sum_{k=1}^{4} \sigma_{b_k}^2 \tag{4.1} $$
$$ \epsilon^2 = \sum_{i=1}^{4} \sum_{j=1}^{4} \left( \hat{a}_{ij} - a_{ij} \right)^2 + \sum_{k=1}^{4} \left( \hat{b}_k - b_k \right)^2 \tag{4.2} $$

With σ_m^2 being the expectation of ε^2, one would expect these measures to have the same order of magnitude. Figure 4.1 shows the measures when RLS is used on a (partial) episode of a CAML controller on an example DelFly model. It can be seen that the model imperfection keeps increasing, but the parameter variance does not. This effect occurs on nearly all 46 DelFly models. The effect disappears if noise is removed. The model is thought of as a mapping x(t), u(t) → x(t+1). It is found that the increase in model imperfection is linked to the noise on x(t), but not to the noise on x(t+1). It is therefore due to the assumptions of the RLS estimator.

RML1 is implemented as an alternative. If the same simulation is performed with RML1, Figure 4.2 is obtained. The figure shows that RML1 effectively lowers the model imperfection compared to the simulation with RLS. This is found for a wide range of DelFly models. Therefore, RML1 is adopted as the identification algorithm in CAML.

When flapping is introduced, a low-pass filter must be added to isolate the time-averaged model. It is straightforward to implement this in the RLS algorithm, by using filtered versions of x(t), u(t) and x(t+1) for model identification. For RML1, this is non-trivial, because a model-predicted x̂(t) is used. If x̂(t) is not filtered, it is not affected by the phase shift of the filter, while u(t) and x(t+1) are. However, x̂(t) is obtained from the time-averaged model, and therefore contains little high-frequency content. Filtering it again would also affect model identification.
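The two measures of Eqs. 4.1 and 4.2 compared in Figures 4.1 to 4.4 can be computed as follows; which input column is included in the sums is an assumption based on the single b_k index in the equations, and the function names are illustrative.

```python
import numpy as np

def model_variance(param_variances_A, param_variances_b):
    """Total model variance of Eq. 4.1: the sum of all parameter variances of
    the A-matrix entries and of the (assumed elevator) input column."""
    return np.sum(param_variances_A) + np.sum(param_variances_b)

def model_imperfection(A_hat, A_true, b_hat, b_true):
    """Imperfection measure epsilon^2 of Eq. 4.2: the sum of squared deviations
    of the identified parameters from their true values."""
    return np.sum((np.asarray(A_hat) - np.asarray(A_true)) ** 2) \
         + np.sum((np.asarray(b_hat) - np.asarray(b_true)) ** 2)
```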



Figure 4.1: Model identification with RLS during a CAML test run without flapping.

Another difficult matter is whether the filtered or unfiltered u(t) should be used for prediction in the model. After trying many combinations, it was found that the best results are obtained if a filtered version of x̂(t) is used for identification, and an unfiltered u(t) for prediction of x̂(t+1). Identification with filtered RLS in Figure 4.3 is compared to identification with filtered RML1 in Figure 4.4. It can be seen that now, RLS obtains better results, but both methods result in problems. This issue could be solved with a better filtering technique. In this research, RLS is used for the simulations that include flapping.


Figure 4.2: Model identification with RML1 during a CAML test run without flapping.


Figure 4.3: Model identification with RLS during a CAML test run with flapping. Filtered state is shown.


Figure 4.4: Model identification with RML1 during a CAML test run with flapping. Filtered state is shown.

4.2. Intermediate models

The principle of CAML is that the selection of the gains improves over time, due to improving model identification. The model is initialized at the average DelFly system. If the estimator converges, the model will become close to the real system. However, it is uncertain what happens in between. There will be many 'intermediate models'. If these intermediate models do not represent anything physical that is close to the DelFly, CAML will select very inappropriate gains during the intermediate stage.

In the variance-based version of CAML, suitable training pairs of models and variances are determined by initializing the model at one system, and letting it converge to another system. This is also a reason to analyze the intermediate models. It is important that the intermediate identified models represent something close to the DelFly. Otherwise, the Neural Network is trained for non-physical systems.

In order to analyze whether the intermediate models are physical, the poles of the A-matrices are plotted. The models are first converted to continuous time for illustrative purposes (a sketch of this conversion is given below). Figure 4.5 shows how the poles of the models move during identification. This can be compared to Figure 4.6, which shows the poles of all models. These models are identified with both Ordinary Least Squares (OLS) and Maximum Likelihood Estimation (MLE).

Figure 4.5a shows the intermediate models during identification with RLS, if noise is removed. Note that this is just one example; with different initializations and target systems, different but similar behavior is found. The real poles move over the real axis towards the targets.^1 The complex poles also move towards the target, but do not move in a straight line. The 'loop' found in between does not have a physical meaning, and is different for every combination of systems. Nevertheless, the poles stay reasonably close to the initial positions and targets, and it is concluded that the intermediate models are physical. Figure 4.5b shows that with RML1, stranger detours are taken.

If noise is added, Figure 4.5c is obtained with RLS. Now, the poles do not finish on the target positions (even though they are quite close in this example). Noise results in a wider spread of the intermediate models. Especially the real poles vary a lot in this case, and even become unstable. The complex poles take a quick jump, and then follow a curved path towards the target. The unstable intermediate models can be considered non-physical, because both the initial and target system are stable. However, this is only during the very first instants of identification. The simulation results show that CAML is usually capable of surviving this stage without crashing the DelFly.

When RML1 is used on a noisy dataset, Figure 4.5d is obtained. As in the noise-free case, RML1 results in a wider spread of intermediate models than RLS. The real poles become oscillatory for some time, but eventually move towards their targets. For the complex poles, no clear path can be recognized. Those poles become unstable at some stages, and highly damped at others. Nevertheless, the intermediate poles are mostly centered around the target locations. Because the poles remain reasonably close to the physical locations, it is expected that the identification process is good enough for effective use of CAML. The simulation runs confirm this.

^1 Different runs show that sometimes, the intermediate poles are complex.
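The conversion used for these plots is not specified beyond "converted to continuous time"; one standard way to obtain continuous-time poles from a discrete A-matrix is the matrix logarithm, as in the sketch below. The sample time dt and the use of scipy are assumptions.

```python
import numpy as np
from scipy.linalg import logm

def continuous_poles(A_discrete, dt):
    """Hedged sketch: convert a discrete-time A-matrix (sample time dt) to a
    continuous-time equivalent via the matrix logarithm, and return its
    eigenvalues (the poles plotted in Figures 4.5 and 4.6). np.real drops the
    small imaginary residue that logm can introduce numerically."""
    A_continuous = np.real(logm(A_discrete)) / dt
    return np.linalg.eigvals(A_continuous)
```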


Figure 4.5: Motion of the continuous-time poles of the identified model during model identification. (a) RLS, without noise; (b) RML1, without noise; (c) RLS, with noise; (d) RML1, with noise.


Figure 4.6: Poles of all 46 models, for both Ordinary Least Squares and Maximum Likelihood Estimation.

4.3. Neural network overfitting

It is essential for the behavior of the CAML controller that the classification Neural Network is sufficiently accurate. It is trained on a dataset of a given size. Problems will be found if overfitting occurs. In that case, the NN will work well on the training dataset, but will not work well on new datasets. Overfitting occurs when the complexity reachable by the classifier is higher than the complexity in the dataset, and the size of the dataset is too small. For CAML, this means that the number of neurons and the dataset size must be considered together. The most effective way to avoid overfitting is to increase the size of the dataset. The dataset is randomly generated in this research, and there is no absolute limit on its size. However, computational time forms a practical limit. For a given size of the dataset, multiple tools are available against overfitting.

In this research, the main tool to avoid overfitting is early stopping. This means that the total dataset available for training is split up into a training set and a validation set. The training procedure (scaled conjugate gradient) requires gradients to be found. They are calculated for the training dataset only. If the step size is small enough, this will lead to improved performance on the training dataset. The principle of early stopping is that the performance on the validation dataset is also checked. Training stops when the parameter updates (based on the training set) no longer lead to improved performance on the validation set. This research uses 80% of the data for training, and 20% for validation. Early stopping may fail when there are random regularities affecting both the training and validation set. The number of neurons must still be selected appropriately, such that new datapoints outside both datasets will be handled well.

Two tests are performed to assess the correct number of neurons. First, a second dataset of 10000 models is generated. This is the test dataset, which is independent of the training and validation set. Neural networks of various sizes are trained on the first dataset with early stopping. This is done ten times, where the first dataset is randomly divided into a training set and validation set each time. Next, they are tested on the complete second dataset. The performance is shown in Figure 4.7. This figure was already shown in Chapter 3, but is analyzed in a broader context of overfitting in this section. For the numbers of neurons considered, the error rate on the test set keeps decreasing steadily. There seems to be no indication that overfitting is problematic. This figure is also used for tuning the number of neurons. It is decided that 100 neurons is sufficient. The error rate is lower with 200 neurons, but this would increase the (online) computational time.

In the second test, early stopping is not used at all. Instead, the NNs are trained to a fixed performance level. The first dataset is used for training. A random 20% of this set is disregarded, such that the same number of models as before is available for training. The next step, as before, is to test the trained NNs on the second dataset. This test is required to check if the number of neurons is really appropriate for the complexity and size of the dataset, even if no early stopping is used. The scaled conjugate gradient training procedure is used to minimize the cross-entropy. Therefore, the cross-entropy is the suitable performance measure for this test.
During the first test, it is found that the validation stop usually occurs when the cross-entropy is around 0.05 for the PI controllers, and 0.035 for the cascaded PI controllers. During the second test, all NNs are trained to these performance values, without using early stopping. If this value has not been reached, training is halted after 1000 episodes. The results of the cross-entropy test are shown in Figure 4.8.


Figure 4.7: Tuning of the number of gain sets. Overfitting does not seem to be a problem. A number of 100 neurons is considered sufficient.


Figure 4.8: Performance on the test set if networks are trained to a constant cross-entropy on the training set.


Figure 4.9: Performance on the test set if networks are trained to a constant cross-entropy on the training set, for different levels of final cross-entropy (CE). Low and high cross-entropy are 10% lower and higher than medium cross-entropy, respectively.

It can be seen that the error rate steadily decreases for low numbers of neurons. At numbers of neurons higher than 100, the error rate stabilizes. Since it does not yet increase, there is no indication that overfitting is problematic with 200 neurons. The performance at 100 neurons is nearly equal to that at 200 neurons. Therefore, 100 neurons is the preferred value.

A weakness of this test is that the constant cross-entropy, to which all networks are trained, is a free parameter. Incorrect selection may affect the results. To ensure that the test works correctly, cross-entropy values that are 10% lower and higher have also been tried. The results are shown in Figure 4.9. Only the median error rates for each setting are shown.

With 10 neurons, the error rates are the same for all Cross-Entropy (CE) settings. This is because the required performance can never be met, even if it is 10% higher; 10 neurons is simply not enough. For the other numbers, all cross-entropy settings result in the same behavior: a steady decrease, followed by stabilization after 100 neurons. It is therefore concluded that overfitting is not an issue when 100 neurons are used.

4.4. Filtering phase shift

It was mentioned in Chapter 3 that the filtering causes a phase shift in the signal. This section elaborates on this claim with a figure.

In order to remove the flapping-related time-varying dynamics, a low-pass filter is required. For identification of the local linear models [3], this was done using a fourth order Butterworth filter, with a cut-off frequency of 6 Hz. Figure 4.10 shows some flight data, together with the filtered versions. The identification was performed offline. It is therefore possible to use a non-causal filter.

[Figure: pitch rate (rad/s) versus time (s), comparing the unfiltered flight data with the zero-phase fourth-order Butterworth filter, the causal fourth-order Butterworth filter, and a causal first-order low-pass filter.]

Figure 4.10: Different filters applied to the flight data. The zero-phase (0P) filter is clearly preferable.

The zero-phase filtering procedure used filters the signal both backward and forward in time. This results in a filtered signal without a phase shift.

If online filtering is required, a causal filter is needed. Figure 4.10 also shows how the causal variant of the same fourth order Butterworth filter performs. This signal still shows some flapping-related content and, more importantly, has a significant phase delay. It was attempted to lower the order of the filter to remove this effect. However, even a standard discrete first order low-pass filter shows a significant delay: the delay is hardly reduced, while much more of the flapping motion passes through. This will make identification even more difficult.

In order to apply online model identification effectively, the filtering procedure should be improved. It is possible to store the history of the states and apply the zero-phase filter at every time step, but this is computationally prohibitive. Improving the filtering procedure is an important recommendation for future research involving CAML for flapping wing vehicles.
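The difference between the offline zero-phase filter and its causal variant can be illustrated with a few lines of SciPy. This is a minimal sketch, not the identification code used in the thesis; the 100 Hz sampling rate and the toy pitch-rate signal are assumptions made only for the example.

```python
# Zero-phase versus causal application of the fourth-order Butterworth filter
# with a 6 Hz cut-off. Sampling rate and test signal are illustrative assumptions.
import numpy as np
from scipy.signal import butter, filtfilt, lfilter

fs = 100.0                               # assumed sampling frequency [Hz]
b, a = butter(4, 6.0 / (fs / 2))         # fourth-order low-pass, 6 Hz cut-off

t = np.arange(0.0, 2.5, 1.0 / fs)
q = np.sin(2 * np.pi * 1.5 * t) + 0.5 * np.sin(2 * np.pi * 13 * t)  # toy pitch rate

q_zero_phase = filtfilt(b, a, q)  # forward-backward: no phase shift, offline only
q_causal = lfilter(b, a, q)       # causal: usable online, but phase-delayed
```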

5 Assessment of recommendations

The scientific paper in Chapter 3 has led to several recommendations for the improvement of the algorithms. This chapter contains some preliminary analyses regarding these recommendations. Section 5.1 analyzes the effect of filtered training of CAML on the performance of the cascaded PI controllers, and Section 5.2 assesses a combination of CAML and the Policy Gradient algorithm.

5.1. Filtered classifier training

It was found in Chapter 3 that CAML performs poorly with cascaded PI controllers when the low-pass filter is installed. This seems to be due to the phase shift induced by the filter. However, it was also found that the fixed cascaded PI controllers do perform well in some cases. CAML may select better gain sets if it accounts for the filtering effect explicitly. This section contains a first analysis in this direction.

There are two components of CAML where the filtering effect can be introduced explicitly. First, the generation of suitable gain sets is altered. The gain sets are determined by performing offline gradient-based optimization on the 46 known models; simulations are performed for many trial gain settings. These simulations can easily be modified by including the low-pass filter in the feedback loops. This leads to more suitable gain sets to select from.

Second, the training of the Neural Network requires example selections, which are generated by performing simulations on 40000 test models. These simulations should also include the filter. In this way, the NN classifier can account for filtering in its selection.

With these two modifications, the selection procedure fully accounts for the filtering effect. However, the model identification procedure is still affected by the filtering and flapping, and inadequate model identification may still lead to incorrect selection.

Figure 5.1 shows the performance of the modified CAML, compared to the other algorithms, if flapping and filtering are added. The LPF-trained CAML performs much better than the original CAML and the PG algorithm. However, it obtains a lower score than the fixed controller, because the success rate is lower. The difference in tracking accuracy is not significant.

The second part of Figure 5.1 describes the test where the filter is installed, but no flapping is present. Here, the modified CAML algorithm outperforms all other algorithms. Compared to the fixed controller, the success rate is equal, but the tracking accuracy is significantly higher. The success rate found is the maximum success rate that can be obtained with a double loop PI controller. It can therefore be concluded that the filtering modification to CAML effectively counteracts the filtering phase delay.

The performances of the fixed controller and the modified CAML algorithm can be compared between the tests with and without flapping motion. When flapping is added, both algorithms show a decrease in tracking accuracy. A straightforward explanation for this is that the flapping motion causes a periodic variation around the intended motion, which results in a punishment. However, the modified CAML algorithm also shows a decrease in success rate, which is not found with the fixed controller. It is concluded from this that the flapping motion itself also results in improper gain selection. This is because the low-pass filter cannot remove all flapping-related motion from the signals [2]. The remaining periodic signal content affects the model identification, leading to incorrect selections.



Figure 5.1: Performance of CAML trained with a Low-Pass Filter on the Flapping test. A test with the filter installed, but without flapping motion, is shown for comparison.


Figure 5.2: Performance of CAML trained with a Low-Pass Filter on the Mixed LPV test.

The modified CAML algorithm is also applied to the Mixed LPV test, as shown in Figure 5.2. The results are qualitatively similar to those of the Flapping test. The original CAML and the PG algorithm are clearly outperformed by the modified CAML. The fixed controller achieves a tracking accuracy similar to that of the modified CAML, but a higher success rate.

Compared to the Flapping test, the nonlinearity is an additional complication in the Mixed LPV test. It seems that the nonlinearity level of the DelFly does not lead to large performance decreases compared to the linear case. This is consistent with the findings of the Nonlinearity test.

5.2. Combination of algorithms

It was explained in Chapter 3 that a weakness of the Policy Gradient algorithm is that it requires a full episode with each setting. This is especially dangerous during the first runs, where large changes occur. CAML does not have this disadvantage, but may select unstable gains during the later episodes, due to imperfection in the classifier. A straightforward combination of these two algorithms uses CAML during the first episodes, and the PG algorithm (with the newly identified model) afterwards.

A thorough analysis of this combined algorithm would require optimization of the number of episodes with CAML. This section presents a first analysis, with the switching point placed in the middle: five episodes are performed with CAML, after which five episodes are performed with the PG algorithm. This method is tested on the Tracking test, which deals with variability only. Figure 5.3 shows the results of this analysis.


Figure 5.3: Performance of a combined algorithm on the Tracking test, using CAML during the first five episodes, and the PG algorithm afterwards.

For both single and double loop controllers, it is found that there is no significant difference between the original PG algorithm and the combined algorithm. There is a significant difference between the combined algorithm on one side, and CAML or the fixed controller on the other side. This is visible for the comparison between the single loop fixed controller and the combined algorithm, but the boxplots appear to be very similar in the other cases. Further analysis of the test statistic of Wilcoxon’s signed rank test [54] shows that this is due to individual differences on the different models, which cannot be represented in a boxplot. The combined algorithm is the better performing one in all these cases.

Optimization of the switching point may improve the results of the combined algorithm. In this analysis, it is found that the difference with the original PG algorithm is not significant. It is therefore expected that larger improvements can be made by analyzing the other recommendations.

6 Introduction to Reinforcement Learning

Flight control deals with continuous states and actions. However, Reinforcement Learning was originally developed for discrete systems. A good understanding of the discrete methods is required to fully comprehend the flight control methods in Chapter 7. This chapter introduces some discrete methods of Reinforcement Learning. Section 6.1 considers some basic concepts. Different methods are discussed in Sections 6.2 and 6.3.

6.1. Basics

The goal of any controller is to produce an output based on an input. The most studied form of machine learning that performs this function is supervised learning [43]. In supervised learning, an agent is taught how to act by providing examples of good behavior. In terms of control engineering, this means that the intelligent controller will copy the behavior of an existing controller.

Reinforcement Learning is a general term for many algorithms that are capable of learning by trial and error [43]. These algorithms do not require an example controller: they learn the correct behavior by interacting with the system to be controlled. In this way, the controller can be made truly intelligent: it may invent methods of control that the designer had never considered.

6.1.1. Origins of Reinforcement Learning

This report considers Reinforcement Learning from a control engineering perspective. However, control engineers are not the only scientists interested in the topic. Sutton and Barto [43] highlight that Reinforcement Learning emerged from three different threads.

• The first thread that has led to the modern methods is trial-and-error learning. This was initially investigated by scientists studying animal learning behavior, and it was also a topic of interest of the early research on Artificial Intelligence [43]. This is what distinguishes Reinforcement Learning from other types of learning and control.

• Reinforcement Learning is fundamentally related to optimal control. This field of research considers controllers that are optimal in some sense. In continuous Linear Time-Invariant control, the Linear Quadratic Regulator is an example of an optimal controller, which minimizes a quadratic cost function [37]. In general discrete-state control, one of the major concepts is dynamic programming, which is discussed in Section 6.2. It is important to realize that a Reinforcement Learning controller is always trying to optimize some function.

• The third thread that is important to RL is Temporal Difference learning. This will be considered in Section 6.3.

6.1.2. Important concepts

In Reinforcement Learning, the controller is called an agent, and the controlled system is called the environment. Their interaction is illustrated in Figure 6.1. The agent performs actions, which influence the state of the environment. The actions a_t of the agent depend on the current state s_t, according to a policy π. If the policy is deterministic, a_t = π(s_t) gives the action taken in state s_t. For a stochastic policy, π(s_t, a_t) gives the probability of taking action a_t in state s_t.


Figure 6.1: Interaction between agent and environment (inspired by [43]).

The change in the state of the environment may also be either deterministic or stochastic.

It was stated in the previous subsection that the algorithm is always trying to optimize something. For this, the reward function R is introduced. At every time step, the agent perceives a reward r_t for its previously performed actions. This reward originates from the environment, and may depend (deterministically or stochastically) on the current state, the previous state and the previous action. The RL agent may discount later rewards exponentially by a factor γ. This leads to Eq. 6.1 for the return G.

G_t = \sum_{k=0}^{\infty} \gamma^k r_{t+k+1} \qquad (6.1)

The single goal of an RL agent is to maximize the return. The value function is defined as the expectation of the return, when following policy π from the current state s, as shown in Eq. 6.2. This is a state-value function. Alternatively, an action-value function may be defined according to Eq. 6.3, as the expected return when performing action a and following π afterwards.

" # π X∞ k V (s) E γ rt k 1 st s (6.2) = k 0 + + | = " = # π X∞ k Q (s,a) E γ rt k 1 st s,at a (6.3) = k 0 + + | = = = When evaluating a value function, it is important that the state provides all relevant information. The process is said to have the Markov property if Eq. 6.4 is satisfied [43].

P(r_{t+1} = r, s_{t+1} = s' \mid s_t, a_t) = P(r_{t+1} = r, s_{t+1} = s' \mid s_0, a_0, r_1, \ldots, s_{t-1}, a_{t-1}, r_t, s_t, a_t) \qquad (6.4)

Equation 6.4 means that the current state must provide all relevant information that would be available if the whole state history was known. Thus, the actor has all information required to make an optimal decision. The situation where an actor has to make the optimal decision with knowledge of a Markov state is called a Markov Decision Process. This is the problem for which most Reinforcement Learning theory was developed.
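As a small illustration of Eq. 6.1, the return of a finite episode can be computed backward in time. This sketch assumes the rewards r_{t+1}, ..., r_T of one episode are stored in a list; it is not part of the thesis software.

```python
# Discounted return of Eq. 6.1 for a finite episode, computed via
# the recursion G_t = r_{t+1} + gamma * G_{t+1}.
def discounted_return(rewards, gamma=0.95):
    G = 0.0
    for r in reversed(rewards):
        G = r + gamma * G
    return G

# Example: rewards [1, 0, 2] with gamma = 0.5 give 1 + 0.5*0 + 0.25*2 = 1.5
assert discounted_return([1.0, 0.0, 2.0], gamma=0.5) == 1.5
```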

6.2. Dynamic Programming

Dynamic Programming (DP) is a method from optimal control to find the optimal policy. While it does not actually learn by trial and error, the concepts of this planning method are very important for RL.

6.2.1. Bellman equations

Dynamic Programming is based on the Bellman equation, which is found by expressing Eq. 6.2 in the recursive form shown in Eq. 6.5.

V^{\pi}(s) = E\left[ r_{t+1} + \gamma V^{\pi}(s_{t+1}) \right] \qquad (6.5)

For a given policy π, the value function must satisfy Eq. 6.5. When using DP, it is required to evaluate a value function for a given policy. This policy evaluation is done recursively by turning Eq. 6.5 into an update rule. This is shown in Eq. 6.6.

V_{k+1}^{\pi}(s) = E\left[ r_{t+1} + \gamma V_{k}^{\pi}(s_{t+1}) \right] \qquad (6.6)

In the general case, the policy, state transition and reward may be stochastic. The probability distributions of all three are required to evaluate the expectation operator. This requires a complete model of the environment.

Algorithm 6.1 Policy Iteration [43]
1: Initialize V(s) and π(s) randomly
2: repeat
3:   repeat {Policy evaluation}
4:     for all s do
5:       V(s) ⇐ \sum_{s'} p(s' | s, π(s)) [ r(s, π(s), s') + γ V(s') ]
6:     end for
7:   until changes are small
8:   for all s do {Policy improvement}
9:     π(s) ⇐ argmax_a \sum_{s'} p(s' | s, a) [ r(s, a, s') + γ V(s') ]
10:  end for
11: until π(s) is stable
12: return π(s)

If the probability to go from s to s' with action a is denoted as p(s' | s, a), Eq. 6.7 gives the update equation.

V_{k+1}^{\pi}(s) = \sum_a \pi(a \mid s) \sum_{s'} p(s' \mid s, a) \left[ r_{t+1} + \gamma V_{k}^{\pi}(s_{t+1}) \right] \qquad (6.7)

For a given stochastic policy, this equation can be used repeatedly to determine the value function. In the next section, it will be explained how this can be used to solve the planning problem.

6.2.2. Generalized Policy Iteration

In itself, the value function is of no use. The importance of the value function is that it 'ranks' the states in terms of desirability, when following the policy π. This information is very useful when the goal is to determine a better policy. For a given value function, policy improvement can be done by selecting the action that will result in the greatest return. This leads to the concept of Generalized Policy Iteration (GPI), where steps of policy evaluation and improvement are performed in some order.

The idea is that the value function and policy are successively updated. First, the value function is determined for a given policy. This is an iterative process, which may be halted at some stopping criterion. Next, a new policy is determined based on the value function. The new policy can be determined in several ways. Some examples are given below:

• Random: Random policy selection results in the greatest possible exploration in the state space, but does not make use of new information.

• Greedy: With greedy selection, the action is selected that results in the greatest value. This leads to maximum exploitation of knowledge. However, this may result in parts of the state space being unexplored, which may result in suboptimal behavior.

• ε-Greedy: This process usually selects the greedy action, but with probability ε, it selects a random action. A careful selection of ε makes it possible to balance exploration and exploitation (a minimal sketch is given below).
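The following is a minimal sketch of ε-greedy selection over a tabular action-value function; the dictionary layout Q[(s, a)] is an assumption made only for this example, not the representation used in this thesis.

```python
# Epsilon-greedy action selection over a tabular action-value function Q[(s, a)].
import random

def epsilon_greedy(Q, s, actions, epsilon=0.1):
    if random.random() < epsilon:
        return random.choice(actions)               # explore
    return max(actions, key=lambda a: Q[(s, a)])    # exploit (greedy action)
```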

These policy improvement options highlight the trade-off between exploration and exploitation. This trade-off is found very often in RL, and in optimization problems in general.

The most basic example of GPI is Policy Iteration, which is shown in algorithm 6.1 [43]. This algorithm first completes the policy evaluation in steps 3 to 7. For a given policy, the value function is determined by means of DP. Policy improvement occurs in steps 8 to 10. For every s, the greedy action is selected. This leads to a deterministic policy. For this new policy, the value function is again determined, and the policy is improved again. Policy Iteration can be proven to converge to an optimal policy [43].

Another GPI example is Value Iteration. This algorithm is almost the same as Policy Iteration, but only performs one step of policy evaluation for each new policy. This makes it possible to write the policy evaluation and improvement in one step, as shown in algorithm 6.2 [43]. The algorithm is still guaranteed to converge to an optimal policy [43].

Algorithm 6.2 Value Iteration [43]
1: Initialize V(s) randomly
2: repeat
3:   for all s do
4:     V(s) ⇐ max_a \sum_{s'} p(s' | s, a) [ r(s, a, s') + γ V(s') ]
5:   end for
6: until changes are small
7: π(s) = argmax_a \sum_{s'} p(s' | s, a) [ r(s, a, s') + γ V(s') ]
8: return π(s)
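A compact Python version of algorithm 6.2 is sketched below for a small tabular MDP. The data layout, with p[s][a] mapping each next state to a (probability, reward) pair, is an illustrative assumption and not taken from the thesis.

```python
# Minimal sketch of Value Iteration (algorithm 6.2) for a tabular MDP.
# p[s][a] is assumed to map next states s2 to (probability, reward) pairs.
def value_iteration(states, actions, p, gamma=0.95, tol=1e-6):
    V = {s: 0.0 for s in states}

    def backup(s, a):
        return sum(prob * (r + gamma * V[s2]) for s2, (prob, r) in p[s][a].items())

    while True:
        delta = 0.0
        for s in states:
            v_new = max(backup(s, a) for a in actions)
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < tol:            # "changes are small"
            break
    # Greedy policy extraction (step 7 of algorithm 6.2)
    pi = {s: max(actions, key=lambda a: backup(s, a)) for s in states}
    return V, pi
```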

6.3. Temporal Difference methods

Just like Dynamic Programming, Temporal Difference (TD) methods make use of the Bellman equation (6.5). However, instead of computing the expectation operator by means of probability distributions, the estimate of the value function is changed by experience.

6.3.1. Temporal Difference Learning

The simplest form of TD learning updates directly in the direction of the Bellman equation. This is done according to Eq. 6.8.

V(s_t) \leftarrow V(s_t) + \alpha \left[ r_{t+1} + \gamma V(s_{t+1}) - V(s_t) \right] \qquad (6.8)

On every step, the value function is changed towards the experienced reward and the discounted value of the next state. Because the new estimate is based on other estimates (of V(s_{t+1})), it is called a bootstrapping method. TD learning can be applied on every time step: it is not required to wait for the entire episode to finish.
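Written out as code, the tabular TD(0) update of Eq. 6.8 is a single line; storing V as a dictionary of state values is an illustrative choice, not the thesis implementation.

```python
# One tabular TD(0) update (Eq. 6.8) after observing (s_t, r_{t+1}, s_{t+1}).
def td0_update(V, s, r_next, s_next, alpha=0.1, gamma=0.95):
    V[s] += alpha * (r_next + gamma * V[s_next] - V[s])
    return V
```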

6.3.2. Sarsa

The Sarsa algorithm extends TD learning to action-value functions. This makes it possible to perform control without any knowledge of the model. The concept of Sarsa is stated in Eq. 6.9.

Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t) \right] \qquad (6.9)

The name Sarsa refers to the events that are required for the update: (s_t, a_t, r_{t+1}, s_{t+1}, a_{t+1}) [43]. On every time step, the action for the next step is chosen by comparing the possible actions a_{t+1} in s_{t+1}. This can be done using a greedy or ε-greedy approach, or with different methods.

6.3.3. Q-learning

Another method of determining action-value functions is Q-learning. This method is demonstrated in Eq. 6.10.

Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_a Q(s_{t+1}, a) - Q(s_t, a_t) \right] \qquad (6.10)

This algorithm is very similar to Sarsa, the only difference being that the maximum is taken over Q(s_{t+1}, a). The consequence of this is that Q-learning converges to the optimal action-value function for the environment, regardless of the policy that is being followed. This means that a highly exploring strategy may be chosen, while Q-learning still converges to the optimal strategy. Note that when greedy action selection is performed, Sarsa and Q-learning are equal.
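A minimal tabular Q-learning loop with ε-greedy exploration is sketched below. The env.reset()/env.step(a) interface and all parameter values are assumptions made for this example; the thesis does not use this code.

```python
# Minimal sketch of tabular Q-learning (Eq. 6.10) with epsilon-greedy exploration.
# The environment interface (reset/step) is an assumed convention.
import random
from collections import defaultdict

def q_learning(env, actions, episodes=500, alpha=0.1, gamma=0.95, epsilon=0.1):
    Q = defaultdict(float)                              # Q[(s, a)], zero-initialized
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            if random.random() < epsilon:
                a = random.choice(actions)              # explore
            else:
                a = max(actions, key=lambda a_: Q[(s, a_)])
            s_next, r, done = env.step(a)
            best_next = max(Q[(s_next, a_)] for a_ in actions)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])  # Eq. 6.10
            s = s_next
    return Q
```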

7 Reinforcement Learning in aerospace control

The previous chapter has elaborately introduced the field of Reinforcement Learning. This chapter will consider how this theory can be extended to deal with aerospace control problems. Section 7.1 will explain the possible solutions to these challenges. The remaining sections will elaborate on some promising solutions.

7.1. Continuous Reinforcement Learning solutions

The methods in Chapter 6 seem to be promising for the control of systems with discrete state-action spaces. However, they require a tabular storage of numerical data of the value function. For aerospace control problems, both the state and action spaces are typically continuous. This makes it impossible to use tabular storage. Figure 7.1 shows an overview of the possible solutions to this problem.

First of all, it is possible to discretize the state and action spaces, such that any algorithm from Chapter 6 can be used. This is one of the easiest solutions, but it leads to problems when the continuous state space is high-dimensional. In that case, the number of discrete states can become very large, leading to slow learning.

7.1.1. Extending Reinforcement Learning to continuous state-action spaces

A solution is to write the value function as a parametrized function of the state (and action). This is a very common solution in control engineering, and it commonly uses another notation. A comparison of the discrete and continuous notations is given in Table 7.1. The continuous state will be denoted by x ∈ R^n, and the continuous action will be denoted by u ∈ R^m. The symbol J(x, u; θ) is used for a continuous value function, with parametrization θ. This function can be dependent on the state only, or on both state and action.

In discrete RL systems, the reward function is defined for the transition from t to t+1 as r_{t+1} = r(s_t, a_t, s_{t+1}). Hence, it depends on the transition from the previous state to the current state. In continuous RL, it is commonly found that the reward (often called utility) is independent of the state at t+1. In this report, the symbol U denotes the utility, where the notation U_t = U(x_t, u_t) is used. Note that U_t and r_{t+1} refer to a similar transition, but use a different time index!

An important distinction to make is the difference between dynamic programming and temporal difference learning. Dynamic programming requires a complete model of the environment, which makes it more of a planning method than a learning method.

Table 7.1: Discrete and continuous notations in Reinforcement Learning.

Parameter    | Discrete | Continuous | Notes
State        | s        | x          | s can be anything, x ∈ R^n
Action       | a        | u          | a can be anything, u ∈ R^m; u is often called (control) input
Reward       | r_{t+1}  | U_t        | Inconsistency in time index; U is often called utility
Value        | V(s)     | J(x)       |
Action-value | Q(s,a)   | J(x,u)     |


Figure 7.1: Overview of Reinforcement Learning solutions to flight control. Solutions in gray are not discussed in this report, but included for completeness.

When using temporal differences with Sarsa or Q-learning, a completely model-free learning approach is obtained. In continuous RL, an often used solution is to identify a model online, and use dynamic programming algorithms with these models. These methods transform dynamic programming into a true learning method.

7.1.2. The action selection problem

When starting from discrete RL control, the easiest approach is to parametrize the state-value function, and use dynamic programming algorithms. The difficulty in this approach is the (greedy) action selection, shown in Eq. 7.1.

u(x) = \arg\max_u J(x, u; \theta) \qquad (7.1)

In discrete systems, it is easy to select the best action, by looking at the resulting states for all possible actions. This is more cumbersome when the action set is continuous: the resulting optimization problem can be difficult and non-convex. Figure 7.1 shows some possible solutions to this problem. These methods are categorized as critic-only methods. They will be elaborated in Section 7.3. For explanatory and historical reasons, actor-critic approaches will be explained first in Section 7.2.

7.2. Actor-critic algorithms

Actor-critic approaches, also called Adaptive Critic Designs (ACDs), make use of a separate actor to determine the optimal action for a given state. This actor is also an approximated function, and is parametrized using ψ. ACDs follow the concept of Generalized Policy Iteration, explained in Chapter 6: they consist of steps of policy evaluation (critic updating) and policy improvement (actor updating). Some common ACDs will be discussed in the next subsections.

7.2.1. Heuristic Dynamic Programming

The most straightforward ACD is the Heuristic Dynamic Programming (HDP) algorithm. Its critic network provides an estimate of the expected return in a state. This is implemented with the function J(x; θ), which means that the critic network output is not dependent on the action selected. If the critic output is correct, the Bellman equation must hold, as shown in Eq. 7.2 [40].

J(x(t); \theta) = \sum_{k=0}^{\infty} \gamma^k U(t+k) = U(t) + \gamma J(x(t+1); \theta) \qquad (7.2)

If the critic output is incorrect, there will be an imbalance in Eq. 7.2. The HDP algorithm is intended to minimize this imbalance, using the error function shown in Eqs. 7.3 and 7.4 [40]. The shorter notation J(t) = J(x(t); θ_i), where the dependence on the state and parameters is implicit, is used for convenience.

E_c(t) = \frac{1}{2} \left( e_c(t) \right)^2 \qquad (7.3)

e_c(t) = J(t) - \left[ \gamma J(t+1) + U(t) \right] \qquad (7.4)

The common way to do this is by means of backpropagation. This method updates the parameters with a gradient descent algorithm to minimize the error, as shown in Eq. 7.5 [40]. Using the chain rule, the derivative¹ can be expressed as in Eq. 7.6.

\theta_{i+1} = \theta_i - l_c(t) \left[ \frac{\partial E_c(t)}{\partial \theta_i} \right]^T \qquad (7.5)

\frac{\partial E_c(t)}{\partial \theta_i} = \frac{\partial E_c(t)}{\partial e_c(t)} \frac{\partial e_c(t)}{\partial J(t)} \frac{\partial J(t)}{\partial \theta_i} = e_c(t) \frac{\partial J(t)}{\partial \theta_i} \qquad (7.6)

Note that the method neglects the dependence of J(t+1) on θ_i, and treats the term between brackets in Eq. 7.4 as constant [40]! This is in agreement with the discrete dynamic programming methods in Chapter 6. This idea will be discussed in more detail in Chapter 8. The resulting method is able to optimize the parameters θ to find the value function of a given policy.

The actor has as its goal to maximize the sum of future utility. It uses the critic network to estimate this sum. The optimization is again performed using backpropagation, as shown in Eq. 7.7. Since J(t) is not directly dependent on the actions and action parameters, the Bellman equation is used to find a backpropagation path from J(t+1) through the model network, as shown in Eq. 7.8. This requires knowledge of the dependence of the utility function on the actions. In literature [40, 47], it is common to use action-independent reward functions, such that ∂U(t)/∂ψ_i = 0, and J(t+1) can be optimized directly.

\psi_{i+1} = \psi_i + l_a(t) \left[ \frac{\partial J(t)}{\partial \psi_i} \right]^T \qquad (7.7)

\frac{\partial J(t)}{\partial \psi_i} = \frac{\partial U(t)}{\partial \psi_i} + \gamma \frac{\partial J(t+1)}{\partial \psi_i} = \frac{\partial U(t)}{\partial u(t)} \frac{\partial u(t)}{\partial \psi_i} + \gamma \frac{\partial J(t+1)}{\partial x(t+1)} \frac{\partial x(t+1)}{\partial u(t)} \frac{\partial u(t)}{\partial \psi_i} \qquad (7.8)

Performing the actor update requires the derivatives of the value function with respect to the state, which are found by backpropagation through the critic network. Also, a model is required to determine the derivative ∂x(t+1)/∂u(t). The implementation in [46] performs online system identification to find this model.

A variation to this is formed by the Continuous Actor-Critic Learning Automaton (Cacla) [45]. This method uses the same manner of updating the critic. The action u is selected as the action suggested by the actor (u_a) plus a random variation. The actor update then uses the TD error in the critic. Using Eq. 7.4, one can conclude that an action was unexpectedly successful if e_c(t) < 0. This knowledge is used by Cacla in the update rule in Eq. 7.9 [45].

¹In this report, consistent numerator layout will be used for vector and matrix derivatives. For an n × 1 vector y and an m × 1 vector x, this means that:

\frac{dy}{dx} = \begin{bmatrix} \frac{dy_1}{dx_1} & \frac{dy_1}{dx_2} & \cdots & \frac{dy_1}{dx_m} \\ \frac{dy_2}{dx_1} & \frac{dy_2}{dx_2} & \cdots & \frac{dy_2}{dx_m} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{dy_n}{dx_1} & \frac{dy_n}{dx_2} & \cdots & \frac{dy_n}{dx_m} \end{bmatrix}

\psi_{i+1} = \begin{cases} \psi_i + l_a(t) \left( u(t) - u_a(t) \right) \dfrac{\partial u_a(t)}{\partial \psi_i} & \text{if } e_c(t) < 0 \\ \psi_i & \text{if } e_c(t) > 0 \end{cases} \qquad (7.9)

Cacla only updates the actor if it performs unexpectedly well. Also, the update size is not proportional to the TD error. This has advantages if exploratory actions are taken [45]. Cacla is considered a more exploratory variation of HDP in this report. An advantage is that it requires no model. It is expected that its performance is close to that of HDP. This algorithm will be revisited if HDP is successful.
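To make the critic update of Eqs. 7.3 to 7.6 concrete, the sketch below uses a simple linear-in-features critic J(x; θ) = θᵀφ(x). The feature map and learning rate are illustrative assumptions; the thesis uses a neural-network critic trained by backpropagation.

```python
# Minimal sketch of one HDP critic update (Eqs. 7.3-7.6) for a linear critic
# J(x; theta) = theta . phi(x). The feature map phi is a hypothetical stand-in
# for the neural-network critic used in the thesis.
import numpy as np

def phi(x):
    return np.concatenate(([1.0], x, x**2))    # hypothetical quadratic features

def hdp_critic_update(theta, x_t, x_next, U_t, gamma=0.95, lc=0.01):
    J_t = theta @ phi(x_t)
    J_next = theta @ phi(x_next)
    e_c = J_t - (gamma * J_next + U_t)         # critic error, Eq. 7.4
    grad = e_c * phi(x_t)                      # dE_c/dtheta for a linear critic, Eq. 7.6
    return theta - lc * grad                   # gradient-descent step, Eq. 7.5
```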

7.2.2. Dual Heuristic Programming

To the controller, the critic network is merely a means to update the actor. A careful look at Eq. 7.8 shows that the HDP algorithm does not use only the critic to update the actor: it also uses the partial derivatives of the critic with respect to the states. The Dual Heuristic Programming (DHP) algorithm estimates these derivatives directly. With more accurate estimates of these derivatives, it should be possible to perform better updates of the actor.

DHP defines the costate vector λ(t) = [∂J(t)/∂x(t)]^T, which is an n × 1 vector in the numerator layout of this report. The DHP critic estimates the costate, rather than the cost function J(t). It was found that the DHP critic can lead to better results than the HDP critic, because it approximates the costate directly [40]. However, it is more complex, because it has to estimate a vector instead of a scalar.

Globalized Dual Heuristic Programming (GDHP) is an attempt to combine the best of both approaches [40]. The GDHP critic approximates the value function, but its update process is such that its derivatives are also optimized. Prokhorov and Wunsch II [40] showed that the performance of GDHP is very similar to that of DHP. However, it has not been applied as rigorously to aircraft.

7.2.3. Action Dependent Heuristic Dynamic Programming

A large disadvantage of the previous two methods is that they require a model of the environment. This is because they are based on Dynamic Programming. The model is required for the backpropagation path between the critic and the actor. By using the action as a second input of the critic network, it should be possible to perform backpropagation without the model network. If this is done with the HDP critic, Action Dependent Heuristic Dynamic Programming (ADHDP) results.

When the action is also an input of the critic, the method becomes very similar to Q-learning. Even if the utility function is not directly dependent on the action, the action influences the next state and therefore has an effect on the expected return. However, this means that the model will be 'absorbed' by the critic network [28, 46]. Si and Wang [42] introduced a method that uses ADHDP with a backwards TD scheme, which allows model-free control. This method has been applied rigorously to the Apache helicopter [18], under the name of Direct Neural Dynamic Programming (DNDP).

Liu [28] highlights that with a slightly different definition of the value function, this method becomes very similar to HDP. The value function Q is defined according to Eq. 7.10.

Q(t) = \sum_{k=1}^{\infty} \gamma^{k-1} U(t+k) \qquad (7.10)

This new value function is the same as the previously used J(t), but disregards the reward at the current time step: Q(t) = J(t+1). Several applications using ADHDP make use of this value function [18, 42]. The Bellman equation for the Q-function is then given by Eq. 7.11.

Q(t) = \gamma Q(t+1) + U(t+1) \qquad (7.11)

In ADHDP, the actor updating occurs according to Eqs. 7.12 to 7.15. In Eq. 7.15, the J-function has been substituted for the Q-function.

E_a(t) = \frac{1}{2} \left( e_a(t) \right)^2 \qquad (7.12)

e_a(t) = Q(t) - b(t) \qquad (7.13)

\psi_{i+1} = \psi_i - l_a(t) \left[ \frac{\partial E_a(t)}{\partial \psi_i} \right]^T \qquad (7.14)

\frac{\partial E_a(t)}{\partial \psi_i} = \frac{\partial E_a(t)}{\partial e_a(t)} \frac{\partial e_a(t)}{\partial Q(t)} \frac{\partial Q(t)}{\partial u(t)} \frac{\partial u(t)}{\partial \psi_i} = e_a(t) \frac{\partial J(t+1)}{\partial u(t)} \frac{\partial u(t)}{\partial \psi_i} \qquad (7.15)

HDP requires two functions to estimate J(t+1), while ADHDP only requires one. The two functions in HDP could be seen as a two-layer function which performs the same computation [28]. However, a difference is that the model layer is trained to match the dynamics, while the critic layer is trained to find the value function. Another difference is that the ADHDP implementation considered in this section uses a backward updating process. This will be discussed in Chapter 8.

The action dependence can also be introduced in the DHP network. In that case, however, a model is still required for backpropagation [40]. This is therefore not considered desirable.

7.2.4. Comparison of actor-critic algorithms

An extensive comparison between different ACDs has been performed by Prokhorov and Wunsch II [40]. On an automatic landing system for a simplified commercial aircraft, several ACDs were implemented. The goal of the controller is to land the aircraft under a wind shear condition subject to gusts. Hence, this is both a tracking and a disturbance rejection task. It is found that GDHP and DHP lead to the highest success rates. Lower success rates are obtained with HDP. Venayagamoorthy et al. [47] found that DHP outperforms HDP on a neurocontroller for a turbogenerator connected to the power grid. Werbos [52, 53] also states that DHP is a more advanced and better algorithm.

A disadvantage of DHP is its learning speed. Because a vector needs to be estimated, the learning process can take much longer. For the automatic landing system, it was found that DHP takes ten times as many trials to learn as HDP [40]. Also, its larger complexity increases the computational burden [27]. However, Werbos [52] states that DHP can be much faster, on the condition that the correct model is known. In this case, the controller should function as an adaptive controller. It is expected that the learning speed, when started from a random initialization, provides a good indication of the adaptation speed of the controller [46]. This adaptation should be robust to model uncertainties. This is the reason why HDP is preferred over DHP in this project.

The possibility of model-free control makes ADHDP an attractive option. Since it has been applied in detail to the Apache helicopter [18], it is expected that a good controller can be obtained for the DelFly. For the autolanding problem, however, it was found that ADHDP leads to much lower success rates than HDP [40]. This may be due to the exact implementation of the algorithm [42]. Van Kampen et al. [46] compared HDP and ADHDP directly, and found that HDP outperforms ADHDP as an adaptive controller for the F-16 aircraft. While ADHDP was found to be more robust to noise, the HDP controller seemed to be more applicable to other flight conditions. Because of the similarity with the DelFly case, HDP is considered a more suitable controller for the DelFly.

7.3. Critic-only algorithms

It was explained in Section 7.1 that the problem with extending discrete RL to continuous state-action spaces is the action selection. The actor-critic algorithms were the first solution to this problem. More recently, some other solutions have become available. This section will analyze algorithms that use only a critic network. Figure 7.1 contains an overview of the possible solutions.

7.3.1. Solving the optimization problem

It was stated in Section 7.1 that the action selection can be a complex optimization problem. If a model is available, a possibility is to solve this problem using any optimization method, such as Newton's method or Nelder-Mead optimization [36]. If the action-value function is approximated by a sufficiently differentiable function approximator, gradients and Hessians are available analytically. While this optimization leads to an increased computational burden and the proposed methods use local optimization algorithms, the resulting algorithms are more straightforward than ACDs. In a case study considering the swing-up of the Acrobot² [36], however, it was shown that the actor-critic-based Cacla outperforms the critic-only methods presented in this subsection.

7.3.2. Including the optimal control

In ACDs, the actor is aimed at optimizing the return. The actor is optimized by considering the critic function (and sometimes the model). It was explained in Chapter 6 that Reinforcement Learning originated from the field of optimal control. Optimal control theory provides methods to calculate optimal actions analytically. Such theory can be used to replace the actor network in ACDs.

An important intermediate step was performed by Ferrari and Stengel [19], who applied DHP methodology to the control system of a business jet. However, they changed the method of action selection. While the 'conventional' DHP first selects an action, and then updates the actor network, this method performs optimization before action selection.

The key idea is that the action must be optimal with respect to the value function: ∂J(t)/∂u(t) = 0. This can be elaborated by means of the Bellman equation as shown in Eq. 7.16³.

\frac{\partial J(t)}{\partial u(t)} = \gamma \frac{\partial J(t+1)}{\partial u(t)} + \frac{\partial U(t)}{\partial u(t)} = \gamma \lambda^T(t+1) \frac{\partial x(t+1)}{\partial u(t)} + \frac{\partial U(t)}{\partial u(t)} = 0 \qquad (7.16)

Ferrari and Stengel [19] computed the optimal control by solving Eq. 7.16 iteratively. The output of the actor network is used as a first guess, and Newton's method is used to improve this estimate iteratively. When the optimal control is found, it is applied to the system, and the actor network is trained towards the optimal control. Note that a model of the system is required to train the actor.

7.3.3. Single Network Adaptive Critic

Padhi et al. [38] proposed an optimal control-based RL method that completely removes the need for the actor network. Their Single Network Adaptive Critic (SNAC) method requires two functions: the critic and the model. Only the critic is updated online in this implementation. The method is applicable to systems where the optimal control can be computed analytically from the state and the costate. This includes input-affine discrete-time systems of the form of Eq. 7.17 with a quadratic utility function, shown in Eq. 7.18.

x(t+1) = f(x(t)) + g(x(t)) u(t) \qquad (7.17)

U(t) = \frac{1}{2} \left[ x^T(t) Q x(t) + u^T(t) R u(t) \right] \qquad (7.18)

The SNAC method is very similar to DHP. It is based on two equations. The optimal control equation is presented in Eq. 7.19. Using the structure of Eqs. 7.17 and 7.18, this reduces to Eq. 7.20, which can be solved for u as in Eq. 7.21 [39]. Note that R (as well as Q) is a symmetric matrix. This equation³ provides an explicit expression for the optimal control at time t.

\gamma \lambda^T(t+1) \frac{\partial x(t+1)}{\partial u(t)} + \frac{\partial U(t)}{\partial u(t)} = 0 \qquad (7.19)

\gamma \lambda^T(t+1) g(x(t)) + u^T(t) R = 0 \qquad (7.20)

u(t) = -\gamma R^{-1} \left[ g(x(t)) \right]^T \lambda(t+1) \qquad (7.21)

The costate equation is obtained by setting the DHP critic error to zero [39], as shown in Eq. 7.22. This is rephrased in Eq. 7.23. Under the condition that the optimal control is taken, Eq. 7.16 can replace the term between brackets in Eq. 7.23 to arrive at Eq. 7.24. As a final step, the known model in Eq. 7.17 can be used to substitute ∂x(t+1)/∂x(t), resulting in Eq. 7.25.

²The Acrobot is a double pendulum, consisting of two links. Only the hinge between the two links is actuated. The controller task is to swing up the Acrobot from its neutral position [36].
³This method was originally presented for undiscounted settings (γ = 1). It has been extended to the discounted setting for consistency with the previous parts.

Algorithm 7.1 Training algorithm for the SNAC approach [39]
1: for i = 1, 2, ..., I do
2:   repeat
3:     Generate a subset S_i of initial states x(0)
4:     for all x(0) ∈ S_i do
5:       Input x(0) to the critic to obtain λ(1)
6:       Calculate the optimal control u(0) from Eq. 7.21
7:       Get x(1) from the model Eq. 7.17
8:       Input x(1) to the critic to obtain λ(2)
9:       Insert λ(2) into the costate Eq. 7.25 to calculate a desired λ_d(1)
10:    end for
11:    Train the network towards λ_d(1) for all x(0) ∈ S_i
12:  until convergence of the critic network
13: end for

\frac{\partial J(t)}{\partial x(t)} = \frac{\partial}{\partial x(t)} \left[ \gamma J(t+1) + U(t) \right] = \gamma \frac{\partial J(t+1)}{\partial x(t+1)} \left[ \frac{\partial x(t+1)}{\partial x(t)} + \frac{\partial x(t+1)}{\partial u(t)} \frac{\partial u(t)}{\partial x(t)} \right] + \frac{\partial U(t)}{\partial x(t)} + \frac{\partial U(t)}{\partial u(t)} \frac{\partial u(t)}{\partial x(t)} \qquad (7.22)

\frac{\partial J(t)}{\partial x(t)} = \gamma \frac{\partial J(t+1)}{\partial x(t+1)} \frac{\partial x(t+1)}{\partial x(t)} + \frac{\partial U(t)}{\partial x(t)} + \left[ \gamma \frac{\partial J(t+1)}{\partial x(t+1)} \frac{\partial x(t+1)}{\partial u(t)} + \frac{\partial U(t)}{\partial u(t)} \right] \frac{\partial u(t)}{\partial x(t)} \qquad (7.23)

\lambda^T(t) = \gamma \lambda^T(t+1) \frac{\partial x(t+1)}{\partial x(t)} + x^T(t) Q \qquad (7.24)

\lambda^T(t) = \gamma \lambda^T(t+1) \left[ \frac{\partial f(x(t))}{\partial x(t)} + \frac{\partial g(x(t))}{\partial x(t)} u(t) \right] + x^T(t) Q \qquad (7.25)

The key to the SNAC algorithm is that the critic uses x(t) to approximate λ(t+1). This makes it possible to obtain optimal control inputs without requiring an actor. Padhi et al. [39] propose algorithm 7.1 for training of the critic, making use of Eq. 7.26.

\lambda_d(1) = \gamma \left[ \frac{\partial f(x(t))}{\partial x(t)} + \frac{\partial g(x(t))}{\partial x(t)} u(t) \right]^T \lambda(2) + Q x(t) \qquad (7.26)
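For reference, the optimal-control computation of Eq. 7.21 used inside algorithm 7.1 amounts to a single matrix solve. In the sketch below, 'critic' and 'g' are placeholder callables mapping the state to the costate λ(t+1) and to the input matrix, respectively; this is not the implementation used in the thesis.

```python
# Minimal sketch of the SNAC control computation (Eq. 7.21) for the input-affine
# model of Eq. 7.17. 'critic' and 'g' are placeholder callables.
import numpy as np

def snac_control(x_t, critic, g, R, gamma=1.0):
    lam_next = critic(x_t)                                   # lambda(t+1) from the critic
    return -gamma * np.linalg.solve(R, g(x_t).T @ lam_next)  # Eq. 7.21
```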


This algorithm uses batch updating of a set of states S_i, until the critic converges correctly for this batch. Then, a new batch of states is used. Padhi et al. [39] suggest generating a batch of random states within a hypersphere of a certain radius, and increasing this radius for each new batch. This updating process is limited to offline training. However, stochastic (one-by-one) updating can also be used with this algorithm, and is more suitable for online use. These updating methods will be compared in Chapter 8.

The SNAC algorithm is closely related to DHP, and suffers from the same disadvantages. Ding et al. [15] proposed J-SNAC, which uses the HDP critic to perform the same function. The algorithm is very similar to SNAC, and is shown in algorithm 7.2. It uses the same equations as the SNAC algorithm, and additionally uses the Bellman equation (restated in Eq. 7.28).

J_d(0) = \gamma J(1) + U(0) \qquad (7.28)

This critic has the advantage that its output is a scalar. This is said to lower the amount of training required for the critic [16] and the memory requirements. Also, the J-SNAC output has a physical meaning [15], which makes it easier to interpret. The outputs of the SNAC network could have different orders of magnitude, which results in convergence problems [14].

The approach as described above requires a model of the environment to function. The same can be said of DHP and HDP, but HDP has been shown [46] to function with online model updating. A version with online model updating, based on Lyapunov theory, has been developed for both SNAC [6] and J-SNAC [14].

Algorithm 7.2 Training algorithm for the J-SNAC approach [15]
1: for i = 1, 2, ..., I do
2:   repeat
3:     Generate a subset S_i of initial states x(0)
4:     for all x(0) ∈ S_i do
5:       Input x(0) to the critic to obtain J(0)
6:       Use backpropagation to calculate λ(0) = ∂J(0)/∂x(0)
7:       Insert λ(0) into the costate Eq. 7.25 to obtain λ(1)
8:       Calculate the optimal control u(0) from Eq. 7.21
9:       Get x(1) from the model Eq. 7.17
10:      Input x(1) to the critic to obtain J(1)
11:      Insert J(1) into the Bellman Eq. 7.28 to calculate a desired J_d(0)
12:    end for
13:    Train the network towards J_d(0) for all x(0) ∈ S_i
14:  until convergence of the critic network
15: end for

Comparisons between DHP and SNAC on a Van der Pol oscillator and a MEMS actuator show that the algorithms result in similar performance, but that SNAC requires slightly over half the computational load [39]. A direct comparison between HDP and J-SNAC gave a similar result [15]. From this, it is concluded that SNAC can be advantageous compared to ACDs. Its simpler implementation is another advantage. In a review by Wang et al. [49], SNAC is indeed considered an improvement over ACDs. However, (J-)SNAC is in a less mature stage. This research includes a direct comparison of HDP and J-SNAC, which is presented in Chapter 9.

7.3.4. Model-free critic-only control

In very recent work, Luo et al. [29] describe a form of Q-learning that is applicable to continuous state-action spaces. Their Critic-Only Q-Learning (CoQL) algorithm is model-free and can be used for tracking, even if the required path is not available in advance. Just like Q-learning, it is an off-policy method, which can learn the optimal policy while following another policy. It can be used for any discrete-time nonlinear system; the system does not have to be input-affine. It uses least-squares techniques to estimate the critic. The optimal control input is calculated incrementally according to Eq. 7.29.

u(t) = u(t-1) + L \left. \frac{\partial J(x(t), \tilde{u})}{\partial \tilde{u}} \right|_{\tilde{u} = u(t-1)} \qquad (7.29)

Hence, the control is updated from the previous control without considering the change in state. Although this may seem strange, the algorithm was proven to converge to the optimal value function [29]. It is tested on a highly nonlinear two-state model. The critic weights seem to stabilize after approximately 40 iterations. This makes this a very promising method for the control of the DelFly. Its disadvantage is its novelty: it has never been tested on any system related to aerospace. Also, the critic must be pre-trained offline. Online updating is not practically feasible due to the iterative critic calculations. This is the reason why CoQL was not analyzed in detail during this project. However, informal analysis shows that the critic is relatively robust to changes in the system. Future research should take this algorithm into account.

7.4. Actor-only algorithms

A very different approach is to remove the value function entirely. In actor-only approaches, there is only a policy, parametrized using ψ. The tasks are necessarily episodic: TD errors cannot be used because there is no value function. These parameters are optimized to maximize some objective function F.

The task of improving the parameters becomes a conventional optimization problem, which can be solved using any optimization method. The critical aspect of actor-only methods is the execution of this optimization. For online learning, it should be considered that some policies can crash the DelFly. Therefore, small steps from known controller settings should be taken. Gradient-based optimization is one solution, shown in Eq. 7.30.

Algorithm 7.3 Actor-only approach using inaccurate models [1]
Input: Tˆ, with a locally optimal parameter setting ψ_+
Output: ψ
1: repeat
2:   ψ ← ψ_+
3:   Execute the policy with ψ in the real system, and record the state-action trajectory x(0), u(0), x(1), u(1), ..., x(n), u(n)
4:   for t = 0, 1, 2, ..., n−1 do {Construct a time-varying model f_t(x,u) from Tˆ(x,u) and the trajectory}
5:     f_t(x,u) ← Tˆ(x,u) + [ x(t+1) − Tˆ(x(t),u(t)) ]
6:   end for
7:   Use a local policy improvement algorithm on f_t(x,u) to find a policy improvement direction d
8:   Determine the new parameters using ψ_+ ← ψ + αd. Perform a line search in the real system to find α.
9: until ψ_+ is not improved by the line search

\psi_{i+1} = \psi_i + l(t) \nabla_{\psi} F \qquad (7.30)

Determining the gradient is one of the key aspects of these methods. While ACDs can use the critic to determine a gradient estimate, this is not possible for actor-only networks. This makes it rather complicated to apply such algorithms for online learning. Actor-only algorithms intended for online learning typically make use of a limited number of parameters, in order to reduce the difficulties of determining the gradient.

7.4.1. Making use of inaccurate models

Determining the gradients is much easier if a model is available during online control. Trial runs can be performed in simulation, while only the best settings are used in the real system. This reduces the number of policies that are evaluated online, and thereby reduces the probability of using a dangerous policy. However, the model has to be very accurate for this to work correctly. Abbeel et al. [1] proposed an algorithm which uses an inaccurate model to suggest parameter changes, and uses the real system to obtain the cost F. Algorithm 7.3 [1] shows a detailed implementation of this method. The real system dynamics are represented by Eq. 7.31. This is approximated by the inaccurate model in Eq. 7.32.

x(t+1) = T(x(t), u(t)) \qquad (7.31)

\hat{x}(t+1) = \hat{T}(x(t), u(t)) \qquad (7.32)

The algorithm uses the policy improvement direction d, rather than the gradient. This is a more general method, since it allows the algorithm to look further than the direction of steepest descent, which could enable the algorithm to escape from local minima. However, using the gradient in the place of d is the most straightforward implementation.

An important step in this algorithm is the construction of the time-varying model f_t(x, u). Tˆ(x, u) provides a time-invariant representation of the system. However, it will (in general) not match the experienced trajectory exactly. By including the time-varying bias term [x(t+1) − Tˆ(x(t), u(t))] at every time step, f_t(x, u) will match the experienced trajectory. The gradient obtained from f_t(x, u) is then the gradient on the experienced trajectory. This step can also be interpreted as sampling the disturbances during the test run.

For this algorithm, convergence to a local optimum can be proven if the local improvement algorithm is policy gradient [1]. However, this is only applicable if the true dynamics T(x, u) are deterministic. Abbeel et al. [1] state that the algorithm is also applicable to systems that are close to deterministic. Considering the noisy state estimation of the DelFly, a high sensitivity to noise is problematic.

Junell et al. [21] applied algorithm 7.3 to the control of a linearized model of the F-16 aircraft. For the inaccurate model, a uniformly distributed error of 45% was introduced to every matrix element in the state equation. Even with such a large error, it was found that the algorithm finds a near-optimal policy after only sixteen trials⁴ [21].
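Step 5 of algorithm 7.3 can be written compactly as shown below. 'T_hat' and the recorded trajectory arrays are placeholders; the sketch only illustrates the bias construction, not the policy improvement itself.

```python
# Minimal sketch of step 5 of algorithm 7.3: build per-time-step models that
# reproduce the recorded trajectory exactly, f_t(x(t), u(t)) = x(t+1).
# 'T_hat' is the inaccurate model; 'xs' and 'us' hold the recorded trajectory.
def build_time_varying_model(T_hat, xs, us):
    models = []
    for t in range(len(us)):
        bias = xs[t + 1] - T_hat(xs[t], us[t])               # sampled model error at step t
        models.append(lambda x, u, b=bias: T_hat(x, u) + b)  # bind bias via default arg
    return models
```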

⁴Apart from the algorithm, the simple function approximator in this work may also contribute to the high learning speed.

The algorithm was also applied to a quadrotor [21]. No more than five trials were required to find a near-optimal policy in simulation [21]. Interestingly, quadrotor flight tests were performed in the Delft University of Technology Cyber Zoo, which will also be used for the DelFly. When starting from the default gains, it was found that the quadrotor finds its optimal performance after three trials and hits its stopping criterion (which is slightly different from the one in algorithm 7.3) after five trials [21]. However, the resulting performance is lower than in simulation.

The rapid learning and flight-proven status make this algorithm very promising for the DelFly. The possibility of applying the algorithm to the DelFly is even mentioned explicitly by Junell et al. [21]. However, the more complicated dynamics, which are known less well, make the DelFly application more difficult. Since the same optical tracking equipment is used, the noise levels for the DelFly test will be of the same order of magnitude. However, it is expected that the behavior of the DelFly will be more random, for several reasons:

• The IMU in the DelFly is expected to be of lower accuracy.

• The lower weight of the DelFly makes it more susceptible to external disturbances.

• The DelFly is much smaller than the quadrotor considered. For the same accuracy in position determination, this leads to a lower accuracy in attitude determination. However, the size of the DelFly is considered large enough to avoid real problems due to this issue.

Another demonstration of using policy gradient methods is provided by Lupashin et al. [30]. This method applies control on a higher level and uses a different algorithm, but also uses a gradient-based method with a limited number of parameters. In a different optical tracking chamber, a quadrotor is controlled to perform double and triple flips. For double flips, the performance seems to stabilize after 40 to 50 iterations [30]. This provides additional evidence that policy gradient methods allow rapid learning, even in stochastic environments. This seems to be a very promising algorithm, which should be investigated in more detail.

7.4.2. Stochastic model identification

In the previous subsection, it was discussed that an inaccurate model can be used in the actor improvement step. A logical continuation is to improve the model during operation.

This subsection will first discuss one related method, in order to ensure a logical flow to the coming methods. Ng et al. [35] applied RL control to inverted helicopter flight, where the controller learns to balance a helicopter upside-down. A stochastic model is first identified while the helicopter is controlled by a human pilot. Next, the model is frozen: it is not updated online. A gradient-based algorithm is then used to find the optimal policy in the simulation model. This policy is finally applied to the real problem. This method uses a fixed policy, and should therefore be classified as a robust method, rather than an adaptive method. At the level at which the approach is discussed in this report, it seems quite straightforward. Nevertheless, this approach is successful in balancing a helicopter upside-down, during outdoor flight tests. Two aspects of this approach are considered important:

• The real helicopter is controlled using a fixed policy. Although outdoor flight tests are subject to disturbances and the model is imperfect, this single policy is robust enough to perform its task.

• The training model is stochastic. It makes sense that a stochastic training process is required to deal with stochastic systems. The stochastic model of the helicopter assumes Gaussian errors for the one-step prediction [35].

An approach with online updating was introduced by Deisenroth and Rasmussen [13] under the name Probabilistic Inference for Learning Control (PILCO). This method calculates a stochastic model of the system during a trial with a given policy. This model is then used to update the policy, followed by a new step of policy evaluation and model updating. Interestingly, PILCO uses an analytical method to calculate the gradients of the stochastic model. PILCO has been shown to have a high data efficiency for real-life cart-pole systems [13] and helicopter hovering [31]. Ha and Yamane [20] developed a similar, but simpler method. Unfortunately, this latter method has never been applied to aerospace systems, and detailed applications to non-aerospace systems are also not available.

Online updating of stochastic models seems promising in literature, but stochastic models will be difficult to identify, because they must capture the spread in the data. Also, this requires a valid stochastic model structure. System identification should not be the focus of this project. It is therefore decided to focus on the other options first. Stochastic model updating should be revisited if other options do not give acceptable results.

7.5. High-level control methods

In the previous sections, the 'intelligence' of the RL control method was located at the lowest possible level. A different approach is to use intelligent control on a higher level, where RL is used to select a predefined controller. Because these methods make use of predefined controllers, they are classified as robust methods.

7.5.1. Helicopter hovering competitions

A lot of research has been performed on applying high-level methods to helicopter hovering control. This was largely driven by the Reinforcement Learning Competitions (http://www.rl-competition.org/) in 2008, 2009, 2013 and 2014. Helicopter hovering was one of the domains of these competitions. The competitions were performed entirely in simulation. The task in the competitions is to control an unknown three-dimensional model of a helicopter to stay close to a desired position. The model has twelve states, and the action vector is four-dimensional. The complication is that the model is altered due to wind. The wind is drawn from a random distribution that is unknown to the participants [24].

For the 2008 competition, participants were supplied with ten simulation models of the helicopter, drawn from the random distribution. However, the participants were not told how the variations were distributed. The RL controller had to be trained on these ten models. The competition was performed on fifteen new models, originating from the same distribution. Thus, the RL controller had to be able to generalize from the ten available models to the entire distribution. During the test of the competition, the controller was evaluated on 1000 episodes of ten minutes each, for fifteen models. The 2009 competition was comparable, but used different methods of randomizing the model. The exact setup of the test also varies, but it is always aimed at generalizing the RL controller from a limited number of training examples.

Koppejan and Whiteson [24] performed an extensive analysis of the 2008 and 2009 competitions. For the 2008 competition, they evaluated several model-based approaches. They won the competition with a very simple model-free approach. For each available model, an evolutionary method was used to find the optimal parameters of the controller. This results in ten controller options. The test used different models, for which the ten options were not optimal. For each unknown simulation model in the test, 1000 episodes had to be performed. The winning approach uses the first ten episodes to evaluate the performance of the ten controller options. The best performing controller was used for the remaining 990 episodes [24].

This approach is exceptionally simple. The 'intelligence' is applied only during the first ten episodes for each model. The fact that it won the competition (without ever crashing) provides evidence that robust high-level approaches may be more suitable than adaptive low-level controllers. However, this approach seems limited to low-noise systems, since noise would complicate controller selection. In this approach, the long episodes are used to overcome this issue.

After the 2008 competition, Koppejan and Whiteson [24] continued their analysis in order to further improve and generalize performance. A hybrid approach was proposed for the more general 2009 competition. From the controllers for the ten training models, the two most robust controllers were selected, where robustness was evaluated among the training models. Two out of 1000 test episodes were spent on evaluating these controllers. During these episodes, a model was identified. This model is used to identify a new optimal policy. Three optimization trials were performed and the three resulting controllers were evaluated for one episode each. The best performing controller from the five evaluated options was used for the remaining 995 runs. Additionally, a safeguard was implemented to switch to a safe baseline policy when a crash is imminent.
This results in a crash probability of approximately 1% [24]. The helicopter hovering competitions are interesting because of their similarities to the DelFly case. This is mostly because of the varying dynamics of the helicopter. The fact that the winner of the competitions used high-level methods provides a strong indication that the DelFly may also require such methods. However, there are a number of reasons why the analogy between the DelFly and the RL Competitions is not perfect:

• A helicopter in hover is typically unstable, and requires advanced nonlinear controllers for stability. The DelFly is stable in most flight conditions, and can be controlled with simple cascaded PID controllers [10]. This makes it harder to find safe controllers for the helicopter task.

• Optimizing the advanced helicopter neurocontrollers with many parameters is more difficult than optimizing controllers with few parameters, which may be suitable for the DelFly.

• In the RL Competitions, only the wind varies. Although this was unknown to the participants, this task is less general than the control of the DelFly, where everything in the dynamics may vary. Koppejan and Whiteson [24] present a more general hovering task, and propose a very different control method for this task (that method is considered to require too many samples, and will not be discussed further). This is an indication that the hybrid selection approach may not be suitable for fully generalized tasks.

• The training examples for the helicopter task are drawn from the same distribution as the test cases. For the DelFly, the aerodynamic model structure is not matched to the actual dynamics. This means that the variation between the identified models is not necessarily equal to the variation between actual DelFly’s.

• The episodes in the helicopter hovering task take ten minutes, which helps to counteract noise in the policy evaluation. Such episodes are clearly too long for the DelFly. It is unknown whether the noise and disturbance levels for the DelFly are low enough to allow policy evaluation within a suitable time.

The conclusions from the helicopter hovering competitions therefore cannot be directly generalized to the DelFly. Nevertheless, the idea of high-level selection of predefined controllers seems very suitable, and should be considered for the DelFly.
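The selection mechanism used by the competition winner can be summarized in a few lines. The sketch below is an illustration only: the candidate controllers and the `run_episode` function are assumed to be given, and no claim is made that this reproduces the implementation of [24].

```python
def select_controller(candidates, run_episode, eval_episodes=1):
    """Evaluate each predefined candidate controller for a few episodes and
    return the one with the highest mean return; the chosen controller is then
    used for all remaining episodes.

    candidates  : list of controller objects (e.g. pre-optimized gain sets)
    run_episode : callable mapping a controller to one episode's cumulative reward
    """
    scores = []
    for controller in candidates:
        returns = [run_episode(controller) for _ in range(eval_episodes)]
        scores.append(sum(returns) / len(returns))
    best_index = max(range(len(candidates)), key=lambda i: scores[i])
    return candidates[best_index]
```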

7.5.2. Q-learning for local controller selection

In the previous subsection, it was proposed to select one of several predefined controllers. This subsection considers a local variant of this approach. Molenkamp et al. [32] investigated the use of two NDI controllers for a quadrotor. The state space is discretized, and the action space consists of two possible actions, corresponding to the two NDI controllers. Q-learning is used to find the best NDI controller for every state.

Instead of using one complicated global controller, which is either adaptive or selected from a set of predefined controllers, it is possible to define multiple simple local controllers. The RL algorithm can then select the most suitable controller in any flight state. This approach may be very effective in tackling the issue of local models for the DelFly. The key difference with the goals in [32] is that for the DelFly, this method may also help against the variability between individual DelFly's.

As an example, consider two PID controllers (1 and 2), and two different DelFly's (A and B). It is possible that controller 1 provides good behavior for the whole flight envelope of DelFly A. DelFly B is different, however: it requires controller 1 for low speeds and controller 2 for high speeds. When applied in this setting, the method in [32] becomes a local variant of the approach in the previous subsection. The advantage is that the local controllers can be simple and linear. After learning, the resulting controller is similar to a linear controller with gain scheduling. However, the learning process takes considerably longer than for the previous algorithm. This approach is therefore discarded.
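A minimal sketch of this kind of controller selection with tabular Q-learning is given below. The state discretization and the `env_step` callable are assumptions made for illustration; this is not the implementation of [32].

```python
import numpy as np

def q_learning_controller_selection(n_states, n_controllers, env_step, episodes=500,
                                    alpha=0.1, gamma=0.95, epsilon=0.1):
    """Tabular Q-learning where each discrete action is the index of a predefined
    low-level controller; the selected controller is applied for one time step.

    env_step : callable (state, controller_index) -> (reward, next_state, done),
               assumed to apply the selected controller and return the result
    """
    Q = np.zeros((n_states, n_controllers))
    rng = np.random.default_rng(0)
    for _ in range(episodes):
        state, done = 0, False          # assume every episode starts in state 0
        while not done:
            # epsilon-greedy choice of which controller to use in this state
            if rng.random() < epsilon:
                a = int(rng.integers(n_controllers))
            else:
                a = int(np.argmax(Q[state]))
            reward, next_state, done = env_step(state, a)
            # standard Q-learning update towards the bootstrapped target
            target = reward + gamma * np.max(Q[next_state]) * (not done)
            Q[state, a] += alpha * (target - Q[state, a])
            state = next_state
    return Q
```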

7.5.3. Optimistic model predictive control

A new approach to the 2008 helicopter hovering competition was proposed in [31]. The method learns a stochastic Dirichlet Process Mixture Model (DPMM) of the environment. This model is optimistic when uncertainty is present. Model Predictive Control (MPC) is used to optimize the control policy. This method uses only an actor, but does not make use of experienced reward. It therefore does not fall directly into any of the categories. The algorithm is able to obtain hovering behavior after 5 ± 2 seconds [31]. Considering that an episode in the hovering task takes 600 s, this is a huge improvement over the previous results.

This method is disregarded because of its computational requirements. It is mentioned that every second of helicopter simulation requires 100 s of computation for the control algorithm [31]. Although this is highly dependent on parameters like model complexity, sample rate and the computational power of the processor, it appears that online implementation on the micro-controller of the DelFly will not be feasible in the near future. Nevertheless, this direction of research should be closely monitored. This approach is one of the main inspirations for the newly developed CAML algorithm.

7.6. Preselection from literature

Many RL controller options were discussed in the previous sections. It was found that the actor-critic algorithms have been compared reasonably well in the literature. Algorithms within the critic-only, actor-only and high-level classes, however, have hardly been compared directly. The amount of comparison between the different classes is limited as well. In no particular order, the most promising algorithms are listed below:

• HDP

• J-SNAC

• Policy gradient with inaccurate models

• High-level controller selection

It is noted that the choice of J-SNAC over SNAC is not based on direct comparison. The claims by the authors and the analogy to DHP/HDP are used to predict that J-SNAC will learn faster.

It is difficult to compare the different algorithms by means of a literature study, because the performance of the different classes is expected to be highly problem-specific. Nevertheless, some general conclusions can be drawn. It was found that fast convergence is obtained with the policy gradient algorithm in [1], within sixteen episodes for the linearized F-16 aircraft [21]. Van Kampen et al. [46] analyzed the adaptability of an HDP controller on the F-16 aircraft, and found convergence within seconds. This is much faster, but the model change in the latter study is limited to one aerodynamic coefficient, whereas the policy gradient study changed all components of the model.

Optimal feedback control theory results in the Linear Quadratic Regulator (LQR) for linear systems, which requires full state feedback. However, some application studies (e.g. [46]) have attempted to simplify the structure by only using certain states. For actor-critic and actor-only controllers, it is also possible to use a special controller structure [18, 24, 46], which adds expert knowledge to the controller. It is expected that the controller structure has a large effect on learning speed. The same can be said about the function approximators: while feedforward neural networks benefit from their universal approximation capabilities, the nonlinear backpropagation process may take much longer than required with simple linear structures.

8 Implementation and training of intelligent agents

The previous chapter introduced several Reinforcement Learning solutions for flight control. The outlines of these methods were sketched, and differences were indicated. Since the algorithms share some implementation details, these will be considered in this chapter. Section 8.1 discusses the use of neural networks as function approximators. The distinction between several solution schemes is discussed in Section 8.2.

8.1. Neural networks

Neural Networks (NNs) are inspired by the nervous system of living creatures. They consist of multiple layers, connected by weights. A simple form of the neural network, the Multi-Layer Perceptron (MLP), is shown in Figure 8.1. The MLP in Figure 8.1 is said to have one input layer, one output layer and one hidden layer, consisting of the Σ and σ operators. The elements are called neurons: there are two input neurons, two output neurons and three hidden neurons. The hidden neurons consist of the sum and σ operators. Often, only the σ is drawn, and the summation is implicit.

Every input u_i is fed to every sum operator. Additionally, a bias of 1 is connected as an additional input. All these connections are weighted by scalars w^1_{j,i}. For each hidden neuron, they are summed to result in f_j. This is shown in Eq. 8.1. This relation can also be written in matrix form, if the bias is considered an additional input (ū = [u^T 1]^T). This is done by placing the weights at the position in the weight matrix W̄^1 corresponding to their indices. The resulting relation is shown in Eq. 8.2. The notation is adapted from [17] and [46].

Figure 8.1: Schematic representation of the Multi-Layer Perceptron. Some weights are named to illustrate the numbering convention.


f_j = \sum_{i=1}^{m} w^1_{j,i} u_i + w^1_{b,j}    (8.1)

f = \bar{W}^1 \bar{u}    (8.2)

The f_j are processed separately by the σ operators. These activation functions, which are usually but not necessarily equal for all neurons, perform a nonlinear function on their inputs. These are often sigmoid functions. In ACDs, this is commonly the scaled exponential form of the hyperbolic tangent [42, 46], shown in Eqs. 8.3 and 8.4. Another example is the logistic function in Eq. 8.5. The derivatives of these functions are also shown. LeCun et al. [25] argue that the hyperbolic tangent function is better, because it is symmetric about the origin.

\sigma(f) = \frac{1 - e^{-f}}{1 + e^{-f}}, \qquad \sigma' = \frac{d\sigma(f)}{df} = \frac{1}{2}\left(1 - \sigma^2(f)\right)    (8.3)

\sigma(f) = \tanh(f), \qquad \sigma' = \frac{d\sigma(f)}{df} = 1 - \sigma^2(f)    (8.4)

\sigma(f) = \frac{1}{1 + e^{-f}}, \qquad \sigma' = \frac{d\sigma(f)}{df} = \sigma(f)\left(1 - \sigma(f)\right)    (8.5)

The step from the hidden layer to the output layer is very similar to the step from the input layer to the hidden layer. This is shown in Eqs. 8.6 and 8.7, with ḡ = [g^T 1]^T [17, 46].

y_k = \sum_{j=1}^{m} w^2_{k,j} g_j + w^2_{b,k}    (8.6)

y = \bar{W}^2 \bar{g}    (8.7)

The output of the network is a linear sum of the outputs of the hidden neurons (and the bias). For actor networks, an option is to replace the linear output neurons by appropriately scaled sigmoids [46]. With this method, actuator constraints can be forced onto the actor output. If a sufficient number of hidden neurons is available, the MLP with a single hidden layer of sigmoids can approximate any continuous function on the unit hypercube [9]. The output bias is not strictly required for this. MLPs with multiple hidden layers are referred to as deep neural networks.

Neural networks are often trained by means of backpropagation [25]. This is a gradient descent algorithm, which makes use of the explicit knowledge of the sigmoid function. The key is to determine the gradient of some cost function with respect to the parameters. If a cost function C = h(y) is known, one can easily determine ∂C/∂y as the gradient with respect to the outputs.

First, consider the case with one output. The output relation can then be written as y = w^T ḡ, with a column vector w. Note that this column vector corresponds to a row in the matrix W̄^2. With this notation, the gradient with respect to the output weights is determined according to Eq. 8.8 [17, 46].

\frac{\partial C}{\partial w} = \frac{\partial C}{\partial y}\frac{\partial y}{\partial w} = \frac{\partial C}{\partial y}\,\bar{g}^T    (8.8)

If multiple outputs are present, the weights form a matrix. The chain rule cannot be applied directly in this case, since it would require the derivative of the vector y with respect to the matrix W̄^2, which is not well-defined. This is solved by writing the individual outputs y_k = (w^2_k)^T ḡ, and using Eq. 8.9 [17, 46]. This can be done because the outputs do not depend on all weights in W̄^2; they only depend on the weights in one row.

\frac{\partial C}{\partial w^2_k} = \frac{\partial C}{\partial y_k}\frac{\partial y_k}{\partial w^2_k} = \frac{\partial C}{\partial y_k}\,\bar{g}^T    (8.9)

This gradient can be used to update the output weights, along with the output bias. The input weights are determined by backpropagation through the hidden neurons. First, the derivative with respect to the vector ḡ is determined using Eq. 8.10 [17, 46]. Since the partial derivative with respect to the output bias is unimportant, the last component of this (row) vector can be discarded to arrive at ∂C/∂g. The gradient with respect to the hidden neuron input f is then calculated according to Eq. 8.11 [17, 46].

\frac{\partial C}{\partial \bar{g}} = \sum_k \frac{\partial C}{\partial y_k}\frac{\partial y_k}{\partial \bar{g}} = \sum_k \frac{\partial C}{\partial y_k}\frac{\partial \left(w^2_k\right)^T \bar{g}}{\partial \bar{g}} = \sum_k \frac{\partial C}{\partial y_k}\left(w^2_k\right)^T    (8.10)

\frac{\partial C}{\partial f} = \frac{\partial C}{\partial g}\frac{\partial g}{\partial f}    (8.11)

Backpropagation through the hidden neurons occurs by using the analytical expressions for the derivatives of the activation functions in one of the Eqs. 8.3 to 8.5. Since every component of g only depends on the corresponding component of f, ∂g/∂f will be a diagonal matrix. The next step is to determine the derivatives with respect to the input weights. This is done in the same way as in Eq. 8.9, by splitting up the matrix W̄^1 into vectors w^1_j. This results in Eq. 8.12 [17, 46]. If desired, the gradient with respect to the inputs can be calculated according to Eq. 8.13 [17, 46].

\frac{\partial C}{\partial w^1_j} = \frac{\partial C}{\partial f_j}\frac{\partial f_j}{\partial w^1_j} = \frac{\partial C}{\partial f_j}\,\bar{u}^T    (8.12)

\frac{\partial C}{\partial \bar{u}} = \sum_j \frac{\partial C}{\partial f_j}\frac{\partial f_j}{\partial \bar{u}} = \sum_j \frac{\partial C}{\partial f_j}\frac{\partial \left(w^1_j\right)^T \bar{u}}{\partial \bar{u}} = \sum_j \frac{\partial C}{\partial f_j}\left(w^1_j\right)^T    (8.13)

For deeper NNs, this procedure can be continued for every layer. Once the gradients with respect to all weights are known, the weights are updated in the direction of the negative gradient to minimize the cost. This can be done according to Eq. 8.14, with learning rate l(t), for every layer transition a and with the receiving neurons distinguished by b.

\Delta w^a_b = -l(t)\left[\frac{\partial C}{\partial w^a_b}\right]^T    (8.14)

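As an illustration of Eqs. 8.1 to 8.14, a minimal sketch of a single-hidden-layer MLP with tanh activations (Eq. 8.4) and a plain gradient descent update is given below. It assumes a quadratic cost C = ½‖y − y_target‖²; the variable names, shapes and initialization are illustrative and not taken from the references.

```python
import numpy as np

class SmallMLP:
    """Single-hidden-layer perceptron with tanh hidden neurons (Eq. 8.4) and
    linear outputs, trained by plain backpropagation (Eqs. 8.8-8.14)."""

    def __init__(self, n_in, n_hidden, n_out, rng=np.random.default_rng(0)):
        # W1 and W2 include the bias column, matching u_bar = [u; 1] and g_bar = [g; 1]
        self.W1 = 0.1 * rng.standard_normal((n_hidden, n_in + 1))
        self.W2 = 0.1 * rng.standard_normal((n_out, n_hidden + 1))

    def forward(self, u):
        u_bar = np.append(u, 1.0)            # bias-augmented input, Eq. 8.2
        f = self.W1 @ u_bar                  # hidden neuron inputs, Eq. 8.1
        g = np.tanh(f)                       # activation, Eq. 8.4
        g_bar = np.append(g, 1.0)
        y = self.W2 @ g_bar                  # linear output layer, Eq. 8.7
        return y, (u_bar, f, g, g_bar)

    def train_step(self, u, y_target, lr=1e-2):
        y, (u_bar, f, g, g_bar) = self.forward(u)
        dC_dy = y - y_target                 # gradient of C = 0.5*||y - y_target||^2
        # Output-layer weight gradients, Eq. 8.9 (one outer product covers all rows)
        dC_dW2 = np.outer(dC_dy, g_bar)
        # Backpropagate to the hidden layer: Eqs. 8.10 and 8.11
        dC_dg = self.W2[:, :-1].T @ dC_dy    # drop the bias column
        dC_df = dC_dg * (1.0 - g ** 2)       # diagonal dg/df for tanh
        # Input-layer weight gradients, Eq. 8.12
        dC_dW1 = np.outer(dC_df, u_bar)
        # Gradient descent update, Eq. 8.14
        self.W2 -= lr * dC_dW2
        self.W1 -= lr * dC_dW1
        return 0.5 * float(dC_dy @ dC_dy)    # return the cost for monitoring
```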
Backpropagation can be performed every time a new sample becomes available. This method is called stochastic learning [25]. The other end of the spectrum is called batch learning [25], where the sum (or average) of the cost over all samples is minimized. LeCun et al. [25] claim that stochastic learning has the advantage of being faster. It is also said to lead to better performance, because the noise in the estimation of the gradient leads to exploration. A third advantage is that it can be used to track changes [25]. The advantages of batch learning are due to a better understanding of the theory, which comes with some more advanced techniques [25]. However, Du and Swamy [17] report faster learning and higher accuracy for batch learning in a case study considering classification. Mini-batch learning uses a small batch of stochastic samples, and can be considered a mixture of the previous two methods. Some practical tricks for training neural networks are given below [25]:

• The training data should be shuffled to learn faster. This is because similar sequential examples do not provide different gradients to the NN. Unfortunately, this is very difficult in online applications.

• The inputs should be normalized appropriately to have near-zero averages and similar variances.

• The weights should be initialized appropriately.

• The learning rates should be selected correctly. They should preferably not be the same for the different neurons. Learning rates for the input weights are typically larger than those for the output weights.

• The learning rates can be varied over time.

With Reinforcement Learning, special care should be taken to maximize data efficiency. This can be done, for example, by performing stochastic learning during each episode, combined with batch learning on the data after each episode.
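As an illustration of this mixture, a minimal sketch is given below. The `net` object and `run_episode` generator are assumed interfaces (for instance the MLP sketched earlier), and the post-episode "batch" pass is implemented here as repeated sweeps over the stored episode data rather than as a single aggregated gradient step.

```python
def train_episode_then_batch(net, run_episode, batch_passes=5, lr=1e-2):
    """Combine stochastic and batch-style learning to reuse scarce flight data:
    update after every sample during the episode, then sweep the stored episode
    data several more times afterwards.

    net         : function approximator with a train_step(u, target, lr) method
    run_episode : callable yielding (input, target) pairs as they are collected
    """
    episode_data = []
    for u, target in run_episode():
        net.train_step(u, target, lr)        # stochastic update during the episode
        episode_data.append((u, target))
    for _ in range(batch_passes):            # re-use the stored data afterwards
        for u, target in episode_data:
            net.train_step(u, target, lr)
    return episode_data
```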

8.2. Solution schemes

The actor-critic and critic-only methods in Chapter 7 are aimed at solving the Bellman equations. It was mentioned that the temporal difference schemes can be implemented in several ways. This section discusses some implementations. The Bellman equation is restated in Eq. 8.15.

J(t) = \gamma J(t+1) + U(t)    (8.15)

Wang et al. [49] highlight that this equation can be solved iteratively in two ways:

• Take J(t+1) as constant, and update J(t) towards γJ(t+1) + U(t). This is the forward-in-time approach.

• Take J(t) as constant, and update J(t+1) towards (1/γ)(J(t) − U(t)). This is the backward-in-time approach.

The first approach is used in discrete dynamic programming. In both cases, an issue is to obtain J(t+1). With the first approach, this is often done by means of a model, if that is available. An alternative approach is to wait one time step, and to solve the Bellman equation in the form of Eq. 8.16.

J(t-1) = \gamma J(t) + U(t-1)    (8.16)

This makes it possible to solve the critic equation without a model. The critic weights at time t−1 are used to determine the error in the equation. In the forward-in-time approach, the right-hand side of Eq. 8.16 is fixed (with the weights at time t−1), and J(t−1) is moved by adapting the critic weights. This results in the weights at time t. The value J(t) must be recomputed for the next time step, in order to avoid using old weights. If the value function is action dependent, the actor must also be recomputed.

This recomputation makes it more logical to apply backward-in-time approaches if a model is not available. This was done in the model-free context of ADHDP [18, 42]. The backward-in-time approach may pose convergence issues. In preliminary research, this approach only converged with a negative learning rate. The literature [18, 42] reports positive learning rates, however. The backward-in-time approach is most useful for ADHDP, in order to allow model-free control. In this research, HDP will be applied with a forward-in-time approach. This is done because the model is required anyway; there seems to be no reason to wait one time step. Also, this results in a fair comparison with J-SNAC, which also uses a model-based forward-in-time approach.

In all cases that were discussed, the states x(t) and (a prediction of) x(t+1) are available. One may wonder why it is not more logical to vary both J(t) and J(t+1) in order to bring the critic error to zero. Werbos [51] performed a consistency analysis of HDP on a simple system. He concluded that if disturbances are present, moving both values with gradient descent gives an incorrect result. The reader is referred to Werbos [51] for a detailed derivation. An intuitive explanation is provided in this section. When minimizing the squared critic error, the gradient direction is given in Eq. 8.17.

\frac{\partial E_c(t)}{\partial \theta_i} = \frac{\partial E_c(t)}{\partial e_c(t)}\frac{\partial e_c(t)}{\partial \theta_i} = e_c(t)\frac{\partial e_c(t)}{\partial \theta_i}    (8.17)

The term e_c(t) contains J(t+1), and is therefore dependent on the disturbance. If J(t+1) is not held fixed, ∂J(t+1)/∂θ_i ≠ 0. This means that ∂e_c(t)/∂θ_i will also depend on the disturbance, and a quadratic disturbance term appears in Eq. 8.17! This term does not go to zero in expectation, resulting in a bias in the gradient. Werbos [51] found that this results in incorrect value functions for a simple system if both J(t) and J(t+1) are moved, while the forward-in-time approach gives the correct result. By the reasoning in this section, it seems that the backward-in-time approach will also give incorrect results.

The goal of this iterative update rule is to minimize the squared TD error in the critic. If the function approximator is linear in its parameters, it is also possible to use least-squares methods to minimize this error for multiple time steps at once. Least-squares methods have been widely used to update critic weights in Reinforcement Learning [29]. An in-depth review of least-squares methods for discrete MDPs is provided by Buşoniu et al. [4]. The elaboration in this report is based on the approach in [29]. The critic error is restated in Eq. 8.18. If the function approximator for the critic is linear in its parameters θ, it can be written as in Eq. 8.19, where φ(x,u) denotes a column vector of basis functions. The critic error can then be written as in Eq. 8.20 [29].

e_c(t) = J(x(t),u(t)) - \left[\gamma J(x(t+1),u(t+1)) + U(x(t),u(t))\right]    (8.18)

J(x,u) = \phi^T(x,u)\,\theta    (8.19)

e_c(t) = \left[\phi(x(t),u(t)) - \gamma\phi(x(t+1),u(t+1))\right]^T \theta - U(x(t),u(t))    (8.20)

T  £φ(0) γφ(1)¤  −  £ ¤T   φ(1) γφ(2)  A  −.  (8.21) =  .   .  T £φ(N 1) γφ(N)¤ − − T B £U(0) U(1) ... U(N 1)¤ (8.22) = − £ T ¤ 1 T θ A A − A B (8.23) = This provides the optimal solution for θ over the whole data set. It is therefore not bothered by the forget- ting issues of backpropagation. The least-squares process may also be written in a recursive form using the Recursive Least Squares (RLS) algorithm [22]. If the dynamics of the system are time-varying, it is possible to introduce a forgetting factor.

9 Algorithm preselection

After the literature survey, several algorithms remained. It was explained in Section 2.4 that several intermediate tests were performed before the final test problem. This chapter focuses on these tests. Sections 9.1, 9.2 and 9.3 describe the three problems in the preliminary test phase. The Step test is elaborated in Section 9.4.

9.1. Complex dynamics

The first preliminary test problem is aimed at finding the required complexity of the controller structure. This problem directly compares HDP and J-SNAC.

9.1.1. Problem description

The system to be controlled in this test problem is a linearized DelFly model. A model with trimmed condition at 0.6 m/s was taken for this. The model is rewritten to use the horizontal speed in the earth-fixed reference frame as a state, rather than the axial velocity. The goal of the controller is to regulate the DelFly to its trimmed speed state (which is zero due to the use of a linearized model). The reward function in Eq. 9.1 punishes the squared deviation from this state, as well as the squared control deflection. There is no penalty for divergence. This test problem is deterministic: measurement noise and disturbances are not present. The problem is undiscounted.

U = -Qv^2 - Ru^2    (9.1)

The strength of this problem is that the optimal control solution is known, and given by the Linear Quadratic Regulator (LQR). The LQR solution results in the optimal feedback control u = Kx, where linear full-state feedback is used. Also, the optimal value function is available. This is a quadratic function of the full state (J = x^T P x). Although the reward function depends on only one of the states, the value function is dependent on each of the states. The reward function is selected such that the value function is close to one on average.

It is known that gradient descent algorithms perform much better if the states are of the same order of magnitude. Because of this, the states are divided by a reference value of x_ref = [0.36  0.09  0.10  0.07]^T. These values were estimated by looking at simulated model responses to a 3211 input. Because of this, the ratios between the reference states are motivated, but their absolute values are determined only by the magnitude of the control deflection during the simulation. When using the normalized state, the matrices of the optimal feedback and value function are given by Eq. 9.2.

 0.0377 0.0129 0.00262 0.0693  − − − − £ ¤  0.0129 0.0962 0.00632 0.0331  K 0.999 0.199 0.0354 1.78 P  − − − −  (9.2) = − − − =  0.00262 0.00632 0.00224 0.00608  − − − − 0.0693 0.0331 0.00608 0.129 − − − −


The value function is dominated by V and θ. The optimal control input is influenced most by q and θ. It can be seen that w hardly affects any of these functions. For this reason, it was decided to develop a controller based on q, V and θ. J-SNAC has only been flight-proven with full state feedback. It is therefore interesting to investigate whether it is possible to use this algorithm for control of partially modeled dynamics. Theoretically, the value function (implemented by the critic) is a quadratic form, and the optimal feedback (implemented by the actor) is linear. In the simplest implementation considered, this knowledge is exploited by using a second order polynomial for the critic, and a linear feedback matrix for the actor. Also, feedforward Neural Networks (NNs) are attempted with backpropagation. Ten hidden neurons with sigmoid activation functions are used.

The models in this test problem are linear black-box models. Just like for the actor and critic, only three states are used for the models. They are identified online using either backpropagation or recursive least squares. The updating method for the model always matches the updating method for the critic.

With HDP, it is common to pre-train the different parts of the controller separately, before starting Reinforcement Learning. However, it was found that large 'shocks' still occur once Reinforcement Learning is started. In this test problem, it is instead attempted to initialize the critic's (output) weights at zero. This makes sure that the actor initially does not learn (because the derivatives of the critic are zero as well). The model weights are initialized randomly, and are typically learned more rapidly than the critic and actor weights. The actor's weights are also initialized randomly. For the linear actor, however, instability is often obtained for random weights. The linear actor's weights are therefore initialized randomly between 0 and 2 times the LQR optimal weights. When RLS is used, the parameter covariance matrices are initialized with 10000 on all diagonal elements, and 0 on all off-diagonal elements.

The performance measure during this test problem is the cumulative reward obtained during the test. This is divided by the number of episodes to result in the average cumulative reward per episode. Although multiple episodes are considered, the task is not episodic; the DelFly is only reset to a new initial state when certain conditions are met. On this resetting step, the controller is not updated. With random initial states, it was found that the initial state has a much larger effect on the reward than the controller settings. Therefore, a large number of trials is required to find the effect of the controller settings. This is circumvented by using the optimal value function from the LQR. The controller is always started at a random state with J = −1.1, where care is taken to avoid excessively large states. This makes it possible to compare the performance of the RL controllers to an optimum. Also, it adds some regularity to the task, which may be exploited by the Neural Network-based controllers.
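The LQR benchmark used in this test can be reproduced along the following lines. This is a minimal sketch for a generic discrete-time linear model: the system matrices in the usage example are placeholders (not the DelFly model), and the sign convention must be adapted because the reward in Eq. 9.1 is the negative of the usual quadratic cost.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

def lqr_gain_and_value(A, B, Q, R):
    """Discrete-time LQR: returns the feedback gain K (u = -K x) and the
    cost-to-go matrix P (cost = x^T P x) for the quadratic stage cost
    x^T Q x + u^T R u. With the negative-reward convention of Eq. 9.1,
    the value function corresponds to -x^T P x."""
    P = solve_discrete_are(A, B, Q, R)
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return K, P

# Hypothetical usage with placeholder system matrices (not the DelFly model):
A = np.array([[1.0, 0.1],
              [0.0, 0.95]])
B = np.array([[0.0],
              [0.1]])
K, P = lqr_gain_and_value(A, B, Q=np.diag([1.0, 0.0]), R=np.array([[0.1]]))
```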

9.1.2. Results

With J-SNAC, it is found that the initialization is very important. No converging runs were found when using NNs. With the linear-quadratic function approximators, it was found that RLS leads to large sudden changes in the critic, which are amplified to result in even larger changes in the selected action. With backpropagation, seven out of ten runs diverge.

A problem occurs when the model used for J-SNAC is learned online. If the actor is linear, the control deflection is a linear function of the states. This makes it impossible for the model to distinguish between the influence of the states and the influence of the control deflections. It was expected that the variation in the actor weights would solve this problem, but it is found that the learning often gets stuck in a suboptimal point. Adding exploratory action selection does not help sufficiently. The literature [14] recommends a different model updating method with Lyapunov stability, but this was not attempted. Another limitation of this approach is that stochastic gradient descent was used with the backpropagation algorithms, while the original J-SNAC publication [15] recommends a special batch updating method (which is only suitable for offline learning). Informal analysis with mini-batch updating did not yield improved results.

HDP was implemented with both types of function approximators. The polynomial critic learns most rapidly. However, poor convergence is found with backpropagation. This is expected to be due to the crude parameter settings, which are all powers of ten. With RLS, rapid convergence to a global optimum is found for all runs considered. With NNs as function approximators, converging behavior was found. However, learning is slower than with linear-quadratic function approximators. The relation between the learning rates of the actor, critic and model is very important for convergence. This seems to be more important than the magnitudes of these learning rates. It is expected that more accurate hyperparameter tuning may improve performance. With the best settings, all runs converge to a local minimum, but none of them finds the global optimum.

9.2. Variability

The goal of the Variability test problem is to assess the adaptability of the RL controllers. Two DelFly's are used for this problem: one model trimmed at 0.6 m/s (DelFly A) and one model trimmed at 1 m/s (DelFly B).

9.2.1. Problem description

A regulator problem is performed on the two linearized models. The controller is first trained for DelFly A, until the optimum is found. Then, it is used as an initial setting for DelFly B. The controller task is exactly the same as for the Complex dynamics test.

The same combinations of algorithms, function approximators and update methods as during the Complex dynamics test were attempted. Even though certain combinations did not perform well, it is possible that the smaller changes and better initialization during the Variability test make some combinations feasible.

When using RLS, the parameter covariance matrix reflects the confidence in the current critic/model. It was previously initialized at very high values to reflect the limited confidence. In this test problem, the initialization is considered a variable in the optimization process. The same value is used on all diagonal elements of the critic and model.

The training of the actor and critic for DelFly A can be skipped by using the LQR solution. The optimal actions and values are obtained from the LQR solution, and the actor and critic are trained towards these values in a supervised setting. For the linear-quadratic function approximator, the correct values can be set directly.

9.2.2. Results

In the previous test, it was found that HDP with polynomials and backpropagation diverges in most of the cases. This test does not show divergence for any of the ten trials (with the best parameter setting). However, only part of the controllers converges to a global optimum; the others are still drifting around without clearly converging or diverging. RLS performs better: with the best initialization of the parameter covariance matrices, all runs converge to the global optimum.

The pre-trained NN actor and critic are based on the LQR solution, and therefore look like a linear actor and quadratic critic. It is found that the controller can learn effectively within 500 episodes, but the optimal control solution is not reached. This number of episodes is clearly too large for any practical application, but it may be possible to lower this number by performing simulated episodes as well.

J-SNAC is also attempted for the Variability problem. With backpropagation, converging runs were again not obtained. J-SNAC with RLS is very sensitive to parameter settings: most settings result in ten diverging runs, but some result in ten converging runs. Even during these runs, performance is not as good as with HDP. It is therefore concluded that HDP is more suitable for the current application.

Some additional tests were performed on this test problem. HDP was attempted with linear-quadratic function approximators that make use of all four states, using backpropagation. Even though the fourth state is hardly used by the actor and critic, this results in much faster learning: the global optimum is found within 200 episodes. This may be because the perfect solution can now be formed by the actor and critic. It was attempted to add some noise to the measurements, to avoid finding this perfect solution. Even with Gaussian noise with a standard deviation of 5% of the RMS values of the states, performance hardly degrades. It is concluded from this that the HDP algorithm is sensitive to unmodeled dynamics, and should preferably be used with full state feedback.

J-SNAC was also attempted with full state feedback. Adding the fourth state improves performance over the three-state version of the algorithm. However, two out of ten runs now diverge. The performance of the four-state J-SNAC algorithm is still not as good as the performance of the four-state HDP algorithm. Furthermore, the performance greatly decreases when noise is added, and six out of ten runs diverge in that case.

9.3. Local models

The available simulation models of the DelFly are all local linear models, which have been validated only for specific conditions. A challenge of this project is that a global controller should be trained for these models. This can be done in several ways:

• A global simulation model can be formed from the local models, and Reinforcement Learning is performed on this global model. A problem is that these global models have not been validated.

• Several local controllers can be trained for the local models, which are connected to form a global controller.

Figure 9.1: Systems used during the Local models test. (a) Setup of the training pendulum: the length is shorter when the pendulum swings right. (b) Setup of the testing pendulum: the length varies with deflection.

In both cases the training model will have a structure that does not necessarily match the underlying dynamics of the DelFly. The goal of the Local models test is to analyze which training method is more suitable.

9.3.1. Problem description

This test is performed on a simpler model than the DelFly. The model of interest describes the motion of a non-inverted pendulum. The motion is linearized to simplify the analysis. The pendulum is lightly damped, and a control actuator can apply a force to the mass. The task of the controller is to regulate the deflection to zero. The reward function punishes the squares of the deflection and the input force. The problem is undiscounted. Even though the pendulum is naturally stable, the controller can improve the damping considerably. The interesting part of the pendulum is its asymmetry: when it swings left, it behaves differently than when it swings right. The pendulum that is used for training has a length of 1 m during left deflections, and 0.5 m during right deflections. A graphical depiction is shown in Figure 9.1a.

HDP with NNs for the actor, critic and model is used for this problem. Applying this to the system directly is possible, but often results in local optima. The LQR solutions can be used for pre-training of the controller. When separate local controllers are trained, the left and right LQR solutions provide the perfect target for the left and right controllers. Supervised learning can be used to obtain the perfect NN actor and critic. For pre-training of the global controller, supervised learning is used first, where the targets are provided by the LQR solutions on both sides. This results in NNs that try to approximate a discontinuity at zero deflection. The pre-trained global controller was trained further with RL on the training pendulum to smooth away this discontinuity.

Just like in the Variability test, a different system is used to test the controller. The test pendulum has a linearly varying length, from 1 m at a -0.25 m deflection to 0.5 m at a 0.25 m deflection. A graphical depiction is shown in Figure 9.1b. (The linearization of the dynamics loses its validity at such large deflections; however, the intention is not to model a physical system accurately, and the linearized problem remains interesting from a Reinforcement Learning point of view. It is also acknowledged that the length of this pendulum, if built in real life, would not vary linearly with deflection; the figure is intended for illustration purposes only.) Both controllers are assessed on their adaptability to this test pendulum.
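To make the asymmetric setup concrete, a minimal sketch of the training and testing pendulum dynamics is given below. The mass, damping and integration step are placeholder values, and the linearized equation of motion is an assumption consistent with the description above rather than the exact model used in this project.

```python
import numpy as np

def asymmetric_pendulum_step(x, x_dot, force, dt=0.01, mass=1.0, damping=0.05, g=9.81):
    """One Euler integration step of a linearized, lightly damped pendulum used
    for training: the length is 1 m for left (negative) deflections and 0.5 m
    for right (positive) deflections. Mass, damping and dt are placeholders."""
    length = 1.0 if x < 0.0 else 0.5
    # Linearized dynamics: m*x_ddot = -(m*g/length)*x - damping*x_dot + force
    x_ddot = -(g / length) * x - (damping / mass) * x_dot + force / mass
    x_dot_new = x_dot + dt * x_ddot
    x_new = x + dt * x_dot_new
    return x_new, x_dot_new

def test_pendulum_length(x):
    """Length of the testing pendulum, varying linearly from 1 m at x = -0.25 m
    to 0.5 m at x = +0.25 m (clipped outside this range)."""
    x_clipped = np.clip(x, -0.25, 0.25)
    return 1.0 - (x_clipped + 0.25) / 0.5 * 0.5
```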

9.3.2. Results

A first conclusion of the Local models test was obtained during pre-training. It was found that applying Reinforcement Learning from scratch was ineffective. This is rather unexpected, since the same algorithm did learn to control the DelFly during the previous two tests for a similar task. The asymmetric pendulum system has lower damping than the DelFly considered in the previous tests, which could explain some difficulty in learning. However, applying Reinforcement Learning to unstable systems is very common in the literature.

Figure 9.2: Learning process of the undiscounted and discounted controllers on the Local models test. A discounted controller learning from scratch is shown for comparison. (a) Temporal Difference error. (b) Mean cumulative reward per episode (equivalent undiscounted reward for discounted controllers). (c) Mean cumulative discounted reward per episode.

The local controllers are able to improve the performance of the critic over the initial values. The learning process is shown in Figure 9.2. For the best settings, all ten runs converge to a lower average Temporal Difference error. However, the actor does not seem to improve much, when judged by the mean cumulative utility per episode ⟨ΣU⟩ in Figure 9.2b. No parameter setting was found for which the actor improves over time.

When testing the globally trained controller, it is found that much lower learning rates are required for the actor and critic to avoid divergence. Even when the learning rate is set 1000 times lower, good convergence is not obtained. It is found that the critic is actually diverging very slowly, just like the actor (the model learns as expected). Optimizing the learning rates of the input and output weights of the neural networks separately did not result in convergence. (Note that only powers of ten were considered for the learning rates; finer optimization may improve performance.)

9.3.3. Discounted learning

During preliminary research, it was found heuristically that the discount factor has a large effect on the learning speed of an HDP controller: lower discount factors result in faster learning. The current problem uses the highest possible discount factor, γ = 1. Note that this problem is a regulator problem, and therefore has a final state, which will not be left once it is reached. With the current reward function, this state should clearly have a value of 0.

This can be modeled as a one-state discrete Reinforcement Learning problem. Using Temporal Difference updating, which is also used via backpropagation in HDP, the value function is updated according to Eq. 9.3.

J_{i+1} = \gamma J_i + U    (9.3)

In this perfect state, U = 0. When the value function is initialized at some random value, it will asymptotically converge to zero if |γ| < 1. For an undiscounted problem, however, this value will remain constant! This reasoning can be extended to the continuous case. With HDP, the value function at the current state is updated towards the value of the next state plus the experienced reward, according to Eq. 9.4 (with α some carefully selected step size).

J(t) \leftarrow J(t) + \alpha\left(\gamma J(t+1) + U - J(t)\right)    (9.4)

Note that, if γ = 1, the Temporal Difference error does not change if both J(t) and J(t+1) are changed by the same constant. Therefore, the optimal value function has one degree of freedom: it can be shifted by a constant. During the Complex dynamics and Variability tests, it was indeed found that the critic can change by a constant without affecting the Temporal Difference error. This also means that with γ = 1, the magnitude of the value function is irrelevant: it is only the difference between the values of two consecutive states that matters. The slope of the value function is therefore more important than the value function itself.

As a solution, it is proposed to use a discounted problem. It was found empirically that by redefining the reward function for a given linear system, a discounted problem with the same optimal control feedback law as the original undiscounted problem can be defined. This only works for a given system, but applying the new reward function to a similar system results in behavior that is close to optimal for the original reward function.

When using an equivalent discounted problem with γ = 0.95, the critic error can be reduced effectively in the Local models test, as shown in Figure 9.2a. It is found that the locally trained controller learns more rapidly than the globally trained controller. Both controllers can make use of the same learning rates now. When looking at the actor, however, improvements are again not found. Figure 9.2b shows the equivalent undiscounted reward, which does not improve. The mean cumulative discounted reward ⟨ΣU_d⟩, which is directly fed to the controller, is depicted in Figure 9.2c; this also does not improve. It was also attempted to train a discounted controller from scratch, directly on the test pendulum (also included in Figure 9.2). It is found that this controller learns effectively, but does not surpass the performance of any of the previous systems. It is therefore concluded that the pre-trained controllers are already too close to optimal to result in any notable learning.
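The argument around Eqs. 9.3 and 9.4 can be verified with a few lines of code: for a single absorbing state with zero utility, the value estimate decays to zero only if |γ| < 1. The initial value and step size below are arbitrary illustrative choices.

```python
def one_state_td(gamma, J0=1.0, alpha=0.5, iterations=100):
    """Iterate Eq. 9.3/9.4 for a single absorbing state with zero utility.
    With |gamma| < 1 the value decays to zero; with gamma = 1 it never moves."""
    J = J0
    for _ in range(iterations):
        U = 0.0                              # reward in the perfect state
        J = J + alpha * (gamma * J + U - J)  # Eq. 9.4 with J(t+1) = J(t)
    return J

print(one_state_td(gamma=0.95))  # decays towards 0
print(one_state_td(gamma=1.0))   # stays at the initial value of 1.0
```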

9.4. Step test

After the preliminary test phase, three algorithms remain: the Policy Gradient algorithm, CAML and HDP. These algorithms are implemented on a DelFly system, accounting for variability.

9.4.1. Problem description

The three algorithms are compared on an early version of the final problem. This problem is similar to the final Tracking test, as described in Chapter 3, but has a number of differences.

• There is no pitch rate punishment: the reward function is only based on velocity.

• The 46 models used in this test are identified with Ordinary Least Squares (OLS), rather than Maximum Likelihood Estimation (MLE).

• All models are re-trimmed to the same trim point. This is done because the A-matrices are dependent on the trim settings. The aerodynamic coefficients are separated from the kinematics, and all models are placed at the same trim point (± 0.6 m/s) to avoid this issue.

The pitch rate punishment, which was added after this test, results in improved damping, as explained in Chapter 3. Using OLS rather than MLE seems like a small difference, but the OLS models are easier to control.

Figure 9.3: Results of an early test problem, showing the reward per episode, score S and success rate for the PG (PI, 2PI, PIDN), CAML (PI, 2PI) and HDP controller variants. These are not comparable to the final results.

This test uses the same algorithms as the Tracking test, but two additional options are used. First, this test contains a Policy Gradient PID controller. A derivative filter is installed to avoid problems with tracking the step, such that a PIDN controller is obtained. Several filter coefficients N were attempted. An attempt was made to design PIDN controllers for CAML as well. However, it was found that the offline optimization during the gain generation process results in widely varying gains if PIDN is used. Also, the derivative term results in lower safety. (These issues could possibly be solved by the pitch rate punishment and step size limit that were added later.) This option was therefore discarded.

The second addition is that HDP was attempted on this problem. In agreement with the Variability test, all four states were used for the actor and critic, and the reference velocity was added as a fifth state. Taking into account the findings in Section 9.3, a discount factor of 0.9 was used. It was found that using Neural Networks results in much slower learning; linear-quadratic approximators are used instead. The actor is affine and has the structure of Eq. 9.5, with c the climb rate.

\delta_e = \begin{bmatrix} K_q & K_V & K_c & K_\theta & K_{V_{ref}} & K_0 \end{bmatrix} \begin{bmatrix} q & V & c & \theta & V_{ref} & 1 \end{bmatrix}^T    (9.5)

The critic is a polynomial of degree two, using all states (including the reference velocity). The model is linear. The parameters were initialized at the LQR solution of the average model. It was found that this often results in initially unstable controllers. A better actor initialization is found by trial and error (stabilizing 44 of the 46 models), and the optimal value function with this actor is determined.
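A direct transcription of the affine actor of Eq. 9.5 is sketched below; the gain values in the usage example are made up for illustration and are not the tuned DelFly gains.

```python
import numpy as np

def affine_actor(q, V, c, theta, V_ref, gains):
    """Affine actor of Eq. 9.5: elevator deflection as a weighted sum of the four
    states, the reference velocity and a constant bias term.

    gains : array [K_q, K_V, K_c, K_theta, K_Vref, K_0] (placeholder values)
    """
    features = np.array([q, V, c, theta, V_ref, 1.0])
    return float(np.dot(gains, features))

# Hypothetical usage with made-up gains:
delta_e = affine_actor(q=0.02, V=0.55, c=-0.1, theta=0.05, V_ref=0.6,
                       gains=np.array([-0.5, -1.0, -0.2, -1.5, 1.0, 0.0]))
```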

9.4.2. Results

These changes make the results of this test incomparable to the final results. This is why the results of this test are shown for all algorithms. Note that the algorithms used during this test are early versions of the final algorithms. The results are shown in Figure 9.3.

It can be seen that the PIDN controller results in accurate tracking, but has a lower success rate, which lowers the combined score. This is the reason why PIDN controllers were disregarded in the final test. It should be noted that this version of the PG algorithm did not explicitly include step size limits. It was found later that such limits enhance safety. A recommendation for future research is to test PIDN controllers with suitable step size limits.

The test was made easier for the HDP controller: noise was removed, and scores were averaged over 50 episodes, rather than 10. Still, it is found that HDP results in much lower safety: only 21 of the 46 simulations are successful. Several techniques are available to increase the safety: variable learning rates, step size limits, different (nonlinear) function approximators, independent training of the actor and critic, and batch learning. However, it is expected that this algorithm will never meet the safety levels achieved by the PG and CAML controllers. This is the reason to disregard HDP in the final test.

10 Concluding the project

The DelFly is a difficult vehicle to control because of its flapping dynamics and nonlinearity. Also, there is variability between different individual vehicles, and measuring the state is difficult due to noise and disturbances. This project has analyzed the performance of Machine Learning controllers for the DelFly. From the vast amount of algorithms in the Reinforcement Learning literature, successive refinements were made, until only the Policy Gradient algorithm remained. This approach was extended with a new variable learning rate technique in order to improve safety. Drawing inspiration from several RL algorithms, CAML was proposed as a new method to tackle such problems. This approach identifies a model, which is used to select the most appropriate PI gain set with a Neural Network classifier. The selectable gain sets are predefined. This algorithm can change the gains rapidly during flight, which may improve safety.

The contribution of this project to the body of science is threefold. First, some Reinforcement Learning algorithms have been compared objectively in a flight control context. It was found that the recent J-SNAC algorithm cannot match the performance of the better known HDP. Although HDP may lead to the optimal control solution, its learning speed and safety are dramatically lower than with the PG algorithm.

The second contribution is that a step towards a new DelFly controller has been taken. This controller is not yet sufficiently validated; flight tests are required to do so. Yet, extensive simulations into the capabilities of the algorithms have been performed, including sensitivity analyses. The main limitation is that the current algorithms cannot deal with the flapping motion very well. Some suggestions are made to improve this behavior, which may be relevant for future research. It was found that a 'universal' controller exists, which is able to stabilize all models that can be stabilized with the controller structures considered. The Machine Learning controllers, which operate on a higher level, are unable to improve the safety because of this. Nevertheless, the PG algorithm can improve the tracking accuracy, also taking into account damping. The low-level controllers considered in this research are relatively simple. For more effective control, more complex controller structures should be analyzed.

The third contribution is the development of a new algorithm: the Classification Algorithm for Machine Learning control. This method is intended to increase safety by allowing rapid gain changes. Because some of the models used in this research could not be controlled by the underlying low-level controllers, and the others were stabilized by the universal controller, this research was unable to prove this safety advantage of CAML. It is recommended to test CAML on a problem where a universal controller is not available. Also, one may include different controller structures in the selection possibilities. CAML is a method that can be applied to any system and task. Several suggestions for further improvement of this algorithm are given. For future research, it is recommended to demonstrate the capability of increasing safety first. Next, a more theoretical analysis is considered appropriate, such that the method can reach a state of higher maturity.

This project has contributed to both the solution of practical problems and the theory on Machine Learning control. Possibilities for further research lie in both areas.
The ultimate goal of improving aviation safety still lies far ahead. One way to reach it is to continue the search for knowledge in the field of flight control.


Bibliography

[1] P. Abbeel, M. Quigley, and A. Y. Ng. Using inaccurate models in reinforcement learning. In International Conference on Machine Learning (ICML), June 2006.

[2] S. F. Armanini, C. C. de Visser, G. C. H. E. de Croon, and M. Mulder. Time-varying model identification of flapping-wing vehicle dynamics using flight data. Journal of Guidance, Control, and Dynamics, 39(3): 526–541, March 2016.

[3] S. F. Armanini, M. Karásek, C. C. de Visser, G. C. H. E. de Croon, and M. Mulder. Flight testing and preliminary analysis for global system identification of ornithopter dynamics using on-board and off-board data. In AIAA Atmospheric Flight Mechanics Conference, January 2017.

[4] L. Buşoniu, A. Lazaric, M. Ghavamzadeh, R. Munos, R. Babuška, and B. de Schutter. Least-squares methods for policy iteration. In M. Wiering and M. van Otterlo, editors, Reinforcement Learning: State-of-the-Art, volume 12 of Adaptation, Learning and Optimization, chapter 3. Springer-Verlag Berlin Heidelberg, 2012.

[5] J. V. Caetano, C. C. de Visser, G. C. H. E. de Croon, B. D. W. Remes, C. de Wagter, J. Verboom, and M. Mulder. Linear aerodynamic model identification of a flapping wing MAV based on flight test data. International Journal of Micro Air Vehicles, 5(4):273–286, December 2013.

[6] S. Chen, Y. Yang, S. N. Balakrishnan, N. T. Nguyen, and K. Krishnakumar. SNAC convergence and use in adaptive autopilot design. In International Joint Conference on Neural Networks, pages 530–537, June 2009.

[7] T. Cunis. Precise position control of a flapping-wing micro air vehicle in a wind-tunnel. Master’s thesis, RWTH Aachen University, May 2016.

[8] T. Cunis, M. Karásek, and G. C. H. E. de Croon. Precision position control of the DelFly II flapping-wing micro air vehicle in a wind-tunnel. In International Micro Air Vehicle Conference and Competition (IMAV), October 2016.

[9] G. Cybenko. Approximation by superpositions of a sigmoidal function. Mathematics of control, signals and systems, 2(4):303–314, December 1989.

[10] G. C. H. E. de Croon, M. A. Groen, C. de Wagter, B. D. W. Remes, R. Ruijsink, and B. W. van Oudheusden. Design, aerodynamics and autonomy of the DelFly. Bioinspiration & Biomimetics, 7(2), May 2012.

[11] G. C. H. E. de Croon, M. Perçin, B. D. W. Remes, R. Ruijsink, and C. de Wagter. The DelFly: Design, Aerodynamics and Artificial Intelligence of a Flapping Wing Robot. Springer Netherlands, first edition, 2016.

[12] C. de Wagter, J. A. Koopmans, G. C. H. E. de Croon, B. D. W. Remes, and R. Ruijsink. Autonomous wind tunnel free-flight of a flapping wing MAV. In Q. P. Chu, J. A. Mulder, D. Choukroun, E. van Kampen, C.C. de Visser, and G. Looye, editors, Advances in Aerospace Guidance, Navigation and Control: Selected Papers of the Second CEAS Specialist Conference on Guidance, Navigation and Control, pages 603–621, April 2013.

[13] M. P. Deisenroth and C. E. Rasmussen. PILCO: A model-based and data-efficient approach to policy search. In International Conference on Machine Learning (ICML), June-July 2011.

[14] J. Ding and S. N. Balakrishnan. Intelligent constrained optimal control of aerospace vehicles with model uncertainties. Journal of Guidance, Control, and Dynamics, 35(5):1582–1592, September-October 2012.

85 86 Bibliography

[15] J. Ding, S. N. Balakrishnan, and F. L. Lewis. A cost function based single network adaptive critic archi- tecture for optimal control synthesis for a class of nonlinear systems. In International Joint Conference on Neural Networks (IJCNN), July 2010.

[16] J. Ding, A. Heydari, and S. N. Balakrishnan. Single network adaptive critics networks - development, analysis, and applications. In F. L. Lewis and D. Liu, editors, Reinforcement learning and approximate dynamic programming for feedback control, pages 98–118. Wiley-IEEE Press, first edition, 2013.

[17] K. L. Du and M. N. S. Swamy. Neural networks and statistical learning. Springer London, 2014.

[18] R. Enns and J. Si. Helicopter trimming and tracking control using direct neural dynamic programming. IEEE Transactions on Neural Networks, 14(4):929–939, July 2003.

[19] S. Ferrari and R. F. Stengel. Online adaptive critic flight control. Journal of Guidance, Control, and Dy- namics, 27(5):777–786, September-October 2004.

[20] S. Ha and K. Yamane. Reducing hardware experiments for model learning and policy optimization. In IEEE International Conference on Robotics and Automation (ICRA), pages 2620–2626, May 2015.

[21] J. L. Junell, T. Mannucci, Y. Zhou, and E. van Kampen. Self-tuning gains of a quadrotor using a simple model for policy gradient reinforcement learning. In AIAA Guidance, Navigation, and Control Confer- ence, January 2016.

[22] V. Klein and E. A. Morelli. Aircraft System Identification: Theory and Practice. American Institute of Aeronautics and Astronautics (AIAA), Reston, Virginia, USA, 2006.

[23] J. A. Koopmans. Delfly freeflight. Master’s thesis, Delft University of Technology, June 2012.

[24] R. Koppejan and S. Whiteson. Neuroevolutionary reinforcement learning for generalized control of sim- ulated helicopters. Evolutionary Intelligence, 4(4):219–241, December 2011.

[25] Y. A. LeCun, L. Bottou, G. B. Orr, and K. R. Müller. Efficient backprop. In G. Montavon, G. B. Orr, and K. R. Müller, editors, Neural networks: Tricks of the trade, volume 7700 of Lecture Notes in Computer Science, chapter 1, pages 9–48. Springer Berlin Heidelberg, second edition, 2012.

[26] G. G. Lendaris. Adaptive dynamic programming approach to experience-based systems identification and control. Neural Networks, 22(5-6):822–832, July-August 2009.

[27] F. L. Lewis and D. Vrabie. Reinforcement learning and adaptive dynamic programming for feedback control. IEEE Circuits and Systems Magazine, 9(3):32–50, third quarter 2009.

[28] D. Liu. Neural network-based adaptive critic designs for self-learning control. In International Confer- ence on Neural Information Processing (ICONIP), November 2002.

[29] B. Luo, D. Liu, T. Huang, and D. Wang. Model-free optimal tracking control via critic-only Q-learning. IEEE Transactions on Neural Networks and Learning Systems, 27(10):2134–2144, October 2016.

[30] S. Lupashin, A. Schöllig, M. Sherback, and R. D’Andrea. A simple learning strategy for high-speed quadrocopter multi-flips. In IEEE International Conference on Robotics and Automation (ICRA), pages 1642–1648, May 2010.

[31] T. M. Moldovan, S. Levine, M. I. Jordan, and P.Abbeel. Optimism-driven exploration for nonlinear sys- tems. In IEEE International Conference on Robotics and Automation (ICRA), pages 3239–3246, May 2015.

[32] D. Molenkamp, E. van Kampen, C. C. de Visser, and Q. P.Chu. Intelligent controller selection for aggres- sive quadrotor manoeuvring. In AIAA Information Systems-AIAA Infotech @ Aerospace, January 2017.

[33] M. Motamed and J. Yan. A reinforcement learning approach to lift generation in flapping MAVs: simu- lation results. In IEEE International Conference on Robotics and Automation (ICRA), pages 2150–2155, May 2006.

[34] M. Motamed and J. Yan. A reinforcement learning approach to lift generation in flapping MAVs: Exper- imental results. In IEEE International Conference on Robotics and Automation (ICRA), pages 748–754, April 2007. Bibliography 87

[35] A. Y. Ng, A. Coates, M. Diel, V. Ganapathi, J. Schulte, B. Tse, E. Berger, and E. Liang. Autonomous in- verted helicopter flight via reinforcement learning. In M. H. Ang and O. Khatib, editors, Experimental Robotics IX, volume 21 of Springer Tracts in Advanced Robotics, pages 363–372. Springer-Verlag Berling Heidelberg, 2006.

[36] B. D. Nichols. Continuous action-space reinforcement learning methods applied to the minimum-time swing-up of the acrobot. In IEEE International Conference on Systems, Man, and Cybernetics, pages 2084–2089, October 2015.

[37] G. J. Olsder, J. W. Van der Woude, J. G. Maks, and D. Jeltsema. Mathematical Systems Theory. VSSD, Delft, fourth edition, 2011.

[38] R. Padhi, N. Unnikrishnan, and S. N. Balakrishnan. Optimal control synthesis of a class of nonlinear systems using single network adaptive critics. In American Control Conference, pages 1592–1597, June- July 2004.

[39] R. Padhi, N. Unnikrishnan, X. Wang, and S. N. Balakrishnan. A single network adaptive critic (SNAC) architecture for optimal control synthesis for a class of nonlinear systems. Neural Networks, 19(10): 1648–1660, December 2006.

[40] D. V. Prokhorov and D. C. Wunsch II. Adaptive critic designs. IEEE Transactions on Neural Networks, 8 (5):997–1007, September 1997.

[41] J. W. Roberts, L. Moret, J. Zhang, and R. Tedrake. Motor learning at intermediate reynolds number: Ex- periments with policy gradient on the flapping flight of a rigid wing. In O. Sigaud and J. Peters, editors, From Motor Learning to Interaction Learning in Robots, volume 264 of Studies in Computational Intelli- gence, pages 293–309. Springer Berlin Heidelberg, 2010.

[42] J. Si and Y.T. Wang. Online learning control by association and reinforcement. IEEE Transactions on Neural Networks, 12(2):264–276, March 2001.

[43] R. S. Sutton and A. G. Barto. Reinforcement learning: An introduction. MIT press, first edition, 1998.

[44] Sjoerd Tijmons, Guido de Croon, Bart Remes, Christophe De Wagter, Rick Ruijsink, Erik-Jan van Kam- pen, and Qiping Chu. Stereo vision based obstacle avoidance on flapping wing mavs. In Q. P. Chu, J. A. Mulder, D. Choukroun, E. van Kampen, C.C. de Visser, and G. Looye, editors, Advances in Aerospace Guidance, Navigation and Control, pages 463–482, April 2013.

[45] H. van Hasselt. Reinforcement learning in continuous state and action spaces. In M. Wiering and M. van Otterlo, editors, Reinforcement learning: State-of-the-Art, volume 12 of Adaptation, Learning and Opti- mization, chapter 9. Springer-Verlag Berlin Heidelberg, 2012.

[46] E. van Kampen, Q. P. Chu, and J. A. Mulder. Continuous adaptive critic flight control aided with ap- proximated plant dynamics. In AIAA Guidance, Navigation, and Control Conference and Exhibit, August 2006.

[47] G. K. Venayagamoorthy, R. G. Harley, and D. C. Wunsch II. Comparison of heuristic dynamic program- ming and dual heuristic programming adaptive critics for neurocontrol of a turbogenerator. IEEE Trans- actions on Neural Networks, 13(3):764–773, May 2002.

[48] J. L. Verboom, S. Tijmons, C. de Wagter, B. D. W. Remes, R. Babuška, and G. C. H. E. de Croon. Attitude and altitude estimation and control on board a flapping wing micro air vehicle. In IEEE International Conference on Robotics and Automation (ICRA), pages 5846–5851, May 2015.

[49] F.Y. Wang, H. Zhang, and D. Liu. Adaptive dynamic programming: An introduction. IEEE Computational Intelligence Magazine, 4(2):39–47, May 2009.

[50] J. Wang and J. Kim. Optimization of fish-like locomotion using hierarchical reinforcement learning. In International Conference on Ubiquitous Robots and Ambient Intelligence (URAI), October 2015.

[51] P.J. Werbos. Consistency of HDP applied to a simple reinforcement learning problem. Neural networks, 3(2):179–189, 1990. 88 Bibliography

[52] P. J. Werbos. Approximate dynamic programming for real-time control and neural modeling. In D. A. White and D. A. Sofge, editors, Handbook of Intelligent Control: Neural, fuzzy and adaptive approaches, chapter 13. Van Nostrand Reinhold, 1992.

[53] P.J. Werbos. Neurocontrol and supervised learning: An overview and evaluation. In D. A. White and D. A. Sofge, editors, Handbook of Intelligent Control: Neural, fuzzy and adaptive approaches, chapter 3. Van Nostrand Reinhold, 1992.

[54] F. Wilcoxon. Individual comparisons by ranking methods. Biometrics Bulletin, 1(6):80–83, December 1945.