Optimal Control and Reinforcement Learning of Switched Systems

Dissertation

Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the Graduate School of The Ohio State University

By

Hua Chen, B.E.

Graduate Program in Electrical and Computer Engineering

The Ohio State University

2018

Dissertation Committee:

Prof. Wei Zhang, Advisor
Prof. Andrea Serrani
Prof. Vadim Utkin

© Copyright by

Hua Chen

2018

Abstract

This dissertation studies optimal control and reinforcement learning of switched systems. Roughly speaking, a switched system consists of several subsystems and a switching signal determining which subsystem is being used to evolve the system dynamics at each time instant. Optimal control of such switched systems involves finding the discrete switching signal and the associated continuous input into the chosen subsystem to jointly optimize a certain performance index. It is widely known in the literature that optimal control of switched systems is challenging to solve, mainly due to the discrete nature of the switching signal, which makes the overall problem combinatorial. Two problems with different settings are considered in this dissertation.

The first problem we consider is general optimal control of continuous-time nonlinear switched systems. We focus on the so-called embedding-based approach. Rather than proposing new embedding-based algorithms, we develop a framework originating from a novel topological perspective of the embedding-based technique. The proposed framework unifies most existing embedding-based algorithms as special cases and provides guidance on how to construct new ones. The second problem studied in this dissertation is optimal control of discrete-time switched linear systems. Because accurate knowledge of the system dynamics is in general hard to obtain for practical systems, we do not assume knowledge of the system model. Instead, a simulator is adopted for generating the successor state given any state-input pair. Based on this simulator, we utilize the reinforcement learning framework to solve the problem in a model-free manner. Instead of directly applying existing neural-network-based algorithms, we develop a distinct Q-learning algorithm that explicitly incorporates analytical insights about the optimal solution from traditional optimal control. In particular, a specific parametric Q-function approximation is proposed. To update the involved parameters, two approaches based on different structural information of the underlying model are adopted.

This is dedicated to my parents.

Acknowledgments

This dissertation summarizes my six-and-a-half-year PhD study at The Ohio State University, which would not have been possible without the help of many people.

First of all, I would like to express my deepest gratitude to my PhD advisor, Prof. Wei Zhang, for his guidance, patience, and inspiration. I am also greatly indebted to him for his generous support of my various conference travels.

I would also like to thank Prof. Andrea Serrani and Prof. Vadim Utkin for serving on my PhD committee. Prof. Andrea Serrani’s enthusiasm toward his research topics has constantly been a strong motivation for me. His courses on linear system theory, nonlinear system theory, and adaptive control are among the best that I have ever taken. Although I have never taken a course from Prof. Utkin and have never directly worked with him, his seminal works on sliding mode control were a strong motivation for me to work on problems related to switched systems.

I would like to thank Prof. Antonio Conejo for many enlightening discussions when I was working on control problems in power systems. I am also greatly indebted to Dr. Jianming Lian from the Pacific Northwest National Laboratory for many fruitful discussions on voltage control problems and for offering me an internship there.

Many other individuals have also contributed to this dissertation, either directly or indirectly, including, but not limited to, David Casbeer, Krishna Kalyanam, Laurentiu Marinovici, Chin-Yao Chang, Kiryung Lee, Lin Zhao, Sen Li, Jianzhe Liu, Yueyun Lu, Yanzheng Zhu, Huaqing Xiong, Hao Li, Bowen Weng, and many more, for making my life at Ohio State memorable.

Finally, I want to thank my wife, Lingxiao Zhou, and my parents, Zhengping Chen and Huimin Ma, for their support, patience and unconditional love.

Vita

2012 ...... B.E., Zhejiang University

2012-present ...... Graduate Research Associate, Electrical and Computer Engineering, The Ohio State University

Publications

Research Publications

H. Chen, K. Krishnamoorthy, W. Zhang, D. Casbeer “Intruder Isolation on a General Road Network Under Partial Information”. IEEE Transactions on Control Systems Technology, vol. 25, no. 1, pp. 222-234, January 2017

H. Chen, W. Zhang “On weak topology for optimal control of switched nonlinear systems”. Automatica, vol. 81, pp. 409-415, July 2017

J. Liu, H. Chen, W. Zhang, B. Yurkovich, G. Rizzoni “Energy Management Problems Under Uncertainties for Grid-Connected Microgrids: a Chance Constrained Programming Approach”. IEEE Transactions on Smart Grid, vol. 8, no. 6, pp. 2585-2596, November 2017

H. Chen, K. Krishnamoorthy, W. Zhang, D. Casbeer “Continuous-time intruder isolation using Unattended Ground Sensors on graphs”. American Control Conference (ACC), pp. 5270-5275, 2014

J. Lian, D. Wu, K. Kalsi, H. Chen “Theoretical Framework for Integrating Distributed Energy Resources into Distribution Systems”. Power & Energy Society General Meeting, pp. 1-5, 2017

H. Chen, W. Zhang, J. Lian, A. J. Conejo “Robust Distributed volt/var Control of Distribution Systems”. Conference on Decision and Control (CDC), pp. 6321-6326, 2017

H. Li, H. Chen, W. Zhang “On Model-free Reinforcement Learning for Switched Linear Systems: A Subspace Clustering Approach”. 56th Annual Allerton Conference on Communication, Control, and Computing, October 2018

Fields of Study

Major Field: Electrical and Computer Engineering

Table of Contents

Abstract
Dedication
Acknowledgments
Vita
List of Tables
List of Figures

1. Background
   1.1 Motivations and Overview
   1.2 Literature Review
       1.2.1 Classical Optimal Control
       1.2.2 Optimal Control of Continuous-time Switched Systems
       1.2.3 Optimal Control and Reinforcement Learning of Discrete-time Switched Systems
   1.3 Preview of Main Results and Contributions
   1.4 Organization

2. A Weak-Topology based Framework for Optimal Control of Continuous-time Switched Systems
   2.1 Introduction
   2.2 Problem Formulation and Preliminaries
   2.3 Weak Topologies and Infinite-dimensional Optimization
       2.3.1 Topologies and Weak Topologies
       2.3.2 Infinite-dimensional Optimization for Optimal Control
   2.4 A Unified Framework for Switched Optimal Control Problem
       2.4.1 Convergence Analysis and Proofs
   2.5 Case Studies
       2.5.1 Problem with terminal cost
       2.5.2 Problem with mode-dependent cost
   2.6 Conclusion

3. Reinforcement Learning for Switched Linear Systems
   3.1 Introduction
   3.2 Problem Formulation
   3.3 Model-based Optimal Control
       3.3.1 Dynamic Programming
       3.3.2 Switched Linear Quadratic Regulation
       3.3.3 Limitations
   3.4 Q-learning
   3.5 Main Results
       3.5.1 Q-function and parametric approximator
       3.5.2 Q-function Update
   3.6 Case Studies
       3.6.1 A Simple 2-Dimensional Example
       3.6.2 Another More Interesting 2-Dimensional Example
       3.6.3 A 3-Dimensional Example
   3.7 Conclusion

4. Contributions and Future Work
   4.1 Future Works

Bibliography

List of Tables

2.1 Configurations of DES system

List of Figures

2.1 Convergence of terminal states under different topologies
3.1 General Neural Network Structure
3.2 Original data distribution
3.3 Geometric Algorithm on 2-D Synthetic Data
3.4 Histogram of empirical error between the identified model and underlying true model
3.5 Empirical Costs Comparison among Subspace Clustering Approach, Geometric Approach and SLQR - Example 1
3.6 Pij convergence with subspace clustering - Example 1
3.7 Pij convergence with geometric approach - Example 1
3.8 Empirical Costs Comparison among Subspace Clustering Approach, Geometric Approach and SLQR - Example 2
3.9 Pij convergence with subspace clustering - Example 2
3.10 Pij convergence with geometric approach - Example 2
3.11 Empirical Costs Comparison among Subspace Clustering Approach, Geometric Approach and SLQR - Example 3
3.12 Pij convergence with subspace clustering - Example 3
3.13 Pij convergence with geometric approach - Example 3

Chapter 1: Background

1.1 Motivations and Overview

In this dissertation, we study optimal control and reinforcement learning for switched systems. Roughly speaking, a switched system contains a number of subsystems and a switching signal determining which subsystem is being used to evolve the system dynamics at each time instant. Optimal control of such switched systems involves finding the discrete switching signal and the associated continuous input into the chosen subsystem to jointly optimize a certain performance index.

Switched systems have been extensively studied in the literature due to their strong capability of modeling various engineering phenomena involving multi-mode behaviors, such as power electronics [60], automotive systems [38, 68, 83], robotics [95], and manufacturing [23, 61]. Because of the presence of the discrete switching signal, classical theories and techniques for control systems cannot be directly applied to switched systems. During the past several decades, mathematical theories for switched systems have been developed. In particular, [47] provides an overview of several fundamental questions in control of switched systems, including stability and stabilizability properties for controlled switching, autonomous switching, and so on. From a higher-level perspective, switched systems serve as a particular class of hybrid systems, which involve both continuous and logical dynamics [16, 32, 33, 53].

Among the rich literature regarding control systems, optimal control is one of the most important topics apart from stability and stabilization problems. In particular, optimal control aims to optimize certain performance criteria associated with a control system by finding the best possible inputs. Depending on the problem setup, there are in general two types of solution notions for an optimal control problem: the open-loop solution and the closed-loop solution. For problems where initial conditions are given, the open-loop solution is commonly adopted, referring to a function of input over the given time horizon. Such an open-loop solution is typically independent of the real-time state (or output) information and needs to be re-computed once the initialization changes. The closed-loop solution, also known as a feedback law, is usually adopted for problems without a specifically given initial condition. Such a solution notion can be viewed as a function mapping from the system state (or output) to an admissible input. Provided such a closed-loop solution, once the current system state (or output) is available, the optimal input can be efficiently computed by directly evaluating the feedback law at the given state (or output). In fact, an open-loop solution can be viewed as a special case of a closed-loop solution evaluated along the corresponding state trajectory. Both types of solutions have been extensively studied in the literature in both continuous-time and discrete-time settings.

Optimal control of switched systems, which is the topic studied in this dissertation, is known to be challenging. On one hand, most existing approaches for solving optimal control problems require complete and accurate knowledge of the system dynamics, which is difficult to obtain, especially in the presence of multiple subsystems. On the other hand, even when the system dynamics are fully given, solving optimal control of switched systems is in general prohibitive, mainly due to the existence of discrete input signals that makes the problem combinatorial in nature. In order to develop practically significant solutions to switched optimal control problems, both issues need to be addressed. In this dissertation, we investigate potential solutions to the aforementioned issues.

1.2 Literature Review

1.2.1 Classical Optimal Control

Optimal control of classical dynamical systems has been investigated for decades. From the theoretical perspective, abstract mathematical conditions ensuring existence of optimal solutions, as well as necessary and sufficient conditions characterizing optimality of any given solution, have been developed in the literature [12, 21, 45, 41]. Among them, Pontryagin’s Maximum Principle is perhaps the most well-known necessary condition for optimality, and the Hamilton-Jacobi-Bellman equation, when solved globally, provides a necessary and sufficient condition for an optimum. The Hamilton-Jacobi-Bellman equation is essentially a nonlinear partial differential equation, and solving it globally is in general impractical. Theoretical studies of this partial differential equation have been conducted in the seminal works [6, 24, 25].

Practically, optimal control of continuous-time systems is often considered with certain initial conditions, and hence the goal is to find open-loop solutions. Such an optimal control problem is typically reformulated as an infinite-dimensional optimization problem [10, 30, 92]. In this case, the first order necessary conditions for optimality of the corresponding optimization problem can be directly translated into necessary conditions for optimality of the original optimal control problem. However, such conditions are typically weaker than the well-known Pontryagin’s Maximum Principle, as pointed out in [63]. Tractable numerical solutions to the infinite-dimensional optimization problems are available in the literature. In particular, similar to the finite-dimensional case, first order algorithms have been developed based on directional derivatives. Alternative approaches include sequential linear programming, sequential quadratic programming, and others.

Optimal control for discrete-time systems is another important topic, which is more influential for practical implementations because almost all computer systems operate in a discrete-time fashion. Similar to the continuous-time scenario, finding open-loop solutions in the discrete-time case can be formulated as a finite-dimensional optimization problem, which can be efficiently solved via numerous existing optimization techniques. The more interesting case in the discrete-time scenario is the problem of finding closed-loop solutions. Such a problem has been extensively investigated for decades [12, 41]. Both Pontryagin’s Maximum Principle and the Hamilton-Jacobi-Bellman equation have their discrete-time counterparts. In particular, Bellman’s equation, the discrete-time counterpart of the Hamilton-Jacobi-Bellman equation, gives a necessary and sufficient condition for optimality. Furthermore, dynamic programming [12] offers a powerful tool for solving Bellman’s equation. Multiple algorithms have been proposed for numerically implementing dynamic programming. Among them, value iteration and policy iteration are the two most well-known algorithms; convergence results for both have been established under mild conditions [13]. Despite the great success of dynamic programming, this classical technique suffers from two famous “curses”, namely the “curse of dimensionality” and the “curse of modeling”, referring, respectively, to the exponential growth of the computational power and memory needed for solving dynamic programming and to the difficulty of finding an accurate system model of the underlying physical phenomena. Furthermore, dynamic programming is often performed in an off-line fashion. All these features are undesirable when solving practical problems and hence greatly limit the applicability of dynamic programming algorithms. To develop tractable solutions, approximate versions of dynamic programming have been extensively studied in the literature [14, 64].
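As a concrete illustration of value iteration (policy iteration instead alternates policy evaluation and policy improvement), here is a minimal sketch for a finite-state, finite-action problem; the transition tensor `P`, stage cost `c`, and discount factor are hypothetical placeholders, not objects from this dissertation.

```python
import numpy as np

def value_iteration(P, c, gamma=0.95, tol=1e-8):
    """Value iteration for a finite-state, finite-action problem.
    P[a, s, t] is the probability of moving from state s to state t
    under action a; c[s, a] is the stage cost."""
    n_states = P.shape[1]
    V = np.zeros(n_states)
    while True:
        # Bellman backup: Q(s, a) = c(s, a) + gamma * E[V(next state)]
        Q = c + gamma * np.einsum('ast,t->sa', P, V)
        V_new = Q.min(axis=1)  # minimize over actions (cost setting)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmin(axis=1)  # value function, greedy policy
        V = V_new
```

The exponential growth of `P` and `V` with the state dimension is precisely the curse of dimensionality mentioned above.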

In addition to the research on general conditions and properties of optimal control problems, there is another important research category in optimal control focusing on simple problems for which analytical properties of the optimal solution can be derived. The most famous case is the linear quadratic regulation problem, for which the optimal closed-loop control law is known to be linear in the state and the optimal cost function is quadratic in the state [75]. It is widely known that such an optimal solution can be efficiently computed via the algebraic Riccati equation, provided the system dynamics model, namely the system and cost matrices, is available. For certain classes of simple hybrid systems, the exact analytical structure of the optimal cost function and optimal control law has been derived as well [8, 15, 16, 103, 104, 105].
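For concreteness, a minimal sketch of that computation in the discrete-time case is given below; this is the standard LQR Riccati recursion, with placeholder matrices `A`, `B`, `Q`, `R`, not code from this dissertation.

```python
import numpy as np

def lqr_riccati(A, B, Q, R, max_iter=1000, tol=1e-10):
    """Iterate the discrete-time Riccati recursion
    P <- Q + A'PA - A'PB (R + B'PB)^{-1} B'PA
    to convergence; return P and the gain K of the optimal law u = -Kx."""
    P = Q.copy()
    for _ in range(max_iter):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P_next = Q + A.T @ P @ (A - B @ K)
        done = np.max(np.abs(P_next - P)) < tol
        P = P_next
        if done:
            break
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return P, K
```

The optimal cost is then $x_0^\top P x_0$, quadratic in the state, and the control law $u = -Kx$ is linear in the state, as stated above.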

1.2.2 Optimal Control of Continuous-time Switched Systems

The first problem studied in this dissertation is optimal control for continuous-time switched systems with a given initial condition. Such an optimal control problem requires finding both the continuous control input and the discrete switching signal over a specified time horizon. In addition to the difficulties of finding such open-loop solutions for traditional optimal control problems, the problem of interest has several distinct challenges, mainly due to the discrete nature of the switching signal, which makes the problem combinatorial. As a result, existing optimal control and optimization techniques cannot be directly applied.

Numerous approaches for solving this optimal control problem have been investigated in the literature. On one hand, from the theoretical perspective, the Maximum Principle has been extended to hybrid systems to characterize necessary optimality conditions for optimal hybrid control solutions [62, 71, 76, 77]. However, it is difficult to numerically compute the optimal solutions based on these abstract conditions [100]. On the other hand, from the practical perspective, several approaches have been proposed in the literature to numerically solve the problem. Most existing algorithms fall into two main categories: the two-stage optimization method and the embedding-based method.

The two-stage optimization method proposed in [97, 98, 99, 100, 101] aims to address the aforementioned challenge by iterating between two subproblems. At the first stage, a switching mode sequence is given and fixed, and the performance index is optimized over the switching instants and continuous inputs. At the second stage, the mode sequence is updated to improve the system performance. The first stage problem is typically formulated as a finite-dimensional optimization problem that is efficiently solvable via classical techniques if there is no continuous input involved. Such an optimization formulation becomes infinite-dimensional if continuous inputs are incorporated. A considerable amount of research has been devoted to finding solutions to the first stage problem, mainly focusing on designing specific algorithms for solving the infinite-dimensional optimization problem [5, 27, 34, 35, 51, 52]. It should be noted that the underlying optimization space is continuous, and hence such an optimization problem is solvable via classical first order algorithms for infinite-dimensional problems. For the second stage problem, various schemes have been considered in the literature for updating the switching mode sequence. Nonetheless, such schemes are mainly heuristic, and there is no general way of updating the switching modes with rigorous performance guarantees. Consequently, due to such restrictions on possible mode sequences, solutions obtained through this approach may be unsatisfactory.

More recently, an alternative approach based on the so-called embedding principle has been investigated [9, 86, 87, 95]. Recall that the main issue preventing us from directly applying classical infinite-dimensional optimization techniques is the discrete nature of the switching input. The key idea of the embedding-based approach to address this issue involves three steps. First, the discrete input space is relaxed to a continuous one. Then, the optimal control problem associated with the relaxed, continuous input space is solved. Finally, solutions to the relaxed problem are projected back to generate original solutions. Such an approach originates from the idea of relaxed optimal control problems, which is an important topic in the classical optimal control literature [10, 30, 31, 92]. Traditional relaxed optimal control considers the probability measure over the original input space as the new control input. Both the original and the relaxed optimal control problems can be reformulated as infinite-dimensional optimization problems. It has been shown in the literature that solutions to the relaxed problem exist under weaker conditions and are relatively easier to compute via classical infinite-dimensional optimization approaches [10, 30]. In addition, under certain mild conditions, any relaxed solution can be approximated arbitrarily well by an original one. This is widely known as the chattering lemma in the literature. However, the original version of this so-called chattering lemma reported in [10] cannot be directly applied to the optimal control of switched systems. In [9], the authors proved an extended version of the traditional chattering lemma that is compatible with switched systems, which has since been fundamental for most existing embedding-based algorithms. The corresponding relaxed problem is solved using the classical Maximum Principle. More recently, an alternative algorithm was proposed that solves the relaxed optimal control problem via first-order gradient-based optimization techniques [86]. A constructive wavelet-based approximation was also proposed as a numerical implementation of the extended chattering lemma.

1.2.3 Optimal Control and Reinforcement Learning of Discrete-time Switched Systems

Optimal control of discrete-time switched systems has also been studied in the literature for decades. In particular, by focusing on simple linear subsystems with quadratic cost, the analytical structure of the optimal cost function and the associated optimal control law can be derived. Based on an extension of traditional linear quadratic regulation results, namely the algebraic Riccati equation and the Riccati recursion, the optimal cost for the finite-horizon problem can be exactly characterized by the pointwise minimum of finitely many quadratic functions, and the associated optimal control law is piecewise linear [103, 104, 105]. Furthermore, it has been shown that the finite-horizon value function converges to the infinite-horizon value function, which does not hold for general optimal control problems. However, due to the combinatorial nature of the problem, finding such an exact characterization, i.e., all candidate matrices defining the quadratic functions involved in the minimization, is prohibitive, in fact NP-hard in general. In order to develop numerically tractable solutions, numerous approximate dynamic programming techniques have been proposed in the literature [11, 64]. One of the widely adopted approximation approaches builds upon rigorous analysis of the exact analytical structure of the optimal solutions and constructs approximate solutions with sub-optimality guarantees. This approximation approach has been applied to the optimal control of switched linear systems with quadratic cost for removing redundant quadratic functions in the definition of the optimal cost function in [103, 105]. Nevertheless, such an approximation technique requires knowledge of the analytical structure of the exact optimal solutions, which is in general difficult to obtain.
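Concretely, for a discrete-time switched linear system $x_{k+1} = A_{v_k} x_k + B_{v_k} u_k$ with mode-dependent quadratic stage costs, the finite-horizon value functions characterized in [103, 104, 105] take the min-of-quadratics form (the notation here is illustrative):

$$V_k(x) = \min_{P \in \mathcal{H}_k} x^\top P x,$$

where each $\mathcal{H}_k$ is a finite set of positive semidefinite matrices generated backward in time by applying the mode-wise Riccati mapping to every matrix in $\mathcal{H}_{k+1}$:

$$\mathcal{H}_k = \Big\{\, Q_i + A_i^\top P A_i - A_i^\top P B_i \big( R_i + B_i^\top P B_i \big)^{-1} B_i^\top P A_i \;\Big|\; P \in \mathcal{H}_{k+1}, \; i \in \{1, \dots, n_v\} \,\Big\}.$$

The number of candidate matrices can thus grow by a factor of $n_v$ at every step, which is why computing the exact characterization is prohibitive.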

Another important topic studied in approximate dynamic programming is to consider simulation-based approaches with parameterized function approximators. Such an approach is motivated by several well-known practical difficulties of implementing dynamic programming, including the famous dual curses, i.e., the curse of modeling and the curse of dimensionality, and the fact that dynamic programming is often performed in an offline manner. To be more specific, the main idea is to use parametric approximators as representations of value functions and to develop rules for updating the parameters based on data samples collected from interaction with the environment or a simulator. This simulation- and approximation-based approach is in fact the core of reinforcement learning [3, 78], a.k.a. adaptive dynamic programming [96] or neuro-dynamic programming [14].

Numerous reinforcement learning schemes have been proposed in the literature, such as Q-learning [93], actor-critic [43], direct policy search [79], and so on. Both linear and nonlinear parametric function approximations have been considered in the literature. In particular, it has been shown that a linear approximator suffices for simple linear quadratic regulation problems [20]. More recently, reinforcement learning for linear systems has been extensively studied in both continuous-time and discrete-time settings [44, 46, 91]. In [91], a policy iteration algorithm for optimal control of continuous-time linear systems with partially known dynamics was proposed. In particular, an iterative solution to the algebraic Riccati equation without knowledge of the system’s internal dynamics was developed. In [58], the authors extended the above results from regulation problems to tracking problems, adopting a similar approach. Optimal control of discrete-time linear systems has also been studied in [42]. All the aforementioned results on reinforcement learning for linear systems adopt a quadratic cost/reward and take advantage of the fact that the optimal value function is quadratic and characterized by the algebraic Riccati equation. More recently, these approaches have been extended to general nonlinear dynamics [48, 84, 94]. The dominating research direction in model-free reinforcement learning is to use deep neural networks as the nonlinear parameterized function approximator [57, 69, 70] and to use advanced training algorithms to update the deep neural network parameters, such as deep deterministic policy gradient (DDPG) [48], proximal policy optimization (PPO) [70], trust region policy optimization (TRPO) [69], and so on. Recent developments along this research direction have demonstrated impressive successes in solving some challenging practical problems [56, 57, 72].

1.3 Preview of Main Results and Contributions

The primary focus of this dissertation is to study optimal control of switched systems. Two problems are considered: the first studies optimal control of general switched nonlinear systems in the continuous-time setting, while the second studies optimal control of discrete-time switched linear systems using reinforcement learning.

In the first part of this dissertation, we focus on optimal control of general continuous-time switched nonlinear systems. In particular, we consider the aforementioned embedding-based approach. Instead of constructing new specific algorithms, we provide a framework unifying the understanding and analysis of different embedding-based algorithms, based on a novel topological viewpoint. In particular, by formulating both the original switched optimal control problem and the associated relaxed optimal control problem as infinite-dimensional optimizations, the relationship between these two problems can be viewed as a change of topology over the underlying optimization space. In fact, the classical relaxed optimal control problem can be viewed as optimizing under the weak-star topology over the relaxed control space. Nonetheless, the combinatorial nature of the switched optimal control problem precludes directly using the weak-star topology. By considering a more general topological notion, namely the weak topology, we show that most of the embedding-based algorithms can be unified into a general framework based on weak topology. Specifically, most existing embedding-based algorithms adopt the weak topology induced by the state trajectory. The proposed framework allows for alternative choices of topologies according to the particular underlying problem. In a nutshell, our framework constitutes a weak topology over the optimization space dictating the embedding procedure, an algorithm solving the relaxed optimization problem, and a projection operator generating the desired solution. Different selections of these components result in different embedding-based switched optimal control algorithms. We also derive a set of conditions on these components so that the resulting algorithm converges to a stationary point of the original problem under the selected weak topology. Two case-study examples are provided, illustrating the importance of choosing appropriate weak topologies according to the underlying optimal control problem and how the developed framework can be used to analyze and design embedding-based algorithms.

In the second part of this dissertation, we study the optimal control problem for discrete-time switched linear systems using reinforcement learning. Instead of assuming complete knowledge of the switched system model, we assume access to a simulator which can generate the successor state and the associated cost given any continuous state, continuous input, and discrete input (mode) at each time instant. Through interaction with such a simulator, we can collect data samples of state-input-cost tuples that will be used for updating the value function. According to the previous discussion, the value function of this switched optimal control problem can be approximated arbitrarily well by a piecewise quadratic function. Furthermore, this function can always be written as the pointwise minimum of a finite number of quadratic functions. Taking advantage of these properties, we develop a specific Q-learning algorithm by constructing a particular Q-function approximator that explicitly incorporates this analytical structure, together with an associated scheme for updating the parameters used in the approximator.
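To sketch the kind of approximator this suggests (the names, shapes, and discretized input search below are illustrative assumptions, not the dissertation's exact construction): the Q-function is represented by a finite family of quadratics per mode, and both evaluation and greedy selection reduce to pointwise minima.

```python
import numpy as np

def q_approx(z, P_list):
    """Parametric Q-function approximator: the pointwise minimum of a
    finite family of quadratics at the stacked vector z = (x, u)."""
    return min(float(z @ P @ z) for P in P_list)

def greedy_action(x, P_lists, u_candidates):
    """Greedy selection of (mode, input): minimize the approximate
    Q-value over modes and a finite set of candidate inputs."""
    best = None
    for v, P_list in enumerate(P_lists):
        for u in u_candidates:
            z = np.concatenate([x, u])
            q = q_approx(z, P_list)
            if best is None or q < best[0]:
                best = (q, v, u)
    return best[1], best[2]  # chosen mode and continuous input
```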

Updating these parameters is in fact crucial in the proposed learning algorithm. It is shown that, due to the particular value function structure, updating the proposed function approximator is essentially an unsupervised learning problem that is NP-hard to solve in general. By lifting the data samples into a higher-dimensional space, it is shown that the lifted data samples lie in several linear subspaces of the ambient space, again due to the particular value function structure. As a result, the parameter update can be performed by first clustering the data samples according to the underlying subspaces and then identifying the corresponding quadratic function for each cluster. The first step of this approach is essentially a subspace clustering problem, which has been extensively studied in the computer vision literature [28, 29, 65, 80, 81, 88, 89, 102]. Various algorithms for solving the subspace clustering problem have been proposed and analyzed in the literature. Among them, the sparse subspace clustering algorithm [28, 29] exploits the self-representativeness property of data samples. Such an approach originates from the observation that each data sample can be written as a linear combination of all other data samples. By promoting the sparsity of such a representation, one hopes that the resulting sparse representation would only use data samples belonging to the same subspace. Conditions guaranteeing the performance of the sparse subspace clustering algorithm have been extensively analyzed in the literature [73, 74]. In addition, classical K-means clustering algorithms have been extended to subspace scenarios [1, 19], and an improved algorithm with rigorous analysis was given recently [50]. Nonetheless, since most subspace clustering algorithms are developed in the context of computer vision, conditions guaranteeing correct clustering are unlikely to be satisfied for our problem. Moreover, the subspace clustering reformulation only utilizes the piecewise quadratic property of the value function structure, which is in fact weaker than the exact pointwise-minimum-of-quadratics property.

An alternative geometry-based approach is proposed for updating the parameters. Such an approach fully exploits the geometric structure of the underlying model. Instead of looking at the original set of data samples, we focus on a transformed data set which encodes an explicit geometric structure. Under some standard assumptions widely adopted in the unsupervised learning literature, the performance of the proposed algorithm is demonstrated numerically. Unfortunately, due to the underlying optimal control problem, such assumptions are not satisfied in general; in particular, the data distribution is skewed. A heuristic re-sampling technique for improving the performance is proposed. How to further modify this geometric approach to accommodate the skewed distribution with rigorous analysis remains open. Three case studies are provided to demonstrate the performance and limitations of the proposed algorithm.

1.4 Organization

In Chapter 2, we study the optimal control problem for continuous-time switched nonlinear systems. We focus on the embedding-based approach, which is motivated by relaxed optimal control and the infinite-dimensional reformulation technique. A novel viewpoint of this approach from the topological perspective is provided, which yields a framework unifying the analysis and design of various embedding-based algorithms. In Chapter 3, we shift our attention to optimal control of discrete-time switched linear systems. Different from the classical optimal control problem, we drop the assumption that knowledge about the system dynamics is accessible. A novel Q-learning algorithm based on theoretical results from the optimal control literature is developed. Concluding remarks and potential future research directions are summarized in Chapter 4.

Chapter 2: A Weak-Topology based Framework for Optimal Control of Continuous-time Switched Systems

2.1 Introduction

In this chapter, we study the optimal control problem for general continuous-time nonlinear switched systems. Apart from the difficulties in solving classical optimal control problems, such a switched optimal control problem suffers from additional challenges due to the existence of a discrete input, which makes the problem combinatorial in nature. Specifically, we consider the problem of finding both the continuous and discrete control inputs to jointly minimize a certain performance criterion for a given initial state. One of the most widely adopted approaches for solving this optimal control problem aims to find the so-called open-loop solution by reformulating the problem as an infinite-dimensional optimization problem. Traditionally, such an optimization problem can be solved via well-established off-the-shelf optimization tools. However, existing methods do not directly apply to our combinatorial problem.

To address this issue, we focus on the embedding-based approach. Instead of trying to develop novel algorithms, we construct a framework that unifies the understanding and analysis of different embedding-based algorithms based on a novel topological viewpoint. The main contributions of this chapter lie in two aspects. First, the proposed framework offers a unified weak topology formulation of switched optimal control problems which includes most existing embedding-based approaches as special cases. Second, the proposed framework provides more freedom to choose weak topologies, optimization algorithms, and projection operators, which expands the applicability of the embedding-based approach.

This chapter unfolds as follows: Chapter 2.2 formulates the switched optimal control problem and briefly introduces the relaxed optimal control problem. Chapter 2.3 reviews several important concepts and results in weak topology and infinite-dimensional optimization for classical optimal control problems. Our main results are given in Chapter 2.4, presenting the general framework and corresponding convergence analysis. Two numerical examples demonstrating the importance of weak topologies in switched optimal control problems and the usage of the proposed framework are presented in Chapter 2.5. Concluding remarks and possible future works are given in Chapter 2.6.

2.2 Problem Formulation and Preliminaries

In this chapter, we consider the following general continuous-time nonlinear switched system:

$$\dot{x}(t) = f_{v(t)}(t, x(t), u(t)), \quad x(0) = x_0, \quad t \in [0, T], \tag{2.1}$$

where $x(t) \in X \subset \mathbb{R}^{n_x}$ is the system state, $u(t) \in U \subset \mathbb{R}^{n_u}$ is the continuous control input, and $v(t) \in \Sigma \triangleq \{1, 2, \dots, n_v\}$ is the discrete switching signal which determines the active subsystem (mode) at every time instant $t$.

The above switched system (2.1) can be rewritten as a general nonlinear system with both continuous and discrete inputs. To see this, let

$$D \triangleq \Big\{ (d_1, \dots, d_{n_v}) \in \{0, 1\}^{n_v} \subset \mathbb{R}^{n_v} \;\Big|\; \sum_{i=1}^{n_v} d_i = 1 \Big\}$$

be the set of corners of the $n_v$-simplex; then (2.1) is equivalent to the following nonlinear system:

$$\dot{x} = \sum_{i=1}^{n_v} d_i(t) f_i(t, x(t), u(t)) \triangleq f(t, x(t), u(t), d(t)), \quad x(0) = x_0, \quad t \in [0, T], \tag{2.2}$$

where $d(t) = [d_1(t), \dots, d_{n_v}(t)] \in D$ is the discrete control input encoding the switching signal.
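For instance, with $n_v = 2$ modes, $D = \{(1, 0), (0, 1)\}$ and (2.2) reads

$$\dot{x} = d_1(t) f_1(t, x(t), u(t)) + d_2(t) f_2(t, x(t), u(t)),$$

so $d(t) = (1, 0)$ activates subsystem 1 and $d(t) = (0, 1)$ activates subsystem 2, recovering exactly the switching behavior of (2.1).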

The optimal control problem we consider in this chapter aims to find both the continuous control input, denoted by $u(\cdot): [0, T] \to U$, and the discrete control input $d(\cdot): [0, T] \to D$, to jointly minimize the following cost functional:

$$J(u, d) = \int_{t=0}^{T} \ell(x(t), u(t), d(t))\, dt + \ell_F(x(T)), \tag{2.3}$$

subject to (2.2) with initial condition $x(0) = x_0$. In the above cost functional, $\ell(\cdot, \cdot, \cdot): X \times U \times D \to \mathbb{R}_+$ is called the running cost function, which penalizes the system trajectory and control efforts, and $\ell_F(\cdot): X \to \mathbb{R}_+$ is the terminal cost function, which only penalizes the terminal state. In addition, we assume that the following state and continuous input constraints are imposed as well:

$$x(t) \in \Omega_x \triangleq \big\{ x \in X \;\big|\; g_x(x) \le 0 \big\}, \quad \forall t \in [0, T], \tag{2.4a}$$
$$u(t) \in \Omega_u \triangleq \big\{ u \in U \;\big|\; g_u(u) \le 0 \big\}, \quad \forall t \in [0, T]. \tag{2.4b}$$

The following standard assumptions are adopted throughout this chapter to ensure the existence and uniqueness of the state trajectory of system (2.2) and the well-posedness of the optimal control problem.

Assumption 2.1. The following Lipschitz conditions are assumed throughout this chapter.

1. $f(t, x, u, d)$ is locally Lipschitz continuous with respect to $x$ and $u$ for any given $d$, with a common Lipschitz constant $L$;

2. $g_x(x)$ and $g_u(u)$ are locally Lipschitz continuous with respect to their arguments, with a common Lipschitz constant $L$.

Here, we assume a common Lipschitz constant $L$ to simplify notation. All the results in this chapter extend immediately to the case where these functions have different Lipschitz constants.

In the context of this chapter, we assume that all control inputs belong to the following $\mathcal{L}^2$ space, which can be interpreted as the space of all functions with finite energy. In the literature, there are other possible choices of the underlying space, such as the space of essentially bounded functions, i.e., the $\mathcal{L}^\infty$ space, or the space of piecewise continuous functions.

Definition 2.1. We say a function $r: [0, T] \to R \subseteq \mathbb{R}^n$ belongs to $\mathcal{L}^2([0, T], R)$ if

$$\|r\|_{\mathcal{L}^2} \triangleq \left( \int_0^T \|r(t)\|_2^2 \, dt \right)^{\frac{1}{2}} < \infty, \tag{2.5}$$

where the integration is taken with respect to the Lebesgue measure.

Provided the above definition, let $\mathcal{U} = \mathcal{L}^2([0, T], U)$ be the space of continuous control inputs and let $\mathcal{D} = \mathcal{L}^2([0, T], D)$ be the space of discrete control inputs. Then, the optimal control problem of switched systems studied in this chapter is given below.

Problem 2.1. Given the switched system (2.2) and the cost functional (2.3), solve

$$\inf_{(u, d) \in \mathcal{U} \times \mathcal{D}} J(u, d) = \int_{t=0}^{T} \ell(x(t), u(t), d(t))\, dt + \ell_F(x(T)), \tag{2.6a}$$

subject to $\dot{x} = f(t, x(t), u(t), d(t)), \; x(0) = x_0, \; \forall t \in [0, T]$, (2.6b)

$g_x(x(t)) \le 0, \; g_u(u(t)) \le 0, \; \forall t \in [0, T]$. (2.6c)

Note that the above optimal control problem is combinatorial in nature due to the existence of the discrete input $d \in \mathcal{D}$, which prevents us from directly applying existing infinite-dimensional optimization techniques. One of the most well-known methods to address this issue is the embedding-based approach, which originates from the relaxed optimal control problem [10, 30, 92]. The key idea of relaxed optimal control is to consider the set of probability measures over the original control space as the new control set. There is a profound and rich literature on relaxed optimal control problems, mainly focusing on establishing conditions guaranteeing existence of solutions to the relaxed optimal control problem and relationships between solutions to the original problem and the relaxed problem. Here, we provide a very concise review of some main results in relaxed optimal control.

Consider a generic nonlinear system

$$\dot{x} = f(t, x(t), u(t)), \quad x(0) = x_0, \quad \forall t \in [0, T],$$

where $u(t) \in U$ is the control input. Under Assumption 2.1.1, the relaxed system is given by

$$\dot{x} = F(t, x(t))\mu(t),$$

where $F(t, x)\mu \triangleq \int_U f(t, x, u)\, \mu(du)$ and $\mu$ is a probability measure over $U$. Suppose the following state constraint is added to the problem:

$$x(t, \mu) \in X_c(t), \quad \forall t \in [0, T],$$

and suppose the cost functional can be defined for relaxed controls $\mu$, denoted by $J(\mu)$. Let $J^* = \inf_\mu J(\mu)$; the minimizing sequence is defined below.

Definition 2.2. $\{\mu^n(\cdot)\}$ is called a minimizing sequence if

1. $\lim_{n \to \infty} \operatorname{dist}(x(t, \mu^n), X_c(t)) = 0$;

2. $\limsup_{n \to \infty} J(\mu^n) \le J^*$.

With the above minimizing sequence, the existence of a relaxed optimal control can be established.

Theorem 2.1. Assume $f(t, x, u)$ satisfies Assumption 2.1.1 and the cost functional $J(\mu)$ is weakly lower-semicontinuous. Assume in addition that

- $J^* \in (-\infty, \infty)$;

- the state constraint set $X_c(t)$ is closed for all $t$.

Let $\{\mu^n(\cdot)\}$ be a minimizing sequence of relaxed controls such that $\{x^n(\cdot, \mu^n)\}$ is uniformly bounded. Then there exists a relaxed optimal control $\mu^*$ which is the weak limit of a subsequence of $\{\mu^n(\cdot)\}$.

The above existence result is a very particular version of the generic results provided in [30]. In fact, the notion of a minimizing sequence is crucial both theoretically and practically. Many iterative algorithms for solving the optimal control problem can actually be viewed as methods for constructing minimizing sequences.

The existence of an original optimal control has also been discussed. In addition, it has been shown that any relaxed solution can be approximated arbitrarily well by an original solution due to the following chattering lemma.

Lemma 2.1. Let the original system satisfy Assumption 2.1.1 and let $\mu(\cdot)$ be such that the state trajectory $x(t, \mu)$ of the relaxed problem exists on $[0, T]$. Then, for any $\epsilon > 0$, there exists a piecewise constant original control $u(\cdot)$ defined on $[0, T]$ such that the solution $x(t, u)$ to the original system exists and

$$\|x(t, u) - x(t, \mu)\| \le \epsilon, \quad \forall t \in [0, T].$$

However, these results are not directly applicable to our problem. In the context of optimal control for switched systems, due to the particular structure of the discrete input space, the set of probability measures over $D$ is in fact equivalent to the convex hull of $D$. Therefore, simply replacing $D$ with $\operatorname{Co}(D)$ in the original problem (2.6) results in a standard optimal control problem.

As mentioned above, both the switched optimal control problem (2.6) and the relaxed one can be formulated as infinite-dimensional optimizations. To see this, further letting $\mathcal{S} = \mathcal{U} \times \mathcal{D}$ be the hybrid input space and $\mathcal{R} = \mathcal{U} \times \mathcal{D}_r$ with $\mathcal{D}_r = \mathcal{L}^2([0, T], \operatorname{Co}(D))$ be the relaxed input space, we can rewrite the two problems in the following abstract forms:

$$P_{\mathcal{S}}: \quad \inf_{s \in \mathcal{S}} J(s) \quad \text{subject to} \quad \Psi(s) \le 0. \tag{2.7}$$

$$P_{\mathcal{R}}: \quad \inf_{r \in \mathcal{R}} J(r) \quad \text{subject to} \quad \Psi(r) \le 0. \tag{2.8}$$

Now, the above relaxed problem (2.8) is a standard infinite-dimensional optimization problem over a continuous space, which can then be solved using existing tools. The above-described approach is commonly referred to as the embedding-based approach in the literature on optimal control of switched systems. Although existing results in the relaxed optimal control literature on relationships between original and relaxed solutions do not apply directly to the embedding-based approach, the traditional chattering lemma has actually been extended to switched optimal control scenarios.

In this chapter, we develop a framework that unifies most existing embedding-based algorithms. Such a framework views the embedding-based approaches from a novel topological perspective. In particular, the relationship between the continuous optimization and the original combinatorial one is precisely characterized by a weak topology induced by the underlying switched optimal control problem.

Remark 2.1. It is worth mentioning that, although the above embedding-based framework is valid, finding an optimal solution to the relaxed problem is challenging in its own right. In fact, the resulting relaxed problem is in general an infinite-dimensional nonlinear optimization problem. Existence of optimal solutions to such a problem is typically characterized by abstract conditions that are difficult to check and do not lead to implementable algorithms. Following a similar idea used in finite-dimensional nonlinear optimization problems, first order necessary optimality conditions can be constructed for checking the quality of any given solution. Additionally, numerous algorithms utilize the derivative information involved in those first order conditions.

Remark 2.2. Recall that the infinite-dimensional optimization is just a reformulation of our original optimal control problem. Although the aforementioned first order conditions for optimization can be directly used as optimality conditions for the optimal control problem, they are actually different. To be exact, these first order conditions are weaker than the celebrated Pontryagin’s Maximum Principle. Detailed discussions about the relationship between these two kinds of optimality conditions can be found in [63]. In addition, it should be emphasized that the existence of optimal solutions to both the relaxed optimal control problem and the original switched optimal control problem is not assumed. As suggested by the discussions in [41], such existence results are challenging to obtain. An alternative philosophy is to numerically search for potential solutions and verify them using optimality conditions.

In the next section, we first briefly review some important concepts and results in weak topology and infinite-dimensional optimization that will be fundamental in our later discussions.

2.3 Weak Topologies and Infinite-dimensional Optimization

As discussed above, both the original and the relaxed optimal control problems can be formulated as infinite-dimensional optimization problems. Establishing the relationship between solutions to these problems is central to our development. In this section, we first provide a short mathematical introduction to topologies, which is fundamental to all the following discussions. Then, some important concepts and results for infinite-dimensional optimization are briefly reviewed.

2.3.1 Topologies and Weak Topologies

Topology is one of the most fundamental notions in mathematics; it characterizes open sets. The first thing to emphasize is that working almost exclusively with Euclidean spaces, especially $\mathbb{R}^n$, leads one to ignore the careful selection of the underlying topology, since the Euclidean norm or metric naturally induces a strong and well-behaved topology. However, we do not have the luxury of using such a strong topology in our problem.

To formally introduce the weak topology concept and associated results useful for our later discussion, the generic topology definition, which essentially defines open sets, is first given below.

Definition 2.3 (Topology). Let $X$ be a set and $\mathcal{P}(X)$ be the power set of $X$. Let $\mathcal{O} \subset \mathcal{P}(X)$ be a collection of subsets of $X$ such that

- $\emptyset \in \mathcal{O}$ and $X \in \mathcal{O}$;

- $\mathcal{O}$ is closed under finite intersections, i.e., if $O_1, \dots, O_m \in \mathcal{O}$ with $m < \infty$, then $\bigcap_{i=1}^{m} O_i \in \mathcal{O}$.

Then $\mathcal{T} = \big\{ \bigcup_{O \in \mathcal{O}'} O \;\big|\; \mathcal{O}' \subset \mathcal{O} \big\}$ is a topology on $X$, and $(X, \mathcal{T})$ is called a topological space.

The following example helps make this concept clearer.

Example 2.1. Consider the set $X = \{a, b\}$ containing only two points. Then, all possible topologies defined on this set are given by

- $\mathcal{T} = \{\emptyset, X\}$;

- $\mathcal{T} = \{\emptyset, X, \{a\}\}$;

- $\mathcal{T} = \{\emptyset, X, \{b\}\}$;

- $\mathcal{T} = \{\emptyset, X, \{a\}, \{b\}\}$.

Note that, for any non-empty set $X$, $\mathcal{T}_{id} = \{\emptyset, X\}$ and $\mathcal{T}_{di} = \mathcal{P}(X)$ are always two topologies over $X$. Commonly, $\mathcal{T}_{id}$ is called the indiscrete topology and $\mathcal{T}_{di}$ is called the discrete topology. It is then apparent that any other topology $\mathcal{T}$ over $X$ must satisfy $\mathcal{T}_{id} \subset \mathcal{T} \subset \mathcal{T}_{di}$. In addition, given two topologies $\mathcal{T}_1$ and $\mathcal{T}_2$ over $X$, we say $\mathcal{T}_1$ is weaker (or coarser) than $\mathcal{T}_2$ (or $\mathcal{T}_2$ is stronger (or finer) than $\mathcal{T}_1$) if $\mathcal{T}_1 \subset \mathcal{T}_2$.

In topological spaces, continuity is typically characterized by the openness of preimages of open sets under functions. Mathematically, let $g: X \to Y$ be a function mapping from one topological space $(X, \mathcal{T}_x)$ to another $(Y, \mathcal{T}_y)$. Then, $g$ is said to be continuous if for any $O_y \in \mathcal{T}_y$ we have $g^{-1}(O_y) \in \mathcal{T}_x$. With this continuity definition, we are ready to introduce the following key weak topology definition.

Definition 2.4 (Weak Topology). Let $G = \{g_i\}_{i \in I}$ be a family of functions $g_i: X \to Y_i$, $\forall i \in I$, mapping from a set $X$ to topological spaces $(Y_i, \mathcal{T}_i)$, respectively. The weak topology on $X$ induced by $G$, denoted by $\mathcal{T}_G$, is the collection of all unions of finite intersections of sets of the form $g_i^{-1}(O_i)$, where $i \in I$ and $O_i \in \mathcal{T}_i$.

In other words, the weak topology induced by $G$ is the weakest topology on $X$ that makes all functions $g_i$, $i \in I$, continuous.
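For instance, anticipating the framework of Chapter 2.4, one natural choice in the switched optimal control setting is to let $G$ consist of the single map sending a hybrid input to its state trajectory,

$$g: (u, d) \mapsto x(\cdot\,; u, d),$$

so that a sequence of inputs converges in $\mathcal{T}_G$ exactly when the corresponding state trajectories converge; as discussed in Chapter 1, this is the weak topology implicitly adopted by most existing embedding-based algorithms.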

Remark 2.3. Note that the above weak topology definition should not be confused with another commonly used notion called the weak topology on $X$, denoted by $\mathcal{T}_X^{w}$. Such a topology requires $X$ to be a normed space and is defined to be the weak topology induced by $X^*$, the dual space of $X$, i.e., the space containing all continuous linear functions from $X$ to its base field. Such a topology is an important topic studied in the topology literature. There is another, more relevant, topology, called the weak$^*$ (weak-star) topology, denoted by $\mathcal{T}_{X^*}^{w^*}$, which is defined on the dual space $X^*$ by all elements in $X$. Such a definition originates from the observation that any element of $X$ can be viewed as a bounded linear function on $X^*$. This weak$^*$ topology is in fact the mathematical foundation behind the idea of considering relaxed optimal control problems. In particular, the weak$^*$ topology provides an exact interpretation of the relationship between solutions to the original problem and the relaxed problem.

Remark 2.4. The structure of the weak topology $\mathcal{T}_G$ is determined by the family of functions $G = \{g_i\}_{i \in I}$. In particular, when $X$ is a normed vector space, the underlying topology people implicitly assume is the weak topology induced by the norm function $\|\cdot\|$.¹ Such a norm topology is commonly called the strong topology on $X$.

Equipped with the above topological notions, convergence of sequences in a topological space is defined below.

Definition 2.5. Let $(X, \mathcal{T})$ be a topological space. A sequence $\{x^n\}_{n \in \mathbb{N}}$ converges to $x \in X$ if and only if

$$\forall O \in \mathcal{T} \text{ such that } x \in O, \; \exists N \in \mathbb{N} \text{ such that } \forall n \ge N, \; x^n \in O.$$

Theorem 2.2. Let $\{x^n\}_{n \in \mathbb{N}}$ be a sequence in $X$. It converges to $x \in X$ in the weak topology $\mathcal{T}_G$ if and only if

$$\forall i \in I, \quad \lim_{n \to \infty} g_i(x^n) = g_i(x).$$

Theorem 2.3. Let $(Z, \mathcal{T})$ be a topological space, and let $\varphi: Z \to X$ be a mapping from $Z$ to another topological space $(X, \mathcal{T}_G)$. Then $\varphi$ is continuous for the topologies $\mathcal{T}$ and $\mathcal{T}_G$ if and only if, for every $i \in I$, $g_i \circ \varphi$ is continuous.

Theorem 2.2 establishes convergence of sequences in weak topologies using the functions defining the weak topologies. Such a convergence notion has actually been used extensively, sometimes implicitly, in the optimization literature. For example, as stated in [18], for unconstrained minimization problems, a sequence of solutions is called convergent if the corresponding cost sequence converges. Adopting the topological notion developed above, this condition can be interpreted as convergence of the sequence of solutions in the weak topology induced by the cost function. It is worth mentioning that, for finite-dimensional normed vector spaces, the norm topology coincides with the weak topology. Hence, by Theorem 2.3, there is no need to distinguish between convergence in the weak topology induced by the cost function and convergence in the norm topology, as long as the cost function is continuous with respect to the norm topology, which is a standard assumption.

¹Many norms can be defined on a space $X$, and each of them induces a norm topology. In this chapter, we assume $\mathcal{T}_{\|\cdot\|_X} = \mathcal{T}_{\|\cdot\|_{\mathcal{L}^2}}$ if $X$ is a function space and $\mathcal{T}_{\|\cdot\|_X} = \mathcal{T}_{\|\cdot\|_2}$ if $X$ is a Euclidean space.

In fact, topologies play a central role in general optimization problems. Specifically, the topology adopted over the underlying optimization space defines the open sets, which in turn affect the definitions of neighborhoods and local minimizers. In other words, adopting different topologies over the underlying optimization space may result in drastically different solutions. Hence, it is critical to use an appropriate topology when solving an optimization problem. In the sequel, we provide a general overview of the relationship between optimal control and infinite-dimensional optimization over topological spaces.

2.3.2 Infinite-dimensional Optimization for Optimal Control

Based on our previous discussion, both the original and the relaxed problems can be formulated as infinite-dimensional optimization problems. In this subsection, a general introduction to infinite-dimensional optimization will be provided.

Let $(U, \mathcal{T})$ be a topological space and let $J: U \to \bar{\mathbb{R}}$ be a (cost) functional. An abstract infinite-dimensional optimization problem is typically given by

$$\min_{u \in U} J(u). \tag{2.9}$$

Two kinds of solution notions are of interest, namely the local and global minimizers.

Definition 2.6. A point $u \in U$ is called

1. a local minimizer, if there exists an open set $O \in \mathcal{T}$ such that $u \in O$ and $J(u) \le J(u')$, $\forall u' \in O$;

2. a global minimizer, if $J(u) \le J(u')$, $\forall u' \in U$.

It is obvious from the above generic definition of minimizers that the underlying topology $\mathcal{T}$ plays a central role in characterizing the set of local minimizers. Different choices of $\mathcal{T}$ may result in drastically different sets of local minimizers. Equipped with the weak topology notion given in Definition 2.4, we are particularly interested in the local minimizers associated with a certain weak topology $\mathcal{T}_G$.

Definition 2.7. $u^* \in U$ is a local minimizer under topology $\mathcal{T}_G$ if there exists an $\epsilon > 0$ such that $J(u^*) \le J(u')$ for all $u'$ satisfying $\|g_i(u^*) - g_i(u')\| \le \epsilon$ for all $i \in I$.

This definition is in fact a restatement of the first part of Definition 2.6 obtained by fixing the general topology $\mathcal{T}$ to be a weak topology $\mathcal{T}_G$. This treatment enables us to manipulate the set of local minimizers of interest by selecting appropriate functions defining the weak topology.

Existence of solutions to the generic optimization problem can be argued in an abstract way, relying on two basic properties: compactness and lower-semicontinuity.

Definition 2.8. Let $(U, \mathcal{T})$ be a topological space, and let $J: U \to \bar{\mathbb{R}}$. The cost functional $J$ is lower semicontinuous at $u \in U$ if

$$J(u) \le \sup_{O \in \mathcal{T}:\, u \in O} \; \inf_{u' \in O} J(u').$$

In addition, if we further assume $U$ is a metric space, this definition can be equivalently characterized by sequences. Specifically, $J$ is lower semicontinuous at $u \in U$ if

$$J(u) \le \liminf_{n \to \infty} J(u^n)$$

for all sequences $\{u^n\}$ converging to $u$.

Theorem 2.4. Let $J : U \to \bar{\mathbb{R}}$ be lower semicontinuous and let $\{u \in U \mid J(u) \le M\}$ be non-empty and compact for some $M \in \mathbb{R}$. Then the infinite-dimensional optimization problem (2.9) has a global minimum.

In practice, it is in general difficult to find the solution based on these abstract conditions. Usually, iterative algorithms are implemented to obtain solutions that verify certain optimality conditions. In particular, the first-order necessary condition for optimality is one of the most widely adopted optimality conditions. Such a necessary condition is built upon derivatives of the cost functional and of any constraints, which are used in those iterative algorithms as well. For finite-dimensional problems, derivatives are defined in the classical manner. However, for infinite-dimensional problems, a more general derivative notion needs to be defined.

Definition 2.9. Suppose $(X, \mathcal{T}_X)$ and $(Y, \mathcal{T}_Y)$ are locally convex topological spaces. Then, the Gâteaux differential $dF(u; \psi)$ of $F : X \to Y$ at $u \in O \in \mathcal{T}_X$ in the direction $\psi \in X$ is given by
$$dF(u; \psi) = \lim_{\tau \to 0} \frac{F(u + \tau\psi) - F(u)}{\tau} = \left.\frac{d}{d\tau} F(u + \tau\psi)\right|_{\tau = 0}. \quad (2.10)$$
If the limit exists for all $\psi \in X$, then $F$ is Gâteaux differentiable at $u$.

In addition, if $X$ and $Y$ are normed vector spaces, then $F$ is called Fréchet differentiable if there exists a bounded linear operator $A : X \to Y$ such that
$$\lim_{\tau \to 0} \frac{\|F(u + \tau) - F(u) - A\tau\|_Y}{\|\tau\|_X} = 0, \quad (2.11)$$
where $F'(u) = A$ is called the Fréchet derivative at $u$.

Based on the above differential (derivative) notions, both necessary and sufficient conditions for local minimizers can be derived [54]. Here, we assume in addition that the cost functional $J$ in (2.9) is continuously Fréchet-differentiable on $U$. Then, the first-order necessary condition for optimality is given below.

Proposition 2.1. Let $u^*$ be a local minimizer for (2.9). Then $J'(u^*) = 0$.

Sufficient conditions based on second-order derivatives can also be derived, and all the above discussions can be extended to the constrained case with additional treatment.

Here, we have only reviewed several fundamental concepts and results in infinite-dimensional optimization. There is actually a much deeper connection between infinite-dimensional optimization and optimal control problems [30, 63]. In Chapter 2.5, we will provide a detailed construction of a first-order necessary optimality condition for a constrained problem based on the Fréchet derivative introduced above.

2.4 A Unified Framework for Switched Optimal Control Problems

In this section, we formally develop our unified topology based framework for analyzing and designing algorithms for switched optimal control problems. Such a framework is based on a novel weak topological viewpoint and classical results in infinite-dimensional optimization.

To begin with, it is widely accepted that solutions to optimization or optimal control problems are typically given by local minimizers as defined in Definition 2.7, given the underlying topology. Moreover, for general problems, such local minimizers are difficult to characterize. To address this issue, necessary and sufficient conditions have been extensively studied in the literature for characterizing these local solutions. Among the rich literature, first-order necessary conditions based on gradient information and second-order sufficient conditions based on Hessian (second-order derivative) information are the most widely used. Possible constraints involved in the optimization can be handled by introducing the tangent cone and considering the Lagrangian. Extensions of these conditions to infinite-dimensional problems have been established in the literature using the notions of the Gâteaux differential and the Fréchet derivative.

To avoid going into too many details when presenting our main results, we focus on the abstracted problems in (2.7) and (2.8). In addition, we construct our framework based on the mathematical optimization model developed in [63]. One of the most important features of such a model lies in the introduction of an optimality function, which provides an abstract representation of various necessary optimality conditions.

The formal definition of the optimality function is given as follows.

Definition 2.10. A function $\theta_S(\cdot) : S \to \mathbb{R}$ is an optimality function for $P_S$ in (2.7) if:

1. $\theta_S(s) \le 0$ for all $s \in S$;

2. if $s^*$ is a local minimizer of $P_S$, then $\theta_S(s^*) = 0$.

Remark 2.5. Oftentimes, the optimality function is required to be continuous (or upper semicontinuous) [63]. Such a condition is introduced to ensure that, in a topological space, if $s^*$ is an accumulation point of a sequence $\{s^n\}_{n \in \mathbb{N}}$ and $\liminf_{n \to \infty} \theta_S(s^n) = 0$, then $\theta_S(s^*) = 0$. However, in our problem we do not assume the existence of accumulation points of the sequence $\{s^n\}_{n \in \mathbb{N}}$. Hence, the continuity (or upper semicontinuity) condition is not necessary.

First of all, the above definition can be immediately extended to other cases, especially the relaxed problem $P_R$ in (2.8). We denote by $\theta$ a generic optimality function and by $\theta_R$ the optimality function for $P_R$ in the rest of this chapter without repeating the definition. In addition, the above optimality function definition is very abstract and not necessarily informative. For example, $\theta \equiv 0$ always qualifies as a valid optimality function for any problem, yet it contributes nothing toward solving the optimization problem. How to construct informative optimality functions is an important research topic per se. Commonly, as discussed in [63], there is a general procedure for converting first-order necessary conditions into a corresponding optimality function. An example describing how to construct an optimality function using a first-order necessary condition will be provided in the next section.

Recall that the relaxed problem (2.8) is a classical infinite-dimensional optimization problem and hence can be solved via existing algorithms. Adopting the optimality function based framework, we call $(\Gamma_R, \theta_R)$ an algorithm for the relaxed problem, where $\Gamma_R : R \to R$ is the iterative update rule. Such an algorithm is called admissible if, given any initial condition $r^0$, it generates a sequence $\{r^n\}_{n \in \mathbb{N}}$ in $R$ via
$$r^{n+1} = \begin{cases} \Gamma_R(r^n), & \text{if } \theta_R(r^n) < 0, \\ r^n, & \text{if } \theta_R(r^n) = 0, \end{cases} \quad (2.12)$$
and ensures $\lim_{n \to \infty} \theta_R(r^n) = 0$.

Based on the aforementioned background, our goal is to construct admissible optimization algorithms $(\Gamma_S, \theta_S)$ for the combinatorial problem $P_S$ (2.7). Specifically, we provide a unified framework for constructing such algorithms based on the relaxed algorithms $(\Gamma_R, \theta_R)$ and a novel weak topology-based viewpoint of the chattering lemma.

The topology based framework involves the following three key steps.

1. Relax the optimization space $S$ to a vector space $R$, select a weak topology function $g : R \to Y$, and construct a projection operator $R_k : R \to S$ associated with the weak topology $\mathcal{T}_g$.

2. Solve the relaxed optimization problem $P_R$ defined in (2.8) by designing a relaxed optimality function $\theta_R : R \to \mathbb{R}$ under the selected weak topology $\mathcal{T}_g$, and selecting (or constructing) a relaxed optimization algorithm $\Gamma_R : R \to R$.

3. Set $\theta_S = \theta_R|_S$ and $\Gamma_S = R_k \circ \Gamma_R$ with any initial condition $s^0 \in S$.

The main underlying idea of the proposed framework is to transform the switched optimization problem $P_S$ into a classical optimization problem $P_R$, which can be solved through classical optimization methods in functional spaces [41, 63]. The solution of $P_R$ will then be used to construct the solution to the original problem $P_S$. The key components of the framework include the relaxed optimization space $R$, the weak topology $\mathcal{T}_g$, the projection operator $R_k$, and the relaxed optimization algorithm characterized by $\theta_R$ and $\Gamma_R$.

In the rest of this section, we first show that $\theta_S = \theta_R|_S$ is an optimality function for $P_S$ and then derive conditions on the aforementioned key framework components that guarantee convergence of the sequence $\{s^n\}_{n \in \mathbb{N}}$ to a stationary point characterized by $\theta_S$.

2.4.1 Convergence Analysis and Proofs

The following assumptions on $R$, $\mathcal{T}_g$ and $R_k$ are adopted in the framework to ensure its validity.

Assumption 2.2.

1. $J(\cdot)$ and $\Psi(\cdot)$ are Lipschitz continuous under topology $\mathcal{T}_g$ with a common Lipschitz constant $L$.

2. $S$ is dense in $R$ under $\mathcal{T}_g$, i.e., $\forall r \in R$, $\forall \epsilon > 0$, $\exists s \in S$ s.t. $\|g(r) - g(s)\|_Y \le \epsilon$.

3. There exists a projection operator $R_k : R \to S$ associated with $\mathcal{T}_g$ and parametrized by $k = 1, 2, \ldots$, such that $\forall r \in R$, $\forall \epsilon > 0$, there exists a $\hat{k} \in \mathbb{N}$ such that
$$\|g(R_k(r)) - g(r)\|_Y \le e_{R_k}(k) \le \epsilon, \quad \forall k \ge \hat{k}. \quad (2.13)$$

Assumption 2.2.1 is a restatement of Assumption 2.1.2, which ensures the well-posedness of the problems $P_S$ and $P_R$. Assumptions 2.2.2 and 2.2.3 specify conditions on the relaxed space and weak topology that can be used in the framework.

In the following lemma, we show that $\theta_S = \theta_R|_S$ is an optimality function for $P_S$ if $\theta_R$ is a valid optimality function for $P_R$.

Lemma 2.2. If θR is a valid optimality function for PR, then θS = θR|S is a valid optimality function for PS .

Proof. To prove this lemma, we need to show that $\theta_S$ satisfies the conditions in Definition 2.10.

The first condition is trivially satisfied.

For the second condition, suppose $s^* \in S$ is a local minimizer for $P_S$ but $\theta_S(s^*) < 0$. Since $\theta_R(s^*) = \theta_S(s^*)$, by the definition of local minimizers for $P_R$, it follows that there exist an $r$ and a positive number $C$ such that $J(r) - J(s^*) \le -C$ and $\Psi(r) \le 0$. By Assumption 2.2, we have $|J(R_k(r)) - J(r)| \le L\|g(R_k(r)) - g(r)\|_Y \le L e_{R_k}(k)$. By adding and subtracting $J(r)$, it follows that
$$J(R_k(r)) - J(s^*) \le |J(R_k(r)) - J(r)| + J(r) - J(s^*) \le L e_{R_k}(k) - C. \quad (2.14)$$
Let $\epsilon = \frac{C}{2L}$; it follows that $L e_{R_k}(k) - C \le -\frac{C}{2} < 0$ for $k \ge k_1$ for some $k_1 \in \mathbb{N}$. Hence we have $J(R_k(r)) - J(s^*) < 0$ for $k \ge k_1$, which contradicts the assumption.

The constraint evaluation can be divided into two cases: Ψ(r) < 0 and Ψ(r) = 0.

For the first case, an argument similar to the discussion on the cost part can be applied, yielding that $\Psi(R_k(r)) \le 0$ for all $k \ge k_2$ for some $k_2 \in \mathbb{N}$. Set $\hat{k}_1 \ge \max\{k_1, k_2\}$; hence $J(R_k(r)) - J(s^*) < 0$ and $\Psi(R_k(r)) \le 0$ for all $k \ge \hat{k}_1$, which contradicts the assumption that $s^*$ is a local minimizer for $P_S$.

For the second case, suppose $\Psi(r) = 0$ for all $r \in \{s \in R \mid \Psi(s) \le 0\}$ such that $J(r) - J(s^*) \le -C$; then $\Psi(s) = 0$ for all $s \in \{s \in S \mid \Psi(s) \le 0\}$ such that $J(s) - J(s^*) \le -C$, which again contradicts the assumption that $s^*$ is a local minimizer for $P_S$.

To show the convergence of $\{s^n\}_{n \in \mathbb{N}}$, we adopt an idea similar to the sufficient descent property presented in [5]. In order to handle the projection step and the state constraints, we define two functions $\Phi : R \times \mathbb{N} \to \mathbb{R}$ and $\Delta : R \times R \to \mathbb{R}$ below:
$$\Phi(r, k) \triangleq \begin{cases} \max\big\{J(R_k \circ \Gamma_R(r)) - J(\Gamma_R(r)),\ \Psi(R_k \circ \Gamma_R(r)) - \Psi(\Gamma_R(r))\big\}, & \text{if } \Psi(r) \le 0, \\ \Psi(R_k \circ \Gamma_R(r)) - \Psi(\Gamma_R(r)), & \text{if } \Psi(r) > 0, \end{cases} \quad (2.15)$$
$$\Delta(r_1, r_2) \triangleq \begin{cases} \max\{J(r_2) - J(r_1),\ \Psi(r_2)\}, & \text{if } \Psi(r_1) \le 0, \\ \Psi(r_2) - \Psi(r_1), & \text{if } \Psi(r_1) > 0. \end{cases} \quad (2.16)$$
The function $\Phi$ compactly characterizes the change of the cost $J$ and the constraint $\Psi$ at a point $s$ under the projection operator $R_k$. For a feasible point, we care about the changes of both the cost and the constraint under $R_k$. For an infeasible point, we only care about the change of the constraint. The function $\Delta$ characterizes the value difference of $J$ and $\Psi$ between two points $r_1$ and $r_2$. If $r_1$ is feasible and $\Delta < 0$, the cost can be reduced while maintaining feasibility by moving from $r_1$ to $r_2$. Similarly, if $r_1$ is infeasible and $\Delta < 0$, then it is possible to reduce the infeasibility by moving from $r_1$ to $r_2$.

Under Assumption 2.2, the following bound on the function Φ can be established:

Lemma 2.3. There exists a $k^* \in \mathbb{N}$ such that, given $\omega \in (0, 1)$, for any $C > 0$, $\gamma_C > 0$, and for any $s \in S$ with $\theta_S(s) < -C$, we have
$$\Phi(s, k) \le (\omega - 1)\gamma_C\, \theta_S(s), \quad \forall k \ge k^*. \quad (2.17)$$

Proof. This is a straightforward result from Assumption 2.2.1, Assumption 2.2.2 and Lemma 2.2.

Employing the definition of the function $\Delta$ and the above two lemmas, our main result on the convergence of $\{s^n\}_{n \in \mathbb{N}}$ is presented below.

Theorem 2.5. If for each $C > 0$ there exists a $\gamma_C > 0$ such that for any $r \in R$ with $\theta_R(r) < -C$,
$$\Delta(r, \Gamma_R(r)) \le \gamma_C\, \theta_R(r) < 0, \quad (2.18)$$
then, for an appropriate choice of $k$ for $R_k$ and for any $s^0 \in S$, the following two conclusions hold:

1. if there exists an $n_0 \in \mathbb{N}$ such that $\Psi(s^{n_0}) \le 0$, then $\Psi(s^n) \le 0$ for all $n \ge n_0$;

2. $\lim_{n \to \infty} \theta_S(s^n) = 0$, i.e., the sequence $\{s^n\}_{n \in \mathbb{N}}$ converges asymptotically to a stationary point.

Proof. 1. Suppose there exists an $n_0$ such that $\Psi(s^{n_0}) \le 0$. Then, for $k \ge k^*$,
$$\begin{aligned} \Psi(s^{n_0+1}) &= \Psi(R_k \circ \Gamma_R(s^{n_0})) - \Psi(\Gamma_R(s^{n_0})) + \Psi(\Gamma_R(s^{n_0})) - \Psi(s^{n_0}) + \Psi(s^{n_0}) \\ &\le (\omega - 1)\gamma_C\, \theta_S(s^{n_0}) + \gamma_C\, \theta_R(s^{n_0}) + 0 \\ &= \omega\gamma_C\, \theta_S(s^{n_0}) < 0, \end{aligned} \quad (2.19)$$
where the inequality is due to Lemma 2.3 and the assumption in Theorem 2.5.

2. We need to consider two cases due to the different forms of $\Delta$ for different values of $\Psi$.

Case 1: $\Psi(s^n) > 0$ for all $n \in \mathbb{N}$, i.e., the entire sequence is infeasible.

Suppose $\lim_{n \to \infty} \theta_S(s^n) \ne 0$. Since $\theta_S(\cdot)$ is a non-positive function, there must exist $C > 0$ such that $\liminf_{n \to \infty} \theta_S(s^n) = -2C$. Hence, there exist an infinite subsequence $\{s^{n_m}\}$, where $m$ is the sub-index of the subsequence, and an $m_1 \in \mathbb{N}_+$ such that $\theta_S(s^{n_m}) < -C$ for all $m \ge m_1$. Then, for all $m \ge m_1$ and for $k \ge k^*$, we have
$$\begin{aligned} \Psi(s^{n_m+1}) - \Psi(s^{n_m}) &= \Psi(R_k \circ \Gamma_R(s^{n_m})) - \Psi(\Gamma_R(s^{n_m})) + \Psi(\Gamma_R(s^{n_m})) - \Psi(s^{n_m}) \\ &\le (\omega - 1)\gamma_C\, \theta_S(s^{n_m}) + \gamma_C\, \theta_R(s^{n_m}) \\ &= \omega\gamma_C\, \theta_S(s^{n_m}) < 0, \end{aligned} \quad (2.20)$$
where the inequality is due to Lemma 2.3 and the assumption in Theorem 2.5. This leads to $\liminf_{m \to \infty} \Psi(s^{n_m}) = -\infty$, which contradicts the lower boundedness of $\Psi$ implied by Assumption 2.1 and Assumption 2.2.

Case 2: There exists an $n_0$ such that $\Psi(s^{n_0}) \le 0$.

By the first conclusion, it follows that $\Psi(s^n) \le 0$ for all $n \ge n_0$. Suppose $\liminf_{n \to \infty} \theta_S(s^n) \ne 0$; then there exists $C > 0$ such that $\liminf_{n \to \infty} \theta_S(s^n) = -2C$. Hence, there exist an infinite subsequence $\{s^{n_m}\}$ and an $m_1 \in \mathbb{N}_+$ such that $\theta_S(s^{n_m}) < -C$ for all $m \ge m_1$. Then, for all $m \ge m_1$ and all $k \ge k^*$, we have
$$\begin{aligned} J(s^{n_m+1}) - J(s^{n_m}) &= J(R_k \circ \Gamma_R(s^{n_m})) - J(\Gamma_R(s^{n_m})) + J(\Gamma_R(s^{n_m})) - J(s^{n_m}) \\ &\le (\omega - 1)\gamma_C\, \theta_S(s^{n_m}) + \gamma_C\, \theta_R(s^{n_m}) \\ &= \omega\gamma_C\, \theta_S(s^{n_m}) < 0, \end{aligned} \quad (2.21)$$
where the inequality is due to Lemma 2.3 and the assumption in Theorem 2.5. This leads to $\liminf_{m \to \infty} J(s^{n_m}) = -\infty$, which contradicts the lower boundedness of $J$ implied by Assumption 2.1 and Assumption 2.2.

2.5 Case Studies

In this section, we provide two numerical examples to illustrate the importance of viewing the switched optimal control problem from the topological perspective and the usage of the proposed framework.

2.5.1 Problem with terminal cost

We first provide a numerical example to show the importance of choosing an appropriate topology in solving switched optimal control problems.

Consider the following 2-dimensional switched system with 3 modes:
$$\dot{x}(t) = f(x(t)) = \sum_{i=1}^{3} d_i(t) f_i(x(t)), \quad t \in [0, 2], \quad (2.22)$$
where
$$f_1(x) = \begin{bmatrix} 0 \\ -1.5 \end{bmatrix}, \quad f_2(x) = \begin{bmatrix} -1.5 \\ 0 \end{bmatrix}, \quad f_3(x) = \begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix} x.$$

Suppose no constraint is imposed and only the terminal state is penalized according to $J(x) = x_1 x_2 \exp(-x_1^2 - x_2^2 - 0.3x_1)$. The cost function admits a local minimum at $A = [0.6361, -0.7071]^T$ and a global minimum at $B = [-0.7861, 0.7071]^T$, and the contour plot of the cost function is shown in Fig. 2.1. Since the cost function only penalizes the terminal state and there is no constraint, it is actually inappropriate to adopt the weak topology induced by the entire trajectory, denoted by $\mathcal{T}_x$. The weak topology induced by the terminal state function, $\mathcal{T}_F$, and the weak topology induced by the cost function, $\mathcal{T}_J$, are two better candidates for this particular problem. To illustrate this point, we implement two algorithms solving this problem, one designed under $\mathcal{T}_x$ and the other constructed under $\mathcal{T}_J$.

In the numerical simulation, the initial state is given by $x(0) = [1, 1.5]^T$ and the initial switching input is set to $d(0) = [0.7, 0.2, 0.1]^T$. The trajectories of the terminal states generated by the algorithms under $\mathcal{T}_x$ and $\mathcal{T}_J$ are shown in Fig. 2.1. It can be observed that the terminal states generated by the algorithm under $\mathcal{T}_x$ converge to the local minimum but not the global one, whereas the terminal states generated by the algorithm under $\mathcal{T}_J$ are able to jump away from the local minimum and converge to the global one. The reason is that the choice of weak topology determines the definition of local minimizers, which in turn affects the optimality function in the proposed framework. In particular, $A$ is a local minimum under $\mathcal{T}_x$, but it is no longer a local minimum under $\mathcal{T}_J$.
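For concreteness, the following minimal sketch (our own illustration, not the code used for the experiments) shows how the embedded system (2.22) can be rolled out under a relaxed input $d(t)$ with forward-Euler integration; the step size and the constant relaxed input are assumptions of the sketch.

```python
import numpy as np

def f_modes(x):
    """Vector fields f_1, f_2, f_3 of the three modes in (2.22)."""
    return [np.array([0.0, -1.5]),
            np.array([-1.5, 0.0]),
            np.array([-x[0], x[1]])]

def simulate_embedded(d, x0, T=2.0, dt=1e-3):
    """Euler-integrate x' = sum_i d_i(t) f_i(x); d maps t to a point on the 3-simplex."""
    x, t = np.asarray(x0, dtype=float), 0.0
    while t < T:
        x = x + dt * sum(di * fi for di, fi in zip(d(t), f_modes(x)))
        t += dt
    return x  # terminal state x(2), to be scored by the terminal cost J

# Example: the initial relaxed input d(0) = [0.7, 0.2, 0.1]^T held constant in time.
x_T = simulate_embedded(lambda t: np.array([0.7, 0.2, 0.1]), x0=[1.0, 1.5])
```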

Through this example, it is clear that selecting an appropriate underlying topology plays a crucial role in solving switched optimal control problems using the embedding-based approach. In the next subsection, we provide a more realistic example to illustrate the usage of the proposed framework.

2.5.2 Problem with mode-dependent cost

In this subsection, we consider an optimal control problem with mode-dependent cost. It is shown that the classical topology induced by the state trajectory of the original system fails to serve as the appropriate underlying topology in this problem as well. Different from the previous case, such a topology is actually too weak.

Figure 2.1: Convergence of terminal states under different topologies

To address this issue, a commonly adopted technique is to augment the system state. We argue that such an augmentation essentially yields a stronger underlying topology, making the new topology feasible.

Consider the Diesel-Electric Submarine (DES) problem studied in [22, 49, 66], which consists of seven subsystems given by:
$$\dot{x}(t) = f(x(t), u(t), d(t)) = \sum_{i=1}^{7} d_i(t)\left(f_i^a(x(t)) - f_i^b(u(t))\right), \quad t \in [0, T], \quad (2.23)$$
where
$$f_i^a(x(t)) = \frac{2 \times 10^4 c_i}{x(t) + 100}, \quad \forall i = 1, 2, \ldots, 7,$$
and
$$f_i^b(u(t)) = \begin{cases} \frac{1}{80} u^2(t) + 5, & \text{if } i = 1, \\ \frac{1}{120} u^2(t) + 8, & \text{if } i = 2, \ldots, 7. \end{cases}$$
In the above formulation, $x(t)$ is the battery charging level, $u(t)$ is the velocity of the submarine, and $T$ is the time horizon. The configurations are summarized in Table 2.1, where $s_n$ is a given constant capturing the noise level of the diesel engine.

Table 2.1: Configurations of the DES system

  i   |  1  |   2    |   3    |   4    |   5    |   6    |   7
 c_i  |  0  | 0.2564 | 0.3333 | 0.5128 | 0.6666 | 0.7692 |   1
 s_i  |  0  |  s_n   | 1.5s_n |  2s_n  |  3s_n  |  3s_n  | 4.5s_n

The DES optimal control problem is given as follows:
$$\begin{aligned} \underset{u,\,d}{\text{Minimize}} \quad & \int_0^T \alpha(x(t) - 100)^2 + \beta \ln(s(t) + 1)\, dt && (2.24a) \\ \text{subject to} \quad & x(t) \in [\underline{x}, \bar{x}],\ u(t) \in [\underline{u}, \bar{u}], && (2.24b) \\ & \int_0^T u(t)\, dt = L, && (2.24c) \end{aligned}$$
where $\alpha$ and $\beta$ are weighting factors and $s(t) = \sum_{i=1}^{7} d_i(t) s_i$. The first term in the cost function accounts for the deviation from the maximum charging level and the second term accounts for the exposure of the vessel due to recharging [66]. The constraint (2.24b) summarizes the battery charging level and submarine speed constraints, and (2.24c) is the submarine route constraint, with $L$ being the distance the submarine needs to travel.

In this problem, the cost function (2.24a) explicitly depends on $s(t)$, which is determined by the switching signal. Therefore, the classical choice of the weak topology induced by the state trajectory does not work for this problem, since the cost function is not Lipschitz continuous under such a weak topology, i.e., Assumption 2.2.1 is not satisfied.

To address this issue, we augment the state to incorporate the effect of the switching signal and adopt the state trajectory based topology induced by the augmented system. Note that, since the system is augmented, the new topology is stronger than the original one.

The DES system (2.23) can be simplified into a 2-mode switched system with an additional continuous input $c(t)$, which is the relaxed discrete input among modes 2 to 7. The new dynamics is given by:
$$\begin{aligned} \dot{x}(t) &= \sum_{i=2}^{7} d_i(t)\left(f_i^a(x(t)) - f_i^b(u(t))\right) - d_1(t) f_1^b(u(t)) \\ &= (1 - d_1(t))\left(\frac{2 \times 10^4 \times c(t)}{x(t) + 100} - f_2^b(u(t))\right) - d_1(t) f_1^b(u(t)) \\ &\triangleq \tilde{f}(x(t), u(t), c(t), d(t)), \end{aligned} \quad (2.25)$$

where $c(t)$ satisfies $\sum_{i=2}^{7} d_i(t) c_i = (1 - d_1(t))\, c(t)$. We augment the original system by introducing three new states encoding the equality constraint, the switching cost and the running cost, respectively, as follows:
$$\dot{x}_1(t) = \tilde{f}(x_1(t), u(t), c(t), d(t)), \quad (2.26a)$$
$$\dot{x}_2(t) = u(t), \quad (2.26b)$$
$$\dot{x}_3(t) = \ln(s(t) + 1), \quad (2.26c)$$
$$\dot{x}_4(t) = (x_1(t) - 100)^2, \quad (2.26d)$$
with constraints:
$$x_1(t) \in [\underline{x}, \bar{x}], \quad u(t) \in [\underline{u}, \bar{u}], \quad c(t) \in [c_2, c_7]. \quad (2.27)$$

The new cost function is given by:
$$J(u, c, d) = \alpha x_4(T) + \beta x_3(T) + \gamma (x_2(T) - L)^2, \quad (2.28)$$

where $\gamma$ is a new weighting factor. The modified DES problem is summarized as:
$$\begin{aligned} \underset{u,\,c,\,d}{\text{Minimize}} \quad & J(u, c, d) \\ \text{subject to} \quad & \text{(2.26a)--(2.26d) and (2.27).} \end{aligned} \quad (2.29)$$
The weak topology induced by the state trajectory of the new system (2.26) is appropriate for the modified DES problem. Now, we construct a first-order algorithm for this particular problem under the guidance of the proposed framework.

1. Relaxed space: for the augmented system, the relaxed control space would be $U_r = \{(u, c) \mid u \in [\underline{u}, \bar{u}],\ c \in [c_2, c_7]\}$ and $D_r = \{(d_1^r, d_2^r) \in [0, 1]^2 \mid d_1^r + d_2^r = 1\}$. Therefore, the function space used in the infinite-dimensional optimization problem is given by $R = \mathcal{U} \times \mathcal{D}_r$, where $\mathcal{U} = L^2([0, T], U_r)$ and $\mathcal{D}_r = L^2([0, T], D_r)$.

2. Weak topology: the weak topology is the one induced by the state trajectory of the augmented system (2.26).

3. Projection operator: the projection operator for the DES problem (2.29) consists of two projections. The first step projects the obtained relaxed input to the 2-mode switching signal, and the second step projects the 2-mode switching signal to the 7-mode switching signal.

The first-step projection $s = R_k^1(r) = (d_1^R(t), d_2^R(t))^T$ is the frequency modulation operator with frequency $2^k$, given by
$$d_1^R(t) = \begin{cases} 1, & \text{if } t \in (T_{i,1}, T_{i,2}], \\ 0, & \text{if } t \in (t_{i-1}, T_{i,1}] \cup (T_{i,2}, t_i], \end{cases} \qquad d_2^R(t) = 1 - d_1^R(t). \quad (2.30)$$
The $T_{i,j}$ above are given by $T_{i,1} = \frac{i-1}{2^k}T + \frac{1}{2}\int_{t_{i-1}}^{t_i} d_2^r(t)\,dt$ and $T_{i,2} = T_{i,1} + \int_{t_{i-1}}^{t_i} d_1^r(t)\,dt$, where $t_i = \frac{i}{2^k}T$. (An illustrative sketch of this first-step projection is given after this list.)

The second projection operator $R_k^2$ is given as follows. For each time period $(T_1, T_2)$, with $\delta t = T_2 - T_1$, obtained from the first-step projection such that $d_2^R(t) = 1$ for $t \in (T_1, T_2)$, and given the solution of $c(t)$ during such a period, the original switching signal $d_k$, $k \in I_k = \{2, 3, \ldots, 7\}$, is given by:
$$\begin{cases} d_j(t) = 1, & t \in (T_1, T_0], \\ d_{-[j]}(t) = 0, & t \in (T_1, T_0], \\ d_{j+1}(t) = 1, & t \in (T_0, T_2), \\ d_{-[j+1]}(t) = 0, & t \in (T_0, T_2), \end{cases} \quad (2.31)$$
where $-[k] = I_k \setminus \{k\}$ is the set of indices except for $k$, $j$ is such that $c_j \delta t \le \int_{T_1}^{T_2} c(t)\,dt \le c_{j+1}\delta t$, and $T_0$ is such that
$$T_0 = \frac{c_{j+1}T_2 - c_j T_1 - \int_{T_1}^{T_2} c(t)\,dt}{c_{j+1} - c_j}. \quad (2.32)$$

The overall projection operator is then given by the composition $R_k = R_k^2 \circ R_k^1$.

4. Optimality function: $\theta_R(s) = \min_{r \in R} DJ(s; r - s)$, where $DH(x; x')$ is the directional derivative of $H$ at $x$ along the direction $x'$.

5. Relaxed algorithm: $\Gamma_R$ is given by $\Gamma_R = \hat{\Gamma}^l$, where $\hat{\Gamma}$ is the standard steepest descent algorithm and $l$ is determined by verifying the condition of Theorem 2.5 as follows:
$$l = \min\{k \in \mathbb{N} \mid J(\hat{\Gamma}^k(s)) - J(s) \le \gamma_C\, \theta_R(s)\}, \quad (2.33)$$
where $\gamma_C$ is the constant in Theorem 2.5.
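As referenced in item 3, the following sketch (our own illustration, assuming $d_1^r$ is available as a callable and using simple trapezoidal quadrature) implements the first-step frequency-modulation projection $R_k^1$ of (2.30): on each dyadic subinterval, the relaxed duty cycle of mode 1 is realized as a single centered pulse whose width matches the integral of $d_1^r$.

```python
import numpy as np

def frequency_modulation(d1_relaxed, T, k, n_quad=200):
    """Return d_1^R(t) as in (2.30); d_2^R(t) is simply 1 - d_1^R(t)."""
    pulses = []
    for i in range(1, 2**k + 1):
        lo, hi = (i - 1) * T / 2**k, i * T / 2**k          # subinterval (t_{i-1}, t_i]
        s = np.linspace(lo, hi, n_quad)
        mass_d1 = np.trapz([d1_relaxed(t) for t in s], s)  # integral of d_1^r
        mass_d2 = (hi - lo) - mass_d1                      # integral of d_2^r = 1 - d_1^r
        T_i1 = lo + 0.5 * mass_d2                          # pulse start T_{i,1}
        pulses.append((T_i1, T_i1 + mass_d1))              # pulse end   T_{i,2}
    return lambda t: 1.0 if any(a < t <= b for a, b in pulses) else 0.0
```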

Proposition 2.2. The components specified as above satisfy the conditions of the topology based framework, i.e. given any initial condition, the sequence of switched inputs generated by the algorithm ΓS = Rk ◦ ΓR converges to a stationary point of this problem.

To prove this proposition, all the conditions in our framework need to be verified. Here, we only sketch the proof.

• Validity of $\theta_R$:

1. $\theta_R(s) = \min_{r \in R} DJ(s; r - s) \le DJ(s; s - s) = 0$;

2. Suppose $s$ is a local minimizer of $P_R$ but $\theta_R(s) < 0$; then $\exists s'$ such that $DJ(s; s' - s) < 0$. By the mean value theorem, $\exists \lambda \in (0, 1)$ such that $J(s + \lambda(s' - s)) - J(s) = \lambda DJ(s; s' - s) + o(\lambda) < 0$, which contradicts the local minimality of $s$ for sufficiently small $\lambda$.

• Assumption 2.2.(2): This condition is guaranteed by the chattering lemma [9, 10].

• Assumption 2.2.(3): The validity of this projection operator is ensured by an argument analogous to the proof of Theorem 1 in [9].

• Condition in Theorem 2.5: This is clearly satisfied due to our construction of $\Gamma_R$.

The DES problem was solved in [66] using an approach similar to the bi-level optimization method [100, 101], which first optimizes the switching instants for a given switching sequence and then updates the switching sequence. In [66], the switching sequence is chosen as $N$ copies of the switching input set. Once the switching instants for the $N$-copied switching sequence are optimized, one more copy of the discrete input set is added and $N$ is increased by 1. In addition, the assumption that the continuous control input $u(t)$ is a piecewise constant function which switches when the discrete-valued input switches was adopted in [66]. This assumption significantly restricts the space of potential solutions to the problem and hence limits the performance of the obtained solution.

We adopt an initialization similar to that in [66], and Euler integration is adopted to approximate the continuous-time state trajectory. Note that the consistent approximation analysis of the Euler integration is beyond the scope of this dissertation and hence omitted. The terminal time $T$ is set to 24, $\alpha = 0.001$, $\beta = 1$, $\gamma = 1$, $\underline{x} = 20$, $\bar{x} = 100$, $\underline{u} = 5$, $\bar{u} = 60$, $L = 840$, $s_n = 10$, and the initial state and control are set as $x_1(0) = 100$ and $u(0) = 30$. The initial relaxed switching signal is given by $d_1 = 1$.

The solution obtained by the constructed switched optimal control algorithm results in a cost of 17.44, which is much smaller than the cost induced by the solution presented in [66]. This example demonstrates how the proposed framework can be used to construct embedding based algorithms for switched optimal control problems.

2.6 Conclusion

In this chapter, we study the optimal control of continuous-time switched systems. A unified topology based framework for designing and analyzing various embedding based algorithms is developed.

Our framework views the embedding based approach as a change of topology over the optimization space. From this perspective, the proposed framework adopts the weak topology structure and develops a general procedure to construct various embedding based switched optimal control algorithms. A set of conditions guaranteeing convergence of the resulting algorithms is provided as well. Examples are shown to demonstrate the importance of selecting an appropriate weak topology and the usage of the proposed framework.

Future work includes the study of universal mechanisms for selecting appropriate weak topologies and other components involved in our framework for particular underlying problems.

Chapter 3: Reinforcement Learning for Switched Linear Systems

3.1 Introduction

In this chapter, we turn our attention to discrete-time switched systems, especially switched linear systems. Different from the previous chapter, for the optimal control problem considered here, we aim to find optimal feedback control laws rather than open-loop control sequences for given initial conditions.

Generally speaking, looking for feedback laws is more challenging than finding open-loop solutions in optimal control problems. Among the rich literature, dynamic programming offers a powerful tool for finding such laws. Another, more recent, approach, originating from the model predictive control literature, is to use multi-parametric programming to find the control laws. Despite these well-known results, it is widely known that there are two main practical challenges in applying these techniques, namely the "curse of modeling" and the "curse of dimensionality", referring to the difficulty of obtaining an accurate system model and the exponential growth of the computation and memory needed for solving the problem, respectively.

In addition to these challenges, the combinatorial nature associated with the switched optimal control problem arises here as well, which actually aggravates the two "curses" even more. In order to develop tractable solutions addressing the aforementioned issues, we consider a model-free reinforcement learning approach that tackles the two "curses" together.

The contributions of this chapter are mainly two-fold. First, this work is among the first to study model-free reinforcement learning for hybrid systems. By assuming a simulator which outputs the successive state and an associated cost given any state-input pair according to a switched linear model, a Q-learning algorithm is developed for finding a (sub-)optimal control policy without any knowledge about the system dynamics. Second, inspired by the analytical structure of optimal control for switched linear systems developed in the control literature, we propose a specific Q-function approximator and an associated update scheme instead of directly applying existing neural network-based algorithms.

The rest of this chapter is organized as follows: the discrete-time optimal control problem for switched systems is formulated in Chapter 3.2. In Chapter 3.3, classical model-based optimal control results are reviewed, whose solution structure will be used in the proposed algorithm. Chapter 3.4 presents the main idea of Q-learning, which is a classical algorithm in model-free reinforcement learning. The proposed Q-learning algorithm and the associated training method are presented in Chapter 3.5. Case studies are given in Chapter 3.6 and concluding remarks are summarized in Chapter 3.7.

3.2 Problem Formulation

In this chapter, we consider the discrete-time version of switched systems, given as follows:
$$x(t + 1) = f_{v(t)}(x(t), u(t)), \quad t \in \mathbb{Z}_+. \quad (3.1)$$

Similar to the previous chapter, $x(t) \in X \subset \mathbb{R}^{n_x}$ is the system state, and $s(t) = (u(t), v(t))$ will be referred to as a hybrid control pair, where $u(t) \in U \subset \mathbb{R}^{n_u}$ and $v(t) \in \Sigma \triangleq \{1, 2, \ldots, n_v\}$ are the continuous and discrete inputs, respectively. In particular, we consider the unconstrained scenario, i.e., we assume $X = \mathbb{R}^{n_x}$ and $U = \mathbb{R}^{n_u}$ in the rest of this chapter. Following standard notions in optimal control, we denote by the functions $\xi_t = (\mu_t, \nu_t) : \mathbb{R}^{n_x} \to \mathbb{R}^{n_u} \times \Sigma$ a hybrid control law, where $\mu_t : \mathbb{R}^{n_x} \to \mathbb{R}^{n_u}$ is called the continuous control law and $\nu_t : \mathbb{R}^{n_x} \to \Sigma$ is called the switching control law. A sequence of hybrid control laws over time forms an infinite-horizon hybrid control policy, denoted by $\pi = \{\xi_0, \xi_1, \ldots\}$, or a finite-horizon hybrid control policy $\pi_N = \{\xi_0, \xi_1, \ldots, \xi_{N-1}\}$ if only a finite horizon is considered. A hybrid control policy is called stationary if $\xi_t = \xi$ for all decision stages $t \in \mathbb{Z}_+$. The performance of a given hybrid control policy $\pi$ (or $\pi_N$) for any given initial state $x(0) = x$ is evaluated by the following infinite-horizon (or finite-horizon) cost function:
$$J^\pi(x) = \lim_{N \to \infty} \sum_{t=0}^{N} \ell(x(t), \mu_t(x(t)), \nu_t(x(t))), \quad \text{subject to (3.1) with } x(0) = x, \quad (3.2a)$$
$$J_N^{\pi_N}(x) = \sum_{t=0}^{N-1} \ell(x(t), \mu_t(x(t)), \nu_t(x(t))) + J_f(x(N)), \quad \text{subject to (3.1) with } x(0) = x, \quad (3.2b)$$
where $J_f$ in (3.2b) is a nonnegative function called the terminal cost, penalizing the terminal state in the finite-horizon scenario, and $\ell(x, u, v)$ is the running cost per decision stage, satisfying the following positivity condition throughout this chapter, which guarantees that the limit in the above cost function is well-defined.

Assumption 3.1 (Positivity). The running cost ` satisfies

$$\ell(x, u, v) \ge 0, \quad \text{for all } (x, u, v) \in X \times U \times \Sigma. \quad (3.3)$$

Denote by $\Pi$ and $\Pi_N$ the sets of all admissible infinite-horizon and finite-horizon control policies; the optimal control problem is given as follows:

Problem 3.1. Find $\pi^*$ (resp. $\pi_N^*$) that minimizes (3.2a) (resp. (3.2b)), or equivalently,
$$(\pi^*, J^*) = \min_{\pi \in \Pi} \lim_{N \to \infty} \sum_{t=0}^{N} \ell(x(t), \mu_t(x(t)), \nu_t(x(t))), \quad \text{subj. to system (3.1) with } x(0) = x, \quad (3.4)$$
$$(\pi_N^*, J_N^*) = \min_{\pi_N \in \Pi_N} \sum_{t=0}^{N-1} \ell(x(t), \mu_t(x(t)), \nu_t(x(t))) + J_f(x(N)), \quad \text{subj. to system (3.1) with } x(0) = x. \quad (3.5)$$

Commonly, the optimal infinite-horizon cost function $J^*$ is referred to as the (infinite-horizon) value function, denoted by $V^* = J^*$, and the optimal finite-horizon cost function is referred to as the finite-horizon value function, denoted by $V_N = J_N^*$.

Solving Problem 3.1 with general nonlinear switched dynamics is known to be hard. In particular, classical optimal control requires complete knowledge about the system dynamics, which is difficult to obtain for practical problems. One possible way to solve Problem 3.2 is to first identify the system dynamics of each subsystem using various system identification techniques and then solve the corresponding optimal control problem based on the identified model. While such an approach is conceptually easy, its performance can be quite sensitive to modeling errors or unmodeled dynamics. A profound understanding of the impact of modeling error on model-based reinforcement learning is very challenging even for standard LQR problems [26]. Motivated by these observations, we focus on a model-free reinforcement learning approach that tackles the two "curses" together by directly searching for a good sub-optimal control policy. Specifically, we assume access to a switched system simulator which can be well approximated by a simple switched linear system. In other words, denoting by $\tilde{f}(x, u, v)$ the simulator dynamics, we assume $\tilde{f}(x, u, v) = A_v x + B_v u$ in this dissertation, where $A_v, B_v$ are unknown to us. Furthermore, the running cost per decision stage is chosen to be the following quadratic form:
$$\ell(x, u, v) = x^T Q_v x + u^T R_v u, \quad (3.6)$$
where $Q_v \in \mathbb{S}_+$ and $R_v \in \mathbb{S}_{++}$ are the penalizing matrices. Given the above discussion, the problem studied in this chapter is stated below.

Problem 3.2. Given the simulator dynamics
$$x^+ = \tilde{f}(x, u, v) = A_v x + B_v u, \quad (3.7)$$
and the quadratic cost function (3.6), find $\pi^*$ that minimizes the infinite-horizon cost function (3.2a), or equivalently,
$$\pi^* = \arg\min_{\pi \in \Pi} \lim_{N \to \infty} \sum_{t=0}^{N} x(t)^T Q_{\nu_t(x(t))} x(t) + \mu_t(x(t))^T R_{\nu_t(x(t))} \mu_t(x(t)), \quad \text{subject to } x(t+1) = \tilde{f}(x(t), \mu_t(x(t)), \nu_t(x(t))), \ \forall t \in \mathbb{Z}_+. \quad (3.8)$$
In the sequel, we first review the classical results from the optimal control literature for solving Problem 3.2. A particular Q-learning algorithm will then be developed that tries to incorporate analytical results from optimal control.
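To make the setting concrete, here is a minimal sketch of the black-box simulator assumed in Problem 3.2 (our own illustration; the class and method names are hypothetical). The matrices $(A_v, B_v)$ live inside the simulator and are never exposed to the learning agent, which only observes the successor state and the incurred running cost.

```python
import numpy as np

class SwitchedLinearSimulator:
    """Black-box oracle for (3.7) and (3.6); A_v, B_v are hidden from the agent."""
    def __init__(self, A, B, Q, R):
        self._A, self._B = A, B   # lists of matrices indexed by the mode v
        self._Q, self._R = Q, R
    def step(self, x, u, v):
        """Return x^+ = A_v x + B_v u and the running cost x'Q_v x + u'R_v u."""
        x_next = self._A[v] @ x + self._B[v] @ u
        cost = float(x @ self._Q[v] @ x + u @ self._R[v] @ u)
        return x_next, cost
```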

3.3 Model-based Optimal Control

In this section, we review the classical model-based optimal control results for the switched linear quadratic regulation (SLQR) problem, Problem 3.2. Dynamic programming serves as the key approach for solving classical optimal control problems. In particular, the value iteration algorithm and its convergence results will be introduced first. Then, the SLQR results based on value iteration are provided, which will be used extensively in our later discussion. At the end of this section, limitations of these classical results will be pointed out, which motivate our model-free solution.

3.3.1 Dynamic Programming

Given information about the system dynamics, the classical optimal control problem, Problem 3.1, is usually solved by the well-known dynamic programming approach. The key idea of using dynamic programming to solve the optimal control problem lies in the well-known Bellman Principle of Optimality [7], which states that any tail policy of an optimal policy remains optimal with regard to the state resulting from the previous decisions. In particular, it has been shown that the infinite-horizon value function $V^*$ satisfies the Bellman Equation, namely

$$V^*(x) = \min_{u,v}\{\ell(x, u, v) + V^*(f(x, u, v))\}, \quad \forall x \in X, \quad (3.9)$$
or equivalently, $V^*$ is a stationary point of the following Value Iteration operator.

Definition 3.1. Denote by $\mathcal{E}^+(X)$ the set of all functions $V : X \to \bar{\mathbb{R}}_+$. Then $\mathcal{T} : \mathcal{E}^+(X) \to \mathcal{E}^+(X)$ is called the Value Iteration operator, given by:
$$(\mathcal{T}V)(x) = \min_{u,v}\{\ell(x, u, v) + V(f(x, u, v))\}. \quad (3.10)$$

Essentially, the Value Iteration operator maps a positive function to another positive function. Given the infinite-horizon value function $V^*$, the optimal stationary policy can be obtained through the minimization on the right-hand side of (3.9). Therefore, the optimal control problem becomes finding the infinite-horizon value function $V^*$. Value Iteration is one of the most well-known algorithms proposed in the dynamic programming literature for finding value functions in both infinite-horizon and finite-horizon scenarios.

Generally speaking, Value Iteration starts with some initialization $V_0 : X \to \bar{\mathbb{R}}_+$ and generates a sequence of functions $\{V_k\}_{k=0}^{\infty}$ according to
$$V_{k+1} = \mathcal{T}V_k \triangleq \min_{u,v}\{\ell(x, u, v) + V_k(f(x, u, v))\}. \quad (3.11)$$

For finite-horizon problems, $V_0$ is typically chosen as the given terminal cost function $J_f$. It is clear that, due to the definition of the Value Iteration operator (3.10) and the Principle of Optimality, the above value iteration (3.11) gives the exact finite-horizon value function provided the aforementioned initialization is fulfilled, i.e., $V_N = \mathcal{T}^N J_f$ for all $N < \infty$.

For infinite-horizon problems, it is expected that the finite-horizon value functions will converge to the infinite-horizon one, i.e., $V^* = \lim_{N \to \infty} V_N$; however, such a relationship is not always true. Conditions guaranteeing the above desired convergence have been given in the literature [11, 12, 13].

Lemma 3.1. Under Assumption 3.1, if $\tilde{V} : X \to (-\infty, \infty]$ satisfies $\tilde{V} \ge \mathcal{T}\tilde{V}$ and $\tilde{V} \ge 0$, then $\tilde{V} \ge V^*$.

Proof. Since $(\mathcal{T}\tilde{V})(x) > -\infty$ for all $x \in X$, for any sequence $\{\epsilon_k\}$ with $\epsilon_k > 0$ there exists an admissible policy $\tilde{\pi} = \{\tilde{\xi}_1, \tilde{\xi}_2, \ldots\}$ such that, for any $x \in X$ and $k$,
$$\ell(x, \tilde{\xi}_k(x)) + \tilde{V}(f(x, \tilde{\xi}_k(x))) \le (\mathcal{T}\tilde{V})(x) + \epsilon_k. \quad (3.12)$$
Then we have, for any $x_0 \in X$,
$$\begin{aligned} V^*(x_0) &= \min_{\pi \in \Pi} \lim_{N \to \infty} \sum_{k=0}^{N-1} \ell(x_k, \xi_k(x_k)) \\ &\le \min_{\pi \in \Pi} \liminf_{N \to \infty}\left[\tilde{V}(x_N) + \sum_{k=0}^{N-1} \ell(x_k, \xi_k(x_k))\right] \\ &\le \liminf_{N \to \infty}\left[\tilde{V}(x_N) + \sum_{k=0}^{N-1} \ell(x_k, \tilde{\xi}_k(x_k))\right]. \end{aligned}$$
By (3.12) and $\tilde{V} \ge \mathcal{T}\tilde{V}$, it follows that
$$\begin{aligned} \tilde{V}(x_N) + \sum_{k=0}^{N-1} \ell(x_k, \tilde{\xi}_k(x_k)) &= \tilde{V}(f(x_{N-1}, \tilde{\xi}_{N-1}(x_{N-1}))) + \sum_{k=0}^{N-1} \ell(x_k, \tilde{\xi}_k(x_k)) \\ &\le \tilde{V}(x_{N-1}) + \sum_{k=0}^{N-2} \ell(x_k, \tilde{\xi}_k(x_k)) + \epsilon_{N-1} \\ &\le \tilde{V}(x_{N-2}) + \sum_{k=0}^{N-3} \ell(x_k, \tilde{\xi}_k(x_k)) + \epsilon_{N-2} + \epsilon_{N-1} \\ &\;\;\vdots \\ &\le \tilde{V}(x_0) + \sum_{k=0}^{N-1} \epsilon_k. \end{aligned}$$
Consequently, we have
$$V^*(x_0) \le \tilde{V}(x_0) + \lim_{N \to \infty} \sum_{k=0}^{N-1} \epsilon_k \quad (3.13)$$
for an arbitrary positive sequence $\{\epsilon_k\}$. By choosing $\{\epsilon_k\}$ such that $\lim_{N \to \infty} \sum_{k=0}^{N-1} \epsilon_k$ is arbitrarily small, the desired result follows.

Lemma 3.2. Under Assumption 3.1, a stationary policy $\xi$ is optimal if and only if $\mathcal{T}V^* = \mathcal{T}_\xi V^*$.

Theorem 3.1 ([13]). Under Assumption 3.1, assume further that $U$ is a metric space and that the sets $U_k(x, \eta)$ given by
$$U_k(x, \eta) = \{u \in U, v \in \Sigma \mid \ell(x, u, v) + V_k(f(x, u, v)) \le \eta\} \quad (3.14)$$
are compact for all $k \ge \bar{k}$ with some integer $\bar{k}$, for all $x \in X$ and $\eta \in \mathbb{R}$, where $\{V_k\}_{k=0}^{\infty}$ is the Value Iteration sequence generated by (3.11) with $V_0 \equiv 0$. Then $\{V_k\}_{k=0}^{\infty}$ converges pointwise to $V^*$ with arbitrary initialization $0 \le V_0 \le V^*$. Furthermore, there exists a stationary optimal policy, i.e., $\pi^* = (\xi^*, \xi^*, \ldots)$.

Proof. 1. Under Assumption 3.1, by $V_0 \le V^*$ we have
$$V_0 \le \mathcal{T}V_0 \le \ldots \le \mathcal{T}^k V_0 \le \ldots \le V^*.$$
By the monotonicity of value iteration and Bellman's equation $V^* = \mathcal{T}V^*$, the above relationship implies
$$\mathcal{T}^k V_0 \le \mathcal{T}^k V^* = V^*.$$
Hence, by the monotone convergence theorem, $\{\mathcal{T}^k V_0\}$ converges to some $V_\infty \le V^*$. By applying value iteration to the relationship
$$\mathcal{T}^k V_0 \le V_\infty \le V^*, \quad \forall k \in \mathbb{N},$$
we have
$$\mathcal{T}^{k+1} V_0(x) = \min_{u,v}\left\{\ell(x, u, v) + \mathcal{T}^k V_0(f(x, u, v))\right\} \le \mathcal{T}V_\infty(x).$$
Taking the limit as $k \to \infty$ yields the following important relationship:
$$V_\infty \le \mathcal{T}V_\infty. \quad (3.15)$$
Now, suppose that there is a state $\tilde{x}$ such that the above inequality is strict, i.e.,
$$V_\infty(\tilde{x}) < (\mathcal{T}V_\infty)(\tilde{x}).$$
Clearly, we have $V_\infty(\tilde{x}) < \infty$. Let $\tilde{\eta} = V_\infty(\tilde{x})$ and consider the following set for $k \ge \bar{k}$:
$$U_k(\tilde{x}, \tilde{\eta}) = \left\{u \in U(\tilde{x}), v \in \Sigma \mid \ell(\tilde{x}, u, v) + (\mathcal{T}^k V_0)(f(\tilde{x}, u, v)) \le \tilde{\eta}\right\}.$$
Due to the compactness assumption, it follows that a hybrid control pair $(u_k, v_k)$ attaining the following minimum exists:
$$(\mathcal{T}^{k+1} V_0)(\tilde{x}) = \min_{u \in U(\tilde{x}),\, v \in \Sigma}\left\{\ell(\tilde{x}, u, v) + (\mathcal{T}^k V_0)(f(\tilde{x}, u, v))\right\} = \ell(\tilde{x}, u_k, v_k) + (\mathcal{T}^k V_0)(f(\tilde{x}, u_k, v_k)).$$
Now, let us focus on the hybrid control sequence $\{u_m, v_m\}_{m=k}^{\infty}$. It follows from $\mathcal{T}^k V_0 \le \mathcal{T}^{k+1} V_0 \le \ldots \le V_\infty$ that
$$\ell(\tilde{x}, u_m, v_m) + (\mathcal{T}^k V_0)(f(\tilde{x}, u_m, v_m)) \le \ell(\tilde{x}, u_m, v_m) + (\mathcal{T}^m V_0)(f(\tilde{x}, u_m, v_m)) \le V_\infty(\tilde{x}), \quad \forall m \ge k.$$
By the definition of $U_k(\cdot, \cdot)$, it is clear that $\{(u_m, v_m)\}_{m=k}^{\infty} \subset U_k(\tilde{x}, V_\infty(\tilde{x}))$. By compactness of this set, there exists at least one $(\tilde{u}, \tilde{v})$ which is a limit point of $\{(u_m, v_m)\}_{m=k}^{\infty}$, and such a limit point of $\{(u_m, v_m)\}_{m=\bar{k}}^{\infty}$ satisfies
$$(\tilde{u}, \tilde{v}) \in \bigcap_{k=\bar{k}}^{\infty} U_k(\tilde{x}, V_\infty(\tilde{x})).$$
Consequently, we have
$$(\mathcal{T}^{k+1} V_0)(\tilde{x}) \le \ell(\tilde{x}, \tilde{u}, \tilde{v}) + (\mathcal{T}^k V_0)(f(\tilde{x}, \tilde{u}, \tilde{v})) \le V_\infty(\tilde{x}).$$
Taking the limit as $k \to \infty$ on both sides of the above inequality, we obtain
$$V_\infty(\tilde{x}) = \ell(\tilde{x}, \tilde{u}, \tilde{v}) + V_\infty(f(\tilde{x}, \tilde{u}, \tilde{v})) \ge (\mathcal{T}V_\infty)(\tilde{x}),$$
which contradicts the assumed strict inequality. Hence, we must have $V_\infty = \mathcal{T}V_\infty$. By Lemma 3.1, we then have $V_\infty \ge V^*$, which combined with $V_\infty \le V^*$ yields $V_\infty = V^*$, and the desired result directly follows.

2. To show the existence of an optimal stationary policy, notice that the argument in part 1 together with $V_\infty = V^*$ implies that $(\tilde{u}, \tilde{v})$ attains the minimum in
$$V^*(\tilde{x}) = \min_{u \in U(\tilde{x}),\, v \in \Sigma}\left\{\ell(\tilde{x}, u, v) + V^*(f(\tilde{x}, u, v))\right\}$$
for any $\tilde{x} \in X$ with $V^*(\tilde{x}) < \infty$. Consequently, the existence of an optimal stationary policy follows from Lemma 3.2.

Moreover, a similar convergence result can be established with an initialization $V_0$ satisfying $V^* \le V_0 \le cV^*$ for some $c > 0$. In such a scenario, the existence of and knowledge about a feasible set $X_\infty$ and a termination set $X_s$ are required, where
$$X_\infty = \{x \in X \mid V^*(x) = \infty\}, \quad X_s = \{x \in X \mid V^*(x) = 0\}.$$
However, the initialization condition $V^* \le V_0 \le cV^*$ is in general difficult to verify, as information about $V^*$ is not usually known a priori.

The above results essentially claim that both finite-horizon and infinite-horizon value functions can be obtained via Value Iteration under mild conditions.

For linear quadratic regulation problems, it is widely known in the literature that Bellman's equation (3.9) becomes the following discrete-time algebraic Riccati equation:
$$P = Q + A^T P A - A^T P B (R + B^T P B)^{-1} B^T P A, \quad (3.16)$$
where $A, B$ and $Q, R$ are the matrices defining the system and cost, respectively. The infinite-horizon value function is then given by the solution to (3.16) in the following quadratic form:
$$V^*(x) = x^T P x.$$
Similarly, the value iteration algorithm in the LQR case becomes the following Riccati recursion:
$$P_{k+1} = Q + A^T P_k A - A^T P_k B (R + B^T P_k B)^{-1} B^T P_k A.$$

These well-known results have been extended to the switched linear quadratic regulation problem, Problem 3.2, in the literature [103, 104, 105], and are reviewed below.

3.3.2 Switched Linear Quadratic Regulation

In this subsection, we review classical optimal control results for Problem 3.2 under the assumption that the simulator dynamics $\tilde{f}$, i.e., the matrices $A_v$ and $B_v$, are known precisely to us.

First of all, to ensure well-posedness of Problem 3.2, specifically finiteness of the optimal cost, the following assumption is adopted.

Assumption 3.2. System (3.7) is exponentially stabilizable.

Stabilizability of switched systems is an important research topic in the literature per se. An important result states that, for switched linear systems, exponential stabilizability is equivalent to asymptotic stabilizability [47]. As such, this assumption is not very restrictive. It can be guaranteed, for example, if one of the subsystems is stabilizable. However, it is worth mentioning that stabilizability of switched systems does not rely on stabilizability of the individual subsystems.

If the switched system dynamics are known, the problem of interest becomes a standard switched LQR problem, which has been studied extensively in the control literature. We now briefly review some important properties of value functions for the switched LQR problem, which motivates our Q-function approximation architecture later on.

Due to the linear dynamics and quadratic cost function in each subsystem, each finite-horizon value function $V_k$ and the associated control policy $\pi_k$ have some nice properties. To see this, let us first assume we pick a subsystem $v \in \Sigma$ and evolve the system using only this subsystem. In this case, the switched LQR problem degenerates into the classical LQR problem. It is well known that the solution to the LQR problem can be fully characterized by the following Riccati recursion:
$$P_{k+1} = \rho_v(P_k) = Q_v + A_v^T P_k A_v - A_v^T P_k B_v (R_v + B_v^T P_k B_v)^{-1} B_v^T P_k A_v, \quad (3.17a)$$
$$K_v(P_k) = (R_v + B_v^T P_k B_v)^{-1} B_v^T P_k A_v, \quad (3.17b)$$
where $\rho_v : \mathbb{S}_{++} \to \mathbb{S}_{++}$ is referred to as the Riccati mapping and $K_v(P_k)$ is the corresponding optimal feedback gain (Kalman gain) matrix.
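As a small illustration (ours, not from the cited works), the mode-wise Riccati mapping and Kalman gain (3.17) translate directly into numerical code:

```python
import numpy as np

def riccati_map(P, A, B, Q, R):
    """One application of rho_v in (3.17a) together with the gain K_v(P) of (3.17b)."""
    S = R + B.T @ P @ B
    K = np.linalg.solve(S, B.T @ P @ A)           # K_v(P) = (R_v + B_v'PB_v)^{-1} B_v'PA_v
    P_next = Q + A.T @ P @ A - A.T @ P @ B @ K    # rho_v(P)
    return P_next, K
```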

For switched systems, one can choose different subsystems (modes) to evolve the system dynamics. Depending on which subsystem is activated, the Riccati recursion will evolve differently, resulting in different positive definite matrices. This can be described by a set-valued mapping, referred to as the Switched Riccati Mapping (SRM), defined by
$$\rho_\Sigma(\mathcal{H}) = \{\rho_v(P) \mid v \in \Sigma \text{ and } P \in \mathcal{H}\}, \quad (3.18)$$
where $\mathcal{H}$ is an arbitrary set that contains a finite number of positive definite matrices. Let $\mathcal{H}_k$ be a set of positive definite matrices that is generated recursively using the switched Riccati mapping:
$$\mathcal{H}_{k+1} = \rho_\Sigma(\mathcal{H}_k), \quad k = 0, 1, \ldots, \quad \text{with } \mathcal{H}_0 = \{P_f\}, \quad (3.19)$$
where $P_f$ is the matrix defining the terminal cost. Then it can be easily shown that the optimal $k$-horizon value function and the corresponding control law $\xi_k$ can be characterized exactly using $\mathcal{H}_k$.
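A sketch of the recursion (3.19), reusing `riccati_map` from the previous sketch, makes the growth of $\mathcal{H}_k$ explicit; here `modes` is an assumed list of $(A_v, B_v, Q_v, R_v)$ tuples, one per subsystem.

```python
def switched_riccati_sets(modes, P_f, k):
    """Generate H_k via H_{j+1} = rho_Sigma(H_j) with H_0 = {P_f}, cf. (3.18)-(3.19)."""
    H = [P_f]
    for _ in range(k):
        H = [riccati_map(P, A, B, Q, R)[0]        # rho_v(P) for every mode v and P in H
             for (A, B, Q, R) in modes for P in H]
    return H                                      # |H_k| = |Sigma|^k without pruning
```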

Lemma 3.3 ([104]). For any finite horizon $k$, we have:

1. The finite-horizon value function $V_k$ is a pointwise minimum of finitely many quadratic functions, i.e.,
$$V_k(z) = \min_{P \in \mathcal{H}_k} z^T P z, \quad \forall z \in \mathbb{R}^{n_x}. \quad (3.20)$$

2. The corresponding control law $\xi_k = (\mu_k, \nu_k)$ is given by
$$\mu_k(z) = -K_{\nu_k(z)}(P_k(z))\, z, \quad (3.21a)$$
$$\text{where } K_v(P) = (R_v + B_v^T P B_v)^{-1} B_v^T P A_v, \quad (3.21b)$$
$$(P_k(z), \nu_k(z)) = \arg\min_{P \in \mathcal{H}_k,\, v \in \Sigma} z^T \rho_v(P) z. \quad (3.21c)$$

Proof. This lemma is proved by induction.

1. $V_0(z) = z^T P_f z$, which satisfies the desired form.

2. Suppose $V_k(z) = \min_{P \in \mathcal{H}_k} z^T P z$, where $\mathcal{H}_k$ is obtained through (3.19). By value iteration, we have
$$\begin{aligned} V_{k+1}(z) &= \inf_{v \in \Sigma,\, u \in U} z^T Q_v z + u^T R_v u + V_k(A_v z + B_v u) \\ &= \inf_{v \in \Sigma,\, P \in \mathcal{H}_k,\, u \in U} z^T (Q_v + A_v^T P A_v) z + u^T (R_v + B_v^T P B_v) u + 2 z^T A_v^T P B_v u. \end{aligned}$$
Note that the infimum to be evaluated is quadratic in $u$; hence the optimal $u^*$ attaining the infimum can easily be found to be of the form $u^* = -K_v(P)z$ with $K_v(P)$ as in (3.21b). Substituting $u^*$ into the above formulation yields the desired pointwise-minimum quadratic structure.
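Continuing the sketches above, the characterization (3.20)-(3.21) can be evaluated at a state $z$ by enumerating all mode-matrix pairs (again an illustration, under the same assumed `modes` list and `riccati_map` helper):

```python
import numpy as np

def hybrid_control_law(z, H_k, modes):
    """Return (V_{k+1}(z), mu(z), nu(z)) by minimizing z' rho_v(P) z over H_k x Sigma."""
    best = None
    for v, (A, B, Q, R) in enumerate(modes):
        for P in H_k:
            P_new, K = riccati_map(P, A, B, Q, R)  # rho_v(P) and K_v(P)
            val = float(z @ P_new @ z)             # z' rho_v(P) z, cf. (3.21c)
            if best is None or val < best[0]:
                best = (val, v, -K @ z)            # mu(z) = -K_v(P) z, cf. (3.21a)
    return best[0], best[2], best[1]
```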

The above lemma provides an exact characterization of the finite-horizon value function for switched linear quadratic regulation problems. In addition, convergence of the above value iteration to the infinite-horizon value function can be established with certain suboptimal closed-loop performance guarantees.

Lemma 3.4 ([105]). Under Assumption 3.2, we have:

1. $|V_k(z) - V^*(z)| \le c_1 \gamma^k \|z\|^2$ for some finite constant $c_1$ and $\gamma \in (0, 1)$;

2. there exists a finite $k$ for which the stationary policy $\pi_k = \{\xi_k, \xi_k, \ldots\}$ is exponentially stabilizing and
$$J_{\pi_k}(z) \le V^*(z) + c_2 \|z\|^2$$
for some finite constant $c_2$.

Proof. The convergence results in this lemma can be proved either by checking the conditions provided in Theorem 3.1 and applying the conclusion therein, or by the argument provided in [104, 105]; the detailed bounds provided herein can be obtained via an analysis of the deviation. Here we provide a proof of only the convergence result, by checking the conditions in Theorem 3.1.

First, by the definition of the running cost, Assumption 3.1 is obviously satisfied. Second, we need to verify the compactness of the sets
$$U_k(x, \eta) = \{u \in U, v \in \Sigma \mid \ell(x, u, v) + V_k(f(x, u, v)) \le \eta\}$$
for all $k \ge \bar{k}$, for all $x \in X$, and $\eta \in \mathbb{R}$, where $\{V_k\}$ is generated with $V_0 \equiv 0$. Note that, in SLQR, $P_f = 0$, which implies $V_0 \equiv 0$. In addition, due to the discrete nature of $v$, we only need to focus on the continuous-input part of $U_k(\cdot, \cdot)$. By Lemma 3.3, we know
$$\ell(x, u, v) + V_k(f(x, u, v)) = \min_{P \in \mathcal{H}_k}\left\{x^T (Q_v + A_v^T P A_v) x + u^T (R_v + B_v^T P B_v) u + 2 x^T A_v^T P B_v u\right\}.$$
Since the above function is continuous and radially unbounded in $u$ for any $x$, its sub-level sets are compact. Therefore, for any $x$ and $\eta$, $U_k(x, \eta)$ is compact. By Theorem 3.1, it follows that $\{V_k\}$ converges pointwise to $V^*$ and there exists an optimal stationary policy.

According to Lemma 3.3 and Lemma 3.4, it can be seen that the optimal solution to Problem 3.2 can be approximated arbitrarily well by a pointwise minimum of a finite number of quadratic functions. However, due to the enumeration of all possible switching sequences, the size of $\mathcal{H}_k$ grows exponentially as $k$ increases, $|\mathcal{H}_k| = |\Sigma|^k$ to be precise. This issue makes the exact evaluation of $V_k$ numerically challenging, especially for large horizons, and in turn renders the approximation of the infinite-horizon value function intractable.

To alleviate this issue, sub-optimal solutions have been developed in the literature, building upon the observation that some matrices in a generic set $\mathcal{H}$ may not contribute (or contribute very little) to the overall minimum structure (3.20). In particular, a matrix $P' \in \mathcal{H}$ is called algebraically redundant if $z^T P' z \ge \min_{P \in \mathcal{H} \setminus \{P'\}} z^T P z$ for all $z$. If $P'$ is algebraically redundant, then it is safe to remove $P'$ from $\mathcal{H}_k$ without affecting the characterization of the value function at all. However, checking algebraic redundancy is challenging. A sufficient condition and a numerically tractable algorithm based on the S-procedure [18] have been given in [105]. Furthermore, a geometric interpretation of algebraic redundancy has been given as well in [105], which motivates the geometric identification approach used for the Q-function update proposed later in Chapter 3.5. In addition to the aforementioned algebraic redundancy, there is another notion, termed numerical redundancy, which helps further reduce the number of matrices involved in characterizing the value functions while allowing for an $\epsilon$ level of error. Formally, a matrix $P' \in \mathcal{H}$ is called numerically $\epsilon$-redundant if $\min_{P \in \mathcal{H} \setminus \{P'\}} z^T P z \le \min_{P \in \mathcal{H}} z^T (P + \epsilon I) z$ for all $z$. A slight modification to the aforementioned tractable algorithm for algebraic redundancy gives rise to a tractable algorithm for checking numerical redundancy.
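The exact redundancy tests above are LMI feasibility problems; purely as an illustration of the pruning idea (and explicitly not the S-procedure-based algorithm of [105]), the following sketch drops a matrix when, on sampled unit directions, its removal changes the pointwise minimum by at most $\epsilon$:

```python
import numpy as np

def prune_numerically_redundant(H, eps=1e-3, n_samples=2000):
    """Heuristic sampling check for numerical eps-redundancy; illustration only."""
    dim = H[0].shape[0]
    Z = np.random.randn(n_samples, dim)
    Z /= np.linalg.norm(Z, axis=1, keepdims=True)        # directions on the unit sphere
    vals = np.stack([np.einsum('nd,de,ne->n', Z, P, Z) for P in H])  # vals[i,n] = z_n' P_i z_n
    keep = list(range(len(H)))
    for i in range(len(H)):
        others = [j for j in keep if j != i]
        if others and np.all(vals[others].min(axis=0) <= vals[i] + eps):
            keep.remove(i)                               # dropping P_i costs at most eps
    return [H[i] for i in keep]
```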

3.3.3 Limitations

Generally speaking, despite the power of dynamic programming, there are several practical limitations to applying it to real-world problems. Among them, the most famous are the "curse of modeling" and the "curse of dimensionality". Roughly speaking, the "curse of modeling" refers to the fact that complete system model knowledge is required in order to perform dynamic programming. However, for practical problems, such an accurate model is in general challenging to obtain and may be subject to noise and changes. On the other hand, the "curse of dimensionality" refers to the fact that the amount of computation and memory storage for solving dynamic programming grows exponentially with the size of the problem. In addition, for combinatorial problems, these quantities explode very quickly. In particular, due to the existence of the discrete input $v(t)$, the SLQR problem is in fact combinatorial in nature and NP-hard in general.

The aforementioned limitations are strong motivations for considering simulation and approximation based approaches. In particular, the simulation based approach with parametric function approximation is the main focus of the rapidly growing reinforcement learning literature.

3.4 Q-learning

Among the rich literature of reinforcement learning, Q-learning is one of the most important algorithms that has been widely studied and adopted in practice. Q-learning is named after the Q-function [78] or Q-factor [11], where Q stands for quality.

The key idea behind Q-learning is to incorporate the input into the definition of the value function. In particular, the optimal Q-function for any state-input tuple $(x, u, v)$ is defined to be the sum of the one-step running cost for the given tuple and the optimal cost for the successive state, i.e., $Q^*(x, u, v) = \ell(x, u, v) + V^*(f(x, u, v))$. With this optimal Q-function definition, the Bellman equation (3.9) can be equivalently written as follows:
$$V^*(x) = \min_{u,v} Q^*(x, u, v), \quad (3.22)$$
where it can be easily checked that $Q^*(x, u, v)$ is the solution to
$$Q^*(x, u, v) = \ell(x, u, v) + \min_{u^+, v^+} Q^*(f(x, u, v), u^+, v^+). \quad (3.23)$$
Similarly, the Q-function for each finite horizon $k + 1$ can be defined as
$$Q_{k+1}(x, u, v) = \ell(x, u, v) + V_k(f(x, u, v)), \quad (3.24)$$
and the classical Value Iteration essentially becomes
$$V_{k+1}(x) = \min_{u,v} Q_{k+1}(x, u, v). \quad (3.25)$$
Combining the above two equations, we have
$$Q_{k+1}(x, u, v) = \ell(x, u, v) + V_k(f(x, u, v)) = \ell(x, u, v) + \min_{u^+, v^+} Q_k(f(x, u, v), u^+, v^+) \triangleq (\mathcal{T}_Q Q_k)(x, u, v), \quad (3.26)$$
which is referred to as Q-Iteration throughout this dissertation.

Theoretically speaking, the above Q-functions are mathematically no different from value functions, and hence Q-Iteration is mathematically equivalent to Value Iteration. In addition, Value Iteration for Q-functions is mathematically equivalent to classical Value Iteration for cost functions. Therefore, all exact theories and algorithms for value function computation directly apply to Q-functions. In particular, the convergence results for Value Iteration described in Theorem 3.1 directly apply to the above Q-Iteration (3.26).

Despite the mathematical equivalence between Q-functions and value functions, the introduction of Q-functions is significant from an implementation perspective. The most significant advantage of using Q-functions is that they allow us to conveniently implement the associated control policy in a model-free and online fashion. In particular, the optimal control policy $\xi^*(x)$ can be simply computed as $\xi^*(x) = \arg\min_{u,v} Q^*(x, u, v)$, provided the optimal Q-function $Q^*(x, u, v)$, without knowledge about the system dynamics $f(x, u, v)$. Therefore, focusing on Q-functions enables us to develop model-free solutions to the optimal control problem. Thanks to the equivalence of Q-functions and value functions, all approximate theories and algorithms studied in the approximate dynamic programming literature apply to Q-functions as well. As mentioned before, parametric approximations are typically used for representing Q-functions. This idea has been extensively studied and used in the approximate dynamic programming literature [11] and the reinforcement learning literature [55, 56, 57].

The general idea of the Q-learning scheme is that, instead of using the exact Q-function $Q_k(x, u, v)$ at each iteration in (3.26), a function approximation $\hat{Q}_k(x, u, v; \theta)$ is used, where $\theta$ denotes a vector containing all parameters in the approximation. Under the assumption that the same class of parameterized approximations is used for all $k$, the update of $\hat{Q}_k(x, u, v; \theta)$ can be considered as an update of the parameters $\theta$ directly. Denoting $\hat{Q}(x, u, v; \theta_k) = \hat{Q}_k(x, u, v; \theta)$, the Q-Iteration essentially becomes
$$\hat{Q}(x, u, v; \theta_{k+1}) = \ell(x, u, v) + \min_{u^+, v^+} \hat{Q}(f(x, u, v), u^+, v^+; \theta_k). \quad (3.27)$$
Adopting such a parametric approximation, the general Q-learning algorithm is given below.

Here, at steps 6, 7 and 8 in Algorithm 1, we assume both states and inputs can be drawn uniformly from their underlying spaces and that we can use our simulator to evolve each state-input pair in parallel, for simplicity of exposition. It should be noted that this is not realistic in practice, especially when we are interacting with the real environment. A practical alternative is to use the current state and run the simulator for several steps with a sequence of chosen inputs. In this case, the input sequence needs to satisfy certain conditions to ensure that the data samples $(x_i, u_i, v_i, x_i^+, y_i)$

Algorithm 1 Q-learning Framework

Input: Initial parameter $\theta_0$, tolerance $\epsilon_T$, $k_{\max}$
Output: $\theta^*$
1: $\theta^* = \theta_0$
2: for $k = 1$ to $k_{\max}$ do
3:   if $\|\theta_k - \theta_{k-1}\| \le \epsilon_T$ then
4:     $\theta^* \leftarrow \theta_k$; return $\theta^*$;
5:   else
6:     $x_i \sim \mathrm{Unif}(X)$ i.i.d., $\forall i = 1, \ldots, N$;  ▷ Draw $N$ random states
7:     $(u_i, v_i) \sim \mathrm{Unif}(U \times \Sigma)$ i.i.d., $\forall i = 1, \ldots, N$;  ▷ Draw $N$ associated random hybrid inputs
8:     $x_i^+ = \tilde{f}(x_i, u_i, v_i)$, $\forall i = 1, \ldots, N$;  ▷ Generate successive states using the simulator
9:     $y_i \leftarrow \ell(x_i, u_i, v_i) + \min_{u^+, v^+} \hat{Q}(x_i^+, u^+, v^+; \theta_{k-1})$, $\forall i = 1, \ldots, N$;  ▷ Compute new Q-function values
10:    $\theta_k = \arg\min_\theta \sum_{i=1}^{N} \|\hat{Q}(x_i, u_i, v_i; \theta) - y_i\|_2^2$;  ▷ Update $\theta$ using the collected data
11:   end if
12: end for
13: $\theta^* \leftarrow \theta_{k_{\max}}$

are well distributed. In the linear case, one such condition is the so-called persistence of excitation, which has been extensively studied in the classical adaptive control literature [4]; this suggests a potential relationship between reinforcement learning and adaptive control.
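For reference, one outer iteration of Algorithm 1 can be sketched as follows. This is our own illustration: `simulator`, `q_min` (the pointwise minimization of $\hat{Q}$ over hybrid inputs) and `fit_theta` (the least-squares fit of step 10) are hypothetical placeholders for the architecture chosen in Chapter 3.5, and the uniform sampling box is an assumption of the sketch.

```python
import numpy as np

def q_learning_iteration(simulator, q_min, fit_theta, theta, N, n_x, n_u, n_v, box=5.0):
    X = np.random.uniform(-box, box, (N, n_x))        # step 6: draw N random states
    U = np.random.uniform(-box, box, (N, n_u))        # step 7: random continuous inputs
    V = np.random.randint(n_v, size=N)                #         and random modes
    samples, targets = [], []
    for x, u, v in zip(X, U, V):
        x_next, cost = simulator.step(x, u, v)        # step 8: query the simulator
        targets.append(cost + q_min(x_next, theta))   # step 9: Bellman target y_i
        samples.append((x, u, v))
    return fit_theta(samples, np.array(targets))      # step 10: refit the parameters
```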

In the Q-learning framework, two key questions are how to select an appropriate parametric approximation function class Q̂(x, u, v; θ) and how to update the parameter θ, i.e., how to solve line 10 in Algorithm 1. The update of θ in line 10 is commonly referred to as "training" the approximator Q̂.

Both linear and nonlinear parametric approximation architectures, together with the associated training methods, have been studied in the literature. In the sequel, a brief review of these architectures is given.

Linear Approximator for DSLQR

Linear approximation architectures have been shown to work with theoretical guarantees for simple problems such as the discrete-time linear quadratic regulation (DTLQR) problem [44, 46]. In particular, based on the well-known fact that the value function of the DTLQR problem is quadratic in the state, it has been shown that Q-learning with a simple linear approximation is guaranteed to converge to the optimal Q-function Q∗. Mathematically, such a linear approximation takes the following form:

Q̂(x, u; θ) = [x; u]^T [H_xx, H_xu; H_ux, H_uu] [x; u] = z^T H z = vec(H)^T (z ⊗ z) ≜ θ^T φ(z),

where z = [x^T, u^T]^T and φ(z) is commonly referred to as the feature vector of the data sample z. Given such a linear approximation structure, the Q-Iteration becomes

θ_{k+1}^T φ(z) = ℓ(x, u) + θ_k^T φ(z^+),

which can be solved via recursive least squares [20] or other approaches [82].
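A minimal sketch of this linear architecture, assuming a batch least-squares fit in place of the recursive update of [20]; the helper names phi and fit_theta are ours:

import numpy as np

def phi(x, u):
    """Quadratic feature vector phi(z) = z kron z for z = [x; u]."""
    z = np.concatenate([np.atleast_1d(x), np.atleast_1d(u)])
    return np.kron(z, z)

def fit_theta(samples, targets):
    """Batch least-squares fit of theta so that theta' phi(z_i) ~= y_i.
    samples is a list of (x, u) pairs; targets a list of scalars y_i."""
    Phi = np.stack([phi(x, u) for x, u in samples])
    y = np.asarray(targets, dtype=float)
    theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return theta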

For practical problems, the class of linear parametric functions may not be rich enough to represent the Q-functions; instead, nonlinear approximations are considered. Among the rich nonlinear function classes, the neural network architecture is one of the most widely adopted, and it has demonstrated impressive success in solving numerous difficult problems [55, 56, 57, 72].

General Nonlinear Approximator - Neural Networks

An artificial neural network typically consists of an input layer, an output layer, and a few hidden layers. Roughly speaking, the input layer encodes the input to the neural network according to certain rules, e.g., appending a constant 1 to the input vector to allow for affine transforms. The output layer computes a linear combination of all outputs from the last nonlinear layer. The hidden layers are constructed from simple building blocks, called perceptrons in [12, 14], each composed of a linear and a nonlinear layer, as shown in Figure 3.1(a). Elements of each nonlinear layer are commonly called neurons, and the outputs of the nonlinear layer in each perceptron are used as inputs to the linear layer of the next layer. In order to build a perceptron, the number of neurons used in each nonlinear layer and the activation function α : R → R used in each neuron need to be specified. The rectified linear unit (ReLU) [2, 59] is one of the most widely used activation functions. The parameters of such a neural network are the linear weights used in the linear layers of all the perceptrons. This particular structure enables simple computation of the gradient of the output with respect to the parameters through a procedure known as back-propagation [37, 67, 96]. Based on this gradient information, various training methods for updating the neural network parameters have been proposed, e.g., trust-region policy optimization (TRPO) [69], proximal policy optimization (PPO) [70], and deep deterministic policy gradient (DDPG) [48]. Apart from the value iteration framework discussed so far, there are several other general frameworks such as policy iteration, generalized policy iteration, and actor-critic.
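For concreteness, the forward pass of such a ReLU network can be sketched in a few lines of Python; this is a generic illustration of the layered structure described above, not code used in this dissertation:

import numpy as np

def relu(a):
    return np.maximum(a, 0.0)

def mlp_forward(x, weights):
    """Forward pass of a ReLU network: each hidden perceptron is a
    linear map followed by the elementwise nonlinearity; the output
    layer is a plain linear combination of the last hidden outputs.
    weights : list of (W, b) pairs, one per layer."""
    h = x
    for W, b in weights[:-1]:        # hidden perceptrons
        h = relu(W @ h + b)
    W_out, b_out = weights[-1]       # linear output layer
    return W_out @ h + b_out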

(a) Building Block of a Neural Network    (b) Overall Neural Network Structure with Multiple Layers

Figure 3.1: General Neural Network Structure

Despite the wide applicability of the deep neural network architecture for representing Q-function approximations, its theoretical properties have not been fully investigated. In particular, such a deep neural network architecture usually ignores the analytical structure of the value functions of the underlying optimal control problem, which may result in unsatisfactory performance. In the sequel, we exploit classical optimal control results for switched linear systems and develop a specific Q-function approximator and the associated training techniques.

3.5 Main Results

Given the aforementioned Q-learning framework, the key question now is how to design the parameterized Q-function approximator and the corresponding update scheme. Most existing Q-learning algorithms use deep neural networks (DNNs) as the parameterized approximator and use various advanced learning algorithms to update the weights of the DNN, which are the parameters of the overall Q-function approximator [36, 56, 57, 85]. Generally speaking, such algorithms are capable of dealing with a very general class of problems without specifically taking into account the underlying physics or structural properties. In our solution, instead of directly applying these generic algorithms, a particular Q-function approximator and an associated update scheme are developed which explicitly incorporate the analytical structure of the value functions of the optimal control problem discussed in Section 3.3.

3.5.1 Q-function and parametric approximator

As discussed in the previous section, Q-functions are defined through the associated value function in classical dynamic programming. By the switched linear quadratic regulation results and the Q-function definition, we know that the exact optimal Q-function for Problem (3.2) has the following form:

Q∗(x, u, v) = ℓ(x, u, v) + V∗(f(x, u, v))
            = min_{P ∈ H∗} [ x^T (Q_v + A_v^T P A_v) x + u^T (R_v + B_v^T P B_v) u + 2 x^T A_v^T P B_v u ].

Proposition 3.1. The sequence {Q_k} generated by the Q-Iteration (3.26) converges pointwise to Q∗ for any initialization satisfying 0 ≤ Q_0 ≤ Q∗.

This proposition immediately follows from the convergence results of value iteration (Theorem 3.1) and the mathematical equivalence between Q-functions and value functions. Nonetheless, it is worth mentioning that, although convergence is guaranteed for every initialization satisfying 0 ≤ Q_0 ≤ Q∗, a commonly chosen initialization is Q_0(x, u, v) = ℓ(x, u, v). Such an initialization originates from the classical initialization V_0 ≡ 0 in value iteration for infinite-horizon problems, which essentially means that no terminal cost is involved.

Because H∗ typically contains infinitely many quadratic functions, an exact characterization is in general impossible to obtain. To address this issue, it has been shown that finitely many quadratic functions can approximate the desired optimal function with a certain sub-optimality guarantee. Therefore, we adopt the following parametric Q-function approximation structure, which explicitly incorporates the value function structure:

Q(x, u, v) = x^T Q_v x + u^T R_v u + min_{P ∈ H_M} [x; u]^T [A_v, B_v]^T P [A_v, B_v] [x; u],    (3.28)

where H_M is a finite set of positive definite matrices with cardinality M serving as the parameters of this approximation.

Since we do not assume knowledge of the system dynamics, i.e., A_v and B_v, we treat [A_v, B_v]^T P [A_v, B_v] as a single parameter, which yields the following Q-function approximator:

Q̂(x, u, v | θ) = x^T Q_v x + u^T R_v u + min_{P̂ ∈ Ĥ_{M_v}} [x; u]^T P̂ [x; u],    (3.29)

where Ĥ_{M_v}, with M_v = M × n_v, collects the finitely many positive definite matrices maintained for each v ∈ Σ. As a result, the proposed Q-function approximator is a pointwise minimum of (at most) M_v quadratic functions of the state-input pair.

Note that, in the above Q-function, M is a design parameter tuning the accuracy of the approximation. In fact, there is a natural trade-off between accuracy and complexity when choosing M. Generally speaking, a larger M results in a more accurate approximation but introduces additional computation and requires many more data samples to be collected. Rigorous analysis of this trade-off and systematic schemes for choosing M remain open.
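Before turning to the update scheme, the following short Python sketch shows how the approximator (3.29) is evaluated for a given mode; the containers Q_mats, R_mats, and P_hat_sets are hypothetical names of ours for the problem data and the per-mode parameter sets:

import numpy as np

def q_hat(x, u, v, Q_mats, R_mats, P_hat_sets):
    """Evaluate (3.29): x'Q_v x + u'R_v u plus the pointwise minimum
    of the quadratics [x;u]' P_hat [x;u] stored for mode v."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    u = np.atleast_1d(np.asarray(u, dtype=float))
    z = np.concatenate([x, u])
    stage = x @ np.atleast_2d(Q_mats[v]) @ x + u @ np.atleast_2d(R_mats[v]) @ u
    return stage + min(z @ P @ z for P in P_hat_sets[v])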

3.5.2 Q-function Update

Due to the above specific structure, classical learning algorithms cannot be directly applied to update the Q-function approximator. To address this issue, we propose two approaches based on the particular structure of our Q-function approximator. First, inspired by the piecewise quadratic structure of the underlying value function, the Q-function update can be formulated as a subspace clustering problem, and two different subspace clustering algorithms are implemented for solving it. Nonetheless, the subspace clustering formulation does not fully exploit the exact pointwise minimum structure, which is actually stronger. With this in mind, we develop a novel alternative geometric approach.

At each iteration, the Q-learning framework generates a set of N data samples D = {(z_i, v_i, y_i)}_{i=1}^N, where z_i := [x_i^T, u_i^T]^T and y_i is the value of the associated value function. Updating the Q-function approximator can then be abstracted as the following problem.

Problem 3.3. Given a data set D = {(z_i, y_i)}_{i=1}^N with z_i ∈ R^n and y_i ∈ R, find a set of matrices {P_j}_{j=1}^M ⊂ S_{++}^n such that

y_i = min_{j=1,…,M} z_i^T P_j z_i,  ∀i.    (3.30)

Since we do not know the membership of the data samples with respect to their underlying quadratic functions, the above problem can be viewed as an unsupervised learning problem, which is known to be challenging. One widely adopted idea is to first cluster the data samples into several clusters and then identify the associated quadratic function for each cluster. Typically, the identification step is much easier: with enough data samples, it can be formulated as a standard least squares problem. The main bottleneck of this two-step approach is the clustering step. Effective clustering algorithms usually rely upon knowledge about the underlying model generating the data samples. In the context of Problem 3.3, we know the exact structural property of the underlying model. Based on this knowledge, we adopt two approaches to solve the problem: the subspace clustering approach utilizes the piecewise quadratic structure of the underlying model, while the geometric approach tries to fully exploit the pointwise minimum quadratic structure.

Subspace Clustering Approach

The first approach adopted is the so-called subspace clustering approach. It is motivated by the observation that a pointwise minimum of quadratic functions can be viewed as a piecewise linear function in a higher-dimensional space. To see this, we first notice that any quadratic form can be transformed into a linear form by lifting the space as follows:

z^T P z = trace(P z z^T) = vec(P)^T (z ⊗ z).    (3.31)

Moreover, letting Z_j := { z | z^T P_j z ≤ z^T P_l z, ∀l = 1, …, M, l ≠ j }, we have

min_{l=1,…,M} z^T P_l z = z^T P_j z,  if z ∈ Z_j.    (3.32)

Hence, the pointwise minimum of a finite number of quadratic functions of z can be viewed as a piecewise linear function of z ⊗ z. For notational simplicity, we will use p̂ = vec(P) and ẑ = z ⊗ z throughout this chapter.
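The lifting identity (3.31) can be checked numerically in a couple of lines; the random data below is purely illustrative:

import numpy as np

rng = np.random.default_rng(0)
n = 3
A = rng.standard_normal((n, n))
P = A @ A.T + n * np.eye(n)             # a random symmetric positive definite matrix
z = rng.standard_normal(n)

quad = z @ P @ z                        # z' P z
lifted = P.reshape(-1) @ np.kron(z, z)  # vec(P)' (z kron z), row-major vec
assert np.isclose(quad, lifted)         # identity (3.31) holds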

Based on the above discussion, it is not hard to see that if we focus on the lifted data samples (ẑ_i, y_i) ∈ R^{n²+1}, they actually lie in several subspaces of dimension at most n(n+1)/2, determined by the p̂_j, embedded in the ambient space. The number n(n+1)/2 comes from the observations that there are n(n−1)/2 redundant terms in ẑ due to symmetry and that the linear relationship between y and ẑ introduces one more linear dependence. Therefore, Problem 3.3 can be solved in a two-step manner. In the first step, we aim to cluster all the (lifted) data samples according to their underlying linear relationships (subspaces). Once such clusters are given, the associated matrices P can be easily obtained via numerous optimization techniques, e.g., least squares.

The clustering problem in the first step is exactly the subspace clustering problem, which aims to partition data samples according to their underlying subspaces. The subspace clustering problem has been extensively studied in the computer vision and image processing literature. In this dissertation, we adopt the sparse subspace clustering (SSC) algorithm [28, 29], which has been extensively studied, used, and analyzed in the literature.

The sparse subspace clustering algorithm exploits the self-expressiveness property, which states that each data point in a union of subspaces can be efficiently represented as a linear combination of the other points. Such a representation is not unique in general, but by promoting sparsity of the representation, it will ideally involve only a few points from the data point's own subspace.

Given a column-wise data matrix Z ∈ R^{n×N}, where n is the dimension of each data sample and N is the number of data samples, the SSC approach tries to solve the following global optimization problem:

min_{C ∈ R^{N×N}} ‖C‖_0  subject to  Z = ZC, diag(C) = 0,    (3.33)

where the constraint on the diagonal entries of C eliminates the trivial solution in which each data point is written as a linear combination of itself. This optimization problem is nonconvex due to the nonconvex ℓ_0 cost function. In practice, the following convex relaxation is used, where the ℓ_0 norm is replaced with the ℓ_1 norm, which is its tightest convex relaxation and is known to prefer sparse solutions:

min_{C ∈ R^{N×N}} ‖C‖_1  subject to  Z = ZC, diag(C) = 0.    (3.34)

Unfortunately, unless the data samples are drawn exactly from the underlying subspaces, the optimization problem (3.34) is in general infeasible. In order to allow for outliers and noise, multiple variants have been proposed. In particular, we solve the following relaxed optimization problem with a weighting factor γ:

min_{C ∈ R^{N×N}} ‖Z − ZC‖_F^2 + γ‖C‖_1  subject to  diag(C) = 0.    (3.35)

This convex optimization problem can be efficiently solved using various techniques, such as the alternating direction method of multipliers [17] or simple gradient-based techniques applied to a further simplified, linearized problem [40]. Once C is obtained from the above optimization, classical spectral clustering approaches [90] can then be applied to generate the clusters. The pseudo-code for the overall sparse subspace clustering algorithm is provided below.
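As a rough illustration, problem (3.35) can also be posed directly in a convex modeling tool such as CVXPY; this is a minimal sketch assuming the data matrix is small enough for a generic solver, not the ADMM solver of [17]:

import cvxpy as cp

def ssc_coefficients(Z, gamma=0.01):
    """Solve the relaxed SSC problem (3.35):
    minimize ||Z - Z C||_F^2 + gamma * ||C||_1 subject to diag(C) = 0."""
    N = Z.shape[1]
    C = cp.Variable((N, N))
    objective = cp.Minimize(cp.sum_squares(Z - Z @ C)
                            + gamma * cp.sum(cp.abs(C)))
    cp.Problem(objective, [cp.diag(C) == 0]).solve()
    return C.value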

Algorithm 2 Sparse Subspace Clustering

Input: column-wise data matrix Z, weighting factor γ, number of desired subspaces M
Output: partition of the data {C_j}_{j=1}^M
1: Solve min_{C ∈ R^{N×N}} ‖Z − ZC‖_F^2 + γ‖C‖_1 subject to diag(C) = 0;
2: Construct an affinity graph G with vertices representing the data samples and edge weights given by W = |C| + |C|^T;
3: Sort the eigenvalues σ_1 ≥ σ_2 ≥ … ≥ σ_N of the normalized Laplacian of G in descending order;
4: Apply a spectral clustering technique to the affinity graph G using M as the estimated number of clusters.

Given the clustering results {C_j}_{j=1}^M, the associated quadratic functions can be efficiently identified using various approaches. We apply the classical least squares method to obtain the quadratic function for each cluster Z_j = {z_{i_1}, …, z_{i_{N_j}}}, which is given by the following semidefinite program:

min_{P ∈ S_+^n} Σ_{m=1}^{N_j} ‖y_{i_m} − z_{i_m}^T P z_{i_m}‖_2^2.    (3.36)
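With the positive semidefiniteness constraint dropped, (3.36) reduces to ordinary least squares in the lifted coordinates z ⊗ z; the sketch below uses this relaxation (symmetrizing the estimate afterwards) rather than a full semidefinite programming solver, and the helper name identify_quadratic is ours:

import numpy as np

def identify_quadratic(zs, ys):
    """Fit a symmetric P with z_i' P z_i ~= y_i by linear regression
    on the lifted features z kron z. The PSD constraint of (3.36) is
    dropped in this sketch; the estimate is symmetrized afterwards."""
    Phi = np.stack([np.kron(z, z) for z in zs])      # N_j x n^2
    p_vec, *_ = np.linalg.lstsq(Phi, np.asarray(ys, dtype=float), rcond=None)
    n = np.asarray(zs[0]).shape[0]
    P = p_vec.reshape(n, n)
    return 0.5 * (P + P.T)                           # symmetric part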

Remark 3.1. Note that, although subspace clustering provides a candidate algorithm for updating the parameters in our Q-learning framework, it does not fully exploit the analytical structure. In particular, the idea of using a subspace clustering technique originates from the piecewise quadratic structure; the actual pointwise minimum structure is stronger and is not incorporated into the subspace clustering framework.

In addition, neither of the aforementioned subspace clustering algorithms ensures correct clustering under arbitrary conditions. Both algorithms have been rigorously analyzed and conditions ensuring correct clustering have been studied in the literature. Unfortunately, due to the underlying control problem, none of the existing conditions guaranteeing correct clustering is satisfied for our problem.

Geometric Approach

An alternative approach is proposed in this section based on geometric insights into the distribution of data samples satisfying (3.30). Note that, under the assumption that all data samples are drawn exactly from the underlying model (3.30), we have

y_i = min_{j=1,…,M} z_i^T P_j z_i
  ⟺  1 = min_{j=1,…,M} (1/y_i) z_i^T P_j z_i
  ⟺  1 = min_{j=1,…,M} (z_i/√y_i)^T P_j (z_i/√y_i).

Therefore, instead of focusing on the original data set D = {(z_i, y_i)}_{i=1}^N, we consider the new data set W = {w_i}_{i=1}^N, where w_i = z_i/√y_i. According to the above discussion, each data sample satisfies

1 = min_{j=1,…,M} w_i^T P_j w_i.

Therefore, each w_i lies on the surface of a union of concentric hyper-ellipsoids in R^n. Furthermore, it is well known that the principal axes of such an ellipsoid have lengths proportional to λ_{j,k}^{−1/2}, where λ_{j,k} denotes the k-th eigenvalue of the j-th matrix P_j. Consequently, the L2 norm of a transformed data sample w_i carries certain membership information. In particular, under the assumptions that the number of samples is large enough and that the data samples are uniformly distributed, the data sample with the largest L2 norm, denoted by w∗, belongs to the quadratic function defined by the P_j with the smallest eigenvalue. Additionally, data samples having a large inner product with w∗ (i.e., a small angle to w∗) are likely to belong to the same quadratic function. Inspired by these geometric insights, we propose a new algorithm for solving the unsupervised learning Problem 3.3. The pseudo-code of the proposed algorithm is given below.

Algorithm 3 Geometric Identification

Input: data set W = {w_i}_{i=1}^N, tolerance ε, threshold β
Output: number of quadratic functions H, corresponding matrices P_1, …, P_H and partition C_1, …, C_H
1: D = W and k = 1;  ▷ Working set containing the data still to be clustered
2: while D ≠ ∅ do
3:   Find w_{i∗} ∈ D such that ‖w_{i∗}‖_2 ≥ ‖w‖_2, ∀w ∈ D;
4:   Find a neighborhood of w_{i∗}, denoted by N_β(w_{i∗}) = {w_j ∈ D | ⟨w_j, w_{i∗}⟩ ≥ β};
5:   Identify P_k from the set N_β(w_{i∗}) by solving

     min_{P_k ∈ S_{++}} Σ_{w_j ∈ N_β(w_{i∗})} ‖1 − w_j^T P_k w_j‖_2^2;    (3.37)

6:   Given P_k, find C_k = {w_l : |1 − w_l^T P_k w_l| ≤ ε};
7:   D ← D \ C_k and k ← k + 1;
8: end while
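A compact Python rendering of Algorithm 3 is sketched below, dropping the positive definiteness constraint of (3.37) in favor of a plain least-squares fit in the lifted coordinates; the termination safeguards are ours and are not part of the pseudo-code above:

import numpy as np

def geometric_identification(W, eps=1e-5, beta=0.9):
    """Sketch of Algorithm 3. W is an (N, n) array of transformed
    samples w_i = z_i / sqrt(y_i). Returns the identified matrices
    and the corresponding index partition."""
    remaining = list(range(len(W)))
    matrices, clusters = [], []
    n = W.shape[1]
    while remaining:
        # Step 3: remaining sample with the largest l2 norm.
        i_star = max(remaining, key=lambda i: np.linalg.norm(W[i]))
        # Step 4: neighborhood by inner product with w_{i*}.
        hood = [i for i in remaining if W[i] @ W[i_star] >= beta]
        if i_star not in hood:
            hood.append(i_star)   # safeguard: keep the anchor in its neighborhood
        # Step 5: least-squares fit of P_k on the neighborhood (targets all 1).
        Phi = np.stack([np.kron(W[i], W[i]) for i in hood])
        p, *_ = np.linalg.lstsq(Phi, np.ones(len(hood)), rcond=None)
        P = 0.5 * (p.reshape(n, n) + p.reshape(n, n).T)
        # Step 6: collect every remaining sample consistent with P_k.
        cluster = [i for i in remaining if abs(1 - W[i] @ P @ W[i]) <= eps]
        if not cluster:
            cluster = [i_star]    # safeguard: guarantee termination
        matrices.append(P)
        clusters.append(cluster)
        # Step 7: remove the clustered samples and continue.
        remaining = [i for i in remaining if i not in cluster]
    return matrices, clusters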

In the above algorithm, ε and β are two tuning parameters affecting the accuracy. In particular, β is the key parameter determining the size of the data set used for identifying each quadratic function. If β ensures that all samples used for identifying each quadratic function come from the same underlying quadratic function, then the algorithm is guaranteed to output the correct clustering and identification results. However, due to the unsupervised nature of this identification problem, it is in general not possible to determine a priori a β ensuring such conditions. It is worth mentioning that the idea of using the inner product as a similarity criterion for data samples has emerged recently in the literature [39, 50].

Here, we test the proposed algorithm using synthetic data to further illustrate the main idea and how it works. A simple two-dimensional model is used in our test, i.e., z_i ∈ R² and y_i ∈ R. We randomly generate four positive definite matrices and generate the original data set D = {(z_i, y_i)}_{i=1}^N from this model. N is chosen to be 500 in our test and the matrices are

P_1 = [ 1.4409, −0.3049 ; −0.3049, 0.0962 ],   P_2 = [ 0.3372, −0.5887 ; −0.5887, 1.1757 ],
P_3 = [ 0.0641, −0.2099 ; −0.2099, 0.9716 ],   P_4 = [ 0.2142, 0.4714 ; 0.4714, 1.1343 ].

The generated data set D is shown in Figure 3.2 and the transformed data set W is shown in Figure 3.3(a). The result of applying our algorithm to the data set D is given in Figure 3.3(b).

Figure 3.2: Original data distribution

(a) Transformed Data Distribution    (b) Result from Algorithm 3

Figure 3.3: Geometric Algorithm on 2-D Synthetic Data

It is worth mentioning that the current version of the geometric approach is by no means perfect, in the sense that there is no theoretical guarantee on the correctness of the identification. In practice, the approach may also produce an inaccurate model, especially in high-dimensional scenarios. Here, we consider a 5-dimensional case in which the underlying model involves 4 matrices. We randomly generate 10000 data samples and randomly generate 500 scenarios of the matrix set. Performance is evaluated on 20000 randomly generated test data samples by computing the empirical error with respect to the underlying true model. A histogram of the result is depicted in Figure 3.4 below. From this figure, it can be seen that although the algorithm correctly identifies the underlying model for a large number of the generated scenarios, it fails occasionally. In fact, this issue becomes even worse in higher-dimensional scenarios and when the data samples are insufficient or skewed in distribution, as will be discussed in the sequel and demonstrated in the case studies.

Figure 3.4: Histogram of the empirical error between the identified model and the underlying true model

Despite the good performance of our algorithm on the synthetic data set, as mentioned above, several distinct features of the underlying control problem and the Q-learning framework limit the performance of the proposed geometric approach. In the traditional unsupervised learning literature, sufficiency and good distribution of the data samples are two fairly standard assumptions yielding good performance of learning algorithms. However, due to our underlying control problem and the adopted Q-learning framework, we do not have the freedom to make such assumptions. In particular, the distribution of the data samples collected from the simulator may be skewed or even degenerate. To see this, we first note that there is a natural trade-off between degeneracy and distribution of the data samples due to the underlying system dynamics. In particular, the underlying model we face in our original problem (3.29) is

y = min_P [x; u]^T [A_v, B_v]^T P [A_v, B_v] [x; u] = min_{P̂} [x; u]^T P̂ [x; u] = min_P (x^+)^T P x^+,

where [x; u] is the input to the simulator, which can be randomly generated. However, if we directly use the data set of pairs [x; u], the matrices defining the quadratic functions, P̂ = [A_v, B_v]^T P [A_v, B_v], actually drop rank, resulting in a nonempty null space. Consequently, the transformed data set no longer lies on surfaces of concentric hyper-ellipsoids; only after a projection would the data samples lie on such surfaces. Nonetheless, this projection, or equivalently the null space, is unknown a priori. Instead, thanks to the accessibility of the simulator, we have information about x^+; therefore, we can apply the geometric algorithm to x^+ to obtain the clustering results and then generate the associated matrices P̂ based on them. Although this idea addresses the degeneracy issue, we lose the uniform distribution property of the data samples x^+. How to rigorously characterize this trade-off is still an open question and will be investigated in future work.

One potential heuristic remedy is to consider adaptive sampling techniques. Specifically, we point out that the inaccuracy of the clustering or identification produced by the proposed geometric approach mainly comes from two sources. First and foremost, there is in general no guarantee that the data samples in the neighborhood set N_β(w_{i∗}) constructed in Algorithm 3 correspond to the same quadratic function defining w_{i∗}; this is in fact the core difficulty of the unsupervised learning (clustering) problem. In addition, even when this desired property holds, the identification step (3.37) may be inaccurate due to an insufficient number of points in N_β(w_{i∗}). To address these two potential issues, we propose a heuristic adaptive sampling technique. To begin with, we note that a symmetric n-dimensional matrix has at most n(n+1)/2 free variables. Hence, we need at least n(n+1)/2 samples in N_β(w_{i∗}) at each iteration to ensure that the least squares problem (3.37) admits an acceptable solution. In addition, as discussed above, we would like all data samples in N_β(w_{i∗}) to originate from the same underlying quadratic function. Taking advantage of the accessibility of the system simulator, if either of the above two conditions is not satisfied, an extra set of data samples is generated according to the following rule (Algorithm 4) and added to the data set. Essentially, we draw samples close to w_{i∗} by slightly perturbing the initial state and input fed to the simulator.

Algorithm 4 Extra Data Samples

Input: index i∗ from step 3 of Algorithm 3, data size M, and covariance matrices Ξ_x ∈ S_{++}^n and Ξ_u ∈ S_{++}^m
Output: new data samples {(x_j, u_j, x_j^+)}
1: for j = 1 to M do
2:   x_j = x_{i∗} + N(0, Ξ_x);
3:   u_j = u_{i∗} + N(0, Ξ_u);
4:   x_j^+ = A_{v_{i∗}} x_j + B_{v_{i∗}} u_j;
5: end for
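A direct Python transcription of this perturbation rule might look as follows; the simulator callable stands in for the unknown dynamics in step 4, and the function name extra_samples is ours:

import numpy as np

def extra_samples(x_star, u_star, v_star, simulator, M, cov_x, cov_u,
                  rng=np.random.default_rng()):
    """Algorithm 4: draw M samples near (x*, u*) by Gaussian
    perturbation and push each through the simulator."""
    samples = []
    for _ in range(M):
        x = x_star + rng.multivariate_normal(np.zeros(len(x_star)), cov_x)
        u = u_star + rng.multivariate_normal(np.zeros(len(u_star)), cov_u)
        samples.append((x, u, simulator(x, u, v_star)))
    return samples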

With our carefully designed Q-function approximator and the proposed updating schemes, the analytical properties of the optimal control solution discussed in Section 3.3 are successfully conveyed into the Q-learning framework. The overall Q-learning algorithm we propose is provided as follows.

Algorithm 5 Q-learning for Switched LQR

Input: system simulator f̃(x, u, v), initial Q-function parameters M and H^0 = {P_1^0, …, P_M^0}, error tolerance T
Output: H∗ = {P_1∗, …, P_M∗}
1: H∗ ← H^0;
2: for k = 1 to k_max do
3:   if ‖P_j^k − P_j^{k−1}‖_F ≤ T for all j = 1, …, M then
4:     H∗ ← H^k; return H∗;
5:   else
6:     x_i ~ Unif(X) i.i.d., ∀i = 1, …, N;
7:     (u_i, v_i) ~ Unif(U × Σ) i.i.d., ∀i = 1, …, N;
8:     x_i^+ = f̃(x_i, u_i, v_i), ∀i = 1, …, N;
9:     y_i ← ℓ(x_i, u_i, v_i) + min_{P ∈ H^k} [x_i; u_i]^T P [x_i; u_i], ∀i = 1, …, N;
10:    Update H^{k+1} using either of the proposed algorithms;
11:  end if
12: end for

The proposed Q-learning algorithm constitutes a very first step toward model-free reinforcement learning for hybrid systems, and it exhibits several features that are worth discussing.

First of all, in our Q-learning algorithm we actually abuse the simulator, in the sense that we can arbitrarily pick the state-input pair and use the simulator to evaluate the successor state. In the classical Q-learning framework, the simulator is used in a sequential fashion, meaning that one can only choose the initial state, and all subsequent states fed into the simulator come from previous iterations. Different choices of how to determine the inputs to the simulator may result in different performance of the learning algorithm. One intuitive way is to draw inputs according to a pre-defined distribution over the input space, e.g., uniform or normal. An alternative is to use the control law determined by the Q-function at the current step to generate the inputs. The trade-off between these two methods is known as the trade-off between exploration and exploitation. A widely used method is the so-called ε-greedy method, in which the input is determined by the current Q-function corrupted by noise, usually chosen as a zero-mean normal random variable with decaying variance. Various heuristics proposed in the literature have been added to the design of such sampling techniques for different underlying approaches to ensure the desired convergence behavior.

In addition, notice that in the proposed algorithm, the update of the Q-function at each iteration does not rely on the previous parameters at all. In other words, in Algorithm 5, the information in H^k is not conveyed to H^{k+1} directly but is only used in determining y_i in step 9. This differs significantly from the neural network based approaches, where the weights of the neural network are directly carried over to the next iteration as the initialization of the optimization procedure. Traditional training of neural networks is therefore somewhat biased: intuitively, on one hand such a bias may result in better convergence behavior of the overall algorithm, while on the other hand it is more easily trapped in local solutions. To the best of the authors' knowledge, rigorous analysis and theoretical discussion of this issue are missing.

All the issues discussed above motivate our future work, including the development of more efficient and reliable numerical algorithms and rigorous mathematical analysis of the performance of the proposed algorithm.

3.6 Case Studies

In this section, we test the proposed Q-learning algorithm on optimal control problems for discrete-time switched linear systems without knowledge of the dynamics.

3.6.1 A Simple 2-Dimensional Example

To begin with, we consider a simple 2-dimensional system involving two subsystems that has been analytically studied in the switched linear quadratic regulation literature:

A_1 = [ 2, 1 ; 0, 1 ],  B_1 = [ 1 ; 1 ],  A_2 = [ 2, 1 ; 0, 0.5 ],  B_2 = [ 1 ; 2 ],
Q_1 = Q_2 = I_2,  R_1 = R_2 = 1.
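For reproducibility, the corresponding black-box simulator and running cost used by Algorithm 5 can be sketched as follows; only the matrices come from the example above, while the function names are ours:

import numpy as np

A = {1: np.array([[2.0, 1.0], [0.0, 1.0]]),
     2: np.array([[2.0, 1.0], [0.0, 0.5]])}
B = {1: np.array([1.0, 1.0]),
     2: np.array([1.0, 2.0])}
Q = {1: np.eye(2), 2: np.eye(2)}
R = {1: 1.0, 2: 1.0}

def simulator(x, u, v):
    """Successor state x+ = A_v x + B_v u for Example 1."""
    return A[v] @ x + B[v] * u

def stage_cost(x, u, v):
    """Running cost l(x, u, v) = x'Q_v x + u'R_v u."""
    return x @ Q[v] @ x + R[v] * u * u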

The detailed configuration of the Q-learning algorithm in Algorithm 5 is as follows:

1. Main algorithm: M = 2, P_j^0 = 0, k_max = 20, and N = 500; note that we do not specify T here;

2. Subspace clustering: γ = 0.01 in Algorithm 2;

3. Geometric identification: ε = 10⁻⁵, and β is chosen such that we have enough data samples in the neighborhood set.

To evaluate the proposed algorithm, we adopt the standard reinforcement learning practice of simulating the system from 100 initial samples using the control law associated with the Q-function and observing the corresponding cost. In addition, as the termination condition of the main algorithm suggests, the performance of the algorithm can also be evaluated by comparing the matrices we obtain with the optimal ones

Figure 3.5: Empirical Costs Comparison among Subspace Clustering Approach, Geometric Approach and SLQR - Example 1

Figure 3.6: Pij convergence with subspace clustering - Example 1

92 25

20

15

10

5

0 0 2 4 6 8 10 12 14 16 18 20

Figure 3.7: Pij convergence with geometric approach - Example 1

computed via classical switched LQR techniques. For this simple problem, the optimal solutions can be found easily. In practice, the first evaluation method is more widely used, since it is in general impossible to directly obtain the optimal solutions.

In Figure 3.5, the simulated costs during the learning processes with both training methods are plotted against the simulated cost associated with the SLQR optimal solution. It can be seen that the cost associated with the geometric approach empirically converges to the SLQR cost, whereas the subspace clustering based result exhibits some oscillations. This issue can be seen more clearly in Figures 3.6 and 3.7. In fact, the matrices obtained from the geometric approach converge to the ones corresponding to the optimal solution.

Another interesting observation is that, although there are 6 free parameters in each matrix to be identified, we observe only 3 parameters defining each matrix in the results of both approaches. This is due to the degeneracy issue discussed in the previous section.

3.6.2 Another More Interesting 2-Dimensional Example

The above problem admits a simple sub-optimal solution, namely using the optimal solution corresponding to a single subsystem all the time, owing to the stabilizability of both subsystems. A more interesting scenario is considered in this subsection, in which neither subsystem is stabilizable. The configuration of this problem is given below:

A_1 = [ 2, 0 ; 0, 2 ],  B_1 = [ 1 ; 2 ],  A_2 = [ 1.5, 1 ; 0, 1.5 ],  B_2 = [ 1 ; 0 ],
Q_1 = Q_2 = I_2,  R_1 = R_2 = 1.

Figure 3.8: Empirical Costs Comparison among Subspace Clustering Approach, Geometric Approach and SLQR - Example 2

Figure 3.9: Pij convergence with subspace clustering - Example 2

Figure 3.10: Pij convergence with geometric approach - Example 2

The parameters used in the algorithm remain exactly the same as in the previous case, and the same evaluation methods are adopted. In Figure 3.8, the simulated costs are depicted; the convergence of the matrices defining the Q-functions is shown in Figures 3.9 and 3.10. From these figures, it can be seen that in this case the oscillatory behavior of the subspace clustering approach is more severe than in the previous case. This is probably because neither of the subsystems in the current example is stabilizable; consequently, a small deviation in the matrices defining the value (or Q-) function induces a relatively large error compared to the previous example.

Another feature, slightly difficult to notice from Figures 3.8 and 3.10, is that although the entries of each P matrix used in the Q-function approximator converge and the associated empirical cost appears to converge to the SLQR cost in Figure 3.8, there is in fact a very small gap between the two costs. This gap is due to the numerical ε-redundancy discussed in Section 3.3.

3.6.3 A 3-Dimensional Example

The main purpose of this example is to demonstrate the limitations of our current solution. As suggested by the discussion in Section 3.5, the proposed geometric approach may not be accurate in high-dimensional scenarios due to various factors such as insufficiency of data samples, skewed data distributions, carelessly chosen parameters, and so on. We consider a 3-dimensional example here to demonstrate that these issues arise ubiquitously, which is a strong motivation for developing better solutions. The configuration of the considered example is given below:

A_1 = [ 2, 0, 2 ; 3, 1, 3 ; 1, 0, 2 ],  A_2 = [ 3, 2, 0 ; 1, 3, 1 ; 0, 3, 2 ],  A_3 = [ 1, 3, 0 ; 2, 2, 0 ; 0, 2, 1 ],
B_1 = [ 0 ; 1 ; 0 ],  B_2 = [ 2 ; 2 ; 2 ],  B_3 = [ 3 ; 3 ; 2 ],
Q_1 = Q_2 = Q_3 = I_3,  R_1 = R_2 = R_3 = 1.

Similar to the previous example, it is easy to verify that none of the subsystems is stabilizable. All parameters in the proposed algorithm remain the same except M = 4 and N = 5000. We iterate the proposed Q-learning algorithm for 200 steps, and the simulation results are provided in Figures 3.11, 3.12, and 3.13.

From Figures 3.11(a), 3.12(a), and 3.13(a), which depict the results of the first 50 iteration steps, a conclusion similar to the previous example can be drawn: the proposed algorithm with the geometric approach converges to a sub-optimal solution, while the subspace clustering based version does not converge. In fact, the control policy generated by the subspace clustering approach is not even stabilizing.

However, looking at the entire horizon, i.e., Figures 3.11(b), 3.12(b), and 3.13(b), some small fluctuations occur occasionally. This is due to the nature of data-driven approaches: the algorithm occasionally fails to draw any sample corresponding to one of the underlying quadratic functions that contribute to the overall function value.

(a) Training with 50 steps    (b) Training with 200 steps

Figure 3.11: Empirical Costs Comparison among Subspace Clustering Approach, Geometric Approach and SLQR - Example 3

(a) Training with 50 steps    (b) Training with 200 steps

Figure 3.12: Pij convergence with subspace clustering - Example 3

(a) Training with 50 steps    (b) Training with 200 steps

Figure 3.13: Pij convergence with geometric approach - Example 3

3.7 Conclusion

In this chapter, we study optimal control of discrete-time switched linear systems using model-free reinforcement learning. Motivated by the special analytical value function structure of the underlying problem, we propose a novel Q-learning algorithm instead of directly applying existing neural network based techniques. In particular, a

specific parametric Q-function approximator explicitly incorporating the analytical value function structure is proposed. Two approaches for updating the parameters used in the approximation are described, exploiting different structural properties of the underlying approximator architecture.

Chapter 4: Contributions and Future Work

This dissertation studies optimal control of switched systems. Such problems are of great interest in the literature due to their wide applicability in diverse engineering fields. Roughly speaking, switched systems involve multiple operating modes (subsystems) and a switching signal orchestrating the active subsystem at each time instant. Optimal control of such switched systems aims to find both the continuous input and the switching signal to jointly optimize a certain system performance index. Apart from the challenges of solving classical optimal control problems, optimal control of switched systems suffers from additional difficulties, mainly due to the discrete nature of the switching signal, which makes the problem combinatorial.

In the first part of this dissertation, the problem of finding optimal open-loop solutions for general continuous-time switched nonlinear systems is discussed. We consider the embedding-based approach, which solves the problem by first relaxing the combinatorial (discrete) input space to a continuous one, then solving the optimal control problem with the relaxed input space, and finally projecting the relaxed solution back into the original combinatorial space. Exploiting the notion of weak topology, we provide a novel topological perspective on the embedding-based approach and develop an associated framework that unifies the understanding and analysis of most embedding-based algorithms. The major contributions of this work lie in the following aspects. First, the framework offers an abstract and high-level way of understanding embedding-based techniques for solving optimal control of switched systems. Moreover, the proposed framework streamlines the convergence analysis of embedding-based algorithms and can be viewed as general guidance for constructing new ones.

In the second part of this dissertation, we turn our attention to optimal control of discrete-time switched linear systems, which is a more structured problem. Motivated by the fact that accurate knowledge about system dynamics is in general challenging to obtain, we solve the optimal control problem in a model-free setting. Instead of requiring information about the system dynamics, we assume access to a system simulator that outputs the successor state and the associated running cost given any state-input pair. Utilizing such a simulator, a specific Q-learning algorithm is developed. Instead of directly applying existing neural network based techniques, a particular parametric Q-function approximator and the corresponding parameter update scheme are proposed to directly incorporate analytical insights about the optimal solution gained from the classical optimal control literature. The contributions of this work lie mainly in the following aspects. First of all, a novel Q-learning algorithm is proposed for solving optimal control of switched linear systems. In addition, the proposed solution explicitly incorporates knowledge about the analytical structure that the optimal solution possesses, rather than directly applying existing neural network based methods.

4.1 Future Work

There are several potential future research directions for both problems studied in this dissertation that are fundamentally important and numerically influential.

For the first problem, one promising and practically influential direction is to construct systematic ways of determining the underlying weak topology to be used in the framework. In particular, most existing approaches fail to handle cases involving switching costs. Inspired by the proposed framework, a new weak topology, different from the classically used trajectory-induced one, needs to be found. More broadly, new optimality conditions and algorithms for general optimization problems are always of great interest to the literature.

For the second problem, there are many more open questions yet to be answered. One of the most important directions is to develop more reliable updating schemes for the proposed Q-function that avoid abuse of the simulator and explicitly incorporate the skewed distribution information caused by the underlying dynamics. Rigorous mathematical analysis of the performance of the proposed training methods is essential for establishing performance guarantees for the overall Q-learning framework. Moreover, other reinforcement learning frameworks, such as policy iteration or actor-critic approaches, could be implemented to help improve numerical performance. On a higher level, an in-depth understanding of the differences between these general frameworks is of great interest to the literature. The idea of incorporating optimal control insights into the reinforcement learning framework for other hybrid systems, such as piecewise affine systems, was in fact the original motivation of this work. Furthermore, possibilities of bringing optimal control insights into general neural network based reinforcement learning algorithms are worth exploring as well.

Bibliography

[1] P. K. Agarwal and N. H. Mustafa, “K-means projective clustering,” in Proceedings of the twenty-third ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems. ACM, 2004, pp. 155–165.

[2] R. Arora, A. Basu, P. Mianjy, and A. Mukherjee, “Understanding deep neural networks with rectified linear units,” in International Conference on Learning Representations, 2018.

[3] K. Arulkumaran, M. P. Deisenroth, M. Brundage, and A. A. Bharath, “Deep reinforcement learning: A brief survey,” IEEE Signal Processing Magazine, vol. 34, no. 6, pp. 26–38, 2017.

[4] K. J. Åström and B. Wittenmark, Adaptive Control. Courier Corporation, 2013.

[5] H. Axelsson, Y. Wardi, M. Egerstedt, and E. Verriest, “Gradient descent approach to optimal mode scheduling in hybrid dynamical systems,” Journal of Optimization Theory and Applications, vol. 136, no. 2, pp. 167–186, 2008.

[6] M. Bardi and I. Capuzzo-Dolcetta, Optimal control and viscosity solutions of Hamilton-Jacobi-Bellman equations. Springer Science & Business Media, 2008.

[7] R. Bellman, Dynamic Programming. Courier Corporation, 1957.

[8] A. Bemporad, M. Morari, V. Dua, and E. N. Pistikopoulos, “The explicit linear quadratic regulator for constrained systems,” Automatica, vol. 38, no. 1, pp. 3–20, 2002.

[9] S. C. Bengea and R. A. DeCarlo, “Optimal control of switching systems,” Automatica, vol. 41, no. 1, pp. 11–27, 2005.

[10] L. D. Berkovitz, Optimal Control Theory, ser. Applied Mathematical Sciences. Springer, 1974.

[11] D. P. Bertsekas, Dynamic Programming and Optimal Control, Vol. II: Approximate Dynamic Programming, 4th ed. Athena Scientific, 2012.

[12] ——, Dynamic Programming and Optimal Control, Vol. I, 4th ed. Athena Scientific, 2017.

[13] ——, “Value and policy iterations in optimal control and adaptive dynamic programming,” IEEE Transactions on Neural Networks and Learning Systems, vol. 28, no. 3, pp. 500–509, 2017.

[14] D. P. Bertsekas and J. N. Tsitsiklis, Neuro-Dynamic Programming, 1st ed. Athena Scientific, 1996.

[15] F. Borrelli, M. Baotić, A. Bemporad, and M. Morari, “Dynamic programming for constrained optimal control of discrete-time linear hybrid systems,” Automatica, vol. 41, no. 10, pp. 1709–1721, 2005.

[16] F. Borrelli, A. Bemporad, and M. Morari, Predictive Control for Linear and Hybrid Systems. Cambridge University Press, 2017.

[17] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, “Distributed optimization and statistical learning via the alternating direction method of multipliers,” Foundations and Trends in Machine Learning, vol. 3, no. 1, pp. 1–122, 2011.

[18] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge University Press, 2004.

[19] P. S. Bradley and O. L. Mangasarian, “K-plane clustering,” Journal of Global Optimization, vol. 16, no. 1, pp. 23–32, 2000.

[20] S. J. Bradtke, “Reinforcement learning applied to linear quadratic regulation,” in Advances in Neural Information Processing Systems, 1993, pp. 295–302.

[21] A. E. Bryson, Applied Optimal Control: Optimization, Estimation and Control. Routledge, 2018.

[22] L. Caccetta, I. Loosen, and V. Rehbock, “Computational aspects of the optimal transit path problem,” Journal of Industrial and Management Optimization, vol. 4, no. 1, pp. 95–105, 2008.

[23] C. G. Cassandras, D. L. Pepyne, and Y. Wardi, “Optimal control of a class of hybrid systems,” IEEE Transactions on Automatic Control, vol. 46, no. 3, pp. 398–415, Mar 2001.

[24] M. G. Crandall, L. C. Evans, and P.-L. Lions, “Some properties of viscosity solutions of Hamilton–Jacobi equations,” Transactions of the American Mathematical Society, vol. 282, no. 2, pp. 487–502, 1984.

[25] M. G. Crandall and P.-L. Lions, “Viscosity solutions of Hamilton–Jacobi equations,” Transactions of the American Mathematical Society, vol. 277, no. 1, pp. 1–42, 1983.

[26] S. Dean, H. Mania, N. Matni, B. Recht, and S. Tu, “On the sample complexity of the linear quadratic regulator,” https://arxiv.org/abs/1710.01688, 2018.

[27] M. Egerstedt, Y. Wardi, and H. Axelsson, “Transition-time optimization for switched-mode dynamical systems,” IEEE Transactions on Automatic Control, vol. 51, no. 1, pp. 110–115, 2006.

[28] E. Elhamifar and R. Vidal, “Sparse subspace clustering,” in IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2009, pp. 2790–2797.

[29] ——, “Sparse subspace clustering: Algorithm, theory, and applications,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 11, pp. 2765–2781, 2013.

[30] H. O. Fattorini, Infinite Dimensional Optimization and Control Theory, ser. Encyclopedia of Mathematics and its Applications. Cambridge University Press, 1999.

[31] X. Ge, W. Kohn, A. Nerode, and J. B. Remmel, “Hybrid systems: Chattering approximation to relaxed controls,” in Hybrid Systems III. Springer, 1996, pp. 76–100.

[32] R. Goebel, R. G. Sanfelice, and A. R. Teel, “Hybrid dynamical systems,” IEEE Control Systems, vol. 29, no. 2, pp. 28–93, 2009.

[33] ——, Hybrid Dynamical Systems: Modeling, Stability, and Robustness. Princeton University Press, 2012.

[34] H. Gonzalez, R. Vasudevan, M. Kamgarpour, S. S. Sastry, R. Bajcsy, and C. J. Tomlin, “A descent algorithm for the optimal control of constrained nonlinear switched dynamical systems,” in 13th ACM International Conference on Hybrid Systems: Computation and Control. ACM, 2010, pp. 51–60.

[35] ——, “A numerical method for the optimal control of switched systems,” in 49th IEEE Conference on Decision and Control, 2010, pp. 7519–7526.

[36] S. Gu, T. Lillicrap, I. Sutskever, and S. Levine, “Continuous deep q-learning with model-based acceleration,” in International Conference on Machine Learning, 2016, pp. 2829–2838.

[37] R. Hecht-Nielsen, “Theory of the backpropagation neural network,” in Neural networks for perception. Elsevier, 1992, pp. 65–93.

[38] S. Hedlund and A. Rantzer, “Optimal control of hybrid systems,” in 38th IEEE Conference on Decision and Control, vol. 4, 1999, pp. 3972–3977.

[39] A. Jalali and R. Willett, “Subspace clustering via tangent cones,” in Advances in Neural Information Processing Systems, 2017, pp. 6744–6753.

[40] S. Ji and J. Ye, “An accelerated gradient method for trace norm minimization,” in International Conference on Machine Learning. ACM, 2009, pp. 457–464.

[41] D. E. Kirk, Optimal Control Theory: An Introduction. Englewood Cliffs: Prentice-Hall, 1970.

[42] B. Kiumarsi, F. L. Lewis, H. Modares, A. Karimpour, and M.-B. Naghibi-Sistani, “Reinforcement q-learning for optimal tracking control of linear discrete-time systems with unknown dynamics,” Automatica, vol. 50, no. 4, pp. 1167–1175, 2014.

[43] V. R. Konda and J. N. Tsitsiklis, “Actor-critic algorithms,” in Advances in Neural Information Processing Systems, 2000, pp. 1008–1014.

[44] F. L. Lewis and D. Vrabie, “Reinforcement learning and adaptive dynamic programming for feedback control,” IEEE Circuits and Systems Magazine, vol. 9, no. 3, 2009.

[45] F. L. Lewis, D. Vrabie, and V. L. Syrmos, Optimal Control. John Wiley & Sons, 2012.

[46] F. L. Lewis, D. Vrabie, and K. G. Vamvoudakis, “Reinforcement learning and feedback control: Using natural decision methods to design optimal adaptive controllers,” IEEE Control Systems, vol. 32, no. 6, pp. 76–105, 2012.

[47] D. Liberzon, Switching in Systems and Control. Springer Science & Business Media, 2003.

[48] T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver, and D. Wierstra, “Continuous control with deep reinforcement learning,” arXiv preprint arXiv:1509.02971, 2015.

[49] Q. Lin, R. Loxton, and K. L. Teo, “Optimal control of nonlinear switched systems: Computational methods and applications,” Journal of the Operations Research Society of China, vol. 1, no. 3, pp. 275–311, 2013.

[50] J. Lipor, D. Hong, D. Zhang, and L. Balzano, “Subspace clustering using ensembles of k-subspaces,” arXiv preprint arXiv:1709.04744, 2017.

[51] R. C. Loxton, K. L. Teo, V. Rehbock, and W. Ling, “Optimal switching instants for a switched-capacitor dc/dc power converter,” Automatica, vol. 45, no. 4, pp. 973–980, 2009.

[52] R. C. Loxton, K. L. Teo, and V. Rehbock, “Computational method for a class of switched system optimal control problems,” IEEE Transactions on Automatic Control, vol. 54, no. 10, pp. 2455–2460, 2009.

[53] J. Lygeros, S. Sastry, and C. Tomlin, “Hybrid systems: Foundations, advanced topics and applications,” under copyright to be published by Springer Verlag, 2012.

[54] H. Maurer and J. Zowe, “First and second-order necessary and sufficient optimality conditions for infinite-dimensional programming problems,” Mathematical Programming, vol. 16, no. 1, pp. 98–110, 1979.

[55] V. Mnih, A. P. Badia, M. Mirza, A. Graves, T. Lillicrap, T. Harley, D. Silver, and K. Kavukcuoglu, “Asynchronous methods for deep reinforcement learning,” in International Conference on Machine Learning, 2016, pp. 1928–1937.

[56] V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. Riedmiller, “Playing atari with deep reinforcement learning,” arXiv preprint arXiv:1312.5602, 2013.

[57] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski et al., “Human-level control through deep reinforcement learning,” Nature, vol. 518, no. 7540, p. 529, 2015.

[58] H. Modares and F. L. Lewis, “Linear quadratic tracking control of partially-unknown continuous-time systems using reinforcement learning,” IEEE Transactions on Automatic Control, vol. 59, no. 11, pp. 3051–3056, 2014.

[59] V. Nair and G. E. Hinton, “Rectified linear units improve restricted Boltzmann machines,” in International Conference on Machine Learning, 2010, pp. 807–814.

[60] F. M. Oettmeier, J. Neely, S. Pekarek, R. DeCarlo, and K. Uthaichana, “MPC of switching in a boost converter using a hybrid state model with a sliding mode observer,” IEEE Transactions on Industrial Electronics, vol. 56, no. 9, pp. 3453–3466, 2009.

[61] D. L. Pepyne and C. G. Cassandras, “Optimal control of hybrid systems in manufacturing,” Proceedings of the IEEE, vol. 88, no. 7, pp. 1108–1123, 2000.

[62] B. Piccoli, “Hybrid systems and optimal control,” in 37th IEEE Conference on Decision and Control, vol. 1, 1998, pp. 13–18.

[63] E. Polak, Optimization: Algorithms and Consistent Approximations, ser. Applied Mathematical Sciences. Springer-Verlag, 1997.

[64] W. B. Powell, Approximate Dynamic Programming: Solving the Curses of Dimensionality. John Wiley & Sons, 2007, vol. 703.

[65] S. Rao, R. Tron, R. Vidal, and Y. Ma, “Motion segmentation in the presence of outlying, incomplete, or corrupted trajectories,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 10, pp. 1832–1845, 2010.

[66] V. Rehbock and L. Caccetta, “Two defence applications involving discrete valued optimal control,” ANZIAM Journal, vol. 44, pp. E33–E54, 2002.

[67] M. Riedmiller and H. Braun, “A direct adaptive method for faster backpropagation learning: The RPROP algorithm,” in IEEE International Conference on Neural Networks. IEEE, 1993, pp. 586–591.

[68] M. Rinehart, M. Dahleh, D. Reed, and I. Kolmanovsky, “Suboptimal control of switched systems with an application to the disc engine,” IEEE Transactions on Control Systems Technology, vol. 16, no. 2, pp. 189–201, March 2008.

[69] J. Schulman, S. Levine, P. Abbeel, M. Jordan, and P. Moritz, “Trust region policy optimization,” in International Conference on Machine Learning, 2015, pp. 1889–1897.

[70] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, “Proximal policy optimization algorithms,” arXiv preprint arXiv:1707.06347, 2017.

[71] M. S. Shaikh and P. E. Caines, “On the optimal control of hybrid systems: Optimization of trajectories, switching times, and location schedules,” in Hybrid systems: Computation and control. Springer, 2003, pp. 466–481.

[72] D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. Van Den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot et al., “Mastering the game of Go with deep neural networks and tree search,” Nature, vol. 529, no. 7587, p. 484, 2016.

[73] M. Soltanolkotabi and E. J. Candes, “A geometric analysis of subspace clustering with outliers,” The Annals of Statistics, vol. 40, no. 4, pp. 2195–2238, 2012.

[74] M. Soltanolkotabi, E. Elhamifar, and E. J. Candes, “Robust subspace clustering,” The Annals of Statistics, vol. 42, no. 2, pp. 669–699, 2014.

[75] E. D. Sontag, Mathematical Control Theory: Deterministic Finite Dimensional Systems. Springer Science & Business Media, 2013, vol. 6.

[76] H. J. Sussmann, “A maximum principle for hybrid optimal control problems,” in 38th IEEE Conference on Decision and Control, vol. 1, 1999, pp. 425–430.

[77] ——, “Set-valued differentials and the hybrid maximum principle,” in 39th IEEE Conference on Decision and Control, vol. 1, 2000, pp. 558–563.

[78] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. MIT press, 1998.

[79] R. S. Sutton, D. A. McAllester, S. P. Singh, and Y. Mansour, “Policy gradient methods for reinforcement learning with function approximation,” in Advances in Neural Information Processing Systems, 2000, pp. 1057–1063.

[80] M. E. Tipping and C. M. Bishop, “Mixtures of probabilistic principal component analyzers,” Neural Computation, vol. 11, no. 2, pp. 443–482, 1999.

[81] P. Tseng, “Nearest q-flat to m points,” Journal of Optimization Theory and Applications, vol. 105, no. 1, pp. 249–252, 2000.

[82] S. Tu and B. Recht, “Least-squares temporal difference learning for the linear quadratic regulator,” arXiv preprint arXiv:1712.08642, 2017.

[83] K. Uthaichana, R. DeCarlo, S. Bengea, M. Žefran, and S. Pekarek, “Hybrid optimal theory and predictive control for power management in hybrid electric vehicle,” Journal of Nonlinear Systems and Applications, vol. 2, no. 1-2, pp. 96–110, 2011.

[84] K. G. Vamvoudakis and F. L. Lewis, “Online actor–critic algorithm to solve the continuous-time infinite horizon optimal control problem,” Automatica, vol. 46, no. 5, pp. 878–888, 2010.

[85] H. Van Hasselt, A. Guez, and D. Silver, “Deep reinforcement learning with double q-learning.” in AAAI, vol. 2. Phoenix, AZ, 2016, p. 5.

[86] R. Vasudevan, H. Gonzalez, R. Bajcsy, and S. S. Sastry, “Consistent approximations for the optimal control of constrained switched systems—part 1: A conceptual algorithm,” SIAM Journal on Control and Optimization, vol. 51, no. 6, pp. 4463–4483, 2013.

[87] ——, “Consistent approximations for the optimal control of constrained switched systems—part 2: An implementable algorithm,” SIAM Journal on Control and Optimization, vol. 51, no. 6, pp. 4484–4503, 2013.

[88] R. Vidal and P. Favaro, “Low rank subspace clustering,” Pattern Recognition Letters, vol. 43, pp. 47–61, 2014.

[89] R. Vidal, Y. Ma, and S. S. Sastry, Generalized Principal Component Analysis. Springer, 2016, vol. 5.

[90] U. Von Luxburg, “A tutorial on spectral clustering,” Statistics and Computing, vol. 17, no. 4, pp. 395–416, 2007.

[91] D. Vrabie, O. Pastravanu, M. Abu-Khalaf, and F. L. Lewis, “Adaptive optimal control for continuous-time linear systems based on policy iteration,” Automatica, vol. 45, no. 2, pp. 477–484, 2009.

[92] J. Warga, Optimal Control of Differential and Functional Equations. Academic press, 2014.

[93] C. J. Watkins and P. Dayan, “Q-learning,” Machine Learning, vol. 8, no. 3-4, pp. 279–292, 1992.

[94] Q. Wei, D. Liu, and X. Yang, “Infinite horizon self-learning optimal control of nonaffine discrete-time nonlinear systems,” IEEE Transactions on Neural Networks and Learning Systems, vol. 26, no. 4, pp. 866–879, 2015.

[95] S. Wei, K. Uthaichana, M. Žefran, and R. DeCarlo, “Hybrid model predictive control for the stabilization of wheeled mobile robots subject to wheel slippage,” IEEE Transactions on Control Systems Technology, vol. 21, no. 6, pp. 2181–2193, Nov 2013.

[96] P. J. Werbos, “Backpropagation through time: What it does and how to do it,” Proceedings of the IEEE, vol. 78, no. 10, pp. 1550–1560, 1990.

[97] X. Xu and P. J. Antsaklis, “A dynamic programming approach for optimal control of switched systems,” in 39th IEEE Conference on Decision and Control, vol. 2. IEEE, 2000, pp. 1822–1827.

[98] ——, “Optimal control of switched systems: New results and open problems,” in American Control Conference, vol. 4. IEEE, 2000, pp. 2683–2687.

[99] ——, “Stabilization of second-order LTI switched systems,” International Journal of Control, vol. 73, no. 14, pp. 1261–1279, 2000.

[100] ——, “Results and perspectives on computational methods for optimal control of switched systems,” in Hybrid Systems: Computation and Control. Springer, 2003, pp. 540–555.

[101] ——, “Optimal control of switched systems based on parameterization of the switching instants,” IEEE Transactions on Automatic Control, vol. 49, no. 1, pp. 2–16, 2004.

[102] T. Zhang, A. Szlam, and G. Lerman, “Median k-flats for hybrid linear modeling with many outliers,” in 12th IEEE International Conference on Computer Vision Workshops. IEEE, 2009, pp. 234–241.

[103] W. Zhang, A. Abate, and J. Hu, “Efficient suboptimal solutions of switched LQR problems,” in American Control Conference. IEEE, 2009, pp. 1084–1091.

[104] W. Zhang, A. Abate, J. Hu, and M. P. Vitus, “Exponential stabilization of discrete-time switched linear systems,” Automatica, vol. 45, no. 11, pp. 2526–2536, 2009.

[105] W. Zhang, J. Hu, and A. Abate, “Infinite-horizon switched LQR problems in discrete time: A suboptimal algorithm with performance analysis,” IEEE Transactions on Automatic Control, vol. 57, no. 7, pp. 1815–1821, 2012.
