Applications of Reinforcement Learning to Routing and Virtualization in Computer Networks
by

Soroush Haeri

B. Eng., Multimedia University, Malaysia, 2010

Dissertation Submitted in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy in the School of Engineering Science, Faculty of Applied Science

© Soroush Haeri 2016
SIMON FRASER UNIVERSITY
Spring 2016

All rights reserved. However, in accordance with the Copyright Act of Canada, this work may be reproduced without authorization under the conditions for "Fair Dealing." Therefore, limited reproduction of this work for the purposes of private study, research, criticism, review, and news reporting is likely to be in accordance with the law, particularly if cited appropriately.

Abstract

Computer networks and reinforcement learning algorithms have substantially advanced over the past decade. The Internet is a complex collection of inter-connected networks that relies on numerous interoperable technologies and protocols. The current trend of decoupling network intelligence from network devices, enabled by Software-Defined Networking (SDN), permits a centralized implementation of network intelligence. This offers great computational power and memory to the network logic processing units where the network intelligence is implemented. Hence, reinforcement learning algorithms become viable options for addressing a variety of computer networking challenges. In this dissertation, we propose two applications of reinforcement learning algorithms in computer networks.

We first investigate applications of reinforcement learning to deflection routing in buffer-less networks. Deflection routing is employed to ameliorate packet loss caused by contention in buffer-less architectures such as optical burst-switched (OBS) networks. We present iDef, a framework that introduces intelligence to deflection routing. The iDef framework decouples the design of the signaling infrastructure from the underlying learning algorithm. It is implemented in the ns-3 network simulator and is made publicly available. We propose the predictive Q-learning deflection routing (PQDR) algorithm, which enables path recovery and reselection and thereby improves a node's decision making under high loads. We also introduce the Node Degree Dependent (NDD) signaling algorithm. The complexity of the algorithm at an NDD-compliant node depends only on the degree of that node, whereas the complexity of the currently available reinforcement learning-based deflection routing algorithms depends on the size of the network. Therefore, NDD is better suited for larger networks. Simulation results show that NDD-based deflection routing algorithms scale well with the size of the network and outperform the existing algorithms. We also propose a feed-forward neural network (NN) decision-making algorithm and a feed-forward neural network with episodic updates (ENN). They employ a single hidden layer and update their weights using an associative learning algorithm. Current reinforcement learning-based deflection routing algorithms employ Q-learning, which does not efficiently utilize the received feedback signals. We introduce the NN and ENN decision-making algorithms to address this deficiency of Q-learning. The NN-based deflection routing algorithms achieve better results than Q-learning-based algorithms in networks with low to moderate loads.
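To make the Q-learning baseline concrete, the following minimal C++ sketch shows how a buffer-less node might score candidate output ports per destination and refine the scores from feedback signals. This is an illustration only, not code from iDef or PQDR; the class, method, and parameter names are hypothetical, and the update shown is the standard one-step Q-learning rule.

```cpp
// Minimal Q-learning sketch for deflection-port selection.
// Illustrative only; names and parameter values are not taken from iDef/PQDR.
#include <algorithm>
#include <cstdlib>
#include <iterator>
#include <vector>

class QDeflector {
public:
    QDeflector(int numDestinations, int numPorts,
               double alpha = 0.1, double gamma = 0.9, double epsilon = 0.05)
        : alpha_(alpha), gamma_(gamma), epsilon_(epsilon), numPorts_(numPorts),
          q_(numDestinations, std::vector<double>(numPorts, 0.0)) {}

    // Choose an output port for a burst headed to `dest` (epsilon-greedy):
    // mostly exploit the best-scored port, occasionally explore a random one.
    int selectPort(int dest) const {
        if (static_cast<double>(std::rand()) / RAND_MAX < epsilon_)
            return std::rand() % numPorts_;
        const auto& row = q_[dest];
        return static_cast<int>(std::distance(
            row.begin(), std::max_element(row.begin(), row.end())));
    }

    // One-step Q-learning update from a single feedback signal:
    // Q(d,p) <- Q(d,p) + alpha * (r + gamma * max_p' Q(d,p') - Q(d,p)).
    void update(int dest, int port, double reward) {
        const auto& row = q_[dest];
        double best = *std::max_element(row.begin(), row.end());
        q_[dest][port] += alpha_ * (reward + gamma_ * best - q_[dest][port]);
    }

private:
    double alpha_, gamma_, epsilon_;
    int numPorts_;
    std::vector<std::vector<double>> q_;
};
```

Note that each feedback signal adjusts only the single table entry for the (destination, port) pair that generated it, which illustrates the inefficiency that the NN and ENN decision-making algorithms are introduced to address.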
The second application of reinforcement learning that we consider in this dissertation is modeling the Virtual Network Embedding (VNE) problem. We develop a VNE simulator (VNE-Sim) that is also made publicly available. We define a novel VNE objective function and prove its upper bound. We then formulate VNE as a reinforcement learning problem using the Markov Decision Process (MDP) framework and propose two algorithms (MaVEn-M and MaVEn-S) that employ Monte Carlo Tree Search (MCTS) for solving the VNE problem. To further improve their performance, we parallelize the algorithms by employing MCTS root parallelization. The advantage of the proposed algorithms is that, time permitting, they search for more profitable embeddings than the available algorithms, which find only a single embedding solution. The simulation results show that the proposed algorithms achieve superior performance.

Keywords: Computer networks; machine learning; reinforcement learning; deflection routing; virtual network embedding

Dedication

For Inga, Baba, Maman, and Sina

Acknowledgements

Writing this dissertation would not have been possible without the intellectual and emotional contributions of the generous individuals I met throughout this journey.

I would like to thank my Senior Supervisor Dr. Ljiljana Trajković, who dedicated countless hours of hard work to reviewing my work and guiding my research. She was very generous and encouraging in allowing me to explore my ideas, and I am truly grateful for the confidence she invested in me and my research. I would also like to thank my committee members Dr. Hardy, Dr. Gu, Dr. Peters, and Dr. Boukerche for reviewing my dissertation and providing constructive suggestions and comments.

I would like to thank Dr. Wilson Wang-Kit Thong. He piqued my interest in deflection routing and introduced me to the ns-3 tool. The ideas in the first portion of this dissertation were conceived while he was a visiting Ph.D. student at the Communication Networks Laboratory at Simon Fraser University.

I would like to express my gratitude to Marilyn Hay and Toby Wong of BCNET as well as Nabil Seddigh and Dr. Biswajit Nandy of Solana Networks, with whom I collaborated on industry-based projects. Although these projects are not part of this dissertation, many subjects I learned during our collaborations helped me develop my research skills.

I would like to thank my brother Dr. Sina Haeri, who has always been a source of support and encouragement. His expertise in parallel programming helped me develop the parallel Monte Carlo Tree Search that is used in this dissertation.

I would like to thank my Mom and Dad for their unconditional love and support throughout my life. Their support enabled me to study and pursue my educational goals.

I would also like to thank the scuba divers of Vancouver's Tec Club, especially John Nunes, Roger Sonnberger, and Eli Wolpin. Furthermore, I would like to extend my gratitude to Royse Jackson and Alan Johnson of the International Diving Center as well as Paul Quiggle of Vancouver's Diving Locker. Many ideas presented in this dissertation were conceived exploring the majestic waters of Vancouver.

I would like to extend my gratitude to my friends Majid Arianezhad, Alireza Jafarpour, Shima Nourbakhsh, Kenneth Fenech, and Luis Domingo Suarez Canales, who have helped me and provided support in countless ways.

Last but not least, I would like to thank my lovely wife Inga. You supported me through the toughest times of my life. Without you, I would not have made it this far.
Table of Contents

Approval
Abstract
Dedication
Acknowledgements
Table of Contents
List of Tables
List of Figures
List of Abbreviations

1 Introduction
  1.1 Optical Burst-Switching and Deflection Routing
  1.2 Virtual Network Embedding
  1.3 Roadmap

2 Reinforcement Learning
  2.1 Q-Learning
  2.2 Feed-Forward Neural Networks for Reinforcement Learning
  2.3 Markov Decision Process
    2.3.1 Solution of Markov Decision Processes: The Exact Algorithms
    2.3.2 Solution of Large Markov Decision Processes and Monte Carlo Tree Search
    2.3.3 Parallel Monte Carlo Tree Search

3 Reinforcement Learning-Based Deflection Routing in Buffer-Less Networks
  3.1 Buffer-Less Architecture, Optical Burst Switching, and Contention
    3.1.1 Optical Burst Switching and Burst Traffic
    3.1.2 Contention in Optical Burst-Switched Networks
  3.2 Deflection Routing by Reinforcement Learning
  3.3 The iDef Framework
  3.4 Predictive Q-Learning-Based Deflection Routing Algorithm
  3.5 The Node Degree Dependent Signaling Algorithm
  3.6 Neural Networks for Deflection Routing
    3.6.1 Feed-Forward Neural Networks for Deflection Routing with Single-Episode Updates
    3.6.2 Feed-Forward Neural Networks for Deflection Routing with k-Episode Updates
    3.6.3 Time Complexity Analysis
  3.7 Network Topologies: A Brief Overview
  3.8 Performance Evaluation
    3.8.1 National Science Foundation Network Scenario
    3.8.2 Complex Network Topologies and Memory Usage
  3.9 Discussion

4 Reinforcement Learning-Based Algorithms for Virtual Network Embedding
  4.1 Virtual Network Embedding Problem
    4.1.1 Objective of Virtual Network Embedding
    4.1.2 Virtual Network Embedding Performance Metrics
  4.2 Available Virtual Network Embedding Algorithms
    4.2.1 Virtual Node Mapping Algorithms
    4.2.2 Virtual Link Mapping Algorithms
  4.3 Virtual Network Embedding Algorithms and Data Center Networks
    4.3.1 Data Center Network Topologies
  4.4 Virtual Network Embedding as a Markov Decision Process
    4.4.1 A Finite-Horizon Markov Decision Process Model for Coordinated Virtual Node Mapping
    4.4.2 Monte Carlo Tree Search for Solving the Virtual Node Mapping
    4.4.3 MaVEn Algorithms
    4.4.4 Parallelization of MaVEn
  4.5 Performance Evaluation
    4.5.1 Simulation Environment
    4.5.2 Internet Service Provider Substrate Network Topology
    4.5.3 Variable Virtual Network Request Arrival Rate Scenarios
    4.5.4 Parallel MaVEn Simulation Scenarios
    4.5.5 Data Center Substrate Networks
  4.6 Discussion

5 VNE-Sim: A Virtual Network Embedding Simulator
  5.1 The Simulator Core: src/core
    5.1.1 Network Component Classes
    5.1.2 Virtual Network Embedding Classes
    5.1.3 Discrete Event Simulation Classes
    5.1.4 Experiment and Result Collection Classes
    5.1.5 Operation Related Classes

6 Conclusion

Bibliography

Appendix A iDef: Selected Code Sections
Appendix B VNE-Sim: Selected Code Sections

List of Tables

Table 3.1 Summary of the presented algorithms.