Baconian: A Unified Open-source Framework for Model-Based Reinforcement Learning (Demonstration)

Linsen Dong, Guanyu Gao, Xinyi Zhang
School of Computer Science and Engineering, Nanyang Technological University, Singapore
[email protected], [email protected], [email protected]

Liangyu Chen
School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore
[email protected]

Yonggang Wen
School of Computer Science and Engineering, Nanyang Technological University, Singapore
[email protected]

ABSTRACT
Model-Based Reinforcement Learning (MBRL) is a category of Reinforcement Learning (RL) algorithms that can improve sampling efficiency by modeling and approximating the system dynamics. It has been widely adopted in research on robotics, autonomous driving, etc. Despite its popularity, there is still a lack of sophisticated and reusable open-source frameworks to facilitate MBRL research and experiments. To fill this gap, we develop a flexible and modularized framework, Baconian, which allows researchers to easily implement an MBRL testbed by customizing or building upon our provided modules and algorithms. Our framework can free users from re-implementing popular MBRL algorithms from scratch, thus greatly saving users' effort on MBRL experiments.

Figure 1: Feature list of Baconian.
  Algorithm: Dyna, GPS, ME-TRPO, MPC, iLQR, DQN, DDPG, PPO
  Environment: OpenAI Gym, PyBullet, DeepMind Control Suite
  Utility: Built-in Logging and Visualization, TensorFlow Integration, Parameter Management
  Open-source Support: User Guide and API References, MIT License, Benchmark Results Released

KEYWORDS
Reinforcement Learning, Model-based Reinforcement Learning, Open-source Library

1 INTRODUCTION
Model-Based Reinforcement Learning (MBRL) is proposed to reduce the sample complexity of model-free Deep Reinforcement Learning (DRL) algorithms [12]. Specifically, MBRL approximates the system dynamics with a parameterized model, which can be used for policy optimization when training data is very limited or costly to obtain in the real world.

Implementing RL experiments from scratch can be tedious and bug-prone. Fortunately, many open-source frameworks have been developed to facilitate DRL research, including baselines [3], rllab [4], Coach [2], and Horizon [6]. However, these frameworks are mainly implemented for model-free DRL methods and lack sufficient support for MBRL.

Existing model-based frameworks are few and have shortcomings. The work in [18] gives a comprehensive benchmark over state-of-the-art MBRL algorithms, but the implementations are scattered across different codebases without a unified implementation, which makes it cumbersome to conduct experiments with. The work in [5] provides an implementation of Guided Policy Search (GPS) [10], which supports robotic control tasks, but it lacks support for other MBRL algorithms. Thus, a unified MBRL open-source framework is needed. To fill this gap, we design and implement a unified MBRL framework, Baconian, by trading off the diversity of included MBRL algorithms against the complexity of the framework. Users can reproduce benchmark results or prototype their ideas easily with a minimal amount of code and without understanding the detailed implementations. Moreover, the design of Baconian not only benefits MBRL research, but is also applicable to other types of RL algorithms, including model-free algorithms. The codebase is available at https://github.com/cap-ntu/baconian-project. The demo video is available at https://youtu.be/J6xI6qI3CvE.

2 MAIN FEATURES
Baconian supports many RL algorithms, test environments, and experiment utilities. We summarize the main features in Fig. 1.

Figure 2: The system design of Baconian. The system is divided into three main modularized components (Experiment Manager, Training Engine, and Monitor) to minimize the coupling for flexibility and maintainability.

Figure 3: The procedure to create an MBRL experiment in Baconian. Each module is replaceable and configurable to reduce the effort of building from scratch.

State-of-the-Art RL Algorithms. We implement many widely used RL algorithms. For model-based algorithms, we implement Dyna [15], ME-TRPO (Model-Ensemble Trust Region Policy Optimization) [9], MPC (Model Predictive Control) [13], iLQR (Iterative Linear Quadratic Regulator) [17], etc. Since many model-based algorithms are built upon model-free algorithms, we also implement some popular model-free algorithms, including DQN [7], DDPG [11], and PPO [14], in Baconian.

Supported Test Environments. To evaluate the performance of RL algorithms, it is essential to support a wide range of test environments. Baconian supports OpenAI Gym [1], RoboSchool [8], and the DeepMind Control Suite [16]. These test environments cover most essential tasks in the RL community.

Experiment Utilities. Baconian provides many utilities to reduce users' effort on experiment set-up, hyper-parameter tuning, logging, result visualization, and algorithm diagnosis. We provide TensorFlow integration to support neural network building, training, and management. As hyper-parameters play a critical role in RL experiments, we provide a user-friendly parameter management utility to remove the tedious work of setting, loading, and saving these hyper-parameters.

Open-source Support. Baconian provides detailed user guides and API references¹, so users can get started with Baconian easily and conduct novel MBRL research upon it. We also release some preliminary benchmark results in the codebase.

3 DESIGN AND IMPLEMENTATION
Baconian consists of three major components, namely, the Experiment Manager, the Training Engine, and the Monitor. The system overview of Baconian is shown in Fig. 2. Various design patterns are applied to decouple the complicated dependencies across different modules and to enable easy extension of and programming over the framework.

3.1 Experiment Manager
The Experiment Manager consists of the Experiment Settings, Status Collector, and Experiment Recorder. The Experiment Settings module manages the creation and initialization of each module. The Status Collector gathers status information across different modules to compose a globally shared status, which can be used for learning rate decay, exploration strategy scheduling, etc. The Experiment Recorder records the information generated during the experiment, such as losses and rewards; this information is handed to the Monitor for rendering or saving.

3.2 Training Engine
The Training Engine handles the training process of the MBRL algorithms. The novelty of the design lies in abstracting and encapsulating the training process as a Control Flow module, which controls the execution of the experiment based on the user's specifications, including the agent's sampling from the environment, policy model and dynamics model optimization, and testing. MBRL algorithms can be complicated [12, 15]. This abstraction decouples the tangled and complicated MBRL training process into independent tasks, which are further encapsulated as sub-modules of the Control Flow module to provide flexibility, as illustrated by the conceptual sketch at the end of this section.

3.3 Monitor
The Monitor is responsible for monitoring and recording the experiment as it proceeds. This includes recording the necessary logs, printing information/warnings/errors, and rendering the results.
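To make the Control Flow abstraction of Section 3.2 concrete, the following is a conceptual sketch, in plain Python rather than Baconian's actual classes, of how a Dyna-style training process decomposes into independent sampling, model-fitting, policy-optimization, and testing tasks. The objects and methods shown (agent, env, dynamics_model and their calls) are illustrative placeholders only.

# Conceptual sketch of a Control Flow for a Dyna-style MBRL loop.
# The agent/env/dynamics_model objects and their methods are placeholders,
# not Baconian APIs; they only illustrate how the training process splits
# into independent, replaceable sub-tasks.

def dyna_style_control_flow(agent, env, dynamics_model, num_steps=100):
    for step in range(num_steps):
        # 1. The agent samples transitions from the real environment.
        real_data = agent.sample(env, sample_count=1000)

        # 2. The parameterized dynamics model is fit on the real data.
        dynamics_model.train(real_data)

        # 3. Additional rollouts are sampled from the learned model and
        #    used, together with the real data, to optimize the policy.
        model_data = agent.sample(dynamics_model, sample_count=5000)
        agent.train(real_data + model_data)

        # 4. The current policy is periodically evaluated (testing).
        if step % 10 == 0:
            agent.test(env, num_episodes=5)

In Baconian, each of these numbered sub-tasks corresponds to a replaceable sub-module of the Control Flow, so a different MBRL algorithm can reuse the same skeleton while swapping individual sub-tasks.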
4 USAGE
This section presents the procedure to create an MBRL experiment with the essential modules in Baconian. The procedure is shown in Fig. 3². For high flexibility, most of the modules are customizable. Meanwhile, users can directly adopt the built-in benchmark modules or code if customization is unnecessary.

¹ The documentation can be found at https://baconian-public.readthedocs.io/en/latest/API.html.
² Due to the page limit, please see the documentation page https://baconian-public.readthedocs.io/en/latest/step_by_step.html for details on how to configure these modules.

First, the user should create an environment and an RL algorithm module with the necessary hyper-parameters configured, e.g., neural network size and learning rate. The algorithm module is usually composed of a policy module and a dynamics model module, depending on the algorithm. Then, the user needs to create an agent module by passing into it the algorithm module and, if needed, an exploration strategy module.

Second, the user should create a control flow module that defines how the experiment should proceed and its stopping conditions. This includes defining how many samples should be collected for training at each step, what condition indicates the completion of an experiment, etc. Some typical control flows have already been implemented in Baconian to meet users' needs.

Finally, an experiment module should be created by passing the agent, environment, and control flow modules into it, and then launched. After that, Baconian will handle the experiment running, monitoring, and result saving/logging.
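Putting these steps together, an experiment script has roughly the following shape. This is a minimal sketch for illustration only: the names used here (make_env, MLPDynamicsModel, DDPG, Dyna, Agent, TrainTestFlow, Experiment) and their arguments are assumptions standing in for the corresponding Baconian modules; consult the documentation and the examples in the codebase for the exact imports and signatures.

# Hypothetical end-to-end script following the procedure in Fig. 3.
# All names below are illustrative stand-ins for Baconian modules; the
# exact import paths and constructor arguments are documented in the
# online API reference.

# Step 1: environment plus an algorithm module whose hyper-parameters
# (network size, learning rate, ...) are configured at construction.
env = make_env('Pendulum-v0')
dynamics = MLPDynamicsModel(env_spec=env.env_spec, learning_rate=1e-3)
policy_algo = DDPG(env_spec=env.env_spec, learning_rate=1e-4)
algo = Dyna(env_spec=env.env_spec, model_free_algo=policy_algo,
            dynamics_model=dynamics)

# The agent wraps the algorithm and, if needed, an exploration strategy.
agent = Agent(env=env, algo=algo, exploration_strategy=None)

# Step 2: a control flow defining how many samples are collected per
# training step and when the experiment is considered complete.
flow = TrainTestFlow(train_samples_per_step=100, total_train_steps=10000)

# Step 3: assemble the experiment and launch it; Baconian then handles
# running, monitoring, and result saving/logging.
exp = Experiment(agent=agent, env=env, flow=flow)
exp.run()

Because the control flow and experiment modules are agnostic to the concrete algorithm and environment, switching to another algorithm (e.g., MPC or ME-TRPO) or another test environment only requires replacing the corresponding module.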
5 CONCLUSION
This paper presented a unified, reusable, and flexible framework, Baconian, for MBRL research. It can reduce users' effort to conduct MBRL experiments and to prototype new MBRL algorithms. In the future, we will implement more state-of-the-art MBRL algorithms and benchmark them on different tasks.

A LIST OF REQUIREMENTS
To present Baconian, we require a computer running Ubuntu 16.04/18.04 with Python 3.5 or higher installed.

REFERENCES
[1] Greg Brockman, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang, and Wojciech Zaremba. 2016. OpenAI Gym. arXiv:1606.01540.
[2] Itai Caspi, Gal Leibovich, Gal Novik, and Shadi Endrawis. 2017. Reinforcement Learning Coach. (Dec. 2017). https://doi.org/10.5281/zenodo.1134899
[3] Prafulla Dhariwal, Christopher Hesse, Oleg Klimov, Alex Nichol, Matthias Plappert, Alec Radford, John Schulman, Szymon Sidor, Yuhuai Wu, and Peter Zhokhov. 2017. OpenAI Baselines. https://github.com/openai/baselines.
[4] Yan Duan, Xi Chen, Rein Houthooft, John Schulman, and Pieter Abbeel. 2016. Benchmarking deep reinforcement learning for continuous control. In International Conference on Machine Learning. 1329–1338.
[5] C. Finn, M. Zhang, J. Fu, X. Tan, Z. McCarthy, E. Scharff, and S. Levine. 2016. Guided Policy Search Code Implementation. http://rll.berkeley.edu/gps. Software available from rll.berkeley.edu/gps.
[6] Jason Gauci, Edoardo Conti, Yitao Liang, Kittipat Virochsiri, Yuchen He, Zachary Kaden, Vivek Narayanan, and Xiaohui Ye. 2018. Horizon: Facebook's Open Source Applied Reinforcement Learning Platform. arXiv preprint arXiv:1811.00260.
[7] Ionel-Alexandru Hosu and Traian Rebedea. 2016. Playing Atari Games with Deep Reinforcement Learning and Human Checkpoint Replay. CoRR abs/1607.05077. http://arxiv.org/abs/1607.05077
[8] Oleg Klimov and John Schulman. 2017. Roboschool. https://github.com/openai/roboschool.
[9] Thanard Kurutach, Ignasi Clavera, Yan Duan, Aviv Tamar, and Pieter Abbeel. 2018. Model-Ensemble Trust-Region Policy Optimization. In 6th International Conference on Learning Representations (ICLR 2018). https://openreview.net/forum?id=SJJinbWRZ
[10] Sergey Levine, Chelsea Finn, Trevor Darrell, and Pieter Abbeel. 2016. End-to-End Training of Deep Visuomotor Policies. J. Mach. Learn. Res. 17 (2016), 39:1–39:40. http://jmlr.org/papers/v17/15-522.html
[11] Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra. 2016. Continuous control with deep reinforcement learning. In 4th International Conference on Learning Representations (ICLR 2016). http://arxiv.org/abs/1509.02971
[12] Anusha Nagabandi, Gregory Kahn, Ronald S. Fearing, and Sergey Levine. 2018. Neural network dynamics for model-based deep reinforcement learning with model-free fine-tuning. In 2018 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 7559–7566.
[13] Arthur George Richards. 2005. Robust constrained model predictive control. Ph.D. Dissertation. Massachusetts Institute of Technology.
[14] John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. 2017. Proximal Policy Optimization Algorithms. CoRR abs/1707.06347. http://arxiv.org/abs/1707.06347
[15] Richard S. Sutton. 1991. Dyna, an integrated architecture for learning, planning, and reacting. ACM SIGART Bulletin 2, 4 (1991), 160–163.
[16] Yuval Tassa, Yotam Doron, Alistair Muldal, Tom Erez, Yazhe Li, Diego de Las Casas, David Budden, Abbas Abdolmaleki, Josh Merel, Andrew Lefrancq, Timothy Lillicrap, and Martin Riedmiller. 2018. DeepMind Control Suite. Technical Report. DeepMind. https://arxiv.org/abs/1801.00690
[17] Yuval Tassa, Tom Erez, and Emanuel Todorov. 2012. Synthesis and stabilization of complex behaviors through online trajectory optimization. In 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). 4906–4913. https://doi.org/10.1109/IROS.2012.6386025
[18] Tingwu Wang, Xuchan Bao, Ignasi Clavera, Jerrick Hoang, Yeming Wen, Eric Langlois, Shunshi Zhang, Guodong Zhang, Pieter Abbeel, and Jimmy Ba. 2019. Benchmarking model-based reinforcement learning. arXiv preprint arXiv:1907.02057.