Learning Optimal Parameters for Coordinated Helicopter Turns
Recommended publications
Learning Robust Rewards with Adversarial Inverse Reinforcement Learning
Published as a conference paper at ICLR 2018

Justin Fu, Katie Luo, Sergey Levine
Department of Electrical Engineering and Computer Science, University of California, Berkeley, Berkeley, CA 94720, USA
[email protected], [email protected], [email protected]

ABSTRACT
Reinforcement learning provides a powerful and general framework for decision making and control, but its application in practice is often hindered by the need for extensive feature and reward engineering. Deep reinforcement learning methods can remove the need for explicit engineering of policy or value features, but still require a manually specified reward function. Inverse reinforcement learning holds the promise of automatic reward acquisition, but has proven exceptionally difficult to apply to large, high-dimensional problems with unknown dynamics. In this work, we propose AIRL, a practical and scalable inverse reinforcement learning algorithm based on an adversarial reward learning formulation. We demonstrate that AIRL is able to recover reward functions that are robust to changes in dynamics, enabling us to learn policies even under significant variation in the environment seen during training. Our experiments show that AIRL greatly outperforms prior methods in these transfer settings.

1 INTRODUCTION
While reinforcement learning (RL) provides a powerful framework for automating decision making and control, significant engineering of elements such as features and reward functions has typically been required for good practical performance. In recent years, deep reinforcement learning has alleviated the need for feature engineering for policies and value functions, and has shown promising results on a range of complex tasks, from vision-based robotic control (Levine et al., 2016) to video games such as Atari (Mnih et al., 2015) and Minecraft (Oh et al., 2016).
Reinforcement Learning Lecture Inverse Reinforcement Learning
Reinforcement Learning: Inverse Reinforcement Learning
Inverse RL, behaviour cloning, apprenticeship learning, imitation learning.
Vien Ngo, MLR, University of Stuttgart

Outline
• Introduction to Inverse RL
• Inverse RL vs. behavioral cloning
• IRL algorithms
(Inspired from a lecture from Pieter Abbeel.)

Inverse RL: Informal Definition
• Given: measurements of an agent's behaviour π over time, $(s_t, a_t, s'_t)$, in different circumstances. If possible, given the transition model (not given the reward function).
• Goal: find the reward function $R^{\pi}(s, a, s')$.

Inverse Reinforcement Learning
Advanced Robotics
Vien Ngo, Machine Learning & Robotics lab, University of Stuttgart, Universitätsstraße 38, 70569 Stuttgart, Germany
June 17, 2014

1 A Small Maze Domain
Given a 2 × 4 maze as in the figure. Given an MDP\R = {S, A, P}, where S is a state space consisting of 8 states; A is an action space consisting of four movement actions {move-left, move-right, move-down, move-up}; the optimal policy $\pi^*$ (arrows in figure); and the transition function $T(s, a, s') = P(s' \mid s, a)$.
• Given γ = 0.95, compute $(I - \gamma P^{\pi^*})^{-1}$.
• Then, write the full formulation of the LP problem in slide # 12.
• Bonus: use the fminimax function in Matlab (or another LP library) to find $R^*$.
Footnote 1: inspired from a poster of Boularias, Kober, Peters.

Motivation: Two Sources
• Construction of an intelligent agent in a particular domain: car driver, helicopter (Ng et al), ... (imitation learning, apprenticeship learning)
• The potential use of RL/related methods as computational model for animal and human learning: bee foraging (Montague et al 1995), song-bird vocalization (Doya & Sejnowski 1995), ...
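The first maze exercise, computing the inverse of I minus γ times the policy's transition matrix, can be sketched numerically. This is only an illustration under stated assumptions, not the lecture's solution: the figure is not reproduced here, so the goal cell (top-right), the policy (move right, then up), deterministic moves, and an absorbing goal are all assumed, and the names `policy_transition_matrix`, `P_pi` and `M` are hypothetical.

```python
import numpy as np

GAMMA = 0.95
ROWS, COLS = 2, 4
GOAL = (0, 3)  # assumption: the goal is the top-right cell


def state_index(row, col):
    """Map a (row, col) cell to a state index in 0..7."""
    return row * COLS + col


def policy_transition_matrix():
    """Build P_pi[s, s'] for an assumed deterministic policy on the 2 x 4 maze."""
    n = ROWS * COLS
    P = np.zeros((n, n))
    for row in range(ROWS):
        for col in range(COLS):
            if (row, col) == GOAL:
                nxt = (row, col)        # assumption: the goal is absorbing
            elif col < COLS - 1:
                nxt = (row, col + 1)    # move-right along the row
            else:
                nxt = (row - 1, col)    # move-up in the last column
            P[state_index(row, col), state_index(*nxt)] = 1.0
    return P


P_pi = policy_transition_matrix()
# The matrix requested in the exercise: (I - gamma * P_pi)^(-1).
M = np.linalg.inv(np.eye(ROWS * COLS) - GAMMA * P_pi)
print(np.round(M, 3))
```

Because every row of `P_pi` sums to 1, every row of `M` sums to 1 / (1 − γ) = 20, which is a quick sanity check on the result. Multiplying `M` by a reward vector over states would give the value of the assumed policy for that reward.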
Air Dominance Through Machine Learning: a Preliminary Exploration of Artificial Intelligence–Assisted Mission Planning
RAND Corporation

Li Ang Zhang, Jia Xu, Dara Gold, Jeff Hagen, Ajay K. Kochhar, Andrew J. Lohn, Osonde A. Osoba

For more information on this publication, visit www.rand.org/t/RR4311. Library of Congress Cataloging-in-Publication Data is available for this publication. ISBN: 978-1-9774-0515-9. Published by the RAND Corporation, Santa Monica, Calif. © Copyright 2020 RAND Corporation. R® is a registered trademark.

Limited Print and Electronic Distribution Rights
This document and trademark(s) contained herein are protected by law. This representation of RAND intellectual property is provided for noncommercial use only. Unauthorized posting of this publication online is prohibited. Permission is given to duplicate this document for personal use only, as long as it is unaltered and complete. Permission is required from RAND to reproduce, or reuse in another form, any of its research documents for commercial use. For information on reprint and linking permissions, please visit www.rand.org/pubs/permissions.

The RAND Corporation is a research organization that develops solutions to public policy challenges to help make communities throughout the world safer and more secure, healthier and more prosperous. RAND is nonprofit, nonpartisan, and committed to the public interest. RAND's publications do not necessarily reflect the opinions of its research clients and sponsors.

Support RAND: make a tax-deductible charitable contribution at www.rand.org/giving/contribute
www.rand.org

Preface
This project prototypes a proof of concept artificial intelligence (AI) system to help develop and evaluate new concepts of operations in the air domain.
Occam's Razor Is Insufficient to Infer The
Occam's razor is insufficient to infer the preferences of irrational agents

Stuart Armstrong (Future of Humanity Institute, University of Oxford), Sören Mindermann (Vector Institute, University of Toronto)
[email protected] [email protected]

Abstract
Inverse reinforcement learning (IRL) attempts to infer human rewards or preferences from observed behavior. Since human planning systematically deviates from rationality, several approaches have been tried to account for specific human shortcomings. However, the general problem of inferring the reward function of an agent of unknown rationality has received little attention. Unlike the well-known ambiguity problems in IRL, this one is practically relevant but cannot be resolved by observing the agent's policy in enough environments. This paper shows (1) that a No Free Lunch result implies it is impossible to uniquely decompose a policy into a planning algorithm and reward function, and (2) that even with a reasonable simplicity prior/Occam's razor on the set of decompositions, we cannot distinguish between the true decomposition and others that lead to high regret. To address this, we need simple 'normative' assumptions, which cannot be deduced exclusively from observations.

1 Introduction
In today's reinforcement learning systems, a simple reward function is often hand-crafted, and still sometimes leads to undesired behaviors on the part of RL agent, as the reward function is not well aligned with the operator's true goals. As AI systems become more powerful and autonomous, these failures will become more frequent and grave as RL agents exceed human performance, operate at time-scales that forbid constant oversight, and are given increasingly complex tasks, from driving cars to planning cities to eventually evaluating policies or helping run companies.
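The non-uniqueness claim in point (1) can be made concrete with a toy case. The sketch below is an illustration only and is not taken from the paper: it assumes a one-step setting with a hypothetical reward table `reward[s, a]`, and compares a planner that maximises a reward with one that minimises the negated reward. Both (planner, reward) pairs produce the same policy, so observing the policy cannot tell them apart.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical reward table R[s, a] for 5 states and 3 actions.
reward = rng.normal(size=(5, 3))


def rational_planner(r):
    """Pick the action with the highest reward in each state."""
    return np.argmax(r, axis=1)


def anti_rational_planner(r):
    """Pick the action with the lowest reward in each state."""
    return np.argmin(r, axis=1)


# Two different decompositions of one and the same policy.
policy_a = rational_planner(reward)
policy_b = anti_rational_planner(-reward)

print(policy_a, policy_b)
```

The two decompositions disagree completely about what the agent wants (`reward` versus `-reward`) while agreeing on every observed action.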
Generative Adversarial Imitation Learning
Under review as a conference paper at ICLR 2021

LEARNING FROM DEMONSTRATIONS WITH ENERGY BASED GENERATIVE ADVERSARIAL IMITATION LEARNING

Anonymous authors
Paper under double-blind review

ABSTRACT
Traditional reinforcement learning methods usually deal with the tasks with explicit reward signals. However, for vast majority of cases, the environment wouldn't feedback a reward signal immediately. It turns out to be a bottleneck for modern reinforcement learning approaches to be applied into more realistic scenarios. Recently, inverse reinforcement learning (IRL) has made great progress in making full use of the expert demonstrations to recover the reward signal for reinforcement learning. And generative adversarial imitation learning is one promising approach. In this paper, we propose a new architecture for training generative adversarial imitation learning which is so called energy based generative adversarial imitation learning (EB-GAIL). It views the discriminator as an energy function that attributes low energies to the regions near the expert demonstrations and high energies to other regions. Therefore, a generator can be seen as a reinforcement learning procedure to sample trajectories with minimal energies (cost), while the discriminator is trained to assign high energies to these generated trajectories. In detail, EB-GAIL uses an auto-encoder architecture in place of the discriminator, with the energy being the reconstruction error. Theoretical analysis shows our EB-GAIL could match the occupancy measure with expert policy during the training process. Meanwhile, the experiments depict that EB-GAIL outperforms other SoTA methods while the training process for EB-GAIL can be more stable.

1 INTRODUCTION
Motivated by applying reinforcement learning algorithms into more realistic tasks, we find that most realistic environments cannot feed an explicit reward signal back to the agent immediately.
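The idea of using reconstruction error as an energy can be illustrated with a very small stand-in. The sketch below is not the EB-GAIL architecture: it swaps the trained auto-encoder for a closed-form linear one (PCA via SVD), uses synthetic 2-D points in place of expert state-action pairs, and omits the adversarial training loop entirely. The names `fit_linear_autoencoder` and `energy` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical expert data: points close to the line y = 2x.
t = rng.uniform(-1.0, 1.0, size=(200, 1))
expert = np.hstack([t, 2.0 * t]) + 0.01 * rng.normal(size=(200, 2))

# Hypothetical generator samples: spread over the square, mostly off that line.
generated = rng.uniform(-1.0, 1.0, size=(200, 2))


def fit_linear_autoencoder(data, code_dim):
    """Fit a linear auto-encoder in closed form (PCA); returns mean and basis."""
    mean = data.mean(axis=0)
    _, _, vt = np.linalg.svd(data - mean, full_matrices=False)
    return mean, vt[:code_dim]


def energy(x, mean, basis):
    """Energy = squared reconstruction error of the auto-encoder."""
    code = (x - mean) @ basis.T       # encode
    recon = code @ basis + mean       # decode
    return np.sum((x - recon) ** 2, axis=1)


mean, basis = fit_linear_autoencoder(expert, code_dim=1)
expert_energy = energy(expert, mean, basis)
generated_energy = energy(generated, mean, basis)

print(expert_energy.mean(), generated_energy.mean())
```

Samples near the expert data reconstruct well and receive low energy, while samples away from it receive high energy. In the setting the abstract describes, that energy would play the role of the cost the generator tries to minimise.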