Learning Preference Models for Autonomous Mobile Robots in Complex Domains
David Silver
CMU-RI-TR-10-41
December 2010

Robotics Institute
Carnegie Mellon University
Pittsburgh, Pennsylvania 15213

Thesis Committee:
Tony Stentz, Chair
Drew Bagnell
David Wettergreen
Larry Matthies

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Robotics.

Copyright © 2010. All rights reserved.

Abstract

Achieving robust and reliable autonomous operation, even in complex unstructured environments, is a central goal of field robotics. As the environments and scenarios to which robots are applied have continued to grow in complexity, so has the challenge of properly defining preferences and tradeoffs between the various actions a robot may take and the terrains those actions traverse. These definitions and parameters encode the desired behavior of the robot; their correctness is therefore of the utmost importance. Current manual approaches to creating and adjusting these preference models and cost functions have proven tedious and time-consuming, and they rarely produce optimal results outside of the simplest circumstances.

This thesis presents the development and application of machine learning techniques that automate the construction and tuning of preference models within complex mobile robotic systems. Within the framework of inverse optimal control, expert examples of robot behavior are used to construct models that generalize demonstrated preferences and reproduce similar behavior. Novel learning from demonstration approaches are developed that can significantly reduce the amount of human interaction necessary to tune a system while also improving its final performance. Techniques that account for the inevitability of noisy and imperfect demonstration are presented, along with additional methods for improving the efficiency of expert demonstration and feedback.

The effectiveness of these approaches is confirmed through application to several real world domains, including the interpretation of static and dynamic perceptual data in unstructured environments and the learning of human driving styles and maneuver preferences. Extensive testing and experimentation, both in simulation and in the field with multiple mobile robotic systems, provides empirical confirmation of superior autonomous performance with less expert interaction and no hand tuning. These experiments validate the potential applicability of the developed algorithms to a large variety of future mobile robotic systems.

Keywords: Mobile Robots, Field Robotics, Learning from Demonstration, Imitation Learning, Inverse Optimal Control, Active Learning, Preference Models, Cost Functions, Parameter Tuning

Contents

1 Introduction
  1.1 Mobile Robotic Systems
  1.2 Real World Challenges of Robotic Systems
  1.3 Real World Requirements of Autonomous Systems
2 Related Work in Autonomous Mobile Robotic Systems
  2.1 Mobile Systems in Semi-Structured Environments
  2.2 Mobile Systems in Unstructured Environments
  2.3 Recent Trends in Mobile Robotic Systems
3 Behavioral Preference Models in Mobile Robotic Systems
  3.1 Continuous Preferences over Behaviors
  3.2 Manually Constructing Behavioral Preference Models
  3.3 Problem Statement
4 Automating Construction of Preference Models
  4.1 Expert Supervised Learning and Classification
  4.2 Self-Supervised Learning from Experience
  4.3 Supervision through Explicit Reinforcement
  4.4 Supervision through Expert Demonstration
5 Learning Terrain Preferences from Expert Demonstration
  5.1 Constrained Optimization from Expert Demonstration
  5.2 Extension to Dynamic and Unknown Environments
  5.3 Imperfect and Inconsistent Demonstration
  5.4 Application to Mobile Robotic Systems
  5.5 Experimental Results
6 Learning Action Preferences from Expert Demonstration
  6.1 Extending LEARCH to State-Action Pairs
  6.2 Learning Planner Preference Models
  6.3 Application to Mobile Robotic Systems
  6.4 Experimental Results
7 Stable, Consistent, and Efficient Learning from Expert Demonstration
  7.1 Different Forms of Expert Demonstration and Feedback
  7.2 Active Learning for Example Selection
  7.3 Experimental Results
8 Conclusion
  8.1 Contributions
  8.2 Future Work
Acknowledgements
Bibliography

List of Figures

1.1 Deployed Systems
1.2 Experimental Systems
3.1 Preference Models and Robot Behavior
3.2 Backpropagation Signal Degradation
4.1 Near-To-Far Learning
5.1 LEARCH Example
5.2 LEARCH Generalization
5.3 Unachievable Example Paths
5.4 Replanning with Corridor Constraints
5.5 Plan/Behavior Mismatch
5.6 Boosted Features
5.7 Constrained Planners
5.8 Crusher
5.9 Crusher Data Flow
5.10 Cost Ratio
5.11 Corridor Size
5.12 Static Learned vs. Hand Tuned
5.13 Hand Tuned vs. Learned Prior Costmaps
5.14 Static Learned vs. Hand Tuned
5.15 Crusher Perception Example
5.16 Offline Experiments on Logged Perceptual Data
5.17 Filtering
5.18 Crusher Perception Comparison
6.1 Driving Styles
6.2 Cost-to-go and Heading
6.3 Motion Planner Action Set
6.4 Receding Horizon Effects
6.5 Learned Planner Preference Models
6.6 Learned Planner Preference Models
6.7 Learned Planner Preference Models
6.8 Effects of Slack Re-scaling
6.9 Planning Results
6.10 PD-LEARCH vs. DR-LEARCH
6.11 Planning Results
6.12 Planner Hand Tuning
6.13 Best vs. Final Performance
6.14 E-Gator
6.15 E-Gator Learning
6.16 E-Gator Perception Costs
6.17 E-Gator Hand Tuning
6.18 Best vs. Final Performance
7.1 Relative Examples
7.2 Acceptable Examples
7.3 Dynamic Range
7.4 Test Site
7.5 Redundancy of Expert Demonstration
7.6 Relative Examples
7.7 Lethal Examples
7.8 Acceptable Examples
7.9 Novelty Based Active Learning
7.10 Density Estimation
7.11 Uncertainty Based Active Learning
7.12 Active Learning From Scratch
7.13 Active Learning

List of Tables

5.1 Online Performance of Learned vs. Hand Tuned Prior Costmaps
5.2 Online Perception Comparison Results
6.1 Online Perception and Planning Comparison Results
8.1 Summary of Experiments

List of Algorithms

1 The linear MMP algorithm
2 The LEARCH algorithm
3 The Dynamic LEARCH algorithm
4 The Dynamic Robust LEARCH algorithm
5 The linear LEARCH algorithm with a feature learning phase
6 D-LEARCH for planner preference models with known perception cost
7 PD-LEARCH for planner preference models with known perception cost
8 Learning perception and planner preference models
9 Novelty Based Active Learning
10 Uncertainty Based Active Learning

Chapter 1

Introduction

The field of robotics is poised to have a dramatic impact on many industrial, scientific, and military domains. Already, complex robotic systems are moving pallets in warehouses, dismantling bombs or finding mines in volatile regions, and searching for water on Mars. These applications demonstrate