Autonomous Motion Learning for Near Optimal Control
AUTONOMOUS MOTION LEARNING FOR NEAR OPTIMAL CONTROL

Dissertation
Submitted to
The School of Engineering of the
UNIVERSITY OF DAYTON

In Partial Fulfillment of the Requirements for
The Degree of
Doctor of Philosophy in Electrical Engineering

By
Alan Lance Jennings
Dayton, Ohio
August, 2012

AUTONOMOUS MOTION LEARNING FOR NEAR OPTIMAL CONTROL

Name: Jennings, Alan Lance

APPROVED BY:

Raúl Ordóñez, Ph.D.
Advisor, Committee Chairman
Associate Professor, Electrical and Computer Engineering

Frederick G. Harmon, Ph.D., Lt Col
Committee Member
Assistant Professor, Dept of Aeronautics and Astronautics

Eric Balster, Ph.D.
Committee Member
Assistant Professor, Electrical and Computer Engineering

Andrew Murray, Ph.D.
Committee Member
Associate Professor, Mechanical and Aerospace Engineering

John G. Weber, Ph.D.
Associate Dean, School of Engineering

Tony E. Saliba, Ph.D.
Dean, School of Engineering & Wilke Distinguished Professor

© Copyright by Alan Lance Jennings
All rights reserved
2012

ABSTRACT

AUTONOMOUS MOTION LEARNING FOR NEAR OPTIMAL CONTROL

Name: Jennings, Alan Lance
University of Dayton
Advisor: Dr. Raúl Ordóñez

Human intelligence has appealed to the robotics community for a long time; specifically, a person's ability to learn new tasks efficiently and eventually master them. This ability is the result of decades of development as a person matures from an infant into an adult, and a similar developmental period seems to be required if robots are to obtain the ability to learn and master new skills. Applying developmental stages to robotics is a field of study that has been growing in acceptance. The paradigm shift is from directly pursuing the desired task to progressively building competencies until the desired task is reached. This dissertation seeks to apply a developmental approach to autonomous optimization of robotic motions, and the methods presented extend to function shaping and parameter optimization.

Humans have a limited ability to concentrate on multiple tasks at once. For robots with many degrees of freedom, human operators need a high-level interface rather than controlling the position of each angle. Motion primitives are scalable control signals that have repeatable, high-level results. Examples include walking, jumping or throwing, where the result can be scaled in terms of speed, height or distance. Traditionally, motion primitives require extensive, robot-specific analysis, making development of large databases of primitives infeasible. This dissertation presents methods of autonomously creating and refining optimal inverse functions for use as motion primitives. By clustering contiguous local optima, a continuous inverse function can be created by interpolating results. The additional clusters serve as alternatives if the chosen cluster is poorly suited to the situation. For multimodal problems, a population-based optimization can efficiently search a large space.

Staged learning offers a path to mimic the progression from novice to master seen in human learning. The dimension of the input wave parameterization, which is the number of degrees of freedom for optimization, is incremented to allow for additional improvement. As the parameterization increases in order, the true optimal continuous-time control signal is approached. If a proper parameterization is selected, all previous experiences can be directly moved to the higher parameterization when it is expanded.
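To make the last point concrete, the following is a minimal sketch, not taken from the dissertation, assuming one possible "proper parameterization": a nested truncated cosine series, where moving to a higher order simply appends a zero coefficient, so the represented control signal, and therefore all prior experience, is preserved.

import numpy as np

# Hypothetical illustration: parameterize a control signal u(t) on [0, T]
# as a truncated cosine series.  With a nested basis like this, a point in
# the n-dimensional parameter space maps into the (n+1)-dimensional space
# by appending a zero coefficient, leaving the signal (and its cost) unchanged.

def control_signal(coeffs, t, T=1.0):
    """Evaluate u(t) = sum_k c_k * cos(k*pi*t/T) for the given coefficients."""
    k = np.arange(len(coeffs))
    return np.cos(np.outer(t, k) * np.pi / T) @ coeffs

def lift(coeffs, extra=1):
    """Move a low-order parameterization into a higher-order one."""
    return np.concatenate([coeffs, np.zeros(extra)])

t = np.linspace(0.0, 1.0, 201)
low = np.array([0.8, -0.3, 0.1])   # best point found at order 3 (made-up values)
high = lift(low)                   # same signal, now expressed at order 4
assert np.allclose(control_signal(low, t), control_signal(high, t))

Any nested basis with this embedding property would serve the same purpose; the cosine series here is only one choice.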
Incrementally increasing complexity while retaining experience makes optimization in high dimensions efficient, in contrast with undirected global optimizations, which would need to search the entire high-dimensional space. The method presented allows for unbounded resolution since the parameterization is not fixed at programming time. This dissertation presents several methods that make steps towards the goal of learning and mastering motion-related tasks without programmed, task-specific heuristics. Trajectory optimization based on a high-level system description has been demonstrated for a robotic arm performing a pick-place task. In addition, the inverse optimal function was applied to optimizing robotic tracking precision in a method suitable for online tracking. Staging of the learning is able to determine an optimal motor spin-up waveform despite large variations in system parameters. Global optimization, using a population-based search, and unbounded resolution increase provide the foundation for autonomously developing scalable motions superior to what can be designed by hand.

DEDICATION

And the Lord God doth work by means to bring about his great and eternal purposes; and by very small means the Lord doth confound the wise and bringeth about the salvation of many souls. (Alma 37:7)

That which is of God is light; and he that receiveth light, and continueth in God, receiveth more light; and that light groweth brighter and brighter until the perfect day. (Doctrine and Covenants 50:24)

And Jesus increased in wisdom and stature, and in favour with God and man. (Luke 2:52)

ACKNOWLEDGMENTS

My sincere thanks goes to the Dayton Area Graduate Studies Institute (DAGSI) and the Ohio Space Grant Consortium (OSGC) for their generous support and for recruiting me to the Dayton area.

TABLE OF CONTENTS

ABSTRACT
DEDICATION
ACKNOWLEDGMENTS
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES

CHAPTERS:

I. INTRODUCTION
II. PROBLEM STATEMENT
III. DEVELOPMENTAL LEARNING
    3.1 The Basis and Need for Developmental Learning
    3.2 Nature of Developmental Learning
    3.3 The Categorization Problem
    3.4 Sensorimotor Control
    3.5 Mental and Social Learning
    3.6 The Progress and Potential of the Developmental Learning Paradigm
    3.7 Developmental Structures
    3.8 Directed Learning
    3.9 Function Approximators
IV. NUMERICAL PATH OPTIMIZATION
    4.1 Introduction
    4.2 Problem Statement
    4.3 Method
    4.4 Optimality Validation
        4.4.1 Setting up the problem
        4.4.2 Path evaluation
        4.4.3 Optimality conditions
    4.5 Practical Application: Industrial Robot
    4.6 Conclusion
V. INVERSE FUNCTION CLUSTERING
    5.1 Introduction
    5.2 Problem Statement
    5.3 Algorithm
        5.3.1 Optimization
        5.3.2 Execution and cluster organization
    5.4 Results
        5.4.1 2-D test problems
        5.4.2 Robotic arm control
    5.5 Conclusion
    5.6 Addendum
        5.6.1 Convergence proof
VI. INCREASING RESOLUTION
    6.1 Introduction
    6.2 Problem Statement
    6.3 Methodology
    6.4 Test Problems
    6.5 Physical Problem
    6.6 Conclusions
VII. CONCLUSION
    7.1 Focused Problem Statements
        7.1.1 How does directed learning fit in a global or local search context?
        7.1.2 How can motions be optimized based on a high-level rigid body representation?
        7.1.3 How can operation of a multiple-input, single-output system be simplified when considering tracking and optimization?
        7.1.4 How can equivalent but fundamentally different motions be organized?
        7.1.5 How should motion primitives be represented to facilitate incremental increases in the number of control parameters?
        7.1.6 What are the limitations fundamental to staged learning?
    7.2 Considerations

BIBLIOGRAPHY

Appendices:

A. PROGRAM EXAMPLES
    A.1 Pendulum Control Optimization
    A.2 Robotic Arm Pick-Place Optimization
    A.3 Using Continuous Optimal Inverse Functions for Optimizing Precision
    A.4 Testing of Continuous Autonomous Learning

LIST OF FIGURES

Figure 1: The pendulum is either inverted or suspended by changing the direction of gravity with respect to θ = 0. The solid model used is shown on the right. It consists of the pendulum and a small piece serving as the pin of rotation.

Figure 2: The SimMechanics program solves the kinetic problem using relations of forces and rigid bodies.

Figure 3: For a suspended pendulum, the cost of all methods is comparable, suggesting that the results are reasonable. However, due to saturation and geometric nonlinearities, the linear-based methods fail to reach the final state for some angles, while DIDO gives consistently good results.

Figure 4: For an inverted pendulum, the LQR gives a slightly higher cost than DIDO. The unstable nature of this system makes the results of the LQ controller irrelevant due to the large final error. The cost peaks at 120 degrees because of the large initial torque required as the pendulum is extended sideways.

Figure 5: For a suspended pendulum beginning at a small angle, the results from numerical optimization are compared to results using an LQ and an LQR controller. With this weak nonlinearity, all controllers reach the desired state. The path-optimal, LQ and DIDO controllers exploit the self-righting nature of the pendulum. The LQR controller is a feedback controller, and the large initial error has a corresponding spike in control.
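For reference on the LQ/LQR baselines mentioned in the figure captions above, the following is a minimal sketch, not part of the dissertation, of computing a linear-quadratic regulator gain for a pendulum linearized about the upright equilibrium; the physical parameters and weighting matrices are assumed for illustration only.

import numpy as np
from scipy.linalg import solve_continuous_are

# Hypothetical setup: unit mass and length, state x = [theta, theta_dot],
# theta measured from the inverted (upright) equilibrium.
g, l = 9.81, 1.0
A = np.array([[0.0, 1.0],
              [g / l, 0.0]])      # linearized pendulum dynamics about upright
B = np.array([[0.0],
              [1.0]])             # torque input
Q = np.diag([10.0, 1.0])          # state weights (assumed)
R = np.array([[0.1]])             # control weight (assumed)

P = solve_continuous_are(A, B, Q, R)   # solve the algebraic Riccati equation
K = np.linalg.solve(R, B.T @ P)        # optimal feedback gain, u = -K x
print("LQR gain:", K)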