AUTONOMOUS MOTION LEARNING FOR NEAR OPTIMAL CONTROL
Dissertation
Submitted to
The School of Engineering of the
UNIVERSITY OF DAYTON
In Partial Fulfillment of the Requirements for
The Degree of
Doctor of Philosophy in Electrical Engineering
By
Alan Lance Jennings
Dayton, Ohio
August, 2012

AUTONOMOUS MOTION LEARNING FOR NEAR OPTIMAL CONTROL
Name: Jennings, Alan Lance
APPROVED BY:
Raúl Ordóñez, Ph.D.
Advisor, Committee Chairman
Associate Professor, Electrical and Computer Engineering

Frederick G. Harmon, Ph.D., Lt Col
Committee Member
Assistant Professor, Dept of Aeronautics and Astronautics

Eric Balster, Ph.D.
Committee Member
Assistant Professor, Electrical and Computer Engineering

Andrew Murray, Ph.D.
Committee Member
Associate Professor, Mechanical and Aerospace Engineering

John G. Weber, Ph.D.
Associate Dean
School of Engineering

Tony E. Saliba, Ph.D.
Dean, School of Engineering
& Wilke Distinguished Professor
© Copyright by
Alan Lance Jennings
All rights reserved
2012

ABSTRACT
AUTONOMOUS MOTION LEARNING FOR NEAR OPTIMAL CONTROL
Name: Jennings, Alan Lance
University of Dayton
Advisor: Dr. Raúl Ordóñez
Human intelligence has appealed to the robotics community for a long time; specifically, a person’s ability to learn new tasks efficiently and eventually master the task. This ability is the result of decades of development as a person matures from an infant to an adult and a similar developmental period seems to be required if robots are to obtain the ability to learn and master new skills. Applying developmental stages to robotics is a field of study that has been growing in acceptance. The paradigm shift is from directly pursuing the desired task to progressively building competencies until the desired task is reached. This dissertation seeks to apply a developmental approach to autonomous optimization of robotic motions, and the methods presented extend to function shaping and parameter optimization.
Humans have a limited ability to concentrate on multiple tasks at once. For robots with many degrees of freedom, human operators need a high-level interface, rather than controlling the positions of each angle. Motion primitives are scalable control signals that have repeatable, high-level results. Examples include walking, jumping or throwing where the result can be scaled in terms of speed, height or distance. Traditionally, motion primitives require extensive, robot-specific analysis making development of large databases of primitives infeasible. This dissertation presents methods of autonomously creating and refining optimal inverse functions for use as motion primitives. By clustering contiguous local optima, a continuous inverse function can be created by interpolating results. The additional clusters serve as alternatives if the chosen cluster is poorly suited to the situation. For multimodal problems, a population based optimization can efficiently search a large space.
Staged learning offers a path to mimic the progression from novice to master, as seen in human learning.
The dimension of the input wave parameterization, which is the number of degrees of freedom for optimization, is incremented to allow for additional improvement. As the parameterization increases in order, the true optimal continuous-time control signal is approached. If a proper parameterization is selected, all previous experience can be carried over directly when the parameterization is expanded.
Incrementally increasing complexity and retaining experience allows efficient optimization in high dimensions when contrasted with undirected global optimizations, which would need to search the entire high-dimensional space. The method presented allows for unbounded resolution since the parameterization is not fixed at programming.
This dissertation presents several methods that make steps towards the goal of learning and mastering motion-related tasks without programmed, task-specific heuristics. Trajectory optimization based on a high-level system description has been demonstrated for a robotic arm performing a pick-place task. In addition, the inverse optimal function was applied to optimizing robotic tracking precision in a method suitable for online tracking. Staging of the learning is able to determine an optimal motor spin-up waveform despite large variations in system parameters. Global optimization, using a population-based search, and unbounded resolution increases provide the foundation for autonomously developing scalable motions superior to what can be designed by hand.
And the Lord God doth work by means to bring about his great and eternal purposes; and by very small means the Lord doth confound the wise and bringeth about the salvation of many souls.
Alma 37:7
That which is of God is light; and he that receiveth light, and continueth in God, receiveth more light; and that light groweth brighter and brighter until the perfect day.
Doctrine and Covenants 50:24
And Jesus increased in wisdom and stature, and in favour with God and man.
Luke 2:52
ACKNOWLEDGMENTS
My sincere thanks go to the Dayton Area Graduate Studies Institute (DAGSI) and the Ohio Space Grant Consortium (OSGC) for their generous support and for recruiting me to the Dayton area.
TABLE OF CONTENTS
ABSTRACT
DEDICATION
ACKNOWLEDGMENTS
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
CHAPTERS:
I. INTRODUCTION
II. PROBLEM STATEMENT
III. DEVELOPMENTAL LEARNING
    3.1 The Basis and Need for Developmental Learning
    3.2 Nature of Developmental Learning
    3.3 The Categorization Problem
    3.4 Sensorimotor Control
    3.5 Mental and Social Learning
    3.6 The Progress and Potential of the Developmental Learning Paradigm
    3.7 Developmental Structures
    3.8 Directed Learning
    3.9 Function Approximators
IV. NUMERICAL PATH OPTIMIZATION
    4.1 Introduction
    4.2 Problem Statement
    4.3 Method
    4.4 Optimality Validation
        4.4.1 Setting up the problem
        4.4.2 Path evaluation
        4.4.3 Optimality conditions
    4.5 Practical Application: Industrial Robot
    4.6 Conclusion
V. INVERSE FUNCTION CLUSTERING
    5.1 Introduction
    5.2 Problem Statement
    5.3 Algorithm
        5.3.1 Optimization
        5.3.2 Execution and cluster organization
    5.4 Results
        5.4.1 2-D test problems
        5.4.2 Robotic arm control
    5.5 Conclusion
    5.6 Addendum
        5.6.1 Convergence proof
VI. INCREASING RESOLUTION
    6.1 Introduction
    6.2 Problem Statement
    6.3 Methodology
    6.4 Test Problems
    6.5 Physical Problem
    6.6 Conclusions
VII. CONCLUSION
    7.1 Focused Problem Statements
        7.1.1 How does directed learning fit in a global or local search context?
        7.1.2 How can motions be optimized based on a high-level rigid body representation?
        7.1.3 How can operation of a multiple-input, single-output system be simplified when considering tracking and optimization?
        7.1.4 How can equivalent, but fundamentally different motions be organized?
        7.1.5 How should motion primitives be represented to facilitate incremental increases in the number of control parameters?
        7.1.6 What are the limitations fundamental to staged learning?
    7.2 Considerations

BIBLIOGRAPHY
Appendices:
A. PROGRAM EXAMPLES
    A.1 Pendulum Control Optimization
    A.2 Robotic Arm Pick-Place Optimization
    A.3 Using Continuous Optimal Inverse Functions for Optimizing Precision
    A.4 Testing of Continuous Autonomous Learning
LIST OF FIGURES
1 The pendulum is either inverted or suspended by changing the direction of gravity with respect to θ = 0. The solid model used is shown on the right. It consists of the pendulum and a small piece serving as the pin of rotation.
2 The SimMechanics program solves the kinetic problem using relations of forces and rigid bodies.
3 For a suspended pendulum, the cost of all methods is comparable, suggesting that results are reasonable. However, due to saturation and geometric nonlinearities, the linear based methods fail to reach the final state for some angles, while DIDO gives consistently good results.
4 For an inverted pendulum, the LQR gives a slightly higher cost than DIDO. The unstable nature of this system makes the results of the LQ controller irrelevant due to the large final error. The cost peaks at 120 degrees due to the large initial torque required as the pendulum is extended sideways.
5 For a suspended pendulum beginning at a small angle, the results from numerical optimization are compared to results using an LQ and LQR controller. With this weak nonlinearity, all controllers reach the desired state. The path-optimal, LQ and DIDO controllers exploit the self-righting nature of the pendulum. The LQR controller is a feedback controller and the large initial error has a corresponding spike in control.
6 For a suspended pendulum beginning at a large angle, the results from numerical optimization are compared to results using an LQ and LQR controller. With this strong nonlinearity, the LQ controller fails outright while the LQR approaches but does not reach the goal. DIDO is able to accurately reach the final state.
7 For an inverted pendulum beginning at a small angle, the results from numerical optimization are compared to results using an LQ and LQR controller. Even though the nonlinearity is weak, its unstable nature causes the open-loop LQ controller to not reach the final state. The feedback LQR controller takes an initially aggressive approach compared to the optimized controller.
8 For an inverted pendulum beginning at a large angle, the results from numerical optimization are compared to results using an LQ and LQR controller. The LQ controller fails again, noticeably suffering from saturation effects. The LQR controller approaches the final state asymptotically while DIDO directly approaches the final state.
9 The top surface shows how the control signal for an inverted pendulum changes as a result of the starting angle. The control saturates around a 90° starting angle due to the high initial torque needed to overcome the gravity torque at its strongest. When the control constraint is not active, the optimality residual should be zero. Otherwise, their product should be negative, as is the case with these results.
10 The residuals for all nodes for all starting angles show the optimality conditions are met to a reasonable numeric standard. Control residuals are shown as black boxes, state residuals are shown as green circles and costate residuals are shown as red crosses.
11 The coordinates for the four degree-of-freedom arm are shown on the left. The initial and final poses are in the middle and right respectively.
12 The path of the arm is shown by poses.
13 The optimized path is shown on the left compared to the traditional path on the right.
14 There are two phases of the algorithm. The clusters of optima are found, forming the functions $h_k(y)$. The set of $h_k$'s are used to adapt to changing conditions, such as the operator set point, $y_d$.
15 The $k^{th}$ cluster is formed about the $k^{th}$ settled point, $x_{k,0}$. New agents are added in the positive and negative output directions by Eqns. 24 and 25, shown by ×'s at [0]. New agents step by Eqn. 21 and eventually settle to locations shown by ○'s, which are added to the cluster provided they meet certain criteria. The process of adding and updating agents is repeated until that direction is terminated. The set of settled points is then used to form $h_k$ via interpolation, as shown by the dashed line.
16 The clusters formed with the quadratic cost function and the periodic output function are shown. Different clusters are shown by a different color. Each cluster represents a locally-optimal point for the output contour.
17 The clusters for Figure 16 are shown as $y_d$ varies. Each dimension of the set of $h_k$'s is shown along with the associated cost. The light green cluster is obviously the preferred cluster, covering all of $y_d$ and always having the lowest cost.
18 The clusters formed with the linear-quadratic cost function and the multimodal Gaussian as the output function are shown. Different clusters are shown by a different color. Each cluster represents a locally-optimal point for the output contour.
19 The clusters for Figure 18 are shown as $y_d$ varies. Each dimension of the set of $h_k$'s is shown along with the associated cost. The cost chart can be used to select an appropriate cluster. If one cluster cannot be found, then the upper charts can be used to find clusters that are near, such as the red and yellow clusters going from $y_d < 0$ to $y_d > 0$ at $x_1 = -1$ and $x_2 = -1$.
20 A cluster from Figure 16 is tested for optimality and accuracy. Accuracy is checked by generating test points (in dark blue) on the cluster (in red) for random values of $y_d$ and comparing $f(x^*) - y_d$ to zero. Test points in the neighborhood of $x^*$ (in light green) are compared by cost in $\Upsilon_k$. Cluster points are always lower than the test points except in the case where the test points extend away from the cluster and into another cluster's optimality neighborhood.
21 The planar robot has three revolute joints. Additionally, the base rotates so that the robot's plane can be arbitrarily chosen.
22 The cost function was scaled so that the magnitude of its gradient was similar to the output function's gradient.
23 The output has mostly elliptical contours resulting in low cost along the major axis and higher costs along the minor axis.
24 The inverse functions are traced in the joint angle space and distinguished by marker type.
25 Once the application is known, the range of distances will limit the choice of clusters. The operator could also consider the distribution of the cost or other factors, such as if a positive or negative $\theta_1$ is desirable. Marker codes match results in Figure 24.
26 Points interpolated for each inverse function of the HP-3 show the expected output with surrounding points showing higher costs.
27 Poses of the HP-3 as it traces a circle using standard planning. For reference, a circle showing the entire path is added.
28 Poses of the HP-3 as it traces a circle using an optimal inverse function. For reference, the circle showing the entire path is added.
29 Use of the optimal inverse function gave about a 25% improvement in precision over the standard pose. A joint resolution of 0.088 deg/pulse is assumed to provide a magnitude to the results.
30 The complex robot has seven revolute joints though only $\theta_3$, $\theta_4$ and $\theta_5$ appear in the optimization.
31 The cost and output functions are ellipsoidal in $\theta_3$ and $\theta_5$, but $\theta_4$ has a lesser effect, shown by the minor changes along the $\theta_4$ direction.
32 Due to additional symmetries of the IA-20, twice as many clusters are formed for the 3-dimensional optimization.
33 Points interpolated from the inverse function (dark blue) show the expected output and surrounding points (light green) show higher costs.
34 A memory-based model interacts with the system by collecting $(a, y, J)$ triplets as needed to ensure that the estimates $\hat{y}(a_q)$ and $\hat{J}(a_q)$ have sufficient data in the proximity of the queried point, $a_q$.
35 Optimal inputs, $a^*$, are determined by optimization over the memory model. Sets of $a^*$ are collected and organized by $y_d$ into a reflex function. As needed, the process in Figure 34 is employed to ensure accuracy.
36 After the operator selects a desired output, the optimal input based on the current parameterization is generated by simple interpolation.
37 By creating the higher dimension function from the lower dimension function, no discontinuity is introduced at the new node location. Therefore the higher dimension function exactly produces the lower dimension function.
38 One possible scheme of locations for adding a new node is to add a node at the midpoint of the first largest interval.
39 Basis functions are scaled and offset so that they are orthonormal.
40 Beyond N = 4, results appear identical by sight. Bootstrapped and direct optimization also appear identical.
41 Results have comparable accuracy with direct optimization.
42 The number of samples required grows exponentially with the resolution of u(t). Equations of the trend lines are $12{,}000 \cdot 10^{0.031N}$ for direct optimization and $4{,}000 \cdot 10^{0.068N}$ for the memory-based model.
43 As the size of the set of $y_d$ increases, fewer samples are needed for the new points with the memory-based method. Direct optimization however is purely linear. The line types are dash-dot, dash and solid for N = 5, 9 and 14, respectively.
44 Results when the sine frequency is tripled behave as expected and are shown for $y_d$ = 0.07, 0.38, 0.82.
45 The rate of improvement offers no guarantee over future improvement. Results when the sine frequency is tripled have a slow initial rate of improvement before performance dramatically improves.
46 The input signal controls a motor through an amplifier. A peak detector senses the maximum current, while a low-pass filter measures the final speed from a tachometer.
47 Results for ten independent runs are shown in boxplot format (box extends from 25-75% with the median shown by a line). The median continues to decrease until the 5th dimension.
48 A representative set of example waveforms for the motor start-up problem is shown as the resolution increases.
49 The actual output is within the confidence interval of the expected output.
LIST OF TABLES
1 Pendulum Parameters
2 Pseudocode for Connecting SimMechanics to DIDO
3 Fundamental Algorithm of Unbounded Learning
CHAPTER I
INTRODUCTION
Robots typically offer many degrees of freedom making it possible for them to accomplish a wide variety of tasks. However, the many degrees of freedom add a substantial burden to an operator or programmer. This dissertation presents methods to improve the interface by helping the robot to autonomously learn motions. Traditional machine learning techniques focus on solving a given task, but are then typically limited by programmed heuristics or the chosen resolution. Developmental robotics, however, seeks to create robots that learn general skills and develop over time, similar to human development. In this chapter, the traditional method of programming motions 'by hand' is contrasted with having the robot progressively learn. Staged learning allows for better performance by being able to develop over an extended period of time and eventually move beyond the programmer's ability or understanding.
Robots are defined by their ability to do many different tasks, yet the ability to naturally control them lies beyond our grasp. Robots are hard to operate. For familiar tasks, the human brain abstracts the task into a motion so that attention is not diverted. Some common tasks, such as tying knots, shooting a basketball, climbing stairs or swimming, are acted out subconsciously. Despite the fad of human multitasking, studies have shown that performance significantly decreases as attention is diverted [1]. Only a very few simultaneous tasks can be consciously handled. Because robots are made to do many different tasks, they typically require many degrees of freedom, each being actively controlled. Often the human-machine interface is not natural but requires a high degree of attention. This results in teleoperation requiring singular attention and having a high rate of failure. The solution to this problem is training (such as with unmanned air vehicles), having a very natural interface (such as with surgical instruments and haptic feedback) or providing the user with a high-level control. For example, cars automatically calculate the proper timing and fuel quantity based on a high-level command coming from the accelerator pedal. The approach taken in this dissertation is to improve the interface and reduce the burden on the operator, rather than focusing on training of the operator. The research of this work has been directed to methods so that robots can autonomously develop motions.
Having robots learn, or programming intelligence, is not a novel concept. Three key technologies provide the foundation of an intelligent robot. First, sensors, such as digital cameras, microphones and pressure sensors, allow robots to receive much of the same information that humans use. Second, the growth in computing power has made processing volumes of data online possible for reasoning on complex problems.
Finally, electro-mechanical systems offer a method for a computer to act on decisions it makes. Together, they offer the ability to observe, reason and act. Reasoning allows for creation of system models, meaning the ability to make predictions based on observations and actions. Firsthand observations allow for perfecting a system model. Acting allows the agent to direct its process of learning and shape its environment. Each of these three disciplines has had recent growth, improving the practicality of highly intelligent agents.
However, simply adding these three components does not guarantee good performance. For simple, structured tasks, a programmer could reason the appropriate behavior. For some tasks (such as playing Jeopardy! [2]), the structure for organizing data and the methods for making decisions can be programmed with the specifics being filled in by experience. How to create general purpose intelligence for robotic agents has not been solved. The scope of this dissertation focuses on learning motions. The learned motions compose a set of primitives. For each motion primitive, the appropriate low-level command can easily be determined based on a desired objective.
The difficulty humans have manipulating complex systems results in poor performance. However, once a task can be understood and handled using high-level concepts, performance improves dramatically. Programming is an excellent example. At the machine code level, programs take tremendous time to create and require significant debugging. Compilers now are used to abstract programs to libraries of functions. Other programs create low-level programs from high-level languages, such as LabVIEW, MATLAB or web-page design. Even though high-level programs are executed as low-level instructions at some point, programmers do not have to be concerned with details, such as matrix multiplication of floating point numbers, and can focus on high-level objectives.
Motion primitives act like functions, in that these motions can be called with expected results. A motion primitive is a coordinated control law that, when executed, produces a specific outcome. Many degrees of freedom can be reduced to understandable objectives such as bulk movement. For example, efficient walking requires motion of all legs but really is a one-dimensional motion defined by a desired speed. Other primitives could be turning, jumping, interception or following. Multiple primitives could accomplish the same task, such as walking with different nuances; for instance, long strides for running, short strides for more stability, hopping, or skipping. The different versions offer alternatives if one appears unsuited for the situation at hand. With a library of motions, an operator could manage the system at an objective level, rather than controlling each joint, with alternative inputs for different considerations.
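To make the function-like view of motion primitives concrete, the following minimal Python sketch (an illustration added here, not part of the original development; the gait model and all names are hypothetical) treats a primitive as a mapping from a single high-level objective to a full joint trajectory, so the operator never handles individual degrees of freedom.

```python
from typing import Callable, Dict, List

import numpy as np

# A motion primitive maps one high-level objective (e.g., walking speed)
# to a joint trajectory of shape (time steps x joints).
MotionPrimitive = Callable[[float], np.ndarray]

def make_walk_primitive(n_steps: int = 100, n_joints: int = 12) -> MotionPrimitive:
    """Toy 'walk' primitive: a fixed coordination pattern scaled by the commanded speed."""
    t = np.linspace(0.0, 2.0 * np.pi, n_steps)
    base_gait = np.sin(np.outer(t, np.ones(n_joints)) + np.arange(n_joints))
    return lambda speed: speed * base_gait

library: Dict[str, List[MotionPrimitive]] = {
    # Several primitives can achieve the same objective with different nuances;
    # the operator or a planner picks whichever suits the situation.
    "walk": [make_walk_primitive()],
}

trajectory = library["walk"][0](0.5)   # the operator only specifies "walk at 0.5"
print(trajectory.shape)                # (100, 12) joint commands
```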
Work has been done in creating motion primitives in the robotics community, but mostly through 'by hand' approaches. The system and motion are analyzed by an engineer (or a team of engineers) who composes a trajectory or control law that gives sufficient behavior. The engineer normally has an idea of what would succeed and pursues it. The reason these are done by hand is that a solution to the task is desired, and the 'by hand' method is the most effective and reliable method to date. Here are some examples: Shigeo Hirose et al. of HiBot created a robot to inspect high voltage power lines that uses a motion learned from an operator to pass obstacles it encounters [3]. Hirose has also done other work to develop snake robots, walking robots and combinations of wheeled and crawling robots [4, 5]. Bojan Nemec from the Jozef Stefan Institute created a controller for autonomous skiing [6], but the question-and-answer at the presentation at the IEEE/RSJ International Conference on Intelligent Robots and Systems is telling of the challenge of 'by hand' control design. "Can it stop? Yes, sort of. What would it need to be able to compete with humans in a downhill skiing race in, say, 2015? [Pause.] I think people are expecting too much from this robot."1 The process of opening a heavy door was addressed by Hitoshi Arisumi of Japan's National Institute of Advanced Industrial Science and Technology [7]. Many hexapods have been created that use one central motor with linkages that produce a fixed walking motion. The investment in design and tuning shows how individually producing a large number of motion primitives 'by hand' is infeasible. The proposed solution is autonomous learning.
In order to learn autonomously, the system must be able to plan and conduct experiments without outside involvement. The system must be able to decide which inputs should be tested in order to develop. It must have internal metrics for evaluating its progress in learning. Results should be automatically organized in a useful fashion, so that high-level planners can take advantage of learned motions without needing human interpretation. Therefore, autonomous learning is best suited for objective tasks, such as locomotion, rather than subjective tasks, which are best learned in a social context. The obvious advantage of autonomous learning is not requiring a human trainer. More subtle advantages include not being affected by biases in human trainers and better sharing of learned skills among similar robots, because the group can coevolve.
Staged learning draws inspiration from the biological developmental cycle. Biological systems often cannot immediately solve novel complex tasks. However, when the task is similar to previous tasks, a solution can be found. Looking from the other direction, by learning to develop solutions for simple tasks, solutions to related but more complex tasks will come naturally. Staged learning begins with a restricted search space. The small space is used to identify productive areas, meaning regions that offer repeatable, useful motions. The standard example is learning to crawl and walk before running. Dynamic actions like running offer a smaller set of satisfactory inputs. So balance is developed by standing or hopping in place, with results then extended to dynamic actions. The advantages of breaking the learning process into smaller tasks can be seen from how linearization is used in nonlinear programming. Even though the global behavior is nonlinear, reducing the problem into a linear optimization allows for using a set algorithm that incrementally progresses to the optimum. A staged learning algorithm will step toward a refined motion in small increments to balance the complexity of the learned skill with the current proficiency and experience. Motion primitives are an efficient representation for staged learning. Learning methods can act on a set of motion primitives to create a more refined set of motions or create more elaborate motions through concatenations.

1. Anne-Marie Coreley, "Skiing Robot Races Down Slope," IEEE Spectrum, www.spectrum.ieee.org/automaton/robotics/robotics-software/skiing-robot-races-down-slope/, Oct 16 2009, retrieved Jun 9 2012.
When searching for better solutions, there are two fundamental approaches. The first is to take an existing solution and continue to improve it. Performance can often be improved by slightly adjusting the motion.
Larger improvement is gained by the accumulation of slight improvements. This approach is based on local optimization. The second involves global search. The high jump was traditionally dominated by scissoring the legs, bringing the second leg up after the first crossed. Then in the 1960s, Dick Fosbury created the Fosbury flop of crossing back and head first, which quickly dominated high jumping due to its improved performance
[8]. The Fosbury flop represented a disjoint solution in the solution space. It cannot be achieved directly from improving the scissor approach. By the same token, it is not an obvious solution, so a general purpose optimization would require extensive searching to find it. Most global optimizations balance the need for focusing solutions and increasing diversity.
'By hand' approaches incorporate the designer's intuition. If the prevailing intuition is inefficient, then the best motion will remain elusive until the appearance of a fundamentally different solution. Autonomous learning algorithms address situations where biomimetics do not apply and humans do not have an intuition for the solutions. The Martian rover Spirit was stuck in a sand trap for months while engineers on Earth attempted to propose a motion that would free the rover. Staged learning could allow the rover to first develop movements that create motion, then find concatenations that produce the most productive motion.
The contribution of this dissertation is a step towards mimicking biological development. Its scope lies in autonomous improvement through the process of training. Therefore, the scope does not extend to the synergistic benefits of tutoring or group learning. The focus is on incremental improvement over a long period of time. This is achieved by application of a number of numeric methods in a developmental sense.
Specifically addressed are how to represent motions in such a way that they can be modified for a given objective so as to provide a reflexive result, and how the learning process balances the complexity of the search space with the ability to continue to develop. With the foundation of individual learning, group learning can benefit by the injection of new solutions. Ultimately, staged learning would provide a method for motions to be perfected beyond the programmer's ability or understanding.
Chapter II addresses the research questions. A review of related work is given in Chapter III. Chapter IV presents a method of abstracting optimal robotic control to a general trajectory optimization. This method allows the designer to work in terms of linkage geometry and obtain trajectories without derivation of low-level equations of motion. Next, Chapter V shows how a swarm optimization efficiently searches a global search space in a way that lends to creation of optimal inverse functions. Staged learning is presented in
Chapter VI, where motions begin at a limited parameterization and progress to an unbounded resolution. The dissertation is concluded in Chapter VII.
CHAPTER II
PROBLEM STATEMENT
The central questions of developmental robotics deal with how to enable a robot to stage learning to more complicated tasks. This dissertation focuses specifically on developing near optimal motions. The specific questions addressed throughout this dissertation are enumerated and discussed. Considerations for comparing different learning methods are presented, including whether local or global search is done and how additional data is able to result in better performance. Representations used for motions also have a significant effect on learning, and how to identify advantageous representations is addressed by the research questions. The questions posed in this chapter provide a map to progress towards the goal of having a robot learn and master new tasks.
The overarching goal of this dissertation is to answer, even if partially, the question of how to match the ability of animals to grow from novice to master on motion-related tasks. This would help solve some aspects of the problem of how to create general-purpose intelligence. Despite remarkable achievements, the solution has been elusive, although some principles are becoming clearer. With growing acceptance, the process of learning for general purpose systems is seen as a long term process, rather than an isolated training phase.
Also, staging of the learning offers the advantage of directing the learning in an unobtrusive manner. In order to pursue the overarching goal, the principles of long term learning and staging have been applied to form the principal problem statement of this dissertation.
How can autonomous staged learning be used to develop near-optimal motion primitives in high dimensions?
To support the principal problem statement, additional problems are enumerated to provide more focused steps:

1. How does directed learning fit in a global or local search context?
2. How can motions be optimized based on a high-level rigid body representation?

3. How can operation of a multiple-input, single-output system be simplified when considering tracking and optimization?

4. How can equivalent, but fundamentally different motions be organized?

5. How should motion primitives be represented to facilitate incremental increases in the number of control parameters?

6. What are the limitations fundamental to staged learning?
This chapter elaborates on these questions and their relationship to the principal question.
Global search seeks to find solutions wherever they exist in the solution space. Local search simply tries to find a better solution in its neighborhood. The advantage of the global search is that it provides more choices. However, humans are adversely affected by too many choices [9]. As shown in Schwartz et al., additional choices (after a certain threshold) add regret for unused good solutions and increase expectations, leading to lower satisfaction with the chosen solution. Though the disadvantages of too many choices are presented from a psychological point of view, it is useful to consider that this mechanism is present in intelligent systems, tending them toward simple solutions. From an algorithmic point of view, the additional choices increase the search space exponentially. Depending on the representation, the increase in solutions above a threshold may not match the growth in the search space, thus decreasing the likelihood of finding a good candidate. Local optimization limits the search space to the local neighborhood. If the cost surface is Lipschitz continuous, then local gradients provide the most efficient directions for search. Because global search offers the benefit of finding disjoint solutions, and local search has increased efficiency, both should be employed to some extent when directing learning.
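The interplay of the two search modes can be sketched in a few lines of Python. This is an illustration added here under simple assumptions (uniform random global sampling and random-perturbation local descent), not the population-based algorithm developed later in this dissertation; all names and numeric values are hypothetical.

```python
import numpy as np

def combined_search(cost, bounds, n_global=200, n_local=50, step=0.05, seed=0):
    """Coarse global sampling to find a promising region, then a simple local
    improvement from the best sample. Both stages are illustrative placeholders."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    # Global stage: spread samples over the whole space to find disjoint basins.
    samples = rng.uniform(lo, hi, size=(n_global, len(lo)))
    best = min(samples, key=cost)
    # Local stage: small perturbations, keeping only improvements (efficient refinement).
    for _ in range(n_local):
        candidate = np.clip(best + step * rng.standard_normal(len(lo)), lo, hi)
        if cost(candidate) < cost(best):
            best = candidate
    return best

# Toy multimodal cost: global sampling escapes poor basins, local search polishes one.
cost = lambda x: np.sum(x**2) + np.sin(5 * x).sum()
x_star = combined_search(cost, (np.array([-2.0, -2.0]), np.array([2.0, 2.0])))
print(x_star, cost(x_star))
```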
Motion primitives are based on a trajectory control problem. The general trajectory control problem is to
find a set of control signals $U$ such that each element, $u(t)$, satisfies $g(x, u, t) \geq 0$ given that $\dot{x} = f(x, u, t)$, where $\dot{x} = dx/dt$. The application is directed to physical systems, where $u(t)$ represents a control input. The function $f(x, u, t)$ represents the system dynamics and $g(x, u, t)$ represents a set of trajectory constraints. Constraints can be restrictions on states, rates of change, the input, or running measures like energy or fuel used, or functions of combinations of these. Constraints can also be used to specify output quantities, such as progressing to $x_1 \geq 3$ or to exactly $x_1 = 3$. Though trajectory control is selected as the inspiration for this dissertation, results extend to other areas, such as function shaping, function approximation and static optimization.
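A minimal numerical sketch of this formulation is given below (in Python, added for illustration; it is not the SimMechanics/DIDO pipeline used in Chapter IV). The control $u(t)$ is discretized, the dynamics $\dot{x} = f(x, u, t)$ are enforced by forward Euler integration, and a terminal equality plus input bounds stand in for $g(x, u, t) \geq 0$; the dynamics, cost weights and horizon are placeholder choices.

```python
import numpy as np
from scipy.optimize import minimize

dt, T = 0.05, 2.0
n = int(T / dt)                       # number of discrete control values

def simulate(u):
    """Forward Euler integration of placeholder dynamics x_dot = -x + u."""
    x, xs = 0.0, []
    for uk in u:
        x = x + dt * (-x + uk)
        xs.append(x)
    return np.array(xs)

def cost(u):
    xs = simulate(u)
    return dt * np.sum(xs**2 + 0.1 * u**2)    # running cost on state and effort

def terminal(u):
    return simulate(u)[-1] - 1.0              # output constraint: reach x(T) = 1 exactly

res = minimize(cost, np.zeros(n),
               constraints={"type": "eq", "fun": terminal},
               bounds=[(-2.0, 2.0)] * n)      # input saturation as g(x, u, t) >= 0
print(res.success, simulate(res.x)[-1])
```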
In cases where multiple solutions offer the same desired output, a cost function should be considered to rank solutions. It is assumed that at least one solution exists; proving the absence of a solution is a difficult challenge and beyond the scope of this work. The members of the set $U$ should be partitioned into clusters such that each cluster has no more than one optimal candidate for a given output². In addition, clusters should be contiguous as the desired output changes. Each cluster represents a motion primitive where a variable output scales the motion for different tasks or situations. If the system were linear, then scaling would simply be linear, but in general, nonlinear motion primitives would be expected. The cluster structure is advantageous over other methods, such as only keeping Pareto-optimal results, since it allows for fundamentally different solutions to be used when a primary solution fails.
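The clustering idea can be illustrated with a short Python sketch (added here as an assumption-laden toy, not the algorithm of Chapter V): locally optimal scalar inputs are grouped into contiguous clusters, and each cluster is interpolated into its own inverse function $h_k(y_d)$, so that alternative clusters remain available if the preferred one fails.

```python
import numpy as np

def build_inverse_functions(solutions, gap=0.5):
    """solutions: list of (x, y) pairs, x a locally optimal scalar input achieving
    output y. Split into clusters contiguous in x, then return one interpolating
    inverse function h_k(y_d) per cluster. The gap rule is a placeholder criterion."""
    solutions = sorted(solutions, key=lambda s: s[0])      # order by input x
    clusters, current = [], [solutions[0]]
    for prev, cur in zip(solutions, solutions[1:]):
        if cur[0] - prev[0] > gap:                         # inputs jump: disjoint family
            clusters.append(current)
            current = []
        current.append(cur)
    clusters.append(current)

    inverse_functions = []
    for c in clusters:
        c = sorted(c, key=lambda s: s[1])                  # order by output y
        ys = np.array([y for _, y in c])
        xs = np.array([x for x, _ in c])
        inverse_functions.append(lambda yd, ys=ys, xs=xs: np.interp(yd, ys, xs))
    return inverse_functions

# Two disjoint families of optima that produce overlapping outputs.
sols = [(-1.0 + 0.05 * i, 0.1 * i) for i in range(10)] + \
       [(2.0 + 0.05 * i, 0.1 * i) for i in range(10)]
h = build_inverse_functions(sols)
print(len(h), h[0](0.45), h[1](0.45))   # two alternative inputs for the same y_d
```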
Learning methods employ techniques to explore and generate knowledge from a problem’s topography.
The representation of the variables in a problem often significantly affects how the method will behave. It has been shown that initial choices for learning can affect final motions learned and almost always affect the rate of learning [10, 11, 12]. The progression of learning in humans is from core muscles to extremal joints [13].
In the work of Alexander Stoytchev, core muscles were advantageous since they produced the most motion and therefore the most reliable response [14]. For this reason, consideration of the representation used is as critical as the algorithm.

2. In the case where U contains one member, U is itself a cluster with the sole member being the optimal candidate.
Though staged learning offers advantages, few existing methods in the literature stage the development.
Most often staging is done by using primitives with a high-level planner [15]. Activities used to identify systems can be viewed as stages of learning [14]. Only one work has autonomous staging where attention is directed to productive learning, but the method cannot advance beyond its programmed representation
[16]. This means that once the programmer determines the resolution, the level of possible performance is
fixed. For autonomous motion learning to be more effective than preprogrammed methods (such as ‘by hand’ approaches), it must allow for unbounded learning. For this reason the representation must not be bounded at programming, but should approach the set of continuous functions in the limit. In addition, the system should autonomously determine when it is ready for the higher dimension representation.
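One representation satisfying this requirement is a piecewise-linear control signal refined by node insertion, in the spirit of the scheme developed in Chapter VI. The sketch below (Python, added for illustration; the node-placement rule follows the midpoint idea of Figure 38 but is otherwise a stand-in) shows that the refined signal exactly reproduces the coarser one, so previous experience carries over when the dimension increases.

```python
import numpy as np

def refine(nodes, values):
    """Insert one node at the midpoint of the first largest interval. The
    piecewise-linear signal is unchanged, so the old parameter vector maps
    exactly into the new, higher-dimensional one."""
    i = int(np.argmax(np.diff(nodes)))                 # first largest interval
    t_new = 0.5 * (nodes[i] + nodes[i + 1])
    v_new = 0.5 * (values[i] + values[i + 1])          # lies on the existing segment
    return np.insert(nodes, i + 1, t_new), np.insert(values, i + 1, v_new)

nodes = np.array([0.0, 0.5, 1.0])        # 3-parameter waveform
values = np.array([0.0, 1.0, 0.2])
new_nodes, new_values = refine(nodes, values)

t = np.linspace(0, 1, 11)
same = np.allclose(np.interp(t, nodes, values), np.interp(t, new_nodes, new_values))
print(new_nodes, same)                   # 4 parameters now, signal unchanged -> True
```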
Some low-dimension parameterizations that can be expanded to continuous functions in the limit may not provide a direct relation as the parameterization changes. For example, based on the domain considered, coefficients of lower order polynomial approximations do not necessarily appear in higher order approximations. A quick example can be seen approximating $f(x) = x^2$ over the range $0 \leq x \leq 1$. The first three least squared error polynomial approximations are

$$\hat{f}_1(x) = 1/3 = 0.333$$
$$\hat{f}_2(x) = -1/6 + 1x = -0.167 + x$$
$$\hat{f}_3(x) = 0 + 0x + x^2 = x^2. \qquad (1)$$

The sequence of parameters is $[1/3\ (, 0, 0)]$, $[-1/6, 1\ (, 0)]$ and $[0, 0, 1]$. None of the terms carry over or are even close to previous approximations. The fastest learning rate would be expected when the previous and new optimal results are closest in the new parameterization.
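This can be checked numerically with a few lines of Python (added for illustration; a dense grid stands in for the continuous inner product on [0, 1]).

```python
import numpy as np

# Dense grid approximates the continuous least-squares fit of f(x) = x^2 on [0, 1].
x = np.linspace(0.0, 1.0, 10001)
f = x**2

for degree in (0, 1, 2):
    # Columns 1, x, ..., x^degree; lstsq returns the minimum-squared-error coefficients.
    A = np.vander(x, degree + 1, increasing=True)
    coeffs, *_ = np.linalg.lstsq(A, f, rcond=None)
    print(degree, np.round(coeffs, 3))

# Output (approximately): [0.333], [-0.167, 1.0], [0.0, 0.0, 1.0].
# None of the lower-order coefficients reappear in the higher-order fits, as in Eqn. (1).
```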
This dissertation addresses the problems enumerated in this chapter for progress in achieving efficient and effective staged learning. Local search offers the best chance of immediate improvement while global search organizes an exploration to find alternatives. For autonomous learning, the description of the system should be as intuitive as possible. By utilizing high-level representations, such as rigid body dynamics, with general purpose optimization, the learning process can be automated, as shown in Chapter IV. The high-level representations are more intuitive for human designers than using complicated low-level equations of motion, and can help automate the design process if a human is not involved. The basic definition of a primitive implies that it can be used for multiple purposes. By composing continuously variable primitives, solutions are better suited for changing tasks or environments. This is demonstrated in a global search sense in
Chapter V and a local search sense in Chapter VI. Using inverse functions as primitives provides a simplified interface for operators, compared to regulating and optimizing a multiple-input system. By autonomous training, the system learns and refines the set of functions available to the operator. As presented in Chapter
VI, learning can be staged by limiting the search space from all continuous functions (which requires an infinite parameterization) to a finite parameterization and then increasing the dimension of parameterization to approach the set of continuous functions. Both the representation and organization are investigated for improved learning efficiency. By investigating the limitations of staged learning, performance bounds can be gauged.
CHAPTER III
DEVELOPMENTAL LEARNING
Rather than try to learn skills directly, developmental learning approaches seek to create general purpose learning systems inspired by the human developmental process. The premise is that intelligent behavior can only be achieved through the course of experience, and attempts to circumvent this approach will lead to a plateau limiting performance. There are many facets of applying human developmental principles and some of the common examples are categorization, sensorimotor control and social learning. These are relatively simple tasks for people, yet challenging for an autonomous robot. A survey of research in this recent field is presented with a focus on describing the concepts and the tools used. While this dissertation provides only one component for a general purpose learner, it can be combined to support or benefit from other work in this field. The breadth of structures employed is presented, especially as related to function approximation. Function approximation can be seen as a form of learning and methods from developmental learning are compared to numeric methods. This review is intended to give sufficient depth that the paradigm of developmental learning can be understood as applied to motion optimization in later chapters.
3.1 The Basis and Need for Developmental Learning
Developmental learning methods are staged so that learning begins at simple tasks and builds up to more complex or challenging tasks. This relatively new approach was inspired by the developmental milestones in biology. Some research laboratories have adopted this approach and it is seen more often at conferences and in journals.
Developmental milestones result from incremental progress on low-level tasks which enable high-level accomplishments. Models of self and environment and the ability to plan, categorize and use tools evidence cognitive development. Gross motor skills (large, powerful motions such as walking) and fine motor skills (precise motions such as picking up beans) are examples of sensorimotor development (coordination of sensory input and motion). Attempts to circumvent the developmental learning process result in an algorithm proficient in its task, but one that cannot build on itself. The premise of developmental robotics is well characterized by a quote from A. Stoytchev:
Our basic research hypothesis is that truly intelligent robot behavior cannot be achieved in the absence of a prolonged interaction with a physical or a social environment. In other words, robots must undergo a developmental period similar to that of humans and animals. [17]
The final goal would be to have a robot that is at least as competent as an adult at being able to solve physical
(and possibly mental) problems autonomously, be trained by people or robots, and adapt for reasonable failures. Staged learning is attractive since if the robot can autonomously develop, it can work in novel environments and beyond the designers’ understanding [18]. This chapter surveys the principles of this approach, the applications and the tools used.
Humans are the primary examples of intelligence from a developmental process, but development is also studied in other species as well [19]. Psychology plays a prominent role in the field and two theories are often cited. Jean Piaget proposes an individual learner [20, 21], while Lev Vygotsky proposes development resulting primarily through social interactions [22]. Representation of the foundational knowledge gained in the developmental process affects the ability to learn, and representations better suited for learning are being researched. Eleanor J. Gibson proposed the foundational theory on how people hold models of what objects can accomplish, called the theory of affordances [23]. Biologists see developmental robotics as offering prospects for understanding biological development, but this dissertation focuses on improving robot intelligence.
Traditional robotic controllers (such as state-space or fuzzy-logic) typically fail to provide a powerful, general framework, and thus designs are difficult to adapt to novel tasks [24, 25]. Traditional controllers typically perform poorly on poorly understood systems, for example systems that are intractably complex or involve unknown physics, such as flapping-wing dynamics. Telepresence has limitations in that operation is mentally fatiguing [26]. Humans can typically only hold attention on one hand at a time, so operating many degrees of freedom in unfamiliar ways is beyond most people's ability [11].
The developmental approach is partly a response to the traditional artificial intelligence (AI) approach which believed that with sufficient knowledge (in the form of logic rules), an agent would be able to outperform any of the designers in any future task [27]. This rule-based, general-purpose approach proved complicated, slow and fragile. Later, fuzzy rules and heuristics were applied with more success. The seeds from traditional AI can still be seen in some developmental robotic methods which try to use rules for modeling and planning. However, heuristic methods applied with slight changes can surprisingly result in much worse performance [11]. By imposing heuristics on the problem, the algorithm does not develop its own heuristics, and good performance is limited to the correctness of the heuristic. Therefore, the agent should be able to acquire its own understanding of the environment rather than the designer's understanding [28].
Heuristics are powerful in the real world, and an intelligent agent must be able to infer good strategies to deal with complexity.
This chapter follows with a description of developmental learning in Section 3.2. Three divisions are made in the field: categorization, sensorimotor control, and mental and social learning, in Sections 3.3 to 3.5. An assessment of the field as a whole is given in Section 3.6. The chapter concludes by drawing developmental learning principles into the framework used in this dissertation. Structures employed in developmental learning algorithms are the subject of Section 3.7. Methods to direct the learning efficiently are discussed in Section 3.8. Finally, learning can be seen as creating a function approximation, so methods outside of developmental learning are compared and contrasted in Section 3.9.
3.2 Nature of Developmental Learning
One of the challenges of artificial intelligence research is that there is no precise definition of what intelligence is. Most definitions of intelligence involve making good decisions based on previous experiences. Within this definition, there are three components: decision, goodness and experience. An intelligent system must be able to make a decision to manifest its intelligence. Besides the ability to act, this could be a classification decision or internal decisions. Note that this definition would classify adaptive controllers, if they have proven stability for their application, as intelligent. However, if their application changes, then they may no longer be stable; and so there should be some measure of breadth of applications or enhanced ability when confronting novel problems.
A sensor array that can selectively route signals, has internal states or adapts how it fuses data may act intelligently. What is 'good' might be able to be put into words or might simply be a mathematical compulsion affecting goals. What is 'good' does not need to be constant, and research is being done to use a value system to direct learning, which in some ways mimics emotions. This abstracts the pursuit of knowledge to higher levels allowing for more mature development. Since absolute logical validity is not being required, experience should be sufficient for initial formation of knowledge, often distinguished as on-line learning.
Some plasticity is required to change or refine rules or respond as the environment changes, but too much will prevent learning subtle rules. This is consistent with research demarcating limits to human learning based on plasticity. Experience, and not the designer's understanding, should drive the agent's understanding.
A premise of robotics is that the exact application is not known to the robot designer, so designs should be general purpose. The structure inherent in the experience shapes the rules learned by the system from a general purpose to one formed around that structure.
The novel aspects of developmental learning in artificial intelligence come from how concepts of developmental psychology are applied. Some methods seek to create a similar developmental pathway for robots, so that algorithms can be used to learn skills that provide a foundation for learning high-level tasks. For example, the ability to identify self is required before identifying an extension of self, i.e., a tool [14]. The premise is that if robots mastered foundational skills, then mastering the next level would be a trivial task. This breaks the learning process into many small steps and human development provides a road map. Though this might not be the best path (because high-level skills are built on general purpose skills rather than skills focused on that task), it works well with humans. Failure of traditional artificial intelligence can be related to lack of skills that young children gain, such as being able to find relevant similarities, remembering objects which are out of sight and dealing with real world complexity. Methods following human development seek to create solid foundational skills which can be readily employed by high-level algorithms. What these methods lack is the synergistic benefit of co-development of skills.
Other methods create developmental milestones as products from the method, rather than heuristics used to guide the method. Social approaches seek to create an environment where learning can take place so that agents will contribute to a social knowledge, similar to swarm systems and emergent behavior research.
Other methods stage the learning on the individual level. By adapting the learning method, or possibly by simple repetition of the learning method on the changing model, the agent develops, resulting in substantially better performance. On the surface, it can seem the same as using developmental milestones as stages, but the difference lies in that the milestones are byproducts of the method rather than objectives. An agent gains an increased proficiency at simple problems through experience, which is related to more complex abilities.
Since the milestones are largely a byproduct, many of these methods are more abstract.
3.3 The Categorization Problem
One of the foremost problems in developmental learning is categorization. This falls under different applications such as identification, feature creation and representation. Categorization is required for autonomous learning since an efficient representation is needed, so one piece of knowledge should cover a set of information. For example, categorization in the context of this dissertation is applied by separating clusters of locally-optimal solutions based on internal continuity to create inverse functions in Chapter V. An agent should be able to recognize discriminating features to reduce the dimension of the input and form categories that generalize to new experiences. Categorization could relate to states or to relations, such as learning the rules and meaning of words in a language.
Language development requires extensive categorization to work in practical scenarios. First, real-valued, continuous time signals are segmented for development of building blocks: phones, phonemes, vowels, syllables, words and sentences [29, 30, 31, 32, 33]. Completely unsupervised learning is generally not desired since results should conform to the structure of the desired language. For this reason, tutors or exemplars are often used to direct the learning [34, 35, 36]. One study uses children to teach virtual agents to accentuate the benefits of co-development [37]. Experience based learning allows learning other 'languages' such as tonal meaning when dealing with infants or the influence of gaze on meaning [38, 39, 40, 41, 42]. Most work is, however, to deduce word meaning from video narrations [43, 44, 45, 46, 47], which requires its own segmentation [48]. General purpose segmentation is also researched [49]. This work can also be done in a predictive manner to compose narrations [50]. Narrations are particularly useful in determining others' beliefs or intentions, based on theory of the mind, for determining social rules [51, 52]. Note that knowledge representation affects learning performance for real scenarios [53], and work is done to understand representational scaffolding to direct and assist learning [54]. This is however culturally dependent, and research is done to determine cultural influences and recommend specific methods based on culture [55, 56, 57, 58].
Order of presentation affects learning rate and the learner can accelerate learning by guiding the topics online
[35, 59, 60].
Vision is the other primary focus for categorization, partially due to the prevalence of high resolution cameras as opposed to scarce haptic devices, such as the one used for touch categorization in [61]. Studies that primarily categorize features in a motor context are described in Section 3.4, though many share principles with vision systems. Traditional machine vision is more top-down (meaning relating the symbolic to the sensor response), rather than developmental, experience-based approaches which group sensor responses to features and symbolic representations [62, 63, 64, 65, 66, 67]. From experience, categories can be autonomously formed to employ co-developing top-down and bottom-up approaches [68, 69]. By directing the learning, such as dwelling on features, planning scan routes or optimizing sensor sensitivity, more robust measures are developed [70, 71, 72, 73, 74, 75, 76]. Other features are based more on interaction with the object, such as if an object is a container [77, 78, 79]. How to grasp an object is a common application
[80]. Features that predict a high likelihood of a successful grip are identified and matched to specific motor commands and objectives [81, 82]. Feature robustness and plasticity are dependent on the learning process and the mechanism used to store the knowledge [83, 84]. For this reason, disabilities (such as the autism spectrum disorders) and the developmental schedule for abilities are researched to find foundational skills
[85, 86, 87, 88, 89, 90]. Artificial neural networks and support vector machines are commonly used and performance can be improved by adding expectation predictors, dynamic neural networks or mirror neuron systems [91, 92, 93, 94, 95].
3.4 Sensorimotor Control
One of the most basic competencies is learning to control one's own motion. For robotic systems, this is a forward and inverse kinematic question that can be addressed by positions, velocities and forces, but assumes that joints can be actively sensed. More complicated systems could indirectly sense joints by visual feedback. Other sensorimotor systems could involve finding the forcing function of active sensors, which increases sensitivity. The goal of sensorimotor control is a relation from control signals to measurements, or the relation from a desired output to the feedback law necessary to achieve it [96]. From a cognitive developmental robotics perspective, the robot should develop a mapping from actuation to sensor behavior based solely on experience [28, 97]. This has often been done involving vision [98, 99]. Active sensors can be used to direct learning for such skills as saccadic motion [100, 101]. For simple systems or well-known systems, this is often a trivial problem; but for a complex system or adapting behavior in response to damage, a general purpose algorithm is desired [102, 103]. Tool use requires determining what a tool can do (its affordance) and how to use it [104, 105]. These identification methods have also been used for predictive capabilities: based on a new body configuration or a new tool, hypothesize the affordances [106, 107].
General purpose methods often fail in large spaces due to the dimension of the space involved [65, 108].
Reduction in the number of dimensions is one method to deal with the freedom, and can be done via a developmental approach [109]. Other methods guide exploration by emotion or motivation based heuristics, such as curiosity or disappointment [110, 111, 112, 113]. Tutoring directs learning based on feedback from an expert [113, 114]. In an environment with noise, motion models should balance smoothness to generalize results to novel circumstances, and exactness to achieve the best performance [115, 116, 11]. Similar to categorization tasks, representation affects learning performance. Two common representations are kinematic or kinetic representations and the choice between them can be made based on how well the task is described in each [117].
Primitives are often used as building blocks for high-level behaviors, such as words being composed of syllables [100, 118]. Primitives can also be used to make operation simpler and are a step toward using triplet-based world models3 [15, 119]. This vocabulary need not be static and can continue to develop [120]. Nor do primitives need to be trajectories; they can be models or feedback control laws learned through experience or tutoring [121, 122, 123, 124]. These automatic motor-memory responses can prevent planning-execution delays. Failure to form proper motor vocabularies has been associated with stuttering [125].
3.5 Mental and Social Learning
Beyond simple features, meaning basic movements and rules or models, more complicated world models can be created. These can be based on an individual agent’s experience or emerge through social interaction. These problems are not assumed to be trivial and may require meta-learning (learning better ways to learn).
3Triplet models are based on the logic rule: IF((State==A)&(Action==B)) Then (Next State==C).
The model of the world typically represents states of the world with rules describing transitions to future states [126, 127]. Without experience, the initial model will be incomplete or wrong. State estimation and creating or editing rules must be done judiciously to handle ambiguity, but should require as few experiences as possible for rapid learning. Surprise measures can determine if events justify special consideration [128].
Curiosity normally takes a slightly different role than it does in simple self-model identification [129].
Representing an individual’s memory is a daunting task, and methods are developed for efficient representation [130, 131, 132]. Studies are done on infants or people with disorders (especially autism) to determine people’s internal representations and how the specific representation affects performance [133, 134, 135, 136, 137, 138]. Similarly, studies are done to identify methods humans use to learn and to apply them to research in a robotic developmental learning context [139, 140, 141]. To progress to more or harder tasks, learning can be transferred to novel tasks [142, 143, 144, 145].
Co-learning or co-development refers to multiple agents or possibly multiple abilities of a single agent developing concurrently to balance complexity though it has an increased risk of divergence [146, 147].
Strategies employed for co-learning differ from strategies used for learning in isolation. The choice of the pairing, such as human-child or human-robot, significantly influences each strategy [148, 149]. Learning can be applied to learning how to learn, as children do, or learning how to teach well [150, 151, 152, 153, 154].
The ability of agents to identify that other agents have internal states, such as beliefs, separate from their own states is referred to as theory of mind. With a set of internal and external beliefs, roles can develop [155, 156].
Internal regularity in a network, representing distinct beliefs, tends to have better performance when generalizing [157]. Neural representations have been pursued to support biologically plausible implementation
[158].
3.6 The Progress and Potential of the Developmental Learning Paradigm
Developmental learning addresses deficiencies of task specific algorithms, most importantly the need for an expert to tailor algorithms to the application. The field is still very young and has not reached its goal, but initial results show that robots are able to autonomously learn. Robots are able to 1) develop a model of how their actions relate to the environment, 2) develop basic motions to achieve desired results, and then
3) combine those motions for high-level behavior. Current research focuses on only one or two skills at a time, but the skills are laying a foundation for high-level skills. In order to solve the really hard problems, a unifying framework needs to be created so that skills can be synergistically tied together. In addition future research must allow for continual improvement so that robots can reach and surpass adult’s abilities, up to the point that equipment limits performance.
Use of the developmental perspective has been shown to be beneficial. The principles used by infants, children and adults to learn skills offer a scaffolding that directs learning from very basic competencies to high-level skills. This pathway provides guidelines to check emergent behaviors against developmental milestones. Lack of achieving milestones could indicate potential difficulties later on. Theory of the mind provides a framework for dealing with other agents and uncertainty. Experience based learning is able to develop simplified descriptions of the world, similar to heuristics which people use to manage complexity.
This departure from the traditional AI methods, which needed definitive conclusions, allows for much more efficient learning which can be done on-line and concurrently. Because these heuristics are based on experience, robots could learn in very different environments where typical heuristics do not apply, such as the moon or deep sea. Often information is coded as causal relations, allowing for directing the learning process. Developmental robotics research not only advances artificial intelligence, but helps us understand mechanisms of our own cognitive structure [125]. The cognitive process is the aim of this research, rather than the specific agent’s behavior. Principles can be transferred from one domain to another, such as applying language patterns to motion patterns.
The common feature of developmental research should be a progression from simple to complex abilities. Unfortunately, even among work designated as developmental learning, very little of what has been done autonomously exploits low-level skills to enable more complex skills. Most published results show development by having additional models, but with the same complexity. At best, motion primitives are used by a high-level planner, but one level is as far as that staging goes. By contrast, the learning method should scale well to allow for near limitless progression. The complexity must be allowed to increase with sufficient experience; otherwise the method simply sequentially learns tasks of similar complexity. An example of a boundless learning scheme and the arguments for it is given in [159]. Unfortunately, boundless learning itself is not sufficient, since the increase in search space would make finding better solutions combinatorially infeasible unless searching were directed. Developmental research seeks to find methods to guide learning in the ever growing search space without imposing our understanding [25]. If a general purpose learner has a fixed resolution, then it will either be too coarse to learn complicated tasks or so fine that learning simple tasks is intractable.
3.7 Developmental Structures
Developmental structures must be able to add knowledge and should be able to progress beyond the designer’s understanding. For this reason, much work has been done to create a structure that can autonomously learn without any knowledge from the designer, termed “Cognitive Developmental Robotics” [18]. None of the structures proposed has proved complete, but performance is progressing. Intelligent behavior can exist without an explicit model of the environment; only understanding of how sensors change with actions is needed. Because small discrete sets are easier to practice with, many of the methods discretize the domain of values and time. An understanding of actions can be stated as a triplet, If State_a and Action_b Then State_c, which can then be used in logic methods. Unfortunately, logical induction is not valid for most situations; therefore no rule generation method is perfect. Based on these triplets, a robot has learned to plan a path to a target despite the target being moved to another place [128]. Logical methods are very difficult to extend to continuous functions, and it seems that a continuous representation would be required for real-world changing tasks. Therefore, logic based implementations are seen as outside the scope of this dissertation, though some useful illustrative principles will be presented.
The alternative to logical structures is to use function approximators, S_future = f(S_current, u) for discrete time systems and Ṡ = f(S, u) for continuous systems, where u represents the action. The state, S, can represent a state based on current sensor readings, a continuous state determined internally, or a node in a finite state machine. Finite state machines are a directed graph where the current state transitions along edges according to some criteria. For example, a graph might have two states, ‘DoorOpen’ and ‘DoorClose’, where the transitions would be ‘CloseTheDoor’ and ‘OpenTheDoor’. The finite state machine represents the model of the world and can be used to plan sequences to achieve desired states.
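As a concrete illustration of the triplet rule and the door example above, the following MATLAB sketch stores If State_a and Action_b Then State_c rules in a lookup table and steps a candidate action sequence through it; the rule set, state names and variable names are illustrative only, not a method from the cited work.

% Illustrative sketch of a triplet world model stored as a lookup table.
rules = containers.Map();
rules('DoorOpen|CloseTheDoor') = 'DoorClose';
rules('DoorClose|OpenTheDoor') = 'DoorOpen';

state = 'DoorClose';
plan  = {'OpenTheDoor', 'CloseTheDoor'};       % candidate action sequence
for k = 1:numel(plan)
    key = [state '|' plan{k}];
    if isKey(rules, key)
        state = rules(key);                    % apply the learned transition
    end                                        % unknown pairs leave the state unchanged
end
disp(state)                                    % ends back at 'DoorClose'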
A very general purpose method by Grupen et al. uses a finite state machine where a different controller is appropriate for each state [10, 160]. By using a Bayesian belief network trained to data, nodes can represent combinations of distinct states that are ambiguous based on sensor readings but can be discriminated by history. For each distinct state, a feedback controller is trained by reinforcement learning. This controller represents a sensorimotor paradigm held by the agent and provides robustness by attraction to the desired trajectory [161]. A high-level controller shapes learning through the reinforcement learning process which directs the desired or natural state progression. The high-level controller can have multiple copies of state machines and feedback controllers. Which network to use is selected by an appropriateness measure possibly determined by a value system. There is biological support for distinct networks for each motion, referred to as a system of experts. Aronson suggests the central nervous system shows organization by motion patterns rather than by muscle location [162]. This is a general overview and the actual method is much more complex, but the key structure of sets of state machines and reinforcement learned feedback controllers is covered. This approach is very general to be applied in various circumstances, but fails to provide an obvious method of
generalizing to new circumstances or tasks, or to direct motion to accelerate learning when novel tasks are encountered. Ref. [160] argues that developmental progression occurs due to specific developmental reflexes, which would come from the high-level controller and is outside the scope of the structure.
Other studies have used biologically based motion centric structures. In [163], it is argued that motor control begins as a high gain system that tends to the observed developmental reflexes which disappear later in life (for example, the grasping reflex when something touches the palm). These reflexes are specific responses to sensor inputs that can be represented as vector force fields in state space. Initially, any sensor input is magnified to dominate the behavior and the force field directs the path to the observed motion. Upon maturity, the response has a lower gain and volitional control can be observed. The work in [163] proposes that force vector basis functions should be used as the representation of motor control. A similar reactive structure in [159] is shown to be sufficient to achieve useful high-level behaviors like walking, jumping or phototaxis (honing to a light). Other explanations of developmental reflex disappearance and emergence of
fine motor control deal with the process of myelination of peripheral nerves which dramatically increases the signal strength from the central nervous system. However, myelination alone cannot account for the emergence of new behaviors [164, 165].
Most methods that use neural networks use them as function approximators. However, there have been attempts to use them in other ways in a developmental sense. In [25], a recursive neural network is built from sensors to motors based on a genetic code. The genetic code is optimized for the task of moving while avoiding obstacles. The genetic code specifies how neurons are grown rather than the structure itself. Sensors and motors nodes begin in fixed locations in a grid and based on local environmental conditions nodes will 1) divide to produce additional nodes, 2) grow connections or 3) excrete chemicals. However, this method seems excessively complicated and did not produce noteworthy results. Another study used a genetic algorithm to evolve creature topology along with a neural network for control [159]. Genetic algorithms can be seen as staged learning through the use of generations, but they typically require too many trials to be reasonable for
physical systems. In [159], the agents were formed by blocks attached by various joints with each joint having an internal control law based on other joint angles to determine joint force. Both the body and controller co-evolved so that one was not excessively complicated. The creatures were given different tasks (swimming, walking, jumping and homing) in a physics based simulation. Not restricting the agent form allowed for development beyond what the designer could have created by hand. Another study sought to emulate the pattern generators in the cerebellum by using chaotic neural networks [129]. Chaotic neural networks are formed by including oscillating elements in a neural network so that the steady state becomes very sensitive to the sensor values. The network was trained to follow paths, which it could, but then performed poorly on new paths.
Another framework deals with understanding the effect of actions. The first goal is to discover self, meaning the things that you have direct control over. This is found through experimentation, which is an important part of development. Drawing from child development research, the term motor babbling indicates random commands sent to the motors for the purpose of collecting data. The model of self is based on a high correlation of sensor features to commands. A robotic arm in [14] used the time of the response to commands as a discriminating feature of what is self from what is environment. Here the features were color coded targets, and a map was built to determine the relation of motor commands to which features would move. In another static experiment with a robotic arm, blob detection was used to detect interesting regions that were then classified [166]. Mutual information measures were used to train which clusters were correlated with commands. After the shoulder and elbow were babbled, the fingers were babbled while the arm was fixed.
Using designations from the previous self clusters, the fine motions of the fingers were sufficient to identify blobs as being self or not self. This is a good example of how dividing learning into stages can make detection easier.
Swarm approaches are similar to developmental learning in that simple rules are used to generate emer- gent behavior. An approach that straddles both fields uses sets of independent nodes that can attach to form a
composite robot. The system is based on neighbors and uses hormones to generate global behavior based on local rules [167]. Different topologies such as a snake, legged (greater than 3 legs) or loop were demonstrated. However, communication and locomotion rules were hand generated. Other work using social interaction to drive development takes language as the primary concern [168, 169]. Social learning is important for large scale developmental fleets, but almost all social learning work assumes that the robot has basic motor competencies necessary for interaction and basic sensor classification for determining key features. Basic motor competencies can only be guaranteed for simple systems, such as differential drive robots. Basic sensor classification is progressing, but distinctive features are often used since general classification learning has not reached reliable performance. Because social learning has not been shown to be able to develop basic motor competencies, much of the social learning work is outside the scope of this dissertation.
3.8 Directed Learning
The goal of directed learning is to move beyond random motor babbling to trials that directly help the goal. For robots to be effective in many human tasks, they will need many degrees of freedom. However, getting sufficient global data spanning a high dimension physical system takes too much time and memory.
Besides, this is inconsistent with biological systems since they only grow proficiency with common tasks and use planning techniques for infrequent tasks [170]. Without sufficient data, some learning methods become unstable and cannot find any consistent generalization to abstract. Directed learning is seen in human development where infants first use core muscles for reaching. After that has developed, elbows, hands and then fingers are used [13]. Legs stiffen when learning to walk and balance but then relax and progress to a more energy-efficient gait [171]. Similar work has been done to stage degrees of freedom by delaying access to some actuators [166, 172, 173]. Staging degrees of freedom helps to establish the dominating relationships which are then used to help discriminate the more subtle relationships. Many function learning methods require a set scale to define modes or clusters (i.e. nodes needed for radial basis functions, clusters needed
for k-means clustering). If the scale changes, a bifurcation in the representation could occur resulting in poor similarity between the representations. By only looking for the dominant features, noise is much less likely to be picked up as a feature or obscure a true feature.
Many methods have been investigated to maximize learning rates by pursuing the state with highest error or least data [174, 175, 176, 177, 178, 179]. Unfortunately, this may try to learn data everywhere, which takes a prohibitively long time. Also, it may be asking for too much generalization. Though people possess great cognitive plasticity, there are limits showing that trying to learn too much may distract from becoming proficient in the common motions. Trying to refine the maximum error will not work for real systems due to noisy, incomplete data. Some relations cannot be represented as a function. For example in [180], a robot learns to reduce the information in an image down to 1% for classification by mutual information and support vector machines, but then forms logic tuples. It chooses the action with the least confidence at prediction.
If an action produced random results, no single logic tuple could describe it and there would still be a low confidence at prediction. Therefore the system would converge to selecting this action without learning the consistent rules of the other actions.
A better method considers the behavior of the error [16, 181]. If an action (given a certain state) is becoming more predictable, then this would be an action to refine. After a certain point, sensor noise or other errors would dominate the prediction error, showing that the function is learned as well as it can be approximated by the representation used. Other methods would then have greater learning rates and focus would be directed to them. Because an outlier among only a few data points may create a worse learning rate, there is a chance (10% to 30%) that an action is randomly selected. This added significant robustness for factors such as environmental changes. Data is partitioned to experts once a certain number of samples is obtained. The partition location is optimized, creating a demarcation justified by the data. This approach is attractive since it provides an indicator (all learning rates are stagnant or all outcome likelihoods exceed a threshold) of when a development stage is done and sensor information or motor control should be advanced.
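The following MATLAB sketch conveys the general idea of selecting actions by learning progress, with a random action some fraction of the time; it is a simplified stand-in for the published method, and the names errHist and pRandom are assumptions made for this sketch.

% Sketch (not the published algorithm) of learning-progress-based selection.
function a = selectAction(errHist, pRandom)
% errHist - cell array; errHist{i} holds recent prediction errors for action i
% pRandom - probability of a random action (e.g., 0.1 to 0.3)
nA = numel(errHist);
if rand < pRandom
    a = randi(nA);                         % occasional undirected exploration
    return
end
progress = -inf(1, nA);
for i = 1:nA
    e = errHist{i};
    if numel(e) >= 4                       % need a few samples to estimate a trend
        half = floor(numel(e)/2);
        progress(i) = mean(e(1:half)) - mean(e(half+1:end));   % error decrease
    end
end
[~, a] = max(progress);                    % favor the action whose error is falling fastest
end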
An interesting result of this method, dubbed “Intelligent Adaptive Curiosity,” is that it exhibits unprogrammed learning stages. Two results have been presented, one of a simulation of a robot moving about a box [181] and another of a robot dog on an infant’s play mat [16]. Initial motions will first be very similar to motor babbling since no predictor has been reasonably trained. Next it learns that no action often produces no environment change, giving a default representation of what the environment does on its own. Then an action will begin to be predictable (simpler ones are more likely) and it will be repeated many more times than before. At some point this focus of attention may shift to another action and possibly back after a period.
The method gives no direction to seek certain senses, yet at the end of the study the dog would ‘successfully’ behave by choosing actions that resulted in a more specific outcome. This is a result of the successes (biting a bite-able object) being rare, so after the model is generally well defined these rare events capture the most attention as it tries to learn them better.
Other directed methods can be heuristic-based. To learn on an unstable system, the controller is given highest priority to develop a good local model and second to pursue a symmetric cycle [182]. In [11], a humanoid robot ASIMO is trained how to point to locations using all the body joints. Training data is generated by having ASIMO trace a circle with the right hand and a circle or a figure 8 with the left. The inverse kinematics is learned by a recursive neural network for either set. When ASIMO was trained to both circles, he was successfully able to generate poses that pointed to novel locations with reasonable accuracy.
However, when the network was trained with the figure 8 set of data, ASIMO’s generated poses were very distorted in one axis. There is no clear explanation why one training set performed so much better than the other, showing that heuristic success often cannot be known a priori.
Numeric methods can be seen as directed learning, most often pursuing a minimization based on gradient information. In [12], simulated annealing and the simplex method are used to optimize a trajectory of animals in a physics based simulation. Simulated annealing contains a set of candidate points (or a single candidate) and allows the candidate to move based on the new cost value and a temperature parameter. When the temperature parameter is large, the candidate movement is largely undirected. As the temperature falls, motion progresses to a steepest descent. Given a sufficiently slow rate of temperature decline, the global optimum is guaranteed to be found. The simplex method is a linear optimization algorithm that considers solutions on the boundary of the feasible region, which is where the optimum must occur for a linear optimization. This directed trial-and-error method may have exponential time in the worst case, but in common practice it finds the solution in near linear time. As stated before, though numeric methods are increasing in efficiency, they typically require too many function evaluations (trials of physical systems) to be practical for robotic learning methods. This is especially true if the results have random elements requiring multiple repetitions to capture an average result and its likelihood. There are other numeric methods more applicable to trajectory generation, but they require high fidelity models of the system and are sensitive to stochastic systems. These methods employ more greedy methods like gradient descent, Newton’s method and pseudospectral methods. In order to take accurate step directions, a complete basis is sampled, so they generally scale poorly to high dimension simulations.
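A minimal MATLAB sketch of simulated annealing is given below to make the temperature mechanism concrete; the example cost function, step size and cooling rate are illustrative choices and are not taken from [12].

% Minimal simulated annealing sketch for a generic cost function.
costFcn = @(x) (x(1) - 1)^2 + 5*(x(2) + 2)^2;   % stand-in for a trajectory cost
x  = randn(2, 1);
Jx = costFcn(x);
T  = 1.0;                                       % initial temperature
for k = 1:2000
    xNew = x + sqrt(T)*randn(size(x));          % larger, less directed moves when T is high
    JNew = costFcn(xNew);
    if JNew < Jx || rand < exp(-(JNew - Jx)/T)
        x = xNew;  Jx = JNew;                   % accept improvements and some uphill moves
    end
    T = 0.995*T;                                % slow geometric cooling
end
disp([x.' Jx])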
Another method generally not thought of as a numeric method is the class of genetic algorithms. It is included in this category since it uses similar principles as numeric methods. Crossover operations can be seen as a type of simplex method where the two (or more) parents are vertices and the child is a new vertex that represents the intersection of projections of the parent points. Mutation operations are similar to the random motion in the simulated annealing. Genetic algorithms also generally perform better than exhaustive searches or undirected random walks, but still typically require too many iterations to be physically practical.
However, it has been shown to be effective in simulation for producing developmental results [159].
3.9 Function Approximators
Function approximators play a key role for developmental learning. Since the proper functions are not known, the true function can be approximated to a sufficient accuracy by a number of function approximators.
Some approximators work better than others on a given problem, but no one approximator has been shown to be always better.
In [182], a stick juggler builds a map of stick state and robot position to resulting stick state. Due to the nature of the impacts involved, this event is well characterized by a discrete event system. Continuous time systems are frequently discretized to assist analysis, but sometimes miss critical system attributes. Linear weighted regression (LWR) employs a least squares linear regression where the error is weighted by the proximity to the point where the LWR is evaluated. This allows for more accurate generalization of noisy, nonlinear data. In addition, statistical measures provide a convenient measure of uncertainty and confidence bounds that are more accurate than simple error. Many additional techniques are used to prevent ill-conditioned inversions, such as adding a diagonal matrix to the matrix to be inverted and adding random perturbations to the control to encourage data to support every direction. The learning in [182] is directed to first build a confident local model before progressing to a sustainable equilibrium point because of the instability of the system.
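A minimal MATLAB sketch of linear weighted regression at a single query point is shown below, with a ridge term standing in for the ill-conditioning safeguards mentioned above; the Gaussian kernel, bandwidth and regularizer are illustrative choices and the implementation in [182] may differ.

% Locally weighted least squares prediction at a query point xq.
function yq = lwrPredict(X, y, xq, h, ridge)
% X - m-by-n inputs, y - m-by-1 outputs, xq - 1-by-n query point
% h - kernel bandwidth, ridge - small regularizer for ill-conditioned inversions
d2 = sum((X - xq).^2, 2);               % squared distances to the query point
w  = exp(-d2/(2*h^2));                  % proximity weights
Xa = [ones(size(X, 1), 1) X];           % affine local model: y is approximately [1 x]*b
A  = Xa'*diag(w)*Xa + ridge*eye(size(Xa, 2));
b  = A \ (Xa'*(w.*y));                  % weighted normal equations with ridge
yq = [1 xq]*b;                          % evaluate the local model at xq
end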
A number of function approximation methods use basis functions, either explicitly or implicitly. Basis functions can be added in linear combinations to produce new functions. Examples include linear interpolation, where basis functions are tent functions [12]; Fourier series, where the basis functions are sinusoids; and single-layer neural networks, where the basis functions are the activation functions. A method similar to single layer neural networks is called projection pursuit regression (PPR) [183]. The approximator of the output, y ∈ R, based on m input samples of x ∈ R^n is

y = \bar{y} + \sum_{i=1}^{m} \beta_i g_i(\alpha_i^\top x)   (2)

where ȳ is a mean value (so that bases can be centered about zero), β_i ∈ R, α_i ∈ R^n are parameters of the function approximator and g_i is a scalar valued function, to be described later. In the neural network equivalent, β_i are the output weights, α_i are the input weights to the i-th node and each node has an activation function g_i(·). The difference is not structural since PPR can be seen as a special case of neural networks. For multiple output functions, (2) can be repeated for each output and concatenated to form the output vector.
A few reasons make PPR more attractive than standard neural network procedures. First, basis functions are not arbitrarily selected. They are formed from smoothed functions of the data, {α_i^T x, y}. Typical smoothed functions are created by applying low pass filters to the data. Therefore even, odd or multi-modal data can easily be modeled without needing to know the number of inflections. The amount of local smoothing is proportional to the local variation. By having more smoothing where data has significant variation, initial nodes do not try to over-fit the data by explaining variation due to an orthogonal dimension. The direction of projection, α_i, is found through a local optimization of the g_i fit. Results of the direction optimization can be rejected if it is a local optimum that results in poor performance. After a basis function is generated, only the residuals are used for subsequent determination of directions and basis functions. To keep parameters on a similar scale, the outputs are often normalized. A frequent complaint of neural networks is that they cannot be easily interpreted. PPR, however, provides both important directions and what the function of that direction looks like, g_i, given in one dimension. A downside of this method is that the iterative process is not optimal and it has not been shown to be a universal approximator. PPR easily provides a reasonable function approximation method without the need for human analysis because it determines the direction of decomposition from the data.
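The following MATLAB sketch fits a single PPR term to give a feel for the procedure; a fixed low-order polynomial is used in place of the variable-bandwidth smoother described above, and a derivative-free search stands in for the direction optimization, so this is a simplification rather than the method of [183].

% One-term projection pursuit regression sketch: y is approximated by ybar + g(alpha'*x).
function [alpha, p, yhat] = pprOneTerm(X, y)
ybar  = mean(y);
r     = y - ybar;                              % centered residuals
n     = size(X, 2);
alpha = fminsearch(@(a) residSS(X, r, a), ones(n, 1));
alpha = alpha / max(norm(alpha), 1e-8);        % scale-invariant direction
z     = X*alpha;
p     = polyfit(z, r, 5);                      % smoothed ridge function g(z)
yhat  = ybar + polyval(p, z);
end

function ss = residSS(X, r, a)
a  = a / max(norm(a), 1e-8);
z  = X*a;
p  = polyfit(z, r, 5);                         % fit g along this projection
ss = sum((r - polyval(p, z)).^2);              % residual after removing g(a'*x)
end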
Linear weighted regression and PPR are nonparametric methods in that they directly use the data to determine function form. The alternative is to assign a function form and then determine parameters to fit the data. A disadvantage to non-parametric forms is that they require all the data, which can lead to large
mathematical operations. Research is done to identify data to be removed, normally based on time or locating points with a small separation distance.
Parametric methods automatically condense the data to a set of parameters. Considering universal function approximators, neural networks are most often seen. Standard forms have emerged to exploit some good properties. Single-layer feed-forward networks are composed by having the inputs pass directly into a set of activation functions, with the network output being a linear combination of the activation function outputs. These networks can easily be trained as a linear regression on the basis function outputs. Radial basis functions (RBF) have an activation function that is a scalar function of the distance to a point, normally a bell shaped curve with the points spread throughout the domain. This provides a method of localizing the output value to a section of the domain. Sufficient function plasticity and coverage can be achieved by setting up a grid across all input dimensions. This method is used in [24] to solve arm kinematics. The mapping of joint angles, 4 degrees of freedom (DOF), to end effector position, 3 DOF, is done. The Jacobian is also determined by changing the basis functions to their derivatives in the input-output dimension considered. Because the size of the network plays a significant role in the ability to generalize to novel inputs, the appropriate network size was also investigated. Using a minimum residual criterion, nodes are added to the network until the threshold is reached. A minimum error of zero can be achieved by using a node for each sample. It was found that there was a noticeable transition point in the network size once the error threshold was large enough. This transition point gave good generalization results. Therefore the network was retrained until a stable network size was obtained.
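A MATLAB sketch of the grow-until-residual-threshold idea is shown below; centering each new Gaussian node on the worst-fit sample and the particular width and threshold are illustrative choices for this sketch, not necessarily those of [24].

% Grow an RBF network until the worst residual falls below tol.
function [C, w] = growRbf(X, y, sigma, tol, maxNodes)
C = zeros(0, size(X, 2));  w = [];
resid = y - mean(y);
while max(abs(resid)) > tol && size(C, 1) < maxNodes
    [~, idx] = max(abs(resid));
    C = [C; X(idx, :)];                        % add a node at the worst-fit sample
    Phi = rbfDesign(X, C, sigma);
    w = Phi \ y;                               % linear least squares on the node outputs
    resid = y - Phi*w;
end
end

function Phi = rbfDesign(X, C, sigma)
Phi = zeros(size(X, 1), size(C, 1));
for j = 1:size(C, 1)
    Phi(:, j) = exp(-sum((X - C(j, :)).^2, 2) / (2*sigma^2));   % Gaussian basis
end
end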
Feed forward networks can be effectively organized through Bayesian belief networks. Bayesian belief networks choose parent nodes (inputs) and then use conditional samples based on the parent nodes value to guess the probability of transitions to child states. Unfortunately, hidden states are not observed, so conditions of hidden states cannot directly be seen from the data. Training is still an open research question but practical methods exist. After a network is trained, unreliable links (those with a low conditional probability) can be
removed and the graph is used as the structure for a neural network. In [184], training data was used to create a forward kinematics model of a 1 DOF robot. The network structure also provided a graph that could be traced backwards to compose an inverse map. This is useful in the social development context for imitating demonstrated motions and for planning.
Recursive neural networks are another common choice. They are much harder to train due to local minima, but can model complex systems more compactly than single layer networks and are seen as more biomimetic. To develop a model of the inverse kinematics used in whole body motions for pointing with two hands in three dimensional space (6 DOF input and 15 DOF output), a sparse recurrent network was chosen in [11]. Two sets of training data were formed by the choice of a circular or figure 8 motion. This network worked well on circular training data, but performed poorly in one dimension with the other training set showing it is difficult to determine generalization a priori.
Classification problems are similar to function approximation in that they learn a map from inputs to outputs. Important features have been traditionally selected by the researchers and chosen for the greatest convenience. Common selections are specific colors, blobs or well defined intensity optima. In order for truly autonomous development, the agent needs to be able to determine distinguishing features autonomously.
Machine vision is common in developmental projects, and support vector machines (SVM) are used in [166, 180] to find significant features for developing an If State_a and Action_b Then State_c model. In [180], SVM are used to reduce the dimension space to 1% of the original space, thereby significantly reducing computational demands. In [166], blobs are first identified and then clustered based on hue and saturation distributions. Then mutual information of the clusters and motions are compared to classify cluster locations based on a causal relation with control commands.
The method for determining locally linearized dynamics of robots based on kinematic chains is well developed, but complicated and difficult to measure. These functions are often built by hand but then tuned online through adaptation. Note that these are no longer universal approximators but are specific to controlling
rigid bodies. There is disagreement over what to model, mostly due to questions of biological systems. It is generally accepted that biological systems contain an internal model [21]. The differences in the ability of species to create models correlate with the ability to learn novel tasks [14, 168]. However, it is not clear if humans control position or force or something else. Position based methods are discussed followed by dynamic methods.
In support of a position based approach, human motions are roughly straight with bell shaped velocity profiles [185]. Even studies on neonatal infants show multiple straight line segments [186, 187]. Stiffening of extremities when learning motions supports a posture-based approach [171]. Adults do not look at hands when reaching and babies can locate glowing object in the dark with hands [188, 189]. This suggests that even from an early age, control of position can be done without visual feedback. For these reasons, static models for positional control are learned in [11, 24].
In [118], the motion of a human grabbing a cup is recorded with a second cup placed around it. Reaching shows coordination of the shoulder and elbow joints, consistent with the independence of wrist with other motions in reaching [190]. Depending on the relative location of the second cup, the human motion would be scaled away from it, but the form would be the same and the resulting motions clustered into three distinct trajectories. This brings up a second point concerning motion primitives. It is believed that there are sets of predefined motions that are modified for novel circumstances. In [191], the claim is made that there exist learned sets of artificial force fields that are imposed to regulate motion, similar to the suggestion that the cerebellum regulates joint stiffness [192]. The result of these controllers creates a trajectory, but does not seem to explain the bell-shaped velocity. When doing a novel reaching motion, joint stiffness increases to be significant in all directions (as opposed to only away from the body as is the case when at rest). After the motion is learned, joint stiffness decreases [185].
If motion were position-based, then exocentric forces would significantly affect the path; yet this is not seen. In support of dynamic based motions, animals can respond in dynamic ways and optimize gaits in ways
based on forces [12]. Imitation on a velocity level may be more natural where the position is necessarily different [184]. Actuators, including muscles, are normally force-based and therefore could be optimized more than if they were constrained to use a position regulator. Few studies, however, are very rigorous in general-purpose force-based models. In [193], rigid body robotic equations are used apologetically with the explanation that the authors did not have the means to develop a satisfactory learner. However, parameters are fit to data and the model is able to achieve the developmental milestone of the agent identifying when something is acting on it despite having large joint forces due to gravity or motion.
CHAPTER IV
NUMERICAL PATH OPTIMIZATION
A basic developmental milestone is being able to control muscles to produce a desired motion. In addition, the motion is optimized as it becomes more familiar. Optimal control problems are challenging to solve even on simple systems. Few boundary valued, nonlinear problems afford analytical solutions because of the lack of a closed form solution to the differential equations. Kinetic systems, such as robotic linkages, can be complicated to solve because of nonlinearities and large numbers of constraints and degrees of freedom. In order to autonomously create motions, a learning method should be easily adaptable for a broad range of systems without requiring system dependent heuristics. A numeric control optimization software, DIDO, is coupled to a numeric kinetic solver, SimMechanics, within MATLAB. The kinetic model is created directly from a solid model assembly, eliminating human errors and the need for judgement. A pendulum with control saturation is tested to validate satisfaction of theoretical conditions (< 10% optimality residuals, typically < 5%). The numeric method is contrasted to a linear-quadratic-regulator (LQR) and the optimal linear state transfer. A four degree-of-freedom arm robot pick-and-place command is also optimized and realizes a 50% decrease in energy used over the traditional ramp to constant velocity maneuver. This coupling obtains near optimal solutions without intense, model specific analysis. Having a general purpose program for determining motions to accomplish tasks is a necessary step for other developmental skills, such as social interaction.
NOMENCLATURE
f(x, u, t)       State derivatives / state dynamics equations
g(x, u, t)       State or control constraints, required to be ≤ 0
H(x, u, t)       Optimal control Hamiltonian
J                Path cost
t_o, t_f         Initial and final time
T_o, T_f         Set of admissible initial and final times
u(t)             Control signal
x, ẋ, ẍ          State and its first and second time derivatives, respectively
X_o, X_f         States meeting initial or terminal conditions
ψ_i(x_i, t_i)    Cost at event i
φ(x, u, t)       Running cost
λ(t)             Optimality co-states
4.1 Introduction
Many motor-skill tasks can be viewed as an optimal control problem. An optimal control problem seeks to find control trajectories that minimize a performance function while satisfying constraints. A system of differential equations (DE) distinguishes optimal control problems from static optimization problems. How humans control and plan motion is poorly understood, but the mathematical principles involved in optimizing a trajectory are well established and can be used in lieu of biologically derived optimizations. The optimality conditions from calculus of variations are also differential equations. The lack of closed-form solutions for many nonlinear DEs impairs indirect optimization. One prevailing method discretizes time to remove the differential equation and applies nonlinear programming to the set of all states, controls and constraints
[194]. In the limit as the discretization approaches the continuous system, the Karush-Kuhn-Tucker (KKT) conditions approach the optimality conditions developed by calculus of variations [195]. Two-point boundary value problems cannot march forward in time with an arbitrary set of initial conditions. Ensuring terminal conditions are satisfied makes boundary valued problems much more complex than initial value problems. A survey of historical methods to solve optimal control problems is given in [196]. Some of the more modern techniques pose the problem as a generic optimization and use search methods such as genetic algorithms.
These heuristic methods often have limitations requiring specific knowledge of the problem [197, 198].
Modern collocation methods have been developed from areas such as computational fluid dynamics using pseudospectral techniques. Rather than rely on low-order Newton's method approximations and equally spaced points, new programs have been written which use higher order techniques and nodes at the Legendre-Gauss, Chebyshev or other polynomial roots to achieve optimal spectral accuracy. The theory has been well developed and programmed [199, 200, 201]. The program used in this work is DIDO4 written by I. Michael Ross [202]. It has been flight proven for the International Space Station and some examples of applications are shown in [203, 204]. Another common program in the literature is GPOPS5 written by Anil Rao et al., which applies similar techniques. The name of this program has changed with code structure or other issues; it is also known as GPOCS, OPENPOCS, GPOP and PSCOL [205]. Examples such as close satellite formations and variable mass reentry are shown in [205, 206, 207]. These programs implement the low-level nonlinear programming for optimal control with a high-level problem statement.
Similarly, kinetic solvers have been developed to solve rigid body dynamics. These programs incorporate physical constraints into ordinary differential equation solvers for finding the response of initial value prob- lems. A number of such programs are available with different applications such as machine design, impact reconstruction and virtual reality. A brief survey of programs for solving kinetic problems are given in [208].
At each moment in time, the geometry is analyzed to determine permissible movement directions based on kinematics. Within those restrictions, body and applied forces are used to determine the actual movement.
Using variable step solvers, very accurate solutions can be found. The program used in this chapter is SimMechanics, a blockset of Simulink, which is an add-on to MATLAB6. The methods specific to this program are explained in [209]. The numeric kinetic solver gives high-fidelity solutions without extensive time spent modeling. Example results and programming explanations are shown in [210, 211, 212, 213]. SimMechanics offers compatibility with other MATLAB programs, visualization using imported 3D models, determination of mass properties from most common solid modeling programs and simulation of friction models, such as stiction.
4Distributed by Elissar, www.elissar.biz, Monterey, CA 93942, USA
5Open source available from www.gpops.org
6The MathWorks, Inc., Natick, MA 01760, USA
Generalized numeric solvers operate very well once the problem is put in their framework. After that, the problem is in proper form and the solver can be treated largely as a black-box. The designer can then see the high-level behavior without needing to tailor an optimization to the particular geometry. Of course the disadvantage is that individual properties of a system are not exploited by the solver, so general purpose numeric solvers perform worse compared to specialized solvers. However, for a few runs to characterize the system, the general purpose solvers will often offer sufficiently accurate solutions in less time than required to compose specialized solvers. From a developmental learning context, a general purpose solver would be preferred since it only requires a high-level description of the system, rather than an understanding of the inter-relations of constraints.
The next section presents a brief statement on the vast scope of problems applicable to control optimiza- tion using a kinetic solver. In Section 4.3, the optimal control package and kinetic package will be described along with the coupling. Accuracy is validated for regulation of both a suspended and inverted pendulum for optimality in control energy in Section 4.4. Then Section 4.5 shows optimization of a pick-and-place com- mand for a four degree-of-freedom robot to demonstrate an example of a practical application. The chapter concludes in Section 4.6 by summarizing the findings and contributions of this approach.
4.2 Problem Statement
The scope of this chapter relates to rigid, three dimensional bodies with constant, finite, positive mass and moment of inertia acted on by body or joint forces. Joints are either prismatic, revolute or a combination with a finite number of degrees-of-freedom. The bodies are controlled through actuation of joints by either a specified motion or an applied force. Event constraints are imposed in the form x(t_o) ∈ X_o, x(t_f) ∈ X_f, t_o ∈ T_o and t_f ∈ T_f, where all sets are bounded. Path constraints are written in the form g(x, u, t) ≤ 0. The solution is x(t), t_o and the function u(t) satisfying the constraints and minimizing the Bolza problem7,

J = \psi_o(x_o, t_o) + \psi_f(x_f, t_f) + \int_{t_o}^{t_f} \phi(x, u, \tau)\, d\tau.   (3)
Limitations on this problem statement are that the system must be deterministic, nonsingular and completely known. Though the techniques used by DIDO may converge to an ultimately bounded set if these assumptions are violated, this claim is beyond the scope of this chapter. Spring-damper links, and other specialized links, are supported by the program, but are also outside the scope of this analysis along with screw joints and actuation states, such as motor dynamics. Having a known system means that the geometry and mass can be represented in SimMechanics and all other equations can be evaluated at points in time in MATLAB. If a valid solid model is used, the constraints and mass properties will be physically realizable. Because the system is built around ODE solvers, all the standard assumptions apply. However some violations, such as using a linear interpolation table, still give good results. All the states, controls and times must have finite, known bounds. And finally, an optimal solution must exist.
This problem fits in the broader scope of developmental learning since it deals with the ability to plan a motion based on trying individual motions, which parallels learning by experience. The solution uses a high- level, but standardized, description of the system, so the algorithm can convert the motion optimization into a rote exercise, not requiring deep cognitive understanding. Learned motions would then facilitate high-level skills which can be employed for group learning.
4.3 Method
The method presented here is decomposed into two steps. The first is to form the kinetic problem. The second is to form the optimal control problem. The two problems are coupled by using the kinetic solver to provide dynamics to the optimal control solver. By linking these solvers, the low-level development of system
of equations of motion is not required for optimizing trajectories. Solutions can display complex mechanisms that can then be exploited for system design after being verified for suitability.
7The Bolza problem is a combination of the Lagrange problem, which only considers running costs, φ, and the Mayer problem, which only considers event costs, ψ.
The kinetic problem is formulated as a kinematic chain of rigid bodies forming a manifold of admissible movement. Forces and responses are applied to determine actual acceleration. All the designer provides is the chain and corresponding mass and moment of inertia. The chain begins with a ground coordinate system that has a specified but variable relation to a rigid body's coordinate system, i.e. a joint. Joints relate differences between two coordinate systems. Revolute joints have angular differences about an axis. Prismatic joints have a translational difference along an axis. Composite joints, including universal joints, parallel constraints, screw joints and gear ratios, have combinations of revolute and prismatic joints. Coordinate systems identify positions of the center of mass, joints, sensors and actuators. Coordinate systems of a rigid body are referenced together based on a constant relative vector in a body coordinate system. The process of creating and linking rigid bodies, sensors and actuators is repeated for each link. Rigid bodies can be drawn and assembled in popular CAD programs and exported directly into SimMechanics where joints are automatically created to match assembly constraints. Each joint applies a set of constraints to the location and velocity of adjoining links. These constraints can be linearized and solved efficiently. A discussion on best approaches and considerations in implementing this solver is given in [209]. The kinetic solver has a variable error bound and can be set to a sufficient limit. Note that if the bound is too large, results will appear inconsistent for numerical optimization.
Control optimization begins with a good understanding of what is desired. A high-level statement should be made about the objective. The performance index and constraints should be composed to ensure that the goal is met. Typical constraints involve restricting the state or control. For example, a rocket launch problem would impose a constraint that the height must be above ground throughout the path, the thrust is nonnegative and less than or equal the maximum available, the amount of fuel remaining is nonnegative and no-fly zones are observed. The first is a simple bound on the state. If thrust were the control, then the
second would be a simple bound on the control. In order to measure the amount of fuel, the fuel amount could be an added state with a simple bound. No-fly zones represent a constraint on the rocket state, but may not be a simple constraint. A function of the distance to the no-fly zone or other representation of the boundary would be needed and incorporated into g(x, u, t). Next, the cost or performance needs to be composed into a running cost φ(x, u, t) and boundary costs ψ_o(x_o, t_o) and ψ_f(x_f, t_f). Running costs are best used to describe good or bad aspects of the path such as control power, separation distance, or other changing quantities. The boundary costs are better suited for assessing the suitability of the initial and final states. Once these equations are written, then the optimal control problem can be easily coded. Collocation optimization techniques only require the function values to be known at certain points. This transforms the problem from involving differential equations to a static optimization, which offers many well developed algorithms [194]. The number of nodes used, like elements in a finite-element model, should be sufficient for the solution to converge to the continuous solution but not so many as to amplify numerical error or make solving intractable. DIDO offers the use of an initial path for refinement, but does not require one.
The results from DIDO consist of the state, control, and time where each node is evaluated. In addition,
DIDO also returns the Hamiltonian and co-states used in the optimization. If DIDO determines that the solution converged, such a message will be returned; otherwise possible suggestions for debugging will be returned. Feasibility, optimality and sensibility should be checked to verify that the results are a solution.
First, verify that the constraints are met by the solution. The initial value problem ODE should be solved using interpolation between nodes for the control, and these states should be compared to the solved states.
There will likely be some numeric error, but it should be small enough for the application to show the results are feasible.
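A MATLAB sketch of this feasibility check is given below; tNodes, xNodes and uNodes are placeholders for the node times, states (one column per node) and controls returned by DIDO, and dynamics is a placeholder for a single-evaluation form of the coupled dynamics function.

% Propagate the dynamics with the interpolated control and compare to the
% states reported by the optimizer.
uOfT = @(t) interp1(tNodes, uNodes, t, 'linear');
[tSim, xSim] = ode45(@(t, x) dynamics(t, x, uOfT(t)), ...
                     [tNodes(1) tNodes(end)], xNodes(:, 1));
xAtNodes = interp1(tSim, xSim, tNodes).';      % resample the simulation at the nodes
feasErr  = max(abs(xAtNodes(:) - xNodes(:)));  % should be small for a feasible solution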
Optimality conditions, such as Pontryagin’s minimum principle and Bellman’s principle of optimality, should be checked as much as the specific problem will allow. Depending on the complexity of the problem, many of the optimality conditions may be difficult to check. In order to use the optimality conditions, the
Hamiltonian will be introduced,

H(x, u, t) = \phi(x, u, t) - \lambda^\top f(x, u, t)   (4)

where λ are the co-states and f(x, u, t) is the expression defining the state derivatives. From calculus of variations, the set of optimality conditions is

\frac{\partial H}{\partial x} = \frac{d\lambda}{dt}   (5)

\frac{\partial H}{\partial \lambda} = \frac{dx}{dt}   (6)

\frac{\partial H}{\partial u} = 0   (7)

when the problem is unconstrained. Optimality of boundary conditions and active constraints are treated in
[214]. Because these representations require partial derivatives of f(x, u, t), these optimality conditions may not be practical to find analytically.
Some simple checks for optimality exist for specific problem types. If the Hamiltonian is time invariant and the final time is fixed, the optimal trajectory will have a constant Hamiltonian. If the Hamiltonian is time invariant and the final time is not actively constrained, the optimal trajectory will have a Hamiltonian equal to zero. Another check of optimality is to compare the cost (from the initial value simulation) to the costs from other methods. If the path is similar, the cost should be similar but worse than the DIDO solution. Above all, the results should be reasonable. After being shown the solution, it should make sense. This does not mean that the optimal results could be predicted, but that the results exploit some part of the system.
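A MATLAB sketch of the constant-Hamiltonian check is given below; tNodes, xNodes, uNodes and lamNodes stand for the time, state, control and co-state arrays returned by DIDO, and runCost and dynamics for single-evaluation forms of the running cost and coupled dynamics.

% Evaluate the Hamiltonian of (4) at each node; for a time-invariant problem
% with fixed final time it should be nearly constant along an optimal path.
H = zeros(1, numel(tNodes));
for k = 1:numel(tNodes)
    f    = dynamics(tNodes(k), xNodes(:, k), uNodes(k));
    H(k) = runCost(xNodes(:, k), uNodes(k), tNodes(k)) - lamNodes(:, k).'*f;
end
Hspread = max(H) - min(H);                     % near zero indicates the check passes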
4.4 Optimality Validation
4.4.1 Setting up the problem
A simple example of a pendulum will be used to verify optimality. Consider a rod connected to a torque-controlled revolute joint with gravity acting vertically. This system is a practical choice: the model provides a pervasive baseline for control algorithms; it relates to robotic arms, rocket dynamics and wind-excited tall buildings; and it includes trigonometric functions and saturation, which are two of the most common nonlinearities. The goal is to move the pendulum from rest at an angle to straight up or down (θ = 0° in the respective figures). The parameters are given in Table 1, with a system illustration in Figure 1. The analytical model is

\ddot{\theta} = \frac{3}{m l^2} u - \frac{3 l_c}{l^2} g \sin(\theta).   (8)

Table 1: Pendulum Parameters
Assigned Properties
  Dimensions (l × w × h)            1 m × 36.9 mm × 10 mm
  Axis of rotation                  [1, 0, 0]^T
  Gravity (g)                       [0, 0, -9.81]^T m/sec^2
  Torque limits                     [-7, 7] N m
  Material (density)                Aluminium (2.710 gm/cm^3)
Derived Properties
  Mass (m)                          1.008 kg
  Moment of inertia about CG        (1/12) diag(1.026, 1.025, 0.001) kg m^2
  Distance from pivot to CG (l_c)   0.496 m
The performance function to be minimized is
J = \int_{t_o}^{t_f} u^2 \, d\tau   (9)

which would represent minimum energy input for an electric motor. For presentation, the square root of J is taken. The solution is constrained with fixed initial and final times (t_o = 0 and t_f = 1 sec) and a bounded controller (-7 ≤ u ≤ 7 N m).
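For reference, the analytical model (8) and the cost (9) can be evaluated directly in MATLAB; the sketch below uses the Table 1 parameters with an arbitrary, non-optimal control input and an example initial angle.

% Simulate (8) under an example control and evaluate the cost (9) by quadrature.
m = 1.008;  l = 1;  lc = 0.496;  g = 9.81;     % Table 1 values
pendDyn = @(t, x, u) [x(2); (3/(m*l^2))*u - (3*lc/l^2)*g*sin(x(1))];
uFun    = @(t) 2*sin(2*pi*t);                  % example (not optimal) control
[tt, xx] = ode45(@(t, x) pendDyn(t, x, uFun(t)), [0 1], [pi/3; 0]);
J = trapz(tt, uFun(tt).^2);                    % cost (9) along this trajectory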
The pendulum assembly is composed of the pendulum and a pin that it rotates about. Minor effects due to the point of rotation not being on the tip, and the different cross section are automatically calculated within the solid model and kinetic solver programs. A sensor and actuator are added to the imported kinetic model.
The model is linked to the optimal control solver via the pseudocode8 in Table 2. The SimMechanics model
8 Code is available as “SimMechanics pendulum used for control optimization” (www.mathworks.com/matlabcentral/fileexchange/28597), MATLAB Central File Exchange. Retrieved Jun 10, 2012.
Figure 1: The pendulum is either inverted or suspended by changing the direction of gravity with respect to θ = 0. The solid model used is shown on the right. It consists of the pendulum and a small piece serving as the pin of rotation.
Table 2: Pseudocode for Connecting SimMechanics to DIDO
Function Dynamics, f(t, θ, θ̇, u)
  For each node (time step, t)
    Run SimMechanicsSystem([t, t + ε], [θ, θ̇], u)
    Store output θ̇, θ̈
  Return the set of θ̇, θ̈ for all nodes
consists of the kinematic tolerances, a revolute joint for the pin, an actuator for the control input (motor), a sensor for the state derivatives and a massive body for the pendulum, as shown in Figure 2. In the pseudocode,
SimMechanics System simulates the system over the time span [t, t + ε] with a constant control input of u(t) where ε is a small number. The states given from DIDO, θ and θ˙, provide the initial conditions for the simulation which outputs the state derivatives, θ˙ and θ¨, at time t. This is repeated at each node of the solution and then the set of all outputs for all time is returned. The test solution given from DIDO is not guaranteed to be feasible, so each node is simulated in isolation. Coding the rest of the optimal control problem is presented in the user’s guide for DIDO [202]. To find an estimate of the error of the combined numeric methods, starting angles in 10◦ increments from the final state were tested for all starting angles up to 180◦.
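As a sketch only, the linkage of Table 2 might be coded in MATLAB as below. The model name pendulum_model, the practice of passing each node's state and control through base-workspace variables, the logged output name yout, and the layout of the DIDO input structure (node times, states and controls as arrays) are all assumptions made for illustration; the file-exchange code referenced in the footnote should be consulted for the working implementation.

function xdot = pendulum_dynamics(primal)
% Evaluate the state derivatives at each node by running a short SimMechanics
% simulation that starts from that node's state and holds the node's control.
eps_t = 1e-4;                                    % short horizon [t, t + eps]
N = numel(primal.nodes);
xdot = zeros(2, N);
for k = 1:N
    assignin('base', 'theta0',    primal.states(1, k));   % initial angle
    assignin('base', 'thetadot0', primal.states(2, k));   % initial rate
    assignin('base', 'u0',        primal.controls(1, k)); % constant torque
    simOut = sim('pendulum_model', 'StopTime', num2str(eps_t), ...
                 'SaveOutput', 'on', 'OutputSaveName', 'yout');
    y = simOut.get('yout');                      % logged [thetadot, thetaddot]
    xdot(:, k) = y(1, :)';                       % derivatives at the node time
end
end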
In order to compare this method with current practices, a linear-quadratic (LQ) state transition controller was also simulated on the system linearized about the equilibrium point based on the optimal linear state
Figure 2: The SimMechanics program solves the kinetic problem using relations of forces and rigid bodies.
transition. Because LQ requires a linear system, it cannot account for control saturation. The linear system and control solution are
ẋ = [θ̇; θ̈] = [0, 1; −3lc g/l², 0] [θ; θ̇] + [0; 3/(ml²)] u = Ax + Bu   (10)
Wc = ∫_{to}^{tf} e^{Aτ} B Bᵀ e^{Aᵀτ} dτ   (11)
u(t) = −Bᵀ e^{Aᵀ(tf − t)} Wc⁻¹ (e^{A tf} xo − xf).   (12)
This solution uses the linearized model to reach the final state exactly. However, because the formulation is open loop, differences from the actual nonlinear system result in a large final-state error.
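For reference, Eqn.'s 10-12 can be evaluated directly in MATLAB; the sketch below uses the Table 1 values and a 40° example start, and the use of integral and expm for the Gramian is simply one convenient implementation, not necessarily the code used for the reported results.

% Open-loop LQ state-transition control for the linearized pendulum (Eqn.'s 10-12).
m = 1.008; l = 1.0; lc = 0.496; g = 9.81; tf = 1.0;
A = [0, 1; -3*lc*g/l^2, 0];                 % suspended-pendulum linearization
B = [0; 3/(m*l^2)];
% Controllability Gramian of Eqn. 11, evaluated numerically.
Wc = integral(@(tau) expm(A*tau)*(B*B')*expm(A'*tau), 0, tf, 'ArrayValued', true);
x0 = [deg2rad(40); 0];                      % example initial state
xf = [0; 0];                                % desired final state
uLQ = @(t) -B' * expm(A'*(tf - t)) * (Wc \ (expm(A*tf)*x0 - xf));   % Eqn. 12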
The more common infinite-horizon linear-quadratic-regulator (LQR) controller can be written in the feedback form, u = −Kx. The gain, K, can be found from the set of equations
0 = AᵀS + SA − (SB + N) R⁻¹ (BᵀS + Nᵀ) + Q   (13)
K = R⁻¹ (BᵀS + Nᵀ)   (14)

which is defined by the cost function
J = ∫₀^∞ (xᵀQx + uᵀRu + 2xᵀNu) dt.   (15)
The finite-time LQR does not give results equivalent to the desired optimal control problem, because it imposes only a penalty on the final position rather than an exact constraint. The final state constraint can instead be approached by adjusting Q until the desired settling time is reached. The control weight, R, is held at 1 to scale the cost; N is set to zero, and the first element of Q is used to control settling time while the other elements are zero. If this cannot be done (as for the suspended pendulum with θo ≥ 130°), the lowest cost controller is chosen. The measure used for settling is
E = [(g/lc)(1 − cos(θ)) + (1/2)θ̇²] / [(g/lc)(1 − cos(θ0)) + (1/2)θ̇0²].   (16)

This provides a positive definite function over θ ∈ (−180°, 180°), similar to the total energy (kinetic and potential). The true energy equation was not used since, in the case of the inverted pendulum, energy would provide a false measure of settling: the kinetic energy is converted to potential energy, using momentum to compensate for the angular error. When E < 0.002 = 0.2%, the system is considered settled.
This is done on the full nonlinear simulation for accurate tuning. This method provides an LQR controller with an equivalent final time, but with a different control law.
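A sketch of this tuning loop is shown below. The sweep range for the first element of Q, the use of ode45 for the nonlinear simulation, and the hard saturation of the commanded torque are illustrative assumptions; the Control System Toolbox function lqr supplies the gain of Eqn.'s 13-14.

% Tune Q(1,1) until the LQR settles the nonlinear pendulum in 1 sec (Eqn. 16).
m = 1.008; l = 1.0; lc = 0.496; g = 9.81; tf = 1.0;
A = [0, 1; -3*lc*g/l^2, 0];  B = [0; 3/(m*l^2)];  R = 1;  N = [0; 0];
theta0 = deg2rad(40);  x0 = [theta0; 0];
E0 = (g/lc)*(1 - cos(theta0)) + 0.5*x0(2)^2;     % denominator of Eqn. 16
for q1 = logspace(0, 6, 60)                      % sweep the position weight only
    K = lqr(A, B, diag([q1, 0]), R, N);
    f = @(t, x) [x(2); ...
        3*max(min(-K*x, 7), -7)/(m*l^2) - 3*lc*g*sin(x(1))/l^2];
    [~, xTraj] = ode45(f, [0 tf], x0);           % full nonlinear simulation
    E = ((g/lc)*(1 - cos(xTraj(end,1))) + 0.5*xTraj(end,2)^2) / E0;
    if E < 0.002, break, end                     % settled within 0.2 percent
end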
4.4.2 Path evaluation
The results, condensed into cost and final error, are shown in Figures 3 and 4 for the suspended and inverted pendulum (note that the square root of J is shown). The cost of all methods was comparable, suggesting the DIDO solution is reasonable.

[Figure 3 panels: control cost √J and final settling measure (%) versus starting angle θo (deg) for DIDO, LQ and LQR, suspended pendulum.]
Figure 3: For a suspended pendulum, the cost of all methods is comparable suggesting that results are rea- sonable. However, due to saturation and geometric nonlinearities, the linear based methods fail to reach the final state for some angles, while DIDO gives consistently good results.
Surprisingly, the suspended pendulum proved to be the more challenging control problem. The open-loop LQ system is able to have high accuracy for low initial angles, but as the angle grows, it compensates for a larger-than-actual restoring force, resulting in a large overshoot. Ideally, the square root of J for the LQ controller should increase linearly, but saturation effects limit higher costs. The LQR controller has noticeably higher costs but, due to saturation, cannot achieve a 1 second settling time for large angles. If the control saturation constraint were removed, then results would continue along the previous trend. LQR is based on asymptotic settling, so there is always finite error, while path methods, like DIDO and LQ, can have negligible error.
For the inverted pendulum, DIDO very closely mimics the LQR controller for cost, with slightly lower cost and error. The unstable nature of this system makes the results of the LQ controller irrelevant due to the large final error. The unexpected feature is that the cost peaks at 120°. The decrease in cost for further starting angles does not violate Bellman's principle since the state, which includes velocity, is different than the one passed through by larger starting angles.

[Figure 4 panels: control cost √J and final settling measure (%) versus starting angle θo (deg) for DIDO, LQ and LQR, inverted pendulum.]
Figure 4: For an inverted pendulum, the LQR gives a slightly higher cost than DIDO. The unstable nature of this system makes the results of the LQ controller irrelevant due to the large final error. The cost peaks at 120 degrees due to the large initial torque required as the pendulum is extended sideways.
When the pendulum begins near sideways, a large initial torque is required to develop momentum. At larger angles, the controller passes through this region at greater speed, and therefore in less time, resulting in a slightly lower cost.
The underlying nature of the different controllers can be seen from individual paths. The trajectory for a small starting angle, shown in Figure 5, exhibits only a weak nonlinearity for the stable system. All controllers reach the desired state. Path optimal controllers, LQ and DIDO, exploit the self-righting nature of the pendulum by applying a positive torque to slow the pendulum. The LQR controller is a feedback controller, and the large initial error has a corresponding spike in control. This spike results in an overshoot of the final position, compensating for which requires more energy. As the nonlinearity increases for θo = 140° (see Figure 6), the overshoot caused by the initial spike is too large to settle in 1 sec. Increasing the path weight, Q, results in larger overshoot, compounding the problem, while a lower Q is insufficient to settle in 1 sec. The LQ controller expects to dampen a much larger restoring force (proportional to θ), resulting in toppling the pendulum the other way.
[Figure 5 panels: θ (deg), θ̇ (deg/sec) and u (N m) versus time (sec) for a suspended pendulum with θo = 40°.]
Figure 5: For a suspended pendulum beginning at a small angle, the results from numerical optimization are compared to results using an LQ and LQR controller. With this weak nonlinearity, all controllers reach the desired state. The path-optimal, LQ and DIDO controllers exploit the self righting nature of the pendulum. The LQR controller is a feedback controller and the large initial error has a corresponding spike in control.
DIDO results accurately reach the final state with a very smooth control wave. For the inverted pendulum (see Figures 7 and 8), despite similarity in cost and accuracy, the LQR controller functions very differently from the DIDO controller. The LQR features a stronger initial peak and asymptotically reaches the final state, while the DIDO controller is more balanced and directly approaches the final state. DIDO results closely resemble the LQ results, but provide the proper compensation for the system nonlinearities.
4.4.3 Optimality conditions
Results of the DIDO controller are checked for optimality according to Pontryagin’s minimum principle.
Feasibility is checked by the solution to the initial value problem with the given controller. For intra-node values, piecewise-cubic spline interpolation was used. The results were subtracted from the DIDO solution and scaled by the maximum absolute value of the state. These co-state residuals represent stationarity with
respect to the co-states (λ1, λ2). All of these were within 10⁻⁶ of zero. Stationarity with respect to the states,
[Figure 6 panels: θ (deg), θ̇ (deg/sec) and u (N m) versus time (sec) for a suspended pendulum with θo = 140°.]
Figure 6: For a suspended pendulum beginning at a large angle, the results from numerical optimization are compared to results using an LQ and LQR controller. With this strong nonlinearity, the LQ controller fails outright while the LQR approaches but does not reach the goal. DIDO is able to accurately reach the final state.
[Figure 7 panels: θ (deg), θ̇ (deg/sec) and u (N m) versus time (sec) for an inverted pendulum with θo = 20°.]
Figure 7: For an inverted pendulum beginning at a small angle, the results from numerical optimization are compared to results using an LQ and LQR controller. Even though the nonlinearity is weak, its unstable nature causes the open-loop LQ controller to not reach the final state. The feedback LQR controller takes an initially aggressive approach compared to the optimized controller.
[Figure 8 panels: θ (deg), θ̇ (deg/sec) and u (N m) versus time (sec) for an inverted pendulum with θo = 140°.]
Figure 8: For an inverted pendulum beginning at a large angle, the results from numerical optimization are compared to results using an LQ and LQR controller. The LQ controller fails again, noticeably suffering from saturation effects. The LQR controller approaches the final state asymptotically while DIDO directly approaches the final state.
which is calculated from Eqn. 5, is checked by λ̇1 − λ2 g lc cos(θ)/(l²/3) and λ̇2 − λ1, and then scaled by the maximum absolute value of the co-state derivative. These residuals were constant over each trajectory and scaled by the θo of the trajectory. The residuals for all but three paths were under 4.7%, with the largest residual being 9.1%. Stationarity with respect to the control only applies when there are no active constraints
on the control. The control optimality residual is 2u + λ2/(mgl²/3). When the control constraint is not active, the maximum residual is less than 3.0%. Figure 9 shows a surface of how the control wave changes for the inverted pendulum as a function of the starting angle, with the lower surface showing how that affected the control optimality residual. All residuals are shown in Figure 10.
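The residual calculations can be sketched as follows. The variable layout (node times t, states x, co-states lam and control u as row-wise arrays taken from the DIDO output) and the use of spline interpolation plus finite differences are assumptions made for illustration only.

function [stateRes, costateRes, ctrlRes] = dido_residuals(t, x, lam, u)
% Sketch of the optimality residuals of Section 4.4.3.
m = 1.008; l = 1.0; lc = 0.496; g = 9.81;
% Feasibility: re-integrate the dynamics using the DIDO control between nodes.
uFun = @(tq) interp1(t, u, tq, 'spline');
f = @(tq, xq) [xq(2); 3*uFun(tq)/(m*l^2) - 3*lc*g*sin(xq(1))/l^2];
[~, xIVP] = ode45(f, t, x(:, 1));
stateRes = (xIVP' - x) ./ max(abs(x), [], 2);          % scaled by max |state|
% Stationarity with respect to the states (Eqn. 5), via numerical derivatives.
lamDot = [gradient(lam(1, :), t); gradient(lam(2, :), t)];
costateRes = [lamDot(1, :) - lam(2, :)*g*lc.*cos(x(1, :))/(l^2/3);
              lamDot(2, :) - lam(1, :)];
costateRes = costateRes ./ max(abs(lamDot), [], 2);    % scaled per co-state
% Stationarity with respect to the control, only where the bound is inactive.
ctrlRes = 2*u + lam(2, :)/(m*g*l^2/3);
ctrlRes(abs(u) >= 7) = NaN;                            % active constraint: not applicable
end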
4.5 Practical Application: Industrial Robot
A pick-and-place command is a fundamental task for an industrial robot which loads machines or sorts parts. It consists of picking an object up from one pose and then placing it at another pose. A solid model
[Figure 9 panels: control (N m) and control optimality residual (N m) shown as surfaces over time (sec) and starting angle (deg).]
Figure 9: The top surface shows how the control signal for an inverted pendulum changes as a result of the starting angle. The control saturates around a 90° starting angle due to the high initial torque needed to overcome the gravity torque, which is largest there. When the control constraint is not active, the optimality residual should be zero; otherwise, their product should be negative, as is the case with these results.
[Figure 10 panels: residuals versus time (sec) for the suspended and inverted pendulum, with zoomed-in views of each.]
Figure 10: The residuals for all nodes for all starting angles show the optimality conditions are met to a reasonable numeric standard. Control residuals are shown as black boxes, state residuals are shown as green circles and costate residuals are shown as red crosses.
Figure 11: The coordinates for the four degree-of-freedom arm are shown on the left. The initial and final poses are in the middle and right, respectively.
was built and scaled similar to the outer four linkages of the Motoman SIA-20D [215] and is shown in Figure
11. This model was imported to Simulink and prepared as described in Section 4.4.⁹ A 2 sec execution time was chosen along with the initial and final poses of the pick-and-place command, as shown in Figure 11. The same cost function is used, but implemented as a vector
J = (1/(tf − to)) ∫_{to}^{tf} uᵀu dτ   (17)

where u ∈ R⁴ with each element being the torque applied to a joint. The traditional trajectory for industrial robotics is created from a ramp to a constant velocity for each link, and this method is used for comparison purposes. The optimized path is shown through intermediate poses in Figure 12 and as signals in Figure 13.
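Given logged joint torques, the normalized cost of Eqn. 17 (and the square root quoted below) can be computed with a few lines; the variable names are placeholders.

function costRoot = arm_cost(t, U)
% t: Mx1 time vector; U: Mx4 matrix of joint torques in N m.
J = trapz(t, sum(U.^2, 2)) / (t(end) - t(1));   % Eqn. 17
costRoot = sqrt(J);                             % square root used for reporting
end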
The optimized path is very dissimilar to the traditional path. Rather than transitioning directly to the final state, the arm goes vertical to reduce the control required to hold the arm up (Joint U). Also surprisingly, the upper linkage over-rotates and then rotates back (Joint R), presumably to use two motors rather than one so that the square of the control is lowered. The square root of the cost is reduced from 45.7 N m to 19.5 N m, a
57% reduction.

9 Code is available as "Control optimization of a 4DOF arm using DIDO" (www.mathworks.com/matlabcentral/fileexchange/28596), MATLAB Central File Exchange. Retrieved Jun 10, 2012.
Figure 12: The path of the arm is shown by poses.
4.6 Conclusion
This chapter shows that the challenging problem of finding optimal trajectories can be solved without specific analysis of the system by using general kinetic solvers and optimal control solvers from a CAD assembly. The methods used by these solvers are well documented and reliable. Use of these programs allows for insight into high-level planning that may not be obvious from the low-level equations. A simple case was shown to test two common types of nonlinearities: geometric and saturation. Results show that the numeric method on the full nonlinear system is able to generate motions with more accuracy and lower cost than optimal paths of simplified system models, such as the optimal linear-quadratic state transition (LQ) and the infinite-time linear-quadratic-regulator (LQR). A practical case is also shown involving a more complex system. The method is able to handle stable and unstable systems, as shown by the results and contrasted to the linearized controllers which can fail on even stable systems.
Figure 13: The optimized path is shown on the left compared to the traditional path on the right.
These generalized programs allow for more freedom and innovation in the initial design phase by solving for representative paths without a commitment in analysis to a given design. Mechanisms for reducing the cost can be identified and then incorporated into the system design. This work prepares a method to be used for autonomous optimal trajectory generation on amalgamations of modular robots. Modular SimMechanics programs could be created for the robot to automatically compose a model of its current configuration.
However, there is currently still a need to check results for feasibility and sensibility. On the other hand, for many cases the method would succeed and allow higher-level tasks, such as cooperative tasks or steps in a process, to focus on the high-level behavior, without low-level motions needing to be developed by hand.
The good performance obtained by the numerical methods also shows how reasoning and understanding are not required for planning. All that is required is a high-level description of the goal or task and the system. Rather than using heuristics to determine where to search for candidate solutions, this method simply uses perturbations to refine a solution. These perturbations can be related to local goal babbling, though incorporated by different methods. By using principles from numeric optimization, a very efficient selection of trials is tested. Only evaluations of the motions are needed, as solved by SimMechanics, but results could extend to other systems without an analytical representation, such as tests of a physical system. Because of the mathematical soundness of modern numeric methods, they can represent a near optimal comparison when the discretization of time is considered. Methods to increase the temporal resolution are presented in Chapter
VI.
CHAPTER V
INVERSE FUNCTION CLUSTERING
Finding optimal inputs for a multiple-input, single-output system is taxing for a system operator. Population-based optimization is used to create sets of functions that produce a locally-optimal input based on a desired output. An operator or a high-level planner could use one of these inverse functions in real time. For the optimization, each agent in the population uses the cost and output gradients to take steps lowering the cost while maintaining their current output. When an agent reaches an optimal input for its current output, additional agents are generated in the output gradient directions. The new agents then settle to the local optima for their new output values. The set of associated optimal points forms an inverse function from a desired output to an optimal input, via spline interpolation. In this manner, multiple locally-optimal functions can be created. These functions are naturally clustered in input and output spaces allowing for a continuous inverse function. The operator selects the best cluster over the anticipated range of desired outputs and adjusts the set point (desired output) while maintaining optimality. This reduces the demand from controlling multiple inputs, to controlling a single set point with no loss in performance. Results are demonstrated on a sample set of functions and on a robot control problem.
Nomenclature
x: Input or design variable, ∈ Rⁿ, with n > 1
Ω: Domain of x
bi,l, bi,u: Lower and upper bounds of Ω in the i-th dimension
y = f(x): Output function, ∈ R
yd(t): Desired output value
J: Cost function, ∈ R
x*: Locally-optimal value of x based on neighboring points with identical value of f(x)
hk(y): The k-th optimal-input function, k ∈ {1, ..., K}
Υk: Domain of y for hk(y)
C¹: The set of continuously differentiable functions
∇f(x): Gradient of function f with respect to x
f̂, Ĵ: Length-limited versions of ∇f and ∇J, respectively, ∈ Rⁿ
δy[i], δJ[i]: Actual change in output and cost, respectively, when going to step i
∆J: Cost reduction step, ∈ Rⁿ
∆Y: Step for increasing output, ∈ Rⁿ
βJ, βy, β∆: Saturation lengths for ∇J, ∇y and ∆J, ∈ R
βδ: Step length reduction factor, ∈ (0, 1)
σ: Armijo rule factor, ∈ (0, 1)
sat(x): if ‖x‖ ≤ 1, sat(x) = x; otherwise sat(x) = x/‖x‖
γJ: Maximum step length for cost, ∈ R
γy: Maximum step length for output, ∈ R
5.1 Introduction
Constrained optimization is the process of finding the input that satisfies required conditions and minimizes a given cost function. Robust optimization considers if there are variations in the constraint or cost functions. It may use likelihoods to minimize the expected cost, or it may use bounds to limit constraint violation or the maximum cost. An example is minimizing the weight of a bracket based on the maximum load. If the true maximal load is less than the designed maximum load, then the design may no longer be optimal. In control applications, inputs can often be easily changed so the true optimal can be achieved.
Optimization inputs could be knots describing a control waveform, used by collocation methods for optimal control [199, 200, 201], or control parameters for gain scheduling. An operator may set the speed of a walking robot while using the minimum energy gait. Existing work has solved for individual optimal points or
Pareto-optimal parameters and then interpolated between them for novel speeds [216, 217]. The danger of this method is that the Pareto-optimal fronts are not guaranteed to be continuous in the input space which can result in discontinuities when transitioning between Pareto-optimal points. This work finds a path of contiguous, locally-optimal inputs guaranteeing that transitions are smooth for a smoothly changing desired output.
A number of population based optimization methods exist, such as: genetic algorithms [218, 219], ant colony optimization [220], bacteria foraging [221] and particle swarms [222]. Each population method uses a set of agents that interact with the environment or other agents to search a large space. The essence of a swarm algorithm is that local information and interaction are used to create global behavior. For this chapter, the local information is contained by gradients and the agents act differently based on their current stage.
Explicit or implicit niching can be added so that subpopulations focus on distinct optima for multimodal global optimization [223], such as glowworm optimization [224], sharing and clearing in genetic algorithms
[225] and partially connected neighborhoods in particle swarms [226]. By using a population, the input space is efficiently searched without the need for prior knowledge. No methods presenting a multimodal approach to creating inverse, optimal functions are present in the available literature.
From a developmental perspective, inverse functions represent the ability to master a given task. Having multiple inverse functions corresponds with the mixture of experts theory since alternative solutions are retained and can be selected based on other factors. Though this chapter deals with a set of parameters,
Chapters IV and VI show how a continuous function can be represented by a set of parameters. Chapter VI additionally shows how sets of parameters can be extended to continuous functions in the limit.
The chapter begins with the formal problem statement (Section 5.2) followed by a detailed description of the algorithm (Section 5.3). Two types of examples have been tested. The first in Subsection 5.4.1 looks at
a variety of output and cost functions to examine the behavior of the agents and path of inverse functions.
Next, in Subsection 5.4.2, the method is applied to optimizing a robotic arm’s poses to increase precision.
The chapter concludes with a discussion of results.
5.2 Problem Statement
A set of K functions, x* = hk(yd) with k ∈ {1, ..., K}, is desired. Each function hk produces x* ∈ Ω ⊂ Rⁿ as a function of yd ∈ Υk ⊂ R such that yd = f(x*) and, for all x such that f(x) = yd in a neighborhood, J(x*) ≤ J(x); meaning x* minimizes J for neighborhood points with identical output values. The functions hk(y) should be continuous over their domain, Υk, which is a non-empty, open set. The closed set Ω represents the allowable input values and is defined by bounds on each individual dimension, i.e., bi,l ≤ xi ≤ bi,u. In order to have a well-defined gradient, f(x) and J(x) should be in C¹ and bounded over Ω. For the Armijo condition on step size, J(x) should also be in C². The functions J(x) and f(x) should not be constant, for optimization and inversion, respectively.
The output function, f(x), represents the mapping from the input to an output. The output represents a desired condition which is selected on-the-fly by an operator. The function J(x) represents the mapping of the input to a cost. In addition to f and J, the gradients ∇f and ∇J should be available for evaluation. The cost and output should be able to be represented solely as a function of x, meaning that no outside parameter influences either. This means that with a proper adjustment of x, then y and J can be manipulated. After selecting an inverse function, hk, then y can be controlled to be yd via x*, provided yd lies in Υk. Each of the K functions represents the optimal inverse function for a region, so Ω is divided into separate clusters representing disjoint, locally-optimal solutions. Selection of an hk would be based on considerations of Υk and the values of J(hk(yd)) over Υk.
The technique is primarily directed to multimodal optimizations over a bounded set. If the problem is convex or obvious, then this method does not exploit that knowledge. Even though it would come to
the same solution, it would search and therefore take more function evaluations than needed. Like most optimization methods, scaling can significantly affect performance, and the relative scale is assumed to be known and reasonably constant. This method finds static optima in that it does not consider the rate at which yd changes or the corresponding rate of change of x*. Once the functions hk(y) are found, the limit on the rate of change of yd can be found based on the limit of the rate of change of x, or vice versa. The inverse functions are not intended for robustness to rapidly fluctuating or unknown yd. This method is well suited for developing a human-control interface, as shown in Section 5.4.2, where the operator would select hk based on the anticipated range of yd, and then vary the desired yd with hk automatically providing the locally-optimal x*.
5.3 Algorithm
There are two major phases of this algorithm. The first is the optimization, while the second is the execution. To begin optimization, particles are distributed across Ω and begin constrained gradient descent. As particles find local, constrained optima, additional particles are created in the ±∇f(x) directions and then optimized to create a set of (x*, f(x*)) pairs which are used to create an hk(y). The final phase is the execution phase, where an hk is selected and used online as yd is obtained. Calculations in the execution phase are computationally simple for real-time processing. The overview is shown in Figure 14. This method abstracts the low-level behavior of finding optimal points in n-dimension space, allowing the operator to simply adjust a set point in real time.
5.3.1 Optimization
The principle behind swarm optimization is that a population of agents using simple, local rules can efficiently locate an optimal point. Traditional particle swarm optimizations are concerned with exploring to find the global optimum. In this work, local optima are desired and the effect of moving an agent is known by gradients, so mechanisms like particle velocity and local best are not needed for good performance. Agents
[Figure 14 block diagram: Optimization phase (initialize population; move agents to lower J(x) while maintaining f(x); check for removal or settling conditions; form cluster) and Execution phase (operator selects inverse function hk(yd); get yd; evaluate hk; move to new x*).]
Figure 14: There are two phases of the algorithm. The clusters of optima are found forming the functions hk(y). The set of hk’s are used to adapt to changing conditions, such as the operator set point, yd.
locate the optimal point along their current output contour by stepping opposite the component of the cost gradient, ∇J, that is orthogonal to the output gradient, ∇f. Using the Armijo condition on step length, the change in output can be bounded and the sequence of the cost will converge to a stationary point. The associated proofs are shown in Addendum 5.6.1 of this chapter.
To ensure that the step size is sufficiently small and prevent ill-defined directions for very small gradients, these gradients are saturated by
Ĵ(x) = sat(∇J(x)/βJ)   (18)
f̂(x) = sat(∇f(x)/βy)   (19)

where sat(x) returns x for the case ‖x‖ ≤ 1 and x/‖x‖ otherwise. When ‖x‖ ≤ 1, the vector magnitude diminishes with the gradient magnitude; while in the other case, the magnitude is limited to a unit length. The value of βJ is chosen so that the step length is diminished as an optimum point is approached. The value of βy is chosen so that when ‖∇f‖ gets small, meaning the output is relatively constant, the length of f̂ diminishes.
> ∆J (x) = Jˆ(x) + Jˆ(x) fˆ(x) fˆ(x). (20) −
63 When fˆ is a unit magnitude ( f(x) βy), then ∆J is the component of Jˆ orthogonal to f. As fˆ ∇ ≥ − ∇ diminishes in magnitude to zero, then ∆J extends to Jˆ for a greater decrease in J with only a minor change
> in y. The maximum length of ∆J is a unit length and occurs when J(x) βJ and J f = 0; otherwise ∇ ≥ ∇ ∇ the length of ∆J is J /βJ times the sine of the angle between the vectors. k∇ k
The update rule for the ith step of the jth agent is given by
m xj[i + 1] =xj[i] + γJ (βδ) sat (∆J (xj[i])/β∆) (21) where m is the lowest nonnegative integer that satisfies the Armijo condition:
m > J(xj[i + 1]) J(xj[i]) σ γJ (βδ) sat (∆J (xj[i])/β∆) f(xj[i]). (22) − ≤ ∇
This condition requires that the cost decrease by at least a fraction, σ, of expected decrease based on the local gradient and the step. If xj[i + 1] were to step past the lowest cost in the step direction, the cost would increase. By successively reducing the step by a factor of βδ, a limit cycle is prevented where the reduction in cost goes to zero, though the step length does not. The value of γJ should be small enough to capture relevant features of f(x) and J(x). The saturation length β∆ specifies the threshold when ∆J should be reduced due to either small cost gradient or alignment of the two gradients.
If this step would cause xj[i+1] to fall outside Ω, then the magnitude of the step is reduced until xj[i+1] lies on the boundary of Ω; then an additional step is taken with the domain-limited components of Ĵ and f̂ reduced to zero. The step of Eqn. 21 without the domain-limited components is repeated until no boundary violation occurs. Agents that approach within a threshold of a cluster or another agent are expected to converge to the same cluster, so one of the two agents is eliminated.
If ‖xj[i + 1] − xj[i]‖ ≤ ∆min for an agent, then the first-order, constrained optimality conditions are satisfied within a bound.¹⁰ The location of the agent can be described as having no directions that maintain
10 First-order optimality conditions provide a necessary but not sufficient condition for optimality. Constrained maxima also satisfy these conditions, but are unlikely due to the unstable equilibrium caused by the space of the first even non-zero partial derivative, in the null space of the output gradient, after an odd partial derivative being negative definite. The use of saturation and optimality thresholds does exacerbate this issue, however.
the current output and reduce the cost significantly while remaining in Ω. The jth agent is removed from the population and a new cluster is built about the location of this settled agent, x*j. This location is used as the first member, xk,0, where k − 1 is the number of previously created clusters. A cluster is grown by creating agents in the positive and negative directions of ∇f(x). Agents are created by
∆Y(x) = γy f̂(x)   (23)
xk,p+1[0] = xk,p + ∆Y(xk,p)   (24)
xk,q−1[0] = xk,q − ∆Y(xk,q)   (25)
where p and q are the previous largest and smallest indices of the cluster. Both xk,p+1 and xk,q−1 are then updated according to Eqn. 21 until settled. This progression is shown for a few steps in Figure 15. If an agent created from Eqn.’s 24 or 25 settles neither too close nor too far and the change of f(x) is in the correct direction, then it is added as a data point to the kth cluster and Eqn.’s 24-25 are used to continue the cluster formation process. If the agent fails to settle in a given number of iterations, then the new point has not been determined to be optimal and this direction of the cluster is terminated. If the agent settles too near or too far, then the cluster likely does not extend in Υ in that direction and the cluster is terminated in that direction.
By requiring a non-zero change in J(x∗) and x∗, the inverse function is guaranteed to have a finite Lipschitz constant. Clusters that contain only one point are discarded since they have an empty Υ domain. The set of
points in the k-th cluster are used to form hk. The process of creating the inverse function, Eqn.'s 21, 24 and
25, is executed while steps of the general population are suspended. Since a cluster is known to exist there, it is prudent to mark its extents before general exploration.
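The cluster-growth loop of Eqn.'s 23-25 can be sketched as below, reusing the agent_step sketch above; the settling tolerance p.deltaMin, the 150-step cap and the tip-distance limits echo the values reported in Subsection 5.4.1, and the additional check on the direction of the output change is omitted for brevity. The collected points would then be ordered by their output value before interpolation.

function cluster = grow_cluster(xStar, Jfun, gradJ, gradf, p)
% Grow a cluster of locally-optimal points about a settled agent (Eqn.'s 23-25).
sat = @(v) v / max(1, norm(v));
cluster = xStar;                                     % x_{k,0}
for direction = [1, -1]                              % positive, then negative output direction
    xTip = xStar;
    while true
        fhat = sat(gradf(xTip) / p.betay);
        xNew = xTip + direction * p.gammay * fhat;   % Eqn.'s 23-25
        settled = false;
        for i = 1:150                                % settle the new agent via Eqn. 21
            xPrev = xNew;
            xNew  = agent_step(xNew, Jfun, gradJ, gradf, p);
            if norm(xNew - xPrev) <= p.deltaMin, settled = true; break, end
        end
        moved = norm(xNew - xTip);
        if ~settled || moved < 7.5e-6 || moved > 0.1 % failed, too near, or too far
            break                                    % terminate this direction
        end
        cluster = [cluster, xNew];                   %#ok<AGROW>
        xTip = xNew;
    end
end
end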
5.3.2 Execution and cluster organization
Since the output and cost are scalars, and contain the information important to the operator, they offer a convenient representation to examine and test clusters. Plots described in this section will be shown for the following examples. By plotting the cost vs the output for the set of clusters, the operator can focus on the
Figure 15: The k-th cluster is formed about the k-th settled point, xk,0. New agents are added in the positive and negative output directions by Eqn.'s 24 and 25, shown by ×'s at [0]. New agents step by Eqn. 21 and eventually settle to locations shown by ◦'s, which are added to the cluster provided they meet certain criteria. The process of adding and updating agents is repeated until that direction is terminated. The set of settled points is then used to form hk via interpolation, as shown by the dashed line.
range of values in the output’s domain that are of interest and then select the cluster with the lowest cost, as averaged based on the operator’s judgement. To compare a set of clusters in the x domain, each dimension of x can be plotted against y. This allows access to the low-level behavior of the optimal function so the operator can identify potential problems not incorporated in the optimization, transitions between clusters or gain insight into optimality tradeoffs.
In order to construct hk(yd), a piecewise, n-dimensional, cubic Hermite spline is used that preserves the local monotonicity of the data [227]. This cubic interpolation provides a function that is C¹ and will capture sharp features of the clusters, such as those generated by interactions with the boundary of Ω, without the oscillation that smoother interpolations would generate.
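In MATLAB, one way to realize this interpolation is with pchip, a shape-preserving piecewise cubic Hermite interpolant, applied across every input dimension at once; the variable names below are illustrative.

function hk = build_hk(yCluster, xCluster)
% yCluster: 1xP outputs f(x*);  xCluster: nxP matrix of the associated optimal inputs.
[ySorted, order] = sort(yCluster);         % order the cluster by output value
xSorted = xCluster(:, order);
hk = @(yd) pchip(ySorted, xSorted, yd);    % h_k: returns the optimal input for a given yd
end

In the execution phase the operator's set point is then mapped by a single call, e.g. xStar = hk(yd).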
Optimality and accuracy of a cluster can easily be verified via a Monte Carlo simulation. Points along the cluster are generated by hk and the actual cost and output are calculated. The difference between predicted and actual output indicates the accuracy of the cluster. Perturbations from the cluster nodes are randomly generated based on the scale of the envelope of the cluster in Ω. The cost and output of each of these neighboring points are calculated. A plot of cost versus output should have the points from hk along the lower bound of the data, meaning that for any given output in Υk, all the perturbations in the input space result in a higher cost. However, if perturbations leave the neighborhood over which local optimality is defined for the cluster, then it may be possible for neighboring points to result in a lower cost, but these points would not offer a continuous function connecting them to the other points in hk.
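A sketch of this Monte Carlo check follows; the number of test points, the 10% perturbation scale, and the bound vectors bLower and bUpper are assumptions chosen for illustration.

function [accErr, clusterCost, nbrCost, nbrOut] = check_cluster(hk, f, J, yCluster, xCluster, bLower, bUpper)
% Monte Carlo accuracy and optimality check of one cluster.
nTest = 200;  nPerturb = 20;
span  = max(xCluster, [], 2) - min(xCluster, [], 2);       % envelope of the cluster
yTest = min(yCluster) + rand(1, nTest)*(max(yCluster) - min(yCluster));
accErr = zeros(1, nTest);  clusterCost = zeros(1, nTest);
nbrCost = zeros(nTest, nPerturb);  nbrOut = zeros(nTest, nPerturb);
for j = 1:nTest
    xs = hk(yTest(j));
    accErr(j) = f(xs) - yTest(j);                          % predicted vs actual output
    clusterCost(j) = J(xs);
    for r = 1:nPerturb                                     % random neighbors in Omega
        xn = min(max(xs + 0.1*span.*(2*rand(size(xs)) - 1), bLower), bUpper);
        nbrCost(j, r) = J(xn);                             % should lie above the cluster cost
        nbrOut(j, r)  = f(xn);
    end
end
end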
5.4 Results
Two different tests are presented. First, tests were conducted across a set of two-dimensional problems and analyzed graphically. At strict extrema of the output, the point is optimal because no neighboring points have the same output. Clusters extend from these extrema according to the local cost contours. Second, the poses of two robotic arms are optimized to improve precision of the radial distance to the tool tip. Precision relates to the sensitivity of distance to different joint angles. By reducing the sensitivity of the distance to joint angles, a higher precision is obtained for the same joint angle resolution of the equipment.
5.4.1 2-D test problems
Four functions were tested in all permutations as J(x) and f(x). They were
1. Multimodal Gaussian¹¹
2. Quadratic: z = x1² + (x2 − 0.2)²
3. Linear-Quadratic: z = 1.25(x1 − 0.2)x2²
4. Periodic: z = x1 sin(πx2)
11 The equation for the multimodal Gaussian is
z = [3(1 − 2x1)² e^(−4x1² − (2x2+1)²) − e^(−(2x1+1)² − 4x2²)/3 − 10(2x1/5 − 8x1³ − 32x2⁵) e^(−4x1² − 4x2²)]/7
This set included multimodal functions, broad flat sections, saddle points and some obvious solutions. The input set is Ω: {−1 ≤ x1 ≤ 1, −1 ≤ x2 ≤ 1.5}. These functions are scaled to provide gradients typically about a unit length. One set of optimization parameters was chosen and performed well on all functions, but parametric influence was not studied. Initial population size was 60, with a maximum number of population iterations of 400 and 150 steps for a cluster point. These limits were rarely reached because the population size quickly decreased. New cluster tips were restricted to be at least 7.5e-6 away from the current tip but within 0.1 of it. Agents within 0.15 of another agent or cluster triggered removal of an agent. Step parameters were βJ = 0.05, βy = 0.025, β∆ = 0.5, γJ = 0.05, γy = 0.03, βδ = 0.88, σ = 0.02.
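The four test functions can be written as anonymous functions directly from the expressions above; the central-difference gradient is only one option, since the chapter does not state how ∇f and ∇J were supplied.

% The four 2-D test functions, each usable as either J(x) or f(x), with x = [x1; x2].
quadFun  = @(x) x(1)^2 + (x(2) - 0.2)^2;
linQuad  = @(x) 1.25*(x(1) - 0.2)*x(2)^2;
periodic = @(x) x(1)*sin(pi*x(2));
gaussMM  = @(x) (3*(1 - 2*x(1))^2*exp(-4*x(1)^2 - (2*x(2) + 1)^2) ...
              - exp(-(2*x(1) + 1)^2 - 4*x(2)^2)/3 ...
              - 10*(2*x(1)/5 - 8*x(1)^3 - 32*x(2)^5)*exp(-4*x(1)^2 - 4*x(2)^2))/7;
% Central-difference gradient, if analytic gradients are not coded by hand.
numGrad = @(fun, x) ([fun(x + [1e-6; 0]) - fun(x - [1e-6; 0]); ...
                      fun(x + [0; 1e-6]) - fun(x - [0; 1e-6])]) / 2e-6;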
Two illustrative examples are shown. The first has the quadratic function for the cost with the periodic function for the output. Results are shown in Ω in Figure 16 and in ∪Υk in Figure 17. The next example has the linear-quadratic function for the cost with the multimodal Gaussian function as the output. Results are shown in Ω in Figure 18 and in ∪Υk in Figure 19. As would be expected, the clusters have tips near extrema in the output. Clusters also formed where the output becomes flat, resulting in a near discontinuity of hk, or where the cost function becomes flat, resulting in a local optimum. Other clusters are formed along the domain boundary until a valley in the cost function is reached. For a high-dimension system, an operator would look across the domain of ∪Υk, first for the cost, and then for x to determine cluster nearness if a transition must be made.
The test of a cluster from Figure 16 is shown in Figure 20. Points in the cluster were accurate to 1% of the full range of yd for all permutations of functions. When the cost of test points was compared to randomly generated neighboring points in Ω, the points from hk were consistently on the lower boundary for a given y, except when the neighboring points extended into the optimality watershed of another cluster. This is still consistent with the points being locally optimal, since the size of the neighborhood used to generate the random points was not limited based on the mathematically defined neighborhood used for optimality determination. Neighboring points were limited to Ω, however.
Figure 16: The clusters formed with the quadratic cost function and the periodic output function are shown. Different clusters are shown by a different color. Each cluster represents a locally-optimal point for the output contour.
Figure 17: The clusters for Figure 16 are shown as yd varies. Each dimension of the set of hk’s is shown along with the associated cost. The light green cluster is obviously the preferred cluster covering all of yd and always having the lowest cost.
Figure 18: The clusters formed with the linear-quadratic cost function and the multimodal Gaussian as the output function are shown. Different clusters are shown by a different color. Each cluster represents a locally-optimal point for the output contour.
Figure 19: The clusters for Figure 18 are shown as yd varies. Each dimension of the set of hk's is shown along with the associated cost. The cost chart can be used to select an appropriate cluster. If one cluster cannot be found, then the upper charts can be used to find clusters that are near, such as the red and yellow cluster going from yd < 0 to yd > 0 at x1 = −1 and x2 = −1.
Figure 20: A cluster from Figure 16 is tested for optimality and accuracy. Accuracy is checked by generating test points (in dark blue) on the cluster (in red) for random values of yd and comparing f(x*) − yd to zero. Test points in the neighborhood of x* (in light green) are compared by cost in Υk. Cluster points are always lower than the test points, except in the case where the test points extend away from the cluster and into another cluster's optimality neighborhood.
Results of these test problems show that Eqn.'s 21, 24 and 25 do capture the relevant features necessary for a locally-optimal, continuous inverse function. The constrained gradient step presented offers sufficient accuracy without the need for second-order approximations. The criteria for removing agents allow the large population to be quickly reduced, since most agents converge as they approach local optima. As a result, this is an efficient method for segmenting a large search space into inverse functions.
5.4.2 Robotic arm control
Robotic arms with revolute joints have geometric nonlinearities, so multiple poses exist with identical positions of the tool, even for non-redundant systems. The end of the tool will be considered the tip for this work. Only location will be considered, not orientation. Redundant systems have more joints than degrees of freedom, so there are automatically multiple poses with identical tip locations. The Jacobian of the tool tip location represents the sensitivity of the tool tip's position in each direction to each joint's angle and is a function of the current pose. Due to practical limitations, each joint has a limited angular resolution, so larger values in the Jacobian result in lower precision. Because of the trigonometric nonlinearities, optimizing the pose to increase precision is a multimodal problem.
This subsection examines increasing the precision of the radial distance for robotic arms with 4 and 7 revolute joints. The geometry of the linkages is based on the Yaskawa Motoman HP-3 [228] and IA-20 [215] with a 101 mm tool length. These problems reduce to a 2-dimensional and a 3-dimensional optimization. The optimization for the 7-revolute-joint robot, the IA-20, includes revolute joints whose axes are aligned with the link axes, termed twist joints. The radial distance is less sensitive to these twist joints, creating poor scaling, yet the algorithm performs well.
Planar robot
The arm considered can be viewed as a planar linkage, with the plane being allowed to rotate. The actual
HP-3 also includes two additional twist joints, but those are held at a zero angle so that the poses can be viewed in a 2D plane. The layout of the joints is shown in Figure 21. For analysis, an angle of zero represents the adjoining links being collinear. The distance to the tip is taken from the robot's base, located at the axis of
θ0. By changing θ0, any planar point at that radius can be selected. By also rotating the robot’s plane, any cylindrical point can be chosen. For simplicity, self intersections of the robot are not considered since they would violate the domain assumptions given in the problem statement.
Figure 21: The planar robot has three revolute joints. Additionally, the base rotates so that the robot's plane can be arbitrarily chosen.
The output is the square of the distance to the tool tip, found by the dot product of the tool tip position with itself.
y = RᵀR
  = L1² + L2² + L3² + 2L1L2 cos(θ1) + 2L1L3 cos(θ1 + θ2) + 2L2L3 cos(θ2)   (26)
with R being the vector to the tip and L1 = 0.290 m, L2 = 0.312 m and L3 = 0.192 m being the distances between the θ0 axis, the θ1 axis, the θ2 axis and the tool tip, respectively. The cost is the magnitude of the gradient of y squared, found by a dot product, and scaled by a factor of ten,
J = 10 ∇(RᵀR)ᵀ ∇(RᵀR)