AUTONOMOUS MOTION LEARNING FOR NEAR OPTIMAL CONTROL

Dissertation

Submitted to

The School of Engineering of the

UNIVERSITY OF DAYTON

In Partial Fulfillment of the Requirements for

The Degree of

Doctor of Philosophy in Electrical Engineering

By

Alan Lance Jennings

Dayton, Ohio

August, 2012

AUTONOMOUS MOTION LEARNING FOR NEAR OPTIMAL CONTROL

Name: Jennings, Alan Lance

APPROVED BY:

Raúl Ordóñez, Ph.D.
Advisor, Committee Chairman
Associate Professor, Electrical and Computer Engineering

Frederick G. Harmon, Ph.D., Lt Col
Committee Member
Assistant Professor, Dept of Aeronautics and Astronautics

Eric Balster, Ph.D.
Committee Member
Assistant Professor, Electrical and Computer Engineering

Andrew Murray, Ph.D.
Committee Member
Associate Professor, Mechanical and Aerospace Engineering

John G. Weber, Ph.D.
Associate Dean
School of Engineering

Tony E. Saliba, Ph.D.
Dean, School of Engineering
& Wilke Distinguished Professor

© Copyright by

Alan Lance Jennings

All rights reserved

2012

ABSTRACT

AUTONOMOUS MOTION LEARNING FOR NEAR OPTIMAL CONTROL

Name: Jennings, Alan Lance
University of Dayton
Advisor: Dr. Raúl Ordóñez

Human intelligence has long appealed to the research community; specifically, a person's ability to learn new tasks efficiently and eventually master them. This ability is the result of decades of development as a person matures from an infant to an adult, and a similar developmental period seems to be required if robots are to obtain the ability to learn and master new skills. Applying developmental stages to robotics is a field of study that has been growing in acceptance. The paradigm shift is from directly pursuing the desired task to progressively building competencies until the desired task is reached. This dissertation seeks to apply a developmental approach to autonomous optimization of robotic motions, and the methods presented extend to function shaping and parameter optimization.

Humans have a limited ability to concentrate on multiple tasks at once. For robots with many degrees of freedom, human operators need a high-level interface, rather than controlling the positions of each angle. Motion primitives are scalable control signals that have repeatable, high-level results. Examples include walking, jumping or throwing, where the result can be scaled in terms of speed, height or distance. Traditionally, motion primitives require extensive, robot-specific analysis, making development of large databases of primitives infeasible. This dissertation presents methods of autonomously creating and refining optimal inverse functions for use as motion primitives. By clustering contiguous local optima, a continuous inverse function can be created by interpolating results. The additional clusters serve as alternatives if the chosen cluster is poorly suited to the situation. For multimodal problems, a population based optimization can efficiently search a large space.

Staged learning offers a path to mimic the progression from novice to master, as seen in human learning. The dimension of the input wave parameterization, which is the number of degrees of freedom for optimization, is incremented to allow for additional improvement. As the parameterization increases in order, the true optimal continuous-time control signal is approached. If a proper parameterization is selected, all previous experience can be directly moved to the higher parameterization when the parameterization is expanded. Incrementally increasing complexity while retaining experience optimizes efficiently in high dimensions, in contrast with undirected global optimization, which would need to search the entire high dimension space. The method presented allows for unbounded resolution since the parameterization is not fixed at programming.

This dissertation presents several methods that make steps towards the goal of learning and mastering motion-related tasks without programmed, task-specific heuristics. Numerical path optimization based on a high-level system description has been demonstrated for a robotic arm performing a pick-place task. In addition, the optimal inverse function was applied to optimizing robotic tracking precision in a method suitable for online tracking. Staging of the learning is able to determine an optimal motor spin-up waveform despite large variations in system parameters. Global optimization, using a population based search, and unbounded resolution increases provide the foundation for autonomously developing scalable motions superior to what can be designed by hand.

And the Lord God doth work by means to bring about his great and eternal purposes; and by very small means the Lord doth confound the wise and bringeth about the salvation of many souls.

Alma 37:7

That which is of God is light; and he that receiveth light, and continueth in God, receiveth more light; and that light groweth brighter and brighter until the perfect day.

Doctrine and Covenants 50:24

And Jesus increased in wisdom and stature, and in favour with God and man.

Luke 2:52

ACKNOWLEDGMENTS

My sincere thanks go to the Dayton Area Graduate Studies Institute (DAGSI) and the Ohio Space Grant Consortium (OSGC) for their generous support and for recruiting me to the Dayton area.

TABLE OF CONTENTS


ABSTRACT

DEDICATION

ACKNOWLEDGMENTS

TABLE OF CONTENTS

LIST OF FIGURES

LIST OF TABLES

CHAPTERS:

I. INTRODUCTION

II. PROBLEM STATEMENT

III. DEVELOPMENTAL LEARNING

3.1 The Basis and Need for Developmental Learning
3.2 Nature of Developmental Learning
3.3 The Categorization Problem
3.4 Sensorimotor Control
3.5 Mental and Social Learning
3.6 The Progress and Potential of the Developmental Learning Paradigm
3.7 Developmental Structures
3.8 Directed Learning
3.9 Function Approximators

IV. NUMERICAL PATH OPTIMIZATION

4.1 Introduction
4.2 Problem Statement
4.3 Method
4.4 Optimality Validation
4.4.1 Setting up the problem
4.4.2 Path evaluation
4.4.3 Optimality conditions
4.5 Practical Application: Industrial Robot
4.6 Conclusion

V. INVERSE FUNCTION CLUSTERING

5.1 Introduction
5.2 Problem Statement
5.3 Algorithm
5.3.1 Optimization
5.3.2 Execution and cluster organization
5.4 Results
5.4.1 2-D test problems
5.4.2 Robotic arm control
5.5 Conclusion
5.6 Addendum
5.6.1 Convergence proof

VI. INCREASING RESOLUTION

6.1 Introduction
6.2 Problem Statement
6.3 Methodology
6.4 Test Problems
6.5 Physical Problem
6.6 Conclusions

VII. CONCLUSION

7.1 Focused Problem Statements
7.1.1 How does directed learning fit in a global or local search context?
7.1.2 How can motions be optimized based on a high-level rigid body representation?
7.1.3 How can operation of a multiple-input, single-output system be simplified when considering tracking and optimization?
7.1.4 How can equivalent, but fundamentally different motions be organized?
7.1.5 How should motion primitives be represented to facilitate incremental increases in the number of control parameters?
7.1.6 What are the limitations fundamental to staged learning?
7.2 Considerations

BIBLIOGRAPHY

Appendices:

A. PROGRAM EXAMPLES

A.1 Pendulum Control Optimization
A.2 Robotic Arm Pick-Place Optimization
A.3 Using Continuous Optimal Inverse Functions for Optimizing Precision
A.4 Testing of Continuous Autonomous Learning

LIST OF FIGURES


1 The pendulum is either inverted or suspended by changing the direction of gravity with respect to θ = 0. The solid model used is shown on the right. It consists of the pendulum and a small piece serving as the pin of rotation.

2 The SimMechanics program solves the kinetic problem using relations of forces and rigid bodies.

3 For a suspended pendulum, the cost of all methods is comparable, suggesting that results are reasonable. However, due to saturation and geometric nonlinearities, the linear-based methods fail to reach the final state for some angles, while DIDO gives consistently good results.

4 For an inverted pendulum, the LQR gives a slightly higher cost than DIDO. The unstable nature of this system makes the results of the LQ controller irrelevant due to the large final error. The cost peaks at 120 degrees due to the large initial torque required as the pendulum is extended sideways.

5 For a suspended pendulum beginning at a small angle, the results from numerical optimization are compared to results using an LQ and LQR controller. With this weak nonlinearity, all controllers reach the desired state. The path-optimal, LQ and DIDO controllers exploit the self-righting nature of the pendulum. The LQR controller is a feedback controller and the large initial error has a corresponding spike in control.

6 For a suspended pendulum beginning at a large angle, the results from numerical optimization are compared to results using an LQ and LQR controller. With this strong nonlinearity, the LQ controller fails outright while the LQR approaches but does not reach the goal. DIDO is able to accurately reach the final state.

7 For an inverted pendulum beginning at a small angle, the results from numerical optimization are compared to results using an LQ and LQR controller. Even though the nonlinearity is weak, its unstable nature causes the open-loop LQ controller to not reach the final state. The feedback LQR controller takes an initially aggressive approach compared to the optimized controller.

8 For an inverted pendulum beginning at a large angle, the results from numerical optimization are compared to results using an LQ and LQR controller. The LQ controller fails again, noticeably suffering from saturation effects. The LQR controller approaches the final state asymptotically while DIDO directly approaches the final state.

9 The top surface shows how the control signal for an inverted pendulum changes as a result of the starting angle. The control saturates around a 90° starting angle due to the high initial torque required to overcome the maximum gravity torque. When the control constraint is not active, the optimality residual should be zero. Otherwise, their product should be negative, as is the case with these results.

10 The residuals for all nodes for all starting angles show the optimality conditions are met to a reasonable numeric standard. Control residuals are shown as black boxes, state residuals are shown as green circles and costate residuals are shown as red crosses.

11 The coordinates for the four degree-of-freedom arm are shown on the left. The initial and final poses are shown in the middle and on the right, respectively.

12 The path of the arm is shown by poses.

13 The optimized path is shown on the left compared to the traditional path on the right.

14 There are two phases of the algorithm. The clusters of optima are found, forming the functions hk(y). The set of hk's is used to adapt to changing conditions, such as the operator set point, yd.

15 The kth cluster is formed about the kth settled point, xk,0. New agents are added in the positive and negative output directions by Eqns. 24 and 25, shown by ×'s at [0]. New agents step by Eqn. 21 and eventually settle to locations shown by ◦'s, which are added to the cluster provided they meet certain criteria. The process of adding and updating agents is repeated until that direction is terminated. The set of settled points is then used to form hk via interpolation, as shown by the dashed line.

16 The clusters formed with the quadratic cost function and the periodic output function are shown. Different clusters are shown by a different color. Each cluster represents a locally-optimal point for the output contour.

17 The clusters for Figure 16 are shown as yd varies. Each dimension of the set of hk's is shown along with the associated cost. The light green cluster is obviously the preferred cluster, covering all of yd and always having the lowest cost.

18 The clusters formed with the linear-quadratic cost function and the multimodal Gaussian as the output function are shown. Different clusters are shown by a different color. Each cluster represents a locally-optimal point for the output contour.

19 The clusters for Figure 18 are shown as yd varies. Each dimension of the set of hk's is shown along with the associated cost. The cost chart can be used to select an appropriate cluster. If one cluster cannot be found, then the upper charts can be used to find clusters that are near, such as the red and yellow clusters going from yd < 0 to yd > 0 at x1 = −1 and x2 = −1.

20 A cluster from Figure 16 is tested for optimality and accuracy. Accuracy is checked by generating test points (in dark blue) on the cluster (in red) for random values of yd and comparing f(x*) − yd to zero. Test points in the neighborhood of x* (in light green) are compared by cost in Υk. Cluster points are always lower than the test points except in the case where the test points extend away from the cluster and into another cluster's optimality neighborhood.

21 The planar robot has three revolute joints. Additionally, the base rotates so that the robot's plane can be arbitrarily chosen.

22 The cost function was scaled so that the magnitude of its gradient was similar to the output function's gradient.

23 The output has mostly elliptical contours, resulting in low cost along the major axis and higher costs along the minor axis.

24 The inverse functions are traced in the joint angle space and distinguished by marker type.

25 Once the application is known, the range of distances will limit the choice of clusters. The operator could also consider the distribution of the cost or other factors, such as whether a positive or negative θ1 is desirable. Marker codes match results in Figure 24.

26 Points interpolated for each inverse function of the HP-3 show the expected output with surrounding points showing higher costs.

27 Poses of the HP-3 as it traces a circle using standard planning. For reference, a circle showing the entire path is added.

28 Poses of the HP-3 as it traces a circle using an optimal inverse function. For reference, the circle showing the entire path is added.

29 Use of the optimal inverse function gave about a 25% improvement in precision over the standard pose. A joint resolution of 0.088 deg/pulse is assumed to provide a magnitude to the results.

30 The complex robot has seven revolute joints, though only θ3, θ4 and θ5 appear in the optimization.

31 The cost and output functions are ellipsoidal in θ3 and θ5, but θ4 has a lesser effect, shown by the minor changes along the θ4 direction.

32 Due to additional symmetries of the IA-20, twice as many clusters are formed for the three-dimensional optimization.

33 Points interpolated from the inverse function (dark blue) show the expected output and surrounding points (light green) show higher costs.

34 A memory-based model interacts with the system by collecting (a, y, J) triplets as needed to ensure that estimates ŷ(aq) and Ĵ(aq) have sufficient data in the proximity of the queried point, aq.

35 Optimal inputs, a*, are determined by optimization over the memory model. Sets of a* are collected and organized by yd into a reflex function. As needed, the process in Figure 34 is employed to ensure accuracy.

36 After the operator selects a desired output, the optimal input based on the current parameterization is generated by simple interpolation.

37 By creating the higher dimension function from the lower dimension function, no discontinuity is introduced at the new node location. Therefore, the higher dimension function exactly reproduces the lower dimension function.

38 One possible scheme for locating a new node is to add the node at the midpoint of the first largest interval.

39 Basis functions are scaled and offset so that they are orthonormal.

40 Beyond N = 4, results appear identical by sight. Bootstrapped, direct optimization results also appear identical.

41 Results have comparable accuracy with direct optimization.

42 The number of samples required grows exponentially with the resolution of u(t). Equations of the trend lines are 12,000 · 10^(0.031N) for direct optimization and 4,000 · 10^(0.068N) for the memory-based model.

43 As the size of the set of yd increases, fewer samples are needed for the new points with the memory-based method. Direct optimization, however, is purely linear. The line types are dash-dot, dash and solid for N = 5, 9 and 14, respectively.

44 Results when the sine frequency is tripled behave as expected and are shown for yd = 0.07, 0.38, 0.82.

45 The rate of improvement offers no guarantee of future improvement. Results when the sine frequency is tripled have a slow initial rate of improvement before performance dramatically improves.

46 The input signal controls a motor through an amplifier. A peak detector senses the maximum current, while a low-pass filter measures the final speed from a tachometer.

47 Results for ten independent runs are shown in boxplot format (box extends from 25-75% with median shown by a line). The median continues to decrease until the 5th dimension.

48 A representative set of example waveforms for the motor start-up problem are shown as the resolution increases.

49 The actual output is within the confidence interval of the expected output.

LIST OF TABLES


1 Pendulum Parameters

2 Pseudocode for Connecting SimMechanics to DIDO

3 Fundamental Algorithm of Unbounded Learning

CHAPTER I

INTRODUCTION

Robots typically offer many degrees of freedom, making it possible for them to accomplish a wide variety of tasks. However, the many degrees of freedom add a substantial burden to an operator or programmer. This dissertation presents methods to improve the interface by helping the robot to autonomously learn motions. Traditional machine learning techniques focus on solving a given task, but are then typically limited by programmed heuristics or the chosen resolution. Developmental robotics, however, seeks to create robots that learn general skills and develop over time, similar to human development. In this chapter, the traditional method of programming motions 'by hand' is contrasted with having the robot progressively learn. Staged learning allows for better performance by being able to develop over an extended period of time and eventually beyond the programmer's ability or understanding.

Robots are defined by their ability to do many different tasks, yet the ability to naturally control them lies beyond our grasp. Robots are hard to operate. For familiar tasks, the human brain abstracts the task into a motion so that attention is not diverted. Some common tasks, such as tying knots, shooting a basketball, climbing stairs or swimming, are acted out subconsciously. Despite the fad of human multitasking, studies have shown that performance significantly decreases as attention is diverted [1]. Only a very few simultaneous tasks can be consciously handled. Because robots are made to do many different tasks, they typically require many degrees of freedom, each being actively controlled. Often the human-machine interface is not natural but requires a high degree of attention. This results in teleoperation requiring singular attention and having a high rate of failure. The solution to this problem is training (such as with unmanned air vehicles), having a very natural interface (such as with surgical instruments and haptic feedback) or providing the user with a high-level control. For example, cars automatically calculate the proper timing and fuel quantity based on a high-level command coming from the accelerator pedal. The approach taken in this dissertation is to improve the interface and reduce the burden on the operator, rather than focusing on training of the operator. The research in this work has been directed toward methods by which robots can autonomously develop motions.

Having robots learn, or programming intelligence, is not a novel concept. Three key technologies provide the foundation of an intelligent robot. First, sensors, such as digital cameras, microphones and pressure sensors, allow robots to receive much of the same information that humans use. Second, the growth in computing power has made it possible to process volumes of data online for reasoning on complex problems. Finally, electro-mechanical systems offer a method for a computer to act on the decisions it makes. Together, they offer the ability to observe, reason and act. Reasoning allows for creation of system models, that is, being able to make predictions based on observations and actions. Firsthand observations allow for perfecting a system model. Acting allows the agent to direct its process of learning and shape its environment. Each of these three disciplines has had recent growth, improving the practicality of highly intelligent agents.

However, simply adding these three components does not guarantee good performance. For simple, structured tasks, a programmer could reason the appropriate behavior. For some tasks, (such as playing Jeopardy! [2]), the structure for organizing data and the methods for making decisions can be programmed with the specifics being filled in by experience. How to create general purpose intelligence for robotic agents has not been solved. The scope of this dissertation focuses on learning motions. The learned motions compose a set of primitives. For each motion primitive, the appropriate low-level command can easily be determined based on a desired objective.

The difficulty humans have manipulating complex systems results in poor performance. However, once a task can be understood and handled using high-level concepts, performance improves dramatically. Programming is an excellent example. At the machine code level, programs take tremendous time to create and require significant debugging. Compilers now are used to abstract programs to libraries of functions. Other programs create low-level programs from high-level languages, such as LabVIEW, MATLAB or web-page design. Even though high-level programs are executed as low-level instructions at some point, programmers do not have to be concerned with details, such as matrix multiplication of floating point numbers, and can focus on high-level objectives.

Motion primitives act like functions, in that these motions can be called with expected results. A motion primitive is a coordinated control law that, when executed, produces a specific outcome. Many degrees of freedom can be reduced to understandable objectives such as bulk movement. For example, efficient walking requires motion of all legs but really is a one-dimensional motion defined by a desired speed. Other primitives could be turning, jumping, interception or following. Multiple primitives could accomplish the same task, such as walking with different nuances; for instance, long strides for running, short strides for more stability, hopping, or skipping. The different versions offer alternatives if one appears unsuited for the situation at hand. With a library of motions, an operator could manage the system at an objective level, rather than controlling each joint, with alternative inputs for different considerations.
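To make the function-like view concrete, the sketch below represents a primitive as an object mapping a single high-level objective, such as desired speed, to coordinated commands for several joints. It is a hypothetical illustration (the class, its names and the toy 'walk' mapping are invented here), not the representation developed in later chapters.

```python
# Hypothetical sketch (not the dissertation's implementation): a motion
# primitive behaves like a function, mapping one high-level objective
# (e.g., walking speed) to coordinated commands for many joints.
import math
from typing import Callable, List, Tuple


class MotionPrimitive:
    """A scalable motion: one objective in, coordinated joint commands out."""

    def __init__(self, name: str,
                 inverse_fn: Callable[[float, float], List[float]],
                 objective_range: Tuple[float, float]):
        self.name = name
        self.inverse_fn = inverse_fn        # learned inverse mapping
        self.lo, self.hi = objective_range  # objectives this primitive covers

    def covers(self, objective: float) -> bool:
        return self.lo <= objective <= self.hi

    def command(self, objective: float, t: float) -> List[float]:
        """Joint commands at time t for the motion scaled to the objective."""
        return self.inverse_fn(objective, t)


# A toy two-joint 'walk' whose stride amplitude scales with desired speed.
walk = MotionPrimitive(
    "walk",
    inverse_fn=lambda speed, t: [speed * math.sin(2 * t),
                                 speed * math.sin(2 * t + math.pi)],
    objective_range=(0.0, 1.5),
)
print(walk.covers(0.5), walk.command(objective=0.5, t=0.1))
```

With a library of such objects, an operator issues one number per primitive rather than one command per joint.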

Work has been done in creating motion primitives in the robotic community, but mostly in 'by hand' types of approaches. The system and motion are analyzed by an engineer (or a team of engineers) who composes a trajectory or control law that gives sufficient behavior. The engineer normally has an idea of what would succeed and pursues it. The reason these are done by hand is that a solution to the task is desired, and the 'by hand' method is the most effective and reliable method to date. Here are some examples: Shigeo Hirose et al. of HiBot created a robot to inspect high voltage power lines that uses a motion learned from an operator to pass obstacles it encounters [3]. Hirose has also done other work to develop snake robots, walking robots and combinations of wheeled and crawling robots [4, 5]. Bojan Nemec from the Jozef Stefan Institute created a controller for autonomous skiing [6], but the question-and-answer at the presentation at the IEEE/RSJ International Conference on Intelligent Robots and Systems is telling of the challenge of 'by hand' control design. "Can it stop? Yes, sort of. What would it need to be able to compete with humans in a downhill skiing race in, say, 2015? [Pause.] I think people are expecting too much from this robot."1 The process of opening a heavy door was addressed by Hitoshi Arisumi of Japan's National Institute of Advanced Industrial Science and Technology [7]. Many hexapods have been created that use one central motor with linkages that produce a fixed walking motion. The investment in design and tuning shows how individually producing a large number of motion primitives 'by hand' is infeasible. The proposed solution is autonomous learning.

In order to learn autonomously, the system must be able to plan and conduct experiments without outside involvement. The system must be able to decide which inputs should be tested in order to develop. It must have internal metrics for evaluating its progress in learning. Results should be automatically organized in a useful fashion, so that high-level planners can take advantage of learned motions without needing human interpretation. Therefore, autonomous learning is best suited for objective tasks, such as locomotion, rather than subjective tasks, which are best learned in a social context. The obvious advantage of autonomous learning is that no human trainer is required. More subtle advantages include not being affected by biases in human trainers and better sharing of learned skills among similar robots, because the group can coevolve.

Staged learning draws inspiration from the biological developmental cycle. Biological systems often cannot immediately solve novel complex tasks. However, when the task is similar to previous tasks, a solution can be found. Looking from the other direction, by learning to develop solutions for simple tasks, solutions to related but more complex tasks will come naturally. Staged learning begins with a restricted search space. The small space is used to identify productive areas, meaning regions that offer repeatable, useful motions. The standard example is learning to crawl and walk before running. Dynamic actions like running offer a smaller set of satisfactory inputs. So balance is developed by standing or hopping in place, with results then extended to dynamic actions. The advantages of breaking the learning process into smaller tasks can be seen from how linearization is used in nonlinear programming. Even though the global behavior is nonlinear, reducing the problem into a linear optimization allows for using a set algorithm that incrementally progresses to the optimum. A staged learning algorithm steps toward a refined motion in small increments to balance the complexity of the learned skill with the current proficiency and experience. Motion primitives are an efficient representation for staged learning. Learning methods can act on a set of motion primitives to create a more refined set of motions or create more elaborate motions through concatenations.

1 Anne-Marie Coreley, "Skiing Robot Races Down Slope," IEEE Spectrum, www.spectrum.ieee.org/automaton/robotics/robotics-software/skiing-robot-races-down-slope/, Oct 16, 2009, retrieved Jun 9, 2012.

When searching for better solutions, there are two fundamental approaches. The first is to take an existing solution and continue to improve it. Performance can often be improved by slightly adjusting the motion. Larger improvement is gained by the accumulation of slight improvements. This approach is based on local optimization. The second involves global search. The high jump was traditionally dominated by scissoring the legs, bringing the other up after the first crossed. Then, in the 1960s, Dick Fosbury created the Fosbury flop of crossing the back and head first, which quickly dominated high jumping due to its improved performance [8]. The Fosbury flop represented a disjoint solution in the solution space. It cannot be achieved directly from improving the scissor approach. By the same token, it is not an obvious solution, so a general purpose optimization would require extensive searching to find it. Most global optimizations balance the need for focusing solutions and increasing diversity.

'By hand' approaches incorporate the designer's intuition. If the prevailing intuition is inefficient, then the best motion will remain elusive until the appearance of a fundamentally different solution. Autonomous learning algorithms address situations where biomimetics do not apply and humans do not have an intuition for the solutions. The Martian rover, Spirit, was stuck in a sand trap for months while engineers on Earth attempted to propose a motion that would free the rover. Staged learning could allow the rover to first develop movements that create motion, then find concatenations that produce the most productive motion.

The contribution of this dissertation is a step towards mimicking biological development. Its scope lies in autonomous improvement through the process of training. Therefore, the scope does not extend to the synergistic benefits of tutoring or group learning. The focus is on incremental improvement over a long period of time. This is achieved by application of a number of numeric methods in a developmental sense. Specifically addressed are how to represent motions in such a way that they can be modified for a given objective so as to provide a reflexive result, and how the learning process balances the complexity of the search space with the ability to continue to develop. With the foundation of individual learning, group learning can benefit by the injection of new solutions. Ultimately, staged learning would provide a method for motions to be perfected beyond the programmer's ability or understanding.

Chapter II addresses the research questions. A review of related work is given in Chapter III. Chapter IV presents a method of abstracting optimal robotic control to a general trajectory optimization. This method allows the designer to work in terms of linkage geometry and obtain trajectories without derivation of low-level equations of motion. Next, Chapter V shows how a swarm optimization efficiently searches a global search space in a way that lends itself to creation of optimal inverse functions. Staged learning is presented in Chapter VI, where motions begin at a limited parameterization and progress to an unbounded resolution. The dissertation is concluded in Chapter VII.

CHAPTER II

PROBLEM STATEMENT

The central questions of developmental robotics deal with how to enable a robot to stage learning toward more complicated tasks. This dissertation focuses specifically on developing near optimal motions. The specific questions addressed throughout this dissertation are enumerated and discussed. Considerations for comparing different learning methods are presented, including whether local or global search is performed and how additional data is able to result in better performance. Representations used for motions also have a significant effect on learning, and how to identify their advantages is addressed by the research questions. The questions posed in this chapter provide a map to progress towards the goal of having a robot learn and master new tasks.

The overarching goal of this dissertation is to answer, even if partially, the question of how to match the ability of animals to grow from novice to master on motion-related tasks. This would help solve some aspects of the problem of how to create general-purpose intelligence. Despite remarkable achievements, the solution has been elusive, although some principles are becoming clearer. With growing acceptance, the process of learning for general purpose systems is seen as a long-term process, rather than an isolated training phase. Also, staging of the learning offers the advantage of directing the learning in an unobtrusive manner. In order to pursue the overarching goal, the principles of long-term learning and staging have been applied to form the principal problem statement of this dissertation.

How can autonomous staged learning be used to develop near-optimal motion primitives in high dimensions?

To support the principal problem statement, additional problems are enumerated to provide more focused steps:

1. How does directed learning fit in a global or local search context?

2. How can motions be optimized based on a high-level rigid body representation?

3. How can operation of a multiple-input, single-output system be simplified when considering tracking and optimization?

4. How can equivalent, but fundamentally different motions be organized?

5. How should motion primitives be represented to facilitate incremental increases in the number of control parameters?

6. What are the limitations fundamental to staged learning?

This chapter elaborates on these questions and their relationship to the principal question.

Global search seeks to find solutions wherever they exist in the solution space. Local search simply tries to find a better solution in its neighborhood. The advantage of global search is that it provides more choices. However, humans are adversely conditioned by too many choices [9]. As shown by Schwartz et al., additional choices (after a certain threshold) add regret for unused good solutions and increase expectations, leading to lower satisfaction with the chosen solution. Though the disadvantages of too many choices are presented from a psychological point of view, it is useful to consider that this mechanism is present in intelligent systems, tending them toward simple solutions. From an algorithmic point of view, the additional choices increase the search space exponentially. Depending on the representation, the increase in solutions above a threshold may not match the growth in the search space, thus decreasing the likelihood of finding a good candidate. Local optimization limits the search space to the local neighborhood. If the search space is Lipschitz, then local gradients provide the most efficient directions for search. Because global search offers the benefit of finding disjoint solutions, and local search has increased efficiency, both should be employed to some extent when directing learning.
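As a minimal illustration of combining the two approaches, the sketch below seeds several random starting points (global exploration) and refines each with gradient steps (local efficiency). The one-dimensional, double-well cost is invented purely for illustration and is not a problem from this dissertation.

```python
# Minimal sketch: random multistart (global) plus gradient descent (local).
import random

def cost(x):
    # Double well: a local minimum near x ~ 1.13, the global one near x ~ -1.30.
    return x ** 4 - 3 * x ** 2 + x

def grad(x, h=1e-6):
    # Central-difference gradient; smooth costs make this well behaved.
    return (cost(x + h) - cost(x - h)) / (2 * h)

def local_descent(x, step=0.01, iters=2000):
    for _ in range(iters):
        x -= step * grad(x)
    return x

random.seed(0)
starts = [random.uniform(-2.0, 2.0) for _ in range(8)]   # global exploration
optima = [local_descent(x0) for x0 in starts]            # local refinement
best = min(optima, key=cost)
print(round(best, 3), round(cost(best), 3))
```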

Motion primitives are based on a trajectory control problem. The general trajectory control problem is to find a set of control signals U such that each element, u(t), satisfies g(x, u, t) ≥ 0 given that ẋ = f(x, u, t), where ẋ = dx/dt. The application is directed to physical systems, where u(t) represents a control input. The function f(x, u, t) represents system dynamics and g(x, u, t) represents a set of trajectory constraints. Constraints can be restrictions on states, rates of change, the input, or running measures like energy or fuel used, or functions of a combination of these. Constraints can also be used to specify output quantities, such as progressing to x1 ≥ 3 or to exactly x1 = 3. Though trajectory control is selected as the inspiration for this dissertation, results extend to other areas, such as function shaping, function approximation and static optimization.
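The sketch below illustrates this formulation under assumed toy dynamics and constraints; the particular f and g are invented for illustration. It integrates ẋ = f(x, u, t) with forward Euler and reports whether a candidate u(t) keeps g(x, u, t) ≥ 0 over the whole trajectory.

```python
# Minimal sketch of the trajectory-control setting under assumed toy
# dynamics: check whether a candidate control u(t) keeps g(x, u, t) >= 0
# while integrating x_dot = f(x, u, t) with forward Euler.
import math

def f(x, u, t):
    # Assumed dynamics: a damped state driven by the input.
    return -0.5 * x + u

def g(x, u, t):
    # Assumed constraints: bounded state and bounded input.
    return min(2.0 - abs(x), 1.0 - abs(u))

def feasible(u_of_t, x0=0.0, T=5.0, dt=0.01):
    x, t = x0, 0.0
    while t < T:
        u = u_of_t(t)
        if g(x, u, t) < 0:      # constraint violated at time t
            return False, t
        x += dt * f(x, u, t)    # Euler step of x_dot = f(x, u, t)
        t += dt
    return True, T

print(feasible(lambda t: 0.8 * math.sin(t)))   # satisfies g >= 0 throughout
print(feasible(lambda t: 1.2 * math.sin(t)))   # input bound eventually violated
```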

In cases where multiple solutions offer the same desired output, a cost function should be considered to rank solutions. It is assumed that at least one solution exists; proving the absence of a solution is a difficult challenge and beyond the scope of this work. The members of the set U should be partitioned into clusters such that each cluster has no more than one optimal candidate for a given output2. In addition, clusters should be contiguous as the desired output changes. Each cluster represents a motion primitive where a variable output scales the motion for different tasks or situations. If the system were linear, then scaling would simply be linear; but in general, nonlinear motion primitives would be expected. The cluster structure is advantageous over other methods, such as only keeping Pareto-optimal results, since it allows for fundamentally different solutions to be used when a primary solution fails.
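A hypothetical sketch of this cluster structure follows; the two clusters, their coverage ranges and cost curves are invented for illustration. Given a desired output yd, the cheapest covering cluster is chosen first, and the remaining covering clusters are kept as fallbacks for when the primary solution fails.

```python
# Hypothetical sketch: each cluster is an inverse function h covering a
# contiguous range of outputs, with an associated cost curve.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Cluster:
    h: Callable[[float], List[float]]   # inverse function: y_d -> input x*
    cost: Callable[[float], float]      # cost of the optimum at y_d
    y_lo: float                         # contiguous output coverage
    y_hi: float


def ranked_choices(clusters: List[Cluster], y_d: float) -> List[Cluster]:
    """Clusters covering y_d, cheapest first; the rest serve as fallbacks."""
    covering = [c for c in clusters if c.y_lo <= y_d <= c.y_hi]
    return sorted(covering, key=lambda c: c.cost(y_d))


# Two fundamentally different solutions reaching the same outputs.
clusters = [
    Cluster(h=lambda y: [y, +1.0], cost=lambda y: 1.0 + y * y, y_lo=0.0, y_hi=2.0),
    Cluster(h=lambda y: [y, -1.0], cost=lambda y: 2.0 - y,     y_lo=0.5, y_hi=2.0),
]
choices = ranked_choices(clusters, y_d=1.5)
print([c.h(1.5) for c in choices])  # best first, alternatives follow
```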

Learning methods employ techniques to explore and generate knowledge from a problem's topography. The representation of the variables in a problem often significantly affects how the method will behave. It has been shown that initial choices for learning can affect the final motions learned and almost always affect the rate of learning [10, 11, 12]. The progression of learning in humans is from core muscles to extremal joints [13]. In the work of Alexander Stoytchev, core muscles were advantageous since they produced the most motion and therefore the most reliable response [14]. For this reason, consideration of the representation used is as critical as the algorithm.

2 In the case where U contains one member, U is itself a cluster with the sole member being the optimal candidate.

Though staged learning offers advantages, few existing methods in the literature stage the development. Most often staging is done by using primitives with a high-level planner [15]. Activities used to identify systems can be viewed as stages of learning [14]. Only one work has autonomous staging where attention is directed to productive learning, but the method cannot advance beyond its programmed representation [16]. This means that once the programmer determines the resolution, the level of possible performance is fixed. For autonomous motion learning to be more effective than preprogrammed methods (such as 'by hand' approaches), it must allow for unbounded learning. For this reason the representation must not be bounded at programming, but should approach the set of continuous functions in the limit. In addition, the system should autonomously determine when it is ready for the higher dimension representation.
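The sketch below shows one way to meet both requirements, assuming a piecewise-linear control parameterization (in the spirit of Figures 37 and 38, though not their exact implementation): a node inserted at the midpoint of the largest interval leaves the signal unchanged, so all previous experience remains valid at the higher dimension.

```python
# Minimal sketch of expanding a parameterization without losing experience:
# a piecewise-linear signal gains a node at the midpoint of its largest
# interval, and the refined signal reproduces the old one exactly.
import bisect

def insert_midpoint(times, values):
    """Add one node at the midpoint of the first largest interval."""
    widths = [b - a for a, b in zip(times, times[1:])]
    i = widths.index(max(widths))
    t_new = 0.5 * (times[i] + times[i + 1])
    v_new = 0.5 * (values[i] + values[i + 1])     # lies on the old segment
    return (times[:i + 1] + [t_new] + times[i + 1:],
            values[:i + 1] + [v_new] + values[i + 1:])

def u(times, values, t):
    """Evaluate the piecewise-linear signal at time t."""
    i = max(1, bisect.bisect_right(times, t)) - 1
    i = min(i, len(times) - 2)
    w = (t - times[i]) / (times[i + 1] - times[i])
    return (1 - w) * values[i] + w * values[i + 1]

times, values = [0.0, 1.0, 2.0], [0.0, 1.0, 0.5]
times2, values2 = insert_midpoint(times, values)
# The refined parameterization reproduces the coarse signal everywhere.
assert all(abs(u(times, values, t) - u(times2, values2, t)) < 1e-12
           for t in [0.0, 0.3, 0.9, 1.4, 2.0])
```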

Some low-dimension parameterizations that can be expanded to continuous functions in the limit may not provide a direct relation as the parameterization changes. For example, based on the domain considered, coefficients of lower order polynomial approximations do not necessarily appear in higher order approximations. A quick example can be seen approximating f(x) = x² over the range 0 ≤ x ≤ 1. The first three least squared error polynomial approximations are

f̂1(x) = 1/3 = 0.333
f̂2(x) = −1/6 + 1x = −0.167 + x
f̂3(x) = 0 + 0x + x² = x².   (1)

The sequence of parameters is [1/3(, 0, 0)], [−1/6, 1(, 0)] and [0, 0, 1]. None of the terms carry over or are even close to previous approximations. The fastest learning rate would be expected when the previous and new optimal results are closest in the new parameterization.
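The coefficients in Eqn. (1) can be verified numerically. The sketch below (assuming numpy is available) solves the continuous least-squares normal equations over [0, 1], where the Gram matrix has entries ∫ x^(i+j) dx = 1/(i+j+1) and the right-hand side entries are ∫ x^(i+2) dx = 1/(i+3).

```python
# Check of Eqn. (1): continuous least-squares fits of f(x) = x^2 on [0, 1]
# with monomial bases, via the normal equations.
import numpy as np

def ls_poly_coeffs(n):
    """Coefficients [c0, c1, ...] of the degree-(n-1) LS fit to x^2."""
    G = np.array([[1.0 / (i + j + 1) for j in range(n)] for i in range(n)])
    b = np.array([1.0 / (i + 3) for i in range(n)])
    return np.linalg.solve(G, b)

for n in (1, 2, 3):
    print(n, np.round(ls_poly_coeffs(n), 4))
# 1 [0.3333]          -> f1(x) = 1/3
# 2 [-0.1667  1.   ]  -> f2(x) = -1/6 + x
# 3 [ 0.  0.  1.]     -> f3(x) = x^2
```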

This dissertation addresses the problems enumerated in this chapter for progress in achieving efficient and effective staged learning. Local search offers the best chance of immediate improvement while global search organizes an exploration to find alternatives. For autonomous learning, the description of the system should be as intuitive as possible. By utilizing high-level representations, such as rigid body dynamics, with general purpose optimization, the learning process can be automated, as shown in Chapter IV. The high-level representations are more intuitive for human designers than using complicated low-level equations of motion, and can help automate the design process if a human is not involved. The basic definition of a primitive implies that it can be used for multiple purposes. By composing continuously variable primitives, solutions are better suited for changing tasks or environments. This is demonstrated in a global search sense in Chapter V and a local search sense in Chapter VI. Using inverse functions as primitives provides a simplified interface for operators, compared to regulating and optimizing a multiple-input system. By autonomous training, the system learns and refines the set of functions available to the operator. As presented in Chapter VI, learning can be staged by limiting the search space from all continuous functions (which requires an infinite parameterization) to a finite parameterization and then increasing the dimension of the parameterization to approach the set of continuous functions. Both the representation and organization are investigated for improved learning efficiency. By investigating the limitations of staged learning, performance bounds can be gauged.

CHAPTER III

DEVELOPMENTAL LEARNING

Rather than try to learn skills directly, developmental learning approaches seek to create general purpose learning systems inspired by the human developmental process. The premise is that intelligent behavior can only be achieved through the course of experience, and attempts to circumvent this approach will lead to a plateau limiting performance. There are many facets of applying human developmental principles; some of the common examples are categorization, sensorimotor control and social learning. These are relatively simple tasks for people, yet challenging for an autonomous robot. A survey of research in this recent field is presented with a focus on describing the concepts and the tools used. While this dissertation provides only one component for a general purpose learner, it can be combined to support or benefit from other work in this field. The breadth of structures employed is presented, especially as related to function approximation. Function approximation can be seen as a form of learning, and methods from developmental learning are compared to numeric methods. This review is intended to give sufficient depth that the paradigm of developmental learning can be understood as applied to motion optimization in later chapters.

3.1 The Basis and Need for Developmental Learning

Developmental learning methods are staged so that learning begins at simple tasks and builds up to more complex or challenging tasks. This relatively new approach was inspired by the developmental milestones in biology. Some research laboratories have adopted this approach and it is seen more often at conferences and in journals.

Developmental milestones result from incremental progress on low-level tasks which enable high-level accomplishments. Models of self and environment and the ability to plan, categorize and use tools evidence cognitive development. Gross motor skills (large, powerful motions such as walking) and fine motor skills (precise motions such as picking up beans) are examples of sensorimotor development (coordination of sensory input and motion). Attempts to circumvent the developmental learning process result in an algorithm proficient in its task, but one that cannot build on itself. The premise of developmental robotics is well characterized by a quote from A. Stoytchev:

Our basic research hypothesis is that truly intelligent robot behavior cannot be achieved in the absence of a prolonged interaction with a physical or a social environment. In other words, robots must undergo a developmental period similar to that of humans and animals. [17]

The final goal would be to have a robot that is at least as competent as an adult at being able to solve physical (and possibly mental) problems autonomously, be trained by people or robots, and adapt for reasonable failures. Staged learning is attractive since, if the robot can autonomously develop, it can work in novel environments and beyond the designers' understanding [18]. This chapter surveys the principles of this approach, the applications and the tools used.

Humans are the primary examples of intelligence from a developmental process, but development is studied in other species as well [19]. Psychology plays a prominent role in the field and two theories are often cited. Jean Piaget proposes an individual learner [20, 21], while Lev Vygotsky proposes development resulting primarily through social interactions [22]. Representation of the foundational knowledge gained in the developmental process affects the ability to learn, and representations better suited for learning are being researched. Eleanor J. Gibson proposed the foundational theory on how people hold models of what objects can accomplish, called the theory of affordances [23]. Biologists see developmental robotics as offering prospects for understanding biological development, but this dissertation focuses on improving robot intelligence.

Traditional robotic controllers (such as state-space or fuzzy-logic) typically fail to provide a powerful, general framework, and thus designs are difficult to adapt to novel tasks [24, 25]. Traditional controllers typically perform poorly on poorly understood systems, such as intractably complex systems or unknown physics, such as flapping wing dynamics. Telepresence has limitations in that operation is mentally fatiguing [26]. Humans can typically only hold attention on one hand at a time, so operating many degrees of freedom in unfamiliar ways is beyond most people's ability [11].

The developmental approach is partly a response to the traditional artificial intelligence (AI) approach, which believed that with sufficient knowledge (in the form of logic rules), an agent would be able to outperform any of the designers in any future task [27]. This rule-based, general-purpose approach proved complicated, slow and fragile. Later, fuzzy rules and heuristics were applied with more success. The seeds from traditional AI can still be seen in some developmental robotic methods which try to use rules for modeling and planning. However, heuristic methods applied with slight changes can surprisingly result in much worse performance [11]. By imposing heuristics on the problem, the algorithm does not develop its own heuristics, and good performance is limited to the correctness of the heuristic. Therefore, the agent should be able to acquire its own understanding of the environment rather than the designer's understanding [28].

Heuristics are powerful in the real world, and an intelligent agent must be able to infer good strategies to deal with complexity.

This chapter follows with a description of developmental learning in Section 3.2. Three divisions are made in the field: categorization, sensorimotor control, and mental and social learning, covered in Sections 3.3 to 3.5. An assessment of the field as a whole is given in Section 3.6. Structures employed in developmental learning algorithms are the subject of Section 3.7. Methods to direct the learning efficiently are discussed in Section 3.8. Finally, learning can be seen as creating a function approximation, so methods outside of developmental learning are compared and contrasted in Section 3.9. The chapter concludes by drawing developmental learning principles into the framework used in this dissertation.

3.2 Nature of Developmental Learning

One of the challenges of artificial intelligence research is that there is no precise definition of what intelligence is. Most definitions of intelligence involve making good decisions based on previous experiences. Within this definition, there are three components: decision, goodness and experience. An intelligent system must be able to make a decision to manifest its intelligence. Besides the ability to act, this could be a classification decision or internal decisions. Note that this definition would classify adaptive controllers, if they have proven stability for their application, as intelligent. However, if their application changes, then they may no longer be stable; and so there should be some measure of breadth of applications or enhanced ability when confronting novel problems.

A sensor array that can selectively route signals, has internal states or adapts how it fuses data may act intelligently. What is 'good' might be able to be put into words or might simply be a mathematical compulsion affecting goals. What is 'good' does not need to be constant, and research is being done to use a value system to direct learning, which in some ways mimics emotions. This abstracts the pursuit of knowledge to higher levels, allowing for more mature development. Since absolute logical validity is not required, experience should be sufficient for initial formation of knowledge, often distinguished as on-line learning.

Some plasticity is required to change or refine rules or respond as the environment changes, but too much will prevent learning subtle rules. This is consistent with research demarcating limits to human learning based on plasticity. Experience, and not the designer's understanding, should drive the agent's understanding.

A premise of robotics is that the exact application is not known to the robot designer, so designs should be general purpose. The structure inherent in the experience shapes the rules learned by the system from general purpose ones to ones formed around that structure.

The novel aspects of developmental learning in artificial intelligence come from how concepts of developmental psychology are applied. Some methods seek to create a similar developmental pathway for robots, so that algorithms can be used to learn skills that provide a foundation for learning high-level tasks. For example, the ability to identify self is required before identifying an extension of self, i.e., a tool [14]. The premise is that if robots mastered foundational skills, then mastering the next level would be a trivial task. This breaks the learning process into many small steps, and human development provides a road map. Though this might not be the best path (because high-level skills are built on general purpose skills rather than skills focused to that task), it works well with humans. Failure of traditional artificial intelligence can be related to lack of skills that young children gain, such as being able to find relevant similarities, remembering objects which are out of sight and dealing with real world complexity. Methods following human development seek to create solid foundational skills which can be readily employed by high-level algorithms. What these methods lack is the synergistic benefit of co-development of skills.

Other methods create developmental milestones as products from the method, rather than heuristics used to guide the method. Social approaches seek to create an environment where learning can take place so that agents will contribute to a social knowledge, similar to swarm systems and emergent behavior research.

Other methods stage the learning on the individual level. By adapting the learning method, or possibly by simple repetition of the learning method on the changing model, the agent develops, resulting in substantially better performance. On the surface, it can seem the same as using developmental milestones as stages; but the difference lies in that the milestones are byproducts of the method rather than objectives. An agent gains an increased proficiency at simple problems through experience, which is related to more complex abilities. Since the milestones are largely a byproduct, many of these methods are more abstract.

3.3 The Categorization Problem

One of the foremost problems in developmental learning is categorization. This falls under different applications such as identification, feature creation and representation. Categorization is required for autonomous learning since an efficient representation is needed, so one piece of knowledge should cover a set of information. For example, categorization in the context of this dissertation is applied by separating clusters of locally-optimal solutions based on internal continuity to create inverse functions in Chapter V. An agent should be able to recognize discriminating features to reduce the dimension of the input and form categories that generalize to new experiences. Categorization could relate to states or to relations, such as learning the rules and meaning of words in a language.

Language development requires extensive categorization to work in practical scenarios. First, real-valued, continuous time signals are segmented for development of building blocks: phones, phonemes, vowels, syllables, words and sentences [29, 30, 31, 32, 33]. Completely unsupervised learning is generally not desired since results should conform to the structure of the desired language. For this reason, tutors or exemplars are often used to direct the learning [34, 35, 36]. One study uses children to teach virtual agents to accentuate the benefits of co-development [37]. Experience based learning allows learning other 'languages' such as tonal meaning when dealing with infants or the influence of gaze on meaning [38, 39, 40, 41, 42]. Most work is, however, to deduce word meaning from video narrations [43, 44, 45, 46, 47], which requires its own segmentation [48]. General purpose segmentation is also researched [49]. This work can also be done in a predictive manner to compose narrations [50]. Narrations are particularly useful in determining others' beliefs or intentions, based on theory of the mind, for determining social rules [51, 52]. Note that knowledge representation affects learning performance for real scenarios [53], and work is done to understand representational scaffolding to direct and assist learning [54]. This is, however, culturally dependent, and research is done to determine cultural influences and recommend specific methods based on culture [55, 56, 57, 58]. Order of presentation affects learning rate, and the learner can accelerate learning by guiding the topics online [35, 59, 60].

Vision is the other primary focus for categorization, partially due to the prevalence of high resolution cameras as opposed to scarce haptic devices, such as the one used for touch categorization in [61]. Studies that primarily categorize features in a motor context are described in Section 3.4, though many share principles with vision systems. Traditional machine vision is more top-down (meaning relating the symbolic to the sensor response), rather than developmental, experience-based approaches which group sensor responses to features and symbolic representations [62, 63, 64, 65, 66, 67]. From experience, categories can be autonomously formed to employ co-developing top-down and bottom-up approaches [68, 69]. By directing the learning, such as dwelling on features, planning scan routes or optimizing sensor sensitivity, more robust measures are developed [70, 71, 72, 73, 74, 75, 76]. Other features are based more on interaction with the object, such as whether an object is a container [77, 78, 79]. How to grasp an object is a common application [80]. Features that predict a high likelihood of a successful grip are identified and matched to specific motor commands and objectives [81, 82]. Feature robustness and plasticity are dependent on the learning process and the mechanism used to store the knowledge [83, 84]. For this reason, disabilities (such as the autism spectrum disorders) and the developmental schedule for abilities are researched to find foundational skills [85, 86, 87, 88, 89, 90]. Artificial neural networks and support vector machines are commonly used, and performance can be improved by adding expectation predictors, dynamic neural networks or mirror neuron systems [91, 92, 93, 94, 95].

3.4 Sensorimotor Control

One of the most basic competencies is learning to control one's own motion. For robotic systems, this is a forward and inverse kinematic question that can be addressed by positions, velocities and forces, but assumes that joints can be actively sensed. More complicated systems could indirectly sense joints by visual feedback. Other sensorimotor systems could involve finding the forcing function of active sensors which increases sensitivity. The goal of sensorimotor control is a relation between control signals and measurements, or the relation from a desired output to the feedback law necessary to achieve it [96]. From a cognitive developmental robotics perspective, the robot should develop a mapping from actuation to sensor behavior based solely on experience [28, 97]. This has often been done involving vision [98, 99]. Active sensors can be used to direct learning for such skills as saccadic motion [100, 101]. For simple systems or well-known systems, this is often a trivial problem; but for a complex system or adapting behavior in response to damage, a general purpose algorithm is desired [102, 103]. Tool use requires determining what a tool can do (its affordance) and how to use it [104, 105]. These identification methods have also been used for predictive capabilities: based on a new body configuration or new tool, hypothesize the affordances [106, 107].

General purpose methods often fail in large spaces due to the dimension of the space involved [65, 108]. Reduction in the number of dimensions is one method to deal with the freedom, and can be done via a developmental approach [109]. Other methods guide exploration by emotion- or motivation-based heuristics, such as curiosity or disappointment [110, 111, 112, 113]. Tutoring directs learning based on feedback from an expert [113, 114]. In an environment with noise, motion models should balance smoothness, to generalize results to novel circumstances, against exactness, to achieve the best performance [115, 116, 11]. Similar to categorization tasks, representation affects learning performance. Two common representations are kinematic and kinetic representations, and the choice between them can be made based on how well the task is described in each [117].

Primitives are often used as building blocks to high-level behaviors, much as words are composed of syllables [100, 118]. Primitives can also be used to make operation simpler and are a step toward using triplet-based world models³ [15, 119]. This vocabulary need not be static and can continue to develop [120]. Nor do primitives need to be trajectories; they can be models or feedback control laws learned through experience or tutoring [121, 122, 123, 124]. These motor-memory responses with automatic responses can prevent planning-execution delays. Failure to form proper motor vocabularies has been associated with stuttering [125].

3.5 Mental and Social Learning

Beyond simple features, meaning basic movements and rules or models, more complicated world models can be created. These can be based on an individual agent's experience or emerge through social interaction. Even menial problems are assumed not to be trivial and may require meta-learning (learning better ways to learn).

³Triplet models are based on the logic rule: IF ((State == A) & (Action == B)) THEN (NextState == C).

The model of the world typically represents states of the world with rules describing transitions to future states [126, 127]. Without experience, the initial model will be incomplete or wrong. State estimation and creating or editing rules must be done judiciously to handle ambiguity, but should require as few experiences as possible for rapid learning. Surprise measures can determine if events justify special consideration [128].

Curiosity normally takes a slightly different role than it does in simple self-model identification [129].

Representing an individual's memory is a daunting task, and methods are developed for efficient representation [130, 131, 132]. Studies are done on infants or people with disorders (especially autism) to determine people's internal representations and how the specific representation affects performance [133, 134, 135, 136, 137, 138]. Similarly, studies are done to identify methods humans use to learn and to apply them in a robotic developmental learning context [139, 140, 141]. To progress to more or harder tasks, learning can be transferred to novel tasks [142, 143, 144, 145].

Co-learning or co-development refers to multiple agents, or possibly multiple abilities of a single agent, developing concurrently to balance complexity, though it has an increased risk of divergence [146, 147].

Strategies employed for co-learning differ from strategies used for learning in isolation. The choice of the pairing, such as human-child or human-robot, significantly influences each strategy [148, 149]. Learning can be applied to learning how to learn, as children do, or learning how to teach well [150, 151, 152, 153, 154].

The ability of agents to identify that other agents have internal states, such as beliefs, separate from their own is referred to as theory of mind. With a set of internal and external beliefs, roles can develop [155, 156].

Internal regularity in a network, representing distinct beliefs, tends to give better performance when generalizing [157]. Neural representations have been pursued to support biologically plausible implementation [158].

3.6 The Progress and Potential of the Developmental Learning Paradigm

Developmental learning addresses deficiencies of task-specific algorithms, most importantly the need for an expert to tailor algorithms to the application. The field is still very young and has not reached its goal, but initial results show that robots are able to autonomously learn. Robots are able to 1) develop a model of how their actions relate to the environment, 2) develop basic motions to achieve desired results, and then 3) combine those motions for high-level behavior. Current research focuses on only one or two skills at a time, but the skills are laying a foundation for high-level skills. In order to solve the really hard problems, a unifying framework needs to be created so that skills can be synergistically tied together. In addition, future research must allow for continual improvement so that robots can reach and surpass adults' abilities, up to the point that equipment limits performance.

Use of the developmental perspective has been shown to be beneficial. The principles used by infants, children and adults to learn skills offer a scaffolding that directs learning from very basic competencies to high-level skills. This pathway provides guidelines to check emergent behaviors against developmental milestones. Failure to achieve milestones could indicate potential difficulties later on. Theory of mind provides a framework for dealing with other agents and uncertainty. Experience-based learning is able to develop simplified descriptions of the world, similar to the heuristics which people use to manage complexity.

This departure from traditional AI methods, which required definitive conclusions, allows for much more efficient learning which can be done on-line and concurrently. Because these heuristics are based on experience, robots could learn in very different environments where typical heuristics do not apply, such as the moon or the deep sea. Often information is coded as causal relations, allowing the learning process to be directed. Developmental robotics research not only advances artificial intelligence, but helps us understand mechanisms of our own cognitive structure [125]. The cognitive process is the aim of this research, rather than the specific agent's behavior. Principles can be transferred from one domain to another, such as applying language patterns to motion patterns.

The common feature of developmental research should be a progression from simple to complex abilities. Unfortunately, even with the designation of developmental learning, very little of what has been done autonomously exploits low-level skills to enable more complex skills. Most published results show development by adding additional models, but with the same complexity. At best, motion primitives are used by a high-level planner, but one level is as far as that staging goes. By contrast, the learning method should scale well to allow for near limitless progression. The complexity must be allowed to increase with sufficient experience; otherwise the method simply learns tasks of similar complexity sequentially. An example of a boundless learning scheme, and the arguments for it, is given in [159]. Unfortunately, boundless learning itself is not sufficient, since the increase in search space would make finding better solutions combinatorially infeasible unless the searching were directed. Developmental research seeks to find methods to guide learning in the ever growing search space without imposing our understanding [25]. If a general purpose learner has a fixed resolution, then it will either be too small to learn complicated tasks or so large that learning simple tasks is intractable.

3.7 Developmental Structures

Developmental structures must be able to add knowledge and should be able to progress beyond the designer's understanding. For this reason, much work has been done to create a structure that can autonomously learn without any knowledge from the designer, termed "Cognitive Developmental Robotics" [18]. None of the structures proposed has proved complete, but performance is progressing. Intelligent behavior can exist without an explicit model of the environment; only an understanding of how sensors change with actions is needed. Because small discrete sets are easier to work with, many of the methods discretize the domain of values and time. An understanding of actions can be stated as a triplet, If State_a and Action_b Then State_c, which can then be used in logic methods. Unfortunately, logical induction is not valid for most situations; therefore no rule generation method is perfect. Based on these triplets, a robot has learned to plan a path to a target despite the target being moved to another place [128]. Logical methods are very difficult to extend to continuous functions, and it seems that a continuous representation would be required for real-world changing tasks. Therefore, logic based implementations are seen as outside the scope of this dissertation, though some useful illustrative principles will be presented.

The alternative to logical structures is to use function approximators, S_future = f(S_current, u) for discrete-time systems and Ṡ = f(S, u) for continuous systems, where u represents the action. The state, S, can represent a state based on current sensor readings, a continuous state determined internally, or a node in a finite state machine. Finite state machines are directed graphs where the current state transitions along edges according to some criteria. For example, a graph might have two states, 'DoorOpen' and 'DoorClose', where the transitions would be 'CloseTheDoor' and 'OpenTheDoor'. The finite state machine represents the model of the world and can be used to plan sequences that achieve desired states.
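As a concrete illustration (a hypothetical sketch, not code from the works cited), the door example can be written as a lookup-table transition function f(S, u):

    % Minimal finite state machine world model: S_future = f(S_current, u).
    % States and actions follow the door example; the table is illustrative.
    states  = {'DoorOpen', 'DoorClose'};
    actions = {'CloseTheDoor', 'OpenTheDoor'};
    next = [2 1;    % from DoorOpen:  CloseTheDoor -> DoorClose, OpenTheDoor -> DoorOpen
            2 1];   % from DoorClose: CloseTheDoor -> DoorClose, OpenTheDoor -> DoorOpen
    s = 1;                          % start in DoorOpen
    for a = [1 2 1]                 % apply a short action sequence
        s = next(s, a);             % one transition of the state machine
        fprintf('%-12s -> %s\n', actions{a}, states{s});
    end

A planner can search such a table forward to find an action sequence that reaches a desired state.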

A very general purpose method by Grupen et al. uses a finite state machine where a different controller is appropriate for each state [10, 160]. By using a Bayesian belief network trained on data, nodes can represent combinations of distinct states that are ambiguous based on sensor readings but can be discriminated by history. For each distinct state, a feedback controller is trained by reinforcement learning. This controller represents a sensorimotor paradigm held by the agent and provides robustness by attraction to the desired trajectory [161]. A high-level controller shapes learning through the reinforcement learning process, which directs the desired or natural state progression. The high-level controller can have multiple copies of state machines and feedback controllers. Which network to use is selected by an appropriateness measure, possibly determined by a value system. There is biological support for distinct networks for each motion, referred to as a system of experts. Aronson suggests the central nervous system shows organization by motion patterns rather than by muscle location [162]. This is a general overview and the actual method is much more complex, but the key structure of sets of state machines and reinforcement-learned feedback controllers is covered. The approach is general enough to be applied in various circumstances, but fails to provide an obvious method of generalizing to new circumstances or tasks, or of directing motion to accelerate learning when novel tasks are encountered. Ref. [160] argues that developmental progression occurs due to specific developmental reflexes, which would come from the high-level controller and are outside the scope of the structure.

Other studies have used biologically based, motion-centric structures. In [163], it is argued that motor control begins as a high gain system that tends to the observed developmental reflexes which disappear later in life (for example, the grasping reflex when something touches the palm). These reflexes are specific responses to sensor inputs that can be represented as vector force fields in state space. Initially, any sensor input is magnified to dominate the behavior and the force field directs the path to the observed motion. Upon maturity, the response has a lower gain and volitional control can be observed. The work in [163] proposes that force vector basis functions should be used as the representation of motor control. A similar reactive structure in [159] is shown to be sufficient to achieve useful high-level behaviors like walking, jumping or phototaxis (homing toward a light). Other explanations of developmental reflex disappearance and the emergence of fine motor control deal with the process of myelination of peripheral nerves, which dramatically increases the signal strength from the central nervous system. However, myelination alone cannot account for the emergence of new behaviors [164, 165].

Most methods that use neural networks use them as function approximators. However, there have been attempts to use them in other ways in a developmental sense. In [25], a recursive neural network is built from sensors to motors based on a genetic code. The genetic code is optimized for the task of moving while avoiding obstacles. The genetic code specifies how neurons are grown rather than the structure itself. Sensor and motor nodes begin in fixed locations in a grid, and based on local environmental conditions nodes will 1) divide to produce additional nodes, 2) grow connections or 3) excrete chemicals. However, this method seems excessively complicated and did not produce noteworthy results. Another study used a genetic algorithm to evolve creature topology along with a neural network for control [159]. Genetic algorithms can be seen as staged learning through the use of generations, but they typically require too many trials to be reasonable for physical systems. In [159], the agents were formed by blocks attached by various joints, with each joint having an internal control law based on other joint angles to determine joint force. Both the body and controller co-evolved so that one was not excessively complicated. The creatures were given different tasks (swimming, walking, jumping and homing) in a physics based simulation. Not restricting the agent form allowed for development beyond what the designer could have created by hand. Another study sought to emulate the pattern generators in the cerebellum by using chaotic neural networks [129]. Chaotic neural networks are formed by including oscillating elements in a neural network so that the steady state becomes very sensitive to the sensor values. The network was trained to follow paths, which it could, but then performed poorly on new paths.

Another framework deals with understanding the effect of actions. The first goal is to discover self, meaning the things that one has direct control over. This is found through experimentation, which is an important part of development. Drawing from child development research, the term motor babbling indicates random commands sent to the motors for the purpose of collecting data. The model of self is based on a high correlation of sensor features to commands. A robotic arm in [14] used the time of the response to commands as a discriminating feature of what is self and what is environment. Here the features were color coded targets, and a map was built to determine the relation of motor commands to which features would move. In another static experiment with a robotic arm, blob detection was used to detect interesting regions that were then classified [166]. Mutual information measures were used to train which clusters were correlated with commands. After the shoulder and elbow were babbled, the fingers were babbled while the arm was fixed. Using designations from the previous self clusters, the fine motions of the fingers were sufficient to identify blobs as being self or not self. This is a good example of how dividing learning into stages can make detection easier.
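A simple correlation test gives the flavor of such self-detection (a hypothetical sketch; the cited works use mutual information measures and richer visual features):

    % Label features as 'self' when their motion correlates with babbled commands.
    nSteps = 500; nFeat = 4;
    u = randn(nSteps, 1);                         % random babbling command
    featVel = [0.8*u + 0.1*randn(nSteps, 1), ...  % one feature driven by the command
               randn(nSteps, 3)];                 % three features moving independently
    isSelf = false(1, nFeat);
    for k = 1:nFeat
        c = corrcoef(u, featVel(:, k));
        isSelf(k) = abs(c(1, 2)) > 0.5;           % threshold chosen for illustration
    end
    disp(isSelf)                                  % expected: 1 0 0 0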

Swarm approaches are similar to developmental learning in that simple rules are used to generate emergent behavior. An approach that straddles both fields uses sets of independent nodes that can attach to form a composite robot. The system is based on neighbors and uses hormones to generate global behavior from local rules [167]. Different topologies such as a snake, legged (greater than 3 legs) or loop were demonstrated. However, communication and locomotion rules were hand generated. Other work using social interaction to drive development takes language as the primary concern [168, 169]. Social learning is important for large scale developmental fleets, but almost all social learning work assumes that the robot has the basic motor competencies necessary for interaction and basic sensor classification for determining key features. Basic motor competencies can only be guaranteed for simple systems, such as differential drive robots. Basic sensor classification is progressing, but distinctive features are often used since general classification learning has not reached reliable performance. Because social learning has not been shown to be able to develop basic motor competencies, much of the social learning work is outside the scope of this dissertation.

3.8 Directed Learning

The goal of directed learning is to move beyond random motor babbling to trials that directly help the goal. For robots to be effective in many human tasks, they will need many degrees of freedom. However, getting sufficient global data spanning a high dimension physical system takes too much time and memory. Besides, this is inconsistent with biological systems, since they only grow proficiency with common tasks and use planning techniques for infrequent tasks [170]. Without sufficient data, some learning methods become unstable and cannot find any consistent generalization to abstract. Directed learning is seen in human development, where infants first use core muscles for reaching. After that has developed, elbows, hands and then fingers are used [13]. Legs stiffen when learning to walk and balance, but then relax and progress to a more energy-efficient gait [171]. Similar work has been done to stage degrees of freedom by delaying access to some actuators [166, 172, 173]. Staging degrees of freedom helps to establish the dominating relationships, which are then used to help discriminate the more subtle relationships. Many function learning methods require a set scale to define modes or clusters (i.e., nodes needed for radial basis functions, clusters needed for k-means clustering). If the scale changes, a bifurcation in the representation could occur, resulting in poor similarity between the representations. By only looking for the dominant features, noise is much less likely to be picked up as a feature or to obscure a true feature.

Many methods have been investigated to maximize learning rates by pursuing the state with highest error or least data [174, 175, 176, 177, 178, 179]. Unfortunately, this may try to learn data everywhere, which takes a prohibitively long time. Also, it may be asking for too much generalization. Though people possess great cognitive plasticity, there are limits, showing that trying to learn too much may distract from becoming proficient in the common motions. Trying to refine the maximum error will not work for real systems due to noisy, incomplete data. Some relations cannot be represented as a function. For example, in [180] a robot learns to reduce the information in an image down to 1% for classification by mutual information and support vector machines, but then forms logic tuples. It chooses the action with the least confidence at prediction. If an action produced random results, no single logic tuple could describe it and there would still be a low confidence at prediction. Therefore the system would converge to selecting this action without learning the consistent rules of the other actions.

A better method considers the behavior of the error [16, 181]. If an action (given a certain state) is becoming more predictable, then this is an action to refine. After a certain point, sensor noise or other errors will dominate the prediction error, showing that the function is learned as well as it can be approximated by the representation used. Other methods would then have greater learning rates, and focus would be directed to them. Because an outlier among only a few data points may suggest a worse learning rate, there is a chance (10% to 30%) that an action is selected randomly. This adds significant robustness to factors such as environmental changes. Data is partitioned to experts once a certain number of samples is obtained. The partition location is optimized, creating a demarcation justified by the data. This approach is attractive since it provides an indicator (all learning rates are stagnant or all outcome likelihoods exceed a threshold) of when a development stage is done and sensor information or motor control should be advanced.
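The selection rule can be sketched as follows (a hypothetical reconstruction of the idea in [16, 181], not the published implementation):

    % Pick the action whose prediction error is falling fastest ("learning
    % progress"), with an occasional random action for robustness.
    function a = select_action(errHist, pRandom)
        % errHist{a} holds recent prediction errors for action a; pRandom ~ 0.1-0.3
        nA = numel(errHist);
        if rand < pRandom
            a = randi(nA);                                % exploratory choice
            return
        end
        progress = zeros(1, nA);
        for k = 1:nA
            e = errHist{k};
            if numel(e) >= 2
                progress(k) = mean(e(1:end-1)) - e(end);  % recent error decrease
            end
        end
        [~, a] = max(progress);                           % most 'learnable' action
    end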

An interesting result of this method, dubbed "Intelligent Adaptive Curiosity," is that it exhibits unprogrammed learning stages. Two results have been presented, one of a simulation of a robot moving about a box [181] and another of a robot dog on an infant's play mat [16]. Initial motions will first be very similar to motor babbling, since no predictor has been reasonably trained. Next it learns that no action often produces no environment change, giving a default representation of what the environment does on its own. Then an action will begin to be predictable (simpler ones are more likely) and it will be repeated many more times than before. At some point this focus of attention may shift to another action, and possibly back after a period. The method gives no direction to seek certain senses, yet at the end of the study the dog would 'successfully' behave by choosing actions that resulted in a more specific outcome. This is a result of the successes (biting a bite-able object) being rare, so after the model is generally well defined these rare events capture the most attention as it tries to learn them better.

Other directed methods can be heuristic-based. To learn on an unstable system, the controller in [182] is given highest priority to develop a good local model and second priority to pursue a symmetric cycle. In [11], the humanoid robot ASIMO is trained to point to locations using all the body joints. Training data is generated by having ASIMO trace a circle with the right hand and a circle or a figure 8 with the left. The inverse kinematics is learned by a recursive neural network for either set. When ASIMO was trained on both circles, it was able to generate poses that pointed to novel locations with reasonable accuracy. However, when the network was trained with the figure 8 set of data, ASIMO's generated poses were very distorted in one axis. There is no clear explanation why one training set performed so much better than the other, showing that heuristic success often cannot be known a priori.

Numeric methods can be seen as directed learning, most often pursuing a minimization based on gradient information. In [12], simulated annealing and the simplex method are used to optimize trajectories of animals in a physics based simulation. Simulated annealing contains a set of candidate points (or a single candidate) and allows the candidates to move based on the new cost value and a temperature parameter. When the temperature parameter is large, the candidate movement is largely undirected. As the temperature falls, motion progresses to a steepest descent. Given a sufficiently slow rate of temperature decline, the global optimum is guaranteed to be found. The simplex method is a linear optimization algorithm that considers solutions on the boundary of the feasible region, which is where the optimum must occur for a linear optimization. This directed trial-and-error method may take exponential time in the worst case, but in common practice it finds the solution in near linear time. As stated before, though numeric methods are increasing in efficiency, they typically require too many function evaluations (trials of physical systems) to be practical for robotic learning methods. This is especially true if the results have random elements, requiring multiple repetitions to capture an average result and its likelihood. There are other numeric methods more applicable to trajectory generation, but they require high fidelity models of the system and are sensitive to stochastic systems. These methods employ greedier techniques like gradient descent, Newton's method and pseudospectral methods. In order to take accurate step directions, a complete basis is sampled, so they generally scale poorly to high dimension simulations.
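The temperature-driven acceptance rule at the heart of simulated annealing fits in a few lines (an illustrative sketch on a toy cost, not the setup of [12]):

    % Simulated annealing on a scalar cost: accept all improvements, and accept
    % worse moves with probability exp(-dJ/T); the step size shrinks as T falls.
    f = @(x) x.^2 + 10*sin(x);            % toy multimodal cost to minimize
    x = 5; T = 10;
    while T > 1e-3
        xNew = x + T*randn;               % candidate move scales with temperature
        dJ = f(xNew) - f(x);
        if dJ < 0 || rand < exp(-dJ/T)
            x = xNew;
        end
        T = 0.99*T;                       % geometric cooling schedule
    end
    fprintf('Minimum found near x = %.3f\n', x)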

Another method generally not thought of as a numeric method is the class of genetic algorithms. It is included in this category since it uses similar principles to numeric methods. Crossover operations can be seen as a type of simplex method where the two (or more) parents are vertices and the child is a new vertex that represents the intersection of projections of the parent points. Mutation operations are similar to the random motion in simulated annealing. Genetic algorithms also generally perform better than exhaustive searches or undirected random walks, but still typically require too many iterations to be physically practical. However, they have been shown to be effective in simulation for producing developmental results [159].

3.9 Function Approximators

Function approximators play a key role in developmental learning. Since the proper functions are not known, the true function can be approximated to a sufficient accuracy by a number of function approximators. Some work better than others, but no single approximator has been shown to be always better.

In [182], a stick juggler builds a map from stick state and robot position to resulting stick state. Due to the nature of the impacts involved, this event is well characterized by a discrete event system. Continuous time systems are frequently discretized to assist analysis, but this sometimes misses critical system attributes. Linear weighted regression (LWR) employs a least squares linear regression where the error is weighted by the proximity to the point where the LWR is evaluated. This allows for more accurate generalization of noisy, nonlinear data. In addition, statistical measures provide a convenient measure of uncertainty and confidence bounds that are more accurate than simple error. Many additional techniques are used to prevent ill-conditioned inversions, such as adding a diagonal matrix to the matrix to be inverted and adding random perturbations to the control to encourage data to support every direction. The learning in [182] is directed to first build a confident local model before progressing to a sustainable equilibrium point, because of the instability of the system.
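The weighted, regularized least squares step reads as follows (a minimal sketch; the kernel, width and regularization choices are illustrative, not those of [182]):

    % Locally weighted regression at a query point xq; the lambda*I term is the
    % diagonal addition that guards against ill-conditioned inversions.
    function yq = lwr_predict(X, y, xq, h, lambda)
        % X: n-by-d samples, y: n-by-1 outputs, xq: 1-by-d query, h: kernel width
        n = size(X, 1);
        Xa = [ones(n, 1), X];                        % affine basis
        w = exp(-sum((X - xq).^2, 2) / (2*h^2));     % proximity weights
        W = diag(w);
        beta = (Xa'*W*Xa + lambda*eye(size(Xa, 2))) \ (Xa'*W*y);
        yq = [1, xq] * beta;                         % local linear prediction
    end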

A number of function approximation methods use basis functions, either explicitly or implicitly. Basis functions can be added in linear combinations to produce new functions. Examples include linear interpolation, where the basis functions are tent functions [12]; Fourier series, where the basis functions are sinusoids; and single-layer neural networks, where the basis functions are the activation functions. A method similar to single layer neural networks is called projection pursuit regression (PPR) [183]. The approximator of the output, y ∈ R, based on m input samples of x ∈ R^n, is

y = \bar{y} + \sum_{i=1}^{m} \beta_i g_i(\alpha_i^\top x)    (2)

where ȳ is a mean value (so that bases can be centered about zero), β_i ∈ R and α_i ∈ R^n are parameters of the function approximator and g_i is a scalar valued function, to be described later. In the neural network equivalent, β_i are the output weights, α_i are the input weights to the i-th node and each node has an activation function g_i(·). The difference is not structural, since PPR can be seen as a special case of neural networks. For multiple output functions, (2) can be repeated for each output and concatenated to form the output vector.

A few reasons make PPR more attractive than standard neural network procedures. First, basis functions are not arbitrarily selected. They are formed from smoothed functions of the data, {α_i^⊤ x, y}. Typical smoothed functions are created by applying low pass filters to the data. Therefore even, odd or multi-modal data can easily be modeled without needing to know the number of inflections. The amount of local smoothing is proportional to the local variation. By having more smoothing where data has significant variation, initial nodes do not try to over-fit the data by explaining variation due to an orthogonal dimension. The direction of projection, α_i, is found through a local optimization of the g_i fit. Results of the direction optimization can be rejected if it is a local optimum that results in poor performance. After a basis function is generated, only the residuals are used for subsequent determination of directions and basis functions. To keep parameters on a similar scale, the outputs are often normalized. A frequent complaint of neural networks is that they cannot be easily interpreted. PPR, however, provides both the important directions and what the function of each direction looks like, g_i, given in one dimension. A downside of this method is that the iterative process is not optimal and it has not been shown to be a universal approximator. PPR easily provides a reasonable function approximation method without the need for human analysis, because it determines the direction of decomposition from the data.
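One stage of the procedure might look like this (a rough sketch; smooth() requires the Curve Fitting Toolbox, and random restarts stand in for the local optimization of α_i):

    % One PPR stage: pick the direction whose smoothed projection explains the
    % residuals best (the beta_i scale is absorbed into the smoothed function).
    X = randn(200, 3); y = sin(X(:,1) + 0.5*X(:,2)) + 0.05*randn(200, 1);
    ybar = mean(y); r = y - ybar;              % center outputs; start from residuals
    best = inf;
    for trial = 1:50
        a = randn(3, 1); a = a / norm(a);      % candidate projection direction
        [zs, idx] = sort(X * a);               % project and sort the data
        g = smooth(zs, r(idx), 0.3, 'loess');  % smoothed function of the projection
        e = norm(r(idx) - g);
        if e < best, best = e; alpha = a; end  % keep the best direction found
    end
    % Subtract the fitted term from r and repeat for the next basis function.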

Linear weighted regression and PPR are nonparametric methods in that they directly use the data to determine the function form. The alternative is to assign a function form and then determine parameters to fit the data. A disadvantage of non-parametric forms is that they require all the data, which can lead to large mathematical operations. Research is done to identify data to be removed, normally based on time or on locating points with a small separation distance.

Parametric methods automatically condense the data to a set of parameters. Considering universal function approximators, neural networks are most often seen. Standard forms have emerged to exploit some good properties. Single-layer feed-forward networks are composed by having the inputs pass directly into a set of activation functions, with the network output being a linear combination of the activation function outputs. These networks can easily be trained as a linear regression on the basis function outputs. Radial basis functions (RBF) have an activation function that is a scalar function of the distance to a point, normally a bell shaped curve, with the points spread throughout the domain. This provides a method of localizing the output value to a section of the domain. Sufficient function plasticity and coverage can be achieved by setting up a grid across all input dimensions. This method is used in [24] to solve arm kinematics. The mapping of joint angles, 4 degrees of freedom (DOF), to end effector position, 3 DOF, is done. The Jacobian is also determined by changing the basis functions to their derivatives in the input-output dimension considered. Because the size of the network plays a significant role in the ability to generalize to novel inputs, the appropriate network size was also investigated. Using a minimum residual criterion, nodes are added to the network until the threshold is reached. A minimum error of zero can be achieved by using a node for each sample. It was found that there was a noticeable transition point in the network size once the error threshold was large enough. This transition point gave good generalization results. Therefore the network was retrained until a stable network size was obtained.

Feed-forward networks can be effectively organized through Bayesian belief networks. Bayesian belief networks choose parent nodes (inputs) and then use conditional samples, based on the parent nodes' values, to estimate the probability of transitions to child states. Unfortunately, hidden states are not observed, so conditions on hidden states cannot be seen directly from the data. Training is still an open research question, but practical methods exist. After a network is trained, unreliable links (those with a low conditional probability) can be removed and the graph used as the structure for a neural network. In [184], training data was used to create a forward kinematics model of a 1 DOF robot. The network structure also provided a graph that could be traced backwards to compose an inverse map. This is useful in the social development context for imitating demonstrated motions and for planning.

Recursive neural networks are another common choice. They are much harder to train due to local minima, but can model complex systems more compactly than single layer networks and are seen as more biomimetic. To develop a model of the inverse kinematics used in whole body motions for pointing with two hands in three dimensional space (6 DOF input and 15 DOF output), a sparse recurrent network was chosen in [11]. Two sets of training data were formed by the choice of a circular or figure 8 motion. The network worked well on circular training data, but performed poorly in one dimension with the other training set, showing that it is difficult to determine generalization a priori.

Classification problems are similar to function approximation in that they learn a map from inputs to outputs. Important features have traditionally been selected by the researchers and chosen for the greatest convenience. Common selections are specific colors, blobs or well defined intensity optima. For truly autonomous development, the agent needs to be able to determine distinguishing features autonomously. Machine vision is common in developmental projects, and support vector machines (SVM) are used in [166, 180] to find significant features for developing an If State_a and Action_b Then State_c model. In [180], SVM are used to reduce the dimension space to 1% of the original space, thereby significantly reducing computational demands. In [166], blobs are first identified and then clustered based on hue and saturation distributions. Then the mutual information of the clusters and motions is compared to classify cluster locations based on a causal relation with control commands.

The method for determining the locally linearized dynamics of robots based on kinematic chains is well developed, but complicated, and the quantities involved are difficult to measure. These functions are often built by hand but then tuned online through adaptation. Note that these are no longer universal approximators but are specific to controlling rigid bodies. There is disagreement over what to model, mostly due to questions about biological systems. It is generally accepted that biological systems contain an internal model [21]. The differences in the ability of species to create models correlate with the ability to learn novel tasks [14, 168]. However, it is not clear whether humans control position or force or something else. Position based methods are discussed first, followed by dynamic methods.

In support of a position based approach, human motions are roughly straight with bell shaped velocity profiles [185]. Even studies on neonatal infants show multiple straight line segments [186, 187]. Stiffening of extremities when learning motions supports a posture-based approach [171]. Adults do not look at their hands when reaching, and babies can locate a glowing object in the dark with their hands [188, 189]. This suggests that even from an early age, control of position can be done without visual feedback. For these reasons, static models for positional control are learned in [11, 24].

In [118], the motion of a human grabbing a cup is recorded with a second cup placed around it. Reaching shows coordination of the shoulder and elbow joints, consistent with the independence of the wrist from other motions in reaching [190]. Depending on the relative location of the second cup, the human motion would be scaled away from it, but the form would be the same, and the resulting motions clustered into three distinct trajectories. This brings up a second point concerning motion primitives. It is believed that there are sets of predefined motions that are modified for novel circumstances. In [191], the claim is made that there exist learned sets of artificial force fields that are imposed to regulate motion, similar to the suggestion that the cerebellum regulates joint stiffness [192]. The result of these controllers creates a trajectory, but does not seem to explain the bell-shaped velocity profile. When doing a novel reaching motion, joint stiffness increases to be significant in all directions (as opposed to only away from the body, as is the case at rest). After the motion is learned, joint stiffness decreases [185].

If motion were position-based, then exocentric forces would significantly affect the path; yet this is not seen. In support of dynamics based motions, animals can respond in dynamic ways and optimize gaits based on forces [12]. Imitation on a velocity level may be more natural where the position is necessarily different [184]. Actuators, including muscles, are normally force-based and therefore could be optimized more than if they were constrained to use a position regulator. Few studies, however, are very rigorous in general-purpose force-based models. In [193], rigid body robotic equations are used apologetically, with the explanation that the authors did not have the means to develop a satisfactory learner. However, parameters are fit to data and the model is able to achieve the developmental milestone of the agent identifying when something is acting on it, despite large joint forces due to gravity or motion.

CHAPTER IV

NUMERICAL PATH OPTIMIZATION

A basic developmental milestone is being able to control muscles to produce a desired motion. In addition, the motion is optimized as it becomes more familiar. Optimal control problems are challenging to solve even on simple systems. Few boundary valued, nonlinear problems afford analytical solutions because of the lack of a closed form solution to the differential equations. Kinetic systems, such as robotic linkages, can be complicated to solve because of nonlinearities and large numbers of constraints and degrees of freedom. In order to autonomously create motions, a learning method should be easily adaptable to a broad range of systems without requiring system dependent heuristics. A numeric control optimization program, DIDO, is coupled to a numeric kinetic solver, SimMechanics, within MATLAB. The kinetic model is created directly from a solid model assembly, eliminating human error and the need for judgement. A pendulum with control saturation is tested to validate satisfaction of theoretical conditions (< 10% optimality residuals, typically < 5%). The numeric method is contrasted with a linear-quadratic-regulator (LQR) and the optimal linear state transfer. A pick-and-place command for a four degree-of-freedom robot arm is also optimized and realizes a 50% decrease in energy used over the traditional ramp to constant velocity maneuver. This coupling obtains near optimal solutions without intense, model specific analysis. Having a general purpose program for determining motions to accomplish tasks is a necessary step toward other developmental skills, such as social interaction.

NOMENCLATURE

f(x, u, t)      State derivatives / state dynamics equations
g(x, u, t)      State or control constraints, required to be ≤ 0
H(x, u, t)      Optimal control Hamiltonian
J               Path cost
t_o, t_f        Initial and final time
T_o, T_f        Sets of admissible initial and final times
u(t)            Control signal
x, ẋ, ẍ         State and its first and second time derivatives, respectively
X_o, X_f        States meeting initial or terminal conditions
ψ_i(x_i, t_i)   Cost at event i
φ(x, u, t)      Running cost
λ(t)            Optimality co-states

4.1 Introduction

Many motor-skill tasks can be viewed as an optimal control problem. An optimal control problem seeks to find control trajectories that minimize a performance function while satisfying constraints. A system of differential equations (DE) distinguishes optimal control problems from static optimization problems. How humans control and plan motion is poorly understood, but the mathematical principles involved in optimizing a trajectory are well established and can be used in lieu of biologically derived optimizations. The optimality conditions from the calculus of variations are also differential equations. The lack of closed-form solutions for many nonlinear DEs impairs indirect optimization. One prevailing method discretizes time to remove the differential equation and applies nonlinear programming to the set of all states, controls and constraints [194]. In the limit, as the discretization approaches the continuous system, the Karush-Kuhn-Tucker (KKT) conditions approach the optimality conditions developed by the calculus of variations [195]. Two-point boundary value problems cannot march forward in time with an arbitrary set of initial conditions. Ensuring terminal conditions are satisfied makes boundary valued problems much more complex than initial value problems. A survey of historical methods to solve optimal control problems is given in [196]. Some of the more modern techniques pose the problem as a generic optimization and use search methods such as genetic algorithms.

These heuristic methods often have limitations requiring specific knowledge of the problem [197, 198].

Modern collocation methods have been developed from areas such as computational fluid dynamics using pseudospectral techniques. Rather than relying on low-order Newton's method approximations and equally spaced points, new programs have been written which use higher order techniques and nodes at the Legendre-Gauss, Chebyshev or other polynomial roots to achieve optimal spectral accuracy. The theory has been well developed and programmed [199, 200, 201]. The program used in this work is DIDO⁴, written by I. Michael Ross [202]. It has been flight proven on the International Space Station, and some examples of applications are shown in [203, 204]. Another common program in the literature is GPOPS⁵, written by Anil Rao et al., which applies similar techniques. The name of this program has changed with code structure or other issues; it is also known as GPOCS, OPENPOCS, GPOP and PSCOL [205]. Examples such as close satellite formations and variable mass reentry are shown in [205, 206, 207]. These programs implement the low-level nonlinear programming for optimal control with a high-level problem statement.

Similarly, kinetic solvers have been developed to solve rigid body dynamics. These programs incorporate physical constraints into ordinary differential equation solvers for finding the response of initial value problems. A number of such programs are available, with different applications such as machine design, impact reconstruction and virtual reality. A brief survey of programs for solving kinetic problems is given in [208]. At each moment in time, the geometry is analyzed to determine permissible movement directions based on kinematics. Within those restrictions, body and applied forces are used to determine the actual movement. Using variable step solvers, very accurate solutions can be found. The program used in this chapter is SimMechanics, a blockset of Simulink, which is an add-on to MATLAB⁶. The methods specific to this program are explained in [209]. The numeric kinetic solver gives high-fidelity solutions without extensive time spent modeling. Example results and programming explanations are shown in [210, 211, 212, 213]. SimMechanics offers compatibility with other MATLAB programs, visualization using imported 3D models, determination of mass properties from most common solid modeling programs and simulation of friction models, such as stiction.

⁴ Distributed by Elissar, www.elissar.biz, Monterey, CA 93942, USA.
⁵ Open source, available from www.gpops.org.
⁶ The MathWorks, Inc., Natick, MA 01760, USA.

Generalized numeric solvers operate very well once the problem is put in their framework. After that, the problem is in proper form and the solver can be treated largely as a black box. The designer can then see the high-level behavior without needing to tailor an optimization to the particular geometry. Of course, the disadvantage is that individual properties of a system are not exploited by the solver, so general purpose numeric solvers perform worse than specialized solvers. However, for a few runs to characterize the system, the general purpose solvers will often offer sufficiently accurate solutions in less time than would be required to compose specialized solvers. From a developmental learning context, a general purpose solver is preferred since it only requires a high-level description of the system, rather than an understanding of the inter-relations of constraints.

The next section presents a brief statement on the vast scope of problems applicable to control optimization using a kinetic solver. In Section 4.3, the optimal control package and kinetic package are described along with the coupling. Accuracy is validated for regulation of both a suspended and an inverted pendulum, for optimality in control energy, in Section 4.4. Then Section 4.5 shows optimization of a pick-and-place command for a four degree-of-freedom robot to demonstrate an example of a practical application. The chapter concludes in Section 4.6 by summarizing the findings and contributions of this approach.

4.2 Problem Statement

The scope of this chapter relates to rigid, three dimensional bodies with constant, finite, positive mass and moment of inertia acted on by body or joint forces. Joints are either prismatic, revolute or a combination, with a finite number of degrees of freedom. The bodies are controlled through actuation of joints by either a specified motion or an applied force. Event constraints are imposed in the form x(t_o) ∈ X_o, x(t_f) ∈ X_f, t_o ∈ T_o and t_f ∈ T_f, where all sets are bounded. Path constraints are written in the form g(x, u, t) ≤ 0. The solution is x(t), t_o and the function u(t) satisfying the constraints and minimizing the Bolza problem⁷,

J = \psi_o(x_o, t_o) + \psi_f(x_f, t_f) + \int_{t_o}^{t_f} \phi(x, u, \tau) \, d\tau.    (3)

Limitations on this problem statement are that the system must be deterministic, nonsingular and completely known. Though the techniques used by DIDO may converge to an ultimately bounded set if these assumptions are violated, this claim is beyond the scope of this chapter. Spring-damper links, and other specialized links, are supported by the program, but are also outside the scope of this analysis, along with screw joints and actuation states, such as motor dynamics. Having a known system means that the geometry and mass can be represented in SimMechanics and all other equations can be evaluated at points in time in MATLAB. If a valid solid model is used, the constraints and mass properties will be physically realizable. Because the system is built around ODE solvers, all the standard assumptions apply. However, some violations, such as using a linear interpolation table, still give good results. All the states, controls and times must have finite, known bounds. And finally, an optimal solution must exist.

This problem fits in the broader scope of developmental learning since it deals with the ability to plan a motion based on trying individual motions, which parallels learning by experience. The solution uses a high-level, but standardized, description of the system, so the algorithm can convert the motion optimization into a rote exercise not requiring deep cognitive understanding. Learned motions would then facilitate high-level skills, which can be employed for group learning.

4.3 Method

The method presented here is decomposed into two steps. The first is to form the kinetic problem. The second is to form the optimal control problem. The two problems are coupled by using the kinetic solver to provide dynamics to the optimal control solver. By linking these solvers, the low-level development of the system of equations of motion is not required for optimizing trajectories. Solutions can display complex mechanisms that can then be exploited for system design after being verified for suitability.

⁷ The Bolza problem is a combination of the Lagrange problem, which only considers running costs, φ, and the Mayer problem, which only considers event costs, ψ.

The kinetic problem is formulated as a kinematic chain of rigid bodies forming a manifold of admissible movement. Forces and responses are applied to determine the actual acceleration. All the designer provides is the chain and the corresponding mass and moment of inertia. The chain begins with a ground coordinate system that has a specified but variable relation to a rigid body's coordinate system, i.e., a joint. Joints relate differences between two coordinate systems. Revolute joints have angular differences about an axis. Prismatic joints have a translational difference along an axis. Composite joints, including universal joints, parallel constraints, screw joints and gear ratios, have combinations of revolute and prismatic joints. Coordinate systems identify positions of the center of mass, joints, sensors and actuators. Coordinate systems of a rigid body are referenced together based on a constant relative vector in a body coordinate system. The process of creating and linking rigid bodies, sensors and actuators is repeated for each link. Rigid bodies can be drawn and assembled in popular CAD programs and exported directly into SimMechanics, where joints are automatically created to match assembly constraints. Each joint applies a set of constraints to the location and velocity of adjoining links. These constraints can be linearized and solved efficiently. A discussion of best approaches and considerations in implementing this solver is given in [209]. The kinetic solver has a variable error bound and can be set to a sufficient limit. Note that if the bound is too large, results will appear inconsistent for numerical optimization.

Control optimization begins with a good understanding of what is desired. A high-level statement should be made about the objective. The performance index and constraints should be composed to ensure that the goal is met. Typical constraints involve restricting the state or control. For example, a rocket launch problem would impose constraints that the height must be above ground throughout the path, the thrust is nonnegative and less than or equal to the maximum available, the amount of fuel remaining is nonnegative and no-fly zones are observed. The first is a simple bound on the state. If thrust were the control, then the second would be a simple bound on the control. In order to measure the amount of fuel, the fuel amount could be an added state with a simple bound. No-fly zones represent a constraint on the rocket state, but may not be a simple constraint. A function of the distance to the no-fly zone, or another representation of the boundary, would be needed and incorporated into g(x, u, t). Next, the cost or performance needs to be composed into a running cost φ(x, u, t) and boundary costs ψ_o(x_o, t_o) and ψ_f(x_f, t_f). Running costs are best used to describe good or bad aspects of the path, such as control power, separation distance, or other changing quantities. The boundary costs are better suited for assessing the suitability of the initial and final states. Once these equations are written, the optimal control problem can be easily coded. Collocation optimization techniques only require the function values to be known at certain points. This transforms the problem from one involving differential equations to a static optimization, which offers many well developed algorithms [194]. The number of nodes used, like elements in a finite-element model, should be sufficient for the solution to converge to the continuous solution, but not so many as to amplify numerical error or make solving intractable. DIDO offers the use of an initial path for refinement, but does not require one.
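To make the rocket example concrete, the path constraints might be collected into a single g(x, u, t) ≤ 0 function as below (a hypothetical sketch; the state ordering, thrust limit and no-fly boundary function are placeholders, not part of the original work):

    % Path constraints for the rocket example, stacked so that g <= 0.
    % State x = [h; v; fuel] (height, velocity, fuel); control u = thrust.
    function g = rocket_path_constraints(x, u, t) %#ok<INUSD>
        h = x(1); fuel = x(3);
        Tmax = 100;                 % maximum available thrust (placeholder value)
        g = [ -h;                   % height above ground:  h >= 0
              -u;                   % nonnegative thrust:   u >= 0
               u - Tmax;            % thrust limit:         u <= Tmax
              -fuel;                % fuel remaining:       fuel >= 0
               noFlyMargin(x) ];    % no-fly zone boundary: margin(x) <= 0
    end
    function m = noFlyMargin(x)
        m = 1 - abs(x(1) - 50);     % toy boundary function for illustration
    end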

The results from DIDO consist of the state, control and time where each node is evaluated. In addition, DIDO also returns the Hamiltonian and co-states used in the optimization. If DIDO determines that the solution converged, such a message will be returned; otherwise, possible suggestions for debugging will be returned. Feasibility, optimality and sensibility should be checked to verify that the results are a solution. First, verify that the constraints are met by the solution. The initial value problem ODE should be solved using interpolation between nodes for the control, and these states should be compared to the solved states. There will likely be some numeric error, but it should be small enough for the application to show the results are feasible.
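This feasibility check amounts to a few lines; a self-contained sketch on a double-integrator stand-in (tN and uN playing the role of the solver's node values) is:

    % Re-integrate the dynamics with the control interpolated between nodes,
    % then compare the simulated end state to the solver's end state.
    tN = linspace(0, 1, 11)'; uN = -6 + 12*tN;      % nodes and a test control
    f = @(x, u) [x(2); u];                          % double integrator dynamics
    uFun = @(t) interp1(tN, uN, t, 'pchip');        % control between nodes
    [~, xSim] = ode45(@(t, x) f(x, uFun(t)), [0 1], [0; 0]);
    fprintf('Re-simulated end state: [%.3f, %.3f]\n', xSim(end, 1), xSim(end, 2));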

Optimality conditions, such as Pontryagin's minimum principle and Bellman's principle of optimality, should be checked as much as the specific problem will allow. Depending on the complexity of the problem, many of the optimality conditions may be difficult to check. In order to use the optimality conditions, the Hamiltonian is introduced:

H(x, u, t) = \phi(x, u, t) - \lambda^\top f(x, u, t)    (4)

where λ are the co-states and f(x, u, t) is the expression defining the state derivatives. From the calculus of variations, the set of optimality conditions is:

\frac{\partial H}{\partial x} = \frac{d\lambda}{dt}    (5)

\frac{\partial H}{\partial \lambda} = \frac{dx}{dt}    (6)

\frac{\partial H}{\partial u} = 0    (7)

when the problem is unconstrained. Optimality of boundary conditions and active constraints are treated in [214]. Because these representations require partial derivatives of f(x, u, t), these optimality conditions may not be practical to find analytically.

Some simple checks for optimality exist for specific problem types. If the Hamiltonian is time invariant and the final time is fixed, the optimal trajectory will have a constant Hamiltonian. If the Hamiltonian is time invariant and the final time is not actively constrained, the optimal trajectory will have a Hamiltonian equal to zero. Another check of optimality is to compare the cost (from the initial value simulation) to the costs from other methods. If the path is similar, the cost should be similar, but worse than the DIDO solution. Above all, the results should be reasonable. After being shown the solution, it should make sense. This does not mean that the optimal results could be predicted, but that the results exploit some part of the system.
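The constant-Hamiltonian check is straightforward to automate (a sketch; phiN, lamN and fN are assumed to be the running cost, co-states and dynamics evaluated at the solver's nodes):

    % Evaluate the Hamiltonian at every node and report its spread, which
    % should be near zero for a time-invariant, fixed-final-time problem.
    function spread = hamiltonian_spread(phiN, lamN, fN)
        % phiN: 1-by-N, lamN: n-by-N, fN: n-by-N
        Hn = phiN - sum(lamN .* fN, 1);   % H = phi - lambda'*f, per (4)
        spread = max(Hn) - min(Hn);
    end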

4.4 Optimality Validation

4.4.1 Setting up the problem

Table 1: Pendulum Parameters

Assigned Properties
  Dimensions (l × w × h)           1 m × 36.9 mm × 10 mm
  Axis of rotation                 [1, 0, 0]^T
  Gravity (g)                      [0, 0, -9.81]^T m/sec^2
  Torque limits                    [-7, 7] N m
  Material (density)               Aluminium (2.710 gm/cm^3)
Derived Properties
  Mass (m)                         1.008 kg
  Moment of inertia about CG       (1/12) diag(1.026, 1.025, 0.001) kg m^2
  Distance from pivot to CG (l_c)  0.496 m

A simple example of a pendulum will be used to verify optimality. Consider a rod connected to a torque-controlled revolute joint with gravity acting vertically. This system is a practical choice: the model provides a pervasive baseline for control algorithms; it relates to robotic arms, rocket dynamics and wind-excited tall buildings; and it includes trigonometric functions and saturation, which are two of the most common nonlinearities. The goal is to move the pendulum from rest at an angle to straight up or down (θ = 0° in the respective figures). The parameters are given in Table 1, with a system illustration in Figure 1. The analytical model is

\ddot{\theta} = \frac{3}{ml^2} u - \frac{3 l_c}{l^2} g \sin(\theta).    (8)

The performance function to be minimized is

J = \int_{t_o}^{t_f} u^2 \, d\tau    (9)

which would represent minimum energy input for an electric motor. For presentation, the square root of J is taken. The solution is constrained with fixed initial and final times (t_o = 0 and t_f = 1 sec) and a bounded controller (-7 ≤ u ≤ 7 N m).
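For reference, the analytical model (8) and the integrand of (9) are easy to code directly (a sketch using the Table 1 parameters, useful as a cross-check against the SimMechanics model):

    % Pendulum dynamics per (8) with x = [theta; thetadot], and the running cost.
    m = 1.008; l = 1; lc = 0.496; g = 9.81;
    f = @(x, u) [x(2); 3/(m*l^2)*u - 3*lc/l^2*g*sin(x(1))];
    runningCost = @(u) u.^2;                 % integrand of (9)
    % Example: one second under a constant 5 N*m torque from rest at 90 degrees.
    [t, x] = ode45(@(t, x) f(x, 5), [0 1], [pi/2; 0]);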

The pendulum assembly is composed of the pendulum and a pin that it rotates about. Minor effects due to the point of rotation not being on the tip, and the different cross section are automatically calculated within the solid model and kinetic solver programs. A sensor and actuator are added to the imported kinetic model.

The model is linked to the optimal control solver via the pseudocode⁸ in Table 2.

8 Code is available as “SimMechanics pendulum used for control optimization” (www.mathworks.com/matlabcentral/fileexchange/28597), MATLAB Central File Exchange. Retrieved Jun 10, 2012.


Figure 1: The pendulum is either inverted or suspended by changing the direction of gravity with respect to θ = 0. The solid model used is shown on the right. It consists of the pendulum and a small piece serving as the pin of rotation.

Table 2: Pseudocode for Connecting SimMechanics to DIDO

Function Dynamics, f(t, θ, θ̇, u)
    For each node (time step, t)
        Run SimMechanicsSystem([t, t + ε], [θ, θ̇], u)
        Store output θ̇, θ̈
    Return the set of θ̇, θ̈ for all nodes

The SimMechanics model consists of the kinematic tolerances, a revolute joint for the pin, an actuator for the control input (motor), a sensor for the state derivatives and a massive body for the pendulum, as shown in Figure 2. In the pseudocode,

SimMechanicsSystem simulates the system over the time span [t, t + ε] with a constant control input of u(t), where ε is a small number. The states given from DIDO, θ and θ̇, provide the initial conditions for the simulation, which outputs the state derivatives, θ̇ and θ̈, at time t. This is repeated at each node of the solution, and then the set of all outputs for all time is returned. The test solution given from DIDO is not guaranteed to be feasible, so each node is simulated in isolation. Coding the rest of the optimal control problem is presented in the user's guide for DIDO [202]. To find an estimate of the error of the combined numeric methods, starting angles in 10° increments from the final state were tested, for all starting angles up to 180°.
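A MATLAB sketch of the Table 2 wrapper is shown below. The structure fields and the SimMechanicsSystem call are placeholders for the published File Exchange code, so treat the names and signature as assumptions.

function Xdot = dynamics(primal)
% Sketch of Table 2: evaluate state derivatives at each DIDO node by
% simulating the SimMechanics model over a tiny interval with the control
% held constant. Field names and SimMechanicsSystem are hypothetical.
epsilon = 1e-6;                        % small simulation horizon
t = primal.nodes;                      % 1 x N node times
X = primal.states;                     % 2 x N states [theta; thetadot]
u = primal.controls;                   % 1 x N control torque
Xdot = zeros(size(X));
for i = 1:numel(t)
    % Each node is simulated in isolation: the test solution from DIDO is
    % not guaranteed to be feasible, so nodes only supply initial conditions.
    Xdot(:,i) = SimMechanicsSystem([t(i), t(i)+epsilon], X(:,i), u(i));
end
end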

In order to compare this method with current practices, a linear-quadratic (LQ) state transition controller, based on the optimal linear state transition, was also simulated on the system linearized about the equilibrium point.

Figure 2: The SimMechanics program solves the kinetic problem using relations of forces and rigid bodies.

Because LQ requires a linear system, it cannot account for control saturation. The linear system and control solution are

\dot{x} = \begin{bmatrix} \dot\theta \\ \ddot\theta \end{bmatrix}
        = \begin{bmatrix} 0 & 1 \\ -\frac{3 l_c g}{l^2} & 0 \end{bmatrix}
          \begin{bmatrix} \theta \\ \dot\theta \end{bmatrix}
        + \begin{bmatrix} 0 \\ \frac{3}{m l^2} \end{bmatrix} u
        = Ax + Bu    (10)

W_c = \int_{t_o}^{t_f} e^{A\tau} B B^\top e^{A^\top \tau} \, d\tau    (11)

u(t) = -B^\top e^{A^\top (t_f - t)} W_c^{-1} \left( e^{A t_f} x_o - x_f \right).    (12)

This solution exploits the linear model to reach the final state exactly; however, due to differences from the actual system, applying it to the nonlinear system results in a large error because of its open-loop formulation.
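For concreteness, Eqns. 10-12 can be evaluated numerically as sketched below; the parameter values come from Table 1 and Eqn. 8, while the quadrature call and the 40° start are illustrative assumptions.

% Minimal sketch of the open-loop minimum-energy LQ control, Eqns. 10-12.
m = 1.008; l = 1; lc = 0.496; g = 9.81;          % Table 1 values
A  = [0, 1; -3*lc*g/l^2, 0];                     % linearized dynamics
B  = [0; 3/(m*l^2)];
to = 0;  tf = 1;  xo = [deg2rad(40); 0];  xf = [0; 0];
% Controllability Gramian, Eqn. 11, by numeric quadrature.
Wc = integral(@(tau) expm(A*tau)*(B*B')*expm(A'*tau), to, tf, ...
              'ArrayValued', true);
% Open-loop control history, Eqn. 12.
u = @(t) -B' * expm(A'*(tf - t)) * (Wc \ (expm(A*tf)*xo - xf));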

The more common infinite-horizon linear-quadratic-regulator (LQR) controller can be written in the feedback form, u = −Kx. The gain K can be found from the set of equations

0 = A^\top S + S A - (S B + N) R^{-1} \left( B^\top S + N^\top \right) + Q    (13)

K = R^{-1} \left( B^\top S + N^\top \right)    (14)

which is defined by the cost function

J = \int_0^\infty \left( x^\top Q x + u^\top R u + 2 x^\top N u \right) dt.    (15)

This LQR formulation does not give results equivalent to the desired optimal control problem, because it does not impose an exact constraint on the final position but only a penalty function. The final state constraint can be approached by adjusting Q until the desired settling time is reached. The control weight, R, is held at 1 to scale the cost; N is set to zero; and the first element of Q is used to control the settling time while the other elements are zero. If this cannot be done (as for the suspended pendulum with θ_o ≥ 130°), the lowest cost controller is chosen. The measure used for settling is

E = \frac{(g/l_c)\left(1 - \cos(\theta)\right) + \frac{1}{2}\dot{\theta}^2}{(g/l_c)\left(1 - \cos(\theta_0)\right) + \frac{1}{2}\dot{\theta}_0^2}.    (16)

This provides a positive definite function over θ ∈ (−180°, 180°) similar to the total energy (kinetic and potential). The true energy equation was not used since, in the case of the inverted pendulum, energy would provide a false measure of settling: kinetic energy is converted to potential energy, using momentum to compensate for the angular error. When E < 0.002 = 0.2%, the system is considered settled.

This tuning is done on the full nonlinear simulation for accuracy. This method provides an LQR controller with an equivalent final time, but with a different control law.
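The tuning procedure can be sketched as a one-dimensional search on the first element of Q, with settling judged by Eqn. 16 on the full nonlinear simulation. The bisection scheme and the simulateNonlinear handle are assumptions, not the exact procedure used.

% Sketch: adjust Q(1,1) until the nonlinear closed-loop response settles
% (E < 0.2%, Eqn. 16) by the 1 sec final time. simulateNonlinear is a
% hypothetical handle returning theta(t) and thetadot(t) under u = -K*x.
R = 1;  N = zeros(2,1);  qlo = 1;  qhi = 1e4;  theta0 = deg2rad(40);
for iter = 1:40
    q = sqrt(qlo*qhi);                            % geometric midpoint
    K = lqr(A, B, diag([q, 0]), R, N);
    [th, thd] = simulateNonlinear(K, theta0, 1);  % simulate to t = 1 sec
    E = ((g/lc)*(1-cos(th(end))) + 0.5*thd(end)^2) / ...
        ((g/lc)*(1-cos(theta0)));                 % Eqn. 16, thetadot0 = 0
    if E < 0.002, qhi = q; else, qlo = q; end     % settled: try a softer Q
end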

4.4.2 Path evaluation

The results, condensed into cost and final error, are shown in Figures 3 and 4 for the suspended and inverted pendulum (note that the square root of J is shown). The cost of all methods was comparable, suggesting the DIDO solution is reasonable. Surprisingly, the suspended pendulum proved to be the more challenging control problem. The open-loop LQ controller achieves high accuracy for small initial angles, but as the angle grows it compensates for a larger-than-actual restoring force, resulting in a large overshoot. Ideally, the square root of J for the LQ controller would increase linearly with angle, but saturation effects limit the higher costs. The LQR controller has noticeably higher costs and, due to saturation, cannot achieve a 1 second settling time for large angles; if the control saturation constraint were removed, results would continue along the previous trend. LQR is based on asymptotic settling, so there is always finite error, while path methods, like DIDO and LQ, can have negligible error.

Figure 3: For a suspended pendulum, the cost of all methods is comparable, suggesting that results are reasonable. However, due to saturation and geometric nonlinearities, the linear-based methods fail to reach the final state for some angles, while DIDO gives consistently good results.

For the inverted pendulum, DIDO very closely mimics the LQR controller in cost, with slightly lower cost and error. The unstable nature of this system makes the results of the LQ controller irrelevant due to the large final error. The unexpected feature is that the cost peaks at 120°. The decrease in cost for further starting angles does not violate Bellman's principle, since the state, which includes velocity, is different from the one passed through by larger starting angles. When the pendulum begins near sideways, a large initial torque is required to develop momentum. At larger angles, the controller passes through this region at greater speed, and therefore in less time, resulting in a slightly lower cost.

Figure 4: For an inverted pendulum, the LQR gives a slightly higher cost than DIDO. The unstable nature of this system makes the results of the LQ controller irrelevant due to the large final error. The cost peaks at 120 degrees due to the large initial torque required as the pendulum is extended sideways.

The underlying nature of the different controllers can be seen from individual paths. The trajectory for a small starting angle, shown in Figure 5, has a weak nonlinearity for the stable system, and all controllers reach the desired state. The path-optimal controllers, LQ and DIDO, exploit the self-righting nature of the pendulum by applying a positive torque to slow the pendulum. The LQR controller is a feedback controller, and the large initial error has a corresponding spike in control. This spike results in an overshoot of the final position, and compensating for the overshoot requires more energy. As the nonlinearity increases for θ_o = 140° (see Figure 6), the overshoot caused by the initial spike is too large to settle in 1 sec. Increasing the path weight, Q, results in larger overshoot, compounding the problem, while a lower Q is insufficient to settle in 1 sec. The LQ controller expects to dampen a much larger restoring force (proportional to θ), toppling the pendulum the other way. DIDO results accurately reach the final state with a very smooth control wave.


Figure 5: For a suspended pendulum beginning at a small angle, the results from numerical optimization are compared to results using an LQ and LQR controller. With this weak nonlinearity, all controllers reach the desired state. The path-optimal LQ and DIDO controllers exploit the self-righting nature of the pendulum. The LQR controller is a feedback controller and the large initial error has a corresponding spike in control.

For the inverted pendulum (see Figures 7 and 8), despite similarity in cost and accuracy, the LQR controller functions very differently from the DIDO controller. The LQR features a stronger initial peak and asymptotically reaches the final state, while the DIDO controller is more balanced and directly approaches the final state. DIDO results closely resemble the LQ results, but provide the proper compensation for the system nonlinearities.

4.4.3 Optimality conditions

Results of the DIDO controller are checked for optimality according to Pontryagin’s minimum principle.

Feasibility is checked by the solution to the initial value problem with the given controller. For intra-node values, piecewise-cubic spline interpolation was used. The results were subtracted from the DIDO solution and scaled by the maximum absolute value of the state. These residuals represent stationarity with respect to the co-states (λ₁, λ₂). All of these were within 10⁻⁶ of zero.


Figure 6: For a suspended pendulum beginning at a large angle, the results from numerical optimization are compared to results using an LQ and LQR controller. With this strong nonlinearity, the LQ controller fails outright while the LQR approaches but does not reach the goal. DIDO is able to accurately reach the final state.


Figure 7: For an inverted pendulum beginning at a small angle, the results from numerical optimization are compared to results using an LQ and LQR controller. Even though the nonlinearity is weak, its unstable nature causes the open-loop LQ controller to not reach the final state. The feedback LQR controller takes an initially aggressive approach compared to the optimized controller.


Figure 8: For an inverted pendulum beginning at a large angle, the results from numerical optimization are compared to results using an LQ and LQR controller. The LQ controller fails again, noticeably suffering from saturation effects. The LQR controller approaches the final state asymptotically while DIDO directly approaches the final state.

Stationarity with respect to the states, which is calculated from Eqn. 5, is checked by λ̇₁ − λ₂ g l_c cos(θ)/(l²/3) and λ̇₂ − λ₁, and then scaled by the maximum absolute value of the co-state derivative. These residuals were constant over each trajectory and scaled with the θ_o of the trajectory. The residuals for all but three paths were under 4.7%, with the largest residual being 9.1%. Stationarity with respect to the control only applies when there are no active constraints on the control. The control optimality residual is 2u + λ₂/(mgl²/3). When the control constraint is not active, the maximum residual is less than 3.0%. Figure 9 shows a surface of how the control wave changes for the inverted pendulum as a function of the starting angle, with the lower surface showing how that affected the control optimality residual. All residuals are shown in Figure 10.

4.5 Practical Application: Industrial Robot

A pick-and-place command is a fundamental task for an industrial robot that loads machines or sorts parts. It consists of picking an object up from one pose and then placing it at another pose.


Figure 9: The top surface shows how the control signal for an inverted pendulum changes as a result of the starting angle. The control saturates around a 90° starting angle due to the high initial torque needed to overcome the gravity torque, which is strongest there. When the control constraint is not active, the optimality residual should be zero; otherwise, their product should be negative, as is the case with these results.


Figure 10: The residuals for all nodes for all starting angles show the optimality conditions are met to a reasonable numeric standard. Control residuals are shown as black boxes, state residuals are shown as green circles and costate residuals are shown as red crosses.

Figure 11: The coordinates for the four degree-of-freedom arm are shown on the left. The initial and final poses are in the middle and right, respectively.

A solid model was built and scaled similar to the outer four linkages of the Motoman SIA-20D [215] and is shown in Figure 11. This model was imported to Simulink and prepared as described in Section 4.4.⁹ A 2 sec execution time was chosen along with the initial and final poses of the pick-and-place command, as shown in Figure 11. The same cost function is used, but implemented in vector form,

J = \frac{1}{t_f - t_o} \int_{t_o}^{t_f} u^\top u \, d\tau    (17)

where u ∈ R⁴ with each element being the torque applied to a joint. The traditional trajectory for industrial robotics is created from a ramp to a constant velocity for each link, and this method is used for comparison purposes. The optimized path is shown through intermediate poses in Figure 12 and as signals in Figure 13.

The optimized path is very dissimilar to the traditional path. Rather than transitioning directly to the final state, the arm goes vertical to reduce the control required to hold the arm up (Joint U). Also surprisingly, the upper linkage over-rotates and then rotates back (Joint R), presumably to use two motors rather than one so that the square of the control is lowered. The square root of the cost is reduced from 45.7 Nm to 19.5 Nm, a 57% reduction.

9 Code is available as "Control optimization of a 4DOF arm using DIDO" (www.mathworks.com/matlabcentral/fileexchange/28596), MATLAB Central File Exchange. Retrieved Jun 10, 2012.

Figure 12: The path of the arm is shown by poses.

4.6 Conclusion

This chapter shows that the challenging problem of finding optimal trajectories can be solved without specific analysis of the system by using general kinetic solvers and optimal control solvers from a CAD assembly. The methods used by these solvers are well documented and reliable. Use of these programs allows for insight into high-level planning that may not be obvious from the low-level equations. A simple case was shown to test two common types of nonlinearities: geometric and saturation. Results show that the numeric method on the full nonlinear system is able to generate motions with more accuracy and lower cost than optimal paths of simplified system models, such as the optimal linear-quadratic state transition (LQ) and the infinite-time linear-quadratic-regulator (LQR). A practical case is also shown involving a more complex system. The method is able to handle stable and unstable systems, as shown by the results and contrasted to the linearized controllers which can fail on even stable systems.

Figure 13: The optimized path is shown on the left compared to the traditional path on the right.

These generalized programs allow for more freedom and innovation in the initial design phase by solving for representative paths without a commitment in analysis to a given design. Mechanisms for reducing the cost can be identified and then incorporated into the system design. This work prepares a method to be used for autonomous optimal trajectory generation on amalgamations of modular robots. Modular SimMechanics programs could be created for the robot to automatically compose a model of its current configuration.

However, there is currently still a need to check results for feasibility and sensibility. On the other hand, in many cases the method would succeed, allowing higher-level tasks, such as cooperative tasks or steps in a process, to focus on high-level behavior without low-level motions being developed by hand.

The good performance obtained by the numerical methods also shows that reasoning and understanding are not required for planning. All that is required is a high-level description of the goal or task and of the system. Rather than using heuristics to determine where to search for candidate solutions, this method simply uses perturbations to refine a solution. These perturbations can be related to local goal babbling, though incorporated by different methods. By using principles from numeric optimization, a very efficient selection of trials is tested. Only evaluations of the motions are needed, as solved by SimMechanics, but results could extend to other systems without an analytical representation, such as tests of a physical system. Because of the mathematical soundness of modern numeric methods, they can represent a near optimal comparison when the discretization of time is considered. Methods to increase the temporal resolution are presented in Chapter VI.

CHAPTER V

INVERSE FUNCTION CLUSTERING

Finding optimal inputs for a multiple-input, single-output system is taxing for a system operator. Population-based optimization is used to create sets of functions that produce a locally-optimal input based on a desired output. An operator or a high-level planner could use one of these inverse functions in real time. For the optimization, each agent in the population uses the cost and output gradients to take steps lowering the cost while maintaining its current output. When an agent reaches an optimal input for its current output, additional agents are generated in the output gradient directions. The new agents then settle to the local optima for their new output values. The set of associated optimal points forms an inverse function from a desired output to an optimal input, via spline interpolation. In this manner, multiple locally-optimal functions can be created. These functions are naturally clustered in input and output spaces, allowing for a continuous inverse function. The operator selects the best cluster over the anticipated range of desired outputs and adjusts the set point (desired output) while maintaining optimality. This reduces the demand from controlling multiple inputs to controlling a single set point with no loss in performance. Results are demonstrated on a sample set of functions and on a robot control problem.

Nomenclature

x  Input or design variable, ∈ R^n, with n > 1
Ω  Domain of x
b_{i,l}, b_{i,u}  Lower and upper bounds of Ω in the i-th dimension
y = f(x)  Output function, ∈ R
y_d(t)  Desired output value
J  Cost function, ∈ R
x*  Locally-optimal value of x based on neighboring points with identical value of f(x)
h_k(y)  The k-th optimal-input function, k ∈ {1, ..., K}
Υ_k  Domain of y for h_k(y)
C¹  The set of continuously differentiable functions
∇f(x)  Gradient of function f with respect to x
f̂, Ĵ  Length-limited versions of ∇f and ∇J, respectively, ∈ R^n
δy[i], δJ[i]  Actual change in output and cost, respectively, when going to step i
∆_J  Cost reduction step, ∈ R^n
∆_Y  Step for increasing output, ∈ R^n
β_J, β_y, β_∆  Saturation lengths for ∇J, ∇y and ∆_J, ∈ R
β_δ  Step length reduction factor, ∈ (0, 1)
σ  Armijo rule factor, ∈ (0, 1)
sat(x)  if ‖x‖ ≤ 1, sat(x) = x; otherwise sat(x) = x/‖x‖
γ_J  Maximum step length for cost, ∈ R
γ_y  Maximum step length for output, ∈ R

5.1 Introduction

Constrained optimization is the process of finding the input that satisfies required conditions and minimizes a given cost function. Robust optimization considers whether there are variations in the constraint or cost functions. It may use likelihoods to minimize the expected cost, or it may use bounds to limit constraint violation or the maximum cost. An example is minimizing the weight of a bracket based on the maximum load. If the true maximal load is less than the designed maximum load, then the design may no longer be optimal. In control applications, inputs can often be easily changed so the true optimum can be achieved.

Optimization inputs could be knots describing a control waveform, as used by collocation methods for optimal control [199, 200, 201], or control parameters for gain scheduling. An operator may set the speed of a walking robot while using the minimum-energy gait. Existing work has solved for individual optimal points or Pareto-optimal parameters and then interpolated between them for novel speeds [216, 217]. The danger of this method is that the Pareto-optimal fronts are not guaranteed to be continuous in the input space, which can result in discontinuities when transitioning between Pareto-optimal points. This work finds a path of contiguous, locally-optimal inputs, guaranteeing that transitions are smooth for a smoothly changing desired output.

A number of population based optimization methods exist, such as: genetic algorithms [218, 219], ant colony optimization [220], bacteria foraging [221] and particle swarms [222]. Each population method uses a set of agents that interact with the environment or other agents to search a large space. The essence of a swarm algorithm is that local information and interaction are used to create global behavior. For this chapter, the local information is contained by gradients and the agents act differently based on their current stage.

Explicit or implicit niching can be added so that subpopulations focus on distinct optima for multimodal global optimization [223]; examples include glowworm optimization [224], sharing and clearing in genetic algorithms [225] and partially connected neighborhoods in particle swarms [226]. By using a population, the input space is efficiently searched without the need for prior knowledge. No methods presenting a multimodal approach to creating inverse, optimal functions are present in the available literature.

From a developmental perspective, inverse functions represent the ability to master a given task. Having multiple inverse functions corresponds with the mixture-of-experts theory, since alternative solutions are retained and can be selected based on other factors. Though this chapter deals with a set of parameters, Chapters IV and VI show how a continuous function can be represented by a set of parameters. Chapter VI additionally shows how sets of parameters can be extended to continuous functions in the limit.

The chapter begins with the formal problem statement (Section 5.2), followed by a detailed description of the algorithm (Section 5.3). Two types of examples have been tested. The first, in Subsection 5.4.1, looks at a variety of output and cost functions to examine the behavior of the agents and the paths of inverse functions. Next, in Subsection 5.4.2, the method is applied to optimizing a robotic arm's poses to increase precision. The chapter concludes with a discussion of results.

5.2 Problem Statement

A set of K functions, x* = h_k(y_d) with k ∈ {1, ..., K}, is desired. Each function h_k produces x* ∈ Ω ⊂ R^n as a function of y_d ∈ Υ_k ⊂ R such that y_d = f(x*) and, for all x with f(x) = y_d in a neighborhood, J(x*) ≤ J(x); meaning x* minimizes J over neighboring points with identical output values. The functions h_k(y) should be continuous over their domain, Υ_k, which is a non-empty, open set. The closed set Ω represents the allowable input values and is defined by bounds on each individual dimension, i.e., b_{i,l} ≤ x_i ≤ b_{i,u}. In order to have a well-defined gradient, f(x) and J(x) should be in C¹ and bounded over Ω. For the Armijo condition on step size, J(x) should also be in C². The functions J(x) and f(x) should not be constant, for optimization and inversion, respectively.

The output function, f(x), represents the mapping from the input to an output. The output represents a desired condition which is selected on-the-fly by an operator. The function J(x) represents the mapping of the input to a cost. In addition to f and J, the gradients ∇f and ∇J should be available for evaluation. The cost and output should be representable solely as functions of x, meaning that no outside parameter influences either; this means that with a proper adjustment of x, both y and J can be manipulated. After selecting an inverse function, h_k, y can then be controlled to be y_d via x*, provided y_d lies in Υ_k. Each of the K functions represents the optimal inverse function for a region, so Ω is divided into separate clusters representing disjoint, locally-optimal solutions. Selection of an h_k would be based on considerations of Υ_k and the values of J(h_k(y_d)) over Υ_k.

The technique is primarily directed at multimodal optimizations over a bounded set. If the problem is convex or obvious, this method does not exploit that knowledge; even though it would come to the same solution, it would search and therefore take more function evaluations than needed. Like most optimization methods, scaling can significantly affect performance, and the relative scale is assumed to be known and reasonably constant. This method finds static optima in that it does not consider the rate at which y_d changes or the corresponding rate of change of x*. Once the functions h_k(y) are found, the limit on the rate of change of y_d can be found based on the limit of the rate of change of x, or vice versa. The inverse functions are not intended for robustness to rapidly fluctuating or unknown y_d. This method is well suited for developing a human-control interface, as shown in Section 5.4.2, where the operator would select h_k based on the anticipated range of y_d, and then vary the desired y_d with h_k automatically providing the locally-optimal x*.

5.3 Algorithm

There are two major phases of this algorithm: the first is optimization and the second is operation. To begin optimization, particles are distributed across Ω and begin constrained gradient descent. As particles find local, constrained optima, additional particles are created in the ±∇f(x) directions and then optimized to create a set of (x*, f(x*)) pairs, which are used to create an h_k(y). The final phase is the execution phase, where an h_k is selected and used online as y_d is obtained. Calculations in the execution phase are computationally simple for real-time processing. The overview is shown in Figure 14. This method abstracts the low-level behavior of finding optimal points in n-dimensional space, allowing the operator to simply adjust a set point in real time.

5.3.1 Optimization

The principle behind swarm optimization is that a population of agents using simple, local rules can efficiently locate an optimal point. Traditional particle swarm optimizations are concerned with exploring to find the global optimum. In this work, local optima are desired and the effect of moving an agent is known from gradients, so mechanisms like particle velocity and local best are not needed for good performance.


Figure 14: There are two phases of the algorithm. The clusters of optima are found forming the functions hk(y). The set of hk’s are used to adapt to changing conditions, such as the operator set point, yd.

Agents locate the optimal point along their current output contour by stepping along −∇J, restricted to the component that is orthogonal to the output gradient, ∇f. Using the Armijo condition on step length, the change in output can be bounded and the sequence of costs will converge to a stationary point. The associated proofs are shown in Addendum 5.6.1 of this chapter.

To ensure that the step size is sufficiently small and prevent ill-defined directions for very small gradients, these gradients are saturated by

\hat{J}(x) = \mathrm{sat}\left( \nabla J(x) / \beta_J \right)    (18)

\hat{f}(x) = \mathrm{sat}\left( \nabla f(x) / \beta_y \right)    (19)

where sat(x) returns x for the case ‖x‖ ≤ 1 and x/‖x‖ otherwise. When ‖x‖ ≤ 1, the vector magnitude diminishes with the gradient magnitude; in the other case, the magnitude is limited to a unit length. The value of β_J is chosen so that the step length diminishes as an optimum point is approached. The value of β_y is chosen so that when ‖∇f‖ gets small, meaning the output is relatively constant, the length of f̂ diminishes. The step direction vector is given by

\Delta_J(x) = -\hat{J}(x) + \left( \hat{J}(x)^\top \hat{f}(x) \right) \hat{f}(x).    (20)

When f̂ has unit magnitude (‖∇f(x)‖ ≥ β_y), then ∆_J is the component of −Ĵ orthogonal to ∇f. As f̂ diminishes in magnitude to zero, ∆_J extends to −Ĵ for a greater decrease in J with only a minor change in y. The maximum length of ∆_J is a unit length and occurs when ‖∇J(x)‖ ≥ β_J and ∇J^⊤∇f = 0; otherwise, the length of ∆_J is ‖∇J‖/β_J times the sine of the angle between the vectors.

The update rule for the i-th step of the j-th agent is given by

x_j[i+1] = x_j[i] + \gamma_J (\beta_\delta)^m \, \mathrm{sat}\left( \Delta_J(x_j[i]) / \beta_\Delta \right)    (21)

where m is the lowest nonnegative integer that satisfies the Armijo condition:

J(x_j[i+1]) - J(x_j[i]) \le \sigma \gamma_J (\beta_\delta)^m \, \mathrm{sat}\left( \Delta_J(x_j[i]) / \beta_\Delta \right)^\top \nabla J(x_j[i]).    (22)

This condition requires that the cost decrease by at least a fraction, σ, of the expected decrease based on the local gradient and the step. If x_j[i+1] were to step past the lowest cost in the step direction, the cost would increase. By successively reducing the step by a factor of β_δ, a limit cycle is prevented in which the reduction in cost goes to zero though the step length does not. The value of γ_J should be small enough to capture the relevant features of f(x) and J(x). The saturation length β_∆ specifies the threshold at which ∆_J should be reduced, due to either a small cost gradient or alignment of the two gradients.
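A compact MATLAB sketch of the step rule, Eqns. 18-22, is given below; the handles J, gradJ and gradF and the parameter struct p are assumptions, not the implementation used for the results.

function xnext = constrainedStep(x, J, gradJ, gradF, p)
% One agent update, Eqns. 18-22. p carries the step parameters
% (p.bJ, p.by, p.bD, p.gJ, p.bd, p.sigma); all handles/names assumed.
sat  = @(v) v / max(norm(v), 1);              % unit-ball saturation
Jh   = sat(gradJ(x) / p.bJ);                  % Eqn. 18
fh   = sat(gradF(x) / p.by);                  % Eqn. 19
dJ   = -Jh + (Jh' * fh) * fh;                 % Eqn. 20
step = sat(dJ / p.bD);
m = 0;
while true                                    % Armijo backtracking
    xnext = x + p.gJ * p.bd^m * step;         % Eqn. 21
    if J(xnext) - J(x) <= p.sigma * p.gJ * p.bd^m * (step' * gradJ(x))
        break;                                % Eqn. 22 satisfied
    end
    m = m + 1;
end
end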

If this step would cause x_j[i+1] to fall outside Ω, then the magnitude of the step is reduced until x_j[i+1] lies on the boundary of Ω; then an additional step is taken with the domain-limited components of Ĵ and f̂ reduced to zero. The step of Eqn. 21 without the domain-limited components is repeated until no boundary violation occurs. Agents that approach within a threshold of a cluster or another agent are expected to converge to the same cluster, so one of the two agents is eliminated.

If ‖x_j[i+1] − x_j[i]‖ ≤ ∆_min for an agent, then the first-order, constrained optimality conditions are satisfied within a bound.¹⁰ The location of the agent can be described as having no directions that maintain the current output and reduce the cost significantly while remaining in Ω. The j-th agent is removed from the population and a new cluster is built about the location of this settled agent, x*_j. This location is used as the first member, x_{k,0}, where k − 1 is the number of previously created clusters.

10 First-order optimality conditions provide a necessary but not sufficient condition for optimality. Constrained maxima also satisfy these conditions, but are unlikely due to the unstable equilibrium caused by the first even non-zero partial derivative, in the null space of the output gradient after an odd partial derivative, being negative definite. The use of saturation and optimality thresholds does exacerbate this issue, however.

A cluster is grown by creating agents in the positive and negative directions of ∇f(x). Agents are created by

\Delta_Y(x) = \gamma_y \hat{f}(x)    (23)

x_{k,p+1}[0] = x_{k,p} + \Delta_Y(x_{k,p})    (24)

x_{k,q-1}[0] = x_{k,q} - \Delta_Y(x_{k,q})    (25)

where p and q are the previous largest and smallest indices of the cluster. Both xk,p+1 and xk,q−1 are then updated according to Eqn. 21 until settled. This progression is shown for a few steps in Figure 15. If an agent created from Eqn.’s 24 or 25 settles neither too close nor too far and the change of f(x) is in the correct direction, then it is added as a data point to the kth cluster and Eqn.’s 24-25 are used to continue the cluster formation process. If the agent fails to settle in a given number of iterations, then the new point has not been determined to be optimal and this direction of the cluster is terminated. If the agent settles too near or too far, then the cluster likely does not extend in Υ in that direction and the cluster is terminated in that direction.
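The cluster-growing loop of Eqns. 23-25 can be sketched as below; settleAgent (which iterates the step rule until movement falls below ∆_min, returning empty on failure to settle) and the dMin/dMax thresholds are assumptions standing in for the criteria described above.

% Sketch of growing the k-th cluster from a settled agent xk0 (Eqns. 23-25).
% settleAgent and gradF are hypothetical handles; dMin/dMax are the
% "too near"/"too far" acceptance thresholds from the text.
sat = @(v) v / max(norm(v), 1);
cluster = xk0;                                 % first member, x_{k,0}
for s = [+1, -1]                               % grow both output directions
    tip = xk0;
    while true
        dY   = p.gy * sat(gradF(tip) / p.by);  % Eqn. 23
        cand = settleAgent(tip + s*dY);        % Eqn. 24 or 25, then Eqn. 21
        if isempty(cand) || norm(cand - tip) < dMin || norm(cand - tip) > dMax
            break;                             % terminate this direction
        end
        cluster = [cluster, cand];  tip = cand;
    end
end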

By requiring a non-zero change in J(x*) and x*, the inverse function is guaranteed to have a finite Lipschitz constant. Clusters that contain only one point are discarded since they have an empty Υ domain. The set of points in the k-th cluster is used to form h_k. The process of creating the inverse function, Eqns. 21, 24 and 25, is executed while steps of the general population are suspended. Since a cluster is known to exist there, it is prudent to mark its extents before general exploration.

5.3.2 Execution and cluster organization

Since the output and cost are scalars, and contain the information important to the operator, they offer a convenient representation to examine and test clusters. Plots described in this section will be shown for the following examples. By plotting the cost versus the output for the set of clusters, the operator can focus on the range of values in the output's domain that are of interest and then select the cluster with the lowest cost, averaged according to the operator's judgement.

Figure 15: The k-th cluster is formed about the k-th settled point, x_{k,0}. New agents are added in the positive and negative output directions by Eqns. 24 and 25, shown by ×'s at [0]. New agents step by Eqn. 21 and eventually settle to locations shown by ◦'s, which are added to the cluster provided they meet certain criteria. The process of adding and updating agents is repeated until that direction is terminated. The set of settled points is then used to form h_k via interpolation, as shown by the dashed line.

To compare a set of clusters in the x domain, each dimension of x can be plotted against y. This allows access to the low-level behavior of the optimal function, so the operator can identify potential problems not incorporated in the optimization, examine transitions between clusters, or gain insight into optimality tradeoffs.

In order to construct h_k(y_d), a piecewise, n-dimensional, cubic Hermite spline is used that preserves the local monotonicity of the data [227]. This cubic interpolation provides a function that is C¹ and will capture sharp features of the clusters, such as those generated by interactions with the boundary of Ω, without the oscillation that smoother interpolations would generate.
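In MATLAB, the shape-preserving interpolation of [227] is available as pchip, which can serve as the h_k constructor; the cluster arrays below are assumed names, not the code used for the results.

% Sketch: build h_k from a cluster's (y, x*) pairs with the monotone
% piecewise-cubic Hermite spline (pchip). yNodes is 1 x P (sorted outputs)
% and xNodes is n x P (settled optimal inputs), so the interpolation is
% vector-valued. Names are hypothetical.
hk = @(yd) pchip(yNodes, xNodes, yd);   % returns n x numel(yd) inputs
xStar = hk(0.5);                        % locally-optimal input for yd = 0.5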

Optimality and accuracy of a cluster can easily be verified via a Monte Carlo simulation. Points along the cluster are generated by h_k and the actual cost and output are calculated. The difference between the predicted and actual output indicates the accuracy of the cluster. Perturbations from the cluster nodes are randomly generated based on the scale of the envelope of the cluster in Ω. The cost and output of each of these neighboring points are calculated. A plot of cost versus output should have the points from h_k along the lower bound of the data, meaning that for any given output in Υ_k, all the perturbations in the input space result in a higher cost. However, if perturbations leave the defined locally-optimal neighborhood of the cluster, then it may be possible for neighboring points to result in a lower cost; but these points would not offer a continuous function connecting them to the other points in h_k.

5.4 Results

Two different tests are presented. First, tests were conducted across a set of two-dimensional problems and analyzed graphically. At strict extrema of the output, the point is optimal because no neighboring points have the same output. Clusters extend from these extrema according to the local cost contours. Second, the poses of two robotic arms are optimized to improve the precision of the radial distance to the tool tip. Precision relates to the sensitivity of distance to the different joint angles. By reducing the sensitivity of the distance to joint angles, a higher precision is obtained for the same joint angle resolution of the equipment.

5.4.1 2-D test problems

Four functions were tested in all permutations as J(x) and f(x). They were

1. Multimodal Gaussian11

2. Quadratic: z = x_1^2 + (x_2 - 0.2)^2

3. Linear-Quadratic: z = 1.25 (x_1 - 0.2) x_2^2

4. Periodic: z = x1 sin(πx2)

11 Equation for the Multimodal Gaussian:
z = \left[ 3(1 - 2x_1)^2 e^{-4x_1^2 - (2x_2+1)^2} - e^{-(2x_1+1)^2 - 4x_2^2}/3 - 10\left( 2x_1/5 - 8x_1^3 - 32x_2^5 \right) e^{-4x_1^2 - 4x_2^2} \right] / 7

This set included multimodal functions, broad flat sections, saddle points and some obvious solutions. The input set is Ω: {−1 ≤ x_1 ≤ 1, −1 ≤ x_2 ≤ 1.5}. These functions are scaled to provide gradients typically about a unit length. One set of optimization parameters was chosen and performed well on all functions, but parametric influence was not studied. Initial population size was 60, with a maximum number of population iterations of 400 and 150 steps for a cluster point. These limits were rarely reached because the population size quickly decreased. New cluster tips were restricted to be at least 7.5e-6 away from the current tip but within 0.1 of it. Agents within 0.15 of another agent or cluster triggered removal of an agent. Step parameters were β_J = 0.05, β_y = 0.025, β_∆ = 0.5, γ_J = 0.05, γ_y = 0.03, β_δ = 0.88, σ = 0.02.
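For reference, the four test functions translate directly to MATLAB as below; the multimodal Gaussian follows the footnote's expression (a scaled variant of the classic peaks surface).

% The four 2-D test functions used in all permutations as J(x) and f(x).
gauss = @(x1,x2) (3*(1-2*x1).^2 .* exp(-4*x1.^2 - (2*x2+1).^2) ...
      - exp(-(2*x1+1).^2 - 4*x2.^2)/3 ...
      - 10*(2*x1/5 - 8*x1.^3 - 32*x2.^5) .* exp(-4*x1.^2 - 4*x2.^2))/7;
quadf = @(x1,x2) x1.^2 + (x2 - 0.2).^2;      % 2. quadratic
linq  = @(x1,x2) 1.25*(x1 - 0.2).*x2.^2;     % 3. linear-quadratic
perio = @(x1,x2) x1.*sin(pi*x2);             % 4. periodic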

Two illustrative examples are shown. The first has the quadratic function for the cost with the periodic function for the output. Results are shown in Ω in Figure 16 and in ∪_k Υ_k in Figure 17. The next example has the linear-quadratic function for the cost with the multimodal Gaussian function as the output. Results are shown in Ω in Figure 18 and in ∪_k Υ_k in Figure 19. As would be expected, the clusters have tips near extrema in the output. Clusters also formed where the output becomes flat, resulting in a near discontinuity of h_k, or where the cost function becomes flat, resulting in a local optimum. Other clusters are formed along the domain boundary until a valley in the cost function is reached. For a high dimension system, an operator would look across the domain of ∪_k Υ_k, first for the cost, and then for x to determine cluster nearness if a transition must be made.

The test of a cluster from Figure 16 is shown in Figure 20. Points in the cluster were accurate to 1% of the full range of y_d for all permutations of functions. When the cost of test points was compared to randomly generated neighboring points in Ω, the points from h_k would consistently be on the lower boundary for a given y, except when the neighboring points would extend into the optimality watershed of another cluster. This is still consistent with the points being locally optimal, since the size of the neighborhood used to generate the random points was not limited based on the mathematically defined neighborhood used for optimality determination. Neighboring points were limited to Ω, however.

Figure 16: The clusters formed with the quadratic cost function and the periodic output function are shown. Different clusters are shown by a different color. Each cluster represents a locally-optimal point for the output contour.

Figure 17: The clusters for Figure 16 are shown as yd varies. Each dimension of the set of hk’s is shown along with the associated cost. The light green cluster is obviously the preferred cluster covering all of yd and always having the lowest cost.

Figure 18: The clusters formed with the linear-quadratic cost function and the multimodal Gaussian as the output function are shown. Different clusters are shown by a different color. Each cluster represents a locally-optimal point for the output contour.

Figure 19: The clusters for Figure 18 are shown as y_d varies. Each dimension of the set of h_k's is shown along with the associated cost. The cost chart can be used to select an appropriate cluster. If one cluster cannot be found, then the upper charts can be used to find clusters that are near, such as the red and yellow cluster going from y_d < 0 to y_d > 0 at x_1 = −1 and x_2 = −1.

Figure 20: A cluster from Figure 16 is tested for optimality and accuracy. Accuracy is checked by generating test points (in dark blue) on the cluster (in red) for random values of y_d and comparing f(x*) − y_d to zero. Test points in the neighborhood of x* (in light green) are compared by cost in Υ_k. Cluster points are always lower than the test points, except in the case where the test points extend away from the cluster and into another cluster's optimality neighborhood.

Results of these test problems show that Eqns. 21, 24 and 25 do capture the relevant features necessary for a locally-optimal, continuous inverse function. The constrained gradient step presented offers sufficient accuracy without the need for second-order approximations. The criteria for removing agents allow the large population to be quickly reduced, since most agents converge as they approach a local optimum. As a result, the method offers an efficient means of segmenting a large search space into inverse functions.

5.4.2 Robotic arm control

Robotic arms with revolute joints have geometric nonlinearities, so multiple poses exist with identical positions of the tool, even for non-redundant systems. The end of the tool will be considered the tip for this work. Only location will be considered, not orientation. Redundant systems offer more joints than degrees of freedom, so there are automatically multiple poses with identical tip locations. The Jacobian of the tool tip location represents the sensitivity of the tool tip's position in each direction to each joint's angle and is a function of the current pose. Due to practical limitations, each joint has a limited angular resolution, so larger values in the Jacobian result in lower precision. Because of the trigonometric nonlinearities, optimizing the pose to increase precision is a multimodal problem.

This subsection examines increasing the precision of the radial distance for robotic arms with 4 and 7 revolute joints. The geometry of the linkages is based on the Yaskawa Motoman HP-3 [228] and IA-20 [215] with a 101 mm tool length. These problems reduce to a 2-dimensional and a 3-dimensional optimization. The optimization for the 7-revolute-jointed robot, the IA-20, includes revolute joints whose axes align with the link axis, termed twist joints. The radial distance is less sensitive to these twist joints, creating poor scaling, yet the algorithm performs well.

Planar robot

The arm considered can be viewed as a planar linkage, with the plane being allowed to rotate. The actual HP-3 also includes two additional twist joints, but those are held at a zero angle so that the poses can be viewed in a 2D plane. The layout of the joints is shown in Figure 21. For analysis, an angle of zero represents the adjoining links being collinear. The distance to the tip is taken from the robot's base, located at the axis of θ_0. By changing θ_0, any planar point at that radius can be selected. By also rotating the robot's plane, any cylindrical point can be chosen. For simplicity, self intersections of the robot are not considered since they would violate the domain assumptions given in the problem statement.

Figure 21: The planar robot has three revolute joints. Additionally, the base rotates so that the robot's plane can be arbitrarily chosen.

The output is the square of the distance to the tool tip, found by the dot product of the tool tip position with itself.

y = R^\top R
  = L_1^2 + L_2^2 + L_3^2 + 2 L_1 L_2 \cos(\theta_1) + 2 L_1 L_3 \cos(\theta_1 + \theta_2) + 2 L_2 L_3 \cos(\theta_2)    (26)

with R being the vector to the tip and L_1 = 0.290 m, L_2 = 0.312 m and L_3 = 0.192 m being the distances between the θ_0 axis, the θ_1 axis, the θ_2 axis and the tool tip, respectively. The cost is the squared magnitude of the gradient of y, found by a dot product, and scaled by a factor of ten,

J = 10 \, \nabla(R^\top R)^\top \nabla(R^\top R)
  = 10 \left[ L_1^2 \left( L_2 \sin(\theta_1) + L_3 \sin(\theta_1 + \theta_2) \right)^2 + L_3^2 \left( L_2 \sin(\theta_2) + L_1 \sin(\theta_1 + \theta_2) \right)^2 \right],    (27)

and the actual radial precision is

P = \sqrt{\frac{J}{10\, y}} \, d\theta    (28)

73 Figure 22: The cost function was scaled so that the magnitude of its gradient was similar to the output function’s gradient.

where dθ is the angular resolution of all the joints. The joint limits were −1.75 rad (−100°) ≤ θ_1 ≤ 2.83 rad (162°) and −2.18 rad (−125°) ≤ θ_2 ≤ 2.18 rad (125°).

Note that θ_0 does not appear in Eqn. 26 or 27, resulting in a 2-dimensional optimization. This is because θ_0 moves the tool tip perpendicular to the radial position, so it has no effect on the radial distance and can be removed from the optimization. The gradients of each function were found algebraically for an exact evaluation. The scaling of J by 10 was chosen so that the distributions of the magnitudes of each gradient were similar (see Figure 22), which happened to be on the order of 0.2. Step parameters were β_J = 0.025, β_y = 0.05, β_∆ = 0.5, γ_J = 0.025, γ_y = 0.08, β_δ = 0.88, σ = 0.4.
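Eqns. 26 and 27 translate directly; the sketch below also includes the algebraic output gradient, while the cost gradient is stood in by central differences as an assumption (the work itself used algebraic gradients).

% Sketch: output, output gradient and cost for the planar HP-3 reduction,
% Eqns. 26-27, with x = [theta1; theta2] in radians.
L1 = 0.290; L2 = 0.312; L3 = 0.192;
y  = @(x) L1^2 + L2^2 + L3^2 + 2*L1*L2*cos(x(1)) ...
        + 2*L1*L3*cos(x(1)+x(2)) + 2*L2*L3*cos(x(2));
gy = @(x) [-2*L1*(L2*sin(x(1)) + L3*sin(x(1)+x(2))); ...
           -2*L3*(L2*sin(x(2)) + L1*sin(x(1)+x(2)))];
Jc = @(x) 10*(L1^2*(L2*sin(x(1)) + L3*sin(x(1)+x(2)))^2 ...
            + L3^2*(L2*sin(x(2)) + L1*sin(x(1)+x(2)))^2);
% Central-difference stand-in for the algebraic cost gradient.
h  = 1e-6;
gJ = @(x) [(Jc(x+[h;0]) - Jc(x-[h;0])); (Jc(x+[0;h]) - Jc(x-[0;h]))]/(2*h);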

The output and cost functions are shown in Figure 23. The output contours are fairly elliptical, resulting in low costs along the major axis and high cost along its minor axis. This simplification breaks down however as the extents of the domain are approached.

Figure 23: The output has mostly elliptical contours resulting in low cost along the major axis and higher costs along the minor axis.

Five clusters were formed, as shown in Figure 24 and Figure 25. Two pairs are reflections ('+' with '◦' and '□' with '△'), though '+' and '□' have larger domains due to the asymmetric range of θ_1. Due to the larger positive range of θ_1, a fifth cluster ('×') is formed that outperforms the others for a small range of the output (0.020 < y_d ≤ 0.075 m²). Testing of the accuracy and optimality for the '+' and '×' clusters can be seen in Figure 26.

Results have been implemented on a real robot. Rather than doing real-time control, a circle is traced to provide a consistent comparison. Results generalize to other path shapes that lie in the same range, differing only in the distribution of results. The first step is to select a cluster to use for the inverse function. The circle was positioned 0.3 m in front of the base with a radius of 0.1 m. This corresponds to y_d going between 0.04 and 0.16 m². A set of candidate inverse functions is formed based on an expected range for the output. Clusters '×', '◦' and '+' do not cover this range, leaving only the '□' and '△' clusters. From the set of valid candidates, one is chosen based on preference. The '□' cluster approaches the circle from below while '△' approaches it from the top, and both have the same cost. Depending on the application, these clusters provide two distinct solutions where one may be preferred above the other. Note that joint angle limits may prevent some points from being reached, so the reachable envelope should also be considered. For example, if the

Figure 24: The inverse functions are traced in the joint angle space and distinguished by marker type.

Figure 25: Once the application is known, the range of distances will limit the choice of clusters. The operator could also consider the distribution of the cost or other factors, such as if a positive or negative θ1 is desirable. Marker codes match results in Figure 24.

(a) Inverse function '+'

(b) Inverse function ‘×’

Figure 26: Points interpolated for each inverse function of the HP-3 show the expected output, with surrounding points showing higher costs.

circle were lower, the '△' inverse function would still be valid while the '□' inverse function would exceed the limits of θ_0 (which does not appear in the optimization).

The HP-3 has an existing controller that accepts a desired set of joint angles. For this example, the desired 3D point is known based on the circle's coordinates. For free-form operation, the operator would specify either the direction and distance or a 3D point. Cartesian points are converted to a direction and distance. The base angle (not labeled) is determined by the plane containing the point and the base axis. From the distance of the θ_0 axis to the point, θ_1 and θ_2 are calculated by h_k for the chosen cluster. The choice of θ_1 and θ_2 will result in a planar angle, and θ_0 will make up the balance to the desired planar angle. Calculation of θ_1 and θ_2 by h_k uses interpolation, and the other two joints are calculated by closed-form trigonometric formulas. These are basic calculations which are feasible for most real-time systems. The calculated joint angles are then sent to the robot at the desired time, or for a known path can be precomputed and sent as a trajectory.
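The execution-phase calculation can be sketched as follows; the target point, the planarAngle helper and the geometry simplifications are hypothetical, but the pattern of interpolation plus a closed-form balance matches the description above.

% Sketch: execution phase for the planar robot. Given a desired Cartesian
% point, pick the base plane, evaluate h_k for theta1/theta2, and let
% theta0 make up the balance to the desired planar angle.
p      = [0.30; 0.05; 0.10];               % hypothetical target point (m)
base   = atan2(p(2), p(1));                % plane containing point and base
yd     = p(1)^2 + p(2)^2 + p(3)^2;         % squared radial distance
th12   = hk(yd);                           % [theta1; theta2] via pchip
phi    = atan2(p(3), hypot(p(1), p(2)));   % desired planar angle
theta0 = phi - planarAngle(th12);          % planarAngle is an assumed helper
angles = [base; theta0; th12];             % joint command for the controller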

Results of the ‘ ’ cluster are compared to a standard configuration which keeps the tool vertical. Portions 4 of the path are shown by poses in Figure 27 and Figure 28 for the standard and optimal path, respectively.

Results in Figure 29 show the cost measure decreasing by half of the original value corresponding to a 25% increase in precision as evaluated by Eqn. 26-28. The scaling given in Figure 29 for precision assumes a 14 bit encoder over 2π radians, resulting in a joint resolution of 0.088 degrees/pulse. Results could be directly implemented even across large volumes since the joints are controlled directly, opposed to using additional precision stages which would have limited range and require precision sensors for feedback.

Complex robot

To consider a more complex application, the Motoman IA-20 robotic arm was used. This arm has 7 degrees of freedom, but the first three axes intersect, making them irrelevant to the distance to the tip, and the final axis twists the tool and is also irrelevant. The result is a 3-dimensional optimization. The layout of the joints is shown in Figure 30. The distance to the tip is taken from the intersection of the θ_0, θ_1 and θ_2 axes.

(a) 0% (b) 25% (c) 50%

Figure 27: Poses of the HP-3 as it traces a circle using standard planning. For reference, a circle showing the entire path is added.

(a) 0% (b) 25% (c) 50%

Figure 28: Poses of the HP-3 as it traces a circle using an optimal inverse function. For reference, the circle showing the entire path is added.

Figure 29: Use of the optimal inverse function gave about a 25% improvement in precision over the standard pose. A joint resolution of 0.088 deg/pulse is assumed to provide a magnitude to the results.

Similar to the planar robot, any spherical point can be reached by selection of θ_0 and θ_1. In addition, if θ_1 ≠ 0, θ_2 allows additional freedom in selecting the plane of tool approach. Again, self intersections are not considered. The same output and cost measures are used:

y = 4 L_1^2 + 4 L_3^2 + L_5^2 + 8 L_1 L_3 \cos(\theta_3) + 4 L_3 L_5 \cos(\theta_5) + 4 L_1 L_5 \cos(\theta_3)\cos(\theta_5) - 4 L_1 L_5 \sin(\theta_3)\cos(\theta_4)\sin(\theta_5)    (29)

J = 10 \big[ 4 L_5^2 \left( L_3 \sin(\theta_5) + L_1 \cos(\theta_3)\sin(\theta_5) + L_1 \sin(\theta_3)\cos(\theta_4)\cos(\theta_5) \right)^2
  + 4 L_1^2 \left( 2 L_3 \sin(\theta_3) + L_5 \sin(\theta_3)\cos(\theta_5) + L_5 \cos(\theta_3)\cos(\theta_4)\sin(\theta_5) \right)^2
  + 4 L_1^2 L_5^2 \left( \sin(\theta_3)\sin(\theta_4)\sin(\theta_5) \right)^2 \big]    (30)

with L_1 = 0.260 m and L_3 = 0.195 m being half the distances between the θ_1 axis, the θ_3 axis and the θ_5 axis, respectively, and L_5 = 0.332 m being the distance from the θ_5 axis to the tool tip. Though the gradient samples were roughly twice the size of the previous problem's, the same optimization parameters were used to show the robustness of the algorithm to parameter tuning. Isosurfaces showed that θ_4 only had a minor influence, thus creating poor scaling for the problem; see Figure 31.

Due to the additional complexity and symmetry in the system, more clusters were formed. Looking at the inverse functions in Figure 32 provides some insight into the system. The last joint is turned in as much as possible and only extends to increase the possible reach. At lower distances, θ_4 rotates the tip link to change the distance, and then θ_3 will extend with a minor change in θ_4. The slight differences in the cluster paths in θ_3 may appear as poor resolution showing redundant paths, but in fact each is distinctly locally optimal. The slight differences are due to left- and right-handed orientations influencing sensitivity. Testing for two of these clusters can be seen in Figure 33.

Figure 30: The complex robot has seven revolute joints though only θ_3, θ_4 and θ_5 appear in the optimization.

Figure 31: The cost and output functions are ellipsoidal in θ3 and θ5, but θ4 has a lesser effect shown by the minor changes along the θ4 direction.

Figure 32: Due to additional symmetries of the IA-20, twice as many clusters are formed for the 3-dimensional optimization.

(a) IA-20 inverse function example #1

(b) IA-20 inverse function example #2

Figure 33: Points interpolated from the inverse function (dark blue) show the expected output and surrounding points (light green) show higher costs.

5.5 Conclusion

Results show that by using a population search and clustering, locally-optimal inverse functions can be found, that practical accuracy can be obtained, and that the algorithm works on a diverse set of functions. The algorithm scales well in the number of dimensions of Ω for optimization, and especially well in execution, where only spline interpolation is required. The inverse functions are Lipschitz continuous, so there exists a small enough rate of change of the output that satisfies limitations on input rates of change.

Unfortunately, there are no other published results for multimodal, inverse, optimal functions to compare against, since most work optimizes for a specific operating point or allows discontinuous inverse functions.

The advantage of this work is that the optimal input can then be calculated in real time once the current desired output is determined, such as by a human operator monitoring the system. With this tool, the operator is relieved from performing a high-dimension optimization in real time and simply needs to adjust a scalar after selecting the appropriate inverse function.

There are limitations of the current algorithm that deserve a second mention. First, the current work forms Ω as a hypercube, so that boundary interactions are regular. Sets with complicated boundaries may be transformable into a hypercube; but if a transform exists, it may not be obvious, or it may introduce disproportionate scaling. Otherwise, a subset of the complicated domain could be used. The algorithm could be implemented on multiple subsets, and adjoining subsets would each have a cluster that shares a common point on the adjoining boundary. Second, only scalar outputs are considered, which limits the number of applications. The scalar output allows for natural organization of the cluster: only a single sweep of output values is needed to cover the domain of the inverse function. The algorithm is conceptually possible to extend to higher output dimensions, but would require additional investigation of the best interpolation, of the process of finding nodes to cover the inverse function domain, and of methods for dealing with degenerate cases. Scalability in the output dimension is also a concern, since it may require exponential growth in complexity. Despite these limitations, this algorithm extends the literature with a novel, general-purpose algorithm that finds continuous, locally-optimal inverse functions.

Acknowledgements

The authors are grateful to Temesguen Messay-Kebede for his experience and taking the time to run the paths on the HP-3.

5.6 Addendum

5.6.1 Convergence proof

Proof of convergence is established by showing that the cost is non-increasing, that the change in output is finite for a finite sequence, that a non-zero step length exists that meets the Armijo condition, and that J for any infinite sequence will diverge to −∞. Proofs will be shown for the j-th agent.

The change in cost at a step is approximated by the Taylor series expansion:

J(x_j[i+1]) - J(x_j[i]) = \delta J_j[i] = \nabla J^\top \gamma_J (\beta_\delta)^m \, \mathrm{sat}\left( \Delta_J(x_j[i]) / \beta_\Delta \right) + \text{H.O.T.}    (31)

where the higher order terms will be neglected from this point on. The sat function can be rewritten to give

$$\delta J_j[i] = \nabla J^\top \gamma_J \beta_\delta^m \,\frac{\Delta_J(x_j[i])}{\max\left(\|\Delta_J(x_j[i])\|, \beta_\Delta\right)}. \tag{32}$$

Additional substitutions can be made noting that $\nabla J = \max(\|\nabla J\|, \beta_J)\,\hat{J}$:

$$\begin{aligned}
\delta J_j[i] &= \frac{\gamma_J \beta_\delta^m}{\max(\|\Delta_J\|, \beta_\Delta)} \left( -\nabla J^\top \hat{J} + \left(\hat{J}^\top \hat{f}\right)\left(\nabla J^\top \hat{f}\right) \right) \\
&= \frac{\gamma_J \beta_\delta^m}{\max(\|\Delta_J\|, \beta_\Delta)} \left( -\max(\|\nabla J\|, \beta_J)\,\hat{J}^\top \hat{J} + \max(\|\nabla J\|, \beta_J)\left(\hat{J}^\top \hat{f}\right)^2 \right) \\
&= \frac{\gamma_J \beta_\delta^m \max(\|\nabla J\|, \beta_J)}{\max(\|\Delta_J\|, \beta_\Delta)} \left( -\|\hat{J}\|^2 + \left(\hat{J}^\top \hat{f}\right)^2 \right)
\end{aligned} \tag{33}$$

The inner product $\hat{J}^\top \hat{f}$ can be expressed as $\|\hat{J}\|\|\hat{f}\|\cos(\theta)$, where $\theta$ is the angle between the vectors. Hence

$$\delta J_j[i] = \frac{\gamma_J \beta_\delta^m \|\hat{J}\|^2 \max(\|\nabla J\|, \beta_J)}{\max(\|\Delta_J\|, \beta_\Delta)} \left( \|\hat{f}\|^2 \cos^2(\theta) - 1 \right) \tag{34}$$

where all the terms outside the parentheses are greater than or equal to zero and the quantity in the parentheses is less than or equal to zero. The only cases where the change in cost would be zero are if $\|\nabla J\| = 0$, if $\|\hat{f}\|\cos(\theta) = \pm 1$ or if $m = \infty$. For the first case, $\|\nabla J\| = 0$, the point satisfies first-order optimality regardless of the constraint condition. In the second case, $\|\hat{f}\|\cos(\theta) = \pm 1$ means the cost gradient is parallel to the output gradient, so there is no direction that satisfies the equality constraint while lowering the cost. It will be shown that there always exists a finite $m$ that satisfies the Armijo condition. Hence any point that is converged to meets the first-order constrained optimality conditions.

The step rule also bounds the change in output for a step. Following the same approach,

$$\begin{aligned}
\delta y[i] &= y(x_j[i+1]) - y(x_j[i]) \\
&= \nabla f^\top \gamma_J \beta_\delta^m \,\mathrm{sat}\!\left(\Delta_J(x_j[i])/\beta_\Delta\right) \\
&= \nabla f^\top \gamma_J \beta_\delta^m \,\frac{\Delta_J(x_j[i])}{\max\left(\|\Delta_J(x_j[i])\|, \beta_\Delta\right)} \\
&= \frac{\gamma_J \beta_\delta^m}{\max(\|\Delta_J\|, \beta_\Delta)} \left( -\max(\|\nabla f\|, \beta_y)\,\hat{f}^\top \hat{J} + \max(\|\nabla f\|, \beta_y)\left(\hat{J}^\top \hat{f}\right)\left(\hat{f}^\top \hat{f}\right) \right) \\
&= \frac{\gamma_J \beta_\delta^m \max(\|\nabla f\|, \beta_y)}{\max(\|\Delta_J\|, \beta_\Delta)} \left(\hat{J}^\top \hat{f}\right)\left( \|\hat{f}\|^2 - 1 \right).
\end{aligned} \tag{35}$$

Here the change per step is bounded given a Lipschitz continuity condition on $f$. Note that the sign of $\hat{J}^\top \hat{f}$ is based on the orientation of the two vectors, so the output will consistently decrease if the vectors are in the same direction and increase otherwise. Unfortunately, the bound cannot be arbitrarily limited, but is based on the specific problem's topology.

Given that $J$ is in $\mathcal{C}^2$, the derivative of $J$ along a given direction is also continuous, with its rate of change bounded. The expected change from the linear approximation is $\gamma_J \beta_\delta^m \,\mathrm{sat}\left(\Delta_J(x_j[i])/\beta_\Delta\right)^\top \nabla J(x_j[i])$. A positive change in the derivative results in the cost decreasing less than what would be expected. Integrating the second derivative twice measures the difference between the expected and the actual decrease. The rate of change is limited by a bound on the second derivative, and the step length is limited, so the integrals are always finite. By decreasing the limits of integration, there exists some non-zero interval so that the double integration is less than the bound specified by $\sigma$ (assuming that the gradient is non-zero; otherwise the step length would be zero to begin with). Hence, there exists a finite $m$ so that the actual cost decrease is within a factor $\sigma$ of the linear approximation.

The benefit of using the Armijo condition is that the sequence will either converge to a single point (as just described), or $J$ will diverge to $-\infty$. If the sequence does not converge to a single point, then the step size will be larger than zero. To show a contradiction, assume that the cost converges to a finite value instead of $-\infty$. From the Armijo condition, Eq. 22, and because the sequence is non-increasing, $J(x_j[i+1]) - J(x_j[i])$ must equal zero, implying that $\nabla J(x_j[i])^\top \Delta_J(x_j[i]) = 0$. If $\|\nabla J(x_j[i])\| = 0$, then the step size would go to zero, contradicting that a limit cycle was reached. Otherwise, the step would be perpendicular to the cost gradient, which is impossible due to how the step direction is determined in Eq. 20.
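To make the step rule concrete, the following Python sketch backtracks the exponent $m$ until the Armijo condition is met. It is an illustration under assumed constants; the function names (grad_J, grad_f) are hypothetical, and the saturation and normalization only approximate the notation of Eqs. 31-35.

import numpy as np

def armijo_constrained_step(x, J, grad_J, grad_f, gamma=1.0, beta_delta=0.5,
                            beta_Delta=1.0, sigma=0.5, max_m=30):
    # Normalized gradients of the cost and of the output constraint.
    gJ = grad_J(x)
    gf = grad_f(x)
    f_hat = gf / np.linalg.norm(gf)
    J_hat = gJ / max(np.linalg.norm(gJ), 1e-12)
    # Descent direction tangent to the output constraint (cf. Eq. 33).
    Delta = -J_hat + (J_hat @ f_hat) * f_hat
    step = Delta / max(np.linalg.norm(Delta), beta_Delta)  # saturated step direction
    expected = gJ @ step  # linear model of the change in cost (negative for descent)
    for m in range(max_m):
        trial = x + gamma * beta_delta**m * step
        # Armijo condition: actual decrease within a factor sigma of the linear model.
        if J(trial) - J(x) <= sigma * gamma * beta_delta**m * expected:
            return trial
    return x  # gradient is (near) zero or parallel to the output gradient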

CHAPTER VI

INCREASING RESOLUTION

An algorithm is presented for autonomous motion development with unbounded waveform resolution. Rather than a single optimization in a very large space, a memory is built to support incremental improvements so that complexity is balanced by experience. Analogously, human development manages complexity by limiting it during initial learning stages. Motions are represented by cubic spline interpolation, so the development technique applies broadly to function optimization. Adding a node to the splines allows all previous memory samples to transfer to the higher dimension space exactly. The memory-based model, a locally weighted regression, predicts the expected outcome for a motion and provides gradient information for optimizing the motion. Results are compared against bootstrapping a direct optimization on a mathematical problem. Additionally, the method has been implemented to learn voltage profiles with the lowest peak current for starting a motor. This method shows practical accuracy and scalability.

Nomenclature

$a$: Parameterization of input signal $u(t)$, $\in \mathbb{R}^{N\times 1}$

$a[i]$: $i$th input sample of the memory database, $i \in \{1,\dots,M\}$

$a^*(y_d)$: Optimal input producing output $y_d$

$\bar{a}$: Sample point less the query point, $a_q$

$a_i$: $i$th parameter of $a$, $\in [a_{\min}, a_{\max}]$, $i \in \{1,\dots,N\}$

$a_q$: Evaluation point of locally weighted regression (LWR)

$A$: Regression coefficients for LWR, $\in \mathbb{R}^{N_x\times 1}$

$b_{i,j}$: Coefficient for the $j$th power in the $i$th interval of $u(t)$

$d$: Distance from query point $a_q$, $\in \mathbb{R}$

$E$: LWR error, $\in \mathbb{R}$

$h$: Neighborhood radius used for weighting error

$J$: System cost, to be minimized, $\in \mathbb{R}$

$\hat{J}(a)$: LWR of $J(a)$

$k_i$: Scale and shift parameters for basis functions, $i \in \{1, 2, 3\}$

$M$: Number of samples in memory database

$N$: Dimension of parameterization, which is the length of $a$

$N_x$: Number of basis functions used for LWR

$p$: Number of new samples to generate when testing the system

$P_i$: $i$th candidate set of new samples to test

$t$: Independent variable, assumed to be time, $\in [t_o, t_f]$

$t_i$: Location of the $i$th node, $i \in \{1,\dots,N\}$

$u(t)$: System input signal, to be optimized

$W$: Diagonal matrix of LWR sample weights, $\in \mathbb{R}^{M\times M}$

$w(d)$: Weighting function, $\in \mathbb{R}$

$x(\bar{a})$: Vector of LWR bases evaluated at $\bar{a}$, $\in \mathbb{R}^{1\times N_x}$

$\bar{x}$: Samples of $a$ evaluated in LWR bases, $\in \mathbb{R}^{M\times N_x}$

$y$: System output, $\in \mathbb{R}$

$\hat{y}(a)$: LWR of $y(a)$

$\bar{y}$: LWR output samples (either $y$ or $J$), $\in \mathbb{R}^{M\times 1}$

$y_d$: Desired output value

6.1 Introduction

One specific characteristic which sets human intelligence apart from logic-based learning and biomimetic systems is the ability to continue to learn at increasing complexity. There has been a shift from mimicking human intelligence towards understanding the human developmental process and applying it to robotics

[28, 18, 229, 230]. The method proposed here seeks to capture the ability of some animals to progress from a novice to an expert. In contrast, most learning methods are structured on a fixed number of parameters or a fixed discretization, limiting the final resolution. Though robots operate in an analog world, they use a representation at a resolution which has a finite ability to represent continuous functions. The representation used (e.g., radial basis functions) shapes how parameters affect the continuous function. The resolution of the representation used (e.g., the number of bases) determines the set of continuous functions that can be produced.

With a fixed set of possible functions, either initial learning is overcomplicated or later learning is limited.

Fixing the representation impedes developmental stages that occur as the learner progresses to higher levels.

For example, the number and shape of basis functions, the complexity of central pattern generators (CPGs) or artificial neural networks [231], parameterizations for interpolation [232, 233] or parameterizations for feedback laws all limit the possible expression. Limiting the set of possible $u(t)$'s may preclude better solutions. This work's novelty is indefinitely balancing optimization complexity with experience. The algorithm is referred to as continual autonomous learning (CAL). The general form pertains to optimizing the shape of a function, so it can apply to other function optimizations such as basis shaping [234] or geometry optimization.

Most methods for generating motions can be categorized as biomimetic, kinetic or search based. Biomimicry often uses observed biological motions, such as from motion capture systems, for creating computer animations [235] or robot motions. Human motions similarly appear to be kinematically planned (meaning, based on positions rather than applied forces) [236]. The argument against biomimicry is that a biological optimum may be poorly suited for electromechanical systems [237]. Bio-inspired approaches adapt a biological heuristic to mechanical systems, but are limited by the designer's understanding [238]. Determining gaits is

especially difficult for unfamiliar configurations, such as tripods or self-configurable robots [239]. Kinetic approaches analyze force and motion constraints [240]. These approaches give high-fidelity results and can be used for stability analysis, but make extensive simplifications and are often numerically solved off-line

(for example, [241]). For some cases, feedback laws have been developed [242]. Search-based approaches seek desirable trajectories with minimal assumptions by learning through trial and error. Direct optimization, iterative or reinforcement learning, genetic algorithms and other artificial intelligence methods can be considered search approaches. Optimizations can adjust the scaling of basis functions, interpolation points or more abstract parameters such as those of CPGs. Because search technique performance is typically based on the topology of the problem, there is active research being conducted on the best representations to simplify the optimization [243]. Biomimetic approaches or expert instruction can provide a good starting point for a search [244].

The goal of CAL is to achieve a learning process and representation that can mimic biological learning, rather than the motion itself. Locally weighted regression (LWR) is a robust, non-parametric model that is useful when the form of the model is unknown [245]. LWR does a regression on pairs of input and output vectors where the regression error is weighted by the proximity of the data samples to the point where the

LWR is evaluated. It was chosen since it can handle additional data and captures the local gradient of the iterative trial-and-error experiments. The continuous motion is initially parameterized in a low dimension, forming an input vector representing the continuous function. Learned motions are organized by a measure of an output; for example, walking speed. The input value for a new output value is created by interpolation from a set of learned motions with known output (similar to other methods [232, 216]). The large memory database of input/output pairs can be used for analysis, reasoning and planning; while the learned motion will readily provide a solution for the desired output, similar to a reflexive response. With the database and learned motion, CAL separates conscious motion planning (computationally expensive) from the learned response (computationally inexpensive). Methods that only use the simplified representation and discard all

trial data simply adapt and can lose optimality based on the order of the data being presented [246]. When all motions are optimized for a given resolution, an additional parameter is added to the trajectory. The new dimension increases the temporal resolution, allowing more freedom to shape the continuous waveform. After development of an inverse optimal function, the robot's operator can focus on the desired output, rather than the input wave, reducing the operator's cognitive load [26, 247].

Resolution is a critical issue in waveform optimization since the search space grows exponentially with the number of waveform parameters. LWR does not avoid exponential growth, but the learning is conducted over the system's life, not as an isolated training or search phase, similar to biological development. However, the reflex function is an inverse mapping and scales in complexity linearly with the dimension of the waveform parameters, so it should always be computationally feasible, even for real-time systems. Biological systems limit complexity in initial training phases, as suggested by Montessori techniques [248]. Infants have coarse, spasmodic movements partly due to an underdeveloped nervous system [249, 250]. Over extensive periods of time, the infant develops a base of proficiency for representations of more complex motions and learned reflexes such as grabbing or balancing. Many control designs also begin at reduced complexity, but do not provide a method to improve beyond the resolution limit [251].

The statement of the problem is given, followed by a description of the method. Mathematical examples from [252] are presented to illustrate the method and compare it to traditional bootstrapping. Results on a physical system are then shown to address system variation. Finally, conclusions are discussed and limitations identified.

6.2 Problem Statement

A set of parameters, $a = [a_1, \dots, a_N]^\top$, is interpolated to form a continuous input signal, $u(t)$. The domains of $a$ and $t$ are closed: $a_{\min} \le a_i \le a_{\max}\ \forall i$ and $t_o \le t \le t_f$. The system acts on $u(t)$, producing an output, $y$, and a cost, $J$, which are both scalars (see Figure 34). Internal to the system are any relations of the

control, states, output, cost and their dynamics. The system is assumed to be repeatable and continuous, so the input-output relations, $y(a)$ and $J(a)$, are repeatable and in $\mathcal{C}^2$. The repeatability condition can be realized by having a fixed initial state and well-behaved, deterministic functionals. If the system is not deterministic, but is stationary, then the expected value can be estimated from random samples. Regarding the design of the system, legitimate outputs and costs could be state extremes, such as the height of a jump or the maximum power, or values at a specific time, such as the distance traveled or the energy expended. For this work, it is assumed that the system has already been defined; creating or modifying the output or cost functions is not considered. The problem statement extends beyond motions and can pertain to other function optimizations, such as airfoil design. The memory model collects all samples, $(a, y, J)$, and uses them to produce estimates, $\hat{y}(a_q)$ and

$\hat{J}(a_q)$, at a query point $a_q$. The memory-based model must accommodate new data and determine additional trials to test as needed for the method to be autonomous.

The problem is to find input parameters, $a^*$, that minimize $J$ and yield a desired output, $y_d$, as shown in Figure 35. The dynamics are implicit in the system, so the problem can be posed as a static constrained optimization problem:

$$\arg\min_a \left\{ J(a) : y(a) = y_d \right\}. \tag{36}$$

Nonlinear programming is run on the estimates to determine the optimal input for a given output, $(y_d, a^*)$. If there is not sufficient experience for LWR during the optimization, the memory-based model generates additional experience by testing the system, as shown in Figure 34. The optimization results are stored in a reflex function, $a^*(y_d)$, to be used by an operator or high-level planner, as shown in Figure 36. The reflex function uses an appropriate interpolation of the $(a^*, y_d)$ pairs for new desired outputs.
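As a non-authoritative sketch of Eq. 36, the fragment below poses the same constrained program with SciPy's SLSQP solver standing in for the MATLAB routines used later in this chapter; the surrogates J_hat and y_hat and all parameter values are placeholders.

import numpy as np
from scipy.optimize import minimize, NonlinearConstraint

def solve_reflex_point(J_hat, y_hat, a0, y_d, a_min=0.0, a_max=1.0):
    # Equality constraint y_hat(a) = y_d expressed as a two-sided bound.
    cons = NonlinearConstraint(y_hat, y_d, y_d)
    bounds = [(a_min, a_max)] * len(a0)
    res = minimize(J_hat, a0, method="SLSQP", bounds=bounds, constraints=[cons])
    return res.x  # a* for this y_d

# Stand-in surrogates, for illustration only:
J_hat = lambda a: float(np.sum((a - 0.3) ** 2))
y_hat = lambda a: float(np.mean(a))
a_star = solve_reflex_point(J_hat, y_hat, np.full(4, 0.5), y_d=0.6)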

The novelty of the approach is that the dimension, $N$, of parameter $a$ is not bounded, but is incrementally increased. The mapping of $a$ in the lower dimension to the higher dimension must be exact so all the samples in memory can be transferred without introducing error. As $N$ increases, the size of the search space grows, and therefore so does the number of samples needed to support the memory-based model, as prescribed by the curse of dimensionality [253].


Figure 34: A memory-based model interacts with the system by collecting $(a, y, J)$ triplets as needed to ensure that the estimates $\hat{y}(a_q)$ and $\hat{J}(a_q)$ have sufficient data in the proximity of the queried point, $a_q$.


Figure 35: Optimal inputs, $a^*$, are determined by optimization over the memory model. Sets of $a^*$ are collected and organized by $y_d$ into a reflex function. As needed, the process in Figure 34 is employed to ensure accuracy.


Figure 36: After the operator selects a desired output, the optimal input based on the current parameterization is generated by simple interpolation.

The choice of how the parameterization changes with $N$ may have an effect on the system, so the designer should consider a parameterization appropriate for the problem. For systems such as quadrupeds, incrementing $N$ by 2 or 4 may be appropriate. $N$ should be increased by a small number

each time, so that complexity is only incrementally increased. In the higher dimension space, only the new parameters must be explored, and only in a local area. Complexity is balanced to the amount of experience, so the optimization advances over the life of the system, rather than within a limited training period. This mimics the theory of perceptual affordances, where the perceptual system is balanced with the system's ability to act [23].

With more experience/perception, more degrees of freedom, called affordances, are opened for exploration.

The evaluation of CAL is based on the learning rate, meaning the number of samples required as the number of nodes increases. At any specific number of nodes, a direct optimization should have better performance and accuracy, since the memory model adds an abstraction [254]. CAL retains the advantage of direct optimization in that the system does not need to be well known to prevent systematic modeling error, and retains the disadvantage that only local minima are found. A typical direct optimization would repeat all the previously tested points, whereas CAL stores the information. The memory provides data to partially support LWR after increasing the dimension or for other values of $y_d$. The number of samples as a function of the number of optimized $y_d$ values should increase at a decreasing rate for CAL, whereas with direct optimization it would increase at a constant rate. For these two reasons, a fair evaluation is difficult, but results should be on the same order of magnitude.

6.3 Methodology

The concept of CAL can be thought of as accumulating experience and then using it to explore additional nuances to a motion. Memory is used to prevent superfluous repetition. The ‘understanding’ comes from

LWR distilling the data to how each parameter affects the outcome, which is basically the local gradient.

The motions with the lowest cost are found by a constrained optimization. CAL begins with a set of a’s

which are then locally optimized to maintain their initial output, $y_d$, producing a set of $a^*$. These parameters, $a^*$, represent the locally-best motions that can be obtained given the limitation of resolution. After the existing parameters have been optimized, $N$ is incremented and the new parameter allows for a finer adjustment and

Table 3: Fundamental Algorithm of Unbounded Learning

Generate initial set of $a$'s and find $(a, y, J)$
Use the set of $y$'s as the set of $y_d$
For each $y_d$:
    Optimize $\hat{J}(a) : \hat{y}(a) = y_d$
    During optimization, if $\hat{J}$ is ill-conditioned:
        Generate new $(a, y, J)$ samples as needed
Increment $N$ and transfer data to the higher dimension
Return to the loop over $y_d$

a higher frequency content in $u(t)$. Motions can be pre-computed for a map from a desired output, $y_d$, to a best input, $a^*$, forming a reflex function. The optimization and incrementing are repeated as the system has time to practice, so complexity is balanced by experience. These steps are given in Table 3 and sketched below.
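A schematic rendering of Table 3 follows; every helper (sample, optimize_on_lwr and so on) is a hypothetical name standing in for the machinery described in the remainder of this section, not the dissertation's code.

def continual_autonomous_learning(system, N0=2, N_max=9):
    # Memory of (a, y, J) triplets; 'sample' runs one trial on the system.
    memory = [sample(system, a) for a in initial_inputs(N0)]
    targets = [y for (_, y, _) in memory]  # the set of y_d values
    N = N0
    while N <= N_max:
        for y_d in targets:
            while True:
                # arg min J_hat(a) subject to y_hat(a) = y_d over the LWR model.
                a_star = optimize_on_lwr(memory, y_d)
                if lwr_well_conditioned(memory, a_star):
                    break
                memory += generate_new_samples(system, memory, a_star)
        # Expand the parameterization; the transfer is exact (see below).
        memory = [(add_node(a), y, J) for (a, y, J) in memory]
        N += 1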

The system input is composed by cubic interpolation,

$$u(t) = b_{i,0} + b_{i,1}t + b_{i,2}t^2 + b_{i,3}t^3, \qquad t_i \le t < t_{i+1}, \tag{37}$$

where $t_i$ is the location of the $i$th node, $b_{i,j}$ is the interpolation coefficient for the $i$th interval and $j$th power, and $i \in \{1, \dots, N-1\}$. The coefficients $b_{i,j}$ are found by a set of linear constraints that define the type

of spline. The first set of constraints comes from $u(t_i) = a_i$ at the $i$th node from the left and from the right, for $i \in \{1, \dots, N\}$. The second set comes from continuity of the function's first and second derivatives at each node. The two remaining constraints in this work are set by the 'not-a-knot' condition, meaning that continuity of the third derivative is maintained at the nodes neighboring the boundary nodes, $i \in \{2, N-1\}$ (see

[255]). The procedure for finding the coefficients, $b_{i,j}$, is well established and not given here in the interest of space. Every segment in cubic interpolation is a cubic function, so the interpolant's third derivative is piecewise constant. Any function in $\mathcal{C}^2$ can be approximated to arbitrary precision, along with its first and second derivatives, by cubic interpolation with a sufficient number of nodes. Cubic spline interpolation reduces the degrees of freedom of $u(t)$ from infinity (a continuous signal) to the number of nodes and is represented by the set $\{(t_0, a_0), \dots, (t_N, a_N)\}$. The scheme used to add nodes is fixed, so $N$ is determined from the length of $a$, and $t_0, \dots, t_N$ can be found from the scheme for adding nodes. Thus, $a = [a_0, \dots, a_N]$ fully describes $u(t)$ for a given system.
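SciPy's CubicSpline uses the same 'not-a-knot' end condition by default, so a minimal sketch of the interpolation in Eqn. 37 (node locations and values are arbitrary examples) is:

import numpy as np
from scipy.interpolate import CubicSpline

t_nodes = np.linspace(0.0, 1.0, 5)       # node locations t_0, ..., t_N
a = np.array([0.5, 0.9, 0.5, 0.1, 0.5])  # example parameter vector
u = CubicSpline(t_nodes, a, bc_type="not-a-knot")
print(u(0.37), u(0.37, 1))               # u(t) and its first derivative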

There are other parameterizations offering incremental resolution that could be chosen. The three obvious choices are power series, Fourier series and incremental neural networks. Power series suffer from poor coefficient scaling with high-order polynomials. As the order increases, new polynomial bases are added. Each new basis dominates at a higher value than the prior one, and thus the effect of the increased resolution is concentrated in an ever-decreasing region. As a result, many of the lower-order coefficients must shift and the relative magnitudes may diverge. For inputs that are not periodic and without zero mean, the Fourier series should begin with a constant term and then a period of twice the final time, $t_f$, for the first three variables: $a_0 + a_1 \sin(\pi t/t_f) + a_2 \cos(\pi t/t_f)$. However, it has been discovered that optimization of

Fourier series coefficients is typically less smooth than that of time-domain coefficients [12]. Incremental neural

networks may use a radial basis function, $e^{-(t-q_n)^2/\sigma}$, or a sigmoid function, $1/(1 + e^{-(t-q_n)/\sigma})$, for the neural activation function, where $q_n$ is the center point and $\sigma$ scales the spread of the function. The output of the network is the weighted sum of the outputs of all the neurons. These weights would be the parameters of optimization. The problem with this method is that $\sigma$ should relate to the spacing between centers. If $\sigma$ is too large, then each basis is mostly flat, resulting in a limited ability to match high slopes. If $\sigma$ is too small, then each basis is very narrow, resulting in a limited ability to match gentle slopes. When $\sigma$ is properly sized, the ability of radial basis networks to match functions is nearly equivalent to cubic interpolation. When using a sigmoid network, the parameters grow large compared to the function value, which is not desirable for scaling an optimization. Because the spacing of the centers changes over the course of the algorithm, an initially properly sized $\sigma$ will eventually be too large for the system to benefit from the additional nodes. If

$\sigma$ changes with $N$, then the current data cannot be transferred to the higher dimension space. Having $\sigma$ as a parameter of optimization adds complexity to the optimization and typically isolates minima. For cubic splines,

however, the node spacing scales the effect of an individual parameter when solving for the $b_{i,j}$ coefficients of Eqn. 37, so no additional tuning parameter is needed.

When incrementing the dimension $N$, the lower dimension vector $a$ is mapped to a higher dimension vector $a'$ by interpolating the lower dimension $u(t)$ at the higher dimension set of nodes. For the mapping to be exact, all the lower dimension nodes must be members of the higher dimension set of nodes. Every discontinuity of the third derivative of the lower dimension $u(t)$ is matched by a discontinuity in the higher dimension parameterization, and the new parameters are chosen so as not to introduce additional discontinuities.

Since there is no change in the smoothness of $u(t)$, the function lies in both spaces exactly. This is demonstrated in Figure 37. Reasonably even resolution is obtained by adding nodes at the midpoint of the first largest interval, shown for a scalar $u(t)$ in Figure 38. If $u(t)$ is a vector-valued function, adding a node to each dimension may be appropriate, depending on the application. If there is a reason for uneven resolution, a new order is straightforward to implement and does not affect the mechanics of CAL. The range of infinite-interval problems can be mapped to a finite interval as appropriate, such as mapping $[1, \infty)$ to $[-1, 0]$ by $-1/x$ [256].
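A sketch of this refinement step (names are illustrative) inserts a node at the midpoint of the first largest interval and samples the current spline there, so the refined node set reproduces $u(t)$ per the exactness argument above:

import numpy as np
from scipy.interpolate import CubicSpline

def add_node(t_nodes, a):
    gaps = np.diff(t_nodes)
    i = int(np.argmax(gaps))  # argmax returns the *first* largest interval
    t_new = 0.5 * (t_nodes[i] + t_nodes[i + 1])
    u = CubicSpline(t_nodes, a, bc_type="not-a-knot")
    t_refined = np.insert(t_nodes, i + 1, t_new)
    a_refined = np.insert(a, i + 1, u(t_new))  # value taken from the old spline
    return t_refined, a_refined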

The memory function chosen for this work is a locally weighted regression (LWR, also called LOESS)

[257]. Even if the regression is linear, LWR results are nonlinear, due to the nonlinear effect of the weighting function. As with typical regressions, least-squared error (LSE) is assumed, along with a pseudoinverse:

$$E = (\bar{y} - \bar{x}A)^\top W^2 (\bar{y} - \bar{x}A), \tag{38}$$

$$A = (\bar{x}^\top W^2 \bar{x})^{-1} \bar{x}^\top W^2 \bar{y}, \tag{39}$$

where $E \in \mathbb{R}$ is the total weighted error and $A \in \mathbb{R}^{N_x \times 1}$ are the regression coefficients, with $N_x$ basis functions used for the regression. The independent variables of the $M$ local samples are evaluated in the $N_x$ bases to form $\bar{x} \in \mathbb{R}^{M \times N_x}$, while the dependent variable of the samples is used directly as $\bar{y} \in \mathbb{R}^{M \times 1}$.

Weighting of the samples is done by $W \in \mathbb{R}^{M \times M}$, which is a diagonal array of weights, $w(d)$, where $d$ is the distance from each sample to the query point. The LSE estimation is able to estimate the expected value of

Figure 37: By creating the higher dimension function from the lower dimension function, no discontinuity is introduced at the new node location. Therefore the higher dimension function exactly reproduces the lower dimension function.

Figure 38: One possible scheme for adding a new node is to place it at the midpoint of the first largest interval.

stochastic data, so random process variation will not prevent convergence when a sufficiently high threshold on the required data is used. The minimum eigenvalue of $\bar{x}^\top W^2 \bar{x}$ can be used to require a sufficient amount of data, as will be discussed later.

An LWR is specified by a set of basis functions and a weighting function. Beyond using a bell function, minute differences in the weighting function have a minor effect [258]. For this work, a Gaussian bell on the Euclidean norm is used for the weighting function, where the tails are clipped to zero. Samples further than $h$ from the query point, $a_q$, have a weight of zero and can be removed from the calculations of Eqn. 39. The weighting function is

$$w(d) = \begin{cases} 1.3357\, e^{-5(d/h)^2} & |d/h| \le 1 \\ 0 & \text{otherwise} \end{cases} \tag{40}$$

with $d(\bar{a}) = \|\bar{a}\|_2$ and $\bar{a} = a - a_q$. The weighting function is evaluated at the samples $a[j]$ for $j \in \{1,\dots,M\}$ to produce the diagonal of $W$. The regression should be at least second order, so that the regression coefficients, $A$, capture the second-order behavior of the extremum [259]. These bases are scaled and shifted so that the weighted bases are orthonormal, given as:

$$x(\bar{a}) = \left[\,1,\ \frac{k_1}{h}\bar{a}^\top,\ \frac{k_2}{h^2}B_2(\bar{a}) - k_3\,\right], \tag{41}$$

$$B_2(\bar{a}) = \left[\,\bar{a}_1^2,\ \bar{a}_1\bar{a}_2,\ \dots,\ \bar{a}_1\bar{a}_N,\ \bar{a}_2^2,\ \bar{a}_2\bar{a}_3,\ \dots,\ \bar{a}_N^2\,\right], \tag{42}$$

with $k_1 = 4.4724$, $k_2 = 14.154$ and $k_3 = 0.70759$, where $x \in \mathbb{R}^{1 \times N_x}$ and $N_x = 1 + (N^2 + 3N)/2$. A plot of the weighted bases is shown in Figure 39. Having a unit area is not a strict requirement, since many of the typical properties do not hold for unevenly sampled functions, but it helps to evenly scale the eigenvalues of $\bar{x}^\top W^2 \bar{x}$.
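A minimal LWR sketch in the spirit of Eqs. 38-41 follows; for brevity it uses a plain quadratic basis without the cross terms or the $k_i$ scaling of Eqs. 41-42, so it approximates the construction above rather than implementing it faithfully.

import numpy as np

def lwr_estimate(a_q, A_samples, y_samples, h):
    a_bar = A_samples - a_q                      # shift samples to the query point
    d = np.linalg.norm(a_bar, axis=1)
    w = np.where(d / h <= 1.0, 1.3357 * np.exp(-5.0 * (d / h) ** 2), 0.0)  # Eq. 40
    X = np.hstack([np.ones((len(a_bar), 1)), a_bar, a_bar ** 2])  # simplified bases
    W2 = np.diag(w ** 2)
    A = np.linalg.solve(X.T @ W2 @ X, X.T @ W2 @ y_samples)       # Eq. 39
    return A[0]  # at a_bar = 0 the basis vector is [1, 0, ...], so A[0] is the estimate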

The estimates are formed as $\hat{y}(\bar{a}) = x(\bar{a})A_y$ and $\hat{J}(\bar{a}) = x(\bar{a})A_J$, where $A_y$ and $A_J$ come from using $y$ and $J$, respectively, in Eqn. 39 for $\bar{y}$. The sampled basis function array is

$$\bar{x} = \left[x(\bar{a}[1])^\top, \dots, x(\bar{a}[M])^\top\right]^\top \tag{43}$$

where $\bar{a}[j]$ is from the $j$th sample. The derivative of $\hat{y}$ with respect to a parameter $a_i$ at location $a$ is

$$\frac{d\hat{y}}{da_i}(a) = \frac{dx}{da_i}(a)A_y + x(a)\left(\bar{x}^\top W \bar{x}\right)^{-1}\bar{x}^\top D W (\bar{y} - \bar{x}A_y), \tag{44}$$

$$D = \frac{1}{h}\begin{bmatrix} a_i - a_i[1] & 0 & \cdots \\ 0 & \ddots & 0 \\ \cdots & 0 & a_i - a_i[M] \end{bmatrix}, \tag{45}$$

with a similar equation for the derivative of $\hat{J}$. When evaluated about $a = a_q$, the term $\frac{dx}{da_i}A_y$ selects the single element of $A_y$ corresponding to the linear basis of $a_i$. From the regression, $A_y$ seeks to model the second-order behavior locally. The second term of Eqn. 44, however, measures the effect of the query point location on the influence of the sample weights based on the error of each point. Therefore, to best capture the local,

quadratic shape of $y(a)$, and because it is computationally more efficient, only the first term, $\frac{dx}{da_i}A_y$, is used to estimate the gradient for optimization.

Each column of $\bar{x}$ must be independent or the results are not unique. More practically, the eigenvalues of $\bar{x}^\top W^2 \bar{x}$ can be used to estimate the amount of data in each eigenvector direction. Other LWR methods increase $h$ to add samples [260]; but since more data can be gathered here, new samples ($P = \{a[M+1], \dots, a[M+p]\}$) are generated until the magnitudes of all eigenvalues exceed a threshold, termed the data threshold. Disproportionately large eigenvalues are not a concern, since each direction is independent; therefore, directions with more data do not influence other directions. First, a large number of candidate sets of potential samples

are generated: $\{P_1, \dots, P_q\}$. The candidate set with the largest condition number of $\bar{x}^\top W^2 \bar{x}$, when combined with the current data set, is then optimized via sequential quadratic programming to increase the minimum eigenvalue by perturbing the new point locations. The number of new points to generate, $p$, is

based on the number of eigenvalues below the threshold. Note that $\bar{x}^\top W^2 \bar{x}$ only depends on the independent variable, so these calculations do not require system trials. Only the $p$ samples from the final optimized $P$ are tested.
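The data-threshold test itself only needs the eigenvalues of $\bar{x}^\top W^2 \bar{x}$; a small sketch (the threshold value and names are assumptions) is:

import numpy as np

def samples_needed(X, w, threshold=1e-6):
    M = (X.T * w**2) @ X               # x_bar^T W^2 x_bar without forming W explicitly
    eigvals = np.linalg.eigvalsh(M)    # symmetric matrix, so eigvalsh applies
    return int(np.sum(eigvals < threshold))  # one new sample per deficient direction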

Two optimization methods were used to find $(y_d, a^*)$ pairs. For comparison with direct optimization, a well-developed commercial optimizer was used for comparable accuracy and efficiency. Of the constrained optimizations in MATLAB's Optimization Toolbox¹², the active-set method using gradients only had lower minima with fewer function evaluations than the interior-point method (IPM) using gradients and a Hessian. The IPM uses sequential quadratic programming with constraints imposed by barrier functions.

The IPM tested had problems with convergence, presumably due to the severity of the barrier functions combined with the inconsistent history as the LWR function changed with additional data. Active set uses sequential quadratic programming to determine the search direction, and then performs a line search, adding a second level of checking. These commercial optimizations assume a numerically exact function and are focused on high precision and fast convergence. Despite this, the active-set method did work well, and so it was used in this work. Since active set uses a Hessian approximation, the optimization was modified to reset the approximation when new data was added. For the physical problem, where high accuracy was impossible due to variation in results and computation time was much less than evaluation time, a constrained gradient descent was employed.

6.4 Test Problems

CAL was applied to a mathematical problem to evaluate its performance. The input is a scalar waveform, the system output is the mean, and the system cost is the root-mean-square (RMS) of the difference from a

¹²The MathWorks Inc., Optimization Toolbox version 5.1 within MATLAB version 7.11

sine wave:

$$y = \int_0^1 \mathrm{sat}(u(t))\, dt \tag{46}$$

$$J = \sqrt{\int_0^1 \left(\mathrm{sat}(u(t)) - \frac{\sin(2\pi t) + 2}{4}\right)^2 dt} \tag{47}$$

$$\mathrm{sat}(x) = \begin{cases} 1 & x > 1 \\ x & 0 \le x \le 1 \\ 0 & \text{otherwise.} \end{cases} \tag{48}$$

Nodes are added at the midpoint of the first largest interval, as shown in Figure 38.

Trigonometric functions are infinite-order functions, so the error should always be positive. In addition, if the mean value of $\mathrm{sat}(u(t))$ is different from $1/2$, then the difference will drive the value of $J$. $J$ is minimized when the difference is uniform, due to the squared weighting in an RMS, so $u(t)$ will be an offset but otherwise identical sine wave (for $0.25 \le y_d \le 0.75$). The ideal cost, $J_i(y_d)$, can be computed for this range as $J_i = |y_d - 1/2|$.
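Eqs. 46-47 can be evaluated numerically for any parameter vector; the sketch below uses trapezoidal quadrature (the grid density is an arbitrary choice) and confirms that an offset sine recovers a cost near the ideal $J_i = |y_d - 1/2|$.

import numpy as np
from scipy.integrate import trapezoid
from scipy.interpolate import CubicSpline

def evaluate(t_nodes, a, n=1001):
    t = np.linspace(0.0, 1.0, n)
    u = np.clip(CubicSpline(t_nodes, a, bc_type="not-a-knot")(t), 0.0, 1.0)  # sat()
    y = trapezoid(u, t)                                                  # Eq. 46
    J = np.sqrt(trapezoid((u - (np.sin(2 * np.pi * t) + 2) / 4) ** 2, t))  # Eq. 47
    return y, J

t_nodes = np.linspace(0.0, 1.0, 9)
y, J = evaluate(t_nodes, (np.sin(2 * np.pi * t_nodes) + 2) / 4 + 0.1)  # y ~ 0.6, J ~ 0.1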

When the saturation takes effect, the difference cannot be uniform. Where the error would be largest, $u(t)$ would be sinusoidal, and the rest would saturate at a lower difference, because the square in the RMS is more sensitive to larger values. At 3 nodes, a parabola is able to match the peak (for $y_d < 0.25$) and saturate otherwise, with mirrored behavior for $y_d > 0.75$. For $0.25 \le y_d \le 0.75$, both curves have equal weight and the optimal compromise is still a straight line. Since only local optimization is done, results then remain in this local optimum. With the saturation, the ideal locally-optimal RMS used for comparison is calculated from matching one peak and saturating otherwise.

Results do behave as predicted. Figure 40 shows example waveforms for various $y_d$ for $N = 3$ and $4$, with $h = 0.05$ and a minimum eigenvalue threshold of $10^{-6}$. Note that local optima are shown in Figure 40, where the far left of $y_d = 0.84$ remains saturated through $N = 4$. The iteration over $y_d$ of Table 3 was tested in an ordered

(least to greatest) and an unordered (random) progression, with little difference in cost or learning rate.

Figure 40: Beyond $N = 4$, results appear identical by sight. Bootstrapped direct optimization also appears identical.

To provide a baseline for the learning rate, a direct optimization (DO) was formed by using the active-set algorithm directly on $y$ and $J$. The $a^*$ at the previous resolution was used as the starting point for the optimization

at the higher dimension. The approximate bound on the error of the dependent variable, TolX = 0.001, was set to achieve comparable accuracy. Accuracy over the set of $y_d$ is condensed as the mean of $\log_{10}(J - J_i)$. The standard deviation of the accuracy between runs was always less than 0.075. The equivalent to the number of system trials, $M$, in direct optimization is the cumulative sum of all function evaluations over the course of the optimization. For these results, 105 linearly spaced values from 0 to 1 were used as the set of $y_d$'s. The accuracy in optimality as $N$ increases is shown in Figure 41 for CAL and DO.

The size of the sample population remains manageable if evaluation time is relatively small. The value of

$M$ as $N$ increases is shown in Figure 42, along with the DO results. If a test can be conducted in approximately one second, then gathering data for a 5-, 9- and 14-node resolution would be on the order of 2, 4.5 and 10 hours, respectively. This assumes no failures of the equipment, which may be a concern for some systems [261].

Because of the second-order terms in (41), the expected rate would be exponential, since an additional $N + 1$ bases are added going from $N - 1$ to $N$ due to the cross-term bases. The DO method only tests first-order optimality conditions, but from the repetition of the optimizations it also has exponential growth. If the initial point is sufficiently optimal, only $N + 1$ samples are required. The learning rate could be identical in theory,

Figure 41: Results have comparable accuracy with direct optimization.

Figure 42: The number of samples required grows exponentially with the resolution of $u(t)$. The equations of the trend lines are $12{,}000 \cdot 10^{0.031N}$ for direct optimization and $4{,}000 \cdot 10^{0.068N}$ for the memory-based model.

but the actual disparity highlights an important distinction between the methods. Where direct optimization assumes numerically exact values, LWR averages, reducing sensitivity to system variation between tests, but requires more data. The LWR allows some motion of $a^*$, as seen by the decrease in $J$ for $4 \le N \le 9$ without increasing the learning rate. The direct optimization, however, has a much larger learning rate during

$4 \le N \le 9$ due to the function evaluations required by the line search. The dependence of the methods on the size of the set of $y_d$ is shown in Figure 43. For direct optimization, each optimization is independent, so

Figure 43: As the size of the set of $y_d$ increases, fewer samples are needed for the new points with the memory-based method. Direct optimization, however, is purely linear. The line types are dash-dot, dash and solid for $N$ = 5, 9 and 14, respectively.

$M$ increases linearly with the size of the set of $y_d$. LWR uses samples from other optimizations, so $M$ levels out.

The exponential growth rate would eventually limit learning due to memory storage and computational complexity. There are three principal issues involved: storing the data learned, finding the triplets near the point being evaluated and solving the matrix inverse in Eqn. 39. Regarding memory storage, the only data that must be stored and that grow are the $(a, y, J)$ triplets. From the length of $a$, the dimension of the parameterization is determined and the time values of the nodes can be found, and from these $u(t)$ is interpolated. Storage of these triplets would require $M \cdot (N + 2)$ data values. For the trend line given in Figure 42, and using 4-byte floating-point values, this would give roughly 0.46 MB for 5 nodes, 0.96 MB for 9 nodes and 1.99 MB for 14 nodes. A size of 1 GB would require 78 nodes based on this example. These values are very reasonable considering modern computing equipment. If the robot is deployed with limited memory capabilities, the database of triplets need not be deployed as well, since the inverse function

will be used. This is another advantage of separating the learning and execution stages. From the learning database, the samples within a radius of $h$ need to be identified. This is a common database functionality, and effective methods for organizing such data exist. The growth of data for off-line learning on a data set of this size is not of concern. The calculation of the LWR can, however, pose a problem. The number of bases grows with

$N^2$ and the number of samples grows exponentially. A study of numeric methods is beyond the scope of this work, but solving Eqn. 39 would likely pose the first limit regarding complexity.
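The quoted storage figures can be reproduced in a few lines, assuming they follow the $12{,}000 \cdot 10^{0.031N}$ trend line of Figure 42 and binary megabytes (an assumption about the reading, not stated in the text):

for N in (5, 9, 14, 78):
    M = 12_000 * 10 ** (0.031 * N)   # samples from the trend line
    mib = M * (N + 2) * 4 / 2**20    # M * (N + 2) values at 4 bytes each
    print(N, round(mib, 2))          # -> ~0.46, 0.96, 1.99 MB; ~1 GB at N = 78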

Other functions were also tested. The cost and output integrands given in (46) and (47) were multiplied by functions so that the RMS or mean was focused onto a specific region and results behaved as expected.

For example, a triangular function on the output would allow the edges to approach $(\sin(2\pi t) + 2)/4$ while maintaining the same output. Multiplying the cost by a bell curve centered at $t = 1/4$ would allow the function to drift from $(\sin(2\pi t) + 2)/4$ as $t$ approached $t_f$ while still achieving the desired output.

The frequency of the cost function was tripled so that more nodes were needed before performance dramatically improved. The progression of the waveform is shown in Figure 44, with the progression of the cost toward the ideal cost shown in Figure 45. With unknown functions, there is no way to predict future cost improvements based on previous progress, similar to genetic and other evolutionary algorithms.

The ability to diverge from a maximum was also tested. The cost function used was $J = \cos^2(2\beta)$, where

$\beta$ is the angle of the LSE line fit to $u(t)$. The output was again the mean value, but saturation was not applied to $u(t)$. When $u(t)$ is flat, the cost is maximized at 1; the minimum lies at 0, when the line is at $\pm 45°$. The direct optimization maintains a cost of 1 through $N = 6$, where testing stopped. The memory-based method tends to find the minimum between the 3rd and 5th generations. Note that since active-set is used in the LWR method, it does not use the second-order information to diverge from the maximum, but the uneven sampling is more likely to destroy symmetry and lead away from the maximum. Also worth noting is that $a^*$ for individual $y_d$ randomly went to positive or negative slopes. This is a case where naïve interpolation for the reflex function would not work well, though nearest-neighbor interpolation would work.

Figure 44: Results when the sine frequency is tripled behave as expected and are shown for $y_d = 0.07, 0.38, 0.82$.

Figure 45: The rate of improvement offers no guarantee of future improvement. Results when the sine frequency is tripled have a slow initial rate of improvement before performance dramatically improves.

6.5 Physical Problem

For the method to compare with biological learning, the algorithm must be able to work despite variation common to physical systems. The problem chosen was starting a motor to a given speed with the lowest peak current. Motors will have a high current draw when starting (inrush current) if directly connected to a

constant voltage source. Large currents will cause fluctuations in the supply voltage, which may cause faults in other electronics. The large currents also produce large mechanical torques, which increase the likelihood of a fatigue failure. To prevent damage, industrial motors may use a soft starter to limit the initial current [262]. CAL would shape the initially applied voltage to reduce the inrush current while still achieving the desired operating state at the end of the starting period. An advantage over methods that seek a constant torque is that only a measure of the peak current is needed, as opposed to continuous sampling of the current. The process given would also directly apply to limiting the inrush current of other systems such as power converters (output: nominal load draw), lighting systems (output: light produced) or amplification equipment (output: maximum output voltage).

A block diagram of the system is given in Figure 46. A trial of an input begins with ensuring the motor is stopped. This is done by sending a zero voltage to the motor's amplifier for 0.3 sec. In addition, the peak detector is reset and then turned on. The control signal $u(t)$ is a voltage that is amplified and applied

across a small DC motor¹³ with $t_f$ = 1 sec. By placing a low, constant resistance in series with the motor, the current can be sensed. The voltage across the current-sensing resistor is held by a peak detector. The peak detector is created by a diode above the resistor leading to a capacitor. Also connected to the capacitor is a transistor that can be used to drain the stored charge in the capacitor, resetting the voltage to zero. A signal amplifier is used to scale the peak detector's voltage to the full range of the data acquisition device

(DAQ). Negative voltages are obtained by using a reference voltage for the measurement at half the limits of the signal voltage. The measured, amplified voltage of the peak detector is used as the cost, $J$. The motor has a built-in analog tachometer that outputs an AC voltage proportional to the motor speed. This voltage is smoothed by a low-pass filter and then amplified so the range of the measurement matches the DAQ's limits.

The measured, smoothed and amplified tachometer voltage is used as the output, $y$. After $t_f$, $J$ and $y$ are measured. To increase the accuracy of the measurement, $J$ and $y$ are sampled at 1 kHz for 80 msec and the initial value of a linear fit is used as the measurement. This mitigates the effect of any sudden voltage changes at

¹³The motor is estimated to be between 1 and 10 W output power, based on other motors of similar size.


Figure 46: The input signal controls a motor through an amplifier. A peak detector senses the maximum current, while a low-pass filter measures the final speed from a tachometer.

the time of the measurement on the results. Since there is no need to know the absolute values, $J$ and $y$ are not converted to amperes or revolutions per minute, and the volt unit is omitted. After measuring, the motor is stopped by sending $u = 0$ for 0.3 sec. The total time for a single test is 1.6 sec. Training was fully automated and only required an operator to start it.
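The measurement rule (an 80 ms window at 1 kHz, reporting the initial value of a linear fit) can be sketched in a few lines; the signal here is synthetic, not recorded data.

import numpy as np

def window_measure(samples, fs=1000.0):
    t = np.arange(len(samples)) / fs
    slope, intercept = np.polyfit(t, samples, 1)  # linear fit over the window
    return intercept  # value of the fit at the start of the window

v = 2.0 + 0.01 * np.random.randn(80)  # 80 ms of a noisy measurement at 1 kHz
y_meas = window_measure(v)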

Only a few adjustments were needed to adapt the algorithm from the previous section. Parameters were scaled for the input domain changing from [0, 1] to [0, 4], and for the output and cost range changing from [0, 1] to [-5, 5]. The variability in the system required more data to accurately estimate the expected values of $y$ and $J$, so the minimum eigenvalue threshold on $\bar{x}^\top W^2 \bar{x}$ was increased to 10. This value was arrived at by simulating another system with similar variation. The variation in measurements of the physical system for the same input was applied to a mathematical system. The minimum required eigenvalue was increased until the LWR was able to match the features of the mathematical system. This value was then tested on the physical system

and showed a reasonable balance between the amount of required data and the accuracy required for numeric optimization.

From a designer’s standpoint, there are some possible reasons for the variability. The measurements were not always measured immediately after the signal was concluded due to delays in operating system calls. The friction would vary between trials, resulting in more or less current or speed. The starting angle of the rotor could affect the minimum torque required to initiate motion due to brushes catching. From using CAL, it is irrelevant what causes the variability, since these are independent of the waveform. If the variation were due to the waveform (such as the current causing resistor heating), then the method would be able to exploit it, even without a model of the phenomenon. The variations that are independent of the waveform just create a larger deviation about the expected value.

CAL was able to learn on this uncertain system without any system model. The progression of the cost is presented in Figure 47 as a boxplot to show some of the variation between runs. For evaluation of this example, the lower bound on the cost, $J_i(y_d)$, was based on the measurement of the steady-state current at the speed $y_d$. The mean of the difference of the estimated cost from the lower bound over all outputs of a run was used as a measure of the average excess cost. To show change across a range of magnitudes, a logarithm is taken of this measure:

$$\log_{10}\left(\frac{\sum_{y_d} \left(J(a^*(y_d)) - J_i(y_d)\right)}{\sum_{y_d} 1}\right) \tag{49}$$

Ideally this would go to $-\infty$, but since the lower bound is excessively conservative by not accounting for any acceleration, there should be some finite lower bound to this measure. The boxplot shows that the median of this value decreased until $N = 5$. For $N \ge 5$, the spread of this measure among runs decreased, showing an increase in prediction accuracy. Since only an estimate of the cost is known, the estimate can increase with $N$ as more data is collected. The true expected value for a waveform is unknown and cannot be used for comparison. A representative progression of results is shown in Figure 48. Results were as accurate as could be expected, as shown by Figure 49.

Figure 47: Results for ten independent runs are shown in boxplot format (the box extends from the 25th to the 75th percentile, with the median shown by a line). The median continues to decrease until the 5th dimension.

One run to seven dimensions would take 17.3 hours on average. The average number of trials was 38,000, which takes 16.9 hours at 1.6 sec per trial, so run time was primarily driven by the time to test a sample waveform. The time for letting the motor spin down has a direct effect on run time (0.3 sec prior and

0.3 sec after). The minimum eigenvalue threshold determines if additional trials are required, so raising it significantly increases run time. Neither of these values was optimized, so the run time simply represents an approximate upper bound for similar systems.

Though no model was used, the results allow insight into the system. The cost proved to be strongly affected by the initial voltage, $u(0)$, while the output was strongly affected by the final voltage, $u(t_f)$. A linear $u(t)$ gave a significant reduction over a constant $u(t)$. For $N > 2$, it was not apparent how to adjust $u(t)$ to reduce the cost. Results show a tendency of large outputs to be concave up, while low values may be concave down.

For this problem, it is not clear if there is a unique optimum, or if $t_f$ is large enough to allow a family of optimal curves with identical peak currents and final speeds.

(a) N = 2 (b) N = 3

(c) N = 7

Figure 48: A representative set of example waveforms for the motor start up problem are shown as the resolution increases.

Figure 49: The actual output is within the confidence interval of the expected output.

For comparison, a direct optimization was also run on the test system, as was done in the previous subsection. The direct optimization was not able to process the random results, and the optimization would either drift away from the desired output or fail to move. Rarely would it find a significantly improved input as $N$ increased. Even when the mean of 5 samples was taken as a single function evaluation, the results were not sufficiently consistent for direct optimization.

6.6 Conclusions

The goal of this work was to develop an algorithm, referred to as continual autonomous learning (CAL), for optimizing continuous functions to an unbounded resolution. Results show that the learning rate is comparable to direct optimization, with similar accuracy. Cubic spline interpolation of the input signal, $u(t)$, allows for transferring data from lower to higher dimension parameterizations exactly. Locally weighted regression (LWR) provides gradient estimates for optimization without needing a system model and can be

used to determine signals to test when the existing memory is insufficient. Complex or poorly understood systems, such as electroactive polymers, flapping wings or other flexible systems, are thus not limited by modeling error [263, 264, 265]. The results of the optimizations are collected into a 'reflex' function that is computationally efficient at finding a near-optimal input, $u(t)$, for a changing output, $y_d$, not just a single nominal operating point. This variability allows the system to be near optimal over a much broader range [266]. Adding nodes to the cubic interpolation matches the balance that biological learning has between experience and the ability to shape a motion.

The input $u(t)$ need not be of scalar dimension, but can be a vector. This is simple because the optimal results, $a^*$, are organized by the output dimension, not the input dimension. Only a scheme for adding nodes as $N$ increases is required. For example, legged motion would have $u(t)$ as a vector of control signals for each motor, $y$ as a measure of speed and $J$ as a measure of variation. Possible applications are not limited to motions, but include airfoil design (where $u$ is the top and bottom camber, $t$ is the distance along the chord, and $y$ and

$J$ are aerodynamic properties) or PID tuning (where $u(t)$ is a feedback law, $y$ is a response time and $J$ is a measure of disturbance rejection).

There are some limitations to the scope of the problem that are worth mentioning. Multiple objectives for $y$ or $J$ (i.e., vector-valued outputs or costs) are not addressed. To support multiple objectives, the candidates must be organized in the vector space of $y$. Not every combination of outputs may be feasible, requiring demarcations. This is not a fundamental or insurmountable problem, but it requires additional detail outside the scope of this dissertation.

The reflex function, $a^*(y_d)$, is not guaranteed to be continuous, as it was in Chapter V, due to possible convergence to different local minima, as mentioned with the divergence test example at the end of Section 6.4. To provide a smooth reflex function, results should be clustered according to neighboring optima. This would:

1. identify when the reflex function should not interpolate between results with neighboring $y_d$ values,

2. provide fundamentally different alternatives (disjoint reflex functions) with the same output, and

3. facilitate exploration to see if the range of $y_d$ can be extended in higher dimensions.

In addition, the concept of clustering would provide a mechanism for coupling global search methods [267].

There are some limitations, however, that still require a breakthrough. All samples are stored regardless of statistical significance. An explanation for human learning capacity and the ability to generalize is suggested by our ability to forget irrelevant details [268]. However, what is irrelevant is typically not apparent during learning, making this a very challenging research direction. Forgetting also helps with adapting to environmental changes. Fortunately, there is some interesting work being done to answer these questions

[269, 270, 271].

CAL is a significant breakthrough since the method is entirely undirected, yet offers an efficient learning rate and can match the exact optimal continuous function in the limit. The algorithm only needs to know a scheme for adding nodes to the input signal and be able to access output and cost values, without needing to set a limit on complexity or accuracy. Therefore a computer generated robot (one designed without human input) would be able to learn control strategies without an explicit comprehension of its environment.

CHAPTER VII

CONCLUSION

The work of this dissertation addresses the research questions posed in Chapter II. This chapter recounts the advances throughout the dissertation which address these questions. The methods presented in the dissertation can work together for synergistic benefits, and how they complement each other is explained. Additionally, they support other developmental learning techniques by facilitating interaction with the environment. Together, they offer a set of tools for autonomous motion learning. The novelty of these tools is that they efficiently use data to provide a rapid learning rate and allow for continual learning over an extensive period of time. With additional methods from other developmental learning research, a general-purpose learner seems feasible. However, there are some significant limitations, such as how to transfer knowledge across domains, and these are also presented.

This dissertation presents several methods that, as a whole, make progress towards the goal of mimicking the ability of animals to grow from novice to master on motion-related tasks. The problem of mimicking general-purpose intelligence is far from being solved, but the methods in this work show that when a developmental learning approach is taken, systems are able to progress towards becoming experts without programmed, task-specific heuristics. This has been demonstrated for robotic arms on a pick-and-place task in Chapter IV and on a tracking task in Chapter V. In Chapter VI, staging of the learning has been shown to learn effectively, despite large variations within the systems, by optimizing a motor spin-up waveform. One of the principal contributions of this dissertation is the organization of information into optimal inverse functions for the creation of motion primitives. These inverse functions provide a learned response which can be easily computed, like trained reflexes such as shooting a basketball. At the same time, training data can be retained for tasks requiring planning. Global optimization, using a population-based search, and unbounded, incremental resolution provide the foundation for autonomously developing motions superior to what can be designed by hand. The supporting problem statements have each been addressed through the dissertation, and are summarized as follows.

7.1 Focused Problem Statements

7.1.1 How does directed learning fit in a global or local search context?

Multimodal problems in high dimensions require a trade-off between spending function evaluations refining candidates (local search) and exploring for new promising solutions (global search). Chapter IV showed that numeric methods can efficiently direct the learning of optimal motions, even without providing the algorithm an understanding of the system. Chapters V and VI then went on to address global and local search in order to create motion primitives.

In order to effectively search the global space, a population of agents is used. By removing agents that would converge to the same solution, the number of agents quickly decreases, so the number of function evaluations only modestly increases with the number of agents. The motion of an agent is constrained to maintain the output value while minimizing the cost. This motion results in direct progression to the nearest optimal inverse function. The bounds of the inverse function are explored in an equally efficient manner.

The local behavior of continuous functions being optimized can be well described by polynomial models. The mathematics community has developed efficient methods of general-purpose local optimization. The optimal direction for a sufficiently small step is based directly on the gradient direction, which means that few methods offer better efficiency for local optimization on functions with a smooth second derivative. Locally weighted regression (LWR) can be used as a surrogate for optimization. Data collected can be used for later optimization steps or for other optimizations, such as for adjacent output values of an optimal inverse function. If the dimension of the optimization increases, the search space grows exponentially. By using previous data in the LWR, only bases using the new dimension need to be sampled. In addition, LWR attenuates the effect of variation, making the optimization more robust to process variation.
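The surrogate step can be sketched in a few lines; here the toy cost, the Gaussian kernel, and the kernel width h are assumptions chosen for illustration rather than values from the dissertation.

% Locally weighted regression as a surrogate: fit a weighted quadratic
% to stored samples near a query point x0 and read the local gradient
% from the linear coefficient.
f  = @(x) sin(3*x) + 0.5*x.^2;               % hypothetical cost landscape
xs = linspace(-1, 1, 25)';                   % previously stored inputs
ys = f(xs) + 0.01*randn(size(xs));           % noisy stored evaluations

x0 = 0.3; h = 0.25;                          % query point and kernel width
w  = exp(-((xs - x0)/h).^2);                 % Gaussian weights
X  = [ones(size(xs)), xs - x0, (xs - x0).^2];
b  = (X'*diag(w)*X) \ (X'*diag(w)*ys);       % weighted least squares
grad_est = b(2);                             % local slope estimate at x0

Because the regression reuses whatever samples already lie near x0, only the bases involving newly added dimensions require fresh trials, and the weighted fit averages out measurement noise.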

7.1.2 How can motions be optimized based on a high-level rigid body representation?

In Chapter IV, a high-level description of a robot was employed to provide system dynamics. With this method, design of a robotic system can be based on the rigid body geometry, and optimization can be conducted without analysis specific to the configuration. The technique was to link a general-purpose kinetic solver with a general-purpose trajectory optimization program. This provides a motion-planning skill for general kinematic linkages, which is foundational for some high-level skills. Chapters V and VI then went on to extend the results beyond rigid body motion to any trajectory optimization, by considering the problem as parameter or function optimization.

7.1.3 How can operation of a multiple-input, single-output system be simplified when considering tracking and optimization?

The cognitive load of the operator can be reduced by providing inverse functions that determine an input that generates the desired output. Since multiple-input systems are considered here, the input also has the freedom to be optimized. Use of these optimal inverse functions as motion primitives reduces the challenge of finding multiple inputs, with possibly complex interactions in the system, to adjusting a single parameter based on the operator's objective.

7.1.4 How can equivalent, but fundamentally different motions be organized?

For multimodal optimizations, multiple locally optimal solutions will exist. Collecting these fundamentally different solutions can provide alternatives if the true optimum is inappropriate (such as if a collision would occur) or if it does not allow for continuous optimality over the desired range of the output. Rather than using an explicit method, such as niching, multiple solutions are naturally organized into clusters by the population-based optimization shown in Chapter V. By connecting locally optimal solutions as the output varies, inverse functions are guaranteed to be continuous. Evaluation of the inverse function requires only one-dimensional interpolation, making it suited for real-time tracking. Selection of alternatives is simply done by changing the inverse function used.
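Evaluating such a primitive reduces to a table lookup; in this MATLAB sketch the output grid and input matrix are placeholder values standing in for a learned cluster of local optima.

% A stored optimal inverse function as a motion primitive: column i of U
% holds the optimal inputs for output level y_grid(i); a commanded output
% is met by one-dimensional interpolation.
y_grid = [0.2 0.4 0.6 0.8 1.0];              % sampled output values
U      = [0.1 0.3 0.5 0.6 0.7;               % optimal input 1 per output
          1.0 0.9 0.7 0.6 0.4];              % optimal input 2 per output

y_cmd = 0.55;                                % operator's desired output
u_cmd = interp1(y_grid, U', y_cmd)';         % interpolated optimal inputs

Switching to an alternative cluster simply means substituting a different (y_grid, U) pair; the interpolation itself is cheap enough for real-time tracking.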

7.1.5 How should motion primitives be represented to facilitate incremental increases in the number of control parameters?

Many parameterizations exist that can approximate the set of continuous functions in the limit. A desirable expanding parameterization should

1. have the lower parameterization lie in the higher space exactly, so all previous knowledge can be transferred to the higher space,

2. scale well in the parameter values, so that the optimization does not become ill conditioned,

3. result in short settling distances for optimization by not significantly moving the optimal location, and

4. not be excessively sensitive to the cost near optima.

As discussed in Chapter VI, cubic spline interpolation satisfies these criteria. By retaining the nodes of the previous parameterization and interpolating the existing function to find the new parameters, all the previous data is preserved exactly, reducing the number of additional trials required to find a gradient in the higher-dimension space. The parameters of cubic interpolation are on the same scale as the input, as opposed to those of other parameterizations such as sigmoid neural networks or power series. The coefficients of radial basis functions scale well as the parameterization increases, but the spread of the basis functions would need to be adjusted as the parameterization changes. If the spread is an additional parameter, it adds complexity to the optimization; if it is based on the parameterization dimension, then previous results do not lie in the higher-dimension space. The parameters of cubic interpolation are intuitive and have direct effects on the system behavior. The parameters of a Fourier series, however, do not relate intuitively to the shape of a signal, and good trajectories end up isolated in the parameter space.
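The exact-transfer property of criterion 1 can be verified numerically; in this sketch the node counts and waveform values are illustrative only.

% Expanding a cubic-spline parameterization without changing the signal:
% the refined node set retains the old nodes, and the new parameters are
% the old spline evaluated at the new nodes.
t4 = linspace(0, 1, 4);                      % coarse parameterization
u4 = [0 0.45 0.85 1.0];                      % current best waveform
t7 = linspace(0, 1, 7);                      % refinement containing t4
u7 = interp1(t4, u4, t7, 'spline');          % new parameters from old spline

tt  = linspace(0, 1, 200);
err = max(abs(interp1(t4, u4, tt, 'spline') - interp1(t7, u7, tt, 'spline')));
% err is zero to machine precision: the old optimum lies in the new space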

7.1.6 What are the limitations fundamental to staged learning?

Staged learning does have its disadvantages. As presented in Chapter VI, an exponential number of trials is required as the dimension of the parameterization increases. Second-order information is necessary to distinguish gradients near optima. To gather second-order information, the function must be sampled across all cross terms, resulting in exponential growth in the number of trials as the number of control parameters increases.
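A rough count illustrates the scaling (this is a generic sampling argument, not the dissertation's exact design): a full quadratic surrogate of the cost in $n$ parameters has

\[
1 + n + \binom{n}{2} + n \;=\; \frac{(n+1)(n+2)}{2}
\]

coefficients (constant, linear, cross, and pure quadratic terms), so at least that many distinct trials are required just to identify it, while sampling every combination of $k$ levels per parameter grows as $k^n$.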

A general-purpose learner's model adapts based on the structure of the data. The order in which data is presented can have significant effects on the apparent structure of the data; therefore, results can unexpectedly differ, in positive or negative ways, based on the course of learning. As mentioned previously, inverse functions are currently one dimensional. Most robotic systems have complex tasks requiring additional output considerations. This, however, is not a fundamental limitation, but it would require an additional level of complexity, employing techniques from computational geometry to efficiently organize the inverse function in a higher dimension.

Staged learning does not provide concrete answers to vague questions such as 'What is learning?' or 'What is understanding?' Also, staged learning has yet to provide a mechanism to relate one skill set to another. In order to see developmental progress at a quality and speed comparable to human learning, knowledge must be transferred across subject matters, such as the connection between music and mathematical proficiency [272].

7.2 Considerations

Human development has many mysteries to be understood. However, it seems that most of the mechanisms witnessed in the developmental stages can be recreated without needing ambiguous concepts such as consciousness. The essence of human development lies in the ability to continue to improve. Methods that recreate that ability can, over time, exceed human proficiency. However, the ability to effectively transfer knowledge to new disciplines, as humans do, will require a significant breakthrough in understanding how knowledge is stored and processed in the mind.

BIBLIOGRAPHY

[1] H. Heuer, “Structural constraints on bimanual movements,” Psychological Research, vol. 55, pp. 83–98, 1993. doi: 10.1007/BF00419639.

[2] D. Ferrucci, “Build Watson: an overview of DeepQA for the Jeopardy! challenge,” in Proceedings of the 19th international conference on Parallel architectures and compilation techniques, PACT ’10, (New York, NY, USA), pp. 1–2, ACM, 2010.

[3] P. Debenest, M. Guarnieri, K. Takita, E. Fukushima, S. Hirose, K. Tamura, A. Kimura, H. Kubokawa, N. Iwama, F. Shiga, Y. Morimura, and Y. Ichioka, “Expliner toward a practical robot for inspection of high-voltage lines,” Field and Service Robotics, vol. 62, pp. 45–55, 2010. doi: 10.1007/978-3-642-13408-1_5.

[4] S. Hirose and E. Fukushima, “Snakes and strings: New robotic components for rescue operations,” Experimental Robotics VIII, vol. 5, pp. 48–61, 2003. doi: 10.1007/3-540-36268-1_3.

[5] S. Hirose, “Super mechano-system: New perspective for versatile robotic system,” Experimental Robotics VII, vol. 271, pp. 249–258, 2001. doi: 10.1007/3-540-45118-8_26.

[6] L. Lahajnar, A. Kos, and B. Nemec, “Skiing robot - design, control, and navigation in unstructured environment,” Robotica, vol. 27, no. 04, pp. 567–577, 2009.

[7] H. Arisumi, J.-R. Chardonnet, and K. Yokoi, “Whole-body motion of a humanoid robot for passing through a door - opening a door by impulsive force -,” Intelligent Robots and Systems, 2009. IROS 2009. IEEE/RSJ International Conference on, pp. 428 –434, Oct. 2009.

[8] G. E. Fix, A study of contributing variables to two methods of high jumping. PhD thesis, University of Portland, 1970.

[9] B. Schwartz, A. Ward, J. Monterosso, S. Lyubomirsky, K. White, and D. R. Lehman, “Maximizing versus satisficing: Happiness is a matter of choice,” Journal of Personality and Social Psychology, vol. 83, no. 5, 2002.

[10] R. A. Grupen, “A developmental organization for robot behavior,” in Proceedings of the 3rd International Workshop on Epigenetic Robots, (Boston, MA), pp. 1–12, IEEE, Aug 2003.

[11] M. Rolf, J. J. Steil, and M. Gienger, “Efficient exploration and learning of whole body kinematics,” in 8th International Conference on Development and Learning, IEEE, 2009.

[12] R. Grzeszczuk and D. Terzopoulos, “Automated learning of muscle-actuated locomotion through control abstraction,” in Proceedings of SIGGRAPH, (Los Angeles, CA), pp. 63–70, ACM, Aug 1995.

[13] N. Berthier, R. Clifton, and D. McCall, “Proximodistal structure of early reaching in human infants,” Experimental Brain Research, vol. 127, pp. 259–269, 1999.

[14] A. Stoytchev, Robot Behavior: A Developmental Approach to Autonomous Tool Use. Ph.D. dissertation, Georgia Institute of Technology, Atlanta, GA, Aug 2007.

[15] C. W. Reynolds, “Steering behaviors for autonomous characters,” in Proceedings of Game Developers Conference, (San Francisco, CA), pp. 763–782, Miller Freeman Game Group, 1999.

[16] P.-Y. Oudeyer, F. Kaplan, V. V. Hafner, and A. Whyte, “The playground experiment: Task-independent development of a curious robot,” in Proceedings of AAAI Spring Symposium on Developmental Robotics, pp. 42–47, 2005.

[17] A. Stoytchev, “Developmental Robotics Lab @ Iowa State University.” Internet: http://www.ece.iastate.edu/~alexs/lab/, [May 27, 2010].

[18] M. Lungarella and G. Metta, “Beyond gazing, pointing and reaching: a survey of developmental robotics,” in Proceedings of the 3rd International Workshop on Epigenetic Robots: Modeling Cognitive Development in Robotic Systems, vol. 101, pp. 1–9, Lund University Cognitive Studies, 2003.

[19] S. T. Parker, R. W. Mitchell, and M. L. Boccia, eds., Self-Awareness in Animals and Humans. Cambridge University Press, 1994. doi: 10.1017/CBO9780511565526.

[20] J. Piaget, La psychologie de l’intelligence. Paris: Armand Colin, 1947.

[21] J. Piaget, The Origins of Intelligence in Children. International Universities Press, 1952.

[22] L. Vygotsky, “The problem of the cultural development of the child II,” Journal of Genetic Psychology, vol. 36, pp. 415–32, 1929.

[23] E. J. Gibson, Principles of Perceptual Learning and Development. Englewood Cliffs, NJ: Prentice Hall, 1969.

[24] G. Sun and B. Scassellati, “Reaching through learned forward model,” in IEEE RAS/RSJ International Conference on Humanoid Robots, (Santa Monica, CA), pp. 1–20, 2004.

[25] O. Michel and P. Collard, “Artificial neurogenesis: An application to autonomous robotics,” in Proceedings of the 8th IEEE International Conference on Tools with Artificial Intelligence, pp. 207–214, 1996.

[26] J. Burke and R. Murphy, “Human-robot interaction in USAR technical search: two heads are better than one,” in Robot and Human Interactive Communication, 2004. ROMAN 2004. 13th IEEE International Workshop on, pp. 307–312, Sept. 2004.

[27] S. Russell and P. Norvig, Artificial Intelligence: A Modern Approach. Prentice Hall Series in Artificial Intelligence, Saddle River, New Jersey: Prentice Hall, 1st ed., 1995.

[28] M. Asada, K. MacDorman, H. Ishiguro, and Y. Kuniyoshi, “Cognitive developmental robotics as a new paradigm for the design of humanoid robots,” Robotics and Autonomous Systems, vol. 37, pp. 185–193, 2001.

[29] T. Armstrong and T. Oates, “Riptide: Segmenting data using multiple resolutions,” in Development and Learning, IEEE Int. Conf. on, pp. 306 –311, Jul 2007.

[30] H. Brandl, B. Wrede, F. Joublin, and C. Goerick, “A self-referential childlike model to acquire phones, syllables and words from acoustic speech,” in Development and Learning, IEEE Int. Conf. on, pp. 31 –36, Aug 2008.

[31] B. Lake, G. Vallabha, and J. McClelland, “Modeling unsupervised perceptual category learning,” in Development and Learning, IEEE Int. Conf. on, pp. 25 –30, Aug 2008.

[32] H. Ishihara, Y. Yoshikawa, K. Miura, and M. Asada, “Caregiver’s sensorimotor magnets lead infant’s vowel acquisition through auto mirroring,” in Development and Learning, IEEE Int. Conf. on, pp. 49 –54, Aug 2008.

[33] L. Schillingmann, B. Wrede, and K. Rohlfing, “Towards a computational model of acoustic packaging,” in Development and Learning, IEEE Int. Conf. on, pp. 1–6, Jun 2009.

[34] K. Miura, Y. Yoshikawa, and M. Asada, “Realizing being imitated: Vowel mapping with clearer articulation,” in Development and Learning, IEEE Int. Conf. on, pp. 262–267, Aug 2008.

[35] A. Fernald, V. Marchman, and N. Hurtado, “Input affects uptake: How early language experience influences processing efficiency and vocabulary learning,” in Development and Learning, IEEE Int. Conf. on, pp. 37 –42, Aug 2008.

[36] M. Vaz, H. Brandl, F. Joublin, and C. Goerick, “Learning from a tutor: Embodied speech acquisition and imitation learning,” in Development and Learning, IEEE Int. Conf. on, pp. 1 –6, Jun 2009.

[37] W. Kerr, S. Hoversten, D. Hewlett, P. Cohen, and Y.-H. Chang, “Learning in wubble world,” in Development and Learning, IEEE Int. Conf. on, pp. 330–335, Jul 2007.

[38] E. Kim, K. Gold, and B. Scassellati, “What prosody tells infants to believe,” in Development and Learning, IEEE Int. Conf. on, pp. 274 –279, Aug 2008.

[39] H. Sumioka, Y. Yoshikawa, and M. Asada, “Development of joint attention related actions based on reproducing interaction contingency,” in Development and Learning, IEEE Int. Conf. on, pp. 256 –261, Aug 2008.

[40] H. Sumioka, M. Asada, and Y. Yoshikawa, “Causality detected by transfer entropy leads acquisition of joint attention,” in Development and Learning, IEEE Int. Conf. on, pp. 264 –269, Jul 2007.

[41] D. Messinger, M. Mahoor, S. Cadavid, S.-M. Chow, and J. Cohn, “Early interactive emotional development,” in Development and Learning, IEEE Int. Conf. on, pp. 232–237, Aug 2008.

[42] H. Jasso, J. Triesch, and G. Deak, “A reinforcement learning model of social referencing,” in Development and Learning, IEEE Int. Conf. on, pp. 286–291, Aug 2008.

[43] K. Gold, M. Doniec, and B. Scassellati, “Learning grounded semantics with word trees: Prepositions and pronouns,” in Development and Learning, IEEE Int. Conf. on, pp. 25–30, Jul 2007.

[44] A. Plebe, V. De la Cruz, and M. Mazzone, “Artificial learners of objects and names,” in Development and Learning, IEEE Int. Conf. on, pp. 300 –305, Jul 2007.

[45] J. Weng, Q. Zhang, M. Chi, and X. Xue, “Complex text processing by the temporal context machines,” in Development and Learning, IEEE Int. Conf. on, pp. 1 –8, Jun 2009.

[46] Y. Yoshikawa, T. Nakano, M. Asada, and H. Ishiguro, “Multimodal joint attention through cross facilitative learning based on µx principle,” in Development and Learning, IEEE Int. Conf. on, pp. 226–231, Aug 2008.

[47] T. Oezer, “Acquisition of lexical semantics through unsupervised discovery of associations between perceptual symbols,” in Development and Learning, IEEE Int. Conf. on, pp. 19 –24, Aug 2008.

[48] J. Hunter, D. Wilkes, D. Levin, C. Heaton, and M. Saylor, “Autonomous segmentation of human action for behaviour analysis,” in Development and Learning, IEEE Int. Conf. on, pp. 250 –255, Aug 2008.

[49] M. Miller and A. Stoytchev, “Hierarchical voting experts: An unsupervised algorithm for hierarchical sequence segmentation,” in Development and Learning, IEEE Int. Conf. on, pp. 186 –191, Aug 2008.

[50] G. Satish and A. Mukerjee, “Acquiring linguistic argument structure from multimodal input using attentive focus,” in Development and Learning, IEEE Int. Conf. on, pp. 43 –48, Aug 2008.

[51] C. Crick and B. Scassellati, “Inferring narrative and intention from playground games,” in Development and Learning, IEEE Int. Conf. on, pp. 13 –18, Aug 2008.

[52] C. Crick, M. Doniec, and B. Scassellati, “Who is it? inferring role and intent from agent motion,” in Development and Learning, IEEE Int. Conf. on, pp. 134 –139, Jul 2007.

[53] S. Jockel, M. Mendes, J. Zhang, A. Coimbra, and M. Crisostomo, “Robot navigation and manipulation based on a predictive associative memory,” in Development and Learning, IEEE Int. Conf. on, pp. 1 –7, Jun 2009.

[54] J. Maouene, S. Hidaka, and L. Smith, “Body-part categories of early-learned verbs: Different granularities at different points in development,” in Development and Learning, IEEE Int. Conf. on, pp. 268–273, Aug 2008.

[55] M. Kuwabara, J. Son, and L. Smith, “Trait or situation? cultural differences in judgments of emotion,” in Development and Learning, IEEE Int. Conf. on, pp. 163 –167, Aug 2008.

[56] M. Bornstein, C. Mash, and M. Arterberry, “Infant perception and categorization of object-context relations,” in Development and Learning, IEEE Int. Conf. on, pp. 99–104, Jul 2007.

[57] J. Chen, C. Chan, R. Pulverman, T. Tardif, M. Casasola, X. Zheng, and X. Meng, “English- and Mandarin-speaking infants’ discrimination of persons, actions, and objects in a dynamic event without audio inputs,” in Development and Learning, IEEE Int. Conf. on, pp. 1 –7, Jun 2009.

[58] M. Kuwabara and L. Smith, “Cultural differences in relational knowledge,” in Development and Learning, IEEE Int. Conf. on, pp. 1–6, Jun 2009.

[59] J. de Greeff, F. Delaunay, and T. Belpaeme, “Human-robot interaction in concept acquisition: a computational model,” in Development and Learning, IEEE Int. Conf. on, pp. 1–6, Jun 2009.

[60] S. Sahni and T. Rogers, “Sound versus meaning: What matters most in early word learning?,” in Development and Learning, IEEE Int. Conf. on, pp. 280 –285, Aug 2008.

[61] S. Takamuku, G. Gomez, K. Hosoda, and R. Pfeifer, “Haptic discrimination of material properties by a robotic hand,” in Development and Learning, IEEE Int. Conf. on, pp. 1 –6, Jul 2007.

[62] N. Koenig, “Toward real-time human detection and tracking in diverse environments,” in Development and Learning, IEEE Int. Conf. on, pp. 94 –98, Jul 2007.

[63] H. Kim, H. Jasso, G. Deak, and J. Triesch, “A robotic model of the development of gaze following,” in Development and Learning, IEEE Int. Conf. on, pp. 238 –243, Aug 2008.

[64] C. Rothkopf, T. Weisswange, and J. Triesch, “Learning independent causes in natural images explains the space-variant oblique effect,” in Development and Learning, IEEE Int. Conf. on, pp. 1–6, Jun 2009.

[65] J. Stober and B. Kuipers, “From pixels to policies: A bootstrapping agent,” in Development and Learning, IEEE Int. Conf. on, pp. 103–108, Aug 2008.

[66] H. Zhao, Z. Ji, M. Luciw, and J. Weng, “Developmental learning for avoiding dynamic obstacles using attention,” in Development and Learning, IEEE Int. Conf. on, pp. 318 –323, Jul 2007.

[67] M. Schembri, M. Mirolli, and G. Baldassarre, “Evolving internal reinforcers for an intrinsically motivated reinforcement-learning robot,” in Development and Learning, IEEE Int. Conf. on, pp. 282–287, Jul 2007.

[68] J. Weng, “A theory of architecture for spatial abstraction,” in Development and Learning, IEEE Int. Conf. on, pp. 1 –8, Jun 2009.

[69] M. Ogino, A. Watanabe, and M. Asada, “Detection and categorization of facial image through the interaction with caregiver,” in Development and Learning, IEEE Int. Conf. on, pp. 244 –249, Aug 2008.

[70] J. Tsotsos, “What roles can attention play in recognition?,” in Development and Learning, IEEE Int. Conf. on, pp. 55 –60, Aug 2008.

[71] N. Butko and J. Movellan, “I-POMDP: An infomax model of eye movement,” in Development and Learning, IEEE Int. Conf. on, pp. 139–144, Aug 2008.

[72] Y. Sandamirskaya and G. Schoner, “Dynamic field theory of sequential action: A model and its implementation on an embodied agent,” in Development and Learning, IEEE Int. Conf. on, pp. 133–138, Aug 2008.

[73] F. Shic, K. Chawarska, J. Bradshaw, and B. Scassellati, “Autism, eye-tracking, entropy,” in Development and Learning, IEEE Int. Conf. on, pp. 73–78, Aug 2008.

[74] Y. Nagai, “From bottom-up visual attention to robot action learning,” in Development and Learning, IEEE Int. Conf. on, pp. 1–6, Jun 2009.

[75] J. Ruesch and A. Bernardino, “Evolving predictive visual motion detectors,” in Development and Learning, IEEE Int. Conf. on, pp. 1 –6, Jun 2009.

[76] A. Franz and J. Triesch, “Emergence of disparity tuning during the development of vergence eye movements,” in Development and Learning, IEEE Int. Conf. on, pp. 31 –36, Jul 2007.

[77] Z. Ji, J. Weng, and D. Prokhorov, “Where-what network 1: Where and what assist each other through top-down connections,” in Development and Learning, IEEE Int. Conf. on, pp. 61 –66, Aug 2008.

[78] S. Griffith, J. Sinapov, M. Miller, and A. Stoytchev, “Toward interactive learning of object categories by a robot: A case study with container and non-container objects,” in Development and Learning, IEEE Int. Conf. on, pp. 1 –6, Jun 2009.

[79] L. Grabowski, M. Luciw, and J. Weng, “A system for epigenetic concept development through autonomous associative learning,” in Development and Learning, IEEE Int. Conf. on, pp. 175–180, Jul 2007.

[80] L. Montesano and M. Lopes, “Learning grasping affordances from local visual descriptors,” in Development and Learning, IEEE Int. Conf. on, pp. 1–6, Jun 2009.

[81] F. Stulp, A. Fedrizzi, and M. Beetz, “Learning and performing place-based mobile manipulation,” in Development and Learning, IEEE Int. Conf. on, pp. 1 –7, Jun 2009.

[82] R. Detry, E. Baseski, M. Popovic, Y. Touati, N. Kruger, O. Kroemer, J. Peters, and J. Piater, “Learning object-specific grasp affordance densities,” in Development and Learning, IEEE Int. Conf. on, pp. 1 –7, Jun 2009.

[83] T. Yoshida, T. Tani, and S. Tanaka, “Orientation plasticity in visual cortex of mice reared under single-orientation exposure,” in Development and Learning, IEEE Int. Conf. on, pp. 1–6, Jun 2009.

[84] M. Davis, N. Otero, K. Dautenhahn, C. Nehaniv, and S. Powell, “Creating a software to promote understanding about narrative in children with autism: Reflecting on the design of feedback and opportunities to reason,” in Development and Learning, IEEE Int. Conf. on, pp. 64–69, Jul 2007.

[85] C.-F. Hsu, A. Karmiloff-Smith, O. Tzeng, R.-T. Chin, and H.-C. Wang, “Semantic knowledge in Williams syndrome: Insights from comparing behavioural and brain processes in false memory tasks,” in Development and Learning, IEEE Int. Conf. on, pp. 48–52, Jul 2007.

[86] J. Yoon, J. Winawer, N. Witthoft, and E. Markman, “Striking deficiency in top-down perceptual reorganization of two-tone images in preschool children,” in Development and Learning, IEEE Int. Conf. on, pp. 181–186, Jul 2007.

[87] T. Kriete and D. Noelle, “Modeling the development of overselectivity in autism,” in Development and Learning, IEEE Int. Conf. on, pp. 79 –84, Aug 2008.

[88] J. Fagard, R. Esseily, and J. Nadel, “The role of observational learning in perceiving object properties in infants (March 2008),” in Development and Learning, IEEE Int. Conf. on, pp. 198–203, Aug 2008.

[89] J. Trommershauser, “Acquisition of knowledge about uncertainty in the outcome of sensory motor decision tasks,” in Development and Learning, IEEE Int. Conf. on, pp. 1 –6, Jun 2009.

[90] C. Lange-Kuttner and H. Green, “What is the age of mental rotation?,” in Development and Learning, IEEE Int. Conf. on, pp. 259 –263, Jul 2007.

[91] M. Luciw, J. Weng, and S. Zeng, “Motor initiated expectation through top-down connections as abstract context in a physical world,” in Development and Learning, IEEE Int. Conf. on, pp. 115–120, Aug 2008.

[92] M. Solgi and J. Weng, “Temporal information as top-down context in binocular disparity detection,” in Development and Learning, IEEE Int. Conf. on, pp. 1 –7, Jun 2009.

[93] C. Glaser, F. Joublin, and C. Goerick, “Homeostatic development of dynamic neural fields,” in Development and Learning, IEEE Int. Conf. on, pp. 121–126, Aug 2008.

[94] H. Einarsdottir, F. Montani, and S. Schultz, “A mathematical model of receptive field reorganization following stroke,” in Development and Learning, IEEE Int. Conf. on, pp. 211 –216, Jul 2007.

[95] R. Nishimoto and J. Tani, “Development process of functional hierarchy for actions and motor imagery,” in Development and Learning, IEEE Int. Conf. on, pp. 1–6, Jun 2009.

[96] L. Paletta, G. Fritz, F. Kintzler, J. Irran, and G. Dorffner, “Learning to perceive affordances in a framework of developmental embodied cognition,” in Development and Learning, IEEE Int. Conf. on, pp. 110 –115, Jul 2007.

[97] M. Asada, K. Hosoda, H. Ishiguro, Y. Kuniyoshi, and T. Inui, “Towards computational developmental model based on synthetic approaches,” in Development and Learning, IEEE Int. Conf. on, pp. 1 –8, Jun 2009.

[98] M. Hikita, S. Fuke, M. Ogino, T. Minato, and M. Asada, “Visual attention by saliency leads cross-modal body representation,” in Development and Learning, IEEE Int. Conf. on, pp. 157–162, Aug 2008.

[99] M. Hulse, S. McBride, and M. Lee, “Robotic hand-eye coordination without global reference: A biologically inspired learning scheme,” in Development and Learning, IEEE Int. Conf. on, pp. 1–6, Jun 2009.

[100] Y. Choe, H.-F. Yang, and N. Misra, “Motor system’s role in grounding, receptive field development, and shape recognition,” in Development and Learning, IEEE Int. Conf. on, pp. 67 –72, Aug 2008.

[101] M. Lopes, A. Bernardino, J. Santos-Victor, K. Rosander, and C. von Hofsten, “Biomimetic eye-neck coordination,” in Development and Learning, IEEE Int. Conf. on, pp. 1 –8, Jun 2009.

[102] T. Wu, N. Butko, P. Ruvulo, M. Bartlett, and J. Movellan, “Learning to make facial expressions,” in Development and Learning, IEEE Int. Conf. on, pp. 1–6, Jun 2009.

[103] L. Natale, F. Nori, G. Sandini, and G. Metta, “Learning precise 3d reaching in a humanoid robot,” in Development and Learning, IEEE Int. Conf. on, pp. 324–329, Jul 2007.

[104] C. Nabeshima, Y. Kuniyoshi, and M. Lungarella, “Towards a model for tool-body assimilation and adaptive tool-use,” in Development and Learning, IEEE Int. Conf. on, pp. 288 –293, Jul 2007.

[105] J. Sinapov and A. Stoytchev, “Learning and generalization of behavior-grounded tool affordances,” in Development and Learning, IEEE Int. Conf. on, pp. 19 –24, Jul 2007.

[106] J. Sinapov and A. Stoytchev, “Detecting the functional similarities between tools using a hierarchical representation of outcomes,” in Development and Learning, IEEE Int. Conf. on, pp. 91 –96, Aug 2008.

[107] T. Schatz and P.-Y. Oudeyer, “Learning motor dependent Crutchfield’s information distance to anticipate changes in the topology of sensory body maps,” in Development and Learning, IEEE Int. Conf. on, pp. 1–6, Jun 2009.

[108] A. Baranes and P.-Y. Oudeyer, “Robust intrinsically motivated exploration and active learning,” in Development and Learning, IEEE Int. Conf. on, pp. 1 –6, Jun 2009.

[109] N. Sprague, “Basis iteration for reward based dimensionality reduction,” in Development and Learning, IEEE Int. Conf. on, pp. 187–192, Jul 2007.

[110] E. Uchibe and K. Doya, “Constrained reinforcement learning from intrinsic and extrinsic rewards,” in Development and Learning, IEEE Int. Conf. on, pp. 163 –168, Jul 2007.

[111] E. Ugur, M. Dogar, M. Cakmak, and E. Sahin, “Curiosity-driven learning of traversability affordance on a mobile robot,” in Development and Learning, IEEE Int. Conf. on, pp. 13 –18, Jul 2007.

[112] S. Yonekura, Y. Kuniyoshi, and Y. Kawaguchi, “Neural and behavioral substrates of emotions as actions to reduce embodied dissonance,” in Development and Learning, IEEE Int. Conf. on, pp. 312–317, Jul 2007.

[113] A. Thomaz and C. Breazeal, “Robot learning via socially guided exploration,” in Development and Learning, IEEE Int. Conf. on, pp. 82 –87, Jul 2007.

[114] E. Kim and B. Scassellati, “Learning to refine behavior using prosodic feedback,” in Development and Learning, IEEE Int. Conf. on, pp. 205 –210, Jul 2007.

[115] D. Caligiore, D. Parisi, and G. Baldassarre, “Toward an integrated biomimetic model of reaching,” in Development and Learning, IEEE Int. Conf. on, pp. 241 –246, Jul 2007.

[116] M. Luciw and J. Weng, “Laterally connected lobe component analysis: Precision and topography,” in Development and Learning, IEEE Int. Conf. on, pp. 1 –8, Jun 2009.

[117] A. Haith and S. Vijayakumar, “Robustness of VOR and OKR adaptation under kinematics and dynamics transformations,” in Development and Learning, IEEE Int. Conf. on, pp. 37–42, Jul 2007.

[118] F. Stulp, I. Kresse, A. Maldonado, F. Ruiz, A. Fedrizzi, and M. Beetz, “Compact models of human reaching motions for robotic control in everyday manipulation tasks,” in Development and Learning, IEEE Int. Conf. on, pp. 1–7, Jun 2009.

[119] K. Perlin, “Real time responsive animation with personality,” IEEE Transactions on Visualization and Computer Graphics, vol. 1, pp. 1–24, Mar 1995.

[120] D. Grollman and O. Jenkins, “Learning robot soccer skills from demonstration,” in Development and Learning, IEEE Int. Conf. on, pp. 276 –281, Jul 2007.

[121] S. Hart, “An intrinsic reward for affordance exploration,” in Development and Learning, IEEE Int. Conf. on, pp. 1 –6, Jun 2009.

[122] M. Nicolescu, O. Jenkins, and A. Stanhope, “Fusing robot behaviors for human-level tasks,” in Development and Learning, IEEE Int. Conf. on, pp. 76–81, Jul 2007.

[123] G. Konidaris and A. Barto, “Sensorimotor abstraction selection for efficient, autonomous robot skill acquisition,” in Development and Learning, IEEE Int. Conf. on, pp. 151 –156, Aug 2008.

[124] C. Glaser, F. Joublin, and C. Goerick, “Learning and use of sensorimotor schemata maps,” in Development and Learning, IEEE Int. Conf. on, pp. 1–8, Jun 2009.

[125] P. Howell, “Development of fluency control and the speech-language interface: The EXPLAN model of fluency control,” in Development and Learning, IEEE Int. Conf. on, pp. 336–341, Jul 2007.

[126] M. Malfaz and M. Salichs, “Learning to deal with objects,” in Development and Learning, IEEE Int. Conf. on, pp. 1 –6, Jun 2009.

[127] E. Stone, M. Skubic, and J. Keller, “Adaptive temporal difference learning of spatial memory in the water maze task,” in Development and Learning, IEEE Int. Conf. on, pp. 85 –90, Aug 2008.

[128] N. Ranasinghe and W.-M. Shen, “Surprise-based developmental learning and experimental results on robots,” in 8th International Conference on Development and Learning, pp. 1–6, IEEE, 2009.

[129] H. Arie, T. Endo, T. Arakaki, S. Sugano, and J. Tani, “Creating novel goal-directed actions using chaotic dynamics,” in 8th International Conference on Development and Learning, pp. 1–6, IEEE, 2009.

[130] W. C. Ho, S. Watson, and K. Dautenhahn, “AMIA: A knowledge representation model for computational autobiographic agents,” in Development and Learning, IEEE Int. Conf. on, pp. 247–252, Jul 2007.

[131] M. Ogino, T. Ooide, A. Watanabe, and M. Asada, “Acquiring peekaboo communication: Early communication model based on reward prediction,” in Development and Learning, IEEE Int. Conf. on, pp. 116–121, Jul 2007.

[132] J. Beal and T. Knight, “Analyzing composability in a sparse encoding model of memorization and association,” in Development and Learning, IEEE Int. Conf. on, pp. 180 –185, Aug 2008.

[133] J. Van Herwegen, D. Ansari, F. Xu, and A. Karmiloff-Smith, “Can developmental disorders provide evidence for two systems of number computation in humans?,” in Development and Learning, IEEE Int. Conf. on, pp. 43 –47, Jul 2007.

[134] M. Chen and A. Leslie, “Continuous versus discrete quantity in infant multiple object tracking,” in Development and Learning, IEEE Int. Conf. on, pp. 105–109, Jul 2007.

[135] J. Luo, “Rethinking Piaget for a developmental robotics of object permanence,” in Development and Learning, IEEE Int. Conf. on, pp. 235–240, Jul 2007.

[136] F. Shic, B. Scassellati, D. Lin, and K. Chawarska, “Measuring context: The gaze patterns of children with autism evaluated from the bottom-up,” in Development and Learning, IEEE Int. Conf. on, pp. 70 –75, Jul 2007.

[137] M. Kunda and A. Goel, “How thinking in pictures can explain many characteristic behaviors of autism,” in Development and Learning, IEEE Int. Conf. on, pp. 304 –309, Aug 2008.

[138] A. Franz and J. Triesch, “Modeling the development of causality and occlusion perception in infants,” in Development and Learning, IEEE Int. Conf. on, pp. 174 –179, Aug 2008.

[139] F. Dandurand, T. Shultz, and F. Rivest, “Complex problem solving with reinforcement learning,” in Development and Learning, IEEE Int. Conf. on, pp. 157 –162, Jul 2007.

[140] A. Berkeljon and M. Raijmakers, “An ART neural network model of discrimination learning,” in Development and Learning, IEEE Int. Conf. on, pp. 169–174, Jul 2007.

[141] O. Grynszpan, J.-C. Martin, and J. Nadel, “What influences human computer interaction in autism?,” in Development and Learning, IEEE Int. Conf. on, pp. 53 –58, Jul 2007.

[142] M. Pardowitz and R. Dillmann, “Towards life-long learning in household robots: The Piagetian approach,” in Development and Learning, IEEE Int. Conf. on, pp. 88–93, Jul 2007.

[143] T. Erez and W. Smart, “What does shaping mean for computational reinforcement learning?,” in Development and Learning, IEEE Int. Conf. on, pp. 215–219, Aug 2008.

[144] F. Orabona, B. Caputo, A. Fillbrandt, and F. Ohl, “A theoretical framework for transfer of knowledge across modalities in artificial and biological systems,” in Development and Learning, IEEE Int. Conf. on, pp. 1 –7, Jun 2009.

[145] Y. Sugita and J. Tani, “A sub-symbolic process underlying the usage-based acquisition of a compositional representation: Results of robotic learning experiments of goal-directed actions,” in Development and Learning, IEEE Int. Conf. on, pp. 127–132, Aug 2008.

[146] V. Gyenes and A. Lorinez, “Language development among co-learning agents,” in Development and Learning, IEEE Int. Conf. on, pp. 294 –299, Jul 2007.

[147] J. Mugan and B. Kuipers, “Learning to predict the effects of actions: Synergy between rules and landmarks,” in Development and Learning, IEEE Int. Conf. on, pp. 253 –258, Jul 2007.

[148] M. Cakmak, N. DePalma, R. Arriaga, and A. Thomaz, “Computational benefits of social learning mechanisms: Stimulus enhancement and emulation,” in Development and Learning, IEEE Int. Conf. on, pp. 1 –7, Jun 2009.

[149] A.-L. Vollmer, K. Lohan, K. Fischer, Y. Nagai, K. Pitsch, J. Fritsch, K. Rohlfing, and B. Wrede, “People modify their tutoring behavior in robot-directed interaction for action learning,” in Development and Learning, IEEE Int. Conf. on, pp. 1–6, Jun 2009.

[150] N. Butko and J. Movellan, “Learning to learn,” in Development and Learning, IEEE Int. Conf. on, pp. 151 –156, Jul 2007.

[151] V. Schmittmann and M. Raijmakers, “Development of reversal shift learning: An individual differences analysis,” in Development and Learning, IEEE Int. Conf. on, pp. 199–204, Jul 2007.

[152] Y. Nagai and K. Rohlfing, “Parental action modification highlighting the goal versus the means,” in Development and Learning, IEEE Int. Conf. on, pp. 1 –6, Aug 2008.

[153] P. Ruvolo, J. Whitehill, M. Virnes, and J. Movellan, “Building a more effective teaching robot using apprenticeship learning,” in Development and Learning, IEEE Int. Conf. on, pp. 209 –214, Aug 2008.

[154] W. Knox and P. Stone, “TAMER: Training an agent manually via evaluative reinforcement,” in Development and Learning, IEEE Int. Conf. on, pp. 292–297, Aug 2008.

[155] L. Montesano, M. Lopes, A. Bernardino, and J. Santos-Victor, “Affordances, development and imitation,” in Development and Learning, IEEE Int. Conf. on, pp. 270–275, Jul 2007.

[156] J. Schwertfeger and O. Jenkins, “Multi-robot belief propagation for distributed robot allocation,” in Development and Learning, IEEE Int. Conf. on, pp. 193 –198, Jul 2007.

[157] J. Kwon and Y. Choe, “Internal state predictability as an evolutionary precursor of self-awareness and agency,” in Development and Learning, IEEE Int. Conf. on, pp. 109 –114, Aug 2008.

[158] W. Erlhagen, A. Mukovskiy, F. Chersi, and E. Bicho, “On the development of intention understanding for joint action tasks,” in Development and Learning, IEEE Int. Conf. on, pp. 140 –145, Jul 2007.

[159] K. Sims, “Evolving virtual creatures,” in Proceedings of the 21st Annual Conference on Computer Graphics and Interactive Techniques, SIGGRAPH, (Orlando, FL), pp. 15–22, ACM, July 1994.

[160] S. Hart, S. Sen, and R. A. Grupen, “Generalization and transfer in robot control,” in Proceedings of the 8th International Conference on Epigenetic Robots: Modeling Cognitive Development in Robotic Systems, vol. 139, (Brighton, UK), pp. 37–44, Lund University Cognitive Studies, Jul 2008.

[161] M. Huber and R. A. Grupen, “Feedback control structure for online learning tasks,” Robotics and Autonomous Systems, vol. 22, no. 3-4, pp. 303–315, 1997.

[162] A. Aronson, Clinical Examinations in Neurology. Philadelphia, PA: W. B. Saunders Co., 1981.

[163] J. Konczak, “On the notion of motor primitives in humans and robots,” in Proceedings of the 5th International Workshop on Epigenetic Robots: Modeling Cognitive Development in Robotic Systems, vol. 123, pp. 47–53, July 2005.

[164] B. A. Brody, H. C. Kinney, A. Kloman, and F. H. Gilles, “Sequence of central nervous system myelination in human infancy: I. An autopsy study of myelination,” Journal of Neuropathology, vol. 46, no. 3, pp. 283–301, 1987.

[165] G. Li and J. Martin, “Postnatal development of connectional specificity of corticospinal terminals in the cat,” Journal of Comparative Neurology, vol. 447, no. 1, pp. 57–71, 2002.

[166] A. Edsinger and C. C. Kemp, “What can I control? A framework for robot self-discovery,” in Proceed- ings of the 6th International Conference on Epigenetic Robots, (Paris, FR), pp. 1–8, 2006.

[167] W.-M. Shen, B. Salemi, and P. Will, “Hormone-inspired adaptive communication and distributive control for CONRO self-reconfigurable robots,” IEEE Transactions on Robotics and Automation, vol. 18, pp. 700–712, Oct 2002.

[168] J. Zlatev, “The mimetic origins of self-consciousness in phylo-, onto- and robotogenesis,” in 26th Annual Conference of the IEEE Industrial Electronics Society (IECON), vol. 4, pp. 2921–2928, 2000.

[169] K. Dautenhahn and A. Billard, “Studying robot social cognition within a developmental psychology framework,” in 3rd European Workshop on Advanced Mobile Robots (EuroBot), pp. 187–194, IEEE, 1999.

[170] R. Brooks, “Elephants don’t play chess,” Designing autonomous agents: Theory and practice from biology to engineering and back, vol. 6, pp. 3–15, 1991.

[171] B. Bril and Y. Breniere, “Postural requirements and progression velocity in young walkers,” Journal of Motor Behavior, vol. 24, no. 1, pp. 105–116, 1992.

[172] M. Lungarella and L. Berthouze, “Adaptivity through physical immaturity,” in Proceedings of the 2nd International Workshop on Epigenetic Robotics, pp. 79–86, 2002.

[173] G. Taga, “Freezing and freeing degrees of freedom in a model of neuro-musculo skeletal system for the development of locomotion,” in Proceedings of the 16th International Congress of the Society of Biomechanics, vol. 47, 1997.

[174] X. Huang and J. Weng, “Novelty and reinforcement learning in the value system of developmental robots,” in Proceedings of the 2nd International Workshop on Epigenetic Robots: Modeling Cognitive Development in Robotic Systems, vol. 94, pp. 47–55, Lund University Cognitive Studies, 2002.

[175] J. Weng, “A theory for mentally developing robots,” in 2nd International Conference on Development and Learning, IEEE Computer Society Press, 2002.

[176] D. Cohn, L. Atlas, and R. Ladner, “Improving generalization with active learning,” Machine Learning, vol. 15, no. 2, pp. 201–221, 1994.

[177] S. Thrun, “Exploration in active learning,” Handbook of Brain Theory and Neural Networks, 1994.

[178] A. Barto, S. Singh, and N. Chentanez, “Intrinsically motivated learning of hierarchical collections of skills,” in 3rd International Conference on Development and Learning, 2004.

[179] J. Marshall, D. Blank, and L. Meeden, “An emergent framework for self motivation in developmental robotics,” in 3rd International Conference on Development and Learning, 2004.

[180] M. Cakmak, M. R. Dogar, E. Ugur, and E. Sahin, “Affordances as a framework for robot control,” in Proceedings of the 7th International Conference on Epigenetic Robots: Modeling Cognitive Development in Robotic Systems, vol. 135, pp. 1–8, 2004.

[181] P.-Y. Oudeyer and F. Kaplan, “Intelligent adaptive curiosity: a source of self-development,” in Pro- ceedings of the 4th International Workshop on Epigenetic Robots, vol. 117, pp. 127–130, 2004.

[182] S. Schaal and C. Atkeson, “Robot juggling: implementation of memory-based learning,” Control Systems Magazine, IEEE, vol. 14, pp. 57–71, Feb 1994.

[183] J. Friedman, “Classification and multiple regression through projection pursuit,” Tech. Rep. Technical Report LCS12, Stanford University, Lab for Computational Statistics, Stanford, CA, 1985.

[184] Y. Demiris and A. Dearden, “From motor babbling to hierarchical learning by imitation: a robot developmental pathway,” in Proceedings of the 5th International Conference on Epigenetic Robots: Modeling Cognitive Development in Robotic Systems, vol. 123, pp. 31–37, July 2005.

[185] H. Gomi and M. Kawato, “Human arm stiffness and equilibrium-point trajectory during multi-joint movement,” in Biological Cybernetics, vol. 76, pp. 163–171, Springer-Verlag, 1997.

[186] C. von Hofsten, “Development of visually directed reaching: the approach phase,” Journal of Human Motion Studies, vol. 5, pp. 160–168, 1979.

[187] A. Matthew and M. Cook, “The control of reaching movements by young infants,” Child Development, vol. 61, pp. 1238–1257, 1990.

[188] R. S. Johansson, G. Westling, A. Backstrom, and J. R. Flanagan, “Eye hand coordination in object manipulation,” Journal of Neuroscience, vol. 21, no. 17, pp. 6917–6932, 2001.

[189] R. K. Clifton, D. W. Muir, D. H. Ashmead, and M. G. Clarkson, “Is visually guided reaching in early infancy a myth?,” Child Development, vol. 64, pp. 1099–1110, 1993.

[190] F. Lacquaniti and J. Soechting, “Coordination of arm and wrist motion during reaching task,” Journal of Neuroscience, vol. 2, no. 2, pp. 334–408, 1982.

[191] A. Fod, M. Mataric, and O. C. Jenkins, “Automated derivation of primitives for movement classification,” in 1st IEEE RAS International Conference on Humanoid Robotics, (Cambridge, MA), MIT Press, 2000.

[192] A. M. Smith, “Does the cerebellum learn strategies for the optimal time varying control of joint stiff- ness?,” Behavioral Brain Science, vol. 19, 1996.

[193] A. Edsinger-Gonzales, “Developmentally guided ego-exo force discrimination for a humanoid robot,” in Proceedings of the 5th International Conference on Epigenetic Robots: Modeling Cognitive Development in Robotic Systems, vol. 123, pp. 31–37, July 2005.

[194] R. Fletcher, Practical Methods of Optimization, vol. 2, Constrained Optimization. New York: Wiley, 1985.

[195] A. E. Bryson Jr., Dynamic Optimization. Menlo Park, California: Addison-Wesley, 1999.

[196] J. T. Betts, “Survey of numerical methods for trajectory optimization,” Journal of Guidance, Control, and Dynamics, vol. 21, pp. 193–207, Mar-Apr 1998.

[197] Z. Michalewicz, J. B. Krawczyk, M. Kazemi, and C. Z. Janikow, “Genetic algorithms and optimal control problems,” in Conference on Decision and Control, (Honolulu, HI, USA), pp. 1664–1666, IEEE, Dec 1990.

[198] J. Tian, N. Cui, and R. Mu, “Optimal formation reconfiguration using genetic algorithms,” in International Conference on Computer Modeling and Simulation, pp. 95–98, IEEE, Feb 2009.

[199] G. Elnagar, M. A. Kazemi, and M. Razzaghi, “The pseudospectral Legendre method for discretizing optimal control problems,” IEEE Transactions on Automatic Control, vol. 40, no. 10, pp. 1793–1796, 1995.

[200] I. M. Ross and F. Fahroo, “Pseudospectral knotting methods for solving optimal control problems,” Journal of Guidance, Control and Dynamics, vol. 27, no. 3, pp. 397–405, 2004.

[201] D. A. Benson, G. T. Huntington, T. P. Thorvaldsen, and A. V. Rao, “Direct trajectory optimization and costate estimation via an orthogonal collocation method,” AIAA Journal of Guidance, Control, and Dynamics, vol. 29, pp. 1435–1440, Nov-Dec 2006.

[202] F. Fahroo and I. M. Ross, “User’s manual for DIDO 2002: A MATLAB application package for dynamic optimization,” Technical Report NPS-AA-02-002, Naval Postgraduate School, Monterey, CA, June 2002.

[203] N. Bedrossian, S. Bhatt, M. Lammers, L. Nguyen, and Y. Zhang, “First ever flight demonstration of zero propellant maneuver™ attitude control concept,” in Guidance, Navigation, and Control Conference, vol. AIAA-2007-6734, (Hilton Head, South Carolina), pp. 1–12, AIAA, August 2007.

[204] S. I. Infeld, S. B. Josselyn, W. Murray, and I. M. Ross, “Design and control of libration point spacecraft formations,” in Guidance, Navigation, and Control Conference, vol. AIAA-2004-4786, (Providence, Rhode Island), pp. 1–13, AIAA, August 2004.

[205] W. J. Karasz, “Optimal re-entry trajectory terminal state due to variations in waypoint locations,” Master’s thesis, Air Force Institute of Technology, Wright Patterson Air Force Base, OH, Dec 2008.

[206] C. Gogu, T. Matsumura, R. T. Haftka, and A. V. Rao, “Aeroassisted orbital transfer trajectory optimization considering thermal protection system mass,” Journal of Guidance, Control and Dynamics, vol. 32, pp. 927–938, May-June 2009.

[207] G. S. Aoude, J. P. How, and I. M. Garcia, “Two-stage path planning approach for designing multiple spacecraft reconfiguration maneuvers,” in 20th International Symposium on Space Flight Dynamics, (Annapolis, Maryland), pp. 1–16, NASA GSFC, September 2007.

[208] S. A. Vase, J. E. DeVault, and P. Krishnaswami, “Modeling of hybrid electromechanical systems using a component-based approach,” in International Conference on Mechatronics and Automation, (Niagara Falls, Ontario, Canada), pp. 204–209, IEEE, July 2005.

[209] G. D. Wood and D. C. Kennedy, “Simulating mechanical systems in Simulink with SimMechanics,” Tech. Rep. 91124v00, The MathWorks, Natick, MA, USA, 2003.

[210] M. I. C. Dede and S. Tosunoglu, “Development of a real-time force-reflecting teleoperation system based on MATLAB simulations,” in Florida Conference on Recent Advances in Robotics, (Miami, FL), pp. 1–12, ASME, May 2006.

[211] M. I. C. Dede, S. Tosunoglu, and D. W. Repperger, “Effects of time delay on force-feedback teleoperation systems,” in Mediterranean Conference on Control Automation, (Kusadasi, Aydin, Turkey), pp. 1–6, IEEE/CSS, June 2004.

[212] P. Skworcow, D. Putra, A. Sahih, J. Goodband, O. C. L. Haas, K. J. Burnham, and J. A. Mills, “Predictive tracking for respiratory induced motion compensation in adaptive radiotherapy,” in UKACC Control Conference, (Glasgow, UK), pp. 203–210, IFAC, IEEE, Aug 2006.

[213] S.-D. Stan, M. Manic, V. Maties, and R. Balan, “Kinematic analysis, design, and control of an Isoglide3 parallel robot (IG3PR),” in Industrial Electronics Conference, pp. 2636–2641, IEEE, Nov 2008.

[214] D. E. Kirk, Optimal Control Theory: An Introduction. New York: Dover, 2004.

[215] Motoman Inc., “IA20 robot, DS-304-A.” datasheet, West Carrolton, OH, Oct 2006.

[216] G. Capi, Y. Nasu, L. Barolli, M. Yamano, K. Mitobe, and K. Takeda, “A neural network implementation of biped robot optimal gait during walking generated by genetic algorithm,” in Mediterranean Conference on Control and Automation, (Dubrovnik, Croatia), pp. 1–8, IEEE, KoREMA, Jul 27-29 2001.

[217] M. Saito, M. Fukaya, and T. Iwasaki, “Serpentine locomotion with robotic snakes,” Control Systems Magazine, IEEE, vol. 22, pp. 64 –81, Feb. 2002.

[218] H. J. Bremermann, M. Rogson, and S. Salaff, “Global properties of evolution processes,” in Natural Automata and Useful Simulations (H. H. Pattee, E. A. Edelsack, L. Fein, and A. B. Callahan, eds.), (Washington DC), pp. 3–41, Spartan Books, 1966.

[219] D. E. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley Professional, 1 ed., Jan 1989.

[220] M. Dorigo and T. Stützle, Ant Colony Optimization: From Natural to Artificial Systems. The MIT Press, 2004.

[221] K. Passino, “Biomimicry of bacterial foraging for distributed optimization and control,” Control Systems Magazine, IEEE, vol. 22, pp. 52–67, June 2002.

[222] J. Kennedy and R. Eberhart, “Particle swarm optimization,” in Neural Networks, 1995. Proceedings., IEEE International Conference on, vol. 4, pp. 1942 –1948, 1995.

[223] J. Rönkkönen, Continuous Multimodal Global Optimization with Differential Evolution-Based Methods. Doctor of Science, Lappeenranta University of Technology, Lappeenranta, Finland, Dec 2009.

[224] K. Krishnanand and D. Ghose, “Glowworm swarm optimization for simultaneous capture of multiple local optima of multimodal functions,” Swarm Intelligence, vol. 3, pp. 87–124, 2009.

[225] S. W. Mahfoud, Niching Methods for Genetic Algorithms. Doctor of philosophy, University of Illinois at Urbana-Champaign, May 1995.

[226] R. Brits, A. P. Engelbrecht, and F. V. D. Bergh, “A niching particle swarm optimizer,” in Proceedings of the Conference on Simulated Evolution And Learning, pp. 692–696, 2002.

[227] F. N. Fritsch and R. E. Carlson, “Monotone piecewise cubic interpolation,” SIAM Journal on Numerical Analysis, vol. 17, no. 2, pp. 238–246, 1980.

[228] Motoman Inc., “HP3 robot, DS-231-E.” datasheet, West Carrolton, OH, Feb 2007.

[229] M. Asada, K. Hosoda, Y. Kuniyoshi, H. Ishiguro, T. Inui, Y. Yoshikawa, M. Ogino, and C. Yoshida, “Cognitive developmental robotics: A survey,” Autonomous Mental Development, IEEE Transactions on, vol. 1, pp. 12 –34, May 2009.

[230] A. Jennings and R. Ordonez, “Biomimetic learning, not learning biomimetics: A survey of developmental learning,” Aerospace and Electronics Conference (NAECON), Proceedings of the IEEE 2010 National, pp. 11–17, July 2010.

[231] C. Liu, Q. Chen, and D. Wang, “CPG-inspired workspace trajectory generation and adaptive locomotion control for quadruped robots,” Systems, Man, and Cybernetics, Part B: Cybernetics, IEEE Transactions on, vol. 41, pp. 867–880, June 2011.

[232] T. Li, Y.-T. Su, S.-W. Lai, and J.-J. Hu, “Walking motion generation, synthesis, and control for biped robot by using PGRL, LPI, and fuzzy logic,” Systems, Man, and Cybernetics, Part B: Cybernetics, IEEE Transactions on, vol. 41, pp. 736 –748, June 2011.

[233] P.-Y. Wu, D. Campbell, and T. Merz, “Multi-objective four-dimensional vehicle motion planning in large dynamic environments,” Systems, Man, and Cybernetics, Part B: Cybernetics, IEEE Transactions on, vol. 41, pp. 621 –634, June 2011.

[234] M. Narimani, H. Lam, R. Dilmaghani, and C. Wolfe, “LMI-based stability analysis of fuzzy-model-based control systems using approximated polynomial membership functions,” Systems, Man, and Cybernetics, Part B: Cybernetics, IEEE Transactions on, vol. 41, pp. 713–724, June 2011.

[235] Y. Han, “Computer animation in mobile phones using a motion capture database compressed by polynomial curve-fitting techniques,” Consumer Electronics, IEEE Transactions on, vol. 54, pp. 1008–1016, August 2008.

[236] D. M. Wolpert, “Computational approaches to motor control,” Trends in Cognitive Sciences, vol. 1, no. 6, pp. 209 – 216, 1997.

[237] T. Ha and C.-H. Choi, “An effective trajectory generation method for biped walking,” Robotics and Autonomous Systems, vol. 55, pp. 795–810, June 2007.

[238] W. Provancher, S. Jensen-Segal, and M. Fehlberg, “ROCR: An energy-efficient dynamic wall-climbing robot,” Mechatronics, IEEE/ASME Transactions on, vol. 16, pp. 897–906, Oct. 2011.

[239] H. Wei, Y. Chen, J. Tan, and T. Wang, “Sambot: A self-assembly modular robot system,” Mechatronics, IEEE/ASME Transactions on, vol. 16, pp. 745–757, Aug. 2011.

[240] A. Goswami, “Kinematic and dynamic analogies between planar biped robots and the reaction mass pendulum (RMP) model,” in Humanoid Robots, 8th IEEE-RAS International Conference on, IEEE, Dec. 2008.

[241] S. Ma, T. Tomiyama, and H. Wada, “Omnidirectional static walking of a quadruped robot,” Robotics, IEEE Transactions on, vol. 21, pp. 152 – 161, April 2005.

[242] E. R. Westervelt, J. W. Grizzle, C. Chevallereau, J. H. Choi, and B. Morris, Feedback Control of Bipedal Robot Locomotion, vol. 1 of Control and Automation. Boca Raton, FL: CRC Press, 1 ed., 2007.

[243] J. Su and W. Xie, “Motion planning and coordination for robot systems based on representation space,” Systems, Man, and Cybernetics, Part B: Cybernetics, IEEE Transactions on, vol. 41, pp. 248 –259, Feb. 2011.

[244] J. Kober and J. Peters, “Policy search for motor primitives in robotics,” Machine Learning, vol. 84, pp. 171–203, 2011. doi: 10.1007/s10994-010-5223-6.

[245] D. Li, A. Becker, K. Shorter, T. Bretl, and E. Hsiao-Wecksler, “Estimating system state during human walking with a powered ankle-foot orthosis,” Mechatronics, IEEE/ASME Transactions on, vol. 16, pp. 835 –844, Oct. 2011.

[246] Y. W. Wong, K. P. Seng, and L.-M. Ang, “Radial basis function neural network with incremental learning for face recognition,” Systems, Man, and Cybernetics, Part B: Cybernetics, IEEE Transactions on, vol. 41, pp. 940 –949, Aug. 2011.

[247] X. Fan and J. Yen, “Modeling cognitive loads for evolving shared mental models in human-agent collaboration,” Systems, Man, and Cybernetics, Part B: Cybernetics, IEEE Transactions on, vol. 41, pp. 354 –367, April 2011.

[248] P. P. Lillard and L. L. Jessen, Montessori from the start: the child at home from birth to age three. New York: Schocken Books, 1st ed., 2003.

[249] P. Jansen-Osmann, S. Richter, J. Konczak, and K. Kalveram, “Force adaptation transfers to untrained workspace regions in children: Evidence for developing inverse dynamic models,” Experimental Brain Research, vol. 143, pp. 212–220, 2002.

[250] J. Konczak, “Neural development and sensorimotor control,” in Proceedings of the Fourth International Workshop on Epigenetic Robotics (L. Berthouze, H. Kozima, C. G. Prince, G. Sandini, G. Stojanov, G. Metta, and C. Balkenius, eds.), vol. 117, (Genoa, Italy), pp. 11–13, Lund: LUCS, 2004.

[251] M. Aureli, V. Kopman, and M. Porfiri, “Free-locomotion of underwater vehicles actuated by ionic polymer metal composites,” Mechatronics, IEEE/ASME Transactions on, vol. 15, pp. 603–614, Aug. 2010.

[252] A. Jennings and R. Ordonez, “Memory-based motion optimization for unbounded resolution,” Proceedings of the IASTED International Conference, vol. 753, pp. 81–88, Nov 2011. doi: 10.2316/P.2011.753-031.

[253] R. Bellman, Dynamic Programming. Princeton University Press, 1958.

[254] T. G. Kolda, R. M. Lewis, and V. Torczon, “Optimization by direct search: New perspectives on some classical and modern methods,” SIAM Review, vol. 45, no. 3, pp. 385–482, 2004. doi: 10.1137/S003614450242889.

[255] G. H. Behforooz, “A comparison of the E(3) and not-a-knot cubic splines,” Applied Mathematics and Computation, vol. 72, no. 2-3, pp. 219 – 223, 1995. doi: 10.1016/0096-3003(94)00185-7.

[256] D. Garg, W. Hager, and A. Rao, “Gauss pseudospectral method for solving infinite-horizon optimal control problems,” Guidance, Navigation and Control Conference, pp. 1–9, Aug 2010. AIAA-2010-7890.

[257] W. Cleveland, “Robust locally weighted regression and smoothing scatterplots,” Journal of the American Statistical Association, vol. 74, pp. 829–836, Dec 1979.

[258] S. Kakade and G. Shakhnarovich, “Large scale learning: Locally weighted regression.” CMSC 3590 Lecture Notes, Toyota Technological Institute at Chicago, 2009.

[259] M. de Oca, T. Stutzle, K. Van den Enden, and M. Dorigo, “Incremental social learning in particle swarms,” Systems, Man, and Cybernetics, Part B: Cybernetics, IEEE Transactions on, vol. 41, pp. 368 –384, April 2011.

[260] A. C. Öztireli, G. Guennebaud, and M. Gross, “Feature preserving point set surfaces based on non-linear kernel regression,” Computer Graphics Forum, vol. 28, no. 2, pp. 493–501, 2009.

[261] J. Bongard and H. Lipson, “Automated damage diagnosis and recovery for remote robotics,” Robotics and Automation Proceedings, IEEE International Conference on, vol. 4, pp. 3545 – 3550, May 2004. doi: 10.1109/ROBOT.2004.1308802.

[262] A. Nied, J. de Oliveira, R. de Farias Campos, R. Dias, and L. de Souza Marques, “Soft starting of induction motor with torque control,” Industry Applications, IEEE Transactions on, vol. 46, pp. 1002 –1010, May-June 2010.

[263] W. Kaal and S. Herold, “Electroactive polymer actuators in dynamic applications,” Mechatronics, IEEE/ASME Transactions on, vol. 16, pp. 24 –32, Feb. 2011.

[264] S. John, G. Alici, and C. Cook, “Inversion-based feedforward control of polypyrrole trilayer bender actuators,” Mechatronics, IEEE/ASME Transactions on, vol. 15, pp. 149 –156, Feb. 2010.

140 [265] H. Xie and S. Re andgnier, “Development of a flexible robotic system for multiscale applications of micro/nanoscale manipulation and assembly,” Mechatronics, IEEE/ASME Transactions on, vol. 16, pp. 266 –276, April 2011.

[266] P. Tandon, A. Awasthi, B. Mishra, P. Rathore, and R. Shukla, “Design and simulation of an intelligent bicycle transmission system,” Mechatronics, IEEE/ASME Transactions on, vol. 16, pp. 509 –517, June 2011.

[267] A. Jennings and R. Ordonez, “Population based optimization for variable operating points,” Evolution- ary Computation (CEC), 2011 IEEE Congress on, pp. 145 –151, June 2011.

[268] S. Freedman and J. Adams, “Filtering data based on human-inspired forgetting,” Systems, Man, and Cybernetics, Part B: Cybernetics, IEEE Transactions on, vol. 41, pp. 1544 –1555, Dec. 2011.

[269] C. Gurrin, H. Lee, and J. Hayes, “iForgot: a model of forgetting in robotic memories,” Proceedings of the 5th ACM/IEEE International Conference on Human-Robot Interaction, pp. 93–94, 2010.

[270] R. Wood, P. Baxter, and T. Belpaeme, “A review of long-term memory in natural and synthetic sys- tems,” Adaptive Behavior, vol. 20, no. 2, pp. 81–103, 2012.

[271] W. C. Ho, K. Dautenhahn, M. Y. Lim, P. Vargas, R. Aylett, and S. Enz, “An initial memory model for virtual and robot companions supporting migration and long-term interaction,” Robot and Human In- teractive Communication, 2009. RO-MAN 2009. The 18th IEEE International Symposium on, pp. 277 –284, Oct. 2009.

[272] V. L. Gadsden, “The arts and education: Knowledge generation, pedagogy, and the discourse of learn- ing,” Review of Research in Education, vol. 32, no. 1, pp. 29–61, 2008.

APPENDIX A

PROGRAM EXAMPLES

A.1 Pendulum Control Optimization

In Chapter IV, Section 4.4, the minimum-energy control restoring the pendulum to equilibrium was determined with the control optimization program DIDO, using pendulum dynamics obtained from a Simulink model. The optimization was repeated over a set of initial angles so that a control signal could be determined for a range of outputs. The model was configured with positive or negative gravity to test the suspended or the inverted condition, respectively. In addition, programs that compute the performance of the linear-quadratic state-transition controller and of the infinite-horizon linear-quadratic regulator are included.
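For illustration only, the following minimal sketch (written here, not drawn from the archived code) scores a single candidate restoring control over a sweep of initial angles; the pendulum parameters and the candidate control are hypothetical stand-ins, whereas the archived code obtains the dynamics from Simulink and lets DIDO search for the optimal control.

% Minimal illustrative sketch (not the archived code): score a candidate
% restoring control over a sweep of initial pendulum angles. All model
% parameters and the candidate control are hypothetical stand-ins.
m = 1; len = 0.5; b = 0.1; g = 9.81;       % use g = -9.81 for the inverted case
tf = 5; t = linspace(0, tf, 200)';
theta0Set = linspace(-pi/2, pi/2, 11);     % sweep of initial angles
J = zeros(size(theta0Set));
for k = 1:numel(theta0Set)
    u = @(s) -theta0Set(k)*sin(pi*s/tf);   % placeholder control candidate
    dyn = @(s, x) [x(2); (u(s) - b*x(2) - m*g*len*sin(x(1)))/(m*len^2)];
    [~, x] = ode45(dyn, t, [theta0Set(k); 0]);
    J(k) = trapz(t, u(t).^2) + 100*norm(x(end, :))^2;  % energy + terminal error
end
plot(theta0Set, J); xlabel('\theta_0 (rad)'); ylabel('cost J');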

Code is available as “SimMechanics pendulum used for control optimization” at www.mathworks.com/matlabcentral/fileexchange/28597 on MATLAB Central File Exchange (Retrieved Jun 10, 2012).

A.2 Robotic Arm Pick-Place Optimization

In Chapter IV, Section 4.5, the minimum-energy control for a robotic arm performing a pick-place command was determined with the control optimization program DIDO, using arm dynamics obtained from a Simulink model.
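As a rough illustration of the problem posed (not the archived DIDO setup), the sketch below casts a minimum-energy pick-place as a direct parameter optimization with MATLAB’s fmincon; armFinalPose is a placeholder standing in for the Simulink arm simulation, and all dimensions and targets are hypothetical.

% Minimal illustrative sketch: pick-place posed as a direct parameter
% optimization with fmincon rather than DIDO. armFinalPose is a stand-in
% for the Simulink arm model; the dimensions and target are hypothetical.
n = 4; N = 10;                              % 4 joints, 10 basis weights each
qGoal = [0.5; -0.3; 1.0; 0.2];              % hypothetical target joint angles
armFinalPose = @(W) sum(W, 2);              % stand-in: final pose from weights
energy = @(w) sum(w.^2);                    % surrogate for the integral of u'*u
poseErr = @(w) armFinalPose(reshape(w, n, N)) - qGoal;
nonlcon = @(w) deal([], poseErr(w));        % reaching the goal as an equality
opts = optimoptions('fmincon', 'Display', 'final');
w = fmincon(energy, zeros(n*N, 1), [], [], [], [], [], [], nonlcon, opts);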

Code is available as “Control optimization of a 4DOF arm using DIDO” at www.mathworks.com/matlabcentral/fileexchange/28596 on MATLAB Central File Exchange (Retrieved Jun 10, 2012).

A.3 Using Continuous Optimal Inverse Functions for Optimizing Precision

In Chapter V, Section 5.4.2, continuous optimal inverse functions are created that optimize the precision of the tip’s radial distance by adjusting the arm’s pose.
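The essential construction, sketched below with stand-in data rather than Motoman HP-3 results, is to interpolate locally optimal poses against the outputs they produce, yielding a continuous inverse that maps a commanded output to a pose; the interpolant here is MATLAB’s generic interp1 with the pchip option, not necessarily the scheme in the archived code.

% Minimal illustrative sketch (not the archived code): a continuous
% optimal inverse function built by interpolating locally optimal poses
% against their outputs. The sample data below are hypothetical.
yOpt = linspace(0.2, 0.6, 9)';              % sampled output levels (tip radius, m)
xOpt = [sin(2*yOpt), cos(2*yOpt)];          % stand-in locally optimal poses
invFcn = @(y) interp1(yOpt, xOpt, y, 'pchip');  % continuous optimal inverse
pose = invFcn(0.35);                        % pose commanding a 0.35 m radius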

Code is available as “Inverse Optimal Functions for Motoman HP-3 Tip Precision” at www.mathworks.com/matlabcentral/fileexchange/37649 on MATLAB Central File Exchange (Retrieved Jul 27, 2012).

A.4 Testing of Continuous Autonomous Learning

In Chapter VI, Section 6.4, the use of cubic interpolation and locally weighted regression to provide unbounded resolution when optimizing a continuous function is evaluated.
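The comparison can be sketched as follows; this is an illustrative reconstruction, not the archived test code, and the test signal, noise level, and tricube bandwidth h are hypothetical choices.

% Minimal illustrative sketch: cubic spline interpolation versus locally
% weighted regression (LWR, tricube weights per Cleveland) on noisy
% samples of a continuous signal. Test function and bandwidth are stand-ins.
rng(0);
f = @(t) sin(2*pi*t);                       % underlying continuous signal
ts = linspace(0, 1, 15)'; ys = f(ts) + 0.05*randn(size(ts));
tq = linspace(0, 1, 200)';
ySpline = interp1(ts, ys, tq, 'spline');    % not-a-knot cubic interpolation
h = 0.15; yLWR = zeros(size(tq));           % tricube-weighted local lines
for i = 1:numel(tq)
    w = max(1 - (abs(ts - tq(i))/h).^3, 0).^3;
    sw = sqrt(w);
    X = [ones(size(ts)), ts - tq(i)];
    beta = (X .* sw) \ (sw .* ys);          % weighted least-squares line
    yLWR(i) = beta(1);                      % local fit evaluated at tq(i)
end
plot(tq, f(tq), 'k', tq, ySpline, '--', tq, yLWR, ':');
legend('true', 'spline', 'LWR');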

Code is available as “Unbounded Resolution for Function Approximation” at www.mathworks.com/matlabcentral/fileexchange/ on MATLAB Central File Exchange.
