A DIFFERENTIAL GAMES APPROACH FOR ANALYSIS OF SPACECRAFT POST-DOCKING OPERATIONS

By TAKASHI HIRAMATSU

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

© 2012 Takashi Hiramatsu

I dedicate this to everyone who helped me write this manuscript.

ACKNOWLEDGMENTS My biggest appreciation goes to my advisor Dr. Norman G. Fitz-Coy for his great help and support. Every time I talked to him he motivated me with critical responses and encouraged me whenever I was stuck in the middle of my research. I also thank my committee, Dr. Warren Dixon, Dr. Gloria Wiens, and Dr. William Hager, for their support. Finally, I thank my colleagues in the Space Systems Group and all other friends, who directly or indirectly helped me throughout the years I spent at the University of Florida.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

CHAPTER

1 INTRODUCTION
  1.1 Spacecraft Rendezvous and Docking
    1.1.1 Cooperative Scenarios
    1.1.2 Noncooperative Scenarios
  1.2 Small Satellites
  1.3 Game Theoretic Approach

2 MATHEMATICAL BACKGROUND FOR THE APPROACH
  2.1 Differential Games and Control Theory
    2.1.1 Minimax Strategy
    2.1.2 Nash Strategy
    2.1.3 Stackelberg Strategy
    2.1.4 Open-Loop Strategies for Two-Person Linear Quadratic Differential Games
  2.2 Numerical Methods to Optimal Control Problem
  2.3 Bilevel Programming

3 TECHNICAL DESCRIPTION
  3.1 Reduction of Stackelberg Differential Games to Optimal Control
  3.2 Conversion of Stackelberg Differential Games to Stackelberg Static Games
  3.3 Costate Mapping of Stackelberg Differential Games
  3.4 Conclusion

4 DYNAMICS OF DOCKED SPACECRAFT
  4.1 Formulation of Dynamics
    4.1.1 Relative Motion Dynamics of a Satellite
      4.1.1.1 Translation
      4.1.1.2 Rotation
    4.1.2 Dynamics of Two Docked Satellites
  4.2 Simulation
    4.2.1 Case I: Nonzero Linear Velocity
    4.2.2 Case II: Nonzero Rotational Velocity
    4.2.3 Case III: Nonzero Linear and Rotational Velocities
  4.3 Conclusion

5 LINEAR CONTROLLER DESIGN WITH STACKELBERG STRATEGY
  5.1 Post-Docking Study with Linear Quadratic Game
  5.2 Simulation and Results
  5.3 Conclusion

6 SOLUTIONS TO TWO-PLAYER LINEAR QUADRATIC STACKELBERG GAMES WITH TIME-VARYING STRUCTURE
  6.1 Game Based on Additive Errors
    6.1.1 Open-loop Stackelberg Solution
    6.1.2 Closed-loop Stackelberg Solution
  6.2 Game Based on Multiplicative Errors
    6.2.1 Open-loop Stackelberg Solution
    6.2.2 Closed-loop Stackelberg Solution
  6.3 Simulations and Results
  6.4 Conclusion

7 CONCLUSION AND FUTURE WORKS

APPENDIX

A OPTIMALITY CONDITIONS OF TWO-PERSON STACKELBERG DIFFERENTIAL GAMES
  A.1 Fixed Final Time
    A.1.1 Follower's Strategy
      A.1.1.1 Variation of the augmented cost functional
      A.1.1.2 Optimality conditions
    A.1.2 Leader's Strategy
      A.1.2.1 Variation of the augmented cost functional
      A.1.2.2 Optimality conditions
  A.2 Free Final Time
    A.2.1 Follower's Strategy
      A.2.1.1 Variation of the augmented cost functional
      A.2.1.2 Optimality conditions
    A.2.2 Leader's Strategy
      A.2.2.1 Variation of the augmented cost functional
      A.2.2.2 Optimality conditions
  A.3 Linear-Quadratic Differential Game
    A.3.1 Fixed Final Time
    A.3.2 Free Final Time

B RISE STABILITY ANALYSIS
  B.1 RISE Feedback Control Development
  B.2 Stability Analysis

C COSTATE ESTIMATION FOR THE TRANSCRIBED STACKELBERG GAMES
  C.1 Transformed Optimality Conditions
  C.2 Discretization of Two-person Stackelberg Differential Games
  C.3 KKT Conditions and Costate Mapping

REFERENCES

BIOGRAPHICAL SKETCH

LIST OF TABLES

4-1 The simulation parameters for Case I
4-2 The simulation parameters for Case II
4-3 The simulation parameters for Case III
5-1 The simulation parameters for the linear quadratic game
6-1 The simulation parameters for the Stackelberg-RISE controller

LIST OF FIGURES

1-1 A design iteration through satellite post-docking analysis
3-1 Relationship among optimization problems
3-2 Direct and indirect methods
4-1 A representation of the position of a satellite with the inertial and the nominal reference frames
4-2 A satellite with a body-fixed reference frame F_i
4-3 An exaggerated view of two satellites near the nominal orbit
4-4 Two satellites initially on the same nominal orbit
4-5 Two satellites initially radially aligned
4-6 Case I: the interaction forces applied to the SV and the RSO
4-7 Case I: the interaction torques applied to the SV and the RSO
4-8 Case I: the linear motion of the RSO relative to the SV
4-9 Case I: the rotational motion of the RSO relative to the SV
4-10 Case II: the interaction forces applied to the SV and the RSO
4-11 Case II: the interaction torques applied to the SV and the RSO
4-12 Case II: the linear motion of the RSO relative to the SV
4-13 Case II: the rotational motion of the RSO relative to the SV
4-14 Case III: the interaction forces applied to the SV and the RSO
4-15 Case III: the interaction torques applied to the SV and the RSO
4-16 Case III: the linear motion of the RSO relative to the SV
4-17 Case III: the rotational motion of the RSO relative to the SV
5-1 Two rigid bodies on circular orbits
5-2 The resultant trajectory
5-3 The control force inputs
5-4 The control torque inputs
6-1 The relationship among the current and the desired orientations
6-2 Two docked satellites approximated as two rigid bodies connected via a torsion spring
6-3 f(t) and g(t) as respective weights on the game and arbitrary disturbances
6-4 The simulation results for the Stackelberg and RISE controller

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

A DIFFERENTIAL GAMES APPROACH FOR ANALYSIS OF SPACECRAFT POST-DOCKING OPERATIONS

By Takashi Hiramatsu

August 2012

Chair: Norman G. Fitz-Coy
Major: Mechanical Engineering

An increase in responsive space assets will contribute to the growing number of spacecraft in orbit and, in turn, the growing potential for failures. The number of spacecraft that have passed their operational lives also keeps increasing. Without proper treatment these satellites become space debris, which could lead to further failures due to collisions with other spacecraft. Thus, there will be a need for effective debris abatement (i.e., repair and/or disposal of failed satellites), which will require autonomous service satellites. Such a "tow truck" concept is expected to play a crucial role in sustainable small satellite utilization in the future.

Current and past investigations in which autonomous docking plays an important role have all considered "cooperative" interactions between satellites. That is, either the target has the same goals as the service vehicle, or the target is not actuated and passively follows the lead of the service vehicle. Cooperative scenarios are not always guaranteed, so it is imperative to consider docking scenarios with "noncooperative" targets. Such noncooperative scenarios involve motion which may be resistible or unpredictable (i.e., the target may have lost its control authority and thus its motion becomes resistible, or it may be "adversarial" and maneuvering to avoid capture).

Maintaining a post-docked state requires a control system which minimizes the effects of the uncertain interactions due to noncooperative behavior. In the robust control sense, such uncertainty needs to be upper-bounded in order to develop

effective control strategies. For that purpose it is important to characterize these uncertain interactions. One approach to approximating this uncertainty is to model the docked state of the spacecraft as a differential game, with each spacecraft being a player and the interactions being the outcome of the gameplay. Differential games are a class of games that describe a dynamical system driven by multiple control inputs with different objectives. Each input which actively affects the behavior of the system (e.g., control inputs, noise, disturbances, and other external inputs) is called a player of the game, and the players cooperate with or compete against one another to achieve their objectives. This manuscript addresses the characterization of the noncooperative behavior expected in satellite post-docking, and the corresponding control law to achieve docking-state maintenance, through modeling and solving a two-person Stackelberg differential game.

CHAPTER 1 INTRODUCTION

This dissertation outlines an approach to estimating the noncooperative behavior of the target spacecraft in the post-docking state, the corresponding interaction between the docked spacecraft, and the control strategy required to maintain that state. In this chapter the background of spacecraft docking and the motivation for a study of noncooperative post-docking are presented.

Sputnik 1, the world's first satellite, was launched in 1957. Since then, more than 6,000 satellites have been launched into space, and about 3,600 satellites are currently operational. Advancing space technologies necessitated the transportation of astronauts between spacecraft, the construction of space stations, and similar operations, which have been carried out through spacecraft rendezvous and docking (R&D), both of which play important roles in space utilization [1]. Since the first rendezvous, by Gemini VI-A in 1965, rendezvous has been used for tasks including crew transfer between spacecraft and space station construction. For example, HTV-1 was docked with the International Space Station (ISS) for refueling [2].

1.1 Spacecraft Rendezvous and Docking

There are many space applications involving the docking of spacecraft, satellites, and other objects. Many current and past docking scenarios are of a "cooperative" nature; before two spacecraft dock they must rendezvous, which can be achieved if (i) both spacecraft work together to match their motions, or (ii) one of them is stationary or in constant motion such that the other can adjust to match it. Docking with a spacecraft which is tumbling is considered "non-cooperative." Thus manned missions are inherently cooperative, while unmanned missions with autonomous rendezvous and docking could be noncooperative.

13 1.1.1 Cooperative Scenarios

Servicing operations and return missions assume that the targets cooperate in order to receive services or to dock. Examples include refueling (Orbital Express [3,4], ConeXpress [5], HTV-1 (KOUNOTORI) [2], and HTV-2 (KOUNOTORI 2) [6]), towing (the Orbital Maneuvering Vehicle (OMV) [7]), and repairing, such as the servicing missions to repair the Hubble Space Telescope [8–11].

1.1.2 Noncooperative Scenarios

Cooperative scenarios are not guaranteed for all future missions, and the likelihood of docking with a noncooperative target is quite high. For example, the motion of the target may be unpredictable or resistible (i.e., the target motion is not fully under control or favorable). Cook [12] defined a non-cooperative target as any spacecraft which is either not designed for docking or is tumbling freely in space. In the future it may be possible to add adversarial targets, which specifically try to avoid rendezvous and docking, although military use of space resources is prohibited by current space law [13].

There are a few important applications in the future. Collision avoidance of near-Earth asteroids, for example, has drawn attention [14, 15]. Docking maneuvers are involved when sensors are placed on asteroids to track their motion, or when actuating or explosive devices are attached to change their course. Another example is space debris. Several recent events contributed to the rapid growth of the debris population: China's ASAT operation in 2007 [16], the destruction of USA-193 in 2008 [17], and the collision between Iridium and COSMOS in 2009 [18]. Liou [19] estimated the propagation of space debris and showed that the debris population keeps growing even if no more spacecraft are launched; the existing debris will collide with spacecraft or other debris and increase in number. Therefore, both the prevention and the removal of space debris are important. One motivating example is the case of USA-193 [17], a reconnaissance satellite which became disabled on orbit and was eventually destroyed by a missile. The operation was not only costly but also

generated debris, just like the ASAT operation. If there existed a technology to safely deal with non-cooperative targets (e.g., a space tow truck to capture them and take them to a graveyard orbit), such a problem could be prevented with less damage. Several active debris removal technologies have been proposed, including the Remora Remover™ [20] and the micro remover [21, 22].

1.2 Small Satellites

In the past, mission-specific, large monolithic satellites were developed and served most space applications. Recently, however, small satellites have drawn attention for their short, low-cost development cycles and their versatile applications. Constellations of small satellites are expected to take over some of the tasks traditionally performed by large satellites. However, having more satellites in space increases the risk of more space debris. Small satellites usually have a longer orbital life than traditional satellites due to higher ballistic coefficients and lower orbits [23]. The long orbital life of small satellites, combined with the fact that they become debris at the end of their service, will increase the threat. Furthermore, the more spacecraft there are, the more likely failures become. The utilization of small satellites for practical applications requires many satellites by nature, and thus there will be a need for effective debris abatement (i.e., removal of failed satellites for repair and/or disposal), which requires the ability to work with non-cooperative debris. While several works have addressed non-cooperative docking, non-cooperative post-docking has drawn little attention.

1.3 Game Theoretic Approach

Recent, current, and future activities necessitate development of autonomous spacecraft rendezvous and docking technology, with target spacecraft having non-cooperative characteristics. Dealing with non-cooperative targets also emphasizes the importance of post-docking maintenance, which requires the design of a control system to minimize the effects of uncertain interactions due to the non-cooperative behavior of the target

spacecraft. In the robust control sense such uncertainty needs to be upper-bounded in order to develop effective control strategies. Therefore it is important to characterize the non-cooperative behavior and the corresponding interactions.

In order to successfully achieve docking and maintain the docked state between two spacecraft, accurate information about their dynamic behaviors is required so that the corresponding interactions can be controlled. In cooperative docking maneuvers, where two fully functional spacecraft work together to dock with each other, this requirement has already been addressed by several efforts. On the other hand, in non-cooperative docking maneuvers and the corresponding post-docking maneuvers it is difficult to analyze the interaction, because one spacecraft will not act in accordance with the other. In this dissertation a differential games approach is employed to estimate the behavior of the target (non-cooperative) spacecraft in the post-docking situation, the corresponding interactions, and the control strategy required to maintain the docked state.

For a specific case in which two satellites with known specifications such as mass, size, and power are considered, a dedicated simulation method such as the Monte Carlo method works well. However, if the general behaviors of arbitrary satellites are to be studied, it is beneficial to know how each parameter of the satellites' design specifications affects the post-docking. Characterizing the non-cooperative post-docking behaviors as a function of the design parameters allows for consideration of different post-docking scenarios. Figure 1-1 shows an iterative design process made possible with game theory.¹ A differential game problem will be formulated such that the tow truck (the service vehicle, SV) and the non-cooperative target (the resident space object, RSO) each choose their actuation commands after docking, affecting the interactions between them.
With a set of simulation parameters including the specifications of the satellites, the problem will be

¹ More details on differential game theory are discussed in Chapter 2.

solved to yield the possible interactions and the control actuations required to achieve them. That information can be used as feedback to redefine the design specifications of the satellites to be built (e.g., if the interactions are kept small but the required control efforts for the SV are too high, the simulation can be adjusted to weigh more toward lowering the control efforts at the cost of higher interactions).

[Figure: design-iteration loop. Build new satellite models for a non-cooperative post-docking scenario; solve for the interactions and the control inputs necessary to minimize them; if the interactions are potentially too great to maintain docking, or the necessary control inputs are too large to generate, redefine the structural strength, actuator specifications, etc.]

Figure 1-1. A design iteration through satellite post-docking analysis.

The analysis is expected to contribute to the establishment of a new technology for future space applications. In Chapter 2 the technical background, including game theory, is described. In Chapter 3 approaches to solving the differential game problem are discussed. In Chapter 4 the dynamics of satellite post-docking are investigated. The solutions to the particular game-based control design problems are presented in Chapter 5 and Chapter 6.

CHAPTER 2 MATHEMATICAL BACKGROUND FOR THE APPROACH

2.1 Differential Games and Control Theory

Game theory is the study of conflict among multiple groups or individuals (players) making decisions in competitive situations [24]. Static games are games in which each player makes a decision simultaneously, without knowledge of the other players' decisions. Dynamic games, or sequential games, are an extension of static games in which either the players decide in order (e.g., a two-level game where a player designated as the leader chooses its move and then the other player, the follower, chooses its own based on the leader's action) or the game is played multiple times with the players able to learn from the results of past games. Dynamic games may be played with rules described by a set of differential equations (e.g., a pursuit-evasion game played by two aircraft subject to their respective equations of motion). Such games are called differential games [25].

Noncooperative differential game theory has been applied to a variety of control problems [26–39]. While zero-sum differential games have been heavily exploited in nonlinear H∞ control theory, nonzero-sum differential games have had limited application in feedback control. In particular, Stackelberg differential games, which are based on a hierarchical relationship between the players, have been utilized in decentralized control systems [30], hierarchical control problems [28, 29, 37], and nonclassical control problems [31]. Differential games, like optimal control problems, are difficult tools to apply because of the challenges associated with determining analytical solutions, with a few exceptions such as the linear quadratic structure. One way to combine optimal control and differential game structures is to formulate a system composed of control terms that feedback-linearize the dynamics and additional control terms that optimize the residual system. For example, optimal controllers have been developed via feedback linearization under an exact-model-knowledge assumption [40] and via neural networks [41–43]. In [44] an open-loop Stackelberg game-based controller is developed based on the Robust Integral of the Sign of the Error (RISE) [45–47] technique.

In order to design an SV for a space operation that must deal with non-cooperative interactions, it is necessary to study the satellites' dynamic behaviors as well as a proper control architecture. Controlling individual satellites that interfere with one another requires game-theoretic consideration. Multiobjective optimization problems, with objectives possibly conflicting with one another, have been studied in the framework of game theory. The architecture of differential games was first developed by Isaacs [25] and has been applied to various engineering applications. A game-theoretic approach to controller design can handle the optimal control of multiple objects with conflicting objectives; even when the motion of the RSO is unknown and therefore non-cooperative, it may still be possible to analyze the interaction between the SV and the RSO, i.e., how much force is applied to the vehicles or needs to be applied.

The simplest form of a two-person differential game is defined as follows. The system is given by the differential equation

\dot{x} = f(x, u_1, u_2, t), \quad x(t_0) = x_0 \qquad (2–1)

consisting of two independent control inputs u1 and u2. By convention each control input is assigned to a player, such that u1 is designed by Player 1 and u2 is designed by Player 2. Each player chooses its control strategy in such a way that it minimizes the corresponding cost functionals

J_1(u_1, u_2) = \Phi_1(t_0, t_f, x_0, x_f) + \int_{t_0}^{t_f} L_1(x, u_1, u_2, t)\,dt \qquad (2–2)

J_2(u_1, u_2) = \Phi_2(t_0, t_f, x_0, x_f) + \int_{t_0}^{t_f} L_2(x, u_1, u_2, t)\,dt \qquad (2–3)

Unlike optimal control problems, this set is not well-posed. In an optimal control problem the optimal solution, if it exists, guarantees that the cost is minimized while all the constraints are satisfied, whereas in a two-person differential game the minimum costs of Players 1 and 2 cannot, in general, be attained simultaneously. Minimization of J_1 often interferes with the minimization of J_2, and vice versa. In order to solve for u_1 and u_2, a strategy defining the nature of the equilibrium solution (in other words, how the game is played) needs to be imposed. For modelling the non-cooperative interactions, the Minimax, Nash, and Stackelberg strategies are considered.

2.1.1 Minimax Strategy

In the Minimax strategy, one obtains the safest solution by assuming the worst case and trying to minimize the damage. Minimax considers a zero-sum game (the costs of all players sum to zero), such that

J_1 = -J_2 = J \qquad (2–4)

therefore each control strategy is expressed as

u_1 = \arg\min_{u_1} J_1 = \arg\min_{u_1} J, \qquad u_2 = \arg\min_{u_2} J_2 = \arg\max_{u_2} J

Therefore, if the solution exists, it will be a saddle-point solution

u_1 = \arg\min_{u_1} \max_{u_2} J
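As a small numerical illustration of the saddle-point condition (a hypothetical cost, not an example from this dissertation), consider a zero-sum cost that is convex in u_1 and concave in u_2. The stationarity conditions are then linear, and the saddle-point property can be checked directly against unilateral deviations:

```python
import numpy as np

# Hypothetical zero-sum cost: convex in u1 (minimizer), concave in u2 (maximizer).
def J(u1, u2):
    return u1**2 - u2**2 + u1*u2 + 2*u1 - 4*u2

# Stationarity: dJ/du1 = 2*u1 + u2 + 2 = 0 and dJ/du2 = -2*u2 + u1 - 4 = 0.
u1s, u2s = np.linalg.solve([[2.0, 1.0], [1.0, -2.0]], [-2.0, 4.0])  # (0, -2)

# Saddle-point check: no unilateral deviation helps either player.
eps = np.linspace(-1.0, 1.0, 201)
assert all(J(u1s + e, u2s) >= J(u1s, u2s) for e in eps)   # u1 cannot do better
assert all(J(u1s, u2s + e) <= J(u1s, u2s) for e in eps)   # u2 cannot do better
```

For this cost the min-max and max-min values coincide at the stationary pair, which is precisely what the saddle-point solution above asserts.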

Minimax strategies apply to zero-sum differential games, but even when a game is nonzero-sum it is useful to consider Minimax for noncooperative cases, because Player 1 may not know the objective of Player 2; if J_2 is unknown, then by assuming J_2 = -J_1 Player 1 is able to estimate the possible interaction with Player 2.

2.1.2 Nash Strategy

In the Nash strategy each player tries to optimize its own objective without regard for the others; however, this can result in an equilibrium solution that is not optimal for any individual player (e.g., the Prisoner's Dilemma). In a two-person Nash game each player tries to minimize its cost simultaneously, knowing that the other player does the same. Therefore, given the costs J_1(u_1, u_2) and J_2(u_1, u_2), the Nash strategy \{u_{1n}, u_{2n}\} should satisfy the following:

J_1(u_{1n}, u_{2n}) \le J_1(u_1, u_{2n}), \qquad J_2(u_{1n}, u_{2n}) \le J_2(u_{1n}, u_2),

that is, if one player unilaterally changes its strategy, its own cost increases. Therefore, for zero-sum games the solution coincides with the Minimax solution. For nonzero-sum games the necessary conditions for the existence of a Nash solution were developed in [48]. In general there may be more than one equilibrium solution for the Nash strategy, or there may exist no solution at all. For a special structure of the game, the linear quadratic differential game, in which the cost functionals are quadratic in the state and the control and the constraint is a differential equation linear in the state and the control, the solution can be characterized explicitly, as shown in Section 2.1.4.
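The Nash conditions above can be checked numerically on a small static example (hypothetical quadratic costs, not taken from this dissertation): simultaneous stationarity of each player's cost in its own control gives a linear system, and the equilibrium is verified against unilateral deviations.

```python
import numpy as np

# Hypothetical two-player quadratic game; each player minimizes its own cost.
J1 = lambda u1, u2: u1**2 + u1*u2 - 4*u1   # player 1 controls u1
J2 = lambda u1, u2: u2**2 + u1*u2 - 2*u2   # player 2 controls u2

# Simultaneous stationarity: 2*u1 + u2 = 4 and u1 + 2*u2 = 2.
u1n, u2n = np.linalg.solve([[2.0, 1.0], [1.0, 2.0]], [4.0, 2.0])  # (2, 0)

# Nash check: no unilateral deviation lowers a player's own cost.
eps = np.linspace(-1.0, 1.0, 201)
assert all(J1(u1n + e, u2n) >= J1(u1n, u2n) for e in eps)
assert all(J2(u1n, u2n + e) >= J2(u1n, u2n) for e in eps)
```

In this quadratic case the equilibrium is unique because the stationarity system is nonsingular; as noted above, general games may admit several Nash equilibria or none.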

2.1.3 Stackelberg Strategy

Stackelberg strategy assumes one player (the leader) has an advantage over the other (the follower) in minimizing its cost. In Stackelberg strategy, the leader can enforce their action, and the resultant equilibrium solution is always favorable to the leader. With Player 2 as the leader, the cost of each player associated with the Stackelberg strategy is as follows:

J_1(u_{1s}, u_{2s}) \le J_1(u_1, u_{2s}) \qquad (2–5)

J_2(u_{1s}, u_{2s}) \le J_2(u_1, u_2) \qquad (2–6)

where the subscript s denotes the Stackelberg strategy. Equation (2–5) shows that Player 1 (the follower) achieves a lower cost by playing the Stackelberg strategy when it knows that Player 2 (the leader) is playing the Stackelberg strategy. The Stackelberg solution is always better than, or equal to, the Nash solution for the leader [49], suggesting that the Stackelberg strategy can characterize both hierarchical and non-hierarchical cases:

J_2(u_{1s}, u_{2s}) \le J_2(u_{1n}, u_{2n}) \qquad (2–7)

In Stackelberg games the leader chooses its strategy first. It must be noted, however, that the order of choosing strategies does not necessarily mean that one player physically acts before the other. Next the follower chooses its strategy such that it minimizes its cost given the leader's strategy. Thus, to the follower the game is merely an optimization problem in which it minimizes its cost for whatever strategy the leader provides. The optimal control input of the follower is defined by

u_1^*(u_2) = \arg\min_{u_1} J_1(u_1, u_2) \qquad (2–8)

that is, the follower's control input is optimal for an arbitrary input of the leader. When the leader plays the Stackelberg strategy, it assumes that the follower does likewise, i.e., that the follower's decision is based on Eq. (2–8). Therefore the leader chooses its decision by solving the optimal control problem given u_1^*:

u_{2s} = \arg\min_{u_2} J_2(u_1^*(u_2), u_2) \qquad (2–9)

Once the Stackelberg solution of the leader is obtained, the Stackelberg solution of the follower is found as

u_{1s} = u_1^*(u_{2s}) = \arg\min_{u_1} J_1(u_1, u_{2s}) \qquad (2–10)

which is based on the follower's assumption that the leader knows the follower "follows" the leader, and that the leader optimizes its objective accordingly. If Eq. (2–8) can be solved analytically as a function of u_2, then Eq. (2–10) is automatically determined once Eq. (2–9) is solved, allowing u_{1s} and u_{2s} to be found sequentially and separately rather than simultaneously.

The concept was first applied to differential games in [50], and solvability conditions were later developed in [51, 52]. As with the Nash strategy, uniqueness of the solution is not guaranteed in general, except in linear quadratic cases.
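The sequential structure of Eqs. (2–8)–(2–10) can be traced on a small static example (hypothetical quadratic costs, not taken from this dissertation, with Player 2 as the leader per the convention above): the follower's reaction map is computed in closed form, substituted into the leader's cost, and the leader's advantage over the Nash solution, Eq. (2–7), is checked numerically.

```python
import numpy as np

# Hypothetical two-person quadratic game; Player 2 is the leader.
J1 = lambda u1, u2: u1**2 + u1*u2 - 4*u1   # follower's cost
J2 = lambda u1, u2: u2**2 + u1*u2 - 6*u2   # leader's cost

# Follower's reaction map, Eq. (2-8): dJ1/du1 = 0  ->  u1*(u2) = (4 - u2)/2.
react = lambda u2: (4.0 - u2) / 2.0

# Leader minimizes J2 along the reaction curve, Eq. (2-9):
# J2(u1*(u2), u2) = u2^2/2 - 4*u2, minimized at u2 = 4.
u2s = 4.0
u1s = react(u2s)    # follower's Stackelberg response, Eq. (2-10): u1s = 0

# Nash solution of the same game for comparison: 2*u1 + u2 = 4, u1 + 2*u2 = 6.
u1n, u2n = np.linalg.solve([[2.0, 1.0], [1.0, 2.0]], [4.0, 6.0])

# Eq. (2-7): the leader never does worse under Stackelberg than under Nash.
assert J2(u1s, u2s) <= J2(u1n, u2n)
```

Because the follower's reaction map is available analytically here, the two solutions are indeed found sequentially and separately, exactly as the text describes.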

2.1.4 Open-Loop Strategies for Two-Person Linear Quadratic Differential Games

The solutions of two-person differential games are found by solving the optimality conditions obtained through the calculus of variations,¹ but the resulting boundary-value problems are so complicated that in many cases they do not have analytical solutions. However, analytical solutions do exist for simple problems. Two-person linear quadratic (LQ) differential games are a class of differential games in which the dynamic constraint given by Eq. (2–1) is described by a set of linear differential equations

\dot{x} = Ax + B_1 u_1 + B_2 u_2, \quad x(t_0) = x_0 \qquad (2–11)

and the cost functionals given by Eqs. (2–2)–(2–3) take the quadratic form

J_1(u_1, u_2) = \frac{1}{2} x_f^T K_{1f} x_f + \frac{1}{2} \int_{t_0}^{t_f} \left( x^T Q_1 x + u_1^T R_{11} u_1 + u_2^T R_{12} u_2 \right) dt \qquad (2–12)

J_2(u_1, u_2) = \frac{1}{2} x_f^T K_{2f} x_f + \frac{1}{2} \int_{t_0}^{t_f} \left( x^T Q_2 x + u_1^T R_{21} u_1 + u_2^T R_{22} u_2 \right) dt \qquad (2–13)

In this case the control strategies are also linear in the state x,

u_1 = -R_{11}^{-1} B_1^T K_1 x \qquad (2–14)

u_2 = -R_{22}^{-1} B_2^T K_2 x \qquad (2–15)

where K_1 and K_2 are the solutions of the Riccati differential equations, which are associated with the optimality conditions for the LQ differential games and vary with

¹ The derivation of the optimality conditions is provided in Appendix A.

strategies. For Minimax strategies,

\dot{K}_1 = -K_1 A - A^T K_1 - Q_1 + K_1 B_1 R_{11}^{-1} B_1^T K_1 + K_1 B_2 R_{12}^{-1} B_2^T K_1, \quad K_1(t_f) = K_{1f} \qquad (2–16)

\dot{K}_2 = -K_2 A - A^T K_2 - Q_2 + K_2 B_1 R_{21}^{-1} B_1^T K_2 + K_2 B_2 R_{22}^{-1} B_2^T K_2, \quad K_2(t_f) = K_{2f} \qquad (2–17)

For Nash strategies,

\dot{K}_1 = -K_1 A - A^T K_1 - Q_1 + K_1 B_1 R_{11}^{-1} B_1^T K_1 + K_1 B_2 R_{22}^{-1} B_2^T K_2, \quad K_1(t_f) = K_{1f}

\dot{K}_2 = -K_2 A - A^T K_2 - Q_2 + K_2 B_2 R_{22}^{-1} B_2^T K_2 + K_2 B_1 R_{11}^{-1} B_1^T K_1, \quad K_2(t_f) = K_{2f}

For Stackelberg strategies, there is an additional set of differential equations that need to be solved.

\dot{K}_1 = -K_1 A + K_1 B_1 R_{11}^{-1} B_1^T K_1 + K_1 B_2 R_{22}^{-1} B_2^T K_2 - A^T K_1 - Q_1, \quad K_1(t_f) = K_{1f}

\dot{K}_2 = -K_2 A + K_2 B_1 R_{11}^{-1} B_1^T K_1 + K_2 B_2 R_{22}^{-1} B_2^T K_2 - A^T K_2 - Q_2 + Q_1 P, \quad K_2(t_f) = K_{2f} \qquad (2–18)

\dot{P} = -P A + P B_1 R_{11}^{-1} B_1^T K_1 + P B_2 R_{22}^{-1} B_2^T K_2 + A P - B_1 R_{11}^{-1} R_{21} R_{11}^{-1} B_1^T K_1 + B_1 R_{11}^{-1} B_1^T K_2, \quad P(t_f) = 0

2.2 Numerical Methods to Optimal Control Problem

An optimal control problem is defined as follows: Find u which minimizes the cost functional

J = \Phi\left(x(t_0), x(t_f), t_0, t_f\right) + \int_{t_0}^{t_f} \left[ \phi(x, t) + L(x, u, t) \right] dt \qquad (2–19)

where Φ is the terminal constraint and φ is the path constraint, subject to the dynamic constraint

\dot{x} = f(x, u, t), \quad x(t_0) = x_0 \qquad (2–20)

Optimal control can be seen as a one-player differential game, and it thus shares the same difficulty in finding the solution. A classical approach, the indirect method, is to use the calculus of variations to construct a set of differential equations whose solution is the optimal control strategy. The resultant set of differential equations is a boundary-value problem, and its solution is rarely analytical. Numerically solving a boundary-value problem is often difficult due to the necessity of an initial guess and a small radius of convergence.

In optimal control the direct method transcribes the problem into a parameter optimization problem (nonlinear programming, NLP). The state equations and control inputs are discretized and approximated by interpolating polynomial functions, and the cost functional is evaluated by numerical integration. As discussed in [53], there are many ways of performing the transcription, but an optimal control problem and the nonlinear programming problem it is transcribed into are essentially two different problems, and it is important that the solution to the converted nonlinear programming problem is indeed the solution to the original optimal control problem. This can be checked by comparing the KKT multipliers of the transcribed NLP problem to the costates obtained for the optimal control problem with the indirect method.

The pseudospectral method, also known as the orthogonal collocation method [54], converts an optimal control problem into a nonlinear programming problem by approximating the dynamics with the derivative of orthogonal interpolating polynomial functions, and the integral form of the cost with Gauss quadrature. For example, the Legendre pseudospectral method (LPM) [55] uses Lagrange interpolation at the Lobatto collocation points, which are the roots of the derivative of the Legendre polynomial, together with the corresponding Lobatto weights, to evaluate the numerical integration. [56] showed that the costate approximation with the LPM is exact at every collocation point.
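The direct method just described can be illustrated with a minimal Euler transcription of a toy scalar LQ problem (an illustrative example, not one from this dissertation). The transcribed NLP is handed to an off-the-shelf SQP solver, and its optimal cost is compared against the value obtained from the indirect (Riccati) solution, which for this problem is J* = (1/2) tanh(T).

```python
import numpy as np
from scipy.optimize import minimize

# Toy problem: minimize 1/2 * int_0^T (x^2 + u^2) dt  s.t.  x' = u, x(0) = 1.
N, T = 50, 2.0
dt = T / N

def cost(z):
    # Decision vector z = [x_0..x_N, u_0..u_{N-1}]; left-Riemann quadrature.
    x, u = z[:N+1], z[N+1:]
    return 0.5 * dt * np.sum(x[:-1]**2 + u**2)

def defects(z):
    # Euler collocation defects x[k+1] - x[k] - dt*u[k], plus x(0) = 1.
    x, u = z[:N+1], z[N+1:]
    return np.concatenate(([x[0] - 1.0], x[1:] - x[:-1] - dt*u))

sol = minimize(cost, np.zeros(2*N + 1), method="SLSQP",
               constraints={"type": "eq", "fun": defects})

# Indirect benchmark: the scalar Riccati solution K(t) = tanh(T - t)
# gives the optimal cost 0.5 * K(0) * x0^2 = 0.5 * tanh(T).
assert sol.success
assert abs(sol.fun - 0.5*np.tanh(T)) < 0.05
```

The small residual between the two costs is the discretization error of the Euler transcription; this is exactly the kind of consistency check (direct versus indirect solution, and NLP multipliers versus costates) that the text calls for, and that the pseudospectral methods of [54–56] make systematic.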

2.3 Bilevel Programming

Bilevel programming is a class of problems in which two parameter optimization problems are arranged such that some of the constraints of one problem are defined by the solution of the other. The relationship between Stackelberg differential games and bilevel programming is analogous to that between optimal control and nonlinear programming. The coupling of multiple optimization problems defines a game, and the hierarchical structure relates Stackelberg games, in particular, to "optimistic" bilevel programming problems [57]. Unlike single-player parameter optimization (e.g., nonlinear programming), there are no well-established techniques for solving bilevel programming problems, owing to their complexity. [58] presents an efficient algorithm for solving linear bilevel programming problems; for nonlinear cases, however, previous works mainly focus on specific problems ([59–61]). In order to fully investigate complex nonlinear Stackelberg differential games, a well-established numerical method is necessary. In Section 3.2 an orthogonal collocation approach is shown to transcribe a two-player Stackelberg differential game problem to a 2N-player Stackelberg static game, and in Section 3.3 an example problem is presented. However, this dissertation does not explore bilevel programming further and instead focuses on designing game-theoretic controllers for spacecraft post-docking.

CHAPTER 3
TECHNICAL DESCRIPTION

In this chapter several approaches to solving two-player Stackelberg differential games are discussed. One of them transforms the problem into a static game problem by discretization. The static games considered here are multi-objective optimization problems, similar to differential games except that the differential constraints and cost functionals are replaced by static constraints and cost functions, respectively. The solution of an optimal control problem is obtained as follows [62]: first, the differential constraints are adjoined to the cost functional to form the Hamiltonian; then, using the calculus of variations, the variation of the Hamiltonian with respect to each of the state and control variables is obtained. These variations yield differential equations called the optimality conditions. The optimality conditions consist of the dynamics of the states (i.e., the system dynamics, or the original differential constraints) and of the costates (i.e., the Lagrange multipliers used to adjoin the differential constraints to the cost); together, this set of differential equations forms a boundary-value problem. Whether a differential game problem admits an analytical solution depends on the existence of an analytical solution to the boundary-value problem defined by the optimality conditions. Since differential equations often have no analytical solution [63], solving differential games with the calculus of variations (the indirect method), where the solution is obtained by solving the differential equations satisfying the optimality conditions, is of limited use: it is difficult to ensure the existence of a solution, especially when the problem is nonlinear, and even when existence is ensured, it is still difficult to solve the boundary-value problem.
Although optimal control in general suffers from the same problems as differential games, there is a class of numerical methods that transcribe the optimal control problem into a parameter optimization problem using direct collocation, which can be solved with nonlinear programming. This so-called direct method guarantees the existence of a solution in exchange for a possible loss of optimality. Three candidates for solving differential games are considered here. First, the indirect method is used when the problem is simple and takes a form for which the analytical solution is well developed. Second, in some cases a two-person differential game can be converted to an optimal control problem and solved with direct methods. The third choice is inspired by the direct methods of optimal control: transcribing differential games to static games using orthogonal collocation (the pseudospectral method). Since optimal control can be considered a single-player game, there is a relationship between Stackelberg games and optimal control, as shown in Fig. 3-1. Stackelberg differential games have a structure similar to Stackelberg static games, as optimal control does to nonlinear programming. A Stackelberg differential game problem can be reduced to an optimal control problem by including the optimality conditions of the follower, and the solution of that optimal control problem provides the solution of the Stackelberg leader. However, it does not provide the solution of the follower; that must be computed separately from the solution of the optimal control problem, and it is difficult to argue that the solutions of the leader and the follower then have the same level of accuracy. The same concept applies to the transition from a bilevel programming problem to a nonlinear programming problem (hence the dotted arrows in the figure). Figure 3-2 shows more general relationships among optimization problems. Not all differential games can be reduced to optimal control problems, so the indirect approach to solving differential games does not necessarily go through optimal control. Likewise, not all static games can be reduced to nonlinear programming.
Figure 3-1. Stackelberg games are hierarchical optimizations and can be reduced to single optimization problems. Differential games and optimal control can be converted through discretization to static games and nonlinear programming, respectively.

Differential games and optimal control can be solved "indirectly" via the calculus of variations. If successfully transcribed, it is also possible to solve optimal control problems and differential game problems as parameter optimization problems, namely nonlinear programming and bilevel programming. Although discretization has not been as popular for differential games as for optimal control, researchers have studied it for pursuit-evasion (zero-sum) games. Ehtamo [64] performed both discretization of the optimality conditions to nonlinear programming and direct conversion to bilevel programming, and showed that the two lead to the same solution. Horie [65] converted the optimality conditions and solved two sets of nonlinear programming problems combined with a genetic algorithm. Still, to the author's knowledge, no attempt has been made to apply the pseudospectral method to nonzero-sum differential games. These prior efforts inspire an approach to solving the nonzero-sum Stackelberg differential games of post-docked satellites by transcription to nonlinear programming and static games (bilevel programming). Due to the structure of Stackelberg differential games, building connections among differential/static games, optimal control, and nonlinear programming should be possible. Although the effort here is directed only toward Stackelberg games, it could be extended to Nash games in the future.

Figure 3-2. Differential games and optimal control problems can be solved by indirect methods using the calculus of variations, or by direct methods through direct transcription.

3.1 Reduction of Stackelberg Differential Games to Optimal Control

Two-person differential games can be posed as two coupled optimal control problems. Let the controls of player 1 and player 2 be u_1 and u_2, respectively; then u_1 solves the optimal control problem

    Minimize  J_1 = \Phi_1(x_0, x_f, t_0, t_f) + \int_{t_0}^{t_f} L_1(x, u_1, u_2, t)\,dt    (3–1)

    subject to  \dot{x} = f(x, u_1, u_2, t), \qquad x(t_0) = x_0

and u_2 solves another optimal control problem

    Minimize  J_2 = \Phi_2(x_0, x_f, t_0, t_f) + \int_{t_0}^{t_f} L_2(x, u_1, u_2, t)\,dt    (3–2)

    subject to  \dot{x} = f(x, u_1, u_2, t), \qquad x(t_0) = x_0

Suppose u_1 is the follower and u_2 the leader. When a Stackelberg strategy is played, a two-player nonzero-sum differential game is solved as follows [66]: first, the differential constraints are adjoined to the follower's cost functional to form the follower's Hamiltonian, from which the optimality conditions of the follower are obtained; with these conditions appended as constraints, the game can be converted to the optimal control problem of the leader, which can be solved numerically using a collocation method. One characteristic of the Stackelberg strategy in a two-person differential game is that the follower always acts optimally with respect to the leader; therefore, if the leader's control strategy is prescribed, the follower u_1 solves the tracking problem defined by Eq. (3–1) with u_2 treated as a prescribed function of time. The knowledge that the follower tracks the leader can be used as additional constraints in solving for the leader's strategy, as follows. Let the Hamiltonian of the follower be H_1, defined as

    H_1 = L_1(x, u_1, u_2, t) + \lambda_1^T f(x, u_1, u_2, t)

then the follower's optimality conditions are given by

    \left(\frac{\partial H_1}{\partial u_1}\right)^T = 0    (3–3)
    \dot{\lambda}_1^T = -\frac{\partial H_1}{\partial x}, \qquad \lambda_1^T(t_f) = \frac{\partial \Phi_1}{\partial x(t_f)}    (3–4)

Equation (3–3) relates the follower’s control u1 and the costate λ1. If u1 can be expressed explicitly in terms of x, λ1, u2, and t as

u1 = u1 (x, λ1, u2, t) (3–5)

then by replacing u_1 with Eq. (3–5) and combining Eq. (3–4), the two-player differential game problem is reduced to the optimal control problem of the leader u_2:

    Minimize  J_2 = \Phi_2(x_0, x_f, t_0, t_f) + \int_{t_0}^{t_f} L_2(x, u_1(x, \lambda_1, u_2, t), u_2, t)\,dt

    subject to  \dot{x} = f(x, \lambda_1, u_2, t), \qquad x(t_0) = x_0    (3–6)
                \dot{\lambda}_1^T = -\frac{\partial H_1}{\partial x}, \qquad \lambda_1^T(t_f) = \frac{\partial \Phi_1}{\partial x(t_f)}

Once the optimal control problem is defined by Eq. (3–6), it can be solved using the existing numerical methods discussed in Section 2.2. There are two issues with this conversion: (i) the follower's control strategy is restricted to be continuous, while the leader can admit discontinuous control inputs; (ii) the direct method of optimal control may sacrifice optimality for the existence of a solution, but due to the conversion only the leader's optimality is sacrificed (even though the Stackelberg leader is the player with the advantage in optimizing its cost). Therefore, except when Eq. (3–6) can be solved analytically, it is more reasonable to maintain the game structure by transcribing the differential game problem to a static game problem.
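For the scalar example treated later in Section 3.3 (dynamics \dot{x} = u_1 - u_2 on [0, 1] with quadratic costs), this reduction can be sketched numerically. The snippet below is an illustrative sketch under two stated assumptions: the follower's costate is constant for that problem, so u_1 = -c_p x_f, and the leader's optimal control is also constant, so the reduced optimal control problem collapses to a one-variable minimization. The constants c_p, c_e, x_0 are example values:

```python
import numpy as np
from scipy.optimize import minimize_scalar

cp, ce, x0, tf = 2.0, 1.0, 1.0, 1.0   # illustrative constants

def leader_cost(v):
    # With u1 = -cp * xf substituted, x(tf) solves xf = x0 + (-cp*xf - v)*tf:
    xf = (x0 - v * tf) / (1.0 + cp * tf)
    return -0.5 * xf**2 + 0.5 * v**2 * tf / ce   # leader's cost J2

v = minimize_scalar(leader_cost).x               # optimal constant leader control
xf = (x0 - v * tf) / (1.0 + cp * tf)
# v and xf match the analytic Stackelberg solution quoted in Section 3.3
```

With c_p = 2, c_e = 1 this recovers v = -1/8 and x_f = 3/8, in agreement with the closed-form solution.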

3.2 Conversion of Stackelberg Differential Games to Stackelberg Static Games

In Section 2.2 it was shown that an optimal control problem can be converted to an optimization problem by discretizing the time domain, approximating the states with interpolating functions, and numerically integrating the cost functionals. Differential game problems have the same structure as optimal control problems, as both involve optimization over time subject to dynamic constraints. Thus the same method of transcription can be employed to convert differential game and optimal control problems to static game and nonlinear programming problems, respectively. In this section, in a manner similar to the transcription of an optimal control problem to a nonlinear programming problem, a two-person Stackelberg differential game is transcribed to a Stackelberg static programming problem. As discussed in Section 2.3, Stackelberg static games are a subset of bilevel programming problems.

Transcription with LGL Collocation. In this section a general transcription formulation is developed for two-person Stackelberg differential games using the Legendre-Gauss-Lobatto (LGL) collocation points. The main idea of the numerical approach with collocation points is to approximate x, u_1, and u_2 as polynomials constructed from a finite set of data points, and to find their coefficients such that the approximated functions satisfy the original game problem at each collocation point [54]. The LGL points are defined as -1, +1, and the roots of the derivative of the Nth-order Legendre polynomial in the interval t \in [-1, 1] (thus N + 1 points in total). Recall that the general two-player Stackelberg differential game with player 2 as the leader can be modeled as

Minimize

    J_1 = \Phi_1(x(t_0), t_0, x(t_f), t_f) + \int_{t_0}^{t_f} L_1(x(t), u_1(t), u_2(t), t)\,dt
    J_2 = \Phi_2(x(t_0), t_0, x(t_f), t_f) + \int_{t_0}^{t_f} L_2(x(t), u_1(t), u_2(t), t)\,dt

subject to

    \dot{x} = f(x(t), u_1(t), u_2(t), t), \qquad x(t_0) = x_0

For convenience, let M = L_1, N = L_2, u = u_1, and v = u_2. The time domain needs to be scaled to \tau \in [-1, 1]:

    t = t_0 + \frac{(t_f - t_0)\tau + (t_f - t_0)}{2}, \qquad dt = \frac{t_f - t_0}{2}\,d\tau

For N + 1 collocation points, the state dynamics becomes N + 1 equality constraints

    \dot{x}_i = \frac{t_f - t_0}{2}\,f(x(\tau_i), u(\tau_i), v(\tau_i), \tau_i), \quad i = 1, \ldots, N, \qquad x(\tau_0) = x_0

and the cost functionals become

    J_1 = \Phi_1 + \int_{t_0}^{t_f} L_1(x(t), u_1(t), u_2(t), t)\,dt
        = \Phi_1 + \frac{t_f - t_0}{2}\int_{\tau_0}^{\tau_N} M(x(\tau), u(\tau), v(\tau), \tau)\,d\tau
        = \Phi_1 + \frac{t_f - t_0}{2}\sum_{i=0}^{N} w_i M_i,

and

    J_2 = \Phi_2 + \frac{t_f - t_0}{2}\sum_{i=0}^{N} w_i N_i,

where the w_i are the weights associated with the Gauss-Lobatto quadrature, which approximates the integral part of the cost functional [67]. The resultant bilevel programming problem is then as follows: first, the follower u solves the lower level problem

Minimize

    J_1 = \Phi_1 + \frac{t_f - t_0}{2}\sum_{i=0}^{N} w_i M_i    (3–7)

subject to

    \frac{t_f - t_0}{2}\,f_i - \dot{x}_i = 0, \quad i = 1, \ldots, N;

however, since Eq. (3–7) also depends on the leader v, which has not yet been determined, the lower level problem alone does not provide a unique solution for the follower. Instead, the lower level problem defines an optimal reaction set for the follower, such that u is determined once v is defined (i.e., u = u(v)). The leader, on the other hand, in solving the upper level problem

Minimize

    J_2 = \Phi_2 + \frac{t_f - t_0}{2}\sum_{i=0}^{N} w_i N_i    (3–8)

subject to

    \frac{t_f - t_0}{2}\,f_i - \dot{x}_i = 0, \quad i = 1, \ldots, N,

takes into consideration the solution of Eq. (3–7). By substituting the u's with the follower's reaction from the lower level problem, Eq. (3–8) becomes a well-posed parameter optimization problem for the leader.
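The Gauss-Lobatto quadrature used in the cost functionals above can be checked on a small example. For N = 2 (three points) the rule is exact for polynomials up to degree 2N - 1 = 3; the integrand here is a hypothetical test function, not from the game:

```python
import numpy as np

# Gauss-Lobatto rule on [-1, 1] with N = 2 (three points):
tau = np.array([-1.0, 0.0, 1.0])
w = np.array([1.0, 4.0, 1.0]) / 3.0

f = lambda t: t**3 + 2.0 * t**2 + 1.0   # exact integral over [-1, 1]: 0 + 4/3 + 2 = 10/3
approx = np.dot(w, f(tau))
# To integrate over [t0, tf], scale the result by (tf - t0)/2 as in Eqs. (3-7)-(3-8).
```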

3.3 Costate Mapping of Stackelberg Differential Games

Direct transcription is applied to an example problem for which the analytical solution exists, in order to assess the validity of the approach for two-person Stackelberg differential games. The comparison between the costates of a Stackelberg differential game problem and the costates (or the KKT multipliers) of the transcribed bilevel programming problem is given in Appendix C.

Example. Consider the nonzero-sum pursuit-evasion game presented by Simaan in [51]:

    \dot{x} = u_1 - u_2, \qquad x(0) = x_0
    J_1 = \frac{1}{2}x_f^2 + \frac{1}{2c_p}\int_0^1 u_1^2\,dt    (3–9)
    J_2 = -\frac{1}{2}x_f^2 + \frac{1}{2c_e}\int_0^1 u_2^2\,dt

where x, u_1, u_2 \in \mathbb{R} with u_2 as the leader, and c_p > 0, c_e > 0 are known constants (c_p c_e \neq 1). This problem is chosen because an analytical solution exists, against which the solution obtained with the direct method can be compared. The analytical solution to Eq. (3–9) provided in [51] is

    u_1 = -\frac{c_p}{c_p - \sigma c_e + 1}\,x_0, \qquad u_2 = -\frac{\sigma c_e}{c_p - \sigma c_e + 1}\,x_0

and

    x(t_0) = x_0, \qquad x_f = \frac{1}{c_p - \sigma c_e + 1}\,x_0

where

    \sigma = \frac{1}{1 + c_p}
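A quick numerical sanity check of this closed-form solution (with illustrative values of c_p, c_e, x_0, not taken from [51]) confirms that the constant controls propagate x_0 to the stated x_f over t \in [0, 1]:

```python
import numpy as np

cp, ce, x0 = 2.0, 1.0, 1.0           # example constants (cp*ce != 1)
sigma = 1.0 / (1.0 + cp)
den = cp - sigma * ce + 1.0
u1 = -cp * x0 / den                  # follower (pursuer) control
u2 = -sigma * ce * x0 / den          # leader (evader) control
xf = x0 / den

# The controls are constant, so x(t) = x0 + (u1 - u2) t; at tf = 1 this is xf:
assert np.isclose(x0 + (u1 - u2) * 1.0, xf)
```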

Now the problem is solved via the Legendre pseudospectral method (LPM). For N = 2, the scaled time domain is

    \tau_0 = -1, \qquad \tau_1 = 0, \qquad \tau_2 = 1

with the corresponding Lobatto weights

    w_0 = \frac{1}{3}, \qquad w_1 = \frac{4}{3}, \qquad w_2 = \frac{1}{3}

The state is approximated with a Lagrange polynomial of order 2:

    x(\tau) = \tau^2\left(\frac{1}{2}x_0 - x_1 + \frac{1}{2}x_2\right) + \tau\left(-\frac{1}{2}x_0 + \frac{1}{2}x_2\right) + x_1

The derivative \dot{x}(\tau) is then

    \frac{dx}{d\tau} = 2\tau\left(\frac{1}{2}x_0 - x_1 + \frac{1}{2}x_2\right) - \frac{1}{2}x_0 + \frac{1}{2}x_2

Evaluating at each discretization point,

    \dot{x}_0 = -\frac{3}{2}x_0 + 2x_1 - \frac{1}{2}x_2
    \dot{x}_1 = -\frac{1}{2}x_0 + \frac{1}{2}x_2
    \dot{x}_2 = \frac{1}{2}x_0 - 2x_1 + \frac{3}{2}x_2

These derivatives are expressed in matrix form as \dot{X} = DX, where

    X = \begin{bmatrix} x_0 \\ x_1 \\ x_2 \end{bmatrix}, \qquad
    \dot{X} = \begin{bmatrix} \dot{x}_0 \\ \dot{x}_1 \\ \dot{x}_2 \end{bmatrix}, \qquad
    D = \begin{bmatrix} -\frac{3}{2} & 2 & -\frac{1}{2} \\ -\frac{1}{2} & 0 & \frac{1}{2} \\ \frac{1}{2} & -2 & \frac{3}{2} \end{bmatrix}

The dynamics are discretized at the three collocation points

    \dot{x}_i = \frac{t_f - t_0}{2}\,f_i, \quad i = 0, 1, 2

or

    DX = \frac{1}{2}F

which results in three equality constraints

    h_0 = \frac{1}{2}f_0 - \dot{x}_0, \qquad h_1 = \frac{1}{2}f_1 - \dot{x}_1, \qquad h_2 = \frac{1}{2}f_2 - \dot{x}_2
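The differentiation matrix D is simply the derivative of the Lagrange basis polynomials evaluated at the collocation points, and can be generated for any node set; a minimal sketch:

```python
import numpy as np

# Lagrange differentiation matrix at the N = 2 LGL points {-1, 0, 1}.
# D[i, j] is the derivative of the j-th Lagrange basis polynomial at tau_i,
# so X_dot = D @ X approximates dx/dtau at every collocation point.
tau = np.array([-1.0, 0.0, 1.0])
n = len(tau)
D = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        if i != j:
            D[i, j] = np.prod([tau[i] - tau[k] for k in range(n) if k not in (i, j)]) \
                      / np.prod([tau[j] - tau[k] for k in range(n) if k != j])
for j in range(n):
    D[j, j] = -sum(D[j, k] for k in range(n) if k != j)   # rows sum to zero
# D reproduces [[-3/2, 2, -1/2], [-1/2, 0, 1/2], [1/2, -2, 3/2]] from the text
```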

The cost functionals are approximated with the quadrature rule:

    J_1 = \Phi_1 + \frac{t_f - t_0}{2}\sum_{i=0}^{N} w_i M_i
        = \frac{1}{2}x_f^2 + \frac{1}{2}\cdot\frac{1}{2c_p}\left(\frac{1}{3}u_0^2 + \frac{4}{3}u_1^2 + \frac{1}{3}u_2^2\right)
        = \frac{1}{2}x_f^2 + \frac{1}{12c_p}u_0^2 + \frac{1}{3c_p}u_1^2 + \frac{1}{12c_p}u_2^2

    J_2 = -\frac{1}{2}x_f^2 + \frac{1}{12c_e}v_0^2 + \frac{1}{3c_e}v_1^2 + \frac{1}{12c_e}v_2^2

The resultant bilevel programming problem is as follows.

Lower level problem:

    \min_{x,\,u} J_1 = \frac{1}{2}x_2^2 + \frac{1}{12c_p}u_0^2 + \frac{1}{3c_p}u_1^2 + \frac{1}{12c_p}u_2^2

subject to

    -\frac{3}{2}x_0 + 2x_1 - \frac{1}{2}x_2 = \frac{1}{2}u_0 - \frac{1}{2}v_0    (3–10)
    -\frac{1}{2}x_0 + \frac{1}{2}x_2 = \frac{1}{2}u_1 - \frac{1}{2}v_1
    \frac{1}{2}x_0 - 2x_1 + \frac{3}{2}x_2 = \frac{1}{2}u_2 - \frac{1}{2}v_2

Upper level problem:

    \min_{x,\,v} J_2 = -\frac{1}{2}x_2^2 + \frac{1}{12c_e}v_0^2 + \frac{1}{3c_e}v_1^2 + \frac{1}{12c_e}v_2^2

subject to

    -\frac{3}{2}x_0 + 2x_1 - \frac{1}{2}x_2 = \frac{1}{2}u_0 - \frac{1}{2}v_0    (3–11)
    -\frac{1}{2}x_0 + \frac{1}{2}x_2 = \frac{1}{2}u_1 - \frac{1}{2}v_1
    \frac{1}{2}x_0 - 2x_1 + \frac{3}{2}x_2 = \frac{1}{2}u_2 - \frac{1}{2}v_2

Now use the algorithm in [68] to solve the bilevel programming problem defined by Eqs. (3–10)-(3–11).
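The transcribed problem can also be solved numerically as an optimistic bilevel program, which gives an independent check. The sketch below (illustrative; it is not the algorithm of [68]) exploits the fact that the lower level problem is linear-quadratic, so the follower's KKT conditions — the constraints (3–10) together with the stationarity of the follower's augmented Lagrangian — form a linear system for a given leader control v; the leader's cost is then minimized over v:

```python
import numpy as np
from scipy.optimize import minimize

cp, ce, x0 = 2.0, 1.0, 1.0   # illustrative constants (cp*ce != 1)
sigma = 1.0 / (1.0 + cp)

def follower_response(v):
    """Solve the follower's KKT system for z = [x1, x2, u0, u1, u2, l0, l1, l2]."""
    v0, v1, v2 = v
    A = np.array([
        [ 2.0, -0.5, -0.5,      0.0,      0.0,      0.0,  0.0,  0.0],  # collocation at tau0
        [ 0.0,  0.5,  0.0,     -0.5,      0.0,      0.0,  0.0,  0.0],  # collocation at tau1
        [-2.0,  1.5,  0.0,      0.0,     -0.5,      0.0,  0.0,  0.0],  # collocation at tau2
        [ 0.0,  0.0,  0.0,      0.0,      0.0,     -2/3,  0.0,  2/3],  # dJa1/dx1 = 0
        [ 0.0,  1.0,  0.0,      0.0,      0.0,      1/6, -2/3, -0.5],  # dJa1/dx2 = 0
        [ 0.0,  0.0,  1/(6*cp), 0.0,      0.0,      1/6,  0.0,  0.0],  # dJa1/du0 = 0
        [ 0.0,  0.0,  0.0,      2/(3*cp), 0.0,      0.0,  2/3,  0.0],  # dJa1/du1 = 0
        [ 0.0,  0.0,  0.0,      0.0,      1/(6*cp), 0.0,  0.0,  1/6],  # dJa1/du2 = 0
    ])
    b = np.array([1.5*x0 - 0.5*v0, 0.5*x0 - 0.5*v1, -0.5*x0 - 0.5*v2,
                  0.0, 0.0, 0.0, 0.0, 0.0])
    return np.linalg.solve(A, b)

def leader_cost(v):
    x2 = follower_response(v)[1]
    return -0.5*x2**2 + (v[0]**2 + v[2]**2)/(12*ce) + v[1]**2/(3*ce)

v_opt = minimize(leader_cost, np.zeros(3), method="BFGS").x
x2 = follower_response(v_opt)[1]
# x2 agrees with the analytic xf = x0/(cp - sigma*ce + 1) quoted earlier
```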

Let the augmented Lagrangian for J_1 be J_{a1}:

    J_{a1} = J_1 + \nu\phi + \sum_{i=0}^{N} w_i \lambda_i h_i
           = J_1 + \nu\phi + w_0\lambda_0 h_0 + w_1\lambda_1 h_1 + w_2\lambda_2 h_2
           = \frac{1}{2}x_2^2 + \frac{1}{12c_p}u_0^2 + \frac{1}{3c_p}u_1^2 + \frac{1}{12c_p}u_2^2 + \nu(x_0 - y_0)
             + \frac{1}{3}\lambda_0\left(\frac{1}{2}u_0 - \frac{1}{2}v_0 + \frac{3}{2}x_0 - 2x_1 + \frac{1}{2}x_2\right)
             + \frac{4}{3}\lambda_1\left(\frac{1}{2}u_1 - \frac{1}{2}v_1 + \frac{1}{2}x_0 - \frac{1}{2}x_2\right)
             + \frac{1}{3}\lambda_2\left(\frac{1}{2}u_2 - \frac{1}{2}v_2 - \frac{1}{2}x_0 + 2x_1 - \frac{3}{2}x_2\right)

where \phi = x_0 - y_0 enforces the prescribed initial state y_0. The partial derivatives with respect to the states and controls are

    \frac{\partial J_{a1}}{\partial x_1} = -\frac{2}{3}\lambda_0 + \frac{2}{3}\lambda_2
    \frac{\partial J_{a1}}{\partial x_2} = x_2 + \frac{1}{6}\lambda_0 - \frac{2}{3}\lambda_1 - \frac{1}{2}\lambda_2
    \frac{\partial J_{a1}}{\partial u_0} = \frac{1}{6c_p}u_0 + \frac{1}{6}\lambda_0
    \frac{\partial J_{a1}}{\partial u_1} = \frac{2}{3c_p}u_1 + \frac{2}{3}\lambda_1
    \frac{\partial J_{a1}}{\partial u_2} = \frac{1}{6c_p}u_2 + \frac{1}{6}\lambda_2

This gives

    u_0 = -c_p\lambda_0, \qquad u_1 = -c_p\lambda_1, \qquad u_2 = -c_p\lambda_2

then substitute these into the leader's augmented cost function J_{a2}:

    J_{a2} = J_2 + \nu\phi + \sum_{i=0}^{N} w_i\mu_i h_i + \sum_{i=1}^{N} \psi_i\left(-\frac{\partial J_{a1}}{\partial x_i}\right)
           = -\frac{1}{2}x_2^2 + \frac{1}{12c_e}v_0^2 + \frac{1}{3c_e}v_1^2 + \frac{1}{12c_e}v_2^2 + \nu(x_0 - y_0)
             + \frac{1}{3}\mu_0\left(-\frac{1}{2}c_p\lambda_0 - \frac{1}{2}v_0 + \frac{3}{2}x_0 - 2x_1 + \frac{1}{2}x_2\right)
             + \frac{4}{3}\mu_1\left(-\frac{1}{2}c_p\lambda_1 - \frac{1}{2}v_1 + \frac{1}{2}x_0 - \frac{1}{2}x_2\right)
             + \frac{1}{3}\mu_2\left(-\frac{1}{2}c_p\lambda_2 - \frac{1}{2}v_2 - \frac{1}{2}x_0 + 2x_1 - \frac{3}{2}x_2\right)
             + \psi_1\left(\frac{2}{3}\lambda_0 - \frac{2}{3}\lambda_2\right)
             + \psi_2\left(-x_2 - \frac{1}{6}\lambda_0 + \frac{2}{3}\lambda_1 + \frac{1}{2}\lambda_2\right)

The partial derivatives are

    \frac{\partial J_{a2}}{\partial x_1} = -\frac{2}{3}\mu_0 + \frac{2}{3}\mu_2
    \frac{\partial J_{a2}}{\partial x_2} = -x_2 + \frac{1}{6}\mu_0 - \frac{2}{3}\mu_1 - \frac{1}{2}\mu_2 - \psi_2
    \frac{\partial J_{a2}}{\partial v_0} = \frac{1}{6c_e}v_0 - \frac{1}{6}\mu_0
    \frac{\partial J_{a2}}{\partial v_1} = \frac{2}{3c_e}v_1 - \frac{2}{3}\mu_1
    \frac{\partial J_{a2}}{\partial v_2} = \frac{1}{6c_e}v_2 - \frac{1}{6}\mu_2
    \frac{\partial J_{a2}}{\partial \lambda_0} = -\frac{1}{6}c_p\mu_0 + \frac{2}{3}\psi_1 - \frac{1}{6}\psi_2
    \frac{\partial J_{a2}}{\partial \lambda_1} = -\frac{2}{3}c_p\mu_1 + \frac{2}{3}\psi_2
    \frac{\partial J_{a2}}{\partial \lambda_2} = -\frac{1}{6}c_p\mu_2 - \frac{2}{3}\psi_1 + \frac{1}{2}\psi_2

Solving for x, v, \lambda, \mu, and \psi yields

    x_0 = x_0, \qquad x_1 = \frac{1}{2}(x_0 + x_2), \qquad x_2 = \frac{1}{c_p - \sigma c_e + 1}\,x_0
    \lambda_0 = \lambda_1 = \lambda_2 = x_2
    v_0 = v_1 = v_2 = -\sigma c_e x_2
    \mu_0 = \mu_1 = \mu_2 = -\sigma x_2
    \psi_1 = -\frac{1}{2}\sigma c_p x_2, \qquad \psi_2 = -\sigma c_p x_2

where x_0, x_2, u_i, and v_i correspond to x(t_0), x(t_f), u_1(t), and u_2(t), respectively. This shows that the same values of x, u_1, and u_2 are obtained from the transcribed bilevel programming problem. Now examine whether the solution to the bilevel problem is truly an optimal solution to the original Stackelberg differential game problem. In this particular example it is, since the optimal solution is given analytically in [51] and is identical to the solution of the bilevel programming problem. The following therefore serves to confirm the validity of the costate mapping. If the transcription maintains the structure of the problem, the \lambda_i in the bilevel programming problem correspond to \lambda(t) in the differential game problem:

    \lambda_0 \to \lambda(\tau_0), \qquad \lambda_1 \to \lambda(\tau_1), \qquad \lambda_2 \to \lambda(\tau_2)

First compare the values of \lambda. The Lagrange multipliers are

    \lambda_0 = \lambda_1 = \lambda_2 = x_2

and the costate of the original differential game evaluated at each collocation point is

    \lambda(\tau_0) = \lambda(\tau_1) = \lambda(\tau_2) = x_f

Since it was shown that x_2 = x_f, the multipliers of the bilevel programming problem have exactly the values of the costate at every collocation point. Now look at the optimality conditions.

At \tau = \tau_0: the derivative of the Lagrangian with respect to x_0 is

    \frac{\partial J_{a1}}{\partial x_0} = \nu + \frac{1}{2}\lambda_0 + \frac{2}{3}\lambda_1 - \frac{1}{6}\lambda_2 = 0

Rewriting,

    \nu + \frac{1}{3}\left(\frac{3}{2}\lambda_0 + 2\lambda_1 - \frac{1}{2}\lambda_2\right) = 0
    \nu + \frac{1}{3}\underbrace{\left(-\frac{3}{2}\lambda_0 + 2\lambda_1 - \frac{1}{2}\lambda_2\right)}_{\dot{\lambda}_0} + \lambda_0 = 0

so that

    w_0\dot{\lambda}_0 = -\lambda_0 - \nu    (3–12)

Note that \nu on the right hand side of Eq. (3–12) is in fact \frac{\partial\phi}{\partial x(\tau_0)}\,\nu. In the original Stackelberg problem,

    \dot{\lambda}(\tau_0) = 0, \qquad \lambda(\tau_0) = -\frac{\partial \phi(x(\tau_0), \tau_0)}{\partial x(\tau_0)}\,\nu

thus the left hand side of Eq. (3–12) is the costate dynamics at \tau_0, and the right hand side is the initial transversality condition.

At \tau = \tau_1: the derivative of the Lagrangian with respect to x_1 is

    \frac{\partial J_{a1}}{\partial x_1} = -\frac{2}{3}\lambda_0 + \frac{2}{3}\lambda_2 = 0
    \frac{4}{3}\underbrace{\left(-\frac{1}{2}\lambda_0 + \frac{1}{2}\lambda_2\right)}_{\dot{\lambda}_1} = 0

so that

    w_1\dot{\lambda}_1 = 0    (3–13)

Since \dot{\lambda}(t) = 0 in the original problem, Eq. (3–13) is indeed the costate dynamics at \tau = \tau_1.

At \tau = \tau_2: the derivative of the Lagrangian with respect to x_2 is

    \frac{\partial J_{a1}}{\partial x_2} = x_2 + \frac{1}{6}\lambda_0 - \frac{2}{3}\lambda_1 - \frac{1}{2}\lambda_2 = 0

Rewriting,

    x_2 + \frac{1}{3}\left(\frac{1}{2}\lambda_0 - 2\lambda_1 - \frac{3}{2}\lambda_2\right) = 0
    x_2 + \frac{1}{3}\underbrace{\left(\frac{1}{2}\lambda_0 - 2\lambda_1 + \frac{3}{2}\lambda_2\right)}_{\dot{\lambda}_2} - \lambda_2 = 0

so that

    w_2\dot{\lambda}_2 = -x_2 + \lambda_2    (3–14)

In the original problem,

    \dot{\lambda}(t_f) = 0, \qquad \lambda(t_f) = x_f

Therefore the left hand side of Eq. (3–14) matches the costate dynamics at \tau = \tau_2, and the right hand side matches the final transversality condition. That is, the transcribed problem maps \lambda_1(t) exactly at every collocation point.

Now look at the costates corresponding to the upper level problem. \mu_0, \mu_1, and \mu_2 of the bilevel programming problem are compared with \mu(t) of the differential game problem, and similarly \psi_1 and \psi_2 with \psi(t):

    \mu_0 = \mu_1 = \mu_2 = -\sigma x_2
    \mu(\tau_0) = \mu(\tau_1) = \mu(\tau_2) = -\sigma x_f

    \psi_1 = -\frac{1}{2}\sigma c_p x_2, \qquad \psi_2 = -\sigma c_p x_2
    \psi(\tau_0) = 0, \qquad \psi(\tau_1) = -\frac{1}{2}\sigma c_p x_f, \qquad \psi(\tau_2) = -\sigma c_p x_f

At \tau = \tau_1:

    \frac{\partial J_{a2}}{\partial x_1} = -\frac{2}{3}\mu_0 + \frac{2}{3}\mu_2 = \frac{4}{3}\underbrace{\left(-\frac{1}{2}\mu_0 + \frac{1}{2}\mu_2\right)}_{\dot{\mu}_1} = 0

so that w_1\dot{\mu}_1 = 0.

At \tau = \tau_2:

    \frac{\partial J_{a2}}{\partial x_2} = -x_2 + \frac{1}{6}\mu_0 - \frac{2}{3}\mu_1 - \frac{1}{2}\mu_2 - \psi_2 = 0
    -x_2 + \frac{1}{3}\left(\frac{1}{2}\mu_0 - 2\mu_1 - \frac{3}{2}\mu_2\right) - \psi_2 = 0
    -x_2 + \frac{1}{3}\underbrace{\left(\frac{1}{2}\mu_0 - 2\mu_1 + \frac{3}{2}\mu_2\right)}_{\dot{\mu}_2} - \mu_2 - \psi_2 = 0

so that

    w_2\dot{\mu}_2 = x_2 + \mu_2 + \psi_2

which matches the costate dynamics and the final transversality condition

    \dot{\mu} = 0, \qquad \mu(\tau_f) = -x_f - \psi(\tau_f)

Thus the Lagrange multipliers in the upper level problem match the costates of the leader's solution in the differential game exactly at every collocation point. Therefore it is confirmed that the LGL collocation method successfully converted the Stackelberg differential game problem to a static one whose solution is indeed optimal at each discretized point of the original problem.

3.4 Conclusion

Two numerical approaches to solving two-player Stackelberg differential games were presented. Converting the differential game to an optimal control problem requires the follower's optimal strategy to be analytically defined, but for a relatively simple problem well-established optimal control solvers can then be used to obtain the solution. Converting to a static game problem retains more characteristics of the original differential game problem, but due to the lack of a bilevel programming solver comparable to those for optimal control, solvability of the transcribed problem is not guaranteed. In both cases, the remedy lies in the advancement of parameter optimization techniques, which is beyond the scope of this dissertation. Therefore, instead of pursuing numerical methods for complex nonlinear differential games, the following chapters investigate simplified problems to focus on Stackelberg-based non-cooperative characteristics in spacecraft post-docking and the corresponding control actions necessary for maintaining the docking.

CHAPTER 4
DYNAMICS OF DOCKED SPACECRAFT

4.1 Formulation of Dynamics

The dynamics of two docked satellites is derived in this section. This is later used as a set of dynamic constraints in the Stackelberg differential game.

4.1.1 Relative Motion Dynamics of a Satellite

Consider a satellite near a circular orbit, as shown in Fig. 4-1. Let the circular orbit be the "nominal" orbit, to which a reference frame F_N (the nominal frame) is attached. The origin of the nominal frame F_N relative to the center of the Earth, which is the origin of the inertial reference frame F_I, is R_O, and the position of the satellite relative to F_N is r_i.¹ Suppose that F_N and the satellite have angular velocities \omega^O and \omega^i, respectively.

Figure 4-1. A representation of the position of a satellite with the inertial and the nominal reference frames.

1 The subscript i implies the ith body although only one body is considered. Later two bodies are considered within the same framework.

4.1.1.1 Translation

Let the position of the ith satellite relative to the center of the Earth (F_I) be denoted as

Ri = RO + ri (4–1)

with the corresponding velocity and acceleration

    \dot{R}_i = \dot{R}_O + \dot{r}_i = \dot{R}_O + \mathring{r}_i + \omega^O \times r_i    (4–2)

where (\mathring{\ }) denotes the time derivative taken in the nominal frame F_N, and

    \ddot{R}_i = \ddot{R}_O + \ddot{r}_i = \ddot{R}_O + \mathring{\mathring{r}}_i + 2\,\omega^O \times \mathring{r}_i + \omega^O \times (\omega^O \times r_i)    (4–3)

Then the translational dynamics of the satellite relative to F_N can be written as

    \mathring{\mathring{r}}_i = \ddot{R}_i - \ddot{R}_O - 2\,\omega^O \times \mathring{r}_i - \omega^O \times (\omega^O \times r_i)    (4–4)

4.1.1.2 Rotation

For translation the satellite is treated as a particle whose position is the center of mass of the body. For rotation, consider the satellite as a rigid body, as shown in Fig. 4-2. The satellite has a reference frame fixed to its center of mass, aligned with the principal axes.

Figure 4-2. A satellite with a body-fixed reference frame F_i.

Let the moment of inertia matrix of the satellite be J_i; then the rotational motion of the satellite is

    J_i \cdot \dot{\omega}^i + \omega^i \times J_i \cdot \omega^i = \tau_i    (4–5)
    J_i\,\dot{\omega}^i + (\omega^i)^{\times} J_i\,\omega^i = \tau_i    (4–6)

where the superscript \times denotes the matrix operator equivalent to the vector cross product. Introducing the angular velocity relative to F_N,

ωi = ωO + ωi/O (4–7)

then since ωO is constant,

    \dot{\omega}^i = \dot{\omega}^{i/O}    (4–8)

From Eqs. (4–6)-(4–8) the rotational dynamics of the satellite relative to F_N can be written as

    \dot{\omega}^{i/O} = -J_i^{-1}\left[\left(\omega^{i/O} + \omega^O\right)^{\times} J_i\left(\omega^{i/O} + \omega^O\right)\right] + J_i^{-1}\tau_i    (4–9)

4.1.2 Dynamics of Two Docked Satellites

Now consider two satellites subject to the translational and rotational motions defined previously. Figure 4-3 shows the two satellites docked together near the nominal orbit. From Eqs. (4–4) and (4–9), the translational and rotational motions can be expressed as

Figure 4-3. An exaggerated view of the service vehicle (SV) and the resident space object (RSO) docked near the nominal orbit.

    \mathring{\mathring{r}}_{SV} = \ddot{R}_{SV} - \ddot{R}_O - 2\,\omega^O \times \mathring{r}_{SV} - \omega^O \times (\omega^O \times r_{SV}) + \frac{1}{m_{SV}}F_{SV}    (4–10)
    \mathring{\mathring{r}}_{RSO} = \ddot{R}_{RSO} - \ddot{R}_O - 2\,\omega^O \times \mathring{r}_{RSO} - \omega^O \times (\omega^O \times r_{RSO}) + \frac{1}{m_{RSO}}F_{RSO}    (4–11)
    \dot{\omega}^{SV/O} = -J_{SV}^{-1}\left[\left(\omega^{SV/O} + \omega^O\right)^{\times} J_{SV}\left(\omega^{SV/O} + \omega^O\right)\right] + J_{SV}^{-1} M_{SV}    (4–12)
    \dot{\omega}^{RSO/O} = -J_{RSO}^{-1}\left[\left(\omega^{RSO/O} + \omega^O\right)^{\times} J_{RSO}\left(\omega^{RSO/O} + \omega^O\right)\right] + J_{RSO}^{-1} M_{RSO}    (4–13)

where F_SV and F_RSO are the forces applied to the docking point of each body due to contact, and M_SV and M_RSO respectively represent the corresponding moments, which are due in part to the components of F_SV and F_RSO that are not radial to the centers of mass of the SV and the RSO, respectively. The interactions between the SV and the RSO are modeled with a spring and a damper connecting the docking points P and Q on the SV and the RSO. The spring and damping forces applied to P on the SV are

    F_{SV} = F_{SV,k} + F_{SV,c}
    F_{SV,k} = -k_L\left(\left\|R_P - R_Q\right\| - l_0\right)\frac{R_P - R_Q}{\left\|R_P - R_Q\right\|}
    F_{SV,c} = -c_L\left(\dot{R}_P - \dot{R}_Q\right)

where k_L and c_L are the linear spring and the linear damping coefficients, respectively.
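A direct implementation of this interaction force (a sketch; `docking_force` is a hypothetical helper, with defaults taken from the parameter values used later in Table 4-1):

```python
import numpy as np

def docking_force(RP, RQ, RP_dot, RQ_dot, kL=10.0, cL=20.0, l0=0.5):
    """Linear spring-damper force applied to the SV's docking point P."""
    d = RP - RQ
    dist = np.linalg.norm(d)
    F_spring = -kL * (dist - l0) * d / dist    # restoring force along the P-Q line
    F_damper = -cL * (RP_dot - RQ_dot)         # opposes the relative velocity
    return F_spring + F_damper

# At the unstretched length with no relative velocity the force vanishes:
F = docking_force(np.array([0.0, 0.25, 0.0]), np.array([0.0, -0.25, 0.0]),
                  np.zeros(3), np.zeros(3))
```

The force on the RSO's docking point Q is the negative of this, as noted below for the nominal-frame coordinatization.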

The tangential component of F_SV results in a moment applied to the SV:

    M_{SV,F} = \rho_{SV} \times F_{SV}

where the subscript 'F' indicates that it comes from the interaction force. The torque applied at P on the SV can be modeled as

MSV = MSV,k + MSV,c + MSV,F,

where MSV,k and MSV,c are from the torsional spring and the torsional damper, respectively, as

    M_{SV,k} = -k_T\,\theta\,\hat{a}, \qquad M_{SV,c} = -c_T\,\omega^{RSO/SV},

where \theta and \hat{a} respectively denote the angle and the axis of rotation expressing the RSO's attitude relative to the SV, and k_T and c_T are the torsional spring and torsional damping constants.

Coordinatization. It is customary to express translational motion in the coordinate frame of the nominal orbit, while rotational motion is better described in the coordinate frame of the body itself. Thus the translational motions are coordinatized in the nominal frame, and the rotational motions in each body frame. The translational and rotational motions of the SV and the RSO are then written as

    {}^{N}\mathring{\mathring{r}}_{SV} = {}^{N}\ddot{R}_{SV} - {}^{N}\ddot{R}_O - 2\,{}^{N}\omega^{O\times}\,{}^{N}\mathring{r}_{SV} - {}^{N}\omega^{O\times}\,{}^{N}\omega^{O\times}\,{}^{N}r_{SV} + \frac{1}{m_{SV}}{}^{N}F_{SV}
    {}^{N}\mathring{\mathring{r}}_{RSO} = {}^{N}\ddot{R}_{RSO} - {}^{N}\ddot{R}_O - 2\,{}^{N}\omega^{O\times}\,{}^{N}\mathring{r}_{RSO} - {}^{N}\omega^{O\times}\,{}^{N}\omega^{O\times}\,{}^{N}r_{RSO} + \frac{1}{m_{RSO}}{}^{N}F_{RSO}
    {}^{SV}\dot{\omega}^{SV/O} = -J_{SV}^{-1}\left[\left({}^{SV}\omega^{SV/O} + {}^{SV}\omega^{O}\right)^{\times} J_{SV}\left({}^{SV}\omega^{SV/O} + {}^{SV}\omega^{O}\right)\right] + J_{SV}^{-1}\,{}^{SV}M_{SV}
    {}^{RSO}\dot{\omega}^{RSO/O} = -J_{RSO}^{-1}\left[\left({}^{RSO}\omega^{RSO/O} + {}^{RSO}\omega^{O}\right)^{\times} J_{RSO}\left({}^{RSO}\omega^{RSO/O} + {}^{RSO}\omega^{O}\right)\right] + J_{RSO}^{-1}\,{}^{RSO}M_{RSO}    (4–14)

Also, the interaction forces applied at the docking points are equal and opposite; thus, in the nominal reference frame,

    {}^{N}F_{RSO} = -{}^{N}F_{SV}

This is not true for the interaction torques,

    {}^{N}M_{RSO} \neq -{}^{N}M_{SV}

due to M_{SV,F}, since the moment arms \rho_{SV} and \rho_{RSO} differ regardless of the coordinatization.

4.2 Simulation

The dynamic model of the docked satellites defined by Eq. (4–14) is now analyzed. Three different cases are considered to validate the stability of the post-docked state without external perturbation (i.e., without a non-cooperative disturbance from the RSO, the interactions never grow enough to damage the docking interface).

4.2.1 Case I: Nonzero Linear Velocity

Suppose the two satellites are on the same orbit, with the SV moving ahead of the RSO, as shown in Fig. 4-4. Figure 4-5 shows that they are oriented such that the docking points and the center of mass of each body lie on the same line. Although this makes the initial orientations of the SV and the RSO differ from the nominal orbit, the difference is assumed negligible because the separation is small relative to the orbital altitude. The initial docking separation is exactly the unstretched natural length of the spring.

Figure 4-4. Two satellites initially on the same nominal orbit.

First, a case with a non-zero initial linear relative velocity between the SV and the RSO is considered. The simulation parameters are listed in Table 4-1. The results are plotted in Figs. 4-6 through 4-9. As shown in Figs. 4-6 and 4-7, the force and the torque applied to the spacecraft quickly decay to zero. Figure 4-8 shows that the separation between the SV and the RSO converges to 0.5 m, which is the unstretched natural spring length. The SV and the RSO maintained their relative position throughout the simulation without inducing rotational motion, as shown in Fig. 4-9.
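The qualitative behavior in Figs. 4-6 through 4-9 can be reproduced with a simplified one-dimensional analogue of Case I, in which the orbital coupling terms are dropped and only the spring-damper interaction is kept (an illustrative sketch, not the full model of Eq. (4–14)):

```python
import numpy as np
from scipy.integrate import solve_ivp

# 1-D analogue of Case I: two docked masses joined by the spring-damper,
# RSO given an initial 0.2 m/s velocity toward the SV (Table 4-1 values).
m_sv, m_rso, kL, cL, l0 = 100.0, 120.0, 10.0, 20.0, 0.5

def rhs(t, y):
    x_sv, v_sv, x_rso, v_rso = y
    d = x_sv - x_rso
    F = -kL * (d - l0) - cL * (v_sv - v_rso)   # force on the SV; RSO gets -F
    return [v_sv, F / m_sv, v_rso, -F / m_rso]

sol = solve_ivp(rhs, (0.0, 100.0), [0.5, 0.0, 0.0, 0.2], rtol=1e-8)
sep = sol.y[0, -1] - sol.y[2, -1]
# After 100 s the separation has relaxed back to the natural length l0 = 0.5 m
# and the relative velocity has decayed to zero, as in Fig. 4-8.
```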

Figure 4-5. Two satellites initially radially aligned.

Figure 4-6. Case I: the interaction forces applied to the SV and the RSO.

Table 4-1. The simulation parameters for Case I
Name          Description                                                       Value              Unit
t             Time of the simulation                                            100                s
m_SV          Mass of the service vehicle                                       100                kg
m_RSO         Mass of the target vehicle                                        120                kg
I_SV          Moment of inertia of the service vehicle                          diag(10, 10, 12)   kg·m²
I_RSO         Moment of inertia of the target vehicle                           diag(12, 12, 15)   kg·m²
R_O           The altitude of the nominal orbit                                 7,031              km
k_L           The spring constant                                               10                 N/m
c_L           The damping constant                                              20                 N·s/m
k_T           The torsional spring constant                                     10                 N·m/rad
c_T           The torsional damping constant                                    20                 N·m·s/rad
l_0           The unstretched spring length                                     0.5                m
\mu           The gravitational parameter                                       398,600            km³/s²
\rho_SV       The docking position of the SV relative to its center of mass     [0 −0.25 0]^T      m
\rho_RSO      The docking position of the RSO relative to its center of mass    [0 0.25 0]^T       m
r_SV(0)       The initial separation of the SV from the nominal                 [0 0.5 0]^T        m
r_RSO(0)      The initial separation of the RSO from the nominal                [0 −0.5 0]^T       m
\mathring{r}_SV(0)   The initial local velocity of the SV                       [0 0 0]^T          m/s
\mathring{r}_RSO(0)  The initial local velocity of the RSO                      [0 0.2 0]^T        m/s
\omega_SV(0)  The initial angular velocity of the SV relative to the nominal    [0 0 0]^T          rad/s
\omega_RSO(0) The initial angular velocity of the RSO relative to the nominal   [0 0 0]^T          rad/s
\bar{q}_SV(0)  The initial quaternion of the SV relative to the nominal         [0 0 0 1]^T        —
\bar{q}_RSO(0) The initial quaternion of the RSO relative to the nominal        [0 0 0 1]^T        —

Figure 4-7. Case I: the interaction torques applied to the SV and the RSO

Figure 4-8. Case I: the linear motion of the RSO relative to the SV

Figure 4-9. Case I: the rotational motion of the RSO relative to the SV

4.2.2 Case II: Nonzero Rotational Velocity

The next case considers a nonzero initial angular velocity between the SV and the RSO, while the initial linear velocity is set to zero. Compared to Case I, the rotational perturbation results in larger relative rotational motion ωRSO/SV, as shown in Fig. 4-13, and thus a higher interaction torque is observed in Fig. 4-11. It is noted that the interaction force is also significant even without an initial linear velocity (Fig. 4-10), due to the separation caused by the misalignment of the SV and the RSO. Overall, the relative motion between the SV and the RSO decays to the equilibrium.

Figure 4-10. Case II: the interaction forces applied to the SV and the RSO

Table 4-2. The simulation parameters for Case II
Name       Description                                                     Value               Unit
t          Time of the simulation                                          100                 s
mSV        Mass of the service vehicle                                     100                 kg
mRSO       Mass of the target vehicle                                      120                 kg
ISV        Moment of inertia of the service vehicle                        diag[10 10 12]^T    kg·m²
IRSO       Moment of inertia of the target vehicle                         diag[12 12 15]^T    kg·m²
RO         The altitude of the nominal orbit                               7,031               km
k          The spring constant                                             10                  N/m
c          The damping constant                                            20                  N·s/m
kT         The torsional spring constant                                   10                  Nm/rad
cT         The torsional damping constant                                  20                  Nm·s/rad
l0         The unstretched spring length                                   0.5                 m
µ          The gravitational parameter                                     398,600             km³/s²
ρSV        The docking position of the SV relative to its center of mass   [0 −0.25 0]^T       m
ρRSO       The docking position of the RSO relative to its center of mass  [0 0.25 0]^T        m
rSV(0)     The initial separation from the nominal                         [0 0.5 0]^T         m
rRSO(0)    The initial separation from the nominal                         [0 −0.5 0]^T        m
r̊SV(0)     The initial local velocity of the SV                            [0 0 0]^T           m/s
r̊RSO(0)    The initial local velocity of the RSO                           [0 0 0]^T           m/s
ωSV(0)     The initial angular velocity of the SV relative to the nominal  [0 0 0]^T           rad/s
ωRSO(0)    The initial angular velocity of the RSO relative to the nominal [0 0 0.05]^T        rad/s
q̄SV(0)     The initial quaternion of the SV relative to the nominal        [0 0 0 1]^T         –
q̄RSO(0)    The initial quaternion of the RSO relative to the nominal       [0 0 0 1]^T         –

Figure 4-11. Case II: the interaction torques applied to the SV and the RSO

Figure 4-12. Case II: the linear motion of the RSO relative to the SV

Figure 4-13. Case II: the rotational motion of the RSO relative to the SV

4.2.3 Case III: Nonzero Linear and Rotational Velocities

Now the simulation is run with nonzero initial linear and rotational velocities. The simulation parameters are chosen as in Table 4-3, and the results are shown in Figs. 4-14 to 4-17. Figures 4-14 and 4-15 show the highest interaction forces and torques, respectively, due to the combination of the initial linear and rotational perturbations. Interestingly, however, both the interactions and the relative linear motion (Fig. 4-16) appear to converge to lower bounds than in Case II (Figs. 4-10 to 4-12). This can be interpreted as the portion of the linear relative motion that is independent of the rotational motion contributing to the stabilization, whereas in Case II the whole motion was coupled, making it harder to stabilize the motion to smaller radii of convergence. However, this result does not suggest that Case III is better than Case II, as the primary focus should be on the upper bounds of the interactions, which are certainly higher in Case III than in the other two cases.

Figure 4-14. Case III: the interaction forces applied to the SV and the RSO

Table 4-3. The simulation parameters for Case III
Name       Description                                                     Value               Unit
t          Time of the simulation                                          100                 s
mSV        Mass of the service vehicle                                     100                 kg
mRSO       Mass of the target vehicle                                      120                 kg
ISV        Moment of inertia of the service vehicle                        diag[10 10 12]^T    kg·m²
IRSO       Moment of inertia of the target vehicle                         diag[12 12 15]^T    kg·m²
RO         The altitude of the nominal orbit                               7,031               km
k          The spring constant                                             10                  N/m
c          The damping constant                                            20                  N·s/m
kT         The torsional spring constant                                   10                  Nm/rad
cT         The torsional damping constant                                  20                  Nm·s/rad
l0         The unstretched spring length                                   0.5                 m
µ          The gravitational parameter                                     398,600             km³/s²
ρSV        The docking position of the SV relative to its center of mass   [0 −0.25 0]^T       m
ρRSO       The docking position of the RSO relative to its center of mass  [0 0.25 0]^T        m
rSV(0)     The initial separation from the nominal                         [0 0.5 0]^T         m
rRSO(0)    The initial separation from the nominal                         [0 −0.5 0]^T        m
r̊SV(0)     The initial local velocity of the SV                            [0 0 0]^T           m/s
r̊RSO(0)    The initial local velocity of the RSO                           [0 0.2 0]^T         m/s
ωSV(0)     The initial angular velocity of the SV relative to the nominal  [0 0 0]^T           rad/s
ωRSO(0)    The initial angular velocity of the RSO relative to the nominal [0 0 0.05]^T        rad/s
q̄SV(0)     The initial quaternion of the SV relative to the nominal        [0 0 0 1]^T         –
q̄RSO(0)    The initial quaternion of the RSO relative to the nominal       [0 0 0 1]^T         –

Figure 4-15. Case III: the interaction torques applied to the SV and the RSO

Figure 4-16. Case III: the linear motion of the RSO relative to the SV

Figure 4-17. Case III: the rotational motion of the RSO relative to the SV

4.3 Conclusion

A dynamic model of two docked satellites was derived with the aim of investigating the interactions between them through differential games. The model developed here is shown to be stable without actuation; the interactions between the docked satellites remain bounded in the absence of noncooperative actuation. It should be noted that the relative motion observed in the simulation is a consequence of the modeling of the problem: the interaction force and torque are not absorbed by the docking mechanism but instead directly affect the motion of the two spacecraft. If the interactions are small enough, docking can be maintained and the SV and the RSO should behave as a single body. The next step is to formulate a two-person differential game based on these dynamics and analyze the problem with the Stackelberg strategy.

CHAPTER 5
LINEAR CONTROLLER DESIGN WITH STACKELBERG STRATEGY
In this chapter the interactions of two docked satellites are studied via two-person differential games. A simplified version of the dynamics obtained in the previous chapter is used to formulate a linear quadratic (LQ) game in which the dynamics are linearized at the equilibrium. As discussed in Chapter 2, LQ differential games have well-defined solvability conditions (e.g., existence and uniqueness [69]), so that the spacecraft post-docking problem can be investigated without suffering computational difficulty.

5.1 Post-Docking Study with Linear Quadratic Game

The subsequent discussion follows [70]. Two satellites, the target or resident space object (RSO) and the service vehicle (SV), are shown in Fig. 5-1. The interaction between the SV and the RSO is modeled by a spring and damper connecting them at points P and Q, such that a change in distance or velocity between P and Q produces interaction forces; these forces are not necessarily radial and therefore produce torques as well. In order to maintain docking, those forces and torques need to be minimized. Each body is controlled by a force input and a torque input. It is further assumed that the thrusts and torques are decoupled.
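As a concrete illustration, the spring-damper coupling described above can be sketched in Python. This is a minimal model for illustration only; the function name, the inertial-frame formulation, and the sign convention are assumptions, not the dissertation's implementation:

```python
import numpy as np

def interaction(r_sv, r_rso, v_sv, v_rso, rho_sv, rho_rso, k, c, l0):
    """Spring-damper force and torque between docking points P and Q.

    r_*, v_* : center-of-mass positions/velocities (same frame)
    rho_*    : docking-port offsets from each center of mass
    k, c, l0 : spring constant, damping constant, natural length
    """
    p = r_sv + rho_sv              # docking point P on the SV
    q = r_rso + rho_rso            # docking point Q on the RSO
    d_vec = q - p
    d = np.linalg.norm(d_vec)
    u_hat = d_vec / d              # line-of-sight unit vector P -> Q
    d_dot = (v_rso - v_sv) @ u_hat # separation rate along the line of sight
    f_mag = k * (d - l0) + c * d_dot
    f_sv = f_mag * u_hat           # force pulling the SV toward Q
    f_rso = -f_sv                  # equal and opposite on the RSO
    m_sv = np.cross(rho_sv, f_sv)  # torque about the SV's center of mass
    m_rso = np.cross(rho_rso, f_rso)
    return f_sv, f_rso, m_sv, m_rso
```

Because the force acts along the line of sight at offset docking points, a lateral misalignment of the ports produces nonzero moment arms, which is how the model generates the non-radial torques mentioned above.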

Figure 5-1. Two rigid bodies on circular orbits.

The translational motion of each body moving in a circular orbit is obtained from the two-body equation given by [71] as

r̈_i + (µ/‖r_i‖³) r_i = (1/m_i)(F_i + u_i),  i = SV, RSO    (5–1)

where r_i denotes the position vector of the center of mass of each body, F_i denotes the external force due to the spring and the damper, and u_i is the control force input. The rotational motion is governed by Euler's equation

J_i ω̇_i + ω_i × J_i ω_i = M_i + τ_i,  i = SV, RSO    (5–2)

where J_i is the inertia matrix, ω_i is the angular velocity, M_i is the moment due to the spring and the damper, and τ_i is the control torque input. The position vector of each body is

r_SV = R_SV0 + η
r_RSO = R_RSO0 + ζ

where the nominal radii of the orbits of the SV and the RSO, coordinatized in their respective frames, are

^SV R_SV0 = [0  0  −R_SV0]^T
^RSO R_RSO0 = [0  0  −R_RSO0]^T

and the linear perturbations are coordinatized in the same manner as

^SV η = [η1  η2  η3]^T
^RSO ζ = [ζ1  ζ2  ζ3]^T

Eqs. (5–1)-(5–2) are linearized to yield the linear dynamic model. With the assumption of small perturbations, and since the two docked satellites are on the same orbit (i.e., R_SV0 ≃ R_RSO0) and small compared to the orbit, the coordinate frames are approximated to coincide. The relative distance and attitude errors are defined as

δr = [δx  δy  δz]^T ≃ ζ − η,
δθ = [δθ1  δθ2  δθ3]^T ≃ β − α.

Then the following linearized dynamic model is obtained:

ẋ = Ax + B1u1 + B2u2

where the state vector x of the system is composed of the relative distance, the attitude error, and their rates:

x = [δr^T  δθ^T  δṙ^T  δθ̇^T]^T

In the following numerical analysis, the states are coordinatized in the body axes of the SV, since the focus of this study is to control the SV.

5.2 Simulation and Results

The simulation for an infinite-horizon case was run with the system parameters shown in Table 5-1 and the corresponding state-space system

A = [ 0_{6×6}  I_{6×6} ; A21  A22 ]    (5–3)

where

B1 = [ 0_{6×6} ; B11 ],   B2 = [ 0_{6×6} ; B21 ],

B11 = [ −1/mSV   0        0        0           −R_SV0/I_SV2   0 ;
        0        −1/mSV   0        R_SV0/I_SV1  0             0 ;
        0        0        −1/mSV   0            0             0 ;
        0        0        0        −1/I_SV1     0             0 ;
        0        0        0        0            −1/I_SV2      0 ;
        0        0        0        0            0             −1/I_SV3 ],

B21 = [ 1/mRSO   0        0        0            R_RSO0/I_RSO2  0 ;
        0        1/mRSO   0        −R_RSO0/I_RSO1  0           0 ;
        0        0        1/mRSO   0            0              0 ;
        0        0        0        1/I_RSO1     0              0 ;
        0        0        0        0            1/I_RSO2       0 ;
        0        0        0        0            0              1/I_RSO3 ],

A21 = [ −1.852  −1.852  −1.852  −0.001  −1.482  0.001 ;
        −0.926  −0.926  −0.926  −0.001  −0.741  0.001 ;
        −0.926  −0.926  −0.926  −0.001  −0.741  0.001 ;
        0  0  0  0  0  0 ;
        0  0  0  0  0  0 ;
        0  0  0  0  0  0 ]    (5–4)

70 0.185 0.275 0.185 0 0 0 − − −   0.188 0.278 0.185 0 0 0 − − −     0.185 0.185 0.185 0 0 0 A22 = − − −  (5–5)    0 0 0 0 0.001 0      0 0 0 0.001 0 0    −     0 0 0 0 0 0     and

and

B1 = [ 0_{6×6} ; diag{−0.01, −0.01, −0.01, −0.0167, −0.0167, −0.0167} ]    (5–6)

B2 = [ 0_{6×6} ; diag{0.067, 0.067, 0.067, 0.01, 0.01, 0.01} ]    (5–7)

The units represented in the matrices Aij and Bi are meters for linear distance, kilograms for mass, and radians for angular displacement. The initial conditions are

x(0) = [δr^T  δθ^T  δṙ^T  δθ̇^T]^T_0 = [0.2 I_{1×3}  0.05 I_{1×3}  −0.1 I_{1×6}]^T

The cost functionals are chosen as

Q1 = diag[ I_{1×3}  0.1 I_{1×3}  0.5  0_{1×5} ]
Q2 = −Q1

Table 5-1. The simulation parameters for the linear quadratic game
Name     Description                                Value                  Unit
c        Damping constant                           10                     N·s/m
k        Spring constant                            20                     N/m
l0       Equilibrium displacement of the spring     0.3                    m
mSV      The mass of the SV                         150                    kg
mRSO     The mass of the RSO                        200                    kg
RSV0     The nominal altitude of the SV             6600.00                km
RRSO0    The nominal altitude of the RSO            6600.00                km
ISV      The SV's moment of inertia                 diag{60, 60, 60}       kg·m²
IRSO     The RSO's moment of inertia                diag{100, 100, 100}    kg·m²
ρSV      The docking position from the SV's cm      [1 1 1]^T              m
ρRSO     The docking position from the RSO's cm     [1.5 1.5 1.5]^T        m

R11 = diag[ 1  0.1  0.1  1  1  1 ]
R12 = I_{6×6}
R21 = −R11
R22 = −R12

For the sake of comparison, the problem is also solved with a linear quadratic regulator (LQR) controller, assuming the interaction between the SV and the RSO is cooperative. If the objectives of both the SV and the RSO are to minimize the interaction, a corresponding LQR problem can be constructed as

ẋ = Ax + Bu    (5–8)

and

J = (1/2) ∫₀^∞ (x^T Qx + u^T Ru) dt    (5–9)

where B = [B1  B2], u = [u1^T  u2^T]^T, Q = Q1 − Q2, and R = diag{R11, R22}. Note that in order to solve the LQR problem, R must be invertible and symmetric. For simplicity, the off-diagonal terms are chosen to be zero instead of using R12 and R21. The resultant trajectories are plotted in Fig. 5-2, and the control force and torque inputs are compared in Figs. 5-3 and 5-4.
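The LQR benchmark of Eqs. (5–8)-(5–9) can be reproduced with a generic continuous-time LQR solver. The sketch below is a hypothetical helper (not the exact matrices or solver used in this study) that solves the algebraic Riccati equation by the standard Hamiltonian eigenvector method:

```python
import numpy as np

def lqr(A, B, Q, R):
    """Continuous-time LQR via the Hamiltonian eigenvector method.

    Solves A'P + PA - P B R^{-1} B' P + Q = 0 for the stabilizing P
    and returns the gain K = R^{-1} B' P (so that u = -Kx).
    """
    n = A.shape[0]
    Rinv = np.linalg.inv(R)
    # Hamiltonian matrix of the LQR problem
    H = np.block([[A, -B @ Rinv @ B.T],
                  [-Q, -A.T]])
    w, V = np.linalg.eig(H)
    stable = V[:, w.real < 0]          # basis of the stable invariant subspace
    X, Y = stable[:n, :], stable[n:, :]
    P = np.real(Y @ np.linalg.inv(X))  # stabilizing Riccati solution
    return Rinv @ B.T @ P, P
```

For a double integrator with Q = I and R = 1, the known closed-form gain K = [1, √3] is recovered, which is a quick way to validate the solver before applying it to the 12-state docked-spacecraft model.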


Figure 5-2. The resultant trajectory. A) LQR. B) Nash. C) Stackelberg: RSO as leader. D) Stackelberg: SV as leader.

The resultant trajectories show no notable difference regardless of which player acts as the Stackelberg leader, or whether the Nash strategy or the LQR is used. The control efforts, on the other hand, change significantly. With the LQR the control efforts are the smallest, because it is based on the ideal situation in which the RSO is not disabled. In cases where communication with the RSO is impossible and its motion is unknown, such cooperation cannot be assumed, and thus the LQR is not applicable. Both the SV and the RSO require much smaller control force and torque inputs when playing the Stackelberg leader than the Stackelberg follower. Also, the control efforts of the Stackelberg follower are almost the same as with the Nash strategy. Note that while the trajectories and the SV's control history represent the potential upper bound, the control inputs of the RSO in Fig. 5-3 and Fig. 5-4 are imaginary actuation introduced to cause motion of the disabled RSO. The results using the game-theoretic controllers are similar to those of the LQR controller, which assumed cooperation between the SV and the RSO. These preliminary results confirm the feasibility of the game-theoretic controller, as well as the hypothesis that it is possible to obtain smaller actuation inputs for the leader of a Stackelberg approach while obtaining essentially the same trajectory variables.

Figure 5-3. The control force inputs. A) LQR. B) Nash. C) Stackelberg: RSO as leader. D) Stackelberg: SV as leader.

Figure 5-4. The control torque inputs. A) LQR. B) Nash. C) Stackelberg: RSO as leader. D) Stackelberg: SV as leader.

5.3 Conclusion

It was shown that the interaction between the controlled service vehicle and the disabled target can be minimized with the LQR and the game-theoretic controllers for a post-docked maneuver in which the distance and the attitude error are kept small enough that the linear approximation is valid. Although the LQR controller performed the best, it was based on an ideal case in which the RSO was not disabled. Since the RSO being disabled is a defining characteristic of the problem, the LQR controller is not practical. The game-theoretic approach, with the Nash or the Stackelberg strategy, still kept the distance and the attitude errors small even with noncooperative behavior of the target RSO. Therefore, the feasibility of the game-theoretic controllers in this noncooperative scenario was validated. It was also shown that the control effort of the SV can be lowered by playing the Stackelberg leader; such a case can provide the lower bound for the possible interaction during the spacecraft post-docking phase, which is also useful as a design factor.

CHAPTER 6
SOLUTIONS TO TWO-PLAYER LINEAR QUADRATIC STACKELBERG GAMES WITH TIME-VARYING STRUCTURE
Two-player linear quadratic (LQ) differential games, discussed in Chapter 5, have a well-defined solution structure. However, they are not suitable for dynamics that are highly nonlinear, since a linear dynamic model cannot accurately describe the system behavior, and the resultant game solution therefore fails to be optimal in reality. There are a few ways to address nonlinearities in the class of infinite-horizon problems in optimal control and differential games. These include the state-dependent Riccati equation (SDRE) technique, where the nonlinear dynamics are rewritten in a linearly parameterized form from which an algebraic Riccati equation is formed and solved, under the assumption that the solution drives the optimality condition to its steady-state form. Another approach is model predictive control (MPC), also known as receding horizon control, where the problem is broken into multiple short finite-horizon problems that are solved iteratively. These methods generally require numerical methods to find the solution, and as the nonlinearity increases, so does the computational effort. There is also a trade-off between the computational effort and the optimality of the solution. In this chapter, an approach to investigate the satellite post-docking problem that utilizes the LQ structure while attempting to preserve nonlinearities is presented. First, the system dynamics are written in the Euler-Lagrange form given by Ref. [72]:

M(q)q̈ + Vm(q, q̇)q̇ + f(q̇) + g(q) + τd = τ.    (6–1)

where M(q) ∈ R^{n×n} denotes the inertia matrix, Vm(q, q̇) ∈ R^{n×n} denotes centripetal and Coriolis effects, f(q̇) ∈ R^n contains static and dynamic friction terms, g(q) ∈ R^n is the vector of gravitational effects, and τd ∈ R^n contains all unmodeled disturbances. Instead of directly formulating a two-player LQ differential game from the system dynamics, several error states are introduced to form error dynamics, and a feedback controller is designed to derive a linear error model. Although simplified, the residual linear system is time-varying, which reflects the original system's nonlinearity. Formulating a game problem with this linear error dynamics yields Riccati differential equations that are independent of the states (i.e., the errors), which admits an analytical solution, if one exists, that works with any dynamics written in the Euler-Lagrange form. Two different error models, based on additive and multiplicative errors, are presented. For each linear error system, both the open-loop and the closed-loop Riccati equations are derived. The challenge is solving the Riccati differential equations associated with the two-player linear quadratic Stackelberg differential games. The feedback-linearized error systems are time-varying, and thus for the infinite-horizon structure the optimality conditions do not reduce to the steady-state form (i.e., algebraic Riccati equations), so an analytical solution to the matrix differential equations is sought.
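Evaluating the Euler-Lagrange model of Eq. (6–1) forward in time only requires solving it for the accelerations. A minimal sketch (the function and its arguments are illustrative placeholders, assuming all terms are evaluated at the current state):

```python
import numpy as np

def forward_dynamics(M, Vm, f, g, tau_d, qdot, tau):
    """Solve Eq. (6-1) for the joint accelerations:
    M(q) qdd = tau - Vm(q, qdot) qdot - f(qdot) - g(q) - tau_d
    """
    return np.linalg.solve(M, tau - Vm @ qdot - f - g - tau_d)
```

Solving the linear system with `np.linalg.solve` avoids explicitly inverting M(q), which is preferable numerically since the inertia matrix can be poorly conditioned for some configurations.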

6.1 Game Based on Additive Errors

Suppose a desired trajectory qd(t) is provided such that qd(t), q̇d(t), q̈d(t), and ⃛qd(t) exist and are bounded. The tracking error e1, the auxiliary tracking error e2, and the filtered tracking error r are then defined as

e1 = qd − q,
e2 = ė1 + α1e1,    (6–2)
r = ė2 + α2e2,

where α1, α2 ∈ R^{n×n} are constant gain matrices. Equation (6–1) can be rewritten using Eq. (6–2) to yield the error model

Mė2 = −Vme2 − τ + M(q̈d + α1ė1) + Vm(q̇d + α1e1) + f + g + τd.    (6–3)

The nonlinear dynamics in Eq. (6–3) are feedback linearized by choosing τ such that all the known nonlinear terms are canceled. Group the terms in Eq. (6–3) to be canceled and call the result h:

h := M(q̈d + α1ė1) + Vm(q̇d + α1e1) + f + g,    (6–4)

which leaves the term τd in Eq. (6–3) and results in

Mė2 = −Vme2 − τ + h + τd.

Then, design the control torque input as the sum of the nonlinear terms h and the game-theoretic controller u:

τ = h − u,    (6–5)

which yields the closed-loop error system

Mė2 = −Vme2 + u + τd.

Thus, the dynamics of the tracking errors e1 and the auxiliary errors e2 can be written as

ė1 = −α1e1 + e2,
ė2 = −M^{-1}Vme2 + M^{-1}u + M^{-1}τd,

which can also be written as a linear time-varying system:

ẋ = Ax + Bu + Bτd,    (6–6)

where

x = [ e1 ; e2 ],   A = [ −α1  I_{n×n} ; 0_{n×n}  −M^{-1}Vm ],   B = [ 0_{n×n} ; M^{-1} ]    (6–7)
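The block structure of Eqs. (6–6)-(6–7) can be assembled at a given time instant as follows (a sketch; `error_system` is a hypothetical helper evaluating A(t) and B(t) from the current inertia and Coriolis matrices):

```python
import numpy as np

def error_system(M, Vm, alpha1):
    """Assemble the LTV matrices of Eqs. (6-6)/(6-7) at one instant.

    M, Vm, alpha1 : current n x n inertia, Coriolis, and gain matrices.
    Returns A (2n x 2n) and B (2n x n).
    """
    n = M.shape[0]
    Minv = np.linalg.inv(M)
    Z, I = np.zeros((n, n)), np.eye(n)
    A = np.block([[-alpha1, I],
                  [Z, -Minv @ Vm]])
    B = np.vstack([Z, Minv])
    return A, B
```

Since M and Vm depend on q and q̇, A and B must be re-evaluated along the trajectory, which is exactly the time-varying character of the residual linear system discussed above.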

With Eq. (6–6), a two-player LQ differential game is formulated as

J1 = (1/2) ∫₀^∞ (x^T Qx + u1^T R11u1 + u2^T R12u2) dt,
J2 = (1/2) ∫₀^∞ (x^T Nx + u1^T R21u1 + u2^T R22u2) dt,    (6–8)
ẋ = Ax + Bu1 + Bτd.

Note that all the games considered in this chapter are of the form given by Eq. (6–8). Different error models correspond to different terms inside Eq. (6–6) (e.g., it takes the form of Eq. (6–7) for the additive-error case), which lead to different Riccati equations. Furthermore, the strategies employed (i.e., open-loop vs. closed-loop) affect the shape of the Riccati equations as well.

6.1.1 Open-loop Stackelberg Solution

Three Riccati differential equations for the open-loop two-person LQ differential game are obtained by Simaan [51] as

K̇ + KA + A^TK − KBR11^{-1}B^TK − KBR22^{-1}B^TP + Q = 0    (6–9)
Ṗ + PA + A^TP − PBR11^{-1}B^TK − PBR22^{-1}B^TP + N − QS = 0    (6–10)
Ṡ + SA − AS − SBR11^{-1}B^TK − SBR22^{-1}B^TP + BR11^{-1}R21R11^{-1}B^TK − BR11^{-1}B^TP = 0    (6–11)

Any set of K, P, and S satisfying Eqs. (6–9)-(6–11) can be the solution of the game. Here, one such set is presented. The following property is utilized in simplifying terms and finding the solution.

Property 1 (Property 2.2, Qu [73]). Ṁ − 2Vm is skew-symmetric.

From Property 1 it can be shown that Ṁ − Vm − Vm^T, Ṁ − 2Vm^T, and Vm − Vm^T are also skew-symmetric. Although the skew-symmetry is stated as a property rather than a theorem, a proof of Property 1 is shown as follows.

Proof. In [73] Vm is related to the mass matrix by

Vm(q, q̇)q̇ = Ṁ(q)q̇ − (1/2)(q̇^T (∂M(q)/∂q)) q̇    (6–12)

and since

Ṁ(q) = (∂M(q)/∂q)(dq/dt) = (∂M(q)/∂q) q̇    (6–13)

Vm can be written as

Vm = (∂M(q)/∂q) q̇ − (1/2) q̇^T (∂M(q)/∂q)    (6–14)

Note that

(∂M(q)/∂q) q̇ = [ (∂M/∂q1)q̇  (∂M/∂q2)q̇  ⋯  (∂M/∂qn)q̇ ] ∈ R^{n×n}    (6–15)

q̇^T (∂M(q)/∂q) = [ q̇^T(∂M/∂q1) ; q̇^T(∂M/∂q2) ; ⋮ ; q̇^T(∂M/∂qn) ] ∈ R^{n×n}    (6–16)

From Eqs. (6–15) and (6–16),

( (∂M(q)/∂q) q̇ )^T = q̇^T (∂M(q)/∂q).

From Eqs. (6–14), (6–15), and (6–16),

Ṁ − 2Vm = (∂M(q)/∂q)q̇ − 2(∂M(q)/∂q)q̇ + q̇^T(∂M(q)/∂q)
        = −(∂M(q)/∂q)q̇ + ( (∂M(q)/∂q)q̇ )^T,

which is the difference of a square matrix and its transpose and is therefore skew-symmetric.
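The final step of the proof can be sanity-checked numerically: any matrix of the form −G + G^T is skew-symmetric. In the sketch below, G merely stands in for the matrix (∂M(q)/∂q)q̇ of Eq. (6–15), filled with random entries:

```python
import numpy as np

rng = np.random.default_rng(0)
G = rng.standard_normal((3, 3))   # stand-in for (dM/dq) qdot
S = -G + G.T                      # the form of  Mdot - 2*Vm  in the proof

# Skew-symmetry: S^T = -S, hence x^T S x = 0 for every vector x
assert np.allclose(S.T, -S)
x = rng.standard_normal(3)
assert abs(x @ S @ x) < 1e-12
```

The quadratic-form identity x^T S x = 0 is what makes skew-symmetric terms drop out of Lyapunov-type arguments, and it motivates the simplification used in the Riccati equations below.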

All the skew-symmetric matrices appearing in the Riccati equations can be treated as zero.

Assume that K, P, and S are symmetric block diagonal matrices

K = [ K11  0 ; 0  K22 ],   P = [ P11  0 ; 0  P22 ],   S = [ S11  0 ; 0  S22 ]

and denote Q and N as block matrices

Q = [ Q11  Q12 ; Q12^T  Q22 ],   N = [ N11  N12 ; N12^T  N22 ].

Then Eq. (6–10) is written in matrix form as

[ Ṗ11  0 ; 0  Ṗ22 ] + [ P11  0 ; 0  P22 ][ −α1  I ; 0  −M^{-1}Vm ] + [ −α1^T  0 ; I  −VmM^{-1} ][ P11  0 ; 0  P22 ]
  − [ 0  0 ; 0  P22M^{-1}R11^{-1}M^{-1}K22 ] − [ 0  0 ; 0  P22M^{-1}R22^{-1}M^{-1}P22 ]
  + [ N11  N12 ; N12^T  N22 ] − [ Q11  Q12 ; Q12^T  Q22 ][ S11  0 ; 0  S22 ] = 0    (6–17)

which is simplified and decomposed into four matrix equations, Eqs. (6–18)-(6–21):

Ṗ11 − P11α1 − α1^TP11 + N11 − Q11S11 = 0    (6–18)
P11 + N12 − Q12S22 = 0    (6–19)
P11 + N12^T − Q12^TS11 = 0    (6–20)
Ṗ22 − P22M^{-1}Vm − VmM^{-1}P22 − P22M^{-1}R11^{-1}M^{-1}K22 − P22M^{-1}R22^{-1}M^{-1}P22 + N22 − Q22S22 = 0    (6–21)

With P22 = K22 = M, Eq. (6–21) becomes

−R11^{-1} − R22^{-1} + N22 − Q22S22 = 0    (6–22)

which makes S22 a constant matrix. It then follows that P11 and S11 are also constant. Equation (6–9) is written in matrix form as

[ K̇11  0 ; 0  K̇22 ] + [ K11  0 ; 0  K22 ][ −α1  I ; 0  −M^{-1}Vm ] + [ −α1^T  0 ; I  −VmM^{-1} ][ K11  0 ; 0  K22 ]
  − [ 0  0 ; 0  K22M^{-1}R11^{-1}M^{-1}K22 ] − [ 0  0 ; 0  K22M^{-1}R22^{-1}M^{-1}P22 ]
  + [ Q11  Q12 ; Q12^T  Q22 ] = 0    (6–23)

which is simplified and decomposed into four matrix equations, Eqs. (6–24)-(6–27):

K̇11 − K11α1 − α1^TK11 + Q11 = 0    (6–24)
K11 + Q12 = 0    (6–25)
K11 + Q12^T = 0    (6–26)
K̇22 − K22M^{-1}Vm − Vm^TM^{-1}K22 − K22M^{-1}R11^{-1}M^{-1}K22 − K22M^{-1}R22^{-1}M^{-1}P22 + Q22 = 0    (6–27)

Substituting P22 = M and K22 = M into Eq. (6–27) yields

−R11^{-1} − R22^{-1} + Q22 = 0    (6–28)

From Eqs. (6–22) and (6–28), and with the assumption that N22 = −Q22, a solution for S22 is obtained as

S22 = −2I    (6–29)

Equation (6–11) is written in matrix form as

[ Ṡ11  0 ; 0  Ṡ22 ] + [ S11  0 ; 0  S22 ][ −α1  I ; 0  −M^{-1}Vm ] − [ −α1  I ; 0  −M^{-1}Vm ][ S11  0 ; 0  S22 ]
  − [ 0  0 ; 0  S22M^{-1}R11^{-1}M^{-1}K22 ] − [ 0  0 ; 0  S22M^{-1}R22^{-1}M^{-1}P22 ]
  + [ 0  0 ; 0  M^{-1}R11^{-1}R21R11^{-1}M^{-1}K22 ] − [ 0  0 ; 0  M^{-1}R11^{-1}M^{-1}P22 ] = 0    (6–30)

which is simplified and decomposed into three¹ matrix equations, Eqs. (6–31)-(6–33):

Ṡ11 − S11α1 + α1S11 = 0    (6–31)
S11 − S22 = 0    (6–32)
Ṡ22 − S22M^{-1}Vm + M^{-1}VmS22 − S22M^{-1}R11^{-1}M^{-1}K22 − S22M^{-1}R22^{-1}M^{-1}P22 + M^{-1}R11^{-1}R21R11^{-1}M^{-1}K22 − M^{-1}R11^{-1}M^{-1}P22 = 0    (6–33)

¹ There are four matrix equations, but one of them is 0 = 0.

Substituting P22 = K22 = M and S22 = −2I into Eq. (6–33) yields

R11^{-1} + 2R22^{-1} + R11^{-1}R21R11^{-1} = 0    (6–34)

Now the rest of K, P, and S are found. From Eq. (6–32),

S11 = S22 = −2I.

Equation (6–31) is automatically satisfied since S11 is a scalar multiple of the identity matrix. From Eqs. (6–25)-(6–26),

K11 = −(1/2)(Q12 + Q12^T)    (6–35)

and K11 should also satisfy Eq. (6–24), which can be written as

(1/2)[(Q12 + Q12^T)α1 + α1^T(Q12 + Q12^T)] + Q11 = 0    (6–36)

which is a Lyapunov equation. One way to set the parameters is to choose α1 and Q12, then calculate Q11 from them. From Eqs. (6–19)-(6–20),

P11 = −(1/2)(N12 + N12^T) − (Q12 + Q12^T) = −(1/2)(N12 + N12^T) + 2K11    (6–37)

Substituting Eqs. (6–36)-(6–37) into Eq. (6–18) and simplifying yields another constraint

(1/2)[(N12 + N12^T)α1 + α1^T(N12 + N12^T)] + N11 = 0    (6–38)

which is also a Lyapunov equation. Finally, combining Eqs. (6–28) and (6–34) gives

Q22 + R22^{-1} + R11^{-1}R21R11^{-1} = 0    (6–39)

It must be noted that the way the solution is obtained here is not admissible in every case. The analytical solution is guaranteed at the expense of constraints on the gains of the cost functionals; that is, one cannot arbitrarily choose the objectives of the players and still use the control strategy obtained here. This is not a problem for the purpose of modeling a post-docking scenario, however, since the follower's control strategy is passively determined based on the assumed objective of the leader (DV), rather than the leader having a specific objective that needs to be satisfied.
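The gain constraints of Eqs. (6–35), (6–36), and (6–39) can be checked numerically. The sketch below is a hypothetical helper (under the assumptions of this section) that picks α1 and Q12, derives the consistent Q11 and Q22, and verifies the steady-state form of Eq. (6–24):

```python
import numpy as np

def consistent_gains(alpha1, Q12, R11, R21, R22):
    """Derive cost-functional gains consistent with the analytical solution."""
    S = Q12 + Q12.T
    K11 = -0.5 * S                              # Eq. (6-35)
    Q11 = -0.5 * (S @ alpha1 + alpha1.T @ S)    # solves the Lyapunov eq. (6-36)
    R11i = np.linalg.inv(R11)
    Q22 = -(np.linalg.inv(R22) + R11i @ R21 @ R11i)  # Eq. (6-39)
    # steady-state form of Eq. (6-24): -K11 a1 - a1^T K11 + Q11 must vanish
    residual = -K11 @ alpha1 - alpha1.T @ K11 + Q11
    return K11, Q11, Q22, residual
```

A vanishing residual confirms that the gains cannot be chosen independently: once α1, Q12, and the R matrices are fixed, Q11 and Q22 are determined, which is exactly the restriction discussed above.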

6.1.2 Closed-loop Stackelberg Solution

The solution to a differential game consists of the optimal strategy (i.e., control input) of each player, the state trajectory propagated based on the players' strategies, and the corresponding costs. Ideally, the solution is valid for the entire horizon. However, any changes in the game, such as changes in the objective of each player or different behavior of the system due to uncertainty and disturbances, compromise the optimality of the game strategies. This phenomenon is known as "time inconsistency" or "subgame imperfection" in game theory [74]. There are strong and weak time inconsistencies, where the distinction is based on the initial conditions. The type of inconsistency addressed here is strong time inconsistency. To account for time inconsistency, a closed-loop Stackelberg strategy is considered. In the differential game-theoretic sense, "open-loop" refers to decision making by each player based on the initial condition, and "closed-loop" refers to the ability of the players to change their decisions based on current information. A closed-loop strategy in a Stackelberg game is more adaptive to potential discrepancies between the real behavior of the system and its hypothetical model. Open-loop and closed-loop strategies in differential game theory are both, in fact, open-loop strategies in the control-theoretic sense; however, both can be implemented in closed-loop systems, just like the linear quadratic regulator (LQR) controller. In this section one possible solution is presented. Stackelberg games provide a framework for systems that operate on different levels with a prescribed hierarchy of decisions. For a two-person Stackelberg game where the system is affected by two decision makers, the problem is cast in two solution spaces, the leader's and the follower's, where each player tries to minimize its respective cost functional. The leader is a decision maker that can enforce its strategy to minimize its objective metric over the follower.
For example, when two inputs affect the behavior of a system, the one with more rapid dynamics can be considered the leader in Stackelberg structure; since the system responds more rapidly to the leader’s control input, it is reasonable to put more weight in optimizing the leader’s control strategy while making the follower compromise.

A Stackelberg differential game problem, with uF as the follower and uL as the leader, is formulated by a differential constraint and the cost functionals JF(z, uF, uL), JL(z, uF, uL) ∈ R as

z˙ = Az + BuF + BuL 1 ∞ J = zT Qz + uT R u + uT R u dt F 2 F 11 F L 12 L (6–40) Zt0 1 ∞  J = zT Nz + uT R u + uT R u dt L 2 F 21 F L 22 L Zt0  where Q, N R2n×2n are symmetric constant matrices defined as ∈

Q11 Q12 N11 N12 Q = , N = , (6–41)  T   T  Q12 Q22 N12 N22         n×n and Qij , Nij R are positive definite and symmetric constant matrices i, j = 1, 2. A ∈ ∀ closed-loop solution is sought by extending [44]. Unlike the open-loop case, the follower assumes that the leader’s strategy explicitly affects the system. With the game being of linear-quadratic structure, the following assumption is made.

Assumption: In computing its strategy, the follower assumes that the leader’s strategy is linear in the states such that

uL = F2z,

where $F_2(t) \in \mathbb{R}^{2n\times 2n}$ is the gain associated with the assumed leader strategy, such that the follower's problem is written as

\begin{align*}
\dot{z} &= (A + BF_2) z + Bu_F \\
J_F &= \frac{1}{2}\int_{t_0}^{\infty} \left( z^T \left( Q + F_2^T R_{12} F_2 \right) z + u_F^T R_{11} u_F \right) dt.
\end{align*}

The Hamiltonian of the follower is

\[
H_F = \frac{1}{2}\left( z^T (Q + F_2^T R_{12} F_2) z + u_F^T R_{11} u_F \right) + \lambda_1^T \left( (A + BF_2) z + Bu_F \right),
\]

where the optimal control strategy and the costate equation of the follower are obtained as

\begin{align}
u_F &= -R_{11}^{-1} B^T \lambda_1, \tag{6-42}\\
\dot{\lambda}_1 &= -(Q + F_2^T R_{12} F_2)\, z - (A + BF_2)^T \lambda_1. \tag{6-43}
\end{align}

Substituting (6-42) and (6-43) into the dynamics and $J_L(z, u_F, u_L)$ yields an optimal control problem for the leader

\begin{align*}
\dot{z} &= Az - BR_{11}^{-1} B^T \lambda_1 + Bu_L, \\
J_L &= \frac{1}{2}\int_{t_0}^{\infty} \left( z^T N z + u_L^T R_{22} u_L + \lambda_1^T B R_{11}^{-1} R_{21} R_{11}^{-1} B^T \lambda_1 \right) dt,
\end{align*}

where the Hamiltonian of the leader is constructed as

\begin{align*}
H_L &= \frac{1}{2}\left( z^T N z + \lambda_1^T B R_{11}^{-1} R_{21} R_{11}^{-1} B^T \lambda_1 + u_L^T R_{22} u_L \right) \\
&\quad + \lambda_2^T \left( Az - BR_{11}^{-1} B^T \lambda_1 + Bu_L \right) \\
&\quad + \psi^T \left( -(Q + F_2^T R_{12} F_2)\, z - (A + BF_2)^T \lambda_1 \right),
\end{align*}

where

\begin{align}
u_L &= -R_{22}^{-1} B^T \lambda_2, \tag{6-44}\\
\dot{\lambda}_2 &= -Nz - A^T \lambda_2 + (Q + F_2^T R_{12} F_2)\,\psi, \tag{6-45}\\
\dot{\psi} &= -BR_{11}^{-1} R_{21} R_{11}^{-1} B^T \lambda_1 + BR_{11}^{-1} B^T \lambda_2 + (A + BF_2)\,\psi. \tag{6-46}
\end{align}

The expressions derived in (6–42)-(6–46) define the solution to the differential game. The subsequent analysis aims at developing an expression for the costate variables

(λ1(t), λ2(t), ψ(t)) which can be implemented by the controllers uF (t) and uL(t). Suppose

that the costates are linear in the state:

\begin{align}
\lambda_1 &= Kz, \tag{6-47}\\
\lambda_2 &= Pz, \tag{6-48}\\
\psi &= Sz, \tag{6-49}
\end{align}

where $K(t), P(t), S(t) \in \mathbb{R}^{2n\times 2n}$ are time-varying positive definite diagonal matrices. Given these assumed solutions, conditions and constraints are developed to ensure (6-47)-(6-49) satisfy (6-43), (6-45), and (6-46). Differentiating (6-47)-(6-49) and substituting the dynamic constraint in Eq. (6-8) along with Eqs. (6-42)-(6-46) yields three differential Riccati equations

\begin{align}
0 &= \dot{K} + KA - KBR_{11}^{-1}B^T K - KBR_{22}^{-1}B^T P + Q + PBR_{22}^{-1}R_{12}R_{22}^{-1}B^T P + A^T K - PBR_{22}^{-1}B^T K, \tag{6-50}\\
0 &= \dot{P} + PA - PBR_{11}^{-1}B^T K - PBR_{22}^{-1}B^T P + N + A^T P - QS - PBR_{22}^{-1}R_{12}R_{22}^{-1}B^T PS, \tag{6-51}\\
0 &= \dot{S} + SA - SBR_{11}^{-1}B^T K - SBR_{22}^{-1}B^T P + BR_{11}^{-1}R_{21}R_{11}^{-1}B^T K - BR_{11}^{-1}B^T P - AS + BR_{22}^{-1}B^T PS. \tag{6-52}
\end{align}

Equations (6-50)-(6-52) can be expressed as open-loop Riccati equations plus additional terms. From Eqs. (2-18) the open-loop Riccati equations are

\begin{align}
0 &= \dot{K} + KA + A^T K - KBR_{11}^{-1}B^T K - KBR_{22}^{-1}B^T P + Q^T, \tag{6-53}\\
0 &= \dot{P} + PA + A^T P - PBR_{11}^{-1}B^T K - PBR_{22}^{-1}B^T P + N^T - QS, \tag{6-54}\\
0 &= \dot{S} + SA - AS - SBR_{11}^{-1}B^T K - SBR_{22}^{-1}B^T P + BR_{11}^{-1}R_{21}R_{11}^{-1}B^T K - BR_{11}^{-1}B^T P. \tag{6-55}
\end{align}

Let the subscripts CRE and ORE denote the closed-loop (6-50)-(6-52) and open-loop (6-53)-(6-55) Riccati equations, respectively. Then the closed-loop Riccati equations can be written as

\begin{align}
f_{K_{CRE}} &= f_{K_{ORE}} + PBR_{22}^{-1}R_{12}R_{22}^{-1}B^T P - PBR_{22}^{-1}B^T K = 0, \tag{6-56}\\
f_{P_{CRE}} &= f_{P_{ORE}} - PBR_{22}^{-1}R_{12}R_{22}^{-1}B^T PS = 0, \tag{6-57}\\
f_{S_{CRE}} &= f_{S_{ORE}} + BR_{22}^{-1}B^T PS = 0. \tag{6-58}
\end{align}

In Eqs. (6-56)-(6-58), $K(t)$ and $P(t)$ correspond to $u_F(t)$ and $u_L(t)$ respectively, while $S(t)$ constrains the trajectories of $K(t)$ and $P(t)$. Equations (6-56)-(6-58) must be solved simultaneously to yield the Stackelberg control strategies for the leader and the follower.
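As an illustration of solving such coupled Riccati conditions simultaneously, a scalar steady-state version of the open-loop equations (6-53)-(6-55) can be handed to a root finder. This is only a sketch: `scipy` is assumed, and all scalar coefficients below are made up for demonstration, not taken from the dissertation.

```python
import numpy as np
from scipy.optimize import fsolve

# Hypothetical scalar coefficients (illustrative only).
a, b = -1.0, 1.0              # dynamics: z' = a z + b uF + b uL
q, n = 1.0, 1.0               # state weights in J_F, J_L
r11, r21, r22 = 1.0, 0.5, 2.0

def riccati_residuals(v):
    k, p, s = v
    # Scalar steady-state forms of Eqs. (6-53)-(6-55).
    fK = 2*a*k - b*b*k*k/r11 - b*b*k*p/r22 + q
    fP = 2*a*p - b*b*p*k/r11 - b*b*p*p/r22 + n - q*s
    fS = (-b*b*s*k/r11 - b*b*s*p/r22
          + b*b*r21*k/(r11*r11) - b*b*p/r11)   # SA - AS vanishes for scalars
    return [fK, fP, fS]

sol, info, ier, msg = fsolve(riccati_residuals, [0.4, 0.5, -0.3], full_output=True)
k, p, s = sol
residual = max(abs(r) for r in riccati_residuals(sol))
```

The three residuals are driven to zero together, which is exactly the "solved simultaneously" requirement stated above; solving them one at a time would ignore the coupling through the $KBR_{22}^{-1}B^TP$ and $QS$ terms.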

If $P(t)$, $K(t)$, and $S(t)$ are selected as

\begin{align}
P &= \begin{bmatrix} P_{11} & 0_{n\times n} \\ 0_{n\times n} & M \end{bmatrix}, \tag{6-59}\\
K &= \begin{bmatrix} K_{11} & 0_{n\times n} \\ 0_{n\times n} & M \end{bmatrix}, \tag{6-60}\\
S &= \begin{bmatrix} S_{11} & 0_{n\times n} \\ 0_{n\times n} & -2\,\mathbb{1}_{n\times n} \end{bmatrix}, \tag{6-61}
\end{align}

where $K_{11}$ and $P_{11}$ satisfy

\begin{align}
K_{11} &= -\frac{1}{2}\left( Q_{12} + Q_{12}^T \right), \tag{6-62}\\
P_{11} &= -\frac{1}{2}\left( N_{12} + N_{12}^T \right) + 2K_{11}, \nonumber
\end{align}

then (6-50)-(6-52) are solved with the following constraints on the cost functionals:

\begin{align}
\frac{1}{2}\left[ \left( Q_{12} + Q_{12}^T \right)\alpha_1 + \alpha_1^T\left( Q_{12} + Q_{12}^T \right) \right] + Q_{11} &= 0 \nonumber\\
\frac{1}{2}\left[ \left( N_{12} + N_{12}^T \right)\alpha_1 + \alpha_1^T\left( N_{12} + N_{12}^T \right) \right] + N_{11} &= 0 \nonumber\\
Q_{22} + R_{22}^{-1} + R_{11}^{-1}R_{21}R_{11}^{-1} &= 0 \tag{6-63}\\
-R_{11}^{-1} - 2R_{22}^{-1} + Q_{22} + R_{22}^{-1}R_{12}R_{22}^{-1} &= 0 \nonumber\\
Q_{22} + N_{22} &= 0 \nonumber
\end{align}

From Eqs. (6–42), (6–44), (6–47), (6–48), (6–59), and (6–60), the closed-loop Stackelberg game-based controllers are obtained as

\begin{align}
u_F &= -R_{11}^{-1} e_2, \tag{6-64}\\
u_L &= -R_{22}^{-1} e_2. \tag{6-65}
\end{align}

Note that the solution has the same form as in the open-loop problem except that more conservative constraints are placed on the relationships among the gain matrices. In particular, the constraints in Eq. (6-63) include $R_{12}$, which couples the decision of $u_F(t)$ to the decision of $u_L(t)$ in the closed-loop case. Therefore, a closed-loop strategy for the follower is better than an open-loop strategy in addressing the time inconsistency.
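For a scalar linear-quadratic instance, the follower's stationarity and costate conditions of Eqs. (6-42)-(6-43) can be checked symbolically. A sketch only: `sympy` is assumed, and every scalar coefficient below (a, b, f2, q, r11, r12) is hypothetical.

```python
import sympy as sp

z, lam1, uF = sp.symbols('z lambda1 uF')
a, b, f2, q, r11, r12 = sp.symbols('a b f2 q r11 r12', positive=True)

# Scalar instance of the follower's Hamiltonian H_F with the assumed
# leader feedback uL = f2*z folded into the weight and the dynamics.
H_F = sp.Rational(1, 2)*((q + f2**2*r12)*z**2 + r11*uF**2) \
      + lam1*((a + b*f2)*z + b*uF)

uF_star = sp.solve(sp.diff(H_F, uF), uF)[0]   # stationarity in uF
lam1_dot = -sp.diff(H_F, z)                   # costate equation
```

The solver returns $u_F = -b\lambda_1/r_{11}$ and $\dot{\lambda}_1 = -(q + f_2^2 r_{12})z - (a + bf_2)\lambda_1$, the scalar analogues of (6-42) and (6-43).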

6.2 Game Based on Multiplicative Errors

Attitude errors are inherently nonlinear. The additive error model discussed in Section 6.1, while it works for a wide variety of dynamic models with a hierarchical non-cooperative structure, cannot fully capture the rotational motion of a rigid body when the orientation mismatch between the current and the desired attitudes is large. To be exact, this issue is associated with the asymptotic controller design, in which the error is supposed to continuously converge to an equilibrium. The subsequent development chooses the error quaternion among the multiplicative errors. The error quaternion is defined as the additional rotation from the current to the desired orientation, as depicted in Fig. 6-1.


Figure 6-1. The relationship among the current and the desired orientations.

Let the current, the desired, and the error quaternions be denoted as

\[
\bar{q} = \begin{bmatrix} q \\ q_4 \end{bmatrix}, \qquad
\bar{q}_d = \begin{bmatrix} q_d \\ q_{d4} \end{bmatrix}, \qquad
\bar{q}_e = \begin{bmatrix} q_e \\ q_{e4} \end{bmatrix}, \tag{6-66}
\]

\[
e = e_1 = \left( d_4\,\mathbb{1} - d^{\times} \right) q - q_4\, d \tag{6-67}
\]

where

\[
q_4 = \sqrt{1 - \|q\|^2}, \qquad d_4 = \sqrt{1 - \|d\|^2}, \tag{6-68}
\]

then the auxiliary and the filtered tracking errors are defined in the same way as in the additive error case

e2 = e˙ 1 + α1e1

r = e˙ 2 + α2e2
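The error-vector definition of Eq. (6-67) can be sanity-checked numerically: the error must vanish when the current and desired vector parts coincide, and reduce to $q$ when the desired attitude is the identity. A sketch, with `numpy` assumed and an arbitrary test attitude:

```python
import numpy as np

def skew(v):
    """Skew-symmetric matrix such that skew(v) @ w = v x w."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def error_vector(q, d):
    """e1 = (d4*I - d^x) q - q4*d, per Eq. (6-67)."""
    q4 = np.sqrt(1.0 - q @ q)
    d4 = np.sqrt(1.0 - d @ d)
    return (d4*np.eye(3) - skew(d)) @ q - q4*d

q = np.array([0.1, -0.05, 0.2])        # hypothetical current vector part
e_same = error_vector(q, q)            # zero when current == desired
e_zero = error_vector(q, np.zeros(3))  # reduces to q for identity target
```

Both limiting cases follow directly from (6-67): with $d = q$ the cross term $d^{\times}q$ vanishes and the remaining terms cancel; with $d = 0_3$, $d_4 = 1$.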

Assuming d˙ = d¨ = 0, d˙4 = d¨4 = 0,

\begin{align*}
\dot{e}_1 &= \left( d_4\,\mathbb{1} - d^{\times} \right)\dot{q} - \dot{q}_4\, d
= \left( d_4\,\mathbb{1} - d^{\times} + \frac{d\, q^T}{q_4} \right)\dot{q} = P(q, d)\,\dot{q} \\
\ddot{e}_1 &= \left( d_4\,\mathbb{1} - d^{\times} + \frac{d\, q^T}{q_4} \right)\ddot{q}
+ \left( \frac{d\,\dot{q}^T}{q_4} + \frac{\left( q^T\dot{q} \right) d\, q^T}{q_4^3} \right)\dot{q}
= P(q, d)\,\ddot{q} + Q(q, \dot{q}, d)\,\dot{q}
\end{align*}

Now develop the error system.

\begin{align}
M\dot{e}_2 &= M\left( \ddot{e}_1 + \alpha_1 \dot{e}_1 \right) \nonumber\\
&= MP\ddot{q} + M\left\{ Q + \alpha_1 P \right\}\dot{q}. \tag{6-69}
\end{align}

Substituting q¨ from Eq. (6–1)

\[
\ddot{q} = -M^{-1}\left\{ V_m \dot{q} + g + \tau_d - \tau \right\}
\]

\[
M\dot{e}_2 = -MPM^{-1}\left\{ V_m \dot{q} + g + \tau_d - \tau \right\} + M\left\{ Q + \alpha_1 P \right\}\dot{q} \tag{6-70}
\]

Let $\tau = h - u$ where

\[
h = V_m \dot{q} + g + MP^{-1}\left\{ Q + \alpha_1 P \right\}\dot{q} - V_m e_2, \tag{6-71}
\]

then Eq. (6-70) becomes

\[
M\dot{e}_2 = -MPM^{-1} V_m e_2 - MPM^{-1} u - MPM^{-1} \tau_d. \tag{6-72}
\]

6.2.1 Open-loop Stackelberg Solution

The dynamics of the tracking and the auxiliary tracking errors are

\begin{align}
\dot{e}_1 &= -\alpha_1 e_1 + e_2, \tag{6-73}\\
\dot{e}_2 &= -PM^{-1} V_m e_2 - PM^{-1} u - PM^{-1} \tau_d. \tag{6-74}
\end{align}

The linear-quadratic differential game is defined as

\begin{align}
\dot{x} &= Ax + B_1 u + B_2 \tau_d \tag{6-75}\\
J_1 &= \frac{1}{2}\int_0^{\infty}\left( x^T Q x + u_1^T R_{11} u_1 + u_2^T R_{12} u_2 \right) dt \tag{6-76}\\
J_2 &= \frac{1}{2}\int_0^{\infty}\left( x^T N x + u_1^T R_{21} u_1 + u_2^T R_{22} u_2 \right) dt \tag{6-77}
\end{align}

where

\[
x = \begin{bmatrix} e_1 \\ e_2 \end{bmatrix}, \qquad
A = \begin{bmatrix} -\alpha_1 & \mathbb{1} \\ 0 & -PM^{-1}V_m \end{bmatrix}, \qquad
B_1 = B_2 = \begin{bmatrix} 0 \\ -PM^{-1} \end{bmatrix}. \tag{6-78}
\]

The difference from the previous development is the term $B$, so the Riccati equations need to be investigated.

\begin{align}
\dot{K} + KA + A^T K^T - KBR_{11}^{-1}B^T K^T - KBR_{22}^{-1}B^T P^T + Q^T &= 0 \tag{6-79}\\
\dot{P} + PA + A^T P^T - PBR_{11}^{-1}B^T K^T - PBR_{22}^{-1}B^T P^T + N^T - QS &= 0 \tag{6-80}\\
\dot{S} + SA - AS - SBR_{11}^{-1}B^T K^T - SBR_{22}^{-1}B^T P^T + BR_{11}^{-1}R_{21}R_{11}^{-1}B^T K^T - BR_{11}^{-1}B^T P^T &= 0 \tag{6-81}
\end{align}

Assuming

\[
K = \begin{bmatrix} K_{11} & 0 \\ 0 & K_{22} \end{bmatrix}, \qquad
P = \begin{bmatrix} P_{11} & 0 \\ 0 & P_{22} \end{bmatrix}, \qquad
S = \begin{bmatrix} S_{11} & 0 \\ 0 & S_{22} \end{bmatrix} \tag{6-82}
\]

Equation (6-80), in block matrix form, becomes

\begin{align}
&\begin{bmatrix} \dot{P}_{11} & 0 \\ 0 & \dot{P}_{22} \end{bmatrix}
+ \begin{bmatrix} -P_{11}\alpha_1 & P_{11} \\ 0 & -P_{22}PM^{-1}V_m \end{bmatrix}
+ \begin{bmatrix} -\alpha_1^T P_{11}^T & 0 \\ P_{11}^T & -V_m^T M^{-T} P^T P_{22}^T \end{bmatrix} \nonumber\\
&- \begin{bmatrix} 0 & 0 \\ 0 & P_{22}PM^{-1}R_{11}^{-1}M^{-T}P^T K_{22}^T \end{bmatrix}
- \begin{bmatrix} 0 & 0 \\ 0 & P_{22}PM^{-1}R_{22}^{-1}M^{-T}P^T P_{22}^T \end{bmatrix} \nonumber\\
&+ \begin{bmatrix} N_{11} & N_{12} \\ N_{12}^T & N_{22} \end{bmatrix}
- \begin{bmatrix} Q_{11} & Q_{12} \\ Q_{12}^T & Q_{22} \end{bmatrix}
\begin{bmatrix} S_{11} & 0 \\ 0 & S_{22} \end{bmatrix}
= \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} \tag{6-83}
\end{align}

and is decomposed into four matrix equations:

\begin{align}
\dot{P}_{11} - P_{11}\alpha_1 - \alpha_1^T P_{11}^T + N_{11} - Q_{11}S_{11} &= 0 \nonumber\\
P_{11} + N_{12} - Q_{12}S_{22} &= 0 \nonumber\\
P_{11}^T + N_{12}^T - Q_{12}^T S_{11} &= 0 \nonumber\\
\dot{P}_{22} - P_{22}PM^{-1}V_m - V_m^T M^{-T} P^T P_{22}^T - P_{22}PM^{-1}R_{11}^{-1}M^{-T}P^T K_{22}^T - P_{22}PM^{-1}R_{22}^{-1}M^{-T}P^T P_{22}^T + N_{22} - Q_{22}S_{22} &= 0 \tag{6-84}
\end{align}

In order to solve Eq. (6-84), we need to choose $P_{22}$. For the additive error case, $P_{22} = M$ worked. For the multiplicative case, try $P_{22} = MP^{-1}$ (and $K_{22} = MP^{-1}$ too):

\[
\dot{M}P^{-1} + M\frac{d}{dt}\!\left(P^{-1}\right) - V_m - V_m^T - R_{11}^{-1} - R_{22}^{-1} + N_{22} - Q_{22}S_{22} = 0 \tag{6-85}
\]

Equation (6-85) contains $\dot{M}P^{-1}$ and $M\frac{d}{dt}(P^{-1})$, making it impossible to utilize the skew-symmetry property (i.e., $\dot{M} - V_m - V_m^T = 0$). However, we continue anyway and proceed to Eq. (6-79):

\begin{align*}
&\begin{bmatrix} \dot{K}_{11} & 0 \\ 0 & \dot{K}_{22} \end{bmatrix}
+ \begin{bmatrix} -K_{11}\alpha_1 & K_{11} \\ 0 & -K_{22}PM^{-1}V_m \end{bmatrix}
+ \begin{bmatrix} -\alpha_1^T K_{11}^T & 0 \\ K_{11}^T & -V_m^T M^{-T} P^T K_{22}^T \end{bmatrix} \\
&- \begin{bmatrix} 0 & 0 \\ 0 & K_{22}PM^{-1}R_{11}^{-1}M^{-T}P^T K_{22}^T \end{bmatrix}
- \begin{bmatrix} 0 & 0 \\ 0 & K_{22}PM^{-1}R_{22}^{-1}M^{-T}P^T P_{22}^T \end{bmatrix}
+ \begin{bmatrix} Q_{11} & Q_{12} \\ Q_{12}^T & Q_{22} \end{bmatrix}
= \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}
\end{align*}

or

\begin{align}
\dot{K}_{11} - K_{11}\alpha_1 - \alpha_1^T K_{11}^T + Q_{11} &= 0 \nonumber\\
K_{11} + Q_{12} &= 0 \nonumber\\
K_{11}^T + Q_{12}^T &= 0 \nonumber\\
\dot{K}_{22} - K_{22}PM^{-1}V_m - V_m^T M^{-T} P^T K_{22}^T - K_{22}PM^{-1}R_{11}^{-1}M^{-T}P^T K_{22}^T - K_{22}PM^{-1}R_{22}^{-1}M^{-T}P^T P_{22}^T + Q_{22} &= 0 \tag{6-86}
\end{align}

With $K_{22} = P_{22} = MP^{-1}$, Eq. (6-86) becomes

\[
\dot{M}P^{-1} + M\frac{d}{dt}\!\left(P^{-1}\right) - V_m - V_m^T - R_{11}^{-1} - R_{22}^{-1} + Q_{22} = 0 \tag{6-87}
\]

Subtracting Eq. (6–87) from Eq. (6–85) yields

\[
N_{22} = Q_{22}\left( \mathbb{1} + S_{22} \right).
\]

Assuming $Q_{22} = -N_{22}$,

\[
S_{22} = -2\,\mathbb{1}.
\]

Now look at Eq. (6-81). In block matrix form, with $S_{22} = -2\,\mathbb{1}$,

\begin{align*}
&\begin{bmatrix} \dot{S}_{11} & 0 \\ 0 & 0 \end{bmatrix}
+ \begin{bmatrix} -S_{11}\alpha_1 & S_{11} \\ 0 & 2PM^{-1}V_m \end{bmatrix}
- \begin{bmatrix} -\alpha_1 S_{11} & -2\,\mathbb{1} \\ 0 & 2PM^{-1}V_m \end{bmatrix}
- \begin{bmatrix} 0 & 0 \\ 0 & -2PM^{-1}R_{11}^{-1}M^{-T}P^T K_{22}^T \end{bmatrix} \\
&- \begin{bmatrix} 0 & 0 \\ 0 & -2PM^{-1}R_{22}^{-1}M^{-T}P^T P_{22}^T \end{bmatrix}
+ \begin{bmatrix} 0 & 0 \\ 0 & PM^{-1}R_{11}^{-1}R_{21}R_{11}^{-1}M^{-T}P^T K_{22}^T \end{bmatrix}
- \begin{bmatrix} 0 & 0 \\ 0 & PM^{-1}R_{11}^{-1}M^{-T}P^T P_{22}^T \end{bmatrix}
= \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}
\end{align*}

which is decomposed to

\begin{align}
\dot{S}_{11} &= 0 \nonumber\\
S_{11} + 2\,\mathbb{1} &= 0 \nonumber\\
0 &= 0 \nonumber\\
2PM^{-1}R_{11}^{-1}M^{-T}P^T K_{22}^T + 2PM^{-1}R_{22}^{-1}M^{-T}P^T P_{22}^T
+ PM^{-1}R_{11}^{-1}R_{21}R_{11}^{-1}M^{-T}P^T K_{22}^T - PM^{-1}R_{11}^{-1}M^{-T}P^T P_{22}^T &= 0 \tag{6-88}
\end{align}

Substituting $K_{22} = P_{22} = MP^{-1}$, Eq. (6-88) becomes

\[
2PM^{-1}R_{11}^{-1} + 2PM^{-1}R_{22}^{-1} + PM^{-1}R_{11}^{-1}R_{21}R_{11}^{-1} - PM^{-1}R_{11}^{-1} = 0
\]

or

\[
PM^{-1}\left( 2R_{11}^{-1} + 2R_{22}^{-1} + R_{11}^{-1}R_{21}R_{11}^{-1} - R_{11}^{-1} \right) = 0. \tag{6-89}
\]

From Eq. (6-89) a constraint among $R_{11}$, $R_{21}$, and $R_{22}$ is obtained as

\[
R_{11}^{-1} = -2R_{22}^{-1} - R_{11}^{-1}R_{21}R_{11}^{-1}.
\]

Now go back to Eq. (6–87). It is possible to make Q22 time-varying as

\[
Q_{22} = Q_{22}(t) = -\dot{M}P^{-1} - M\frac{d}{dt}\!\left(P^{-1}\right) + V_m + V_m^T + R_{11}^{-1} + R_{22}^{-1}.
\]

\begin{align}
\dot{P}_{11} - P_{11}\alpha_1 - \alpha_1^T P_{11} + N_{11} - Q_{11}S_{11} &= 0 \tag{6-90}\\
P_{11} + N_{12} - Q_{12}S_{22} &= 0 \tag{6-91}\\
P_{11} + N_{12}^T - Q_{12}^T S_{11} &= 0 \tag{6-92}\\
\dot{P}_{22} - P_{22}M^{-1}V_m - V_m M^{-1}P_{22} - P_{22}M^{-1}R_{11}^{-1}M^{-1}K_{22} - P_{22}M^{-1}R_{22}^{-1}M^{-1}P_{22} + N_{22} - Q_{22}S_{22} &= 0 \tag{6-93}\\
\dot{K}_{11} - K_{11}\alpha_1 - \alpha_1^T K_{11} + Q_{11} &= 0 \tag{6-94}\\
K_{11} + Q_{12} &= 0 \tag{6-95}\\
K_{11} + Q_{12}^T &= 0 \tag{6-96}\\
\dot{K}_{22} - K_{22}M^{-1}V_m - V_m^T M^{-1}K_{22} - K_{22}M^{-1}R_{11}^{-1}M^{-1}K_{22} - K_{22}M^{-1}R_{22}^{-1}M^{-1}P_{22} + Q_{22} &= 0 \tag{6-97}\\
\dot{S}_{11} - S_{11}\alpha_1 + \alpha_1 S_{11} &= 0 \tag{6-98}\\
S_{11} - S_{22} &= 0 \tag{6-99}\\
\dot{S}_{22} - S_{22}M^{-1}V_m + M^{-1}V_m S_{22} - S_{22}M^{-1}R_{11}^{-1}M^{-1}K_{22} - S_{22}M^{-1}R_{22}^{-1}M^{-1}P_{22} + M^{-1}R_{11}^{-1}R_{21}R_{11}^{-1}M^{-1}K_{22} - M^{-1}R_{11}^{-1}M^{-1}P_{22} &= 0 \tag{6-100}
\end{align}

6.2.2 Closed-loop Stackelberg Solution

Recall the closed-loop Stackelberg Riccati differential equations

\begin{align}
0 &= \dot{K} + KA - KBR_{11}^{-1}B^T K - KBR_{22}^{-1}B^T P + Q + PBR_{22}^{-1}R_{12}R_{22}^{-1}B^T P + A^T K - PBR_{22}^{-1}B^T K, \tag{6-101}\\
0 &= \dot{P} + PA - PBR_{11}^{-1}B^T K - PBR_{22}^{-1}B^T P + N + A^T P - QS - PBR_{22}^{-1}R_{12}R_{22}^{-1}B^T PS, \tag{6-102}\\
0 &= \dot{S} + SA - SBR_{11}^{-1}B^T K - SBR_{22}^{-1}B^T P + BR_{11}^{-1}R_{21}R_{11}^{-1}B^T K - BR_{11}^{-1}B^T P - AS + BR_{22}^{-1}B^T PS. \tag{6-103}
\end{align}

Each matrix equation in $\mathbb{R}^{2n\times 2n}$ is decomposed into four matrix equations in $\mathbb{R}^{n\times n}$.

6.3 Simulations and Results

The following scenario is considered for the problem of post-docked state maintenance via a differential game-based controller. A service vehicle (SV) satellite is docked with another satellite referred to as a disabled vehicle (DV), defined as a satellite whose behavior is unpredictable due to a malfunctioning actuator or loss of communication. In the event of a perfect rendezvous, the body-fixed coordinate frames of both satellites are aligned. Both are equipped with 3-axis attitude control and can affect the relative orientation of one with respect to the other. The attitude misalignment is assumed to cause an interaction force/torque at the docking point. It is assumed that the attitude error is small and is upper-bounded by 90°.² The use of modified Rodrigues parameters (MRPs) would circumvent this constraint and allow infinite angular displacement.

2 This upper-bound is conservative and comes from elimination of the Euler parameter η in Eq. (6–105). The attitude misalignment would be much smaller for post-docking.


Figure 6-2. Two docked satellites approximated as two rigid bodies connected via a torsion spring

Note that restricting the motion to rotational is justified as long as the non-cooperating behavior of the target is not adversarial. Thus the scenario is specific to a disabled satellite, which is cooperative if not disabled. A mathematical model of the scenario shown in Fig. 6-2 is constructed where the interaction between the SV and the DV is a function of orientation mismatch and a virtual torsion spring placed at the docking point. The interaction affects the SV’s angular motion which is governed by

\[
J\dot{\omega} + \omega^{\times} J\omega = \tau + \tau_d + \tau_s \tag{6-104}
\]

where $J$ is the moment of inertia matrix, $\omega$ is the SV's angular velocity relative to

the DV, $\tau$ is the SV's control torque input, $\tau_d$ is the DV's control torque input, which is noncooperative, and $\tau_s$ is the interaction moment due to an orientation mismatch between the SV and DV. The superscript $\times$ in Eq. (6-104) denotes the skew-symmetric matrix equivalent of the vector cross product. Since $\tau_d$ is unknown to the SV, it is considered to be a disturbance.
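With all torques set to zero, Eq. (6-104) reduces to the torque-free Euler equations, which conserve the angular-momentum magnitude; this gives a quick self-check of any propagator built around the model. A minimal RK4 sketch, with `numpy` assumed, the inertia taken from Table 6-1, and a hypothetical initial body rate:

```python
import numpy as np

J = np.diag([0.5, 0.25, 1.0])        # SV inertia (Table 6-1), kg*m^2
Jinv = np.linalg.inv(J)

def omega_dot(w, tau=np.zeros(3)):
    # Eq. (6-104) with tau_d = tau_s = 0:  J w' + w x (J w) = tau
    return Jinv @ (tau - np.cross(w, J @ w))

def rk4_step(w, dt):
    k1 = omega_dot(w)
    k2 = omega_dot(w + 0.5*dt*k1)
    k3 = omega_dot(w + 0.5*dt*k2)
    k4 = omega_dot(w + dt*k3)
    return w + dt/6.0*(k1 + 2*k2 + 2*k3 + k4)

w = np.array([0.3, -0.2, 0.4])       # hypothetical initial rate, rad/s
h0 = np.linalg.norm(J @ w)           # angular-momentum magnitude
for _ in range(2000):                # 2 s at dt = 1 ms
    w = rk4_step(w, 1e-3)
h_err = abs(np.linalg.norm(J @ w) - h0)
```

Since $\tau = \tau_d = \tau_s = 0$ here, $\|J\omega\|$ should stay constant up to integration error.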

Equation (6-104) is written as a function of a unit quaternion (i.e., the Euler parameters). Let $\varepsilon = [\varepsilon_1\ \varepsilon_2\ \varepsilon_3]^T$ and $\eta$ be defined as

\[
\varepsilon = n \sin\frac{\theta}{2}, \qquad \eta = \cos\frac{\theta}{2}, \tag{6-105}
\]

which represent the rotation of a vector about a unit vector $n$ by an angle $\theta$. The spring torque is expressed in terms of $\varepsilon$ and $\eta$ as

\[
\tau_s = -k\theta n = \begin{cases} -2k\cos^{-1}(\eta)\,\dfrac{\varepsilon}{\|\varepsilon\|} & \text{if } \varepsilon \neq 0_3, \\[4pt] 0_3 & \text{if } \varepsilon = 0_3, \end{cases} \tag{6-106}
\]

where $k$ is the spring constant. For simplicity, the spring stiffness is assumed uniform in every direction (i.e., $k$ is a scalar). A damping effect could also be considered, but since it would only act to correct the orientation mismatch, it is not considered here. The angular velocity can be expressed in terms of the Euler parameters and their time derivatives, as shown in [75], as

\[
\begin{bmatrix} \omega \\ 0 \end{bmatrix}
= \begin{bmatrix} \omega_1 \\ \omega_2 \\ \omega_3 \\ 0 \end{bmatrix}
= 2\begin{bmatrix}
\eta & \varepsilon_3 & -\varepsilon_2 & -\varepsilon_1 \\
-\varepsilon_3 & \eta & \varepsilon_1 & -\varepsilon_2 \\
\varepsilon_2 & -\varepsilon_1 & \eta & -\varepsilon_3 \\
\varepsilon_1 & \varepsilon_2 & \varepsilon_3 & \eta
\end{bmatrix}
\begin{bmatrix} \dot{\varepsilon}_1 \\ \dot{\varepsilon}_2 \\ \dot{\varepsilon}_3 \\ \dot{\eta} \end{bmatrix} \tag{6-107}
\]

or in a more compact form

\begin{align}
\omega &= 2\left( \eta\,\mathbb{1}_{3\times 3} - \varepsilon^{\times} \right)\dot{\varepsilon} - 2\varepsilon\dot{\eta}, \tag{6-108}\\
0 &= \varepsilon^T \dot{\varepsilon} + \eta\dot{\eta}. \nonumber
\end{align}
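The constraint in Eq. (6-108) gives $\dot{\eta} = -\varepsilon^T\dot{\varepsilon}/\eta$, so the kinematics can be written either with $\dot{\eta}$ explicit or with $\eta$ eliminated (the form used below in Eq. (6-109)). The two forms agree identically; a quick numerical check, with `numpy` assumed and arbitrary test values:

```python
import numpy as np

def skew(v):
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

eps = np.array([0.1, -0.2, 0.3])      # vector part, inside the unit ball
eps_dot = np.array([0.5, 0.1, -0.4])  # arbitrary rate
eta = np.sqrt(1.0 - eps @ eps)
eta_dot = -(eps @ eps_dot) / eta      # from 0 = eps^T eps_dot + eta*eta_dot

# Eq. (6-108): omega = 2(eta*I - eps^x) eps_dot - 2 eps eta_dot
w_a = 2.0*(eta*np.eye(3) - skew(eps)) @ eps_dot - 2.0*eps*eta_dot
# Same quantity with eta eliminated, as in Eq. (6-109)
w_b = 2.0*((eta*np.eye(3) - skew(eps)) + np.outer(eps, eps)/eta) @ eps_dot
```

Substituting $-2\varepsilon\dot{\eta} = 2(\varepsilon\varepsilon^T/\eta)\dot{\varepsilon}$ shows the equality term by term.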

Since quaternions are four-dimensional while the problem is 3-DOF, $\eta$ is eliminated and the vector component $\varepsilon$ is treated as the state of the system. Based on the previous assumption that the angular displacement never goes beyond 90°, let

\[
\eta = \sqrt{1 - \varepsilon^T \varepsilon},
\]

and from Eq. (6-108) it can be shown that the first and the second time derivatives are

\begin{align*}
\dot{\eta} &= -\frac{\varepsilon^T \dot{\varepsilon}}{\sqrt{1 - \varepsilon^T \varepsilon}} \\
\ddot{\eta} &= -\frac{\varepsilon^T \ddot{\varepsilon}}{\sqrt{1 - \varepsilon^T \varepsilon}} - \frac{\dot{\varepsilon}^T \dot{\varepsilon}}{\sqrt{1 - \varepsilon^T \varepsilon}} - \frac{\left( \varepsilon^T \dot{\varepsilon} \right)^2}{\left( 1 - \varepsilon^T \varepsilon \right)^{3/2}},
\end{align*}

then the angular velocity and angular acceleration can be expressed as

\begin{align}
\omega &= 2\left[ \left( \sqrt{1 - \varepsilon^T \varepsilon}\,\mathbb{1}_{3\times 3} - \varepsilon^{\times} \right) + \frac{\varepsilon\varepsilon^T}{\sqrt{1 - \varepsilon^T \varepsilon}} \right]\dot{\varepsilon} \tag{6-109}\\
\dot{\omega} &= -2\dot{\varepsilon}^{\times}\dot{\varepsilon} + 2\left( \sqrt{1 - \varepsilon^T \varepsilon}\,\mathbb{1}_{3\times 3} - \varepsilon^{\times} \right)\ddot{\varepsilon}
+ \frac{2}{\sqrt{1 - \varepsilon^T \varepsilon}}\left[ \dot{\varepsilon}^T \dot{\varepsilon} + \frac{\left( \varepsilon^T \dot{\varepsilon} \right)^2}{1 - \varepsilon^T \varepsilon} + \varepsilon^T \ddot{\varepsilon} \right]\varepsilon \tag{6-110}
\end{align}

Finally, Eqs. (6-109) and (6-110) can be substituted into Eq. (6-104) and rearranged to obtain the Euler-Lagrange form

\[
M(\varepsilon)\,\ddot{\varepsilon} + V_m(\varepsilon, \dot{\varepsilon})\,\dot{\varepsilon} + g(\varepsilon) + \tau_d = \tau \tag{6-111}
\]

where

\begin{align}
M(\varepsilon) &= 2J\left[ \left( \sqrt{1 - \varepsilon^T \varepsilon}\,\mathbb{1}_{3\times 3} - \varepsilon^{\times} \right) + \frac{\varepsilon\varepsilon^T}{\sqrt{1 - \varepsilon^T \varepsilon}} \right] \tag{6-112}\\
V_m(\varepsilon, \dot{\varepsilon}) &= 2J\left[ -\dot{\varepsilon}^{\times} + \frac{\varepsilon\dot{\varepsilon}^T}{\sqrt{1 - \varepsilon^T \varepsilon}} + \frac{\left( \varepsilon^T \dot{\varepsilon} \right)\varepsilon\varepsilon^T}{\left( 1 - \varepsilon^T \varepsilon \right)^{3/2}} \right] \nonumber\\
&\quad + 4\left[ \left( \left( \sqrt{1 - \varepsilon^T \varepsilon}\,\mathbb{1}_{3\times 3} - \varepsilon^{\times} \right) + \frac{\varepsilon\varepsilon^T}{\sqrt{1 - \varepsilon^T \varepsilon}} \right)\dot{\varepsilon} \right]^{\times} J\left[ \left( \sqrt{1 - \varepsilon^T \varepsilon}\,\mathbb{1}_{3\times 3} - \varepsilon^{\times} \right) + \frac{\varepsilon\varepsilon^T}{\sqrt{1 - \varepsilon^T \varepsilon}} \right] \tag{6-113}\\
g(\varepsilon) &= -\tau_s = \begin{cases} 2k\cos^{-1}\!\left( \sqrt{1 - \varepsilon^T \varepsilon} \right)\dfrac{\varepsilon}{\|\varepsilon\|} & \text{if } \varepsilon \neq 0_3, \\[4pt] 0_3 & \text{if } \varepsilon = 0_3. \end{cases} \tag{6-114}
\end{align}

The following parameters are chosen for demonstration. Suppose that after docking the SV is expected to stay at the same location and keep the orientation misalignment to the DV as small as possible. Suppose that initially the SV's offset is such that it is rotated about $[1\ 2\ 3]^T$ by an angle $\theta(0) = 10°$,³ which gives the initial condition

\[
q(0) = \varepsilon(0) = \begin{bmatrix} 0.0233 \\ 0.0466 \\ 0.0699 \end{bmatrix}
\]

and the desired trajectory is chosen to be zero:

qd (t) = 03
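These numbers give a quick consistency check of the spring-torque model: for a pure rotation of $\theta$ about a unit axis $n$, Eq. (6-106) reduces exactly to $\tau_s = -k\theta n$. A sketch with `numpy` assumed, using $k$ and $\theta(0)$ from Table 6-1:

```python
import numpy as np

k = 300.0                                          # spring constant, N*m/rad
theta = np.deg2rad(10.0)                           # initial angular displacement
n_hat = np.array([1.0, 2.0, 3.0]) / np.sqrt(14.0)  # rotation axis

eps = n_hat * np.sin(theta/2.0)                    # Euler-parameter vector part
eta = np.cos(theta/2.0)

# Eq. (6-106): tau_s = -2k arccos(eta) eps/||eps|| for eps != 0
tau_s = -2.0*k*np.arccos(eta) * eps/np.linalg.norm(eps)
```

Since $\arccos(\cos(\theta/2)) = \theta/2$ and $\varepsilon/\|\varepsilon\| = n$, the result is $-k\theta n$; the computed $\varepsilon$ also reproduces the initial condition $[0.0233,\ 0.0466,\ 0.0699]^T$ quoted above.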

In this particular problem, a nonzero desired trajectory means the SV tries to change its orientation following the prescribed trajectory. However, since there is no direct control over the behavior of the DV, there is no guarantee the SV can track such trajectories along with the DV while minimizing the interaction. In the ideal case, the disturbance due to the DV's noncooperative behavior is $\tau_d = -R_{22}^{-1} e_2$. However, the controller's performance needs to be tested for an arbitrary disturbance. A test case is composed of two stages: roughly for the first five seconds the disturbance is modeled as the game disturbance; after five seconds the game-based disturbance fades out and an arbitrary disturbance is applied:

\[
\tau_d(t) = f(t)\underbrace{\left( -R_{22}^{-1} e_2(t) \right)}_{\text{game disturbance}}
+ g(t)\underbrace{\begin{bmatrix} 0.1\sin(t) + 0.15\cos(3t) \\ 0.15\sin(2t) + 0.1\cos(t) \\ 3\sin(t) + 1.5\sin(t)\cos(t) \end{bmatrix}}_{\text{arbitrary disturbance}} \tag{6-115}
\]

3 A large number is chosen to demonstrate the controller’s capability to minimize the corresponding error.

where $f(t)$ and $g(t)$ are sigmoid-like functions used to simulate smooth on/off switches between the game-based disturbance and the arbitrary disturbance, as shown in Fig. 6-3:

\begin{align*}
f(t) &= -\frac{1}{1 + \exp(-t + T)} + 1, \\
g(t) &= \frac{1}{1 + \exp(-t + T)},
\end{align*}

where $T = 5$ s for this test.
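The switch functions can be checked numerically: the two weights sum to one at every instant and cross at $t = T$, so the total disturbance blends smoothly between the two models. A sketch with `numpy` assumed:

```python
import numpy as np

T = 5.0  # switch time, s

def f(t):
    return -1.0/(1.0 + np.exp(-t + T)) + 1.0

def g(t):
    return 1.0/(1.0 + np.exp(-t + T))

ts = np.linspace(0.0, 20.0, 201)
total = f(ts) + g(ts)    # weights always sum to one
half = g(T)              # crossover value at t = T
```

At $t = 0$ the arbitrary-disturbance weight $g$ is nearly zero, so the early response is governed by the game-based disturbance, as intended.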


Figure 6-3. $f(t)$ and $g(t)$ as the respective weights on the game-based and arbitrary disturbances

Table 6-1 lists all the parameters used in the simulation. The results with the Stackelberg strategy are shown in Fig. 6-4. Given a large initial angular displacement, the SV's control torque $\tau$ quickly adjusts its attitude and minimizes the attitude error, thus minimizing the resultant interaction with the DV. As shown in Fig. 6-4F, the control torque due to the RISE feedback, which is required to feedback-linearize the dynamics, stays larger than the game-based control input and does not decay to zero.

On the other hand, Fig. 6-4E shows that the game-based control input u, which linearly depends on the tracking errors, quickly converges to zero.

Table 6-1. The simulation parameters for the Stackelberg-RISE controller

Name     Description                              Value                                    Unit
t        Time of the simulation                   20                                       s
k        The spring constant                      300                                      N·m/rad
θ(0)     Initial angular displacement             10                                       deg
q(0)     Initial states ε(0)                      [0.0233, 0.0466, 0.0699]^T               --
q_d      Desired trajectory                       0_3                                      --
α1       Tuning gain on e1                        I_{3×3}                                  1/s
α2       Tuning gain on e2                        2 I_{3×3}                                1/s
β        Tuning gain on µ                         5                                        N·m
k_s      Tuning gain on µ                         20                                       N·m·s
J        Moment of inertia of the SV              diag([0.5, 0.25, 1])                     kg·m²
R11      Weight on u1 in J1 (Stackelberg)         diag([1.2, 1, 1.15])                     --
R21      Weight on u1 in J2 (Stackelberg)         diag([-0.6, -1, -1.4])                   --
R22      Weight on u2 in J2 (Stackelberg)         diag([-3.1579, -1.2821, -1.1006])        --
Q        Weight on x in J1                        the 6×6 symmetric matrix below           --

\[
Q = \begin{bmatrix}
20 & 2.5 & 1.00 & -4.00 & -4.00 & 1.50 \\
2.5 & 10.0 & 2.50 & -4.00 & 1.00 & 3.00 \\
1.00 & 2.50 & 20.00 & 1.50 & 3.00 & 6.00 \\
-4.00 & -4.00 & 1.50 & 0.35 & 0 & 0 \\
-4.00 & 1.00 & 3.00 & 0 & 0.35 & 0 \\
1.50 & 3.00 & 6.00 & 0 & 0 & 0.33
\end{bmatrix}
\]

6.4 Conclusion

A general nonlinear dynamic system can be transformed to a linear time-varying error model to admit differential game-theoretic control strategies. While the result is not perfectly optimal due to the feedback linearization, nonlinearity can still be captured, and as a result the solution can be less limited than in a linearized case where all the nonlinear characteristics of the system dynamics are removed in the first place. With a relatively small noncooperative disturbance, the Stackelberg/RISE feedback control law demonstrates its capability of minimizing the tracking error in a general nonlinear dynamic system. This control design allows both zero-sum and nonzero-sum considerations and therefore provides many ways to model noncooperative characteristics in a system (i.e., if there is absolutely no knowledge a minimax strategy can be employed, and in cases where the noncooperative behaviors can be modeled to some extent a Stackelberg strategy can be applied).


A) The tracking error e1(t). B) q(t) representing the SV's orientation. C) The SV's control torque input τ(t). D) The angular displacement of the SV relative to the DV. E) Game-theoretic contribution to τ. F) RISE feedback contribution to τ.

Figure 6-4. The simulation results for the Stackelberg and RISE controller

CHAPTER 7
CONCLUSION AND FUTURE WORKS

The necessity of understanding spacecraft post-docking with a non-cooperative target (e.g., disabled due to actuator malfunction or communication failure) was motivated by the potential of autonomous spacecraft for various purposes such as debris removal and rescue, and such behavior was investigated through analysis using differential game theory with the Stackelberg strategy. In order to fully analyze the non-cooperative behavior of the target spacecraft and feed the information back into spacecraft design, full nonlinear differential games need to be solved; the difficulty of proceeding with such analysis is attributable to the immaturity of concrete solution techniques for bilevel programming, of which the Stackelberg static game is a subclass. On the other hand, the foundation has been developed for the attitude controller for spacecraft post-docking. Utilization of the Euler-Lagrange structure and the corresponding linearized error model allows the expansion of the system dynamics to a higher fidelity model. Furthermore, the developed controller design can be implemented in real time, providing the analytical solution to the time-varying Stackelberg differential games. Based on these results, future work includes (i) investigating numerical methods to solve the corresponding nonlinear differential games and (ii) expanding the system dynamics to a higher fidelity model (e.g., a contact force model).

APPENDIX A
OPTIMALITY CONDITIONS OF TWO-PERSON STACKELBERG DIFFERENTIAL GAMES

In order to make this manuscript relatively self-contained, the first-order optimality conditions for the open-loop solution to two-person Stackelberg differential games discussed in Chapter 5 are derived in this section using calculus of variations. Suppose a two-person Stackelberg differential game is defined by the differential constraint

x˙ = f (x, u1, u2, t), x(t0) = x0 (A–1)

and two cost functionals

\begin{align}
J_1 &= \Phi_1(x_f, t_f) + \int_{t_0}^{t_f} L_1(x(t), u_1(t), u_2(t), t)\, dt, \tag{A-2}\\
J_2 &= \Phi_2(x_f, t_f) + \int_{t_0}^{t_f} L_2(x(t), u_1(t), u_2(t), t)\, dt, \tag{A-3}
\end{align}

where player 1 and player 2 are considered the follower and the leader, respectively.

A.1 Fixed Final Time

Consider a game described by Eqs. (A-1), (A-2), and (A-3). It is assumed that $t_f$ is fixed and $x(t_f) = x_f$ is free. Assume that the control paths are unconstrained. For an open-loop Stackelberg game, the leader announces his control path $u_2(\cdot)$. Then the follower tries to find the optimal control path $u_1(\cdot)$.

A.1.1 Follower's Strategy

Then the Hamiltonian of the follower is

\[
H_1 = L_1(x(t), u_1(t), u_2(t), t) + \lambda_1^T f(x(t), u_1(t), u_2(t), t), \tag{A-4}
\]

where $u_2(t)$, for now, is assumed to be a prescribed function of time. Then the augmented cost functional of the follower is

\[
J_{a1} = \Phi_1 + \int_{t_0}^{t_f}\left( H_1 - \lambda_1^T \dot{x} \right) dt.
\]

tf T δJa1 = δΦ1 + δ H1 λ x˙ dt. (A–5) − 1 Zt0   A.1.1.1 Variation of the augmented cost functional

Now investigate the variation Eq. (A–17). The variation of the boundary condition is given by

∂Φ ∂Φ ∂Φ ∂Φ δΦ = 1 δx + 1 δt + 1 δx + 1 δt . (A–6) 1 ∂x(t ) 0 ∂t 0 ∂x(t ) f ∂t f  0   0   f   f 

Since only the final state is free, δx0 = 0 and δt0 = δtf = 0, thus Eq. (A–6) becomes

∂Φ δΦ = 1 δx . (A–7) 1 ∂x(t ) f  f  The variation of Hamiltonian is given by

∂H ∂H ∂H ∂H δH = 1 δx + 1 δλ + 1 δu + 1 δu . 1 ∂x ∂λ 1 ∂u 1 ∂u 2    1   1   2 

110 We have

tf tf T T T δ H1 λ x˙ dt = δH1 δλ x˙ λ δx˙ dt − 1 − 1 − 1 Zt0 Zt0     T T + H1 λ1 x˙ δtf H1 λ1 x˙ δt0 − tf − − tf tf   T  T  tf ˙ T = δH1 δλ1 x˙ dt λ1 δx λ1 δxdt − − t0 − Zt0  Zt0   tf   T T T = λ (tf )δx(tf ) + δH1 δλ x˙ + λ˙ δx dt − 1 − 1 1 Zt0 tf   T ∂H1 ∂H1 = λ (tf )δx(tf ) + δx + δλ1 − 1 ∂x ∂λ Zt0    1  ∂H1 ∂H1 T T + δu1 + δu2 δλ x˙ + λ˙ δx dt ∂u ∂u − 1 1  1   2   tf T ∂H1 T = λ (tf )δx(tf ) + + λ˙ δx − 1 ∂x 1 Zt0    ∂H1 T ∂H1 ∂H1 + x˙ δλ1 + δu1 + δu2 dt ∂λ − ∂u ∂u  1    1   2   Therefore the variation of the augmented cost functional of the follower is

tf T δJa1 = δΦ1 + δ H1 λ x˙ dt − 1 Zt0   tf ∂Φ1 T ∂H1 T = δxf λ (tf )δx(tf ) + + λ˙ δx ∂x(t ) − 1 ∂x 1  f  Zt0    ∂H1 T ∂H1 ∂H1 + x˙ δλ1 + δu1 + δu2 dt ∂λ − ∂u ∂u  1    1   2  

Since tf is fixed,

δx(tf ) = δxf (A–8)

T Also, since H1 = Φ1 + λ1 f,

∂H 1 = fT ∂λ1

111 and therefore

tf ∂Φ1 T ∂H1 T δJa1 = λ (tf ) δxf + + λ˙ δx ∂x(t ) − 1 ∂x 1  f   Zt0    T T ∂H1 ∂H1 + f x˙ δλ1 + δu1 + δu2 dt (A–9) − ∂u ∂u  1   2    A.1.1.2 Optimality conditions

Setting the variation (A–24) equal to zero yields a set of differential and algebraic equations

T ∂Φ1 λ1 (tf ) = ∂x(tf ) ∂H λ˙ T = 1 1 − ∂x x˙ = f (A–10) ∂H 1 = 0 ∂u1 ∂H 1 = 0 ∂u2

Equations (A–10) represent the follower’s optimal response to the leader’s control u2 ( ). · If such u1 exists to satisfy Eqs. (A–10), then it is possible to write

u1 = g (x(t), λ1(t), u2(t), t) (A–11) which is used in the subsequent development of the leader’s strategy.

A.1.2 Leader’s Strategy

Now consider the leader’s control. In the Stackelberg structure the leader assumes that the follower’s reaction is based on the leader’s action, which is provided by

Eqs. (A–10). Thus the leader solves the optimization problem with J2 and the constraints with u1 substituted by Eq. (A–11):

tf J2 = Φ2(xf , tf ) + L2 x(t), g (x(t), λ1(t), u2(t), t) , u2(t), t dt { } Zt0

112 x˙ = f (x, g(x(t), λ1(t), u2(t)), u2(t), t) , x(t0) = x0 T T ∂H1 ∂Φ1 λ˙ 1 = h = , λ1(tf ) = − ∂x ∂x(t )    f  Then the Hamiltonian of the leader is

T T H2 = L2 + λ2 f + ψ h (A–12) or

H2 (x, λ1, u2, λ2, ψ, t) = L2 (x, g (x, λ1, u2, t) , u2, t)

T T + λ2 f (x, g (x, λ1, u2, t)) + ψ h (x, λ1, u2, t) (A–13) with the augmented cost functional and its first variation

tf T T Ja2 = Φ2 + H2 λ x˙ ψ λ˙ 1 dt − 2 − Zt0  

tf T T δJa2 = δΦ2 + δ H2 λ x˙ ψ λ˙ 1 dt (A–14) − 2 − Zt0   A.1.2.1 Variation of the augmented cost functional

Now look inside the augmented cost functional (A–14). In a similar manner to the follower’s case, the variation of the boundary condition is found as

∂Φ δΦ = 2 δx 2 ∂x(t ) f  f 

The follower’s strategy u1 is replaced with λ1 and u2, and thus the variation of Hamiltonian becomes

δH2 = δH2 (x, λ1, u2, λ2, ψ, t) ∂H ∂H ∂H ∂H ∂H = 2 δx + 2 δλ + 2 δu + 2 δλ + 2 δψ ∂x ∂λ 1 ∂u 2 ∂λ 2 ∂ψ    1   2   2   

113 The variation of the integrand of Eq. (A–14) is

tf tf T T T T T T δ H2 λ x˙ ψ λ˙ 1 dt = δH2 δλ x˙ λ δx˙ δψ λ˙ 1 ψ δλ˙ 1 dt − 2 − − 2 − 2 − − Zt0 Zt0   tf   T T = δH2 δλ x˙ δψ λ˙ 1 dt − 2 − Zt0  tf  tf T tf ˙ T T tf ˙ T λ2 δx λ2 δxdt ψ δλ1 ψ δλ1dt − t0 − − t0 −  Zt0   Zt0      T T T = λ (tf )δx(tf ) ψ (tf )δλ1(tf ) + ψ (t0)δλ1(t0) − 2 − tf T T T T + δH2 δλ x˙ δψ λ˙ 1 + λ˙ δx + ψ˙ δλ1 dt − 2 − 2 Zt0   From Eq. (A–7), (A–8), and (A–10), the variations δλ1(tf ) and δx(tf ) are related by

∂ δλ1(tf ) = [λ1(tf )] δx(tf ) ∂x(tf ) ∂ ∂Φ T = 1 δx(t ) ∂x(t ) ∂x(t ) f f  f  ∂ ∂Φ T = 1 δx ∂x ∂x f f  f  The total variation of the leader’s augmented cost is then grouped into respective variations:

tf ∂Φ2 T T δJa2 = δxf + δ H2 λ x˙ ψ λ˙1 dt ∂x(t ) − 2 −  f  Zt0   T ∂Φ2 T T ∂ ∂Φ1 T = λ (tf ) ψ (tf ) δxf + ψ (t0)δλ1(t0) ∂x(t ) − 2 − ∂x(t ) ∂x(t )  f  f  f  ! tf T T T T + δH2 δλ x˙ δψ λ˙ 1 + λ˙ δx + ψ˙ δλ1 dt − 2 − 2 Zt0 (A–15)  T ∂Φ2 T T ∂ ∂Φ1 = λ (tf ) ψ (tf ) δxf ∂x(t ) − 2 − ∂x(t ) ∂x(t )  f  f  f  ! tf ∂H ∂H ∂H + 2 + λ˙ T δx + 2 + ψ˙ T δλ + 2 δu ∂x 2 ∂λ 1 ∂u 2 Zt0     1    2  ∂H2 T ∂H2 T + x˙ δλ2 + λ˙ δψ dt ∂λ − ∂ψ − 1  2      

114 A.1.2.2 Optimality conditions

Setting Eq. (A–15) to zero, by setting each group of terms associated with the variations, will yield the optimality conditions for the leader

T ∂Φ2 T T ∂ ∂Φ1 λ (tf ) ψ (tf ) = 0 ∂x(t ) − 2 − ∂x(t ) ∂x(t )  f  f  f  T ψ (t0) = 0 ∂H 2 + λ˙ T = 0 ∂x 2   ∂H 2 + ψ˙ T = 0 (A–16) ∂λ  1  ∂H 2 = 0 ∂u2 ∂H 2 x˙ T = 0 x˙ = f ∂λ − ⇒  2  ∂H2 T λ˙1 = 0 λ˙ 1 = h ∂λ − ⇒  2  Equations (A–16) are solved for the leader’s strategy and the corresponding costates. Once they are obtained, they are substituted to Eqs. (A–10) so that the follower’s strategy can be solved.

A.2 Free Final Time

The case with no constraint on the final time (i.e., free final time) is presented here.

Recall the game defined by Eqs. (A–1), (A–2), and (A–3). Assume that t0 and x(t0) are

fixed and that tf and x(tf ) are flexible. In the same way as the fixed final time case, the game is defined by

x˙ = f (x(t), u1(t), u2(t), t) , x(t0) = x0,

tf J1 = Φ1(xf , tf ) + L1 (x(t), u1(t), u2(t), t) dt, Zt0 tf J2 = Φ2(xf , tf ) + L2 (x(t), u1(t), u2(t), t) dt, Zt0

115 where t0 and x(t0) = x0 are fixed while tf and x(tf ) = xf are free. Again it is assumed that the player 2 is the leader and that the control paths are unconstrained. For an open-loop

Stackelberg game, leader announces his control path u2 ( ). · A.2.1 Follower’s Strategy

The follower tries to find the optimal control path u1 ( ). Then the follower’s · Hamiltonian is

T H1 = L1 (x(t), u1(t), t) + λ1 f (x(t), u1(t), u2(t), t) ,

where u2 is treated as a prescribed function of time. Then the augmented cost functional of the follower is constructed as

tf T Ja1 = Φ1 + H1 λ x˙ dt, − 1 Zt0   with its first variation

tf T δJa1 = δΦ1 + δ H1 λ x˙ dt. (A–17) − 1 Zt0   A.2.1.1 Variation of the augmented cost functional

Now evaluate the variation (A–17). The variation of the boundary condition is given by

∂Φ ∂Φ ∂Φ ∂Φ δΦ = 1 δx + 1 δt + 1 δx + 1 δt (A–18) 1 ∂x(t ) 0 ∂t 0 ∂x(t ) f ∂t f  0   0   f   f 

Since the final time and state are free, δx0 = 0, δt0 = 0, Eq. (A–18) is simplified as

∂Φ ∂Φ δΦ = 1 δx + 1 δt (A–19) 1 ∂x(t ) f ∂t f  f   f  The variation of the follower’s Hamiltonian is given by

∂H ∂H ∂H ∂H δH = 1 δx + 1 δλ + 1 δu + 1 δu (A–20) 1 ∂x ∂λ 1 ∂u 1 ∂u 2    1   1   2 

116 The variation of the integral part of Eq. (A–17) is expanded.

tf tf T T T T δ H1 λ1 x˙ dt = δH1 δλ1 x˙ λ1 δx˙ dt + H1 λ1 x˙ δtf − − − − tf Zt0 Zt0       T H1 λ1 x˙ δt0 − − t0 tf tf   T T tf ˙ T = δH1 δλ1 x˙ dt λ1 δx λ1 δxdt − − t0 − Zt0  Zt0      T + H1 λ1 x˙ δtf − tf  T  T = λ (tf )δx(tf ) + H1(tf ) λ (tf )x˙(tf ) δtf − 1 − 1 tf (A–21) T  T  + δH1 δλ x˙ + λ˙ δx dt − 1 1 Zt0   Substituting Eq. (A–20) into Eq. (A–21) yields

tf T T T δ H1 λ x˙ dt = λ (tf )δx(tf ) + H1(tf ) λ (tf )x˙(tf ) δtf − 1 − 1 − 1 Zt0   tf ∂H  ∂H  + 1 δx + 1 δλ ∂x ∂λ 1 Zt0    1  ∂H ∂H + 1 δu + 1 δu δλT x˙ + λ˙ T δx dt ∂u 1 ∂u 2 − 1 1  1   2   T T = λ (tf )δx(tf ) + H1(tf ) λ (tf )x˙(tf ) δtf − 1 − 1 tf H H ∂ 1 ˙ T ∂ 1  T + + λ1 δx + x˙ δλ1 (A–22) ∂x ∂λ − Zt0    1   ∂H ∂H + 1 δu + 1 δu dt ∂u 1 ∂u 2  1   2   From Eqs. (A–19) and (A–22) the variation of the augmented follower’s variation in Eq. (A–17) becomes

∂Φ ∂Φ δJ = 1 δx + 1 δt a1 ∂x(t ) f ∂t f  f   f  T T λ (tf )δx(tf ) + H1(tf ) λ (tf )x˙(tf ) δtf − 1 − 1 tf (A–23) ∂H1  T ∂H1 T + + λ˙ δx + x˙ δλ1 ∂x 1 ∂λ − Zt0    1   ∂H ∂H + 1 δu + 1 δu dt. ∂u 1 ∂u 2  1   2  

Since $\delta x_f$ and $\delta x(t_f)$ are related by

\[ \delta x(t_f) = \delta x_f - \dot{x}(t_f) \delta t_f, \]

and

\[ \frac{\partial H_1}{\partial \lambda_1} = f^T \]

from $H_1 = L_1 + \lambda_1^T f$, Eq. (A–23) becomes

\begin{align}
\delta J_{a1} &= \left( \frac{\partial \Phi_1}{\partial x(t_f)} - \lambda_1^T(t_f) \right) \delta x_f + \left( \frac{\partial \Phi_1}{\partial t_f} + H_1(t_f) \right) \delta t_f \nonumber \\
&\quad + \int_{t_0}^{t_f} \left[ \left( \frac{\partial H_1}{\partial x} + \dot{\lambda}_1^T \right) \delta x + \left( f^T - \dot{x}^T \right) \delta \lambda_1 + \frac{\partial H_1}{\partial u_1} \delta u_1 + \frac{\partial H_1}{\partial u_2} \delta u_2 \right] dt. \quad (A–24)
\end{align}

A.2.1.2 Optimality conditions

Setting each of the variations in Eq. (A–24) equal to zero yields the follower's optimality conditions

\begin{align}
\lambda_1^T(t_f) &= \frac{\partial \Phi_1}{\partial x(t_f)}, \nonumber \\
H_1(t_f) &= -\frac{\partial \Phi_1}{\partial t_f}, \nonumber \\
\dot{\lambda}_1^T &= -\frac{\partial H_1}{\partial x}, \quad (A–25) \\
\dot{x} &= f, \nonumber \\
\frac{\partial H_1}{\partial u_1} &= 0. \nonumber
\end{align}

Note that the leader's contribution ($u_2$) is not investigated here. Equation (A–25) represents the follower's optimal response to the leader's control $u_2(\cdot)$. If such a $u_1$ exists as an explicit function of $x$, $t$, $\lambda_1$, and $u_2$, then it can be written in the same way as in the fixed-final-time case:

\[ u_1 = g\left( x(t), \lambda_1(t), u_2(t), t \right). \quad (A–26) \]

A.2.2 Leader's strategy

Now consider the leader's control. The cost functional is

\[ J_2 = \Phi_2(x_f, t_f) + \int_{t_0}^{t_f} L_2\left( x(t), g(x(t), \lambda_1(t), u_2(t), t), u_2(t), t \right) dt, \]

subject to the constraints

\begin{align}
\dot{x} &= f\left( x(t), g(x(t), \lambda_1(t), u_2(t), t), u_2(t), t \right), & x(t_0) &= x_0, \nonumber \\
\dot{\lambda}_1 &= h = -\left( \frac{\partial H_1}{\partial x} \right)^T, & \lambda_1(t_f) &= \left( \frac{\partial \Phi_1}{\partial x(t_f)} \right)^T. \nonumber
\end{align}

Then the leader's Hamiltonian is defined as

\[ H_2\left( x, \lambda_1, u_2, \lambda_2, \psi, t \right) = L_2\left( x, g(x, \lambda_1, u_2, t), u_2, t \right) + \lambda_2^T f\left( x, g(x, \lambda_1, u_2, t), u_2, t \right) + \psi^T h\left( x, \lambda_1, u_2, t \right), \]

or in compact form

\[ H_2 = L_2 + \lambda_2^T f + \psi^T h. \]

The augmented cost functional and the corresponding first variation are

\begin{align}
J_{a2} &= \Phi_2 + \int_{t_0}^{t_f} \left( H_2 - \lambda_2^T \dot{x} - \psi^T \dot{\lambda}_1 \right) dt, \quad (A–27) \\
\delta J_{a2} &= \delta \Phi_2 + \delta \int_{t_0}^{t_f} \left( H_2 - \lambda_2^T \dot{x} - \psi^T \dot{\lambda}_1 \right) dt. \quad (A–28)
\end{align}

A.2.2.1 Variation of the augmented cost functional

Now examine the variation of the leader's augmented cost functional, Eq. (A–28). In a manner similar to the follower's case, the variation of the boundary condition term is found as

\[ \delta \Phi_2 = \frac{\partial \Phi_2}{\partial x(t_f)} \delta x_f + \frac{\partial \Phi_2}{\partial t_f} \delta t_f. \quad (A–29) \]

The variation of the Hamiltonian is

\[ \delta H_2 = \frac{\partial H_2}{\partial x} \delta x + \frac{\partial H_2}{\partial \lambda_1} \delta \lambda_1 + \frac{\partial H_2}{\partial u_2} \delta u_2 + \frac{\partial H_2}{\partial \lambda_2} \delta \lambda_2 + \frac{\partial H_2}{\partial \psi} \delta \psi. \]

Expand the integral part of Eq. (A–28), integrating by parts:

\begin{align}
\delta \int_{t_0}^{t_f} \left( H_2 - \lambda_2^T \dot{x} - \psi^T \dot{\lambda}_1 \right) dt
&= \int_{t_0}^{t_f} \left( \delta H_2 - \delta \lambda_2^T \dot{x} - \lambda_2^T \delta \dot{x} - \delta \psi^T \dot{\lambda}_1 - \psi^T \delta \dot{\lambda}_1 \right) dt + \left[ H_2 - \lambda_2^T \dot{x} - \psi^T \dot{\lambda}_1 \right]_{t_f} \delta t_f \nonumber \\
&= \int_{t_0}^{t_f} \left( \delta H_2 - \delta \lambda_2^T \dot{x} - \delta \psi^T \dot{\lambda}_1 \right) dt - \left[ \lambda_2^T \delta x \right]_{t_0}^{t_f} + \int_{t_0}^{t_f} \dot{\lambda}_2^T \delta x \, dt - \left[ \psi^T \delta \lambda_1 \right]_{t_0}^{t_f} + \int_{t_0}^{t_f} \dot{\psi}^T \delta \lambda_1 \, dt \nonumber \\
&\quad + \left[ H_2 - \lambda_2^T \dot{x} - \psi^T \dot{\lambda}_1 \right]_{t_f} \delta t_f \nonumber \\
&= -\lambda_2^T(t_f) \delta x(t_f) - \psi^T(t_f) \delta \lambda_1(t_f) + \int_{t_0}^{t_f} \left( \delta H_2 - \delta \lambda_2^T \dot{x} - \delta \psi^T \dot{\lambda}_1 + \dot{\lambda}_2^T \delta x + \dot{\psi}^T \delta \lambda_1 \right) dt \nonumber \\
&\quad + \left[ H_2 - \lambda_2^T \dot{x} - \psi^T \dot{\lambda}_1 \right]_{t_f} \delta t_f. \quad (A–30)
\end{align}

From Eqs. (A–29) and (A–30), the total variation in Eq. (A–28) becomes

\begin{align}
\delta J_{a2} &= \frac{\partial \Phi_2}{\partial x(t_f)} \delta x_f + \frac{\partial \Phi_2}{\partial t_f} \delta t_f - \lambda_2^T(t_f) \delta x(t_f) - \psi^T(t_f) \delta \lambda_1(t_f) + \left[ H_2 - \lambda_2^T \dot{x} - \psi^T \dot{\lambda}_1 \right]_{t_f} \delta t_f \nonumber \\
&\quad + \int_{t_0}^{t_f} \left[ \left( \frac{\partial H_2}{\partial x} + \dot{\lambda}_2^T \right) \delta x + \left( \frac{\partial H_2}{\partial \lambda_1} + \dot{\psi}^T \right) \delta \lambda_1 + \frac{\partial H_2}{\partial u_2} \delta u_2 + \left( \frac{\partial H_2}{\partial \lambda_2} - \dot{x}^T \right) \delta \lambda_2 + \left( \frac{\partial H_2}{\partial \psi} - \dot{\lambda}_1^T \right) \delta \psi \right] dt. \quad (A–31)
\end{align}

Since

\[ \delta x(t_f) = \delta x_f - \dot{x}(t_f) \delta t_f, \qquad \delta \lambda_1(t_f) = \delta \lambda_{1f} - \dot{\lambda}_1(t_f) \delta t_f, \]

Equation (A–31) becomes

\begin{align}
\delta J_{a2} &= \left( \frac{\partial \Phi_2}{\partial x(t_f)} - \lambda_2^T(t_f) \right) \delta x_f + \left( \frac{\partial \Phi_2}{\partial t_f} + H_2(t_f) \right) \delta t_f - \psi^T(t_f) \delta \lambda_{1f} \nonumber \\
&\quad + \int_{t_0}^{t_f} \left[ \left( \frac{\partial H_2}{\partial x} + \dot{\lambda}_2^T \right) \delta x + \left( \frac{\partial H_2}{\partial \lambda_1} + \dot{\psi}^T \right) \delta \lambda_1 + \frac{\partial H_2}{\partial u_2} \delta u_2 + \left( \frac{\partial H_2}{\partial \lambda_2} - \dot{x}^T \right) \delta \lambda_2 + \left( \frac{\partial H_2}{\partial \psi} - \dot{\lambda}_1^T \right) \delta \psi \right] dt. \quad (A–32)
\end{align}

A.2.2.2 Optimality conditions

Setting each of the variations in Eq. (A–32) to zero, the optimality conditions for the leader are obtained as

\begin{align}
\dot{\lambda}_2^T &= -\frac{\partial H_2}{\partial x}, \nonumber \\
\lambda_2(t_f) &= \left( \frac{\partial \Phi_2}{\partial x(t_f)} \right)^T, \nonumber \\
\dot{\psi} &= -\left( \frac{\partial H_2}{\partial \lambda_1} \right)^T, \nonumber \\
\psi(t_f) &= 0, \quad (A–33) \\
\frac{\partial H_2}{\partial u_2} &= 0, \nonumber \\
\frac{\partial H_2}{\partial \lambda_2} - \dot{x}^T &= 0 \;\Rightarrow\; \dot{x} = f, \nonumber \\
\frac{\partial H_2}{\partial \psi} - \dot{\lambda}_1^T &= 0 \;\Rightarrow\; \dot{\lambda}_1 = h. \nonumber
\end{align}

A.3 Linear-Quadratic Differential Game

Consider a game whose dynamics are linear and whose cost functionals are quadratic in the states and controls $x$, $u_1$, and $u_2$:

\begin{align}
J_1 &= \frac{1}{2} x^T(t_f) K_{1f} x(t_f) + \frac{1}{2} \int_{t_0}^{t_f} \left[ x^T(t) Q_1(t) x(t) + u_1^T(t) R_{11}(t) u_1(t) + u_2^T(t) R_{12}(t) u_2(t) \right] dt, \nonumber \\
J_2 &= \frac{1}{2} x^T(t_f) K_{2f} x(t_f) + \frac{1}{2} \int_{t_0}^{t_f} \left[ x^T(t) Q_2(t) x(t) + u_1^T(t) R_{21}(t) u_1(t) + u_2^T(t) R_{22}(t) u_2(t) \right] dt, \nonumber \\
\dot{x} &= A(t) x(t) + B_1(t) u_1(t) + B_2(t) u_2(t), \qquad x(t_0) = x_0. \nonumber
\end{align}

From now on, the argument $(t)$ will be omitted for simplicity; this should cause no confusion, as the subsequent derivation is valid regardless of whether the coefficient matrices are constant or time-varying.

A.3.1 Fixed Final Time

Assume $t_0$, $x(t_0)$, and $t_f$ are fixed. Then the follower's (Player 1's) Hamiltonian is constructed as

\[ H_1 = \frac{1}{2} \left( x^T Q_1 x + u_1^T R_{11} u_1 + u_2^T R_{12} u_2 \right) + \lambda_1^T \left( A x + B_1 u_1 + B_2 u_2 \right), \]

and the corresponding optimality conditions are obtained from Eq. (A–10) as

\begin{align}
\dot{\lambda}_1^T &= -\frac{\partial H_1}{\partial x} = -\left( x^T Q_1 + \lambda_1^T A \right), \nonumber \\
\lambda_1^T(t_f) &= \frac{\partial}{\partial x_f} \left( \frac{1}{2} x_f^T K_{1f} x_f \right) = x_f^T K_{1f}, \nonumber \\
\frac{\partial H_1}{\partial u_1} &= u_1^T R_{11} + \lambda_1^T B_1 = 0. \nonumber
\end{align}

After simplification, the follower's strategy and costate dynamics are obtained as

\begin{align}
\dot{\lambda}_1 &= -Q_1 x - A^T \lambda_1, & \lambda_1(t_f) &= K_{1f} x_f, \quad (A–34) \\
u_1 &= -R_{11}^{-1} B_1^T \lambda_1. \nonumber
\end{align}

Next, the leader's Hamiltonian is constructed by adjoining Eq. (A–34):

\begin{align}
H_2 &= L_2 + \lambda_2^T f + \psi^T h \nonumber \\
&= \frac{1}{2} \left( x^T Q_2 x + u_1^T R_{21} u_1 + u_2^T R_{22} u_2 \right) + \lambda_2^T \left( A x + B_1 u_1 + B_2 u_2 \right) + \psi^T \left( -Q_1 x - A^T \lambda_1 \right) \nonumber \\
&= \frac{1}{2} \left( x^T Q_2 x + \lambda_1^T B_1 R_{11}^{-1} R_{21} R_{11}^{-1} B_1^T \lambda_1 + u_2^T R_{22} u_2 \right) + \lambda_2^T \left( A x - B_1 R_{11}^{-1} B_1^T \lambda_1 + B_2 u_2 \right) - \psi^T \left( Q_1 x + A^T \lambda_1 \right), \nonumber
\end{align}

and its optimality conditions follow from Eq. (A–16).

A.3.2 Free Final Time

Assume $t_0$ and $x(t_0)$ are fixed, and $t_f$ and $x(t_f) = x_f$ are free. The follower's Hamiltonian is

\[ H_1 = \frac{1}{2} \left( x^T Q_1 x + u_1^T R_{11} u_1 + u_2^T R_{12} u_2 \right) + \lambda_1^T \left( A x + B_1 u_1 + B_2 u_2 \right), \]

and from Eq. (A–25) the optimality conditions for the follower are found as

\begin{align}
\dot{\lambda}_1^T &= -\frac{\partial H_1}{\partial x} = -\left( x^T Q_1 + \lambda_1^T A \right), \nonumber \\
\lambda_1^T(t_f) &= \frac{\partial}{\partial x_f} \left( \frac{1}{2} x_f^T K_{1f} x_f \right) = x_f^T K_{1f}, \nonumber \\
\frac{\partial H_1}{\partial u_1} &= u_1^T R_{11} + \lambda_1^T B_1 = 0, \nonumber \\
H_1(t_f) &= -\frac{\partial}{\partial t_f} \left( \frac{1}{2} x_f^T K_{1f} x_f \right) = 0, \nonumber
\end{align}

which after simplification yield

\begin{align}
\dot{\lambda}_1 &= -Q_1 x - A^T \lambda_1, & \lambda_1(t_f) &= K_{1f} x_f, \quad (A–35) \\
u_1 &= -R_{11}^{-1} B_1^T \lambda_1. \nonumber
\end{align}

The leader's Hamiltonian is constructed by adjoining Eq. (A–35) as

\begin{align}
H_2 &= L_2 + \lambda_2^T f + \psi^T h \nonumber \\
&= \frac{1}{2} \left( x^T Q_2 x + u_1^T R_{21} u_1 + u_2^T R_{22} u_2 \right) + \lambda_2^T \left( A x + B_1 u_1 + B_2 u_2 \right) + \psi^T \left( -Q_1 x - A^T \lambda_1 \right) \nonumber \\
&= \frac{1}{2} \left( x^T Q_2 x + \lambda_1^T B_1 R_{11}^{-1} R_{21} R_{11}^{-1} B_1^T \lambda_1 + u_2^T R_{22} u_2 \right) + \lambda_2^T \left( A x - B_1 R_{11}^{-1} B_1^T \lambda_1 + B_2 u_2 \right) - \psi^T \left( Q_1 x + A^T \lambda_1 \right). \nonumber
\end{align}

Then the leader's optimality conditions are obtained from Eq. (A–33) as

\begin{align}
\dot{\lambda}_2^T &= -\frac{\partial H_2}{\partial x} = -\left( x^T Q_2 + \lambda_2^T A - \psi^T Q_1 \right), \nonumber \\
\lambda_2^T(t_f) &= \frac{\partial}{\partial x_f} \left( \frac{1}{2} x_f^T K_{2f} x_f \right) = x_f^T K_{2f}, \nonumber \\
\dot{\psi}^T &= -\frac{\partial H_2}{\partial \lambda_1} = -\lambda_1^T B_1 R_{11}^{-1} R_{21} R_{11}^{-1} B_1^T + \lambda_2^T B_1 R_{11}^{-1} B_1^T + \psi^T A^T, \nonumber \\
\psi^T(t_f) &= 0, \nonumber \\
\frac{\partial H_2}{\partial u_2} &= u_2^T R_{22} + \lambda_2^T B_2 = 0, \nonumber
\end{align}

which simplify to

\begin{align}
\dot{\lambda}_2 &= -Q_2 x - A^T \lambda_2 + Q_1^T \psi, & \lambda_2(t_f) &= K_{2f} x_f, \nonumber \\
\dot{\psi} &= -B_1 R_{11}^{-1} R_{21} R_{11}^{-1} B_1^T \lambda_1 + B_1 R_{11}^{-1} B_1^T \lambda_2 + A \psi, & \psi(t_f) &= 0, \quad (A–36) \\
u_2 &= -R_{22}^{-1} B_2^T \lambda_2. \nonumber
\end{align}

From Eqs. (A–35) and (A–36), the optimality conditions for the two-player linear-quadratic Stackelberg differential game with free final time are

\begin{align}
u_1 &= -R_{11}^{-1} B_1^T \lambda_1, \nonumber \\
u_2 &= -R_{22}^{-1} B_2^T \lambda_2, \nonumber \\
\dot{\lambda}_1 &= -Q_1 x - A^T \lambda_1, & \lambda_1(t_f) &= K_{1f} x_f, \nonumber \\
\dot{\lambda}_2 &= -Q_2 x - A^T \lambda_2 + Q_1^T \psi, & \lambda_2(t_f) &= K_{2f} x_f, \nonumber \\
\dot{\psi} &= -B_1 R_{11}^{-1} R_{21} R_{11}^{-1} B_1^T \lambda_1 + B_1 R_{11}^{-1} B_1^T \lambda_2 + A \psi, & \psi(t_f) &= 0. \nonumber
\end{align}
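Necessary conditions of this form constitute a two-point boundary-value problem, which can be solved numerically. The sketch below does so for an illustrative scalar instance with a fixed horizon, using SciPy's collocation-based BVP solver; all coefficient values (A, B1, B2, Q1, Q2, R11, R21, R22, K1f, K2f, x0, tf) are assumptions chosen for the example, not values from the dissertation.

```python
# Sketch: solve the open-loop Stackelberg necessary conditions as a
# two-point boundary-value problem for a scalar example (fixed t_f = 1).
import numpy as np
from scipy.integrate import solve_bvp

A, B1, B2 = 1.0, 1.0, 1.0
Q1, Q2 = 1.0, 2.0
R11, R21, R22 = 1.0, 0.5, 1.0
K1f, K2f = 1.0, 1.0
x0, tf = 1.0, 1.0
S = B1 * (R21 / R11**2) * B1        # B1 R11^-1 R21 R11^-1 B1^T (scalar)

def odes(t, y):
    x, l1, l2, psi = y
    dx   = A * x - B1**2 / R11 * l1 - B2**2 / R22 * l2  # state with u1, u2 substituted
    dl1  = -Q1 * x - A * l1                             # follower costate
    dl2  = -Q2 * x - A * l2 + Q1 * psi                  # leader costate
    dpsi = -S * l1 + B1**2 / R11 * l2 + A * psi         # leader's extra costate
    return np.vstack((dx, dl1, dl2, dpsi))

def bc(ya, yb):
    # x(t0) = x0 plus the terminal (transversality) conditions
    return np.array([ya[0] - x0,
                     yb[1] - K1f * yb[0],
                     yb[2] - K2f * yb[0],
                     yb[3]])

t = np.linspace(0.0, tf, 50)
sol = solve_bvp(odes, bc, t, np.zeros((4, t.size)))
u1 = -B1 / R11 * sol.y[1]   # follower strategy u1 = -R11^-1 B1^T lambda1
u2 = -B2 / R22 * sol.y[2]   # leader strategy   u2 = -R22^-1 B2^T lambda2
```

The strategies are recovered pointwise from the costates after the solve; a free final time would additionally require the transversality condition on $H_1(t_f)$, which is omitted in this fixed-horizon sketch.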

APPENDIX B
RISE STABILITY ANALYSIS

The control input obtained in Chapter 6 is based on the assumption that all the other components of the dynamics can be captured by h as in Eq. (6–5), which is not true in general. In this appendix a RISE controller is developed so that what was previously captured by h is asymptotically tracked, enabling the optimal control u to be applied. The development of the RISE controller is nearly identical to [76]. Designing the feedback controller in the Euler-Lagrange form makes the same stability analysis applicable to the different game-theoretic controllers developed in Chapter 6.

B.1 RISE Feedback Control Development

In general, the bounded disturbance $\tau_d(t)$ and the nonlinear dynamics given in Eq. (6–4) are unknown, so the controller given in Eq. (6–5) cannot be implemented. However, if the control input can identify and cancel these effects, then $x(t)$ will converge to the state-space model in Eq. (6–6), in which $u$ and $\tau_d$ minimize the respective performance indices $J_1$ and $J_2$. In this section, a control input is developed that exploits RISE feedback to identify the nonlinear effects and bounded disturbances, thus enabling $x(t)$ to asymptotically converge to the state-space model in Eq. (6–6). To develop the control input, the filtered tracking error in Eq. (6–2) is premultiplied by $M(q)$, and the expressions in Eqs. (6–1) and (6–2) are utilized to obtain

\[ M r = -V_m e_2 + h + \tau_d + \alpha_2 M e_2 - (\tau_F + \tau_L). \quad (B–1) \]

Based on the open-loop error system in Eq. (B–1), the control input is composed of the game-theoretic controllers developed in Eqs. (6–64) and (6–65), plus a subsequently designed auxiliary control term $\mu(t) \in \mathbb{R}^n$, as

\[ \tau_F + \tau_L \triangleq \mu - (u_F + u_L). \quad (B–2) \]

The closed-loop tracking error system can be developed by substituting Eq. (B–2) into Eq. (B–1) as

\[ M r = -V_m e_2 + h + \tau_d + \alpha_2 M e_2 + (u_F + u_L) - \mu. \quad (B–3) \]

To facilitate the subsequent stability analysis, the auxiliary function $f_d(t) \in \mathbb{R}^n$, defined as

\[ f_d \triangleq M(q_d) \ddot{q}_d + V_m(q_d, \dot{q}_d) \dot{q}_d + G(q_d) + F(\dot{q}_d), \quad (B–4) \]

is added and subtracted in Eq. (B–3) to yield

\[ M r = -V_m e_2 + \bar{h} + f_d + \tau_d + (u_F + u_L) - \mu + \alpha_2 M e_2, \quad (B–5) \]

where $\bar{h} \in \mathbb{R}^n$ is defined as

\[ \bar{h} \triangleq h - f_d. \]

Substituting Eq. (6–65) into Eq. (B–5), taking the time derivative, and manipulating with Eq. (6–2) yields

\[ M \dot{r} = -\frac{1}{2} \dot{M} r + \tilde{N} + N_D - e_2 - \left( R_{11}^{-1} + R_{22}^{-1} \right) r - \dot{\mu}, \quad (B–6) \]

after strategically grouping specific terms. In Eq. (B–6), the unmeasurable auxiliary terms $\tilde{N}(e_1, e_2, r, t), N_D(t) \in \mathbb{R}^n$ are defined as

\begin{align}
\tilde{N} &\triangleq -\dot{V}_m e_2 - V_m \dot{e}_2 - \frac{1}{2} \dot{M} r + \dot{\bar{h}} + \alpha_2 M \dot{e}_2 + e_2 + \left( R_{11}^{-1} + R_{22}^{-1} \right) \alpha_2 e_2, \nonumber \\
N_D &\triangleq \dot{f}_d + \dot{\tau}_d. \nonumber
\end{align}

The Mean Value Theorem and Assumptions 3, 4, and 5 in [77] can be used to upper bound the auxiliary terms as

\[ \left\| \tilde{N}(t) \right\| \leq \rho\left( \|y\| \right) \|y\|, \qquad \|N_D\| \leq \zeta_1, \qquad \left\| \dot{N}_D \right\| \leq \zeta_2, \quad (B–7) \]

where $y(t) \in \mathbb{R}^{3n}$ is defined as

\[ y(t) \triangleq \begin{bmatrix} e_1^T & e_2^T & r^T \end{bmatrix}^T, \]

the bounding function $\rho(\|y\|) \in \mathbb{R}$ is a positive, globally invertible, nondecreasing function, and $\zeta_i \in \mathbb{R}$, $i = 1, 2$, denote known positive constants. Based on Eq. (B–6), the control term $\mu(t)$ is designed as the generalized solution to

\[ \dot{\mu}(t) \triangleq k_s r(t) + \beta_1 \mathrm{sgn}(e_2), \quad (B–8) \]

where $k_s, \beta_1 \in \mathbb{R}$ are positive constant control gains. The closed-loop error system for $r(t)$ can now be obtained by substituting Eq. (B–8) into Eq. (B–6) as

\[ M \dot{r} = -\frac{1}{2} \dot{M} r + \tilde{N} + N_D - e_2 - \left( R_{11}^{-1} + R_{22}^{-1} \right) r - k_s r - \beta_1 \mathrm{sgn}(e_2). \quad (B–9) \]

B.2 Stability Analysis

It can be shown that the controller given by Eqs. (6–64), (6–65), (B–2), and (B–8) ensures that all system signals are bounded under closed-loop operation, and that the tracking errors are regulated in the sense that (see [44] for similar details)

\[ \|e_1(t)\|, \|e_2(t)\|, \|r(t)\| \to 0 \quad \text{as} \quad t \to \infty. \quad (B–10) \]

The boundedness of the closed-loop signals and the result in Eq. (B–10) can be obtained provided the control gain $k_s$ introduced in Eq. (B–8) is selected sufficiently large (see the subsequent stability analysis), and $\alpha_1$ and $\alpha_2$ are selected according to the sufficient conditions

\[ \lambda_{\min}(\alpha_1) > \frac{1}{2}, \qquad \lambda_{\min}(\alpha_2) > 1, \quad (B–11) \]

where λmin(α1) and λmin(α2) are the minimum eigenvalues of α1 and α2, respectively.

The gain β1 is selected according to the following sufficient condition:

\[ \beta_1 > \zeta_1 + \frac{\zeta_2}{\lambda_{\min}(\alpha_2)}. \quad (B–12) \]
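The gain conditions (B–11) and (B–12) are simple eigenvalue inequalities and can be checked numerically before running the controller. A minimal sketch follows; the gain matrices and the bounds ζ1, ζ2 below are illustrative assumptions, not values from the dissertation.

```python
# Check the sufficient gain conditions (B-11) and (B-12) for assumed values.
import numpy as np

alpha1 = np.diag([2.0, 3.0])       # candidate gain matrices (assumed)
alpha2 = np.diag([1.5, 2.0])
zeta1, zeta2 = 0.4, 0.6            # assumed bounds on ||N_D|| and ||N_D dot||

lam1 = np.linalg.eigvalsh(alpha1).min()
lam2 = np.linalg.eigvalsh(alpha2).min()
beta1 = zeta1 + zeta2 / lam2 + 0.1  # pick beta1 strictly above the (B-12) bound

assert lam1 > 0.5                     # (B-11): lambda_min(alpha1) > 1/2
assert lam2 > 1.0                     # (B-11): lambda_min(alpha2) > 1
assert beta1 > zeta1 + zeta2 / lam2   # (B-12)
```

Such a check is cheap insurance: if any assertion fails, the sufficient conditions of the stability proof do not hold for the chosen gains.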

Let the Lyapunov function $V_L(\Phi, t): \mathcal{D} \times [0, \infty) \to \mathbb{R}$ be a continuously differentiable, positive definite function defined in [44] as

\[ V_L(\Phi, t) \triangleq \|e_1\|^2 + \frac{1}{2} \|e_2\|^2 + \frac{1}{2} r^T M r + O, \quad (B–13) \]

where the auxiliary function $O(t) \in \mathbb{R}$ is the solution to (see [44] for further details)

\begin{align}
\dot{O} &\triangleq -r^T \left( N_D - \beta_1 \mathrm{sgn}(e_2) \right), \quad (B–14) \\
O(0) &= \beta_1 \sum_{i=1}^{n} |e_{2i}(0)| - e_2^T(0) N_D(0). \nonumber
\end{align}

Taking the time derivative of Eq. (B–13) yields

\[ \dot{V}_L = 2 e_1^T \dot{e}_1 + e_2^T \dot{e}_2 + r^T M \dot{r} + \frac{1}{2} r^T \dot{M} r + \dot{O}. \]

Utilizing Eqs. (6–2), (B–9), and (B–14), the Lyapunov derivative is rewritten as

\[ \dot{V}_L(\Phi, t) \leq -2 e_1^T \alpha_1 e_1 + 2 e_2^T e_1 + r^T \tilde{N} - \left( k_s + \lambda_{\min}\left( R_{11}^{-1} + R_{22}^{-1} \right) \right) \|r\|^2 - \lambda_{\min}(\alpha_2) \|e_2\|^2. \quad (B–15) \]

Utilizing Eq. (B–7), Eq. (B–15) can be further simplified as

\[ \dot{V}_L \leq -\lambda_3 \|y\|^2 - \left[ k_s \|r\|^2 - \rho(\|y\|) \|r\| \|y\| \right], \quad (B–16) \]

where

\[ \lambda_3 \triangleq \min \left\{ 2 \lambda_{\min}(\alpha_1) - 1, \; \lambda_{\min}(\alpha_2) - 1, \; \lambda_{\min}\left( R_{11}^{-1} + R_{22}^{-1} \right) \right\}. \quad (B–17) \]

Completing the squares for the terms inside the brackets in Eq. (B–16) yields

\[ \dot{V}_L \leq -\lambda_3 \|y\|^2 + \frac{\rho^2(\|y\|) \|y\|^2}{4 k_s} \leq -U(\Phi), \quad (B–18) \]

where $U(\Phi) = c \|y\|^2$ for some positive constant $c$. The function $U(\Phi)$ is continuous, positive semi-definite, and defined within the closed set

\[ \mathcal{D} \triangleq \left\{ \Phi \in \mathbb{R}^{3n+1} \;\middle|\; \|\Phi\| \leq \rho^{-1}\left( 2 \sqrt{\lambda_3 k_s} \right) \right\}. \]

The inequality in Eq. (B–18) can be used to show that $V_L(\Phi, t) \in \mathcal{L}_\infty$ in $\mathcal{D}$; hence, $e_1(t)$, $e_2(t)$, and $r(t) \in \mathcal{L}_\infty$ in $\mathcal{D}$. Then standard linear analysis methods can be used to prove from Eq. (6–2) that $\dot{e}_1(t), \dot{e}_2(t) \in \mathcal{L}_\infty$ in $\mathcal{D}$. Since $e_1(t), e_2(t), r(t) \in \mathcal{L}_\infty$ in $\mathcal{D}$, Assumption 4 is used along with Eq. (6–2) to conclude that $q(t), \dot{q}(t), \ddot{q}(t) \in \mathcal{L}_\infty$ in $\mathcal{D}$, which is then combined with Assumption 3 to conclude that $M(q)$, $V_m(q, \dot{q})$, $G(q)$, and $F(\dot{q}) \in \mathcal{L}_\infty$ in $\mathcal{D}$. Thus, from Eq. (6–1) and Assumption 4, it can be shown that $\tau_L(t), \tau_F(t) \in \mathcal{L}_\infty$ in $\mathcal{D}$. With $r(t) \in \mathcal{L}_\infty$ in $\mathcal{D}$, it can be shown that $\dot{\mu}(t) \in \mathcal{L}_\infty$ in $\mathcal{D}$; hence, Eq. (B–9) can be used to show that $\dot{r}(t) \in \mathcal{L}_\infty$ in $\mathcal{D}$. From $\dot{e}_1(t), \dot{e}_2(t), \dot{r}(t) \in \mathcal{L}_\infty$ in $\mathcal{D}$, the definitions for $U(y)$ and $z(t)$ can be used to prove that $U(y)$ is uniformly continuous in $\mathcal{D}$.

Using arguments similar to those given in [44], it can be shown that

\[ c \|y(t)\|^2 \to 0 \quad \text{as} \quad t \to \infty, \qquad \forall \, y(0) \in \mathcal{S}. \quad (B–19) \]

Since $u_F(t), u_L(t) \to 0$ as $e_2(t) \to 0$ from Eqs. (6–64) and (6–65), Eq. (B–5) can be used to conclude that

\[ \mu \to \bar{h} + f_d + \tau_d \quad \text{as} \quad r(t), e_2(t) \to 0. \quad (B–20) \]

Equation (B–20) indicates that the dynamics in Eq. (6–1) converge to the state-space model in Eq. (6–7). Hence, $u_F(t)$ and $u_L(t)$ converge to the optimal controllers that solve the game defined in Eq. (6–8), provided the gain constraints in Eq. (6–63) are satisfied.

APPENDIX C
COSTATE ESTIMATION FOR THE TRANSCRIBED STACKELBERG GAMES

In Section 3.2 a two-player Stackelberg differential game was discretized at the Legendre-Gauss-Lobatto (LGL) points to form a static Stackelberg game, or bilevel programming, problem. Similarity between the solution of an optimal control problem and that of the transcribed nonlinear programming problem is ensured by checking whether the solution of the transcribed problem satisfies the optimality conditions of the original problem. This process is called costate estimation (because the transcribed solution estimates the costate dynamics of the original problem) or costate mapping (because it is done by finding the mapping between the KKT multipliers of the nonlinear programming problem and the costates of the optimal control problem). In optimal control, [78] performed the costate estimation for the LGL transcription (the Legendre pseudospectral method, LPM) and showed that the KKT multiplier at each LGL collocation point equals the costate weighted by the corresponding LGL weight. In this appendix the LPM costate estimation in [78] is extended to open-loop two-player Stackelberg differential games, so that the bilevel programming problem, once solved, provides a numerical solution to the differential game problem satisfying the exact optimality conditions at each collocation point.

C.1 Transformed Optimality Conditions

First, derive the optimality conditions of the original two-player Stackelberg differential game problem in the transformed time domain; these will be compared with the solution of the transcribed static problem. A problem defined by

Minimize

\begin{align}
J_1 &= \Phi_1\left( x(t_0), t_0, x(t_f), t_f \right) + \int_{t_0}^{t_f} M\left( x(t), u(t), v(t), t \right) dt, \nonumber \\
J_2 &= \Phi_2\left( x(t_0), t_0, x(t_f), t_f \right) + \int_{t_0}^{t_f} N\left( x(t), u(t), v(t), t \right) dt, \nonumber
\end{align}

subject to

\[ \dot{x} = f\left( x(t), u(t), v(t), t \right), \qquad x(t_0) = x_0, \]

is redefined in the $\tau$-domain through

\[ t = \frac{(t_f - t_0)\tau + (t_f + t_0)}{2}, \]

which yields a new problem:

Minimize

\begin{align}
J_1 &= \Phi_1\left( x(-1), -1, x(1), 1 \right) + \frac{t_f - t_0}{2} \int_{-1}^{1} M\left( x(\tau), u(\tau), v(\tau), \tau \right) d\tau, \nonumber \\
J_2 &= \Phi_2\left( x(-1), -1, x(1), 1 \right) + \frac{t_f - t_0}{2} \int_{-1}^{1} N\left( x(\tau), u(\tau), v(\tau), \tau \right) d\tau, \nonumber
\end{align}

subject to

\[ \dot{x} = \frac{t_f - t_0}{2} f\left( x(\tau), u(\tau), v(\tau), \tau \right), \qquad x(-1) = x_0. \]

The costate dynamics are derived in Appendix A:

\begin{align}
H_1 &= \frac{t_f - t_0}{2} \left( M + \lambda^T f \right), \nonumber \\
\dot{\lambda} &= -\left( \frac{\partial H_1}{\partial x} \right)^T = \frac{t_f - t_0}{2} \underbrace{\left[ -\left( \frac{\partial M}{\partial x} \right)^T - \left( \frac{\partial f}{\partial x} \right)^T \lambda \right]}_{h}. \quad (C–1)
\end{align}

Letting $\dot{\lambda} = \frac{t_f - t_0}{2} h$,

\begin{align}
H_2 &= \frac{t_f - t_0}{2} \left( N + \mu^T f + \psi^T h \right), \nonumber \\
\dot{\mu} &= -\left( \frac{\partial H_2}{\partial x} \right)^T = \frac{t_f - t_0}{2} \left[ -\left( \frac{\partial N}{\partial x} \right)^T - \left( \frac{\partial f}{\partial x} \right)^T \mu - \left( \frac{\partial h}{\partial x} \right)^T \psi \right] \nonumber \\
&= \frac{t_f - t_0}{2} \left[ -\left( \frac{\partial N}{\partial x} \right)^T - \left( \frac{\partial f}{\partial x} \right)^T \mu + \left( \frac{\partial}{\partial x} \left\{ \left( \frac{\partial M}{\partial x} \right)^T + \left( \frac{\partial f}{\partial x} \right)^T \lambda \right\} \right)^T \psi \right], \quad (C–2) \\
\dot{\psi} &= -\left( \frac{\partial H_2}{\partial \lambda} \right)^T = \frac{t_f - t_0}{2} \left[ -\left( \frac{\partial N}{\partial \lambda} \right)^T - \left( \frac{\partial f}{\partial \lambda} \right)^T \mu - \left( \frac{\partial h}{\partial \lambda} \right)^T \psi \right] \nonumber \\
&= \frac{t_f - t_0}{2} \left[ -\left( \frac{\partial N}{\partial \lambda} \right)^T - \left( \frac{\partial f}{\partial \lambda} \right)^T \mu + \left( \frac{\partial}{\partial \lambda} \left\{ \left( \frac{\partial M}{\partial x} \right)^T + \left( \frac{\partial f}{\partial x} \right)^T \lambda \right\} \right)^T \psi \right]. \quad (C–3)
\end{align}

Equations (C–1), (C–2), and (C–3) are then discretized at the LGL points and compared with the optimality conditions for the transcribed problem.

C.2 Discretization of Two-person Stackelberg Differential Games

The states $x(t)$ and the controls $u = u_1(t)$, $v = u_2(t)$ in the original time domain are

\begin{align}
x(t) &= \begin{bmatrix} x_1(t) & x_2(t) & \cdots & x_n(t) \end{bmatrix}^T \in \mathbb{R}^{n \times 1}, \quad (C–4) \\
u(t) &= \begin{bmatrix} u_1(t) & u_2(t) & \cdots & u_m(t) \end{bmatrix}^T \in \mathbb{R}^{m \times 1}, \quad (C–5) \\
v(t) &= \begin{bmatrix} v_1(t) & v_2(t) & \cdots & v_m(t) \end{bmatrix}^T \in \mathbb{R}^{m \times 1}, \quad (C–6)
\end{align}

defined on the time domain $t \in [t_0, t_f]$. The domain is transformed to $\tau \in [-1, 1]$ to match the LGL points, through

\[ \tau = \frac{(t - t_0) - (t_f - t)}{t_f - t_0}, \]

such that $t = t_0$ and $t = t_f$ correspond to $\tau = \tau_0 = -1$ and $\tau = \tau_N = 1$, respectively. The transformed states and controls are discretized at the $N + 1$ LGL points, such that

\begin{align}
x_0 &:= x(\tau_0 = -1), & x_1 &:= x(\tau_1), & \ldots, & & x_N &:= x(\tau_N = 1), \nonumber \\
u_0 &:= u(\tau_0 = -1), & u_1 &:= u(\tau_1), & \ldots, & & u_N &:= u(\tau_N = 1), \nonumber \\
v_0 &:= v(\tau_0 = -1), & v_1 &:= v(\tau_1), & \ldots, & & v_N &:= v(\tau_N = 1), \nonumber
\end{align}

and the approximating polynomials for $x(\tau)$, $u(\tau)$, and $v(\tau)$ are formed as

\begin{align}
x(\tau) &\approx \tilde{x}(\tau) = \sum_{l=0}^{N} \phi_l(\tau) x_l, \nonumber \\
u(\tau) &\approx \tilde{u}(\tau) = \sum_{l=0}^{N} \phi_l(\tau) u_l, \nonumber \\
v(\tau) &\approx \tilde{v}(\tau) = \sum_{l=0}^{N} \phi_l(\tau) v_l, \nonumber
\end{align}

where $\tilde{(\cdot)}$ indicates an approximated parameter to be optimized in the bilevel programming problem, and

\[ \phi_k(\tau) = \frac{1}{N(N+1) L_N(\tau_k)} \frac{(\tau^2 - 1) \dot{L}_N(\tau)}{\tau - \tau_k}, \qquad k = 0, \ldots, N, \]

are the Lagrange polynomials of order $N$, where $L_N(\tau)$ is the Legendre polynomial of order $N$. The integrands of the cost functionals are discretized in the same manner:

\begin{align}
M_i &:= M(x_i, u_i, v_i, \tau_i), \nonumber \\
N_i &:= N(x_i, u_i, v_i, \tau_i). \nonumber
\end{align}

The approximate derivative of $x$ is obtained by differentiating $\tilde{x}(\tau)$. Since the nodal values $x_l$ are constants, the derivative at $\tau_k$, $k = 0, \ldots, N$, can be written as

\[ \dot{x}(\tau_k) \approx \dot{\tilde{x}}_k = \sum_{l=0}^{N} \dot{\phi}_l(\tau_k) x_l = \sum_{l=0}^{N} D_{kl} x_l, \quad (C–7) \]

where $D \in \mathbb{R}^{(N+1) \times (N+1)}$ is the differentiation matrix consisting of $\dot{\phi}_l(\tau)$ evaluated at each LGL point:

\[ D_{kl} = \begin{cases} \dfrac{L_N(\tau_k)}{L_N(\tau_l)} \dfrac{1}{\tau_k - \tau_l}, & k \neq l, \\[1ex] -\dfrac{N(N+1)}{4}, & k = l = 0, \\[1ex] \dfrac{N(N+1)}{4}, & k = l = N, \\[1ex] 0 & \text{otherwise}. \end{cases} \]

Finally, a bilevel programming problem is formulated as follows:

(Follower) Minimize

\[ \tilde{J}_1 = \Phi_1 + \frac{t_f - t_0}{2} \sum_{i=0}^{N} w_i M_i, \quad (C–8) \]

subject to

\[ \dot{\tilde{x}}_k = \frac{t_f - t_0}{2} \tilde{f}_k. \]

(Leader) Minimize

\[ \tilde{J}_2 = \Phi_2 + \frac{t_f - t_0}{2} \sum_{i=0}^{N} w_i N_i, \quad (C–9) \]

subject to

\[ \dot{\tilde{x}}_k = \frac{t_f - t_0}{2} \tilde{f}_k. \]
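The LGL nodes, the quadrature weights $w_i$, and the differentiation matrix $D$ used in this transcription are easy to generate numerically. A minimal sketch with numpy follows ($N = 5$ is an illustrative choice):

```python
# Build the LGL nodes, quadrature weights, and differentiation matrix D.
import numpy as np
from numpy.polynomial import legendre as leg

def lgl(N):
    cN = np.zeros(N + 1)
    cN[N] = 1.0                            # Legendre-series coefficients of L_N
    # interior nodes are the roots of L_N'; endpoints are -1 and +1
    x = np.concatenate(([-1.0], np.sort(leg.legroots(leg.legder(cN))), [1.0]))
    LN = leg.legval(x, cN)                 # L_N evaluated at the nodes
    w = 2.0 / (N * (N + 1) * LN**2)        # LGL quadrature weights
    D = np.zeros((N + 1, N + 1))
    for k in range(N + 1):
        for l in range(N + 1):
            if k != l:
                D[k, l] = LN[k] / (LN[l] * (x[k] - x[l]))
    D[0, 0] = -N * (N + 1) / 4.0
    D[N, N] = N * (N + 1) / 4.0
    return x, w, D

N = 5
x, w, D = lgl(N)
# D differentiates polynomials of degree <= N exactly: check on p(x) = x^3
assert np.allclose(D @ x**3, 3 * x**2)
# quadrature is exact up to degree 2N - 1: integral of x^4 over [-1, 1] is 2/5
assert np.isclose(w @ x**4, 2.0 / 5.0)
# summation-by-parts identity: W D + D^T W = diag(-1, 0, ..., 0, 1)
W = np.diag(w)
B = np.zeros((N + 1, N + 1)); B[0, 0] = -1.0; B[N, N] = 1.0
assert np.allclose(W @ D + D.T @ W, B)
```

The final assertion checks the summation-by-parts identity, the discrete analogue of integration by parts for the weighted operator $W D$, which is what allows KKT multipliers and costates to be related through the weights $w_i$ in the costate mapping.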

C.3 KKT Conditions and Costate Mapping

The follower’s augmented cost function is

\[ \tilde{J}_{a1} = \Phi_1 + \tilde{\nu}^T \phi + \frac{t_f - t_0}{2} \sum_{i=0}^{N} w_i M_i + \sum_{i=0}^{N} \tilde{\lambda}_i^T \left( \frac{t_f - t_0}{2} \tilde{f}_i - \dot{\tilde{x}}_i \right). \]

Consider the KKT condition associated with $\tilde{x}_k$:

\begin{align}
\frac{\partial \tilde{J}_{a1}}{\partial \tilde{x}_k}
&= \frac{t_f - t_0}{2} \left( w_0 \frac{\partial M_0}{\partial \tilde{x}_k} + w_1 \frac{\partial M_1}{\partial \tilde{x}_k} + \cdots + w_N \frac{\partial M_N}{\partial \tilde{x}_k} \right) + \frac{t_f - t_0}{2} \left( \tilde{\lambda}_0^T \frac{\partial \tilde{f}_0}{\partial \tilde{x}_k} + \tilde{\lambda}_1^T \frac{\partial \tilde{f}_1}{\partial \tilde{x}_k} + \cdots + \tilde{\lambda}_N^T \frac{\partial \tilde{f}_N}{\partial \tilde{x}_k} \right) \nonumber \\
&\quad - \left( \tilde{\lambda}_0^T \frac{\partial \dot{\tilde{x}}_0}{\partial \tilde{x}_k} + \tilde{\lambda}_1^T \frac{\partial \dot{\tilde{x}}_1}{\partial \tilde{x}_k} + \cdots + \tilde{\lambda}_N^T \frac{\partial \dot{\tilde{x}}_N}{\partial \tilde{x}_k} \right) \nonumber \\
&= \frac{t_f - t_0}{2} w_k \left( \frac{\partial M_k}{\partial x_k} + \frac{\tilde{\lambda}_k^T}{w_k} \frac{\partial f_k}{\partial x_k} \right) + w_k \sum_{i=0}^{N} D_{ki} \frac{\tilde{\lambda}_i^T}{w_i} = 0,
\end{align}

or, dividing by $w_k$ and transposing,

\[ \sum_{i=0}^{N} D_{ki} \frac{\tilde{\lambda}_i}{w_i} + \frac{t_f - t_0}{2} \left[ \left( \frac{\partial M_k}{\partial x_k} \right)^T + \left( \frac{\partial f_k}{\partial x_k} \right)^T \frac{\tilde{\lambda}_k}{w_k} \right] = 0, \qquad k = 1, \ldots, N. \quad (C–10) \]

Now compare Eq. (C–10) with Eq. (C–1). First discretize Eq. (C–1) and evaluate it at $\tau_k$: $\dot{\lambda}_k$ can be written as

\[ \dot{\lambda}_k = \sum_{i=0}^{N} D_{ki} \lambda_i, \]

which, substituted into Eq. (C–1), yields

\[ \sum_{i=0}^{N} D_{ki} \lambda_i + \frac{t_f - t_0}{2} \left[ \left( \frac{\partial M_k}{\partial x_k} \right)^T + \left( \frac{\partial f_k}{\partial x_k} \right)^T \lambda_k \right] = 0. \quad (C–11) \]

Equations (C–10) and (C–11) are related by the mapping

\[ \tilde{\lambda}_i = w_i \lambda_i, \qquad i = 0, \ldots, N, \quad (C–12) \]

that is, the KKT multipliers for the discretized follower's problem match the follower's costates in the original differential game problem, weighted by the LGL weights. The leader's augmented cost function is constructed as

\begin{align}
\tilde{J}_{a2} &= \Phi_2 + \tilde{\nu}^T \phi + \frac{t_f - t_0}{2} \sum_{i=0}^{N} w_i N_i + \sum_{i=0}^{N} \tilde{\mu}_i^T \left( \frac{t_f - t_0}{2} \tilde{f}_i - \dot{\tilde{x}}_i \right) \nonumber \\
&\quad - \sum_{i=0}^{N} \tilde{\psi}_i^T \left( \sum_{l=0}^{N} D_{il} \frac{\tilde{\lambda}_l}{w_l} + \frac{t_f - t_0}{2} \left[ \left( \frac{\partial M_i}{\partial \tilde{x}_i} \right)^T + \left( \frac{\partial \tilde{f}_i}{\partial \tilde{x}_i} \right)^T \frac{\tilde{\lambda}_i}{w_i} \right] \right). \nonumber
\end{align}

Note that Eq. (C–10) has the form $\dot{\lambda} - h$ and is therefore adjoined with a minus sign, making it read as $h - \dot{\lambda}$, which is consistent with the augmented cost functional for the differential game in Appendix A. The leader's KKT condition associated with $\tilde{x}_k$ is

\begin{align}
\frac{\partial \tilde{J}_{a2}}{\partial \tilde{x}_k}
&= \frac{t_f - t_0}{2} w_k \left( \frac{\partial N_k}{\partial \tilde{x}_k} + \frac{\tilde{\mu}_k^T}{w_k} \frac{\partial \tilde{f}_k}{\partial \tilde{x}_k} - \frac{\tilde{\psi}_k^T}{w_k} \frac{\partial}{\partial \tilde{x}_k} \left[ \left( \frac{\partial M_k}{\partial \tilde{x}_k} \right)^T + \left( \frac{\partial \tilde{f}_k}{\partial \tilde{x}_k} \right)^T \frac{\tilde{\lambda}_k}{w_k} \right] \right) + w_k \sum_{i=0}^{N} D_{ki} \frac{\tilde{\mu}_i^T}{w_i} = 0,
\end{align}

which is divided by $w_k$ and transposed to yield

\begin{align}
\sum_{i=0}^{N} D_{ki} \frac{\tilde{\mu}_i}{w_i} + \frac{t_f - t_0}{2} \left[ \left( \frac{\partial N_k}{\partial \tilde{x}_k} \right)^T + \left( \frac{\partial \tilde{f}_k}{\partial \tilde{x}_k} \right)^T \frac{\tilde{\mu}_k}{w_k} - \left( \frac{\partial}{\partial \tilde{x}_k} \left[ \left( \frac{\partial M_k}{\partial \tilde{x}_k} \right)^T + \left( \frac{\partial \tilde{f}_k}{\partial \tilde{x}_k} \right)^T \frac{\tilde{\lambda}_k}{w_k} \right] \right)^T \frac{\tilde{\psi}_k}{w_k} \right] = 0. \quad (C–13)
\end{align}

Compare Eq. (C–2) with Eq. (C–13). In a manner similar to $\dot{\lambda}$, $\dot{\mu}$ can be discretized and written as

\[ \dot{\mu}_k = \sum_{i=0}^{N} D_{ki} \mu_i, \]

and therefore Eq. (C–2) at each discretized point can be written as

\begin{align}
\sum_{i=0}^{N} D_{ki} \mu_i + \frac{t_f - t_0}{2} \left[ \left( \frac{\partial N_k}{\partial x_k} \right)^T + \left( \frac{\partial f_k}{\partial x_k} \right)^T \mu_k - \left( \frac{\partial}{\partial x_k} \left[ \left( \frac{\partial M_k}{\partial x_k} \right)^T + \left( \frac{\partial f_k}{\partial x_k} \right)^T \lambda_k \right] \right)^T \psi_k \right] = 0. \quad (C–14)
\end{align}

Equations (C–13) and (C–14) show that the KKT multipliers $\tilde{\mu}_i$ and the discretized costates $\mu_i$ are related by

\[ \tilde{\mu}_i = w_i \mu_i, \qquad i = 0, \ldots, N. \quad (C–15) \]

Similarly, for the KKT condition associated with $\tilde{\lambda}_k$, the same expansion, division by $w_k$, and transposition yield

\begin{align}
\sum_{i=0}^{N} D_{ki} \frac{\tilde{\psi}_i}{w_i} + \frac{t_f - t_0}{2} \left[ \left( \frac{\partial N_k}{\partial \tilde{\lambda}_k} \right)^T + \left( \frac{\partial \tilde{f}_k}{\partial \tilde{\lambda}_k} \right)^T \frac{\tilde{\mu}_k}{w_k} - \left( \frac{\partial}{\partial \tilde{\lambda}_k} \left[ \left( \frac{\partial M_k}{\partial \tilde{x}_k} \right)^T + \left( \frac{\partial \tilde{f}_k}{\partial \tilde{x}_k} \right)^T \frac{\tilde{\lambda}_k}{w_k} \right] \right)^T \frac{\tilde{\psi}_k}{w_k} \right] = 0. \quad (C–16)
\end{align}

Equation (C–3) is discretized at the LGL points and compared with Eq. (C–16). With

\[ \dot{\psi}_k = \sum_{i=0}^{N} D_{ki} \psi_i, \]

the discretized costate dynamics become

\begin{align}
\sum_{i=0}^{N} D_{ki} \psi_i + \frac{t_f - t_0}{2} \left[ \left( \frac{\partial N_k}{\partial \lambda_k} \right)^T + \left( \frac{\partial f_k}{\partial \lambda_k} \right)^T \mu_k - \left( \frac{\partial}{\partial \lambda_k} \left[ \left( \frac{\partial M_k}{\partial x_k} \right)^T + \left( \frac{\partial f_k}{\partial x_k} \right)^T \lambda_k \right] \right)^T \psi_k \right] = 0. \quad (C–17)
\end{align}

Comparison between Eqs. (C–16) and (C–17) results in the mapping between $\tilde{\psi}$ and $\psi$:

\[ \tilde{\psi}_i = w_i \psi_i, \qquad i = 0, \ldots, N. \quad (C–18) \]

From Eqs. (C–12), (C–15), and (C–18), it is confirmed that the transcribed bilevel programming problem shares the same optimality information as the original Stackelberg differential game problem, through the weights associated with the LGL collocation points.

REFERENCES

[1] Fehse, W., Automated Rendezvous and Docking of Spacecraft, Cambridge University Press, 2003.
[2] JAXA, HTV-1 Mission Press Kit, 2009.
[3] Whelan, D., Adler, E., Wilson III, S., and Roesler Jr, G., “DARPA Orbital Express program: Effecting a revolution in space-based systems,” Proceedings of SPIE, Vol. 4136, 2000, p. 48.
[4] Shoemaker, J. and Wright, M., “Orbital express space operations architecture program,” Proceedings of SPIE, Vol. 5088, 2003, p. 1.
[5] Scholten, H., Nugteren, P., De Kam, J., and Cruijssen, H., “ConeXpress: Low cost access to space,” 54th International Astronautical Congress of the International Astronautical Federation (IAF), 2003.
[6] JAXA, HTV2 (KOUNOTORI 2) Mission Press Kit.
[7] Stephenson, A., “Space station: the orbital maneuvering vehicle,” Aerospace America, Vol. 26, 1988, pp. 24–26.
[8] “STS-61,” http://science.ksc.nasa.gov/shuttle/missions/sts-61/mission-sts-61.html.
[9] “STS-82,” http://science.ksc.nasa.gov/shuttle/missions/sts-82/mission-sts-82.html.
[10] “STS-103,” http://science.ksc.nasa.gov/shuttle/missions/sts-103/mission-sts-103.html.
[11] “STS-109,” http://science.ksc.nasa.gov/shuttle/missions/sts-109/mission-sts-109.html.
[12] Cook, W. S. and Lindell, S. D., “Autonomous Rendezvous and Docking (AR&D) for future spacecraft missions,” AIAA Space Technology Conference & Exposition, 1999, pp. 28–30.
[13] United Nations, “Treaty on Principles Governing the Activities of States in the Exploration and Use of Outer Space, Including the Moon and Other Celestial Bodies,” 1967, http://www.unoosa.org/pdf/publications/STSPACE11E.pdf.
[14] Junkins, J. L., S. P. M. D., and Bottke, W., “A Study of Six Near-Earth Asteroids,” Paper of the 2005 International Conference on Computational & Experimental Engineering and Sciences, 2005.
[15] Chesley, S., “Potential impact detection for Near-Earth asteroids: the case of 99942 Apophis (2004 MN4),” Proceedings of the International Astronomical Union, Vol. 1, No. S229, 2006, pp. 215–228.
[16] Lieggi, S. and Quam, E., “China’s ASAT Test and the Strategic Implications of Beijing’s Military Space Policy,” The Korean Journal of Defense Analysis, Vol. 19, No. 1, 2007, pp. 5–27.

[17] Johnson, S., “Space Debris Assessment for USA-193,” presentation, Vienna, February 2008, pp. 11–22.
[18] Jakhu, R., “Iridium-Cosmos collision and its implications for space operations,” Yearbook on Space Policy 2008/2009, 2010, pp. 254–275.
[19] Liou, J. and Johnson, N., “Instability of the present LEO satellite populations,” Advances in Space Research, Vol. 41, No. 7, 2008, pp. 1046–1053.
[20] Hoyt, R. and Smith, P., “The Remora Remover TM: a zero-debris method for on-demand disposal of unwanted LEO spacecraft,” Aerospace Conference Proceedings, 2000 IEEE, Vol. 4, IEEE, 2000, pp. 239–246.
[21] Nishida, S.-I., Kawamoto, S., Okawa, Y., and Kitamura, S., “A Study on Active Removal System of Space Debris,” Fifth European Conference on Space Debris, 2009.
[22] Nishida, S.-I., Kawamoto, S., Okawa, Y., Terui, F., and Kitamura, S., “Space debris removal system using a small satellite,” Acta Astronautica, Vol. 65, No. 1-2, 2009, pp. 95–102.
[23] Cojuangco, A.-A. L. C., Orbital Lifetime Analyses of Pico- and Nano-Satellites, Master’s thesis, University of Florida, 2007.
[24] Myerson, R., Game Theory: Analysis of Conflict, Harvard University Press, 1997.
[25] Isaacs, R., Differential Games: A Mathematical Theory with Applications to Warfare and Pursuit, Control and Optimization, Dover Publications, 1999.
[26] Basar, T. and Olsder, G., Dynamic Noncooperative Game Theory, SIAM, PA, 1999.
[27] Bloem, M., Alpcan, T., and Başar, T., “A Stackelberg game for power control and channel allocation in cognitive radio networks,” Proc. Workshop on Game Theory in Communication Networks (GameComm), Nantes, France, October 2007.
[28] Basar, T. and Selbuz, H., “Closed-Loop Stackelberg Strategies with Applications in the Optimal Control of Multilevel Systems,” IEEE Trans. Autom. Control, Vol. 24, No. 2, 1979, pp. 166–179.
[29] Medanic, J., “Closed-Loop Stackelberg Strategies in Linear-Quadratic Problems,” IEEE Trans. Autom. Control, Vol. 23, No. 4, 1978, pp. 632–637.
[30] Simaan, M. and Cruz, J., Jr., “A Stackelberg solution for games with many players,” IEEE Trans. Autom. Control, Vol. 18, No. 3, 1973, pp. 322–324.
[31] Papavassilopoulos, G. and Cruz, J., “Nonclassical Control Problems and Stackelberg Games,” IEEE Trans. Autom. Control, Vol. 24, No. 2, 1979, pp. 155–166.

[32] Gambier, A., Wellenreuther, A., and Badreddin, E., “A New Approach to Design Multi-loop Control Systems with Multiple Controllers,” Proc. IEEE Conf. Decis. Control, 2006, pp. 1828–1833.
[33] Hongbin, J. and Huang, C. Y., “Non-cooperative uplink power control in cellular radio systems,” Wireless Networks, Vol. 4, No. 3, 1998, pp. 233–240.
[34] Basar, T. and Bernhard, P., H-infinity Optimal Control and Related Minimax Design Problems, Boston: Birkhäuser, 2008.
[35] Isidori, A. and Astolfi, A., “Disturbance attenuation and H-Infinity-control via measurement feedback in nonlinear systems,” IEEE Trans. Autom. Control, Vol. 37, No. 9, Sept. 1992, pp. 1283–1293.
[36] Pavel, L., “A noncooperative game approach to OSNR optimization in optical networks,” IEEE Trans. Autom. Control, Vol. 51, No. 5, 2006, pp. 848–852.
[37] Tomlin, C., Lygeros, J., and Sastry, S., Hybrid Systems: Computation and Control, chap. 27, Lecture Notes in Computer Science, Springer Berlin / Heidelberg, 1998, pp. 360–373.
[38] Basar, T. and Olsder, G. J., “Team-optimal closed loop Stackelberg strategies in hierarchical control problems,” Automatica, Vol. 16, No. 4, 1980, pp. 409–414.
[39] Jungers, M., Trelat, E., and Abou-Kandil, H., “Min-max and min-min Stackelberg strategy with closed-loop information,” HAL Hyper Articles en Ligne, Vol. 3, 2010.
[40] Johansson, R., “Quadratic optimization of motion coordination and control,” IEEE Trans. Autom. Control, Vol. 35, No. 11, 1990, pp. 1197–1208.
[41] Kim, Y. and Lewis, F., “Optimal design of CMAC neural-network controller for robot manipulators,” IEEE Trans. Syst. Man Cybern. Part C Appl. Rev., Vol. 30, No. 1, Feb. 2000, pp. 22–31.
[42] Kim, Y., Lewis, F., and Dawson, D., “Intelligent optimal control of robotic manipulators using neural networks,” Automatica, Vol. 36, 2000, pp. 1355–1364.
[43] Abu-Khalaf, M. and Lewis, F., “Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach,” Automatica, Vol. 41, No. 5, 2005, pp. 779–791.
[44] Johnson, M., Hiramatsu, T., Fitz-Coy, N., and Dixon, W., “Asymptotic Stackelberg Optimal Control Design for an Uncertain Euler Lagrange System,” IEEE Conference on Decision and Control, Atlanta, December 2010, pp. 6686–6691.
[45] Cai, Z., de Queiroz, M. S., and Dawson, D. M., “Robust adaptive asymptotic tracking of nonlinear systems with additive disturbance,” IEEE Trans. Autom. Control, Vol. 51, 2006, pp. 524–529.

[46] Makkar, C., Hu, G., Sawyer, W. G., and Dixon, W. E., “Lyapunov-Based Tracking Control in the Presence of Uncertain Nonlinear Parameterizable Friction,” IEEE Trans. Autom. Control, Vol. 52, No. 10, 2007, pp. 1988–1994.
[47] Patre, P. M., MacKunis, W., Kaiser, K., and Dixon, W. E., “Asymptotic Tracking for Uncertain Dynamic Systems Via a Multilayer Neural Network Feedforward and RISE Feedback Control Structure,” IEEE Trans. Autom. Control, Vol. 53, No. 9, 2008, pp. 2180–2185.
[48] Starr, A. W. and Ho, Y. C., “Nonzero-Sum Differential Games,” Journal of Optimization Theory and Applications, Vol. 3, No. 3, 1969, pp. 184–206.
[49] Cruz, J., Jr., “Survey of Nash and Stackelberg Equilibrium Strategies in Dynamic Games,” 1975.
[50] Chen, C. and Cruz, J., Jr., “Stackelberg solution for two-person games with biased information patterns,” IEEE Trans. Autom. Control, Vol. 17, No. 6, 1972, pp. 791–798.
[51] Simaan, M. and Cruz, J., “On the Stackelberg strategy in nonzero-sum games,” Journal of Optimization Theory and Applications, Vol. 11, No. 5, 1973, pp. 533–555.
[52] Simaan, M. and Cruz, J. B., Jr., “Additional Aspects of the Stackelberg Strategy in Nonzero-sum Games,” Journal of Optimization Theory and Applications, Vol. 11, No. 6, 1973, pp. 613–626.
[53] Hull, D., “Conversion of optimal control problems into parameter optimization problems,” Journal of Guidance, Control, and Dynamics, Vol. 20, No. 1, 1997, pp. 57–60.
[54] Huntington, G., Advancement and Analysis of a Gauss Pseudospectral Transcription for Optimal Control, Ph.D. thesis, Department of Aeronautics and Astronautics, MIT, May 2007.
[55] Ross, I. and Fahroo, F., “Legendre pseudospectral approximations of optimal control problems,” New Trends in Nonlinear Dynamics and Control and their Applications, 2003, pp. 327–342.
[56] Fahroo, F. and Ross, I., “Costate estimation by a Legendre pseudospectral method,” Journal of Guidance, Control, and Dynamics, Vol. 24, No. 2, 2001, pp. 270–277.

[57] Dempe, S., Bilevel Programming: A Survey, Dekan der Fak. für Mathematik und Informatik, 2003.

[58] Wen, U.-P. and Lin, S.-F., “Finding an Efficient Solution to Linear Bilevel Programming Problem: An Effective Approach,” Journal of Global Optimization, Vol. 8, No. 3, 1996, pp. 295–306.

[59] Nicholls, M. G., “The Application of Non-Linear Bi-Level Programming to the Aluminium Industry,” Journal of Global Optimization, Vol. 8, No. 3, 1996, pp. 245–261.

[60] Lignola, M. B. and Morgan, J., “Topological Existence and Stability for Stackelberg Problems,” Journal of Optimization Theory and Applications, Vol. 84, No. 1, 1995, pp. 145–169.

[61] Loridan, P. and Morgan, J., “Weak via Strong Stackelberg Problem: New Results,” Journal of Global Optimization, Vol. 8, No. 3, 1996, pp. 263–287.

[62] Kirk, D., Optimal Control Theory: An Introduction, Dover Publications, 2004.

[63] King, A., Billingham, J., and Otto, S., Differential Equations: Linear, Nonlinear, Ordinary, Partial, Cambridge University Press, 2003.

[64] Ehtamo, H. and Raivio, T., “On Applied Nonlinear and Bilevel Programming or Pursuit-Evasion Games,” Journal of Optimization Theory and Applications, Vol. 108, No. 1, 2001, pp. 65–96.

[65] Horie, K., “Collocation with Nonlinear Programming for Two-Sided Flight Path Optimization,” Dissertation Abstracts International, 2002.

[66] Dockner, E., Differential Games in Economics and Management Science, Cambridge University Press, 2000.

[67] Atkinson, K. E., An Introduction to Numerical Analysis, John Wiley & Sons, 2nd ed., 1989.

[68] Dempe, S., Foundations of Bilevel Programming, Kluwer Academic Publishers, 2002.

[69] Freiling, G., Jank, G., and Lee, S. R., “Existence and Uniqueness of Open-Loop Stackelberg Equilibria in Linear-Quadratic Differential Games,” Journal of Optimization Theory and Applications, Vol. 110, No. 3, 2001, pp. 515–544.

[70] Hiramatsu, T., “Game Theoretic Approach to Post-Docked Satellite Control,” The 20th International Symposium on Space Flight Dynamics, 2007.

[71] Prussing, J. and Conway, B., Orbital Mechanics, Oxford University Press, 1993.

[72] Lewis, F., Abdallah, C., and Dawson, D., Control of Robot Manipulators, Macmillan, New York, 1993.

[73] Qu, Z. and Dawson, D., Robust Tracking Control of Robot Manipulators, IEEE Press, Piscataway, NJ, 1995.

[74] Engwerda, J., LQ Dynamic Optimization and Differential Games, Wiley, 2005.

[75] Kuipers, J., Quaternions and Rotation Sequences: A Primer with Applications to Orbits, Aerospace, and Virtual Reality, Princeton University Press, 2002.

[76] Dupree, K., Patre, P., Wilcox, Z., and Dixon, W., “Optimal Control of Uncertain Nonlinear Systems Using RISE Feedback,” 2008, pp. 2154–2159.

[77] Hiramatsu, T., Johnson, M., Fitz-Coy, N. G., and Dixon, W. E., “Asymptotic Optimal Tracking Control for an Uncertain Nonlinear Euler-Lagrange System: A RISE-based Closed-Loop Stackelberg Game Approach,” IEEE CDC-ECC 2011, 2011.

[78] Fahroo, F. and Ross, I. M., “Costate Estimation by a Legendre Pseudospectral Method,” Journal of Guidance, Control, and Dynamics, Vol. 24, No. 2, 2001, pp. 270–277.

BIOGRAPHICAL SKETCH

Takashi Hiramatsu was born in Kawaguchi, . Motivated and encouraged by his family, Takashi developed his dream of studying abroad while in high school, and after graduating he moved to the United States. He received his bachelor’s degree in aerospace engineering from the University of Florida in the spring of 2005 and went on to join the graduate program in mechanical engineering with a University of Florida Alumni Fellowship. Throughout his years in graduate school, Takashi worked on problems related to space applications. His research interests include astrodynamics, nonlinear control, and differential game theory.
