<<

MULTI-PLAYER PURSUIT-EVASION DIFFERENTIAL GAMES

DISSERTATION

Presented in Partial Fulfillment of the Requirements for

the Degree Doctor of Philosophy in the

Graduate School of The Ohio State University

By

Dongxu Li, B.E., M.S.

*****

The Ohio State University

2006

Dissertation Committee: Approved by

Jose B. Cruz, Jr., Adviser

Vadim Utkin Adviser Philip Schniter Graduate Program in Electrical Engineering c Copyright by

Dongxu Li

2006 ABSTRACT

The increasing use of autonomous assets in modern military operations has led to renewed interest in (multi-player) Pursuit-Evasion (PE) differential games. However, the current differential game theory in the literature is inadequate for dealing with this newly emerging situation. The purpose of this dissertation is to study general PE differential games with multiple pursuers and multiple evaders in continuous time.

The current differential game theory is not applicable mainly because the terminal states of a multi-player PE game are difficult to specify. To circumvent this difficulty, we solve a deterministic problem by an indirect approach starting with a suboptimal solution based on “structured” controls of the pursuers. If the structure is set-time- consistent, the resulting suboptimal solution can be improved by the optimization based on limited look-ahead. When the performance enhancement is applied itera- tively, an optimal solution can be approached in the limit. We provide a hierarchical method that can determine a valid initial point for this iterative process.

The method is also extended to the stochastic game case. For a problem where uncertainties only appear in the players’ dynamics and the states are perfectly mea- sured, the iterative method is largely valid. For a more general problem where the players’s measurement is not perfect, only a special case is studied and a suboptimal approach based on one-step look-ahead is discussed.

ii In addition to the numerical justification of the iterative method, the theoretical soundness of the method is addressed for deterministic PE games under the framework of viscosity solution theory for Hamilton-Jacobi equations. Conditions are derived for the existence of solutions of a multi-player game. Some issues on capturability are also discussed for the stochastic game case.

The fundamental idea behind the iterative approach is attractive for complicated problems. When a direct solution is difficult, an alternative approach is usually to search for an approximate solution and the possibility of serial improvements based on it. The improvement can be systematic or random. It is expected that an optimal solution can be approached in the long term.

iii To my family.

iv ACKNOWLEDGMENTS

I would like to express my deepest gratitude to my advisor, Professor Jose B.

Cruz, Jr., for his guidance, encouragement and financial support throughout my Ph.D study. He has taught me how to find an interesting problem, given me a great degree of freedom to explore problems independently and provided valuable feedbacks. He always made himself available for me in his quite tight schedule. He has been my role model for scholarship and professionalism. I am also thankful to Professor Vadim

Utkin and Philip Schniter for serving on my dissertation committee.

I am very grateful to Dr. Genshe Chen, who is not only a colleague, but also a friend. He has consistently shown his concern on my research as well as my academic growth over the years. He also provided me a precious opportunity of internship, which helped me find the topic of this dissertation, extended my school experience and made me aware of my long-term goal. I also want to thank Dr. Corey Schumacher for his insightful suggestions that inspire part of the fourth chapter of this dissertation.

Many thanks to those graduate students and postdoctoral scholars who have helped me throughout my Ph.D study to make this unique experience gratifying rather than stressful. I am especially thankful to our team Mo Wei, Dan Shen, Xiao- huan Tan, Xu Wang and Ziyi Sun. Thanks for sharing every step of bitterness and joy in our growth.

v My special thanks goes to my girl friend Ye Ye for her kindness, understanding and tolerance for my absence at all times. Finally, I want to thank my parents Zhenming

Li and Zhonglan Liu, and my brother Chuntao Li for their unconditional love and support all through my life.

vi VITA

December 15, 1977 ...... Born - Nanjing, China

July 2000 ...... B.E. in Engineering Physics Department of Engineering Physics Tsinghua University Beijing, China

August 2002 ...... M.S. in Nuclear Engineering Department of Mechanical Engineering The Ohio State University Columbus, Ohio

September 2002 - Present ...... Graduate Research Associate Graduate Teaching Associate Dept. of Elec. & Computer Eng. The Ohio State University Columbus, Ohio

PUBLICATIONS

Research Publications

D. Li, J. B. Cruz, Jr. and C. J. Schumacher, “Stochastic Multi-Player Pursuit Evasion Differential Games,” Under revision for International Journal of Robust and Nonlinear Control.

D. Li and J. B. Cruz, Jr., “An Iterative Method For Pursuit-evasion Games with multiple Players,” Under revision for IEEE Transactions on Automatic Control.

vii D. Li and J. B. Cruz, Jr., “Improvement with Look-ahead on Cooperative Pursuit Games,” in Proceedings of the 45th IEEE Conference on Decision and Control, San Diego, CA, December, 2006.

D. Li and J. B. Cruz, Jr., “Better Cooperative Control with Limited Look-Ahead,” in Proceedings of American Control Conference, Minneapolis, MN, pp.4914-4919, June, 2006.

D. Li and J. B. Cruz, Jr., “A Robust Hierarchical Approach To Multi-stage Task Allocation Under Uncertainty,” in Proceedings of the 44th Joint Conference of CDC- ECC05, Seville, Spain, pp.3375-3380, December, 2005.

D. Li, J. B. Cruz, Jr., G. Chen, C. Kwan and M.H. Chang, “A Hierarchical Approach To Multi-Player Pursuit-Evasion Differential Games,” in Proceedings of the 44th Joint Conference of CDC-ECC05, Seville, Spain, pp.5674-5679, December, 2005.

G. Chen, D. Li, J. B. Cruz, Jr., X. Wang, “Particle Swarm Optimization for Re- source Allocation in UAV Cooperative Control,” in Proceedings of AIAA Guidance, Navigation and Control Conference, Providence, RI, August, 2004.

J. B. Cruz, Jr., G. Chen, D. Garagic, X. Tan, D. Li, D. Shen, M. Wei, X. Wang, “Team Dynamics and Tactics for Mission Planning,” in Proceedings of the 42nd IEEE Conference on Decision and Control, Maui, Hawaii, pp.3579-3584, December, 2003.

FIELDS OF STUDY

Major Field: Electrical and Computer Engineering

viii TABLE OF CONTENTS

Page

Abstract...... ii

Dedication...... iv

Acknowledgments...... v

Vita ...... vii

ListofTables...... xii

ListofFigures ...... xiii

Chapters:

1. Introduction...... 1

1.1 Background...... 1 1.2 Motivation ...... 10 1.3 DissertationOutline ...... 13

2. Deterministic Multi-Player Pursuit-Evasion Differential Games ...... 17

2.1 Deterministic Game Formulation ...... 18 2.2 Dilemma of the Current Differential Game Theory ...... 24 2.3 A Structured Approach of Set-Time-Consistency ...... 25 2.4 Iterative Improvement Based on Limited Look-ahead ...... 29 2.5 Finite Convergence on a Compact Set ...... 35 2.6 TheHierarchyApproach...... 38 2.6.1 Two-player Pursuit-Evasion Games at the Lower Level . . . 39 2.6.2 Combinatorial Optimization at the Upper Level ...... 39 2.6.3 Uniform Continuity of the Suboptimal Upper Value . . . . . 42

ix 2.7 Theoretical Foundation of the Iterative Method ...... 44 2.7.1 Viscosity Solution of Hamilton-Jacobi Equations ...... 44 2.7.2 Existence of the Value Function ...... 47 2.8 SummaryandDiscussion ...... 54

3. Simulation Results of Deterministic Pursuit-Evasion Games ...... 58

3.1 Feasibility of the Iterative Method ...... 59 3.1.1 Analytical Solution of the Two-player Game Example . . . 59 3.1.2 Numerical Solution by the Iteration Method ...... 61 3.2 Suboptimal Solution by the Hierarchical Approach ...... 62 3.3 Suboptimal Solution by Limited Look-ahead ...... 67 3.3.1 Performance Enhancement by Limited Look-ahead . . . . . 67 3.3.2 Limited Look-ahead with a Heuristic Cost-to-go ...... 70

4. Stochastic Multi-Player Pursuit-Evasion Differential Games ...... 76

4.1 Formulation of Stochastic Pursuit-Evasion Games with Perfect State Information...... 76 4.2 The Iterative Approach to Stochastic Pursuit-Evasion Games . . . 81 4.3 Solution to a Two-Player Pursuit-Evasion Game ...... 86 4.3.1 Classical Theory of Stochastic Zero-sum Differential Games 86 4.3.2 Solution to a Two-Player Pursuit-Evasion Game ...... 91 4.3.3 On Finite Expectation of the Capture Time ...... 95 4.4 On Existence of Solutions to Stochastic Pursuit-Evasion Games with PerfectStateInformation ...... 104 4.5 Stochastic Pursuit-Evasion Games with Imperfect State Information 106 4.5.1 Limited Look-ahead for Stochastic Control Problems . . . . 106 4.5.2 On Stochastic Pursuit-Evasion Games ...... 110 4.5.3 On Finite Expectation of the Capture Time ...... 112 4.6 SimulationResults ...... 116

5. DissertationSummaryandFutureWork ...... 123

5.1 DissertationSummary ...... 123 5.2 FutureWork ...... 126 5.2.1 The Iterative Method for Nonzero-sum Games ...... 127 5.2.2 OnEfficientAlgorithms ...... 128 5.2.3 On Decentralized Approaches ...... 131

Appendices:

x A. ProofofTheorem5...... 134

B. ProofofTheorem6...... 140

Bibliography ...... 145

xi LIST OF TABLES

Table Page

3.1 The Necessary Parameters of the Players for PE Game Scenario 1 . . 64

3.2 The Optimal Engagement Between the Pursuers and the Evaders with theFirst-OrderDubin’sCarModel ...... 65

3.3 The Optimal Engagement Between the Pursuers and the Evaders with theModifiedDubin’sCarModel...... 66

3.4 The Necessary Parameters of the Players in the PE Game Scenario 2 69

3.5 The Necessary Parameters of the Players in the PE Game Scenario 3 72

4.1 The Necessary Parameters of the Pursuers and the Evaders in the SelectedStochasticPEGameScenario ...... 117

xii LIST OF FIGURES

Figure Page

2.1 Illustration of a Suboptimal Approach with Hierarchy ...... 41

3.1 The Speed Vector under the Optimal Strategies of the Players . . . . 60

3.2 The Evolution of the Suboptimal Value Wk’s at (2,1) by Iteration . . 63

3.3 Pursuit Trajectories under the Best Engagement ...... 68

3.4 Pursuit Trajectories under the Strategies by the Hierarchical Approach 69

3.5 Pursuit Trajectories under the Strategies Based on One-Step Limited Look-ahead ...... 71

3.6 A Necessary Condition on the Capturability of the Superior Evader . 73

3.7 Cooperative Pursuit Trajectories of the Superior Evader by the Limited Look-ahead Method Based on a Heuristic Cost-to-go ...... 75

4.1 The Simplified Stochastic PE Game in a One Dimensional Space . . . 95

4.2 The Probability Density of the Capture Time pT (t) ...... 97

4.3 Change of thex ˜-˜y Coordinates to the x′-y′ Coordinates ...... 101

4.4 Illustration of the Pursuer’s Control Based on a Noisy Measurement . 114

4.5 Cooperative Pursuit Trajectories of All 4 Evaders by the 3 Pursuers . 119

4.6 Cooperative Pursuit Trajectories of All 4 Evaders by Pursuer 1 and 2 120

xiii 4.7 Pursuit Trajectories of All 4 Evaders by Pursuer 1 ...... 121

4.8 Cooperative Pursuit Trajectories of the 4 Evaders by the 3 Pursuers with the Imperfect Observation of the Evaders’ Strategies ...... 122

5.1 Improving Strategies of Pursuer i in a “Cooperation Region” . . . . . 129

xiv CHAPTER 1

INTRODUCTION

1.1 Background

Optimization is the kernel of decision-making in physical or organizational sys- tems, where a choice is made among a set of possible alternatives with the minimum cost. There are three main components in an optimization problem: the decision- maker, the objective function and the information available to the decision-maker [1].

In traditional optimization theory, all these ingredients are singular, i.e., there is only one decision-maker, one objective function and one information set. Over the cen- turies, both the theory and the practice of optimization have gone in great depth [2].

The setup of optimization problems with a single decision-maker, criterion and information set is often not adequate for modelling complicated problems involving multiple decision-makers. In the applications of social science and economics, there are usually more than one decision-maker and each decision-maker has its own in- formation set, control input to the system and personal preference of the outcome, which depends on the input from each decision-maker. In this type of problem, it is very unlikely that all decision-makers share a common objective and all available in- formation. Therefore, optimization theory above may fail to model a problem where

1 (multiple) decision-makers who have conflicting personal interests. The study of de-

cisions in such conflicting situations provided the original impetus for the research in

games.

Game theory is the study of strategic interactions among rational players, where

each player makes decisions according to its own preference. The history of game

theory can go way back to the Bablylonian Talmud1, where the so-called marriage

contract problem may be regarded as anticipation of the theory of games [3]. The

actual birth of the modern game theory was due to John Von Neumann, who was the

first to prove the minimax theorem. After that, great progress has been made. In

1944, John Von Neumann and Oskar Morgenstern published their seminal work [4],

which laid the groundwork of game theory including two-person zero-sum game and

the notion of a cooperative game etc. They also gave the account of axiomatic utility

theory, which has been widely accepted in decision theory. In 1950, John Nash proved

the existence of a strategic equilibrium (now called as Nash equilibrium) for non-

cooperative games [5]. Nowadays, it is the most widely adopted solution concept for

problems in a game setting2. The last fifty years witnessed intensive research activities in game theory [3], among which is the establishment of the theory of dynamic games.

Dynamic games belong to game theory where the games evolve in (discrete or continuous) time. Their origin may be traced back to the mixed-strategy in a finite matrix game, which could be interpreted as the relative frequency of each move (pure

1The compilation of ancient law and tradition set down during the first five centuries A.D.. 2It is worth mentioning that this is not the only solution concept for game problems. In a game where the decision-making of the players has a hierarchy, i.e., one of the players (called leaders) can enforce the strategy of other players (called followers) who react (rationally) to the leader’s decisions, Stackelberg (hierarchical) equilibrium is used. This name is due to the first study of economic competition by Heinrich von Stackelberg [6].

2 strategy) by repeating the same game many times [7]. It has connections to a discrete- time dynamic game, which is played over (finite or infinite) stages. In a discrete-time dynamic game, each player has a payoff function associated with each stage (possibly different with stages) and the transition of the game from stage to stage is govern by a difference equation that depends on the inputs of all the players. If the evolution of a game is described by differential equations, the dynamic game is also referred to as “differential game” to emphasize the role of the differential equations.

Unlike the initial development of game theory which was inspired by the problems in social science and economics, the main motivation of differential games was the study of military problems such as Pursuit-Evasion (PE) games. In such a (two- player) PE game problem, the pursuer wants to capture the evader and the optimal strategies of both players are studied [11, 7]. Besides, another important motivation for zero-sum differential games is the worst-case controller design for a system with disturbance. In these design problems, the process to be controlled has two types of inputs, controlled and disturbance, and two types of outputs, regulated and measured.

The goal is to suppress the effect from the disturbance on the regulated output. We refer the readers to [8] for linear systems and to Appendix B in [9] (and the references therein) for the nonlinear case.

Differential games were initiated by Rufus Isaacs in early 1950s when he stud- ied the military PE type of problems in the Rand Corporation [10, 11]. The PE game he studied is a two-player zero-sum game, where the players have completely opposite interests. Isaacs used the method of “tenet of transition” [11] to resolve the strategic dynamic decision-making in this adversarial environment, and he ana- lyzed many special cases of Partial Differential Equation (PDE), which is now called

3 Hamilton-Jacobi-Isaacs (HJI) equation. Although developed concurrently with Dy-

namic Programming (DP), his method is essentially a DP method based on backward

analysis.

Following Isaacs, the research in differential games has been boosted. The theory

of two-player zero-sum differential game has been generalized to the N-player non-

cooperative (nonzero-sum) game case [7]. Nash equilibrium is adopted as the solution

concept when the roles of all players are symmetric. The concept of Stackelberg

solution that appeared in a hierarchical (static) optimization problem has also been

extended into differential games due to the work by Chen, Simaan and Cruz [13, 14].

We refer to [7] for detailed introduction of this topic.

The development of differential game theory goes along with optimal control the- ory because their solution techniques are closely related. An optimal control problem can be viewed as a special case of a differential game with one player. The kernel of optimal control theory includes the minimum principle and DP. The minimum principle3 was developed by Pontryagin and his school in USSR in the 1960s [15]. It states that the optimal trajectories of the system satisfy the “Euler-Lagrange equa- tions” and at each point along the trajectories, the optimal control must minimize the Hamiltonian. The minimum principle deals with one extremal associated with one state at a time, such that the resulting optimal control is open-loop.

The invention of DP was due to Richard Bellman and his school in the 1950s [12,

16]. Bellman is the first to apply DP to discrete-time optimal control problems, demonstrating that state rollback is a natural approach to solving optimal control

3It is also refereed to as the maximum principle in the context of a maximization problem.

4 problems. More importantly, the DP procedure results in closed-loop, generally non-

linear, feedback schemes, which makes it the only general method for stochastic con-

trol and differential games. DP principle (or principle of optimality) basically says

that for an optimal system trajectory, its portion starting from any point along

the trajectory to the terminal must also be optimal. The optimal value of the per-

formance index starting from any state at any time is defined as the value function.

With additional assumptions on differentiability, the value function can be charac-

terized as the unique solution of a PDE called Hamilton-Jacobi (HJ) equation4. The notion of value function and HJ equations have roots in Calculus of Variations5 [17].

Unlike the minimum principle, DP deals with a family of possible extremal trajec-

tories at a time, which is associated with the value function at different states and

times. Therefore, the control obtained is in the form of feedback, which makes it

more useful in problems with uncertainty and games. We refer to [19] for connections

of the minimum principle and DP, and to [20] for an overview of this subject.

Uncertainty is inherent in real-world problems, and it induces many disadvantages

to optimal control problems (resp. differential games). If the information available

to a controller (resp. player) is uncertain or if the evolution of the system process is

governed by random processes, the performance criterion is no longer certain. Thus,

the expected value may be the best to optimize, provided that the statistics of the

uncertain factors is known. This leads to the problem of stochastic optimal control

(resp. stochastic differential games).

4In the optimal control setting, it is called Hamilton-Jacobi-Bellman (HJB) equation; it is referred to as HJI equation in differential games. 5It can be dated back to the great mathematicians such as Bernoulli, Leibniz and Newton in connection with the brachistochrone problem [18].

5 The pursuit of stochastic optimal control as well as stochastic differential games

can be traced back to the initial study of random process by Einstein on Brownian

motion. Later, Norbert Wiener with G.I. Taylor made tremendous contributions [21],

among which is Wiener’s theorem on optimal filtering based on minimizing the mean

squared estimate error. It is during this period that stochastic techniques were intro-

duced into control and communication theory. In 1961, Kalman and Bucy extended

the Wiener filter to a time-varying linear system[22], and this state estimator is

called the Kalman-Bucy filter6. At about the same time, optimal control of linear

systems with a quadratic objective function was studied by Kalman [23] (see also Mer-

riam [24]), and the problem was later called the Linear Quadratic Regulator (LQR)

problem. The extension of this problem to linear dynamical systems with additive

white noise7 was first studied by Florentin [25, 26], Joseph and Tou [27] and Kush- ner [28]. This problem can be handled by DP and stochastic calculus, where the

HJB equation becomes a second-order PDE. The next extension was to consider the problem with noise-corrupted observation. Gunckel and Franklin [29] and Joseph and

Tou [27] showed that the optimal linear feedback gains obtained in the deterministic

LQR problem with the state estimate from the Kalman filter generates the optimal control, and this procedure was later called “the separation principle”. The general partially observable problem was first treated by Davis and Varaiya in [30], where they introduced the notion of information state to reduce the problem to a fully ob- servable problem. The problem was also studied later by Fleming and Pardoux [31].

6The discrete-time version is called the Kalman filter. 7White noise driven systems have been used widely in modern stochastic optimal control and filtering theories. In fact, white noise, being a mathematical idealization, gives only an approximate description of real noise.

6 In general, a stochastic control problem with imperfect measurements is very hard, and the issue of the existence of solutions as well as solution techniques is still largely open [32].

DP can be used to solve a differential game problem provided that the value func- tion exists. Although Isaacs resolved many differential game problems using the DP type of method, unfortunately, he did not provide a priori condition on the existence of value function, nor did he give an adequate definition of strategy. The first general and rigorous result about the existence of value in differential games was developed by Fleming. He proved the existence of value for finite-horizon problems and for special cases of PE games [33, 34]. Its main idea is to determine the strategies of players on small time intervals based on the game history. The value is defined by passing to the limit as the lengths of the time intervals tend to zero. Approaching the problem in a different way from Fleming, Varaiya [35], Roxin [36] and Elliott and

Kalton [37, 38] introduced the notion of strategies and value of differential games, established the existence of value and demonstrated that the value satisfies the HJI equation in a generalized sense. Based on some of these results, Friedman developed his theory in the early 1970s and it is summarized in the books [39] and [40]. Fur- thermore, the theory of positional differential games was developed by Krasovskii and

Subbotin in the 1970s, where the value function of the game is shown as the unique function that is stable with respect to both players [41]. Finally, the Berkovitz’s the- ory of existence of value combines features of the methods by Isaacs, Friedman and

Krasovskii-Subbotin [42].

At that time, it was generally believed that in order to apply DP methods, not only the value function of a differential game (control) problem must exist, but also

7 it should be sufficiently smooth because the formulation of HJI (HJB) equations depends on its differentiability. This additional requirement makes the applicability of DP methods extremely limited because even for a simple game (optimal control) problem, its value function may be non-differentiable or even discontinuous. Under such circumstances, it would be difficult to justify the meaning of a HJ equation and whether its solution is the value function. From the 1960’s, many researchers worked on this subject [9]. The turning point is the introduction of viscosity solution for

first-order HJ equations by M.G. Crandall and P.L. Lions in 1983 [43]. Viscosity solution is a weak (or generalized) solution for HJ equations, with which the rigorous mathematical justification of DP methods becomes possible. More importantly, the authors proved the uniqueness of the solution under relatively weak conditions. This concept was further simplified by Crandall, Evans and Lions in [44]. Viscosity solution was first extended to second-order HJ equations with convexity assumption by P.L.

Lions via stochastic control arguments [45]. With purely analytical techniques, R.

Jensen made a major breakthrough on more general (nonconvex) equations [46], and was further simplified by R. Jensen, P.L. Lions and P.E. Souganidis [47]. The most general results on second-order HJ equations were due to H. Ishii and P.L. Lions [48].

We refer to [45] and [49] for an introduction of this subject.

The value function of a differential game (optimal control) problem is closely connected to the viscosity solution of HJ equation, and the study on this relationship started from a significant observation made by P.L. Lions in his book [45] on page

53-54,

“value functions (of optimal control) do in fact satisfy the associated Hamilton-Jacobi equations in the viscosity sense”.

8 Afterwards, a considerable amount of research has been devoted to the applications

of viscosity solution theory on optimal control as well as differential games. The con-

nection between differential games and viscosity solution was perhaps first drawn by

Souganidis in his Ph.D thesis in 1983 with respect to the notion of value of a game

by Fleming and Friedman. Shortly after that, Evans and Souganidis proved that the

value8 of a differential game is the viscosity solution of the associated HJI equation for problems with finite horizon. The connections of the work by Krasovskii and

Subbotin, and Berkovitz with viscosity solutions have also been drawn [9]. Similar results have been obtained for differential game problems under various conditions.

Finally, stochastic (two-player) zero-sum differential game has studied by Fleming and Souganidis under the framework of viscosity solutions of second-order HJI equa- tions [50].

Practically, DP suffers from the so-called “curse of dimensionality”, which stands for the exponential growth of the computational requirements with the number of state variables and time steps. The development of approximate DP methods on large-scale problems began with the policy iteration method for Markov Decision

Process (MDP)9 by Ron Howard [51]. It provides all essential elements of underlying the theory and numerical algorithms of modern approximate DP (or reinforcement learning in computer science) [52]. Over the decades, the numerical DP methods for continuous-time optimal control problem and differential games have also been exten- sively studied [53], and Appendix A in [9] gives an excellent overview of this subject.

8Here, the concept of value is due to Varaiya, Roxin and Elliott and Kalton. 9MDP is a stochastic sequential decision problem with an infinite horizon, where the system evolution is described by a controlled Markov chain. It provides a mathematical framework for modelling decision-making in an uncertain environment.

9 Besides, since a closed-loop optimal solution for a stochastic control problem with an imperfect state information pattern is formidable, suboptimal optimal control meth- ods such as Open-loop Feedback Control and rollout algorithms have been developed based on the DP principle [54]. Connections have been recently drawn between these methods and model predictive control that has been recently widely used in industry.

A unified framework of those methods is introduced in [55], where readers can find a good survey of the recent developments in this area.

1.2 Motivation

PE game was the first impetus for the research of differential game theory. In such a problem, one or a group of players called pursuers go after one or more moving players called evaders. By application of differential game theory, a number of formal solutions regarding optimal strategies in particular PE problems can be achieved [11,

7, 56, 57]. In the literature, a PE problem is usually formulated as a zero-sum game in which the pursuers try to minimize a prescribed cost functional while the evaders try to maximize the same functional. Due to the development of LQ optimal control theory, a large portion of the literature focuses on PE differential games with a performance criterion in a quadratic form and linear dynamics [56, 57, 58].

Furthermore, most studies on PE games in the current literature concentrate on two- player games with a single pursuer and a single evader, and the results for general multi-player PE games are still largely sparse.

Intuitively, the strength of the pursuit by a group of pursuers is more than a simple summation of that from each individual pursuer. Cooperation among interactive pursuers plays an important role. To date, the literature in this field is very limited

10 and only special cases are considered. In [64], Foley and Schmitendorf studied a problem with two pursuers and one evader using LQ formulation. They partitioned the state space according to the cases when the evader is pursued by either one or both of the pursuers. In the region that both of the pursuers are effective, an optimal strategy of the evader is determined against the nearest pursuer. Another special example was studied by Pashkov and Terekhov in [65], where once again a problem of one evader and two pursuers with a nonconvex payoff function was considered.

More multi-player PE problems of special type can be found also in [66, 67] and the references therein.

Although differential game theory has extensive and potential applications that span from traffic control, engineering design, economics to behavioral biology, its pri- mary importance lies in military type applications such as missile guidance, aircraft control and aerial tactics etc [11, 59, 58]. In recent years, Autonomous (Aerial/Ground)

Vehicles AV (AAV/AGV) have shown a great potential value in reducing the human workload and have been increasingly used in military operations. A fleet of AV’s can work more efficiently and effectively than a single AV, but requires a complex high-level tactical planning tool. AV related problems have recently received a lot attention, and one of which is the PE problems involving multiple pursuers and mul- tiple evaders. A typical scenario is that a group of AVs is tasked to suppress a cluster of adversarial moving targets. This problem can naturally be modelled as a PE game with multiple pursuers and multiple evaders. However, the previous theoretical re- sults that mainly focus on problems of a single pursuer and a single evader are not

11 sufficient to deal with the problem of cooperative pursuit of multiple evaders by mul- tiple pursuers in the military situations. Hence, the study of multi-player PE games is very much needed for dealing with the newly emergent situations.

Recently, various researchers have reinitiated the study of this problem [59, 60,

61, 62, 63]. In [60, 61], Hespanha et al. formulated PE games in discrete time under a probabilistic framework, in which greedy and Nash equilibrium strategies with one-step look-ahead are solved respectively to maximize the probability of finding evaders. PE strategies were also studied by Antoniades, Kim and Sastry in [62], where several heuristic solutions were attempted and compared. Furthermore, the system structure and various issues of implementation of PE strategies by a team of autonomous vehicles are discussed in [59, 63]. These approaches all deal with problems in discrete time, in which the problem of search and pursuit are intertwined.

The models they used are adopted from the implementation point of view; however, those pursuit strategies obtained are not saddle-point (equilibrium) solutions for PE dynamic games. Finally, the most recent related work was done by Mitchell, Bayen and Tomlin in [68], where a numerical algorithm to compute the reachable set of a dynamic game with two players was described. This is a complementary problem of a

PE game; however, the algorithm is useful for solving PE game problems. Generally speaking, in spite of all these existing results, general PE differential game problems involving multiple pursuers and multiple evaders still deserve further research effort.

12 1.3 Dissertation Outline

In this dissertation, we study a general multi-player10 PE differential game prob- lem with a cumulative cost functional and general nonlinear dynamics of the players.

We use an indirect approach based on iterative improvement of a suboptimal solution, and then extend to the stochastic case. Existence of a value function under saddle- point equilibrium11 is examined under the framework of viscosity solution theory of

HJ equations. Issues on implementation and approximation schemes are discussed, and the simulation results are presented. This dissertation is the first attempt to treat a general multi-player PE game in continuous time. The major objective is to extend the current theory of PE differential games (mainly on two-player PE games) to a general multi-player PE differential game. Moreover, the method we propose can solve not only multi-player PE game problems but also complex dynamic opti- mization problems such as cooperative control problems. The detailed outline of the dissertation is as follows.

In Chapter 2, the main idea of the this dissertation is presented. We first formu- late a general deterministic multi-player PE differential game, where the strategy of the players is defined together with the (upper/lower) value function of a game. We show the deficiency of the current differential game theory in solving a multi-player

PE differential game problem. To circumvent this difficulty, we first propose a sub- optimal approach by only accounting for a subset of the pursuers’ controls under an additional structure. If the structure satisfies the property of set-time-consistency, the

10The term “multi-player” means multiple pursuers and multiple evaders. 11A saddle-point equilibrium is a Nash equilibrium in a zero-sum game. The rigorous definition in the context of multi-player PE games is provided in Chapter 2.

13 resulting suboptimal upper value function has an improving property, i.e., it can be

improved by the optimization based on limited look-ahead. If it is applied iteratively,

a convergent sequence of suboptimal upper value functions can be generated and the

true upper value function can be approached in the limit. We further show that if the

suboptimal upper value functions are continuous, the convergence can be achieved in

a finite number of steps on a compact set of the state space. Regarding the struc-

tured approach, we propose a hierarchical method that is set-time-consistent, and the

corresponding suboptimal solution is a valid starting point for the iterative process.

Furthermore, we examine the feasibility of this iterative method under the framework

of viscosity solution theory for HJ equations. Sufficient conditions are derived for a

class of PE problems, under which the value function (saddle-point) of a multi-player

PE game exists.

In chapter 3, several simulation results are presented to demonstrate the usefulness of the iterative method. We first apply the iterative method on a two-player PE game for which an analytical solution is also derived. The convergence result obtained co- incides with the prediction from the analytical solution. For a multi-player PE game, we focus on suboptimal numerical solutions. First, we use the hierarchical method to determine a suboptimal solution. Then, we demonstrate that the performance of this suboptimal solution can be improved by the optimization based on one-step look- ahead. Finally, an advantage of the limited look-ahead in solving a more complicated problem is presented through a particular example involving a superior evader.

In Chapter 4, we extend the iterative method for deterministic PE games to stochastic problems. We formulate a stochastic problem where the players’ (pur- suers/evaders) dynamics are governed by random processes and the game states are

14 perfectly accessible by all the players. Accordingly, the concept of the players’ strate- gies and value function are introduced. We show that the iterative method for deter- ministic problems is largely applicable to stochastic problems. Besides, some effort is devoted to two-player stochastic PE games. We derive an analytic solution to a specific game problem, and also, the issue on capturability is discussed. Finally, we briefly discuss a more general problem where the players no longer have perfect state information. It is a very hard problem and only a special case is investigated. A suboptimal method based on one-step look-ahead is proposed, and it is compared to the well-known suboptimal method called Open-Loop-Feedback-Control [54].

We summarize the dissertation in Chapter 5, and point out possible future research directions on this topic in the end.

Basic Notations

R 0 The set of nonnegative real numbers ≥ R+ The set of positive real numbers

Z 0 The set of nonnegative integers ≥ Z+ The set of positive integers r t , min r, t for r, t R ∧ ∧ { } ∈ r t , max r, t for r, t R ∨ ∨ { } ∈ T Capture time of some evader Γ The set of nonanticipative strategies of pursuers ∆ The set of nonanticipative strategies of evaders i Γj The set of nonanticipative strategies of pursuer i to evader j j ∆i The set of nonanticipative strategies of evader j to pursuer i E Engagement scheme between pursuers and evaders

Ei The set of evaders that are engaged with pursuer i The cardinal number of a finite set |S| S v v , vTv for v Rn | | | | ∈ int( ) The interior of a set S S 15 ¯ The closure of a set S S ∂ The boundary of a set S S B(x, δ) Open ball of radius δ centered at x C1( ) Set of functions on with a continuous first-order derivative S S C2( ) Set of functions on with a continuous second-order derivative S S x State x at τ given the initial x under control a( ) and b( )(τ t) τ;xt,a,b t · · ≥ z State z at τ given the initial z associated with x( ) (τ t) τ;zt,x t · ≥

16 CHAPTER 2

DETERMINISTIC MULTI-PLAYER PURSUIT-EVASION DIFFERENTIAL GAMES

In this chapter, we first formulate a deterministic multi-player PE differential game, and introduce the concepts of the players’ strategy and (upper/lower) value function. Then, the dilemma of the current differential game theory in solving multi- player games is discussed. To circumvent this difficulty, we adopt an indirect ap- proach. We first solve the problem for a suboptimal upper value within a subset of the pursuers’ controls under some structure. If the control structure satisfies set-time- consistency, the suboptimal upper value obtained can be improved by the optimization based on limited look-ahead. When the improvement is applied iteratively, a conver- gent sequence of suboptimal upper value functions can be generated and the true upper value is approached in the limit. If the suboptimal upper value functions are continuous, this convergence is achieved in a finite number of steps on any compact subset of the state space. Regarding the structured approach, we propose a hierar- chical decomposition method that is set-time-consistent, such that it can provide a valid starting point for the iterative process. At the end of this chapter, we study the theoretical soundness of the method under the framework of viscosity solution theory.

Sufficient conditions are derived, under which the value function (saddle-point) exists.

17 2.1 Deterministic Game Formulation

Consider a general PE differential game with N pursuers and M evaders in a n0

dimensional space S, S Rn0 . Denote by xi (xj) the state variable associated with ⊆ p e i j pursuer i, i = 1, ,N (evader j, j = 1, , M), where xi Rnp (xj Rne ). Here, ··· ··· p ∈ e ∈ ni , nj, n Z+ and ni , nj n depending on the dynamics of the players. Assume p e 0 ∈ p e ≥ 0 i j that the first n0 elements in state xp (xe) denote the physical position of pursuer i

(evader j) in S. The dynamics for pursuer i and evader j are

i i i i i x˙ p(t)= fp(xp(t),ai(t)) with xp(t0)= xp0, (2.1a)

j j j j j x˙ e(t)= fe (xe(t), bj(t)) with xe(t0)= xe0. (2.1b)

In (2.1), ai and bj are control inputs of pursuer i and evader j respectively, which satisfy

a ( ) Ai(t) , φ : [t, T] Ai φ( ) is measurable , (2.2) i · ∈ 7→ a | · n o b ( ) Bj(t) , ϕ : [t, T] Bj ϕ( ) is measurable . (2.3) j · ∈ 7→ a | · n o i j Here, t 0; Ai Rmp and Bj Rme are compact sets for mi ,mj Z+; T > 0 ≥ a ∈ a ∈ p e ∈ may be the terminal time of the game or some intermediate time of the form t +∆t

i i that will be introduced shortly. We assume that functions f i : Rnp Ai Rnp and p × a −→ j j f j : Rne Bj Rne satisfy the following boundedness and Lipschitz conditions. e × a −→ f i(xi ,a ) C , p p i ≤ i (2.4a) f i(xi ,a ) f i(¯xi ,a ) C xi x¯i ;  p p i − p p i ≤ i p − p j j fe (xe, bj) Cj, ≤ (2.4b) f j(xj, b ) f j(¯xj, b ) C xj x¯j  e e j e e j j e e − ≤ − i j i i n j j ne i j for some Ci,Cj > 0 and all x , x¯ R p , x , x¯ R , ai A , bj B . Here, p p ∈ e e ∈ ∈ a ∈ a is the norm in the corresponding Euclidean space, and we adopt 2-norm in this k · k 18 dissertation. For simplicity, let

T T T T x = x1T, x2T, , xN , x = x1T, x2T, , xM , p p p ··· p e e e ··· e h Ti h T i a = a T,a T, ,a T , b = b T, b T, , b T , 1 2 ··· N 1 2 ··· M h i h i T T 1T 2T N T 1T 2T M T and similarly let f = f , f , , f and f ′ = f , f , , f . Then, p p p ··· p e e e ··· e h i h i according to equation (2.1), the dynamic equation can be rewritten as

x˙ p(t)= fp(xp(t),a(t)) with xp(t0)= xp0, (2.5a)

x˙ e(t)= fe′(xe(t), b(t)) with xe(t0)= xe0. (2.5b)

T N i Further, let x = xT, xT , and denote by X the state space, i.e., X , Rnp p e i=1 × M j h i   Rne Q j=1 . In addition, denote by Aa the set of admissible controls of all the  Q  N i M j pursuers, i.e., Aa = i=1 Aa and similarly for the evaders, Ba = j=1 Ba. Based on (2.5), the sets of theQ admissible control inputs of the pursuers andQ the evaders,

A(t) and B(t) can be defined accordingly, which are composed of Ai(t) and Bj(t)

respectively.

i For pursuer i, define the projection operator P : Rnp Rn0 as →

T P (xi )= xi , xi , , xi . (2.6) p p1 p2 ··· pn0   i Clearly, P (xp) stands for the physical position of pursuer i. A similar operator is

defined for every pursuer and every evader, and we use the same notation P ( ) for · each of them. Evader j is considered captured if there exists pursuer i, such that

P xi (t) P xj(t) ε for t 0. Here, ε is a predefined small positive number. p − e ≤ ≥

The capture time of evader j is defined as

T = min t 0 i, 1 i N, such that P xi (t) P xj(t) ε . (2.7) j ≥ ∃ ≤ ≤ p − e ≤     19 In a PE game with multiple evaders, the evaders are generally not captured simul-

taneously. At the terminal of the game, all the pursuers are captured. The terminal

time of the PE game, T , can be defined as

T = max Tj . (2.8) j  If a PE game ends at a finite time and N < M, there must be some pursuer i that captures more than one evader. To account for this situation, we assume that when pursuer i captures an evader, it is free to go after other evaders. Thus, the number of “alive” evaders (evaders that have not been captured) in a PE game is crucial to the pursuit strategy. We augment the state x with a discrete variable z ZM = Z Z Z with Z = 0, 1 . Here, z , the jth element of vector z, is a ∈ · ··· { } j M binary variable and zj = 1 if evader j is not captured while zj = 0 if it is captured. | {z } According to the definition of the capture, variable z is governed by g :Z X Z j j × → according to the algebraic equation as

gj(0, x) = 0; i j 0 P xp(t) P xe(t) ε for some i, 0 i N, gj(1, x)= − ≤ ≤ ≤ 1 otherwise. (  

T Let g , g , ,g , and the algebraic equation regarding z can be put into a 1 ··· M compact form as 

+ z(t)= z(t )= g z(t−), x(t) . (2.9)

+  T Here, z(t ) and z(t−) denote the right and the left limit at time t. Define z = z z, | | indicating the number of “alive” evaders. Given any z ZM , define the set ∈

Z = z˜ ZM z˜ = 0 if z = 0 , (2.10a) z ∈ | j j  I = j 1, , M z = 0 . (2.10b) z ∈ { ··· }| j 6  20 Clearly, the terminal time of a game, T , can also be described as T = min t z(t) = 0 . |  Assumption 1 Every evader stops moving when it is captured.

With Assumption 1, the dynamics of evader j in (2.1) becomes

x˙ j(t)= z (t) f j(xj(t), b (t)) with xj(t )= xj . (2.11) e j · e e j e 0 e0

According to (2.11), control bj is meaningful only when zj(t) = 1. Now, replace T 1T M T the dynamics f ′ in (2.5) by f = z f , , z f , and the dynamics of the e e 1 · e ··· M · e h i players (pursuers/evaders) can be rewritten as

x˙(t)= f(x(t),a(t), b(t)) with x(t0)= x0, (2.12)

T T T T T T where x = xp , xe and f(x, a, b)= fp (x, a), fe (x, b) .

Now, We choose the objective functional as 

T J(a, b; x0, z0)= G x(t), z(t),a(t), b(t) dt + Q x(T ) (2.13) Zt0   subject to (2.12) and (2.9).

In (2.13), the terminal cost Q : X R 0 is bounded and Lipschitz, i.e., → ≥

Q(x) C and Q(x) Q(¯x) C x x¯ (2.14) | | ≤ Q | − | ≤ Qk − k

M for some CQ > 0 and all x, x¯ X; the cost rate G : X Z Aa Ba R 0 ∈ × × × → ≥ satisfies

δ G(x, z, a, b) C , and G(x, z, a, b) G(¯x, z, a, b) C x x¯ (2.15) ≤ ≤ G | − | ≤ Gk − k for some 0 <δ 0 G ∈ ∈ ∈ a ∈ a in equation (2.15), the optimization with respect to J is meaningful only when T is

finite.

21 We model a multi-player PE game as a zero-sum game, where the pursuers try to

minimize the objective J, while the evaders are maximizing the same objective. With

this formulation, the problem is aimed to solve for optimal “centralized” strategies of

players (opposite to the decentralized scheme with respect to multiple agents). It is

assumed that communications among the pursuers (evaders) are ideal and available

information is shared by all the pursuers (evaders). Clearly, in contrast to a two-

player PE game, both cooperation and competition are relevant in a multi-player

game.

Before defining the value function of a multi-player PE differential game, we need

to specify the information pattern of the players. We first define nonanticipative

strategies as follow. Consider a strategy of the pursuers at time t 0 as a map ≥ α : B(t) A(t). 7→

Definition 1 The strategy α is called nonanticipative strategy if for any t ≤ s T and any b, ˜b B(t), b(τ) = ˜b(τ) for t τ s almost everywhere implies ≤ ∈ ≤ ≤ α[b](τ)= α[˜b](τ) for t τ s almost everywhere. ≤ ≤

Denote by Γ(t) the set of all nonanticipative strategies for the pursuers beginning at time t. Similarly, a nonanticipative strategy β : A(t) B(t) can be defined for the 7→ evaders and the set of all β’s is denoted by ∆(t). According to this definition, it can be seen that the players in a game choose their control inputs at each time instant only based on the current and past information. In the following, we use the notations

α[b]i and β[a]j to stand for the strategies of pursuer i and evader j respectively.

22 Remark 1 Note that by inspection of (2.11), at any x X, z ZM , the strategy of ∈ ∈ evader j, β[a] , is meaningful only if z = 0. Thus, different sets of “alive” evaders j j 6 lead to different optimization problems depending on evader j for j I , which is the ∈ z case in (2.16)-(2.17) as follows.

Finally, we can define the lower and upper value of a multi-player PE game. We

use x as a short notation for x(t), and similarly for z, a and b. For any x X and t t ∈ z ZM , the lower Value of a game V (x , z ) is defined as t ∈ t t

V (xt, zt) = inf sup J(α[b], b; xt, zt) α Γ(t) b( ) B(t) ∈ · ∈  T = inf sup G xτ , zτ , α[b]τ , bτ dτ + Q(xT ) . (2.16) α Γ(t) b( ) B(t) t ∈ · ∈  Z   Similarly, the upper value V (xt, zt) is

V (xt, zt) = sup inf J(a, β[a]; xt, zt) β ∆(t) a( ) A(t) ∈ · ∈  T = sup inf G xτ , zτ ,aτ , β[a]τ dτ + Q(xT ) . (2.17) β ∆(t) a( ) A(t) t ∈ · ∈  Z   Note that since the dynamics of the players are time invariant and the terminal time

T is not fixed, V and V are not a function of time t. Thus, in the following discussion,

unless otherwise explicitly stated, we use (general) time t to denote the initial time of

a game. In (2.16), the pursuers have the informational advantage, i.e., at each t 0 ≥ they know the decisions made by the evaders at t, and similarly for the evaders in

(2.17) [9]. In general, V (x , z ) V (x , z ). Furthermore, if t t ≤ t t

V (xt, zt)= V (xt, zt) (2.18)

for any x X and z ZM , we say the value of a game exists, which is denoted t ∈ t ∈ by V (xt, zt). Condition (2.18) is called the Isaacs condition, under which V (xt, zt) is

23 the saddle-point equilibrium of the PE game. With these definitions, optimality can

be interpreted according to V , V or V . Henceforth, we use the capitalized “Value”

to stand for the (lower/upper) Value functions of a multi-player PE game defined in

(2.16)-(2.18). The definition of the Value above is due to Varaiya [35], Roxin [36] and

Elliott and Kalton [37, 38].

2.2 Dilemma of the Current Differential Game Theory

The conventional methods for solving differential games are closely related to

optimal control theory, which includes DP and the minimum principle. In DP meth-

ods, the Value of a differential game (optimal control) problem is characterized by

a HJI (HJB) equation, in which initial conditions on the terminal states are gener-

ally needed. Note that Isaacs’ method of “tenet of transition” [11] is essentially the

same as DP, which is based on the underlying idea of state rollback. Starting from

the terminal, an optimal state trajectory is traced backwards under the optimal con-

trol strategies. However, in a multi-player PE game, the techniques based on state

rollback encounter tremendous difficulty both in the problem formulation and in spec-

ifying terminal states. First, the treatment by the corresponding HJI equation for a

multi-player game problem is not obvious because the additional state z is discrete.

Second, in contrast to a two-player PE game, the terminal state of a multi-player game is difficult to specify. In a multi-player PE game, if pursuer i catches evader j, we say that both players are engaged. To conduct the analysis based on state roll- back, we need to start with a specific engagement scheme between the pursuers and the evaders. However, the number of possible engagements between the pursuers and the evaders increases drastically with N and M. This explosion makes the optimal

24 terminal state difficult to determine in the dynamic optimization problem. Moreover,

the evaders are generally not captured at the same time, which makes it more diffi-

cult to find an optimal terminal state. Therefore, DP approaches cannot be directly

applied to multi-player PE games. On the other hand, a similar obstacle will be en-

countered when the minimum principle is applied, because boundary conditions are

also needed. Generally speaking, the current differential game theory is inadequate

to solving a multi-player PE differential game, such that alternative techniques are

greatly needed.

2.3 A Structured Approach of Set-Time-Consistency

We first clarify the problem of multi-player PE games to be solved. We do not

assume the Isaacs condition, and our study is focused on the upper Value function

V defined in (2.17). In this case, the evaders are endowed with the informational advantage, which corresponds to a worst case from the pursuers’ perspective. We confine our attention to the “capture region” [11, 7], the subset of the state space X

where the capture of all the evaders is ensured by certain strategy of the pursuers.

By abuse of notation, we use X to denote the “capture region” of interest. In other

words, for all x X, z ZM , there exists a strategy a( ) A(t) of the pursuers such ∈ ∈ · ∈ that all the evaders are captured by a finite time for any β ∆(t). This restriction ∈ assures us that we are solving the problem of finding optimal strategies of the pursuers

(or evaders) without worrying about capturability12.

Since the current differential game theory is not applicable, we start with a sub-

optimal solution for a multi-player PE differential game. We first specify a class of

12Here, capturability is referred to the problem of determining whether or not all the evaders can be captured by the pursuers at certain states.

25 suboptimal methods, such that any member of the set solves a game problem similar

to the original but with respect to a restricted set of the pursuers’ controls. Later in

section 2.6, we will discuss a hierarchical approach that belongs to the set. Now, sup-

pose some structure S is imposed on the pursuers’ controls. Given any states x X ∈ and z ZM , we denote by AS (t) a nonempty restricted (or structured) control set ∈ x,z of the pursuers, and AS (t) A(t). Here, the superscript and the subscript in AS x,z ⊆ x,z indicate the dependence on structure S and on the states x and z.

Assumption 2 For any x X, z ZM , at time t 0 there exists a AS (t) such ∈ ∈ ≥ ∈ x,z that T < for all b( ) B(t). ∞ · ∈

Under Assumption 2, a minimax problem similar to (2.17) is formulated as

T V (x, z) = sup inf G xτ , zτ ,aτ , β[a]τ dτ + Q(xT ) (2.19) S β ∆(t) a( ) Ax,z(t) t ∈ · ∈  Z  e  Suppose that with this restriction, the difficulty of the differential game theory

in multi-player games can be circumvented such that V (x, z) in (2.19) is solvable.

Clearly, V (x, z) is an upper-bound of the real upper Valuee in (2.17), i.e., V (x, z) ≤ V (x, z).e Next, we define the property of Set-Time-Consistency (STC) of the struc-

turede controls under S, which specifies a class of suboptimal strategies that preserve

certain property with time. Given anya ˜t( ) AS (t), x X, z ZM and · ∈ xt,zt t ∈ t ∈

β ∆(t), denote by x t t the trajectory of x for s t starting from x under ∈ s;xt,a˜ ,β[˜a ] ≥ t t t t t a˜ and β[˜a ]. For short, letx ˜(s) = xs;xt,a˜ ,β[˜a ]. Denote by zs;zt,x˜ the trajectory of z

corresponding tox ˜ and usez ˜(s) =z ˜s;zt,x˜ for a short notation.

26 Definition 2 (Set-Time-Consistency) Control structure S is said to be set-time- consistent, or ST-consistent for short, if for any a˜t( ) AS (t) at any t for · ∈ xt,zt 0 t T , there exists a˜s( ) AS (s) associated with x˜ and z˜ at any s for ≤ ≤ · ∈ x˜s,z˜s s s t s T such that a˜t(τ) =a ˜s(τ) for s τ T under any β ∆(t). ≤ ≤ ≤ ≤ ∈

Remark 2 STC indicates that at any state x˜(s), z˜(s) along the trajectories (t s ≤ ≤ T ), the later portion of the structured control a˜t( ) AS (t), determined at time t, · ∈ xt,zt from time s to T , i.e., a˜t(τ), s τ T , belongs to the structured control set AS (s) ≤ ≤ xs,zs determined at time s associated with xs and zs.

Clearly, if the structure S is independent of the state x, z and time t, it is ST- consistent.

With STC, a suboptimal upper Value in (2.19) has an improving property.

Theorem 1 Under a ST-consistent control structure S, function V (x, z) in (2.19)

satisfies (2.20) for any x X, z ZM with ∆t for 0 ∆t T t.e ∈ ∈ ≤ ≤ − t+∆t V (x, z) sup inf G xτ , zτ ,aτ , β[a]τ dτ + V (xt+∆t;x,a,β[a], zt+∆t;z,x) ≥ β ∆(t) a( ) A(t) t ∈ · ∈  Z  e  e (2.20)

Proof: Given an arbitrary ε > 0 and for any x X and z ZM , there exists ∈ ∈ a˜t( ) AS (t), such that · ∈ x,z T t t V (x, z) sup G xτ , zτ , a˜τ , β[˜a ]τ dτ + Q(xT ) ε ≥ β ∆(t) t − ∈  Z  t+∆t  e t t = sup G xτ , zτ , a˜τ , β[˜a ]τ dτ β ∆(t) t ∈  Z T  t t + sup G xτ , zτ , a˜τ , β[˜a ]τ dτ + Q(xT ) ε (2.21) β ∆(t+∆t) t+∆t − ∈ Z  

27 for any 0 ∆t T t. In (2.21), the equality is due to the principle of DP. By the ≤ ≤ − STC of the structure S, the second term from t +∆t to T in (2.21) satisfies

T t t sup G xτ , zτ , a˜τ , β[˜a ]τ dτ + Q(xT ) β ∆(t+∆t) t+∆t ∈  Z   T sup inf G xτ , zτ ,aτ , β[a]τ dτ + Q(xT ) S ≥ β ∆(t+∆t) a( ) Ax ,z (t+∆t) t+∆t ∈ ( · ∈ t+∆t t+∆t  Z )  = V (xt+∆t;x,a˜t,β[˜at], zt+∆t;z,x). (2.22)

e Substitute (2.22) into (2.21),

t+∆t t t V (x, z) sup G xτ , zτ , a˜τ , β[˜a ]τ dτ + V (xt+∆t;x,a˜t,β[˜at], zt+∆t;z,x) ε. ≥ β ∆(t) t − ∈  Z  e  e (2.23)

Note thata ˜t( ) A(t). It follows from (2.23) that · ∈ t+∆t V (x, z) sup inf G xτ , zτ ,aτ , β[a]τ dτ + V (xt+∆t;x,a,β[a], zt+∆t;z,x) ε ≥ β ∆(t) a( ) A(t) t − ∈ · ∈  Z  e  e (2.24)

Since ε is arbitrary, the proof is completed.

Remark 3 Note that the optimization of the pursuers over [t,t +∆T ) in (2.20) is

S taken over all possible strategies in A(t) instead of Ax,z(t). It indicates that this performance improvement results from the structure relaxation in the optimization13.

Theorem 1 states that V can be improved by the optimization based on limited look- ahead if V is associatede with a control structure of STC. The following analysis is based one this limited look-ahead formulation, where the function V on the right side of (2.20) is referred to as “cost-to-go” function. e

13We will further justify this observation by a specific example in Section 5.2.2 in Chapter 5.

28 2.4 Iterative Improvement Based on Limited Look-ahead

In this section, we discuss an iterative process based on a suboptimal solution

that results from structured controls. Denote by W a function such that W W , ∈ ψ : X ZM R . Let ∆t > 0 be a (small) look-ahead interval. Define map { × → } H : W W as → t+∆t H[W ](x, z) = sup inf G xτ , zτ ,aτ , β[a]τ dτ β ∆(t) a( ) A(t) t ∈ · ∈  Z  + W (xt+∆t;x,a,β[a], zt+∆t;z,x) subject to (2.12) and (2.9), (2.25)  where z = 0, W W and t +∆t = (t +∆t) T , min t +∆t, T 14. Note that 6 ∈ ∧ { } transformation H depends on ∆t; however, the dependence is suppressed here because the following study is independent of ∆t. In (2.25), a differential game problem with a shorter duration is solved with W as the terminal cost (or cost-to-go at time t +∆t).

Introduce a notationz ˜x ZM to indicate the evaders that are captured according ∈ to x X, i.e., the jth element ofz ˜x,z ˜x = 0 if P (xi ) P (xj) ε for some i, and ∈ j p − e ≤ z˜x = 1 otherwise. For any state x X, z ZM at time t, define zx as follows. j ∈ ∈

zx =z ˜x z for j = 1, , M (2.26) j j ∧ j ···

Due to the instantaneous state transition of z along the corresponding trajectory x defined in (2.9) and the integral in the objective (2.25), clearly,

x H[W ](xt, zt)= H[W ](xt, zt ) (2.27) for any x X and z ZM . t ∈ t ∈ 14Henceforth, without explicit indication, in an operation similar to H, the optimization is subject to the equations (2.12) and (2.9).

29 In the following, we focus on a subset of W , i.e.,

W , W W W, W (x, z) L for some L R; ≥ ∈ ≥ ∈ n W (x, z) H[W ](x, z) for any x X, z ZM , z = 0 . (2.28) ≥ ∈ ∈ 6 o The set W contains all the functions that are bounded from below and decreasing ≥

with respect to the transformation H. Given any W0 W , a sequence of functions ∈ ≥

W ∞ can be generated by W = H[W ]. By (2.27), { k}k=0 k+1 k

W (x , z )= W (x , zxt ) for each k Z+. (2.29) k t t k t t ∈

The following theorem shows that H[W ] W if W W , which implies that ∈ ≥ ∈ ≥ + Wk W for any k Z if W0 W . In addition, the sequence is decreasing so ∈ ≥ ∈ ∈ ≥ that it converges.

Theorem 2 (i) If W0(x, z) W , then the sequence Wk k∞=0 converges point- ∈ ≥ { } wisely; (ii) the limiting function W (x, z) , limk Wk(x, z) satisfies W (x, z) = ∞ →∞ ∞ H[W ](x, z). ∞

Lemma 1 Let f : X Y R and g : X Y R be two functions on X and Y . If × → × → f(x, y) g(x, y) for any x X, y Y , then ≤ ∈ ∈

sup inf f(x, y) sup inf g(x, y) , and y Y x X ≤ y Y x X ∈ ∈ ∈ ∈   inf sup f(x, y) inf sup g(x, y) . x X y Y ≤ x X y Y ∈ ∈ ∈ ∈   for any x X, y Y . ∈ ∈

30 Remark 4 The conclusion in Lemma 1 can be extended to a functional space.

+ Proof: [Proof of Theorem 2] (i) We first prove that Wk W for k Z by ∈ ≥ ∈ induction. Suppose that Wk W (k Z 0), and we want to show that Wk+1 W . ∈ ≥ ∈ ≥ ∈ ≥ By definition,

t+∆t Wk+1(x, z) = sup inf G xτ , zτ ,aτ , β[a]τ dτ β ∆(t) a( ) A(t) t ∈ · ∈  Z  + Wk(xt+∆t;x,a,β[a], zt+∆t;z,x)  M Note that Wk W implies Wk+1(x, z) Wk(x, z) for any x X, z Z . By ∈ ≥ ≤ ∈ ∈ Lemma 1,

t+∆t Wk+1(x, z) = sup inf G xτ , zτ ,aτ , β[a]τ dτ β ∆(t) a( ) A(t) t ∈ · ∈  Z  + Wk(xt+∆t;x,a,β[a], zt+∆t;z,x)  t+∆t sup inf G xτ , zτ ,aτ , β[a]τ dτ ≥ β ∆(t) a( ) A(t) t ∈ · ∈  Z  + Wk+1(xt+∆t;x,a,β[a], zt+∆t;z,x) = H[Wk+1](x, z) (2.30) 

By (2.30), clearly, Wk+1 = H[Wk] W . Since W0 W , by induction, Wk W ∈ ≥ ∈ ≥ ∈ ≥ + M for k Z . In addition, the sequence W ∞ is decreasing at any x X, z Z . ∈ { k}k=0 ∈ ∈

Therefore, W ∞ converges point-wisely due to the fact that W is bounded from { k}k=0 k + below for any k Z . Define W (x, z) , limk Wk(w, z). ∈ ∞ →∞ + (ii) For any k Z , Wk(x, z) W (x, z) and it follows that ∈ ≥ ∞ t+∆t

Wk(x, z) sup inf G xτ , zτ ,aτ , β[a]τ dτ + W (xt+∆t;x,a,β[a], zt+∆t;z,x) ≥ β ∆(t) a( ) A(t) t ∞ ∈ · ∈  Z   Let k , and then → ∞ t+∆t

W (x, z) sup inf G xτ , zτ ,aτ , β[a]τ dτ+W (xt+∆t;x,a,β[a], zt+∆t;z,x) ∞ ≥ β ∆(t) a( ) A(t) t ∞ ∈ · ∈  Z   (2.31)

31 On the other hand, for any k Z+, ∈ t+∆t

W (x, z) sup inf G xτ , zτ ,aτ , β[a]τ dτ + Wk(xt+∆t;x,a,β[a], zt+∆t;z,x) ∞ ≤ β ∆(t) a( ) A(t) t ∈ · ∈  Z   Similarly, let k , → ∞ t+∆t

W (x, z) sup inf G xτ , zτ ,aτ , β[a]τ dτ+W (xt+∆t;x,a,β[a], zt+∆t;z,x) ∞ ≤ β ∆(t) a( ) A(t) t ∞ ∈ · ∈  Z   (2.32)

From (2.31) and (2.32), W (x, z)= H[W ](x, z). ∞ ∞

Theorem 2 states that starting from some function W0 in W , a fixed point of ≥ the transformation H in space W can be approached by iteration of H. Recall ≥ that the suboptimal upper Value function V (x, z) determined from a ST-consistent

suboptimal control satisfies (2.20) and boundede from below. Therefore, V W . ∈ ≥

Let W0 = V , and the decreasing sequence Wk ∞ can be generated by H.e Each Wk { }k=0

in the sequencee is an upper-bound of V and Wk Wl for 0 l < k. In the following, ≤ ≤ we shall show that W = V if W0 = V . Here, we focus on the upper Value in (2.17), ∞ and a similar process can also be appliede to the lower Value V .

Lemma 2 The upper Value function V in (2.17) satisfies

tm T ∧ V (x, z) = sup inf G xτ , zτ ,aτ , β[a]τ dτ + V (xtm T ;x,a,β[a], ztm T ;z,x) β ∆(t) a( ) A(t) t ∧ ∧ ∈ · ∈  Z   (2.33)

for any t t, x X and z ZM . m ≥ ∈ ∈

Proof: This is the principle of DP. If t T , since m ≥

V (xT ;x,a,β[a], zT ;z,x)= Q(xT ),

the equality in (2.33) is trivial. If tm < T , the proof of Theorem 3.1 in [69] is valid.

32 Lemma 3 For a real-valued function W (x, z), if there exists ∆t > 0, such that for any x X and z ZM , ∈ ∈ t+∆t

W (x, z) = sup inf G xτ , zτ ,aτ , β[a]τ dτ + W (xt+∆t;x,a,β[a], zt+∆t;z,x) , β ∆(t) a( ) A(t) t ∈ · ∈  Z   (2.34)

then for any K Z+, ∈ t+K∆t W (x, z) = sup inf G xτ , zτ ,aτ , β[a]τ dτ β ∆(t) a( ) A(t) t ∈ · ∈  Z  + W (xt+K∆t;x,a,β[a], zt+K∆t;z,x) , (2.35)  for any x X and z ZM . ∈ ∈

Proof: Define W (x, z; λ) as

t+λ f W (x, z; λ) = sup inf G xτ , zτ ,aτ , β[a]τ dτ + W (xt+λ;x,a,β[a], zt+λ;z,x) . β ∆(t) a( ) A(t) t ∈ · ∈  Z   f (2.36) First, consider K = 2 and let λ = 2∆t. The equality in (2.35) reduces to (2.34) if

T t +∆t. If T >t +∆t, let t = t +∆t. By Lemma 2, ≤ m t+∆t W (x, z; 2∆t) = sup inf G xτ , zτ ,aτ , β[a]τ dτ β ∆(t) a( ) A(t) t ∈ · ∈  Z  f + W (xt+∆t;x,a,β[a], zt+∆t;z,x;∆t) . (2.37)  By (2.36) and the hypothesis in (2.34),f

W (xt+∆t;x,a,β[a], zt+∆t;z,x;∆t)= W (xt+∆t;x,a,β[a], zt+∆t;z,x).

Substitute it intof (2.37) and by (2.34),

t+∆t W (x, z; 2∆t) = sup inf G xτ , zτ ,aτ , β[a]τ dτ β ∆(t) a( ) A(t) t ∈ · ∈  Z  f + W (xt+∆t;x,a,β[a], zt+∆t;z,x) = W (x, z). (2.38)  33 Thus, (2.35) holds for K = 2.

Now, consider K = 3. When T t + 2∆t, ≤

W (x, z; 3∆t)= W (x, z; 2∆t)= W (x, z).

f f If T >t + 2∆t, choose tm = t + 2∆t, and by Lemma 2,

t+2∆t W (x, z; 3∆t) = sup inf G xτ , zτ ,aτ , β[a]τ dτ β ∆(t) a( ) A(t) t ∈ · ∈  Z  f + W (xt+2∆t;x,a,β[a], zt+2∆t;z,x;∆t) . (2.39)  Note that W (x, z;∆t) = W (x, zf) for any x X and z ZM . According to (2.38), ∈ ∈ W (x, z; 3∆t)= W (x, z). By induction, this statement is true for any K Z+. f ∈ f ∞ Theorem 3 Suppose that W0 = V . For the sequence Wk(x, z) k=1 with Wk+1 =  M H[Wk], its limit W (x, z) = limk e Wk(x, z)= V (x, z) for any x X, z Z . ∞ →∞ ∈ ∈

x Proof: It is trivial when z = 0, i.e., Wk(x, z) = W (x, z) = Q(x) = V (x, z) for ∞ + x k Z . Now, we consider z = 0. By Theorem 1, W0 = V W . Theorem 2 implies ∈ 6 ∈ ≥ ∞ that Wk(x, z) k=1 is decreasing and its limit satisfies e  t+∆t W (x, z) = sup inf G xτ , zτ ,aτ , β[a]τ dτ ∞ β ∆(t) a( ) A(t) t ∈ · ∈  Z  +W (xt+∆t;x,a,β[a], zt+∆t;z,x) . ∞  By Lemma 3,

t+K∆t W (x, z) = sup inf G xτ , zτ ,aτ , β[a]τ dτ ∞ β ∆(t) a( ) A(t) t ∈ · ∈  Z  +W (xt+K∆t;x,a,β[a], zt+K∆t;z,x) (2.40) ∞  for any K Z+. Furthermore, by Assumption 2, W < . Since G δ > 0 and ∈ 0 ∞ ≥ Q 0, the terminal time T of a game starting from any x, z and time t satisfies that ≥ 34 T t + W (x, z)/δ, which is finite. Therefore, there exists an integer K Z+ such ≤ 0 0 ∈ that W (x, z)/δ K ∆t, i.e., T t + K ∆t. Let K = K in (2.40), and then 0 ≤ 0 ≤ 0 0

t+K0∆t W (x, z) = sup inf G xτ , zτ ,aτ , β[a]τ dτ ∞ β ∆(t) a( ) A(t) t ∈ · ∈  Z  + W (xt+K ∆t;x,a,β[a], zt+K ∆t;z,x) ∞ 0 0  T = sup inf G xτ , zτ ,aτ , β[a]τ dτ + Q(xT ) = V (x, z) β ∆(t) a( ) A(t) t ∈ · ∈  Z   for any x X and z ZM . ∈ ∈ So far, we have provided a complete recipe for solving a deterministic multi-

player PE differential game. By iteration of the optimization over limited look-ahead

intervals based on a suboptimal solution determined by a ST-consistent control, the

dilemma of the conventional differential game theory in a multi-player PE game is

circumvented.

2.5 Finite Convergence on a Compact Set

In the previous section, we have shown that starting from certain initial subopti- mal upper Value, the real upper Value can be approached iteratively. In this section, we show that the convergence can be achieved in a finite number of steps on a com- pact subset of X. Let W0 = V , and then Wk(x, z) 0 for each k Z . Based on ≥ ∈ ≥ th the k function Wk(x, z) in thee sequence, we can define a level set

k , M R Ωr (x, z) X Z Wk(x, z) < r for some r 0. ∈ × ∈ ≥ n o

For now, we assume the continuity of Wk in x, and later in Section 2.7.2 (Theorem

6), we will provide sufficient conditions, under which Wk is continuous in x given

M k R z Z . If Wk is continuous, then Ωr is open. In addition, given any r 0, ∈ ∈ ≥ 35 k k k+1 Z the sets Ωr associated with different k are nested, i.e., Ωr Ωr for any k 0. ⊆ ∈ ≥ k This is because that the sequence is decreasing. Define Ω = ∞ Ω and clearly, r ∪k=0 r Ω = (x, z) X ZM V (x, z) < r . r ∈ × n o + Lemma 4 For any r R 0, there exists N Z such that for any (x, z) Ωr, ∈ ≥ ∈ ∈ W (x, z)= V (x, z) for any n N, and thus Ω = Ωn. n ≥ r r

Proof: It is trivial if Ω = ∅. Hence, we only consider Ω = ∅. Since G δ > 0, r r 6 ≥

given r R 0, for any (x, z) Ωr, the game terminates by a finite time T under ∈ ≥ ∈ certain strategy of the players. In addition, T t + r/δ. Next, we shall show ≤ that for any state (x, z) X ZM , if the game ends by a finite time T , the sequence ∈ × ∞ Wk(x, z) k=0 converges to the upper Value in a finite number of steps. First, consider r = δ ∆t . For any (x, z) Ω , the game ends by ∆t and 1 · ∈ r1 t+∆t W1(x, z) = sup inf G xτ , zτ ,aτ , β[a]τ dτ β ∆(t) a( ) A(t) t ∈ · ∈  Z  + W0(xt+∆t;x,a,β[a], zt+∆t;z,x)  T = sup inf G xτ , zτ ,aτ , β[a]τ dτ + Q(xT ) β ∆(t) a( ) A(t) t ∈ · ∈  Z   = V (x, z) r . (2.41) ≤ 1

The equality above is because that W0(x, z) = V (x, z) = Q(x) at the terminal of a game for any x X, z ZM with zx = 0. Thus, (x, z) Ω1 , which implies ∈ ∈ ∈ r1 1 1 1 Ωr Ω , and due to the nested property, i.e., Ω Ωr , then Ωr = Ω . Next, we 1 ⊆ r1 r1 ⊆ 1 1 r1 consider Ω for r = δ 2∆t. For any (x, z) Ω , by definition, r2 2 · ∈ r2 t+∆t W2(x, z) = sup inf G xτ , zτ ,aτ , β[a]τ dτ β ∆(t) a( ) A(t) t ∈ · ∈  Z  + W1(xt+∆t;x,a,β[a], zt+∆t;z,x) (2.42)  36 If t +∆t T , (2.42) reduces to (2.41), i.e., ≤ T W2(x, z) = sup inf G xτ , zτ ,aτ , β[a]τ dτ + Q(xT ) = V (x, z) r2. β ∆(t) a( ) A(t) t ≤ ∈ · ∈  Z   If t +∆t > T , by the principle of DP,

t+∆t W2(x, z) = sup inf G xτ , zτ ,aτ , β[a]τ dτ β ∆(t) a( ) A(t) t ∈ · ∈  Z  + W (x , z ) = V (x, z) r . (2.43) 1 t+∆t;x,a,β[a] t+∆t;z,x ≤ 2  In (2.43), the second equality is because the game must end by 2∆t, and thus

W1(xt+∆t;x,a,β[a], zt+∆t;z,x)= V (xt+∆t;x,a,β[a], zt+∆t;z,x).

2 2 Therefore, Ωr Ω and then Ωr = Ω . This process can be continued for N steps 2 ⊆ r2 2 r2 until N r/(δ ∆t), such that for any (x, z) Ω , W (x, z)= V (x, z), which implies ≥ · ∈ r N Ω = ΩN . This is also true for any n N. r r ≥ R Now, construct any sequence rk k∞=1 with rk 0, rk < rk+1 and limk rk = { } ∈ ≥ →∞

. Clearly, Ω Ω . By Assumption 2, X = Ω , ∞ Ω . The following ∞ rk ⊆ rk+1 ∪k=1 rk + theorem shows that if W is continuous in x for any k Z , the W ∞ converges k ∈ { k}k=1 to V in a finite number of steps on any compact subset Θ of X, and the limiting

function W = V is continuous in x. ∞

Theorem 4 Suppose that W (x, z) is continuous in x for any k Z+. If Θ is a k ∈ compact subset of X, then

(i) there exists N Z 0 such that V (x, z)= Wn(x, z) for any n N and (x, z) Θ; ∈ ≥ ≥ ∈ (ii) V (x, z) is continuous in x on Θ.

Proof: (i) Since Θ is compact and W is continuous, there exists (x , z ) Θ such 0 0 0 ∈ that W0(x0, z0) = sup(x,z) Θ W0(x0, z0) = r0. Note that V (x, z) W0(x0, z0) = r0 ∈ ≤  37 for (x, z) Θ Ω . By Lemma 4, there exists positive integer N Z+ such that ∈ ⊆ r0 ∈ V (x, z) = W (x, z) for any n N and (x, z) Θ. (ii) The continuity of W implies n ≥ ∈ k the continuity of V on Θ.

2.6 The Hierarchy Approach

With the iterative method introduced earlier, a multi-player PE game reduces to a problem of finding a suboptimal upper Value with the improving property. In this section, a hierarchical approach of STC is proposed, and the resulting suboptimal solution is a valid starting point for the iterative process. We focus on a class of games with the players’ dynamics in (2.12) with (2.9) and the following objective.

T M J(a, b; x, z)= zj(t) dt (2.44) t j=1 Z  X  The objective in (2.44) stands for the sum of the capture time of each evader.

To avoid the difficulty of determining the terminal states in a multi-player PE game and that from discrete z, we use a two-level approach [70]. The upper level

is to determine a proper engagement scheme between the pursuers and the evaders,

such that a multi-player PE game can be decomposed into distributed two-player

PE games. At the lower level, the decoupled two-player games are solved. Suppose

that each evader can be engaged with no more than one pursuer; and each pursuer

may be engaged with more than one evader, in which case the two-player games

involving different evaders and the same pursuer are solved sequentially. The evaders

that are captured in later stages are assumed to know the strategy that the pursuer

exploits in chasing previous evaders. Clearly, this hierarchical approach depends on

the solvability of the underlying two-player PE games and we have the following

assumption.

38 Assumption 3 Each evader j can be captured by at least one pursuer ij. In this

case, we say that evader j is capturable by pursuer ij.

Suppose that the two-player PE game between evader j and pursuer ij is solvable. We T ij jT T denote the upper Value of the two-player game by V ij j(xij j) with xij j = xp , xe .

The detailed procedure of the approach is described as follows.  

2.6.1 Two-player Pursuit-Evasion Games at the Lower Level

We start with the lower level, where the problem is to solve the upper Value

function of series of two-player PE games between pursuer i (i = 1, ,N) and ··· T evader j (j = 1, , M). Let x = xi T, xjT . The upper Value is ··· ij p e   T V ij(xij) = sup inf J(ai, βj; xij) = sup inf dτ . (2.45) i i j ai( ) A (t) j ai( ) A (t) βj ∆ (t) βj ∆ (t) t ∈ i · ∈ n o ∈ i · ∈  Z  j In (2.45), the strategy set ∆i (t) of evader j is defined similarly as ∆(t), i.e.,

∆j(t)= β : Ai(t) Bj(t) β is nonanticipative . i j → | j  In the two-player game, if pursuer i cannot catch evader j at x , then V (x )= . ij ij ij ∞ 2.6.2 Combinatorial Optimization at the Upper Level

Denote by E = ei , ,ei the ordered set of the evaders that are engaged i 1 ··· ni with pursuer i (i = 1, ,N). Denote by the E cardinal number of set E . If ··· | i| i E = ∅, E = n = 0. If E is determined, pursuer i goes after the evaders in E i | i| i i i i th i sequentially. Denote by Tk the capture time of the k evader in Ei. Let xp[k] be

th pursuer i’s state at the time when it is ready to go after the k evader in Ei, i.e.,

i i i j xp[k]= xp(Tk 1) , and similarly for xe[k], the state of evader j. Let the game start at − th time 0. Suppose that the k evader in Ei exploits the best control against pursuer

39 T i i T jT i from time 0 to Tk 1. Define xij[k] = xp [k], xe [k] . The problem at the upper − h i level is to determine an optimal engagement scheme between the pursuers and the

evaders, and the corresponding optimization problem can be formulated as

N K M h V (x, z)= min V ij(xij[k]) (2.46) σijk { } i=1 k j=1  X X=1 X  e N K M subject to σ 0, 1 , σ = 1, σ 1 and ijk ∈ { } ijk ijk ≤ i=1 k j=1 X X=1 X σ σ for l = 1, ,K 1. ij(l+1) ≤ ijl ··· − h In (2.46), the superscript h in V indicates that the suboptimal upper Value is deter-

th mined by the hierarchical approach;e k is the index of stages, i.e., the k evader in Ei

(if any) associated with pursuer i and K is the maximum number of evaders allowed

to be engaged with one pursuer (K M); the assignment variables σ ’s are binary, ≤ ijk th where σijk = 1 indicates that pursuer i is engaged with evader j as the k evader

in E and σ = 0 if it is not. Denote by E , σ an engagement scheme. To i ijk { ijk} emphasize, the problem in (2.46) is to find proper engagement E such that the sum of the capture time of all evaders is minimized under the hierarchical structure. By h Assumption 3, V (x, z) < for any x X and z ZM 15. ∞ ∈ ∈

Note that thee formulation in (2.46) is based on an implicit assumption that V ij is

known from the lower level for any pursuer i and evader j. Clearly, in this approach, a hierarchical structure is imposed on the pursuers’ controls. More importantly, the best strategy of the evaders is determined “locally” by each evader (at the lower level) against its engaged pursuer at proper stages. It is not hard to show that this (composite) strategy by the evaders is optimal with respect to the optimization problem in (2.19).

15Assumption 3 implies that Assumption 2 holds in the hierarchical approach.

40 Remark 5 Suppose that Assumption 3 holds between any pursuer i and evader j.

If N M and K = 1, the optimization problem in (2.46) can be formulated as a ≥ standard Mixed Integer Linear Programming problem [70], such that the commercial solvers such as CPLEX and LINDO can be utilized [71, 72].

Optimization on Engagement Upper level

Lower level

Two-Player Two-Player Two-Player PE Game PE Game …… PE Game

Figure 2.1: Illustration of a Suboptimal Approach with Hierarchy

The hierarchical approach introduced above is illustrated in Figure 2.1. This

method is often used in complex systems, such as problem of cooperative control

of multiple autonomous mobiles [71, 73, 72]. Proper abstraction of coordination is

used at upper levels to decompose the problem into simpler sub-problems that are

easily solved at lower levels. In multi-player PE game problems, simplification results

from an additional hierarchical structure S imposed on the pursuers’ controls. More importantly, the hierarchical approach is ST-consistent because the “structure” (the set of possible engagements between the pursuers and the “alive” evaders) remains same at any state x and time given any z ZM . Therefore, the resulting suboptimal ∈ upper Value is valid starting point for the iterative approach.

41 2.6.3 Uniform Continuity of the Suboptimal Upper Value

It is shown in Section 2.5 that the finite convergence on compact sets depends on the continuity of the suboptimal upper Value functions. In the section, we study h the (uniform) continuity of V that is obtained from the hierarchical approach. Note that the hierarchical approache to a multi-player PE game depends on the solvability of simplified two-player games, and intuitively the continuity of the suboptimal upper

Value also depends on the upper Value of a two-player game. First of all, we use the following definition to classify certain two-player PE games.

Definition 3 (Strong Capturability) In a two-player PE game, the evader is said to be strongly capturable by the pursuer if for any x Rnp , x Rne , 1) there p ∈ e ∈ exists a pursuer’s control a ( ) A(t) such that the capture time T ′(x , x ; a , b ) < p · ∈ p e p e ∞ for any b ( ) B(t), where T ′(x , x ; a , b ) depends on states x , x and strategies a e · ∈ p e p e p e p and be; 2) Define the optimal capture time T (xp, xe) as

T (xp, xe) = sup inf T ′(xp, xe; ap, βe[ap]). βe ∆(t) ap( ) A(t) ∈ · ∈

T T For any ε > 0, there exists δ > 0 such that for any x˜T, x˜T xT, xT < δ, e p − e p

T (˜x , x˜ ) T (x , x ) <ε 16.     p e − p e

Remark 6 (i) Strong capturability requires the (uniform) continuity of T (xp, xe) in xp and xe. Refer to [9] for more details (e.g., Theorem 1.12 on pp.231).

16 Here, xp and xe are the state variable of the pursuer and the evader respectively; ap and be are the control inputs; βe is a nonanticipative strategy of the evader.

42 (ii) If the evader is strongly capturable by the pursuer in a two-player PE game, then

T T for any ε> 0, there exists δ > 0 such that for any x˜T, x˜T xT, xT < δ, there e p − e p

exists a control a ( ) A(t) such that T ′(x , x ; a , b ) T ′(˜x , x˜ ; a , b ) ε for any p · ∈ | p e p e − p e p e | ≤ b ( ) B(t). e · ∈

Lemma 5 Suppose that M N and K = 1 in (2.46). If every pursuer i has the ≤ h strong capturability of every evader j, then V (x, z) in (2.46) is uniformly continuous

in x. e

Proof: Consider any x0, x1 X, z ZM . Let E(x,z) be the optimal engagements ∈ ∈ at x and z according to (2.46). Denote by J h(x, z; E) the value of the cost evaluated

similarly as in (2.46) under engagement E at x and z. Clearly, J h(xl, z; E(xl,z)) = h V (xl, z) for l 0, 1 . By the strong capturability, the upper Value V defined in ∈ { } ij E (2.45)e is uniformly continuous. Suppose that evader j is engaged with pursuer ij under E. By the continuity of V ij in (2.45), given any ε > 0, there exists δ(E) > 0 such that

0 1 0 1 V iE j(xiE j) V iE j(xiE j) ε/M, for any xiE j xiE j δ(E) and j = 1, , M. j j − j j ≤ j − j ≤ ···

(2.47)

Considering that there are finitely many engagements of E, we choose δ = min δ(E) E{ } (uniform in x). Thus, equation (2.47) holds for x0 x1 δ under any E. − ≤ (x0,z) (x1,z) 0 E Let E0 = E and E1 = E . For E1, we rewrite (2.47) as V i 1 j(x E1 ) j ij j ≤ 1 E E V i 1 j(x E1 )+ ε/M for all j. Take summation of V i 1 j over all j, j ij j j

M M h h 0 h 0 0 1 1 V (x , z) J (x , z; E )= V E (x E ) V E (x E )+ ε = V (x , z)+ ε 1 i 1 j i 1 j i 1 j i 1 j ≤ j j ≤ j j j=1 j=1 e X X e (2.48)

43 On the other hand, we can obtain

M M h h 1 h 1 1 0 0 V (x , z) J (x , z; E )= V E (x E ) V E (x E )+ ε = V (x , z)+ ε 0 i 0 j i 0 j i 0 j i 0 j ≤ j j ≤ j j j=1 j=1 e X X e (2.49) h By(2.48) and (2.49), V is continuous in x. Moreover, since δ does not depend on x,

the continuity is uniform.e

Theorem 5 Suppose that if evader j is capturable by pursuer i at the current state x, then evader j is strongly capturable by pursuer i. The suboptimal upper Value h V (x, z) in (2.46) is bounded and uniformly continuous in x on any bounded region

Ωe X. ⊆

Proof: See Appendix A for a detailed proof.

Remark 7 Theorem 5 differs from Lemma 5 in that the singular engagement con-

straint imposed on the pursuers is relaxed.

2.7 Theoretical Foundation of the Iterative Method

In this section, we study the theoretical soundness of the iterative method. The

study is conducted under the framework of viscosity solution theory of HJ equa-

tions. We provide sufficient conditions for a class of PE games under which the Value

function (a saddle-point) exists.

2.7.1 Viscosity Solution of Hamilton-Jacobi Equations

We first introduce the concept of viscosity solution of a HJ equation in the context

of differential games. Before the invention of this (viscosity solution) concept, the

applicability of DP methods was limited due to the requirement of differentiability

44 of a Value function. However, even for a simple differential game (optimal control) problem, the Value function may not be differentiable, or at least not everywhere.

To overcome this difficulty, Crandall and Lions initiated the study of the viscosity solution for HJ equations [43], with which the Value function of a differential game

(optimal control) problem that is merely continuous or even discontinuous can be characterized by a viscosity solution of the corresponding HJI (HJB) equation [9].

We first consider a (two-player) differential game problem with the following dy- namic equation of the players.

x˙ τ = f(xτ ,aτ , bτ ) with x(t)= xt.

Here, x Rn; player 1’s (minimizer) control a and player 2’s (maximizer) control b ∈ are maps from [t, T ] to compact sets A and B , where 0

T J(a, b; t, x)= g(xτ ,aτ , bτ )dτ + q x(T ) , (2.50) Zt  where g : Rn A B R is bounded and Lipschitz in x as well as q : Rn R. Here, × a× a 7→ 7→ we adopt the definitions of (nonanticipative) strategy and the (upper/lower) Value function. The (upper/lower) Value can be shown as the unique viscosity solution of the corresponding HJI equation [69, 9]. First of all, viscosity solution of a HJ equation is defined as follows.

45 Definition 4 Suppose T is fixed. Let H : [0, T ] Rn Rn R be a continuous map. × × 7→ (i) A continuous function u : [0, T ] Rn R is called a viscosity subsolution × 7→ of the Hamilton-Jacobi (HJ) equation

∂ψ ∂ψ + H(t, x, ) = 0 on (0, T ) Rn with ψ(T, x)= q(x), (2.51) ∂t ∂x ×

provided that for any continuously differentiable function φ : (0, T ) Rn R, if u φ × 7→ − attains a local maximum at (t , x ) (0, T ) Rn, then 0 0 ∈ × ∂φ ∂φ (t , x )+ H(t , x , (x ,t )) 0 (2.52) ∂t 0 0 0 0 ∂x 0 0 ≥

(ii) A continuous function u : [0, T ] Rn R is called a viscosity super- × 7→ solution of the HJ equation (2.51) provided that for any continuously differentiable

function φ : (0, T ) Rn R, if u φ attains a local maximum at (t , x ) (0, T ) Rn, × 7→ − 0 0 ∈ × then ∂φ ∂φ (t , x )+ H(t , x , (x ,t )) 0 (2.53) ∂t 0 0 0 0 ∂x 0 0 ≤ (iii) A continuous function u : [0, T ] Rn R is called a viscosity solution of × 7→ the HJ equation (2.51) if it is both a viscosity subsolution and a viscosity supersolution.

Remark 8 Definition 4 is based on the case when T is fixed, and an initial condition

at time T is provided in (2.51). For a problem with a flexible terminal time as a PE

game, the definition is same except that an additional boundary condition on ∂Λ (the

boundary of the terminal set Λ) is needed.

Now, we define

H−(x, p) , max min f(x, a, b) p + g(x, a, b) and (2.54) b Ba a Aa · ∈ ∈ n o H+(x, p) , min max f(x, a, b) p + g(x, a, b) , (2.55) a Aa b Ba · ∈ ∈ n o 46 for x, p Rn. In [69], Evans and Souganidis proved that the lower (resp. upper) ∈ Value of a differential game formulated above with the objective functional in (2.50)

(where T is fixed) is the (unique) viscosity solution of the corresponding HJI equation

+ as in (2.51) with H replaced by H− in (2.54) (resp. H in (2.55)). For problems with

flexible T , refer to [9] and [45].

2.7.2 Existence of the Value Function

Transformation H is crucial for the iterative method based on limited look-ahead for deterministic PE games. In this section, we aim to show that under H, the image of W (k Z+) defined in (2.25) is a viscosity solution of the corresponding HJI k ∈ equation to be formulated.

Uniform Continuity of Function Wk

We first show the uniform continuity of function W (k Z+) in the sequence k ∈ generated by H starting from V . We start with the definition of set Λz. Given any

z ZM (z = 0), define e ∈ 6

Λ = x X j I such that P (xi ) P (xj) ε for some i = 1, ,N . z ∈ ∃ ∈ z p − e ≤ ··· n o (2.56) Here, recall the definition of I in (2.10). Clearly, the set Λ is closed. Let Λc = X Λ . z z z − z

Lemma 6 For the sequence W (x, z) ∞ generated by H with W = V , W (k k k=0 0 k ∈  Z 0) is (uniformly) continuous in x on X if and only if Wk is (uniformly)e continuous ≥ in x on Λc for any z ZM . Here, the set Λc is the closure of the set Λc. z ∈ z z

Proof: Recall that the closed set Λz in (2.56) as

Λ = x X j I such that P (xi ) P (xj) ε for some i = 1, ,N . z ∈ ∃ ∈ z p − e ≤ ··· n o 47 The necessity is trivial due to Λc X. We only need to show the sufficiency. First, z ⊆ when z = 0, the Wk is continuous because Q(x)= Wk(x, z) is (uniform) continuous.

Now, consider z = 0. Suppose that W is continuous in x on Λc for any z ZM . 6 k z ∈ c Considering the fact that Λz is open such that by the property in (2.29), i.e.,

x Wk(xt, zt)= Wk(xt, zt ),

the continuity of W on Λc int(Λ ) (interior) can be easily proved. Note that k z∪ z

c c ′ ′ Λz int(Λz) z Zz Λz . (2.57) ∪ ⊆∪ ∈

Refer to (2.10) on page 20 for the definition of set Zz. Thus, we only need to show

the continuity on the boundary ∂Λz.

Let z = m, where m = 1, , M. For any x ∂Λ , by the definition of set | | ··· 0 ∈ z Λ , it is possible that zx0 = m 1,m 2, , 0. Refer to (2.26) on page 29 for the z − − ··· definition of zx. We first consider zx0 = m 1. In this case, there exists δ > 0 such − 1 that for any x B(x ; δ ) (open ball), zx must be one of the following cases: zx = z ∈ 0 1

x x0 c or z = z . By the continuity of Wk on Λz, given any ε> 0, there exists δ2 > 0 such

that for any x B(x ; δ δ ) with zx = z, W (x, z) W (x , z) <ε. Note that in ∈ 0 1 ∧ 2 k − k 0

this case, x ∂Λ x . Similarly, there exists δ > 0 such that for any x B(x ; δ δ ) 0 ∈ z 3 ∈ 0 1 ∧ 3 with zx = zx0 ,

W (x, z) W (x , z) = W (x, zx0 ) W (x , zx0 ) <ε k − k 0 k − k 0

c because x, x Λ x . Let δ = δ δ δ . Then, W (x, z) W (x , z) <ε for any 0 ∈ z 1 ∧ 2 ∧ 3 k − k 0 x B(x ; δ). If zx0 = m 2, there exists δ > 0 such that for any x B(x ; δ ), zx ∈ 0 − 1 ∈ 0 1 x x x x must be one of the following: z = z, z = m 1 with z Z and z 0 Z x or | | − ∈ z ∈ z zx = zx0 . Similarly as in the case when zx = m 1, the continuity can be proved | | − 48 in each of the different cases. In particular, in the first two cases, x Λ x , and in 0 ∈ z c the last case x Λ x , in which the continuity can be easily proved. This can be 0 ∈ z 0 extended to other cases, such that is it true for zx0 = m 1,m 2, , 0. Finally, | | − − ··· the conclusion is true for uniform continuity.

Next, we show the uniform continuity of Wk in x given the uniform continuity of

W0. Let us first introduce some necessary notations. Recall that the terminal time

T of a PE game can be defined as T = min t z(t) = 0 . Suppose that z(0) = M 17. { | } | | There exist M time instants T m with T m , min t z(t) = m for m = M 1, M { | | } − − 2, , 0. Clearly, T 0 = T . We assume that when z (t) = m and if n (n> 1) evaders ··· | | m 1 m n are captured at the same time, we assume that T − = = T − . ···

Theorem 6 If every evader is strongly capturable by every pursuer, and W0 = V is

M uniformly continuous in x on X for any z Z , Wk(x, z) is uniformly continuouse ∈ in x for any k Z+. ∈

Proof: Refer to Appendix B for a detailed proof.

h Remark 9 Note that the uniform continuity of the suboptimal upper Value V by the hierarchal approach is proved in Theorem 5. Under the conditions in Theoreme 5 and

Theorem 6, Wk is uniformly continuous in x.

Existence of the Value Function

In the following, we show that the image of transformation H in the iteration

process is the (unique) viscosity solution of the corresponding HJI equation.

17The game starts at time 0.

49 First of all, we formulate a series of proper HJI equations according to different z’s with proper boundary conditions. In this way, the difficulty of the discrete variable z in a multi-player PE game can be circumvented, such that the viscosity theory is applicable. Since the players’ dynamics are time invariant, we consider time t = 0 as the lower limit of the integral in H. Define

∆t

HWk (t, x, z) = sup inf G(xτ , zτ ,aτ , β[a]τ )dτ + Wk x∆t;x,a,β[a], z∆t;z,x β ∆(t) a( ) A(t) t ∈ · ∈ (Z ) (2.58)

for 0 t ∆t. Note that H is a function of time t, and clearly, H (0, x, z) = ≤ ≤ Wk Wk H[W ](x, z) for all x X and z ZM . Given a function W and for any z ZM k ∈ ∈ k ∈ (z = 0), the corresponding HJI equation is 6 ∂uz ∂uz + H+ x, = 0, with (2.59) ∂t z ∂x   uz(∆t, x)= W (x, z) for x Λc, k ∈ z x uz(τ, x)= uz (∆t τ, x ) for τ [0, ∆t) if x ∂Λ . − τ ∈ ∈ z

+ In (2.59), Hz is defined as

+ Hz (x, p) = min max p f(x, a, b)+ G(x, z, a, b) . a Aa b Ba ∈ ∈ ·  zx The boundary condition of the equation on ∂Λz is specified by the solution (u ) of the corresponding HJI equation according to zx, which is defined for x ∂Λ according ∈ z to (2.26). This boundary condition is due to the observation in (2.29). If zxτ = 0, the

boundary condition becomes u0(∆t τ, x ) = Q(x ). The following theorem shows − τ τ

that HWk in (2.58) with a given z is a viscosity solution of the corresponding HJI

equation in (2.59) on Λc. If x Λ , H (t, x, z) can be transferred to H (t, x, zx) z ∈ z Wk Wk by (2.29).

50 Theorem 7 Suppose that W is uniformly continuous. For any z ZM (z = 0), k ∈ 6 c HWk (t, x, z) is a viscosity solution of the HJI equation in (2.59) on Λz.

Proof: We use Definition 4 to prove the theorem. Consider an arbitrary φ z ∈ C1 (0, ∆t) Λc . Suppose that H (t, x, z) φ (t, x) attains a local minimum at × z Wk − z (t , x ) (0, ∆t) Λc , i.e., 0 0 ∈ × z  H (t , x , z) H (t, x, z) φ (t , x ) φ (t, x) Wk 0 0 − Wk ≤ z 0 0 − z for all x x δ and some δ > 0. Since Λc is open, there exists a δ > 0 such that k 0 − k ≤ z x Λc when x x δ. Due to the boundedness of the dynamics f, there exists ∈ z k 0 − k ≤ λ > 0 such that x x < δ for 0 <γ<λ. Since viscosity solution is a t0+γ;x0,a,b − 0

local property, the rest of the proof follows that of Theorem 4.1 in [69].

Next, we show that HWk is the unique viscosity solution of (2.59). Suppose that the pursuers and the evaders have independent dynamics, which are described by x˙pt = fp(xpt,at) andx ˙et = fe(xet, bt) respectively. We assume that the following is true.

f (x , σ) C x + 1 for all x Rnξ , σ Σ and some constant C > 0, ξ ξ ≤ | ξ| ξ ∈ ∈  (2.60) where the subscript ξ p,e stands for the pursuers or the evaders; σ Σ is the ∈ { } ∈ control input and Σ A , B is the set of admissible controls. The uniqueness of ∈ { a a} the viscosity solution of a general HJ equation is provided by Theorem V.2 in [43].

Theorem 8 Suppose that function f satisfies (2.4) and (2.60) (if considered as fp

and fe separately in (2.5)), and the cost rate G satisfies (2.15). Function HWk (t, x, z) is the unique viscosity solution of the HJI equation in (2.59) on any bounded subset

c of Λz.

51 Proof: The proof is a simple application of Theorem V.2 in [43]. It can be

easily checked that all the conditions therein are satisfied. Particularly, the condition

(2.60)18 yields the following equality.

+ + lim sup Hz (x1,p) Hz (x2,p) x1 x2 (1 + p ) r = 0 r 0 − | − | k k ≤ → n o

Therefore, the difference (by infinite-norm) between any two viscosity sup- or sub- solutions on the domain can be bounded by the difference between them on the

boundary. Since the viscosity solution HWk (t, x, z) satisfies the boundary condition

in (2.59), they must be same, and HWk (t, x, z) is the unique viscosity solution of

(2.59).

With the uniqueness of the viscosity solution for (2.59), we can show that the

Value of a multi-player PE game exists under additional conditions. Given a function

W W , define a transformation H similar to H in (2.25) as ∈ t+∆t H[W ](x, z) = inf sup G xτ , zτ , α[b]τ , bτ dτ α Γ(t) b( ) B(t) t ∈ · ∈  Z  + W (xt+∆t;x,α[b],b, zt+∆t;z,x) .  Define the set W as ≥

W , W W (x, z) L for some L R ≥ | ≥ ∈  M and W (x, z) H[W ](x, z), x X, z Z . (2.61) ≥ ∈ ∈

Given any W W (refer to (2.28) for the definition), ∈ ≥

W (x, z) H[W ](x, z) H[W ](x, z), (2.62) ≥ ≥ 18Refer to Theorem 3.17 on page 159 in [9] for a detailed proof of the equality of the limit.

52 which implies that W W . By inspection of the definition of H and H, the ≥ ⊆ ≥ second inequality in (2.62) holds because of the fact that the lower Value of a game

is no larger than the upper Value of the same game, i.e., V V (c.f. pp.434 in [9]). ≥

Starting from the same function W0 = V W , it can be shown similarly that the ∈ ≥

sequence W ′ ∞ generated by W ′ =eH[W ′ ] converges to the lower Value defined { k}k=0 k+1 k in (2.16). The corresponding HJI equation is

∂uz ∂uz + H− x, = 0 (2.63) ∂t z ∂x  

with the same boundary conditions as in (2.59), where Hz− is defined as

Hz−(x, p) = max min p f(x, a, b)+ G(x, z, a, b) . b B a A ∈ ∈ ·  + M If H− = H for any z Z , i.e., z z ∈

max min p f(x, a, b)+G(x, z, a, b) = min max p f(x, a, b)+G(x, z, a, b) , (2.64) b Ba a Aa a Aa b Ba ∈ ∈ · ∈ ∈ ·   equation (2.63) coincides with (2.59) for any z ZM . By the uniqueness of the ∈ + viscosity solution, W ′ = W for any k Z , which implies that V = V . Therefore, k k ∈ the Isaacs condition holds and the Value function of the multi-player PE game exists.

According to equation (2.64), the existence of the Value of a multi-player PE dif-

ferential game reduces to the existence of saddle-point equilibrium of a corresponding

static zero-sum game. This problem was originally studied by von Neumann and has

been extensively studied afterwards. Readers can refer to the Minimax theorem for

more details. The following Theorem provides a simple case when (2.64) holds.

Theorem 9 If function f(x, a, b) : Rn A B Rn and G(x, z, a, b) : Rn ZM × a × a 7→ × × A B Rn are separable, i.e., a × a 7→

f(x, a, b)= fa(x, a)+ fb(x, b) and G(x, z, a, b)= Ga(x,z,a)+ Gb(x, z, b), (2.65)

53 then for any p Rn, ∈ min max pTf(x, a, b)+ G(x, z, a, b) = max min pTf(x, a, b)+ G(x, z, a, b) (2.66) a Aa b Ba b Ba a Aa ∈ ∈ { } ∈ ∈ { } for any x Rn, a A and b B . ∈ ∈ a ∈ a

Refer to [9] on pp.444 for a proof.

In a multi-player PE game, the separation property of f holds because the pur-

suers and the evaders have independent dynamics. Thus, a separable G implies the

existence of Value, e.g., G = 1 when the capture time is the objective functional. In

summary, we have shown the existence of solutions for the transformation H and the existence of Value under the framework of the viscosity solution theory.

2.8 Summary and Discussion

In this chapter, we propose an indirect method for multi-player PE games, on

which the conventional differential game theory is not applicable. We start from a

suboptimal upper Value of a PE game, where only subset of the pursuers’ controls un-

der certain structure S is considered. If the structure S satisfies STC, the suboptimal

upper Value has an improving property, i.e., it can be improved by the optimization

based on limited look-ahead. The performance enhancement here results from the

structure relaxation. If this performance enhancement is applied iteratively, a true

upper Value is approached in the limit. Furthermore, we show that a control structure

S that is independent of the state and time is ST-consistent, such that a suboptimal

solution under S can be improved by limited look-ahead. The hierarchical method

provide a valid starting point for the iterative method.

The iterative method can be viewed as an extension of the DP-type methods to

multi-player PE differential games. The related DP methods in the literature include

54 value iteration in Approximate DP (ADP) [52, 55], reinforcement learning [74] and rollout algorithms [55]. All these methods are aimed to improve an existing subop- timal solution based on some approximate strategies. The first two methods mainly deal with MDP problems on finite state spaces [52, 75]. Note that MDP is a discrete- time optimal control problem with an infinite horizon, such that a suboptimal Value function or state feedback strategies can be easily updated by the optimization based on look-ahead. However, a multi-player PE game has a finite horizon, and the improv- ing property of an suboptimal Value function is not obvious with “receding horizons” in contrast to MDP problems. From this aspect, the iterative method is more related to the rollout algorithm for stochastic control problems. The rollout algorithm is a suboptimal control method, where the control input is calculated by optimization over look-ahead intervals based on some (heuristic) base strategy [54, 55]. The perfor- mance of the base strategy can be improved by the rollout if the base strategy satisfies

“sequential consistency” [55]. However, the “sequential consistency” in [55] requires that given a trajectory generated by an existing base strategy, starting from any state along the trajectory, the base strategy generates the remaining state trajectory. Since the consistency is established along a particular trajectory, it may fail in stochastic problems. In our iterative method, STC is a property of the “base strategy”, which does not depend particular trajectories. In this sense, STC is more general and can be extended to stochastic situations. Furthermore, the hierarchical approach is ST- consistent and also useful in many large-scale problems. Finally, the DP methods mentioned above deal with discrete-time problems of finite dimensions; however, we are more interested in continuous-time problems in a (infinite dimensional) function space, such that existence of solutions may be an issue.

55 This dissertation is the first attempt to solve a general multi-player PE differen-

tial game and although we have demonstrated that the iterative method is feasible

on such problems, it is worth mentioning that the iterative method here suffers from

the so-called “curse of dimensionality” in all DP related methods. A practical algo-

rithm needs further study. To this end, those extensively studied numerical methods

for solving DP equations may benefit the implementation of the iterative method in

many ways. First of all, a continuous-time problem can be approximated by a corre-

sponding discrete DP equation [53] and [9](Appendix A). For a discrete-time problem,

numerical methods such as ADP and reinforcement learning have been studied for

decades by researchers in control, operations research and artificial intelligence. There

are plenty of results such as the value function approximation and real-time DP etc,

which can be useful for implementation of the iterative method. We refer the reader

to [75, 76, 74] for good reviews and recent development in that area.

Besides, the iterative method has its practical value in improving suboptimal solutions. First, it can be used to determine a satisfactory solution rather than an optimal solution, and the process can stop at any step to provide the best upper-bound of the upper Value because the sequence of functions is decreasing. On the other hand, from the look-ahead point of view, the performance of a suboptimal strategy under a ST-consistent structured can be improved along the pursuit trajectories when the strategies by the optimization over look-ahead intervals are exploited at t + k∆t for k Z+ [55, 77]. In practice, such a approach based on a carefully chosen cost- ∈ to-go function can usually perform very well. Finally, one way of simplification is

to implement the iterative improvement only on of the pursuers and the

evaders that closely interact with one another.

56 In summary, this iterative method may suggest potential uses in other research areas. It can be applicable to those complex dynamic optimization problems such as cooperative control problems including search, routing and allocation etc [77]. By the feedback implementation of the optimization based on look-ahead, the performance is enhanced compared to the hierarchical method that is often used, and also the results may be more robust against uncertainties [73, 72].

57 CHAPTER 3

SIMULATION RESULTS OF DETERMINISTIC PURSUIT-EVASION GAMES

In this chapter, several simulation results are presented to demonstrate the feasi- bility of the methods introduced in Chapter 2. We first apply the iterative method to a two-player PE game for which the analytical solution is also derived. The con- vergence result obtained from the numerical calculation coincides with the prediction from the analytical solution. Furthermore, we solve a multi-player PE game for a suboptimal solution by the hierarchical method. We show that the performance of the hierarchical method can be improved by the optimization based on one-step look- ahead. Finally, we demonstrate that the advantage of limited look-ahead in solving a more complicated problem when no suboptimal solution is available. Unfortunately, due to the so-called “curse of dimensionality”, we were unable to implement a com- plete iterative process for multi-player PE games, and an efficient algorithm is yet to be developed.

58 3.1 Feasibility of the Iterative Method

To our knowledge, analytical results for multi-player PE differential games as

formulated in this paper are very limited. Here, we apply the iterative method nu-

merically to a two-player PE game in a two dimensional space. An analytical solution

is also derived, and to which the numerical solution can be compared. This example is

created based on a problem similar to that on page 429 in [7]. Consider the following

dynamics of the pursuer and the evader,

x˙ (t) = cos θ(t) x˙ (t)= b(t)+1+ √2 p , e . (3.1) y˙ (t) = sin θ(t) y˙ (t)= 2  p   e −  Here, xp and yp (xe and ye) are the real-valued state variables of the pursuer (evader);

θ is the control of the pursuer; b is the control of the evader with 1 b(t) 1 − ≤ ≤ for t 0. Let the initial condition be [x , y ]T = [0, 0]T and [x , y ]T = [2, 1]T. ≥ p0 p0 e0 e0 The terminal set Λ of the game is defined as Λ , (x , y ), (x , y ) y y . Note p p e e p ≥ e n o that according to (3.1), (y ˙ y˙ ) 1, thus, the game terminates in finite time. Let e − p ≤ − T x˜ = [xp, yp, xe, ye] and the objective is

T J(θ, b;x ˜ )= b(t)+1+ √2 cos θ(t) dt (3.2) 0 − Z0   The pursuer minimizes the objective (3.2) subject to (3.1) while the evader maximizes

it.

3.1.1 Analytical Solution of the Two-player Game Example

We first solve the PE game problem analytically. Change the variable as x =

x x and y = y y , such that the players’ dynamic equation in (3.1) becomes e − p e − p x˙(t)= b(t)+1+ √2 cos θ(t) − . (3.3) y˙(t)= 2 sin θ(t)  − −  59  T T The initial condition is (x , y ) = (2, 1) and the terminal set becomes Λ′ , (x, y) y 0 0 | ≤ 0 . Letx ¯ = [x, y]T. By inspection of (3.2) and (3.3), the objective function can be simplified as

J ′(θ, b;x ¯)= x(T ) x . (3.4) − 0

Note that x0 does not depend on the controls of the players. Thus, a game with the objective (3.4) is equivalent to that with the objective J ′′(θ, b;x ¯)= x(T ). In the following, we solve the game based on J ′′.

2

1

* Tt y 2 K J

x

Figure 3.1: The Speed Vector under the Optimal Strategies of the Players

19 Denote the Value function associated by J ′′ by V ′′(¯x) . According to the objective

J ′′, the Hamiltonian H of the game is

H = V ′′ b(t)+1+ √2 cos θ(t) + V ′′ 2 sin θ(t) , x − y − −     19The Value V ′′ exists.

60 Here, Vx′′ and Vy′′ are the partial derivative of V ′′ with respect to x and y. Taking

derivatives of H with respect to x and y and by the minimum principle,

∂H ∂H V˙ ′′ = = 0, and V˙ ′′ = = 0. x − ∂x y − ∂x

Thus, V ′′ and V ′′ are constants. At the terminal, V ′′(x, y) = 1, which implies x y x |y=0 20 Vx′′ = 1. To determine Vy′′, consider the “main equation” in [11] (on page 67),

min max H = min max Vx′′ bt +1+ √2 cos(θt) + Vy′′ 2 sin(θt) = 0. (3.5) θt bt θt bt − − − n  o

Substitute Vx′′ = 1 into (3.5), and by (3.5) the optimal controls θt∗ and bt∗ satisfy

2 cos(θt∗) = 1 1+ V ′′y, b∗ = 1, (3.6) t  2 sin(θ∗)= V′′q 1+ V .  t y ′′y q Substitute (3.6) into (3.5), 

2 2+ √2 2V ′′ 1+ V = 0. (3.7) − y − ′′y q

Solving (3.7), we obtain Vy′′ = 1, which yields that V ′′(¯x)= x+y. Note that equation

(3.5) remains the same if the order of the minimization and the maximization is

switched. Thus, V ′′ is the Value of the two-player PE game. The solution is also

illustrated in Figure 3.1. The circle stands for the set of possible controls of the

pursuer and the vector η is the velocity vector specified in (3.3) under optimal θt∗ and

b∗. Finally, the Value associated with J ′ is V ′(¯x)= V ′′(¯x) x = y. t − 3.1.2 Numerical Solution by the Iteration Method

In this section, we solve the game numerically using the iterative method. Suppose that the Value of the game and the corresponding optimal controls of the players are

20Here, the “main equation” is actually a Hamilton-Jacobi-Isaacs (HJI) equation.

61 unknown. Choose θ˜ = π (t 0) as a suboptimal control of the pursuer and the t 2 ≥

corresponding optimal control of the evader is, b∗ =1(t 0). Note that under t ≥

(θ,˜ b∗), the PE game ends with a minimum time. The speed vector γ under (θ,˜ b∗) is shown by a dashed line in Figure 3.1. The suboptimal Value function V (¯x) associated

˜ 2+√2 ˜ with (θ, b∗) is V (¯x0)= 3 y0. Since suboptimal control θ does not depende on state x¯ and time t,e it is ST-consistent. Therefore, V has the improving property. Given

(x0, y0)=(2, 1) and let W0 = V , we apply thee iteration method. Here, we discretize

the state space with a grid sizee of 0.05 and choose ∆t = 0.03. Note that ∆t is so small that dynamic optimization problems over limited look-ahead intervals as in (2.20) can be approximated by static optimization problems where the players’

T strategies during ∆t intervals are constant. The evolution of Wk(¯x) atx ¯0 = [2, 1] is illustrated in Figure 3.2. It can be seen that starting from V (2, 1) = 2+√2 1.1381, 3 ≈ Wk(2, 1) converges to the optimal Value V ′(2, 1) = 1 at iteratione 14. The convergence is achieved at about iteration 14. Here, the convergence rate clearly depends on the

size of the look-ahead interval ∆t.

3.2 Suboptimal Solution by the Hierarchical Approach

In this section, a multi-player PE game is solved by the hierarchical approach

introduced in Section 2.6. We select a PE scenario involving 3 pursuers and 5 evaders

in a R2 space. The dynamics of the players are

x˙ = v cos θ x (0) = x ς ς ς with ς ς0 . (3.8) y˙ς = vς sin θς yς (0) = yς0

Equation (3.8) is the first-order Dubin’s car model, where the subscript ς p,e ∈ { }

stands for a pursuer or an evader; xς and yς are state variables (displacement) along

the x- and y-axis respectively; vς is the velocity; θς is the control variable. In

62 Evolution of Suboptimal Value Function with Iteration 1.18

1.16

1.14

1.12

1.1 (2,1) k 1.08 W

1.06

1.04

1.02

1

0 5 10 15 20 25 30 35 40 Iteration

Figure 3.2: The Evolution of the Suboptimal Value Wk’s at (2,1) by Iteration

this model, the players are assumed to perform at their maximum speeds and have

complete maneuverability, which can approximately describe the dynamics of small

ground vehicles. The terminal set Λ of the game is chosen as

Λ= (x , y ), (x , y ) (x x )2 +(y y )2 ε , (3.9) p p e e p − e p − e ≤ n q o

for some ε > 0. Here, we choose ε as 0.5. In this multi-player game, the objective function in (2.44) is adopted, i.e., the sum of the capture times of all evaders is to be optimized. The necessary parameters and the initial states are given in Table 3.1.

A complete solution of this problem is hard to obtain, and here we solve for a suboptimal solution by the hierarchical approach. Since the hierarchical approach is based on the solvability of the corresponding two-player PE games, here we first provide an analytical solution to a two-player game with the players’ dynamics in

(3.8) for vp >ve and the capture time T is chosen as the objective function.

63 Pursuers 1 2 3

(xp0, yp0) (3,0) (5,0) (7,0) vp(1/sec) 6 5 7

ωp(rad/sec) 0.8π 0.8π 0.8π

θp0 π/2 π/2 π/2 Evaders 1 2 3 4 5

(xe0, ye0) (1,5) (4,5) (6,5) (7,5) (9,5) ve(1/sec) 5 4 3 3 5

Table 3.1: The Necessary Parameters of the Players for PE Game Scenario 1

Consider a two-player PE game, where the state variables include x , y for ς ς ς ∈ p,e . Define x = x x and y = y y , such that the dynamics in (3.8) becomes { } e − p e − p x˙ = v cos(θ ) v cos(θ ) x = x x e e − p p with 0 e0 − p0 . (3.10) y˙ = v sin(θ ) v sin(θ ) y = y y e e − p p 0 e0 − p0 2 2 The terminal set of the game is Λ′ = (x, y) x + y ε . The objective is, | ≤ T n p o J = 0 dt. DenoteR by V the Value function of such a two-player PE game. The Hamiltonian of the game is

H = V v cos(θ ) v cos(θ ) + V v sin(θ ) v sin(θ ) . x e e − p p y e e − p p   Here, Vx (Vy) is the partial derivative of V with respect to x (y). By the minimum principle, V˙ = ∂H = 0 and V˙ = ∂H = 0. Thus, V and V are constant. By x − ∂x y − ∂y x y

the main equation [11], i.e., minθp maxθe H = 0, such that we obtain that the optimal

strategies θp∗ and θe∗ satisfies

Vx Vy cos θ∗ = cos θ∗ = and sin θ∗ = sin θ∗ = , p e 2 2 p e 2 2 Vx + Vy Vx + Vy

p 64 p 2 2 and V + V = 1/(v v ). By the definition of the terminal set Λ′, the terminal x y p − e statep (x , y ) satisfies V ( y )+ V x = 0. It follows that T T x · − T y · T 1 Vx/x = Vy/y = . (3.11) (v v ) x2 + y2 p − e Furthermore, it can be shown from (3.11) that p

V (x, y)= x2 + y2 ε /(v v ). (3.12) − p − e p  Clearly, the optimal strategy of the pursuer and the evader are x y cos θp∗ = cos θe∗ = , sin θp∗ = sin θe∗ = . (3.13) − x2 + y2 − x2 + y2 Based on the solution to a two-playerp game, we can use thep hierarchical approach to solve the multi-player PE game for a suboptimal solution. Now, solve the (combi- natorial) optimization problem at the upper level with the objective function (2.46) with K = 3, and the optimal engagement obtained is given in Table 3.221. The per- formance index (the sum of the capture times) under such engagements is evaluated as 26.03 (sec).

Engaged Evaders Pursuer 1 Pursuer 2 Pursuer 3 1st E1 E2 E3 2nd N/A N/A E4 3rd N/A N/A E5

Table 3.2: The Optimal Engagement Between the Pursuers and the Evaders with the First-Order Dubin’s Car Model

Next, we show the possibility of using the hierarchical approach to solve a game

where the analytical solutions of the underlying two-player PE games are unknown.

21Here, Ej stands for evader j.

65 Consider the same game except that the players have more complicated dynamics as

follows.

x˙ p = vp cos θp xp(0) = xp0 x˙ e = ve cos θe xe(0) = xe0 y˙p = vp sin θp , with yp(0) = yp0 and . (3.14) ˙ y˙e = ve sin θe ye(0) = ye0 θp = ωpat θp(0) = θp0 In (3.14), a is the control the a pursuer with 1 a 1 and θ is the control of t − ≤ t ≤ e the evader. This model was originally used by Isaacs when he studied the so-called

homicidal chauffeur game [11, 7]. The angular velocity ωp and the initial orientation

of each pursuer are given in Table 3.1 (rows in shade). It can be verified that every

evader in the game is in the “capture region” of some pursuer [11, 7]. That is,

Assumption 3 on page 39 holds.

In a two-player PE game with the players’ dynamics given in (3.14), the optimal

strategy of evader j (j = 1, , 5) can be shown as same as that in (3.13). Define ··· x = x x and y = y y . A heuristic strategy of the pursuer may be solved by e − p e − p considering the distance between the evader and the pursuer, D = x2 + y2. Suppose that the control of the evader is constant. Take the second orderp time derivative of ¨ ¨ D, i.e., D, and find at∗ to minimize D. It yields the optimal control of the pursuer as

a∗ = sign(x sin θ y cos θ ). (3.15) t − p − p

Engaged Evaders Pursuer 1 Pursuer 2 Pursuer 3 1st E1 E3 E5 2nd N/A E4 E2

Table 3.3: The Optimal Engagement Between the Pursuers and the Evaders with the Modified Dubin’s Car Model

66 In the hierarchical approach, given a possible engagement scheme, we simulate every decoupled game, in which the pursuers utilize the strategy in (3.15) and evaders play according to (3.13). All possible engagements are enumerated and the best engagement result is given in Table 3.3. In this case, the objective functional evaluated under the optimal engagement scheme is 34.54(sec), which is larger than 26.03(sec) in the previous case. The corresponding PE trajectories are shown in Figure 3.3, in which the trajectory of each pursuer along with that of its engaged evaders is plotted separately. Snapshots at the 1st and the 5th second are illustrated.

In this subsection, the simulation result demonstrates the cooperation of the pur- suers by the hierarchical approach. The cooperation results from proper allocation of the pursuers among the evaders. In Figure 3.3, the optimal strategy is utilized by the evaders against their engaged pursuers. It implies that the evaders have the knowledge of the pursuers’ engagement (strategies). Although this may not be true in practice, the pursuers will be better off if the evaders use different strategies. This decision-making is conservative from the pursuers’ perspective.

3.3 Suboptimal Solution by Limited Look-ahead

3.3.1 Performance Enhancement by Limited Look-ahead

The iteration method has poor scalability; however, the appealing aspect of the method is its capability of improving the performance of the pursuers’ controls from the hierarchical approach. In this section, we illustrate the usefulness of limited look- ahead in the performance enhancement in cooperative pursuit problems. Consider a simple PE game involving 2 pursuers and 2 evaders with the dynamics in (3.8) and the objective as the sum of the capture times of all the evaders. Consider a specific

67 Snapshot at Second 1 Snapshot at Second 5 30 30 Capture of 25 25 Evader 1

20 20 Y 15 Y 15

10 10

5 5 Pursuer 1 Pursuer 1 Evader 1 Evader 1 0 0 −10 −5 0 5 −10 −5 0 5 X X (a) Pursuit Trajectories Between Pursuer 1 and Its Engaged Evaders

Snapshot at Second 1 Snapshot at Second 5 15 15 Capture of Evader 3 10 10

5 5 Y Y 0 0

−5 −5 Pursuer 2 Pursuer 2 Evader 3/4 Evader 3/4 −10 −10 4 6 8 10 12 4 6 8 10 12 X X (b) Pursuit Trajectories Between Pursuer 2 and Its Engaged Evaders

Snapshot at Second 1 Snapshot at Second 5 20 20

15 15

10 10

5 5 Y Y 0 0 Capture of −5 −5 Evader 5

−10 Pursuer 3 −10 Pursuer 3 Evader 5/2 Evader 5/2 −15 −15 −10 −5 0 5 10 15 −10 −5 0 5 10 15 X X (c) Pursuit Trajectories Between Pursuer 3 and Its Engaged Evaders

Figure 3.3: Pursuit Trajectories under the Best Engagement

68 scenario specified in Table 3.4. Let ε = 0.5. In Figure 3.4, a snapshot of the resulting cooperative pursuit trajectory is shown under the optimal engagement between the pursuers and the evaders by the hierarchical approach. Here, the arrows indicate the instantaneous moving directions of the players. Clearly, pursuer 1 (2) is engaged with evader 1 (2).

Initial Position Velocity Pursuer 1 (0,0) 6 Pursuer 2 (9,0) 6 Evader 1 (4,0) 3 Evader 2 (5,2) 3

Table 3.4: The Necessary Parameters of the Players in the PE Game Scenario 2

3.5 Pursuer 3 Evader 2.5

2 Evader 2 Pursuer 2 1.5

1

0.5

0

−0.5 Pursuer 1 Evader 1

−1

−1.5 0 1 2 3 4 5 6 7 8 9

Figure 3.4: Pursuit Trajectories under the Strategies by the Hierarchical Approach

69 h T Definex ¯ = [xp, yp, xe, ye] and denote by V (¯x, z) as the suboptimal upper Value

obtained by the hierarchical approach (see (2.46)),e where z is the discrete variable.

Suppose that the game starts at t0. Given ∆t > 0, the optimization problem based

on limited look-ahead at a sample time t = t0 + k∆t (k Z 0) is ∈ ≥

t+∆t 2 h

max min zj(τ) dτ + V x¯t+∆t;¯xt,a,β[a], zt+∆t;zt,x¯ . (3.16) β ∆(t) a( ) A(t) ( t ! ) ∈ · ∈ Z j=1   X e In this example, we choose ∆t = 0.05, and approximate the “minimax” problem in

(3.16) by a static optimization problem. The optimal strategies (at∗, βt∗) solved at each sample time t0 + k∆t are utilized to generate the trajectories of the pursuers and the evaders, which are illustrated in Figure 3.5. Clearly, the evolution of the game in Figure 3.5 can better resemble the reality compared to Figure 3.4. Accord- ing to Figure 3.5, when the pursuers and the evader get closer, both pursuers can move cooperatively to force the evaders to change their escape directions. In this way, the pursuit performance is improved by 4 time steps and here we use the term

“cooperation zone” to indicate the region where the extra cooperation between the two pursuers is possible.

3.3.2 Limited Look-ahead with a Heuristic Cost-to-go

In this subsection, we demonstrate a potential advantage of limited look-ahead in

a problem when no obvious suboptimal solution is available. Consider a PE game

with 3 pursuers and 1 evader with the dynamics specified in (3.8). The evader has

higher speed than any of the pursuers. Choose the capture time of the evader T as

the objective functional. Consider a specific scenario specified in Table 3.5. Note that

i since ve >vp for any i = 1, 2, 3, the hierarchical approach is no longer valid.

70 3.5 3.5 Pursuer Pursuer 3 3 Evader Evader 2.5 2.5 Evader 2 2 2

1.5 Evader 2 Pursuer 2 1.5 Pursuer 2 1 1 0.5 0.5 Evader 1 0 0 −0.5 Pursuer 1 Evader 1 −0.5 Pursuer 1 −1 −1 −1.5 −1.5 0 2 4 6 8 0 2 4 6 8 (a) Pursuit Trajectories at the Beginning (b) Pursuit Trajectories when Approaching Stage a “Cooperation Zone”

3.5 3.5 Pursuer Pursuer 3 3 Evader Evader 2.5 2.5 Capture 2 2 1.5 Evader 2 1.5 Evader 2 Pursuer 2 Pursuer 2 1 1 0.5 0.5 Evader 1 Evader 1 0 0 Pursuer 1 −0.5 −0.5 Pursuer 1 Capture −1 −1 −1.5 −1.5 0 2 4 6 8 0 2 4 6 8 (c) Pursuit Trajectories when Leaving the (d) Capture of the Evaders “Cooperation Zone”

Figure 3.5: Pursuit Trajectories under the Strategies Based on One-Step Limited Look-ahead

71 In the following, we apply the optimization based on one-step limited look-ahead

with a heuristic cost-to-go function. By inspection of the optimization based on

limited look-ahead in (2.20), the performance of the resulting pursuers’ strategies

depends largely on the cost-to-go function evaluated at time t +∆t. Note that the

hierarchical approach is no longer applicable, and we construct a heuristic cost-to-go

function as follows. We first analyze a necessary condition, under which the evader

can be captured. An arbitrary pursuer i(i = 1, 2, 3) and the evader are illustrated in

i Figure 3.6, where vp and ve are the speed vector of the pursuer and that of the evader respectively; βi (αi) is the angle between the line of sight from pursuer i (the evader) to the evader (pursuer i) and the moving direction of the evader (pursuer i). Now,

i let vp and ve be fixed, i.e., αi, βi are constant. If the evader can be captured, αi and

βi must satisfies the following condition.

vi vi sin β = | p| sin α | p| = β¯ = arcsin( vi / v ) (3.17) i v i ≤ v ⇒ i | p| | e| | e| | e| In (3.17), vi and v are the norms of the speed vectors; β¯ is a bound of β when | p| | e| i i the capture is guaranteed, i.e., β β¯ . | i| ≤ i

Initial Position Velocity Pursuer 1 (0,3) 9 Pursuer 2 (1.64, -1.15) 9.8 Pursuer 3 (-2.8,-1) 9.8 Evader (0,0) 10

Table 3.5: The Necessary Parameters of the Players in the PE Game Scenario 3

72 Claim 1 If the evader is captured by a finite time T > 0, then at any time t 0 and ≥ for any possible ve of the evader, there must exist a pursuer i (i = 1, 2, 3), such that

(3.17) holds.

It is clear that when there exists v such that β satisfies β > β¯ for i = 1, 2, 3, the e i | i| i evader can escape from all the pursuers by moving along that direction.

i i vp

Di Ji

di Ei ve

Figure 3.6: A Necessary Condition on the Capturability of the Superior Evader

Denote byx ¯ the composite state variable that includes the states of all the pursuers

and the evader. Suppose that the pursuers can move cooperatively such that (3.17)

holds at any time t 0. We construct the heuristic cost-to-go function as follows. ≥ Define the set Π = x¯ (3.17) holds atx ¯ . Consider the terms α , β and γ (γ = { | } i i i i π α β ) as shown in Figure 3.6, where d is the distance between pursuer i and − i − i i the evader. All of these terms are functions of statex ¯. Define the function ψ(¯x) as

d sin α ψ(¯x) , max min i i . θe i v sin γ  | e| i 

Here, the maximization is taken with respect to the evader’s control θe in (3.8), which is the speed orientation. Let D(¯x) = i di. Choose a heuristic cost-to-go function P 73 V (¯x) (x Π) with the following properties. ∈ e V (¯x ) < V (¯x ) if ψ(¯x ) <ψ(¯x ), 1 2 1 2 (3.18) ( V (¯x1) < V (¯x2) if ψ(¯x1)= ψ(¯x2) and D(¯x1) < D(¯x2). e e Choose ∆t = 0.01e and ε =e 0.5. Implement the look-ahead strategies of the pursuers

and the evader repetitively determined from (3.16) with the cost-to-go function re-

placed by V that satisfies (3.18). The resulting cooperative pursuit trajectories are

shown in Figuree 3.7. It can be seen that with the guidance from V , a proper for-

mation of the pursuers is maintained and capture is ensured. Thise small example demonstrates a possible usefulness of the look-ahead approach in solving this compli-

cated PE problem. A complete solution to this game needs further investigation.

74 3.5 3 Pursuer Pursuer 1 Evader 2.5

2 1.5

1 Evader 0.5 ε 0

−0.5

−1 Pursuer 3 −1.5 Pursuer 2 −2 −3 −2 −1 0 1 2

Figure 3.7: Cooperative Pursuit Trajectories of the Superior Evader by the Limited Look-ahead Method Based on a Heuristic Cost-to-go

75 CHAPTER 4

STOCHASTIC MULTI-PLAYER PURSUIT-EVASION DIFFERENTIAL GAMES

In this chapter, we extend the iterative method for deterministic games to stochas-

tic PE problems. For the case where the players’ (pursuers/evaders) dynamics are

governed by random processes and the state variables are perfectly accessible all the

players, the problem is rigorously defined with the concepts of the players’ strategies

and Value function. We show that the iterative method is still applicable. Under the

framework of the hierarchical method, we devote some effort to two-player stochastic

PE games. An analytic solution for a specific stochastic two-player PE differential

game is derived. Capturability of the evader in a two-player game is also examined.

Finally, we extend the study to a more general case where the players no longer

have perfect state information. It is a very hard problem and only a special case is

addressed.

4.1 Formulation of Stochastic Pursuit-Evasion Games with Perfect State Information

We first formulate a stochastic multi-player PE differential game, where both the pursuers and the evaders can access the perfect state information at any time, based

76 on the deterministic case in Chapter 2. Suppose that there are N pursuers and M evaders in the same state space. Consider a noise corrupted version of the dynamics of the pursuers and the evaders with additive disturbances. The disturbance is modelled as a standard Wiener process (often called Brownian motion). The dynamic equations of pursuer i and evader j are given as follows.

i i i i i i i i dxpt = fp(xpt,ait)dt + σp(xpt)dwpt with xp(t0)= xp0, (4.1a)

j j j j j j j j dxet = fe (xet, bjt)dt + σe(xet)dwet with xe(t0)= xe0. (4.1b)

i j In (4.1), xi Rnp , xj Rne for t 0, ni , nj Z+ and a Ai , b Bj, where pt ∈ et ∈ ≥ p e ∈ it ∈ a jt ∈ a i j i Ai Rmp and Bj Rme are given compact sets with mi ,mj Z+; wi W i Rkp a ∈ a ∈ p e ∈ pt ∈ p ⊆ j (wj W j Rke ) is a standard Wiener process; σi ( ) (σj( )) is the corresponding et ∈ e ⊆ p · e · matrix with a proper dimension. Although this model is adopted for mathematical reasons, in practice, the additive disturbance may stand for a simple wind effect on small AAVs (Autonomous Aerial Vehicles).

i j i j Suppose that fp (fe ) is bounded and Lipschitz in xp (xe), uniformly with respect

i j to ait (bjt) as in (2.1). Assume that σp and σe are bounded and satisfy

σi (xi ) σi (˜xi ) γi xi x˜i , σj(xj) σj(˜xj) γj xj x˜j p p − p p ≤ pk p − pk e e − e e ≤ e k e − ek

i j for some γi , γj > 0 and any xi , x˜i Rn p , xj, x˜j Rne [50]. We use the same p e p p ∈ e e ∈ definition of the capture of evader j in deterministic problems, and also augment the state x with the discrete variable z. For simplicity, let

T T T T x = x1T, x2T, , xN , x = x1T, x2T, , xM , p p p ··· p e e e ··· e h Ti h T i a = a T,a T, ,a T , b = b T, b T, , b T , 1 2 ··· N 1 2 ··· M h iT T h i T T f = f 1T, f 2T, , f N , f = z f 1T, z f 2T, , z f M . p p p ··· p e 1 e 2 e ··· M e h iT T h T T i w = w1T,w2T, ,wN , w = w1T,w2T, ,wM , p p p ··· p e e e ··· e h i h i 77 and define

σ (x ) = diag σ1(x1), , σN (xN ) and σ (x ) = diag z σ1(x1), , z σM (xM ) p p p p ··· p p e e 1 e e ··· M e e   i j as diagonal matrices with σp’s and zjσe’s as the diagonal elements. Then, equation

(4.1) can be rewritten as

dxpt = fp(xpt,at)dt + σp(xpt)dwpt with xp(t0)= xp0, (4.2a)

dxet = fe(xet, bt)dt + σe(xet)dwet with xe(t0)= xe0. (4.2b)

Further,

dxt = f(xt,at, bt)dt + σ(xt)dwt with x(t0)= x0, (4.3)

T T T T T T where x = xp , xe , f(x, a, b) = fp (x, a), fe (x, b) , σ(x) = diag σp(xp), σe(xe) T N i M j and w = w T,w T . Let X ,  Rnp  Rne , and denote by A the p e i=1 × j=1 a h i  Q   QN i  set of possible controls of all pursuers, i.e., Aa = i=1 Aa and similarly for evaders, M j Q Ba = j=1 Ba. DueQ to the introduction of the random process, the strategies of the pursuers and the evaders should be defined properly such that (4.2) has a unique solution. Here, we adopt the definitions in [50].

Definition 5 Let (Ω, , P ) be a probability space, a monotone family of σ- F {Ft} fields with for any t t t , and a complete separable Ft ⊆ F Ft1 ⊆ Ft2 0 ≤ 1 ≤ 2 X . A stochastic process ξ defined on the time interval [t , T], ξ(t) : Ω 0 →X with t0 t T, is said to be t t t0 -progressively measurable if for all t [t0, T], ≤ ≤ {F } ≥ ∈ the map (t,ω) ξ(t,ω) is [t ,t] / ( ) measurable, where [t ,t] is the Borel 7→ B 0 ×Ft B X B 0 σ-field on [t ,t] and ( ) is the Borel σ-field on . 0 B X X

78 Define N M = ki + kj. K p e i=1 j=1 X X The sample space for the stochastic process of (4.3) is defined as

ω Ω = ω C [t, T] , ω : [t, T] RK with ω = 0 (4.4) t ∈ 7→ t n  o for every t [t , T]. Denote by ω the σ-field generated by paths from time t ∈ 0 Ft,s ⊆Fs to s (for any t s T) in Ωω. Provided with the Wiener measure P ω on ω , Ωω ≤ ≤ t t Ft,T t becomes the canonical sample space for the stochastic process of (4.3). Note that T in the definitions above is related to the terminal time of a game defined in Chapter

2. In addition, we also use the space

ω Ω = ω C [t,s] , ω : [t,s] RK with ω = 0 (4.5) t,s ∈ 7→ t n  o for t s T. Next, we define dismissible controls and nonanticipative strategies of ≤ ≤ the pursuers and the evaders.

Definition 6 In a stochastic multi-player PE game at any time t 0, an admissible ≥ control of the pursuers and that of the evaders, a( ) and b( ) are defined respectively · · as

a( ) A(t) , φ : [t, T] A φ( ) is -progressively measurable , · ∈ 7→ a | · Ft,s n o b( ) B(t) , ϕ : [t, T] B ϕ( ) is -progressively measurable · ∈ 7→ a | · Ft,s n o for t s T. ≤ ≤

We say that a( ),a ˜( ) A(t) are the same on interval [t,s], denoted by a a˜ on [t,s], · · ∈ ≈ if the probability P ω(a =a ˜ a.e. in [t,s]) = 1. It is the same for b( ), ˜b( ) B(t). A t · · ∈ nonanticipative strategy of the pursuers (evaders) can be defined as follows.

79 Definition 7 A nonanticipative strategy α (resp. β) of the pursuers (resp. evaders) on [t, T] is

α Γ(t) , ψ : B(t) A(t) for any b( ), ˜b( ) B(t), b ˜b on [t,s] ∈ 7→ | · · ∈ ≈ n implies α[b] α[˜b] on [t, s] for every s [t, T] ≈ ∈ o resp. β ∆(t) , η : A(t) B(t) for any a( ), a˜( ) A(t),a a˜ on [t,s] ∈ 7→ | · · ∈ ≈  n implies β[a] β[˜a] on [t,s] for every s [t, T] . ≈ ∈ o As in the deterministic case, the definition of nonanticipative strategy implies that the decision-making of the players only depend on the information up to the current time.

More importantly, under the definition of admissible control with that of nonantici- pative strategy, the stochastic differential equation (4.3) has a unique solution [50].

More details about the setting-up in a stochastic problem can be found in [78, 19, 7].

Consider the following objective functional,

T (a, b; x , z )=E G x , z ,a , b dτ + Q x (4.6) J t t ω;t τ τ τ τ T  Zt    subject to (4.3) and (2.9).

ω R M In (4.6), ω Ωt,T ; the terminal cost Q : X 0 and the cost rate G : X Z ∈ 7→ ≥ × ×

Aa Ba R 0 satisfy the boundedness and the Lipschitz conditions that are similar × 7→ ≥ to (2.15) [50]. Here, Eω;t denotes the expectation taken with respect to stochastic process ω given the information at time t. Henceforth, we will use Eω for a short notation.

A PE stochastic game is modelled as a zero-sum game. The lower and the upper

Value can be defined similarly as in the deterministic case. For any x X and t ∈

80 z ZM , the lower Value of a game (x , z ) is defined as t ∈ V t t

(xt, zt) = inf sup (α[b], b; xt, zt) V α Γ(t) b( ) B(t) J ∈ · ∈  T = inf sup Eω G xτ , zτ , α[b]τ , bτ dτ + Q(xT ) . (4.7) α Γ(t) b( ) B(t) t ∈ · ∈  Z   Similarly, the upper Value (x , z ) is V t t

(xt, zt) = sup inf (a, β[a]; xt, zt) V β ∆(t) a( ) A(t) J ∈ · ∈  T = sup inf Eω G xτ , zτ ,aτ , β[a]τ dτ + Q(xT ) . (4.8) β ∆(t) a( ) A(t) t ∈ · ∈  Z   To differentiate the objective and the (upper/lower) Value from those in the deter- ministic case, we use the notations , and in (4.7) and (4.8). The Value function J V V exists if the Isaacs condition holds, i.e., = = . V V V V

4.2 The Iterative Approach to Stochastic Pursuit-Evasion Games

A stochastic multi-player PE differential game is very difficult due to the uncer- tainty. However, for stochastic problems with the perfect state information pattern, the iterative approach proposed for deterministic PE games is still applicable. In the following, we summarize the iterative approach in the stochastic case, and again, we focus on the upper Value . V First of all, we define the Sample-Set-Time-Consistency (SSTC) of a control struc-

ture imposed on the pursuers in the stochastic case in contrast to STC in deterministic

problems. Given anya ˜t( ) AS (t), x X, z ZM , β ∆(t) and ω Ωω , de- · ∈ xt,zt t ∈ t ∈ ∈ ∈ t,T t t note by x t t the trajectory of x for s t starting from x undera ˜ and β[˜a ] s;xt,a˜ ,β[˜a ],ω ≥ t

t t corresponding to ω. For short, letx ˜(s)= xs;xt,a˜ ,β[˜a ],ω. Denote by zs;zt,x˜ the trajectory

of z corresponding tox ˜ and usez ˜(s)= zs;zt,x˜ for a short notation.

81 Definition 8 (Sample-Set-Time-Consistency) A suboptimal control structure S is said to be Sample-Set-Time-consistent, or SST-consistent for short, if given any ω Ωω , for any a˜t( ) AS (t) at any 0 t T , there exists a˜s( ) AS (s) ∈ t,T · ∈ xt,zt ≤ ≤ · ∈ x˜s,z˜s associated with x˜ and z˜ at any s for t s T such that a˜t(τ) =a ˜s(τ) for s τ T s s ≤ ≤ ≤ ≤ under any β ∆(t). ∈

Suppose that there exists a feasible SST-consistent control structure such that a

suboptimal upper Value function can be determined as

T (x, z) = sup inf Eω G xτ , zτ ,aτ , β[a]τ dτ + Q(xT ) , (4.9) S V β ∆(t) a( ) Ax,z(t) t ∈ · ∈  Z  e  where ω Ωω and AS (t) is the restricted control set under structure S at time t. ∈ t,T x,z If the control structure S is SST-consistent, it can be shown similarly as in Theorem

1 in Chapter 2 that

t+∆t (x, z) sup inf Eω G xτ , zτ ,aτ , β[a]τ dτ S V ≥ β ∆(t) a( ) Ax,z(t) t ∈ · ∈  Z e  + (x , z ) . (4.10) V t+∆t;x,a,β[a],ω t+∆t;z,x  e Let = W : X ZM R . A transformation on can be defined similarly W { × 7→ } H W as in (2.25)

t+∆t [ ](x, z) = sup inf Eω G xτ , zτ ,aτ , β[a]τ dτ H W β ∆(t) a( ) A(t) t ∈ · ∈  Z  + (x , z ) (4.11) W t+∆t;x,a,β[a],ω t+∆t;z,x  subject to (4.3) and (2.9) for . Let = . A sequence of functions can W ∈ W W0 V Wk

be generated by k+1 = [ k]. In the following,e we will show that the sequence is W H W decreasing, and its limit coincides with the upper Value defined in (4.8).

82 Theorem 10 (i) If (x, z) satisfies (4.10) for any x X and z ZM , then W0 ∈ ∈ , the sequence k k∞=0 converges point-wisely; (ii) The limiting function (x, z) {W } W∞ limk k(x, z) satisfies (x, z)= [ ]. →∞ W W∞ H W∞

Proof: The proof of Theorem 2 in Chapter 2 is applicable.

Claim 2 (Principle of DP) The upper Value function satisfies V

tm T ∧ (x, z) = sup inf Eω G xτ , zτ ,aτ , β[a]τ dτ V β ∆(t) a( ) A(t) t ∈ · ∈  Z 

+ (xtm T ;x,a,β[a],ω, ztm T ;z,x) (4.12) V ∧ ∧  for any t t , x X and z ZM . m ≥ 0 ∈ ∈

Remark 10 Related work on DP principle includes Theorem 1.6 on page 298 in [50],

Theorem 4.1 on page 777 in [69] and Theorem 3.3 on page 180 in [19].

Lemma 7 For a real-valued function (x, z), if there exists ∆t > 0, such that for W any x X and z ZM , ∈ ∈ t+∆t (x, z) = sup inf Eω G xτ , zτ ,aτ , β[a]τ dτ W β ∆(t) a( ) A(t) t ∈ · ∈  Z  + (x , z ) , (4.13) W t+∆t;x,a,β[a],ω t+∆t;z,x  then for any K Z+, ∈ t+K∆t (x, z) = sup inf Eω G xτ , zτ ,aτ , β[a]τ dτ W β ∆(t) a( ) A(t) t ∈ · ∈  Z  + (x , z ) (4.14) W t+K∆t;x,a,β[a],ω t+K∆t;z,x  for any x X and z ZM . ∈ ∈

Proof: The proof of Lemma 3 in Chapter 2 is applicable.

83 Assumption 4 Define the set = x , z ZM zx = 0 22. There exists a con- SG ∈X ∈ 6 stant C > 0, and for any (x, z) , any ε> 0 and K Z+, there exists a( ) A(t) E ∈ SG ∈ · ∈ such that (i)

t+K∆t (x, z)+ ε sup Eω G xτ , zτ ,aτ , β[a]τ dτ V ≥ β ∆(t) t ∈  Z  + (x , z ) ; (4.15) V t+K∆t;x,a,β[a],ω t+K∆t;z,x  (ii) for any t τ t + K∆t, the conditional expectation ≤ ≤

E V x , z x , z C 23 (4.16) x t+K∆t;x,a,β[a],ω t+K∆t;z,x τ;x,a,β[a],ω τ;z,x ∈ SG ≤ E n o e   for any β ∆(t). ∈

Theorem 11 Suppose that = . If Assumption 6 is true, then for the sequence W0 V

k(x, z) ∞ by k+1 = [ k],e its limit (x, z) = limk k(x, z) = (x, z) W k=1 W H W W∞ →∞ W V for any x X, z ZM . ∈ ∈

Proof: Since (x, z) (x, z), we only need to show that (x, z) (x, z). W∞ ≥ V W∞ ≤ V For any (x, z) an arbitrary ε> 0, there exists K Z+ and a( ) A(t) such that ∈ SG ∈ · ∈ P x , z < ε/C due to the fact that C G(x, z, a, b) δ > 0 t+K∆t t+K∆t;z,x ∈ SG E G ≥ ≥   for z = 0 in (2.15).  6 22 Set G is interpreted as all the state x and z from where the game has not yet terminated. S 23It stands for the expectation of the suboptimal upper Value function at x ,zt K t z,x provided that x ,zτ z,x is in G at any τ for t τ t + K∆t. t+K∆t;x,a,β[a],ω + ∆ ; τ;x,a,β[a],ω ; S ≤ ≤

84 By Assumption 6 (i), there exists a( ) A(t) such that the following inequality · ∈ (4.17) holds. For simplicity, letx ¯ = x andz ¯ = z for β ∆(t). t+K∆t;x,a,β[a],ω t+K∆t;z,x ∈ t+K∆t (x, z) sup Eω G xτ , zτ ,aτ , β[a]τ dτ +Eω x,¯ z¯ ε V ≥ β ∆(t) ( t V ) − ∈  Z    t+K∆t  E G x , z ,a , β[a] dτ ≥ ω τ τ τ τ (  Zt   + P (¯x, z¯) / E Q x¯ (¯x, z¯) / ∈ SG ω ∈ SG      

+ P (¯x, z¯) G Eω x,¯ z¯ (¯x, z¯) G ε (4.17) ∈ S V ∈ S ) −      

for any β ∆(t). Note that = [ ] and by Lemma 7, ∈ W∞ H W∞ t+K∆t W (x, z) = sup inf Eω xτ , zτ ,aτ , β[a]τ dτ + (¯x, z¯) . ∞ β ∆(t) a( ) A(t) t W∞ W∞ ∈ · ∈  Z   Now, consider the same a( ) used in (4.17), ·

t+K∆t (x, z) sup Eω G xτ , zτ ,aτ , β[a]τ dτ + (¯x, z¯) W∞ ≤ β ∆(t) t W∞ ∈  Z  t+K∆t  = sup Eω G xτ , zτ ,aτ , β[a]τ dτ β ∆(t) t ∈ (  Z   + P (¯x, z¯) / E Q(¯x) (¯x, z¯) / ∈ SG · ω ∈ SG    

+ P (¯x, z¯) G Eω (¯ x, z¯) (¯x, z¯) G (4.18) ∈ S · W∞ ∈ S )    

Subtract (4.17) from (4.18),

(x,z) (x,z) sup P (¯x, z¯) G Eω (¯x, z¯) (¯x, z¯) (¯x, z¯) G + ε . W∞ − V ≤ β ∆(t) ( ∈ S · W∞ − V ∈ S ) ∈    

Since P (¯x, z¯) G < ε/CE and (¯x, z¯) 0 = V , ∈ S W∞ ≤ W   (x, z) (x, z) ε/CE Eω V (¯x, z¯) (¯x, z¯)e G + ε 2ε (4.19) W∞ − V ≤ · | ∈ S ≤ n o The second inequality in (4.19) is becausee of Assumption 6 (ii). Since ε is arbitrary, (x, z) (x, z). The proof is complete. W∞ ≤ V 85 Finally, it is worth mentioning that the hierarchical approach is also applicable

to stochastic games and it is SST-consistent because the hierarchy is independent of

time t and the states x and z. Note that the hierarchical approach depends on the solvability of the distributed two-player PE games.

4.3 Solution to a Two-Player Pursuit-Evasion Game

Since the hierarchical approach relies on the solution of a two-player game, in the section, we focus on a two-player stochastic PE differential game. The results presented are based on the classical stochastic differential game (optimal control) theory.

4.3.1 Classical Theory of Stochastic Zero-sum Differential Games

We first introduce the necessary concepts and the classical theory of solving a stochastic differential game problem with perfect state information based on [78].

Consider a two-player stochastic differential game, where the dynamic equation of the players is

dxt = f(t, xt,at, bt)dt + σ(xt)dω. (4.20)

In (4.20), x Rn; a and b are the controls of player 1 (minimizer) and player 2 ∈

(maximizer) in compact sets Aa and Ba; ω is a standard Wiener process with a proper dimension. Note that here the game model is time variant, which is more general than the model we use for multi-player PE games. Let the objective functional be

T ˆ(a, b; t, x )=E (t, x ,a , b )dτ + Q(x ) . J t ω G τ τ τ T Zt 

86 Here, the cost rate is a map : R Rn A B R; terminal cost is Q : Rn R; G G × × a × a 7→ 7→ T is the exit time, which is defined as the first time that t or x leaves some set Q that will be defined shortly.

Suppose that X is an open set in the state space. Denote by C2(X) a set of continuously differentiable functions on X. Let = [0, T] R, and define = T ⊆ Q X. Denote by C1,2( ) a set of functions φ(t, x) that have continuous first- and T × Q second- order partial derivatives, φ , φ and φ , where t and x X. For short, t x xx ∈ T ∈ we use C2 and C1,2 to denote the corresponding sets in the following discussion.

Define the set Ψ as

Ψ(X) , ψ : X R ψ is Borel measurable on X and bounded . 7→ |  Denote by P (s, x ,t,x )(s

S [ψ](x )=E ψ(x ) x = ψ(x)P (s, x ,t,x)dx. s,t s { t | s} s ZX Define the operator Ξ(t) on Ψ(X) as

1 Ξ(t)[ψ](xt) = lim h− St,t+h[ψ](xt) ψ(xt) . h 0 → −  Suppose that for any ϕ Ψ(X), Ξ(t) above is well defined. If x is a stochastic ∈ t process defined in (4.3), which is also called a diffusion process, Ξ(t) is a second-order

partial differential operator on C2(X), i.e.,

1 Ξ(t)[ψ]= ψ f(t, x , a, b)+ tr ψ σ(x )σT(x ) . (4.21) x · t 2 xx t t  

87 Here, tr(M) stands for the trace of a square matrix M. Consider any function φ(t, x) ∈ C1,2, and by Ito’s stochastic differential rule [78, 19],

1 dφ(t, x) = φ (t, x)dt + φ (t, x)dx + tr ψ σ(x )σT(x ) dt t x 2 xx t t   = φt(t, x)dt + Ξ(t)φ(t, x)dt + φxσ(xt)dt. (4.22)

Integrate (4.22) on both sides and take the expectation,

t Eφ(t, xt) φ(s, xs)=E φτ (τ, xτ )+Ξ(τ)φ(τ, xτ ) dτ (4.23) − s  Z    Define the exit time T as T = min t (t, x ) / . Suppose that the set is open, { | t ∈ Q} Q and the upper limit of the integral in (4.23) can be the exit time.

Lemma 8 Assume that

(i) functions f and σ in (4.3) satisfy that

f(t, x, a, b) C(1 + x ) and σ(x) C(1 + x ) k ≤ k k k k k ≤ k k

for some constant C > 0, any x X, a A and b B ; ∈ ∈ a ∈ a (ii) Let function C1,2; there exist constants D and k, such that V ∈

(t, x) D 1+ x k for any (t, x) ; kV k ≤ k k ∈Q  (iii) is continuous on ¯, the closure of ; V Q Q (iv) + Ξ(τ) + (τ, x) 0 for all (t, x) , where E T (τ, x ) dτ < for Vτ V G ≥ ∈Q s |G τ | ∞ n o any (s, x ) and some function : R Rn R. Then,R s ∈Q G × 7→ T (s, xs) E (τ, xτ )dτ + (T, xT) . V ≤ s G V n Z o Proof: Refer to Theorem 5.1 on page 124 in [78].

88 We adopt the definition of nonanticipative strategt as well as the (lower/upper)

Value function introduced in (4.7)-(4.8). Denote by ˆ, ˆ and ˆ the upper, the lower V V V and the Value function in the following theorem.

Theorem 12 Suppose that functions f, σ satisfy the conditions (i), and ˆ satisfies V (ii)-(iii) in Lemma 8. Assume that there exists a( ) A(s) such that · ∈ T E (τ, x ,a , β[a] ) dτ < ω G τ τ τ ∞ Zs n o

for any (s, x ) and β ∆(s). Let ˆ be a solution of the following HJI equation, s ∈Q ∈ V ˆ ˆ t(t, xt)+ min max Ξ(t) (t, xt)+ (t, xt,at, bt) = 0 (4.24) V at Aa bt Ba V G ∈ ∈ n h i o for any (t, x ) with the boundary ˆ(T, x ) = Q(x ) where (T, x ) ∂ (the t ∈ Q V T T T ∈ Q boundary of ), such that ˆ C1,2( ) and continuous on ¯ (the closure of ). Q V ∈ Q Q Q Then,(i) ˆ (t, xt) sup ˆ(a, β[a]; t, xt) V ≤ β ∆(t){J } ∈ for any a( ) A(t) and (t, x ) ; · ∈ t ∈Q

(ii) if there exists a∗( ) A(s) such that a∗ A and it satisfies · ∈ t ∈ a ˆ ˆ max Ξ(t) (t, xt)+ (t, xt,at∗, bt) = min max Ξ(t) (t, xt)+ (t, xt,at, bt) (4.25) bt Ba at Aa bt Ba ∈ { V G } ∈ ∈ { V G } for any (t, x ) , then t ∈Q ˆ (s, xs) = sup (a∗, β[a∗]; s, xs) V β ∆(s){J } ∈ for any (t, x ) . t ∈Q

Proof: The theorem can be easily proved by extending Theorem 4.1 on page 159 in [78] that is based on a minimization problem to a minimax problem.

89 Note that in (4.24) and (4.25), the minimum and the maximum are attainable due to

the continuity of the functions in the brackets and the compactness of the control set

Aa and Ba. The saddle point equilibrium can be specified in the following corollary,

where the Isaacs condition holds.

Corollary 1 Suppose that function f, σ satisfy the conditions (i), and ˆ satisfies V (ii)-(iii) in Lemma 8. Assume that there exists a( ) A(s) such that · ∈ T E G˜(τ, x ,a , β[a] ) dτ < ω τ τ τ ∞ Zs n o

for any (s, x ) and β ∆(s). Let ˆ be a solution of the following HJI equation, s ∈Q ∈ V

ˆt(t, xt) + min max Ξ(t) ˆ (t, xt)+ (t, xt,at, bt) at Aa bt Ba V ∈ ∈ { V G }   = ˆt(t, xt) + max min Ξ(t) ˆ (t, xt)+ (t, xt,at, bt) = 0 (4.26) bt Ba at Aa V ∈ ∈ { V G }   for any (t, x ) with the boundary ˆ(T, x ) = Q(x ) where (T, x ) ∂ (the t ∈ Q V T T T ∈ Q boundary of ), such that ˆ C1,2( ) and continuous on ¯ (the closure of ). Q V ∈ Q Q Q Then,(i)

inf ˆ(α[b], b; s, xs) ˆ(s, xs) sup ˆ(a, β[a]; s, xs) α Γ(t){J } ≤ V ≤ β ∆(t){J } ∈ ∈ for any a( ) A(s), b( ) B(s) and (s, x ) ; · ∈ · ∈ s ∈Q

(ii) if there exists a∗( ) A(s) such that a∗ A and it satisfies · ∈ t ∈ a

ˆ ˆ max Ξ(t) (t, xt)+ (t, xt,at∗, bt) = min max Ξ(t) (t, xt)+ (t, xt,at, bt) bt Ba at Aa bt Ba ∈ { V G } ∈ ∈ { V G } for any (t, x ) , and there exists b∗( ) B(s) such that b∗ B and it satisfies t ∈Q · ∈ t ∈ a

ˆ ˆ min Ξ(t) (t, xt)+ (t, xt,at, bt∗) = max min Ξ(t) (t, xt)+ (t, xt,at, bt) at Aa bt Ba at Aa ∈ { V G } ∈ ∈ { V G }

90 for any (s, x ) , then s ∈Q

ˆ(s, xs) = sup (a∗, β[a∗]; s, xs) = inf (α[b∗], b∗; s, xs) = (a∗, b∗; s, xs) V β ∆(s){J } α Γ(s){J } J ∈ ∈ for any (s, x ) . This is a saddle-point equilibrium and ˆ is the Value of the s ∈ Q V game.

Remark 11 In Corollary 1, the Value function of the game is characterized by a

(classical) solution of the HJI equation. Under (4.26), the Isaacs condition holds.

4.3.2 Solution to a Two-Player Pursuit-Evasion Game

In this section, we solve a specific two-player PE game in a R2 space withx ¯ and y¯ as the coordinates. Consider the following dynamics of the players,

d¯xς = vς cos θς dt + σς dωςx¯ withx ¯ς (0) =x ¯ς0, (4.27a)

d¯yς = vς sin θς dt + σς dωςy¯ withy ¯ς (0) =y ¯ς0. (4.27b)

Equation (4.27) is a noise-corrupted version of the first-order Dubin’s car model in

(3.8), where the subscript ς p,e stands for the pursuer or the evader;x ¯ and ∈ { } ς y¯ς are the state variables (displacement) along thex ¯- andy ¯-axis respectively; vς is

the velocity; θς is the control input; ωx¯ς (ωy¯ς ) is a standard Wiener process along the x¯-axis (¯y-axis); σς is a constant. Assume that ωςx¯ and ωςy¯ are independent, and so are ωpx¯ (ωpy¯) and ωex¯ (ωey¯). Choose the capture time of the evader as the objective, i.e., = T (G = 1 and Q = 0 in (4.6)). Define x , [¯x , y¯ ]T, and we rewrite the J ς ς ς dynamics in (4.27) as dx = f (x , θ )dt + σ dω . The game ends when x x ε ς ς ς ς ς ς k p − ek ≤ for some ε> 0.

91 Theorem 13 Assume that E T < , v > v with ε > σ2 + σ2 2(v v ), the { } ∞ p e p e p − e Value function of the PE game (x , x ) is given by  V p e

2 2 2 2 (¯xp x¯e) + (¯yp y¯e) σp + σe 2 2 (xp, xe)= − − + 2 ln x¯p x¯e) + (¯yp y¯e) + C(ε) V vp ve 4(vp ve) − − p − − h i (4.28)

2 2 2 ε (σp+σe ) ln(ε ) where the constant C(ε)= · 2 . vp ve 4(vp ve) − − − −

Proof: We apply Corollary 1 to prove the theorem. We need to show that is V a solution of the corresponding HJI equation as in (4.26). It suffices to show that

min max Ξ(t) (x(t))+1 = 0. First of all, it is easy to check that f, σ in (4.27) θp θe { V } and satisfy conditions (i)-(iii) in Lemma 8. The assumption that E T < implies V { } ∞ that condition in (iv) holds, i.e.,

t E (τ, xτ ) dτ < . s |G | ∞ n Z o Note that here = 1 and = 0 when (¯x x¯ )2 + (¯y y¯ )2 = ε (on the boundary G V p − e p − e of the capture range), and here is notp an explicit function of time. By (4.22), V

2 2 2 2 ∂ ∂ 1 ∂ 2 1 ∂ 2 Ξ(t) = V fp + V fe + 2 V σp + 2 V σe . (4.29) V ∂xp · ∂xe · 2 ∂ xpi 2 ∂ xej i=1 j=1 X X D1 D2 | {z } Here, i = 1, 2 (j = 1, 2) stands forx ¯ and| y ¯ respectively.{z Note that} ωςx¯ and ωςy¯ are independent. Substitute (4.27) into (4.29) to replace fp and fe. After a little

92 manipulation, the terms D1 and D2 in (4.29) become

1 (¯xp x¯e) D1 = − vp cos(θp) 2 2 vp ve (¯xp x¯e) + (¯yp y¯e) − − − p(¯yp y¯e) (¯xp x¯e) + − vp sin(θp)+ − ve cos(θe) (¯x x¯ )2 + (¯y y¯ )2 (¯x x¯ )2 + (¯y y¯ )2 p − e p − e p − e p − e p (¯yp y¯e) p + − ve sin(θe) 2 2 (¯xp x¯e) + (¯yp y¯e) ! − − pσ2 + σ2 2(¯x x¯ ) + p e p − e v cos(θ ) 2(v v )2 (¯x x¯ )2 + (¯y y¯ )2 p p p − e p − e p − e 2(¯x x¯ ) 2(¯x x¯ ) + p − e v sin(θ )+ p − e v cos(θ ) (¯x x¯ )2 + (¯y y¯ )2 p p (¯x x¯ )2 + (¯y y¯ )2 e e p − e p − e p − e p − e 2(¯x x¯ ) + p − e v sin(θ ) , (4.30) (¯x x¯ )2 + (¯y y¯ )2 e e p − e p − e ! and,

D2 = σ2 + + σ2 + p Vx¯px¯p Vy¯py¯p e Vx¯ex¯e Vy¯ey¯e 2 2 1  (¯xp x¯e)  2 (¯yp y¯e) 2 = − 3 σp + − 3 σp 2 2 2 2 2(vp ve) [(¯xp x¯e) + (¯yp y¯e) ] 2 [(¯xp x¯e) + (¯yp y¯e) ] 2 − − − − − 2 2 (¯xp x¯e) 2 (¯xp x¯e) 2 + − 3 σe + − 3 σe 2 2 2 2 [(¯xp x¯e) + (¯xp x¯e) ] 2 [(¯xp x¯e) + (¯yp y¯e) ] 2 ! − − − − σ2 + σ2 (¯y y¯ )2 (¯x x¯ )2 (¯x x¯ )2 (¯y y¯ )2 + p e p − e − p − e σ2 + p − e − p − e σ2 8(v v )2 [(¯x x¯ )2 + (¯y y¯ )2]2 p [(¯x x¯ )2 + (¯y y¯ )2]2 p p − e p − e p − e p − e p − e (¯y y¯ )2 (¯x x¯ )2 (¯x x¯ )2 (¯y y¯ )2 + p − e − p − e σ2 + p − e − p − e σ2 . (4.31) [(¯x x¯ )2 + (¯x x¯ )2]2 e [(¯x x¯ )2 + (¯y y¯ )2]2 e p − e p − e p − e p − e !

93 By inspection of (4.29)-(4.31), only the term D1 in (4.30) involves control θp and θe.

Clearly,

2 2 2 1 σp + σe 2(¯xp x¯e) min max D1 = ( vp + ve)+ 2 − 3 vp θp θe { } vp ve − 4(vp ve) − 2 2 2 − − (¯xp x¯e) + (¯yp y¯e) 2 − 2 − 2(¯yp y¯e)  2(¯xp x¯e)  − 3 vp + − 3 ve − (¯x x¯ )2 + (¯y y¯ )2 2 (¯x x¯ )2 + (¯y y¯ )2 2 p − e p − e p − e p − e 2  2(¯yp y¯e)    + − 3 ve (¯x x¯ )2 + (¯y y¯ )2 2 ! p − e p − e 1 σ2 + σ2 1 = 1 p e  . (4.32) − − 2 vp ve (¯x x¯ )2 + (¯y y¯ )2 − p − e p − e On the other hand, D2 in (4.31) canp be simplified as

1 σ2 + σ2 1 D2= p e . (4.33) 2 vp ve (¯x x¯ )2 + (¯y y¯ )2 − p − e p − e By (4.32)-(4.33), min max Ξ(t) (px(t))+1 = 0. Thus, is a solution of the θp θe { V } V HJI equation. Furthermore, since the dynamics of the pursuers and the evaders are

independent, function f is separable in the sense of Theorem 9. Due to the separability

of G (G = 1), by Theorem 9 and Corollary 1, is the Value. V

Remark 12 Compared with the Value function (3.12) of a corresponding determin- istic game with the same objective, the Value (4.28) here has an additional term due to the disturbance.

With the Value derived in (4.28), a multi-player stochastic PE game that has the objective chosen as the sum of the capture times of all the evader and the dynamics in (4.27) can be decomposed into distributed two-player games. Then, the iterative method is applicable.

94 4.3.3 On Finite Expectation of the Capture Time

In this section, we study the condition of E T < in Theorem 12. We first { } ∞ introduce the following lemma.

Z k n Lemma 9 For any λ, 0 <λ< 1 and k 0, n∞ n λ− < . ∈ ≥ =1 ∞ P The proof is omitted and can be found in real analysis texts such as [79].

Let us first consider a simplified game in a one dimensional space as shown in

Figure 4.1, where the dynamics of the players is similar to (4.27) but with states in

R, i.e.,

dxς = vς uς dt + σς dως

Here, ς p,e ; v is the velocity; u 1, 1 is the control variable; ω is a one ∈ { } ς ς ∈ { − } dimensional standard Wiener process. Clearly, in order to capture the evader, the pursuer moves towards the evader and the evader follows the same direction, i.e., u = u = 1 as illustrated in Figure 4.1. Let x = x x , and then, the dynamics of p e e − p the game becomes

dx = vdt + σdω, (4.34) where v = v v and σ2 = σ2 + σ2. e − p p e

v p v e

Pursuer Evader x

Figure 4.1: The Simplified Stochastic PE Game in a One Dimensional Space

95 Theorem 14 If v > v and ε 0, then E T < , where T is the capture time, p e ≥ { } ∞ T = inf t x(t) ε . { | | | ≤ }

Proof: Suppose that the game starts at time 0 and x(0) >ε, otherwise T = 0. By

the property of Wiener processes, the state xt at time t > 0 is a Gaussian random

variable, i.e., x µ , σ2 . Here, the mean is µ = x(0) + vt and the variance t ∼ N xt xt xt   is σ2 = σ2 + σ2 t. Letv ¯ = v = v v > 0. The fact that the evader has not xt p e − p − e

been captured by time t > 0 implies that that xt > ε at at least time t. Denote by

P (T >t) the probability that evader has not been captured by time t. It satisfies

2 1 (x µxt ) P (T >t) P xt >ε = exp − 2 dx ≤ x>ε σ√2πt − 2σ Z xt !  1 (x x(0) +vt ¯ )2 = exp − 2 dx. (4.35) x>ε σ√2πt − 2σ t Z ! Define t =(x(0) ε)/v¯, t˜= t t and r = x +v ¯t˜, such that P x >ε in (4.35) can 0 − − 0 t be rewritten as 

1 (x +v ¯t˜)2 P xt >ε = exp dx ˜ − 2σ2(t + t˜) Zx>0 σ 2π(t0 + t) 0 !  p 1 r2 = exp dr. ˜ ˜ − 2σ2(t + t˜) Zr>v¯t σ 2π(t0 + t) 0 ! Choose some t such thatv ¯(t t )p> 1, and define 1 1 − 0

t = max 2t ,t . (4.36) 2 { 0 1}

Consider the time t when t>t2, i.e., t>t˜ 0 andv ¯t>˜ 1, and then

1 r2 1 v¯ t˜ r P xt >ε < exp dr< exp · · dr σ√4πt ˜ − 4σ2t˜ σ√4πt ˜ − 4σ2t˜ 0 Zr>v¯t ! 0 Zr>v¯t !  2 2 2σ v¯ ˜ 2σ v¯ = exp 2 t = exp 2 (t t0) . (4.37) v¯√πt0 − 4σ ! v¯√πt0 − 4σ − !

96 Denote by p (t) the probability density of the capture time T , the expectation E T T { } is

t2 ∞ ∞ E T = t p (t)dt = t p (t)dt + t p (t)dt, (4.38) { } · T · T · T Z0 Z0 Zt2

where t2 is chosen as in (4.36). Next, we shall show that the second term on the right

hand site of (4.38) is finite, which implies that E T is finite. Choose a small δt > 0, { } the second term can be written as follows, illustrated in Figure 4.2.

pT t

GPt

…... …... t 0 2 tt2 G t2  NG t T

Figure 4.2: The Probability Density of the Capture Time pT (t)

t2+(k+1)δt ∞ ∞ t p (t)dt = t p (t)dt (4.39) · T · T t2 k t2+kδt Z X=0  Z  Each term in the summation in (4.39) satisfies

t2+(k+1)δt t2+(k+1)δt t p (t)dt < t +(k + 1)δt p (t)dt · T 2 T Zt2+kδt Zt2+kδt  ∞ < t2 +(k + 1)δt pT (t)dt (4.40) Zt2+kδt  Note that P T >t + kδt = ∞ pT (t)dt. By (4.35) and (4.37), 2 t2+kδt

 R 2 ∞ 2σ v¯ p (t)dt< exp t + kδt t . (4.41) T v¯√πt − 4σ2 2 − 0 Zt2+kδt 0 ! 

97 Substitute (4.41) into (4.39), and by Lemma 9,

2 ∞ ∞ 2σ v¯ t p (t)dt< t +(k+1)δt exp t +kδt t < . (4.42) · T 2 v¯√πt − 4σ2 2 − 0 ∞ Zt2 k=0 " 0 !# X   By (4.38) and (4.42), E T < . { } ∞ Now, we study the game in a R2 space specified in Theorem 13. First of all, change the variables asx ˜ =x ¯ x¯ andy ˜ =y ¯ y¯ . According to (4.32), the optimal p − e p − e control of the pursuer coincides with that of the evader, namely, θp∗ = θe∗. Suppose that both the pursuer and the evader use the same optimal control and denoted by

θ. Then the dynamic equation of players can be rewritten as

d˜x =(v v ) cos θ(t)dt + σ dω σ dω withx ˜(0) =x ˜ , (4.43a) p − e p px¯ − e ex¯ 0 d˜y =(v v ) sin θ(t)dt + σ dω σ dω withy ˜(0) =y ˜ . (4.43b) p − e p py¯ − e ey¯ 0

In (4.43a), ω and ω are independent, such that the term σ dω σ dω is px¯ ex¯ p px¯ − e ex¯ 2 2 equivalent to a random process σdωx˜ with σ = σp + σe , where ωx˜ is a standard

Wiener process. Similarly, σdω can be definedp according to σ dω σ dω in y˜ p py¯ − e ey¯ (4.43b). Let v = v v and x = [˜x, y˜]T. Define ρ(x) = x˜2 +y ˜2. The following p − e theorem provides sufficient conditions under which the capturp e time T of the evader

has a finite expectation.

Theorem 15 In a PE game with the dynamics given in (4.27), if vp > ve, ε >

(σ2 + σ2)/2(v v ) and the probability P ρ(x ) > ε under the optimal control p e p − e t determined in (4.32) satisfies 

2 2 ∞ Cρ (ρ ρ + κ t) P ρ(x ) >ε < exp − 0 dρ (4.44) t σ2t − 2σ2t Zε !  for some C > 0, κ R (κ = 0) and any t> 0, then E T < , where ρ = x˜2 +y ˜2. ∈ 6 { } ∞ 0 0 0 98 p Proof: The proof is similar to that of Theorem 14. Let t =(ρ ε)/κ2, t¯= t t 0 0 − − 0 and r = ρ + κ2t¯. By (4.44),

2 2 ∞ Cρ (ρ ρ + κ t) P ρ(x ) >ε < exp − 0 dρ t σ2t − 2σ2t Zε !  2 2 ∞ Cρ (ρ + κ t¯) = exp dρ σ2(t¯+ t ) − 2σ2(t¯+ t ) Z0 0 0 ! 2 2 ∞ C(r κ t¯) r because r = ρ + κ2t¯ = − exp dr. 2 ¯ 2 ¯ κ2t¯ σ (t + t0) − 2σ (t + t0)!   Z Choose some t such that κ2(t t ) > 1 and let t = max 2t ,t . Consider that 1 1 − 0 2 { 0 1} 2 t>t2, such that t>t¯ 0 and κ t>¯ 1.

2 2 ∞ C(r κ t¯) r P ρ(xt) >ε < 2 − exp 2 dr 2¯ σ (t¯+ t ) − 2σ (t¯+ t ) Zκ t 0 0 !  2 2 ∞ C(r κ t¯) κ t¯ r < −2 exp 2· dr 2¯ 2σ t − 4σ t¯ Zκ t 0 ! 2 2 ∞ C(r κ t¯) κ r = −2 exp 2 dr 2¯ 2σ t − 4σ Zκ t 0 ! 2 4 8Cσ κ (t t0) = 4 exp −2 (4.45) κ t0 − 4σ !

for any t 0. The rest of the proof follows the proof of Theorem 14 from (4.38). ≥

Remark 13 If the conditions in Theorem 15 hold, P r(T < ) = 1. ∞

Justification of the Condition in (4.44)

In the following, we shall justify that the condition in (4.44). The discussion is rather heuristic than mathematically rigorous. The complete characterization of the distribution of the states requires an analytical solution of a PDE called Fokker

99 Planck Equation (FPE) [80] as

2 2 2 ∂p(t, x) ∂ 1 ∂ p(t, x) 2 = p(t, x)fi(x, θ) + 2 σ with p(0, x)= δ(x x0), ∂t − ∂xi 2 ∂x − i=1 i=1 i X   X (4.46)

where δ( ) is Dirac-Delta function; f is the ith element of function f in (4.43), where · i i = 1, 2 stands forx ˜ andy ˜ respectively. An analytical solution of the equation (4.46) p(t, x) is formidable. In the following, we provide an upper-bound of p(t, x).

First, we analyze a simplified situation where the control variable θ in (4.46) is

fixed, i.e., θ = θ for t 0. Based on the property of the Wiener process, the t 0 ≥

θ0 probability distribution px (t, x) at time t is 2 2 1 x˜ µx˜(t) + y˜ µy˜(t) θ0 − − px (t, x)= 2 exp 2 , (4.47) 2πσ t − 2σ t  !

θ0 where µx˜(t) =x ˜0 + vt cos θ0 and µy˜(t) =y ˜0 + vt sin θ0. Here in the notation of px , the

superscript indicates the fixed control θ0 and the subscript x implies that the distri-

bution is in the x coordinates. Now, we change the coordinates by a transformation

at time t, which is illustrated in Figure 4.3. Gt x′ x˜ cos βt sin βt xˆ , = t , with t = , (4.48) y′ G y˜ G sin β cos β      − t t  where βt is the angle between the line of sight from the evader to the pursuer and the

x˜-axis at time t.

Clearly, x′ and y′ are jointly Gaussian with the mean

µ ′ cos β sin β µ x = t t x˜ µ ′ sin β cos β µ  y   − t t   y˜  µ cos β + µ sin β µ2 + µ2 = x˜ t y˜ t = x˜ y˜ (4.49) µ sin β + µ cos β  − x˜ t y˜ t  " q 0 # and the covariance Covt, 2σ2t 0 2σ2t 0 Cov = T = . t Gt 0 2σ2t Gt 0 2σ2t     100 c v y y xc

P x, P y Pursuer

Et Evader x

Figure 4.3: Change of thex ˜-˜y Coordinates to the x′-y′ Coordinates

T T T Thus, x′ and y′ are independent. Note thatx ˆ = x′ , y′ and the distribution in

θ0 h i the new coordinates pxˆ becomes

2 2 2 2 x′ µ (t)+ µ (t) + y′ 1 − x˜ y˜ pθ0 (t, xˆ)= exp     . (4.50) xˆ 2πσ2t − q 2σ2t         Next, we change the x′-y′ coordinates to the ρ-ϕ (polar) coordinate system as

2 2 1 ρ ′ ′ = x + y , ϕ ′ ′ = tan− (y′/x′) with ϕ ′ ′ [0, 2π). x ,y ′ ′ x ,y x ,y ∈ q 1 1 Here, tan− is slightly different from the standard definition of the function tan− ,

and its range is the interval [0, 2π). At each (x′, y′), the absolute value of the Jacobian

J t (x′, y′) of the transformation t is G G ′ x′ y √x′2+y′2 √x′2+y′2 1 1 J t (x′, y′) = det ′ ′ = = . G y x 2 2 ρ ′ ′ | | ′2 ′2 ′2− ′2 ! x′ + y′ x ,y x +y x +y

p

101 θ0 Now, we derive the probability density pρ,ϕ(t, ρ, ϕ) in the ρ-ϕ coordinates. By (4.47)

and (4.48),

θ0 1 θ0 pρ,ϕ(t, ρ, ϕ) = J t (x′, y′) − pxˆ (t, x) | G | 2 2 2 2 ρ ρ cos ϕ µx˜(t)+ µy˜(t) +(ρ sin ϕ) = exp − (4.51). 2πσ2t − q 2σ2t       2 2 Letρ ¯µx˜,µy˜ = µx˜ + µy˜. If ϕ = 0, q 2 ρ ρ ρ¯µ ,µ θ0 − x˜ y˜ pρ,ϕ(t, ρ, ϕ =0)= 2 exp 2 for ρ 0. (4.52) 2πσ t − 2σ t  ! ≥

By inspection of (4.51) and (4.52), for any ϕ = 0, p (t, ρ, ϕ) < p (t, ρ, ϕ = 0). 6 ρ,ϕ ρ,ϕ Then, the probability that ρ > r for some r 0 under control θ at time t, θ0 (ρ> ≥ 0 P r; t) satisfies

2π ∞ θ0 (ρ > r; t) < dϕ p (t, ρ, ϕ = 0)dρ P ρ,ϕ Z0 Zr 2π 2 ∞ ρ ρ ρ¯µ ,µ = dϕ exp − x˜ y˜ dρ. (4.53) 2πσ2t − 2σ2t Z0 Zr  ! ∗ Next, we study the probability θ (ρ > r; t) when both the pursuer and the evader P exploit the optimal control θ∗ determined in (4.39). Intuitively, optimal control θ∗ (in state feedback) drives each state (˜x, y˜) towards the origin, such that it outperforms

(static) θ0 with respect to minimizing the distance ρ between the pursuer and the

evader. On the other hand, suppose that the state xθ∗ (t) under θ∗ has a smaller

2 variance σ at any time t compared to that under a fixed control θ0. Similar to

(4.53), we assume that the following inequality is true. Denote by µx∗˜(t) and µy∗˜(t) the

2 2 mean ofx ˜ andy ˜ at time t 0 under θ∗ andρ ¯µ∗ ,µ∗ (t)= µ∗ (t) + µ∗ (t) . ≥ x˜ y˜ x˜t y˜t q

102 Assumption 5

2 2π ρ ρ¯µ∗ ,µ∗ (t) ∗ ∞ Cρ x˜ y˜ θ (ρ > r; t) < dϕ exp − dρ (4.54) P 2πσ2t − 2σ2t   Z0 Zr     for some constant C > 0.

Since function ρ = x˜2 +y ˜2 is convex, by Jensen’s inequality, p 2 2 2 2 ρ¯µ∗ ,µ∗ = µ∗ + µ∗ E x˜ ∗ +y ˜ ∗ , ρ¯θ∗ . x˜ y˜ x˜ y˜ ≤ θ θ q q 

Thus, inequality (4.54) holds whenρ ¯ ∗ ∗ is replaced byρ ¯ ∗ , i.e., µx˜,µy˜ θ

2 2 ∗ ∞ Cρ (ρ ρ¯ ∗ (t)) θ (ρ > r; t) < πdϕ exp − θ dρ. (4.55) P 2πσ2t − 2σ2t Z0 Zr !

∗ ∗ 2 2 In the following, we derive the evolution ofρ ¯θ (t) under θ∗. Let ρθ = x˜θ∗ +y ˜θ∗ p along the trajectory under optimal control θ∗. The evolution of ρθ∗ (t) can be derived

based on the Ito’s rule of integral. According to the definition of Ξ(t) in (4.21), ρθ∗ (t)

at any (˜x, y˜) under the control θ∗ is

∗ ∗ 2 ∗ 2 ∗ ∂ρθ (t) ∂ρθ (t) 1 ∂ ρθ (t) 2 1 ∂ ρ¯θ (t) 2 ρ˙ ∗ (t) = v cos θ∗ + v sin θ∗ + σ + σ θ ∂x˜ ∂y˜ 2 ∂x˜2 2 ∂y˜ 1 = v + σ2. (4.56) − 2ρθ∗ (t)

If the evader has not been captured by t, then ρ(τ) >ε for any τ (0 τ t). Here, ≤ ≤ we focus on the set = (˜x, y˜) x˜2 +y ˜2 >ε . Since ε > σ2/2v (see Theorem 15), S n p o it follows from (4.56) that

1 2 1 2 2 ρ˙θ∗ (t)= v + σ < v + σ = κ < 0 (4.57) − 2ρθ∗ (t) − 2ε −

for any 0 τ t. Equation (4.57) indicates that the decreasing rate of ρ at any ≤ ≤ point in is less than κ2. Since this holds at any (˜x, y˜) for ρ>ε, the decreasing S − 103 rate ofρ ¯ ∗ , the expectation of ρ ∗ , is less than κ2. Namely,ρ ¯ ∗ (t) <ρ κ2t. This θ θ − θ 0 − relationship holds as long as the state (˜x, y˜) remains in . The inequality of (4.55) S holds withρ ¯ ∗ (t) replaced by ρ κ2t, which yields (4.44). θ 0 −

4.4 On Existence of Solutions to Stochastic Pursuit-Evasion Games with Perfect State Information

In this section, we outline the study on the existence of Value for the stochastic PE

games with the perfect state information pattern, and general issues are discussed with

suggestions for the future study. As in the deterministic case, this problem is discussed

under the framework of the viscosity solution of (second-order) HJ equations.

Consider the players’ dynamics formulated in (4.3). Recall the second order dif-

ferential operator defined for the stochastic process in (4.21). It is expected that the

corresponding HJI equation (with a fixed z) for a stochastic game problem has the

following form, ∂uz ∂uz ∂2uz + H (x, , ) = 0. (4.58) ∂t z ∂x ∂x2

c Here, the HJI equation in (4.58) is defined on Λz with proper boundary conditions,

where Hz can be

1 Hz(x, p, D)= Hz−(x, p, D) = max min p f(x, a, b)+ tr Σ(x)D + G(x, z, a, b) , b Ba a Aa ∈ ∈ · 2 n  o or,

+ 1 Hz(x, p, D)= Hz (x, p, D) = min max p f(x, a, b)+ tr Σ(x)D + G(x, z, a, b) . a Aa b Ba ∈ ∈ · 2 n  o Here, Σ(x)= σ(x)σT(x), in which σ(x) is given in (4.3). Note that (4.58) is a second-

order PDE and its viscosity solution can be defined similarly according to Definition

4 with H(t,x,p) replaced by Hz(x, p, D).

104 The method for a deterministic PE game can be used to show the existence of

(upper/lower) Value of a Stochastic PE game. Suppose that an approximate upper

Value function can be determined by a suboptimal control under a structure of SSTC.

The major step is to show that the image of transformation of a continuous function H is the (unique) viscosity solution of the corresponding HJI equation in (4.58), Wk where is defined as H t+∆t [ k](x, z) = sup inf Eω G xτ , zτ ,aτ , β[a]τ dτ H W β ∆(t) a( ) A(t) t ∈ · ∈  Z  + (x , z ) . (4.59) Wk t+∆t;x,a,β[a],ω t+∆t;z,x  This can be proved if [ ](x, z) in (4.59) satisfies the principle of DP [50], be- H Wk cause the viscosity solution is a local property. In contrast to deterministic problems, the principle of DP for stochastic problems is non-trivial due to some measurability problem [50, 19]. Once [ ](x, z) is shown to be a viscosity solution of the corre- H Wk

sponding HJI equation for any k Z 0, by iteration, the upper Value function can be ∈ ≥ approached in the limit by Theorem 11. Furthermore, according to the most general

uniqueness results on the viscosity solution of a second-order HJ equation by Ishii

and Lions in [48] (Theorem VI.5 on pp.65), [ k](x, z) for k Z 0 can be shown as H W ∈ ≥ the unique viscosity solution of the equation (4.58). Then, with a separable function

G as in (2.65), it follows that the upper and the lower Value coincide, such that the

Value function of a stochastic game exists.

In this section, we only provide some guidelines to show the existence of the Value

function of a stochastic PE game under the framework of the viscosity solution theory

of HJ equations. A thorough investigation on this topic is a potential future research

direction. The study may start with [50, 48, 19] and the references therein.

105 4.5 Stochastic Pursuit-Evasion Games with Imperfect State Information

Up to this point, we have discussed stochastic multi-player PE games with perfect state information. In this section, we deal with a more realistic model, where the measurements of the players are no longer perfect. This is a very difficult problem even for control (with one player) problems. Stochastic games may encounter additional difficulty since the fundamental assumption of common knowledge among the players may no longer be valid due to the distinct noisy measurements by the pursuers and the evaders respectively [81, 82, 83]. To avoid this kind of difficulties from the information structure, we only deal with a class of special problems where the pursuers have noisy measurements but the evaders can still measure the states perfectly. This presents a worst-case scenario from the pursuers’ perspective. In the following, we first propose a suboptimal approach to a stochastic control problem with limited look-ahead, based on which the results on PE game problems are presented afterwards.

4.5.1 Limited Look-ahead for Stochastic Control Problems

We consider a stochastic control problem with the following system dynamic and measurement equations.

dx = f(t, x, u)dt + σ(t, x)dω with x(t0)= x0 (4.60a)

y = h(x, ξ) (4.60b)

Here, x is the state; u is the control; ω is a standard Wiener process; y is the measure- ment; ξ is the measurement disturbance, which is modelled as a Gaussian random variable. Assume that the state and the measurement are in some spaces with proper

(finite) dimensions respectively. Suppose that f satisfies the regularity conditions in

106 t, x and u as well as σ, such that equation (4.60a) admits a unique solution under admissible controls (see Theorem 5.2 in [7] on pp. 230). Consider an integrable cost functional as T J(u; x0,ω)= g(t, xt, ut)dt + q(xT ). (4.61) Zt0 In (4.61), g 0 is the cost rate; q 0 is the terminal cost. We refer to the information ≥ ≥ set as all the information that is available to a decision-maker (controller) up to time t, and denote it by It. Clearly, for the perfect state information case considered earlier,

Iperfect = x t τ t . In the case where the measurement is imperfect, the t τ | 0 ≤ ≤ information set is I = u , y t τ

(4.60a) admits a unique solution with u(t)= φ(It). The optimal (closed-loop) control problem is to find φ such that the following objective is minimized.

min Ex0,ω,ξ J(u; x0) I0 . (4.62) φ Φ ∈ 

Here, the expectation is taken over x , ω and ξ for all t t T given the information 0 0 ≤ ≤

I0. In general, the problem in (4.62) is very difficult, and a solution may not exist.

Closed-form solutions to this kind of problems are mostly absent except for stochastic problems with linear dynamics, a quadratic objective functional and Gaussian noise, for which the underlying Riccati equations have solutions [7]. In most cases, subopti- mal approaches are used, among which is the so-called Open-Loop Feedback Control

(OLFC) [54]. OLFC is feedback implementation of open-loop optimal controls. At any sample time t + k∆t > 0 (k Z+), an open-loop control problem is solved 0 ∈ given the information set It0+k∆t. The open-loop control is implemented immediately afterwards from t0 + k∆t to t0 +(k +1)∆t. We denote by VOL the performance index

107 of an open-loop optimal control as

V (I ) , min E J(u; t, x ) I OL t0+k∆t xt0+k∆t,ω t t0+k∆t u( ) U(t0+k∆t) { | } · ∈ T = min E g(τ, x , u )dτ + q(x ) I , xt0+k∆t,ω τ τ T t0+k∆t u( ) U(t0+k∆t) | · ∈ Zt0+k∆t  (4.63) where u( ) U(t) , φ : [t, T ] U φ( ) is progressively measurable (see · ∈ 7→ a · Ft,T − n o Definition 5) for t t T and U is a compact set. Equation (4.63) is formulated 0 ≤ ≤ a based on an underlying assumption that no more measurement will be taken after time t0 + k∆t.

Now, taking one step further, we consider the control action based on limited look-ahead. Suppose that the current time is t0 + k∆t and one more measurement y is allowed at t +(k + 1)∆t with t +(k + 1)∆t T . Clearly, the effect of the 0 0 ≤ future measurement is taken into account in the decision-making at t0 + k∆t. The new optimization problem becomes

t0+(k+1)∆t V (I ) , min E g(τ, x , u )dτ LA t0+k∆t xt0+k∆t,ω τ τ u( ) U(t0+k∆t) · ∈ (  Zt0+k∆t 

+EIt +(k+1)∆t VOL(It0+(k+1)∆t) u[t0+k∆t,t0+(k+1)∆t) It0+k∆t ,(4.64) 0 ) h i where u stands for u(τ) for t + k ∆t τ

It0+(k+1)∆t = It0+k∆t, u[t0+k∆t,t0+(k+1)∆t), yt0+(k+1)∆t .  Therefore, in (4.64), the expectation taken with respect to It0+(k+1)∆t given It0+k∆t

and u[t0+k∆t,t0+(k+1)∆t) is equivalent to that taken with respect to yt0+(k+1)∆t.

In the following, we shall show that V V at any sample time t + k∆t. LA ≤ OL 0 First of all, we consider a static decision problem with the objective function J(u, ω),

108 in which u U is the decision variable and ω Ω is a random variable in a finite ∈ a ∈ dimensional space. If the decision-maker only knows the distribution p(ω) of ω, the problem is to minimize the expected cost with a proper u as

V = min Eω J(u, ω) . (4.65) u Ua ∈ { }

Intuitively, additional information to the decision-maker may lead to a better decision.

Suppose that the decision-maker has extra knowledge of ξ in addition to p(ω) before he makes the choice. Let ξ Ψ be a random variable and the joint distribution ∈ p(ω, ξ) is known. A decision function φ is a map, φ : Ψ U . Denote by Φ the set 7→ a of all possible φ’s. The expected cost for the decision-maker becomes

Vξ = min Eξ Eω J φ(ξ); ω ξ =Eξ min Eω J u; ω ξ (4.66) φ Φ u Ua ∈  ∈  n n  oo n  o

Lemma 10 V V . ξ ≤

Lemma 10 can be shown by using the fact that E min J(u, ω) min E J(u, ω) ω { } ≤ { ω } and the details are omitted.

Z Theorem 16 VOL(It0+k∆t) VLA(It0+k∆t) for any k 0. ≥ ∈ ≥

Proof: For a short notation, we use k to denote time t0 + k∆t. By (4.63),

k+1

VOL(Ik) = min Exk,ω g(τ, xτ , uτ )dτ u( ) U(k) · ∈ (  Zk  T

+ min Exk+1,ω g(τ, xτ , uτ )dτ + q(xT ) Ik . (4.67) u( ) U(k+1) k+1 ) · ∈  Z 

109 Consider the measurement yk+1 at t0 +(k + 1)∆t, and view yk+1 as the ξ in Lemma

10. The second term in (4.67) satisfies

T

min Exk+1,ω g(τ, xτ , uτ )dτ + q(xT ) Ik+1, u[k,k+1) u( ) U(k+1) ( k+1 ) · ∈ Z T

Eyk+1 min Exk+1,ω g(τ, xτ , uτ )dτ + q(xT ) Ik, u[k,k+1), yk+1 ≥ u( ) Uk+1 · ∈ (  Zk+1  )

=Eyk+1 VOL(Ik+1) Ik, u[k,k+1) . (4.68) n o

Note that I = I , u , y and substitute (4.68) into (4.67), we obtain that k+1 { t [k,k+1) k+1} V (I ) V (I ). OL t0+k∆t ≥ LA t0+k∆t Both the OLFC scheme and the approach based limited look-ahead are subopti- mal, and they both provide an upper-bound for the optimal Value function. Theorem

16 states that the controls by limited look-ahead provide a better (expected) perfor- mance than the controls by OLFC scheme at any sample time t0 + k∆t. In other words, at each time t0 + k∆t(k Z 0), the problem in (4.64) is solved and the result- ∈ ≥ ing control is exploited over the next ∆t interval. In this process, the upper-bound of the performance by limited look-ahead is continuously improved and this bound is better than that by the OLFC scheme.

4.5.2 On Stochastic Pursuit-Evasion Games

In this section, we consider a stochastic PE game with an imperfect state in- formation pattern. We focus on a special game problem as follows. The scenario corresponds to a worst case from the pursuers’ perspective.

110 Assumption 6

(A1) The dynamics of both the pursuers and the evaders involve disturbance;

(A2) The pursuers measure the states subject to disturbances, while the evaders can

access the prefect state information;

(A3) Information is shared by all the pursuers or the evaders due to sufficient com-

munication on either side.

In the following, we shall use one-step look-ahead to solve a stochastic PE game

where Assumption 6 holds. Consider stochastic PE games with the players’ dy-

namics in (4.3) with (2.9), and the objective functional given in (4.6). Suppose a

decision is made at every sample time t0 + k∆t for k Z 0, where t0 is the start- ∈ ≥ ing time. Define the information set of the pursuers at time t for t t T as 0 ≤ ≤ Ip = Ip , u , y t τ t, t + k∆t t for k Z+ , where Ip is the ini- t t0 τ t0+k∆t 0 ≤ ≤ 0 ≤ ∈ t0  p tial information set at t0. Denote by It the family of all possible information sets p It at t. In the approach based on limited look-ahead, given an cost-go-to function p : I R, the pursuers solve the following optimization problem at each sample VC2G t 7→

timee t = t0 + k∆t.

t+∆t p p sup inf Ext,ω G τ, xτ , zτ ,aτ , β[a]τ +Eyt+∆t C2G(It+∆t) It (4.69) β ∆(t) a( ) A(t) t V ∈ · ∈  Z    e Here, the cost-to-go term is crucial to the performance. However, it is very difficult to determine an optimal cost-to-go function, which is essentially the true (upper/lower)

Value function. Therefore, in practice, a heuristic or suboptimal cost-to-go is VC2G often used instead in stochastic problems with imperfect state information patterns.e

For example, the performance index associated with open-loop strategies of the play- ers can be used. However, in a game where the terminal time is not fixed, e.g., a

111 PE game, open-loop strategies may not exist. In this case, an alternative cost-to-go

function needed to be selected. For instance, the cost-to-go function can be

(Ip)=E (x ) Ip . (4.70) VC2G t xt V t t n o e e In (4.70), is an approximate upper Value function determined (possibly by the V hierarchicale approach) for corresponding stochastic problems with perfect state infor- mation patterns. To further reduce the computational complexity in (4.70), certainty equivalence can be used.

Generally speaking, a stochastic game (control) problem with imperfect state information is very hard, and the theoretical results in this area are still largely unavailable. In this section, only a special case of this type of problems is discussed.

More general results entail further fundamental study.

4.5.3 On Finite Expectation of the Capture Time

This section is a counterpart of Section 4.3.3 and the theme is the capture time

in a two-player stochastic PE game with an imperfect state information pattern.

Particularly, we study conditions on the accuracy of the pursuer’s measurement, under

which the capture time of the evader has a finite expectation. Consider a two-player

PE game in a R2 space with the players’ dynamics given in (4.27) and the objective

as the capture time. Recall that in (4.27),x ¯ς andy ¯ς are used to denote the state

variables (ς p,e ), and also x , [¯x , y¯ ]T. We assume that the evader can measure ∈ { } ς ς ς

the states (xp, xe) perfectly; while the pursuer knows its own state xp but can only

access a noisy measurement of xe. The study here is based on the analysis for the perfect state information case, but taking into account the effect of the measurement disturbance.

112 Since the evader can access the perfect states, we assume that the evader exploits ˜ the optimal control θe∗ determined in (4.32). Suppose that the pursuer’s control θp is based on its noisy measurement of x . Letx ˜ , x¯ x¯ andy ˜ , y¯ y¯ . Define e p − e p − e 2 2 ˜ ρ = x˜ +y ˜ . According to (4.21), the evolution of ρ with controls θe∗ and θp can be

describedp as follows, similarly as in (4.56).

∂ρ ∂ρ 1 ρ˙ = v cos θ˜ + v sin θ˜ + v + σ2 (4.71) ∂x¯ p p ∂y¯ p p e 2ρ  p p 

Note that equation (4.71) reduces to (4.56) when the pursuer can measure xe perfectly and employs optimal θp∗ determined in (4.32). The following analysis is conducted based on a specific model of the pursuer’s measurement and the result reveals how the measurement accuracy can affect the capturability of the evader.

Consider the following measurement equation.

ye = xe + ξe(xe) (4.72)

Here, ye is the measurement; ξe(xe) is a random vector representing the measurement

disturbance, which may be state dependent. Suppose that the pursuer calculates

its optimal control according to (4.32) but with the perfect state xe replaced by its measurement ye. Assume that the ξe is bounded, and we shall provide a bound Ξe(ρ)

of ξ (x) that may depend on ρ , x˜2 +y ˜2, i.e., ξ (x) Ξ (ρ), under whichρ ¯ has e k e k ≤ e a positive decreasing rate. Here,ρ ¯p=E ρ . { } We consider a coordinate system whose origin is the current position of the pur-

suer. Denote by ϑ(x) the angle between the line of sight from the evader to the pursuer

1 2 and thex ¯-axis as illustrated in Figure 4.4. Define γ(ρ) = cos− (ve + σ /2ρ) vp ,

where σ2 = σ2 +σ2. Note that ε> (σ2 +σ2)/2(v v ) in Theorem 15 (as in Theorem  p e p e p − e

113 y ; U e ve e Ye J Evader vp Pursuer Tp x - x

Figure 4.4: Illustration of the Pursuer’s Control Based on a Noisy Measurement

13). Hence, γ(ρ) is well defined when ρ ε, and clearly γ(ρ) < π/2. Define the set ≥ , δ γ(ε) >δ> 0 . Let γ (ρ)= γ(ρ) δ for some δ . Dε { | } δ − ∈Dε

Proposition 1 If there exists a δ such that Ξ (ρ) ρ tan γ (ρ) for ρ ε at ∈Dε e ≤ · δ ≥ any x and any time t 0, then ρ¯˙(t) < κ2 at any t 0 for some κ = 0. e ≥ − ≥ 6

Proof: Since ξ (x) Ξ (ρ), according to Figure 4.4, clearly, y falls within a k e k ≤ e e region Y , which is a circle of radius Ξ (ρ) centered at x . Let x˜ˆ =x ¯ y and e e e p − e1 y˜ˆ =y ¯ y . By (4.32), θ˜ satisfies p − e2 p x˜ˆ y˜ˆ cos θ˜ = and sin θ˜ = . p − p − x˜ˆ2 + y˜ˆ2 x˜ˆ2 + y˜ˆ2 q q Namely, θ˜ Θδ , θ ϑ(x)+ π γ (ρ) θ ϑ(x)+ π + γ (ρ) . Under θ˜ , equation p ∈ x | − δ ≤ ≤ δ p (4.71) becomes 

∂ρ ∂ρ 1 ρ˙ = v cos θ˜ + v sin θ˜ + v + σ2. (4.73) ∂x¯ p p ∂y¯ p p e 2ρ  p p  Note that in (4.73), ∂ρ = cos ϑ(x) and ∂ρ = sin ϑ(x). Thus, ∂ρ cos θ˜ + ∂ρ sin θ˜ = ∂x¯p ∂y¯p ∂x¯p p ∂y¯p p cos ϑ(x) θ˜ . The fact that θ˜ Θδ implies θ˜ ϑ(x) π γ (ρ) < π/2, i.e., − p p ∈ x | p − − | ≤ | δ |  114 cos(θ˜ ϑ(x) π) cos γ (ρ) . Thus, p − − ≥ δ  cos ϑ(x) θ˜ = cos ϑ(x) θ˜ π cos γ (ρ) = cos γ(ρ) δ . (4.74) − p − − p − ≤ − δ −     Substitute (4.74) into (4.73),

1 1 ρ˙ v cos γ(ρ) δ + v + σ2 = v cos γ(ρ) δ v + v ≤ − p − e 2ρ − p − − e 2ρ p      .  = v cos γ(ρ) δ cos γ(ρ) = v sin γ(ρ) δ/2 sin(δ/2) − p − − − p −     v sin γ(ε) δ/2 sin(δ/2) = κ2 < 0. (4.75) ≤ − p − −  Since (4.75) holds for any ρ>ε, thus it holds for ρ¯˙ if the evader is not captured, i.e.,

ρ<¯˙ κ2. − Proposition 1 states that if ξ ρ tan γ (ρ) , the distance between the pursuer k ek ≤ · δ and the evader decreases on average. Following  the analysis in Section 4.3.3, it is expected that the capture time of the evader has a finite expectation.

Note that the measurement with a bounded disturbance considered in (4.72) is restrictive, and it is unlikely that the pursuer’s measurement has a bounded distur- bance in a real-world problem. However, the assertion in Proposition 1 still has its practical value because it sheds light on the relationship between the measurement accuracy and the capturability as well as the capture range of the pursuer (indicated by ε). The results can be extended to a more general case where the measurement noise is unbounded but with a known probability distribution.

In this section, some fundamental issues of stochastic PE differential games have been discussed. The limited look-ahead approach is extended to stochastic games with both the perfect and the imperfect state information pattern. Most of the study is devoted to the perfect state information case. Problems with imperfect state

115 information needs further study on some fundamental issues. Except for some special

cases, analytical solutions for most problems are still largely unavailable. Bayesian

analysis used in Theorem 16 can be further extended to more general situations, e.g.,

in a situation when (useful) measurements are absent. In such a problem, a PE game

can be used to model a search problem based on the probability maps of the detected

and the hidden evaders [84].

4.6 Simulation Results

In this section, we implement the method based on limited look-ahead for stochas-

tic PE games with an imperfect state information pattern. Consider a cooperative

pursuit problem of multiple evaders in a R2 space. Suppose that there are 3 pursuers and 4 evaders, and the dynamics of the players are given in (4.27). The sum of the capture time of each evader is chosen as the objective. In addition to assumption

6, we also assume that the pursuers can measure their own states perfectly. The measurement equation (for evader j) of the pursuers is

x¯e ξj yj(t)= j + x¯ , (4.76) y¯e ξj  j   y¯  j j where ξx¯ and ξy¯ are independent Gaussian distributed random variables, with zero- mean and a common variance, i.e., ξj ξj 0, σ2 . The necessary parameters x¯ y¯ ∼ N j   of the pursuers and the evaders are given  in Table 4.1. Let the variance of the measurement disturbance σ2 be 0.5 for j = 1, , 4. j ··· Since the dynamics in (4.27) is linear in the states and the disturbances both in the dynamics and in the measurement are Gaussian, the Kalman filter is applicable.

The assumption here is that the evaders’ controls can be measured perfectly by the pursuers. We solve the stochastic game by the limited look-ahead method as in (4.69).

116 Suppose that the current time is t. The cost-to-go function at time t+∆t is similar to

(4.70) but with the state estimate used instead with the principle of certainty equiv-

p alence, i.e., I It+∆t = xˆt+∆t Ip , wherex ˆt+∆t Ip =E xˆt+∆t It . Here, function V V | t | t | V is determinede by the hierarchicale method for a corresponding problem with perfecte state information.

The details of the simulation are introduced as follows. Note that the pursuers and the evaders have independent dynamics; thus, the dynamic equation is separable in the sense of Theorem 9. This property is useful to decouple the decision-making of the pursuers and that of the evaders. Since results from the hierarchical method, V we assume that under an engagement schemee between the pursuers and the evaders, each evader still exploits the optimal strategy that is calculated according to (4.32) in the perfect state information case against its engaged pursuer. The strategy of the evaders is optimal (against the pursuers) if the pursuers’ observation does not depend on the evaders’ control. With the evaders’ strategy obtained, a PE game

Pursuers 1 2 3

x¯p0, y¯p0 (0,-12) (8,8) (-8,8) v (1/sec) 5 5 5 p  2 σp 0.5 0.5 0.5 Evaders 1 2 3 4

x¯e0, y¯e0 (0,6) (-3,-3) (3,-3) (0,0) v (1/sec) 3 2 2 2 e  2 σe 0.5 0.5 0.5 0.5

Table 4.1: The Necessary Parameters of the Pursuers and the Evaders in the Selected Stochastic PE Game Scenario

117 can be transformed into a stochastic optimal control problem with an imperfect state information pattern from the pursuers’ perspective. The approach based on limited look-ahead is implemented in the following simulations.

Three games are simulated, which involve all four evaders and one, two or three pursuers respectively, i.e., the game between all the evaders and pursuer 1, pursuers

(1,2) and pursuers (1, 2, 3). In the simulations, stochastic differential equations are simulated by using the Euler’s method with the time interval ∆t chosen as 0.1. This

∆t is also used as the size of limited look-ahead intervals. Since ∆t is small, the dynamic optimization problem based on limited look-ahead in (4.69) can be approx- imated by a static optimization problem with the controls of the players fixed during

∆t intervals.

The game involving four evaders and all three pursuers is illustrated in Figure 4.5 at various stages. The trajectories of the players are plotted and the circles in the stages 3 and 4 indicate the pursuers’ capture range as well as the capture of the corresponding evaders inside. An interesting phenomenon can be observed from Fig- ure 4.5, as well as from the following figures, that the pursuers move cooperatively at the beginning stages to force the evaders to remain in a certain region. When the pursuers get close enough, they become engaged with specific evaders. This is be- cause that according to the objective in (2.44), all the evaders are equally important.

Thus, the evaders tend to avoid potential capture by each of the pursuers. When the pursuers move closer cooperatively, the evaders wander for a while trying to avoid each pursuer. From this aspect, the method based on limited look-ahead has a ad- vantageous potential for concealing the true intent of the pursuers. Finally, Figure

4.5 only provides a sample run of the stochastic process.

118 Cooperative Pursuit: Stage 1 Cooperative Pursuit: Stage 2

5 5

0 0 Y Y

−5 −5

PusuerPusuer PursuerPursuer −10 EvaderEvader −10 EvaderEvader

−5 0 5 −5 0 5 X X Cooperative Pursuit: Stage 3 Cooperative Pursuit: Stage 4

5 5

0 0 Y Y

−5 −5

PursuerPursuer Pusuer −10 EvaderEvader −10 Evader

−5 0 5 −5 0 5 X X

Figure 4.5: Cooperative Pursuit Trajectories of All 4 Evaders by the 3 Pursuers

For comparison purposes, we simulate similar games with the pursuers 1 and 2 and pursuer 1 respectively. They share the same parameters and the necessary initial conditions with the game involving all three pursuers. The results are shown in

Figure 4.6 and Figure 4.7 respectively.

Figure 4.6 illustrates the trajectories of the cooperative pursuit of the evaders by the pursuers 1 and 2. It is worth noting that pursuer 2 on the right is directly engaged with evader 1 (the fastest) at the top from the beginning, while pursuer 1 twists a little at the beginning stages and then goes after the rest of the evaders

119 Cooperative Pursuit: Stage 1 Cooperative Pursuit: Stage 2 10 10

5 5

0 0

Y −5 Y −5

−10 −10 Pursuer Pursuer Evader −15 −15 Evader

−10 −5 0 5 −10 −5 0 5 X X Cooperative Pursuit: Stage 3 Cooperative Pursuit: Stage 4 10 10

5 5

0 0

Y −5 Y −5

−10 −10 Pursuer Pursuer Evader −15 Evader −15

−10 −5 0 5 −10 −5 0 5 X X

Figure 4.6: Cooperative Pursuit Trajectories of All 4 Evaders by Pursuer 1 and 2

systematically. In Figure 4.7, the cooperative pursuit trajectories between all the evaders and only pursuer 1 are illustrated. It can be seen that pursuer 1 goes after the 4 evaders systematically.

In the simulations above, the Kalman filter is used by the pursuers to estimate the evaders’ states. It requires the perfect knowledge of the evaders’ controls, which may be unrealistic. In the following, we relax this assumption and assume that the evaders’ controls can be measured subject to disturbance. Namely, for each evader j (j = 1, , 4), the pursuers’ measurement of θj (evader j’s control) is based on ··· e

120 Cooperative Pursuit: Stage 1 Cooperative Pursuit: Stage 2 15 10 Pursuer Pursuer Evader 10 Evader 5 5 0

Y Y 0 −5 −5

−10 −10

−15 −15 −6 −4 −2 0 2 4 6 −5 0 5 X X Cooperative Pursuit: Stage 3 Cooperative Pursuit: Stage 4 20 Pursuer Pursuer 15 Evader 20 Evader 10 5 10 Y Y 0 0 −5

−10 −10 −15 −10 −5 0 5 10 −10 −5 0 5 10 15 X X

Figure 4.7: Pursuit Trajectories of All 4 Evaders by Pursuer 1

˜j j θe = θe + ηj, where ηj is a zero-mean Gaussian distributed random variable with the

2 2 variance σ j . We choose σ j = 0.5 for all j = 1, , 4, and the cooperative pursuit θe θe ··· trajectories of all 4 evaders by the 3 pursuers are illustrated in Figure 4.8. With the noise-corrupted measurements of the controls used in the state estimator, it is expected that the pursuers’ performance is degraded.

From these results, one appealing factor of the limited look-ahead method is that it has a potential for concealing the true intent of the pursuers compared to the hierarchical approach. The reason is that the optimization at the upper level based

121 Cooperative Pursuit: Stage 1 Cooperative Pursuit: Stage 2 10 10

5 5

0 0 Y Y

−5 −5 Pursuer Pursuer −10 Evader −10 Evader

−5 0 5 10 −5 0 5 X X Cooperative Pursuit: Stage 3 Cooperative Pursuit: Stage 4

10 5 5 0 0 Y Y

−5 −5 Pursuer Pursuer −10 Evader −10 Evader

−5 0 5 −5 0 5 10 15 X X

Figure 4.8: Cooperative Pursuit Trajectories of the 4 Evaders by the 3 Pursuers with the Imperfect Observation of the Evaders’ Strategies

on (2.44) only provides structured controls to the pursuers, where each pursuer has specific engaged evaders to go after; whereas in the optimization based on limited look-ahead in (4.11) as well as in (4.69), such a control structure is relaxed and the cooperative movement of the pursuers may result in a better outcome if the evaders are not aware of the pursuers’ strategy [85].

122 CHAPTER 5

DISSERTATION SUMMARY AND FUTURE WORK

5.1 Dissertation Summary

The increasing use of autonomous assets and robots in modern military operations has led to renewed interest in the subject of (multi-player) PE games. However, the current theory of differential games is inadequate for solving general multi-player

PE differential games. This is the first attempt to solve general multi-player PE games in continuous time. The purpose of this dissertation is to extend the current differential game theory to the general multi-player PE game case. We approach the problem in an indirect way using an iterative method. The basic idea is to improve a suboptimal solution iteratively based on limited look-ahead, such that an optimal solution is approached in the limit. This approach shares insights with DP-type methods such as the rollout algorithm [54] and the value iteration method [51], which mainly deal with discrete-time problems. However, a multi-player PE differential game is more challenging because it involves multiple players and it is in continuous time. The major contribution of this dissertation is to provide a theoretical foundation for general multi-player PE games under the framework of the current differential

123 game and optimal control theory, and also the solvability and the existence of solutions

of such a problem are examine. Practical scalable algorithms are yet to be developed.

We adopt the concept of nonanticipative strategy and the (upper/lower) Value function due to Varaiya, Roxin and Elliott and Kalton in the game formulation, because this notion of Value can be characterized by a solution of the corresponding

HJI equation in a general sense. Due to the multiplicity of players, the information about the “alive” evaders is naturally modelled by a discrete variable z. In a multi-

player PE game, the major difficulty of the current differential game techniques based

on state rollback lies in the fact that the terminal states of the game cannot be easily

specified. To circumvent this dilemma, we approach the multi-player PE problem

in an indirect manner starting with a suboptimal solution by only considering a

subset of the pursuers’ control inputs, which are called “structured” controls. A

major breakthrough results from the improving property of the suboptimal solution

that is determined by a ST-consistent structured control. The optimization based

on limited look-ahead can be utilized to improve such a suboptimal solution. If it

is applied iteratively, the resulting sequence of functions converges to a true upper

Value function. Furthermore, if the functions in the sequence are continuous, the

convergence can be achieved in a finite number of steps.

With this iterative method, a multi-player PE game reduces to a problem of

finding a valid suboptimal solution with the improving property. Specifically, we

show that the method based on hierarchical decomposition is applicable. With the

hierarchical structure of STC, a valid initial point for the iterative process can be

determined. Furthermore, the performance enhancement based on limited look-ahead

is also applicable to many other complex dynamic optimization problems.

124 Uncertainty is inherent in real-world problems, especially in military applications

when information sources are limited. For this reason, we extend our discussion to the

stochastic PE game case. When uncertainties only appear in the players’ dynamics,

the theoretical results of the iterative method are largely valid for stochastic problems.

Under the frame work of the hierarchical approach, the two-player stochastic PE game

case is extensively examined. In particular, we derive an analytical solution to a two-

player stochastic PE game with the players’ dynamics as the Dubin’s car model, and

also investigate conditions under which the capture time has a finite expectation.

In the case when the states are only measured subject to disturbance, a worst-case

analysis is conducted from the pursuers’ perspective. Particularly, the pursuers’s

measurement is assume to be noise-corrupted while the evaders can still access the

perfect states. A suboptimal approach based on one-step look-ahead is studied. This

method provides a better upper-bound of the performance index compared to the

well-known suboptimal approach—OLFC.

In addition to the numerical justification of the iterative method based on look-

ahead, we examine the theoretical soundness of the method for deterministic PE

games, under the framework of viscosity solution theory for HJ equations. Conditions

on the existence of the Value function of a multi-player PE game are derived in terms

of the players’ dynamics and the objective functional. Besides, continuity of the

suboptimal Value functions are studied.

Due to the close connection to the DP principle, the iterative method suffers from the “curse of dimensionality”. However, numerical methods of DP equations have been extensively studied over the years, and they can benefit the iterative approach to multi-player PE games in many ways. From a practical point of view, the iterative

125 process can stop at any step and provide the best upper-bound of the upper Value function due to the monotonicity of the sequence. Moreover, the suboptimal strategy that results from one-step look-ahead can usually perform well in many practical problems. We have demonstrated the practical usefulness and the advantage of the limited look-ahead through several selected multi-player PE game scenarios.

The fundamental idea behind the iterative approach introduced in this dissertation for multi-player PE games is attractive for general complex systems. When an optimal solution cannot be obtained directly, an alternative approach is usually to search for an approximate solution and the possibility of serial improvements based on that. The improvement can be systematic, as in the iterative method here, or random, as in genetic algorithm24 etc. It is often expected that an exact solution can be approached in the long term.

5.2 Future Work

We suggest several potential research directions both in theory and for practical applications. The first is to generalize the iterative method to the nonzero-sum game case, which can advance the theory of stochastic PE games. The second direction is on efficient algorithms, where an interesting observation of the optimization based on one-step limited look-ahead is discussed. A third is about decentralized approaches, which have better adaptability and robustness under most realistic circumstances.

24A genetic algorithm is a numerical optimization (or search) technique that is to find true or approximate solutions to optimization (or search) problems.

126 5.2.1 The Iterative Method for Nonzero-sum Games

The iterative method studied is for zero-sum game problems. Here, an underlying

assumption is that the information available for the decision-making of each player is

a common knowledge [86], i.e., the information is shared by all the players and the fact

that it is known is also known by all. However, in a general stochastic multi-player

PE game with an imperfect state information pattern, the pursuers and the evaders

actually access different sets of information because either the pursuers or the evaders

have an independent measurement process. Therefore, the PE game is nonzero-sum

p e for the following reason. Let It and It denote respectively the information sets of the

pursuers and of the evaders at time t, and clearly Ip = Ie. Denote by (a, b; x, z) the t 6 t J objective function associated with the corresponding deterministic game under the

controls a and b at the states x and z. Thus, the multi-player PE game problem can

be described as p mina E (a, b; x, z) It J . (5.1) e  maxb E (a, b; x, z) I   J t Since Ip = Ie, the objective of the pursuers and that of the evaders are no longer t 6 t  opposite. Therefore, the game becomes a nonzero-sum game.

To address a stochastic game with an imperfect state information pattern in (5.1),

we need to answer the question whether the iterative method can be extended to the

nonzero-sum game case. In a general nonzero-sum game, there are N (N > 1) players

and each has an individual objective. Nash equilibrium is the solution concept. In

contrast to a zero-sum game that has only one associated Value function, a nonzero-

sum game problem is more complicated, where each player has its own Value evaluated

at Nash equilibrium. Hence, in an iterative method for a nonzero-sum game (if any),

127 there are N different sequences of suboptimal Value functions. Two issues need to be addressed: 1) the monotone property of the sequence in zero-sum game is no longer present, so is the property of convergence; 2) there is no obvious control structure or valid starting point. A relevant study can be found in [87].

5.2.2 On Efficient Algorithms

The iterative method is not scalable and efficient computation algorithms are needed. One possible remedy may resort to the methods in Approximate (or Neuro-)

DP, such as functional approximation, temporal difference or Q-learning and simulation- based optimization etc. [76, 74, 52]. On the other hand, suboptimal approaches may be attractive in practice. In the following, we focus on a suboptimal approach based on one-step look-ahead. This is a brief introduction, and more details can be found in [85].

Consider a deterministic PE game involving N pursuers and M evaders with the objective functional in (2.44) and the players’ dynamics in (2.12). The following discussion is based on a specific z ZM , i.e., z does not change, so that z is suppressed ∈ in the formulations. Let x denote the composite state variable that contains the states of all the pursuers and the evaders. We adopt the definition of strategy given in Chapter 2. Let x be the state x at time τ for t τ t +∆t starting from τ;xt,a,b τ ≤ ≤ x under the controls a( ) and b( ) for some ∆t> 0. Define t · ·

= x for a( ) A(t), b( ) B(t) and t τ t +∆t . Xt τ;xt,a,b · ∈ · ∈ ≤ ≤  Note that the dynamic in (2.12) is bounded, and so is the set . Let E∗(x) be the Xt optimal engagement between the pursuers and the evaders at state x, i.e., E∗(x) is the optimal solution of (2.46). Given a ∆t > 0, suppose that there is no transition

128 of z that occurs during t to t +∆t under any a( ) A(t) and b( ) B(t). Denote · ∈ · ∈ h by V (x)25 the optimal performance index by the hierarchical approach according to E (2.46).e Let V (x; E0) be the performance index under an arbitrary engagement E0. h E Clearly, V (ex)= V (x; E∗(x)).

e e Proposition 2 If E∗(x )= E∗(x) for any x , then t ∈Xt

h t+∆t h 26 V xt = sup inf Mdτ + V xt+∆t;xt,a,β[a] . (5.2) β ∆(t) a( ) A(t) t ∈ · ∈  Z   e  e Proposition 2 is easy to prove [85]. Define the set of all possible engagements over as Xt

E = E∗(x) x . Proposition 2 implies that if E = 1 (the cardinal number), xt | ∈Xt | xt | the optimal hierarchical structured control of the pursuers determined at t is optimal.

In other words, the optimization in (5.2) may result in a better performance (than the h E hierarchical approach) only if there exists x such that V (x) < V (x; E∗(x )). ∈ Xt t

Here, we say that the state xt is in a cooperation region (of thee pursuers)e if Ext > 1. | |

Evader i e 2

2 i vp2 12o i v Pursuer Pursuer i i vp1 1 Evader Evader i e1

Figure 5.1: Improving Strategies of Pursuer i in a “Cooperation Region”

25Note that the argument z is suppressed because no transition of z occurs during t to t + ∆t.

26 M Note that G(x,z,a,b) = j=1 z(τ) = M if no transition of z occurs. P

129 Now, we assume that E = m> 1. Note that not every pursuer has m different | xt | engaged evaders based on the engagements in Ext . Suppose that pursuer i has two

i i possible engaged evaders e1 and e2, as shown in Figure 5.1. Consider that the players’

dynamics are given in (3.8). According to (3.13) in Chapter 3, the optimal strategy of

the pursuer in a two-player PE game is to move towards the evader. Thus, under the

hierarchical structure, pursuer i has two potential orientations at time t, as illustrated

by the speed vector vi and vi in Figure 5.1. It is hard to believe that an optimal p1 p2 solution lies in this set with only two possible alternatives. Nevertheless, in the

optimization over the look-ahead interval ∆t in (5.2), the structure is relaxed such

that all possible controls of pursuer i are considered. Intuitively, a better solution

may exist within the relaxed control set. Let the arc 12 in Figure 5.1 denote the

control space spanned by structured controls vi and vi , where the angle between p1 pb2 the two vectors is no larger than π. If E = 2, either vi or vi may not be optimal | xt | p1 p2 i i and pursuer i should account for both evader e1 and e2 simultaneously when making a decision. With the objective function given in (2.44) and by inspection of Figure 5.1, the following assertion is true.

Claim 3 Any strategy vi of pursuer i that falls outside of the region 12 is dominated by vi or vi with respect to the pursuit of both ei and ei . This is true in a general p1 p2 1 2 b case when pursuer i (i = 1, ,N) has m engaged evaders according to the engagements ·

in Ext .

i i th In (3.8), θp is the control of pursuer i. Let θpk be the k structured control of pursuer

th i i for the k engagement in Ext . By Claim 3, the optimal control θp∗ of pursuer i

from the optimization problem based on one-step look-ahead in (5.2) can be written

130 in the following form.

m m i i i i i θ ∗ = λ θ with λ 0 and λ = 1. (5.3) p k pk k ≥ k k k X=1 X=1 The analysis above on the pursuers is also valid for solving the optimal strategies of

j the evaders, i.e., θe∗ for each evader j can also be written as linear combination of

the structured controls with non-negative real coefficients that sum to 1.

This preliminary investigation reveals the fact that the performance enhancement

by limited look-ahead based on the structured approach essentially results from the

structure relaxation in the optimization over look-ahead intervals. This relaxation

on the control structure enables a better coordination among the pursuers. More

importantly, the linear structure of the improving strategy specified in (5.3) can be

useful to reduce the computation. With proper approximation, the optimization

problem in (5.2) may be formulated as a nonlinear programming problem, such that

extensive numerical methods in the literature [2] are applicable.

5.2.3 On Decentralized Approaches

In this section, we point out the future research on decentralized methods. In

general, a closed-form solution of a multi-player PE game is formidable and it relies

on numerical calculation in most cases. However, most DP-type methods are compu-

tationally intensive, such that they may not be suitable for real-time implementation.

On the other hand, in practical problems like military applications, uncertainty is

usually an important factor, which entails on-line decisions. Moreover, considering

possible failures in communication among the pursuers (evaders), the underlying as-

sumption of the global information sharing among the pursuers (evaders) is invalid.

131 Thus, from a realistic point of view, a computationally efficient and scalable algorithm that is robust against practical uncertainties is in great need.

In contrast to a centralized method, a decentralized approach is more attractive for the decision-making in large-scale systems involving uncertainties. By “divide and conquer”, a decentralized approach has better adaptability, scalability and computa- tional efficiency but with sacrifice on optimality. For the practical issues mentioned above, decentralized methods are certainly an important future research direction for multi-player PE games. Note that the hierarchical approach can be viewed as a preliminary decentralized approach. However, it is simple and not suitable for highly uncertain problems, e.g., some pursuers might be lost (destroyed in the battle-field) or new evaders may emerge before the end of the game.

A more sophisticated decentralized method is needed, where although decisions are made individually by each pursuer (evader) or by a smaller group of the pur- suers (evaders) with local information, the coordination with other pursuers (evaders) should be taken into account. The ideal case is that a global optimum with respect to the original game objective criterion can be preserved by local optimization. From this aspect, team theory can play a role [88]. Another possible direction of central- ized methods is to design a set of local negotiation rules among the pursuers (evaders) such that local decisions may lead to a desirable behavior at the group level. This idea is from the filed of multi-agent systems [89]. With the interconnections between the players, additional cooperation is possible compared to the hierarchical approach.

Furthermore, cooperative game theory may be useful to determine a proper coopera- tion topology (“coalition”) among the pursuers (evaders) [86], with which centralized methods can be implemented on smaller groups. Intuitively, cooperative behavior

132 requires information sharing among the (cooperative) players. Thus, in decentral- ized approaches, the inter-player communication becomes an important issue. The information structure, communication efficiency and robustness are all of great im- portance.

Finally, throughout the study in this dissertation, we have encountered tremen- dous challenges in stochastic games. Not only the calculation becomes more difficult, but also the undertainty raises fundamental problems on the information pattern in control as well as differential game problems. More additional assumptions have to be introduced in order to make the problem well defined and solvable. This type of problems have trapped mathematicians and engineers for decades and this area is still widely open. We wish some day more fundamental machineries are available to make a major breakthrough.

133 APPENDIX A

PROOF OF THEOREM 5

Proof: Suppose that the game starts at time t . For any z ZM , x X, denote 0 ∈ ∈ by Ei the set of evaders that have been engaged with pursuer i under an engagement

E. Let E(x,z) be the optimal engagements at x and z according to (2.46). Denote by

h Ji (x, z; Ei) the performance evaluation of the sub-game between pursuer i and the evaders in Ei with the optimal strategies exploited by pursuer i and the evaders at the lower level. Clearly, J h = 0 if E = ∅. If E = n > 0, let E = j , , j , i i | i| i i { 1 ··· ni } where j = 1, , M. We say E is feasible if J h(x, z; E) < . Under Assumption 3, k ··· ∞ there exists some E that is feasible.

Note that N h h J (x, z; E)= Ji (x, z; Ei). i=1 X Lemma 5 indicates that if J h(x, z; E ) is uniformly continuous in xi and xj (j E ) i i p e ∈ i h under any feasible E, then J h(x, z; E) is uniformly continuous in x and so is V (x, z)

because there are finitely many E′s and e

N h h h V (x, z) = min J (x, z; E) = min Ji (x, z; E) . E E n o n j=1 o e X h In the following, we need to show the uniform continuity of Ji (x,z,Ei) for each E.

Given an arbitrary z ZM , suppose that E = 1, , n for 1 n M. Let ∈ i { ··· i} ≤ i ≤ 134 x , x be arbitrary two point in Ω. We denote by x1( ) and x2( ) be the trajectories 1 2 p · p · i i of pursuer i starting form x1p and x2p respectively under a common strategy that

will be introduced shortly. According to the problem formulated in (2.46), pursuer i

proceeds to the evaders in Ei sequentially, therefore,

ni h Ji (x,z,Ei)= V ij xij[j] . (A.1) j=1 X  h By (A.1), the uniform continuity of Ji (x,z,Ei) depends on that of V ij xij[j] for ev-

ery j = 1, , n . In the following, we shall show the uniform continuity of V  x [j] ··· i ij ij for each j. 

For x Ω (l = 1, 2), denote by x i and x j the state of pursuer i and evader j l ∈ lp le l , i T jT T in Ei associated with xl. Also, define xij xlp , xle . Denote by Jij(ai, bj; xij)

the objective functional of the two-player game between pursuer i and evader j in Ei,

where ai and bj are the corresponding control inputs. Let ε > 0 be arbitrary. First

of all, consider V i1, i.e., the upper Value of the sub-game between pursuer i and the

first evader in Ei. Recall that

2 2 V i1(x ) = sup inf Ji1(ai, b1; x ) . (A.2) i1 i i1 ai( ) A (t0) β1 ∆1(t0) · ∈ ∈  For the given ε> 0, there exists a strategy of evader 1 , i.e., β2 ∆ (t)27, such that 1 ∈ 1

V (x2 ) J (a2, β2; x2 )+ ε for any a2( ) Ai(t), (A.3) i1 i1 ≤ i1 i 1 i1 i · ∈

where a2( ) is the control of pursuer i starting from x . Now, consider the same sub- i · 2 game (between pursuer i and evader 1 in Ei) starting from x1. Choose the strategy of evader 1, β1, such that β1[a ](t)= β2[a2](t) for t t T 1 T 2 and any a ( ) Ai(t ), 1 1 i 1 i 0 ≤ ≤ 1 ∧ 1 i · ∈ 0 27 2 Here, the subscript of β1 stands for the first evader and the superscript stands for the initial point x Ω. 2 ∈

135 1 2 where T1 and T1 are the capture time of evader 1 in Ei associated with the initial states x1 and x2. Under this setting, by (A.2),

1 1 1 V i1(x ) inf Ji1(ai, β ; x ), i1 i 1 i1 ≥ ai( ) A (t0) · ∈ and there exists a1( ) Ai(t ), such that i · ∈ 0

V (x1 ) J (a1, β1; x1 ) ε. (A.4) i1 i1 ≥ i1 i 1 i1 −

Since (A.3) holds for any a2 Ai(t ), it is true when a2(t)= a1(t) for t t T 1 T 2. i ∈ 0 i i 0 ≤ ≤ 1 ∧ 1

Next, we prove by cases according to the capture time of the first evader in Ei

1 2 associated with x1 and x2, i.e., T1 and T1 .

Case 1(T 1 T 2): Since the dynamics of the players is Lipschitz according to 1 ≤ 1

(2.4) and Ω is bounded, there exists a constant C1 > 0 such that the trajectories

1 1 2 2 starting from x1 and x2 under strategies (ai , β1 ) and (ai , β1 ) satisfies

x1 (T 1) x2 (T 1) C x1 (t ) x2 (t ) . (A.5) i1 1 − i1 1 ≤ 1 i1 0 − i1 0

Here, by abuse of notation, we use xl ( ) to denote the trajectory of pursuers i and i1 · l l l evader 1 starting from xi1 for l = 1, 2. Clearly, xi1(t0) = xi1. Because of the strong

capturability, there exists δ > 0 such that if x1 (t ) x2 (t ) δ there exists 1 i1 0 − i1 0 ≤ 1 a2( ) Ai(T 1) such that T 2 T 1 + ε for any β2 ∆ (T 1). It follows by (A.3)-(A.4) i · ∈ 1 1 ≤ 1 1 ∈ 1 1 that

V (x2 ) V (x1 ) T 2 T 1 + 2ε 3ε. (A.6) i1 i1 − i1 i1 ≤ 1 − 1 ≤ Choose a control input a1(t) for T 1 t T 2 T 1 + ε, such that no second evader is i 1 ≤ ≤ 1 ≤ 1 captured along the trajectory x i ( ) during the time from T 1 to T 2. This is possible 1p · 1 1 because the second evader in Ei tries to escape. Under this construction, we want to check the subsequent game (between pursuer i and the second evader in Ei) from

136 2 both x1 and x2 at the same time T1 . Due to the boundedness of the dynamics and

by (A.5), regardless of the control of evader 2 in Ei (will be examined shortly), there

exists a constant δ1′ > 0 such that,

V (x1 (T 1)) V (x1 (T 2)) ε (A.7) | i2 i2 1 − i2 i2 1 | ≤

1 2 for any x (t ) x (t ) δ′ . Finally, by the boundedness of the dynamics in (2.4), i1 0 − i1 0 ≤ 1

x1(T 2) x2(T 2) C xi xi + C (T 2 T 1) p 1 − p 1 ≤ 2 1p − 2p 3 1 − 1

C xi xi + C ε for some C ,C > 0. (A.8) ≤ 2 1p − 2p 3 2 3

Therefore, x1(T 2) x2(T 2) can also be suppressed by the initial condition of the sub- p 1 − p 1

game between pursuer i and the evaders in Ei. It provides some sense of continuity.

1 2 Case 2(T1 > T1 ): By (A.3)-(A.4),

V (x2 ) V (x1 ) T 2 T 1 + 2ε 2ε. (A.9) i1 i1 − i1 i1 ≤ 1 − 1 ≤

By similar arguments as in Case 1, i.e., due to the boundedness and the Lipschtiz condition of the players’ dynamics,

x1 (T 2) x2 (T 2) C x1 (t ) x2 (t ) for some C > 0. (A.10) i1 1 − i1 1 ≤ 4 i1 0 − i1 0 4

Equation (A.10) is the counterpart of (A.5). Again, by the strong capturability, there exists δ > 0 such that for any x1 x2 δ , T 1 T 2 + ε. Choose a2(t) for 2 k i1 − i1k ≤ 2 1 ≤ 1 i T 2 t T 1, such that no second evader is captured. Similar to the inequality (A.8) 1 ≤ ≤ 1 in Case 1,

x1(T 1) x2(T 1) C xi xi + C (T 1 T 2) p 1 − p 1 ≤ 4 1p − 2p 5 1 − 1

C xi xi + C ε for some C ,C > 0. (A.11) ≤ 4 1p − 2p 5 4 5

137 Now, consider the second evader (2) in E . Denote by T = T 1 T 2 , max T 1, T 2 . i 1 1 ∨ 1 { 1 1 } Choose the strategy of evader 2 associated with x , β2 ∆ (t ), such that 2 2 ∈ 2 0

2 2 2 2 2 2 2 2 2 sup V i2 xi2 T1; xi2(t0),ai , β[ai ] V i2 xi2 T1; xi2(t0),ai , β2 [ai ] + ε. (A.12) β ∆2(t0) ≤ ∈ 2 2 2 2  2 2  Here, xi2 T1; xi2(t0),ai , β[ai ] is the state xi2 at T1 starting from xi2(t0) under the

controls a2 and β[a2]. Let β1[a1](t) = β2[a2](t) for 0 t T and any a Ai(t ). i i 2 i 2 i ≤ ≤ 1 i ∈ 0 Then,

1 1 1 1 1 1 1 1 1 sup V i2 xi2 T1; xi2(t0),ai , β[ai ] V i2 xi2 T1; xi2(t0),ai , β2 [ai ] . (A.13) β ∆2(t0) ≥ ∈   By the Lipschitz condition of the dynamics, similarly,

x 1(T ) x 2 T ) C x 1(t ) x 2(t ) for some C > 0. (A.14) e2 1 − e2 1 k ≤ 6 e2 0 − e2 0 6 Considering (A.5) and (A.8) in Case 1 as well as (A.10) and (A.11) in Case 2, the subsequent game between pursuer i and evader 2 in Ei at time T1 is equivalent to the

l game between pursuer i and evader 1 and its initial state xi2(T1) (l = 1, 2) at time

T1 can be suppressed by the initial state x1 and x2. Since same conclusions as in

(A.6) and (A.7) in Case 1 and (A.9) in Case 2 can be drawn for the game between pursuer i and evader 2 in Ei.

By mathematical induction, the same conclusions can be drawn for the rest of the

h ni evaders in Ei. Note that Ji (x,z,Ei)= j=1 V ij(xij) in (A.1) and that (A.5), (A.8) P 1 2 1 2 in Case 1 and (A.10), (A.11) in Case 2 hold for any selected ai , ai , β1 and β1 . By inspection of (A.12) and (A.13), it can be shown that for a given ε> 0, there exists

δ > 0 such that

J h(x ,z,E ) J h(x ,z,E ) ε i 2 i − i 1 i ≤ when x x δ for some δ > 0. Since x and x are arbitrary, it is true k 1 − 2k ≤ 1 2 that J h(x ,z,E ) J h(x ,z,E ) ε when x x δ¯ for some δ¯ > 0. Let i 1 i − i 2 i ≤ k 1 − 2k ≤ 138 h h δ∗ = min δ, δ¯ . Then, J (x ,z,E ) J (x ,z,E ) ε for x x δ. Thus, { } k i 1 i − i 2 i k ≤ 1 − 2 ≤ h Ji (x, z; E) is continuous in x with any feasible E. Note that δ∗ does not depend on x, such that the continuity is unform with respect to x. h Using the arguments similar to those in Lemma 5, it can be shown that V (x, z) h is uniformly continuous. The boundedness of V is because Ω is bounded. e

e Remark 14 In the proof of Theorem 5, the argument in (A.10) is important because

the arguments are constructed based on (A.3) and (A.4) and in (A.4) the “optimal”

1 2 control input of pursuer i is considered. However, during the time from T1 to T1 , the

1 control ai is chosen arbitrarily. Equation (A.10) shows that the sacrifice of Value due

to a1 can be suppressed by the initial state, i.e., x x . i k 1 − 2k

139 APPENDIX B

PROOF OF THEOREM 6

Proof: Prove by induction. Assume that Wn (n Z 0) is uniformly continuous in ∈ ≥ x, we want to prove the uniform continuity of Wn+1. By Lemma 6, it suffices to show

c the uniform continuity on Λz.

Consider an arbitrary z ZM (z = 0). We first prove the continuity of W on ∈ 6 n+1 Λc. Choose any x , x Λc, which is equivalent to z 0 1 ∈ z

z = zx0 = zx1 . (B.1)

Refer to (2.26) on page 29 for the definition of zx. Without loss of generality, we assume that

W (x , z) W (x , z). (B.2) n+1 0 ≤ n+1 1 Given any ε > 0, there exists a strategy β ( ) ∆(t), such that the following holds 1 · ∈ for any a ( ) A(t). 1 · ∈

Wn+1(x1, z)= H Wn (x1, z) t t +∆   G xτ , zτ ,a , β [a ]τ dτ + Wn x , z + ε (B.3) ≤ 1τ 1 1 t+∆t;x1,a1,β1[a1] t+∆t;z,x Zt   

140 Here, we use the notation β1 (or β0) to stand for the strategy of the evaders associated

with the initial state x1 (or x0), and similarly use a1 (or a0) for the pursuers. Select

β ( ) as 0 ·

β [a ](τ)= β [a ](τ) for any t τ t +∆t and a ,a A(t) (B.4) 0 0 1 1 ≤ ≤ 0 1 ∈ and specifically, β [a ] (τ)= β [a ] (τ) (control of the jth evader) for j I I , 0 0 j 1 1 j ∈ z0(τ)∪ z1(τ)

where z0(τ) or z1(τ) is the state z at time τ associated with trajectory of x with the

initial point x0 or x1 under (a0, β0) or (a1, β1). By (B.4), clearly,

t+∆t

Wn+1(x0, z) inf G(xτ , zτ ,aτ , β0[a]τ )dτ + Wn(xt+∆t;x ,a,β [a], zt+∆t;z,x) . ≥ a( ) A(t) 0 0 · ∈ (Zt ) Further, there exists a ( ) A(t), such that 0 · ∈ t+∆t Wn (x , z) G(xτ , zτ ,aτ , β [a]τ )dτ + Wn(x , z ) ε. (B.5) +1 0 ≥ 0 t+∆t;x0,a,β0[a] t+∆t;z,x − Zt z 1 Let the time when the transition of z along z0(τ) (or z1(τ)) occurs be T0| |− (or

z 1 z 1 z 1 z 1 T | |− ). Define T | |− = T | |− T | |− . Equation (B.4) completely defines strategy β 1 0 ∧ 1 0 z 1 τ τ for each evader j I for t τ T | |− . Denote by x ( ) (resp. x ( )) the players’ ∈ z0 ≤ ≤ 1 · 0 · trajectories under (α1, β1) (resp. (α0, β0)) starting from x1 (resp. x0). Clearly,

τ x1(τ)= x(τ; x1,a1, β1[a1]), τ (B.6) x0(τ)= x(τ; x0,a0, β0[a0])

z 1 for t τ T | |− . We prove by cases as follows. ≤ ≤ z 1 Case 1) Suppose that t +∆t T | |− , i.e., there is no transition of z occurring ≤ from t to t +∆t along both z (τ) and z (τ). Since (B.3) holds for any a A(t), it 0 1 1 ∈

141 is true if we choose a1 = a0. By (B.3)-(B.5),

W (x , z) W (x , z) n+1 1 − n+1 0 t+∆t G(xτ , zτ ,a1τ , β1[a1]τ ) G(xτ , zτ ,a0τ , β0[a0]τ ) dτ ≤ t − Z h i +W (xτ (t +∆t), z (t +∆t) W (xτ (t +∆t), z (t +∆t) + 2ε. n 1 1 − n 0 0     (B.7)

By the Lipschitz condition of the dynamics f, the boundedness and the Lipschitz condition of the cost rate G, and the uniform continuity of Wn, both terms on the right hand side of (B.7) can be suppressed by x x . Therefore, there exists δ > 0 k 0 − 1k 1 such that

Wn+1(x1, z) Wn+1(x , z) 3ε for x0 x1 δ1. (B.8) − − ≤ k − k ≤

z 1 For the case t+∆t > T | |− , i.e., there is at least one evader that is captured along

τ τ either the trajectory x1 or x0. We prove the case that only one evader is captured from t to t+∆t, and it can be extended to the case when multiple evaders are captured during the time interval.

z 1 z 1 z 1 z 1 Case 2) (t +∆t > T | |− : T1| |− < T0| |− ). Let τ1 = T1| |− . Suppose that evader

τ j that is captured by pursuer i at τ1 along trajectory x1. The following analysis is focused on pursuer i and evader j. By the Lipschitz condition of the players’ dynamics, given any λ> 0, there is δ2 > 0 such that

xτ i (τ ) xτ i (τ ) λ and xτ j(τ ) xτ j(τ ) λ (B.9) 0p 1 − 1p 1 ≤ 0e 1 − 1e 1 ≤

for any x x δ . Consider the two-player PE game between pursuer i and evader k 1− 0k ≤ 2 τ i τ j 28 τ j starting from x0p(τ1) and x0e(τ1) along the trajectory x0. By the assumption of

28 τ i τ j Here, x0 p (x0 e) denotes the state trajectory of pursuer i (evader j) corresponding to the com- τ posite state trajectory of x0 .

142 strong capturability, given any γ for 0 γ t+∆t τ , there exists λ > 0 such that ≤ ≤ − 1 1 there exists aa ˜ ( ) Ai(τ ) such that the capture time of evader j (associated with 0i · ∈ 1 xτ ), T 0 τ + γ for any xτ (τ ) xτ (τ ) λ for any b ( ) Bj(τ ). Furthermore, 0 j ≤ 1 k 1 1 − 0 1 k ≤ 1 j · ∈ 1 by (B.9), there exists a δ > 0 such that for any x x δ , it is true that 3 k 1 − 0k ≤ 3 xτ (τ ) xτ (τ ) λ as well as T 0 τ + γ

of the pursuersa ¯0 as

a¯ (τ)= a (τ), t τ

t+∆t W (x , z) G(x , z , a¯ , β [¯a] )dτ + W xτ (t +∆t), z (t +∆t) C ε n+1 0 ≥ τ τ τ 0 τ n 0 0 − 1 Zt  (B.11)

for some constant C > 0. Now, let a (τ) =a ¯ (τ) for t τ

still holds. Due to (B.4) and by (B.11) and (B.3), there exists δ4 > 0 such that

W (x , z) W (x , z) C ε (B.12) n+1 1 − n+1 0 ≤ 2

for any x x δ and some constant C > 0. Conclusion in (B.12) is drawn k 0 − 1k ≤ 4 2 similarly as in (B.7)-(B.8).

z 1 z 1 z 1 z 1 Case 3) (t +∆t > T | |− : T1| |− > T0| |− ). Similar to Case 2, let τ0 = T0| |− ,

τ and suppose that evader j is captured by pursuer i at τ0 along the trajectory x0. By

the strong capturability, there exists a strategy of pursuer i,a ˜ ( ) Ai(τ ), such that 1i · ∈ 0 T τ + γ for some γ (0 <γ t +∆t τ ) that can be made arbitrarily small by j ≤ 0 ≤ − 0 suppressing x x . In this case, we choose the a found in (B.5), and construct k 0 − 1k 0 143 τ the strategya ¯1 of the pursuers along trajectory x1 as

a¯ (τ)= a (τ), t τ

Note that (B.3) holds for any a1. Use the β1 found in (B.3), and consider (B.4), it can be proved similarly as in Case 2, i.e., there exists δ5 > 0 such that

W (x , z) W (x , z) C ε (B.14) n+1 1 − n+1 0 ≤ 3 for some C > 0 and any x x δ . 3 k 0 − 1k ≤ 5 Considering the (B.2), (B.8), (B.12) and (B.14), we can show the continuity of

c Wn+1 in x on Λz if only one evader is captured during the ∆t interval. Following the idea in the proof of Theorem 5, the conclusion can be extended the case where

c multiple evaders are captured during ∆t. Then, Wn+1 is continuous in x on Λz.

c Next, we examine that if Wn+1 is continuous in x on Λz, i.e., the cases other than

that in (B.1). We first check the case of zx0 = zx1 = 0. Considering that Λc is open 6 z and the equality in (2.57), given x , x Λc, if zx0 = zx1 = 0, the continuity of W 0 1 ∈ z 6 n+1 c c on Λz is equivalent to the continuity on Λzx0 , which has been proved. Furthermore,

the cases where zx0 = zx1 can be transferred into those similar to Case 2 or Case 3 6 (if zx0 zx1 = z 1) with τ or τ = t by checking if W (x , z) W (x , z) or ∧ | |− 0 1 n+1 0 ≤ n+1 1 otherwise. More general cases when zx0 zx1 < z 1 can be shown accordingly in ∧ | |−

the case with capture of multiple evaders during ∆t. Thus, Wn+1(x, z) is continuous

c in x on Λz. By Lemma 6, Wn+1(x, z) is continuous in x on X. Note that the continuity

does not depend on x, so that it is uniform.

Finally, by the hypothesis that W0 = V is uniformly continuous in x on X. By

+ mathematical induction, this is true for alleWk, k Z . ∈

144 BIBLIOGRAPHY

[1] Y.C. Ho, “Differential games, dynamic optimization, and generalized control theory,” Journal of Optimization Theory and Applications, vol. 6, no. 3, pp. 179– 209, 1970.

[2] J. Nocedal and S. Wright, Numerical Optimization. New York: Springer-Verlag, 1999.

[3] P. Walker, “History of game theory: A chronology of game theory.” Webpage: http://www.econ.canterbury.ac.nz/personal pages/paul walker/gt/hist.htm.

[4] J. Neumann and O. Morgenstern, Theory of Games and Economic Behavior. Princeton, NJ: Princeton University Press, 1944.

[5] J. Nash, “Equilibrium points in N-person games,” Proceedings of the National Academy of Sciences of the United States of America, vol. 36, pp. 48–49, 1950.

[6] H. Stackelberg, The theory of the market Economy. London, U.K.: Oxford University Press, 1952. Translated by A.T. Peacock.

[7] T. Ba¸sar and G. Olsder, Dynamic Noncooperative Game Theory. Philadelphia: SIAM, second ed., 1998.

[8] T. Ba¸sar and P. Bernhard, H∞-optimal Control and Related Minimax Design Problems. Boston: Birkh¨auser, 2nd ed., 1995.

[9] M. Bardi and I. Capuzzo-Dolcetta, Optimal Control and Viscosity Solutions of Hamilton-Jacobi-Bellman Equations. Boston: Birkh¨auser, 1997.

[10] R. Isaacs, “Rand reports rm-1391 (30 november 1954), rm-1399 (30 november 1954), rm-1411 (21 december 1954) and rm-1486 (25 march 1955),” tech. rep., all entitled in part Differential Games.

[11] Isaacs, Differential Games: A Mathematical Theory with Applications to Warfare and Pursuit. New York: John Wiley & Sons, Inc., 1965.

145 [12] R. Bellman, Dynamic Programming. Princeton, NJ: Princeton University Press, 1957.

[13] C. Chen and J.B. Cruz, Jr., “Stackelberg solution for two-person games with biased information patterns,” IEEE Transactions on Automatic Control, vol. 17, no. 6, pp. 791–798, 1972.

[14] M. Simaan and J.B. Cruz, Jr., “On the stackelberg strategy in nonzero-sum games,” Journal of Optimization Theory and Applications, vol. 11, no. 5, pp. 533– 555, 1973.

[15] L. Pontryagin, V. Boltyanskii, R. Gamkrelidze, and E. Mishchenko, The Mathe- matical Theory of Optimal Processes. 1961. Translated by K.N. Trirogoff, edited by L.W. Neustadt, Interscience, N.Y., 1962.

[16] R. Bellman and S. Dreyfus, Applied Dynamic Programming. Princeton, NJ: Princeton University Press, 1962.

[17] H. Pesch and R. Bulirsch, “The maximum principle, bellman’s equation, and carath´eodory’s work,” Journal of Optimization Theory and Applications, vol. 80, no. 2, pp. 199–225, 1994.

[18] J. Willems, “1696: The birth of optimal control,” in Proceedings of the 35th IEEE Conference on Decision and Control, (San Diego, CA), pp. 1586–1587, 1996.

[19] J. Yong and X. Zhou, Stochastic Controls: Hamiltonian Systems and HJB Equa- tions. New York: Sringer, 1999.

[20] J. Arthur E. Bryson, “Optimal control—1950 to 1985,” IEEE Control Systems Magazine, vol. 16, no. 3, pp. 26–33, 1996.

[21] N. Wax, ed., Collected Papers on Noise and Stochastic Processes. New York: Dover, 1954.

[22] R. Kalman and R. Bucy, “New results in linear filtering and prediction,” Journal of Basic Engineering (ASME), vol. 83D, pp. 95–108, 1961.

[23] R. Kalman, “Contributions to the theory of optimal control,” Boletin de la So- ciedad Matematica Mexicana, vol. 5, pp. 102–119, 1960.

[24] C. Merrian, Optimization Theory and the Design of Feedback Control Systems. New York: McGraw Hill, 1964.

[25] J. Florentin, “Optimal control of continuous time, markov, stochastic systems,” Journal of Electronics and Control, vol. 10, pp. 473–488, 1961.

146 [26] J. Florentin, “Partial observability and optimal control,” Journal of Electronics and Control, vol. 13, pp. 263–279, 1962.

[27] P. Joseph and J. Tou, “On linear control theory,” Transactions of AIEE, vol. 80, no. 3, pp. 193–196, 1961.

[28] H. Kushner, “Optimal stochastic control,” IRE Transactions on Automatic Con- trol, vol. 7, no. 5, pp. 120–122, 1962.

[29] P. Gunckel and G. Franklin, “A general solution for linear sampled-data control systems,” Transactions of ASME, vol. 85D, pp. 197–203, 1963.

[30] M. Davis and P. Varaiya, “Dynamic programming conditions for partially observ- able stochastic systems,” SIAM Journal on Control and Optimization, vol. 11, pp. 226–261, 1973.

[31] W. Fleming and E. Pardoux, “Optimal control for partially observed diffusions,” SIAM Journal on Control and Optimization, vol. 21, no. 2, pp. 261–285, 1981.

[32] S. Mitter, “Filtering and stochastic control: A historical perspective,” IEEE Control Systems Magazine, vol. 16, no. 3, pp. 67–76, 1996.

[33] W. Fleming, “The convergence problem for differential games,” Journal of Math- ematical Analysis and Applications, vol. 3, pp. 102–116, 1961.

[34] W. Fleming, “The convergence problem for differential games II,” in In Advances in Game Theory (M. Dresher et al., ed.), vol. 52 of Annals of Mathematical Studies, pp. 195–210, Princeton, NJ: Princeton University Press, 1964.

[35] P. Varaiya, “On the existence of solutions to a differential game,” SIAM Journal of Control, vol. 5, pp. 153–162, 1967.

[36] E. Roxin, “Axiomatic approach in differential games,” Journal of Optimization Theory and Applications, vol. 3, no. 3, pp. 153–163, 1969.

[37] R. Elliott and N. Kalton, “The existence of value in differential games,” Memoirs of the American Mathematical Society, vol. 126, 1972.

[38] R. Elliott and N. Kalton, “Cauchy problems for certain Isaacs-Bellman equations and games of survival,” Transactions of the American Mathematical Society, vol. 198, pp. 45–72, 1974.

[39] A. Friedman, Differential Games. New York: Wiley, 1971.

[40] A. Friedman, “Differential games,” vol. 18 of CBMS Regional Conference Series in Mathematics, Providence, R.I.: American Mathematical Society, 1974.

147 [41] N. Krasovskii and A. Subbotin, Positional Differential Games. Moscow: Nauka, 1974. Revised English Edition: Game-theoretical Control Problems, Springer, New York, 1988.

[42] L. Berkovitz, “A thoery of differential games: Theory and applications in worst- case controller design,” in Advances in Game Theory (T. Ba¸sar and A. Haurie, eds.), vol. 52 of Annals of International Society of Dynamic Games, pp. 3–22, Boston, MA: Birkh¨auser, 1994.

[43] M. Crandall and P. Lions, “Viscosity solutions of Hamilton-Jacobi equations,” Transactions of the American Mathematical Society, vol. 277, no. 1, pp. 1–43, 1983.

[44] M. Crandall, L. Evans, and P. Lions, “Some properties of viscosity solutions of Hamilton-Jacobi equations,” Transactions of American Mathematics Society, vol. 282, pp. 487–502, 1984.

[45] P. Lions, Generalized Solutions of Hamilton-Jacobi Equations. Boston, MA: Pitman Advanced Publishing Program, 1982.

[46] R. Jensen, “The maximum principle for viscosity solutions of fully nonlinear second order partial differential equations,” Archive for Rational Mechanics and Analysis, vol. 101, pp. 1–27, 1988.

[47] R. Jensen, P. Lions, and P. Souganidis, “A uniqueness result for viscosity solutions of second-order fully nonlinear partial differential equations,” Proceedings of the American Mathematical Society, vol. 102, pp. 975–978, 1988.

[48] H. Ishii and P. Lions, “Viscosity solutions of fully nonlinear second-order elliptic partial differential equations,” Journal of Differential Equations, vol. 83, pp. 26–78, 1990.

[49] M. Crandall, H. Ishii, and P. Lions, “User’s guide to viscosity solutions of second order partial differential equations,” Bulletin of the American Mathematical Society, vol. 27, no. 1, pp. 1–67, 1992.

[50] W. Fleming and P. Souganidis, “On the existence of value functions of two-player, zero-sum stochastic differential games,” Indiana University Mathematics Journal, vol. 38, no. 2, pp. 293–314, 1989.

[51] R. Howard, Dynamic Programming and Markov Processes. Cambridge, MA: The MIT Press, 1960.

[52] D. Bertsekas and J. Tsitsiklis, Neuro-Dynamic Programming. Belmont, MA: Athena Scientific, 1996.

[53] H. Kushner and P. Dupuis, Numerical Methods for Stochastic Control Problems in Continuous Time. New York: Springer, 2001.

[54] D. Bertsekas, Dynamic Programming and Optimal Control: Vols. 1 and 2. Belmont, MA: Athena Scientific, third ed., 2005.

[55] D. Bertsekas, “Dynamic programming and suboptimal control: A survey from ADP to MPC,” Fundamental Issues in Control, European Journal of Control, vol. 11, no. 4-5, pp. 310–334, 2005.

[56] Y.C. Ho, A.E. Bryson, Jr., and S. Baron, “Differential games and optimal pursuit-evasion strategies,” IEEE Transactions on Automatic Control, vol. 10, no. 4, pp. 385–389, 1965.

[57] W. Willman, “Formal solutions for a class of stochastic pursuit-evasion games,” IEEE Transactions on Automatic Control, vol. 14, no. 5, pp. 504–509, 1969.

[58] V. Turetsky and J. Shinar, “Missile guidance laws based on pursuit-evasion game formulations,” Automatica, vol. 39, no. 4, pp. 607–618, 2003.

[59] R. Vidal, O. Shakernia, H.J. Kim, D.H. Shim, and S. Sastry, “Probabilistic pursuit-evasion games: theory, implementation, and experimental evaluation,” IEEE Transactions on Robotics and Automation, vol. 18, no. 5, pp. 662–669, 2002.

[60] J. Hespanha, H.J. Kim, and S. Sastry, “Multiple-agent probabilistic pursuit-evasion games,” in Proceedings of the 38th IEEE Conference on Decision and Control, (Phoenix, AZ), pp. 2432–2437, 1999.

[61] J. Hespanha, M. Prandini, and S. Sastry, “Probabilistic pursuit-evasion games: A one-step Nash approach,” in Proceedings of the 39th IEEE Conference on Decision and Control, (Sydney, Australia), pp. 2272–2277, 2000.

[62] A. Antoniades, H.J. Kim, and S. Sastry, “Pursuit-evasion strategies for teams of multiple agents with incomplete information,” in Proceedings of the 42nd IEEE Conference on Decision and Control, (Maui, HI), pp. 756–761, 2003.

[63] L. Schenato, S. Oh, and S. Sastry, “Swarm coordination for pursuit evasion games using sensor networks,” in Proceedings of the International Conference on Robotics and Automation, (Barcelona, Spain), pp. 2493–2498, 2005.

[64] M. Foley and W. Schmitendorf, “A class of differential games with two pursuers versus one evader,” IEEE Transactions on Automatic Control, vol. 19, no. 3, pp. 239–243, 1974.

[65] A. Pashkov and S. Terekhov, “A differential game of approach with two pursuers and one evader,” Journal of Optimization Theory and Applications, vol. 55, no. 2, pp. 303–311, 1987.

[66] P. Hagedorn and J. Breakwell, “A differential game with two pursuers and one evader,” Journal of Optimization Theory and Applications, vol. 18, no. 1, pp. 15–29, 1976.

[67] A. Levchenkov and A. Pashkov, “Differential game of optimal approach of two inertial pursuers to a noninertial evader,” Journal of Optimization Theory and Applications, vol. 65, no. 3, pp. 501–518, 1990.

[68] I.M. Mitchell, A.M. Bayen, and C.J. Tomlin, “A time-dependent Hamilton-Jacobi formulation of reachable sets for continuous dynamic games,” IEEE Transactions on Automatic Control, vol. 50, no. 7, pp. 947–957, 2005.

[69] L. Evans and P. Souganidis, “Differential games and representation formulas for solutions of Hamilton-Jacobi-Isaacs equations,” Indiana University Mathematics Journal, vol. 33, no. 5, pp. 773–797, 1984.

[70] D. Li, J.B. Cruz, Jr., G. Chen, C. Kwan, and M.H. Chang, “A hierarchical approach to multi-player pursuit-evasion differential games,” in Proceedings of the Joint 44th IEEE Conference on Decision and Control and European Control Conference (CDC-ECC’05), (Seville, Spain), pp. 5674–5679, December 2005.

[71] T. Schouwenaars, E. Feron, B. De Moor, and J.P. How, “Mixed integer programming for multi-vehicle path planning,” in Proceedings of the European Control Conference, (Porto, Portugal), pp. 2603–2608, September 2001.

[72] A. Richards, J. Bellingham, M. Tillerson, and J.P. How, “Coordination and control of multiple UAVs,” in Proceedings of AIAA Guidance, Navigation, and Control Conference, (Monterey, CA), pp. 5311–5316, 2002.

[73] P. Chandler and M. Pachter, “Hierarchical control for autonomous teams,” in Proceedings of AIAA Guidance, Navigation, and Control Conference, (Montreal, Canada), pp. 632–642, September 2001.

[74] J. Si, A. Barto, W. Powell, and D. Wunsch, Handbook of Learning and Approximate Dynamic Programming. Hoboken, NJ: John Wiley & Sons, Inc., 2004.

[75] A. Barto, S. Bradtke, and S. Singh, “Learning to act using real-time dynamic programming,” Artificial Intelligence, vol. 72, pp. 81–138, 1995.

[76] B. Van Roy, “Neuro-dynamic programming: Overview and recent trends,” in Handbook of Markov Decision Processes: Methods and Applications (E. Feinberg and A. Shwartz, eds.), pp. 431–460, Boston, MA: Kluwer, 2002.

[77] D. Li and J.B. Cruz, Jr., “Better cooperative control with limited look-ahead,” in Proceedings of the American Control Conference, (Minneapolis, MN), pp. 4914–4919, June 2006.

[78] W. Fleming and R. Rishel, Deterministic and Stochastic Optimal Control. New York: Springer-Verlag, 1975.

[79] W. Rudin, Principles of Mathematical Analysis. New York: McGraw-Hill, Inc., 1976.

[80] H. Risken, The Fokker-Planck Equation: Methods of Solution and Applications. Berlin: Springer-Verlag, second ed., 1996.

[81] A. Kian, J.B. Cruz, Jr., and M. Simaan, “Stochastic discrete-time Nash games with constrained state estimators,” Journal of Optimization Theory and Applications, vol. 114, no. 1, pp. 171–188, 2002.

[82] I. Rhodes and D. Luenberger, “Differential games with imperfect state information,” IEEE Transactions on Automatic Control, vol. 14, no. 1, pp. 29–38, 1969.

[83] I. Rhodes and D. Luenberger, “Stochastic differential games with constrained state estimators,” IEEE Transactions on Automatic Control, vol. 14, no. 5, pp. 476–481, 1969.

[84] S. Kanchanavally, R. Ordonez, and J. Layne, “Mobile target tracking by networked uninhabited autonomous vehicles via hospitability maps,” in Proceedings of the American Control Conference, (Boston, MA), pp. 5570–5575, 2004.

[85] D. Li and J.B. Cruz, Jr., “Improvement with look-ahead on cooperative pursuit games,” in Proceedings of the 45th IEEE Conference on Decision and Control, (San Diego, CA), December 2006.

[86] M. Osborne and A. Rubinstein, A Course in Game Theory. Cambridge, MA: The MIT Press, 1994.

[87] J. Hu and M. Wellman, “Nash Q-learning for general-sum stochastic games,” The Journal of Machine Learning Research, vol. 4, pp. 1039–1069, 2003.

[88] Y.C. Ho, “Team decision theory and information structures,” Proceedings of the IEEE, vol. 68, no. 6, pp. 644–654, 1980.

[89] M. Wooldridge, An Introduction to MultiAgent Systems. Chichester, England: John Wiley & Sons, Inc., 2002.
