ON LINEAR CONTROL

OF

DECENTRALIZED STOCHASTIC SYSTEMS

by

Steven Michael Barta

B.S., Yale University 1973

S.M., Massachusetts Institute of Technology

1976

SUBMITTED IN PARTIAL FULFILLMENT OF THE

REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY at the

MASSACHUSETTS INSTITUTE OF TECHNOLOGY July, 1978

Massachusetts Institute of Technology 1978 Signature redacted Signature of Author. Department of Electrical Engineering and omPuerScience, July 5, 1978 Signature redacted Certified by. Thesis Sf ervisor

Accepted by...... Chairman, Departmental Committee on Graduate Students 2

ON LINEAR CONTROL OF DECENTRALIZED STOCHASTIC SYSTEMS

by

Steven Michael Barta

Submitted to the Department of

Electrical Engineering and Computer

Science on July 5, 1978 in partial

fulfillment of the requirements for

the degree of Doctor of Philosophy.

Abstract

A general decision and control model for stochastic linear systems with several decision-makers who act on the basis of different informa- tion is formulated and analyzed. Many important systems, such as those described by stochastic linear differential equations and by stochastic linear differential-delay equations, are special cases of the model. Problems with the classical information pattern or with nonclassical information patterns, such as the no sharing or delayed sharing of obser- vations, can be studied with the model. To develop the existence results and optimality conditions, the controls are restricted to be causal linear functions of the information. Then the different information patterns are defined in terms of restric- tions on the linear operators defining the control laws. Existence of solutions to the feedback equations of the model is proven for linear control laws. For quadratic cost functions, general necessary conditions are presented. These conditions are equivalent to a set of nonlinear integral equations. When control actions propagate no faster through the system than information, the optimality conditions are also sufficient and are equivalent to a set of linear integral equations. The operator approach leads to a straightforward derivation of a general separation theorem for classical information patterns and an interpretation of fixed- structure constraints. For a decentralized version of linear quadratic loss decision theory, a certainty equivalent type separation theorem is proven and an integral equation similar to the Wiener-Hopf equation is derived to characterize the solution. By generalizing the innovations method of classical 3 estimation theory, the integral equation is solved. By specializing the problem to systems described by finite-dimensional state equations, a recursive solution reminiscent of the Kalman-Bucy filter is derived. For problems with delayed sharing of information, it is shown the control may be written in a separated form as the sum of two functions of certain uncontrolled processes. One function is linear in the control operator which operates on the information possessed by all stations (common information), while the other is nonlinear in the operator for information possessed by each station and uncorrelated with the common information. In certain cases, this separation of the control leads to a separation of the optimization problem into a centralized problem based on the common information and a decentralized problem based on the information uncorrelated with the common information.

Thesis Supervisor: Nils R. Sandell, Jr. Title: Associate Professor of System Science and Engineering v

0

-o m z --I Cl) 5

Acknowledgment

It is a pleasure for me to thank Professor Nils Sandell, my thesis

supervisor. Throughout the course of this work, he made many valuable

suggestions.

I am grateful to Professors Michael Athans and Yu-Chi Ho, my thesis readers, for their stimulating comments.

I appreciate the helpful discussions I have had with my fellow graduate students Douglas Looze, Ronald Feigenblatt, Demos Teneketzis,

Robert Washburn, and Joseph Wall.

For typing the thesis, I thank Nancy Ivey; for drawing the figures,

I thank Arthur Giordani. Their work is excellent.

This research was conducted at the M.I.T. Electronic Systems

Laboratory with partial support provided by the National Science

Foundation under grant NSF/ENG-77-19971, by the Office of Naval Research under contract ONR/N00014-76-C-0346, and by the Department of Energy under contract ERDA-E(49-18)-2087. 6

TABLE OF CONTENTS

Chapter Page

I Introduction 8

1.1. Decentralized Mathematical Optimization 8

1.2. Summary of thesis 16

1.3. Contributions of thesis 20

Ii A Stochastic Linear System Model 21

2.1. The Model 21

2.2. Examples 25

2.3. Discrete-time Version of Model 35

III Existence Results and Optimality Conditions 37 for Linear Control Laws

3.1. The Operator Formulation 37

3.2. Existence 40

3.3. Optimization 43

3.4. Partially Nested Systems 56

IV Operator Derivations for some Classical 61 and Nonclassical Models

4.1. Optimization with the 61 Classical Information Pattern

4.2. Decentralized Fixed-Structure 74 Optimization

V Certainty Equivalent Solutions by a 81 Decentralized Innovations Approach

5.1. Separation of Estimation and 81 Decision-making in a Team

5.2. Solution of the Estimation Problem 91 7

Chapter Page

5.3. The Decentralized Kalman-Bucy Problem 98

5.4. The Second-Guessing Paradox 104

VI Delayed Sharing Information Patterns 113

6.1. Separation of the Control Law 113

6.2. Application to Optimization 123

VII Conclusions and Suggestions 129 for Future Research

7.1. Summary and Conclusions 129

7.2. Areas for Future Research 133

Appendix 1 135

Appendix II 140

Appendix III 142

References 144

Biographical Note 149 8

CHAPTER I

Introduction

1.1. Decentralized Mathematical Optimization

Many physical, social, and biological phenomena may be described by mathematical models. One of the principal components of such a model is

a system. A set of equations, such as a set of differential equations,

characterizes the system. Often there is a decision-maker (or a set of decision-makers) associated with the system (see Figure 1.1). The deci- sions are defined in terms of mathematical functions which depend on the decision-maker's information about the system. The objective of the decision-maker is to choose the actions so as to minimize or maximize some mathematical function which depends on the decisions and on certain variables related to the system. When the decisions affect the system,

the decision-making process is called feedback control. In addition, it is often assumed there are disturbances affecting the system; these disturbances are usually modeled as random variables or as stochastic processes.

To see how a typical engineering problem is formulated as an optimi-

zation problem for a mathematical model, consider the control of message flow at one node of a data communication network or the similar situation of control of traffic flow at an entrance ramp to a highway. Suppose each system is described by a state variable. For the node this variable is the number of messages in the queue waiting to enter the network, while for the entrance ramp the state variable is the number of vehicles waiting to enter the highway. It is reasonable to assume the state variables DISTURBANCE SYSTEM DECISION- MAKER IM At IP

L m - m m m - m m m - m m e - m m m m m uJ

Fig. 1.1. System and Decision-maker. 10

satisfy a set of differential equations with stochastic driving terms,

where these driving terms represent random inflows of messages or vehicles.

Furthermore, the effects of the rest of the network on the node are also

modeled by the stochastic process driving terms. Let the control actions

affect the rates at which messages or vehicles leave the queues. Then a

possible performance objective is to keep the flow rates near some pre-

specified nominal values while not allowing the queues to become too

large. To translate this objective into a mathematical performance index,

we form the sum of a nonnegative increasing function of the absolute value

of the state variable and a nonnegative increasing function of the absolute

value of the deviation of the flow rate from the nominal rate. Then the

cost criterion to be minimized is the expected value of the time integral

of the sum. The information available to the controller at any time in-

stant is the past record of the state variable values, the past record of

control inputs, and a priori knowledge of the model and the statistics of

the random disturbances.

The mathematical theory of optimal decision and control is concerned

with determining necessary and sufficient conditions to characterize the

decisions which minimize or maximize a given performance criterion subject

to the constraints imposed by the system structure. By combining ideas

from the fields of probability, stochastic processes, statistics, game

theory, calculus of variations, and mathematical programming, researchers

have developed powerful tools, such as stochastic dynamic programming and

the maximum principle, to analyze certain types of mathematical

-- - F -...- 11 models.1 Although in many cases the solutions given by the theory are

difficult to compute, the theory provides a unifying conceptual basis for

studying mathematical optimization and it provides insight which is valuable in designing suboptimal controllers.

An important theme in optimization theory (and also in other areas

of engineering) is the use of on-line measurements, i.e. observations

taken while the system is operating, in the optimal controller. In deter-

ministic systems, feedback controllers are often used to achieve perfor-

mance objectives. In dynamic random environments an essential element of

optimal decision rules is the dependence of the decisions on on-line mea-

surements. This dependence is important even if the decisions are not

fed back into the system. For example, in estimation of a random process,

which is not being controlled, but is observed in additive noise, it is

clear that good strategies will revise the a priori estimates as new ob-

servations are taken.

In any decision and control strategy for a deterministic feedback

system or for a stochastic system (with or without feedback), it is

necessary to specify for each time instant what information about the

system is available to the decision-maker. It is convenient to divide the

information into two types. A priori information about the structure of

the model, the a priori statistics of the random disturbances, and similar

data is off-line information, i.e. it is known at the initial time before

the system starts operating. On the other hand, sensor measurements,

which depend on the time paths of the random variables describing the

1. There are many references on deterministic and stochastic optimization. See, for example, Aoki [2], Astrom [4], Bellman [10], Dreyfus [19], and Kushner [31].

I-l-, -,. I 12

system, are unknown at the initial time. Most of the powerful results of

stochastic are based on the assumption that there is a

single decision-maker who has access to all off-line and on-line informa-

tion. Furthermore, this decision-maker is able to store all past obser-

vations and all past inputs. (If there are several decision-makers, it

is assumed they have the same information.) This information structure is

called the classical information pattern.

Although dynamic observation structures are often accurately modeled

by classical information patterns, in large scale systems, e.g. data com- munication and power networks, command and control systems, economic sys-

tems, and ecological systems, decision-making is usually decentralized

among several agents, each of whom acts on the basis of different informa-

tion. Such decentralized information is called a nonclassical information pattern.2 In Figure 1.2, for example, suppose both decision-makers have

the same off-line, i.e. a priori, information. Because decision-maker

1 observes output processes 1 and 2, while decision-maker 2 observes out-

put processes 2 and 3, the information pattern in nonclassical. If the network example discussed previously in this section is extended to

explicitly include several nodes, then the flow control problem is a

stochastic control prohlem with classical information -pattern when there

is complete exchange between the nodes with no delays of all off-line and on-line measurements. A more realistic model would involve delayed ex- change of on-line information such as the instantaneous values of state variables. Here the flow control problem is a stochastic control problem

2. For a good survey of decentralized control, see Sandell et al. [46]. This paper is in the Special Issue on Large-scale Systems and Decen- tralized Control of the IEEE Transactions on Automatic Control, which contains many other relevant papers. r m -mmmmmmmmmmmmmmm-mmmmmmmmmmmmmmmm I I I I I -A-- I OUTPUT1 w a Im *1 DECISION - MAKER 1

am I I

DISTURBANCE, OUTPUT SYSTEM 2

or MMN 2 OUTPUT3 DECISION-MAKER

I a I I La m m m mi

Fig. 1.2. Decentralized Decision-making. 14 with a nonclassical information pattern. The reasons for designing decentralized control systems include, among other things, lack of central- ized computing capability, limits on the rates of communicating informa- tion, and the economic tradeoffs between the costs of communication links and the costs of distributed computation and control.

When there is a single optimization objective for the decision-makers, 3 they are called a team.3 Marschak and Radner [39,41] pioneered the study of static teams, i.e. situations where there is no time evolution of the system.4 In several seminal papers, Witsenhausen [54,55,56,57] investi- gated dynamic team problems. One of the most important results of these studies is an example of a dynamic linear-quadratic-Gaussian (LQG) team problem with feedback for which the separation theorem does not hold and for which nonlinear solutions outperform the best linear solution [54].

The explanation of why this nonlinearity occurs provides much insight into the interactions between information and control in decentralized systems

The key idea is the interpretation that each decision-maker's input serves two purposes. While each input has the usual funcyion of direct control, when there are differences in information, the optimal decisions are also being used to communicate information between the controllers through the system dynamics. This communication role is called signaling.

Although it is not possible to separate out the control and communication parts of the decision rules, evidence from the study by Aoki [3] and by

Sandell and Athans [45] of the control sharing information pattern and from

3. This is in contrast to game situations, where there are conflicting objectives. 4. See Beckman [9], Groves and Radner [21], Kriebel [29], and McGuire [40] for some applications of team theory. 15 other problems indicates that the complex nonlinear control strategies arise from the communication aspects of the problem. Sandell [44],

Bismut [12], and Ho, et al. [23] also consider the signaling phenomenon.

For a certain class of discrete-time decentralized systems with linear dynamics, quadratic performance criteria, and Gaussian disturbances,

Ho and Chu [22] prove the optimal solution is a linear function of the information. These decentralized LQG systems have the property that if controller i's actions influence the information of controller j, then j has at least the same information as i. The information structure of any decentralized system with this property is called a partially nested information pattern. In partially nested systems there seems to be no need for signaling strategies. Hence, it is not surprising that the partially nested LQG problem has a linear optimal solution. The discrete- time one-step delay problem solved by Sandell and Athans [45], Kurtaran and Sivan [30], Toda [48], and Yoshikawa [60] is an example of a system with a partially nested information pattern.5

5. Segall [41] discusses a continuous-time analogue of the one-step delay problem. 16

1.2. Summary of Thesis

A striking characteristic of current decentralized stochastic control theory is the diversity of different models and different approaches to optimization. For example, there are many types of information pattern studied. These range from no sharing of information between agents

[ 8,14,22,41,52,55,57] to delayed sharing [30,45,47,48,51,54,60] to the classical case of complete sharing [2,4,5,10,19,31]. In some cases only system observations are communicated, while other problems involve sharing of the control values. For a given information pattern the class of admissible control laws that have been considered covers the spectrum from almost arbitrary nonlinear functions [55,57] of the information to fixed structure models [14,36,37] where the gains of a linear compensator are determined. There are, however, very few general models encompassing the more specific results.6

The model analyzed in this thesis provides a unified framework for studying linear control of linear stochastic systems with classical and nonclassical information patterns and quadratic performance criteria.

In Chapter II, we formulate an input-output model of a linear system with stochastic disturbances. A similar model is considered by Lindquist

[34,35]. Then by coupling linear systems together, we obtain the inter- connected linear system model which forms the basis for all subsequent results in the thesis. To illustrate the generality of the model, we discuss several examples of systems with classical and nonclassical information patterns. 6. For general discrete-time problems, see Witsenhausen [57]. Sandell [44] considers a general finite-state, finite-memory model. 17

In Chapter III the relationships between the inputs and outputs are

described in terms of linear operators on a vector space. When the

controls are restricted to be linear functions of the observation pro-

cesses, the control laws are characterized by linear operators. The diff-

erent information patterns are then interpreted in terms of restrictions

on these operators. This operator approach leads to an elegant theorem

on the existence of solutions to the feedback equations of the system.

By using a generalized differential, we obtain a general necessary condi-

tion characterizing the optimal linear control law for systems with

quadratic criteria and general information patterns. This condition is

equivalent to a set of nonlinear integral equations. When the information

pattern is partially nested, the necessary condition is also sufficient.

Moreover, in this case the optimality condition is equivalent to a set of

coupled linear integral equations.

Several applications of the operator approach are considered in

Chapter IV. The classical separation theorems [5,7,31,34,59] for systems

described by differential equations and differential-delay equations are

derived. For decentralized cases where the controllers are constrained

to be finite-dimensional compensators, we show these restrictions are

equivalent to restrictions on certain impulse response matrices.

In the next two chapters we discuss some nonclassical situations where the overall optimization problem satisfies separation properties.

The optimal solution for a linear-quadratic-Gaussion (LQG) system with a

classical information pattern is a well-known example of a separated

control law. Separation principles are important because they yield 18

structural insight which is valuable in designing either optimal or sub-

optimal controllers. In particular, estimation and decision schemes might

be designed independently and then combined to form an overall control

law. Moreover, if the estimation problem is solved by a finite-dimensional

sufficient statistic which may be updated recursively, then large scale data reduction is possible. The Kalman filter estimate, for example, is a

sufficient statistic for the LQG problem with classical information pattern.

Thus, although the optimal control is a function of all past data, the

information necessary for control is summarized in the estimate.

A certainty equivalent type separation theorem is proven in Chapter

V for a decentralized version of linear quadratic loss decision theory.

To characterize the solution, we derive an equation similar to the

Wiener-Hopf equation. By generalizing the innovations method of Kailath

[24,26], this integral equation is solved. By specializing the problem

to systems described by finite-dimensional state equations, a recursive

solution reminiscent of the Kalman-Bucy filter equations is derived.

Finally a second-guessing paradox is resolved.

For problems with delayed sharing of information, we show in Chapter

VI that the controls may be written as the sum of two functions of uncon-

trolled stochastic processes. One function is linear in the control oper-

ator which operates on the information possessed by all agents (common in-

formation), while the other is nonlinear in the control operator for in-

formation possessed by each agent and uncorrelated with the common

7. See Fel'dbaum [20] for a pioneering treatment of estimation and control in dynamic optimization. For more recent results on separation, see Bar-Shalom and Tse [7] and Wonham [59]. 19 information. In certain cases this separation in the control laws leads to a separation of the optimization problem into two problems, a classical problem based on the common information and a decentralized problem in- volving the information uncorrelated with the common information. We also discuss a result of Varaiya and Walrand [51].

Chapter VII contains conclusions and suggestions for future research.

1 ~ 20

1.3. Contributions of Thesis

1. Formulation of a linear operator model for control of interconnected

stochastic linear systems with classical and nonclassical information

patterns.

2. Existence proof for the feedback equations of the model for general

information patterns.

3. Derivation of necessary optimality conditions and transformation of

the conditions into nonlinear integral equations.

4. Characterization of optimality for systems with partially nested

information patterns in terms of linear integral equations.

5. Solution of a decentralized Wiener-Hopf equation by a generalized

innovations approach and presentation of finite-dimensional sufficient

statistics for a team analogue of the Kalman-Bucy problem.

6. Development of separation results for delayed sharing information

patterns. 21

CHAPTER II

A Stochastic Linear System Model

In this chapter we formulate and discuss a control model for inter-

connected stochastic linear systems. This model forms the basis for all

the results of the thesis. Several examples are presented to illustrate

the generality of the model.

2.1. The Model

Many linear systems are characterized by an input-output description of the form

0t y(t) = y (t) + f N(t,s) u(s) ds, tE[O,T], (2.1) 0 where y(-) is the p-dimensional output function, u(-) is the m-dimension-

al input function, and y0 (-) is a disturbance function. We will model

y0(-) - Ey0(-), where "E" is the expectation operator, as either a mean-

square continuous process or as the sum of a white-noise process and a mean-square continuous process. We assume N(t,s) is either an L2 matrix

kernel, i.e.

t t tr f f N(t,s) N'(t,s) ds dt < m, (2.2) 0 0

1. A zero-mean stochastic process is mean-square continuous if and only if its covariance R(t,s) is continuous for all (t,s)S[0,Tx[O,T]. See Wong [58, Ch. 2]. By recognizing that diffusion processes and processes described by Poisson-driven stochastic differential equa- tions are mean-square continuous, we see that mean-square continuity is not a very restrictive assumption. 22

where tr is the trace operator, or the sum of an L2 kernel and a function

N (t) 6(t - s), where N1 (t) is a continuous matrix function and 6(t - s)

is the Dirac delta function. All admissible controls must satisfy

T f EI|u(t)11 2 dt < c. (2.3) 0

To define an- interconnected stochastic linear system, consider the

set of observations

0M t y.(t) = y.(t) + E f N..(ts) u.(s) ds, i=l 0 (2.4)

te[0,TJ, i = 1, ... , M,

where y0(-) and u.() are p -dimensional and m -dimensional, respectively, 1 1 stochastic processes satisfying the assumptions associated with (2.1).

Thus, each observation process is the output of a linear system driven by

M input processes. Furthermore, assume there is a one-to-one correspond-

ence between the inputs and the outputs. That is, we interpret the over- all system as consisting of M stations; at station i measurements y(-) 21 and control actions u.(-) are taken.2 1 The objective of the control actions is to minimize the quadratic functional

T M J = E [x'(t)Q(t)x(t) + E u!(t)R.(t)u.(t)] dt, (2.5) 0 i=1 1 1 1 2. This assumption is not very strong and is chosen mainly for conven- ience in relating the model to physical situations.

- -"T17 23 where Q(t) 0, R.(t) > 0, and x(-)is an n-dimensional state process de- fined by

0 M t x(t) = x (t) + E f K. (t ,s)u. (s)ds. (2.6) i=1 0

The mean-square continuous process x ( ) is not'affected by the controls and K.(t,s) is either an L2 kernel or a Dirac delta kernel or the sum of such functions.3 Since there is a single system objective, we have a team problem and we will refer to each station as an agent of the team.

To determine the admissible controls, we must specify the informa- tion pattern of the system, i.e. upon what data is the control input u (-) at stationri allowed to depend. We also specify the class of functions of the information which may be used to form the control values at each instant of time. Suppose agent i's information at time t consists of all off-line information, such as the a priori statistics of the un- 0 0 controlled processes x () and y.('), I = 1,...,M, and the set of obser- 1 vations

g.(t) = {y.(s.), s.c[0, t - T..], j = 1, ... , M}, (2.7)

where T . is the delay in transmitting data from station j to station i, and if t - T.. < 0, then i has not received any on-line data from j. 1J This model of the information sets is quite general and includes the

3. A similar model is considered by Lindquist [34] for one station, i.e. a classical information pattern. 24 classical information pattern and several nonclassical patterns.

If T.. = 0 for all i and j, for example, there is complete sharing 'J (classical case), while if T.. > T for all i and j, there is no sharing 13 of information (a nonclassical case).

The set of admissible control laws, denoted by U =-y(0()) i = 1, ... , M}, is defined as the class of functions measurable with res- pect to the a-fields generated by the information sets and for which 4 there exist solutions to (2.4). Thus, the control value at time t, u.(t), is given by u.(t) = y.(2K(t),t). Note the controls are always causally dependent on the observation processes.

4. Characterizing this general class of functions is a difficult problem which has not been resolved even for classical information patterns. See Bene [11], Davis and Varaiya [16], Lindquist [35], Willems [53], and Witsenhausen [55] for further discussion.

"-=F-"'------r-"' -- 25

2.2. Examples

In this section we present several examples to show how some impor-

tant control models may be written in the form (2.4), (2.5), (2.6), and

(2.7).

The first example is the linear-quadratic problem with white-noise driving disturbances and classical information pattern. We are given an n-dimensional linear stochastic differential equation

x(t) = A(t) x(t) + B(t) u(t) + E(t), x(O) = x0, te[O,T], (2.8)

where u(-) is an m-dimensional control process, and C(-) is a white-noise process. The control must be selected as a causal function of the p-dimensional process

y(t) = C(t) x(t) + 0(t), (2.9)

where 0(-) is a white-noise process. The objective is to minimize

T J = E f x'(t) Q(t) x(t) + u'(t) R(t) u(t) dt. (2.10) 0

To write this model in the integral equation form, we define the processes

X (t) = A(t) x(t) + t(t), x (0) = x (2.11) 26

and

xU (t)=xA(t) (t) + B(t) u(t),.xu(0) =0 (2.12)

Then we have x(t) = x0 (t) + xu (t), which implies

0 x(t) = x (t) + f C(t,s) B(s) u(s) ds, 0 (2.13)

K(t,s) 2 (t,s) B(s),

where P(t,T) is the transition matrix of A(t). The observation process y(t) is

0 t y(t) = C(t) x (t) + 0(t) + f G(t) t(t,s) B(s) u(s) ds . (2.14) 0

If we define

0 0 y (t) C(t) x (t) + 0(t)

N(t,s) H C(t) D(t,s) B(s), then

y(t) = y(t) + fN(t,s) u(s) ds. (2.15) 0

-'N, 27

Remark 2.1. When there is no observation noise, i.e. O(E) E 0, it

is sometimes convenient to model the observations as

y(t) = x(t) = A(t) x(t) + B(t) u(t) + E(t)

and 0 -0 0 y (t) = x (t) = A(t) x (t) + (t).

Notice that with these observations, any control of the form

u(t) = G(t) x(t), with G(t) square-integrable, may be written as an L t integral u(t) = G(t) f y(s)ds. 0 In this model, there is a single station and hence the information pattern is simply

F(t) = {y(s), s [0,t]} . (2.16)

To generalize the previous example to the case of two stations with

different information, let the state equation be

x(t) = A(t) x(t) + BI(t) u1(t) + B2 (t) u2(t) + t(t)

(2.17)

x(0) = x0 , tE[0,T],

where u(-) (i = 1,2) is the control process and (-) is white-noise.

Suppose there are two sets of observations

y.(t) = C.(t) x(t) + 6.(t), i = 1,2, (2.18) S11 28 where {E. ()}. is a set of white-noise processes which may be corre- 1 iL- lated. The optimization criterion is to minimize

T 2 J = E f x'(t) Q(t) x(t) + Z u!(t) R.(t) u.(t) dt, (2.19) 0 i=1' where Q(t) 0, R.(t) > 0, and u.(') must be a causal function of y.()

As in the first example, we may write

t 0 x(t) = x )0+ f $(t,s)B1 (s)u1 (s)ds + f 4(t,s)B2(s)u2(s)ds, (2.20) 0 0

0 with x (t) defined by (2.11). Also.,

0 2 t f N.. (t,s) u. (s) ds, (2.21) y (t) = y.(t) + E J i=1 0 1 where 0 0 y.(t) (t)t) x (t) + 6.(t) i

N..(t,s) C.(t) 4(t,s) B.(s).

The information pattern of agent i is

)={y(s), ss[0,t]}, i = 1,2. (2.22)

As a variant of this problem, we may have an information pattern which

allows for sharing of measurements. For example, consider the

T"T 29 information sets

( 1)S= {y1(s),sIs[O,t],y2(s2),s26[0,t - T12 (2.23)

- T21}(2.24) 2(t) = {ys2( ,s2 E[O,t],y1 (s1 ),s1 s0,t

In this system, station 1 transmits its measurements to station 2 with a time delay of T2 1 , while station 2 transmits its measurements to station

1 with a T1 2 delay. If T21=12 0, then we still have two linear systems coupled together, but the information pattern is classical.

Although the information structure (2.7) implies measurements are communicated noiselessly between stations, communication over noisy channels can also be handled in our model. Assume, for example, that agent i instantaneously sends his measurements to agent j, but the commu- ication channel adds white noise 0.. (t). Then we define a new set of 3 measurements

y(t) = Cn(t)x(t) + n(t),i = 1,2, (2.25) where CnC1t) - [C(t)

CC2 (t)

On t)1 2(t) + a021(t)

n [ (t) + 012(ti 02(t) B12 0 2L 2(t) J 30

information pattern be (2.22) with y.(-) replaced by y.(-). Let the 1

Thus, we have modeled noisy communication channels as an information pattern with no sharing.

The next example is a nonclassical generalization of quadratic loss decision theory. Suppose the team objective is to minimize the loss function

T J = E ! [(u(t) - S(t)z(t))Q(u(t) - S(t)z(t))] dt, (2.26) 0 where Q = Q' > 0, {z(s),ss[0,T]} is an uncontrolled n-dimensional state process, S(t) is an M x n time-varying matrix, and {u(s),ss[O,T]} is an

M-dimensional decision process whose components u (-) are restricted to be causal functions of an uncontrolled p.-dimensional observation process 0I {y (s) ,ss[Q,TJ}.

In this model, N..i(t,s) 2 0 for all t, s, i = 1, ... , M, and therefore 0 Y 1(t) = y.(t), i = 1, ... , M . (2.27)

For the state process x(t) let

x (t) 2 - S(t)z(t) (2.28)

K.(t,s) B e.6(t - s), i = 1,...,N , (2.29) where e, is a column vector with a 1 in element i and 0 in all other elements and 6(t - s) is the Dirac delta function. In the cost (2.5), set

R. (t) E 0. The information pattern is

J.(t) = {y.(s),O s S t} . (2.30)

We do not exclude the case where y. E y. for some ifj. This may be SJ

------7. 31 interpreted either as noiseless, instantaneous sharing of information or as a problem with fewer stations, but with vector decisions made at some stations.

The last model we discuss is a network of interconnected linear systems where both the measurements obtained at a node and the actions taken at the node propagate to other nodes with some delay. Consider the network shown in Figure 2.1. When there is no single link connecting a

Fig. 2.1. A network.

pair of nodes, e.g. nodes i and k, then we assume the events at these nodes influence each other only indirectly, i.e. the coupling terms in the state dynamics between the state variables corresponding to these nodes will be zero.

The state equations for an M node network are 32

x.(t) = A..(t)x.(t) + E A. .(t)x.(t - T..) 11 1 13 3 13

+ B.(t)u.(t) + C4.(t), tE[0,T] 1 1 1

x.(t) = X.(t), t < 0, i = 1,...,M , (2.31)

where the A. .(t) and B.(t) are bounded matrices for all i and j, 13) {X (t)} is a set of random variables with known joint distribution for all t, and { .(t)}. is a set of white-noise processes which may be corre- 1 1 lated. Note that if A.. E 0, then by our convention there is no link connecting node i to node j. Assume at each node the measurements are

y(t) = C. (t)x.(t) + 0.(t),i l,...jm , (2.32) where C.(t) is a bounded matrix and {O.(-)}. is a set of white-noise 1 1 1 processes.

Let

x'(t) E [x{(t),...,x'(0)

V (t) [E (t),..., (t)]

u'(t) E

U'(t) u!(t), ... , 3 (t)]

We now show that x(t) and y.(t) may be written as (2.6) and (2.4), respectively. From (2.31) and the definition of x(t), we have

x(t) =d(t)x(t) + Ed. .(t)x(t - T) i 13 13~ 1,j

+ (t)u(t) + ((t), te[0,T],

x(t) = X(t), t < 0 (2.33)(233 33 whered(t) is a block diagonal matrix with block i equal to A.. (t), 11 4..(t) is a partitioned matrix with element (i,j) equal to A..(t) and all 1J 1J other elements zero, W(t) is block diagonal with block element (i,i) equal to B.(t) and other elements zero.

Define the uncontrolled and controlled processes by

-00 0 x (t) =.4(t)x (t) + Z..(t)x (t - r..) + ,(t), i9 J 1J

tE[O,T], x0(t) = X(t), t 0 (2.34) and

x (t) =,d(t)xu(t) + .Z..(t)x (t - T ) 1,, ii ij

+ (t)u(t), te[0,T], X (t) = 0, t 5 0. (2.35)

To obtain the equation for x(t), let (t,s) be the transition matrix

satisfying 4(ts) =4(t)Ot(t,s) + E . (t)(t - T s),

tE[O,T], '(t,t) = I, 4)(t,s) = 0, t < s.

Then x(t) = x0(t) + x t) satisfies

0 t x(t) = x (t) + f K(t,s)u(s) ds, 0 where 00 x (t) = P(t,0)X(0) + Zf D(t,s + T (s + )X(s) ds

t + f (t,s)4(s) ds, 0 and I I

34

K(t,s) E D(t,s)S(s).

For the observation processes, we have 00 Y(t)= [,...,C.(t),...,0]xt) + O.(t),

N..(t,s) E C.(t)4. .(t,s)B.(s). 13 1 33 3

Many information patterns are possible in this network model.

An interesting case occurs when the information propagates only along links and at the same rate as the effects of the control actions-.

Then station i has the information

t)= {y3s), (s - T.j, jES.} (2.36) ly i () J. S Tj 1 where S is the set of nodes directly connected to node i. 35

2.3. Discrete-time Version of Model

A discrete-time version of our basic optimization problem is

T M Min E E [x'(t+l)Q(t+l)x(t+l) + E u!(t)R .(tU (t)3 (2.37) t=0 i=1 where 0 M t-l x(t) = x0t) + E E K (t,s)u (s),t = 0,...,T + 1, (2.38) i=1 s=0 and the observations satisfy

0 M t-1 y.(t) = y.(t) + E E N..(t,s)u.(s),t = 0,...,T, (2.39) j=l s=0 1J and, for each t, the second moments of x0(t) and y0(t) exist. The dis- crete-time analogue of the L2 condition (2.2) is

T T tr E E N(t,s)N'(t,s)

The on-line information is

j(t) = {y.(s.), s.=0,1,...,t-T ,j=1,...,M},t=0,...JT (2.41) and the control is chosen as u.(t) = y (J (t),t) for some function 1 ii

Y (-,-),i=1,P...,9M.

Although all the discussions in this thesis are in terms of the continuous-time model, there are discrete-time analogs for the results.

The reason for this is that the mathematical framework for the subsequent development is the family of L2 spaces, i.e. the Hilbert spaces of 36 square-integrable functions, and there is an isomorphism between the L spaces and the k2 spaces, i.e. the spaces of square-integrable sequences

(see Balakrishnan [6], p. 97). 37

Chapter III

Existence Results and Optimality

Conditions for Linear Control Laws

An existence theorem for linear control laws and necessary optimal- ity conditions are derived in this chapter. The optimality conditions are equivalent to a set of nonlinear integral equations. For a certain class of information structures, which includes the classical structure, these conditions are shown to be sufficient also and to be equivalent to a set of linear integral equations.

3.1. The Operator Formulation

In developing the existence and optimization results, we use the following operator interpretation of linear systems. Write (2.4) and

(2.6) as, respectively, M y. = y. + E N. .u., i= 1,. ..,m (3.1) j=l1

0 M x = x + Z K.u.. (3.2) i=l

Each of these equations is a description of the respective stochastic processes y. and x on the interval [0,T]. The linear operators N. . and 1J 2 K. are Volterra operators on the L2 space of appropriate dimension. 12 We also assume all processes have zero mean. In the linear case with linear control laws and quadratic criterion, this is no real loss of generality since a nonzero mean simply adds a constant term to the control.

1. See Appendix I for an introduction to linear operators. 2. See Wong [58, Ch. 2] or Doob [18, Ch. 9] for discussions of the stochastic integral for mean-square continuous and white-noise processes. I I

38

Let U CU be the class of control laws containing the linear functions of the information (2.7) which satisfy (2.3). Thus, for any control law in UL we may represent the control u.(t) as

M t u.(t) =Z I H..(t,s)y.(s)ds, te[O,T], (3.3) j=l 0 where

H. .(ts) E 0, s > t - T.., (3.4a)

T T tr f f H (t,s)HI.(t,s)dtds < o. (3.4b) 0 0 ij 1

Although only when y. contains white-noise is (3.4b) essential to ensure J that (2.3) is satisfied, this assumption is important in the development of the mathematical framework. Furthermore, in most cases of engineering interest, (3.4b) is reasonable. Equation (3.3) is equivalent to the operator expression M u= H.,y. (3.5) j=l

Notice that H.,., for all i, j is an L Volterra operator. Although we 13 2 have not specifically mentioned the dimensions of the kernel H. . (t,T), in 13 all subsequent discussions the dimensions are assumed to be chosen con- sistently. Define the partitioned operator

1H1-H ll''. . HE 39

with kernel H1 (t~s)...HlM(t~s) H(t s)E I IM

HMI (t.,s)...HMM(ts)

Then (3.4b) implies H satisfies

T T tr f f H(t,s)H'(t,s)dtds <-oo (3.6) 0 0

This relation defines a norm on the space of linear operators X satisfying (3.6) and moreover for any HVl H2 s4E we may also define

T T E tr f f H 1 (t,s)H (t,s)dtds = tr(I-1H2 ) (3.7) 0 0 as an inner product so that Wis a Hilbert space (H is the adjoint of

H). In the next three sections we consider operator derivations of the theorems on existence and optimization.

3. See Appendix I or Balakrishnan [6, Ch. 3] for further discussion of vector spaces of operators. I I

40

3.2. Existence

In the model of Chapter II, the large class of control laws U is admissible. The class U is indeed the largest class for which our linear probabilistic mathematical model with second-order controls is physically meaningful.

There are several difficulties which usually prevent us from finding the optimal control in U for the criterion (2.5). First, there is no method for checking whether or not an arbitrary nonlinear control law is 4 admissible or not. Second there is no way to test an admissible candi- date control law for optimality when the problem has the general non- classical information pattern. This situation contrasts sharply with the case of the classical information pattern. In the classical problem, the dynamic programming algorithm provides sufficient conditions for optimal- ity. Although using the algorithm imposes a substantial computational burden, dynamic programming does provide a method for testing a given control law for optimality. For most nonclassical problems, however, there is no similar set of sufficient conditions.5 A necessary condition, called person-by-person optimality, is that any agent's control must be optimal among all of his admissible controls given that the other agents' controls are optimal.

4. Wonham [59] imposes a Lipschitz condition on the control laws to guarantee existence of a solution to the feedback equations. See BeneI" [11], Davis and Varaiya [16], and Lindquist [35] for different approaches to guarantee existence. These approaches do not, however, provide tractable criteria for testing admissibility. 5. See Witsenhausen [57] for sufficient conditions for a class of discrete-time problems.

rI-r 41

In general, however, many nonoptimal solutions are also person- by-person optimal.6 Moreover, even to check a solution for person-by- person optimality, we require a characterization of the admissible con- trols more analytically tractable than the definition given in Chapter II.

Therefore, because of these conceptual and analytical difficulties associated with the class U, we further restrict the control laws to lie in the subclass of U consisting of the linear functions of the informa- tion. This subclass is UL, which is defined in Section 3.1.

Letdte be the subspace of)?' A which satisfies (3.4a), i.e. the admissible controls. We now prove the feedback equations have a solution for any HE'. Substituting (3.5) into (3.1) yields A

0 M M y. = y. + E N.. E H.kYk' i = l,...,M. (3.8) j=1 k-1

Defining the partitioned vector processes

yy Y

y -Y - - .0 YM Y and rearranging (3.8), we obtain

[I - NH]y = y0 , (3.9) where I is the identity operator and NH is the partitioned operator with element (i,k) given by

M (NH)ik = E N. .H.'(3.10) ik . i3 jk (.0

6. See Witsenhausen [54] for an example. I I

42

Existence of a solution to the feedback equations is now quite easy to

prove.

Theorem 3.1. For the class of Hihf, the operator (I - NH) is

invertible, the unique inverse is a Volterra (causal) operator,

and the process y satisfies -l 0 y= (I - NH) y . (3.11)

Pf: Since tecV, the operator H satisfies A the conditions assumed in Theorem AI.1 in Appendix I. Hence we have (3.11). Q.E.D.

In our existence proof we assume the kernel of H is square-integrable.

Although this rules out delta function kernels, note that we do allow

such functions in the kernel of N. Furthermore, if we apply the results

of Willems [53, Ch. 4], then the condition on H may be weakened to the

assumption that the instantaneous gain of (NH)r is less than unity for

some integer r greater than zero.

Notice that in (3.11), the output process is expressed in terms of

the uncontrolled process y . If we define

u u-

UM - then we may also express the controls in terms of y as -1 0 u = H(I - NH) y - (3.12) 43

3.3. Optimization

In this section we consider optimizing (2.5), where u satisfies

(3.12). Using the results on covariance operators in Appendix I, we may

write the cost function as

T M J = E f x'(t)Q(t)x(t) + E u!(t)R.(t)u.(t)dt 0 i=11 1

T = tr f Q(t)E (tt) + E R.(t)E (tt)dt (3.13) 1=1 1 0 U.

H tr (QE + S R.E ). x i. I U~ 1= where E and E are the covariance operators of the x and u. processes, x U. 1 respectively. Define the operators

F(H) - H(I - NH)

K B[Kl19...9KHJ

K QK1+ R K1QK2 ... KQK

L K 2 KI

KMK I m.n KMQKM +RM

Let E o. and Eo be the covariance operators (see Appendix I) for the x y 0 0 x and y processes, respectively, and E o0o be the cross-covariance y X 0 0 operator of y and x ; also the operators Q and R have kernels

Q(t)6(t - s) and R(t)6(t - s), respectively. From (3.13) and the defini-

tions of the x and u processes, we have

J(H) = tr(QE o + 2QKF(H)E 0 (3 X Y X (3.14) * + LF(H)E F (H)). y0 I I

44

Thus, solving the stochastic optimal control problem is equivalent to minimizing J(H) with HEJ;t, which is an infinite-dimensional deterministic A problem.

Because minimizing J(H) is an infinite-dimensional problem, the necessary conditions for optimality involve generalized derivatives instead of the gradients used in finite-dimensional optimization.

In infinite-dimensional normed spaces, a generalized directional deriva- tive is the Fr echet differential.

Definition 3.1 (Luenberger [38, Ch. 7]). Let T be a transfor-

mation defined on an open domain D in a normed space X and

having range in a normed space Y. If for fixed xED and each xeX,

there exists 6T(x0 ;x) EY which is linear and continuous with

respect to x such that

lim IT(x0 + X) - T(x 0 ) 6T(x 0;x) Ii xi|+0 I xi= 0,

then T is said to be Frechet differentiable at x and 6T(x0 ;x)

is said to be the Frechet differential of T at x0 with

increment x.

Remark 3.1. This definition of the Frechet differential should be compared with the definition of the Frechet derivative in [38, Ch. 7].

Also a useful result is that the Frechet differential of T(x)=T2x)T2 (x) is 6T (X0 ;x)T (X0 ) + T1 (X0 )6T2X0 ;x), where 6T and 6T2 are the Frechet differentials of T and T2 respectively. If Y is the real line, then the Frechet differential satisfies

n t 45

d 6T(x0 ;x) = y- T(x0 + Ex)1 0 d F_ 0E = 0 and for each fixed x0 EX, 5T(x 0 ;x) is a functional with respect to the variable xcX.

To calculate the Frchet differential of J(H), we need the Frchet differential of F(H).

Lemma 3.1. The Frechet differential of F(H) = H(I - NH)I

at the point H is

6F(H;H) = H(I - NH) + H(I - NH) NH(I - NH)~(3.15)

Pf: This follows by applying the rule for the differential of

T (H)T2(H), where T1(H) E H, T2(H) E (I - NH) and 6T (H;H) = H,

- -l -1 6T2 (H;H) = (I - NH) NH(I - NH) . To verify the expression for 6T2 , simply start with (I - NH)(I - NH) 1 = I, apply the product rule, and note SI 0. Consider the function

* T(A) = tr(LAZ oA + 2QKAZ o o). y y x

Because T(A) is a real-valued function, we have

d * 6T(A;H) = tr(L[A + EH]Z a [A + cH] de y

+ 2QK[A + EH]EyOxO 0

2 = tr(LHE 0 A + LAE oH + QKHE o XO y y

-- * * * = 2tr([LAE o + K QE o o H y y x where the last equality follows from (1.19) and (1.20) in Appendix I and the fact thatL = Land yE = E . y y I . I

46

Therefore, an application of the chain rule for generalized derivatives

(Luenberger [38, Ch. 7]) gives

6J(H;H) = 2tr([LF(H)EYo +

* * -- * K QZ a ox][6F(H;H)] ) (3.16) y x where 6F(H;H) satisfies (3.15). The infinite-dimensional generalization

of setting the directional derivative equal to zero is given in the next

theorem.

Theorem 3.2. If H minimizes the functional J(H) over the

space ,-. then

6J(H;H) = 0 (3.17)

for all H k,A where 6J(H;H) satisfies (3.16).

Pf: Theorem 1 in Luenberger [38, Ch. 7] gives this necessary condition

for functionals which have a Gateaux differential. But if the Frechet

differential exists, then the Gateaux differential exists and they are

equal. Thus, the Frechet differential must be zero at the minimum.

Q.E.D.

Define the operators E1 and E2 by

* * 0 E1 EEI(H) E LF(H)Eo + K QE YxO (3.18)

E2 E E2(H) E (I - NH) 1 (3.19)

Notice that E is not a causal operator, but E2 is. Using (1.19) and

(1.20) in Appendix I, we may write (3.16) as

6J(H;H) = 2tr(E1 [HE2 + HE2 NHE2 ]

- -* * -* * *-*-* = 2tr(E 1 [E2 H + E2 H N E2 H ]) (3.20)

= 2tr([E1 E2 + N E H EIE2]H) 47 where the overbar on E. indicates evaluation at H. The optimality condition (3.17) implies

6J(H;Ii) = tr(hH ) = 0 (3.21) for H-J 'and where

-- -* *--*- -* h EE2 + N E2H EIE2 (3.22)

Recall that a partition of H is defined by (3.3); assume h is partition- ed conformally to H . Then (3.21) is equivalent to

E tr(h..H..) = 0 (3.23) for all H. . satisfying (3.4). Since H. . = 0 is admissible, (3.23) implies

tr(h..H..) = 0, i,j = 1,...,M , (3.24) for all H. . satisfying (3.4). Moreover, by summing (3.24) we get (3.23); so (3.24) is equivalent to (3.23). The following lemma allows us to put

(3.24) into an integral equation form. n Lemma 3.2. Suppose A is an L2 linear operator on L2[0,T]

and

tr(AV ) = 0 (3.25)

for all Volterra L2 operators with kernels satisfying

V(ts) = 0 for s > t - T. Then A is the adjoint of a Volterra

operator and satisfies

A = 0 , (3.26)

where

A(t,s), s St-T A+ E

S > t -T. L 0, I I

48 * Pf: The kernel of AV is

s-T k(t,s) 3 1 A(t,6)V'(s,G)d6. (3.27) 0 Thus,

T T t-T tr(AV ) tr k(t,t)dt tr f f A(te)V'(tO)dedt. (3.28) 0 0 0

Condition (3. 25) implies

T t-T tr I f A(t,0)V'(tO)d6dt = 0 (3.29) 0 0

for all L2 matrix functions such that V (t,6) = 0 for 0 > t - T.

In particular, we may set

V(t,0) = A(t,0), 0 < t - T (3.30) =P0, >t-T.

Thus, (3. 29) becomes

T t-T tr f f A(t,0)A'(t,)dedt = 0 (3.31) 0 0 which implies

A(t,&) = 0, 0 t -T

except possibly on a set of measure 0 on the square [0,T] x [0,T]. Q.E.D.

Applying the lemma to (3.24), we get

=0T [ [] iJ - 0, i,j = 1,.. .,M. (3.32)

Remark 3.2. The notation in (3.32) is suggestive of the spectral factorization notation of Wiener-Hopf theory (see Van Trees [50, Ch. 6]).

Actually, the operator approach and Lemma 3.2 (with T = 0) may be used to

Th 49 give a time domain derivation of the Wiener-Hopf equation. Also when the problem discussed in Chapter 5 of this thesis is specialized to one station, the Wiener-Hopf equation characterizes the solution.

If h. . (t,s) is the kernel of h. ., then (3.32) implies

h. .(ts) = 0, s < t - T.., i,j = 1,...,M. (3.33) :IJ 1J

Since E2 , H, and N are causal operators, the adjoints of these operators are anticausal (see expression for adjoint in Appendix I). Thus, by repeatedly applying (1.10) from Appendix I, we obtain

h(t,s) =

s T S 3 s 2-- f f f f [E 2 (s,s4 )E(s3,sQ)H(s 3,s 2 )E2 (S2 ,sI)N(s1 ,t)] Ot t t dsds2ds3ds s + f E 1 (t's)E'(s,s )ds (3.34) 0 * If h(t,s) is partitioned conformally to H (t,s), then (3.33) becomes

s T s3 s2 - --- (f f f f [E2(s,s )E' (s3,s4 )H (s3,s2)E2(s29,s)N(s ,t)] O t t t dsds2ds3ds s + f E (t,,s )E'(s,s)ds1) = 0, 3.35) 0

i,j = l,...,M, s < t - T.., where "( )'.'. indicates block (i,j) of the partitioned matrix. Because JJ E and E2 are nonlinear functions of H, (3.35) is a set of nonlinear in-

tegral equations for the kernels H.. (t,s).

As an example to illustrate the complicated nature of (3.35), we

consider the two station version of the basic team problem with no I I

50

sharing of information. The observations are 0 Y= y1 + N1 1 u1 + N12u2 (3.36a)

0 (3.36b) 2 y 2 + N21u1+ N22u2 We may simplify (3.36) and the optimization problem by noting that if

u. = G.y., then exactly the same controls may be realized by setting 1 1 1

u. = H.(y. - H..u.) (3.37) 1 1 1 11 1

-l 7 and choosing H. = G. (I + N. .G.) . Similarly given H. we may realize 1 1 1111 -l the form G.y. by setting G. = (I - H .N..) H.. Therefore without loss of

generality we may consider observations and controls of the form 0 Y, ~-y 1 + N 12 U2 (3.38a)

(3.38b) y Y 2 + N2 1u

u. = H.y. . (3.38c) 1 1 1

In this situation we have H H. and H.. = 0, i j; (3.11) and

(3.12) become, respectively,

S -12 2 1 (3.39) y = [0 21 -N21H I

H 0 I -N H - 7y 0 1221 (3.40) u = . 0o 12] -N2IHl rJ y0 0 2 21H1 I 2 Also, substituting the controls into (3.38) yields an alternative

expression for (3.39), i.e. -1l 0 0 = (I - N1 2 H2 N2 1 H1 ) 1 + N1 2 H2 2 7. This may be demonstrated by substituting for H. in (3.37), solving for u., and using the identity (I+G(I-NG)-lNjG(I-NG)-1. The iden- tity is proven by starting with G = G(I-NG) (I-NG).

11 51

N H)-1 [y0 +NHy0 y 2 (I-N2 1H1 N 1 2 H22 21 + 1

or -l

y (I-N1 2H2 N2 1 H) (I-N1 2 H2 N21 H1 12H2. 1 r-1 1 [ , Y2 (I-N2 1H1 N1 2 H2 N21 1 (I-N2 1 H1 N1 2 H2 )

(3.41)

The definition of E and (3.41) imply the kernel E2(t, s) satisfies 2 2 12 E (ts) 2 E2 (t,s)j E E2 (t,s) E2(t,s) E22(ts)

1N (I-N 1 2H 2N2 1H )1(t,s) ((I-N1 2 H 2N2 1 H1 ) 12 H2 )(ts)

(I-N ((I-N21H1N12H2) N2 1 H1 )(ts) 21H1N 1 2H 2 ) (t,s) (3.42)

where, for example, (I-N 12H H ) 1 (t,s) is the kernel of

definition (3.18), we have (.-N12H2N2 1H 1) . From the * * E ELF(H)E 0 + K QE O0 1 y y x

L L fH1 11 12 E zE 11 12 0] E2 Ey yy

21 2 22 L2 L2 0 H 2 EE y U

+ KI Q[Zo o, 0 , y1 x 2 K2 where

L121 + R K QK 2 [L 1 1KQK

L21 L22J K2QK 2 + R LK2QK 1 2 J I I

52

Thus, the kernel of E1 is

1 E2(t,s) E 2 (tls)( 1 ' s)(3.43) E21((s) E)t s 1 / J where, for example,

1 E (t,s) = (LEH1E2 yo)(t,s) + (L H E2 yoyo)(ts) 1 11 2 y11 12 y y 1

2 + o(L2H2E21E)(ts) + (L1 2H 2E 2yoyc)(ts) (3.44)

+ (KQZo o)(t, s)

and, for example, (L H1 1E2 E Z)(t,s) is the kernel of L H E21y 1112y1 '111 The other terms in E (t,s) have a structure similar to (3.44). 1 Because there are only two stations and there is no sharing of infor- mation, the set of integral equations (3.35) is

s T s 3 3dS (f f f f [E (s,s )E'(s 2 4 3 ,s4 )H(s 3 ,s2 )E2 (s2 ,s)N(s1 ,t)] ds1 ds 2 ds3 4 0 t t t

s - + f EI(t,s1 )E(s,s1 )ds). 0, i = 1,2, s t , (3.45) 0 where E2(s,s 4 ) and E1(s 3,s4 ) are defined by (3.42) and (3.43), respective- ly and 0 N12 (sl, t) N(s ,t)B)J [N 21(st) 0

It is clear that it is very difficult to solve the two coupled nonlinear equations (3.45). Furthermore, even in the case of

1 r 53 finite-dimensional system dynamics, the equations do not simplify signi- ficantly.8 The two station finite-dimensional model satisfies (2.17),

(2.21), and (2.22). Therefore, the operator kernels N(t,s) and K(t,s) are

N ,(t's) N1 2 (ts)

N(t,s)

N)(t,s) N2 2 (ts)_J

(3.46a)

C (t)O(t,s)B(s) C (t)O(ts)B2(s)

C 2 (t)((t,s)B1 (s) C 2(t)(D(t,s)B 2(s)

K(t,s)- [K 1(t,s) K2 (ts)] = O(t,s)[B1 (s) B2(s)] . (3.46b)

Notice that here we use the equation (3.36), rather than using (3.38a,b).

We do this so that the state representation for the impulse responses

N(t,s) and K(t,s) will be given by (2.17). To see the main difference between the finite-dimensional modeland the general case, let G(t) be any

9 fundamental matrix of A(t). Then the transition matrix 1(ts) satisfies

4(ts) = ('t) (s), t > s

Consequently, when the model is finite-dimensional, the kernels of N and

K satisfy the separable (in t and s) equations

C (t) N(t,s) = 4(t) (s)[Bl(s) B2 (s)]

C2 (t)-

8. Whittle and Rudge [52] show that, -in a stationary version, any solu- tion of the necessary conditions must have an irrational transfer function, which implies there does not exist a finite-dimensional realization of the optimal solution. 9. See Chen [13, p. 129]. I I

54 -l ^-1 K(ts) =4(t) [0 (s)B 1 (s), 0 (s)B (s)]

By a.similar construction, the kernels of the covariance operators of the uncontrolled processes may be put into separable form. Moreover, the separable form is a characteristic property of finite-dimensional system models; i.e. it is both a necessary and sufficient condition for the 10 existence of a finite-dimensional state space representation. In (3.45) we see the separability condition does not eliminate the nonlinearities which make solving the integral equations so difficult.

Although the optimal H, i.e. H, must satisfy the necessary condition

(3.17), in general there will be many other stationary points of 6J since

J(H) is not a convex functional of- H. Thus another difficulty with (3.35) is the solution is not unique. For certain information structures, however, minimizing J(H) is equivalent to minimizing a convex functional and a necessary and sufficient condition for the unique solution may be obtained. Before discussing some of these situations, we consider the existence question.

Because J(H) is a nonconvex functional on an infinite-dimensional vector space it is very difficult to show there exists an H achieving the infimum. But J(H) is nonnegative and there are feasible controls resulting in finite cost.- Hence, we infer the infimum is finite and non- negative.

10. See Chen [13, p. 148] for a proof that a p x r square-integrable impulse response T(t,s) may be realized by an n-dimensional system if and only if there exist p x n and n x r matrices M(t) and N(s) such that T(t,s) = M(t)N(s).

r ~r 55

Sometimes it is possible to prove the infimum is achieved for a non-

convex function by showing the cost function is unbounded in all direc-

tions.11 The problem here is that it is not clear J(H) is radially un-

bounded. By modifying the cost function to make it radially unbounded,

however, it is possible to prove an existence theorem. Instead of (3.14), consider the cost

* (H) = J(H) + atr(HH ) , (3.47) where a > 0. The additional term in the criterion is a penalty on the norm of H and the effect is to keep the "gains" H.. (t,s) from becoming iii

too large. Let E inf[J 1 (H)IHS ]. Then because J(H) 0, if there exists an H which achieves the infimum f, this H must also solve the optimization problem

min[J (H) HES] (3.48)

S tHEeHtIottr(HH) < 0 1.

For (3.48) we have the following existence theorem.

Theorem 3.3: There exists an II which achieves the minimum

in (3.48).

Pf: The functional J1 (H) is Frechet differentiable and hence continuous. * * Therefore it is also weak continuous. Furthermore, the set S is weak compact, since it is a sphere in the Hilbert space,. Thus, by Theorem

2 in Luenberger [38, Ch. 5], J1 (H) achieves its minimum on S. Q.E.D.

11. Witsenhausen [54] obtains an existence result by proving the cost is radially unbounded in a scalar case. The difficulty of proving existence in vector situations has been noted in the context of de- signing fixed-structure compensators for systems with classical and nonclassical information patterns. See Levine [33], Looze, et al. [36, Sec. 3.6], and Chapter IV, Section 2 of this thesis. I I

56

3.4 Partially Nested Systems

In this section we determine a set of necessary and sufficient

optimality conditions for the class of systems with partially nested

information structure. An information structure is partially nested if all terms in station i's observation process which depend on the controls

(this includes the controls of the other stations) may be determined as

causal functions of station i's information. In other words, control actions propagate at a rate no faster than information. Thus, there is no need for communicating information, i.e. signaling, through the dyna- mics of the control system.

To make these informal comments precise, define the processes

yM(t - yT.1 1(t)y0(t - -n T 1l 01

0 y (t )(t- LT T )J

where T . is defined by (2.7) and if t - T . .< 0, set that element of 0 w. (t) or w.(t) equal to 0. Thus, we have

(t) = {w.(s)|se[0,t]}, i = 1,...,M . (3.49)

By definition, we also have

0 M S-T.i Y (s-[T..+E= y .(s-Tf.)+ E f xNO(s-T.., )u (0)dO, 1J 3 1J k=l 0 (3.50) ss[O,T}

Let W (t) be the set of causal linear functions of V((t). The following 1 1 definition is a modified version of the Ho and Chu [22] concept of partial nesting. Let u.(t) = Yj( .(t),t) and y' = [y',...,Y']. 57

Definition 3.2. If for each time t and for all yeUL and

s t, M s-T.. Z f 'N. (s-T..,)u(O)dO EW. (t), k=1 0 3k lJ k1

ij =1,...,m,

then the information structure (t), i = 1,. ..,M, is -1

partially nested.

Therefore, if the information structure is partially nested, the observa- tion process satisfies

w. =w. + T.w., (3.51) ]1 131 where T. is a linear operator. Notice the operator T. depends on the control laws. For purposes of linear control, however, the process w. O may be replaced by the uncontrolled process w.

Theorem 3.4. If the information structure is partially

nested, then the set of control laws of the form

u. = H w., i = 1,...,M, where 1 11L

HA H

0 is equivalent to the control laws of the form u. = Gw,

i=1,.. .,M, where

G

GE

GJ A I I

58 -10 Pf: Suppose u. = H.w.. Equation (3.51) implies w. = (I-T.(H)) w., since 1 1 1 -10 0 0 T. is causal. Thus, u. = H.(I-T.(H)) w. G.w.; and note w. is obtained 1 1l 1 1 1 ii 1 0 0 from w. by the equation (I-T.)w. = w.. Now, if u. = G w. the the partial 1 11 :i nesting condition still implies (3.51), but now T. B T.(G). The realiza- 1 1- tion of this control law in feedback form is u. = G.(I-T.(G))w.. Q.E.D. - 1 1 1 Theorem 3.4 thus implies that for partially nested systems we need only consider laws of the form 0 U.= G.w., i = 1,...,M. (3.52)

We will see that for this form of control the optimality condition of

Theorem 3.2 simplifies and is also a sufficient condition. But first we 0 rewrite (3.52) in terms of the process y (defined before (3.9)) so that the special form of the cost function for partially nested systems may 0 be compared with the general form (3.14). From the definition of w., we have (3.52) equivalent to H 0 u. = G..y., (3.53) 1 j-l 13 1 where G.. satisfies (3.4) and G is defined analogously to H. For this 13 class of control laws, the cost function is

J (G) = tr(QE a + LGZ 0G p x y(3.54) + 2QKGE a0 ), yx where L, K, o,EyO jo0o are as defined in (3.14). This optimization X. y y x is carried out over all GEAA*

To prove existence and uniqueness of the solution for min [J (G)IGstYX], assume Eo is invertible and note that since L is a positive definite oLperator, it 59 is also invertible. Then we may define a new inner product onNA by * [G,G2 ] = tr(LG Ey oG2). (3.55)

From (3.54), we have

* J (G) = tr(QEX + LGE G + 2QKGE -Ox) P xy 0 y x

* * ** *. = tr(QEx, + LGZ EG + 2K Q E 0 OG )

=tr(QE x + LGEY OG +

2L1 K* Q*F-* 00-1 O *O y x y y

tr(QE=o)+[G+LtKQ 0 O- 0Q 0 0OGE~10+,G + LGZK x y x y y x y

-1 * * * -1 -1 * * * -1 - [L K Q E a oE O,L K Q E a oE 0]. (3.56) y x y y x y

Hence, minimizing J (G) is equivalent to finding the minimum distance p from the subspace 44to the point L K Q E 0 oE o in V, but where y x y has the inner product (3.55). The solution is characterized by the orthogonal projection theorem (Luenberger [38, Ch. 3]).

Theorem 3.5: The minimum of J (G) for GEX4fl is achieved p by the unique G satisfying

* *-1 *-1 * 0 0 [G + L K Q E y xO y ,G] = 0 (3.57)

for all GEJ .

The optimality condition implies

tr(gG ) = 0 (3.58) for all G and where

g - LGEy 0 + K Q Z o O. (3.59) I I

60

As we did for (3.21), we may show (3.58) is equivalent to

tr(g..G..) = 0, (3.60) :LJ 1J for all G satisfying (3.4). The discussion preceding (3.53) shows why

G.. satisfies (3.4).

Applying Lemma 3.2 to (3.60), we have

g 0, i,j = l,...,M, (3.61) 13 + and thus g.. (t,s) = 0, s < t - TC.., i,j = 1,.. .,M. (3.62) From (3.59) we have T T g(t,s) = f f L(t,s 1 )G(s 1 ,s2)zyO(s ,s)dsIds 0 s 2 2 T

+ f K'(s 1 ,t)Q(s)z oyo(s1 ,s)ds1 (3.63a) t or equivalently, T s1 g(t,s) = ff L(t,s)G(s 2)zy O(5 2 ,s)d2 ds 0 0 T + f K'(s,t)Q(s9)zoyoy(s , s)dsI. (3.63b) t Therefore, (3.16) becomes T T (f f L(t,s 1 )G(s ,s2)EyO(s2 2 s)ds1 ds2 0 s2 T

+ f K'(s1 ,t)Q(s)EZoo(sls)ds1 ).. = 0, (3.64) t s < t - .. , i~pj = 1,...,2M.

Equation (3.64) represents a set of linear integral equations for G(t,s);

this set is the analogue for partially nested systems of the general set of nonlinear equations (3.35). In Chapters IV and V we apply this part- ially nested optimality condition to obtain explicit solutions for some

stochastic control problems.

-i -~ 61

CHAPTER IV

Operator Derivations for some

Classical and Nonclassical Models

This chapter is concerned with applying the operator approach to several well-known problems from the control literature. A generalized separation theorem for the classical information pattern is derived in the first section and then applied to a system with finite-dimensional dynamics and to some classical systems with delays in the observations and dynamics. The second section discusses decentralized fixed-structure optimization.

4.1. Optimization with the Classical Information Pattern

A system with the classical information pattern consists of a single station and thus of a single controller who has access to all past obser- vations (and hence also to all past controls). We may say, equivalently that the system is composed of several stations which share all informa- tion with no delay. This flexibility in formulation arises from our assumptions that all the observations and controls may be vector pro- cesses and nonzero correlation is permitted between the uncontrolled processes. It is important to note that the classical information pattern does not rule out the possibility that the relationship between the observation process y0 and the state process x0 involves delays (e.g. see latter part of this section).

We may write the observations and controls for the classical problem as

y = y + Nu (4.1) I I

62

u = Hy, (4.2) where there is no special partitioning in (4.1) and (4.2). Because there is a single station, the information pattern is partially nested.

Therefore, we need only consider controls of the form 0 u = Gy . (4.3)

Remark 4.1. To obtain the classical case from the general partially 0 nested pattern, we set T.. B 0 for all i and j in the w. and w. processes in section 3.4, and consequently we have,for all i,w. (t) B y(t), 0 0 w.(t) B y (t).

In this situation the optimality conditions (3.61) and (3.64) are, respectively,

* - * * [(K QK + R)GEy o + K QE o a] + = 0, (4.4)

T T f r [K QK(t,p ) + R(t) 6 1 (t-pI)]G(p1P2)zyO(P ,s)dp dp 0 2 1 2 p2 T = - f K'(p,t)Q(p)Z 0 o(p,s)dp, 0 s t T, (4.5) t where T

K*QK(t,p1 ) - f K'(p,t)Q(p)K(p,p1 )dp. max(t,p1 )

Thus, finding the optimal control for linear stochastic systems with classical information patterns and quadratic criteria involves solving the linear integral equation (4.5). In certain cases this equation can be considerably simplified.

Note that if the observation process y were white-noise, then we could perform the integral over p2 easily, since the covariance of

- r 63 white-noise has a delta function kernel. Although yo is not white-noise, for an important class of models it is linearly equivalent to the innova- tions process which is a white-noise process.

Theorem 4.1 (Kailath). Suppose there is a set of observations

of the form

y(t) = z(t) + 6(t), tE[0,T],

where 0(t) E a sample function of zero-mean white-noise with

covariance E[0(t)0'(s)] = Z(t)6(t-s), Z(t) > 0 and z(t) E a

sample function of a zero-mean "signal" process that has

finite variance trE[z(t)z'(t)] < , tej[O,T]. Also, "future"

noise 0(-) is uncorrelated with "past" signal z(-), i.e.

E[z(s)0'(t)] = 0, 0 s < t < T.

Let 2(t) be the linear least-squares estimate of z(t) given

the data [z(s), ss[0,t]]. Then the innovations process v(-)

defined by

v(t) E y(t) - 2(t) i(t) + 0(t)

is a white-noise with the same covariance as 0(-), i.e.

E[v(t)v'(s)] = E[0(t)0'(s)], 0 < t, s < T.

Furthermore, y(-) and v(-) can be obtained from the other by

causal linear operations.

Pf: See Kailath [24,26].

Suppose the uncontrolled observation process is 0 y (t) = z(t) + 0(t), (4.6)

1. This equivalence result has many theoretical and practical implica- tions for estimation, control, and detection problems. See Kailath [24,26] for further discussion and references to other work. I I

64

where y0 satisfies the assumptions of Theorem 4.1. Then because y0 and 0 v are equivalent for linear operations and the control law must be 0 0 linear, the theorem implies we may replace y by v . Substituting the

covariance operator of v0 into (4.5) yields

T T

ff[KQK(t,p1 ) + R(t)6(t-p 1 )]G(p1 ,p2)Z 0 (p2 )6(p -s)dp dp 0 2 1 2 p2

T = - f K'(pt)Q(p)Zovo(p,s)dp, t or T R(t)G(t,s) + f K QK(tP1)G(pi,s)dp

T - - f K'(p,t)Q(p)Zoo(p,s)dpE0 (s), (4.7)

0 s t T.

Equation (4.7) is a Fredholm type integral equation for determining

the optimal linear control when the observation process is the sum of a

second-order process and white-noise. By using a different approach,

Lindquist [34] also derived this equation. In [34], he assumed the ob- servation process was white-noise and then derived (4.7) by a procedure similar to the derivation in Chapter III. Then he considered cases under which the observations can be transformed into the innovations. Further- more, he also showed that if we let the subscript denote column i of G(t,s) -l and Z OvO(tsi)zE (s), then (4.7) is the set of necessary and sufficient optimality conditions for the family of deterministic problems

T Min f [x(t)Q(t)x.(t) + U(t)R(t)U.(t)]dt (4.8) S 65 subject to

x1 (t) = (EKovo(ts)Zo'(s))i + f K(t,p)U (p)dp, (4.9) s

tE[s,T], i = 1,...,P, 2 where G(t,s) has p columns and U. (t) E G(t,s).. Notice, in contrast to 0 [34], that we assumed mean-square continuity of the processes x (-) and u(-) to derive (4.5), and we allowed the observations to contain mean- square continuous and white-noise components. Then to simplify the integral equation we made particular assumptions about the observations.

0 Because v depends only on the uncontrolled process y0 , it is inde- pendent of the cost criterion and the controls. Therefore, we have a separation theorem. The first part of the problem is to find the linear filter generating the innovations v0 from the observations y . The other part of the problem is to solve the integral equation (4.7), i.e. to solve (4.8) subject to (4.9).

To obtain some specific results, we impose structure on the kernel 0 K(t,s) and on the observation process y . Consider the problem with dynamics (2.8), i.e.

x(t) = A(t)x(t) + B(t)u(t) + t(t), (4.10)

x(O) = x0 , E[x0 '(t)] 0, ts[0,T] and uncontrolled observations

yo(t) = C(t)x0(t) + e(t).

Let

EtO(t)6'(s)] = 16(t-s) (4.11)

2. See Lindquist [34] and also Balakrishnan [6] for further discussion of the operator approach to deterministic optimal control problems. I I

66

(t)'(s)] = EZE[ (t)6(t-s)3, (4.12)

where I is the p x p identity matrix and t(-) and O(-) are independent.

The innovations process is

v0 (t) = y0 (t) - C(t) 0(tIt),

where x0(tjt) is the optimal linear estimate of x0(t) given the data 0 {y (p),pe[o,t]}. The observation model leads to the well-known Kalman-

Bucy filter equations for i0 (tjt). Thus i 0 (tlt) satisfies -0 0 x (t It) = A (t) X" (t It) +

E 0Vo(t3,t)[y0(t) - C(t)R,0 (001), xv X010) = 0, te[0,T], (4.13)

where 0 Xao(tt) E[x (t)v'(t)]

= PO(t)C'(t)

and P (t) satisfies the Riccati equation

S(t) = A(t)P0(t) + P0(t)A' (t)

(4.14) -Zovo(t,t)E' 0OvO(t,t) + E(t),

P 0 (0) = E0 ' Next we must solve (4.8) when

K(t,p) = (tp)B(p), where P(t,p) is the transition matrix of A(t), and for p > s

Zxovo(p,s) = D(p,s)P0 (s)G'(s).

But under these conditions (4.9) is equivalent to (with E - I)

x.(t) = A(t)x.(t)+ B(t)U.(t) (4.15)

X i(s)=(Px0 (s)C(s), tc[sT], 67 where (z). is defined as column i of z. The optimal solution to (4.8), I subject to (4.15) ,in th.e familiar feedback form is (Athans[5]) -l U. (t,s ) = - R (t)B' (t)S(t)x.(t,s), (4.1 6) where S(t) satisfies th e Riccati equation

S(t) = - S(t)A(t) - A'(t)S(t) - Q(t)

+ S(t)B(t)R~ (t)B' (t)S(t), S(T) = 0. (4.1 1)

Thus,

G. (t,s) = R (t)B'(t)S(t)x.(t,s), where x.(t,s) satisfies (4.15) with U. = G., and the optimal stochastic 1 1 control is t - 0 u(t) = f G(t,s)v (s)ds 0

=- R~1(t)B'(t)S(t) f X(t,s)v0(s)ds, (4.18) 0 where column i of X(t,s) is x.(t,s). To write (4.18) as a function of

i(tlt), the best linear estimate of the controlled state, we note that 0 for any control u = Gv , we have

A t 0 i(tjt) = i (tjt) + f D(tp)B(p)f G(p,s)v (s)dsdp 0 0 t0

- f [(ts)P0 (s)C'(s) + 0

f 4(t,p)B(p)G(p,s)dp]v0(s)ds. (4.19) s I I

68

But column i of the term multiplying v (s) in (4.19) satisfies (4.15).

Therefore, (4.18) becomes

u(t) = -1(t)B'(t)S(t)i(tlt), (4.20)

which. is the classical separation result for finite-dimensional systems.

By changing the kernels K(t,s) and E ov 0 (ts), we obtain the separa-

tion theorems for systems with delays in either the observation processes

or in the dynamics or in both. Suppose the dynamic equation is again

(4.10) and the cost is (2.10), but the observation process is

y(t) = C(t-T)x(t-T) + 6(t), te[T,T], (4.21)

0,[teG t[T,]

where O(-) satisfies (4.11). This is the case of delayed measurements of

the state. In this model we have

x(t-T) = xu(t-T) + x0 (t-T),

where

-0 0 x (t-T) = A(t-T)x (t-T) + C (t-T),

te[T,T], x0(T) = x0 , (4.22)

and

U (t-T) = A(t-T)x (t-r) + B(t-T)u(t-T)

ts[T,T], x (T) = 0 . (4.23)

Therefore, the uncontrolled part of y, i.e. y0, is given by

y0(t) = C(t-T)x0(t-T) + 6(t). (4.24)

The innovations process is

V0(t) = y 0 (t) - C(t-T) 0 (t-Tjt), (4.25) 40 0 where x (t-Tlt) is the best linear estimate of x (t-T) given

3. This separation theorem may also be derived by stochastic dynamic programming. See Wonham [59] for details. 69

{y 0(p), pe[O,t]}. Because x0 satisfies (4.22), the following Kalman-Bucy filter yields x0(t-Tt): A 0 x (t-Tjt) = A(t-T)x (t-TIt) +

P0(t-T)C'(t-T)[y0(t) - C(t-T)i0(t-rjt)], 10 x (p) = 0, p S T, ts[T,T], 0 and P (t-T) satisfies (4.14) with the appropriate time arguments delayed

0 by T and P (p) = E0 , p < T.

To obtain the optimal control we now must solve (4.7). But the only difference between this problem and the no delay case is in the term on the right-hand side, i.e. the driving term. Thus, for this problem we 0 put E 0 0O(p,s) in the integrand, where v is now (4.24), and

Zxo O(Ps) = P(p, s)4(s,s-T)P0 (s-T)C' (s-T) . (4.26)

Since we have not changed the state dynamics from the no delay case, solving (4.7) is equivalent to solving (4.8) subject to (4.15), but where now the initial condition is 0 x.(s) = (4(s,s-T)P (s-T)C'(s-T))..

Using an argument similar to that which leads to (4.18) and (4.20), noting t(t-T, s-T) is transition matrix for A(t-T), we find the optimal

1 0 control u(t) = - R (t)B'(t)S(t)[(t,t-T)2 (t-T t)

+ xu(t)]

= - R_ 1 (t)B'(t)S(t)i(tjt). (4.27)

Notice that @(t,t-T)x0(t-Tit) is a predictor for the uncontrolled process.

The result (4.27) was obtained by Kleinman [28] for an infinite horizon version of the problem. I

70

The final classical separation theorem that we consider is for

systems with delays in both the dynamics and in the observations. Since

Lindquist [34] solves a general linear delay problem with the classical

information pattern by first determining the innovations and then solving

the deterministic problem (4.8), only a summary of the results is pre-

sented here. Furthermore, the derivation is analogous to the no delay

case.

Suppose the state equations are

x(t) = A1 (t)x(t) + A2 (t)x(t-T)

+ B(t)u(t) + t(t), ts[0,T] (4.28)

x(p) = x0 '(P),p < 0 E[x0(p)((t)] = 0,

where A1(t) and A2 (t) are bounded matrices, T > 0, the process C(-)

satisfies (4.12), and {x0(p), p 01 is a collection of second-order ran-

dom variables which are independent of (). Let the observations be

y(t) = C(t-T)x(t-T) + 0(t), (4.29) where 0(-) satisfies (4.11). The uncontrolled part of the state process

satisfies -00 0 x (t) = A1 (t)x (t) + A2 (t)x (t-T) + t(t) (4.30) 0 x (P) = x0(p), p S 0, tE[0,T], and the controlled part is defined analogously to (2.12). Thus, we have 0 0 y (t) = C(t-T)x (t-T) + 0(t) and the innovations process is

v0(t) = y 0 (t) - C(t-T)xi(t-Tlt), (4.31) where x0(t-Tlt) is the optimal linearleast-squares estimate of x0(t-T) given the information {y0(s), sE[0,t]}.

1 "r- 71

Because there are delays in both the state and observation equations, the filter for x0(t-Tjt) is not finite-dimensional. Define the smoothed conditional error covariance4

0 0 0 0 P (t,p, ) = E{[x (t+p) - 2 (t+plt)][x (t+0) - "0(t+( t)] }, 0 0 0 where*Wt0= {y (s), se[0,t]}. Because P (t,p,t) satisfies a set of deter- t ministic equations,5 we have the unconditioned equation

P0(t,p,t) = E {[x0(t+p) -- 20(t+pjt)][x0(t+0) - 2 (t+Jt)]}.

Also let

P(0() P0(t,0,0) 0 o 0 P (t,p) P (t,p,O)

o 0 P2 (t,p,t) E P (tp,t).

Then the optimal estimate 20(t-Tit) is generated by AO 0 0 x (tjt) = A1 (t)i (tjt) + A2(t)5Z (t-Tt) +

0 0 0 P1 (t,-T)'C'(t-T)[y (t) - C(t-T)2 (t-Tlt)] (4.32)

'to 0 x (t +pjt) = E(t + pit + p) +

t 0 0 0 f P (st+p-s,-T)C'(s-T)[y (s) .- C(s-T)2 (s-Ts)Ids, (4.33) t+P - Tr p O .

To set up the deterministic problem (4.8), we let

K(t,p) = 4D(t,p)B(p), where $D(t,p)D is the transition matrix which satisfies

4. Note that Q(t,T,s) in [34] satisfies Q(t,T,s) = PU(s,t-s,T-s). 5. See Kwong and Willsky [32]. Lindquist [34] and others discuss linear filtering with delays, but they do not present equations for PO(t,p,t) as is done in [32]. I

72

3@D(t~s) D= A (t)D(t ,s) + A (t>ID(t-T,s), Dt1 D ' 2 D tE[0,T],

)D(tt) = I

qD(t's) = 0, t < s.

For p > s, the kernel EZ o(p,s) satisfies

0v o(p,s)= E[x0(p)v0(s)']

= E[x0(p)x 0s-T) - X0 (s-TIs))']C'(s-T)

= E[{DDD(p,s-T)x0(s-T) +

s-T DIL/(p, +T)A 2( +I)x0 (a)da + S -2T p.010 D(Pta)(a)da}(x (s-c) - X (s-TIs))']C'(s-T). s-T (4.34)

Because t(&) and Q(-) are independent, E(-) is white-noise, and x0 () is

a causal function of C(-), the term depending on {t(c), a > s - T} is zero

O 0 0 and (4.34) may be written (note E[x (s-T)(x (s-T) - i (s-Tls))'] = 0 since 10 0 X (s-T) is the optimal linear estimate of x (s-f) 0 E xovo(P's) = %D(Ps-T)P (s,-T, -T)C'(s-T)

s-T

+s2 D(p,c+T)A 2(cY+T)P (s,c-s,-T)daC'(s-T). (4.35) s -2T

In the no-delay case the deterministic problem was to solve (4.8)

subject to (4.15). With delays we still solve (4.15), but the constraint

becomes- x.(ts) = A (t)x.(t,s) + A (t)x.(t-T,s) + B(t)U.(t,s), 1 1 1 2 i 1 (4.36) x.(p) = (PO(s,p-s,-T) C'(s-T))., p < s, tE[0,T], where (Z). is column i of Z.

1 T 73

The feedback solution to the deterministic problem with dynamics (4.36) is oP -1 UO (ts) = - R (t)B'(t)K (t)x.(ts) -1

- R (t)B'(t) f K(t,p)x.(t + p,s)dp, (4.37) -T

where KO(t) and K (t,p) satisfy a set of coupled partial differential

equations.6

Analogously to the earlier cases, we may write the optimal stochastic.

control as

U (t) =-1~(t)B' (t)K0 (t):R(tl|t)

- 1 (t)B'(t) f0K (t~)2(t +plt)dp, (4.38) -T where K 0 (t) and K1 (t,p) are defined as in (4.37) and i(sft) satisfies

(4.32) and (4.33).

In summary, we have shown in this section that stochastic control

for linear systems with quadratic criteria and classical information

patterns involves solving the integral equation (4.5). When the observa-

tions are equivalent to an innovations process, the linear filter which

produces the innovations from the observations is determined first and

then the integral equation (4.7) is solved. Moreover, solving (4.7) is

equivalent to solving the deterministic problem (4.8). Thus, we have a generalized separation theorem from which more familiar specialized results may be obtained. A comparison of the nonlinear integral equations

for the no sharing case, i.e. (3.35), with the separated control laws shows

the dramatic effect that a change in information pattern can have on a control problem.

6. See Alekal, et.al. [1] and Delfour and Mitter [17] for more details. 74

4.2. Decentralized Fixed-Structure Optimization

As we proved in Chapter III, the necessary conditions for decentral-

ized systems without partially nested information are a set of nonlinear

integral equations.. Therefore, in general we do not expect that the kernels H.. (t,s) satisfying these conditions are the impulse response matrices of linear systems with finite-dimensional state-space represen-

tations. And furthermore, as we noted in Section 3.3, even the assump-

tion that the system satisfies equations such as (2.17), (2.21), (2.22)

does not seem to lead to solutions of the integral equations H(t,s) which are separable, as is required for a finite-dimensional realization.

The difficulties associated with solving decentralized problems with dynamics (2.17) have been recognized in the control literature [14,36,42].7

One approach to designing suboptimal, but easily implementable, controllers is to constrain the controls to be the outputs of finite- dimensional linear systems with compensator-like structures. These fixed- structure models have been discussed, for example, by Rhodes and

Luenberger [42] for stochastic differential games, and by Chong and

Athans [14], by Looze, et.al. [36], and by Looze [37] for dynamic team problems.

In this section we relate the fixed-structure models to the general operator approach by showing that the fixed-structure constraint is equi- valent to a particular type of constraint on the H. . (t,s) matrices. Since it is typical of the fixed structure models, we analyze a slightly modi- fied version of the problem of Chong and Athans [14].

7. Looze [37] and Chong [15] consider a hierarchical approach to these problems.

K~ 75

The n-dimensional system dynamics and p.-dimensional observations 1 satisfy, respectively,

x(t) = A(t)x(t) + B (MuI(t) + B2(OU2(t), (4.39)

x(0) = x0 ' tE(O,T], and

yi(t) = C.(t)x(t) + 6O(t), i = 1,2, (4.40) where 1 and 02 are independent zero-mean white-noise processes and

(t,T) = 0 (t)6(t-T), i=1,2.

We assume the system has two control stations and station i has the information

(t) = {y(s), sc{O,t]J; 1 also, as usual, both controllers know the a priori covariance X of the initial state and ., i=1, 2. The cost function is

T 2 J = E f [x'(t)Q(t)x(t) + u!(t)R.(t)u.(t)]dt. (4.41) 0 i1 1 1

We constrain the controls to have the form

u.(t) = D.(t)2.(t), (4.42)

2 x.(t)= [A(t) + Z B.(t)D.(t)]2.(t) + 1 i~l1 1 1

G.(t)[y.(t) - C(t)2.(t)]

2 =[A(t) + EB.i(t)D.i(t) - G.i()C.i(t)]X".i(t) + G.i(t)y (t) i=l1 11 1 1 (4.43) =(0) 0, te[o,T], i=1,2,

8. The only difference between our problem and the one in [14] is that we assume the mean of the initial state is zero and we do not penalize separately the final state x(T) in J. 76 where x^.(t) is the output of an n-dimensional filter and D.(t) and G.(t) are matrices of appropriate dimensions which are to be chosen.9 For each i, (4.42) and (4.43) give a finite-dimensional state-space representation of a causal linear system with input process y and output process u .

Therefore if P. is the transition matrix of 1

A(t) + B 1 (t)D 1 (t) + B2(t)D 2 (t) - G (t)C (t) then the controller satisfies

t u1 (t) = f H.(ts)y.(s)ds, i=1,2, (4.44) where

H.(t,s) = D.(t)4\(t,s)G.(s). (4.45)

Equation (4.44) is exactly the form of control law we have been discussing in the general linear system model. In this case, however, we do not have the freedom to choose any square-integrable H.(t,s), but rather the kernel must satisfy the constraint (4.45). Hence, the optimization problem is to determine D. (t) and G. (t), i=1,2, so that the controls given by (4.44) minimize (4.41).

Let the vector m(t) be defined as m'(t) E [x'(t),x'(t)- '(t), x'(t) - 5 (t)]. It can be shown (see [14]) that m(t) satisfies the differential equation

m(t) = A(t)m(t) - B(t)G(t), (4.46)

m'(O) = [x6,x6,x6], te[0,T],

9. Notice that x" (t) and x (t) are unbiased estimates of x(t) for all u (t) and u2 (t). See CGong and Athans [14].

m~r~ 77

where A + B D + B2D -B D -BD 11 22 112 2

A(t) 0 A + BD2 G C -BD 2 2 1 1 2 2 0 -B D A+ B D 22 I

0 0 B(t) G (t) 0

0 G 2(t)

2 (t)

Then wi.th (D^ as the transition matrix of A, we have A

t m(t) = A(t,0)m(0) + A(tS)B(s)6(sds. (4.47) A ~ 0A

Using ( 4.42), we express the controls as

u(t)E fu(] = D(t)m(t), (4.48) u29W where

D 2 t 0 ( D2 () and thus,

t u(t) = D(t)PA(t,0)m(0) + f D(t)4 (ts)B(s)0(s)ds. (4.49)

Equation (4.49) is exactly (3.12) for the special case where H. . (t,s) ii 78

satisfies (4.45), i.e. (4.49) expresses the control process in terms of

the uncontrolled processes.

By using the results of Appendix I, we may write the cost function as T J = tr f (Q(t)Z (t,t) + R(t)E (t,t)dt, 0

=tr(QE + RE ) (4.50) x u where Z and Zuare the covariance operators of the state and control IC u processes, respectively, and the operators Q and R have, respectively,

the kernels

Q(t})6(t -- s)

[R(t) 01 [0R2 (t)J 6(t -s).

Expression (4.50) is just an alternative way of writing (3.13). If we define

Z =-M(t) E[m(t)m'(t)] m (t,t) B and the kernel of the operator I as

I(t,s) B [I,0,0]6(t - s)=B 16(t - s),

then (4.50) and the definition of m(t) imply (where D(t,s)BD(t)6(t-s))

J=tr(QIZ I + RDEZD) m T = f tr(Q(t)IM(t)I' + R(t)D(t)M(t)D'(t))dt, (4.51) 0

A10 which is equivalent to (43) in [14] with F B 0. Moreover, it is easy to

10. This can be shown by rearranging Q(t) in [14]. 79

check that M(t) satisfies the equation

A(t) = A(t)M(t) + M(t)A(t) + B(t) e (t)B(t), (4.52)

0. 0 0 M(O) = , 0 0 0

Equations (4.51) and (4.52) show that when there are fixed-structure con-

straints, the operator optimization problem specializes to minimizing

(4.51) subject to a matrix differential equation. Necessary conditions

for this may be obtained from the matrix minimum principle, as in [14].

In this section, by constraining the kernels of the control operators, we obtained the fixed-structure model. This operator viewpoint also motivates an alternative technique for finding finite-dimensional con- trollers. Consider the input-output relation

t u(t) = f H(t,s)y(s)ds. (4.53) 0

Then there exists a finite-dimensional representation of the system with m x n impulse response matrix H(t,s) if and only if

H(t,s) = H1(t)H2 (s), (4.54) where H and H are m x r and r x n matrices, respectively, and 1 2 r is a positive integer (see footnote 10 in Chapter III). Consequently, if

(4.53) represents the system generating the controls from the observations and H has the form (4.54), then we have a finite-dimensional controller.1 1

11. Choosing the form (4.54) and the integer r is an example of parameter- izing the impulse-response H(t,s). Not much research has been done on the nontrivial issues involved in choosing parameters. But see Looze et al. [36] for some further discussion. 80

Although we do not study the implications of constraints of the form

(4.54) here, this appears to be a promising area for future research. 81

CHAPTER V

Certainty Equivalent Solutions by

a Decentralized Innovations Approach

This chapter considers the team decision problem described by equations (2.26)-(2.30). We show the optimal solution satisfies a

certainty equivalence type separation theorem. By introducing a decen-

tralized innovations process, the integral equation characterizing the optimal solution is solved. When the state is generated by a finite- dimensional linear system, then the optimal decisions are generated recursively by a decentralized linear filter.

5.1. Separation of Estimation and Decision-making in a Team

In section 4.1, we proved the separation theorem of stochastic control for classical information patterns using the necessary and

sufficient optimality condition for partially nested systems. Although

the integral equation (4.5) is linear, in general such equations are difficult to solve. The key properties of this case which permitted us

to obtain the elegant separated solution were the equivalence between the

observations and the innovations process and the relation between the

Fredholm equation (4.7) and the deterministic optimal control problem.1

We now apply similar methods to study the class of team problems

described by (2.26)-(2.30). Because the information processes are un-

affected by the controls, these systems are partially nested. Therefore,

the optimal solutions are characterized by linear integral equations. 1. The relations between Fredholm equations with separable kernels (i.e. kernels which are the impulse response matrices of finite-dimensional linear systems), innovations processes, and Kalman filters are dis- cussed in Kailath [25]. 82

We prove the optimal solution may be separated into an estimation

part and a decision part. The estimates form a set of sufficient

statistics for a family of problems in the sense that, under decentral-

ized information, these estimates are inserted where known state appears

in the optimal decision rule under complete information, i.e. under

certainty. Thus, we have a certainty equivalence type separation theorem.

The estimation part of the problem is related to a least-squares problem

for which a decentralized version of the Wiener-Hopf equation character-

izes the solution. By generalizing the classical linear innovations

theorem, we solve the equation. Then we specialize to the case where the

state is generated by a finite-dimensional linear system of the form

(2.8) with B(t) = 0. Here the team estimates are generated by a set of

finite-dimensional recursive equations reminiscent of the Kalman-Bucy

filter. Finally, a second-guessing paradox is resolved. Most of the

results in this chapter are discussed by Barta and Sandell [8].

We recall the problem (2.26)-(2.30) is

T Min J = E f (u(t) - S(t)z(t))'Q(u(t) - S(t)z(t)) dt, (5.1) 0 where Q = Q' > 0, {z(T), 0 T TI is a zero-mean n-dimensional state process, S(t) is an M x n time-varying matrix, and {u(T), 0 K T TI is a decision process whose components u.(-) are restricted to be causal linear functions of an uncontrolled zero-mean p .-dimensional observation process

{y (T), 0 < T < TI. Thus, we have 0 Nttsou(.i=l,. ..,M. Note the solution of (5.1) minimizes the integrand at each time t.

T-- ~T 83

To write the cost function in the form (3.54), let the operators Q and S have the kernels Q6(t-s) and S(t)6(t-s), respectively. Then the cost (5.1) is

* ** J(G) = tr[S QSE + QGE G - 2QGE S ], (5.2) z y yz where G E diag[G1 ,...,GM] and G E C.., where G. is defined in 1 Mi 11 Chapter III.

Remark 5.1. If A. (i = 1,...,N) is an operator with a. x n.-dimen- sional kernel, then let

A 1 0

diag[A1 ,...,A ]" N be a partitioned operator with blocks A. on the diagonal and O's else- where so that diag[A1 ,...,AN] has an a x n-dimensional kernel, where n n a = Z a., n = E n.. An analogous definition holds if the A. are matrices. i=l i=l11 The optimality condition (3.58) becomes

- * * tr[(QGE - QSEZ )G ] = 0 (5.3) y yz for all GEk, i.e. for causal operators, with A' square-integrable kernels satisfying the information constraints. Equation (5.3) implies condition

(3.61) for this problem is

M _M n* Z q. .G.E - E E q S E = 0, (5.4) . 1.) J y.y. .kkj y.z. + j=1 3i j=1lk=l j z

i = 1,...,M, where the q.. are the elements of Q and the kernel of S . is S (t)6(t-s). 1T kj kj Thus, the set of integral equations (3.64) in this case is 84

M t n M E q. . G.(t,0)E (0,s)dG = Z ZE qflSU(t)Z (t,s) (5.5) j=l 0 j i 9=1 k=l kzy

i = ,...,, O s < t T. M Remark 5.2. If we let P = Z p. and in (5.3) let the kernel of G, i=l G(t,s), be any M x P matrix with square-integrable elements, rather than restrict G to have the diagonal form, then we have the necessary and sufficient condition for a problem with complete sharing of information, i.e. the classical information pattern. Under these conditions, applying

Lemma 3.2 to (5.3) yields

[QGZ - QSE ] = 0, y yz + which implies (since Q is invertible)

t _ G(t,0)E (0,s)dO = S(t)E' (t,s), 0 y yz 0 < s :St T, which is the nonstationary Wiener-Hopf equation (see [50, Ch. 6]) used for determining the optimal linear least-squares estimate of S(t)z(t) in terms of the data {y(s), 0 s ti.

Now let ffV', j=1,...,M, solve (5.5) when S (t) = 1 and all other J YTI -YBn elements of S(t) are zero. Then the G satisfy

i t E q..f G. (t,O)E (0,s)dO = q. E (ts), (5.6) j=l 0 y i1y z y.

i = ,. ., ,< s : t 5T

Y =1,.,M n =1..,n.

Hence, by the superposition principle, for any S(t) the solution to (5.5) is

111-1-1-1-17-wr 85

H n G.(t,s) = ZE S (t)GT.(ts). (5.7) -i y=lrrjl YTB j

Thus, at each time t, the set of random variables

t {G Ty.} = { f Gh (ts)y.(s)ds} JJY,31TI 0Y'"

is a set of sufficient statistics for agent j for problem (5.1).

We now prove the set of kernels {GTh(t,s)}. satisfies a Wiener-

Hopf type of integral equation. For agent j define the partitioned matrix

(t,s)1 W (1s)3 . I (t,sS) Mj where each block W. has n rows. Let row f(t=l,... n) in block 13

Y(Y=l,...,M) be equal to Grh. Also set J

!O(t, S) C6 1(t's),..., 9Mt')

Y(s) B diag[y 1 (s),...,3yM(s)]

X(s) B diag[z(s),...,z(s)], where X(s) has M blocks z(s) (see Remark 5.2). Finally, define the

notation

E[Y(t)QY' (s)] B Z (ts)

E[X(t)QY' (s)] B Z (ts)

Then it is straightforward to verify (using the fact that Q = Q') that

(5.6) is equivalent to

t / C(tO)ZyyQ(O,s)d6 = EXY;Q(ts). (5.8) 0 86

Equation (5.8) is a nonclassical version of the nonstationary Wiener-Hopf equation. Before solving this equation, we show that its solution is that of a certain least-squares team estimation problem. In view of the close connection between classical least-squares estimation theory and

Wiener-Hopf equations, it is not surprising that (5.8) is related to a quadratic team problem.

Define the AM x M partitioned matrix

d1 (t)

D(t) E dM(t) where d.(t) is an n x M matrix and column i (i=l,...,M) of D(t) is restricted to be a causal linear function of y (). Let the team loss function be

J (D(t)) = trE[(D(t) - X(t))Q(D(t) - X(t))']. (5.9)

By writing out (5.9), we find that row p (n=l,...,n) of block y(y=1,...,M) in the matrix D(t) corresponds to the decision vector u(t) in the team problem

T (u(t)) = E[(u(t) SY T hz(t))'Q(u(t) Min J 2 - - S z(t))], where S has a one in element (y,r) and zero elsewhere. Minimizing J is thus equivalent to solving nM separate team problems of the form

Min J . Because minimizing J is the same as minimizing the time integral of J 2', the necessary and sufficient conditions for Min J 2 for all y and p are (5.6). Hence (5.6) also characterizes the optimal solu- tion of Min J . 87

We now discuss an alternative form of the optimal solution charact- erization in which the idea of certainty equivalent decision rules as functions of a set of sufficient statistics appears naturally. Let Y be the space of all nM x M matrices whose elements are scalar random variables. Define

(D1 ,D2) E trE[D QD%] (5.10) for D.EW6 and note (5.10) is well-defined since we assume all random variables are second-order. It is routine to verify that with inner product (5.10) and norm hiDt|2 = (D,D), 9 is a Hilbert space.

Let 4(t) be the subspace of all D(t)Eg with column i a linear causal function of y.() By the Hilbert space projection theorem, we have the following characterization of the nM x M matrix X(t)e.d(t) which solves Min J (D(t)).

Theorem 5.1. The minimum of (D(t) - X(t),D(t) - X(t))

for D(t)Ed(t) is achieved by the unique X(t)Ed(t)

satisfying

(X(t) - X(t),D(t)) E trE[(X(t) - X(t))QD'(t)] = 0 (5.11)

for D(t)Ed(t).

Condition (5.11) is a generalization of the classical orthogonality condition that defines the linear regression estimate. Note that, in contrast to the classsical case, the estimate depends on Q.

Condition (5.11) also implies a further useful characterization of the optimal decisions. Because KD(t)Exd(t) for any real nM x M matrix

K, (5.11) implies

((X(t) - X(t), KD(t)) = 0 (5.12) 88

For all D(t)E4(t). Therefore definition (5.10) and equation (5.12) give

trE[(X(t) - X(t))Q(KD(t))'] = trE[K'(X(t) - X(t))QD'(t)] (5.13) where the first equality is a property of the trace operator. Since

(5.13) is true for all real K, we have

E[(X(t) - X(t))QD'(t)] = 0 (5.14) for D(t)E-zd(t). Note that X = ,Y, where Wq is the operator with kernel

t(t,s), and (5.14) implies

t E[(f '(t,O)Y(0) - X(t))QY'(s)] = 0 (5.15) 0 for all s < t, which is just (5.8).

If we interpret X(t) as a team estimate of X(t), we may prove a

certainty equivalence theorem.

Theorem 5.2. Let 0 be an nM x M matrix. Then the optimal

solution to the team problem

Min (D(t) - @X(t), D(t) - 4X(t)), (5.16)

where D(t)Ed(t) and X diag[z(t),...,z(t)] is CFX(t),

where X(t) solves min [J(D(t))D(t)E(t)].

Pf: A necessary and sufficient condition for D(t) to solve (5.16) is

(D(t) - bX(t), D(t)) = 0 (5.-17)

for all D(t)Ed(t). From (5.13), it is clear D = @X(t) satisfies (5.17). Q.E.D. To see that theorem 5.2 is a certainty equivalence result, simply

note that if X assumes a single value, say X0, with probability one, then

PX0 solves (5.17). This theorem generalizes the certainty equivalence

result of classical decision theory which states the solution of the 89 quadratic loss problem min E[(u-Gx)'(u-Gx)], where x is a random vector and u is the decision, is Gi, where X^ solves the estimation problem min E[u-x) ' (u-x)].

With Theorem 5.2 we may also solve the problem

Min J2(u(t)) = E[(u(t) - S(t)z(t))'Q(u(t)- S(t)z(t))], where S(t) and u.(t) are defined in (5.1). Recall that minimizing

J 2(u(t)) is equivalent to solving (5.1). Also recall that (5.15), i.e.

(5.8),is equivalent to (5.6). Thus, the team estimate interpretation of

X(t) provides further insight into the nature of the sufficient statistics

{GYy.}. which form the elements of X(t). SJ J,yTI Theorem 5.3. Let S.(t) be row j of S(t). Then the optimal

solution u to min J2(u(t)) is M u. (t) = Z S.(t). .(t), i=l,...,M, (5.18) 1 j=l

x .(t)- where is column i of X(t). (Note x (t) 31

x i(t )_

is an n-vector.

Pf: Setting =S(t) ESI(t),...,sM(t)l(5.19)

0 in (5.16) yields the objective

(D(t) - 0S(t)X(t), D(t) - t)X(t))20S = (5.20) E[(D (t) - S(t)z(t))'Q(D'(t) - S(t)z(t)) + Z D.(t)QD (t)], 1 1 i=2igs where D.i(t) is row i of D(t). The first term on the right-hand side of 90

(5.20) is J2(D'(t)).' By Theorem 5.2 the solution D(t) minimizing (5.20) 2 1 is

u 1(t)...uM4(t) D(t) = S(t) X(t) = (5.21) where U.i(t) is as defined (5.18). Q.E.D.

It is important to note that although we have a certainty equivalent solution for minimizing J1 (D(t)), if fu(t) solves the team problem min E[(u(t) - z(t))'Q(u(t) - z(t))], then in general the optimal solution minimizing J 2 (u(t))is not S(t)u(t).

P 91

5.2. Solution of the Estimation Problem

In this section we solve the decentralized Wiener-Hopf equation

(5.8). For the same reasons as in the classical case, this equation is hard to solve. For certain special cases, however, the solution to the classical Wiener-Hopf equation is very easy to find. For example, if the observation process is white noise, then there is a simple formula for the optimal estimating filter. Sometimes it is possible to replace the observations by the innovations process as we did in proving the classical separation theorem in Chapter III.

We will develop a decentralized innovations method to solve (5.8) and to do this we need an extended definition of white noise process.

Definition 5.1. Let V(T), 0 T < T be an N x M matrix

process. Then V(-) is a Q-white noise if and only if

Z (t)6(t-s), where Z (t) > 0. E[V(t)QV'(s)] = VV;Q VV;Q

Remark 5.3. Definition 5.1 is a nonrigorous definition as is the classical white noise definition. To be rigorous we define the integral t r(t) E f V(T)dT as a Q-orthogonal increments process, i.e. 0 tAs E[t(t)QY'(s)] = f E (r)dT, (5.22) 0 VV;Q where tAs E min(t,s). Although the subsequent arguments in this chapter are in terms of the Q-white noise, everything may be made rigorous by working with Q-orthogonal increments processes.

Next, we prove that if the team information is a Q-white noise,

then (5.8) is easy to solve. Let the team information process be the 92

Q-white noise process V(-) where column i of V(-) is team member i's information. Then let the optimal solution be

t X(t) = f U(t,T)V(T)dT. (5.23) 0 t Remark 5.4. Because Y(T) has the diagonal form, f (t,T)Y(T)dT 0 represents all linear functions of the data. Also because V(-) is not necessarily of the diagonal form, (5.23) is not the most general linear function such that column i of X(-) is only a function of the elements of column i of V(). For our purposes, however, (see Theorem 5.4, below),

(5.23) is sufficiently general.

As in (5.15) we derive

t E[f U(t,T)V(T)QV'(s)dT] = U(t,s)Evv;q(S)= E 1 (t s), (5.24) 0 where the first equality holds because V(-) is a Q-white noise.

Therefore if E (t) is nonsingular, then VV;Q U(t,s) = EXVQ(t,s)E A(s) and t X(t) = f (E5Q(tT)ES (T))V(T)dT. (5.25) 0

This formula resembles the classical linear least squares result except that our V(') is a Q-white noise process and the solution is the matrix

X(t) rather than a vector. Note also the integrand of (5.25) is a function only of a priori information, and column i of X(t) is a function only of column i of {V(T), 0 T t} as is required in the team problem.

Although (5.25) is a useful formula when the information is a

Q-white noise process, in most cases it will not have this property. 93

In some models the observations are equivalent, however, to a Q-white noise process in the sense that any linear function of the original ob- servations may be written as a linear function of the related process.

We call the related process a nonclassical innovations process.

In addition, this process may be obtained by passing the observations through a causal and causally invertible decentralized filter. In the decentralized filter, each agent of the team puts his own observations into a causal and causally invertible filter. Then the individual output processes form the columns of the nonclassical innovations process, which satisfies Definition 5.1.

In classical theory, the observations are often assumed to have the form y(t) = r(t) + 8(t), 0 < t < T, (5.26) where E[0(t)0'(s)] = 6 (t-s), E[r(t)r'(t)]< o, and past signal is uncorre- lated with future noise, i.e.

E[6(t)r'(s)] = 0, 0 S s < t T. (5.27)

To derive a nonclassical analogue of (5.26), let C (t) (i=1,...,M) be a

P. x n matrix and define the P x M matrix

X (t) = C(t)X(t), (5.28) N C where P 3 Z p., C(t) E diag[C1 (t),...,C(t)], and X(t) _=diag[z(t),..., i=l z (t)]. Also define the P x M matrix process

8(t) = diag[01 (t),...'m(t)], (5.29) where 6.(t) is a p.-dimensional process and

E[0(t)Q6'(s)] = (t)6(t-s), ZQQ(t) > 0, (5.30)

0 t,s c T. I I

94

Now let the team observation matrix Y(t) satisfy2

Y(t) = XC(t) + e(t), (5.31)

and assume future values of 0(-) are Q-orthogonal to past values of

X C(-), i.e. C E[0(t)QX'(s)] = 0, 0 S s < t . T. (5.32)

The interpretation of (5.31) is that team member i observes a noise

corrupted measurement of linear combinations of the state vector X(-).

Thus, (5.31) is equal to

Y.(t) = C.(t)z(t) + 0.(t), i=l., (5.33) S11 Let the team estimate of XC(t) given the information {Y(T),

0 < T S t} be denoted XC(t). Theorem 5.2 implies XC (t) = C(tX(t), where X(t) is the optimal linear estimate of X(t). Therefore,

t XC(t) = C(t) f W(t,T)Y(T)dt 0 t 71-f (t,T)Y(T)dT, (5.34) 0 where W(t,T) is defined by (5.8). See also (11.4), in Appendix II, the

equation for dj(t,T). We now define a nonclassical innovations process.

Definition 5.2. The nonclassical innovations process for the

team problem of minimizing J1 (D(t)) with data (5.31) is the

process V(t) = Y(t) - XC(t)

= C(t)(X(t)- X(t)) + 0(t). (5.35)

2. Note that it is possible to have a Z(t) process in (5.31) instead of Xc(t), where Z(t) is- related to the state process. For our purposes, however, we do not need this slightly more general formulation. 3. A sufficient condition for (5.32) is that future values of 0i(-) (i=1, ... ,M),are uncorrelated with past values of z(-). 95

The following theorem is a nonclassical version of the linear innovations theorem.

Theorem 5.4. The innovations process, defined by (5.35)

satisfies the following properties.

(i) E[V(t)QV'(s)] = %Q(t)6(t-s), (5.36)

whereE 06Q(t) is defined in (5.30);

(ii) {V(T), 0 T S t} = t{Y(T), 0 T < t}, (5.37)

where for a matrix process M(-), Z{M(T), 0 T ti is the set t of linear functions of the form f N(t,T)M(T)dT, for some 0 matrix kernel N(t,T).

Pf: Appendix II.

Condition (i) of the theorem says V(-) is a Q-white noise with the same

Q-covariance as the process noise 0(t). Condition (ii) says that any linear function of the observation process may also be written as a linear function of the innovations. The reason for this is that the operator R whose kernel is defined by (5.34) is causal and causally in- vertible, i.e. if we write in operator notation

V = (I - ), (5.38)

then (I - 8) is causal and

Y(Il )~1V. (5.39)

Figure 5.1 illustrates the structure of the optimal solution D(t) of the problem of minimizing (D(t) - 4(t)X(t), D(t) - G(t)X(t)).

Of course, Figure 5.1 is equivalent to M separate structures, each comput-

ing a column of X(t), as is required by the decentralized information I

96 constraint. Moreover, it is crucial to remember that to simplify the solution to the decentralized Wiener-Hopf equation, it is not sufficient for each team member to whiten his own observation process, i.e. to compute v.(t) = y.(t) - C.(t)z(t), (5.40) 1 1 1 where z(t) is the classical linear least-squares estimate based on y (-); the matrix process v(-) B diag[v1(-),(...V(] is not a Q-white noise.

Y M) + VM tX (M)D (M Z V U(t,-r)V(-r)d-r - D (t)

C(t) + -

Fig. 5.1. The optimal linear team decision structure.

But rather a joint whitening of the aggregate information process Y(t) must be carried out within a decentralized computational scheme.

The final topic of this section is a proof that when the state and information process are all jointly Gaussian, the optimal linear causal solution is optimal over the class of all (linear and nonlinear) causal solutions. To prove this let D(t) B D[Y(T), 0

{y (T), 0 < T < t}. We will show J (D(t)) 2 J(v()), where X(t) is the optimal linear estimate. Begin by writing 97

J (D(t)) - tr E[(D(t) - X(t))Q(D(t) - X(t))']

= tr E[(D(t) - 2(t)+ X(t) -X(t))Q(D(t)-(t)+X(t)-X(t))']

= tr E [(D(t) - X(t))Q(D(t) -X(t))

+ 2(2(t) - X(t))Q(D(t) -- (t))'] + J (2(t)). (5.41)

To prove

E[(X(t) - X(t))Q(D(t) - 2(t))'] = 0 (5.42) first note the orthogonality condition (5.15) implies

E[(X(t) - X(t))QX'(t)] = 0 (5.43)

Second, (5.15) and the Gaussian assumption imply each component of the vector process {yi(T), 0 T t} is independent of all elements of column i of the matrix (X(t) - X(t))Q. Therefore,

E[(X(t) - X(t))QD'(t)] = 0 (5.44) and

J (D(t)) = (D(t) - X(t), D(t) - X(t)) + J(X(t)) t J (X(t)). 111 (5.45) I

98

5.3. The Decentralized Kalman-Bucy Problem

By assuming X(t) solves a finite-dimensional linear stochastic differential equation, we derive a recursive formula for the team estimate X(t). Let z(t) solve the equation

z(t) = A(t)z(t) + ((t),

z(0) = z 0, 0 < t < T, (5.46) and t(t) satisfies

6 E[C(t)] = 0, E[t(t)C'(s)] = EU(t) (t-s)

Z (t) > 0, 0 < t,s < T, (5.47) and

Ez = 0, Ez z' Zz z > 0, E[C(t)zl] = 0. (5.48) 0 0 0 0[0

To put this into partitioned matrix form, define

Ed(t)diag[A(t),...,A(t)] (5.49)

E(t) E diag[C(t),..., (t)] (5.50) where d(t) and E(t) each have M blocks. Then we have

k(t) = 'd(t)X(t) + 3(t) (5.51)

X(0) = X0 , 0 < t < T, and

E[E(t)] = 0, E[E(t)QE'(s)] = Z (t)6(t-s) = (5.52) =(Q@I)diag[E (t).., ()6(t-s), and

EX0 = 0, E[E(t)QX ] = 0,

diag[EZ '' 0(5.53) E[X0 QXh] = ZXXQ = QOI where the Kronecker product Q0I satisfies 99

Q@I = (5.54) qM ... q I and I is the n x n identity.

The observations of team member i are

y.(t) = C.(t)z(t) + 0 (t), (5.55) where C.(t) is a p x n matrix and 0.(t) is a classical white noise satisfying

E[0.(t)] = 0, E[0.(t)06(s)] = E (t)6(t-s), I3

(t) > 0, E[0. (t)z'] = 0, E[0.(t)t'V(s)] = 0. (5;56) 116 0

In partitioned matrix form, we have

Y(t) = C(t)X(t) + 0(t), 0 < t < T, (5.57) where C(t) E diag[C (t),...,C (t)] and 0(t) diag[0 (t),...,0M(t)]. 1 ' M.IM E 1 The assumptions on 0.&) imply 0(-) is a Q-white noise and

E[0(t)] = 0, E[0(t)Q0'(s)] = X% (t)6(t-s),

E[0(t)QX'] = 0, (5.58) 0 where

q 1 1 % (t) 0 ... qlMZe 1 M(t) E =(5.59)eQ(t) q M1 5 (t) ... qZE' 0 (t)

and Z0 0 ;q(t) > 0 because Q > 0 and (5.56) holds. Furthermore our assump- tions also give

E[0(t)QX'(s)] = 0, 0 < s < t < T, (5.60) which is required in our innovations theorem. I I

100

To derive a recursive formula for X(t), we compute first the non-

classical innovations process and then use formula (5.25) to find the

decentralized filter. The innovations process is

V(t) = Y(t) - C(t)X(t)

=C(t)X(t) + 6(t), (5.61)

where X(t) B X(t) - X(t). The result is contained in the next theorem.

Theorem 5.5. Let X(t) and V(t) satisfy (5.51) and (5.61) respec-

tively. Define

S(t)E(t)C'(t)E4Q(t) (5.62)

and

E(t) E[X(t)QX'(t)]. (5.63)

Then the optimal linear team decision for (5.9) satisfies

X(t) = 4(t)X(t) + o((t)V(t), X(Q) = 0, (5.64)

or equivalently, the optimal decision of agent i, i.e.

column i of X(t), satisfies Aixi x (t) = sd(t)x^ (t) + X(t)[I.y.(t) - C(t)x (t)], (565(5.65)

M where I. = [0',...,I',...,0'] is a E Bp P x p.-dimensional i=1 partitioned matrix with block j (jji) a zero matrix with p.

rows and block i a p.-dimensional identity. Also E(t) solves

the Riccati equation

2(t) = Y(t)E(t) + Z(t) /'4(t) - (t)%E (t)t'(t)+Y.-3 ;Q(t), ee;Q =-,Q X(5.66) E(O) = E . X0 X ;

r7F1 - - --l 101

Pf: For this problem (5.25) implies t -1 X(t) = f E[X(t)QV(T)Z0 6 Q(T)V(T)dT. (5.67) 0 Differentiating (5.67) and using (5.51), we obtain

X(t) = E[X(t)QV(t)]4Q (t)V(t)

+ f E[X(t)QV(T)]Z4Q(T)V(T)dT

= E[X(t)QV(t)]Z4Q(t)V(t)

t -1 + d(t) f E[X(t)QV(T)]E;(T)V(T)dT 0

+ ft E[E(t)QV(T)]Z4Q(T)V(T)dT. (5.68) 0

The second term on the right-hand side of the second equality is

jj(t)X(t) and the last term is zero by the assumptions (5.56) and (') is a white process. For the first term we have

E[X(t)QV'(t)] = E[X(t)Q(X(t)C(t) + 6(t))']

= E[(X(t) + X(t))Q'(t)]C'(t) + 0

= 0 + E[X(t)QX'(t)]C'(t)

- E(t)C'(t) (5.69)

The second equality follows from (5.56) and we have the third equality because the error must be orthogonal to all linear functions of the data.

Thus,

E[X(t)QV'(t)]Z ;(t) = 6(t), (5.70) where X(t) is defined by (5.62).

4. This proof is very similar to the derivation of the classical Kalman- Bucy filter by the innovations method. See Kailath [24]. 102

To derive the Riccati equation, note

X(t) = [ 4(t) - x(t)C(t)]X(t) - X(t)O(t) + 3(t), k(0)= X0 , (5.71) which implies

E(t) B E[X(t)QX'(t)] = '(t,0)[E ]''(t,0)

t + f (tg)[ X(T)EX(T) + EZ (T)]'(t,T)dT, where (t,T) is the transition matrix defined by

V(t,T) = d(t)T(tT), T(0,0) = I. (5.73)

Differentiating (5.72) yields (5.66). Q.E.D.

Figure 5.2 illustrates the feedback structure of the decentralized optimal filter. A striking feature of the filter is that all agents' linear systems of the form (5.65) are, except for the input processes, identical. Because E(t) may be computed offline by solving the equation

(5.66), the gain &(t) is precomputable. Furthermore, because the optimal cost is

tr E[X(t)QX'(t)] = trE(t), (5.74) the performance is precomputable and does not depend on the data. There 5 are many results on the asymptotic behavior of Riccati equations, and although it is not considered in this thesis, the asymptotic behavior of the system (5.64) may be obtained from a study of (5.66).

5. See Kalman and Bucy [27].

I 4 s(t) + t) t )d t 83702AW051|

A(t)

Alx MI

+Ct(t) + +y(t) ++c M ,

a

Ai)

C(t) + z(M)dr )i e(t) C(t) O

Arnct) + + Aff+ o),AM~g t(x)(t C Ymt) ( +)x

cm~t) Im (t)

C(t)

Fig. 5.2. The decentralized Kalman-Bucy f ilter. I I

104

5.4. The Second-Guessing Paradox

A common explanation of why the optimal linear solution to decentral- ized linear dynamic optimization problems of the form (5.1) or (5.9) should consist of infinite-dimensional filters, in contrast with ;the finite-dimensional Kalman filters of the classical problem, is the second- 6 guessing phenomenon. Theorem 5.5 shows, however, the solution may be realized by a finite-dimensional filter. To resolve this apparent para- dox, we apply the second-guessing argument to our problem and then discuss why it does not hold.

Assume (5.9) is to be solved with only two agents in the team.

Let z(t) have dynamics (5.46) and the information be given by (5.55).

Since it will be convenient subsequently to use the familiar classical

LQG separation theorem, also assume all random varaibles are Gaussian.

Therefore, the optimal solution at time t for agent i (i=1,2) is the

2n-dimensional column vector x (t), i.e. column i of X(t), which is a function only of {y(T), 0 < T < t. The second-guessing argument begins by assuming x (t) is the output of a linear system, i.e. there exists some ni-dimensional vector x* such that

x (t) = L (t)x^4(t) + B.(t)y.(t)

t (t) = M. (t)x (t). (5.75)

6. Aoki [33 and Rhodes and Luenberger [42] discuss briefly second-guess- ing. Whittle and Rudge [52] use a frequency domain argument to prove the optimal solution to an LQG team problem has an irrational transfer function. In the case they consider there is feedback of the control into the state dynamics to give an information pattern that is not partially nested.

-T7 T 105

Figure 5.3 is a schematic diagram of the dynamic systems (5.46) and (5.75)

generating z(t) and ii(t), i=1,2. The solid curve encloses the system as

seen by agent 1 if agent 2's controls are fixed. Similarly the dashed

curve encloses the corresponding system seen by agent 2.

it M FILTER i

/ IM

M ZMt)SYSTIEM X -

- y'(t)

^2(t FILTER 2

Ai Fig. 5.3. The dynamic systems generating z(t) and x(t). This means that if we define

1 zt ct (t) 2 (5.76) 22

2 z (t) c (t) - (5.77)

then a necessary condition for optimality is that the output of filter 1,

x (t), be the optimal solution to the classical LQG problem with criterion

(5.9), controls d1(t) (first column of D(t)), state vector cI (t), and

observations y1 (.). Again there is the corresponding result for agent 2. 106

If these necessary conditions, which are called person-by-person optimal- ity, were violated, then at least one agent could unilaterally improve the cost.

Now by the classical separation theorem, we conclude the optimal control x̂^1(t) is a linear transformation of the components of the Kalman filter estimate of α^1(t) based on the information y_1(·). Hence, assuming the representation of (5.46) and (5.75) is controllable and observable, filter 1 has dimension equal to that of α^1(t), namely n_2 + n, which implies n_1 = n_2 + n. The same argument applied to agent 2 shows n_2 = n_1 + n. Thus, we have

n_1 = n_2 + n
n_2 = n_1 + n.   (5.78)

For positive n, there exist no real, finite n_1 and n_2 satisfying (5.78).
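For completeness, the contradiction can be exhibited in one line by adding the two equations in (5.78):

n_1 + n_2 = (n_2 + n) + (n_1 + n)  ⟹  0 = 2n,

which no positive n satisfies; each filter would have to be strictly larger than the other.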

Therefore, according to the second-guessing argument, (5.75) is an invalid assumption and the optimal solution is not generated by finite-dimensional filters. The reason given for this is that agent i estimates agent j's decision and agent j's estimate of i's estimate, and so on. Because of the infinite sequence of estimates of estimates, the optimal solution is infinite-dimensional. Thus, we seem to contradict Theorem 5.5.

When we examine the second-guessing "proof" closely, however, we find an important assumption is not rigorously justified. It is assumed that computing the person-by-person optimal solution requires an estimate of the complete state. What is actually needed, however, is only a particular linear transformation of the state estimate. We now show this linear transformation, and hence x̂^i(t), may be obtained from a filter of lower dimension than that of α^i(t).

7. A decision (u_1*, ..., u_M*) for the optimization problem min J(u_1, ..., u_M), u_i ∈ U_i, is person-by-person optimal if for each i (i = 1, ..., M), J(u_1*, ..., u_i*, ..., u_M*) ≤ J(u_1*, ..., u_i, ..., u_M*) for all u_i ∈ U_i.
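Footnote 7's definition is easy to test numerically. The sketch below checks person-by-person optimality of the jointly optimal decision in a toy static quadratic team; the matrices are our illustrative assumptions, not the thesis' system.

```python
import numpy as np

# Toy static quadratic team (illustrative, not the thesis' system):
# J(u) = u'Hu - 2b'u with H positive definite.
H = np.array([[2.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, -1.0])

def J(u):
    return u @ H @ u - 2 * b @ u

u_star = np.linalg.solve(H, b)       # jointly optimal decisions

# Person-by-person test: no unilateral deviation by either agent helps.
for i in range(2):
    for du in (-0.1, 0.1):
        u = u_star.copy()
        u[i] += du
        assert J(u) >= J(u_star)
print("person-by-person optimal:", u_star)
```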

To derive the person-by-person optimal solution, complete the square in (5.9) and alternately substitute the optimal decisions of agents 1 and 2 to get

J̃_1(d^1(t), x̂^2(t)) =

= E( q_1 {[d_11(t) − z(t) + a_1 x̂_12(t)]'[d_11(t) − z(t) + a_1 x̂_12(t)]

+ [d_21(t) + a_1(x̂_22(t) − z(t))]'[d_21(t) + a_1(x̂_22(t) − z(t))]}

+ (q_2 − q a_1)[x̂_12'(t) x̂_12(t) + (x̂_22(t) − z(t))'(x̂_22(t) − z(t))] )   (5.79)

and

J̃_1(x̂^1(t), d^2(t)) =

= E( q_2 {[d_12(t) + a_2(x̂_11(t) − z(t))]'[d_12(t) + a_2(x̂_11(t) − z(t))]

+ [d_22(t) − z(t) + a_2 x̂_21(t)]'[d_22(t) − z(t) + a_2 x̂_21(t)]}

+ (q_1 − q a_2)[(x̂_11(t) − z(t))'(x̂_11(t) − z(t)) + x̂_21'(t) x̂_21(t)] )   (5.80)

where q_11 ≜ q_1, q_22 ≜ q_2, q_12 = q_21 ≜ q, a_1 ≜ q/q_1, a_2 ≜ q/q_2, and x̂_ij(t), based on Y_t = {y(τ), 0 ≤ τ ≤ t}, is as in Theorem 5.3. If we define α̂^i(t) ≜ E[α^i(t) | y_i(τ), 0 ≤ τ ≤ t], then from (5.79) and (5.80), we conclude the optimal d^i(t), which is x̂^i(t), satisfies

x̂^i(t) = T_i α̂^i(t),   i = 1, 2,   (5.81)

where T_i are partitioned matrices defined by

T_1 ≜ [ I       −a_1 I   0
        a_1 I   0        −a_1 I ]   (5.82)

T_2 ≜ [ a_2 I   −a_2 I   0
        I       0        −a_2 I ]   (5.83)

and I is the n x n identity.

We have already characterized x̂^2(t) in Theorem 5.5 as the solution to a linear differential equation; and z(t) also satisfies such an equation. Therefore, to minimize J̃_1(d^1(t), x̂^2(t)), agent 1 may use a Kalman filter having the process y_1(·) as input to compute α̂^1(t) and then obtain x̂^1(t) by multiplying the filter output by T_1. Agent 2 minimizes J̃_1(x̂^1(t), d^2(t)) similarly. Before writing the filter equations, we assume Ξ_ij(t) = 0, i ≠ j, and define

E_i(t) ≜ C_i'(t) Ξ_i^{-1}(t) C_i(t),   i = 1, 2,   (5.84)

partition Σ(t), given by (5.66), as

Σ(t) = [ Σ_1(t)    Σ_3(t)
         Σ_3'(t)   Σ_2(t) ],   (5.85)

and also define the system matrices (derived from (5.46) and (5.65)) for the α^i(t):


𝒜_1(t) = [ A(t)            0                       0
           Σ_3(t)E_2(t)    A(t) − Σ_1(t)E_1(t)     −Σ_3(t)E_2(t)
           Σ_2(t)E_2(t)    −Σ_3'(t)E_1(t)          A(t) − Σ_2(t)E_2(t) ]   (5.86)

𝒜_2(t) = [ A(t)            0                       0
           Σ_1(t)E_1(t)    A(t) − Σ_1(t)E_1(t)     −Σ_3(t)E_2(t)
           Σ_3'(t)E_1(t)   −Σ_3'(t)E_1(t)          A(t) − Σ_2(t)E_2(t) ]   (5.87)

Also define

Q_1(t) = [ Σ_3(t)E_2(t)Σ_3'(t)    Σ_3(t)E_2(t)Σ_2(t)
           Σ_2(t)E_2(t)Σ_3'(t)    Σ_2(t)E_2(t)Σ_2(t) ]   (5.88)

Q_2(t) = [ Σ_1(t)E_1(t)Σ_1(t)     Σ_1(t)E_1(t)Σ_3(t)
           Σ_3'(t)E_1(t)Σ_1(t)    Σ_3'(t)E_1(t)Σ_3(t) ]   (5.89)

Let

Λ_i(t) ≜ [ Λ_i1(t)    Λ_i4(t)    Λ_i5(t)
           Λ_i4'(t)   Λ_i2(t)    Λ_i6(t)
           Λ_i5'(t)   Λ_i6'(t)   Λ_i3(t) ],   i = 1, 2,   (5.90)

satisfy the Riccati equation

Λ̇_i(t) = 𝒜_i(t)Λ_i(t) + Λ_i(t)𝒜_i'(t) − Λ_i(t) [ E_i(t)  0  0
                                                  0       0  0
                                                  0       0  0 ] Λ_i(t) + [ Ξ_0(t)   0
                                                                            0        Q_i(t) ],   (5.91)

where Ξ_0(t) denotes the intensity of the noise driving z(t) in (5.46).

Agent 1 determines his optimal decision from the linear system

α̂̇^1(t) = 𝒜_1(t)α̂^1(t) + Λ_1(t) [ C_1'(t)
                                    0
                                    0 ] Ξ_1^{-1}(t) [y_1(t) − [C_1(t), 0, 0] α̂^1(t)]

x̂^1(t) = T_1 α̂^1(t),   α̂^1(0) = 0,   (5.92)

while agent 2 determines his decision from

α̂̇^2(t) = 𝒜_2(t)α̂^2(t) + Λ_2(t) [ C_2'(t)
                                    0
                                    0 ] Ξ_2^{-1}(t) [y_2(t) − [C_2(t), 0, 0] α̂^2(t)]

x̂^2(t) = T_2 α̂^2(t),   α̂^2(0) = 0.   (5.93)

Systems (5.92) and (5.93) are each 3n-dimensional state variable representations of linear systems with inputs y_1(·) and y_2(·), respectively, and outputs x̂^1(·) and x̂^2(·), respectively. From linear system theory we know that equivalent representations of a system may be obtained from nonsingular transformations of the state variables. Thus, define the new states β^i(t) by

β^i(t) = [ T_i
           J_i ] α̂^i(t),   i = 1, 2,   (5.94)

where J_i is chosen to make the transformation nonsingular.

From (5.92) and (5.93), and using Lemma AIII.1 in Appendix III, we may derive the dynamic equations for β^i(t), namely


β̇^1(t) = [ A(t) − Σ_1(t)E_1(t)   −Σ_3(t)E_2(t)          0
            −Σ_3'(t)E_1(t)        A(t) − Σ_2(t)E_2(t)    0
            η_1(t)                η_2(t)                 η_3(t) ] β^1(t)

         + [ Σ_1(t)C_1'(t)Ξ_1^{-1}(t)
             Σ_3'(t)C_1'(t)Ξ_1^{-1}(t)
             η_4(t) ] y_1(t)

x̂^1(t) = [I  0] β^1(t),   β^1(0) = 0,   (5.95)

and

β̇^2(t) = [ A(t) − Σ_1(t)E_1(t)   −Σ_3(t)E_2(t)          0
            −Σ_3'(t)E_1(t)        A(t) − Σ_2(t)E_2(t)    0
            η_5(t)                η_6(t)                 η_7(t) ] β^2(t)

         + [ Σ_3(t)C_2'(t)Ξ_2^{-1}(t)
             Σ_2(t)C_2'(t)Ξ_2^{-1}(t)
             η_8(t) ] y_2(t)

x̂^2(t) = [I  0] β^2(t),   β^2(0) = 0,   (5.96)

where the η_i(t) represent elements which we will now prove are irrelevant to our problem.

In each of the above representations the first two n-dimensional states and the output are not affected by the third state. Therefore, the third state may be eliminated from the representation. Notice this state is unobservable. When it is eliminated, we get exactly the original solution (5.65) derived in Section 5.3. Thus, by a thorough analysis of the person-by-person optimal solution, we have resolved the second-guessing paradox for the class of decentralized linear systems in which the controls do not influence the dynamics.
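The elimination step can be illustrated numerically: in a block-triangular realization whose third state neither drives the first two states nor enters the output, truncating that state leaves the input-output map unchanged. The matrices below are illustrative stand-ins, not the Σ(t)E(t) blocks of (5.95)-(5.96).

```python
import numpy as np

# Illustrative blocks: the third state is driven by the first two but
# never feeds back into them (zero block) and never enters the output.
A11 = np.array([[-1.0, 0.2], [0.0, -2.0]])
A21, A22 = np.array([[0.3, -0.4]]), np.array([[-0.5]])
A = np.block([[A11, np.zeros((2, 1))], [A21, A22]])
B = np.array([[1.0], [0.5], [0.7]])
C = np.array([[1.0, -1.0, 0.0]])       # output ignores the third state

def impulse(A, B, C, T=5.0, n=2000):
    # forward-Euler propagation of the impulse response y(t) = C exp(At) B
    dt, x, ys = T / n, B.copy(), []
    for _ in range(n):
        ys.append((C @ x).item())
        x = x + dt * (A @ x)
    return np.array(ys)

y_full = impulse(A, B, C)
y_red = impulse(A11, B[:2], C[:, :2])  # third state truncated
print(np.allclose(y_full, y_red))      # True: identical input-output map
```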

As we indicated in footnote 6, this result does not imply the optimal solution to all decentralized LQG problems should have finite-dimensional sufficient statistics. Our discussion shows, however, that a careful rigorous analysis, rather than an informal justification, is required in applying the second-guessing argument to decentralized dynamic optimization problems.

CHAPTER VI

Delayed Sharing Information Patterns

This chapter is concerned with systems in which there is delayed exchange of all measurements between the stations. We prove a separation result for the control laws and then apply it to the optimization problem.

6.1. Separation of the Control Law

The results in Chapters III, IV, and V illustrate the importance of separation principles in stochastic control. As discussed in Chapter I, when the optimal control law separates into an estimation part and a decision part, valuable structural insight is gained which is useful in implementing either the optimal or some suboptimal control scheme. Moreover, if the estimation problem has a recursive solution, then large scale data reduction is possible as in the Kalman filter case.

In this chapter we discuss a separation result different from the division of the control law into an estimation and decision part. We show that for delayed sharing information patterns the control laws may be written in terms of the uncontrolled processes as the sum of a term linear in one operator and nonlinear in another. The linear term involves the shared information, while the nonlinear term involves information uncorrelated with the shared, i.e. common, information. When this form of the control law is substituted into the general performance index, some simplification results.

Other results are obtained in discrete-time for the one-step delay problem by Sandell and Athans [45], Kurtaran and Sivan [30], Toda [48], and Yoshikawa [60]. Witsenhausen [56] and Yoshikawa and Kobayashi [61] also discuss discrete-time separation. Varaiya and Walrand [51] and Yoshikawa and Kobayashi [61] show that a conjecture of Witsenhausen [56] for n-step delay problems is false. In section 6.2 we discuss the counterexample presented in [51].

Consider the two station control problem with observation equations

y_1 = y_1^0 + N_12 u_2   (6.1a)

y_2 = y_2^0 + N_21 u_1   (6.1b)

Let the criterion be (3.13). Suppose the information sets are

𝒴_i(t) = {y_i(s), s ∈ [0,t]; y_j(s), s ∈ [0, t−τ], j ≠ i},   (6.2)

i = 1, 2, t ∈ [0,T], where τ < T. If we define

𝒞(t) = {y_i(s), s ∈ [0, t−τ], i = 1, 2},   t ∈ [τ,T],   (6.3a)

𝒞(t) = ∅,   t ∈ [0,τ],   (6.3b)

where ∅ is the empty set, then 𝒞(t) is the common information at time t, i.e. the information both agents have. Define 𝒞^0(t) as

𝒞^0(t) = {y_i^0(s), s ∈ [0, t−τ], i = 1, 2},   t ∈ [τ,T],   (6.4a)

𝒞^0(t) = ∅,   t ∈ [0,τ].   (6.4b)

The controls may be chosen as

[ u_1 ] = [ H_11   H_12 ] [ y_1 ]
[ u_2 ]   [ H_21   H_22 ] [ y_2 ],

where1

H ≜ [ H_11   H_12
      H_21   H_22 ].

1. In this chapter we restrict attention to two station teams, although the results are easily generalized to M station problems.

Note the kernel of H_ij, i ≠ j, is of the form H_ij(t,s) = 0, s > t − τ.

We now prove 𝒞(t) is equivalent to 𝒞^0(t) in the sense of linear operations. Define the stochastic processes

z_i(t) = y_i(t−τ),   t > τ
z_i(t) = 0,          t ≤ τ

z_i^0(t) = y_i^0(t−τ),   t > τ
z_i^0(t) = 0,            t ≤ τ,   i = 1, 2

z(t) ≜ [ z_1(t)
         z_2(t) ].

Note that each agent's control may be a causal linear function of the process z(·).

Theorem 6.1. Let Z(t) and Z^0(t) be the sets of causal linear functions of z(·) and z^0(·), respectively. Then

Z(t) = Z^0(t).   (6.6)

Pf: From (6.1) we have that

z_1 = z_1^0 + N̄_12 u_2   (6.7a)

z_2 = z_2^0 + N̄_21 u_1,   (6.7b)

where the operator N̄_ij is defined by

g = N̄_ij u_j :   g(t) = ∫_0^{t−τ} N_ij(t−τ, θ) u_j(θ) dθ.   (6.8)

But the information constraints imply there exist linear operators T_1 and T_2 such that

N̄_12 u_2 = T_1 z   (6.9a)

N̄_21 u_1 = T_2 z,   (6.9b)

because {u_i(s), s ∈ [0, t−τ]} is a causal linear function of z(·). Therefore, (6.7) implies

z = z^0 + [ T_1
            T_2 ] z   (6.10)

and by the same arguments used in Theorem 3.4, we get Z(t) = Z^0(t). Q.E.D.
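The inversion implicit in the last step may be spelled out (this gloss is ours, using Theorem AI.1). Writing T ≜ [T_1 ; T_2], (6.10) gives z^0 = (I − T)z, so z^0(t) is a causal linear function of z(·). Conversely, since T is causal (its kernels vanish for s > t − τ), the Neumann series of Theorem AI.1 gives

z = (I − T)^{-1} z^0 = z^0 + T z^0 + T^2 z^0 + ⋯,

so z(t) is likewise a causal linear function of z^0(·), and the two families of causal linear functions coincide.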

Define the linear projection of y(p) into Z(t) by

ŷ(p|t) = E_L[y(p) | Z(t)]
       = E_L[y(p) | Z^0(t)],   (6.11)

p, t ∈ [0,T].

Since ŷ(p|t) is a linear estimate, there exists a kernel Γ_y(t,p,s) such that

ŷ(p|t) = ∫_0^t Γ_y(t,p,s) z^0(s) ds   (6.12)

or in operator notation

ŷ^t = Γ_y^t z^0,   (6.13)

where Γ_y(t,p,s) may be found by solving the nonstationary Wiener-Hopf equation (see [50, Ch. 6]). Notice that Γ_y^t is an operator, but it differs from the type of operators we have been considering since the kernel depends on three parameters rather than two. The parameter t in Γ_y(t,p,s) defines a family of operators, with one for each value of t.

We discuss the characterization of Γ_y later. If we let

ỹ(p|t) ≜ y(p) − (Γ_y^t z^0)(p),   p ∈ [0,T],   (6.14)

then by the orthogonal projection theorem, we have

E[z^0(s) ỹ'(p|t)] = 0,   s ∈ [0,t].   (6.15)

We may now express the control process as

u_i = H_i^1 z^0 + H_i^2 ỹ_i
    = H_i^1 z^0 + H_i^2 y_i − H_i^2 Γ_{y_i} z^0,   i = 1, 2,   (6.16)

or

u = H^1 z^0 + H^2 y − H^2 Γ_y z^0,   (6.17)

where the last two terms are defined as

(H^2 ỹ)(t) = (H^2 y)(t) − (H^2 Γ_y z^0)(t)
           = ∫_0^t H^2(t,p) ỹ(p|t) dp
           = ∫_0^t H^2(t,p)[y(p) − (Γ_y^t z^0)(p)] dp   (6.18)

and the operators H^1 and H^2 satisfy

H^1 = [ H_1^1
        H_2^1 ],   H^2 = [ H_1^2   0
                           0       H_2^2 ].

There is no loss of generality here since any causal linear function of the process y_i may be written in the form (6.16). Suppose, for example, we wanted to realize the control u_i = G_i^1 z^0 + G_i^2 y_i. Then in (6.16) simply set H_i^1 = G_i^1 + G_i^2 Γ_{y_i} and H_i^2 = G_i^2. In (6.17) we define the kernel of the operator H^2 Γ_y as

(H^2 Γ_y)(t,s) = ∫_0^t H^2(t,p) Γ_y(t,p,s) dp.   (6.19)

In this definition we have removed the "t" on Γ_y^t, but notice how t appears in (6.19). Also, when t ≥ τ, we have (6.18) equal to

(H^2 ỹ)(t) = ∫_{t−τ}^t H^2(t,p) ỹ(p|t) dp   (6.20)

since ỹ(p|t) = 0 for p ≤ t − τ.
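To confirm the no-loss-of-generality claim above, substitute H_i^1 = G_i^1 + G_i^2 Γ_{y_i} and H_i^2 = G_i^2 into (6.16):

u_i = (G_i^1 + G_i^2 Γ_{y_i}) z^0 + G_i^2 y_i − G_i^2 Γ_{y_i} z^0 = G_i^1 z^0 + G_i^2 y_i,

which is exactly the desired control.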

Although ỹ(p|t) depends on H^2, it is independent of the control laws used on data taken before t − τ, i.e. it does not depend on H^1. To see this, we derive an expression for the y process in terms of the uncontrolled process y^0. Equations (6.1) and (6.17) imply

y_1 = y_1^0 + N_12[H_2^1 z^0 + H_2^2 y_2 − H_2^2 Γ_{y_2} z^0]

y_2 = y_2^0 + N_21[H_1^1 z^0 + H_1^2 y_1 − H_1^2 Γ_{y_1} z^0]

or

y = y^0 + N H^1 z^0 + N H^2 y − N H^2 Γ_y z^0.   (6.21)

From (6.21), we obtain

y = (I − N H^2)^{-1}[y^0 + N H^1 z^0 − N H^2 Γ_y z^0].   (6.22)

Because all operators are causal, the process (I − N H^2)^{-1}[N H^1 z^0 − N H^2 Γ_y z^0] is contained in Z^0(t), i.e. at time t these terms depend only on {z^0(s), s ∈ [0,t]}. Thus, this process cancels out of ỹ(p|t). If we define E_2 as in (3.19), i.e.

E_2 ≜ (I − N H^2)^{-1},   (6.23)

then we have

ỹ(p|t) = (E_2 y^0)(p) − E_L[(E_2 y^0)(p) | Z^0(t)],   (6.24)

and we see ỹ(p|t) is independent of H^1.

To further characterize (6.24), we use the following lemma on mean-square estimation.

Lemma 6.1. Let p(·) and q(·) be vector stochastic processes on [0,T] and let A be an operator with either delta function kernel or square-integrable kernel. Suppose the process r(·) satisfies

r(t) = ∫_0^T A(t,s) p(s) ds,   t ∈ [0,T].

Define the linear minimum mean-square error estimates of r(·) and p(·) in terms of {q(s), s ∈ [0,t]} as

r̂(p|t) ≜ E_L[r(p) | q(s), s ∈ [0,t]]

p̂(p|t) ≜ E_L[p(p) | q(s), s ∈ [0,t]].

Then these estimates satisfy

r̂(p|t) = ∫_0^T A(p,s) p̂(s|t) ds   (6.25)

p̂(p|t) = ∫_0^t Γ_p(t,p,s) q(s) ds,   (6.26)

where the kernel of Γ_p satisfies

∫_0^t Γ_p(t,p,η) Σ_q(η,s) dη = Σ_pq(p,s),   s ≤ t.   (6.27)

The matrices Σ_q(η,s) and Σ_pq(p,s) are, respectively, the kernels of the covariance operator of q and the cross-covariance between p and q.

Pf: The orthogonality condition for the optimal estimate implies the necessary and sufficient condition (see [50, Ch. 6])

E[(p(p) − p̂(p|t)) q'(s)] = 0,   s ≤ t.   (6.28)

Thus, we have, by integrating (6.28),

0 = ∫_0^T A(θ,p) E[(p(p) − p̂(p|t)) q'(s)] dp
  = E[ ∫_0^T A(θ,p)(p(p) − p̂(p|t)) dp  q'(s) ]
  = E[ (r(θ) − ∫_0^T A(θ,p) p̂(p|t) dp) q'(s) ],   s ≤ t,   (6.29)

where we assume the expectation and integration may be interchanged (see [18, p. 431]). Since the orthogonality property characterizes the estimate, (6.29) implies (6.25). To prove (6.27), notice that (6.28) implies

E[ (p(p) − ∫_0^t Γ_p(t,p,η) q(η) dη) q'(s) ] = 0,

which is (6.27). Q.E.D.

Applying the lemma to (6.24) yields

ỹ(p|t) = (E_2 y^0)(p) − ∫_0^T E_2(p,s) ∫_0^t Γ_{y^0}(t,s,σ) z^0(σ) dσ ds,

or in operator notation

ỹ^t = E_2 y^0 − E_2 Γ_{y^0}^t z^0.   (6.30)

In (6.30) the operator Γ_{y^0}^t does not depend on the control laws because y^0 and z^0 are uncontrolled stochastic processes. Thus, we may write the controls as

u = H^1 z^0 + H^2 ỹ
  = H^1 z^0 + H^2 E_2 y^0 − H^2 E_2 Γ_{y^0} z^0,   (6.31)

where (H^2 E_2 Γ_{y^0} z^0)(t) is defined as

∫_{t−τ}^t H^2(t,p) ∫_0^T E_2(p,s) ∫_0^t Γ_{y^0}(t,s,σ) z^0(σ) dσ ds dp.   (6.32)

The lower limit on the p integral is t − τ since H^2 E_2 y^0 − H^2 E_2 Γ_{y^0} z^0 is the same as (6.20). Because E_2 is a nonlinear function of H^2, (6.32) is nonlinear in H^2, but it is linear in H^1. Using the definition F(H) = H(I − NH)^{-1}, we may write (6.31) as

u = H^1 z^0 + F(H^2) y^0 − F(H^2) Γ_{y^0} z^0   (6.33)
  = u^1 + u^2,

where u^1 ≜ H^1 z^0 and u^2 ≜ F(H^2) y^0 − F(H^2) Γ_{y^0} z^0. The processes u^1 and u^2 in (6.33) satisfy

E[u^1(t) u^2(t)'] = 0,   t ∈ [0,T],   (6.34)

but

E[u^1(t) u^2(s)'] ≠ 0,   (6.35)

if s < t. To see this, write

u^2(s) = ∫_0^s (F(H^2))(s,p_2) y^0(p_2) dp_2 − ∫_0^s (F(H^2))(s,p_2) ∫_0^s Γ_{y^0}(s,p_2,σ) z^0(σ) dσ dp_2

       = ∫_0^s (F(H^2))(s,p_2) ỹ^0(p_2|s) dp_2.   (6.36)

Then

E[u^1(t) u^2(s)'] = E[ ( ∫_0^t H^1(t,p_1) z^0(p_1) dp_1 ) ( ∫_0^s (F(H^2))(s,p_2) ỹ^0(p_2|s) dp_2 )' ]

 = ∫_0^t ∫_0^s H^1(t,p_1) E[z^0(p_1) ỹ^0(p_2|s)'] (F(H^2))(s,p_2)' dp_2 dp_1.   (6.37)

By the orthogonality condition,

E[z^0(p_1) ỹ^0(p_2|s)'] = 0,   p_1 ∈ [0,s].   (6.38)

If s < t, then some p_1 in (6.37) will not be contained in [0,s] and the integral need not be zero.


6.2. Application to Optimization

In this section we examine the optimization problem with the controls

(6.33) substituted into the cost function (3.13). From (3.13) we have

J = tr(Q Σ_x + Σ_{i=1}^2 R_i Σ_{u_i}).   (6.39)

Since

x = x^0 + Σ_{i=1}^2 K_i u_i = x^0 + K u,   (6.40)

the cost is

J = tr(Q Σ_{x^0} + 2 Q K Σ_{u x^0} + L Σ_u),   (6.41)

where L is as in (3.14). Define the functions

J^1(H^1) ≜ tr(Q Σ_{x^0} + 2 Q K H^1 Σ_{z^0 x^0} + L H^1 Σ_{z^0} (H^1)*)

J^2(H^2) ≜ tr(2 Q K [F(H^2) Σ_{y^0 x^0} − F(H^2) Γ_{y^0} Σ_{z^0 x^0}]

         + L [F(H^2) Σ_{y^0} F*(H^2) − 2 F(H^2) Γ_{y^0} Σ_{z^0 y^0} F*(H^2)

         + F(H^2) Γ_{y^0} Σ_{z^0} Γ*_{y^0} F*(H^2)])

J^3(H^1, H^2) ≜ 2 tr(L [F(H^2) Σ_{y^0 z^0} (H^1)* − F(H^2) Γ_{y^0} Σ_{z^0} (H^1)*]),   (6.42)

so that

J(H^1, H^2) = J^1(H^1) + J^2(H^2) + J^3(H^1, H^2).   (6.43)

The first term has the form of the cost function for the classical problem with the control a linear function of z^0(·). The second term J^2(H^2) has the general nonlinear decentralized feedback form. The final term J^3(H^1, H^2) couples F(H^2) and H^1. The reason the coupling is nonzero in general may be seen by writing

J^3 in the time domain. From (6.33) and (6.42), we have

J^3(H^1, H^2) = 2 tr(E[ ∫_0^T { ∫_0^T L(t,s) u^1(s) ds } u^2(t)' dt ])

             = 2 tr( ∫_0^T ∫_0^T L(t,s) E[u^1(s) u^2(t)'] ds dt ).   (6.44)

But for general L, (6.35) implies (6.44) is not zero.

If, however, L has a delta function kernel, such as

L(t,s) = L(t) δ(t − s),   (6.45)

then

J^3 = 2 tr( ∫_0^T L(t) E[u^1(t) u^2(t)'] dt ) = 0.   (6.46)

The definition of L in section 3.3 implies L has a delta kernel if the operator K has such a kernel, since R_i already has kernel R_i(t)δ(t − s). This corresponds to a cost

J = E ∫_0^T [x'(t)Q(t)x(t) + Σ_{i=1}^2 u_i'(t) R_i(t) u_i(t)] dt,   (6.47a)

where

x(t) = x^0(t) + K_1(t)u_1(t) + K_2(t)u_2(t).   (6.47b)

With J and x given by (6.47), J^3 = 0 and the problem of minimizing J(H^1, H^2) separates into two problems

min J^1(H^1)   (6.48a)

min J^2(H^2),   (6.48b)

where (6.48a) is a problem with classical information pattern and (6.48b) is a general team problem.

Although a controlled state does not appear in (6.47), we may still have such a state in the dynamics. So, for example, if the dynamic equation is (2.17), then the centralized problem has a structure similar to (4.21)-(4.27), while the decentralized problem is still nonlinear.

In the general case (6.43), since J(H^1, H^2) is quadratic in H^1, the integral equation characterizing the optimal solution will be linear in H^1, as in the partially nested case where the cost (3.54) is quadratic in the control operator. Moreover, if the original problem is partially nested, then the controls may be chosen (analogously to (6.31)) as

u = H^1 z^0 + H^2 y^0 − H^2 Γ_{y^0} z^0.   (6.49)

In this case the function J(H^1, H^2) is quadratic in H^1 and H^2, and a linear integral equation characterizes the optimal solution. Finally, if the cost has the same form as in Chapter V and the controls do not influence the observation processes, then J^3 = 0 and the results of Chapter V may be applied to solve the team problem (6.48b). When the state dynamics are finite-dimensional, then a predictor combined with the Kalman filter estimate (as in (4.27)) is a sufficient statistic for the centralized problem, while the results of section 5.3 give sufficient statistics for the decentralized problem (6.48b).

Thus, the simplifications we have obtained in the optimization problem for systems with delayed sharing information patterns are based on two transformations of the observation processes. In the first transformation we remove the control dependent terms from the common information so that when the control processes are expressed in terms of the uncontrolled processes, the control operator for the common information appears linearly. This results in quadratic terms replacing some of the general nonlinear terms in the cost function. The second transformation removes from each station's information the part that depends on the common information so that the controls are composed of two orthogonal terms. We found that the orthogonality property yields separation of the optimization problem into two independent problems when some restrictions are placed on the cost function.

The idea of decomposing information processes into orthogonal parts is an important tool for understanding decentralized control problems. As another example of this type of analysis, we consider a result of Varaiya and Walrand. In [51], they show with a two step case that a conjecture of Witsenhausen [56] for n-step delay problems is false. The conjecture states that without loss of generality the optimal control for station i, in a certain model with a state representation, may be chosen as a function of the conditional distribution of the state given the common information and the information available only to station i. Our analysis will provide insight into why the conjecture is false. The key point is that the projection of each station's individual observations onto the common information must be removed from the individual observations before the optimization is carried out.

In [51], the problem of finding the optimal controls in the counterexample is transformed to the team problem

min (1/2) E[(w − u − v)^2 + v^2]   (6.50)

s.t. u = u(y), v = v(y,w),

where u and v are the controls and the uncontrolled random variables y and w are Gaussian with zero mean and covariance

Σ = [ σ_w^2    σ_wy
      σ_yw     σ_y^2 ] = [ 1    .5
                           .5   1 ].   (6.51)

The unique optimal solution to (6.50) is

u(y) = (1/2) y,   v(y,w) = (1/2) w − (1/4) y.   (6.52)

According to the Witsenhausen conjecture, the optimal solution is found by solving not (6.50), but

min (1/2) E[(w − u − v)^2 + v^2]   (6.53)

s.t. u = u(y), v = v(w).

Because the solution (6.52) is unique, the conjecture is false.

In our projection approach we assert the solution may be expressed as

u = u_1 + u_2
v = v_1 + v_2,   (6.54)

where u_i and v_i solve the team problems

min (1/2) E[(w − u_1 − v_1)^2 + v_1^2]   (6.55a)

s.t. u_1 = u_1(y), v_1 = v_1(y),

and

min (1/2) E[(w̃ − u_2 − v_2)^2 + v_2^2]   (6.55b)

s.t. u_2 = u_2(ap), v_2 = v_2(w̃),

where "ap" indicates a priori information and

w̃ = w − E[w|y] = w − (1/2) y.   (6.56)

Notice that (6.55a) is a classical problem based on the common data y, while (6.55b) is a team problem in which the projection of w into the common information has been subtracted from w. Since u is only a function of y, u_2 is a trivial decision. Solving (6.55) yields the solutions

u_1(y) = (1/2) y,   v_1(y) = 0   (6.57a)

u_2 = 0,   v_2(w̃) = (1/2) w̃ = (1/2)(w − (1/2) y).   (6.57b)

Therefore u = u_1 + u_2 = (1/2) y and v = v_1 + v_2 = (1/2) w − (1/4) y, and we have obtained the correct optimal solution. Notice that E[w|y] = (1/2) y and the solution (6.57a) is a function of the conditional distribution of the "state" w, given the common information. Therefore a separation result holds after the noncommon information is made orthogonal to the common information.


CHAPTER VII

Conclusions and Suggestions

for Future Research

7.1. Summary and Conclusions

A model for linear control of interconnected linear stochastic systems with quadratic performance criteria was formulated and analyzed.

In Chapter II, an input-output model of a linear system with stochastic disturbances was presented. By coupling such linear systems together, an interconnected system model was obtained. Several examples of systems with classical and nonclassical information patterns illustrated the generality of the model. For example, various linear systems which may be described by state space representations turned out to be special cases of the general model.

In Chapter III, the relation between the inputs and outputs was described in terms of linear operators on a vector space. When the controls were restricted to be linear functions of the observations, the control laws were characterized by linear operators. The different information patterns were interpreted in terms of constraints on the kernels of these operators. Then the operator kernels were further constrained to be square-integrable. Although the square integrability condition implies some loss of generality, it is sufficiently general to include many models previously studied in the literature. Furthermore, it is a physically reasonable constraint. When the square integrability constraint was imposed, the existence and optimization problems became embedded in a Hilbert space framework. The great advantage of working with the Hilbert space is the powerful mathematical theory for these spaces. Thus, existence of solutions to the feedback equations was easy to prove.

The other main topic of Chapter III was the derivation of the necessary condition for optimality for general information patterns.

This condition was expressed as a set of nonlinear integral equations.

For partially nested information patterns, the necessary condition is also sufficient and equivalent to a set of linear integral equations.

The straightforward derivation of the optimality conditions not only illustrated the power of the operator approach, but also demonstrated the dramatic effect of different information patterns on the form of the optimization problem.

Also in this chapter the difficulties of proving that there exists a solution to the optimization problem were considered. But the addition of a quadratic operator penalty term to the cost function yielded an optimization problem for which the minimum was attained by an admissible control law.

Applications of the operator approach to some classical and nonclassical problems were discussed in Chapter IV. The classical separation theorems for systems described by differential and differential-delay equations were derived. For decentralized systems it was shown that constraining the controllers to have finite-dimensional state space representations is equivalent to a separability constraint on the kernels of the control operators. This idea motivated a more general formulation of the finite-dimensional constraint than had been previously considered in the literature.

1- ": 11"T 7 131

In Chapter V a team problem in which the controls did not influence the dynamics was separated into estimation and decision parts as is done in classical cases, and a certainty equivalent type separation theorem was proven. An equation similar to the nonstationary Wiener-Hopf equation characterized the team estimates. The solution to the decentralized Wiener-Hopf equation illustrated the importance of two themes from classical and nonclassical estimation theory. To simplify the estimation problem, the classical idea of replacing the original observation process by an equivalent white-noise process, the innovations, was used. It was not sufficient, however, for each station to compute the classical innovations process based on its information, but rather a joint whitening of the aggregate information process, in which each station took account of the other stations' decisions, was required.

A recursive formula similar to the Kalman-Bucy filter was derived for the case where the system was characterized by a finite-dimensional state space representation. The last section discussed the relation between the second-guessing phenomenon and the class of team problems considered in this chapter.

When there is delayed sharing of information, we showed in Chapter VI that the controls may be written in a separated form as the sum of functions of the uncontrolled stochastic processes. For certain cost functions the separated control law led to a separation of the optimization problem into a classical problem involving the common information and a decentralized problem involving information uncorrelated with the common information.

As an application of the idea of using uncorrelated information to separate the overall optimization problem, a relatively simple team problem was discussed.

The results of this thesis show clearly the value of the operator approach to quadratic optimization for linear stochastic systems.

The generality of the operator formulation permits modeling a large class of dynamic structures and a wide variety of information patterns.

Analysis of the mathematical optimization problem provides a method for developing decision and control algorithms for existing systems and gives insights which are important in the design of new systems.

7.2. Areas for Future Research

1. Since solving the optimality conditions involves, in some cases, solving a set of nonlinear integral equations, approximation techniques must be developed to satisfy computational constraints. Another type of approximation is designed to avoid implementation of an infinite-dimensional solution. Further study of finite-dimensional approximations to infinite-dimensional systems and the associated parameterization questions is needed.

2. Because the long-range behavior and stability properties of systems are of interest, another area for further work is analysis of the asymptotic behavior of decentralized systems. Furthermore, as in the case of the time-varying Riccati equation and the steady-state equation, it is sometimes easier to compute an infinite horizon, rather than a finite horizon, solution.

3. A comparison of the results of Chapters IV, V and VI indicates the remarkable effects that changing the information pattern can have on the optimization problem. To serve as an aid in the design of decentralized systems, a comparative study of the relationships between the information structure and the optimization problem is needed.

4. The classical Kalman-Bucy filter and the results in Chapter V show that significant data reduction is possible when sufficient statistics exist. The discussion in Chapters IV and V also demonstrated that the existence of sufficient statistics generated by a finite-dimensional system is closely related to separability of operator kernels. An examination of the classes of systems for which the necessary conditions imply a separable-kernel optimal solution should lead to a characterization of problems for which the optimal solutions may be expressed in terms of recursively generated finite-dimensional sufficient statistics.


Appendix I

Operators and Integral Equations

In this appendix we discuss briefly some results on linear operators and integral equations. For more details and proofs, see Balakrishnan [6], Luenberger [38], Riesz and Sz-Nagy [43], and Tricomi [49].

AI.1. Basic Concepts and Integral Equations

A transformation T mapping a vector space X into a vector space Y is called a linear operator if for all x_1, x_2 ∈ X and all scalars a_1, a_2, we have

T(a_1 x_1 + a_2 x_2) = a_1 T(x_1) + a_2 T(x_2).   (I.1)

Let L_2^n[0,T] be the space of n-dimensional vector functions such that each component is a Lebesgue measurable square-integrable function on [0,T]. Suppose M(t,s) is an m x n matrix function defined on the square [0,T] x [0,T] such that

tr ∫_0^T ∫_0^T M(t,s)M'(t,s) dt ds < ∞,   (I.2)

where tr is the trace operator for matrices. If f ∈ L_2^n[0,T], then we may define the linear operator M : L_2^n[0,T] → L_2^m[0,T] by

g(t) = ∫_0^T M(t,s)f(s) ds,   (I.3)

where g ∈ L_2^m[0,T]. The matrix function M(t,s) is called the kernel of M.

A streamlined notation for (I.3) is

g = Mf.   (I.4)

The adjoint operator M* is defined by

g = M*f :   g(t) = ∫_0^T M'(s,t)f(s) ds.   (I.5)

Two types of linear operators deserve special mention. Operators which are causal, i.e.

M(t,s) = 0,   s > t,   (I.6)

are called Volterra operators; in this case (I.3) may be written

g(t) = ∫_0^t M(t,s)f(s) ds.   (I.7)

In many physical situations we are interested in causal operators.

Another special operator is the identity, which has the kernel

I(t,s) = I δ(t − s),   (I.8)

where δ(t − s) is the Dirac delta function and I is the n x n identity matrix. Although (I.2) is not satisfied by I(t,s), we may still define the linear operator I by

g(t) = f(t) = ∫_0^T I δ(t − s)f(s) ds,   (I.9)

or f = If.

If we have two linear operators M_1 and M_2 satisfying (I.2) and we define the operator M = M_1 M_2 by

M(t,s) = ∫_0^T M_1(t,s_1)M_2(s_1,s) ds_1,   (I.10)

then M satisfies (I.2).


We now interpret Volterra integral equations in terms of linear operators. A Volterra integral equation of the second kind has the form

f(t) − ∫_0^t M(t,s)f(s) ds = g(t),   t ∈ [0,T],   (I.11)

where g ∈ L_2^n[0,T] is a given function and f is to be determined (see Tricomi [49]). We may write (I.11) in operator notation as

f − Mf = g.   (I.12)

The following result is an existence theorem for (I.11).

Theorem AI.1. If the n x n kernel satisfies (I.2), then the integral equation f − Mf = g has a solution f belonging to L_2^n[0,T] for every g ∈ L_2^n[0,T]. Furthermore f = (I − M)^{-1} g, where (I − M)^{-1} − I satisfies (I.2) and f may be obtained from the Neumann series

f = g + Mg + M^2 g + ⋯ .   (I.13)

Pf: See Balakrishnan [6, p. 103] or Tricomi [49, p. 10] for a proof.
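A minimal numerical sketch of the theorem, with an illustrative causal kernel of our own choosing: the partial sums of the Neumann series (I.13) converge to the directly computed solution.

```python
import numpy as np

# Solve f - Mf = g for an illustrative causal (Volterra) kernel by the
# Neumann series (I.13), and compare with a direct solve.
n, T = 200, 1.0
t = np.linspace(0.0, T, n)
d = t[1] - t[0]
M = 0.5 * np.exp(-(t[:, None] - t[None, :]))
M[np.triu_indices(n, 1)] = 0.0        # causal: M(t,s) = 0 for s > t
M *= d                                # quadrature weight
g = np.sin(2 * np.pi * t)

f, term = g.copy(), g.copy()
for _ in range(50):                   # partial sums g + Mg + M^2 g + ...
    term = M @ term
    f += term

f_direct = np.linalg.solve(np.eye(n) - M, g)
print(np.allclose(f, f_direct))       # True: the series has converged
```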

AI.2. Hilbert Spaces and Traces of Operators

By defining the inner product

⟨M_1, M_2⟩ = tr ∫_0^T ∫_0^T M_1(t,s)M_2'(t,s) dt ds,   (I.14)

we obtain the Hilbert space ℋ of operators which satisfy (I.2). This Hilbert space is isomorphic to the L_2 space of matrix functions M(t,s) defined on the square [0,T] x [0,T].

Let the operator A have a square matrix kernel A(t,s), i.e. A(t,s) is n x n. Now suppose {e_i}_{i=1}^∞ is an orthonormal basis for L_2^n[0,T]. If Σ_{i=1}^∞ [Ae_i, e_i] < ∞, where

[f,g] ≜ ∫_0^T f'(t)g(t) dt,

then we define the trace of A by

tr(A) ≜ Σ_{i=1}^∞ [Ae_i, e_i].   (I.15)

Moreover, (I.15) is independent of the basis chosen (see Balakrishnan [6, Ch. 3]). It may be proven that when tr(A) exists, we have

tr(A) = ∫_0^T A(t,t) dt.   (I.16)

We now derive an expression for tr(A), where A = MN* and M, N ∈ ℋ. The kernel of A is

A(t,s) = ∫_0^T M(t,s_1)N'(s,s_1) ds_1.   (I.17)

Therefore,

tr(A) = ∫_0^T A(t,t) dt = tr ∫_0^T ∫_0^T M(t,s_1)N'(t,s_1) ds_1 dt,   (I.18)

which is (I.14) with M_1 = M, M_2 = N.
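As a quick numerical illustration of (I.16) and (I.18) — our own check, with scalar illustrative kernels:

```python
import numpy as np

# Grid check of (I.16) and (I.18) for A = MN* with scalar kernels.
n, T = 300, 1.0
t = np.linspace(0.0, T, n)
d = t[1] - t[0]
M = np.cos(t[:, None] + t[None, :])
N = np.exp(-np.abs(t[:, None] - t[None, :]))

A = (M * d) @ N.T            # A(t,s) = integral of M(t,s1) N(s,s1) ds1
lhs = np.trace(A) * d        # integral of A(t,t) dt, cf. (I.16)
rhs = np.sum(M * N) * d * d  # the double integral of (I.18)/(I.14)
print(np.isclose(lhs, rhs))  # True
```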

Two useful properties of the trace are

tr(A_1 A_2) = tr(A_2 A_1)   (I.19)

and

tr(A) = tr(A*).   (I.20)

To define the trace of a partitioned operator, let the kernel of A be

A(t,s) = [ A_11(t,s) ... A_1r(t,s)
           .               .
           A_r1(t,s) ... A_rr(t,s) ],

where the matrices A_ii(t,s) are square and the A_ij(t,s), i ≠ j, are chosen so that A(t,s) is square. Then if tr(A_ii) exists for all i, we have

tr(A) ≜ Σ_{i=1}^r tr(A_ii).   (I.21)

AI.3. Covariance Operators

Let x(t), t ∈ [0,T], be a zero-mean n-dimensional second-order stochastic process. Define the kernel of the covariance operator Σ_x by

Σ_x(t,s) ≜ E[x(t)x'(s)].   (I.22)

We use either Σ_x or Σ_xx to denote the covariance operator of x. If y is also a zero-mean second-order vector process, we define the cross-covariance operator by

Σ_xy(t,s) ≜ E[x(t)y'(s)].   (I.23)

Note that since Σ_xy(t,s) = Σ_yx'(s,t), we have

Σ_xy = Σ_yx*.   (I.24)

A necessary and sufficient condition for the kernel Σ_x(t,s) to be continuous is that x(·) be a mean-square continuous process (see Wong [58, Ch. 2]).

If x(·) is mean-square continuous then

tr(Σ_x) = tr ∫_0^T E[x(t)x'(t)] dt = E ∫_0^T x'(t)x(t) dt   (I.25)

and if y(·) is a stochastic process defined by

y(t) = ∫_0^T M(t,s)x(s) ds

with M ∈ ℋ, then with x white-noise or mean-square continuous,

tr(Σ_y) = tr(M Σ_x M*).   (I.26)
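A Monte Carlo illustration of (I.26) with x white noise; the scalar kernel M(t,s) and the white-noise discretization E[x_i x_j] = δ_ij/d are our assumptions.

```python
import numpy as np

# Monte Carlo check of (I.26): tr(Sigma_y) = tr(M Sigma_x M*).
rng = np.random.default_rng(1)
n, T, trials = 100, 1.0, 4000
d = T / n
t = np.linspace(0.0, T, n)
M = np.sin(t[:, None]) * np.exp(-t[None, :])

acc = 0.0
for _ in range(trials):
    x = rng.standard_normal(n) / np.sqrt(d)   # white noise of unit intensity
    y = (M * d) @ x                           # y(t) = integral M(t,s) x(s) ds
    acc += np.sum(y * y) * d                  # integral y'(t) y(t) dt, cf. (I.25)
print(acc / trials, np.sum(M * M) * d * d)    # the two traces agree (up to MC error)
```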

APPENDIX II

The proof of Theorem 5.4 is based on a factorization theorem for operators on a Hilbert space. Balakrishnan [6, Ch. 3] presents a good discussion of this topic and Kailath [26] applies the theorem for operators on L2 [0,T] to prove the classical linear innovations theorem.

Because (5.8) is actually the same type of integral equation as the classical nonstationary Wiener-Hopf equation, it is not surprising the theory of solving both is the same. Thus, since the proofs are so similar, we only provide a sketch here.

Because Σ_θθ;Q(t) > 0, we have

Σ_θθ;Q^{-1/2}(t) Σ_θθ;Q(t) (Σ_θθ;Q^{-1/2}(t))' = I,   (II.1)

and the kernel Σ_θθ;Q^{-1/2}(t) Σ_{x_C x_C};Q(t,s) (Σ_θθ;Q^{-1/2}(s))' is real, symmetric, square integrable, and positive semidefinite. If we define the operator (with "*" denoting operator adjoint)

𝒲 ≜ Σ_θθ;Q^{-1/2} Σ_{x_C x_C};Q (Σ_θθ;Q^{-1/2})*   (II.2)

on L_2^P[0,T], where P = Σ_i p_i, then the factorization theorem implies

(I + 𝒲)^{-1} = (I − (Σ_θθ;Q^{-1/2} 𝒦 Σ_θθ;Q^{1/2})*)(I − Σ_θθ;Q^{-1/2} 𝒦 Σ_θθ;Q^{1/2}),   (II.3)

(5.8) for the model (5.31), i.e.

t f R(t,)ZXC (T,s)dT + *(ts)E (s) = EXCXC (ts). (11.4) 0 C C CC

Furthermore, the inverse (I - 66;Q 66;Q )~221- (I + S) exists. Thus,

(11/2 +)Is* )E1/2 1*- * VVQ I- 66;Q (I+s) (I+S6) (Z Q)Q )

2 2 2 * (E*-1/2 *) 1/2 * - z/ll2 (-1/ }iE1/ ) S)(I+s1/ OO 6Q6;Q 6S;Q) ( )( +S )(I-(E ) )6) 0

_E1/2 1/2 00;Q 66;Q

- (11.5)

Also,

YE;q (I+S) %9;Q2V. (11.6)

Hence, because the kernel of the operator I+S has the matrix form

(I+S)(t,T), any linear function of Y(-) may be written as a linear function

of V(-), as in (ii) of Theorem 5.4.


APPENDIX III

The lemma in this appendix relates Σ(t) to Λ_i(t), i = 1, 2. Let

𝒯 ≜ [ T_1
      T_2 ]   (III.1)

and

Φ ≜ [ I  0  0  0  0  0
      0  I  0  0  0  0 ],   (III.2)

where I is the n x n identity and the 0's are n x n zero matrices.

Lemma AIII.1: Let Π(t) be defined as

Π(t) = [ q_1 a_1(Λ_11(t) − a_1 Λ_14(t))   q_2 a_2(Λ_21(t) − a_2 Λ_24(t))
         q_1 a_1(Λ_11(t) − a_1 Λ_15(t))   q_2 a_2(Λ_21(t) − a_2 Λ_25(t)) ].   (III.3)

Then Σ(t) = Π(t), 0 ≤ t ≤ T.

Pf: At t = 0, we have x̂^1(0) = 0 and x̂^2(0) = 0. Since only the first component of α^i(0), i.e. z(0), is random, we have Λ_i(0) = diag[Σ_{z_0 z_0}, 0, 0]. Thus,

Π(0) = [ q_1 a_1 Σ_{z_0 z_0}   q_2 a_2 Σ_{z_0 z_0}
         q_1 a_1 Σ_{z_0 z_0}   q_2 a_2 Σ_{z_0 z_0} ] = Σ(0).   (III.4)

Next, from (5.91), we compute the differential equation satisfied by

Π(t) = Φ [ q_1 a_1 Λ_1(t)   0
           0                q_2 a_2 Λ_2(t) ] 𝒯'.   (III.5)

This calculation yields

Π̇(t) = 𝒜(t)Π(t) + Π(t)𝒜'(t) + (Σ(t) − Π(t)) [ E_1(t)   0
                                                0        E_2(t) ] Π(t)

       − [ q_1 a_1 Λ_11(t)E_1(t)   0
           0                       q_2 a_2 Λ_21(t)E_2(t) ] (Σ(t) − Π(t)).   (III.6)

Subtracting (III.6) from (5.66), we obtain, with ΔΣ(t) ≜ Σ(t) − Π(t),

ΔΣ̇(t) = 𝒜(t)ΔΣ(t) + ΔΣ(t)𝒜'(t) − ΔΣ(t) [ E_1(t)   0
                                            0        E_2(t) ] Π(t)

        + [ q_1 a_1 Λ_11(t)E_1(t)   0
            0                       q_2 a_2 Λ_21(t)E_2(t) ] ΔΣ(t).   (III.7)

Since ΔΣ(0) = 0 and (III.7) is linear and homogeneous in ΔΣ, (III.4) and (III.7) imply ΔΣ(t) = 0. Q.E.D.


REFERENCES

1. Y. Alekal, P. Brunovsky, D. Chyung, E. Lee, "The quadratic problem for systems with time delays," IEEE Trans. Automat. Contr., Vol. AC-16, pp. 673-687, Dec. 1971.

2. M. Aoki, Optimization of Stochastic Systems. New York: Academic Press, 1967.

3. M. Aoki, "On decentralized linear stochastic control problems with quadratic cost," IEEE Trans. Automat. Contr., Vol. AC-18, pp. 243-250, June 1973.

4. K. Åström, Introduction to Stochastic Control Theory. New York: Academic Press, 1970.

5. M. Athans, "The role and use of the stochastic linear-quadratic- Gaussian problem in control system design," IEEE Trans. Automat. Contr., Vol. AC-16, pp. 529-552, Dec. 1971.

6. A. Balakrishnan, Applied Functional Analysis. New York: Springer- Verlag, 1976.

7. Y. Bar-Shalom and E. Tse, "Dual effect, certainty equivalence, and separation in stochastic control," IEEE Trans. Automat. Contr., Vol. AC-19, pp. 494-500, Oct. 1974.

8. S. Barta and N. Sandell, "Certainty equivalent solutions of quadratic team problems by a decentralized innovations approach," M.I.T. Elec. Syst. Lab. Paper ESL-P-799, Feb. 1978; also submitted to IEEE Trans. Automat. Contr.

9. M. Beckmann, "Decision and team problems in airline reservations," Econometrica, Vol. 26, pp. 134-145, 1958.

10. R. Bellman, Dynamic Programming. Princeton, N.J.: Princeton Univer- sity Press, 1957.

11. V. Beneš, "Existence of optimal stochastic control laws," SIAM J. Contr., Vol. 9, pp. 446-472, 1971.

12. J. Bismut, "An example of interaction between information and control: The transparency of a game," IEEE Trans. Automat. Contr., Vol. AC-18, pp. 518-522, Oct. 1973.

13. C.-T. Chen, Introduction to Linear System Theory. New York: Holt, Rinehart, and Winston, 1970.

14. C. Y. Chong and M. Athans, "On the stochastic control of linear systems with different information sets," IEEE Trans. Automat. Contr., Vol. AC-16, pp. 423-430, Oct. 1971.

15. C. Y. Chong, On the Decentralized Control of Large-Scale Systems, Ph.D. Thesis, M.I.T., 1973; also M.I.T. Elec. Syst. Lab. Rep. ESL-R-503, June 1973.

16. M. Davis and P. Varaiya, "Information states in linear stochastic systems," J. Math. Anal. & Appl., Vol. 37, pp. 384-402, 1972.

17. M. Delfour and S. Mitter, "Controllability, observability, and feedback control of affine hereditary differential systems," SIAM J. Contr., Vol. 10, pp. 298-328, May 1972.

18. J. Doob, Stochastic Processes. New York: Wiley, 1953.

19. S. Dreyfus, Dynamic Programming and the Calculus of Variations. New York: Academic Press, 1965.

20. A. Fel'dbaum, Optimal Control Systems. New York: Academic Press, 1966.

21. T. Groves and R. Radner, "Allocation of resources in a team," J. of Econ. Theory, Vol. 4, pp. 415-441, 1972.

22. Y.-C. Ho and K.-C. Chu, "Team decision theory and information structures in optimal control problems - Part I," IEEE Trans. Automat. Contr., Vol. AC-17, pp. 15-22, Feb. 1972.

23. Y.-C. Ho, M. Kastner, E. Wong, "Teams, signaling, and information theory," IEEE Trans. Automat. Contr., Vol. AC-23, pp. 305-312, April 1978.

24. T. Kailath, "An innovations approach to least-squares estimation - Part I: linear filtering in additive white noise," IEEE Trans. Automat. Contr., Vol. AC-13, pp. 646-655, Dec. 1968.

25. T. Kailath, "Fredholm resolvents, Wiener-Hopf equations, and Riccati differential equations," IEEE Trans. Info. Th., Vol. IT-15, pp. 665-672, Nov. 1969.

26. T. Kailath, "A note on least squares estimation by the innovations method," SIAM J. Contr., Vol. 10, pp. 477-486, Aug. 1972.

27. R. Kalman and R. Bucy, "New results in linear filtering and prediction theory," Trans. ASME, J. Basic Engrg., Ser. D. Vol. 83, pp. 95-107, Dec. 1961.


28. D. Kleinman, "Optimal control of linear systems with time-delay and observation noise," IEEE Trans. Automat. Contr., Vol. AC-14, pp. 524-527, Oct. 1969.

29. C. Kriebel, "Quadratic teams, information economics, and aggregate planning decisions," Econometrica, Vol. 36, pp. 530-543, 1968.

30. B.-Z. Kurtaran and R. Sivan, "Linear-quadratic-Gaussian control with one-step delay sharing pattern," IEEE Trans. Automat. Contr., Vol. AC-19, pp. 571-574, Oct. 1974.

31. H. Kushner, Stochastic Stability and Control. New York: Academic Press, 1967.

32. R. Kwong and A. Willsky, "Optimal filtering and filter stability of linear stochastic delay systems," IEEE Trans. Automat. Contr., Vol. AC-22, pp. 196-201, April 1977.

33. W. Levine, T. Johnson, M. Athans, "Optimal limited state variable feedback controllers for linear systems," IEEE Trans. Automat. Contr., Vol. AC-16, pp. 785-793, Dec. 1971.

34. A. Lindquist, "Optimal control of linear stochastic systems with applications to time lag systems," Inform. Sci., Vol. 5, pp. 81-126, 1973.

35. A. Lindquist, "On feedback control of linear stochastic systems," SIAM J. Contr., Vol. 11, pp. 323-343, 1973.

36. D. Looze, P. Houpt, M. Athans, Dynamic stochastic control of freeway corridor systems, Vol. III: Dynamic centralized and decentralized control strategies, M.I.T. Elec. Syst. Lab. Rep. ESL-R-610, Aug. 1975.

37. D. Looze, Hierarchical Control and Decomposition of Decentralized Linear Stochastic Systems, Ph.D. Thesis, M.I.T., 1978.

38. D. Luenberger, Optimization by Vector Space Methods. New York: Wiley, 1969.

39. J. Marschak and R. Radner, The Economic Theory of Teams. New Haven, Conn.: Yale University Press, 1971.

40. C. McGuire, "Some team models of a sales organization," Manag. Sci., Vol. 7, pp. 101-130, 1961.

41. R. Radner, "Team decision problems," Ann. Math. Stat., Vol. 33, pp. 857-881, Sept. 1962.

42. I. Rhodes and D. Luenberger, "Stochastic differential games with constrained state estimators," IEEE Trans. Automat. Contr., Vol. AC-14, pp. 476-481, Oct. 1969.

43. F. Riesz and B. Sz-Nagy, Functional Analysis. New York: Frederick Ungar, 1955.

44. N. Sandell, Jr., Control of Finite-State, Finite-Memory Stochastic Systems, Ph.D. Thesis, M.I.T., 1974; also M.I.T. Elec. Syst. Lab. Rep. ESL-R-545, May 1974.

45. N. Sandell, Jr. and M. Athans, "Solution of some nonclassical LQG stochastic decision problems," IEEE Trans. Automat. Contr., Vol. AC-19, pp. 108-116, April 1974.

46. N. Sandell, Jr., P. Varaiya, M. Athans, M. Safonov, "Survey of decen- tralized control methods for large scale systems," IEEE Trans. Automat. Contr., Vol. AC-23, pp. 108-128, April 1978.

47. A. Segall, "Centralized and decentralized control schemes for Gauss- Poisson processes," IEEE Trans. Automat. Contr., Vol. AC-23, pp. 47-57, Feb. 1978.

48. M. Toda, Dynamic Team Decision Problems under Uncertainty, Ph.D. Thesis, Univ. California, Los Angeles, 1974.

49. F. Tricomi, Integral Equations. New York: Interscience, 1957.

50. H. Van Trees, Detection, Estimation, and Modulation Theory, Part I. New York: Wiley, 1968.

51. P. Varaiya and J. Walrand, "On delayed sharing patterns," IEEE Trans. Automat. Contr., to appear.

52. P. Whittle and J. Rudge, "The optimal linear solution of a symmetric team control problem," J. Appl. Prob., Vol. 11, pp. 377-381, 1974.

53. J. Willems, The Analysis of Feedback Systems. Cambridge, Mass.: M.I.T. Press, 1971.

54. H. Witsenhausen, "A counterexample in stochastic optimum control," SIAM J. Contr., Vol. 6, pp. 131-147, 1968.

55. H. Witsenhausen, "On information structures, feedback, and causality," SIAM J. Contr., Vol. 9, pp. 149-160, 1971.

56. H. Witsenhausen, "Separation of estimation and control for discrete- time systems," Proc. IEEE, Vol. 59, pp. 1557-1566, Sept. 1971.

57. H. Witsenhausen, "A standard form for sequential stochastic control," Math. Sys. Theory, Vol. 7, pp. 5-11, 1973.

58. E. Wong, Stochastic Processes in Information and Dynamical Systems. New York: McGraw-Hill, 1971.


59. W. Wonham, "On the separation theorem of stochastic control," SIAM J. Contr., Vol. 6, pp. 312-326, 1968.

60. T. Yoshikawa, "Dynamic programming approach to decentralized stochas- tic control problems," IEEE Trans. Automat. Contr., Vol. AC-20, pp. 796-797, Dec. 1975.

61. T. Yoshikawa and H. Kobayashi, "Separation of estimation and control for decentralized stochastic control systems," submitted to IFAC/78, Helsinki.

Biographical Note

Steven M. Barta was born in Paterson, New Jersey in 1952. He received a B.S. degree Summa Cum Laude in mathematics and physics from Yale University in 1973. In 1976 he received an S.M. degree in Electrical Engineering from the Massachusetts Institute of

Technology. He is a member of Phi Beta Kappa, Sigma Xi, and the

IEEE. His publications are "Stochastic Models of Price

Adjustment" (with ), Annals of Economic and Social

Measurement, Vol. 5, No. 3(1976); "Capital Accumulation in a

Stochastic Decentralized Economy," M.I.T. Electronic Systems

Laboratory Paper ESL-P-793, January, 1978 (submitted to the

Journal of Economic Theory); "Certainty Equivalent Solutions of Quadratic Team Problems by a Decentralized Innovations

Approach" (with Nils Sandell), M.I.T. Electronic Systems

Laboratory Paper ESL-P-799, February, 1978 (submitted to

IEEE Transactions on Automatic Control).