Optimal Control of Two-Player Systems With Output Feedback
Laurent Lessard, Member, IEEE, and Sanjay Lall, Fellow, IEEE

Abstract—In this article, we consider a fundamental decentral- ized optimal control problem, which we call the two-player prob- lem. Two subsystems are interconnected in a nested information pattern, and output feedback controllers must be designed for each subsystem. Several special cases of this architecture have previously been solved, such as the state-feedback case or the case where the dynamics of both systems are decoupled. In this paper, we present a detailed solution to the general case. The structure of the optimal decentralized controller is reminiscent of that of Fig. 1. Decentralized interconnection. the optimal centralized controller; each player must estimate the state of the system given their available information and apply static control policies to these estimates to compute the optimal controller. The previously solved cases benefit from a separation between estimation and control that allows the associated gains to be computed separately. This feature is not present in general, and some of the gains must be solved for simultaneously. We show that computing the required coupled estimation and control gains amounts to solving a small system of linear equations. Index Terms—Cooperative control, decentralized control, linear Fig. 2. General four-block plant with controller in feedback. systems, optimal control. the measurement y1 received by is the same as that received by . We give a precise definition of our chosen notion of I. INTRODUCTION stability in Section II-B and we further discuss our choice of ANY large-scale systems such as the internet, power stabilization framework in Section III. M grids, or teams of autonomous vehicles, can be viewed The feedback interconnection of Fig. 1 may be represented as a network of interconnected subsystems. A common feature using the linear fractional transformation shown in Fig. 2. of these applications is that subsystems must make control The information flow in this problem leads to a sparsity P K decisions with limited information. The hope is that despite the structure for both and in Fig. 2. Specifically, we find the decentralized nature of the system, global performance criteria following block-lower-triangular sparsity pattern: can be optimized. In this paper, we consider a particular class G K P 11 0 K 11 0 of decentralized control problems, illustrated in Fig. 1, and 22 = G G and = K K (1) develop the optimal linear controller in this framework. 21 22 21 22 G G Fig. 1 shows plants 1 and 2, with associated controllers while P , P , P are full in general. We assume P and C C C 11 12 21 1 and 2. Controller 1 receives measurements only from K are finite-dimensional continuous-time linear time-invariant G C G G 1, whereas 2 receives measurements from both 1 and 2. systems. The goal is to find an H -optimal controller K subject C G G 2 The actions of 1 affect both 1 and 2, whereas the actions to the constraint (1). We paraphrase our main result, found in C G of 2 only affect 2. In other words, information may only Theorem 6. Consider the state-space dynamics flow from left to right. We assume explicitly that the signal u1 G G x˙ A 0 x B 0 u entering 1 is the same as the signal u1 entering 2. A different 1 = 11 1 + 11 1 + w modeling assumption would be to assume that these signals x˙ 2 A21 A22 x2 B21 B22 u2 could be affected by separate noise. Similarly, we assume that y C 0 x 1 = 11 1 + v y2 C21 C22 x2 Manuscript received March 14, 2013; revised January 13, 2014 and July 25, 2014; accepted January 20, 2015. 
Date of publication February 9, 2015; date where w and v are Gaussian white noise. The objective is of current version July 24, 2015. This work was partly done while L. Lessard to minimize the quadratic cost of the standard LQG problem, was an LCCC postdoc at Lund University in Lund, Sweden. S. Lall is partially subject to the constraint that supported by the U.S. Air Force Office of Scientific Research (AFOSR) under grant MURI FA9550-10-1-0573. Recommended by Associate Editor L. Mirkin. Player 1 measures y1 and chooses u1 L. Lessard is with the Department of Mechanical Engineering, University of California, Berkeley, CA 94720 USA (e-mail: [email protected]). Player 2 measures y ,y and chooses u . S. Lall is with the Department of Electrical Engineering, and the Department 1 2 2 of Aeronautics and Astronautics, Stanford University, Stanford, CA 94305 USA (e-mail: [email protected]). Here we borrow the terminology from game theory, and envis- Digital Object Identifier 10.1109/TAC.2015.2400658 age separate players performing the actions u1 and u2. We will

0018-9286 © 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information. 2130 IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 60, NO. 8, AUGUST 2015 show that the optimal controller has the form [33]. One particularly relevant numerical approach is to use 00 vectorization, which converts the decentralized problem into an u = Kx| + (x| − x| ) y1,uζ HJ y,u y1,uζ equivalent centralized problem [21]. This conversion process results in a dramatic growth in state dimension, and so the where x|y1,uζ denotes the optimal estimate of x using the infor- method is computationally intensive and only feasible for small mation available to Player 1, and x|y,u is the optimal estimate problems. However, it does provide important insight into the of x using the information available Player 2. The matrices problem. Namely, it proves that there exists an optimal linear K, J, and H are determined from the solutions to Riccati and controller for the two-player problem considered herein that is Sylvester equations. The precise meaning of optimal estimate rational. These results are discussed in Section VI-A. and uζ will be explained in Section V. Our results therefore pro- A drawback of such numerical approaches is that they do not vide a type of separation principle for such problems. Questions provide an intuitive explanation for what the controller is doing; of separation are central to decentralized control, and very little there is no physical interpretation for the states of the controller. is known in the general case. Our results therefore also shed In the centralized case, we have such an interpretation. Specifi- some light on this important issue. Even though the controller cally, the controller consists of a Kalman filter whose states are is expressed in terms of optimal estimators, these do not all estimates of the states of the plant together with a static control evolve according to a standard Kalman filter (also known as gain that corresponds to the solution of an LQR problem. the Kalman-Bucy filter), since Player 1, for example, does not Recent structural results [1], [17] reveal that for general classes know u2. of delayed-sharing information patterns, the optimal control The main contribution of this paper is an explicit state-space policy depends on a summary of the information available and formula for the optimal controller, which was not previously a dynamic programming approach may be used to compute known. The realization we find is generically minimal, and it. There are general results also in the case where not all computing it is of comparable computational complexity to information is eventually available to all players [30]. However, computing the optimal centralized controller. The solution also these dynamic programming results do not appear to easily gives an intuitive interpretation for the states of the optimal translate to state-space formulae for linear systems. controller. For linear systems in which each player eventually has access The paper is organized as follows. In the remainder of the to all information, explicit formulae were found in [8]. A simple introduction, we give a brief history of decentralized control case with varying communication delay was treated in [15]. and the two-player problem in particular. 
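The vectorization idea mentioned above can be made concrete with a small numerical sketch. A block-lower-triangular constraint on a matrix-valued variable becomes an ordinary subspace constraint on its vectorization, which is what allows the structured problem to be recast as a centralized one. The dimensions, helper names, and the selection-matrix construction below are illustrative assumptions of ours, not taken from [21] or from this paper.

```python
# Illustrative sketch (not the paper's algorithm): the vectorization idea behind [21].
# A block-lower-triangular constraint on a matrix variable Q becomes a plain
# subspace constraint on vec(Q), via a selection matrix E with vec(Q) = E q.
import numpy as np

m = (2, 1)   # row block sizes (assumed example dimensions)
n = (1, 2)   # column block sizes

def lower_mask(m, n):
    """Boolean mask of the entries allowed to be nonzero in lower(R, m, n)."""
    M = np.zeros((sum(m), sum(n)), dtype=bool)
    M[:m[0], :n[0]] = True          # Q11 block
    M[m[0]:, :] = True              # [Q21 Q22] blocks
    return M

mask = lower_mask(m, n)
# Columns of the identity corresponding to the free entries (column-major vec).
E = np.eye(mask.size)[:, mask.flatten(order="F")]

# vec(A Q B) = (B^T kron A) vec(Q), so any affine map of Q becomes affine in q:
A = np.random.randn(3, sum(m)); B = np.random.randn(sum(n), 4)
Q = np.random.randn(sum(m), sum(n)) * mask
q = Q.flatten(order="F")[mask.flatten(order="F")]
lhs = (A @ Q @ B).flatten(order="F")
rhs = np.kron(B.T, A) @ E @ q
assert np.allclose(lhs, rhs)
```

The same identity, vec(AQB) = (Bᵀ ⊗ A)vec(Q), is also what drives the dramatic growth in state dimension mentioned above.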
Then, we cover Cases where plant data are only locally available have also been some background mathematics and give a formal statement studied [3], [16]. of the two-player optimal control problem. In Section III, we Certain special cases of the two-player problem have been characterize all structured stabilizing controllers, and show solved explicitly and clean physical interpretations have been that the two-player control problem can be expressed as an found for the states of the optimal controller. Most notably, the equivalent structured model-matching problem that is convex. state-feedback case admits an explicit state-space solution using In Section IV, we state our main result which contains the a spectral factorization approach [25]. This approach was also optimal controller formulae. We follow up with a discussion of used to address a case with partial output feedback, in which the structure and interpretation of this controller in Section V. there is output feedback for one player and state feedback for The subsequent Section VI gives a detailed proof of the main the other [26]. The work of [24] also provided a solution to the result. Finally, we conclude in Section VII. state-feedback case using the Möbius transform associated with the underlying poset. Certain special cases were also solved A. Prior Work in [6], which gave a method for splitting decentralized opti- mal control problems into multiple centralized problems. This If we consider the problem of Section I but remove the splitting approach addresses cases other than state-feedback, structural constraint (1), the problem becomes the well-studied including partial output-feedback, and dynamically decoupled classical H synthesis, solved for example in [34]. The optimal 2 problems. controller is then linear and has as many states as the plant. In this article, we address the general two-player output- The presence of structural constraints greatly complicates feedback problem. Our approach is perhaps closest technically the problem, and the resulting decentralized problem has been to the work of [25] using spectral factorization, but uses the outstanding since the seminal paper by Witsenhausen [29] factorization to split the problem in a different way, allowing in 1968. Witsenhausen posed a related problem for which a a solution of the general output-feedback problem. We also nonlinear controller strictly outperforms all linear policies. Not provide a meaningful interpretation of the states of the optimal all structural constraints lead to nonlinear optimal controllers, controller. This paper is a substantially more general version of however. For a broad class of decentralized control problems the invited paper [12] and the conference paper [13], where the there exists a linear optimal policy [4]. Of recent interest have special case of stable systems was considered. We also mention been classes of problems for which finding the optimal linear the related work [9] which addresses star-shaped systems in the controller amounts to solving a convex optimization problem stable case. [18], [20], [28]. The two-player problem considered in the present work is one such case. II. PRELIMINARIES Despite the benefit of convexity, the search space is infinite- dimensional since we must optimize over transfer functions. We use Z+ to denote the nonnegative integers. The imaginary Several numerical and analytical approaches for addressing unit is i, and we denote the imaginary axis by iR. 
decentralized optimal control exist, including [18], [19], [23]. A square matrix $A \in \mathbb{R}^{n\times n}$ is Hurwitz if all of its eigenvalues have

a strictly negative real part. The set $\mathcal{L}_2(i\mathbb{R})$, or simply $\mathcal{L}_2$, is a Hilbert space of Lebesgue measurable matrix-valued functions $F : i\mathbb{R} \to \mathbb{C}^{m\times n}$ with the inner product
$$\langle F, G\rangle = \frac{1}{2\pi}\int_{-\infty}^{\infty} \operatorname{trace}\bigl(F^{*}(i\omega)\,G(i\omega)\bigr)\,d\omega$$
Under these conditions, there exists a unique $X \in \mathbb{R}^{n\times n}$ satisfying (i). This $X$ is symmetric and positive semidefinite, and is called the stabilizing solution of the ARE. As a short form we will write $(X, K) = \mathrm{ARE}(A, B, C, D)$, where $K = -(D^{\mathsf T}D)^{-1}(B^{\mathsf T}X + D^{\mathsf T}C)$ is the associated gain.
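As a numerical aside, the short-form ARE above can be evaluated with standard solvers. The sketch below is ours and not part of the paper; it maps the ARE of item (i) onto SciPy's solve_continuous_are by taking Q = CᵀC, R = DᵀD, and S = CᵀD, a mapping we read off from the equation itself.

```python
# Sketch (not from the paper): computing (X, K) = ARE(A, B, C, D) numerically.
import numpy as np
from scipy.linalg import solve_continuous_are

def are_short_form(A, B, C, D):
    """Return the stabilizing solution X and gain K = -(D'D)^{-1}(B'X + D'C)."""
    Q = C.T @ C
    R = D.T @ D          # must be positive definite
    S = C.T @ D
    X = solve_continuous_are(A, B, Q, R, s=S)
    K = -np.linalg.solve(R, B.T @ X + D.T @ C)
    return X, K

# Example: a scalar system for which the assumptions hold trivially.
A = np.array([[-1.0]]); B = np.array([[1.0]])
C = np.array([[1.0], [0.0]]); D = np.array([[0.0], [1.0]])
X, K = are_short_form(A, B, C, D)
# A + B K should be Hurwitz:
assert np.all(np.linalg.eigvals(A + B @ K).real < 0)
```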

1/2 A. Block-Triangular Matrices such that the inner product induced norm F2 = F, F is Lm×n bounded. We will sometimes write 2 to be explicit about Due to the triangular structure of our problem, we also the matrix dimensions. As is standard, is a closed subspace require notation to describe sets of block-lower-triangular ma- of L2 with matrix functions analytic in the open right-half trices. To this end, suppose R is a commutative ring, m, n ∈ H⊥ H L Z2 ≥ plane. 2 is the orthogonal complement of 2 in 2. We write + and mi,ni 0. We define lower(R, m, n) to be the set Rp to denote the set of proper real rational transfer functions. of block-lower-triangular matrices with elements in R parti- We also use R as a prefix to modify other sets to indicate tioned according to the index sets m and n. That is, X ∈ the restriction to real rational functions. So RL2 is the set of lower(R, m, n) if and only if strictly proper rational transfer functions with no poles on the RH2 RL imaginary axis, and is the stable subspace of 2.The X11 0 mi×nj X = where Xij ∈ R . set of stable proper transfer functions is denoted RH∞.For X21 X22 the remainder of this paper, whenever we write G2, it will always be the case that G∈RH2.EveryG∈Rp has a state- We sometimes omit the indices and simply write lower(R). space realization T T We also define the matrices E1 =[I 0] and E2 =[0 I ] , A B − with dimensions to be inferred from context. For example, G = = D + C(sI − A) 1B ∈ R CD if we write XE1 where X lower( ,m,n), then we mean (n +n )×n E1 ∈ R 1 2 1 . When writing A ∈ lower(R), we allow for with the possibility that some of the blocks may be empty. For −AT CT example, if m1 =0then we encounter the trivial case where ∗ × G = lower(R, m, n)=Rm2 (n1+n2). −BT DT There is a correspondence between proper transfer functions ∗ where G is the conjugate transpose of G. If this realization G∈Rp and state-space realizations (A, B, C, D). The next is chosen to be stabilizable and detectable, then G∈RH∞ if result shows that a specialized form of this correspondence and only if A is Hurwitz, and G∈RH2 if and only if A is exists when G∈lower(Rp). Hurwitz and D =0. For a thorough introduction to these topics, Lemma 1: Suppose G∈lower(Rp,k,m), and a realization G ∈ Z2 see [34]. for is given by (A, B, C, D). Then there exists n + and The plant P∈Rp maps exogenous inputs w and actuator an invertible matrix T such that inputs u to regulated outputs z and measurements y. We seek a TAT−1 ∈ lower(R,n,n) TB ∈ lower(R,n,m) control law u = Ky where K∈Rp so that the closed-loop map −1 has some desirable properties. The closed-loop map is depicted CT ∈ lower(R,k,n) D ∈ lower(R,k,m). in Fig. 2, and corresponds to the equations Proof: Partition the state-space matrices according to the z P P w partition imposed by k and m. = 11 12 ,u= Ky. (2) y P21 P22 u ⎡ ⎤ A B B G 1 2 G 11 0 ⎣ ⎦ The closed-loop map from w to z is given by the lower = G G = C1 D11 0 . P K 21 22 linear fractional transform (LFT) defined by F( , ):= C2 D21 D22 −1 P11 + P12K(I −P22K) P21.IfK = F(J , Q) and J has a proper inverse, the LFT may be inverted according to Q = Note that we immediately have D ∈ lower(R,k,m).How- −1 Fu(J , K), where Fu denotes the upper LFT, defined as ever, A, B, and C need not have the desired structure. If M K M M K −M K −1M Fu( , ):= 22 + 21 (I 11 ) 12. k1 =0or m2 =0, then G12 is empty, the sparsity pattern is We will express the results of this paper in terms of the trivial, and any realization (A, B, C, D) will do. 
Suppose G12 solutions to algebraic Riccati equations (ARE) and recall here is non-empty and let T be the matrix that transforms G12 into the basic facts. If DTD>0, then the following are equivalent. Kalman canonical form. There are typically four blocks in such n×n (i) There exists X ∈ R such that a decomposition, but since G12 =0, there can be no modes that are both controllable and observable. Apply the same ATX + XA + CTC T -transformation to G, and obtain the realization − (XB + CTD)(DTD)−1(BTX + DTC)=0 ⎡ ⎤ Aco¯ 00 B11 0 − ⎢ A A 0 B 0 ⎥ and A − B(DTD) 1(BTX + DTC) is Hurwitz ⎢ 21 c¯o¯ 21 ⎥ G ⎢ A A A B B ⎥ A − iωI B = ⎢ 31 32 co¯ 31 co¯ ⎥ . (3) (ii) (A, B) is stabilizable and has full col- ⎣ ⎦ CD Cco¯ 00 D11 0 umn rank for all ω ∈ R. C21 C22 C23 D21 D22 2132 IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 60, NO. 8, AUGUST 2015
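The stabilizability and detectability parts of condition (ii) above, and the analogous per-block conditions that appear later in Lemma 2, can be checked numerically with a PBH-type rank test. The following sketch is our own illustration (helper names and example matrices are made up), and it does not check the full-rank-on-the-imaginary-axis condition.

```python
# Illustrative sketch (ours): PBH-type tests for stabilizability and detectability.
import numpy as np

def pbh_stabilizable(A, B, tol=1e-8):
    """(A, B) is stabilizable iff rank [A - lam*I, B] = n for every eigenvalue lam
    of A with nonnegative real part."""
    n = A.shape[0]
    for lam in np.linalg.eigvals(A):
        if lam.real >= -tol:
            if np.linalg.matrix_rank(np.hstack([A - lam * np.eye(n), B]), tol=tol) < n:
                return False
    return True

def pbh_detectable(C, A, tol=1e-8):
    return pbh_stabilizable(A.T, C.T, tol)

# Example (made-up blocks): check the diagonal subsystems separately.
A11, B11, C11 = np.array([[0.5]]), np.array([[1.0]]), np.array([[1.0]])
A22, B22, C22 = np.array([[-2.0]]), np.array([[1.0]]), np.array([[1.0]])
ok = all([pbh_stabilizable(A11, B11), pbh_detectable(C11, A11),
          pbh_stabilizable(A22, B22), pbh_detectable(C22, A22)])
```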

unclear which the relevant definition of decentralized stability is in practice. We therefore stick to the well-established notion of closed-loop stability used for centralized control systems. In this section, we provide a state-space characterization of stabilization when both the plant and controller have a block- lower-triangular structure. Specifically, we give necessary and sufficient conditions under which a block-lower-triangular sta- Fig. 3. Feedback loop with additional inputs and outputs for analysis of bilizing controller exists, and we provide a parameterization stabilization of two-input two-output systems. of all such controllers akin to the Youla parameterization [32]. Many of the results in this section appeared in [11] and similar This realization has the desired sparsity pattern, and we results also appeared in [18]. Throughout this section, we notice that there may be many admissible index sets n.For assume that the plant P satisfies example, the modes Ac¯o¯ can be included into either the A11 ⎡ ⎤ A B B block or the A22 block. Note that (3) is an admissible realization P P 1 2 even if some of the diagonal blocks of A are empty.  11 12 ⎣ ⎦ is a minimal P P = C1 0 D12 (5) Lemma 1 also holds for more general sparsity patterns [11]. 21 22 realization. C2 D21 0
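Lemma 1 guarantees that a structured realization exists. A simple constructive route, different from the Kalman-decomposition argument used in the proof, is to realize the first block column [G11; G21] and the block G22 separately and stack the two realizations; the result has the block-lower-triangular form of Lemma 1, although it need not be minimal. The function below is our own illustrative sketch.

```python
# Our illustrative construction (not the proof technique of Lemma 1): stack a
# realization of the first block column [G11; G21] with a realization of G22.
# The result has the structured (block-lower-triangular) form, though it may
# not be minimal.
import numpy as np

def structured_realization(col1, blk22, k1, m1):
    """col1 = (Af, Bf, Cf, Df) realizes [G11; G21]; blk22 = (As, Bs, Cs, Ds) realizes G22.
    k1, m1 are the sizes of the first output and input blocks."""
    Af, Bf, Cf, Df = col1
    As, Bs, Cs, Ds = blk22
    nf, ns = Af.shape[0], As.shape[0]
    m2, k2 = Bs.shape[1], Cs.shape[0]
    A = np.block([[Af, np.zeros((nf, ns))], [np.zeros((ns, nf)), As]])
    B = np.block([[Bf, np.zeros((nf, m2))], [np.zeros((ns, m1)), Bs]])
    C = np.block([[Cf[:k1, :], np.zeros((k1, ns))], [Cf[k1:, :], Cs]])
    D = np.block([[Df[:k1, :], np.zeros((k1, m2))], [Df[k1:, :], Ds]])
    return A, B, C, D
```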

B. Problem Statement We further assume that the P22 subsystem is block-lower- triangular. Since we have P22 ∈ lower(Rp), Lemma 1 allows We seek controllers K∈Rp such that when they are con- P∈R us to assume without loss of generality that the matrices A, B2, nected in feedback to a plant p as in Fig. 2, the plant and C have the form is stabilized. Consider the interconnection in Fig. 3. We say 2 that K stabilizes P if the transfer function (w, u1,u2) → A11 0 B11 0 C11 0 A := B2 := C2 := (z,y1,y2) is well-posed and stable. Well-posedness amounts to A21 A22 B21 B22 C21 C22 I −P22(∞)K(∞) being nonsingular. (6) × × × The problem addressed in this article may be formally stated where A ∈ Rni nj , B ∈ Rni mj and C ∈ Rki nj .The P∈R ij ij ij as follows. Suppose p is given. Further suppose that following result gives necessary and sufficient conditions under P ∈ lower(R ,k,m). The two-player problem is 22 p which a there exists a structured stabilizing controller. −1 P∈R P ∈ R minimize P11 + P12K(I −P22K) P21 Lemma 2: Suppose p and 22 lower( p,k,m). 2 Let (A, B, C, D) be a minimal realization of P that satisfies subject to K∈lower(R ,m,k) p (5) and (6). There exists K ∈ lower(R ,m,k) such that K K P 0 p 0 stabilizes . (4) stabilizes P if and only if both We will also make some additional standard assumptions. (i) (C11,A11,B11) is stabilizable and detectable, and We will assume that P11 and P22 are strictly proper, which (ii) (C22,A22,B22) is stabilizable and detectable. ensures that the interconnection of Fig. 3 is always well-posed, In this case, one such controller is and we will make some technical assumptions on P and 12 − P in order to guarantee the existence and uniqueness of the A + B2Kd + LdC2 Ld 21 K0 = (7) optimal controller. The first step in our solution to the two- Kd 0 player problem (4) is to deal with the stabilization constraint. This is the topic of the next section. K1 L1 where Kd := and Ld := and Ki and K2 L2 III. STABILIZATION OF TRIANGULAR SYSTEMS Li are chosen such that Aii + BiiKi and Aii + LiCii are Hurwitz for i =1, 2. Closed-loop stability for decentralized systems is a sub- Proof: (⇐=) Suppose that (i) and (ii) hold. Note that tle issue. In the centralized case, the core idea of pole-zero A + B2Kd and A + LdC2 are Hurwitz by construction, thus cancellation is ancient, and this was beautifully extended to (C2,A,B2) is stabilizable and detectable, and it is thus immedi- multivariable system in the Desoer—Chan theory of closed- ate that (7) is stabilizing. Due to the block-diagonal structure of loop stability [2], where the interconnection of the plant and Kd and Ld, it is straightforward to verify that K0 ∈ lower(Rp). controller in Fig. 3 is considered closed-loop stable if and only (=⇒) Suppose K0 ∈ lower(Rp,m,k) stabilizes P. Because (w, u ,u ) → (z,y ,y ) if the transfer function 1 2 1 2 is well-posed P is minimal, we must have that the realization (C2,A,B2) and stable. Other modeling assumptions are possible, where one is stabilizable and detectable. By Lemma 1, we may assume either includes different plant uncertainty or different injected that K0 has a minimal realization with AK ,BK ,CK ,DK ∈ and output signals. These assumptions would lead to different lower(R). The closed-loop generator A¯ is Hurwitz, where definitions of stability. For SISO systems, these two notions − were shown to be equivalent, and thus robustness to noise added A 0 B 0 I −D 1 0 C A¯ = + 2 K K . 
to communication signals between the plant and the controller 0 AK 0 BK 0 I C2 0 is equivalent to this type of robustness to plant modeling error. Several works have proposed extensions of these ideas to It follows that decentralized systems, including [27], [31]. However, it is as yet A 0 B 0 poorly understood exactly what the correspondence is between , 2 is stabilizable plant uncertainty and signal uncertainty, and also it remains 0 AK 0 BK LESSARD AND LALL: OPTIMAL CONTROL OF TWO-PLAYER SYSTEMS WITH OUTPUT FEEDBACK 2133

and hence by the PBH test, (A, B2) and (AK ,BK ) are stabiliz- Proof: If we relax the constraint that Q be lower trian- able and similarly (C2,A) and (CK ,AK ) are detectable. gular, then this is the standard parameterization of all central- Each block of A¯ is block-lower-triangular. Viewing A¯ as a ized stabilizing controllers [34, Theorem 12.8]. It suffices to block 4 × 4 matrix, transform A¯ using a matrix T that permutes show that the map from Q to K and its inverse are structure- −1 states 2 and 3. Now T AT¯ is Hurwitz implies that the two preserving. Since each block of the state-space realization of Jd 2 × 2 blocks on the diagonal are Hurwitz. But these diagonal is in lower(R),wehave(Jd)ij ∈ lower(Rp). Thus F(Jd, ·) blocks are precisely the closed-loop A¯ matrices corresponding preserves lower triangularity on its domain. This also holds for J −1 · to the 11 and 22 subsystems. Applying the PBH argument to the inverse map Fu( d , ), since −1 ¯ the diagonal blocks of T AT implies that (Cii,Aii,Bii) is ⎡ ⎤  stabilizable and detectable for i =1, 2 as desired. A B2 −Ld Note that the centralized characterization of stabilization, J −1 ⎣ ⎦ d = C2 0 I . as in [34, Lemma 12.1], only requires that (C2,A,B2) be −Kd I 0 stabilizable and detectable. The conditions in Lemma 2 are  stronger because of the additional structural constraint. As in the centralized case, this parameterization of stabilizing We may also characterize the stabilizability of just the 22 controllers allows us to rewrite the closed-loop map in terms block, which we state as a corollary. of Q. After some simplification, we obtain F (P, K)=T + Corollary 3: Suppose P ∈ lower(R ,k,m), and let  11 22 p T QT where T has the realization (A, B ,C ,D ) be a minimal realization of P that satisfies 12 21 2 2 22 22 ⎡ ⎤ (6). There exists K0 ∈ lower(Rp,m,k) such that K0 stabilizes − AKd B2Kd B1 B2 P22 if and only if both ⎢ ⎥ T11 T12 ⎢ 0 ALd BLd 0 ⎥ (i) (C ,A ,B ) is stabilizable and detectable, and T = ⎣ ⎦ (9) 11 11 11 21 0 CKd −D12Kd 0 D12 (ii) (C22,A22,B22) is stabilizable and detectable. 0 C2 D21 0 Indeed, there exist block-lower transfer matrices that cannot be stabilized by a block-lower controller. For example and we have used the following shorthand notation: ⎡ ⎤ −10 0 10 AKd := A + B2Kd ALd := A + LdC2 ⎢ 010 10⎥ 1 0 ⎢ ⎥ s+1 ⎢ − ⎥ CKd := C1 + D12Kd BLd := B1 + LdD21. (10) P22 = = ⎢ 00 1 01⎥ . 1 1 ⎣ ⎦ s−1 s+1 100 00 011 00 Combining the results above gives the following important equiv- alence between the two-player output-feedback problem (4) The above realization is minimal, but the grouping of states and a structured model-matching problem. into blocks is not unique. We may group the unstable mode Corollary 5: Suppose the conditions of Theorem 4 hold. Q either in the A11 block or in the A22 block, which corre- Then opt is a minimizer for sponds to n =(2, 1) or n =(1, 2), respectively. The former leads to an undetectable (C11,A11) while the latter leads to T T QT  minimize 11 + 12 21 2 an unstabilizable (A22,B22). By Corollary 3, this plant cannot be stabilized by a block-lower-triangular controller. However, subject to Q∈lower(RH∞) (11) centralized stabilizing controllers exist due to the minimality of the realization. if and only if Kopt = F(Jd, Qopt) is a minimizer for the the P Note that a stabilizable 22 may have an off-diagonal block two-player output-feedback problem (4). Here Jd is given by that is unstable. An example is (8), and T is defined in (9). Furthermore, Qopt is unique if and only if Kopt is unique. 
1 0 −20 P = s−1 K = . Corollary 5 gives an equivalence between the two-player 22 1 1 0 00 s−1 s+1 output-feedback problem (4) and the two-player stable model- matching problem (11). The new formulation should be easier We now treat the parameterization of all stabilizing controllers. to solve than the output-feedback version because it is convex The following result was proved in [18]. and the Tij are stable. However, its solution is still not straight- Theorem 4: Suppose the conditions of Lemma 2 hold, and forward, because the problem remains infinite-dimensional and (C11,A11,B11) and (C22,A22,B22) are both stabilizable and there is a structural constraint on Q. detectable. Define Kd and Ld as in Lemma 2. The set of all K∈lower(Rp,m,k) that stabilize P is IV. MAIN RESULT {F(Jd, Q)|Q ∈ lower(RH∞,m,k)} In this section, we present our main result: a solution to where the two-player output-feedback problem. First, we state our ⎡ ⎤ assumptions and list the equations that must be solved. We A + B2Kd + LdC2 −Ld B2 assume the plant satisfies (5) and the matrices A, B2, C2 have J = ⎣ ⎦ . (8) d Kd 0 I the form (6). We further assume that A11 and A22 have non- − C2 I 0 empty dimensions, so ni =0 . This avoids trivial special cases 2134 IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 60, NO. 8, AUGUST 2015

and allows us to streamline the results. To ease notation, we define the following cost and covariance matrices:
$$\begin{bmatrix} Q & S \\ S^{\mathsf T} & R \end{bmatrix} := \begin{bmatrix} C_1^{\mathsf T}C_1 & C_1^{\mathsf T}D_{12} \\ D_{12}^{\mathsf T}C_1 & D_{12}^{\mathsf T}D_{12} \end{bmatrix} = \begin{bmatrix} C_1^{\mathsf T} \\ D_{12}^{\mathsf T} \end{bmatrix}\begin{bmatrix} C_1 & D_{12} \end{bmatrix}$$
$$\begin{bmatrix} W & U^{\mathsf T} \\ U & V \end{bmatrix} := \begin{bmatrix} B_1B_1^{\mathsf T} & B_1D_{21}^{\mathsf T} \\ D_{21}B_1^{\mathsf T} & D_{21}D_{21}^{\mathsf T} \end{bmatrix} = \begin{bmatrix} B_1 \\ D_{21} \end{bmatrix}\begin{bmatrix} B_1^{\mathsf T} & D_{21}^{\mathsf T} \end{bmatrix} \qquad (12)$$
Our main assumptions are as follows.
A1) $D_{12}^{\mathsf T}D_{12} > 0$
A2) $(A_{11}, B_{11})$ and $(A_{22}, B_{22})$ are stabilizable
A3) $\begin{bmatrix} A - i\omega I & B_2 \\ C_1 & D_{12} \end{bmatrix}$ has full column rank for all $\omega \in \mathbb{R}$
A4) $D_{21}D_{21}^{\mathsf T} > 0$
A5) $(C_{11}, A_{11})$ and $(C_{22}, A_{22})$ are detectable
A6) $\begin{bmatrix} A - i\omega I & B_1 \\ C_2 & D_{21} \end{bmatrix}$ has full row rank for all $\omega \in \mathbb{R}$
We will also require the solutions to four AREs
$$\begin{aligned} (X, K) &= \mathrm{ARE}(A, B_2, C_1, D_{12}) \\ (Y, L^{\mathsf T}) &= \mathrm{ARE}\bigl(A^{\mathsf T}, C_2^{\mathsf T}, B_1^{\mathsf T}, D_{21}^{\mathsf T}\bigr) \\ (\tilde X, J) &= \mathrm{ARE}(A_{22}, B_{22}, C_1E_2, D_{12}E_2) \\ (\tilde Y, M^{\mathsf T}) &= \mathrm{ARE}\bigl(A_{11}^{\mathsf T}, C_{11}^{\mathsf T}, B_1^{\mathsf T}E_1, D_{21}^{\mathsf T}E_1\bigr). \end{aligned} \qquad (13)$$
Theorem 6: Suppose $P \in \mathcal{R}_p$ satisfies (5), (6) and Assumptions A1–A6. Consider the two-player output feedback problem as stated in (4)
$$\begin{aligned} \text{minimize} \quad & \bigl\|P_{11} + P_{12}K(I - P_{22}K)^{-1}P_{21}\bigr\|_2 \\ \text{subject to} \quad & K \in \mathrm{lower}(\mathcal{R}_p, m, k) \\ & K \text{ stabilizes } P. \end{aligned}$$
(i) There exists a unique optimal $K$.
(ii) Suppose $\Phi$, $\Psi$ satisfy the linear equations (14) and (15). Then the optimal controller is
$$K_{\mathrm{opt}} = \begin{bmatrix} A + B_2K + \hat LC_2 & 0 & -\hat L \\ B_2K - B_2\hat K & A + LC_2 + B_2\hat K & -L \\ K - \hat K & \hat K & 0 \end{bmatrix} \qquad (19)$$
where $K$, $L$, $\hat K$, $\hat L$ are defined in (13) and (16).
(iii) There exist $\Phi$, $\Psi$ satisfying (14) and (15).
Proof: A complete proof is provided in Section VI.
An alternative realization for the optimal controller is
$$K_{\mathrm{opt}} = \begin{bmatrix} A + B_2K + \hat LC_2 & 0 & \hat L \\ (L - \hat L)C_2 & A + LC_2 + B_2\hat K & L - \hat L \\ -K & -\hat K & 0 \end{bmatrix}. \qquad (20)$$
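For concreteness, here is one way the realization (19) can be assembled once the gains are available. This is a sketch of ours: A, B2, C2 are the structured plant matrices of (6), and K, L, K̂, L̂ are assumed to have been computed from (13), (16), and (17); nothing below computes them, and the function name is hypothetical.

```python
# Hedged sketch: assembling the optimal controller realization (19) from
# precomputed gains.
import numpy as np

def two_player_controller_realization(A, B2, C2, K, L, Khat, Lhat):
    """Return (Ak, Bk, Ck) for K_opt in the (zeta, xi) coordinates of (19)."""
    n = A.shape[0]
    Ak = np.block([
        [A + B2 @ K + Lhat @ C2, np.zeros((n, n))],
        [B2 @ (K - Khat),        A + L @ C2 + B2 @ Khat],
    ])
    Bk = np.vstack([-Lhat, -L])          # input is the measurement y
    Ck = np.hstack([K - Khat, Khat])     # output is the control u
    return Ak, Bk, Ck
```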

Finally, we must solve the following simultaneous linear equations for $\Phi, \Psi \in \mathbb{R}^{n_2\times n_1}$:
$$(A_{22} + B_{22}J)^{\mathsf T}\Phi + \Phi(A_{11} + MC_{11}) - (\tilde X - X_{22})\bigl(\Psi C_{11}^{\mathsf T} + U_{12}^{\mathsf T}\bigr)V_{11}^{-1}C_{11} + \tilde XA_{21} + J^{\mathsf T}S_{12}^{\mathsf T} + Q_{21} - X_{21}MC_{11} = 0 \qquad (14)$$
$$(A_{22} + B_{22}J)\Psi + \Psi(A_{11} + MC_{11})^{\mathsf T} - B_{22}R_{22}^{-1}\bigl(B_{22}^{\mathsf T}\Phi + S_{12}^{\mathsf T}\bigr)(\tilde Y - Y_{11}) + A_{21}\tilde Y + U_{12}^{\mathsf T}M^{\mathsf T} + W_{21} - B_{22}JY_{21} = 0 \qquad (15)$$
and define the associated gains $\hat K \in \mathbb{R}^{(m_1+m_2)\times(n_1+n_2)}$ and $\hat L \in \mathbb{R}^{(n_1+n_2)\times(k_1+k_2)}$
$$\hat K := \begin{bmatrix} 0 & 0 \\ -R_{22}^{-1}\bigl(B_{22}^{\mathsf T}\Phi + S_{12}^{\mathsf T}\bigr) & J \end{bmatrix} \qquad (16)$$
$$\hat L := \begin{bmatrix} M & 0 \\ -\bigl(\Psi C_{11}^{\mathsf T} + U_{12}^{\mathsf T}\bigr)V_{11}^{-1} & 0 \end{bmatrix}. \qquad (17)$$
For convenience, we define the Hurwitz matrices
$$A_K := A + B_2K \qquad A_L := A + LC_2 \qquad A_J := A_{22} + B_{22}J \qquad A_M := A_{11} + MC_{11} \qquad \hat A := A + B_2\hat K + \hat LC_2. \qquad (18)$$
Note that $A_K, A_L, A_J, A_M$ are all Hurwitz by construction, and $\hat A$ is Hurwitz as well, because it is block-lower-triangular and its block-diagonal entries are $A_M$ and $A_J$. The matrices $\Phi$ and $\Psi$ have physical interpretations, as do the gains $\hat K$ and $\hat L$. These will be explained in Section V. The main result of this paper is Theorem 6, given below.

V. STRUCTURE OF THE OPTIMAL CONTROLLER

In this section, we will discuss several features and consequences of the optimal controller exhibited in Theorem 6. First, a brief discussion on duality and symmetry. We then show a structural result, that the states of the optimal controller have a natural stochastic interpretation as minimum-mean-square-error estimates. This will lead to a useful interpretation of the matrices $\Phi$ and $\Psi$ defined in (14) and (15). We then compute the $\mathcal{H}_2$-norm of the optimally controlled system, and characterize the portion of the cost attributed to the decentralization constraint. Finally, we show how our main result specializes to many previously solved cases appearing in the literature.

A. Symmetry and Duality

The solution to the two-player output-feedback problem has nice symmetry properties that are perhaps unexpected given the network topology. Player 2 sees more information than Player 1, so one might expect Player 2's optimal policy to be more complicated than that of Player 1. Yet, this is not the case. Player 2 observes all the measurements but only controls $u_2$, so it only influences subsystem 2. In contrast, Player 1 only observes subsystem 1, but controls $u_1$, which in turn influences both subsystems. This duality is reflected in (13)–(16).

If we transpose the system variables and swap Player 1 and Player 2, then every quantity related to control of the original system becomes a corresponding quantity related to estimation of the transformed system. More formally, if we define
$$A^{\ddagger} = \begin{bmatrix} A_{22}^{\mathsf T} & A_{12}^{\mathsf T} \\ A_{21}^{\mathsf T} & A_{11}^{\mathsf T} \end{bmatrix}$$
then the transformation $(A, B_2, C_2) \to (A^{\ddagger}, C_2^{\ddagger}, B_2^{\ddagger})$ leads to $(X, K) \to (Y^{\ddagger}, L^{\ddagger})$, $(\hat X, \hat K) \to (\hat Y^{\ddagger}, \hat L^{\ddagger})$, and $\hat A \to \hat A^{\ddagger}$.
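Since (14) and (15) are linear in $(\Phi, \Psi)$, they can be stacked into a single linear system of size $2n_1n_2$ using the identity vec(AXB) = (Bᵀ ⊗ A)vec(X), which is the "small system of linear equations" referred to in the abstract. The sketch below is our own reading of (14)–(15): every other matrix is assumed precomputed from (12)–(13), the function name is ours, and the sign and block conventions should be checked against the displayed equations rather than taken as authoritative.

```python
# Hedged sketch: solving the coupled linear equations (14)-(15) for (Phi, Psi)
# by Kronecker vectorization (column-major vec).
import numpy as np

def solve_coupled_gain_equations(AJ, AM, A21, B22, C11, M, J,
                                 Q21, S12, W21, U12, R22, V11,
                                 X21, X22, Xt, Y11, Y21, Yt):
    n2, n1 = A21.shape
    vec = lambda Z: Z.flatten(order="F")
    I1, I2 = np.eye(n1), np.eye(n2)

    G = C11.T @ np.linalg.solve(V11, C11)          # C11' V11^{-1} C11
    H = B22 @ np.linalg.solve(R22, B22.T)          # B22 R22^{-1} B22'

    # Constant terms of (14) and (15).
    c14 = (-(Xt - X22) @ U12.T @ np.linalg.solve(V11, C11)
           + Xt @ A21 + J.T @ S12.T + Q21 - X21 @ M @ C11)
    c15 = (-B22 @ np.linalg.solve(R22, S12.T) @ (Yt - Y11)
           + A21 @ Yt + U12.T @ M.T + W21 - B22 @ J @ Y21)

    M11 = np.kron(I1, AJ.T) + np.kron(AM.T, I2)
    M12 = -np.kron(G.T, Xt - X22)
    M21 = -np.kron((Yt - Y11).T, H)
    M22 = np.kron(I1, AJ) + np.kron(AM, I2)

    lhs = np.block([[M11, M12], [M21, M22]])
    rhs = -np.concatenate([vec(c14), vec(c15)])
    sol = np.linalg.solve(lhs, rhs)
    Phi = sol[: n1 * n2].reshape((n2, n1), order="F")
    Psi = sol[n1 * n2:].reshape((n2, n1), order="F")
    return Phi, Psi
```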

This is analogous to the well-known correspondence between which we write compactly as q˙ = Acq + Bcw.LetΘ be Kalman filtering and optimal control in the centralized case. the controllability Gramian, i.e., the solution Θ ≥ 0 to the T T Lyapunov equation AcΘ+ΘAc + BcBc =0. Then B. Gramian Equations ⎡ ⎤ Z 00 In this subsection, we derive useful algebraic relations that Θ=⎣ 0 Yˆ − Y 0 ⎦ follow from Theorem 6 and provide a stochastic interpretation. 00Y Recall the plant equations, written in differential form

x˙ = Ax + B1w + B2u where Z satisfies the Lyapunov equation y = C2x + D21w. (21) A Z + ZAT + LVˆ LˆT =0. The optimal controller (19) is given in Theorem 6. Label its K K states as ζ and ξ, and place each equation into the following canonical observer form: Proof: Uniqueness of Θ follows because Ac is Hurwitz, T and Θ ≥ 0 follows because Ac is Hurwitz and BcB ≥ 0.The ˙ ˆ c ζ = Aζ + B2uˆ − L(y − C2ζ) (22) solution can be verified by direct substitution and by making uˆ = Kζ (23) use of the identities in Lemma 7.  ξ˙ = Aξ + B u − L(y − C ξ) (24) The observation in Theorem 8 lends itself to a statistical 2 2 interpretation. If w is white Gaussian noise with unit intensity, ˆ u = Kζ + K(ξ − ζ). (25) the steady-state distribution of the random vector q will have Θ Θ We will see later that this choice of coordinates leads to a covariance . Since is block-diagonal, the steady-state dis- ζ ξ − ζ x − ξ natural interpretation in terms of steady-state Kalman filters. tributions of the components , , and are mutually Our first result is a Lyapunov equation unique to the two-player independent. As expected from our discussion on duality, the problem. algebraic relations derived in Lemma 7 may be dualized to Lemma 7: Suppose Φ, Ψ is a solution of (14) and (15). obtain a new result, which we now state without proof. Lemma 9: Suppose Φ, Ψ is a solution of (14) and (15). Define Lˆ and Aˆ according to (16)–(18). There exists a unique Define Kˆ and Aˆ according to (16)–(18). There exists a unique matrix Yˆ satisfying the matrix Xˆ satisfying the equation Aˆ(Yˆ − Y )+(Yˆ − Y )AˆT +(Lˆ − L)V (Lˆ − L)T =0. (26) AˆT(Xˆ −X)+(Xˆ −X)Aˆ +(Kˆ −K)TR(Kˆ −K)=0. (28) Further, Yˆ ≥ Y , Yˆ = Y˜ , Yˆ =Ψ, and 11 21 ˆ − ˆ T T −1 T L = YC2 + U E1V11 E1 . (27) Further, Xˆ ≥ X, Xˆ22 = X˜, Xˆ21 =Φ, and Proof: Since Aˆ is stable and (Lˆ − L)V (Lˆ − L)T ≥ 0,it −1 T T T Kˆ = −E2R E B Xˆ + S . (29) follows from standard properties of Lyapunov equations that 22 2 2 (26) has a unique solution and it satisfies Yˆ − Y ≥ 0. Right- The dual of Theorem 8 can be obtained by expressing the multiplying (26) by E1 gives closed-loop map in the coordinates (x, x − ζ,x − ξ) instead. In ˆ ˆ − ˆ − T A(YE1 YE1)+(YE1 YE1)AM these coordinates, one can show that the observability Gramian T T is also block-diagonal. +(Lˆ − L)V (E1M − L E1)=0. This equation splits into two Sylvester equations, the first in ˆ ˆ Y11, and the second in Y21. Both have unique solutions. Upon C. Estimation Structure comparison with (13), (15), (16), one finds that Yˆ11 = Y˜ and In this subsection, we show that the states ζ and ξ of optimal Yˆ =Ψ. Note that, since Φ is fixed, Ψ is uniquely determined. 21 controller (22)–(25) may be interpreted as suitably defined Similarly comparing with (16) verifies (27).  Kalman filters. Given the dynamics (21), the estimation map Our next observation is that under the right coordinates, and corresponding Kalman filter xˆ is given by the controllability Gramian of the closed-loop map is block- ⎡ ⎤ diagonal. A B1 B2 − x ⎣ ⎦ w AL LB2 y Theorem 8: Suppose we have the plant described by (21) and = I 00 , xˆ = the optimal controller given by (22)–(25). The closed-loop map y u I 00u C2 D21 0 for one particular choice of coordinates is (30) ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ˙ ˆ ˆ ζ AK −LC2 −LC2 ζ where L and AL are defined in (13) and (18) respectively. We ⎣ ˙ ˙ ⎦ ⎣ ⎦ ⎣ ⎦ ξ − ζ = 0 Aˆ (Lˆ − L)C2 ξ − ζ will show that this Kalman filter is the result of an H2 optimiza- x˙ − ξ˙ 00 AL x − ξ tion problem and we will define it as such. 
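Theorem 8's claim that the controllability Gramian is block-diagonal can be checked numerically. The sketch below mirrors the closed-loop realization displayed in the theorem; all system matrices and gains are assumed given, and the helper name is ours.

```python
# Hedged sketch: checking Theorem 8. Build the closed-loop (Ac, Bc) in the
# (zeta, xi - zeta, x - xi) coordinates, solve the Lyapunov equation, and
# inspect the off-diagonal blocks of the resulting Gramian.
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def closed_loop_gramian(A, B1, B2, C2, D21, K, L, Khat, Lhat):
    n = A.shape[0]
    AK, AL = A + B2 @ K, A + L @ C2
    Ahat = A + B2 @ Khat + Lhat @ C2
    Ac = np.block([
        [AK,                -Lhat @ C2,        -Lhat @ C2],
        [np.zeros((n, n)),  Ahat,              (Lhat - L) @ C2],
        [np.zeros((n, n)),  np.zeros((n, n)),  AL],
    ])
    Bc = np.vstack([-Lhat @ D21, (Lhat - L) @ D21, B1 + L @ D21])
    # Solves Ac @ Theta + Theta @ Ac.T = -Bc @ Bc.T
    Theta = solve_continuous_lyapunov(Ac, -Bc @ Bc.T)
    return Theta   # Theorem 8 predicts Theta = blkdiag(Z, Yhat - Y, Y)
```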
This operator-based ⎡ ⎤ definition will allow us to make precise claims without the need −ˆ LD21 for stochastics. Before stating our definition, we illustrate an ⎣ ˆ − ⎦ + (L L)D21 w important subtlety regarding the input u; in order to make the B1 + LD21 notion of a Kalman filter unambiguous, one must be careful to 2136 IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 60, NO. 8, AUGUST 2015

specify which signals will be treated as inputs. Consider the then we use the notation x|y,u to mean following numerical instance of (30): ⎡ ⎤ Gest y − x|y,u := . 4 30 1 − u x ⎣ ⎦ w 5 11y = 1000 , xˆ = . y u 100u 1010 This notation is motivated by the property that (31) − G −GestG Suppose we know the control law u. We may then eliminate u x x|y,u = 11 1 21 w. from the estimator (31). For example ⎡ ⎤ The stochastic interpretation of Definition 10 is therefore that −51 1 Gest is chosen to minimize the mean square estimation error 1 1 1 u = y leads to xˆ = ⎣ 01 1⎦ y. (32) assuming w is white Gaussian noise with unit intensity. In the 10 10 0 following lemma, we show that the quantity x|y,u as defined in Definitions 10 and 11 is the usual steady-state Kalman filter for The new estimator dynamics are unstable because we chose an estimating the state x using the measurements y and inputs u, unstable control law. Consider estimating x again, but this time as given in (30). using the control law apriori. Eliminating u from the plant first Lemma 12: Let G be ⎡ ⎤ and then computing the Kalman filter leads to A B B ⎡ ⎤ 1 2 ⎡ ⎤ G = ⎣ ⎦ −41 30 I 00 ⎢ ⎥ −71 3 x 11 01 C2 D21 0 = ⎢ ⎥ w, xˆ = ⎣ −12 1 13 ⎦ y. y ⎣ 10 00⎦ 10 0 and suppose Assumptions A4–A6 hold. Then 10 01 AL −LB2 (33) Gest = The estimators (32) and (33) are different. Indeed I 00 where L and A are given by (13) and (18) respectively. s 3s +10 L xˆ = y and xˆ = y Proof: A proof of a more general result that includes +4s − 5 s2 +6s +5 preview appears in [7, Theorem 4.8]. Roughly, one begins F Gest F¯ by applying the change of variables = 1 + , and then in (32) and (33) respectively. In general, open-loop and closed- parameterizing all admissible F¯ using a coprime factorization loop estimation are different so any sensible definition of of G21. This yields the following equivalent unconstrained Kalman filtering must account for this by specifying which problem: inputs (if any) will be estimated in open-loop. For centralized AL B1 + LD21 AL B1 + LD21 optimal control, the correct filter to use is given by (30)–(31). min −Q I 0 C D Our definition of an estimator is as follows. Note that similar 2 21 2 optimization-based definitions have appeared in the literature, s.t. Q∈RH2. as in [22]. (p +p )×(q +q ) One may check that Q =0is optimal for the unconstrained Definition 10: Suppose G∈R 1 2 1 2 is given by p problem, and hence F¯ =0as required, by verifying the fol- lowing orthogonality condition: G G G 11 12 ∗ = AL B1 + LD21 AL B1 + LD21 ⊥ G21 G22 ∈H 2 . I 0 C2 D21 (34) and G21(iω) has full row rank for all ω ∈ R ∪{∞}. Define the × To see why (34) holds, multiply the realizations together and G Gest ∈Rp1 (p2+q1) estimator of to be p partitioned according use the state transformation (x1,x2) → (x1 − Yx2,x2), where Gest Gest Gest to =[ 1 2 ] where Y is given in (13). See [34, Lemma 14.3] for an example of such a computation.  Gest G −FG  1 =argmin 11 21 2 Comparing the result of Lemma 12 with the ξ-state of the F∈RH 2 G −FG ∈RH 11 21 2 optimal two-player controller (24), we notice the following consequence. Gest G −GestG 2 = 12 1 22. Corollary 13: Suppose the conditions of Theorem 6 are satisfied, and the controller states are labeled as in (22)–(25). Note that under the assumptions of Definition 10, the estima- Then x|y,u = ξ, where the estimator is defined by the map tor Gest is unique. We will show existence of Gest for particular G :(w, u) → (x, y) induced by the plant (21). G below. We now define the following notation. 
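The centralized estimator (30) can be formed numerically from the filter ARE in (13). The sketch below is ours: it obtains Y and L by feeding the dual data to SciPy's solve_continuous_are (that mapping is our assumption, not a formula from the paper) and then assembles the realization (30).

```python
# Hedged sketch: the steady-state Kalman filter (30).
import numpy as np
from scipy.linalg import solve_continuous_are

def kalman_estimator(A, B1, B2, C2, D21):
    V = D21 @ D21.T                       # assumed positive definite (A4)
    Y = solve_continuous_are(A.T, C2.T, B1 @ B1.T, V, s=B1 @ D21.T)
    L = -np.linalg.solve(V.T, (C2 @ Y + D21 @ B1.T)).T   # L = -(Y C2' + B1 D21') V^{-1}
    AL = A + L @ C2
    # Estimator realization (30): xhat' = AL xhat + [-L, B2] [y; u], output xhat.
    Best = np.hstack([-L, B2])
    Cest = np.eye(A.shape[0])
    return AL, Best, Cest, L
```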
The above result means that ξ, one of the states of the optimal Definition 11: Suppose G and Gest are as in Definition 10. If controller, is the usual Kalman filter estimate of x given y and u. The next result is more difficult, and gives the analogous x w result for the other state. The next theorem is our main structural = G y u result. We show that ζ may also be interpreted as an optimal estimator in the sense of Definitions 10 and 11. LESSARD AND LALL: OPTIMAL CONTROL OF TWO-PLAYER SYSTEMS WITH OUTPUT FEEDBACK 2137

Theorem 14: Suppose the conditions of Theorem 6 are satis- centralized case, the optimal estimation gain does not depend fied, and label the states of the controller as in (22)–(25). Define on the choice of control policy. also Note that using u1 or uˆ instead of uζ in Theorem 14 yields an incorrect interpretation of ζ. In these coordinates ˆ ˆ uζ := (K − K)ζ, uξ := Kξ. (35) K ˆ ˆ A + B2E2E2 + LC2 LE1 B2E1 y ζ = 1 or so that u = uζ + uξ. Then I 00u1 ⎡ ⎤ ⎡ ⎤ ˆ ˆ x ζ A + B2K + LC2 LE1 B2 y ⎣ ⎦ ⎣ ⎦ ζ = 1 . ξ = ζ . I 00uˆ u uˆ |y1,uζ In general, neither of these maps is stable, so for these choices The estimator is defined by the map G :(w, uζ ) → (x, ξ, u, y1) of signals, ζ cannot be an estimator of x in the sense of induced by (35), the plant (21), and the controller (22)–(25). Definitions 10 and 11. Proof: The proof parallels that of Lemma 12, so we omit We can now state the optimal two-player controller in the most of the details. Straightforward algebra gives following simple form: ⎡ ⎤ AB2 B1 B2 00 u = Kx|y + (x|y,u − x|y ,u ) (36) ⎢ − ˆ − ⎥ 1,uζ 1 ζ ⎢ LC2 A + LC2 + B2K LD21 B2 ⎥ HJ ⎢ ⎥ ⎢ I 000⎥ G = ⎢ ⎥ . ˆ ⎢ 0 I 00⎥ where H := K21. These equations make clear that Player 1 is ⎣ ⎦ 0 K 0 I using the same control action that it would use if the controller was centralized with both players only measuring y . We can ETC 0 ETD 0 1 1 2 1 21 also see that the control action of Player 2 has an additional F Gest F¯ correction term, given by the gain matrix [ HJ] multiplied After substituting = 1 + , computing the coprime fac- torization, and changing to more convenient coordinates, the by the difference between the players’ estimates of the states. resulting unconstrained optimization problem is Note that uζ is a function only of y1, as can be seen from the ⎡ ⎤ state-space equations Aˆ (Lˆ − L)C (Lˆ − L)D 2 21 ˆ ⎢ ⎥ ζ = AK ζ − LE1(y1 − [ C11 0]ζ) ⎢ 0 AL B2 + LD21 ⎥ ⎢ ⎥ ˆ minimize ⎢ II 0 ⎥ −QD uζ =(K − K)ζ. ⎣ ⎦ I 00 The orthogonality relationships of the form (34) used in K 00 2 Lemma 12 and Theorem 14 to solve the H2 optimization problems may also be interpreted in terms of signals in the where Aˆ is defined in (18), and ⎡ ⎤ optimally controlled closed-loop. Aˆ (Lˆ − L)C2 (Lˆ − L)D211 Corollary 15: Suppose u, w, x and y satisfy (21)–(25). Then ⎣ ⎦ → − → − D = 0 AL B2 + LD21 . the maps E2 : w x ξ and R2 : w y C2ξ, which give T T T the error and residual for Player 2, are E1 C2 E1 C2 E1 D21 AL B1 + LD21 AL B1 + LD21 Q E = R = . Optimality of =0is established by verifying the analogous 2 I 0 2 C D orthogonality relationship to (34). This is easily done using the 2 21 Gramian identity from Theorem 8. Then we have The maps E1 : w → x − ζ and R1 : w → y1 − C11ζ1, which ⎡ ⎤ give the error and residual for Player 1, are Aˆ −LEˆ B 1 2 ⎡ ⎤ ⎢ ⎥ ˆ ˆ − ˆ − Gest ⎢ I 00⎥ A (L L)C2 (L L)D21 = ⎣ ⎦ . ⎣ ⎦ I 00 E1 = 0 AL B1 + LD21 K 00 II 0 T − T Comparing with (22), we can see that AM ME1 E1 ⎡ ⎤ R1 = R2. T ζ C11 E1 y ⎣ ζ ⎦ = Gest 1 uζ ∗ ∈H⊥ uˆ Furthermore, the orthogonality conditions E2R2 2 and ∗ ∈H⊥ E1R1 2 are satisfied. and the result follows.  As with Theorem 8, Corollary 15 lends itself to a statistical

The result of Theorem 14 that ζ = x|y1,uζ has a clear inter- interpretation. If w is white Gaussian noise with unit intensity, pretation that ζ is a Kalman filter of x. However, this result and we consider the steady-state distributions of the error and assumes that Player 2 is implementing the optimal policy residual for the second player, x − ξ and y − C2ξ respectively, for u2. This is important because the optimal estimator gain then they are independent. Similarly, the error and residual for depends explicitly on this choice of policy. In contrast, in the the first player, x − ζ and y1 − C11ζ1, are also independent. 2138 IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 60, NO. 8, AUGUST 2015
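The estimation structure just described can also be illustrated by simulation. The sketch below integrates the plant (21) and controller (22)–(25) with a forward-Euler scheme; the scheme, step size, and noise discretization are arbitrary choices of ours, and all system matrices and gains are assumed given. Because the second block-column of L̂ is zero, the state ζ, and hence uζ, is driven by y1 alone, matching the discussion above.

```python
# Hedged sketch: simulating the controller (22)-(25) with forward-Euler integration.
import numpy as np

def simulate(A, B1, B2, C2, D21, K, L, Khat, Lhat, T=10.0, dt=1e-3, seed=0):
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    x = np.zeros(n); zeta = np.zeros(n); xi = np.zeros(n)
    for _ in range(int(T / dt)):
        w = rng.standard_normal(B1.shape[1]) / np.sqrt(dt)   # white-noise approximation
        y = C2 @ x + D21 @ w
        u_zeta = (K - Khat) @ zeta      # zeta depends on y1 only: Lhat's 2nd block-column is zero
        u = u_zeta + Khat @ xi          # u = K zeta + Khat (xi - zeta), as in (25)
        zeta = zeta + dt * (A @ zeta + B2 @ (K @ zeta) - Lhat @ (y - C2 @ zeta))   # (22)-(23)
        xi   = xi   + dt * (A @ xi   + B2 @ u           - L    @ (y - C2 @ xi))    # (24)
        x    = x    + dt * (A @ x + B1 @ w + B2 @ u)                               # (21)
    return x, zeta, xi
```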

D. Optimal Cost small. Examining these cases individually, we find the fol- ˆ We now compute the cost associated with the optimal con- lowing. If L = L, then even the optimal centralized controller trol policy for the two-player output-feedback problem. From makes no use of y2; it only adds noise and no new information. In the two-player context, this leads to ζ = ξ, so both players centralized H2 theory [34], there are many equivalent expres- sions for the optimal centralized cost. In particular have the same estimate of the global state, based only on y1, and either player is capable of determining u. On the other hand, if A B 2 ˆ  P K 2 K 1 K = K, then even the optimal centralized controller makes no F( , cen) 2 = C1 + D12K 0 use of u1. In the two-player context, this leads to a policy in 2 2 which Player 1 does nothing. AL B1 + LD21 + D12K 0 2 E. Some Special Cases =trace(XW) + trace(YKTRK) The optimal controller (19) is given by (36), which we repeat =trace(YQ)+trace(XLV LT). here for convenience where X, Y, K, L are defined in (13). Of course, the cost of the 00 optimal two-player controller will be greater, so we have u = Kx| + (x| − x| ). y1,uζ HJ y,u y1,uζ  P K 2  P K 2 F( , opt) 2 = F( , cen) 2 +Δ Recall that K and J are found by solving standard AREs ≥ where Δ 0 is the additional cost incurred by decentralization. (13). The coupling between estimation and control appears in We now give some closed-form expressions for Δ that are − −1 T T the term H = R22 (B22Φ+S12), which is found by solving similar to the centralized formulae above. the coupled linear equations (14) and (15). Theorem 16: The additional cost incurred by the optimal Several special cases of the two-player output-feedback controller (19) for the two-player problem (4) as compared to problem have previously been solved. We will see that in the cost of the optimal centralized controller is each case, the first component of (x| − x| ) is zero. In y,u y1,uζ 2 other words, both players maintain identical estimates of x1 Aˆ (Lˆ − L)D21 Δ= given their respective information. Consequently, u no longer D (Kˆ − K)0 12 2 depends on H and there is no need to compute Φ and Ψ.We now examine these special cases in more detail. = trace(Yˆ − Y )(Kˆ − K)TR(Kˆ − K) a) Centralized: The problem becomes centralized when both T = trace(Xˆ − X)(Lˆ − L)V (Lˆ − L) players have access to the same information. In this case, both players maintain identical estimates of the entire where K, Kˆ , L, Lˆ are defined in (13)–(16), Xˆ and Yˆ are state x. Thus, x| = x| =ˆx and we recover the Aˆ y,u y1,uζ defined in (26) and (28), and is defined in (18). well-known centralized solution u = Kxˆ. Proof: The key is to view Kopt as a sub-optimal central- H b) State feedback: The state-feedback problem for two play- ized controller. Centralized 2 theory [34] then implies that ers is the case where our measurements are noise-free,  Q 2 so that y1 = x1 and y2 = x2. Therefore, both players Δ= D12 youD21 2 (37) measure x1 exactly. This case is solved in [24], [25] and the solution takes the following form, which agrees with where Qyou is the centralized Youla parameter. Specifically, −1 our general formula: Qyou = Fu(J , Kopt) and ⎡ ⎤ − A B2 L x1 0 − − u = K + (x2 xˆ2|1). J 1 ⎣ ⎦ xˆ2|1 J = C2 0 I . −KI0 Here, xˆ2|1 is an estimate of x2 given the information This centralized Youla parameterization contains the gains K available to Player 1, as stated in [25]. and L instead of Kd and Ld. 
After simplifying, we obtain c) Partial output feedback: In the partial output-feedback case, y = x as in the state-feedback case, but y is Aˆ Lˆ − L 1 1 2 Qyou = (38) a noisy linear measurement of both states. This case is Kˆ − K 0 solved in [26] and the solution takes the following form substituting (38) into (37) yields the first formula. The second (using notation from [26]), which agrees with our general formula follows from evaluating (37) in a different way. Note formula:  − −1 2 T that Ds + Cs(sI As) Bs = trace(CsWcCs ), where x1 0 − Wc is the associated controllability Gramian, given by AsWc + u = K + (ˆx2|2 xˆ2|1) T T xˆ2|1 J WcAs + BsBs =0. In the case of (37), the controllability ˆ − Gramian equation is precisely (26), and therefore Wc =Y Y d) Dynamically decoupled: In the dynamically decoupled and the second formula follows. The third formula follows using case, all measurements are noisy, but the dynamics of ˆ the observability Gramian Wo = X − X given by (28).  both systems are decoupled. This amounts to the case Theorem 16 precisely quantifies the cost of decentralization. where A21 =0, B21 =0, C21 =0, and W , V , U are We note that Δ is small whenever (Lˆ − L) or (Kˆ − K) is block-diagonal. Due to the decoupled dynamics, the esti- LESSARD AND LALL: OPTIMAL CONTROL OF TWO-PLAYER SYSTEMS WITH OUTPUT FEEDBACK 2139

mate of $x_1$ based on $y_1$ does not improve when additionally using $y_2$. This case is solved in [6] and the solution takes the following form, which agrees with our general formula:
$$u = K\begin{bmatrix} \hat x_{1|1} \\ \hat x_{2|1} \end{bmatrix} + \begin{bmatrix} 0 \\ J \end{bmatrix}\bigl(\hat x_{2|2} - \hat x_{2|1}\bigr).$$
Note that estimation and control are decoupled in all the special cases examined above. This fact allows the optimal controller to be computed by merely solving some subset of the AREs (13). In the general case however, estimation and control are coupled via $\Phi$ and $\Psi$ in (14) and (15).

Lemma 18: Suppose $T \in \mathcal{RH}_\infty$ satisfies Assumptions B1–B3. Then the two-player model-matching problem
$$\begin{aligned} \text{minimize} \quad & \|T_{11} + T_{12}QT_{21}\|_2 \\ \text{subject to} \quad & Q \in \mathrm{lower}(\mathcal{H}_2) \end{aligned} \qquad (41)$$
has a unique solution. Furthermore, $Q$ is the minimizer of (41) if and only if
$$T_{12}^{\ast}\bigl(T_{11} + T_{12}QT_{21}\bigr)T_{21}^{\ast} \in \begin{bmatrix} \mathcal{H}_2^{\perp} & \mathcal{L}_2 \\ \mathcal{H}_2^{\perp} & \mathcal{H}_2^{\perp} \end{bmatrix}. \qquad (42)$$

Proof: The proof follows exactly that of Lemma 17.  VI. PROOFS Each of these model-matching problems has a rational trans- A. Existence and Uniqueness of the Controller fer function solution, as we now show. Lemma 19: Suppose T∈RH∞ satisfies Assumptions B1–B3 In order to prove existence and uniqueness, our general and has a minimal joint realization given by approach is to first convert the optimal control problem into ⎡ ⎤ A¯ B¯ B¯ a model-matching problem using Corollary 5. This model- T T 1 2 ¯ T 11 12 ⎣ ⎦ where A is matching problem has stable parameters ij, and so may be = C¯1 0 D¯ 12 (43) T21 0 Hurwitz. solved using standard optimization methods on the Hilbert C¯2 D¯ 21 0 space H2. We therefore turn our attention to this class of problems. We will need the following assumptions. The solution to the centralized model-matching problem (39) is B1) T11(∞)=0 rational, and has realization B2) T12(iω) has full column rank for all ω ∈ R ∪{∞} ⎡ ⎤ A¯K B¯2K¯ 0 B3) T21(iω) has full row rank for all ω ∈ R ∪{∞} Q ⎣ 0 A¯ −L¯ ⎦ The optimality condition for centralized model-matching is opt = L (44) given in the following lemma. K¯ K¯ 0 Lemma 17: SupposeT∈RH∞ satisfiesAssumptionsB1–B3. ¯ ¯ ¯ ¯ Then the model-matching problem where K, L are defined in (13), and AK , AL are defined in (18) with all state-space parameters replaced by their barred T T QT  minimize 11 + 12 21 2 counterparts. Proof: It is straightforward to verify that subject to Q∈H (39) 2 Assumptions A1–A6 hold for the realization (43). Optimality has a unique solution. Furthermore, Q is the minimizer of (39) is verified by substituting (44) directly into the optimality  if and only if condition (40) and applying Lemma 17. Lemma 20: SupposeT∈RH∞ satisfiesAssumptions B1–B3. T ∗ T T QT T ∗ ∈H⊥ 12( 11 + 12 21) 21 2 . (40) Then the optimal solution of the two-player model-matching problem (41) is rational. Proof: The Hilbert projection theorem (see, for example Proof: Since the H2-norm is invariant under rearrange- [14]) states that if H is a Hilbert space, S ⊆ H is a closed ment of matrix elements, we may vectorize [5] the contents of subspace, and b ∈ H, then there exists a unique x ∈ S that the norm in (41) to obtain minimizes x − b2. Furthermore, a necessary and sufficient ⊥ condition for optimality of x is that x ∈ S and x − b ∈ S . T minimize vec(T11)+ T ⊗T12 vec(Q) Given a bounded linear map A : H → H which is bounded 21 2 Q∈ H below, define T := {Ax|x ∈ S}. Then T is closed and the subject to lower( 2). (45) projection theorem implies that the problem Due to the sparsity pattern of Q, some entries of vec(Q) will Ax − b minimize 2 be zero. Let E be the identity matrix with columns removed Q∈ H subject to x ∈ S corresponding to these zero-entries. Then lower( 2) if and only if vec(Q)=Eq for some q ∈H2. Then (45) is has a unique solution. Furthermore, x is optimal if and only equivalent to ∈ ∗ − ∈ ⊥ if x S and A (Ax b) S . This result directly implies the lemma, by setting H := L2 and S := H2, and defining T T T ⊗T minimize vec( 11)+( 21 12)Eq 2 the bounded linear operator A : L2 →L2 by AQ := T12QT 21, ∗ ∗ ∗ P T PT subject to q ∈H2. (46) with adjoint A := 12 21. The operator A is bounded below as a consequence of Assumptions B2–B3.  We now develop an optimality condition similar to (40), but This is a centralized model-matching problem of the form (39). for the two-player structured model-matching problem. Using standard properties of the Kronecker product, one may 2140 IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 60, NO. 8, AUGUST 2015

verify that Assumptions B1–B3 are inherited by follows from Theorem 4 that Kopt is an admissible stabilizing T controller. vec(T11)(T ⊗T12)E 21 . (47) We now directly verify that Qopt defined in (49) satisfies 10 the optimality condition (42) for the model-matching problem characterized by the T given in (9). The closed-loop map T + By Lemma 19 the optimal q is rational, and hence the result 11 T Q T has a particularly nice expression. Substituting in follows.  12 opt 21 (49) and (9) and simplifying, we obtain We now prove that under the assumptions of Theorem 6, the two-player output-feedback problem (4) has a unique solution. Ac Bc Proof of Theorem 6, Part (i): We apply Corollary 5 to T11 + T12QoptT21 = reduce the optimal control problem to the two-player model- Cc 0 RH ⎡ ⎤ matching problem over ∞. It is straightforward to check −ˆ −ˆ T AK LC2 0 LD21 that under Assumptions A1–A6, the particular given in (9) ˆ − ˆ ˆ T ∞ ⎢ 0 A B2KB1 + LD21 ⎥ satisfies B1–B3. These assumptions imply that 12( ) is left- =⎢ ⎥ . ⎣ 00AL B1 + LD21 ⎦ invertible and T21(∞) is right-invertible. Thus, the optimal Q ∈RH∞ Q ∞ solution opt must have ( )=0 to ensure that C1 + D12KC1 + D12Kˆ −D12Kˆ 0 T11 + T12QT212 is finite. Therefore we may replace the Q∈ RH constraint that lower( ∞) as in Corollary 5 with the Note that Kd and Ld are now absent, as the optimal closed- constraint that Q∈lower(RH3) without any loss of generality. loop map does not depend on the choice of parameterization. Existence and uniqueness now follows from Lemma 18. Ratio- The left-hand side of the optimality condition (42) is therefore nality follows from Lemma 20.  T ∗ T T Q T T ∗ The vectorization approach of Lemma 18 effectively reduces 12 ( 11 + 12 opt 21) 21 the structured model-matching problem (41) to a centralized ⎡ ⎤ −AT −CT C 0 0 model-matching problem, which has a known solution, shown Kd Kd c ⎢ 0 A B BT B DT ⎥ later in Lemma 19. Unfortunately, constructing the solution in ⎢ c c Ld c 21 ⎥ = ⎣ − T − T ⎦ . this manner is not feasible in practice because it requires finding 00ALd C2 a state-space realization of the Kronecker system (47). This T T B2 D12Cc 00 leads to a dramatic increase in state dimension, and requires solving a large Riccati equation. Furthermore, we lose any Apply Lemmas 7 and 9 to define Xˆ and Yˆ . Now perform the physical interpretation of the states, as mentioned in Section I. state transformation x → Txwith ⎡ ⎤ I − [ X Xˆ 0]⎡ 0 ⎤ B. Formulae for the Optimal Controller ⎢ ⎥ ⎢ 0 ⎥ Proof of Theorem 6, Part (ii): ⎢ − ⎣ ˆ ⎦ ⎥ Suppose the linear (14) and T := ⎣ 0 I Y ⎦ . (15) are satisfied by some Φ, Ψ and the proposed Kopt has Y been defined according to (19). We will make the following 00 I simplifying assumption: At this point we make use of the Riccati (13), as well as the L1 = M and K2 = J (48) Sylvester equations (14) and (15) via the identities (26)–(29). This leads to a state space realization with 5n states, and where L1 and K2 were originally defined in Lemma 2, and sparsity pattern of the form M and J are defined in (13). There is no loss of generality ⎡ ⎤ in choosing this particular parameterization for T , but it leads − T AKd 0 E1E1  to simpler algebra. To verify optimality of K ,weusethe ⎢ T ⎥ opt ⎢ 0  0 E2  ⎥ parameterization of Theorem 4 to find the Qopt that generates ⎢ ˆ T T ⎥ ⎢ 00AE2 E2 ⎥ Kopt. 
The computation yields Ω=⎢ ⎥ ⎢ 000 00⎥ Q J −1 K ⎣ 0000−AT −CT ⎦ opt = Fu d , opt Ld 2 ⎡ ⎤ BT 0 E   0 AK −LCˆ 2 0 Lˆ 2 1 ⎢ ˆ − ˆ − ˆ ⎥ ⎢ 0 A B2KLd L ⎥ = ⎣ 00A L − L ⎦ where  denotes a matrix whose value is unimportant. The L d second and fourth states are unobservable and uncontrollable, ˆ ˆ Kd − KKd − K K 0 respectively. Removing them, we are left with ⎡ ⎤ ˆ ⎡ ⎤ AK 0 L − T ⎣ − ⎦ AKd E1  = 0 AL Ld L (49) ⎢ ˆ T T ⎥ ⎢ 0 AE2 E2 ⎥ K − K Kˆ 0 Ω=⎣ − T − T ⎦ . d 00ALd C2 ˆ T where AK , AL, A are defined in (18). The last simplification B2 E1 0 in (49) comes thanks to (48). The sparsity structure of the gains Lˆ and Kˆ gives Qopt a block-lower-triangular structure. Note Because of the block-triangular structure of A, B2, C2 and also that AK , AL, Aˆ are Hurwitz, so Qopt ∈ lower(RH3).It the block-diagonal structure of Kd, Ld, it is straightforward to LESSARD AND LALL: OPTIMAL CONTROL OF TWO-PLAYER SYSTEMS WITH OUTPUT FEEDBACK 2141 check that requirement (6) and the parameter choice (48). If ⎡ T ⎤ −A  AP BP Kd Q := where A is Hurwitz ⎣ 0 −AT −CT ⎦ ∈H⊥ 11 P ΩE1 = M 11 2 . CP 0 T B2  0 then the right-hand side of (50) is minimized by Similarly, ETΩ ∈H⊥, and therefore 2 2 [ Q21 Q22 ] ⎡ ⎤ H⊥ L 0 0 C ∗ ∗ 2 2 P − T (T11 + T12QoptT21)T ∈ ⊥ ⊥ . ⎢ AKd + B2 ¯ B2 ¯ B2 ¯ Ld ⎥ 12 21 H H ⎢ K1 K2 K3 ⎥ 2 2 ⎢ − ⎥ = 0 AL 0 Ld L . ⎢ T ⎥ ⎣ 00AP BP E ⎦ So by Lemma 18, Qopt is the solution to the model-matching 1 problem (41), and therefore must also be a minimizer of (11). K¯1 K¯2 K¯3 0 It follows from Corollary 5 that K is the unique optimal opt (52) controller for the two-player output-feedback problem (4). 

C. Existence of Solutions to the Sylvester Equations

We now show that the linear equations (14) and (15) have a solution. First, we show that existence of the optimal controller implies a certain fixed-point property. A simple necessary condition for optimality is person-by-person optimality; by fixing any part of the optimal $Q$ and optimizing over the rest, we cannot improve upon the optimal cost.

Lemma 21: Suppose $T$ is given by (9) and Assumptions A1–A6 hold. Further suppose that $Q\in\operatorname{lower}(\mathcal{RH}_{2})$ and
\[
Q:=\begin{bmatrix}Q_{11} & 0\\ Q_{21} & Q_{22}\end{bmatrix}.
\]
Define the following partial optimization functions:
\[
g_{1}(Q_{11}):=\operatorname*{arg\,min}_{Q_{22}\in\mathcal{RH}_{2}}\;\min_{Q_{21}\in\mathcal{RH}_{2}}\;\|T_{11}+T_{12}QT_{21}\|_{2} \tag{50}
\]
\[
g_{2}(Q_{22}):=\operatorname*{arg\,min}_{Q_{11}\in\mathcal{RH}_{2}}\;\min_{Q_{21}\in\mathcal{RH}_{2}}\;\|T_{11}+T_{12}QT_{21}\|_{2}. \tag{51}
\]
If $Q$ is optimal for the two-player model-matching problem (41), then
\[
g_{2}\big(g_{1}(Q_{11})\big)=Q_{11} \qquad\text{and}\qquad g_{1}\big(g_{2}(Q_{22})\big)=Q_{22}.
\]
Proof: Under the stated assumptions, the optimization problems (50) and (51) have unique optimal solutions. If $Q$ is optimal, then clearly we have $Q_{22}=g_{1}(Q_{11})$ and $Q_{11}=g_{2}(Q_{22})$. The result follows by substituting one identity into the other. ∎

An alternative way of stating Lemma 21 is that the optimal $Q_{11}$ and $Q_{22}$ are the fixed points of the maps $g_{2}\circ g_{1}$ and $g_{1}\circ g_{2}$, respectively. Our next step is to solve these fixed-point equations analytically. The key insight is that (50) and (51) are centralized model-matching problems of the form (39).
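Before solving the fixed-point equations exactly, a static, finite-dimensional analogue may help fix ideas. In the sketch below the data and block sizes are arbitrary, $\mathcal{RH}_{2}$ is replaced by constant matrices, and the norm is the Frobenius norm, so this is only an illustration of the fixed-point property in Lemma 21, not the problem solved in this paper.

import numpy as np

# Static analogue of Lemma 21: minimize ||T11 + T12 Q T21||_F over
# block-lower-triangular Q = [[Q11, 0], [Q21, Q22]] and verify that the
# optimum is a fixed point of the two partial minimizations.
rng = np.random.default_rng(1)
m1, m2, p1, p2 = 2, 2, 2, 2                    # hypothetical block sizes
T11 = rng.standard_normal((5, 6))
T12 = rng.standard_normal((5, m1 + m2))
T21 = rng.standard_normal((p1 + p2, 6))

vec = lambda M: M.flatten(order="F")
A = np.kron(T21.T, T12)                        # vec(T12 Q T21) = A vec(Q)

def block(rows, cols):
    """Vectorized boolean mask selecting one block of Q."""
    M = np.zeros((m1 + m2, p1 + p2), dtype=bool)
    M[np.ix_(rows, cols)] = True
    return vec(M)

q11 = block(range(m1), range(p1))
q22 = block(range(m1, m1 + m2), range(p1, p1 + p2))
free = q11 | block(range(m1, m1 + m2), range(p1 + p2))   # Q12 forced to zero

def argmin_over(idx, q_pinned):
    """Minimize over the entries in idx, holding the remaining entries fixed."""
    q = q_pinned.copy()
    rhs = -vec(T11) - A[:, ~idx] @ q[~idx]
    q[idx] = np.linalg.lstsq(A[:, idx], rhs, rcond=None)[0]
    return q

q_opt = argmin_over(free, np.zeros(A.shape[1]))          # jointly optimal structured Q

# g1 pins Q11 and re-optimizes [Q21 Q22]; g2 pins Q22 and re-optimizes Q11, Q21.
# At the joint optimum, neither partial minimization moves the solution.
assert np.allclose(argmin_over(free & ~q11, q_opt), q_opt)
assert np.allclose(argmin_over(free & ~q22, q_opt), q_opt)
print("optimal Q11 and Q22 are fixed points of the composed maps")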

In the following lemma, we fix $Q_{11}$ and we find $[\,Q_{21}\ \ Q_{22}\,]$ that minimizes the right-hand side of (50).

Lemma 22: Assume $T\in\mathcal{RH}_{\infty}$ is given by (9). Suppose that Assumptions A1–A6 hold together with the structural requirement (6) and the parameter choice (48). If
\[
Q_{11}:=\left[\begin{array}{c|c}A_{P} & B_{P}\\ \hline C_{P} & 0\end{array}\right]
\qquad\text{where $A_{P}$ is Hurwitz}
\]
then the right-hand side of (50) is minimized by
\[
[\,Q_{21}\ \ Q_{22}\,]=\left[\begin{array}{ccc|c}
A_{K_{d}}+B_{2}\bar{K}_{1} & B_{2}\bar{K}_{2} & B_{2}\bar{K}_{3} & L_{d}\\
0 & A_{L} & 0 & L_{d}-L\\
0 & 0 & A_{P} & B_{P}E_{1}^{T}\\ \hline
\bar{K}_{1} & \bar{K}_{2} & \bar{K}_{3} & 0
\end{array}\right]. \tag{52}
\]
Here the quantities $\bar{K}_{1}$, $\bar{K}_{2}$, and $\bar{K}_{3}$ are defined by
\[
\bar{K}_{1}=-R_{22}^{-1}\big(B_{22}^{T}\Theta_{X}+S_{12}^{T}+R_{12}^{T}K_{1}\big),\quad
\bar{K}_{2}=-R_{22}^{-1}\big(B_{22}^{T}\Phi+S_{12}^{T}\big),\quad
\bar{K}_{3}=-R_{22}^{-1}\big(B_{22}^{T}\Gamma_{X}+R_{12}^{T}C_{P}\big) \tag{53}
\]
where $\Theta_{X}$, $\Gamma_{X}$, and $\Phi$ are the unique solutions to the linear equations
\[
A_{J}^{T}\Theta_{X}+[\,\Theta_{X}\ \ \tilde{X}\,]A_{K_{d}}E_{1}+E_{2}^{T}C_{K_{d}}^{T}C_{K_{d}}E_{1}=0
\]
\[
A_{J}^{T}\Gamma_{X}+\Gamma_{X}A_{P}+\big([\,\Theta_{X}\ \ \tilde{X}\,]B_{2}E_{1}+J^{T}R_{21}+S_{21}\big)C_{P}=0
\]
\[
A_{J}^{T}\Phi+\Phi A_{M}+\tilde{X}A_{21}^{T}-\Theta_{X}MC_{11}+J^{T}S_{12}+Q_{21}^{T}+\Gamma_{X}B_{P}C_{11}=0 \tag{54}
\]
where $Q$, $R$, $S$ are defined in (12), $\tilde{X}$, $Y$, $J$, $L$ are defined in (13), and $A_{K_{d}}$, $C_{K_{d}}$ are defined in (10).

Proof: Since $Q_{11}$ is held fixed, group it with $T_{11}$ to obtain
\[
T_{11}+T_{12}QT_{21}=T_{11}+T_{12}E_{1}Q_{11}E_{1}^{T}T_{21}+T_{12}E_{2}[\,Q_{21}\ \ Q_{22}\,]T_{21}.
\]
This is an affine function of $[\,Q_{21}\ \ Q_{22}\,]$, so the associated model-matching problem is centralized. Finding a joint realization of the blocks as in Lemma 19, we obtain
\[
\begin{bmatrix}
T_{11}+T_{12}E_{1}Q_{11}E_{1}^{T}T_{21} & T_{12}E_{2}\\
T_{21} & 0
\end{bmatrix}
=\begin{bmatrix}\bar{A} & \bar{B}_{1} & \bar{B}_{2}\\ \bar{C}_{1} & 0 & \bar{D}_{12}\\ \bar{C}_{2} & \bar{D}_{21} & 0\end{bmatrix}
=\left[\begin{array}{ccc|cc}
A_{K_{d}} & -L_{d}C_{2} & B_{2}E_{1}C_{P} & L_{d}D_{21} & B_{2}E_{2}\\
0 & A_{L_{d}} & 0 & B_{L_{d}} & 0\\
0 & B_{P}E_{1}^{T}C_{2} & A_{P} & B_{P}E_{1}^{T}D_{21} & 0\\ \hline
C_{K_{d}} & C_{1} & D_{12}E_{1}C_{P} & 0 & D_{12}E_{2}\\
0 & C_{2} & 0 & D_{21} & 0
\end{array}\right].
\]
It is straightforward to check that Assumptions B1–B3 are satisfied for this augmented system.

Now, we may apply Lemma 19. The result is that
\[
[\,Q_{21}\ \ Q_{22}\,]_{\mathrm{opt}}=\left[\begin{array}{cc|c}
\bar{A}+\bar{B}_{2}\bar{K} & \bar{B}_{2}\bar{K} & 0\\
0 & \bar{A}+\bar{L}\bar{C}_{2} & -\bar{L}\\ \hline
\bar{K} & \bar{K} & 0
\end{array}\right] \tag{55}
\]
where we defined $(\bar{X},\bar{K}):=\operatorname{ARE}(\bar{A},\bar{B}_{2},\bar{C}_{1},\bar{D}_{12})$ and $(\bar{Y},\bar{L}^{T}):=\operatorname{ARE}(\bar{A}^{T},\bar{C}_{2}^{T},\bar{B}_{1}^{T},\bar{D}_{21}^{T})$. One can check that the stabilizing solution to the latter ARE is
\[
\bar{Y}=\begin{bmatrix}0 & 0 & 0\\ 0 & Y & 0\\ 0 & 0 & 0\end{bmatrix}
\qquad\text{and}\qquad
\bar{L}=\begin{bmatrix}-L_{d}\\ L-L_{d}\\ -B_{P}E_{1}^{T}\end{bmatrix}.
\]
The former ARE is more complicated, however. Examining $\bar{K}=-\bar{R}^{-1}\big(\bar{B}_{2}^{T}\bar{X}+\bar{S}^{T}\big)$, we notice that due to all the zeros in $\bar{B}$, the only part of $\bar{X}$ that affects the gain $\bar{K}$ is the second sub-row of the first block-row. In other words, if
\[
\bar{X}:=\begin{bmatrix}\bar{X}_{11} & \bar{X}_{12} & \bar{X}_{13}\\ \bar{X}_{21} & \bar{X}_{22} & \bar{X}_{23}\\ \bar{X}_{31} & \bar{X}_{32} & \bar{X}_{33}\end{bmatrix}
\]
then $\bar{K}$ only depends on $E_{2}^{T}\bar{X}_{11}$, $E_{2}^{T}\bar{X}_{12}$, and $E_{2}^{T}\bar{X}_{13}$. Multiplying the ARE for $\bar{X}$ on the left by $[\,E_{2}^{T}\ \ 0\ \ 0\,]$ and on the right by $[\,E_{2}^{T}\ \ 0\ \ 0\,]^{T}$, we obtain the equation
\[
A_{J}^{T}\tilde{X}+\tilde{X}A_{J}+(C_{1}E_{2}+D_{12}E_{2}J)^{T}(C_{1}E_{2}+D_{12}E_{2}J)=0
\]
where $\tilde{X}:=E_{2}^{T}\bar{X}_{11}E_{2}$. It is straightforward to see that $\tilde{X}$ as defined in (13) satisfies this equation. Substituting this back into the ARE for $\bar{X}$ and multiplying on the left by $[\,E_{2}^{T}\ \ 0\ \ 0\,]$, we obtain
\[
A_{J}^{T}\,[\,E_{2}^{T}\bar{X}_{11}\ \ E_{2}^{T}\bar{X}_{12}\ \ E_{2}^{T}\bar{X}_{13}\,]
+[\,E_{2}^{T}\bar{X}_{11}\ \ E_{2}^{T}\bar{X}_{12}\ \ E_{2}^{T}\bar{X}_{13}\,]
\begin{bmatrix}A_{K_{d}} & -L_{d}C_{2} & B_{2}E_{1}C_{P}\\ 0 & A_{L_{d}} & 0\\ 0 & B_{P}E_{1}^{T}C_{2} & A_{P}\end{bmatrix}
+(C_{1}E_{2}+D_{12}E_{2}J)^{T}\,[\,C_{K_{d}}\ \ C_{1}\ \ D_{12}E_{1}C_{P}\,]=0. \tag{56}
\]
Right-multiplying (56) by $[\,0\ \ E_{2}\ \ 0\,]^{T}$, we conclude that $E_{2}^{T}\bar{X}_{12}E_{2}=\tilde{X}$. Notice that (56) is linear in the $\bar{X}$ terms. Assign the following names to the missing pieces:
\[
E_{2}^{T}\bar{X}_{11}:=[\,\Theta_{X}\ \ \tilde{X}\,]\qquad
E_{2}^{T}\bar{X}_{12}:=[\,\Phi\ \ \tilde{X}\,]\qquad
E_{2}^{T}\bar{X}_{13}:=\Gamma_{X}.
\]
Upon substituting these definitions into (56) and simplifying, we obtain (54). A similar substitution into the definition of $\bar{K}=[\,\bar{K}_{1}\ \ \bar{K}_{2}\ \ \bar{K}_{3}\,]$ leads to the formulae (53).

The equations (54) have a unique solution. To see why, note that they may be sequentially solved: first for $\Theta_{X}$, then for $\Gamma_{X}$, and finally for $\Phi$. Each is a Sylvester equation of the form $A_{1}\Omega+\Omega A_{2}+A_{0}=0$, where $A_{1}$ and $A_{2}$ are Hurwitz, so the solution is unique. Furthermore, $\bar{A}+\bar{B}_{2}\bar{K}$ is easily verified to be Hurwitz, so the procedure outlined above produces the correct stabilizing $\bar{X}$. Now, substitute $\bar{K}$ and $\bar{L}$ into (55). The result is a very large state-space realization, but it can be greatly reduced by eliminating uncontrollable and unobservable states. The result is (52). This reduction is not surprising, because we solved a model-matching problem in which the joint realization for the three blocks had a zero as the fourth block. ∎
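Each equation in (54) can thus be handled by an off-the-shelf Sylvester solver. The sketch below is a minimal numerical illustration with small hypothetical matrices in place of the quantities appearing in (54); it is not the controller construction itself.

import numpy as np
from scipy.linalg import solve_sylvester

# Each step of (54) has the form  A1*Omega + Omega*A2 + A0 = 0  with A1, A2 Hurwitz.
# scipy solves A1*X + X*A2 = Q, so we pass Q = -A0.
rng = np.random.default_rng(2)
A1 = np.array([[-1.0, 0.3], [0.0, -2.0]])     # Hurwitz (eigenvalues -1, -2)
A2 = np.array([[-0.5, 0.0], [1.0, -1.5]])     # Hurwitz (eigenvalues -0.5, -1.5)
A0 = rng.standard_normal((2, 2))

Omega = solve_sylvester(A1, A2, -A0)
assert np.allclose(A1 @ Omega + Omega @ A2 + A0, 0)

# Because A1 and A2 are Hurwitz, spec(A1) and spec(-A2) are disjoint, so the
# solution is unique; it also equals the convergent integral
#     Omega = int_0^inf exp(A1*t) @ A0 @ exp(A2*t) dt,
# the same representation used later for Gamma_X in Part (iii) of the proof.

The same call, applied three times in the order $\Theta_{X}$, $\Gamma_{X}$, $\Phi$, mirrors the sequential solution described above.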

We may solve (51) in a manner analogous to how we solved (50). Namely, we can provide a formula for the optimal $Q_{11}$ and $Q_{21}$ as functions of $Q_{22}$. The result follows directly from Lemma 22 after we make a change of variables.

Lemma 23: Assume $T\in\mathcal{RH}_{\infty}$ is given by (9). Suppose Assumptions A1–A6 hold together with the structural requirement (6) and the parameter choice (48). If
\[
Q_{22}:=\left[\begin{array}{c|c}A_{Q} & B_{Q}\\ \hline C_{Q} & 0\end{array}\right]
\qquad\text{where $A_{Q}$ is Hurwitz}
\]
then the right-hand side of (51) is minimized by
\[
\begin{bmatrix}Q_{11}\\ Q_{21}\end{bmatrix}
=\left[\begin{array}{ccc|c}
A_{L_{d}}+[\,\bar{L}_{1}\ \ 0\,]C_{2} & 0 & 0 & \bar{L}_{1}\\
{[\,\bar{L}_{2}\ \ 0\,]C_{2}} & A_{K} & 0 & \bar{L}_{2}\\
{[\,\bar{L}_{3}\ \ B_{Q}\,]C_{2}} & 0 & A_{Q} & \bar{L}_{3}\\ \hline
K_{d} & K_{d}-K & -E_{2}C_{Q} & 0
\end{array}\right]. \tag{57}
\]
The quantities $\bar{L}_{1}$, $\bar{L}_{2}$, and $\bar{L}_{3}$ are defined by
\[
\bar{L}_{1}=-\big(\Theta_{Y}^{T}C_{11}^{T}+U_{12}^{T}+\bar{L}_{2}V_{12}\big)V_{11}^{-1},\quad
\bar{L}_{2}=-\big(\Psi C_{11}^{T}+U_{12}^{T}\big)V_{11}^{-1},\quad
\bar{L}_{3}=-\big(\Gamma_{Y}^{T}C_{11}^{T}+B_{Q}V_{21}\big)V_{11}^{-1}
\]
where $\Theta_{Y}$, $\Gamma_{Y}$, and $\Psi$ are the unique solutions to the linear equations
\[
E_{2}^{T}A_{L_{d}}\begin{bmatrix}\tilde{Y}\\ \Theta_{Y}\end{bmatrix}+\Theta_{Y}A_{M}+E_{2}^{T}B_{L_{d}}B_{L_{d}}^{T}E_{1}=0
\]
\[
A_{Q}\Gamma_{Y}+\Gamma_{Y}A_{M}+B_{Q}\left(E_{2}^{T}C_{2}\begin{bmatrix}\tilde{Y}\\ \Theta_{Y}\end{bmatrix}+V_{21}M+U_{21}\right)=0
\]
\[
A_{J}^{T}\Psi+\Psi A_{M}+A_{21}\tilde{Y}-B_{22}J\Theta_{Y}+U_{12}^{T}M+W_{21}+B_{22}C_{Q}\Gamma_{Y}=0 \tag{58}
\]
where $W$, $V$, $U$ are defined in (12), $X$, $\tilde{Y}$, $K$, $M$ are defined in (13), and $A_{L_{d}}$, $B_{L_{d}}$ are defined in (10).

A key observation that greatly simplifies the extent to which the optimization problems (50) and (51) are coupled is that the $Q_{ii}$ have simple state-space representations. In fact, we have the strong conclusion that for any fixed $Q_{11}$, the optimal $Q_{22}=g_{1}(Q_{11})$ has a fixed state dimension no greater than the dimension of the plant, as in the following result.

Theorem 24: Suppose $T\in\mathcal{RH}_{\infty}$ is given by (9) and Assumptions A1–A6 hold together with the structural requirement (6) and the parameter choice (48). The functions $g_{i}$ defined in (50) and (51) are given by
\[
g_{1}(Q_{11})=\left[\begin{array}{c|c}A_{L} & (L_{d}-L)E_{2}\\ \hline \bar{K}_{2} & 0\end{array}\right]
\qquad\qquad
g_{2}(Q_{22})=\left[\begin{array}{c|c}A_{K} & \bar{L}_{2}\\ \hline E_{1}^{T}(K_{d}-K) & 0\end{array}\right] \tag{59}
\]
where $\bar{K}_{2}$, $\bar{L}_{2}$ are defined in Lemmas 22 and 23, respectively.

Proof: This follows directly from Lemmas 22 and 23 and some simple state-space manipulations. Note that $g_{2}(Q_{22})$ depends on $Q_{22}$ only through the realization-independent quantities $C_{Q}A_{Q}^{k}B_{Q}$ for $k\geq 0$, and similarly for $g_{1}$. ∎
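The construction in Lemma 22 calls for stabilizing solutions of algebraic Riccati equations such as $\operatorname{ARE}(\bar{A},\bar{B}_{2},\bar{C}_{1},\bar{D}_{12})$. As a minimal numerical sketch, and only under the assumption that $\operatorname{ARE}(A,B,C,D)$ denotes the standard $\mathcal{H}_{2}$-type Riccati equation with cross term $C^{T}D$ (our reading of the notation, not a definition taken from this paper), such a solution and its gain can be computed with scipy:

import numpy as np
from scipy.linalg import solve_continuous_are

def are(A, B, C, D):
    """Assumed convention: stabilizing X and gain K for
       A'X + XA - (XB + C'D)(D'D)^{-1}(B'X + D'C) + C'C = 0,
       K = -(D'D)^{-1}(B'X + D'C), so that A + B K is Hurwitz."""
    R = D.T @ D
    X = solve_continuous_are(A, B, C.T @ C, R, s=C.T @ D)
    K = -np.linalg.solve(R, B.T @ X + D.T @ C)
    return X, K

# Tiny example with hypothetical data standing in for (A-bar, B2-bar, C1-bar, D12-bar).
A = np.array([[0.0, 1.0], [-2.0, -0.5]])
B = np.array([[0.0], [1.0]])
C = np.vstack([np.eye(2), np.zeros((1, 2))])
D = np.vstack([np.zeros((2, 1)), np.ones((1, 1))])
X, K = are(A, B, C, D)
print(np.linalg.eigvals(A + B @ K))           # all real parts negative

A dual call with transposed data would play the role of $\operatorname{ARE}(\bar{A}^{T},\bar{C}_{2}^{T},\bar{B}_{1}^{T},\bar{D}_{21}^{T})$, yielding the observer-side solution.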

Proof of Theorem 6, Part (iii): Applying the fixed-point results of Lemma 21, there must exist matrices $A_{P}$, $B_{P}$, $C_{P}$, $A_{Q}$, $B_{Q}$, $C_{Q}$ such that
\[
\left[\begin{array}{c|c}A_{P} & B_{P}\\ \hline C_{P} & 0\end{array}\right]
=\left[\begin{array}{c|c}A_{K} & \bar{L}_{2}\\ \hline E_{1}^{T}(K_{d}-K) & 0\end{array}\right] \tag{60}
\]
\[
\left[\begin{array}{c|c}A_{Q} & B_{Q}\\ \hline C_{Q} & 0\end{array}\right]
=\left[\begin{array}{c|c}A_{L} & (L_{d}-L)E_{2}\\ \hline \bar{K}_{2} & 0\end{array}\right] \tag{61}
\]
where (54) and (58) are satisfied. Given $A_{Q}$, $B_{Q}$, $C_{Q}$ we define $\Psi$ according to (58), and given $A_{P}$, $B_{P}$, $C_{P}$ we define $\Phi$ according to (54). We will show that these $\Phi$ and $\Psi$ satisfy the Sylvester equations (14) and (15).

For convenience, define $\mathcal{S}:=(A_{P},B_{P},C_{P},A_{Q},B_{Q},C_{Q})$. Our goal is to use (60) and (61) to eliminate the matrices in $\mathcal{S}$ from (54) and (58). We begin with (60). Note that we cannot simply set the corresponding state-space parameters in (60) equal to one another. This approach is erroneous because transfer function equality does not in general imply that the state-space matrices are also equal. For example, if $E_{1}^{T}(K_{d}-K)=0$, then we can set $C_{P}=0$ and any choice of $B_{P}$ satisfies (60). Equality of transfer functions does however imply equality of the Markov parameters. Namely
\[
C_{P}A_{P}^{k}B_{P}=E_{1}^{T}(K_{d}-K)A_{K}^{k}\bar{L}_{2}\qquad\text{for }k=0,1,\ldots. \tag{62}
\]
Now consider (54). Note that $\Theta_{X}$ does not depend on $\mathcal{S}$. Furthermore, the equation for $\Gamma_{X}$ is a Sylvester equation of the form $A_{J}^{T}\Gamma_{X}+\Gamma_{X}A_{P}+\Omega C_{P}=0$, where
\[
\Omega:=[\,\Theta_{X}\ \ \tilde{X}\,]B_{2}E_{1}+J^{T}R_{21}+S_{21}
\]
and $\Omega$ is independent of $\mathcal{S}$. Since $A_{J}$ and $A_{P}$ are Hurwitz by construction, the unique $\Gamma_{X}$ is given by the integral
\[
\Gamma_{X}=\int_{0}^{\infty}\exp\!\big(A_{J}^{T}t\big)\,\Omega\,C_{P}\exp\!\big(A_{P}t\big)\,dt.
\]
Substitute the Markov parameters (62), and conclude that
\[
\Gamma_{X}B_{P}=\int_{0}^{\infty}\exp\!\big(A_{J}^{T}t\big)\,\Omega\,E_{1}^{T}(K_{d}-K)\exp\!\big(A_{K}t\big)\,\bar{L}_{2}\,dt.
\]
The equation above is of the form $\Gamma_{X}B_{P}=\hat{\Gamma}_{X}\bar{L}_{2}$, where $\hat{\Gamma}_{X}$ satisfies
\[
A_{J}^{T}\hat{\Gamma}_{X}+\hat{\Gamma}_{X}A_{K}+\Omega E_{1}^{T}(K_{d}-K)=0. \tag{63}
\]
One can verify by direct substitution that (63) is solved by
\[
\hat{\Gamma}_{X}=[\,\Theta_{X}-X_{21}\ \ \ \tilde{X}-X_{22}\,]. \tag{64}
\]
Noting that the term $\Gamma_{X}B_{P}$ appears explicitly in the $\Phi$-equation of (54), we may replace the $\Gamma_{X}B_{P}$ term by $\hat{\Gamma}_{X}\bar{L}_{2}$, and substitute the expressions for $\hat{\Gamma}_{X}$ and $\bar{L}_{2}$ from (64) and Lemma 23, respectively. Doing so, we find that $\Theta_{X}$ cancels. The result is an equation involving only $\Phi$ and $\Psi$, and it turns out to be (14).

Repeating a similar procedure as above by instead examining (61) and (58), we find that $C_{Q}\Gamma_{Y}=\bar{K}_{2}\hat{\Gamma}_{Y}$, where $\hat{\Gamma}_{Y}$ is given by
\[
\hat{\Gamma}_{Y}=\begin{bmatrix}\tilde{Y}-Y_{11}\\ \Theta_{Y}-Y_{21}\end{bmatrix}.
\]
The result is a different equation involving only $\Phi$ and $\Psi$. This time, it turns out to be (15). We have therefore shown that existence of a solution to the fixed-point equations of Lemma 21 implies the existence of a solution to the Sylvester equations (14) and (15). This completes the proof. ∎
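The step from (60) to (62) uses only the fact that two realizations of the same transfer function share all Markov parameters, even though their state-space matrices may differ. The following small numpy check illustrates exactly this fact with toy matrices unrelated to the plant data.

import numpy as np

# Two realizations of the same transfer function, related by a similarity
# transform: the state-space matrices differ, but C A^k B does not.
rng = np.random.default_rng(3)
A = np.array([[-1.0, 1.0], [0.0, -3.0]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])

T = rng.standard_normal((2, 2))               # invertible with probability one
A2, B2, C2 = np.linalg.solve(T, A @ T), np.linalg.solve(T, B), C @ T

markov = lambda A, B, C, k: C @ np.linalg.matrix_power(A, k) @ B
for k in range(6):
    assert np.allclose(markov(A, B, C, k), markov(A2, B2, C2, k))
print("equal Markov parameters for k = 0..5, despite different (A, B, C)")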
VII. CONCLUSION

In this article, we studied the class of two-player output-feedback problems with a nested information pattern. We began by giving necessary and sufficient state-space conditions for the existence of a structured stabilizing controller. This led to a Youla-like parameterization of all such controllers and a convexification of the two-player problem.

The main result of this paper is explicit state-space formulae for the optimal H2 controller for the two-player output-feedback problem. In the centralized case, it is a celebrated and widely-generalized result that the controller is a composition of an optimal state-feedback gain with a Kalman filter estimator. Our approach generalizes both the centralized formulae and this separation structure to the two-player decentralized case. We show that the H2-optimal structured controller has generically twice the state dimension of the plant, and we give intuitive interpretations for the states of the controller as steady-state Kalman filters. The player with more information must duplicate the estimator of the player with less information. This has the simple anthropomorphic interpretation that Player 2 is correcting mistakes made by Player 1.

Both the state-space dimension and separation structure of the optimal controller were previously unknown. While these results show that the optimal controller for this problem has an extremely simple state-space structure, not all such decentralized problems exhibit such pleasant behavior. One example is the two-player fully-decentralized state-feedback problem, where even though the optimal linear controller is known to be rational, it is shown in [10] that the number of states of the controller may grow quadratically with the state dimension of the plant.

The formulae that we give for the optimal controller are simply computable, requiring the solution of four standard AREs, two that have the same dimension as the plant and two with a smaller dimension. In addition, one must solve a linear matrix equation. All of these computations are simple and can be carried out with readily available existing code.

While there is as yet no complete state-space theory for decentralized control, in this work we provide solutions to a prototypical class of problems which exemplify many of the features found in more general problems. It remains a fundamental and important problem to fully understand the separation structure of optimal decentralized controllers in the general case. While we solve the two-player triangular case, we hope that the solution gives some insight and possible hints regarding the currently unknown structure of the optimal controllers for more general architectures.

ACKNOWLEDGMENT

The authors would like to thank B. Bernhardsson for very helpful discussions.

REFERENCES

[1] S. Adlakha, S. Lall, and A. Goldsmith, "Networked Markov decision processes with delays," IEEE Trans. Autom. Control, vol. 57, no. 4, pp. 1013–1018, 2012.
[2] C. A. Desoer and W. S. Chan, "The feedback interconnection of lumped linear time-invariant systems," J. Franklin Inst., vol. 300, no. 5–6, pp. 335–351, 1975.
[3] F. Farokhi, C. Langbort, and K. H. Johansson, "Optimal structured static state-feedback control design with limited model information for fully-actuated systems," Automatica, vol. 49, no. 2, pp. 326–337, 2013.
[4] Y.-C. Ho and K.-C. Chu, "Team decision theory and information structures in optimal control problems–Part I," IEEE Trans. Autom. Control, vol. 17, no. 1, pp. 15–22, 1972.
[5] R. A. Horn and C. R. Johnson, Topics in Matrix Analysis. Cambridge, U.K.: Cambridge Univ. Press, 1994.
[6] J. H. Kim and S. Lall, "A unifying condition for separable two player optimal control problems," in Proc. IEEE Conf. Decision Control, 2011, pp. 3818–3823.
[7] M. Kristalny, "Exploiting previewed information in estimation and control," Ph.D. thesis, Faculty of Mech. Eng., Technion–IIT, Haifa, Israel, Aug. 2010.
[8] A. Lamperski and J. C. Doyle, "Dynamic programming solutions for decentralized state-feedback LQG problems with output feedback," in Proc. Amer. Control Conf., 2012, pp. 6322–6327.
[9] L. Lessard, "Decentralized LQG control of systems with a broadcast architecture," in Proc. IEEE Conf. Decision Control, 2012, pp. 6241–6246.
[10] L. Lessard, "Optimal control of a fully decentralized quadratic regulator," in Proc. 50th Annu. Allerton Conf. Commun., Control, Comp., 2012, pp. 48–54.
[11] L. Lessard, M. Kristalny, and A. Rantzer, "On structured realizability and stabilizability of linear systems," in Proc. Amer. Control Conf., 2013, pp. 5784–5790.
[12] L. Lessard and S. Lall, "A state-space solution to the two-player decentralized optimal control problem," in Proc. Allerton Conf. Commun., Control, Comp., 2011, pp. 1599–1564.
[13] L. Lessard and S. Lall, "Optimal controller synthesis for the decentralized two-player problem with output feedback," in Proc. Amer. Control Conf., 2012, pp. 6314–6321.
[14] D. G. Luenberger, Optimization by Vector Space Methods. New York, NY, USA: Wiley-Interscience, 1997.
[15] N. Matni and J. C. Doyle, "Optimal distributed LQG state feedback with varying communication delays," in Proc. IEEE Conf. Decision Control, 2013.
[16] A. Mishra, C. Langbort, and G. E. Dullerud, "Optimal decentralized control of a stochastically switched system with local parameter knowledge," in Proc. Eur. Control Conf., 2013, pp. 75–80.
[17] A. Nayyar, A. Mahajan, and D. Teneketzis, "Optimal control strategies in delayed sharing information structures," IEEE Trans. Autom. Control, vol. 56, no. 7, pp. 1606–1620, 2011.
[18] X. Qi, M. V. Salapaka, P. G. Voulgaris, and M. Khammash, "Structured optimal and robust control with multiple criteria: A convex solution," IEEE Trans. Autom. Control, vol. 49, no. 10, pp. 1623–1640, 2004.
[19] A. Rantzer, "Linear quadratic team theory revisited," in Proc. Amer. Control Conf., 2006, pp. 1637–1641.
[20] M. Rotkowitz and S. Lall, "A characterization of convex problems in decentralized control," IEEE Trans. Autom. Control, vol. 51, no. 2, pp. 274–286, 2006.
[21] M. Rotkowitz and S. Lall, "Convexification of optimal decentralized control without a stabilizing controller," in Proc. Int. Symp. Math. Theory Netw. Syst. (MTNS), 2006, pp. 1496–1499.
[22] A. Saberi, A. A. Stoorvogel, and P. Sannuti, Filtering Theory With Applications to Fault Detection, Isolation, Estimation. Boston, MA, USA: Birkhäuser, 2007.
[23] C. W. Scherer, "Structured finite-dimensional controller design by convex optimization," Linear Algebra Appl., vol. 351–352, pp. 639–669, 2002.
[24] P. Shah and P. Parrilo, "H2-optimal decentralized control over posets: A state-space solution for state-feedback," IEEE Trans. Autom. Control, vol. 58, no. 12, pp. 3084–3096, 2013.
[25] J. Swigart and S. Lall, "An explicit state-space solution for a decentralized two-player optimal linear-quadratic regulator," in Proc. Amer. Control Conf., 2010, pp. 6385–6390.
[26] J. Swigart and S. Lall, "Optimal controller synthesis for a decentralized two-player system with partial output feedback," in Proc. Amer. Control Conf., 2011, pp. 317–323.
[27] A. S. M. Vamsi and N. Elia, "Design of distributed controllers realizable over arbitrary directed networks," in Proc. IEEE Conf. Decision Control, 2010, pp. 4795–4800.
[28] P. G. Voulgaris, "Control of nested systems," in Proc. Amer. Control Conf., 2000, vol. 6, pp. 4442–4445.
[29] H. S. Witsenhausen, "A counterexample in stochastic optimum control," SIAM J. Control, vol. 6, no. 1, pp. 131–147, 1968.
[30] J. Wu and S. Lall, "A dynamic programming algorithm for decentralized Markov decision processes with a broadcast structure," in Proc. IEEE Conf. Decision Control, 2010, pp. 6143–6148.
[31] V. Yadav, M. V. Salapaka, and P. G. Voulgaris, "Architectures for distributed controller with sub-controller communication uncertainty," IEEE Trans. Autom. Control, vol. 55, no. 8, pp. 1765–1780, 2010.
[32] D. Youla, H. Jabr, and J. Bongiorno, Jr., "Modern Wiener-Hopf design of optimal controllers–Part II: The multivariable case," IEEE Trans. Autom. Control, vol. 21, no. 3, pp. 319–338, 1976.
[33] D. Zelazo and M. Mesbahi, "H2 analysis and synthesis of networked dynamic systems," in Proc. Amer. Control Conf., 2009, pp. 2966–2971.
[34] K. Zhou, J. Doyle, and K. Glover, Robust and Optimal Control. Englewood Cliffs, NJ, USA: Prentice-Hall, 1995.

Laurent Lessard (M'09) was born in Toronto, ON, Canada. He received the B.Sc. degree in engineering science from the University of Toronto, Toronto, and the M.S. and Ph.D. degrees in aeronautics and astronautics from Stanford University, Stanford, CA, USA.
He is currently a Postdoctoral Scholar in the Berkeley Center for Control and Identification, University of California, Berkeley. Before that, he was an LCCC postdoc in the Department of Automatic Control, Lund University, Lund, Sweden. His research interests include decentralized control, robust control, and large-scale optimization.
Dr. Lessard received the O. Hugo Schuck Best Paper Award at the American Control Conference in 2013.

Sanjay Lall (F'15) received the B.A. degree in mathematics and the Ph.D. degree in engineering from the University of Cambridge, Cambridge, U.K.
He is a Professor in the Departments of Electrical Engineering and Aeronautics and Astronautics at Stanford University, Stanford, CA, USA. Previously he was a Research Fellow in the Department of Control and Dynamical Systems, California Institute of Technology, Pasadena, and prior to that he was a NATO Research Fellow in the Laboratory for Information and Decision Systems, Massachusetts Institute of Technology, Cambridge. He was also a Visiting Scholar in the Department of Automatic Control, Lund Institute of Technology, Lund, Sweden. His research focuses on the development of advanced engineering methodologies for the design of control systems, and his work addresses problems including decentralized control and model reduction.
Dr. Lall received the O. Hugo Schuck Best Paper Award at the American Control Conference in 2013, the George S. Axelby Outstanding Paper Award from the IEEE Control Systems Society in 2007, the NSF CAREER Award in 2007, the Presidential Early Career Award for Scientists and Engineers (PECASE) in 2007, and the Graduate Service Recognition Award from Stanford University in 2005. With his students, he received the best student paper award at the IEEE Conference on Decision and Control in 2005 and the best student paper award at the IEEE International Conference on Power Systems Technology in 2012.