
IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 53, NO. 1, JANUARY 2005

Robust Mean-Squared Error Estimation in the Presence of Model Uncertainties

Yonina C. Eldar, Member, IEEE, Aharon Ben-Tal, and Arkadi Nemirovski

Abstract—We consider the problem of estimating an unknown parameter vector $x$ in a linear model that may be subject to uncertainties, where the vector $x$ is known to satisfy a weighted norm constraint. We first assume that the model is known exactly and seek the linear estimator that minimizes the worst-case mean-squared error (MSE) across all possible values of $x$. We show that for an arbitrary choice of weighting, the optimal minimax MSE estimator can be formulated as a solution to a semidefinite programming problem (SDP), which can be solved very efficiently. We then develop a closed-form expression for the minimax MSE estimator for a broad class of weighting matrices and show that it coincides with the shrunken estimator of Mayer and Willke, with a specific choice of shrinkage factor that explicitly takes the prior information into account.

Next, we consider the case in which the model matrix is subject to uncertainties and seek the robust linear estimator that minimizes the worst-case MSE across all possible values of $x$ and all possible values of the model matrix. As we show, the robust minimax MSE estimator can also be formulated as a solution to an SDP.

Finally, we demonstrate through several examples that the minimax MSE estimator can significantly increase the performance over the conventional least-squares (LS) estimator, and when the model matrix is subject to uncertainties, the robust minimax MSE estimator can lead to a considerable improvement in performance over the minimax MSE estimator.

Index Terms—Data uncertainty, linear estimation, mean-squared error estimation, minimax estimation, robust estimation.

I. INTRODUCTION

THE problem of estimating a set of unknown deterministic parameters $x$ observed through a linear transformation $H$ and corrupted by additive noise $w$ arises in a large variety of areas in science and engineering, e.g., communication, economics, signal processing, seismology, and control.

Owing to the lack of statistical information about the parameters $x$, the estimated parameters are often chosen to optimize a criterion based on the observed signal $y$. The celebrated least-squares (LS) estimator [1]-[3], which was first used by Gauss to predict movements of planets [4], seeks the linear estimate $\hat{x}$ of $x$ that results in an estimated data vector $\hat{y} = H\hat{x}$ that is closest, in an LS sense, to the given data vector $y$, so that $\hat{x}$ is chosen to minimize the Euclidean norm of the data error $y - H\hat{x}$. However, in an estimation context, the objective typically is to minimize the size of the estimation error $\hat{x} - x$ rather than that of the data error $y - H\hat{x}$.

To develop an estimation method that is based directly on the estimation error, we may seek the estimator that minimizes the mean-squared error (MSE), where the MSE of an estimate $\hat{x}$ of $x$ is the expectation of the squared norm of the estimation error and is equal to the sum of the variance and the squared norm of the bias. Since the bias generally depends on the unknown parameters $x$, we cannot choose an estimator to directly minimize the MSE. A common approach is to restrict the estimator to be linear and unbiased and then seek the estimator of this form that minimizes the variance or the MSE. It is well known that the LS estimator minimizes the variance in the estimate among all unbiased linear estimators. However, this does not imply that the LS estimator leads to a small variance or a small MSE. A difficulty often encountered in this estimation problem is that the resulting variance can be very large, particularly in nonorthogonal and ill-conditioned problems.

Various modifications of the LS estimator for the case in which the data model holds, i.e., with $H$ and the noise covariance $C_w$ known exactly, have been proposed. Among the alternatives are Tikhonov regularization [5], which is also known in the statistical literature as the ridge estimator [6], the shrunken estimator [7], and the covariance shaping LS estimator [8], [9]. In general, these LS alternatives attempt to reduce the MSE in estimating $x$ by allowing for a bias.

Manuscript received January 30, 2003; revised January 29, 2004. The work of Y. C. Eldar was supported in part by the Taub Foundation and by a Technion V.P.R. Fund. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Jean Pierre Delmas. Y. C. Eldar is with the Department of Electrical Engineering, Technion-Israel Institute of Technology, Haifa 32000, Israel (e-mail: [email protected]). A. Ben-Tal and A. Nemirovski are with the MINERVA Optimization Center, Department of Industrial Engineering, Technion-Israel Institute of Technology, Haifa 32000, Israel (e-mail: [email protected]; [email protected]). Digital Object Identifier 10.1109/TSP.2004.838933
However, each of the estimators above is designed to optimize an objective that does not depend directly on the MSE but rather on the data error $y - H\hat{x}$.

In many engineering applications, the model matrix $H$ is also subject to uncertainties. For example, the matrix may be estimated from noisy data, in which case $H$ may not be known exactly. If the actual data matrix deviates from the one assumed, then the performance of an estimator designed based on $H$ alone may deteriorate considerably. Various methods have been proposed to account for uncertainties in $H$. The total LS method [10], [11] seeks the parameters $x$ and the minimum perturbation to the model matrix that minimize the data error. Although the total LS method allows for uncertainties in $H$, in many cases it results in correction terms that are unnecessarily large. In particular, when the model matrix is square, the total LS method reduces to the conventional LS method, which does not take the uncertainties into account. Recently, several methods [12]-[14] have been developed to treat the case in which the perturbation to the model matrix is bounded. These methods seek the parameters that minimize the worst-case data error across


all bounded perturbations of $H$ and possibly bounded perturbations of the data vector. In [15], the authors seek the estimator that minimizes the best possible data error over all possible perturbations of $H$. Here again, the above objectives depend on the data error and not on the estimation error or the MSE.

In this paper, we consider the case in which the (possibly weighted) norm of the unknown vector $x$ is bounded, and we develop robust estimators of $x$, whose performance is reasonably good across all possible values of $x$ in the region of uncertainty, by minimizing objectives that depend explicitly on the MSE.

We first consider the case in which $H$ is known exactly and develop a minimax linear robust estimator that minimizes the worst-case MSE across all possible bounded values of $x$, i.e., over all values of $x$ such that $x^* T x \le L^2$ for some constant $L$ and weighting matrix $T$. The minimax MSE estimator for the special case in which $T = I$ and the covariance matrix of the noise vector $w$ is given by $C_w = \sigma^2 I$ for some $\sigma^2 > 0$ has been developed in [16]. Here, we extend the results to arbitrary $T$ and $C_w$ and show that the minimax MSE estimator can be formulated as the solution to a semidefinite programming problem (SDP) [17]-[19], which is a tractable convex optimization problem that can be solved efficiently, e.g., using interior point methods [19], [20]. We then develop a closed-form solution to the minimax estimation problem for the case in which the weighting $T$ and $H^* C_w^{-1} H$ have the same eigenvector matrix. In particular, when $T = I$, we show that the optimal estimator is a shrunken estimator proposed by Mayer and Willke [7], with a specific choice of shrinkage factor that explicitly takes the prior information into account. We demonstrate through simulations that the minimax MSE estimator can increase the performance over the conventional LS approach.

We then consider the case in which the model matrix is not known exactly but is rather given by $H = H_0 + \delta H$, where $H_0$ is known, and $\delta H$ is a bounded perturbation matrix. Under this model, we seek a robust linear estimator that minimizes the worst-case MSE across all possible values of $x$ and $\delta H$. Here again, we show that the optimal estimator can be found by solving an SDP. In the special case in which the weighting $T$ and $H_0^* H_0$ have the same eigenvector matrix and $C_w$ and $H_0 H_0^*$ have the same eigenvector matrix, we show that the minimax MSE estimator can be found by solving a convex optimization problem in two unknowns, regardless of the problem dimension.

We note that an alternative way to account for bounds on $x$ is through regularization methods, such as Tikhonov regularization [5]. A more general regularization method that takes uncertainties in $H$, as well as possibly other data uncertainties, into account was developed in [21]. However, these methods are based on minimizing a weighted data error, whereas our approach directly minimizes the estimation error. In a companion paper [27], we consider a minimax regret approach that also depends explicitly on the MSE rather than the data error.

The paper is organized as follows. In Section II, we consider the case in which $H$ is known and develop an SDP formulation of the linear minimax MSE estimator that minimizes the worst-case MSE across all possible bounded parameters $x$. In Section III, we develop a closed-form expression for the minimax linear estimator in the case in which the weighting $T$ and $H^* C_w^{-1} H$ have the same eigenvector matrix. In Section IV, we consider the case in which both $x$ and the model matrix are subject to uncertainties and show that the minimax MSE estimator that minimizes the worst-case MSE in the region of uncertainty can again be formulated as an SDP. We then consider, in Section V, the special case in which $T$ and $H_0^* H_0$ have the same eigenvector matrix and $C_w$ and $H_0 H_0^*$ have the same eigenvector matrix. Examples illustrating the performance advantage of the minimax MSE estimator over the LS estimator, and the advantage of the robust minimax MSE estimator over the minimax MSE estimator in the presence of model uncertainties, are discussed in Section VI.

II. MINIMAX MSE ESTIMATION WITH KNOWN $H$

We denote vectors in $\mathbb{C}^m$ by boldface lowercase letters and matrices in $\mathbb{C}^{n \times m}$ by boldface uppercase letters. $I$ denotes the identity matrix of appropriate dimension, $(\cdot)^*$ denotes the Hermitian conjugate of the corresponding matrix, and $\hat{(\cdot)}$ denotes an estimated vector or matrix. The notation $B \succeq 0$ means that $B$ is positive semidefinite.

Consider the problem of estimating the unknown deterministic parameters $x$ in the linear model
$$y = Hx + w \qquad (1)$$
where $H$ is a known $n \times m$ matrix with full rank $m$, and $w$ is a zero-mean random vector with covariance $C_w$. We assume that $x$ is known to satisfy the weighted norm constraint $x^* T x \le L^2$ for some positive definite matrix $T$ and scalar $L > 0$.

We estimate $x$ using a linear estimator so that $\hat{x} = G y$ for some $m \times n$ matrix $G$. The MSE of the estimator $\hat{x}$ is given by
$$E\{\|\hat{x} - x\|^2\} = \operatorname{Tr}(G C_w G^*) + x^*(I - GH)^*(I - GH)x. \qquad (2)$$
The second term in (2) (the squared norm of the bias) depends on the unknown parameters $x$; thus, in general, we cannot construct an estimator to directly minimize the MSE. Instead, we seek the linear estimator that minimizes the worst-case MSE across all possible values of $x$ satisfying $x^* T x \le L^2$. Thus, we consider the problem
$$\min_{G} \max_{x^* T x \le L^2} \left\{ \operatorname{Tr}(G C_w G^*) + x^*(I - GH)^*(I - GH)x \right\}. \qquad (3)$$

To develop the solution to (3), we first determine the worst possible parameters $x$, i.e., the parameters that are the solution to the inner problem in (3):
$$\max_{x^* T x \le L^2} x^*(I - GH)^*(I - GH)x. \qquad (4)$$
By introducing the change of variable $z = T^{1/2} x$, we have that
$$\max_{\|z\| \le L} z^* T^{-1/2}(I - GH)^*(I - GH) T^{-1/2} z = L^2 \lambda_{\max}(Z) \qquad (5)$$
where $\lambda_{\max}(Z)$ is the largest eigenvalue of the matrix $Z = T^{-1/2}(I - GH)^*(I - GH) T^{-1/2}$.
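To make (2)-(5) concrete, the following short numerical sketch (ours, not part of the original paper; all names are ours) evaluates the worst-case MSE of an arbitrary linear estimator $G$ by computing the largest eigenvalue in (5):

```python
import numpy as np

def worst_case_mse(G, H, C_w, T, L):
    """Worst-case MSE of x_hat = G y over {x : x* T x <= L^2}, per (2)-(5).

    Variance term: Tr(G C_w G*); worst-case bias term: L^2 * lambda_max(Z)
    with Z = T^{-1/2} (I - G H)* (I - G H) T^{-1/2}.
    """
    m = H.shape[1]
    variance = np.trace(G @ C_w @ G.conj().T).real
    # Inverse square root of the positive definite weighting T.
    evals, V = np.linalg.eigh(T)
    T_inv_sqrt = V @ np.diag(evals ** -0.5) @ V.conj().T
    B = (np.eye(m) - G @ H) @ T_inv_sqrt
    bias_worst = L**2 * np.linalg.eigvalsh(B.conj().T @ B)[-1]
    return variance + bias_worst

# Usage: compare the LS estimator with a crude shrinkage of it.
rng = np.random.default_rng(0)
H = rng.standard_normal((8, 4))
C_w, T, L = np.eye(8), np.eye(4), 1.0
G_ls = np.linalg.inv(H.T @ H) @ H.T            # LS estimator (here C_w = I)
print(worst_case_mse(G_ls, H, C_w, T, L))
print(worst_case_mse(0.5 * G_ls, H, C_w, T, L))  # shrinkage may reduce worst-case MSE
```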

We can express $\lambda_{\max}(Z)$ as the solution to the problem
$$\lambda_{\max}(Z) = \min_{\tau} \tau \qquad (6)$$
subject to
$$\tau I - T^{-1/2}(I - GH)^*(I - GH) T^{-1/2} \succeq 0. \qquad (7)$$
From (5)-(7), it follows that the problem (3) can be reformulated as
$$\min_{G, \tau} \; \operatorname{Tr}(G C_w G^*) + L^2 \tau \qquad (8)$$
subject to (7), which in turn is equivalent to
$$\min_{G, \tau, \lambda} \; \lambda + L^2 \tau \qquad (9)$$
subject to
$$\operatorname{Tr}(G C_w G^*) \le \lambda; \qquad \tau I - T^{-1/2}(I - GH)^*(I - GH) T^{-1/2} \succeq 0. \qquad (10)$$

A. Semidefinite Programming Formulation of the Estimation Problem

We now show that the problem of (9) and (10) can be formulated as a standard semidefinite program (SDP) [17]-[19], which is the problem of minimizing a linear objective subject to linear matrix inequality (LMI) constraints. An LMI is a matrix constraint of the form $A(z) \succeq 0$, where the matrix $A(z)$ depends linearly on the vector of variables $z$. The advantage of this formulation is that it readily lends itself to efficient computational methods. Indeed, by exploiting the many well-known algorithms for solving SDPs [17], [18], e.g., interior point methods¹ [19], [20], the optimal estimator can be computed efficiently in polynomial time. Furthermore, SDP-based algorithms are guaranteed to converge to the global optimum.

¹Interior point methods are iterative algorithms that terminate once a prespecified accuracy has been reached. A worst-case analysis of interior point methods shows that the effort required to solve an SDP to a given accuracy grows no faster than a polynomial of the problem size. In practice, the algorithms behave much better than predicted by the worst-case analysis; in fact, in many cases, the number of iterations is almost constant in the size of the problem.

To establish our claim, let $g = \operatorname{vec}(G)$, where $\operatorname{vec}(G)$ denotes the vector obtained by stacking the columns of $G$. With this notation, the constraints (10) become
$$\operatorname{Tr}(G C_w G^*) \le \lambda; \qquad \tau I - T^{-1/2}(I - GH)^*(I - GH) T^{-1/2} \succeq 0 \qquad (11)$$
with $G$ regarded as a linear function of $g$. The constraints (11) are not in the form of an LMI because of the terms $G C_w G^*$ and $(I - GH)^*(I - GH)$, in which the elements of $G$ do not appear linearly. To express these inequalities as LMIs in the variables $\lambda$, $\tau$, and $g$, we rely on the following lemma [22, p. 472]:

Lemma 1 (Schur's Complement): Let
$$Z = \begin{bmatrix} A & B \\ B^* & C \end{bmatrix}$$
be a Hermitian matrix with $C \succ 0$. Then, $Z \succeq 0$ if and only if $\Delta_C \succeq 0$, where $\Delta_C = A - B C^{-1} B^*$ is the Schur complement of $C$ in $Z$.

From Lemma 1, it follows that the constraints (11) are satisfied if and only if
$$\begin{bmatrix} \lambda & \operatorname{vec}(G C_w^{1/2})^* \\ \operatorname{vec}(G C_w^{1/2}) & I \end{bmatrix} \succeq 0; \qquad \begin{bmatrix} \tau I & T^{-1/2}(I - GH)^* \\ (I - GH) T^{-1/2} & I \end{bmatrix} \succeq 0. \qquad (12)$$

Note that the constraints (12) are indeed LMIs in the variables $\lambda$, $\tau$, and $g$. We conclude that the problem of (3) is equivalent to the SDP defined by (9) and (12).

In the next section, we develop an explicit expression for the optimal estimator that minimizes the worst-case estimation error in the case in which the weighting matrix $T$ and the matrix $H^* C_w^{-1} H$ have the same eigenvector matrix.
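The SDP of (9) and (12) can be prototyped directly in an off-the-shelf conic solver. The following sketch is ours, not the authors': it assumes real-valued data and the open-source cvxpy package, and it encodes the first LMI in (12) by the equivalent constraint $\operatorname{Tr}(G C_w G^T) \le \lambda$:

```python
import cvxpy as cp
import numpy as np
from scipy.linalg import sqrtm

def minimax_mse_estimator(H, C_w, T, L):
    """Solve the SDP (9), (12) for the minimax MSE estimator G (real case)."""
    n, m = H.shape
    Cw_half = np.real(sqrtm(C_w))
    T_inv_half = np.real(np.linalg.inv(sqrtm(T)))
    G = cp.Variable((m, n))
    tau = cp.Variable(nonneg=True)
    lam = cp.Variable(nonneg=True)
    E = (np.eye(m) - G @ H) @ T_inv_half
    # Second LMI in (12): worst-case bias constraint via Schur complement.
    lmi_bias = cp.bmat([[tau * np.eye(m), E.T],
                        [E, np.eye(m)]])
    constraints = [lmi_bias >> 0,
                   # Equivalent to the first LMI in (12): Tr(G C_w G^T) <= lam.
                   cp.sum_squares(G @ Cw_half) <= lam]
    prob = cp.Problem(cp.Minimize(lam + L**2 * tau), constraints)
    prob.solve(solver=cp.SCS)
    return G.value

# Usage on a small ill-conditioned problem.
rng = np.random.default_rng(1)
H = rng.standard_normal((6, 3)) @ np.diag([1.0, 0.1, 0.01])
G = minimax_mse_estimator(H, np.eye(6), np.eye(3), L=1.0)
```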

III. MINIMAX MSE ESTIMATOR FOR $T$ AND $H^* C_w^{-1} H$ JOINTLY DIAGONALIZABLE

We now consider the case in which $T$ and $H^* C_w^{-1} H$ have the same eigenvector matrix. Thus, if $H^* C_w^{-1} H$ has an eigendecomposition $H^* C_w^{-1} H = V \Lambda V^*$, where $V$ is a unitary matrix and $\Lambda$ is a diagonal matrix, then $T = V \Delta V^*$ for some diagonal matrix $\Delta$. We then have the following proposition.

Proposition 1: Let $x$ denote the deterministic unknown parameters in the model $y = Hx + w$, where $H$ is a known $n \times m$ matrix with rank $m$, and $w$ is a zero-mean random vector with positive definite covariance $C_w$. Let $H^* C_w^{-1} H = V \Lambda V^*$, where $\Lambda$ is a diagonal matrix with diagonal elements $\lambda_i > 0$, and let $T = V \Delta V^*$, where $\Delta$ is a diagonal matrix with diagonal elements $\delta_i > 0$ arranged in decreasing order. Then, the solution to the problem
$$\min_{G} \max_{x^* T x \le L^2} E\{\|Gy - x\|^2\}$$
is given by
$$\hat{x} = V D V^* \hat{x}_{\mathrm{LS}}, \qquad \hat{x}_{\mathrm{LS}} = (H^* C_w^{-1} H)^{-1} H^* C_w^{-1} y \qquad (13)$$
where $D$ is the diagonal matrix with diagonal elements $d_i = \max(0, 1 - \alpha\sqrt{\delta_i})$,
$$\alpha = \frac{\sum_{i=k_0}^{m} \sqrt{\delta_i}/\lambda_i}{L^2 + \sum_{i=k_0}^{m} \delta_i/\lambda_i} \qquad (14)$$
and $k_0$ is the smallest index such that
$$\alpha \sqrt{\delta_{k_0}} < 1. \qquad (15)$$
Since $d_i = 0$ for $i < k_0$, the estimator acts only on the subspace spanned by the last $m - k_0 + 1$ columns of $V$; the corresponding orthogonal projection is $P_{k_0} = V_{k_0} V_{k_0}^*$, where $V_{k_0}$ consists of the last $m - k_0 + 1$ columns of $V$.
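The closed form of Proposition 1 is straightforward to evaluate. The sketch below is our reconstruction (names are ours): given the eigenvalues $\lambda_i$ and weights $\delta_i$ in decreasing order, it scans for the smallest valid index $k_0$ and forms the shrinkage factors $d_i$ of (13)-(15):

```python
import numpy as np

def prop1_shrinkage(lam, delta, L):
    """Shrinkage factors d_i of Proposition 1 (delta sorted in decreasing order).

    For each candidate k0, alpha(k0) uses only indices i >= k0; k0 is the
    smallest index with alpha * sqrt(delta[k0]) < 1, cf. (14)-(15).
    """
    m = len(lam)
    for k0 in range(m):
        s = slice(k0, m)
        alpha = np.sum(np.sqrt(delta[s]) / lam[s]) / (L**2 + np.sum(delta[s] / lam[s]))
        if alpha * np.sqrt(delta[k0]) < 1:           # condition (15)
            d = np.maximum(0.0, 1 - alpha * np.sqrt(delta))
            d[:k0] = 0.0                             # components i < k0 are zeroed
            return d
    raise RuntimeError("unreachable: k0 = m always satisfies (15), cf. (16)")

d = prop1_shrinkage(np.array([4.0, 2.0, 1.0]), np.array([9.0, 4.0, 1.0]), L=1.0)
```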


Before proving Proposition 1, we note that there always exists a $k_0$ satisfying (15). Indeed, for $k_0 = m$, we have that
$$\alpha \sqrt{\delta_m} = \frac{\delta_m/\lambda_m}{L^2 + \delta_m/\lambda_m} < 1 \qquad (16)$$
so that $k_0 = m$ always satisfies (15). For particular values of $\lambda_i$ and $\delta_i$, there may be smaller values of $k_0$ for which (15) is satisfied.

Proof: The proof of Proposition 1 is comprised of three parts. First, we show that the optimal $G$ minimizing the worst-case MSE has the form
$$G = B (H^* C_w^{-1} H)^{-1} H^* C_w^{-1} \qquad (17)$$
for some $m \times m$ matrix $B$. We then show that $B$ can be chosen as a diagonal matrix. Finally, we derive the optimal values of the diagonal elements of $B$.

We begin by showing that the optimal $G$ has the form given by (17). To this end, note that the MSE of (2) depends on $G$ only through $GH$ and $\operatorname{Tr}(G C_w G^*)$. Now, for any choice of $G$,
$$\operatorname{Tr}(G C_w G^*) = \operatorname{Tr}(G P C_w P^* G^*) + \operatorname{Tr}(G(I - P) C_w (I - P)^* G^*) \ge \operatorname{Tr}(G P C_w P^* G^*) \qquad (18)$$
where
$$P = H (H^* C_w^{-1} H)^{-1} H^* C_w^{-1} \qquad (19)$$
is the orthogonal projection onto the range space of $H$ (with orthogonality in the inner product weighted by $C_w^{-1}$). In addition, $GPH = GH$ since $PH = H$. Thus, to minimize $\operatorname{Tr}(G C_w G^*)$, it is sufficient to consider matrices $G$ that satisfy
$$G = GP. \qquad (20)$$
Substituting (19) into (20), we have
$$G = GH (H^* C_w^{-1} H)^{-1} H^* C_w^{-1}. \qquad (21)$$
Denoting by $B$ the matrix $GH$, (21) reduces to (17).

We now show that $B$ can be chosen as a diagonal matrix. With $G$ of the form (17) we have $I - GH = I - B$, and since $T = V\Delta V^*$, we can express the constraints (10) in terms of $\tilde{B} = V^* B V$ as
$$\operatorname{Tr}(\tilde{B} \Lambda^{-1} \tilde{B}^*) \le \lambda \qquad (22)$$
and
$$\tau I - \Delta^{-1/2}(I - \tilde{B})^*(I - \tilde{B})\Delta^{-1/2} \succeq 0 \qquad (23)$$
which is equivalent to
$$\frac{|(I - \tilde{B})u_i|^2 \text{-type quadratic bounds on each direction, i.e., } \tau \Delta \succeq (I - \tilde{B})^*(I - \tilde{B}).} \qquad (24)$$
Thus, our problem reduces to finding $\lambda$, $\tau$, and $\tilde{B}$ that minimize $\lambda + L^2\tau$ subject to (22) and (24).

Let $F$ be any diagonal matrix with diagonal elements equal to $\pm 1$. If $\tilde{B}$ satisfies the constraints (22) and (24), then so does $F\tilde{B}F$. Indeed
$$\operatorname{Tr}(F\tilde{B}F \Lambda^{-1} F\tilde{B}^*F) = \operatorname{Tr}(\tilde{B} \Lambda^{-1} \tilde{B}^*) \qquad (25)$$
where we used the fact that diagonal matrices commute and that $F^2 = I$. Similarly
$$(I - F\tilde{B}F)^*(I - F\tilde{B}F) = F(I - \tilde{B})^*(I - \tilde{B})F \qquad (26)$$
and since $A \succeq 0$ if and only if $FAF \succeq 0$, we conclude that if $\tilde{B}$ satisfies (24), then so does $F\tilde{B}F$. Therefore, if $\tilde{B}$ is an optimal matrix that minimizes $\lambda + L^2\tau$ subject to (22) and (24), then $F\tilde{B}F$ is also an optimal solution. Now, since the problem of minimizing subject to (22) and (24) is convex, the set of optimal solutions is also convex [23], which implies that if $\tilde{B}$ is optimal for any diagonal $F$ with diagonal elements $\pm 1$, then so is $2^{-m}\sum_F F\tilde{B}F$, where the summation is over all diagonal matrices $F$ with diagonal elements $\pm 1$. It is easy to see that $2^{-m}\sum_F F\tilde{B}F$ is a diagonal matrix. Therefore, we have shown that there exists an optimal diagonal solution $\tilde{B}$.

Denote by $b_i$ the diagonal elements of $\tilde{B}$. Then, our problem reduces to
$$\min_{\{b_i\}, \tau} \; \sum_{i=1}^{m} \frac{b_i^2}{\lambda_i} + L^2 \tau \qquad (27)$$
subject to
$$\frac{(1 - b_i)^2}{\delta_i} \le \tau, \qquad 1 \le i \le m. \qquad (28)$$

Since the problem of (27) and (28) is a convex optimization problem, from Lagrange duality theory [24], it follows that the value of the minimum in (27) is equal to the optimal value of the dual problem, namely
$$\max_{\beta_i \ge 0} \; \min_{\{b_i\}, \tau} \; \mathcal{L} \qquad (29)$$
where the Lagrangian is given by
$$\mathcal{L} = \sum_{i=1}^{m} \frac{b_i^2}{\lambda_i} + L^2 \tau + \sum_{i=1}^{m} \beta_i \left( (1 - b_i)^2 - \tau \delta_i \right). \qquad (30)$$


Differentiating (30) with respect to $b_i$ and equating to 0,
$$\frac{2 b_i}{\lambda_i} - 2 \beta_i (1 - b_i) = 0 \qquad (31)$$
so that
$$b_i = \frac{\beta_i \lambda_i}{1 + \beta_i \lambda_i}. \qquad (32)$$
For this choice of $b_i$,
$$\frac{b_i^2}{\lambda_i} + \beta_i (1 - b_i)^2 = \frac{\beta_i}{1 + \beta_i \lambda_i} \qquad (33)$$
and minimizing (30) over $\tau$ forces $\sum_i \beta_i \delta_i = L^2$ (otherwise the minimum is $-\infty$). The dual problem associated with (27) and (28) is therefore
$$\max_{\{\beta_i\}} \; \sum_{i=1}^{m} \frac{\beta_i}{1 + \beta_i \lambda_i} \qquad (34)$$
subject to
$$\beta_i \ge 0, \qquad 1 \le i \le m \qquad (35)$$
$$\sum_{i=1}^{m} \beta_i \delta_i = L^2. \qquad (36)$$

To solve (34) subject to (35) and (36), we form the Lagrangian
$$\mathcal{L}' = \sum_{i=1}^{m} \frac{\beta_i}{1 + \beta_i \lambda_i} - \gamma \left( \sum_{i=1}^{m} \beta_i \delta_i - L^2 \right) + \sum_{i=1}^{m} \eta_i \beta_i \qquad (37)$$
where, from the Karush-Kuhn-Tucker (KKT) conditions [24], $\eta_i \ge 0$ and $\eta_i \beta_i = 0$. Differentiating with respect to $\beta_i$ and equating to 0,
$$\frac{1}{(1 + \beta_i \lambda_i)^2} - \gamma \delta_i + \eta_i = 0 \qquad (38)$$
so that $\eta_i = \gamma \delta_i - (1 + \beta_i \lambda_i)^{-2}$. If $\beta_i = 0$, then from (38), $\eta_i = \gamma \delta_i - 1$, and from the KKT conditions, $\gamma \delta_i \ge 1$. If $\beta_i > 0$, then to satisfy (38), we must have that $\eta_i = 0$. In this case
$$\beta_i = \frac{1}{\lambda_i} \left( \frac{1}{\sqrt{\gamma \delta_i}} - 1 \right). \qquad (39)$$
Thus, the optimal value of $\beta_i$ is
$$\beta_i = \frac{1}{\lambda_i} \max\left(0, \frac{1}{\sqrt{\gamma \delta_i}} - 1\right) \qquad (40)$$
where $\gamma$ is chosen to satisfy (36). We now show that there is a unique $\gamma$ satisfying (36). Defining
$$\varphi(\gamma) = \sum_{i=1}^{m} \beta_i(\gamma) \delta_i - L^2 \qquad (41)$$
the optimal $\gamma$ is a root of $\varphi$. Clearly, $\varphi(\gamma)$ is monotonically decreasing for $\gamma > 0$. In addition, $\varphi(\gamma) \to \infty$ for $\gamma \to 0$, and $\varphi(\gamma) = -L^2 < 0$ for $\gamma \ge \max_i \delta_i^{-1}$. Therefore, there is a unique $\gamma$ satisfying (36).

Substituting the optimal value of $\beta_i$ into (32), we have that
$$b_i = \max\left(0, 1 - \alpha \sqrt{\delta_i}\right) \qquad (42)$$
where $\alpha = \sqrt{\gamma}$ satisfies
$$\alpha = \frac{\sum_{i=k_0}^{m} \sqrt{\delta_i}/\lambda_i}{L^2 + \sum_{i=k_0}^{m} \delta_i/\lambda_i} \qquad (43)$$
in agreement with (14), and $k_0$ is chosen as the smallest index for which $\alpha\sqrt{\delta_{k_0}} < 1$. Denoting $D = \operatorname{diag}(b_1, \ldots, b_m)$ completes the proof of the proposition.

In the special case in which $T = I$, we can immediately verify that $\delta_i = 1$ for all $i$ and $\alpha = \varepsilon_0/(L^2 + \varepsilon_0) < 1$, so that the optimal estimator reduces to
$$\hat{x} = \frac{L^2}{L^2 + \varepsilon_0} \hat{x}_{\mathrm{LS}} \qquad (44)$$
where
$$\varepsilon_0 = \operatorname{Tr}\left((H^* C_w^{-1} H)^{-1}\right) \qquad (45)$$
is the variance corresponding to the LS estimator. The estimator given by (44) is a shrunken estimator proposed by Mayer and Willke [7], which is simply a scaled version of the LS estimator, with an optimal choice of shrinkage factor. We therefore conclude that this particular shrunken estimator has a strong optimality property: Among all linear estimators of $x$ in the linear model (1) such that $\|x\|^2 \le L^2$, it minimizes the worst-case estimation error.

As we expect intuitively, when $L \to \infty$, $\hat{x}$ of (44) reduces to the LS estimator. Indeed, when the norm of $x$ can be made arbitrarily large, the MSE will also be arbitrarily large unless the bias is equal to zero. Therefore, in this limit, the worst-case estimation error is minimized by choosing an estimator with zero bias that minimizes the variance, which leads to the LS estimator.
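As a quick illustration of (44) and (45), the following sketch (ours; random data with $T = I$ and $C_w = \sigma^2 I$) compares the empirical MSE of the LS estimator with that of the minimax shrunken estimator when $\|x\| \le L$:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, sigma, L = 20, 10, 1.0, 2.0
H = rng.standard_normal((n, m)) @ np.diag(np.logspace(0, -2, m))  # ill conditioned
x = rng.standard_normal(m)
x *= L / np.linalg.norm(x)                      # enforce ||x|| <= L

# LS variance eps0 = Tr((H* C_w^{-1} H)^{-1}) with C_w = sigma^2 I, cf. (45).
eps0 = sigma**2 * np.trace(np.linalg.inv(H.T @ H))
shrink = L**2 / (L**2 + eps0)                   # Mayer-Willke factor, cf. (44)

mse_ls = mse_mx = 0.0
trials = 2000
for _ in range(trials):
    y = H @ x + sigma * rng.standard_normal(n)
    x_ls = np.linalg.lstsq(H, y, rcond=None)[0]
    mse_ls += np.sum((x_ls - x) ** 2) / trials
    mse_mx += np.sum((shrink * x_ls - x) ** 2) / trials

print(f"LS MSE ~ {mse_ls:.3f}, minimax shrunken MSE ~ {mse_mx:.3f}")
```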


We summarize our results in the following theorem.

Theorem 1: Let $x$ denote the deterministic unknown parameters in the model $y = Hx + w$, where $H$ is a known $n \times m$ matrix with rank $m$, and $w$ is a zero-mean random vector with positive definite covariance $C_w$. Then, the problem
$$\min_{G} \max_{x^* T x \le L^2} E\{\|Gy - x\|^2\}$$
is equivalent to the semidefinite programming problem
$$\min_{G, \tau, \lambda} \; \lambda + L^2 \tau$$
subject to
$$\begin{bmatrix} \lambda & \operatorname{vec}(G C_w^{1/2})^* \\ \operatorname{vec}(G C_w^{1/2}) & I \end{bmatrix} \succeq 0; \qquad \begin{bmatrix} \tau I & T^{-1/2}(I - GH)^* \\ (I - GH) T^{-1/2} & I \end{bmatrix} \succeq 0$$
where $g = \operatorname{vec}(G)$. In addition, we have the following.

1) If $T$ and $H^* C_w^{-1} H$ have the same eigenvector matrix $V$ so that $H^* C_w^{-1} H = V \Lambda V^*$, where $\Lambda$ is diagonal with diagonal elements $\lambda_i > 0$, and $T = V \Delta V^*$, where $\Delta$ is diagonal with diagonal elements $\delta_i > 0$ in decreasing order, then $\hat{x} = V D V^* \hat{x}_{\mathrm{LS}}$, where $D$ is the diagonal matrix with diagonal elements $d_i = \max(0, 1 - \alpha\sqrt{\delta_i})$, $\alpha$ is given by (14), $k_0$ is the smallest index such that $\alpha\sqrt{\delta_{k_0}} < 1$, and the estimate lies in the space spanned by the last $m - k_0 + 1$ columns of $V$.

2) If $T = I$, then
$$\hat{x} = \frac{L^2}{L^2 + \varepsilon_0} \hat{x}_{\mathrm{LS}}$$
where $\varepsilon_0 = \operatorname{Tr}((H^* C_w^{-1} H)^{-1})$ is the variance corresponding to the LS estimator.

In Section VI, we provide several examples that illustrate the performance advantage of the minimax MSE estimator over the conventional LS estimator. In [28], we prove analytically that the MSE of the minimax MSE estimator is smaller than that of the LS estimator for all $x$ in the region of uncertainty.

Before proceeding to the case in which $H$ is also subject to uncertainties, we note that in Theorem 1, the bound $L$ on the norm of $x$ is assumed to be known. If $L$ is not given a priori, then one possibility is to choose $L$ to be equal to the norm of the LS estimator of $x$, namely
$$L = \|\hat{x}_{\mathrm{LS}}\|. \qquad (46)$$

IV. MINIMAX ESTIMATION WITH UNKNOWN $H$

In the previous section, we developed the optimal estimator that minimizes the worst-case estimation error across all possible values of $x$ that are bounded. In our development, we assumed that the model matrix $H$ is known exactly. However, in many engineering applications, the model matrix is subject to uncertainties; for example, it may have been estimated from noisy data, in which case it is an approximation to some nominal underlying matrix. If the true data matrix deviates from the one assumed, then an estimator designed based on the nominal matrix alone may perform poorly.

In this section, we consider robust estimators that explicitly take uncertainties in $H$ into account. Specifically, suppose now that the model matrix is not known exactly but is rather given by $H = H_0 + \delta H$, where $\|\delta H\| \le \rho$, and $\|\cdot\|$ denotes the matrix spectral norm [22], i.e., the largest singular value of the corresponding matrix. Our problem then is
$$\min_{G} \max_{x^* T x \le L^2,\ \|\delta H\| \le \rho} \left\{ \operatorname{Tr}(G C_w G^*) + x^*(I - G(H_0 + \delta H))^*(I - G(H_0 + \delta H))x \right\}. \qquad (47)$$

To develop the solution to (47), we first consider the inner maximization problem
$$\max_{\|\delta H\| \le \rho} \; \max_{x^* T x \le L^2} x^*(I - G(H_0 + \delta H))^*(I - G(H_0 + \delta H))x. \qquad (48)$$
We note that, as in (5), the inner maximum over $x$ is
$$L^2 \lambda_{\max}\left(T^{-1/2}(I - G(H_0 + \delta H))^*(I - G(H_0 + \delta H)) T^{-1/2}\right) \qquad (49)$$
where $\lambda_{\max}(\cdot)$ is the largest eigenvalue of the corresponding matrix. We can express the worst case of (49) over $\delta H$ as the solution to the problem
$$\min_{\tau} \; \tau \qquad (50)$$
subject to
$$\tau I - T^{-1/2}(I - G(H_0 + \delta H))^*(I - G(H_0 + \delta H)) T^{-1/2} \succeq 0 \quad \text{for all } \|\delta H\| \le \rho. \qquad (51)$$
Using Lemma 1, we can rewrite the constraint (51) as
$$\begin{bmatrix} \tau I & T^{-1/2}(I - G(H_0 + \delta H))^* \\ (I - G(H_0 + \delta H)) T^{-1/2} & I \end{bmatrix} \succeq 0 \quad \text{for all } \|\delta H\| \le \rho \qquad (52)$$
which is equivalent to
$$A - \left(P^* \delta H^* Q + Q^* \delta H P\right) \succeq 0 \quad \text{for all } \|\delta H\| \le \rho \qquad (53)$$
where
$$A = \begin{bmatrix} \tau I & T^{-1/2}(I - GH_0)^* \\ (I - GH_0) T^{-1/2} & I \end{bmatrix}, \qquad P = \begin{bmatrix} T^{-1/2} & 0 \end{bmatrix}, \qquad Q = \begin{bmatrix} 0 & -G^* \end{bmatrix}^{*\top}\text{-free block row } [0 \;\; {-G^*}]. \qquad (54)$$
We now exploit the following lemma, the proof of which is provided in the Appendix.

Lemma 2: Given matrices $P$, $Q$, and $A$ with $A = A^*$,
$$A \succeq P^* X^* Q + Q^* X P \quad \text{for all } X: \|X\| \le \rho$$
if and only if there exists a $\mu \ge 0$ such that
$$\begin{bmatrix} A - \mu P^* P & -\rho Q^* \\ -\rho Q & \mu I \end{bmatrix} \succeq 0.$$

From Lemma 2, it follows that (53) is satisfied if and only if there exists a $\mu \ge 0$ such that
$$\begin{bmatrix} \tau I - \mu T^{-1} & T^{-1/2}(I - GH_0)^* & 0 \\ (I - GH_0) T^{-1/2} & I & \rho G \\ 0 & \rho G^* & \mu I \end{bmatrix} \succeq 0. \qquad (55)$$
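Lemma 2 is easy to sanity-check numerically. The sketch below (ours; real matrices and hypothetical dimensions) verifies one direction of the lemma: if the augmented LMI holds for some $\mu \ge 0$, then $A - P^\top X^\top Q - Q^\top X P \succeq 0$ for randomly sampled perturbations with $\|X\| = \rho$:

```python
import numpy as np

rng = np.random.default_rng(3)
k, n, rho = 4, 3, 0.5
P = rng.standard_normal((n, k))
Q = rng.standard_normal((n, k))
# A chosen comfortably feasible: A >= (2*rho*||P||*||Q|| + 1) I guarantees (53).
A = (2 * rho * np.linalg.norm(P, 2) * np.linalg.norm(Q, 2) + 1.0) * np.eye(k)

# Find a mu >= 0 for which the augmented LMI of Lemma 2 holds.
feasible_mu = None
for mu in np.linspace(0.05, 20.0, 400):
    big = np.block([[A - mu * P.T @ P, -rho * Q.T],
                    [-rho * Q, mu * np.eye(n)]])
    if np.linalg.eigvalsh(big).min() >= 0:
        feasible_mu = mu
        break
assert feasible_mu is not None

# One direction of Lemma 2: the LMI implies robust positive semidefiniteness.
for _ in range(1000):
    X = rng.standard_normal((n, n))
    X *= rho / np.linalg.svd(X, compute_uv=False)[0]    # scale so ||X|| = rho
    M = A - P.T @ X.T @ Q - Q.T @ X @ P
    assert np.linalg.eigvalsh(M).min() >= -1e-9
print("robust PSD condition verified on 1000 sampled perturbations")
```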


We conclude that the problem (47) can be expressed as
$$\min_{G, \tau, \lambda, \mu \ge 0} \; \lambda + L^2 \tau \qquad (56)$$
subject to the LMI (55) and
$$\operatorname{Tr}(G C_w G^*) \le \lambda. \qquad (57)$$
Using Lemma 1, we can express (57) as the LMI
$$\begin{bmatrix} \lambda & \operatorname{vec}(G C_w^{1/2})^* \\ \operatorname{vec}(G C_w^{1/2}) & I \end{bmatrix} \succeq 0 \qquad (58)$$
where $g = \operatorname{vec}(G)$, so that the problem of (56) subject to (55) and (57) can be formulated as an SDP.

We summarize our results in the following theorem.

Theorem 2: Let $x$ denote the deterministic unknown parameters in the model $y = Hx + w$, where $H = H_0 + \delta H$, $H_0$ is a known $n \times m$ matrix with rank $m$, $\delta H$ is an unknown matrix satisfying $\|\delta H\| \le \rho$, and $w$ is a zero-mean random vector with positive definite covariance $C_w$. Then, the problem
$$\min_{G} \max_{x^* T x \le L^2,\ \|\delta H\| \le \rho} E\{\|Gy - x\|^2\}$$
is equivalent to the semidefinite programming problem of minimizing $\lambda + L^2\tau$ over $(G, \tau, \lambda, \mu)$ subject to the LMIs (55) and (58).

In Theorem 2, the bound $\rho$ on the norm of $\delta H$ is assumed to be known. If the value of $\rho$ is not specified, then one possibility is to choose $\rho$ to be equal to the norm of the perturbation matrix resulting from the total LS estimator.

In Section VI, we demonstrate that the minimax MSE estimator of Theorem 2 explicitly takes the uncertainties in $H$ into account and can in some cases significantly outperform the minimax MSE estimator of Theorem 1, which does not account for uncertainties in the model matrix.
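Under the reconstruction of (55) given above, the robust SDP of Theorem 2 can be prototyped as follows. This is our sketch, not the authors' code; it assumes real data, the cvxpy package, and the block sizes of (55):

```python
import cvxpy as cp
import numpy as np
from scipy.linalg import sqrtm

def robust_minimax_mse(H0, C_w, T, L, rho):
    """Robust minimax MSE estimator: minimize lam + L^2 tau s.t. (55), (58)."""
    n, m = H0.shape
    Cw_half = np.real(sqrtm(C_w))
    T_inv = np.linalg.inv(T)
    T_inv_half = np.real(sqrtm(T_inv))
    G = cp.Variable((m, n))
    tau, lam, mu = cp.Variable(), cp.Variable(), cp.Variable(nonneg=True)
    E = (np.eye(m) - G @ H0) @ T_inv_half
    Z_mn = np.zeros((m, n))
    lmi55 = cp.bmat([
        [tau * np.eye(m) - mu * T_inv, E.T,       Z_mn],
        [E,                            np.eye(m), rho * G],
        [Z_mn.T,                       rho * G.T, mu * np.eye(n)],
    ])
    constraints = [lmi55 >> 0,
                   cp.sum_squares(G @ Cw_half) <= lam]   # encodes (57)/(58)
    prob = cp.Problem(cp.Minimize(lam + L**2 * tau), constraints)
    prob.solve(solver=cp.SCS)
    return G.value

# Usage with a mildly perturbed square nominal model.
H0 = np.eye(4) + 0.1 * np.random.default_rng(4).standard_normal((4, 4))
G_rob = robust_minimax_mse(H0, np.eye(4), np.eye(4), L=1.0, rho=0.3)
```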

V. MINIMAX ESTIMATOR FOR JOINTLY DIAGONALIZABLE MATRICES

Suppose now that $H_0^* H_0$ and $T$ have the same eigenvector matrix $V$, and $H_0 H_0^*$ and $C_w$ have the same unitary eigenvector matrix $U$. In this case
$$H_0 = U \Sigma V^*, \qquad T = V \Delta V^*, \qquad C_w = U \Omega U^* \qquad (59)$$
where $\Delta$ and $\Omega$ are diagonal matrices with diagonal elements $\delta_i > 0$ and $\omega_i > 0$, respectively, and $\Sigma$ is an $n \times m$ diagonal matrix with diagonal elements $\sigma_i \ge 0$. Note that in the case $T = I$ and $C_w = \sigma^2 I$ for some $\sigma^2 > 0$, (59) is satisfied.

The assumption (59) is made for analytical tractability. If $H_0$ and $T$ represent convolution with some filter, and $w$ is a stationary process, then $H_0$, $T$, and $C_w$ will be Toeplitz matrices and are therefore approximately diagonalized by Fourier transform matrices of appropriate dimensions so that, in this case, (59) is approximately satisfied [25].

We now show that in the case of jointly diagonalizable matrices as in (59), the minimax MSE estimator of Theorem 2 reduces to a simple convex optimization problem in two unknowns and can therefore be solved very efficiently, for example, using the Ellipsoid method (see, e.g., [17, Ch. 5.2]). Specifically, we have the following theorem.

Theorem 3: Let $x$ denote the deterministic unknown parameters in the model $y = Hx + w$, where $H = H_0 + \delta H$, $H_0$ is a known $n \times m$ matrix with rank $m$, $\delta H$ is an unknown matrix satisfying $\|\delta H\| \le \rho$, and $w$ is a zero-mean random vector with positive definite covariance $C_w$. Let $H_0 = U \Sigma V^*$, where $\Sigma$ is an $n \times m$ diagonal matrix with diagonal elements $\sigma_i$, let $T = V \Delta V^*$, where $\Delta$ is a diagonal matrix with diagonal elements $\delta_i$, and let $C_w = U \Omega U^*$, where $\Omega$ is a diagonal matrix with diagonal elements $\omega_i$. Then, the solution to the problem
$$\min_{G} \max_{x^* T x \le L^2,\ \|\delta H\| \le \rho} E\{\|Gy - x\|^2\}$$
is given by
$$\hat{x} = V D U^* y$$
where $D$ is an $m \times n$ diagonal matrix whose diagonal elements $d_i$ are given explicitly in terms of $\sigma_i$, $\delta_i$, $\omega_i$, $\rho$, and two scalar variables $\tau$ and $\mu$ that are the solution to a convex optimization problem in these two unknowns alone, with constraints given by (60).
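The convolution case mentioned before Theorem 3 is easy to visualize: circulant matrices are exactly diagonalized by the DFT, so (59) holds exactly for circular convolution. A small sketch (ours):

```python
import numpy as np

# Circulant matrix from a kernel: H0[i, j] = h[(i - j) mod N].
N = 8
h = np.array([0.5, 0.3, 0.2] + [0.0] * (N - 3))
H0 = np.array([[h[(i - j) % N] for j in range(N)] for i in range(N)])

# The unitary DFT matrix diagonalizes any circulant matrix.
F = np.fft.fft(np.eye(N)) / np.sqrt(N)
D = F @ H0 @ F.conj().T
off_diag = D - np.diag(np.diag(D))
print(np.max(np.abs(off_diag)))                 # ~1e-15: exactly diagonal
print(np.allclose(np.diag(D), np.fft.fft(h)))   # eigenvalues = DFT of the kernel
```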

Proof: The proof is comprised of three parts. First, we show that the optimal $G$ minimizing the worst-case MSE has the form
$$G = V \tilde{D} U^* \qquad (61)$$
for some $m \times n$ matrix $\tilde{D}$. We then show that $\tilde{D}$ can be chosen as a diagonal matrix. Finally, we derive the optimal values of the diagonal elements of $\tilde{D}$.

We begin by showing that the optimal $G$ has the form (61). From Theorem 2, it follows that the minimax MSE estimator with a bounded uncertainty in $H$ can be expressed as the solution to
$$\min_{G, \tau, \mu \ge 0} \; \operatorname{Tr}(G C_w G^*) + L^2 \tau \qquad (62)$$
subject to
$$M = \begin{bmatrix} \tau I - \mu T^{-1} & T^{-1/2}(I - GH_0)^* & 0 \\ (I - GH_0) T^{-1/2} & I & \rho G \\ 0 & \rho G^* & \mu I \end{bmatrix} \succeq 0. \qquad (63)$$
The constraint (63) is equivalent to $W^* M W \succeq 0$ for any invertible $W$. Choosing
$$W = \begin{bmatrix} V & 0 & 0 \\ 0 & V & 0 \\ 0 & 0 & U \end{bmatrix} \qquad (64)$$
and using (59), (63) becomes
$$\begin{bmatrix} \tau I - \mu \Delta^{-1} & \Delta^{-1/2}(I - \tilde{D}\Sigma)^* & 0 \\ (I - \tilde{D}\Sigma)\Delta^{-1/2} & I & \rho \tilde{D} \\ 0 & \rho \tilde{D}^* & \mu I \end{bmatrix} \succeq 0 \qquad (65)$$
where we define
$$\tilde{D} = V^* G U \qquad (66)$$
so that
$$G = V \tilde{D} U^*. \qquad (67)$$
Since $\operatorname{Tr}(G C_w G^*) = \operatorname{Tr}(\tilde{D}\Omega\tilde{D}^*)$, the problem of (62) and (65) can be expressed in terms of $\tilde{D}$ as
$$\min_{\tilde{D}, \tau, \mu \ge 0} \; \operatorname{Tr}(\tilde{D}\Omega\tilde{D}^*) + L^2 \tau \qquad (68)$$
subject to
$$\begin{bmatrix} \tau I - \mu \Delta^{-1} & \Delta^{-1/2}(I - \tilde{D}\Sigma)^* & 0 \\ (I - \tilde{D}\Sigma)\Delta^{-1/2} & I & \rho \tilde{D} \\ 0 & \rho \tilde{D}^* & \mu I \end{bmatrix} \succeq 0. \qquad (69)$$

Let $\tilde{D} = [\tilde{D}_1 \; \tilde{D}_2]$, where $\tilde{D}_1$ is the matrix consisting of the first $m$ columns of $\tilde{D}$, let $\bar{\Sigma}$ denote the $m \times m$ diagonal matrix with diagonal elements $\sigma_i$, $1 \le i \le m$, and let $\bar{\Omega}$ denote the $m \times m$ diagonal matrix with diagonal elements $\omega_i$, $1 \le i \le m$. Since only the first $m$ rows of $\Sigma$ are nonzero, $\tilde{D}\Sigma = \tilde{D}_1\bar{\Sigma}$, and we can express the constraint (69) in terms of $\tilde{D}_1$ and $\tilde{D}_2$ as (70). Clearly, if (70) is satisfied, then so is the principal submatrix condition obtained by deleting the rows and columns associated with $\tilde{D}_2$:
$$\begin{bmatrix} \tau I - \mu \Delta^{-1} & \Delta^{-1/2}(I - \tilde{D}_1\bar{\Sigma})^* & 0 \\ (I - \tilde{D}_1\bar{\Sigma})\Delta^{-1/2} & I & \rho \tilde{D}_1 \\ 0 & \rho \tilde{D}_1^* & \mu I \end{bmatrix} \succeq 0. \qquad (71)$$
Now, let $\tilde{D}$ be any matrix satisfying (70), and define $\hat{D} = [\tilde{D}_1 \; 0]$. Then $\hat{D}$ also satisfies (70)
$$\qquad (72)$$
since $\hat{D}\Sigma = \tilde{D}\Sigma$ and the corresponding LMI is (71) padded by the diagonal block $\mu I$, which is positive semidefinite. In addition
$$\operatorname{Tr}(\hat{D}\Omega\hat{D}^*) = \operatorname{Tr}(\tilde{D}_1\bar{\Omega}\tilde{D}_1^*) \le \operatorname{Tr}(\tilde{D}\Omega\tilde{D}^*). \qquad (73)$$
Therefore, the optimal value of $\tilde{D}$ satisfies $\tilde{D} = [\tilde{D}_1 \; 0]$, so that the problem of (68) and (69) reduces to
$$\min_{\tilde{D}_1, \tau, \mu \ge 0} \; \operatorname{Tr}(\tilde{D}_1\bar{\Omega}\tilde{D}_1^*) + L^2 \tau \qquad (74)$$
subject to (71). Once we find the optimal $\tilde{D}_1$, the optimal $G$ can be found from (66) as
$$G = V [\tilde{D}_1 \; 0] U^* \qquad (75)$$
which is equivalent to (61), thus completing the first part of the proof.

We now show that the optimal value of $\tilde{D}_1$ can be chosen as a diagonal matrix. To this end, we first note that if $\tilde{D}_1$ satisfies (71), then so does $F\tilde{D}_1F$
$$\qquad (76)$$
where $F$ is any diagonal matrix with diagonal elements $\pm 1$: conjugating the matrix in (71) by $\operatorname{blkdiag}(F, F, F)$ preserves positive semidefiniteness, maps $\tilde{D}_1$ to $F\tilde{D}_1F$, and leaves all the diagonal matrices unchanged. In addition, we have that $\operatorname{Tr}(F\tilde{D}_1F\bar{\Omega}F\tilde{D}_1^*F) = \operatorname{Tr}(\tilde{D}_1\bar{\Omega}\tilde{D}_1^*)$. Therefore, if $\tilde{D}_1$ is an optimal solution, then so is $F\tilde{D}_1F$. Since our problem is convex, the set of optimal solutions is also convex [23], which implies that the diagonal matrix $2^{-m}\sum_F F\tilde{D}_1F$ is also a solution, where the summation is over all diagonal matrices $F$ with diagonal elements $\pm 1$. Therefore, we have shown that there exists an optimal diagonal solution $\tilde{D}_1$.


Denote the diagonal elements of $\tilde{D}_1$ by $d_i$, and let $\operatorname{diag}(d_1, \ldots, d_m)$ denote the diagonal matrix with diagonal elements $d_i$. Then, the constraint (71) can be written as the single large LMI (77), all of whose blocks are the diagonal matrices defined above. By permuting the rows and the columns of the matrix in (77), we can transform it into a block-diagonal matrix, where the $i$th block is
$$\begin{bmatrix} \tau - \mu/\delta_i & (1 - d_i\sigma_i)/\sqrt{\delta_i} & 0 \\ (1 - d_i\sigma_i)/\sqrt{\delta_i} & 1 & \rho d_i \\ 0 & \rho d_i & \mu \end{bmatrix} \qquad (78)$$
so that (77) is satisfied if and only if each of the matrices (78) is positive semidefinite. Thus, the problem of (74) and (71) becomes
$$\min_{\{d_i\}, \tau, \mu \ge 0} \; \sum_{i=1}^{m} \omega_i d_i^2 + L^2 \tau \qquad (79)$$
subject to each of the matrices in (78) being positive semidefinite:
$$\begin{bmatrix} \tau - \mu/\delta_i & (1 - d_i\sigma_i)/\sqrt{\delta_i} & 0 \\ (1 - d_i\sigma_i)/\sqrt{\delta_i} & 1 & \rho d_i \\ 0 & \rho d_i & \mu \end{bmatrix} \succeq 0, \qquad 1 \le i \le m. \qquad (80)$$

We now show that the problem of (79) subject to (80) can be further simplified. First, we note that to satisfy (80), we must have that
$$\tau \ge \frac{\mu}{\delta_i}, \qquad 1 \le i \le m. \qquad (81)$$

Suppose first that $\mu > 0$. In this case, using Lemma 1 (taking the Schur complement of the $\mu$ entry), (80) is equivalent to
$$\begin{bmatrix} \tau - \mu/\delta_i & (1 - d_i\sigma_i)/\sqrt{\delta_i} \\ (1 - d_i\sigma_i)/\sqrt{\delta_i} & 1 - \rho^2 d_i^2/\mu \end{bmatrix} \succeq 0. \qquad (82)$$
Now, a $2 \times 2$ matrix is positive semidefinite if and only if its diagonal elements and its determinant are non-negative. Therefore, (82) is equivalent to the conditions
$$\tau - \frac{\mu}{\delta_i} \ge 0 \qquad (83)$$
$$1 - \frac{\rho^2 d_i^2}{\mu} \ge 0 \qquad (84)$$
$$\left(\tau - \frac{\mu}{\delta_i}\right)\left(1 - \frac{\rho^2 d_i^2}{\mu}\right) - \frac{(1 - d_i\sigma_i)^2}{\delta_i} \ge 0. \qquad (85)$$
Clearly, two of these conditions imply the third. Furthermore, multiplying (85) by $\mu > 0$, we can express it as a quadratic inequality in $\mu$:
$$-\frac{\mu^2}{\delta_i} + \mu\left(\tau + \frac{\rho^2 d_i^2 - (1 - d_i\sigma_i)^2}{\delta_i}\right) - \tau\rho^2 d_i^2 \ge 0. \qquad (86)$$
Since the coefficient multiplying $\mu^2$ in (86) is negative, it follows that there exists a $\mu$ satisfying (86) if and only if the discriminant is non-negative, i.e., if and only if
$$\left(\tau + \frac{\rho^2 d_i^2 - (1 - d_i\sigma_i)^2}{\delta_i}\right)^2 \ge \frac{4\tau\rho^2 d_i^2}{\delta_i} \qquad (87)$$
which, using the fact that $\tau \ge 0$ and $\rho^2 d_i^2 \ge 0$, is equivalent to
$$\tau + \frac{\rho^2 d_i^2 - (1 - d_i\sigma_i)^2}{\delta_i} \ge \frac{2\rho |d_i| \sqrt{\tau}}{\sqrt{\delta_i}}. \qquad (88)$$
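The $2 \times 2$ criterion used in (82)-(85) is worth a tiny helper (ours); it is the workhorse that turns each per-index LMI into scalar conditions:

```python
import numpy as np

def psd_2x2(a, b, c):
    """[[a, b], [b, c]] is PSD iff a >= 0, c >= 0, and a*c - b*b >= 0."""
    return a >= 0 and c >= 0 and a * c - b * b >= 0

# Cross-check against an eigenvalue test on random instances.
rng = np.random.default_rng(5)
for _ in range(10000):
    a, b, c = rng.uniform(-1, 1, 3)
    if abs(a * c - b * b) < 1e-9:        # skip numerically borderline cases
        continue
    by_eig = np.linalg.eigvalsh(np.array([[a, b], [b, c]])).min() >= 0
    assert psd_2x2(a, b, c) == by_eig
```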


If (88) is satisfied, then the set of $\mu$'s satisfying (86) is the interval $[\mu_1, \mu_2]$, where $\mu_1 \le \mu_2$ are the roots of the quadratic function in (86). Since we would like to choose the variables so as to minimize (79), it follows that the optimal $d_i$ is given by the explicit expression (89) in terms of $\tau$ and $\mu$. Thus, if $\mu > 0$, then the optimal value of $d_i$ is given by (89), where, in addition, conditions (84) and (88) must be satisfied.

Next, suppose that $\mu = 0$. In this case, to ensure that (80) is satisfied, we must have that
$$d_i = 0 \qquad (90)$$
$$\tau \ge \frac{1}{\delta_i}, \qquad 1 \le i \le m. \qquad (91)$$
We can immediately verify that (90) and (91) are special cases of (89) and (88) with $\mu = 0$. We therefore conclude that the optimal value of $d_i$ is given by (89), subject to (88) and (84). Substituting the optimal value of $d_i$ into (79), our problem becomes
$$\min_{\tau, \mu \ge 0} \; \sum_{i=1}^{m} \omega_i \, d_i^2(\tau, \mu) + L^2 \tau \qquad (92)$$
subject to (60). Since the problem of (79) subject to (80) is convex, and the reduced problem (92) subject to (60) is obtained by minimizing (79) over part of its variables, the reduced problem is also convex, completing the proof of the theorem.

In Section VI, we illustrate the performance of the estimators of Theorems 1 and 3.

VI. EXAMPLES

The purpose of this section is to illustrate the performance advantage of the minimax MSE estimator of Theorem 1 over the conventional LS estimator and to demonstrate that, in the presence of uncertainties in the model matrix $H$, robust estimation, which explicitly takes these uncertainties into account, can recover the signal much better than the total LS method and (nonrobust) minimax MSE estimation, particularly in those cases where the latter is expected to perform poorly.


Fig. 2. Minimax MSE and robust minimax MSE estimation with ρ = 0.2 for the three different types of perturbations.

In the simulations below, we consider the problem of estimating a two-dimensional (2-D) image from noisy observations, which are obtained by blurring the image with a blurring kernel (a 2-D filter) that may not be known exactly, and adding random Gaussian noise. Specifically, we generate an image $x(t_1, t_2)$, which is the sum of three harmonic oscillations:
$$x(t_1, t_2) = \sum_{k=1}^{3} x_k(t_1, t_2) \qquad (93)$$
where the $k$th component $x_k(t_1, t_2)$ is a 2-D harmonic oscillation whose amplitude and spatial frequencies are given parameters (94). Clearly, the image $x(t_1, t_2)$ is periodic. Therefore, we can represent the image by a length-$N^2$ vector $x$, with components $x(t_1, t_2)$, $0 \le t_1, t_2 \le N - 1$.


Fig. 3. Minimax MSE and robust minimax MSE estimation with ρ = 0.6 for the three different types of perturbations.

The amplitudes and frequencies of the harmonic components of $x$ are given in Table I. Note that the typical magnitude of an entry in $x$, i.e., its root-mean-square value, is 1.578.

TABLE I. SIGNAL PARAMETERS.

The observed image is given by
$$y(t_1, t_2) = (h * x)(t_1 - \bar{t}_1, t_2 - \bar{t}_2) + w(t_1, t_2) \qquad (95)$$
where $\bar{t}_1$ and $\bar{t}_2$ are randomly chosen shifts (to make $y$ visually distinct from $x$), and $w(t_1, t_2)$ is an independent, zero-mean, Gaussian noise process so that, for each $t_1$ and $t_2$, the variance of $w(t_1, t_2)$ is the noise variance $\sigma^2$. The convolution kernel is given by
$$h(s_1, s_2) = h_0(s_1, s_2) + \delta h(s_1, s_2) \qquad (96)$$
where $h_0(s_1, s_2)$ is a nominal blurring filter defined by (97), with coefficients given by (98), and $\delta h$ is a bounded perturbation. The support of the convolution kernel is, up to a shift, a five-point set.

By defining the vectors $y$ and $w$ with components $y(t_1, t_2)$ and $w(t_1, t_2)$, respectively, and defining the matrices $H_0$ and $\delta H$ with the appropriate elements of $h_0$ and $\delta h$, respectively, the observations can be expressed in the form of an uncertain linear model $y = (H_0 + \delta H)x + w$.

For our choice of $h_0$, the norm (maximum singular value) of $H_0$ is 1, and the condition number of $H_0$ is 987.95. The magnitude of the Fourier transform of the nominal kernel is shown in Fig. 1, together with a histogram of the log of the singular values of $H_0$.

Fig. 1. Nominal blurring kernel $r(z_1, z_2)$.

In the simulations below, we chose the weighting matrix $T$ as a function of $H_0^* H_0$, with one of two parameter choices. Note that $T$ and $H_0^* H_0$ have the same eigenvector matrix since $T$ is a function of $H_0^* H_0$. This choice of $T$ reflects the fact that components of $x$ corresponding to small singular values of $H_0$ should receive a smaller weight than components corresponding to large singular values. For each choice of $T$, a corresponding bound $L$ is chosen. For the bound $\rho$ on the norm of the perturbation matrix $\delta H$, we used seven different values. The case $\rho = 0$ corresponds to the case in which the model matrix is equal to the nominal matrix $H_0$. Therefore, in this case, the robust estimator coincides with the (nonrobust) minimax MSE estimator.

For each value of $\rho$, we consider three different choices of the perturbation matrix $\delta H$, which are chosen as follows. For a given estimator, we define the worst-case perturbation as the one that maximizes the MSE
$$\delta H_{\mathrm{wc}} = \arg\max_{\|\delta H\| \le \rho} E\{\|G(H_0 + \delta H)x + Gw - x\|^2\}. \qquad (99)$$
Thus
$$\delta H_{\mathrm{wc}} = \rho\, u v^* \qquad (100)$$
where $u$ and $v$ are unit vectors, and $v$ is a unit singular vector corresponding to the largest singular value of the error matrix associated with the given estimator. In our simulations, we consider three special cases of $\delta H$:

1) the worst case with respect to the minimax MSE estimator (i.e., $\delta H$ is chosen as the worst-case perturbation (100) of the minimax MSE estimator);
2) $\delta H$ given by (100) with a randomly chosen unit vector;
3) the worst case with respect to the robust minimax MSE estimator.

For each choice of the perturbation matrix, we compute the MSE
$$E\{\|\hat{x} - x\|^2\} \qquad (101)$$
for the LS, minimax MSE, and robust minimax MSE estimators. Since, in our problem, $H$ is square, the total LS method coincides with the LS method so that the LS performance is also the total LS performance. The MSE results are given in Table II.

TABLE II. MSE USING THE LS, MINIMAX MSE, AND ROBUST MINIMAX MSE ESTIMATORS.

As we expect, for $\rho = 0$, the minimax MSE and robust estimators coincide. We also see that the minimax MSE estimator can significantly outperform the LS estimator. For $\rho > 0$, the robust estimator that takes uncertainties in $H$ into account can lead to improved performance over the LS, total LS, and minimax MSE estimators, which for large values of $\rho$ can be quite significant. As we expect, the minimax MSE estimator performs best for case 3, while the robust estimator performs best for case 1. Note that even when the perturbation matrix is chosen to be worst for the robust estimator, the robust estimator still performs better than the minimax MSE estimator.

We note that we do not compare our results with those of [13] and [21] since the latter methods require a prior bound on the norm of the data error, which we do not assume in our model.

In Figs. 2 and 3, we plot the original image, the observed image, and the estimated images using the minimax MSE estimator and the robust minimax MSE estimator for the three choices of perturbations, where in Fig. 2, ρ = 0.2, and in Fig. 3, ρ = 0.6. Since the error in the LS estimate is so large, we do not show the resulting image.
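A condensed version of this experiment can be scripted as follows. This is our sketch, not the paper's setup: a 1-D circular blur stands in for the 2-D kernel, the test signal and its parameters are hypothetical, the perturbation direction is a heuristic in the spirit of (100), and the closed-form shrunken estimator of (44) stands in for the SDP-based estimators:

```python
import numpy as np

rng = np.random.default_rng(6)
N, sigma, rho = 64, 0.1, 0.2

# Harmonic test signal (hypothetical amplitudes and frequencies).
t = np.arange(N)
x = 1.0 * np.cos(2 * np.pi * 3 * t / N) + 0.5 * np.cos(2 * np.pi * 7 * t / N)

# Nominal circulant blur H0 and a rank-one perturbation in the spirit of (100).
h0 = np.zeros(N)
h0[[0, 1, N - 1]] = [0.6, 0.2, 0.2]            # invertible circular blur
H0 = np.array([[h0[(i - j) % N] for j in range(N)] for i in range(N)])
u, s, vt = np.linalg.svd(np.eye(N) - H0)       # direction that hurts I - H0 most
dH = rho * np.outer(u[:, 0], vt[0])            # ||dH|| = rho
y = (H0 + dH) @ x + sigma * rng.standard_normal(N)

x_ls = np.linalg.solve(H0, y)                  # LS (= total LS here: H0 is square)
L = np.linalg.norm(x_ls)                       # data-driven bound, cf. (46)
eps0 = sigma**2 * np.trace(np.linalg.inv(H0.T @ H0))
x_mx = (L**2 / (L**2 + eps0)) * x_ls           # minimax shrunken estimator (44)

for name, est in [("LS", x_ls), ("minimax MSE", x_mx)]:
    print(f"{name:12s} squared error: {np.sum((est - x) ** 2):.2f}")
```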

APPENDIX
PROOF OF LEMMA 2

To prove the lemma, we first note that
$$A \succeq P^* X^* Q + Q^* X P \quad \text{for all } \|X\| \le \rho \qquad (102)$$
if and only if, for every vector $z$,
$$z^* A z \ge 2\,\mathrm{Re}\{z^* Q^* X P z\} \quad \text{for all } \|X\| \le \rho. \qquad (103)$$
Using the Cauchy-Schwarz inequality (which is achieved with equality for a suitable $X$ with $\|X\| = \rho$), we can express (103) as
$$z^* A z \ge 2\rho \|Qz\| \|Pz\| \quad \text{for all } z. \qquad (104)$$
We now rely on the following lemma [26, p. 23].

Lemma 3 (S-Procedure): Let $f_1(z)$ and $f_2(z)$ be two quadratic functions of $z$, where the matrices of the quadratic terms are symmetric, and suppose there exists a $\hat{z}$ satisfying $f_2(\hat{z}) < 0$. Then, the implication
$$f_2(z) \le 0 \implies f_1(z) \ge 0$$
holds true if and only if there exists a $\mu \ge 0$ such that $f_1(z) + \mu f_2(z) \ge 0$ for all $z$.


Since (104) is equivalent to the requirement that $z^* A z - 2\rho\,\mathrm{Re}\{w^* Q z\} \ge 0$ for all $z$ and all $w$ satisfying $\|w\|^2 - \|Pz\|^2 \le 0$, we can use Lemma 3 to conclude that (104) is satisfied if and only if there exists a $\mu \ge 0$ such that
$$z^* A z - 2\rho\,\mathrm{Re}\{w^* Q z\} + \mu\left(\|w\|^2 - \|Pz\|^2\right) \ge 0 \quad \text{for all } z, w \qquad (105)$$
which is precisely the LMI condition of Lemma 2, completing the proof.

REFERENCES

[1] T. Kailath, Lectures on Linear Least-Squares Estimation. New York: Springer, 1976.
[2] S. M. Kay, Fundamentals of Statistical Signal Processing: Estimation Theory. Upper Saddle River, NJ: Prentice-Hall, 1993.
[3] C. W. Therrien, Discrete Random Signals and Statistical Signal Processing. Englewood Cliffs, NJ: Prentice-Hall, 1992.
[4] K. G. Gauss, Theory of Motion of Heavenly Bodies. New York: Dover, 1963.
[5] A. N. Tikhonov and V. Y. Arsenin, Solution of Ill-Posed Problems. Washington, DC: V. H. Winston, 1977.
[6] A. E. Hoerl and R. W. Kennard, "Ridge regression: Biased estimation for nonorthogonal problems," Technometrics, vol. 12, pp. 55-67, Feb. 1970.
[7] L. S. Mayer and T. A. Willke, "On biased estimation in linear models," Technometrics, vol. 15, pp. 497-508, Aug. 1973.
[8] Y. C. Eldar, "Quantum Signal Processing," Ph.D. dissertation, Mass. Inst. Technol., Cambridge, MA, 2001, http://www.ee.technion.ac.il/Sites/People/YoninaEldar/Download/thesis.pdf.
[9] Y. C. Eldar and A. V. Oppenheim, "Covariance shaping least-squares estimation," IEEE Trans. Signal Process., vol. 51, pp. 686-697, Mar. 2003.
[10] G. H. Golub and C. F. Van Loan, "An analysis of the total least-squares problem," SIAM J. Numer. Anal., vol. 17, no. 6, pp. 883-893, Dec. 1980.
[11] S. Van Huffel and J. Vandewalle, The Total Least-Squares Problem: Computational Aspects and Analysis, vol. 9, Frontiers in Applied Mathematics. Philadelphia, PA: SIAM, 1991.
[12] L. El Ghaoui and H. Lebret, "Robust solutions to least-squares problems with uncertain data," SIAM J. Matrix Anal. Appl., vol. 18, no. 4, pp. 1035-1064, 1997.
[13] S. Chandrasekaran, G. H. Golub, M. Gu, and A. H. Sayed, "Parameter estimation in the presence of bounded data uncertainties," SIAM J. Matrix Anal. Appl., vol. 19, no. 1, pp. 235-252, 1998.
[14] A. H. Sayed, V. H. Nascimento, and S. Chandrasekaran, "Estimation and control with bounded data uncertainties," Linear Alg. Appl., vol. 284, no. 1, pp. 259-306, Nov. 1998.
[15] S. Chandrasekaran, M. Gu, A. H. Sayed, and K. E. Schubert, "The degenerate bounded errors-in-variables model," SIAM J. Matrix Anal. Appl., vol. 23, pp. 138-166, 2001.
[16] M. S. Pinsker, "Optimal filtering of square-integrable signals in Gaussian noise," Problems Inform. Transm., vol. 16, pp. 120-133, 1980.
[17] A. Ben-Tal and A. Nemirovski, Lectures on Modern Convex Optimization, MPS-SIAM Series on Optimization, 2001.
[18] L. Vandenberghe and S. Boyd, "Semidefinite programming," SIAM Rev., vol. 38, no. 1, pp. 40-95, Mar. 1996.
[19] Y. Nesterov and A. Nemirovski, Interior-Point Polynomial Algorithms in Convex Programming. Philadelphia, PA: SIAM, 1994.
[20] F. Alizadeh, "Combinatorial Optimization with Interior Point Methods and Semi-Definite Matrices," Ph.D. dissertation, Univ. Minnesota, Minneapolis, MN, 1991.
[21] A. H. Sayed, V. H. Nascimento, and F. A. M. Cipparrone, "A regularized robust design criterion for uncertain data," SIAM J. Matrix Anal. Appl., vol. 23, no. 4, pp. 1120-1142, 2002.
[22] R. A. Horn and C. R. Johnson, Matrix Analysis. Cambridge, U.K.: Cambridge Univ. Press, 1985.
[23] D. G. Luenberger, Optimization by Vector Space Methods. New York: Wiley, 1968.
[24] D. P. Bertsekas, Nonlinear Programming, 2nd ed. Belmont, MA: Athena Scientific, 1999.
[25] R. Gray, "Toeplitz and circulant matrices: A review," Inform. Syst. Lab., Stanford Univ., Stanford, CA, Tech. Rep. 6504-1, 1977.
[26] S. Boyd, L. El Ghaoui, E. Feron, and V. Balakrishnan, Linear Matrix Inequalities in System and Control Theory. Philadelphia, PA: SIAM, 1994.
[27] Y. C. Eldar, A. Ben-Tal, and A. Nemirovski, "Linear minimax regret estimation of deterministic parameters with bounded data uncertainties," IEEE Trans. Signal Process., vol. 52, no. 8, pp. 2177-2188, Aug. 2004.
[28] Z. Ben-Haim and Y. C. Eldar, "Maximum set estimators with bounded estimation error," IEEE Trans. Signal Process., to be published.

Yonina C. Eldar (S'98-M'02) received the B.Sc. degree in physics in 1995 and the B.Sc. degree in electrical engineering in 1996, both from Tel-Aviv University (TAU), Tel-Aviv, Israel, and the Ph.D. degree in electrical engineering and computer science in 2001 from the Massachusetts Institute of Technology (MIT), Cambridge. From January 2002 to July 2002, she was a Postdoctoral Fellow at the Digital Signal Processing Group, MIT. She is currently a Senior Lecturer with the Department of Electrical Engineering, Technion-Israel Institute of Technology, Haifa, Israel. She is also a Research Affiliate with the Research Laboratory of Electronics at MIT. Her current research interests are in the general areas of signal processing, statistical signal processing, and quantum information theory. Dr. Eldar was in the program for outstanding students at TAU from 1992 to 1996. In 1998, she held the Rosenblith Fellowship for study in Electrical Engineering at MIT, and in 2000, she held an IBM Research Fellowship. She is currently a Horev Fellow of the Leaders in Science and Technology program at the Technion and an Alon Fellow.

Aharon Ben-Tal received the Ph.D. degree in applied mathematics from Northwestern University, Evanston, IL, in 1973. He is Professor of Operations Research and Head of the MINERVA Optimization Center at the Technion-Israel Institute of Technology, Haifa, Israel. He has been a Visiting Professor at the University of Michigan, Ann Arbor; the University of Copenhagen, Copenhagen, Denmark; and Delft University of Technology, Delft, The Netherlands. His recent interests are in continuous optimization, in particular, nonsmooth and large-scale problems, conic and robust optimization, as well as convex and nonsmooth analysis. In the last 10 years, he has devoted much effort to engineering applications of optimization methodology and computational schemes. He has published more than 90 papers and co-authored two monographs. He was Dean of the Faculty of Industrial Engineering and Management at the Technion from 1989 to 1992. He served as a Council Member of the Mathematical Programming Society from 1994 to 1997. He was Area Editor (continuous optimization) for Mathematics of Operations Research from 1993 to 1999, a member of the Editorial Boards of the SIAM Journal on Optimization and Mathematical Modeling and Numerical Analysis, and currently is Associate Editor of the Journal of Convex Analysis and Mathematical Programming.

Arkadi Nemirovski received the Ph.D. degree in mathematics from Moscow State University, Moscow, Russia, in 1973. He is Professor of Operations Research and Deputy Head of the MINERVA Optimization Center at the Technion-Israel Institute of Technology, Haifa, Israel. He has been a Visiting Professor at Delft Technical University, Delft, The Netherlands, and the Georgia Institute of Technology, Atlanta. His recent interests are in the areas of convex optimization (complexity of convex programming). He has co-authored four monographs and over 80 papers and has served as an Associate Editor for the journal Mathematics of Operations Research. Prof. Nemirovski received the Fulkerson Prize from the Mathematical Programming Society and the AMS in 1982, the Dantzig Prize from the MPS and SIAM in 1991, and the John von Neumann Theory Prize of INFORMS in 2003.
