Computational Optimization and Applications, 17, 329–341, 2000. © 2001 Kluwer Academic Publishers. Manufactured in The Netherlands.

Accurate Solution to Overdetermined Linear Equations with Errors Using L1 Norm Minimization

J. BEN ROSEN [email protected] Computer Science Department, University of Minnesota, Minneapolis, MN 55455, USA; Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA 92093

HAESUN PARK [email protected] Computer Science Department, University of Minnesota, Minneapolis, MN 55455, USA

JOHN GLICK [email protected] Department of Mathematics and Computer Science, University of San Diego, San Diego, CA 92110, USA

LEI ZHANG [email protected] Computer Science Department, University of Minnesota, Minneapolis, MN 55455, USA

Received April 6, 1999; Accepted July 6, 1999

Abstract. It has been known for many years that a robust solution to an overdetermined system of linear equations $Ax \approx b$ is obtained by minimizing the $L_1$ norm of the residual error. A correct solution $x$ to the linear system can often be obtained in this way, in spite of large errors (outliers) in some elements of the $(m \times n)$ matrix $A$ and the data vector $b$. This is in contrast to a least squares solution, where even one large error will typically cause a large error in $x$. In this paper we give necessary and sufficient conditions that the correct solution is obtained when there are some errors in $A$ and $b$. Based on the sufficient condition, it is shown that if $k$ rows of $[A\ b]$ contain large errors, the correct solution is guaranteed if $(m - n)/n \ge 2k/\sigma$, where $\sigma > 0$ is a lower bound of singular values related to $A$. Since $m$ typically represents the number of measurements, this inequality shows how many data points are needed to guarantee a correct solution in the presence of large errors in some of the data. This inequality is, in fact, an upper bound, and computational results are presented, which show that the correct solution will be obtained, with high probability, for much smaller values of $m - n$.

Keywords: overdetermined linear systems, robust approximation, minimum error, zero error conditions, outliers, 1-norm, linear programming

1. Introduction

The benefits of using minimization of the L1 norm in approximation problems have been known for many years [3–5, 10, 16]. The advantages are likely to be greatest when the problem data contains a relatively small number of large errors (outliers). Examples of applications where the L1 norm minimization is used to advantage are described in [8, 9, 13, 14, 20]. We present an analysis for the linear, overdetermined system of equations

    Ax \approx b,  (1)

where there may be large errors in some elements of the $(m \times n)$ matrix $A$, $m > n$, and in the data vector $b \in R^m$. Overdetermined linear equations arise in many practical applications [2, 6, 7, 15, 22]. A typical such application is the representation of a measured signal $\psi(t)$ as a linear combination of $n$ specified functions

    \sum_{j=1}^{n} x_j \varphi_j(t_i) = \psi(t_i), \quad i = 1, \dots, m,  (2)

where the functions $\varphi_j(t)$ are known (but with possible errors) and the output signal $\psi(t)$ is measured (also with possible errors) at $m\,({>}n)$ discrete times $t_i$, $i = 1, \dots, m$. The amplitudes $x_j$ are to be determined. Letting $b_i = \psi(t_i)$ and $A_{ij} = \varphi_j(t_i)$ gives the system (1). If there are no errors in $A$ or $b$, then the correct value $x_c$ of $x$ is easily obtained by choosing any set of $n$ linearly independent rows of $A$, and solving the resulting $(n \times n)$ system for $x_c$. When there are errors, an important practical question is how to choose the number $m$ of measurements needed to obtain an accurate solution, given $n$ and the number $k$ of rows of $[A\ b]$ which may contain errors. An answer to this question is given in this paper. Given a full rank matrix $A$ and vector $b$, with possible errors in both, it is desired to get a solution vector $x$ as close as possible to the correct (but unknown) vector $x_c$. A basic assumption is that there is a correct (but unknown) matrix $A_c$ and data vector $b_c$ such that

    A_c x_c = b_c.  (3)
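As a concrete illustration of the model (2) and assumption (3), the following sketch (our own; the monomial basis $\varphi_j(t) = t^{j-1}$ and the amplitude values are hypothetical choices, not from the paper) constructs $A_c$, $x_c$, and $b_c$:

```python
import numpy as np

# Sketch of (2)-(3): sample psi(t) = 2*phi_2(t) - phi_4(t) at m points using the
# (hypothetical) monomial basis phi_j(t) = t**(j-1), so A[i, j-1] = phi_j(t_i).
m, n = 50, 4
t = np.linspace(0.0, 1.0, m)
A_c = np.vander(t, n, increasing=True)   # correct matrix A_c
x_c = np.array([0.0, 2.0, 0.0, -1.0])    # correct amplitudes x_c
b_c = A_c @ x_c                          # correct data: A_c x_c = b_c, as in (3)
# Measured data would be A = A_c + E and b = b_c + d_e, as in (9) below.
```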

A solution x is obtained by solving

    \min_x \|Ax - b\|_1.  (4)

In the next section we give necessary and sufficient conditions that x = xc. The minimum norm solution to (4) is given by

    x = B^{-1} b_B,  (5)

where $B$ is an $(n \times n)$ matrix selected from the rows of $A$, and $b_B$ consists of the corresponding elements of $b$. If the $n$ selected rows of $A$ and corresponding elements of $b$ contain no errors, then the result will be that $x = x_c$. Clearly, if there are $k$ rows with errors, there must be at least $n + k$ rows $(m \ge n + k)$ in order to get the correct solution. A sufficient condition which guarantees that such a selection will be made is given in the next section. It is shown there (Theorem 2.1) that $x = x_c$, provided $(m - n)/n \ge 2k/\sigma$, where $\sigma > 0$ is a minimum singular value of an $(n \times n)$ submatrix of $AB^{-1}$. This, in fact, gives an upper bound to the value of $m$ needed to guarantee an accurate solution. More practically useful computational results are given in §3. The closely related case when the rows of $B$ and the elements of $b_B$ contain only small errors is also analyzed in the next section.
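In practice (4) is solved as a linear program (the formulation appears as (31) in §3). The following is a minimal sketch of one possible implementation, not the authors' code; it assumes NumPy and SciPy, and the helper name l1_solve is ours:

```python
import numpy as np
from scipy.optimize import linprog

def l1_solve(A, b):
    """Minimize ||Ax - b||_1 via the LP (31): min sum(g) s.t. -g <= Ax - b <= g."""
    m, n = A.shape
    c = np.concatenate([np.zeros(n), np.ones(m)])   # objective: sum of g_i
    I = np.eye(m)
    A_ub = np.block([[A, -I], [-A, -I]])            # Ax - g <= b and -Ax - g <= -b
    b_ub = np.concatenate([b, -b])
    bounds = [(None, None)] * n + [(0, None)] * m   # x free, g >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[:n]

# Demonstration: recover x_c despite k large outliers in b.
rng = np.random.default_rng(0)
m, n, k = 60, 5, 10
A = rng.uniform(-9, 9, (m, n))       # random A, as in the tests of section 3
x_c = rng.standard_normal(n)
b = A @ x_c
b[:k] += 100.0                       # k large errors (outliers) in the data
print(np.allclose(l1_solve(A, b), x_c, atol=1e-6))   # True with high probability
```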

It is also shown there that the condition is invariant with respect to the magnitude of the errors in $b$. This property of the $L_1$ norm minimization is compared to the approximation $\tilde{x}$ obtained from the solution to the least squares problem

    \min_x \|Ax - b\|_2,  (6)

where the error in $\tilde{x}$ will typically be proportional to the errors in $b$. In order to obtain more practically useful relationships, extensive computational tests were carried out. The results of these tests give an empirical formula and curves which show how many data points ($m$) are needed to give $x = x_c$ with any specified probability $P$, assuming that there are no more than $k$ rows with large errors. These results are presented in §3. The computational results show that for $k$ rows with errors, $k \le 50$ and $n \le 40$, (4) will give $x = x_c$ with probability $P \ge 0.995$ when $m - n \ge 22 + 2k$. Note that this value of $m - n$ depends linearly on $k$, as does the upper bound in Theorem 2.1, but it is substantially smaller, and therefore more useful in practice. Given an optimal solution to (4), and the corresponding basis $B$, it is also of interest to determine what changes can be made in $b$ without changing the optimality of the basis $B$. This situation can be analyzed by a sensitivity analysis of the linear program corresponding to (4). This linear program is given by (31) with $w = x$ and $d = b$. Such an analysis is described in [10]. The use of the $L_1$ norm minimization has also been investigated for the more difficult parameter estimation problem

    A(\alpha)x \approx b.  (7)

For this type of problem it is necessary to determine the parameter vector $\alpha$ as well as the vector $x$ when there are errors in the data vector $b$. Many signal identification problems can be formulated in this way [1, 11, 12, 18, 19, 21]. The minimization problem now becomes

    \min_{\alpha, x} \|A(\alpha)x - b\|_1.  (8)

The theoretical results presented here for (4) are relevant for (8), but are not directly applicable. An iterative algorithm for the solution of (8) is described and analyzed in [18]. It has been applied to the identification of a complex signal with some large errors in the data. The computational results presented in [19] show that, as with the linear case (4), the solution obtained using the $L_1$ norm is very robust with respect to large data errors.

2. Conditions for correct solution

2.1. Necessary and sufficient conditions

In this section we investigate when the $L_1$ minimization (4) will give the correct solution $x_c$ when there may be large errors in some rows of $A$ and in some elements of $b$. A condition is formulated in terms of a scalar $\beta$ and a vector $d$. It is then shown that $\|d\|_1 \le \beta$ is a

necessary condition and $\|d\|_1 < \beta$ is a sufficient condition that $x = x_c$. A closely related situation, which is more likely to occur in practice, is where there may be large errors in some rows of $A$ and in some elements of $b$, and in addition, there are small errors in all elements of $b$. A sufficient condition that $x \approx x_c$ for this situation is also given. We are given a full rank $(m \times n)$ matrix $A$ and a data vector $b \in R^m$. The minimum $L_1$ norm solution $x$ is given by (4). There may be large errors in some rows of $A$ and in some elements of $b$. However, as given by (3) in §1, we assume that there is a correct matrix $A_c$ and correct data vector $b_c$ such that all $m$ equations are satisfied by a correct vector $x_c$. The errors in $A$ are represented by the $(m \times n)$ matrix $E$, and the errors in $b$ by the vector $d_e \in R^m$, so that

    A = A_c + E, \quad b = b_c + d_e.  (9)

Also, let w represent the error in x, so that

    x = x_c + w.  (10)

Finally, let

    d = d_e - E x_c.  (11)

It follows from (3), (9), (10) and (11) that

    Ax - b = Aw - d.  (12)

Therefore, (4) is equivalent to

    \min_w \|Aw - d\|_1.  (13)

A linear programming formulation for (13) is given by (31) in §3. It follows directly from the linear programming formulation that the minimum L1 norm solution to (13) is given by

    w = B^{-1} d_B,  (14)

where $B$ is a nonsingular $(n \times n)$ basis matrix consisting of selected rows of $A$. Each such selection partitions $A$ into $B$ and an $((m - n) \times n)$ matrix $\hat{A}$. The vector $d$ is partitioned in a corresponding manner to give $d_B \in R^n$ and $d_{NB} \in R^{m-n}$. For each such possible partition, we have

    \|Aw - d\|_1 = \|\hat{A} B^{-1} d_B - d_{NB}\|_1,  (15)

since for $n$ of the rows we have $Bw - d_B = 0$. It follows immediately from (14) that if $d_B = 0$, then $w = 0$ and $x = x_c$. For $d_B = 0$, we also have the value of (15) given by $\|d_{NB}\|_1 = \|d\|_1$. If $d_B \ne 0$ for all possible partitions, then we always get $w \ne 0$. This will always occur if more than $m - n$ elements of $d$ are nonzero. Directly from (15), the minimization (13) is equivalent to

    \min_B \|\hat{A} B^{-1} d_B - d_{NB}\|_1.  (16)

Let $S$ denote the set of basis matrices $B$ such that $d_B \ne 0$, and define

    \beta = \min_{B \in S} \|\hat{A} B^{-1} d_B - d_{NB}\|_1.  (17)

If $\|d\|_1 < \beta$, then the minimum for (16) will be attained with $d_B = 0$. If $\|d\|_1 = \beta$, then we may have $d_B \ne 0$, while for $\|d\|_1 > \beta$, we will always have $d_B \ne 0$. This is summarized by the following conditions for $x = x_c$:

NC: A necessary condition that $w = 0$ and $x = x_c$ is that $\|d\|_1 \le \beta$.
SC: A sufficient condition that $w = 0$ and $x = x_c$ is that $\|d\|_1 < \beta$.
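For very small problems, $\beta$ in (17) can be evaluated by brute force, enumerating every basis in $S$; the following sketch (our own illustration, not a practical method) does this and checks SC:

```python
import numpy as np
from itertools import combinations

def beta_of(A, d):
    """Evaluate beta in (17) by enumerating every nonsingular n-row basis B in S."""
    m, n = A.shape
    beta = np.inf
    for rows in combinations(range(m), n):
        idx = list(rows)
        if not np.any(d[idx]):                 # d_B = 0: basis not in S
            continue
        B = A[idx]
        if abs(np.linalg.det(B)) < 1e-12:      # skip singular selections
            continue
        mask = np.ones(m, dtype=bool); mask[idx] = False
        val = np.linalg.norm(A[mask] @ np.linalg.solve(B, d[idx]) - d[mask], 1)  # (15)
        beta = min(beta, val)
    return beta

rng = np.random.default_rng(1)
m, n, k = 12, 3, 2
A = rng.uniform(-9, 9, (m, n))
d = np.zeros(m); d[:k] = 1.0                   # k unit errors, as in the tests of section 3
print(np.linalg.norm(d, 1) < beta_of(A, d))    # SC holds iff this prints True
```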

We now show that, for any fixed number $k$ of errors, the number of data points $m$ can be chosen so as to guarantee that the correct solution $x_c$ is computed. To show this, we choose $m = (q + 1)n$, where $q$ is a positive integer. The matrix $A$ is then partitioned so that

    A^T = [B^T, A_1^T, A_2^T, \dots, A_q^T].  (18)

Also let

    \hat{A}^T = [A_1^T, A_2^T, \dots, A_q^T], \quad C^T = [C_1^T, C_2^T, \dots, C_q^T],  (19)

where $C_j = A_j B^{-1}$, $j = 1, \dots, q$, are $(n \times n)$ matrices. Let $\sigma_j$ be the minimum singular value of $C_j$, when $B$ is determined by (17), and assume that $\sigma_j \ge \sigma > 0$, $j = 1, \dots, q$. Note that if $A_j = B$, then $\sigma_j = 1$. Let $k$ elements of $d$ be nonzero, and for simplicity assume they have the values $\pm 1$.

Theorem 2.1. A sufficient condition that $x = x_c$ is that

    \sigma \frac{m - n}{n} \ge 2k.  (20)

Proof: We have $\|d\|_1 = k$, and for $d_B \ne 0$, $\|d_B\|_2 \ge 1$ and $\|d_{NB}\|_1 \le k - 1$. We consider $d \ne 0$, since otherwise $x = x_c$. From (17),

    \beta = \min_{B \in S} \|C d_B - d_{NB}\|_1
          \ge \min_{B \in S} \|C d_B\|_1 - \max_{B \in S} \|d_{NB}\|_1
          \ge \sum_{j=1}^{q} \|C_j d_B\|_1 - k + 1.  (21)

Now, $\|C_j d_B\|_1 \ge \|C_j d_B\|_2 \ge \sigma_j \|d_B\|_2 \ge \sigma$, $j = 1, \dots, q$. Then from (21)

    \beta \ge q\sigma - k + 1 > q\sigma - k = \sigma \frac{m - n}{n} - k \ge k.

Therefore, the sufficient condition $\beta > \|d\|_1 = k$ is satisfied whenever (20) is satisfied. □
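The quantities in Theorem 2.1 are straightforward to evaluate for a given partition. A sketch (our own; for illustration it fixes $B$ to the first $n$ rows, whereas the theorem takes the $B$ determined by (17)) checks condition (20):

```python
import numpy as np

def condition_20_holds(A, k):
    """Check sigma * (m - n) / n >= 2k for the partition (18), taking B as the
    first n rows of A (an illustrative choice of basis, not the one from (17))."""
    m, n = A.shape
    q = m // n - 1                                   # assumes m = (q + 1) n
    Binv = np.linalg.inv(A[:n])
    sigma = min(np.linalg.svd(A[(j + 1) * n:(j + 2) * n] @ Binv,
                              compute_uv=False)[-1]  # min singular value of C_j
                for j in range(q))
    return sigma * (m - n) / n >= 2 * k

rng = np.random.default_rng(2)
n, q, k = 4, 30, 2
A = rng.uniform(-9, 9, ((q + 1) * n, n))
print(condition_20_holds(A, k))   # True means (20) holds for this choice of B
```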

This result gives an upper bound to the number of data points needed to guarantee a correct solution when there are $k$ errors of $\pm 1$. It can be generalized to $k$ errors $\eta_i$, $i = 1, \dots, k$, with $0 < \eta_{\min} \le |\eta_i| \le \eta_{\max}$, by replacing $\sigma$ in (20) with $\sigma' = \sigma(\eta_{\min}/\eta_{\max})$. These conditions are invariant with respect to scaling of the error in $b$ if there is no error in $A$. For this situation, replacing $d_e$ by $\lambda d_e$ for any finite scalar $\lambda \ne 0$ simply scales both $\|d\|_1$ and $\beta$ by $|\lambda|$, so that the conditions are unchanged. It is therefore the number and relative size of the nonzero elements in $d_e$ that determines when $x = x_c$, independent of the magnitude of $\|d_e\|_1$. For comparison, consider the situation when the least squares minimization (6) is used. As before, (6) is equivalent to

    \min_w \|Aw - d\|_2.  (22)

This has the solution $(A^T A)w = A^T d$, so that

    \|w\| \ge \|A^T d\| / \|A^T A\|.  (23)

Therefore, $\|w\| \ne 0$ unless $A^T d = 0$. Furthermore, for no error in $A$, the magnitude of $\|w\|$ will be proportional to $\|d_e\|$.
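This contrast is easy to observe numerically. A short sketch (again our own, reusing the hypothetical l1_solve helper from §1) scales the outliers by $\lambda$ and compares the two solutions:

```python
import numpy as np

# Compare (4) and (6) as the outlier magnitude lam grows (l1_solve as above).
rng = np.random.default_rng(3)
m, n, k = 60, 5, 6
A = rng.uniform(-9, 9, (m, n))
x_c = rng.standard_normal(n)
for lam in [1.0, 10.0, 1000.0]:
    d_e = np.zeros(m); d_e[:k] = lam              # outliers scaled by lam
    b = A @ x_c + d_e
    x1 = l1_solve(A, b)                           # L1 error stays (near) zero
    x2 = np.linalg.lstsq(A, b, rcond=None)[0]     # L2 error grows with lam
    print(lam, np.linalg.norm(x1 - x_c), np.linalg.norm(x2 - x_c))
```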

2.2. Modified sufficiency conditions

In most practical applications there are likely to be small random errors in all elements of $b$, in addition to possible large errors in only some rows of $[A\ b]$. We now consider this situation and give a sufficient condition such that the difference $\|x - x_c\|_1$ will be proportional to the size of the small random errors and will not depend on the large errors. We give a sufficient condition such that

    \min_x \|Ax - b\|_1  (24)

gives an $x$ with an error depending only on the small random errors in every element of $b$. Let $b = b_c + d + \varepsilon$, where $d$ represents large errors in some elements and $\varepsilon$ represents small errors in every element, with $\|\varepsilon\|_\infty \le \delta$. As before, $x = x_c + w$, so that the minimization (24) is equivalent to

    \min_w \|Aw - d - \varepsilon\|_1.  (25)

Let $\mathcal{B}$ denote the set of nonsingular $(n \times n)$ basis matrices $B$ selected from the rows of $A$. Corresponding to each $B \in \mathcal{B}$ are the partitions:

    A = \begin{bmatrix} B \\ \hat{A} \end{bmatrix}, \quad d = \begin{bmatrix} d_B \\ d_{NB} \end{bmatrix}, \quad \text{and} \quad \varepsilon = \begin{bmatrix} \varepsilon_B \\ \varepsilon_{NB} \end{bmatrix}.

The solution to (24) or (25) will give a basis $B_M \in \mathcal{B}$ and corresponding $\hat{A}_M$, $d_B$ and $\varepsilon_B$:

    \min_w \|Aw - d - \varepsilon\|_1 = \min_{B \in \mathcal{B}} \|\hat{A} B^{-1}(d_B + \varepsilon_B) - d_{NB} - \varepsilon_{NB}\|_1
        = \|\hat{A}_M B_M^{-1}(d_B + \varepsilon_B) - d_{NB} - \varepsilon_{NB}\|_1,  (26)

with $w = B^{-1}(d_B + \varepsilon_B)$. Also let $S \subset \mathcal{B}$ denote the subset of basis matrices for which $d_B \ne 0$. Then the related minimization

    \min_{B \in S} \|\hat{A} B^{-1}(d_B + \varepsilon_B) - d_{NB} - \varepsilon_{NB}\|_1 = \|\hat{A}_S B_S^{-1}(d_B + \varepsilon_B) - d_{NB} - \varepsilon_{NB}\|_1  (27)

gives the matrix $B_S$ and corresponding $\hat{A}_S$. Now define the two quantities

    \mu_M = \|\hat{A}_M B_M^{-1}\|_1 \quad \text{and} \quad \mu_S = \|\hat{A}_S B_S^{-1}\|_1.

Note that although the minimum value given by (27) is greater than or equal to that given by (26), we may have $\mu_S < \mu_M$. We want a sufficient condition that (26) gives $d_B = 0$, so that

    w = B_M^{-1} \varepsilon_B \quad \text{and} \quad \|w\|_1 \le \|B_M^{-1}\|_1 \, n\delta.  (28)

This will be true if the minimum of (26) is obtained with $B \notin S$, so that from (27) we want

    \|\hat{A}_M B_M^{-1} \varepsilon_B - d_{NB} - \varepsilon_{NB}\|_1 < \|\hat{A}_S B_S^{-1}(d_B + \varepsilon_B) - d_{NB} - \varepsilon_{NB}\|_1.  (29)

The left side of (29) is overestimated by

    \|d_{NB}\|_1 + \|\hat{A}_M B_M^{-1} \varepsilon_B\|_1 + \|\varepsilon_{NB}\|_1 \le \|d\|_1 + \mu_M n\delta + (m - n)\delta.

The right side of (29) is underestimated by

    \min_{B \in S} \|\hat{A} B^{-1} d_B - d_{NB}\|_1 - \mu_S n\delta - (m - n)\delta,

where we have used the fact that

    \min_{B \in S} \|\hat{A} B^{-1} d_B - d_{NB}\|_1 \le \|\hat{A}_S B_S^{-1} d_B - d_{NB}\|_1.

Therefore, a sufficient condition that $\|w\|_1$ satisfies (28) is that

SC′: $\|d\|_1 + (\mu_M + \mu_S) n\delta + 2(m - n)\delta < \beta$.

The sufficient condition SC′ reduces to the earlier sufficient condition SC as $\delta \to 0$. Furthermore, for fixed $n$, the error norm $\|w\|_1$ goes to zero linearly with $\delta$, as given by (28).
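The linear dependence of $\|w\|_1$ on $\delta$ in (28) can be observed numerically; a brief sketch (our own, reusing the hypothetical l1_solve helper from §1):

```python
import numpy as np

# Observe (28): with k large outliers plus small errors of size delta in every
# element of b, the L1 solution error scales linearly with delta.
rng = np.random.default_rng(4)
m, n, k = 80, 5, 8
A = rng.uniform(-9, 9, (m, n))
x_c = rng.standard_normal(n)
d = np.zeros(m); d[:k] = 50.0                 # large errors in k elements
for delta in [1e-1, 1e-2, 1e-3]:
    eps = rng.uniform(-delta, delta, m)       # small errors, ||eps||_inf <= delta
    x = l1_solve(A, A @ x_c + d + eps)
    print(delta, np.linalg.norm(x - x_c, 1))  # roughly proportional to delta
```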

3. Computational verification of robust performance

In this section we present the results of extensive computational tests to determine how the problem size $(m, n)$ determines the ability of the $L_1$ norm minimization to recover $x_c$, in spite of multiple large errors in the data. Specifically, we investigated the probability $P$ that $x = x_c$, as a function of $m - n$, for specified values of $k$, the number of rows with large data errors. In the previous section, Theorem 2.1 shows how large a value of $m$ is needed in order to guarantee a correct solution, when there are $k$ large errors in the combined data vector $d$ (which includes possible errors in $A$). The value of $m$ given by (20) is an upper bound, and the actual value which will give $x = x_c$, with a very high probability, may be much smaller. In order to determine smaller, more useful requirements for $m$, a computational study was carried out. The results of this study give a practical basis for deciding how many data points ($m$) will be needed to obtain the correct solution with a desired probability. Alternatively, for fixed $n$, $m$ and $k$, the probability of obtaining a correct solution can be estimated. The computational study and its results will now be summarized. A total of 9450 linear programs were solved. Each such linear program solved the problem

    \min_w \|Aw - d\|_1,  (30)

where $A$ is $(m \times n)$ and $d$ has $k < m - n$ nonzero elements. The elements $a_{ij}$ of $A$ were randomly generated with $-9 \le a_{ij} \le 9$. The first $k$ elements of $d$ each had the value unity, and the remaining $m - k$ elements were zero. Since the solution to (30) is independent of the row order or of changing the sign of all elements in a given row, this is equivalent to choosing $k$ elements of $d$ as $\pm 1$ in random positions. The problem (30) is formulated as the following linear program

    \min \sum_{i=1}^{m} \gamma_i \quad \text{subject to} \quad -\gamma \le Aw - d \le \gamma,  (31)

with $m + n$ variables and $2m$ inequality constraints. The solution is given by $w = B^{-1} d_B$, with

    \sum_{i=1}^{m} \gamma_i = \|Aw - d\|_1, \quad \text{and} \quad \gamma_i \ge 0.  (32)

The desired result ($x = x_c$) therefore corresponds to $w = 0$. For each fixed value of $(m, n, k)$ a total of 50 LPs were solved, each with a different matrix $A$. The probability $P$ that $w = 0$, corresponding to $(m, n, k)$, was then taken to be

    P = P(m, n, k) = l/50,  (33)

where $l$ is the number of LPs with the solution $w = 0$. For each fixed value of $n$ and $k$, 7 values of $m$ were chosen so that the corresponding 7 values of $l$ covered the range [0, 50]. For each of the values $n = 10, 20$, a total of 8 values of $k \in [5, 40]$ were used, and for $n = 40$, a total of 11 values of $k \in [1, 40]$ were used. This gave a total of 9450 LPs solved. The largest of these ($m = 140$, $n = 40$) required approximately 10 seconds to solve on a Sun Sparcstation 4. The results obtained are summarized in figures 1–3.
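The test procedure is straightforward to reproduce in outline; a sketch (not the original test code; it reuses the hypothetical l1_solve helper from §1):

```python
import numpy as np

def probability_estimate(m, n, k, trials=50, seed=0):
    """Estimate P(m, n, k) of (33): the fraction of random instances with w = 0."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(trials):
        A = rng.uniform(-9, 9, (m, n))   # random elements with -9 <= a_ij <= 9
        d = np.zeros(m); d[:k] = 1.0     # k unit errors (row order is irrelevant)
        w = l1_solve(A, d)               # solve (30) via the LP (31)
        hits += np.linalg.norm(w, np.inf) < 1e-8
    return hits / trials

print(probability_estimate(m=82, n=40, k=10))   # m - n = 22 + 2k: expect P near 0.995
```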

Figure 1. Probability of zero outliers in the basis for n = 40.

Figure 2. Probability of zero outliers in the basis for n = 20.

The curves presented in figures 1–3 show how the probability $P$ that $x = x_c$ depends on $m$, $n$, and $k$. These curves are valid for $n \in [10, 40]$, $k \in [0, 40]$, and $m \in [n + k, 140]$. They show, in a direct way, the number of data measurements ($m$) needed to get the correct solution, with probability $P$, when the measured data may contain $k$ large errors. Any value of $m - n$ above the upper curve will give the correct solution with probability $P \ge 0.995$. It should be noted that $P$ increases rapidly with $m - n$, from $P = 0.5$ to $P = 0.995$, for any fixed $k$. These curves are obtained from an empirical formula which gives $m - n$ as a function of $P$ and $k$. This formula contains 5 parameters, whose values are determined by fitting the computational data. Two of these parameters are independent of $n$, and the other 3 are slowly varying functions of $n$, for $n \in [10, 40]$. The empirical formula is valid for $m - n \ge k$, and is given by

    m - n = (1 - e^{-\lambda_3 k})\lambda_1 + \lambda_2 k + \frac{1}{3.89}\ln\!\left(\frac{P}{1 - P}\right)\left[(1 - e^{-\lambda_3 k})\lambda_{11} + \lambda_{12} k\right].  (34)

The values of the 5 parameters, as functions of $n$, are given in Table 1, for $n = 10$, 20, and 40. This formula was chosen so as to have certain important properties corresponding to the computational results.

Figure 3. Probability of zero outliers in the basis for n = 10.

Specifically, for any fixed $P$, $m - n$ is essentially linear in $k$ for $\lambda_3 k \ge 3$, or $k \ge 10$. Also, $m - n = 0$ for $k = 0$. Finally, $k \ge 5$ and $m - n \ge 22 + 2k$ gives $P \ge 0.995$. The parameters $\lambda_1$ and $\lambda_2$ were determined by setting $P = 0.5$ and fitting the resulting linear function $m - n = \lambda_1 + \lambda_2 k$ to the data for $P = 0.5$ and $k \ge 0$. The values of $\lambda_{11}$ and $\lambda_{12}$ were then determined to give a best linear fit to the data for $P = 0.98$ and $k \ge 10$. Note that $\ln\!\left(\frac{P}{1-P}\right) = 3.89$ for $P = 0.98$, so that for $P = 0.98$ and $k \ge 10$ we have

    m - n \approx \lambda_1 + \lambda_{11} + (\lambda_2 + \lambda_{12})k.  (35)

Table 1. Parameters for Eq. (34).

                      n = 10    n = 20    n = 40
    $\lambda_1$        7.0       8.0       9.8
    $\lambda_2$        1.43      1.61      1.88
    $\lambda_3$        0.32      0.32      0.32
    $\lambda_{11}$     5.0       6.2       9.2
    $\lambda_{12}$     0.22      0.22      0.22

The value of $\lambda_3$ was chosen so that $m - n$ is essentially linear in $k$ for $k \ge 10$, but decreases smoothly to zero as $k$ decreases to zero. The formula (34) can be solved to give $P$ as a function of $m - n$. This gives

    P = e^{\omega}/(1 + e^{\omega}),  (36)

where $\omega = (m - n - \Lambda_1)/\Lambda_2$, $\Lambda_1 = (1 - e^{-\lambda_3 k})\lambda_1 + \lambda_2 k$, and $\Lambda_2 = [(1 - e^{-\lambda_3 k})\lambda_{11} + \lambda_{12} k]/3.89$. It follows that for any fixed $k$, $P \to 1.0$ as $m - n$ increases, and $P \to 0$ as $m - n$ decreases. Furthermore, using the values in Table 1, we get for $k = m - n \ge 10$ that $P \le 0.0017$. This shows that (36) closely approximates the required value $P = 0$ when $m - n < k$. Figures 1–3 make it very easy to determine the number of data points which should be used to get the correct solution $x_c$ with probability $P$, when the number of large errors is believed to be no more than $k$. For the known $n$ and any desired value of $P$, the required $m - n$ is given directly by the corresponding curve in figures 1–3. The simplest choice is to use $m - n \ge 22 + 2k$, which will ensure that the probability of a correct solution is at least 0.995. The results presented here are valid when there are errors in the matrix $A$, as well as in the vector $b$. As shown in §2.1, when $A = A_c + E$ and $b = b_c + d_e$, the problem is equivalent to

    \min_w \|Aw - d\|_1,  (37)

where $d = d_e - E x_c$. Therefore, the relevant value of $k$ is the number of nonzero elements in $d$. Thus $k$ is given by the number of rows in which either $A$ or $b$ has at least one error. The value of $k$ does not depend on the number (greater than one) of errors in a given row of $A$. Therefore, the value of $m - n$ needed to get $x = x_c$ with a specified probability will depend essentially on the number of rows of $[A\ b]$ with errors.
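For reference, a small sketch (our own transcription; the function names are ours) evaluating the probability formula (36) with the Table 1 parameters:

```python
import math

# Table 1 parameters (lambda_1, lambda_2, lambda_3, lambda_11, lambda_12) by n.
PARAMS = {10: (7.0, 1.43, 0.32, 5.0, 0.22),
          20: (8.0, 1.61, 0.32, 6.2, 0.22),
          40: (9.8, 1.88, 0.32, 9.2, 0.22)}

def prob_correct(m, n, k):
    """Evaluate P from (36) for n in {10, 20, 40}."""
    l1, l2, l3, l11, l12 = PARAMS[n]
    s = 1.0 - math.exp(-l3 * k)           # smooth factor, ~1 once k >= 10
    lam_1 = s * l1 + l2 * k               # Lambda_1 in (36)
    lam_2 = (s * l11 + l12 * k) / 3.89    # Lambda_2 in (36)
    w = (m - n - lam_1) / lam_2
    return 1.0 / (1.0 + math.exp(-w))

print(prob_correct(m=82, n=40, k=10))     # m - n = 22 + 2k gives P close to 0.995
```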

Acknowledgment

The work of all four authors was supported in part by the National Science Foundation grant CCR-9509085.

References

1. T.J. Abatzoglou, J.M. Mendel, and G.A. Harada, "The constrained total least squares technique and its application to harmonic superresolution," IEEE Trans. Signal Process., vol. 39, pp. 1070–1087, 1991.
2. H. Barkhuijsen, R. De Beer, and D. Van Ormondt, "Improved algorithm for noniterative time-domain model fitting to exponentially damped magnetic resonance signals," J. Magnetic Resonance, vol. 73, pp. 553–557, 1987.
3. I. Barrodale, "L1 approximation and the analysis of data," Appl. Stat., vol. 17, pp. 51–57, 1968.
4. I. Barrodale and F.D.K. Roberts, "An improved algorithm for discrete L1 linear approximation," SIAM J. Numer. Anal., vol. 10, pp. 839–848, 1973.
5. I. Barrodale and A. Young, "Algorithms for best L1 and L∞ linear approximations on a discrete set," Numer. Math., vol. 8, pp. 295–306, 1966.

6. A. Bjorck, Numerical Methods for Least Squares Problems, SIAM: Philadelphia, PA, 1996.
7. R. De Beer and D. Van Ormondt, "Analysis of NMR data using time-domain fitting procedures," in In-vivo Magnetic Resonance Spectroscopy I: Probeheads, Radiofrequency Pulses, Spectrum Analysis, NMR Basic Principles and Progress 26, M. Rudin (Ed.), Springer-Verlag: Berlin, 1992, pp. 201–248.
8. T.E. Dielman, "Least absolute value estimation in regression models: An annotated bibliography," Commun. Statistical-Theor. Meth., vol. 13, pp. 513–541, 1984.
9. T.E. Dielman and E.L. Rose, "Forecasting in least absolute value regression with autocorrelated errors: A small sample study," Int. J. Forecast, vol. 10, pp. 539–547, 1994.
10. J. Dupacova, "Robustness of L1 regression in the light of linear programming," in L1-statistical Analysis and Related Methods, Y. Dodge (Ed.), North-Holland: Amsterdam, 1992.
11. R. Kumaresan and D.W. Tufts, "Estimating the parameters of exponentially damped sinusoids and pole-zero modeling in noise," IEEE Trans. on Acoust., Speech, and Signal Process., vol. 30, pp. 833–840, 1982.
12. L. Ljung, System Identification Theory for the User, Prentice-Hall: Englewood Cliffs, NJ, 1987.
13. P.W. Mielke, Jr and K.J. Berry, "Permutation-based multivariate regression analysis: The case for least sum of absolute deviations regression," Ann. Oper. Res., vol. 74, pp. 259–268, 1997.
14. S.C. Narula and J.F. Wellington, "The minimum sum of absolute errors regression: A state of the art survey," Int. Statist. Rev., vol. 50, pp. 317–326, 1982.
15. M.A. Rahman and K.B. Yu, "Total least squares approach for frequency estimation using linear prediction," IEEE Trans. Acous. Speech Signal Process., vol. 35, pp. 1440–1454, 1987.
16. J.R. Rice and J.S. White, "Norms for smoothing and estimation," SIAM Rev., vol. 6, pp. 243–256, 1964.
17. J.B. Rosen, H. Park, and J. Glick, "Total least norm formulation and solution for structured problems," SIAM J. Matrix Anal. Appl., vol. 17, pp. 110–128, 1996.
18. J.B. Rosen, H. Park, and J. Glick, "Structured total least norm for nonlinear problems," SIAM J. Matrix Anal. Appl., vol. 20, pp. 14–30, 1999.
19. J.B. Rosen, H. Park, and J. Glick, "Signal identification using a least L1 norm algorithm," Optimization & Engineering, vol. 1, pp. 51–65, 2000.
20. P.J. Rousseeuw and A.M. Leroy, Robust Regression and Outlier Detection, John Wiley: New York, 1987.
21. S. Van Huffel, H. Park, and J.B. Rosen, "Formulation and solution of structured total least norm problems for parameter estimation," IEEE Trans. Signal Process., vol. 44, pp. 2464–2474, 1996.
22. S. Van Huffel and J. Vandewalle, The Total Least Squares Problem, Computational Aspects and Analysis, SIAM: Philadelphia, PA, 1991.