Least Squares with Examples in Signal Processing
Ivan Selesnick
NYU-Poly
March 7, 2013

These notes address (approximate) solutions to linear equations by least squares. We deal with the `easy' case wherein the system matrix is full rank. If the system matrix is rank deficient, then other methods are needed, e.g., QR decomposition, singular value decomposition, or the pseudo-inverse [2, 3].

In these notes, least squares is illustrated by applying it to several basic problems in signal processing:

1. Linear prediction
2. Smoothing
3. Deconvolution
4. System identification
5. Estimating missing data

For the use of least squares in filter design, see [1].

(For feedback/corrections, email [email protected]. Matlab software to reproduce the examples in these notes is available on the web or from the author: http://eeweb.poly.edu/iselesni/lecture_notes/. Last edit: November 26, 2013.)

1 Notation

We denote vectors in lower-case bold, i.e.,

$$ \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_N \end{bmatrix}. \tag{1} $$

We denote matrices in upper-case bold. The transpose of a vector or matrix is indicated by a superscript $T$, i.e., $x^T$ is the transpose of $x$.

The notation $\|x\|_2$ refers to the Euclidean length of the vector $x$, i.e.,

$$ \|x\|_2 = \sqrt{|x_1|^2 + |x_2|^2 + \cdots + |x_N|^2}. \tag{2} $$

The `sum of squares' of $x$ is denoted by $\|x\|_2^2$, i.e.,

$$ \|x\|_2^2 = \sum_n |x(n)|^2 = x^T x. \tag{3} $$

The `energy' of a vector $x$ refers to $\|x\|_2^2$.

In these notes, it is assumed that all vectors and matrices are real-valued. In the complex-valued case, the conjugate transpose should be used in place of the transpose, etc.

2 Overdetermined equations

Consider the system of linear equations

$$ y = Hx. $$

If there is no solution to this system of equations, then the system is `overdetermined'. This frequently happens when $H$ is a `tall' matrix (more rows than columns) with linearly independent columns.

In this case, it is common to seek a solution $x$ minimizing the energy of the error:

$$ J(x) = \|y - Hx\|_2^2. $$

Expanding $J(x)$ gives

$$ J(x) = (y - Hx)^T (y - Hx) \tag{4} $$
$$ \phantom{J(x)} = y^T y - y^T Hx - x^T H^T y + x^T H^T H x \tag{5} $$
$$ \phantom{J(x)} = y^T y - 2 y^T H x + x^T H^T H x. \tag{6} $$

Note that each of the four terms in (5) is a scalar. Note also that the scalar $x^T H^T y$ is the transpose of the scalar $y^T H x$, and hence $x^T H^T y = y^T H x$.

Taking the derivative (see Appendix A) gives

$$ \frac{\partial}{\partial x} J(x) = -2 H^T y + 2 H^T H x. $$

Setting the derivative to zero,

$$ \frac{\partial}{\partial x} J(x) = 0 \implies H^T H x = H^T y. $$

Let us assume that $H^T H$ is invertible. Then the solution is given by

$$ x = (H^T H)^{-1} H^T y. $$

This is the `least squares' solution.

$$ \min_x \|y - Hx\|_2^2 \implies x = (H^T H)^{-1} H^T y \tag{7} $$

In some situations, it is desirable to minimize the weighted square error, i.e., $\sum_n w_n r_n^2$ where $r$ is the residual, or error, $r = y - Hx$, and $w_n$ are positive weights. This corresponds to minimizing $\|W^{1/2}(y - Hx)\|_2^2$ where $W$ is the diagonal matrix, $[W]_{n,n} = w_n$. Using (7) gives

$$ \min_x \|W^{1/2}(y - Hx)\|_2^2 \implies x = (H^T W H)^{-1} H^T W y \tag{8} $$

where we have used the fact that $W$ is symmetric.
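As an illustration, the solutions (7) and (8) can be computed as in the following minimal NumPy sketch; the matrix sizes, data, and weights below are assumptions chosen only for the example, and in practice one solves the normal equations rather than forming the inverse (see Section 7).

```python
import numpy as np

# Minimal sketch of (7) and (8) for an overdetermined system.
# Sizes and data are illustrative assumptions only.
rng = np.random.default_rng(0)
H = rng.standard_normal((100, 5))        # tall matrix: more rows than columns
y = rng.standard_normal(100)

# Ordinary least squares, eq. (7): x = (H^T H)^{-1} H^T y,
# obtained by solving the normal equations rather than inverting.
x_ls = np.linalg.solve(H.T @ H, H.T @ y)

# Equivalent, and numerically preferable: a dedicated least-squares solver.
x_lstsq, *_ = np.linalg.lstsq(H, y, rcond=None)
assert np.allclose(x_ls, x_lstsq)

# Weighted least squares, eq. (8): x = (H^T W H)^{-1} H^T W y,
# with W diagonal containing positive weights w_n.
w = rng.uniform(0.5, 2.0, size=100)      # positive weights (illustrative)
W = np.diag(w)
x_wls = np.linalg.solve(H.T @ W @ H, H.T @ W @ y)
```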
3 Underdetermined equations

Consider the system of linear equations

$$ y = Hx. $$

If there are many solutions, then the system is `underdetermined'. This frequently happens when $H$ is a `wide' matrix (more columns than rows) with linearly independent rows.

In this case, it is common to seek a solution $x$ with minimum norm. That is, we would like to solve the optimization problem

$$ \min_x \|x\|_2^2 \tag{9} $$
$$ \text{such that } y = Hx. \tag{10} $$

Minimization with constraints can be done with Lagrange multipliers. So, define the Lagrangian:

$$ L(x, \mu) = \|x\|_2^2 + \mu^T (y - Hx). $$

Take the derivatives of the Lagrangian:

$$ \frac{\partial}{\partial x} L(x, \mu) = 2x - H^T \mu $$
$$ \frac{\partial}{\partial \mu} L(x, \mu) = y - Hx $$

Set the derivatives to zero to get:

$$ x = \tfrac{1}{2} H^T \mu \tag{11} $$
$$ y = Hx \tag{12} $$

Plugging (11) into (12) gives

$$ y = \tfrac{1}{2} H H^T \mu. $$

Let us assume $H H^T$ is invertible. Then

$$ \mu = 2 (H H^T)^{-1} y. \tag{13} $$

Plugging (13) into (11) gives the `least squares' solution:

$$ x = H^T (H H^T)^{-1} y. $$

We can verify that $x$ in this formula does in fact satisfy $y = Hx$ by plugging in:

$$ Hx = H H^T (H H^T)^{-1} y = (H H^T)(H H^T)^{-1} y = y. \checkmark $$

So,

$$ \min_x \|x\|_2^2 \ \text{ s.t. } \ y = Hx \implies x = H^T (H H^T)^{-1} y. \tag{14} $$

In some situations, it is desirable to minimize the weighted energy, i.e., $\sum_n w_n x_n^2$, where $w_n$ are positive weights. This corresponds to minimizing $\|W^{1/2} x\|_2^2$ where $W$ is the diagonal matrix, $[W]_{n,n} = w_n$. The derivation of the solution is similar, and gives

$$ \min_x \|W^{1/2} x\|_2^2 \ \text{ s.t. } \ y = Hx \implies x = W^{-1} H^T (H W^{-1} H^T)^{-1} y. \tag{15} $$

This solution is also derived below, see (25).

4 Regularization

In the overdetermined case, we minimized $\|y - Hx\|_2^2$. In the underdetermined case, we minimized $\|x\|_2^2$. Another approach is to minimize the weighted sum: $c_1 \|y - Hx\|_2^2 + c_2 \|x\|_2^2$. The solution $x$ depends on the ratio $c_2/c_1$, not on $c_1$ and $c_2$ individually.

A common approach to obtain an inexact solution to a linear system is to minimize the objective function:

$$ J(x) = \|y - Hx\|_2^2 + \lambda \|x\|_2^2 \tag{16} $$

where $\lambda > 0$. Taking the derivative, we get

$$ \frac{\partial}{\partial x} J(x) = 2 H^T (Hx - y) + 2 \lambda x. $$

Setting the derivative to zero,

$$ \frac{\partial}{\partial x} J(x) = 0 \implies H^T H x + \lambda x = H^T y \implies (H^T H + \lambda I)\, x = H^T y. $$

So the solution is given by

$$ x = (H^T H + \lambda I)^{-1} H^T y. $$

So,

$$ \min_x \|y - Hx\|_2^2 + \lambda \|x\|_2^2 \implies x = (H^T H + \lambda I)^{-1} H^T y \tag{17} $$

This is referred to as `diagonal loading' because a constant, $\lambda$, is added to the diagonal elements of $H^T H$. The approach also avoids the problem of rank deficiency because $H^T H + \lambda I$ is invertible even if $H^T H$ is not. In addition, the solution (17) can be used in both cases: when $H$ is tall and when $H$ is wide.

5 Weighted regularization

A more general form of the regularized objective function (16) is:

$$ J(x) = \|y - Hx\|_2^2 + \lambda \|Ax\|_2^2 $$

where $\lambda > 0$. Taking the derivative, we get

$$ \frac{\partial}{\partial x} J(x) = 2 H^T (Hx - y) + 2 \lambda A^T A x. $$

Setting the derivative to zero,

$$ \frac{\partial}{\partial x} J(x) = 0 \implies H^T H x + \lambda A^T A x = H^T y \implies (H^T H + \lambda A^T A)\, x = H^T y. $$

So the solution is given by

$$ x = (H^T H + \lambda A^T A)^{-1} H^T y. $$

So,

$$ \min_x \|y - Hx\|_2^2 + \lambda \|Ax\|_2^2 \implies x = (H^T H + \lambda A^T A)^{-1} H^T y \tag{18} $$

Note that if $A$ is the identity matrix, then equation (18) becomes (17).
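The minimum-norm solution (14) and the regularized solution (18) can be computed along the same lines; below is a minimal NumPy sketch, where the sizes, data, the value of $\lambda$, and the choice $A = I$ are assumptions made only for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

# Underdetermined system: wide H with linearly independent rows.
H = rng.standard_normal((5, 100))
y = rng.standard_normal(5)

# Minimum-norm solution, eq. (14): x = H^T (H H^T)^{-1} y.
x_mn = H.T @ np.linalg.solve(H @ H.T, y)
assert np.allclose(H @ x_mn, y)           # the constraints y = Hx hold exactly

# Regularized least squares, eq. (18):
# x = (H^T H + lam * A^T A)^{-1} H^T y; A = I reduces (18) to diagonal loading (17).
lam = 0.1                                 # illustrative regularization parameter
A = np.eye(H.shape[1])
x_reg = np.linalg.solve(H.T @ H + lam * (A.T @ A), H.T @ y)
```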
6 Constrained least squares

Constrained least squares refers to the problem of finding a least squares solution that exactly satisfies additional constraints. If the additional constraints are a set of linear equations, then the solution is obtained as follows.

The constrained least squares problem is of the form:

$$ \min_x \|y - Hx\|_2^2 \tag{19} $$
$$ \text{such that } Cx = b \tag{20} $$

Define the Lagrangian,

$$ L(x, \mu) = \|y - Hx\|_2^2 + \mu^T (Cx - b). $$

The derivatives are:

$$ \frac{\partial}{\partial x} L(x, \mu) = 2 H^T (Hx - y) + C^T \mu $$
$$ \frac{\partial}{\partial \mu} L(x, \mu) = Cx - b $$

Setting the derivatives to zero,

$$ \frac{\partial}{\partial x} L(x, \mu) = 0 \implies x = (H^T H)^{-1} (H^T y - 0.5\, C^T \mu) \tag{21} $$
$$ \frac{\partial}{\partial \mu} L(x, \mu) = 0 \implies Cx = b \tag{22} $$

Multiplying (21) on the left by $C$ gives $Cx$, which from (22) is $b$, so we have

$$ C (H^T H)^{-1} (H^T y - 0.5\, C^T \mu) = b $$

or, expanding,

$$ C (H^T H)^{-1} H^T y - 0.5\, C (H^T H)^{-1} C^T \mu = b. $$

Solving for $\mu$ gives

$$ \mu = 2 \left[ C (H^T H)^{-1} C^T \right]^{-1} \left( C (H^T H)^{-1} H^T y - b \right). $$

Plugging $\mu$ into (21) gives

$$ x = (H^T H)^{-1} \left( H^T y - C^T \left[ C (H^T H)^{-1} C^T \right]^{-1} \left( C (H^T H)^{-1} H^T y - b \right) \right). \tag{23} $$

Let us verify that $x$ in this formula does in fact satisfy $Cx = b$:

$$ Cx = C (H^T H)^{-1} H^T y - C (H^T H)^{-1} C^T \left[ C (H^T H)^{-1} C^T \right]^{-1} \left( C (H^T H)^{-1} H^T y - b \right) $$
$$ \phantom{Cx} = C (H^T H)^{-1} H^T y - \left( C (H^T H)^{-1} H^T y - b \right) = b. \checkmark $$

6.1 Special cases

Simpler forms of (23) are frequently useful. For example, if $H = I$ and $b = 0$ in (23), then we get

$$ \min_x \|y - x\|_2^2 \ \text{ s.t. } \ Cx = 0 \implies x = y - C^T (C C^T)^{-1} C y. \tag{24} $$

If $y = 0$ in (23), then we get

$$ \min_x \|Hx\|_2^2 \ \text{ s.t. } \ Cx = b \implies x = (H^T H)^{-1} C^T \left[ C (H^T H)^{-1} C^T \right]^{-1} b. \tag{25} $$

If $y = 0$ and $H = I$ in (23), then we get

$$ \min_x \|x\|_2^2 \ \text{ s.t. } \ Cx = b \implies x = C^T (C C^T)^{-1} b \tag{26} $$

which is the same as (14).

7 Note

The expressions above involve matrix inverses. For example, (7) involves $(H^T H)^{-1}$. However, it must be emphasized that finding the least squares solution does not require computing the inverse of $H^T H$ even though the inverse appears in the formula. Instead, $x$ in (7) should be obtained, in practice, by solving the system $Ax = b$ where $A = H^T H$ and $b = H^T y$.
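To illustrate both the constrained solution (23) and the advice of Section 7, the following minimal NumPy sketch computes (23) using linear solves rather than explicit inverses and checks that the constraints are satisfied; the matrix sizes and data are assumptions made only for the example.

```python
import numpy as np

# Constrained least squares, eq. (23), using linear solves (cf. Section 7).
# Sizes and data are illustrative assumptions.
rng = np.random.default_rng(2)
H = rng.standard_normal((50, 8))           # tall system matrix
y = rng.standard_normal(50)
C = rng.standard_normal((3, 8))            # constraint matrix
b = rng.standard_normal(3)                 # constraint values

HtH = H.T @ H
Hty = H.T @ y

x_u = np.linalg.solve(HtH, Hty)            # unconstrained solution, eq. (7)
G = C @ np.linalg.solve(HtH, C.T)          # G = C (H^T H)^{-1} C^T
mu_half = np.linalg.solve(G, C @ x_u - b)  # equals 0.5 * mu in the derivation
x = np.linalg.solve(HtH, Hty - C.T @ mu_half)   # eq. (23)

assert np.allclose(C @ x, b)               # constraints Cx = b hold exactly
```

The variable `mu_half` absorbs the factor of one half from the $0.5\, C^T \mu$ term in (21), so no explicit multiplication by 2 and 0.5 is needed.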