Least Squares Solutions to Linear Systems of Equations
Least Squares with Examples in Signal Processing*

Ivan Selesnick
March 7, 2013
NYU-Poly

* For feedback/corrections, email [email protected]. Matlab software to reproduce the examples in these notes is available on the web or from the author: http://eeweb.poly.edu/iselesni/lecture_notes/ (Last edit: November 26, 2013.)

These notes address (approximate) solutions to linear equations by least squares. We deal with the 'easy' case wherein the system matrix is full rank. If the system matrix is rank deficient, then other methods are needed, e.g., QR decomposition, singular value decomposition, or the pseudo-inverse [2, 3].

In these notes, least squares is illustrated by applying it to several basic problems in signal processing:

1. Linear prediction
2. Smoothing
3. Deconvolution
4. System identification
5. Estimating missing data

For the use of least squares in filter design, see [1].

1 Notation

We denote vectors in lower-case bold, i.e.,

    x = [ x_1, x_2, ..., x_N ]^T.    (1)

We denote matrices in upper-case bold. The transpose of a vector or matrix is indicated by a superscript T, i.e., x^T is the transpose of x.

The notation ||x||_2 refers to the Euclidean length of the vector x, i.e.,

    ||x||_2 = sqrt( |x_1|^2 + |x_2|^2 + ... + |x_N|^2 ).    (2)

The 'sum of squares' of x is denoted by ||x||_2^2, i.e.,

    ||x||_2^2 = \sum_n |x(n)|^2 = x^T x.    (3)

The 'energy' of a vector x refers to ||x||_2^2.

In these notes, it is assumed that all vectors and matrices are real-valued. In the complex-valued case, the conjugate transpose should be used in place of the transpose, etc.

2 Overdetermined equations

Consider the system of linear equations

    y = Hx.

If there is no solution to this system of equations, then the system is 'overdetermined'. This frequently happens when H is a 'tall' matrix (more rows than columns) with linearly independent columns.

In this case, it is common to seek a solution x minimizing the energy of the error:

    J(x) = ||y - Hx||_2^2.

Expanding J(x) gives

    J(x) = (y - Hx)^T (y - Hx)                           (4)
         = y^T y - y^T Hx - x^T H^T y + x^T H^T H x      (5)
         = y^T y - 2 y^T Hx + x^T H^T H x.               (6)

Note that each of the four terms in (5) is a scalar. Note also that the scalar x^T H^T y is the transpose of the scalar y^T Hx, and hence x^T H^T y = y^T Hx.

Taking the derivative (see Appendix A) gives

    ∂/∂x J(x) = -2 H^T y + 2 H^T H x.

Setting the derivative to zero,

    ∂/∂x J(x) = 0  ⟹  H^T H x = H^T y.

Let us assume that H^T H is invertible. Then the solution is given by

    x = (H^T H)^{-1} H^T y.

This is the 'least squares' solution.

    min_x ||y - Hx||_2^2  ⟹  x = (H^T H)^{-1} H^T y.    (7)

In some situations, it is desirable to minimize the weighted square error, i.e., \sum_n w_n r_n^2 where r is the residual, or error, r = y - Hx, and w_n are positive weights. This corresponds to minimizing ||W^{1/2}(y - Hx)||_2^2 where W is the diagonal matrix, [W]_{n,n} = w_n. Using (7) gives

    min_x ||W^{1/2}(y - Hx)||_2^2  ⟹  x = (H^T W H)^{-1} H^T W y    (8)

where we have used the fact that W is symmetric.

3 Underdetermined equations

Consider the system of linear equations

    y = Hx.

If there are many solutions, then the system is 'underdetermined'. This frequently happens when H is a 'wide' matrix (more columns than rows) with linearly independent rows.

In this case, it is common to seek a solution x with minimum norm. That is, we would like to solve the optimization problem

    min_x ||x||_2^2      (9)
    such that y = Hx.    (10)

Minimization with constraints can be done with Lagrange multipliers. So, define the Lagrangian:

    L(x, µ) = ||x||_2^2 + µ^T (y - Hx).

Take the derivatives of the Lagrangian:

    ∂/∂x L(x, µ) = 2x - H^T µ
    ∂/∂µ L(x, µ) = y - Hx.

Set the derivatives to zero to get:

    x = (1/2) H^T µ    (11)
    y = Hx.            (12)

Plugging (11) into (12) gives

    y = (1/2) H H^T µ.

Let us assume H H^T is invertible. Then

    µ = 2 (H H^T)^{-1} y.    (13)

Plugging (13) into (11) gives the 'least squares' solution:

    x = H^T (H H^T)^{-1} y.

We can verify that x in this formula does in fact satisfy y = Hx by plugging in:

    Hx = H [ H^T (H H^T)^{-1} y ] = (H H^T)(H H^T)^{-1} y = y.  ✓

So,

    min_x ||x||_2^2 s.t. y = Hx  ⟹  x = H^T (H H^T)^{-1} y.    (14)

In some situations, it is desirable to minimize the weighted energy, i.e., \sum_n w_n x_n^2, where w_n are positive weights. This corresponds to minimizing ||W^{1/2} x||_2^2 where W is the diagonal matrix, [W]_{n,n} = w_n. The derivation of the solution is similar, and gives

    min_x ||W^{1/2} x||_2^2 s.t. y = Hx  ⟹  x = W^{-1} H^T (H W^{-1} H^T)^{-1} y.    (15)

This solution is also derived below, see (25).
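As a quick numerical check of (7) and (14), the following Matlab fragment builds random test problems and applies both formulas. This is a minimal illustrative sketch, separate from the Matlab software mentioned above; the matrix sizes and test data are arbitrary.

    % Overdetermined case: tall H with independent columns, eq. (7)
    H = randn(100, 5);            % 100 equations, 5 unknowns
    y = randn(100, 1);
    x_ls = (H'*H) \ (H'*y);       % solve the normal equations H'H x = H'y
    % x_ls = H \ y;               % backslash gives the same least squares solution

    % Underdetermined case: wide H with independent rows, eq. (14)
    H = randn(5, 100);            % 5 equations, 100 unknowns
    y = randn(5, 1);
    x_mn = H' * ((H*H') \ y);     % minimum-norm solution x = H'(HH')^{-1} y
    norm(H*x_mn - y)              % approximately zero: y = Hx is satisfied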
4 Regularization

In the overdetermined case, we minimized ||y - Hx||_2^2. In the underdetermined case, we minimized ||x||_2^2. Another approach is to minimize the weighted sum: c_1 ||y - Hx||_2^2 + c_2 ||x||_2^2. The solution x depends on the ratio c_2/c_1, not on c_1 and c_2 individually.

A common approach to obtain an inexact solution to a linear system is to minimize the objective function:

    J(x) = ||y - Hx||_2^2 + λ ||x||_2^2    (16)

where λ > 0. Taking the derivative, we get

    ∂/∂x J(x) = 2 H^T (Hx - y) + 2 λ x.

Setting the derivative to zero,

    ∂/∂x J(x) = 0  ⟹  H^T H x + λ x = H^T y
                   ⟹  (H^T H + λ I) x = H^T y.

So the solution is given by

    x = (H^T H + λ I)^{-1} H^T y.

So,

    min_x ||y - Hx||_2^2 + λ ||x||_2^2  ⟹  x = (H^T H + λ I)^{-1} H^T y.    (17)

This is referred to as 'diagonal loading' because a constant, λ, is added to the diagonal elements of H^T H. The approach also avoids the problem of rank deficiency because H^T H + λ I is invertible even if H^T H is not. In addition, the solution (17) can be used in both cases: when H is tall and when H is wide.

5 Weighted regularization

A more general form of the regularized objective function (16) is:

    J(x) = ||y - Hx||_2^2 + λ ||Ax||_2^2

where λ > 0. Taking the derivative, we get

    ∂/∂x J(x) = 2 H^T (Hx - y) + 2 λ A^T A x.

Setting the derivative to zero,

    ∂/∂x J(x) = 0  ⟹  H^T H x + λ A^T A x = H^T y
                   ⟹  (H^T H + λ A^T A) x = H^T y.

So the solution is given by

    x = (H^T H + λ A^T A)^{-1} H^T y.

So,

    min_x ||y - Hx||_2^2 + λ ||Ax||_2^2  ⟹  x = (H^T H + λ A^T A)^{-1} H^T y.    (18)

Note that if A is the identity matrix, then equation (18) becomes (17).

6 Constrained least squares

Constrained least squares refers to the problem of finding a least squares solution that exactly satisfies additional constraints. If the additional constraints are a set of linear equations, then the solution is obtained as follows.

The constrained least squares problem is of the form:

    min_x ||y - Hx||_2^2    (19)
    such that Cx = b.       (20)

Define the Lagrangian,

    L(x, µ) = ||y - Hx||_2^2 + µ^T (Cx - b).

The derivatives are:

    ∂/∂x L(x, µ) = 2 H^T (Hx - y) + C^T µ
    ∂/∂µ L(x, µ) = Cx - b.

Setting the derivatives to zero,

    ∂/∂x L(x, µ) = 0  ⟹  x = (H^T H)^{-1} (H^T y - 0.5 C^T µ)    (21)
    ∂/∂µ L(x, µ) = 0  ⟹  Cx = b.                                 (22)

Multiplying (21) on the left by C gives Cx, which from (22) is b, so we have

    C (H^T H)^{-1} (H^T y - 0.5 C^T µ) = b

or, expanding,

    C (H^T H)^{-1} H^T y - 0.5 C (H^T H)^{-1} C^T µ = b.

Solving for µ gives

    µ = 2 [ C (H^T H)^{-1} C^T ]^{-1} [ C (H^T H)^{-1} H^T y - b ].

Plugging µ into (21) gives

    x = (H^T H)^{-1} ( H^T y - C^T [ C (H^T H)^{-1} C^T ]^{-1} [ C (H^T H)^{-1} H^T y - b ] ).

Let us verify that x in this formula does in fact satisfy Cx = b,

    Cx = C (H^T H)^{-1} H^T y - [ C (H^T H)^{-1} C^T ] [ C (H^T H)^{-1} C^T ]^{-1} [ C (H^T H)^{-1} H^T y - b ]
       = C (H^T H)^{-1} H^T y - [ C (H^T H)^{-1} H^T y - b ]
       = b.  ✓

So,

    min_x ||y - Hx||_2^2 s.t. Cx = b
    ⟹  x = (H^T H)^{-1} ( H^T y - C^T [ C (H^T H)^{-1} C^T ]^{-1} [ C (H^T H)^{-1} H^T y - b ] ).    (23)

6.1 Special cases

Simpler forms of (23) are frequently useful. For example, if H = I and b = 0 in (23), then we get

    min_x ||y - x||_2^2 s.t. Cx = 0  ⟹  x = y - C^T (C C^T)^{-1} C y.    (24)

If y = 0 in (23), then we get

    min_x ||Hx||_2^2 s.t. Cx = b  ⟹  x = (H^T H)^{-1} C^T [ C (H^T H)^{-1} C^T ]^{-1} b.    (25)

If y = 0 and H = I in (23), then we get

    min_x ||x||_2^2 s.t. Cx = b  ⟹  x = C^T (C C^T)^{-1} b    (26)

which is the same as (14).
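The closed-form solutions (17) and (23) can be verified numerically. The following is a minimal Matlab sketch (illustrative only, separate from the Matlab software mentioned above); the sizes, λ, and constraint data are arbitrary test values, and (23) is evaluated in two solve steps rather than by forming the inverses explicitly.

    % Diagonal loading, eq. (17)
    H = randn(100, 8);  y = randn(100, 1);
    lam = 0.5;                               % lambda > 0 (arbitrary test value)
    x_reg = (H'*H + lam*eye(8)) \ (H'*y);    % regularized solution, eq. (17)

    % Constrained least squares, eq. (23)
    C = randn(2, 8);  b = randn(2, 1);       % two linear constraints Cx = b
    G  = H'*H;                               % Gram matrix
    xu = G \ (H'*y);                         % unconstrained solution, eq. (7)
    nu = (C*(G\C')) \ (C*xu - b);            % equal to mu/2, with mu as in (21)
    x_c = xu - G \ (C'*nu);                  % constrained solution, eq. (23)
    norm(C*x_c - b)                          % approximately zero: Cx = b holds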
7 Note

The expressions above involve matrix inverses. For example, (7) involves (H^T H)^{-1}. However, it must be emphasized that finding the least squares solution does not require computing the inverse of H^T H even though the inverse appears in the formula. Instead, x in (7) should be obtained, in practice, by solving the system Ax = b where A = H^T H and b = H^T y.
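In Matlab, for example, this means using the backslash operator to solve the normal equations rather than calling inv. A minimal sketch with arbitrary test data (the difference in cost and accuracy matters most when H^T H is large or poorly conditioned):

    H = randn(200, 10);  y = randn(200, 1);
    A = H'*H;  b = H'*y;                % normal equations A x = b
    x1 = A \ b;                         % recommended: solve the system
    x2 = inv(A) * b;                    % same result here, but forming the
                                        % inverse is costlier and can be less accurate
    norm(x1 - x2)                       % small for this well-conditioned example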