MATH 3795 Lecture 10. Regularized Linear Least Squares.

Dmitriy Leykekhman

Fall 2008

Goals

- Understanding regularization.

Review.

- Consider the linear least squares problem

$$\min_{x\in\mathbb{R}^n} \|Ax - b\|_2^2.$$
From the last lecture:
- Let $A = U\Sigma V^T$ be the singular value decomposition of $A \in \mathbb{R}^{m\times n}$ with singular values

$$\sigma_1 \ge \cdots \ge \sigma_r > \sigma_{r+1} = \cdots = \sigma_{\min\{m,n\}} = 0.$$

- The minimum norm solution is
$$x^\dagger = \sum_{i=1}^{r} \frac{u_i^T b}{\sigma_i}\, v_i.$$

- If even one singular value $\sigma_i$ is small, then small perturbations in $b$ can lead to large errors in the solution.
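To see this sensitivity concretely, here is a small illustrative MATLAB sketch (not part of the original lecture; the matrix A, right-hand side b, and perturbation db are made up):

% Illustrative sketch (made-up A and b): minimum norm solution via the SVD
A = [1 0; 0 1e-6; 0 0];            % singular values 1 and 1e-6
b = [1; 1; 0];
[U,S,V] = svd(A,0);                % economy-size SVD
sigma = diag(S);
xdag = V*((U'*b)./sigma);          % x^dagger = sum_i (u_i'*b/sigma_i) v_i

db = [0; 1e-6; 0];                 % perturbation with ||db||_2 = 1e-6
xpert = V*((U'*(b+db))./sigma);
norm(xpert - xdag)                 % = 1: the small singular value amplifies db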

Regularized Linear Least Squares Problems.

- If $\sigma_1/\sigma_r \gg 1$, then it might be useful to consider the regularized linear least squares problem (Tikhonov regularization)

$$\min_{x\in\mathbb{R}^n} \frac{1}{2}\|Ax - b\|_2^2 + \frac{\lambda}{2}\|x\|_2^2.$$
Here $\lambda > 0$ is the regularization parameter.

- The regularization parameter $\lambda > 0$ is not known a priori and has to be determined based on the problem data; see below.

- Observe that
$$\min_{x} \frac{1}{2}\|Ax - b\|_2^2 + \frac{\lambda}{2}\|x\|_2^2 = \min_{x} \frac{1}{2}\left\|\begin{pmatrix} A \\ \sqrt{\lambda}\, I \end{pmatrix} x - \begin{pmatrix} b \\ 0 \end{pmatrix}\right\|_2^2. \qquad (1)$$


- For $\lambda > 0$ the matrix
$$\begin{pmatrix} A \\ \sqrt{\lambda}\, I \end{pmatrix} \in \mathbb{R}^{(m+n)\times n}$$

always has full rank $n$. Hence, for $\lambda > 0$, the regularized linear least squares problem (1) has a unique solution.

- The normal equations corresponding to (1) are given by

$$\begin{pmatrix} A \\ \sqrt{\lambda}\, I \end{pmatrix}^T \begin{pmatrix} A \\ \sqrt{\lambda}\, I \end{pmatrix} x = (A^T A + \lambda I)\, x = A^T b = \begin{pmatrix} A \\ \sqrt{\lambda}\, I \end{pmatrix}^T \begin{pmatrix} b \\ 0 \end{pmatrix}.$$
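Both formulations can be solved directly in MATLAB. The following sketch is illustrative only (not from the lecture; A, b, and lambda are made-up placeholders):

% Illustrative sketch (hypothetical A, b, lambda)
m = 8; n = 4; lambda = 1e-2;
A = rand(m,n); b = rand(m,1);

% (a) the stacked least squares problem (1), solved by backslash
x1 = [A; sqrt(lambda)*eye(n)] \ [b; zeros(n,1)];

% (b) the normal equations; A'*A + lambda*I is symmetric positive definite
x2 = (A'*A + lambda*eye(n)) \ (A'*b);

norm(x1 - x2)   % the two solutions agree up to rounding error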


- Singular value decomposition: $A = U\Sigma V^T$, where $U \in \mathbb{R}^{m\times m}$ and $V \in \mathbb{R}^{n\times n}$ are orthogonal matrices and $\Sigma \in \mathbb{R}^{m\times n}$ is a 'diagonal' matrix with diagonal entries

$$\sigma_1 \ge \cdots \ge \sigma_r > \sigma_{r+1} = \cdots = \sigma_{\min\{m,n\}} = 0.$$

- Thus the normal equations corresponding to (1),
$$(A^T A + \lambda I)\, x_\lambda = A^T b,$$
can be written as
$$\big(V\Sigma^T \underbrace{U^T U}_{=I}\, \Sigma V^T + \lambda \underbrace{I}_{=VV^T}\big)\, x_\lambda = V\Sigma^T U^T b.$$


- Rearranging terms we obtain
$$V(\Sigma^T\Sigma + \lambda I)V^T x_\lambda = V\Sigma^T U^T b;$$
multiplying both sides by $V^T$ from the left and setting $z = V^T x_\lambda$ we get
$$(\Sigma^T\Sigma + \lambda I)\, z = \Sigma^T U^T b.$$
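As a quick numerical check of this diagonalization (an illustrative sketch, not from the lecture; A and lambda are placeholders):

% Check: V'*(A'*A + lambda*I)*V equals the diagonal matrix S'*S + lambda*I
A = rand(7,3); lambda = 0.1;
[U,S,V] = svd(A);
M = V'*(A'*A + lambda*eye(3))*V;
norm(M - (S'*S + lambda*eye(3)))   % ~ rounding error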


- The normal equations corresponding to (1),

$$(A^T A + \lambda I)\, x_\lambda = A^T b,$$

are equivalent to
$$\underbrace{(\Sigma^T\Sigma + \lambda I)}_{\text{diagonal}}\, z = \Sigma^T U^T b,$$

where $z = V^T x_\lambda$.

- Solution:
$$z_i = \begin{cases} \dfrac{\sigma_i (u_i^T b)}{\sigma_i^2 + \lambda}, & i = 1,\dots,r, \\ 0, & i = r+1,\dots,n. \end{cases}$$
- Since $x_\lambda = Vz = \sum_{i=1}^{n} z_i v_i$, the solution of the regularized linear least squares problem (1) is given by

$$x_\lambda = \sum_{i=1}^{r} \frac{\sigma_i (u_i^T b)}{\sigma_i^2 + \lambda}\, v_i.$$



- Note that

$$\lim_{\lambda\to 0} x_\lambda = \lim_{\lambda\to 0} \sum_{i=1}^{r} \frac{\sigma_i(u_i^T b)}{\sigma_i^2 + \lambda}\, v_i = \sum_{i=1}^{r} \frac{u_i^T b}{\sigma_i}\, v_i = x^\dagger,$$

i.e., the solution of the regularized LLS problem (1) converges to the minimum norm solution of the LLS problem as λ goes to zero.


- The representation
$$x_\lambda = \sum_{i=1}^{r} \frac{\sigma_i(u_i^T b)}{\sigma_i^2 + \lambda}\, v_i$$
of the solution of the regularized LLS also reveals the regularizing property of adding the term $\frac{\lambda}{2}\|x\|_2^2$ to the (ordinary) least squares functional. We have that

$$\frac{\sigma_i(u_i^T b)}{\sigma_i^2 + \lambda} \approx \begin{cases} 0, & \text{if } 0 \approx \sigma_i \ll \lambda, \\ \dfrac{u_i^T b}{\sigma_i}, & \text{if } \sigma_i \gg \lambda. \end{cases}$$


- Hence, adding $\frac{\lambda}{2}\|x\|_2^2$ to the (ordinary) least squares functional acts as a filter. Contributions from singular values which are large relative to the regularization parameter $\lambda$ are left (almost) unchanged, whereas contributions from small singular values are (almost) eliminated.
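This filtering can be seen numerically. The following short MATLAB snippet is an illustrative sketch, not part of the original lecture; the singular values and lambda are made-up placeholders.

% Illustrative sketch (hypothetical singular values and lambda):
% the factor sigma_i^2/(sigma_i^2 + lambda) multiplies u_i'*b/sigma_i in x_lambda
sigma  = 10.^(0:-1:-8)';                 % singular values 1, 1e-1, ..., 1e-8
lambda = 1e-6;
phi = sigma.^2 ./ (sigma.^2 + lambda);   % filter factors
disp([sigma phi])   % phi ~ 1 for sigma^2 >> lambda and ~ 0 otherwise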

- This raises an important question: how do we choose $\lambda$?


- Suppose that the data are $b = b_{ex} + \delta b$. We want to compute the minimum norm solution of the (ordinary) LLS problem with the unperturbed data $b_{ex}$,
$$x_{ex} = \sum_{i=1}^{r} \frac{u_i^T b_{ex}}{\sigma_i}\, v_i,$$

but we can only compute with $b = b_{ex} + \delta b$; we do not know $b_{ex}$.

- The solution of the regularized least squares problem is

$$x_\lambda = \sum_{i=1}^{r} \left( \frac{\sigma_i(u_i^T b_{ex})}{\sigma_i^2 + \lambda} + \frac{\sigma_i(u_i^T \delta b)}{\sigma_i^2 + \lambda} \right) v_i.$$


- We observed that

$$\sum_{i=1}^{r} \frac{\sigma_i(u_i^T b_{ex})}{\sigma_i^2 + \lambda}\, v_i \to x_{ex} \quad \text{as } \lambda \to 0.$$

- On the other hand,

$$\frac{\sigma_i(u_i^T \delta b)}{\sigma_i^2 + \lambda} \approx \begin{cases} 0, & \text{if } 0 \approx \sigma_i \ll \lambda, \\ \dfrac{u_i^T \delta b}{\sigma_i}, & \text{if } \sigma_i \gg \lambda, \end{cases}$$
which suggests choosing $\lambda$ sufficiently large to ensure that errors $\delta b$ in the data are not magnified by small singular values.
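The two contributions can also be monitored separately. The following MATLAB sketch is illustrative only (not from the lecture; A, x_ex, and the perturbation are made-up placeholders): as lambda increases, the noise term shrinks while the exact-data term moves away from x_ex.

% Illustrative sketch (made-up data): split x_lambda into its two terms
A = rand(20,5); xex = ones(5,1); bex = A*xex;   % consistent exact data
db = 1e-3*randn(size(bex));                     % data noise
[U,S,V] = svd(A,0); sigma = diag(S);
for k = 1:8
    lambda = 10^(-k);
    xsig = V*((sigma.*(U'*bex))./(sigma.^2 + lambda)); % exact-data term
    xnoi = V*((sigma.*(U'*db)) ./(sigma.^2 + lambda)); % noise term
    fprintf('lambda = %.0e: ||xsig - xex|| = %.2e, ||xnoi|| = %.2e\n', ...
        lambda, norm(xsig - xex), norm(xnoi));
end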

Example (MATLAB):

% Compute A
t = 10.^(0:-1:-10)';
A = [ones(size(t)) t t.^2 t.^3 t.^4 t.^5];

% compute exact data
xex = ones(6,1);
bex = A*xex;

% data perturbation of 0.1%
deltab = 0.001*(0.5-rand(size(bex))).*bex;
b = bex + deltab;

% compute the economy-size SVD of A, so that U has n columns
% and U'*b matches the length of sigma
[U,S,V] = svd(A,0);
sigma = diag(S);

% solve the regularized LLS for different lambda
for i = 0:7
    lambda(i+1) = 10^(-i);
    xlambda = V*((sigma.*(U'*b)) ./ (sigma.^2 + lambda(i+1)));
    err(i+1) = norm(xlambda - xex);
end

loglog(lambda, err, '*');
xlabel('\lambda');
ylabel('||x^{ex} - x_\lambda||_2');

Regularized Linear Least Squares Problem.

- The error $\|x_{ex} - x_\lambda\|_2$ for different values of $\lambda$ (loglog scale): [figure]

- For this example $\lambda \approx 10^{-3}$ is a good choice for the regularization parameter $\lambda$. However, we could only create this figure with the knowledge of the desired solution $x_{ex}$.

- How can we determine a $\lambda \ge 0$ so that $\|x_{ex} - x_\lambda\|_2$ is small without knowledge of $x_{ex}$?

- One approach is the Morozov discrepancy principle.


- Suppose $b = b_{ex} + \delta b$. We do not know the perturbation $\delta b$, but we assume that we know its size $\|\delta b\|$.

- Suppose the unknown desired solution $x_{ex}$ satisfies $Ax_{ex} = b_{ex}$.

- Hence, $\|Ax_{ex} - b\| = \|Ax_{ex} - b_{ex} - \delta b\| = \|\delta b\|$.

- Since the exact solution satisfies $\|Ax_{ex} - b\| = \|\delta b\|$, we want to find a regularization parameter $\lambda \ge 0$ such that the solution $x_\lambda$ of the regularized least squares problem satisfies

$$\|Ax_\lambda - b\| = \|\delta b\|.$$

This is Morozov’s discrepancy principle.


Let's see how this works for the previous MATLAB example.

[Figures: the residual $\|Ax_\lambda - b\|_2$ and $\|\delta b\|_2$ for different values of $\lambda$ (loglog scale), and the error $\|x_{ex} - x_\lambda\|_2$ for different values of $\lambda$ (loglog scale).]



- Morozov's discrepancy principle: find $\lambda \ge 0$ such that

$$\|Ax_\lambda - b\| = \|\delta b\|.$$

- To compute $\|Ax_\lambda - b\|$ for a given $\lambda \ge 0$ we need to solve the regularized linear least squares problem
$$\min_{x} \frac{1}{2}\|Ax - b\|_2^2 + \frac{\lambda}{2}\|x\|_2^2 = \min_{x} \frac{1}{2}\left\|\begin{pmatrix} A \\ \sqrt{\lambda}\, I \end{pmatrix} x - \begin{pmatrix} b \\ 0 \end{pmatrix}\right\|_2^2$$

to get $x_\lambda$, and then we have to compute $\|Ax_\lambda - b\|$.

- Let $f(\lambda) = \|Ax_\lambda - b\| - \|\delta b\|$. Finding $\lambda \ge 0$ such that
$$f(\lambda) = 0$$
is a root finding problem. We will discuss in the future how to solve such problems. In this case $f$ maps a scalar $\lambda$ to a scalar $f(\lambda) = \|Ax_\lambda - b\| - \|\delta b\|$, but each evaluation of $f$ requires the solution of a regularized LLS problem and can be rather expensive.
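One way to carry this out is to hand f to a generic scalar root finder such as MATLAB's fzero. The following sketch is illustrative only (not from the lecture); it reuses the structure of the earlier example, and the bracket [1e-12, 1] is an assumption that f changes sign there.

% Illustrative sketch: Morozov's discrepancy principle via fzero
t = 10.^(0:-1:-10)';
A = [ones(size(t)) t t.^2 t.^3 t.^4 t.^5];
bex = A*ones(6,1);
deltab = 0.001*(0.5-rand(size(bex))).*bex;
b = bex + deltab;

% residual norm of the regularized solution, via the stacked formulation (1)
rnorm = @(lambda) norm(A*([A; sqrt(lambda)*eye(6)]\[b; zeros(6,1)]) - b);
f = @(lambda) rnorm(lambda) - norm(deltab);

% each evaluation of f solves one regularized LLS problem
lambda_star = fzero(f, [1e-12 1])   % assumes f changes sign on [1e-12, 1]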
