MATH 3795 Lecture 10. Regularized Linear Least Squares


Dmitriy Leykekhman, Fall 2008

Goals
- Understanding regularization.

Review
- Consider the linear least squares problem

    \min_{x \in \mathbb{R}^n} \|Ax - b\|_2^2.

- From the last lecture: let A = U \Sigma V^T be the singular value decomposition of A \in \mathbb{R}^{m \times n}, with singular values

    \sigma_1 \ge \cdots \ge \sigma_r > \sigma_{r+1} = \cdots = \sigma_{\min\{m,n\}} = 0.

- The minimum norm solution is

    x^\dagger = \sum_{i=1}^{r} \frac{u_i^T b}{\sigma_i} v_i.

- If even one singular value \sigma_i is small, then small perturbations in b can lead to large errors in the solution.

Regularized Linear Least Squares Problems
- If \sigma_1/\sigma_r \gg 1, it can be useful to consider the regularized linear least squares problem (Tikhonov regularization)

    \min_{x \in \mathbb{R}^n} \frac{1}{2}\|Ax - b\|_2^2 + \frac{\lambda}{2}\|x\|_2^2.

  Here \lambda > 0 is the regularization parameter.
- The regularization parameter \lambda > 0 is not known a priori and has to be determined from the problem data; see below.
- Observe that

    \min_x \frac{1}{2}\|Ax - b\|_2^2 + \frac{\lambda}{2}\|x\|_2^2
      = \min_x \frac{1}{2}\left\| \begin{pmatrix} A \\ \sqrt{\lambda}\, I \end{pmatrix} x - \begin{pmatrix} b \\ 0 \end{pmatrix} \right\|_2^2.    (1)

- For \lambda > 0 the stacked matrix \begin{pmatrix} A \\ \sqrt{\lambda}\, I \end{pmatrix} \in \mathbb{R}^{(m+n) \times n} always has full rank n. Hence, for \lambda > 0, the regularized linear least squares problem (1) has a unique solution.
- The normal equations corresponding to (1) are

    \begin{pmatrix} A \\ \sqrt{\lambda}\, I \end{pmatrix}^T \begin{pmatrix} A \\ \sqrt{\lambda}\, I \end{pmatrix} x
      = \begin{pmatrix} A \\ \sqrt{\lambda}\, I \end{pmatrix}^T \begin{pmatrix} b \\ 0 \end{pmatrix},
    \quad \text{i.e.,} \quad (A^T A + \lambda I)\, x = A^T b.

Solution via the SVD
- Write A = U \Sigma V^T, where U \in \mathbb{R}^{m \times m} and V \in \mathbb{R}^{n \times n} are orthogonal and \Sigma \in \mathbb{R}^{m \times n} is a "diagonal" matrix with diagonal entries \sigma_1 \ge \cdots \ge \sigma_r > \sigma_{r+1} = \cdots = \sigma_{\min\{m,n\}} = 0.
- Then the normal equations for (1),

    (A^T A + \lambda I)\, x_\lambda = A^T b,

  can be written as

    (V \Sigma^T U^T U \Sigma V^T + \lambda V V^T)\, x_\lambda = V \Sigma^T U^T b,

  using U^T U = I and I = V V^T.
- Rearranging terms we obtain

    V (\Sigma^T \Sigma + \lambda I) V^T x_\lambda = V \Sigma^T U^T b;

  multiplying both sides by V^T from the left and setting z = V^T x_\lambda gives the diagonal system

    (\Sigma^T \Sigma + \lambda I)\, z = \Sigma^T U^T b.

- Its solution is

    z_i = \frac{\sigma_i (u_i^T b)}{\sigma_i^2 + \lambda}, \quad i = 1, \dots, r, \qquad z_i = 0, \quad i = r+1, \dots, n.

- Since x_\lambda = V z = \sum_{i=1}^{n} z_i v_i, the solution of the regularized linear least squares problem (1) is

    x_\lambda = \sum_{i=1}^{r} \frac{\sigma_i (u_i^T b)}{\sigma_i^2 + \lambda} v_i.
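As an illustration of this closed-form solution, here is a minimal MATLAB sketch (added here, not part of the original slides); the function name tikhonov_svd and the use of the economy-size SVD are choices made for the example:

function xlambda = tikhonov_svd(A, b, lambda)
% Tikhonov-regularized least squares solution computed from the SVD of A:
% xlambda = sum_i sigma_i*(u_i'*b)/(sigma_i^2 + lambda) * v_i
[U, S, V] = svd(A, 'econ');         % economy-size SVD, A = U*S*V'
sigma = diag(S);                    % singular values sigma_1 >= ... >= sigma_n
f = sigma ./ (sigma.^2 + lambda);   % filter factors; f_i = 0 where sigma_i = 0 (for lambda > 0)
xlambda = V * (f .* (U'*b));        % assemble the solution in the basis v_1, ..., v_n
end

Note that for sigma_i = 0 the factor f_i vanishes, so the zero singular values drop out of the sum exactly as in the formula above.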
The Regularizing Property
- Note that

    \lim_{\lambda \to 0} x_\lambda
      = \lim_{\lambda \to 0} \sum_{i=1}^{r} \frac{\sigma_i (u_i^T b)}{\sigma_i^2 + \lambda} v_i
      = \sum_{i=1}^{r} \frac{u_i^T b}{\sigma_i} v_i = x^\dagger,

  i.e., the solution of the regularized LLS problem (1) converges to the minimum norm solution of the LLS problem as \lambda goes to zero.
- The representation

    x_\lambda = \sum_{i=1}^{r} \frac{\sigma_i (u_i^T b)}{\sigma_i^2 + \lambda} v_i

  of the solution of the regularized LLS also reveals the regularizing property of adding the term \frac{\lambda}{2}\|x\|_2^2 to the (ordinary) least squares functional. We have

    \frac{\sigma_i (u_i^T b)}{\sigma_i^2 + \lambda} \approx 0 \quad \text{if } 0 \approx \sigma_i \ll \lambda,
    \qquad
    \frac{\sigma_i (u_i^T b)}{\sigma_i^2 + \lambda} \approx \frac{u_i^T b}{\sigma_i} \quad \text{if } \sigma_i \gg \lambda.

- Hence, adding \frac{\lambda}{2}\|x\|_2^2 to the (ordinary) least squares functional acts as a filter: contributions from singular values that are large relative to the regularization parameter \lambda are left (almost) unchanged, whereas contributions from small singular values are (almost) eliminated.
- This raises an important question: how do we choose \lambda?

Choosing the Regularization Parameter
- Suppose that the data are b = b_{ex} + \delta b. We want to compute the minimum norm solution of the (ordinary) LLS with the unperturbed data b_{ex},

    x_{ex} = \sum_{i=1}^{r} \frac{u_i^T b_{ex}}{\sigma_i} v_i,

  but we can only compute with b = b_{ex} + \delta b; we do not know b_{ex}.
- The solution of the regularized least squares problem is

    x_\lambda = \sum_{i=1}^{r} \left( \frac{\sigma_i (u_i^T b_{ex})}{\sigma_i^2 + \lambda} + \frac{\sigma_i (u_i^T \delta b)}{\sigma_i^2 + \lambda} \right) v_i.

- We observed that

    \sum_{i=1}^{r} \frac{\sigma_i (u_i^T b_{ex})}{\sigma_i^2 + \lambda} v_i \to x_{ex} \quad \text{as } \lambda \to 0.

- On the other hand,

    \frac{\sigma_i (u_i^T \delta b)}{\sigma_i^2 + \lambda} \approx 0 \quad \text{if } 0 \approx \sigma_i \ll \lambda,
    \qquad
    \frac{\sigma_i (u_i^T \delta b)}{\sigma_i^2 + \lambda} \approx \frac{u_i^T \delta b}{\sigma_i} \quad \text{if } \sigma_i \gg \lambda,

  which suggests choosing \lambda sufficiently large, so that errors \delta b in the data are not magnified by small singular values.

A MATLAB Example
% Compute A
t = 10.^(0:-1:-10)';
A = [ones(size(t)) t t.^2 t.^3 t.^4 t.^5];
% compute exact data
xex = ones(6,1);
bex = A*xex;
% data perturbation of 0.1%
deltab = 0.001*(0.5-rand(size(bex))).*bex;
b = bex+deltab;
% compute the economy-size SVD of A (so that U'*b has one entry per singular value)
[U,S,V] = svd(A,0);
sigma = diag(S);
for i = 0:7
    % solve the regularized LLS for different lambda
    lambda(i+1) = 10^(-i);
    xlambda = V * (sigma.*(U'*b) ./ (sigma.^2 + lambda(i+1)));
    err(i+1) = norm(xlambda - xex);
end
loglog(lambda,err,'*');
ylabel('||x^{ex} - x_{\lambda}||_2');
xlabel('\lambda');

- The resulting plot shows the error \|x_{ex} - x_\lambda\|_2 for different values of \lambda (loglog scale; figure not reproduced here).
- For this example \lambda \approx 10^{-3} is a good choice for the regularization parameter \lambda.
- However, we could only create this figure with knowledge of the desired solution x_{ex}.
- How can we determine a \lambda \ge 0 such that \|x_{ex} - x_\lambda\|_2 is small without knowledge of x_{ex}?
- One approach is Morozov's discrepancy principle.

Morozov's Discrepancy Principle
- Suppose b = b_{ex} + \delta b. We do not know the perturbation \delta b, but we assume that we know its size \|\delta b\|.
- Suppose the unknown desired solution x_{ex} satisfies A x_{ex} = b_{ex}.
- Hence

    \|A x_{ex} - b\| = \|A x_{ex} - b_{ex} - \delta b\| = \|\delta b\|.

- Since the exact solution satisfies \|A x_{ex} - b\| = \|\delta b\|, we want to find a regularization parameter \lambda \ge 0 such that the solution x_\lambda of the regularized least squares problem satisfies

    \|A x_\lambda - b\| = \|\delta b\|.

  This is Morozov's discrepancy principle.
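To turn the discrepancy principle into a computation, one can use the fact that the residual norm \|A x_\lambda - b\|_2 increases monotonically with \lambda, so a value with \|A x_\lambda - b\|_2 = \|\delta b\|_2 can be bracketed and found by bisection. The following is a minimal sketch added here (not from the slides); the function name, the search interval for \lambda, and the fixed number of bisection steps are arbitrary choices, and it assumes \|\delta b\| lies between the residual norms at the two ends of the interval:

function [lambda, xlambda] = morozov_lambda(A, b, delta)
% Find lambda >= 0 with ||A*xlambda - b||_2 approximately equal to delta = ||deltab||_2.
[U, S, V] = svd(A, 'econ');
sigma = diag(S);
Ub = U'*b;
xlam = @(lam) V * ((sigma.*Ub) ./ (sigma.^2 + lam));  % regularized solution for a given lambda
res  = @(lam) norm(A*xlam(lam) - b);                  % residual norm, increasing in lambda
lo = -12; hi = 4;            % bisect on the exponent: 1e-12 <= lambda <= 1e4 (assumed bracket)
for k = 1:60
    mid = (lo + hi)/2;
    if res(10^mid) < delta
        lo = mid;            % residual still below delta -> lambda must increase
    else
        hi = mid;
    end
end
lambda = 10^((lo + hi)/2);
xlambda = xlam(lambda);
end

For the polynomial example above one would call, e.g., [lambda, xlambda] = morozov_lambda(A, b, norm(deltab)), since the size of the perturbation is assumed known.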