NORM P ESTIMATION by Hans Nyquist AKADEMI
Total Page:16
File Type:pdf, Size:1020Kb
Institute of Statistics University of Umeå S-901 87 Umeå Sweden RECENT STUDIES ON L -NORM P ESTIMATION by Hans Nyquist AKADEMISK AVHANDLING som med tillstånd av rektorsämbetet vid Umeå universitet för erhållande av filosofie doktorsexamen framlägges till offentlig granskning i Hörsal D, Samhällsvetarhuset, fredagen den 12 december 1980 kl 10.15. TABLE OF CONTENTS ABSTRACT ACKNOWLEDGEMENTS CHAPTER I. OVERVIEW OF THE PROBLEM 1 CHAPTER II. A BACKGROUND TO L -NORM ESTIMATION 5 P 2.1 An introduction to the art of combining observations 5 2.2 Linear models 15 2.3 Stable distributions and their domains of attraction 20 2.4 Examples of models with fat-tailed or contamined distributed random variables 26 2.5 Algorithms for computations of L^- and L^-norm estimates 35 2.6 Some Monte Carlo studies 39 2.7 Conclusions 52 CHAPTER III. SOME BASIC PROPERTIES OF L -NORM ESTIMATORS 55 P 3.1 Definition of the L -norm estimators 55 P 3.2 The existence of L -norm estimates 59 P 3.3 Characterizations of L -norm estimates 60 P 3.4 Uniqueness of L^-norm estimates, 1 < p < 00 66 3.5 Uniqueness of L^-norm estimates 67 3.6 Geometric interpretations 86 3.7 A general notion of orthogonality 89 3.8 Relations to M- and L-estimators 91 3.9 Equivalence of L^-norm and maximum likelihood estimators 93 3.10 Computation of L^-norm estimates, 1 < p < «>, p * 2 95 3.11 Conclusions 97 CHAPTER IV. SOME SAMPLING PROPERTIES OF L -NORM ESTIMATORS 100 P 4.1 Introduction 100 4.2 The sampling distribution of L^-norm estimators 102 4.3 Stochastic regressors 109 4.4 Heteroscedasticity 111 4.5 Linearly dependent residuals 114 4.6 The optimal L -norm estimator 123 P 4.7 Estimation of the variance 133 4.8 Geometrical interpretations 135 4.9 Conclusions 138 CHAPTER V. ON THE USE OF L -NORM ESTIMATORS IN STATISTICAL ANALYSIS 141 P 5.1 An introduction to the use of L -norm estimators 141 P 5.2 Multicollinearity 143 5.3 Linearly dependent residuals 147 5.4 Estimation of interdependent systems 151 5.5 Conclusions 157 CHAPTER VI. SUMMARY 159 RECENT STUDIES ON L -NORM P ESTIMATION by Hans Nyquist Institute of Statistics University of Umeå S-901 87 UMEÅ Sweden ABSTRACT When estimating the parameters in a linear regression model, the method of least squares (L^-norm estimator) is often used. When thè residuals are independent and identically normally distributed, the least squares estimator is BLUE as well as equivalent to the maximum likelihood estimator. However, the least squares estimator is known to be sensitive to departures from the assumption of normally distri buted residuals. In a variety of applications there are theoretical as well as empirical evidences that the residuals display distributional properties different from those of normal distributions. It is there fore desirable to develop alternative estimators. Members of the class of Lp-norm estimators have here been proposed as alternatives to the least squares estimator. In this monograph, questions concerning the existence, uniqueness and asymptotic distributions of L^-norm estimators are discussed. It is seen that an L^-norm estimate will always exist and it will be u- nique for 1 < p < ». For p = 1 a necessary and sufficient condition on the regressors for unique estimation is given. The question of u- niqueness in large samples is also discussed. The asymptotic distri bution of Lp-norm estimators i shown to be normal, for sufficiently small p. When selecting an L^-norm estimator, a procedure based on the asymptotic variance is proposed. Finally, the possibilities to construct L -norm based estimators P for estimating multicollinear regression models and models with se rially dependent residuals are discussed. L -norm based methods for P estimating interdependent systems are also considered. Key words and phrases; L^-norm estimation, robustness, linear regression models, fat-tailed distributed residuals. ACKNOWLEDGEMENTS Most of this study was carried out at the Institute of Statistics, University of Umeå. I wish to express my sincere gratitude to Professor Uno Zackrisson, Head of the Institute, and Docent Anders Westlund for their guidance in preparation of this thesis. Their constructive criticism, ideas and advice added new, important dimensions to my research. I was fortunate to find colleagues willing to discuss parts of this study with me. In this respect, I am especially grateful to Professor Lester D. Taylor, Department of Economics, University of Arizona, Docent Lennart Bondesson, Institute of Mathematical Statistics, University of Umeå and FK Göran Arnoldsson, Institute of Statistics, University of Umeå. A special thanks to Berith Melander, an excellent typist, who unself ishly gave her time in order to help me meet deadlines; an activity in which Maj-Britt Backlund and Åsa Ågren also participated. It is gratefully acknowledged that this work was supported by grants from the Swedish Research Council for the Humanities and the Social Sciences. Umeå November 1980 Hans Nyquist 1. CHAPTER I. OVERVIEW OF THE PROBLEM The use of stochastic models has a long history. For example, it lö known that scientists during the 18th century made use of stochastic models for discribing the motions of celestial bodies. Stochastic models usual ly comprise a set of unknown parameters to be estimated. Therefore the problem of estimating parameters in stochastic models aldo has a long history. The first estimating techniques were based on different types of averaging empirical observations or functions of empirical observa tions. A completely new methodology in estimating unknown parameters in sto chastic models was introduced by Boscovich in 1757. He proposed that the parameters should be estimated according to the minimum of a function of the measurment errors. The function Boscovich proposed was the sum of the absolute measurement errors. This method is still employed under the name of MAD (minimum absolute deviations), LAE (least absolute error), least first power or L^-norm estimator. However, that estimating method was computationally very complicated. Only relatively simple models could be estimated, and it took a long time to carry through all computations. In the works of Legendr e from 1805 and Gauss from 1809, another func tion to be minimized was proposed, viz. the sum of the 'squares of the measurement errors. By that, the computationally simple method of least squares (or L^-norm estimator) was born. After the proposals of Legendre and Gauss, the method of least squares has been one of the most popular estimating techniques. The popularity of the least squares estimator is presumably due to the easy computation and the fact that when the resi duals are independent and identically normally distributed, the least squa res estimator of a linear regression model is BLUE (best linear unbia sed estimator) as well as equivalent to the maximum likelihood estimator, implying the inference to be easily performed. 2. However, the least squares estimates are known to be sensitive to departures from the assumption of normally distributed residuals. In a variety of applications there are theoretical as well as empirical evidences that the residuals display distributional characteristics dif ferent from those of normal distributions. It is therefore desirable to develop alternative estimators. Since there is an infinite number of ways to construct estimators, we have to restrict our attention to some class of estimators. Boscovich introduced the idea of, in some sense, minimizing the residual vector. The same idea is fundamental in Legendre's and Gauss1 method of least squares. However there is no unique criterion for minimizing a vector. Therefore we have to minimize some functional of the residual vector. To add more mathematical structure to the estimator, we restrict the functional to be a vector norm. Thus, given an appropriate vector norm I I.I I, the estimates are chosen as to minimize ||e| | , where e is the observed residual vector. While the class of all vector norms is limitless, an interesting sub class is given by the so-called L -norms. The L -norm of the vector of P P observed residuals is defined as { E for 1 < p < « IMI - i=1/ max Iei| for p=« An estimator minimizing a L -norm of the residual vector, will be called P an Lp-norm estimator. Thus, Boscovich's proposal corresponds to p=1 and the least squares estimator to p=2 (therefore we frequently use the na mes L^- and I^-norm estimators, respectively). With the computer routines that are available ^today, there should not be any serious problems in computing L^-norm estimates. The interest in the L^-norm estimator has its origin in the results 3. of Laplace (1818) and Edgeworth (1887a and b). There it is shown that the L^-norm estimator is preferable to the least squares, when estima ting a simple linear regression model with fat-tailed or contamined distributed residuals. Results from several Monte Carlo studies, per formed more recently, indicate that the same conclusions are to be drawn, even for more complicated models. The interest in the class of L -norm estimators is that the least P squares as well as the L^-norm estimators are members of that class. Furthermore, when estimating a linear regression model and the residual distribution belongs to the class of double exponential distributions, (which is a family of the exponential class of distributions), there is an Lp-norm estimator that is equivalent jto the maximum likelihood esti mator. For other distributions, there are Monte Carlo simulations that suggest that there is an L^-norm estimator preferable both to the L^- and to the L^-norm estimator. The purpose of the present monograph is the investigation of L^-norm estimators of linear regression models.