
MINIMUM SQUARED ERROR ESTIMATION

by

Victor Charles Drastik

Submitted for the degree of Doctor of Philosophy, School of Mathematics, University of New South Wales.

August, 1983

Abstract

Let X_1, X_2, ..., X_n be independent and identically distributed random variables, each with distribution depending on one or more parameters. The random vector X usually depends on a parameter θ in a nonlinear fashion. We define a linearising transformation for θ to be a function g(X) such that the dependence of g(X) on θ is linear, in the sense that E g(X) is proportional to θ.

If we can find m such transformations g_1, g_2, ..., g_m, we may construct an estimator of θ as a compound linearising transformation

θ̂ = Σ_{i=1}^m c_i g_i(X).

The optimal estimator of this form may then be found by minimising the mean squared error of θ̂ with respect to the constants c_1, c_2, ..., c_m. We will call the combination of these two stages the Method of Minimum Mean Squared Error Estimation.

In this thesis, we discuss some approaches to finding linearising transformations. The resulting techniques are applied to several parametric problems, including a computer simulation study of estimation of scale parameters, in which a modification to the MLE is revealed to be almost optimal. Finally, we discuss distribution-free estimation of the centre of a symmetric distribution. In a computer simulation study, the estimators resulting from the MMSEE approach are found to be superior to the best estimators from the Princeton Robustness Study.

Acknowledgement

I would like to express my gratitude to Dr. Peter Cooke, my supervisor and friend, whose guidance and encouragement have been of immeasurable value to me in the course of my research.

Contents Page

Title Page 1

Abstract 2

Acknowledgement 3

Table of Contents 4

Notation 5

Chapter I     Introduction     8

Chapter II    Theoretical Development     14

Chapter III   Scale Parameter Simulation Results     39

Chapter IV    Distribution-free MMSEE     48

Appendix     74

Bibliography     76

Notation

exp(x), ln(x)   natural exponential and natural logarithm of x

Γ(n)   gamma function of n, defined as Γ(n) = ∫_0^∞ u^{n-1} e^{-u} du for n > 0

B(m,n)   beta function of m and n, defined as B(m,n) = ∫_0^1 u^{m-1} (1-u)^{n-1} du = Γ(m)Γ(n)/Γ(m+n) for m > 0, n > 0

n!   factorial n, defined as n! = Γ(n+1) for n ≥ 0. When n is integral, n! = n(n-1) ... 2.1

(n r)   binomial coefficient of n and r, defined as (n r) = n!/{r!(n-r)!} for 0 ≤ r ≤ n

[x]   integer part of x, defined as the largest integer less than or equal to x

h^{-1}(y)   the inverse function of h, defined as the value of x such that y = h(x)

x, A   vector x, matrix A

x^T, A^T   transpose of x, transpose of A

∫, Σ, Π   integration, summation, product operators

∂/∂x, d/dx   partial derivative, differentiation operators (f'(x) = d/dx f(x))

E, Var, Cov, MSE   expectation, variance, covariance, mean squared error operators

x_i   observed value of X_i

X_(1) ≤ X_(2) ≤ ... ≤ X_(n)   order statistics based on X_1, X_2, ..., X_n

x_(1) ≤ x_(2) ≤ ... ≤ x_(n)   ordered observations

X̄   sample mean, defined as X̄ = (1/n) Σ_{i=1}^n X_i

S²   sample variance, defined as S² = {1/(n-1)} Σ_{i=1}^n (X_i - X̄)²

F(x|θ)   cumulative distribution function of the random variable X, depending on the parameter vector θ

f(x|θ)   probability density function of the random variable X, depending on θ

N(μ,σ²)   Normal random variable with mean μ and variance σ²

U[a,b]   Uniform random variable over the interval [a,b]

α, β, γ   location, scale and shape parameters

μ, σ   mean, standard deviation

θ̂, θ̃   estimators of θ

c*   quantity c distinguished in some way (usually the optimal value of c)

n   sample size

LT   Linearising Transformation

MLE   Maximum Likelihood Estimation or Estimator or Estimate

MSE   Mean Squared Error

MMSEE   Minimum MSE Estimation or Estimator or Estimate

∀, ∈, ≈, ~   for all, belongs to, approximately equals, is distributed as

Character type   Use

Lower case Greek   scalars, parameters (Examples: ω, μ)

Lower case Roman   vectors, constants or coefficients, observations, functions (Examples: c, a, x, f)

Upper case Roman   matrices, random variables, functions (Examples: A^{-1}, X, F(x|θ))

Upper case Greek   spaces, operators (Examples: Θ, Σ)

Chapter I: Introduction

In the classical problem of estimation of a single parameter θ, the estimator is a function of independent and identically distributed random variables X_1, X_2, ..., X_n, each with distribution depending on θ (and possibly other parameters). We would like to find an estimator θ̂ which is "close" to θ. An ideal (or most concentrated) estimator is one which has its probability mass concentrated as closely as possible about the true parameter value θ. Unfortunately, although the property of being most concentrated is highly desirable, ideal estimators seldom exist. There are just too many possible estimators for any one of them to be most concentrated, so the concept of ideal estimation is not usable in practice. We need to find a criterion of closeness which also leads to estimators which are highly concentrated about θ.

The Mean Squared Error (MSE) of an estimator θ̂(X) of the parameter θ based on the sample X = (X_1, X_2, ..., X_n) is defined to be the average squared deviation of θ̂ from θ, and can be written as the variance of θ̂ plus the square of its bias; that is,

MSE(θ̂) = E{θ̂(X) - θ}² = Var(θ̂) + {E(θ̂ - θ)}².
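A tiny numerical sketch (my own addition, with an arbitrary Normal setting) of this decomposition, checking MSE = Var + bias² for a deliberately biased estimator of a Normal mean:

    import random

    random.seed(5)
    theta, n, reps = 2.0, 10, 100000
    estimates = []
    for _ in range(reps):
        xs = [random.gauss(theta, 1.0) for _ in range(n)]
        estimates.append(0.9 * sum(xs) / n)        # a biased estimator: 0.9 * sample mean

    mean_est = sum(estimates) / reps
    mse = sum((e - theta) ** 2 for e in estimates) / reps
    var = sum((e - mean_est) ** 2 for e in estimates) / reps
    bias = mean_est - theta
    print(mse, var + bias ** 2)    # the two numbers agree up to Monte Carlo error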

The principal advantage of the MSE criterion is that it severely penalises large deviations of θ̂ from θ; that is, the frequency of large errors is greatly reduced. Another advantage is that consistency in MSE implies asymptotic unbiasedness and consistency in probability. Asymptotically the estimator is concentrated arbitrarily closely to θ (but perhaps not optimally so).

Let Θ represent the set of all possible estimators θ̂ of θ.

A uniformly MSE-optimal estimator among all members of Θ cannot exist because Θ is too large. It includes estimators such as θ̂(X) = 0 for all X, which are extremely prejudiced in favour of a particular θ value. Therefore an estimator which has uniformly smallest MSE would necessarily have zero MSE for all θ. This is clearly impossible except for trivial cases. We must therefore restrict our attention to some suitable subset of Θ. The estimators in such a subset should possess some desirable property, usually by satisfying a constraint equation. The two most common constraints are

(1) unbiasedness, with constraint equation

E(θ̂) = θ   for all θ,

and (2) invariance, expressed as

location:  θ̂(X + c) = θ̂(X) + c ;   scale:  θ̂(cX) = c θ̂(X) ,   for all X and all c ∈ R¹.

The MSE-optimal estimators in the first subset are Uniformly Minimum Variance Unbiased Estimators and in the second are Pitman [16] estimators of location and scale.

In this thesis we consider estimators in a third subset of Θ, that defined by the constraint

θ̂(X) = Σ_{i=1}^m c_i g_i(X),

where {c_i} is a set of undetermined coefficients and {g_i} is a set of functions of the sample data. This constraint has the advantage that, unlike the other constraints, it immediately gives us a functional form for the estimator. Like the others it is not a strong restriction, and often the three methods develop formulae which differ only in constants. Another advantage is that once the functions {g_i} are chosen, the selection of the MSE-optimal member of the set is usually a matter of some elementary calculus, whereas the optimal estimators for the other methods are not at all obvious.

The Method of Minimum Mean Squared Error Estimation (MMSEE) is one of a class of methods of optimal algorithm selection generally known as "methods of undetermined coefficients" (see, for example, Gerald [8]). The principle behind these methods is that first we choose a class of formulae that are likely to be effective in solving a given problem, then we choose a particular member of the class by referring to some appropriate criterion of optimality. For example, a problem in numerical analysis is to find the area under a curve between two given points by approximate integration. The method of undetermined coefficients used by Gauss to solve this problem is as follows: first we select a class of formulae by considering the characteristics of the problem. We can write the value of the integral as

I = ∫_a^b f(x) dx = (b-a) f̄,

where f̄ is the average value of f over the domain [a,b]. One reasonable formula to approximate an average function value is a linear combination of several function values. Thus

Î = Σ_{i=1}^m w_i f(x_i)

is an approximation to I, where {x_i} is a set of points in [a,b] and {w_i} is a set of undetermined coefficients. Next, we select a criterion for deciding which Î is the best one for our purposes. There are several reasonable criteria available, but the one usually chosen is that Î should be exact for as many low-order polynomials as possible. Since we have 2m undetermined quantities, it must be possible to choose the {w_i} and {x_i} such that Î correctly evaluates the integrals of all polynomials of degree less than or equal to 2m-1.

Thus, in general, in order to select a class of formulae, we apply some prior knowledge about what sort of formula is likely to be suitable. To determine the values of arbitrary constants we apply a criterion which is an effective discriminant between good and bad members of the class.

MMSEE is not an entirely new idea: Kendall and Stuart [12] find the MMSEE of the Normal variance σ² based on the sample variance S², and Markowitz [15] finds the MMSEE of the Normal standard deviation σ based on the sample standard deviation S. Cox and Hinkley [5] discuss the idea of "estimation within a restricted class" and say "... in many contexts the criterion of unbiasedness is not particularly relevant; one reason is that an estimate of small bias and small variance will for most purposes be preferable to one with no bias and appreciable variance. This naturally raises the possibility of using mean squared error as the criterion to be minimised ... sometimes useful estimates can be obtained by minimising mean squared error within some natural family of estimates determined, for example, by invariance arguments."

Kendall and Stuart [13] conclude that having small MSE is a desirable property for an estimator and present the following example: given an estimator T which is unbiased for a parameter θ, we easily find that the multiple aT of T which estimates θ with smallest MSE is a*T, where

a* = θ² / (θ² + V)   and   V = Var(T).

Kendall and Stuart comment that "In general, V (and therefore a*) is a function of θ, so a*T is not a statistic usable for estimation." This is only true if no estimate for θ is available. However, Thompson [22] estimates a* by replacing all unknowns by estimates, and shows that a*T can sometimes have larger MSE than T itself for moderate values of V. Thompson lowers the MSE of the minimum variance unbiased linear estimator of θ by shrinking it toward some "natural" origin θ_0 in the parameter space. Unfortunately this does not result in a superior estimator, but merely buys decreased MSE near the "natural" origin at the cost of increased MSE elsewhere. Thompson concedes this point, but defends the method of shrinking toward a natural origin θ_0 because "... (1) we believe θ_0 is close to the true value of θ, or (2) we fear that θ_0 may be near the true value of θ; i.e. something bad happens if θ = θ_0 and we do not know about it."

This thesis is concerned with developing a two-stage method of estimation, which belongs to the class of optimal algorithm selection procedures known as methods of undetermined coefficients. In the first stage we choose, by some reasonable method, a subset of the set of all possible estimators, and in the second stage we select from this subset an estimator which has smallest MSE among all members of the subset. For this reason, we have called the method the Method of Minimum Mean Squared Error Estimation.

The inspiration for MMSEE came from the way in which we usually attack a new parametric estimation problem: we derive estimators using several common estimation methods such as the Methods of Moments, Maximum Likelihood, etc. We then compare the MSE's of these estimators and select the estimator which has smallest MSE overall. These estimation methods may have desirable features, but this does not guarantee that the resulting estimator will have small MSE. The idea behind the Method of MMSEE is that we should construct estimators which are explicitly designed to have small MSE.

Chapter II: Theoretical Development

The first stage of the Method of MMSEE is concerned with choosing an appropriate class of estimators within which to minimise MSE. An early solution to a problem of this type is given by Kendall and Stuart [12], where the MMSEE for the variance σ² of a Normal population is sought among the class of multiples aS² of the sample variance. The MSE-optimal value of a is found to be (n-1)/(n+1). Markowitz [15] solves the corresponding problem for the standard deviation σ of a Normal population in an analogous way, by searching in the class of multiples aS of the sample standard deviation. Details of the calculations are as follows: let σ̂ = aS be an estimator of σ. Then

MSE(σ̂) = E(aS - σ)²

and so

∂/∂a MSE(σ̂) = E{ 2(aS - σ) S }.

Setting the derivative to zero and solving for a, we obtain the optimal value

a* = σ E(S) / E(S²) = √2 Γ(n/2) / { √(n-1) Γ((n-1)/2) }.

In the first case, the data were not linearly related to the parameter of interest σ², since E X_(i) - μ = σ w_i, where X_(i) is the i-th order statistic. In the second case the observations were linearly related to σ, but a linear combination of order statistics was considered to be too difficult to use and a simpler transformation was found.
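A brief numerical check (my own sketch) of the optimal multiple aS for the Normal standard deviation: the closed form given above can be compared with σ E(S)/E(S²) estimated by Monte Carlo.

    import math, random

    def a_star_closed(n):
        # a* = sqrt(2) * Gamma(n/2) / (sqrt(n-1) * Gamma((n-1)/2))
        return math.sqrt(2.0) * math.gamma(n / 2.0) / (math.sqrt(n - 1.0) * math.gamma((n - 1.0) / 2.0))

    def a_star_mc(n, sigma=1.0, reps=200000):
        # Monte Carlo estimate of sigma * E(S) / E(S^2) for N(0, sigma^2) samples.
        random.seed(0)
        es = es2 = 0.0
        for _ in range(reps):
            xs = [random.gauss(0.0, sigma) for _ in range(n)]
            xbar = sum(xs) / n
            s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)
            es += math.sqrt(s2)
            es2 += s2
        return sigma * (es / reps) / (es2 / reps)

    print(a_star_closed(10), a_star_mc(10))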

In most problems, the data X depend on the parameters in a nonlinear fashion. A linearising transformation (LT) for θ_r is a function g_r(X) such that the dependence of g_r(X) on θ_r is linear in the sense that

E g_r(X) = k_r θ_r,

where the constant k_r is nonzero and does not depend on θ_r. An unbiased estimator of θ_r is a LT with k_r identically 1.

The relationship between the data X and the parameters θ can be examined using the Probability Integral Transform (PIT): if X is a random variable with continuous distribution function F(x|θ), then U = F(X|θ) has the standard uniform distribution and we write U ~ U[0,1]. This provides us with a convenient way of generating variates from any given distribution. If U_i ~ U[0,1], then X_i = F^{-1}(U_i|θ) is a random variable with distribution function F(x|θ); that is, each X_i can be related through θ and F to a hypothetical U_i.

Initially we will assume that we are working with a one-parameter problem; that is, all but one element of the parameter vector θ are known and we wish to develop an estimator for that element.

For the usual location and scale model, the distribution function and density function can be written

F(x|μ, σ) = F( (x-μ)/σ ) ;   f(x|μ, σ) = (1/σ) f( (x-μ)/σ ).

From the PIT we can write the generating equation of X_i as

F( (X_i - μ)/σ ) = U_i,

where U_i ~ U[0,1], and so

X_i = μ + σ F^{-1}(U_i).   (1)

Given that μ is known, it is easy to show that the Maximum Likelihood Estimator (MLE) σ̂ of σ is the solution of

σ̂ = -(1/n) Σ_{i=1}^n { f'((X_i - μ)/σ̂) / f((X_i - μ)/σ̂) } (X_i - μ).   (2)

Now, from (1) we have

(X_i - μ)/σ̂ = (σ/σ̂) F^{-1}(U_i),

and so, letting

r = σ/σ̂   and   W_i = F^{-1}(U_i),   (3)

we can rewrite (2) as

-(1/n) Σ_{i=1}^n { f'(r W_i)/f(r W_i) } r W_i - 1 = 0.

This equation may be solved for r, which is an implicit function of n and U = (U_1, U_2, ..., U_n), say

1/r = h(U, n).

Then, from (3) we have

σ̂ = (1/r) σ = σ h(U, n).

Thus the MLE of σ separates into the product of the unknown parameter σ and a function of n standard uniform variates.

A similar separation is observed with the Bayes estimator of σ under a uniform prior,

σ̃ = { ∫_0^∞ σ L(μ,σ) dσ } / { ∫_0^∞ L(μ,σ) dσ },   (4)

where L(μ,σ) is the likelihood function. From (1), we have

(X_i - μ)/z = s W_i ,   where s = σ/z.

Changing the dummy variable in (4) from σ to z and transforming to s, we may write σ̃ in the form

σ̃ = σ { ∫_0^∞ s^{n-3} Π_{i=1}^n f(s W_i) ds } / { ∫_0^∞ s^{n-2} Π_{i=1}^n f(s W_i) ds } = σ h(U, n).

The separation is not always into multiplicative parts: the Bayes estimator for the location parameter α in the model F(x|α) = F(x-α) can be resolved into the form

α̃ = α + k(U, n).

Suppose we now consider the general location, scale and shape parameter model

F(x|α, β, γ) = F( {(x-α)/β}^{1/γ} ).

It is easy to show, using the above ideas, that it is possible to write the estimators of α, β and γ derived by the methods of Maximum Likelihood, Bayes, Pitman, Minimum Distance (for the distance function d(θ) = Σ_{i=1}^n {F_n(X_(i)) - F(X_(i)|θ)}², where F_n is the empirical distribution function), and indeed any estimation method which involves using the parameters and data together only in the form {(X_i - α)/β}^{1/γ}, as a LT

g(X) = θ h(U, n) + k(U, n),   (5)

where g(X) is α̂, β̂ or γ̂; θ is α, β or γ; neither h nor k depend on θ; E k(U, n) = 0 and E h(U, n) ≠ 0. The preceding examples suggest that usually either k = 0 or h = 1.

As a further example, consider the usual linear model

y = X β + ε,

where E(ε) = 0 and X is the n × p design matrix. The Least Squares estimator for β is

β̂* = (X^T X)^{-1} X^T y
    = (X^T X)^{-1} X^T (X β + ε)
    = β + (X^T X)^{-1} X^T ε.

The Least Squares estimator is not the only LT for β. Consider the transformation

β̂ = A y.

Now,

β̂_r = Σ_{j=1}^n a_rj y_j
     = Σ_{j=1}^n a_rj Σ_{i=1}^p x_ji β_i + Σ_{j=1}^n a_rj ε_j
     = β_r Σ_{j=1}^n a_rj x_jr + Σ_{i≠r} β_i Σ_{j=1}^n a_rj x_ji + Σ_{j=1}^n a_rj ε_j.

Hence, for β̂ to be a LT for β, we must have the following p² conditions on the matrix A:

Σ_{j=1}^n a_rj x_jr = 1 ;   r = 1,2,...,p,

Σ_{j=1}^n a_rj x_ji = 0 ;   i ≠ r ;   i, r = 1,2,...,p.

Linearising transformations may be thought of as crude estimators, the building blocks from which one can construct an optimal estimator. No one method of finding LT's always works well and it may be necessary to try more than one of the ideas below.

(A) Intuition : this method involves making an educated guess based on our experience with similar problems; for example, it is reasonable to try a linear combination of two or more order statistics to estimate a location parameter.

(B) Generalisation of other estimators: in most cases estimators for a parameter would already be known; for example, moment estimators, maximum likelihood estimators, minimum distance estimators, minimum chi-square estimators, etc. We consider them singly or together to see whether they suggest a functional form which we can use as a LT, or we can try to extend the estimators in some logical way to allow a more flexible transformation.

Consider the problem of estimation in the Uniform distribution U[0,θ]. The moment estimator for θ is twice the sample mean and the MLE is the largest order statistic. These are both linear combinations of the order statistics and so a logical generalisation of both is

θ̂ = Σ_{j=1}^n a_j X_(j),

particularly since X_(j) is itself a LT, as E X_(j) = {j/(n+1)} θ.

Restricting our attention to the class of linear combinations of order statistics, we can go to the second stage in the Method of MMSEE and find the element in that class with smallest MSE. In this case the problem involves only elementary calculus, as follows: differentiating MSE(θ̂) with respect to a_i, we have

∂ MSE(θ̂)/∂a_i = E{ 2 ( Σ_{j=1}^n a_j X_(j) - θ ) X_(i) }.

Equating the partial derivative to zero for i = 1,2,...,n, we obtain a system of equations to be solved for the optimal coefficients:

Σ_{j=1}^n a_j E{X_(i) X_(j)} = θ E X_(i) ;   i = 1,2,...,n.

If X ~ U[0,θ], then

E X_(i) = {i/(n+1)} θ   and   E X_(i) X_(j) = {i(j+1)/((n+1)(n+2))} θ² ,   i ≤ j.

Thus we obtain the solution

a_1 = a_2 = ... = a_{n-1} = 0 ,   a_n = (n+2)/(n+1).

Hence the MSE-optimal estimator in the class of linear combinations of order statistics is

θ̂* = {(n+2)/(n+1)} X_(n).
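A quick Monte Carlo sketch (my own) comparing this estimator with the two crude LT's it was built from, twice the sample mean and the largest order statistic:

    import random

    def mse(estimator, theta=1.0, n=10, reps=50000, seed=4):
        # Monte Carlo MSE of an estimator of theta for U[0, theta] samples of size n.
        random.seed(seed)
        total = 0.0
        for _ in range(reps):
            xs = [theta * random.random() for _ in range(n)]
            total += (estimator(xs) - theta) ** 2
        return total / reps

    n = 10
    print(mse(lambda xs: 2 * sum(xs) / len(xs)))          # moment estimator
    print(mse(lambda xs: max(xs)))                        # MLE
    print(mse(lambda xs: (n + 2) / (n + 1) * max(xs)))    # MMSEE, smallest MSE of the three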

(C) Inverse PIT: this method arises from the PIT, which we can use to examine the way in which a random variable is generated and how the generating process depends on the parameter of interest. We have already shown that any continuous random variable X (and hence the i-th order statistic X_(i)) may be considered to have been generated from a standard uniform variate by the generating equation

X_(i) = h(U_(i) | θ),

where θ is the parameter of interest, U_(i) is the i-th order statistic from a sample of size n from U[0,1] and all other parameters are assumed to be known. Interchanging argument and parameter, we can write

X_(i) = h(θ | U_(i)),

and finally

θ = h^{-1}( X_(i) | U_(i) ).   (6)

Replacing U_(i) by its expected value i/(n+1), we obtain the estimating equation

θ̂ = h^{-1}( X_(i) | i/(n+1) ).

This estimator is equivalent to that obtained by solving F(X_(i)|θ) = i/(n+1) for θ. Essentially, we are determining how the relationship between X and U depends on θ and using this information to construct an estimator. This idea is best illustrated by a familiar example: if X is a 3-parameter Weibull variate with distribution function

F(x|α, β, γ) = 1 - exp{ -((x-α)/β)^{1/γ} } ;   α < x < ∞,

then

X = α + β ( ln{(1-U)^{-1}} )^γ.   (7)

This generating equation shows us how the variability of the Weibull variate X is related to the variability of a standard uniform variate U through the parameters α, β and γ.

The Inverse PIT method is a one-parameter method in that we assume all parameters except one are known. If we are dealing with a model containing more than one unknown parameter, we must successively make each parameter the parameter of interest.

Suppose we assume first that β and γ are known and α is unknown. Making α the subject of the generating equation (7), we have

α = X - β ( ln{(1-U)^{-1}} )^γ.

This clearly implies that X - k is a LT for α. We can generalise this idea and propose several possible forms for an estimator of α:

(a)  α̂ = c_1 (X_(1) - d),

(b)  α̂ = Σ_{i=1}^n c_i (X_(i) - d),

(c)  α̂ = Σ_{i=1}^n c_i (X_(i) - d_i).

The MMSEE in the class of location-invariant estimators of α is Pitman's estimator, which is a special case of each of the above forms.

Now suppose we assume that α and γ are known and β is unknown. Rewriting the generating equation to make β the subject, we obtain

β = (X - α) / ( ln{(1-U)^{-1}} )^γ.

It is clear that X - α is a LT for β, so we can generalise to the following estimators:

(d)  β̂ = c_i (X_(i) - α),

(e)  β̂ = Σ_{i=1}^n c_i (X_(i) - α).

Determination of the optimal values of the arbitrary coefficients {c_i} is explored numerically by extensive Monte Carlo simulation in Chapter III.

Finally, assume that α and β are known and γ is unknown. We can rewrite the generating equation to make γ the subject:

γ = ln{(X-α)/β} / ln[ ln{(1-U)^{-1}} ],   (8)

which suggests the following as estimators:

(f)  γ̂ = c_i ln{ (X_(i) - α)/β },

(g)  γ̂ = Σ_{i=1}^n c_i ln{ (X_(i) - α)/β }.

Equation (8) clearly shows that ln{ (X_(i) - α)/β } is a LT for γ.

Once we have several LT's, we can combine them to form a compound LT, to which we can apply our optimality criterion (minimum MSE) to derive an optimal estimator. An obvious and most mathematically tractable way to combine them is a linear combination, say

θ̂ = Σ_{i=1}^m c_i g_i(X).   (9)

Thus, we are constraining our estimator to be in the class of linear combinations of given LT's. This is not as severe a restriction as may be supposed. Since each g_i is a LT, we can write

E g_i(X) = k_i θ   (k_i ≠ 0).

If g_i*(X) = g_i(X)/k_i, then g_i*(X) is an unbiased estimator of θ and we can write (9) as

θ̂ = Σ_{i=1}^m a_i g_i*(X)   (a_i = c_i k_i);

that is, our estimator is a linear combination of unbiased estimators of θ and is itself unbiased if the a_i's sum to 1.

When we combine several estimators, we are hoping to produce a better one. If the a_i's sum to 1, then we are minimising MSE(θ̂) subject to E(θ̂) = θ, and we are really seeking a minimum variance unbiased estimator in this class of estimators. For example, if k is identically zero in (5), then

g_i(X) = θ h_i(U, n)

and

θ̂ = Σ_{i=1}^m c_i g_i(X) = θ Σ_{i=1}^m c_i h_i(U, n) = θ H(c, U, n, m);

that is, θ̂ is itself just a LT, but one whose distribution depends on the arbitrary coefficients c as well as on U, n and m. Thus

MSE(θ̂) = E(θ̂ - θ)² = θ² E{ H(c, U, n, m) - 1 }²,

and so the MSE-optimal coefficients c* for the estimator θ̂ are those which minimise the mean squared deviation of H from 1.

We will now apply these ideas to several familiar examples.

Example (1) The Exponential distribution: the distribution function is

F(x|φ, β) = 1 - exp{ -(x-φ)/β } ;   φ ≤ x < ∞.

The MLE's for φ and β are φ̂ = X_(1) and β̂ = X̄ - X_(1).

The generating equation for an Exponential random variable X is

X = φ + β ln{(1-U)^{-1}}.

Suppose β is known. Making φ the subject of the generating equation implies that X - k is a LT for φ, as well as being location-invariant for φ. The Pitman estimator for φ is

φ̂* = X_(1) - β/n,

and this is the MMSEE in the class of location-invariant estimators.

Now assume that φ is known. Making β the subject of the generating equation, we obtain

β = (X - φ) / ln{(1-U)^{-1}},

which we can generalise to

β̂ = Σ_{j=1}^n c_j (X_(j) - φ).

Minimising the MSE of β̂ with respect to c = (c_1, c_2, ..., c_n), we obtain the following system of equations for the optimal c:

Σ_{j=1}^n c_j E{(X_(i) - φ)(X_(j) - φ)} = β E(X_(i) - φ) ;   i = 1,2,...,n.

From Sarhan [19], we obtain

E(X_(r) - φ) = β Σ_{i=1}^r (n+1-i)^{-1} ,

E{(X_(r) - φ)(X_(s) - φ)} = β² { Σ_{i=1}^m (n+1-i)^{-2} + Σ_{i=1}^r (n+1-i)^{-1} Σ_{i=1}^s (n+1-i)^{-1} },

where m is the minimum of r and s. Substituting and solving, we obtain

c_1 = c_2 = ... = c_n = 1/(n+1),

so the MMSEE for β in the class of linear combinations of order statistics minus a constant is

β̂* = {n/(n+1)} (X̄ - φ).

Substituting best estimates for unknown parameters and solving for φ̂* and β̂* simultaneously, we obtain

β̂* = X̄ - X_(1) ;   φ̂* = X_(1) - (X̄ - X_(1))/n.

The MMSEE's may be compared with Sarhan's UMVUE's,

β̃ = {n/(n-1)} (X̄ - X_(1)) ;   φ̃ = X_(1) - (X̄ - X_(1))/(n-1).

The comparison shows that the two forms are essentially identical and that we have decreased the MSE by introducing a little bias. These estimators for β are, however, uniformly dominated (in MSE) by the non-location-invariant estimators of Arnold [2] and Brewster [3]. This shows that the method of Inverse PIT does not necessarily lead to a class of estimators whose optimal member cannot be dominated by an estimator from another class.

Example (2) The Power Function distribution: this is a generalisation of the U[0,θ] distribution and has distribution function

F(x|θ, γ) = (x/θ)^{1/γ} ;   0 < x ≤ θ ,   0 < γ.

First we estimate θ on the assumption that γ is known. The MLE for θ is X_(n). The generating equation is X = θ U^γ, which indicates that X_(i) is a LT for θ and hence a generalised estimator is

θ̂ = Σ_{i=1}^n c_i X_(i).

Minimising the MSE of θ̂ with respect to c, we find the optimal value for c to be

c_1 = c_2 = ... = c_{n-1} = 0   and   c_n = 1 + γ/(n+γ).

This is not unexpected, since X_(n) is sufficient for θ.

Now we estimate γ on the assumption that θ is known. The MLE for γ is

γ̂ = (1/n) Σ_{i=1}^n ln( θ/X_(i) ).

Making γ the subject of the generating equation, we find

γ = ln(X/θ)/ln(U) = ln(θ/X)/ln(U^{-1}).

Considering this together with the MLE, it seems reasonable to generalise to

γ̂ = Σ_{i=1}^n a_i ln( θ/X_(i) ).

We could minimise the MSE of γ̂ directly, but in this case it is simpler to consider the transformation Y = ln(θ/X). It can easily be shown that Y has an Exponential distribution with truncation parameter zero and scale parameter γ. From the previous example, we see that the MMSEE for γ is

γ̂ = {n/(n+1)} Ȳ = {1/(n+1)} Σ_{i=1}^n ln( θ/X_(i) ),

so the estimating equations are

θ̂* = { 1 + γ̂*/(n + γ̂*) } X_(n)

and

γ̂* = {1/(n+1)} Σ_{i=1}^n ln( θ̂*/X_(i) ),

which we solve simultaneously for θ̂* and γ̂*. This is most easily done by successive approximation, starting with one of the MLE's, then substituting and resubstituting in one equation then the other until the estimates converge.
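A minimal sketch (my own, with arbitrary simulated data) of the successive-approximation scheme just described for the Power Function distribution:

    import math, random

    def power_function_mmsee(xs, tol=1e-10, max_iter=200):
        # Solve theta = (1 + g/(n+g)) * X_(n) and g = (1/(n+1)) * sum(ln(theta/x_i))
        # by substituting one equation into the other until convergence.
        n = len(xs)
        x_max = max(xs)
        gamma = sum(math.log(x_max / x) for x in xs) / n   # start from the MLE of gamma
        theta = x_max
        for _ in range(max_iter):
            theta_new = (1.0 + gamma / (n + gamma)) * x_max
            gamma_new = sum(math.log(theta_new / x) for x in xs) / (n + 1)
            if abs(theta_new - theta) < tol and abs(gamma_new - gamma) < tol:
                return theta_new, gamma_new
            theta, gamma = theta_new, gamma_new
        return theta, gamma

    # Example with data simulated from F(x|theta, gamma) = (x/theta)^(1/gamma), i.e. X = theta * U^gamma.
    random.seed(2)
    theta_true, gamma_true = 4.0, 1.5
    sample = [theta_true * random.random() ** gamma_true for _ in range(20)]
    print(power_function_mmsee(sample))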

Example (3) The Pareto distribution: the distribution function is

F(x|β, γ) = 1 - (β/x)^{1/γ} ;   β ≤ x < ∞.

The MLE's for β and γ are

β̂ = X_(1) ,   γ̂ = (1/n) Σ_{i=1}^n ln( X_(i)/β̂ ).

The Inverse PIT Method suggests estimators for β and γ of the form

β̂ = Σ_{i=1}^n a_i X_(i) ,   γ̂ = Σ_{i=1}^n c_i ln( X_(i)/β ).

Minimising the MSE of β̂ with respect to a, we obtain

a_1 = 1 - γ/(n-γ) ,   a_2 = ... = a_n = 0.

It is easily shown that Y = ln(X/β) has the Exponential distribution with truncation parameter zero and scale parameter γ, so, from Example (1), we know that the MMSEE of γ is

γ̂ = {n/(n+1)} Ȳ = {1/(n+1)} Σ_{i=1}^n ln( X_(i)/β ).

Hence, the estimators satisfy

β̂* = { 1 - γ̂*/(n - γ̂*) } X_(1)   and   γ̂* = {1/(n+1)} Σ_{i=1}^n ln( X_(i)/β̂* ),

which may be solved simultaneously for β̂* and γ̂* as in Example (2).

(D) Generalised Likelihood Analysis: this is a method in which we try to produce LT's from sample likelihoods. The whole-sample methods are Bayesian; that is, we assume a prior distribution for the unknown parameter θ, then obtain a posterior distribution by using the whole-sample likelihood in the usual way. We can then apply measures of central tendency to the posterior distribution of θ to find estimators; for example

(a) the mean of the posterior is the usual Bayes estimator and may be called quasi-Bayes when the prior is arbitrary.

(b) the mode of the posterior when the prior is uniform is just the usual MLE. When the prior is not uniform, the mode of the posterior may be called the Generalised MLE.

(c) the median of the posterior is not used much because of the difficulty of computing it. It usually lies between the mean and the mode, but may differ in form from both and so may suggest an appropriate functional form for an estimator.

The link between Generalised Likelihood Analysis and the previous work on selection of estimator classes lies in the choice of the prior f(θ|a), where a is a vector of arbitrary coefficients. The posterior, and hence all of the above estimators, will depend on a, which may then be selected to minimise MSE. This is an appealing idea theoretically, but it will usually be found that minimisation of the MSE of any of the whole-sample estimators above will not be possible analytically and will probably involve a lot of difficult numerical work. In principle, however, each prior f(θ|a) defines a class of estimators from which the MSE-optimal element may be chosen.

It is not clear how we should select the prior f(θ|a) when in fact there is no prior knowledge available about θ. It may be desirable to select a form which becomes uniform for some value of a, but it need not have moments and in fact need not even be a proper probability distribution (see Example (4)). In general, we are using the Bayes procedure not in the usual way, but merely as an artifice to find a class of formulae which depend on arbitrary constants a. These constants are then determined for the MSE-optimal member of the class.

The alternative to whole-sample methods is part-sample methods; in particular, methods based on the i-th order statistic X_(i) from a sample of size n. This may also be analysed in a Bayesian way, starting with a prior for θ and using the density of X_(i) as a sample likelihood to obtain a posterior distribution for θ given X_(i). As with the whole-sample case we may use the mean of the posterior (quasi-Bayes), the mode (Generalised MLE) or the median to produce an estimator based on X_(i). Finally, as above, we can analyse the posterior distribution of θ given X when the sample size is 1, again obtaining mean, mode and median estimators.

Once we have one or more estimators or LT's based on X_(i), we can generalise to an estimator for a full sample, as in the examples to follow.

Example (4): for the Pareto distribution the whole-sample likelihood is

L(β|x) = Π_{i=1}^n (1/γ) β^{1/γ} x_i^{-1/γ - 1} ;   0 < β ≤ X_(1).

Suppose the shape parameter γ is known and we assume a quasi-prior for β of the form f(β|r) = β^{-r}, where r is an undetermined constant. Then the posterior for β given x is

f(β|x) = (n/γ - r + 1) X_(1)^{-(n/γ - r + 1)} β^{n/γ - r} ;   0 < β ≤ X_(1).

Therefore the quasi-Bayes estimator for β is

β̂ = E(β|x) = { 1 - (n/γ - r + 2)^{-1} } X_(1).

This describes the class of estimators generated by the quasi-prior β^{-r}. From Example (3), we see that the MSE-optimal member of this class is

β̂* = { 1 - (n/γ - 1)^{-1} } X_(1).

Thus the optimal value of r is 3, which means that the prior for β is not a proper probability density. This value for r is also that used in Pitman's estimator for scale. In general, the optimal quasi-Bayes estimators for location and scale for the prior f(θ|r) = θ^{-r} coincide with the Pitman estimators for location (r = 0) and scale (r = 3). For comparison, the mode and median of the posterior of β are X_(1) and exp{ -(n/γ - r + 1)^{-1} ln(2) } X_(1) respectively.

Example (5): suppose we wish to estimate the Weibull shape parameter γ using quasi-Bayes estimation based on the density of X_(i), with an arbitrary power function quasi-prior for γ. The distribution function for the Weibull is

F(x|γ) = 1 - exp{ -((x-α)/β)^{1/γ} }.

The density of the i-th order statistic is

p_i(x) = {B(i, n+1-i)}^{-1} F_i^{i-1} (1 - F_i)^{n-i} f_i ,

where F_i = F(x_(i)|γ) and f_i = f(x_(i)|γ). Let the quasi-prior be f(γ|r) = γ^{-r}, where r is arbitrary. Then the mean of the posterior distribution of γ is

γ̂ = C(r, i, n) ln{ (X_(i) - α)/β },

where C(r, i, n) is a constant depending on r, i and n. This clearly generalises to

γ̂ = Σ_i c_i ln{ (X_(i) - α)/β }.

An advantage of this method is that we can use as few or as many order statistics as we like in the estimator, allowing for missing values, truncation, etc.

Example (6) Generalised Likelihood Analysis for the shape parameter of the Power Function distribution (see Example (2)): assuming θ is known, the MLE for γ is

γ̂ = (1/n) Σ_{i=1}^n ln( θ/X_(i) ).

Let the prior for γ be f(γ|r) = γ^{-r}, where 0 < γ < ∞. Then the posterior for γ is

f(γ|x) = { (n γ̂)^{n+r-1} / Γ(n+r-1) } γ^{-(n+r)} exp( -n γ̂/γ ),

and it is easy to show that the mode, median and mean of the posterior are all constant multiples of γ̂. From Example (2), the MMSEE of γ is {n/(n+1)} γ̂.

Example (7): the density of the one-parameter Gamma distribution is

f(x|α) = x^{α-1} e^{-x} / Γ(α) ;   α > 0.

The MLE for α is

α̂ = ψ^{-1}( (1/n) Σ_{i=1}^n ln{X_(i)} ),

where ψ(α) = (d/dα) ln{Γ(α)}.

When n = 1, α̂ = ψ^{-1}(ln{X}). It is easy to show that E(ln{X}) = ψ(α), which implies that ψ^{-1}(ln{X}) may be a LT for α. A generalised estimator for α is

α̂ = Σ_{i=1}^n c_i ψ^{-1}( ln{X_(i)} ).

In general, when no obvious estimator exists for a parameter θ, we may "manufacture" one by starting with some convenient transformation g(X), finding its expectation as a function of θ, then inverting the function to produce an estimator. Thus, if E g(X) = h(θ), then θ = h^{-1}{E g(X)} and an estimator for θ is θ̂ = h^{-1}{g(X)}. This may be generalised in some appropriate way to an estimator based on the whole sample. This idea is illustrated in the next example.

Example (8): the density of the Beta distribution is

f(x|p,q) = x^{p-1} (1-x)^{q-1} / B(p,q) ;   0 < x < 1 ,   p, q > 0.

Thus, if p+r > 0 and q+s > 0,

E{ X^r (1-X)^s } = B(p+r, q+s) / B(p, q),

and so

E(X) = p/(p+q).   (10)

Suppose that q is known and we wish to estimate p. Making p the subject of (10), we obtain

p = q E(X) / {1 - E(X)}.

This indicates that a logical estimator for p is

p̂ = q X / (1-X).

Now, if q > 1,

E(p̂) = q p/(q-1),

and hence an unbiased estimator for p is

p̂* = (q-1) X/(1-X).

Therefore a reasonable generalisation to the whole sample is

p̂ = Σ_{i=1}^n c_i X_(i)/(1 - X_(i)).

Similar results follow for q.

Example (9) The scale parameter σ of the N(μ,σ²) distribution, where μ is known: if g(X) = X², then E g(X) = μ² + σ². Making σ the subject, we obtain

σ = { E(X²) - μ² }^{1/2}.

This indicates that two "manufactured" generalisations are

(a)  σ̂ = Σ_{i=1}^n a_i | X_(i)² - μ² |^{1/2} ,

(b)  σ̂ = { (1/n) Σ_{i=1}^n | X_(i)² - μ² | }^{1/2}.

A variation on the idea of "manufacturing" estimators is to use the expectations of functions of order statistics: the expectation of a suitable function of X_(i), inverted in the same way, leads to a corresponding estimator.

Example (10) The scale parameter σ of the N(μ,σ²) distribution, where μ is known: if we write E X_(i) = μ + σ w_i, where w_i is a known constant, and make σ the subject, we obtain σ = (E X_(i) - μ)/w_i when w_i is nonzero. This suggests generalisations such as

σ̂ = Σ_{i=1}^n a_i (X_(i) - μ).

Example (11) The location parameter μ of the N(μ,σ²) distribution, where σ is known: an obvious generalisation of X̄, the MLE for μ, is

μ̂ = Σ_{j=1}^n a_j X_(j).

Minimising the MSE of μ̂ with respect to a, we obtain the following set of equations:

Σ_{j=1}^n a_j E{X_(i) X_(j)} = μ E X_(i) ;   i = 1,2,...,n.

Now, E X_(i) = μ + σ w_i and E{X_(i) X_(j)} = Cov(X_(i), X_(j)) + (μ + σ w_i)(μ + σ w_j).

Clearly the optimal coefficients in the MMSEE of μ depend on the unknown μ itself. This problem is overcome by substituting the MLE X̄ for μ. This makes the coefficients random variables. The MSE of this estimator was found by computer simulation. It had the same shape as that of Thompson's [22] "shrinkage" estimator for μ, of which it is a generalisation. The MSE is slightly smaller than that of X̄ for |μ| small and slightly larger for |μ| large.

Thompson's estimator for μ is μ̂ = c X̄, with the MSE-optimal value of c being

c* = { 1 + σ²/(n μ²) }^{-1}.

Thompson estimates c* by substituting X̄ for μ, but in the classical theory we choose c = 1, which, in effect, is equivalent to using c* with |μ| replaced by ∞.
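A brief sketch (my own choice of Normal setting) of this plug-in version of Thompson's estimator, with c* = {1 + σ²/(nμ²)}^{-1} estimated by replacing μ with X̄:

    import random

    def thompson_estimate(xs, sigma):
        # mu_hat = c_hat * xbar with c* = 1 / (1 + sigma^2 / (n * mu^2)), mu replaced by xbar.
        n = len(xs)
        xbar = sum(xs) / n
        c_hat = 1.0 / (1.0 + sigma ** 2 / (n * xbar ** 2))
        return c_hat * xbar

    # MSE comparison with the MLE xbar for a mu fairly close to the origin.
    random.seed(6)
    mu, sigma, n, reps = 0.5, 1.0, 10, 20000
    mse_mle = mse_thom = 0.0
    for _ in range(reps):
        xs = [random.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(xs) / n
        mse_mle += (xbar - mu) ** 2
        mse_thom += (thompson_estimate(xs, sigma) - mu) ** 2
    print(mse_mle / reps, mse_thom / reps)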

Fortunately, the problem of coefficients depending on unknown parameters can be solved by using the approach in the above example.

In all cases examined by the author, the coefficients were fairly insensitive to errors in the estimates of the parameters.

Chapter III: Scale Parameter Simulation Results

In this chapter the ideas in Chapter II are implemented numerically using a large Monte Carlo simulation. The MLE's and MMSEE's of the scale parameter β of the Generalised Pareto, Weibull and Type I Extreme Value distributions are compared for several sample sizes and shape parameter values. The distribution functions are, respectively,

F(x|β) = 1 - { 1 + (x-φ)/β }^{-1/γ} ;   φ ≤ x < ∞,

F(x|β) = 1 - exp{ -((x-φ)/β)^{1/γ} } ;   φ ≤ x < ∞,

F(x|β) = exp{ -exp( -(x-φ)/β ) } ;   -∞ < x < ∞.

When φ is known, good estimators for the shape parameter γ may be found as follows: in the Weibull distribution, let Y = -ln{(X-φ)/β}. Then Y has distribution function

F(y|γ) = exp{ -exp(-y/γ) } ;   -∞ < y < ∞.

Thus estimation of the shape parameter in the Weibull is equivalent to estimation of the scale parameter in the Type I Extreme Value distribution. Similarly, we may show that estimation of the shape parameter in the Pareto is equivalent to estimation of the scale parameter in the Exponential by using the transformation Y = ln{ 1 + (X-φ)/β }.

The usual location and scale model may be written

F(x|φ, β) = F( (x-φ)/β ),   (11)

and hence, from the Probability Integral Transform, the generating equation for the random variable X is

X = φ + β F^{-1}(U),   (12)

where U ~ U[0,1]. Making β the subject of the generating equation, we have

β = (X - φ) / F^{-1}(U),   (13)

which implies that a generalised estimator of scale based on a sample of size n is

β̂ = Σ_{i=1}^n c_i (X_(i) - φ),   (14)

where {c_i} are undetermined coefficients.

The likelihood function is

L(β|x) = β^{-n} Π_{i=1}^n f( (x_i - φ)/β ).

The MLE for β is therefore the solution of

Σ_{i=1}^n { f'((X_i - φ)/β̂) / f((X_i - φ)/β̂) } (X_i - φ)/β̂ + n = 0.

This equation may be rewritten as

β̂ = -(1/n) Σ_{i=1}^n { f'((X_i - φ)/β̂) / f((X_i - φ)/β̂) } (X_i - φ) = Σ_{i=1}^n d_i (X_i - φ),   (15)

where the quantities {d_i} depend on {X_i}, except when the underlying distribution is Exponential. For example, in the Pareto distribution

d_i = (1/n)(1/γ + 1) β̂ / (X_i - φ + β̂).

In the simulation described below, it was found that the MMSEE coefficients for the more extreme order statistics X_(i) are smaller than the other coefficients, as might be expected. The above example shows that with Maximum Likelihood something similar occurs, but adaptively; rather than calculating the coefficient in advance (based on the expectations and covariances of the order statistics), the coefficient of X_i is directly determined by X_i itself.

Since we can easily find good estimators for φ, suppose we assume it to be known and hence, without loss of generality, let it be zero. Then the optimal values of the MMSEE coefficients {c_i} are the solutions of

Σ_{i=1}^n c_i E{X_(i) X_(j)} = β E X_(j) ;   j = 1,2,...,n.

It is possible to find the optimal coefficients directly, using the joint distribution of X_(i) and X_(j), followed by approximate integration of the defining equations for E X_(i) X_(j) and E X_(i). This involves many numerical difficulties, and the approach used in this simulation was to run a preliminary simulation of size equal to the main simulation and use this to find the required expectations empirically, by averaging over many samples. The coefficients may not be estimated accurately (for the larger sample sizes, some coefficients are occasionally negative and often not monotonic), but the relative behaviour of the MLE and the MMSEE is estimated very accurately. In fact, the simulation was first conducted with 1000, 2000, 5000 and 10000 samples per simulation run. The difference between the results (except for MMSEE coefficients) for 5000 and 10000 was almost negligible, so it was decided that 20000 samples per simulation would be accurate enough for our purposes.

The random number generator used was the so-called HP-25 algorithm, u_{i+1} = fractional part of (u_i + π)^5, which is known to be unbiased and generally well-behaved, except for a slight serial correlation. The random variables were generated using the PIT method.
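A minimal sketch (my own; the seed and parameter values are arbitrary, and this is not the thesis's actual routine) of this generation scheme: the HP-25-style recursion supplies uniform variates, and the Weibull generating equation X = φ + β(ln{(1-U)^{-1}})^γ turns them into Weibull data.

    import math

    def hp25_uniforms(seed, count):
        # u_{i+1} = fractional part of (u_i + pi)^5 ; seed must lie in (0, 1).
        u = seed
        out = []
        for _ in range(count):
            u = math.modf((u + math.pi) ** 5)[0]
            out.append(u)
        return out

    def weibull_sample(n, phi, beta, gamma, seed=0.12345):
        # PIT: X = phi + beta * (ln(1/(1-U)))**gamma
        return [phi + beta * (math.log(1.0 / (1.0 - u))) ** gamma
                for u in hp25_uniforms(seed, n)]

    print(weibull_sample(5, phi=0.0, beta=1.0, gamma=0.5))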

Several adjustments were necessary to make the numerical work easier. When the ML estimating procedures converged to a negative value, zero was substituted. In the Pareto routine, when convergence was to a value greater than 10000, zero was substituted. This means that the MSE estimates for MLE are slightly optimistic (improper convergence generally occurred in much less than 1% of samples). When n/γ is small, some extreme Pareto order statistics have infinite variances, so we automatically set their corresponding coefficients to zero, without using the preliminary simulation to estimate them.

The input data to the simulation program were

(a) sample size n (5, 10, 15, 20, 25 or 30),

(b) number of samples m (20000),

(c) scale parameter β,

(d) shape parameter γ: for the Weibull, γ = i/2, i = 1,2,...,6; for the Pareto, γ = ni/20, i = 1,2,...,9. The shape parameter is taken to be known, since estimation of γ may interfere with an accurate comparison of the two scale parameter estimators.

(e) initial random seed in the range (0,1),

(f) distribution type (Pareto, Weibull or Extreme Value).

Output consisted of

(a) bias (divided by β) and MSE (divided by β²) of the MLE (β̂_1) and the MMSEE (β̂_2),

(b) optimal modifying constant k* for MLE and MMSEE. This was the constant k for which k β̂ had the minimum MSE. It is found as follows:

MSE(k β̂) = E(k β̂ - β)²,

∂/∂k MSE(k β̂) = E{ 2(k β̂ - β) β̂ } = 0.

Thus

k* = E(β̂/β) / E{(β̂/β)²}.

The minimum MSE (divided by β²) is therefore

MSE*/β² = 1 - {E(β̂/β)}² / E{(β̂/β)²}.

The bias (divided by β) of the optimal estimator is

bias*/β = { E(k* β̂) - β }/β = {E(β̂/β)}² / E{(β̂/β)²} - 1 = -MSE*/β².

(A numerical sketch of this calculation is given after the output list below.)

The constant k* can be accurately estimated from the simulation. Since the MMSEE coefficients were also obtained by minimising MSE, we would expect the optimal constant for MMSEE to be very near to one. In fact this does occur throughout the simulation and so the modified MMSEE is virtually the same as the unmodified MMSEE.

*" *" ( c) MSE for the modified estimators k1 B1 and k2 B2 ,

( d) MSE-optimal coefficients, estimated from the Monte Carlo simulation, of a linear combination of the MLE and MMSEE. These coefficients are found as follows: let

A A A 2 Then minimising MSE(B') = E(c1 s1 + c2 s2 - B) with respect to -c , we obtain "' ,... ,... E ( B~~2] E (B;)

C = ,... 2 - ,... E (Bs2J E (Bs2] 45.

Table 1: MSE-optimal constant k* for modified MLE

 n \ γ    0.5    1.0    1.5    2.0    2.5    3.0
  5      .976   .832   .638   .442   .287   .164
 10      .988   .904   .785   .636   .498   .351
 15      .993   .940   .848   .733   .607   .485
 20      .993   .954   .882   .791   .682   .574
 25      .995   .961   .905   .825   .734   .638
 30      .996   .966   .918   .851   .771   .690

Table 2: Bias of unmodified MLE

 n \ γ    0.5    1.0    1.5    2.0    2.5    3.0
  5     -.025   .007   .074   .194   .391   .686
 10     -.013   .003   .036   .106   .202   .328
 15     -.010  -.002   .024   .067   .137   .208
 20     -.006  -.001   .020   .048   .099   .160
 25     -.005   .001   .013   .041   .081   .125
 30     -.004   .001   .013   .035   .067   .102

Table 3: MSE efficiency (%) of unmodified MLE relative to the MSE-optimal linear combination of MLE and MMSEE

 n \ γ    0.5    1.0    1.5    2.0    2.5    3.0
  5     98.77  82.59  58.77  35.93  19.62   9.16
 10     99.44  90.10  75.33  56.22  39.74  25.09
 15     99.70  94.20  82.44  67.78  51.71  38.55
 20     99.58  95.47  86.19  74.73  60.57  47.61
 25     99.76  96.03  89.24  78.50  66.44  54.95
 30     99.82  96.55  90.45  81.60  71.04  60.92

Table 4: Bias of modified MLE (equal to minus the optimal MSE)

 n \ γ    0.5     1.0     1.5     2.0     2.5     3.0
  5     -.0489  -.1627  -.3155  -.4726  -.6005  -.7226
 10     -.0249  -.0930  -.1870  -.2965  -.4016  -.5333
 15     -.0166  -.0618  -.1323  -.2174  -.3102  -.4140
 20     -.0126  -.0471  -.1009  -.1717  -.2504  -.3345
 25     -.0100  -.0389  -.0830  -.1416  -.2071  -.2829
 30     -.0083  -.0327  -.0697  -.1194  -.1780  -.2401

(e) MSE of the optimal combination in (d). The relative efficiency of all estimators was computed in relation to this optimal MSE.
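A small sketch (my own) of the calculation in item (b) above: k*, the minimum MSE and the corresponding bias can all be estimated from the simulated ratios β̂/β.

    import random

    def modified_estimator_summary(ratios):
        # ratios: simulated values of beta_hat / beta from the Monte Carlo runs.
        m1 = sum(ratios) / len(ratios)                  # estimate of E(beta_hat / beta)
        m2 = sum(r * r for r in ratios) / len(ratios)   # estimate of E{(beta_hat / beta)^2}
        k_star = m1 / m2
        mse_star = 1.0 - m1 * m1 / m2                   # minimum MSE divided by beta^2
        bias_star = -mse_star                           # bias of k* beta_hat divided by beta
        return k_star, mse_star, bias_star

    # Example: exponential-sample MLE beta_hat = sample mean, true beta = 1, n = 10.
    random.seed(3)
    n, reps = 10, 20000
    ratios = [sum(random.expovariate(1.0) for _ in range(n)) / n for _ in range(reps)]
    print(modified_estimator_summary(ratios))   # k* should be close to n/(n+1)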

The simulation results are so similar for all of the distributions that we give, in Tables 1, 2, 3 and 4 above, only those for the Weibull distribution; the comments following apply to all three distributions.

The Weibull distribution with shape parameter 1 is just the Exponential distribution. This allows us to check the accuracy of the simulation. The bias of the unmodified MLE (Table 2, column 2) should be zero and the optimal modifying constant k* for MLE (Table 1, column 2) should be n/(n+1). The largest percentage error in k* is -0.56% and the largest error in bias is 0.007.

As sample size was increased, the efficiencies of all estimators (modified and unmodified) increased, all biases approached zero quickly (approximately as 1/n) and the optimal constant k* for MLE quickly approached 1 (the biases of the modified estimators always remained negative and k* was always less than 1).

Suppose we now consider the linear combination of MLE and MMSEE from part (d). Let r_i = E(β̂_i/β). Then β̂_i* = β̂_i/r_i is unbiased for β, and we can write (16) as β̂' = a_1 β̂_1* + a_2 β̂_2*, where a_i = c_i r_i. The coefficients a_1 and a_2 are more meaningful than c_1 and c_2 since, if a_1 + a_2 = 1, β̂' is unbiased for β. For all three distributions a_1 (the coefficient of the unbiased MLE) is always much larger than a_2 (the coefficient of the unbiased MMSEE). Typically a_1 ≈ .85 and a_2 ≈ .15, indicating that the MLE is much more important to the accuracy of the combined estimator than is the MMSEE.

For all three distributions, the unmodified MLE is quite inefficient for nearly all combinations of sample size and shape parameter (being much worse for large shape parameter values), while the efficiency of the modified MLE is nearly always better than 99%. In fact, the bias and MSE of both modified estimators are nearly the same everywhere, the modified MLE usually being slightly better. We conclude that the modified MLE alone is quite adequate for the estimation of β, the only advantage of MMSEE being that its coefficients may be tabulated and hence it is convenient for hand calculation, whereas MLE is inevitably an iterative procedure which is time-consuming for large samples.

Chapter IV: Distribution-free MMSEE

In this chapter, we apply the theory of MMSEE to the problem of the estimation of the centre of a symmetric distribution. The estimator developed is of quasi-linear form, with strong associations with the estimators of Takeuchi [21], Johns [11], Switzer [20] and Jaeckel (see [1]). It is compared in a Monte Carlo simulation study to the most robust estimators in the Princeton Robustness Study (Andrews et al. [1]) and found to be superior. Future improvements which are computationally impractical at present are discussed, as well as the development of an adaptive "super-estimator".

The problems of nonparametric location estimation and robust location estimation are distinct theoretically, but we will consider them to be essentially identical in practice: the aim is to derive an estimator which has high efficiency irrespective of the underlying distribution.

Robustness can be thought of as the first derivative, in some sense, of efficiency; it is a measure of how much the efficiency changes when the underlying distribution changes. Over the past thirty years, attempts to produce robust estimators have mainly come from two approaches:

(1) theoretical robustness/contamination: in this approach, used by Huber, Hampel and others, we assume a basic density f, with a small contamination (or source of outliers) coming from a contaminating density f*.

(2) adaptive estimation: in the adaptive approach, pioneered by Hogg ([9] and [10]), we attempt to use the sample to provide some idea of the characteristics of the underlying distribution. We then select the estimator most suitable for that distribution.

In the first approach we attempt to identify and exclude the outliers and base our estimate on the remaining data. In the second method we try to estimate the underlying distribution as accurately as possible. The connection between the two is that as the distribution becomes more extreme (that is, has higher kurtosis), the proportion of outliers increases. As might be expected, the first approach generally does better when there is a significant proportion of outliers and the second does better when extreme contamination is small.

For the general location and scale model

F(x|μ, σ) = F( (x-μ)/σ ),

the Inverse PIT Method leads us to consider a linear combination of order statistics as an estimator for μ, say

μ̂ = Σ_{j=1}^n c_j X_(j).

Ideally we would minimise the MSE of μ̂ with respect to c to obtain the best estimator. However, we have restricted attention to unbiased estimators to reduce the analytical and computational complexity of the problem. Since the distribution of X is symmetric, the conditions Σ_{j=1}^n c_j = 1 and c_j = c_{n+1-j} ensure that μ̂ is unbiased. The problem of minimising MSE then reduces to a problem in Lagrange multipliers: minimise

E( Σ_{j=1}^n c_j X_(j) - μ )² - λ ( Σ_{j=1}^n c_j - 1 ),

where λ is a Lagrange multiplier. Now,

E( Σ_{j=1}^n c_j X_(j) - μ )² = E{ Σ_{j=1}^n c_j (X_(j) - μ) }²,

since Σ_{j=1}^n c_j = 1. Differentiating with respect to c_i, we obtain

∂/∂c_i = 2 E{ Σ_{j=1}^n c_j (X_(j) - μ)(X_(i) - μ) } - 2λ.

Now, if E X_(i) = μ + σ w_i and σ_ij = Cov{X_(i), X_(j)}, then

∂/∂c_i = 2 Σ_{j=1}^n c_j ( σ_ij + σ² w_i w_j ) - 2λ ;   i = 1,2,...,n
        = 2 Σ_{j=1}^n c_j σ_ij - 2λ   (since w_j = -w_{n+1-j} and c_j = c_{n+1-j}).

Setting the derivative to zero and writing this together with Σ_{j=1}^n c_j = 1, we have

[ V     -1 ] [ c ]   =   [ 0 ]
[ 1^T    0 ] [ λ ]       [ 1 ],

where V = (σ_ij). The solution is

c* = V^{-1} 1 / (1^T V^{-1} 1)   and   λ* = 1 / (1^T V^{-1} 1).

The fact that V is doubly symmetric ensures that c_j* = c*_{n+1-j}. This solution is essentially identical to that for Best Linear Unbiased Estimation of a mean, as considered by Lloyd [14].
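A minimal sketch (my own) of the coefficient formula c* = V^{-1}1/(1^T V^{-1} 1), using a small covariance matrix supplied by the caller:

    def blue_weights(V):
        # c* = V^{-1} 1 / (1^T V^{-1} 1): solve V x = 1, then normalise x to sum to one.
        n = len(V)
        a = [row[:] + [1.0] for row in V]        # augmented system V x = 1
        for k in range(n):
            p = max(range(k, n), key=lambda r: abs(a[r][k]))
            a[k], a[p] = a[p], a[k]
            for r in range(n):
                if r != k:
                    f = a[r][k] / a[k][k]
                    for c in range(k, n + 1):
                        a[r][c] -= f * a[k][c]
        x = [a[i][n] / a[i][i] for i in range(n)]
        s = sum(x)
        return [xi / s for xi in x]

    # Toy example: covariance matrix of the three order statistics from U[0, 1], n = 3,
    # using Cov(U_(i), U_(j)) = i(n+1-j)/((n+1)^2 (n+2)) for i <= j.
    V = [[3/80, 2/80, 1/80],
         [2/80, 4/80, 2/80],
         [1/80, 2/80, 3/80]]
    print(blue_weights(V))   # symmetric weights, c_1 = c_3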

It now remains to find an estimate of V from the sample. We use the same method as is used for the corresponding problem in parametric estimation: that is, substitute estimates for unknowns wherever necessary. Now,

E{X_(i)} = ∫ x p_i(x) dx   and   E{X_(i)²} = ∫ x² p_i(x) dx,

where

p_i(x) = {B(i, n+1-i)}^{-1} F^{i-1}(x) {1 - F(x)}^{n-i} f(x).

The unknowns are the endpoints X_(0) and X_(n+1) of the range, together with F and f. We estimate F by the piecewise-linear distribution function

Ĉ(x) = i/(n+1) + (x - X_(i)) / {(n+1) Δ_i}   for X_(i) ≤ x ≤ X_(i+1),

and f by its derivative

Ĉ'(x) = {(n+1) Δ_i}^{-1},

where, for i = 0,1,2,...,n, Δ_i = X_(i+1) - X_(i), and X_(0) and X_(n+1) are defined to be extrapolated estimates of the endpoints of the distribution,

X_(0) = { h(n)(X_(1) - S_1) - h(1)(X_(n) - S_n) } / { h²(n) - h²(1) },

X_(n+1) = { h(n)(X_(n) - S_n) - h(1)(X_(1) - S_1) } / { h²(n) - h²(1) },

where h(i) = i/(n+1), a(i) = h(i+1) - 2h(i) + h(i-1), S_n = Σ_{i=1}^n a(i) X_(i) and S_1 = Σ_{i=1}^n a(i) X_(n+1-i).

We now evaluate the estimates of E{X_(r)} and E{X_(r) X_(s)}. Our estimate of E{X_(r)} is of the form

Ê{X_(r)} = {B(r, n+1-r)}^{-1} ∫ x F̂^{r-1}(x) {1 - F̂(x)}^{n-r} f̂(x) dx.

Using Ĉ to estimate F, Ĉ' to estimate f and, for convenience, writing X_(0) and X_(n+1) for the limits of integration, we expand {1 - Ĉ(x)}^{n-r} binomially and integrate piecewise:

Ê{X_(r)} = {B(r, n+1-r)}^{-1} Σ_{i=0}^n Σ_{j=0}^{n-r} (-1)^j (n-r choose j) ∫_{X_(i)}^{X_(i+1)} x Ĉ^{j+r-1}(x) Ĉ'(x) dx

         = {B(r, n+1-r)}^{-1} Σ_{i=0}^n Σ_{j=0}^{n-r} (-1)^j (n-r choose j) V_1(i, j+r-1),

where, for i = 0,1,...,n, m = 0,1,...,n-1 and a > 0,

V_a(i, m) = ∫_{X_(i)}^{X_(i+1)} x^a Ĉ^m(x) Ĉ'(x) dx.

Similarly,

Ê{X_(r)²} = {B(r, n+1-r)}^{-1} Σ_{i=0}^n Σ_{j=0}^{n-r} (-1)^j (n-r choose j) V_2(i, j+r-1).

For r < s, the joint density of X_(r) and X_(s) is

p_rs(x_r, x_s) = k_n(r,s) F^{r-1}(x_r) {F(x_s) - F(x_r)}^{s-r-1} {1 - F(x_s)}^{n-s} f(x_r) f(x_s)   for x_r < x_s,

where k_n(r,s) = {B(r, s-r) B(s, n+1-s)}^{-1}, and hence

E{X_(r) X_(s)} = ∫∫_{x_r < x_s} x_r x_s p_rs(x_r, x_s) dx_r dx_s.

With the same estimators as before, the inner integral over x_r (taken from X_(0) up to x_s) is evaluated piecewise in terms of the quantities V_1(i, m), and binomial expansions of {Ĉ(x_s) - Ĉ(x_r)}^{s-r-1} and {1 - Ĉ(x_s)}^{n-s} then reduce the double integral to a triple sum,

Ê{X_(r) X_(s)} = k_n(r,s) Σ_{i=0}^n Σ_{j=0}^{s-r-1} Σ_{k=0}^{n-s} (-1)^{k+s-r-1-j} (s-r-1 choose j) (n-s choose k) [ ... ],

in which each bracketed term is a combination of V_1(i, j+k), Δ_i and the powers {i/(n+1)}^{s-j} and {(i+1)/(n+1)}^{s-j} arising from the segment of Ĉ on [X_(i), X_(i+1)].

Integrating by parts the expressions for V_1 and V_2, we obtain

V_1(i, m) = {1/(m+1)} [ X_(i+1) {(i+1)/(n+1)}^{m+1} - X_(i) {i/(n+1)}^{m+1} - {(n+1) Δ_i/(m+2)} ( {(i+1)/(n+1)}^{m+2} - {i/(n+1)}^{m+2} ) ].

We now have all we need to implement the method in a computer simulation.
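A condensed sketch (my own; it follows the formulas above but is not the thesis's actual program) of the piecewise-linear Ĉ and the estimate of E{X_(r)} built from the V_1(i, m) integrals:

    import math

    def expected_order_stat(xs_ext, r):
        # xs_ext = [X_(0), X_(1), ..., X_(n), X_(n+1)] including the extrapolated endpoints.
        n = len(xs_ext) - 2

        def v1(i, m):
            # V_1(i, m) = integral of x * C^m(x) * C'(x) over [X_(i), X_(i+1)] (closed form above).
            lo, hi = i / (n + 1), (i + 1) / (n + 1)
            delta = xs_ext[i + 1] - xs_ext[i]
            return (xs_ext[i + 1] * hi ** (m + 1) - xs_ext[i] * lo ** (m + 1)
                    - (n + 1) * delta / (m + 2) * (hi ** (m + 2) - lo ** (m + 2))) / (m + 1)

        beta = math.gamma(r) * math.gamma(n + 1 - r) / math.gamma(n + 1)   # B(r, n+1-r)
        total = 0.0
        for i in range(n + 1):
            for j in range(n - r + 1):
                total += (-1) ** j * math.comb(n - r, j) * v1(i, j + r - 1)
        return total / beta

    # Sanity check: data taken as the expected U[0,1] order statistics for n = 5, endpoints 0 and 1,
    # so C becomes the U[0,1] distribution function and the estimates should be close to r/(n+1).
    n = 5
    data = [0.0] + [i / (n + 1) for i in range(1, n + 1)] + [1.0]
    print([round(expected_order_stat(data, r), 4) for r in range(1, n + 1)])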

Preliminary simulations revealed an interesting technical point. The relative efficiency of the new estimator seemed to decrease with increasing sample size, and the coefficients for larger sample sizes appeared to diverge from rather than converge to the theoretically correct values. When the data consisted of the expected values of the order statistics from the Uniform distribution, the method produced the correct coefficients (1/2, 0, ..., 0, 1/2) for small sample size n, but for moderate n showed apparently random errors which quickly became unacceptably large. The cause was found in the estimating equations for E{X_(r)} and E{X_(r) X_(s)}, which involved the difference of two numbers which were almost equal. The problem grew rapidly with sample size and soon exceeded the computer's maximum accuracy, even in double precision (16 decimal digits) mode. Our solution for large samples was to restrict the sample size for which the covariance matrix was estimated. To this end the idea of a "hypothetical" sample from the estimated distribution function was introduced. Instead of estimating the covariance matrix for a sample of size n, we estimate the covariance matrix for a sample of size z using multiple samples of size z from the estimated distribution function. Thus we are also forced to replace the original sample of size n by a hypothetical data set of size z which consists of estimates of the expected values of the order statistics from a sample of size z. Choosing z smaller than n was originally proposed to get around the accuracy limitations mentioned above, but allowing z to be arbitrary was later found to have great theoretical importance.

In our approach, the only use we are making of the original data is to estimate the distribution function. Determining the properties of statistics by sampling from an estimated distribution function is essentially Efron's [7] bootstrap method. Here, in preference to the usual empirical distribution function, we use the more tractable Ĉ(x), which enables us to estimate means and covariances of order statistics directly from the defining expressions without resorting to repeated sampling.

To evaluate the new method of estimating covariances, we used artificial data consisting of the expected values of the order statistics for several distributions. The estimated variances of the order statistics were compared to the known values. For moderate distributions, such as the Normal and Logistic, it was found that the variances of the central order statistics were slightly overestimated, but as we moved away from the central order statistics the amount of overestimation gradually increased. This was considered to be a good feature, for it would tend to emphasise the central order statistics at the expense of the extreme ones, thus contributing towards robustness at a small cost in efficiency. However, the rate of increase in overestimation of variance decreased as we neared the extreme order statistics and became negative at the second and second last order statistics, which thus had their variances only slightly overestimated. The first and last order statistics actually had their variances underestimated by 50% to 70%. This characteristic became more pronounced for more extreme distributions such as the Laplace, and very serious for the Cauchy. Table 5 gives some examples of percentage overestimation of the variance of X_(i).

Table 5: Percentage overestimation of the variance of X_(i)

   i    Normal (n=20)   Laplace (n=19)   Cauchy (n=20)
   1        -56.29          -68.31          -100.00
   2          4.25            6.04          -100.00
   3         26.37           46.54            -1.07
   4         25.43           48.71           162.11
   5         19.47           38.85           233.87
   6         14.75           30.95           198.97
   7         11.97           27.26           132.76
   8         10.26           26.97            82.73

The reason for this behaviour is clearly related to information density: near the centre of a distribution, the sample data are relatively plentiful and therefore the estimation accuracy is high. Near the tails, the data are relatively scarce, making estimation hazardous. The solution is to trim the sample at the extremes so that the faulty variance estimates do not affect the estimator. The amount of trim should depend on how extreme the distribution is; for the Normal, a trim of two at each end is sufficient, but for the Cauchy a trim of three or four would be preferred.

The improved estimator can now be written

\hat{\mu} = \sum_{i=t+1}^{z-t} c_i^{*} \, \hat{E}_z\{ X_{(i)} \}          (17)

where t is the trim amount (0, 1, 2, 3 or 4), z is the hypothetical sample size and Ê_z{X_(j)} is the estimated expected value of the j th order statistic of a sample of size z from the estimated distribution function of the original sample.

This is strongly reminiscent of Takeuchi's [21] idea, which, using a different initial approach, comes to basically the same formula. However, the present method has several theoretical advantages. The hypothetical sample size in Takeuchi's method should be small in relation to n, while z above is arbitrary. Takeuchi's method does not allow for any trimming; this could be a liability for the more extreme distributions. The main disadvantage of Takeuchi's method is that it is fixed, whereas the above approach leaves much scope for fine tuning, with many choices to be made optimally.

The new estimator used in the simulation study to follow had the basic form (17), with some enhancements meant to improve its performance at and near the Normal (and consequently possibly poorer performance for more extreme distributions). The enhancements were:

(1) trim was taken to be two for all distributions and sample sizes. (A larger trim for the more extreme distributions would have improved the performance of the estimator.)

(2) hypothetical sample size z was chosen as large as computationally possible. (On the VAX 11/750 minicomputer with 16 decimal digits of accuracy in double precision, this was found to be 17.)

(3) the optimal coefficients were adjusted to be nonnegative by setting to zero all negative coefficient estimates and rescaling the rest to sum to one. For some extreme distributions, such as the U-shaped distribution f(x) = 1.5x², |x| ≤ 1, which has low kurtosis, and the Cauchy, which has high kurtosis, some optimal coefficients really are negative, so this procedure loses some efficiency there, but hopefully gains at and near the Normal.

(4) for the Normal, Logistic and Cauchy distributions, it is found that the optimal coefficients increase from the outer to the inner order statistics, which is to be expected since the inner order statistics have smaller variances.

When we consider optimal trimmed estimators, the same pattern occurs, except that for the Normal and the Logistic the nonzero coefficients of the most extreme order statistics are much larger than their neighbours;

thus, if the trim is 2, c_3 is much larger than c_4. In order to gain further efficiency at the Normal and the Logistic, this structure was imposed upon the estimated coefficients. If a* were the original estimated coefficients, b* were these adjusted for negativity, and m = [(z+1)/2], then nothing further was done if b_4* ≤ b_5* ≤ ... ≤ b_m*. If the sequence of coefficients was not monotonic increasing, then it was made monotonic increasing first with respect to b_4*, then b_5*, and so on, giving (m-3) monotonic sequences c, each of which was then normalised to sum to one. From these, the one "closest" to b* (distance = Σ_i |c_i - b_i*|) was taken as the final coefficient estimate. This turned out to have only small effects on efficiency but was retained mainly for aesthetic reasons.
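One simplified reading of the adjustments in (3) and of the monotonicity step above is sketched below; the function name is ours, and a single cumulative-maximum pass stands in for the full family of (m-3) candidate monotone sequences described in the text.

```python
import numpy as np

def adjust_coefficients(c_raw):
    """Simplified sketch of the coefficient adjustment: (i) negative estimates are
    set to zero and the rest rescaled to sum to one; (ii) the retained coefficients
    are forced to be monotone nondecreasing from the outside inwards by one forward
    cumulative-maximum pass, mirrored so that c_i = c_{z+1-i}; (iii) renormalise."""
    b = np.clip(np.asarray(c_raw, dtype=float), 0.0, None)
    b /= b.sum()                                  # nonnegative, summing to one
    half = len(b) // 2 + len(b) % 2               # adjust the lower half, mirror the rest
    lower = np.maximum.accumulate(b[:half])       # monotone nondecreasing inwards
    c = np.concatenate([lower, lower[: len(b) - half][::-1]])
    return c / c.sum()

# illustrative raw coefficients for z = 9 with trim 2 already applied (zeros at the ends)
print(adjust_coefficients([0.0, 0.0, -0.02, 0.18, 0.30, 0.18, -0.02, 0.0, 0.0]))
```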

To test the robustness of the new estimator, which we can call BUQL for Best Unbiased Quasi-Linear, we ran a Monte Carlo simulation which compared the MSE's of BUQL and the best available alternatives. The Princeton Robustness Study used sixty-five estimators, from which we chose a subset. We wanted to include the best example of each estimation approach; in particular Takeuchi's method (#49), Johns' method (#51), Hampel's method (#33, recommended by the authors of the PRS as the best estimator overall) and Huber's method (#24 is a good representative). We then selected all estimators in the PRS which were better than the least efficient

of this group at the Normal and the Cauchy for sample sizes n = 20 and n = 40. This added the set consisting of #12 (JBT: 2-choice Jaeckel adaptive trim), #34 (ADA: adaptive Hampel), #46 (Skipped mean) and #47 (Skipped mean). However, #34 was omitted because #33 is also Hampel-type and #33 is regarded by the authors of the PRS to be superior. The skipped means #46 and #47 were omitted because their relative efficiencies at the Normal fall between n = 20 and n = 40, whereas all the other estimators gain relative efficiency. The final set of estimators used in the simulation were:

1 - #12   JBT    Jaeckel adaptive trim.
2 - #49   TAK    Takeuchi adaptive.
3 - #24   D07    One-step Huber.
4 - #33   12A    Hampel; a = 1.2, b = 3.5, c = 8.0.
5 - #51   JOH    Johns' adaptive.
6 -       BUQL   Best Unbiased Quasi-Linear; adaptive.

It was decided to use sample sizes 10, 20, 30 and 40 from the Normal, 10% Contaminated Normal, Logistic, Laplace and Cauchy distributions and to estimate the MSE of all six estimators by repetition over a large number of random samples. The random number generator used was the so-called HP-25 method, with u_{i+1} = fractional part of (π + u_i)^5. Normal variates were generated by the Box-Muller method; the 10% Contaminated Normal was 90% N(0,1) variates and 10% N(0,3²) variates; the others were generated by the Probability Integral Transform method. In each combination of sample size and distribution, the MSE's were transformed into percentage deficiencies, where, if MSE_min is the smallest MSE in the simulation, then the (relative) efficiency of the i th estimator is

MSE_min/MSE_i and the deficiency of the i th estimator is 1 - (efficiency)^{1/2}.

This transformation makes comparison easier because it considers ratios of standard errors, not MSE's. A 0% deficiency means the estimator was best for that combination of sample size and distribution; a 10% deficiency means a standard error roughly 11% larger than optimal; a 50% deficiency means a standard error twice that of the best estimator.
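For concreteness, a rough sketch of the variate generation and the deficiency transform described above is given below; the function names, the seed and the illustrative MSE values are ours, not taken from the thesis.

```python
import numpy as np

PI = np.pi

def hp25_uniforms(n, seed=0.5180339887):
    """U(0,1) stream via the 'HP-25 method': u_{i+1} = frac((pi + u_i)^5)."""
    u, out = seed, np.empty(n)
    for i in range(n):
        u = ((PI + u) ** 5) % 1.0
        out[i] = u
    return out

def box_muller(u1, u2):
    # standard Box-Muller transform: two independent uniforms -> two independent N(0,1)
    r = np.sqrt(-2.0 * np.log(u1))
    return r * np.cos(2 * PI * u2), r * np.sin(2 * PI * u2)

def contaminated_normal_sample(n):
    # 10% Contaminated Normal: 90% N(0,1) variates and 10% N(0, 3^2) variates
    u = hp25_uniforms(3 * n)
    z, _ = box_muller(u[:n], u[n:2 * n])
    scale = np.where(u[2 * n:] < 0.10, 3.0, 1.0)
    return scale * z

def percentage_deficiencies(mse):
    """Deficiency transform: efficiency_i = MSE_min / MSE_i,
    deficiency_i = 1 - sqrt(efficiency_i), reported as a percentage."""
    mse = np.asarray(mse, dtype=float)
    return 100.0 * (1.0 - np.sqrt(mse.min() / mse))

# e.g. MSEs of the six estimators in one cell of the study (illustrative numbers only)
print(percentage_deficiencies([0.112, 0.105, 0.118, 0.121, 0.117, 0.106]))
```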

A simulation study over the whole range of combinations was run using 1000 samples at each combination. When the ranking of the estimators was compared with the ranking in the PRS, there were a few minor differences, so the whole simulation was repeated using 2000 samples at each combination. This time the rankings corresponded exactly and the standardised deficiencies (defined below) were fairly close. This suggested that it would be useful to do an analysis of the rankings as well as the deficiencies.

Table 6 is a listing of the computer output, tabulating deficiencies by estimator, sample size and distribution type. Since the deficiencies generally decrease with sample size, but increase as the distribution becomes more extreme, we standardise by dividing each simulation (column) by its average. This enables us to make fair comparisons between sample sizes and between distributions, since each deficiency is now expressed as a multiple of its simulation average. Table 7 is a listing of the deficiencies in Table 6, standardised by simulation combination. Since what is really important is the performance of the estimators relative to one another, we evaluate the rankings in each simulation and these are listed in Table 8. To enable us to draw conclusions, these results are summarised in Tables 9 and 10. The standardised deficiencies are averaged over the five distributions in Table 9 and averaged over the four sample sizes in Table 10.
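A minimal sketch of the standardisation and ranking used to pass from Table 6 to Tables 7 and 8 (assuming the deficiencies are held in a matrix with one column per simulation combination; ties are broken arbitrarily, which the thesis does not discuss):

```python
import numpy as np

def standardise_and_rank(deficiencies):
    """Rows are estimators, columns are simulation combinations (sample size x
    distribution).  Standardise each column by its average (as in Table 7) and
    rank the estimators within each column, 1 = best (as in Table 8)."""
    d = np.asarray(deficiencies, dtype=float)
    standardised = d / d.mean(axis=0, keepdims=True)
    ranks = d.argsort(axis=0).argsort(axis=0) + 1
    return standardised, ranks
```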

[Graph: average standardised deficiency (Table 9) plotted against sample size, n = 10, 20, 30, 40, for the six estimators. Key: 1 = JBT, 2 = TAK, 3 = D07, 4 = 12A, 5 = JOH, 6 = BUQL.]

The overall performance of the estimators is summarised in the overall average standardised deficiency; BUQL is best with 0.65, JBT is second with 0.83 and JOH is worst with 1.42. However, these figures hide significant differences in behaviour. From Table 10 we can see that JBT, TAK and BUQL perform better at the more moderate distributions, whereas 12A and JOH perform better at the more extreme distributions. The relative behaviour of the estimators as shown in Table 9 is illustrated in the graph above. It can be seen that BUQL is uniformly best except for sample size 10. JBT and D07 start well, but become steadily worse for larger samples; JOH starts very badly, but rapidly improves. TAK and 12A have no definite trend; they are both slightly worse than average. From the graph, a reasonable recommendation would be to use D07 (one-step Huber) for sample sizes up to 15, BUQL between 15 and 50, and either BUQL or JOH over 50.

We may also consider the rankings; the important features here are the number of times that an estimator is best or second best, and last or second last. Table 11, which gives the frequency of fifth or sixth placing, is most important, since robustness may be thought of as avoidance of relatively bad behaviour over all distributions and sample sizes. BUQL is only worst or second worst once (for sample size 10 at the Cauchy); all of the others are worst or second worst at least five times. BUQL is also best or second best most often (Table 12).

Thus our results show that BUQL is the preferred estimator and that it would be vastly superior to the others if not for its unfortunate performance at the Cauchy for sample size 10. Since BUQL is an adaptive estimator, its efficiency would be expected to suffer when the distribution is extreme and the sample size is small. (TAK, the adaptive estimator most closely related to BUQL, also does very badly at the Cauchy for sample size 10.)

Table 6: % deficiencies of estimators in simulation

            Normal                           10% C.N.
        n=10   n=20   n=30   n=40        n=10   n=20   n=30   n=40
 JBT    1.29   2.37   2.71   3.77         .49    .63    .02    .66
 TAK     .00    .00    .00    .00         .64    .10    .16    .00
 D07    2.86   4.74   4.97   6.61         .00    .95    .95   2.24
 12A    7.34   6.02   6.49   7.28        2.15   2.05   1.29   2.09
 JOH    6.12   3.49   3.80   4.22        3.55   2.57   2.49   3.36
 BUQL    .90    .98   1.30   1.98        1.00    .00    .00    .60

            Logistic                         Laplace
        n=10   n=20   n=30   n=40        n=10   n=20   n=30   n=40
 JBT     .21    .64    .57    .38        4.50   2.33   4.48   2.60
 TAK    1.15   2.70   1.02   1.67        8.32   6.42   8.33   4.60
 D07     .00    .00    .71    .45        1.71    .14   2.25   1.08
 12A    3.06    .70   2.40   1.69         .00    .00    .00    .35
 JOH    4.39   2.41   2.10   1.21        6.92   4.62   1.53    .00
 BUQL    .76    .17    .00    .00        6.67   2.14   3.20   1.34

            Cauchy
        n=10   n=20   n=30   n=40
 JBT   23.95  12.61  12.49   7.16
 TAK   36.88  14.24   8.68   3.94
 D07   23.32  12.42  14.13   9.43
 12A     .00    .00   1.33    .29
 JOH    8.36   1.74    .00    .00
 BUQL  50.16   9.50   8.32   6.03

Table 7: Standardised deficiencies (from Table 6)

            Normal                           10% C.N.
        n=10   n=20   n=30   n=40        n=10   n=20   n=30   n=40
 JBT     .42    .81    .84    .95         .38    .60    .02    .44
 TAK     .00    .00    .00    .00         .49    .10    .20    .00
 D07     .93   1.62   1.55   1.66         .00    .90   1.16   1.50
 12A    2.38   2.05   2.02   1.83        1.65   1.95   1.58   1.40
 JOH    1.98   1.19   1.18   1.06        2.72   2.45   3.04   2.25
 BUQL    .29    .33    .40    .50         .77    .00    .00    .40

            Logistic                         Laplace
        n=10   n=20   n=30   n=40        n=10   n=20   n=30   n=40
 JBT     .13    .58    .50    .42         .96    .89   1.36   1.56
 TAK     .72   2.45    .90   1.86        1.78   2.46   2.53   2.77
 D07     .00    .00    .63    .50         .36    .05    .68    .65
 12A    1.92    .63   2.12   1.88         .00    .00    .00    .21
 JOH    2.75   2.18   1.85   1.34        1.48   1.77    .46    .00
 BUQL    .48    .15    .00    .00        1.42    .82    .97    .81

            Cauchy
        n=10   n=20   n=30   n=40
 JBT    1.01   1.49   1.67   1.60
 TAK    1.55   1.69   1.16    .88
 D07     .98   1.47   1.89   2.11
 12A     .00    .00    .18    .06
 JOH     .35    .21    .00    .00
 BUQL   2.11   1.14   1.11   1.35

Table 8: Ranking of estimators in simulation

            Normal                    10% C.N.
        n=10  n=20  n=30  n=40    n=10  n=20  n=30  n=40
 JBT      3     3     3     3       2     3     2     3
 TAK      1     1     1     1       3     2     3     1
 D07      4     5     5     5       1     4     4     5
 12A      6     6     6     6       5     5     5     4
 JOH      5     4     4     4       6     6     6     6
 BUQL     2     2     2     2       4     1     1     2

            Logistic                  Laplace
        n=10  n=20  n=30  n=40    n=10  n=20  n=30  n=40
 JBT      2     3     2     2       3     4     5     5
 TAK      4     6     4     5       6     6     6     6
 D07      1     1     3     3       2     2     3     3
 12A      5     4     6     6       1     1     1     2
 JOH      6     5     5     4       5     5     2     1
 BUQL     3     2     1     1       4     3     4     4

            Cauchy
        n=10  n=20  n=30  n=40
 JBT      4     5     5     5
 TAK      5     6     4     3
 D07      3     4     6     6
 12A      1     1     2     2
 JOH      2     2     1     1
 BUQL     6     3     3     4

Table 9: Average standardised deficiencies (over distributions)

                                       Overall
        n=10   n=20   n=30   n=40      average
 JBT     .58    .87    .88    .99        .83
 TAK     .91   1.34    .96   1.10       1.08
 D07     .45    .81   1.18   1.28        .93
 12A    1.19    .93   1.18   1.08       1.10
 JOH    1.86   1.56   1.31    .93       1.42
 BUQL   1.01    .49    .50    .61        .65

Table 10: Average standardised deficiencies (over sample sizes)

        Normal   10% C.N.   Logistic   Laplace   Cauchy
 JBT      .76      .36        .41        1.19      1.44
 TAK      .00      .20       1.48        2.39      1.32
 D07     1.44      .89        .28         .44      1.61
 12A     2.07     1.65       1.64         .05       .06
 JOH     1.35     2.62       2.03         .93       .14
 BUQL     .38      .29        .16        1.01      1.43

Table 11: Number of fifth or sixth rankings

        n=10  n=20  n=30  n=40  Overall
 JBT      0     1     2     2      5
 TAK      2     3     1     2      8
 D07      0     1     2     3      6
 12A      3     2     3     2     10
 JOH      4     3     2     1     10
 BUQL     1     0     0     0      1

Table 12: Number of first or second rankings

        n=10  n=20  n=30  n=40  Overall
 JBT      2     0     2     1      5
 TAK      1     2     1     2      6
 D07      3     2     0     0      5
 12A      2     2     2     2      8
 JOH      1     1     2     2      6
 BUQL     1     3     3     3     10

By way of contrast, 12A and JOH, estimators which assume the worst and guard against it, do relatively better at extreme distributions for small sample sizes. However, unlike TAK, BUQL can be improved.

There are many subjective choices one can make while deriving BUQL. In this study they were all made to suit the requirements of the computer simulation that followed. The sample distribution function Ĉ(x) was chosen because

(a) among the set of possible estimators of F(x) which had nontrivial derivatives almost everywhere, it required the least computer time to evaluate,
(b) it was piecewise linear, thus making many integrals involving F(x) more mathematically tractable,
(c) its inverse was also piecewise linear, therefore also easy to manipulate theoretically and computationally.

Its disadvantages were that

(a) it was not robust for F(x), being sensitive even in large samples to local fluctuations,
(b) it required the assumption of a finite support for F(x) and hence the need to estimate the endpoints α and θ, even when they are infinite, which is especially difficult for extreme distributions such as the Cauchy.

Therefore it is not unreasonable to suggest that the use of a better estimator of F(x) than Ĉ(x) (especially one more robust to outliers) might somewhat improve the performance of BUQL.

The trim for BUQL used in the simulation was invariably two, in the hope of boosting efficiency at the Normal. However, the simulation results showed that BUQL is naturally very efficient at the Normal, and using a trim of three or four might benefit its efficiency at the Cauchy without unduly affecting its efficiency at the Normal.

The theoretically correct method of selecting the optimal coefficients is to perform a constrained minimisation as follows:

minimise   MSE\left( \sum_{i=t+1}^{z-t} c_i \hat{E}_z\{ X_{(i)} \} \right)

subject to   c_{z+1-i} = c_i   and   \sum_{i=t+1}^{z-t} c_i = 1 .

In the simulation, only an approximation to this minimisation procedure was used in order to save computer time.
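A sketch of the exact constrained minimisation is given below, under the usual reduction that a symmetric, sum-to-one combination is unbiased for the centre of a symmetric distribution, so that minimising MSE amounts to minimising the quadratic form c'Vc. Here V stands for an estimated covariance matrix of the z order statistics of the hypothetical sample (for instance as computed by the moment sketch earlier); the function name and 0-based indexing are ours.

```python
import numpy as np

def symmetric_min_variance_weights(V, t):
    """Minimise c'Vc over c_{t+1},...,c_{z-t} subject to c_{z+1-i} = c_i and
    sum c_i = 1 (trimmed coefficients are fixed at zero)."""
    z = V.shape[0]
    keep = np.arange(t, z - t)                 # retained order statistics (0-based)
    Vk = V[np.ix_(keep, keep)]
    k = len(keep)
    npairs = (k + 1) // 2                      # free parameters after imposing symmetry
    A = np.zeros((k, npairs))                  # maps pair-weights to the full symmetric vector
    for j in range(npairs):
        A[j, j] = 1.0
        A[k - 1 - j, j] = 1.0
    M = A.T @ Vk @ A                           # objective becomes b' M b
    ones = A.T @ np.ones(k)                    # constraint becomes ones' b = 1
    b = np.linalg.solve(M, ones)               # Lagrangian solution up to scale
    b /= ones @ b
    c = np.zeros(z)
    c[keep] = A @ b
    return c

# the estimator (17) is then mu_hat = c @ Ez_means, where Ez_means holds the
# estimated expected order statistics of the hypothetical sample
```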

Finally, the hypothetical sample size was restricted to z = 17 for reasons of computational accuracy. This restriction was investigated in detail and the results showed that large increases in efficiency are possible if larger hypothetical samples are used. The same simulations were run using the Cauchy distribution with 2000 samples, trim of two at each end, sample sizes n = 10 and n = 20 and hypothetical sample sizes z = 5, z = 9, z = 13 and z = 17. The results, expressed as percentage deficiencies, are given in Table 13 below.

Table 13: % deficiencies for BUQL for various actual and hypothetical sample sizes

                         z
   n        5        9       13       17
  10     92.23    76.23    48.29    38.16
  20     78.42    24.71    11.93     9.24

In each case the improvements are dramatic and indicate much scope for further improvement. Similar results were encountered at the other

distributions, which showed the same trend but with less dramatic improvements. Overall it is clear that much efficiency can be gained by increasing the hypothetical sample size, especially when the actual sample size is small.

In summary, we have found that BUQL is the most robust estimator available. Its general performance can be improved by some of the above suggestions, especially for small sample sizes from the extreme distributions. Detailed examination of these potential improvements became unnecessary because of the development of the "super-estimator", which we will now introduce.

A linear estimator of µ can be written as

\hat{\mu} = \sum_{i=1}^{n} c_i X_{(i)}

When n is even and c_i = c_{n+1-i} ,

\hat{\mu} = \sum_{i=1}^{n/2} 2 c_i \, \frac{ X_{(i)} + X_{(n+1-i)} }{2}

Now, (X_{(i)} + X_{(n+1-i)})/2 is a LT for µ, since its expected value is µ, and hence µ̂ is just a compound LT for µ. We may also rewrite µ̂ as

\hat{\mu} = \sum_{i=1}^{m} a_i \hat{\mu}_{(i)}

where m = n/2, a_i = 2 c_i , and

\hat{\mu}_{(i)} = \frac{ X_{(i)} + X_{(n+1-i)} }{2}

This emphasises the fact that the estimator µ̂ is a linear combination of several simpler estimators µ̂_(i). The next step is to generalise to

"'* m "' µ = l a. µ. (18) i=l 1 1 where m need not be n/2 , {ai} are arbitrary constants and {µi}"' are any LT's or estimators for µ. In particular, a sensible choice of the {µi} would be the five estimators from the PRS used in the simulation; that is, JBT, TAK, 007, 12A and JOH. Because this new estimator usesas components not raw data but several sophisticated estimators, it may be labelled a "super-estimator" . This idea is essentially identical to the idea of a compound LT in parametric MMSEE. The optimal coefficients may be found as before: v- 1 1 a * = , - - where V is the covariance matrix of the vector of estimators There is no easy analytical way to find estimates for the components of V as we did before. It is necessary to run a sub-simulation of substantial size, drawing many samples of size z from the original sample distribution function and to estimate the required covariance components from their average values in the sub- simulation. As was necessary with BUQL, we replace µi"' in (18) by Ez{µi}"' "' , which is found from the sub-simulation.

It was intended to test the super-estimator in the same way that BUQL was tested, but it was found that enormous amounts of computer time were required, even for small simulations. Nevertheless, some small simulations (100 samples per simulation rather than 2000) showed that the super-estimator behaved much like BUQL, except that it was never among the worst behaved, even at the Cauchy, and that, as with BUQL, the efficiency of the super-estimator improved greatly for larger hypothetical sample sizes.

The limited simulation results suggest that the super-estimator is not strongly influenced by the worst features of its component estimators, and this is what is required of a robust estimator.

Our super-estimator is related to Switzer's [20] estimator, which arises from the idea "... that the sample itself should be used to distinguish which one of several competing estimators is most efficient for the unknown f from which the sample was drawn. To be able to use the sample in this way requires that the competing estimators be such that their standard errors can also be estimated without making use of the unknown shape f."

There are other improvements which may be made to the super-estimator apart from larger hypothetical sample sizes and more robust estimators for F(x). We should include among our component estimators not just the most robust estimators for the range of distributions from the Normal to the Cauchy, but also those such as the sample midrange which are good for distributions with low kurtosis, and even parametric estimators for all the common distributions. In short, we should average over as many unbiased estimators of the centre of a symmetric distribution as possible, since each may have some useful features. The sizes of the coefficients of the estimators are determined by their covariances, which are estimated using a sub-simulation from the estimated distribution function.

By applying the method of MMSEE to a super-estimator with as many component estimators as possible, we greatly reduce the incidence of extreme behaviour and this is the aim of robust methods.

The estimation of the centre of a symmetric distribution is not the only distribution-free problem which we can solve using the method

of MMSEE. We may write the model for the previous problem as

Y_i = \mu + \varepsilon_i ; \qquad i = 1, 2, \ldots, n

where Y_i is a data value, µ is the centre parameter and ε_i is

an error term with distribution symmetric about zero. By giving µ a different structure, we may solve more complex statistical problems.

If we let µ_i = α + β x_i, where α and β are unknown parameters and {x_i} is a set of known constants, we obtain the simple linear regression problem.

There are numerous estimators for α and β, including Least Squares, Huber's M-estimators, slice regression, standardised sum and difference, bivariate trimming, etc. We may treat each one of these estimators as a LT for the corresponding parameter and construct a "super-estimator" for each parameter as a linear combination of the LT's, just as we did for the super-estimator of µ in the original problem:

\hat{\alpha}^{*} = \sum_{i=1}^{m} c_i \hat{\alpha}_i ; \qquad \hat{\beta}^{*} = \sum_{i=1}^{m} d_i \hat{\beta}_i          (19)

The optimal values for c and d are, as before,

c^{*} = \frac{ V^{-1} 1 }{ 1^{T} V^{-1} 1 } \qquad and \qquad d^{*} = \frac{ W^{-1} 1 }{ 1^{T} W^{-1} 1 }          (20)

where V is the covariance matrix of the vector of estimators (α̂_1, α̂_2, ..., α̂_m) and W is the covariance matrix of (β̂_1, β̂_2, ..., β̂_m).

The estimation procedure for the components of V and W is rather more complex than before. Starting with robust estimators, say

Huber's M-estimators α̂ and β̂ of α and β, we compute the residuals

\hat{\varepsilon}_i = Y_i - ( \hat{\alpha} + \hat{\beta} x_i ) ; \qquad i = 1, 2, \ldots, n .

We now use these residuals to estimate the distribution function F(ε) of the errors {ε_i}, using Ĉ as before. We may then draw many samples of size n from this distribution estimate and use the values of the coefficient estimates computed from these samples to estimate

Cov{α̂_i, α̂_j} and Cov{β̂_i, β̂_j} in the usual empirical way. This is essentially Efron's bootstrap procedure ([7], pp. 17-18), but using an improved distribution estimator, and applied to find the joint sampling distribution of several estimators; that is, a multi-estimator bootstrap. The estimates of V and W may be used in (20) to estimate c* and d*, which may in turn be used in (19) as weightings for the original parameter estimates from the sample, (α̂_1, α̂_2, ..., α̂_m) and (β̂_1, β̂_2, ..., β̂_m), to produce optimal estimates α̂* and β̂*. Finally, these optimal estimates can be used to find improved residual estimates

\hat{\varepsilon}_i = Y_i - ( \hat{\alpha}^{*} + \hat{\beta}^{*} x_i ) ; \qquad i = 1, 2, \ldots, n ,

and this process repeated until convergence occurs.
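The following sketch illustrates one cycle of this scheme under simplifying assumptions: ordinary resampling of residuals stands in for sampling from the smoother estimate of F(ε), the list of component estimators is supplied by the user, and all names are illustrative rather than the thesis's own code.

```python
import numpy as np

def regression_super_estimator(x, y, estimators, n_boot=300, n_iter=3, rng=None):
    """Multi-estimator bootstrap for simple linear regression.  `estimators` is a
    list of functions (x, y) -> (alpha_hat, beta_hat), e.g. least squares and
    several robust fits.  Residuals from a current fit are resampled, the
    covariance matrices V and W of the component estimates are formed, and the
    weights (20) are applied as in (19); the cycle is repeated a few times."""
    rng = np.random.default_rng() if rng is None else rng
    m, n = len(estimators), len(y)

    def weights(cov):
        w = np.linalg.solve(cov, np.ones(m))
        return w / w.sum()                      # V^{-1}1 / (1'V^{-1}1)

    alpha, beta = estimators[0](x, y)           # initial fit (ideally a robust one)
    for _ in range(n_iter):
        resid = y - (alpha + beta * x)
        ab = np.empty((n_boot, m, 2))
        for b in range(n_boot):
            y_star = alpha + beta * x + rng.choice(resid, size=n, replace=True)
            ab[b] = [est(x, y_star) for est in estimators]
        V = np.cov(ab[:, :, 0], rowvar=False)   # covariances of the alpha estimates
        W = np.cov(ab[:, :, 1], rowvar=False)   # covariances of the beta estimates
        a_hat = np.array([est(x, y)[0] for est in estimators])
        b_hat = np.array([est(x, y)[1] for est in estimators])
        alpha = weights(V) @ a_hat              # alpha* as in (19)
        beta = weights(W) @ b_hat               # beta*  as in (19)
    return alpha, beta
```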

This technique is easily generalised to multiple regression and other similar problems.

Essentially, this is not a new estimation method, but an optimal way to choose among the many existing estimators, adapting to the error structure indicated by the data.

Appendix: Estimation of Endpoint Parameters

The endpoint parameters of a distribution are the bounds (lower and upper) which delimit the range of the random variable defined by the distribution. If the form of the distribution is not known, it is not unreasonable to assume that its endpoint parameters are finite and can be estimated from a random sample from the distribution. The interval [α, θ] is sometimes referred to as the support of the distribution function F(x), since outside this range the density f(x) is assumed to be identically zero.

The method of MMSEE suggests that a reasonable form for an estimator of an endpoint parameter is a linear combination of order statistics, and indeed all previous work in this area centres on the choice of the appropriate coefficients for this linear combination (Robson and Whitlock [18], Cooke [4]). The present work is, fundamentally, an improvement on Cooke's method, using a different estimate for F(x).

MMSEE suggests that we should estimate α or θ by the appropriate extreme order statistic, minus an estimate of its bias.

Integrating by parts, we have

E\{ X_{(n)} \} = \int_{\alpha}^{\theta} x \, n F^{n-1}(x) f(x) \, dx

             = \theta - \int_{\alpha}^{\theta} F^{n}(x) \, dx          (A-1)

Cooke [4] uses the usual empirical distribution function as an estimator for F(x), but it is clear from Read [17] that the following estimator Ĉ(x) is a better, albeit more complicated, estimator:

for i = 0, 1, 2, ..., n, let Δ_i = X_(i+1) - X_(i), where X_(0) and X_(n+1) are defined to be α and θ respectively; then

\hat{C}(x) = \frac{1}{n+1} \left( i + \frac{ x - X_{(i)} }{ \Delta_i } \right) , \qquad X_{(i)} \le x \le X_{(i+1)} .
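For reference, a minimal construction of Ĉ(x) and its piecewise-linear inverse is sketched below, assuming estimates of the endpoints α and θ are available (they are derived at the end of this Appendix) and lie outside the range of the data; the function name is ours.

```python
import numpy as np

def make_C_hat(x, alpha_hat, theta_hat):
    """Piecewise-linear estimated distribution function C_hat and its inverse,
    built as defined above: knots at alpha_hat, X_(1), ..., X_(n), theta_hat,
    with C_hat values 0, 1/(n+1), ..., n/(n+1), 1.  Assumes alpha_hat < X_(1)
    and theta_hat > X_(n) so that the knots are strictly increasing."""
    xs = np.sort(np.asarray(x, dtype=float))
    knots = np.concatenate(([alpha_hat], xs, [theta_hat]))
    probs = np.arange(len(xs) + 2) / (len(xs) + 1)
    C_hat = lambda t: np.interp(t, knots, probs)       # linear on each [X_(i), X_(i+1)]
    C_hat_inv = lambda u: np.interp(u, probs, knots)   # piecewise-linear inverse
    return C_hat, C_hat_inv
```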

Substituting Ĉ(x) for F(x) in (A-1) and simplifying, we obtain

h(1)\,\hat{\alpha} + h(n)\,\hat{\theta} = X_{(n)} - S_n          (A-2)

where h(i) = \left( \frac{i}{n+1} \right)^{n+1} ,  a(i) = h(i+1) - 2h(i) + h(i-1)

and S_n = \sum_{i=1}^{n} a(i) X_{(i)} .

We now follow a similar procedure for α to obtain

h(n)\,\hat{\alpha} + h(1)\,\hat{\theta} = X_{(1)} - S_1          (A-3)

where S_1 = \sum_{i=1}^{n} a(i) X_{(n+1-i)} .

Solving (A-2) and (A-3) simultaneously for the estimates of α and θ, we have

\hat{\alpha} = \frac{ h(n) ( X_{(1)} - S_1 ) - h(1) ( X_{(n)} - S_n ) }{ h^{2}(n) - h^{2}(1) } ,

\hat{\theta} = \frac{ h(n) ( X_{(n)} - S_n ) - h(1) ( X_{(1)} - S_1 ) }{ h^{2}(n) - h^{2}(1) } .

It is easy to show that these estimators are location-invariant and, in the case where X has a Uniform distribution, they are unbiased.
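A direct implementation of these estimators, using the reconstruction of h(i) given above, might look as follows (the function name and the Uniform check are ours):

```python
import numpy as np

def endpoint_estimates(x):
    """Endpoint estimates from (A-2) and (A-3), with h(i) = [i/(n+1)]^(n+1),
    a(i) = h(i+1) - 2h(i) + h(i-1), S_n = sum a(i) X_(i), S_1 = sum a(i) X_(n+1-i)."""
    xs = np.sort(np.asarray(x, dtype=float))
    n = len(xs)
    h = (np.arange(n + 2) / (n + 1)) ** (n + 1)   # h(0), ..., h(n+1)
    a = h[2:] - 2 * h[1:-1] + h[:-2]              # a(1), ..., a(n)
    S_n = a @ xs                                  # sum_i a(i) X_(i)
    S_1 = a @ xs[::-1]                            # sum_i a(i) X_(n+1-i)
    det = h[n] ** 2 - h[1] ** 2
    alpha_hat = (h[n] * (xs[0] - S_1) - h[1] * (xs[-1] - S_n)) / det
    theta_hat = (h[n] * (xs[-1] - S_n) - h[1] * (xs[0] - S_1)) / det
    return alpha_hat, theta_hat

# for a Uniform sample the estimates should sit close to the true endpoints,
# consistent with the unbiasedness remark above
print(endpoint_estimates(np.random.default_rng(1).uniform(2.0, 5.0, size=20)))
```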

Bibliography

[1] Andrews, D.F. et al. (1972). "Robust Estimates of Location: Survey and Advances". Princeton University Press, New Jersey.

[2] Arnold, B.C. (1970). "Inadmissibility of the Usual Scale Estimate for a Shifted Exponential Distribution". Journal of the American Statistical Association, Volume 65, pp. 1260-1264.

[3] Brewster, J.F. (1974). "Alternative estimators for the scale parameter of the Exponential distribution with unknown location". The Annals of Statistics, Volume 2, pp. 553-557.

[4] Cooke, Peter (1979). "Statistical inference for bounds of random variables". Biometrika, Volume 66, pp. 367-374.

[5] Cox, D.R. and Hinkley, D.V. (1974). "Theoretical Statistics". Chapman and Hall, London, pp. 265-267.

[6] Drastik, V.C. (1982). "Minimum Mean Squared Error Estimation". Paper to the 6th Australian Statistical Conference, Melbourne University, August 1982.

[7] Efron, B. (1979). "Bootstrap Methods: Another Look at the Jackknife". The Annals of Statistics, Volume 7, pp. 1-26.

[8] Gerald, C.F. (1970). "Applied Numerical Analysis". Addison-Wesley, USA, pp. 94-110.

[9] Hogg, R.V. (1967). "Some observations on robust estimation". Journal of the American Statistical Association, Volume 62, pp. 1179-1186.

[10] Hogg, R.V. (1974). "Adaptive Robust Procedures: A Partial Review and Some Suggestions for Future Applications and Theory". Journal of the American Statistical Association, Volume 69, pp. 909-927.

[11] Johns, M.V., Jr. (1974). "Nonparametric estimation of location". Journal of the American Statistical Association, Volume 69, pp. 453-460.

[12] Kendall, M.G. and Stuart, A. (1961). "The Advanced Theory of Statistics". Volume 2, Third Edition, Griffin, London, p. 33, Exercise 17.16.

[13] Kendall, M.G. and Stuart, A. (1973). "The Advanced Theory of Statistics". Volume 2, Third Edition (3 volume edition), Griffin, London, pp. 21-22.

[14] Lloyd, E.H. (1952). "Least-squares estimation of location and scale parameters using order statistics". Biometrika, Volume 39, pp. 88-95.

[15] Markowitz, E. (1968). "Minimum mean-square-error estimation of the standard deviation of the Normal distribution". American Statistician, Volume 22, Number 3, p. 26.

[16] Pitman, E.J.G. (1938). "The estimation of the location and scale parameters of a continuous population of any given form". Biometrika, Volume 30, pp. 391-421.

[17] Read, P.R. (1972). "The asymptotic inadmissibility of the sample distribution function". The Annals of Mathematical Statistics, Volume 43, pp. 89-95.

[18] Robson, D.S. and Whitlock, J.H. (1964). "Estimation of a truncation point". Biometrika, Volume 51, pp. 33-39.

[19] Sarhan, A.E. (1954). "Estimation of the mean and standard deviation by order statistics". Annals of Mathematical Statistics, Volume 25, pp. 317-328.

[20] Switzer, P. (1972). "Efficiency Robustness of Estimators". Proceedings of the 6th Berkeley Symposium on Mathematical Statistics and Probability, June-July 1970, Volume 1, pp. 283-291.

[21] Takeuchi, Kei (1971). "A Uniformly Asymptotically Efficient Estimator of a Location Parameter". Journal of the American Statistical Association, Volume 66, pp. 292-301.

[22] Thompson, J.R. (1968). "Some shrinkage techniques for estimating the mean". Journal of the American Statistical Association, Volume 63, pp. 113-122.