The Calculus of M-Estimation

Leonard A. Stefanski and Dennis D. Boos

Abstract

Since the groundbreaking papers by Huber in the 1960s, M-estimation methods (estimating equations) have been increasingly important for asymptotic analysis and approximate inference. Now, with the prevalence of programs like Maple and Mathematica, calculation of asymptotic variances in complex problems is just a matter of routine manipulation. The intent of this article is to illustrate the depth and generality of the M-estimation approach and thereby facilitate its use.

KEY WORDS: Asymptotic variance; Central Limit Theorem; Estimating equations; Maple; M-estimator.

Institute of Statistics Mimeo Series No. [...], March [...]

Leonard A. Stefanski and Dennis D. Boos are Professors, Department of Statistics, North Carolina State University, Raleigh, NC. Email addresses are stefanski@stat.ncsu.edu, boos@stat.ncsu.edu.

1 INTRODUCTION

M-estimators are solutions of the vector equation $\sum_{i=1}^n \psi(\mathbf{Y}_i, \theta) = 0$. That is, the M-estimator $\hat\theta$ satisfies

$$\sum_{i=1}^n \psi(\mathbf{Y}_i, \hat\theta) = 0. \tag{1}$$

Here we are assuming that $\mathbf{Y}_1, \ldots, \mathbf{Y}_n$ are independent but not necessarily identically distributed, $\theta$ is a $p$-dimensional parameter, and $\psi$ is a known $(p \times 1)$-function that does not depend on $i$ or $n$. In this description $\mathbf{Y}_i$ represents the $i$th datum. In some applications it is advantageous to emphasize the dependence of $\psi$ on particular components of $\mathbf{Y}_i$. For example, in a regression problem $\mathbf{Y}_i = (x_i, Y_i)$ and (1) would typically be written

$$\sum_{i=1}^n \psi(Y_i, x_i, \hat\theta) = 0, \tag{2}$$

where $x_i$ is the $i$th regressor.

Huber introduced M-estimators and their asymptotic properties, and they were an important part of the development of modern robust statistics. Liang and Zeger helped popularize M-estimators in the biostatistics literature under the name generalized estimating equations (GEE). Obviously many others have made important contributions. For example, Godambe introduced the concept of an optimum estimating function in an M-estimator context, and that paper could be called a forerunner of the M-estimator approach.

However, our goal is not to document the development of M-estimators or to give a bibliography of contributions to the literature. Rather, we want to show that the M-estimator approach is simple, powerful, and more widely applicable than many readers imagine. We want students to feel comfortable finding and using the asymptotic approximations that flow from the method. The key advantage is that a very large class of asymptotically normal statistics, including delta method transformations, can be put in the general M-estimator framework. This unifies large sample approximation methods, simplifies analysis, and makes computations routine. An important practical consideration is the availability of a symbolic manipulation program like Maple; otherwise, the matrix calculations can be overwhelming. Nevertheless, the general theory is straightforward.

We claim that many estimators not thought of as M-estimators can be written in the form of M-estimators. Consider as a simple example the mean deviation from the sample mean,

$$\hat\theta_1 = \frac{1}{n} \sum_{i=1}^n |Y_i - \bar{Y}|.$$

Is this an M-estimator? There is certainly no single equation of the form

$$\sum_{i=1}^n \psi(Y_i, \theta) = 0$$

that yields $\hat\theta_1$. Moreover, there is no family of densities $f(y; \theta)$ such that $\hat\theta_1$ is a component of the maximum likelihood estimator of $\theta$. But if we let $\psi_1(y, \theta_1, \theta_2) = |y - \theta_2| - \theta_1$ and $\psi_2(y, \theta_1, \theta_2) = y - \theta_2$, then

$$\sum_{i=1}^n \psi(Y_i, \hat\theta_1, \hat\theta_2) = \begin{pmatrix} \sum_{i=1}^n \left( |Y_i - \hat\theta_2| - \hat\theta_1 \right) \\ \sum_{i=1}^n \left( Y_i - \hat\theta_2 \right) \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$$

yields $\hat\theta_2 = \bar{Y}$ and $\hat\theta_1 = n^{-1} \sum_{i=1}^n |Y_i - \bar{Y}|$. We like to use the term "partial M-estimator" for an estimator that is not naturally an M-estimator until additional $\psi$ functions are added. The key idea is simple: any estimator that would be an M-estimator if certain parameters were known is a partial M-estimator, because we can "stack" $\psi$ functions for each of the unknown parameters. This aspect of M-estimators is related to the general approach of Randles for replacing unknown parameters by estimators.

From the above example it should be obvious that we can replace $\hat\theta_2 = \bar{Y}$ by any other estimator defined by an estimating equation, for example, the sample median. Moreover, we can also add $\psi$ functions to give delta method asymptotic results for transformations like $\hat\theta_3 = \log(\hat\theta_1)$. In this latter context, there are connections to Benichou and Gail. Certainly the combination of standard influence curve and delta theorem methodology can handle a larger class of problems than this enhanced M-estimation approach. However, we believe that the combination of a single approach along with a symbolic manipulator like Maple will make this M-estimation approach much more likely to be used in complex problems.

A description of the basic approach is given in Section 2, along with a few examples. Connections to the influence curve are given in Section 3, and then extensions for nonsmooth $\psi$ functions are given in Section 4. Extensions for regression are given in Section 5. A discussion of some testing problems is given in Section 6, and Section 7 summarizes the key features of the M-estimator method.

2 The Basic Approach

M-estimators solve (1), where the vector function $\psi$ must be a known function that does not depend on $i$ or $n$. For regression situations, the argument of $\psi$ will be expanded to depend on regressors $x_i$, but the basic $\psi$ will still not depend on $i$. For the moment we will confine ourselves to the iid case, where $Y_1, \ldots, Y_n$ are iid (possibly vector-valued) with distribution function $F$. The true parameter value $\theta_0$ is defined by

$$E_F\, \psi(Y_1, \theta_0) = \int \psi(y, \theta_0)\, dF(y) = 0. \tag{3}$$

For example, if $\psi(Y_i, \theta) = Y_i - \theta$, then clearly the population mean $\theta_0 = \int y\, dF(y)$ is the unique solution of $\int (y - \theta)\, dF(y) = 0$.

If there is one unique $\theta_0$ satisfying (3), then in general there exists a sequence of M-estimators $\hat\theta$ such that the weak law of large numbers leads to $\hat\theta \xrightarrow{p} \theta_0$ as $n \to \infty$. Furthermore, if $\psi$ is suitably smooth, then Taylor expansion of $G_n(\theta) = n^{-1} \sum_{i=1}^n \psi(Y_i, \theta)$ gives

$$0 = G_n(\hat\theta) = G_n(\theta_0) + G_n'(\theta_0)(\hat\theta - \theta_0) + R_n,$$

where $G_n'(\theta) = \partial G_n(\theta) / \partial \theta^T$. For $n$ sufficiently large, we expect $G_n'(\theta_0)$ to be nonsingular, so that we can rearrange and get

$$\sqrt{n}\,(\hat\theta - \theta_0) = \left[ -G_n'(\theta_0) \right]^{-1} \sqrt{n}\, G_n(\theta_0) + \sqrt{n}\, R_n^*, \tag{4}$$

with $R_n^* = \left[ -G_n'(\theta_0) \right]^{-1} R_n$. Under suitable regularity conditions, as $n \to \infty$,

$$-G_n'(\theta_0) = \frac{1}{n} \sum_{i=1}^n \left[ -\psi'(Y_i, \theta_0) \right] \xrightarrow{p} E\left[ -\psi'(Y_1, \theta_0) \right] = A(\theta_0), \tag{5}$$

$$\sqrt{n}\, G_n(\theta_0) \xrightarrow{d} \mathrm{MVN}\left( 0, B(\theta_0) \right), \quad \text{where } B(\theta_0) = E\left[ \psi(Y_1, \theta_0)\, \psi(Y_1, \theta_0)^T \right], \tag{6}$$

$$\sqrt{n}\, R_n^* \xrightarrow{p} 0, \tag{7}$$

where $\psi'(y, \theta) = \partial \psi(y, \theta) / \partial \theta^T$.
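As a quick check of the notation, it may help to write out (4) through (7) for the sample-mean example $\psi(Y_i, \theta) = Y_i - \theta$ used above. The symbol $\sigma^2 = \mathrm{Var}(Y_1)$ is introduced here only for illustration and is assumed finite.

$$G_n(\theta) = \bar{Y} - \theta, \qquad G_n'(\theta) = -1, \qquad R_n = 0,$$

$$\sqrt{n}\,(\hat\theta - \theta_0) = \sqrt{n}\,(\bar{Y} - \theta_0) = \sqrt{n}\, G_n(\theta_0),$$

$$A(\theta_0) = E\left[ -\psi'(Y_1, \theta_0) \right] = 1, \qquad B(\theta_0) = E\left[ (Y_1 - \theta_0)^2 \right] = \sigma^2.$$

In this case the expansion is exact, the remainder condition holds trivially, and the limit statement for $\sqrt{n}\, G_n(\theta_0)$ is just the classical Central Limit Theorem for the sample mean.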
If $A(\theta_0)$ exists, the Weak Law of Large Numbers gives (5). If $B(\theta_0)$ exists, then (6) follows from the Central Limit Theorem. The hard part to prove is (7). Huber was the first to give general results for (7), but there have been many others since then. We shall be content to observe that (7) holds in most regular situations where there are sufficient smoothness conditions on $\psi$ and $\theta$ has fixed dimension $p$ as $n \to \infty$. Putting (1) and (4) through (7) together with Slutsky's Theorem, we have that

$$\hat\theta \text{ is } \mathrm{AMN}\left( \theta_0, \frac{V(\theta_0)}{n} \right) \text{ as } n \to \infty, \tag{8}$$

where $V(\theta_0) = A(\theta_0)^{-1} B(\theta_0) \{A(\theta_0)^{-1}\}^T$ and AMN means "asymptotically multivariate normal." The limiting covariance $V(\theta_0)$ is called the sandwich matrix because the "meat" $B(\theta_0)$ is placed between the "bread" $A(\theta_0)^{-1}$ and $\{A(\theta_0)^{-1}\}^T$.

Extension. Suppose that instead of (1), $\hat\theta$ satisfies

$$\sum_{i=1}^n \psi(Y_i, \hat\theta) = c_n, \tag{9}$$

where $c_n / \sqrt{n} \xrightarrow{p} 0$ as $n \to \infty$. Following the above arguments and noting that $c_n / \sqrt{n}$ is absorbed in $\sqrt{n}\, R_n^*$ of (4), we can see that as long as (5), (6), and (7) hold, then (8) will also hold. This extension allows us to cover a much wider class of statistics, including empirical quantiles, estimators whose $\psi$ function depends on $n$, and Bayesian estimators.

For maximum likelihood estimation, $\psi(y, \theta) = \partial \log f(y; \theta) / \partial \theta$ is often called the score function. If the data truly come from the assumed parametric family $f(y; \theta)$, then $A(\theta_0) = B(\theta_0) = I(\theta_0)$, the information matrix. In this case the sandwich matrix $V(\theta_0)$ reduces to the usual $I(\theta_0)^{-1}$. One of the key contributions of M-estimation theory has been to point out what happens when the assumed parametric family is not correct. In such cases there is often a well-defined $\theta_0$ satisfying (3) and $\hat\theta$ satisfying (8), but $A(\theta_0) \neq B(\theta_0)$, and valid inference should be carried out using the correct limiting covariance matrix $V(\theta_0) = A(\theta_0)^{-1} B(\theta_0) \{A(\theta_0)^{-1}\}^T$, not $I(\theta_0)^{-1}$.

Using the left-hand side of (5), we define the empirical estimator of $A(\theta_0)$ by

$$A_n(\mathbf{Y}, \hat\theta) = -G_n'(\hat\theta) = \frac{1}{n} \sum_{i=1}^n \left[ -\psi'(Y_i, \hat\theta) \right].$$

Note that for maximum likelihood estimation, $n A_n(\mathbf{Y}, \hat\theta)$ is the observed information matrix $I_{n\mathbf{Y}}(\hat\theta)$. Similarly, the empirical estimator of $B(\theta_0)$ is

$$B_n(\mathbf{Y}, \hat\theta) = \frac{1}{n} \sum_{i=1}^n \psi(Y_i, \hat\theta)\, \psi(Y_i, \hat\theta)^T.$$

Putting these together yields the empirical sandwich estimator

$$V_n(\mathbf{Y}, \hat\theta) = A_n(\mathbf{Y}, \hat\theta)^{-1}\, B_n(\mathbf{Y}, \hat\theta)\, \{A_n(\mathbf{Y}, \hat\theta)^{-1}\}^T.$$

$V_n(\mathbf{Y}, \hat\theta)$ will generally be consistent for $V(\theta_0)$ under mild regularity conditions (see Iverson and Randles for a general theorem on this convergence).

$V_n(\mathbf{Y}, \hat\theta)$ requires no analytic work beyond specifying the $\psi$ function. In some problems, it is simpler to work directly with the limiting form $V(\theta_0) = A(\theta_0)^{-1} B(\theta_0) \{A(\theta_0)^{-1}\}^T$ and just plug in estimators for $\theta_0$ and any other unknown quantities in $V(\theta_0)$. The notation $V(\theta_0)$ suggests that $\theta_0$ is the only unknown quantity in $V(\theta_0)$, but in reality $V(\theta_0)$ often involves higher moments or other characteristics of the true distribution function $F$ of $Y_i$. In fact, there is a range of possibilities for estimating $V(\theta_0)$, depending on what model assumptions are used. For simplicity, we will just use the notation $V_n(\mathbf{Y}, \hat\theta)$ for the purely empirical estimator and $V(\hat\theta)$ for any of the expected value plus model assumption versions.

For maximum likelihood estimation with a correctly specified family, the three competing estimators for $I(\theta_0)^{-1}$ are $V_n(\mathbf{Y}, \hat\theta)$, $\left[ I_{n\mathbf{Y}}(\hat\theta) / n \right]^{-1} = A_n(\mathbf{Y}, \hat\theta)^{-1}$, and $I(\hat\theta)^{-1} = V(\hat\theta)$. In this case the standard estimators $I(\hat\theta)^{-1}$ and $\left[ I_{n\mathbf{Y}}(\hat\theta) / n \right]^{-1}$ are generally more efficient than $V_n(\mathbf{Y}, \hat\theta)$ for estimating $I(\theta_0)^{-1}$.
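The empirical sandwich estimator $V_n = A_n^{-1} B_n \{A_n^{-1}\}^T$ can be sketched numerically in a few lines. The example below is only an illustration under stated assumptions: the stacked function $\psi(y, \theta) = (y - \theta_1,\ (y - \theta_1)^2 - \theta_2)^T$ (sample mean and variance) is chosen here for convenience and is not taken from the text above, the data are made up, and all helper names are hypothetical. Plain Python is used so that the 2 x 2 matrix algebra stays visible.

```python
def psi(y, theta):
    """Illustrative stacked psi (an assumption): mean and variance equations."""
    t1, t2 = theta
    r = y - t1
    return [r, r * r - t2]


def neg_psi_prime(y, theta):
    """Minus the derivative of psi with respect to theta^T, as a 2x2 matrix."""
    t1, _ = theta
    return [[1.0, 0.0], [2.0 * (y - t1), 1.0]]


def inv2(m):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul2(x, y):
    """Product of two 2x2 matrices."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def transpose2(m):
    return [[m[j][i] for j in range(2)] for i in range(2)]


def empirical_sandwich(ys):
    """Return (theta_hat, A_n, B_n, V_n) for the illustrative psi above."""
    n = len(ys)
    # Solve sum_i psi(Y_i, theta) = 0 in closed form for this psi.
    t1 = sum(ys) / n
    t2 = sum((y - t1) ** 2 for y in ys) / n
    theta_hat = (t1, t2)
    # A_n: average of -psi'(Y_i, theta_hat).
    a_n = [[sum(neg_psi_prime(y, theta_hat)[i][j] for y in ys) / n
            for j in range(2)] for i in range(2)]
    # B_n: average of the outer products psi psi^T at theta_hat.
    b_n = [[sum(psi(y, theta_hat)[i] * psi(y, theta_hat)[j] for y in ys) / n
            for j in range(2)] for i in range(2)]
    a_inv = inv2(a_n)
    v_n = matmul2(matmul2(a_inv, b_n), transpose2(a_inv))
    return theta_hat, a_n, b_n, v_n


ys = [2.0, 4.0, 4.0, 5.0, 10.0]  # made-up data for illustration
theta_hat, A_n, B_n, V_n = empirical_sandwich(ys)
print("theta_hat:", theta_hat)
print("A_n:", A_n)
print("B_n:", B_n)
print("V_n:", V_n)
```

For this particular $\psi$, $A_n$ evaluated at $\hat\theta$ is the identity matrix, so $V_n$ coincides with $B_n$; dividing $V_n$ by $n$ gives the estimated covariance of $\hat\theta$ implied by the AMN result. With a less convenient $\psi$ the derivative matrix would typically come from a symbolic or numerical differentiation step instead of being coded by hand.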
