Introduction to Empirical Processes and Semiparametric Inference Lecture 04: Overview Continued
Michael R. Kosorok, Ph.D.
Professor and Chair of Biostatistics
Professor of Statistics and Operations Research
University of North Carolina-Chapel Hill

We say that a map $\phi : \mathbb{D}_\phi \subset \mathbb{D} \to \mathbb{E}$ is Gâteaux-differentiable at $\theta \in \mathbb{D}_\phi$ if there exists a map $\phi_\theta' : \mathbb{D} \to \mathbb{E}$ such that
$$\frac{\phi(\theta + th) - \phi(\theta)}{t} - \phi_\theta'(h) \to 0$$
as $t \downarrow 0$, where $\theta + th \in \mathbb{D}_\phi$ for every $t > 0$ small enough.

We say that the map is Hadamard-differentiable if there exists a continuous linear map $\phi_\theta' : \mathbb{D} \to \mathbb{E}$ such that
$$\sup_{h \in K} \left\| \frac{\phi(\theta + th) - \phi(\theta)}{t} - \phi_\theta'(h) \right\| \to 0, \qquad (1)$$
for every compact set $K \subset \mathbb{D}$, where $\theta + th \in \mathbb{D}_\phi$ for every $t > 0$ small enough.

We say that the map is Fréchet-differentiable if condition (1) holds for every bounded subset $K \subset \mathbb{D}$.

Semiparametrics: Models and Efficiency

A statistical model is a collection of probability measures $\{P \in \mathcal{P}\}$ that specify the distribution of a random observation $X$.

Consider the linear model $Y = \beta' Z + e$, where the observed data is $X = (Y, Z)$, $\beta \in \mathbb{R}^k$ is unknown, and the joint distribution of $(Z, e)$ satisfies
• $E(e \mid Z) = 0$ almost surely,
• $E(e^2 \mid Z) \le K < \infty$ almost surely, and
• the distribution of $(Z, e)$ is otherwise unspecified (mostly); a simulation sketch of this model is given at the end of this part.

In this instance, the model $\mathcal{P}$ has
• an unknown parameter $\beta$ of interest and
• some other unknown components of secondary interest but which allow flexibility (the joint distribution of $(Z, e)$).

In general, a model $\mathcal{P}$ has
• a parameter of interest (often denoted $\psi$) and
• several partially unspecified components not of interest.

For semiparametric models, $\psi = (\theta, \eta)$, where
• $\theta$ is finite-dimensional and usually of primary interest, and
• $\eta$ is infinite-dimensional and usually of secondary interest.
• There may be other components not of interest which are not parameterized at all.
• $\theta$ and $\eta$ may have multiple subcomponents.
• Choices of parameter names are contextual and vary in the literature, although there are some consistencies.

The goals of semiparametric inference are primarily to:
• select an appropriate model for inference on $X$;
• estimate one or more subcomponents of $\psi$; sometimes $\theta$ alone is the focus;
• conduct inference (e.g., confidence intervals) for the parameters of interest;
• try to obtain efficient estimators.

The standard set-up for inference we will use in this book is where estimation and inference are based on a sample $X_1, \ldots, X_n$ of i.i.d. realizations of a distribution $P \in \mathcal{P}$.

The generic parameter of interest can be viewed as a function $\psi(P)$ of the distribution, since if we know the distribution, we know the parameter value.

An estimator $T_n$ (based on the data $X_1, \ldots, X_n$) is efficient for $\psi(P)$ if the limiting variance $V$ of $\sqrt{n}(T_n - \psi(P))$ is the smallest among all regular estimators of $\psi(P)$. The inverse of $V$ is the information for $T_n$.

The optimal efficiency of a parameter $\psi(P)$ depends on the complexity of the model $\mathcal{P}$. Estimation under $\mathcal{P}$ is more taxing than estimation under any parametric submodel $\mathcal{P}_0 = \{P_\theta : \theta \in \Theta_0\} \subset \mathcal{P}$, where $\Theta_0$ is finite-dimensional.
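To make the linear model above concrete, here is a minimal simulation sketch (my illustration, not from the lecture; the error distribution, sample size, and true $\beta$ are arbitrary assumptions). It draws heteroscedastic, non-Gaussian errors satisfying only the two moment conditions of the model and checks that ordinary least squares still recovers $\beta$:

```python
import numpy as np

# A minimal sketch of the semiparametric linear model Y = beta'Z + e, where
# only E(e|Z) = 0 a.s. and a bounded conditional second moment E(e^2|Z) <= K
# are assumed.  Ordinary least squares remains a consistent, asymptotically
# linear estimator of beta under these weak assumptions.
rng = np.random.default_rng(0)
n, beta = 100_000, np.array([1.0, -0.5])

Z = rng.normal(size=(n, 2))
# Heteroscedastic, non-Gaussian errors: the t(5) factor has mean zero given Z,
# and the scale factor lies in (0, 1), so E(e^2 | Z) is bounded as required.
scale = 0.5 + 0.5 * np.tanh(Z[:, 0])
e = scale * rng.standard_t(df=5, size=n)
Y = Z @ beta + e

beta_hat, *_ = np.linalg.lstsq(Z, Y, rcond=None)
print(beta_hat)  # close to [1.0, -0.5] although (Z, e) is otherwise unspecified
```

Under the Gaussian-error parametric submodel discussed next, OLS is in fact the maximum likelihood estimator of $\beta$.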
For example, in the linear regression model $\mathcal{P}$ setting above, if we assume $Z$ and $e$ are independent and $e$ is $N(0, \sigma^2)$, where $\sigma^2$ is unknown, and call this model $\mathcal{P}_0$, then $\mathcal{P}_0$ is a parametric submodel of $\mathcal{P}$.

Thus, and in general, the information for $\mathcal{P}$ is worse (less) than the information for any parametric submodel $\mathcal{P}_0 \subset \mathcal{P}$.

For a semiparametric model $\mathcal{P}$, if the information for a regular estimator $T_n$ (to estimate $\psi(P)$) equals the minimum (infimum) of the Fisher informations for all parametric submodels of $\mathcal{P}$, then $T_n$ is semiparametric efficient for $\psi(P)$, and the associated information for $T_n$ is called the "efficient information" (also called the efficient information for $\mathcal{P}$). This is because the only settings with more information than $T_n$ are parametric (not semiparametric) submodels.

Interestingly, when $\psi(P)$ is one-dimensional, it is always possible to identify a one-dimensional parametric submodel that has the same Fisher information as the efficient information for $\mathcal{P}$. A parametric submodel that achieves the efficient information is called a least favorable or hardest submodel.

Fortunately, finding the efficient information for $\psi(P)$ only requires consideration of one-dimensional parametric submodels $\{P_t : t \in [0, \epsilon)\}$ surrounding representative (true) distributions $P \in \mathcal{P}$, where $P_0 = P$ and $P_t \in \mathcal{P}$ for all $t \in [0, \epsilon)$.

If $\mathcal{P}$ has a dominating measure $\mu$, then each $P$ can be expressed as a density $p$. In this case, we require the densities $p_t$ along a submodel to be smooth enough so that the score $g(x) = \partial \log p_t(x)/(\partial t)|_{t=0}$ exists with $\int_{\mathcal{X}} g^2(x)\, p(x)\, \mu(dx) < \infty$.

More generally, we say that a submodel is differentiable in quadratic mean with score function $g : \mathcal{X} \to \mathbb{R}$ if
$$\int \left[ \frac{(dP_t(x))^{1/2} - (dP(x))^{1/2}}{t} - \frac{1}{2}\, g(x)\, (dP(x))^{1/2} \right]^2 \to 0 \qquad (2)$$
as $t \downarrow 0$.

Sometimes it is important for us to consider a collection of many one-dimensional submodels surrounding a representative $P$, each submodel represented by a score function $g$: such a collection of score functions $\dot{\mathcal{P}}_P$ is called a tangent set.

Because, for any $g \in \dot{\mathcal{P}}_P$, $Pg = 0$ and $Pg^2 < \infty$, these tangent sets are subsets of $L_2^0(P)$, the space of all functions $h : \mathcal{X} \to \mathbb{R}$ with $Ph = 0$ and $Ph^2 < \infty$. When a tangent set is closed under linear combinations, it is called a tangent space.

Consider a parametric model $\mathcal{P} = \{P_\theta : \theta \in \Theta\}$, where $\Theta$ is an open subset of $\mathbb{R}^k$ and $\mathcal{P}$ is dominated by $\mu$, such that the classical score function $\dot{\ell}_\theta(x) = \partial \log p_\theta(x)/(\partial \theta)$ exists with $P_\theta \|\dot{\ell}_\theta\|^2 < \infty$ (and a few other conditions hold).

One can show that each of the one-dimensional submodels with score $g(x) = h'\dot{\ell}_\theta(x)$, for any $h \in \mathbb{R}^k$, satisfies (2), forming the tangent space $\dot{\mathcal{P}}_{P_\theta} = \{h'\dot{\ell}_\theta : h \in \mathbb{R}^k\}$. Thus there is a simple and direct connection between the classical score function and the more general tangent spaces.

Continuing with the parametric setting, we say that an estimator $\hat{\theta}$ of $\theta$ is efficient for estimating $\theta$ if
• it is regular and
• its information achieves the Cramér-Rao lower bound $P[\dot{\ell}_\theta \dot{\ell}_\theta']$.

Thus the tangent set for the model contains information about the optimal efficiency. This is also true for semiparametric models, although the relationship between tangent sets and the optimal information is more complex.
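Condition (2) can also be checked numerically for simple submodels. The following sketch (my illustration, using the normal location family as an assumed example) evaluates the squared $L_2$ norm of the bracketed remainder in (2) for $P_t = N(t, 1)$ on a grid and confirms it vanishes as $t \downarrow 0$; here the score is $g(x) = x$, i.e., $h'\dot{\ell}_0$ with $h = 1$:

```python
import numpy as np
from scipy.stats import norm

# Numerical check of quadratic-mean differentiability, condition (2), for the
# normal location submodel P_t = N(t, 1) with Lebesgue dominating measure.
# The score at t = 0 is g(x) = d log p_t(x)/dt |_{t=0} = x.
x = np.linspace(-10.0, 10.0, 200_001)
dx = x[1] - x[0]
p = norm.pdf(x)      # density of P = P_0
g = x                # score function of this submodel

for t in [0.5, 0.1, 0.02, 0.004]:
    p_t = norm.pdf(x, loc=t)
    remainder = (np.sqrt(p_t) - np.sqrt(p)) / t - 0.5 * g * np.sqrt(p)
    # Squared L2 norm of the bracketed term in (2); it shrinks as t -> 0.
    print(f"t = {t:<6} integral = {np.sum(remainder**2) * dx:.3e}")
```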
Consider estimation of the parameter $\psi(P) \in \mathbb{R}^k$ for the model $\mathcal{P}$. For any estimator $T_n$ of $\psi(P)$, if
$$\sqrt{n}(T_n - \psi(P)) = \sqrt{n}\,\mathbb{P}_n \check{\psi}_P + o_P(1),$$
then
• $\check{\psi}_P$ is an influence function for $\psi(P)$ and
• $T_n$ is asymptotically linear.

For a given tangent set $\dot{\mathcal{P}}_P$, assume for each submodel $\{P_t : 0 \le t < \epsilon\}$ satisfying (2) with some $g \in \dot{\mathcal{P}}_P$ and some $\epsilon > 0$, that
$$\left.\frac{\partial \psi(P_t)}{\partial t}\right|_{t=0} = \dot{\psi}_P(g),$$
for some linear map $\dot{\psi}_P : L_2^0(P) \to \mathbb{R}^k$. In this setting, we say that $\psi$ is differentiable at $P$ relative to $\dot{\mathcal{P}}_P$.

When $\dot{\mathcal{P}}_P$ is a linear space, there exists a measurable function $\tilde{\psi}_P : \mathcal{X} \to \mathbb{R}^k$ such that
$$\dot{\psi}_P(g) = P\left[\tilde{\psi}_P(X)\, g(X)\right],$$
for each $g \in \dot{\mathcal{P}}_P$. The function $\tilde{\psi}_P \in \dot{\mathcal{P}}_P \subset L_2^0(P)$ is unique and is called the efficient influence function for the parameter $\psi$ in the model $\mathcal{P}$.

The efficient influence function $\tilde{\psi}_P$ for $\psi$ can usually be found by taking any influence function $\check{\psi}_P$ and projecting it onto $\dot{\mathcal{P}}_P$. Moreover, we will see in Theorem 18.8 that full efficiency of an estimator $T_n$ of $\psi(P)$ can be verified by checking that the influence function of $T_n$ lies in $\dot{\mathcal{P}}_P$.

Consider a parametric model $\mathcal{P} = \{P_\theta : \theta \in \Theta\}$, where $\Theta \subset \mathbb{R}^k$. Suppose the parameter of interest is $\psi(P_\theta) = f(\theta)$, for $f : \mathbb{R}^k \to \mathbb{R}^d$ with derivative $\dot{f}_\theta$ at $\theta$, and that the Fisher information for $\theta$, $I_\theta \equiv P[\dot{\ell}_\theta \dot{\ell}_\theta']$, is invertible.

Recall that, in this case, $g(X) = h'\dot{\ell}_\theta(X)$ for some $h \in \mathbb{R}^k$, and thus $P_t = P_{\theta + th}$, which implies that $\dot{\psi}_P(g) = \dot{f}_\theta h$.

Since also we have
$$\dot{f}_\theta I_\theta^{-1} P\left[\dot{\ell}_\theta(X)\, g(X)\right] = \dot{f}_\theta I_\theta^{-1} P\left[\dot{\ell}_\theta \dot{\ell}_\theta'\right] h = \dot{f}_\theta h,$$
we have that $\dot{\psi}_P(g) = \dot{f}_\theta h = P[\tilde{\psi}_P g]$, provided
$$\tilde{\psi}_P(X) = \dot{f}_\theta I_\theta^{-1} \dot{\ell}_\theta(X).$$

Any estimator $T_n$ for which $\sqrt{n}(T_n - \psi(P)) = \sqrt{n}\,\mathbb{P}_n \tilde{\psi}_P + o_P(1)$ has asymptotic variance $\dot{f}_\theta I_\theta^{-1} \dot{f}_\theta'$ and thus achieves the Cramér-Rao lower bound and is efficient.
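As a worked instance of these formulas (my sketch, not from the lecture), consider the Poisson($\theta$) model with $\psi(P_\theta) = f(\theta) = e^{-\theta} = P(X = 0)$. Then $\dot{\ell}_\theta(x) = x/\theta - 1$, $I_\theta = 1/\theta$, and $\dot{f}_\theta = -e^{-\theta}$, so $\tilde{\psi}_P(x) = \dot{f}_\theta I_\theta^{-1} \dot{\ell}_\theta(x) = -e^{-\theta}(x - \theta)$, with variance $\dot{f}_\theta I_\theta^{-1} \dot{f}_\theta' = \theta e^{-2\theta}$. A short simulation confirms that the plug-in estimator $T_n = e^{-\bar{X}_n}$, whose influence function by the delta method is exactly $\tilde{\psi}_P$, attains this bound:

```python
import numpy as np

# Efficient influence function in a parametric model: Poisson(theta) with
# psi(P_theta) = f(theta) = exp(-theta).  Score: l_dot(x) = x/theta - 1;
# Fisher information: I_theta = 1/theta; derivative: f_dot = -exp(-theta).
# Then psi_tilde(x) = f_dot * I^{-1} * l_dot(x) = -exp(-theta) * (x - theta),
# and the Cramer-Rao bound for psi is f_dot * I^{-1} * f_dot.
theta = 2.0
f_dot = -np.exp(-theta)
I_theta = 1.0 / theta
bound = f_dot**2 / I_theta          # equals theta * exp(-2 * theta)
print("Cramer-Rao bound:", bound)

# The plug-in estimator T_n = exp(-Xbar) is asymptotically linear with
# influence function psi_tilde (delta method), hence efficient.
rng = np.random.default_rng(1)
n, reps = 2_000, 5_000
X = rng.poisson(theta, size=(reps, n))
T = np.exp(-X.mean(axis=1))
print("empirical var of sqrt(n)(T_n - psi):", n * T.var())  # approx. the bound
```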