M- and Z-Estimation


STAT/BMI 741, University of Wisconsin-Madison
Empirical Processes & Survival Analysis, Lecture 1
Lu Mao ([email protected])

Objectives

By the end of this lecture, you will
1. know what the term "empirical process" means;
2. get (re-)familiarized with M- and Z-estimators and their asymptotic properties;
3. be somewhat acquainted with defining and solving infinite-dimensional estimating equations (e.g., Nelson-Aalen, Cox model);
4. be able to heuristically derive functional derivatives.

Contents

1.1 Motivation and Warm-Up
1.2 Asymptotics for Z-Estimators
1.3 Infinite-Dimensional Estimating Functions
1.4 Application: Classical Survival Analysis Methods as NPMLE

Notation

We deal with an i.i.d. sample of size n and use X to denote the generic observation until further notice.

Empirical measure P_n:

  P_n f(X) = n^{-1} \sum_{i=1}^n f(X_i)

Underlying probability measure P:

  P f(X) = E f(X)

Standardized empirical measure G_n:

  G_n f(X) = \sqrt{n} (P_n - P) f(X) = n^{-1/2} \sum_{i=1}^n \{f(X_i) - E f(X)\}

By the (ordinary) central limit theorem,

  G_n f(X) \rightsquigarrow N(0, Var f(X)),

where \rightsquigarrow denotes weak convergence. The empirical process is defined as

  \{G_n f : f \in F\},

where F is a class of functions, e.g.,

  F = \{f_\theta : \theta \in \Theta \subset R^p\}, or
  F = \{all uniformly bounded non-decreasing (or bounded-variation) functions on [0, \tau]\}.

Motivation and Warm-Up

Why consider varying f? Take the classical empirical process

  G_n 1(X \le t), t \in R,

which converges weakly (uniformly in t) to a tight Brownian bridge G_F with covariance function

  F(t \wedge s) - F(t) F(s),

where F(t) = Pr(X \le t). Furthermore, G_F has continuous sample paths:

  G_F(s) = G_F(t) + o(1), F-a.s., as d(s, t) \to 0,

where d(s, t) = |F(s) - F(t)|.
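The three measures above can be illustrated numerically. The following sketch is not from the slides: the Exp(1) sample and the choice f(x) = x^2 are illustrative assumptions. It computes P_n f, compares it with P f = E X^2 = 2, and forms G_n f.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
X = rng.exponential(scale=1.0, size=n)  # generic i.i.d. sample (assumed Exp(1))

f = lambda x: x**2  # an example element f of the class F

# Empirical measure: P_n f(X) = n^{-1} sum_i f(X_i)
Pn_f = f(X).mean()

# Population measure: P f(X) = E f(X); for Exp(1), E X^2 = 2
P_f = 2.0

# Standardized empirical measure: G_n f = sqrt(n) (P_n - P) f,
# which the CLT says is approximately N(0, Var f(X))
Gn_f = np.sqrt(n) * (Pn_f - P_f)

print(Pn_f, Gn_f)
```

By the law of large numbers P_n f is close to P f, while G_n f stays of constant order as n grows, which is exactly why the standardized process is the right object to study.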
The uniform (weak) convergence and the continuity of the sample paths of the limiting distribution have two important implications.

1. Confidence band:

  \sqrt{n} \sup_{t \in [a,b]} |P_n 1(X \le t) - F(t)| \rightsquigarrow \sup_{t \in [a,b]} |G_F|,

so confidence bands for F(.) on [a, b] can be constructed using (estimated) quantiles of the right-hand side of the above display.

2. Asymptotic continuity (AC):

  G_n 1(X \le x_n) = G_n 1(X \le x) + o_P(1) for all x_n, x such that d(x_n, x) \to 0.

To see this, use strong approximation to construct a version of G_F such that

  \sup_t |G_n 1(X \le t) - G_F(t)| \to_{a.s.} 0.

Then, with probability one,

  G_n 1(X \le x_n) = G_F(x_n) + o(1)         (uniform convergence)
                   = G_F(x) + o(1)           (continuity of G_F)
                   = G_n 1(X \le x) + o(1).  (pointwise convergence)

The AC property is tremendously useful in the general setting, especially when one wishes to replace the (erratically behaved) sample-average operation P_n with the population-average operation P, which is usually smoother.

What is the general setting? The problem of uniform weak convergence with varying f can be framed as weak convergence in the space of all bounded functions, i.e.,

  G_n f \rightsquigarrow G_P(f) in (l^\infty(F), ||.||_F),   (1.1)

for some tight process G_P, where l^\infty(F) denotes the space of bounded functionals on F and ||.||_F the supremum norm. The limiting process G_P, if it exists at all, is always Gaussian with covariance function

  \sigma(f, g) = P fg - (P f)(P g).

A more remarkable fact is that the tight process G_P has uniformly continuous sample paths with respect to the standard-deviation metric

  \rho(f, g) = \sqrt{P(f - P f - g + P g)^2}.

So, if the uniform weak convergence (1.1) holds, one is allowed to
1. construct confidence bands for the functional P f, f \in F;
2. use the AC condition G_n f_n = G_n f + o_P(1), provided that \rho(f_n, f) \to 0.
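The confidence-band recipe can be sketched numerically. Everything concrete here is an assumption of the illustration (standard normal data, the grid, the Monte Carlo scheme): the quantile of sup|G_F| over the whole line is the classical Kolmogorov limit, approximated below by simulating a discretized Brownian bridge, and the band is F_n plus or minus that quantile over sqrt(n).

```python
import numpy as np

rng = np.random.default_rng(1)

# Monte Carlo quantile of sup_{u in [0,1]} |B(u)| for a Brownian bridge B,
# simulated as a pinned random walk: B(u) = W(u) - u W(1)
m, reps = 1000, 2000
sups = np.empty(reps)
for r in range(reps):
    z = rng.standard_normal(m)
    w = np.cumsum(z) / np.sqrt(m)            # Brownian motion on a grid of [0, 1]
    b = w - np.linspace(1 / m, 1, m) * w[-1]  # bridge: pinned to 0 at u = 1
    sups[r] = np.abs(b).max()
q95 = np.quantile(sups, 0.95)  # should be near the Kolmogorov 95% point, 1.358

# 95% confidence band for F over a grid: F_n(t) +/- q95 / sqrt(n)
n = 500
X = rng.standard_normal(n)
t_grid = np.linspace(-3, 3, 121)
Fn = (X[:, None] <= t_grid).mean(axis=0)  # empirical CDF P_n 1(X <= t)
lower, upper = Fn - q95 / np.sqrt(n), Fn + q95 / np.sqrt(n)
print(q95)
```

The discretization biases the simulated quantile slightly downward; in practice one would use the exact Kolmogorov distribution rather than this Monte Carlo stand-in.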
More applications of empirical process theory include quantifying the modulus of continuity

  \phi_n(\delta) = \sup_{f, g \in F, \rho(f, g) < \delta} |G_n(f - g)|,

which is useful in deriving rates of convergence for M-estimators.

Why empirical processes (EP) in survival analysis?
1. Survival analysis deals with processes (indexed by time), a natural setting for EP;
2. Survival analysis is usually concerned with non- and semi-parametric models, whose infinite-dimensional parameters are difficult to handle (or outright intractable) with traditional tools of asymptotics;
3. While martingale theory offers an elegant conceptual framework, the range of problems it can handle is limited, especially when one ventures beyond the conventional univariate setting into the realms of, e.g., recurrent events, multivariate failure times, and (semi-)competing risks. EP theory is much more powerful and versatile than martingale theory.

In this series of lectures we aim to
1. give an expository introduction to EP theory as a tool of asymptotic statistics;
2. illustrate the power and versatility of EP techniques with a sample of survival analysis problems.

In the process, we will
1. be as mathematically precise as need be;
2. not be overly concerned with regularity conditions, e.g., measurability in non-separable Banach spaces such as l^\infty(F);
3. stick to the principle of "intuition first, technicality later".

M-Estimation

Consider estimation of \theta, whose true state of nature \theta_0 satisfies

  \theta_0 = \arg\max_\theta P m_\theta,

e.g.,
1. Classical parametric model with density p_\theta(X):

  m_\theta(X) = \log\{(p_\theta / p_{\theta_0})(X)\};

2.
A mean regression model E_\theta(Y | Z) = \mu(Z; \theta):

  m_\theta(Y, Z) = -(Y - \mu(Z; \theta))^2.

A natural estimator is

  \hat\theta_n = \arg\max_\theta P_n m_\theta,

e.g.,
1. Maximum likelihood estimator for the classical parametric model:

  \hat\theta_n^{ML} = \arg\max_\theta P_n \log p_\theta(X);

2. Least squares estimator for the mean regression model:

  \hat\theta_n^{LS} = \arg\min_\theta P_n (Y - \mu(Z; \theta))^2.

Z-Estimation

When m_\theta(X) is smooth in \theta, one can compute \hat\theta_n by solving the "score" equation

  P_n \dot m_\theta(X) = 0, where \dot m_\theta = (\partial / \partial\theta) m_\theta.

We thus consider Z-estimators defined as solutions to estimating equations of the type

  P_n \psi_\theta = 0,

where the root, denoted \hat\theta_n, is expected to estimate \theta_0 satisfying P \psi_{\theta_0} = 0.

Formally, a Taylor expansion gives

  0 = \sqrt{n} P_n \psi_{\hat\theta_n} = \sqrt{n} P_n \psi_{\theta_0} + (P_n \dot\psi_{\tilde\theta_n}) \sqrt{n} (\hat\theta_n - \theta_0),

where \tilde\theta_n lies between \theta_0 and \hat\theta_n. If P_n \dot\psi_{\tilde\theta_n} \to P \dot\psi_{\theta_0} (e.g., if \sup_\theta |(P_n - P) \dot\psi_\theta| \to_P 0 and P \dot\psi_\theta is continuous at \theta_0), then

  \sqrt{n} (\hat\theta_n - \theta_0) = -(P \dot\psi_{\theta_0})^{-1} G_n \psi_{\theta_0} + o_P(1)
    \rightsquigarrow N(0, (P \dot\psi_{\theta_0})^{-1} P \psi_{\theta_0}^{\otimes 2} (P \dot\psi_{\theta_0}^T)^{-1}),

where the asymptotic variance can be consistently estimated by a (moment-based) sandwich estimator. But this derivation needs (argument-wise, i.e., for each fixed x) smoothness conditions on the estimating function \psi_\theta(x).

This argument is not useful for deriving the asymptotic distributions of, e.g., the pth sample quantile, which is an approximate zero of

  \Psi_n(\theta) := P_n \{1(X \le \theta) - p\}.

But if the following holds:

  G_n \psi_{\hat\theta_n} = G_n \psi_{\theta_0} + o_P(1),   (1.2)

i.e., \sqrt{n} P_n (\psi_{\hat\theta_n} - \psi_{\theta_0}) = \sqrt{n} P (\psi_{\hat\theta_n} - \psi_{\theta_0}) + o_P(1), then the derivative of P_n \psi_\theta can effectively be replaced by that of P \psi_\theta. Condition (1.2) is an asymptotic continuity (AC) condition; it holds if a uniform central limit theorem for the empirical process \{G_n \psi_\theta : ||\theta - \theta_0|| < \delta\} holds and \rho(\psi_{\hat\theta_n}, \psi_{\theta_0}) \to_P 0, as discussed in the previous section.
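The non-smooth quantile example can be made concrete. The sketch below is an illustration under assumed inputs (standard normal data, p = 1/2, and bisection as the root-finder): it locates the sample median as an approximate zero of Psi_n even though Psi_n is a step function with no derivative in theta.

```python
import numpy as np

rng = np.random.default_rng(2)
p = 0.5
n = 20_000
X = rng.standard_normal(n)  # assumed N(0,1), so the true median is theta_0 = 0

# Psi_n(theta) = P_n { 1(X <= theta) - p }: a non-smooth estimating function
Psi_n = lambda theta: np.mean(X <= theta) - p

# The pth sample quantile is an approximate zero of Psi_n.
# Psi_n is non-decreasing in theta, so bisection on a bracket works.
lo, hi = X.min(), X.max()
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if Psi_n(mid) < 0:
        lo = mid
    else:
        hi = mid
theta_hat = 0.5 * (lo + hi)

# theta_hat is only an *approximate* zero: Psi_n jumps in steps of 1/n,
# so |Psi_n(theta_hat)| is of order 1/n rather than exactly 0.
print(theta_hat, Psi_n(theta_hat))
```

This is precisely the situation where the Taylor-expansion route fails but the AC condition (a uniform CLT over the indicator class) still delivers asymptotic normality of theta_hat.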
Z-Estimation: Consistency

Theorem 1.1 (Consistency of Z-estimators). Suppose
(1.1a) \sup_\theta |(P_n - P) \psi_\theta| \to_P 0;
(1.1b) for every sequence \{\theta_n\} with P \psi_{\theta_n} \to 0, we have \theta_n \to \theta_0.
If P_n \psi_{\hat\theta_n} = o_P(1), then \hat\theta_n \to_P \theta_0.

Remark 1.1 (Conditions for consistency). Condition (1.1a) is a uniform-law-of-large-numbers-type condition; condition (1.1b) says that \theta_0, as the zero of P \psi_\theta, need be not only unique but also well separated (cf. van der Vaart, 1998, Figure 5.2).

Exercise 1.1 (A sufficient condition for well-separatedness). If P \psi_\theta is continuous in \theta, which ranges over a compact set, then a unique zero \theta_0 is also well separated.

Proof of Theorem 1.1. By (1.1a),

  P \psi_{\hat\theta_n} = P_n \psi_{\hat\theta_n} + o_P(1) = o_P(1).   (1.3)

The result follows from (1.1b) (by sub-sequence arguments).

Exercise 1.2 (Completion of the proof of Theorem 1.1). Show that, given (1.1b), (1.3) implies \hat\theta_n \to_P \theta_0 using sub-sequence arguments.

Theorem 1.2 (Asymptotic normality of Z-estimators). Suppose
(1.2a) the asymptotic continuity condition holds, i.e., G_n \psi_{\hat\theta_n} = G_n \psi_{\theta_0} + o_P(1);
(1.2b) the function P \psi_\theta is differentiable at \theta_0 with an invertible derivative V_{\theta_0}.
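A minimal numerical sketch of the asymptotic-normality machinery for a smooth estimating function (the Exp(theta) model, sample size, and seed are assumptions of the illustration, not from the slides): the score psi_theta(x) = 1/theta - x has the closed-form Z-estimator theta_hat = 1/mean(X), and the moment-based sandwich formula (P psi_dot)^{-1} P psi^2 (P psi_dot)^{-1} can be plugged in directly to estimate the asymptotic variance.

```python
import numpy as np

rng = np.random.default_rng(3)
theta0 = 2.0  # assumed true exponential rate
n = 5000
X = rng.exponential(scale=1 / theta0, size=n)

# Score estimating function of the Exp(theta) likelihood:
# psi_theta(x) = 1/theta - x, whose root is theta_hat = 1 / mean(X)
theta_hat = 1.0 / X.mean()

# Plug-in sandwich variance (P psi_dot)^{-1} P psi^2 (P psi_dot)^{-1};
# for this model it should be close to theta0^2 = 4
psi = 1.0 / theta_hat - X        # psi_{theta_hat}(X_i)
psi_dot = -1.0 / theta_hat**2    # d/dtheta psi_theta, constant in x
avar = psi_dot**-1 * np.mean(psi**2) * psi_dot**-1
se = np.sqrt(avar / n)           # standard error of theta_hat

print(theta_hat, se)
```

Here the sandwich collapses to the inverse Fisher information because the model is correctly specified; its value is that the same formula remains valid under misspecification, which is the point of using P psi^2 rather than the model-based information.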