
STAT/BMI 741, University of Wisconsin-Madison
Empirical Processes &

Lecture 1: M- and Z-Estimation

Lu Mao
[email protected]

Objectives

By the end of this lecture, you will
1. know what the term "empirical process" means;
2. get (re-)familiarized with M- and Z-estimation and their asymptotic properties;
3. be somewhat acquainted with defining and solving infinite-dimensional estimating equations (e.g., Nelson-Aalen estimator, Cox model);
4. be able to heuristically derive functional derivatives.

Contents

1.1 Motivation and Warm-Up

1.2 Asymptotics for Z-Estimators

1.3 Infinite-Dimensional Estimating Functions

1.4 Application: Classical Survival Analysis Methods as NPMLE

1.1 Motivation and Warm-Up

Notation

We deal with an i.i.d. sample of size n and use X to denote the generic observation until further notice.

Empirical measure $\mathbb{P}_n$:
$$\mathbb{P}_n f(X) = n^{-1}\sum_{i=1}^n f(X_i)$$

Underlying measure $P$:
$$P f(X) = E f(X)$$

Standardized empirical measure $\mathbb{G}_n$:
$$\mathbb{G}_n f(X) = \sqrt{n}(\mathbb{P}_n - P) f(X) = n^{-1/2}\sum_{i=1}^n \{f(X_i) - E f(X)\}$$

By the (ordinary) central limit theorem,
$$\mathbb{G}_n f(X) \rightsquigarrow N\big(0, \mathrm{Var} f(X)\big),$$
where $\rightsquigarrow$ denotes weak convergence. The empirical process is defined as
$$\{\mathbb{G}_n f : f \in \mathcal{F}\},$$
where $\mathcal{F}$ is a class of functions, e.g.,
$$\mathcal{F} = \{f_\theta : \theta \in \Theta \subset \mathbb{R}^p\},$$
$$\mathcal{F} = \{\text{all uniformly bounded non-decreasing (or bounded-variation) functions on } [0, \tau]\}.$$

Motivation and Warm-Up

Why consider varying f? Example: the classical empirical process
$$\mathbb{G}_n 1(X \le t), \quad t \in \mathbb{R},$$
which converges weakly (uniformly in t) to a tight Brownian bridge $G_F$ with covariance function $F(t \wedge s) - F(t)F(s)$, where $F(t) = \Pr(X \le t)$.

Furthermore, $G_F$ has continuous sample paths:
$$G_F(s) = G_F(t) + o(1), \quad \text{a.s., as } d(s, t) \to 0,$$
where $d(s, t) = |F(s) - F(t)|$.
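The covariance structure above can be checked with a quick Monte Carlo sketch (not part of the lecture; it assumes X ~ Uniform(0,1) so that F(t) = t is known, and the sample sizes are arbitrary):

```python
import numpy as np

# Empirically check Cov{G_F(s), G_F(t)} = F(s ^ t) - F(s)F(t) for X ~ Uniform(0,1).
rng = np.random.default_rng(0)
n, reps = 500, 4000
s, t = 0.3, 0.7

G_s = np.empty(reps)
G_t = np.empty(reps)
for r in range(reps):
    x = rng.uniform(size=n)
    # G_n f = sqrt(n)(P_n - P) f with f = 1(X <= t)
    G_s[r] = np.sqrt(n) * (np.mean(x <= s) - s)
    G_t[r] = np.sqrt(n) * (np.mean(x <= t) - t)

cov_hat = np.cov(G_s, G_t)[0, 1]
cov_theory = min(s, t) - s * t       # 0.3 - 0.21 = 0.09
print(round(cov_hat, 3), round(cov_theory, 3))
```

With these settings the empirical covariance lands close to the theoretical value 0.09.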


The uniform (weak) convergence and the continuity of the sample paths of the limiting distribution have two important implications.

1. Confidence band:
$$\sqrt{n}\sup_{t\in[a,b]} |\mathbb{P}_n 1(X \le t) - F(t)| \rightsquigarrow \sup_{t\in[a,b]} |G_F|,$$
so confidence bands for $F(\cdot)$ on $[a, b]$ can be constructed using (estimated) quantiles of the right-hand side of the above display.

2. Asymptotic continuity (AC):
$$\mathbb{G}_n 1(X \le x_n) = \mathbb{G}_n 1(X \le x) + o_P(1)$$
for all $x_n, x$ such that $d(x_n, x) \to 0$.
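The band construction in implication 1 can be sketched numerically (again a sketch with X ~ Uniform(0,1) and an arbitrary grid; the quantile of the sup statistic is calibrated by Monte Carlo rather than taken from the limiting Brownian bridge):

```python
import numpy as np

# Calibrate the 95% quantile of sqrt(n) sup_t |ECDF(t) - F(t)| by simulation,
# then check the coverage of the resulting confidence band for F.
rng = np.random.default_rng(1)
n, reps = 200, 2000
grid = np.linspace(0.05, 0.95, 50)

def sup_stat(x):
    ecdf = np.mean(x[:, None] <= grid[None, :], axis=0)
    return np.sqrt(n) * np.max(np.abs(ecdf - grid))   # true F(t) = t

# Step 1: Monte Carlo estimate of the 95% quantile of the sup statistic.
q95 = np.quantile([sup_stat(rng.uniform(size=n)) for _ in range(reps)], 0.95)

# Step 2: the band ECDF(t) +/- q95 / sqrt(n) should cover F about 95% of the time.
cover = np.mean([sup_stat(rng.uniform(size=n)) <= q95 for _ in range(reps)])
print(round(q95, 3), round(cover, 3))
```

The calibrated quantile is close to the Kolmogorov-Smirnov limit, and the empirical coverage is near the nominal 95%.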


2. Asymptotic continuity (AC, cont'd): To see this, use strong approximation to construct a version of $G_F$ such that $\sup_t |\mathbb{G}_n 1(X \le t) - G_F(t)| \to 0$ almost surely. Then, with probability one,

$$\mathbb{G}_n 1(X \le x_n) = G_F(x_n) + o(1) \quad \text{(uniform convergence)}$$
$$= G_F(x) + o(1) \quad \text{(continuity of } G_F\text{)}$$
$$= \mathbb{G}_n 1(X \le x) + o(1). \quad \text{(pointwise convergence)}$$

The AC property is tremendously useful in the general setting, especially when one wishes to replace the (erratically behaved) sample average operation Pn with the population average operation P , which is usually smoother.

What is the general setting? The problem of uniform weak convergence with varying f can be framed as weak convergence in the space of all bounded functions, i.e.,
$$\mathbb{G}_n f \rightsquigarrow G_P(f) \quad \text{in } \big(\ell^\infty(\mathcal{F}), \|\cdot\|_{\mathcal{F}}\big), \tag{1.1}$$
for some tight process $G_P$, where $\ell^\infty(\mathcal{F})$ denotes the space of bounded functions on $\mathcal{F}$ and $\|\cdot\|_{\mathcal{F}}$ the supremum norm.

The limiting process $G_P$, if it exists at all, is always Gaussian with covariance function
$$\sigma(f, g) = P fg - P f\, P g.$$

A more remarkable fact is that the tight process $G_P$ has uniformly continuous sample paths with respect to the metric
$$\rho(f, g) = \sqrt{P(f - P f - g + P g)^2}.$$


So, if the uniform weak convergence (1.1) holds, one is allowed to

1. construct confidence bands for the functional $P f$, $f \in \mathcal{F}$;
2. use the AC condition
$$\mathbb{G}_n f_n = \mathbb{G}_n f + o_P(1),$$
provided that $\rho(f_n, f) \to 0$.

More applications of empirical process theory include quantifying the modulus of continuity
$$\phi_n(\delta) = \sup_{f, g \in \mathcal{F},\, \rho(f,g) < \delta} |\mathbb{G}_n(f - g)|,$$
which is useful in deriving the rate of convergence of M-estimators.


Why empirical processes (EP) in survival analysis?

1. Survival analysis deals with processes (indexed by time), a natural setting for EP.
2. Survival analysis is usually concerned with non- and semi-parametric models, whose infinite-dimensional parameters are difficult to handle (or outright intractable) by traditional tools of asymptotics.
3. While martingale theory offers an elegant conceptual framework, the range of problems it can handle is limited, especially when one ventures beyond the conventional univariate setting into the realms of, e.g., recurrent events, multivariate failure times, and (semi-)competing risks. EP theory is much more powerful and versatile than martingale theory.

In this series of lectures we aim to
1. give an expository introduction to EP theory as a tool of asymptotic analysis;
2. illustrate the power and versatility of EP techniques with a sample of survival analysis problems.

In the process, we will
1. be as mathematically precise as need be;
2. not be overly concerned with regularity conditions, e.g., measurability in non-separable Banach spaces such as $\ell^\infty(\mathcal{F})$;
3. stick to the principle of "intuition first, technicality later".

1.2 Asymptotics for Z-Estimators

M-Estimation

Consider estimation of θ, whose true state of nature $\theta_0$ satisfies
$$\theta_0 = \arg\max_\theta P m_\theta,$$
e.g.,

1. Classical parametric model with density $p_\theta(X)$:
$$m_\theta(X) = \log \frac{p_\theta}{p_{\theta_0}}(X)$$

2. A regression model $E_\theta(Y \mid Z) = \mu(Z; \theta)$:
$$m_\theta(Y, Z) = -(Y - \mu(Z; \theta))^2$$

A natural estimator is
$$\hat\theta_n = \arg\max_\theta \mathbb{P}_n m_\theta,$$
e.g.,

1. Maximum likelihood estimator for a classical parametric model:
$$\hat\theta_n^{\mathrm{ML}} = \arg\max_\theta \mathbb{P}_n \log p_\theta(X)$$

2. Least-squares estimator for a mean regression model:
$$\hat\theta_n^{\mathrm{LS}} = \arg\min_\theta \mathbb{P}_n (Y - \mu(Z; \theta))^2$$
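A minimal numerical sketch of M-estimation (an illustration, not from the lecture: a Gaussian location model with $m_\theta(X) = -(X - \theta)^2$, maximized over a grid), where the argmax recovers the sample mean:

```python
import numpy as np

# M-estimation by brute force: theta_hat = argmax over a grid of P_n m_theta,
# with m_theta(X) = -(X - theta)^2; the maximizer is the sample mean.
rng = np.random.default_rng(2)
x = rng.normal(loc=1.5, scale=1.0, size=1000)

thetas = np.linspace(0.0, 3.0, 3001)                           # spacing 0.001
crit = np.array([-np.mean((x - th) ** 2) for th in thetas])    # P_n m_theta
theta_hat = thetas[np.argmax(crit)]
print(round(theta_hat, 3), round(x.mean(), 3))
```

The grid maximizer agrees with the sample mean up to the grid spacing, illustrating that the least-squares criterion is an M-estimation objective.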

Z-Estimation

When $m_\theta(X)$ is smooth in θ, one can compute $\hat\theta_n$ by solving the "score" equation
$$\mathbb{P}_n \dot m_\theta(X) = 0,$$
where $\dot m_\theta = \frac{\partial}{\partial\theta} m_\theta$. We thus consider Z-estimators defined as solutions to estimating equations of the following type:
$$\mathbb{P}_n \psi_\theta = 0,$$
where the root, denoted $\hat\theta_n$, is expected to estimate $\theta_0$ with $P\psi_{\theta_0} = 0$.

Formally,
$$0 = \sqrt{n}\,\mathbb{P}_n \psi_{\hat\theta_n} = \sqrt{n}\,\mathbb{P}_n \psi_{\theta_0} + \mathbb{P}_n \dot\psi_{\tilde\theta_n} \sqrt{n}(\hat\theta_n - \theta_0),$$
where $\tilde\theta_n$ is between $\theta_0$ and $\hat\theta_n$.

If $\mathbb{P}_n \dot\psi_{\tilde\theta_n} \to P\dot\psi_{\theta_0}$ (e.g., if $\sup_\theta |(\mathbb{P}_n - P)\dot\psi_\theta| \to_P 0$ and $P\dot\psi_\theta$ is continuous at $\theta_0$), then
$$\sqrt{n}(\hat\theta_n - \theta_0) = -(P\dot\psi_{\theta_0})^{-1} \mathbb{G}_n \psi_{\theta_0} + o_P(1) \rightsquigarrow N\big(0,\ (P\dot\psi_{\theta_0})^{-1} P\psi_{\theta_0}^{\otimes 2} (P\dot\psi_{\theta_0}^{\mathrm{T}})^{-1}\big),$$
where the asymptotic variance can be consistently estimated by a sandwich estimator.

But the above derivations would need (argument-wise, i.e., for each fixed x) smoothness conditions on the estimating function $\psi_\theta(x)$. This is not useful for deriving the asymptotic distributions of, e.g., the pth sample quantile, which is an approximate zero of
$$\Psi_n(\theta) := \mathbb{P}_n\big(1(X \le \theta) - p\big).$$
But if the following holds:

$$\mathbb{G}_n \psi_{\hat\theta_n} = \mathbb{G}_n \psi_{\theta_0} + o_P(1), \tag{1.2}$$
i.e., $\sqrt{n}\,\mathbb{P}_n(\psi_{\hat\theta_n} - \psi_{\theta_0}) = \sqrt{n}\,P(\psi_{\hat\theta_n} - \psi_{\theta_0}) + o_P(1)$, then we can show that the derivative of $\mathbb{P}_n \psi_\theta$ in θ can effectively be replaced by that of $P\psi_\theta$, which is usually smooth.

Condition (1.2) is an asymptotic continuity (AC) condition, and it holds if a uniform central limit theorem for the empirical process
$$\{\mathbb{G}_n \psi_\theta : \|\theta - \theta_0\| < \delta\}$$
holds and $\rho(\psi_{\hat\theta_n}, \psi_{\theta_0}) \to_P 0$, as discussed in the previous section.

Z-Estimation: Consistency

Theorem 1.1 (Consistency of Z-Estimators)

Suppose

(1.1a) $\sup_\theta |(\mathbb{P}_n - P)\psi_\theta| \to_P 0$;

(1.1b) for every $\{\theta_n\}$ with $P\psi_{\theta_n} \to 0$, we have $\theta_n \to \theta_0$.

If $\mathbb{P}_n \psi_{\hat\theta_n} = o_P(1)$, then $\hat\theta_n \to_P \theta_0$.

Remark 1.1 (Conditions for consistency)

Condition (1.1a) is a uniform-law-of-large-numbers type condition.

Condition (1.1b) says that $\theta_0$, as the zero of $P\psi_\theta$, needs to be not only unique but also well separated (cf. van der Vaart, 1998, Figure 5.2).

Exercise 1.1 (A sufficient condition for well-separatedness)

If $P\psi_\theta$ is continuous in θ, which ranges in a compact set, then a unique zero $\theta_0$ is also well separated.

Proof of Theorem 1.1: By (1.1a),
$$P\psi_{\hat\theta_n} = \mathbb{P}_n \psi_{\hat\theta_n} + o_P(1) = o_P(1). \tag{1.3}$$
The result follows from (1.1b) (by sub-sequence arguments). ∎

Exercise 1.2 (Completion of Proof of Theorem 1.1)

Show that, given (1.1b), (1.3) implies $\hat\theta_n \to_P \theta_0$ using sub-sequence arguments.


Theorem 1.2 (Asymptotic Normality of Z-Estimators)

Suppose

(1.2a) the asymptotic continuity condition holds, i.e.,
$$\mathbb{G}_n \psi_{\hat\theta_n} = \mathbb{G}_n \psi_{\theta_0} + o_P(1);$$

(1.2b) the function $P\psi_\theta$ is differentiable at $\theta_0$ with an invertible derivative $V_{\theta_0}$.

If $\mathbb{P}_n \psi_{\hat\theta_n} = o_P(n^{-1/2})$ and $\hat\theta_n \to_P \theta_0$, then
$$\sqrt{n}(\hat\theta_n - \theta_0) = -V_{\theta_0}^{-1} \mathbb{G}_n \psi_{\theta_0} + o_P(1).$$
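Theorem 1.2 can be illustrated on a simple simulated Z-estimation problem (a sketch, not from the lecture: $\psi_\theta(Y, Z) = Z(Y - Z\theta)$, i.e., regression through the origin with heteroskedastic errors), including the sandwich standard error mentioned earlier:

```python
import numpy as np

# Z-estimator solving P_n Z(Y - Z*theta) = 0, with a sandwich variance
# estimate A^{-1} B A^{-T} / n that is robust to heteroskedastic errors.
rng = np.random.default_rng(3)
n, theta0 = 2000, 2.0
z = rng.uniform(1.0, 2.0, size=n)
y = theta0 * z + rng.normal(scale=z)            # error SD grows with z

theta_hat = np.sum(z * y) / np.sum(z * z)       # explicit root of P_n psi = 0

psi = z * (y - z * theta_hat)                   # psi at theta_hat
A = -np.mean(z * z)                             # P_n psi_dot (derivative in theta)
B = np.mean(psi ** 2)                           # P_n psi^2
se_hat = np.sqrt(B / (A * A) / n)               # sandwich standard error
print(round(theta_hat, 3), round(se_hat, 3))
```

Here the estimating function is smooth in θ, so the classical expansion applies directly; the quantile example below is exactly the case where it does not.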

Proof of Theorem 1.2: Rearrange the terms in the display of (1.2a) to obtain
$$\sqrt{n}(P\psi_{\hat\theta_n} - P\psi_{\theta_0}) = \sqrt{n}(\mathbb{P}_n \psi_{\hat\theta_n} - \mathbb{P}_n \psi_{\theta_0}) + o_P(1),$$
where the right-hand side is $-\mathbb{G}_n \psi_{\theta_0} + o_P(1)$ by assumption (since $\mathbb{P}_n \psi_{\hat\theta_n} = o_P(n^{-1/2})$ and $P\psi_{\theta_0} = 0$). By (1.2b), the left-hand side is
$$V_{\theta_0}\sqrt{n}(\hat\theta_n - \theta_0) + o_P\big(\sqrt{n}\|\hat\theta_n - \theta_0\|\big).$$
The result follows. ∎

Example 1.1 (Sample Quantiles)

Denote by $\xi_{p0}$ the pth quantile of a distribution F, and suppose the density f is continuous on a neighborhood of $\xi_{p0}$ with $f(\xi_{p0}) > 0$. Consider the sample quantile $\hat\xi_{pn}$ that approximately solves (up to order $n^{-1}$)
$$\mathbb{P}_n\big(1(X \le \xi) - p\big) \approx 0.$$

Denote $\psi_\xi(X) = 1(X \le \xi) - p$. Then, by the classical Glivenko-Cantelli theorem,
$$\mathbb{P}_n \psi_\xi \to_P P\psi_\xi = F(\xi) - p,$$
uniformly in $\xi \in \mathbb{R}$. Note that $\xi_{p0}$ is a unique and well-separated zero of $F(\cdot) - p$ because F is non-decreasing and strictly increasing on a neighborhood of $\xi_{p0}$. By Theorem 1.1, $\hat\xi_{pn} \to_P \xi_{p0}$.

Furthermore, by the classical Donsker theorem, a uniform central limit theorem holds for the class $\{\mathbb{G}_n \psi_\xi : |\xi - \xi_{p0}| < \delta\}$ for some $\delta > 0$. Also,
$$\rho(\psi_{\hat\xi_{pn}}, \psi_{\xi_{p0}}) \le \sqrt{|F(\hat\xi_{pn}) - F(\xi_{p0})|} \to_P 0$$
as $\hat\xi_{pn} \to_P \xi_{p0}$. We have thus verified the AC condition (1.2a). Since
$$\frac{\partial}{\partial\xi} P\psi_\xi \Big|_{\xi=\xi_{p0}} = f(\xi_{p0}),$$
we have
$$\sqrt{n}(\hat\xi_{pn} - \xi_{p0}) = -f(\xi_{p0})^{-1}\,\mathbb{G}_n\big(1(X \le \xi_{p0}) - p\big) + o_P(1) \rightsquigarrow N\big(0,\ f(\xi_{p0})^{-2}\, p(1-p)\big),$$
where $f(\xi_{p0})$ can be estimated by $\hat f_n(\hat\xi_{pn})$ with some density estimator $\hat f_n$.

Example 1.1 (Sample Quantiles, cont'd)

In the following simulations, F is the distribution of the log of an Expn(1) variable. We estimate f with a density estimator with bin width $2n^{-1/5}$.

Table 1.1: Simulation of sample quantiles of the log of Expn(1)

        |            n = 200            |           n = 1000
  p     |  Bias     SE    SEE    CP    |  Bias     SE    SEE    CP
  0.25  |  0.007  0.139  0.141  0.941  |  0.000  0.063  0.064  0.954
  0.50  | −0.003  0.101  0.103  0.953  |  0.000  0.045  0.046  0.950
  0.75  |  0.000  0.087  0.090  0.961  |  0.000  0.040  0.040  0.949

Bias and SE are the bias and standard error of the estimator; SEE is the empirical average of the standard error estimator; CP is the empirical coverage probability of the 95% confidence interval. Each entry is based on 2,000 replicates.
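A simulation in the same spirit can be sketched in a few lines (an illustration with assumptions differing from the table: standard normal data and p = 0.5), checking that the Monte Carlo standard deviation of $\sqrt{n}(\hat\xi_{pn} - \xi_{p0})$ matches $\sqrt{p(1-p)}/f(\xi_{p0})$:

```python
import numpy as np

# For X ~ N(0,1) and p = 0.5: xi_0 = 0 and f(0) = 1/sqrt(2*pi), so the
# asymptotic SD of sqrt(n)(median_hat - 0) is sqrt(p(1-p)) / f(0) ~ 1.2533.
rng = np.random.default_rng(4)
n, reps, p = 400, 5000, 0.5
med = np.array([np.median(rng.normal(size=n)) for _ in range(reps)])

sd_mc = np.sqrt(n) * med.std()
sd_theory = np.sqrt(p * (1 - p)) / (1 / np.sqrt(2 * np.pi))
print(round(sd_mc, 3), round(sd_theory, 3))
```

The Monte Carlo standard deviation matches the theoretical value closely, in line with the SE/SEE agreement in Table 1.1.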

1.3 Infinite-Dimensional Estimating Functions

Z-Estimation: A Functional Extension

In non- and semi-parametric models, the parameter may be infinite-dimensional, e.g., distribution functions, hazard functions, etc. Denote the functional parameter by η(t), t ∈ T, for some index set T. In that case, the estimating function must also be infinite-dimensional; denote it by $\psi_\eta(t)$, t ∈ T.

Theorem 1.2 holds in this functional case provided that the AC condition (1.2a) holds uniformly in t and that the matrix $V_{\theta_0}$ in (1.2b) is replaced by a suitably defined functional derivative $V_{\eta_0}[\cdot]$. Rigorous definitions of functional derivatives are the subject of Lecture 3. But heuristically, the functional derivative can be calculated by
$$V_{\eta_0}[h] = \frac{\partial}{\partial\epsilon} P\psi_{\eta_0 + \epsilon h} \Big|_{\epsilon=0}.$$

The question remains: how might an infinite-dimensional estimating function be defined? Suppose we are in the M-estimation setting, where we wish to find
$$\hat\eta_n = \arg\max_{\eta \in \mathcal{H}} \mathbb{P}_n m_\eta \tag{1.4}$$
for some objective function $m_\eta$, where $\mathcal{H}$ is some functional space.

Suppose η is a positive measure on T, e.g., a hazard function. For fixed η, consider a parametric sub-model $\eta_{\epsilon,t}$:
$$d\eta_{\epsilon,t}(\cdot) = \big(1 + \epsilon 1(\cdot \le t)\big)\, d\eta(\cdot),$$
where $\epsilon \in \mathbb{R}$. Note that when $\epsilon$ is small, $\eta_{\epsilon,t}$ is also a positive measure for every t.

So if $\hat\eta_n$ is to satisfy (1.4), then the objective function
$$\phi_n(\epsilon) := \mathbb{P}_n m_{\eta_{\epsilon,t}}$$
is maximized at $\epsilon = 0$ when $\eta = \hat\eta_n$. That is, we must have
$$\frac{\partial}{\partial\epsilon} \mathbb{P}_n m_{\eta_{\epsilon,t}} \Big|_{\epsilon=0} = 0, \quad t \in T,$$
at $\eta = \hat\eta_n$. Hence the infinite-dimensional estimating function can be taken to be
$$\psi_\eta(t) = \frac{\partial}{\partial\epsilon} m_{\eta_{\epsilon,t}} \Big|_{\epsilon=0}, \quad t \in T.$$

If η has a constraint on its total mass so that $\int d\eta = 1$, e.g., a distribution function, then one can use the sub-model
$$d\eta_{\epsilon,t}(\cdot) = \Big[1 + \epsilon\big\{1(\cdot \le t) - \eta(t)\big\}\Big]\, d\eta(\cdot).$$
Clearly, $\int d\eta_{\epsilon,t} = 1$ for all $\epsilon$ and t. The estimating function can then be constructed similarly.

Example 1.2 (The Empirical Distribution as NPMLE)

Suppose we want to estimate the distribution function F nonparametrically. The log-likelihood is
$$\ell_n(F) = \mathbb{P}_n \log dF(X).$$

The nonparametric maximum likelihood estimator (NPMLE) $\hat F_n$ is the maximizer of $\ell_n(F)$ over the space of all discrete distribution functions.

To derive $\hat F_n$ algebraically, denote $d_i = dF(X_i)$. Using a Lagrange multiplier λ to enforce the constraint on the $d_i$, we maximize
$$\sum_{i=1}^n \log d_i - \lambda\Big(\sum_{i=1}^n d_i - 1\Big).$$
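A quick numerical sanity check (not from the lecture; it compares against randomly generated competing weight vectors): among weights $d_i \ge 0$ summing to one, the log-likelihood above is indeed maximized at $d_i = 1/n$:

```python
import numpy as np

# Compare the log-likelihood sum_i log d_i at d_i = 1/n against random
# alternatives on the simplex; none should beat the ECDF weights.
rng = np.random.default_rng(5)
n = 50
best = np.sum(np.log(np.full(n, 1.0 / n)))      # value at d_i = 1/n

for _ in range(1000):
    d = rng.dirichlet(np.ones(n))               # random weights summing to 1
    assert np.sum(np.log(d)) <= best + 1e-9     # never exceeds the ECDF value
print("max log-likelihood:", round(best, 3))
```

This is just the AM-GM/concavity fact behind the Lagrange calculation that follows.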

Example 1.2 (The Empirical Distribution as NPMLE, cont'd)

Take derivatives with respect to the $d_i$ and λ and set them to zero to find $d_i = n^{-1}$. Hence
$$\hat F_n(t) = \sum_{i=1}^n d_i 1(X_i \le t) = \mathbb{P}_n 1(X \le t),$$
which is the empirical distribution function.

Alternatively, one may take a functional perspective. Define the sub-model for F by
$$dF_{\epsilon,t}(\cdot) = \Big[1 + \epsilon\big\{1(\cdot \le t) - F(t)\big\}\Big]\, dF(\cdot).$$
Setting $\frac{\partial}{\partial\epsilon} \ell_n(F_{\epsilon,t})\big|_{\epsilon=0} = 0$ for all $t \in \mathbb{R}$ at $F = \hat F_n$, we obtain
$$\mathbb{P}_n 1(X \le t) - \hat F_n(t) = 0.$$
The result follows.

1.4 Application: Classical Survival Analysis Methods as NPMLE

Nelson-Aalen Estimator as NPMLE

Example 1.3 (Nelson-Aalen Estimator)

Let T be the event time and C the independent censoring time; let $\delta = 1(T \le C)$ and $X = T \wedge C$. The observed data are (X, δ). Equivalently, the observed data can be represented by $N(t) = 1(X \le t, \delta = 1)$ and $Y(t) = 1(X \ge t)$, $t \in [0, \tau]$, where τ is the study end time. Denote $\pi(t) = P(X \ge t)$. The interest lies in estimating Λ, the cumulative hazard function of T. The log-likelihood can be written as
$$l_\Lambda(\delta, X) = \delta \log d\Lambda(X) - \Lambda(X) = \int_0^\tau \log d\Lambda(s)\, dN(s) - \int_0^\tau Y(s)\, d\Lambda(s).$$

Example 1.3 (Nelson-Aalen Estimator, cont'd)

To see the second equality, recall that
$$\delta f(X) = \int_0^\tau f(t)\, dN(t), \quad \forall f,$$
and
$$f(X) = \int_0^\tau Y(t)\, df(t), \quad \text{if } f(0) = 0.$$

As discussed in §1.3, define the sub-model $\Lambda_{\epsilon,t}$ by
$$d\Lambda_{\epsilon,t}(\cdot) = \big(1 + \epsilon 1(\cdot \le t)\big)\, d\Lambda(\cdot), \quad t \in [0, \tau].$$

Example 1.3 (Nelson-Aalen Estimator, cont'd)

Then, we can calculate the (infinite-dimensional) score function:
$$\dot\ell_\Lambda(X, \delta)(t) = \frac{\partial}{\partial\epsilon} l_{\Lambda_{\epsilon,t}}(X, \delta) \Big|_{\epsilon=0} = N(t) - \int_0^t Y(s)\, d\Lambda(s) =: M_\Lambda(t).$$

Observe that $M_{\Lambda_0}(t)$ is the usual counting process martingale. Set
$$\mathbb{P}_n M_\Lambda(t) = 0, \quad t \in [0, \tau]. \tag{1.5}$$

Example 1.3 (Nelson-Aalen Estimator, cont'd)

That is,
$$\mathbb{P}_n N(t) - \mathbb{P}_n \int_0^t Y(s)\, d\Lambda(s) = 0.$$
So $\mathbb{P}_n\, dN(t) - \mathbb{P}_n Y(t)\, d\Lambda(t) = 0$, and hence
$$d\hat\Lambda_n(t) = \frac{\mathbb{P}_n\, dN(t)}{\mathbb{P}_n Y(t)} = \frac{\#\text{ of failures at } t}{\#\text{ at risk at } t-}.$$
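The closed-form increment $d\hat\Lambda_n(t)$ translates directly into code (a sketch on simulated data; the exponential event and censoring distributions are illustrative assumptions, not from the lecture):

```python
import numpy as np

def nelson_aalen(x, delta):
    """Cumulate dLambda_hat(t) = (# failures at t) / (# at risk at t-)."""
    times = np.unique(x[delta])                       # distinct failure times
    at_risk = np.array([np.sum(x >= t) for t in times])
    failures = np.array([np.sum((x == t) & delta) for t in times])
    return times, np.cumsum(failures / at_risk)

rng = np.random.default_rng(6)
n = 5000
T = rng.exponential(1.0, size=n)                      # Lambda_0(t) = t
C = rng.exponential(2.0, size=n)                      # independent censoring
x, delta = np.minimum(T, C), T <= C

times, L = nelson_aalen(x, delta)
idx = np.searchsorted(times, 1.0)
print(round(L[idx], 3))                               # close to Lambda_0(1) = 1
```

With unit-rate exponential event times, the estimate at t = 1 is close to the true cumulative hazard $\Lambda_0(1) = 1$.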

Now, to derive the influence function of $\hat\Lambda_n$, it remains to calculate $V_{\Lambda_0}$, the functional derivative of $PM_\Lambda$ at $\Lambda_0$, and its inverse.

Example 1.3 (Nelson-Aalen Estimator, cont'd)

Following the suggestion in §1.3, we calculate the derivative of $PM_\Lambda$ with respect to Λ:
$$V_\Lambda[h](t) = \frac{\partial}{\partial\epsilon} PM_{\Lambda + \epsilon h}(t) \Big|_{\epsilon=0} = -\int_0^t \pi(s)\, dh(s).$$

How to derive $V_\Lambda^{-1}$? Let
$$V[h](t) = g(t).$$
Then $-\pi(t)\, dh(t) = dg(t)$, and thus
$$V^{-1}[g](t) = h(t) = -\int_0^t \pi(s)^{-1}\, dg(s).$$

Example 1.3 (Nelson-Aalen Estimator, cont'd)

Therefore, by Theorem 1.2,
$$\sqrt{n}\big(\hat\Lambda_n(t) - \Lambda_0(t)\big) = -\mathbb{G}_n V_{\Lambda_0}^{-1}[M_{\Lambda_0}](t) + o_P(1) = \mathbb{G}_n \int_0^t \pi(s)^{-1}\, dM_{\Lambda_0}(s) + o_P(1).$$

Hence $\sqrt{n}\big(\hat\Lambda_n(t) - \Lambda_0(t)\big)$ is asymptotically Gaussian with covariance function
$$\sigma(s, t) = P\Big\{\int_0^t \pi(u)^{-1}\, dM_{\Lambda_0}(u) \int_0^s \pi(u)^{-1}\, dM_{\Lambda_0}(u)\Big\} \tag{*}$$
$$= \int_0^{s \wedge t} \pi(u)^{-1}\, d\Lambda_0(u), \tag{†}$$
where (†) follows from martingale theory.

Example 1.3 (Nelson-Aalen Estimator, cont'd)

Based on (*), we have a robust estimator
$$\hat\sigma_n^*(s, t) = \mathbb{P}_n\Big[\int_0^t \{\mathbb{P}_n Y(u)\}^{-1}\, dM_{\hat\Lambda_n}(u) \int_0^s \{\mathbb{P}_n Y(u)\}^{-1}\, dM_{\hat\Lambda_n}(u)\Big].$$
Based on (†), we have a "simplified" estimator
$$\hat\sigma_n^\dagger(s, t) = \int_0^{s \wedge t} \{\mathbb{P}_n Y(u)\}^{-1}\, d\hat\Lambda_n(u).$$

Both are consistent in this setting. However, (*) is preferred in other settings where robustness is needed.
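The "simplified" estimator (†) can be sketched as follows (illustrative simulated data, same assumed design as before; at $t_0$ the point estimate and standard error target $\Lambda_0(t_0) = t_0$ and the (†) variance):

```python
import numpy as np

# Nelson-Aalen point estimate and the (dagger) variance estimate at t0.
rng = np.random.default_rng(7)
n = 2000
T = rng.exponential(1.0, size=n)
C = rng.exponential(2.0, size=n)
x, delta = np.minimum(T, C), T <= C

times = np.unique(x[delta])
at_risk = np.array([np.sum(x >= t) for t in times])
dL = np.array([np.sum((x == t) & delta) for t in times]) / at_risk

t0 = 1.0
keep = times <= t0
L_hat = np.sum(dL[keep])                        # Lambda_hat(t0)
# (dagger): sum of dLambda_hat(u) / {n * P_n Y(u)}, i.e., already on the 1/n scale
var_hat = np.sum(dL[keep] / at_risk[keep])
se = np.sqrt(var_hat)
print(round(L_hat, 3), round(se, 4))
```

The resulting standard error matches the order predicted by (†), and a 95% interval L_hat ± 1.96·se covers $\Lambda_0(t_0)$ at close to the nominal rate across repetitions.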

Nelson-Aalen Estimator for Recurrent Events

Example 1.3 (Nelson-Aalen Estimator, cont'd)

Consider a recurrent-event counting process $N^*(t)$. The interest is to estimate $\mu(t) := EN^*(t)$. The observed data are $N(t) := N^*(t \wedge C)$ and $Y(t) := 1(C \ge t)$, $t \in [0, \tau]$.

Note that, motivated by the score function in the univariate case, we may use the following martingale-type function as an estimating function:
$$M_\mu(t) := N(t) - \int_0^t Y(s)\, d\mu(s),$$
since it has mean zero at $\mu_0$.

Example 1.3 (Nelson-Aalen Estimator, cont'd)

Similarly, we arrive at the Nelson-Aalen-type estimator
$$\hat\mu_n(t) = \int_0^t \frac{\mathbb{P}_n\, dN(s)}{\mathbb{P}_n Y(s)}$$
as a solution to $\mathbb{P}_n M_\mu(t) = 0$.

To estimate the covariance of $\hat\mu_n$, version (†) is invalid unless one is willing to assume that $N^*(\cdot)$ is a (nonhomogeneous) Poisson process. On the other hand, the robust version (*) is valid regardless of the correlation structure between the events, provided that certain (mild) regularity conditions hold.
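A sketch of the recurrent-event estimator $\hat\mu_n$ (assumed setup, not from the lecture: a rate-1 Poisson process censored at a uniform time, so $\mu_0(t) = t$):

```python
import numpy as np

# Nelson-Aalen-type estimator of mu(t) = E N*(t) for recurrent events:
# mu_hat(t) = sum over observed event times s <= t of 1 / #{j : C_j >= s}.
rng = np.random.default_rng(8)
n = 3000
C = rng.uniform(0.5, 2.0, size=n)               # censoring times
event_times = []
for c in C:
    k = rng.poisson(c)                          # rate-1 Poisson process on [0, c]
    event_times.extend(rng.uniform(0.0, c, size=k))
event_times = np.sort(np.array(event_times))
C_sorted = np.sort(C)

def mu_hat(t):
    s = event_times[event_times <= t]
    at_risk = n - np.searchsorted(C_sorted, s)  # #{j : C_j >= s}
    return np.sum(1.0 / at_risk)

print(round(mu_hat(1.0), 3))                    # targets mu_0(1) = 1
```

Because this simulated process is Poisson, even the simplified variance (†) would be valid here; for generally correlated events, only the robust version (*) is.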

Cox Model Revisited

Example 1.4 (Cox Model)

In addition to the variables in the univariate case of Example 1.3, introduce a vector of covariates Z, and assume that $T \perp\!\!\!\perp C \mid Z$. Consider the Cox model for the conditional hazard function of T:
$$\Lambda(t \mid Z) = \exp(\theta^{\mathrm{T}} Z)\Lambda(t),$$
where θ is the regression parameter and Λ is an infinite-dimensional baseline cumulative hazard function. In what follows, we derive the NPMLE for the Cox model and show that the profile likelihood for θ corresponds to the partial likelihood.

Example 1.4 (Cox Model, cont'd)

The log-likelihood takes the form
$$\ell_{\theta,\Lambda}(\delta, X, Z) = \delta \log d\Lambda(X \mid Z) - \Lambda(X \mid Z)$$
$$= \delta\theta^{\mathrm{T}} Z + \delta \log d\Lambda(X) - e^{\theta^{\mathrm{T}} Z}\Lambda(X)$$
$$= \theta^{\mathrm{T}} Z N(\tau) + \int_0^\tau \log d\Lambda(t)\, dN(t) - \int_0^\tau e^{\theta^{\mathrm{T}} Z} Y(t)\, d\Lambda(t).$$

The score function for θ is
$$\dot\ell_1(\theta, \Lambda) = Z N(\tau) - \int_0^\tau Z e^{\theta^{\mathrm{T}} Z} Y(t)\, d\Lambda(t) = Z M_{\theta,\Lambda}(\tau; Z),$$
where $M_{\theta,\Lambda}(t; Z) = N(t) - \int_0^t e^{\theta^{\mathrm{T}} Z} Y(s)\, d\Lambda(s)$.

Example 1.4 (Cox Model, cont'd)

Using techniques similar to those in Example 1.3, we find the (infinite-dimensional) score function for Λ:
$$\dot\ell_2(\theta, \Lambda)(t) = M_{\theta,\Lambda}(t; Z), \quad t \in [0, \tau].$$

To "profile away" Λ, solve $\mathbb{P}_n \dot\ell_2(\theta, \Lambda)(t) = 0$ for Λ while holding θ fixed, again along the lines of Example 1.3, and obtain
$$\hat\Lambda_n(t; \theta) = \int_0^t \frac{\mathbb{P}_n\, dN(u)}{\mathbb{P}_n e^{\theta^{\mathrm{T}} Z} Y(u)}. \tag{1.6}$$
Note that (1.6) is the Breslow estimator when θ is replaced by its partial-likelihood estimator.

Example 1.4 (Cox Model, cont'd)

Insert (1.6) in place of Λ in $\mathbb{P}_n \dot\ell_1(\theta, \Lambda)$ to obtain the profile likelihood score for θ:
$$\dot{p\ell}_n(\theta) := \mathbb{P}_n \dot\ell_1\big(\theta, \hat\Lambda_n(\cdot;\theta)\big)$$
$$= \mathbb{P}_n\Big\{Z N(\tau) - \int_0^\tau Z e^{\theta^{\mathrm{T}} Z} Y(u)\, \frac{\mathbb{P}_n\, dN(u)}{\mathbb{P}_n e^{\theta^{\mathrm{T}} Z} Y(u)}\Big\}$$
$$= \mathbb{P}_n Z N(\tau) - \int_0^\tau \frac{\mathbb{P}_n Z e^{\theta^{\mathrm{T}} Z} Y(u)}{\mathbb{P}_n e^{\theta^{\mathrm{T}} Z} Y(u)}\, \mathbb{P}_n\, dN(u)$$
$$= \mathbb{P}_n \int_0^\tau \Big\{Z - \frac{\mathbb{P}_n Z e^{\theta^{\mathrm{T}} Z} Y(u)}{\mathbb{P}_n e^{\theta^{\mathrm{T}} Z} Y(u)}\Big\}\, dN(u), \tag{1.7}$$
which is indeed the partial likelihood score function.
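The score (1.7), for a scalar binary covariate, can be solved by bisection (a sketch on simulated Cox-model data with true θ = 0.5; the data-generating design is an illustrative assumption, not from the lecture):

```python
import numpy as np

# Solve the partial-likelihood score (1.7) for a scalar theta by bisection.
rng = np.random.default_rng(9)
n, theta0 = 800, 0.5
Z = rng.binomial(1, 0.5, size=n).astype(float)
T = rng.exponential(scale=np.exp(-theta0 * Z))  # hazard exp(theta0 * Z)
C = rng.exponential(2.0, size=n)
x, delta = np.minimum(T, C), T <= C

def score(theta):
    # Sum over failure times of Z_i - S1(t)/S0(t), where
    # S0(t) = P_n e^{theta Z} Y(t) and S1(t) = P_n Z e^{theta Z} Y(t).
    w = np.exp(theta * Z)
    total = 0.0
    for i in np.where(delta)[0]:
        risk = x >= x[i]                        # risk set at the failure time
        total += Z[i] - np.sum(Z[risk] * w[risk]) / np.sum(w[risk])
    return total

lo, hi = -1.0, 2.0
for _ in range(30):                             # score is decreasing in theta
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if score(mid) > 0 else (lo, mid)
theta_hat = 0.5 * (lo + hi)
print(round(theta_hat, 2))
```

Bisection works here because the partial log-likelihood is concave in θ, so its score is monotone decreasing.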

Concluding Remarks

We have presented a rigorous Z-estimation theory for Euclidean parameters, and we have seen how this theory, with some hand-waving, might be generalized to the infinite-dimensional case, which is useful for non- and semi-parametric models.

Beyond the Cox model, the infinite-dimensional estimating equation generally lacks an explicit solution. In the spirit of Theorem 1.2, however, it suffices to show that the inverse of the V operator (i.e., the information operator in the case of NPMLE) exists and is bounded in order to prove the asymptotic normality of the Z-estimator. Two representative works that may be instructive in this regard are Murphy, Rossini, & van der Vaart (1997; proportional odds model) and Zeng & Lin (2006; transformation models).

References

- Murphy, S. A., Rossini, A. J., & van der Vaart, A. W. (1997). Maximum likelihood estimation in the proportional odds model. Journal of the American Statistical Association, 92, 968-976.
- Zeng, D. & Lin, D. Y. (2006). Efficient estimation of semiparametric transformation models for counting processes. Biometrika, 93, 627-640.
