Theoretical Properties of a New Weighted Likelihood Estimator for Right Censored Data

Adhidev Biswas, Pratim Guha Niyogi, Suman Majumder, Subir K. Bhandari, Abhik Ghosh and Ayanendranath Basu

TECHNICAL REPORT NO. ISRU/2018/2

INTERDISCIPLINARY STATISTICAL RESEARCH UNIT
INDIAN STATISTICAL INSTITUTE
203, Barrackpore Trunk Road
Kolkata - 700108
INDIA


Abstract. As in all cases involving randomly generated real data, robustness is a matter of real concern in the context of right censored survival data. In the present manuscript we consider the weighted likelihood approach of Biswas et al. (2015) and Majumder et al. (2016) and apply it to the right censored survival analysis scenario. Important theoretical properties of the resulting weighted likelihood estimator (WLE), such as its influence function (IF) and asymptotic normality, are derived. The influence function coincides with that of the maximum likelihood estimator (MLE) under the true model, thereby indicating the first order efficiency of the proposed method at the model. The same is reflected in the asymptotic distribution of the estimator, where the asymptotic distribution of the WLE coincides with that of the MLE. Taken in conjunction with the strong robustness properties of the WLE, the above results demonstrate the immense potential and utility of the WLE in case of right censored survival data.

Keywords: Asymptotic normality, Censoring, Consistency, Influence function, Maximum likelihood estimator, Sub-distribution functions, Weighted likelihood estimator.

1 Introduction

In many scientific experiments, one needs to model real data to understand the laws of nature and to predict the future behaviour of the system. Right censored data are commonly encountered in many medical and industrial experiments. As in any random dataset generated through a stochastic mechanism, right censored data (in survival or other contexts) may also be prone to (legitimately occurring or spurious) outliers and model misspecification. The scale and nature of such problems have increased and become more complicated in the present age of big data. Sometimes the problem is further aggravated by incorrect measurements and faulty data recording. Classical procedures often produce questionable estimators and incorrect inference in such cases. This manuscript considers a non-trivial modification of the weighted likelihood estimation scheme of Biswas et al. (2015) and Majumder et al. (2016), tailored to the additional requirement of dealing with censored observations. The modification is facilitated by the availability of the product-limit estimate of the survival function presented in Kaplan and Meier (1958).

The rest of the manuscript is organized as follows. Section 2 introduces the problem by describing the setup and presents suitable mathematical models. In Section 3 we inspect the maximum likelihood estimating equation, which leads to the development of the proposed weighted likelihood scheme in Section 4; in that section we also obtain the influence function of the weighted likelihood estimator. Section 5 presents the proof of the asymptotic distribution of the weighted likelihood estimator under some standard regularity conditions.

2 The Mathematical Setup

We consider data generated in the survival context where each observation is either fully observed or censored randomly (independently of the failure time of the object) to the right. Let us introduce the necessary mathematical structure. Let $T_1, T_2, \ldots, T_n$ denote the failure times of $n$ objects, coming from the cumulative distribution function (CDF) $G$ having density function (PDF) $g$. The true density $g$ is modelled by some suitable parametric family of densities $\{f_\theta : \theta \in \Theta\}$. In the right-censored context, we also have to deal with the censoring variable $C$, which is assumed to have CDF $F_C$ and PDF $f_C$. Let $C_1, C_2, \ldots, C_n$ denote the independent and identically distributed (i.i.d.) observations from $f_C$. Here we assume that the censoring distribution $F_C$ does not contain any information about the model parameter $\theta$. So, we essentially observe the variables

$$
X_i = \min\{T_i, C_i\} \quad\text{and}\quad \Delta_i = \mathbb{1}_{\{T_i \le C_i\}}, \quad i \in \{1, 2, \ldots, n\}, \tag{1}
$$
where $\mathbb{1}_A$ is the indicator function of the event $A$. We now have i.i.d. pairs of observations $\{(X_i, \Delta_i);\ i \in \{1, 2, \ldots, n\}\}$, on the basis of which we wish to obtain a robust weighted likelihood estimator of the parameter $\theta$. For the sake of completeness, we start with the description of the maximum likelihood estimator (MLE) in the next section.
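For concreteness, the following minimal sketch simulates data of the form (1). The exponential failure and censoring distributions (and all numerical values) are illustrative assumptions made only for this sketch; they are not prescribed by the setup above.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
n = 200

# Hypothetical choices for illustration: T_i ~ Exp(rate=theta0) and an
# independent censoring variable C_i ~ Exp(rate=0.3).
theta0 = 1.0
T = rng.exponential(scale=1.0 / theta0, size=n)   # failure times T_1, ..., T_n
C = rng.exponential(scale=1.0 / 0.3, size=n)      # censoring times C_1, ..., C_n

X = np.minimum(T, C)           # observed times X_i = min(T_i, C_i)
Delta = (T <= C).astype(int)   # censoring indicators Delta_i = 1{T_i <= C_i}

print(f"observed pairs: {n}, uncensored fraction: {Delta.mean():.2f}")
```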

3 The Likelihood Equation

We shall first calculate the quantities required for the expression of the likelihood function. For the $i$th observation, we consider two quantities, termed sub-distribution functions (Reid, 1981), defined by

$$
F^u(x) = \mathbb{P}[X_i \le x, \Delta_i = 1] = \mathbb{P}[T_i \le x, T_i \le C_i] = \int_0^x \mathbb{P}[C_i \ge t]\,g(t)\,dt = \int_0^x S_C(t)\,g(t)\,dt, \tag{2}
$$
and

$$
F^c(x) = \mathbb{P}[X_i \le x, \Delta_i = 0] = \mathbb{P}[C_i \le x, T_i > C_i] = \int_0^x \mathbb{P}[T_i > t]\,f_C(t)\,dt = \int_0^x \bar{G}(t)\,f_C(t)\,dt. \tag{3}
$$
Here, $\bar{G} = 1 - G$ and $S_C = 1 - F_C$ denote the survival functions of $T$ and $C$, respectively. Moreover, note that
$$
\mathbb{P}[\Delta_i = 1] = \mathbb{P}[T_i \le C_i] = \int_0^\infty \mathbb{P}[C_i \ge t]\,g(t)\,dt = \int_0^\infty S_C(t)\,g(t)\,dt, \tag{4}
$$
and
$$
\mathbb{P}[\Delta_i = 0] = \mathbb{P}[T_i > C_i] = \int_0^\infty \mathbb{P}[T_i > t]\,f_C(t)\,dt = \int_0^\infty \bar{G}(t)\,f_C(t)\,dt. \tag{5}
$$

Combining the above results, we obtain the conditional distribution of $X_i$ given $\Delta_i$ as
$$
dF_{X_i|\Delta_i}(x_i|\delta_i) = \left(\frac{S_C(x_i)\,dG(x_i)}{\int_0^\infty S_C(t)\,dG(t)}\right)^{\delta_i} \times \left(\frac{\bar{G}(x_i)\,dF_C(x_i)}{\int_0^\infty \bar{G}(t)\,dF_C(t)}\right)^{1-\delta_i}, \quad \delta_i \in \{0, 1\}. \tag{6}
$$

The likelihood function of $\theta$, given the observations $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ and $\boldsymbol{\delta} = (\delta_1, \delta_2, \ldots, \delta_n)$, is evaluated by substituting $F_\theta$ in place of $G$, and has the form
$$
L(\theta \mid \mathbf{x}, \boldsymbol{\delta}) = \prod_{i=1}^n \left(S_C(x_i)\,f_\theta(x_i)\right)^{\delta_i} \times \left(S_\theta(x_i)\,f_C(x_i)\right)^{1-\delta_i}, \tag{7}
$$
where $S_\theta = 1 - F_\theta$, with $F_\theta$ being the model CDF corresponding to the density $f_\theta$. Then, the log-likelihood is given by

$$
l(\theta) \equiv l(\theta \mid \mathbf{x}, \boldsymbol{\delta}) = \sum_{i=1}^n \left\{\delta_i \log f_\theta(x_i) + (1-\delta_i)\log S_\theta(x_i)\right\} + \sum_{i=1}^n \left\{\delta_i \log S_C(x_i) + (1-\delta_i)\log f_C(x_i)\right\}. \tag{8}
$$
We obtain the maximum likelihood estimate of $\theta$ by maximizing this log-likelihood with respect to $\theta \in \Theta$ or, equivalently, solving the estimating equation given by

$$
\nabla l(\theta) = \sum_{i=1}^n \left\{\delta_i\,u_{1,\theta}(x_i) + (1-\delta_i)\,u_{2,\theta}(x_i)\right\} = \sum_{i=1}^n U_\theta(x_i, \delta_i) = 0, \tag{9}
$$

where $\nabla$ represents the derivative with respect to $\theta$, $u_{1,\theta}(x) = \nabla \log f_\theta(x)$, $u_{2,\theta}(x) = \nabla \log S_\theta(x)$, and
$$
U_\theta(x, \delta) = \begin{cases} u_{1,\theta}(x), & \text{if } \delta = 1,\\ u_{2,\theta}(x), & \text{if } \delta = 0.\end{cases}
$$
In order to obtain the IF, we need to consider the likelihood functional based on the CDFs $G$ and $F_C$. This can be defined as the solution $\theta$ to the population estimating equation given by
$$
\iint U_\theta(x,\delta)\,dF_{X|\Delta}(x|\delta)\,d\mathbb{P}(\delta) = 0, \tag{10}
$$
or,
$$
\int_{\mathcal{X}} u_{1,\theta}(x)\,dF_{X|\Delta}(x|1)\,\mathbb{P}(\Delta = 1) + \int_{\mathcal{X}} u_{2,\theta}(x)\,dF_{X|\Delta}(x|0)\,\mathbb{P}(\Delta = 0) = 0,
$$
or,
$$
\int_{\mathcal{X}} u_{1,\theta}(x)\,dF^u(x) + \int_{\mathcal{X}} u_{2,\theta}(x)\,dF^c(x) = 0. \tag{11}
$$
It can easily be verified that replacing $F^u$ and $F^c$ by their respective empirical counterparts $\hat{F}^u_n$ and $\hat{F}^c_n$ leads back to the likelihood equation (9).
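As an illustration of solving (9), the sketch below computes the censored MLE for an assumed exponential model $f_\theta(x) = \theta e^{-\theta x}$, for which $u_{1,\theta}(x) = 1/\theta - x$ and $u_{2,\theta}(x) = -x$; the closed form $\hat\theta = \sum_i \delta_i / \sum_i x_i$ provides a check. The data-generating choices are the same illustrative assumptions as before.

```python
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(seed=1)
T = rng.exponential(1.0, size=200)          # failure times (assumed Exp(1))
C = rng.exponential(1.0 / 0.3, size=200)    # censoring times (assumed Exp(0.3))
X, Delta = np.minimum(T, C), (T <= C).astype(float)

def score(theta, x, delta):
    # Left side of (9) for the Exp(theta) model: u1 = 1/theta - x, u2 = -x.
    return np.sum(delta * (1.0 / theta - x) - (1 - delta) * x)

theta_mle = brentq(score, 1e-6, 50.0, args=(X, Delta))
print(theta_mle, Delta.sum() / X.sum())     # the two values should agree
```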

4 Robust Weighted Likelihood Functional: Influence Function Analysis

In Section 3, we considered the likelihood equation and derived the implicit definition which gives rise to the likelihood functional. In this section, we extend this idea to obtain a robust weighted likelihood functional for the censored data case. Given the setup in Section 2, let us again start with data of the form (1) and follow the construction of the likelihood function along the same lines as in Section 3. However, in order to control the influence of outlying observations, we introduce a weight function following the idea and rationale of Biswas et al. (2015) and Majumder et al. (2016). While defining the residual function in the present right-censored case, we also need to take into account the possibility of data points being censored (e.g., lost to follow-up). As a remedy, we consider the product-limit estimator of the survival function (and likewise, that of the distribution function) introduced in Kaplan and Meier (1958) and define the residual function as follows.
$$
\tau_{KM,\theta}(x) = \begin{cases}
0, & \text{if } p < F_\theta(x) < 1-p,\\[1ex]
\dfrac{\hat{F}_{KM}(x)}{F_\theta(x)} - 1, & \text{if } F_\theta(x) \le p,\\[1ex]
\dfrac{\hat{S}^*_{KM}(x)}{S_\theta(x)} - 1, & \text{if } F_\theta(x) \ge 1-p,
\end{cases} \tag{12}
$$

where $\hat{S}^*_{KM}(x)$ is the product-limit estimator of $\mathbb{P}(T > x)$. In the subsequent discussion, the parameter $p$ is fixed at $\frac12$.

Using the same idea as in Biswas et al. (2015) and Majumder et al. (2016), the weight function is defined as a sufficiently smooth, unimodal, positive function $H(\cdot)$ of the residual in (12), with right tail decaying to zero. We now define our weighted likelihood estimating equation by modifying the likelihood equation (9) as below.

$$
\sum_{i=1}^n H(\tau_{KM,\theta}(X_i))\,U_\theta(X_i, \Delta_i) = 0. \tag{13}
$$
Hereafter, by the weighted likelihood estimator (to be denoted $\hat{\theta}_{n,WLE}$) we mean any solution of the weighted likelihood estimating equation (13).

Before embarking on the evaluation of the influence function, let us mention one property to be used in the subsequent development.

Lemma 4.1. The proposed weighted likelihood estimator is Fisher-consistent.

Proof. The weighted likelihood functional $T(G)$ is defined as a solution $t$ of the equation
$$
\int_{\mathcal{X}} H(\tau_{G,t}(x))\,U_t(x, \delta)\,dG(x) = 0, \tag{14}
$$
where $\tau_{G,t}(\cdot)$ is defined as
$$
\tau_{G,t}(x) = \begin{cases}
\dfrac{G(x)}{F_t(x)} - 1, & \text{if } 0 < F_t(x) \le 1/2,\\[1ex]
\dfrac{\bar{G}(x)}{S_t(x)} - 1, & \text{if } 1/2 < F_t(x) < 1.
\end{cases}
$$
It will be proved in Section 5 that the weighted likelihood estimating equation (13) and the maximum likelihood estimating equation (9) are asymptotically equivalent under some regularity conditions. Using this result and the fact that the expectation of the likelihood score function under the model is zero (under similar regularity conditions, to be given in Section 5), we assert that
$$
\mathbb{E}_\theta\left[H(\tau_{F_\theta,\theta}(X))\,U_\theta(X, \Delta)\right] = 0,
$$
which establishes our claim and proves the result.
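A minimal computational sketch of the scheme (12)-(13) follows, again under the assumed exponential model of the earlier sketches. The Kaplan-Meier step is implemented by hand, and $H(\tau) = e^{-\tau^2/2}$ is merely one convenient weight satisfying the qualitative requirements above (and Assumption (A1) of Section 5); the paper itself does not fix a specific $H$. For the $\mathrm{Exp}(\theta)$ model the weighted score reduces to $\sum_i w_i(\delta_i/\theta - x_i)$, which suggests the fixed-point iteration used below; this iterative scheme is our own device for the sketch, not a prescription of the paper.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
T = rng.exponential(1.0, size=200)          # failure times (assumed Exp(1))
C = rng.exponential(1.0 / 0.3, size=200)    # censoring times (assumed Exp(0.3))
X, Delta = np.minimum(T, C), (T <= C).astype(float)

def km_survival(x, delta):
    """Kaplan-Meier (product-limit) estimate of P(T > t) at each observed x."""
    order = np.argsort(x)
    at_risk = np.arange(len(x), 0, -1)           # n_i: subjects still at risk
    factors = 1.0 - delta[order] / at_risk       # censored times contribute factor 1
    surv = np.empty(len(x))
    surv[order] = np.cumprod(factors)
    return surv

S_km = km_survival(X, Delta)
F_km = 1.0 - S_km

theta = Delta.sum() / X.sum()                    # initialise at the censored MLE
for _ in range(200):
    F_theta = 1.0 - np.exp(-theta * X)
    # residual (12) with p = 1/2, using the product-limit estimates
    tau = np.where(F_theta <= 0.5, F_km / F_theta - 1.0,
                   S_km / (1.0 - F_theta) - 1.0)
    w = np.exp(-0.5 * tau**2)                    # one admissible H: H(0)=1, H'(0)=0
    theta_new = np.sum(w * Delta) / np.sum(w * X)
    if abs(theta_new - theta) < 1e-10:
        break
    theta = theta_new

print("WLE:", theta_new, " MLE:", Delta.sum() / X.sum())
```

At the model the residuals are small, the weights stay close to one, and the WLE stays close to the MLE, in line with the Fisher consistency of Lemma 4.1.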

To derive the influence function of the weighted likelihood estimator at the true distribution $G$, we first consider the contaminated version of the true distribution given by
$$
G_\epsilon(x) = (1-\epsilon)\,G(x) + \epsilon\,\mathbb{1}_{\{x \ge y\}}. \tag{15}
$$
The residual function for this contaminated model will then be of the form

$$
\tau_{\epsilon,\theta}(x) = \begin{cases}
\dfrac{G_\epsilon(x)}{F_\theta(x)} - 1, & \text{if } F_\theta(x) < \frac12,\\[1ex]
\dfrac{\bar{G}_\epsilon(x)}{S_\theta(x)} - 1, & \text{if } F_\theta(x) \ge \frac12,
\end{cases} \tag{16}
$$

and the weighted likelihood functional will be defined simply as a solution to the weighted version of equation (10), given by
$$
\iint H(\tau_{\epsilon,\theta}(x))\,U_\theta(x,\delta)\,dF_{X|\Delta}(x|\delta)\,d\mathbb{P}(\delta) = 0, \tag{17}
$$
or,
$$
\int_{\mathcal{X}} H(\tau_{\epsilon,\theta}(x))\left[u_{1,\theta}(x)\,dF^u_\epsilon(x) + u_{2,\theta}(x)\,dF^c_\epsilon(x)\right] = 0, \tag{18}
$$
where $\mathcal{X} = [a, b)$ for some $a \in [0, \infty)$ and $b \in (a, \infty]$, finite or infinite, and $dF^u_\epsilon(x) = S_C(x)\,dG_\epsilon(x)$ and $dF^c_\epsilon(x) = \bar{G}_\epsilon(x)\,dF_C(x)$. Let $\theta_\epsilon$ be a solution to (18). Then the IF of our weighted likelihood estimator will be given by $\frac{\partial}{\partial\epsilon}\theta_\epsilon\big|_{\epsilon=0}$. At this stage, we define two subsets of $\mathcal{X}$:

$$
\mathcal{X}_{1,\epsilon} = \left\{x \in \mathcal{X} : F_{\theta_\epsilon}(x) < \tfrac12\right\}, \qquad \mathcal{X}_{2,\epsilon} = \left\{x \in \mathcal{X} : F_{\theta_\epsilon}(x) \ge \tfrac12\right\}. \tag{19}
$$

If we define $\xi_\epsilon$ such that $F_{\theta_\epsilon}(\xi_\epsilon) = \frac12$, then the above definitions become

$$
\mathcal{X}_{1,\epsilon} = \{x \in \mathcal{X} : a < x < \xi_\epsilon\}, \qquad \mathcal{X}_{2,\epsilon} = \{x \in \mathcal{X} : \xi_\epsilon \le x < b\}. \tag{20}
$$
In order to differentiate both sides of (18) with respect to $\epsilon$, let us first calculate the derivatives of the functions involved separately. The derivative of the residual function is given by

$$
\tau'_{\epsilon,\theta_\epsilon}(x) = \begin{cases}
\dfrac{\mathbb{1}_{\{x\ge y\}} - G(x)}{F_{\theta_\epsilon}(x)} - \dfrac{G_\epsilon(x)}{F_{\theta_\epsilon}(x)}\,\dfrac{\nabla F_{\theta_\epsilon}(x)}{F_{\theta_\epsilon}(x)}\,\theta'_\epsilon, & \text{for } x \in \mathcal{X}_{1,\epsilon},\\[2ex]
\dfrac{\mathbb{1}_{\{x< y\}} - \bar{G}(x)}{S_{\theta_\epsilon}(x)} - \dfrac{\bar{G}_\epsilon(x)}{S_{\theta_\epsilon}(x)}\,\dfrac{\nabla S_{\theta_\epsilon}(x)}{S_{\theta_\epsilon}(x)}\,\theta'_\epsilon, & \text{for } x \in \mathcal{X}_{2,\epsilon},
\end{cases} \tag{21}
$$

where $\theta'_\epsilon = \frac{\partial}{\partial\epsilon}\theta_\epsilon$. Thus, writing $\theta_g$ for $\theta_\epsilon\big|_{\epsilon=0}$, a solution to (18) under $\epsilon = 0$, we obtain
$$
\tau'_{\epsilon,\theta_\epsilon}(x)\Big|_{\epsilon=0} = \begin{cases}
\dfrac{\mathbb{1}_{\{x\ge y\}} - G(x)}{F_{\theta_g}(x)} - \dfrac{G(x)}{F_{\theta_g}(x)}\,\dfrac{\nabla F_{\theta_g}(x)}{F_{\theta_g}(x)}\,\theta'_\epsilon\big|_{\epsilon=0}, & \text{for } x \in \mathcal{X}_1,\\[2ex]
\dfrac{\mathbb{1}_{\{x< y\}} - \bar{G}(x)}{S_{\theta_g}(x)} - \dfrac{\bar{G}(x)}{S_{\theta_g}(x)}\,\dfrac{\nabla S_{\theta_g}(x)}{S_{\theta_g}(x)}\,\theta'_\epsilon\big|_{\epsilon=0}, & \text{for } x \in \mathcal{X}_2.
\end{cases} \tag{22}
$$
Writing the left side of (18) as $\mathrm{LHS}(\epsilon)$ and substituting $dG_\epsilon(x) = dG(x) + \epsilon\,d\left(\mathbb{1}_{\{x\ge y\}} - G(x)\right)$ and $\bar{G}_\epsilon(x) = \bar{G}(x) + \epsilon\left(\mathbb{1}_{\{x<y\}} - \bar{G}(x)\right)$, we have

$$
\begin{aligned}
\mathrm{LHS}(\epsilon) &= \int_{\mathcal{X}} H(\tau_{\epsilon,\theta_\epsilon}(x))\,u_{1,\theta_\epsilon}(x)\,S_C(x)\,dG(x)\\
&\quad + \epsilon\int_{\mathcal{X}} H(\tau_{\epsilon,\theta_\epsilon}(x))\,u_{1,\theta_\epsilon}(x)\,S_C(x)\,d\left(\mathbb{1}_{\{x\ge y\}} - G(x)\right)\\
&\quad + \int_{\mathcal{X}} H(\tau_{\epsilon,\theta_\epsilon}(x))\,u_{2,\theta_\epsilon}(x)\,\bar{G}(x)\,dF_C(x)\\
&\quad + \epsilon\int_{\mathcal{X}} H(\tau_{\epsilon,\theta_\epsilon}(x))\,u_{2,\theta_\epsilon}(x)\left(\mathbb{1}_{\{x<y\}} - \bar{G}(x)\right)dF_C(x). \tag{23}
\end{aligned}
$$
Writing $\tau(x) \equiv \tau_{\epsilon,\theta_\epsilon}(x)\big|_{\epsilon=0}$ for brevity, differentiating with respect to $\epsilon$ and evaluating at $\epsilon = 0$, we obtain

$$
\begin{aligned}
\frac{\partial}{\partial\epsilon}\mathrm{LHS}(\epsilon)\bigg|_{\epsilon=0}
&= \int_{\mathcal{X}} H'(\tau(x))\,\tau'_{\epsilon,\theta_\epsilon}(x)\big|_{\epsilon=0}\;u_{1,\theta_g}(x)\,S_C(x)\,dG(x)\\
&\quad + \left[\int_{\mathcal{X}} H(\tau(x))\,\nabla u_{1,\theta_g}(x)\,S_C(x)\,dG(x)\right]\theta'_\epsilon\big|_{\epsilon=0}\\
&\quad + H(\tau(y))\,u_{1,\theta_g}(y)\,S_C(y) - \int_{\mathcal{X}} H(\tau(x))\,u_{1,\theta_g}(x)\,S_C(x)\,dG(x)\\
&\quad + \int_{\mathcal{X}} H'(\tau(x))\,\tau'_{\epsilon,\theta_\epsilon}(x)\big|_{\epsilon=0}\;u_{2,\theta_g}(x)\,\bar{G}(x)\,dF_C(x)\\
&\quad + \left[\int_{\mathcal{X}} H(\tau(x))\,\nabla u_{2,\theta_g}(x)\,\bar{G}(x)\,dF_C(x)\right]\theta'_\epsilon\big|_{\epsilon=0}\\
&\quad + \int_{\mathcal{X}} H(\tau(x))\,u_{2,\theta_g}(x)\,\mathbb{1}_{\{x<y\}}\,dF_C(x) - \int_{\mathcal{X}} H(\tau(x))\,u_{2,\theta_g}(x)\,\bar{G}(x)\,dF_C(x). \tag{24}
\end{aligned}
$$

It is to be noted that, $\theta_g$ being a solution to (18) under $\epsilon = 0$, the fourth and the last terms on the right side of (24) add up to zero. We now collect the coefficient of $\theta'_\epsilon\big|_{\epsilon=0}$ from (24), which gives the final (combined) coefficient as
$$
\begin{aligned}
D &= \int_{\mathcal{X}} H(\tau(x))\,\nabla u_{1,\theta_g}(x)\,S_C(x)\,dG(x) + \int_{\mathcal{X}} H(\tau(x))\,\nabla u_{2,\theta_g}(x)\,\bar{G}(x)\,dF_C(x)\\
&\quad - \int_{\mathcal{X}_1} H'(\tau(x))\,\frac{G(x)}{F_{\theta_g}(x)}\,\frac{\nabla F_{\theta_g}(x)}{F_{\theta_g}(x)}\,u_{1,\theta_g}(x)\,S_C(x)\,dG(x)\\
&\quad - \int_{\mathcal{X}_2} H'(\tau(x))\,\frac{\bar{G}(x)}{S_{\theta_g}(x)}\,\frac{\nabla S_{\theta_g}(x)}{S_{\theta_g}(x)}\,u_{2,\theta_g}(x)\,\bar{G}(x)\,dF_C(x). \tag{25}
\end{aligned}
$$

The terms free from $\theta'_\epsilon\big|_{\epsilon=0}$ are then combined in the quantity $N$ given by
$$
\begin{aligned}
N &= \int_{\mathcal{X}_1} H'(\tau(x))\,\frac{\mathbb{1}_{\{x\ge y\}} - G(x)}{F_{\theta_g}(x)}\,u_{1,\theta_g}(x)\,S_C(x)\,dG(x)\\
&\quad + \int_{\mathcal{X}_2} H'(\tau(x))\,\frac{\mathbb{1}_{\{x< y\}} - \bar{G}(x)}{S_{\theta_g}(x)}\,u_{1,\theta_g}(x)\,S_C(x)\,dG(x)\\
&\quad + H(\tau(y))\,u_{1,\theta_g}(y)\,S_C(y)\\
&\quad + \int_{\mathcal{X}_1} H'(\tau(x))\,\frac{\mathbb{1}_{\{x\ge y\}} - G(x)}{F_{\theta_g}(x)}\,u_{2,\theta_g}(x)\,\bar{G}(x)\,dF_C(x)\\
&\quad + \int_{\mathcal{X}_2} H'(\tau(x))\,\frac{\mathbb{1}_{\{x< y\}} - \bar{G}(x)}{S_{\theta_g}(x)}\,u_{2,\theta_g}(x)\,\bar{G}(x)\,dF_C(x)\\
&\quad + \int_{\mathcal{X}} H(\tau(x))\,u_{2,\theta_g}(x)\,\mathbb{1}_{\{x<y\}}\,dF_C(x). \tag{26}
\end{aligned}
$$

These calculations now clearly yield the form of the IF $\theta'_\epsilon\big|_{\epsilon=0}$ of the weighted likelihood estimator, which is presented in the following theorem.

Theorem 4.1 (Influence Function). The influence function of the proposed weighted likelihood estimator is given by
$$
IF(y;\, T, G) = \theta'_\epsilon\big|_{\epsilon=0} = -D^{-1}N, \tag{27}
$$
where $D$ and $N$ are as in (25) and (26), respectively.

If the true distribution $G$ belongs to the model, i.e., $G = F_{\theta_0}$ for some $\theta_0 \in \Theta$, then $\theta_g = \theta_0$ and the residual $\tau(x)$ vanishes identically; since $H(0) = 1$ and $H'(0) = 0$ (see Assumption (A1) in Section 5), the $H'$ terms drop out and the influence function turns out to have the simpler form

$$
\begin{aligned}
IF(y;\, T, F_{\theta_0}) = &-\left[\int_{\mathcal{X}} \nabla u_{1,\theta_0}(x)\,S_C(x)\,dF_{\theta_0}(x) + \int_{\mathcal{X}} \nabla u_{2,\theta_0}(x)\,S_{\theta_0}(x)\,dF_C(x)\right]^{-1}\\
&\times\left[u_{1,\theta_0}(y)\,S_C(y) + \int_{\mathcal{X}} u_{2,\theta_0}(x)\,\mathbb{1}_{\{x<y\}}\,dF_C(x)\right]. \tag{28}
\end{aligned}
$$
This coincides with the influence function of the MLE under the model, reflecting the first order efficiency of the proposed method at the model.

5 Consistency and Asymptotic Normality of the Weighted Likelihood Estimator

To prove the asymptotic normality of the proposed weighted likelihood estimator under the right censored scenario, we need the following assumptions. For simplicity the results are presented with a scalar parameter in mind; however, they may be extended to the vector parameter case with appropriate generalizations in the notation.

(A1) The weight function $H(\tau)$ is non-negative, bounded and twice continuously differentiable, such that $H(0) = 1$ and $H'(0) = 0$.

(A2) The functions $H'(\tau)(1+\tau)$ and $H''(\tau)(1+\tau)^2$ are bounded. Further, $H''(\tau)$ is continuous in $\tau$.

(A3) For every $\theta_0 \in \Theta$, there is a neighbourhood $N(\theta_0)$ such that for every $\theta \in N(\theta_0)$, the quantities $\left|\tilde{U}_\theta(x)\,\nabla U_\theta(x,\delta)\right|$, $\left|\tilde{U}^2_\theta(x)\,U_\theta(x,\delta)\right|$, $\left|\nabla\tilde{U}_\theta(x)\,U_\theta(x,\delta)\right|$ and $\left|\nabla_2 U_\theta(x,\delta)\right|$ are bounded by $M_1(x)$, $M_2(x)$, $M_3(x)$ and $M_4(x)$ respectively, with $\mathbb{E}_{\theta_0}[M_i(X)] < \infty$ for all $i \in \{1, 2, 3, 4\}$.

(A4) $\mathbb{E}_{\theta_0}\left[\tilde{U}^2_{\theta_0}(X)\,U^2_{\theta_0}(X,\Delta)\right] < \infty$ and $\mathbb{E}_{\theta_0}\left[\left(\nabla U_{\theta_0}(X,\Delta)\right)^2\right] < \infty$.

(A5) The analogue of the Fisher information in the survival context, $J(\theta) = \mathbb{E}_\theta\left[U^2_\theta(X,\Delta)\right]$, is non-zero and finite for all $\theta \in \Theta$.

(A6) $S_C(x) \ge S^{\epsilon}_{\theta_0}(x)$, for some $\epsilon > 0$.

Here, the function $\tilde{U}_\theta$ is defined as

$$
\tilde{U}_\theta(x) = \begin{cases}
\dfrac{\nabla F_\theta(x)}{F_\theta(x)}, & \text{if } F_\theta(x) \le \frac12,\\[1ex]
\dfrac{\nabla S_\theta(x)}{S_\theta(x)}, & \text{if } F_\theta(x) > \frac12.
\end{cases}
$$
We now present the main result in the following theorem.

Theorem 5.1. Let the true distribution $G$ of the failure time data belong to the model, with $\theta_0$ being the true parameter value. Let $\hat{\theta}_{n,WLE}$ be the proposed weighted likelihood estimator. Define

$$
A_n(\theta) = \frac1n\sum_{i=1}^n H(\tau_{KM,\theta}(X_i))\,U_\theta(X_i,\Delta_i),
$$
$$
B_n(\theta) = \frac1n\sum_{i=1}^n \nabla\left(H(\tau_{KM,\theta}(X_i))\,U_\theta(X_i,\Delta_i)\right), \quad\text{and}
$$
$$
C_n(\theta) = \frac1n\sum_{i=1}^n \nabla_2\left(H(\tau_{KM,\theta}(X_i))\,U_\theta(X_i,\Delta_i)\right),
$$
where $\nabla_2$ represents the second derivative with respect to $\theta$. Then, under Assumptions (A1)-(A6) listed above, the following results hold.

(a) $\sqrt{n}\left(A_n(\theta_0) - \frac1n\sum_{i=1}^n U_{\theta_0}(X_i,\Delta_i)\right) = o_P(1)$.

(b) $B_n(\theta_0) - \frac1n\sum_{i=1}^n \nabla U_{\theta_0}(X_i,\Delta_i) = o_P(1)$.

(c) $C_n(\theta') = O_P(1)$, where $\theta'$ lies on the line segment joining $\theta_0$ and $\hat{\theta}_{n,WLE}$.

Before starting the proof of Theorem 5.1, we present two very important corollaries that are the focus of this work.

Corollary 5.1. There exists a sequence $\{\hat{\theta}_{n,WLE}\}_{n\in\mathbb{N}}$ of roots of the weighted likelihood estimating equation (13) such that
$$
\hat{\theta}_{n,WLE} \xrightarrow{P} \theta_0.
$$

The proof of Corollary 5.1 essentially stems from the ideas introduced in Serfling (1981) and Lehmann and Casella (2006). The idea of the proof is that, given any $\epsilon > 0$, a root of equation (13) eventually lies inside the interval $(\theta_0 - \epsilon, \theta_0 + \epsilon)$. For this purpose, we assert the following lemma along the lines of Serfling (1981).

Lemma 5.1. For any $\epsilon > 0$,
$$
\mathbb{P}_{\theta_0}\left[\left|A_n(\theta_0 - \epsilon) - \epsilon J(\theta_0)\right| \ge \frac34\,\epsilon J(\theta_0)\right] \longrightarrow 0, \quad\text{and}
$$
$$
\mathbb{P}_{\theta_0}\left[\left|A_n(\theta_0 + \epsilon) + \epsilon J(\theta_0)\right| \ge \frac34\,\epsilon J(\theta_0)\right] \longrightarrow 0.
$$

Proof. Consider a Taylor series expansion of the term $A_n(\theta_0 - \epsilon)$ around $\theta_0$, which gives
$$
A_n(\theta_0 - \epsilon) = A_n(\theta_0) - \epsilon B_n(\theta_0) + \frac{\epsilon^2}{2}\,C_n(\theta') = A_n - \epsilon B_n + \frac{\epsilon^2}{2}\,C_n,
$$
where we denote $A_n \equiv A_n(\theta_0)$, $B_n \equiv B_n(\theta_0)$ and $C_n \equiv C_n(\theta')$. Then,
$$
\left|A_n(\theta_0 - \epsilon) - \epsilon J(\theta_0)\right| = \left|A_n - \epsilon\left(B_n + J(\theta_0)\right) + \frac{\epsilon^2}{2}\,C_n\right| \le |A_n| + \epsilon\,|B_n + J(\theta_0)| + \frac{\epsilon^2}{2}\,|C_n|.
$$
Thus, the first probability expression becomes
$$
\begin{aligned}
\mathbb{P}_{\theta_0}\left[\left|A_n(\theta_0-\epsilon) - \epsilon J(\theta_0)\right| \ge \frac34\,\epsilon J(\theta_0)\right]
&\le \mathbb{P}_{\theta_0}\left[|A_n| + \epsilon\,|B_n + J(\theta_0)| + \frac{\epsilon^2}{2}\,|C_n| \ge \frac34\,\epsilon J(\theta_0)\right]\\
&\le \mathbb{P}_{\theta_0}\left[|A_n| \ge \frac{\epsilon}{4}\,J(\theta_0)\right] + \mathbb{P}_{\theta_0}\left[\epsilon\,|B_n + J(\theta_0)| \ge \frac{\epsilon}{4}\,J(\theta_0)\right]\\
&\quad + \mathbb{P}_{\theta_0}\left[\frac{\epsilon^2}{2}\,|C_n| \ge \frac{\epsilon}{4}\,J(\theta_0)\right]. \tag{29}
\end{aligned}
$$

From Theorem 5.1, we have
$$
A_n = o_P(1), \quad B_n + J(\theta_0) = o_P(1) \quad\text{and}\quad C_n = O_P(1).
$$
Thus, each of the three terms on the right side of the last inequality in (29) approaches zero as $n \to \infty$. This proves the first part of the lemma; the second part follows analogously.

Proof of Corollary 5.1. From Lemma 5.1, it follows that for any $\epsilon > 0$,
$$
\mathbb{P}_{\theta_0}\left[A_n(\theta_0 - \epsilon) > 0 \ \text{ and } \ A_n(\theta_0 + \epsilon) < 0\right] \longrightarrow 1, \quad\text{as } n \longrightarrow \infty.
$$
Thus, using the continuity of $A_n(\theta)$ in $\theta$, we conclude that for any $\epsilon > 0$,
$$
\mathbb{P}_{\theta_0}\left[\,\exists \text{ a root of (13) in the interval } (\theta_0 - \epsilon,\, \theta_0 + \epsilon)\,\right] \longrightarrow 1, \quad\text{as } n \longrightarrow \infty.
$$
Let us denote this zero of $A_n(\theta)$ by
$$
\hat{\theta}_{n,WLE}(\epsilon) = \inf\left\{\theta : \theta \in (\theta_0 - \epsilon,\, \theta_0 + \epsilon) \text{ and } A_n(\theta) = 0\right\}.
$$
The measurability of $\hat{\theta}_{n,WLE}(\epsilon)$ follows along the same lines as illustrated in Serfling (1981). However, this root depends upon both $\theta_0$ (which is unknown) and $\epsilon$. It can easily be seen that $\hat{\theta}_{n,WLE}(\epsilon)$ is consistent for $\theta_0$. Let us now describe one technique that avoids the dependence of the root on $\theta_0$ and $\epsilon$.

Let $\{T_n\}_{n\in\mathbb{N}}$ be a sequence of consistent estimators of $\theta_0$, which can be obtained under fairly general conditions. Define the following estimator:
$$
\hat{\theta}_{n,WLE} = \begin{cases}
\text{the solution of (13) closest to } T_n, & \text{if at least one solution exists},\\
\theta_1 \in \Theta, \text{ arbitrary}, & \text{if no solution exists}.
\end{cases}
$$
Employing this technique, we obtain a sequence $\{\hat{\theta}_{n,WLE}\}$ of consistent solutions of (13), completing the proof.

Corollary 5.2. $\sqrt{n}\left(\hat{\theta}_{n,WLE} - \theta_0\right) \xrightarrow{\mathcal{D}} N\left(0,\, J^{-1}(\theta_0)\right)$.

Proof. Expanding the left side of (13) at $\theta = \hat{\theta}_{n,WLE}$ around the true parameter $\theta_0$ results in
$$
\sqrt{n}\left(\hat{\theta}_{n,WLE} - \theta_0\right) = -\frac{\sqrt{n}\,A_n(\theta_0)}{B_n(\theta_0) + \frac{(\hat{\theta}_{n,WLE} - \theta_0)}{2}\,C_n(\theta')}.
$$
Now, from Theorem 5.1, we get
$$
\sqrt{n}\,A_n(\theta_0) \xrightarrow{\mathcal{D}} N(0,\, J(\theta_0)).
$$
Also, from Corollary 5.1, $(\hat{\theta}_{n,WLE} - \theta_0) = o_P(1)$ and $C_n(\theta') = O_P(1)$, so that $(\hat{\theta}_{n,WLE} - \theta_0)\,C_n(\theta') = o_P(1)$; moreover, by part (b) of Theorem 5.1, $B_n(\theta_0) \xrightarrow{P} \mathbb{E}[\nabla U_{\theta_0}(X,\Delta)] = -J(\theta_0)$. Hence, we conclude
$$
\sqrt{n}\left(\hat{\theta}_{n,WLE} - \theta_0\right) \xrightarrow{\mathcal{D}} N\left(0,\, J^{-1}(\theta_0)\right).
$$
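A quick Monte Carlo sketch of the limit in Corollary 5.2, under the same assumed exponential setup as in the earlier sketches: since the WLE shares the MLE's limiting law at the model, the closed-form censored MLE is used here as an asymptotically equivalent surrogate to keep the sketch short. For this model $J(\theta_0) = \mathbb{P}(\Delta = 1)/\theta_0^2 = 1/(\theta_0(\theta_0 + c))$.

```python
import numpy as np

rng = np.random.default_rng(seed=2)
theta0, c, n, reps = 1.0, 0.3, 500, 2000   # hypothetical rates and sizes

est = np.empty(reps)
for r in range(reps):
    T = rng.exponential(1.0 / theta0, size=n)
    C = rng.exponential(1.0 / c, size=n)
    X, Delta = np.minimum(T, C), (T <= C)
    est[r] = Delta.sum() / X.sum()          # censored MLE of the Exp(theta) model

# Limiting variance J^{-1}(theta0) = theta0*(theta0 + c) for this model;
# the empirical variance of sqrt(n)*(theta_hat - theta0) should be close to it.
print(np.var(np.sqrt(n) * (est - theta0)), theta0 * (theta0 + c))
```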

Let us now prove the main Theorem 5.1.

Proof of Theorem 5.1. We first make a slight modification in the definition of the residual function, as given by
$$
\tilde{\tau}_{KM,\theta_0}(x) = \begin{cases}
0, & \text{if } 0 \le F_{\theta_0}(x) \le \frac{1}{kn^\alpha},\\[1ex]
\dfrac{\hat{F}_{KM}(x)}{F_{\theta_0}(x)} - 1, & \text{if } \frac{1}{kn^\alpha} < F_{\theta_0}(x) \le \frac12,\\[1ex]
\dfrac{\hat{S}_{KM}(x)}{S_{\theta_0}(x)} - 1, & \text{if } \frac12 < F_{\theta_0}(x) \le 1 - \frac{1}{kn^\alpha},\\[1ex]
0, & \text{if } 1 - \frac{1}{kn^\alpha} < F_{\theta_0}(x) \le 1,
\end{cases} \tag{33}
$$
where $\alpha$ is a suitable positive constant. At first sight this change may seem contrary to the idea conveyed by the original definition of the residual function. However, a residual equal to zero leads to a weight equal to 1 and, if anything, makes the asymptotics easier. Notice also that in any real data analysis or simulation we can always make the smallest observation fall outside the two extreme regions of zero residual by appropriately choosing the value of $k$, namely $k > \frac{1}{n^\alpha\,F_{\theta_0}(X_{(1)})}$, where $X_{(1)}$ denotes the minimum observation. In particular, therefore, the use of the residual (33) makes no difference to our estimation scheme. To avoid further notational complications, we will continue to write $\tau_{KM,\theta_0}(x)$ in the subsequent developments, but will actually use the residual function $\tilde{\tau}_{KM,\theta_0}(x)$ defined in (33).

To prove the first assertion of the theorem, we consider the Taylor series expansion
$$
H(\tau_{KM,\theta_0}(x)) - 1 = H(0) + \tau_{KM,\theta_0}(x)\,H'(0) + \frac12\,\tau^2_{KM,\theta_0}(x)\,H''(\xi_{n,\theta_0}(x)) - 1 = \frac12\,\tau^2_{KM,\theta_0}(x)\,H''(\xi_{n,\theta_0}(x)),
$$
where $\xi_{n,\theta_0}(x)$ lies between $0$ and $\tau_{KM,\theta_0}(x)$.

As $n$ grows large, $\xi_{n,\theta_0}(x) \to 0$. Hence, by the assumed smoothness of the weight function $H(\cdot)$, we can assert that there exists $m_{1,n} \to m_1 < \infty$ such that
$$
\left|H(\tau_{KM,\theta_0}(x)) - 1\right| \le m_{1,n}\,\tau^2_{KM,\theta_0}(x).
$$
So,
$$
\mathbb{E}\left[\{H(\tau_{KM,\theta_0}(X)) - 1\}^2\right] \le m^2_{1,n}\,\mathbb{E}\left[\tau^4_{KM,\theta_0}(X)\right].
$$
We now proceed to evaluate $\mathbb{E}\left[\tau^4_{KM,\theta_0}(X)\right]$. For this, let us first partition the sample space according to the definition of the modified residual as below:
$$
\mathcal{X}_1 = \left[F^{-1}_{\theta_0}(0),\ F^{-1}_{\theta_0}\left(\tfrac{1}{kn^\alpha}\right)\right], \qquad
\mathcal{X}_2 = \left(F^{-1}_{\theta_0}\left(\tfrac{1}{kn^\alpha}\right),\ F^{-1}_{\theta_0}\left(\tfrac12\right)\right],
$$
$$
\mathcal{X}_3 = \left(F^{-1}_{\theta_0}\left(\tfrac12\right),\ F^{-1}_{\theta_0}\left(1 - \tfrac{1}{kn^\alpha}\right)\right], \qquad
\mathcal{X}_4 = \left(F^{-1}_{\theta_0}\left(1 - \tfrac{1}{kn^\alpha}\right),\ F^{-1}_{\theta_0}(1)\right].
$$
The overall sample space is then $\bigcup_{j=1}^4 \mathcal{X}_j$ and
$$
\mathbb{E}\left[\tau^4_{KM,\theta_0}(X)\right] = \sum_{j=1}^4 \mathbb{E}\left[\tau^4_{KM,\theta_0}(X)\,\mathbb{1}_{\{X\in\mathcal{X}_j\}}\right].
$$

At this juncture, the asymptotic distributional result for the Kaplan-Meier estimator of the survival function (Kalbfleisch and Prentice, 2011; Fleming and Harrington, 2011; Andersen et al., 1985, 2012; Peterson Jr, 1977) needs to be invoked, as it provides the basis for the proof of the theorem. The asymptotic distribution of the Kaplan-Meier estimator of the survival function is given by
$$
\sqrt{n}\left(\hat{S}_{KM}(x) - S_{\theta_0}(x)\right) \xrightarrow{\mathcal{D}} N\left(0,\ S^2_{\theta_0}(x)\,\sigma^2(x)\right), \quad\text{where}\quad \sigma^2(x) = \int_0^x \frac{\lambda_{\theta_0}(u)}{S_{\theta_0}(u)\,S_C(u)}\,du.
$$
Chang (1991) computed moments of the Kaplan-Meier estimator of different orders, which we will use while computing higher order moments of the residual function $\tau_{KM,\theta_0}(X)$. We list below the second, third and fourth order moments as established in Chang (1991).

For $x \in \mathcal{X}_3$, the formulae are
$$
\begin{aligned}
\mathbb{E}\left[\tau^2_{KM,\theta_0}(x)\right] &= \frac1n\,\sigma^2(x) + \frac{1}{n^2}\int_0^x\left(\frac{1}{S^2_X(u)} - \frac{1}{S_X(u)}\right)\lambda_{\theta_0}(u)\,du - \frac{1}{2n^2}\,\sigma^4(x) + o(n^{-2}),\\
\mathbb{E}\left[\tau^3_{KM,\theta_0}(x)\right] &= \frac{1}{n^2}\left[-\int_0^x \frac{\lambda_{\theta_0}(u)}{S^2_X(u)}\,du + \frac32\,\sigma^4(x)\right] + o(n^{-2}),\\
\mathbb{E}\left[\tau^4_{KM,\theta_0}(x)\right] &= \frac{3}{n^2}\,\sigma^4(x) + o(n^{-2}), \tag{34}
\end{aligned}
$$
while for $x \in \mathcal{X}_2$, they are
$$
\begin{aligned}
\mathbb{E}\left[\tau^2_{KM,\theta_0}(x)\right] &= \frac{S^2_{\theta_0}(x)}{F^2_{\theta_0}(x)}\left[\frac1n\,\sigma^2(x) + \frac{1}{n^2}\int_0^x\left(\frac{1}{S^2_X(u)} - \frac{1}{S_X(u)}\right)\lambda_{\theta_0}(u)\,du - \frac{1}{2n^2}\,\sigma^4(x)\right] + o(n^{-2}),\\
\mathbb{E}\left[\tau^3_{KM,\theta_0}(x)\right] &= \frac{1}{n^2}\cdot\frac{S^3_{\theta_0}(x)}{F^3_{\theta_0}(x)}\left[\int_0^x \frac{\lambda_{\theta_0}(u)}{S^2_X(u)}\,du - \frac32\,\sigma^4(x)\right] + o(n^{-2}),\\
\mathbb{E}\left[\tau^4_{KM,\theta_0}(x)\right] &= \frac{3}{n^2}\cdot\frac{S^4_{\theta_0}(x)}{F^4_{\theta_0}(x)}\,\sigma^4(x) + o(n^{-2}). \tag{35}
\end{aligned}
$$
We will now evaluate the integrals of the right sides of the moment expressions (except the small-$o$ terms) listed in (34) and (35). As such, we have
$$
\begin{aligned}
\int_{\mathcal{X}_3} \sigma^4(x)\,dF_X(x)
&= \int_{\mathcal{X}_3}\left(\int_0^x \frac{\lambda_{\theta_0}(u)}{S_{\theta_0}(u)\,S_C(u)}\,du\right)^2 dF_X(x)\\
&\le \int_{\mathcal{X}_3}\frac{1}{S^2_C(x)}\left(\int_0^x \frac{f_{\theta_0}(u)}{S^2_{\theta_0}(u)}\,du\right)^2 dF_X(x)\\
&= \int_{\mathcal{X}_3}\frac{1}{S^2_C(x)}\left(\frac{1}{S^2_{\theta_0}(x)} - \frac{2}{S_{\theta_0}(x)} + 1\right) dF_X(x)\\
&\le \frac{1}{S_C\left(F^{-1}_{\theta_0}\left(1-\frac{1}{kn^\alpha}\right)\right)}\left(kn^\alpha - 1 - \frac{2\log(kn^\alpha)}{S_C\left(F^{-1}_{\theta_0}\left(\frac12\right)\right)} - 2e\right)\\
&\le (kn^\alpha)^{\epsilon}\left(kn^\alpha - 1 - \frac{2\log(kn^\alpha)}{S_C\left(F^{-1}_{\theta_0}\left(\frac12\right)\right)} - 2e\right)\\
&= O\left(n^{\alpha(1+\epsilon)}\right). \tag{36}
\end{aligned}
$$

Further,
$$
\begin{aligned}
\int_{\mathcal{X}_2} \frac{S^4_{\theta_0}(x)}{F^4_{\theta_0}(x)}\,\sigma^4(x)\,dF_X(x)
&\le \int_{\mathcal{X}_2} \frac{1}{S^2_C(x)}\cdot\frac{S^4_{\theta_0}(x)}{F^4_{\theta_0}(x)}\,dF_X(x)\\
&\le (kn^\alpha)^2\left(1 - \frac{1}{kn^\alpha}\right)^3\left[\frac{1}{S_C\left(F^{-1}_{\theta_0}\left(\frac12\right)\right)} - \frac{1}{S_C\left(F^{-1}_{\theta_0}\left(\frac{1}{kn^\alpha}\right)\right)}\right]\\
&\quad + \frac{1}{S_C\left(F^{-1}_{\theta_0}\left(\frac12\right)\right)}\left[(kn^\alpha - 2) - 2\log\frac{kn^\alpha}{2} + \left(\frac12 - \frac{1}{kn^\alpha}\right)\right]\\
&= O\left(n^{2\alpha}\right). \tag{37}
\end{aligned}
$$

Combining the results obtained in (36) and (37), we have,

$$
\frac{3}{n^2}\left[\int_{\mathcal{X}_3}\sigma^4(x)\,dF_X(x) + \int_{\mathcal{X}_2}\frac{S^4_{\theta_0}(x)}{F^4_{\theta_0}(x)}\,\sigma^4(x)\,dF_X(x)\right] \le O\left(n^{-2+\alpha(1+\epsilon)}\right) + O\left(n^{-2+2\alpha}\right). \tag{38}
$$
Now, in order to make the resulting term in (38) an $o\left(\frac1n\right)$ quantity, the following constraints need to be imposed on the value of $\alpha$:
$$
-2 + \alpha(1+\epsilon) + 1 < 0 \iff \alpha < \frac{1}{1+\epsilon}, \qquad -2 + 2\alpha + 1 < 0 \iff \alpha < \frac12.
$$
From the above inequalities, we get $\alpha < \min\left\{\frac{1}{1+\epsilon},\ \frac12\right\}$. Choosing an appropriate value of $\epsilon > 0$ and thereby that of $\alpha$, we can determine the required rate at which the quantities in (36) and (37) converge to zero. Evaluating the other moments in (34) and (35) in a similar fashion, we conclude that
$$
\int_{\mathcal{X}_3}\mathbb{E}\left[\tau^3_{KM,\theta_0}(x)\right]dF_X(x) = o(1), \qquad \int_{\mathcal{X}_2}\mathbb{E}\left[\tau^3_{KM,\theta_0}(x)\right]dF_X(x) = o(1),
$$
$$
\int_{\mathcal{X}_3}\mathbb{E}\left[\tau^2_{KM,\theta_0}(x)\right]dF_X(x) = o(1), \qquad \int_{\mathcal{X}_2}\mathbb{E}\left[\tau^2_{KM,\theta_0}(x)\right]dF_X(x) = o(1).
$$
However, the small-$o$ terms in (36) and (37) are also functions of $x$ and hence need to be integrated. To handle these residual terms, we apply the following result from Andersen et al. (2012, p. 85) and show that these small-$o$ terms ($o(n^{-2})$) do not influence the $o\left(\frac1n\right)$ rates obtained in (38).

Lemma 5.2. Let $Z_n(x)$ be a sequence of stochastic processes such that $Z_n(x) \xrightarrow{P} f(x)$ for almost all $x$, where $f$ is a deterministic function with $\int |f(x)|\,dx < \infty$. Further, suppose that for all $\eta > 0$ there exists $g_\eta$ with $\int g_\eta < \infty$ such that
$$
\liminf_{n\to\infty}\ \mathbb{P}\left[|Z_n(x)| \le g_\eta(x)\ \forall x\right] \ge 1 - \eta. \tag{39}
$$
Then,
$$
\sup_{t > 0}\left|\int_0^t Z_n(x)\,dx - \int_0^t f(x)\,dx\right| \xrightarrow{P} 0. \tag{40}
$$
For our purpose, we will apply a Dvoretzky-Kiefer-Wolfowitz type inequality for the Kaplan-Meier estimator (Bitouzé et al., 1999), which states that for any $\eta > 0$ there exists a constant $D > 0$ such that
$$
\mathbb{P}\left[\sup_x\ S_C(x)\left|\hat{F}_{KM}(x) - F_{\theta_0}(x)\right| > \eta\right] \le \frac52\,\exp\left(-2n\eta^2 + D\sqrt{n}\,\eta\right). \tag{41}
$$
As such, we fix $\eta > 0$ and proceed in the following manner.

 4  Sˆ (x) sup KM − 1 > η4 P  S (x)  x∈X3 θ0

15 " # Sˆ (x) = sup KM − 1 > η P S (x) x∈X3 θ0   knα sup Sˆ (x) − S (x) > η 6 P KM θ0 x∈X3 " # knα ˆ 6 P −1 1 sup SC (x){SKM(x) − Sθ0 (x)} > η S (F (1 − )) x∈X C θ0 knα 3   (knα)1+ sup S (x){Sˆ (x) − S (x)} > η 6 P C KM θ0 x∈X3

5 h 1−2α(1+) 2 1−2α(1+) i exp −2n η + Dn 2 η . 6 2 1 By choosing α < , we get an n ∈ such that for all n n , 2(1 + ) 0 N > 0  4  Sˆ (x) sup KM − 1 η4 1 − η P  S (x) 6  > x∈X3 θ0  4  Sˆ (x) or, KM − 1 η4, ∀x ∈ X 1 − η. P  S (x) 6 3 > θ0 In other words,   ˆ 4 SKM(x) 4 lim inf P  − 1 6 η , ∀x ∈ X3 > 1 − η. (42) n→∞ S (x) θ0 We now define  4   Sˆ (x) 3  Z (x) = KM − 1 − σ4(x) f (x). 3,n S (x) n2 X  θ0  3 Since, it has been shown that σ4(x) = o(1), we can now apply Lemma 5.2 by n2 choosing the dominating function gη to be

4 gη(x) = 2η fX (x), x ∈ X3. Hence, we conclude that Z |Z3,n(x)|dx = o(1). (43) X3 In the same vein, we can also conclude that Z |Z2,n(x)|dx = o(1), (44) X2

where
$$
Z_{2,n}(x) = \left[\left(\frac{\hat{F}_{KM}(x)}{F_{\theta_0}(x)} - 1\right)^4 - \frac{3}{n^2}\left(\frac{S_{\theta_0}(x)}{F_{\theta_0}(x)}\right)^4\sigma^4(x)\right] f_X(x), \quad x \in \mathcal{X}_2.
$$

Combining (43) and (44), we have,

$$
n\,\mathbb{E}\left[\tau^4_{KM,\theta_0}(X)\right] \le O\left(n^{-1+\alpha(1+\epsilon)}\right) + O\left(n^{-1+2\alpha}\right). \tag{45}
$$
For the second and third order moments, we argue in the same way to conclude that

$$
\mathbb{E}\left[\tau^2_{KM,\theta_0}(X)\right] = o(1) \quad\text{and}\quad \mathbb{E}\left[\tau^3_{KM,\theta_0}(X)\right] = o(1). \tag{46}
$$
Combining all these, we finally have

$$
\mathbb{E}\left[\left|(H(\tau_{KM,\theta_0}(X)) - 1)\,U_{\theta_0}(X,\Delta)\right|\right] \le \mathbb{E}^{1/2}\left[(H(\tau_{KM,\theta_0}(X)) - 1)^2\right]\,\mathbb{E}^{1/2}\left[U^2_{\theta_0}(X,\Delta)\right] \le \frac{m_2}{n^\beta},
$$
for some $m_2 < \infty$ and $\beta > \frac12$, since, by (A5), $\mathbb{E}\left[U^2_{\theta_0}(X,\Delta)\right] = J(\theta_0) < \infty$. Consequently,
$$
\mathbb{E}\left[\left|\frac{1}{\sqrt n}\sum_{i=1}^n (H(\tau_{KM,\theta_0}(X_i)) - 1)\,U_{\theta_0}(X_i,\Delta_i)\right|\right] \le \frac{1}{\sqrt n}\sum_{i=1}^n \mathbb{E}\left[\left|(H(\tau_{KM,\theta_0}(X_i)) - 1)\,U_{\theta_0}(X_i,\Delta_i)\right|\right] \le \frac{1}{\sqrt n}\sum_{i=1}^n \frac{m_2}{n^\beta} = \frac{m_2}{n^{\beta - 1/2}} \to 0.
$$
Then, an application of Markov's inequality gives

$$
\frac{1}{\sqrt n}\sum_{i=1}^n (H(\tau_{KM,\theta_0}(X_i)) - 1)\,U_{\theta_0}(X_i,\Delta_i) \xrightarrow{P} 0,
$$
which proves part (a). For part (b), we have

$$
\left|B_n(\theta_0) - \frac1n\sum_{i=1}^n \nabla U_{\theta_0}(X_i,\Delta_i)\right| \le \frac1n\sum_{i=1}^n\left|H'(\tau_{KM,\theta_0}(X_i))(1 + \tau_{KM,\theta_0}(X_i))\,\tilde{U}_{\theta_0}(X_i)\,U_{\theta_0}(X_i,\Delta_i)\right| + \frac1n\sum_{i=1}^n\left|(H(\tau_{KM,\theta_0}(X_i)) - 1)\,\nabla U_{\theta_0}(X_i,\Delta_i)\right|. \tag{47}
$$

For the second sum on the right side, we proceed as in the proof of part (a) to obtain
$$
\frac1n\sum_{i=1}^n \mathbb{E}\left[\left|(H(\tau_{KM,\theta_0}(X_i)) - 1)\,\nabla U_{\theta_0}(X_i,\Delta_i)\right|\right] \le \frac{m_3}{n^\beta},
$$
which implies
$$
\frac1n\sum_{i=1}^n \left|(H(\tau_{KM,\theta_0}(X_i)) - 1)\,\nabla U_{\theta_0}(X_i,\Delta_i)\right| \xrightarrow{P} 0.
$$
Now, for the first sum in (47), we consider the expansion

$$
H'(\tau_{KM,\theta_0}(x)) = H'(0) + \tau_{KM,\theta_0}(x)\,H''(\xi'_{\theta_0}(x)) = \tau_{KM,\theta_0}(x)\,H''(\xi'_{\theta_0}(x)),
$$
where $\xi'_{\theta_0}(x)$ lies between $0$ and $\tau_{KM,\theta_0}(x)$. Then, by an application of the Cauchy-Schwarz inequality and Assumption (A4), we get

$$
\begin{aligned}
\mathbb{E}\left|H'(\tau_{KM,\theta_0}(X))(1 + \tau_{KM,\theta_0}(X))\,\tilde{U}_{\theta_0}(X)\,U_{\theta_0}(X,\Delta)\right|
&\le \mathbb{E}^{1/2}\left[\{H'(\tau_{KM,\theta_0}(X))(1+\tau_{KM,\theta_0}(X))\}^2\right]\,\mathbb{E}^{1/2}\left[\{\tilde{U}_{\theta_0}(X)\,U_{\theta_0}(X,\Delta)\}^2\right]\\
&\le m_4\,\mathbb{E}^{1/2}\left[\tau^2_{KM,\theta_0}(X) + 2\tau^3_{KM,\theta_0}(X) + \tau^4_{KM,\theta_0}(X)\right]\\
&\le \frac{m_4}{n^\gamma}, \quad\text{for some } \gamma > 0.
\end{aligned}
$$

From the above calculations, we conclude that the right side of (47) is $o_P(1)$, which proves part (b). For part (c), we observe that

$$
\begin{aligned}
C_n(\theta') &= \frac1n\sum_{i=1}^n \nabla_2\left(H(\tau_{KM,\theta}(X_i))\,U_\theta(X_i,\Delta_i)\right)\Big|_{\theta=\theta'}\\
&= \frac1n\sum_{i=1}^n H(\tau_{KM,\theta'}(X_i))\,\nabla_2 U_{\theta'}(X_i,\Delta_i)\\
&\quad - \frac2n\sum_{i=1}^n H'(\tau_{KM,\theta'}(X_i))(1+\tau_{KM,\theta'}(X_i))\,\tilde{U}_{\theta'}(X_i)\,\nabla U_{\theta'}(X_i,\Delta_i)\\
&\quad - \frac1n\sum_{i=1}^n H'(\tau_{KM,\theta'}(X_i))(1+\tau_{KM,\theta'}(X_i))\,\nabla\tilde{U}_{\theta'}(X_i)\,U_{\theta'}(X_i,\Delta_i)\\
&\quad + \frac1n\sum_{i=1}^n H'(\tau_{KM,\theta'}(X_i))(1+\tau_{KM,\theta'}(X_i))\,\tilde{U}^2_{\theta'}(X_i)\,U_{\theta'}(X_i,\Delta_i)\\
&\quad + \frac1n\sum_{i=1}^n H''(\tau_{KM,\theta'}(X_i))(1+\tau_{KM,\theta'}(X_i))^2\,\tilde{U}^2_{\theta'}(X_i)\,U_{\theta'}(X_i,\Delta_i).
\end{aligned}
$$

Now, owing to the consistency of $\hat{\theta}_{n,WLE}$ established in Corollary 5.1, for large enough $n$ we can choose a neighbourhood of $\theta_0$ which contains $\theta'$. Hence, under Assumption (A3), each of the terms on the right side is bounded in probability. Thus, we conclude that $C_n(\theta') = O_P(1)$, which proves part (c) and completes the proof of Theorem 5.1.
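Part (a) of Theorem 5.1 can also be illustrated numerically. The sketch below, under the same assumed exponential setup as before, compares $A_n(\theta_0)$ with the unweighted mean score at growing sample sizes; the $\sqrt{n}$-scaled difference should shrink. For brevity the sketch uses the untruncated residual (12) rather than the modified residual (33).

```python
import numpy as np

rng = np.random.default_rng(seed=3)
theta0, c = 1.0, 0.3   # hypothetical model and censoring rates

def km_survival(x, delta):
    """Kaplan-Meier estimate of P(T > t) at each observed x."""
    order = np.argsort(x)
    factors = 1.0 - delta[order] / np.arange(len(x), 0, -1)
    surv = np.empty(len(x))
    surv[order] = np.cumprod(factors)
    return surv

for n in [100, 1000, 10000]:
    T = rng.exponential(1.0 / theta0, size=n)
    C = rng.exponential(1.0 / c, size=n)
    X, Delta = np.minimum(T, C), (T <= C).astype(float)
    S_km = km_survival(X, Delta)
    F_theta = 1.0 - np.exp(-theta0 * X)
    tau = np.where(F_theta <= 0.5, (1.0 - S_km) / F_theta - 1.0,
                   S_km / (1.0 - F_theta) - 1.0)   # residual (12), p = 1/2
    w = np.exp(-0.5 * tau**2)                      # the same illustrative H as before
    score = Delta * (1.0 / theta0 - X) - (1 - Delta) * X   # U_theta0(X_i, Delta_i)
    A_n, mean_score = np.mean(w * score), np.mean(score)
    print(n, np.sqrt(n) * abs(A_n - mean_score))   # should decrease with n
```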

References

Andersen, P. K., Borgan, O., Gill, R. D., and Keiding, N. (2012). Statistical models based on counting processes. Springer Science & Business Media.

Andersen, P. K., Borgan, Ø., Hjort, N. L., Arjas, E., Stene, J., and Aalen, O. (1985). Counting process models for life history data: A review [with discussion and reply]. Scandinavian Journal of Statistics, 12(2):97–158.

Biswas, A., Roy, T., Majumder, S., and Basu, A. (2015). A new weighted likelihood approach. Stat, 4(1):97–107.

Bitouzé, D., Laurent, B., and Massart, P. (1999). A Dvoretzky–Kiefer–Wolfowitz type inequality for the Kaplan–Meier estimator. In Annales de l'Institut Henri Poincaré (B) Probability and Statistics, volume 35, pages 735–763. Elsevier.

Chang, M. N. (1991). Moments of the Kaplan–Meier estimator. Sankhyā: The Indian Journal of Statistics, Series A, pages 27–50.

Fleming, T. R. and Harrington, D. P. (2011). Counting processes and survival analysis, volume 169. John Wiley & Sons.

Kalbfleisch, J. D. and Prentice, R. L. (2011). The statistical analysis of failure time data, volume 360. John Wiley & Sons.

Kaplan, E. L. and Meier, P. (1958). Nonparametric estimation from incomplete observations. Journal of the American Statistical Association, 53(282):457–481.

Lehmann, E. L. and Casella, G. (2006). Theory of point estimation. Springer Science & Business Media.

Majumder, S., Biswas, A., Roy, T., Bhandari, S., and Basu, A. (2016). Statis- tical inference based on a new weighted likelihood approach. arXiv preprint arXiv:1610.07949.

Peterson Jr, A. V. (1977). Expressing the Kaplan-Meier estimator as a function of empirical subsurvival functions. Journal of the American Statistical Association, 72(360a):854–858.

Reid, N. (1981). Influence functions for censored data. Ann. Statist., 9(1):78–92.

Serfling, R. J. (1981). Approximation Theorems of Mathematical Statistics. Wiley Series in Probability and Statistics. John Wiley & Sons.
