A Bahadur-Kiefer theorem beyond the largest observation

Citation for published version (APA): Einmahl, J. H. J. (1992). A Bahadur-Kiefer theorem beyond the largest observation. (Memorandum COSOR; Vol. 9234). Technische Universiteit Eindhoven.

Document status and date: Published: 01/01/1992

Document Version: Publisher’s PDF, also known as Version of Record (includes final page, issue and volume numbers)


EINDHOVEN UNIVERSITY OF TECHNOLOGY Department of Mathematics and Computing Science

Memorandum COSOR 92-34 A Bahadur-Kiefer theorem beyond the largest observation J.H.J. Einmahl

Eindhoven, September 1992 The Netherlands Eindhoven University of Technology Department of Mathematics and Computing Science , statistics, operations research and systems theory P.O. Box 513 5600 MB Eindhoven - The Netherlands

Secretariate: Dommelbuilding 0.03 Telephone: 040-473130

ISSN 0926-4493

A BAHADUR-KIEFER THEOREM BEYOND THE LARGEST OBSERVATION

John H.J. Einmahl* Eindhoven University of Technology

It is shown that under natural extreme-value conditions a distributional Bahadur-Kiefer theorem holds in a point lying outside the sample. The limiting distribution is degenerate if the extreme-value index is equal to one; the proper refinement for that case is also established. In both cases the limiting distribution is chi-square with one degree of freedom.

* Part of the research was performed at the MSRI, Berkeley, and supported by NSF grant DMS-8505550.

AMS 1991 subject classifications: 62E20,60F05. Key words and phrases: Bahadur-Kiefer representation, extreme values, tail and quantile estimation, weak limit theorem.

Running head: Bahadur-Kiefer outside sample.

1 Introduction

Let $X_1, X_2, \ldots$ be a sequence of i.i.d. random variables with common distribution function $F$, which we assume for convenience to be differentiable and strictly increasing. Denote by $Q$ the inverse or quantile function pertaining to $F$. Well-studied estimators for $F$ and $Q$ are the empirical distribution function

(1) $F_n(x) = \frac{1}{n}\sum_{i=1}^{n} 1_{(-\infty,x]}(X_i), \quad x \in \mathbb{R},$

and its left-continuous inverse $Q_n$, the empirical quantile function. Denote the empirical process by

(2) $\alpha_n(x) = n^{1/2}(F_n(x) - F(x)), \quad x \in \mathbb{R},$

and the quantile process by

(3) $\beta_n(t) = n^{1/2} f(Q(t))(Q_n(t) - Q(t)), \quad 0 < t < 1,$

where $f := F'$. Bahadur (1966) introduced and studied the process

(4) $R_n(t) := \alpha_n(Q(t)) + \beta_n(t), \quad 0 < t < 1.$

This process is also thoroughly investigated in Kiefer (1967, 1970) and hence now known as the Bahadur-Kiefer process. Recently there has been again a lot of interest in the process Rn and some of its generalizations. We refer to Shorack (1982), Einmahl and Mason (1988), Deheuvels and Mason (1990a,b, 1991), Beirlant and Einmahl (1990), Beirlant, Deheuvels, Einmahl and Mason (1991), Deheuvels (1992a,b), Arcones and Mason (1992) and Deheuvels and Einmahl (1993).
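The definitions (1)-(4) can be made concrete with a small numerical sketch. The function and variable names below (`bahadur_kiefer`, `demo`) are illustrative and not from the paper, and the standard exponential distribution is chosen only because $F$, $Q$ and $f$ are available in closed form.

```python
import math
import random

def bahadur_kiefer(sample, t, F, Q, f):
    """Return (alpha_n(Q(t)), beta_n(t), R_n(t)) for one sample and one t in (0, 1)."""
    n = len(sample)
    xs = sorted(sample)
    q_t = Q(t)
    # Empirical distribution function (1) evaluated at Q(t).
    F_n = sum(1 for x in xs if x <= q_t) / n
    # Left-continuous empirical quantile: Q_n(t) is the ceil(n t)-th order statistic.
    Q_n = xs[max(math.ceil(n * t), 1) - 1]
    alpha = math.sqrt(n) * (F_n - F(q_t))        # empirical process (2) at Q(t)
    beta = math.sqrt(n) * f(q_t) * (Q_n - q_t)   # quantile process (3)
    return alpha, beta, alpha + beta             # Bahadur-Kiefer process (4)

def demo(n=10_000, t=0.5, seed=0):
    """Illustration with standard exponential data (an assumption of this sketch)."""
    rng = random.Random(seed)
    sample = [rng.expovariate(1.0) for _ in range(n)]
    F = lambda x: 1.0 - math.exp(-x)
    Q = lambda u: -math.log(1.0 - u)
    f = lambda x: math.exp(-x)
    return bahadur_kiefer(sample, t, F, Q, f)
```

For a fixed $t$ in the middle of the distribution, the two terms returned by `demo()` are typically of opposite sign and nearly cancel, which is why $R_n(t)$ is of smaller order than either of them.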

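A main and striking result on the topic, stated in Kiefer (1970) and proved in Shorack (1982, upper bound) and Deheuvels and Mason (1990a, lower bound), is that under additional regularity on $F$

(5) $\lim_{n\to\infty} n^{1/4}(\log n)^{-1/2}\, \dfrac{\sup_{0<t<1}|R_n(t)|}{\left(\sup_{0<t<1}|\alpha_n(Q(t))|\right)^{1/2}} = 1 \quad \text{a.s.},$

and hence

(6) $n^{1/4}(\log n)^{-1/2} \sup_{0<t<1}|R_n(t)| \xrightarrow{d} \left(\sup_{0<t<1}|B(t)|\right)^{1/2},$

where $B$ is a Brownian bridge. For fixed $t$ we have

(7) $n^{1/4} R_n(t) \xrightarrow{d} Z\,|B(t)|^{1/2},$

where $Z$ is standard normal and independent of $B$.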

If we are not only interested in the middle of a distribution (fixed $t$) but also in its right tail ($t = t_n \to 1$ as $n \to \infty$) it turns out that most of the asymptotic distributional theory for $\alpha_n(Q(t_n))$, $\beta_n(t_n)$ and $R_n(t_n)$ can be obtained by using essentially the same techniques as for fixed $t$, as long as $n(1 - t_n) \to \infty$, i.e. we work in the intermediate tail. Several papers have been published on this subject as well; for obvious but neglected implications for estimating the tail of $F$, see Einmahl (1990).

It is clear that $F_n$ is useless as an estimator of $F$ if $t_n$ tends to one so fast that

(8) $n(1 - t_n) \to 0 \quad (n \to \infty),$

i.e. if we want to estimate $F(y_n)$, subject to the condition $n(1 - F(y_n)) \to 0$. Observe that we then work outside the sample. In fact a little reflection shows that estimating $F(y_n)$ in a reasonable way is impossible without additional assumptions on $F$ if we are as far in the tail as here. A rather weak and natural assumption is that $F$ is in the domain of max-attraction of an extreme-value distribution, meaning that there exist constants $a_n > 0$ and $b_n$, $n \in \mathbb{N}$, such that

(9) $\lim_{n\to\infty} F^n(a_n x + b_n) = G_\gamma(x),$

where

(10) $\exp\left(-(1 + \gamma x)^{-1/\gamma}\right) =: G_\gamma(x), \quad \gamma \in \mathbb{R},$

for those $x$ for which $1 + \gamma x > 0$. (For $\gamma = 0$ interpret $(1 + \gamma x)^{-1/\gamma}$ as $e^{-x}$.) The parameter $\gamma$ is called the extreme-value index. Under assumptions (8) and (9) estimators for $F(y_n)$ and $Q(t_n)$ are derived and (under additional conditions) shown to be asymptotically normal in Dijk and de Haan (1991) and de Haan and Rootzén (1992), respectively (in the sequel denoted as DdH and dHR). See also Dekkers and de Haan (1989), and Dekkers, Einmahl and de Haan (1989) for more background.
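As a concrete instance of the domain-of-attraction condition, the sketch below checks numerically that a Pareto distribution is attracted to $G_\gamma$. The choice $F(y) = 1 - y^{-1/\gamma}$, $y \ge 1$, with norming constants $a_n = \gamma n^\gamma$ and $b_n = n^\gamma$, is an assumption made for illustration; with it, $F^n(a_n x + b_n) = (1 - (1+\gamma x)^{-1/\gamma}/n)^n$, which converges to $G_\gamma(x)$.

```python
import math

def G(x, gamma):
    """Extreme-value distribution function (10); gamma = 0 gives exp(-exp(-x))."""
    if gamma == 0.0:
        return math.exp(-math.exp(-x))
    z = 1.0 + gamma * x
    if z <= 0.0:
        # Outside the support: 0 left of the lower endpoint (gamma > 0),
        # 1 right of the upper endpoint (gamma < 0).
        return 0.0 if gamma > 0.0 else 1.0
    return math.exp(-z ** (-1.0 / gamma))

def pareto_max_cdf(x, n, gamma):
    """F^n(a_n x + b_n) for the Pareto example with gamma > 0 described above."""
    y = n ** gamma * (1.0 + gamma * x)   # a_n x + b_n
    if y < 1.0:
        return 0.0                        # below the support of F
    return (1.0 - y ** (-1.0 / gamma)) ** n

def attraction_error(x, n, gamma):
    """Absolute difference between the normalised maximum's d.f. and its limit."""
    return abs(pareto_max_cdf(x, n, gamma) - G(x, gamma))
```

Evaluating `attraction_error(1.0, n, 0.5)` for increasing `n` shows the error shrinking roughly like $1/n$ in this example.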

In this paper we establish an extreme-tail analogue of (7) under assumptions similar to those in DdH and dHR, thus combining modern statistical theory of extremes with the classical results by Bahadur and Kiefer. It turns out that the limiting distribution is chi-square with one degree of freedom.

The main results are presented in section 2. The proofs are deferred to section 3.

2 Main Results

When (9) holds for a certain $\gamma \in \mathbb{R}$ (see (10)), this condition can be rewritten as

(11) $\lim_{n\to\infty} n\left(1 - F(a_n x + b_n)\right) = -\log G_\gamma(x) = (1 + \gamma x)^{-1/\gamma}$

for all $x$ with $1 + \gamma x > 0$. For our purpose condition (11) can be expressed most conveniently as follows: there exist functions $a > 0$ and $b$ such that

(12) $\lim_{s\to\infty} s\left(1 - F(b(s) + x\,a(s))\right) = (1 + \gamma x)^{-1/\gamma}$

for all $x$ with $1 + \gamma x > 0$. Writing $U = Q(1 - 1/I)$, where $I$ denotes the identity function, we can and will choose $b = U$ and $a = I U'$. We need much more notation. Write $p_n = 1 - t_n$; throughout we will assume $np_n \to 0$ $(n \to \infty)$. As suggested before we also write $y_n = Q(t_n)$. Let $k = k_n$ be a sequence of positive integers with $k \le n$, and set $a_n = k/(np_n)$. Our estimator of $Q(t_n)$ is defined by

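(13) $\hat Q_n(t_n) = \hat b(n/k) + \hat a(n/k)\,\dfrac{a_n^{\hat\gamma} - 1}{\hat\gamma}$

with

(14) $\hat b(n/k) = X_{n-k:n},$

(the $(k + 1)$-st order statistic from above),

(15) $\hat a(n/k) = X_{n-k:n}\, M_n^{(1)} / \rho_1(\hat\gamma),$

(16) $M_n^{(r)} = \dfrac{1}{k}\sum_{i=0}^{k-1}\left(\log X_{n-i:n} - \log X_{n-k:n}\right)^r \quad (r = 1, 2),$

(17) $\rho_1(\gamma) = \begin{cases} 1, & \gamma \ge 0, \\ \dfrac{1}{1-\gamma}, & \gamma < 0, \end{cases}$

(18) $\hat\gamma = M_n^{(1)} + 1 - \dfrac{1}{2}\left(1 - \dfrac{(M_n^{(1)})^2}{M_n^{(2)}}\right)^{-1}.$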
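The estimators above can be sketched in a few lines. The code follows (14) and (16)-(18) directly; the scale estimator $\hat a(n/k) = X_{n-k:n} M_n^{(1)}/\rho_1(\hat\gamma)$ and the quantile estimator $\hat b(n/k) + \hat a(n/k)(a_n^{\hat\gamma} - 1)/\hat\gamma$ with $a_n = k/(np_n)$ are taken as assumptions of this sketch. The function names are hypothetical, and the data must be positive because logarithms of order statistics are used.

```python
import math

def moment_estimates(sample, k):
    """Return (gamma_hat, a_hat, b_hat) based on the k largest observations."""
    xs = sorted(sample)
    n = len(xs)
    if not 0 < k < n:
        raise ValueError("need 0 < k < n")
    b_hat = xs[n - k - 1]                       # X_{n-k:n}, cf. (14)
    if b_hat <= 0:
        raise ValueError("X_{n-k:n} must be positive")
    logs = [math.log(xs[n - 1 - i]) - math.log(b_hat) for i in range(k)]
    m1 = sum(logs) / k                          # M_n^{(1)}, cf. (16)
    m2 = sum(v * v for v in logs) / k           # M_n^{(2)}, cf. (16)
    if m2 == 0.0 or m1 * m1 >= m2:
        raise ValueError("degenerate top order statistics")
    gamma_hat = m1 + 1.0 - 0.5 / (1.0 - m1 * m1 / m2)            # cf. (18)
    rho1 = 1.0 if gamma_hat >= 0 else 1.0 / (1.0 - gamma_hat)    # cf. (17)
    a_hat = b_hat * m1 / rho1                   # assumed form of the scale estimator
    return gamma_hat, a_hat, b_hat

def quantile_estimate(sample, k, p):
    """Estimate Q(1 - p); p may be smaller than 1/n (assumed form of the estimator)."""
    n = len(sample)
    gamma_hat, a_hat, b_hat = moment_estimates(sample, k)
    a_n = k / (n * p)
    if gamma_hat == 0.0:
        return b_hat + a_hat * math.log(a_n)    # limiting case gamma_hat = 0
    return b_hat + a_hat * (a_n ** gamma_hat - 1.0) / gamma_hat
```

With `p` chosen so that `n * p` is well below one, `quantile_estimate` returns a value beyond the sample maximum, which is the regime the paper is concerned with.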

Set $q(x) = \int_1^x s^{\gamma-1}\log s\, ds$. Assuming among other things that $k \to \infty$, $k/n \to 0$ $(n \to \infty)$ and

(19) $\lim_{n\to\infty} \dfrac{\log a_n}{\sqrt{k}} = 0,$

it is shown in dHR that

(20) $\dfrac{\sqrt{k}\,(\hat Q_n(t_n) - Q(t_n))}{a(n/k)\, q(a_n)}$

is asymptotically normal with mean zero and variance

(21) $\ldots, \quad \gamma < 0.$

Now we turn to the estimation of $F(y_n)$. A difficulty is the restriction $1 + \gamma x > 0$ in connection with (12). So we assume, setting $s = n/k$ in (12), that $y_n$ can be written as

(22) $y_n = b(n/k) + a(n/k)\,\dfrac{c_n^{\gamma} - 1}{\gamma}$

with $c_n > 0$ (for positive $s$ interpret $(s^0 - 1)/0$ as $\log s$). The estimator of $F(y_n)$ is

(23) $\hat F_n(y_n) = 1 - \dfrac{k}{n}\left(\max\left(0,\, 1 + \hat\gamma\,\dfrac{y_n - \hat b(n/k)}{\hat a(n/k)}\right)\right)^{-1/\hat\gamma}.$
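The tail estimator (23) is a direct plug-in formula. A minimal sketch follows, with hypothetical argument names and with the estimates $\hat\gamma$, $\hat a(n/k)$, $\hat b(n/k)$ passed in as plain numbers. It is restricted to levels $y \ge \hat b(n/k)$, which is an assumption of the sketch that covers the extreme-tail case of interest here.

```python
import math

def tail_probability_estimate(y, n, k, gamma_hat, a_hat, b_hat):
    """Estimate F(y) as in (23) for a level y at or above b_hat."""
    if a_hat <= 0 or not 0 < k < n:
        raise ValueError("need a_hat > 0 and 0 < k < n")
    if y < b_hat:
        raise ValueError("this sketch only handles y >= b_hat")
    x = (y - b_hat) / a_hat
    if gamma_hat == 0.0:
        tail = math.exp(-x)               # limiting case gamma_hat = 0
    else:
        z = max(0.0, 1.0 + gamma_hat * x)
        # z == 0 can only happen for gamma_hat < 0, when y lies beyond the
        # estimated right endpoint; the estimated tail mass is then zero.
        tail = 0.0 if z == 0.0 else z ** (-1.0 / gamma_hat)
    return 1.0 - (k / n) * tail
```

Plugging in the level $y = \hat b + \hat a\,(a^{\hat\gamma} - 1)/\hat\gamma$ with $a = k/(np)$ returns $1 - p$, so under the assumed form of the quantile estimator the two estimators are inverses of each other.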

In DdH it is established that under conditions similar to those for the proof of (20)-(21) and if $\gamma > -\frac{1}{2}$ we have that

(24)

is also asymptotically normal. The limiting random variable has exactly the same distribution as the one for the expression in (20).

Remark Note that for a result like (24) to be reasonable a restriction on $\gamma$ is needed. Even if we are sampling from the uniform-$[0, \theta]$ distribution ($\theta > 0$), a parametric model with $\gamma = -1$, it is hopeless to estimate $F(y_n)$ with some $\hat F_n(y_n)$ in such a way that

(25) $\dfrac{1 - \hat F_n(y_n)}{1 - F(y_n)} \xrightarrow{P} 1, \quad n \to \infty.$

It is observed in DdH that in fact the limiting random variable for (24) is minus the one for (20), or equivalently that the sum of the expressions in (20) and (24) converges to zero in probability. It is our aim in this paper to present the proper distributional result for this sum. Write

(26) $R_n := \dfrac{c_n^{\gamma}\sqrt{k}}{q(c_n)}\,\{\text{sum of the expressions in (20) and (24)}\}$

and let $\chi_1^2$ denote a chi-square random variable with one degree of freedom.

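Theorem 1 Let $\gamma > -\frac{1}{2}$, $np_n \to 0$, $k \to \infty$, $k/n \to 0$. Assume $U$ three times differentiable, $g$ positive and

(27) $\sqrt{k}\,\dfrac{g'(\log(n/k))}{g(\log(n/k))} \to 0,$

(28) $\log a_n \sup_{v \ge \log(n/k)} |g''(v)/g'(v)| \to 0,$

(29) $q(a_n)/(a_n^{\gamma}\sqrt{k}) \to 0,$

then as $n \to \infty$

(30)

with

(31) $\ldots$ for $\gamma \ge 0$, $\quad \ldots$ for $-\frac{1}{2} < \gamma < 0$.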

Observe that the limit is degenerate for $\gamma = 1$; the proper refinement of Theorem 1 for this case is as follows.

Theorem 2 Let $\gamma = 1$. Assume the conditions of Theorem 1 are fulfilled with (27) and (29) strengthened to

(32) $(\log a_n)\sqrt{k}\,\dfrac{g'(\log(n/k))}{g(\log(n/k))} \to 0,$

(33)

respectively, then as $n \to \infty$

(34) $R_{n1} \xrightarrow{d} 2\chi_1^2.$

Note that in Theorem 1 the extra (compared with the results in DdH and dHR) multiplication factor is $c_n^{\gamma}\sqrt{k}/q(c_n)$, which tends to $\infty$ by (29) and the fact that $a_n$ may be replaced by $c_n$ there. In Theorem 2 this extra multiplication factor is $\sqrt{k}$; compared with Theorem 1 the extra factor is $\log c_n$.

It is also noteworthy that the assumptions needed to prove (30) are weaker than the assumptions obtained by uniting those required for the asymptotic normality of $\hat Q_n(t_n)$ and $\hat F_n(y_n)$ in dHR and DdH, respectively. In particular, (2.11) in the latter paper is superfluous here.

3 Proofs

Define

(35) $G_n = \sqrt{k}\,(\hat\gamma - \gamma),$

(36) $A_n = \sqrt{k}\,\left(\hat a(n/k)/a(n/k) - 1\right),$

(37) $B_n = \sqrt{k}\,\left(\hat b(n/k) - b(n/k)\right)/a(n/k),$

and note (see dHR for details) that as $n \to \infty$, $G_n$, $A_n$ and $B_n$ converge weakly to nondegenerate limiting random variables $G$, $A$ and $B$, respectively. In fact $G$ and $B$ are normal for all $\gamma \in \mathbb{R}$ and $A$ is normal if $\gamma \neq 0$.

Proof of Theorem 1 Write

(38)

$= -\dfrac{n c_n^{2\gamma+1}}{q^2(c_n)}\left((1 - \hat F_n(y_n)) - \dfrac{k}{n c_n}\right) + \dfrac{n c_n^{2\gamma+1}}{q^2(c_n)}\left(p_n - \dfrac{k}{n c_n}\right),$

also

(39)

Observe that $(U(1/p_n) - U(n/k))/a(n/k) = (c_n^{\gamma} - 1)/\gamma$. Now it follows from the last part of the proof of the Theorem in dHR that as $n \to \infty$

(40)

Using (29), this easily implies that

(41)

and, weaker, $c_n/a_n \to 1$.

Throughout the remainder of the proof we assume $\gamma \neq 0$; the small adjustments necessary for the case $\gamma = 0$ are routine and hence omitted. We now consider the sum of the deterministic terms in (38) and (39):

(42)

By a two-step Taylor expansion of $1 - (c_n/a_n)^{\gamma}$, it follows from (41) that the RHS of (42) converges to zero. So it suffices to prove the theorem with $R_n$ replaced by

(43)

Define

(44)

C~-l C~-l) (----- l' I where for the last identity (36) and (37) are used. Set

Now observe, using again (36) and (37), that the second term in (43) can be rewritten as

Now from (41) it readily follows that this expression is equal to

(47) $-\dfrac{c_n^{\gamma}\sqrt{k}}{q(c_n)}\, W_n\left(1 + \dfrac{A_n}{\sqrt{k}}\right) + o_p(1).$

Following the lines of the proof of Lemma 2 in DdH and using the asymptotic normality of $W_n$ obtained in that paper and (35), we find that the first term in (43) can be written as

(48)

$= \dfrac{c_n^{\gamma}\sqrt{k}}{q(c_n)}\, c_n^{\gamma - \hat\gamma}\, W_n - \dfrac{1}{2}(\gamma + 1) W_n^2 + o_p(1)$

$= \dfrac{c_n^{\gamma}\sqrt{k}}{q(c_n)}\, W_n - \dfrac{c_n^{\gamma}\log c_n}{q(c_n)}\, G_n W_n - \dfrac{1}{2}(\gamma + 1) W_n^2 + o_p(1).$

Combining (43), (47), (48) and the aforementioned weak convergence of $A_n$ we see that

(49) $R_n' = -\dfrac{c_n^{\gamma}\log c_n}{q(c_n)}\, G_n W_n - \dfrac{1}{2}(\gamma + 1) W_n^2 + o_p(1).$

For $\gamma \ge 0$ we have $c_n^{\gamma}\log c_n/q(c_n) \to \gamma$ and $G_n = -W_n + o_p(1)$ (see dHR), so in this case

(50+) $R_n' = \frac{1}{2}(\gamma - 1) W_n^2 + o_p(1);$

for $\gamma < 0$, $c_n^{\gamma}\log c_n/q(c_n) \to 0$ $(n \to \infty)$ and hence

(50−) $R_n' = -\frac{1}{2}(\gamma + 1) W_n^2 + o_p(1).$

Now using Lemma 1 in DdH, which gives the asymptotic normality of $W_n$, in combination with (50), completes the proof. □
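The behaviour of the factor $c_n^{\gamma}\log c_n/q(c_n)$ used in the last step can be checked numerically. Integrating by parts gives the closed form $q(x) = x^{\gamma}\log x/\gamma - (x^{\gamma} - 1)/\gamma^2$ for $\gamma \neq 0$ and $q(x) = \frac{1}{2}\log^2 x$ for $\gamma = 0$; for $\gamma = 1$ this is $x\log x - x + 1$. The sketch below (function names are illustrative) compares the closed form with a midpoint-rule quadrature and evaluates the ratio for large arguments.

```python
import math

def q_closed(x, gamma):
    """q(x) = integral from 1 to x of s^(gamma-1) log s ds, in closed form."""
    if gamma == 0.0:
        return 0.5 * math.log(x) ** 2
    return x ** gamma * math.log(x) / gamma - (x ** gamma - 1.0) / gamma ** 2

def q_numeric(x, gamma, steps=20_000):
    """Midpoint rule after the substitution s = exp(u), u in [0, log x]."""
    h = math.log(x) / steps
    # In the variable u the integrand is u * exp(gamma * u).
    return h * sum((i + 0.5) * h * math.exp(gamma * (i + 0.5) * h)
                   for i in range(steps))

def ratio(x, gamma):
    """x^gamma log x / q(x), the factor multiplying G_n W_n in (49)."""
    return x ** gamma * math.log(x) / q_closed(x, gamma)
```

For $\gamma > 0$ the ratio approaches $\gamma$ only at the rate $1/\log x$, so very large arguments are needed before it is close to its limit; for $\gamma < 0$ it tends to zero much faster.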

Proof of Theorem 2 This proof follows the lines of the previous one; we will highlight the differences. Consider

p _ _k ) + -----::--k_ (51) ( n nCn Cn log Cn

= k (Cn -1) + k (an _ cn) = an_k_ (aCn _ 1)2 . log Cn an Cn log Cn Cn log Cn n

Similarly as in dHR it follows from (28) and (32) that

(52)

and $c_n/a_n \to 1$ $(n \to \infty)$. Hence the RHS of (51) tends to zero as $n \to \infty$. So, it suffices to prove (34) with $R_{n1}$ replaced by

(53)

Using (36), (37), (52) and (44) we see that the second term in (53) is equal to

(54)

$= -\dfrac{k}{c_n\log c_n}\, W_n\left(1 + \dfrac{A_n}{\sqrt{k}}\right) + o_p(1) = -\sqrt{k}\, W_{n1}\left(1 + \dfrac{A_n}{\sqrt{k}}\right) + o_p(1),$

where $W_{n1} := \sqrt{k}\, W_n/(c_n\log c_n)$. Similarly as in the previous proof, the first term in (53) can be written as (use (33) and (35))

_k_ {cn log Cn I,V _ I(' ) c~ 10g2 CnW2 } (1) (55) " I'- I nl 2 1+ 1 2" nl + op 1og Cn C~v k cn'Yk

From (54) and (55) it follows that we may replace $R_{n1}'$ in turn by

(56)

Define $H_n = (c_n^{\hat\gamma} - 1)/\hat\gamma - (c_n - 1)$ and observe that we have from our (44) and the proof of Lemma 1 in dHR

and hence

n ~A n (58) -G - C log cnW 0 (logc ) n - () nl n p I'-k q Cn +q (Cn ) + v,.

By (56), (33) and the fact that $q(c_n) = c_n\log c_n - c_n + 1$, this results in

9 n . 11 _ WA 1 W {C log en H[ ~A }- 1 W 2 (1) (59) R n1 - - nl n + og Cn nl q(c ) ttnl + q(c ) n og Cn nl + Op n n

$= W_{n1}^2 + o_p(1).$

Now applying Lemma 1 in DdH yields the result. □

Acknowledgement I am grateful to Laurens de Haan for comments on a first draft of this paper.

References

Arcones, M.A. and Mason, D.M. (1992). The Bahadur-Kiefer expansion for M-estimators. Preprint.

Bahadur, R.R. (1966). A note on quantiles in large samples. Ann. Math. Statist. 37, 577-580.

Beirlant, J., Deheuvels, P., Einmahl, J.H.J. and Mason, D.M. (1991). Bahadur-Kiefer theorems for uniform spacings processes. Teo. Verojtn. i ee Primenenia 36, 724-743.

Beirlant, J. and Einmahl, J.H.J. (1990). Bahadur-Kiefer theorems for the product-limit process. J. Multivariate Analysis 35, 276-294.

Deheuvels, P. (1992a). Pointwise Bahadur-Kiefer-type theorems (I). J. Mogyoródi Memorial Volume, (to appear).

Deheuvels, P. (1992b). Pointwise Bahadur-Kiefer-type theorems (II). Nonparametric Statistics and Related Topics (A. Saleh ed.). North Holland, Amsterdam, (to appear).

Deheuvels, P. and Einmahl, J.H.J. (1993). Approximations and two-sample tests based on P-P and Q-Q plots of the Kaplan-Meier estimators of life-time distributions. J. Multivariate Analysis, (to appear).

Deheuvels, P. and Mason, D.M. (1990a). Bahadur-Kiefer-type processes. Ann. Probab. 18, 669-697.

Deheuvels, P. and Mason, D.M. (1990b). A Bahadur-Kiefer-type two sample statistic with applications to tests of goodness of fit. In Colloq. Math. Soc. János Bolyai 57, 157-172. North-Holland, Amsterdam.

Deheuvels, P. and Mason, D.M. (1991). A functional LIL approach to pointwise Bahadur-Kiefer theorems. Proc. of the 8th Conf. on Probab. in Banach spaces, Bowdoin, Maine, (to appear).

Dekkers, A.L.M., Einmahl, J.H.J. and de Haan, L. (1989). A moment estimator for the index of an extreme-value distribution. Ann. Statist. 17, 1833-1855.

Dekkers, A.L.M. and de Haan, L. (1989). On the estimation of the extreme-value index and large quantile estimation. Ann. Statist. 17, 1795-1832.

Dijk, V. and de Haan, L. (1991). On the estimation of the exceedance probability of a high level. Proceedings conference on Order Statistics, Alexandria, Egypt, (to appear).

Einmahl, J.H.J. (1990). The empirical distribution function as a tail estimator. Statistica Neerlandica 44, 79-82.

Einmahl, J.H.J. and Mason, D.M. (1988). Strong limit theorems for weighted quantile processes. Ann. Probab. 16, 1623-1643.

de Haan, L. and Rootzén, H. (1992). On the estimation of high quantiles. J. Statist. Planning and Inference, (to appear).

Kiefer, J. (1967). On Bahadur's representation of sample quantiles. Ann. Math. Statist. 38, 1323-1342.

Kiefer, J. (1970). Deviations between the sample quantile process and the sample df. In Nonparametric Techniques in Statistical Inference (M. Puri, ed.) 299-319. Cambridge University Press, London.

Shorack, G.R. (1982). Kiefer's theorem via the Hungarian construction. Z. Wahrscheinlichkeit. verw. Gebiete 61, 369-373.
