Chapter 2. Order Statistics
1 The Order Statistics
For a sample of independent observations $X_1, X_2, \ldots, X_n$ on a distribution $F$, the ordered sample values
$$X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)},$$
or, in more explicit notation,
$$X_{(1:n)} \le X_{(2:n)} \le \cdots \le X_{(n:n)},$$
are called the order statistics. If $F$ is continuous, then with probability 1 the order statistics of the sample take distinct values (and conversely). There is an alternative way to visualize order statistics that, although it does not necessarily yield simple expressions for the joint density, does allow simple derivation of many important properties of order statistics. It can be called the quantile function representation. The quantile function (or inverse distribution function, if you wish) is defined by
$$F^{-1}(y) = \inf\{x : F(x) \ge y\}. \tag{1}$$
Now it is well known that if $U$ is a Uniform(0,1) random variable, then $F^{-1}(U)$ has distribution function $F$. Moreover, if we envision $U_1, \ldots, U_n$ as being iid Uniform(0,1) random variables and $X_1, \ldots, X_n$ as being iid random variables with common distribution $F$, then
$$(X_{(1)}, \ldots, X_{(n)}) \stackrel{d}{=} (F^{-1}(U_{(1)}), \ldots, F^{-1}(U_{(n)})), \tag{2}$$
where $\stackrel{d}{=}$ is to be read as "has the same distribution as."
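The representation (2) is easy to check by simulation. Below is a minimal Python sketch (the exponential example and all names are my own choices, not from the text): both constructions produce vectors with the same joint distribution.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5

# Exponential(1): F(x) = 1 - exp(-x), so F^{-1}(u) = -log(1 - u).
def F_inv(u):
    return -np.log1p(-u)

# Direct method: sort an iid exponential sample.
x_sorted = np.sort(rng.exponential(size=n))

# Quantile-function representation (2): sort uniforms, then apply F^{-1}.
u_sorted = np.sort(rng.uniform(size=n))
x_via_quantile = F_inv(u_sorted)

print(x_sorted)         # one realization of (X_(1), ..., X_(n))
print(x_via_quantile)   # a realization with the same joint distribution
```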
1.1 The Quantiles and Sample Quantiles
Let $F$ be a distribution function (continuous from the right, as usual). The proof that $F$ is right-continuous can be obtained from the following fact:
$$F(x + h_n) - F(x) = P(x < X \le x + h_n),$$
where $\{h_n\}$ is a sequence of real numbers such that $0 < h_n \downarrow 0$ as $n \to \infty$. It follows from the continuity property of probability ($P(\lim_n A_n) = \lim_n P(A_n)$ if $\lim A_n$ exists) that
$$\lim_{n\to\infty} [F(x + h_n) - F(x)] = 0,$$
and hence that $F$ is right-continuous. Let $D$ be the set of all discontinuity points of $F$ and $n$ be a positive integer. Set
$$D_n = \left\{x \in D : P(X = x) \ge \frac{1}{n}\right\}.$$
Since $F(\infty) - F(-\infty) = 1$, the number of elements in $D_n$ cannot exceed $n$. Clearly $D = \cup_n D_n$, and it follows that $D$ is countable. In other words, the set of discontinuity points of a distribution function $F$ is countable. We then conclude that every distribution function $F$ admits the decomposition
$$F(x) = \alpha F_d(x) + (1 - \alpha) F_c(x), \quad (0 \le \alpha \le 1),$$
where $F_d$ is a step function and $F_c$ is continuous. Moreover, the above decomposition is unique. Let $\lambda$ denote the Lebesgue measure on $\mathcal{B}$, the $\sigma$-field of Borel sets in $R$. It follows from the Lebesgue decomposition theorem that we can write $F_c(x) = \beta F_s(x) + (1 - \beta) F_{ac}(x)$, where $0 \le \beta \le 1$, $F_s$ is singular with respect to $\lambda$, and $F_{ac}$ is absolutely continuous with respect to $\lambda$. On the other hand, the Radon-Nikodym theorem implies that there exists a nonnegative Borel-measurable function $f$ on $R$ such that
$$F_{ac}(x) = \int_{-\infty}^{x} f \, d\lambda,$$
where $f$ is called the Radon-Nikodym derivative. This says that every distribution function $F$ admits a unique decomposition
$$F(x) = \alpha_1 F_d(x) + \alpha_2 F_s(x) + \alpha_3 F_{ac}(x), \quad (x \in R),$$
where $\alpha_i \ge 0$ and $\sum_{i=1}^{3} \alpha_i = 1$. For $0 < p < 1$, the $p$th quantile or fractile of $F$ is defined as
$$\xi_p = F^{-1}(p) = \inf\{x : F(x) \ge p\}.$$
This definition is motivated by the following observation:
• If $F$ is continuous and strictly increasing, $F^{-1}$ is defined by
$$F^{-1}(y) = x \quad \text{when } y = F(x).$$
• If $F$ has a discontinuity at $x_0$, suppose that $F(x_0-) < y < F(x_0) = F(x_0+)$. In this case, although there exists no $x$ for which $y = F(x)$, $F^{-1}(y)$ is defined to be equal to $x_0$.
• Now consider the case that $F$ is not strictly increasing. Suppose that
$$F(x) \begin{cases} < y & \text{for } x < a, \\ = y & \text{for } a \le x \le b, \\ > y & \text{for } x > b. \end{cases}$$
Then any value $a \le x \le b$ could be chosen for $x = F^{-1}(y)$. The convention in this case is to define $F^{-1}(y) = a$ (illustrated in the sketch below).
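These conventions are easy to exercise numerically for a discrete distribution, where $F$ has both jumps and flat stretches. A minimal Python sketch (the support points and probabilities are a hypothetical example of mine):

```python
import numpy as np

# Discrete F with atoms at 0, 1, 2 and probabilities 0.3, 0.4, 0.3.
support = np.array([0.0, 1.0, 2.0])
cum = np.cumsum([0.3, 0.4, 0.3])   # F evaluated at the atoms: 0.3, 0.7, 1.0

def F_inv(y):
    """Generalized inverse F^{-1}(y) = inf{x : F(x) >= y}, for 0 < y <= 1."""
    # searchsorted returns the first index where the cdf reaches y
    return support[np.searchsorted(cum, y)]

print(F_inv(0.3))    # 0.0: F(0) = 0.3 >= 0.3, so the infimum is attained at 0
print(F_inv(0.31))   # 1.0: y falls in the jump between F(0) and F(1)
```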
Now we prove that if $U$ is uniformly distributed over the interval $(0, 1)$, then $X = F_X^{-1}(U)$ has cumulative distribution function $F_X(x)$. The proof is straightforward, using property (iii) of Lemma 1 below:
$$P(X \le x) = P[F_X^{-1}(U) \le x] = P[U \le F_X(x)] = F_X(x).$$
Note that discontinuities of $F$ become converted into flat stretches of $F^{-1}$ and flat stretches of $F$ into discontinuities of $F^{-1}$. In particular, $\xi_{1/2} = F^{-1}(1/2)$ is called the median of $F$. Note that $\xi_p$ satisfies
$$F(\xi_p-) \le p \le F(\xi_p).$$
The function $F^{-1}(t)$, $0 < t < 1$, is called the inverse function of $F$. The following proposition, giving useful properties of $F$ and $F^{-1}$, is easily checked.
Lemma 1 Let $F$ be a distribution function. The function $F^{-1}(t)$, $0 < t < 1$, is nondecreasing and left-continuous, and satisfies
(i) $F^{-1}(F(x)) \le x$, $-\infty < x < \infty$,
(ii) $F(F^{-1}(t)) \ge t$, $0 < t < 1$. Hence
(iii) $F(x) \ge t$ if and only if $x \ge F^{-1}(t)$.
Corresponding to a sample $\{X_1, X_2, \ldots, X_n\}$ of observations on $F$, the sample $p$th quantile is defined as the $p$th quantile of the sample distribution function $F_n$, that is, as $F_n^{-1}(p)$. Regarding the sample $p$th quantile as an estimator of $\xi_p$, we denote it by $\hat{\xi}_{pn}$, or simply by $\hat{\xi}_p$ when convenient.
Since the order statistics are equivalent to the sample distribution function $F_n$, their role is fundamental even if not always explicit. Thus, for example, the sample mean may be regarded as the mean of the order statistics, and the sample $p$th quantile may be expressed as
$$\hat{\xi}_{pn} = \begin{cases} X_{n,np} & \text{if } np \text{ is an integer}, \\ X_{n,[np]+1} & \text{if } np \text{ is not an integer}. \end{cases}$$
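This definition transcribes directly into code: it always picks the $\lceil np \rceil$th order statistic. A short Python sketch (the helper name is mine):

```python
import numpy as np

def sample_quantile(x, p):
    """pth sample quantile F_n^{-1}(p): the np-th or ([np]+1)-th order statistic."""
    n = len(x)
    k = int(np.ceil(n * p))      # equals np if np is an integer, [np]+1 otherwise
    return np.sort(x)[k - 1]     # order statistics are 1-indexed in the text

x = np.random.default_rng(1).normal(size=100)
print(sample_quantile(x, 0.5))   # the sample median
```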
1.2 Functions of Order Statistics
Here we consider statistics which may be expressed as functions of order statistics. A variety of short-cut procedures for quick estimates of location or scale parameters, or for quick tests of related hypotheses, are provided in the form of linear functions of order statistics, that is, statistics of the form
$$\sum_{i=1}^{n} c_{ni} X_{(i:n)}.$$
We term such statistics "L-estimates." For example, the sample range $X_{(n:n)} - X_{(1:n)}$ belongs to this class. Another example is given by the $\alpha$-trimmed mean
$$\frac{1}{n - 2[n\alpha]} \sum_{i=[n\alpha]+1}^{n-[n\alpha]} X_{(i:n)},$$
which is a popular competitor of $\bar{X}$ for robust estimation of location. The asymptotic distribution theory of L-statistics takes quite different forms, depending on the character of the coefficients $\{c_{ni}\}$. The representations of $\bar{X}$ and $\hat{\xi}_{pn}$ in terms of order statistics are a bit artificial. On the other hand, for many useful statistics, the most natural and efficient representations are in terms of order statistics. Examples are the extreme values $X_{1:n}$ and $X_{n:n}$ and the sample range $X_{n:n} - X_{1:n}$.
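As a concrete illustration of the robustness of the $\alpha$-trimmed mean, here is a short Python sketch (the contaminated-normal example is my own):

```python
import numpy as np

def trimmed_mean(x, alpha):
    """alpha-trimmed mean: average the order statistics after dropping the
    [n*alpha] smallest and [n*alpha] largest observations."""
    n = len(x)
    k = int(n * alpha)                 # [n*alpha]
    return np.sort(x)[k:n - k].mean()

rng = np.random.default_rng(2)
x = np.concatenate([rng.normal(size=95), rng.normal(50, 1, size=5)])  # 5 outliers
print(x.mean())                # dragged far from 0 by the outliers
print(trimmed_mean(x, 0.1))    # close to the true location 0
```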
1.3 General Properties
Theorem 1 (1) $P(X_{(k)} \le x) = \sum_{i=k}^{n} C(n, i) [F(x)]^i [1 - F(x)]^{n-i}$ for $-\infty < x < \infty$.
(2) The density of $X_{(k)}$ is given by $nC(n-1, k-1) [F(x)]^{k-1} [1 - F(x)]^{n-k} f(x)$.
(3) The joint density of $X_{(k_1)}$ and $X_{(k_2)}$ is given by
$$\frac{n!}{(k_1 - 1)!(k_2 - k_1 - 1)!(n - k_2)!} [F(x_{k_1})]^{k_1 - 1} [F(x_{k_2}) - F(x_{k_1})]^{k_2 - k_1 - 1} [1 - F(x_{k_2})]^{n - k_2} f(x_{k_1}) f(x_{k_2})$$
for $k_1 < k_2$ and $x_{k_1} < x_{k_2}$.
(4) The joint pdf of all the order statistics is $n! f(z_1) f(z_2) \cdots f(z_n)$ for $-\infty < z_1 < \cdots < z_n < \infty$.
(5) Define $V = F(X)$, where $F$ is continuous. Then $V$ is uniformly distributed over $(0, 1)$.
Proof. (1) The event $\{X_{(k)} \le x\}$ occurs if and only if at least $k$ out of $X_1, X_2, \ldots, X_n$ are less than or equal to $x$.
(2) The density of $X_{(k)}$ is given by $nC(n-1, k-1) [F(x)]^{k-1} [1 - F(x)]^{n-k} f(x)$. It can be shown by the fact that
$$\frac{d}{dp} \sum_{i=k}^{n} C(n, i) p^i (1-p)^{n-i} = nC(n-1, k-1) p^{k-1} (1-p)^{n-k}.$$
Heuristically, the $k-1$ smallest observations are $\le x$, the $n-k$ largest are $> x$, and the probability that $X_{(k)}$ falls into a small interval of length $dx$ about $x$ is $f(x)\,dx$.
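Part (1) says that $P(X_{(k)} \le x)$ is an upper binomial tail, which is easy to verify by simulation. A Python sketch (the standard normal example and the parameter values are my own choices):

```python
import numpy as np
from scipy.stats import binom, norm

rng = np.random.default_rng(3)
n, k, x = 10, 3, 0.2
reps = 100_000

samples = rng.normal(size=(reps, n))
kth = np.sort(samples, axis=1)[:, k - 1]     # X_(k) in each replication
empirical = (kth <= x).mean()

# P(X_(k) <= x) = sum_{i=k}^n C(n,i) F(x)^i (1-F(x))^(n-i) = P(Bin(n, F(x)) >= k)
theoretical = 1 - binom.cdf(k - 1, n, norm.cdf(x))
print(empirical, theoretical)                 # the two should agree closely
```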
1.4 Conditional Distribution of Order Statistics
In the following two theorems, we relate the conditional distribution of order statistics (conditioned on another order statistic) to the distribution of order statistics from a population whose distribution is a truncated form of the original population distribution function $F(x)$.
Theorem 2 Let $X_1, X_2, \ldots, X_n$ be a random sample from an absolutely continuous population with cdf $F(x)$ and density function $f(x)$, and let $X_{(1:n)} \le X_{(2:n)} \le \cdots \le X_{(n:n)}$ denote the order statistics obtained from this sample. Then the conditional distribution of $X_{(j:n)}$, given that $X_{(i:n)} = x_i$ for $i < j$, is the same as the distribution of the $(j-i)$th order statistic obtained from a sample of size $n - i$ from a population whose distribution is simply $F(x)$ truncated on the left at $x_i$.
Proof. From the marginal density function of $X_{(i:n)}$ and the joint density function of $X_{(i:n)}$ and $X_{(j:n)}$, we have the conditional density function of $X_{(j:n)}$, given that $X_{(i:n)} = x_i$, as
$$f_{X_{(j:n)}}(x_j \mid X_{(i:n)} = x_i) = \frac{f_{X_{(i:n)}, X_{(j:n)}}(x_i, x_j)}{f_{X_{(i:n)}}(x_i)} = \frac{(n-i)!}{(j-i-1)!(n-j)!} \left[\frac{F(x_j) - F(x_i)}{1 - F(x_i)}\right]^{j-i-1} \left[\frac{1 - F(x_j)}{1 - F(x_i)}\right]^{n-j} \frac{f(x_j)}{1 - F(x_i)}.$$
Here $i < j \le n$ and $x_i \le x_j < \infty$. The result follows easily by realizing that $\{F(x_j) - F(x_i)\}/\{1 - F(x_i)\}$ and $f(x_j)/\{1 - F(x_i)\}$ are the cdf and density function of the population whose distribution is obtained by truncating the distribution $F(x)$ on the left at $x_i$.
Theorem 3 Let $X_1, X_2, \ldots, X_n$ be a random sample from an absolutely continuous population with cdf $F(x)$ and density function $f(x)$, and let $X_{(1:n)} \le X_{(2:n)} \le \cdots \le X_{(n:n)}$ denote the order statistics obtained from this sample. Then the conditional distribution of $X_{(i:n)}$, given that $X_{(j:n)} = x_j$ for $j > i$, is the same as the distribution of the $i$th order statistic in a sample of size $j - 1$ from a population whose distribution is simply $F(x)$ truncated on the right at $x_j$.
Proof. From the marginal density function of $X_{(j:n)}$ and the joint density function of $X_{(i:n)}$ and $X_{(j:n)}$, we have the conditional density function of $X_{(i:n)}$, given that $X_{(j:n)} = x_j$, as
$$f_{X_{(i:n)}}(x_i \mid X_{(j:n)} = x_j) = \frac{f_{X_{(i:n)}, X_{(j:n)}}(x_i, x_j)}{f_{X_{(j:n)}}(x_j)} = \frac{(j-1)!}{(i-1)!(j-i-1)!} \left[\frac{F(x_i)}{F(x_j)}\right]^{i-1} \left[\frac{F(x_j) - F(x_i)}{F(x_j)}\right]^{j-i-1} \frac{f(x_i)}{F(x_j)}.$$
Here $1 \le i < j$ and $-\infty < x_i \le x_j$. The proof is completed by noting that $F(x_i)/F(x_j)$ and $f(x_i)/F(x_j)$ are the cdf and density function of the population whose distribution is obtained by truncating the distribution $F(x)$ on the right at $x_j$.
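Theorem 2 is convenient for simulation checks. In the sketch below (a Python example of mine, using the standard exponential because its left-truncated version is just a shift, by memorylessness), the conditional law of $X_{(j:n)}$ given $X_{(i:n)} = x_i$ is generated as the $(j-i)$th order statistic of $n-i$ truncated draws, and its mean is compared with the value implied by the exponential spacings of Theorem 4 below:

```python
import numpy as np

rng = np.random.default_rng(4)
n, i, j, xi = 10, 3, 7, 0.5
reps = 50_000

# Left-truncated Exp(1) at xi via the inverse cdf: G^{-1}(u) = xi - log(1-u).
u = rng.uniform(size=(reps, n - i))
trunc = xi - np.log1p(-u)
cond = np.sort(trunc, axis=1)[:, j - i - 1]      # (j-i)th order statistic

# E X_(k:m) for Exp(1) is sum_{l=1}^{k} 1/(m-l+1); shift by xi for truncation.
m, k = n - i, j - i
expected = xi + sum(1.0 / (m - l + 1) for l in range(1, k + 1))
print(cond.mean(), expected)                      # should agree closely
```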
1.5 Computer Simulation of Order Statistics
In this section, we will discuss some methods of simulating order statistics from a distribution $F(x)$. First of all, it should be mentioned that a straightforward way of simulating order statistics is to generate a pseudorandom sample from the distribution $F(x)$ and then sort the sample through an efficient algorithm like quick-sort. This general method (being time-consuming and expensive) may be avoided in many instances by making use of some of the distributional properties to be established now.
Suppose, for example, that we wish to generate the complete sample $(x_{(1)}, \ldots, x_{(n)})$ or even a Type II censored sample $(x_{(1)}, \ldots, x_{(r)})$ from the standard exponential distribution. This may be done simply by generating a pseudorandom sample $y_1, \ldots, y_r$ from the standard exponential distribution first, and then setting
$$x_{(i)} = \sum_{j=1}^{i} y_j / (n - j + 1), \quad i = 1, 2, \ldots, r.$$
The reason is as follows:
Theorem 4 Let $X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$ be the order statistics from the standard exponential distribution. Then, the random variables $Z_1, Z_2, \ldots, Z_n$, where
$$Z_i = (n - i + 1)(X_{(i)} - X_{(i-1)}), \quad \text{with } X_{(0)} \equiv 0,$$
are statistically independent and also have standard exponential distributions.
Proof. Note that the joint density function of $X_{(1)}, X_{(2)}, \ldots, X_{(n)}$ is
$$f_{1,2,\ldots,n:n}(x_1, x_2, \ldots, x_n) = n! \exp\left(-\sum_{i=1}^{n} x_i\right), \quad 0 \le x_1 < x_2 < \cdots < x_n < \infty.$$
Now let us consider the transformation
$$Z_1 = nX_{(1)}, \quad Z_2 = (n-1)(X_{(2)} - X_{(1)}), \quad \ldots, \quad Z_n = X_{(n)} - X_{(n-1)},$$
or the equivalent transformation
$$X_{(1)} = \frac{Z_1}{n}, \quad X_{(2)} = \frac{Z_1}{n} + \frac{Z_2}{n-1}, \quad \ldots, \quad X_{(n)} = \frac{Z_1}{n} + \frac{Z_2}{n-1} + \cdots + Z_n.$$
After noting that the Jacobian of this transformation is $1/n!$ and that $\sum_{i=1}^{n} x_i = \sum_{i=1}^{n} z_i$, we immediately obtain the joint density function of $Z_1, Z_2, \ldots, Z_n$ to be
$$f_{Z_1, Z_2, \ldots, Z_n}(z_1, z_2, \ldots, z_n) = \exp\left(-\sum_{i=1}^{n} z_i\right), \quad 0 \le z_1, \ldots, z_n < \infty.$$
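The recipe above needs only $r$ exponential draws and a cumulative sum, with no sorting. A Python sketch (the function name is mine):

```python
import numpy as np

def exp_order_stats(n, r, rng):
    """First r order statistics of n standard exponentials, via Theorem 4:
    x_(i) = sum_{j<=i} y_j / (n - j + 1), with y_j iid Exp(1) spacings."""
    y = rng.exponential(size=r)
    return np.cumsum(y / (n - np.arange(1, r + 1) + 1))

rng = np.random.default_rng(5)
print(exp_order_stats(10, 4, rng))    # a Type II censored sample x_(1),...,x_(4)
```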
If we wish to generate order statistics from the Uniform(0, 1) distribution, we may use the following two theorems and avoid sorting once again. For example, if we only need the $i$th order statistic $u_{(i)}$, it may simply be generated as a pseudorandom observation from a Beta$(i, n-i+1)$ distribution.
Theorem 5 For the Uniform(0, 1) distribution, the random variables V1 = U(i)/U(j) and
V2 = U(j), 1 ≤ i < j ≤ n, are statistically independent, with V1 and V2 having Beta(i, j − i) and Beta(j, n − j + 1) distributions, respectively.
Proof. From Theorem 1(3), we have the joint density function of $U_{(i)}$ and $U_{(j)}$ ($1 \le i < j \le n$) to be
$$f_{i,j:n}(u_i, u_j) = \frac{n!}{(i-1)!(j-i-1)!(n-j)!} u_i^{i-1} (u_j - u_i)^{j-i-1} (1 - u_j)^{n-j}, \quad 0 < u_i < u_j < 1.$$
Now upon making the transformation $V_1 = U_{(i)}/U_{(j)}$ and $V_2 = U_{(j)}$ and noting that the Jacobian of this transformation is $v_2$, we derive the joint density function of $V_1$ and $V_2$ to be
$$f_{V_1, V_2}(v_1, v_2) = \frac{(j-1)!}{(i-1)!(j-i-1)!} v_1^{i-1} (1 - v_1)^{j-i-1} \cdot \frac{n!}{(j-1)!(n-j)!} v_2^{j-1} (1 - v_2)^{n-j},$$
$0 < v_1 < 1$, $0 < v_2 < 1$. From the above equation it is clear that the random variables $V_1$ and $V_2$ are statistically independent, and also that they are distributed as Beta$(i, j-i)$ and Beta$(j, n-j+1)$, respectively.
Theorem 6 For the Uniform(0, 1) distribution, the random variables
$$V_1^* = \frac{U_{(1)}}{U_{(2)}}, \quad V_2^* = \left(\frac{U_{(2)}}{U_{(3)}}\right)^2, \quad \ldots, \quad V_{n-1}^* = \left(\frac{U_{(n-1)}}{U_{(n)}}\right)^{n-1},$$
and $V_n^* = U_{(n)}^n$ are all independent Uniform(0, 1) random variables.
Proof. Let $X_{(1)} < X_{(2)} < \cdots < X_{(n)}$ denote the order statistics from the standard exponential distribution. Then upon making use of the facts that $X = -\log U$ has a standard exponential distribution and that $-\log u$ is a monotonically decreasing function in $u$, we immediately have $X_{(i)} \stackrel{d}{=} -\log U_{(n-i+1)}$. The above equation yields
$$V_i^* = \left(\frac{U_{(i)}}{U_{(i+1)}}\right)^i \stackrel{d}{=} \left(\frac{e^{-X_{(n-i+1)}}}{e^{-X_{(n-i)}}}\right)^i = \exp[-i(X_{(n-i+1)} - X_{(n-i)})] = \exp(-Y_{n-i+1}),$$
upon using the above theorem, where the $Y_i$ are independent standard exponential random variables. The just-described methods of simulating uniform order statistics may also be used easily to generate order statistics from any known distribution $F(x)$ for which $F^{-1}(\cdot)$ is relatively easy to compute. We may simply obtain the order statistics $x_{(1)}, \ldots, x_{(n)}$ from the required distribution $F(\cdot)$ by setting $x_{(i)} = F^{-1}(u_{(i)})$.
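Both shortcuts are easy to code. The sketch below (my own Python example) draws a single $u_{(i)}$ as a Beta$(i, n-i+1)$ variate, and builds the whole ordered uniform sample from Theorem 6 by working down from $U_{(n)}$; no sorting is performed anywhere:

```python
import numpy as np

rng = np.random.default_rng(6)
n, i = 100, 25

# Single order statistic: U_(i) ~ Beta(i, n-i+1), then map through F^{-1}.
u_i = rng.beta(i, n - i + 1)
x_i = -np.log1p(-u_i)              # ith order statistic from Exp(1)
print(u_i, x_i)

# Full ordered sample via Theorem 6: U_(i) = U_(i+1) * (V_i*)^(1/i).
v = rng.uniform(size=n)
u = np.empty(n)
u[n - 1] = v[n - 1] ** (1.0 / n)   # U_(n) = (V_n*)^(1/n)
for k in range(n - 2, -1, -1):     # u[k] stores U_(k+1)
    u[k] = u[k + 1] * v[k] ** (1.0 / (k + 1))
print(u[:5])                        # already sorted increasingly
```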
2 Large Sample Properties of Sample Quantiles
2.1 An Elementary Proof
Consider the sample $p$th quantile, $\hat{\xi}_{pn}$, which is $X_{([np])}$ or $X_{([np]+1)}$ depending on whether $np$ is an integer (here $[np]$ denotes the integer part of $np$). For simplicity, we discuss the properties of $X_{([np])}$ where $p \in (0, 1)$ and $n$ is large. This will in turn inform us of the properties of $\hat{\xi}_{pn}$.
We first consider the case that $X$ is uniformly distributed over $[0, 1]$. Let $U_{([np])}$ denote the sample $p$th quantile. If $i = [np]$, we have
$$nC(n-1, i-1) = \frac{n!}{(i-1)!(n-i)!} = \frac{\Gamma(n+1)}{\Gamma(i)\Gamma(n-i+1)} = \frac{1}{B(i, n-i+1)}.$$
Elementary computations beginning with Theorem 1(2) yield $U_{([np])} \sim \mathrm{Beta}(i_0, n - i_0 + 1)$ where $i_0 = [np]$. Note that
$$E U_{([np])} = \frac{[np]}{n+1} \to p,$$
$$n \,\mathrm{Cov}\left(U_{([np_1])}, U_{([np_2])}\right) = \frac{n [np_1](n + 1 - [np_2])}{(n+1)^2 (n+2)} \to p_1 (1 - p_2), \quad p_1 \le p_2.$$
Using these facts and Chebyshev's inequality, we can easily show that $U_{([np])} \xrightarrow{P} p$ at rate $n^{-1/2}$. This raises the question of whether we can claim that
$$\hat{\xi}_{pn} \xrightarrow{P} \xi_p.$$
Recall that $U = F(X)$. If $F$ is absolutely continuous with finite positive density $f$ at $\xi_p$, it is expected that the above claim holds.
Recall that $U_{([np])} \xrightarrow{P} p$ at rate $n^{-1/2}$. The next question is: what is the distribution of $\sqrt{n}(U_{([np])} - p)$? Note that $U_{([np])}$ is a Beta$([np], n - [np] + 1)$ random variable. Thus, it can be expressed as
$$U_{([np])} = \frac{\sum_{i=1}^{i_0} V_i}{\sum_{i=1}^{i_0} V_i + \sum_{i=i_0+1}^{n+1} V_i},$$
where the $V_i$'s are iid Exp(1) random variables. Observe that
$$\sqrt{n}\left(\frac{\sum_{i=1}^{[np]} V_i}{\sum_{i=1}^{[np]} V_i + \sum_{i=[np]+1}^{n+1} V_i} - p\right) = \frac{\frac{1}{\sqrt{n}}\left\{(1-p)\left(\sum_{i=1}^{[np]} V_i - [np]\right) - p\left(\sum_{i=[np]+1}^{n+1} V_i - (n - [np] + 1)\right) + \left[(1-p)[np] - p(n - [np] + 1)\right]\right\}}{\sum_{i=1}^{n+1} V_i / n},$$
and $(\sqrt{n})^{-1}\{(1-p)[np] - p(n - [np] + 1)\} \to 0$. Since $E(V_i) = 1$ and $\mathrm{Var}(V_i) = 1$, from the central limit theorem it follows that
$$\frac{\sum_{i=1}^{i_0} V_i - i_0}{\sqrt{i_0}} \xrightarrow{d} N(0, 1) \quad \text{and} \quad \frac{\sum_{i=i_0+1}^{n+1} V_i - (n - i_0 + 1)}{\sqrt{n - i_0 + 1}} \xrightarrow{d} N(0, 1).$$
Consequently,
$$\frac{\sum_{i=1}^{i_0} V_i - i_0}{\sqrt{n}} \xrightarrow{d} N(0, p) \quad \text{and} \quad \frac{\sum_{i=i_0+1}^{n+1} V_i - (n - i_0 + 1)}{\sqrt{n}} \xrightarrow{d} N(0, 1-p).$$
Since $\sum_{i=1}^{i_0} V_i$ and $\sum_{i=i_0+1}^{n+1} V_i$ are independent for all $n$,
$$(1-p) \frac{\sum_{i=1}^{[np]} V_i - [np]}{\sqrt{n}} - p \frac{\sum_{i=[np]+1}^{n+1} V_i - (n - [np] + 1)}{\sqrt{n}} \xrightarrow{d} N(0, p(1-p)).$$
Now, using the weak law of large numbers, we have
$$\frac{1}{n+1} \sum_{i=1}^{n+1} V_i \xrightarrow{P} 1.$$
Hence, by Slutsky's Theorem,
$$\sqrt{n}\left(U_{([np])} - p\right) \xrightarrow{d} N(0, p(1-p)).$$
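This limit is easy to see empirically. A quick Python sketch (the parameter choices are mine):

```python
import numpy as np

# Check sqrt(n)(U_([np]) - p) -> N(0, p(1-p)) by Monte Carlo.
rng = np.random.default_rng(7)
n, p, reps = 400, 0.3, 20_000
k = int(n * p)                                    # [np]

u = np.sort(rng.uniform(size=(reps, n)), axis=1)[:, k - 1]
z = np.sqrt(n) * (u - p)
print(z.var(), p * (1 - p))                       # variances should be close
```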
For an arbitrary cdf $F$, we have $X_{([np])} = F^{-1}(U_{([np])})$. Upon expanding $F^{-1}(U_{([np])})$ in a Taylor series around the point $E(U_{([np])}) = [np]/(n+1)$, we get
$$X_{([np])} \stackrel{d}{=} F^{-1}(p) + (U_{([np])} - p)\{f(F^{-1}(D_n))\}^{-1},$$
where the random variable $D_n$ is between $U_{([np])}$ and $p$. This can be rearranged as
$$\sqrt{n}\left(X_{([np])} - F^{-1}(p)\right) \stackrel{d}{=} \sqrt{n}\left(U_{([np])} - p\right)\{f(F^{-1}(D_n))\}^{-1}.$$
When $f$ is continuous at $F^{-1}(p)$, it follows that as $n \to \infty$, $f(F^{-1}(D_n)) \xrightarrow{P} f(F^{-1}(p))$. Using the delta method, this yields
$$\sqrt{n}\left(X_{([np])} - F^{-1}(p)\right) \xrightarrow{d} N\left(0, \frac{p(1-p)}{[f(F^{-1}(p))]^2}\right).$$
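The general formula can be checked the same way for a non-uniform $F$. A sketch for the standard exponential median (my own example):

```python
import numpy as np

# Asymptotic variance p(1-p)/f(F^{-1}(p))^2 for F = Exp(1), p = 1/2.
rng = np.random.default_rng(8)
n, p, reps = 500, 0.5, 20_000

xi_p = -np.log(1 - p)                   # F^{-1}(1/2) = log 2
f_xi = np.exp(-xi_p)                    # f(xi_p) = 1 - p = 1/2
asym_var = p * (1 - p) / f_xi**2        # equals 1 here

q = np.sort(rng.exponential(size=(reps, n)), axis=1)[:, int(n * p) - 1]
print(n * q.var(), asym_var)            # should be close for large n
```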
2.2 A Probability Inequality for $|\hat{\xi}_{pn} - \xi_p|$
We shall use the following result of Hoeffding (1963) to show that $P(|\hat{\xi}_{pn} - \xi_p| > \epsilon) \to 0$ exponentially fast.
Lemma 2 Let $Y_1, \ldots, Y_n$ be independent random variables satisfying $P(a \le Y_i \le b) = 1$ for each $i$, where $a < b$. Then, for $t > 0$,
$$P\left(\sum_{i=1}^{n} Y_i - \sum_{i=1}^{n} E(Y_i) \ge nt\right) \le \exp\left(-2nt^2/(b-a)^2\right).$$
Remark. Suppose that $Y_1, Y_2, \ldots, Y_n$ are independent and identically distributed random variables. Using the Berry-Esseen theorem, we have
$$P\left(\sum_{i=1}^{n} Y_i - \sum_{i=1}^{n} E(Y_i) \ge nt\right) = 1 - \Phi\left(\frac{t\sqrt{n}}{\sqrt{\mathrm{Var}(Y_1)}}\right) + \text{Error}.$$
Here
$$|\text{Error}| \le \frac{c}{\sqrt{n}} \cdot \frac{E|Y_1 - EY_1|^3}{[\mathrm{Var}(Y_1)]^{3/2}}.$$
Theorem 7 Let $0 < p < 1$. Suppose that $\xi_p$ is the unique solution $x$ of $F(x-) \le p \le F(x)$. Then, for every $\epsilon > 0$,
$$P\left(|\hat{\xi}_{pn} - \xi_p| > \epsilon\right) \le 2\exp\left(-2n\delta_\epsilon^2\right), \quad \text{all } n,$$
where $\delta_\epsilon = \min\{F(\xi_p + \epsilon) - p,\ p - F(\xi_p - \epsilon)\}$.
Proof. Let $\epsilon > 0$. Write
$$P\left(|\hat{\xi}_{pn} - \xi_p| > \epsilon\right) = P\left(\hat{\xi}_{pn} > \xi_p + \epsilon\right) + P\left(\hat{\xi}_{pn} < \xi_p - \epsilon\right).$$
By Lemma 1(iii),
$$P\left(\hat{\xi}_{pn} > \xi_p + \epsilon\right) = P(p > F_n(\xi_p + \epsilon)) = P\left(\sum_{i=1}^{n} I(X_i > \xi_p + \epsilon) > n(1-p)\right) = P\left(\sum_{i=1}^{n} V_i - \sum_{i=1}^{n} E(V_i) > n\delta_1\right),$$
where $V_i = I(X_i > \xi_p + \epsilon)$ and $\delta_1 = F(\xi_p + \epsilon) - p$. Likewise,
$$P\left(\hat{\xi}_{pn} < \xi_p - \epsilon\right) = P(p > F_n(\xi_p - \epsilon)) = P\left(\sum_{i=1}^{n} W_i - \sum_{i=1}^{n} E(W_i) > n\delta_2\right),$$
where $W_i = I(X_i < \xi_p - \epsilon)$ and $\delta_2 = p - F(\xi_p - \epsilon)$. Therefore, utilizing Hoeffding's lemma, we have
$$P\left(\hat{\xi}_{pn} > \xi_p + \epsilon\right) \le \exp\left(-2n\delta_1^2\right) \quad \text{and} \quad P\left(\hat{\xi}_{pn} < \xi_p - \epsilon\right) \le \exp\left(-2n\delta_2^2\right).$$
Putting $\delta_\epsilon = \min\{\delta_1, \delta_2\}$, the proof is completed.
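The exponential rate is visible even at modest sample sizes. A Monte Carlo sketch for the Exp(1) median (my choices of $\epsilon$ and $n$):

```python
import numpy as np

rng = np.random.default_rng(9)
p, eps, reps = 0.5, 0.2, 20_000
xi = -np.log(1 - p)                                  # true median, log 2
F = lambda x: 1 - np.exp(-x)
delta = min(F(xi + eps) - p, p - F(xi - eps))

for n in (50, 100, 200):
    med = np.sort(rng.exponential(size=(reps, n)), axis=1)[:, n // 2 - 1]
    emp = (np.abs(med - xi) > eps).mean()
    print(n, emp, 2 * np.exp(-2 * n * delta**2))     # Theorem 7's bound dominates
```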
2.3 Asymptotic Normality
Theorem 8 Let $0 < p < 1$. Suppose that $F$ possesses a density $f$ in a neighborhood of $\xi_p$ and $f$ is positive and continuous at $\xi_p$. Then
$$\sqrt{n}(\hat{\xi}_{pn} - \xi_p) \xrightarrow{d} N\left(0, \frac{p(1-p)}{[f(\xi_p)]^2}\right).$$
Proof. We only consider $p = 1/2$. Note that $\xi_p$ is the unique median since $f(\xi_p) > 0$. First, we consider the case that $n$ is odd (i.e., $n = 2m - 1$). Taking $F^{-1}(1/2) = 0$,
$$P\left(\sqrt{n}\left(X_{(m)} - F^{-1}(\tfrac{1}{2})\right) \le t\right) = P\left(\sqrt{n} X_{(m)} \le t\right) = P\left(X_{(m)} \le t/\sqrt{n}\right).$$
Let $S_n$ be the number of $X$'s that exceed $t/\sqrt{n}$. Then
$$X_{(m)} \le \frac{t}{\sqrt{n}} \quad \text{if and only if} \quad S_n \le m - 1 = \frac{n-1}{2}.$$
Thus $S_n$ is a $\mathrm{Bin}(n, 1 - F(F^{-1}(1/2) + tn^{-1/2}))$ random variable. Set $p_n = 1 - F(n^{-1/2}t)$. Note that
$$P\left(\sqrt{n}\left(X_{(m)} - F^{-1}(\tfrac{1}{2})\right) \le t\right) = P\left(S_n \le \frac{n-1}{2}\right) = P\left(\frac{S_n - np_n}{\sqrt{np_n(1-p_n)}} \le \frac{\frac{1}{2}(n-1) - np_n}{\sqrt{np_n(1-p_n)}}\right).$$
Recall that $p_n$ depends on $n$. Write
$$\frac{S_n - np_n}{\sqrt{np_n(1-p_n)}} = \sum_{i=1}^{n} \frac{Y_i - p_n}{\sqrt{np_n(1-p_n)}} = \sum_{i=1}^{n} Y_{in}.$$
Again, they can be expressed as a double array with $Y_i \sim \mathrm{Bin}(1, p_n)$. Now utilize the Berry-Esseen Theorem to obtain
$$P\left(S_n \le \frac{n-1}{2}\right) - \Phi\left(\frac{\frac{1}{2}(n-1) - np_n}{\sqrt{np_n(1-p_n)}}\right) \to 0$$
as $n \to \infty$. Writing
$$\frac{\frac{1}{2}(n-1) - np_n}{\sqrt{np_n(1-p_n)}} \approx \frac{\sqrt{n}\left(\frac{1}{2} - p_n\right)}{1/2} = 2t \cdot \frac{F(n^{-1/2}t) - F(0)}{n^{-1/2}t} \to 2t f(0).$$
Thus,
$$\Phi\left(\frac{\frac{1}{2}(n-1) - np_n}{\sqrt{np_n(1-p_n)}}\right) \to \Phi(2f(0) \cdot t),$$
or
$$\sqrt{n}\left(X_{(m)} - F^{-1}(1/2)\right) \xrightarrow{d} N\left(0, \frac{1}{4f^2(F^{-1}(1/2))}\right).$$
When $n$ is even (i.e., $n = 2m$), both $P[\sqrt{n}(X_{(m)} - F^{-1}(1/2)) \le t]$ and $P[\sqrt{n}(X_{(m+1)} - F^{-1}(1/2)) \le t]$ tend to $\Phi(2f(F^{-1}(1/2)) \cdot t)$.
We have just proved asymptotic normality of $\hat{\xi}_p$ in the case that $F$ possesses a derivative at the point $\xi_p$. So far we have discussed in detail the asymptotic normality of a single quantile. This discussion extends in a natural manner to the asymptotic joint normality of a fixed number of quantiles. This is made precise in the following result.
Theorem 9 Let $0 < p_1 < \cdots < p_k < 1$. Suppose that $F$ has a density $f$ in neighborhoods of $\xi_{p_1}, \ldots, \xi_{p_k}$ and that $f$ is positive and continuous at $\xi_{p_1}, \ldots, \xi_{p_k}$. Then $(\hat{\xi}_{p_1}, \ldots, \hat{\xi}_{p_k})$ is asymptotically normal with mean vector $(\xi_{p_1}, \ldots, \xi_{p_k})$ and covariances $\sigma_{ij}/n$, where
$$\sigma_{ij} = \frac{p_i(1 - p_j)}{f(\xi_{p_i}) f(\xi_{p_j})} \quad \text{for } 1 \le i \le j \le k,$$
and $\sigma_{ij} = \sigma_{ji}$ for $i > j$.
Suppose that we have a sample of size $n$ from a normal distribution $N(\mu, \sigma^2)$. Let $m_n$ represent the median of this sample. Then because $f(\mu) = (\sqrt{2\pi}\sigma)^{-1}$,
$$\sqrt{n}(m_n - \mu) \xrightarrow{d} N(0, (1/4)/f^2(\mu)) = N(0, \pi\sigma^2/2).$$
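A quick numerical confirmation (a Python sketch of mine) that the scaled variance of the sample median is near $\pi/2$ for $N(0,1)$, versus $1$ for the sample mean:

```python
import numpy as np

rng = np.random.default_rng(10)
n, reps = 200, 20_000
x = rng.normal(size=(reps, n))

print(n * np.median(x, axis=1).var(), np.pi / 2)   # median: ~ pi/2 = 1.5708
print(n * x.mean(axis=1).var(), 1.0)               # mean:   ~ 1
```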
Compare $m_n$ with $\bar{X}_n$ as an estimator of $\mu$. We conclude immediately that $\bar{X}_n$ is better than $m_n$, since the latter has a much larger variance. Now consider the above problem again with the Cauchy distribution $C(\mu, \sigma)$ with density function
$$f(x) = \frac{1}{\pi\sigma} \cdot \frac{1}{1 + [(x - \mu)/\sigma]^2}.$$
What is your conclusion? (Exercise)
2.4 A Measure of Dispersion Based on Quantiles
The joint normality of a fixed number of central order statistics can be used to construct simultaneous confidence regions for two or more population quantiles. As an illustration, we now consider the semi-interquartile range, $R = \frac{1}{2}(\xi_{3/4} - \xi_{1/4})$. (Note that the parameter $\sigma$ in $C(\mu, \sigma)$ is the semi-interquartile range.) It is used as an alternative to $\sigma$ to measure the spread of the data. A natural estimate of $R$ is $\hat{R}_n = \frac{1}{2}(\hat{\xi}_{3/4} - \hat{\xi}_{1/4})$. Theorem 9 gives the joint distribution of $\hat{\xi}_{1/4}$ and $\hat{\xi}_{3/4}$. We can use the following result, due to Cramér and Wold (1936), which reduces the convergence of multivariate distribution functions to the convergence of univariate distribution functions.
Theorem 10 In $R^k$, the random vectors $\mathbf{X}_n$ converge in distribution to the random vector $\mathbf{X}$ if and only if each linear combination of the components of $\mathbf{X}_n$ converges in distribution to the same linear combination of the components of $\mathbf{X}$.
By Theorem 9 and the Cramér-Wold device, we have
$$\hat{R}_n \sim AN\left(R, \frac{1}{64n}\left[\frac{3}{[f(\xi_{1/4})]^2} - \frac{2}{f(\xi_{1/4}) f(\xi_{3/4})} + \frac{3}{[f(\xi_{3/4})]^2}\right]\right).$$
For $F = N(\mu, \sigma^2)$, we have
$$\hat{R}_n \sim AN\left(0.6745\sigma, \frac{(0.7867)^2 \sigma^2}{n}\right).$$
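Both constants can be checked by simulation. A sketch (mine) for $N(0,1)$:

```python
import numpy as np

rng = np.random.default_rng(11)
n, reps = 400, 20_000
x = np.sort(rng.normal(size=(reps, n)), axis=1)
r_hat = 0.5 * (x[:, 3 * n // 4 - 1] - x[:, n // 4 - 1])  # (xi_3/4 - xi_1/4)/2

print(r_hat.mean(), 0.6745)                # mean close to 0.6745*sigma
print(np.sqrt(n) * r_hat.std(), 0.7867)    # sqrt(n)-scaled sd close to 0.7867
```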
2.5 Confidence Intervals for Population Quantiles
Assume that $F$ possesses a density $f$ in a neighborhood of $\xi_p$ and $f$ is positive and continuous at $\xi_p$. For simplicity, consider $p = 1/2$. Then
$$\sqrt{n}\left(\hat{\xi}_{1/2} - \xi_{1/2}\right) \xrightarrow{d} N\left(0, \frac{1}{4f^2(\xi_{1/2})}\right).$$
Therefore, we can derive a confidence interval for $\xi_{1/2}$ if either $f(\xi_{1/2})$ is known or a good estimator of $f(\xi_{1/2})$ is available. A natural question to ask then is how we estimate $f(\xi_{1/2})$. Here we propose two estimates. The first estimate is
$$\frac{\#\{\text{observations falling in } (\hat{\xi}_{1/2} - h_n,\ \hat{\xi}_{1/2} + h_n)\}}{2nh_n},$$
which is motivated by
$$\frac{F(\xi_{1/2} + h_n) - F(\xi_{1/2} - h_n)}{2h_n} \approx f(\xi_{1/2}).$$
The second one is
$$\frac{X_{([n/2]+\ell)} - X_{([n/2]-\ell)}}{2\ell/n}, \quad \text{where } \ell = O(n^d) \text{ for } 0 < d < 1,$$
which estimates $1/f(\xi_{1/2})$.
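The first estimate plugs directly into a confidence interval. A Python sketch (the bandwidth $h_n = n^{-1/5}$ is my own arbitrary choice, not from the text):

```python
import numpy as np

rng = np.random.default_rng(12)
x = rng.normal(size=500)
n = len(x)

med = np.median(x)
h = n ** (-0.2)                                      # bandwidth h_n (my choice)
f_hat = (np.abs(x - med) < h).sum() / (2 * n * h)    # count / (2 n h_n)

# 95% CI from sqrt(n)(xi_hat - xi) ~ N(0, 1/(4 f^2)): half-width 1.96/(2 f sqrt(n))
half_width = 1.96 / (2 * f_hat * np.sqrt(n))
print(med - half_width, med + half_width)
```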
2.6 Distribution-Free Confidence Interval
When $F$ is absolutely continuous, $F(F^{-1}(p)) = p$ and, hence, we have
$$P(X_{(i:n)} \le F^{-1}(p)) = P(F(X_{(i:n)}) \le p) = P(U_{(i:n)} \le p) = \sum_{r=i}^{n} C(n, r) p^r (1-p)^{n-r}. \tag{3}$$
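Equation (3) is distribution-free: the right-hand side does not involve $F$. A quick Monte Carlo check (a sketch with my own parameter choices, using the standard normal):

```python
import numpy as np
from scipy.stats import binom, norm

# Verify P(X_(i:n) <= F^{-1}(p)) = P(Bin(n, p) >= i), free of F.
rng = np.random.default_rng(13)
n, i, p = 20, 8, 0.5
reps = 100_000

x_i = np.sort(rng.normal(size=(reps, n)), axis=1)[:, i - 1]
print((x_i <= norm.ppf(p)).mean())          # empirical left side
print(1 - binom.cdf(i - 1, n, p))           # exact right side
```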
Now for $i < j$, consider
$$P(X_{(i:n)} \le F^{-1}(p)) = P(X_{(i:n)} \le F^{-1}(p),\ X_{(j:n)} < F^{-1}(p)) + P(X_{(i:n)} \le F^{-1}(p),\ X_{(j:n)} \ge F^{-1}(p))$$
$$= P(X_{(j:n)} < F^{-1}(p)) + P(X_{(i:n)} \le F^{-1}(p) \le X_{(j:n)}).$$
Since $X_{(j:n)}$ is absolutely continuous, this equation can be written as