
6 Finite Sample Theory of Order Statistics and Extremes

The ordered values of a sample of observations are called the order statistics of the sample, and the smallest and the largest are called the extremes. Order statistics and extremes are among the most important functions of a set of random variables that we study in probability and statistics. There is natural interest in studying the highs and lows of a sequence, and the other order statistics help in understanding the concentration of probability in a distribution, or equivalently, the diversity in the population represented by the distribution. Order statistics are also useful in statistical inference, where estimates of parameters are often based on some suitable functions of the order statistics. In particular, the median is of very special importance. There is a well developed theory of the order statistics of a fixed number n of observations from a fixed distribution, as also an asymptotic theory where n goes to infinity. We will discuss the case of fixed n in this chapter. Distribution theory for order statistics when the observations are from a discrete distribution is complex, both notationally and algebraically, because of the fact that there could be several observations which are actually equal. These ties among the sample values make the distribution theory cumbersome. We therefore concentrate on the continuous case. Principal references for this chapter are the books by David (1980), Reiss (1989), Galambos (1987), Resnick (2007), and Leadbetter, Lindgren, and Rootzén (1983). Specific other references are given in the sections.

6.1 Basic Distribution Theory

Definition 6.1. Let X1, X2, ···, Xn be any n real valued random variables. Let X(1) ≤ X(2) ≤ ··· ≤ X(n) denote the ordered values of X1, X2, ···, Xn. Then, X(1), X(2), ···, X(n) are called the order statistics of X1, X2, ···, Xn.

Remark: Thus, the minimum among X1, X2, ···, Xn is the first order statistic, and the maximum the nth order statistic. The middle value among X1, X2, ···, Xn is called the median. But it needs to be defined precisely, because there is really no middle value when n is an even integer. Here is our definition.

Definition 6.2. Let X1, X2, ···, Xn be any n real valued random variables. Then, the median of X1, X2, ···, Xn is defined to be Mn = X(m+1) if n = 2m + 1 (an odd integer), and Mn = X(m) if n = 2m (an even integer). That is, in either case, the median is the order statistic X(k), where k is the smallest integer ≥ n/2.

Example 6.1. Suppose .3, .53, .68, .06, .73, .48, .87, .42, .89, .44 are ten independent observations from the U[0, 1] distribution. Then, the order statistics are .06, .3, .42, .44, .48, .53, .68, .73, .87, .89. Thus, X(1) = .06, X(n) = .89, and since n/2 = 5, Mn = X(5) = .48. An important connection to understand is the connection order statistics have with the empirical CDF, a function of immense theoretical and methodological importance in both probability and statistics.

Definition 6.3. Let X1, X2, ···, Xn be any n real valued random variables. The empirical CDF of X1, X2, ···, Xn, also called the empirical CDF of the sample, is the function

Fn(x) = #{Xi : Xi ≤ x}/n,

i.e., Fn(x) measures the proportion of sample values that are ≤ x for a given x.

Remark: Therefore, by its definition, Fn(x) = 0 whenever x < X(1), and Fn(x) = 1 whenever x ≥ X(n).

Definition 6.4. Let Pn denote the discrete distribution which assigns probability 1/n to each Xi. Then, Pn is called the empirical measure of the sample.

Definition 6.5. Let Qn(p) = Fn^{-1}(p) be the quantile function corresponding to Fn. Then, Qn = Fn^{-1} is called the quantile function of X1, X2, ···, Xn, or the empirical quantile function. We can now relate the median and the order statistics to the quantile function Fn^{-1}.

Proposition. Let X1, X2, ···, Xn be n random variables. Then,

(a) X(i) = Fn^{-1}(i/n);

(b) Mn = Fn^{-1}(1/2).
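To make Definitions 6.3-6.5 and the Proposition concrete, here is a small computational sketch (an added illustration, not part of the original text; it assumes Python with NumPy), applied to the ten observations of Example 6.1.

    import numpy as np

    x = np.array([.3, .53, .68, .06, .73, .48, .87, .42, .89, .44])   # data of Example 6.1
    n = len(x)
    order_stats = np.sort(x)                  # X(1) <= X(2) <= ... <= X(n)

    def empirical_cdf(t):
        # Fn(t): proportion of sample values that are <= t
        return np.mean(x <= t)

    def empirical_quantile(p):
        # Fn^{-1}(p): the order statistic X(k), k the smallest integer >= n*p
        k = int(np.ceil(n * p))
        return order_stats[k - 1]

    print(order_stats)                        # .06, .3, .42, .44, .48, .53, .68, .73, .87, .89
    print(empirical_cdf(0.5))                 # 0.5
    print(empirical_quantile(0.5))            # 0.48, the median Mn = X(5)

In particular, empirical_quantile(i/n) returns X(i), in agreement with part (a) of the Proposition.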

We now specialize to the case where X1,X2, ···,Xn are independent random variables with a common density function f(x) and CDF F (x), and work out the fundamental distribution theory of the order statistics X(1),X(2), ···,X(n).

Theorem 6.1. (Joint Density of All the Order Statistics) Let X1, X2, ···, Xn be independent random variables with a common density function f(x). Then, the joint density function of X(1), X(2), ···, X(n) is given by

f_{1,2,···,n}(y1, y2, ···, yn) = n! f(y1) f(y2) ··· f(yn) I{y1 < y2 < ··· < yn}.

Proof: A verbal heuristic argument is easy to understand. If X(1) = y1, X(2) = y2, ···, X(n) = yn, then exactly one of the sample values X1, X2, ···, Xn is y1, exactly one is y2, etc., but we can put any of the n observations at y1, any of the other n − 1 observations at y2, etc., and so the density of X(1), X(2), ···, X(n) is f(y1) f(y2) ··· f(yn) × n(n − 1) ··· 1 = n! f(y1) f(y2) ··· f(yn); and obviously, if the inequality y1 < y2 < ··· < yn does not hold, then the yi cannot be the values of the order statistics, and the joint density is zero there.

Here is a formal proof. The multivariate transformation (X1, X2, ···, Xn) → (X(1), X(2), ···, X(n)) is not one-to-one, as any permutation of a fixed (X1, X2, ···, Xn) vector has exactly the same set of order statistics X(1), X(2), ···, X(n). However, fix a specific permutation {π(1), π(2), ···, π(n)} of {1, 2, ···, n} and consider the subset Aπ = {(x1, x2, ···, xn) : x_{π(1)} < x_{π(2)} < ··· < x_{π(n)}}.

Then, the transformation (x1, x2, ···, xn) → (x(1), x(2), ···, x(n)) is one-to-one on each such Aπ, and indeed, then x(i) = x_{π(i)}, i = 1, 2, ···, n. The Jacobian determinant of the transformation is 1 in absolute value, for each such Aπ. A particular vector (x1, x2, ···, xn) falls in exactly one Aπ, and there are n! such regions Aπ, as we exhaust all the n! permutations {π(1), π(2), ···, π(n)} of {1, 2, ···, n}. By a modification of the Jacobian density theorem, we then get

f_{1,2,···,n}(y1, y2, ···, yn) = Σ_π f(x1) f(x2) ··· f(xn)

= Σ_π f(x_{π(1)}) f(x_{π(2)}) ··· f(x_{π(n)})

= Σ_π f(y1) f(y2) ··· f(yn)

= n! f(y1) f(y2) ··· f(yn). ♣

Example 6.2. (Uniform Order Statistics). Let U1, U2, ···, Un be independent U[0, 1] variables, and U(i), 1 ≤ i ≤ n, their order statistics. Then, by our theorem above, the joint density of U(1), U(2), ···, U(n) is

f_{1,2,···,n}(u1, u2, ···, un) = n! I{0 < u1 < u2 < ··· < un < 1}.

Suppose we want the marginal density of just U(1), that is, of the sample minimum. Then, we have to integrate out the remaining coordinates u2, ···, un from the joint density.

So, we will integrate down in the order un, un−1, ···, u2, to obtain

f1(u1) = n! ∫_{u1}^{1} ∫_{u2}^{1} ··· ∫_{un−1}^{1} dun dun−1 ··· du3 du2 = n(1 − u1)^{n−1}, 0 < u1 < 1.

Likewise, if we want the marginal density of just U(n), that is of the sample maximum, then we will want to integrate out u1, u2, ···, un−1, and now the answer will be

fn(un) = n! ∫_{0}^{un} ∫_{0}^{un−1} ··· ∫_{0}^{u2} du1 du2 ··· dun−1 = n un^{n−1}, 0 < un < 1.

These marginal densities can also be obtained by a more direct argument. Indeed, P(U(n) ≤ u) = P(∩_{i=1}^{n} {Ui ≤ u}) = u^n, and hence the density of U(n) is fn(u) = d/du [u^n] = n u^{n−1}, 0 < u < 1. Likewise, P(U(1) > u) = P(∩_{i=1}^{n} {Ui > u}) = (1 − u)^n, and so the density of U(1) is

f1(u) = d/du [1 − (1 − u)^n] = n(1 − u)^{n−1}, 0 < u < 1,

the same answer as above.

[Figure: Density of the minimum, median, and maximum of U[0, 1] variables; n = 15.]

For a general r, 1 ≤ r ≤ n, the density of U(r) works out to a Beta density:

fr(u) = n!/((r − 1)!(n − r)!) u^{r−1} (1 − u)^{n−r}, 0 < u < 1,

which is the density of the Be(r, n − r + 1) distribution.
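As a numerical check of this Beta form (an added sketch, not in the original text; it uses plain Python and the standard identity P(U(r) ≤ t) = P(at least r of the n uniforms fall in [0, t]) = P(Bin(n, t) ≥ r)), one can compare the exact probability with a simulation.

    from math import comb
    import random

    n, r, t = 10, 3, 0.25
    # Exact: P(U(r) <= t) = P(Bin(n, t) >= r)
    exact = sum(comb(n, j) * t**j * (1 - t)**(n - j) for j in range(r, n + 1))

    random.seed(1)
    reps = 200000
    hits = sum(sorted(random.random() for _ in range(n))[r - 1] <= t for _ in range(reps))
    print(exact, hits / reps)    # both close to 0.47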

6.2 More Advanced Distribution Theory

Example 6.3. (Density of One and Two Order Statistics). The joint density of any subset of the order statistics X(1), X(2), ···, X(n) can be worked out from their joint density, which we derived in the preceding section. The most important case in applications is the joint density of two specific order statistics, say X(r) and X(s), 1 ≤ r < s ≤ n, or the density of a single order statistic X(r). A verbal heuristic argument helps in understanding the formula for the joint density of X(r) and X(s), and also the density of a specific one X(r).

First consider the density of just X(r). Fix u. To have X(r) = u, we must have exactly one observation at u, another r − 1 below u, and n − r above u. This will suggest that the density of X(r) is

fr(u) = n f(u) (n−1 choose r−1) (F(u))^{r−1} (1 − F(u))^{n−r}

= n!/((r − 1)!(n − r)!) (F(u))^{r−1} (1 − F(u))^{n−r} f(u), −∞ < u < ∞.

Next, consider the case of the joint density of two order statistics, X(r) and X(s), r < s. Fix u < v. To have X(r) = u and X(s) = v, we must have exactly one observation at u, one at v, another r − 1 below u, s − r − 1 between u and v, and n − s above v. This will suggest that the joint density of X(r) and X(s) is

f_{r,s}(u, v) = n!/((r − 1)!(n − s)!(s − r − 1)!) (F(u))^{r−1} (1 − F(v))^{n−s} (F(v) − F(u))^{s−r−1} f(u) f(v), −∞ < u < v < ∞.

Again, this is indeed the joint density of two specific order statistics X(r) and X(s). The arguments used in this example lead to the following theorem.

Theorem 6.2. (Density of One and Two Order Statistics, and of the Range) Let X1, X2, ···, Xn be independent observations from a continuous CDF F(x) with density function f(x). Then,

(a) X(n) has the density fn(u) = n F^{n−1}(u) f(u), −∞ < u < ∞;

(b) X(1) has the density f1(u) = n (1 − F(u))^{n−1} f(u), −∞ < u < ∞;

(c) For a general r, 1 ≤ r ≤ n, X(r) has the density

fr(u) = n!/((r − 1)!(n − r)!) F^{r−1}(u) (1 − F(u))^{n−r} f(u), −∞ < u < ∞;

(d) For general 1 ≤ r < s ≤ n, X(r) and X(s) have the joint density

f_{r,s}(u, v) = n!/((r − 1)!(s − r − 1)!(n − s)!) F^{r−1}(u) (F(v) − F(u))^{s−r−1} (1 − F(v))^{n−s} f(u) f(v), −∞ < u < v < ∞;

(e) The minimum and the maximum, X(1) and X(n), have the joint density

f_{1,n}(u, v) = n(n − 1) (F(v) − F(u))^{n−2} f(u) f(v), −∞ < u < v < ∞;

(f) (CDF of Range) W = Wn = X(n) − X(1) has the CDF

F_W(w) = n ∫_{−∞}^{∞} [F(x + w) − F(x)]^{n−1} f(x) dx;

(g) (Density of Range) W = Wn = X(n) − X(1) has the density

f_W(w) = n(n − 1) ∫_{−∞}^{∞} [F(x + w) − F(x)]^{n−2} f(x) f(x + w) dx.
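The integral formulas in parts (f) and (g) are easy to evaluate numerically. The following sketch (an added illustration, assuming Python with NumPy; the choices n = 5 and w = 0.6 are arbitrary) checks part (f) in the U[0, 1] case against a direct simulation of the range.

    import numpy as np

    rng = np.random.default_rng(2)
    n, w, grid = 5, 0.6, 200000

    # Part (f) with F = U[0,1]: F_W(w) = n * integral over [0,1] of [F(x+w) - F(x)]^(n-1) dx
    xs = (np.arange(grid) + 0.5) / grid                     # midpoint grid on [0,1]
    integrand = (np.minimum(xs + w, 1.0) - xs) ** (n - 1)   # f(x) = 1 on [0,1]
    cdf_formula = n * integrand.mean()

    u = rng.uniform(size=(200000, n))
    cdf_sim = np.mean(u.max(axis=1) - u.min(axis=1) <= w)
    print(cdf_formula, cdf_sim)                             # both close to 0.337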

Example 6.4. (Moments of Uniform Order Statistics). The general formulas in the above theorem lead to the following formulas in the uniform case. In the U[0, 1] case,

E(U(1)) = 1/(n + 1), E(U(n)) = n/(n + 1),

Var(U(1)) = Var(U(n)) = n/((n + 1)^2 (n + 2)); 1 − U(n) has the same distribution as U(1); Cov(U(1), U(n)) = 1/((n + 1)^2 (n + 2)),

E(Wn) = (n − 1)/(n + 1), Var(Wn) = 2(n − 1)/((n + 1)^2 (n + 2)).

For a general order statistic, it follows from the fact that U(r) ∼ Be(r, n − r + 1) that

E(U(r)) = r/(n + 1); Var(U(r)) = r(n − r + 1)/((n + 1)^2 (n + 2)).

Furthermore, it follows from the formula for their joint density that, for r ≤ s,

Cov(U(r), U(s)) = r(n − s + 1)/((n + 1)^2 (n + 2)).
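These moment and covariance formulas are easy to confirm by simulation; the sketch below (added here for illustration, assuming NumPy, with the arbitrary choices n = 10, r = 3, s = 7) does so.

    import numpy as np

    rng = np.random.default_rng(3)
    n, r, s, reps = 10, 3, 7, 200000
    u = np.sort(rng.uniform(size=(reps, n)), axis=1)        # uniform order statistics

    cov_sim = np.cov(u[:, r - 1], u[:, s - 1])[0, 1]
    cov_exact = r * (n - s + 1) / ((n + 1) ** 2 * (n + 2))
    print(cov_sim, cov_exact)                               # both close to 0.0083

    w = u[:, -1] - u[:, 0]                                  # the range W_n
    print(w.mean(), (n - 1) / (n + 1))                      # both close to 0.818
    print(w.var(), 2 * (n - 1) / ((n + 1) ** 2 * (n + 2)))  # both close to 0.0124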

Example 6.5. (Exponential Order Statistics). A second distribution of importance in the theory of order statistics is the Exponential distribution. The mean λ essentially arises as just a multiplier in the calculations. So, we will treat only the standard Exponential case.

Let X1, X2, ···, Xn be independent standard Exponential variables. Then, by the general theorem on the joint density of the order statistics, in this case the joint density of X(1), X(2), ···, X(n) is

f_{1,2,···,n}(u1, u2, ···, un) = n! e^{−Σ_{i=1}^{n} ui}, 0 < u1 < u2 < ··· < un < ∞.

For example, the density of the minimum X(1) is

f1(u) = n(1 − F(u))^{n−1} f(u) = n e^{−(n−1)u} e^{−u} = n e^{−nu}, 0 < u < ∞,

so that X(1) is itself Exponential with mean 1/n. The density of the maximum X(n) is fn(u) = n(1 − e^{−u})^{n−1} e^{−u}, 0 < u < ∞, and expanding (1 − e^{−u})^{n−1} by the binomial theorem and integrating term by term gives, as a result,

E(X(n)) = n Σ_{i=0}^{n−1} (−1)^i (n−1 choose i) 1/(i + 1)^2 = Σ_{i=1}^{n} (−1)^{i−1} (n choose i) 1/i,

which also equals 1 + 1/2 + ··· + 1/n. We will later see in the section on spacings that by another argument, it will also follow that in the standard Exponential case, E(X(n)) = 1 + 1/2 + ··· + 1/n.
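The equality of the alternating sum with the harmonic number 1 + 1/2 + ··· + 1/n is easy to verify numerically; here is a short added check in plain Python (the function name is ours, for illustration only).

    from math import comb

    def expected_max_exponential(n):
        # the alternating-sum expression for E(X(n)) derived above
        return sum((-1) ** (i - 1) * comb(n, i) / i for i in range(1, n + 1))

    for n in (2, 5, 10, 20):
        harmonic = sum(1 / j for j in range(1, n + 1))
        print(n, expected_max_exponential(n), harmonic)    # the two columns agree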

Example 6.6. (Normal Order Statistics). Another clearly important case is that of the order statistics of a normal sample. Because the general N(µ, σ^2) distribution is a location-scale transformation of a standard normal variable, we have the distributional equivalence that

(X(1), X(2), ···, X(n)) have the same joint distribution as (µ + σZ(1), µ + σZ(2), ···, µ + σZ(n)). So, we consider just the standard normal case.

Because of the symmetry of the standard normal distribution around zero, for any r, Z(r) has the same distribution as −Z(n−r+1). In particular, Z(1) has the same distribution as −Z(n). From our general formula, the density of Z(n) is

fn(x) = n Φ^{n−1}(x) φ(x), −∞ < x < ∞.

Again, this is a skewed density. It can be shown, either directly, or by making use of the general theorem on existence of moments of order statistics (see the next section), that every moment, and in particular the mean and the variance, of Z(n) exists. Except for very small n, closed form formulas for the mean or variance are not possible. For small n, integration tricks do produce exact formulas. For example,

E(Z(n)) = 1/√π, if n = 2; E(Z(n)) = 3/(2√π), if n = 3.

Such formulas are available for n ≤ 5; see David (1980). We tabulate the expected value of the maximum for some values of n to illustrate the slow growth.

n        E(Z(n))
2        .56
5        1.16
10       1.54
20       1.87
30       2.04
50       2.25
100      2.51
500      3.04
1000     3.24
10000    3.85
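The entries of this table can be reproduced by one-dimensional numerical integration of E(Z(n)) = ∫ x n Φ^{n−1}(x) φ(x) dx. A short sketch follows (added for illustration, assuming SciPy is available; the integration limits ±10 are an ad hoc but safe truncation for the values of n shown).

    from scipy.stats import norm
    from scipy.integrate import quad

    def expected_normal_max(n):
        # E(Z(n)) = integral of x * n * Phi(x)^(n-1) * phi(x) dx
        return quad(lambda x: x * n * norm.cdf(x) ** (n - 1) * norm.pdf(x), -10, 10)[0]

    for n in (2, 5, 10, 100):
        print(n, round(expected_normal_max(n), 2))   # 0.56, 1.16, 1.54, 2.51, matching the table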

More about the expected value of Z(n) will be discussed later.

The density of Z(n) is plotted here for three values of n. We can see that the density is shifting to the right, and at the same time getting more peaked. Theoretical asymptotic (i.e., as n →∞) justifications for these visual findings are possible, and we will see some of them in a later chapter.

6.3 Quantile Transformation and Existence of Moments

Uniform order statistics play a very special role in the theory of order statistics, because many problems about order statistics of samples from a general density can be reduced, by a simple and common technique, to the case of uniform order statistics. It is thus especially important to understand and study uniform order statistics. The technique that reduces problems in the general continuous case to the case of the uniform distribution on [0,1] is the quantile transformation. We describe the exact correspondence below.

[Figure: Density of the maximum of standard normals; n = 5, 20, 100.]

Theorem 6.3. (Quantile Transformation Theorem) Let X1, X2, ···, Xn be independent observations from a continuous CDF F(x) on the real line, and let X(1), X(2), ···, X(n) denote their order statistics. Let F^{-1}(p) denote the quantile function of F. Let U1, U2, ···, Un be independent observations from the U[0, 1] distribution, and let U(1), U(2), ···, U(n) denote their order statistics.

Let also g(x) be any nondecreasing function and let Yi = g(Xi), 1 ≤ i ≤ n, with Y(1),Y(2), ···,Y(n)

the order statistics of Y1,Y2, ···,Yn. Then, the following equalities in distributions hold:

(a) F(X1) ∼ U[0, 1], that is, F(X1) and U1 have the same distribution; we write this equality in distribution as F(X1) =_L U1;

(b) F^{-1}(U1) =_L X1;

(c) F(X(i)) =_L U(i);

(d) F^{-1}(U(i)) =_L X(i);

(e) (F(X(1)), F(X(2)), ···, F(X(n))) =_L (U(1), U(2), ···, U(n));

(f) (F^{-1}(U(1)), F^{-1}(U(2)), ···, F^{-1}(U(n))) =_L (X(1), X(2), ···, X(n));

(g) (Y(1), Y(2), ···, Y(n)) =_L (g(F^{-1}(U(1))), g(F^{-1}(U(2))), ···, g(F^{-1}(U(n)))).

Remark: We are already familiar with parts (a) and (b); they are restated here only to provide the context. The parts that we need to focus on are the last two parts. They say that any question about the set of order statistics X(1), X(2), ···, X(n) of a sample from a general continuous distribution can be rephrased in terms of the set of order statistics from the U[0, 1] distribution. For this, all we need to do is to substitute F^{-1}(U(i)) in place of X(i), where F^{-1} is the quantile function of F. So, at least in principle, as long as we know how to work skillfully with the joint distribution of the uniform order statistics, we can answer questions about any set of order statistics from a

general continuous distribution, because the latter is simply a transformation of the set of order statistics of the uniform. This has proved to be a very useful technique in the theory of order statistics. As a corollary of part (f) of the above theorem, we have the following connection between order statistics of a general continuous CDF and uniform order statistics.

Corollary Let X(1),X(2), ···,X(n) be the order statistics of a sample from a general continuous

CDF F, and U(1), U(2), ···, U(n) the uniform order statistics. Then, for any 1 ≤ r1 < r2 < ··· < rk ≤ n,

P(X(r1) ≤ u1, X(r2) ≤ u2, ···, X(rk) ≤ uk) = P(F^{-1}(U(r1)) ≤ u1, F^{-1}(U(r2)) ≤ u2, ···, F^{-1}(U(rk)) ≤ uk) ∀ u1, ···, uk.

Several important applications of this quantile transformation method are given below.
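As a first, purely computational use of part (f), uniform order statistics can be transformed into order statistics from any continuous F. The sketch below (an added illustration, assuming NumPy; the choice F = standard Exponential, with F^{-1}(p) = −log(1 − p), is only an example) compares the transformed values with directly simulated Exponential order statistics.

    import numpy as np

    rng = np.random.default_rng(4)
    n, reps = 5, 200000

    u_sorted = np.sort(rng.uniform(size=(reps, n)), axis=1)      # U(1) <= ... <= U(n)
    x_from_uniform = -np.log(1 - u_sorted)                       # F^{-1}(U(i)) for Exp(1)

    x_direct = np.sort(rng.exponential(size=(reps, n)), axis=1)  # direct Exp(1) order statistics

    print(x_from_uniform.mean(axis=0))   # approx (0.20, 0.45, 0.78, 1.28, 2.28)
    print(x_direct.mean(axis=0))         # approximately the same vector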

Proposition Let X1,X2, ···,Xn be independent observations from a continuous CDF F .

Then, for any r, s, Cov(X(r),X(s)) ≥ 0.

Proof: We use the fact that if g(x1, x2, ···, xn), h(x1, x2, ···, xn) are two functions such that they are coordinatewise nondecreasing in each xi, then

Cov(g(U(1), ···, U(n)), h(U(1), ···, U(n))) ≥ 0.

By the quantile transformation theorem, Cov(X(r), X(s)) = Cov(F^{-1}(U(r)), F^{-1}(U(s))) ≥ 0, as F^{-1}(U(r)) is a nondecreasing function of U(r), and F^{-1}(U(s)) is a nondecreasing function of U(s), and hence they are also coordinatewise nondecreasing in each order statistic U(1), U(2), ···, U(n). ♣

This proposition was first proved in Bickel (1967), but by using a different method. The next application is to existence of moments of order statistics.

Theorem 6.4. (On the Existence of Moments) Let X1, X2, ···, Xn be independent observations from a continuous CDF F, and let X(1), X(2), ···, X(n) be the order statistics. Let g(x1, x2, ···, xn) be a general function. Suppose E[|g(X1, X2, ···, Xn)|] < ∞. Then, E[|g(X(1), X(2), ···, X(n))|] < ∞.

Proof: By the quantile transformation theorem above,

E[|g(X(1), X(2), ···, X(n))|]

= E[|g(F^{-1}(U(1)), F^{-1}(U(2)), ···, F^{-1}(U(n)))|]

= n! ∫_{0 < u1 < ··· < un < 1} |g(F^{-1}(u1), F^{-1}(u2), ···, F^{-1}(un))| du1 du2 ··· dun

≤ n! ∫_{(0,1)^n} |g(F^{-1}(u1), F^{-1}(u2), ···, F^{-1}(un))| du1 du2 ··· dun

= n! ∫ |g(x1, x2, ···, xn)| f(x1) f(x2) ··· f(xn) dx1 dx2 ··· dxn

= n! E[|g(X1, X2, ···, Xn)|] < ∞.

Corollary. Suppose F is a continuous CDF such that E_F(|X|^k) < ∞ for some given k. Then, E(|X(i)|^k) < ∞ for every i, 1 ≤ i ≤ n. ♣

Aside from just the existence of the moment, explicit bounds are always useful. Here is a concrete bound (see Reiss (1989)); approximations for moments of order statistics for certain distributions are derived in Hall (1978).

Proposition.

(a) ∀ r ≤ n, E(|X(r)|^k) ≤ n!/((r − 1)!(n − r)!) E_F(|X|^k);

(b) E(|X(r)|^k) < ∞ ⇒ |F^{-1}(p)|^k p^r (1 − p)^{n−r+1} ≤ C < ∞ ∀ p;

(c) |F^{-1}(p)|^k p^r (1 − p)^{n−r+1} ≤ C < ∞ ∀ p ⇒ E(|X(s)|^m) < ∞, if 1 + mr/k ≤ s ≤ n − (n − r + 1)m/k.

Example 6.7. (Nonexistence of Every Moment of Every Order Statistic). Consider the continuous CDF F(x) = 1 − 1/log x, x ≥ e. Setting 1 − 1/log x = p, we get the quantile function F^{-1}(p) = e^{1/(1−p)}. Fix any n, k, and r ≤ n. Consider what happens when p → 1.

|F^{-1}(p)|^k p^r (1 − p)^{n−r+1} = e^{k/(1−p)} p^r (1 − p)^{n−r+1}

≥ C e^{k/(1−p)} (1 − p)^{n−r+1} = C e^{ky} y^{−(n−r+1)},

writing y for 1/(1 − p). For any k > 0, as y → ∞ (⇔ p → 1), e^{ky} y^{−(n−r+1)} → ∞. Thus, the necessary condition of the Proposition above is violated, and it follows that for any r, n, k, E(|X(r)|^k) = ∞.

Remark: The preceding example and the Proposition show that existence of moments of order statistics depends on the tail of the underlying CDF (or, equivalently, the tail of the density). If the tail is so thin that the density has a finite mgf in some neighborhood of zero, then all order statistics will have all moments finite. Evaluating them in closed form is generally impossible, however. If the tail of the underlying density is heavy, then existence of moments of the order statistics, and especially the minimum and the maximum, may be a problem. It is possible for some central order statistics to have a few finite moments, and the minimum or the maximum to have none. In other words, depending on the tail, anything can happen. An especially interesting case is the case of a Cauchy density, notorious for its troublesome tail. The next result describes what happens in that case.

Proposition Let X1,X2, ···,Xn be independent C(µ, σ) variables. Then,

(a) ∀ n, E(|X(n)|) = E(|X(1)|) = ∞;

(b) For n ≥ 3, E(|X(n−1)|) and E(|X(2)|) are finite;

(c) For n ≥ 5, E(|X(n−2)|^2) and E(|X(3)|^2) are finite;

(d) In general, E(|X(r)|^k) < ∞ if and only if k < min(r, n − r + 1).

Example 6.8. (Cauchy Order Statistics). From the above Proposition we see that the truly problematic order statistics in the Cauchy case are the two extreme ones, the minimum and the maximum. Every other order statistic has a finite expectation for n ≥ 3, and all but the two most extreme order statistics from each tail even have a finite variance, as long as n ≥ 5. The table below lists the mean of X(n−1) and X(n−2) for some values of n.

n      E(X(n−1))    E(X(n−2))
5      1.17         .08
10     2.98         1.28
20     6.26         3.03
30     9.48         4.67
50     15.87        7.90
100    31.81        15.88
250    79.56        39.78
500    159.15       79.57

Example 6.9. (Mode of the Cauchy Sample Maximum). Although the sample maximum X(n) never has a finite expectation in the Cauchy case, it always has a unimodal density (see a general result in the Exercises). So it is interesting to see what the modal values are for various n. The

table below lists the mode of X(n) for some values of n.

n      Mode of X(n)
5      .87
10     1.72
20     3.33
30     4.93
50     8.12
100    16.07
250    39.98
500    79.76

By comparing the entries in this table with the previous table, we see that the mode of X(n) is quite close to the mean of X(n−2). It would be interesting to find a theoretical result in this regard.

6.4 Spacings

Another set of statistics helpful in understanding the distribution of probability are the spacings, which are the gaps between successive order statistics. They are useful in discerning tail behavior. At the same time, for some particular underlying distributions, their mathematical properties are extraordinarily structured, and in turn lead to results for other distributions. Two instances are the spacings of uniform and Exponential order statistics. Some basic facts about spacings are discussed in this section.

Definition 6.6. Let X(1),X(2), ···,X(n) be the order statistics of a sample of n observations

X1,X2, ···,Xn. Then, Wi = X(i+1) − X(i), 1 ≤ i ≤ n − 1 are called the spacings of the sample, or the spacings of the order statistics.

6.4.1 Exponential Spacings and Rényi's Representation

The spacings of an Exponential sample have the characteristic property that the spacings are all independent Exponentials as well. Here is the precise result.

Theorem 6.5. Let X(1), X(2), ···, X(n) be the order statistics from an Exp(λ) distribution. Then W0 = X(1), W1, ···, Wn−1 are independent, with Wi ∼ Exp(λ/(n − i)), i = 0, 1, ···, n − 1.

Proof: The proof follows on transforming to the set of spacings from the set of order statistics, and by applying the Jacobian density theorem. The transformation (u1, u2, ···, un) →

(w0, w1, ···, wn−1), where w0 = u1, w1 = u2 − u1, ···, wn−1 = un − un−1, is one to one, with the inverse transformation u1 = w0, u2 = w0 + w1, u3 = w0 + w1 + w2, ···, un = w0 + w1 + ··· + wn−1. The Jacobian matrix is triangular, and has determinant one. From our general theorem, the order statistics X(1), X(2), ···, X(n) have the joint density

f_{1,2,···,n}(u1, u2, ···, un) = n! f(u1) f(u2) ··· f(un) I{0 < u1 < u2 < ··· < un}.

Therefore, the spacings have the joint density

g_{0,1,···,n−1}(w0, w1, ···, wn−1) = n! f(w0) f(w0 + w1) ··· f(w0 + w1 + ··· + wn−1) I{wi > 0 ∀ i}.

This is completely general for any underlying nonnegative variable. Specializing to the standard Exponential case, we get,

g_{0,1,···,n−1}(w0, w1, ···, wn−1) = n! e^{−Σ_{j=0}^{n−1} Σ_{i=0}^{j} wi} I{wi > 0 ∀ i}

= n! e^{−n w0 − (n−1) w1 − ··· − wn−1} I{wi > 0 ∀ i}.

It therefore follows that W0, W1, ···, Wn−1 are independent, and also that Wi ∼ Exp(1/(n − i)). The case of a general λ follows from the standard Exponential case.

Corollary. (Rényi) The joint distribution of the order statistics of an Exponential distribution with mean λ has the representation

(X(r))|_{r=1}^{n} =_L (Σ_{i=1}^{r} Xi/(n − i + 1))|_{r=1}^{n},

where X1, ···, Xn are themselves independent Exponentials with mean λ. ♣

Remark: Verbally, the order statistics of an Exponential distribution are linear combinations of independent Exponentials, with a very special sequence of coefficients. In an obvious way, the representation can be extended to the order statistics of a general continuous CDF by simply using the quantile transformation.

Example 6.10. (Moments and Correlations of Exponential Order Statistics). From the representation in the above Corollary, we immediately have

E(X(r)) = λ Σ_{i=1}^{r} 1/(n − i + 1); Var(X(r)) = λ^2 Σ_{i=1}^{r} 1/(n − i + 1)^2.

Furthermore, by using the same representation, for 1 ≤ r < s ≤ n,

Cov(X(r), X(s)) = λ^2 Σ_{i=1}^{r} 1/(n − i + 1)^2 = Var(X(r)).

6.4.2 Uniform Spacings

The results on Exponential spacings lead to some highly useful and neat representations for the spacings and the order statistics of a uniform distribution. The next result describes the most important properties of uniform spacings and order statistics. Numerous other properties of a more special nature are known. David (1980) and Reiss (1989) are the best references for such additional properties of the uniform order statistics.

Theorem 6.6. Let U1, U2, ···, Un be independent U[0, 1] variables, and U(1), U(2), ···, U(n) the order statistics. Let W0 = U(1), Wi = U(i+1) − U(i), 1 ≤ i ≤ n − 1, and Vi = U(i)/U(i+1), 1 ≤ i ≤ n − 1, Vn = U(n). Let also X1, X2, ···, Xn+1 be (n + 1) independent standard Exponentials, independent of the Ui, i = 1, 2, ···, n. Then,

(a) V1, V2^2, ···, Vn−1^{n−1}, Vn^n are independent U[0, 1] variables, and (V1, V2, ···, Vn−1) are independent of Vn;

(b) (W0, W1, ···, Wn−1) ∼ D(α), a Dirichlet distribution with parameter vector α_{(n+1)×1} = (1, 1, ···, 1). That is, (W0, W1, ···, Wn−1) is uniformly distributed in the n-dimensional simplex;

(c) (W0, W1, ···, Wn−1) =_L (X1/Σ_{j=1}^{n+1} Xj, X2/Σ_{j=1}^{n+1} Xj, ···, Xn/Σ_{j=1}^{n+1} Xj);

(d) (U(1), U(2), ···, U(n)) =_L (X1/Σ_{j=1}^{n+1} Xj, (X1 + X2)/Σ_{j=1}^{n+1} Xj, ···, (X1 + X2 + ··· + Xn)/Σ_{j=1}^{n+1} Xj).

Proof: For part (a), use the fact that the negative of the logarithm of a U[0, 1] variable is standard Exponential, and then use the result that the Exponential spacings are themselves independent Exponentials. That V1, V2^2, ···, Vn−1^{n−1} are also uniformly distributed follows from looking at the joint density of U(i), U(i+1) for any given i. It follows trivially from the density of Vn that Vn^n ∼ U[0, 1]. For parts (b) and (c), first consider the joint density of the uniform order statistics, and then transform to the variables Wi, i = 0, ···, n − 1. This is a one-to-one transformation, and so we can apply the Jacobian density theorem. The Jacobian theorem easily gives the joint density of the Wi, i = 0, ···, n − 1, and we simply recognize it to be the density of a Dirichlet with the parameter vector having each coordinate equal to one. Finally, use the representation of a Dirichlet random vector in the form of ratios of Gammas (see Chapter 4). Part (d) is just a restatement of part (c). ♣

Remark: Part (d) of this theorem, representing uniform order statistics in terms of independent Exponentials, is one of the most useful results in the theory of order statistics.
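Part (d) is easy to see in action numerically. The sketch below (an added illustration, assuming NumPy; n = 4 is an arbitrary choice) builds the ratios of partial sums of five independent standard Exponentials and compares their moments with those of genuine uniform order statistics.

    import numpy as np

    rng = np.random.default_rng(5)
    n, reps = 4, 200000

    x = rng.exponential(size=(reps, n + 1))
    rep_d = np.cumsum(x[:, :n], axis=1) / x.sum(axis=1, keepdims=True)   # part (d) representation

    u = np.sort(rng.uniform(size=(reps, n)), axis=1)                     # genuine order statistics

    print(rep_d.mean(axis=0), u.mean(axis=0))    # both approx (0.2, 0.4, 0.6, 0.8)
    print(rep_d.var(axis=0), u.var(axis=0))      # both approx r(n-r+1)/((n+1)^2 (n+2))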

6.5 Conditional Distributions and Markov Property

The conditional distributions of a subset of the order statistics given another subset satisfy some really structured properties. An illustration of such a result is that if we know that the sample

maximum X(n) = x, then the rest of the order statistics would act like the order statistics of a sample of size n − 1 from the original CDF, but truncated on the right at that specific value x. Another prominent property of the conditional distributions is the Markov property. Again,

a lot is known about the conditional distributions of order statistics, but we present the most significant and easy to state results. The best references for reading more about the conditional distributions are still David (1980) and Reiss (1989). Each of the following theorems follows on straightforward calculations, and we omit the calculations.

Theorem 6.7. Let X1, X2, ···, Xn be independent observations from a continuous CDF F with density f. Fix 1 ≤ i < j ≤ n.

Theorem 6.8. Let X1, X2, ···, Xn be independent observations from a continuous CDF F with density f. Fix 1 ≤ i < j ≤ n. Then, the conditional density of X(j) given X(i) = x is, for u > x,

f_{X(j)|X(i)=x}(u) = (n − i)!/((j − i − 1)!(n − j)!) [(F(u) − F(x))/(1 − F(x))]^{j−i−1} [(1 − F(u))/(1 − F(x))]^{n−j} f(u)/(1 − F(x)).

Theorem 6.9. (The Markov Property) Let X1, X2, ···, Xn be independent observations from a continuous CDF F with density f. Fix 1 ≤ i < j ≤ n. Then, the conditional distribution of X(j) given X(1) = x1, X(2) = x2, ···, X(i) = xi is the same as the conditional distribution of X(j) given X(i) = xi. That is, given X(i), X(j) is independent of X(1), X(2), ···, X(i−1).

Theorem 6.10. Let X1, X2, ···, Xn be independent observations from a continuous CDF F with density f. Then, the conditional distribution of X(1), X(2), ···, X(n−1) given X(n) = x is the same as the unconditional distribution of the order statistics in a sample of size n − 1 from a new distribution, namely the original F truncated at the right at x. In notation,

f_{X(1),···,X(n−1)|X(n)=x}(u1, ···, un−1) = (n − 1)! Π_{i=1}^{n−1} [f(ui)/F(x)], u1 < ··· < un−1 < x.

Remark: A similar and transparent result holds about the conditional distribution of X(2),X(3), ···,X(n)

given X(1) = x.

Theorem 6.11. Let X1, X2, ···, Xn be independent observations from a continuous CDF F with density f. Then, the conditional distribution of X(2), ···, X(n−1) given X(1) = x, X(n) = y is the same as the unconditional distribution of the order statistics in a sample of size n − 2 from a new distribution, namely the original F truncated at the left at x, and at the right at y. In notation,

f_{X(2),···,X(n−1)|X(1)=x,X(n)=y}(u2, ···, un−1) = (n − 2)! Π_{i=2}^{n−1} [f(ui)/(F(y) − F(x))], x < u2 < ··· < un−1 < y.

Example 6.11. (Mean Given the Maximum). Suppose X1, X2, ···, Xn are independent U[0, 1] variables. We want to find E(X̄ | X(n) = x). We use the theorem above about the conditional distribution of X(1), X(2), ···, X(n−1) given X(n) = x.

E(n X̄ | X(n) = x) = E(Σ_{i=1}^{n} Xi | X(n) = x) = E(Σ_{i=1}^{n} X(i) | X(n) = x) = x + E(Σ_{i=1}^{n−1} X(i) | X(n) = x)

= x + Σ_{i=1}^{n−1} ix/n,

since, given X(n) = x, X(1), X(2), ···, X(n−1) act like the order statistics of a sample of size n − 1 from the U[0, x] distribution. Now summing the series, we get,

E(n X̄ | X(n) = x) = x + (n − 1)x/2 = (n + 1)x/2,

⇒ E(X̄ | X(n) = x) = (n + 1)x/(2n).
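The answer E(X̄ | X(n) = x) = (n + 1)x/(2n) can be checked by simulating directly from the conditional law described in Theorem 6.10 (given X(n) = x, the other n − 1 values behave like n − 1 independent U[0, x] observations). The sketch below is an added illustration, assuming NumPy; n = 6 and x = 0.7 are arbitrary choices.

    import numpy as np

    rng = np.random.default_rng(6)
    n, x, reps = 6, 0.7, 200000

    others = rng.uniform(0, x, size=(reps, n - 1))   # conditional law of the other n-1 values
    xbar = (others.sum(axis=1) + x) / n

    print(xbar.mean(), (n + 1) * x / (2 * n))        # both close to 0.408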

Example 6.12. (Maximum Given the First Half). Suppose X1,X2, ···,X2n are independent standard Exponentials. We want to find E(X(2n)|X(1) = x1, ···,X(n) = xn). By the theorem on the Markov property, this conditional expectation equals E(X(2n)|X(n) = xn). Now, we further use the representation that

(X(n), X(2n)) =_L (Σ_{i=1}^{n} Xi/(2n − i + 1), Σ_{i=1}^{2n} Xi/(2n − i + 1)).

Therefore,

E(X(2n) | X(n) = xn) = E(Σ_{i=1}^{n} Xi/(2n − i + 1) + Σ_{i=n+1}^{2n} Xi/(2n − i + 1) | Σ_{i=1}^{n} Xi/(2n − i + 1) = xn)

= xn + E(Σ_{i=n+1}^{2n} Xi/(2n − i + 1) | Σ_{i=1}^{n} Xi/(2n − i + 1) = xn)

= xn + E(Σ_{i=n+1}^{2n} Xi/(2n − i + 1)), because the Xi are all independent,

= xn + Σ_{i=n+1}^{2n} 1/(2n − i + 1).

For example, in a sample of size 4, E(X(4) | X(1) = x, X(2) = y) = E(X(4) | X(2) = y) = y + Σ_{i=3}^{4} 1/(5 − i) = y + 3/2.

6.6 Some Applications

Order statistics and the related theory have many interesting and important applications in statistics, in modelling of empirical phenomena, for example climate characteristics, and in probability itself. We touch on a small number of applications in this section for purposes of reference. For further reading on the vast literature on applications of order statistics, we recommend, among numerous possibilities, Lehmann (1975), Shorack and Wellner (1986), David (1980), Reiss (1989), Martynov (1992), Galambos (1978), Falk, Hüsler, and Reiss (1994), Coles (2001), Embrechts, Klüppelberg, and Mikosch (2008), and DasGupta (2008).

6.6.1 ∗ Records

Record values and their timings are used for the purposes of tracking changes in some process, such as temperature, and for preparation for extremal events, such as protection against floods. They are also interesting in their own right.

Let X1,X2, ···, be an infinite sequence of independent observations from a continuous CDF F . We first give some essential definitions.

Definition 6.7. We say that a record occurs at time i if Xi > Xj ∀ j < i.

Let Zi be the indicator of the event that a record occurs at time i. The sequence T1, T2, ··· defined as T1 = 1; Tj = min{i > Tj−1 : Zi = 1} is called the sequence of record times. The differences Di+1 = Ti+1 − Ti are called the interarrival times. The sequence X_{T1}, X_{T2}, ···, is called the sequence of record values.

Example 6.13. The values 1.46, .28, 2.20, .72, 2.33, .67, .42, .85, .66, .67, 1.54, .76, 1.22, 1.72, .33 are 15 simulated values from a standard Exponential distribution. The record values are 1.46, 2.20, 2.33, and the record times are T1 = 1, T2 = 3, T3 = 5. Thus, there are three records at time n = 15. We notice that no records were observed after the fifth observation in the sequence. In fact, in general, it becomes increasingly more difficult to obtain a record as time passes; justification for this statement will be seen in the following theorem. The following theorem summarizes a number of key results about record values, times, and number of records; this theorem is a superb example of the power of the quantile transformation method, because the results for a general continuous CDF F can be obtained from the U[0, 1] case by making a quantile transformation. The details are worked out in Port (1993, pp 502-509).

Theorem 6.12. Let X1,X2, ··· be an infinite sequence of independent observations from a CDF F , and assume that F has the density f. Then,

(a) The sequence Z1, Z2, ··· is an infinite sequence of independent Bernoulli variables, with E(Zi) = P(Zi = 1) = 1/i, i ≥ 1;

(b) Let N = Nn = Σ_{i=1}^{n} Zi be the number of records at time n. Then,

E(N) = Σ_{i=1}^{n} 1/i; Var(N) = Σ_{i=1}^{n} (i − 1)/i^2;

(c) Fix r ≥ 2. Then Dr has the pmf

P(Dr = k) = Σ_{i=0}^{k−1} (−1)^i (k−1 choose i) (i + 2)^{−r}, k ≥ 1;

(d) The rth record value X_{Tr} has the density

fr(x) = [− log(1 − F(x))]^{r−1}/(r − 1)! f(x), −∞ < x < ∞;

(e) The first n record values, X_{T1}, X_{T2}, ···, X_{Tn}, have the joint density

f_{12···n}(x1, x2, ···, xn) = f(xn) Π_{i=1}^{n−1} [f(xi)/(1 − F(xi))] I{x1 < x2 < ··· < xn};

(f) Fix a sequence of reals t1 < t2 < ··· < tk, and for any real t let

M(t)=#{i : Xi ≤ t and Xi is a record value}.

Then, M(ti) − M(ti−1), 2 ≤ i ≤ k, are independently distributed, and M(ti) − M(ti−1) ∼ Poi(log [(1 − F(ti−1))/(1 − F(ti))]).

Remark: From part (a) of the theorem, we learn that if indeed the sequence of observations keeps coming from the same CDF, then obtaining a record becomes harder as time passes; P(Zi = 1) → 0. We learn from part (b) that both the mean and the variance of the number of records observed until time n are of the order of log n. The number of records observed until time n is well approximated by a Poisson distribution with mean log n, or a normal distribution with mean and variance equal to log n. We learn from part (c) and part (d) that the interarrival times of the record values do not depend on F, but the magnitudes of the record values do. Part (f) is another example of the Poisson distribution providing an approximation in an interesting problem. It is interesting to note the connection between part (b) and part (f). In part (f), if we take t = F^{-1}(1 − 1/n), then heuristically, Nn, the number of records observed up to time n, satisfies Nn ≈ M(X(n)) ≈ M(F^{-1}(1 − 1/n)) ≈ Poi(− log(1 − F(F^{-1}(1 − 1/n)))) = Poi(log n), which is what we mentioned in the paragraph above.
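Parts (a) and (b) are easy to see by simulation: the number of records in n observations has mean Σ_{i=1}^{n} 1/i regardless of the (continuous) F. The sketch below is an added illustration in plain Python; n = 50 and the number of repetitions are arbitrary choices.

    import random

    random.seed(7)
    n, reps = 50, 20000
    total = 0
    for _ in range(reps):
        best, count = float("-inf"), 0
        for _ in range(n):
            value = random.random()          # any continuous F gives the same record counts
            if value > best:
                best, count = value, count + 1
        total += count

    expected = sum(1 / i for i in range(1, n + 1))
    print(total / reps, expected)            # both close to 4.50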

Example 6.14. (Density of Record Values and Times). It is instructive to look at the effect of the tail of the underlying CDF F on the magnitude of the record values. The first plot gives the density of the third record value for three choices of F, F = N(0, 1),DoubleExp(0, 1),C(0, 1). Although the modal values are not very different, the effect of the tail of F on the tail of the record density is clear. In particular, for the standard Cauchy case, record values do not have a finite expectation. Next, consider the distribution of the gap between the arrival times of the second and the third record. Note the long right tail, akin to an Exponential density.

6.6.2 The Empirical CDF

The empirical CDF Fn(x), defined in Section 2, is a tool of tremendous importance in statistics and probability. The reason for its effectiveness as a tool is that if sample observations arise from some CDF F , then the empirical CDF Fn will be very close to F for large n. So, we can get a very good idea of what the true F is by looking at Fn. Furthermore, since Fn ≈ F , it can be expected that if T (F ) is a nice functional of F , then the empirical version T (Fn) would be close to T (F ).

Perhaps the simplest example of this is the mean T(F) = E_F(X). The empirical version then is T(Fn) = E_{Fn}(X) = Σ_{i=1}^{n} Xi/n, because Fn assigns the equal probability 1/n to just the observation values X1, ···, Xn. This means that the mean of the sample values should be close to the expected value under the true F. And, this is indeed true under simple conditions, and we have already seen some evidence for it in the form of the law of large numbers. We provide some basic properties and applications of the empirical CDF in this section.

[Figure: Density of the third record value for, top to bottom, the N(0,1), Double Exp(0,1), and C(0,1) cases.]

[Figure: PMF of the interarrival time between the 2nd and 3rd record.]

Theorem 6.13. Let X1, X2, ··· be independent observations from a CDF F. Then,

(a) For any fixed x, nFn(x) ∼ Bin(n, F (x));

(b) (DKW Inequality) Let ∆n = sup_{x∈R} |Fn(x) − F(x)|. Then, for all n, ε > 0, and all F,

P(∆n > ε) ≤ 2 e^{−2nε^2};

(c) Assume that F is continuous. For any given n and α, 0 < α < 1, there exist positive constants Dn, independent of F, such that whatever be F,

P (∀x ∈R,Fn(x) − Dn ≤ F (x) ≤ Fn(x)+Dn) ≥ 1 − α.

Remark: Part (b), the DKW inequality, was first proved in Dvoretzky, Kiefer, and Wolfowitz (1956), but in a weaker form. The inequality stated here is proved in Massart (1990). Furthermore, the constant 2 in the inequality is the best possible choice of the constant, i.e., the inequality is false with any other constant C < 2. The inequality says that uniformly in x, for large n, the empirical CDF is arbitrarily close to the true CDF with a very high probability, and the probability of the contrary is subgaussian. We will see more precise consequences of this in a later chapter. Part (c) is important for statistical applications, as we show in our next example.

Example 6.15. (Confidence Band for a Continuous CDF). This example is another important application of the quantile transformation method. Imagine a hypothetical sequence of independent U[0, 1] variables, U1, U2, ···, and let Gn denote the empirical CDF of this sequence of uniform random variables; i.e.,

Gn(t) = #{i : Ui ≤ t}/n.

By the quantile transformation,

∆n = sup_{x∈R} |Fn(x) − F(x)| =_L sup_{x∈R} |Gn(F(x)) − F(x)|

= sup_{0<t<1} |Gn(t) − t|,

because F(x) takes all values in (0, 1) as x ranges over the real line, F being continuous. Therefore, the distribution of ∆n is the same for every continuous F, and so, if Dn is chosen such that P(sup_{0<t<1} |Gn(t) − t| > Dn) ≤ α, then Dn also satisfies (the apparently stronger statement)

P (∀x ∈R,Fn(x) − Dn ≤ F (x) ≤ Fn(x)+Dn) ≥ 1 − α.

The probability statement above provides the assurance that with probability 1 − α or more, the true CDF F(x), as a function, is caught between the pair of functions Fn(x) ± Dn. As a consequence, the band Fn(x) − Dn ≤ F(x) ≤ Fn(x) + Dn, x ∈ R, is called a 100(1 − α)% confidence band for F. This is of great use in statistics, because statisticians often consider the true CDF F to be not known.
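In computational terms, the band is trivial to form once Dn is available. The sketch below is an added illustration (assuming NumPy; the Exp(1) sample and n = 100 are arbitrary choices); it uses the large-sample value Dn ≈ 1.36/√n for the 95% level, taken from the table that follows, and checks coverage through the identity of Exercise 6.28.

    import numpy as np

    rng = np.random.default_rng(8)
    n = 100
    xs = np.sort(rng.exponential(size=n))
    true_f = 1 - np.exp(-xs)                 # true CDF, known here only because we simulated
    i = np.arange(1, n + 1)

    # Delta_n = sup_x |F_n(x) - F(x)|, computed at the jump points (see Exercise 6.28)
    delta_n = np.max(np.maximum(i / n - true_f, true_f - (i - 1) / n))

    dn = 1.36 / np.sqrt(n)                   # 95th-percentile band half-width for n > 40
    print(delta_n, dn, delta_n <= dn)        # the band F_n(x) +/- dn covers F iff delta_n <= dn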

The constants Dn have been computed and tabulated for small and moderate n. We tabulate the values of Dn for some selected n for easy reference and use.

n       95th Percentile    99th Percentile
20      .294               .352
21      .287               .344
22      .281               .337
23      .275               .330
24      .269               .323
25      .264               .317
26      .259               .311
27      .254               .305
28      .250               .300
29      .246               .295
30      .242               .290
35      .224               .269
40      .210               .252
> 40    1.36/√n            1.63/√n

6.7 ∗ Distribution of the Multinomial Maximum

The maximum cell frequency in a multinomial distribution is of current interest in several areas of probability and statistics. It is of wide interest in cluster detection, mining, goodness of fit, and in occupancy problems in probability. It also arises in sequential clinical trials. It turns out that the technique of Poissonization can be used to establish, in principle, the exact distribution of the multinomial maximum cell frequency. This can be of substantial practical use. Precisely, if

N ∼ Poisson(λ), and given N = n, (f1,f2, ···,fk) has a multinomial distribution with parameters

(n, p1, p2, ···, pk), then unconditionally, f1, f2, ···, fk are independent and fi ∼ Poisson(λ pi). It follows that with any given fixed value n, and any given fixed set A in the k-dimensional Euclidean space R^k, the multinomial probability that (f1, f2, ···, fk) belongs to A equals n! c(n, λ), with c(n, λ) being the coefficient of λ^n in the power series expansion of e^λ P((X1, X2, ···, Xk) ∈ A), where now the Xi are independent Poisson(λ pi). In the equiprobable case, i.e., when the pi are all equal to 1/k, this leads to the equality that

P(max{f1, f2, ···, fk} ≥ x) = 1 − n!/k^n × [the coefficient of λ^n in (Σ_{j=0}^{x−1} λ^j/j!)^k];

see Chapter 2. As a result, we can compute P(max{f1, f2, ···, fk} ≥ x) exactly whenever we can compute the coefficient of λ^n in the expansion of (Σ_{j=0}^{x−1} λ^j/j!)^k. This is possible to do by using symbolic software; see Ethier (1982) and DasGupta (2009).

Example 6.16. (Maximum Frequency in Die Throws). Suppose a fair six sided die is rolled 30 times. Should we be surprised if one of the six faces appears 10 times? The usual probability calculation to quantify the surprise is to calculate P(max{f1, f2, ···, f6} ≥ 10), namely the

P-value, where f1,f2, ···,f6 are the frequencies of the six faces in the thirty rolls. Because of our Poissonization result, we can compute this probability. From the table of exact probabilities below,

we can see that it would not be very surprising if some face appeared 10 times in 30 rolls of a fair

die; after all P (max{f1,f2, ···,f6}≥10) = .1176, not a very small number, for 30 rolls of a fair die. Similarly, it would not be very surprising if some face appeared 15 times in 50 rolls of a fair die, as can be seen in the table below.

P(max{f1, f2, ···, fk} ≥ x), k = 6

x     n = 30     n = 50
8     .6014      1
9     .2942      1
10    .1176      .9888
11    .0404      .8663
12    .0122      .6122
13    .0032      .3578
14    .00076     .1816
15    .00016     .0827
16    .00003     .0344
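The entries of this table can be reproduced by extracting the polynomial coefficient with exact rational arithmetic. The following plain Python sketch is an added illustration of the Poissonization formula above; the function name and the parameter choices are ours, not from the original text.

    from math import factorial
    from fractions import Fraction

    def prob_max_at_least(n, k, x):
        # coefficients of the truncated exponential series sum_{j=0}^{x-1} lambda^j / j!
        trunc = [Fraction(1, factorial(j)) for j in range(x)]
        poly = [Fraction(1)]                 # raise the series to the k-th power, degree <= n
        for _ in range(k):
            new = [Fraction(0)] * min(len(poly) + len(trunc) - 1, n + 1)
            for a, ca in enumerate(poly):
                for b, cb in enumerate(trunc):
                    if a + b <= n:
                        new[a + b] += ca * cb
            poly = new
        coeff = poly[n] if n < len(poly) else Fraction(0)
        # P(all cell frequencies <= x-1) = (n!/k^n) * [coefficient of lambda^n]
        return 1 - Fraction(factorial(n), k ** n) * coeff

    print(float(prob_max_at_least(30, 6, 10)))   # about .1176, as in the table
    print(float(prob_max_at_least(50, 6, 15)))   # about .0827, as in the table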

6.8 Exercises

Exercise 6.1. Suppose X, Y, Z are three independent U[0, 1] variables. Let U, V, W denote the minimum, median, and the maximum of X, Y, Z.

(a) Find the densities of U, V, W.

(b) Find the densities of U/V and V/W, and their joint density.

(c) Find E(U/V) and E(V/W).

Exercise 6.2. Suppose X1, ···,X5 are independent U[0, 1] variables. Find the joint density of

X(2), X(3), X(4), and E(X(4) + X(2) − 2X(3)).

Exercise 6.3. * Suppose X1, ···,Xn are independent U[0, 1] variables. (a) Find the probability that all n observations fall within some interval of length at most .9.

(b) Find the smallest n such that P (X(n) ≥ .99,X(1) ≤ .01) ≥ .99.

Exercise 6.4. (Correlation Between Order Statistics). Suppose X1, ···, X5 are independent U[0, 1] variables. Find the exact values of ρ_{X(i),X(j)} for all 1 ≤ i < j ≤ 5.

Exercise 6.5. * (Correlation Between Order Statistics). Suppose X1, ···, Xn are independent U[0, 1] variables. Find the smallest n such that ρ_{X(1),X(n)} < ε, for ε = .5, .25, .1.

Exercise 6.6. Suppose X, Y, Z are three independent standard Exponential variables, and let U, V, W be their minimum, median, and maximum. Find the densities of U, V, W, W − U.

Exercise 6.7. (Comparison of Mean, Median, and Midrange). Suppose

X1, X2, ···, X2m+1 are independent observations from U[µ − σ, µ + σ].

(a) Show that the expectation of each of X̄, X(m+1), (X(1) + X(n))/2 is µ.

(b) Find the variance of each of X̄, X(m+1), (X(1) + X(n))/2. Is there an ordering among their variances?

Exercise 6.8. *(Waiting Time). Peter, Paul, and Mary went to a bank to do some business. Two counters were open, and Peter and Paul went first. Each of Peter, Paul, and Mary will take, independently, an Exp(λ) amount of time to finish their business, from the moment they arrive at the counter.

(a) What is the density of the epoch of the last departure?

(b) What is the probability that Mary will be the last to finish?

(c) What is the density of the total time taken by Mary from arrival to finishing her business?

Exercise 6.9. Let X1, ···,Xn be independent standard Exponential variables.

(a) Derive an expression for the CDF of the maximum of the spacings, W0 = X(1),Wi =

X(i+1) − X(i), i = 1, ···, n − 1.

(b) Use it to calculate the probability that among 20 independent standard Exponential observations, no two consecutive observations are more than .25 apart.

Exercise 6.10. * (A Characterization). Let X1, X2 be independent observations from a continuous CDF F. Suppose that X(1) and X(2) − X(1) are independent. Show that F must be the CDF of an Exponential distribution.

Exercise 6.11. * (Range and Midrange). Let X1, ···, Xn be independent U[0, 1] variables. Let Wn = X(n) − X(1), Yn = (X(n) + X(1))/2. Find the joint density of Wn, Yn (be careful about where the joint density is positive). Use it to find the conditional expectation of Yn given Wn = w.

Exercise 6.12. (Density of Midrange). Let X1, ···, Xn be independent observations from a continuous CDF F with density f. Show that the CDF of Yn = (X(n) + X(1))/2 is given by

F_Y(y) = n ∫_{−∞}^{y} [F(2y − x) − F(x)]^{n−1} f(x) dx.

Exercise 6.13. (Mean Given the Minimum and Maximum). Let X1, ···, Xn be independent U[0, 1] variables. Derive a formula for E(X̄ | X(1), X(n)).

Exercise 6.14. (Mean Given the Minimum and Maximum). Let X1, ···, Xn be independent standard Exponential variables. Derive a formula for E(X̄ | X(1), X(n)).

Exercise 6.15. * (Distance Between Mean and Maximum). Let X1, ···, Xn be independent U[0, 1] variables. Derive as clean a formula as possible for E(|X̄ − X(n)|).

Exercise 6.16. * (Distance Between Mean and Maximum). Let X1, ···, Xn be independent standard Exponential variables. Derive as clean a formula as possible for E(|X̄ − X(n)|).

Exercise 6.17. * (Distance Between Mean and Maximum). Let X1, ···, Xn be independent standard normal variables. Derive as clean a formula as possible for E(|X̄ − X(n)|).

Exercise 6.18. (Relation Between Uniform and Standard Normal). Let Z1, Z2, ··· be independent standard normal variables. Let U1, U2, ··· be independent U[0, 1] variables. Prove the distributional equivalence:

(U(r))|_{r=1}^{n} =_L (Σ_{i=1}^{2r} Zi^2 / Σ_{i=1}^{2(n+1)} Zi^2)|_{r=1}^{n}.

Exercise 6.19. * (Confidence Interval for a Quantile). Let X1, ···, Xn be independent observations from a continuous CDF F. Fix 0 < p < 1.

Exercise 6.20. Let X1, ···, Xn be independent observations from a continuous CDF F. Find the smallest value of n such that P(X(2) ≤ F^{-1}(1/2) ≤ X(n−1)) ≥ .95.

Exercise 6.21. Let X1, ···,Xn be independent observations from a continuous CDF F with a density symmetric about some µ. Show that for all odd sample sizes n =2m + 1, the median

X(m+1) has a density symmetric about µ.

Exercise 6.22. Let X1, ···, Xn be independent observations from a continuous CDF F with a density symmetric about some µ. Show that for any r, X(n−r+1) − µ =_L µ − X(r).

Exercise 6.23. * (Unimodality of Order Statistics). Let X1, ···, Xn be independent observations from a continuous CDF F with a density f. Suppose 1/f(x) is convex on the support of f, namely, on S = {x : f(x) > 0}. Show that for any r, the density of X(r) is unimodal. You may assume that S is an interval.

Exercise 6.24. Let X1,X2, ···,Xn be independent standard normal variables. Prove that the mode of X(n) is the unique root of

(n − 1)φ(x)=xΦ(x).

Exercise 6.25. (Conditional Expectation Given the Order Statistics). Let g(x1,x2, ···,xn) be a general real valued function of n variables. Let X1,X2, ···,Xn be independent observations from a common CDF F . Find as clean an expression as possible for E(g(X1,X2, ···,Xn) |X(1),X(2), ···,X(n)).

Exercise 6.26. Derive a formula for the expected value of the rth record when the sample observations are from an Exponential density.

Exercise 6.27. * (Record Values in Normal Case). Suppose X1,X2, ··· are independent observations from the standard normal distribution. Compute the expected values of the first ten records.

Exercise 6.28. Let Fn(x) be the empirical CDF of n observations from a CDF F . Show that

∆n = supx∈R|Fn(x) − F (x)|

= max_{1≤i≤n} max{ i/n − F(X(i)), F(X(i)) − (i − 1)/n }.

211 6.9 References

Bickel, P. (1967). Some contributions to order statistics, Proc. Fifth Berkeley Symp., I, 575-591, L. Le Cam and J. Neyman eds., Univ. Calif. Press, Berkeley.

Coles, S. (2001). An Introduction to Statistical Modeling of Extreme Values, Springer, New York.

DasGupta, A. (2008). Asymptotic Theory of Statistics and Probability, Springer, New York.

DasGupta, A. (2009). Exact tail probabilities and percentiles of the multinomial maximum, Technical Report, Purdue University.

David, H. (1980). Order Statistics, Wiley, New York.

Dvoretzky, A., Kiefer, J. and Wolfowitz, J. (1956). Asymptotic minimax character of the sample distribution function, Ann. Math. Statist., 27, 3, 642-669.

Embrechts, P., Klüppelberg, C., and Mikosch, T. (2008). Modelling Extremal Events: For Insurance and Finance, Springer, New York.

Ethier, S. (1982). Testing for favorable numbers on a roulette wheel, Jour. Amer. Statist. Assoc., 77, 660-665.

Falk, M., Hüsler, J. and Reiss, R. (1994). Laws of Small Numbers, Extremes, and Rare Events, Birkhäuser, Basel.

Galambos, J. (1987). Asymptotic Theory of Extreme Order Statistics, Academic Press, New York.

Hall, P. (1978). Some asymptotic expansions of moments of order statistics, Stoch. Proc. Appl., 7, 265-275.

Hall, P. (1979). On the rate of convergence of normal extremes, J. Appl. Prob., 16, 2, 433-439.

Leadbetter, M., Lindgren, G. and Rootzén, H. (1983). Extremes and Related Properties of Random Sequences and Processes, Springer, New York.

Lehmann, E. (1975). Nonparametrics: Statistical Methods Based on Ranks, McGraw Hill, Columbus.

Martynov, G. (1992). Statistical tests based on empirical processes and related problems, Soviet J. Math, 61, 4, 2195-2271.

Massart, P. (1990). The tight constant in the DKW inequality, Ann. Prob., 18, 1269-1283.

Port, S. (1993). Theoretical Probability for Applications, Wiley, New York.

Reiss, R. (1989). Approximation Theorems of Order Statistics, Springer-Verlag, New York.

Resnick, S. (2007). Extreme Values, Regular Variation, and Point Processes, Springer, New York.

Shorack, G. and Wellner, J. (1986). Empirical Processes with Applications to Statistics, Wiley, New York.
