O.H. Probability III/IV – MATH 3211/4131 : lecture summaries E17
4 Order statistics
Given a sample, its $k$th order statistic is defined as the $k$th smallest value of the sample. Order statistics have many important applications: record values (historical maxima or minima) are used in sport, economics (especially insurance) and finance; they help to determine the strength of some materials (the weakest link) as well as to model and study extreme climate events (a low river level can lead to drought, while a high level brings flood risk); order statistics are also useful in the allocation of prize money in tournaments, etc.

In this section we study properties of individual order statistics, of gaps between consecutive order statistics, and of the range of the sample. We explore properties of extreme statistics of large samples from various distributions, including uniform and exponential. You might find [1, Section 4] useful.
4.1 Definition and extreme order variables
If $X_1, X_2, \ldots, X_n$ is a (random) $n$-sample from a particular distribution, it is often important to know the values of the largest observation, the smallest observation or the centermost observation (the median). More generally, the variable
\[
X_{(k)} \stackrel{\text{def}}{=} \text{“the $k$th smallest of $X_1, X_2, \ldots, X_n$”}\,, \qquad k = 1, 2, \ldots, n\,, \tag{4.1}
\]
is called the $k$th order variable,\footnote{it is important to always remember that the distribution of the $k$th order variable depends on the total number $n$ of observations; in the literature one sometimes specifies this dependence explicitly and uses $X_{k:n}$ to denote the $k$th order variable.} and the (ordered) collection of observed values $(X_{(1)}, X_{(2)}, \ldots, X_{(n)})$ is called the order statistic. In general we have $X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$, ie., some of the order values can coincide. If the common distribution of the $n$-sample $X_1, X_2, \ldots, X_n$ is continuous (having a non-negative density), the probability of such a coincidence vanishes. As in the following we will only consider samples from continuous distributions, we can (and will) assume that all of the observed values are distinct, ie., the following strict inequalities hold:
\[
X_{(1)} < X_{(2)} < \cdots < X_{(n)}\,. \tag{4.2}
\]
It is straightforward to describe the distributions of the extreme order variables $X_{(1)}$ and $X_{(n)}$. Let $F_X(\cdot)$ be the common cdf\footnote{cumulative distribution function; in what follows, if the reference distribution is clear we will write $F(x)$ instead of $F_X(x)$.} of the $X$ variables; by independence, we have
\[
F_{X_{(n)}}(x) = P\bigl(X_{(n)} \le x\bigr) = \prod_{k=1}^{n} P(X_k \le x) = \bigl(F_X(x)\bigr)^n\,,
\qquad
F_{X_{(1)}}(x) = P\bigl(X_{(1)} \le x\bigr) = 1 - \prod_{k=1}^{n} P(X_k > x) = 1 - \bigl(1 - F_X(x)\bigr)^n\,. \tag{4.3}
\]
In particular, if the $X_k$ have a common density $f_X(x)$, then
\[
f_{X_{(1)}}(x) = n \bigl(1 - F_X(x)\bigr)^{n-1} f_X(x)\,, \qquad
f_{X_{(n)}}(x) = n \bigl(F_X(x)\bigr)^{n-1} f_X(x)\,. \tag{4.4}
\]
Example 4.1 Suppose that the times of eight sprinters are independent random variables with common $\mathcal{U}(9.6, 10.0)$ distribution. Find $P(X_{(1)} < 9.69)$, the probability that the winning result is smaller than $9.69$.
Solution. Since the density of $X$ is $f_X(x) = 2.5 \cdot 1_{(9.6,10.0)}(x)$, we get, with $x = 9.69$,
\[
P\bigl(X_{(1)} < x\bigr) = 1 - \bigl(1 - F_X(x)\bigr)^8 = 1 - \bigl(1 - 2.5\,(9.69 - 9.6)\bigr)^8 \approx 0.86986\,.
\]
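A quick Monte Carlo sanity check of this number (a numpy sketch; the seed and number of simulated races are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# 10^6 simulated races: each row holds the eight runners' times ~ U(9.6, 10.0)
times = rng.uniform(9.6, 10.0, size=(1_000_000, 8))
winning = times.min(axis=1)          # X_(1) for each race

p_mc = (winning < 9.69).mean()       # Monte Carlo estimate of P(X_(1) < 9.69)
p_exact = 1 - (1 - 2.5 * (9.69 - 9.6)) ** 8

print(f"Monte Carlo: {p_mc:.5f}, exact: {p_exact:.5f}")   # both close to 0.86986
```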
Exercise 4.1 (*). Let independent variables $X_1, \ldots, X_n$ have $\mathcal{U}(0,1)$ distribution. Show that for every $x \in (0,1)$, we have $P(X_{(1)} < x) \to 1$ and $P(X_{(n)} > x) \to 1$ as $n \to \infty$.

4.2 Intermediate order variables
We now describe the distribution of the $k$th order variable $X_{(k)}$:

Theorem 4.2 For $k = 1, 2, \ldots, n$,
\[
F_{X_{(k)}}(x) \equiv P\bigl(X_{(k)} \le x\bigr) = \sum_{\ell=k}^{n} \binom{n}{\ell}\, F_X(x)^{\ell}\, \bigl(1 - F_X(x)\bigr)^{n-\ell}\,. \tag{4.5}
\]
Proof. In a sample of $n$ independent values, the number of observations not bigger than $x$ has $\mathrm{Bin}(n, F_X(x))$ distribution. Further, the events
\[
A_m(x) \stackrel{\text{def}}{=} \bigl\{\text{exactly $m$ of $X_1, X_2, \ldots, X_n$ are not bigger than $x$}\bigr\}
\]
are disjoint for different $m$, and $P\bigl(A_m(x)\bigr) = \binom{n}{m}\, F_X(x)^m \bigl(1 - F_X(x)\bigr)^{n-m}$. The result now follows from
\[
P\bigl(X_{(k)} \le x\bigr) \equiv P\Bigl(\,\bigcup_{m \ge k} A_m(x)\Bigr) = \sum_{m \ge k} P\bigl(A_m(x)\bigr)\,. \qquad\square
\]
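The binomial-count identity (4.5) is easy to test numerically; the sketch below (illustrative values of $n$, $k$, $x$, chosen here for the example) compares the empirical cdf of $X_{(k)}$ for a $\mathcal{U}(0,1)$ sample, where $F_X(x) = x$, against the binomial tail sum:

```python
import numpy as np
from math import comb

rng = np.random.default_rng(1)
n, k, x = 10, 3, 0.25    # sample size, order index, evaluation point (arbitrary)

# Empirical P(X_(k) <= x): sort each row and look at the k-th smallest entry.
samples = np.sort(rng.uniform(size=(200_000, n)), axis=1)
emp = (samples[:, k - 1] <= x).mean()

# Right-hand side of (4.5): the Bin(n, F_X(x)) tail sum, with F_X(x) = x.
F = x
exact = sum(comb(n, m) * F**m * (1 - F) ** (n - m) for m in range(k, n + 1))

print(f"empirical: {emp:.4f}, binomial tail (4.5): {exact:.4f}")
```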
Remark 4.2.1 One can show that
\[
P\bigl(X_{(k)} \le x\bigr) = \frac{n!}{(k-1)!\,(n-k)!} \int_0^{F(x)} y^{k-1} (1-y)^{n-k}\, dy\,. \tag{4.6}
\]
We shall derive it below using a different approach.
Exercise 4.2 (**). By using induction or otherwise, prove (4.6).
Corollary 4.3 If X has density f(x), then
\[
f_{X_{(k)}}(x) = \frac{n!}{(k-1)!\,(n-k)!}\, F_X(x)^{k-1} \bigl(1 - F_X(x)\bigr)^{n-k} f(x)\,. \tag{4.7}
\]
Exercise 4.3 (*). Derive (4.7) by differentiating (4.6).
Remark 4.3.1 If $X$ has continuous cdf $F(\cdot)$, then $Y \stackrel{\text{def}}{=} F(X) \sim \mathcal{U}(0,1)$, see Exercise 4.4. Consequently, the diagram
\[
\begin{array}{ccc}
(X_1, \ldots, X_n) & \longrightarrow & (X_{(1)}, \ldots, X_{(n)})\\
\downarrow & & \downarrow\\
(Y_1, \ldots, Y_n) & \longrightarrow & (Y_{(1)}, \ldots, Y_{(n)})
\end{array}
\]
is commutative, with $Y_{(k)} \equiv F\bigl(X_{(k)}\bigr)$. Notice that if $F(\cdot)$ is strictly increasing on its support, the maps corresponding to $\downarrow$ are one-to-one, ie., one can invert them by writing $X_k = F^{-1}(Y_k)$ and $X_{(k)} = F^{-1}(Y_{(k)})$. We use this fact to give an alternative proof of Corollary 4.3.
Exercise 4.4 (*). If $X$ has continuous cdf $F(\cdot)$, show that $Y \stackrel{\text{def}}{=} F(X) \sim \mathcal{U}(0,1)$.

Proof of Corollary 4.3. Let $(Y_j)_{j=1}^n$ be an $n$-sample from $\mathcal{U}(0,1)$, and let $Y_{(k)}$ be the $k$th order variable. Fix $y \in (0,1)$ and $h > 0$ such that $y + h < 1$, and consider the following events: $C_k^{(y,h)} \stackrel{\text{def}}{=} \bigl\{Y_{(k)} \in (y, y+h]\bigr\}$, $B_1 \stackrel{\text{def}}{=} \bigl\{$exactly one $Y_j$ in $(y, y+h]\bigr\}$, and $B_2 \stackrel{\text{def}}{=} \bigl\{$at least two $Y_j$ in $(y, y+h]\bigr\}$; clearly, $P\bigl(C_k^{(y,h)}\bigr) = P\bigl(C_k^{(y,h)} \cap B_1\bigr) + P\bigl(C_k^{(y,h)} \cap B_2\bigr)$. We obviously have
\[
P\bigl(C_k^{(y,h)} \cap B_2\bigr) \le P(B_2) \le \sum_{m \ge 2} \binom{n}{m} h^m = O(h^2) \quad \text{as } h \to 0\,.
\]
On the other hand, on the event $C_k^{(y,h)} \cap B_1$, there are exactly $k-1$ of the $Y_j$'s in $(0, y]$, exactly one $Y_j$ in $(y, y+h]$, and exactly $n-k$ of the $Y_j$'s in $(y+h, 1)$. By independence, this implies that\footnote{recall multinomial distributions!}
\[
P\bigl(C_k^{(y,h)} \cap B_1\bigr) = \frac{n!}{(k-1)!\;1!\;(n-k)!}\; y^{k-1} \cdot h \cdot (1 - y - h)^{n-k}\,,
\]
thus implying (4.7) for the uniform distribution. The general case now follows as indicated in Remark 4.3.1. $\square$

Remark 4.3.2 In what follows, we will often provide proofs for the $\mathcal{U}(0,1)$ case, and refer to Remark 4.3.1 for the appropriate generalizations. Alternatively, an argument similar to the proof above works directly for the $X_{(k)}$ variables, see, eg., [1, Theorem 4.1.2].
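Remark 4.3.1 can also be seen numerically: since $F^{-1}$ is increasing, it commutes with sorting, so the order statistics of a sample can be produced either directly or through uniform order statistics. A minimal numpy sketch (the choice of $\mathrm{Exp}(1)$, with $F^{-1}(y) = -\log(1-y)$, is just an example):

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 8, 5

# Order statistics of an Exp(1) sample obtained two ways:
#  (a) sort the Exp(1) sample directly;
#  (b) sort the underlying U(0,1) sample, then map through F^{-1}(y) = -log(1-y).
u = rng.uniform(size=(100_000, n))
direct = np.sort(-np.log(1 - u), axis=1)[:, k - 1]      # (a) X_(k)
via_pit = -np.log(1 - np.sort(u, axis=1)[:, k - 1])     # (b) F^{-1}(Y_(k))

# Identical sample by sample: F^{-1} is increasing, hence commutes with sorting.
print(np.allclose(direct, via_pit))    # True
```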
4.3 Distribution of the range
For an $n$-sample $\{X_1, \ldots, X_n\}$ with continuous density, let $(X_{(1)}, \ldots, X_{(n)})$ be the corresponding order statistic. Our aim here is to describe the distribution of the range $R_n$,\footnote{it characterises the spread of the sample.} defined via
\[
R_n \stackrel{\text{def}}{=} X_{(n)} - X_{(1)}\,.
\]
We start with the following simple observation.
Lemma 4.4 The joint density of $X_{(1)}$ and $X_{(n)}$ is given by
\[
f_{X_{(1)},X_{(n)}}(x,y) =
\begin{cases}
n(n-1)\bigl(F(y) - F(x)\bigr)^{n-2} f(x)\, f(y)\,, & x < y\,,\\[2pt]
0\,, & \text{else}\,.
\end{cases} \tag{4.8}
\]
Proof. Since $P(X_{(n)} \le y) = \bigl(F(y)\bigr)^n$, and for $x < y$ we have
\[
P\bigl(X_{(1)} > x\,,\ X_{(n)} \le y\bigr) = \prod_{k=1}^{n} P(x < X_k \le y) = \bigl(F(y) - F(x)\bigr)^n\,,
\]
it follows that
\[
P\bigl(X_{(1)} \le x\,,\ X_{(n)} \le y\bigr) = \bigl(F(y)\bigr)^n - P\bigl(X_{(1)} > x\,,\ X_{(n)} \le y\bigr)\,.
\]
Now, differentiate w.r.t. $x$ and $y$. $\square$
Exercise 4.5 (**). In the case of an $n$-sample from the $\mathcal{U}(0,1)$ distribution, derive (4.8) directly from combinatorics (cf. the second proof of Corollary 4.3), and then use the approach in Remark 4.3.1 to extend your result to the general case.
Theorem 4.5 The density of $R_n$ is given by
\[
f_{R_n}(r) = n(n-1) \int \bigl(F(z+r) - F(z)\bigr)^{n-2} f(z)\, f(z+r)\, dz\,. \tag{4.9}
\]

Exercise 4.6 (*). Prove the density formula (4.9) for $f_{R_n}(r)$.
Example 4.6 If $X \sim \mathcal{U}(0,1)$, the density (4.9) becomes
\[
f_{R_n}(r) = n(n-1) \int_0^{1-r} \bigl((z+r) - z\bigr)^{n-2}\, dz = n(n-1)\, r^{n-2} (1-r) \tag{4.10}
\]
for $0 < r < 1$ and $f_{R_n}(r) = 0$ otherwise, so that\footnote{in particular, $R_n \sim \beta(n-1, 2)$, where the beta distribution is defined in (4.11).} $E R_n = \frac{n-1}{n+1}$. Alternatively, observe that for $X \sim \mathcal{U}(0,1)$, the variables $X_{(1)}$ and $1 - X_{(n)}$ have the same distribution, with $E X_{(n)} = \int_0^1 y\, dF_{X_{(n)}}(y) = n \int_0^1 y^n\, dy = \frac{n}{n+1}$.

Solution. See Exercise 4.7.
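A numerical check of $E R_n = \frac{n-1}{n+1}$ (a numpy sketch; the values of $n$ are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)

for n in (2, 5, 20):
    x = rng.uniform(size=(200_000, n))
    r = x.max(axis=1) - x.min(axis=1)      # sample ranges R_n
    print(n, f"mean range: {r.mean():.4f}", f"(n-1)/(n+1) = {(n-1)/(n+1):.4f}")
```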
Remark 4.6.1 Recall that the beta distribution $\beta(k,m)$ has density
\[
f(x) = \frac{\Gamma(k+m)}{\Gamma(k)\,\Gamma(m)}\, x^{k-1} (1-x)^{m-1} \equiv \frac{(k+m-1)!}{(k-1)!\,(m-1)!}\, x^{k-1} (1-x)^{m-1}\,, \tag{4.11}
\]
where $0 < x < 1$, and $f(x) = 0$ otherwise.

Exercise 4.7 (*). Let $X \sim \beta(k,m)$, ie., $X$ has beta distribution with parameters $k$ and $m$. Show that $EX = \frac{k}{k+m}$ and $\mathrm{Var}\,X = \frac{km}{(k+m)^2 (k+m+1)}$.
Example 4.7 Let $X_1, X_2$ be a sample from $\mathcal{U}(0,1)$, and let $X_{(1)}, X_{(2)}$ be the corresponding order statistics. Find the pdf for each of the random variables: $X_{(1)}$, $R_2 = X_{(2)} - X_{(1)}$, and $1 - X_{(2)}$.

Solution. By (4.10), $f_{R_2}(x) = 2(1-x)$ for $0 < x < 1$ and $f_{R_2}(x) = 0$ otherwise. On the other hand, by (4.4), $f_{X_{(1)}}(x)$ is given by the same expression and, by symmetry (or again by (4.4)), $1 - X_{(2)}$ has the same density as well.

Example 4.8 Let $X_1, X_2, X_3$ be a sample from $\mathcal{U}(0,1)$, and let $X_{(1)}, X_{(2)}, X_{(3)}$ be the corresponding order statistics. Find the pdf for each of the random variables: $X_{(2)}$, $R_3 = X_{(3)} - X_{(1)}$, and $1 - X_{(2)}$.

Solution. By (4.10), $f_{R_3}(x) = 6x(1-x)$ for $0 < x < 1$ and $f_{R_3}(x) = 0$ otherwise. On the other hand, by (4.7), $f_{X_{(2)}}(x)$ is given by the same expression and, by symmetry, $1 - X_{(2)}$ has the same density as well. Can you explain these findings?

4.4 Gaps distribution

There are many interesting (and important for applications) questions about order statistics. For example, given an $n$-sample from $\mathcal{U}(0,1)$ with the order statistic
\[
0 \equiv X_{(0)} \le X_{(1)} < X_{(2)} < \cdots < X_{(n)} \le X_{(n+1)} \equiv 1\,,
\]
one is often interested in the distribution of the extreme order variables $X_{(1)}$ and $X_{(n)}$, as well as in the distribution of the $k$th gap
\[
\Delta_{(k)}X \stackrel{\text{def}}{=} X_{(k)} - X_{(k-1)}\,,
\]
of the maximal gap $\max_k \Delta_{(k)}X = \max_k \bigl(X_{(k)} - X_{(k-1)}\bigr)$, and of the minimal gap $\min_k \Delta_{(k)}X = \min_k \bigl(X_{(k)} - X_{(k-1)}\bigr)$ (where $k$ ranges from $1$ to $n+1$), among others.

Example 4.9 Given an $n$-sample from $\mathcal{U}(0,1)$, we have, for every $a \ge 0$,
\[
P\bigl(n X_{(1)} > a\bigr) = P\bigl(X_{(1)} > a/n\bigr) = \bigl(P(X > a/n)\bigr)^n = (1 - a/n)^n \approx e^{-a}\,,
\]
in other words, the distribution of $n X_{(1)}$ for $n$ large enough is approximately $\mathrm{Exp}(1)$, ie., the exponential distribution with parameter one.\footnote{recall that $X \sim \mathrm{Exp}(\lambda)$ if for every $x \ge 0$ we have $P(X > x) = e^{-\lambda x}$.}
Remark 4.9.1 Notice that the previous example describes the distribution of the first gap, $X_{(1)} - X_{(0)} \equiv X_{(1)}$, namely, $P(X_{(1)} > r) = (1-r)^n$. By symmetry, the last gap $X_{(n+1)} - X_{(n)}$ has the same distribution.
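The $\mathrm{Exp}(1)$ approximation of Example 4.9 kicks in quickly; a simulation sketch (the choice $n = 200$ and the test points are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
n, N = 200, 50_000

# rescaled minimum n * X_(1) for N independent U(0,1) n-samples
y = n * rng.uniform(size=(N, n)).min(axis=1)

for a in (0.5, 1.0, 2.0):
    print(a, f"P(n X_(1) > a) ~ {(y > a).mean():.4f}",
             f"exp(-a) = {np.exp(-a):.4f}")
```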
Exercise 4.8 (**). Let $X_{(1)}$ be the first order variable from an $n$-sample with density $f(\cdot)$, which is positive and continuous on $[0,1]$, and vanishes otherwise. Let, further, $f(0) = c > 0$. For fixed $y > 0$, show that $P\bigl(X_{(1)} > \frac{y}{n}\bigr) \approx e^{-cy}$ for large $n$. Deduce that the distribution of $Y_n \equiv n X_{(1)}$ is approximately $\mathrm{Exp}(c)$ for large enough $n$.
Exercise 4.9 (**). Let $X_1, \ldots, X_n$ be independent positive random variables whose common probability density function $f(\cdot)$ is right-continuous at the origin and satisfies $f(0) = c > 0$. For fixed $y > 0$, show that
\[
P\Bigl(X_{(1)} > \frac{y}{n}\Bigr) \approx e^{-cy}
\]
for large $n$. Deduce that the distribution of $Y_n \stackrel{\text{def}}{=} n X_{(1)}$ is approximately $\mathrm{Exp}(c)$ for large enough $n$.
Lemma 4.10 For a given $n$-sample from $\mathcal{U}(0,1)$, all gaps
\[
\Delta_{(k)}X \stackrel{\text{def}}{=} X_{(k)} - X_{(k-1)}\,, \qquad k = 1, \ldots, n+1\,,
\]
have the same distribution, with $P(\Delta_{(k)}X > r) = (1-r)^n$ for all $r \in [0,1]$.

Proof. As in the second proof of Corollary 4.3, we deduce that
\[
f_{X_{(k-1)},X_{(k)}}(x,y) = \frac{n!}{(k-2)!\,(n-k)!}\; x^{k-2} (1-y)^{n-k}\,;
\]
this implies that
\[
f_{\Delta_{(k)}X}(r) = \int_0^{1-r} f_{X_{(k-1)},X_{(k)}}(x, x+r)\, dx\,.
\]
Changing the variables $x \mapsto y$ using $x = y(1-r)$ and comparing the result to the beta distribution (4.11), we deduce that $f_{\Delta_{(k)}X}(r) = n(1-r)^{n-1}\, 1_{0<r<1}$. $\square$

Remark 4.10.2 By the previous example, for every $k$ and $n$ large, the distribution of $n\,\Delta_{(k)}X$ is approximately $\mathrm{Exp}(1)$.\footnote{more precisely, for every sequence $(k_n)_{n \ge 1}$ with $k_n \in \{1, 2, \ldots, n\}$, the distribution of the rescaled gap $n\,\Delta_{(k_n)}X$ converges to $\mathrm{Exp}(1)$ as $n \to \infty$.}

Remark 4.10.3 As we will see below, for every fixed $n$, the joint distribution of the vector of gaps $\bigl(\Delta_{(1)}X, \ldots, \Delta_{(n+1)}X\bigr)$ is symmetric with respect to permutations of individual segments,\footnote{ie., the gaps $\Delta_{(1)}X, \ldots, \Delta_{(n+1)}X$ are exchangeable random variables.} however, the individual gaps are not independent.

Lemma 4.11 Let an $n$-sample from $\mathcal{U}(0,1)$ be fixed. Then for all $m$, $k$ satisfying $1 \le m < k \le n+1$ we have
\[
P\bigl(\Delta_{(m)}X \ge r_m\,,\ \Delta_{(k)}X \ge r_k\bigr) = (1 - r_m - r_k)^n\,, \tag{4.12}
\]
for all positive $r_m$, $r_k$ satisfying $r_m + r_k \le 1$.

Proof. We fix the value of $X_{(m)}$, and use the formula of total probability with integration over the value of $X_{(m)} \in (r_m, 1 - r_k)$. Notice that given $X_{(m)} = x$, the distributions of $(X_{(1)}, \ldots, X_{(m-1)})$ and $(X_{(m+1)}, \ldots, X_{(n)})$ are conditionally independent. Using the scaled version of Lemma 4.10 we deduce that\footnote{with $Y$ denoting a sample from $\mathcal{U}(0,1)$.}
\[
P_n\bigl(\Delta_{(m)}X \ge r_m \mid X_{(m)} = x\bigr) \equiv P_{m-1}\Bigl(\Delta_{(m)}Y \ge \frac{r_m}{x}\Bigr) = \Bigl(1 - \frac{r_m}{x}\Bigr)^{m-1}\,,
\]
\[
P_n\bigl(\Delta_{(k)}X \ge r_k \mid X_{(m)} = x\bigr) \equiv P_{n-m}\Bigl(\Delta_{(k-m)}Y \ge \frac{r_k}{1-x}\Bigr) = \Bigl(1 - \frac{r_k}{1-x}\Bigr)^{n-m}\,.
\]
By the formula of total probability,
\[
P\bigl(\Delta_{(m)}X \ge r_m\,,\ \Delta_{(k)}X \ge r_k\bigr) = \int_{r_m}^{1-r_k} \Bigl(1 - \frac{r_m}{x}\Bigr)^{m-1} \Bigl(1 - \frac{r_k}{1-x}\Bigr)^{n-m} f_{X_{(m)}}(x)\, dx\,,
\]
so that, using the explicit expression (4.7) for the density of $X_{(m)}$ and changing the variables via $x = r_m + (1 - r_m - r_k)y$, we get
\[
(1 - r_m - r_k)^n\; \frac{n!}{(m-1)!\,(n-m)!} \int_0^1 y^{m-1} (1-y)^{n-m}\, dy \equiv (1 - r_m - r_k)^n\,,
\]
where the last equality follows directly from (4.11). $\square$

Remark 4.11.1 Notice that by (4.12), the gaps $\Delta_{(m)}X$ and $\Delta_{(k)}X$ are not independent. However, for $n$ large enough, we have
\[
P\Bigl(\Delta_{(m)}X \ge \frac{r_m}{n}\,,\ \Delta_{(k)}X \ge \frac{r_k}{n}\Bigr) \approx e^{-r_m - r_k}
= e^{-r_m}\, e^{-r_k} \approx P\Bigl(\Delta_{(m)}X \ge \frac{r_m}{n}\Bigr)\, P\Bigl(\Delta_{(k)}X \ge \frac{r_k}{n}\Bigr)\,,
\]
ie., the rescaled gaps $n\,\Delta_{(m)}X$ and $n\,\Delta_{(k)}X$ are asymptotically independent (and asymptotically $\mathrm{Exp}(1)$-distributed).

Exercise 4.10 (**). Let $X_1, X_2, X_3, X_4$ be a sample from $\mathcal{U}(0,1)$, and let $X_{(1)}, X_{(2)}, X_{(3)}, X_{(4)}$ be the corresponding order statistics. Find the pdf for each of the random variables: $X_{(2)}$, $X_{(3)} - X_{(1)}$, $X_{(4)} - X_{(2)}$, and $1 - X_{(3)}$.

Corollary 4.12 Let $(X_k)_{k=1}^n$ be a fixed $n$-sample from $\mathcal{U}(0,1)$. Then for all positive $r_k$ satisfying $\sum_{k=1}^{n+1} r_k \le 1$ we have
\[
P\bigl(\Delta_{(1)}X \ge r_1\,, \ldots, \Delta_{(n+1)}X \ge r_{n+1}\bigr) = \Bigl(1 - \sum_{k=1}^{n+1} r_k\Bigr)^{n}\,. \tag{4.13}
\]
In particular, the distribution of the vector of gaps $\bigl(\Delta_{(1)}X, \ldots, \Delta_{(n+1)}X\bigr)$ is exchangeable for every $n \ge 1$.

Remark 4.12.1 Similarly to Remark 4.11.1, one can use (4.13) to deduce that any finite collection of rescaled gaps, say $\bigl(n\,\Delta_{(k_1)}X, \ldots, n\,\Delta_{(k_m)}X\bigr)$, becomes asymptotically independent (with individual components $\mathrm{Exp}(1)$-distributed) in the limit of large $n$.\footnote{with a little bit of extra work, one can also allow $m = m_n \to \infty$ slowly enough as $n \to \infty$.}

Exercise 4.11 (***). Prove the asymptotic independence property of any finite collection of gaps stated in Remark 4.12.1.

Exercise 4.12 (***). Using induction or otherwise, prove (4.13).
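Both Lemma 4.10 and the exchangeability of the gaps are easy to check by simulation; in the following numpy sketch ($n = 5$ and $r = 0.2$ are arbitrary choices) all $n+1$ empirical tail frequencies agree with $(1-r)^n$:

```python
import numpy as np

rng = np.random.default_rng(5)
n, r = 5, 0.2

u = np.sort(rng.uniform(size=(300_000, n)), axis=1)
# Append the boundary points X_(0) = 0 and X_(n+1) = 1, then take all n+1 gaps.
padded = np.concatenate([np.zeros((len(u), 1)), u, np.ones((len(u), 1))], axis=1)
gaps = np.diff(padded, axis=1)

print("P(gap_k > r), k = 1..n+1:", np.round((gaps > r).mean(axis=0), 4))
print("predicted (1-r)^n       :", round((1 - r) ** n, 4))
```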
Example 4.13 For an $n$-sample from $\mathcal{U}(0,1)$, let $D_n \stackrel{\text{def}}{=} \min_k \Delta_{(k)}X$. Then (4.13) implies that $P(D_n \ge r) = \bigl(1 - (n+1)r\bigr)^n$. In particular, for every $y \ge 0$,
\[
P\bigl(n^2 D_n > y\bigr) = \Bigl(1 - \frac{n+1}{n^2}\, y\Bigr)^{n} \approx e^{-y}\,, \tag{4.14}
\]
ie., for large $n$ the distribution of $Y_n \stackrel{\text{def}}{=} n^2 D_n$ is approximately $\mathrm{Exp}(1)$.

Remark 4.13.1 For an $n$-sample from $\mathcal{U}(0,1)$, a typical gap is of order $n^{-1}$ (recall Remark 4.10.2), whereas by (4.14) the minimal gap $D_n$ is of order $n^{-2}$.

4.5 Order statistics from exponential distributions

If $X \sim \mathrm{Exp}(\lambda)$, ie., $P(X > x) = e^{-\lambda x}$ for all $x \ge 0$, then $Y \stackrel{\text{def}}{=} \lambda X \sim \mathrm{Exp}(1)$. It is thus enough to consider $\mathrm{Exp}(1)$ random variables.

Example 4.14 If $(X_k)_{k=1}^n$ is an $n$-sample from $\mathrm{Exp}(1)$, then $X_{(1)} \sim \mathrm{Exp}(n)$.

Solution. By independence, for every $y \ge 0$, $P(X_{(1)} > y) = \prod_{k=1}^n P(X_k > y) = e^{-ny}$.

Exercise 4.13 (*). Let $X_k \sim \mathrm{Exp}(\lambda_k)$, $k = 1, \ldots, n$, be independent with fixed $\lambda_k > 0$. Denote $X_0 = \min\{X_1, \ldots, X_n\}$ and $\lambda_0 = \sum_{k=1}^n \lambda_k$. Show that for $y \ge 0$
\[
P\bigl(X_0 > y\,,\ X_0 = X_k\bigr) = e^{-\lambda_0 y}\, \frac{\lambda_k}{\lambda_0}\,,
\]
ie., the minimum $X_0$ of the sample satisfies $X_0 \sim \mathrm{Exp}(\lambda_0)$, and the probability that it coincides with $X_k$ is proportional to $\lambda_k$, independently of the value of $X_0$.

Recall the following memoryless property of exponential distributions:

Exercise 4.14 (*). If $X \sim \mathrm{Exp}(\lambda)$, show that for all positive $a$ and $b$ we have
\[
P\bigl(X > a + b \mid X > a\bigr) = P(X > b)\,.
\]

Lemma 4.15 If $(X_k)_{k=1}^n$ are independent with $X_k \sim \mathrm{Exp}(\lambda_k)$, then for arbitrary fixed positive $a$ and $b_1, \ldots, b_n$,
\[
P\Bigl(\,\bigcap_{k=1}^n \{X_k > a + b_k\} \;\Big|\; \min_k(X_k) > a\Bigr) = P\Bigl(\,\bigcap_{k=1}^n \{X_k > b_k\}\Bigr)\,. \tag{4.15}
\]
In addition,
\[
P\Bigl(\,\bigcap_{k=1}^n \{a < X_k \le a + b_k\}\Bigr) = P\bigl(\min_k(X_k) > a\bigr)\, P\Bigl(\,\bigcap_{k=1}^n \{X_k \le b_k\}\Bigr)\,. \tag{4.16}
\]
Proof. The results follow from independence and the fact $P(X_k > c) = e^{-\lambda_k c}$. $\square$

Remark 4.15.1 By the uniformity in $a \ge 0$ of the memoryless property (4.15), the latter also holds for random thresholds; eg., if $Y \ge 0$ is a random variable with density $f_Y(y)$ independent of $\{X_1, \ldots, X_n\}$, then conditioning on the value of $Y$ and using the formula of total probability, we get
\[
P\Bigl(\,\bigcap_{k=1}^n \{X_k > Y + b_k\} \;\Big|\; \min_k(X_k) > Y\Bigr)
= \int f_Y(a)\, P\Bigl(\,\bigcap_{k=1}^n \{X_k > a + b_k\} \;\Big|\; \min_k(X_k) > a\Bigr)\, da
= P\Bigl(\,\bigcap_{k=1}^n \{X_k > b_k\}\Bigr)\,.
\]

Exercise 4.15 (**). If $X \sim \mathrm{Exp}(\lambda)$, $Y \sim \mathrm{Exp}(\mu)$ and $Z \sim \mathrm{Exp}(\nu)$ are independent, show that for every constant $a \ge 0$ we have $P\bigl(a + X < Y \mid a < Y\bigr) = \frac{\lambda}{\lambda + \mu}$; deduce that $P\bigl(a + X < \min(Y,Z) \mid a < \min(Y,Z)\bigr) = \frac{\lambda}{\lambda + \mu + \nu}$.

Corollary 4.16 Let $(X_k)_{k=1}^n$ be an $n$-sample from $\mathrm{Exp}(1)$. Denote
\[
Z_n = X_1 + \tfrac{1}{2} X_2 + \cdots + \tfrac{1}{n} X_n\,.
\]
Then $X_{(n)}$ and $Z_n$ have the same distribution.\footnote{One can also show that $X_{(k)}$ has the same distribution as $\frac{1}{n+1-k} V_{n+1-k} + \cdots + \frac{1}{n} V_n$ with jointly independent $V_k \sim \mathrm{Exp}(1)$.}

Proof. Write $Y_m$ for the maximum $X_{(m)}$ of an $m$-sample $(X_k)_{k=1}^m$ from $\mathrm{Exp}(1)$. We obviously have $P(Z_1 \le a) = 1 - e^{-a}$ and $P(Y_m \le a) = (1 - e^{-a})^m$ for all $m \ge 1$. Since $Z_m$ has the same distribution as the (independent) sum $Z_{m-1} + U_m$ with $U_m \sim \mathrm{Exp}(m)$,
\[
P(Z_m \le a) = P(Z_{m-1} + U_m \le a) = \int_0^a P(Z_{m-1} \le a - x)\, m e^{-mx}\, dx\,,
\]
for all $m \ge 1$. Next, by symmetry,
\[
P(Y_m \le a) = m\, P\bigl(X_m < \min(X_1, \ldots, X_{m-1})\,,\ Y_m \le a\bigr)\,,
\]
so that, using (4.16), this becomes
\[
m \int_0^a f_{X_m}(x)\, P\Bigl(\,\bigcap_{k=1}^{m-1} \{x < X_k \le a\}\Bigr)\, dx
= \int_0^a m e^{-mx} \bigl(1 - e^{-(a-x)}\bigr)^{m-1}\, dx\,.
\]
The result now follows by straightforward induction. $\square$

Exercise 4.16 (**). Carefully prove Corollary 4.16 and compute $EY_n$ and $\mathrm{Var}\,Y_n$.

Exercise 4.17 (**). Let $X_{(n)}$ be the maximum of an $n$-sample from the $\mathrm{Exp}(1)$ distribution. For $x \in \mathbb{R}$, find the value of $P\bigl(X_{(n)} \le \log n + x\bigr)$ in the limit $n \to \infty$.
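Corollary 4.16 lends itself to a quick numerical test: the empirical quantiles of the maximum $X_{(n)}$ and of the weighted sum $Z_n$ should agree. A numpy sketch (the choice $n = 10$ and the quantile levels are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(6)
n, N = 10, 200_000

max_n = rng.exponential(size=(N, n)).max(axis=1)   # X_(n) of an Exp(1) n-sample

# Z_n = X_1 + X_2/2 + ... + X_n/n from an independent Exp(1) n-sample
z_n = (rng.exponential(size=(N, n)) / np.arange(1, n + 1)).sum(axis=1)

# equality in distribution: a few empirical quantiles should agree
qs = (0.1, 0.5, 0.9)
print("X_(n):", np.round(np.quantile(max_n, qs), 3))
print("Z_n  :", np.round(np.quantile(z_n, qs), 3))
```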
4.6 Additional problems

Exercise 4.18 (*). Let $X_1, X_2$ be a sample from a uniform distribution on $\{1, 2, 3, 4, 5\}$. Find the distribution of $X_{(1)}$, the minimum of the sample.

Exercise 4.19 (*). Let independent variables $X_1, \ldots, X_n$ be $\mathrm{Exp}(1)$ distributed. Show that for every $x > 0$, we have $P(X_{(1)} \le x) \to 1$ and $P(X_{(n)} \ge x) \to 1$ as $n \to \infty$. Generalise the result to arbitrary distributions on $\mathbb{R}$.

Exercise 4.20 (*). Let $X_1, X_2, X_3, X_4$ be a sample from a distribution with density $f(x) = e^{7-x}\, 1_{x > 7}$. Find the pdf of the second order variable $X_{(2)}$.

Exercise 4.21 (*). Let $X_1$ and $X_2$ be independent $\mathrm{Exp}(\lambda)$ random variables.
a) Show that $X_{(1)}$ and $X_{(2)} - X_{(1)}$ are independent and find their distributions.
b) Compute $E\bigl(X_{(2)} \mid X_{(1)} = x_1\bigr)$ and $E\bigl(X_{(1)} \mid X_{(2)} = x_2\bigr)$.

Exercise 4.22 (**). Let $(X_k)_{k=1}^n$ be an $n$-sample from the $\mathrm{Exp}(\lambda)$ distribution.
a) Show that the gaps $\bigl(\Delta_{(k)}X\bigr)_{k=1}^n$ as defined in Lemma 4.10 are independent and find their distribution.
b) For fixed $1 \le k \le m \le n$, compute the expectation $E\bigl(X_{(m)} \mid X_{(k)} = x_k\bigr)$.

Exercise 4.23 (**). If $Y \sim \mathrm{Exp}(\mu)$ and an arbitrary random variable $X \ge 0$ are independent, show that for every $a > 0$, $P\bigl(a + X < Y \mid a < Y\bigr) = E\, e^{-\mu X}$.

4.6.1 Optional material

Exercise 4.24 (**). In the context of Exercise 4.18, let $\{X_1, X_2\}$ be a sample without replacement from $\{1, 2, 3, 4, 5\}$. Find the distribution of $X_{(1)}$, the minimum of the sample.

Exercise 4.25 (*). Let $\{X_1, X_2\}$ be an independent sample from the $\mathrm{Geom}(p)$ distribution, $P(X > k) = (1-p)^k$ for integer $k \ge 0$. Find the distribution of $X_{(1)}$, the minimum of the sample.

Exercise 4.26 (**). Let $\{X_1, X_2, \ldots, X_n\}$ be an $n$-sample from a distribution with density $f(\cdot)$. Show that the joint density of the order variables $X_{(1)}, \ldots, X_{(n)}$ is
\[
f_{X_{(1)},\ldots,X_{(n)}}(x_1, \ldots, x_n) = n!\, \prod_{k=1}^n f(x_k)\; 1_{x_1 < x_2 < \cdots < x_n}\,.
\]

Exercise 4.27 (**). Let $\{X_1, X_2, X_3\}$ be a sample from $\mathcal{U}(0,1)$. Find the conditional density $f_{X_{(1)},X_{(3)} \mid X_{(2)}}(x, z \mid y)$ of $X_{(1)}$ and $X_{(3)}$ given that $X_{(2)} = y$. Explain your findings.

Exercise 4.28 (**). Let $\{X_1, X_2, \ldots, X_{100}\}$ be a sample from $\mathcal{U}(0,1)$. Approximate the value of $P\bigl(X_{(75)} \le 0.8\bigr)$.

Exercise 4.29 (**). Let $X_1, X_2, \ldots$ be independent random variables with cdf $F(\cdot)$, and let $N > 0$ be an integer-valued variable with probability generating function $g(\cdot)$, independent of the sequence $(X_k)_{k \ge 1}$. Find the cdf of $\max\{X_1, X_2, \ldots, X_N\}$, the maximum of the first $N$ terms in that sequence.

Exercise 4.30 (***). Let $X_{(1)}$ be the first order variable from an $n$-sample with density $f(\cdot)$, which is positive and continuous on $[0,1]$, and vanishes otherwise. Let, further, $f(x) \approx c x^{\alpha}$ for small $x > 0$ and positive $c$ and $\alpha$. For $y > 0$ and $\beta = \frac{1}{\alpha+1}$, show that the probability $P\bigl(X_{(1)} > y\, n^{-\beta}\bigr)$ has a well defined limit for large $n$. What can you deduce about the distribution of the rescaled variable $Y_n \stackrel{\text{def}}{=} n^{\beta} X_{(1)}$ for large enough $n$?

Exercise 4.31 (***). Let $X_1, \ldots, X_n$ be independent $\beta(k,m)$-distributed random variables whose common distribution is given in (4.11) (with $k \ge 1$ and $m \ge 1$). Find $\delta > 0$ such that the distribution of the rescaled variable $Y_n \stackrel{\text{def}}{=} n^{\delta} X_{(1)}$ converges to a well-defined limit as $n \to \infty$. Describe the limiting distribution.

Exercise 4.32 (****). In the situation of Exercise 4.30, let $\alpha < 0$. What can you say about the possible limiting distribution of the suitably rescaled first order variable, $Y_n = n^{\delta} X_{(1)}$, with some $\delta \in \mathbb{R}$?

Exercise 4.33 (****). Denote $Y_n^* = Y_n - \log n$; show that the corresponding cdf, $P(Y_n^* \le x)$, approaches $e^{-e^{-x}}$ as $n \to \infty$.
Deduce that the expectation of the limiting distribution equals $\gamma$, the Euler constant,\footnote{the Euler constant is $\gamma = \lim_{n\to\infty}\bigl(\sum_{k=1}^n \frac{1}{k} - \log n\bigr) \approx 0.5772156649\ldots$} and its variance is $\sum_{k \ge 1} \frac{1}{k^2} = \frac{\pi^2}{6}$.

A distribution with cdf $\exp\bigl(-e^{-(x-\mu)/\beta}\bigr)$ is known as the Gumbel distribution (with scale and location parameters $\beta$ and $\mu$, resp.). It can be shown that its average is $\mu + \beta\gamma$, its variance is $\pi^2 \beta^2 / 6$, and its moment generating function equals $\Gamma(1 - \beta t)\, e^{\mu t}$.

Exercise 4.34 (****). By using the Weierstrass formula,
\[
\prod_{k=1}^{\infty} \Bigl(1 + \frac{z}{k}\Bigr)^{-1} e^{z/k} = e^{\gamma z}\, \Gamma(z+1)
\]
(where $\gamma$ is the Euler constant and $\Gamma(\cdot)$ is the classical gamma function, $\Gamma(n) = (n-1)!$ for integer $n > 0$) or otherwise, show that the moment generating function $E\, e^{t Z_n^*}$ of $Z_n^* = Z_n - E Z_n$ approaches $e^{-\gamma t}\, \Gamma(1-t)$ as $n \to \infty$ (eg., for all $|t| < 1$). Deduce that in that limit $Z_n^* + \gamma$ is asymptotically Gumbel distributed (with $\beta = 1$ and $\mu = 0$).

Exercise 4.35 (***). Let $X_1, X_2, \ldots$ be independent $\mathrm{Exp}(\lambda)$ random variables; further, let $N \sim \mathrm{Poi}(\nu)$, independent of the sequence $(X_k)_{k \ge 1}$, and let $X_0 \equiv 0$. Find the distribution of $Y \stackrel{\text{def}}{=} \max\{X_0, X_1, X_2, \ldots, X_N\}$, the maximum of the first $N$ terms of this sequence, where for $N = 0$ we set $Y = 0$.

5 Coupling

Two random variables, say $X$ and $Y$, are coupled if they are defined on the same probability space. To couple two given variables $X$ and $Y$, one usually defines a random vector $(\widetilde X, \widetilde Y)$ with joint probability $\widetilde P(\cdot,\cdot)$ on some probability space\footnote{A priori the original variables $X$ and $Y$ can be defined on arbitrary probability spaces, so that one has no reason to expect that these spaces can be “joined” in any way!} so that the marginal distribution of $\widetilde X$ coincides with the distribution of $X$ and the marginal distribution of $\widetilde Y$ coincides with the distribution of $Y$.

Example 5.1 Fix $p_1, p_2 \in [0,1]$ such that $p_1 \le p_2$ and consider the following joint distributions (we write $q_i = 1 - p_i$):
\[
T_1: \quad
\begin{array}{c|cc|c}
 & \widetilde Y = 0 & \widetilde Y = 1 & \\
\hline
\widetilde X = 0 & q_1 q_2 & q_1 p_2 & q_1\\
\widetilde X = 1 & p_1 q_2 & p_1 p_2 & p_1\\
\hline
 & q_2 & p_2 &
\end{array}
\qquad\qquad
T_2: \quad
\begin{array}{c|cc|c}
 & \widetilde Y = 0 & \widetilde Y = 1 & \\
\hline
\widetilde X = 0 & q_2 & p_2 - p_1 & q_1\\
\widetilde X = 1 & 0 & p_1 & p_1\\
\hline
 & q_2 & p_2 &
\end{array}
\]
It is easy to see that in both cases $\widetilde X \sim \mathrm{Ber}(p_1)$, $\widetilde Y \sim \mathrm{Ber}(p_2)$, though in the first case $\widetilde X$ and $\widetilde Y$ are independent (and the joint distribution is known as an “independent coupling”), whereas in the second case we have $P(\widetilde X \le \widetilde Y) = 1$ (and the joint distribution is called a “monotone coupling”).

Exercise 5.1 (*). In the setting of Example 5.1 show that every convex linear combination of tables $T_1$ and $T_2$, ie., each table of the form $T_\alpha = \alpha T_1 + (1-\alpha) T_2$ with $\alpha \in [0,1]$, gives an example of a coupling of $X \sim \mathrm{Ber}(p_1)$ and $Y \sim \mathrm{Ber}(p_2)$. Can you find all possible couplings for these variables?

5.1 Stochastic order

If $X \sim \mathrm{Ber}(p)$, its tail probabilities $P(X > a)$ satisfy
\[
P(X > a) = \begin{cases} 1\,, & a < 0\,,\\ p\,, & 0 \le a < 1\,,\\ 0\,, & a \ge 1\,. \end{cases}
\]
Consequently, in the setup of Example 5.1, for the variables $X \sim \mathrm{Ber}(p_1)$ and $Y \sim \mathrm{Ber}(p_2)$ with $p_1 \le p_2$ we have $P(X > a) \le P(Y > a)$ for all $a \in \mathbb{R}$. The last inequality is useful enough to deserve a name:

Definition 5.2 [Stochastic domination] A random variable $X$ is stochastically smaller than a random variable $Y$ (write $X \preccurlyeq Y$) if the inequality
\[
P(X > x) \le P(Y > x) \tag{5.1}
\]
holds for all $x \in \mathbb{R}$.

Exercise 5.2 (*). Let $X \ge 0$ be a random variable, and let $a \ge 0$ be a fixed constant. If $Y = a + X$, is it true that $X \preccurlyeq Y$? If $Z = aX$, is it true that $X \preccurlyeq Z$? Justify your answer.

Exercise 5.3 (*). If $X \preccurlyeq Y$ and $g(\cdot)$ is an arbitrary increasing function on $\mathbb{R}$, show that $g(X) \preccurlyeq g(Y)$.
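The monotone coupling $T_2$ of Example 5.1 is exactly what one obtains by driving both indicators with one shared uniform variable; a numpy sketch (the values $p_1 = 0.3$, $p_2 = 0.7$ are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(7)
p1, p2 = 0.3, 0.7                   # p1 <= p2

xi = rng.uniform(size=1_000_000)    # one shared uniform per draw
x = (xi < p1).astype(int)           # X ~ Ber(p1)
y = (xi < p2).astype(int)           # Y ~ Ber(p2)

print("P(X=1) ~", x.mean(), " P(Y=1) ~", y.mean())
print("P(X <= Y) =", (x <= y).mean())      # exactly 1: a monotone coupling
print("P(X=0, Y=1) ~", ((x == 0) & (y == 1)).mean(), " p2 - p1 =", p2 - p1)
```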
Example 5.3 Let random variables $X \ge 0$ and $Y \ge 0$ be stochastically ordered, $X \preccurlyeq Y$. If $g(\cdot) \ge 0$ is a smooth increasing function on $\mathbb{R}$ with $g(0) = 0$, then $g(X) \preccurlyeq g(Y)$ and
\[
E g(X) \equiv \int_0^{\infty} g'(z)\, P(X > z)\, dz \le \int_0^{\infty} g'(z)\, P(Y > z)\, dz \equiv E g(Y)\,. \tag{5.2}
\]

Exercise 5.4 (*). Generalise the inequality in Example 5.3 to a broader class of functions $g(\cdot)$ and verify that if $X \preccurlyeq Y$, then $E(X^{2k+1}) \le E(Y^{2k+1})$, $E\, e^{tX} \le E\, e^{tY}$ with $t > 0$, $E\, s^X \le E\, s^Y$ with $s > 1$, etc.

Exercise 5.5 (*). Let $\xi \sim \mathcal{U}(0,1)$ be a standard uniform random variable. For fixed $p \in (0,1)$, define $X = 1_{\xi < p}$. Show that $X \sim \mathrm{Ber}(p)$.

Exercise 5.6 (**). In the setup of Example 5.1, show that $X \sim \mathrm{Ber}(p_1)$ is stochastically smaller than $Y \sim \mathrm{Ber}(p_2)$ (ie., $X \preccurlyeq Y$) iff $p_1 \le p_2$. Further, show that $X \preccurlyeq Y$ is equivalent to the existence of a coupling $(\widetilde X, \widetilde Y)$ of $X$ and $Y$ in which these variables are ordered with probability one, $P(\widetilde X \le \widetilde Y) = 1$.

The next result shows that this connection between stochastic order and existence of a monotone coupling is a rather generic feature:

Lemma 5.4 A random variable $X$ is stochastically smaller than a random variable $Y$ if and only if there exists a coupling $(\widetilde X, \widetilde Y)$ of $X$ and $Y$ such that $P(\widetilde X \le \widetilde Y) = 1$.

Remark 5.4.1 Notice that one claim of Lemma 5.4 is immediate from
\[
P(x < X) \equiv \widetilde P\bigl(x < \widetilde X\bigr) = \widetilde P\bigl(x < \widetilde X \le \widetilde Y\bigr) \le \widetilde P\bigl(x < \widetilde Y\bigr) \equiv P(x < Y)\,;
\]
the other claim requires a more advanced argument (we shall not do it here!).

Example 5.5 If $X \sim \mathrm{Bin}(m,p)$ and $Y \sim \mathrm{Bin}(n,p)$ with $m \le n$, then $X \preccurlyeq Y$.

Solution. Let $Z_1 \sim \mathrm{Bin}(m,p)$ and $Z_2 \sim \mathrm{Bin}(n-m,p)$ be independent variables defined on the same probability space. We then put $\widetilde X = Z_1$ and $\widetilde Y = Z_1 + Z_2$ so that $\widetilde Y - \widetilde X = Z_2 \ge 0$ with probability one, $P(\widetilde X \le \widetilde Y) = 1$, and $\widetilde X \sim X$, $\widetilde Y \sim Y$.

Example 5.6 If $X \sim \mathrm{Poi}(\lambda)$ and $Y \sim \mathrm{Poi}(\mu)$ with $\lambda \le \mu$, then $X \preccurlyeq Y$.

Solution. Let $Z_1 \sim \mathrm{Poi}(\lambda)$ and $Z_2 \sim \mathrm{Poi}(\mu - \lambda)$ be independent variables defined on the same probability space.\footnote{here and below we assume that $Z \sim \mathrm{Poi}(0)$ means that $P(Z = 0) = 1$.} We then put $\widetilde X = Z_1$ and $\widetilde Y = Z_1 + Z_2$ so that $\widetilde Y - \widetilde X = Z_2 \ge 0$ with probability one, $P(\widetilde X \le \widetilde Y) = 1$, and $\widetilde X \sim X$, $\widetilde Y \sim Y$.

Exercise 5.7 (**). Let $X \sim \mathcal{N}(\mu_X, \sigma_X^2)$ and $Y \sim \mathcal{N}(\mu_Y, \sigma_Y^2)$ be Gaussian r.v.'s.
a) If $\mu_X \le \mu_Y$ but $\sigma_X^2 = \sigma_Y^2$, is it true that $X \preccurlyeq Y$?
b) If $\mu_X = \mu_Y$ but $\sigma_X^2 \le \sigma_Y^2$, is it true that $X \preccurlyeq Y$?

Exercise 5.8 (**). Let $X \sim \mathrm{Exp}(\lambda)$ and $Y \sim \mathrm{Exp}(\mu)$ be two exponential random variables. If $0 < \lambda \le \mu < \infty$, are the variables $X$ and $Y$ stochastically ordered? Justify your answer by proving the result or giving a counter-example.

Exercise 5.9 (**). Let $X \sim \mathrm{Geom}(p)$ and $Y \sim \mathrm{Geom}(r)$ be two geometric random variables. If $0 < p \le r < 1$, are the variables $X$ and $Y$ stochastically ordered? Justify your answer by proving the result or giving a counter-example.

The gamma distribution $\Gamma(a, \lambda)$ has density $\frac{\lambda^a}{\Gamma(a)}\, x^{a-1} e^{-\lambda x}\, 1_{x>0}$ (so that $\Gamma(1, \lambda)$ is just $\mathrm{Exp}(\lambda)$). Gamma distributions have the following additive property: if $Z_1 \sim \Gamma(c_1, \lambda)$ and $Z_2 \sim \Gamma(c_2, \lambda)$ are independent random variables (with the same $\lambda$), then their sum is also gamma distributed: $Z_1 + Z_2 \sim \Gamma(c_1 + c_2, \lambda)$.

Exercise 5.10 (**). Let $X \sim \Gamma(a, \lambda)$ and $Y \sim \Gamma(b, \lambda)$ be two gamma random variables. If $0 < a \le b < \infty$, are the variables $X$ and $Y$ stochastically ordered? Justify your answer by proving the result or giving a counter-example.

5.2 Total variation distance

Definition 5.7 [Total variation distance] Let $\mu$ and $\nu$ be two probability measures on the same probability space. The total variation distance between $\mu$ and $\nu$ is
\[
d_{TV}(\mu, \nu) \stackrel{\text{def}}{=} \max_{A} \bigl|\mu(A) - \nu(A)\bigr|\,. \tag{5.3}
\]
If $X$, $Y$ are two discrete random variables, we write $d_{TV}(X,Y)$ for the total variation distance between their distributions.

Example 5.8 If $X \sim \mathrm{Ber}(p_1)$ and $Y \sim \mathrm{Ber}(p_2)$ we have
\[
d_{TV}(X,Y) = \max\bigl(|p_1 - p_2|, |q_1 - q_2|\bigr) = |p_1 - p_2| = \tfrac12\bigl(|p_1 - p_2| + |q_1 - q_2|\bigr)\,.
\]

Exercise 5.11 (**). Let measure $\mu$ have p.m.f. $\{p_x\}_{x \in \mathcal{X}}$ and let measure $\nu$ have p.m.f. $\{q_y\}_{y \in \mathcal{Y}}$. Show that
\[
d_{TV}(\mu, \nu) \equiv d_{TV}\bigl(\{p\}, \{q\}\bigr) = \tfrac12 \sum_{z} |p_z - q_z|\,,
\]
where the sum runs over all $z \in \mathcal{X} \cup \mathcal{Y}$. Deduce that $d_{TV}(\cdot,\cdot)$ is a distance between probability measures\footnote{so that all probability measures form a metric space for this distance!} (ie., it is non-negative, symmetric, and satisfies the triangle inequality) such that $d_{TV}(\mu, \nu) \le 1$ for all probability measures $\mu$ and $\nu$.

Exercise 5.12 (**). In the setting of Exercise 5.11 show that
\[
d_{TV}\bigl(\{p\}, \{q\}\bigr) = \sum_{z} \bigl(p_z - \min(p_z, q_z)\bigr) = \sum_{z} \bigl(q_z - \min(p_z, q_z)\bigr)\,. \tag{5.4}
\]
An important relation between coupling and the total variation distance is explained by the following fact.

Example 5.9 [Maximal coupling] Let random variables $X$ and $Y$ be such that $P(X = x) = p_x$, $x \in \mathcal{X}$, and $P(Y = y) = q_y$, $y \in \mathcal{Y}$, with $d_{TV}\bigl(\{p\}, \{q\}\bigr) > 0$. For each $z \in \mathcal{Z} = \mathcal{X} \cup \mathcal{Y}$, put $r_z = \min(p_z, q_z)$ and define the joint distribution $\widehat P(\cdot,\cdot)$ in $\mathcal{Z} \times \mathcal{Z}$ via
\[
\widehat P\bigl(\widetilde X = \widetilde Y = z\bigr) = r_z\,, \qquad
\widehat P\bigl(\widetilde X = x\,,\ \widetilde Y = y\bigr) = \frac{(p_x - r_x)(q_y - r_y)}{d_{TV}\bigl(\{p\}, \{q\}\bigr)}\,, \quad x \ne y\,.
\]
Then $\widehat P(\cdot,\cdot)$ is a coupling of $X$ and $Y$.

Solution. By using (5.4) we see that $\sum_z \widehat P(\widetilde X = x, \widetilde Y = z) = r_x + (p_x - r_x) = P(X = x)$ for all $x \in \mathcal{X}$, ie., the first marginal is as expected. Checking correctness of the $\widetilde Y$ marginal is similar.

Remark 5.9.1 By (5.4), we have $d_{TV}\bigl(\{p\}, \{q\}\bigr) = 0$ iff $p_z = q_z$ for all $z$. In this case it is natural to define the maximal coupling via $\widehat P(\widetilde X = x, \widetilde Y = y) = r_x\, 1_{x=y}$ for all $x \in \mathcal{X}$, $y \in \mathcal{Y}$. Notice that in this case the coupling is concentrated on the diagonal $x = y$ and all its off-diagonal terms vanish.

Example 5.10 Consider $X \sim \mathrm{Ber}(p_1)$ and $Y \sim \mathrm{Ber}(p_2)$ with $p_1 \le p_2$. It is a straightforward exercise to check that the second table in Example 5.1 provides the maximal coupling of $X$ and $Y$. We notice also that in this case
\[
\widehat P\bigl(\widetilde X \ne \widetilde Y\bigr) = p_2 - p_1 = d_{TV}(X, Y)\,.
\]

Exercise 5.13 (**). If $\widehat P(\cdot,\cdot)$ is the maximal coupling of $X$ and $Y$ as defined in Example 5.9 and Remark 5.9.1, show that $\widehat P(\widetilde X \ne \widetilde Y) = d_{TV}(X, Y)$.

Lemma 5.11 Let $\widehat P(\cdot,\cdot)$ be the maximal coupling of $X$ and $Y$ as defined in Example 5.9. Then for every other coupling $\widetilde P(\cdot,\cdot)$ of $X$ and $Y$ we have
\[
\widetilde P\bigl(\widetilde X \ne \widetilde Y\bigr) \ge \widehat P\bigl(\widetilde X \ne \widetilde Y\bigr) = d_{TV}(X, Y)\,. \tag{5.5}
\]
Proof. Summing the inequalities $\widetilde P(\widetilde X = \widetilde Y = z) \le \min(p_z, q_z)$ we deduce
\[
\widetilde P\bigl(\widetilde X \ne \widetilde Y\bigr) \ge 1 - \sum_z \min(p_z, q_z) = \sum_z \bigl(p_z - \min(p_z, q_z)\bigr) = d_{TV}\bigl(\{p\}, \{q\}\bigr)\,,
\]
where the last equality follows from (5.4). The claim follows from Exercise 5.13. $\square$

Remark 5.11.1 Notice that according to (5.5),
\[
\widetilde P\bigl(\widetilde X = \widetilde Y\bigr) \le \widehat P\bigl(\widetilde X = \widetilde Y\bigr) = 1 - d_{TV}(X, Y)\,,
\]
ie., the probability that $\widetilde X = \widetilde Y$ is maximised under the optimal coupling $\widehat P(\cdot,\cdot)$.

Example 5.12 Fix $p \in (0,1)$. Then the maximal coupling of $X \sim \mathrm{Ber}(p)$ and $Y \sim \mathrm{Poi}(p)$ satisfies
\[
\widehat P\bigl(\widetilde X = \widetilde Y = 0\bigr) = 1 - p\,, \qquad \widehat P\bigl(\widetilde X = \widetilde Y = 1\bigr) = p\, e^{-p}\,,
\]
and $\widehat P(\widetilde X = \widetilde Y = x) = 0$ for all $x > 1$, so that
\[
d_{TV}(X, Y) \equiv \widehat P\bigl(\widetilde X \ne \widetilde Y\bigr) = 1 - \widehat P\bigl(\widetilde X = \widetilde Y\bigr) = p\bigl(1 - e^{-p}\bigr) \le p^2\,. \tag{5.6}
\]
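The identities (5.4) and (5.6) can be verified numerically for the $\mathrm{Ber}(p)$/$\mathrm{Poi}(p)$ pair; in the sketch below the Poisson pmf is truncated at an arbitrary level $K$ (the neglected tail mass is far below the displayed precision):

```python
import numpy as np
from math import exp, factorial

p = 0.1
K = 20   # truncation level for the Poisson pmf (assumed large enough here)

ber = np.array([1 - p, p] + [0.0] * (K - 1))
poi = np.array([exp(-p) * p**k / factorial(k) for k in range(K + 1)])

d_tv = 0.5 * np.abs(ber - poi).sum()     # half-L1 formula of Exercise 5.11
overlap = np.minimum(ber, poi).sum()     # diagonal mass of the maximal coupling

print(f"d_TV = {d_tv:.6f}")
print(f"1 - sum_z min(p_z, q_z) = {1 - overlap:.6f}")   # same number, by (5.4)
print(f"p(1 - e^-p) = {p * (1 - exp(-p)):.6f}, bound p^2 = {p * p:.6f}")
```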
Exercise 5.14 (**). For fixed $p \in (0,1)$, complete the construction of the maximal coupling of $X \sim \mathrm{Ber}(p)$ and $Y \sim \mathrm{Poi}(p)$ as outlined in Example 5.12. Is any of the variables $X$ and $Y$ stochastically dominated by the other?

5.2.1 The law of rare events

The following sub-additive property of the total variation distance is important in applications.

Example 5.13 Let $X = X_1 + X_2$ with independent $X_1$ and $X_2$. Similarly, let $Y = Y_1 + Y_2$ with independent $Y_1$ and $Y_2$. Then
\[
d_{TV}(X, Y) \le d_{TV}(X_1, Y_1) + d_{TV}(X_2, Y_2)\,. \tag{5.7}
\]
Solution. By (5.5), for any joint distribution $\widetilde P(\cdot)$ of $\{X_1, X_2, Y_1, Y_2\}$ we have
\[
d_{TV}(X, Y) \le \widetilde P(X \ne Y) \le \widetilde P(X_1 \ne Y_1) + \widetilde P(X_2 \ne Y_2)\,.
\]
Now let $\widehat P_i(\cdot,\cdot)$ be the maximal coupling for the pair $(X_i, Y_i)$, $i = 1, 2$, and let $\widehat P = \widehat P_1 \times \widehat P_2$ be the (independent) product measure of $\widehat P_1$ and $\widehat P_2$.\footnote{ie., s.t. $\widehat P(X_i = a_i, Y_j = b_j) = \widehat P_1(X_1 = a_1, Y_1 = b_1)\, \widehat P_2(X_2 = a_2, Y_2 = b_2)$ for all $a_i$, $b_j$.} Under such $\widehat P$, the variables $X_1$ and $X_2$ (similarly, $Y_1$ and $Y_2$) are independent; moreover, the RHS of the display above becomes just $d_{TV}(X_1, Y_1) + d_{TV}(X_2, Y_2)$, so (5.7) follows.

Remark 5.13.1 An alternative proof of (5.7) can be obtained as follows. Let $Z$ be the independent sum $Y_1 + X_2$; by using the explicit formula for $d_{TV}(\cdot,\cdot)$ in Exercise 5.11 one can show that $d_{TV}(X, Z) \le d_{TV}(X_1, Y_1)$ and $d_{TV}(Z, Y) \le d_{TV}(X_2, Y_2)$; together with the triangle inequality for the total variation distance, $d_{TV}(X, Y) \le d_{TV}(X, Z) + d_{TV}(Z, Y)$, this gives (5.7).

Theorem 5.14 Let $X = \sum_{k=1}^n X_k$, where $X_k \sim \mathrm{Ber}(p_k)$ are independent random variables. Let, further, $Y \sim \mathrm{Poi}(\lambda)$, where $\lambda = \sum_{k=1}^n p_k$. Then the maximal coupling of $X$ and $Y$ satisfies
\[
d_{TV}(X, Y) \equiv \widehat P\bigl(\widetilde X \ne \widetilde Y\bigr) \le \sum_{k=1}^n (p_k)^2\,.
\]
Proof. Write $Y = \sum_{k=1}^n Y_k$, where $Y_k \sim \mathrm{Poi}(p_k)$ are independent r.v.'s, and use the approach of Example 5.13. Of course,
\[
\widetilde P\Bigl(\,\sum_{k=1}^n X_k \ne \sum_{k=1}^n Y_k\Bigr) \le \sum_{k=1}^n \widetilde P(X_k \ne Y_k)
\]
for every joint distribution of $(X_k)_{k=1}^n$ and $(Y_k)_{k=1}^n$. Let $\widehat P_k$ be the maximal coupling for the pair $\{X_k, Y_k\}$, and let $\widehat P_0$ be the maximal coupling for the two sums. Notice that the LHS above is not smaller than $d_{TV}(X, Y) \equiv \widehat P_0(\widetilde X \ne \widetilde Y)$; on the other hand, using the (independent) product measure $\widehat P = \widehat P_1 \times \cdots \times \widehat P_n$ on the right of the display above, we deduce that the RHS becomes just $\sum_{k=1}^n \widehat P_k(\widetilde X_k \ne \widetilde Y_k)$. The result now follows from (5.6). $\square$

Exercise 5.15 (**). Let $X \sim \mathrm{Bin}(n, \frac{\lambda}{n})$ and $Y \sim \mathrm{Poi}(\lambda)$ for some $\lambda > 0$. Show that
\[
\tfrac12\, \bigl|P(X = k) - P(Y = k)\bigr| \le d_{TV}(X, Y) \le \frac{\lambda^2}{n} \quad \text{for every } k \ge 0\,. \tag{5.8}
\]
Deduce that for every fixed $k \ge 0$, we have $P(X = k) \to \frac{\lambda^k}{k!}\, e^{-\lambda}$ as $n \to \infty$.

Remark 5.14.1 By (5.8), if $X \sim \mathrm{Bin}(n, p)$ and $Y \sim \mathrm{Poi}(np)$ then for every $k \ge 0$ the probabilities $P(X = k)$ and $P(Y = k)$ differ by at most $2np^2$. Eg., if $n = 10$ and $p = 0.01$ the discrepancy between any pair of such probabilities is bounded above by $0.002$, ie., they coincide in the first two decimal places.

5.3 Additional problems

Exercise 5.16 (*). Let random variable $X$ have density $f(\cdot)$, and let $g(\cdot)$ be a smooth increasing function with $g(x) \to 0$ as $x \to -\infty$. Show that
\[
E g(X) = \int f(x)\, dx \int 1_{z < x}\; g'(z)\, dz = \int g'(z)\, P(X > z)\, dz\,.
\]

Exercise 5.17 (**). Let $X \sim \mathrm{Bin}(1, p_1)$ and $Y \sim \mathrm{Bin}(2, p_2)$ with $p_1 \le p_2$. By constructing an explicit coupling or otherwise, show that $X \preccurlyeq Y$.

Exercise 5.18 (***). Let $X \sim \mathrm{Bin}(2, p_1)$ and $Y \sim \mathrm{Bin}(2, p_2)$ with $p_1 \le p_2$. Construct an explicit coupling showing that $X \preccurlyeq Y$.

Exercise 5.19 (***). Let $X \sim \mathrm{Bin}(3, p_1)$ and $Y \sim \mathrm{Bin}(3, p_2)$ with $0 < p_1 \le p_2 < 1$. Construct an explicit coupling showing that $X \preccurlyeq Y$.
Exercise 5.20 (***). Let $X \sim \mathrm{Bin}(2, p)$ and $Y \sim \mathrm{Bin}(4, p)$ with $0 < p < 1$. Construct an explicit coupling showing that $X \preccurlyeq Y$.

Exercise 5.21 (**). Let $m \le n$ and $0 < p_1 \le p_2 < 1$. Show that $X \sim \mathrm{Bin}(m, p_1)$ is stochastically smaller than $Y \sim \mathrm{Bin}(n, p_2)$.

Exercise 5.22 (**). Let $X \sim \mathrm{Ber}(p)$ and $Y \sim \mathrm{Poi}(\nu)$. Characterise all pairs $(p, \nu)$ such that $X \preccurlyeq Y$. Is it true that $X \preccurlyeq Y$ if $p = \nu$? Is it possible to have $Y \preccurlyeq X$?

Exercise 5.23 (**). Let $X \sim \mathrm{Bin}(n, p)$ and $Y \sim \mathrm{Poi}(\nu)$. Characterise all pairs $(p, \nu)$ such that $X \preccurlyeq Y$. Is it true that $X \preccurlyeq Y$ if $np = \nu$? Is it possible to have $Y \preccurlyeq X$?

Exercise 5.24 (**). Prove the additive property of $d_{TV}(\cdot,\cdot)$ for independent sums, (5.7), by following the approach in Remark 5.13.1.

Exercise 5.25 (***). Generalise your argument in Exercise 5.24 to general sums, and derive an alternative proof of Theorem 5.14.

Exercise 5.26 (**). If $X$, $Y$ are stochastically ordered, $X \preccurlyeq Y$, and $Z$ is independent of $X$ and $Y$, show that $(X + Z) \preccurlyeq (Y + Z)$.

Exercise 5.27 (*). If random variables $X$, $Y$, $Z$ satisfy $X \preccurlyeq Y$ and $Y \preccurlyeq Z$, show that $X \preccurlyeq Z$.

Exercise 5.28 (**). Let $X = X_1 + X_2$ with independent $X_1$ and $X_2$; similarly, $Y = Y_1 + Y_2$ with independent $Y_1$ and $Y_2$. If $X_1 \preccurlyeq Y_1$ and $X_2 \preccurlyeq Y_2$, deduce that $X \preccurlyeq Y$.

Exercise 5.29 (**). Let $X_k \sim \mathrm{Ber}(p')$ and $Y_k \sim \mathrm{Ber}(p'')$ for fixed $0 < p' < p'' < 1$ and all $k = 0, 1, \ldots, n$. Show that $X = \sum_k X_k$ is stochastically smaller than $Y = \sum_k Y_k$.

6 Some non-classical limits

6.1 Convergence to Poisson distribution

In Sec. 5.2.1 we used coupling to derive the “law of rare events” for sums of independent Bernoulli random variables. An alternative approach is to use generating functions:

Exercise 6.1 (*). Let $X_n \sim \mathrm{Bin}(n, p)$, where $p = p(n)$ is such that $np \to \lambda \in (0, \infty)$ as $n \to \infty$. Show that for every fixed $s \in \mathbb{R}$ the generating function $G_n(s) \stackrel{\text{def}}{=} E\, s^{X_n} = \bigl(1 + p(s-1)\bigr)^n$ converges, as $n \to \infty$, to $G_Y(s) = e^{\lambda(s-1)}$, the generating function of $Y \sim \mathrm{Poi}(\lambda)$. Deduce that the distribution of $X_n$ approaches that of $Y$ in this limit.

A time non-homogeneous version of the previous result can be obtained similarly:

Theorem 6.1 For $n \ge 1$, let $X_k^{(n)}$, $k = 1, \ldots, n$, be independent Bernoulli r.v.'s, $X_k^{(n)} \sim \mathrm{Ber}\bigl(p_k^{(n)}\bigr)$. Assume that, as $n \to \infty$,
\[
\max_{1 \le k \le n} p_k^{(n)} \to 0 \qquad \text{and} \qquad \sum_{k=1}^n p_k^{(n)} \equiv \sum_{k=1}^n E\bigl(X_k^{(n)}\bigr) \to \lambda \in (0, \infty)\,.
\]
Then the distribution of $X_n \stackrel{\text{def}}{=} \sum_{k=1}^n X_k^{(n)}$ converges, as $n \to \infty$, to $\mathrm{Poi}(\lambda)$.

A straightforward proof can be derived in analogy with that in Exercise 6.1, by using the following well-known uniform estimate for the logarithmic function:

Exercise 6.2 (*). Show that $\bigl|x + \log(1 - x)\bigr| \le x^2$ uniformly in $|x| \le 1/2$.

Exercise 6.3 (**). Use the estimate in Exercise 6.2 to prove Theorem 6.1.

We now consider a few examples of such convergence for dependent random variables.

Example 6.2 For a random permutation $\pi$ of the set $\{1, 2, \ldots, n\}$, let $S_n$ be the number of fixed points of $\pi$. Then, as $n \to \infty$, the distribution of $S_n$ converges to $\mathrm{Poi}(1)$.

Solution. If the events $(A_m)_{m=1}^n$ are defined via $A_m = \{m$ is a fixed point of $\pi\}$, then $S_n = \sum_{m=1}^n 1_{A_m}$. Notice that for $1 \le m_1 < m_2 < \cdots < m_k \le n$ we have
\[
P\bigl(A_{m_1} \cap A_{m_2} \cap \cdots \cap A_{m_k}\bigr) = \frac{(n-k)!}{n!}\,,
\]
since the number of permutations which do not move any $k$ given points is $(n-k)!$. By inclusion-exclusion,
\[
P(S_n > 0) \equiv P\Bigl(\,\bigcup_{m=1}^n A_m\Bigr) = \sum_{m} P(A_m) - \sum_{m_1 < m_2} P\bigl(A_{m_1} \cap A_{m_2}\bigr) + \ldots
\]
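As an empirical illustration of the $\mathrm{Poi}(1)$ limit in Example 6.2, one can count the fixed points of simulated random permutations (a numpy sketch; $n = 50$ and the number of trials are arbitrary choices):

```python
import numpy as np
from math import e, factorial

rng = np.random.default_rng(8)
n, N = 50, 200_000

# N random permutations of {0, ..., n-1}, via ranking i.i.d. uniforms row by row
perms = np.argsort(rng.random((N, n)), axis=1)
fixed = (perms == np.arange(n)).sum(axis=1)      # number of fixed points S_n

for k in range(4):
    print(k, f"empirical: {(fixed == k).mean():.4f}",
             f"Poi(1): {1 / (e * factorial(k)):.4f}")
```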