Course Notes for Math 162: Mathematical Statistics
The Sample Distribution of the Median

Adam Merberg and Steven J. Miller
February 15, 2008

Abstract

We begin by introducing the concept of order statistics and finding the density of the rth order statistic of a sample. We then consider the special case of the density of the median and provide some examples. We conclude with some appendices that describe some of the techniques and background used.

Contents

1 Order Statistics
2 The Sample Distribution of the Median
3 Examples and Exercises
A The Multinomial Distribution
B Big-Oh Notation
C Proof That With High Probability |\tilde{X} - \tilde{\mu}| is Small
D Stirling's Approximation Formula for n!
E Review of the exponential function

1 Order Statistics

Suppose that the random variables X_1, X_2, \dots, X_n constitute a sample of size n from an infinite population with continuous density. Often it will be useful to reorder these random variables from smallest to largest. In reordering the variables, we will also rename them so that Y_1 is a random variable whose value is the smallest of the X_i, Y_2 is the next smallest, and so on, with Y_n the largest of the X_i. We call Y_r the rth order statistic of the sample. In considering order statistics, it is naturally convenient to know their probability density. We derive an expression for the distribution of the rth order statistic as in [MM].

Theorem 1.1. For a random sample of size n from an infinite population having values x and density f(x), the probability density of the rth order statistic Y_r is given by
\[
g_r(y_r) \;=\; \frac{n!}{(r-1)!\,(n-r)!} \left[\int_{-\infty}^{y_r} f(x)\,dx\right]^{r-1} f(y_r) \left[\int_{y_r}^{\infty} f(x)\,dx\right]^{n-r}. \tag{1.1}
\]

Proof. Let h be a positive real number. We divide the real line into three intervals: (-\infty, y_r), [y_r, y_r + h], and (y_r + h, \infty). We will first find the probability that Y_r falls in the middle of these three intervals, and no other value from the sample falls in this interval.
In order for this to be the case, we must have r - 1 values falling in the first interval, one value falling in the second, and n - r falling in the last interval. Using the multinomial distribution, which is explained in Appendix A, the probability of this event is
\[
\mathrm{Prob}\left(Y_r \in [y_r, y_r + h] \text{ and } Y_i \notin [y_r, y_r + h] \text{ if } i \neq r\right)
= \frac{n!}{(r-1)!\,1!\,(n-r)!} \left[\int_{-\infty}^{y_r} f(x)\,dx\right]^{r-1} \left[\int_{y_r}^{y_r+h} f(x)\,dx\right]^{1} \left[\int_{y_r+h}^{\infty} f(x)\,dx\right]^{n-r}. \tag{1.2}
\]

We need also consider the case of two or more of the Y_i lying in [y_r, y_r + h]. As this interval has length h, this probability is O(h^2) (see Appendix B for a review of big-Oh notation such as O(h^2)). Thus we may remove the constraint that exactly one Y_i \in [y_r, y_r + h] in (1.2) at a cost of at most O(h^2), which yields
\[
\mathrm{Prob}\left(Y_r \in [y_r, y_r + h]\right) = \frac{n!}{(r-1)!\,1!\,(n-r)!} \left[\int_{-\infty}^{y_r} f(x)\,dx\right]^{r-1} \left[\int_{y_r}^{y_r+h} f(x)\,dx\right]^{1} \left[\int_{y_r+h}^{\infty} f(x)\,dx\right]^{n-r} + O(h^2). \tag{1.3}
\]

We now apply the Mean Value Theorem to find that for some c_{h,y_r} with y_r \le c_{h,y_r} \le y_r + h, we have
\[
\int_{y_r}^{y_r+h} f(x)\,dx = h \cdot f(c_{h,y_r}). \tag{1.6}
\]

We denote the point provided by the mean value theorem by c_{h,y_r} in order to emphasize its dependence on h and y_r. We can substitute this result into the expression of (1.3). We divide the result by h (the length of the middle interval [y_r, y_r + h]), and consider the limit as h \to 0:
\[
\begin{aligned}
\lim_{h \to 0} \frac{\mathrm{Prob}(Y_r \in [y_r, y_r + h])}{h}
&= \lim_{h \to 0} \frac{\frac{n!}{(r-1)!\,1!\,(n-r)!} \left[\int_{-\infty}^{y_r} f(x)\,dx\right]^{r-1} \left[\int_{y_r}^{y_r+h} f(x)\,dx\right]^{1} \left[\int_{y_r+h}^{\infty} f(x)\,dx\right]^{n-r} + O(h^2)}{h} \\
&= \lim_{h \to 0} \frac{\frac{n!}{(r-1)!\,1!\,(n-r)!} \left[\int_{-\infty}^{y_r} f(x)\,dx\right]^{r-1} h \cdot f(c_{h,y_r}) \left[\int_{y_r+h}^{\infty} f(x)\,dx\right]^{n-r}}{h} \\
&= \lim_{h \to 0} \frac{n!}{(r-1)!\,1!\,(n-r)!} \left[\int_{-\infty}^{y_r} f(x)\,dx\right]^{r-1} f(c_{h,y_r}) \left[\int_{y_r+h}^{\infty} f(x)\,dx\right]^{n-r} \\
&= \frac{n!}{(r-1)!\,1!\,(n-r)!} \left[\int_{-\infty}^{y_r} f(x)\,dx\right]^{r-1} f(y_r) \left[\int_{y_r}^{\infty} f(x)\,dx\right]^{n-r}. \tag{1.7}
\end{aligned}
\]

Thus the proof is reduced to showing that the left hand side above is g_r(y_r). Let g_r(y_r) be the probability density of Y_r, and let G_r(y_r) be the cumulative distribution function of Y_r. Thus
\[
\mathrm{Prob}(Y_r \le y) = \int_{-\infty}^{y} g_r(y_r)\,dy_r = G_r(y), \tag{1.8}
\]
and G_r'(y) = g_r(y).
Thus the left hand side of (1.7) equals
\[
\lim_{h \to 0} \frac{\mathrm{Prob}(Y_r \in [y_r, y_r + h])}{h} = \lim_{h \to 0} \frac{G_r(y_r + h) - G_r(y_r)}{h} = g_r(y_r), \tag{1.9}
\]
where the last equality follows from the definition of the derivative. This completes the proof.

Remark 1.2. The technique employed in this proof is a common method for calculating probability densities. We first calculate the probability that a random variable Y lies in an infinitesimal interval [y, y + h]. This probability is G(y + h) - G(y), where g is the density of Y and G is the cumulative distribution function (so G' = g). The definition of the derivative yields
\[
\lim_{h \to 0} \frac{\mathrm{Prob}(Y \in [y, y + h])}{h} = \lim_{h \to 0} \frac{G(y + h) - G(y)}{h} = g(y). \tag{1.10}
\]

2 The Sample Distribution of the Median

In addition to the smallest (Y_1) and largest (Y_n) order statistics, we are often interested in the sample median, \tilde{X}. For a sample of odd size, n = 2m + 1, the sample median is defined as Y_{m+1}. If n = 2m is even, the sample median is defined as \frac{1}{2}(Y_m + Y_{m+1}). We will prove a relation between the sample median and the population median \tilde{\mu}. By definition, \tilde{\mu} satisfies
\[
\int_{-\infty}^{\tilde{\mu}} f(x)\,dx = \frac{1}{2}. \tag{2.11}
\]

[Footnote 1, to the Mean Value Theorem step above: if F is an anti-derivative of f, then the Mean Value Theorem applied to F,
\[
\frac{F(b) - F(a)}{b - a} = F'(c), \tag{1.4}
\]
is equivalent to
\[
\int_a^b f(x)\,dx = (b - a) \cdot f(c). \tag{1.5}
\]]

It is convenient to re-write the above in terms of the cumulative distribution function. If F is the cumulative distribution function of f, then F' = f and (2.11) becomes
\[
F(\tilde{\mu}) = \frac{1}{2}. \tag{2.12}
\]

We are now ready to consider the distribution of the sample median.

Median Theorem. Let a sample of size n = 2m + 1, with n large, be taken from an infinite population with a density function f(\tilde{x}) that is nonzero at the population median \tilde{\mu} and continuously differentiable in a neighborhood of \tilde{\mu}. The sampling distribution of the median is approximately normal with mean \tilde{\mu} and variance \frac{1}{8 f(\tilde{\mu})^2 m}.

Proof. Let the median random variable \tilde{X} have values \tilde{x} and density g(\tilde{x}).
The median is simply the (m + 1)th order statistic, so its distribution is given by the result of the previous section. By Theorem 1.1,
\[
g(\tilde{x}) = \frac{(2m + 1)!}{m!\,m!} \left[\int_{-\infty}^{\tilde{x}} f(x)\,dx\right]^{m} f(\tilde{x}) \left[\int_{\tilde{x}}^{\infty} f(x)\,dx\right]^{m}. \tag{2.13}
\]

We will first find an approximation for the constant factor in this equation. For this, we will use Stirling's approximation, which tells us that n! = n^n e^{-n} \sqrt{2\pi n}\,(1 + O(n^{-1})); we sketch a proof in Appendix D. We will consider values of m sufficiently large so that the terms of order 1/n need not be considered. Hence
\[
\frac{(2m + 1)!}{m!\,m!} = \frac{(2m + 1)(2m)!}{(m!)^2} \approx \frac{(2m + 1)(2m)^{2m} e^{-2m} \sqrt{2\pi(2m)}}{\left(m^m e^{-m} \sqrt{2\pi m}\right)^2} = \frac{(2m + 1)\,4^m}{\sqrt{\pi m}}. \tag{2.14}
\]

As F is the cumulative distribution function, F(\tilde{x}) = \int_{-\infty}^{\tilde{x}} f(x)\,dx, which implies
\[
g(\tilde{x}) \approx \frac{(2m + 1)\,4^m}{\sqrt{\pi m}} \left[F(\tilde{x})\right]^m f(\tilde{x}) \left[1 - F(\tilde{x})\right]^m. \tag{2.15}
\]

We will need the Taylor series expansion of F(\tilde{x}) about \tilde{\mu}, which is just
\[
F(\tilde{x}) = F(\tilde{\mu}) + F'(\tilde{\mu})(\tilde{x} - \tilde{\mu}) + O\!\left((\tilde{x} - \tilde{\mu})^2\right). \tag{2.16}
\]

Because \tilde{\mu} is the population median, F(\tilde{\mu}) = 1/2. Further, since F is the cumulative distribution function, F' = f and we find
\[
F(\tilde{x}) = \frac{1}{2} + f(\tilde{\mu})(\tilde{x} - \tilde{\mu}) + O\!\left((\tilde{x} - \tilde{\mu})^2\right). \tag{2.17}
\]

This approximation is only useful if \tilde{x} - \tilde{\mu} is small; in other words, we need \lim_{m \to \infty} |\tilde{x} - \tilde{\mu}| = 0. Fortunately this is easy to show, and a proof is included in Appendix C. Letting t = \tilde{x} - \tilde{\mu} (which is small and tends to 0 as m \to \infty), substituting our Taylor series expansion into (2.15) yields
\[
g(\tilde{x}) \approx \frac{(2m + 1)\,4^m}{\sqrt{\pi m}} \left[\frac{1}{2} + f(\tilde{\mu})t + O(t^2)\right]^m f(\tilde{x}) \left[1 - \left(\frac{1}{2} + f(\tilde{\mu})t + O(t^2)\right)\right]^m. \tag{2.18}
\]

By rearranging and combining factors, we find that
\[
g(\tilde{x}) \approx \frac{(2m + 1)\,4^m}{\sqrt{\pi m}} f(\tilde{x}) \left[\frac{1}{4} - (f(\tilde{\mu})t)^2 + O(t^3)\right]^m = \frac{(2m + 1) f(\tilde{x})}{\sqrt{\pi m}} \left[1 - \frac{4m (f(\tilde{\mu})t)^2}{m} + O(t^3)\right]^m. \tag{2.19}
\]

Remember that one definition of e^x is
\[
e^x = \exp(x) = \lim_{n \to \infty} \left(1 + \frac{x}{n}\right)^n; \tag{2.20}
\]
see Appendix E for a review of properties of the exponential function.
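As a numerical aside, the approximations used so far lend themselves to a quick sanity check. The following Python sketch (our addition, not part of the original notes; the standard normal example, the sample size m = 50, the trial count, and the tolerances are illustrative choices) verifies both the Stirling-based constant (2.14) and the Median Theorem's predicted variance 1/(8 f(\tilde{\mu})^2 m). For a standard normal population, \tilde{\mu} = 0 and f(\tilde{\mu}) = 1/\sqrt{2\pi}, so the predicted variance of the sample median of n = 2m + 1 draws works out to \pi/(4m).

```python
import math
import random

random.seed(0)

m = 50                      # sample size n = 2m + 1 = 101
n = 2 * m + 1
trials = 20000

# Check (2.14): (2m+1)!/(m! m!) is well approximated by (2m+1) 4^m / sqrt(pi m).
exact = math.factorial(2 * m + 1) / math.factorial(m) ** 2
approx = (2 * m + 1) * 4 ** m / math.sqrt(math.pi * m)
ratio = approx / exact      # tends to 1 as m grows; already within ~1% here
print(f"Stirling constant ratio (approx/exact): {ratio:.5f}")

# Monte Carlo check of the Median Theorem for a standard normal population:
# the sample median should be approximately normal with mean 0 and
# variance pi/(4m).
medians = []
for _ in range(trials):
    sample = sorted(random.gauss(0.0, 1.0) for _ in range(n))
    medians.append(sample[m])          # the (m+1)th order statistic

mean_hat = sum(medians) / trials
var_hat = sum((x - mean_hat) ** 2 for x in medians) / trials
predicted_var = math.pi / (4 * m)

print(f"empirical mean of medians: {mean_hat:.4f}")
print(f"empirical variance:        {var_hat:.5f}")
print(f"predicted variance pi/4m:  {predicted_var:.5f}")
```

With these parameters the empirical variance should land within a few percent of \pi/200 \approx 0.0157, the Median Theorem's prediction; shrinking m or the trial count widens the gap, which is consistent with the O(1/n) and Monte Carlo error terms being neglected.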