
Chapter 6. Principles of Data Reduction
Lecture 22: Sufficiency

Data reduction

We consider a sample X = (X_1, ..., X_n), n > 1, from a population of interest (each X_i may be a vector, and X may not be a random sample, although most of the time we consider a random sample). Assume the population is indexed by θ, an unknown parameter vector. Let 𝒳 be the range of X, and let x be an observed data set, a realization of X. We want to use the information about θ contained in x. The whole of x may be hard to interpret, and hence we summarize the information by using a few key features (statistics), for example, the sample mean, the sample variance, and the largest and smallest order statistics.

Let T(X) be a statistic. For T, if x ≠ y but T(x) = T(y), then x and y provide the same information and can be treated as the same.

T partitions 𝒳 into the sets

$$A_t = \{x : T(x) = t\}, \qquad t \in \mathcal{T} \ (\text{the range of } T).$$

All points in A_t are treated the same if we are interested in T only. Thus, T provides a data reduction. We wish to reduce data as much as we can, but not lose any information about θ (or at least any important information).

Sufficiency

A sufficient statistic for θ is a statistic that captures all the information about θ contained in the sample. Formally we have the following definition.

Definition 6.2.1 (sufficiency)
A statistic T(X) is sufficient for θ if the conditional distribution of X given T(X) = T(x) does not depend on θ.

Sufficiency depends on the parameter of interest.

If X is discrete, then so is T(X), and sufficiency means that P(X = x | T(X) = T(x)) is known, i.e., it does not depend on any unknown quantity. Once we observe x and compute a sufficient statistic T(x), the original data x do not contain any further information concerning θ and can be discarded, i.e., T(x) is all we need regarding θ. If we do need x, we can simulate a sample y from P(X = y | T(X) = T(x)) since it is known; the observed y may not be the same as x, but T(x) = T(y).

Example 6.2.3 (binomial sufficient statistic)

Suppose that X_1, ..., X_n are iid Bernoulli variables with success probability θ. The joint pmf is
$$f_\theta(x_1,\dots,x_n) = \begin{cases} \prod_{i=1}^n \theta^{x_i}(1-\theta)^{1-x_i} & x_i = 0,1,\ i = 1,\dots,n \\ 0 & \text{otherwise.} \end{cases}$$
Consider the statistic $T(X) = \sum_{i=1}^n X_i$, which is the number of ones in X.

To show T is sufficient for θ, we compute the conditional probability P(X = x | T = t). For t = 0, 1, ..., n, let
$$B_t = \left\{ x = (x_1,\dots,x_n) : x_i = 0,1,\ \sum_{i=1}^n x_i = t \right\}.$$

If x ∉ B_t, then P(X = x | T = t) = 0. If x ∈ B_t, then
$$P(X = x,\, T = t) = P(X = x) = f_\theta(x) = \theta^t (1-\theta)^{n-t}.$$
Also, since T ∼ binomial(n, θ),
$$P(T = t) = \binom{n}{t} \theta^t (1-\theta)^{n-t}.$$
Then, for t = 0, 1, ..., n,
$$P(X = x \mid T = t) = \frac{P(X = x,\, T = t)}{P(T = t)} = \frac{1}{\binom{n}{t}}, \qquad x \in B_t,$$
which is a known pmf (it does not depend on θ).

Hence T(X) is sufficient for θ.

For any realization x of X, x is a sequence of n ones and zeros. Since θ is the probability of a one and T is the number of ones in x, T has all the information about θ. Given T = t, what is left in the data set x is the redundant information about the positions of the t ones, and we can reproduce a data set equivalent to x if we want by using T = t.
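To make this concrete, here is a minimal Python sketch (my own illustration, not part of the original slides; the function and variable names are mine). Given T = t, the conditional distribution of X is uniform over the $\binom{n}{t}$ arrangements of t ones and n − t zeros, so a draw from P(X = y | T = t) is obtained by placing t ones in uniformly random positions, without any reference to θ.

```python
import numpy as np

rng = np.random.default_rng(0)

def regenerate_from_T(t, n, rng):
    """Draw a data set from the conditional distribution of X given T = t.

    Given T = t, X is uniform over the C(n, t) arrangements of t ones and
    n - t zeros, so we simply choose the positions of the ones at random.
    Note that theta is not needed -- that is the point of sufficiency.
    """
    y = np.zeros(n, dtype=int)
    ones = rng.choice(n, size=t, replace=False)
    y[ones] = 1
    return y

# Original data from a Bernoulli(theta) population.
theta, n = 0.3, 10
x = rng.binomial(1, theta, size=n)
t = int(x.sum())

# A regenerated data set: usually different from x, but with the same T.
y = regenerate_from_T(t, n, rng)
print(x, "T =", t)
print(y, "T =", y.sum())
```

Any procedure that uses the data only through T cannot distinguish the original x from the regenerated y.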

How to find sufficient statistics?

To verify that a statistic T is sufficient for θ by definition, we must verify that, for any fixed value of x, the conditional distribution of X given T(X) = T(x) does not depend on θ. This may not be easy, but at least we can try. But how do we find the form of T in the first place? By guessing a statistic T that might be sufficient and computing the conditional distribution of X given T = t?

For families of populations having pdfs or pmfs, a simple way of finding sufficient statistics is to use the following factorization theorem.

Theorem 6.2.6 (the Factorization Theorem)

Let f_θ(x) be the joint pdf or pmf of the sample X. A statistic T(X) is sufficient for θ iff there are functions h (which does not depend on θ) and g_θ (which depends on θ, defined on the range of T) such that
$$f_\theta(x) = g_\theta\big(T(x)\big)\, h(x).$$

In the binomial example, f_θ(x) = g_θ(T(x)) h(x) if we set
$$g_\theta(t) = \theta^t (1-\theta)^{n-t} \qquad \text{and} \qquad h(x) = \begin{cases} 1 & x_i = 0,1,\ i = 1,\dots,n \\ 0 & \text{otherwise.} \end{cases}$$

Proof of Theorem 6.2.6 for the discrete case.
Suppose that T(X) is sufficient. Let g_θ(t) = P_θ(T(X) = t) and h(x) = P(X = x | T(X) = T(x)). Then

$$\begin{aligned} f_\theta(x) = P_\theta(X = x) &= P_\theta(X = x,\, T(X) = T(x)) \\ &= P_\theta(T(X) = T(x))\, P(X = x \mid T(X) = T(x)) \\ &= g_\theta(T(x))\, h(x). \end{aligned}$$

Suppose now that f_θ(x) = g_θ(T(x)) h(x) for x ∈ 𝒳. Let q_θ(t) be the pmf of T(X) and A_x = {y : T(y) = T(x)}. Then, for any x ∈ 𝒳,
$$\frac{f_\theta(x)}{q_\theta(T(x))} = \frac{g_\theta(T(x))\, h(x)}{P_\theta(T(X) = T(x))} = \frac{g_\theta(T(x))\, h(x)}{\sum_{y \in A_x} f_\theta(y)} = \frac{g_\theta(T(x))\, h(x)}{\sum_{y \in A_x} g_\theta(T(y))\, h(y)} = \frac{g_\theta(T(x))\, h(x)}{g_\theta(T(x)) \sum_{y \in A_x} h(y)} = \frac{h(x)}{\sum_{y \in A_x} h(y)},$$
which does not depend on θ. Since this ratio is precisely the conditional pmf P(X = x | T(X) = T(x)), T is sufficient for θ.

Example 6.2.4 (normal sufficient statistic)

Let X_1, ..., X_n be iid N(µ, σ²), θ = (µ, σ²); the joint pdf is
$$\begin{aligned} f_\theta(x) &= \prod_{i=1}^n \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x_i-\mu)^2/2\sigma^2} = \frac{1}{(2\pi)^{n/2}\sigma^n} \exp\!\left( -\sum_{i=1}^n \frac{(x_i-\mu)^2}{2\sigma^2} \right) \\ &= \frac{1}{(2\pi)^{n/2}\sigma^n} \exp\!\left( -\sum_{i=1}^n \frac{(x_i-\bar{x})^2}{2\sigma^2} - \frac{n(\bar{x}-\mu)^2}{2\sigma^2} \right) = \frac{1}{(2\pi)^{n/2}\sigma^n} \exp\!\left( -\frac{(n-1)s^2}{2\sigma^2} - \frac{n(\bar{x}-\mu)^2}{2\sigma^2} \right), \end{aligned}$$
where $s^2 = (n-1)^{-1}\sum_{i=1}^n (x_i - \bar{x})^2$ is the realization of the sample variance $S^2 = (n-1)^{-1}\sum_{i=1}^n (X_i - \bar{X})^2$.

Hence, by Theorem 6.2.6, (X̄, S²) is a two-dimensional sufficient statistic for θ = (µ, σ²).

If σ² is known, then X̄ is sufficient for µ. If µ is known, then $\sum_{i=1}^n (X_i - \mu)^2$ is sufficient for σ². If both µ and σ² are unknown, we cannot say that X̄ is sufficient for µ (or that S² is sufficient for σ²); the correct statement is that X̄ and S² together are sufficient for (µ, σ²). We can also say that (X̄, S²) is sufficient for µ (or for σ²).
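As a quick numerical check of the factorization in Example 6.2.4 (a sketch of mine, not from the slides, using numpy and scipy; function names are my own), the normal log-likelihood computed from the raw data agrees with the version written only in terms of (x̄, s², n), and two different data sets with the same (x̄, s²) give identical log-likelihoods for every (µ, σ²).

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)

def normal_loglik(x, mu, sigma2):
    """Normal log-likelihood evaluated term by term from the raw data."""
    return np.sum(norm.logpdf(x, loc=mu, scale=np.sqrt(sigma2)))

def normal_loglik_from_stats(xbar, s2, n, mu, sigma2):
    """The same log-likelihood written only in terms of (xbar, s2, n)."""
    return (-n / 2 * np.log(2 * np.pi * sigma2)
            - ((n - 1) * s2 + n * (xbar - mu) ** 2) / (2 * sigma2))

x = rng.normal(loc=2.0, scale=1.5, size=8)
y = 2 * x.mean() - x    # different data, but same sample mean and variance

for mu, sigma2 in [(0.0, 1.0), (2.0, 2.25), (-1.0, 0.5)]:
    a = normal_loglik(x, mu, sigma2)
    b = normal_loglik(y, mu, sigma2)
    c = normal_loglik_from_stats(x.mean(), x.var(ddof=1), len(x), mu, sigma2)
    print(mu, sigma2, np.allclose(a, b), np.allclose(a, c))
```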

Sufficiency for a sub-family

Let θ be a parameter and η be a subset of the components of θ. If T is sufficient for θ, then it is also sufficient for η.

Example 6.2.5 (sufficient order statistics)

Let X_1, ..., X_n be iid with pdf f_θ and let X_(1), ..., X_(n) be the order statistics. The joint pdf of X = (X_1, ..., X_n) is
$$\prod_{i=1}^n f_\theta(x_i) = \prod_{i=1}^n f_\theta(x_{(i)}),$$
where x_(1), ..., x_(n) are the ordered values of x_1, ..., x_n. Then, by the factorization theorem, (X_(1), ..., X_(n)) is sufficient for θ.

Intuitively, given the order statistics, what is left in the original data set is the information regarding the positions of x_1, ..., x_n; hence, the set of order statistics is sufficient whenever the positions of the x_i's are not of interest.

One-to-one transformations of a sufficient statistic

It follows from the factorization theorem that, if T is sufficient and U is a one-to-one function of T, then U is also sufficient.

But this is also true in general, directly from the definition of sufficiency.

In the order statistics problem, U = (U_1, ..., U_n) is a one-to-one function of (X_(1), ..., X_(n)), where $U_k = \sum_{i=1}^n X_i^k$, k = 1, ..., n (the k-th power sums). Hence, U is also sufficient for θ.
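This one-to-one claim can be checked numerically. The following Python sketch (my own illustration, not from the slides) recovers the data values, up to ordering, from the n power sums: Newton's identities turn the power sums into the elementary symmetric polynomials, whose signed values are the coefficients of the monic polynomial having the data as its roots.

```python
import numpy as np

def power_sums(x):
    """U_k = sum_i x_i**k for k = 1, ..., n."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    return np.array([np.sum(x ** k) for k in range(1, n + 1)])

def recover_from_power_sums(p):
    """Recover the data (as a multiset) from the n power sums.

    Newton's identities convert power sums p_1, ..., p_n into elementary
    symmetric polynomials e_1, ..., e_n; the data are then the roots of
    z^n - e_1 z^{n-1} + e_2 z^{n-2} - ... + (-1)^n e_n.
    """
    n = len(p)
    e = np.zeros(n + 1)
    e[0] = 1.0
    for k in range(1, n + 1):
        # k * e_k = sum_{i=1}^{k} (-1)^{i-1} e_{k-i} p_i
        e[k] = sum((-1) ** (i - 1) * e[k - i] * p[i - 1]
                   for i in range(1, k + 1)) / k
    coeffs = [(-1) ** k * e[k] for k in range(n + 1)]  # monic polynomial
    return np.sort(np.roots(coeffs).real)

x = np.array([0.7, 2.0, 1.3, 0.4, 3.1])
u = power_sums(x)
print(np.allclose(recover_from_power_sums(u), np.sort(x)))  # True
```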

Example 6.2.8 (uniform sufficient statistic)

Let X_1, ..., X_n be iid from uniform(0, θ), where θ > 0 is the unknown parameter. The joint pdf of X_1, ..., X_n is
$$\prod_{i=1}^n f_\theta(x_i) = \prod_{i=1}^n \frac{1}{\theta}\, I(\{0 < x_i < \theta\}) = \frac{1}{\theta^n}\, I(\{0 < x_{(n)} < \theta\}),$$
with x_(n) being the largest value of x_1, ..., x_n.

Thus, the largest order statistic X_(n) is sufficient for θ.

Intuitively, because X_i ≤ θ for all i, if we observe X_(n), then we know that θ ≥ X_(n), and the values of the other X_i's do not provide any additional information about θ.

The same result holds when X_1, ..., X_n are iid from the discrete uniform distribution on {1, 2, ..., θ}.
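For the continuous uniform(0, θ) case above, here is a short numerical check (my own sketch, not part of the slides; function names are mine): for a data set with positive values, the likelihood of θ computed from the full data coincides with the likelihood computed from x_(n) alone.

```python
import numpy as np

def uniform_lik(x, theta):
    """Likelihood of theta for an iid uniform(0, theta) sample x."""
    x = np.asarray(x, dtype=float)
    return np.prod((0 < x) & (x < theta)) / theta ** len(x)

def uniform_lik_from_max(x_max, n, theta):
    """The same likelihood written only in terms of x_(n) and n."""
    return (0 < x_max < theta) / theta ** n

x = np.array([0.9, 2.4, 1.1, 3.2, 0.5])
for theta in [2.0, 3.5, 5.0]:
    print(theta,
          uniform_lik(x, theta),
          uniform_lik_from_max(x.max(), len(x), theta))
```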

Theorem 6.2.10 (exponential families)

Let X_1, ..., X_n be iid from a pdf or pmf f_θ(x) that belongs to an exponential family:
$$f_\theta(x) = h(x)\, c(\theta) \exp\!\left( \sum_{j=1}^k w_j(\theta)\, t_j(x) \right).$$

The joint pdf or pmf of X = (X_1, ..., X_n) is
$$\prod_{i=1}^n f_\theta(x_i) = \left[ \prod_{i=1}^n h(x_i) \right] [c(\theta)]^n \exp\!\left( \sum_{j=1}^k w_j(\theta) \sum_{i=1}^n t_j(x_i) \right).$$
It follows from the factorization theorem that the k-dimensional statistic
$$T(X) = \left( \sum_{i=1}^n t_1(X_i),\ \dots,\ \sum_{i=1}^n t_k(X_i) \right)$$
is sufficient for θ.
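As a concrete instance (my own sketch, not from the slides): the Poisson(θ) pmf is an exponential family with h(x) = 1/x!, c(θ) = e^{−θ}, w_1(θ) = log θ and t_1(x) = x, so T(X) = ∑ X_i is sufficient. The code below checks the factorization g_θ(T(x)) h(x) numerically.

```python
import numpy as np
from scipy.stats import poisson
from scipy.special import gammaln

# Poisson(theta): f_theta(x) = [1/x!] * e^{-theta} * exp(x * log(theta)),
# so h(x) = 1/x!, c(theta) = e^{-theta}, w_1(theta) = log(theta), t_1(x) = x,
# and T(X) = sum_i X_i is sufficient by Theorem 6.2.10.

def joint_pmf(x, theta):
    """Joint Poisson pmf computed term by term from the raw data."""
    return np.prod(poisson.pmf(x, mu=theta))

def factored_pmf(x, theta):
    """The same pmf written as g_theta(T(x)) * h(x)."""
    x = np.asarray(x)
    n, T = len(x), x.sum()
    g = np.exp(-n * theta + T * np.log(theta))   # depends on x only via T
    h = np.exp(-np.sum(gammaln(x + 1)))          # 1 / prod(x_i!)
    return g * h

x = np.array([2, 0, 3, 1, 4])
for theta in [0.5, 1.0, 2.5]:
    print(theta, np.allclose(joint_pmf(x, theta), factored_pmf(x, theta)))
```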


Sufficiency Principle

Let X be a sample from a population indexed by θ ∈ Θ. If T(X) is sufficient for θ, then any inference about θ should depend on the sample only through the value T(X).

Another way to state the sufficiency principle: if x and y are two data sets (realizations of X) with T(x) = T(y), then our decision or inference about θ should be the same for both. The sufficiency principle says that in any inference procedure we should consider functions of a sufficient statistic only.

In what sense can we be assured that using functions of a sufficient statistic is enough? First we need a criterion to evaluate the performance of inference procedures. As an example, we consider here the problem of estimating a function ϑ = ψ(θ), where ψ is a known function on the parameter space Θ, but ϑ is unknown.

Let U(X) be a statistic used to estimate the unknown ϑ. A common criterion for the performance of U(X) is the so-called mean squared error (mse), defined as
$$E_\theta[U(X) - \vartheta]^2 = E_\theta[U(X) - \psi(\theta)]^2, \qquad \theta \in \Theta,$$

where E_θ is the expectation with respect to the population indexed by θ. We view U(X) − ϑ as the estimation error, which is random since X is random. The mse is simply the average of the squared estimation error under the population indexed by θ, and we want to choose a statistic whose mse is as small as possible.

Rao-Blackwell theorem

Let X be a sample from a population indexed by θ ∈ Θ and let T(X) be a sufficient statistic for θ. If U(X) is a statistic used to estimate ϑ = ψ(θ) and E_θ[U(X) − ϑ]² < ∞, then the statistic h(T) = E[U(X) | T] satisfies
$$E_\theta[h(T) - \vartheta]^2 < E_\theta[U(X) - \vartheta]^2, \qquad \theta \in \Theta,$$
unless P_θ(U(X) = h(T(X))) = 1, θ ∈ Θ.

The Rao-Blackwell theorem says that if U(X) is not a function of the sufficient statistic T, then the new statistic h(T) = E[U(X) | T] is strictly better than U(X) in terms of the mean squared error criterion. The theorem is meaningful if a T other than the original data X can be found (such as a minimal sufficient statistic).

Because E_θ[U(X) − ϑ]² < ∞, E[U(X) | T] is well defined; in fact, we only need E_θ|U(X)| < ∞ for every θ ∈ Θ. Because T is sufficient, E[U(X) | T] does not depend on θ and is a statistic. The Rao-Blackwell theorem actually has a more general form that considers criteria other than the mean squared error.

Proof.
For every θ ∈ Θ,
$$E_\theta[U(X) - \vartheta]^2 = E_\theta\{[U(X) - h(T)] + [h(T) - \vartheta]\}^2 = E_\theta[U(X) - h(T)]^2 + E_\theta[h(T) - \vartheta]^2 + 2 E_\theta\{[U(X) - h(T)][h(T) - \vartheta]\}.$$

Using the properties of conditional expectations, we obtain
$$\begin{aligned} E_\theta\{[U(X) - h(T)][h(T) - \vartheta]\} &= E_\theta\big( E_\theta\{[U(X) - h(T)][h(T) - \vartheta] \mid T\} \big) \\ &= E_\theta\big( [h(T) - \vartheta]\, E_\theta\{U(X) - h(T) \mid T\} \big) \\ &= E_\theta\big( [h(T) - \vartheta]\, \{E_\theta[U(X) \mid T] - h(T)\} \big) = 0. \end{aligned}$$

Hence,
$$E_\theta[U(X) - \vartheta]^2 = E_\theta[U(X) - h(T)]^2 + E_\theta[h(T) - \vartheta]^2 > E_\theta[h(T) - \vartheta]^2$$
unless E_θ[U(X) − h(T)]² = 0, which implies P_θ(U(X) = h(T)) = 1 from our previous discussion.
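To see the Rao-Blackwell improvement numerically, here is a small simulation (my own sketch, not part of the slides) for the Bernoulli(θ) model: the crude estimator U(X) = X_1 is unbiased for ϑ = θ but ignores most of the data; conditioning on the sufficient statistic T = ∑ X_i gives h(T) = E[X_1 | T] = T/n = X̄, and the simulated mse drops by roughly a factor of n.

```python
import numpy as np

rng = np.random.default_rng(2)
theta, n, reps = 0.3, 20, 100_000

# reps independent Bernoulli(theta) samples of size n
X = rng.binomial(1, theta, size=(reps, n))

U = X[:, 0]              # crude estimator: U(X) = X_1
h = X.mean(axis=1)       # Rao-Blackwellized: h(T) = E[X_1 | T] = T/n = Xbar

mse_U = np.mean((U - theta) ** 2)
mse_h = np.mean((h - theta) ** 2)
print(mse_U, mse_h)      # roughly theta*(1-theta) vs theta*(1-theta)/n
```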

The Rao-Blackwell theorem tells us that we should consider functions of a sufficient statistic (if one simpler than X is available). However, we still need to choose the function that provides the best procedure among all functions of the given sufficient statistic. This will be treated in later chapters.
