
Chapter 6. Principles of Data Reduction
Lecture 22: Sufficiency

Data reduction

We consider a sample $X = (X_1,\dots,X_n)$, $n > 1$, from a population of interest (each $X_i$ may be a vector and $X$ may not be a random sample, although most of the time we consider a random sample). Assume the population is indexed by $\theta$, an unknown parameter vector. Let $\mathcal{X}$ be the range of $X$ and let $x$ be an observed data set, a realization of $X$. We want to use the information about $\theta$ contained in $x$. The whole of $x$ may be hard to interpret, and hence we summarize the information by a few key features (statistics), for example, the sample mean, the sample variance, and the largest and smallest order statistics.

Let $T(X)$ be a statistic. If $x \neq y$ but $T(x) = T(y)$, then $x$ and $y$ provide the same information and can be treated as the same. $T$ partitions $\mathcal{X}$ into the sets
\[
A_t = \{x : T(x) = t\}, \qquad t \in \mathcal{T} \ (\text{the range of } T).
\]
All points in $A_t$ are treated the same if we are interested in $T$ only. Thus, $T$ provides a data reduction. We wish to reduce the data as much as we can, but without losing any information about $\theta$ (or at least no important information).

Sufficiency

A sufficient statistic for $\theta$ is a statistic that captures all the information about $\theta$ contained in the sample. Formally, we have the following definition.

Definition 6.2.1 (sufficiency)
A statistic $T(X)$ is sufficient for $\theta$ if the conditional distribution of $X$ given $T(X) = T(x)$ does not depend on $\theta$.

Sufficiency depends on the parameter of interest.

If $X$ is discrete, then so is $T(X)$, and sufficiency means that $P(X = x \mid T(X) = T(x))$ is known, i.e., it does not depend on any unknown quantity. Once we observe $x$ and compute a sufficient statistic $T(x)$, the original data $x$ contain no further information concerning $\theta$ and can be discarded; $T(x)$ is all we need regarding $\theta$. If we do need $x$, we can simulate a sample $y$ from $P(X = y \mid T(X) = T(x))$, since this distribution is known; the simulated $y$ may not be the same as $x$, but $T(y) = T(x)$.

Example 6.2.3 (binomial sufficient statistic)
Suppose that $X_1,\dots,X_n$ are iid Bernoulli variables with probability $\theta$. The joint pmf is
\[
f_\theta(x_1,\dots,x_n) =
\begin{cases}
\prod_{i=1}^n \theta^{x_i}(1-\theta)^{1-x_i} & x_i = 0,1,\ i = 1,\dots,n \\
0 & \text{otherwise.}
\end{cases}
\]
Consider the statistic $T(X) = \sum_{i=1}^n X_i$, the number of ones in $X$. To show that $T$ is sufficient for $\theta$, we compute the conditional probability $P(X = x \mid T = t)$. For $t = 0,1,\dots,n$, let
\[
B_t = \Big\{ x = (x_1,\dots,x_n) : x_i = 0,1,\ \textstyle\sum_{i=1}^n x_i = t \Big\}.
\]
If $x \notin B_t$, then $P(X = x \mid T = t) = 0$. If $x \in B_t$, then
\[
P(X = x, T = t) = P(X = x) = f_\theta(x) = \theta^t (1-\theta)^{n-t}.
\]
Also, since $T \sim \text{binomial}(n,\theta)$,
\[
P(T = t) = \binom{n}{t} \theta^t (1-\theta)^{n-t}.
\]
Then, for $t = 0,1,\dots,n$,
\[
P(X = x \mid T = t) = \frac{P(X = x, T = t)}{P(T = t)} = \frac{1}{\binom{n}{t}}, \qquad x \in B_t,
\]
is a known pmf (it does not depend on $\theta$). Hence $T(X)$ is sufficient for $\theta$.

Any realization $x$ of $X$ is a sequence of $n$ ones and zeros. Since $\theta$ is the probability of a one and $T$ counts the ones in $x$, $T$ carries all the information about $\theta$. Given $T = t$, what is left in the data set $x$ is the redundant information about the positions of the $t$ ones, and we can reproduce an equivalent data set from $T = t$ if we want.

How to find sufficient statistics?
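To see Definition 6.2.1 and Example 6.2.3 concretely, here is a small simulation sketch (my own illustration, not part of the slides; the helper name conditional_dist and the settings $n = 4$, $t = 2$ are chosen here for illustration): for two quite different values of $\theta$, the empirical conditional distribution of $X$ given $T = t$ puts probability roughly $1/\binom{n}{t}$ on every arrangement.

```python
# Simulation sketch of Example 6.2.3 (illustrative, not from the lecture):
# conditionally on T = sum(X) = t, each of the C(n, t) arrangements of
# t ones is equally likely, no matter what theta is.
from collections import Counter
from math import comb
import random

def conditional_dist(theta, n=4, t=2, reps=200_000, seed=0):
    """Empirical pmf of X given T(X) = t when X is iid Bernoulli(theta)."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(reps):
        x = tuple(int(rng.random() < theta) for _ in range(n))
        if sum(x) == t:
            counts[x] += 1
    total = sum(counts.values())
    return {x: c / total for x, c in counts.items()}

for theta in (0.2, 0.7):
    probs = sorted(conditional_dist(theta).values())
    print(f"theta = {theta}: {[round(p, 3) for p in probs]}")
print("theoretical 1 / C(4, 2) =", round(1 / comb(4, 2), 3))
```

Both runs should show the same six probabilities, each near $1/6$, which is exactly the statement that the conditional distribution given $T$ is free of $\theta$.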
To verify by the definition that a statistic $T$ is sufficient for $\theta$, we must verify that, for any fixed value of $x$, the conditional distribution of $X$ given $T(X) = T(x)$ does not depend on $\theta$. This may not be easy, but at least we can try. But how do we find the form of $T$ in the first place? By guessing a statistic $T$ that might be sufficient and computing the conditional distribution of $X$ given $T = t$? For families of populations having pdfs or pmfs, a simple way of finding sufficient statistics is to use the following factorization theorem.

Theorem 6.2.6 (the Factorization Theorem)
Let $f_\theta(x)$ be the joint pdf or pmf of the sample $X$. A statistic $T(X)$ is sufficient for $\theta$ iff there are functions $h$ (which does not depend on $\theta$) and $g_\theta$ (which depends on $\theta$) on the range of $T$ such that
\[
f_\theta(x) = g_\theta\big(T(x)\big)\, h(x).
\]

In the binomial example, $f_\theta(x) = g_\theta(T(x))\, h(x)$ if we set
\[
g_\theta(t) = \theta^t (1-\theta)^{n-t}
\qquad\text{and}\qquad
h(x) =
\begin{cases}
1 & x_i = 0,1,\ i = 1,\dots,n \\
0 & \text{otherwise.}
\end{cases}
\]

Proof of Theorem 6.2.6 for the discrete case.
Suppose that $T(X)$ is sufficient. Let $g_\theta(t) = P_\theta(T(X) = t)$ and $h(x) = P(X = x \mid T(X) = T(x))$. Then
\[
f_\theta(x) = P_\theta(X = x) = P_\theta(X = x, T(X) = T(x))
= P_\theta(T(X) = T(x))\, P(X = x \mid T(X) = T(x))
= g_\theta(T(x))\, h(x).
\]
Suppose now that $f_\theta(x) = g_\theta(T(x))\, h(x)$ for $x \in \mathcal{X}$. Let $q_\theta(t)$ be the pmf of $T(X)$ and $A_x = \{y : T(y) = T(x)\}$. Then, for any $x \in \mathcal{X}$,
\[
\frac{f_\theta(x)}{q_\theta(T(x))}
= \frac{g_\theta(T(x))\, h(x)}{q_\theta(T(x))}
= \frac{g_\theta(T(x))\, h(x)}{P_\theta(T(X) = T(x))}
= \frac{g_\theta(T(x))\, h(x)}{\sum_{y \in A_x} f_\theta(y)}
= \frac{g_\theta(T(x))\, h(x)}{g_\theta(T(x)) \sum_{y \in A_x} h(y)}
= \frac{h(x)}{\sum_{y \in A_x} h(y)},
\]
which does not depend on $\theta$, i.e., $T$ is sufficient for $\theta$.

Example 6.2.4 (normal sufficient statistic)
Let $X_1,\dots,X_n$ be iid $N(\mu, \sigma^2)$, $\theta = (\mu, \sigma^2)$; the joint pdf is
\[
f_\theta(x) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x_i-\mu)^2/2\sigma^2}
= \frac{1}{(2\pi)^{n/2}\sigma^n} \exp\!\left( -\sum_{i=1}^n \frac{(x_i-\mu)^2}{2\sigma^2} \right)
\]
\[
= \frac{1}{(2\pi)^{n/2}\sigma^n} \exp\!\left( -\sum_{i=1}^n \frac{(x_i-\bar{x})^2}{2\sigma^2} - \frac{n(\bar{x}-\mu)^2}{2\sigma^2} \right)
= \frac{1}{(2\pi)^{n/2}\sigma^n} \exp\!\left( -\frac{(n-1)s^2}{2\sigma^2} - \frac{n(\bar{x}-\mu)^2}{2\sigma^2} \right),
\]
where $s^2 = (n-1)^{-1} \sum_{i=1}^n (x_i - \bar{x})^2$ is the realization of the sample variance $S^2 = (n-1)^{-1} \sum_{i=1}^n (X_i - \bar{X})^2$. Hence, by Theorem 6.2.6, $(\bar{X}, S^2)$ is a two-dimensional sufficient statistic for $\theta = (\mu, \sigma^2)$.

If $\sigma^2$ is known, then $\bar{X}$ is sufficient for $\mu$. If $\mu$ is known, then $\sum_{i=1}^n (X_i - \mu)^2$ is sufficient for $\sigma^2$ (note that $S^2$ alone is not, since the factor $\exp\{-n(\bar{x}-\mu)^2/2\sigma^2\}$ still involves both $\bar{x}$ and $\sigma^2$). If both $\mu$ and $\sigma^2$ are unknown, we cannot say that $\bar{X}$ is sufficient for $\mu$ (or that $S^2$ is sufficient for $\sigma^2$); the correct statement is that $\bar{X}$ and $S^2$ together are sufficient for $\mu$ and $\sigma^2$. We can also say that $(\bar{X}, S^2)$ is sufficient for $\mu$ (or for $\sigma^2$).

Sufficiency for a sub-family
Let $\theta$ be a parameter and $\eta$ be a subset of the components of $\theta$. If $T$ is sufficient for $\theta$, then it is also sufficient for $\eta$.

Example 6.2.5 (sufficient order statistics)
Let $X_1,\dots,X_n$ be iid with a pdf $f_\theta$ and let $X_{(1)},\dots,X_{(n)}$ be the order statistics. The joint pdf of $X = (X_1,\dots,X_n)$ is
\[
\prod_{i=1}^n f_\theta(x_i) = \prod_{i=1}^n f_\theta(x_{(i)}),
\]
where $x_{(1)},\dots,x_{(n)}$ are the ordered values of $x_1,\dots,x_n$. Then, by the factorization theorem, $(X_{(1)},\dots,X_{(n)})$ is sufficient for $\theta$. Intuitively, given the order statistics, what is left in the original data set is the information regarding the positions of $x_1,\dots,x_n$; hence, the set of order statistics is sufficient whenever the positions of the $x_i$'s are not of interest.
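A quick numerical companion to Theorem 6.2.6 and Example 6.2.4 (again my own sketch, not from the slides; it assumes numpy and scipy are available, and the helper name standardize_to is hypothetical): if two different data sets share the same value of $(\bar{x}, s^2)$, the factorization forces the likelihood ratio $f_\theta(x)/f_\theta(y) = h(x)/h(y)$ to be constant in $\theta$; for the normal model $h$ is constant, so the log-ratio is exactly zero.

```python
# Numerical check of the Factorization Theorem for the normal model
# (illustrative sketch).  If T(x) = (xbar, s^2) matches for two data
# sets x and y, then f_theta(x)/f_theta(y) = h(x)/h(y) for every theta;
# for N(mu, sigma^2) this ratio is exactly 1.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def standardize_to(z, mean, sd):
    """Rescale z so its sample mean and sample sd equal (mean, sd)."""
    return mean + sd * (z - z.mean()) / z.std(ddof=1)

# two different data sets sharing the same sufficient statistic (xbar, s^2)
x = standardize_to(rng.normal(size=5), mean=1.5, sd=2.0)
y = standardize_to(rng.normal(size=5), mean=1.5, sd=2.0)

for mu, sigma in [(0.0, 1.0), (1.5, 2.0), (-3.0, 0.5)]:
    log_ratio = norm.logpdf(x, mu, sigma).sum() - norm.logpdf(y, mu, sigma).sum()
    print(f"mu = {mu:5.1f}, sigma = {sigma:3.1f}:  log f(x) - log f(y) = {log_ratio:.6f}")
# the log-ratio is 0 for every (mu, sigma): beyond (xbar, s^2), the two
# samples carry no information that distinguishes parameter values
```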
One-to-one transformations of a sufficient statistic
It follows from the factorization theorem that if $T$ is sufficient and $U$ is a one-to-one function of $T$, then $U$ is also sufficient. This is also true in general, directly from the definition of sufficiency.

In the order statistics example, $U = (U_1,\dots,U_n)$ is a one-to-one function of $(X_{(1)},\dots,X_{(n)})$, where $U_k = \sum_{i=1}^n X_i^k$, $k = 1,\dots,n$. Hence, $U$ is also sufficient for $\theta$.

Example 6.2.8 (uniform sufficient statistic)
Let $X_1,\dots,X_n$ be iid from uniform$(0,\theta)$, where $\theta > 0$ is the unknown parameter. The joint pdf of $X_1,\dots,X_n$ is
\[
\prod_{i=1}^n f_\theta(x_i) = \prod_{i=1}^n \frac{1}{\theta}\, I(\{0 < x_i < \theta\}) = \frac{1}{\theta^n}\, I(\{0 < x_{(n)} < \theta\}),
\]
with $x_{(n)}$ being the largest value of $x_1,\dots,x_n$. Hence, by Theorem 6.2.6 with $g_\theta(t) = \theta^{-n} I(\{0 < t < \theta\})$ and $h(x) = 1$, the largest order statistic $X_{(n)}$ is sufficient for $\theta$.
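The uniform example can be checked the same way; a minimal sketch (mine, not from the slides; the function name uniform_likelihood is hypothetical): two data sets with the same maximum have identical likelihoods at every $\theta$, so the data enter only through $x_{(n)}$.

```python
# Minimal sketch of Example 6.2.8 (illustrative): the uniform(0, theta)
# likelihood depends on the data only through the maximum x_(n).
import numpy as np

def uniform_likelihood(x, theta):
    """Joint pdf of an iid uniform(0, theta) sample:
    theta^{-n} on {0 < x_(n) < theta}, and 0 otherwise."""
    x = np.asarray(x)
    return theta ** (-x.size) if 0 < x.max() < theta else 0.0

x = [0.3, 1.9, 0.7, 1.2]   # x_(n) = 1.9
y = [1.9, 1.9, 0.1, 0.5]   # different data, same maximum
for theta in (1.5, 2.0, 3.0):
    print(theta, uniform_likelihood(x, theta), uniform_likelihood(y, theta))
# the two likelihoods agree at every theta, as the factorization
# g_theta(x_(n)) h(x) with h = 1 predicts
```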