
Lecture 14: Inference and the asymptotic approach

Example 2.28
Let X1, ..., Xn be i.i.d. from the N(µ, σ²) distribution with an unknown µ ∈ R and a known σ².
Consider the hypotheses H0: µ ≤ µ0 versus H1: µ > µ0, where µ0 is a fixed constant.
Since the sample mean X̄ is sufficient for µ ∈ R, it is reasonable to consider the following class of tests: Tc(X) = I_(c,∞)(X̄), i.e., H0 is rejected (accepted) if X̄ > c (X̄ ≤ c), where c ∈ R is a fixed constant.
Let Φ be the c.d.f. of N(0,1). By the properties of normal distributions,
α_{Tc}(µ) = P(Tc(X) = 1) = 1 − Φ(√n(c − µ)/σ).
Figure 2.2 provides an example of a graph of the two types of error probabilities, with µ0 = 0.
Since Φ(t) is an increasing function of t,

sup_{P∈P0} α_{Tc}(µ) = 1 − Φ(√n(c − µ0)/σ).
In fact, it is also true that
sup_{P∈P1} [1 − α_{Tc}(µ)] = Φ(√n(c − µ0)/σ).
If we would like to use α as the level of significance, then the most effective way is to choose a c_α (a test T_{c_α}(X)) such that α = sup_{P∈P0} α_{T_{c_α}}(µ), in which case c_α must satisfy
1 − Φ(√n(c_α − µ0)/σ) = α,
i.e., c_α = σz_{1−α}/√n + µ0, where z_a = Φ^{−1}(a).
In Chapter 6, it is shown that for any test T(X) satisfying sup_{P∈P0} α_T(P) ≤ α,
1 − α_T(µ) ≥ 1 − α_{T_{c_α}}(µ),  µ > µ0.

Choice of significance level
The choice of a level of significance α is usually somewhat subjective.
In most applications there is no precise limit on the size of T that can be tolerated.
Standard values, 0.10, 0.05, and 0.01, are often used for convenience.
For most tests satisfying sup_{P∈P0} α_T(P) ≤ α, a small α leads to a "small" rejection region.
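As an illustration (not part of the original example), here is a minimal Python sketch of the construction in Example 2.28, assuming NumPy and SciPy are available; the function names are ours.

import numpy as np
from scipy.stats import norm

def critical_value(mu0, sigma, n, alpha):
    # c_alpha = sigma * z_{1-alpha} / sqrt(n) + mu0, so that
    # sup_{mu <= mu0} alpha_{T_c}(mu) = alpha
    return sigma * norm.ppf(1 - alpha) / np.sqrt(n) + mu0

def reject(x, mu0, sigma, alpha):
    # T_{c_alpha}(X): reject H0 iff the sample mean exceeds c_alpha
    return np.mean(x) > critical_value(mu0, sigma, len(x), alpha)

rng = np.random.default_rng(0)
x = rng.normal(loc=0.5, scale=2.0, size=25)   # true mu = 0.5 > mu0 = 0
print(critical_value(0.0, 2.0, 25, 0.05))     # about 0.658
print(reject(x, 0.0, 2.0, 0.05))              # True means H0 is rejected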

p-value
It is good practice to determine not only whether H0 is rejected for a given α and a chosen test Tα, but also the smallest possible level of significance at which H0 would be rejected for the computed Tα(x), i.e.,
α̂ = inf{α ∈ (0,1) : Tα(x) = 1}.
Such an α̂, which depends on x and the chosen test and is a statistic, is called the p-value for the test Tα.

Example 2.29

Let us calculate the p-value for T_{c_α} in Example 2.28. Note that
α = 1 − Φ(√n(c_α − µ0)/σ) > 1 − Φ(√n(x̄ − µ0)/σ)
if and only if x̄ > c_α (i.e., T_{c_α}(x) = 1). Hence
1 − Φ(√n(x̄ − µ0)/σ) = inf{α ∈ (0,1) : T_{c_α}(x) = 1} = α̂(x)
is the p-value for T_{c_α}. It turns out that T_{c_α}(x) = I_(0,α)(α̂(x)).
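A corresponding Python sketch for this p-value, under the same assumptions as before (NumPy/SciPy; the function name is ours):

import numpy as np
from scipy.stats import norm

def p_value(x, mu0, sigma):
    # p-value of T_{c_alpha}: 1 - Phi(sqrt(n) * (xbar - mu0) / sigma)
    n = len(x)
    return 1 - norm.cdf(np.sqrt(n) * (np.mean(x) - mu0) / sigma)

# H0 is rejected at level alpha exactly when the p-value is below alpha,
# i.e., T_{c_alpha}(x) = I_{(0,alpha)}(p_value(x, mu0, sigma)).
rng = np.random.default_rng(0)
x = rng.normal(loc=0.5, scale=2.0, size=25)
print(p_value(x, 0.0, 2.0))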

Remarks
With the additional information provided by p-values, using p-values is typically more appropriate than using fixed-level tests in a scientific problem.
In some cases, a fixed level of significance is unavoidable when acceptance or rejection of H0 is a required decision.

Randomized tests
In Example 2.28, sup_{P∈P0} α_T(P) = α can always be achieved by a suitable choice of c.
This is, however, not true in general, and we need to consider randomized tests.
Recall that a randomized decision rule is a probability measure δ(x, ·) on the action space for any fixed x.
Since the action space contains only two points, 0 and 1, for a hypothesis testing problem, any randomized test δ(X, A) is equivalent to a statistic T(X) ∈ [0,1] with T(x) = δ(x, {1}) and 1 − T(x) = δ(x, {0}).
A nonrandomized test is obviously a special case where T(x) does not take any value in (0,1).
For any randomized test T(X), we define the type I error probability to be α_T(P) = E[T(X)], P ∈ P0, and the type II error probability to be 1 − α_T(P) = E[1 − T(X)], P ∈ P1.
For a class of randomized tests, we would like to minimize 1 − α_T(P) subject to sup_{P∈P0} α_T(P) = α.

Example 2.30
Assume that the sample X has the binomial distribution Bi(θ, n) with an unknown θ ∈ (0,1) and a fixed integer n > 1.
Consider the hypotheses H0: θ ∈ (0, θ0] versus H1: θ ∈ (θ0, 1), where θ0 ∈ (0,1) is a fixed value.
Consider the following class of randomized tests:
T_{j,q}(X) = 1 if X > j,  q if X = j,  0 if X < j,
where j = 0, 1, ..., n − 1 and q ∈ [0,1].

Then
α_{T_{j,q}}(θ) = P(X > j) + qP(X = j),  0 < θ ≤ θ0,
1 − α_{T_{j,q}}(θ) = P(X < j) + (1 − q)P(X = j),  θ0 < θ < 1.
It can be shown that for any α ∈ (0,1), there exist an integer j and q ∈ (0,1) such that the size of T_{j,q} is α; the sketch below computes such a pair.
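The following Python sketch (our own construction, assuming SciPy's binomial distribution) finds such a pair (j, q). Since the power function of T_{j,q} is increasing in θ, the size is attained at θ0, so it suffices to pick the smallest j with P_{θ0}(X > j) ≤ α and solve for q.

import numpy as np
from scipy.stats import binom

def size_alpha_test(n, theta0, alpha):
    # Find (j, q) with P_{theta0}(X > j) + q * P_{theta0}(X = j) = alpha
    for j in range(n):
        tail = binom.sf(j, n, theta0)          # P_{theta0}(X > j)
        if tail <= alpha:
            q = (alpha - tail) / binom.pmf(j, n, theta0)
            return j, q                        # q lies in [0, 1) here
    raise ValueError("no suitable (j, q) found")

j, q = size_alpha_test(n=10, theta0=0.3, alpha=0.05)
print(j, q)  # T_{j,q}: reject if X > j; reject with probability q if X = j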

Asymptotic approach

In statistical decision theory and inference, a key step is to find the moments and/or distributions of various statistics, which is difficult in general.
When the sample size n is large, we may approximate the moments and distributions of statistics by those of their limiting distributions, using the asymptotic tools discussed in §1.5; this leads to asymptotic statistical procedures and asymptotic criteria for assessing their performance.
The asymptotic approach often also provides a simpler solution (e.g., in computation) and requires less stringent assumptions on the model/loss, which is itself an approximation; for a large sample, the statistical properties of a procedure are less dependent on the particular loss functions and models.
A major weakness of the asymptotic approach is that typically we do not know whether a particular n in a problem is large enough.
To overcome this difficulty, asymptotic results are often used together with

some numerical/empirical studies for selected values of n to examine the finite-sample performance of asymptotic procedures.

Definition 2.10 (Consistency of point estimators)

Let X = (X1, ..., Xn) be a sample from P ∈ P, Tn(X) be an estimator of ϑ for every n, and {a_n} be a sequence of positive constants with a_n → ∞.

(i) Tn(X) is consistent for ϑ iff Tn(X) →p ϑ w.r.t. any P.

(ii) Tn(X) is a_n-consistent for ϑ iff a_n[Tn(X) − ϑ] = O_p(1) w.r.t. any P.

(iii) Tn(X) is strongly consistent for ϑ iff Tn(X) →a.s. ϑ w.r.t. any P.

(iv) Tn(X) is L_r-consistent for ϑ iff Tn(X) →L_r ϑ w.r.t. any P for some fixed r > 0; if r = 2, L_2-consistency is called consistency in mse.

Consistency is actually a concept relating to a sequence of estimators, {Tn}, but we just say "consistency of Tn" for simplicity.
Each of the four types of consistency in Definition 2.10 describes the convergence of Tn(X) to ϑ in some sense, as n → ∞.
A reasonable point estimator is expected to perform better, at least on average, if more data (larger n) are available.

Although the estimation error of Tn for a fixed n may never be 0, it is undesirable to use a Tn that, if sampling were to continue indefinitely, could still have a nonzero estimation error.

Methods of proving consistency
One or a combination of the WLLN, the CLT, Slutsky's theorem, and the continuous mapping theorem (Theorems 1.10 and 1.12) can typically be applied to establish consistency of point estimators.
For example, X̄ is strongly consistent for the population mean µ (SLLN), and g(X̄) is consistent for g(µ) for any continuous function g, as the simulation sketch below illustrates.
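A small simulation sketch of these two consistency statements, assuming NumPy; the choice g(x) = e^x is an arbitrary continuous function of ours.

import numpy as np

rng = np.random.default_rng(1)
mu = 1.5
for n in (10, 1_000, 100_000):
    x = rng.normal(loc=mu, scale=2.0, size=n)
    # Xbar ->p mu (WLLN) and exp(Xbar) ->p exp(mu) (continuous mapping)
    print(n, x.mean(), np.exp(x.mean()))   # -> 1.5 and e^1.5 ~ 4.4817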

Example 2.34
Let X1, ..., Xn be i.i.d. from an unknown P with a continuous c.d.f. F satisfying F(θ) = 1 for some θ ∈ R and F(x) < 1 for any x < θ.
Consider the largest order statistic X(n) as an estimator of θ.
For any ε > 0, F(θ − ε) < 1 and
P(|X(n) − θ| ≥ ε) = P(X(n) ≤ θ − ε) = [F(θ − ε)]^n,
which imply (according to Theorem 1.8(v)) X(n) →a.s. θ, i.e., X(n) is strongly consistent for θ.
If we assume that F^{(i)}(θ−), the ith-order left-hand derivative of F at θ, exists and vanishes for any i ≤ m and that F^{(m+1)}(θ−) exists and is nonzero, where m is a nonnegative integer, then
1 − F(X(n)) = [(−1)^m F^{(m+1)}(θ−)/(m+1)!] (θ − X(n))^{m+1} + o(|θ − X(n)|^{m+1})  a.s.
This result and the fact that P(n[1 − F(X(n))] ≥ s) = (1 − s/n)^n imply that (θ − X(n))^{m+1} = O_p(n^{−1}), i.e., X(n) is n^{(m+1)^{−1}}-consistent.
If m = 0, then X(n) is n-consistent; if m = 1, then X(n) is √n-consistent.
The limiting distribution of (X(n) − θ)/h_n(θ) can be derived as follows. Let
h_n(θ) = [(−1)^m (m+1)!/(n F^{(m+1)}(θ−))]^{(m+1)^{−1}}.
For t ≤ 0, by Slutsky's theorem,
lim_{n→∞} P((X(n) − θ)/h_n(θ) ≤ t) = lim_{n→∞} P((θ − X(n))/h_n(θ) ≥ −t)
= lim_{n→∞} P([(θ − X(n))/h_n(θ)]^{m+1} ≥ (−t)^{m+1})

= lim_{n→∞} P(n[1 − F(X(n))] ≥ (−t)^{m+1})
= lim_{n→∞} [1 − (−t)^{m+1}/n]^n = e^{−(−t)^{m+1}}.
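A simulation sketch of this limit in the simplest case m = 0 (our illustration, assuming NumPy): for Xi i.i.d. U(0, θ) we have F^{(1)}(θ−) = 1/θ, so h_n(θ) = θ/n, and the limiting c.d.f. e^t on t ≤ 0 is that of the negative of a standard exponential variable.

import numpy as np

rng = np.random.default_rng(2)
theta, n, reps = 3.0, 1_000, 200_000
# P(X_(n) <= x) = (x/theta)^n, so X_(n) = theta * U^{1/n} with U ~ U(0,1)
x_max = theta * rng.uniform(size=reps) ** (1.0 / n)
z = (x_max - theta) / (theta / n)        # (X_(n) - theta) / h_n(theta)
# the limiting c.d.f. is e^t for t <= 0, i.e., z behaves like -Exp(1)
print(z.mean(), z.var())                 # -> about -1 and 1
print(np.mean(z <= -1.0), np.exp(-1.0))  # empirical c.d.f. at -1 vs e^{-1}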

Consistency is an essential requirement
Like admissibility, consistency is an essential requirement: inconsistent estimators should not be used. But there are many consistent estimators, and some of them may not be good; thus, consistency should be used together with other criteria.

Approximate and asymptotic bias
Unbiasedness is a criterion for point estimators (§2.3.2).
In some cases, however, there is no unbiased estimator.
Furthermore, having a "slight" bias in some cases may not be a bad idea.

For a point estimator Tn(X) of ϑ, if E(Tn) exists for every n and lim_{n→∞} E(Tn − ϑ) = 0 for any P ∈ P, then Tn is said to be approximately unbiased.
There are, however, many reasonable point estimators whose expectations are not well defined, so it is desirable to define a concept of asymptotic bias that applies to such estimators as well.

Definition 2.11
(i) Let ξ, ξ1, ξ2, ... be random variables and {a_n} be a sequence of positive numbers satisfying a_n → ∞ or a_n → a > 0. If a_n ξ_n →d ξ and E|ξ| < ∞, then Eξ/a_n is called an asymptotic expectation of ξ_n.
(ii) For a point estimator Tn of ϑ, an asymptotic expectation of Tn − ϑ, if it exists, is called an asymptotic bias of Tn and denoted by b̃_{Tn}(P) (or b̃_{Tn}(θ) if P is in a parametric family).
If lim_{n→∞} b̃_{Tn}(P) = 0 for any P, then Tn is asymptotically unbiased.
Like consistency, the asymptotic expectation (or bias) is a concept relating to the sequences {ξ_n} and {Eξ/a_n} (or {Tn} and {b̃_{Tn}(P)}).

Proposition 2.3 (asymptotic expectation is essentially unique)

For a sequence of random variables {ξ_n}, suppose that both Eξ/a_n and Eη/b_n are asymptotic expectations of ξ_n as defined in Definition 2.11(i). Then one of the following three must hold:
(a) Eξ = Eη = 0;
(b) Eξ ≠ 0, Eη = 0, and b_n/a_n → 0;
(c) Eξ ≠ 0, Eη ≠ 0, and (Eξ/a_n)/(Eη/b_n) → 1.

If Tn is consistent for ϑ, then Tn = ϑ + o_p(1) and Tn is asymptotically unbiased, although Tn may not be approximately unbiased.

Precise order of asymptotic bias
When a_n(Tn − ϑ) →d Y with EY = 0 (e.g., Tn = X̄² and ϑ = µ² in Example 2.33), the asymptotic bias of Tn is 0.

A more precise order of the asymptotic bias of Tn may be obtained (for comparing different estimators in terms of their asymptotic biases).
In Example 2.34, X(n) has the asymptotic bias b̃_{X(n)}(P) = h_n(θ)EY, which is of order n^{−(m+1)^{−1}}.
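Continuing the U(0, θ) illustration (m = 0, our own example): the limit Y has c.d.f. e^t for t ≤ 0, so EY = −1 and the asymptotic bias is h_n(θ)EY = −θ/n, which a simulation can compare with the exact bias E[X(n)] − θ = −θ/(n + 1).

import numpy as np

rng = np.random.default_rng(3)
theta, n, reps = 3.0, 200, 2_000_000
x_max = theta * rng.uniform(size=reps) ** (1.0 / n)  # X_(n) for U(0, theta)
print(x_max.mean() - theta)   # simulated bias of X_(n)
print(-theta / n)             # asymptotic bias h_n(theta) EY = -theta / n
print(-theta / (n + 1))       # exact bias E[X_(n)] - theta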

Suppose that there is a sequence of random variables {η_n} such that a_n η_n →d Y and a_n²(Tn − ϑ − η_n) →d W, where Y and W are random variables with EY = 0 and EW ≠ 0.
Then we may define a_n^{−2} to be the order of b̃_{Tn}(P), or define EW/a_n² to be the a_n^{−2} order asymptotic bias of Tn.

However, η_n may not be unique: some conditions have to be imposed so that the order of the asymptotic bias of Tn can be uniquely defined.

Functions of sample means

We consider the case where X1, ..., Xn are i.i.d. random k-vectors with finite Σ = Var(X1), and Tn = g(X̄), where X̄ = n^{−1} ∑_{i=1}^n Xi and g is a function on R^k that is second-order differentiable at µ = EX1 ∈ R^k.
Consider Tn as an estimator of ϑ = g(µ). By Taylor's expansion,
Tn − ϑ = [∇g(µ)]^τ (X̄ − µ) + 2^{−1} (X̄ − µ)^τ ∇²g(µ) (X̄ − µ) + o_p(n^{−1}),
where ∇g is the k-vector of partial derivatives of g and ∇²g is the k × k matrix of second-order partial derivatives of g.
By the CLT and Theorem 1.10(iii),
2^{−1} n (X̄ − µ)^τ ∇²g(µ) (X̄ − µ) →d 2^{−1} Z_Σ^τ ∇²g(µ) Z_Σ,
where Z_Σ = N_k(0, Σ). Thus,
E[Z_Σ^τ ∇²g(µ) Z_Σ]/(2n) = tr(∇²g(µ)Σ)/(2n)
is the n^{−1} order asymptotic bias of Tn = g(X̄), where tr(A) denotes the trace of the matrix A.
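A direct numerical sketch of this formula (assuming NumPy; the function name is ours). With k = 1 and g(x) = x², so that Tn = X̄² estimates ϑ = µ² as in Example 2.33, the formula gives tr(2σ²)/(2n) = σ²/n, which matches the exact bias E[X̄²] − µ² = Var(X̄) = σ²/n.

import numpy as np

def second_order_bias(hess_g_mu, Sigma, n):
    # n^{-1} order asymptotic bias of g(Xbar): tr(grad^2 g(mu) Sigma) / (2n)
    return np.trace(hess_g_mu @ Sigma) / (2 * n)

# k = 1, g(x) = x^2: the Hessian is [[2]] and Sigma = [[sigma^2]]
sigma2, n = 4.0, 50
print(second_order_bias(np.array([[2.0]]), np.array([[sigma2]]), n))  # 0.08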

Example 2.35
Let X1, ..., Xn be i.i.d. binary random variables with P(Xi = 1) = p, where p ∈ (0,1) is unknown.
Consider first the estimation of ϑ = p(1 − p).
Since Var(X̄) = p(1 − p)/n, the n^{−1} order asymptotic bias of Tn = X̄(1 − X̄), according to the formula tr(∇²g(µ)Σ)/(2n) with g(x) = x(1 − x), is −p(1 − p)/n.
On the other hand, a direct computation shows
E[X̄(1 − X̄)] = EX̄ − EX̄² = p − (EX̄)² − Var(X̄) = p(1 − p) − p(1 − p)/n.
The exact bias of Tn is the same as the n^{−1} order asymptotic bias.
Consider next the estimation of ϑ = p^{−1}.
There is no unbiased estimator of p^{−1} (Exercise 84 in §2.6).
Let Tn = X̄^{−1}.
Then the n^{−1} order asymptotic bias of Tn, according to the formula tr(∇²g(µ)Σ)/(2n) with g(x) = x^{−1}, is (1 − p)/(p²n).
On the other hand, ETn = ∞ for every n.
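A simulation check of the first part of Example 2.35 (assuming NumPy):

import numpy as np

rng = np.random.default_rng(4)
p, n, reps = 0.3, 50, 1_000_000
xbar = rng.binomial(n, p, size=reps) / n
tn = xbar * (1 - xbar)             # Tn = Xbar(1 - Xbar)
print(tn.mean() - p * (1 - p))     # simulated bias
print(-p * (1 - p) / n)            # exact = n^{-1} order asymptotic bias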

Like the bias, the mse of an estimator Tn of ϑ, mse_{Tn}(P) = E(Tn − ϑ)², is not well defined if the second moment of Tn does not exist.
We now define a version of asymptotic mean squared error (amse) and a measure for assessing different estimators of a parameter.

Definition 2.12 (asymptotic variance and amse)

Let Tn be an estimator of ϑ for every n and {a_n} be a sequence of positive numbers satisfying a_n → ∞ or a_n → a > 0.
Assume that a_n(Tn − ϑ) →d Y with 0 < EY² < ∞.

(i) The asymptotic mean squared error of Tn, denoted by amse_{Tn}(P) (or amse_{Tn}(θ) if P is in a parametric family indexed by θ), is defined as the asymptotic expectation of (Tn − ϑ)², i.e., amse_{Tn}(P) = EY²/a_n².
The asymptotic variance of Tn is defined as σ²_{Tn}(P) = Var(Y)/a_n².
(ii) Let T′n be another estimator of ϑ.
The asymptotic relative efficiency of T′n w.r.t. Tn is defined as e_{T′n,Tn}(P) = amse_{Tn}(P)/amse_{T′n}(P).
(iii) Tn is said to be asymptotically more efficient than T′n iff limsup_n e_{T′n,Tn}(P) ≤ 1 for any P and < 1 for some P.

The amse and the asymptotic variance are the same iff EY = 0.
In Example 2.33, amse_{X̄²}(P) = σ²_{X̄²}(P) = 4µ²σ²/n.
In Example 2.34, σ²_{X(n)}(P) = [h_n(θ)]² Var(Y) and amse_{X(n)}(P) = [h_n(θ)]² EY².
When both mse_{Tn}(P) and mse_{T′n}(P) exist, one may compare Tn and T′n by evaluating the relative efficiency mse_{Tn}(P)/mse_{T′n}(P).
However, this comparison may be different from the one based on the asymptotic relative efficiency in Definition 2.12(ii) (Exercise 115).

The following result shows that when the exact mse of Tn exists, it is no smaller than the amse of Tn, and it characterizes when the two are the same.

Proposition 2.4

Let Tn be an estimator of ϑ for every n and {a_n} be a sequence of positive numbers satisfying a_n → ∞ or a_n → a > 0.
If a_n(Tn − ϑ) →d Y with 0 < EY² < ∞, then
(i) EY² ≤ liminf_n E[a_n²(Tn − ϑ)²], and
(ii) EY² = lim_{n→∞} E[a_n²(Tn − ϑ)²] if and only if {a_n²(Tn − ϑ)²} is uniformly integrable.

Example 2.36

Let X1, ..., Xn be i.i.d. from the Poisson distribution P(θ) with an unknown θ > 0.
Consider the estimation of ϑ = P(Xi = 0) = e^{−θ}.

Let T1n = Fn(0), where Fn is the empirical c.d.f.
Then T1n is unbiased and has mse_{T1n}(θ) = e^{−θ}(1 − e^{−θ})/n.
Also, √n(T1n − ϑ) →d N(0, e^{−θ}(1 − e^{−θ})) by the CLT.

Thus, in this case amse_{T1n}(θ) = mse_{T1n}(θ).
Consider next T2n = e^{−X̄}.
Note that ET2n = e^{nθ(e^{−1/n} − 1)}, and hence n b̃_{T2n}(θ) → θe^{−θ}/2.
Using Theorem 1.12 and the CLT, we can show that √n(T2n − ϑ) →d N(0, e^{−2θ}θ).
By Definition 2.12(i), amse_{T2n}(θ) = e^{−2θ}θ/n.
Thus, the asymptotic relative efficiency of T1n w.r.t. T2n is
e_{T1n,T2n}(θ) = θ/(e^θ − 1) < 1.
This shows that T2n is asymptotically more efficient than T1n, as the simulation sketch below also suggests.
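A simulation sketch of Example 2.36 (assuming NumPy), comparing n times the empirical mse of T1n and T2n with n times their amse's:

import numpy as np

rng = np.random.default_rng(5)
theta, n, reps = 1.0, 100, 100_000
x = rng.poisson(theta, size=(reps, n))
vartheta = np.exp(-theta)
t1 = (x == 0).mean(axis=1)                 # T1n = Fn(0)
t2 = np.exp(-x.mean(axis=1))               # T2n = exp(-Xbar)
print(n * ((t1 - vartheta) ** 2).mean(),   # ~ e^{-theta}(1 - e^{-theta})
      np.exp(-theta) * (1 - np.exp(-theta)))
print(n * ((t2 - vartheta) ** 2).mean(),   # ~ theta e^{-2 theta}
      np.exp(-2 * theta) * theta)
print(theta / (np.exp(theta) - 1))         # ARE of T1n w.r.t. T2n, < 1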
