A Cheap Trick to Improve the Power of a Conservative Hypothesis Test


Thomas J. Fisher (Department of Statistics, Miami University, Oxford, OH) and Michael W. Robbins (RAND Corporation, Pittsburgh, PA)

The American Statistician, 2019, Vol. 73, No. 3, 232–242. DOI: 10.1080/00031305.2017.1395364 (https://doi.org/10.1080/00031305.2017.1395364)

ARTICLE HISTORY: Received March; revised September. Accepted author version posted online: 14 Nov 2017. Published online: 17 Jul 2018.

KEYWORDS: Asymptotic; Bootstrap; Conservative; Hypothesis test; Logarithmic transformation; Power; Size distortion

ABSTRACT

Critical values and p-values of statistical hypothesis tests are often derived using asymptotic approximations of sampling distributions. However, this sometimes results in tests that are conservative (i.e., understate the frequency of an incorrectly rejected null hypothesis by employing too stringent a threshold for rejection). Although computationally rigorous options (e.g., the bootstrap) are available for such situations, we illustrate that simple transformations can be used to improve both the size and power of such tests. Using a logarithmic transformation, we show that the transformed statistic is asymptotically equivalent to its untransformed analogue under the null hypothesis and is divergent from the untransformed version under the alternative (yielding a potentially substantial increase in power). The transformation is applied to several easily accessible statistical hypothesis tests, a few of which are taught in introductory statistics courses. With theoretical arguments and simulations, we illustrate that the log transformation is preferable to other forms of correction (such as statistics that use a multiplier). Finally, we illustrate application of the method to a well-known dataset. Supplementary materials for this article are available online.

1. Introduction

Hypothesis testing has a rich and extensive history budding from astronomy, finance, genetics, and the social sciences (see Stigler 1986). From its foundations in the trial of the Pyx at the Royal Mint of London in the 13th century (see Stigler 1999, chap. 21), through its early probabilistic and mathematical development by Bernoulli, Euler, Gauss, Laplace, Legendre, and Markov (see Hald 2007, chap. 3–4), to the indispensable results of the early 20th century (consider Student 1908; Pearson 1900, to name a few), the hypothesis test has revolutionized the practice of modern science. The formulation of the modern statistical test can be traced to the competing philosophies of Fisher (1925) and Neyman and Pearson (1933), and a convolution of the two approaches is standard practice today; see Lehmann (1999).

Key to the implementation of a statistical hypothesis test is the sampling distribution of the test statistic. Even with the advent of the bootstrap (Efron 1979) and the practicality of Bayesian methods due to the evolution of computing, many statistical results are, in practice, still based on theoretical sampling distributions. In many cases, this distribution is approximated using an asymptotic result (e.g., a central limit theorem), and in finite samples the critical values and p-values are approximated from the asymptotic distribution. However, the practice of using asymptotic approximations of sampling distributions sometimes results in statistical tests that are conservative (i.e., have a smaller than desired rate of rejections of a true null hypothesis). Conservative tests frequently arise when the statistic is derived from a point process that has a limit distribution based on a continuous stochastic process, such as the Kolmogorov–Smirnov test (see Lilliefors 1967; Hollander and Wolfe 1999). Likewise, many statistics based on normal theory (the ANOVA F-test, e.g.) tend to be conservative when the underlying data have a distribution with larger tails than the normal distribution (see Pearson 1931; Glass, Peckham, and Sanders 1972). Furthermore, the Wald test included in nearly all statistical software for generalized linear models is known to be overly conservative for logistic regression (see Hauck and Donner 1977; Jennings 1986; Hirji, Mehta, and Patel 1987; Cox and Snell 1989, to name a few).

The reduction in the rate of Type I errors seen in conservative hypothesis tests has the adverse side effect of a reduction in power to detect a false null hypothesis. Therefore, corrections for this issue are of interest, and consequently methods exist that improve the performance of asymptotic approximations in finite samples. Consider, for example, the Edgeworth expansion (Hall 1992), which modifies the asymptotic distribution for finite samples by including higher-order moments (skewness and kurtosis); however, this method may require exorbitant algebraic results (the requisite theory has not been developed for many statistics that are conservative). Further, bootstrapping is a popular method wherein a sampling distribution is approximated via a resampling scheme (from the observed data in the form of a nonparametric bootstrap, or via simulation in the form of a parametric bootstrap), but in many applications this method requires a practitioner to implement the algorithm and can mandate a substantial computational cost (Efron 1979).

Therefore, easily applicable and general methods for correcting conservative test statistics are worth exploring. In this article, we propose a simple transformation that, when applied to a test statistic, will increase its detection power under the alternative hypothesis while retaining the same sampling distribution (asymptotically) under the null. The transformation is argued to improve performance under the null hypothesis when applied to statistics that are conservative; however, it will likely lead to an undesirable rate of Type I error if applied to statistics that are not conservative. In Section 2, we develop our result and relate it to statistical practice, uniformly most powerful (UMP) tests, and the standard undergraduate curriculum. With some commonly used test statistics as motivating examples, Section 3 provides simulations demonstrating the potential increase in power yielded by our method and comparisons to resampling techniques. Our approach is applied to an interesting empirical dataset in Section 4.

We follow standard statistical notation throughout this article. H_0 and H_1 represent the null and alternative hypotheses of a statistical test, respectively. The probability of a Type I error (rejecting H_0 when true) is labeled α, while the probability of a Type II error (failing to reject H_0 when false) is denoted β. The power of a statistical test is 1 − β (the probability of rejecting H_0 when false).

2. Test Statistics

Let X_n = {X_1, X_2, ..., X_n} be a sample and T_n = T_n(X_n) denote a statistic for testing the competing hypotheses H_0 and H_1. Furthermore, assume the following:

(a) T_n is strictly nonnegative: P(T_n ≥ 0) = 1;
(b) when H_0 is true, T_n = O_p(1) (likewise, T_n has a limit distribution);
(c) when H_1 is true, T_n = O_p(n^κ) for some κ > 0; that is, T_n diverges to +∞ at rate n^κ;

where O_p(·) represents order in probability (i.e., X_n = O_p(a_n) means that for any ε > 0 there exists a finite M > 0 such that P(|X_n/a_n| > M) < ε for all n). Note that, as a consequence of (c), H_0 is rejected for large values of T_n. Many standard statistical methods satisfy these assumptions (the ANOVA F-test and the Pearson χ² goodness-of-fit test, to name a few). For many commonly used tests (e.g., statistics that have a χ² limit distribution), it holds that κ = 1; however, several observe κ = 1/2 (e.g., z-tests). This article focuses on circumstances where the finite-sample distribution of the test statistic is not understood and where critical values are approximated (typically using asymptotic distributions), which may lead to unreliable performance in finite samples. Next, we illustrate how transformations can be used to improve finite-sample performance.

For a given statistic T_n satisfying the assumptions above, we propose the following modified test statistic:

    T_n^* = −n^κ log(1 − T_n / n^κ).    (1)

The following two theorems demonstrate that T_n^* shares the same asymptotic distribution as T_n under the null hypothesis but is more likely to detect the alternative hypothesis when it is true (i.e., has greater power).

Theorem 2.1. When H_0 is true, T_n^* →_p T_n as n → ∞; moreover, T_n^* and T_n share the same asymptotic distribution.

Proof. Consider

    T_n^* = −n^κ log(1 − T_n/n^κ)
          = n^κ [ T_n/n^κ + (1/2)(T_n/n^κ)² + (1/3)(T_n/n^κ)³ + ··· ]
          = T_n + T_n²/(2n^κ) + T_n³/(3n^{2κ}) + ···
          = T_n + A_n.    (2)

When H_0 is true, we see A_n = O_p(n^{−κ}) from assumption (b), whence T_n^* →_p T_n, and T_n^* shares the same asymptotic distribution as T_n from standard convergence results.

Theorem 2.2. When H_1 is true, T_n^* diverges from T_n and will be more powerful than T_n if decisions are based on the same critical values.

Proof. Consider A_n in (2). If H_1 is true, 0 ≤ A_n = O_p(n^κ) by assumption (c). It follows that for all c, P(T_n^* > c) ≥ P(T_n > c),
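The transformation in Equation (1) and the monotonicity underlying Theorem 2.2 can be sketched numerically. The example below is a hypothetical illustration, not one of the paper's simulations: it applies the log transformation to Pearson's χ² goodness-of-fit statistic (κ = 1) for a six-sided die; the sample size, cell probabilities, and critical value are choices made here for demonstration only.

```python
import numpy as np

rng = np.random.default_rng(7)

def log_transform(t, n, kappa=1.0):
    # Eq. (1): T*_n = -n^kappa * log(1 - T_n / n^kappa).
    # Valid when T_n < n^kappa; under H0 this holds with probability
    # tending to one, since T_n = O_p(1).
    return -(n ** kappa) * np.log(1.0 - t / n ** kappa)

# Hypothetical setting: chi-squared goodness-of-fit test of a fair die.
n, reps = 60, 20_000
crit = 11.0705                      # chi-squared(5) upper 5% critical value

def rejection_rates(p):
    counts = rng.multinomial(n, p, size=reps)              # reps x 6 tables
    t = ((counts - n / 6.0) ** 2 / (n / 6.0)).sum(axis=1)  # Pearson X^2
    t_star = log_transform(t, n)                           # transformed statistic
    return (t > crit).mean(), (t_star > crit).mean()

# Rejection rates at the same critical value: size under H0, power under H1.
size_t, size_tstar = rejection_rates(np.full(6, 1.0 / 6.0))
power_t, power_tstar = rejection_rates([0.25, 0.15, 0.15, 0.15, 0.15, 0.15])
```

Because −log(1 − x) ≥ x on [0, 1), the transformed statistic never falls below the original, so at a common critical value its rejection rate is at least as large, under both hypotheses; this is the size/power trade-off the article analyzes for statistics that start out conservative.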
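For contrast with the computational route the introduction mentions, a parametric bootstrap for the same kind of statistic can be sketched as follows. The observed counts and all numerical settings here are invented for illustration; the point is only that the bootstrap demands simulation effort per analysis, whereas the log transformation is a closed-form adjustment.

```python
import numpy as np

rng = np.random.default_rng(11)

def pearson_chi2(counts, n, k=6):
    # Pearson goodness-of-fit statistic against equal cell probabilities.
    expected = n / k
    return ((counts - expected) ** 2 / expected).sum(axis=-1)

n, k, B = 60, 6, 5_000
observed = np.array([16, 13, 6, 12, 5, 8])   # hypothetical die-roll counts
t_obs = pearson_chi2(observed, n)

# Parametric bootstrap: simulate B datasets from the null model (a fair
# die) and recompute the statistic on each to approximate its null
# sampling distribution, then read off a p-value.
null_draws = pearson_chi2(rng.multinomial(n, np.full(k, 1.0 / k), size=B), n)
p_value = (null_draws >= t_obs).mean()
```

With B = 5,000 replicates this already costs thousands of simulated datasets for a single p-value, which is the "substantial computational cost" the text attributes to bootstrap corrections.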