Hypothesis Testing

Total Pages: 16

File Type: PDF, Size: 1020 KB

Hypothesis Testing

Public Health & Intelligence

Document Control

Version: 0.4
Date Issued: 29/11/2018
Authors: David Carr, Nicole Jarvie
Comments to: [email protected] or [email protected]

Version | Date       | Comment                                                   | Authors
0.1     | 24/08/2018 | 1st draft                                                 | David Carr, Nicole Jarvie
0.2     | 11/09/2018 | 1st draft with formatting                                 | David Carr, Nicole Jarvie
0.3     | 17/10/2018 | Final draft with changes from Statistical Advisory Group  | David Carr, Nicole Jarvie
0.4     | 29/11/2018 | Final version                                             | David Carr, Nicole Jarvie

Acknowledgements

The authors would like to acknowledge Prof. Chris Robertson and colleagues at the University of Strathclyde for allowing the use of the data for the examples in this paper. The simulated HAI data sets used in the worked examples were originally created by the Health Protection Scotland SHAIPI team in collaboration with the University of Strathclyde.

Table of Contents

1 Introduction
2 Constructing a Hypothesis Test
  2.1 Defining the Hypotheses
    2.1.1 Null and Alternative Hypotheses
    2.1.2 One-Sided and Two-Sided Tests
  2.2 Significance Level and Statistical Power
    2.2.1 Significance Level
    2.2.2 Statistical Power
    2.2.3 Type I and Type II Error
  2.3 Test Statistic
  2.4 Rejection Region
  2.5 Determining Statistical Significance
    2.5.1 P-values
    2.5.2 Confidence Intervals
    2.5.3 Comparing Results from One- and Two-Sided Tests
  2.6 Multiple Comparisons
    2.6.1 The Bonferroni Correction
  2.7 General Framework for Hypothesis Testing
3 T-tests
  3.1 One-Sample t-test
    3.1.1 Example
  3.2 Two-Sample t-test
    3.2.1 Example
  3.3 Paired t-test
4 Non-Parametric Tests
  4.1 Wilcoxon Signed-Ranks Test
    4.1.1 Example
  4.2 Mann-Whitney U-test
    4.2.1 Example
5 Chi-Squared Tests
  5.1 Goodness-of-Fit Test
    5.1.1 Example
  5.2 Test of Independence
    5.2.1 Example
6 Proportion Tests
  6.1 One-Sample Test
    6.1.1 Example
  6.2 Two-Sample Test
    6.2.1 Example
7 F-tests
  7.1 F-test for Equality of Variances
    7.1.1 Example
  7.2 F-test for Comparing Linear Regression Models
    7.2.1 Example
Bibliography
Appendices
  A Further Detail on Hypothesis Testing
  B Further Detail on F-test for Comparing Linear Regression Models
  C R Code for Examples
  D SPSS Syntax for Examples
  E SPSS Output for Examples

1 Introduction

The purpose of this paper is to outline the theory behind hypothesis testing and to demonstrate how hypothesis testing can be used as part of a range of statistical methods. The paper will address the following statistical methods in the context of hypothesis testing: t-tests, non-parametric tests (the Wilcoxon Signed-Ranks test and the Mann-Whitney U-test), chi-squared tests, proportion tests and F-tests. Some preliminary mathematical and statistical knowledge is assumed.

Statistical hypothesis testing is about comparing two contradictory statements about one or more datasets and deciding which one is 'correct'. For example, if an analyst was investigating whether there was a difference in the average A&E waiting time between the Glasgow Royal Infirmary and the Royal Infirmary of Edinburgh, there are only two possible outcomes: either there is statistical evidence of a difference or there is not.

Statistical tests that are based in hypothesis testing usually involve comparing one dataset to another. The objective is often to see if there is any statistically significant difference between the datasets based on a statistic of interest (e.g. mean, median). Hypothesis testing theory is relevant here as the analyst is essentially investigating whether there is evidence of a difference and, if not, then concluding that there is no evidence of a difference.

Most of the tests discussed in this paper are examples of 'univariate' analysis, where only one variable of interest can be considered. If a multivariate analysis is required, where multiple contributing factors are taken into consideration, then regression modelling is usually more appropriate.

Table 1 summarises the tests that will be discussed in this paper. The theory behind hypothesis testing will be addressed first, before going on to discuss how the hypothesis-based tests in Table 1 can be used in practice, including showing examples in R (the equivalent SPSS syntax and output are shown in Appendices D and E, respectively).

Table 1: Summary of hypothesis tests

Test              | When to Use                                                          | Major Restrictions on Use
One-sample t-test | Comparing whether the mean of a dataset differs from a hypothesised value | Data must be normally-distributed (Chapter
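Before the detailed chapters, the short sketch below illustrates the general shape of such a test in R, using the one-sample t-test from Table 1. The simulated waiting times and the hypothesised mean of 240 minutes are placeholder assumptions for illustration only, not the paper's HAI data or the worked example of Chapter 3.

    # A minimal one-sample t-test in R (simulated placeholder data).
    # H0: mean A&E waiting time = 240 minutes; H1: the mean differs from 240.
    set.seed(42)
    wait_times <- rnorm(30, mean = 250, sd = 40)  # assumed sample of 30 waiting times

    result <- t.test(wait_times, mu = 240, alternative = "two.sided", conf.level = 0.95)
    result$statistic  # the test statistic
    result$p.value    # compared against the chosen significance level (e.g. 0.05)
    result$conf.int   # 95% confidence interval for the mean

The same ingredients, a pair of hypotheses, a significance level, a test statistic and a p-value, recur in every test in Table 1.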
Recommended publications
  • Summarize — Summary Statistics
    Title
    stata.com
    summarize — Summary statistics

    Description   Quick start   Menu   Syntax   Options   Remarks and examples   Stored results   Methods and formulas   References   Also see

    Description

    summarize calculates and displays a variety of univariate summary statistics. If no varlist is specified, summary statistics are calculated for all the variables in the dataset.

    Quick start

    Basic summary statistics for continuous variable v1:
        summarize v1
    Same as above, and include v2 and v3:
        summarize v1-v3
    Same as above, and provide additional detail about the distribution:
        summarize v1-v3, detail
    Summary statistics reported separately for each level of catvar:
        by catvar: summarize v1
    With frequency weight wvar:
        summarize v1 [fweight=wvar]

    Menu

    Statistics > Summaries, tables, and tests > Summary and descriptive statistics > Summary statistics

    Syntax

        summarize [varlist] [if] [in] [weight] [, options]

    options          Description
    detail           display additional statistics
    meanonly         suppress the display; calculate only the mean; programmer's option
    format           use variable's display format
    separator(#)     draw separator line after every # variables; default is separator(5)
    display options  control spacing, line width, and base and empty cells

    varlist may contain factor variables; see [U] 11.4.3 Factor variables. varlist may contain time-series operators; see [U] 11.4.4 Time-series varlists. by, collect, rolling, and statsby are allowed; see [U] 11.1.10 Prefix commands. aweights, fweights, and iweights are allowed. However, iweights may not be used with the detail option; see [U] 11.1.6 weight.

    Options

    detail produces additional statistics, including skewness, kurtosis, the four smallest and four largest values, and various percentiles. meanonly, which is allowed only when detail is not specified, suppresses the display of results and calculation of the variance.
  • U3 Introduction to Summary Statistics
    Introduction to Summary Statistics

    Statistics: the collection, evaluation, and interpretation of data. Statistical analysis of measurements can help verify the quality of a design or process.

    Summary statistics describe:
    Central tendency: the "center" of a distribution (mean, median, mode)
    Variation: the spread of values around the center (range, standard deviation, interquartile range)
    Distribution: a summary of the frequency of values (frequency tables, histograms, normal distribution)

    Mean (central tendency)

    The mean is the sum of the values of a set of data divided by the number of values in that data set:

        μ = (Σ xᵢ) / N

    where μ is the mean value, xᵢ is an individual data value, Σ xᵢ is the summation of all data values, and N is the number of data values in the data set.

    Example. Data set: 3 7 12 17 21 21 23 27 32 36 44. Sum of the values = 243; number of values = 11; mean = μ = 243 / 11 = 22.09.

    A note about rounding in statistics. General rule: don't round until the final answer. If you are writing intermediate results you may round values, but keep the unrounded number in memory. Round the mean to one more decimal place than the original data; round the standard deviation to one more decimal place than the original data. Reported: Mean = 22.1.
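    As an added illustration (not part of the original slides), the same calculation in R, using the data set from the example above:

        # Data set from the example above
        x <- c(3, 7, 12, 17, 21, 21, 23, 27, 32, 36, 44)
        sum(x)        # 243
        length(x)     # 11
        m <- mean(x)  # 22.0909..., kept unrounded until reporting
        round(m, 1)   # 22.1: one more decimal place than the original data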
  • Descriptive Statistics
    Descriptive Statistics

    Fall 2001, Professor Paul Glasserman
    B6014: Managerial Statistics, 403 Uris Hall

    Histograms

    1. A histogram is a graphical display of data showing the frequency of occurrence of particular values or ranges of values. In a histogram, the horizontal axis is divided into bins, representing possible data values or ranges. The vertical axis represents the number (or proportion) of observations falling in each bin. A bar is drawn in each bin to indicate the number (or proportion) of observations corresponding to that bin. You have probably seen histograms used, e.g., to illustrate the distribution of scores on an exam.

    2. All histograms are bar graphs, but not all bar graphs are histograms. For example, we might display average starting salaries by functional area in a bar graph, but such a figure would not be a histogram. Why not? Because the Y-axis values do not represent relative frequencies or proportions, and the X-axis values do not represent value ranges (in particular, the order of the bins is irrelevant).

    Measures of Central Tendency

    1. Let X₁, ..., Xₙ be data points, such as the result of n measurements or observations. What one number best characterizes or summarizes these values? The answer depends on the context. Some familiar summary statistics are these:

    The mean is given by the arithmetic average X̄ = (X₁ + ··· + Xₙ)/n. (Notation: we will often write Σᵢ₌₁ⁿ Xᵢ for X₁ + ··· + Xₙ. The symbol Σᵢ₌₁ⁿ Xᵢ is read "the sum from i equals 1 up to n of Xᵢ.")

    The median is larger than one half of the observations and smaller than the other half.
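    A short added R sketch of these ideas, with made-up exam scores (an illustration, not from the original notes):

        # Hypothetical exam scores
        scores <- c(52, 61, 64, 68, 70, 71, 73, 75, 78, 80, 84, 91)
        hist(scores, xlab = "Score", ylab = "Frequency",
             main = "Distribution of exam scores")  # bins on the x-axis, counts on the y-axis
        mean(scores)    # the arithmetic average
        median(scores)  # larger than half the observations, smaller than the other half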
  • Measures of Dispersion for Multidimensional Data
    European Journal of Operational Research 251 (2016) 930-937
    Contents lists available at ScienceDirect
    European Journal of Operational Research
    journal homepage: www.elsevier.com/locate/ejor

    Computational Intelligence and Information Management

    Measures of dispersion for multidimensional data

    Adam Kołacz (a), Przemysław Grzegorzewski (a,b)
    (a) Faculty of Mathematics and Computer Science, Warsaw University of Technology, Koszykowa 75, Warsaw 00-662, Poland
    (b) Systems Research Institute, Polish Academy of Sciences, Newelska 6, Warsaw 01-447, Poland

    Article history: Received 22 February 2015; Accepted 4 January 2016; Available online 11 January 2016

    Keywords: Descriptive statistics; Dispersion; Interquartile range; Multidistance; Spread

    Abstract: We propose an axiomatic definition of a dispersion measure that could be applied for any finite sample of k-dimensional real observations. Next we introduce a taxonomy of the dispersion measures based on the possible behavior of these measures with respect to new upcoming observations. This way we get two classes of unstable and absorptive dispersion measures. We examine their properties and illustrate them by examples. We also consider a relationship between multidimensional dispersion measures and multidistances. Moreover, we examine new interesting properties of some well-known dispersion measures for one-dimensional data like the interquartile range and a sample variance. © 2016 Elsevier B.V. All rights reserved.

    1. Introduction

    Various summary statistics are always applied wherever decisions are based on sample data. The main goal of those characteristics is to deliver a synthetic information on basic features of a data set under study.

    ... are intended for use. It is also worth mentioning that several terms are used in the literature as regards dispersion measures, like measures of variability, scatter, spread or scale. Some authors reserve the notion of the dispersion measure only to those cases when variability is considered relative to a given fixed point (like a sam-
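    As an added aside (not from the paper), the familiar one-dimensional measures named in the abstract, plus one simple multidimensional summary, can be computed in R. Total variance, the trace of the sample covariance matrix, is used here only as a common illustrative choice, not one of the paper's axiomatic measures:

        # One-dimensional dispersion measures named in the abstract
        x <- c(2.1, 3.4, 3.9, 5.0, 7.2, 8.8)  # hypothetical sample
        IQR(x)  # interquartile range
        var(x)  # sample variance

        # A simple dispersion summary for k-dimensional data: total variance,
        # the trace of the sample covariance matrix (illustrative choice only)
        set.seed(4)
        X <- cbind(x1 = rnorm(50), x2 = rnorm(50, sd = 2))
        sum(diag(cov(X)))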
  • Summary Statistics, Distributions of Sums and Means
    Summary statistics, distributions of sums and means

    Joe Felsenstein
    Department of Genome Sciences and Department of Biology

    Quantiles

    In both empirical distributions and in the underlying distribution, it may help us to know the points where a given fraction of the distribution lies below (or above) that point. In particular:

    The 2.5% point
    The 5% point
    The 25% point (the first quartile)
    The 50% point (the median)
    The 75% point (the third quartile)
    The 95% point (or upper 5% point)
    The 97.5% point (or upper 2.5% point)

    Note that if a distribution has a small fraction of very big values far out in one tail (such as the distributions of wealth of individuals or families), the mean may not be a good "typical" value; the median will do much better. (For a symmetric distribution the median is the mean.)

    The mean

    The mean is the average of points. If the distribution is the theoretical one, it is called the expectation: the theoretical mean we would expect to get if we drew infinitely many points from that distribution. For a sample of points x₁, x₂, ..., x₁₀₀ the mean is simply their average:

        x̄ = (x₁ + x₂ + x₃ + ... + x₁₀₀) / 100

    For a distribution with possible values 0, 1, 2, 3, ... where value k has occurred a fraction fₖ of the time, the mean weights each of these by the fraction of times it has occurred (then in effect divides by the sum of these fractions, which however is actually 1):

        x̄ = 0·f₀ + 1·f₁ + 2·f₂ + ...
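    An added R sketch (with an arbitrary right-skewed sample, to echo the wealth example above) of the quantile points just listed:

        # Right-skewed sample, so the mean and median differ noticeably
        set.seed(1)
        x <- rexp(1000, rate = 1/50)
        quantile(x, probs = c(0.025, 0.05, 0.25, 0.50, 0.75, 0.95, 0.975))
        mean(x)    # pulled out by the big values in the right tail
        median(x)  # a better "typical" value for such data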
  • Numerical Summary Values for Quantitative Data
    Chapter 3. Descriptive Statistics II: Numerical Summary Values

    3.1 Numerical summary values for quantitative data

    For many purposes a few well-chosen numerical summary values (statistics) will suffice as a description of the distribution of a quantitative variable. A statistic is a numerical characteristic of a sample. More formally, a statistic is a numerical quantity computed from the values of a variable, or variables, corresponding to the units in a sample. Thus a statistic serves to quantify some interesting aspect of the distribution of a variable in a sample. Summary statistics are particularly useful for comparing and contrasting the distribution of a variable for two different samples.

    If we plan to use a small number of summary statistics to characterize a distribution or to compare two distributions, then we first need to decide which aspects of the distribution are of primary interest. If the distributions of interest are essentially mound shaped with a single peak (unimodal), then there are three aspects of the distribution which are often of primary interest. The first aspect of the distribution is its location on the number line. Generally, when speaking of the location of a distribution we are referring to the location of the "center" of the distribution. The location of the center of a symmetric, mound shaped distribution is clearly the point of symmetry. There is some ambiguity in specifying the location of the center of an asymmetric, mound shaped distribution and we shall see that there are at least two standard ways to quantify location in this context.
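    An added R illustration (hypothetical samples) of quantifying and comparing location in the two standard ways the passage alludes to, presumably the mean and the median:

        # Two hypothetical samples of the same variable
        sample_a <- c(12, 15, 15, 16, 18, 19, 22)
        sample_b <- c(14, 17, 18, 20, 21, 23, 26)
        c(mean(sample_a), mean(sample_b))      # location via the mean
        c(median(sample_a), median(sample_b))  # location via the median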
  • Monte Carlo Study on Power Rates of Some Heteroscedasticity Detection Methods in a Linear Regression Model with Multicollinearity Problem
    Monte Carlo Study on Power Rates of Some Heteroscedasticity Detection Methods in a Linear Regression Model with Multicollinearity Problem

    O.O. Alabi, Kayode Ayinde, O.E. Babalola, and H.A. Bello
    Department of Statistics, Federal University of Technology, P.M.B. 704, Akure, Ondo State, Nigeria
    Corresponding author: O.O. Alabi, [email protected]

    Abstract: This paper examined the power rates exhibited by some heteroscedasticity detection methods in a linear regression model with a multicollinearity problem. Violation of the equal error variance assumption in a linear regression model leads to the problem of heteroscedasticity, while violation of the assumption of no linear dependency between the exogenous variables leads to the multicollinearity problem. Whenever these two problems exist together, one is faced with estimation and hypothesis-testing problems. To overcome these hurdles, one needs to determine the best method of heteroscedasticity detection in order to avoid reaching a wrong decision under hypothesis testing. This leads to determining, via power rates, the best heteroscedasticity detection method in a linear regression model with a multicollinearity problem. In practice, the variances of the error terms are unequal and unknown, but their behaviour needs to be diagnosed as a preliminary step before the data are analysed or a hypothesis test is performed. Although there are several forms of heteroscedasticity and several detection methods, for a researcher to arrive at a reasonable and correct decision, the best and most consistently performing detection method under any form or structure of heteroscedasticity must be determined.
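    For orientation (an added sketch, not from the paper), one widely used detection method, the Breusch-Pagan test, runs in R via the lmtest package; the paper's point is that such methods differ in power, especially under multicollinearity:

        # Breusch-Pagan test for heteroscedasticity (one common detection method)
        # install.packages("lmtest")  # if not already installed
        library(lmtest)
        set.seed(123)
        x <- runif(100, 1, 10)
        y <- 2 + 3 * x + rnorm(100, sd = x)  # error variance grows with x
        model <- lm(y ~ x)
        bptest(model)  # a small p-value indicates heteroscedasticity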
  • The Effects of Simplifying Assumptions in Power Analysis
    University of Nebraska - Lincoln: DigitalCommons@University of Nebraska - Lincoln
    Public Access Theses and Dissertations from the College of Education and Human Sciences (CEHS), 4-2011

    The Effects of Simplifying Assumptions in Power Analysis

    Kevin A. Kupzyk, University of Nebraska-Lincoln, [email protected]

    Citation: Kupzyk, Kevin A., "The Effects of Simplifying Assumptions in Power Analysis" (2011). Public Access Theses and Dissertations from the College of Education and Human Sciences. 106. https://digitalcommons.unl.edu/cehsdiss/106

    A dissertation presented to the Faculty of The Graduate College at the University of Nebraska in partial fulfillment of requirements for the degree of Doctor of Philosophy. Major: Psychological Studies in Education. Under the supervision of Professor James A. Bovaird. Lincoln, Nebraska, April 2011.

    Abstract: In experimental research, planning studies that have sufficient probability of detecting important effects is critical. Carrying out an experiment with an inadequate sample size may result in the inability to observe the effect of interest, wasting the resources spent on an experiment.
  • Introduction to Hypothesis Testing
    Introduction to Hypothesis Testing
    OPRE 6301

    Motivation

    The purpose of hypothesis testing is to determine whether there is enough statistical evidence in favor of a certain belief, or hypothesis, about a parameter. Examples:

    Is there statistical evidence, from a random sample of potential customers, to support the hypothesis that more than 10% of the potential customers will purchase a new product?

    Is a new drug effective in curing a certain disease? A sample of patients is randomly selected. Half of them are given the drug while the other half are given a placebo. The conditions of the patients are then measured and compared.

    These questions/hypotheses are similar in spirit to the discrimination example studied earlier. Below, we provide a basic introduction to hypothesis testing.

    Criminal Trials

    The basic concepts in hypothesis testing are actually quite analogous to those in a criminal trial. Consider a person on trial for a "criminal" offense in the United States. Under the US system a jury (or sometimes just the judge) must decide if the person is innocent or guilty while in fact the person may be innocent or guilty. These combinations are summarized in the table below.

                             Person is:
                             Innocent    Guilty
        Jury says: Innocent  No Error    Error
        Jury says: Guilty    Error       No Error

    Notice that there are two types of errors. Are both of these errors equally important? Or, is it as bad to decide that a guilty person is innocent and let them go free as it is to decide an innocent person is guilty and punish them for the crime? Or, is a jury supposed to be totally objective, not assuming that the person is either innocent or guilty and make their decision based on the weight of the evidence one way or another?

    In a criminal trial, there actually is a favored assumption, an initial bias if you will.
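    To tie the two error types to the statistical setting (an added R sketch, not from the original notes): when the null hypothesis is in fact true, a test at the 5% significance level should wrongly reject, the statistical analogue of convicting the innocent, about 5% of the time.

        # Simulate 10,000 experiments in which H0 (mean = 0) is actually true
        set.seed(7)
        p_values <- replicate(10000, t.test(rnorm(30, mean = 0), mu = 0)$p.value)
        mean(p_values < 0.05)  # proportion of false rejections: close to 0.05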
  • Power of a Statistical Test
    Power of a Statistical Test
    By Smita Skrivanek, Principal Statistician, MoreSteam.com LLC

    What is the power of a test?

    The power of a statistical test gives the likelihood of rejecting the null hypothesis when the null hypothesis is false. Just as the significance level (alpha) of a test gives the probability that the null hypothesis will be rejected when it is actually true (a wrong decision), power quantifies the chance that the null hypothesis will be rejected when it is actually false (a correct decision). Thus, power is the ability of a test to correctly reject the null hypothesis.

    Why is it important?

    Although you can conduct a hypothesis test without it, calculating the power of a test beforehand will help you ensure that the sample size is large enough for the purpose of the test. Otherwise, the test may be inconclusive, leading to wasted resources. On rare occasions the power may be calculated after the test is performed, but this is not recommended except to determine an adequate sample size for a follow-up study (if a test failed to detect an effect, it was obviously underpowered; nothing new can be learned by calculating the power at this stage).

    How is it calculated?

    As an example, consider testing whether the average time per week spent watching TV is 4 hours versus the alternative that it is greater than 4 hours. We will calculate the power of the test for a specific value under the alternative hypothesis, say, 7 hours:

    The null hypothesis is H0: μ = 4 hours.
    The alternative hypothesis is H1: μ > 4 hours, with power evaluated at the specific value μ = 7 hours.
    Here μ = the average time per week spent watching TV.
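    An added R sketch of this calculation using stats::power.t.test; the standard deviation and sample size are illustrative assumptions, since the excerpt does not specify them:

        # Power of a one-sided one-sample t-test of H0: mu = 4 vs H1: mu > 4,
        # evaluated at the alternative mu = 7, i.e. delta = 7 - 4 = 3.
        # sd = 5 and n = 25 are assumed purely for illustration.
        power.t.test(n = 25, delta = 3, sd = 5, sig.level = 0.05,
                     type = "one.sample", alternative = "one.sided")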
  • Confidence Intervals and Hypothesis Tests
    Chapter 2
    Confidence intervals and hypothesis tests

    This chapter focuses on how to draw conclusions about populations from sample data. We'll start by looking at binary data (e.g., polling), and learn how to estimate the true ratio of 1s and 0s with confidence intervals, and then test whether that ratio is significantly different from some baseline value using hypothesis testing. Then, we'll extend what we've learned to continuous measurements.

    2.1 Binomial data

    Suppose we're conducting a yes/no survey of a few randomly sampled people, and we want to use the results of our survey to determine the answers for the overall population.

    2.1.1 The estimator

    The obvious first choice is just the fraction of people who said yes. Formally, suppose we have samples x₁, ..., xₙ that can each be 0 or 1, and the probability that each xᵢ is 1 is p (in frequentist style, we'll assume p is fixed but unknown: this is what we're interested in finding). We'll assume our samples are independent and identically distributed (i.i.d.), meaning that each one has no dependence on any of the others, and they all have the same probability p of being 1. Then our estimate for p, which we'll call p̂, or "p-hat", would be

        p̂ = (1/n) Σᵢ₌₁ⁿ xᵢ.

    Notice that p̂ is a random quantity, since it depends on the random quantities xᵢ. In statistical lingo, p̂ is known as an estimator for p. Also notice that except for the factor of 1/n in front, p̂ is almost a binomial random variable (in particular, np̂ ∼ B(n, p)).
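    An added R sketch (with an assumed true p and sample size) of computing p̂ and a confidence interval for it:

        # Simulated yes/no survey: n = 200 respondents, true p = 0.55 (assumed)
        set.seed(99)
        x <- rbinom(200, size = 1, prob = 0.55)
        p_hat <- mean(x)  # the estimator p-hat: the fraction who said yes
        p_hat
        prop.test(sum(x), n = 200, p = 0.5)  # test H0: p = 0.5, with a 95% CI for p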
  • Understanding Statistical Hypothesis Testing: the Logic of Statistical Inference
    Review

    Understanding Statistical Hypothesis Testing: The Logic of Statistical Inference

    Frank Emmert-Streib (1,2,*) and Matthias Dehmer (3,4,5)
    1 Predictive Society and Data Analytics Lab, Faculty of Information Technology and Communication Sciences, Tampere University, 33100 Tampere, Finland
    2 Institute of Biosciences and Medical Technology, Tampere University, 33520 Tampere, Finland
    3 Institute for Intelligent Production, Faculty for Management, University of Applied Sciences Upper Austria, Steyr Campus, 4040 Steyr, Austria
    4 Department of Mechatronics and Biomedical Computer Science, University for Health Sciences, Medical Informatics and Technology (UMIT), 6060 Hall, Tyrol, Austria
    5 College of Computer and Control Engineering, Nankai University, Tianjin 300000, China
    * Correspondence: [email protected]; Tel.: +358-50-301-5353

    Received: 27 July 2019; Accepted: 9 August 2019; Published: 12 August 2019

    Abstract: Statistical hypothesis testing is among the most misunderstood quantitative analysis methods from data science. Despite its seeming simplicity, it has complex interdependencies between its procedural components. In this paper, we discuss the underlying logic behind statistical hypothesis testing, the formal meaning of its components and their connections. Our presentation is applicable to all statistical hypothesis tests as generic backbone and, hence, useful across all application domains in data science and artificial intelligence.

    Keywords: hypothesis testing; machine learning; statistics; data science; statistical inference

    1. Introduction

    We are living in an era that is characterized by the availability of big data. In order to emphasize the importance of this, data have been called the 'oil of the 21st Century' [1]. However, for dealing with the challenges posed by such data, advanced analysis methods are needed.