Statistical Significance, P-Values and Confidence Intervals a Brief Guide for Non-Statisticians Using SPSS Statistics

Statistical Significance, p-Values and Confidence Intervals A Brief Guide for Non-statisticians Using SPSS Statistics © Colm McGuinness, 2015 Table of Contents 1. Introduction .................................................................................................................................................................... 2 2. Statistical Significance/P-values ...................................................................................................................................... 3 3. Effect Size ........................................................................................................................................................................ 5 4. The H0 Distribution/Sampling Distributions ................................................................................................................... 6 5. Confidence Interval for a Population Statistic ................................................................................................................ 8 6. Example 1: Student marks .............................................................................................................................................. 9 7. Example 2: Household Income by Churn ...................................................................................................................... 13 8. Example 3: Selecting Subgroups ................................................................................................................................... 15 9. Too Many Group Comparisons: Familywise Error Rates .............................................................................................. 22 Page 1 of 22 1. Introduction This document attempts to give a basic understanding of statistical significance values, p values, and confidence intervals. Various simplifications and assumptions are made throughout this document to make it accessible to as wide an audience of statistical practitioners as possible. It is assumed that you are already somewhat familiar with using SPSS and the telco.sav sample file from lab classes with me, gathering data, samples & populations, and basic descriptive statistics such as the mean, standard deviation, and variance. A primary objective is to avoid technical details, and show how a basic understanding can be applied to a wide range of problems1. The basic steps, which will be detailed in later sections, for interpreting any significance/p-value, in summary, are: - Determine the null hypothesis … This is the “nothing different/new/interesting” statement for some statistic derived from your data. The particular statistic will depend on the test being performed. - Interpret the “sig.”/p-value as the probability of the statistic derived from your data occurring, if the null hypothesis is true. - If the p-value is low then reject the null hypothesis: The statistic derived from your data does not support the null hypothesis. - If the p-value is high then do not reject the null hypothesis: The statistic derived from your data agrees with the null hypothesis. Statistics can be used to compare two or more subgroups within a sample of data, for example comparing the marks of students from one year to marks obtained from another year, or the marks obtained by males against those obtained by females, or sales values before and after an advertising campaign: All sorts of things! The main example for this document is the comparison of the means from two groups. For a given set of values, the mean represents the arithmetic centre of the values. The mean can be interpreted as an average, “typical” or indicative value for the whole of the data. So if we want to compare two groups, then we instead compare their means, and infer from any difference found (or not) as to whether the groups differ or not. Comparing only group means can be less than ideal since means might not differ, but other characteristics might, or vice-versa. But it is a commonly used technique, and we will use it here as an example to discuss in the context of statistical significance. The main example used in later sections has the following basic details: The SPSS sample file telco.sav contains, amongst other things, information on gender and household income for a sample of 1000 telecoms customers. We might wonder if household income differs by gender, ie is there any evidence from this sample that male household income is different to female household income? To test this requires an “Independent-Samples T Test” from the SPSS Analyze/Compare Means menu path. Why this is the case is not covered by this document, but it is a common test to use to compare two independent subgroups. It only works for two independent subgroups, ie male and female here. For more than two subgroups you would typically consider an ANOVA test, which is a separate matter. Or for dependent subgroups (eg “before” and “after” measurements on the same subjects) you might be able to use a “Paired-Samples T Test”. The gender SPSS variable is coded as 0=Male, and 1=Female. Without going into too much technical detail, what SPSS will do for this test is calculate the means for the two 2 subgroups, say x0 and x1 for male and female respectively , and then it will calculate the difference in means, ie xx01 . It then calculates the significance associated with this specific difference answer, and that answer is the statistical significance that this document is mostly concerned with. 1 In an ideal world one should not avoid technical details as these can be crucially important! However many people who find themselves needing to use some level of statistics will not themselves need to be overly familiar with technical details. For important statistical work I recommend engaging with a professional statistician, since ultimately ignoring technical details is bad, very bad‼ 2 Note that the subscript notation here will not be shown by SPSS in output. This type of notation is very commonly used in one format or another in statistical texts. A variable name, eg x, with a bar over, eg x, it is used to indicate that it is the mean of the x’s that we are referring to. Page 2 of 22 The initial output from this comparison, showing the descriptive statistics, is shown in Table 1. Group Statistics Gende Std. Std. Error r N Mean Deviation Mean Household income in Male 483 73.2505 92.85082 4.22486 thousands Femal 517 81.5377 118.73355 5.22190 e Table 1: Descriptive statistics from an Independent-Samples T Test on the telco.sav SPSS sample file, comparing household income by gender. So here x0 is 73.2505 and x1 is 81.5377, and their difference (which we will see in Table 2) is xx01 8.2872. The main focus of this document is the understanding and interpretation of statistical significance and p values, which are described in the next section. 2. Statistical Significance/P-values Many statistical tests result in a statistical significance (“sig.”) value in SPSS (and other statistical packages). This is commonly known as the “p value” and is often quoted in research as, for example, “p=0.0819” or “p<0.01” or “p>0.05”. The Independent-Samples T Test from the introduction above produces the following results table3, which has been split to make it fit on the page. The significance answers have been highlighted with shading: Levene's Test for Equality of Variances F Sig. Household income in thousands Equal variances assumed 3.259 .071 Equal variances not assumed t-test for Equality of Means Std. Error 95% Confidence Interval of the Difference t df Sig. (2-tailed) Mean Difference Difference Lower Upper -1.224 998 .221 -8.28720 6.77230 -21.57678 5.00238 -1.234 968.412 .218 -8.28720 6.71697 -21.46868 4.89428 Table 2: Results of an independent samples T test in SPSS2. Statistical significance is highlighted by the shaded cells. The (statistical) significance is the probability that your actual statistic (the difference in means here) would be found if the null hypothesis were true4. Typically the null hypothesis is that “nothing different/new/interesting” has been found. Statistically this would typically be written as: 3 The result shown is from the TELCO.sav sample file, and compares the mean household incomes of males and females in the sample. 4 Technically it is that your actual statistic or “worse”. So in this case not just your actual difference in means, but also any even larger difference in means, but the given text is a starting point. Page 3 of 22 5 6 H0: 12 , where 1 and 2 are the population means for the two groups being compared . So the “nothing new/interesting” here would be that the means for both groups will be equal. The sig. value is then the probability of getting your actual difference (or worse) between the two sample means, if H0 is true. In Table 2 the actual difference, from the “mean difference” column, is -8.28720, and the two sig. values to the left of this are the probabilities of finding this particular difference (or worse) if in fact the two population means are equal. Although it is often not explicitly stated all “null hypothesis statistical test procedures” (NHSTP) involve a null hypothesis, and also an alternative hypothesis, commonly called H1 or Ha, for example: H1: 12 . The alternative hypothesis determines the type of test to be performed in terms of whether the test must be “1 tailed” or “2 tailed”. For example the above H1 corresponds to a 2 tailed test, since we don’t know/specify in advance whether 1 might be bigger than 2 or smaller than 2 . This is probably the more common type of

Statistical Significance, P-Values and Confidence Intervals a Brief Guide for Non-Statisticians Using SPSS Statistics

Statistical Significance Testing in Information Retrieval:An Empirical

Tests of Hypotheses Using Statistics

Statistical Significance

Understanding Statistical Hypothesis Testing: the Logic of Statistical Inference

What Are Confidence Intervals and P-Values?

Understanding Statistical Significance: a Short Guide

The Independent Samples T Test 189

Estimation: Understanding Confidence Interval S

Confidence Intervals and Statistical Significance -Statistical Literacy Guide

Statistical Approaches to Uncertainty: P Values and Confidence Intervals Unpacked

Introduction to Hypothesis Testing 3

One-Way Analysis of Variance: Comparing Several Means