Estimation: Understanding Confidence Intervals


Statistical Notes IV
A Cook, A Sheikh

Adrian Cook, Statistician; Aziz Sheikh, NHS R&D National Primary Care Training Fellow; Department of Primary Health Care & General Practice, Imperial College School of Medicine. Correspondence to: Adrian Cook, Department of Primary Health Care & General Practice, ICSM, Charing Cross Campus, Reynolds Building, St Dunstan's Road, London W6 8RP; [email protected]. Date submitted: 19/7/00. Date accepted: 17/10/00. Prim Care Respir J 2000;9(3):48-50.

INTRODUCTION

The previous paper of this series described hypothesis testing, showing how a p-value indicates the likelihood of a genuine difference existing between two or more groups. An alternative to using p-values alone, to show whether or not differences exist, is to report the actual size of the difference. Since a difference observed between groups is subject to random variation, it becomes necessary to present not only the difference, but also a range of values around the observed difference within which it is believed the true value will lie. Such a range is known as a confidence interval.

Confidence intervals may be calculated for means, proportions, differences between means or proportions, relative risks, odds ratios and many other summary statistics. Here we describe in detail only one simple calculation, that of a confidence interval for a proportion. Using examples from the literature we look at the interpretation of other confidence intervals, and show the relationship between p-values and confidence intervals.

WHY P-VALUES ALONE ARE NOT ENOUGH

P-values express statistical significance, but statistically significant results may have little clinical significance. This is particularly the case with large studies that have power to detect very small differences. For example, an improvement in average peak flow of 1 l/min, when comparing intervention and control groups, may be statistically significant but clearly has little clinical benefit.

A further drawback of p-values is the emphasis placed on p=0.05, a value chosen purely by convention but which has spawned a tendency to dismiss anything larger and focus attention on smaller values only. Presenting a confidence interval instead requires the reader to think about what the values actually mean, and so to interpret the results more fully.

CALCULATING THE CONFIDENCE INTERVAL FOR A PROPORTION

Kaur et al carried out a prevalence study of asthma symptoms and diagnosis in British 12-14 year olds.1 From a sample of 27,507 children, 20.8% (n=5,736) reported 'ever having' asthma. Assuming the sample to be representative, the true national prevalence should be close to this figure, but it remains unknown, and a different sample would probably yield a slightly different estimate. To calculate a range likely to contain the true figure, we need to know by how much the sample proportion is likely to vary. Put more technically, we need to know the standard deviation of the sample statistic. This is known as the standard error.

One way of finding the standard error would be to take several more samples, calculate the proportion 'ever having' asthma separately in each sample, then calculate the standard deviation of these proportions. Fortunately, such labour is unnecessary because it has been shown that most summary statistics follow normal distributions, particularly when sample sizes are large. Furthermore, the standard deviations of these distributions are directly related to the standard deviation of the original data. The situation is illustrated in Figure 1, which shows a possible distribution of data together with the expected distribution of mean values generated by data samples.

[Figure 1. Expected distribution of a sample mean]

With data from a normally distributed variable, 95% of observations should lie within two standard deviations of the mean. Having said that a sample statistic is expected to be normally distributed, it follows that on 95% of occasions it will be less than two standard errors from its true value. The probability that the range formed by the sample statistic plus or minus two standard errors contains the true value is therefore 95%. This range is the 95% confidence interval. Formulae for the standard error of common summary statistics are given in Table 1; those for other statistics are usually readily available in textbooks.

Table 1. Formulae for calculating standard errors

  Summary statistic           Statistic                                      Standard error (SE)
  Mean                        mean of x                                      sd(x) / sqrt(n)
  Proportion                  p = cases / (cases + noncases)                 sqrt[ p(1-p) / n ]
  Difference in means         mean(x1) - mean(x2)                            sqrt[ sd(x1)^2/n1 + sd(x2)^2/n2 ]
  Difference in proportions   p1 - p2                                        sqrt[ p1(1-p1)/n1 + p2(1-p2)/n2 ]
  Relative risk*              RR = [cases1/(cases1+noncases1)] /             sqrt[ 1/cases1 - 1/(cases1+noncases1)
                                   [cases2/(cases2+noncases2)]                     + 1/cases2 - 1/(cases2+noncases2) ]
  Odds ratio*                 OR = [cases1/noncases1] /                      sqrt[ 1/cases1 + 1/noncases1
                                   [cases2/noncases2]                              + 1/cases2 + 1/noncases2 ]

  * Standard error of the logged statistic
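The formulae in Table 1 translate directly into code. As an illustrative aside (not part of the original article), the short Python sketch below implements the standard errors used here; the function names are this note's own, and the conventional multiplier 1.96 is used in place of the rounded value of two.

```python
# Illustrative implementations of the standard-error formulae in Table 1.
# They assume the large-sample normal approximation used throughout the paper.
import math

def se_proportion(cases, n):
    """Standard error of a proportion p = cases / n."""
    p = cases / n
    return math.sqrt(p * (1 - p) / n)

def se_diff_proportions(cases1, n1, cases2, n2):
    """Standard error of the difference between two independent proportions."""
    p1, p2 = cases1 / n1, cases2 / n2
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

def se_log_relative_risk(cases1, noncases1, cases2, noncases2):
    """Standard error of the *logged* relative risk (see Table 1 footnote)."""
    return math.sqrt(1 / cases1 - 1 / (cases1 + noncases1)
                     + 1 / cases2 - 1 / (cases2 + noncases2))

def se_log_odds_ratio(cases1, noncases1, cases2, noncases2):
    """Standard error of the *logged* odds ratio (see Table 1 footnote)."""
    return math.sqrt(1 / cases1 + 1 / noncases1
                     + 1 / cases2 + 1 / noncases2)

def ci_95(estimate, se):
    """Approximate 95% confidence interval: estimate +/- 1.96 standard errors."""
    return estimate - 1.96 * se, estimate + 1.96 * se
```

Applied to the asthma example, se_proportion(5736, 27507) is roughly 0.0025 (0.25%), and ci_95(0.208, 0.0025) recovers the 20.3% to 21.3% interval quoted in the next paragraph.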
For the example of asthma prevalence, the standard error can be calculated as 0.25%, giving a 95% confidence interval of 20.3% to 21.3%. The narrowness of this confidence interval reflects the large sample size, illustrating how certainty in a result grows as the number of observations increases, resulting in smaller standard errors and narrower confidence intervals.

INTERPRETING CONFIDENCE INTERVALS FOR RELATIVE RISKS

Wald and Watt compared all-cause mortality among different types of smoker with that of lifelong non-smokers.2 Compared to non-smokers, the relative risk (RR) of mortality among former cigarette smokers was 1.11 (95% confidence interval 0.92 to 1.34). The best estimate of the effect on mortality is an increase of 11% (RR=1.11), but the possibility of no effect (RR=1.0) remains. Among current smokers the relative mortality was 2.26 (95% confidence interval 1.97 to 2.58). This confidence interval does not include RR=1.0, and so we can be confident that mortality is higher among current smokers.

Among pipe and cigar smokers who had never smoked cigarettes, mortality compared to non-smokers was 1.23 (95% confidence interval 0.99 to 1.75). Relative mortality is higher than that of ex-smokers, but the confidence interval is much wider. The greater width is partly due to the pipe/cigar group being smaller than the ex-smokers group; one more or one fewer death therefore has a greater effect on the mortality estimate, and the wider confidence interval reflects the less stable result.

Table 2. Relative all-cause mortality of different smoking groups

  Smoking group                                 n      Died   RR†    95% CI
  Lifelong non-smoker                           6539   346    1.00   -
  Former cigarette smoker                       1465   162    1.11   0.92 to 1.34
  Pipe/cigar smoker, never smoked cigarettes    1309   311    1.23   0.99 to 1.75
  Current cigarette smoker                      4182   540    2.26   1.97 to 2.58

  † Adjusted for age at entry to study
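The relative-risk intervals above are constructed on the log scale, using the standard error formula from Table 1. As an illustrative aside (not part of the original article), the sketch below applies that formula to the raw counts in Table 2 for current cigarette smokers; because the published relative risks are adjusted for age at entry, the crude figures computed here differ from those shown in the table.

```python
# Crude relative risk of death for current cigarette smokers versus lifelong
# non-smokers, with a 95% CI computed on the log scale (Table 1 formula).
# The published RR of 2.26 (1.97 to 2.58) is age-adjusted, so these crude
# figures come out higher.
import math

deaths_smokers, n_smokers = 540, 4182          # current cigarette smokers
deaths_nonsmokers, n_nonsmokers = 346, 6539    # lifelong non-smokers

rr = (deaths_smokers / n_smokers) / (deaths_nonsmokers / n_nonsmokers)
se_log_rr = math.sqrt(1 / deaths_smokers - 1 / n_smokers
                      + 1 / deaths_nonsmokers - 1 / n_nonsmokers)

lower = math.exp(math.log(rr) - 1.96 * se_log_rr)
upper = math.exp(math.log(rr) + 1.96 * se_log_rr)
print(f"Crude RR {rr:.2f}, 95% CI {lower:.2f} to {upper:.2f}")  # ~2.44 (2.14 to 2.78)
```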
CONFIDENCE INTERVALS AND P-VALUES

It may have become apparent that the statistical significance of differences can be gleaned from confidence intervals. A confidence interval containing 1.0 for a relative risk or an odds ratio means we are less than 95% sure that a genuine difference exists; a significance test of the difference would thus give p>0.05. Similarly, a confidence interval not including 1.0 corresponds to p<0.05, while an interval bounded at one end by exactly 1.0 would give p=0.05. A similar situation exists with confidence intervals for differences in means or proportions, the only difference being that no effect is represented by the value 0.0 rather than 1.0.

The practice of reporting confidence intervals together with p-values is questionable, p-values adding little information for the informed reader. An exception to this rule occurs when a large number of confidence intervals are reported; in this instance the generally discouraged habit of replacing p-values with stars indicating p<0.05 and p<0.01 becomes useful, allowing a rapid overview of the results.

CONCLUSIONS

Here we have outlined the theory and practice of calculating confidence intervals, and given pointers toward their meaningful interpretation. Clinical significance may be gauged both from the point estimate of the difference and from the upper and lower confidence limits. Whether or not a confidence interval contains unity for a relative difference, or zero for an absolute difference, reveals statistical significance. Because they convey both aspects of significance, confidence intervals have become the strongly preferred way of presenting results.

Recommended reading

Gardner MJ, Altman DG. Statistics with confidence. London: BMJ, 1993.

References

1. Kaur B, Ross Anderson H, Austin J, et al. Prevalence of asthma symptoms, diagnosis, and treatment in 12-14 year old children across Great Britain (International Study of Asthma and Allergies in Childhood, ISAAC UK). BMJ 1998;316:118-24.
2. Wald NJ, Watt HC. Prospective study of effect of switching from cigarettes to pipes or cigars on mortality from three smoking related diseases. BMJ 1997;314:1860-3.

Acknowledgements

AS is supported by an NHS R&D National Primary Care Training Fellowship.
Recommended publications
  • Statistical Significance Testing in Information Retrieval: An Empirical Analysis of Type I, Type II and Type III Errors
    Statistical Significance Testing in Information Retrieval: An Empirical Analysis of Type I, Type II and Type III Errors. Julián Urbano ([email protected]), Harlley Lima ([email protected]), Alan Hanjalic ([email protected]); Delft University of Technology, The Netherlands. ABSTRACT: Statistical significance testing is widely accepted as a means to assess how well a difference in effectiveness reflects an actual difference between systems, as opposed to random noise because of the selection of topics. According to recent surveys on SIGIR, CIKM, ECIR and TOIS papers, the t-test is the most popular choice among IR researchers. However, previous work has suggested computer intensive tests like the bootstrap or the permutation test, based mainly on theoretical arguments. On empirical grounds, others have suggested non-parametric alternatives such as the Wilcoxon test. Indeed, the question of which tests we should use has accompanied IR and related fields for decades now. 1 INTRODUCTION: In the traditional test collection based evaluation of Information Retrieval (IR) systems, statistical significance tests are the most popular tool to assess how much noise there is in a set of evaluation results. Random noise in our experiments comes from sampling various sources like document sets [18, 24, 30] or assessors [1, 2, 41], but mainly because of topics [6, 28, 36, 38, 43]. Given two systems evaluated on the same collection, the question that naturally arises is "how well does the observed difference reflect the real difference between the systems and not just noise due to sampling of topics"? Our field can only advance if the published retrieval methods truly
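The excerpt above contrasts the t-test with computer-intensive alternatives such as the permutation test. As a rough, self-contained illustration (the per-topic scores below are made up, not data from the paper), a paired randomisation test can be sketched as follows:

```python
# A minimal sketch of a paired (topic-by-topic) permutation test:
# under the null hypothesis the system labels are exchangeable, so the sign
# of each per-topic difference can be flipped at random.
import random

scores_a = [0.42, 0.55, 0.31, 0.60, 0.48, 0.52, 0.39, 0.45]
scores_b = [0.40, 0.50, 0.35, 0.58, 0.44, 0.49, 0.41, 0.43]

diffs = [a - b for a, b in zip(scores_a, scores_b)]
observed = abs(sum(diffs)) / len(diffs)

n_resamples = 10_000
count = 0
for _ in range(n_resamples):
    # Randomly swap the system labels on each topic (flip the sign of the difference).
    resampled = [d if random.random() < 0.5 else -d for d in diffs]
    if abs(sum(resampled)) / len(diffs) >= observed:
        count += 1

p_value = count / n_resamples   # two-sided p-value under the permutation null
print(f"observed mean difference {observed:.3f}, p ~ {p_value:.3f}")
```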
  • Tests of Hypotheses Using Statistics
    Tests of Hypotheses Using Statistics. Adam Massey and Steven J. Miller, Mathematics Department, Brown University, Providence, RI 02912. Abstract: We present the various methods of hypothesis testing that one typically encounters in a mathematical statistics course. The focus will be on conditions for using each test, the hypothesis tested by each test, and the appropriate (and inappropriate) ways of using each test. We conclude by summarizing the different tests (what conditions must be met to use them, what the test statistic is, and what the critical region is). Contents: 1 Types of Hypotheses and Test Statistics 2; 1.1 Introduction 2; 1.2 Types of Hypotheses 3; 1.3 Types of Statistics 3; 2 z-Tests and t-Tests 5; 2.1 Testing Means I: Large Sample Size or Known Variance 5; 2.2 Testing Means II: Small Sample Size and Unknown Variance 9; 3 Testing the Variance 12; 4 Testing Proportions 13; 4.1 Testing Proportions I: One Proportion 13; 4.2 Testing Proportions II: K Proportions 15; 4.3 Testing r x c Contingency Tables 17; 4.4 Incomplete r x c Contingency Tables 18; 5 Normal Regression Analysis 19; 6 Non-parametric Tests 21; 6.1 Tests of Signs 21; 6.2 Tests of Ranked Signs 22; 6.3 Tests Based on Runs 23; 7 Summary 26; 7.1 z-tests 26; 7.2 t-tests 27; 7.3 Tests comparing means 27; 7.4 Variance Test 28; 7.5 Proportions 28; 7.6 Contingency Tables. E-mail: [email protected]; E-mail: [email protected]
  • Statistical Significance
    Statistical significance. In statistical hypothesis testing,[1][2] statistical significance (or a statistically significant result) is attained whenever the observed p-value of a test statistic is less than the significance level defined for the study.[3][4][5][6][7][8][9] The p-value is the probability of obtaining results at least as extreme as those observed, given that the null hypothesis is true. The significance level, α, is the probability of rejecting the null hypothesis, given that it is true.[10] This statistical technique for testing the significance of results was developed in the early 20th century. In any experiment or observation that involves drawing a sample from a population, there is always the possibility that an observed effect would have occurred due to sampling error alone.[11][12] But if the p-value of an observed effect is less than the significance level, an investigator may conclude that that effect reflects the characteristics of the population. 1.1 Related concepts: The significance level α is the threshold for p below which the experimenter assumes the null hypothesis is false, and something else is going on. This means α is also the probability of mistakenly rejecting the null hypothesis, if the null hypothesis is true.[22] Sometimes researchers talk about the confidence level γ = (1 − α) instead. This is the probability of not rejecting the null hypothesis given that it is true.[23][24] Confidence levels and confidence intervals were introduced by Neyman in 1937.[25] 2 Role in statistical hypothesis testing
  • Understanding Statistical Hypothesis Testing: the Logic of Statistical Inference
    Review Understanding Statistical Hypothesis Testing: The Logic of Statistical Inference Frank Emmert-Streib 1,2,* and Matthias Dehmer 3,4,5 1 Predictive Society and Data Analytics Lab, Faculty of Information Technology and Communication Sciences, Tampere University, 33100 Tampere, Finland 2 Institute of Biosciences and Medical Technology, Tampere University, 33520 Tampere, Finland 3 Institute for Intelligent Production, Faculty for Management, University of Applied Sciences Upper Austria, Steyr Campus, 4040 Steyr, Austria 4 Department of Mechatronics and Biomedical Computer Science, University for Health Sciences, Medical Informatics and Technology (UMIT), 6060 Hall, Tyrol, Austria 5 College of Computer and Control Engineering, Nankai University, Tianjin 300000, China * Correspondence: [email protected]; Tel.: +358-50-301-5353 Received: 27 July 2019; Accepted: 9 August 2019; Published: 12 August 2019 Abstract: Statistical hypothesis testing is among the most misunderstood quantitative analysis methods from data science. Despite its seeming simplicity, it has complex interdependencies between its procedural components. In this paper, we discuss the underlying logic behind statistical hypothesis testing, the formal meaning of its components and their connections. Our presentation is applicable to all statistical hypothesis tests as generic backbone and, hence, useful across all application domains in data science and artificial intelligence. Keywords: hypothesis testing; machine learning; statistics; data science; statistical inference 1. Introduction We are living in an era that is characterized by the availability of big data. In order to emphasize the importance of this, data have been called the ‘oil of the 21st Century’ [1]. However, for dealing with the challenges posed by such data, advanced analysis methods are needed.
  • What Are Confidence Intervals and P-Values?
    What is...? series, second edition: Statistics. Supported by sanofi-aventis. What are confidence intervals and p-values? Huw TO Davies PhD, Professor of Health Care Policy and Management, University of St Andrews; Iain K Crombie PhD FFPHM, Professor of Public Health, University of Dundee. • A confidence interval calculated for a measure of treatment effect shows the range within which the true treatment effect is likely to lie (subject to a number of assumptions). • A p-value is calculated to assess whether trial results are likely to have occurred simply through chance (assuming that there is no real difference between new treatment and old, and assuming, of course, that the study was well conducted). • Confidence intervals are preferable to p-values, as they tell us the range of possible effect sizes compatible with the data. • p-values simply provide a cut-off beyond which we assert that the findings are 'statistically significant' (by convention, this is p<0.05). • A confidence interval that embraces the value of no difference between treatments indicates that the treatment under investigation is not significantly different from the control. • Confidence intervals aid interpretation of clinical trial data by putting upper and lower bounds on the likely size of any true effect. • Bias must be assessed before confidence intervals can be interpreted. Even very large samples and very narrow confidence intervals can mislead if they come from biased studies. • Non-significance does not mean 'no effect'. Small studies will often report non-significance even when there are important, real effects which a large study would have detected.
  • Understanding Statistical Significance: a Short Guide
    UNDERSTANDING STATISTICAL SIGNIFICANCE: A SHORT GUIDE Farooq Sabri and Tracey Gyateng September 2015 Using a control or comparison group is a powerful way to measure the impact of an intervention, but doing this in a robust way requires statistical expertise. NPC’s Data Labs project, funded by the Oak Foundation, aims to help charities by opening up government data that will allow organisations to compare the longer-term outcomes of their service to non-users. This guide explains the terminology used in comparison group analysis and how to interpret the results. Introduction With budgets tightening across the charity sector, it is helpful to test whether services are actually helping beneficiaries by measuring their impact. Robust evaluation means assessing whether programmes have made a difference over and above what would have happened without them1. This is known as the ‘counterfactual’. Obviously we can only estimate this difference. The best way to do so is to find a control or comparison group2 of people who have similar characteristics to the service users, the only difference being that they did not receive the intervention in question. Assessment is made by comparing the outcomes for service users with the comparison group to see if there is any statistically significant difference. Creating comparison groups and testing for statistical significance can involve complex calculations, and interpreting the results can be difficult, especially when the result is not clear cut. That’s why NPC launched the Data Labs project to respond to this need for robust quantitative evaluations. This paper is designed as an introduction to the field, explaining the key terminology to non-specialists.
  • The Independent Samples t Test
    CHAPTER 7. Comparing Two Group Means: The Independent Samples t Test. After reading this chapter, you will be able to: Differentiate between the one-sample t test and the independent samples t test; Summarize the relationship among an independent variable, a dependent variable, and random assignment; Interpret the conceptual ingredients of the independent samples t test; Interpret an APA style presentation of an independent samples t test; Hand-calculate all ingredients for an independent samples t test; Conduct and interpret an independent samples t test using SPSS. In the previous chapter, we discussed the basic principles of statistically testing a null hypothesis. We highlighted these principles by introducing two parametric inferential statistical tools, the z test and the one-sample t test. Recall that we use the z test when we want to compare a sample to the population and we know the population parameters, specifically the population mean and standard deviation. We use the one-sample t test when we do not have access to the population standard deviation. In this chapter, we will add another inferential statistical tool to our toolbox. Specifically, we will learn how to compare the difference between means from two groups drawn from the same population to learn whether that mean difference might exist in the population. CONCEPTUAL UNDERSTANDING OF THE STATISTICAL TOOL. The Study. As a little kid, I was afraid of the dark.
  • Confidence Intervals and Statistical Significance - Statistical Literacy Guide
    Statistical literacy guide Confidence intervals and statistical significance Last updated: June 2009 Author: Ross Young & Paul Bolton This guide outlines the related concepts of confidence intervals and statistical significance and gives examples of the standard way that both are calculated or tested. In statistics it is important to measure how confident we can be in the results of a survey or experiment. One way of measuring the degree of confidence in statistical results is to review the confidence interval reported by researchers. Confidence intervals describe the range within which a result for the whole population would occur for a specified proportion of times a survey or test was repeated among a sample of the population. Confidence intervals are a standard way of expressing the statistical accuracy of a survey-based estimate. If an estimate has a high error level, the corresponding confidence interval will be wide, and the less confidence we can have that the survey results describe the situation among the whole population. It is common when quoting confidence intervals to refer to the 95% confidence interval around a survey estimate or test result, although other confidence intervals can be reported (e.g. 99%, 90%, even 80%). Where a 95% confidence interval is reported then we can be reasonably confident that the range includes the ‘true’ value for the population as a whole. Formally we would expect it to contain the ‘true’ value 95% of the time. The calculation of a confidence interval is based on the characteristics of a normal distribution. The chart below shows what a normal distribution of results is expected to look like.
  • Statistical Approaches to Uncertainty: P Values and Confidence Intervals Unpacked
    EBM notebook. Statistical approaches to uncertainty: p values and confidence intervals unpacked. Introduction. What is statistical uncertainty? Statistical uncertainty is the uncertainty (present even in a representative sample) associated with the use of sample data to make statements about the wider population. Why do we need measures of uncertainty? It usually is not feasible to include all individuals from a target population in a single study. For example, in a randomised controlled trial (RCT) of a new treatment for hypertension, it would not be possible to include all individuals with hypertension. Instead, a sample (a small subset of this population) is allocated to receive either the new or the standard treatment. What are the measures of uncertainty? Either hypothesis tests (with the calculation of p values) or confidence intervals (CIs) can quantify the amount of statistical uncertainty present in a study, though CIs are usually preferred. In a previous Statistics Note, we defined terms such as experimental event risk (EER), control event risk (CER), absolute risk reduction (ARR), and number needed to treat (NNT).1 Estimates such as ARR and NNT alone, however, do not describe the uncertainty around the results. P values and CIs provide additional information to help us determine whether the results are both clinically and statistically significant (table 1).

    Table 1. Outcomes in the RCT comparing streptomycin with bed rest alone in the treatment of tuberculosis
      Intervention            Survival   Death   Risk of death         ARR                    95% CI          P value
      Streptomycin (n = 55)   51         4       4/55 = 7.3% (EER)     25.9% - 7.3% = 19.6%   5.7% to 33.6%   0.006
      Bed rest (n = 52)       38         14      14/52 = 25.9% (CER)

    The ARR (the difference in risk) is estimated to be 19.6% with a 95% CI of 5.7% to 33.6%.
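As a quick check of the arithmetic in the table above, the sketch below recomputes the absolute risk reduction and an approximate 95% confidence interval using the normal approximation for a difference in proportions (an illustrative aside, not part of the source note; small discrepancies from the quoted 5.7% to 33.6% come from rounding and the exact method used):

```python
# ARR and approximate 95% CI for the streptomycin trial counts quoted above.
import math

deaths_strep, n_strep = 4, 55     # streptomycin: EER = 4/55 ~ 7.3%
deaths_bed, n_bed = 14, 52        # bed rest:     CER = 14/52 ~ 25.9%

eer = deaths_strep / n_strep
cer = deaths_bed / n_bed
arr = cer - eer                   # absolute risk reduction, ~19.6%

# Normal-approximation standard error for a difference in proportions.
se = math.sqrt(eer * (1 - eer) / n_strep + cer * (1 - cer) / n_bed)
low, high = arr - 1.96 * se, arr + 1.96 * se   # ~5.8% to 33.5%
print(f"ARR {arr:.1%}, 95% CI {low:.1%} to {high:.1%}")
```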
  • Introduction to Hypothesis Testing
    CHAPTER 8. Introduction to Hypothesis Testing. Chapter outline: 8.1 Inferential Statistics and Hypothesis Testing; 8.2 Four Steps to Hypothesis Testing; 8.3 Hypothesis Testing and Sampling Distributions; 8.4 Making a Decision: Types of Error; 8.5 Testing a Research Hypothesis: Examples Using the z Test; 8.6 Research in Focus: Directional Versus Nondirectional Tests; 8.7 Measuring the Size of an Effect: Cohen's d; 8.8 Effect Size, Power, and Sample Size; 8.9 Additional Factors That Increase Power; 8.10 SPSS in Focus: A Preview for Chapters 9 to 18; 8.11 APA in Focus: Reporting the Test Statistic and Effect Size. LEARNING OBJECTIVES. After reading this chapter, you should be able to: 1 Identify the four steps of hypothesis testing. 2 Define null hypothesis, alternative hypothesis, level of significance, test statistic, p value, and statistical significance. 3 Define Type I error and Type II error, and identify the type of error that researchers control. 4 Calculate the one-independent sample z test and interpret the results. 5 Distinguish between a one-tailed and two-tailed test, and explain why a Type III error is possible only with one-tailed tests. 6 Explain what effect size measures and compute a Cohen's d for the one-independent sample z test. 7 Define power and identify six factors that influence power. 8 Summarize the results of a one-independent sample z test in American Psychological Association (APA) format. 8.1 INFERENTIAL STATISTICS AND HYPOTHESIS TESTING. We use inferential statistics because it allows us to measure behavior in samples to learn more about the behavior in populations that are often too large or inaccessible.
  • One-Way Analysis of Variance: Comparing Several Means
    One-Way Analysis of Variance: Comparing Several Means Diana Mindrila, Ph.D. Phoebe Balentyne, M.Ed. Based on Chapter 25 of The Basic Practice of Statistics (6th ed.) Concepts: . Comparing Several Means . The Analysis of Variance F Test . The Idea of Analysis of Variance . Conditions for ANOVA . F Distributions and Degrees of Freedom Objectives: Describe the problem of multiple comparisons. Describe the idea of analysis of variance. Check the conditions for ANOVA. Describe the F distributions. Conduct and interpret an ANOVA F test. References: Moore, D. S., Notz, W. I, & Flinger, M. A. (2013). The basic practice of statistics (6th ed.). New York, NY: W. H. Freeman and Company. Introduction . The two sample t procedures compare the means of two populations. However, many times more than two groups must be compared. It is possible to conduct t procedures and compare two groups at a time, then draw a conclusion about the differences between all of them. However, if this were done, the Type I error from every comparison would accumulate to a total called “familywise” error, which is much greater than for a single test. The overall p-value increases with each comparison. The solution to this problem is to use another method of comparison, called analysis of variance, most often abbreviated ANOVA. This method allows researchers to compare many groups simultaneously. ANOVA analyzes the variance or how spread apart the individuals are within each group as well as between the different groups. Although there are many types of analysis of variance, these notes will focus on the simplest type of ANOVA, which is called the one-way analysis of variance.
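The "familywise" error accumulation described in the excerpt above can be quantified under the simplifying assumption of independent comparisons: with k tests each at α = 0.05, the chance of at least one false positive is 1 − (1 − α)^k. A tiny sketch (not from the source notes):

```python
# Familywise Type I error rate for k independent comparisons at alpha = 0.05:
# probability of at least one false positive is 1 - (1 - alpha)^k.
alpha = 0.05
for k in (1, 3, 6, 10):
    familywise = 1 - (1 - alpha) ** k
    print(f"{k:2d} comparisons: familywise error rate ~ {familywise:.2f}")
# Prints roughly 0.05, 0.14, 0.26, 0.40 (assumes the comparisons are independent).
```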
  • INTRODUCTION to ONE-WAY ANALYSIS of VARIANCE Dale Berger, Claremont Graduate University
    INTRODUCTION TO ONE-WAY ANALYSIS OF VARIANCE Dale Berger, Claremont Graduate University http://wise.cgu.edu The purpose of this paper is to explain the logic and vocabulary of one-way analysis of variance (ANOVA). The null hypothesis (H0) tested by one-way ANOVA is that two or more population means are equal. A statistically significant test indicates that observed data sampled from each of the populations would be unlikely if the null hypothesis were true. The logic used in ANOVA to compare means of multiple groups is similar to that used with the t-test to compare means of two independent groups. When one-way ANOVA is applied to the special case of two groups, one-way ANOVA gives identical results as the t-test. Not surprisingly, the assumptions needed for the t-test are also needed for ANOVA. We need to assume: 1) random, independent sampling from the k populations; 2) normal population distributions for each of the k populations; 3) equal variances within the k populations. Assumption 1 is crucial for any inferential statistic. As with the t-test, Assumptions 2 and 3 can be relaxed when large samples are used, and Assumption 3 can be relaxed when the sample sizes are roughly the same for each group even for small samples. (If there are extreme outliers or errors in the data, we need to deal with that problem first.) As a first step in this introduction to ANOVA, we will review the t-test for two independent groups, to prepare for an extension to ANOVA. Review of the t-test to test the equality of means for two independent groups Let us start with a small example.
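The excerpt above notes that one-way ANOVA applied to the special case of two groups gives identical results to the independent samples t test. A short numerical check (an illustrative sketch with made-up data, assuming SciPy is available) confirms that F equals t squared and the p-values coincide:

```python
# Check that one-way ANOVA on two groups matches the pooled-variance t test.
from scipy import stats

group1 = [23.1, 25.4, 22.8, 26.0, 24.3, 23.7]
group2 = [27.2, 26.5, 28.1, 25.9, 27.8, 26.4]

res_t = stats.ttest_ind(group1, group2)   # independent samples t test (equal variances)
res_f = stats.f_oneway(group1, group2)    # one-way ANOVA F test

print(f"t = {res_t.statistic:.3f}, t^2 = {res_t.statistic ** 2:.3f}, F = {res_f.statistic:.3f}")
print(f"p (t test) = {res_t.pvalue:.4f}, p (ANOVA) = {res_f.pvalue:.4f}")  # identical p-values
```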