
Statistical significance

In statistical hypothesis testing,[1][2] statistical significance (or a statistically significant result) is attained whenever the observed p-value of a test statistic is less than the significance level defined for the study.[3][4][5][6][7][8][9] The p-value is the probability of obtaining results at least as extreme as those observed, given that the null hypothesis is true. The significance level, α, is the probability of rejecting the null hypothesis, given that it is true.[10] This statistical technique for testing the significance of results was developed in the early 20th century.

In any experiment or observation that involves drawing a sample from a population, there is always the possibility that an observed effect would have occurred due to sampling error alone.[11][12] But if the p-value of an observed effect is less than the significance level, an investigator may conclude that the effect reflects the characteristics of the whole population,[1] thereby rejecting the null hypothesis.[13] A significance level is chosen before data collection, and is typically set to 5%[14] or much lower, depending on the field of study.[15]

The term significance does not imply importance, and the term statistical significance is not the same as research, theoretical, or practical significance.[1][2][16] For example, the term clinical significance refers to the practical importance of a treatment effect.

1 History

Main article: History of statistics

In 1925, Ronald Fisher advanced the idea of statistical hypothesis testing, which he called “tests of significance”, in his publication Statistical Methods for Research Workers.[17][18][19] Fisher suggested a probability of one in twenty (0.05) as a convenient cutoff level to reject the null hypothesis.[20] In a 1933 paper, Jerzy Neyman and Egon Pearson called this cutoff the significance level, which they named α. They recommended that α be set ahead of time, prior to any data collection.[20][21]

Despite his initial suggestion of 0.05 as a significance level, Fisher did not intend this cutoff value to be fixed. For instance, in his 1956 publication Statistical Methods and Scientific Inference he recommended that significance levels be set according to specific circumstances.[20]

1.1 Related concepts

The significance level α is the threshold for p below which the experimenter assumes the null hypothesis is false, and something else is going on. This means that α is also the probability of mistakenly rejecting the null hypothesis, if the null hypothesis is true.[22]

Sometimes researchers talk about the confidence level γ = (1 − α) instead. This is the probability of not rejecting the null hypothesis given that it is true.[23][24] Confidence levels and confidence intervals were introduced by Neyman in 1937.[25]

2 Role in statistical hypothesis testing

Main articles: Statistical hypothesis testing, Null hypothesis, Alternative hypothesis, p-value, and Type I and type II errors

[Figure: In a two-tailed test, the rejection region for a significance level of α=0.05 is partitioned to both ends of the sampling distribution and makes up 5% of the area under the curve (white areas).]

Statistical significance plays a pivotal role in statistical hypothesis testing. It is used to determine whether the null hypothesis should be rejected or retained. The null hypothesis is the default assumption that nothing happened or changed.[26] For the null hypothesis to be rejected, an observed result has to be statistically significant, i.e. the observed p-value is less than the pre-specified significance level.

To determine whether a result is statistically significant, a researcher calculates a p-value, which is the probability of observing an effect at least as extreme as the one obtained, given that the null hypothesis
is true.[9] The null hypothesis is rejected if the p-value is less than a predetermined level, α. α is called the significance level, and is the probability of rejecting the null hypothesis given that it is true (a type I error). It is usually set at or below 5%.

For example, when α is set to 5%, the conditional probability of a type I error, given that the null hypothesis is true, is 5%,[27] and a statistically significant result is one where the observed p-value is less than 5%.[28] When drawing data from a sample, this means that the rejection region comprises 5% of the sampling distribution.[29] These 5% can be allocated to one side of the sampling distribution, as in a one-tailed test, or partitioned to both sides of the distribution as in a two-tailed test, with each tail (or rejection region) containing 2.5% of the distribution.

The use of a one-tailed test depends on whether the alternative hypothesis specifies a direction, such as whether a group of objects is heavier or the performance of students on an assessment is better.[30] A two-tailed test may still be used, but it will be less powerful than a one-tailed test because the rejection region for a one-tailed test is concentrated on one end of the null distribution and is twice the size (5% vs. 2.5%) of each rejection region for a two-tailed test. As a result, the null hypothesis can be rejected with a less extreme result if a one-tailed test is used.[31] The one-tailed test is only more powerful than a two-tailed test if the specified direction of the alternative hypothesis is correct. If it is wrong, however, then the one-tailed test has no power.

2.1 Stringent significance thresholds in specific fields

Main articles: Standard deviation and Normal distribution

In specific fields such as particle physics and manufacturing, statistical significance is often expressed in multiples of the standard deviation or sigma (σ) of a normal distribution, with significance thresholds set at a much stricter level (e.g. 5σ).[32][33] For instance, the certainty of the Higgs boson particle’s existence was based on the 5σ criterion, which corresponds to a p-value of about 1 in 3.5 million.[33][34]

In other fields of scientific research, such as genome-wide association studies, significance levels as low as 5×10⁻⁸ are not uncommon.[35][36]

3 Limitations

Researchers focusing solely on whether their results are statistically significant might report findings that are not substantive[37] and not replicable.[38] There is also a difference between statistical significance and practical significance. A study that is found to be statistically significant may not necessarily be practically significant.[39]

3.1 Effect size

Main article: Effect size

Effect size is a measure of a study’s practical significance.[40] A statistically significant result may have a weak effect. To gauge the research significance of their result, researchers are encouraged to always report an effect size along with p-values. An effect size measure quantifies the strength of an effect, such as the distance between two means in units of standard deviation (cf. Cohen’s d), the correlation between two variables or its square, and other measures.[41]

3.2 Reproducibility

Main article: Reproducibility

A statistically significant result may not be easy to reproduce. In particular, some statistically significant results will in fact be false positives. Each failed attempt to reproduce a result increases the belief that the result was a false positive.[42]

3.3 Controversy around overuse in some journals

Starting in the 2010s, some journals began questioning whether significance testing, and particularly using a threshold of α=5%, was being relied on too heavily as the primary measure of validity of a hypothesis.[43] Some journals encouraged authors to do more detailed analysis than just a statistical significance test. In social psychology, the Journal of Basic and Applied Social Psychology banned the use of significance testing altogether from papers it published, requiring authors to use other measures to evaluate hypotheses and impact.[44][45]

4 See also

• A/B testing, ABX test
• Fisher’s method for combining independent tests of significance
• Look-elsewhere effect
• Multiple comparisons problem
• Sample size
• Texas sharpshooter fallacy (gives examples of tests where the significance level was set too high)
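
The tail areas described under “Role in statistical hypothesis testing” and the 5σ threshold mentioned under “Stringent significance thresholds in specific fields” can be checked numerically. The following Python sketch is illustrative only: it assumes a test statistic that follows a standard normal distribution under the null hypothesis, and the function names and the observed value z = 1.80 are hypothetical choices, not taken from the cited sources.

```python
import math


def upper_tail(z):
    """Return P(Z >= z) for a standard normal variable Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def p_value(z, tails=2):
    """p-value of an observed z statistic under the null hypothesis.

    tails=2: extreme results in either direction count against the null.
    tails=1: only results in the positive direction count (assumed here
    to be the direction named by the alternative hypothesis).
    """
    if tails == 2:
        return 2.0 * upper_tail(abs(z))
    return upper_tail(z)


alpha = 0.05        # significance level, fixed before looking at the data
z_observed = 1.80   # hypothetical observed test statistic

p_one = p_value(z_observed, tails=1)               # about 0.036
p_two = p_value(z_observed, tails=2)               # about 0.072
p_wrong_direction = p_value(-z_observed, tails=1)  # close to 1

print(f"one-tailed p = {p_one:.4f}, significant: {p_one < alpha}")
print(f"two-tailed p = {p_two:.4f}, significant: {p_two < alpha}")
print(f"one-tailed p, effect in the opposite direction = {p_wrong_direction:.4f}")

# One-tailed probability beyond 5 standard deviations (the 5-sigma criterion)
p_five_sigma = upper_tail(5.0)
print(f"5 sigma: p = {p_five_sigma:.3g}, about 1 in {1.0 / p_five_sigma:,.0f}")
```

With these assumed numbers, the one-tailed p-value falls below α = 0.05 while the two-tailed p-value does not, which is the sense in which a one-tailed test can reject the null hypothesis with a less extreme result. If the effect lies in the opposite direction, the one-tailed p-value is close to 1 and the test cannot reject. The last line reproduces the figure of roughly 1 in 3.5 million for the 5σ criterion, under the assumption that it refers to a one-tailed probability.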

5 References

[1] Sirkin, R. Mark (2005). “Two-sample t tests”. Statistics for the Social Sciences (3rd ed.). Thousand Oaks, CA: SAGE Publications, Inc. pp. 271–316. ISBN 1-412-90546-X.
[2] Borror, Connie M. (2009). “Statistical decision making”. The Certified Quality Engineer Handbook (3rd ed.). Milwaukee, WI: ASQ Quality Press. pp. 418–472. ISBN 0-873-89745-5.
[3] Redmond, Carol; Colton, Theodore (2001). “Clinical significance versus statistical significance”. Biostatistics in Clinical Trials. Wiley Reference Series in Biostatistics (3rd ed.). West Sussex, United Kingdom: John Wiley & Sons Ltd. pp. 35–36. ISBN 0-471-82211-6.
[4] Cumming, Geoff (2012). Understanding The New Statistics: Effect Sizes, Confidence Intervals, and Meta-Analysis. New York, USA: Routledge. pp. 27–28.
[5] Krzywinski, Martin; Altman, Naomi (30 October 2013). “Points of significance: Significance, P values and t-tests”. Nature Methods. Nature Publishing Group. 10 (11): 1041–1042. doi:10.1038/nmeth.2698. Retrieved 3 July 2014.
[6] Sham, Pak C.; Purcell, Shaun M (17 April 2014). “Statistical power and significance testing in large-scale genetic studies”. Nature Reviews Genetics. Nature Publishing Group. 15 (5): 335–346. doi:10.1038/nrg3706. Retrieved 3 July 2014.
[7] Johnson, Valen E. (October 9, 2013). “Revised standards for statistical evidence”. Proceedings of the National Academy of Sciences. National Academies of Science. 110: 19313–19317. doi:10.1073/pnas.1313476110. Retrieved 3 July 2014.
[8] Altman, Douglas G. (1999). Practical Statistics for Medical Research. New York, USA: Chapman & Hall/CRC. p. 167. ISBN 978-0412276309.
[9] Devore, Jay L. (2011). Probability and Statistics for Engineering and the Sciences (8th ed.). Boston, MA: Cengage Learning. pp. 300–344. ISBN 0-538-73352-7.
[10] Schlotzhauer, Sandra (2007). Elementary Statistics Using JMP (SAS Press) (PAP/CDR ed.). Cary, NC: SAS Institute. pp. 166–169. ISBN 1-599-94375-1.
[11] Babbie, Earl R. (2013). “The logic of sampling”. The Practice of Social Research (13th ed.). Belmont, CA: Cengage Learning. pp. 185–226. ISBN 1-133-04979-6.
[12] Faherty, Vincent (2008). “Probability and statistical significance”. Compassionate Statistics: Applied Quantitative Analysis for Social Services (With exercises and instructions in SPSS) (1st ed.). Thousand Oaks, CA: SAGE Publications, Inc. pp. 127–138. ISBN 1-412-93982-8.
[13] McKillup, Steve (2006). “Probability helps you make a decision about your results”. Statistics Explained: An Introductory Guide for Life Scientists (1st ed.). Cambridge, United Kingdom: Cambridge University Press. pp. 44–56. ISBN 0-521-54316-9.
[14] Craparo, Robert M. (2007). “Significance level”. In Salkind, Neil J. Encyclopedia of Measurement and Statistics. 3. Thousand Oaks, CA: SAGE Publications. pp. 889–891. ISBN 1-412-91611-9.
[15] Sproull, Natalie L. (2002). “Hypothesis testing”. Handbook of Research Methods: A Guide for Practitioners and Students in the Social Sciences (2nd ed.). Lanham, MD: Scarecrow Press, Inc. pp. 49–64. ISBN 0-810-84486-9.
[16] Myers, Jerome L.; Well, Arnold D.; Lorch Jr, Robert F. (2010). “The t distribution and its applications”. Research Design and Statistical Analysis: Third Edition (3rd ed.). New York, NY: Routledge. pp. 124–153. ISBN 0-805-86431-8.
[17] Cumming, Geoff (2011). “From null hypothesis significance to testing effect sizes”. Understanding The New Statistics: Effect Sizes, Confidence Intervals, and Meta-Analysis. Multivariate Applications Series. East Sussex, United Kingdom: Routledge. pp. 21–52. ISBN 0-415-87968-X.
[18] Fisher, Ronald A. (1925). Statistical Methods for Research Workers. Edinburgh, UK: Oliver and Boyd. p. 43. ISBN 0-050-02170-2.
[19] Poletiek, Fenna H. (2001). “Formal theories of testing”. Hypothesis-testing Behaviour. Essays in Cognitive Psychology (1st ed.). East Sussex, United Kingdom: Psychology Press. pp. 29–48. ISBN 1-841-69159-3.
[20] Quinn, Geoffrey R.; Keough, Michael J. (2002). Experimental Design and Data Analysis for Biologists (1st ed.). Cambridge, UK: Cambridge University Press. pp. 46–69. ISBN 0-521-00976-6.
[21] Neyman, J.; Pearson, E.S. (1933). “The testing of statistical hypotheses in relation to probabilities a priori”. Mathematical Proceedings of the Cambridge Philosophical Society. 29: 492–510. doi:10.1017/S030500410001152X.
[22] Schlotzhauer, Sandra (2007). Elementary Statistics Using JMP (SAS Press) (PAP/CDR ed.). Cary, NC: SAS Institute. pp. 166–169. ISBN 1-599-94375-1.
[23] “Conclusions about statistical significance are possible with the help of the confidence interval. If the confidence interval does not include the value of zero effect, it can be assumed that there is a statistically significant result.” “Confidence Interval or P-Value?”. doi:10.3238/arztebl.2009.0335.
[24] StatNews #73: Overlapping Confidence Intervals and Statistical Significance
[25] Neyman, J. (1937). “Outline of a Theory of Statistical Estimation Based on the Classical Theory of Probability”. Philosophical Transactions of the Royal Society A. 236: 333–380. doi:10.1098/rsta.1937.0005.
[26] Meier, Kenneth J.; Brudney, Jeffrey L.; Bohte, John (2011). Applied Statistics for Public and Nonprofit Administration (3rd ed.). Boston, MA: Cengage Learning. pp. 189–209. ISBN 1-111-34280-6.

[27] Healy, Joseph F. (2009). The Essentials of Statistics: A Tool for Social Research (2nd ed.). Belmont, CA: Cengage Learning. pp. 177–205. ISBN 0-495-60143-8.
[28] McKillup, Steve (2006). Statistics Explained: An Introductory Guide for Life Scientists (1st ed.). Cambridge, UK: Cambridge University Press. pp. 32–38. ISBN 0-521-54316-9.
[29] Heath, David (1995). An Introduction To Experimental Design And Statistics For Biology (1st ed.). Boston, MA: CRC press. pp. 123–154. ISBN 1-857-28132-2.
[30] Myers, Jerome L.; Well, Arnold D.; Lorch, Jr., Robert F. (2010). “Developing fundamentals of hypothesis testing using the binomial distribution”. Research design and statistical analysis (3rd ed.). New York, NY: Routledge. pp. 65–90. ISBN 0-805-86431-8.
[31] Hinton, Perry R. (2010). “Significance, error, and power”. Statistics explained (3rd ed.). New York, NY: Routledge. pp. 79–90. ISBN 1-848-72312-1.
[32] Vaughan, Simon (2013). Scientific Inference: Learning from Data (1st ed.). Cambridge, UK: Cambridge University Press. pp. 146–152. ISBN 1-107-02482-X.
[33] Bracken, Michael B. (2013). Risk, Chance, and Causation: Investigating the Origins and Treatment of Disease (1st ed.). New Haven, CT: Yale University Press. pp. 260–276. ISBN 0-300-18884-6.
[34] Franklin, Allan (2013). “Prologue: The rise of the sigmas”. Shifting Standards: Experiments in Particle Physics in the Twentieth Century (1st ed.). Pittsburgh, PA: University of Pittsburgh Press. pp. Ii–Iii. ISBN 0-822-94430-8.
[35] Clarke, GM; Anderson, CA; Pettersson, FH; Cardon, LR; Morris, AP; Zondervan, KT (February 6, 2011). “Basic statistical analysis in genetic case-control studies”. Nature Protocols. 6 (2): 121–33. doi:10.1038/nprot.2010.182. PMC 3154648. PMID 21293453.
[36] Barsh, GS; Copenhaver, GP; Gibson, G; Williams, SM (July 5, 2012). “Guidelines for Genome-Wide Association Studies”. PLoS Genetics. 8 (7): e1002812. doi:10.1371/journal.pgen.1002812. PMC 3390399. PMID 22792080.
[37] Carver, Ronald P. (1978). “The Case Against Statistical Significance Testing”. Harvard Educational Review. 48: 378–399.
[38] Ioannidis, John P. A. (2005). “Why most published research findings are false”. PLoS Medicine. 2: e124. doi:10.1371/journal.pmed.0020124. PMC 1182327. PMID 16060722.
[39] Hojat, Mohammadreza; Xu, Gang (2004). “A Visitor’s Guide to Effect Sizes”. Advances in Health Sciences Education.
[40] Hojat, Mohammadreza; Xu, Gang (2004). “A Visitor’s Guide to Effect Sizes”. Advances in Health Sciences Education.
[41] Pedhazur, Elazar J.; Schmelkin, Liora P. (1991). Measurement, Design, and Analysis: An Integrated Approach (Student ed.). New York, NY: Psychology Press. pp. 180–210. ISBN 0-805-81063-3.
[42] Stahel, Werner (2016). “Statistical Issue in Reproducibility”. Reproducibility: Principles, Problems, Practices, and Prospects: 87–114.
[43] “CSSME Seminar Series: The argument over p-values and the Null Hypothesis Significance Testing (NHST) paradigm » School of Education » University of Leeds”. www.education.leeds.ac.uk. Retrieved 2016-12-01.
[44] Woolston, Chris (2015-03-05). “Psychology journal bans P values”. Nature. 519 (7541): 9–9. doi:10.1038/519009f.
[45] Siegfried, Tom (2015-03-17). “P value ban: small step for a journal, giant leap for science”. Science News. Retrieved 2016-12-01.

6 Further reading

• Ziliak, Stephen and Deirdre McCloskey (2008), The Cult of Statistical Significance: How the Standard Error Costs Us Jobs, Justice, and Lives. Ann Arbor, University of Michigan Press, 2009. ISBN 978-0-472-07007-7. Reviews and reception: (compiled by Ziliak)
• Thompson, Bruce (2004). “The “significance” crisis in psychology and education”. Journal of Socio-Economics. 33: 607–613. doi:10.1016/j.socec.2004.09.034.
• Chow, Siu L. (1996). Statistical Significance: Rationale, Validity and Utility, Volume 1 of series Introducing Statistical Methods, Sage Publications Ltd, ISBN 978-0-7619-5205-3 – argues that statistical significance is useful in certain circumstances.
• Kline, Rex (2004). Beyond Significance Testing: Reforming Data Analysis Methods in Behavioral Research. Washington, DC: American Psychological Association.
• Nuzzo, Regina (2014). Scientific method: Statistical errors. Nature Vol. 506, p. 150–152 (open access). Highlights common misunderstandings about the p value.
• Cohen, Jacob (1994). The earth is round (p<.05). American Psychologist. Vol 49, p. 997–1003. Reviews problems with null hypothesis statistical testing.

7 External links

• The article "Earliest Known Uses of Some of the Words of Mathematics (S)" contains an entry on Significance that provides some historical information.
• "The Concept of Statistical Significance Testing" (February 1994): article by Bruce Thompson hosted by the ERIC Clearinghouse on Assessment and Evaluation, Washington, D.C.
• "What does it mean for a result to be “statistically significant”?" (no date): an article from the Statistical Assessment Service at George Mason University, Washington, D.C.

8 Text and image sources, contributors, and licenses

8.1 Text

• Statistical significance Source: https://en.wikipedia.org/wiki/Statistical_significance?oldid=752501010 Contributors: Bryan Derksen, The Anome, William Avery, Michael Hardy, Kku, Gabbe, Dcljr, Ellywa, Nichtich~enwiki, Den fjättrade ankan~enwiki, Nerd~enwiki, Cherkash, Topbanana, Paranoid, Gak, Henrygb, Giftlite, Sj, BrendanH, Pgan002, Antandrus, L353a1, DanielCD, Rich Farmbrough, Yknott, Kndiaye, Bender235, Slb, Cretog8, Arcadian, Andrewpmk, John Quiggin, Seans Potato Business, Alkarex, Woohookitty, Btyner, Rjwilmsi, Smoe, Thomas Arelatensis, Thisismikesother, ElKevbo, Cjpuffin, EvanSeeds, Lborelli~enwiki, Mathbot, Riki, Preslethe, Vonkje, Chobot, YurikBot, Wavelength, Gaius Cornelius, ENeville, Nephron, DRosenbach, Jon Olav Vik, Doc pune, Lt-wiki-bot, Davril2020, Bad- gettrg, Darrel francis, SmackBot, McGeddon, Jtneill, Robfuller, Ohnoitsjamie, Josefec, Nbarth, Danielkueh, Wen D House, Richard001, G716, Arodb, Euchiasmus, Tim bates, Nijdam, Tommyzee, Mmiller0712, Mdgross50, Grapplequip, DwightKingsbury, Joseph Solis in Australia, Abeg92, Tawkerbot4, LarryQ, Thijs!bot, Tallred, RickinBaltimore, Tillman, Erxnmedia, Fetchcomms, Magioladitis, Torchi- est, Inhumandecency, MartinBot, ChemNerd, Lilac Soul, Rod57, Coppertwig, Yym1997, Kenneth M Burke, Spellcast, Philip Trueman, Don Quixote de la Mancha, MuanN, Seraphim, Sprasad.ee, SQL, Wangerin, Lavers, Jasondet, Strasburger, The-G-Unit-Boss, Melcombe, Wjmummert, Martarius, ClueBot, Binksternet, Srudes2, Winsteps, Pwestfall, Lot49a, Qwfp, Staticshakedown, Dthomsen8, SilvonenBot, Mifter, Aam aadmi, ZooFari, Jmkim dot com, Tayste, Addbot, Eric Drexler, DOI bot, Fgnievinski, Bulletproofman19, MrOllie, Palmer- abollo, Numbo3-bot, Ehrenkater, Zorrobot, Luckas-bot, AnomieBOT, ChristopheS, Materialscientist, SvartMan, Jtamad, Xqbot, Bbarkley, Sylwia Ufnalska, M12107, Constructive editor, FrescoBot, Sławomir Biały, Pinethicket, Edderso, Tom.Reding, Georg Hurtig, RedBot, Gjsis, Cerebis, Animalparty, Indicedigini, Raylyons, 
Billare, Sir Arthur Williams, Rgmooney C109, GoingBatty, Schwa dk, The Blade of the Northern Lights, HiW-Bot, Kostya 888, Muditjai, Mysticyx, L Kensington, Fg63~enwiki, Mikhail Ryazanov, ClueBot NG, Mathstat, Michael D. Stephens, Helpful Pixie Bot, BG19bot, Wikstar7, Lilingxi, Matthieu Vergne, Manoguru, Minsbot, MathewTownsend, BattyBot, HankW512, ChrisGualtieri, Eggingerik, BetseyTrotwood, NicenFriendlyPerson, Sa publishers, ArmbrustBot, Soranoch, Drchriswilliams, Thewikiguru1, Rgiordan, EmilKarlsson, 1980na, Ejw wiki editor, Isambard Kingdom, User000name, ChrisLloyd58 and Anonymous: 162

8.2 Images

• File:Commons-logo.svg Source: https://upload.wikimedia.org/wikipedia/en/4/4a/Commons-logo.svg License: PD Contributors: ? Original artist: ?
• File:Fisher_iris_versicolor_sepalwidth.svg Source: https://upload.wikimedia.org/wikipedia/commons/4/40/Fisher_iris_versicolor_sepalwidth.svg License: CC BY-SA 3.0 Contributors: en:Image:Fisher iris versicolor sepalwidth.png Original artist: en:User:Qwfp (original); Pbroks13 (talk) (redraw)
• File:Folder_Hexagonal_Icon.svg Source: https://upload.wikimedia.org/wikipedia/en/4/48/Folder_Hexagonal_Icon.svg License: Cc-by-sa-3.0 Contributors: ? Original artist: ?
• File:Lock-green.svg Source: https://upload.wikimedia.org/wikipedia/commons/6/65/Lock-green.svg License: CC0 Contributors: en:File:Free-to-read_lock_75.svg Original artist: User:Trappist the monk
• File:NormalDist1.96.png Source: https://upload.wikimedia.org/wikipedia/en/b/bf/NormalDist1.96.png License: Cc-by-sa-3.0 Contributors: self-made Original artist: Qwfp (talk)
• File:People_icon.svg Source: https://upload.wikimedia.org/wikipedia/commons/3/37/People_icon.svg License: CC0 Contributors: OpenClipart Original artist: OpenClipart
• File:Portal-puzzle.svg Source: https://upload.wikimedia.org/wikipedia/en/f/fd/Portal-puzzle.svg License: Public domain Contributors: ? Original artist: ?
• File:Wikiversity-logo.svg Source: https://upload.wikimedia.org/wikipedia/commons/9/91/Wikiversity-logo.svg License: CC BY-SA 3.0 Contributors: Snorky (optimized and cleaned up by verdy_p) Original artist: Snorky (optimized and cleaned up by verdy_p)

8.3 Content license

• Creative Commons Attribution-Share Alike 3.0