
Research Skills Seminar Series 2019 CAHS Research Education Program

Introductory Biostatistics: Understanding and reporting research results, including P-values and Confidence Intervals

Dr Julie Marsh, UWA School of Mathematics and Snr Research Fellow, Telethon Kids Institute

7 June 2019

Research Skills Seminar Series | CAHS Research Education Program
Department of Child Health Research | Child and Adolescent Health Service

ResearchEducationProgram.org

Introductory Biostatistics

CONTENTS:

1 PRESENTATION
2 ADDITIONAL WEBSITES
3 STATISTICAL ANALYSIS
4 BRADFORD-HILL CRITERIA FOR CAUSALITY
5 STATISTICAL SUPPORT CONTACTS
  5.1 Perth Children's Hospital
  5.2 Telethon Kids Institute
  5.3 University of Western Australia – The Centre for Applied Statistics
  5.4 WAHTN Clinical Trial and Data Management Centre
6 REPORTING GUIDELINES
7 STATISTICAL ERRORS, R. NUZZO

© CAHS Research Education Program, Department of Child Health Research, Child and Adolescent Health Service, Department of Health WA, 2019
Copyright to this material produced by the CAHS Research Education Program, Department of Child Health Research, Child and Adolescent Health Service, Western Australia, under the provisions of the Copyright Act 1968 (C'wth Australia). Apart from any fair dealing for personal, academic, research or non-commercial use, no part may be reproduced without written permission. The Department of Child Health Research is under no obligation to grant this permission. Please acknowledge the Research Education Program, Department of Child Health Research, Child and Adolescent Health Service when reproducing or quoting material from this source.

RESEARCH SKILLS SEMINAR SERIES 2019 | CAHS Research Education Program

Introductory Biostatistics: Understanding and reporting research results, including P-values and Confidence Intervals
Dr Julie Marsh, Snr Research Fellow, Telethon Kids Institute

Overview
. Observational vs Experimental Design (randomisation)
. Chance, Bias & Confounding
. Statistical analyses (hypothesis testing)
. Understanding confidence intervals
. Interpreting P-values
. Standardised reporting
. Where can I get statistical help?

Research Skills Seminar Series | CAHS Research Education Program
Department of Child Health Research | Child and Adolescent Health Service

ResearchEducationProgram.org

Statisticians speak in code… but they hide it in everyday language

Observational vs Experimental Design


Observational Design
An Observational study draws inferences from a sample to a population, where the independent variable is not under the control of the researcher because of ethical concerns or logistical constraints.

Inference is where we infer (guess!) the properties of a population, i.e. we may assume that the heights of the people in this room are normally distributed.

A sample is a set of data collected or selected from a population using a defined procedure, i.e. we may choose to only measure the heights of the people in the first 5 rows of the auditorium.


A population is all members of a defined group, i.e. everyone currently sitting in the auditorium.

The independent variable is the variable that is changed or controlled in an experimental design, or simply observed in an observational design, i.e. exposure to asbestos or an intervention, such as a drug.


In observational design, statistical control often refers to the technique of separating out the effect of one independent variable on the outcome (e.g. exposed or not exposed). In an experimental design, it usually refers to the use of a comparator group (i.e. placebo or standard care) and assumes all other factors are constant.

Experimental Design
Experimental design refers to a plan for assigning units (patients, mice, etc.) to treatments or interventions, whilst holding other factors constant. A good design addresses:
. Association
. Control
. Variability

Experimental Design
. Association
This is the relationship between the independent variable and the dependent variable (outcome, endpoint). We use the word association as it may not be possible to determine if the relationship is causal.

. Control
This allows the experimenter to rule out alternative explanations due to the confounding effects of factors other than the independent variable.


Experimental Design
. Variability
Need to minimise variability in the outcome to maximise study power, which often allows the number of units (e.g. patients) studied to be reduced:
• measurement variability
• assessor variability
• homogeneous units

Experimental Design
. Randomised Controlled Trials (RCT)
• The highest level of evidence for causality comes from Randomised Controlled Trials (RCTs)
• Patients are randomly allocated to treatment groups


Randomisation
. Often not possible to control for all factors; randomisation distributes patients with these factors equally across treatment groups
. Randomisation minimises potential bias from factors other than the independent variable, including some not even measured
. Important to report baseline demographics for each treatment group in an RCT
. Generalisability (external validity) of the results is determined by the inclusion/exclusion criteria


Chance, Bias and Confounding

. The critical assessment of research results must rule out or minimise:
CHANCE
BIAS
CONFOUNDING


Chance and Bias
. Chance
• Statistical tests focus on the likelihood that a result or relationship is caused by something other than random chance
. Bias
• Where a systematic error in the design, recruitment or analysis results in an incorrect estimation of the true effect of the exposure/intervention on the outcome

Bias
. Selection Bias
• E.g. farmers may appear healthier than non-farmers, as those in poor health may move out of the occupation
. Recall Bias
• E.g. people diagnosed with a disease (cases) may be more likely to remember exposure than non-cases


Bias
. Efficacy Bias
• If treatment allocation is known, then assessment of outcome by either the participant or study personnel may be distorted
. Survival Bias
• Outcome is dependent on the participant surviving or tolerating treatment until the assessment date
. Treatment Allocation Bias
• Choice of treatment should not be related to prognostic factors, e.g. disease severity, age etc.
. Protocol/Study Deviations
• E.g. cross-contamination in a school-based intervention programme
• E.g. not taking study medication (hence RCTs include both intent-to-treat and per-protocol analyses)


Confounding Factors
. Where the relationship between intervention or exposure and outcome is distorted by another variable

More formally:
. In a study of the effect of a variable X on disease Y, a third variable W may be a confounder if:
a) it is a risk factor for disease Y, and
b) it is associated with (but does not cause) variable X


Confounding
. Evidence is generally reported as differences, proportions, rates and ratios (RR, OR, HR), but language can distort the evidence
. Size of effect is OBJECTIVE, whereas strength of evidence is SUBJECTIVE
. We often see the statement “there is strong evidence” when research is reported, but you should be the judge

Chance, Bias & Confounding
. Good Clinical Practice (GCP) and experimental design
• attempt to minimise bias and confounding
. Hypothesis testing
• assesses whether we could have observed the results by chance alone
• helps us understand the underlying variation in the results
. Reporting guidelines
• attempt to minimise subjective reporting


Statistical Analyses

Hypothesis testing: “What is the probability that I would have obtained this set of data due to chance alone, under my current beliefs?”


Statistical Analysis
A great website that can guide you through the analysis stage if you do not have statistical support:
https://stats.idre.ucla.edu/other/mult-pkg/whatstat/ (link in your handouts)
Links take you to example code for each type of analysis, e.g. a Stata 1-sample t-test.


Example: One Sample T-Test on Stata
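The slide reproduces Stata output for this kind of test (in Stata: ttest bw == 3.4). As a minimal sketch of the same analysis in R, one of the other packages covered by the UCLA guide above: here bw is a hypothetical vector standing in for the 152 birth weights used in the example that follows, simulated so the sketch runs on its own.

# One-sample t-test of mean birth weight against the WA mean of 3.4 kg.
# 'bw' stands in for real data; 152 values are simulated for illustration.
set.seed(1)
bw <- rnorm(152, mean = 3.32, sd = 0.424)
t.test(bw, mu = 3.4)   # reports t, df = 151, the p-value and a 95% CI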

Understanding Confidence Intervals


Understanding Confidence Intervals
. To illustrate some statistical concepts today, I will use a simple (full-term) birth weight example
• Birth weight is normally distributed (Gaussian distribution)
• Sample size is 152 live births

. We collect a sample of birth weights from the offspring of mothers who exercised in pregnancy
. We wish to know if there is any difference compared to the WA population mean birth weight, μ = 3.4 kg
. First we will look at how to describe the variability, then we will construct a confidence interval for the sample mean birth weight and perform a hypothesis test using a one-sample t-test


Accuracy
. Accuracy of the sample mean depends on:
• sample size (n)
• sample standard deviation (s)
. The variability associated with the sample mean is described by its sampling distribution: x̅ ~ N[μ, σ²/n]
. We estimate σ (the population standard deviation) with s, so the standard error of the sample mean is s/√n

Understanding Confidence Intervals
. Back to the birth weight example:
• sample size (n) = 152 newborns
• sample mean (x̅) = 3.320 kg
• sample standard deviation (s) = 0.424 kg
• standard error = 0.424/√152 = 0.0344
. Confidence intervals are an easy-to-interpret way of expressing the precision with which a statistic (e.g. x̅) estimates a parameter (e.g. μ)


Understanding Confidence Intervals
. Remember, for a standard normal Z:
p(−1.96 < Z < 1.96) = 0.95
. Therefore:
p(−1.96 < (x̅ − μ)/(s/√n) < 1.96) = 0.95
. Rearranging gives us:
p(x̅ − 1.96(s/√n) < μ < x̅ + 1.96(s/√n)) = 0.95
. 95% confidence interval for mean birth weight: x̅ ± 1.96(s/√n)

3.320 ± 1.96 × 0.0344, i.e. 95% CI [3.25, 3.39] kg

But how do we interpret this?
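As a quick sketch in R, the same interval can be computed from the summary statistics alone (values as on the slide; the rounding matches the slide's [3.25, 3.39]):

# 95% CI for the mean from n, x-bar and s, using the normal multiplier 1.96
n <- 152; xbar <- 3.320; s <- 0.424
se <- s / sqrt(n)                  # standard error = 0.0344
xbar + c(-1.96, 1.96) * se         # 3.25 to 3.39 kg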

Understanding Confidence Intervals
. Intuitive:
• A confidence interval provides a range of plausible values for a parameter, in light of the observed data
. Formal:
• If a very large number of random samples (all the same size) were collected from a population, and a 95% CI computed for each sample, then the true value of the parameter would lie in about 95% of these intervals

. For 20 random samples, based on a population (true) mean of 100, we could expect around one 95% CI to NOT contain the true mean
. This is our Type I error (α):
• i.e. 1 in 20, or
• 0.05, or
• 5%
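This "about 1 in 20" interpretation is easy to check by simulation. A minimal R sketch (our own illustration, not from the slides; the sample size of 30 and SD of 15 are arbitrary choices):

# Draw 20 random samples from a population with true mean 100,
# compute a 95% CI for each, and count how many miss the true mean.
set.seed(42)
misses <- replicate(20, {
  x  <- rnorm(30, mean = 100, sd = 15)  # one random sample
  ci <- t.test(x)$conf.int              # 95% CI for the mean
  ci[1] > 100 || ci[2] < 100            # TRUE if the CI excludes 100
})
sum(misses)   # around 1 of the 20 intervals, on average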

Understanding Confidence Intervals
. For constant sample size and standard error, the width of the confidence interval is determined by the significance level (α)
. A 99% CI is the widest
. A 90% CI is the narrowest

Normal compared to T-Distribution
. When sample sizes are small, it is necessary to use Student's t-distribution rather than the normal distribution
. The “degrees of freedom” (df) is simply the parameter for the t-distribution
• E.g. for a one-sample t-test: df = n − 1 (sample size minus 1)
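In R the interval multipliers come from qnorm() (normal) and qt() (Student's t); comparing them shows both points. A sketch, using df values matching the sample sizes pictured on the next slide:

qnorm(c(0.950, 0.975, 0.995))      # 1.64, 1.96, 2.58: 90%, 95%, 99% two-sided multipliers
qt(0.975, df = c(5, 11, 29, 119))  # 2.57, 2.20, 2.05, 1.98: approaches 1.96 as df grows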


Normal compared to T-Distribution
[Figure: t-distribution density curves for increasing sample sizes: a typical mouse study (n = 6), a typical phase 1 study (n = 12), and the often incorrectly invoked “CLT” cut-off (n = 30), approaching the normal curve; if n > 120, use the normal distribution.]

Understanding Confidence Intervals
. Standard error gives an idea of the level of precision of our estimate
. We can calculate a confidence interval based on:
• the estimate
• its standard error, and
• the known properties (distribution) of the estimate
. Each tail region represents a probability of 2.5%, so for a 2-tailed test the significance level (α) is 5%
. The CI gives a plausible range of values for the true population parameter

Interpreting P-Values

Hypothesis Testing
“What is the probability that I would have obtained this set of data due to chance alone, under my current beliefs?”
A statistical way to assist with answering questions/claims about certain population parameters by sampling from the population of interest.


Interpreting P-Values
Continuing with the birth weight example, we can write the null (H0) and alternative (H1) hypotheses as:
. H0: true mean birth weight is 3.4 kg (μ = 3.4)
. H1: true mean birth weight is not 3.4 kg (μ ≠ 3.4)
• This is a 2-sided test
• It does not specify the direction in which we are looking for μ to depart from the value specified under the null hypothesis

You may be familiar with the formula for a one-sample t-test:
z = (x̅ − μ)/(s/√n)
where:
x̅ = sample mean
μ = mean under the null hypothesis (H0)
s = sample standard deviation
n = sample size


Interpreting P-Values
. Substituting in the values from the birth weight example:
z = (3.320 − 3.4)/(0.424/√152) = −2.33
. Z is likely to take values close to zero if H0 is correct
. Z is likely to take extreme, large positive or negative values if H0 is incorrect
. Is −2.33 sufficiently extreme to cause us to reject H0?

. A P-value is defined to be the probability of getting a test statistic at least as extreme as the one observed, under the assumption that H0 is true
. In hypothesis testing we assume that H0 is true
• For the birth weight example we know that the test statistic (Z) has a standard normal distribution, N[0,1]
. How likely are we to see a result at least as extreme as the one observed (i.e. z < −2.33 or z > 2.33)?
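The substitution as an R sketch (values from the slides):

# Test statistic for the one-sample test of mu = 3.4
xbar <- 3.320; mu0 <- 3.4; s <- 0.424; n <- 152
(xbar - mu0) / (s / sqrt(n))   # -2.33 (to two decimal places)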

Interpreting P-Values
. Hence the p-value calculation is based on the area under the curve in each tail of the standard normal distribution:
p-value = p(Z < −2.33) + p(Z > 2.33) = 0.0099 + 0.0099 = 0.0198
. The probability in each “tail” can be obtained from statistical tables or statistical packages
[Figure: standard normal density curve with the tail areas beyond ±2.33 shaded; P(Z < −2.33) = 0.0099 and P(Z > 2.33) = 0.0099.]

. From the hypothesis test, p-value = 0.0198
. Since the p-value is less than the significance level 0.05, we reject H0
. We conclude that the mean birth weight in the exercise group is lower than the WA mean birth weight
. Is this a clinically significant result for birth weight in the exercise group?
• mean: 3.32 kg; 95% CI (3.25, 3.39) kg; p = 0.02
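In R, pnorm() gives the lower-tail area of the standard normal, so the two tails are (a sketch of the slide's arithmetic):

pnorm(-2.33) + (1 - pnorm(2.33))  # 0.0099 + 0.0099 = 0.0198
2 * pnorm(-2.33)                  # the same, by symmetry
# With the raw data, t.test(bw, mu = 3.4) reports a very similar p-value
# (~0.02), using the t-distribution with df = 151 instead of the normal.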

Interpreting P-Values
. Rejection of the null hypothesis (H0) is determined by the significance level (α, usually 5%, i.e. 0.05)
. The significance level is the probability of a type I error (falsely rejecting a true null hypothesis)
. Fisher (1965) did not intend the significance level to be fixed at 5%, but set according to circumstances

Sample Size
Why is sample size so important?


Precision
. Sample size controls the precision of the conclusions by taking into account:
• variability of the outcome
• sensitivity of the statistical method
• clinically relevant difference (δ)

Sample Size
. Do you think there would be a significant difference between these two samples?
[Figure: two sample distributions side by side; sample size = 150 per group, standard deviation = 0.5.]


Sample Size
. What happens when we reduce the size of the difference between the two means?
[Figure: original plot alongside a plot with a smaller difference between group means; sample size = 150 per group, standard deviation = 0.5.]

. What happens when we increase the variability?
[Figure: original plot alongside a plot with larger spread; sample size = 150 per group, standard deviation = 0.5.]


Sample Size
. The same happens when we decrease the sample size
. Standard error = s/√n
[Figure: original plot alongside a plot with sample size = 10 per group, standard deviation = 0.5.]

. The significance level, or Type I error (the probability of falsely rejecting the null hypothesis), is illustrated below for a 2-tailed hypothesis test
[Figure: rejection regions in both tails of the test statistic's distribution; sample size = 10 per group, standard deviation = 0.5.]


Sample Size
. The statistical power of this test is illustrated below
. Type II error is the probability of falsely failing to reject the null hypothesis
. Power = 1 − Type II error = 1 − 0.033 = 0.967 (or 96.7%)
[Figure: distributions under the null and alternative hypotheses, with the Type I and Type II error regions marked; sample size = 10 per group, standard deviation = 0.5.]

. Usually the statistical power of a hypothesis test is defined to be at least 80%
. In this case we could detect a difference of at least 0.443 kg (3.843 − 3.4 = 0.443 kg)
. Power = 1 − Type II error = 1 − 0.2 = 0.8 (or 80%)
[Figure: as above, with the alternative mean placed so that power is 80%; sample size = 10 per group, standard deviation = 0.5.]
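The 80% figure can be reproduced with the normal approximation the slide appears to use; a sketch in R for a one-sample test against μ = 3.4, with the slide's values:

# Power to detect a difference of 0.443 kg with n = 10, s = 0.5, alpha = 0.05
n <- 10; s <- 0.5; delta <- 0.443
z_crit <- qnorm(0.975)                 # 1.96 for a 2-tailed 5% test
pnorm(delta / (s / sqrt(n)) - z_crit)  # ~0.80
# The built-in power.t.test(n = 10, delta = 0.443, sd = 0.5,
# type = "one.sample") returns a lower value (about 0.7) because it uses
# the t-distribution with df = 9 rather than the normal approximation.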


Standardised Reporting

. P-values give the likelihood that an observed association could have arisen solely as a result of chance in sampling
. Estimate of effect size gives direction and magnitude
. Confidence intervals are an estimate of variability in the effect size (precision)
. Statistics cannot distinguish causal from non-causal associations


Bradford-Hill Criteria for Causality
. Temporal relationship
. Strength of relationship
. Dose-response
. Consistency
. Plausibility
. Consideration of alternative explanations
. Experiment
. Specificity
. Coherence
Austin Bradford Hill: The Environment and Disease: Association or Causation? Proceedings of the Royal Society of Medicine 58 (1965): 295-300.

Standardised Reporting
. Reporting guidelines provide
• checklists, depending on the type of study design
• assurance that published research includes sufficient details for us to critically assess the quality of research
. CONSORT, TREND, STROBE, REMARK, STREGA, PRISMA
. These guidelines should also be consulted:
• before you Design the Study
• before you Plan the Analysis
• before you Write the Paper
• before you Submit the Paper

Where can I get statistical help?

Where can I find a statistician?
. Perth Children's Hospital: free advice through the Telethon Clinical Research Centre
. Telethon Kids Institute (consultancy service): @telethonkids.org.au
. UWA (consultancy service): consulting-[email protected]
• The Centre for Applied Statistics, UWA, offers free advice to UWA postgraduate research students
. More in handouts

Checklist for talking to a Statistician
. Clear hypothesis
. Proposed study design
. Primary endpoint & estimate of variability
. Clinically relevant effect size
. Estimate of feasible sample size based on budget or potential annual patient recruitment
. Important confounders & sources of bias
. Similar publications or systematic reviews

How can I learn more about statistics?
In the absence of large, randomised, well-controlled clinical trials to address every research question, we all need to increase our statistical literacy.
Imagine if we updated our statistical skills as often as we updated our computer skills.


How can I learn more about statistics?
In person at Perth Children's Hospital:
. Attend Research Skills Seminars
In person at UWA:
. The Centre for Applied Statistics provides short courses in statistics which are heavily discounted for students
Joint Clinical-Statistical Supervision:
. If one of your supervisors is a statistician then you will have unlimited access to statistical knowledge/training

Online:
. Data Science Specialization, Johns Hopkins University
. FAQ: You can access the course for free via https://www.coursera.org/specializations/jhu-data-science#courses. This will allow you to explore the course, watch lectures, and participate in discussions for free. To be eligible to earn a certificate, you must either pay for enrolment or qualify for financial aid.
. Links in your handouts



2 ADDITIONAL WEBSITES

UCLA – Institute for Digital Research and Education
Choosing the Correct Statistical Test in SAS, Stata, SPSS and R
https://stats.idre.ucla.edu/other/mult-pkg/whatstat/

Johns Hopkins University – Data Science Specialization
https://www.coursera.org/specializations/jhu-data-science#courses

3 STATISTICAL ANALYSIS

UCLA Institute for Digital Research & Education
https://stats.idre.ucla.edu/other/mult-pkg/whatstat/
Refer to slides 29-31

4 BRADFORD-HILL CRITERIA FOR CAUSALITY
. Temporal relationship
. Strength of relationship
. Dose-response
. Consistency
. Plausibility
. Consideration of alternative explanations
. Experiment
. Specificity
. Coherence

5 STATISTICAL SUPPORT CONTACTS

5.1 Perth Children’s Hospital

Telethon Clinical Research Centre (TCRC), Department of Child Health Research
Child and Adolescent Health Service
[email protected]
https://pch.health.wa.gov.au/Research/For-researchers/Telethon-Clinical-Research-Centre

Biostatistics and Data Management Support through TCRC https://cahs‐healthpoint.hdwa.health.wa.gov.au/directory/research/researchers/Pages/Biostatistics.aspx (internal Department of Health site)

5.2 Telethon Kids Institute Consultancy Service [email protected]

5.3 University of Western Australia – The Centre for Applied Statistics
Offers free advice for UWA postgraduate research students.
consulting-[email protected]

5.4 WAHTN Clinical Trial and Data Management Centre The Clinical Trial and Data Management Centre is a WAHTN enabling platform which aims to enhance clinical trials and related data management in Western Australia.

The platform is a WAHTN‐wide entity sharing expertise in clinical trial study design (including novel designs), clinical trial conduct, data management, data‐linkage, analytical techniques for clinical trial datasets, bio‐repository techniques and clinical registry datasets. It facilitates the pursuit of large‐scale clinical trials and translational healthcare research in WA. https://www.wahtn.org/state‐activities/clinical‐trials‐data‐centre/ Contact: [email protected] 08 9266 1970

Clinical Research Support Service – EMHS
Sessions: Tuesdays 9:30am – 3:00pm, EMHS Research Hub, L2 Kirkman House
30th July, 6th August, 8th October, 3rd December


6 REPORTING GUIDELINES

Estimating intervention effects:
. CONSORT (randomised clinical trials)
. TREND (non-randomised studies)

Assessing causes & prognosis:
. STROBE (observational studies)

Quantifying accuracy of diagnosis & prognosis tools:
. STARD (diagnostic studies)
. REMARK (tumour marker prognostic studies)

Testing genetic association:
. STREGA (genetic observational studies)

Aggregating evidence:
. PRISMA (previously known as QUOROM) – systematic reviews & meta-analyses [MOOSE, IOM]

• CONSORT (CONsolidated Standards Of Reporting Trials) Moher D, Hopewell S, Schulz KF, Montori V, Gotzsche PC, Devereux PJ, et al. CONSORT 2010 Explanation and Elaboration: updated guidelines for reporting parallel group randomised trials. BMJ 2010;340:c869.

• TREND (Transparent Reporting of Evaluations with Non-randomised Designs) Des Jarlais DC, Lyles C, Crepaz N, TREND Group. Improving the reporting quality of nonrandomized evaluations of behavioral and public health interventions: the TREND statement. Am J Public Health 2004;94:361-6.

• STROBE (STrengthening the Reporting of OBservational studies in Epidemiology) Vandenbroucke JP, von Elm E, Altman DG, Gøtzsche PC, Mulrow CD, Pocock SJ, et al; STROBE Initiative. Strengthening the reporting of observational studies in epidemiology (STROBE): explanation and elaboration. PLoS Med 2007;4:1628-54.

• STARD (STAndards for the Reporting of Diagnostic accuracy studies) Bossuyt PM, Reitsma JB, Bruns DE, Gatsonis CA, Glasziou PP, Irwig LM, et al. Towards complete and accurate reporting of studies of diagnostic accuracy: the STARD initiative. BMJ 2003;326:41-4.

• REMARK (REporting recommendations for tumour MARKer prognostic studies) McShane LM, Altman DG, Sauerbrei W, Taube SE, Gion M, Clark GM. REporting recommendations for tumour MARKer prognostic studies (REMARK). Br J Cancer 2005;93:387-91.

• STREGA (STrengthening the REporting of Genetic Associations) Little J, Higgins JP, Ioannidis JP, Moher D, Gagnon F, von Elm E, et al. Strengthening the reporting of genetic association studies (STREGA): an extension of the STROBE statement. Eur J Epidemiol 2009;24:37-55.

Understanding & reporting your research results – Additional Resources

Reporting Guidelines (continued)

• PRISMA (Preferred Reporting Items for Systematic reviews and Meta-Analyses) Moher D, Liberati A, Tetzlaff J, Altman DG; PRISMA Group. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. PLoS Med 2009;6:e1000097. [Previously known as Quality of Reporting of Meta-analyses or QUOROM]

• MOOSE: Meta-analysis Of Observational Studies in Epidemiology http://www.consort-statement.org/mod_product/uploads/MOOSE%20Statement%202000.pdf

• IOM: Institute of Medicine Standards for Systematic Reviews http://iom.edu/Reports/2011/Finding-What-Works-in-Health-Care-Standards-for-Systematic-Reviews.aspx

Recommended basic statistics textbook

• De Veaux, R.D., Velleman, P.F., and Bock, D. (2012). Stats: Data And Models (3rd Edition), Pearson Education, Boston.

Good source for how to perform statistical analyses (SAS, SPSS, STATA & R)

• http://www.ats.ucla.edu/stat/ • If you can’t find how to perform the statistical analysis on this site then you definitely need to consult a statistician!

The following pages are from the back pages of “Stats: Data And Models (2nd Edition)” and provide a great overview of common (basic) statistical formulas, methods and associated assumptions.


[The scanned pages reproduce the back-page reference tables from “Stats: Data And Models”: the “Quick Guide to Inference” (matching each question to a procedure, model, parameter and estimate for one- and two-sample proportions and means, paired data, chi-square tests and regression), “Assumptions for Inference and Conditions That Support or Override Them”, and “Selected Formulas”. The scans did not convert to readable text; only the page titles are recoverable here.]
STATISTICAL ERRORS
P values, the ‘gold standard’ of statistical validity, are not as reliable as many assume.

BY REGINA NUZZO

For a brief moment in 2010, Matt Motyl was on the brink of scientific glory: he had discovered that extremists quite literally see the world in black and white.

The results were “plain as day”, recalls Motyl, a psychology PhD student at the University of Virginia in Charlottesville. Data from a study of nearly 2,000 people seemed to show that political moderates saw shades of grey more accurately than did either left-wing or right-wing extremists. “The hypothesis was sexy,” he says, “and the data provided clear support.” The P value, a common index for the strength of evidence, was 0.01 — usually interpreted as ‘very significant’. Publication in a high-impact journal seemed within Motyl’s grasp.

But then reality intervened. Sensitive to controversies over reproducibility, Motyl and his adviser, Brian Nosek, decided to replicate the study. With extra data, the P value came out as 0.59 — not even close to the conventional level of significance, 0.05. The effect had disappeared, and with it, Motyl’s dreams of youthful fame [1].

It turned out that the problem was not in the data or in Motyl’s analyses. It lay in the surprisingly slippery nature of the P value, which is neither as reliable nor as objective as most scientists assume. “P values are not doing their job, because they can’t,” says Stephen Ziliak, an economist at Roosevelt University in Chicago, Illinois, and a frequent critic of the way statistics are used.

For many scientists, this is especially worrying in light of the reproducibility concerns. In 2005, epidemiologist John Ioannidis of Stanford University in California suggested that most published findings are false [2]; since then, a string of high-profile replication problems has forced scientists to rethink how they evaluate results.

At the same time, statisticians are looking for better ways of thinking about data, to help scientists to avoid missing important information or acting on false alarms. “Change your statistical philosophy and all of a sudden different things become important,” says Steven Goodman, a physician and statistician at Stanford. “Then ‘laws’ handed down from God are no longer handed down from God. They’re actually handed down to us by ourselves, through the methodology we adopt.”

OUT OF CONTEXT
P values have always had critics. In their almost nine decades of existence, they have been likened to mosquitoes (annoying and impossible to swat away), the emperor’s new clothes (fraught with obvious problems that everyone ignores) and the tool of a “sterile intellectual rake” who ravishes science but leaves it with no progeny [3]. One researcher suggested rechristening the methodology “statistical hypothesis inference testing” [3], presumably for the acronym it would yield.

The irony is that when UK statistician Ronald Fisher introduced the P value in the 1920s, he did not mean it to be a definitive test. He intended it simply as an informal way to judge whether evidence was significant in the old-fashioned sense: worthy of a second look.


[Infographic: PROBABLE CAUSE. A P value measures whether an observed result can be attributed to chance. But it cannot answer a researcher’s real question: what are the odds that a hypothesis is correct? Those odds depend on how strong the result was and, most importantly, on how plausible the hypothesis is in the first place.

Before the experiment, the plausibility of the hypothesis — the odds of it being true — can be estimated from previous experiments, conjectured mechanisms and other expert knowledge. A measured P value of 0.05 is conventionally deemed ‘statistically significant’; a value of 0.01 is considered ‘very significant’. After the experiment, a small P value can make a hypothesis more plausible, but the difference may not be dramatic:

Chance of a real effect: before the experiment → after P = 0.05 → after P = 0.01
THE LONG SHOT (19-to-1 odds against): 5% → 11% → 30%
THE TOSS-UP (1-to-1 odds): 50% → 71% → 89%
THE GOOD BET (9-to-1 odds in favour): 90% → 96% → 99%

Source: R. Nuzzo; T. Sellke et al. Am. Stat. 55, 62-71 (2001).]
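The infographic's conversions can be reproduced from the minimum Bayes factor bound of Sellke, Bayarri and Berger (Am. Stat. 55, 62-71; 2001), the source credited above. A minimal R sketch, assuming the -e·p·ln(p) bound; the function name is ours, not from any package:

# Lower bound on the posterior probability of "no real effect" (H0),
# given a P value and the prior probability of a real effect.
# Uses the -e * p * log(p) bound on the Bayes factor for H0; valid for p < 1/e.
false_alarm_bound <- function(p, prior_real) {
  bf_h0 <- -exp(1) * p * log(p)          # minimum Bayes factor in favour of H0
  prior_odds_h0 <- (1 - prior_real) / prior_real
  post_odds_h0 <- prior_odds_h0 * bf_h0  # posterior odds of H0 (lower bound)
  post_odds_h0 / (1 + post_odds_h0)      # posterior probability of H0
}

# Reproduce the infographic: chance of NO real effect after the experiment
false_alarm_bound(0.05, 0.05)  # long shot, P = 0.05 -> ~0.89 (11% real effect)
false_alarm_bound(0.01, 0.05)  # long shot, P = 0.01 -> ~0.70 (30% real effect)
false_alarm_bound(0.05, 0.50)  # toss-up,   P = 0.05 -> ~0.29 (71% real effect)
false_alarm_bound(0.01, 0.50)  # toss-up,   P = 0.01 -> ~0.11 (89% real effect)
false_alarm_bound(0.05, 0.90)  # good bet,  P = 0.05 -> ~0.04 (96% real effect)
false_alarm_bound(0.01, 0.90)  # good bet,  P = 0.01 -> ~0.01 (99% real effect)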

The idea was to run an experiment, then see if the results were consistent with what random chance might produce. Researchers would first set up a ‘null hypothesis’ that they wanted to disprove, such as there being no correlation or no difference between two groups. Next, they would play the devil’s advocate and, assuming that this null hypothesis was in fact true, calculate the chances of getting results at least as extreme as what was actually observed. This probability was the P value. The smaller it was, suggested Fisher, the greater the likelihood that the straw-man null hypothesis was false.

For all the P value’s apparent precision, Fisher intended it to be just one part of a fluid, non-numerical process that blended data and background knowledge to lead to scientific conclusions. But it soon got swept into a movement to make evidence-based decision-making as rigorous and objective as possible. This movement was spearheaded in the late 1920s by Fisher’s bitter rivals, Polish mathematician Jerzy Neyman and UK statistician Egon Pearson, who introduced an alternative framework for data analysis that included statistical power, false positives, false negatives and many other concepts now familiar from introductory statistics classes. They pointedly left out the P value.

But while the rivals feuded — Neyman called some of Fisher’s work mathematically “worse than useless”; Fisher called Neyman’s approach “childish” and “horrifying [for] intellectual freedom in the west” — other researchers lost patience and began to write statistics manuals for working scientists. And because many of the authors were non-statisticians without a thorough understanding of either approach, they created a hybrid system that crammed Fisher’s easy-to-calculate P value into Neyman and Pearson’s reassuringly rigorous rule-based system. This is when a P value of 0.05 became enshrined as ‘statistically significant’, for example. “The P value was never meant to be used the way it’s used today,” says Goodman.

WHAT DOES IT ALL MEAN?
One result is an abundance of confusion about what the P value means [4]. Consider Motyl’s study about political extremists. Most scientists would look at his original P value of 0.01 and say that there was just a 1% chance of his result being a false alarm. But they would be wrong. The P value cannot say this: all it can do is summarize the data assuming a specific null hypothesis. It cannot work backwards and make statements about the underlying reality. That requires another piece of information: the odds that a real effect was there in the first place. To ignore this would be like waking up with a headache and concluding that you have a rare brain tumour — possible, but so unlikely that it requires a lot more evidence to supersede an everyday explanation such as an allergic reaction. The more implausible the hypothesis — telepathy, aliens, homeopathy — the greater the chance that an exciting finding is a false alarm, no matter what the P value is.

These are sticky concepts, but some statisticians have tried to provide general rule-of-thumb conversions (see ‘Probable cause’). According to one widely used calculation [5], a P value of 0.01 corresponds to a false-alarm probability of at least 11%, depending on the underlying probability that there is a true effect; a P value of 0.05 raises that chance to at least 29%. So Motyl’s finding had a greater than one in ten chance of being a false alarm. Likewise, the probability of replicating his original result was not 99%, as most would assume, but something closer to 73% — or only 50%, if he wanted another ‘very significant’ result [6,7]. In other words, his inability to replicate the result was about as surprising as if he had called heads on a coin toss and it had come up tails.

Critics also bemoan the way that P values can encourage muddled thinking. A prime example is their tendency to deflect attention from the actual size of an effect. Last year, for example, a study of more than 19,000 people showed [8] that those who meet their spouses online are less likely to divorce (p < 0.002) and more likely to have high marital satisfaction (p < 0.001) than those who meet offline (see Nature http://doi.org/rcg; 2013). That might have sounded impressive, but the effects were actually tiny: meeting online nudged the divorce rate from 7.67% down to 5.96%, and barely budged happiness from 5.48 to 5.64 on a 7-point scale. To pounce on tiny P values and ignore the larger question is to fall prey to the “seductive certainty of significance”, says Geoff Cumming, an emeritus psychologist at La Trobe University in Melbourne, Australia. But significance is no indicator of practical relevance, he says: “We should be asking, ‘How much of an effect is there?’, not ‘Is there an effect?’”

Perhaps the worst fallacy is the kind of self-deception for which psychologist Uri Simonsohn of the University of Pennsylvania and his colleagues have popularized the term P-hacking; it is also known as data-dredging, snooping, fishing, significance-chasing and double-dipping. “P-hacking,” says Simonsohn, “is trying multiple things until you get the desired result” — even unconsciously. It may be the first statistical term to rate a definition in the online Urban Dictionary, where the usage examples are telling: “That finding seems to have been obtained through p-hacking, the authors dropped one of the conditions so that the overall p-value would be less than .05”, and “She is a p-hacker, she always monitors data while it is being collected.”

Such practices have the effect of turning discoveries from exploratory studies — which should be treated with scepticism — into what look like sound confirmations but vanish on replication. Simonsohn’s simulations have shown [9] that changes in a few data-analysis decisions can increase the false-positive rate in a single study to 60%. P-hacking is especially likely, he says, in today’s environment of studies that chase small effects hidden in noisy data. It is tough to pin down how widespread the problem is, but Simonsohn has the sense that it is serious. In an analysis [10], he found evidence that many published psychology papers report P values that cluster suspiciously around 0.05, just as would be expected if researchers fished for significant P values until they found one.

NUMBERS GAME
Despite the criticisms, reform has been slow. “The basic framework of statistics has been virtually unchanged since Fisher, Neyman and Pearson introduced it,” says Goodman. John Campbell, a psychologist now at the University of Minnesota in Minneapolis, bemoaned the issue in 1982, when he was editor of the Journal of Applied Psychology: “It is almost impossible to drag authors away from their p-values, and the more zeroes after the decimal point, the harder people cling to them” [11]. In 1989, when Kenneth Rothman of Boston University in Massachusetts started the journal Epidemiology, he did his best to discourage P values in its pages. But he left the journal in 2001, and P values have since made a resurgence.

Ioannidis is currently mining the PubMed database for insights into how authors across many fields are using P values and other statistical evidence. “A cursory look at a sample of recently published papers,” he says, “is convincing that P values are still very, very popular.”

Any reform would need to sweep through an entrenched culture. It would have to change how statistics is taught, how data analysis is done and how results are reported and interpreted. But at least researchers are admitting that they have a problem, says Goodman. “The wake-up call is that so many of our published findings are not true.” Work by researchers such as Ioannidis shows the link between theoretical statistical complaints and actual difficulties, says Goodman. “The problems that statisticians have predicted are exactly what we’re now seeing. We just don’t yet have all the fixes.”

Statisticians have pointed to a number of measures that might help. To avoid the trap of thinking about results as significant or not significant, for example, Cumming thinks that researchers should always report effect sizes and confidence intervals. These convey what a P value does not: the magnitude and relative importance of an effect.

Many statisticians also advocate replacing the P value with methods that take advantage of Bayes’ rule: an eighteenth-century theorem that describes how to think about probability as the plausibility of an outcome, rather than as the potential frequency of that outcome. This entails a certain subjectivity — something that the statistical pioneers were trying to avoid. But the Bayesian framework makes it comparatively easy for observers to incorporate what they know about the world into their conclusions, and to calculate how probabilities change as new evidence arises.

Others argue for a more ecumenical approach, encouraging researchers to try multiple methods on the same data set. Stephen Senn, a statistician at the Centre for Public Health Research in Luxembourg City, likens this to using a floor-cleaning robot that cannot find its own way out of a corner: any data-analysis method will eventually hit a wall, and some common sense will be needed to get the process moving again. If the various methods come up with different answers, he says, “that’s a suggestion to be more creative and try to find out why”, which should lead to a better understanding of the underlying reality.

Simonsohn argues that one of the strongest protections for scientists is to admit everything. He encourages authors to brand their papers ‘P-certified, not P-hacked’ by including the words: “We report how we determined our sample size, all data exclusions (if any), all manipulations and all measures in the study.” This disclosure will, he hopes, discourage P-hacking, or at least alert readers to any shenanigans and allow them to judge accordingly.

A related idea that is garnering attention is two-stage analysis, or ‘preregistered replication’, says political scientist and statistician Andrew Gelman of Columbia University in New York City. In this approach, exploratory and confirmatory analyses are approached differently and clearly labelled. Instead of doing four separate small studies and reporting the results in one paper, for instance, researchers would first do two small exploratory studies and gather potentially interesting findings without worrying too much about false alarms. Then, on the basis of these results, the authors would decide exactly how they planned to confirm the findings, and would publicly preregister their intentions in a database such as the Open Science Framework (https://osf.io). They would then conduct the replication studies and publish the results alongside those of the exploratory studies. This approach allows for freedom and flexibility in analyses, says Gelman, while providing enough rigour to reduce the number of false alarms being published.

More broadly, researchers need to realize the limits of conventional statistics, Goodman says. They should instead bring into their analysis elements of scientific judgement about the plausibility of a hypothesis and study limitations that are normally banished to the discussion section: results of identical or similar experiments, proposed mechanisms, clinical knowledge and so on. Statistician Richard Royall of Johns Hopkins Bloomberg School of Public Health in Baltimore, Maryland, said that there are three questions a scientist might want to ask after a study: ‘What is the evidence?’ ‘What should I believe?’ and ‘What should I do?’ One method cannot answer all these questions, Goodman says: “The numbers are where the scientific discussion should start, not end.”

Regina Nuzzo is a freelance writer and an associate professor of statistics at Gallaudet University in Washington DC.

References
1. Nosek, B. A., Spies, J. R. & Motyl, M. Perspect. Psychol. Sci. 7, 615-631 (2012).
2. Ioannidis, J. P. A. PLoS Med. 2, e124 (2005).
3. Lambdin, C. Theory Psychol. 22, 67-90 (2012).
4. Goodman, S. N. Ann. Intern. Med. 130, 995-1004 (1999).
5. Goodman, S. N. Epidemiology 12, 295-297 (2001).
6. Goodman, S. N. Stat. Med. 11, 875-879 (1992).
7. Gorroochurn, P., Hodge, S. E., Heiman, G. A., Durner, M. & Greenberg, D. A. Genet. Med. 9, 325-331 (2007).
8. Cacioppo, J. T., Cacioppo, S., Gonzaga, G. C., Ogburn, E. L. & VanderWeele, T. J. Proc. Natl Acad. Sci. USA 110, 10135-10140 (2013).
9. Simmons, J. P., Nelson, L. D. & Simonsohn, U. Psychol. Sci. 22, 1359-1366 (2011).
10. Simonsohn, U., Nelson, L. D. & Simmons, J. P. J. Exp. Psychol. http://dx.doi.org/10.1037/a0033242 (2013).
11. Campbell, J. P. J. Appl. Psych. 67, 691-700 (1982).

Research Skills Seminar Series 2019 | CAHS Research Education Program

© CAHS Research Education Program, Department of Child Health Research, Child and Adolescent Health Service, WA 2019

Copyright to this material produced by the CAHS Research Education Program, Department of Child Health Research, Child and Adolescent Health Service, Western Australia, under the provisions of the Copyright Act 1968 (C’wth Australia). Apart from any fair dealing for personal, academic, research or non-commercial use, no part may be reproduced without written permission. The Department of Child Health Research is under no obligation to grant this permission. Please acknowledge the CAHS Research Education Program, Department of Child Health Research, Child and Adolescent Health Service when reproducing or quoting material from this source.

ResearchEducationProgram.org [email protected]

Research Skills Seminar Series | CAHS Research Education Program Department of Child Health Research | Child and Adolescent Health Service