Calculating and Reporting Effect Sizes
Total Page:16
File Type:pdf, Size:1020Kb
28/04/20 CALCULATING AND REPORTING EFFECT SIZES IHR BIOSTATISTICS LUNCH LECTURE SERIES PRESENTED BY DR PAOLA CHIVERS RESEARCH AND BIOSTATISTICS: INSTITUTE FOR HEALTH RESEARCH THE UNIVERSITY OF NOTRE DAME AUSTRALIA [email protected] Quoted in Sullivan and Feinn (2012). Using effect size – or why the p value is not enough. Journal of Graduate Medical Education. September: 279‐282. DOI: http://dx.doi.org/10.4300/JGME‐D‐12‐00156.1 1 28/04/20 What are Effect Sizes? A measurement of the size (magnitude) of an effect. ‐ independent of sample size ‐ standardized metric ‐ strength of association © 2020 Chivers IHR Biostatistics Lunch Lecture Series Why are effect sizes important? • Effect sizes help communicate practical importance of a result ‐ an important but non significant result ‐ an unimportant but significant result • Provides information about ‘how’ important • Allows comparisons across studies (meta analysis) • Used to inform planning for future studies a priori • Are a journal requirement © 2020 Chivers IHR Biostatistics Lunch Lecture Series APA (7th ed.) guide to effect sizes . Recommend inclusion for “readers to appreciate the magnitude or importance” (APA 2020 p. 89) . Statistical estimate and should include confidence intervals General Principal “… provide the readers with enough information to assess the magnitude of the observed effect.” (APA 2020 p. 89) © 2020 Chivers IHR Biostatistics Lunch Lecture Series 2 28/04/20 Measuring Effect Size Can be calculated in different ways. Two main approaches: 1. Standardized difference between two means • Cohen’s d • Hedge’s d • Glass’s Delta • Common language effect size (CLES) 2. Strength of association © 2020 Chivers IHR Biostatistics Lunch Lecture Series Measuring Effect Size 1. Standardized difference between two means ‐ d family ‐ difference between observations divided by the standard deviation ‐ Standard deviations units of effect 2. Strength of association ‐ r family ‐ proportion of variance that is explained by its group © 2020 Chivers IHR Biostatistics Lunch Lecture Series Effect size measures: Between subjects designs Cohen’s d (Cohen, 1988) • Standardized mean difference of an effect • Dependent variables can be measured on different scales or be completely different measurements (Lakens, 2013) • Uncorrected effect size ‐ Provides a biased estimate of the population effect size especially n<20 Fig. Cohen’s d equation for the sample (Lakens, 2013 p3) © 2020 Chivers IHR Biostatistics Lunch Lecture Series 3 28/04/20 Effect size measures: Between subjects designs d calculated from the t‐test differences between two groups Fig. Cohen’s d equation for the sample related to a t‐test (Lakens, 2013 p3) © 2020 Chivers IHR Biostatistics Lunch Lecture Series Corrections for bias Population effect size estimates based on sample averages overestimate the true population effect. Corrections for bias can be applied: • Corrections for Cohen’s d = Hedges’s g • Corrections for eta squared (η2) = omega squared (ω2) © 2020 Chivers IHR Biostatistics Lunch Lecture Series Effect size measures: Between subjects designs Hedges’s d Standardized mean difference of an effect • Corrected effect size ‐ Provides an unbiased estimate of the population effect size Fig. Hedges’s gs equation for the sample (Lakens, 2013 p3) © 2020 Chivers IHR Biostatistics Lunch Lecture Series 4 28/04/20 Effect size measures: Between subjects designs Glass’s Delta • Used when standard deviations differ substantially between conditions. • Choose the standard deviation of either the pre or post measurement. ‐ Choose effect size that best represents the effect of interest. © 2020 Chivers IHR Biostatistics Lunch Lecture Series Effect size measures: Two independent groups Common language effect size (CLES) • Non parametric effect size • Probability of a z‐score greater than a difference between groups of zero. • Converts Cohen’s d into a percentage, expressed as the probability ‐ a random sampled person from one group is higher than a randomly sampled person from the other group (between design) ‐ an individual has a higher value on one measurement compared to the other (within design) See supplementary material from Lakens, 2013 for calculation spreadsheet. http://www.frontiersin.org/journal/10.3389/fpsyg.2013. 00863/abstract © 2020 Chivers IHR Biostatistics Lunch Lecture Series Effect size measures: One sample or correlated data • Uses the differences between measurements • Standardized mean difference for within subjects design • Cohen’s dz Fig. Cohen’s dz equation based on mean difference Fig. Cohen’s dz equation based on t‐test (Lakens, 2013 p4) (Lakens, 2013 p4) © 2020 Chivers IHR Biostatistics Lunch Lecture Series 5 28/04/20 Interpreting Effect Sizes Interpreting Cohen’s d • Small d=0.2 • Medium d=0.5 • Large d=0.8 Note these are arbitrary and the best way is to relate results to other effects reported in the literature. What is the clinical importance of the result! © 2020 Chivers IHR Biostatistics Lunch Lecture Series Reporting example Independent t‐test An independent t‐test did not report a significant difference between lower exam score for males (M=76.33 SD=5.84) compared to females (M=79.18 SD=6.89; t(32)=2.11 p=.071), although a medium to large effect was found (d=0.73 95% CI [.53‐.86]). * Note examples are fictitious with values presented for illustrative purposes only. © 2020 Chivers IHR Biostatistics Lunch Lecture Series Effect size measures: Three or more independent groups (e.g. ANOVA) • Eta squared η2 (within) Degree of association 2 • Partial Eta squared ηp (between studies) for sample • Omega squared Degree of association • Intraclass correlation for population © 2020 Chivers IHR Biostatistics Lunch Lecture Series 6 28/04/20 Interpreting Effect Sizes Interpreting Eta squared • Small .01 or 1% • Medium .06 or 6% • Large .138 or 13.8% Pallant (2020) p 218 REMEMBER these are arbitrary and the best way is to relate results to other effects reported in the literature. What is the clinical importance of the result! © 2020 Chivers IHR Biostatistics Lunch Lecture Series Reporting example Dependent t‐test An dependent t‐test reported a significant increase in 3K time trial from pre season (M=189 sec SD=33) to mid season (M=151 sec SD=16; t(57)=6.91 p=.002), with a large effect (η2=.37). * Note examples are fictitious with values presented for illustrative purposes only. © 2020 Chivers IHR Biostatistics Lunch Lecture Series Reporting example ANOVA Group differences examined using ANOVA found independent schools had significantly higher ATAR scores (M=89.1 SD=5.2), compared to Catholic schools (M=82.3 SD=3.5) who were higher than public schools (M=73.7 SD=16.2; F(2, 1654)=4.53 p=.019). Despite reaching statistical significance, the actual mean difference between school types was quite small (η2=.02). * Note examples are fictitious with values presented for illustrative purposes only. © 2020 Chivers IHR Biostatistics Lunch Lecture Series 7 28/04/20 Effect size measures: Chi‐square tests . Phi ‐ Two binary variables ‐ Related to correlation and Cohen’s d ‐ Interpreted like Pearson’s r and R2 . Cramer’s Phi or V ‐ More than two categorical variables ‐ Measures inter‐correlation ‐ Biased as increases with the number of cells © 2020 Chivers IHR Biostatistics Lunch Lecture Series Interpreting Effect Sizes Interpreting phi (2x2) Interpreting Cramer’s V • Small .10 Two categories: • Medium .30 • Small .01, Medium .30, Large .50 • Large .50 Three categories: • Small .07, Medium .21, Large .35 Four categories: Pallant (2020) p 228 • Small .06, Medium .17, Large .29 REMEMBER these are arbitrary and the best way is to relate results to other effects reported in the literature. What is the clinical importance of the result! © 2020 Chivers IHR Biostatistics Lunch Lecture Series Reporting example Chi square A chi square group difference between sex and employment status indicated a significant association of medium effect (ꭓ2 (1, n=186)=12.56 p=.003, phi=.39). More females were employed part‐time compared to males (53% versus 39%), while more males were in full time employment (61% versus 47%). * Note examples are fictitious with values presented for illustrative purposes only. © 2020 Chivers IHR Biostatistics Lunch Lecture Series 8 28/04/20 Calculating effect sizes Abundance of online calculators Lenhard and Lenhard (2016) provide a number of different online calculators based on your sample and study design. Lenhard, W. & Lenhard, A. (2016). Calculation of Effect Sizes. Retrieved from: https://www.psychometrica.de/effect_size.html. Dettelbach (Germany): Psychometrica. DOI: 10.13140/RG.2.2.17823.92329 © 2020 Chivers IHR Biostatistics Lunch Lecture Series Reporting example – in text Piggott et al., 2018 p 4 Reporting example ‐ Tables Fleay, Brock, Joyce, Christopher, Banyard, Harry, and Woods, Carl T. (2018) Manipulating field dimensions during small‐sided games impacts the technical and physical profiles of Australian footballers. Journal of Strength and Conditioning Research, 32 (7). pp. 2039‐2044. Fleay et al., 2018 p 2042 9 28/04/20 Cautions when using Effect Sizes • Reported cut‐points for interpreting effect sizes are arbitrary and the best way is to relate results to other effects reported in the literature. • Effect sizes are sensitive to spurious influences, such as: ‐ which standard deviation is used, with pooled SD better ‐ pooled SD based on assumption of estimates from the same population ‐ whether there have been corrections for bias ‐ whether data has a normal distribution ‐ reliability of the measurement