
Choosing the Correct Statistical Test

Tracie O. Afifi, PhD

Departments of Community Health Sciences & Psychiatry

University of Manitoba

Department of Community Health Sciences COLLEGE OF MEDICINE, FACULTY OF HEALTH SCIENCES

- What do you need to know to pick the right statistical test?

To pick the correct statistical test you need to know…

- What your research question is asking
- The level of measurement of the variables
- The distribution of the data

Common Statistical Tests

- T-test
- Mann Whitney U
- ANOVA
- Kruskal Wallis Test
- Chi-Square Test
- Pearson's Correlation
- Spearman's Correlation

Choosing a Statistical Test

What is your research question asking?

- Is there a difference?
- Is there a relationship?

Is there a difference?

- Is there a difference in depression among adolescents who are sexually abused compared to adolescents who are not sexually abused?

- T-test
- Mann Whitney U
- ANOVA
- Kruskal Wallis Test
- Chi-Square Test


But how do you know which one to choose?

What are the variables?

How are the variables measured?

What is the distribution of the data?

What are the Variables?

- Is there a difference in depression among adolescents who are sexually abused compared to adolescents who are not sexually abused?

- One variable is Sexual Abuse
- One variable is Depression

How are the Variables Measured?

Sexual Abuse / Depression
- Categories (yes or no)
- Categories (none, minor, moderate, severe)
- Scores (e.g., 0-10)

Level of Measurement

- Nominal
  - Named categories with no order
- Ordinal
  - Categories with a logical order or rank order
- Interval
  - Rank order AND the distance between intervals of measurement has meaning (the zero value is arbitrary).
- Ratio
  - Same properties as interval data AND the ratio between two measurements is defined, with an empirical (not arbitrary) zero value.
  - You can say a score of 20 is "twice as much" as 10.

Liamputtong, 2013

Type: Description
- Nominal: Classes or categories without numerical order
  - Male, female
  - Jewish, Catholic, Muslim
- Ordinal (ranked): Ordered categories
  - Mild pain, moderate pain, and severe pain
  - High school, undergraduate, graduate
- Interval: The distance or interval between two measurements has meaning
  - Temperature in Celsius (zero = 273.15 Kelvin)
- Ratio: The distance and ratio between two measurements are defined, and zero has a meaning of zero, so you can say "twice as much"
  - Weight
  - Age in years
  - Temperature in Kelvin (absolute zero)

What is the Distribution of the Data? Central Tendency and Dispersion

- Central tendency
  - Where the bulk of the data lie.
    - Mode, median, mean, etc.
- Dispersion
  - How wide or narrow the data are spread out.
    - Number of categories, range, standard deviation, etc.

Health Research Methods: A Canadian Perspective (2014), edited by K. Bassil & D. Zabkiewicz; Chapter 7, pp. 119-142

Central Tendency

- Mode
  - The value that appears most often
  - (3, 4, 5, 6, 8, 8, 15) Mode = 8
- Mean
  - The arithmetic average of the observations
  - (3, 4, 5, 6, 8, 8, 15) Mean = 7
- Median
  - Middle value
  - (3, 4, 5, 6, 8, 8, 15) Median = 6

Level of Measurement: Central Tendency; Dispersion
- Nominal: Mode (most frequent category); Number of categories
- Ordinal: Median (data are ranked, middle value with half above and half below); Range and interquartile range (IQR is the difference between the median of the upper half and the median of the lower half)
- Interval: Mean (summed and divided by number); Standard deviation (how much each data point deviates from the mean)
- Ratio: Mean (summed and divided by number); Standard deviation
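The measures of central tendency and dispersion above can be checked with a few lines of Python. This is a minimal sketch using only the standard library `statistics` module and the example values from the slides; the helper name `iqr_by_halves` is an assumption made for illustration, and it follows the "median of upper half minus median of lower half" definition given above rather than any particular software default.

```python
import statistics

scores = [3, 4, 5, 6, 8, 8, 15]  # example values from the slides


def iqr_by_halves(values):
    """IQR as median of the upper half minus median of the lower half."""
    ordered = sorted(values)
    n = len(ordered)
    lower = ordered[: n // 2]        # values below the middle position
    upper = ordered[(n + 1) // 2:]   # values above the middle position
    return statistics.median(upper) - statistics.median(lower)


summary = {
    "mode": statistics.mode(scores),
    "mean": statistics.mean(scores),
    "median": statistics.median(scores),
    "range": max(scores) - min(scores),
    "iqr": iqr_by_halves(scores),
    "stdev": statistics.stdev(scores),  # sample standard deviation (n - 1)
}
print(summary)
```

Statistical packages may use slightly different quartile conventions, so an IQR reported elsewhere can differ from the value computed here.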

- Nominal and ordinal levels (mode, median, range, interquartile range): NON-PARAMETRIC TESTS
- Interval and ratio levels (mean, standard deviation): PARAMETRIC TESTS

Health Research Methods: A Canadian Perspective (2014), edited by K. Bassil & D. Zabkiewicz; Chapter 7, pp. 119-142

What is the Distribution of the Data?

Normal Distribution or Non-Normal Distribution

Normal Distribution

[Figure: histogram of average hours of sleep. x-axis: hours of sleep (4 to 12); y-axis: frequency. Mean = 7.92, Std Error = 0.13, 95% CI = 7.68 to 8.18]

Non-Normal Distribution

[Figure: histogram of average hours of sleep among respondents with babies. x-axis: hours of sleep (4 to 12); y-axis: frequency. Mean = 5.88, Std Error = 0.30, 95% CI = 5.27 to 6.49]

Distribution of the Data

- Parametric test
  - Interval or ratio level data with a NORMAL DISTRIBUTION
- Non-parametric test
  - Nominal or ordinal level data, or interval or ratio level data with a NON-NORMAL DISTRIBUTION

Common Statistical Tests

Is there a difference?

Parametric
- T-test
- ANOVA

Non-Parametric
- Mann Whitney U
- Kruskal Wallis Test
- Chi-Square Test

T-test

- To test if two means are statistically different
  - One variable is continuous (interval or ratio level)
  - One variable is dichotomous (two categories)
  - Distribution of the continuous variable is NORMAL (bell curve)
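As a rough illustration of what the t-test computes, the sketch below calculates a two-sample t statistic by hand. It assumes the equal-variance (pooled) form of the test, which the slides do not specify; the function name and the scores are hypothetical and are not data from the presentation. Turning the statistic into a p-value would normally be left to a statistics package.

```python
import math
import statistics


def pooled_t_statistic(group_a, group_b):
    """Return (t, degrees_of_freedom) for an equal-variance two-sample t-test."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    df = n_a + n_b - 2
    # Pool the two sample variances, weighting each by its degrees of freedom
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    std_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_b - mean_a) / std_error, df


# Hypothetical depression scores (assumption, chosen so the group means are 2 and 8)
not_abused = [1, 2, 2, 3, 2]
abused = [7, 8, 8, 9, 8]

t_stat, df = pooled_t_statistic(not_abused, abused)
print(f"t = {t_stat:.2f}, df = {df}")
```

A large absolute t value relative to its degrees of freedom indicates that the difference between the two group means is unlikely to be due to chance alone.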

- Is the mean depression score different for adolescents who are sexually abused compared to adolescents who are not sexually abused?
  - Sexual abuse = Yes or No (nominal, dichotomous)
  - Depression = 1 to 10 (interval, with higher scores indicating worse depression)

Depression (mean)
- Total: 4
- No sexual abuse: 2
- Sexual abuse: 8

What if the Distribution was NON-NORMAL?

- One variable is continuous (interval or ratio level) with a NON-NORMAL DISTRIBUTION
- One variable is dichotomous (two categories)

Mann-Whitney U test

- A non-parametric test for comparing ordinal or non-normal continuous level data for two independent groups
- Non-normal distribution
  - One variable: ordinal or non-normal continuous level
  - One variable: two-level categorical (dichotomous)
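The Mann-Whitney U statistic can be sketched as a simple pair count: for every pairing of one value from each group, count which group's value is larger, with ties counting as half. The function name and the scores below are hypothetical, and the sketch stops at the U statistic; a statistics package would be the usual way to obtain a p-value.

```python
def mann_whitney_u(group_a, group_b):
    """Return the smaller of the two U statistics for two independent groups."""
    # For each (a, b) pair: 1 if a > b, 0.5 if tied, 0 otherwise
    u_a = sum(
        1.0 if a > b else 0.5 if a == b else 0.0
        for a in group_a
        for b in group_b
    )
    # The two U statistics always sum to the number of pairs
    u_b = len(group_a) * len(group_b) - u_a
    return min(u_a, u_b)


# Hypothetical non-normal scores for two independent groups (assumption)
group_1 = [1, 2, 2, 3]
group_2 = [2, 4, 5, 6]

u_stat = mann_whitney_u(group_1, group_2)
print(f"U = {u_stat}")
```

Because only the ordering of values matters, the same calculation works for ordinal categories coded as ranks.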

Bruce, 2008, Quantitative Methods for Health Research, pp. 491-495

Is there a difference?

Parametric
- T-test
  - Difference in means in two groups

Non-Parametric
- Mann Whitney U
  - Difference in medians in two groups

- What if you have three groups or more?
  - No sexual abuse, minor sexual abuse, moderate sexual abuse, severe sexual abuse?

ANOVA (Analysis of Variance)

- Used to compare the statistical difference between three or more group means
- ANOVA compares differences across all means at the same time
- Distribution of the sample means is normal (parametric)
  - Dependent variable: continuous (one variable)
  - Independent variable: categorical (one variable with more than two levels or groups)
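To make "compares differences across all means at the same time" concrete, the sketch below computes the one-way ANOVA F statistic as the ratio of between-group to within-group variability. The function name and the four groups of scores are hypothetical, chosen only so that the group means are 2, 4, 7, and 9; they are not data from the slides.

```python
import statistics


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [value for group in groups for value in group]
    grand_mean = statistics.mean(all_values)
    # Variability of the group means around the grand mean
    ss_between = sum(
        len(group) * (statistics.mean(group) - grand_mean) ** 2 for group in groups
    )
    # Variability of individual scores around their own group mean
    ss_within = sum(
        (value - statistics.mean(group)) ** 2 for group in groups for value in group
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical depression scores by sexual abuse severity (assumption)
none, minor, moderate, severe = [1, 2, 3], [3, 4, 5], [6, 7, 8], [8, 9, 10]

f_stat, df_b, df_w = one_way_anova_f([none, minor, moderate, severe])
print(f"F({df_b}, {df_w}) = {f_stat:.1f}")
```

A single F statistic summarizes all groups at once, which avoids running a separate t-test for every pair of groups.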

Bruce (2008); Tabachnick & Fidell (2007); Winston (1999); Liamputtong (2013)

ANOVA

- Is the mean depression score different for adolescents who experience mild sexual abuse, moderate sexual abuse, or severe sexual abuse?
  - Distribution of depression scores is NORMAL
    - Sexual abuse (ordinal: none, minor, moderate, severe)
    - Depression (interval, ranging from 0 to 10)

Depression (mean)
- Total sample: 4
- No sexual abuse: 2
- Minor sexual abuse: 4
- Moderate sexual abuse: 7
- Severe sexual abuse: 9

- To test if three or more means are statistically different
  - One variable is continuous (interval or ratio level) with a NORMAL DISTRIBUTION
  - One variable is categorical (three or more categories)

What if the Distribution was NON-NORMAL?

- One variable is ordinal OR continuous (interval or ratio level) with a NON-NORMAL DISTRIBUTION
- One variable is categorical (three or more categories)

Kruskal Wallis Test

- Compares median scores from three or more groups
  - One variable = continuous (non-normal) or ordinal
  - One variable = categorical with 3 levels or more
  - An extension of the Mann Whitney U test and the non-parametric equivalent to ANOVA.
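The Kruskal Wallis test works on ranks instead of raw scores. The sketch below ranks all observations together, sums the ranks within each group, and computes the H statistic. The function names and data are hypothetical, and the sketch leaves out the tie correction that statistical packages usually apply.

```python
def average_ranks(values):
    """Map each value to its rank (1..N), averaging the ranks of tied values."""
    positions = {}
    for index, value in enumerate(sorted(values), start=1):
        positions.setdefault(value, []).append(index)
    return {value: sum(idx) / len(idx) for value, idx in positions.items()}


def kruskal_wallis_h(groups):
    """Return the H statistic (no tie correction) for three or more groups."""
    pooled = [value for group in groups for value in group]
    rank_of = average_ranks(pooled)
    n_total = len(pooled)
    # Sum of (rank sum squared / group size) across groups
    rank_term = sum(
        sum(rank_of[value] for value in group) ** 2 / len(group) for group in groups
    )
    return 12 / (n_total * (n_total + 1)) * rank_term - 3 * (n_total + 1)


# Hypothetical ordinal or non-normal scores for three groups (assumption)
h_stat = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(f"H = {h_stat:.1f}")
```

Larger values of H indicate that the rank sums differ more across groups than would be expected by chance.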

Liamputtong, 2013

Chi-Square Test of Significance (χ²)

- Non-parametric test (non-normal distribution)
  - One variable: categorical with 2 or more levels
  - One variable: categorical with 2 or more levels
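With two categorical variables, the chi-square statistic compares the observed counts in a cross-tabulation with the counts expected if the variables were unrelated. The sketch below is a minimal hand calculation; the function name and the counts are hypothetical and are not results from the slides.

```python
def chi_square_statistic(table):
    """Return (chi_square, degrees_of_freedom) for a contingency table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi_square = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two variables were independent
            expected = row_totals[i] * col_totals[j] / total
            chi_square += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi_square, df


# Hypothetical counts (assumption): rows = sexual abuse (no, yes),
# columns = depression (no, yes)
counts = [[20, 10], [10, 20]]

chi2, chi2_df = chi_square_statistic(counts)
print(f"chi-square = {chi2:.2f}, df = {chi2_df}")
```

The same function handles tables larger than 2 x 2, since both variables may have two or more levels.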

Bruce (2007); Tabachnick & Fidell (2007); Winston (1999)

Is there a difference?

Parametric
- T-test
- ANOVA

Non-Parametric
- Mann Whitney U
- Kruskal Wallis Test
- Chi-Square Test

Is there a relationship?

- Is there a positive correlation between sexual abuse and depression?
  - Correlation
- Is sexual abuse severity associated with increased severity of depression?
  - Linear Regression
- Is sexual abuse associated with increased odds of depression?
  - Logistic Regression

Parametric
- Pearson's Correlation
- Linear Regression
- Logistic Regression

Non-Parametric
- Spearman's Correlation

Correlation

Strength of a linear relationship

Pearson
- Distribution of the variables is normal (parametric test)
  - One variable: continuous
  - One variable: continuous

Spearman
- Distribution of the variables is non-normal (non-parametric test) OR one or more variables are ordinal
  - One variable: continuous/categorical
  - One variable: continuous/categorical

Bruce, 2008, Quantitative Methods for Health Research, pp. 74-78

Linear Regression

- Describes how one variable (DV) depends on the other variable (IV)
- Regression estimates the relationship between two variables
  - One dependent variable: continuous
  - One or more independent variables: any level of measurement
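Correlation and simple linear regression share the same building blocks, which the following sketch tries to show. It computes Pearson's r, Spearman's rho (taken here as Pearson's r on the ranks), and a least-squares line with one independent variable. All function names and the x and y values are hypothetical, introduced only for illustration.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


def ranks(values):
    """Rank each value (1..N), averaging the ranks of tied values."""
    ordered = sorted(values)
    return [
        sum(i for i, v in enumerate(ordered, start=1) if v == value)
        / ordered.count(value)
        for value in values
    ]


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson_r(ranks(x), ranks(y))


def least_squares_line(x, y):
    """Return (slope, intercept) of the line predicting y (DV) from x (IV)."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    slope = s_xy / s_xx
    return slope, mean_y - slope * mean_x


# Hypothetical severity (x) and depression (y) scores (assumption)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

slope, intercept = least_squares_line(x, y)
print(f"r = {pearson_r(x, y):.3f}, rho = {spearman_rho(x, y):.3f}")
print(f"y = {intercept:.1f} + {slope:.1f} * x")
```

The correlation coefficients describe the strength of the relationship, while the slope describes how much the dependent variable changes for each one-unit change in the independent variable.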

Bruce, 2008, Quantitative Methods for Health Research, pp. 232-255

Logistic Regression

- Predicts a dichotomous outcome from one or more independent variables (odds ratio)
- Parametric test (some distribution assumptions apply)
  - One dependent variable: dichotomous (two categories)
  - One or more independent variables: any level

Is there a relationship?

Parametric Test (Normal Distribution)
- Pearson's Correlation
  - One variable = continuous
  - One variable = continuous
- Linear Regression
  - Dependent variable = continuous (1 variable)
  - Independent variable = any level (1 or more)
- Logistic Regression
  - Dependent variable = dichotomous (1 variable)
  - Independent variable = any level (1 or more)

Non-Parametric Test (Non-Normal Distribution)
- Spearman's Correlation
  - One variable = continuous or categorical
  - One variable = continuous or categorical

Is there a difference?

Parametric Test (Normal Distribution)
- T-test (difference in means)
  - One variable = continuous
  - One variable = dichotomous
- ANOVA
  - One variable = continuous
  - One variable = 3 or more categories

Non-Parametric Test (Non-Normal Distribution)
- Mann Whitney U (difference in medians)
  - One variable = continuous or ordinal
  - One variable = dichotomous
- Kruskal Wallis Test
  - One variable = continuous (non-normal) or ordinal
  - One variable = 3 categories or more
- Chi-Square Test
  - One variable = 2 or more categories
  - One variable = 2 or more categories

To pick the correct statistical test you need to know…
- What your research question is asking
- The level of measurement of the variables
- The distribution of the data
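The "Is there a difference?" summary can be written down as a small lookup. The sketch below is one possible encoding of that table and nothing more: the function name, argument names, and level labels are assumptions made for illustration, and it ignores other considerations (such as sample size or paired designs) that the slides do not cover.

```python
def choose_difference_test(outcome_level, normal, n_groups):
    """Suggest a test for an 'is there a difference?' question.

    outcome_level: 'nominal', 'ordinal', 'interval', or 'ratio'
    normal: True if the outcome variable is normally distributed
    n_groups: number of categories in the grouping variable
    """
    if n_groups < 2:
        raise ValueError("need at least two groups to compare")
    if outcome_level == "nominal":
        # Both variables are categorical
        return "Chi-Square Test"
    # Parametric tests need interval or ratio data with a normal distribution
    parametric = outcome_level in ("interval", "ratio") and normal
    if n_groups == 2:
        return "T-test" if parametric else "Mann Whitney U"
    return "ANOVA" if parametric else "Kruskal Wallis Test"


# Depression score (interval, normal) by sexual abuse yes/no
print(choose_difference_test("interval", True, 2))
# Depression score (interval, non-normal) by four severity levels
print(choose_difference_test("interval", False, 4))
```

Each argument corresponds to one of the three things to know: the question (a difference between groups), the level of measurement, and the distribution of the data.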