Commonly Reported Statistics & Concepts Quick Reference Sheet

Grant B. Morgan, Ph.D., Baylor University
James R. Andretta, Ph.D., Superior Court of the District of Columbia

Morgan, Slobogin, & Andretta (2018) reviewed 93 articles published in 2017 in Psychology, Public Policy, and Law or Law and Human Behavior to determine the most commonly reported statistical coefficients and concepts. The coefficients/concepts, symbols, and frequencies are reported in Table 1. As shown, the p-value was the most commonly reported piece of statistical output (n = 83). This demonstrates the prevalence of null hypothesis significance testing in the literature, which should be expected. Confidence intervals were the second most commonly reported statistical output (n = 67), appearing in about 72% of the papers reviewed (67 of 93). Beyond these two pieces of output, every other statistical coefficient or piece of statistical information was reported in fewer than half of the papers (at most n = 39). The purpose of this document is to offer a brief but helpful introduction to the most commonly reported coefficients and concepts.

Table 1: Most Commonly Reported Statistical Coefficients in Law and Human Behavior and Psychology, Public Policy, and Law

| Coefficient | Symbol | n |
|---|---|---|
| p-value | p | 83 |
| Confidence interval | CI | 67 |
| Standardized beta | β | 39 |
| Regression coefficient | b | 39 |
| Cohen's d | d | 38 |
| Standard error | SE | 33 |
| Pearson correlation | r | 32 |
| Odds ratio | OR, e^β | 25 |
| Cronbach's α | α | 17 |
| Partial eta-squared | η²partial | 15 |
| Area under curve | AUC | 11 |
| Eta-squared | η² | 10 |
| Cramér's V | V | 9 |
| R-squared | R² | 7 |
| Phi | φ | 7 |
| Intraclass correlation | ICC | 6 |
| Comparative fit index | CFI | 6 |
| Cohen's κ | κ | 6 |
| Root mean square error of approximation | RMSEA | 6 |
| Tucker-Lewis index | TLI | 3 |
| Standardized root mean residual | SRMR | 3 |
| Rate ratio | IRR | 3 |
| | RR | 3 |
| Bayesian information criterion | BIC | 5 |
| Receiver operating curve | ROC | 4 |
| Wilks' lambda | Λ | 4 |
| Akaike information criterion | AIC | 4 |
| Cohen's f | f | 3 |
| Hazard ratio | e^β | 3 |
| Mean square error | MSE | 2 |
| Ordinary least squares | OLS | 2 |
| Cohen's h | h | 1 |
| Tredoux's E | E | 1 |
| Factor loading | λ | 1 |
| Spearman's ρ | ρr, rs | 1 |
| Somers' d | d_yx | 1 |
| Lo-Mendell-Rubin likelihood ratio test | LMR-LRT | 1 |
| Adjusted BIC | ABIC | 1 |
| Bootstrapped likelihood ratio test | BLRT | 1 |

p-value

• Probability of observing a test statistic at least as extreme as the one obtained, if the null hypothesis is true.

• Compared against the pre-established Type I error rate (α); if the p-value is less than α, the null hypothesis can be rejected with 100(1 − α)% confidence.
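The comparison of a p-value against α can be sketched in Python as follows. This is a minimal illustration that assumes a two-sided test whose statistic follows a standard normal distribution under the null hypothesis; the function name `two_sided_p_value` and the observed value of 2.5 are hypothetical and not taken from the reviewed articles.

```python
from statistics import NormalDist


def two_sided_p_value(z: float) -> float:
    """Two-sided p-value for a z statistic, assuming a standard normal null."""
    # Probability of a statistic at least as extreme as |z| in either tail.
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


alpha = 0.05        # pre-established Type I error rate
z_observed = 2.5    # hypothetical observed test statistic
p = two_sided_p_value(z_observed)
reject_null = p < alpha

print(f"p = {p:.4f}, reject null: {reject_null}")
```

Other tests (t, F, χ²) follow the same logic with a different reference distribution.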

Confidence Interval

• Range of parameter values that would not be rejected if used as the null value in a hypothesis test.

• General form: Statistic ± Critical Value × Standard Error

• Can establish both an upper and a lower bound, or only an upper or only a lower bound, of a parameter (usually reported as the former, which is likely due to the defaults of statistical software).

• Can be thought of, in general, as the range of plausible values of the population parameter
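The general form, Statistic ± Critical Value × Standard Error, can be sketched for a sample mean as follows. This sketch assumes a critical value from the standard normal distribution (a t critical value is the usual choice for small samples); the function name and the scores are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(sample, confidence=0.95):
    """Two-sided interval: statistic +/- critical value * standard error."""
    estimate = mean(sample)
    standard_error = stdev(sample) / sqrt(len(sample))
    # Assumption: normal critical value, e.g. about 1.96 for 95% confidence.
    critical_value = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    margin = critical_value * standard_error
    return estimate - margin, estimate + margin


scores = [4, 5, 6, 7, 8]  # hypothetical data
lower, upper = mean_confidence_interval(scores)
print(f"95% CI: [{lower:.3f}, {upper:.3f}]")
```

Raising the confidence level or shrinking the sample widens the interval, because the critical value or the standard error grows.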

Standardized Regression Coefficient (β)

• What the regression coefficient would be if all variables had first been converted to z-scores

• May be positive or negative

• Interpretation: The expected amount of change, in standard deviation units of the outcome, for a one standard deviation increase in the predictor, holding all other predictors constant.

• Can be used as an effect size estimate; larger β coefficients (in absolute value) indicate stronger relationships between the outcome and predictor variables

• May be preferable to report standardized coefficients for variables/measures with which readers may not be familiar

Unstandardized Regression Coefficient (b)

• Interpretation: The expected amount of change in the outcome for a one-unit increase in the predictor, holding all other predictors constant.

• Can be used as an effect size estimate

• Can be positive or negative

• Is on the scale of the outcome variable
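The link between the unstandardized and standardized coefficients can be sketched for the simplest case, a regression with a single predictor, where β equals b multiplied by the ratio of the predictor's standard deviation to the outcome's standard deviation. The function name and the data are hypothetical.

```python
from statistics import mean, stdev


def simple_regression_slopes(x, y):
    """Return (b, beta) for a least-squares regression of y on one predictor x."""
    mean_x, mean_y = mean(x), mean(y)
    sum_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sum_xx = sum((xi - mean_x) ** 2 for xi in x)
    b = sum_xy / sum_xx                 # change in y per one-unit change in x
    beta = b * stdev(x) / stdev(y)      # change in SDs of y per one-SD change in x
    return b, beta


hours = [1, 2, 3, 4, 5]          # hypothetical predictor
score = [52, 55, 61, 64, 68]     # hypothetical outcome
b, beta = simple_regression_slopes(hours, score)
print(f"b = {b:.2f}, beta = {beta:.4f}")
```

With one predictor, β coincides with the Pearson correlation between the two variables; with several predictors, this shortcut no longer holds and each coefficient is adjusted for the others.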

Cohen’s d

• Difference between two means in standard deviation units

• Used as an effect size estimate; larger values indicate a greater difference between means

• How different two means need to be in order to be considered meaningfully different depends on the context of the study and/or theoretical expectation
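A minimal sketch of Cohen's d for two independent groups follows. It assumes the pooled standard deviation as the standardizer, which is one common choice among several; the function name and the data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Mean difference divided by the pooled standard deviation (an assumption)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_variance = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_variance)


treatment = [6, 7, 8, 9, 10]   # hypothetical scores
control = [4, 5, 6, 7, 8]      # hypothetical scores
d = cohens_d(treatment, control)
print(f"d = {d:.3f}")
```

The sign of d depends only on which group is entered first; its magnitude is the same either way.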

Standard Error

• Standard deviation of the sampling distribution of a statistic

• Expected amount of variability of a statistic if samples of the same size were repeatedly drawn at random and the statistic was computed for each sample

• Estimate of the precision of a statistic

• Smaller standard errors narrow confidence intervals and increase statistical power

Pearson Product-Moment Correlation (r)

• Measure of linear relationship between two continuous variables

• Ranges from -1 to +1; Sign indicates the direction of the relationship; Proximity to -1 or +1 indicates strength of relationship

• Can be interpreted as an effect size estimate
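As a sketch of how r is obtained, the snippet below computes the correlation directly from deviations around each mean. The function name and the data are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation: co-deviation scaled by both variables' spread."""
    mean_x, mean_y = mean(x), mean(y)
    sum_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sum_xx = sum((xi - mean_x) ** 2 for xi in x)
    sum_yy = sum((yi - mean_y) ** 2 for yi in y)
    return sum_xy / sqrt(sum_xx * sum_yy)


x = [1, 2, 3, 4, 5]   # hypothetical data
y = [2, 4, 5, 4, 5]   # hypothetical data
r = pearson_r(x, y)
print(f"r = {r:.4f}")
```

A perfectly increasing straight line gives r = +1 and a perfectly decreasing one gives r = -1.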

Odds Ratio

• Generally, odds are defined as: $\text{odds} = \dfrac{P(\text{event occurs})}{P(\text{event does not occur})}$

• The odds ratio compares two odds by taking their ratio

• If the two odds are equal, the odds ratio equals 1; if the odds ratio is greater than 1, the odds of the outcome occurring are greater for the condition in the numerator; if the odds ratio is less than 1, the odds of the outcome occurring are greater for the condition in the denominator.

• Can be interpreted as an effect size estimate
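The definitions above can be sketched in a few lines. The function names and the probabilities 0.6 and 0.4 are hypothetical and chosen only for illustration.

```python
def odds(probability: float) -> float:
    """Odds of an event: P(event occurs) / P(event does not occur)."""
    return probability / (1.0 - probability)


def odds_ratio(p_numerator: float, p_denominator: float) -> float:
    """Ratio of the odds in the numerator condition to the denominator condition."""
    return odds(p_numerator) / odds(p_denominator)


# Hypothetical event probabilities in two conditions.
ratio = odds_ratio(0.6, 0.4)
print(f"odds ratio = {ratio:.2f}")
```

Here the odds are 1.5 in the numerator condition and about 0.67 in the denominator condition, so the odds of the outcome are 2.25 times greater in the numerator condition.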

Cronbach’s alpha (α)

• Generally accepted as a measure of reliability of a scale or set of questions on a survey/questionnaire

• Typically ranges from 0 to 1; negative values can occur when items covary negatively

• Values ≥ .7 and .85 are generally considered acceptable for low- and high-stakes decisions, respectively.

• Better estimates of reliability are available, but Cronbach’s α is very widely used and is available in most, if not all, statistics software
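For readers who want to see where the coefficient comes from, the sketch below applies the standard formula, α = k/(k − 1) × (1 − sum of item variances / variance of total scores), where k is the number of items. The function name and the item responses are hypothetical.

```python
from statistics import variance


def cronbach_alpha(items):
    """items: one list of scores per item, respondents in the same order."""
    k = len(items)
    # Total score for each respondent across all items.
    totals = [sum(responses) for responses in zip(*items)]
    item_variance_sum = sum(variance(item) for item in items)
    return (k / (k - 1)) * (1.0 - item_variance_sum / variance(totals))


# Hypothetical responses of five people to three items.
item_scores = [
    [2, 3, 4, 4, 5],
    [3, 3, 4, 5, 5],
    [2, 4, 4, 5, 5],
]
alpha_estimate = cronbach_alpha(item_scores)
print(f"alpha = {alpha_estimate:.3f}")
```

The estimate rises when items covary strongly, because the variance of the total scores then greatly exceeds the sum of the individual item variances.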

Partial eta-squared (η²partial)

• Effect size estimate for ANOVA with factorial designs

• Can be computed as: $\eta^2_{\text{partial}} = \dfrac{SS_{\text{effect}}}{SS_{\text{effect}} + SS_{\text{error}}}$

• Each value ranges from 0 to 1; the values for the different effects in one model can sum to more than 1

• How large η²partial needs to be in order to be considered meaningful depends on the context of the study and/or theoretical expectation
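The computation can be sketched from a set of sums of squares. The values below are hypothetical and describe an assumed two-factor design with effects A, B, and their interaction; the function name is also illustrative.

```python
def partial_eta_squared(ss_effect: float, ss_error: float) -> float:
    """SS_effect / (SS_effect + SS_error)."""
    return ss_effect / (ss_effect + ss_error)


# Hypothetical sums of squares from a two-factor ANOVA.
ss_error = 20.0
ss_effects = {"A": 30.0, "B": 40.0, "A x B": 10.0}

partials = {
    name: partial_eta_squared(ss, ss_error) for name, ss in ss_effects.items()
}
for name, value in partials.items():
    print(f"{name}: {value:.3f}")
print(f"sum of partial values: {sum(partials.values()):.3f}")
```

Each value stays between 0 and 1, but because every effect is compared only against itself plus error, the three values here sum to 1.6.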

Area Under the Curve (AUC)

• Used with receiver operating characteristic (ROC) curve analysis

• Quantifies how well a score can discriminate between two known groups

• Balances specificity and sensitivity of a diagnostic criterion

• Technically can range from 0 to 1, but .5 is the expected AUC if the diagnosis is determined at random

• Values closer to 1 indicate better diagnostic utility.
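One way to make the AUC concrete is its pairwise interpretation: it equals the proportion of (positive, negative) pairs in which the positive case receives the higher score, with ties counted as one half. The sketch below uses that equivalence rather than integrating the ROC curve; the function name and the scores are hypothetical.

```python
def auc_from_scores(positive_scores, negative_scores):
    """Share of positive/negative pairs ranked correctly (ties count 0.5)."""
    wins = 0.0
    for pos in positive_scores:
        for neg in negative_scores:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical risk scores for two known groups.
with_condition = [0.9, 0.8, 0.6, 0.4]
without_condition = [0.7, 0.5, 0.3, 0.2]
auc = auc_from_scores(with_condition, without_condition)
print(f"AUC = {auc:.4f}")
```

In this example 13 of the 16 pairs are ordered correctly, giving an AUC of .8125; identical score distributions in both groups give .5.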

Eta-squared (η²)

• Effect size estimate for ANOVA model

• Can be computed as: $\eta^2 = \dfrac{SS_{\text{effect}}}{SS_{\text{total}}}$

• Ranges from 0 to 1; Larger values indicate larger effects

• Interpreted as the proportion of variability attributed to a factor in the model

• How large η² needs to be in order to be considered meaningful depends on the context of the study and/or theoretical expectation

• In a one-way ANOVA, η² is equivalent to R² from regression.
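The one-way case can be sketched by computing the sums of squares directly from raw group data. The snippet also computes R² as 1 − SS_error / SS_total, treating each group mean as the predicted value, to illustrate the equivalence with η². The function name and the data are hypothetical.

```python
from statistics import mean


def one_way_sums_of_squares(groups):
    """Return (SS_effect, SS_error, SS_total) for a one-way design."""
    all_values = [value for group in groups for value in group]
    grand_mean = mean(all_values)
    ss_total = sum((value - grand_mean) ** 2 for value in all_values)
    ss_effect = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_error = sum((value - mean(g)) ** 2 for g in groups for value in g)
    return ss_effect, ss_error, ss_total


groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  # hypothetical data, three groups
ss_effect, ss_error, ss_total = one_way_sums_of_squares(groups)

eta_squared = ss_effect / ss_total
r_squared = 1.0 - ss_error / ss_total  # group means used as predictions

print(f"eta-squared = {eta_squared:.3f}, R-squared = {r_squared:.3f}")
```

Both quantities equal .90 here because SS_effect and SS_error add up to SS_total in a one-way design.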

Cramér’s V

• Measure of association between two categorical or nominal variables

• Ranges from 0 to 1; Larger values indicate larger effects

• Rescaled χ² statistic

• If used with a 2×2 table, it is equivalent to φ
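The rescaling can be sketched as follows, using the standard formulas V = sqrt(χ² / (n × min(rows − 1, columns − 1))) and φ = sqrt(χ² / n). The function names and the contingency tables are hypothetical.

```python
from math import sqrt


def chi_square(table):
    """Pearson chi-square statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    return statistic


def cramers_v(table):
    n = sum(sum(row) for row in table)
    smaller_dimension = min(len(table), len(table[0])) - 1
    return sqrt(chi_square(table) / (n * smaller_dimension))


def phi(table):
    """Phi for a 2x2 table: chi-square rescaled by the sample size only."""
    n = sum(sum(row) for row in table)
    return sqrt(chi_square(table) / n)


two_by_two = [[30, 10], [10, 30]]               # hypothetical counts
three_by_two = [[20, 10], [10, 20], [15, 15]]   # hypothetical counts

print(f"V (2x2) = {cramers_v(two_by_two):.3f}, phi = {phi(two_by_two):.3f}")
print(f"V (3x2) = {cramers_v(three_by_two):.3f}")
```

For the 2×2 table the divisor min(rows − 1, columns − 1) equals 1, which is why V and φ coincide.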

R-squared (R²)

• Referred to as coefficient of determination

• Ranges from 0 to 1; larger values indicate a larger proportion of variability in the outcome explained by a regression model

• How large the R² needs to be in order to be considered meaningful depends on the context of the study and/or theoretical expectation
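As a sketch, R² can be computed from observed and model-predicted values as 1 − SS_residual / SS_total. The function name, the data, and the fitted line (intercept 47.7, slope 4.1, the least-squares line for these hypothetical data) are illustrative assumptions.

```python
from statistics import mean


def r_squared(observed, predicted):
    """Proportion of outcome variability accounted for by the predictions."""
    grand_mean = mean(observed)
    ss_total = sum((y - grand_mean) ** 2 for y in observed)
    ss_residual = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))
    return 1.0 - ss_residual / ss_total


predictor = [1, 2, 3, 4, 5]           # hypothetical predictor values
observed = [52, 55, 61, 64, 68]       # hypothetical outcome values
predicted = [47.7 + 4.1 * x for x in predictor]  # assumed fitted line

model_r2 = r_squared(observed, predicted)
print(f"R-squared = {model_r2:.4f}")
```

Predictions that reproduce the observed values exactly would give an R² of 1.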

Phi (φ)

• Measure of association between two dichotomous variables

• Ranges from 0 to 1; Larger values indicate larger effects

• Rescaled χ² statistic

Hazard ratio

• Comparison of two hazard rates (between two groups or between incremental changes in a continuous variable) at a specific point in time

• Ratio is hazard rate of, say, focal group divided by hazard rate of reference group at a given point in time

• Hazard ratios are commonly used in survival or event-history analysis
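The ratio described above can be sketched in two ways: directly from two hazard rates, and from a regression coefficient as e^β, the symbol listed for the hazard ratio in Table 1. The second form assumes a model in which the log hazard is linear in the predictor, as in a proportional hazards model; the function names and the rates are hypothetical.

```python
from math import exp, log


def hazard_ratio(hazard_focal: float, hazard_reference: float) -> float:
    """Hazard rate of the focal group divided by that of the reference group."""
    return hazard_focal / hazard_reference


def hazard_ratio_from_coefficient(beta: float) -> float:
    """e^beta, assuming the log hazard is linear in the predictor."""
    return exp(beta)


# Hypothetical hazard rates at one point in time.
hr = hazard_ratio(0.10, 0.05)
beta = log(hr)  # the coefficient that corresponds to this ratio

print(f"hazard ratio = {hr:.2f}, e^beta = {hazard_ratio_from_coefficient(beta):.2f}")
```

A ratio of 2 means the focal group's hazard is twice that of the reference group at that point in time; a coefficient of 0 corresponds to a ratio of 1.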
