Hypothesis Testing, Part 2

With some material from Howard Seltman, Blase Ur, Bilge Mutlu, Vibha Sazawal

CATEGORICAL IV, NUMERIC DV

Independent samples, one IV

  # Conditions   Normal/Parametric   Non-parametric
  Exactly 2      T-test              Mann-Whitney U, bootstrap
  2+             One-way ANOVA       Kruskal-Wallis, bootstrap

Is your data normal?
• Skewness: asymmetry
• Kurtosis: "peakedness" relative to normal
  – Both: within ±2 SE (of the skewness/kurtosis estimate) is OK
• Or use Shapiro-Wilk (null = normal)
• Or look at a Q-Q plot

T-test
• Already discussed
• Assumptions: normality, equal variances, independent samples
  – Can use Levene's test to check the equal-variance assumption
• Post-test: check residuals for assumption fit
  – For a t-test this is the same pre or post
  – For other tests you check residual vs. fit post

One-way ANOVA
• H0: μ1 = μ2 = μ3
• H1: at least one doesn't match
• NOT H1: μ1 ≠ μ2 ≠ μ3
• Assumptions: normality, common variance, independent errors
• Intuition: F statistic
  – Variance between / variance within
  – Under the (exact) null, F = 1; F >> 1 rejects the null

One-way ANOVA
• F = MSb / MSw
• MSw = Σ over conditions [ Σ within condition (observation − condition mean)² ] / dfw
  – dfw = N − k, where k = number of conditions
• MSb = Σ (condition mean − grand mean)² / dfb
  – dfb = k − 1
  – Every observation goes in the sum

(Worked example from Vibha Sazawal and a figure of the F-distribution with its rejection region, omitted here.)

Now what? (Contrasts)
• So we rejected the null. What did we learn?
  – What *didn't* we learn?
  – At least one is different... Which? All?
  – This is called an "omnibus test"
• To answer our actual research question, we usually need pairwise contrasts

The trouble with contrasts
• Contrasts mess with your Type I bounds
  – One test: 95% confident
  – Three tests: 85.7% confident
  – 5 conditions, all pairs: 4 + 3 + 2 + 1 = 10 tests: 59.9%
  – UH OH

Planned vs. post hoc
• Planned: You have a theory.
  – Really, no cheating
  – You get n−1 pairwise comparisons for free
  – In theory, should not be control vs. all, but probably OK
  – NO COMPARISONS unless the omnibus test passes
• Post hoc
  – Anything unplanned
  – More than n−1
  – Requires correction!
  – Doesn't necessarily require the omnibus test first

Correction
• Adjust {p-values, alpha} to compensate for multiple testing post hoc
• Bonferroni (most conservative)
  – Assume all possible pairs: m = k(k−1)/2 (combinations)
  – α_corrected = α / m
  – Once you have looked, the implication is that you did all the comparisons implicitly!
• Holm-Bonferroni is less conservative
  – Stepwise: adjust alpha as you go
• Dunnett specifically for all vs. control; others exist

Independent samples, one IV (recap)

  # Conditions   Normal/Parametric   Non-parametric
  Exactly 2      T-test              Mann-Whitney U, bootstrap
  2+             One-way ANOVA       Kruskal-Wallis, bootstrap

Non-parametrics: MWU and K-W
• Good for non-normal data, Likert data (ordinal, not actually numeric)
• Assumptions: independent samples, at least ordinal data
• Null (MWU): P(X > Y) = P(Y > X), where X, Y are observations from the two distributions
  – If we assume the distributions are continuous with the same shape, this can be seen as comparing medians

MWU and K-W, continued
• Essentially: rank-order all the data (both conditions)
  – Total the ranks for condition 1; compare to "expected"
  – Various procedures correct for ties

Bootstrap
• Resampling technique(s)
• Intuition:
  – Create a "null" distribution, e.g. by subtracting means so that mA = mB = 0
    • Now you have shifted samples A-hat and B-hat
  – Combine these to make a null distribution
  – Draw a sample of size N, with replacement
    • Do it 1000 (or 10k) times
  – Use this to determine the critical value (alpha = 0.05)
  – Compare this critical value to your real data for the test

Paired samples, one IV

  # Conditions   Normal/Parametric                          Non-parametric
  Exactly 2      Paired t-test                              Wilcoxon signed-rank
  2+             2-way ANOVA w/ subject as a random factor; Friedman
                 mixed models (later)

Paired t-test
• Two samples per participant/item
• The test subtracts them
• Then uses a one-sample t-test with H0: μ = 0 and H1: μ ≠ 0
• Regular t-test assumptions, plus: does subtraction make sense here?

Wilcoxon S.R. / Friedman
• H0: the difference between pairs is symmetric around 0
• H1: ... or not
• Excludes no-change items
• Essentially: rank by absolute difference; compare signs × ranks
• (Friedman = the generalization to 3+ conditions)

SIMPLE LINEAR REGRESSION (one numeric IV, numeric DV)

Simple linear regression
• E(Y|x) = b0 + b1x ... looks at populations
  – Population mean at this value of x
• Key H0: b1 = 0
  – b0 usually not important for significance (obviously important for model fit)
• b1: slope → change in Y per unit X
• Best fit: least squares, or maximum likelihood
  – LSq: minimize the sum of squares of the residuals
  – ML: maximize the probability of seeing this data under this model

Assumptions, caveats
• Assumes:
  – linearity in Y ~ X
  – normally distributed error at each x, with constant variance at all x
  – error in measuring X is small compared to the variance of Y ("fixed X")
• Independent errors!
  – Serial correlation, grouped data, etc. (later)
• Don't interpret widely outside the available x values
• Can transform for linearity!
  – log(y), sqrt(y), 1/y, y²

Assumption/residual checking
• Before: use a scatterplot to check for plausible linearity
• After: residual vs. fit
  – Residual on the Y-axis vs. predicted value on the X-axis
  – Should be relatively evenly distributed around 0 (linearity)
  – Should have relatively even vertical spread (equal variance)
• After: quantile-normal plot of the residuals

Model interpretation
• Interpret b1; interpret the p-value
• CI: if it crosses 0, it's not significant
• R²: fraction of the total variation accounted for
  – Intuitively: explained variance / total variance
  – Explained = var(Y) − residual errors
• Cohen's f² = R² / (1 − R²); small/medium/large: 0.02, 0.15, 0.35 (Cohen)

Robustness
• Brittle to linearity, independent errors
• Somewhat brittle to fixed-X
• Fairly robust to equal variance
• Quite robust to normality

CATEGORICAL OUTCOMES

One cat. IV, cat. DV, independent
• Contingency tables: how many people fall in each combination of categories

Chi-square test of independence
• H0: the distribution of Var1 is the same at every level of Var2 (and vice versa)
  – The null distribution approaches χ² as sample size grows
  – Heuristic: no cells < 5
  – Can use Fisher's exact test (FET) instead
• Intuition:
  – Sum over rows/columns: (observed − expected)² / expected
  – Expected: marginal % × count in the other margin

Paired 2×2 tables
• Use McNemar's test
  – Contingency table: matches and mismatches for each option
• H0: the marginals are the same

               Cond1: Yes   Cond1: No
  Cond2: Yes   a            b           a + b
  Cond2: No    c            d           c + d
               a + c        b + d       N

• Essentially a χ² test on the agreement
  – Test statistic: (b − c)² / (b + c)

Paired, continued
• Cochran's Q: extension to more than two conditions
• Other similar extensions for related tasks

Critiques
• Choose a paper that has one (or more) empirical experiments as a central contribution
  – Doesn't have to be human subjects, but can be
  – Does have to have enough description of the experiment
• 10-12 minute presentation
• Briefly: research questions, necessary background
• Main: describe and critique the methods
  – Experimental design, data collection, analysis
  – Good, bad, ugly, missing
• Briefly, results?
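The bootstrap recipe above (shift both samples to a common mean of 0, combine them into a null population, resample with replacement many times, then compare the observed difference to the resampled null distribution) can be sketched in a few lines of plain Python. The function name and the two-sided p-value convention are my own choices, not from the slides:

```python
import random
from statistics import mean

def bootstrap_two_sample(a, b, n_boot=10_000, seed=0):
    """Bootstrap test of H0: mean(a) == mean(b), via the
    shift-to-zero-mean / combine / resample recipe."""
    rng = random.Random(seed)
    ma, mb = mean(a), mean(b)
    observed = ma - mb
    # Shift each sample so both have mean 0 -- this imposes the null.
    a_hat = [x - ma for x in a]
    b_hat = [x - mb for x in b]
    pool = a_hat + b_hat              # combined null population
    extreme = 0
    for _ in range(n_boot):
        # Draw resamples of the original sizes, with replacement.
        ra = rng.choices(pool, k=len(a))
        rb = rng.choices(pool, k=len(b))
        if abs(mean(ra) - mean(rb)) >= abs(observed):
            extreme += 1
    return observed, extreme / n_boot  # (observed diff, two-sided p)
```

Variants exist (e.g., resampling each shifted group separately rather than from a combined pool), but this follows the combine-then-resample description given above.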
Logistic regression (logit)
• Numeric IV, binary DV (or ordinal)
• log( E(Y) / (1 − E(Y)) ) = log( Pr(Y=1) / Pr(Y=0) ) = b0 + b1x
• Log odds of success = a linear function
  – Odds: 0 to ∞, with 1 in the middle
  – e.g.: odds = 5 = 5:1 ... five successes for each failure
  – Log odds: −∞ to ∞ with 0 in the middle: good for regression
• Modeled as a binomial distribution

Interpreting logistic regression
• Take exp(coef) to get interpretable odds.
• For each unit increase in x, the odds are multiplied by exp(b1)
  – Note that this can make small coefficients important!
• Use e.g. the Hosmer-Lemeshow test for goodness of fit
  – null = data fit the model
  – But not a lot of power!

MULTIVARIATE

Multiple regression
• Linear/logistic regression with more variables!
  – At least one numeric, 0+ categorical
• Still: fixed x, normal errors w/ equal variance, independent errors (linear)
• Linear relationship between E(Y) and each x, when the other inputs are held constant
  – Effects of each x are independent!
• Still check quantile-normal of residuals, residual vs. fit

Model selection
• Which covariates to keep? (more on this in a bit)

Adding categorical vars
• Indicator variables (everything is 0 or 1)
• Need one fewer indicator than conditions
  – One condition is true; or none are true (baseline)
  – Coefficients are *relative to the baseline*!
• Model selection: keep all or none of the indicators for one factor
• Called "ANCOVA" when there is at least one each of numeric + categorical

Interaction
• What if your covariates *aren't* independent?
• E(Y) = b0 + b1x1 + b2x2 + b12x1x2
  – The slope for x1 is different at each value of x2
• Superadditive: all in the same direction; the interaction makes the effects stronger
• Subadditive: the interaction is in the opposite direction
• For indicator vars, all or none

Model selection!
• Which covariates to keep?
• From theory
• Keep an interaction only if it's significant?
  – If you keep an interaction, you should keep the corresponding main effects
• "Adjusted" R²?
  – Regular R² is always higher with more covariates
• BIC and AIC
  – Take the model likelihood and penalize for more parameters
  – Absolute value not interpretable; lower is better
• All combinations? Stepwise?

THINGS WE ARE ONLY GOING TO MENTION BRIEFLY
(Know they exist; look them up if relevant)

Multi-way ANOVA
• >1 categorical IVs, 1 numeric DV
• Normality, equal variance, independent errors
• With interaction: every combination of factor levels has its own population mean
• Without interaction (additive): the change in one variable is consistent across all fixed values of the others
• Works basically like standard ANOVA, etc.

Mixed models regression
• Explicitly model correlations in the data
• Fixed effects: affect the outcome for everyone
• Random effects: per-item deviations we don't want to model individually
• Simplest example: repeated measures
  – Y ~ b0 + b1x1 + b2x2 + ... + random per-participant intercept
  – Each participant has their own intercept adjustment

POWER ANALYSIS

What is power?
• The null distribution is designed so that we'd only see a test statistic this extreme 5% of the time
• This bounds Type I error but not Type II
• Power = 1 − Type II error rate
• Heuristic: 80% is "good enough"

Alternative scenarios
• One null, but infinitely many alternatives!
• Alternative distribution: given some n, underlying variance, underlying diff.
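The idea on this last slide, that power is computed against one specific alternative (a given n, underlying variance, and underlying difference), can be illustrated by simulation. A minimal sketch, substituting a two-sample z-test with known σ for simplicity; the function name and defaults are mine, not the slides':

```python
import random
from statistics import NormalDist, mean

def simulated_power(n, true_diff, sigma, alpha=0.05, sims=2000, seed=0):
    """Estimate power: the fraction of simulated experiments, run under
    the alternative (true mean difference = true_diff), that reject H0."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # e.g. ~1.96 for alpha=0.05
    se = sigma * (2 / n) ** 0.5                    # SE of the difference in means
    rejections = 0
    for _ in range(sims):
        a = [rng.gauss(true_diff, sigma) for _ in range(n)]
        b = [rng.gauss(0.0, sigma) for _ in range(n)]
        if abs(mean(a) - mean(b)) / se > z_crit:
            rejections += 1
    return rejections / sims
```

Under the null (true_diff = 0) the rejection rate hovers near α, bounding Type I error; as the true difference or n grows it climbs toward 1, and that curve is what an a-priori power analysis reads off.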
