Summer workshop of the Korean Society of Speech Sciences: July 5, 2018

Basics of mixed effects models in R

Jongho Jun ([email protected]), Seoul National University
Hyesun Cho ([email protected]), Dankook University

Topics

• Mixed effects linear regression
• Mixed effects logistic regression
• Fixed effect
• Random effect
  ✓ Random intercept
  ✓ Random slope
• Model comparison

Roadmap

I. Mixed effects linear regression
  ○ Wall Street Journal corpus data
  ○ Hypothetical VC duration data
  ○ Interaction terms and model selection
II. Mixed effects logistic regression
  ○ English dative alternation

Data for in-class discussion

• vbarg.txt, BresDative.txt: download "7. Syntax" (.zip) from the DOWNLOADS tab at https://www.wiley.com/en-us/Quantitative+Methods+In+Linguistics-p-9781405144247
• vcdur.txt: download from https://goo.gl/N2oDaS
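• A minimal loading sketch (assuming the files are tab-delimited and sit in R's current working directory; adjust the paths if not):

  vbarg      <- read.delim("vbarg.txt")       # WSJ verb-argument data
  vcdur      <- read.delim("vcdur.txt")       # VC duration data
  BresDative <- read.delim("BresDative.txt")  # dative alternation data
  str(vbarg)  # check that columns such as verb, a0, n0 were read in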

References

• Johnson, Keith (2008). Quantitative Methods in Linguistics. Blackwell. Ch. 7.
• Baayen, R. Harald (2008). Analyzing Linguistic Data: A Practical Introduction to Statistics Using R. Cambridge University Press. Ch. 7.
• Muller, Samuel, J. L. Scealy & A. H. Welsh (2013). Model selection in linear mixed models. Statistical Science 28(2), 135-167.

I. Mixed effects linear regression

○ Wall Street Journal corpus data
○ Hypothetical VC duration data
○ Interaction terms and model selection

Wall Street Journal (WSJ) corpus

• Discussed in Johnson (2008: section 7.3)
• Coding:
  (N0 According to the media)(A0 President Menem)(V took)(A1 office)(AM July 8).
  ✓ A0 = agent
  ✓ N0 = earlier material in the sentence
  ✓ A1 = argument
  ✓ …

Research question (WSJ)

Is there any relationship between the number of words in N0 and the number of words in A0?

• (N0 According to the media)(A0 President Menem)(V took)(A1 office)(AM July 8).
• More specifically, the prediction: N0 size → A0 size?

WSJ corpus data: vbarg.txt

• a0, n0: log-transformed sizes of A0 and N0

       verb     A0size N0size a0   n0   …
1      take     2      17     1.09 2.89 …
2      say      4      0      1.60 0    …
3      expect   1      5      0.69 1.79 …
4      sell     4      0      1.60 0    …
5      say      9      0      2.30 0    …
6      increase 3      0      1.38 0    …
…      …        …      …      …    …    …
30590  say      0      11     0    2.48 …

Linear regression model

• Equation: y = b + a*x
  ✓ b: intercept
  ✓ a: slope

Linear regression model (WSJ)

• Equation: y = b + a*x
  Dependent variable: a0; predictor: n0
  ✓ b: intercept
  ✓ a: slope

Linear regression model (WSJ)

• Equation: a0 = b + a*n0
• R function: lm()

Linear regression model (WSJ)

• Equation: a0 = b + a*n0

> summary(lm(a0~n0,data=vbarg))
Call:
lm(formula = a0 ~ n0, data = vbarg)
…
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.14    0.005      196.68  <2e-16 ***
n0          -0.19    0.003      -53.46  <2e-16 ***

• Reading off the coefficients: b (intercept) = 1.14 and a (slope for n0) = -0.19.

Linear regression model (WSJ)

• Linear regression model found:

a0 = 1.14 - 0.19*n0

“As n0 size increases, a0 size decreases.”

Linear regression model (WSJ)

• Incorrect assumption: all verbs have the same average a0 size.
• Different verbs might behave differently.

Linear regression model (WSJ)

• Different verbs might have different average a0 sizes. → Random effect

       verb     A0size N0size a0   n0   …
1      take     2      17     1.09 2.89 …
2      say      4      0      1.60 0    …
3      expect   1      5      0.69 1.79 …
4      sell     4      0      1.60 0    …
5      say      9      0      2.30 0    …
6      increase 3      0      1.38 0    …
…      …        …      …      …    …    …
30590  say      0      11     0    2.48 …

Random effect

• Similar examples in an experiment:
  ✓ individual participants (subject)
  ✓ individual test words (item)
• Items and subjects are randomly sampled from their populations.
• If we repeat the same experiment, different items and subjects will be employed.

Fixed effect

• Usually we are not interested in the random effects themselves, but rather in the fixed effects.
• Fixed factor: the set of possible levels of the factor is fixed, and each of these levels can be repeated.
• Examples
  ✓ N0
  ✓ Treatment factors with two levels: treatment vs. control group

Mixed effects model

• A model containing both fixed and random factors.
• R package: lmerTest
• R function: lmer()
  (Note: Johnson (2008) uses a different package and function.)
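• One-time setup, in case lmerTest is not installed yet (loading lmerTest also loads lme4, which supplies the underlying fitting machinery):

  install.packages("lmerTest")  # once per machine
  library(lmerTest)             # provides lmer() with p-values in summary()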

Mixed effects model

• Random effects:
  ✓ random intercept
  ✓ random slope
• Equation: y = b + a*x

Mixed effects model with random intercept (WSJ)

• Assumption: average A0 size may be verb-specific.
• The model includes a separate intercept for each verb.
• Formula:
  lmer(a0 ~ n0 + (1|verb), data=vbarg)

> library(lmerTest)
> verb.lmer01 <- lmer(a0~n0+(1|verb),data=vbarg)
> summary(verb.lmer01)

Mixed effects model with random intercept (WSJ)

> summary(verb.lmer01)
…
Random effects:
 Groups   Name        Variance Std.Dev.
 verb     (Intercept) 0.1659   0.4074
 Residual             0.3926   0.6266
Number of obs: 30590, groups: verb, 32

Fixed effects:
            Estimate Std.Error df     t value Pr(>|t|)
(Intercept)  0.850   0.07      31      11.77  5.04e-13 ***
n0          -0.102   0.003     30570  -31.46  < 2e-16 ***

Mixed effects model with random intercept (WSJ)

• Model found:

  a0 = 0.850 - 0.102*n0

  There is a strong effect of n0 on a0 even after controlling for the different average size of a0 for different verbs.
  Cf. the previous linear regression model: a0 = 1.14 - 0.19*n0

Mixed effects model with random intercept and random slope (WSJ)

• Assumption: both average A0 size and the A0-N0 size relationship are verb-specific.
• The model includes a verb-specific intercept and slope.
• Formula:
  lmer(a0 ~ n0 + (1+n0|verb), data=vbarg)

> verb.lmer02 <- lmer(a0~n0+(1+n0|verb),data=vbarg)
> summary(verb.lmer02)

Mixed effects model with random intercept and random slope (WSJ)

> summary(verb.lmer02)
…
Random effects:
 Groups   Name        Variance Std.Dev. Corr
 verb     (Intercept) 0.220    0.469
          n0          0.004    0.065    -0.84
 Residual             0.387    0.622
Number of obs: 30590, groups: verb, 32

Fixed effects:
            Estimate Std.Error df     t value Pr(>|t|)
(Intercept)  0.852   0.083     31.05   10.23  1.81e-11 ***
n0          -0.101   0.012     30.75   -8.31  2.27e-09 ***

Mixed effects model with random intercept and random slope (WSJ)

• Model with random intercept and slope:

  a0 = 0.852 - 0.101*n0

  The estimate for n0 is very similar to that of the previous model with only a random intercept, and it is still significant.
  Cf. the model with only a random intercept: a0 = 0.850 - 0.102*n0

Model comparison

• Question: does the model with random slope and intercept fit the data better than the one with only a random intercept?
• Likelihood ratio test (LRT)
  ✓ Compares a smaller (simpler) model against a larger (more complicated) model.
  ✓ It is carried out by the anova() function:
    anova(verb.lmer01, verb.lmer02)

Model comparison (WSJ)

> anova(verb.lmer01, verb.lmer02)
Data: vbarg
Models:
verb.lmer01: a0 ~ n0 + (1 | verb)
verb.lmer02: a0 ~ n0 + (1 + n0 | verb)
            Df AIC   BIC … Chisq  Chi Df Pr(>Chisq)
verb.lmer01 4  58397 …
verb.lmer02 6  58083 …   318.39 2      < 2.2e-16 ***
---

Model comparison

• The likelihood ratio test is used to compare models with and without a factor.
• It measures how much the likelihood of the model improves.
• If the ratio is close to 1.0, the improved fit offered by the added factor is insubstantial, and the factor is considered a non-significant predictor of the criterion variable.
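• A sketch of what anova() computes under the hood, using the logLik values shown on the next slide (anova() refits REML models with ML before the test):

  chisq <- as.numeric(2 * (logLik(verb.lmer02) - logLik(verb.lmer01)))
  # 2 * (-29036 - (-29195)) = 318, matching the Chisq column
  pchisq(chisq, df = 2, lower.tail = FALSE)   # df = 6 - 4 model parameters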

Model comparison (WSJ)

• The test statistic (Chisq) is 318.39, and the associated probability is very small.
• This indicates that adding the random slope significantly improved the fit of the model.

> anova(verb.lmer01, verb.lmer02)
…
            … logLik … Chisq  Chi Df Pr(>Chisq)
verb.lmer01 … -29195 …
verb.lmer02 … -29036 … 318.39 2      < 2.2e-16 ***

Roadmap

I. Mixed effects linear regression
  ● Wall Street Journal corpus data
  ○ Hypothetical VC duration data
  ○ Interaction terms and model selection
II. Mixed effects logistic regression
  ○ English dative alternation

Hypothetical VC duration data

• Duration measurement: the vowel-consonant sequence, e.g. (c)ab, (c)ap, segmented into V and C.
• Suppose there is an inverse relationship between consonant duration (Cdur) and vowel duration (Vdur).

VC duration

• Construct two types of measurement data. Data with:
  (01) subject-specific intercept only
  (02) subject-specific intercept and slope

Data 01: subject-specific intercept

• Equation: y = a + b*x + e (where y = Vdur, x = Cdur, e = error)
  i.e. Vdur = a + b*Cdur + e
• a (intercept) = 300 ms
  ▪ sd of subject-specific intercept = 20 ms
• b (slope) = -1
• x (Cdur) = 100 ms
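• The slides do not show the generating code; the following is a minimal simulation sketch under the parameters above (the sds for Cdur and for the residual error are illustrative assumptions, chosen to roughly match the tables below):

  set.seed(1)                       # arbitrary seed, for reproducibility
  subjN <- 5; wordN <- 7
  a <- 300; b <- -1
  subj_int <- rnorm(subjN, 0, 20)   # subject-specific intercept deviations
  d01 <- expand.grid(subject = 1:subjN, word = 1:wordN)
  d01$Cdur <- rnorm(nrow(d01), 100, 10)   # Cdur around 100 ms
  e <- rnorm(nrow(d01), 0, 5)             # set e <- 0 for the error-free data
  d01$Vdur <- a + subj_int[d01$subject] + b * d01$Cdur + e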

Data 01 with no error term

• Equation: y = a + b*x + e (where e = 0)
• Number of subjects (subjN) = 5
• Number of words (wordN) = 7

subject word Cdur Vdur
1       1    94   232
1       2    92   235
1       3    91   235
1       4    103  224
1       5    92   234
1       6    103  223
1       7    123  203
2       1    100  189
…       …    …    …
5       6    99   206
5       7    111  194

mean (sd)    98.4 (9.8) 198.5 (19.44)

Data 01 w/o error

• Average Vdur by subject:
  subject Vdur
  1       227
  2       192
  3       180
  4       185
  5       207

[Figure slide: Data 01 w/o error]
[Figure slide: Data 01 w/o error, scatter plot]

Data 01 with error term

• Equation: y = a + b*x + e (where e ≠ 0)
• Number of subjects (subjN) = 5
• Number of words (wordN) = 7

subject word Cdur Vdur
1       1    94   233
1       2    92   236
1       3    91   234
1       4    103  221
1       5    92   241
1       6    103  220
1       7    123  197
2       1    100  193
…       …    …    …
5       6    99   212
5       7    111  195

mean (sd)    98.4 (9.8) 199.8 (19.08)

[Figure slide: Data 01 w/ error, scatter plot]

Data 01 w/ error

• Mixed effects model with a separate intercept for each subject (average Vdur may be subject-specific):

> lmer(Vdur ~ Cdur + (1|subject), data=d)

• Random effects:
  Groups   Name        Variance Std.Dev.
  subject  (Intercept) 303.93   17.434
  Residual             24.05    4.904

• Fixed effects:
              Estimate … t value Pr(>|t|)
  (Intercept) 302.964    25.885  <0.001
  Cdur        -1.047     -11.865 <0.001

• Note that the estimates come close to the true parameters used to generate the data (intercept 300, slope -1, subject-intercept sd 20).
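• To inspect the estimated subject-specific intercepts themselves, ranef() and coef() can be used (a sketch; m01 is an assumed name for the saved model object):

  m01 <- lmer(Vdur ~ Cdur + (1|subject), data = d)
  ranef(m01)$subject  # each subject's deviation from the overall intercept
  coef(m01)$subject   # per-subject intercept plus the shared Cdur slope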

Data 02: subject-specific intercept and slope

• Equation: Vdur = a + b*Cdur + e
• a (intercept) = 300 ms
  ▪ sd of subject-specific intercept = 20 ms
• b (slope) = -1
  ▪ sd of subject-specific slope = 0.3
• x (Cdur) = 100 ms
• Number of subjects (subjN) = 5
• Number of words (wordN) = 7
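• Extending the Data 01 simulation sketch, the only change is a subject-specific slope deviation (again, the Cdur and residual sds are illustrative assumptions):

  subjN <- 5; wordN <- 7; a <- 300; b <- -1
  subj_int   <- rnorm(subjN, 0, 20)    # sd of subject intercepts = 20 ms
  subj_slope <- rnorm(subjN, 0, 0.3)   # sd of subject slopes = 0.3
  d02 <- expand.grid(subject = 1:subjN, word = 1:wordN)
  d02$Cdur <- rnorm(nrow(d02), 100, 10)
  d02$Vdur <- a + subj_int[d02$subject] +
    (b + subj_slope[d02$subject]) * d02$Cdur + rnorm(nrow(d02), 0, 5)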

Data 02 w/ subject-specific intercept & slope

subject word Cdur Vdur
1       1    94   224
1       2    92   227
1       3    91   225
1       4    103  211
1       5    92   232
1       6    103  210
1       7    123  185
2       1    100  198
…       …    …    …
5       6    99   196
5       7    111  177

mean (sd)    98.4 (9.8) 195.2 (16.9)

[Figure slide: Data 02, scatter plot]

Data 02

• Mixed effects model with subject-specific intercept & slope:

> lmer(Vdur ~ Cdur + (1+Cdur|subject), data=d)

• Random effects:
  Groups   Name        Variance Std.Dev. Corr
  subject  (Intercept) 708.599  26.62
           Cdur        0.015    0.122    -1.00
  Residual             20.989   4.581

• Fixed effects:
              Estimate … t value Pr(>|t|)
  (Intercept) 305.987    21.186  <0.001
  Cdur        -1.124     -11.331 <0.001

• Note: the estimated sd of the subject-specific slope (0.122) is not even close to the true value of 0.3; with only 5 subjects and 7 words, the data are too sparse to estimate it reliably.

Data 02: larger data

• Equation: y = a + b*x + e
• Number of subjects (subjN) = 30
• Number of words (wordN) = 50

• Random effects:
  Groups   Name        Variance Std.Dev. Corr
  subject  (Intercept) 474.678  21.787
           Cdur        0.089    0.298    -0.19
  Residual             25.064   5.006

• Fixed effects:
              Estimate … t value Pr(>|t|)
  (Intercept) 296.2      70.702  <0.001
  Cdur        -0.908     -16.218 <0.001

• With more subjects and words, the estimated slope sd (0.298) now recovers the true value of 0.3.

Roadmap

I. Mixed effects linear regression
  ● Wall Street Journal corpus data
  ● Hypothetical VC duration data
  ○ Interaction terms and model selection
II. Mixed effects logistic regression
  ○ English dative alternation

Interaction terms and model selection

• When there is more than one fixed effect, we can also consider interaction terms.
• As more terms are added to the model, it becomes necessary to evaluate which model best fits the data:
  vdur ~ 1
  vdur ~ cdur
  vdur ~ cdur + voicing
  vdur ~ cdur + voicing + cdur:voicing

Interaction terms and model selection

• A key aspect of the mixed-effects analysis is often model selection, the choice of a particular model within a class of candidate models (Muller, Scealy & Welsh 2013).

Model selection

• Aim: a parsimonious model with other desirable properties, i.e. as good a fit as possible with a minimum of predictor variables.
• Starting from the saturated model (the full model with all factors and their interactions), eliminate non-significant predictors by applying the LRT.
• This is backward elimination for a linear mixed model.

Functions for model selection

• update(): removes a factor from a model
• anova(): likelihood ratio test
• step(): performs model selection automatically (gives the results, i.e. the best model, all at once)

Data: vcdur.txt

• English speakers' production of monosyllabic English words differing in coda voicing (bit, bid, beat, bead, …).
• Speaker regions: NZ (3), UK (2), US (2)
• Dependent variable: vdur

Specifying interaction terms

• a*b = a + b + a:b
  vdur ~ cdur*voicing
  = vdur ~ cdur + voicing + cdur:voicing
  vdur ~ cdur*voicing*height
  = vdur ~ cdur + voicing + height + cdur:voicing + voicing:height + cdur:height + cdur:voicing:height

Saturated model

• Model with all factors and their interactions:

  mm1 <- lmer(vdur ~ cdur*voicing*height + (1|speaker) + (1|item),
              data=d, REML=FALSE)

  ☞ REML=FALSE, i.e. ML: use this for model selection.
  ☞ Use REML=TRUE for parameter estimation.

• Starting with this saturated model, eliminate factors and interactions that do not significantly contribute to the goodness of model fit.
• The contribution is tested by the LRT (p < 0.05).

Saturated model

• Model with all interaction terms: summary(mm1)

Fixed effects:
                                     Estimate   Std. Error df        t value Pr(>|t|)
(Intercept)                          216.04612  20.24309    41.60000 10.673  1.76e-13 ***
cdur                                  -0.15703   0.07356   480.90000 -2.135  0.03328 *
voicingVOICELESS                     -24.87089  25.79788    38.70000 -0.964  0.34100
heightNONHIGH                         12.46191  24.78135    33.00000  0.503  0.61839
cdur:voicingVOICELESS                 -0.23325   0.08912   474.40000 -2.617  0.00915 **
cdur:heightNONHIGH                     0.18897   0.09312   474.90000  2.029  0.04299 *
voicingVOICELESS:heightNONHIGH         8.80019  36.61632    39.20000  0.240  0.81132
cdur:voicingVOICELESS:heightNONHIGH   -0.17443   0.13220   473.60000 -1.319  0.18765
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

We will eliminate this non-significant three-way interaction.

Eliminating a factor: use update()

• Syntax: update(x, ~.-factor)
• Delete the non-significant three-way interaction term with update():
  mm2 <- update(mm1, ~.-cdur:voicing:height)
• Output model specification:

  Linear mixed model fit by maximum likelihood. t-tests use Satterthwaite's method [lmerModLmerTest]
  Formula: vdur ~ cdur + voicing + height + (1 | speaker) + (1 | item) + cdur:voicing + cdur:height + voicing:height

Likelihood ratio test

> anova(mm1,mm2)
Data: d
Models:
mm2: vdur ~ cdur + voicing + height + (1 | speaker) + (1 | item) +
mm2:     cdur:voicing + cdur:height + voicing:height
mm1: vdur ~ cdur * voicing * height + (1 | speaker) + (1 | item)
    Df AIC    BIC    logLik  deviance Chisq  Chi Df Pr(>Chisq)
mm2 10 4996.9 5039.1 -2488.5 4976.9
mm1 11 4997.2 5043.6 -2487.6 4975.2   1.7379 1      0.1874

• The saturated model (mm1) is not significantly better than the reduced model (mm2), so we can discard mm1.

Reporting the result

> anova(mm1,mm2)
(output as on the previous slide)

"The model with the three-way interaction was not significantly better than the model without the interaction (χ²(1) = 1.74, p = .19)."

• Report the chi-square statistic as χ²(df) = Chisq value, p = p-value.

Current model: summary(mm2)

Fixed effects:
                               Estimate   Std. Error df        t value Pr(>|t|)
(Intercept)                    212.42246  20.06869    40.10000 10.585  3.55e-13 ***
cdur                            -0.12437   0.06938   478.10000 -1.793  0.0737 .
voicingVOICELESS               -14.43787  24.57580    31.90000 -0.587  0.5610
heightNONHIGH                   21.47215  23.83963    28.30000  0.901  0.3754
cdur:voicingVOICELESS           -0.31050   0.06731   474.70000 -4.613  5.11e-06 ***
cdur:heightNONHIGH               0.10303   0.06668   474.20000  1.545  0.1230
voicingVOICELESS:heightNONHIGH -13.76816  32.39704    24.10000 -0.425  0.6746
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Among the interactions, eliminate the one with the higher p-value (here voicing:height).

Elimination (2nd)

> mm3 <- update(mm2, ~.-voicing:height)
> anova(mm2,mm3)
Data: d
Models:
mm3: vdur ~ cdur + voicing + height + (1 | speaker) + (1 | item) +
mm3:     cdur:voicing + cdur:height
mm2: vdur ~ cdur + voicing + height + (1 | speaker) + (1 | item) +
mm2:     cdur:voicing + cdur:height + voicing:height
    Df AIC    BIC    logLik  deviance Chisq Chi Df Pr(>Chisq)
mm3 9  4995.1 5033.1 -2488.6 4977.1
mm2 10 4996.9 5039.1 -2488.5 4976.9   0.18  1      0.6714

• The model with more parameters (mm2) is not better than mm3, so we can discard mm2 and adopt the simpler model mm3.
• Reporting: "The interaction of voicing and vowel height did not significantly improve the model fit (χ²(1) = 0.18, p = 0.67)."

Summary of the current model (mm3)

Fixed effects:
                      Estimate   Std. Error df        t value Pr(>|t|)
(Intercept)           215.77739  18.48194    41.70000 11.675  1.02e-14 ***
cdur                   -0.12390   0.06938   478.10000 -1.786  0.0748 .
voicingVOICELESS      -21.31848  18.52951    40.40000 -1.151  0.2567
heightNONHIGH          14.94762  18.29788    38.30000  0.817  0.4190
cdur:voicingVOICELESS  -0.31019   0.06730   474.90000 -4.609  5.21e-06 ***
cdur:heightNONHIGH      0.10022   0.06634   482.60000  1.511  0.1315
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Now remove cdur:height.

• Repeat the same procedure until deleting a term yields a significant change in the goodness of fit (p < 0.05); then compare all the models at once:

> anova(mm1,mm2,mm3,mm4,mm5,mm6,mm7,mm8)
Data: d
Models:
mm8: vdur ~ cdur + (1 | speaker) + (1 | item)
mm6: vdur ~ cdur + (1 | speaker) + (1 | item) + cdur:voicing
mm7: vdur ~ (1 | speaker) + (1 | item) + cdur:voicing
mm5: vdur ~ cdur + voicing + (1 | speaker) + (1 | item) + cdur:voicing
mm4: vdur ~ cdur + voicing + height + (1 | speaker) + (1 | item) +
mm4:     cdur:voicing
mm3: vdur ~ cdur + voicing + height + (1 | speaker) + (1 | item) +
mm3:     cdur:voicing + cdur:height
mm2: vdur ~ cdur + voicing + height + (1 | speaker) + (1 | item) +
mm2:     cdur:voicing + cdur:height + voicing:height
mm1: vdur ~ cdur * voicing * height + (1 | speaker) + (1 | item)
    Df AIC    BIC    logLik  deviance Chisq   Chi Df Pr(>Chisq)
mm8 5  5022.7 5043.8 -2506.3 5012.7
mm6 6  4995.4 5020.7 -2491.7 4983.4   29.2976 1      6.207e-08 ***
mm7 6  4995.4 5020.7 -2491.7 4983.4    0.0000 0      1.00000
mm5 7  4996.2 5025.7 -2491.1 4982.2    1.1459 1      0.28442
mm4 8  4995.4 5029.1 -2489.7 4979.4    2.8169 1      0.09328 .
mm3 9  4995.1 5033.1 -2488.6 4977.1    2.2750 1      0.13148
mm2 10 4996.9 5039.1 -2488.5 4976.9    0.1800 1      0.67140
mm1 11 4997.2 5043.6 -2487.6 4975.2    1.7379 1      0.18741
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

☞ mm6: best model.
☞ mm5: best model that keeps all the main effects in the interactions.

Eliminating random effects

• From the best model so far (mm5), try removing the random effects:
  mm9  <- update(mm5, ~.-(1|speaker), REML=TRUE)
  mm10 <- update(mm5, ~.-(1|item), REML=TRUE)
  anova(mm5,mm9)
  anova(mm5,mm10)
• The current model is significantly better than the model without the by-speaker random intercept (χ²(1) = 201.22, p < 0.0001) and the one without the by-item random intercept (χ²(1) = 401.54, p < 0.0001).

Refitting the best model with REML

• After the best model is decided, refit the model with REML to obtain the best estimates of the parameter values.
• Remove REML=FALSE (lmer()'s default is REML=TRUE):

  mm5 <- lmer(vdur ~ cdur + voicing + cdur:voicing + (1|speaker) + (1|item), data=d)

The best model: summary & interpretation

Fixed effects:
                      Estimate   Std. Error df        t value Pr(>|t|)
(Intercept)           224.61615  17.24766    33.40000 13.023  1.25e-14 ***
cdur                   -0.08896   0.06580   468.50000 -1.352  0.177
voicingVOICELESS      -20.87662  19.86242    34.90000 -1.051  0.300
cdur:voicingVOICELESS  -0.30664   0.06758   472.70000 -4.537  7.23e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

• Interpretation: if the consonant is voiceless, vowel duration changes by an additional -0.31 ms for each 1 ms increase in consonant duration (t(473) = -4.54, p < .0001).

Backward elimination using step()

• The function step() performs backward elimination of non-significant effects of a linear mixed effects model.
• The step() function of package lmerTest overrides the one for lm objects (which does forward selection based on AIC).
• Applying it to our data:

> step(mm1)
Backward reduced random-effect table:

              Eliminated npar logLik  AIC    LRT    Df Pr(>Chisq)
<none>                   11   -2487.6 4997.2
(1 | speaker) 0          10   -2588.7 5197.3 202.09 1  < 2.2e-16 ***
(1 | item)    0          10   -2673.5 5367.0 371.78 1  < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Fixed effects:

Backward reduced fixed-effect table:
Degrees of freedom method: Satterthwaite

                    Eliminated  Sum Sq  Mean Sq NumDF  DenDF F value Pr(>F)
cdur:voicing:height 1           1654.5  1654.5  1     473.56  1.7410 0.18765
voicing:height      2            172.3   172.3  1      24.10  0.1806 0.67462
cdur:height         3           2176.4  2176.4  1     482.55  2.2819 0.13155
height              4           2866.7  2866.7  1      23.61  2.9903 0.09682 .
cdur:voicing        0          19776.1 19776.1  1     474.75 20.6284 7.079e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Selected model:

Model found: vdur ~ cdur + voicing + (1 | speaker) + (1 | item) + cdur:voicing
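• The model selected by step() can be extracted directly with lmerTest's get_model():

  s <- step(mm1)          # backward elimination, as above
  final <- get_model(s)   # the selected model object
  summary(final)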

Roadmap

I. Mixed effects linear regression
  ● Wall Street Journal corpus data
  ● Hypothetical VC duration data
  ● Interaction terms and model selection
II. Mixed effects logistic regression
  ○ English dative alternation

II. Mixed effects logistic regression

○ English dative alternation

English dative alternation

• Discussed in Johnson (2008: section 7.4), citing Bresnan et al. (2007)
• Two alternative ways:
  dative PP: I pushed the box to John.
  dative NP: I pushed John the box.

English dative alternation

• Preference (A > B: A is preferred to B); left column: dative PP, right column: dative NP
  I pushed the box to John.  >  I pushed John the box.
  That movie gave the creeps to me.  <  That movie gave me the creeps.
  This will give the creeps to just about anyone.  >  This will give just about anyone the creeps.

English dative alternation

• Factors for the preference: recipient = pronoun or not? short or long? given or new? …

Data: BresDative.txt

• Two corpora:
  ▪ the Switchboard corpus of conversational speech
  ▪ the Wall Street Journal corpus of text
• Coded for:
  ▪ the realization of the dative (PP, NP)
  ▪ the discourse accessibility, definiteness, animacy, and pronominality of the recipient and theme
  ▪ the semantic class of the verb
    ✓ abstract, transfer, future transfer, prevention of possession, and communication
  ▪ a measure of the difference between the (log) length of the recipient and the (log) length of the theme

Data: BresDative.txt

      real verb  vsense  class animrec …
1     NP   feed  feed.t  t     animate …
2     NP   give  give.a  a     animate …
3     NP   give  give.a  a     animate …
4     NP   give  give.a  a     animate …
5     NP   offer offer.c c     animate …
6     NP   give  give.a  a     animate …
…     …    …     …       …     …       …
3265  NP   give  give.a  a     animate …
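• The glm()/glmer() calls below use a data frame SwitchDat that the slides do not construct; a plausible sketch, assuming BresDative.txt has a column (here called corpus) marking the source corpus (both the column name and its level are hypothetical; check names(BresDative) in the real data):

  SwitchDat <- subset(BresDative, corpus == "switchboard")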

Data: BresDative.txt

• Question: can we predict the realization of the dative?

  class, accessibility, definiteness, … → realization of dative

Data: BresDative.txt

• Logistic regression; R function: glm()

Logistic regression

• Regression for count/proportion data.
• We cannot apply standard linear regression to proportions:
  ✓ Proportions are bounded between 0 and 1, but lm() does not know this.
  ✓ Other problems.
  ⇒ Logit transformation

Logistic regression

• Logit transformation:
  i.   p: proportion, range 0 to 1
  ii.  odds = p/(1-p), range 0 to ∞
  iii. log odds, i.e. logit = log(p/(1-p)), range -∞ to +∞
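• A quick R illustration (qlogis() and plogis() are base R's logit and inverse-logit functions):

  p <- c(0.1, 0.5, 0.9)
  odds <- p / (1 - p)    # 0.111  1.000  9.000
  log(odds)              # -2.197 0.000  2.197
  qlogis(p)              # same values: qlogis() is the logit
  plogis(qlogis(p))      # back to p: the inverse logit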

Data: BresDative.txt

• Logistic regression model of the dative alternation:
  ✓ glm(real ~ class + accrec + …, family=binomial, data=SwitchDat)
  "PP realization as a function of class, …"

Data: BresDative.txt

> summary(glm(real ~ class+accrec+accth+prorec+proth+defrec+defth+animrec+ldiff,
              family=binomial, data=SwitchDat))
...
Coefficients:
               Estimate Std. Error z value Pr(>|z|)
(Intercept)     0.3498  0.3554      0.984  0.32503
classc         -1.3516  0.3141     -4.303  1.68e-05 ***
classf          0.5138  0.4922      1.044  0.29651
classp         -3.4277  1.2504     -2.741  0.00612 **
classt          1.1571  0.2055      5.631  1.80e-08 ***
accrecnotgiven  1.1282  0.2681      4.208  2.57e-05 ***
...

Data: BresDative.txt

• accrecnotgiven = 1.1282 (p < .0001): when the recipient is not given earlier in the discourse, PP realization is more likely.

Data: BresDative.txt

• A fixed effects logistic regression model (the left-hand side is the log odds of PP realization):

  real = 0.34 - 1.35*classc + 0.51*classf - 3.42*classp + 1.15*classt + 1.12*accrecnotgiven + …

• Incorrect assumption: all verbs have the same preference in the dative alternation.

Data: BresDative.txt

• Possibility: verbs may differ in their preference in the dative alternation.
• Verbs may also have more than one sense, e.g. pay:
  ▪ Transfer: to pay him some money
  ▪ More abstract: to pay attention to the clock
• Possibility (revised): verb senses may differ in their preference in the dative alternation.
• Add a random factor for verb sense.

Mixed effects logistic regression model

• A logistic regression model containing both fixed and random factors.
• R package: lme4 (loaded along with lmerTest)
• R function: glmer()

A model with random intercept

• Assumption: the average preference for PP realization may be specific to each verb sense.
• The model includes a separate intercept for each verb sense.
• Formula for "PP realization as a function of class, etc.":

  glmer(real ~ class + accrec + accth + prorec + proth + defrec + defth + animrec + ldiff + (1|vsense), family=binomial, data=SwitchDat)

Mixed effects logistic model w/ random intercept

> summary(glmer(real ~ class ... (1|vsense), ...))
...
Random effects:
 Groups Name        Variance Std.Dev.
 vsense (Intercept) 4.976    2.23

Fixed effects:
               Estimate Std. Error z value Pr(>|z|)
(Intercept)     1.3943  0.8069      1.728  0.083984 .
classc         -1.3882  1.1481     -1.209  0.226614
classf         -0.2491  1.3619     -0.183  0.854862
classp         -4.9325  2.1697     -2.273  0.023007 *
classt          0.9035  0.9647      0.937  0.348957
accrecnotgiven  1.6421  0.3417      4.805  1.55e-06 ***
...

Mixed effects logistic model w/ random intercept

• Compare the outputs of the mixed effects model vs. the model with no random factor:

                glmer(... (1|vsense) ...)   glm(...)
                Estimate  Pr(>|z|)          Estimate  Pr(>|z|)
(Intercept)      1.3943   0.083984 .         0.3498   0.32503
classc          -1.3882   0.226614          -1.3516   1.68e-05 ***
classf          -0.2491   0.854862           0.5138   0.29651
classp          -4.9325   0.023007 *        -3.4277   0.00612 **
classt           0.9035   0.348957           1.1571   1.80e-08 ***
accrecnotgiven   1.6421   1.55e-06 ***       1.1282   2.57e-05 ***
…

Random intercepts and slopes

• Many maximal random effects models (e.g. with both random intercepts and random slopes) fail to converge (i.e. the fitting algorithm cannot find a solution).
• In that case, you will have to use a simpler model (e.g. intercepts-only).
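• A sketch of one common fallback path when a maximal model fails to converge (variable names follow the dative example above; trying another optimizer via glmerControl() is one option before simplifying):

  m_max <- glmer(real ~ class + accrec + (1 + accrec|vsense),
                 family = binomial, data = SwitchDat)   # may not converge
  m_try <- update(m_max, control = glmerControl(optimizer = "bobyqa"))
  # If warnings persist, fall back to the intercepts-only structure:
  m_int <- glmer(real ~ class + accrec + (1|vsense),
                 family = binomial, data = SwitchDat)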

What we've covered today

I. Mixed effects linear regression
  ● Wall Street Journal corpus data
  ● Hypothetical VC duration data
  ● Interaction terms and model selection
II. Mixed effects logistic regression
  ● English dative alternation

Thank you.
