Summer workshop of the Korean Society of Speech Sciences: July 5, 2018

Basics of mixed effects models in R

Jongho Jun ([email protected]), Seoul National University
Hyesun Cho ([email protected]), Dankook University

Topics

• Mixed effects linear regression
• Mixed effects logistic regression
• Fixed effect
• Random effect
  ✓ Random intercept
  ✓ Random slope
• Model comparison

Roadmap

I. Mixed effects linear regression
  ○ Wall Street Journal corpus data
  ○ Hypothetical VC duration data
  ○ Interaction terms and model selection
II. Mixed effects logistic regression
  ○ English dative alternation

Data for in-class discussion

• vbarg.txt, BresDative.txt: download "7. Syntax" (.zip) from the DOWNLOADS tab at https://www.wiley.com/en-us/Quantitative+Methods+In+Linguistics-p-9781405144247
• vcdur.txt: download from https://goo.gl/N2oDaS
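• A minimal loading sketch (assuming the files are tab-delimited and sit in R's current working directory; adjust the paths if not):

  vbarg      <- read.delim("vbarg.txt")       # WSJ verb-argument data
  vcdur      <- read.delim("vcdur.txt")       # VC duration data
  BresDative <- read.delim("BresDative.txt")  # dative alternation data
  str(vbarg)  # check that columns such as verb, a0, n0 were read in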

References

• Johnson, Keith (2008). Quantitative Methods in Linguistics. Blackwell. Ch. 7.
• Baayen, R. Harald (2008). Analyzing Linguistic Data: A Practical Introduction to Statistics Using R. Cambridge University Press. Ch. 7.
• Muller, Samuel, J. L. Scealy & A. H. Welsh (2013). Model selection in linear mixed models. Statistical Science 28(2), 135-167.

I. Mixed effects linear regression

○ Wall Street Journal corpus data
○ Hypothetical VC duration data
○ Interaction terms and model selection

Wall Street Journal (WSJ) corpus

• Discussed in Johnson (2008: section 7.3)
• Coding:
  (N0 According to the media)(A0 President Menem)(V took)(A1 office)(AM July 8).
  ✓ A0 = agent
  ✓ N0 = earlier material in the sentence
  ✓ A1 = argument
  ✓ …

Research question (WSJ)

Is there any relationship between the number of words in N0 and the number of words in A0?

• (N0 According to the media)(A0 President Menem)(V took)(A1 office)(AM July 8).
• More specifically, the prediction: N0 size → A0 size?

WSJ corpus data: vbarg.txt

• a0, n0: log-transformed sizes of A0 and N0

       verb     A0size N0size a0   n0   …
1      take     2      17     1.09 2.89 …
2      say      4      0      1.60 0    …
3      expect   1      5      0.69 1.79 …
4      sell     4      0      1.60 0    …
5      say      9      0      2.30 0    …
6      increase 3      0      1.38 0    …
…      …        …      …      …    …    …
30590  say      0      11     0    2.48 …

Linear regression model

• Equation: y = b + a*x
  ✓ b: intercept
  ✓ a: slope

Linear regression model (WSJ)

• Equation: y = b + a*x
  Dependent variable: a0; predictor: n0
  ✓ b: intercept
  ✓ a: slope

Linear regression model (WSJ)

• Equation: a0 = b + a*n0
• R function: lm()

Linear regression model (WSJ)

• Equation: a0 = b + a*n0

> summary(lm(a0~n0,data=vbarg))
Call:
lm(formula = a0 ~ n0, data = vbarg)
…
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.14    0.005      196.68  <2e-16 ***
n0          -0.19    0.003      -53.46  <2e-16 ***

• Reading off the coefficients: b (intercept) = 1.14 and a (slope for n0) = -0.19.

Linear regression model (WSJ)

• Linear regression model found:

a0 = 1.14 - 0.19*n0

“As n0 size increases, a0 size decreases.”

Linear regression model (WSJ)

• Incorrect assumption: all verbs have the same average a0 size.
• Different verbs might behave differently.

Linear regression model (WSJ)

• Different verbs might have different average a0 sizes. → Random effect

       verb     A0size N0size a0   n0   …
1      take     2      17     1.09 2.89 …
2      say      4      0      1.60 0    …
3      expect   1      5      0.69 1.79 …
4      sell     4      0      1.60 0    …
5      say      9      0      2.30 0    …
6      increase 3      0      1.38 0    …
…      …        …      …      …    …    …
30590  say      0      11     0    2.48 …

Random effect

• Similar examples in an experiment:
  ✓ individual participants (subject)
  ✓ individual test words (item)
• Items and subjects are randomly sampled from their populations.
• If we repeat the same experiment, different items and subjects will be employed.

Fixed effect

• Usually we are not interested in the random effects themselves, but rather in the fixed effects.
• Fixed factor: the set of possible levels of the factor is fixed, and each of these levels can be repeated.
• Examples
  ✓ N0
  ✓ Treatment factors with two levels: treatment vs. control group

Mixed effects model

• A model containing both fixed and random factors.
• R package: lmerTest
• R function: lmer()
  (Note: Johnson (2008) uses a different package and function.)
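• One-time setup, in case lmerTest is not installed yet (loading lmerTest also loads lme4, which supplies the underlying fitting machinery):

  install.packages("lmerTest")  # once per machine
  library(lmerTest)             # provides lmer() with p-values in summary()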

Mixed effects model

• Random effects:
  ✓ random intercept
  ✓ random slope
• Equation: y = b + a*x

Mixed effects model with random intercept (WSJ)

• Assumption: average A0 size may be verb-specific.
• The model includes a separate intercept for each verb.
• Formula:
  lmer(a0 ~ n0 + (1|verb), data=vbarg)

> library(lmerTest)
> verb.lmer01 <- lmer(a0~n0+(1|verb),data=vbarg)
> summary(verb.lmer01)

Mixed effects model with random intercept (WSJ)

> summary(verb.lmer01)
…
Random effects:
 Groups   Name        Variance Std.Dev.
 verb     (Intercept) 0.1659   0.4074
 Residual             0.3926   0.6266
Number of obs: 30590, groups: verb, 32

Fixed effects:
            Estimate Std.Error df     t value Pr(>|t|)
(Intercept)  0.850   0.07      31      11.77  5.04e-13 ***
n0          -0.102   0.003     30570  -31.46  < 2e-16 ***

Mixed effects model with random intercept (WSJ)

• Model found:

  a0 = 0.850 - 0.102*n0

  There is a strong effect of n0 on a0 even after controlling for the different average size of a0 for different verbs.
  Cf. the previous linear regression model: a0 = 1.14 - 0.19*n0

Mixed effects model with random intercept and random slope (WSJ)

• Assumption: both average A0 size and the A0-N0 size relationship are verb-specific.
• The model includes a verb-specific intercept and slope.
• Formula:
  lmer(a0 ~ n0 + (1+n0|verb), data=vbarg)

> verb.lmer02 <- lmer(a0~n0+(1+n0|verb),data=vbarg)
> summary(verb.lmer02)

Mixed effects model with random intercept and random slope (WSJ)

> summary(verb.lmer02)
…
Random effects:
 Groups   Name        Variance Std.Dev. Corr
 verb     (Intercept) 0.220    0.469
          n0          0.004    0.065    -0.84
 Residual             0.387    0.622
Number of obs: 30590, groups: verb, 32

Fixed effects:
            Estimate Std.Error df     t value Pr(>|t|)
(Intercept)  0.852   0.083     31.05   10.23  1.81e-11 ***
n0          -0.101   0.012     30.75   -8.31  2.27e-09 ***

Mixed effects model with random intercept and random slope (WSJ)

• Model with random intercept and slope:

  a0 = 0.852 - 0.101*n0

  The estimate for n0 is very similar to that of the previous model with only a random intercept, and it is still significant.
  Cf. the model with only a random intercept: a0 = 0.850 - 0.102*n0

Model comparison

• Question: does the model with random slope and intercept fit the data better than the one with only a random intercept?
• Likelihood ratio test (LRT)
  ✓ Compares a smaller (simpler) model against a larger (more complicated) model.
  ✓ It is carried out by the anova() function:
    anova(verb.lmer01, verb.lmer02)

Model comparison (WSJ)

> anova(verb.lmer01, verb.lmer02)
Data: vbarg
Models:
verb.lmer01: a0 ~ n0 + (1 | verb)
verb.lmer02: a0 ~ n0 + (1 + n0 | verb)
            Df AIC   BIC … Chisq  Chi Df Pr(>Chisq)
verb.lmer01 4  58397 …
verb.lmer02 6  58083 …   318.39 2      < 2.2e-16 ***
---

Model comparison

• The likelihood ratio test is used to compare models with and without a factor.
• It measures how much the likelihood of the model improves.
• If the ratio is close to 1.0, the improved fit offered by the added factor is insubstantial, and the factor is considered a non-significant predictor of the criterion variable.
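• A sketch of what anova() computes under the hood, using the logLik values shown on the next slide (anova() refits REML models with ML before the test):

  chisq <- as.numeric(2 * (logLik(verb.lmer02) - logLik(verb.lmer01)))
  # 2 * (-29036 - (-29195)) = 318, matching the Chisq column
  pchisq(chisq, df = 2, lower.tail = FALSE)   # df = 6 - 4 model parameters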

Model comparison (WSJ)

• The test statistic (Chisq) is 318.39, and the associated probability is very small.
• This indicates that adding the random slope significantly improved the fit of the model.

> anova(verb.lmer01, verb.lmer02)
…
            … logLik … Chisq  Chi Df Pr(>Chisq)
verb.lmer01 … -29195 …
verb.lmer02 … -29036 … 318.39 2      < 2.2e-16 ***

Roadmap

I. Mixed effects linear regression
  ● Wall Street Journal corpus data
  ○ Hypothetical VC duration data
  ○ Interaction terms and model selection
II. Mixed effects logistic regression
  ○ English dative alternation

Hypothetical VC duration data

• Duration measurement: the vowel-consonant sequence, e.g. (c)ab, (c)ap, segmented into V and C.
• Suppose there is an inverse relationship between consonant duration (Cdur) and vowel duration (Vdur).

VC duration

• Construct two types of measurement data. Data with:
  (01) subject-specific intercept only
  (02) subject-specific intercept and slope

Data 01: subject-specific intercept

• Equation: y = a + b*x + e (where y = Vdur, x = Cdur, e = error)
  i.e. Vdur = a + b*Cdur + e
• a (intercept) = 300 ms
  ▪ sd of subject-specific intercept = 20 ms
• b (slope) = -1
• x (Cdur) = 100 ms
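• The slides do not show the generating code; the following is a minimal simulation sketch under the parameters above (the sds for Cdur and for the residual error are illustrative assumptions, chosen to roughly match the tables below):

  set.seed(1)                       # arbitrary seed, for reproducibility
  subjN <- 5; wordN <- 7
  a <- 300; b <- -1
  subj_int <- rnorm(subjN, 0, 20)   # subject-specific intercept deviations
  d01 <- expand.grid(subject = 1:subjN, word = 1:wordN)
  d01$Cdur <- rnorm(nrow(d01), 100, 10)   # Cdur around 100 ms
  e <- rnorm(nrow(d01), 0, 5)             # set e <- 0 for the error-free data
  d01$Vdur <- a + subj_int[d01$subject] + b * d01$Cdur + e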

Data 01 with no error term

• Equation: y = a + b*x + e (where e = 0)
• Number of subjects (subjN) = 5
• Number of words (wordN) = 7

subject word Cdur Vdur
1       1    94   232
1       2    92   235
1       3    91   235
1       4    103  224
1       5    92   234
1       6    103  223
1       7    123  203
2       1    100  189
…       …    …    …
5       6    99   206
5       7    111  194

mean (sd)    98.4 (9.8) 198.5 (19.44)

Data 01 w/o error

• Average Vdur by subject:
  subject Vdur
  1       227
  2       192
  3       180
  4       185
  5       207

[Figure slide: Data 01 w/o error]
[Figure slide: Data 01 w/o error, scatter plot]

Data 01 with error term

• Equation: y = a + b*x + e (where e ≠ 0)
• Number of subjects (subjN) = 5
• Number of words (wordN) = 7

subject word Cdur Vdur
1       1    94   233
1       2    92   236
1       3    91   234
1       4    103  221
1       5    92   241
1       6    103  220
1       7    123  197
2       1    100  193
…       …    …    …
5       6    99   212
5       7    111  195

mean (sd)    98.4 (9.8) 199.8 (19.08)

[Figure slide: Data 01 w/ error, scatter plot]

Data 01 w/ error

• Mixed effects model with a separate intercept for each subject (average Vdur may be subject-specific):

> lmer(Vdur ~ Cdur + (1|subject), data=d)

• Random effects:
  Groups   Name        Variance Std.Dev.
  subject  (Intercept) 303.93   17.434
  Residual             24.05    4.904

• Fixed effects:
              Estimate … t value Pr(>|t|)
  (Intercept) 302.964    25.885  <0.001
  Cdur        -1.047     -11.865 <0.001

• Note that the estimates come close to the true parameters used to generate the data (intercept 300, slope -1, subject-intercept sd 20).
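• To inspect the estimated subject-specific intercepts themselves, ranef() and coef() can be used (a sketch; m01 is an assumed name for the saved model object):

  m01 <- lmer(Vdur ~ Cdur + (1|subject), data = d)
  ranef(m01)$subject  # each subject's deviation from the overall intercept
  coef(m01)$subject   # per-subject intercept plus the shared Cdur slope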

Data 02: subject-specific intercept and slope

• Equation: Vdur = a + b*Cdur + e
• a (intercept) = 300 ms
  ▪ sd of subject-specific intercept = 20 ms
• b (slope) = -1
  ▪ sd of subject-specific slope = 0.3
• x (Cdur) = 100 ms
• Number of subjects (subjN) = 5
• Number of words (wordN) = 7
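• Extending the Data 01 simulation sketch, the only change is a subject-specific slope deviation (again, the Cdur and residual sds are illustrative assumptions):

  subjN <- 5; wordN <- 7; a <- 300; b <- -1
  subj_int   <- rnorm(subjN, 0, 20)    # sd of subject intercepts = 20 ms
  subj_slope <- rnorm(subjN, 0, 0.3)   # sd of subject slopes = 0.3
  d02 <- expand.grid(subject = 1:subjN, word = 1:wordN)
  d02$Cdur <- rnorm(nrow(d02), 100, 10)
  d02$Vdur <- a + subj_int[d02$subject] +
    (b + subj_slope[d02$subject]) * d02$Cdur + rnorm(nrow(d02), 0, 5)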

Data 02 w/ subject-specific intercept & slope

subject word Cdur Vdur
1       1    94   224
1       2    92   227
1       3    91   225
1       4    103  211
1       5    92   232
1       6    103  210
1       7    123  185
2       1    100  198
…       …    …    …
5       6    99   196
5       7    111  177

mean (sd)    98.4 (9.8) 195.2 (16.9)

[Figure slide: Data 02, scatter plot]

Data 02

• Mixed effects model with subject-specific intercept & slope:

> lmer(Vdur ~ Cdur + (1+Cdur|subject), data=d)

• Random effects:
  Groups   Name        Variance Std.Dev. Corr
  subject  (Intercept) 708.599  26.62
           Cdur        0.015    0.122    -1.00
  Residual             20.989   4.581

• Fixed effects:
              Estimate … t value Pr(>|t|)
  (Intercept) 305.987    21.186  <0.001
  Cdur        -1.124     -11.331 <0.001

• Note: the estimated sd of the subject-specific slope (0.122) is not even close to the true value of 0.3; with only 5 subjects and 7 words, the data are too sparse to estimate it reliably.

Data 02: larger data

• Equation: y = a + b*x + e
• Number of subjects (subjN) = 30
• Number of words (wordN) = 50

• Random effects:
  Groups   Name        Variance Std.Dev. Corr
  subject  (Intercept) 474.678  21.787
           Cdur        0.089    0.298    -0.19
  Residual             25.064   5.006

• Fixed effects:
              Estimate … t value Pr(>|t|)
  (Intercept) 296.2      70.702  <0.001
  Cdur        -0.908     -16.218 <0.001

• With more subjects and words, the estimated slope sd (0.298) now recovers the true value of 0.3.

Roadmap

I. Mixed effects linear regression
  ● Wall Street Journal corpus data
  ● Hypothetical VC duration data
  ○ Interaction terms and model selection
II. Mixed effects logistic regression
  ○ English dative alternation

Interaction terms and model selection

• When there is more than one fixed effect, we can also consider interaction terms.
• As more terms are added to the model, it becomes necessary to evaluate which model best fits the data:
  vdur ~ 1
  vdur ~ cdur
  vdur ~ cdur + voicing
  vdur ~ cdur + voicing + cdur:voicing

Interaction terms and model selection

• A key aspect of the mixed-effects analysis is often model selection, the choice of a particular model within a class of candidate models (Muller, Scealy & Welsh 2013).

Model selection

• Aim: a parsimonious model with other desirable properties, i.e. as good a fit as possible with a minimum of predictor variables.
• Starting from the saturated model (the full model with all factors and their interactions), eliminate non-significant predictors by applying the LRT.
• This is backward elimination for a linear mixed model.

Functions for model selection

• update(): removes a factor from a model
• anova(): likelihood ratio test
• step(): performs model selection automatically (gives the results, i.e. the best model, all at once)

Data: vcdur.txt

• English speakers' production of monosyllabic English words differing in coda voicing (bit, bid, beat, bead, …).
• Speaker regions: NZ (3), UK (2), US (2)
• Dependent variable: vdur

Specifying interaction terms

• a*b = a + b + a:b
  vdur ~ cdur*voicing
  = vdur ~ cdur + voicing + cdur:voicing
  vdur ~ cdur*voicing*height
  = vdur ~ cdur + voicing + height + cdur:voicing + voicing:height + cdur:height + cdur:voicing:height

Saturated model

• Model with all factors and their interactions:

  mm1 <- lmer(vdur ~ cdur*voicing*height + (1|speaker) + (1|item),
              data=d, REML=FALSE)

  ☞ REML=FALSE, i.e. ML: use this for model selection.
  ☞ Use REML=TRUE for parameter estimation.

• Starting with this saturated model, eliminate factors and interactions that do not significantly contribute to the goodness of model fit.
• The contribution is tested by the LRT (p < 0.05).

Saturated model

• Model with all interaction terms: summary(mm1)

Fixed effects:
                                     Estimate   Std. Error df        t value Pr(>|t|)
(Intercept)                          216.04612  20.24309    41.60000 10.673  1.76e-13 ***
cdur                                  -0.15703   0.07356   480.90000 -2.135  0.03328 *
voicingVOICELESS                     -24.87089  25.79788    38.70000 -0.964  0.34100
heightNONHIGH                         12.46191  24.78135    33.00000  0.503  0.61839
cdur:voicingVOICELESS                 -0.23325   0.08912   474.40000 -2.617  0.00915 **
cdur:heightNONHIGH                     0.18897   0.09312   474.90000  2.029  0.04299 *
voicingVOICELESS:heightNONHIGH         8.80019  36.61632    39.20000  0.240  0.81132
cdur:voicingVOICELESS:heightNONHIGH   -0.17443   0.13220   473.60000 -1.319  0.18765
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

We will eliminate this non-significant three-way interaction.

Eliminating a factor: use update()

• Syntax: update(x, ~.-factor)
• Delete the non-significant three-way interaction term with update():
  mm2 <- update(mm1, ~.-cdur:voicing:height)
• Output model specification:

  Linear mixed model fit by maximum likelihood. t-tests use Satterthwaite's method [lmerModLmerTest]
  Formula: vdur ~ cdur + voicing + height + (1 | speaker) + (1 | item) + cdur:voicing + cdur:height + voicing:height

Likelihood ratio test

> anova(mm1,mm2)
Data: d
Models:
mm2: vdur ~ cdur + voicing + height + (1 | speaker) + (1 | item) +
mm2:     cdur:voicing + cdur:height + voicing:height
mm1: vdur ~ cdur * voicing * height + (1 | speaker) + (1 | item)
    Df AIC    BIC    logLik  deviance Chisq  Chi Df Pr(>Chisq)
mm2 10 4996.9 5039.1 -2488.5 4976.9
mm1 11 4997.2 5043.6 -2487.6 4975.2   1.7379 1      0.1874

• The saturated model (mm1) is not significantly better than the reduced model (mm2), so we can discard mm1.

Reporting the result

> anova(mm1,mm2)
(output as on the previous slide)

"The model with the three-way interaction was not significantly better than the model without the interaction (χ²(1) = 1.74, p = .19)."

• Report the chi-square statistic as χ²(df) = Chisq value, p = p-value.

Current model: summary(mm2)

Fixed effects:
                               Estimate   Std. Error df        t value Pr(>|t|)
(Intercept)                    212.42246  20.06869    40.10000 10.585  3.55e-13 ***
cdur                            -0.12437   0.06938   478.10000 -1.793  0.0737 .
voicingVOICELESS               -14.43787  24.57580    31.90000 -0.587  0.5610
heightNONHIGH                   21.47215  23.83963    28.30000  0.901  0.3754
cdur:voicingVOICELESS           -0.31050   0.06731   474.70000 -4.613  5.11e-06 ***
cdur:heightNONHIGH               0.10303   0.06668   474.20000  1.545  0.1230
voicingVOICELESS:heightNONHIGH -13.76816  32.39704    24.10000 -0.425  0.6746
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Among the interactions, eliminate the one with the higher p-value (here voicing:height).

Elimination (2nd)

> mm3 <- update(mm2, ~.-voicing:height)
> anova(mm2,mm3)
Data: d
Models:
mm3: vdur ~ cdur + voicing + height + (1 | speaker) + (1 | item) +
mm3:     cdur:voicing + cdur:height
mm2: vdur ~ cdur + voicing + height + (1 | speaker) + (1 | item) +
mm2:     cdur:voicing + cdur:height + voicing:height
    Df AIC    BIC    logLik  deviance Chisq Chi Df Pr(>Chisq)
mm3 9  4995.1 5033.1 -2488.6 4977.1
mm2 10 4996.9 5039.1 -2488.5 4976.9   0.18  1      0.6714

• The model with more parameters (mm2) is not better than mm3, so we can discard mm2 and adopt the simpler model mm3.
• Reporting: "The interaction of voicing and vowel height did not significantly improve the model fit (χ²(1) = 0.18, p = 0.67)."

Summary of the current model (mm3)

Fixed effects:
                      Estimate   Std. Error df        t value Pr(>|t|)
(Intercept)           215.77739  18.48194    41.70000 11.675  1.02e-14 ***
cdur                   -0.12390   0.06938   478.10000 -1.786  0.0748 .
voicingVOICELESS      -21.31848  18.52951    40.40000 -1.151  0.2567
heightNONHIGH          14.94762  18.29788    38.30000  0.817  0.4190
cdur:voicingVOICELESS  -0.31019   0.06730   474.90000 -4.609  5.21e-06 ***
cdur:heightNONHIGH      0.10022   0.06634   482.60000  1.511  0.1315
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Now remove cdur:height.

• Repeat the same procedure until deleting a term yields a significant change in the goodness of fit (p < 0.05); then compare all the models at once:

> anova(mm1,mm2,mm3,mm4,mm5,mm6,mm7,mm8)
Data: d
Models:
mm8: vdur ~ cdur + (1 | speaker) + (1 | item)
mm6: vdur ~ cdur + (1 | speaker) + (1 | item) + cdur:voicing
mm7: vdur ~ (1 | speaker) + (1 | item) + cdur:voicing
mm5: vdur ~ cdur + voicing + (1 | speaker) + (1 | item) + cdur:voicing
mm4: vdur ~ cdur + voicing + height + (1 | speaker) + (1 | item) +
mm4:     cdur:voicing
mm3: vdur ~ cdur + voicing + height + (1 | speaker) + (1 | item) +
mm3:     cdur:voicing + cdur:height
mm2: vdur ~ cdur + voicing + height + (1 | speaker) + (1 | item) +
mm2:     cdur:voicing + cdur:height + voicing:height
mm1: vdur ~ cdur * voicing * height + (1 | speaker) + (1 | item)
    Df AIC    BIC    logLik  deviance Chisq   Chi Df Pr(>Chisq)
mm8 5  5022.7 5043.8 -2506.3 5012.7
mm6 6  4995.4 5020.7 -2491.7 4983.4   29.2976 1      6.207e-08 ***
mm7 6  4995.4 5020.7 -2491.7 4983.4    0.0000 0      1.00000
mm5 7  4996.2 5025.7 -2491.1 4982.2    1.1459 1      0.28442
mm4 8  4995.4 5029.1 -2489.7 4979.4    2.8169 1      0.09328 .
mm3 9  4995.1 5033.1 -2488.6 4977.1    2.2750 1      0.13148
mm2 10 4996.9 5039.1 -2488.5 4976.9    0.1800 1      0.67140
mm1 11 4997.2 5043.6 -2487.6 4975.2    1.7379 1      0.18741
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

☞ mm6: best model.
☞ mm5: best model that keeps all the main effects in the interactions.

Eliminating random effects

• From the best model so far (mm5), try removing the random effects:
  mm9  <- update(mm5, ~.-(1|speaker), REML=TRUE)
  mm10 <- update(mm5, ~.-(1|item), REML=TRUE)
  anova(mm5,mm9)
  anova(mm5,mm10)
• The current model is significantly better than the model without the by-speaker random intercept (χ²(1) = 201.22, p < 0.0001) and the one without the by-item random intercept (χ²(1) = 401.54, p < 0.0001).

Refitting the best model with REML

• After the best model is decided, refit the model with REML to obtain the best estimates of the parameter values.
• Remove REML=FALSE (lmer()'s default is REML=TRUE):

  mm5 <- lmer(vdur ~ cdur + voicing + cdur:voicing + (1|speaker) + (1|item), data=d)

The best model: summary & interpretation

Fixed effects:
                      Estimate   Std. Error df        t value Pr(>|t|)
(Intercept)           224.61615  17.24766    33.40000 13.023  1.25e-14 ***
cdur                   -0.08896   0.06580   468.50000 -1.352  0.177
voicingVOICELESS      -20.87662  19.86242    34.90000 -1.051  0.300
cdur:voicingVOICELESS  -0.30664   0.06758   472.70000 -4.537  7.23e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

• Interpretation: if the consonant is voiceless, vowel duration changes by an additional -0.31 ms for each 1 ms increase in consonant duration (t(473) = -4.54, p < .0001).

Backward elimination using step()

• The function step() performs backward elimination of non-significant effects of a linear mixed effects model.
• The step() function of package lmerTest overrides the one for lm objects (which does forward selection based on AIC).
• Applying it to our data:

> step(mm1)
Backward reduced random-effect table:

              Eliminated npar logLik  AIC    LRT    Df Pr(>Chisq)
<none>                   11   -2487.6 4997.2
(1 | speaker) 0          10   -2588.7 5197.3 202.09 1  < 2.2e-16 ***
(1 | item)    0          10   -2673.5 5367.0 371.78 1  < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Fixed effects:

Backward reduced fixed-effect table:
Degrees of freedom method: Satterthwaite

                    Eliminated  Sum Sq  Mean Sq NumDF  DenDF F value Pr(>F)
cdur:voicing:height 1           1654.5  1654.5  1     473.56  1.7410 0.18765
voicing:height      2            172.3   172.3  1      24.10  0.1806 0.67462
cdur:height         3           2176.4  2176.4  1     482.55  2.2819 0.13155
height              4           2866.7  2866.7  1      23.61  2.9903 0.09682 .
cdur:voicing        0          19776.1 19776.1  1     474.75 20.6284 7.079e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Selected model:

Model found: vdur ~ cdur + voicing + (1 | speaker) + (1 | item) + cdur:voicing
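• The model selected by step() can be extracted directly with lmerTest's get_model():

  s <- step(mm1)          # backward elimination, as above
  final <- get_model(s)   # the selected model object
  summary(final)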

Roadmap

I. Mixed effects linear regression
  ● Wall Street Journal corpus data
  ● Hypothetical VC duration data
  ● Interaction terms and model selection
II. Mixed effects logistic regression
  ○ English dative alternation

II. Mixed effects logistic regression

○ English dative alternation

English dative alternation

• Discussed in Johnson (2008: section 7.4), citing Bresnan et al. (2007)
• Two alternative ways:
  dative PP: I pushed the box to John.
  dative NP: I pushed John the box.

English dative alternation

• Preference (A > B: A is preferred to B); left column: dative PP, right column: dative NP
  I pushed the box to John.  >  I pushed John the box.
  That movie gave the creeps to me.  <  That movie gave me the creeps.
  This will give the creeps to just about anyone.  >  This will give just about anyone the creeps.

English dative alternation

• Factors for the preference: recipient = pronoun or not? short or long? given or new? …

Data: BresDative.txt

• Two corpora:
  ▪ the Switchboard corpus of conversational speech
  ▪ the Wall Street Journal corpus of text
• Coded for:
  ▪ the realization of the dative (PP, NP)
  ▪ the discourse accessibility, definiteness, animacy, and pronominality of the recipient and theme
  ▪ the semantic class of the verb
    ✓ abstract, transfer, future transfer, prevention of possession, and communication
  ▪ a measure of the difference between the (log) length of the recipient and the (log) length of the theme

Data: BresDative.txt

      real verb  vsense  class animrec …
1     NP   feed  feed.t  t     animate …
2     NP   give  give.a  a     animate …
3     NP   give  give.a  a     animate …
4     NP   give  give.a  a     animate …
5     NP   offer offer.c c     animate …
6     NP   give  give.a  a     animate …
…     …    …     …       …     …       …
3265  NP   give  give.a  a     animate …
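• The glm()/glmer() calls below use a data frame SwitchDat that the slides do not construct; a plausible sketch, assuming BresDative.txt has a column (here called corpus) marking the source corpus (both the column name and its level are hypothetical; check names(BresDative) in the real data):

  SwitchDat <- subset(BresDative, corpus == "switchboard")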

Data: BresDative.txt

• Question: can we predict the realization of the dative?

  class, accessibility, definiteness, … → realization of dative

Data: BresDative.txt

• Logistic regression; R function: glm()

Logistic regression

• Regression for count/proportion data.
• We cannot apply standard linear regression to proportions:
  ✓ Proportions are bounded between 0 and 1, but lm() does not know this.
  ✓ Other problems.
  ⇒ Logit transformation

Logistic regression

• Logit transformation:
  i.   p: proportion, range 0 to 1
  ii.  odds = p/(1-p), range 0 to ∞
  iii. log odds, i.e. logit = log(p/(1-p)), range -∞ to +∞
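• A quick R illustration (qlogis() and plogis() are base R's logit and inverse-logit functions):

  p <- c(0.1, 0.5, 0.9)
  odds <- p / (1 - p)    # 0.111  1.000  9.000
  log(odds)              # -2.197 0.000  2.197
  qlogis(p)              # same values: qlogis() is the logit
  plogis(qlogis(p))      # back to p: the inverse logit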

Data: BresDative.txt

• Logistic regression model of the dative alternation:
  ✓ glm(real ~ class + accrec + …, family=binomial, data=SwitchDat)
  "PP realization as a function of class, …"

Data: BresDative.txt

> summary(glm(real ~ class+accrec+accth+prorec+proth+defrec+defth+animrec+ldiff,
              family=binomial, data=SwitchDat))
...
Coefficients:
               Estimate Std. Error z value Pr(>|z|)
(Intercept)     0.3498  0.3554      0.984  0.32503
classc         -1.3516  0.3141     -4.303  1.68e-05 ***
classf          0.5138  0.4922      1.044  0.29651
classp         -3.4277  1.2504     -2.741  0.00612 **
classt          1.1571  0.2055      5.631  1.80e-08 ***
accrecnotgiven  1.1282  0.2681      4.208  2.57e-05 ***
...

Data: BresDative.txt

• accrecnotgiven = 1.1282 (p < .0001): when the recipient is not given earlier in the discourse, PP realization is more likely.

Data: BresDative.txt

• A fixed effects logistic regression model (the left-hand side is the log odds of PP realization):

  real = 0.34 - 1.35*classc + 0.51*classf - 3.42*classp + 1.15*classt + 1.12*accrecnotgiven + …

• Incorrect assumption: all verbs have the same preference in the dative alternation.

Data: BresDative.txt

• Possibility: verbs may differ in their preference in the dative alternation.
• Verbs may also have more than one sense, e.g. pay:
  ▪ Transfer: to pay him some money
  ▪ More abstract: to pay attention to the clock
• Possibility (revised): verb senses may differ in their preference in the dative alternation.
• Add a random factor for verb sense.

Mixed effects logistic regression model

• A logistic regression model containing both fixed and random factors.
• R package: lme4 (loaded along with lmerTest)
• R function: glmer()

A model with random intercept

• Assumption: the average preference for PP realization may be specific to each verb sense.
• The model includes a separate intercept for each verb sense.
• Formula for "PP realization as a function of class, etc.":

  glmer(real ~ class + accrec + accth + prorec + proth + defrec + defth + animrec + ldiff + (1|vsense), family=binomial, data=SwitchDat)

Mixed effects logistic model w/ random intercept

> summary(glmer(real ~ class ... (1|vsense), ...))
...
Random effects:
 Groups Name        Variance Std.Dev.
 vsense (Intercept) 4.976    2.23

Fixed effects:
               Estimate Std. Error z value Pr(>|z|)
(Intercept)     1.3943  0.8069      1.728  0.083984 .
classc         -1.3882  1.1481     -1.209  0.226614
classf         -0.2491  1.3619     -0.183  0.854862
classp         -4.9325  2.1697     -2.273  0.023007 *
classt          0.9035  0.9647      0.937  0.348957
accrecnotgiven  1.6421  0.3417      4.805  1.55e-06 ***
...

Mixed effects logistic model w/ random intercept

• Compare the outputs of the mixed effects model vs. the model with no random factor:

                glmer(... (1|vsense) ...)   glm(...)
                Estimate  Pr(>|z|)          Estimate  Pr(>|z|)
(Intercept)      1.3943   0.083984 .         0.3498   0.32503
classc          -1.3882   0.226614          -1.3516   1.68e-05 ***
classf          -0.2491   0.854862           0.5138   0.29651
classp          -4.9325   0.023007 *        -3.4277   0.00612 **
classt           0.9035   0.348957           1.1571   1.80e-08 ***
accrecnotgiven   1.6421   1.55e-06 ***       1.1282   2.57e-05 ***
…

Random intercepts and slopes

• Many maximal random effects models (e.g. with both random intercepts and random slopes) fail to converge (i.e. the fitting algorithm cannot find a solution).
• In that case, you will have to use a simpler model (e.g. intercepts-only).
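• A sketch of one common fallback path when a maximal model fails to converge (variable names follow the dative example above; trying another optimizer via glmerControl() is one option before simplifying):

  m_max <- glmer(real ~ class + accrec + (1 + accrec|vsense),
                 family = binomial, data = SwitchDat)   # may not converge
  m_try <- update(m_max, control = glmerControl(optimizer = "bobyqa"))
  # If warnings persist, fall back to the intercepts-only structure:
  m_int <- glmer(real ~ class + accrec + (1|vsense),
                 family = binomial, data = SwitchDat)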

What we've covered today

I. Mixed effects linear regression
  ● Wall Street Journal corpus data
  ● Hypothetical VC duration data
  ● Interaction terms and model selection
II. Mixed effects logistic regression
  ● English dative alternation

Thank you.
