Statistical Analysis of Corpus Data with R: A short introduction to regression and linear models
Designed by Marco Baroni (1) and Stefan Evert (2)
(1) Center for Mind/Brain Sciences (CIMeC), University of Trento
(2) Institute of Cognitive Science (IKW), University of Osnabrück
Baroni & Evert (Trento/Osnabrück) SIGIL: Linear Models

Outline
1. Regression
   - Simple linear regression
   - General linear regression
2. Linear statistical models
   - A statistical model of linear regression
   - Statistical inference
3. Generalised linear models
Regression: Simple linear regression

Linear regression

Can random variable Y be predicted from random variable X?
- focus on the linear relationship between the variables

Linear predictor: Y ≈ β0 + β1 · X
- β0 = intercept of the regression line
- β1 = slope of the regression line

Least-squares regression minimizes the prediction error

    Q = Σ_{i=1}^{n} [ y_i − (β0 + β1·x_i) ]²

for data points (x_1, y_1), ..., (x_n, y_n).
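To make the least-squares criterion concrete, the sketch below computes the prediction error Q for two candidate lines on a small made-up data set. (The course works in R, where `lm()` does this fitting; this standalone sketch uses plain Python, and all data values are invented for illustration.)

```python
def prediction_error(xs, ys, b0, b1):
    """Q = sum of squared residuals for the candidate line y = b0 + b1*x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# toy data, roughly linear (invented for illustration)
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]

# Q for the least-squares line of this data (b0 = 0.14, b1 = 1.96)
# is smaller than Q for any other line, e.g. a nearby round-number line:
q_ls = prediction_error(xs, ys, 0.14, 1.96)   # -> 0.092
q_other = prediction_error(xs, ys, 0.0, 2.0)  # -> 0.11
```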
Simple linear regression

Coefficients of the least-squares line:

    β̂1 = ( Σ_{i=1}^{n} x_i·y_i − n·x̄_n·ȳ_n ) / ( Σ_{i=1}^{n} x_i² − n·x̄_n² )
    β̂0 = ȳ_n − β̂1·x̄_n

Mathematical derivation of the regression coefficients:
- the minimum of Q(β0, β1) satisfies ∂Q/∂β0 = ∂Q/∂β1 = 0
- this leads to the normal equations (a system of 2 linear equations):

    −2 Σ_{i=1}^{n} [ y_i − (β0 + β1·x_i) ] = 0      ⇔   β0·n + β1·Σ x_i = Σ y_i
    −2 Σ_{i=1}^{n} x_i·[ y_i − (β0 + β1·x_i) ] = 0  ⇔   β0·Σ x_i + β1·Σ x_i² = Σ x_i·y_i

- the regression coefficients β̂0, β̂1 are the unique solution of this system
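The closed-form coefficients can be checked against the normal equations numerically. A minimal pure-Python sketch (toy data invented for illustration; in R this is `coef(lm(y ~ x))`):

```python
def simple_regression(xs, ys):
    """Closed-form least-squares coefficients (b0, b1) for y = b0 + b1*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b1 = (sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar) / \
         (sum(x * x for x in xs) - n * x_bar ** 2)
    b0 = y_bar - b1 * x_bar
    return b0, b1

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]
b0, b1 = simple_regression(xs, ys)   # -> (0.14, 1.96)

# first normal equation:  b0*n + b1*sum(x)    == sum(y)
lhs1 = b0 * len(xs) + b1 * sum(xs)
# second normal equation: b0*sum(x) + b1*sum(x^2) == sum(x*y)
lhs2 = b0 * sum(xs) + b1 * sum(x * x for x in xs)
```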
The Pearson correlation coefficient

Measuring the "goodness of fit" of the linear prediction:
- variation among the observed values of Y = sum of squares S²_y
- closely related to the (sample estimate of the) variance of Y:

    S²_y = Σ_{i=1}^{n} (y_i − ȳ_n)²

- residual variation w.r.t. the linear prediction: S²_resid = Q

Pearson correlation = amount of variation "explained" by X:

    R² = 1 − S²_resid / S²_y = 1 − [ Σ_{i=1}^{n} (y_i − β̂0 − β̂1·x_i)² ] / [ Σ_{i=1}^{n} (y_i − ȳ_n)² ]

Correlation vs. slope of the regression line:

    R² = β̂1(y ∼ x) · β̂1(x ∼ y)
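Both characterisations of R² can be verified on a small example. The sketch below (pure Python, invented toy data) computes R² from the residual and total sums of squares and checks the identity with the two regression slopes:

```python
def slope(xs, ys):
    """Least-squares slope of the regression ys ~ xs."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    return (sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar) / \
           (sum(x * x for x in xs) - n * x_bar ** 2)

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]

n = len(xs)
b1 = slope(xs, ys)
b0 = sum(ys) / n - b1 * sum(xs) / n

# R^2 via sums of squares: 1 - S_resid^2 / S_y^2
y_bar = sum(ys) / n
s_resid = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))
s_y = sum((y - y_bar) ** 2 for y in ys)
r_squared = 1 - s_resid / s_y

# identity from the slide: R^2 = slope(y ~ x) * slope(x ~ y)
r_squared_alt = slope(xs, ys) * slope(ys, xs)
```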
Regression: General linear regression

Multiple linear regression

Linear regression with multiple predictor variables:

    Y ≈ β0 + β1·X1 + ··· + βk·Xk

minimises the prediction error

    Q = Σ_{i=1}^{n} [ y_i − (β0 + β1·x_i1 + ··· + βk·x_ik) ]²

for data points (x_11, ..., x_1k, y_1), ..., (x_n1, ..., x_nk, y_n).

Multiple linear regression fits a k-dimensional hyperplane instead of a regression line.
Multiple linear regression: The design matrix

Matrix notation of the linear regression problem:

    y ≈ Z·β

"Design matrix" Z of the regression data:

    Z = [ 1  x_11  x_12  ···  x_1k
          1  x_21  x_22  ···  x_2k
          ⋮   ⋮     ⋮          ⋮
          1  x_n1  x_n2  ···  x_nk ]

    y′ = ( y_1  y_2  ···  y_n )
    β′ = ( β0  β1  β2  ···  βk )

A′ denotes the transpose of a matrix; y and β are column vectors.
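The leading column of 1s is what carries the intercept β0. A minimal sketch of building Z from rows of predictor values (pure Python; in R, `model.matrix()` constructs design matrices from formulas):

```python
def design_matrix(rows):
    """Prepend an intercept column of 1s to each row of predictor values."""
    return [[1.0] + list(row) for row in rows]

# n = 3 data points, k = 2 predictors (values invented for illustration)
rows = [(1.0, 2.0), (3.0, 5.0), (4.0, 7.0)]
Z = design_matrix(rows)
# Z == [[1.0, 1.0, 2.0], [1.0, 3.0, 5.0], [1.0, 4.0, 7.0]]
```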
General linear regression

Matrix notation of the linear regression problem:

    y ≈ Z·β

Residual error:

    Q = (y − Z·β)′ (y − Z·β)

System of normal equations satisfying ∇_β Q = 0:

    Z′Z·β = Z′y

This leads to the regression coefficients

    β̂ = (Z′Z)⁻¹ Z′y
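The normal equations can be solved directly for a small example. The sketch below (pure Python, invented toy data where y depends exactly linearly on the two predictors) forms Z′Z and Z′y and solves the system by Gaussian elimination; in practice one would use R's `lm()` or a numerical library rather than solving by hand:

```python
def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b_i] for row, b_i in zip(A, b)]   # augmented matrix
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(M[r][i]))
        M[i], M[p] = M[p], M[i]
        for r in range(i + 1, n):
            f = M[r][i] / M[i][i]
            for c in range(i, n + 1):
                M[r][c] -= f * M[i][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][c] * x[c] for c in range(i + 1, n))) / M[i][i]
    return x

# toy data: y = 1 + 2*x1 + 3*x2 exactly, so the fit recovers the coefficients
Z = [[1.0, 1.0, 1.0],
     [1.0, 2.0, 1.0],
     [1.0, 3.0, 2.0],
     [1.0, 4.0, 5.0]]
y = [6.0, 8.0, 13.0, 24.0]

Zt = transpose(Z)
ZtZ = matmul(Zt, Z)
Zty = [sum(z * y_i for z, y_i in zip(row, y)) for row in Zt]
beta_hat = solve(ZtZ, Zty)   # expect approximately [1.0, 2.0, 3.0]
```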
General linear regression

Predictor variables can also be functions of the observed variables
⇒ the regression only has to be linear in the coefficients β.

E.g. polynomial regression with design matrix

    Z = [ 1  x_1  x_1²  ···  x_1^k
          1  x_2  x_2²  ···  x_2^k
          ⋮   ⋮    ⋮          ⋮
          1  x_n  x_n²  ···  x_n^k ]

corresponding to the regression model

    Y ≈ β0 + β1·X + β2·X² + ··· + βk·X^k
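Polynomial regression thus reduces to linear regression once the design matrix of powers is built. A minimal sketch (pure Python; in R, `poly()` inside an `lm()` formula serves the same purpose):

```python
def poly_design_matrix(xs, k):
    """Row i is (1, x_i, x_i^2, ..., x_i^k)."""
    return [[x ** j for j in range(k + 1)] for x in xs]

# two data points, cubic model (values invented for illustration)
Z = poly_design_matrix([2.0, 3.0], k=3)
# Z == [[1.0, 2.0, 4.0, 8.0], [1.0, 3.0, 9.0, 27.0]]
```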
Mathematical notation: