
Moving Beyond Linearity

The truth is never linear! Or almost never! But often the linearity assumption is good enough. When it's not,

• polynomials,
• step functions,
• splines,
• local regression, and
• generalized additive models

offer a lot of flexibility, without losing the ease and interpretability of linear models.

Polynomial Regression

  y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \beta_3 x_i^3 + \cdots + \beta_d x_i^d + \epsilon_i

[Figure: Degree-4 polynomial fit to the Wage data. Left panel: Wage vs. Age with the fitted curve and ±2·SE bands; right panel: Pr(Wage > 250 | Age) over the same Age range.]


Details

• Create new variables X_1 = X, X_2 = X^2, etc., and then treat as multiple linear regression.
• Not really interested in the coefficients; more interested in the fitted function values at any value x_0:

  \hat{f}(x_0) = \hat\beta_0 + \hat\beta_1 x_0 + \hat\beta_2 x_0^2 + \hat\beta_3 x_0^3 + \hat\beta_4 x_0^4.

• Since \hat{f}(x_0) is a linear function of the \hat\beta_\ell, we can get a simple expression for the pointwise variance Var[\hat{f}(x_0)] at any value x_0. In the figure we have computed the fit and pointwise standard errors on a grid of values for x_0. We show \hat{f}(x_0) ± 2·se[\hat{f}(x_0)] (see the code sketch below).
• We either fix the degree d at some reasonably low value, or else use cross-validation to choose d.
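A concrete sketch of this slide, assuming the Wage data from the ISLR package (columns age and wage); the grid is illustrative:

  library(ISLR)   # assumed source of the Wage data

  # Degree-4 polynomial of age; poly() constructs the basis for us
  fit <- lm(wage ~ poly(age, 4), data = Wage)

  # Fit and pointwise standard errors on a grid of values for x0
  age.grid <- seq(from = min(Wage$age), to = max(Wage$age))
  pred <- predict(fit, newdata = list(age = age.grid), se.fit = TRUE)

  # f-hat(x0) plus/minus 2 * se[f-hat(x0)]
  bands <- cbind(pred$fit - 2 * pred$se.fit, pred$fit + 2 * pred$se.fit)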


Details continued

• Logistic regression follows naturally. For example, in the figure we model

  \Pr(y_i > 250 \mid x_i) = \frac{\exp(\beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \cdots + \beta_d x_i^d)}{1 + \exp(\beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \cdots + \beta_d x_i^d)}.

• To get confidence intervals, compute upper and lower bounds on the logit scale, and then invert to get them on the probability scale (see the sketch below).
• Can do this separately on several variables — just stack the variables into one matrix, and separate out the pieces afterwards (see GAMs later).
• Caveat: polynomials have notorious tail behavior — very bad for extrapolation.
• Can fit using y ~ poly(x, degree = 3) in a formula.
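A minimal sketch of the logistic version (one way to fit it, again assuming the ISLR Wage data): compute bounds on the logit scale from predict(), then invert with the logistic function.

  # Logistic polynomial regression for Pr(wage > 250 | age)
  fit <- glm(I(wage > 250) ~ poly(age, 4), data = Wage, family = binomial)

  age.grid <- seq(from = min(Wage$age), to = max(Wage$age))
  pred <- predict(fit, newdata = list(age = age.grid), se.fit = TRUE)  # logit scale

  # Upper and lower bounds on the logit scale, inverted to the probability scale
  logit.bands <- cbind(pred$fit - 2 * pred$se.fit, pred$fit + 2 * pred$se.fit)
  prob.fit    <- plogis(pred$fit)      # exp(eta) / (1 + exp(eta))
  prob.bands  <- plogis(logit.bands)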

Step Functions

Another way of creating transformations of a variable — cut the variable into distinct regions:

  C_1(X) = I(X < 35), \quad C_2(X) = I(35 \le X < 50), \quad \ldots, \quad C_3(X) = I(X \ge 65)

[Figure: Piecewise-constant fit to the Wage data. Left panel: Wage vs. Age; right panel: Pr(Wage > 250 | Age).]


Step functions continued

• Easy to work with. Creates a series of dummy variables representing each group.
• Useful way of creating interactions that are easy to interpret. For example, an interaction effect of Year and Age,

  I(Year < 2005) · Age, \quad I(Year ≥ 2005) · Age,

  would allow for different linear functions in each age category.
• In R: I(year < 2005) or cut(age, c(18, 25, 40, 65, 90)) (see the sketch below).
• Choice of cutpoints or knots can be problematic. For creating nonlinearities, smoother alternatives such as splines are available.
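A sketch of both ideas on the assumed ISLR Wage data; the cutpoints are the slide's, and the interaction uses one common R parameterization (main effects plus the interaction):

  # Step function of age: cut() creates a factor, lm() expands it to dummies
  fit.step <- lm(wage ~ cut(age, c(18, 25, 40, 65, 90)), data = Wage)

  # Different linear functions of age before and after 2005
  fit.int <- lm(wage ~ I(year < 2005) * age, data = Wage)
  coef(fit.int)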

Piecewise Polynomials

• Instead of a single polynomial in X over its whole domain, we can rather use different polynomials in regions defined by knots. E.g. (see figure)

  y_i = \begin{cases} \beta_{01} + \beta_{11} x_i + \beta_{21} x_i^2 + \beta_{31} x_i^3 + \epsilon_i & \text{if } x_i < c; \\ \beta_{02} + \beta_{12} x_i + \beta_{22} x_i^2 + \beta_{32} x_i^3 + \epsilon_i & \text{if } x_i \ge c. \end{cases}

• Better to add constraints to the polynomials, e.g. continuity.
• Splines have the “maximum” amount of continuity.

[Figure: Four fits of Wage on Age: piecewise cubic, continuous piecewise cubic, cubic spline, and linear spline.]


Linear Splines

A linear spline with knots at ξ_k, k = 1, \ldots, K, is a piecewise linear polynomial continuous at each knot. We can represent this model as

  y_i = \beta_0 + \beta_1 b_1(x_i) + \beta_2 b_2(x_i) + \cdots + \beta_{K+1} b_{K+1}(x_i) + \epsilon_i,

where the b_k are basis functions:

  b_1(x_i) = x_i
  b_{k+1}(x_i) = (x_i - \xi_k)_+, \quad k = 1, \ldots, K.

Here ( \cdot )_+ means positive part; i.e.,

  (x_i - \xi_k)_+ = \begin{cases} x_i - \xi_k & \text{if } x_i > \xi_k \\ 0 & \text{otherwise.} \end{cases}
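The truncated power basis is simple to build by hand. A minimal sketch (ISLR Wage data assumed; the knot locations are purely illustrative):

  # Positive-part function: (x - knot)_+
  pos <- function(x, knot) pmax(0, x - knot)

  knots <- c(25, 40, 60)            # illustrative knots for age
  X <- cbind(b1 = Wage$age,         # b_1(x) = x
             b2 = pos(Wage$age, knots[1]),
             b3 = pos(Wage$age, knots[2]),
             b4 = pos(Wage$age, knots[3]))

  fit <- lm(Wage$wage ~ X)          # K + 1 = 4 basis functions plus an intercept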

[Figure: Top panel: a fitted linear spline f(x) on x ∈ [0, 1]; bottom panel: one truncated linear basis function b(x).]

Cubic Splines

A cubic spline with knots at ξ_k, k = 1, \ldots, K, is a piecewise cubic polynomial with continuous derivatives up to order 2 at each knot. Again we can represent this model with truncated power basis functions,

  y_i = \beta_0 + \beta_1 b_1(x_i) + \beta_2 b_2(x_i) + \cdots + \beta_{K+3} b_{K+3}(x_i) + \epsilon_i,

  b_1(x_i) = x_i
  b_2(x_i) = x_i^2
  b_3(x_i) = x_i^3
  b_{k+3}(x_i) = (x_i - \xi_k)_+^3, \quad k = 1, \ldots, K,

where

  (x_i - \xi_k)_+^3 = \begin{cases} (x_i - \xi_k)^3 & \text{if } x_i > \xi_k \\ 0 & \text{otherwise.} \end{cases}

[Figure: Top panel: a fitted cubic spline f(x) on x ∈ [0, 1]; bottom panel: one truncated cubic basis function b(x).]

Natural Cubic Splines

A natural cubic spline extrapolates linearly beyond the boundary knots. This adds 4 = 2 × 2 extra constraints, and allows us to put more internal knots for the same degrees of freedom as a regular cubic spline.

[Figure: Natural cubic spline vs. cubic spline fits of Wage on Age.]

Fitting splines in R is easy: bs(x, ...) for splines of any degree, and ns(x, ...) for natural cubic splines, in package splines.
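A short sketch of both functions (ISLR Wage data assumed; the knot locations and df are illustrative):

  library(splines)

  # Cubic spline with K = 3 interior knots: K + 4 = 7 parameters incl. intercept
  fit.bs <- lm(wage ~ bs(age, knots = c(25, 40, 60)), data = Wage)

  # Natural cubic spline; df places the knots at quantiles of age
  fit.ns <- lm(wage ~ ns(age, df = 4), data = Wage)
  pred <- predict(fit.ns, newdata = list(age = 20:80), se.fit = TRUE)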

[Figure: Natural cubic spline fit to the Wage data. Left panel: Wage vs. Age; right panel: Pr(Wage > 250 | Age).]

Knot placement

• One strategy is to decide K, the number of knots, and then place them at appropriate quantiles of the observed X.
• A cubic spline with K knots has K + 4 parameters or degrees of freedom.
• A natural spline with K knots has K degrees of freedom.

[Figure: Comparison of a degree-14 polynomial, poly(age, deg=14), and a natural cubic spline, ns(age, df=14), each with 15 df, fit to Wage vs. Age.]


Smoothing Splines

This section is a little bit mathematical. Consider this criterion for fitting a smooth function g(x) to some data:

  \underset{g \in \mathcal{S}}{\text{minimize}} \; \sum_{i=1}^{n} \bigl( y_i - g(x_i) \bigr)^2 + \lambda \int g''(t)^2 \, dt

• The first term is RSS, and tries to make g(x) match the data at each x_i.
• The second term is a roughness penalty and controls how wiggly g(x) is. It is modulated by the tuning parameter λ ≥ 0.
• The smaller λ, the more wiggly the function, eventually interpolating y_i when λ = 0.
• As λ → ∞, the function g(x) becomes linear.

Smoothing Splines continued

The solution is a natural cubic spline, with a knot at every unique value of x_i. The roughness penalty still controls the roughness via λ.

Some details

• Smoothing splines avoid the knot-selection issue, leaving a single λ to be chosen.
• The algorithmic details are too complex to describe here. In R, the function smooth.spline() will fit a smoothing spline (see the sketch below).
• The vector of n fitted values can be written as \hat{g}_\lambda = S_\lambda y, where S_\lambda is an n × n matrix (determined by the x_i and λ).
• The effective degrees of freedom are given by

  df_\lambda = \sum_{i=1}^{n} \{ S_\lambda \}_{ii}.
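A quick sketch of smooth.spline() (ISLR Wage data assumed): the returned object exposes both the effective degrees of freedom and the λ that produced them.

  fit <- smooth.spline(Wage$age, Wage$wage, df = 10)
  fit$df       # effective degrees of freedom: the trace of S_lambda
  fit$lambda   # the corresponding value of the tuning parameter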

Smoothing Splines continued — choosing λ

• We can specify df rather than λ!
  In R: smooth.spline(age, wage, df = 10)
• The leave-one-out (LOO) cross-validated error is given by

  \mathrm{RSS}_{cv}(\lambda) = \sum_{i=1}^{n} \left( y_i - \hat{g}_\lambda^{(-i)}(x_i) \right)^2 = \sum_{i=1}^{n} \left[ \frac{y_i - \hat{g}_\lambda(x_i)}{1 - \{ S_\lambda \}_{ii}} \right]^2.

  In R: smooth.spline(age, wage) (see the sketch after the figure below)

[Figure: Smoothing-spline fits of Wage on Age with 16 degrees of freedom and with 6.8 degrees of freedom (chosen by LOOCV).]
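Both routes in a short sketch (ISLR Wage data assumed; cv = TRUE requests ordinary leave-one-out CV, and R warns about it when x has tied values, as age does here):

  fit.df <- smooth.spline(Wage$age, Wage$wage, df = 16)   # fix the effective df
  fit.cv <- smooth.spline(Wage$age, Wage$wage, cv = TRUE) # let LOO CV choose lambda
  fit.cv$df   # about 6.8 on these data, as in the figure above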

Local Regression

[Figure: Local regression illustrated on simulated data (two panels), showing the weighted neighborhood and the local fit at two target points.]

With a sliding weight function, we fit separate linear fits over the range of X by weighted least squares. See the text for more details, and the loess() function in R.
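A minimal loess() sketch (ISLR Wage data assumed; degree = 1 gives the local linear fits described above, and span = 0.5 means each local fit uses half of the data):

  fit <- loess(wage ~ age, span = 0.5, degree = 1, data = Wage)
  pred <- predict(fit, data.frame(age = 20:80))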

Generalized Additive Models

Allows for flexible nonlinearities in several variables, but retains the additive structure of linear models.

  y_i = \beta_0 + f_1(x_{i1}) + f_2(x_{i2}) + \cdots + f_p(x_{ip}) + \epsilon_i.

[Figure: Fitted GAM components for the Wage data: f_1(year), f_2(age), and f_3(education).]



GAM details

• Can fit a GAM simply using, e.g., natural splines:
  lm(wage ~ ns(year, df = 5) + ns(age, df = 5) + education)

• Coefficients not that interesting; fitted functions are. The previous plot was produced using plot.gam.
• Can mix terms — some linear, some nonlinear — and use anova() to compare models.
• Can use smoothing splines or local regression as well (a runnable sketch follows):
  gam(wage ~ s(year, df = 5) + lo(age, span = .5) + education)
• GAMs are additive, although low-order interactions can be included in a natural way using, e.g., bivariate smoothers or interactions of the form ns(age, df = 5):ns(year, df = 5).
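A sketch of these bullets (assuming the gam package for s() and lo(), the splines package for ns(), and the ISLR Wage data):

  library(splines)
  library(gam)
  library(ISLR)

  gam.s <- gam(wage ~ s(year, df = 5) + s(age, df = 5) + education, data = Wage)
  plot(gam.s, se = TRUE)   # one panel per fitted function

  # Mix terms: is a nonlinear function of year needed? Compare with anova()
  gam.lin <- gam(wage ~ year + s(age, df = 5) + education, data = Wage)
  anova(gam.lin, gam.s, test = "F")

  # Local regression terms work too
  gam.lo <- gam(wage ~ s(year, df = 5) + lo(age, span = 0.5) + education, data = Wage)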

GAMs for classification

  \log \left( \frac{p(X)}{1 - p(X)} \right) = \beta_0 + f_1(X_1) + f_2(X_2) + \cdots + f_p(X_p).

[Figure: Fitted components f_1(year), f_2(age), and f_3(education) from a logistic GAM for Pr(Wage > 250).]

gam(I(wage > 250) ~ year + s(age, df = 5) + education, family = binomial)

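The same call in runnable form (gam package and ISLR Wage data assumed):

  library(gam)
  gam.lr <- gam(I(wage > 250) ~ year + s(age, df = 5) + education,
                family = binomial, data = Wage)
  plot(gam.lr, se = TRUE)   # one fitted function per term, on the logit scale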