
Moving Beyond Linearity
Chapter 7 – Part I

Moving Beyond Linearity
• Polynomial Regression
• Basis Functions
• Polynomials
• Piecewise Constant and Step Functions

Moving Beyond Linearity
• The truth is never exactly linear.
• A linear model can often be a good approximation.
• But when it is not, we need methods that handle non-linearity.
• We want flexibility, but would like to keep some interpretability if possible.
• First consider a single predictor, then generalize to multiple predictors.

Polynomial Regression
• Replace simple linear regression with polynomial regression.
• Higher-degree polynomials give a more flexible fit.

Polynomial Regression
• Where do the confidence intervals come from?
• For any given x_0, we need Var(f̂(x_0)).
• We have the estimated variances and covariances of the coefficients β̂_0, β̂_1, …, β̂_d.
• Recall: Var(aX + bY) = a² Var(X) + b² Var(Y) + 2ab Cov(X, Y).
• Use this repeatedly on f̂(x_0) = β̂_0 + β̂_1 x_0 + … + β̂_d x_0^d.
• Or use the matrix form: Var(cᵀ β̂) = cᵀ Var(β̂) c, with c = (1, x_0, …, x_0^d)ᵀ.
• (A numerical sketch of this calculation appears at the end of this part.)

Polynomial Regression
• There seems to be a distinct split in wages.
• Most are under 250, a few are above 250.
• Instead of predicting wage itself, estimate the probability that wage > 250.

Polynomial Regression
• Fit a logistic regression: Pr(Wage_i > 250 | Age_i) = Pr(z_i = 1 | x_i), where z_i = I(Wage_i > 250).
• The example uses a 4th-degree polynomial.

Basis Functions
• Polynomial regression is a special case of the basis function approach.
• Idea: construct a set of fixed basis functions b_1(X), …, b_K(X) to apply to the predictor variable X.
• Fit the regression y_i = β_0 + β_1 b_1(x_i) + β_2 b_2(x_i) + … + β_K b_K(x_i) + ε_i.

Local Basis Functions
• Polynomial basis functions have problems.
• A polynomial basis leads to global fitting.
• Local fitting can be better.

Piecewise Constant Basis: Step Functions
• The simplest local basis creates a step function.
• The function is constant on each disjoint interval.
• With cutpoints c_1 < c_2 < … < c_K, the basis functions are
  b_0(x_i) = I(x_i < c_1), b_1(x_i) = I(c_1 ≤ x_i < c_2), …, b_{K−1}(x_i) = I(c_{K−1} ≤ x_i < c_K), b_K(x_i) = I(x_i ≥ c_K).

Piecewise Constant Basis: Step Functions
• Fit the regression using the constant basis functions (see the sketch at the end of this part).
• Omit the indicator for the first interval. Interpretation:
• All basis functions in the model are zero for values in the first interval.
• What is the predicted value there? Just the intercept.
• Can also do logistic regression if the response is binary.

Piecewise Constant Basis: Step Functions
• Benefits:
• Drawbacks:

Polynomials and Piecewise Constants
• Polynomials have the advantage of being smooth, with no jumps.
• Piecewise constant basis functions give a local rather than a global fit.
• How about combining the two? More next time!
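The confidence bands discussed earlier in this part come from Var(f̂(x_0)) = cᵀ Var(β̂) c. Below is a minimal numpy sketch of that calculation for a degree-4 polynomial fit. The `age` and `wage` arrays are simulated stand-ins for the lecture's wage data, and the centering/scaling inside `poly_design` is only a numerical-conditioning convenience, not part of the method.

```python
import numpy as np

# Simulated stand-in for the lecture's wage-vs-age example (values are illustrative only).
rng = np.random.default_rng(0)
age = rng.uniform(18, 80, 300)
wage = 50 + 2.5 * age - 0.02 * age**2 + rng.normal(0, 15, 300)

# Degree-4 polynomial design matrix; centering/scaling x keeps X'X well conditioned.
def poly_design(x, degree=4):
    z = (x - 50.0) / 30.0
    return np.vander(z, degree + 1, increasing=True)   # columns 1, z, z^2, z^3, z^4

X = poly_design(age)
beta_hat, *_ = np.linalg.lstsq(X, wage, rcond=None)

# Estimated coefficient covariance: Var(beta_hat) = sigma^2 (X'X)^{-1}.
resid = wage - X @ beta_hat
sigma2 = resid @ resid / (len(wage) - X.shape[1])
cov_beta = sigma2 * np.linalg.inv(X.T @ X)

# Pointwise fit and standard error on a grid:
# Var(f_hat(x0)) = c' Var(beta_hat) c, with c the basis vector evaluated at x0.
grid = np.linspace(18, 80, 100)
C = poly_design(grid)
fit = C @ beta_hat
se = np.sqrt(np.einsum("ij,jk,ik->i", C, cov_beta, C))
lower, upper = fit - 2 * se, fit + 2 * se               # approximate 95% pointwise band
```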
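A similarly minimal sketch of the piecewise-constant (step function) basis: indicator columns for every interval after the first, fit by ordinary least squares. The cutpoints and the simulated data are illustrative assumptions, not values from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)
age = rng.uniform(18, 80, 300)                     # same simulated stand-in data as above
wage = 50 + 2.5 * age - 0.02 * age**2 + rng.normal(0, 15, 300)

def step_basis(x, cuts):
    """Indicator columns I[c_k <= x < c_{k+1}] for each interval after the first."""
    bins = np.digitize(x, cuts)                    # interval index 0..K for K cutpoints
    return np.column_stack([(bins == k).astype(float) for k in range(1, len(cuts) + 1)])

cuts = np.array([35.0, 50.0, 65.0])                # illustrative cutpoints, not from the lecture
X = np.column_stack([np.ones_like(age), step_basis(age, cuts)])
beta_hat, *_ = np.linalg.lstsq(X, wage, rcond=None)

# Prediction in the first interval is the intercept alone (its indicator was omitted);
# in interval k it is the intercept plus the k-th indicator's coefficient.
```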
Moving Beyond Linearity: Regression Splines
Chapter 7 – Part II

Moving Beyond Linearity: Regression Splines
• Regression Splines
• Piecewise Polynomial Basis
• Splines
• Choosing Number and Location of Knots

Piecewise Polynomials
• Generalize the piecewise constant approach.
• Instead of a constant between knots, fit polynomials.
• Idea: instead of one overall polynomial, fit a separate polynomial between each pair of knots.

Piecewise Polynomials
• Example: a piecewise cubic with a single cutpoint at c = 50.
• It looks strange because of the jump at Age = 50.

Piecewise Polynomials
• Fit a piecewise cubic constrained to be continuous at the knot.
• Much better.
• Still odd: there is a non-differentiable point at Age = 50.
• This creates a sharp corner.
• For "maximum smoothness" at the knots for a cubic, want:
• continuity of the function and of its first and second derivatives at each knot.

Splines

Spline Basis and Regression Splines
• How do we implement the constraints needed for the splines?
• In general, with K knots, a degree-d spline has d + K + 1 degrees of freedom.
• Unconstrained, the piecewise polynomials would have (K + 1)(d + 1) = Kd + d + K + 1 parameters.
• But there are d constraints at each of the K knots.
• So we are left with df = d + K + 1.
• Cubic: d = 3, so df = K + 4. Linear: d = 1, so df = K + 2.
• We can write a set of d + K + 1 basis functions, known as a spline basis.
• Then fit, for example using cubic splines:
  y_i = β_0 + β_1 b_1(x_i) + β_2 b_2(x_i) + … + β_{K+3} b_{K+3}(x_i) + ε_i.
• Using a spline basis for regression is called a regression spline.

Linear Spline Basis
• How can we write a spline basis?
• Note: the first term is the intercept, so we only need d + K basis functions.
• Consider linear splines.
• K knots give K + 1 intervals.
• A linear spline is fully determined by:
  1. The intercept.
  2. The value of the slope on each of the K + 1 intervals.
• Equivalently, we can use the differences in slopes between successive intervals.

Linear Spline Basis
• So, fit
  y_i = β_0 + β_1 b_1(x_i) + β_2 b_2(x_i) + β_3 b_3(x_i) + … + β_{K+1} b_{K+1}(x_i) + ε_i
  with
  b_1(x_i) = x_i, b_2(x_i) = (x_i − c_1)_+, b_3(x_i) = (x_i − c_2)_+, …, b_{K+1}(x_i) = (x_i − c_K)_+.
• Known as the truncated power basis of order 1.

Cubic Spline Basis
  y_i = β_0 + β_1 b_1(x_i) + β_2 b_2(x_i) + β_3 b_3(x_i) + β_4 b_4(x_i) + … + β_{K+3} b_{K+3}(x_i) + ε_i
  with
  b_1(x_i) = x_i, b_2(x_i) = x_i², b_3(x_i) = x_i³, b_4(x_i) = (x_i − c_1)_+³, …, b_{K+3}(x_i) = (x_i − c_K)_+³.

Cubic Spline Basis
• Why are cubic splines the most popular degree?
• A low-order polynomial reduces overfitting problems.
• Visual smoothness.
• Problem: the first and last intervals are often highly variable if a full cubic is allowed there.
• Natural cubic splines – additionally constrain the fit to be linear beyond the boundary knots.

Natural Cubic Splines

Choosing Number and Location of Knots
• Once again, we have to decide on a tuning parameter.
• Ideally, place more knots in regions where the function varies more, and fewer in regions of lower variation.
• Not really a feasible option.

Choosing Number and Location of Knots
• Placing knots at quantiles of the data determines their locations once the number is specified.
• Can tune on the number of knots, K.
• Common to use cross-validation.
• In software, you can specify the desired df for the regression spline.
• The software will place the knots at quantiles.

Example: Natural Cubic Spline vs. Polynomial

Regression Splines
• Regression splines use a spline basis to fit a regression (a worked sketch follows this part).
• Can be used in logistic regression problems.
• More stable than a global polynomial fit.
• Choose the knots or, equivalently, the degrees of freedom.
• Next time, we will consider an alternative to regression splines, known as smoothing splines. More next time!
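As a rough illustration of the cubic truncated power basis above, the sketch below builds the K + 3 basis columns by hand, places K = 3 knots at quantiles of the predictor (as the slides say software typically does), and fits by least squares. The simulated data, the number of knots, and the quantile choices are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(18, 80, 300))
y = np.sin(x / 10.0) * 20 + 60 + rng.normal(0, 5, 300)    # simulated, purely illustrative

def cubic_spline_basis(x, knots):
    """Truncated power basis for a cubic spline:
    x, x^2, x^3, then (x - c_k)_+^3 for each knot c_k  (K + 3 columns)."""
    cols = [x, x**2, x**3]
    cols += [np.clip(x - c, 0.0, None) ** 3 for c in knots]
    return np.column_stack(cols)

# K = 3 knots placed at quantiles of x.
knots = np.quantile(x, [0.25, 0.5, 0.75])
B = np.column_stack([np.ones_like(x), cubic_spline_basis(x, knots)])
beta_hat, *_ = np.linalg.lstsq(B, y, rcond=None)
fitted = B @ beta_hat                                      # K + 4 coefficients in total
```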
Moving Beyond Linearity: Smoothing Splines
Chapter 7 – Part III

Moving Beyond Linearity: Smoothing Splines
• Smoothing Splines
• Mathematical Formulation
• Choosing the Smoothing Parameter

Smoothing Splines
• The goal is to find a function g that makes the errors small, i.e. we want Σ_i (y_i − g(x_i))² to be small.
• With no restrictions, we can simply make g match the data and get a perfect fit.
• Idea: make g smooth.
• Polynomials and cubic splines accomplished this task.
• Instead, formulate the problem mathematically.
• How do we measure smoothness?

Smoothing Splines
• Consider the squared 2nd derivative at a specific point t, given by g″(t)².
• We want a good fit, i.e. a small RSS.
• But we also want smoothness, i.e. small g″(t)² for all t, or at least on average.
• This is a penalization, or regularization, approach.
• Find g to minimize …

Smoothing Splines
• Find g to minimize
  Σ_i (y_i − g(x_i))² + λ ∫ g″(t)² dt.
• Again it is the RSS plus a penalty term, with a choice of penalty parameter λ.
• Here, the penalty term is called a roughness penalty.
• With no penalty, we get complete interpolation of the data.
• As the penalty parameter grows, the 2nd derivative is forced towards zero.
• The solution is called a smoothing spline.

Smoothing Splines: Solution
• For every value of the penalty parameter, a nice mathematical result shows that the minimizer is a natural cubic spline with knots at the observed values x_1, …, x_n (a shrunken version of the regression-spline fit).
• No penalty gives df = n; an infinite penalty gives a linear fit, so df = 2.
• What is the true effective df for a particular choice of penalty?

Smoothing Splines: Degrees of Freedom
• Consider the fitted values at the data points arising from the smoothing spline.
• This is a vector of length n.
• Denote it by ĝ_λ = (ĝ_λ(x_1), ĝ_λ(x_2), …, ĝ_λ(x_n)).
• Note this depends on the choice of tuning parameter.
• It can be shown that ĝ_λ = S_λ y, where S_λ is an n × n smoother matrix that is a known function of the natural cubic spline basis and the tuning parameter.
• The fitted values are the smoother matrix times the response.
• For linear regression, the analogous matrix is called the hat matrix.
• The effective degrees of freedom is the trace of the smoother matrix: df_λ = tr(S_λ).

Choosing the Tuning Parameter
• For smoothing splines, we do not choose knots.
• Instead, we choose the penalty parameter.
• Equivalently, we can specify the effective degrees of freedom.
• As with ridge or the lasso, where the penalty could equivalently be expressed as a fraction of the full estimate.
• Cross-validation can be used for tuning.
• A common approach for smoothing splines is Leave-One-Out CV.

Choosing the Tuning Parameter
• Why Leave-One-Out CV (LOOCV)?
• Recall, for linear regression with hat matrix H, the LOOCV error is
  CV_(n) = (1/n) Σ_i [(y_i − ŷ_i) / (1 − H_ii)]².
• This also applies to regression splines using basis functions.
• For smoothing splines,
  CV(λ) = (1/n) Σ_i [(y_i − ĝ_λ(x_i)) / (1 − {S_λ}_ii)]².

Smoothing Splines
• Smoothing splines are an alternative to regression splines.
• They avoid the choice of the number and location of knots.
• Instead, we choose the degree of smoothness.
• Theoretically, it is much easier to study their properties.
• Problem: if n is extremely large, we have a large n × n smoother matrix.
• Computation may be difficult. Next time, we will discuss local methods quite different from splines. More next time!

Moving Beyond Linearity: Local Regression
Chapter 7 – Part IV

Moving Beyond Linearity: Local Regression
• Local Regression
• K-Nearest Neighbors Regression
• Local Linear Regression

Local Regression: KNN Regression
• Recall: K-Nearest Neighbors (KNN) Classification.
• For each point, consider its K nearest neighbors.
• We can use the same idea for KNN regression.
• For each point, we predict by averaging the responses of its K nearest neighbors (a sketch follows at the end of this part).

Curse of Dimensionality for KNN

Local Linear Regression
• Instead of averaging the neighbors, use linear regression in local neighborhoods.
• Combine K-nearest neighbors and regression.
• Idea: any function can be approximated well by a linear one over a small range.
• At each point, we predict using a linear regression that weights only the nearby points (see the sketch at the end of this part).
• Choice of weight function, also called a kernel function.
• Local regression
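A minimal KNN regression sketch using scikit-learn's KNeighborsRegressor, which predicts at each point by averaging the responses of the K nearest training points. The simulated data and the choice K = 9 are illustrative assumptions.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(2)
x = np.sort(rng.uniform(0, 10, 200)).reshape(-1, 1)        # single predictor
y = np.sin(x).ravel() + rng.normal(0, 0.3, 200)             # simulated response

# Prediction at each point is the average response of its K nearest neighbors.
knn = KNeighborsRegressor(n_neighbors=9)
knn.fit(x, y)

grid = np.linspace(0, 10, 100).reshape(-1, 1)
y_hat = knn.predict(grid)                                    # locally constant, step-like fit
```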
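And a hand-rolled sketch of local linear regression in the loess style: at each target point, fit a weighted least-squares line over the nearest fraction of the data, using tricube weights. The span, the kernel, and the simulated data are assumptions chosen only to illustrate the idea; real implementations differ in the details.

```python
import numpy as np

def local_linear(x, y, x0, span=0.3):
    """Local linear fit at x0: weighted least squares over the nearest
    fraction `span` of the points, with tricube weights."""
    n = len(x)
    k = max(2, int(np.ceil(span * n)))                # number of neighbors used
    dist = np.abs(x - x0)
    idx = np.argsort(dist)[:k]                        # the k nearest points
    d = dist[idx] / dist[idx].max()                   # scale distances to [0, 1]
    w = (1 - d**3) ** 3                               # tricube kernel weights
    X = np.column_stack([np.ones(k), x[idx]])         # local intercept + slope
    W = np.diag(w)
    beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y[idx])
    return beta[0] + beta[1] * x0                     # fitted value at x0

rng = np.random.default_rng(3)
x = np.sort(rng.uniform(0, 10, 200))
y = np.sin(x) + rng.normal(0, 0.3, 200)
grid = np.linspace(0, 10, 100)
y_hat = np.array([local_linear(x, y, x0) for x0 in grid])
```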