
Moving Beyond Linearity

Chapter 7 – Part I

1 Moving Beyond Linearity

• Polynomial Regression • Basis Functions • Piecewise Constant Basis and Step Functions

2 Moving Beyond Linearity

• Truth is never exactly linear. • A linear model can often be a good approximation. • But when it is not, we need methods that handle non-linearity. • Want flexibility, but keep some interpretability if possible.

• First consider single predictor, then generalize to multiple predictors.

3 Polynomial Regression

• Replace simple linear regression

y_i = β_0 + β_1 x_i + ε_i

with polynomial regression

y_i = β_0 + β_1 x_i + β_2 x_i^2 + ... + β_d x_i^d + ε_i

• Higher degree polynomials give a more flexible fit.
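As a minimal sketch (synthetic data and plain NumPy least squares, not the software used in the text), a degree-d polynomial regression is just linear regression on the powers of x:

```python
import numpy as np

# Hypothetical one-predictor data; replace with the real predictor and response.
rng = np.random.default_rng(0)
x = rng.uniform(18, 80, size=200)
y = 50 + 2.0 * x - 0.02 * x**2 + rng.normal(0, 5, size=200)

d = 4                                          # polynomial degree
X = np.vander(x, N=d + 1, increasing=True)     # columns: 1, x, x^2, ..., x^d
beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # least squares coefficients

# Fitted curve over a grid of x values
x0 = np.linspace(x.min(), x.max(), 100)
X0 = np.vander(x0, N=d + 1, increasing=True)
y_hat = X0 @ beta
```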

4 Polynomial Regression

5 Polynomial Regression

• Where do the confidence intervals come from? • For any given x_0, we need Var(f̂(x_0)), where f̂(x_0) = β̂_0 + β̂_1 x_0 + β̂_2 x_0^2 + ... + β̂_d x_0^d.

• Then we use f̂(x_0) ± 2·SE(f̂(x_0)).

• Have the variances and covariances of the estimated coefficients. • Recall the rules for variances of sums of random variables:

• Use them repeatedly on the terms of f̂(x_0), or use the matrix form: Var(c^T β̂) = c^T Var(β̂) c, with c = (1, x_0, x_0^2, ..., x_0^d)^T.
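Continuing the hypothetical NumPy sketch above (so X, beta, x0, X0 are the illustrative objects defined there), the pointwise standard errors follow directly from Var(c^T β̂) = c^T Var(β̂) c:

```python
import numpy as np

n, p = X.shape
resid = y - X @ beta
sigma2 = resid @ resid / (n - p)               # estimate of the error variance
cov_beta = sigma2 * np.linalg.inv(X.T @ X)     # estimated Var(beta_hat)

# Row-wise quadratic form c^T Var(beta_hat) c for each row c = (1, x0, ..., x0^d) of X0
var_fit = np.einsum("ij,jk,ik->i", X0, cov_beta, X0)
se_fit = np.sqrt(var_fit)

y_hat = X0 @ beta
upper = y_hat + 2 * se_fit                     # approximate pointwise 95% band
lower = y_hat - 2 * se_fit
```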

6 Polynomial Regression

• Seems to be a distinct split in wages. • Most are under 250 (thousand dollars), a few above. • Instead of predicting wage itself, estimate the probability that wage > 250.

7 Polynomial Regression

• Fit logistic regression.

Pr(Wage_i > 250 | Age_i) = Pr(z_i = 1 | x_i), where z_i = I(Wage_i > 250) and the log odds are a polynomial in x_i = Age_i.

• Example uses 4th degree.
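A minimal scikit-learn sketch of this kind of fit, using synthetic stand-ins for the Wage data rather than the book's actual data or software (note that sklearn's LogisticRegression applies mild L2 regularization by default, unlike an unpenalized GLM fit):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import LogisticRegression

# Synthetic stand-ins: an age predictor and a binary indicator z_i = I(wage_i > 250)
rng = np.random.default_rng(1)
age = rng.uniform(18, 80, size=500)
p_true = 1 / (1 + np.exp(-(-6 + 0.1 * age)))   # made-up true probability of a high wage
z = rng.binomial(1, p_true)

# Degree-4 polynomial in age inside a logistic regression
model = make_pipeline(
    PolynomialFeatures(degree=4, include_bias=False),
    StandardScaler(),
    LogisticRegression(max_iter=1000),
)
model.fit(age.reshape(-1, 1), z)

grid = np.linspace(age.min(), age.max(), 100).reshape(-1, 1)
prob = model.predict_proba(grid)[:, 1]         # estimated Pr(wage > 250 | age)
```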

8 Basis Functions

• Polynomial regression is a special case of the basis function approach. • Idea: Construct a set of fixed basis functions b_1(X), b_2(X), ..., b_K(X) to apply to the predictor variable, X.

• Fit the regression:

y_i = β_0 + β_1 b_1(x_i) + β_2 b_2(x_i) + ... + β_K b_K(x_i) + ε_i

9 Local Basis Functions

• Polynomial basis functions have problems: every observation influences the fit everywhere.

• The polynomial basis leads to global fitting. • Local fitting can be better.

10 Piecewise Constant Basis: Step Functions

• Simplest local basis creates a step function. • Function is constant on each disjoint interval.

b_0(x_i) = I(x_i < c_1),

b_1(x_i) = I(c_1 ≤ x_i < c_2),

...

b_{K−1}(x_i) = I(c_{K−1} ≤ x_i < c_K),

b_K(x_i) = I(x_i ≥ c_K)

11 Piecewise Constant Basis: Step Functions

• Fit regression using the constant basis functions:

y_i = β_0 + β_1 b_1(x_i) + β_2 b_2(x_i) + ... + β_K b_K(x_i) + ε_i

• Omit indicator for first interval. Interpretation: • All basis functions in the model are zero for values in the first interval, so β_0 is the mean response there and each β_j is the shift of interval j relative to the first.

• What is the predicted value? • Can also do logistic regression if the response is binary. (A short sketch follows.)
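A minimal NumPy sketch of the step-function fit, with made-up cutpoints (the column for the first interval is omitted, so its mean is absorbed by the intercept):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(18, 80, size=300)
y = np.where(x < 35, 60, np.where(x < 65, 110, 90)) + rng.normal(0, 10, size=300)

cuts = np.array([35.0, 50.0, 65.0])   # hypothetical knots c_1 < c_2 < c_3 (K = 3)

# Interval index 0..K for each observation, then indicator columns for intervals 1..K
interval = np.digitize(x, cuts)
X = np.column_stack(
    [np.ones_like(x)] + [(interval == j).astype(float) for j in range(1, len(cuts) + 1)]
)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
# beta[0]: mean response in the first interval; beta[j]: shift for the j-th interval
```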

12 Piecewise Constant Basis: Step Functions

13 Piecewise Constant Basis: Step Functions

• Benefits: • Simple and easy to interpret. • The fit in one region is unaffected by data far away.

• Drawbacks: • Discontinuous jumps at the cutpoints. • Constant within each interval, so it can miss trends unless there are natural breakpoints. • Must choose the number and location of cutpoints.

14 Polynomials and Piecewise Constants

• Polynomials have the advantage of being smooth, with no jumps. • Piecewise constant basis functions give a local instead of a global fit. • How about combining the two?

More next time!

15 Moving Beyond Linearity: Regression Splines

Chapter 7 – Part II

16 Moving Beyond Linearity: Regression Splines

• Regression Splines • Piecewise Polynomial Basis • Splines • Choosing Number and Location of Knots

17 Piecewise Polynomials

• Generalize piecewise constant approach. • Instead of constant between knots, fit polynomials.

• Idea: Instead of one overall polynomial, fit separate low-degree polynomials between the knots.

18 Piecewise Polynomials

• Example: Piecewise cubic with a single cutpoint at c = 50.

• Looks strange because of jump at Age = 50.

19 Piecewise Polynomials

• Fit a piecewise cubic constrained to be continuous at the knot. • Much better. • Still odd with a non-differentiable point at Age = 50. • Creates a sharp corner. • For “maximum smoothness” at the knots for a cubic, want: continuity of the function and of its first and second derivatives at each knot.

20 Splines

21 Spline Basis and Regression Splines

• How do we implement the constraints needed for the splines? • In general, with K knots, splines have d + K + 1 degrees of freedom. • Unconstrained would have (K + 1)(d + 1) = K d + d + K + 1. • But, have d constraints for each of the K knots. • So left with df = d + K + 1. • Cubic: d = 3, so df = K + 4 --- Linear: d = 1, so df = K + 2 • We can write a set of d + K + 1 basis functions, known as a spline basis. • Then fit, for example using cubic splines:

y_i = β_0 + β_1 b_1(x_i) + β_2 b_2(x_i) + ... + β_{K+3} b_{K+3}(x_i) + ε_i

• Using spline basis for regression, called regression splines.

22 Linear Spline Basis

• How can we write a spline basis? • Note: first term is intercept, only need d + K basis functions.

• Consider linear splines. • K knots give K + 1 intervals. • A linear spline is fully determined by: 1. The intercept. 2. The slope on each of the K + 1 intervals. • Equivalently, can use the differences in slope between successive intervals.

23 Linear Spline Basis

• So, fit:

y_i = β_0 + β_1 b_1(x_i) + β_2 b_2(x_i) + β_3 b_3(x_i) + ... + β_{K+1} b_{K+1}(x_i) + ε_i

with

b_1(x_i) = x_i

b_2(x_i) = (x_i − c_1)_+

b_3(x_i) = (x_i − c_2)_+

...

b_{K+1}(x_i) = (x_i − c_K)_+

• Known as the truncated power basis of order 1, where (z)_+ = max(z, 0).
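A short NumPy sketch of the order-1 truncated power basis, with hypothetical knots (real knot locations would come from the data):

```python
import numpy as np

def linear_spline_basis(x, knots):
    """Columns 1, x, (x - c_1)_+, ..., (x - c_K)_+ of the truncated power basis."""
    cols = [np.ones_like(x), x]
    cols += [np.clip(x - c, 0.0, None) for c in knots]   # (x - c_k)_+ = max(x - c_k, 0)
    return np.column_stack(cols)

rng = np.random.default_rng(3)
x = rng.uniform(0, 10, size=200)
y = np.sin(x) + rng.normal(0, 0.2, size=200)

knots = np.array([2.5, 5.0, 7.5])                # K = 3 hypothetical knots
X = linear_spline_basis(x, knots)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)     # K + 2 = 5 coefficients, incl. intercept
```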

24 Cubic Spline Basis

y_i = β_0 + β_1 b_1(x_i) + β_2 b_2(x_i) + β_3 b_3(x_i) + β_4 b_4(x_i) + ... + β_{K+3} b_{K+3}(x_i) + ε_i

with

b_1(x_i) = x_i

b_2(x_i) = x_i^2

b_3(x_i) = x_i^3

b_4(x_i) = (x_i − c_1)_+^3

...

b_{K+3}(x_i) = (x_i − c_K)_+^3
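The same idea for a cubic spline, sketched with knots placed at quantiles of x (anticipating the knot-placement discussion later); this truncated power basis is numerically crude, and software typically uses an equivalent but better-conditioned basis:

```python
import numpy as np

def cubic_spline_basis(x, knots):
    """Columns 1, x, x^2, x^3, (x - c_1)_+^3, ..., (x - c_K)_+^3."""
    cols = [np.ones_like(x), x, x**2, x**3]
    cols += [np.clip(x - c, 0.0, None) ** 3 for c in knots]
    return np.column_stack(cols)

rng = np.random.default_rng(4)
x = rng.uniform(0, 10, size=200)
y = np.sin(x) + rng.normal(0, 0.2, size=200)

knots = np.quantile(x, [0.25, 0.5, 0.75])        # K = 3 knots at the quartiles
X = cubic_spline_basis(x, knots)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)     # df = K + 4 coefficients, incl. intercept
```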

25 Cubic Spline Basis

• Why are cubic splines the most popular degree? • Low order polynomial reduces overfitting problems. • Visual smoothness.

• Problem: The fit in the first and last intervals, beyond the boundary knots, is often highly variable if a full cubic is allowed there. • Natural cubic splines: constrain the fit to be linear beyond the boundary knots, stabilizing the estimates at the edges.

26 Natural Cubic Splines

27 Natural Cubic Splines

28 Choosing Number and Location of Knots

• Once again, have to decide on tuning parameter.

• Ideally, place more knots in regions where the function varies most, and fewer in regions of lower variation. • Not really a feasible option without knowing the function in advance.

29 Choosing Number and Location of Knots

• Placing knots at quantiles of data then determines locations once number is specified. • Can tune on number of knots, K. • Common to use cross-validation. • In software, can specify df desired for regression spline. • Software will place knots at quantiles.

30 Example: Natural Cubic Spline vs. Polynomial

31 Regression Splines

• Regression splines use a spline basis to fit the regression. • Can also be used in logistic regression problems. • More stable than a global polynomial fit. • Choose the knots or, equivalently, the degrees of freedom. • Next time we will consider an alternative to regression splines, known as smoothing splines.

More next time!

32 Moving Beyond Linearity: Smoothing Splines

Chapter 7 – Part III

33 Moving Beyond Linearity: Smoothing Splines

• Smoothing Splines • Mathematical Formulation • Choosing the Smoothing Parameter

34 Smoothing Splines

• Goal is to find a function g that makes the errors small, i.e. we want RSS = Σ_i (y_i − g(x_i))^2 to be small. • With no restrictions, we can simply make g interpolate the data and get a perfect fit. • Idea: Make g smooth. • Polynomials and cubic splines accomplished this task. • Instead, formulate the problem mathematically. • How to measure smoothness?

35 Smoothing Splines

• Consider the squared second derivative at a specific point t, given by g″(t)^2. • Want a good fit, i.e. small RSS. • But also smoothness, i.e. small g″(t)^2 for all t, or at least on average.

• Penalization, or regularization, approach. • Find g to minimize

Σ_i (y_i − g(x_i))^2 + λ ∫ g″(t)^2 dt

36 Smoothing Splines

• Find g to minimize Σ_i (y_i − g(x_i))^2 + λ ∫ g″(t)^2 dt. • Again it is RSS plus a penalty term, with a choice of penalty parameter λ ≥ 0. • Here, the penalty term is called a roughness penalty. • With no penalty, the fit can interpolate the data exactly. • As the penalty parameter grows, the second derivative is forced towards zero, so the fit approaches a straight line.

• Solution is called smoothing spline.

37 Smoothing Splines: Solution

• For every value of the penalty parameter, a nice mathematical result shows: the minimizer is a natural cubic spline with knots at the unique values x_1, ..., x_n (a shrunken version of the regression spline fit with those knots).

• No penalty gives df = n ---- an infinite penalty gives a linear fit, so df = 2.

• What is the true effective df for a particular choice of penalty?

38 Smoothing Splines: Degrees of Freedom

• Consider the fitted values at data points arising from smoothing spline. • This is a vector of length n.

• Denote them by ĝ_λ = (ĝ_λ(x_1), ĝ_λ(x_2), ..., ĝ_λ(x_n)). • Note this depends on the choice of tuning parameter λ. • It can be shown that ĝ_λ = S_λ y, where S_λ is an n × n smoother matrix that is a known function of the natural cubic spline basis and the tuning parameter. • Fitted values are the smoother matrix times the response. • For linear regression the analogous matrix is called the hat matrix. • Effective degrees of freedom: df_λ = trace(S_λ).

39 Choosing the Tuning Parameter

• For smoothing splines, we do not choose knots. • Instead, choose the penalty parameter. • Equivalently, can specify the effective degrees of freedom. • This parallels ridge and lasso, where the amount of shrinkage can be expressed as a fraction of the full least squares estimate. • Cross-validation can be used for tuning. • A common approach for smoothing splines is Leave-One-Out CV.

40 Choosing the Tuning Parameter

• Why Leave-One-Out CV (LOOCV)? • Recall, for linear regression, LOOCV has the closed form

CV_(n) = (1/n) Σ_i [(y_i − ŷ_i) / (1 − h_i)]^2,

where h_i is the i-th diagonal element (leverage) of the hat matrix. • The same shortcut applies to regression splines fit with basis functions. • For smoothing splines:

CV(λ) = (1/n) Σ_{i=1}^n [(y_i − ĝ_λ(x_i)) / (1 − {S_λ}_{ii})]^2
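A small sketch of these quantities, assuming you already have the n × n smoother matrix S_lam for a given λ as a NumPy array (e.g. built from a natural cubic spline basis by your software of choice):

```python
import numpy as np

def smoothing_spline_summaries(S_lam, y):
    """Fitted values, effective df, and LOOCV score from a precomputed smoother matrix."""
    fitted = S_lam @ y                                  # g_hat_lambda = S_lambda y
    df = np.trace(S_lam)                                # effective degrees of freedom
    diag = np.diag(S_lam)
    cv = np.mean(((y - fitted) / (1.0 - diag)) ** 2)    # LOOCV shortcut formula
    return fitted, df, cv
```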

41 Choosing the Tuning Parameter

42 Smoothing Splines

• Smoothing splines are an alternative to regression splines. • Avoid the choice of number and location of knots. • Instead choose the degree of smoothness. • Theoretically much easier to study their properties. • Problem: If n is extremely large, the n × n smoother matrix is large. • Computation may be difficult. • Next time, will discuss local methods quite different from splines.

More next time!

43 Moving Beyond Linearity: Local Regression

Chapter 7 – Part IV

44 Moving Beyond Linearity: Local Regression

• Local Regression • K-Nearest Neighbors Regression • Local Linear Regression

45 Local Regression: KNN Regression

• Recall: K-Nearest Neighbors (KNN) Classification. • For each point, consider its K nearest neighbors.

• Can use the same idea for KNN regression. • For each point x_0, we predict by averaging the responses of its K nearest neighbors:

f̂(x_0) = (1/K) Σ_{x_i ∈ N_K(x_0)} y_i

(A short sketch follows.)
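A minimal scikit-learn sketch with synthetic one-predictor data (the same code works with several predictors):

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(5)
x = rng.uniform(0, 10, size=200).reshape(-1, 1)
y = np.sin(x).ravel() + rng.normal(0, 0.2, size=200)

knn = KNeighborsRegressor(n_neighbors=9).fit(x, y)   # K = 9 neighbors
grid = np.linspace(0, 10, 100).reshape(-1, 1)
y_hat = knn.predict(grid)                            # average response of the 9 nearest points
```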

46 Local Regression: KNN Regression

47 Local Regression: KNN Regression

48 Local Regression: KNN Regression

49 Local Regression: KNN Regression

50 Local Regression: KNN Regression

51 Curse of Dimensionality for KNN

52 Local Linear Regression

• Instead of averaging neighbors, use linear regression in local neighborhoods. • Combines K-nearest neighbors and regression. • Idea: Any smooth function can be approximated by a linear one over a small range. • At each point, we predict using a linear regression that weights only nearby points. • Requires a choice of weight function, also called a kernel function. • Local regression is sometimes known as locally weighted regression, kernel-weighted regression, or kernel smoothing.

53 Local Linear Regression

54 Local Linear Regression

• Algorithm as in LOESS (LOcal regrESSion, Cleveland, 1979)

55 Local Linear Regression

• For the fit at a given point x_0, need to choose:

1. The span: the fraction of points treated as neighbors of x_0. 2. The weight (kernel) function applied to those neighbors.

• For LOWESS, the kernel is chosen as the tricube weight:

K(x_i, x_0) = (1 − (|x_i − x_0| / max_{j ∈ Neighborhood} |x_j − x_0|)^3)^3

• The span is often chosen via cross-validation. • Other types of local regression are also possible. • Entire textbooks are devoted to the subject. (A sketch follows.)
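A NumPy sketch of one local linear fit with the tricube kernel, under the assumptions above (hypothetical data, a fixed span, and nothing beyond ordinary weighted least squares; production LOESS implementations add robustness iterations and other refinements):

```python
import numpy as np

def tricube(u):
    """Tricube weights: (1 - |u|^3)^3 for |u| < 1, else 0."""
    u = np.abs(u)
    return np.where(u < 1, (1 - u**3) ** 3, 0.0)

def local_linear_fit(x, y, x0, span=0.3):
    """Fitted value at x0 from a weighted least squares line over the nearest span-fraction of points."""
    n = len(x)
    k = max(2, int(np.ceil(span * n)))               # neighborhood size from the span
    dist = np.abs(x - x0)
    idx = np.argsort(dist)[:k]                       # the k nearest points
    w = tricube(dist[idx] / dist[idx].max())         # tricube kernel weights
    X = np.column_stack([np.ones(k), x[idx] - x0])   # local intercept and slope, centered at x0
    W = np.diag(w)
    beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y[idx])
    return beta[0]                                   # intercept = fitted value at x0

rng = np.random.default_rng(6)
x = np.sort(rng.uniform(0, 10, size=200))
y = np.sin(x) + rng.normal(0, 0.2, size=200)
fit = np.array([local_linear_fit(x, y, x0) for x0 in np.linspace(0, 10, 100)])
```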

56 Local Linear Regression


57 Local Regression

• KNN regression averages the observations within a neighborhood for prediction. • Can also use kernel weights. • Equivalent to using a degree-zero polynomial in LOWESS. • Advantages of local regression:

• Drawback: Curse of dimensionality. • Next we consider the most common approach to extending the univariate smoothers to multiple predictors.

More next time!

58 Generalized Additive Models

Chapter 7 – Part V

59 Generalized Additive Models

• Generalized Additive Models (GAMs) • GAMs for regression. • GAMs for classification. • Advantages and drawbacks of GAMs.

60 Generalized Additive Models

• Local methods, such as KNN regression or classification, naturally apply to any number of predictors. • Although problematic in high dimensions. • Regression splines and smoothing splines do not extend as directly.

• Possible to do multivariate basis functions.

• Common Approach: Generalized Additive Models

61 GAMs for Regression

• Generalized Additive Models extend linear regression via:

y_i = β_0 + f_1(x_i1) + f_2(x_i2) + ... + f_p(x_ip) + ε_i

where each f_j is a smooth, possibly nonlinear, function of a single predictor.

62 Example: GAM for Wage Data

• For wage data:

• 3 predictors: year and age (continuous); education (categorical).

63 Example: GAM for Wage Data

64 Example: GAM for Wage Data

• Can use smoothing splines or local regression for each continuous predictor's function, instead of basis functions. • But fitting is then more complex. • Backfitting: repeatedly update each f_j by smoothing the partial residuals (the response minus the current fits of all the other functions) against x_j, cycling until the estimates stop changing. (A sketch follows.)
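A bare-bones backfitting sketch (hypothetical names; the plug-in smoother here is a crude running mean purely for illustration, where real GAM software would use smoothing splines or local regression):

```python
import numpy as np

def backfit_additive(X, y, smoother, n_iter=20):
    """Backfitting: X is (n, p); 'smoother' maps (x_j, partial residual) -> fitted values."""
    n, p = X.shape
    alpha = y.mean()                     # intercept
    f = np.zeros((n, p))                 # current estimates of each f_j at the data points
    for _ in range(n_iter):
        for j in range(p):
            partial = y - alpha - f.sum(axis=1) + f[:, j]   # residuals excluding f_j
            f[:, j] = smoother(X[:, j], partial)            # smooth them against x_j
            f[:, j] -= f[:, j].mean()    # center each function for identifiability
    return alpha, f

def running_mean_smoother(x, r, k=15):
    """Toy univariate smoother: running mean of r in x-sorted order."""
    order = np.argsort(x)
    smoothed = np.convolve(r[order], np.ones(k) / k, mode="same")
    out = np.empty_like(r)
    out[order] = smoothed
    return out
```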

65 GAMs for Classification

• Can use GAMs within logistic regression.

• Model the log odds as a GAM:

log( p(X) / (1 − p(X)) ) = β_0 + f_1(X_1) + f_2(X_2) + ... + f_p(X_p), where p(X) = Pr(Y = 1 | X).

• Regression splines are most common. • This is just logistic regression with basis functions as predictors (see the sketch below).
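A minimal sketch of that last point (synthetic data, a hand-rolled cubic truncated power basis, and scikit-learn's LogisticRegression, which applies mild L2 regularization by default): a logistic GAM built from regression splines is simply a logistic regression on spline-basis columns for each continuous predictor.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

def cubic_spline_cols(x, knots):
    """Cubic truncated power basis columns (no intercept): x, x^2, x^3, (x - c_k)_+^3."""
    cols = [x, x**2, x**3] + [np.clip(x - c, 0.0, None) ** 3 for c in knots]
    return np.column_stack(cols)

# Synthetic stand-ins for two continuous predictors and a binary response
rng = np.random.default_rng(7)
x1, x2 = rng.uniform(0, 10, size=(2, 500))
p_true = 1 / (1 + np.exp(-(np.sin(x1) + 0.3 * (x2 - 5))))
z = rng.binomial(1, p_true)

# Spline-basis columns per predictor (knots at quartiles), then a single logistic fit
X = np.column_stack([
    cubic_spline_cols(x1, np.quantile(x1, [0.25, 0.5, 0.75])),
    cubic_spline_cols(x2, np.quantile(x2, [0.25, 0.5, 0.75])),
])
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)).fit(X, z)
```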

66 GAMs for Classification

• For wage data to predict high earners:

67 GAMs for Classification

• Looking at data, find no individuals with wages above $250,000 in first education category, i.e. education level below HS graduate.

68 Generalized Additive Models

• GAMs allow a nonlinear fit in the multiple regression setting. • Advantages: 1. Allows a nonlinear fit, which can give more accurate predictions. 2. The additive model allows us to examine the effect of each predictor separately. 3. Can specify the smoothness of each function. 4. Avoids the curse of dimensionality. • Problems: 1. Potentially many tuning parameters. 2. The assumption of additivity does not include interactions.

69 Tuning Parameters in GAMs

• For the Wage data, used 4 basis functions on one predictor and 5 on another. • How to choose this? • Ideally, we fit all possible combinations of integer pairs. • Not possible in dimension more than 2 or so. • Possible solutions: 1. 2.

• Both solutions reduce the p-dimensional tuning problem to a sequence of solutions to consider.

70 Interactions in Nonlinear Regression

• To include interaction terms 1. Direct approach: Can include products of basis functions into regression. • May be lots of terms. 2. Can fit local regression on pairs, or triplets of predictors. Use these as the blocks in additive model. Use backfitting block by block. • Includes interactions within blocks, but additive across variable blocks. 3. Bivariate smoothing splines (thin-plate splines). More complex. 4. More flexible approaches, such as tree-based approaches. • Will begin discussion next time.

More next time!
