Estadística II Chapter 4: Simple Linear Regression

Contents
- Objectives of the analysis.
- Model specification.
- Least squares estimators (LSE): construction and properties.
- Statistical inference: for the slope; for the variance.
- Prediction for a new observation (the actual value or the average value).

Learning objectives
- Ability to construct a model to describe the influence of X on Y.
- Ability to find estimates.
- Ability to construct confidence intervals and carry out tests of hypothesis.
- Ability to estimate the average value of Y for a given x (point estimate and confidence intervals).
- Ability to estimate the individual value of Y for a given x (point estimate and confidence intervals).

Bibliography
- Newbold, P., "Statistics for Business and Economics" (2013), Ch. 10.
- Ross, S., "Introductory Statistics" (2005), Ch. 12.

Introduction

A regression model is a model that allows us to describe the effect of a variable X on a variable Y.
- X: independent, explanatory or exogenous variable.
- Y: dependent, response or endogenous variable.

The objective is to obtain reasonable estimates of Y for a given X, based on a sample of n bivariate observations (x1, y1), ..., (xn, yn).

Examples
- Study how the father's height influences the son's height.
- Estimate the price of an apartment depending on its size.
- Predict the unemployment rate for a given age group.
- Approximate the final grade in Estadística II based on the weekly number of study hours.
- Predict the computing time as a function of the processor speed.

Types of relationships
- Deterministic: given a value of X, the value of Y can be identified exactly, y = f(x). Example: the relationship between temperature in Celsius (X) and in Fahrenheit (Y) is y = 1.8x + 32. [Figure: degrees Fahrenheit vs degrees Celsius.]
- Nondeterministic (random/stochastic): given a value of X, the value of Y cannot be known exactly, y = f(x) + u, where u is an unknown random perturbation (a random variable). Example: production (X) and price (Y). [Figure: scatterplot of price vs production.] There is a linear pattern, but it is not perfect.
- Linear: the function f(x) is linear, f(x) = β0 + β1x. If β1 > 0 there is a positive linear relationship; if β1 < 0 there is a negative linear relationship. [Figures: positive and negative linear relationships.] The scatterplot is (American) football-shaped.
- Nonlinear: f(x) is nonlinear, for example f(x) = log(x), f(x) = x² + 3, ... [Figure: a nonlinear relationship.] The scatterplot is not (American) football-shaped.
- Lack of relationship: f(x) = 0. [Figure: absence of relationship.]

Measures of linear dependence: covariance

The covariance is defined as

  cov(x, y) = Σ (xi − x̄)(yi − ȳ) / (n − 1) = ( Σ xi yi − n x̄ ȳ ) / (n − 1),

where the sums run over i = 1, ..., n.
- If there is a positive linear relationship, cov > 0.
- If there is a negative linear relationship, cov < 0.
- If there is no relationship, or the relationship is nonlinear, cov ≈ 0.

Problem: the covariance depends on the units of X and Y.
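A minimal numerical sketch (in Python, with a small data set invented purely for this illustration) of how the covariance is computed from the definition above:

    # Sample covariance, following the definition above (denominator n - 1).
    # The data are made up for illustration only.
    x = [26, 31, 36, 41, 46, 51, 56]
    y = [18, 24, 30, 35, 44, 50, 57]
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    print(cov_xy)  # positive, as expected for an increasing linear pattern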
Measures of linear dependence: correlation coefficient

The correlation coefficient (unitless) is defined as

  r(x, y) = cor(x, y) = cov(x, y) / (sx sy),

where

  sx² = Σ (xi − x̄)² / (n − 1)  and  sy² = Σ (yi − ȳ)² / (n − 1).

Properties:
- −1 ≤ cor(x, y) ≤ 1.
- cor(x, y) = cor(y, x).
- cor(ax + b, cy + d) = sign(a) sign(c) cor(x, y) for any numbers a, b, c, d with a ≠ 0 and c ≠ 0.

Simple linear regression model

The simple linear regression model assumes that

  Yi = β0 + β1 xi + ui,

where
- Yi is the value of the dependent variable Y when the variable X takes the specific value xi.
- xi is the specific value of the variable X.
- ui is an error term, a random variable assumed to be normal with mean 0 and unknown variance σ², i.e. ui ~ N(0, σ²).
- β0 and β1 are the population coefficients: β0 is the population intercept and β1 is the population slope.

The (population) parameters that we need to estimate are β0, β1 and σ².

The regression line

Our objective is to find the estimators/estimates β̂0, β̂1 of β0, β1 in order to obtain the regression line

  ŷ = β̂0 + β̂1 x,

which is the best fit to data that show a linear pattern.

Example: suppose the regression line for the production-price example is

  predicted Price = −15.65 + 1.29 · Production

[Figure: fitted regression line over the scatterplot of price vs production.]

Based on the regression line, we can estimate the price when production is 25 million:

  predicted Price = −15.65 + 1.29(25) = 16.6

Residuals

The difference between the observed value of the response variable yi and its fitted value ŷi is called a residual:

  ei = yi − ŷi

[Figure: observed value, estimated regression line and residual.]

Example (cont.): clearly, if for a given year the production is 25 million, the price will not be exactly 16.6 thousand euros. That small difference, the residual, is in this case

  ei = 18 − 16.6 = 1.4

Model assumptions
- Linearity: the underlying relationship between X and Y is linear, f(x) = β0 + β1x.
- Homogeneity: the errors have mean zero, E[ui] = 0.
- Homoscedasticity: the variance of the errors is constant, Var(ui) = σ².
- Independence: the errors are independent, E[ui uj] = 0 for i ≠ j.
- Normality: the errors follow a normal distribution, ui ~ N(0, σ²).

Model assumptions: linearity

The scatterplot should have an (American) football shape, i.e., it should show scatter around a straight line. If not, the regression line is not an adequate model for the data. [Figures: a fitted line on data with a linear pattern and on data with a clearly nonlinear pattern.]

Model assumptions: homoscedasticity

The vertical spread around the line should remain roughly constant across the range of X. If that is not the case, heteroscedasticity is present. [Figure: scatterplot of price vs production. Embedded Spanish slide: "Regresión simple: consumo y peso de automóviles" — table and scatterplot of fuel consumption (litres/100 km) vs weight (kg) for 30 cars.]

Model assumptions: independence
- The observations should be independent.
- One observation carries no information about another.
- In general, time series fail this assumption.

Model assumptions: normality
- A priori, we assume that the observations are normal: conditional on xi, yi ~ N(β0 + β1 xi, σ²).
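To make the assumptions concrete, the following sketch (Python; the values β0 = 2, β1 = 0.5 and σ = 1 are arbitrary choices for the illustration, not taken from the slides) simulates data that satisfy the model exactly:

    import random

    # Simulate y_i = beta0 + beta1 * x_i + u_i with u_i ~ N(0, sigma^2):
    # linear mean, errors with mean zero and constant variance,
    # independent and normally distributed.
    random.seed(1)
    beta0, beta1, sigma = 2.0, 0.5, 1.0   # arbitrary illustrative values
    x = [float(i) for i in range(1, 31)]
    y = [beta0 + beta1 * xi + random.gauss(0.0, sigma) for xi in x]

A scatterplot of such simulated data shows the football-shaped cloud described above; letting sigma grow with x, for instance, would produce the heteroscedastic pattern the homoscedasticity assumption rules out.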
(Ordinary) least squares estimators: LSE

In 1809 Gauss proposed the least squares method to obtain the estimators β̂0 and β̂1 that provide the best fit

  ŷi = β̂0 + β̂1 xi.

The method is based on a criterion in which we minimize the sum of squares of the residuals, SSR, that is, the sum of squared vertical distances between the observed values yi and the fitted values ŷi:

  Σ ei² = Σ (yi − ŷi)² = Σ (yi − β̂0 − β̂1 xi)²,

with the sums running over i = 1, ..., n.

Least squares estimators

The resulting estimators are

  β̂1 = cov(x, y) / sx² = Σ (xi − x̄)(yi − ȳ) / Σ (xi − x̄)²

  β̂0 = ȳ − β̂1 x̄

Fitting the regression line

Example 4.1.
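A minimal sketch of these closed-form formulas in Python (the small data set is invented for illustration; it is not the data of Example 4.1):

    # Least squares estimates from the closed-form expressions above.
    def least_squares(x, y):
        n = len(x)
        x_bar = sum(x) / n
        y_bar = sum(y) / n
        sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
        sxx = sum((xi - x_bar) ** 2 for xi in x)
        b1 = sxy / sxx              # slope estimate: cov(x, y) / sx^2
        b0 = y_bar - b1 * x_bar     # intercept estimate
        return b0, b1

    b0, b1 = least_squares([26, 31, 36, 41, 46, 51, 56],
                           [18, 24, 30, 35, 44, 50, 57])
    y_hat_25 = b0 + b1 * 25         # fitted/predicted value at x = 25

The ratio sxy / sxx equals cov(x, y) / sx² because the common (n − 1) denominators cancel.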
