Estadística II
Chapter 4: Simple linear regression

Contents
- Objectives of the analysis.
- Model specification.
- Least squares estimators (LSE): construction and properties.
- Statistical inference:
  - For the slope.
  - For the variance.
- Prediction for a new observation (the actual value or the average value).

Learning objectives
- Ability to construct a model to describe the influence of X on Y.
- Ability to find estimates.
- Ability to construct confidence intervals and carry out tests of hypothesis.
- Ability to estimate the average value of Y for a given x (point estimate and confidence intervals).
- Ability to estimate the individual value of Y for a given x (point estimate and confidence intervals).

Bibliography
- Newbold, P. "Statistics for Business and Economics" (2013), Ch. 10.
- Ross, S. "Introductory Statistics" (2005), Ch. 12.

Introduction
A regression model is a model that allows us to describe the effect of a variable X on a variable Y.
- X: independent, explanatory, or exogenous variable.
- Y: dependent, response, or endogenous variable.
The objective is to obtain reasonable estimates of Y for X based on a sample of n bivariate observations (x1, y1), ..., (xn, yn).

Examples
- Study how the father's height influences the son's height.
- Estimate the price of an apartment depending on its size.
- Predict the unemployment rate for a given age group.
- Approximate the final grade in Estadística II based on the weekly number of study hours.
- Predict the computing time as a function of the processor speed.

Types of relationships
- Deterministic: given a value of X, the value of Y can be perfectly identified:
  y = f(x)

  Example: the relationship between the temperature in Celsius (X) and in Fahrenheit (Y) is y = 1.8x + 32.

  [Figure: Grados Fahrenheit vs. Grados centígrados, a perfect straight line.]

- Nondeterministic (random/stochastic): given a value of X, the value of Y cannot be perfectly known:

  y = f(x) + u

  where u is an unknown random perturbation (a random variable).

  Example: production (X) and price (Y).

  [Figure: Costos vs. Volumen scatterplot. There is a linear pattern, but it is not perfect.]

- Linear: when the function f(x) is linear, f(x) = β0 + β1x.
  - If β1 > 0 there is a positive linear relationship.
  - If β1 < 0 there is a negative linear relationship.

  [Figures: Relación lineal positiva, Relación lineal negativa.]

  The scatterplot is (American) football-shaped.

- Nonlinear: when f(x) is nonlinear, for example f(x) = log(x), f(x) = x² + 3, ...

  [Figure: Relación no lineal.]

  The scatterplot is not (American) football-shaped.

- Lack of relationship: when f(x) = 0.

  [Figure: Ausencia de relación.]

Measures of linear dependence: covariance
The covariance is defined as

  cov(x, y) = Σᵢ (xi − x̄)(yi − ȳ) / (n − 1) = (Σᵢ xi·yi − n·x̄·ȳ) / (n − 1),  with sums over i = 1, ..., n.

- If there is a positive linear relationship, cov > 0.
- If there is a negative linear relationship, cov < 0.
- If there is no relationship, or the relationship is nonlinear, cov ≈ 0.

Problem: the covariance depends on the units of X and Y.
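The covariance formula above can be transcribed directly into code. A minimal Python sketch follows; the helper names and the data points are illustrative, not part of the course material. Dividing the covariance by the product of the standard deviations removes the dependence on units, which is exactly what the correlation coefficient does.

```python
# Sample covariance with n-1 in the denominator, as in the formula above.
def sample_cov(x, y):
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    return sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (n - 1)

# Standardizing by s_x * s_y yields the (unitless) correlation coefficient.
def sample_corr(x, y):
    sx = sample_cov(x, x) ** 0.5   # s_x = sqrt of the sample variance of x
    sy = sample_cov(y, y) ** 0.5
    return sample_cov(x, y) / (sx * sy)

# Made-up data lying roughly on y = 2x, so the correlation is close to +1.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 8.0]
print(sample_cov(x, y))    # positive, since the linear pattern is increasing
print(sample_corr(x, y))   # close to +1
```

Note that rescaling x (e.g., changing its units from metres to kilometres) changes `sample_cov` but leaves `sample_corr` untouched, illustrating the problem stated above.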
Measures of linear dependence: correlation coefficient
The correlation coefficient (unitless) is defined as

  r(x, y) = cor(x, y) = cov(x, y) / (sx · sy)

where

  sx² = Σᵢ (xi − x̄)² / (n − 1)  and  sy² = Σᵢ (yi − ȳ)² / (n − 1).

- −1 ≤ cor(x, y) ≤ 1.
- cor(x, y) = cor(y, x).
- cor(ax + b, cy + d) = sign(a)·sign(c)·cor(x, y) for arbitrary numbers a, b, c, d.

Simple linear regression model
The simple linear regression model assumes that

  Yi = β0 + β1xi + ui

where
- Yi is the value of the dependent variable Y when the random variable X takes the specific value xi.
- xi is the specific value of the random variable X.
- ui is an error: a random variable assumed to be normal with mean 0 and unknown variance σ², ui ~ N(0, σ²).
- β0 and β1 are the population coefficients:
  - β0: population intercept.
  - β1: population slope.

The (population) parameters that we need to estimate are β0, β1, and σ².

Our objective is to find the estimators/estimates β̂0, β̂1 of β0, β1 in order to obtain the regression line

  ŷ = β̂0 + β̂1x,

which is the best fit to the data with a linear pattern.

Example: say that the regression line for the last example is

  Price-hat = −15.65 + 1.29 · Production

[Figure: Plot of Fitted Model, the fitted line over the Costos vs. Volumen scatterplot.]

Based on the regression line, we can estimate the price when production is 25 million:

  Price-hat = −15.65 + 1.29 · 25 = 16.6

The difference between the observed value of the response variable yi and its estimate ŷi is called a residual:

  ei = yi − ŷi

Example (cont.): clearly, if for a given year the production is 25 million, the price will not be exactly 16.6.
That small difference, the residual, will in that case be

  ei = 18 − 16.6 = 1.4

[Figure: scatterplot showing an observed data point ("Dato") and the estimated regression line ("Recta de regresión estimada"); the residual is the vertical distance between them.]

Simple linear regression model: model assumptions
- Linearity: the underlying relationship between X and Y is linear, f(x) = β0 + β1x.
- Homogeneity: the errors have mean zero, E[ui] = 0.
- Homoscedasticity: the variance of the errors is constant, Var(ui) = σ².
- Independence: the errors are independent, E[ui·uj] = 0.
- Normality: the errors follow a normal distribution, ui ~ N(0, σ²).

Linearity
The scatterplot should have an (American) football shape, i.e., it should show scatter around a straight line. If not, the regression line is not an adequate model for the data.

[Figures: Plot of Fitted Model for a linear pattern, and for a curved pattern that a straight line does not fit.]

Homoscedasticity
The vertical spread around the line should remain roughly constant. If that is not the case, heteroscedasticity is present.

[Figure: Plot of Costos vs. Volumen.]

[Interleaved Spanish slides, "Modelo general de regresión": the objective is to analyze the relationship between one or more dependent variables and a set of independent factors, f(Y1, ..., Yk | X1, X2, ..., Xl). Example data set "Regresión simple: consumo y peso de automóviles": 30 cars with weight Peso (kg) and fuel consumption Consumo (litros/100 km); the scatterplot of Consumo vs. Peso shows a positive linear pattern.]

Independence
- The observations should be independent.
- One observation doesn't imply any information about another.
- In general, time series fail this assumption.

Normality
- A priori, we assume that the observations are normal. Under the model hypotheses, yi | xi ~ N(β0 + β1xi, σ²), with Var[yi | xi] = σ² and Cov[yi, yk] = 0, where β0, β1, σ² are unknown parameters.

(Ordinary) least squares estimators: LSE
In 1809 Gauss proposed the least squares method to obtain the estimators β̂0 and β̂1 that provide the best fit

  ŷi = β̂0 + β̂1xi

The method is based on a criterion in which we minimize the sum of squares of the residuals, SSR, that is, the sum of squared vertical distances between the observed values yi and the predicted values ŷi:

  Σᵢ ei² = Σᵢ (yi − ŷi)² = Σᵢ (yi − β̂0 − β̂1xi)²,  with sums over i = 1, ..., n.

[Figure: Recta de regresión / Residuos diagrams: observed value yi, predicted value ŷi = β̂0 + β̂1xi, residual ei, slope β̂1.]

Least squares estimators
The resulting estimators are

  β̂1 = Σᵢ (xi − x̄)(yi − ȳ) / Σᵢ (xi − x̄)² = cov(x, y) / sx²

  β̂0 = ȳ − β̂1x̄

Fitting the regression line
Example 4.1.
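The least squares estimator formulas can be sketched directly in Python. This is a minimal illustration with made-up data (not the data of Example 4.1 or the production/price example); the function name is hypothetical.

```python
# Direct transcription of the least squares estimators:
#   beta1_hat = sum((xi - xbar)(yi - ybar)) / sum((xi - xbar)^2)
#   beta0_hat = ybar - beta1_hat * xbar
# The (n - 1) factors of cov(x, y) and s_x^2 cancel in the ratio,
# so plain sums suffice.
def fit_line(x, y):
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sxy / sxx          # slope estimate
    b0 = ybar - b1 * xbar   # intercept estimate
    return b0, b1

# Points lying exactly on y = 32 + 1.8x are recovered (up to floating point):
b0, b1 = fit_line([0, 10, 20, 30], [32, 50, 68, 86])
print(b0, b1)   # intercept close to 32, slope close to 1.8
```

Because the fitted line passes through (x̄, ȳ) by construction of β̂0, the residuals of a least squares fit always sum to zero, which is a quick sanity check on any implementation.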