Linear Regression with One Regressor

AIM QA.7.1 Explain how regression analysis in econometrics measures the relationship between dependent and independent variables.

Regression analysis has the goal of measuring how changes in one variable, called the dependent or explained variable, can be explained by changes in one or more other variables, called the independent or explanatory variables. Regression analysis measures this relationship by estimating an equation (e.g., a linear regression model); the parameters of the equation describe the relationship.

AIM QA.7.2 Interpret a population regression function, regression coefficients, parameters, slope, intercept, and the error term.

The general form of the linear regression model is:

Yi = β0 + β1Xi + εi

Where

The subscript i runs over observations, i = 1, …, n;

Yi is the dependent variable, the regressand, or simply the left-hand variable;

Xi is the independent variable, the regressor, or simply the right-hand variable;

β0 + β1Xi is the population regression line or population regression function;

β0 is the intercept of the population regression line (represents the value of Y if X is zero);

β1 is the slope of the population regression line (measures the change in Y for a one unit change in X); and

εi is the error term or noise component; its expected value is zero.

[Figure: the population regression line Yi = β0 + β1Xi, with Y (dependent variable) on the vertical axis and X (independent variable) on the horizontal axis; the intercept β0, the slope β1, and the error terms εi (the vertical deviations of the observations from the line) are labeled.]

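The population model can be made concrete with a small simulation. The sketch below is a minimal illustration, assuming arbitrary values for β0, β1, and the error standard deviation (they are not taken from the text); it simply generates data satisfying Yi = β0 + β1Xi + εi with a zero-mean error term.

```python
import numpy as np

# Illustrative population parameters (arbitrary choices for this sketch)
beta0, beta1, sigma = 2.0, 0.5, 1.0
n = 100

rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=n)       # independent variable (regressor)
eps = rng.normal(0, sigma, size=n)   # error term with expected value zero
Y = beta0 + beta1 * X + eps          # dependent variable (regressand)
```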

AIM QA.7.3 Interpret a sample regression function, regression coefficients, parameters, slope, intercept, and the error term.

The general form of the sample regression function is:

Yi = b0 + b1Xi + ei

Where

The sample regression coefficients are b0 and b1, which are the intercept and slope.

The ei is called the residual: ei = Yi − Ŷi = Yi − (b0 + b1Xi)

The sample coefficients b0 and b1 are computed as estimates of the population parameters β0 and β1.
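As an illustration with hypothetical numbers (not from the text): if b0 = 2 and b1 = 0.5, then for an observation with Xi = 10 and Yi = 8 the fitted value is Ŷi = 2 + 0.5 × 10 = 7, and the residual is ei = 8 − 7 = 1.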

AIM QA.7.4 Describe the key properties of a linear regression.

The assumptions of the classical normal linear regression model are:

1. A linear relation exists between the dependent variable and the independent variable

2. The independent variable is uncorrelated with the error terms

3. The expected value of the error term is zero: E(εi) = 0

4. Homoskedasticity: the variance of the error term is the same for all observations, Var(ε1) = … = Var(εn) = σε²

5. No serial correlation of the error terms; the error term is independent across observations: Corr(εi, εj) = 0 for i ≠ j

6. The error term is normally distributed


AIM QA.7.5 Describe an ordinary least squares (OLS) regression and calculate the intercept and slope of the regression.

AIM QA.7.6 Describe the method and the three key assumptions of OLS for estimation of parameters.

AIM QA.7.7 Summarize the benefits of using OLS estimators.

AIM QA.7.8 Describe the properties of OLS estimators and their sampling distributions, and explain the properties of consistent estimators in general.

AIM QA.7.9 Interpret the explained sum of squares (ESS), the total sum of squares (TSS), the residual sum of squares (RSS), the standard error of the regression (SER), and the regression R2.

AIM QA.7.10 Interpret the results of an OLS regression.

The mistake made in predicting the ith observation is

$$Y_i - \hat Y_i = Y_i - (b_0 + b_1 X_i) = Y_i - b_0 - b_1 X_i$$

The sum of these squared prediction mistakes over all n observations is

$$\sum_{i=1}^{n}(Y_i - b_0 - b_1 X_i)^2$$

The estimators of the intercept and slope that minimize the sum of squared mistakes are called the ordinary least squares (OLS) estimators of β0 and β1.

The OLS estimators of the slope β1 and the intercept β0 are

$$\hat\beta_1 = \frac{\sum_{i=1}^{n}(X_i - \bar X)(Y_i - \bar Y)}{\sum_{i=1}^{n}(X_i - \bar X)^2} = \frac{S_{XY}}{S_{XX}} = \frac{\sum_{i=1}^{n} X_i Y_i - n\bar X \bar Y}{\sum_{i=1}^{n} X_i^2 - n\bar X^2}$$

$$\hat\beta_0 = \bar Y - \hat\beta_1 \bar X$$

The OLS predicted values Ŷi and residuals ε̂i are

$$\hat Y_i = \hat\beta_0 + \hat\beta_1 X_i, \quad i = 1, \ldots, n$$

$$\hat\varepsilon_i = Y_i - \hat Y_i, \quad i = 1, \ldots, n$$

The estimated intercept (β̂0), slope (β̂1), and residual (ε̂i) are computed from a sample of n observations of Xi and Yi, i = 1, …, n. These are estimates of the unknown true population intercept (β0), slope (β1), and error term (εi).
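As a concrete illustration, the slope, intercept, fitted values, and residuals can be computed directly from these formulas. The sketch below uses NumPy on simulated data; the parameter values and variable names are illustrative assumptions, not part of the source material.

```python
import numpy as np

# Simulated sample (illustrative values only)
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=50)
Y = 2.0 + 0.5 * X + rng.normal(0, 1, size=50)

X_bar, Y_bar = X.mean(), Y.mean()

# Slope: sum of cross-deviations over sum of squared X-deviations (S_XY / S_XX)
beta1_hat = np.sum((X - X_bar) * (Y - Y_bar)) / np.sum((X - X_bar) ** 2)
# Intercept: Y_bar minus slope times X_bar
beta0_hat = Y_bar - beta1_hat * X_bar

Y_hat = beta0_hat + beta1_hat * X   # OLS predicted values
resid = Y - Y_hat                   # OLS residuals
```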


The Least Squares Assumptions:

(1) The error term εi has conditional mean zero given Xi: E(εi | Xi) = 0;

(2) (Xi, Yi), i = 1, …, n, are independent and identically distributed (i.i.d.) draws from their joint distribution;

(3) Large outliers are unlikely: X i and Yi have nonzero finite fourth moments.

The explained sum of squares (ESS), also known as the sum of squares regression (SSR), is the sum of squared deviations of Ŷi from their average:

$$ESS = \sum_{i=1}^{n}(\hat Y_i - \bar Y)^2$$


The total sum of squares (TSS), also known as the sum of squares total (SST), is the sum of squared deviations of Yi from their average:

$$TSS = \sum_{i=1}^{n}(Y_i - \bar Y)^2$$

The residual sum of squares (RSS), also known as the sum of squared errors (SSE), is the sum of the squared OLS residuals:

$$RSS = \sum_{i=1}^{n}(Y_i - \hat Y_i)^2$$

Total sum of squares = Explained sum of squares + Residual sum of squares

$$\sum_{i=1}^{n}(Y_i - \bar Y)^2 = \sum_{i=1}^{n}(\hat Y_i - \bar Y)^2 + \sum_{i=1}^{n}(Y_i - \hat Y_i)^2$$

TSS = ESS + RSS

The R2, or coefficient of determination, is the fraction of the sample variance of Yi explained by (or predicted by) Xi:

$$R^2 = \frac{ESS}{TSS} = 1 - \frac{RSS}{TSS}$$

The standard error of the regression (SER) is an estimator of the standard deviation of the regression error εi.

$$SER = \sqrt{\frac{RSS}{n-2}}$$
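The sums of squares, R2, and SER can be checked numerically. The sketch below repeats the OLS fit on simulated data and computes each quantity from its definition; all parameter values and variable names are illustrative assumptions, not figures from the source.

```python
import numpy as np

# Simulated sample and OLS fit (illustrative values only)
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=50)
Y = 2.0 + 0.5 * X + rng.normal(0, 1, size=50)

b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b0 = Y.mean() - b1 * X.mean()
Y_hat = b0 + b1 * X

TSS = np.sum((Y - Y.mean()) ** 2)      # total sum of squares
ESS = np.sum((Y_hat - Y.mean()) ** 2)  # explained sum of squares
RSS = np.sum((Y - Y_hat) ** 2)         # residual sum of squares

R2 = ESS / TSS                         # equivalently 1 - RSS / TSS
SER = np.sqrt(RSS / (len(Y) - 2))      # standard error of the regression
```

With an intercept included in the regression, TSS equals ESS + RSS up to floating-point rounding, so the two expressions for R2 agree.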
