The Simple Linear Regression Model: Specification and Estimation 2.1 An

12/11/2007

The Simple Linear Regression Model: Specification and Estimation

Prepared by Vera Tabakova, East Carolina University

2.1 An Economic Model 2.2 An Econometric Model 2.3 Estimating the Regression Parameters 24A2.4 Assess ing t he Least Squares Est imators 2.5 The Gauss-Markov Theorem 2.6 The Probability Distributions of the Least Squares Estimators 2.7 Estimating the Variance of the Error Term

Principles of Econometrics, 3rd Edition Slide 2-2

Figure 2.1a Probability distribution of food expenditure y given income x = $1000

Principles of Econometrics, 3rd Edition Slide 2-2-33

1 12/11/2007

Figure 2.1b Probability distributions of food expenditures y given incomes x = $1000 and x = $2000

Principles of Econometrics, 3rd Edition Slide2- 2-44

The simple regression function

E(|y x )==+μββyx| 12x (2.1)

Principles of Econometrics, 3rd Edition Slide 2-5

Figure 2.2 The economic model: a linear relationship between average per person food expenditure and income

Principles of Econometrics, 3rd Edition Slide 2-6

2 12/11/2007

Slope of regression line

ΔE(y| x) dE(y|x) β == (2.2) 2 Δx dx

“Δ” denotes “change in”

Principles of Econometrics, 3rd Edition Slide 2-7

Figure 2.3 The probability density function for y at two levels of income

Principles of Econometrics, 3rd Edition Slide 2-8

Assumptions of the Simple Linear Regression Model –I– I

The mean value of y, for each value of x, is given by the linear regression

E ()yx| =β12 +β x

Principles of Econometrics, 3rd Edition Slide 2-9

3 12/11/2007

Assumptions of the Simple Linear Regression Model –I– I

For each value of x, the values of y are distributed about their mean value, following probability distributions that all have thihe same variance, var()yx | =σ2

Principles of Econometrics, 3rd Edition Slide 2-10

Assumptions of the Simple Linear Regression Model –I– I

The sample values of y are all uncorrelated, and have zero covariance, implying that there is no linear association among them,

cov()yyij ,= 0

This assumption can be made stronger by assuming that the values of y are all statistically independent.

Principles of Econometrics, 3rd Edition Slide 2-11

Assumptions of the Simple Linear Regression Model –I– I

The variable x is not random, and must take at least two different values.

Principles of Econometrics, 3rd Edition Slide 2-12

4 12/11/2007

Assumptions of the Simple Linear Regression Model –I– I

(optional) The values of y are normally distributed about their mean for each value of x,

∼ ⎡⎤2 yN⎣⎦β+β12 x, σ

Principles of Econometrics, 3rd Edition Slide 2-13

Assumptions of the Simple Linear Regression Model -I- I

•The mean value of y, for each value of x, is given by the linear regression

E(|)yx=β12 +β x •For each value of x, the values of y are distributed about their mean value , following probability distributions that all have the same variance, var(yx | ) = σ2 •The sample values of y are all uncorrelated, and have zero covariance, implying that there is no linear association among them, cov( yyij ,) = 0 This assumption can be made stronger by assuming that the values of y are all statistically independent. •The variable x is not random, and must take at least two different values.

•(optional) The values of y are normally distributed about their mean for each value of x, ⎡⎤2 yN~,⎣⎦()β+β12 x σ

Principles of Econometrics, 3rd Edition Slide 2-14

2.2.1 Introducing the Error Term The random error term is defined as

eyEyxy=−(|) =−β−β12 x (2.3) Rearranging gives

yxe=+ββ12 + (2.4) y is dependent variable; x is independent variable

Principles of Econometrics, 3rd Edition Slide 2-15

5 12/11/2007

The expected value of the error term, given x, is

Eex()||=−β−β= Eyx ( )12 x 0

The mean value of the error term, given x, is zero.

Principles of Econometrics, 3rd Edition Slide 2-16

Figure 2.4 Probability density functions for e and y

Principles of Econometrics, 3rd Edition Slide 2-17

Assumptions of the Simple Linear Regression Model –II– II

SR1. The value of y, for each value of x, is

yxe=β12 +β +

Principles of Econometrics, 3rd Edition Slide 2-18

6 12/11/2007

Assumptions of the Simple Linear Regression Model –II– II

SR2. The expected value of the random error e is

Ee()= 0

Which is equivalent to assuming that

E()yx=β12 +β

Principles of Econometrics, 3rd Edition Slide 2-19

Assumptions of the Simple Linear Regression Model –II– II

SR3. The variance of the random error e is

var(ey )=σ2 = var( )

The random variables y and e have the same variance because they differ only by a constant.

Principles of Econometrics, 3rd Edition Slide 2-20

Assumptions of the Simple Linear Regression Model –II– II

SR4. The covariance between any pair of random errors,

ei and ej is

cov(eeij , )== cov( y i , y j ) 0

The stronger version of this assumption is that the random errors e are statistically independent, in which case the values of the dependent variable y are also statistically independent.

Principles of Econometrics, 3rd Edition Slide 2-21

7 12/11/2007

Assumptions of the Simple Linear Regression Model –II– II

SR5. The variable x is not random, and must take at least two differen t val ues.

Principles of Econometrics, 3rd Edition Slide 2-22

Assumptions of the Simple Linear Regression Model –II– II

SR6. (optional) The values of e are normally distributed about ththeireir mean eN∼ ()0,σ2

if the values of y are normally distributed, and vice versa.

Principles of Econometrics, 3rd Edition Slide 2-23

Assumptions of the Simple Linear Regression Model -II- II

•SR1. yxe=+ββ12 + •SR2. ⇔ Ee()= 0 Ey()=β12 +β x •SR3. var(ey )=σ2 = var( )

•SR4. cov(eeij , )== cov( yy ij , ) 0 •SR5. The variable x is not random, and must take at least two different values. 2 •SR6. (optional) The values of e are normally distributed about their mean eN~(,)0 σ

Principles of Econometrics, 3rd Edition Slide 2-24

8 12/11/2007

Figure 2.5 The relationship among y, e and the true regression line

Principles of Econometrics, 3rd Edition Slide 2-25

Principles of Econometrics, 3rd Edition Slide 2-26

Figure 2.6 Data for food expenditure example

Principles of Econometrics, 3rd Edition Slide 2-27

9 12/11/2007

2.3.1 The Least Squares Principle The fitted regression line is

ybbxˆii=+12 (2.5)

The least squares residual

eyyybbxˆˆiiii=−=−−12 i (2.6)

Principles of Econometrics, 3rd Edition Slide 2-28

Figure 2.7 The relationship among y, ê and the fitted regression line

Principles of Econometrics, 3rd Edition Slide 2-29

Any other fitted line

*** ybbxˆii=+12

Least squares line has smaller sum of squared residuals

NN ˆˆ2**2* if SSE=∑∑ eii and SSE = e then SSE < SSE ii==11

Principles of Econometrics, 3rd Edition Slide 2-30

10 12/11/2007

Least squares estimates for the unknown parameters β1

and β2 are obtained my minimizing the sum of squares function

N 2 Syx()ββ12,( =∑ ii −β−β 1 2 ) i=1

Principles of Econometrics, 3rd Edition Slide 2-31

The Least Squares Estimators

∑( xii−−xy)( y) (2 7) b2 = 2 (2.7) ∑( xxi − )

bybx12=− (2.8)

Principles of Econometrics, 3rd Edition Slide 2-32

2.3.2 Estimates for the Food Expenditure Function

()xxyy−−()18671.2684 b ===∑ ii 10.2096 2 2 1828.7876 ∑( xxi − )

bybx12=− =283.5735 − (10.2096)(19.6048) = 83.4160

A convenient way to report the values for b1 and b2 is to write out the estimated or fitted regression line:

yxˆii=+83.42 10.21

Principles of Econometrics, 3rd Edition Slide 2-33

11 12/11/2007

Figure 2.8 The fitted regression line

Principles of Econometrics, 3rd Edition Slide 2-34

2.3.3 Interpreting the Estimates

The value b2 = 10.21 is an estimate of β2, the amount by which weekly expenditure on food per household increases when household weeklyy,y income increases by $100. Thus, we estimate that if income goes up by $100, expected weekly expenditure on food will increase by approximately $10.21.

Strictly speaking, the intercept estimate b1 = 83.42 is an estimate of the weekly food expenditure on food for a household with zero income.

Principles of Econometrics, 3rd Edition Slide 2-35

2.3.3a Elasticities Income elasticity is a useful way to characterize the responsiveness of consumer expenditure to changes in income. The elasticity of a variable y with respect to another variable x is percentage change in yyyyxΔΔ ε= = = percentage change in x ΔΔxx xy

In the linear economic model given by (2.1) we have shown that ΔE ()y β= 2 Δx

Principles of Econometrics, 3rd Edition Slide 2-36

12 12/11/2007

The elasticity of mean expenditure with respect to income is

ΔΔE()/()yEy Ey () x x ε= = ⋅ =β ⋅ (2.9) ΔΔx /()()x x Ey2 Ey

A frequently used alternative is to calculate the elasticity at the “point of the means” because it is a representative point on the regression line. x 19.60 ε=ˆ b =10.21 × = .71 2 y 283.57

Principles of Econometrics, 3rd Edition Slide 2-37

2.3.3b Prediction Suppose that we wanted to predict weekly food expenditure for a household with a weekly income of $2000. This prediction is carried out by substituting x = 20 into our estimated equation to obtain

yxˆii=+83.42 10.21 =+ 83.42 10.21(20) = 287.61

We predict that a household with a weekly income of $2000 will spend $287.61 per week on food.

Principles of Econometrics, 3rd Edition Slide 2-38

2.3.3c Examining Computer Output

Figure 2.9 EViews Regression Output

Principles of Econometrics, 3rd Edition Slide 2-39

13 12/11/2007

2.3.4 Other Economic Models The “log-log” model

ln(yx )=β12 +β ln( )

dy[ln( )] 1 dy =⋅ dx y dx

dx[β+β ln( )] 1 12 =⋅β dx x 2 dy x β= ⋅ 2 dx y

Principles of Econometrics, 3rd Edition Slide 2-40

2.4.1 The estimator b2

N bwy2 = ∑ ii (2.10) i=1

xi − x wi = 2 (2.11) ∑()xi − x

bwe22=β +∑ ii (2.12)

Principles of Econometrics, 3rd Edition Slide 2-41

2.4.2 The Expected Values of b1 and b2 We will show that if our model assumptions hold, then , which means Eb()22=β that the estimator is unbiased.

We can find the expected value of b2 using the fact that the expected value of a sum is the sum of expected values

Eb()22=β+ E( ∑ weii) =β++ E( 21122 we we ++ we N N)

=β+EEweEweEwe()21122 ( ) + ( ) ++ (NN ) (2.13)

=β+EEwe()2 ∑ (ii )

=β22 +∑ wEii() e =β

using Ewe () ii = wEe i () i and Ee () i = 0 . Principles of Econometrics, 3rd Edition Slide 2-42

14 12/11/2007

2.4.3 Repeated Sampling

Principles of Econometrics, 3rd Edition Slide 2-43

2 The variance of b is defined as 2 var ()bEbEb222=−⎣⎦⎡⎤ ()

Figure 2.10 Two possible probability density functions for b2

Principles of Econometrics, 3rd Edition Slide 2-44

2.4.4 The Variances and Covariances of b1 and b2

If the regression model assumptions SR1-SR5 are correct (assumption SR6 is not

required), then the variances and covariance of b1 and b2 are:

2 2 ⎡⎤∑ xi var(b1 ) =σ ⎢⎥2 (2.14) ⎣⎦Nxx∑()i − σ2 var(b2 ) = 2 (2.15) ∑()xi − x

2 ⎡⎤−x cov(bb12 , ) =σ ⎢⎥2 (2.16) ⎣⎦∑()xxi −

Principles of Econometrics, 3rd Edition Slide 2-45

15 12/11/2007

2.4.4 The Variances and Covariances of b1 and b2 The larger the variance term σ 2, the greater the uncertainty there is in the statistical model, and the larger the variances and covariance of the least squares estimators. 2 The larger the sum of squares, ∑ () xx i − , the smaller the variances of the least squares estimators and the more precilisely we can estimate the unknown parameters. The larger the sample size N, the smaller the variances and covariance of the least squares estimators. 2 The larger this term ∑ x i is, the larger the variance of the least squares estimator b1. The absolute magnitude of the covariance increases the larger in magnitude is the sample mean x , and the covariance has a sign opposite to that of x .

Principles of Econometrics, 3rd Edition Slide 2-46

2 The variance of b is defined as 2 var ()bEbEb222=−⎣⎦⎡⎤ ()

Figure 2.11 The influence of variation in the explanatory variable x on precision of estimation (a) Low x variation, low precision (b) High x variation, high precision

Principles of Econometrics, 3rd Edition Slide 2-47

Gauss-Markov Theorem: Under the assumptions SR1-SR5 of the linear regression model, the estimators

b1 and b2 have the smallest variance of all linear and

unbiased estimators of b1 and b2. They are the Best

Linear Unbiased Estimators (BLUE) of b1 and b2

Principles of Econometrics, 3rd Edition Slide 2-48

16 12/11/2007

1. The estimators b1 and b2 are “best” when compared to similar estimators, those

which are linear and unbiased. The Theorem does not say that b1 and b2 are the best of all possible estimators.

2. The estimators b1 and b2 are best within their class because they have the minimum variance . When comparing two linear and unbiased estimators , we always want to use the one with the smaller variance, since that estimation rule gives us the higher probability of obtaining an estimate that is close to the true parameter value.

3. In order for the Gauss-Markov Theorem to hold, assumptions SR1-SR5 must

be true. If any of these assumptions are not true, then b1 and b2 are not the best

linear unbiased estimators of β1 and β2.

Principles of Econometrics, 3rd Edition Slide 2-49

4. The Gauss-Markov Theorem does not depend on the assumption of normality (assumption SR6).

5. In the simple linear regression model, if we want to use a linear and unbiased

estimator, then we have to do no more searching. The estimators b1 and b2 are the ones to use. This explains why we are studying these estimators and why they are so widely used in research, not only in economics but in all social and physical sciences as well.

6. The Gauss-Markov theorem applies to the least squares estimators. It does not apply to the least squares estimates from a single sample.

Principles of Econometrics, 3rd Edition Slide 2-50

If we make the normality assumption (assumption SR6 about the error term) then the least squares estimators are normally distributed

22 ⎛⎞σ ∑ xi bN11~,⎜⎟β 2 (2.17) ⎝⎠Nxx∑()i −

⎛⎞σ2 bN22~,⎜⎟β 2 (2.18) ⎝⎠∑()xxi −

A Central Limit Theorem: If assumptions SR1-SR5 hold, and if the sample size N is sufficiently large, then the least squares estimators have a distribution that approximates the normal distributions shown in (2.17) and (2.18).

Principles of Econometrics, 3rd Edition Slide 2-51

17 12/11/2007

The variance of the random error ei is

222 var(eEeEeEeiiii )=σ = [ − ( )] = ( )

if the assumption E(ei) = 0 is correct. Since the “expectation” is an average value we might consider estimating σ2 as the average of the squared errors, e2 σ=ˆ 2 ∑ i N

Recall that the random errors are

eyii=−β−β12 x i

Principles of Econometrics, 3rd Edition Slide 2-52

The least squares residuals are obtained by replacing the unknown parameters by their least squares estimates,

eyyybbxˆˆiiii=−=−−12 i

eˆ2 σ=ˆ 2 ∑ i N There is a simple modification that produces an unbiased estimator, and that is

eˆ2 σ=ˆ 2 ∑ i (2.19) N − 2

E()σ=σˆ 22

Principles of Econometrics, 3rd Edition Slide 2-53

2 2 Replace the unknown error variance σ in (2.14)-(2.16) by σˆ to obtain:

⎡⎤x2 ˆ 2 ∑ i (2.20) var ()b1 =σ ⎢⎥2 ⎣⎦Nxx∑()i −

2 σˆ var ()b2 = 2 (2.21) ∑()xi − x

⎡⎤−x ˆ 2 cov()bb12 , =σ ⎢⎥2 (2.22) ⎣⎦∑()xxi −

Principles of Econometrics, 3rd Edition Slide 2-54

18 12/11/2007

The square roots of the estimated variances are the “standard errors” of b1 and b2.

se(bb11) = var( ) (2.23)

se()bb22= var () (2.24)

Principles of Econometrics, 3rd Edition Slide 2-55

eˆ2 304505.2 σ=ˆ 2 ∑ i = =8013.29 N − 238

Principles of Econometrics, 3rd Edition Slide 2-56

The estimated variances and covariances for a regression are arrayed in a rectangular array, or matrix, with variances on the diagonal and covariances in the “off-diagonal” positions.

⎡⎤var(bbb) cov( , ) ⎢⎥112 ⎢⎥ ⎣⎦cov()bb12 , var() b 2

Principles of Econometrics, 3rd Edition Slide 2-57

19 12/11/2007

For the food expenditure data the estimated covariance matrix is:

C INCOME C 1884.442 -85.90316 INCOME -85.90316 4.381752

Principles of Econometrics, 3rd Edition Slide 2-58

var()b1 = 1884.442

var()b2 = 4.381752

cov()bb12 ,=− 85.90316

se()bb11== var () 1884.442 = 43.410

se()bb22== var () 4.381752 = 2.093

Principles of Econometrics, 3rd Edition Slide 2-59

assumptions homoskedastic sampling precision asymptotic independent variable sampling properties B.L.U.E. least squares scatter diagram biased estimator estimates simple linear degrees of freedom least squares regression function dependent variable estimators specification error deviation from the least squares principle unbiased estimator mean form least squares residuals econometric model linear estimator economic model prediction elasticity random error term Gauss-Markov regression model Theorem regression parameters heteroskedastic repeated sampling

Principles of Econometrics, 3rd Edition Slide 2-60

20 12/11/2007

Principles of Econometrics, 3rd Edition Slide 2-61

N 2 Syx(,ββ12 ) =∑ (ii −β−β 1 2 ) (2A.1) i =1

∂S =β−22N 12∑∑yxii + 2( ) β ∂β1 (2A.2)

∂S 2 =β−+β222()∑∑∑xxyxiiii21() ∂β2

Principles of Econometrics, 3rd Edition Slide 2-62

Figure 2A.1 The sum of squares function and the minimizing values b1 and b2

Principles of Econometrics, 3rd Edition Slide 2-63

21 12/11/2007

⎡⎤ 20⎣⎦∑∑yNbii−−12( xb) =

20⎡⎤xy−− x b x2 b = ⎣⎦∑∑ii() i12() ∑ i

Nbxby12+=(∑∑ii) (2A.3)

2 (∑∑∑xbiiii) 12+=( x) b xy (2A.4)

N ∑∑∑xyii− x i y i b = (2A.5) 2 2 2 Nx∑∑ii− () x

Principles of Econometrics, 3rd Edition Slide 2-64

22 22⎛⎞1 2 ∑∑∑∑()xiiii−=x x − 2 x x + Nx = x − 2 x⎜⎟ N ∑ x i + Nx N ⎝⎠(2B.1)

22222 =−∑∑xNxNxxNxii2 +=−

2 ()xi 2222 2∑ (2B.2) ∑∑()xxii−= x − Nx = ∑∑∑ xxx iii − = x − N

x y ∑∑ii (2B.3) ∑∑∑()()xxyyii−−=−=− xyNxyxy ii ii N

Principles of Econometrics, 3rd Edition Slide 2-65

We can rewrite b2 in deviation from the mean form as:

∑()()xii−−xy y b2 = 2 ∑()xxi −

Principles of Econometrics, 3rd Edition Slide 2-66

22 12/11/2007

∑( xxi −=) 0

∑∑∑()()()xii−−xy y x ii −− xy y () x i − x b2 ==22 ∑∑()xii−−xxx ()

∑()xxyi − ii⎡⎤()xx− ==22∑∑⎢⎥ywyiii = ∑∑()xxii−−⎣⎦ () xx

Principles of Econometrics, 3rd Edition Slide 2-67

To obtain (2.12) replace yi in (2.11) by yxe iii =β 12 +β + and simplify:

b212==∑∑wyii w i()β+β x i + e i

=β12∑∑∑wwxweiiiii +β +

=β2 +∑weii

Principles of Econometrics, 3rd Edition Slide 2-68

⎡⎤ ()xxi − 1 ∑∑wxxii==⎢⎥22 ∑() −=0 ⎣⎦⎢⎥∑∑()xxii−−() xx

∑wxii=1

β=β22∑ wxii

∑()0xxi −=

Principles of Econometrics, 3rd Edition Slide 2-69

23 12/11/2007

2 ∑∑( xxiii−=) ( xxxx −)( −)

=−−∑∑()xiixx x() x i − x

=−∑()xxxii

∑∑( xxxii−−) ( xxx ii) ∑ wxii===2 1 xxx− ∑()xxi − ∑()ii

Principles of Econometrics, 3rd Edition Slide 2-70

bwe22=β +∑ ii

2 var ()bEbEb222=⎡−⎣⎦ () ⎤

Principles of Econometrics, 3rd Edition Slide 2-71

2 var ()bE22=β+⎣⎦⎡⎤∑ weii −β 2

2 = Ewe⎣⎦⎡⎤∑ ii

⎡⎤22 =+E ⎢⎥∑∑∑weii2 wwee i jij [square of bracketed term] ⎣⎦ij≠

22 =+∑∑∑wEii() e2 wwE ijij() ee [because w i not random] ij≠

22 =σ ∑ wi

σ2 = 2 ∑()xxi −

Principles of Econometrics, 3rd Edition Slide 2-72

24 12/11/2007

2 2 2 2 σ=var()eEeEeEeiii =⎣⎦ ⎡ − () ⎤=[] i − 0 = Ee() i

covee ,=− E⎡⎤ e Ee e − Ee = Eee = 0 ( ij) ⎣⎦( i() i)( j( j)) ( ij)

⎡⎤22 2 ⎢⎥()xxii−−∑() xx 1 wi === ∑∑⎢⎥22222 xx−− xx ∑()xi − x ⎣⎦⎢⎥{}∑∑()ii{}()

var(aX+= bY) a22 var( X) + b var( Y) + 2 ab cov( X , Y )

Principles of Econometrics, 3rd Edition Slide 2-73

var()bwe22=β+ var()∑ ii [since β 2 is a constant]

2 =∑∑∑weii var()+ wwee ijij cov() , [generalizing the variance rule] ij≠

2 =∑ weii var( ) [usi0]ing cov(eeij , ) = 0]

22 2 =σ∑ wei [using var()i =σ ]

σ2 = 2 ∑()xxi −

Principles of Econometrics, 3rd Edition Slide 2-74

* Let bky 2 = ∑ ii be any other linear estimator of β2.

Suppose that ki = wi + ci.

* bkywcywcxe212==+=+β+β+∑∑ii() i i i ∑ ()() i i i i

=+β++β++∑∑()wcii12 () wc ii x i ∑ () wce iii (2F.1)

=β112∑∑∑wci +β i +β wxcxwce ii +β 2 ∑∑ ii +() i + i i

=β122∑∑∑ccxwceiiiiii +β +β +() +

Principles of Econometrics, 3rd Edition Slide 2-75

25 12/11/2007

* E()bc21=β∑∑∑iiiiii +β 22 +β cxwcEe + ( + )() (2F.2)

=β122∑∑ccxiii +β +β

∑∑ccxiii==0 and 0 (2F.3)

* bkywce22==β++∑∑ii() i i i (2F.4)

Principles of Econometrics, 3rd Edition Slide 2-76

⎡⎤ cxii()− x 1 x ∑∑cwii==⎢⎥22 ∑ cx ii −= 2 ∑ c i 0 ⎣⎦⎢⎥∑∑()xxii−−() xx ∑() xx i −

* 2 var(bwcewce22) =⎡ var⎣⎦β ++⎤=+∑∑( iii) ( ii) var ( i)

22 2 222 =σ∑∑∑()wcii + =σ w i +σ c i

22 =+σvar ()bc2 ∑ i

≥ var ()b2

Principles of Econometrics, 3rd Edition Slide 2-77