12/10/2007

Prediction, Goodness-of-Fit, and Modeling Issues

Prepared by Vera Tabakova, East Carolina University

ƒ 4.1

ƒ 4.2 Measuring Goodness-of-Fit

ƒ 4.3 Modlideling Issues

ƒ 4.4 Log-Linear Models

Principles of Econometrics, 3rd Edition Slide 4-2

yxe01200=+ββ + (4.1)

where e0 is a random error. We assume that Ey ( 0120 ) =β +β x and . We also ass ume that 2 and E (e0 ) = 0 var (e0 ) =σ

cov(ee0 ,i ) == 0 i 1,2,… , N

ybbxˆ0120=+ (4.2)

Principles of Econometrics, 3rd Edition Slide 4-3

1 12/10/2007

Figure 4.1 A point prediction

Principles of Econometrics, 3rd Edition Slide 4-4

f =−=β+β+yy00ˆ ( 1200 xe) −( bbx 120 + ) (4.3)

Ef( ) = β+β120 x + Ee( 0) − ⎣⎦⎡+ ⎡ Eb( 1) + Ebx( 20) ⎤

=β120 +βxx +00 −[] β 120 +β =

2 2 ⎡⎤1 ()xx0 − var(f )=σ⎢⎥ 1 + + 2 (4.4) ⎣⎦Nxx∑()i −

Principles of Econometrics, 3rd Edition Slide 4-5

The of the forecast error is smaller when i. the overall uncertainty in the model is smaller, as measured by the variance of the random errors ; ii. the sample size N is larger; iii. the variation in the exppylanatory variable is larg g;er; and iv. the value of is small.

Principles of Econometrics, 3rd Edition Slide 4-6

2 12/10/2007

⎡⎤2 2 1(xx0 − ) var(f )=σˆ ⎢⎥ 1 + + 2 ⎣⎦Nxx∑()i −

se()f = var ()f (4.5)

ytˆ0 ± cse( f) (4.6)

Principles of Econometrics, 3rd Edition Slide 4-7

y

Figure 4.2 Point and interval prediction

Principles of Econometrics, 3rd Edition Slide 4-8

ybbxˆ0120=+ =83.4160 + 10.2096(20) = 287.6089

⎡⎤2 2 1(xx0 − ) var(f )=σˆ ⎢⎥ 1 + + 2 ⎣⎦Nxx∑()i −

22 22σσˆˆ =σˆ + +()xx0 − 2 N ∑()xxi −

σˆ 2 =σˆ 22 + +()varxx − () b N 02

ytˆ0 ±=cse( f ) 287.6069 ± 2.0244(90.6328) =[ 104.1323,471.0854]

Principles of Econometrics, 3rd Edition Slide 4-9

3 12/10/2007

yxeiii=β12 +β + (4.7)

yEyeiii=+() (4.8)

yyeiii=+ˆˆ (4.9)

yyiii−=() yyeˆˆ − + (4.10)

Principles of Econometrics, 3rd Edition Slide 4-10

Figure 4.3 Explained and unexplained components of yi

Principles of Econometrics, 3rd Edition Slide 4-11

()yy− 2 σ=ˆ 2 ∑ i y N −1

222(4.11) ∑∑∑()()yyiii−= yyˆˆ −+ e

Principles of Econometrics, 3rd Edition Slide 4-12

4 12/10/2007

2 ƒ ∑()yi −y = total sum of squares = SST: a measure of total variation in y about the sample .

2 ƒ ∑()yˆi − y = sum of squares due to the regression = SSR: that part of total variation in y, about the sample mean, that is explained by, or due to, the regression. Also known as the “explained sum of squares .”

2 ƒ ∑eˆi = sum of squares due to error = SSE: that part of total variation in y about its mean that is not explained by the regression. Also known as the unexplained sum of squares, the residual sum of squares, or the sum of squared errors.

ƒ SST = SSR + SSE

Principles of Econometrics, 3rd Edition Slide 4-13

2 SSR SSE R ==−1 (4.12) SST SST

2 ƒ The closer R is to one, the closer the sample values yi are to the fitted regression 2 equation ybbxˆ ii =+ 12 . If R = 1, then all the sample fall exactly on the fitted least squares line, so SSE = 0, and the model fits the data “perfectly.” If the sample data for y and x are uncorrelated and show no linear association, then the least squares fitted line is “horizontal,” so that SSR = 0 and R2 = 0.

Principles of Econometrics, 3rd Edition Slide 4-14

cov(xy , ) σxy ρ=xy = (4.13) var(xy ) var( ) σσx y

σˆ xy rxy = (4.14) σσˆˆx y

σ=ˆ xy∑()()xxyy i − i −( N − 1)

2 σ=ˆ xi∑()xx −() N − 1 (4.15)

2 σ=ˆ yi∑()yy −() N − 1

Principles of Econometrics, 3rd Edition Slide 4-15

5 12/10/2007

22 rRxy =

22 R = ryyˆ

2 R measures the linear association, or goodness-of-fit, between the sample data 2 and their predicted values. Consequently R is sometimes called a measure of “goodness-of-fit.”

Principles of Econometrics, 3rd Edition Slide 4-16

2 SST=−=∑() yi y 495132.160

2 2 SSE=−==∑∑() yii yˆˆ e i304505.176

SSE 304505.176 R2 =−11 =− = .385 SST 495132.160

σˆ xy 478.75 rxy == =.62 σσˆˆxy ()()6.848 112.675

Principles of Econometrics, 3rd Edition Slide 4-17

Figure 4.4 Plot of predicted y, yˆ against y

Principles of Econometrics, 3rd Edition Slide 4-18

6 12/10/2007

ƒ FOOD_EXP = weekly food expenditure by a household of size 3, in dollars

ƒ INCOME = weekly household income, in $100 units

FOOD_EXP =83.42+= 10.21 INCOME R2 .385 (se) (43.41)**** (2.09)

* indicates significant at the 10% level ** indicates significant at the 5% level *** indicates significant at the 1% level

Principles of Econometrics, 3rd Edition Slide 4-19

ƒ 4.3.1 The Effects of Scaling the Data

ƒ Changing the scale of x:

** yxecxcexe= β12+ β + =()(/)β 1+ β 2+ =β 12+ β +

** where ββ22==cxxc and

ƒ Changing the scale of y:

*** * yc/(/)(/)(/)=β12 c +β cx + ec or y =β+β 12 x + e

Principles of Econometrics, 3rd Edition Slide 4-20

Variable transformations:

ƒ Power: if x is a variable then xp raising the variable to the power p; examples are quadratic (x2) and cubic (x3) transformations.

ƒ The natural logarithm: if x is a variable then its natural logg(arithm is ln(x).

ƒ The reciprocal: if x is a variable then its reciprocal is 1/x.

Principles of Econometrics, 3rd Edition Slide 4-21

7 12/10/2007

Figure 4.5 A nonlinear relationship between food expenditure and income

Principles of Econometrics, 3rd Edition Slide 4-22

ƒ The log-log model

ln(yx )=β12 +β ln( ) The parameter β is the elasticity of y with respect to x.

ƒ The log-

ln(yxii ) =β12 +β

A one-unit increase in x leads to (approximately) a 100 β2 percent change in y.

ƒ The linear-log model Δβy yx=β +βln() or = 2 12 100()Δ xx 100

A 1% increase in x leads to a β2/100 unit change in y.

Principles of Econometrics, 3rd Edition Slide 4-23

ƒ The reciprocal model is 1 FOOD_ EXP= β + β + e 12INCOME

ƒ The linear-log model is

FOOD_ln() EXP=β12 +β INCOME + e

Principles of Econometrics, 3rd Edition Slide 4-24

8 12/10/2007

Remark: Given this array of models, that involve different transformations of the dependent and independent variables, and some of which have similar shapes, what are some ggguidelines for choosing a functional form? 1. Choose a shape that is consistent with what economic theory tells us about the relationship. 2. Choose a shape that is sufficiently flexible to “fit” the data 3. Choose a shape so that assumptions SR1-SR6 are satisfied, ensuring that the least squares estimators have the desirable properties described in Chapters 2 and 3.

Principles of Econometrics, 3rd Edition Slide 4-25

Figure 4.6 EViews output: residuals and summary for food expenditure example

Principles of Econometrics, 3rd Edition Slide 4-26

ƒ The Jarque-Bera is given by

2 N ⎛⎞()K − 3 JB=+⎜⎟ S 2 ⎜⎟ 64⎝⎠ where N is the sample size, S is , and K is .

ƒ In the food expenditure example

2 40 ⎛⎞()2.99− 3 JB =−⎜⎟.0972 + = .063 ⎜⎟ 64⎝⎠

Principles of Econometrics, 3rd Edition Slide 4-27

9 12/10/2007

Figure 4.7 of wheat yield over time

Principles of Econometrics, 3rd Edition Slide 4-28

YIELDttt=β12 +β TIME + e

2 YIELDtt=+.638 .0210 TIME R = .649 (se) (.064) (.0022)

Principles of Econometrics, 3rd Edition Slide 4-29

Figure 4.8 Predicted, actual and residual values from straight line

Principles of Econometrics, 3rd Edition Slide 4-30

10 12/10/2007

Figure 4.9 of residuals from straight line

Principles of Econometrics, 3rd Edition Slide 4-31

3 YIELDttt=β12 +β TIME + e

TIMECUBE= TIME3 1000000

2 YIELDtt=+0.874 9.68 TIMECUBE R = 0.751 (se) (.036) (.082)

Principles of Econometrics, 3rd Edition Slide 4-32

Figure 4.10 Fitted, actual and residual values from equation with cubic term

Principles of Econometrics, 3rd Edition Slide 4-33

11 12/10/2007

ƒ 4.4.1 The Growth Model

ln(YIELDt ) =++ ln( YIELD0 ) ln( 1 g) t

= β+β12t

ln()YIELDt =− .3434 + .0178 t (se) (.0584) (.0021)

Principles of Econometrics, 3rd Edition Slide 4-34

ƒ 4.4.2 A Wage Equation

ln(WAGE) =++ ln( WAGE0 ) ln( 1 r) EDUC

= βββ12+ β EDUC

ln()WAGE=+× .7884 .1038 EDUC (se) (.0849) (.0063)

Principles of Econometrics, 3rd Edition Slide 4-35

ƒ 4.4.3 Prediction in the Log-Linear Model

yybbxˆ ==+exp ln exp n ( ()) ()12

22σˆ 2 yEˆˆˆcn==Ebb( y) exp(bb12 ++σ=xye2)

ln()WAGE=+× .7884 .1038 EDUC =+×= .7884 .1038 12 2.0335

σˆ 2 2 yEyyeˆˆcn==() =7.6408 × 1.1276 = 8.6161

Principles of Econometrics, 3rd Edition Slide 4-36

12 12/10/2007

ƒ 4.4.4 A Generalized R2 Measure

2 22ˆ Ryyrgyy=⎡⎣⎦corr() , ⎤ = , ˆ

2 22ˆ Ryygc=⎡⎣⎦corr() , ⎤ = .4739 = .2246

R2 values tend to be small with microeconomic, cross-sectional data, because the variations in individual behavior are difficult to fully explain.

Principles of Econometrics, 3rd Edition Slide 4-37

ƒ 4.4.5 Prediction Intervals in the Log-Linear Model

⎡⎤expln()y −+tf se() ,expln() ytf se() ⎣⎦⎢⎥( cc) ( )

⎣⎦⎡−×exp( 2.0335 1.96 .4905) ,exp( 2.0335 +×⎤= 1.96 .4905) [ 2.9184,20.0046]

Principles of Econometrics, 3rd Edition Slide 4-38

ƒ coefficient of ƒ linear model determination ƒ linear relationship ƒ correlation ƒ linear-log model ƒ data scale ƒ log-linear model ƒ forecast error ƒ log-log model ƒ forecast standard ƒ log-normal error distribution ƒ functional form ƒ prediction ƒ goodness-of-fit ƒ ƒ growth model ƒ R2 ƒ Jarque-Bera test ƒ residual ƒ kurtosis ƒ skewness ƒ least squares predictor

Principles of Econometrics, 3rd Edition Slide 4-39

13 12/10/2007

Principles of Econometrics, 3rd Edition Slide 4-40

f =−=β+β+yy00ˆ ( 1200 xe) −+( bbx 120)

2 var( yˆ0120102012) =+=+ var(bbx) var( b) x var( b) + 2 x cov( bb , )

22 2 σ ∑ xi 22σ−x =+22xx00 +σ2 2 Nxx∑∑()ii−−() xx ∑() xx i −

Principles of Econometrics, 3rd Edition Slide 4-41

⎡⎤⎡⎤22 ⎧⎫22 222 ⎧⎫ 22 σ ∑ xi ⎪⎪σσNx x0 σ−()2xx0 ⎪⎪ σ Nx var ()yˆ0 =−⎢⎥⎢⎥⎨⎬ +++ ⎨⎬ N xx−−−−−22222 N xx xx xx N xx ⎣⎦⎣⎦⎢⎥⎢⎥∑∑∑∑∑()iiiii⎩⎭⎪⎪() () () ⎩⎭⎪⎪()

⎡⎤2222 2 ∑ xNxi − xxxx00−+2 =σ⎢⎥22 + ⎣⎦⎢⎥Nxx∑∑( ii−−) ( xx)

⎡⎤22 2 ∑()xxi −−() xx0 =σ⎢⎥22 + ⎣⎦⎢⎥Nxx∑∑()ii−−() xx

2 ⎡⎤1 ()xx− =σ2 ⎢⎥ + 0 N 2 ⎣⎦⎢⎥∑()xi − x

Principles of Econometrics, 3rd Edition Slide 4-42

14 12/10/2007

f ~(0,1)N var(f )

2 2 ⎡⎤1(xx0 − ) var( f ) =σˆ ⎢⎥ 1 + + 2 ⎣⎦Nxx∑()i −

fyy00− ˆ = ~ t(2)N− (4A.1) var()f se(f )

(4A.2) Pt()1−≤≤cc tt =−α

Principles of Econometrics, 3rd Edition Slide 4-43

yy− ˆ Pt[]1−≤00 ≤ t =−α ccse(f )

Py[ ˆˆ000−≤≤+=−α tccse( f ) y y t se( f )] 1

Principles of Econometrics, 3rd Edition Slide 4-44

2 2 22 ()yiiiiiii−=−+=−++−yyyeyyeyye[()ˆˆˆˆˆˆ] () 2()

2 22 ∑∑∑∑( yiiiii−=yyyeyye) ()ˆˆˆˆ −++ 2()−

∑∑∑∑∑( yˆˆˆˆˆii−=ye) yeye iii − =( bbxeye12 + iii) ˆ − ˆ

=+bebxeye12∑∑ˆˆˆiiii − ∑

Principles of Econometrics, 3rd Edition Slide 4-45

15 12/10/2007

∑∑eybbxyNbbxˆii=−−=−−=( 12 ii) ∑ 12 ∑ i0

2 ∑∑xeiiˆ =−−=−−= x i( y i b12 b x i) ∑ xy i i b 1 ∑∑ x i b 2 x i 0

∑( yyeˆˆii−=) 0

If the model contains an intercept it is guaranteed that SST = SSR + SSE.

If, however, the model does not contain an intercept, then ∑ e ˆ i ≠ 0 and SST ≠ SSR + SSE.

Principles of Econometrics, 3rd Edition Slide 4-46

Suppose that the variable y has a normal distribution, with mean μ and variance σ2. If we consider we = y then ywN =μσ ln ( ) ~ ( , 2 ) is said to have a log-normal distribution.

2 E (we) = μ+σ 2

22 var()we=−2μ+σ( e σ 1)

Principles of Econometrics, 3rd Edition Slide 4-47

Given the log-linear model ln ()y =β12 +βxe +

If we assume that eN~0,( σ2 )

βββ12+β xii+exeβ ββ 12+β i i Ey( i ) === Ee( ) Ee( e)

2 2 eEeeeβ+β12xeii()== β+β 12 x iσ 2 e β+β 12 x i +σ 2

2 bbx12++σi ˆ 2 Ey()i = e

Principles of Econometrics, 3rd Edition Slide 4-48

16 12/10/2007

The growth and wage equations:

β2 β=2 ln( 1 +r) and re=−1

bN~var~ ,var b2 xx2 222(β=σ( ) ∑( i − ) )

b β+var(b ) /2 bb22+var() /2 2 22 ˆ Ee⎣⎦⎡⎤= e re=−1

2 2 var()bxx2 =σˆ ∑()i −

Principles of Econometrics, 3rd Edition Slide 4-49

17