Discussion-Questions: Philosophy of Science and Theory Building

Lab-session 4:

1. In the attached table are two variables (x and y) with 20 observations each, y is the dependent and x the independent variable, i indicates the observation number. Please estimate the linear relationship between the 2 variables: calculating the two parameters alpha and beta by simple ordinary least square estimation of the model: y_i = alpha + beta*x_i + epsilon_i

i x y 1 6 11 2 -1 -5 3 3 5 4 -1 -21 5 8 53 6 0 -11 7 -3 2 8 1 10 9 4 14 10 7 29 11 2 23 12 1 -4 13 7 31 14 -4 -10 15 4 30 16 4 30 17 4 22 18 -1 -12 19 -3 -28 20 3 25

2. Calculate the standard error of beta. 3. Calculate a two sided t-test for the significance of beta with a significance level of 5 %! Note: the critical values for n=20 (n-2=18) are 2.101 and -2.101 4. Calculate the 95% confidence interval for beta. 5. Calculate the R-squared for the above model.

6. download the file garmit_lab.dta into your m-drive: this file contains a panel of 18 OECD countries from 1961 till 1994. Backgroud-reading: Garrett-Mitchell paper on government spending (sent around by e-mail)

7. open stata

8. change the working directory to your m drive: cd "c:\...\..." open the data file in stata: use garmit_lab.dta

9. summarize the main characteristics of the interesting variables:

sum spend unem growthpc depratio left cdem trade lowwage fdi skand 10. estimate a basic model for government spending and interpret the regression table: Interpret the regression results, coefficients, standard errors, confidence intervals, R², F- Test

reg spend unem growthpc depratio left cdem trade lowwage fdi

11. now change the confidence levels (note the default setting is 95% levels)

set level 99 set level 90 estimate the same regression again, what has changed, how and why?

Set the level back to default:

set level 95 estimate the model again:

12. Now run the same regression but estimate standardized coefficients – how to interpret standardized coefficients? Why do we need standardized coefficients?

reg spend unem growthpc depratio left cdem trade lowwage fdi, beta

Heteroscedasticity and Omitted Variable Bias

13. calculate predicted values and residuals:

predict spend_hat predict spend_resid, resid

14. Now create scatterplots for the residuals against some of the explanatory variables and the country codes: what can you see? Omitted variable bias, heteroskedasticity?

twoway (scatter spend_resid unem) twoway (scatter spend_resid cc) twoway (scatter spend_resid growthpc)

15. Stata provides build in tests for omitted variable bias and heteroskedasticity: How to interprete the test results?

estat hettest estat ovtest estat szroeter unem growthpc depratio left cdem trade lowwage fdi

16. What can we do about Heteroskedasticity and omitted variable bias? reg spend unem growthpc depratio left cdem trade lowwage fdi skand

What can be observed, how to interpret the coefficient for "skand"?

17. Now do the same tests again for the new model and interpret the results:

predict spend_hat2 predict spend_resid2, resid estat hettest estat ovtest twoway (scatter spend spend_hat2 if year==1984, mlabel(country)) twoway (scatter spend_resid2 cc) twoway (scatter spend_resid2 unem) estat szroeter unem growthpc depratio left cdem trade lowwage fdi skand

18. Just treating the standard errors: robust White SEs:

reg spend unem growthpc depratio left cdem trade lowwage fdi, robust

reg spend unem growthpc depratio left cdem trade lowwage fdi, vce(robust)

reg spend unem growthpc depratio left cdem trade lowwage fdi, vce(cluster cc) reg spend unem growthpc depratio left cdem trade lowwage fdi, vce(bootstrap) reg spend unem growthpc depratio left cdem trade lowwage fdi, vce(jacknife) reg spend unem growthpc depratio left cdem trade lowwage fdi, vce(hc2) reg spend unem growthpc depratio left cdem trade lowwage fdi, vce(hc3)

19. GLS: robust Huber-White Sandwich Estimator

xtgls spend unem growthpc depratio left cdem trade lowwage fdi, p(h) open stata change the working directory to your m drive:

cd "c:\...\..." open the data file in stata: use garmit_lab.dta open log file

log using lab5.log estimate a basic model for government spending and interpret the regression table:

Multicollinearity: calculate correlation coefficients for all explanatory variables, do you find problems of multicollinearity?

corr unem growthpc depratio left cdem trade lowwage fdi skand

What is the trade-off between multi-collinearity and omitted variable bias?

If there is complete multicollinearity we don't have to worry, STATA does the job:

Linear transformations of a variable are perfect multicollinear to the original variable:

gen unem2=2*unem+3 corr unem unem2 reg spend unem unem2 if year==1986

Was does Stata do in case of perfect multicollinearity?

Now look at two variables that are highly correlated but not multicollinear, what is the problem?

gen unem3=2*unem+3*invnorm(uniform()) corr unem unem3 reg spend unem unem3 if year==1986 reg spend unem skand if year==1986 reg spend unem unem3 skand if year==1986 run the original regression again:

Run a variance inflation factor test for higher order multicollinearity:

estat vif estat vif, uncentered

How to interprete the vif, what does vif measure?

AV-plots (additional) variables are for multivariate models to show what ceteris paribus effect a single independent variable has:

avplots avplot unem for linear OLS models the marginal effect of a single independent variable is equal for all values of the independent variable:

mfx mfx compute, at(mean) mfx compute, at(median)

Autocorrelation: now lets look at a single time-series, e.g. Germany (or UK) – what has changed, what is the difference to the above models?

reg spend unem growthpc depratio left cdem trade lowwage fdi if country=="Germany"

Do the same tests again: what can you observe?

estat hettest estat ovtest

Now let's turn to another violation of Gauss-Markov assumptions: autocorrelation, serial correlation and test for it: Durbin-Watson statistic and Breusch-Godfrey test, how to interprete the results?

estat dwatson estat bgodfrey

Or a simpler test: what can we observe?

predict spend_residger, resid gen lagspend_residger=l.spend_residger

reg spend_residger lagspend_residger unem growthpc depratio left cdem trade lowwage fdi if country == "Germany" the simplest way to deal with serial correlation is to include the lagged values of the dependent variable to the right hand side of the model – the LDV (lagged dependent variable; BUT with including and excluding variables from the model and do some more Omitted variable bias and heteroskedasticity tests…

Now do the same for the UK!

: opens another can of worms… We will talk about this in more detail when we talk about time series models): How to interpret the coefficient of the LDV?

reg spend spendl unem growthpc depratio left cdem trade lowwage fdi if country=="Germany"

And do the tests again:

estat dwatson estat bgodfrey Run a Prais-Winsten model:

prais spend unem growthpc depratio left cdem trade lowwage fdi if country=="Germany"

Play a little around

Interpretatipon of a dummy variable and interaction effects:

reg spend skand if year==1986 reg spend unem skand gen skand_unem=unem*skand reg spend unem skand skand_unem reg spend unem skand skand_unem growthpc depratio left cdem trade lowwage fdi

Interaction effects:

gen unem_trade=unem*trade

reg spend unem trade growthpc depratio left cdem lowwage fdi

reg spend unem trade unem_trade growthpc depratio left cdem lowwage fdi

sslope spend unem trade unem_trade growthpc depratio left cdem lowwage fdi, i(unem trade unem_trade)

sslope spend unem trade unem_trade growthpc depratio left cdem lowwage fdi, i(trade unem unem_trade) graphical display of IA effects:

sslope spend unem trade unem_trade growthpc depratio left cdem lowwage fdi, i(unem trade unem_trade) graph

sslope spend unem trade unem_trade growthpc depratio left cdem lowwage fdi, i(trade unem unem_trade) graph

sum trade unem Program marginal effects of IA effects:

capture drop MV-lower reg spend unem trade unem_trade growthpc depratio left cdem lowwage fdi generate MV=((_n-1)*10) replace MV=. if _n>17 matrix b=e(b) matrix V=e(V) scalar b1=b[1,1] scalar b2=b[1,2] scalar b3=b[1,3] scalar varb1=V[1,1] scalar varb2=V[2,2] scalar varb3=V[3,3] scalar covb1b3=V[1,3] scalar covb2b3=V[2,3] scalar list b1 b2 b3 varb1 varb2 varb3 covb1b3 covb2b3 gen conb=b1+b3*MV if _n<=17 gen conse=sqrt(varb1+varb3*(MV^2)+2*covb1b3*MV) if _n<=171 gen a=1.96*conse gen upper=conb+a gen lower=conb-a graph twoway (line conb MV, clwidth(medium) clcolor(blue) clcolor(black))/* */(line upper MV, clpattern(dash) clwidth(thin) clcolor(black))/* */(line lower MV, clpattern(dash) clwidth(thin) clcolor(black)) , /* */ xlabel(0 20 40 60 80 100 120 140 160, labsize(2.5)) /* */ ylabel(-0.5 0 0.5 1 1.5 2, labsize(2.5)) yscale(noline) xscale(noline) legend(col(1) order(1 2) label(1 "Marginal Effect of Unemployment on Spending") label(2 "95% Confidence Interval") /* */ label(3 " ")) yline(0, lcolor(black)) title("Marginal Effect of Unemployment on Spending as Trade Openness changes", size(4))/* */ subtitle(" " "Dependent Variable: Government Spending" " ", size(3)) xtitle( Trade Openness, size(3) ) xsca(titlegap(2)) ysca(titlegap(2)) /* */ ytitle("Marginal Effect of Unemployment", size(3)) scheme(s2mono) graphregion(fcolor(white)) graph export m:\...\unem_trade1.eps, replace translate @Graph m:\...\unem_trade1.wmf, replace capture drop MV-lower

or much easier: download grinter.ado from Fred Boehmke

net from http://myweb.uiowa.edu/fboehmke/stata/grinter

reg spend unem trade unem_trade growthpc depratio left cdem lowwage fdi

grinter unem, inter (unem_trade) const02(trade) depvar(spend) kdensity yline(0) test for functional form:

acprplot unem, mspline acprplot left, mspline

etc…

gen ln_unem = ln(unem) reg spend ln_unem growthpc depratio left cdem trade lowwage fdi

gen sqrt_unem=unem^2 reg spend unem sqrt_unem growthpc depratio left cdem trade lowwage fdi

what effect can we observe? Calculate the “turning point”!

gen sqrt_unem=unem^2

Outliers: test for outliers:

dotplot spend symplot spend

rvfplot

lvr2plot lvr2plot, ml(country)

dfbeta

Solution: jacknife and bootstrapping:

jacknife _b _se, eclass saving(name.dta): reg spend unem growthpc depratio left cdem trade lowwage fdi skand

bootstrap _b _se, reps(1000) saving(name.dta): reg spend unem growthpc depratio left cdem trade lowwage fdi skand