Quantitative Methods in Regulation

Quantitative techniques in practice: Introduction to panel data

Quantitative Techniques in Regulation in Practice

Panel data

In last week’s lab we looked at the use of single-year cross-sections to carry out comparative efficiency assessment.

Panel data methods are ways to get good estimates when you have data for more than one year.

Suppose you have several years' data on the same company. That is called a time series. Issues that arise include serial correlation, the process of adjustment to equilibrium and, as usual in econometrics, multicollinearity.

Much macro-economic modelling has been done using time series data. Time series data are also used for estimation of beta coefficients in the CAPM model of asset pricing and the cost of capital.

You have panel data when you have data over time on several members of a panel. The concept is simplest when all the members of the panel are present for every time period.

Panel data analysis allows you to do things that are not possible with pure time series or cross-section data[1]. At the same time the possibilities of abuse and misinterpretation are also increased!

The simplest form of panel data estimation is pooled cross-section time series analysis. All you do here is treat every observation the same and calculate a single OLS regression equation. However, it is unlikely that OLS will give you the best results. The estimates will almost certainly be inefficient and may be biased, depending on the underlying process which generated the data.
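As a rough illustration of pooling, the sketch below stacks every company-year observation into one sample and fits a single OLS line with NumPy. The data are simulated, and the panel dimensions, coefficient values and variable names are assumptions made for the example, not part of the lab data.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 10, 6  # hypothetical panel: 10 companies observed for 6 years

# Simulated data, one row per company and one column per year
x = rng.normal(5.0, 1.0, size=(N, T))               # e.g. log of output
y = 1.0 + 0.8 * x + rng.normal(0.0, 0.1, (N, T))    # e.g. log of cost

# Pooling: flatten to N*T observations and ignore which company
# or which year each observation belongs to
X = np.column_stack([np.ones(N * T), x.ravel()])
coef, *_ = np.linalg.lstsq(X, y.ravel(), rcond=None)
intercept, slope = coef

print(f"pooled OLS: intercept={intercept:.3f}, slope={slope:.3f}")
```

Because the simulated errors here really are independent with constant variance, pooled OLS does well. The sections that follow deal with what happens when they are not.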

Recall that unbiasedness of OLS requires that the explanatory variables are not correlated with the error terms, and that efficiency requires that all the error terms are independent of each other and have constant variance.

The fact that you are following the same individuals over time suggests a lack of independence between the observations of a particular individual. The way we view that dependency will affect the best way to estimate the relationship.

Consider the model

$y_{it} = b X_{it} + v_{it}$

The subscript i indexes the individual (household, company, whatever) and t the time period, so the error term $v_{it}$ varies over both. Let N be the number of panel members (e.g. companies) and T be the number of time periods.

In the fixed-effects formulation, which was always used in early panel data studies, we could think of the overall error term $v_{it}$ in the following way:

$v_{it} = a M_i + c P_t + u_{it}$

M is a firm-effects variable and P is a time-effects variable. Unfortunately we cannot observe M and P directly, so potentially we have an omitted variable problem. The best we can do is to estimate a variable-intercepts model, where the intercept term varies according to both the firm and the time period.

See Hsiao page 27 for an example.

There are two equivalent ways to implement the fixed-effects model. The first is to introduce a set of (N-1) dummy variables for the firm effects and (T-1) dummy variables for the time-specific fixed effects.

Equivalently, one takes deviations of the data around the company mean, i.e. replaces $X_{it}$ with $X_{it} - \bar{X}_i$, where $\bar{X}_i$ is the average of X for company i over time, and similarly for the ys. For time-specific fixed effects we take the deviations around the time mean and use these deviations as the variables.
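The equivalence of the two routes can be checked numerically. The sketch below covers firm effects only (no time effects) and uses simulated data in which the unobserved firm effect is correlated with X. All names and parameter values are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 20, 5                                  # hypothetical panel size
firm = np.repeat(np.arange(N), T)             # company index of each row

alpha = rng.normal(0.0, 1.0, N)               # unobserved firm effects
x = alpha[firm] + rng.normal(0.0, 1.0, N * T) # X correlated with the effects
y = alpha[firm] + 0.5 * x + rng.normal(0.0, 0.1, N * T)

# Route 1: intercept plus N-1 firm dummies (firm 0 is the base case)
dummies = (firm[:, None] == np.arange(1, N)).astype(float)
X_lsdv = np.column_stack([np.ones(N * T), x, dummies])
b_lsdv = np.linalg.lstsq(X_lsdv, y, rcond=None)[0][1]

# Route 2: deviations from each company's own mean
def demean(v):
    means = np.bincount(firm, weights=v) / np.bincount(firm)
    return v - means[firm]

xd, yd = demean(x), demean(y)
b_within = (xd @ yd) / (xd @ xd)

# Pooled OLS for comparison: it ignores the firm effects altogether
X_pool = np.column_stack([np.ones(N * T), x])
b_pooled = np.linalg.lstsq(X_pool, y, rcond=None)[0][1]

print(f"dummies={b_lsdv:.4f}  deviations={b_within:.4f}  pooled={b_pooled:.4f}")
```

The two fixed-effects slopes agree up to rounding error, while the pooled slope is pushed away from the true value of 0.5 because the omitted firm effect is correlated with X.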

The disadvantage of this approach is that one is in danger of throwing away useful information contained in the cross-sectional variation of the Xs and ys. The random effects model is more general in that instead of assuming that there is a fixed difference between companies or between time periods, different observations of the same company are simply correlated with each other.

i.e. $E(v_{it} v_{is}) = r_1 > 0$ for $s \neq t$, and for the time dimension

$E(v_{it} v_{jt}) = r_2 > 0$ for $i \neq j$.

This leads to a different method of estimation based on feasible generalised least squares. Typically a two-stage approach is used. First run OLS to get an estimate of $r_1$ and $r_2$, and then calculate the GLS estimator taking into account the estimated values of $r_1$ and $r_2$. An alternative approach is called feasible maximum likelihood. This uses OLS as the starting point of an optimisation routine that calculates the maximum of a specified likelihood function. Maximum likelihood methods are asymptotically efficient.
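A minimal sketch of the two-stage idea follows, under simplifying assumptions that go beyond the text: only the within-company correlation is modelled (the time-dimension correlation is ignored), the panel is balanced, and the GLS step is done by the usual quasi-demeaning transformation. The data are simulated and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
N, T = 200, 5                                   # hypothetical panel size
firm = np.repeat(np.arange(N), T)

mu = rng.normal(0.0, 1.0, N)                    # random firm effects, unrelated to x
x = rng.normal(0.0, 1.0, N * T)
y = 2.0 + 0.5 * x + mu[firm] + rng.normal(0.0, 1.0, N * T)

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

def firm_mean(v):
    # Company mean of v, repeated for each of that company's rows
    return (np.bincount(firm, weights=v) / T)[firm]

# Stage 1: pooled OLS, then estimate the within-company error correlation
X = np.column_stack([np.ones(N * T), x])
e = y - X @ ols(X, y)
sigma2_v = (e @ e) / (N * T - X.shape[1])
firm_sum = np.bincount(firm, weights=e)
cross = (firm_sum ** 2).sum() - e @ e           # sum of e_it * e_is over s != t
r1 = cross / (N * T * (T - 1)) / sigma2_v
r1 = min(max(r1, 0.0), 0.99)                    # keep the estimate in a usable range

# Stage 2: GLS by quasi-demeaning with weight theta
theta = 1.0 - np.sqrt((1.0 - r1) / (1.0 - r1 + T * r1))
X_star = X - theta * np.column_stack([firm_mean(X[:, 0]), firm_mean(x)])
y_star = y - theta * firm_mean(y)
b_re = ols(X_star, y_star)

print(f"r1={r1:.3f}  theta={theta:.3f}  slope={b_re[1]:.3f}")
```

When theta is 0 the transformation leaves the data unchanged and the result is pooled OLS. As theta approaches 1 it approaches the deviations-from-company-means estimator, so the random effects estimator sits between the two.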

You do not need to worry about the details of the estimation procedure, as the software will run it for you!

You can run both the pooled OLS and the fixed-effects models using Excel as long as you know how to create dummy variables. The next exercise will provide you with some telecoms data with which to try it out.

Warning: standard panel data methods only work with static models. Specifically, if your model requires the use of a lagged dependent variable the result will be biased and inconsistent. Fortunately, for our purposes the panel is an extension of a static cross section.
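The size of the problem can be seen in a small simulation. The sketch below generates a dynamic panel with a known coefficient on the lagged dependent variable and then applies the deviations-from-company-means estimator. The panel dimensions, the coefficient of 0.5 and the burn-in length are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
N, T, rho = 2000, 5, 0.5     # many companies, few years, true coefficient 0.5
burn = 50                    # burn-in periods so the starting value does not matter

alpha = rng.normal(0.0, 1.0, N)                 # firm effects
y = np.zeros((N, burn + T + 1))
for t in range(1, burn + T + 1):
    y[:, t] = rho * y[:, t - 1] + alpha + rng.normal(0.0, 1.0, N)

y = y[:, burn:]                                 # keep the last T+1 periods
y_lag, y_cur = y[:, :-1], y[:, 1:]              # T usable observations per company

# Fixed-effects (within) estimator: deviations from each company's own mean
yl = y_lag - y_lag.mean(axis=1, keepdims=True)
yc = y_cur - y_cur.mean(axis=1, keepdims=True)
rho_fe = (yl * yc).sum() / (yl ** 2).sum()

print(f"true rho={rho}, fixed-effects estimate={rho_fe:.3f}")
```

Even with 2,000 companies the estimate falls well short of the true value, because the demeaned lagged variable is correlated with the demeaned error when T is small. Adding more companies does not remove the bias.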

Testing for fixed effects versus random effects. The normal test to use is a Hausman test, which tests whether there are important fixed effects that are correlated with the explanatory variables. If such effects exist, then the random effects estimator would be biased. In the applications that concern us, if big companies were systematically more efficient, we would get a downward bias in the coefficient of cost on size. This would result in an underestimate of the efficiency of large companies and an overestimate for smaller companies. In EViews you have to calculate your own Hausman test.
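Since the test may have to be calculated by hand, here is a sketch of the arithmetic. The statistic is the squared difference between the fixed-effects and random-effects coefficients, scaled by the difference in their variances, and compared with a chi-squared critical value. The data are simulated with firm effects deliberately correlated with X. The variance components are estimated from within and between regressions, which is one common choice and an assumption here. The helper name `hausman` is hypothetical.

```python
import numpy as np

def hausman(b_fe, b_re, V_fe, V_re):
    """Return the Hausman statistic and its degrees of freedom."""
    d = np.atleast_1d(b_fe) - np.atleast_1d(b_re)
    V = np.atleast_2d(V_fe) - np.atleast_2d(V_re)
    return float(d @ np.linalg.solve(V, d)), d.size

rng = np.random.default_rng(3)
N, T = 100, 5                                    # hypothetical panel size
firm = np.repeat(np.arange(N), T)
alpha = rng.normal(0.0, 1.0, N)
x = alpha[firm] + rng.normal(0.0, 1.0, N * T)    # effects correlated with x
y = alpha[firm] + 0.5 * x + rng.normal(0.0, 1.0, N * T)

xbar = np.bincount(firm, weights=x) / T          # company means
ybar = np.bincount(firm, weights=y) / T

# Fixed effects: regression on deviations from company means
xd, yd = x - xbar[firm], y - ybar[firm]
b_fe = (xd @ yd) / (xd @ xd)
u = yd - b_fe * xd
s2_u = (u @ u) / (N * (T - 1) - 1)
V_fe = s2_u / (xd @ xd)

# Between regression on company means, used for the variance components
Xb = np.column_stack([np.ones(N), xbar])
eb = ybar - Xb @ np.linalg.lstsq(Xb, ybar, rcond=None)[0]
s2_b = (eb @ eb) / (N - 2)
theta = 1.0 - np.sqrt(min(1.0, s2_u / (T * s2_b)))

# Random effects: OLS on quasi-demeaned data
Xs = np.column_stack([np.full(N * T, 1.0 - theta), x - theta * xbar[firm]])
ys = y - theta * ybar[firm]
b_re = np.linalg.lstsq(Xs, ys, rcond=None)[0][1]
V_re = s2_u * np.linalg.inv(Xs.T @ Xs)[1, 1]

H, df = hausman(b_fe, b_re, V_fe, V_re)
CRIT_5PCT = {1: 3.841}                           # chi-squared 5% critical value, 1 df
reject_re = H > CRIT_5PCT[df]
print(f"b_fe={b_fe:.3f}  b_re={b_re:.3f}  H={H:.1f}  reject random effects: {reject_re}")
```

A statistic above the critical value suggests the effects are correlated with the explanatory variables, in which case the fixed-effects estimates are the safer choice.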

The City University

[1] See Hsiao, Analysis of Panel Data, Cambridge University Press, 1986, pages 1-3.