Cointegration and Antitrust: A Primer
Economics Committee Newsletter

Jonathan L. Rubin, J.D., Ph.D.*
American Antitrust Institute

Introduction

On October 8, 2003, Robert F. Engle and Clive W. J. Granger were awarded the Nobel Prize for their research on the statistical analysis of economic time series. Both made important contributions on their own, but their most influential work by far is contained in a short and elegant paper they published together in 1987.¹ Their paper influenced the way statisticians perform almost all regression analysis.

Their insight, known as cointegration, has been described as a method of uncovering long-run relationships between variables that are concealed by the noise of short-term fluctuations. An engineer might look at this as disentangling the "signal" from the "noise." An economist could consider it a way of distinguishing between a random fluctuation and a correction back to an equilibrium level. A statistician would regard it as a way of doing regression analysis on non-stationary (i.e., stochastically trending) variables that gives statistically valid results.

It is this last (and most technical) aspect of cointegration which accounts for its influence in the econometric world. Cointegration methods will inevitably make their way into the statistical analysis of antitrust issues and, ultimately, into the courtroom. The purpose of this article is to introduce the intuition behind cointegration in the context of antitrust econometrics. The focus will be the multivariate cointegration model pioneered by Johansen.²

The Nature of Time Series and the Problem of Spurious Regression

Econometric studies relevant to antitrust issues are often concerned with time series, i.e., a list of n sequential observations, X_t = {x_1, x_2, x_3, ..., x_n}, of a particular variable that varies over time. The graph of a typical price series is given in Fig. 1, which shows the price of a particular variety and grade of lumber over a 22-year period.
Volume 4, Number 1, Spring 2004

Figure 1: Real Price of Lumber, 1975-1996

This time series consists of 88 quarterly observations. The mean of the sample (i.e., the average price) is $1,096.75, which is indicated on the graph by the horizontal dotted line. What is most obvious about the data in Fig. 1 is its tendency to move from the lower left of the graph to the upper right, which is typical in any market in which prices tend to increase over time (the prices shown are real, that is, they have been corrected to eliminate the effect of inflation). The significance of this is that the sample mean summarizes the price quite poorly. Except for the periods around 1983 or mid-1993, the statement, "The average price over the sample is $1,096.75" is fairly uninformative, greatly overstating the price in the earlier part of the sample while understating it in the latter part. A strategy involving waiting a quarter or two for the price to revert to the mean would nearly always fail. Econometricians call this property "non-stationarity," and the price variable in this case is said to be "non-stationary."

An example of a stationary variable would be the first difference of this price series, i.e., the time series consisting of the changes in price from one observation to the next. The graph of the lumber prices in differences is shown in Fig. 2.

Figure 2: Real Price of Lumber in Differences, 1975-1996

Again, the mean, or average, difference, in this case $6.84, is indicated by the horizontal dotted line. While the price in levels in Fig. 1 crossed the mean three times, the price in differences crosses the mean 42 times. Clearly, although the difference from any given quarter to the next may differ widely from the mean, a quarter or two later the difference reverts to the average. Such a mean-reverting series is said to be "stationary."

The distinction between stationary and non-stationary time series is important because the two kinds of data have dramatically different statistical properties. Standard regression analysis, also known as "ordinary least squares," or "OLS," is said to give the Best Linear Unbiased Estimates, i.e., its estimates are "BLUE," provided that the assumptions underlying the regression model are fulfilled. Regression estimates that are not BLUE are more likely to be excluded under the Daubert standard, and regression studies involving non-stationary data are not BLUE because they do not fulfill the OLS assumptions. Econometric studies that do not take non-stationarity into account are flawed, and have little probative value. It has been shown that regressing one non-stationary series on another leads to false positives, also known as "spurious regression."³ Econometric relationships that appear to be statistically significant in the presence of non-stationarity may not, in fact, reflect any meaningful relationship whatsoever.

Integration and Cointegration

In technical terms, non-stationary time series are said to be "integrated." The cause of such integration can be traced to the accumulation of random influences on the variable. The simplest integrated process is known as a "random walk." The random walk process is said to be integrated of order one because, like the price series in Fig. 1, if it is differenced once it becomes stationary. More generally, if a variable can be made stationary by differencing it d times, it is said to be integrated of order d. The concept of an integrated time series has been extended not only to higher orders of d, but to fractional values of d as well.

Ordinarily, the sum of two non-stationary (integrated) time series is also non-stationary. On occasion, however, a unique linear combination of two integrated time series results in a stationary time series, in which case the data are said to be cointegrated.

Intuitively, two series that are cointegrated may be individually non-stationary, but they will not move too far apart over time. A common heuristic example of two cointegrated series is that of a drunk dog-owner walking in a desert. Assuming the owner is drunk enough to have no sense of direction (and does not double back), his path might resemble a (non-stationary) random walk. His dog's path might also look like a random walk. At any one time they may be close together, and at another time further apart, but over the long run they will move together, and never take off in opposite directions. Before any regression analysis can be considered valid, therefore, the econometrician must be satisfied either that a) the regressors are stationary, or b) the regressors are cointegrated.

Multivariate Cointegration

Cointegration theory reaches far beyond explaining, and being able to correct for, spurious regression. It also easily permits a superior approach to multiple regression modeling that virtually eliminates simultaneity bias. Simultaneity bias in regression analysis results when causality runs not only from the explanatory variable to the dependent variable, but also "feeds back" from the dependent variable to the explanatory variable. This problem is discussed in a widely available reference on scientific evidence, in which Professor Rubinfeld states,

"The assumption of no feedback is especially important in litigation, because it is possible for the defendant (if responsible, for example, for price-fixing or discrimination) to affect the values of the explanatory variables and thus to bias the usual statistical tests that are used in multiple regression."⁴

The problem is illustrated by supposing that the defendant's expert wants to demonstrate that the price of a product, P_t, is determined by three variables: a demand variable, D_t, a cost variable, C_t, and advertising, A_t. Provided that the "no feedback" assumption is fulfilled, the researcher might estimate a multivariate model of the form

\[ P_t = \alpha + \beta_1 D_t + \beta_2 C_t + \beta_3 A_t + \varepsilon_t . \]

Setting aside for the time being the spurious regression problem, if the explanatory variables together account for all but residual random variations in the price, and the parameters, β_i, are all statistically significant, estimates from this model may constitute sound statistical evidence. However, if demand also reacts to price, i.e., P_t also causes variations in D_t, then simultaneity bias will invalidate the results.

To remedy this, Dr. Rubinfeld suggests dropping the questionable variable to determine whether its exclusion makes a difference, or expanding the model by adding one or more equations that explain the feedback effect.

The cointegration approach generally solves the problem by expanding the model into a system of equations in which each variable may influence every other variable. The statistical significance of the dependence of each variable on every other variable can then be tested. Instead of the researcher assuming that P_t should be considered the dependent variable and that D_t, C_t, and A_t should be the explanatory variables, the direction of causality between each pair of variables can be tested within the model to arrive at a specification that does not suffer from simultaneity bias.

provide probative evidence, the cointegrated VAR model represents a superior approach. But because there are numerous contexts in which the interpretation of a stationary cointegrating process can be theoretically meaningful, cointegration analysis can provide a wealth of other kinds of information to a fact-finder. Of particular interest to the antitrust practitioner is the case in which statistical evidence is needed to determine product or market delineation.

The use of statistical correlation between price series has been justifiably criticized (because of the spurious regression problem, inter alia) as a means of determining whether price realizations from potentially substitutable products or from different geographical areas belong to the same or separate markets.⁵ But the cointegration