
Learn About Cross-Correlations in Stata With Data From the USDA Feed Grains Database (1876–2015)

© 2020 SAGE Publications, Ltd. All Rights Reserved. This PDF has been generated from SAGE Research Methods Datasets.

Student Guide

Introduction

This dataset example introduces researchers to estimating cross-correlations between two time series variables. Cross-correlations help researchers explore whether two variables are related to each other and, if so, whether movement in one variable tends to precede or follow movement in the other. This example uses a subset of data from the United States Department of Agriculture (USDA) Feed Grains Database. It examines the cross-correlation between the average annual prices per bushel for barley and oats in the United States from 1876 to 2015. Understanding whether prices for two grains are correlated and, if so, whether one price leads or follows the other could help policymakers, farmers, and economists make better forecasts of future agricultural prices.

What Is a Cross-Correlation?

When a single variable is measured repeatedly over time it is referred to as a time series. Values of two time series variables may be correlated with each other. They may be correlated contemporaneously, meaning that when values of one of the variables go up or down, the values of the other variable tend to go up or down in the same years. However, they could also be correlated at different time points. For example, increases in one variable might lead to increases in the other variable at future points in time rather than at the same time. Cross-correlations can be used to evaluate whether two time series variables are associated with each other, and if so, whether movement in one precedes the other, follows the other, or happens at the same time. In this regard, a cross-correlation might be used as a first step toward evaluating a potential causal relationship between two variables measured over time. Cross-correlations are often estimated prior to estimating Vector Autoregression (VAR) models and/or testing whether or not two time series variables cause each other.

A series of cross-correlations computed and plotted across a range of lags is sometimes called a cross-correlation function. Cross-correlation functions are generally presented in plots with a bar at each lag indicating the strength and direction of the correlation between the two variables when they are that many time periods apart.

Consider a hypothetical cross-correlation function for two variables named X and Y. Further suppose that we ordered the variables as first X then Y. A cross-correlation function would estimate the correlation between X and Y at several different lag lengths, both positive and negative. A significant correlation at negative lags would indicate that movement in the first variable, X, occurs in time before a corresponding movement in Y occurs. In other words, significant cross-correlations at negative lags indicate that the first variable leads the second variable. In contrast, significant cross-correlations at positive lags indicate that changes in X follow changes in Y.
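The lag convention described above can be sketched in a few lines of Python. This is a minimal illustration, not the routine any statistical package uses: the function names (`pearson`, `cross_correlation`) and the toy series are hypothetical, and the sketch simply correlates the overlapping portions of the two series, whereas statistical software typically uses full-sample means and variances, so values may differ slightly. It assumes the convention stated in the text, where a negative lag pairs an earlier value of X with a later value of Y.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / sqrt(var_a * var_b)

def cross_correlation(x, y, lag):
    """Correlate x at time t + lag with y at time t.

    Assumed convention (taken from the text): a negative lag pairs an
    earlier value of x with a later value of y, so x leads y.
    """
    if lag < 0:
        # Drop the last |lag| values of x and the first |lag| values of y.
        return pearson(x[:lag], y[-lag:])
    if lag > 0:
        # Drop the first lag values of x and the last lag values of y.
        return pearson(x[lag:], y[:-lag])
    return pearson(x, y)

# Toy series (not the USDA data): y repeats x one period later,
# so x leads y by exactly one period.
x = [1, 3, 2, 5, 4, 6, 8, 7]
y = [0, 1, 3, 2, 5, 4, 6, 8]

for lag in (-1, 0, 1):
    print(lag, round(cross_correlation(x, y, lag), 3))
```

In this toy case the correlation is perfect at lag −1 and weaker at lags 0 and +1, which is the pattern the text describes for a first variable that leads the second.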

It is possible for cross-correlations between two variables to be significant for both negative and positive lags. Such a pattern would indicate that changes in each variable both lead and respond to changes in the other variable. Finally, it is also possible that the two series are only correlated contemporaneously, which would manifest itself as a significant correlation at lag = 0 but not at any negative or positive lags.

Cross-correlations can be estimated on two time series variables prior to making any transformations in them. However, such cross-correlations can be misleading if one or both of the variables are non-stationary. A stationary time series is one where the main statistical properties of the series, such as its mean, variance, autocorrelation, etc., are constant throughout the entire series. A trend in the data, such as a steady increase on average over time in the values of the series, violates the idea of a constant mean.

Stationarity can be evaluated graphically by examining the autocorrelation function (ACF) of each time series of interest. See the SAGE Research Methods Datasets example on Time Series ACFs and PACFs for more information. The most common treatment for non-stationarity in a time series is to calculate the first difference of the series (also called differencing the series). You calculate the first difference of a series by subtracting the previous value from each observation, which leaves a series that is one observation shorter than the original. Then you can compute the cross-correlations between the two differenced series.
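As a minimal sketch of first differencing, the Python snippet below subtracts each observation's previous value from it. The function name `first_difference` and the price values are hypothetical and chosen only for illustration; they are not taken from the USDA data.

```python
def first_difference(series):
    """Return series[t] - series[t-1] for every t after the first."""
    return [curr - prev for prev, curr in zip(series, series[1:])]

# Hypothetical prices with a steady upward trend (not the USDA data).
prices = [1.00, 1.10, 1.25, 1.30, 1.50]
changes = first_difference(prices)

# The differenced series has one fewer observation than the original.
print(len(prices), len(changes))
print([round(c, 2) for c in changes])
```

The trending price levels become a series of year-to-year changes with no obvious trend. Because the first observation has no previous value, a series of 140 annual prices yields 139 differenced values.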

Some researchers suggest estimating a full ARIMA model for both time series and then estimating the cross-correlation function on the residuals of those two ARIMA models (see the SAGE Research Methods Datasets example for Time Series ARIMA Models for more information). Following this approach requires that the two ARIMA models have the same configuration. For this example we will focus on just differencing each time series in question to ensure stationarity.

Because cross-correlation functions focus on computing correlations, they are most appropriate for use when examining two continuous variables. In theory, continuous variables are variables that can take on any numerical value within their range. In practice, variables that take on a large number of different values within their range are treated as continuous. Variables like age measured in years, income measured in dollars, or unemployment measured in percentages are all good examples.

Illustrative Example: U.S. Barley and Oats Prices per Bushel, 1876–2015

This example explores annual average prices per bushel for barley and oats in the United States from 1876 to 2015. The research question is a simple descriptive one:

Are annual average prices per bushel for barley and oats in the U.S. correlated with each other over time?

The Data

This example uses two variables from the USDA Feed Grains Database:

• Average barley prices per bushel in the U.S. in a given year (barleyprice), measured annually in dollars.
• Average oats prices per bushel in the U.S. in a given year (oatsprice), measured annually in dollars.

Average annual barley prices range from a low of 0.23 to a high of 6.43 dollars per bushel, with a mean of 1.35 and a standard deviation of 1.21. Average annual oats prices range from a low of 0.15 to a high of 3.89 dollars per bushel, with a mean of 0.86 and a standard deviation of 0.75. The dataset contains 140 time points, one observation per year. Both variables are continuous and are measured once per year without gaps for 140 years. This makes them appropriate for producing a cross-correlation function.

Analyzing the Data

Figure 1 presents a cross-correlation function of average annual barley and oats prices per bushel in the United States from 1876 to 2015. Each series was differenced before producing the cross-correlation function to ensure that each series was stationary.

The graph is titled “Barley Price per Bushel with Oats Price per Bushel.” The vertical axis is labeled “CCF” and ranges from negative 1 to 1 in increments of 0.5. The horizontal axis is labeled “Lag Number” and ranges from negative 7 to 7 in increments of 1.

The bars denote the coefficient values. A horizontal line drawn at 0.18 on the vertical axis denotes the upper confidence limit, while a horizontal line drawn at negative 0.18 denotes the lower confidence limit. The coefficient values for the given lag numbers are as follows: (negative 7, 0.19); (negative 6, negative 0.02); (negative 5, 0.04); (negative 4, 0.17); (negative 3, negative 0.26); (negative 2, negative 0.295); (negative 1, 0.30); (0, 0.75); (1, negative 0.10); (2, negative 0.32); (3, negative 0.06); (4, 0.17); (5, 0.17); (6, negative 0.10); (7, negative 0.01). All values are approximated.

Figure 1: Cross-correlation function for first differences in average annual barley and oats prices per bushel in the United States from 1876 to 2015, USDA Database.
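One way to read a plot like Figure 1 is to list the lags whose bars extend beyond the confidence limits. The Python sketch below does this with the approximate coefficient values and the ±0.18 limits given in the figure description; the variable names are hypothetical. Because the values are approximations read from the plot, a lag that sits only just inside or outside the limits should be interpreted with caution.

```python
# Approximate CCF values read from the description of Figure 1.
ccf = {
    -7: 0.19, -6: -0.02, -5: 0.04, -4: 0.17, -3: -0.26,
    -2: -0.295, -1: 0.30, 0: 0.75, 1: -0.10, 2: -0.32,
    3: -0.06, 4: 0.17, 5: 0.17, 6: -0.10, 7: -0.01,
}

LIMIT = 0.18  # confidence limits are drawn at +0.18 and -0.18

# Lags whose coefficient falls outside the confidence limits.
outside = {lag: r for lag, r in ccf.items() if abs(r) > LIMIT}

# Lag with the largest coefficient in absolute value.
strongest_lag = max(ccf, key=lambda lag: abs(ccf[lag]))

for lag in sorted(outside):
    print(lag, outside[lag])
print("strongest lag:", strongest_lag)
```

Under these approximate values, the coefficient at lag −7 (0.19) falls only marginally outside the limit, while those at lags −4, +4, and +5 (0.17) fall just inside it.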


Figure 1 shows that the strongest correlation between the two series occurs at lag 0, where the correlation equals 0.76. This shows that changes in barley and oats prices are strongly and positively correlated with each other contemporaneously. In other words, these two grain prices move up and down together in the same year. This suggests that some other factor influences the price of these two grains in the same way at the same time.

Figure 1 also shows significant negative correlations at lags −2 and −3. Given that barley prices were entered into the cross-correlation function first, this suggests that increases in barley prices at a given point in time precede decreases in oats prices by two to three years. In other words, higher than average barley prices tend to lead to lower than average oats prices two to three years later.

Figure 1 also shows a significant negative correlation at lag +2. This indicates that higher than average oats prices tend to lead to lower than average barley prices two years later.

Finally, Figure 1 shows a positive correlation at lag −1 between barley and oats prices. This suggests that an increase in barley prices in a given year is moderately associated with an increase in oats prices the following year. The absence of a significant correlation at lag +1 shows that the reverse is not true. We note that a subsequent analysis of the cross-correlation function of the residuals from ARIMA(0,1,1) models estimated on each series produced results similar to those shown in Figure 1, except that the correlation at lag −1 was no longer statistically significant.

Overall, the results presented in Figure 1 indicate that both grain prices respond in the same way to exogenous factors, but the negative correlations at both positive and negative lags of about the same length also suggest that the prices of the two grains are negatively related to each other over time. Neither barley prices nor oats prices appear to be better at predicting future changes in the prices of the other.

Presenting the Results

Results for a cross-correlation function of two time series variables can be presented as follows:

“We used a subset of data from the USDA Database reporting feed grain production in the United States from 1876 to 2015 to evaluate the following research question:

Are annual average prices per bushel for barley and oats in the U.S. correlated with each other over time?

Figure 1 presents a cross-correlation function for the first differences of annual average prices per bushel of barley and oats in the United States. Each series was differenced to ensure stationarity. The strongest correlation in Figure 1 occurs at lag 0, which indicates that the two grain prices are strongly contemporaneously correlated. This strong positive correlation indicates that both grain prices respond similarly and simultaneously to some other exogenous factors. Figure 1 also shows negative cross-correlations at lags −2 and −3, suggesting that higher than average barley prices lead to lower than average oats prices two to three years later. However, Figure 1 also shows a negative correlation at lag +2, suggesting that higher than average oats prices lead to lower than average barley prices two years later. Thus, while the two grains positively respond to contemporaneous factors, they are moderately negatively related to each other over time. Further analysis estimating a Vector Autoregression (VAR) model and/or testing for Granger Causality should be explored.”

Review

Cross-correlations help researchers explore whether two variables are related to each other and, if so, whether movement in one variable tends to precede or follow movement in the other. Cross-correlations are often estimated as a first step to a more complex analysis of the dynamic interrelationship between two (or more) time series variables.

You should know:

• What types of variables are suitable for estimating a cross-correlation function.
• How to produce and interpret a cross-correlation function.
• How to report the results of a cross-correlation function for two time series variables.

Your Turn

You can download this sample dataset along with a guide showing how to produce a cross-correlation function using statistical software. The sample dataset also includes a variable named cornprice, which measures the average annual price of corn per bushel in the U.S. over the same 140-year time period. See if you can reproduce the results presented here, and try producing your own cross-correlation function by replacing either the price of barley or the price of oats with the price of corn.
