Learn About Time Series Cross-Correlations in Stata With Data From the USDA Feed Grains Database (1876–2015)

© 2020 SAGE Publications, Ltd. All Rights Reserved. This PDF has been generated from SAGE Research Methods Datasets.

Student Guide

Introduction

This dataset example introduces researchers to estimating cross-correlations between two time series variables. Cross-correlations help researchers explore whether two variables are related to each other and, if so, whether movement in one variable tends to precede or follow movement in the other. This example uses a subset of data from the United States Department of Agriculture (USDA) Feed Grains Database. It examines the cross-correlation between the average annual prices per bushel for barley and oats in the United States from 1876 to 2015. Understanding whether prices for two grains are correlated and, if so, whether one price leads or follows the other could help policy makers, farmers, and economists make better forecasts of future agricultural prices.

What Is a Cross-Correlation?

When a single variable is measured repeatedly over time, it is referred to as a time series. Values of two time series variables may be correlated with each other. They may be correlated contemporaneously, meaning that when values of one of the variables go up or down, the values of the other variable tend to go up or down in the same years. However, they could also be correlated at different time points.
For example, increases in one variable might lead to increases in the other variable at future points in time rather than at the same time. Cross-correlations can be used to evaluate whether two time series variables are associated with each other and, if so, whether movement in one precedes the other, follows the other, or happens at the same time. In this regard, a cross-correlation might be used as a first step toward evaluating a potential causal relationship between two variables measured over time. Cross-correlations are often estimated prior to estimating Vector Autoregression (VAR) models and/or testing whether or not two time series variables cause each other.

When researchers compute and plot a series of cross-correlations across a range of lags, the result is sometimes called a cross-correlation function. Cross-correlation functions are generally presented in plots with a bar at each lag indicating the strength and direction of the correlation between the two variables when they are that many time periods apart.

Consider a hypothetical cross-correlation function for two variables named X and Y. Further suppose that we ordered the variables as first X, then Y. A cross-correlation function would estimate the correlation between X and Y at several different lag lengths, both positive and negative. A significant correlation at negative lags would indicate that movement in the first variable, X, occurs in time before a corresponding movement in Y occurs. In other words, significant cross-correlations at negative lags indicate that the first variable leads the second variable. In contrast, significant cross-correlations at positive lags indicate that changes in X follow changes in Y. It is possible for cross-correlations between two variables to be significant for both negative and positive lags. Such a pattern would indicate that changes in each variable both lead and respond to changes in the other variable.
Finally, it is also possible that the two series are only correlated contemporaneously, which would manifest itself as a significant correlation at lag = 0 but not at any negative or positive lags.

Cross-correlations can be estimated on two time series variables prior to making any transformations in them. However, such cross-correlations can be misleading if one or both of the variables are non-stationary. A stationary time series is one where the main statistical properties of the series, such as its mean, variance, and autocorrelation, are constant throughout the entire series. A trend in the data, such as a steady increase on average over time in the values of the series, violates the idea of a constant mean. Stationarity can be evaluated graphically by examining the autocorrelation function (ACF) of each time series of interest. See the SAGE Research Methods Datasets example on Time Series ACFs and PACFs for more information. The most common treatment for non-stationarity in a time series is to calculate the first difference of the series (also called differencing the series). You calculate the first difference of a series by taking each observation and subtracting from it its previous value. Then you can compute the cross-correlations between the two differenced series.

Some researchers suggest estimating a full ARIMA model for both time series and then estimating the cross-correlation function on the residuals of those two ARIMA models (see the SAGE Research Methods Datasets example for Time Series ARIMA Models for more information). Following this approach requires that the two ARIMA models have the same configuration. For this example, we will focus on just differencing each time series in question to ensure stationarity.
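The two steps described above, differencing each series and then correlating the differenced series at a range of lags, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure Stata uses internally. The function names and toy numbers are hypothetical. Each lag's correlation is computed as a plain Pearson correlation on the overlapping observations, a simplification of the usual estimator. The sign convention is chosen to match the description in this guide (negative lags mean the first variable leads). Software packages differ on that convention, so it is worth checking the documentation of whichever tool is used.

```python
from math import sqrt


def first_difference(series):
    """Subtract the previous value from each observation (one observation is lost)."""
    return [curr - prev for prev, curr in zip(series, series[1:])]


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / sqrt(var_a * var_b)


def cross_correlations(x, y, max_lag):
    """Return {k: corr(x[t + k], y[t])} for k from -max_lag to max_lag.

    Assumed convention (matching the text): a negative k pairs an earlier
    value of x with a later value of y, so a strong correlation at a
    negative lag suggests that x leads y.
    """
    n = len(x)
    result = {}
    for k in range(-max_lag, max_lag + 1):
        if k >= 0:
            xs, ys = x[k:], y[:n - k]
        else:
            xs, ys = x[:n + k], y[-k:]
        result[k] = pearson(xs, ys)
    return result


# Hypothetical trending series (not USDA values); y repeats x one period later.
x = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 8.0, 7.0, 9.0]
y = [0.5] + x[:-1]

dx = first_difference(x)
dy = first_difference(y)
ccf = cross_correlations(dx, dy, max_lag=2)

# Lag with the largest correlation in absolute value.
best_lag = max(ccf, key=lambda k: abs(ccf[k]))
print(best_lag, round(ccf[best_lag], 3))
```

In this toy case the largest correlation falls at lag -1, because y was built to copy x one period later, so x leads y by one period. With the 140 annual observations used in this example, differencing would leave 139 observations per series.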
Because cross-correlation functions focus on computing correlations, they are most appropriate for use when examining two continuous variables. In theory, continuous variables are variables that can take on any numerical value within their range. In practice, variables that take on a large number of different values within their range are treated as continuous. Variables like age measured in years, income measured in dollars, or unemployment measured in percentages are all good examples.

Illustrative Example: U.S. Barley and Oats Prices per Bushel, 1876–2015

This example explores annual average prices per bushel for barley and oats in the United States from 1876 to 2015. The research question is a simple descriptive one: Are annual average prices per bushel for barley and oats in the U.S. correlated with each other over time?

The Data

This example uses two variables from the USDA Feed Grains Database:

• Average barley prices per bushel in the U.S. in a given year (barleyprice), measured annually in dollars.
• Average oats prices per bushel in the U.S. in a given year (oatsprice), measured annually in dollars.

Average annual barley prices range from a low of 0.23 to a high of 6.43 dollars per bushel, with a mean of 1.35 and a standard deviation of 1.21. Average annual oats prices range from a low of 0.15 to a high of 3.89 dollars per bushel, with a mean of 0.86 and a standard deviation of 0.75. The dataset contains 140 annual time points, so there are 140 observations. Both variables are continuous and are measured once per year without gaps for 140 years. This makes them appropriate for producing a cross-correlation function.
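The requirement that a series be measured at regular intervals without gaps can be checked before any cross-correlations are computed. The short Python sketch below illustrates the idea. The helper name is hypothetical, and the year values are generated from the 1876 to 2015 range stated above rather than read from the actual data file.

```python
def is_annual_without_gaps(years):
    """Return True when each year is exactly one more than the year before it."""
    return all(later - earlier == 1 for earlier, later in zip(years, years[1:]))


# Years implied by the stated range (an assumption; not read from the dataset).
years = list(range(1876, 2016))  # 1876 through 2015 inclusive

n_obs = len(years)
print(n_obs, is_annual_without_gaps(years))
```

A year variable with a missing year, such as 1876, 1877, 1879, would fail this check, and the lags between observations would no longer represent equal amounts of time.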
Analyzing the Data

Figure 1 presents a cross-correlation function of average annual barley and oats prices per bushel in the United States from 1876 to 2015. Each series was differenced before producing the cross-correlation function to ensure that each series was stationary.

The graph is titled “Barley Price per Bushel with Oats Price per Bushel.” The vertical axis is labeled “CCF” and ranges from -1 to 1 in increments of 0.5. The horizontal axis is labeled “Lag Number” and ranges from -7 to 7 in increments of 1. The bars denote the coefficient values. A horizontal line at 0.18 on the vertical axis marks the upper confidence limit, and a horizontal line at -0.18 marks the lower confidence limit. The coefficient values for the given lag numbers are as follows: (-7, 0.19); (-6, -0.02); (-5, 0.04); (-4, 0.17); (-3, -0.26); (-2, -0.295); (-1, 0.30); (0, 0.75); (1, -0.10); (2, -0.32); (3, -0.06); (4, 0.17); (5, 0.17); (6, -0.10); (7, -0.01). All values are approximated.

Figure 1: Cross-correlation function for first differences in average annual barley and oats prices per bushel in the United States from 1876 to 2015, USDA Feed Grains Database.

Figure 1 shows that the strongest correlation between the two series occurs at lag 0. The correlation itself equals 0.76.
This shows that changes in barley and oat prices are strongly positively correlated with each other contemporaneously. In other words, these two grain prices move up and down together in the same year.
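One way to read a cross-correlation plot systematically is to compare each coefficient with the confidence limits. The Python sketch below does this using the approximate coefficients and the ±0.18 limits given in the description of Figure 1. Because those values are read off a graph, coefficients close to the limit (such as 0.19 at lag -7) should be treated with caution. The variable names are illustrative only.

```python
# Approximate coefficients from the description of Figure 1 (lag: coefficient).
ccf = {
    -7: 0.19, -6: -0.02, -5: 0.04, -4: 0.17, -3: -0.26,
    -2: -0.295, -1: 0.30, 0: 0.75, 1: -0.10, 2: -0.32,
    3: -0.06, 4: 0.17, 5: 0.17, 6: -0.10, 7: -0.01,
}

# Approximate confidence limit shown in the figure.
LIMIT = 0.18

# Lags whose coefficient falls outside the confidence limits.
outside = {lag: r for lag, r in ccf.items() if abs(r) > LIMIT}

# Lag with the largest coefficient in absolute value.
strongest = max(ccf, key=lambda lag: abs(ccf[lag]))

print(sorted(outside))
print(strongest)
```

The sketch only flags which lags merit a closer look. It does not replace the significance information reported by the statistical software.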
