Stat 520 Forecasting and Time Series

STAT 520 FORECASTING AND TIME SERIES Fall, 2013 Lecture Notes Joshua M. Tebbs Department of Statistics University of South Carolina TABLE OF CONTENTS STAT 520, J. TEBBS Contents 1 Introduction and Examples 1 2 Fundamental Concepts 20 2.1 Summary of important distribution theory . 20 2.1.1 Univariate random variables . 20 2.1.2 Bivariate random vectors . 22 2.1.3 Multivariate extensions and linear combinations . 26 2.1.4 Miscellaneous . 27 2.2 Time series and stochastic processes . 28 2.3 Means, variances, and covariances . 29 2.4 Some (named) stochastic processes . 29 2.5 Stationarity . 38 3 Modeling Deterministic Trends 44 3.1 Introduction . 44 3.2 Estimation of a constant mean . 46 3.3 Regression methods . 51 3.3.1 Straight line regression . 51 3.3.2 Polynomial regression . 57 3.3.3 Seasonal means model . 61 3.3.4 Cosine trend model . 64 3.4 Interpreting regression output . 68 3.5 Residual analysis (model diagnostics) . 70 3.5.1 Assessing normality . 71 3.5.2 Assessing independence . 73 3.5.3 Sample autocorrelation function . 76 i TABLE OF CONTENTS STAT 520, J. TEBBS 4 Models for Stationary Time Series 80 4.1 Introduction . 80 4.2 Moving average processes . 81 4.2.1 MA(1) process . 81 4.2.2 MA(2) process . 85 4.2.3 MA(q) process . 87 4.3 Autoregressive processes . 88 4.3.1 AR(1) process . 89 4.3.2 AR(2) process . 94 4.3.3 AR(p) process . 103 4.4 Invertibility . 105 4.5 Autoregressive moving average (ARMA) processes . 107 5 Models for Nonstationary Time Series 113 5.1 Introduction . 113 5.2 Autoregressive integrated moving average (ARIMA) models . 118 5.2.1 IMA(1,1) process . 122 5.2.2 IMA(2,2) process . 123 5.2.3 ARI(1,1) process . 125 5.2.4 ARIMA(1,1,1) process . 125 5.3 Constant terms in ARIMA models . 127 5.4 Transformations . 129 6 Model Specification 136 6.1 Introduction . 136 6.2 The sample autocorrelation function . 136 6.3 The partial autocorrelation function . 143 6.4 The extended autocorrelation function . 155 ii TABLE OF CONTENTS STAT 520, J. TEBBS 6.5 Nonstationarity . 163 6.6 Other model selection methods . 170 6.7 Summary . 173 7 Estimation 175 7.1 Introduction . 175 7.2 Method of moments . 176 7.2.1 Autoregressive models . 176 7.2.2 Moving average models . 178 7.2.3 Mixed ARMA models . 179 7.2.4 White noise variance . 179 7.2.5 Examples . 180 7.3 Least squares estimation . 185 7.3.1 Autoregressive models . 185 7.3.2 Moving average models . 187 7.3.3 Mixed ARMA models . 188 7.3.4 White noise variance . 189 7.3.5 Examples . 189 7.4 Maximum likelihood estimation . 197 7.4.1 Large-sample properties of MLEs . 200 7.4.2 Examples . 202 8 Model Diagnostics 208 8.1 Introduction . 208 8.2 Residual analysis . 209 8.2.1 Normality and independence . 211 8.2.2 Residual ACF . 215 8.3 Overfitting . 228 iii TABLE OF CONTENTS STAT 520, J. TEBBS 9 Forecasting 231 9.1 Introduction . 231 9.2 Deterministic trend models . 233 9.3 ARIMA models . 237 9.3.1 AR(1) . 238 9.3.2 MA(1) . 244 9.3.3 ARMA(p; q).............................. 248 9.3.4 Nonstationary models . 253 9.4 Prediction intervals . 258 9.4.1 Deterministic trend models . 259 9.4.2 ARIMA models . 261 9.5 Forecasting transformed series . 263 9.5.1 Differencing . 263 9.5.2 Log-transformed series . 265 10 Seasonal ARIMA Models 267 10.1 Introduction . 267 10.2 Purely seasonal (stationary) ARMA models . 269 10.2.1 MA(Q)s ................................ 269 10.2.2 AR(P )s ................................ 274 10.2.3 ARMA(P; Q)s ............................. 278 10.3 Multiplicative seasonal (stationary) ARMA models . 280 10.4 Nonstationary seasonal ARIMA (SARIMA) models . 290 10.5 Additional topics . 306 iv CHAPTER 1 STAT 520, J. TEBBS 1 Introduction and Examples Complementary reading: Chapter 1 (CC). TERMINOLOGY :A time series is a sequence of ordered data. The \ordering" refers generally to time, but other orderings could be envisioned (e.g., over space, etc.). In this class, we will be concerned exclusively with time series that are • measured on a single continuous random variable Y • equally spaced in discrete time; that is, we will have a single realization of Y at each second, hour, day, month, year, etc. UBIQUITY : Time series data arise in a variety of fields. Here are just a few examples. • In business, we observe daily stock prices, weekly interest rates, quarterly sales, monthly supply figures, annual earnings, etc. • In agriculture, we observe annual yields (e.g., crop production), daily crop prices, annual herd sizes, etc. • In engineering, we observe electric signals, voltage measurements, etc. • In natural sciences, we observe chemical yields, turbulence in ocean waves, earth tectonic plate positions, etc. • In medicine, we observe EKG measurements on patients, drug concentrations, blood pressure readings, etc. • In epidemiology, we observe the number of flu cases per day, the number of health-care clinic visits per week, annual tuberculosis counts, etc. • In meteorology, we observe daily high temperatures, annual rainfall, hourly wind speeds, earthquake frequency, etc. • In social sciences, we observe annual birth and death rates, accident frequencies, crime rates, school enrollments, etc. PAGE 1 CHAPTER 1 STAT 520, J. TEBBS Global temperature deviations Global temperature −0.4 −0.2 0.0 0.2 0.4 1860 1880 1900 1920 1940 1960 1980 2000 Year Figure 1.1: Global temperature data. The data are a combination of land-air average temperature anomalies, measured in degrees Centigrade. Example 1.1. Global temperature data. \Global warming" refers to an increase in the average temperature of the Earth's near-surface air and oceans since the mid-20th century and its projected continuation. The data in Figure 1.1 are annual temperature deviations (1856-1997) in deg C, measured from a baseline average. • Data file: globaltemps • There are n = 142 observations. • Measurements are taken each year. • What are the noticeable patterns? • Predictions? (Are we doomed?) PAGE 2 CHAPTER 1 STAT 520, J. TEBBS Amount of milk produced 1300 1400 1500 1600 1700 1994 1996 1998 2000 2002 2004 2006 Year Figure 1.2: United States milk production data. Monthly production figures, measured in millions of pounds, from January, 1994 to December, 2005. Example 1.2. Milk production data. Commercial dairy farming produces the vast majority of milk in the United States. The data in Figure 1.2 are the monthly U.S. milk production (in millions of pounds) from January, 1994 to December, 2005. • Data file: milk (TSA) • There are n = 144 observations. • Measurements are taken each month. • What are the noticeable patterns? • Predictions? PAGE 3 CHAPTER 1 STAT 520, J. TEBBS CREF stock values 170 180 190 200 210 220 0 100 200 300 400 500 Time Figure 1.3: CREF stock data. Daily values of one unit of CREF stock values: August 26, 2004 to August 15, 2006. Example 1.3. CREF stock data. TIAA-CREF is the leading provider of retirement accounts and products to employees in academic, research, medical, and cultural in- stitutions. The data in Figure 1.3 are daily values of one unit of the CREF (College Retirement Equity Fund) stock fund from 8/26/04 to 8/15/06. • Data file: CREF (TSA) • There are n = 501 observations. • Measurements are taken each trading day. • What are the noticeable patterns? • Predictions? (My retirement depends on these!) PAGE 4 CHAPTER 1 STAT 520, J. TEBBS Number of homeruns 50 100 150 200 1920 1940 1960 1980 2000 Year Figure 1.4: Homerun data. Number of homeruns hit by the Boston Red Sox each year during 1909-2010. Example 1.4. Homerun data. The Boston Red Sox are a professional baseball team based in Boston, Massachusetts, and a member of the Major League Baseball's American League Eastern Division. The data in Figure 1.4 are the number of homeruns hit by the team each year from 1909 to 2010. Source: Ted Hornback (Spring, 2010). • Data file: homeruns • There are n = 102 observations. • Measurements are taken each year. • What are the noticeable patterns? • Predictions? PAGE 5 CHAPTER 1 STAT 520, J. TEBBS Number of earthquakes (7.0 or greater) 5 10 15 20 25 30 35 40 1900 1920 1940 1960 1980 2000 Year Figure 1.5: Earthquake data. Number of \large" earthquakes per year from 1900-1998. Example 1.5. Earthquake data. An earthquake occurs when there is a sudden release of energy in the Earth's crust. Earthquakes are caused mostly by rupture of geological faults, but also by other events such as volcanic activity, landslides, mine blasts, and nuclear tests. The data in Figure 1.5 are the number of global earthquakes annually (with intensities of 7.0 or greater) during 1900-1998. Source: Craig Whitlow (Spring, 2010). • Data file: earthquake • There are n = 99 observations. • Measurements are taken each year. • What are the noticeable patterns? • Predictions? PAGE 6 CHAPTER 1 STAT 520, J. TEBBS USC Columbia fall enrollment USC Columbia fall 5000 10000 15000 20000 25000 1960 1970 1980 1990 2000 2010 Year Figure 1.6: University of South Carolina fall enrollment data. Number of students reg- istered for classes.

Stat 520 Forecasting and Time Series

Chapter 2 Time Series and Forecasting

Demand Forecasting

Generalized Linear Models and Generalized Additive Models

Here Is an Example Where I Analyze the Lags Needed to Analyze Okun's

Moving Average Filters

Stochastic Process - Introduction

Parameters Estimation for Asymmetric Bifurcating Autoregressive Processes with Missing Data

Autocovariance Function Estimation Via Penalized Regression

LECTURES 2 - 3 : Stochastic Processes, Autocorrelation Function

A University of Sussex Phd Thesis Available Online Via Sussex

Regularity of Solutions and Parameter Estimation for Spde’S with Space-Time White Noise

Variance Function Regressions for Studying Inequality