ARMA Time-Series Modeling with Graphical Models
Bo Thiesson, David Maxwell Chickering, David Heckerman, Christopher Meek
Microsoft Research

Abstract

We express the classic ARMA time-series model as a directed graphical model. In doing so, we find that the deterministic relationships in the model make it effectively impossible to use the EM algorithm for learning model parameters. To remedy this problem, we replace the deterministic relationships with Gaussian distributions having a small variance, yielding the stochastic ARMA (σARMA) model. This modification allows us to use the EM algorithm to learn parameters and to forecast, even in situations where some data is missing. This modification, in conjunction with the graphical-model approach, also allows us to include cross predictors in situations where there are multiple time series and/or additional non-temporal covariates. More surprisingly, experiments suggest that the move to stochastic ARMA yields improved accuracy through better smoothing. We demonstrate improvements afforded by cross prediction and better smoothing on real data.

1 Introduction

Graphical models have been used to represent time-series models for almost two decades (e.g., Dean and Kanazawa, 1988; Cooper, Horvitz, and Heckerman, 1988). The benefits of such a representation include the ability to apply standard graphical-model inference algorithms for parameter estimation (including estimation in the presence of missing data), for prediction, and for cross prediction, that is, the use of one time series to help predict another time series (e.g., Ghahramani, 1998; Meek, Chickering, and Heckerman, 2002; Bach and Jordan, 2004).

In this paper, we express the classic autoregressive moving average (ARMA) time-series model (e.g., Box, Jenkins, and Reinsel, 1994) as a graphical model to achieve similar benefits. As we shall see, the ARMA model includes deterministic relationships, making it effectively impossible to estimate model parameters via the Expectation-Maximization (EM) algorithm. Consequently, we introduce a variation of the ARMA model for which the EM algorithm can be applied. Our modification, called stochastic ARMA (σARMA), replaces the deterministic component of the ARMA model with a Gaussian distribution having a small variance. To our surprise, this addition not only has the desired effect of making the EM algorithm effective for parameter estimation, but our experiments also suggest that it provides a smoothing technique that produces more accurate estimates.

We have chosen to focus on ARMA time-series models in this paper, but our approach applies immediately to autoregressive integrated moving average (ARIMA) time-series models as well. In particular, an ARIMA model for a time series is simply an ARMA model for a preprocessed version of that same time series. This preprocessing consists of d consecutive differencing transformations, where each transformation replaces the observations with the differences between successive observations. For example, when d = 0 an ARIMA model is a regular ARMA model, when d = 1 an ARIMA model is an ARMA model of the differences, and when d = 2 an ARIMA model is an ARMA model of the differences of the differences. In practice, d ≤ 2 is almost always sufficient for good results (Box, Jenkins, and Reinsel, 1994).
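To make the differencing preprocessing concrete, the following minimal sketch (the function name difference and the example series are our own, not the paper's) applies d consecutive differencing transformations:

    def difference(y, d):
        """Apply d consecutive differencing transformations to the series y."""
        for _ in range(d):
            y = [y[t] - y[t - 1] for t in range(1, len(y))]
        return y

    series = [10.0, 12.0, 15.0, 19.0, 24.0]
    print(difference(series, 0))  # d = 0: the original series (a plain ARMA model)
    print(difference(series, 1))  # d = 1: first differences -> [2.0, 3.0, 4.0, 5.0]
    print(difference(series, 2))  # d = 2: differences of differences -> [1.0, 1.0, 1.0]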
In Section 2, we review the ARMA model and introduce our stochastic variant. We also extend the model to allow for cross prediction, yielding σARMAxp, a model more general than previous extensions to cross-prediction ARMA (e.g., the vector ARMA model; Reinsel, 2003). In Section 3, we describe how the EM algorithm can be applied to σARMAxp for parameter estimation. Our approach is, to our knowledge, the first EM-based alternative to parameter estimation in the ARMA model class. In Section 4, we describe how to predict future observations (that is, forecast) using σARMAxp. In Sections 5 and 6, we demonstrate the utility of our extensions in an evaluation using two real data collections. We show that (1) the stochastic variant of ARMA produces more accurate predictions than those of standard ARMA, (2) estimation and prediction in the face of missing data using our approach yields better forecasts than a heuristic approach in which missing data are filled in by interpolation, and (3) the inclusion of cross predictors can lead to more accurate forecasting.

2 Time-Series Models

In this section, we review the well-known class of autoregressive moving average (ARMA) time-series models and define two closely related stochastic variations, the σARMA and the σARMA∗ classes. We also define a generalization of these two stochastic variations, called σARMAxp and σARMA∗xp, which additionally allow for selective cross predictors from related time series.

We begin by introducing notation and nomenclature. We denote a temporal sequence of observation variables by Y = (Y_1, Y_2, …, Y_T). Time-series data is a sequence of values for these variables denoted by y = (y_1, y_2, …, y_T). We suppose that these observations are obtained at discrete, equispaced intervals of time. In this paper we will consider incomplete observation sequences in the sense that some of the observation variables may have missing observations. For notational convenience, we will represent such a sequence of observations as a complete sequence, and it will be clear from context that this sequence has missing observations.

In the time-series models that we consider in this paper, we associate a latent "white noise" variable with each observable variable. These latent variables are denoted E = (E_1, E_2, …, E_T).

Some of the models will contain cross-predictor sequences. A cross-predictor sequence is a sequence of observation variables from a related time series, which is used in the predictive model for the time series under consideration. For each cross-predictor sequence in a model, a cross-predictor variable is associated with each observable variable. For instance, Y′ = (Y′_1, Y′_2, …, Y′_{T′}) and Y″ = (Y″_1, Y″_2, …, Y″_{T″}) may be related time-series sequences where Y′_{t−1}, Y′_{t−12}, and Y″_{t−1} are cross-predictor variables for Y_t. Let C_t denote the vector of cross-predictor variables for Y_t. The set of cross-predictor vectors for all variables Y is denoted C = (C_1, C_2, …, C_T).

Our stochastic time-series models handle incomplete time-series sequences in the sense that some values in the sequence may be missing. Hence, in real-world situations where the lengths of multiple cross-predicting time series do not match, we have two possibilities: we can introduce observation variables with missing values, or we can shorten a sequence of observation variables, as necessary. In the following we will assume that Y, E, and C are all of the same length. For any sequence, say Y, we denote the sub-sequence consisting of the i'th through the j'th elements by Y_i^j = (Y_i, Y_{i+1}, …, Y_j), i < j.

2.1 ARMA Models

In slightly different notation than usual (see, e.g., Box, Jenkins, and Reinsel, 1994 or Ansley, 1979), the ARMA(p, q) time-series model is defined as the deterministic relation

    Y_t = ζ + Σ_{j=0}^{q} β_j E_{t−j} + Σ_{i=1}^{p} α_i Y_{t−i}    (1)

where ζ is the intercept, Σ_{i=1}^{p} α_i Y_{t−i} is the autoregressive (AR) part, Σ_{j=0}^{q} β_j E_{t−j} is the moving average (MA) part with β_0 fixed as 1, and E_t ∼ N(0, γ) is "white noise" with the E_t mutually independent for all t. The construction of this model therefore involves estimation of the free parameters ζ, (α_1, …, α_p), (β_1, …, β_q), and γ.

For a constructed model, the one step-ahead forecast Ŷ_t given the past can be computed as

    Ŷ_t = ζ + Σ_{j=1}^{q} β_j E_{t−j} + Σ_{i=1}^{p} α_i Y_{t−i}    (2)

where we exploit that at any time t, the error in the ARMA model can be determined as the difference between the actual observed value and the one step-ahead forecast:

    E_t = Y_t − Ŷ_t    (3)

The variance for this forecast is γ.
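The recursion defined by (2) and (3) is straightforward to state in code. The sketch below computes the one step-ahead forecasts and the implied errors for fixed parameters; conditioning on the first R = max(p, q) observations follows the text above, while initializing the unavailable errors for t < R to zero is a simplifying assumption of ours, not something the paper prescribes.

    def arma_one_step_forecasts(y, zeta, alphas, betas):
        """One step-ahead forecasts (2) and errors (3) for an ARMA(p, q) model.

        alphas = (α_1, ..., α_p) and betas = (β_1, ..., β_q); β_0 = 1 is implicit.
        """
        p, q = len(alphas), len(betas)
        R = max(p, q)
        errors = [0.0] * len(y)      # E_t; zero for t < R by assumption
        forecasts = [None] * len(y)  # no forecast for the conditioned-on prefix
        for t in range(R, len(y)):
            y_hat = (zeta
                     + sum(betas[j - 1] * errors[t - j] for j in range(1, q + 1))
                     + sum(alphas[i - 1] * y[t - i] for i in range(1, p + 1)))
            forecasts[t] = y_hat      # equation (2)
            errors[t] = y[t] - y_hat  # equation (3)
        return forecasts, errors

    # Example with p = 2, q = 1 (illustrative numbers):
    forecasts, errors = arma_one_step_forecasts(
        y=[1.0, 1.2, 0.9, 1.1, 1.3], zeta=0.1, alphas=[0.5, -0.2], betas=[0.3])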
An ARMA model can be represented by a directed graphical model (or Bayes net) with both stochastic and deterministic nodes. A node is deterministic if the value of the variable represented by that node is a deterministic function of the values for the variables represented by the nodes pointing to it in the graphical representation. From the definition of the ARMA models, we see that the observable variables (the Y's) are represented by deterministic nodes and the error variables (the E's) are represented by stochastic nodes. The relations between variables are defined by (1); accordingly, Y_{t−p}, …, Y_{t−1} and E_{t−q}, …, E_t all point to Y_t. In this paper we are interested in conditional likelihood models, where we condition on the first R = max(p, q) variables. Relations between variables for t ≤ R can therefore be ignored. The graphical representation for an ARMA(2,2) model is shown in Figure 1. It should be noted that if we artificially extend the time series back in time for R (unobserved) time steps, this model represents what is known in the literature as the exact likelihood model.

2.2 σARMA Models

In the stochastic ARMA (σARMA) model, we replace the deterministic relation (1) with a conditional Gaussian distribution whose mean is the right-hand side of (1) and whose variance is fixed at a small value. A closely related class is obtained by additionally letting β_0 vary freely, rather than fixing it as 1. We will denote this class of models as σARMA∗. A σARMA (or σARMA∗) model has the same graphical representation as the similar ARMA model, except that deterministic nodes are now stochastic.

2.3 σARMAxp Models
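Based only on the description given so far (a σARMAxp model augments σARMA with selective cross predictors C_t, and σARMA replaces the deterministic relation with a Gaussian of small fixed variance), one possible generative step can be sketched as follows. The linear cross-predictor term and the names etas and tau are illustrative assumptions of ours, not the paper's definition.

    import random

    def sigma_arma_xp_step(zeta, alphas, betas, etas, y_past, e_past, e_t, c_t, tau):
        """Draw Y_t for a σARMAxp-style model (an illustrative form, not the paper's).

        y_past = (Y_{t-1}, ..., Y_{t-p}), e_past = (E_{t-1}, ..., E_{t-q}),
        c_t is the cross-predictor vector C_t, and tau is the small fixed variance.
        """
        mean = (zeta
                + e_t                                         # the β_0 = 1 term of (1)
                + sum(b * e for b, e in zip(betas, e_past))   # MA part
                + sum(a * y for a, y in zip(alphas, y_past))  # AR part
                + sum(h * c for h, c in zip(etas, c_t)))      # assumed linear cross term
        return random.gauss(mean, tau ** 0.5)                 # stochastic, not deterministic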