
Simultaneously Reconciled Quantile Forecasting of Hierarchically Related Time Series

Xing Han (UT Austin), Sambarta Dasgupta (IntuitAI), Joydeep Ghosh (UT Austin)

arXiv:2102.12612v1 [cs.LG] 25 Feb 2021. Proceedings of the 24th International Conference on Artificial Intelligence and Statistics (AISTATS) 2021, San Diego, California, USA. PMLR: Volume 130. Copyright 2021 by the author(s).

Abstract

Many real-life applications involve simultaneously forecasting multiple time series that are hierarchically related via aggregation or disaggregation operations. For instance, commercial organizations often want to forecast inventories simultaneously at store, city, and state levels for resource planning purposes. In such applications, it is important that the forecasts, in addition to being reasonably accurate, are also consistent w.r.t. one another. Although forecasting such hierarchical time series has been pursued by economists and data scientists, the current state-of-the-art models rely on strong assumptions, e.g., all forecasts being unbiased estimates and the noise distribution being Gaussian. Moreover, state-of-the-art models have not harnessed the power of modern nonlinear models, especially ones based on deep learning. In this paper, we propose using a flexible nonlinear model that optimizes a quantile regression loss coupled with suitable regularization terms to maintain the consistency of forecasts across hierarchies. The theoretical framework introduced herein can be applied to any forecasting model with an underlying differentiable loss function. A proof of optimality of our proposed method is also provided. Simulation studies over a range of datasets highlight the efficacy of our approach.

1 Introduction

Hierarchical time series refers to a set of time series organized in a logical hierarchy with parent-child relations, governed by a set of aggregation and disaggregation operations (Hyndman et al., 2011; Taieb et al., 2017). These aggregations and disaggregations can occur across multiple time series or over the same time series across multiple time granularities. An example of the first kind is forecasting demand at county, city, state, and country levels (Hyndman et al., 2011). An example of the second kind is forecasting demand at different time granularities such as daily, weekly, and monthly (Athanasopoulos et al., 2017). The need to forecast multiple hierarchically related time series arises in many applications, from financial forecasting (SAS) to demand forecasting (Hyndman et al., 2016; Zhao et al., 2016) and psephology (Lauderdale et al., 2019). The recently announced M5 competition (https://mofc.unic.ac.cy/m5-competition/) from the International Institute of Forecasters also involves hierarchical forecasting on Walmart data with a $100K prize. A novel challenge in such forecasting problems is to produce accurate forecasts while maintaining the consistency of the forecasts across multiple hierarchies.

Related Works. Existing hierarchical forecasting methods predominantly employ linear auto-regressive (AR) models that are initially trained while ignoring the hierarchy. The output forecasts produced by these AR models are reconciled afterward for consistency. As shown in Figure 1(c), such reconciliation is achieved by defining a mapping matrix, often denoted by $S$, which encapsulates the mutual relationships among the time series (Hyndman et al., 2011). The reconciliation step involves inversion and multiplication of the $S$ matrix, which leads to a computational complexity of $O(n^3 h)$, where $n$ is the number of nodes and $h$ represents how many levels of hierarchy the set of time series is organized into. Thus, reconciling the hundreds of thousands of time series required in certain industrial applications becomes difficult. Improved versions of the reconciliation were proposed by employing trace minimization (Wickramasuriya et al., 2015) and sparse matrix computation (Hyndman et al., 2016). These algorithms assume that the individual base estimators are unbiased, which is unrealistic in many real-life applications. The unbiasedness assumption was relaxed in (Taieb et al., 2017), albeit at the cost of other assumptions such as ergodicity with an exponentially decaying mixing coefficient. Moreover, all these existing methods impose the reconciliation constraints among the time series as a post-inference step, which poses two challenges: 1. ignoring the relationships across time series during training can potentially lead to suboptimal solutions; 2. the additional computational complexity of the inference step, owing to the inversion of the $S$ matrix, makes it challenging to apply the technique when a large number of time series are encountered in an industrial forecasting pipeline.

Figure 1: (a) Example of hierarchically related time series: state population growth forecasts, with the data aggregated by geographical location; (b) the corresponding graph structure, with nodes $v_1, \ldots, v_7$ and edges such as $e_{1,2}$ and $e_{1,3}$; (c) the corresponding matrix representation, with four bottom-level time series and three aggregated series; (d) the time series forecast at each node.
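To make the post-hoc reconciliation step concrete, the following is a minimal sketch of a generic least-squares reconciliation that projects independently produced base forecasts onto the aggregation constraints encoded by $S$, in the spirit of Hyndman et al. (2011). It is an illustrative simplification, not the trace-minimization or sparse variants cited above; the toy two-level hierarchy, the function name, and the forecast values are assumptions made for the example.

```python
import numpy as np

# Toy two-level hierarchy: one total series aggregating two bottom series.
# Rows of S index [total, bottom_1, bottom_2]; columns index the bottom series.
S = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])

def ols_reconcile(y_hat: np.ndarray, S: np.ndarray) -> np.ndarray:
    """Project incoherent base forecasts (one value per node) onto the set of
    forecasts that satisfy the aggregation constraints, via S (S^T S)^{-1} S^T."""
    projection = S @ np.linalg.solve(S.T @ S, S.T)
    return projection @ y_hat

# Base forecasts that do not add up: 10 != 4 + 5.
y_hat = np.array([10.0, 4.0, 5.0])
y_tilde = ols_reconcile(y_hat, S)
print(y_tilde)                                          # approx. [9.67 4.33 5.33]
print(np.isclose(y_tilde[0], y_tilde[1] + y_tilde[2]))  # True: now coherent
```

The explicit matrix inversion inside this projection is the step whose cost becomes prohibitive for very large hierarchies, as noted above.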
A critical development in time series forecasting has been the application of Deep Neural Networks (DNNs) (e.g., Chung et al., 2014; Lai et al., 2018; Mukherjee et al., 2018; Oreshkin et al., 2019; Salinas et al., 2019; Sen et al., 2019; Zhu and Laptev, 2017), which have been shown to outperform auto-regressive and other statistical models in several situations. However, no existing method incorporates the hierarchical structure of the set of time series into the DNN learning step. Instead, one hopes that the DNN will learn the relationships from the data. Graph neural networks (GNNs) (e.g., Franceschi et al., 2019; Lachapelle et al., 2019; Wu et al., 2020; Yu et al., 2017, 2019; Zhang et al., 2020; Zheng et al., 2018) have also been used to learn inherent relations among multiple time series; however, they need a pre-defined graph model that can adequately capture the relationships among the time series. Characterizing the uncertainty of a DNN forecast is another critical aspect, which becomes even more complicated when the additive noise does not follow a Gaussian distribution (e.g., Blundell et al., 2015; Iwata and Ghahramani, 2017; Kuleshov et al., 2018; Lakshminarayanan et al., 2017; Sun et al., 2019). If the observation noise model is misspecified, performance will be poor no matter how complex a neural network architecture one uses. Other works, such as Salinas et al. (2019), use multiple observation noise models (Gaussian, Negative Binomial) and loss functions, and it is left to human experts' discretion to select the appropriate loss based on the nature of the time series. This approach cannot be generalized and involves human intervention; it is especially infeasible for an industrial forecasting pipeline where predictions must be generated for a vast number of time series whose observation noise can vary widely in nature. Bayesian approaches also face problems, as the prior distribution and loss function assumptions will not be met across all of the time series. One approach that does not need to specify a parametric form of the distribution is quantile regression. Prior works include combining sequence-to-sequence models with a quantile loss to generate multi-step probabilistic forecasts (Wen et al., 2017) and modeling conditional quantile functions using regression splines (Gasthaus et al., 2019). These works, however, are incapable of handling hierarchical structure within the time series.
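Since quantile regression plays a central role in what follows, here is a minimal sketch of the standard pinball (quantile) loss that such models minimize. The function and variable names are our own, and this is generic background rather than the exact training objective proposed in this paper.

```python
import numpy as np

def pinball_loss(y_true: np.ndarray, y_pred: np.ndarray, q: float) -> float:
    """Standard quantile (pinball) loss for a quantile level q in (0, 1).
    Under-prediction is penalized by q, over-prediction by (1 - q)."""
    diff = y_true - y_pred
    return float(np.mean(np.maximum(q * diff, (q - 1.0) * diff)))

# The penalty is asymmetric: for q = 0.9, under-forecasting costs more.
y = np.array([10.0, 12.0, 9.0])
print(pinball_loss(y, y - 1.0, q=0.9))  # under-forecast by 1 -> loss 0.9
print(pinball_loss(y, y + 1.0, q=0.9))  # over-forecast by 1  -> loss 0.1
```

Minimizing this loss over a training set drives the prediction toward the q-th conditional quantile, which is what allows probabilistic forecasts to be produced without assuming a parametric noise distribution.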
Key aspects of forecasting multiple, related time series addressed by our proposed model include:

1. the introduction of a regularized loss function that captures the mutual relationships among each group of time series from adjacent aggregation levels;

2. the generation of probabilistic forecasts using quantile regression, with each quantile reconciled simultaneously during model training;

3. a clear demonstration of superior model capabilities, especially on real e-commerce datasets with sparsity and skewed noise distributions.

Background: Hierarchical Time Series Forecast. Denote by $b_t \in \mathbb{R}^m$ and $a_t \in \mathbb{R}^k$ the observations at time $t$ for the $m$ series at the bottom level and the $k$ series at the aggregation level(s), respectively. Then $y_t = S b_t \in \mathbb{R}^n$ contains the observations at time $t$ for all levels. Similarly, let $\hat{b}_T(h)$ be the $h$-step-ahead forecast at the bottom level at time $T$; forecasts at higher aggregation levels can then be obtained by computing $\hat{y}_T(h) = S \hat{b}_T(h)$. This simple method is called bottom-up (BU), and it guarantees reconciled forecasts. However, errors from the bottom-level forecasts accumulate to the higher levels, leading to poor results. BU also cannot leverage any training data that is available at the aggregated levels. A more straightforward approach, called base forecasting, is to perform forecasting for each time series independently, without considering the structure at all.

Viewing the hierarchy as a graph with edge set $E$ (Figure 1(b)), where $x_{v_k}(t)$ denotes the series at node $v_k$, one can later extend these linear aggregation constraints to a set of non-linear constraints: $x_{v_i}(t) = H_{v_i}\big(\sum_{e_{i,k} \in E} x_{v_k}(t)\big)$, where $H_{v_i} \in C^1$. But linear hierarchical aggregation already covers most real-world applications. The graph has hierarchy levels $L := \{l_1, l_2, \ldots, l_i, \ldots, l_q\}$, where $l_i$ is the set of all vertices belonging to the $i$th level of the hierarchy.
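To make the Background notation concrete, the sketch below builds a summing matrix $S$ for a hierarchy shaped like the one in Figure 1 (four bottom-level series $v_4, \ldots, v_7$ aggregated into $v_2$, $v_3$, and a root $v_1$, so $m = 4$, $k = 3$, $n = 7$) and computes bottom-up forecasts $\hat{y}_T(h) = S \hat{b}_T(h)$. The exact tree shape and the bottom-level forecast values are illustrative assumptions, not taken from the paper's experiments.

```python
import numpy as np

# Assumed hierarchy, following Figure 1: v1 aggregates v2 and v3,
# v2 aggregates bottom series v4, v5 and v3 aggregates v6, v7.
# S maps the m = 4 bottom series to all n = 7 series (k = 3 aggregates).
S = np.array([
    [1, 1, 1, 1],   # v1 = v4 + v5 + v6 + v7
    [1, 1, 0, 0],   # v2 = v4 + v5
    [0, 0, 1, 1],   # v3 = v6 + v7
    [1, 0, 0, 0],   # v4
    [0, 1, 0, 0],   # v5
    [0, 0, 1, 0],   # v6
    [0, 0, 0, 1],   # v7
], dtype=float)

# Hypothetical h-step-ahead bottom-level forecasts b_hat_T(h).
b_hat = np.array([3.0, 2.0, 4.0, 1.0])

# Bottom-up (BU): y_hat_T(h) = S @ b_hat_T(h) is reconciled by construction.
y_hat = S @ b_hat
print(y_hat)  # [10.  5.  5.  3.  2.  4.  1.]
```

Because every aggregate row of $S$ is an exact sum of bottom-level rows, the BU forecasts are coherent by construction; the drawback, as noted above, is that bottom-level errors propagate upward through the same sums.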