Gaussian Process Volatility Model
Yue Wu [email protected]
José Miguel Hernández-Lobato [email protected]
Zoubin Ghahramani [email protected]
University of Cambridge, Department of Engineering, Cambridge CB2 1PZ, UK
arXiv:1402.3085v1 [stat.ME] 13 Feb 2014

Abstract

The accurate prediction of time-changing variances is an important task in the modeling of financial data. Standard econometric models are often limited as they assume rigid functional relationships for the variances. Moreover, function parameters are usually learned using maximum likelihood, which can lead to overfitting. To address these problems we introduce a novel model for time-changing variances using Gaussian Processes. A Gaussian Process (GP) defines a distribution over functions, which allows us to capture highly flexible functional relationships for the variances. In addition, we develop an online algorithm to perform inference. The algorithm has two main advantages. First, it takes a Bayesian approach, thereby avoiding overfitting. Second, it is much quicker than current offline inference procedures. Finally, our new model was evaluated on financial data and showed significant improvement in predictive performance over current standard models.

Proceedings of the 31st International Conference on Machine Learning, Beijing, China, 2014. JMLR: W&CP volume 32. Copyright 2014 by the author(s).

1. Introduction

Time series of financial returns often exhibit heteroscedasticity, that is, the standard deviation or volatility of the returns is time-dependent. In particular, large returns (either positive or negative) are often followed by returns that are also large in size. The result is that financial time series frequently display periods of low and high volatility. This phenomenon is known as volatility clustering (Cont, 2001). Several univariate models have been proposed for capturing this property. The best known are the Autoregressive Conditional Heteroscedasticity model (ARCH) (Engle, 1982) and its extension, the Generalised Autoregressive Conditional Heteroscedasticity model (GARCH) (Bollerslev, 1986). GARCH has further inspired a host of variants and extensions; a review of many of these models can be found in Hentschel (1995). Most of these GARCH variants attempt to address one or both limitations of GARCH: a) the assumption of a linear dependency between the current volatility and past volatilities, and b) the assumption that positive and negative returns have symmetric effects on volatility. Asymmetric effects are often observed, as large negative returns send measures of volatility soaring, while large positive returns do not (Bekaert & Wu, 2000; Campbell & Hentschel, 1992).

Most solutions proposed in these GARCH variants involve: a) introducing nonlinear functional relationships for the volatility, and b) adding asymmetric terms in the functional relationships. However, the GARCH variants do not fundamentally address the problem that the functional relationship of the volatility is unknown. In addition, these variants can have a high number of parameters, which may lead to overfitting when learned using maximum likelihood.

More recently, volatility modeling has received attention within the machine learning community, with the development of copula processes (Wilson & Ghahramani, 2010) and heteroscedastic Gaussian processes (Lázaro-Gredilla & Titsias, 2011). These models leverage the flexibility of Gaussian Processes (Rasmussen & Williams, 2006) to model the unknown relationship in the variances. However, these models do not address the asymmetric effects of positive and negative returns on volatility.

In this paper we introduce a new non-parametric volatility model, called the Gaussian Process Volatility Model (GP-Vol). This new model is more flexible, as it is not limited by a fixed functional form. Instead, a prior distribution is placed on possible functions using GPs, and the functional relationship is learned from the data. Furthermore, GP-Vol explicitly models the asymmetric effects on volatility of positive and negative returns. Our new volatility model is evaluated in a series of experiments on real financial returns, comparing it against popular econometric models, namely GARCH, EGARCH (Nelson, 1991) and GJR-GARCH (Glosten et al., 1993). Overall, we found that our proposed model has the best predictive performance. In addition, the functional relationship learned by our model is highly intuitive and automatically discovers the nonlinear and asymmetric features that previous models attempt to capture.

The second main contribution of the paper is the development of an online algorithm for learning GP-Vol. GP-Vol is an instance of a Gaussian Process State Space Model (GP-SSM). Most previous work on GP-SSMs (Ko & Fox, 2009; Deisenroth et al., 2009; Deisenroth & Mohamed, 2012) has focused on developing approximation methods for filtering and smoothing the hidden states in GP-SSMs, assuming known GP transition dynamics. Only very recently, Frigola et al. (2013) addressed the problem of learning both the hidden states and the transition dynamics by using Particle Gibbs with ancestor sampling (PGAS) (Lindsten et al., 2012). In this paper, we introduce a new online algorithm for performing inference on GP-SSMs. Our algorithm has similar predictive performance to PGAS on financial datasets, but is much quicker as inference is online.

2. Review of GARCH and GARCH variants

The standard heteroscedastic variance model for financial data is GARCH. GARCH assumes a Gaussian observation model (1) and a linear transition function, so that the time-varying variance \sigma_t^2 is linearly dependent on p previous variance values and q previous squared time series values:

x_t \sim \mathcal{N}(0, \sigma_t^2), \quad (1)

\sigma_t^2 = \alpha_0 + \sum_{j=1}^{q} \alpha_j x_{t-j}^2 + \sum_{i=1}^{p} \beta_i \sigma_{t-i}^2, \quad (2)

where x_t are the values of the return time series being modeled. The GARCH(p,q) generative model is flexible and can produce a variety of clustering behaviors of high and low volatility periods for different settings of the model coefficients \alpha_1, \ldots, \alpha_q and \beta_1, \ldots, \beta_p.

While GARCH is flexible, it has several limitations. First, a linear relationship between \sigma_{t-p:t-1}^2 and \sigma_t^2 is assumed. Second, the effect of positive and negative returns is the same due to the quadratic term x_{t-j}^2. However, it is often observed that large negative returns lead to sharp rises in volatility, while positive returns do not (Bekaert & Wu, 2000; Campbell & Hentschel, 1992).

A more flexible and often cited GARCH extension is Exponential GARCH (EGARCH) (Nelson, 1991):

\log(\sigma_t^2) = \alpha_0 + \sum_{j=1}^{q} \alpha_j g(x_{t-j}) + \sum_{i=1}^{p} \beta_i \log(\sigma_{t-i}^2), \quad (3)

g(x_t) = \theta x_t + \lambda |x_t|.

Asymmetry in the effects of positive and negative returns is introduced through the function g(x_t). If the coefficient \theta is negative, negative returns will increase volatility.

Another popular GARCH extension with asymmetric effects of returns is GJR-GARCH (Glosten et al., 1993):

\sigma_t^2 = \alpha_0 + \sum_{j=1}^{q} \alpha_j x_{t-j}^2 + \sum_{i=1}^{p} \beta_i \sigma_{t-i}^2 + \sum_{k=1}^{r} \gamma_k x_{t-k}^2 I_{t-k}, \quad (4)

I_{t-k} = \begin{cases} 0, & \text{if } x_{t-k} \ge 0 \\ 1, & \text{if } x_{t-k} < 0 \end{cases}.

The asymmetric effect is captured by \gamma_k x_{t-k}^2 I_{t-k}, which is nonzero if x_{t-k} < 0.

3. Gaussian Process State Space Models

GARCH, EGARCH and GJR-GARCH can all be represented as General State-Space or Hidden Markov Models (HMMs) (Baum & Petrie, 1966; Doucet et al., 2001), with the unobserved dynamic variances being the hidden states. Transition functions for the hidden states are fixed and assumed to be linear in these models. The linearity assumption limits the flexibility of these models.

More generally, a non-parametric approach can be taken where a Gaussian Process prior is placed on the transition function, so that the functional form can be learned from data. This Gaussian Process state space model (GP-SSM) is a generalization of the HMM. The two classes of models differ in two main ways. First, in an HMM the transition function has a fixed functional form, while in a GP-SSM it is represented by a GP. Second, in a GP-SSM the states do not have Markovian structure once the transition function is marginalized out; see Section 5 for details.

However, the flexibility of GP-SSMs comes at a cost: inference in GP-SSMs is complicated. Most previous work on GP-SSMs (Ko & Fox, 2009; Deisenroth et al., 2009; Deisenroth & Mohamed, 2012) has focused on developing approximation methods for filtering and smoothing the hidden states in GP-SSMs, assuming known GP dynamics. A few papers considered learning the GP dynamics and the states, but for special cases of GP-SSMs. For example, Turner et al. (2010) applied EM to obtain maximum likelihood estimates for parametric systems that can be represented by GPs. Recently, Frigola et al. (2013) learned both the hidden states and the GP dynamics using PGAS (Lindsten et al., 2012). Unfortunately, PGAS is a full MCMC inference method and can be computationally expensive. In this paper, we present an online Bayesian inference algorithm for learning the hidden states v_{1:T}, the unknown function f, and any hyper-parameters \theta of the model. Our algorithm has similar predictive performance to PGAS, but is much quicker.

4. Gaussian Process Volatility Model

We now introduce our new nonparametric volatility model, which is an instance of a GP-SSM. We call our new model the Gaussian Process Volatility Model (GP-Vol):

x_t \sim \mathcal{N}(0, \sigma_t^2), \quad (5)

v_t := \log(\sigma_t^2) = f(v_{t-1}, x_{t-1}) + \epsilon_t, \quad \epsilon_t \sim \mathcal{N}(0, \sigma_n^2). \quad (6)

We focus on modeling the log variance instead of the variance, as the former has support on the real line. Equations (5) and (6) define a GP-SSM. Specifically, we place a GP prior on f, letting z_t denote (v_{t-1}, x_{t-1}):

f \sim \mathcal{GP}(m, k), \quad (7)

where m(z_t) is the mean function and k(z_t, z_t') is the covariance or kernel function. In the GP prior, the mean function encodes prior knowledge of the system dynamics. For example, it can encode the fact that large negative returns lead to increases in volatility. The covariance function k(z_t, z_t') gives the prior covariance between outputs f(z_t) and f(z_t'), that is, Cov(f(z_t), f(z_t')) = k(z_t, z_t'). Note that the covariance between the outputs is a function of the pair of inputs. Intuitively, if inputs z_t and z_t' are close to each other, then the covariances between the corresponding outputs should be large, i.e. the outputs should be highly correlated.

The graphical model for GP-Vol is given in Figure 1.

Figure 1. A graphical model for GP-Vol. The transitions of the hidden states v_t are represented by the unknown function f, which takes as inputs the previous state v_{t-1} and the previous observation x_{t-1}.

The explicit dependence of the log variance v_t on the previous return x_{t-1} enables us to model asymmetric effects of positive and negative returns on the variance. Finally, GP-Vol can be extended to depend on p previous log variances and q past returns, as in GARCH(p,q). In this case, the transition would be of the form:

v_t = f(v_{t-1}, v_{t-2}, \ldots, v_{t-p}, x_{t-1}, x_{t-2}, \ldots, x_{t-q}) + \epsilon_t. \quad (8)

5. Bayesian Inference for GP-Vol

In the standard GP regression setting the inputs and targets are observed, and the function f can then be learned using exact inference. However, this is not the case in GP-Vol, where some inputs and all targets v_t are unknown. Directly learning the posterior of the unknown variables (f, \theta, v_{1:T}), where \theta denotes the hyper-parameters of the GP, is a challenging task. Fortunately, we can target p(\theta, v_{1:T} | x_{1:T}), where the function f has been marginalized out. Marginalizing out f introduces dependencies across time for the hidden states. First, consider the conditional prior for the hidden states p(v_{1:T} | \theta) with known \theta and f marginalized out. It is not Gaussian but a product of Gaussians:

p(v_{1:T} | \theta) = p(v_1 | \theta) \prod_{t=2}^{T} p(v_t | \theta, v_{1:t-1}, x_{1:t-1}). \quad (9)

Each term in Equation (9) can be viewed as a standard one-step GP prediction under the prior. Next, consider the posterior for the states p(v_{1:t} | \theta, x_{1:t}). For clarification, the prior distribution of v_t depends on all the previous states v_{1:t-1} and previous observations x_{1:t-1}. In contrast, the posterior for state v_t depends on all available observations x_{1:t}.

The posterior p(v_t | \theta, v_{1:t-1}, x_{1:t}) can be approximated with particles. We now describe a standard sequential Monte Carlo (SMC) particle filter to learn this posterior. Let v_{1:t-1}^i with i = 1, \ldots, N be particles that represent chains of states up to t-1, with corresponding weights W_{t-1}^i. Then the posterior distribution p(v_{1:t-1} | \theta, x_{1:t-1}) is approximated by the weighted particles:

\hat{p}(v_{1:t-1} | \theta, x_{1:t-1}) = \sum_{i=1}^{N} W_{t-1}^i \, \delta_{v_{1:t-1}^i}(v_{1:t-1}). \quad (10)

In addition, the posterior for v_t can be approximated by propagating the previous states forward and importance weighting according to the observation model. Specifically, sample a set of parent indices J according to the weights W_{t-1}^i. Then propagate forward the particles \{v_{1:t-1}^j\}_{j \in J}. Proposals for v_t are drawn from its conditional prior:

v_t^j \sim p(v_t | \theta, v_{1:t-1}^j, x_{1:t-1}). \quad (11)

The proposed particles are importance-weighted according to the observation model:

w_t^j = p(x_t | \theta, v_t^j), \quad (12)

W_t^j = \frac{w_t^j}{\sum_{k=1}^{N} w_t^k}. \quad (13)
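As a minimal sketch of this propagate-and-weight recursion, the code below replaces the marginalized GP conditional prior of Equation (11) with a hypothetical AR(1) transition in the log-variance (the coefficients phi, mu0 and tau are illustrative assumptions, not values from the paper), while keeping the observation weighting of Equations (12) and (13):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stand-in transition: an AR(1) in the log-variance, used here
# instead of the marginalized GP conditional prior to keep the sketch short.
phi, mu0, tau = 0.9, 0.0, 0.3

def smc_step(v_particles, W, x_t):
    """One SMC update: resample parents, propagate (Eq. 11), weight (Eqs. 12-13)."""
    N = len(v_particles)
    parents = rng.choice(N, size=N, p=W)      # parent indices J drawn from W_{t-1}
    v_prev = v_particles[parents]
    # Propose v_t from the (stand-in) conditional prior.
    v_new = mu0 + phi * (v_prev - mu0) + tau * rng.normal(size=N)
    # Weight by the observation model x_t ~ N(0, exp(v_t)), computed in log space.
    log_w = -0.5 * (np.log(2 * np.pi) + v_new + x_t ** 2 * np.exp(-v_new))
    w = np.exp(log_w - log_w.max())           # subtract max for numerical stability
    return v_new, w / w.sum()

# Usage: filter a short return series with N = 500 particles.
N = 500
v_p = rng.normal(0.0, 1.0, size=N)
W = np.full(N, 1.0 / N)
for x_t in [0.1, -0.5, 1.2, -0.3]:
    v_p, W = smc_step(v_p, W, x_t)
```

In GP-Vol the proposal in `smc_step` would instead be the one-step GP predictive distribution conditioned on the whole particle chain, which is what makes the model non-Markovian.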
Finally, the posterior for v_t is approximated by:

\hat{p}(v_t | \theta, v_{1:t-1}, x_{1:t}) = \sum_{j=1}^{N} W_t^j \, \delta_{v_t^j}(v_t). \quad (14)

The above setup learns the states v_t, assuming that \theta is given. Now consider the desired joint posterior p(\theta, v_{1:T} | x_{1:T}). To learn this posterior, first a prior p(\theta, v_{1:T}) = p(v_{1:T} | \theta) p(\theta) is defined. This suggests that the hyper-parameters \theta can also be represented by particles and filtered together with the states. Naively filtering \theta particles without regeneration will fail due to particle impoverishment, where a few or even one particle for \theta receives all the weight. To resolve particle impoverishment, algorithms such as the Regularized Auxiliary Particle Filter (RAPF) (Liu & West, 1999) regenerate parameter particles at each time step by sampling from a kernel. This kernel introduces artificial dynamics and estimation bias, but works well in practice (Wu et al., 2013).

RAPF was designed for Hidden Markov Models, but GP-Vol is non-Markovian once f is marginalized out. Therefore, we design a new version of RAPF for non-Markovian systems and refer to it as the Regularized Auxiliary Particle Chain Filter (RAPCF), Algorithm 1. There are four main parts to RAPCF. First, there is the Auxiliary Particle Filter (APF) part of RAPCF in lines 5 and 6. The APF (Pitt & Shephard, 1999) proposes from more optimal importance densities, by considering how well previous particles would represent the current state (17). Second, the more likely particle chains are propagated forward in line 8. The main difference between RAPF and RAPCF is in what particles are propagated forward. In RAPCF for GP-Vol, particles representing chains of states v_{1:t-1}^i that are more likely to describe the new observation x_t are propagated forward, as the model is non-Markovian, while in standard RAPF only particles for the previous state v_{t-1} are propagated forward. Third, to avoid particle impoverishment in \theta, new particles are generated by applying a Gaussian kernel, in line 9. Finally, the importance weights are computed for the new states, adjusting for the probability of their chains of origin (18).

Algorithm 1 RAPCF for GP-Vol
1: Input: data x_{1:T}, number of particles N, shrinkage parameter 0 < a < 1, priors p(\theta).
2: At t = 0, sample N particles \{\theta_0^i\}_{i=1,\ldots,N} \sim p(\theta_0). Note that \theta_t denotes estimates for the hyper-parameters after observing x_{1:t}, not that \theta is time-varying.
3: Set initial importance weights w_0^i = 1/N.
4: for t = 1 to T do
5:   Compute point estimates m_t^i and \mu_t^i:
       m_t^i = a \theta_{t-1}^i + (1-a) \bar{\theta}_{t-1}, \quad \bar{\theta}_{t-1} = \frac{1}{N} \sum_{n=1}^{N} \theta_{t-1}^n, \quad (15)
       \mu_t^i = E(v_t | m_t^i, v_{1:t-1}^i, x_{1:t-1}). \quad (16)
6:   Compute point-estimate importance weights:
       g_t^i \propto w_{t-1}^i \, p(x_t | \mu_t^i, m_t^i). \quad (17)
7:   Resample N auxiliary indices J = \{j\} according to the weights \{g_t^i\}.
8:   Propagate these chains forward, i.e. set \{v_{1:t-1}^i\}_{i=1}^N = \{v_{1:t-1}^j\}_{j \in J}.
9:   Jitter the parameters: \theta_t^j \sim \mathcal{N}(m_t^j, (1-a^2) V_{t-1}), where V_{t-1} is the empirical covariance of \theta_{t-1}.
10:  Propose new states v_t^j \sim p(v_t | \theta_t^j, v_{1:t-1}^j, x_{1:t-1}).
11:  Compute the importance weights, adjusting for the modified proposal:
       w_t^j \propto \frac{p(x_t | v_t^j, \theta_t^j)}{p(x_t | \mu_t^j, m_t^j)}. \quad (18)
12: end for
13: Output: posterior particles for chains of states v_{1:T}^j, parameters \theta_t^j, and particle weights w_t^j.

RAPCF is a quick online algorithm that filters for unknown states and hyper-parameters. However, it has some limitations, just as standard RAPF does. First, it introduces bias in the estimates of the hyper-parameters, as sampling from the kernel in line 9 adds artificial dynamics. Second, it only filters forward and does not smooth backward. This means that the chains of states are never updated given new information in later observations. Consequently, there will be impoverishment in distant ancestors v_{t-L}, since these ancestor states are not regenerated. When impoverishment in the ancestors occurs, GP-Vol will treat the collapsed ancestor states as inputs with little uncertainty. Therefore, the variance of the predictions near these inputs will be underestimated.

Many of the potential issues faced by RAPCF can be addressed by adopting a full MCMC approach. In particular, Particle Markov Chain Monte Carlo (PMCMC) procedures (Andrieu et al., 2010) established a framework for learning the hidden states and the parameters of general state space models. Additionally, Lindsten et al. (2012) developed a PMCMC algorithm called Particle Gibbs with ancestor sampling (PGAS) for learning non-Markovian state space models, which was applied by Frigola et al. (2013) to learn GP-SSMs.

PGAS is described in Algorithm 2. There are three main parts to PGAS. First, it adopts a Gibbs sampling approach and alternately samples the parameters \theta[m], given all the data x_{1:T} and the current samples of the hidden states v_{1:T}[m-1], where m is the iteration count, and then the states, given the new parameters and the data. Conditionally sampling \theta[m] given x_{1:T} and v_{1:T}[m-1] can often be done with slice sampling (Neal, 2003). Second, samples for v_{1:T}[m] are drawn using a conditional auxiliary particle filter with ancestor sampling (CAPF-AS). CAPF-AS consists of two parts: a) a conditional auxiliary particle filter (CAPF) and b) ancestor sampling. CAPF generates N particles for each hidden state conditional on \theta[m] and v_{1:T}[m-1]. The conditional dependence is necessary, as each hidden state v_t depends on the parameters and all the other hidden states. In particular, Lindsten et al. (2012) verified that the CAPF corresponds to a collapsed Gibbs sampler (Van Dyk & Park, 2008). Then ancestor sampling is used to sample smoothed trajectories from the particles generated by CAPF. Third, the alternate sampling of \theta[m] and v_{1:T}[m] is repeated for M iterations.

Algorithm 2 PGAS
1: Input: data x_{1:T}, number of particles N, number of iterations M, initial hidden states v_{1:T}[0] and initial parameters \theta[0].
2: for m = 1 to M do
3:   Draw \theta[m] \sim p(\cdot | v_{1:T}[m-1], x_{1:T}). This can often be done by slice sampling.
4:   Run a CAPF-AS with N particles, targeting p(v_{1:T} | \theta[m], x_{1:T}) conditional on v_{1:T}[m-1].
5:   Draw a chain v_{1:T}^* with probability proportional to the final weights \{w_T^i\}_{i=1,\ldots,N}.
6:   Set v_{1:T}[m] = v_{1:T}^*.
7: end for
8: Output: chains of particles and parameters \{v_{1:T}[m], \theta[m]\}_{m=1:M}.

6.1. Synthetic data

Ten synthetic datasets of length T = 100 were generated according to Equations (5) and (6). The function f was specified with a linear mean function and a squared exponential covariance function. The linear mean function used was:

E(v_t) = m(v_{t-1}, x_{t-1}) = a v_{t-1} + b x_{t-1}. \quad (19)

This mean function encodes an asymmetric relationship between the positive and negative returns and the volatility. The squared exponential kernel or covariance function is given by:

k(y, z) = \sigma_f^2 \exp\left(-\frac{1}{2 l^2} |y - z|^2\right), \quad (20)

where l is the length-scale parameter and \sigma_f^2 is the signal variance parameter.

RAPCF was used to learn the hidden states v_{1:T} and the hyper-parameters \theta = (a, b, \sigma_n, \sigma_f, l) of f. The algorithm was initiated with diffuse priors placed on \theta. RAPCF was able to recover the hidden states and the hyper-parameters. For the sake of brevity, we only include two typical plots of the 90% posterior intervals for hyper-parameters a and b, in Figures 2 and 3 respectively. The intervals are estimated from the filtered particles for a and b at each time step t. In both plots, the posterior intervals eventually cover the parameters, shown as dotted blue lines, that were used to generate the synthetic dataset.
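To make this generative process concrete, the sketch below draws one dataset from Equations (5), (6), (19) and (20) by sampling f sequentially from its GP conditionals given the function values already drawn. All parameter values (a, b, sigma_f, ell, sigma_n and the initial state) are illustrative assumptions; the paper does not report the settings used for its ten synthetic datasets.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative (assumed) parameter values.
a, b = 0.6, -0.2          # linear mean function, Eq. (19)
sigma_f, ell = 0.5, 1.0   # SE kernel parameters, Eq. (20)
sigma_n = 0.1             # transition noise std, Eq. (6)
T = 100

def se_kernel(Z1, Z2):
    # Squared exponential kernel of Eq. (20) on inputs z_t = (v_{t-1}, x_{t-1}).
    d2 = ((Z1[:, None, :] - Z2[None, :, :]) ** 2).sum(-1)
    return sigma_f ** 2 * np.exp(-0.5 * d2 / ell ** 2)

def mean_fn(Z):
    # Linear mean of Eq. (19): m(v_{t-1}, x_{t-1}) = a v_{t-1} + b x_{t-1}.
    return a * Z[:, 0] + b * Z[:, 1]

v = [0.0]                                  # assumed initial log-variance v_1
x = [rng.normal(0.0, np.exp(0.5 * v[0]))]  # x_t ~ N(0, exp(v_t)), Eq. (5)
Z_hist, f_hist = [], []                    # past GP inputs and sampled f values

for t in range(1, T):
    z = np.array([[v[-1], x[-1]]])
    mu, var = mean_fn(z)[0], sigma_f ** 2  # GP prior at the new input
    if Z_hist:
        Za, fa = np.vstack(Z_hist), np.array(f_hist)
        K = se_kernel(Za, Za) + 1e-9 * np.eye(len(Za))
        k_star = se_kernel(z, Za)[0]
        # Condition f(z) on the function values already drawn, so the whole
        # trajectory is one consistent sample from a single GP.
        mu += k_star @ np.linalg.solve(K, fa - mean_fn(Za))
        var -= k_star @ np.linalg.solve(K, k_star)
    f_t = rng.normal(mu, np.sqrt(max(var, 0.0)))
    Z_hist.append(z[0]); f_hist.append(f_t)
    v.append(f_t + rng.normal(0.0, sigma_n))        # Eq. (6)
    x.append(rng.normal(0.0, np.exp(0.5 * v[-1])))  # Eq. (5)
```

Because b is negative in the assumed mean function, negative returns raise the next log-variance while positive returns lower it, which is the asymmetric effect the synthetic experiment is designed to exercise.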